μ_{r:n} − k μ_{r,s:n} = [n!/((r−1)!(s−r−1)!(n−s)!)] ∫ x [F(x)]^{r−1} f(x) I_2(x) dx ,   (3.21)

where

I_2(x) = ∫_x [F(y) − F(x)]^{s−r−1} F(y) [1 − F(y)]^{n−s+1} dy .
N. Balakrishnan and R. Aggarwala
Table 4.2 Variances and covariances, σ_{i,j:n}, for generalized logistic distribution
(entries are listed column-wise: index rows i, j, n, followed by the values for k = 0.1, 0.2, 0.3, 0.4)

i: 1 1 1 2 1 1 1 2 2 3 1 1 1 1 2 2 2 3 3 4 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5 1 1 1 1 1 1 2 2 2 2 2 3 3 3
j: 1 1 2 2 1 2 3 2 3 3 1 2 3 4 2 3 4 3 4 4 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5 1 2 3 4 5 6 2 3 4 5 6 3 4 5
n: 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6
k=0.1: 3.54009 3.21455 1.03356 1.79852 3.22336 1.04485 0.55362 1.32104 0.73478 1.40937 3.28505 1.06283 0.58503 0.36921 1.19989 0.68084 0.43740 0.93569 0.61883 1.23146 3.35729 1.08387 0.60343 0.39789 0.27466 1.15580 0.65767 0.43910 0.30559 0.80070 0.54420 0.38336 0.77173 0.55482 1.12576 3.42971 1.10535 0.61859 0.41378 0.29890 0.21788 1.13905 0.64835 0.43784 0.31824 0.23298 0.74209 0.50749 0.37197
k=0.2: 4.46581 5.10946 1.14267 1.53681 5.71999 1.34670 0.59750 1.42022 0.66110 1.04662 6.26143 1.49483 0.71809 0.39491 1.44100 0.71372 0.39942 0.87291 0.50235 0.84125 6.74599 1.62022 0.79929 0.47839 0.29245 1.49155 0.75234 0.45590 0.28093 0.83424 0.51443 0.32074 0.66174 0.42064 0.72488 7.18577 1.73146 0.86542 0.53337 0.35537 0.23138 1.55009 0.78834 0.49052 0.32881 0.21499 0.83079 0.52342 0.35375
k=0.3: 6.94236 9.75063 1.35715 1.41980 12.10242 1.84310 0.68042 1.60667 0.62167 0.82761 14.18588 2.22352 0.92185 0.44289 1.80771 0.77263 0.37751 0.84371 0.42332 0.60870 16.08437 2.55631 1.10391 0.59832 0.32549 2.00411 0.88499 0.48550 0.26615 0.89382 0.49868 0.27644 0.58475 0.33005 0.49307 17.84484 2.85917 1.26044 0.71330 0.43831 0.25637 2.19232 0.98360 0.56187 0.34729 0.20396 0.95327 0.55122 0.34340
k=0.4: 15.81242 26.71650 1.74585 1.41663 36.54763 2.70557 0.82155 1.92143 0.61055 0.69285 45.74617 3.53490 1.24534 0.52306 2.38204 0.86465 0.36902 0.84489 0.36962 0.46340 54.49984 4.30274 1.59998 0.78270 0.38027 2.82015 1.07213 0.53059 0.25965 0.98575 0.49570 0.24510 0.53199 0.26743 0.35184 62.91348 5.03126 1.92369 0.99537 0.56382 0.29761 3.24177 1.26138 0.65859 0.37514 0.19877 1.12247 0.59293 0.34024
Recurrence relations for single and product moments
Table 4.2 (Contd.)

i: 3 4 4 4 5 5 6 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5 5 6 6 7 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2
j: 6 4 5 6 5 6 6 1 2 3 4 5 6 7 2 3 4 5 6 7 3 4 5 6 7 4 5 6 7 5 6 7 6 7 7 1 2 3 4 5 6 7 8 2 3 4 5 6 7 8
n: 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
k=0.1: 0.27398 0.63533 0.47142 0.35041 0.67975 0.51319 1.05388 3.49933 1.12628 0.63213 0.42579 0.31232 0.23838 0.18024 1.13450 0.64561 0.43823 0.32303 0.24741 0.18754 0.71193 0.48793 0.36195 0.27847 0.21181 0.57303 0.42873 0.33191 0.25367 0.54469 0.42561 0.32766 0.62007 0.48341 1.00085 3.56536 1.14633 0.64455 0.43594 0.32219 0.24980 0.19781 0.15355 1.13611 0.64630 0.43994 0.32649 0.25385 0.20144 0.15662
k=0.2: 0.23266 0.60833 0.41604 0.27603 0.54913 0.36970 0.64826 7.58979 1.83248 0.92303 0.57732 0.39642 0.28145 0.19107 1.61016 0.82268 0.51855 0.35780 0.25489 0.17347 0.84047 0.53490 0.37139 0.26574 0.18145 0.58954 0.41276 0.29713 0.20382 0.49130 0.35681 0.24646 0.47837 0.33434 0.59312 7.96462 1.92557 0.97487 0.61517 0.42892 0.31394 0.23245 0.16257 1.66948 0.85547 0.54334 0.38038 0.27919 0.20715 0.14510
k=0.3: 0.20279 0.59574 0.37534 0.22349 0.45593 0.27517 0.42058 19.49694 3.14047 1.40176 0.81102 0.52073 0.34417 0.21109 2.37227 1.07430 0.62632 0.40405 0.26792 0.16471 1.01505 0.59741 0.38773 0.25819 0.15922 0.61824 0.40448 0.27089 0.16779 0.45202 0.30526 0.19030 0.37871 0.23865 0.37035 21.06088 3.40504 1.53250 0.89864 0.59017 0.40770 0.28261 0.17923 2.54463 1.15941 0.68426 0.45117 0.31251 0.21706 0.13786
k=0.4: 0.18121 0.59674 0.34603 0.18570 0.38858 0.21111 0.28575 71.05382 5.73102 2.22871 1.18728 0.71178 0.43820 0.24404 3.65018 1.44019 0.77291 0.46546 0.28743 0.16043 1.25583 0.68014 0.41193 0.25538 0.14295 0.66112 0.40344 0.25145 0.14132 0.42410 0.26637 0.15059 0.30722 0.17538 0.24190 78.96685 6.40817 2.52060 1.36693 0.84404 0.55007 0.35734 0.20661 4.04755 1.61182 0.87957 0.54517 0.35620 0.23182 0.13422
Table 4.2 (Contd.)

i: 3 3 3 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 7 7 8
j: 3 4 5 6 7 8 4 5 6 7 8 5 6 7 8 6 7 8 7 8 8
n: 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
k=0.1: 0.69517 0.47691 0.35572 0.27757 0.22086 0.17207 0.53883 0.40451 0.31711 0.25322 0.19783 0.48152 0.37990 0.30485 0.23911 0.48709 0.39377 0.31073 0.57774 0.46072 0.95953
k=0.2: 0.85592 0.54789 0.38550 0.28394 0.21122 0.14825 0.58455 0.41392 0.30627 0.22861 0.16089 0.46665 0.34741 0.26056 0.18407 0.41954 0.31687 0.22514 0.42931 0.30803 0.55104
k=0.3: 1.07680 0.64042 0.42433 0.29491 0.20534 0.13067 0.64520 0.43013 0.30025 0.20974 0.13381 0.45973 0.32280 0.22650 0.14502 0.36801 0.25993 0.16731 0.32702 0.21239 0.33319
k=0.4: 1.38618 0.76211 0.47456 0.31105 0.20291 0.11769 0.72489 0.45401 0.29880 0.19551 0.11366 0.46044 0.30468 0.20018 0.11676 0.32861 0.21720 0.12730 0.25498 0.15065 0.21057
Upon integrating by parts, treating dy for integration and the rest of the integrand for differentiation, and splitting the integrals into two by writing F(y) as F(x) + [F(y) − F(x)], we get

I_2(x) = −(s − r) ∫_x y [F(y) − F(x)]^{s−r−1} [1 − F(y)]^{n−s+1} f(y) dy
         − (s − r − 1) F(x) ∫_x y [F(y) − F(x)]^{s−r−2} [1 − F(y)]^{n−s+1} f(y) dy
         + (n − s + 1) ∫_x y [F(y) − F(x)]^{s−r} [1 − F(y)]^{n−s} f(y) dy
         + (n − s + 1) F(x) ∫_x y [F(y) − F(x)]^{s−r−1} [1 − F(y)]^{n−s} f(y) dy .   (3.22)
Upon substituting the expression of I_2(x) in (3.22) into (3.21) and simplifying the resulting expression, we obtain

μ_{r:n} − k μ_{r,s:n} = [(s − r)(n − s + 1)/(n + 1)] (μ_{r,s+1:n+1} − μ_{r,s:n+1}) + [r(n − s + 1)/(n + 1)] (μ_{r+1,s+1:n+1} − μ_{r+1,s:n+1}) ,

which, when rewritten, yields the recurrence relation in (3.20).
Q.E.D.
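The integration-by-parts identity (3.22) can be checked numerically in the logistic limit k → 0, where F(y) = 1/(1 + e^{−y}). The sketch below is only a verification aid, not part of the derivation; the choices of r, s, n and x are arbitrary.

```python
import numpy as np

# Numerical check of (3.22) in the logistic case k -> 0; r, s, n and x
# below are arbitrary choices satisfying r < s <= n.
r, s, n, x = 1, 3, 5, 0.4

def F(y):
    return 1.0 / (1.0 + np.exp(-y))

def f(y):
    return F(y) * (1.0 - F(y))  # logistic density

def trap(vals, grid):
    # simple trapezoidal rule
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

y = np.linspace(x, 40.0, 200001)  # integrands are negligible past y = 40
D = F(y) - F(x)

# I2(x) as defined below (3.21)
i2_direct = trap(D ** (s - r - 1) * F(y) * (1 - F(y)) ** (n - s + 1), y)

# the four terms of (3.22)
i2_parts = (
    -(s - r) * trap(y * D ** (s - r - 1) * (1 - F(y)) ** (n - s + 1) * f(y), y)
    - (s - r - 1) * F(x) * trap(y * D ** (s - r - 2) * (1 - F(y)) ** (n - s + 1) * f(y), y)
    + (n - s + 1) * trap(y * D ** (s - r) * (1 - F(y)) ** (n - s) * f(y), y)
    + (n - s + 1) * F(x) * trap(y * D ** (s - r - 1) * (1 - F(y)) ** (n - s) * f(y), y)
)
```

The two evaluations agree to quadrature accuracy, confirming that no term or sign is lost in the expansion.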
Table 5.1 Coefficients for observed order statistics in BLUE's for generalized logistic distribution

n = 5, coefficients for θ1*:
k=0.1:  .07486  .23328  .30752  .27178  .11257
k=0.2:  .04985  .21701  .31589  .30212  .11513
k=0.3:  .02744  .19840  .32719  .34389  .10309
k=0.4:  .01054  .17613  .33854  .39800  .07679

n = 5, coefficients for θ2*:
k=0.1: -.12093 -.16055 -.07005  .07676  .27477
k=0.2: -.07036 -.17684 -.13730  .00675  .37774
k=0.3: -.03522 -.17932 -.19949 -.08201  .49604
k=0.4: -.01274 -.17095 -.25513 -.18900  .62782
n = 10, coefficients for θ1*:
k=0.1:  .01937  .06451  .10293  .13161  .14887  .15363  .14502  .12218  .08421  .02767
k=0.2:  .01168  .05401  .09363  .12611  .14861  .15903  .15544  .13569  .09678  .01901
k=0.3:  .00562  .04310  .08338  .11979  .14829  .16569  .16885  .15394  .11502 -.00370
k=0.4:  .00185  .03270  .07245  .11223  .14701  .17259  .18461  .17743  .14157 -.04245

n = 10, coefficients for θ2*:
k=0.1: -.04482 -.07419 -.07547 -.06378 -.04250 -.01412  .01914  .05500  .09083  .14992
k=0.2: -.02052 -.05909 -.07399 -.07564 -.06536 -.04400 -.01215  .03000  .08365  .23711
k=0.3: -.00797 -.04451 -.06831 -.08169 -.08345 -.07240 -.04697 -.00441  .06205  .34766
k=0.4: -.00224 -.03222 -.06083 -.08361 -.09718 -.09845 -.08369 -.04699  .02472  .48049
n = 15, coefficients for θ1*:
k=0.1:  .00884  .02940  .04850  .06530  .07928  .09013  .09760  .10150  .10165  .09789  .09004  .07790  .06124  .03968  .01104
k=0.2:  .00508  .02345  .04185  .05908  .07431  .08698  .09664  .10292  .10546  .10387  .09773  .08651  .06948  .04533  .00131
k=0.3:  .00228  .01756  .03491  .05236  .06876  .08335  .09549  .10460  .11011  .11139  .10772  .09814  .08118  .05405 -.02191
k=0.4:  .00070  .01241  .02820  .04539  .06263  .07900  .09368  .10594  .11500  .12001  .11990  .11322  .09766  .06842 -.06216

n = 15, coefficients for θ2*:
k=0.1: -.02566 -.04369 -.04885 -.04901 -.04539 -.03869 -.02947 -.01817 -.00519  .00907  .02422  .03982  .05544  .07074  .10483
k=0.2: -.01041 -.03032 -.04043 -.04628 -.04840 -.04708 -.04252 -.03489 -.02427 -.01074  .00569  .02519  .04829  .07692  .17926
k=0.3: -.00355 -.01992 -.03197 -.04145 -.04812 -.05179 -.05226 -.04932 -.04269 -.03203 -.01679  .00398  .03224  .07324  .28045
k=0.4: -.00088 -.01266 -.02475 -.03617 -.04611 -.05398 -.05925 -.06139 -.05979 -.05367 -.04194 -.02282  .00723  .05727  .40891
n = 20, coefficients for θ1*:
k=0.1:  .00510  .01677  .02795  .03833  .04770  .05589  .06280  .06837  .07251  .07518  .07631  .07586  .07376  .06997  .06442  .05703  .04772  .03636  .02274  .00523
k=0.2:  .00284  .01296  .02332  .03348  .04306  .05181  .05957  .06618  .07153  .07549  .07794  .07879  .07790  .07513  .07035  .06334  .05384  .04143  .02530 -.00425
k=0.3:  .00122  .00931  .01866  .02842  .03808  .04734  .05594  .06367  .07036  .07582  .07987  .08235  .08305  .08173  .07813  .07187  .06243  .04893  .02952 -.02669
k=0.4:  .00035  .00628  .01439  .02348  .03299  .04255  .05185  .06064  .06870  .07579  .08168  .08613  .08886  .08954  .08777  .08298  .07435  .06041  .03782 -.06656

n = 20, coefficients for θ2*:
k=0.1: -.01742 -.02976 -.03431 -.03625 -.03624 -.03464 -.03169 -.02758 -.02246 -.01647 -.00973 -.00237  .00551  .01377  .02230  .03097  .03966  .04827  .05695  .08150
k=0.2: -.00652 -.01893 -.02572 -.03049 -.03357 -.03512 -.03524 -.03400 -.03145 -.02764 -.02261 -.01637 -.00895 -.00032  .00953  .02072  .03349  .04845  .06749  .14725
k=0.3: -.00204 -.01137 -.01847 -.02456 -.02959 -.03350 -.03623 -.03773 -.03793 -.03676 -.03415 -.03001 -.02421 -.01658 -.00687  .00536  .02086  .04129  .07112  .24137
k=0.4: -.00046 -.00660 -.01305 -.01948 -.02555 -.03105 -.03579 -.03964 -.04242 -.04399 -.04416 -.04274 -.03945 -.03399 -.02585 -.01428  .00203  .02587  .06483  .36579
Table 5.2 Variances and covariance of BLUE's for generalized logistic distribution

n    k     Var(θ1*)   Var(θ2*)   Cov(θ1*,θ2*)
5    0.1   0.63269    0.17883    -0.08046
5    0.2   0.64600    0.20481    -0.16286
5    0.3   0.66815    0.25010    -0.24942
5    0.4   0.69936    0.31783    -0.34292
10   0.1   0.30861    0.07901    -0.03739
10   0.2   0.31196    0.08603    -0.07447
10   0.3   0.31714    0.09879    -0.11123
10   0.4   0.32380    0.11845    -0.14805
15   0.1   0.20393    0.05063    -0.02426
15   0.2   0.20553    0.05413    -0.04813
15   0.3   0.20790    0.06083    -0.07144
15   0.4   0.21082    0.07163    -0.09434
20   0.1   0.15226    0.03724    -0.01794
20   0.2   0.15323    0.03941    -0.03553
20   0.3   0.15464    0.04377    -0.05258
20   0.4   0.15631    0.05104    -0.06919
COROLLARY 3.4. Setting r = 1 and s = n in (3.20), we obtain the relation

μ_{1,n+1:n+1} = μ_{1,n:n+1} − [1/(n − 1)] (μ_{2,n+1:n+1} − μ_{2,n:n+1}) + [(n + 1)/(n − 1)] (μ_{1:n} − k μ_{1,n:n}) ,   n ≥ 3 .   (3.23)
REMARK 3.1. Letting the shape parameter k → 0 in Theorems 3.1–3.6, we deduce the recurrence relations for the product moments of order statistics from the logistic distribution established by Shah (1966); see also Balakrishnan (1992).
Table 5.3 Coefficients for observed order statistics in BLUE's based on right-censored samples for generalized logistic distribution (sample size n = 20, r = number observed, k = 0.1)

r = 5,  coefficients for θ1*: -0.19604 -0.27905 -0.26271 -0.21836  1.95617
r = 5,  coefficients for θ2*: -0.13848 -0.21007 -0.21422 -0.19832  0.76109
r = 10, coefficients for θ1*: -0.02280 -0.02639 -0.01702 -0.00432 -0.01014  0.02553  0.04119  0.05667  0.07155  0.86546
r = 10, coefficients for θ2*: -0.05057 -0.08158 -0.08893 -0.08877 -0.08329 -0.07368 -0.06079 -0.04525 -0.02763  0.60050
r = 15, coefficients for θ1*:  0.00145  0.01092  0.02162  0.03207  0.04188  0.05081  0.05871  0.06543  0.07091  0.07499  0.07764  0.07874  0.07820  0.07592  0.26073
r = 15, coefficients for θ2*: -0.02744 -0.04591 -0.05190 -0.05379 -0.05266 -0.04913 -0.04357 -0.03635 -0.02765 -0.01779 -0.00692  0.00472  0.01691  0.02943  0.36204

r    Var(θ1*)   Var(θ2*)   Cov(θ1*,θ2*)
5    0.86691    0.27064    0.36583
10   0.20775    0.10373    0.03760
15   0.15567    0.05780    -0.01048
REMARK 3.2. The relations established in Theorems 3.1–3.6 are complete in the sense that they will enable one to compute the product moments of order statistics for all sample sizes in a simple recursive manner. This may be done for any choice of the shape parameter k that is of interest. The recursive computational algorithm is explained in detail in the next section.
4. Recursive computational algorithm

Starting with the values of μ_{1:1} = E(X) and μ_{1:1}^{(2)} = E(X²), relations (2.2) and (2.4) can be used to determine μ_{r:n} and μ_{r:n}^{(2)} for r = 1, 2, ..., n and for n = 2, 3, 4, .... From these values, variances of all order statistics can be readily computed.
By starting with the fact that μ_{1,2:2} = μ_{1:1}² (see Arnold and Balakrishnan, 1989), μ_{1,2:3} and μ_{2,3:3} can be determined from relations (3.2) and (3.11), respectively; μ_{1,3:3} can then be determined from (3.19). For the sample of size 4, μ_{1,2:4} and μ_{2,3:4} can be determined from (3.2), μ_{3,4:4} from (3.11), μ_{1,3:4} from (3.5), μ_{2,4:4} from (3.15), and finally μ_{1,4:4} from (3.23). This process may be followed similarly to determine μ_{r,s:n} for 1 ≤ r < s ≤ n and for n = 5, 6, .... From these values, one can readily compute all the covariances of order statistics. Table 4.1 gives the means of order statistics for selected values of k = 0.1(0.1)0.4 up to sample size n = 8. Table 4.2 gives the values of variances and covariances for the same choices of n and k. Only positive values of k are considered, since E(X_{i:n}; k) = −E(X_{n−i+1:n}; −k) and Cov(X_{i:n}, X_{j:n}; k) = Cov(X_{n−j+1:n}, X_{n−i+1:n}; −k), 1 ≤ i ≤ j ≤ n, where Cov(X_{i:n}, X_{i:n}) = Var(X_{i:n}).
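As a cross-check on the starting values of the recursion, the mean and variance of a single observation (n = 1) are available in closed form: writing W = ((1 − U)/U)^k with U uniform on (0, 1), E(W^j) = Γ(1 + jk)Γ(1 − jk) for |jk| < 1, so the raw moments of X = (1 − W)/k follow directly. The sketch below is an illustration (not the recursive algorithm itself) comparing these closed forms with direct quadrature on the quantile function; for k = 0.1 the variance reproduces the first entry σ_{1,1:1} = 3.54009 of Table 4.2.

```python
import math

# Closed-form mean and variance of the standard generalized logistic
# (F(x) = 1/(1 + (1 - kx)^{1/k})), using E[((1-U)/U)^{jk}] =
# Gamma(1 + jk) * Gamma(1 - jk), valid for |jk| < 1.
def gl_mean_var(k):
    g = lambda j: math.gamma(1 + j * k) * math.gamma(1 - j * k)
    m1 = (1 - g(1)) / k                    # E(X)
    m2 = (1 - 2 * g(1) + g(2)) / k ** 2    # E(X^2)
    return m1, m2 - m1 ** 2

def gl_quantile(u, k):
    # inverse of F(x) = 1/(1 + (1 - kx)^{1/k})
    return (1 - ((1 - u) / u) ** k) / k

def gl_mean_var_quad(k, npts=200000):
    # midpoint rule on (0, 1); the endpoint singularities are integrable
    h = 1.0 / npts
    m1 = m2 = 0.0
    for i in range(npts):
        xv = gl_quantile((i + 0.5) * h, k)
        m1 += xv * h
        m2 += xv * xv * h
    return m1, m2 - m1 ** 2
```

The same quadrature idea extends to means and variances of order statistics, which is a convenient independent check on values produced by the recurrence relations.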
5. Best linear unbiased estimators
With the use of the recursive algorithms to compute means, variances and covariances of order statistics from the generalized logistic distribution, we can develop best linear unbiased estimators of location (01) and scale (02) parameters for a given value of k when available samples are either complete or censored either from the left, the right or both, given prior knowledge of which order statistics were observed. Using the results from Balakrishnan and Cohen (1991) and David (1981), we have
θ1* = −μ' Γ Y ,   θ2* = 1' Γ Y ,   (5.1)

where Y is the vector of observed order statistics, μ is the corresponding vector of means of order statistics for the standard (location 0, scale 1) distribution, Σ is the corresponding variance-covariance matrix for the standard distribution, 1 is a vector of ones of appropriate dimension, and finally,

Γ = Σ^{−1} (1 μ' − μ 1') Σ^{−1} / Δ

and

Δ = (1' Σ^{−1} 1)(μ' Σ^{−1} μ) − (1' Σ^{−1} μ)² .

From this, one also obtains

Var(θ1*) = θ2² μ' Σ^{−1} μ / Δ ,   Var(θ2*) = θ2² 1' Σ^{−1} 1 / Δ ,   Cov(θ1*, θ2*) = −θ2² μ' Σ^{−1} 1 / Δ .   (5.2)
Table 5.1 gives coefficients for the observed order statistics obtained using (5.1) for the BLUE's of the location and scale parameters for selected values of k when the entire sample of size n is observed. Table 5.2 gives corresponding variances and covariance of the estimators obtained using (5.2). A short table of coefficients, variances and covariance of BLUE's is also given for k = 0.1 and n = 20
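The computation behind (5.1) and (5.2) is plain linear algebra once μ and Σ are available. The sketch below implements it for an arbitrary location-scale family; the toy μ and Σ used in the demonstration are those of Uniform(0,1) order statistics with n = 3, chosen only because they have simple closed forms. For the generalized logistic one would instead take μ and Σ from Tables 4.1 and 4.2.

```python
import numpy as np

# BLUE coefficients and (scaled) variances from (5.1)-(5.2), given the
# standard-distribution mean vector mu and covariance matrix Sigma of the
# observed order statistics.
def blue(mu, Sigma):
    mu = np.asarray(mu, dtype=float)
    Si = np.linalg.inv(np.asarray(Sigma, dtype=float))
    one = np.ones_like(mu)
    Delta = (one @ Si @ one) * (mu @ Si @ mu) - (one @ Si @ mu) ** 2
    Gamma = Si @ (np.outer(one, mu) - np.outer(mu, one)) @ Si / Delta
    a = -mu @ Gamma                     # theta1* = a'Y
    b = one @ Gamma                     # theta2* = b'Y
    var1 = (mu @ Si @ mu) / Delta       # Var(theta1*)/theta2^2
    var2 = (one @ Si @ one) / Delta     # Var(theta2*)/theta2^2
    cov12 = -(mu @ Si @ one) / Delta    # Cov(theta1*, theta2*)/theta2^2
    return a, b, var1, var2, cov12

# toy stand-in: Uniform(0,1) order statistics, n = 3
n = 3
mu = np.array([i / (n + 1.0) for i in range(1, n + 1)])
Sigma = np.array([[min(i, j) * (n + 1 - max(i, j)) / ((n + 1) ** 2 * (n + 2.0))
                   for j in range(1, n + 1)] for i in range(1, n + 1)])
a, b, v1, v2, c12 = blue(mu, Sigma)
```

The unbiasedness conditions a'1 = 1, a'μ = 0, b'1 = 0, b'μ = 1 hold by construction and provide a convenient check on any tabulated coefficients.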
for right censored samples, where the number observed is r = 5, 10, and 15 (Table 5.3).
6. Maximum likelihood estimation

We consider in this section two- and three-parameter maximum likelihood estimation. For the two-parameter case, we assume that the shape parameter k is a known quantity, and we estimate the location (θ1) and scale (θ2) parameters using the maximum likelihood method. For the three-parameter case, the shape parameter k, as well as the location and scale parameters, are assumed to be unknown. In both cases, for a right-censored sample of size r from n independent random variables from the generalized logistic distribution with location parameter θ1, scale parameter θ2, and shape parameter k, the likelihood function to be maximized is given by

L_{X_{1:n},X_{2:n},...,X_{r:n}}(x_1, x_2, ..., x_r) = [n!/(n − r)!] (1/θ2^r) ∏_{i=1}^{r} { [1 − k z_i]^{1/k − 1} / (1 + [1 − k z_i]^{1/k})² } · {1 + [1 − k z_r]^{−1/k}}^{−(n−r)} ,   z_i = (x_i − θ1)/θ2 ,   (6.1)

and its logarithm is given by

ln L = constant − r ln θ2 + (1/k − 1) ∑_{i=1}^{r} ln[1 − k z_i] − 2 ∑_{i=1}^{r} ln{1 + [1 − k z_i]^{1/k}} − (n − r) ln{1 + [1 − k z_r]^{−1/k}} .   (6.2)

It should be noted here that maximization of L(·) is subject to the constraint x_r ≤ θ1 + θ2/k when k > 0 and x_1 ≥ θ1 + θ2/k when k < 0. Notice that in both cases, at the boundary x_r (or x_1) = θ1 + θ2/k, the likelihood function takes the value 0. Thus, the maximum likelihood estimates must be subject to the strict inequalities x_r < θ1 + θ2/k when k > 0 and x_1 > θ1 + θ2/k when k < 0. In the two-parameter case, upon differentiation of ln L with respect to θ1 and θ2, the maximum likelihood estimates of θ1 and θ2 are obtained by simultaneously solving the following two equations:
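A direct implementation of (6.2) is sketched below; the data values are arbitrary stand-ins. A useful sanity check is the limit k → 0, in which (6.2) must reduce to the censored log-likelihood of the ordinary logistic distribution.

```python
import math

# Censored log-likelihood (6.2) for the generalized logistic, up to the
# additive constant; xs holds the r smallest order statistics, sorted.
def loglik_gl(theta1, theta2, k, xs, n):
    r = len(xs)
    z = [(x - theta1) / theta2 for x in xs]
    if any(1 - k * zi <= 0 for zi in z):
        return float("-inf")            # outside the permissible region
    ll = -r * math.log(theta2)
    ll += (1 / k - 1) * sum(math.log(1 - k * zi) for zi in z)
    ll -= 2 * sum(math.log(1 + (1 - k * zi) ** (1 / k)) for zi in z)
    ll -= (n - r) * math.log(1 + (1 - k * z[-1]) ** (-1 / k))
    return ll

# logistic (k = 0) counterpart, for the limit check
def loglik_logistic(theta1, theta2, xs, n):
    r = len(xs)
    z = [(x - theta1) / theta2 for x in xs]
    ll = -r * math.log(theta2) - sum(z)
    ll -= 2 * sum(math.log(1 + math.exp(-zi)) for zi in z)
    ll -= (n - r) * math.log(1 + math.exp(z[-1]))
    return ll
```

Returning −∞ outside the support region enforces the strict-inequality constraint automatically when the function is handed to a numerical optimizer.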
∂ ln L/∂θ1 = [(1 − k)/θ2] ∑_{i=1}^{r} 1/(1 − k z_i) − (2/θ2) ∑_{i=1}^{r} [1 − k z_i]^{1/k − 1} / {1 + [1 − k z_i]^{1/k}} + [(n − r)/θ2] · 1/{(1 − k z_r)(1 + [1 − k z_r]^{1/k})} = 0 ,   (6.3)

where z_i = (x_i − θ1)/θ2, and

∂ ln L/∂θ2 = −r/θ2 + [(1 − k)/θ2] ∑_{i=1}^{r} z_i/(1 − k z_i) − (2/θ2) ∑_{i=1}^{r} z_i [1 − k z_i]^{1/k − 1} / {1 + [1 − k z_i]^{1/k}} + [(n − r)/θ2] · z_r/{(1 − k z_r)(1 + [1 − k z_r]^{1/k})} = 0 .   (6.4)

Upon simplification, these equations become

−(1 + k) ∑_{i=1}^{r} 1/(1 − k z_i) + 2 ∑_{i=1}^{r} 1/{(1 − k z_i)(1 + [1 − k z_i]^{1/k})} + (n − r)/{(1 − k z_r)(1 + [1 − k z_r]^{1/k})} = 0   (6.5)

and

−r − (1 + k) ∑_{i=1}^{r} z_i/(1 − k z_i) + 2 ∑_{i=1}^{r} z_i/{(1 − k z_i)(1 + [1 − k z_i]^{1/k})} + (n − r) z_r/{(1 − k z_r)(1 + [1 − k z_r]^{1/k})} = 0 .   (6.6)
Recurrencerelationsfor singleandproductmoments
107
These two equations must be solved numerically for θ1 and θ2. In the three-parameter case, in addition to solving these two equations, we must simultaneously solve the following equation:
∂ ln L/∂k = −(1/k²) ∑_{i=1}^{r} ln[1 − k z_i] − (1/k − 1) ∑_{i=1}^{r} z_i/(1 − k z_i)
  + (2/k) ∑_{i=1}^{r} { [1 − k z_i]^{1/k} / (1 + [1 − k z_i]^{1/k}) } { (1/k) ln[1 − k z_i] + z_i/(1 − k z_i) }
  − [(n − r)/k] { [1 − k z_r]^{−1/k} / (1 + [1 − k z_r]^{−1/k}) } { (1/k) ln[1 − k z_r] + z_r/(1 − k z_r) } = 0   (6.7)

for θ1, θ2, and k, where z_i = (x_i − θ1)/θ2. Using (6.5) and (6.6), this likelihood equation simplifies to:

∑_{i=1}^{r} ln[1 − k z_i] − 2 ∑_{i=1}^{r} [1 − k z_i]^{1/k} ln[1 − k z_i] / {1 + [1 − k z_i]^{1/k}} + (n − r) ln[1 − k z_r] / {1 + [1 − k z_r]^{1/k}} + kr = 0 .   (6.8)
Thus, in the three-parameter case, one must solve equations (6.5), (6.6), and (6.8) simultaneously for θ1, θ2 and k.

REMARK 6.1. Notice that, as k → 0, the left hand side of (6.8) approaches 0. Thus, k = 0 is always a solution to the three equations (6.5), (6.6) and (6.8). However, for any particular sample, the probability is 0 that the maximum likelihood estimate of k is 0. One should be wary of this when using computer algorithms to solve (6.5), (6.6) and (6.8) simultaneously.
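Remark 6.1 is easy to confirm numerically: for any fixed sample, the left-hand side of (6.8) shrinks to zero, linearly in k, as k → 0. A minimal sketch (the sample values are arbitrary stand-ins):

```python
import math

# Left-hand side of (6.8); z_i = (x_i - theta1)/theta2 as in the text.
def lhs_68(theta1, theta2, k, xs, n):
    r = len(xs)
    z = [(x - theta1) / theta2 for x in xs]
    w = [(1 - k * zi) ** (1 / k) for zi in z]
    val = sum(math.log(1 - k * zi) for zi in z)
    val -= 2 * sum(wi * math.log(1 - k * zi) / (1 + wi) for zi, wi in zip(z, w))
    val += (n - r) * math.log(1 - k * z[-1]) / (1 + w[-1])
    val += k * r
    return val

xs = [41.0, 44.5, 47.2, 50.1, 53.8]   # arbitrary right-censored sample, n = 8
```

This is why a root-finder started too close to k = 0 will happily converge to the spurious solution k = 0.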
7. Numerical example

A sample of size n = 20 was generated using APL with k = 0.1, θ1 = 50 and θ2 = 10, using the probability transform method, that is, setting

F(x; θ1, θ2) = 1 / {1 + [1 − k((x − θ1)/θ2)]^{1/k}} = u ,
a uniform (0, 1) random variable. The ordered sample obtained was:

11.98487, 21.04888, 26.85949, 30.21810, 33.59806, 35.34005, 40.67467, 45.92994, 47.88610, 48.10389, 49.12268, 54.51988, 55.83412, 59.00270, 59.64955, 62.58630, 63.72846, 63.88126, 69.40010, 72.72817 .

A calculation of the correlation between the ordered sample values Y_{i:n} and the quantiles F^{−1}[i/(20 + 1)], i = 1, 2, ..., 20, was made assuming k = 0.1 and found to be 0.99255. Similar calculations assuming k = 0.05, k = 0.15, k = 0.20 and k = 0.25 were obtained, giving the results 0.98924, 0.99363, 0.99255, and 0.98940 respectively, indicating that perhaps any value of k between 0.1 and 0.2 describes the data well. Figs. 7.1 and 7.2 show quantile plots for k = 0.1 and k = 0.2. Using 5000 samples each time, p-values for the significance of correlations between Y_{i:n} and F^{−1}[i/(20 + 1)] were calculated to test the values of k = 0.1, k = 0.2, k = 0.3 and k = 0.4 and were found to be 0.9648, 0.9686, 0.7638 and 0.4894, respectively. None of these were significant, which was to be expected, considering the nature of the test. However, the p-values for k = 0.1 and k = 0.2 were much higher than the others, confirming the region of best choice for k. The BLUE's obtained using the full sample were as follows (SE = standard error, obtained using Table 5.2):
Using k = 0.1:   θ1* = 49.54979, SE(θ1*) = 3.72811;   θ2* = 9.55430, SE(θ2*) = 1.84375,

and using k = 0.2:   θ1* = 50.33276, SE(θ1*) = 3.78815;   θ2* = 9.67733, SE(θ2*) = 1.92114.

In both cases, the estimates of θ1 and θ2 obtained are very close to the true values. Next, the BLUE's were calculated for right-censored samples using the coefficients given in Table 5.3, with k = 0.1. The results were as follows (r = number observed):

for r = 5:    θ1* = 43.84540, SE(θ1*) = 7.20936;   θ2* = 7.74302,  SE(θ2*) = 4.02812
for r = 10:   θ1* = 49.16279, SE(θ1*) = 4.65610;   θ2* = 10.21530, SE(θ2*) = 3.29010
[Figure 7.1 appears here: scatter of Y_{i:n} against F^{−1}[i/(n + 1)].]

Fig. 7.1 Quantile plot of generated values, Y_{i:n}, from generalized logistic distribution with quantiles from generalized logistic distribution (k = 0.1).
for r = 15:   θ1* = 49.90012, SE(θ1*) = 4.24223;   θ2* = 10.75206, SE(θ2*) = 2.58488
for r = 20:   θ1* = 49.54979, SE(θ1*) = 3.72811;   θ2* = 9.55430,  SE(θ2*) = 1.84375

As expected, for small values of r, the standard errors increase drastically. The r = 20 line is, of course, the full sample, re-displayed for comparison. Again, the estimates are all close to the true values of θ1 and θ2.
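The generation scheme described at the beginning of this section can be sketched as follows (APL is replaced by Python here; the seed is illustrative):

```python
import random

# Probability (inverse) transform: solve F(x; theta1, theta2) = u for x.
def gl_quantile(u, theta1, theta2, k):
    return theta1 + theta2 * (1 - ((1 - u) / u) ** k) / k

def gl_cdf(x, theta1, theta2, k):
    return 1 / (1 + (1 - k * (x - theta1) / theta2) ** (1 / k))

rng = random.Random(1)
sample = sorted(gl_quantile(rng.random(), 50.0, 10.0, 0.1) for _ in range(20))
```

For k > 0 every generated value stays below the upper support point θ1 + θ2/k, and u = 0.5 maps exactly to θ1.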
[Figure 7.2 appears here: scatter of Y_{i:n} against F^{−1}[i/(n + 1)].]

Fig. 7.2 Quantile plot of generated values, Y_{i:n}, from generalized logistic distribution with quantiles from generalized logistic distribution (k = 0.2).
Using Maple V Release 3, maximum likelihood estimates were also obtained for the full sample as follows. Using k = 0.1, θ̂1 = 49.44142 and θ̂2 = 9.30316; using k = 0.2, θ̂1 = 50.18144 and θ̂2 = 9.28734.
A two-parameter maximum likelihood analysis was conducted for k = 0.1 and r = 5(5)20. The results are as follows:

for r = 5:    θ̂1 = 41.02149,   θ̂2 = 6.38566
for r = 10:   θ̂1 = 48.25653,   θ̂2 = 9.41583
for r = 15:   θ̂1 = 49.58711,   θ̂2 = 10.21531
for r = 20:   θ̂1 = 49.44142,   θ̂2 = 9.30316
In this case, we had to ensure that these were the best possible estimates in the permissible region with respect to maximization of the likelihood function. In fact, they turned out to be the only permissible estimates. One may also obtain approximate standard errors of these estimates. However, due to the non-regularity of the two-parameter distribution, this should be done via a simulation study, rather than by the method of approximation of the information matrix. Thus, 1000 random samples of size 20 from the generalized logistic distribution with shape parameter k = 0.1 were generated, and the biases, variances, and covariance of the maximum likelihood estimates were obtained as follows:

r    bias(θ̂1)/θ2   bias(θ̂2)/θ2   Var(θ̂1)/θ2²   Var(θ̂2)/θ2²   Cov(θ̂1,θ̂2)/θ2²
5    -0.383334      -0.174934      0.634252       0.185203       0.218841
10   -0.093311      -0.066467      0.193798       0.088359       0.028468
15   -0.037775      -0.040353      0.148690       0.054835       -0.008588
20   -0.020462      -0.021671      0.144961       0.035865       -0.015321
For comparison purposes, 1000 random samples of size 20 from the generalized logistic distribution with shape parameter k = 0.2 were generated, and the biases, variances, and covariance of the maximum likelihood estimates for the full sample were obtained as follows:

r    bias(θ̂1)/θ2   bias(θ̂2)/θ2   Var(θ̂1)/θ2²   Var(θ̂2)/θ2²   Cov(θ̂1,θ̂2)/θ2²
20   -0.024896      -0.034524      0.159742       0.034619       -0.035709
In obtaining these values, we ensured that each estimate from the simulation was a permissible one. In correcting for the bias of the maximum likelihood estimates, since these are only simulated and not exact values, we must first determine whether or not the calculated bias is significantly different from zero. Since the estimated bias is just an average of 1000 maximum likelihood estimates, a simple z-test will suffice. Thus, to test H0: (estimated bias of θ̂i)/θ2 = 0 for i = 1, 2, the following p-values are obtained:

For k = 0.1:
r:       5      10     15     20
i = 1:   0.000  0.000  0.002  0.089
i = 2:   0.000  0.000  0.000  0.000
and for k = 0.2 (r = 20 only): i = 1: 0.049; i = 2: 0.000.
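The z-test just described is elementary: the estimated bias is a mean of N simulated estimates, so its standardized value is compared with the standard normal distribution. A minimal sketch (the numeric inputs in the test are illustrative stand-ins, not the simulation output of this section):

```python
import math

# Two-sided p-value for H0: true bias = 0, given the mean and standard
# deviation of N simulated estimates.
def bias_z_pvalue(mean_bias, sd, n_sim):
    z = mean_bias / (sd / math.sqrt(n_sim))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal cdf
    return 2 * (1 - phi)
```

With N = 1000 even small biases become detectable, which is why most of the tabulated p-values are essentially zero.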
All of these values are at least somewhat significant, even if dependence between the estimates of θ1 and θ2 is considered, and Type-I errors are summed using a Bonferroni argument. As such, we have corrected for the bias of all maximum likelihood estimates of the original simulated sample. The resulting estimates, corrected for bias, and their standard errors are given by the following. For k = 0.1:

r    θ1*       SE(θ1*)   θ2*       SE(θ2*)
5    43.9883   7.2504    7.7396    4.0369
10   49.1977   4.5149    10.0862   3.2116
15   49.9892   4.0965    10.6449   2.5975
20   49.6360   3.6127    9.5092    1.8408
and for k = 0.2:

r    θ1*       SE(θ1*)   θ2*       SE(θ2*)
20   50.4209   3.8227    9.6194    1.8538
All of these values compare very well with the best linear unbiased estimates presented earlier. Using these unbiased estimates of θ2, we may also now obtain standard errors and an estimate of the covariance of the original (not corrected for bias) maximum likelihood estimates.
For k = 0.1:

r    θ̂1        SE(θ̂1)   θ̂2        SE(θ̂2)   Cov(θ̂1,θ̂2)
5    41.02149   6.1638    6.38566    3.3308    13.1089
10   48.25653   4.4402    9.41583    2.9981    2.8961
15   49.58711   4.1047    10.21531   2.4927    -0.9731
20   49.44142   3.6205    9.30316    1.8009    -1.3854
and for k = 0.2:

r    θ̂1        SE(θ̂1)   θ̂2       SE(θ̂2)   Cov(θ̂1,θ̂2)
20   50.18144   3.8447    9.28734   1.7898    -3.3043
Incidentally, the values which are obtained when using the method of approximating the information matrix, that is, substituting the original maximum likelihood estimates (not corrected for bias) into the matrix of negatives of second derivatives of the log-likelihood and inverting, agree quite well with these variances and covariance for larger values of r, and are given by the following. For k = 0.1:

r    √I⁻¹(1,1)   √I⁻¹(2,2)   I⁻¹(1,2)
5    5.04440     2.90077     10.04024
10   4.14645     2.77845     2.07025
15   4.03852     2.30928     -1.29874
20   3.71267     1.76191     -1.80124
and for k = 0.2:

r    √I⁻¹(1,1)   √I⁻¹(2,2)   I⁻¹(1,2)
20   3.70088     1.80608     -3.14516
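The information-matrix approximation used above (negate the Hessian of the log-likelihood at the MLE and invert) can be sketched generically with finite differences. To keep the example independently verifiable it is applied here to a normal log-likelihood, whose second derivatives are known in closed form; the data values are arbitrary.

```python
import math
import numpy as np

# Numerical negative Hessian of a log-likelihood at a parameter point.
def neg_hessian(loglik, theta, h=1e-4):
    p = len(theta)
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            t = [list(theta) for _ in range(4)]
            t[0][i] += h; t[0][j] += h
            t[1][i] += h; t[1][j] -= h
            t[2][i] -= h; t[2][j] += h
            t[3][i] -= h; t[3][j] -= h
            H[i, j] = -(loglik(t[0]) - loglik(t[1])
                        - loglik(t[2]) + loglik(t[3])) / (4 * h * h)
    return H

# normal example: d^2(-ln L)/d mu^2 = n / sigma^2 exactly
data = [0.3, -1.2, 0.8, 2.1, -0.5, 1.4, 0.0, -0.9, 0.6, 1.1]

def ll_normal(theta):
    m, s = theta
    return sum(-math.log(s) - (x - m) ** 2 / (2 * s * s) for x in data)

H = neg_hessian(ll_normal, [0.37, 1.0])       # 0.37 is the sample mean
cov_approx = np.linalg.inv(H)                  # approximate covariance of the MLEs
```

For the generalized logistic, one would pass the censored log-likelihood of Section 6 in place of `ll_normal`; the non-regularity noted earlier is exactly why the authors prefer the simulation-based standard errors for small r.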
Next, we consider a three-parameter analysis. If the shape parameter k is assumed to be unknown, best linear unbiased estimates for the three parameters cannot be obtained. However, maximum likelihood estimation is still possible. The maximum likelihood estimates for r = 10, 15 and 20 were as follows:

for r = 10:   θ̂1 = 49.06638,   θ̂2 = 14.03007,   k̂ = -0.23045
for r = 15:   θ̂1 = 48.82512,   θ̂2 = 11.13237,   k̂ = -0.05129
for r = 20:   θ̂1 = 50.51735,   θ̂2 = 9.40326,    k̂ = 0.24861
In light of the close agreement between simulated results and the approximated information matrix for the two-parameter case, the approximate information matrix for this three-parameter case was also obtained, with the following results (3rd parameter is k):

r    I⁻¹(1,1)   I⁻¹(2,2)   I⁻¹(3,3)   I⁻¹(1,2)   I⁻¹(1,3)   I⁻¹(2,3)
10   40.08580   66.89431   0.13555    26.90281   -0.69739   -2.74171
15   21.10314   10.44796   0.07841    -1.45586   0.37938    -0.62508
20   15.60632   3.81140    0.03745    -3.01869   0.25180    0.12295
A simulation study was also conducted for the full sample case. 500 samples of size 20 from the generalized logistic distribution with shape parameter k = 0.1 were simulated, and the following values were obtained:
bias(θ̂1)/θ2 = -0.032107,      bias(θ̂2)/θ2 = -0.031482,      bias(k̂) = -0.005367
Var(θ̂1)/θ2² = 0.169703,       Var(θ̂2)/θ2² = 0.037129,       Var(k̂) = 0.023955
Cov(θ̂1,θ̂2)/θ2² = -0.015203,   Cov(θ̂1,k̂)/θ2 = 0.011412,     Cov(θ̂2,k̂)/θ2 = 0.004379
For comparison purposes, 500 samples of size 20 from the generalized logistic distribution with shape parameter k = 0.2 were simulated, and the following values for the full sample case were obtained:

bias(θ̂1)/θ2 = -0.026821,      bias(θ̂2)/θ2 = -0.050732,      bias(k̂) = -0.025533
Var(θ̂1)/θ2² = 0.146539,       Var(θ̂2)/θ2² = 0.043898,       Var(k̂) = 0.022385
Cov(θ̂1,θ̂2)/θ2² = -0.023482,   Cov(θ̂1,k̂)/θ2 = 0.006827,     Cov(θ̂2,k̂)/θ2 = 0.012748
Again, in this case, in order to correct for bias, we first tested whether the bias was significantly different from zero. The p-values obtained for the test H0: bias(parameter i) = 0, i = 1, 2, 3, where parameter 1 = θ1, parameter 2 = θ2, parameter 3 = k, were as follows:

For k = 0.1:        i:      1      2      3
                    p-val:  0.081  0.000  0.438
and for k = 0.2:    i:      1      2      3
                    p-val:  0.117  0.000  0.000
We have chosen here to correct estimates of θ1 and θ2 for bias in the simulation with k = 0.1, and to correct estimates of θ2 and k for bias in the simulation with k = 0.2. Thus, the unbiased estimates and their standard errors are given in the following table (k_sim = value of k used for the corresponding simulation):

k_sim   θ1*       SE(θ1*)   θ2*      SE(θ2*)   k*       SE(k*)
0.1     50.8291   3.9882    9.7089   1.9316    0.2486   0.1548
0.2     50.5174   3.7920    9.9058   2.1864    0.2741   0.1496
Finally, we are able to approximate the variances and covariances of the original maximum likelihood estimators using the unbiased estimate for θ2:

k_sim   Var(θ̂1)   Var(θ̂2)   Var(k̂)   Cov(θ̂1,θ̂2)   Cov(θ̂1,k̂)   Cov(θ̂2,k̂)
0.1     15.9967    3.4999     0.0240    -1.4331       0.1108       0.0425
0.2     14.3791    4.3075     0.0224    -2.3042       0.0676       0.1263
For the most part, these agree well with the values given above for the approximate information matrix (r = 20). Although best linear unbiased estimates cannot be obtained for the three-parameter analysis, the method of moment estimation, where expected and observed sample mean, variance and skewness are equated, is possible in this case when the full sample is observed. The estimates obtained using the method of moments for the full sample were as follows:

for r = 20:   θ̃1 = 48.38003,   θ̃2 = 9.16606,   k̃ = 0.05126 .
A simulation was conducted, and 1000 samples of size 20 with shape parameter k = 0.1 were generated. Again, for comparison purposes, 1000 samples of size 20 with shape parameter k = 0.2 were also generated. The approximate biases, variances and covariances of the moment estimators were obtained as follows:

k_sim   bias(θ̃1)/θ2   bias(θ̃2)/θ2   bias(k̃)    Var(θ̃1)/θ2²   Var(θ̃2)/θ2²   Var(k̃)
0.1     -0.097042      -0.023411      -0.056100   0.148594       0.042713       0.004154
0.2     -0.188451      0.045116       -0.113374   0.190247       0.079860       0.003862

k_sim   Cov(θ̃1,θ̃2)/θ2²   Cov(θ̃1,k̃)/θ2   Cov(θ̃2,k̃)/θ2
0.1     -0.017226          0.001375         0.004774
0.2     -0.052578          0.000397         0.009752
Once again, in order to correct for bias, we first tested whether the biases were significantly different from zero. The p-values obtained for the test H0: bias(parameter i) = 0, i = 1, 2, 3, where parameter 1 = θ1, parameter 2 = θ2, parameter 3 = k, were as follows:

For k = 0.1:        i:      1      2      3
                    p-val:  0.000  0.000  0.000
and for k = 0.2:    i:      1      2      3
                    p-val:  0.000  0.000  0.000

All of these values are highly significant, and as such, all three parameters may be corrected for bias:
k_sim   θ1*       SE(θ1*)   θ2*       SE(θ2*)   k*       SE(k*)
0.1     49.2908   3.5813    9.38579   1.9863    0.1074   0.0645
0.2     50.0328   3.6572    8.77038   2.3715    0.1646   0.0621
For the most part, these estimates are comparable to those obtained using the maximum likelihood method. Therefore, due to the comparable ease of computation for the method of moments, one may prefer to use this method for obtaining estimates of parameters in the case when all three parameters of the generalized logistic distribution are unknown.
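The moment-matching just described can be sketched as follows, using the standard-form moments E[((1 − U)/U)^{jk}] = Γ(1 + jk)Γ(1 − jk). The bisection bracket presumes that skewness is monotone decreasing in k, which holds on the positive range used here; a negative-k sample would need the mirrored bracket.

```python
import math

# mean, variance and skewness of the standard generalized logistic
def std_moments(k):
    g = lambda j: math.gamma(1 + j * k) * math.gamma(1 - j * k)  # needs |jk| < 1
    m1 = (1 - g(1)) / k
    m2 = (1 - 2 * g(1) + g(2)) / k ** 2
    m3 = (1 - 3 * g(1) + 3 * g(2) - g(3)) / k ** 3
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    return m1, var, mu3 / var ** 1.5

# method-of-moments estimates from sample mean, variance and skewness
def mom_estimates(xbar, s2, b1, lo=1e-4, hi=0.3):
    f = lambda k: std_moments(k)[2] - b1
    for _ in range(100):                 # bisection on the skewness equation
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    k = (lo + hi) / 2
    m1, var, _ = std_moments(k)
    theta2 = math.sqrt(s2 / var)         # match the variance
    theta1 = xbar - theta2 * m1          # match the mean
    return theta1, theta2, k
```

Since skewness is location- and scale-free, k is determined first, after which θ2 and θ1 follow in closed form, which is the computational simplicity the text alludes to.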
II. Doubly truncated generalized logistic distribution

8. Introduction
Let X1, X2, ..., Xn be a random sample of size n from the doubly truncated generalized logistic population with cumulative distribution function

F(x) = [1/(P − Q)] { 1/(1 + (1 − kx)^{1/k}) − Q } ,
         Q1 ≤ x ≤ P1 < 1/k when k > 0 ,   1/k < Q1 ≤ x ≤ P1 when k < 0 ,
F(x) = [1/(P − Q)] { 1/(1 + e^{−x}) − Q } ,   Q1 ≤ x ≤ P1 when k = 0 ,   (8.1)

and with probability density function

f(x) = [1/(P − Q)] (1 − kx)^{1/k − 1} / [1 + (1 − kx)^{1/k}]² ,
         Q1 ≤ x ≤ P1 < 1/k when k > 0 ,   1/k < Q1 ≤ x ≤ P1 when k < 0 ,
f(x) = [1/(P − Q)] e^{−x} / (1 + e^{−x})² ,   Q1 ≤ x ≤ P1 when k = 0 ,   (8.2)

where 1 − P is the proportion of right truncation and Q is the proportion of left truncation of the generalized logistic distribution with cumulative distribution function as given by (1.1):

G(x) = 1/(1 + (1 − kx)^{1/k}) ,   −∞ < x < 1/k when k > 0 ,   1/k < x < ∞ when k < 0 ,
G(x) = 1/(1 + e^{−x}) ,   −∞ < x < ∞ when k = 0 .
Thus, P = G(PI) and Q = G(QI). Let Xl:n < X2:n _<.-- _<,l;n:, denote the order statistics obtained by arranging the n ~.'s in increasing order of magnitude. Let us use #~f) to denote the single moments E(X/:,) for 1 < r < n and i > 1, and #~,s:, to denote the product moments E(X~:,Xs:n) for 1 _< r < s _< n. Let us further denote Var(Xr:n) by a~.... and Cov(Xr:,,Xs:,) by a ...... For simplicity, we shall also use #~:, for #~,I. In Section II of this paper, we establish several recurrence relations satisfied by the single moments #~i)~and the product moments #r,s:, from the doubly truncated generalized logistic distribution. These relations will enable one to compute all the single and product moments of order statistics for all sample sizes in a simple recursive manner. If we let the shape parameter k -+ 0, the recurrence relations reduce to the corresponding results for the doubly truncated logistic distribution established by Balakrishnan and Kocherlakota (1986). By starting with the values F~(X) :/21:1, E(X 2) : ~(2) 1:1 and /~1,2:2=/~2:1, one can determine the means, variances and covariances of all order statistics for all sample sizes through this recursive computational procedure. Work of this nature has been carried out by Joshi (1978, 1979, 1982) for truncated exponential distributions and by Balakrishnan and Malik (1987) and Ali and Khan (1987) for truncated log-logistic distributions. From (8.1) and (8.2), we observe that the characterizing differential equation for the doubly truncated generalized logistic population is (1 - l~c)f(x) = g ( x ) - (P - Q)F2(x)
(8.3)
$$= (1 - P + Q)\,F(x) + (P - Q)\,F(x)\,[1 - F(x)]\,.$$
(8.4)
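Equations (8.3) and (8.4) are two factorings of the same right-hand side, since (1 - P + Q)F + (P - Q)F(1 - F) = F - (P - Q)F² for every F. A minimal numeric check of this identity (the values of P and Q are illustrative):

```python
# The algebraic identity behind (8.3)-(8.4):
#   (1 - P + Q)F + (P - Q)F(1 - F) = F - (P - Q)F^2   for all F, P, Q.
def rhs_83(F, P, Q):
    return F - (P - Q) * F * F

def rhs_84(F, P, Q):
    return (1 - P + Q) * F + (P - Q) * F * (1 - F)

P, Q = 0.9, 0.05            # illustrative truncation proportions
for i in range(101):
    F = i / 100.0           # sweep F over [0, 1]
    assert abs(rhs_83(F, P, Q) - rhs_84(F, P, Q)) < 1e-12
```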
As Balakrishnan and Kocherlakota (1986) exploited these equations for the standard doubly truncated logistic distribution (the case k = 0) in order to derive several recurrence relations for the single and the product moments of order statistics, we shall use (8.3) and (8.4) in the following sections to establish similar results for the doubly truncated generalized logistic distribution in (8.1). These generalized recurrence relations are then shown to be complete in the sense that they will enable one to compute all the single and product moments of order statistics for all sample sizes in a simple recursive manner.

9. Recurrence relations for single moments
The density function of X_{r:n} (1 ≤ r ≤ n) is given by [David (1981, p. 9), Arnold, Balakrishnan and Nagaraja (1992, p. 10)]

$$f_{r:n}(x) = \frac{n!}{(r-1)!\,(n-r)!}\,[F(x)]^{r-1}\,[1-F(x)]^{n-r}\,f(x)\,, \qquad Q_1 \le x \le P_1\,. \qquad (9.1)$$

In this section, we establish some recurrence relations for the single moments of order statistics from the doubly truncated generalized logistic distribution.
N. Balakrishnan and R. Aggarwala
THEOREM 9.1. For 1 ≤ r ≤ n - 1, n ≥ 2 and i = 0, 1, 2, ...,

$$\mu^{(i+1)}_{r+1:n+1} = \mu^{(i+1)}_{r:n+1} + \frac{n+1}{r(n-r+1)(P-Q)}\Big[(i+1)\big(\mu^{(i)}_{r:n} - k\mu^{(i+1)}_{r:n}\big) + r(1-P+Q)\big(\mu^{(i+1)}_{r:n} - \mu^{(i+1)}_{r+1:n}\big)\Big]\,, \qquad (9.2)$$

and for n ≥ 1 and i = 0, 1, 2, ...,

$$\mu^{(i+1)}_{n+1:n+1} = \mu^{(i+1)}_{n:n+1} + \frac{n+1}{n(P-Q)}\Big[(i+1)\big(\mu^{(i)}_{n:n} - k\mu^{(i+1)}_{n:n}\big) + n(1-P+Q)\big(\mu^{(i+1)}_{n:n} - P_1^{\,i+1}\big)\Big]\,. \qquad (9.3)$$
PROOF. For 1 ≤ r ≤ n - 1 and i = 0, 1, 2, ..., let us consider from (9.1) and (8.4)

$$\mu^{(i)}_{r:n} - k\mu^{(i+1)}_{r:n} = \frac{n!\,(1-P+Q)}{(r-1)!\,(n-r)!}\int_{Q_1}^{P_1} x^i\,[F(x)]^{r}\,[1-F(x)]^{n-r}\,dx + \frac{n!\,(P-Q)}{(r-1)!\,(n-r)!}\int_{Q_1}^{P_1} x^i\,[F(x)]^{r}\,[1-F(x)]^{n-r+1}\,dx\,.$$

Integrating now by parts treating x^i for integration and the rest of the integrands for differentiation, we obtain

$$\mu^{(i)}_{r:n} - k\mu^{(i+1)}_{r:n} = \frac{r(1-P+Q)}{i+1}\big[\mu^{(i+1)}_{r+1:n} - \mu^{(i+1)}_{r:n}\big] + \frac{r(P-Q)(n-r+1)}{(i+1)(n+1)}\big[\mu^{(i+1)}_{r+1:n+1} - \mu^{(i+1)}_{r:n+1}\big]\,. \qquad (9.4)$$

The relation in (9.2) follows simply by rewriting (9.4). Relation (9.3) is obtained by setting r = n in the above proof and simplifying. Q.E.D.

THEOREM 9.2.
For n ≥ 2 and i = 0, 1, 2, ...,

$$\mu^{(i+1)}_{1:n+1} = \frac{1}{P-Q}\Big[(1-P+Q)\,\mu^{(i+1)}_{1:n-1} - (1-2P+2Q)\,\mu^{(i+1)}_{1:n} - \frac{i+1}{n}\big(\mu^{(i)}_{1:n} - k\mu^{(i+1)}_{1:n}\big)\Big]\,, \qquad (9.5)$$

and for i = 0, 1, 2, ...,

$$\mu^{(i+1)}_{1:2} = \frac{1}{P-Q}\Big[(1-P+Q)\,P_1^{\,i+1} - (1-2P+2Q)\,\mu^{(i+1)}_{1:1} - (i+1)\big(\mu^{(i)}_{1:1} - k\mu^{(i+1)}_{1:1}\big)\Big]\,. \qquad (9.6)$$
PROOF. For n ≥ 1 and i = 0, 1, 2, ..., let us consider from (9.1) and (8.4)

$$\mu^{(i)}_{1:n} - k\mu^{(i+1)}_{1:n} = n\int_{Q_1}^{P_1} x^i\,\big\{(1-P+Q)F(x) + (P-Q)F(x)[1-F(x)]\big\}\,[1-F(x)]^{n-1}\,dx\,.$$

Integrating by parts treating x^i for integration and the rest of the integrands for differentiation, and then writing F(x) as 1 - [1 - F(x)], we obtain

$$\mu^{(i)}_{1:n} - k\mu^{(i+1)}_{1:n} = \frac{n}{i+1}\Big\{(1-P+Q)\big[\mu^{(i+1)}_{1:n-1} - \mu^{(i+1)}_{1:n}\big] + (P-Q)\big[\mu^{(i+1)}_{1:n} - \mu^{(i+1)}_{1:n+1}\big]\Big\}\,. \qquad (9.7)$$

The relation in (9.5) follows simply by rewriting (9.7). Relation (9.6) is obtained by setting n = 1 in the above proof and simplifying. Q.E.D.

REMARK 9.1. Letting the shape parameter k → 0 in (9.2)-(9.6), we deduce the recurrence relations established by Balakrishnan and Kocherlakota (1986) for the single moments of order statistics from the doubly truncated logistic distribution. Furthermore, letting P → 1 and Q → 0, we deduce the recurrence relations for the generalized logistic distribution, established in Section I.

REMARK 9.2. The recurrence relations established in (9.2)-(9.6) will enable one to compute all the single moments of all order statistics for all sample sizes in a simple recursive manner. This is explained in detail in Section 11.

10. Recurrence relations for product moments
The joint density function of X_{i:n} and X_{j:n} (1 ≤ i < j ≤ n) is given by [David (1981, p. 10), Arnold, Balakrishnan and Nagaraja (1992, p. 16)]

$$f_{i,j:n}(x,y) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,[F(x)]^{i-1}\,[F(y)-F(x)]^{j-i-1}\,[1-F(y)]^{n-j}\,f(x)\,f(y)\,, \qquad Q_1 \le x < y \le P_1\,. \qquad (10.1)$$
In this section, we establish several recurrence relations for the product moments of order statistics from the doubly truncated generalized logistic distribution.

THEOREM 10.1. For 1 ≤ i < j ≤ n and j - i ≥ 2,

$$\mu_{i+1,j+1:n+1} = \mu_{i+2,j+1:n+1} + \frac{n+1}{(i+1)(P-Q)}\Big[\mu_{i,j:n} - \mu_{i+1,j:n} + \frac{1}{i}\big(\mu_{j:n} - k\mu_{i,j:n}\big)\Big] \qquad (10.2)$$
and for 1 ≤ i ≤ n - 1,

$$\mu_{i+1,i+2:n+1} = \mu^{(2)}_{i+2:n+1} + \frac{n+1}{(i+1)(P-Q)}\Big[\mu_{i,i+1:n} - \mu^{(2)}_{i+1:n} + \frac{1}{i}\big(\mu_{i+1:n} - k\mu_{i,i+1:n}\big)\Big]\,. \qquad (10.3)$$
PROOF. For 1 ≤ i < j ≤ n and j - i ≥ 2, let us consider from (10.1) and (8.3)

$$\mu_{j:n} - k\mu_{i,j:n} = E\big(X_{j:n} - kX_{i:n}X_{j:n}\big) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\int_{Q_1}^{P_1} y\,[1-F(y)]^{n-j}\,f(y)\,I_1(y)\,dy\,, \qquad (10.4)$$

where

$$I_1(y) = \int_{Q_1}^{y}[F(x)]^{i}\,[F(y)-F(x)]^{j-i-1}\,dx - (P-Q)\int_{Q_1}^{y}[F(x)]^{i+1}\,[F(y)-F(x)]^{j-i-1}\,dx\,.$$

Integrating by parts treating dx for integration and the rest of the integrands for differentiation, we obtain

$$\begin{aligned} I_1(y) = {} & \int_{Q_1}^{y} x(j-i-1)\,[F(x)]^{i}\,[F(y)-F(x)]^{j-i-2}f(x)\,dx - \int_{Q_1}^{y} x\,i\,[F(x)]^{i-1}\,[F(y)-F(x)]^{j-i-1}f(x)\,dx \\ & - (P-Q)\int_{Q_1}^{y} x(j-i-1)\,[F(x)]^{i+1}\,[F(y)-F(x)]^{j-i-2}f(x)\,dx \\ & + (P-Q)\int_{Q_1}^{y} x(i+1)\,[F(x)]^{i}\,[F(y)-F(x)]^{j-i-1}f(x)\,dx\,. \end{aligned}$$

Upon substituting this expression of I_1(y) into (10.4) and simplifying the resulting expression, we obtain

$$\mu_{j:n} - k\mu_{i,j:n} = i\,\big[\mu_{i+1,j:n} - \mu_{i,j:n}\big] + \frac{i(i+1)(P-Q)}{n+1}\,\big[\mu_{i+1,j+1:n+1} - \mu_{i+2,j+1:n+1}\big]\,.$$

The recurrence relation in (10.2) is derived simply by rewriting the above equation. Relation (10.3) is obtained by setting j = i + 1 in the above proof and simplifying. Q.E.D.
THEOREM 10.2. For 1 ≤ i < j ≤ n - 1 and j - i ≥ 2,
$$\mu_{i,j:n+1} = \mu_{i,j-1:n+1} + \frac{n+1}{(n-j+2)(P-Q)}\Big[(1-2P+2Q)\big(\mu_{i,j-1:n} - \mu_{i,j:n}\big) + \frac{n(1-P+Q)}{n-j+1}\big(\mu_{i,j:n-1} - \mu_{i,j-1:n-1}\big) - \frac{1}{n-j+1}\big(\mu_{i:n} - k\mu_{i,j:n}\big)\Big]\,, \qquad (10.5)$$

and for 1 ≤ i ≤ n - 2,

$$\mu_{i,n:n+1} = \mu_{i,n-1:n+1} + \frac{n+1}{2(P-Q)}\Big[(1-2P+2Q)\big(\mu_{i,n-1:n} - \mu_{i,n:n}\big) - n(1-P+Q)\big(\mu_{i,n-1:n-1} - P_1\mu_{i:n-1}\big) - \big(\mu_{i:n} - k\mu_{i,n:n}\big)\Big]\,. \qquad (10.6)$$
PROOF. For 1 ≤ i < j ≤ n - 1 and j - i ≥ 2, let us consider from (10.1) and (8.4)

$$\mu_{i:n} - k\mu_{i,j:n} = E\big(X_{i:n} - kX_{i:n}X_{j:n}\big) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\int_{Q_1}^{P_1} x\,[F(x)]^{i-1}\,f(x)\,J_1(x)\,dx\,, \qquad (10.7)$$

where

$$J_1(x) = (1-P+Q)\int_{x}^{P_1}[F(y)-F(x)]^{j-i-1}\,[1-F(y)]^{n-j}\,F(y)\,dy + (P-Q)\int_{x}^{P_1}[F(y)-F(x)]^{j-i-1}\,[1-F(y)]^{n-j+1}\,F(y)\,dy\,.$$

Writing F(y) as 1 - [1 - F(y)] in the above integrals and then integrating by parts treating dy for integration and the rest of the integrands for differentiation, we obtain an expression for J_1(x) which, when substituted into (10.7) and simplified, gives

$$\begin{aligned} \mu_{i:n} - k\mu_{i,j:n} = {} & (1-P+Q)\big\{n\big[\mu_{i,j:n-1} - \mu_{i,j-1:n-1}\big] + (n-j+1)\big[\mu_{i,j-1:n} - \mu_{i,j:n}\big]\big\} \\ & + (P-Q)\Big\{(n-j+1)\big[\mu_{i,j:n} - \mu_{i,j-1:n}\big] + \frac{(n-j+1)(n-j+2)}{n+1}\big[\mu_{i,j-1:n+1} - \mu_{i,j:n+1}\big]\Big\}\,. \end{aligned}$$
The recurrence relation in (10.5) is derived simply by rewriting the above equation. Relation (10.6) is obtained by setting j = n in the above proof and simplifying. Q.E.D.
THEOREM 10.3. For 1 ≤ i ≤ n - 2,

$$\mu_{i,i+1:n+1} = \frac{n+1}{(n-i)(n-i+1)(P-Q)}\Big[(1-P+Q)\big(n\mu_{i,i+1:n-1} - i\mu^{(2)}_{i+1:n}\big) - (n-i)(1-2P+2Q)\,\mu_{i,i+1:n} - \big(\mu_{i:n} - k\mu_{i,i+1:n}\big)\Big] - \frac{i}{n-i+1}\,\mu^{(2)}_{i+1:n+1}\,, \qquad (10.8)$$

and for n ≥ 2,

$$\mu_{n-1,n:n+1} = \frac{n+1}{2(P-Q)}\Big[(1-P+Q)\big(nP_1\mu_{n-1:n-1} - (n-1)\mu^{(2)}_{n:n}\big) - (1-2P+2Q)\,\mu_{n-1,n:n} - \big(\mu_{n-1:n} - k\mu_{n-1,n:n}\big)\Big] - \frac{n-1}{2}\,\mu^{(2)}_{n:n+1}\,. \qquad (10.9)$$
PROOF. For 1 ≤ i ≤ n - 2, let us consider from (10.1) and (8.4)

$$\mu_{i:n} - k\mu_{i,i+1:n} = E\big(X_{i:n} - kX_{i:n}X_{i+1:n}\big) = \frac{n!}{(i-1)!\,(n-i-1)!}\int_{Q_1}^{P_1} x\,[F(x)]^{i-1}\,f(x)\,J_2(x)\,dx\,, \qquad (10.10)$$

where

$$J_2(x) = (1-P+Q)\int_{x}^{P_1}[1-F(y)]^{n-i-1}\,F(y)\,dy + (P-Q)\int_{x}^{P_1}[1-F(y)]^{n-i}\,F(y)\,dy\,.$$

Writing F(y) as 1 - [1 - F(y)] in the above integrals and then integrating by parts treating dy for integration and the rest of the integrands for differentiation, we obtain an expression for J_2(x) which, when substituted into (10.10) and simplified by combining integrands containing (1 - P + Q)x² and then combining integrands containing (P - Q)x², gives

$$\mu_{i:n} - k\mu_{i,i+1:n} = (1-P+Q)\big[-i\mu^{(2)}_{i+1:n} + n\mu_{i,i+1:n-1} - (n-i)\mu_{i,i+1:n}\big] + (P-Q)(n-i)\Big[-\frac{i}{n+1}\,\mu^{(2)}_{i+1:n+1} + \mu_{i,i+1:n} - \frac{n-i+1}{n+1}\,\mu_{i,i+1:n+1}\Big]\,.$$
The recurrence relation in (10.8) is derived simply by rewriting the above equation. Relation (10.9) is derived by setting i = n - 1 in the above proof and simplifying. Q.E.D.

THEOREM 10.4. For 1 ≤ i < j ≤ n - 1 and j - i ≥ 2,

$$\mu_{i,j+1:n+1} = \mu_{i,j:n+1} + \frac{i}{j-i}\big(\mu_{i+1,j:n+1} - \mu_{i+1,j+1:n+1}\big) + \frac{n+1}{(j-i)(n-j+1)(P-Q)}\Big\{(1-P+Q)\big[i\big(\mu_{i+1,j:n} - \mu_{i+1,j+1:n}\big) + (j-i)\big(\mu_{i,j:n} - \mu_{i,j+1:n}\big)\big] + \big(\mu_{i:n} - k\mu_{i,j:n}\big)\Big\} \qquad (10.11)$$
and for 1 ≤ i ≤ n - 2,

$$\mu_{i,i+2:n+1} = \mu_{i,i+1:n+1} + i\big(\mu^{(2)}_{i+1:n+1} - \mu_{i+1,i+2:n+1}\big) + \frac{n+1}{(n-i)(P-Q)}\Big\{(1-P+Q)\big[i\big(\mu^{(2)}_{i+1:n} - \mu_{i+1,i+2:n}\big) + \big(\mu_{i,i+1:n} - \mu_{i,i+2:n}\big)\big] + \big(\mu_{i:n} - k\mu_{i,i+1:n}\big)\Big\}\,. \qquad (10.12)$$
PROOF. Following the proof of Theorem 10.2, writing F(y) as F(x) + [F(y) - F(x)] in J_1(x) and then integrating by parts treating dy for integration and the rest of the integrands for differentiation, we obtain an expression for J_1(x) which, when substituted into (10.7) and simplified, gives, for 1 ≤ i < j ≤ n - 1 and j - i ≥ 2,

$$\begin{aligned} \mu_{i:n} - k\mu_{i,j:n} = {} & (1-P+Q)\big\{i\big[\mu_{i+1,j+1:n} - \mu_{i+1,j:n}\big] + (j-i)\big[\mu_{i,j+1:n} - \mu_{i,j:n}\big]\big\} \\ & + \frac{(P-Q)(n-j+1)}{n+1}\big\{i\big[\mu_{i+1,j+1:n+1} - \mu_{i+1,j:n+1}\big] + (j-i)\big[\mu_{i,j+1:n+1} - \mu_{i,j:n+1}\big]\big\}\,. \end{aligned}$$
The recurrence relation in (10.11) is derived simply by rewriting the above equation. Relation (10.12) is obtained by setting j = i + 1 in the above proof and simplifying. Q.E.D.

THEOREM 10.5. For 1 ≤ i ≤ n - 2,

$$\mu_{i,n+1:n+1} = \mu_{i,n:n+1} + \frac{i}{n-i}\big(\mu_{i+1,n:n+1} - \mu_{i+1,n+1:n+1}\big) + \frac{n+1}{(n-i)(P-Q)}\Big[\big(\mu_{i:n} - k\mu_{i,n:n}\big) - (1-P+Q)\big(nP_1\mu_{i:n-1} - i\mu_{i+1,n:n} - (n-i)\mu_{i,n:n}\big)\Big] \qquad (10.13)$$
and for n ≥ 2,

$$\mu_{n-1,n+1:n+1} = \mu_{n-1,n:n+1} + (n-1)\big(\mu^{(2)}_{n:n+1} - \mu_{n,n+1:n+1}\big) + \frac{n+1}{P-Q}\Big\{(1-P+Q)\big[(n-1)\mu^{(2)}_{n:n} - nP_1\mu_{n-1:n-1} + \mu_{n-1,n:n}\big] + \big(\mu_{n-1:n} - k\mu_{n-1,n:n}\big)\Big\}\,. \qquad (10.14)$$
PROOF. For 1 ≤ i ≤ n - 2, let us consider from (10.1) and (8.4)

$$\mu_{i:n} - k\mu_{i,n:n} = E\big(X_{i:n} - kX_{i:n}X_{n:n}\big) = \frac{n!}{(i-1)!\,(n-i-1)!}\int_{Q_1}^{P_1} x\,[F(x)]^{i-1}\,f(x)\,J_3(x)\,dx\,, \qquad (10.15)$$

where

$$J_3(x) = (1-P+Q)\int_{x}^{P_1}[F(y)-F(x)]^{n-i-1}\,F(y)\,dy + (P-Q)\int_{x}^{P_1}[F(y)-F(x)]^{n-i-1}\,[1-F(y)]\,F(y)\,dy\,.$$

Writing F(y) as F(x) + [F(y) - F(x)] in the above integrals and then integrating by parts treating dy for integration and the rest of the integrands for differentiation, we obtain an expression for J_3(x) which, when substituted into (10.15) and simplified by combining integrands containing P_1, gives

$$\mu_{i:n} - k\mu_{i,n:n} = (1-P+Q)\big[nP_1\mu_{i:n-1} - i\mu_{i+1,n:n} - (n-i)\mu_{i,n:n}\big] + \frac{P-Q}{n+1}\big\{i\big[\mu_{i+1,n+1:n+1} - \mu_{i+1,n:n+1}\big] + (n-i)\big[\mu_{i,n+1:n+1} - \mu_{i,n:n+1}\big]\big\}\,.$$
The recurrence relation in (10.13) follows readily upon simplifying the above equation. Relation (10.14) is obtained by setting i = n - 1 in the above proof and simplifying. Q.E.D.

REMARK 10.1. Letting the shape parameter k → 0 in the above results, we deduce the recurrence relations for the product moments of order statistics from the doubly truncated logistic distribution established by Balakrishnan and Kocherlakota (1986). Furthermore, letting P → 1 and Q → 0, we deduce the recurrence relations for product moments of order statistics from the generalized logistic distribution, which have been established in Section I.

REMARK 10.2. The recurrence relations established in this section are complete in the sense that they will enable one to compute all the product moments of all order statistics for all sample sizes in a simple recursive manner. This can be done
for any choice of the shape parameter k; the recursive computational algorithm is explained in Section 11 in detail.
11. Recursive algorithm

Starting with the value of μ_{1:1} = E(X), we can use (9.6) to obtain μ_{1:2} and (9.5) to obtain μ_{1:3}, μ_{1:4}, ..., μ_{1:n}. Then, using (9.3), we can obtain μ_{2:2} and using (9.2), we can obtain μ_{2:3}. Again, using (9.3), we can obtain μ_{3:3} and using (9.2), we obtain μ_{2:4} and μ_{3:4}. Proceeding in this way, we can obtain all the first-order moments μ_{r:n} for r = 1, 2, ..., n and n = 2, 3, 4, .... Starting with the value of μ^{(2)}_{1:1} = E(X²), we proceed on exactly similar lines to obtain all the second-order moments μ^{(2)}_{r:n} for r = 1, 2, ..., n and n = 2, 3, 4, .... From these values, variances of all order statistics can be computed. By starting with the fact that μ_{1,2:2} = μ²_{1:1} (see Arnold and Balakrishnan, 1989), μ_{2,3:3}, μ_{3,4:4}, ..., μ_{n-1,n:n} can be determined from (10.3). Then μ_{1,2:3} can be determined using (10.9), and μ_{1,2:4}, μ_{1,2:5}, ..., μ_{1,2:n} from (10.8). μ_{1,3:3} can then be determined using (10.14) with n = 2, and μ_{2,4:4}, μ_{3,5:5}, ..., μ_{n-2,n:n} can be found using (10.2). μ_{1,3:4} can then be found using (10.5), and μ_{1,3:5}, μ_{1,3:6}, ..., μ_{1,3:n} can be determined using (10.5). μ_{2,3:4} is then determined using (10.9) and μ_{2,3:5}, μ_{2,3:6}, ..., μ_{2,3:n} from (10.8). Next, μ_{1,4:4} can be determined using (10.13) and, following the steps above, we may similarly determine μ_{r,s:n} for 1 ≤ r < s ≤ n and n = 5, 6, .... From these values, covariances of order statistics can be readily computed. Thus, by starting just with the values of E(X) and E(X²), we may compute the means, variances and covariances of order statistics for all sample sizes in a simple recursive manner. This may be done for any value of the shape parameter k and the truncation parameters P and Q.
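The bookkeeping of this recursion — fill the first column μ_{1:n} for all n, then climb the ranks within each row — can be sketched with the classical triangle rule r μ_{r+1:n} + (n-r) μ_{r:n} = n μ_{r:n-1} standing in for the paper's relations (9.2)-(9.3). In this sketch the uniform(0,1) distribution, for which μ_{r:n} = r/(n+1), supplies the starting column in closed form; it illustrates the loop structure only, not the truncated generalized logistic computation itself:

```python
# Schematic of the Section 11 recursion, on uniform(0,1) order statistics.
# Triangle rule: r*mu[r+1,n] + (n-r)*mu[r,n] = n*mu[r,n-1].
N = 6
mu = {(1, 1): 0.5}                       # start from mu_{1:1} = E(X)
for n in range(2, N + 1):
    mu[(1, n)] = 1.0 / (n + 1)           # first column (closed form for uniform)
    for r in range(1, n):                # climb the ranks recursively
        mu[(r + 1, n)] = (n * mu[(r, n - 1)] - (n - r) * mu[(r, n)]) / r
```

Every entry so produced equals the exact uniform value r/(n+1), confirming that the triangular sweep visits each (r, n) exactly once with all its inputs already available.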
Acknowledgements The authors thank Dr. Gail Wolkowicz for her help in the computation of maximum likelihood estimates, and the Natural Sciences and Engineering Research Council of Canada for funding this research.
References

Ali, M. M. and A. H. Khan (1987). On order statistics from the log-logistic distribution. J. Statist. Plann. Infer. 17, 103-108.
Arnold, B. C. and N. Balakrishnan (1989). Relations, Bounds and Approximations for Order Statistics. Lecture Notes in Statistics No. 53, Springer-Verlag, New York.
Arnold, B. C., N. Balakrishnan and H. N. Nagaraja (1992). A First Course in Order Statistics. John Wiley & Sons, New York.
Balakrishnan, N. (Ed.) (1992). Handbook of the Logistic Distribution. Marcel Dekker, New York.
Balakrishnan, N. and A. C. Cohen (1991). Order Statistics and Inference: Estimation Methods. Academic Press, San Diego.
Balakrishnan, N. and S. Kocherlakota (1986). On the moments of order statistics from the doubly truncated logistic distribution. J. Statist. Plann. Infer. 13, 117-129.
Balakrishnan, N. and H. J. Malik (1987). Moments of order statistics from truncated log-logistic distribution. J. Statist. Plann. Infer. 17, 251-267.
Balakrishnan, N. and K. S. Sultan (1998). Recurrence relations and identities for moments of order statistics. In the companion volume.
Cohen, A. C. and B. J. Whitten (1988). Parameter Estimation for Reliability and Life Span Models. Marcel Dekker, New York.
David, H. A. (1981). Order Statistics, Second edition. John Wiley & Sons, New York.
Hosking, J. R. M. (1986). The theory of probability weighted moments. IBM Research Report RC 12210.
Hosking, J. R. M. (1990). L-moments: Analysis and estimation of distributions using linear combinations of order statistics. J. Roy. Statist. Soc. Ser. B 52, 105-124.
Joshi, P. C. (1978). Recurrence relations between moments of order statistics from exponential and truncated exponential distributions. Sankhya, Ser. B 39, 362-371.
Joshi, P. C. (1979). A note on the moments of order statistics from doubly truncated exponential distribution. Ann. Inst. Statist. Math. 31, 321-324.
Joshi, P. C. (1982). A note on the mixed moments of order statistics from exponential and truncated exponential distributions. J. Statist. Plann. Infer. 6, 13-16.
Shah, B. K. (1966). On the bivariate moments of order statistics from a logistic distribution. Ann. Math. Statist. 37, 1002-1010.
Shah, B. K. (1970). Note on moments of logistic order statistics. Ann. Math. Statist. 41, 2151-2152.
Zelterman, D. and N. Balakrishnan (1992). Univariate generalized distributions. In Handbook of the Logistic Distribution (Ed., N. Balakrishnan), pp. 209-221. Marcel Dekker, New York.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved
Order Statistics from the Type III Generalized Logistic Distribution and Applications
N. Balakrishnan and S. K. Lee
1. Introduction
In the past 150 years, the theory, methodology and applications of the logistic distribution have been studied in detail by numerous authors. The logistic function was first used as a growth function by Verhulst (1838, 1845) in demographic studies and was given its present name by Reed and Berkson (1929). Due to its simplicity and the fact that it has a shape resembling the normal distribution, the logistic function has been very popular since the end of the nineteenth century. Pearl and Reed (1920, 1924), Pearl, Reed and Kish (1940), Schultz (1930) and, more recently, Oliver (1982) all applied the logistic model as a growth model in human populations and biological organisms. Some applications of the logistic function in bioassay problems were discussed by Berkson (1944, 1951, 1953), Wilson and Worcester (1943), and Finney (1947, 1952). Other applications and significant developments concerning the logistic distribution can be found in the book by Balakrishnan (1992). In the past 50 years, several different forms of generalizations of the logistic distribution have been proposed in the literature. These distributions are referred to as "the generalized logistic distributions". Balakrishnan and Leung (1988) defined three types of generalized logistic distributions by compounding an extreme-value distribution of the double exponential type, a reduced log-Weibull distribution and an exponential-gamma distribution with a gamma distribution (denoted by Types I, II and III, respectively). It is of interest to mention here that the Type I generalized logistic distribution was first derived by Dubey (1969). The Type II generalized logistic distribution is simply the negative of the Type I distribution. The Type III generalized logistic distribution was first proposed by Gumbel (1944). He derived the Type III distribution as the limiting distribution of the mth midrange¹ of a large sample from an unlimited symmetric continuous distribution with zero mean.
¹ The mth midrange of a sample of size n is defined as (X_{m:n} + X_{n-m+1:n})/2.

The Type III distribution was also studied by Davidson (1980), who established two other characterizations of the distribution as a
difference of two independent and identically distributed random variables from some distributions. Prentice (1976) proposed the Type IV generalized logistic distribution, with probability density function (p.d.f.)

$$g(y) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\,\frac{e^{-qy}}{(1+e^{-y})^{p+q}}\,, \qquad -\infty < y < \infty\,, \quad p, q > 0\,, \qquad (1.1)$$
as an alternative to the usual logistic model for binomial regression data. It is readily observed that Types I, II and III are all special cases of the Type IV distribution. The density function in (1.1) corresponds to the density function of the Type I, II and III distributions when q = 1, p = 1, and p = q, respectively. Zelterman and Balakrishnan (1992) have reviewed the developments on these four generalized forms of the logistic distribution. In this paper, we first obtain the Type III generalized logistic distribution by compounding an exponential-gamma distribution with a gamma distribution as in Balakrishnan and Leung (1988). Then we present a reparametrized model which has the standard normal distribution as its limiting distribution as the shape parameter tends to infinity. This reparametrized model, in addition to being useful as a life-time model, can also serve as a meaningful alternative to the normal distribution for robustness studies concerning some classical procedures based on normality. In Section 3, we study the order statistics and moments from this reparametrized distribution. Tables of means, variances and covariances of order statistics are presented for sample size n = 20 and shape parameter α = 0.5(0.5)3(1)6(2)12. Following steps similar to those of Gupta (1960), we derive explicit formulae for the single moments of order statistics for positive integer values of the shape parameter. In Sections 4 and 5, we study the best linear unbiased estimators (BLUEs) and the maximum likelihood estimators (MLEs) of the location and scale parameters, respectively. The coefficients of the BLUEs, the variance and covariance factors of the BLUEs, and simulated values of the bias, variances, covariance and mean square errors of the MLEs for sample size n = 20 and various choices of α are also presented.
Finally, we compute the relative efficiency between the BLUEs and the MLEs, make some comparative comments, and present two examples which illustrate both these methods of estimation.
2. Type III generalized logistic distribution

Let Y be a random variable with p.d.f. of an exponential-gamma distribution given by

$$f(y \mid \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,e^{-\lambda e^{-y}}\,e^{-\alpha y}\,, \qquad -\infty < y < \infty\,, \quad \alpha > 0\,, \quad \lambda > 0\,. \qquad (2.1)$$
Let us now assume that the parameter λ in (2.1) has a gamma distribution with density function

$$g(\lambda) = \frac{e^{-\lambda}\,\lambda^{\alpha-1}}{\Gamma(\alpha)}\,, \qquad \lambda > 0\,, \quad \alpha > 0\,. \qquad (2.2)$$
Then, we obtain the density function of the compounding distribution based on (2.1) and (2.2) as

$$f(y) = \int_{0}^{\infty} f(y \mid \lambda)\,g(\lambda)\,d\lambda = \frac{e^{-\alpha y}}{\{\Gamma(\alpha)\}^{2}}\int_{0}^{\infty} \lambda^{2\alpha-1}\,e^{-\lambda(1+e^{-y})}\,d\lambda = \frac{1}{B(\alpha,\alpha)}\,\frac{e^{-\alpha y}}{(1+e^{-y})^{2\alpha}}\,, \qquad -\infty < y < \infty\,, \quad \alpha > 0\,, \qquad (2.3)$$
where B(·,·) is the complete beta function and α is the shape parameter. Balakrishnan and Leung (1988) referred to this as the Type III generalized logistic distribution. Gumbel (1944) derived this Type III distribution as the limiting distribution of the mth midrange of a large sample from an unlimited symmetric continuous distribution with zero mean. Davidson (1980) showed that the Type III distribution can also be expressed as the distribution of the difference of two independent and identically distributed random variables from the generalized extreme value distribution. From the density function in (2.3), it is clear that the distribution is symmetric about zero. When the shape parameter α = 1, (2.3) corresponds to the usual logistic density function, and for positive integer values of α, (2.3) is simply the density function of the sample median in a random sample of size 2α - 1 from the usual logistic distribution. The corresponding c.d.f. of the distribution in (2.3) can be obtained as

$$F(y) = \frac{1}{B(\alpha,\alpha)}\int_{-\infty}^{y} \frac{e^{-\alpha t}}{(1+e^{-t})^{2\alpha}}\,dt\,, \qquad -\infty < y < \infty\,, \quad \alpha > 0\,. \qquad (2.4)$$
Making the transformation u = 1/(1 + e^{-t}), we get

$$F(y) = \frac{1}{B(\alpha,\alpha)}\int_{0}^{p(y)} u^{\alpha-1}\,(1-u)^{\alpha-1}\,du = I_{p(y)}(\alpha,\alpha)\,, \qquad -\infty < y < \infty\,, \quad \alpha > 0\,, \qquad (2.5)$$
where I_{p(y)}(·,·) is the incomplete beta ratio and p(y) = 1/(1 + e^{-y}). For integer α, integrating Eq. (2.5) by parts repeatedly, treating u^{α-1} for integration and the rest of the integrand for differentiation, we get

$$F(y) = \sum_{r=\alpha}^{2\alpha-1}\binom{2\alpha-1}{r}\,\{p(y)\}^{r}\,\{q(y)\}^{2\alpha-1-r}\,, \qquad (2.6)$$

where q(y) = 1 - p(y) = e^{-y}/(1 + e^{-y}).
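For integer α, the binomial tail sum (2.6) can be checked directly against numerical integration of the density (2.3); the quadrature settings below are illustrative choices, not from the paper:

```python
from math import comb, exp, gamma

def p(y):
    """p(y) = 1/(1 + e^{-y})."""
    return 1.0 / (1.0 + exp(-y))

def F_binom(y, alpha):
    """Eq. (2.6): CDF as a binomial tail sum, valid for positive integer alpha."""
    m = 2 * alpha - 1
    py = p(y)
    return sum(comb(m, r) * py**r * (1.0 - py)**(m - r) for r in range(alpha, m + 1))

def F_quad(y, alpha, lo=-40.0, n=20000):
    """Trapezoidal integration of the density (2.3), as an independent check."""
    B = gamma(alpha)**2 / gamma(2 * alpha)        # complete beta B(alpha, alpha)
    f = lambda t: exp(-alpha * t) / (1.0 + exp(-t))**(2 * alpha) / B
    h = (y - lo) / n
    return h * (0.5 * (f(lo) + f(y)) + sum(f(lo + i * h) for i in range(1, n)))
```

For α = 1 the sum collapses to p(y), the usual logistic c.d.f., and by symmetry F(0) = 1/2 for every α.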
Davidson (1980) showed that the moment generating function (m.g.f.) of the Type III generalized logistic distribution is

$$M_Y(t) = \frac{\Gamma(\alpha+t)\,\Gamma(\alpha-t)}{\{\Gamma(\alpha)\}^{2}}\,, \qquad |t| < \alpha\,. \qquad (2.7)$$
As mentioned by Balakrishnan and Leung (1988), the mean and the coefficient of skewness (√β₁) of the distribution are equal to 0. The variance and the coefficient of kurtosis (β₂) can be obtained from the m.g.f. in (2.7) as follows:

$$\mathrm{Var}(Y) = E(Y^{2}) = M_Y''(t)\big|_{t=0} = 2\psi'(\alpha) \qquad (2.8)$$

and

$$\beta_2(Y) = \frac{E(Y^{4})}{\{\mathrm{Var}(Y)\}^{2}} = 3 + \frac{\psi'''(\alpha)}{2\{\psi'(\alpha)\}^{2}}\,, \qquad (2.9)$$

where ψ(z) is the psi (or digamma) function defined by ψ(z) = d ln Γ(z)/dz, and ψ'(z) and ψ'''(z) are its first and third derivatives, respectively. These two measures have been computed for various values of α and are presented in Table 1.

Table 1
Values of variance and coefficient of kurtosis

  α       Var(Y)      β₂
  0.5     9.869604    5.000000
  1.0     3.289868    4.200000
  1.5     1.869604    3.806250
  2.0     1.289868    3.593763
  2.5     0.980716    3.465596
  3.0     0.789868    3.381282
  4.0     0.567646    3.278475
  5.0     0.442646    3.218723
  6.0     0.362646    3.179874
  8.0     0.266274    3.132556
  10.0    0.210333    3.104879
  12.0    0.173804    3.086739

From Table 1, we can see that the Type III generalized logistic distribution is a family of symmetric distributions with coefficient of kurtosis smaller than that of the logistic (when α > 1). The value of β₂ decreases with increasing values of α, and comes close to the value 3 as α gets large. This indicates that
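The entries of Table 1 can be reproduced from (2.8) and (2.9) with series evaluations of ψ'(α) and ψ'''(α); the tail corrections below are a standard Euler-Maclaurin device, an implementation choice of ours rather than anything from the paper:

```python
def trigamma(a, N=100000):
    """psi'(a) = sum_{n>=0} 1/(a+n)^2, with an Euler-Maclaurin tail correction."""
    s = sum(1.0 / (a + n)**2 for n in range(N))
    return s + 1.0 / (a + N) + 0.5 / (a + N)**2

def psi3(a, N=100000):
    """psi'''(a) = sum_{n>=0} 6/(a+n)^4, with a tail correction."""
    s = sum(6.0 / (a + n)**4 for n in range(N))
    return s + 2.0 / (a + N)**3 + 3.0 / (a + N)**4

def var_beta2(alpha):
    """Eqs. (2.8)-(2.9): Var(Y) = 2 psi'(alpha), beta2 = 3 + psi'''(alpha)/(2 psi'(alpha)^2)."""
    t = trigamma(alpha)
    return 2.0 * t, 3.0 + psi3(alpha) / (2.0 * t * t)
```

For instance, α = 1 gives Var(Y) = π²/3 ≈ 3.289868 and β₂ = 4.2, and α = 0.5 gives Var(Y) = π² and β₂ = 5, matching the first two rows of Table 1.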
the distribution has thicker tails than the normal distribution for small α and is close to the normal distribution when α is large. Therefore, we will consider a reparametrized model of the Type III generalized logistic distribution from now on. Let Y be a random variable from the Type III generalized logistic distribution with density as given in (2.3), with mean 0 and variance 2ψ'(α). From Abramowitz and Stegun (1965, p. 259), we have

$$\psi(z) \sim \ln z - \frac{1}{2z} - \sum_{n=1}^{\infty}\frac{B_{2n}}{2n\,z^{2n}} = \ln z - \frac{1}{2z} - \frac{1}{12z^{2}} + \frac{1}{120z^{4}} - \frac{1}{252z^{6}} + \cdots\,,$$

where B_{2n} are the Bernoulli numbers. Taking the limit as the shape parameter of the distribution α → ∞, it immediately follows (since ψ'(α) ~ 1/α) that Var(Y) ~ 2/α. Now consider the random variable X = (2/α)^{-1/2} Y. Then the p.d.f. and c.d.f. of X are given by
$$f(x) = \frac{\sqrt{2/\alpha}}{B(\alpha,\alpha)}\,\frac{e^{-\sqrt{2\alpha}\,x}}{(1+e^{-\sqrt{2/\alpha}\,x})^{2\alpha}}\,, \qquad -\infty < x < \infty\,, \quad \alpha > 0\,, \qquad (2.10)$$

and

$$F(x) = \frac{\sqrt{2/\alpha}}{B(\alpha,\alpha)}\int_{-\infty}^{x}\frac{e^{-\sqrt{2\alpha}\,t}}{(1+e^{-\sqrt{2/\alpha}\,t})^{2\alpha}}\,dt = \frac{1}{B(\alpha,\alpha)}\int_{0}^{p(\kappa x)} u^{\alpha-1}\,(1-u)^{\alpha-1}\,du = I_{p(\kappa x)}(\alpha,\alpha)\,, \qquad (2.11)$$

where p(·) is as defined before and κ = √(2/α). Further, the characteristic function of X can easily be found as
$$\varphi(t) = \frac{\Gamma\!\big(\alpha + it\sqrt{\alpha/2}\big)\,\Gamma\!\big(\alpha - it\sqrt{\alpha/2}\big)}{\{\Gamma(\alpha)\}^{2}}\,. \qquad (2.12)$$
From Abramowitz and Stegun (1965, p. 257), we can express the gamma function as

$$\Gamma(z) \sim e^{-z}\,z^{z-\frac{1}{2}}\,(2\pi)^{\frac{1}{2}}\left[1 + \frac{1}{12z} + \frac{1}{288z^{2}} - \frac{139}{51840z^{3}} - \frac{571}{2488320z^{4}} + \cdots\right]\,. \qquad (2.13)$$
Upon substituting (2.13) into (2.12) and taking the limit as α → ∞, we get

$$\lim_{\alpha\to\infty}\varphi(t) = \lim_{\alpha\to\infty}\left(\frac{1+it/\sqrt{2\alpha}}{1-it/\sqrt{2\alpha}}\right)^{it\sqrt{\alpha/2}}\,\lim_{\alpha\to\infty}\left(1-\frac{(it)^{2}}{2\alpha}\right)^{\alpha-\frac{1}{2}} = e^{(it)^{2}}\,e^{-(it)^{2}/2} = e^{-t^{2}/2}\,, \qquad (2.14)$$
which is the characteristic function of the standard normal distribution. In Figure 1, we have plotted the density of X in (2.10) against the standard normal density for various choices of ~. It is seen that this family of distributions is very close to the normal distribution and hence could be used in robustness studies of some classical procedures based on normality; they could also serve as useful and meaningful alternatives to the normal distribution while measuring the performance of some goodness-of-fit tests for normality.
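The closeness to N(0,1) noted here is easy to see numerically from (2.10); evaluating the density on the log scale (to avoid overflow of Γ(α) for large α) is our implementation detail, not something prescribed in the paper:

```python
from math import exp, log, lgamma, sqrt, pi

def f_reparam(x, alpha):
    """Reparametrized density (2.10), computed on the log scale for stability."""
    ln_B = 2.0 * lgamma(alpha) - lgamma(2.0 * alpha)   # ln B(alpha, alpha)
    kappa = sqrt(2.0 / alpha)
    return exp(0.5 * log(2.0 / alpha) - ln_B
               - sqrt(2.0 * alpha) * x
               - 2.0 * alpha * log(1.0 + exp(-kappa * x)))

def norm_pdf(x):
    """Standard normal density, the alpha -> infinity limit by (2.14)."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)
```

At α = 1 this is the logistic density scaled by √2 (f(0) = √2/4), and already at moderately large α the curve is within a fraction of a percent of the normal density, consistent with Figure 1.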
3. Order statistics and moments

Let X₁, X₂, ..., X_n be a random sample of size n from the reparametrized distribution mentioned above with p.d.f. f(x) and c.d.f. F(x) as given in (2.10) and (2.11), respectively. Let X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} be the order statistics obtained from the above random sample. Then the density function of the ith order statistic, X_{i:n} (1 ≤ i ≤ n), is given by [David (1981, p. 9); Arnold, Balakrishnan and Nagaraja (1992, p. 10)]

$$f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\,\{F(x)\}^{i-1}\,\{1-F(x)\}^{n-i}\,f(x)\,, \qquad -\infty < x < \infty\,, \qquad (3.1)$$

and the joint density function of the order statistics X_{i:n} and X_{j:n} (1 ≤ i < j ≤ n) is given by [David (1981, p. 10); Arnold, Balakrishnan and Nagaraja (1992, p. 16)]

$$f_{i,j:n}(x,y) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,\{F(x)\}^{i-1}\,\{F(y)-F(x)\}^{j-i-1}\,\{1-F(y)\}^{n-j}\,f(x)\,f(y)\,, \qquad -\infty < x < y < \infty\,. \qquad (3.2)$$
Fig. 1. Density plots of Eq. (2.10) against N(0,1) density.
We shall denote the single moments E(X^k_{i:n}) by μ^{(k)}_{i:n} (1 ≤ i ≤ n, k ≥ 1), the product moments E(X_{i:n}X_{j:n}) by μ_{i,j:n}, and Cov(X_{i:n}, X_{j:n}) by σ_{i,j:n} (1 ≤ i < j ≤ n). Note that σ_{i,i:n} = Var(X_{i:n}), and we will use μ_{i:n} instead of μ^{(1)}_{i:n} for convenience. Upon using the expressions of f(x) and F(x) in (2.10) and (2.11), respectively, we may compute the single moments of order statistics from (3.1) as

$$\mu^{(k)}_{i:n} = \int_{-\infty}^{\infty} x^{k}\,f_{i:n}(x)\,dx\,, \qquad 1 \le i \le n\,, \qquad (3.3)$$
and the product moments of order statistics from (3.2) as
$$\mu_{i,j:n} = \mathop{\iint}_{-\infty < x < y < \infty} xy\,f_{i,j:n}(x,y)\,dx\,dy\,, \qquad 1 \le i < j \le n\,. \qquad (3.4)$$

These moments may be computed directly by using numerical integration methods, since explicit algebraic expressions cannot be derived from Eqs. (3.3) and (3.4) in general. Following steps
similar to those of Gupta (1960), however, exact explicit formulae may be derived for the single moments of order statistics for positive integer values of α. For positive integer values of α, upon using the expression of F(x) in Eq. (2.6), we can write the moment generating function of X_{n:n} as
$$\begin{aligned} M_{n:n}(t) = E\big(e^{tX_{n:n}}\big) &= \frac{n\sqrt{2/\alpha}}{B(\alpha,\alpha)}\int_{-\infty}^{\infty} e^{tx}\left[\sum_{r=\alpha}^{2\alpha-1}\binom{2\alpha-1}{r}\{p(\kappa x)\}^{r}\{q(\kappa x)\}^{2\alpha-1-r}\right]^{n-1}\frac{e^{-\sqrt{2\alpha}\,x}}{(1+e^{-\kappa x})^{2\alpha}}\,dx \\ &= \frac{n}{B(\alpha,\alpha)}\sum_{j=\alpha(n-1)}^{m(n-1)} a_j(m,n-1)\int_{0}^{1} u^{\alpha+\sqrt{\alpha/2}\,t+j-1}\,(1-u)^{nm-(\alpha+j)-\sqrt{\alpha/2}\,t}\,du \\ &= \frac{n\,\Gamma(2\alpha)}{\{\Gamma(\alpha)\}^{2}\,\Gamma(nm+1)}\sum_{j=\alpha(n-1)}^{m(n-1)} a_j(m,n-1)\,\Gamma\big(\alpha+j+\sqrt{\alpha/2}\,t\big)\,\Gamma\big[nm-(\alpha+j)-\sqrt{\alpha/2}\,t+1\big]\,, \end{aligned} \qquad (3.5)$$

where a_j(m,n) denotes the coefficient of u^j(1-u)^{mn-j} in [Σ_{r=α}^{m} C(m,r) u^r (1-u)^{m-r}]^n, with m = 2α - 1. The single moments of the largest order statistic, X_{n:n}, may be easily derived from Eq. (3.5). For example, from (3.5) we immediately obtain
$$\mu_{n:n} = \frac{d}{dt}M_{n:n}(t)\Big|_{t=0} = \frac{n\sqrt{\alpha/2}\,\Gamma(2\alpha)}{\{\Gamma(\alpha)\}^{2}\,\Gamma(nm+1)}\sum_{j=\alpha(n-1)}^{m(n-1)} a_j(m,n-1)\,\Gamma(\alpha+j)\,\Gamma[nm-(\alpha+j)+1]\,\big\{\psi[\alpha+j] - \psi[nm-(\alpha+j)+1]\big\} \qquad (3.6)$$

and

$$\mu^{(2)}_{n:n} = \frac{d^{2}}{dt^{2}}M_{n:n}(t)\Big|_{t=0} = \frac{n\alpha\,\Gamma(2\alpha)}{2\{\Gamma(\alpha)\}^{2}\,\Gamma(nm+1)}\sum_{j=\alpha(n-1)}^{m(n-1)} a_j(m,n-1)\,\Gamma(\alpha+j)\,\Gamma[nm-(\alpha+j)+1]\Big[\big\{\psi[\alpha+j] - \psi[nm-(\alpha+j)+1]\big\}^{2} + \psi'[\alpha+j] + \psi'[nm-(\alpha+j)+1]\Big]\,; \qquad (3.7)$$

thence, the variance of X_{n:n} is given by

$$\sigma_{n,n:n} = \mu^{(2)}_{n:n} - (\mu_{n:n})^{2}\,. \qquad (3.8)$$
As a result, the first two single moments of X_{n:n} can be easily computed from Eqs. (3.6) and (3.7), at least for small values of n. Furthermore, upon using the recurrence relation [David (1981, p. 48); Arnold, Balakrishnan and Nagaraja (1992, p. 112)] that

$$\mu^{(k)}_{i:n} = \sum_{r=i}^{n}(-1)^{r-i}\binom{n}{r}\binom{r-1}{i-1}\,\mu^{(k)}_{r:r}\,, \qquad (3.9)$$

the single moments of all other order statistics may be computed in a simple recursive manner. Also, the coefficients a_j(m, n-1) involved in Eqs. (3.5)-(3.7) can be easily generated in a recursive way as follows. First of all, we note that

$$a_0(m,0) = 1\,, \qquad a_j(m,0) = 0 \;\;(j \ge 1)\,, \qquad a_j(m,1) = \binom{m}{j} \;\;(j \ge \alpha;\; = 0 \text{ otherwise})\,. \qquad (3.10)$$

Next, for n ≥ 2 we have

$$\begin{aligned} a_j(m,n) &= \text{coefficient of } u^{j}(1-u)^{mn-j} \text{ in } \left[\sum_{r=\alpha}^{m}\binom{m}{r}u^{r}(1-u)^{m-r}\right]^{n} \\ &= \sum_{r=\alpha}^{m}\binom{m}{r}\left[\text{coefficient of } u^{j-r}(1-u)^{m(n-1)-(j-r)} \text{ in } \left\{\sum_{s=\alpha}^{m}\binom{m}{s}u^{s}(1-u)^{m-s}\right\}^{n-1}\right] \\ &= \sum_{r=\alpha}^{m}\binom{m}{r}\,a_{j-r}(m,n-1)\,. \end{aligned} \qquad (3.11)$$
Thus, by starting with the values of a_j(m, 1) given in (3.10), we can compute the coefficients a_j(m, n) for any value of n by repeated application of the recurrence relation in (3.11). Means, variances and covariances of all order statistics have been computed for sample size n = 20 and for choices of α = 0.5(0.5)3(1)6(2)12 by using Eqs. (3.3) and (3.4). Since the distribution is symmetric around zero, we have
$$X_{i:n} \stackrel{d}{=} -X_{n-i+1:n} \qquad (3.12)$$

and

$$(X_{i:n}, X_{j:n}) \stackrel{d}{=} (-X_{n-j+1:n}, -X_{n-i+1:n})\,. \qquad (3.13)$$

These results reduce the amount of computation involved in the evaluation of moments of order statistics. From (3.12) and (3.13), we readily get

$$\mu_{i:n} = -\mu_{n-i+1:n} \qquad (3.14)$$

and

$$\mu_{i,j:n} = \mu_{n-j+1,n-i+1:n}\,. \qquad (3.15)$$
The IMSL routine DQDAG was employed to perform the single integration in the computation of the single moments of order statistics, while the double integration involved in the computation of the product moments was performed by the IMSL routine DTWODQ. Values of means, and variances and covariances, of all order statistics are presented in Table 2 and Table 3, respectively; all these values were computed in double precision and rounded off to five decimal places. In order to check the correctness of these values, we used the following known identities:

Table 2
Means of order statistics (μ_{i:n}) for sample size n = 20
i
~=0.5
~--1.0
~=1.5
7=2.0
~=2.5
~=3,0
20
1l 12 13 14 15 16 17 18 19 20
0.07943 0.24027 0.40727 0.58547 0.78165 1.00595 1.27552 1.62408 2.13524 3.14255
0.07071 0.21356 0.36087 0.51628 0.68464 0.87320 1.09417 1.37147 1.76431 2.50863
0.06777 0.20456 0.34529 0.49316 0.65239 0.82929 1.03435 1.28797 1.64012 2.28745
0.06630 0.20008 0.33755 0.48170 0.63647 0.80768 1.00500 1.24709 1.57928 2.17765
0.06542 0.19741 0.33294 0.47488 0.62701 0.79487 0.98765 1.22298 1.54345 2.11270
0.06484 0.19564 0.32989 0.47037 0.62075 0.78641 0.97621 1.20712 1.51991 2.06999
n
i
c~= 4.0
7 = 5.0
~ = 6.0
c~= 8.0
~ = 10.0
~ = 12.0
20
11 12 13 14 15 16 17 18 I9 20
0.06412 0.19344 0.32609 0.46477 0.61300 0.77594 0.96208 1.18755 1.49093 2.01744
0,06369 0,19213 0,32383 0.46144 0.60839 0.76972 0.95369 1.17597 1.47381 1.98645
0.06341 0.19126 0.32233 0.45923 0.60533 0.76560 0.94814 1.16831 1.46251 1.96604
0.06305 0.19017 0.32047 0.45648 0.60153 0.76049 0.94126 1.15883 1.44853 1.94083
0.06284 0.18952 0.31935 0.45483 0.59927 0.75744 0.93716 1.15318 1.44022 1.92588
0.06270 0.18909 0.31861 0.45374 0.59776 0.75541 0.93444 1.14943 1.43471 1.91599
t Missing values can be found by the symmetry relation #i:~ = -#n-i+l:n.
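As a spot check on Table 2: for α = 1 the parent reduces to the logistic, so (under the √(α/2) standardization assumed for the reparametrized model of Section 2) the tabulated column is the standard logistic order-statistic means scaled by 1/√2, and those means are digamma differences, i.e. harmonic-number differences:

```python
from math import sqrt

def logistic_mean(i, n):
    """E(X_{i:n}) for the standard logistic is psi(i) - psi(n-i+1),
    i.e. H_{i-1} - H_{n-i} in harmonic numbers; the 1/sqrt(2) factor
    is the assumed alpha = 1 standardization of the reparametrized model."""
    h = lambda k: sum(1.0 / m for m in range(1, k + 1))
    return (h(i - 1) - h(n - i)) / sqrt(2.0)
```

For example, logistic_mean(20, 20) agrees with the tabulated 2.50863 to the five decimals shown.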
138
N. Balakrishnan and S. K. Lee
Table 3
Variances and covariances of order statistics (σ_{i,j:n}) for n = 20

n
i
j
α = 0.5
α = 1.0
α = 1.5
α = 2.0
α = 2.5
α = 3.0
20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2 3
1.60641 0.60709 0.35907 0.25018 0.18992 0.15212 0.12648 0.10815 0.09454 0.08416 0.07608 0.06971 0.06463 0.06058 0.05734 0.05477 0.05276 0.05124 0.05015 0.04945 0.61272 0.36290
0.84810 0.33539 0.20612 0.14842 0.11588 0.09501 0.08051 0.06984 0.06166 0.05520 0.04996 0.04563 0.04200 0.03889 0.03622 0.03389 0.03184 0.03002 0.02840 0.02695 0.34949 0.21560
0.62091 0.25628 0.16160 0.11845 0.09371 0.07760 0.06624 0.05777 0.05119 0.04592 0.04158 0.03794 0.03483 0.03212 0.02973 0.02759 0.02564 0.02383 0.02210 0.02035 0.27459 0.17396
0.51786 0.22057 0.14134 0.10467 0.08340 0.06943 0.05949 0.05203 0.04619 0.04148 0.03757 0.03426 0.03141 0.02891 0.02667 0.02465 0.02277 0.02100 0.01924 0.01737 0.24103 0.15522
0.46061 0.20067 0.12995 0.09688 0.07754 0.06475 0.05562 0.04872 0.04330 0.03890 0.03524 0.03213 0.02943 0.02705 0.02492 0.02297 0.02115 0.01940 0.01765 0.01570 0.22234 0.14472
0.42466 0.18810 0.12272 0.09190 0.07378 0.06175 0.05312 0.04658 0.04143 0.03724 0.03374 0.03075 0.02816 0.02586 0.02379 0.02189 0.02011 0.01838 0.01663 0.01465 0.21053 0.13805
n
i
j
α = 4.0
α = 5.0
α = 6.0
α = 8.0
α = 10.0
α = 12.0
20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2 3
0.38243 0.17322 0.11411 0.08594 0.06926 0.05812 0.05009 0.04399 0.03916 0.03522 0.03191 0.02908 0.02661 0.02441 0.02243 0.02059 0.01886 0.01717 0.01543 0.01341 0.19651 0.13009
0.35864 0.16475 0.10918 0.08251 0.06665 0.05602 0.04834 0.04249 0.03784 0.03404 0.03085 0.02811 0.02571 0.02358 0.02164 0.01984 0.01814 0.01647 0.01474 0.01271 0.18850 0.12552
0.34344 0.15931 0.10599 0.08029 0.06495 0.05466 0.04720 0.04151 0.03698 0.03327 0.03015 0.02747 0.02513 0.02303 0.02112 0.01935 0.01767 0.01602 0.01429 0.01226 0.18334 0.12256
0.32521 0.15272 0.10213 0.07759 0.06289 0.05299 0.04581 0.04031 0.03593 0.03233 0.02931 0.02670 0.02441 0.02236 0.02049 0.01876 0.01710 0.01547 0.01376 0.01171 0.17707 0.11896
0.31469 0.14890 0.09987 0.07601 0.06168 0.05201 0.04499 0.03960 0.03531 0.03178 0.02881 0.02624 0.02399 0.02197 0.02012 0.01841 0.01677 0.01515 0.01344 0.01139 0.17342 0.11685
0.30784 0.14639 0.09839 0.07497 0.06088 0.05137 0.04445 0.03914 0.03490 0.03142 0.02848 0.02594 0.02371 0.02171 0.01988 0.01818 0.01655 0.01493 0.01324 0.01119 0.17103 0.11547
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n-j+1,n-i+1:n}.
Table 3 (Contd.) n
i
j
α = 0.5
α = 1.0
α = 1.5
α = 2.0
α = 2.5
α = 3.0
20
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 3 4 5 6 7 8
0.25305 0.19221 0.15402 0.12810 0.10956 0.09580 0.08529 0.07712 0.07067 0.06553 0.06142 0.05814 0.05554 0.05351 0.05197 0.05086 0.36880 0.25750 0.19576 0.15696 0.13061 0.11175
0.15557 0.12162 0.09981 0.08462 0.07345 0.06488 0.05810 0.05260 0.04805 0.04423 0.04097 0.03815 0.03570 0.03355 0.03164 0.02993 0.22603 0.16345 0.12796 0.10512 0.08919 0.07745
0.12784 0.10128 0.08396 0.07172 0.06259 0.05549 0.04979 0.04510 0.04116 0.03779 0.03485 0.03227 0.02995 0.02783 0.02587 0.02400 0.18600 0.13700 0.10870 0.09020 0.07711 0.06732
0.11525 0.09197 0.07665 0.06573 0.05751 0.05108 0.04588 0.04157 0.03792 0.03477 0.03201 0.02954 0.02730 0.02523 0.02326 0.02132 0.16796 0.12500 0.09990 0.08333 0.07151 0.06261
0.10816 0.08670 0.07249 0.06230 0.05461 0.04855 0.04364 0.03955 0.03606 0.03304 0.03038 0.02798 0.02580 0.02375 0.02179 0.01983 0.15782 0.11822 0.09490 0.07942 0.06831 0.05990
0.10363 0.08333 0.06981 0.06010 0.05273 0.04692 0.04219 0.03823 0.03486 0.03193 0.02933 0.02698 0.02483 0.02281 0.02086 0.01887 0.15137 0.11389 0.09170 0.07690 0.06624 0.05815
n
i
j
α = 4.0
α = 5.0
α = 6.0
α = 8.0
α = 10.0
α = 12.0
20
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 3 4 5 6 7 8
0.09821 0.07927 0.06659 0.05744 0.05047 0.04494 0.04043 0.03665 0.03340 0.03057 0.02805 0.02577 0.02367 0.02168 0.01974 0.01774 0.14364 0.10868 0.08784 0.07385 0.06374 0.05603
0.09509 0.07692 0.06473 0.05589 0.04915 0.04379 0.03941 0.03572 0.03256 0.02979 0.02732 0.02507 0.02300 0.02102 0.01909 0.01709 0.13920 0.10567 0.08560 0.07209 0.06229 0.05480
0.09306 0.07540 0.06351 0.05488 0.04829 0.04304 0.03874 0.03512 0.03200 0.02927 0.02683 0.02462 0.02256 0.02060 0.01867 0.01667 0.13631 0.10372 0.08414 0.07093 0.06133 0.05399
0.09059 0.07353 0.06202 0.05365 0.04723 0.04212 0.03792 0.03437 0.03132 0.02864 0.02624 0.02406 0.02202 0.02008 0.01816 0.01615 0.13279 0.10132 0.08235 0.06951 0.06017 0.05299
0.08914 0.07244 0.06114 0.05292 0.04661 0.04157 0.03743 0.03394 0.03092 0.02827 0.02589 0.02373 0.02170 0.01977 0.01786 0.01585 0.13072 0.09992 0.08130 0.06868 0.05948 0.05241
0.08818 0.07172 0.06057 0.05244 0.04620 0.04121 0.03711 0.03365 0.03066 0.02802 0.02566 0.02351 0.02150 0.01957 0.01766 0.01566 0.12937 0.09899 0.08060 0.06813 0.05902 0.05202
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n-j+1,n-i+1:n}.
Table 3 (Contd.) n
i
j
α = 0.5
α = 1.0
α = 1.5
α = 2.0
α = 2.5
α = 3.0
20
3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4
9 10 11 12 13 14 15 16 17 18 4 5 6 7 8 9 10 11 12 13 14 15
0.09774 0.08705 0.07872 0.07215 0.06692 0.06273 0.05938 0.05673 0.05466 0.05309 0.26366 0.20067 0.16104 0.13410 0.11480 0.10045 0.08949 0.08096 0.07422 0.06885 0.06455 0.06112
0.06844 0.06131 0.05552 0.05074 0.04671 0.04327 0.04031 0.03772 0.03545 0.03343 0.17221 0.13502 0.11103 0.09428 0.08192 0.07242 0.06490 0.05880 0.05374 0.04948 0.04585 0.04272
0.05971 0.05360 0.04856 0.04433 0.04071 0.03756 0.03477 0.03228 0.03000 0.02789 0.14648 0.11640 0.09668 0.08271 0.07226 0.06411 0.05757 0.05218 0.04764 0.04376 0.04038 0.03739
0.05563 0.04998 0.04530 0.04133 0.03790 0.03490 0.03221 0.02977 0.02752 0.02538 0.13476 0.10785 0.09005 0.07733 0.06774 0.06021 0.05412 0.04906 0.04478 0.04107 0.03782 0.03492
0.05328 0.04790 0.04342 0.03960 0.03629 0.03337 0.03075 0.02835 0.02611 0.02396 0.12812 0.10299 0.08627 0.07425 0.06515 0.05797 0.05213 0.04727 0.04312 0.03953 0.03635 0.03350
0.05176 0.04656 0.04220 0.03849 0.03525 0.03239 0.02980 0.02743 0.02520 0.02305 0.12387 0.09987 0.08382 0.07226 0.06346 0.05651 0.05084 0.04610 0.04205 0.03853 0.03540 0.03258
n
i
j
α = 4.0
α = 5.0
α = 6.0
α = 8.0
α = 10.0
α = 12.0
20
3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4
9 10 11 12 13 14 15 16 17 18 4 5 6 7 8 9 10 11 12 13 14 15
0.04992 0.04492 0.04073 0.03713 0.03399 0.03120 0.02866 0.02633 0.02411 0.02196 0.11874 0.09609 0.08086 0.06984 0.06142 0.05474 0.04927 0.04468 0.04075 0.03731 0.03424 0.03147
0.04884 0.04396 0.03986 0.03634 0.03325 0.03050 0.02800 0.02568 0.02348 0.02133 0.11577 0.09389 0.07914 0.06842 0.06022 0.05370 0.04835 0.04385 0.03998 0.03659 0.03357 0.03082
0.04814 0.04334 0.03930 0.03582 0.03277 0.03004 0.02756 0.02526 0.02307 0.02092 0.11383 0.09246 0.07801 0.06750 0.05944 0.05302 0.04775 0.04330 0.03948 0.03612 0.03313 0.03040
0.04727 0.04257 0.03860 0.03518 0.03218 0.02949 0.02703 0.02475 0.02257 0.02041 0.11146 0.09070 0.07662 0.06636 0.05848 0.05218 0.04700 0.04263 0.03886 0.03555 0.03258 0.02987
0.04676 0.04211 0.03819 0.03480 0.03183 0.02916 0.02672 0.02444 0.02227 0.02012 0.11006 0.08966 0.07581 0.06569 0.05791 0.05169 0.04656 0.04223 0.03850 0.03521 0.03226 0.02956
0.04642 0.04181 0.03792 0.03456 0.03159 0.02894 0.02651 0.02424 0.02208 0.01993 0.10914 0.08898 0.07527 0.06524 0.05753 0.05136 0.04627 0.04197 0.03825 0.03498 0.03205 0.02936
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n-j+1,n-i+1:n}.
Table 3 (Contd.) n
i
j
α = 0.5
α = 1.0
α = 1.5
α = 2.0
α = 2.5
α = 3.0
20
4 4 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6
16 17 5 6 7 8 9 10 11 12 13 14 15 16 6 7 8 9 10 11 12 13
0.05840 0.05627 0.20713 0.16641 0.13869 0.11881 0.10402 0.09271 0.08390 0.07695 0.07140 0.06696 0.06341 0.06059 0.17324 0.14453 0.12392 0.10857 0.09683 0.08767 0.08043 0.07466
0.03999 0.03758 0.14291 0.11766 0.09999 0.08694 0.07690 0.06894 0.06248 0.05712 0.05261 0.04876 0.04544 0.04254 0.12513 0.10644 0.09261 0.08197 0.07352 0.06665 0.06096 0.05616
0.03471 0.03227 0.12460 0.10361 0.08870 0.07754 0.06883 0.06183 0.05606 0.05120 0.04703 0.04341 0.04020 0.03733 0.11115 0.09524 0.08331 0.07399 0.06649 0.06030 0.05509 0.05062
0.03228 0.02984 0.11616 0.09709 0.08343 0.07313 0.06503 0.05847 0.05302 0.04840 0.04441 0.04090 0.03776 0.03491 0.10463 0.08998 0.07891 0.07021 0.06315 0.05729 0.05231 0.04800
0.03089 0.02845 0.11134 0.09335 0.08040 0.07058 0.06283 0.05653 0.05127 0.04678 0.04289 0.03945 0.03636 0.03353 0.10087 0.08695 0.07637 0.06802 0.06121 0.05553 0.05069 0.04648
0.02999 0.02756 0.10823 0.09093 0.07843 0.06893 0.06140 0.05526 0.05012 0.04573 0.04190 0.03851 0.03545 0.03263 0.09844 0.08498 0.07472 0.06659 0.05995 0.05439 0.04964 0.04549
n
i
j
α = 4.0
α = 5.0
α = 6.0
α = 8.0
α = 10.0
α = 12.0
20
4 4 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6
16 17 5 6 7 8 9 10 11 12 13 14 15 16 6 7 8 9 10 11 12 13
0.02891 0.02648 0.10447 0.08799 0.07604 0.06691 0.05966 0.05372 0.04873 0.04444 0.04070 0.03736 0.03434 0.03155 0.09548 0.08257 0.07269 0.06484 0.05840 0.05299 0.04835 0.04428
0.02827 0.02586 0.10227 0.08628 0.07464 0.06573 0.05863 0.05281 0.04791 0.04369 0.03999 0.03669 0.03369 0.03091 0.09375 0.08116 0.07151 0.06381 0.05749 0.05217 0.04759 0.04357
0.02786 0.02545 0.10084 0.08515 0.07372 0.06496 0.05796 0.05221 0.04737 0.04319 0.03953 0.03625 0.03327 0.03050 0.09261 0.08023 0.07072 0.06314 0.05689 0.05162 0.04709 0.04310
0.02735 0.02495 0.09907 0.08377 0.07259 0.06400 0.05713 0.05148 0.04670 0.04258 0.03896 0.03571 0.03275 0.02999 0.09121 0.07909 0.06976 0.06230 0.05615 0.05095 0.04647 0.04252
0.02705 0.02465 0.09803 0.08295 0.07192 0.06343 0.05664 0.05104 0.04631 0.04222 0.03862 0.03539 0.03244 0.02969 0.09038 0.07841 0.06919 0.06180 0.05571 0.05056 0.04610 0.04218
0.02685 0.02446 0.09735 0.08242 0.07148 0.06306 0.05632 0.05075 0.04605 0.04198 0.03839 0.03518 0.03223 0.02949 0.08983 0.07797 0.06881 0.06148 0.05542 0.05029 0.04586 0.04195
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n-j+1,n-i+1:n}.
Table 3 (Contd.) n
i
j
α = 0.5
α = 1.0
α = 1.5
α = 2.0
α = 2.5
α = 3.0
20
6 6 7 7 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 10 10
14 15 7 8 9 10 11 12 13 14 8 9 10 11 12 13 9 10 11 12 10 11
0.07004 0.06634 0.15186 0.13034 0.11429 0.10200 0.09241 0.08482 0.07877 0.07392 0.13833 0.12141 0.10845 0.09832 0.09030 0.08390 0.13027 0.11648 0.10569 0.09714 0.12650 0.11490
0.05207 0.04853 0.11379 0.09909 0.08776 0.07875 0.07143 0.06535 0.06023 0.05585 0.10655 0.09443 0.08479 0.07694 0.07043 0.06493 0.10221 0.09184 0.08338 0.07636 0.10017 0.09100
0.04673 0.04329 0.10249 0.08971 0.07972 0.07167 0.06503 0.05943 0.05462 0.05044 0.09691 0.08618 0.07752 0.07036 0.06432 0.05914 0.09356 0.08420 0.07647 0.06993 0.09197 0.08357
0.04422 0.04084 0.09715 0.08526 0.07589 0.06829 0.06197 0.05660 0.05196 0.04787 0.09232 0.08223 0.07403 0.06720 0.06140 0.05638 0.08940 0.08053 0.07314 0.06685 0.08802 0.07998
0.04276 0.03942 0.09406 0.08267 0.07366 0.06632 0.06019 0.05496 0.05041 0.04638 0.08965 0.07992 0.07199 0.06536 0.05969 0.05477 0.08697 0.07838 0.07119 0.06504 0.08571 0.07788
0.04182 0.03850 0.09205 0.08098 0.07221 0.06504 0.05902 0.05388 0.04939 0.04541 0.08790 0.07842 0.07066 0.06415 0.05857 0.05371 0.08538 0.07697 0.06991 0.06386 0.08419 0.07651
n
i
j
α = 4.0
α = 5.0
α = 6.0
α = 8.0
α = 10.0
α = 12.0
20
6 6 7 7 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 10 10
14 15 7 8 9 10 11 12 13 14 8 9 10 11 12 13 9 10 11 12 10 11
0.04066 0.03738 0.08959 0.07892 0.07043 0.06346 0.05760 0.05256 0.04815 0.04422 0.08576 0.07656 0.06902 0.06266 0.05720 0.05242 0.08343 0.07524 0.06834 0.06240 0.08233 0.07481
0.03998 0.03672 0.08815 0.07771 0.06938 0.06253 0.05675 0.05178 0.04742 0.04353 0.08449 0.07547 0.06805 0.06179 0.05639 0.05166 0.08227 0.07422 0.06741 0.06155 0.08122 0.07381
0.03954 0.03629 0.08720 0.07691 0.06868 0.06191 0.05620 0.05127 0.04694 0.04307 0.08366 0.07475 0.06741 0.06121 0.05586 0.05115 0.08151 0.07354 0.06680 0.06098 0.08050 0.07315
0.03899 0.03576 0.08603 0.07592 0.06783 0.06116 0.05551 0.05064 0.04635 0.04250 0.08264 0.07386 0.06662 0.06049 0.05520 0.05053 0.08057 0.07270 0.06604 0.06028 0.07959 0.07233
0.03866 0.03544 0.08533 0.07533 0.06732 0.06070 0.05510 0.05026 0.04599 0.04216 0.08202 0.07334 0.06615 0.06007 0.05481 0.05016 0.08001 0.07221 0.06559 0.05986 0.07906 0.07184
0.03844 0.03523 0.08487 0.07495 0.06698 0.06041 0.05483 0.05001 0.04576 0.04194 0.08162 0.07298 0.06584 0.05979 0.05455 0.04992 0.07964 0.07188 0.06529 0.05959 0.07870 0.07152
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n-j+1,n-i+1:n}.
\sum_{i=1}^{n} μ_{i:n} = n E(X) = n μ_{1:1}

and

\sum_{i=1}^{n} \sum_{j=1}^{n} σ_{i,j:n} = n Var(X) = n σ_{1,1:1} .
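These identities hold for any parent distribution, so a quick sketch can verify the bookkeeping with the exact uniform(0,1) order-statistic moments (a convenient stand-in, independent of the generalized logistic computations):

```python
def check_identities(n=5):
    """Evaluate sum mu_{i:n} and the double sum of sigma_{i,j:n} using the
    exact uniform(0,1) order-statistic moments: mu_{i:n} = i/(n+1) and
    sigma_{i,j:n} = i (n-j+1) / ((n+1)^2 (n+2)) for i <= j.
    The identities say these equal n*E(X) = n/2 and n*Var(X) = n/12."""
    mu = [i / (n + 1) for i in range(1, n + 1)]
    sig = lambda i, j: min(i, j) * (n - max(i, j) + 1) / ((n + 1) ** 2 * (n + 2))
    total_mean = sum(mu)
    total_var = sum(sig(i, j) for i in range(1, n + 1) for j in range(1, n + 1))
    return total_mean, total_var
```

The second identity is just Var(sum of the order statistics) = Var(sum of the sample).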
4. BLUEs of location and scale parameters

Let Y be a random variable with p.d.f.

g(y; μ, σ) = √(2/α) e^{-√(2α)(y-μ)/σ} / { σ B(α, α) [1 + e^{-√(2/α)(y-μ)/σ}]^{2α} },
    -∞ < y, μ < ∞,  α, σ > 0,   (4.1)

where μ, σ and α are the location, scale and shape parameters, respectively. Then, Y is the location-scale form of the reparametrized model presented in Section 2. The corresponding c.d.f. is found as

G(y; μ, σ) = [1/B(α, α)] \int_0^{p((y-μ)/σ)} u^{α-1} (1-u)^{α-1} du ,   (4.2)

where p(·) is as defined earlier. Now, let Y_1, Y_2, ..., Y_n be a random sample of size n from the distribution with p.d.f. and c.d.f. as given in (4.1) and (4.2), respectively, where the shape parameter α is assumed to be known. Further, let Y_{1:n} ≤ Y_{2:n} ≤ ... ≤ Y_{n-r:n} be a Type-II right-censored sample obtained from the above random sample. The best linear unbiased estimators (BLUEs) of μ and σ are then given by [David (1981, pp. 128-132); Arnold, Balakrishnan and Nagaraja (1992, p. 172)]

μ* = -μ' Γ Y = a' Y = \sum_{i=1}^{n-r} a_i Y_{i:n}   (4.3)

and

σ* = 1' Γ Y = b' Y = \sum_{i=1}^{n-r} b_i Y_{i:n} ,   (4.4)

where a and b are the vectors of coefficients for the BLUEs of the location and scale parameters, respectively,

Γ = Σ^{-1} (1 μ' - μ 1') Σ^{-1} / Δ ,
1 = (1, 1, ..., 1)'_{(n-r)×1} ,
μ = (μ_{1:n}, μ_{2:n}, ..., μ_{n-r:n})'_{(n-r)×1} ,
Δ = (μ' Σ^{-1} μ)(1' Σ^{-1} 1) - (μ' Σ^{-1} 1)² ,

and Σ is the (n-r)×(n-r) variance-covariance matrix with entries σ_{i,j:n}:

Σ = | σ_{1,1:n}     σ_{1,2:n}     ...  σ_{1,n-r:n}   |
    | σ_{1,2:n}     σ_{2,2:n}     ...  σ_{2,n-r:n}   |
    |     .             .                   .         |
    | σ_{1,n-r:n}   σ_{2,n-r:n}   ...  σ_{n-r,n-r:n} | .

Further, we have the variances and covariance of the BLUEs μ* and σ* as

Var(μ*) = σ² μ' Σ^{-1} μ / Δ ,   (4.5)

Var(σ*) = σ² 1' Σ^{-1} 1 / Δ ,   (4.6)

and

Cov(μ*, σ*) = -σ² μ' Σ^{-1} 1 / Δ .   (4.7)
After computing the means, variances and covariances of order statistics for sample size n = 20 from the standard reparametrized distribution as described in Section 3, the coefficients of the BLUEs of μ and σ were computed from (4.3) and (4.4). All the computations were carried out in double precision, and the IMSL routine DLINDS was used for finding the inverse of the variance-covariance matrix. In Tables 4 and 5, we have presented the coefficients a_i and b_i of the best linear unbiased estimators μ* and σ*, respectively, for sample size n = 20, r = 0, 2, 4 and α = 1.0, 2.0(2)10.0. Values of these coefficients have been rounded off to five decimal places. In order to check the accuracy of the tabulated coefficients, we verified the conditions

\sum_{i=1}^{n-r} a_i = 1 ,  \sum_{i=1}^{n-r} a_i μ_{i:n} = 0   (4.8)

and

\sum_{i=1}^{n-r} b_i = 0 ,  \sum_{i=1}^{n-r} b_i μ_{i:n} = 1 ,   (4.9)

based on the fact that the estimators μ* and σ* are unbiased for μ and σ, respectively. The values of Var(μ*)/σ², Var(σ*)/σ² and Cov(μ*, σ*)/σ² have been computed and are presented in Table 6 for sample size n = 20, r = 0, 2, 4 and α = 1.0, 2.0(2)10.0. These values have also been rounded off to five decimal places.
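Conditions (4.8)-(4.9) in fact hold by construction, as a direct implementation of (4.3)-(4.4) shows. A Python/NumPy sketch standing in for the IMSL routine DLINDS:

```python
import numpy as np

def blue_coefficients(mu, Sigma):
    """Coefficient vectors a (for mu*) and b (for sigma*) of Eqs. (4.3)-(4.4),
    for given standardized order-statistic means mu and covariances Sigma."""
    Si = np.linalg.inv(Sigma)
    one = np.ones(len(mu))
    A = one @ Si @ one            # 1' Sigma^{-1} 1
    B = mu @ Si @ mu              # mu' Sigma^{-1} mu
    C = mu @ Si @ one             # mu' Sigma^{-1} 1
    Delta = A * B - C * C
    a = (B * (Si @ one) - C * (Si @ mu)) / Delta
    b = (A * (Si @ mu) - C * (Si @ one)) / Delta
    return a, b
```

A short algebraic check: a'1 = (AB - C²)/Δ = 1, a'μ = (BC - CB)/Δ = 0, and similarly b'1 = 0, b'μ = 1, which is exactly (4.8)-(4.9) for any valid (μ, Σ).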
Table 4
Coefficients for the BLUE of μ for n = 20

r
i
α = 1.0
α = 2.0
α = 4.0
α = 6.0
α = 8.0
α = 10.0
0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 1 2 3 4 5 6 7 8 9 10
0.00685 0.02012 0.03221 0.04291 0.05214 0.05986 0.06605 0.07070 0.07381 0.07536 0.07536 0.07381 0.07070 0.06605 0.05986 0.05214 0.04291 0.03221 0.02012 0.00685 0.00574 0.01894 0.03108 0.04187 0.05123 0.05911 0.06548 0.07032 0.07363 0.07539 0.07559 0.07423 0.07131 0.06683 0.06078 0.05317 0.04401 0.06129 0.00231 0.01540 0.02771 0.03885 0.04866 0.05708 0.06404 0.06951 0.07345 0.07584
0.01838 0.03445 0.04297 0.04893 0.05337 0.05673 0.05923 0.06102 0.06217 0.06274 0.06274 0.06217 0.06102 0.05923 0.05673 0.05337 0.04893 0.04297 0.03445 0.01838 0.01549 0.03178 0.04067 0.04701 0.05181 0.05553 0.05840 0.06054 0.06205 0.06296 0.06330 0.06307 0.06224 0.06077 0.05857 0.05549 0.05127 0.09904 0.00907 0.02602 0.03580 0.04302 0.04870 0.05327 0.05697 0.05993 0.06224 0.06393
0.03070 0.04276 0.04731 0.05023 0.05228 0.05378 0.05486 0.05562 0.05611 0.05635 0.05635 0.05611 0.05562 0.05486 0.05378 0.05228 0.05023 0.04731 0.04276 0.03070 0.02573 0.03884 0.04415 0.04770 0.05033 0.05235 0.05393 0.05517 0.05611 0.05680 0.05724 0.05744 0.05739 0.05707 0.05641 0.05534 0.05368 0.12433 0.01660 0.03181 0.03860 0.04338 0.04710 0.05012 0.05265 0.05479 0.05660 0.05813
0.03630 0.04537 0.04841 0.05032 0.05165 0.05260 0.05329 0.05377 0.05407 0.05422 0.05422 0.05407 0.05377 0.05329 0.05260 0.05165 0.05032 0.04841 0.04537 0.03630 0.03033 0.04097 0.04494 0.04759 0.04956 0.05110 0.05234 0.05333 0.05414 0.05478 0.05526 0.05559 0.05576 0.05576 0.05556 0.05510 0.05425 0.13364 0.02003 0.03354 0.03920 0.04319 0.04633 0.04892 0.05113 0.05305 0.05474 0.05624
0.03941 0.04661 0.04889 0.05031 0.05128 0.05198 0.05248 0.05283 0.05305 0.05316 0.05316 0.05305 0.05283 0.05248 0.05198 0.05128 0.05031 0.04889 0.04661 0.03941 0.03287 0.04196 0.04526 0.04747 0.04913 0.05045 0.05152 0.05242 0.05316 0.05377 0.05428 0.05467 0.05495 0.05511 0.05512 0.05494 0.05448 0.13844 0.02194 0.03435 0.03943 0.04305 0.04590 0.04829 0.05036 0.05219 0.05382 0.05531
0.04138 0.04733 0.04915 0.05028 0.05105 0.05160 0.05199 0.05227 0.05244 0.05252 0.05252 0.05244 0.05227 0.05199 0.05160 0.05105 0.05028 0.04915 0.04733 0.04138 0.03448 0.04252 0.04542 0.04738 0.04886 0.05005 0.05103 0.05187 0.05257 0.05318 0.05369 0.05412 0.05446 0.05471 0.05485 0.05484 0.05461 0.14136 0.02314 0.03480 0.03955 0.04294 0.04564 0.04791 0.04989 0.05167 0.05328 0.05475
2
4
Table 4 (Contd.) r
i
α = 1.0
α = 2.0
α = 4.0
α = 6.0
α = 8.0
α = 10.0
11 12 13 14 15 16
0.07667 0.07590 0.07352 0.06950 0.06383 0.16775
0.06503 0.06553 0.06541 0.06461 0.06301 0.21744
0.05939 0.06040 0.06115 0.06160 0.06169 0.24601
0.05756 0.05873 0.05972 0.06054 0.06113 0.25594
0.05666 0.05789 0.05901 0.06000 0.06083 0.26097
0.05612 0.05740 0.05858 0.05967 0.06065 0.26399
Table 5
Coefficients for the BLUE of σ for n = 20

r
i
α = 1.0
α = 2.0
α = 4.0
α = 6.0
α = 8.0
α = 10.0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
-0.05686 -0.06110 -0.06034 -0.05660 -0.05076 -0.04338 -0.03483 -0.02545 -0.01549 -0.00520 0.00520 0.01549 0.02545 0.03483 0.04338 0.05076 0.05660 0.06034 0.06110 0.05686 -0.06308 -0.06766 -0.06671 -0.06245 -0.05588 -0.04761 -0.03807 -0.02762 -0.01654 -0.00511 0.00643 0.01782 0.02882 0.03915 0.04851 0.05653
-0.07491 -0.07176 -0.06409 -0.05580 -0.04731 -0.03875 -0.03014 -0.02153 -0.01292 -0.00431 0.00431 0.01292 0.02153 0.03014 0.03875 0.04731 0.05580 0.06409 0.07176 0.07491 -0.08433 -0.08044 -0.07159 -0.06210 -0.05241 -0.04266 -0.03289 -0.02312 -0.01337 -0.00363 0.00609 0.01580 0.02548 0.03512 0.04471 0.05417
-0.09085 -0.07550 -0.06356 -0.05334 -0.04409 -0.03545 -0.02722 -0.01927 -0.01149 -0.00382 0.00382 0.01149 0.01927 0.02722 0.03545 0.04409 0.05334 0.06356 0.07550 0.09085 -0.10350 -0.08548 -0.07161 -0.05978 -0.04909 -0.03911 -0.02962 -0.02046 -0.01152 -0.00270 0.00606 0.01485 0.02373 0.03280 0.04213 0.05185
-0.09751 -0.07616 -0.06294 -0.05223 -0.04284 -0.03426 -0.02620 -0.01849 -0.01101 -0.00366 0.00366 0.01101 0.01849 0.02620 0.03426 0.04284 0.05223 0.06294 0.07616 0.09751 -0.11159 -0.08656 -0.07114 -0.05868 -0.04777 -0.03781 -0.02846 -0.01953 -0.01087 -0.00237 0.00608 0.01456 0.02317 0.03201 0.04122 0.05096
-0.10109 -0.07637 -0.06255 -0.05162 -0.04218 -0.03364 -0.02568 -0.01810 -0.01077 -0.00358 0.00358 0.01077 0.01810 0.02568 0.03364 0.04218 0.05162 0.06255 0.07637 0.10109 -0.11596 -0.08696 -0.07082 -0.05808 -0.04708 -0.03714 -0.02787 -0.01906 -0.01055 -0.00220 0.00609 0.01442 0.02289 0.03162 0.04076 0.05049
-0.10331 -0.07645 -0.06229 -0.05125 -0.04178 -0.03327 -0.02537 -0.01787 -0.01063 -0.00353 0.00353 0.01063 0.01787 0.02537 0.03327 0.04178 0.05125 0.06229 0.07645 0.10331 -0.11868 -0.08716 -0.07060 -0.05770 -0.04666 -0.03673 -0.02752 -0.01878 -0.01035 -0.00209 0.00610 0.01434 0.02273 0.03139 0.04049 0.05021
Table 5 (Contd.) r
i
α = 1.0
α = 2.0
α = 4.0
α = 6.0
α = 8.0
α = 10.0
17 18 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0.06274 0.19071 -0.07233 -0.07723 -0.07580 -0.07061 -0.06281 -0.05310 -0.04198 -0.02985 -0.01705 -0.00391 0.00930 0.02228 0.03474 0.04634 0.05671 0.33532
0.06339 0.22178 -0.09747 -0.09224 -0.08157 -0.07026 -0.05879 -0.04729 -0.03582 -0.02438 -0.01300 -0.00166 0.00961 0.02082 0.03195 0.04295 0.05377 0.36336
0.06212 0.23933 -0.11992 -0.09812 -0.08160 -0.06756 -0.05490 -0.04312 -0.03193 -0.02115 -0.01065 -0.00032 0.00992 0.02016 0.03047 0.04093 0.05160 0.37619
0.06147 0.24532 -0.12935 -0.09937 -0.08105 -0.06628 -0.05336 -0.04158 -0.03055 -0.02003 -0.00984 0.00015 0.01005 0.01996 0.02999 0.04023 0.05081 0.38021
0.06111 0.24833 -0.13444 -0.09983 -0.08067 -0.06557 -0.05255 -0.04079 -0.02985 -0.01946 -0.00943 0.00039 0.01012 0.01987 0.02975 0.03989 0.05041 0.38219
0.06089 0.25014 -0.13761 -0.10006 -0.08041 -0.06513 -0.05205 -0.04031 -0.02943 -0.01912 -0.00919 0.00053 0.01016 0.01981 0.02961 0.03968 0.05017 0.38336
Table 6
Values of (1) Var(μ*)/σ², (2) Var(σ*)/σ² and (3) Cov(μ*, σ*)/σ² for n = 20

         α = 1.0   α = 2.0   α = 4.0   α = 6.0   α = 8.0   α = 10.0
r = 0
(1)      0.07595   0.06282   0.05635   0.05421   0.05315   0.05252
(2)      0.03660   0.03224   0.02962   0.02870   0.02824   0.02796
(3)      0.00000   0.00000   0.00000   0.00000   0.00000   0.00000
r = 2
(1)      0.07609   0.06323   0.05705   0.05504   0.05405   0.05346
(2)      0.04065   0.03645   0.03405   0.03322   0.03280   0.03255
(3)      0.00072   0.00130   0.00174   0.00192   0.00201   0.00207
r = 4
(1)      0.07695   0.06468   0.05889   0.05703   0.05612   0.05557
(2)      0.04673   0.04247   0.03999   0.03912   0.03869   0.03842
(3)      0.00298   0.00424   0.00505   0.00534   0.00549   0.00558
5. MLEs of location and scale parameters

Consider a random sample of size n from the reparametrized distribution with p.d.f. g(y) and c.d.f. G(y) as given in (4.1) and (4.2), respectively, where the shape parameter α is assumed to be known. Let Y_{1:n} ≤ Y_{2:n} ≤ ... ≤ Y_{n-r:n} be a Type-II right-censored sample obtained from the random sample. The likelihood function based on such a Type-II right-censored sample is then given by

L = (n!/r!) [1 - G(y_{n-r:n})]^r \prod_{i=1}^{n-r} g(y_{i:n}) .   (5.1)

We then obtain the likelihood estimating equations as

∂ln L/∂μ = (√(2α)/σ) { (n - r) - 2 \sum_{i=1}^{n-r} e^{-√(2/α)(y_{i:n}-μ)/σ} / [1 + e^{-√(2/α)(y_{i:n}-μ)/σ}] } + r h(y_{n-r:n}) = 0   (5.2)

and

∂ln L/∂σ = -(n - r)/σ + (√(2α)/σ²) \sum_{i=1}^{n-r} (y_{i:n} - μ)
         - (2√(2α)/σ²) \sum_{i=1}^{n-r} (y_{i:n} - μ) e^{-√(2/α)(y_{i:n}-μ)/σ} / [1 + e^{-√(2/α)(y_{i:n}-μ)/σ}]
         + [r (y_{n-r:n} - μ)/σ] h(y_{n-r:n}) = 0 ,   (5.3)

where h(t) = g(t)/[1 - G(t)] is the hazard function. The maximum likelihood estimators μ̂ and σ̂ are then obtained by solving (5.2) and (5.3) simultaneously. Since these two equations cannot be solved analytically, some numerical method must be employed. In order to study the performance of these estimators, we generated 5,000 pseudorandom samples of size n = 20 from the standard reparametrized population (with μ = 0 and σ = 1) with choices of α = 0.5(0.5)3(1)6(2)12. The IMSL routines DNEQNJ and DNEQNF were employed to solve the maximum likelihood estimating equations. These routines are based on the MINPACK subroutines HYBRDJ and HYBRD1, respectively, which use a modification of M. J. D. Powell's hybrid algorithm, a variation of Newton's method that takes precautions to avoid large step sizes or increasing residuals. We then determined the bias, variances, covariance and mean square errors of the maximum likelihood estimators μ̂ and σ̂ for r = 0, 2, 4. The values of Bias(μ̂)/σ, Bias(σ̂)/σ, Var(μ̂)/σ², Var(σ̂)/σ², Cov(μ̂, σ̂)/σ², MSE(μ̂)/σ² and MSE(σ̂)/σ² determined through this Monte Carlo process are presented in Table 7.
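Equivalently to root-finding on (5.2)-(5.3), one can minimize the negative log-likelihood from (5.1) directly. The sketch below does this in Python with SciPy in place of the IMSL/MINPACK routines; the density follows the reconstructed form of (4.1), so the √(2α) and √(2/α) factors, and the Beta-logit simulation recipe, should be read as assumptions carried over from that reconstruction:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betainc, betaln

def neg_log_lik(theta, y_sorted, alpha, r=0):
    """Negative log-likelihood (5.1) for a Type-II right-censored sample:
    the largest r order statistics are censored at y_{n-r:n}."""
    mu, sigma = theta
    if sigma <= 0:
        return np.inf
    obs = y_sorted[:len(y_sorted) - r] if r else y_sorted
    w = (obs - mu) / sigma
    c = np.sqrt(2.0 / alpha)
    ll = len(obs) * (np.log(c) - np.log(sigma) - betaln(alpha, alpha))
    ll += np.sum(-np.sqrt(2.0 * alpha) * w - 2.0 * alpha * np.log1p(np.exp(-c * w)))
    if r:
        p = 1.0 / (1.0 + np.exp(-c * w[-1]))
        ll += r * np.log1p(-betainc(alpha, alpha, p))   # r * log(1 - G)
    return -ll

# Simulate: if B ~ Beta(alpha, alpha), then sqrt(alpha/2) * logit(B) follows
# the standard reparametrized distribution (assumed form).
rng = np.random.default_rng(1)
alpha, mu0, s0, n = 2.0, 25.0, 5.0, 2000
b = rng.beta(alpha, alpha, n)
y = np.sort(mu0 + s0 * np.sqrt(alpha / 2.0) * np.log(b / (1.0 - b)))
res = minimize(neg_log_lik, x0=[y.mean(), y.std()], args=(y, alpha, 0),
               method="Nelder-Mead")
mu_hat, s_hat = res.x
```

With n = 2000 the estimates land within a few hundredths of the true (μ, σ); setting r > 0 exercises the censored term of (5.1).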
Table 7
Simulated values of (1) Bias(μ̂)/σ, (2) Bias(σ̂)/σ, (3) Var(μ̂)/σ², (4) Var(σ̂)/σ², (5) Cov(μ̂, σ̂)/σ², (6) MSE(μ̂)/σ² and (7) MSE(σ̂)/σ²

n   r     (1)        (2)        (3)       (4)       (5)        (6)       (7)
α = 1.0
20  0   -0.00128   -0.02477   0.07287   0.03434   -0.00015   0.07287   0.03495
20  2   -0.00321   -0.02924   0.07314   0.03768    0.00075   0.07315   0.03853
20  4   -0.00823   -0.03704   0.07403   0.04310    0.00286   0.07409   0.04447
α = 2.0
20  0   -0.00503   -0.03186   0.06178   0.03068   -0.00115   0.06181   0.03170
20  2   -0.00783   -0.03726   0.06197   0.03438   -0.00034   0.06203   0.03577
20  4   -0.01261   -0.04341   0.06324   0.03890    0.00201   0.06339   0.04079
α = 4.0
20  0    0.00361   -0.03255   0.05800   0.02755    0.00052   0.05801   0.02861
20  2   -0.00031   -0.03974   0.05858   0.03126    0.00198   0.05858   0.03284
20  4   -0.00611   -0.04742   0.06064   0.03613    0.00513   0.06068   0.03838
α = 6.0
20  0    0.00381   -0.03415   0.05538   0.02725    0.00000   0.05540   0.02842
20  2    0.00071   -0.03912   0.05622   0.03107    0.00170   0.05622   0.03260
20  4   -0.00412   -0.04499   0.05824   0.03635    0.00490   0.05825   0.03837
α = 8.0
20  0    0.00077   -0.03562   0.05477   0.02691   -0.00020   0.05477   0.02818
20  2   -0.00217   -0.04007   0.05544   0.03080    0.00135   0.05545   0.03240
20  4   -0.00786   -0.04761   0.05722   0.03573    0.00429   0.05728   0.03800
α = 10.0
20  0    0.00162   -0.03993   0.05170   0.02576   -0.00115   0.05170   0.02735
20  2   -0.00242   -0.04709   0.05282   0.02960    0.00085   0.05282   0.03182
20  4   -0.00791   -0.05418   0.05466   0.03434    0.00377   0.05472   0.03728
6. Comparison of the BLUEs with the MLEs

In order to compare the performance of the BLUEs and the MLEs, we define the relative efficiency between the two methods of estimation as follows:

Eff(μ) = [MSE(μ̂) / Var(μ*)] × 100   (6.1)

and

Eff(σ) = [MSE(σ̂) / Var(σ*)] × 100 .   (6.2)

We interpret the value of Eff(μ) as follows:
• if Eff(μ) > 100, then we conclude that the estimation of μ based on the BLUEs is more efficient than that based on the MLEs;
• if Eff(μ) < 100, then we conclude that the estimation of μ based on the MLEs is more efficient than that based on the BLUEs.
The interpretation of Eff(σ) follows similarly.
We computed the values of Eff(μ) and Eff(σ) for sample size n = 20 and r = 0, 2, 4 with choices of α = 1.0, 2.0(2)10.0. These values are presented in Table 8. As we observe from Table 8, the values of Eff(μ) are quite close to 100, which indicates that μ* and μ̂ have about the same efficiency in estimating μ. On the other hand, σ̂ is slightly more efficient than σ* in estimating σ, as indicated by the values of Eff(σ).
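Each Table 8 entry is simply the ratio of a Table 7 mean square error to a Table 6 BLUE variance; for example, for α = 1.0 and r = 0:

```python
def eff(mse_mle, var_blue):
    """Relative efficiency (6.1)-(6.2): 100 * MSE(MLE) / Var(BLUE)."""
    return 100.0 * mse_mle / var_blue

# alpha = 1.0, r = 0: MSE(mu_hat)/sigma^2 = 0.07287 (Table 7) and
# Var(mu*)/sigma^2 = 0.07595 (Table 6) give Eff(mu) = 95.94 (Table 8).
eff_mu = eff(0.07287, 0.07595)
eff_sigma = eff(0.03495, 0.03660)
```

The same ratio reproduces the other entries, e.g. eff(0.05801, 0.05635) gives the 102.95 reported for α = 4.0, r = 0.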
7. Illustrative examples

We shall now present two examples in order to illustrate the practical use of Tables 4-7.

EXAMPLE 1. The following ordered sample of size 20 was simulated from the reparametrized distribution with location parameter μ = 25, scale parameter σ = 5 and shape parameter α = 2.0:

17.0914  17.3896  18.4054  18.4836  18.5006  20.6758  21.0104  23.2186  23.3792  24.4208
26.0103  27.1519  28.7569  28.8479  28.9735  29.9410  30.5459  30.6616  32.2781  33.5083
Assume that the data are from the reparametrized distribution with p.d.f. (4.1) and α = 1.0, 2.0, 4.0. Q-Q plots of the data in Figure 2 indicate that these models all appear reasonably consonant with the data, and hence they will be used here to illustrate both methods of estimation. We should mention here that the correlation values from the Q-Q plots are at least 0.96; and based on 1,000 simulated samples, we found the p-values to be at least 0.33. From Tables 4-6, we obtained the necessary coefficients a_i and b_i for the BLUEs μ* and σ* and the values of Var(μ*)/σ² and Var(σ*)/σ² for the choices of α mentioned above. Also, we obtained the values of Var(μ̂)/σ² and Var(σ̂)/σ² from Table 7. By making use of these values and the numerical method described in Section 5, we computed the BLUEs and MLEs of μ and σ, and also their standard errors, for complete as well as Type-II right-censored samples with r = 0, 2, 4, respectively. These findings are summarized in Table 9.

Table 8
Relative efficiency between the BLUEs and the MLEs for n = 20

           α = 1.0  α = 2.0  α = 4.0  α = 6.0  α = 8.0  α = 10.0
r = 0
Eff(μ)      95.94    98.39   102.95   102.20   103.05    98.44
Eff(σ)      95.49    98.33    96.59    99.02    99.79    97.82
r = 2
Eff(μ)      96.14    98.10   102.68   102.14   102.59    98.80
Eff(σ)      94.78    98.13    96.45    98.13    98.78    97.76
r = 4
Eff(μ)      96.28    98.01   103.04   102.14   102.07    98.47
Eff(σ)      95.16    96.04    95.97    98.08    98.22    97.03
[Figure 2: Q-Q plots of the Example 1 data for α = 1.0, 2.0 and 4.0. Reported values: α = 1.0, p-value 0.3330; α = 2.0, ρ = 0.9736, p-value 0.4010; α = 4.0, ρ = 0.9754, p-value 0.3040.]

Fig. 2. Q-Q plots for Example 1.
Table 9
The BLUEs and MLEs of μ and σ for Example 1

α      μ*        S.E.(μ*)   σ*       S.E.(σ*)   μ̂        S.E.(μ̂)   σ̂       S.E.(σ̂)
r = 0
1.0   25.0255    1.2916    4.6868    0.8966    25.0501    1.2373    4.5834   0.8494
2.0   24.9811    1.2609    5.0307    0.9033    25.0001    1.2203    4.9097   0.8600
4.0   24.9672    1.2358    5.2061    0.8960    24.9800    1.2235    5.0803   0.8432
r = 2
1.0   25.0511    1.3417    4.8641    0.9807    25.0662    1.2787    4.7280   0.9178
2.0   25.0257    1.3077    5.2007    0.9929    25.0350    1.2567    5.0483   0.9360
4.0   25.0240    1.2828    5.3705    0.9910    25.0245    1.2625    5.2163   0.9223
r = 4
1.0   25.1902    1.4581    5.2565    1.1363    25.1996    1.3731    5.0465   1.0477
2.0   25.2207    1.4273    5.6123    1.1566    25.2160    1.3537    5.3830   1.0617
4.0   25.2509    1.4047    5.7885    1.1576    25.2306    1.3693    5.5606   1.0570
From Table 9, we can see that the BLUEs as well as the MLEs of μ and σ are close to the true values of 25 and 5 for complete as well as censored samples for all values of α considered. Besides, we also notice that the standard errors of the MLEs are all slightly smaller than those of the BLUEs.

EXAMPLE 2. The following ordered sample of size 20 was simulated from the reparametrized distribution with location parameter μ = 75, scale parameter σ = 10 and shape parameter α = 8.0:

56.6439  58.5165  62.1652  66.8842  67.0852  68.6479  70.8896  71.1141  71.7507  72.1389
72.1936  74.2165  77.7925  78.6317  79.5452  84.5838  86.6567  86.9632  88.7370  91.5514
Assume that the data are from the reparametrized distribution with p.d.f. (4.1) and α = 6.0, 8.0, 10.0. Once again, Q-Q plots of the data in Figure 3 indicate that these models appear to fit the data very well, and the correlation values from the Q-Q plots are at least 0.98. Based on 1,000 simulated samples, we found the p-values to be at least 0.81, thus providing strong evidence towards the appropriateness of the models considered here. Following the procedure as in Example 1, we computed the BLUEs and MLEs of μ and σ, and also their standard errors, for the complete sample as well as Type-II right censored samples with r = 0, 2, 4 and the choices of α mentioned above. These findings are summarized in Table 10. Once again, the BLUEs and the MLEs of μ and σ are close to the true values of 75 and 10 for complete as well as censored samples for all values of α considered. Also, the standard errors of the MLEs are all slightly smaller than those of the BLUEs.
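The Q-Q correlation measure used in these examples can be sketched in code. The snippet below is our own illustration, not the authors' program: it correlates the ordered sample with reference quantiles at pᵢ = i/(n + 1) (an assumed plotting position), and for simplicity uses the standard logistic quantile function in place of the fitted Type III generalized logistic.

```python
# Illustrative sketch of a Q-Q correlation coefficient: correlate the
# ordered sample with reference quantiles. The logistic quantile
# function stands in for the fitted distribution (our assumption).
import math

def qq_correlation(sample, quantile_fn):
    x = sorted(sample)
    n = len(x)
    q = [quantile_fn(i / (n + 1.0)) for i in range(1, n + 1)]
    mx, mq = sum(x) / n, sum(q) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sqq = sum((v - mq) ** 2 for v in q)
    sxq = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    return sxq / math.sqrt(sxx * sqq)

logistic_quantile = lambda p: math.log(p / (1.0 - p))
data = [56.6439, 58.5165, 62.1652, 66.8842, 67.0852, 68.6479, 70.8896,
        71.1141, 71.7507, 72.1389, 72.1936, 74.2165, 77.7925, 78.6317,
        79.5452, 84.5838, 86.6567, 86.9632, 88.7370, 91.5514]
rho = qq_correlation(data, logistic_quantile)
```

A high value of rho (near 1) corresponds to the near-linear Q-Q plots reported above; the p-values in the text were obtained by simulating the null distribution of rho.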
Fig. 3. Q-Q plots for Example 2. [Three panels of quantile-quantile plots against sample quantiles: α = 6.0 (rho = 0.9888, p-value = 0.8190); α = 8.0 (rho = 0.9888, p-value = 0.8190); α = 10.0 (rho = 0.9889, p-value = 0.8260).]
Table 10
The BLUEs and MLEs of μ and σ for Example 2

α       μ*       S.E.(μ*)  σ*       S.E.(σ*)   μ̂        S.E.(μ̂)  σ̂        S.E.(σ̂)
r = 0
6.0     74.3274  2.2760    9.7753   1.6560     74.3187  2.2060   9.3739   1.5474
8.0     74.3299  2.2730    9.8595   1.6569     74.3234  2.2160   9.4457   1.5495
10.0    74.3313  2.2712    9.9106   1.6572     74.3261  2.1576   9.4890   1.5230
r = 2
6.0     74.5044  2.3924    10.1974  1.8586     74.4582  2.3048   9.7204   1.7134
8.0     74.5143  2.3906    10.2829  1.8623     74.4685  2.3059   9.7933   1.7187
10.0    74.5201  2.3895    10.3346  1.8645     74.4746  2.2608   9.8372   1.6925
r = 4
6.0     74.7030  2.5249    10.5728  2.0912     74.5983  2.4176   10.0177  1.9099
8.0     74.7185  2.5252    10.6597  2.0967     74.6131  2.4140   10.0918  1.9076
10.0    74.7277  2.5253    10.7124  2.0997     74.6218  2.3699   10.1366  1.8784
References

Abramowitz, M. and I. A. Stegun (Eds.) (1965). Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables. Dover Publications, New York.
Arnold, B. C., N. Balakrishnan and H. N. Nagaraja (1992). A First Course in Order Statistics. John Wiley & Sons, New York.
Balakrishnan, N. (Ed.) (1992). Handbook of the Logistic Distribution. Marcel Dekker, New York.
Balakrishnan, N. and M. Y. Leung (1988). Order statistics from the Type I generalized logistic distribution. Comm. Statist. - Sim. Comput. 17(1), 25-50.
Berkson, J. (1944). Application of the logistic function to bioassay. J. Amer. Statist. Assoc. 37, 357-365.
Berkson, J. (1951). Why I prefer logits to probits. Biometrics 7, 327-339.
Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. J. Amer. Statist. Assoc. 48, 565-599.
David, H. A. (1981). Order Statistics, Second edition. John Wiley & Sons, New York.
Davidson, R. R. (1980). Some properties of a family of generalized logistic distributions. In Statistical Climatology, Developments in Atmospheric Science, 13 (Eds., S. Ikeda et al.). Elsevier, Amsterdam.
Dubey, S. D. (1969). A new derivation of the logistic distribution. Nav. Res. Log. Quart. 16, 37-40.
Finney, D. J. (1947). The principles of biological assay. J. Roy. Statist. Soc. Series B 9, 46-91.
Finney, D. J. (1952). Statistical Methods in Biological Assay. Hafner, New York.
Gumbel, E. J. (1944). Ranges and midranges. Ann. Math. Statist. 15, 414-422.
Gupta, S. S. (1960). Order statistics from the gamma distribution. Technometrics 2, 243-262.
Oliver, F. R. (1982). Notes on the logistic curve for human populations. J. Roy. Statist. Soc. Series A 145, 359-363.
Pearl, R. and L. J. Reed (1920). On the rate of growth of the population of the United States since 1790 and its mathematical representation. Proc. Natl. Acad. Sci. 6, 275-288.
Pearl, R. and L. J. Reed (1924). Studies in Human Biology. Williams and Wilkins, Baltimore.
Pearl, R., L. J. Reed and J. F. Kish (1940). The logistic curve and the census count of 1940. Science 92, 486-488.
Prentice, R. L. (1976). A generalization of the probit and logit methods for dose response curves. Biometrics 32, 761-768.
Reed, L. J. and J. Berkson (1929). The application of the logistic function to experimental data. J. Phys. Chem. 33, 760-779.
Schultz, H. (1930). The standard error of a forecast from a curve. J. Amer. Statist. Assoc. 25, 139-185.
Shah, B. K. (1966). On the bivariate moments of order statistics from a logistic distribution. Ann. Math. Statist. 37, 1002-1010.
Verhulst, P. J. (1838). Notice sur la loi que la population suit dans son accroissement. Corr. Math. et Physique 10, 113-121.
Verhulst, P. J. (1845). Recherches mathématiques sur la loi d'accroissement de la population. Académie de Bruxelles 18, 1-38.
Wilson, E. B. and J. Worcester (1943). The determination of L.D. 50 and its sampling error in bioassay. Proc. Natl. Acad. Sci. 29, 79-85.
Zelterman, D. and N. Balakrishnan (1992). Univariate generalized distributions. In Handbook of the Logistic Distribution (Ed., N. Balakrishnan), pp. 209-221. Marcel Dekker, New York.
Part II
Linear Estimation
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Estimation of Scale Parameter Based on a Fixed Set of Order Statistics
Sanat K. Sarkar and Wenjin Wang
1. Introduction
Suppose that F(x) is a known absolutely continuous distribution function which is free of any unknown parameters. Then, for any real μ and σ (> 0), F((x − μ)/σ) is called a location-scale distribution with location parameter μ and scale parameter σ, generated by the distribution function F(x). Let C(F) denote the family of all location-scale distributions generated by F(x), that is,

C(F) = { F((x − μ)/σ) : −∞ < μ < ∞, σ > 0 } .
The density function f of F is called the generating density function for the family C(F). Any parameter-free distribution can generate a family of location-scale distributions. Some important generating distribution functions are the following:

1. The standard exponential distribution: F(x) = 1 − e^{−x} (x > 0);
2. The standard normal distribution Φ(x) with density function φ(x) = (1/√(2π)) e^{−x²/2};
3. The standard logistic distribution: L(x) = 1/(1 + e^{−x});
4. The uniform distribution with density f(x) = 1 if −1/2 < x < 1/2 and f(x) = 0 otherwise;
5. The Weibull distribution with known shape parameter b (> 0): F(x) = 1 − e^{−x^b} (x > 0);
6. The Gamma distribution with density function g(x) = x^{r−1} e^{−x}/Γ(r) (x > 0), where r is a known positive constant.
These generating distributions generate different families of location-scale distributions, except that the standard exponential distribution is a special case of the Weibull distribution with b = 1 and also a special case of the Gamma distribution with r = 1. These location-scale families have wide applications and have been studied frequently in the literature. Take, for example, the logistic distribution. It has been applied in studies of population growth, physicochemical phenomena, bio-assay, mental ability, survival data, and agricultural production data (see the references given by Harter and Moore 1967; Balakrishnan 1992).
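The location-scale construction above is straightforward to express in code. The following sketch is our illustration (the helper names are not from the chapter): it builds a member of C(F) from the standard logistic generating distribution.

```python
# Sketch: building a location-scale family C(F) from a parameter-free
# generating distribution F. Helper names are illustrative.
import math

def logistic_cdf(x):
    """Standard logistic generating distribution L(x) = 1/(1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def location_scale_cdf(F, mu, sigma):
    """Member F((x - mu)/sigma) of the family C(F); requires sigma > 0."""
    if sigma <= 0:
        raise ValueError("the scale parameter must be positive")
    return lambda x: F((x - mu) / sigma)

G = location_scale_cdf(logistic_cdf, mu=75.0, sigma=10.0)
print(G(75.0))  # 0.5: at the location parameter the logistic CDF is 1/2
```

Any parameter-free F (normal, exponential, Weibull, and so on) can be plugged in for `logistic_cdf` in the same way.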
The primary goal of this article is to present some new results on estimation of the scale parameter of a family of location-scale distributions based on a fixed set of order statistics. Let

X₁ ≤ X₂ ≤ ⋯ ≤ Xₙ  (1.1)

be the n observed order statistics from a random sample of size N (> n) drawn from a population whose cumulative distribution function is a member of C(F). These order statistics are any fixed set of order statistics selected from the sample, not necessarily the first n observations. They can be singly or doubly Type II censored observations, or any other Type II censored data. Throughout this chapter, we will use n to denote the number of observed order statistics and N to denote the sample size which, for simplicity, is not indicated in the notations. When n = N, the sample in (1.1) is a complete sample. Let Yᵢ = (Xᵢ − μ)/σ (i = 1, …, n). Then, the Yᵢ's are the standardized order statistics, which can be considered as n observed order statistics for a sample of size N drawn from a population with distribution function F. Let aᵢ denote the expected value of Yᵢ, and qᵢⱼ denote the covariance between Yᵢ and Yⱼ (i ≠ j) or the variance of Yᵢ (i = j). It is clear that these aᵢ's and qᵢⱼ's do not depend on the location and scale parameters; they depend only on the form of F(x). Let

X = (X₁, …, Xₙ)′,  a = (a₁, …, aₙ)′,  and  Q = (qᵢⱼ) .  (1.2)

Then, we have

E(X) = μe + σa,  Cov(X) = σ²Q ,  (1.3)
where e is an n-dimensional vector of unit entries. Model (1.3) can be used to obtain some estimators of the scale parameter. First note that the scale parameter is one of the regression coefficients in model (1.3). By applying the theory of linear models, one is able to obtain the best unbiased linear estimator of the scale parameter, as done by Lloyd (1952). This estimator, as well as the simplified "unbiased nearly best" and "nearly unbiased, nearly best" linear estimators proposed by Blom (1958, 1962), will be reviewed in Section 2. Since the scale parameter σ in model (1.3) is known to be positive, it is a natural requirement that any good estimator of σ should possess the same property. Is Lloyd's estimator always positive with probability one? In Leslie, Stephens and Fotopoulos (1986), the authors noted that it is difficult to establish this positivity. They had to assume, like Sarkadi (1981), that it is always positive in order to prove the consistency of Shapiro-Wilk's (1965) W-statistic for testing normality, which involves this estimator. In the textbook on order statistics by Arnold, Balakrishnan, and Nagaraja (1992), the authors made a conjecture, based on empirical evidence, that Lloyd's estimator is positive with probability one. Recently, Bai, Sarkar, and Wang (1996) gave an affirmative analytical answer to this conjecture for the first time. They proved that Lloyd's estimator is positive with probability one when the underlying generating density is log-concave, a property shared by many of the commonly used
distributions. A similar result holds for Blom's L-estimators. Their results are stated in Section 3. Observing that in model (1.3) the scale parameter σ appears not only as a regression coefficient but also as a common component of the variance-covariance matrix of X, in Section 4 we propose three non-linear estimators of the scale parameter which involve quadratic functions of the given set of order statistics. One is based on the positive square root of the residual sum of squares obtained by fitting model (1.3). Shapiro and Wilk (1965) and Nelson and Hahn (1972, 1973) mentioned such an estimator. They, however, did not offer any detailed investigation of this estimator, either in those papers or in any subsequent publications. We attempt to study this estimator in this paper. In particular, we provide simplified and explicit representations of such an estimator for some specific distributions, such as the uniform and exponential. The second estimator is the best linear combination of the best unbiased L-estimator and the positive square root of the aforementioned residual sum of squares. This combined estimator is an improvement over the best unbiased L-estimator in the sense of having smaller mean squared error. Through simulation, the amount of improvement is shown to be quite significant for several generating distributions. Constants needed to compute the combined estimator are calculated based on simulation for some generating distributions and presented in tables. These simulation results are also reported in Section 4. The third non-linear estimator proposed in Section 4 is determined by minimizing an objective function which is slightly different from, but more meaningful in the present context than, the one used in the usual weighted least squares approach for Lloyd's best unbiased L-estimator.
In Section 5, we extend the positivity of the best unbiased L-estimator to censored scale regression model which involves several populations with location parameters being linear functions of some covariates.
2. Linear estimators
This section is devoted to L-estimators of the scale parameter. An L-estimator is an estimator which is linear in X. Estimators of this type appear very intuitive when the family of distributions under consideration depends on location and scale parameters. They are attractive in the case of censored samples, and often provide quick measures of location and scale, especially when the number of observed order statistics is relatively small. In the literature, many L-estimators have been introduced to estimate the scale parameter in different situations. Results related to some important L-estimators are reviewed here.

2.1. The best unbiased L-estimator
Lloyd (1952) first applied the least-squares theory originally developed by Aitken (1935) to the model (1.3) and obtained the best unbiased L-estimators of μ and σ.
According to the Gauss-Markov theorem, the unique best linear unbiased estimators of μ and σ are the weighted least-squares (WLS) estimators given by

(μ̂, σ̂)′ = (A′Q⁻¹A)⁻¹A′Q⁻¹X ,  (2.1)

where A = (e, a) and X, a, Q are as given in (1.2). That is,

μ̂ = −a′Q⁻¹(ea′ − ae′)Q⁻¹X / [(e′Q⁻¹e)(a′Q⁻¹a) − (e′Q⁻¹a)²] ,  (2.2)

σ̂ = e′Q⁻¹(ea′ − ae′)Q⁻¹X / [(e′Q⁻¹e)(a′Q⁻¹a) − (e′Q⁻¹a)²] .  (2.3)

It can be shown that the variances of the best unbiased L-estimators are as follows:

var(μ̂) = σ² a′Q⁻¹a / [(e′Q⁻¹e)(a′Q⁻¹a) − (e′Q⁻¹a)²] ,

var(σ̂) = σ² e′Q⁻¹e / [(e′Q⁻¹e)(a′Q⁻¹a) − (e′Q⁻¹a)²] .  (2.4)
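The weighted least-squares computation in (2.1) is a one-liner in matrix software. The sketch below is our illustration, under stated assumptions: `lloyd_blue` is a hypothetical helper name, and the check uses the standard moments of order statistics from the uniform density on (−1/2, 1/2) with n = N = 2, for which the BLUE of σ reduces to 3(x₂ − x₁).

```python
# Sketch of Lloyd's estimators (2.1)-(2.3); lloyd_blue is an
# illustrative name, not from the chapter.
import numpy as np

def lloyd_blue(x, a, Q):
    """Best linear unbiased (mu, sigma) from observed order statistics x,
    standardized means a, and covariance matrix Q, via (2.1)."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    A = np.column_stack([np.ones_like(a), a])  # A = (e, a)
    Qinv = np.linalg.inv(np.asarray(Q, dtype=float))
    mu_hat, sigma_hat = np.linalg.solve(A.T @ Qinv @ A, A.T @ Qinv @ x)
    return mu_hat, sigma_hat

# Uniform on (-1/2, 1/2), n = N = 2: a = (-1/6, 1/6),
# Q = [[1/18, 1/36], [1/36, 1/18]], so sigma_hat = 3(x2 - x1).
mu_hat, sigma_hat = lloyd_blue([-0.1, 0.3], [-1/6, 1/6],
                               [[1/18, 1/36], [1/36, 1/18]])
print(mu_hat, sigma_hat)  # approximately 0.1 and 1.2
```

For symmetric, complete (or symmetrically censored) cases, e′Q⁻¹a = 0 and the same routine reproduces the simplified forms in (2.5).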
If the generating density function f(x) is symmetric about the origin and the sample is either complete or, if there is censoring, the censoring is symmetric about the median, then it can be shown that e′Q⁻¹a = 0 (cf. David 1981; Arnold, Balakrishnan, and Nagaraja 1992). In this symmetric case, the weighted least-squares estimators given in (2.2) and (2.3) can be further simplified as

μ̂ = e′Q⁻¹X / e′Q⁻¹e ,  σ̂ = a′Q⁻¹X / a′Q⁻¹a ,  (2.5)

and the corresponding variances are

var(μ̂) = σ² / e′Q⁻¹e ,  var(σ̂) = σ² / a′Q⁻¹a .  (2.6)
The best unbiased L-estimators based on the n fixed order statistics, given in (2.2) and (2.3), can also be obtained directly by minimizing variances. Let us consider the estimation of σ. Suppose that p′X is an unbiased estimator. The unbiasedness leads to two conditions on p: p′e = 0 and p′a = 1. The variance of p′X is

E(p′X − σ)² = p′(σ²Q + (μe + σa)(μe + σa)′)p − σ² = σ² p′Qp .

Thus, the best unbiased linear estimator is the one that minimizes p′Qp subject to the aforementioned conditions, or equivalently, the one obtained by minimizing p′Qp − 2λp′e − 2δ(p′a − 1) with respect to p and using the conditions on p. By taking the first partial derivative with respect to p and equating it to zero, we have

Qp − λe − δa = 0 .

Hence

p = Q⁻¹(λe + δa) ,  (2.7)

where

λ e′Q⁻¹e + δ e′Q⁻¹a = 0 ,  λ a′Q⁻¹e + δ a′Q⁻¹a = 1 ,
because of the conditions on p. Solving for λ and δ from the last two equations and putting the solutions back into (2.7), we get the best choice of the coefficient vector p, and hence the best unbiased L-estimator of σ as given in (2.3). Lloyd's estimators are highly efficient. For some location-scale families, such as those generated by distributions 1 and 4 mentioned in Section 1, they are in fact the uniformly minimum variance unbiased estimators (UMVUE) (see Arnold, Balakrishnan, and Nagaraja 1992). The process of computing the best unbiased L-estimators is often easier than that for some other types of estimators, such as the maximum likelihood estimators, because for most generating distributions these latter estimators do not have closed forms and have to be obtained through numerical methods. Since the coefficient vector p depends only on the generating distribution function, not on the parameters, tables are available for many commonly used location-scale families. See Sarhan and Greenberg (1962) and references given in David (1981) and Balakrishnan and Cohen (1991). The scale estimator given in (2.3) is often used in the context of life testing and reliability (Lawless 1982; Harter 1988; David 1981; Arnold, Balakrishnan, and Nagaraja 1992; Nelson and Hahn 1972, 1973). Another important application of σ̂ is in testing of distributional assumptions. For a non-censored sample, Shapiro and Wilk (1965) compared this best unbiased L-estimator with the usual sample sum of squares about the sample mean and constructed the well-known W statistic for testing for normality, which is

W = (a′Q⁻¹a)² σ̂² / [ (a′Q⁻²a) Σᵢ₌₁ᴺ (Xᵢ − X̄)² ] ,

where Xᵢ is the ith order statistic in the sample (1.1) without censoring and X̄ is the sample average. The distribution of the W statistic does not depend on the parameters, and its percentage points can be obtained through simulations or approximations (cf. Shapiro and Wilk 1968) or by use of some asymptotic results (cf. Leslie, Stephens and Fotopoulos 1986). Recently, Parrish (1992) gave new tables of coefficients and percentage points for the W test. The same technique has also been adapted to develop a test for exponentiality (Shapiro and Wilk 1972). The W test has been shown through extensive simulations to be very powerful against a wide class of alternative distributions (Shapiro, Wilk, and Chen 1968; Gan and Koehler 1990). The W statistic is also useful for identifying outliers (Barnett and Lewis 1984; Iglewicz and Hoaglin 1993).
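For complete samples, implementations of the W test are widely available in standard software. The illustration below is ours and assumes SciPy is at hand; `scipy.stats.shapiro` computes the complete-sample normality test, which corresponds to the statistic above for the normal generating distribution.

```python
# Quick illustration of the W test for normality on a complete sample,
# using SciPy's implementation (assumed available).
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
x = rng.normal(loc=25.0, scale=5.0, size=20)  # simulated normal data
W, p_value = shapiro(x)
print(W, p_value)  # W close to 1 suggests no evidence against normality
```

For censored samples the coefficients and percentage points differ, which is the case Shapiro and Wilk (1965) addressed with the residual quantity R² discussed in Section 4.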
2.2. Nearly best L-estimators
The estimator given in (2.3) requires full knowledge of a and Q, the expected vector and the variance-covariance matrix of the given set of order statistics. However, the explicit forms of the expectations and the covariances of order statistics are known only for a few generating distributions, such as the exponential, uniform, power, and Pareto, and for the normal distribution when the sample size does not exceed 5 (see, for example, David 1981; Arnold, Balakrishnan, and Nagaraja 1992). For most generating distributions, we need extensive numerical calculations to obtain the exact values of the expectations and covariances of order statistics. This may be computationally laborious for large sample sizes. Due to this difficulty, Gupta (1952) suggested a very simple method which simply takes Q = I, the n × n unit matrix, in (2.3). The resulting estimator appears to be the usual (unweighted) least-squares estimator of the scale parameter in model (1.3). This estimator eliminates the need to compute the covariances between order statistics. It has been noted that this simplified linear estimator of σ gives surprisingly good results, at least in the case of the normal distribution (Chernoff and Lieberman 1954; Ali and Chan 1964). Unlike Gupta's method, Blom (1958, 1962) proposed an "unbiased nearly best" linear estimator of the scale parameter, denoted by σ̂_approx. This estimator is obtained by replacing the qᵢⱼ's in (2.3) by their approximations, where qᵢⱼ is the (i, j)th entry of Q. To be more specific, suppose that Xᵢ in (1.1) is the qᵢth order statistic based on the random sample of size N and define pᵢ = qᵢ/(N + 1); then we have, for large N (David and Johnson 1954),

aᵢ ≈ F⁻¹(pᵢ)  and  qᵢⱼ ≈ pᵢ(1 − pⱼ) / [(N + 2) f(F⁻¹(pᵢ)) f(F⁻¹(pⱼ))] ,  i ≤ j ,  (2.8)
where F⁻¹(·) is the inverse of the generating distribution F and f is its density function. Then, σ̂_approx is obtained by replacing the elements of Q in (2.3) by the approximations given in (2.8) (see, e.g., Balakrishnan and Cohen 1991; David 1981). This approximate estimator is also asymptotically normal and relatively easy to compute (see Balakrishnan and Cohen 1991, p. 217, for the calculation of Q⁻¹). The positivity of this approximate estimator was also proved by Bai, Sarkar, and Wang (1996). Like Gupta's simplified estimator, the "unbiased nearly best" linear estimator still requires the exact expectations of the order statistics. However, if the unbiasedness is given up, one may also approximate the expectations aᵢ asymptotically by the expressions given in (2.8) and obtain a "nearly unbiased, nearly best" linear estimator of the scale parameter.

2.3. Other L-estimators
Many other L-estimators of the scale parameter have been introduced in the literature. For the interested reader, we refer to the books by David (1981) and by Balakrishnan and Cohen (1991).
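As an aside, the approximations in (2.8) are easy to code. The sketch below is our own illustration: `blom_approx` is a hypothetical helper, shown for the standard logistic generating distribution, and the min/max of the pᵢ are taken so that the pᵢ(1 − pⱼ) factor is applied with i ≤ j.

```python
# Sketch of the David-Johnson approximations (2.8); blom_approx is an
# illustrative name. ranks are the ranks q_i of the observed order
# statistics in the full sample of size N.
import math

def blom_approx(ranks, N, F_inv, f):
    p = [q / (N + 1.0) for q in ranks]
    a = [F_inv(pi) for pi in p]
    n = len(p)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lo, hi = min(p[i], p[j]), max(p[i], p[j])  # enforce i <= j ordering
            Q[i][j] = lo * (1.0 - hi) / (
                (N + 2) * f(F_inv(p[i])) * f(F_inv(p[j])))
    return a, Q

# Standard logistic: F^{-1}(p) = log(p/(1 - p)) and f(F^{-1}(p)) = p(1 - p).
F_inv = lambda p: math.log(p / (1.0 - p))
f = lambda x: math.exp(-x) / (1.0 + math.exp(-x)) ** 2
a, Q = blom_approx(ranks=[2, 3, 4], N=5, F_inv=F_inv, f=f)
print(a[1], Q[1][1])  # middle order statistic: a ~ 0, q ~ 4/7
```

Feeding these approximate a and Q into the routine for (2.3) yields Blom's "unbiased nearly best" (exact aᵢ) or "nearly unbiased, nearly best" (approximate aᵢ) estimators.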
3. The positivity of the best unbiased L-estimator

Being an estimator of a positive quantity, the best unbiased L-estimator σ̂ should also be positive. But is it, in general, positive with probability one? The answer to this question is not apparently clear, because it is a usual weighted least-squares estimator of a regression parameter which, even if the regression parameter is known to be positive, may not turn out to be always positive. The alternate approach to obtaining this best unbiased L-estimator, as shown in Section 2.1, also does not guarantee the positivity of the estimator. The positivity appears to depend on properties of the underlying generating distribution. It is easy to see that σ̂ is always positive when n = 2. Note that a necessary and sufficient condition for an L-statistic, p′X, to be an unbiased estimator of the scale parameter is that p′e = 0 and p′a = 1. If n = 2, that is, only two observed order statistics (X₁ ≤ X₂) are available, then there exists a unique unbiased L-estimator of the scale parameter, obtained by solving the equations p₁ + p₂ = 0 and p₁a₁ + p₂a₂ = 1, which is given by

σ̂ = (X₂ − X₁) / (a₂ − a₁) .

This estimator, of course, is also the best unbiased L-estimator, and clearly it is positive with probability one. In the case when n is greater than two, there exist numerous unbiased L-estimators of σ. Many of them can be negative with probability greater than zero. For example, when n = 3, all unbiased L-estimators can be expressed as

σ̂ = (1/(a₃ − a₁)) [ X₃ − X₁ + p₂( (a₃ − a₁)X₂ − (a₂ − a₁)X₃ − (a₃ − a₂)X₁ ) ] ,  (3.1)

where p₂ can be any real number. Obviously, in this case, not all unbiased L-estimators are positive with probability one. The best choice of p₂, which guarantees that σ̂ has the minimum variance, is related to the second moments of the order statistics X₁, X₂ and X₃. The positivity of the best unbiased L-estimator obtained in this way is not clear. As mentioned before, it seems to depend on the properties of the generating density function. If it is assumed that the generating density function is symmetric about the origin and that, among the three observed order statistics X₁ ≤ X₂ ≤ X₃, X₁ and X₃ are symmetrically placed with respect to the median X₂, then the estimator given in (3.1) with p₂ = 0 is the best unbiased L-estimator of σ, which again is indeed positive with probability one. In general, we have

THEOREM 1 (Bai, Sarkar and Wang, 1996). The best unbiased L-estimator of the scale parameter given in (2.3) is positive with probability one if at least one of the following conditions is satisfied:
1) The generating density function is log-concave;
2) n = 2;
3) n = 3, the generating density function is symmetric about the origin, and the censored sample is symmetric about the median.

Under the same conditions stated in the theorem, Blom's "unbiased, nearly best" and "nearly unbiased, nearly best" linear estimators, as given in Section 2.2, are also positive with probability one. This theorem is a special form of a more general result proved by Bai, Sarkar and Wang (1996) in the context described in Section 5. Note that the most important condition in the theorem is that the generating density function is log-concave. A non-negative real function f(x), defined on an interval subset A of the real line, is said to be log-concave in x on A if

f(x − y) f(x′ − y′) ≥ f(x − y′) f(x′ − y)  for all x ≤ x′, y ≤ y′, x, y, x′, y′ ∈ A .

An equivalent definition is that f(x) satisfies the inequality

f(αx + (1 − α)y) ≥ f^α(x) f^{1−α}(y) ,  for all 0 < α < 1 and x, y ∈ A .
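As a quick numerical illustration (our own sketch, not part of the chapter), the equivalent inequality can be spot-checked for the standard logistic density, one of the log-concave generating densities:

```python
# Numeric spot-check of log-concavity via the equivalent definition
# f(a*x + (1-a)*y) >= f(x)^a * f(y)^(1-a), for the standard logistic
# density. Illustrative only; a small tolerance absorbs rounding.
import math, random

def logistic_density(x):
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

random.seed(1)
ok = all(
    logistic_density(al * x + (1 - al) * y)
    >= logistic_density(x) ** al * logistic_density(y) ** (1 - al) - 1e-12
    for x, y, al in (
        (random.uniform(-5, 5), random.uniform(-5, 5), random.random())
        for _ in range(1000)
    )
)
print(ok)
```

Such a check of course proves nothing; it merely illustrates the inequality that the analytical log-concavity argument establishes.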
From the above definition, f(x) is a log-concave function on (−∞, ∞) if and only if f(x − y) is a TP₂ (totally positive of order 2) function on the real plane. Karlin (1968) gave a comprehensive treatment of TP₂ probability density functions, and Das Gupta and Sarkar (1982) obtained some general results providing interrelationships between TP₂ and log-concave properties of certain probability densities. The log-concavity property is shared by many commonly used generating density functions, such as the uniform, exponential, normal, gamma (with shape parameter greater than one), beta (with both parameters greater than one), double exponential, and logistic.

4. Nonlinear estimators
In this section, we introduce three non-linear estimators of the scale parameter σ.

4.1. An estimator based on residuals

Since, as mentioned in Section 1, the scale parameter σ appears not only as a regression coefficient but also as a parameter in the covariance matrix in model (1.3), it is possible to have an alternate estimator of σ through the weighted residual sum of squares obtained by fitting this model. Such a possibility was hinted at in Shapiro and Wilk (1965) and Nelson and Hahn (1972). Let

R = [ (X − Aβ̂)′ Q⁻¹ (X − Aβ̂) ]^{1/2} = [ X′(Q⁻¹ − Q⁻¹A(A′Q⁻¹A)⁻¹A′Q⁻¹)X ]^{1/2} ,  (4.1)
where β̂ = (μ̂, σ̂)′ and A are as given in (2.1). Since R²/(n − 2) is an unbiased estimator of σ², an estimator based on R seems reasonable for estimating σ. Shapiro and Wilk (1965) used R², instead of the sample variance, to derive a test for normality in the case of incomplete samples. However, they did not get into any details, either in that paper or in any subsequent publications. The idea of using R as an alternative for estimating σ was also mentioned by Nelson and Hahn (1972), without any details. Here, we attempt to study such an estimator in a little more detail. We consider the estimator with minimum mean squared error among the class of estimators {kR, 0 < k < ∞}. Let
(4.2)
where Y = (X - / r e ) / 6 . The distribution of r, which is same as that of R/6, depends only on the form of the generating distribution function F but not on parameters. It is easy to see that ^'~
6 R ~-
nE(@2R
(4.3)
has the minimum mean squared error in {kR, 0 < k < ∞}. The expectation of r can be simplified and given an explicit, closed-form representation for the uniform and exponential distributions. These are derived later in this section. Such a derivation for most other generating distribution functions is, however, difficult, and E(r) needs to be worked out using Monte Carlo simulation. Values of E(r) have been calculated in Tables 2 and 3 for some families of location-scale distributions and different sample sizes.
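The quantities in (4.1)-(4.3) can be sketched numerically. In the toy run below (our illustrative code, with hypothetical names), the moments a and Q of standard-normal order statistics and the constant E(r) are all estimated by simulation, as the text suggests for distributions without closed forms:

```python
# Sketch of R in (4.1) and the estimator in (4.3); a, Q and E(r) are
# estimated by Monte Carlo for a complete standard-normal sample.
import numpy as np

def residual_R(x, a, Q):
    """Positive square root of the weighted residual sum of squares (4.1)."""
    x = np.asarray(x, dtype=float)
    A = np.column_stack([np.ones_like(a), a])
    Qinv = np.linalg.inv(Q)
    M = Qinv - Qinv @ A @ np.linalg.inv(A.T @ Qinv @ A) @ A.T @ Qinv
    return float(np.sqrt(max(x @ M @ x, 0.0)))

rng = np.random.default_rng(0)
n = 5  # complete sample, n = N = 5
Y = np.sort(rng.standard_normal((20000, n)), axis=1)
a, Q = Y.mean(axis=0), np.cov(Y, rowvar=False)

r = np.array([residual_R(y, a, Q) for y in Y[:2000]])
E_r = r.mean()

def sigma_hat_R(x):
    """The estimator (4.3): E(r) R / (n - 2)."""
    return E_r * residual_R(x, a, Q) / (n - 2)

print(np.mean(r**2))  # sanity check: E(r^2) should be close to n - 2 = 3
```

With exact a and Q, E(r²) equals n − 2 exactly, which is why R²/(n − 2) is unbiased for σ².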
4.2. A combined estimator

Now that we have two different estimators of σ, one given in (2.3) and the other in (4.3), one might want to develop yet another estimator by taking the "best" linear combination of them. Let

z = e′Q⁻¹(ea′ − ae′)Q⁻¹Y / [ (e′Q⁻¹e)(a′Q⁻¹a) − (e′Q⁻¹a)² ] ,  (4.4)

and

C₁ = [E(r²) − E(r)E(zr)] / [E(r²)E(z²) − E²(zr)] = [(n − 2) − E(r)E(zr)] / [(n − 2)E(z²) − E²(zr)] ,  (4.5)

Cᵣ = [E(z²)E(r) − E(zr)] / [E(r²)E(z²) − E²(zr)] = [E(z²)E(r) − E(zr)] / [(n − 2)E(z²) − E²(zr)] .  (4.6)
Then the following combined estimator

σ̂_c = C₁ σ̂ + Cᵣ R  (4.7)

has the minimum mean squared error among all such linear combinations of σ̂ and R, and hence provides an improvement over both σ̂ and σ̂_R in the sense of having smaller mean squared error. However, like σ̂_R, σ̂_c is also a biased estimator of the scale parameter, and the bias is given by

Bias(σ̂_c) = [C₁ + Cᵣ E(r) − 1] σ .  (4.8)

The mean squared error of this combined estimator is given by

MSE(σ̂_c) = [1 − C₁ − Cᵣ E(r)] σ² .  (4.9)
The combined estimator has negative bias, since Bias(σ̂_c) = −MSE(σ̂_c)/σ < 0. In order to see the extent of improvement of σ̂_c over σ̂, we have performed a Monte Carlo study for families of normal and logistic distributions on an IBM 3090 mainframe. The constants C₁, Cᵣ, and E(r) were first estimated through 100,000 pseudo-random samples, and the ratio P of the mean squared error of the resulting combined estimator to the variance of the best unbiased L-estimator was then computed. The program was written in FORTRAN, calling some IMSL subroutines for pseudo-random number generation. These calculations are reported in Tables 1, 2 and 3 for certain sample sizes, complete as well as doubly censored. Although the constants C₁ and Cᵣ for a doubly censored sample from the exponential distribution with n observed order statistics are exactly known, being (n − 1)/n and 0, respectively, with the ratio P being exactly (n − 1)/n, we carried out our simulation also for this distribution and report the results in the tables only to point out that the closeness of these simulated constants to their known actual values gives credence to the accuracy of the simulation results for the normal and logistic. The improvement of σ̂_c over σ̂ is encouraging, particularly for small samples. We have also made a comparison of the combined estimator with the maximum likelihood estimator for the logistic distribution. The maximum likelihood estimator requires solving the following system of nonlinear equations:

r₁ e^{−Y₁}/(1 + e^{−Y₁}) − r₂/(1 + e^{−Yₙ}) − n + 2 Σᵢ₌₁ⁿ e^{−Yᵢ}/(1 + e^{−Yᵢ}) = 0 ,

n + r₁ Y₁ e^{−Y₁}/(1 + e^{−Y₁}) − r₂ Yₙ/(1 + e^{−Yₙ}) − Σᵢ₌₁ⁿ Yᵢ + 2 Σᵢ₌₁ⁿ Yᵢ e^{−Yᵢ}/(1 + e^{−Yᵢ}) = 0 ,

where Yᵢ is the (r₁ + i)th standardized order statistic (i = 1, …, n; r₁ is the number of observations censored from the left and r₂ is the number censored from the right), and is computationally intensive like σ̂_c. But, unlike σ̂_c, the maximum likelihood esti-
Table 1
Values of C₁, Cᵣ and P for the combined estimator for families of normal, logistic and exponential distributions

              Logistic                   Normal                     Exponential
N  n  r₁ r₂   C₁      Cᵣ      P          C₁      Cᵣ      P          C₁      Cᵣ    P
3  3  0  0    0.7337  0.0670  0.7795     0.7123  0.0511  0.7478     0.6652  0.00  0.6667
4  3  0  1    0.7271  0.0547  0.7653     0.7065  0.0453  0.7382     0.6648  0.00  0.6667
4  3  1  0    0.7298  0.0510  0.7657     0.7092  0.0416  0.7385     0.6571  0.01  0.6666
4  4  0  0    0.7768  0.0592  0.8411     0.7724  0.0373  0.8141     0.7494  0.00  0.7500
5  3  0  2    0.7240  0.0465  0.7569     0.7021  0.0361  0.7276     0.6653  0.00  0.6667
5  3  1  1    0.7243  0.0373  0.7509     0.7088  0.0386  0.7361     0.6656  0.00  0.6667
5  3  2  0    0.7235  0.0472  0.7568     0.7037  0.0339  0.7277     0.6708  0.01  0.6667
5  4  0  1    0.7764  0.0511  0.8327     0.7683  0.0372  0.8098     0.7468  0.00  0.7500
5  4  1  0    0.7769  0.0507  0.8327     0.7657  0.0394  0.8095     0.7512  0.00  0.7500
5  5  0  0    0.8024  0.0524  0.8749     0.8071  0.0316  0.8523     0.8009  0.00  0.8000
6  3  0  3    0.7183  0.0458  0.7506     0.7009  0.0256  0.7191     0.6669  0.00  0.6667
6  3  1  2    0.7201  0.0308  0.7422     0.7064  0.0320  0.7291     0.6655  0.00  0.6667
6  3  2  1    0.7217  0.0287  0.7423     0.7101  0.0268  0.7294     0.6685  0.00  0.6666
6  3  3  0    0.7212  0.0419  0.7509     0.6983  0.0291  0.7190     0.6633  0.00  0.6667
6  4  0  2    0.7781  0.0442  0.8272     0.7678  0.0319  0.8036     0.7489  0.00  0.7500
6  4  1  1    0.7826  0.0370  0.8243     0.7700  0.0349  0.8091     0.7541  0.00  0.7500
6  4  2  0    0.7812  0.0416  0.8276     0.7662  0.0333  0.8034     0.7521  0.00  0.7500
6  5  0  1    0.8060  0.0455  0.8698     0.8163  0.0240  0.8513     0.8027  0.00  0.8000
6  5  1  0    0.8081  0.0441  0.8702     0.8087  0.0292  0.8507     0.7932  0.00  0.8000
6  6  0  0    0.8250  0.0440  0.8972     0.8393  0.0226  0.8781     0.8349  0.00  0.8333
7  3  0  4    0.7184  0.0395  0.7464     0.6966  0.0221  0.7123     0.6681  0.00  0.6667
7  3  1  3    0.7158  0.0285  0.7362     0.7027  0.0276  0.7223     0.6683  0.00  0.6667
7  3  2  2    0.7155  0.0252  0.7337     0.7079  0.0241  0.7252     0.6712  0.01  0.6666
7  3  3  1    0.7192  0.0238  0.7364     0.7034  0.0266  0.7224     0.6645  0.00  0.6667
7  3  4  0    0.7196  0.0378  0.7465     0.6967  0.0219  0.7123     0.6662  0.00  0.6667
7  4  0  3    0.7790  0.0395  0.8232     0.7694  0.0253  0.7980     0.7449  0.00  0.7500
7  4  1  2    0.7810  0.0329  0.8181     0.7710  0.0303  0.8050     0.7482  0.00  0.7500
7  4  2  1    0.7855  0.0290  0.8184     0.7714  0.0299  0.8050     0.7480  0.00  0.7500
7  4  3  0    0.7780  0.0403  0.8231     0.7640  0.0300  0.7976     0.7528  0.00  0.7500
7  5  0  2    0.8102  0.0396  0.8663     0.8063  0.0281  0.8466     0.7984  0.00  0.8000
7  5  1  1    0.8198  0.0315  0.8652     0.8095  0.0285  0.8506     0.7996  0.00  0.8000
7  5  2  0    0.8105  0.0394  0.8663     0.8086  0.0265  0.8468     0.8013  0.00  0.8000
7  6  0  1    0.8316  0.0375  0.8941     0.8368  0.0235  0.8771     0.8305  0.00  0.8333
7  6  1  0    0.8330  0.0367  0.8943     0.8393  0.0221  0.8773     0.8367  0.00  0.8333
7  7  0  0    0.8410  0.0383  0.9125     0.8543  0.0212  0.8955     0.8534  0.00  0.8571

Note: N is the sample size, n is the number of observed order statistics, r₁ is the number of observations censored from the left, and r₂ is the number of observations censored from the right.
mator has some added computational disadvantages. As mentioned in Harter and Moore (1967), difficulties may arise if the initial values are wrongly guessed when solving the maximum likelihood equations. Also, the rate of convergence for the
170
S. K. Sarkar and W. Wang
Table 2
Values of E(r), C1, Cr, Bias/σ and P for the combined estimator for families of normal and exponential distributions with N = 10

         Normal                                         Exponential
r1 r2    E(r)    C1      Cr       Bias/σ    P           E(r)    C1      Cr       Bias/σ    P
0  0     2.6793  0.8942  0.0192   -0.0543   0.9426      2.5963  0.9159  -0.0061  -0.1000   0.8998
0  1     2.4823  0.8747  0.0249   -0.0634   0.9319      2.4003  0.8843   0.0019  -0.1111   0.8889
0  2     2.2749  0.8650  0.0264   -0.0749   0.9208      2.2003  0.8773  -0.0011  -0.1250   0.8750
1  1     2.2727  0.8487  0.0333   -0.0756   0.9173      2.2043  0.9056  -0.0139  -0.1249   0.8744
0  3     2.0507  0.8511  0.0289   -0.0896   0.9063      1.9763  0.8502   0.0035  -0.1429   0.8571
1  2     2.0451  0.8555  0.0258   -0.0917   0.9050      1.9808  0.8633  -0.0031  -0.1429   0.8571
0  4     1.8029  0.8368  0.0297   -0.1098   0.8871      1.7310  0.8172   0.0093  -0.1666   0.8332
1  3     1.7959  0.8453  0.0231   -0.1133   0.8849      1.7322  0.8238   0.0055  -0.1667   0.8333
2  2     1.7929  0.8415  0.0247   -0.1142   0.8836      1.7323  0.8377  -0.0025  -0.1667   0.8333
0  5     1.5201  0.8126  0.0322   -0.1385   0.8588      1.4532  0.7946   0.0037  -0.2000   0.8000
1  4     1.5115  0.8128  0.0287   -0.1439   0.8540      1.4574  0.8270  -0.0184  -0.1999   0.7996
2  3     1.5121  0.8290  0.0164   -0.1462   0.8532      1.4506  0.7830   0.0117  -0.1999   0.7998
0  6     1.1857  0.7784  0.0324   -0.1832   0.8150      1.1271  0.7522  -0.0019  -0.2500   0.7500
1  5     1.1747  0.7732  0.0302   -0.1914   0.8071      1.1340  0.7666  -0.0146  -0.2499   0.7498
2  4     1.1741  0.7843  0.0175   -0.1951   0.8044      1.1276  0.7466   0.0030  -0.2500   0.7500
3  3     1.1712  0.7972  0.0056   -0.1963   0.8037      1.1256  0.7409   0.0081  -0.2500   0.7499
0  7     0.7493  0.7146  0.0324   -0.2611   0.7378      0.7054  0.6680  -0.0020  -0.3333   0.6667
1  6     0.7371  0.7098  0.0222   -0.2738   0.7257      0.7072  0.6591   0.0107  -0.3333   0.6666
2  5     0.7334  0.7069  0.0187   -0.2793   0.7204      0.7060  0.6604   0.0089  -0.3333   0.6666
3  4     0.7358  0.7060  0.0169   -0.2816   0.7182      0.7055  0.6683  -0.0024  -0.3333   0.6667
solution of these equations depends heavily on the amount of censoring. On the other hand, as our simulation results (in Table 3) indicate, σ̂_c and the maximum likelihood estimator have similar performance in terms of bias and mean squared error. Hence, because of the computational problems associated with the maximum likelihood estimates, σ̂_c appears to be a good alternative for estimating the scale parameter of the logistic distribution.

We chose a sample of size 10 and considered the different possible types of double censoring in our simulation study. One thousand samples were generated in each scenario. A modified Powell hybrid method was employed, by calling the IMSL subroutine DNEQNJ (the user has to provide the Jacobian), to solve the maximum likelihood equations. In all cases, the initial values for the location and scale parameters were set to zero and one, respectively, which are the values used to generate the random numbers. It is worth pointing out that Harter and Moore (1967) also conducted a Monte Carlo study of the maximum likelihood estimators for the parameters of the logistic family, with 1,000 simulations of sample sizes 10 and 20. They, however, used a different procedure to solve the maximum likelihood equations. We emphasize that their simulated biases and mean squared errors for the maximum likelihood scale estimator are very close to what we observed in our study.
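The likelihood equations above can be solved along the same lines with open-source tools. The sketch below is not the authors' FORTRAN/IMSL program; it uses SciPy's `fsolve`, which wraps MINPACK's modified Powell hybrid method, the same family of solvers as IMSL's DNEQNJ. The function name and starting values are ours.

```python
import numpy as np
from scipy.optimize import fsolve

def logistic_ml(x_obs, r1, r2, start=(0.0, 1.0)):
    """ML estimates (mu, sigma) for a logistic sample with r1 observations
    censored on the left and r2 on the right; x_obs holds the n observed
    (inner) order statistics, sorted."""
    n = len(x_obs)

    def score(theta):
        mu, sigma = theta
        y = (np.asarray(x_obs, dtype=float) - mu) / sigma   # standardized order stats
        p = np.exp(-y) / (1.0 + np.exp(-y))                 # = 1 - F(y) for the logistic
        # the two likelihood equations (each multiplied through by -sigma)
        g1 = r1 * p[0] - r2 * (1.0 - p[-1]) - n + 2.0 * p.sum()
        g2 = (r1 * y[0] * p[0] - r2 * y[-1] * (1.0 - p[-1])
              + n - y.sum() + 2.0 * (y * p).sum())
        return [g1, g2]

    return fsolve(score, start)
```

With a reasonable starting point (e.g. the sample median and a rough scale), the Powell hybrid iteration converges quickly for moderate censoring; as Harter and Moore (1967) note, a poor starting point can still derail it.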
Table 3
Values of E(r), C1, Cr, Bias/σ and P for the combined estimator, and of Bias/σ and P for the maximum likelihood estimator, for the logistic distribution with N = 10

            Combined estimator                            ML estimator
n  r1 r2    E(r)    C1      Cr      Bias/σ    P           Bias/σ    P
10 0  0     2.6465  0.9002  0.0108  -0.0713   0.9279      -0.0550   0.9325
9  0  1     2.4593  0.8966  0.0098  -0.0793   0.9202      -0.0595   0.8689
8  0  2     2.2529  0.8770  0.0143  -0.0908   0.9082      -0.0959   0.9303
8  1  1     2.2552  0.8819  0.0128  -0.0892   0.9100      -0.0829   0.8310
7  0  3     2.0299  0.8788  0.0069  -0.1072   0.8927      -0.0887   0.9132
7  1  2     2.0308  0.8547  0.0204  -0.1039   0.8944      -0.1012   0.8619
6  0  4     1.7753  0.8288  0.0233  -0.1299   0.8684      -0.1249   0.8615
6  1  3     1.7868  0.8411  0.0188  -0.1254   0.8736      -0.1279   0.9169
6  2  2     1.7875  0.8479  0.0157  -0.1240   0.8752      -0.1301   0.8525
5  0  5     1.4899  0.8061  0.0207  -0.1631   0.8359      -0.1453   0.8100
5  1  4     1.5002  0.8202  0.0150  -0.1574   0.8421      -0.1514   0.8594
5  2  3     1.5036  0.8076  0.0249  -0.1550   0.8436      -0.1640   0.8443
4  0  6     1.1606  0.7743  0.0099  -0.2143   0.7856      -0.2114   0.7963
4  1  5     1.1673  0.7762  0.0145  -0.2069   0.7928      -0.2295   0.7761
4  2  4     1.1680  0.7785  0.0156  -0.2032   0.7964      -0.2118   0.8124
4  3  3     1.1660  0.7797  0.0157  -0.2020   0.7976      -0.2202   0.7953
3  0  7     0.7240  0.6802  0.0269  -0.3004   0.6990      -0.2969   0.7258
3  1  6     0.7305  0.6903  0.0240  -0.2922   0.7073      -0.2824   0.7087
3  2  5     0.7309  0.6845  0.0370  -0.2884   0.7103      -0.3091   0.7254
3  3  4     0.7321  0.6917  0.0286  -0.2873   0.7119      -0.3295   0.6835
4.3. Another possible nonlinear estimator

We propose another nonlinear estimator of σ in this subsection. Lloyd's best unbiased L-estimator was obtained by minimizing the usual weighted least-squares objective function, namely,

$$(X - \mu e - \sigma a)'Q^{-1}(X - \mu e - \sigma a) .$$

This objective function does not take into account the fact that Var(X), the matrix used in weighting the residuals, also involves σ², and it appears that a more natural objective function is

$$h(\mu, \sigma) = \frac{1}{\sigma^2}(X - \mu e - \sigma a)'Q^{-1}(X - \mu e - \sigma a) . \qquad (4.10)$$

We develop an estimator of σ based on this objective function. In particular, we find (μ̂_t, σ̂_t) that minimize h(μ, σ), and propose σ̂_t, or a multiple of it, as an alternate estimator of σ. The first partial derivatives of h(μ, σ) with respect to μ and σ, respectively, are
$$\frac{\partial h(\mu,\sigma)}{\partial \mu} = -\frac{2}{\sigma^2}\, e'Q^{-1}(X - \mu e - \sigma a) ,$$

$$\frac{\partial h(\mu,\sigma)}{\partial \sigma} = -\frac{2}{\sigma^3}(X - \mu e - \sigma a)'Q^{-1}(X - \mu e - \sigma a) - \frac{2}{\sigma^2}\, a'Q^{-1}(X - \mu e - \sigma a) .$$
Equating these two derivatives to zero, we have

$$e'Q^{-1}(X - \mu e - \sigma a) = 0 , \qquad (4.11)$$

$$(X - \mu e)'Q^{-1}(X - \mu e - \sigma a) = 0 . \qquad (4.12)$$

Multiplying (4.11) by μ and adding it to equation (4.12), we get

$$X'Q^{-1}(X - \mu e - \sigma a) = 0 . \qquad (4.13)$$
The solutions to equations (4.11) and (4.13) are given by

$$\hat{\mu}_t = -\frac{a'Q^{-1}(eX' - Xe')Q^{-1}X}{e'Q^{-1}(ea' - ae')Q^{-1}X} = \frac{X'V_1X}{e'V_2X} , \qquad (4.14)$$

$$\hat{\sigma}_t = \frac{e'Q^{-1}(eX' - Xe')Q^{-1}X}{e'Q^{-1}(ea' - ae')Q^{-1}X} = \frac{X'V_3X}{a'V_3X} , \qquad (4.15)$$

where

$$V_1 = Q^{-1} - \frac{Q^{-1}ae'Q^{-1}}{e'Q^{-1}a} \quad (= Q^{-1}ae'Q^{-1}, \text{ if } e'Q^{-1}a = 0) ,$$

$$V_2 = Q^{-1} - \frac{Q^{-1}ea'Q^{-1}}{e'Q^{-1}a} \quad (= Q^{-1}ea'Q^{-1}, \text{ if } e'Q^{-1}a = 0) ,$$

and

$$V_3 = Q^{-1} - \frac{Q^{-1}ee'Q^{-1}}{e'Q^{-1}e} . \qquad (4.16)$$
Note that the best unbiased L-estimators of μ and σ, as given in (2.2) and (2.3), can be rewritten as

$$\hat{\mu} = -\frac{a'Q^{-1}\big(e(EX)' - (EX)e'\big)Q^{-1}X}{e'Q^{-1}(ea' - ae')Q^{-1}EX} = \frac{(EX)'V_1X}{e'V_2(EX)} , \qquad (4.17)$$

$$\hat{\sigma} = \frac{e'Q^{-1}\big(e(EX)' - (EX)e'\big)Q^{-1}X}{e'Q^{-1}(ea' - ae')Q^{-1}EX} = \frac{(EX)'V_3X}{a'V_3(EX)} , \qquad (4.18)$$

where EX is the expectation vector of X. It is of interest to observe that the only difference between μ̂ and μ̂_t is that EX in formula (4.17) is replaced by X in (4.14); a similar difference exists between σ̂ and σ̂_t in formulae (4.18) and (4.15).
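As a quick numerical sanity check (the inputs below are toy values of our own choosing, not from the chapter), the closed forms (4.14) and (4.15) can be verified to satisfy the stationarity equations (4.11) and (4.13):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
e = np.ones(n)
a = np.array([-1.2, -0.5, 0.0, 0.5, 1.2])      # hypothetical E(Y) vector
B = rng.normal(size=(n, n))
Q = B @ B.T + n * np.eye(n)                    # hypothetical Cov(Y), positive definite
X = np.sort(rng.normal(3.0, 2.0, size=n))      # "observed" order statistics
Qi = np.linalg.inv(Q)

den     = e @ Qi @ (np.outer(e, a) - np.outer(a, e)) @ Qi @ X
mu_t    = -(a @ Qi @ (np.outer(e, X) - np.outer(X, e)) @ Qi @ X) / den   # (4.14)
sigma_t =  (e @ Qi @ (np.outer(e, X) - np.outer(X, e)) @ Qi @ X) / den   # (4.15)

u = X - mu_t * e - sigma_t * a                 # residual at the critical point
assert abs(e @ Qi @ u) < 1e-8                  # equation (4.11) holds
assert abs(X @ Qi @ u) < 1e-8                  # equation (4.13) holds
```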
Next, we need to show that ~t and #t as given in (4.14) and (4.15) are really the minima for the objective function h(#, o-). Since/~t and &t satisfy equations (4.11), (4.12) and (4.13), we have 0,22h(~ ' 2 , _~ o# o-),,=/,.... <=~2t2eO e ,
o-0# 02 02 O--a-h(#, o-) u=g,,~=<-- ,,----a-h(#, o-) {~=~,,a=6-,= °72 =--4e, Q 1a o#o,oand
O04h(#,o-) a=~,,,~=a-,: &72 a, O _1( X - ,hie) = ,5.22 ( X -
~te)'Q-l(X- ~te) a)
Thus, the Hessian matrix is that 2 (&te'Q-le n = = ~o.-3 &t e'Q - l a
&te'Q-la ) (X-flte)'Ql(x-~te)
•
dt
Noting from (4.11) that #t e'Q-la = e'Q-1X - ~t e'Q -le , we have e'Q-le (X - [,te)'Q l ( x -- hte ) - (&t e'Q la)2 =e'Q leX'Q 1X-(e'Q-1X) 2 , which is always greater than or equal to zero (Cauchy-Schwartz inequality). We also know that it equals zero if and only if X = c e for some constant c. Since X is the vector of order statistics, P(X = c e) = 0 for any c. Hence, the Hessian matrix H is positive definite with probability one. In other words, we prove that/)t and &t minimize the function h(#, o-) given in (4.10). If the generating density function is symmetric around the origin and the censoring is also symmetric,/~t and &t can be further simplified as e'Q ix ) t - - e,Q_le,
x ' ( Q-1 Qe'Q-'elee'QI'~x.] &t = a,Q_lX
Therefore, ~t = ~ but R2
&t-- a,Q-la ~- F& ,
(4.19)
where R is given in (4.1). In this case, and under the conditions given in Theorem 1, σ̂_t is always greater than σ̂ and has a positive bias.

4.4. Examples
The estimators given in the previous sections can be simplified for some special families of location-scale distributions. Some examples are given below.

EXAMPLE 1. Let X be a complete sample of order statistics from the normal distribution with mean μ and standard deviation σ. Then we have Q⁻¹e = e and a'Q⁻¹e = 0. Consequently, the best unbiased L-estimators of μ and σ given in (2.2) and (2.3) simplify to

$$\hat{\mu} = \bar{X} , \qquad \hat{\sigma} = \frac{a'Q^{-1}X}{a'Q^{-1}a} .$$

The estimators given in (4.14) and (4.15) are now

$$\hat{\mu}_t = \bar{X} , \qquad \hat{\sigma}_t = \frac{X'Q^{-1}X - N\bar{X}^2}{a'Q^{-1}X} .$$
As for the residual-based estimators, not much simplification can be obtained, except that R given in (4.1) and z in (4.4) can be rewritten as

$$R = \sqrt{X'Q^{-1}X - N\bar{X}^2 - a'Q^{-1}a\,\hat{\sigma}^2} \qquad \text{and} \qquad z = \frac{a'Q^{-1}Y}{a'Q^{-1}a} .$$
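Example 1 can be carried out numerically even without tables of the moment constants. The sketch below (function name and the simulation-based approximation of a and Q are ours, not the chapter's) estimates a = E(Y) and Q = Cov(Y) for standard-normal order statistics by Monte Carlo and then forms the simplified BLUEs:

```python
import numpy as np

def normal_blue(X, reps=20000, seed=0):
    """BLUEs of (mu, sigma) for a complete normal sample, with the moment
    constants a = E(Y) and Q = Cov(Y) approximated by simulation."""
    N = len(X)
    rng = np.random.default_rng(seed)
    Y = np.sort(rng.standard_normal((reps, N)), axis=1)
    a = Y.mean(axis=0)                  # approximates E(Y_{i:N})
    Q = np.cov(Y, rowvar=False)         # approximates Cov(Y_{i:N}, Y_{j:N})
    w = np.linalg.solve(Q, a)           # Q^{-1} a
    mu_hat = float(np.mean(X))          # valid since Q^{-1}e = e, a'Q^{-1}e = 0
    sigma_hat = float(w @ np.sort(X) / (w @ a))
    return mu_hat, sigma_hat
```

For exact work one would replace the simulated a and Q by tabulated or series-approximated values, as discussed in Section 2.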
EXAMPLE 2. Consider the family of two-parameter uniform distributions with the generating density function

$$f(x) = 1 , \qquad -\tfrac{1}{2} \le x \le \tfrac{1}{2} .$$

Let X_{k+1} ≤ ⋯ ≤ X_{N−k} be a doubly Type II censored symmetric sample from a two-parameter uniform distribution, with k observations censored on each side. If the suffixes of the a vector and the Q matrix start from k + 1, then

$$a_i = -\frac{1}{2} + \frac{i}{N+1} , \qquad i = k+1, \ldots, N-k ,$$

and, for k+1 ≤ i ≤ j ≤ N−k,

$$q_{ij} = \frac{i(N-j+1)}{(N+1)^2(N+2)} .$$

Let q^{ij} denote the (i, j)th element of Q⁻¹. Then, for j ≥ i,

$$q^{ij} = (N+1)(N+2)\times\begin{cases} 2, & j = i = k+1, \ldots, N-k , \\ -1, & j = i+1, \; i = k+1, \ldots, N-k-1 , \\ 0, & j > i+1 \end{cases}$$
(see Arnold, Balakrishnan and Nagaraja 1992). So

$$e'Q^{-1} = (N+1)(N+2)\,(1, 0, 0, \ldots, 0, 1) ,$$

$$a'Q^{-1} = (N+1)(N+2)\left(-\frac{1}{2} + \frac{k}{N+1},\; 0, 0, \ldots, 0,\; \frac{N-k+1}{N+1} - \frac{1}{2}\right) ,$$

$$e'Q^{-1}e = 2(N+1)(N+2) , \qquad e'Q^{-1}a = 0 ,$$

$$e'Q^{-1}X = (N+1)(N+2)(X_{k+1} + X_{N-k}) ,$$

$$a'Q^{-1}X = (N+1-2k)(N+2)(X_{N-k} - X_{k+1})/2 ,$$

$$X'Q^{-1}X = 2(N+1)(N+2)\left(\sum_{i=k+1}^{N-k} X_i^2 - \sum_{i=k+1}^{N-k-1} X_iX_{i+1}\right) .$$
Thus,

$$\hat{\mu} = \frac{X_{k+1} + X_{N-k}}{2} , \qquad \hat{\sigma} = \frac{(N+1)(X_{N-k} - X_{k+1})}{N - 2k - 1} ,$$

and

$$\hat{\sigma}_t = \frac{4(N+1)\left(\sum_{i=k+1}^{N-k} X_i^2 - \sum_{i=k+1}^{N-k-1} X_iX_{i+1} - \hat{\mu}^2\right)}{(N+1-2k)(X_{N-k} - X_{k+1})} .$$
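The closed forms for μ̂ and σ̂ above can be checked against the generic BLUE formulas computed directly from a and Q. The values of N and k below are hypothetical, chosen only for illustration:

```python
import numpy as np

N, k = 8, 2                                    # hypothetical sample size and censoring
idx = np.arange(k + 1, N - k + 1)              # observed ranks k+1, ..., N-k
a = -0.5 + idx / (N + 1)                       # a_i for the centered uniform
I, J = np.meshgrid(idx, idx, indexing="ij")
Q = np.minimum(I, J) * (N + 1 - np.maximum(I, J)) / ((N + 1) ** 2 * (N + 2))

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(-0.5, 0.5, size=N))[k:N - k]   # middle order statistics

Qi = np.linalg.inv(Q)
e = np.ones_like(a)
mu_blue = (e @ Qi @ X) / (e @ Qi @ e)          # generic BLUE of the location
sigma_blue = (a @ Qi @ X) / (a @ Qi @ a)       # generic BLUE of the scale
assert np.isclose(mu_blue, (X[0] + X[-1]) / 2)
assert np.isclose(sigma_blue, (N + 1) * (X[-1] - X[0]) / (N - 2 * k - 1))
```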
By using the above expressions, we can also simplify the residual-based estimators, σ̂_r and σ̂_c.

EXAMPLE 3. Let X_{r₁+1} ≤ ⋯ ≤ X_{N−r₂} be a doubly Type II censored sample from the two-parameter location-scale exponential distribution, and let Y_{r₁+1} ≤ ⋯ ≤ Y_{N−r₂} be the corresponding standardized order statistics. Let

$$W_{r_1+1} = bY_{r_1+1} , \qquad W_i = (N-i+1)(Y_i - Y_{i-1}) , \quad i = r_1+2, \ldots, N-r_2 ,$$

where

$$b = 1\Big/\sqrt{\textstyle\sum_{i=1}^{r_1+1} 1/(N-i+1)^2} .$$

Then the W_i's are independent, with

$$E(W_{r_1+1}) = c = b\sum_{i=1}^{r_1+1} \frac{1}{N-i+1} , \qquad \mathrm{Var}(W_{r_1+1}) = 1 ,$$
$$E(W_i) = 1 , \qquad \mathrm{Var}(W_i) = 1 , \quad i = r_1+2, \ldots, N-r_2 .$$
Let D be the transformation matrix such that W = DY. Thus DQD' = I and

$$DA = (De, Da) , \qquad De = (b, 0, \ldots, 0)' , \qquad Da = (c, 1, \ldots, 1)' .$$

Let n = N − r₁ − r₂ and W₀ = Σ_{i=r₁+2}^{N−r₂} W_i/(n−1). Then the best unbiased L-estimator of the scale parameter given in (2.3) can be rewritten as

$$\hat{\sigma} = \frac{(De)'(DQD')^{-1}\big[De(Da)' - Da(De)'\big](DQD')^{-1}DX}{\big((De)'(DQD')^{-1}De\big)\big((Da)'(DQD')^{-1}Da\big) - \big((De)'(DQD')^{-1}Da\big)^2}$$

$$= \frac{(De)'\big[De(Da)' - Da(De)'\big]DX}{b^2(n-1)} = W_0\sigma = \sum_{i=r_1+2}^{N-r_2} (N-i+1)(X_i - X_{i-1})\big/(n-1) ,$$
and r given in (4.2) simplifies to

$$r = \Big[W'\big(I - DA(A'D'DA)^{-1}A'D'\big)W\Big]^{1/2} = \left(\sum_{i=r_1+2}^{N-r_2} W_i^2 - \Big(\sum_{i=r_1+2}^{N-r_2} W_i\Big)^2\Big/(n-1)\right)^{1/2} = (n-1)S_vW_0 ,$$
where

$$S_v^2 = \sum_{i=r_1+2}^{N-r_2} V_i^2 - \frac{1}{n-1} , \qquad \text{with } V_i = \frac{W_i}{\sum_{j=r_1+2}^{N-r_2} W_j} , \quad i = r_1+2, \ldots, N-r_2 .$$

Since the W_i's are independent standard exponential variables (i = r₁+2, …, N−r₂), the V_i's have the joint density function

$$f(v_{r_1+2}, \ldots, v_{N-r_2}) = (n-2)! \qquad \text{if } \sum_{i=r_1+2}^{N-r_2} v_i = 1 \text{ and } 0 \le v_i \le 1 , \; i = r_1+2, \ldots, N-r_2 ,$$

and it is independent of W₀ (see, for example, Rao 1973, p. 215). It is easy to see that

$$R = r\sigma = (n-1)S_vW_0\sigma = (n-1)S_v\hat{\sigma} .$$
For n = 3, we have

$$E(r) = \frac{\sqrt{2}}{2}\,E|W_2 - W_3| = \frac{\sqrt{2}}{2}\int_0^\infty\!\!\int_0^\infty |x - y|\,e^{-x-y}\,dx\,dy$$

$$= \frac{\sqrt{2}}{2}\int_0^\infty e^{-x}\left[\int_0^x (x - y)e^{-y}\,dy + \int_x^\infty (y - x)e^{-y}\,dy\right]dx$$

$$= \frac{\sqrt{2}}{2}\int_0^\infty e^{-x}\big[x + e^{-x} - 1 + e^{-x}\big]\,dx = \frac{\sqrt{2}}{2} .$$

So, σ̂_r = (√2/2)R. In this case, the mean squared error of σ̂_r is σ²/2, which is equal to the variance of σ̂. Similar to the derivation of σ̂, we have

$$\hat{\sigma}_t = \frac{\sum_{i=r_1+2}^{N-r_2} W_i^2}{\sum_{i=r_1+2}^{N-r_2} W_i}\,\sigma = (n-1)W_0\,\sigma \sum_{i=r_1+2}^{N-r_2} V_i^2 = (n-1)\,\hat{\sigma} \sum_{i=r_1+2}^{N-r_2} V_i^2 ,$$
and

$$\hat{\mu}_t = \frac{(DX)'\left(I - \dfrac{Da(De)'}{bc}\right)DX}{(De)'\left(I - \dfrac{De(Da)'}{bc}\right)DX} .$$
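The practical payoff of Example 3 is that the BLUE of σ needs no matrix inversion at all: it is just an average of the normalized spacings. A minimal sketch (function name and the simulated data are ours):

```python
import numpy as np

def exp_blue_sigma(x_obs, N, r1, r2):
    """BLUE of sigma for a doubly Type II censored exponential sample;
    x_obs holds the observed order statistics X_{r1+1} <= ... <= X_{N-r2}."""
    n = N - r1 - r2
    i = np.arange(r1 + 2, N - r2 + 1)                 # spacing indices
    w = (N - i + 1) * np.diff(np.asarray(x_obs))      # = sigma * W_i
    return float(w.sum() / (n - 1))                   # = W_0 * sigma

rng = np.random.default_rng(3)
N, r1, r2 = 100, 5, 10
x = np.sort(rng.exponential(scale=2.0, size=N))[r1:N - r2]
sigma_hat = exp_blue_sigma(x, N, r1, r2)   # close to the true scale 2.0, up to sampling error
```

Each term (N − i + 1)(X_i − X_{i−1}) is an independent exponential with mean σ, which is exactly why the simple average is the BLUE here.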
5. Extension of the positivity results to the censored scale regression model

5.1. Censored scale regression model

Let us first introduce a censored linear model which includes model (1.3) as a special case. Suppose that there are k different populations with distribution functions from the same location-scale family 𝒢(F), having a common scale parameter σ (> 0) but different location parameters. The ith population has location parameter μ_i = Z_i'β, where the Z_i's are known p × 1 vectors of covariates and β (p × 1) is a vector of unknown coefficients. Both β and σ are unknown constants across all k populations. Suppose that a random sample of size N_i is drawn from the ith population and that only n_i of the order statistics of the sample are observed, the remaining (N_i − n_i) observations being censored. Let X_i = (X_{i1}, …, X_{in_i})', where X_{i1} ≤ ⋯ ≤ X_{in_i} are the n_i observed order statistics for the ith population. They may be any fixed set of n_i order statistics of the sample and are not necessarily the first n_i order statistics. If we define Y_i = (X_i − Z_i'β e_i)/σ = (Y_{i1}, …, Y_{in_i})', where e_i = (1, …, 1)' : n_i × 1, then Y_{i1} ≤ ⋯ ≤ Y_{in_i} become the corresponding n_i order statistics based on a random sample of size N_i from the generating distribution function F(y). Suppose that

$$E(Y_i) = a_i \qquad \text{and} \qquad \mathrm{Cov}(Y_i) = Q_i .$$

Obviously, the a_i's and Q_i's are free of the parameters, depending only on the form of F. Let us denote the n × 1 (n = Σ_{i=1}^k n_i) vector of observed order statistics from the k samples by X = (X_1', …, X_k')', and the corresponding column vector of the a_i's by a. Also, let Q = diag(Q_1, …, Q_k), an n × n block-diagonal matrix whose ith diagonal block is Q_i. Then we have

$$E(X) = R\beta + \sigma a \qquad \text{and} \qquad \mathrm{Cov}(X) = \sigma^2 Q , \qquad (5.1)$$

where R = DZ with D = diag(e_1, …, e_k) and Z = (Z_1, …, Z_k)'. This model often arises in accelerated life testing, where the k populations can be k different testing conditions. Crawford (1970) gave an example of such a situation, in which a new Class B insulation for electric motors was evaluated using temperature-accelerated life testing on 40 motorettes. Ten motorettes were put on test together at each of four temperatures (150°C, 170°C, 190°C, 220°C). At the time the analysis was performed, only some motorettes had failed. In this example, we have four populations corresponding to the four testing conditions (temperatures), and

$$\mu_i = \beta_0 + \beta_1 z_i ,$$

where z_i is the ith testing temperature. Model (5.1) has been studied by many authors (Nelson and Hahn 1972, 1973; Hamouda 1988; and, for the simple regression model, Moussa 1972; Leone and Hamouda 1973; Hamouda and Leone 1974). When k = 1, model (5.1) reduces to model (1.3). A general discussion of model (5.1) can be found in the book by Balakrishnan and Cohen (1991).

5.2. The BLUE of the scale parameter
Since σ appears as a regression coefficient in this model, the same approach used to obtain the best unbiased L-estimator of σ in model (1.3) can be applied (as derived by Nelson and Hahn 1972, 1973) to obtain the following best unbiased L-estimator of the scale parameter when the rank of Z is p ≤ k:

$$\hat{\sigma} = \frac{a'\big[Q^{-1} - Q^{-1}R(R'Q^{-1}R)^{-1}R'Q^{-1}\big]X}{a'\big[Q^{-1} - Q^{-1}R(R'Q^{-1}R)^{-1}R'Q^{-1}\big]a} . \qquad (5.2)$$
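A generic sketch of (5.2) is below; the inputs in the check are toy values of our own choosing. As a sanity check, when k = 1 and Z is a single scalar, R reduces to the column of ones and (5.2) collapses to Lloyd's estimator for model (1.3):

```python
import numpy as np

def blue_sigma(X, R, a, Q):
    """Best unbiased L-estimator of sigma under model (5.1), eq. (5.2)."""
    Qi = np.linalg.inv(Q)
    M = Qi - Qi @ R @ np.linalg.inv(R.T @ Qi @ R) @ R.T @ Qi
    return float((a @ M @ X) / (a @ M @ a))

rng = np.random.default_rng(4)
n = 6
B = rng.normal(size=(n, n))
Q = B @ B.T + n * np.eye(n)                  # hypothetical positive-definite Q
a = np.sort(rng.normal(size=n))              # hypothetical a vector
e = np.ones(n)
X = np.sort(rng.normal(1.0, 2.0, size=n))

Qi = np.linalg.inv(Q)
delta = (e @ Qi @ e) * (a @ Qi @ a) - (e @ Qi @ a) ** 2
lloyd = ((e @ Qi @ e) * (a @ Qi @ X) - (e @ Qi @ a) * (e @ Qi @ X)) / delta
assert np.isclose(blue_sigma(X, e.reshape(-1, 1), a, Q), lloyd)
```

For large n the single n × n inversion in `blue_sigma` is the computational bottleneck that motivated the SLUE two-step shortcut discussed below.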
It should be noted that the estimation of σ is still possible even when the rank of Z is k ≤ p, since the model is
the same as (5.1) with Z and β replaced, respectively, by Z* = (ZZ')^{1/2} and β* = (ZZ')^{-1}Zβ. Therefore, σ̂ in this case is given by (5.2) with Z replaced by Z*, a k × k matrix of rank k.

The estimator given in (5.2) may be computationally laborious to obtain in practice, particularly for large sample sizes. Therefore, Nelson and Hahn (1972, 1973) introduced a two-step method and obtained simple linear unbiased estimators (SLUEs) of the parameters in model (5.1), which require computing the inverses only of matrices of smaller order (< n). However, as noted by Nelson and Hahn (1972, 1973), the SLUEs are generally not the BLUEs. A necessary and sufficient condition for the SLUEs to be the BLUEs was given by Escobar (1986). Another possible way to approximate the BLUEs is to replace the a vector and the Q matrix by their approximations, as discussed in Section 2. Like the "nearly unbiased, nearly best" L-estimator of the scale parameter in model (1.3), the estimator obtained in this way should be very efficient if the sample sizes for all k populations are large.

5.3. Positivity of the BLUE
The following results concerning the positivity of the best unbiased L-estimator (5.2) were proved by Bai, Sarkar, and Wang (1996). Their proofs are omitted here. THEOREM 2. When the rank of Z is k _
THEOREM 3. When the rank of Z is p < k, the estimator (5.2) is positive with probability one if all of the following conditions are satisfied: (a) f(y) is symmetric about zero; (b) the set of order statistics from each population is complete or symmetrically censored; and (c) either 2 ≤ n_i ≤ 3 for all i, or f(y) is log-concave in y.
6. Concluding remarks

The main focus of this article has been a discussion of some new results and ideas on estimating the scale parameter of a location-scale family of distributions based on a fixed set of order statistics. In addition to reporting a new result that provides an analytical proof of a conjecture on the positivity of Lloyd's best unbiased L-estimator, we have put forward a number of new nonlinear estimators. An important finding relating to these new estimators is the evidence that one could use the combined estimator in place of the maximum likelihood estimator for estimating the scale parameter of a logistic distribution. With regard to the other nonlinear estimators, we have not attempted a comprehensive study here, but intend to do so in future research.
References

Aitken, A. C. (1935). On least squares and linear combinations of observations. Proc. Roy. Soc. Edinburgh 55, 42-48.
Ali, M. M. and L. K. Chan (1964). On Gupta's estimates of the parameters of the normal distribution. Biometrika 51, 498-501.
Arnold, B. C., N. Balakrishnan and H. N. Nagaraja (1992). A First Course in Order Statistics. John Wiley & Sons, Inc., New York.
Bai, Z. D., S. K. Sarkar and W. Wang (1996). Positivity of the best unbiased L-estimator of the scale parameter with complete or selected order statistics from location-scale distribution. Statist. Prob. Lett. 32, 181-188.
Balakrishnan, N. (1992). Handbook of the Logistic Distribution. Marcel Dekker, Inc., New York.
Balakrishnan, N. and A. C. Cohen (1991). Order Statistics and Inference: Estimation Methods. Academic Press, Boston.
Barnett, V. and T. Lewis (1984). Outliers in Statistical Data, 2nd ed. John Wiley & Sons, New York.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. Almqvist and Wiksell, Uppsala, Sweden.
Blom, G. (1962). Nearly best linear estimates of location and scale parameters. In: A. E. Sarhan and B. G. Greenberg, eds., Contributions to Order Statistics, 34-46. Wiley, New York.
Chernoff, H. and G. J. Lieberman (1954). Use of normal probability paper. J. Amer. Statist. Assoc. 49, 778-785.
Crawford, D. E. (1970). Analysis of incomplete life test data on motorettes. Insulation/Circuits 16, No. 11, 43-48.
Das Gupta, S. and S. K. Sarkar (1982). On TP2 and log-concavity. In: Inequalities in Statistics and Probability, IMS Lecture Notes-Monograph Series 5, 54-58.
David, H. A. (1981). Order Statistics, 2nd ed. Wiley, New York.
David, F. N. and N. L. Johnson (1954). Statistical treatment of censored data. I. Fundamental formulae. Biometrika 41, 228-240.
Escobar, L. A. (1986). The equivalence of regression-simple and best-linear-unbiased estimators with type II censored data from a location scale distribution. J. Amer. Statist. Assoc. 81, 210-214.
Gan, F. F. and K. J. Koehler (1990). Goodness-of-fit tests based on P-P probability plots. Technometrics 32, 289-303.
Gupta, A. K. (1952). Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika 39, 260-273.
Hamouda, E. M. (1988). Inference in regression problems based on order statistics. Comm. Statist. Part A - Theory and Methods 17, 2343-2367.
Hamouda, E. M. and F. C. Leone (1974). The O-BLUE estimators for complete and censored samples in linear regression. Technometrics 16, 441-446.
Harter, H. L. (1988). Weibull, log-Weibull and gamma order statistics. In: Handbook of Statistics 7, eds. Krishnaiah and Rao. North Holland, New York.
Harter, H. L. and A. H. Moore (1967). Maximum-likelihood estimation, from censored samples, of the parameters of a logistic distribution. J. Amer. Statist. Assoc. 62, 675-684.
Iglewicz, B. and D. C. Hoaglin (1993). How to Detect and Handle Outliers. The ASQC Basic References in Quality Control: Statistical Techniques, Vol. 16. American Society for Quality Control, Milwaukee, WI.
Karlin, S. (1968). Total Positivity. Stanford University Press, Stanford, CA.
Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. John Wiley, New York.
Leone, F. C. and E. M. Hamouda (1973). Relative efficiencies of O-BLUE estimators in simple linear regression. J. Amer. Statist. Assoc. 68, 953-959.
Leslie, J. R., M. A. Stephens and S. Fotopoulos (1986). Asymptotic distribution of the Shapiro-Wilk W for testing for normality. Ann. Statist. 14, 1497-1506.
Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
Moussa, E. A. (1972). Estimation and robustness of efficiency of linear models for small complete and censored samples. Ph.D. Thesis, University of Iowa.
Nelson, W. and G. J. Hahn (1972). Linear estimation of a regression relationship from censored data: Part I - Simple methods and their applications. Technometrics 14, 247-269.
Nelson, W. and G. J. Hahn (1973). Linear estimation of a regression relationship from censored data: Part II - Best linear unbiased estimation and theory. Technometrics 15, 133-150.
Parrish, R. S. (1992). New tables of coefficients and percentage points for the W test for normality. J. Statist. Comp. Simul. 41, 169-185.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. John Wiley & Sons, Inc., New York.
Sarhan, A. E. and B. G. Greenberg (Eds.) (1962). Contributions to Order Statistics. Wiley, New York.
Sarkadi, K. (1981). On the consistency of some goodness of fit tests. In: Proc. Sixth Conf. Probab. Theory, Brasov, 1979, 195-204. Ed. Acad. R. S. Romania, Bucuresti.
Shapiro, S. S. and M. B. Wilk (1965). An analysis of variance test for normality (complete samples). Biometrika 52, 591-611.
Shapiro, S. S. and M. B. Wilk (1968). Approximations for the null distribution of the W statistic. Technometrics 10, 861-866.
Shapiro, S. S. and M. B. Wilk (1972). An analysis of variance test for the exponential distribution (complete samples). Technometrics 14, 355-370.
Shapiro, S. S., M. B. Wilk and H. J. Chen (1968). A comparative study of various tests for normality. J. Amer. Statist. Assoc. 63, 1343-1372.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
7
Optimal Linear Inference Using Selected Order Statistics in Location-Scale Models
M. Masoom Ali and Dale Umbach
1. Introduction
One of the earliest major works in statistical inference using selected order statistics was done by Mosteller (1946). In this pioneering work he advocated estimation of location and scale parameters using a few optimally selected order statistics, particularly for large sample sizes. These procedures were developed as a compromise between loss of efficiency and quickness and ease of computation. In general, it has been observed that for most distributions efficiencies of 90% or more are achieved with seven or even fewer optimally chosen observations. These estimates are based on linear combinations of the selected order statistics, which are Best Linear Unbiased Estimates. Lloyd (1952) introduced BLUEs to construct linear estimates. Since the coefficients of these linear combinations are functions of means and covariances of order statistics, the estimates can be computed numerically for small sample sizes. Ogawa (1951) considered the problem of estimating location and scale parameters for large samples and introduced the Asymptotically Best Linear Unbiased Estimates. Sarhan and Greenberg (1962) gave a comprehensive account of the estimation problem using a few selected order statistics as it had been addressed up to that point in time. Mosteller's (1946) paper, along with Ogawa's (1951) paper and Sarhan and Greenberg's (1962) book, laid the foundation for this area of research. Some of the other works on estimation of parameters are found in Yamanouchi (1949), Kulldorff (1961a,b, 1963a,b, 1964, 1973), Harter (1961, 1971), Siddiqui (1963), Särndal (1962, 1964), Eisenberger (1968), Eisenberger and Posner (1965), Chan (1969, 1970), Chan and Chan (1973), Chan and Cheng (1973, 1988), Chan and Mead (1971a), Chan, Chan and Mead (1971, 1973), Gupta and Gnanadesikan (1966), Hassanein (1968, 1969a,b, 1971, 1972, 1974, 1977), Kaminsky (1972, 1973, 1974), Ogawa (1960, 1976), Saleh (1966, 1967), Saleh and Ali (1966), and Sarhan and Greenberg (1967).
The articles by Kubat and Epstein (1980) and Koutrouvelis (1981) are among the first to address the problem of estimating the quantiles of a distribution using
a few optimally selected order statistics. Since then, Ali, Umbach and Hassanein (1981a,b), Umbach, Ali and Hassanein (1981a,b), Ali, Umbach and Saleh (1982, 1992), Ali, Umbach, Saleh and Hassanein (1983), Saleh (1981), Saleh, Ali and Umbach (1983a, 1985), Saleh, Hassanein and Ali (1992), Umbach, Ali and Saleh (1986), and Umbach and Ali (1996b) have also considered the problem of estimating quantiles. Some works on testing hypotheses and constructing confidence intervals using optimal spacings were also published. Sarhan and Greenberg (1962) address some such problems in their book, and Saleh, Ali and Umbach (1984), Ali, Umbach and Saleh (1985), Eisenberger (1968), Hassanein, Saleh and Brown (1986a), Saleh, Ali and Umbach (1985), Saleh, Hassanein and Ali (1988), Saleh, Hassanein and Brown (1985), Saleh and Sen (1985) and Umbach, Ali and Saleh (1986) also considered the testing problem using optimal spacings. Finally, Ali and Umbach (1989a) and Umbach and Ali (1990, 1991, 1996a) developed goodness-of-fit tests using a few selected order statistics. Chan and Cheng (1988) mention in their survey paper some earlier works in this field, along with examples of applications of optimal spacings. For example, Benson (1949) used two out of seventy observations to estimate the standard deviation of the thread strength in textiles. As another example of application, Eisenberger and Posner (1965) used selected sample quantiles for data compression in space telemetry. These estimators, based on a few selected order statistics, gained popularity in the statistical literature because of their computational simplicity, high efficiency, and robust behavior under departures from distributional assumptions.
2. Preliminaries
Let X₁, X₂, …, X_n be a random sample of size n from a population with distribution function G and density function g. Let X_{1:n} < X_{2:n} < ⋯ < X_{n:n} be the corresponding order statistics. Let X_{n₁:n}, X_{n₂:n}, …, X_{n_k:n} be k order statistics of arbitrary ranks, subject only to the restriction 1 ≤ n₁ < n₂ < ⋯ < n_k ≤ n. The joint density of these k order statistics is given by

$$h(x_{n_1:n}, x_{n_2:n}, \ldots, x_{n_k:n}) = \frac{n!}{\prod_{i=1}^{k+1}(n_i - n_{i-1} - 1)!}\,\prod_{i=1}^{k+1}\big[G(x_{n_i:n}) - G(x_{n_{i-1}:n})\big]^{n_i - n_{i-1} - 1}\,\prod_{i=1}^{k} g(x_{n_i:n}) , \qquad (2.1)$$
where n₀ = 0, n_{k+1} = n + 1, G(x_{n₀:n}) = 0 and G(x_{n_{k+1}:n}) = 1. Now let U_i = G(X_i) for i = 1, 2, …, n. Then U₁, U₂, …, U_n is a random sample of size n from the uniform distribution over (0, 1). Let U_{1:n} < U_{2:n} < ⋯ < U_{n:n} be the corresponding order statistics. Then the joint density of U_{n₁:n}, U_{n₂:n}, …, U_{n_k:n} for 1 ≤ n₁ < n₂ < ⋯ < n_k ≤ n is given by

$$\frac{n!}{\prod_{i=1}^{k+1}(n_i - n_{i-1} - 1)!}\,\prod_{i=1}^{k+1}\big[u_{n_i:n} - u_{n_{i-1}:n}\big]^{n_i - n_{i-1} - 1} \qquad (2.2)$$
for 0 < u_{n₁:n} < u_{n₂:n} < ⋯ < u_{n_k:n} < 1, with u_{n₀:n} = 0 and u_{n_{k+1}:n} = 1. With V_i = −ln(1 − U_i), for i = 1, 2, …, n, V_i has an exponential distribution with mean 1. The corresponding order statistics, V_{1:n} < V_{2:n} < ⋯ < V_{n:n}, then have the joint density

$$n!\exp\left(-\sum_{i=1}^{n} v_{i:n}\right) = n!\exp\left(-\sum_{i=1}^{n}(n-i+1)(v_{i:n} - v_{i-1:n})\right) \qquad (2.3)$$

for 0 < v_{1:n} < v_{2:n} < ⋯ < v_{n:n} < ∞, with v_{0:n} = 0. Hence, with Z_i = (n−i+1)(V_{i:n} − V_{i-1:n}), for i = 1, 2, …, n, the variables Z₁, Z₂, …, Z_n are independently and identically distributed as exponential with mean 1. Thus V_{i:n} can be expressed as

$$V_{i:n} = \frac{Z_1}{n} + \frac{Z_2}{n-1} + \cdots + \frac{Z_i}{n-i+1} . \qquad (2.4)$$
This produces the Renyi (1953) representation of X/:, as Xi:n = a - l ( g i : n ) ~- G-l(1 - exp(-V~:,)) .
(2.5)
This indicates why the exponential distribution plays a central role in the study of order statistics. In particular, this representation has been used by Chernoff, Gastwirth and Johns (1967) to study the asymptotic distributional properties of linear combinations of order statistics. Order statistics arise naturally in the estimation of the quantile function, (2.6)
Q(p) = inf{x]G(x) >_ p } .
Given a random sample of size n from some population with distribution function G, an estimate of G is the empirical distribution function, G~(x), given by Gn(x) =
{o orx .l i-~
1
for X/-I:~ _< x < X/:n for X,:,_<x .
(2.7)
Since G,(x) -+ G(x) in probability as n ~ oc, a natural estimate of Q(p) is given by the sample quantile function, Q, (p), given by Q~(p) = G21(p) := inf{xlGn(x ) >_p} = X~.,.
for -i - < p1_ < /'/
i .
(2.8)
n
Parzen (1979) modified this estimator slightly through interpolation. He suggests using 0,(p)=n (/-p)Xi
l:n+n(p
i-I.)X~.~. /2
for i - 1 < p < _ -i. T/
(2.9)
n
Under the assumption that the form of G is known up to some unknown parameters, efficiencies of the preceding estimators can be greatly improved upon by using a few selected order statistics. In particular, when sampling from a
186
M.
M. Ali and D. Umbach
location-scale model, estimation by the use of selected order statistics is often used. This is typically accomplished by first estimating the location and scale parameters and then using these estimators to form an estimate of Q(p). Thus, estimation of the parameters will be considered at this point. Let X1, X2, . . . , X ~ be a random sample of size n from a distribution with distribution function Fx,~ and density function f~,6, where F~.,~(x) = F ( ( x - 2)/3) and f ~ . , ~ ( x ) = f ( ( x - 2 ) / 3 ) / 6 with location 2 E I R and scale 6 E I R +. Let XI:, <X2:n < ... <X,:, be the corresponding order statistics and let Y,.:, = (X/:n - 2 ) / 6 for i = 1, 2, . . . , n. Then YI:~ < Y2:~ < " " < Y~:n are order statistics corresponding to a random sample from F. Hence, with X/:, = 2 + bY/:,, for i = 1, 2, . . . , n, and E(Y/:,) = ei and Cov(Y/:,, Yj:n) = Yij, E(~:~) = 2 + 6e;
(2.10)
Cov(X/:n,Xj:~) = 62vii .
(2.11)
and
Using these results, one can find the Best Linear Unbiased Estimators (BLUE) of )o and/or 3 based on selected order statistics as follows. Denote the ranks of the k selected order statistics by nl < n2 < " " < nk. Express X~,:~ as X , ~ : , = ) c + 6 c ~ n i + e n ~ for i = 1, 2, . . . , n ,
(2.12)
where E(e~i) = 0 and Cov(e,,, % ) = 32v~,j. In matrix notation, this is X=AO+e=21+6e+e
,
(2.13)
where
x --(x., :., xn2:.,..., xok:.)' A =(1, x)
o
6/ =(c~n,,
~n2, - • •, c%) ~
1 = ( 1 , 1 , . . . , 1) t
e =(ent,em_,
• • • ,enk) t
(2.14)
•
By the Gauss-Markov theorem, the BLUE for O based on Xn,:,, X,2:~,... , Xn~:~ is 6 = (A'V-1A)-IA'V i x ,
(2.15)
where V is the matrix whose elements are vninj. The variance-covariance matrix for is 62(A'V-IA) 1. If 6 is known, then the B L U E of 2 is
= (ltV-1l) l l t v - l ( x -
6~)
(2.16)
Optimal linear inference using selected order statistics in location-scale models
187
with Var()~) = 6 2 ( 1 W 11) 1
(2.17)
If 2 is known, then the BLUE of 6 is
; -~- (o~tv-lo~)-l~tv-i(x - 21)
(2.18)
Vat(3) = 62(0(V-1~¢) i
(2.19)
with
These estimates can all be computed once the first two moments of the order statistics are known. This demonstrates the importance of tables of moments of order statistics. Depending on one's focus and which parameters may be known, one may use a variety of criteria to choose the optimal set of k order statistics at this point. This will be the focus of the next section. Suffice it to say here that analytic results on optimal selection of order statistics are quite rare. The result is the generation of lengthy tables for each possible sample size. To simplify things, asymptotic results are often used. For the asymptotic case, (Pl,P2,... ,p/r) is called a spacing of the k selected order statistics if 0 < pl < P2 < ' < pk < 1. Here, the ranks of the order statistics are obtained from ni = Inpi] + 1, for i = 1, 2, . . . , n, where [.] represents the greatest integer function. For a fixed spacing, the asymptotic distribution of selected order statistics was obtained by Mosteller (1946). THEOREM (Mosteller). Let 0 < Pl < P2 < " " < Pk < 1 be fixed. Suppose that X = (Xn~:~, Xn2:n, . . . , X,~:,) ~ is a vector of k order statistics from a random sample of size n from distribution function G with associated density g, where ni = [npi] + 1, for i = 1, 2, . . . , k. Suppose that g(G-l(pi)) > 0 and that g and g' are continuous in a neighborhood of G-l(pi) for i = 1, 2, . . . , n. Then X h a s a kvariate normal distribution with
E(Xn,:,)=G 1(pi)
for i = l ,
2,..., k
and
(2.20)
Cov(X,,,,X,j,) •
= 1.
"
n
p,(1 - pj)
g(G-l(pi))g(G l(pj))
for i _ < j = l ,
2, . . . , k .
Under the location-scale model, with G = F~.6, note that E(X) = 21 + 6u, where U = (Ul,U2,.--,Uk)
' = (F I(pl),F-1Co2),... , F l(pk))' .
(2.21)
Also, the variance-covariance matrix of X is given (62/n)W, where the (i,j)th element of W is given by ( p i / j ~ ) ( ( 1 - p j ) / f j ) for i<_j, where f,. = f ( u i ) for i = 1, 2, . . . , k. Let B = (1, u). Then the Gauss-Markov theorem yields the Asymptotically Best Linear Unbiased Estimator (ABLUE) of O as
188
M. M. Ali and D. Umbach
= (BtW I B ) - I B t w - I / .
(2.22)
The asymptotic variance-covariance matrix for ~) is (62/n)(B'W-1B) 1. The coefficients of ~) and the elements of W 1 have been expressed in many different ways by different authors. In particular, W -1 is a tri-diagonal matrix. The main diagonal elements can be expressed as + Pi-----1)pi1
fi2~o~
f o r / = 1, 2, . . . , k ,
(2.23)
where p0 = 0 and pk+l = 1. The diagonals above and below the main diagonal have non-zero values where the (i, i - 1) th and the ( i - 1, i)th elements are --J~f-I
for i = 2, 3, . . . , k .
(2.24)
Pi -- Pi 1
All other elements of W -1 are zero. Quantities K1, K2, and K3 are introduced in Sarhan and Greenberg (1962) as
k+l
K1 = I'W 11 = Z
~
-- J~-l) 2
(2.25)
i=1 p i - - P i - I k+l --ft lUi 1)2 K2 = u'W-~u = Z (fiui i--1
(2.26)
Pi -- Pi 1
k+l K 3 = ltW 1/t = Z (J~ui - j } I Ui-1 )(ft" - j } 1) , i=i Pi - Pi 1
(2.27)
where additionally fo = fouo = f~+luk+l = fk+l = 0. With A = KIK2 - K~ this yields
k k (~ = (~2,3)' = (~-~. aiX,,:n, ~ biXni:n) t i=1 i-1
(2.28)
where
diiK2 ~ - d ~ ai=~
1 pi 1
fi+l ~ ) pi+l (2.29)
A
\
Pi - pi-1
bi= ~- \- pi- pi~l
Pi+1 - Pi
]
pi+-~l--p~i I
f i-KJ ~3 ~- I A - Pi- 1 Pi+ft'+1- ~ . ) 1
for i = 1, 2, ...
(2.30) k.
Also, the variance-covariance matrix for (2,3)' can be expressed as (b2/n)M, where
Optimal linear inference using selected order statistics in location-scale models
-K3
M = S
189
(2,31)
K1
If 6 is known, then the A B L U E of 2 is k
a*Xn,:~- a~7
(2.32)
i = 1, 2, . . . , k
(2.33)
: ( I ' W - 1 1 ) - l l ' w - I ( x - au) : Z i=1
1
with
a; =fi@i-fi-1Pi-1 Pi+lfi+'--~) for and Var()0 = (a2/n)(l'W 11)-'
=
¢]2/(nK1)
(2.34)
.
If 2 is known, then the A B L U E of ~ is
= (u'W-lu):lu'w-l(x
k - )~l) = ~_b~Xn,:n - .~ K3
i=1
(2.35)
/£2
with = f i ( ~ u ~ i 7 J~_IHi_I b*
\
fi+lUi+l_ _ - ~ u i ~
Pi-Pil
Pi+l-Pi
for i :
1, 2, ...
k
(2.36)
I
and
Var(6) =
(a2/n)(utW-1/4)
1 = (52/(nK2) .
(2.37)
3. Optimality criteria for estimation
The question now arises as to which set of order statistics to select. For the small sample case, this typically results in checking all (~) subsets of k order statistics to determine the optimum collection once the optimality criterion has been established. This is typically straightforward on a computer. For the asymptotic case, the programming is a bit more complex because optimizing over 0 < Pl < / ) 2 < " " < Pk < 1, is required. One cannot simply check all possible collections of order statistics. For the case where one of the parameters is known, it is clear that selecting the order statistics to minimize the variance of the estimator is appropriate. This leads to minimization of 62(l'V 11)-1 or maximization of l ' V - l l when 6 is known for the small sample case and to maximization of K1 in the asymptotic case. If 2 is
190
M. M. Ali and D. Umbach
known, one minimizes 62(e'V-le) ~ or maximizes ~tv-l~ for the small sample case and maximizes/£2 for the asymptotic case. When both parameters are unknown, the situation is a bit more interesting. One approach is to select the order statistics to minimize the generalized variance, Var(2)Var(6) - Cov2@ 6). This is, of course, the determinant of the variancecovariance matrix of ®. This leads to maximization of det(A'V-1A) for the small sample case and to minimization of det(M) = 1/A in the asymptotic case. This leads to maximization of A = K1K2 - K 2. In many early papers in this area, the optimal spacings for the parameters of some specific distributions were obtained by maximizing K1, K2, or A over 0 < pl < P2 < "'" < Pk < 1. However, in many cases this approach proved to be mathematically intractable or computationally tedious. Hence alternative criteria and approaches were developed to attack the problem. In their survey paper, Chan and Cheng (1988) mention the use of dynamic programming and other numerical optimization techniques to overcome some of the computational difficulties in this maximization process. Hassanein (1969a) takes a different approach. He minimizes the total variance, i.e. Var(2) + Var(6). This is the trace of the variance-covariance matrix. In the asymptotic case, the total variance is (a2/(nk))(K2 + K~). I f f is symmetric and a symmetric spacing is used, (Pi = 1 -pk-i+l), then K3 = 0. This special case leads to minimization of K21 + Ki-1. One might also take a minimax approach to the problem. This entails minimizing max(Var(),), Var(6)). For the asymptotic case, this reduces to minimizing max(K1/A, K2/k). For the symmetric case mentioned above, this entails maximization of min(K1,/(2). An approximate solution to the optimal spacing has been proposed by Sfirndal (1962) and Eubank (1981a, 1986). Let I represent the Fisher information matrix for &,a. Now, since 2 and 6 are location and scale parameters, it can be shown that 1 i= j
1 (J1 J 3 ) J3 J2
'
(3.1)
where J1, J2, and J3 do not depend on 2 and 6. Define 0(w) = (~q(w), O2(w))', where
02
I//l(W) =n-Z-~.af(F I(W)) O2(w)
=
Ow 02 __F Ow2
-1
( w ) f ( F -l(w)) .
(3.2) (3.3)
Then, under suitable regularity conditions, Eubank (1981 a) shows that as k --+ oc, optimal spacings are given by pi = H - l ( i / ( k + 1)), where H -I is the quantile function for the density h, given by
Optimal linear inference using selected order statistics in location-scale models
1
1~2(w)12/3/ f2 1O2(w)lZ/3dw
for 6 known, for )o known,
[0(w),j_lqt(w) ]1/3/f~[qt(w),j_lqt(w)l,/3dw
for both unknown.
IlPl(W)l
h(w) =
2/3
191
/fo
ItPl(W)l 2/3dw
(3.4) Not much attention was focused on the problem of estimating the quantiles using a few selected order statistics until the early 1980's. Kubat and Epstein (1980), Koutrouvelis (1981), Eubank (1981b), Ali, Umbach and Hassanein (19818), and Umbach, Ali and Hassanein (1981b) are some of the papers which created more interest in the problem of optimal selection of order statistics in general and in the problem of estimating quantiles using optimal spacings in particular. The general approach to estimating quantiles using a few selected order statistics is outlined below. The ~th quantile ofF~,a is given by Q(~) = 2 + 6F -1 (~). The BLUE or A B L U E of Q(~) is given by )~+ 6F -1 (~), whose variance is Var(2) + (F l(~))2Var(6)+ 2F 1(~)Cov(2, 6). For the asymptotic case, this means one must minimize ( K 2 q- ( f
l ( ~ ) ) 2 K 1 - 2F l(~)K3)/a .
(3.5)
The optimal spacing in this case almost always depends on the value of {. Thus, if one is going to estimate Q(~) for more than one value of 4, another criteria for choosing the spacing is required. This is the focus of the remainder of this section. Let q(2, 6) be a sufficiently smooth function of the location and scale parameters that one may wish to estimate. Examples of such include the quantile function Q({) = 2 + 6 F 1(~) ,
(3.6)
the hazard function f ( ( t - 2)/6) h(t) = 6(1 - F ( ( t - ),)/6))
'
(3.7)
and the survival function s(t) = 1 - F ( ( t - 2)/6) .
(3.8)
N
Consider q(2, 6) as an estimator of q(2, 6). The local linearization theorem as in Rao (1965) can be used to obtain the asymptotic distribution of q(2, 6). Oq Oq Oq Oq • Thus, if ~ , 37, 2 ag' 2 and 0;C37 2 all exist bS:' for - v o < ,~ < ec and 0 < 6 < oo, then q(2, 6) is asymptotically unbiased with an expression of the asymptotic variance of q(2, 6) given by / \ [, 2 - ~ Oq Oq Var(q(2, 5)) = Var(2)/0q|2+\~j V a r ( 5 ) ( ~ ) +2Cov(2, 6 ) ~ 62 = --a~'M¢o , n
(3.9)
192
M. M. Ali and D. Umbach
where a~, = (Oq Oq) ' ~'Oa A similar expression for the variance of the maximum likelihood estimator of q(2, 6) can be developed. Let )o and ~ represent the maximum likelihood estimators. Then the asymptotic variance of the maximum likelihood estimator can be expressed as q(2, c5) as
(+)2
Var(q()~, 6)) = Var(2) ~-~
+Var(~) \ ~ j
^ OqOq
+ 2Coy()., c5)~-~ 0a 62
= --ofJ-l(~o n
(3.10)
.
Thus, the asymptotic efficiency of q(2, ~) can be expressed as AE(q(i, 3)) - Var(q(2, 3)) Var(q(2, a))
~o'J la~ ogM~o
(3.11)
Finally, by the Courant-Fisher Theorem, see Rao (1965, p. 48), AE(q(2, 6)) lies between vl and re, where vl _< v2 are the characteristic roots of J - I M ~. Now, the smaller characteristic root, Vl can be expressed as 1
1)1 -- 2(J1J2 - j 2 )
{J2K1 q- J1K2 - 2J3K3.
- [ ( J 2 K 1 - J I K 2 ) 2 -}- 4(J2K3 - J3K2)(J1K3 - J3K,)] 1/2 }
(3.12)
while the larger characteristic root, v2 can be expressed as 1 V2-
2(JlJ2 _j2)
{J2K1 + J I K 2 - 2J3K3.
@ [ ( J 2 g l - J 1 g 2 ) 2 -]- 4(J2K3 - J3K2)(J1K3 - J 3 g l ) ] 1/2 } •
(3.13)
I f f is symmetric then J3 = 0. If also the spacing is symmetric then K3 = 0. In this case vl is min(K1/J1,K2/J2) and v2 is max(K1/Ji,K2/J2). One approach is to choose the spacing to maximize trace(J-1M 1) = vl + V2
J2KI + J1K2 - 2J3K3
(3.14)
J, J2 -
For the symmetric case, this reduces to K~/J1 + K2/J2. This is motivated by the total variance criterion. Consideration of the generalized variance concept leads to m i n i m i z i n g
Optimal linear inference using selected order statistics in location-scale models
193
d e t ( j - l M -1) = vlv2 (3.15)
K1K2 - K~ g
g2 -
Since J1, J2, and J3 are fixed, this criterion results in the same spacing as minimizing the generalized variance. Since Vl serves as a lower bound for the asymptotic efficiency of q()o, 3), a conservative spacing is obtained by choosing the spacing 0 < pl < p2 < " < pk < 1 to maximize vl. The beauty of any of these last three methods of choosing a spacing is that they yield a spacing which is independent of the function or functions of 2 and 6 being estimated. Thus, the spacing so generated may be considered robust for the estimation of 2 and 6 as well. It should be noted that a similar approach could be taken to compare O with other estimators. One could simply replace (3.1) with (1/n) times the inverse of the variance-covariance matrix of the estimator of ® for which a comparison is to be made. Such estimators might include the full sample BLUE as an example.
4. Specific distributions In this section these results are applied to specific distributions. The so-called life distributions play a central role because observations from these distributions are often naturally ordered.
4.1. Exponential distribution
The distribution function of the exponential distribution is given by
F~.,~(x)
l" 0 1 - e x p ( - ( x - 2)/c5))
for x < 2 for 2 _< x .
(4.1)
The lack-of-memory property makes it a natural distribution as a model for lifetime analysis. Estimation of the parameters of this distribution has been carried out by many authors. Harter (1961) estimates 2 with 6 known and c5 with 2 known in small samples with k = 1 or 2. Kulldorff (1963a) considers the same problem with more order statistics, Both show that the optimal selection of order statistics is the same whether estimating 2 or 6 alone. For the two parameter problem, minimization of the generalized variance was carried out for small samples by Sarhan and Greenberg (1963) for k --- 2 and by Saleh (1967) for larger values of k. Saleh also considers censoring. Saleh, Ali and Umbach (1983b) extend this to samples censored in the middle. Siddiqui (1963) uses a slightly different approach to the problem with k = 2. For the asymptotic case, note that ui = - l n ( 1 - p i ) and j) = 1 - p i . These can be substituted in K1, K2, and K3, and then K1, /(2, and A maximized over
194
M . M . A l i a n d D. U m b a c h
0 < Pl )2
< ' • ' < Pk < 1
directly. However, f o r
this distribution
it is c o n v e n i e n t
to carry out the maximization over the ui's, i.e. over 0 < ul < u2 < -. • < uk < ec. After substitution and simplification, one finds K1 -
1
(u,
k
K2=~
- - u1' 2 2
eui - - eui t
i-I
K3
(4.2)
e u~ - 1 u0=0
(4.3)
_ e ulul- 1
(4.4)
O g a w a (1960) maximizes (4.3) for estimation o f 6 with 2 k n o w n for k = 1 ( 1) 15. He presents the spacings and coefficients o f the A B L U E for 6. This table was reproduced in Sarhan and Greenberg (1962) and has since been used extensively in m a n y applications. A s y m p t o t i c results are quite interesting when 2 is u n k n o w n . F o r the estimation o f 2 when 6 is known, one maximizes (4.2), which occurs for ul = 0 or pl = 0. Thus, the first order statistic Xl:n should be selected. W h e n both parameters are u n k n o w n , the generalized variance criterion leads to maximization o f A-
/£2 e u~ - 1
{ ul "/2 \e"~ - 11
After the reparameterization ti = ui+l following expressions are obtained I./2
/(2 -- e u~ _ 1
(4.5)
-
Ul
for i --- l, 2 , . . . , k - 1, (to = 0) the
+ e - U , X-"k-1 (ti . .- ti-1) . . 2 i=1
et~ -- eti 1
u2 e ul -- 1
e "'K~
(4,6)
After simplification, A has the following f o r m e-Ul
A - ~
eut _
K~
(4.7)
which is to be maximized over 0
0
<ec
.
(4.8)
Notice that Kj depends only on q, t2, . . . , tk-1 and has the same f o r m as (4.3). Thus the results o f O g a w a (1960) can be used to optimize K~. N o w e - " ' / ( e "~ - I) is maximized at Ul = 0 or pl = 0. But pl -- 0 is not covered by Mosteller's Theorem (1946). Sarhan, Greenberg and O g a w a (1963) consider the asymptotic case for k = 2. Saleh and Ali (1966) try to patch things up by justifying the choice pl = 1 / ( n + 1) for arbitrary k. Saleh (1966) extends these results to the censored case. Estimation o f the quantile function Q(~) = 2 - 6In(1 - ~)
(4.9)
Optimal linear inference using selected order statistics in location-scale models
195
has been considered by U m b a c h , Ali and Hassanein (198 l a). They minimize the variance of the BLUE, 2 - aln(1 - ~), for k = 2 with n = 2(1)30. This work was extended to k = 3(1)6 for n = k(1)100 in Ali, Umbach and Saleh (1982). For asymptotic results, note that (3.5) can be rewritten as K ; q (F-1 ({))2 eu,-1
2~u,l (~)Ul) 1 e'-I J e-U,K~
.((F41(~)-)2 2F4'(~)<-) e"' =eU'+\ e,-1 e,-1 /K;
(4.,0)
Minimization of (4.10) was carried out by Ali, Umbach and Hassanein (1981a) for k = 2 and by Saleh, Ali and Umbach (1983b) for k = 4(2)14. Again, this was done in two stages. First Ogawa (1960) was used to maximize K~. Then the resulting function of Ul was minimized. For certain values of { the first order statistic is selected. In these cases pl = 1/(n + 1) was again used. A more careful approach to maximization of zXwas taken in Ali, Umbach and Saleh (1985). Here the first order statistic was included in the selected group and then k was maximized over 0 < p2 < p3 < " < Pk < 1.
4.2. Logistic distribution Another distribution that has had a lot of attention in the area of optimal spacing is the logistic distribution. Even though it is not itself a life distribution it is a simple transformation of one, the so-called log-logistic distribution. The form of the distribution function to be considered here is 1 b3~,a(x) = 1 + e x p ( - ( x - 2)/6)
(4.11)
Small sample work for this distribution has been somewhat sparse. Chan, Cheng and Mead (1971) consider the estimation of the parameters in small samples using the generalized variance criterion. Beyer, Moore and Hatter (1976) extended these results by including censoring. In considering the asymptotic case, compact expressions u, = - ln((1 - p i ) / p i ) and f,. = (1 - p i ) / p i are available. However, these do not lead to easily manipulated expressions for KI, K2, and K3 as they did in the exponential case. Nonetheless, asymptotic results are rather plentiful for this distribution. Gupta and Gnanadesikan (1966) consider estimation of 2 and a alone for k = 2 or 3. They note the complexity involved in the problem of minimizing the generalized variance. Hassanein (1969a) attacked the joint estimation problem by consideration of the total variance. Chan, Cheng and Mead (1972) minimize the generalized variance for k = 2, 3, and 4. For larger values of k, Chan and Cheng (1972) find the optimal estimator of 2 with 6 known. Later, Chan and Cheng (1974) extend these results to the censored case. Hassanien (1974) minimizes the generalized variance for k = 2(1)10.
196
M. M. Ali and D. Umbach
Estimation of the quantile function 0(4) = 2 - 61n((1 - ~)/~)
(4.12)
was considered by Eubank (1981 b). Using Eubank's approach, Saleh, Hassanein, and Ali (1992) consider the problem of finding ABLUE's of the quantile function for k = 2(1)10. Ali and Umbach (1989b) present exact results for k = 2, 3, and 4. Saleh0 Ali and Umbach (1992) provide the full asymptotic results. Ali and Umbach (1994) also find the spacings that maximize vl in (3.12) subject to the spacing being symmetric, the so-called conservative spacing. To carry this out, note the Fisher information matrix (3.1) for the logistic distribution has J1 = 1/3, J2 = (g2 + 3)/9, and J3 = 0. This yields v,
=
729 . f 722 -~- 3 . i mln - - h l , - (~22+3)2 ]. 9
/£2 ]
3 J~
(4.13)
to be maximized. The results of the maximization process are reproduced in Table 4.1 for illustrative purposes. 4.3. Extreme value distribution
The extreme value distribution provides a natural model for lifetimes. In this work, the Type 1 (smallest) extreme value distribution, whose distribution function is given by F~,6(x) = 1 - e x p ( - e x p ( ( x - 2)/6)) ,
(4.14)
is considered. All results are easily transformed to cover the "largest" extreme value distribution, since the "largest" is simply the negative of the "smallest." Hassanein (1969b) finds estimates of the parameters based on 2 or 3 order statistics for small samples, n _< 20. He also gives asymptotic results for k -- 2, 3, and 4 based on the total variance criterion. Hassanien (1972) extends these results to k = 2(1)10 in the small sample case. He also considers censoring. For this distribution ui = i n ( - l n ( 1 - p i ) ) and J} = - ( 1 - p i ) l n ( 1 - p i ) . Even after substituting these into K1, K2, and /£3, the expressions are still very untractable. Numerical results, however, are plentiful for this distribution. This can be attributed to the importance of the distribution in life testing. Hassanein (1968) provides asymptotic results for estimation of the parameters separately. He considers joint estimation of 2 and ~ by minimizing the generalized variance for k = 2. Chan and Mead (1971a) and Chan and Kabir (1969) extend the results to consider more order statistics. Hassanien (1972) minimizes the total variance for the joint estimation of 2 and 6. Kulldorff (1973) shows that the optimal spacing for location in a Type 1 extreme value distribution is essentially the same as the optimal spacing for scale in a Type 2 distribution. Estimation of the quantile function Q(~) = 2 + 61n(-ln(1 - ~))
(4.15)
Optimal linear inference using selected order statistics in location-scale models
197
Table 4.1
Optimal spacing, coefficients, and characteristic roots for the conservative spacing for the logistic distribution. k=2
k=3
k=4
k=5
k=6
k=7
k=8
pl a~ b~ P2 a2 b2 P3 a3 b3 p4 a4 b4 p5 a5 b5 P6 a6 b6 /~ a7 b7 P8 a8 b8
0.1503 0.5000 -0.2887 0.8497 0.5000 0.2887
0.1029 0.1587 -0.2309 0.5000 0.6826 0.0000 0.8971 0.1587 0.2309
0.0552 0.0431 -0.1011 0.2282 0.4569 -0.1745 0.7718 0.4569 0.1745 0.9448 0.0431 0.1011
0.0404 0.0213 -0.0724 0.1690 0.2098 -0.1699 0.5000 0.5379 0.0000 0.8310 0.2098 0.1699 0.9596 0.0213 0.0724
0.0259 0.0093 -0.0430 0.1098 0.0813 -0.1077 0.2749 0.4095 -0.1227 0.7251 0.4095 0.1227 0.8902 0.0813 0.1077 0.9741 0.0093 0.0430
0.0197 0.0052 -0.0321 0.0850 0.0474 -0.0842 0.2124 0.2197 -0.1331 0.5000 0.4552 0.0000 0.7876 0.2197 0.1331 0.9150 0.0474 0.0842 0.9803 0.0052 0.0321
0.0141 0.0028 -0.0220 0.0617 0.0258 -0.0592 0.1527 0.1019 -0.0987 0.3066 0.3696 -0.0930 0.6934 0.3696 0.0930 0.8473 0.1019 0.0987 0.9383 0.0258 0.0592 0.9859 0.0028 0.0220
K1 K2 /£3
0.2170 0.9311 0.0000
0.2909 0.9779 0.0000
0.2762 1.1849 0.0000
0.3077 1.2072 0.0000
0.2995 1.2849 0.0000
0.3159 1.2971 0.0000
0.3110 1.3343 0.0000
vl v2
0.6511 0.6511
0.6838 0.8726
0.8286 0.8286
0.8442 0.9231
0.8986 0.8986
0.9071 0.9477
0.9331 0.9331
was considered by Hassanein, Saleh and Brown (1984) and (1986). They also make some applications to testing. Umbach and Ali (1996b) find the spacings that maximize Vl in (3.12), the conservative spacing. To carry this out, note the Fisher information matrix (3.1) for the extreme value distribution has J1 = 1, "]2 = (1 -- 7) 2 -F- 2 ' and J3 = 1 - 7, where 7 is Euler's constant. Using these values vl can be expressed as
vl = 7
3
{((1 - 7) 2 -}- 7z2/6)Xl + K 2 - 2(1 - 7)/£3
- I{((1 - 7) 2 + Tc2/6)K1 -- K2} 2 q- 4{((1 - 7) 2 -t- rr2/6)K3 - (1 - 7)K2}{K3 -- (1 -- 7)K1}] 1/2 }
(4.16)
198
M . M . A# and D. Umbach
A steepest d e s c e n t a l g o r i t h m was u s e d to m a x i m i z e this e x p r e s s i o n for k = 2(1)8. T h e results o f this p r o c e s s a r e r e p o r t e d in T a b l e 4.2. F'or a c o n c r e t e e x a m p l e o f the use o f s u c h a table, c o n s i d e r the d a t a o n p a g e 257 o f A r n o l d , B a l a k r i s h n a n , a n d N a g a r a j a (1992). T h e r e t h e y r e p o r t the a n n u a l rainfall at the L o s A n g e l e s C i v i c C e n t e r f o r the y e a r s 1890 t h r o u g h 1989. E v e n t h o u g h t h e d a t a is n o t a r a n d o m s a m p l e , it f o l l o w s a n e x t r e m e v a l u e d i s t r i b u t i o n q u i t e well. S u p p o s e t h a t e s t i m a t e s o f 2 a n d 6 b a s e d o n k = 5 o r d e r statistics are desired. F r o m T a b l e 4.2, t h e f o l l o w i n g a r e f o u n d , pl = 0.0927, p2 = 0.3626, P3 = 0.9029, 1o4 = 0.9762, p5 = 0.9963 . (4.17) T h u s , since n = 100, t h e r a n k s o f the selected o r d e r statistics are 10, 37, 91, 98, a n d 100. T h e d a t a yields Table 4.2 Optimal spacing, coefficients, and characteristic roots for the conservative spacing for the extreme value distribution k=2
k=3
k=4
k=5
k=6
k-7
k-8
0.1705 0.3759 -0.3722 0.9358 0.6241 0.3722
0.2700 0.4996 -0.4132 0.9499 0.4462 0.2827 0.9939 0.0542 0.1305
0.0614 0.0538 -0.1007 0.2707 0.3374 -0.2700 0.8845 0.4984 0.1698 0.9824 0.1103 0.2009
0.0927 0.0824 -0.1415 0.3626 0.3962 -0.2534 0.9029 0.4014 0.1712 0.9762 0.0954 0.1544 0.9963 0.0247 0.0693
0.0284 0.0195 -0.0411 0.1280 0.0742 -0.1172 0.3320 0.3048 -0.2144 0.8480 0.4208 0.0926 0.9575 0.1386 0.1781 0.9929 0.0421 0.1020
0.0401 0.0279 -0.0565 0.1707 0.1011 -0.1441 0.4085 0.3362 -0.1899 0.8669 0.3598 0.1101 0.9548 0.1148 0.1428 0.9870 0.0467 0.0949 0.9978 0.0135 0.0427
0.0152 0.0094 -0.0207 0.0703 0.0321 -0.0599 0.1819 0.0831 -0.1155 0.3735 0.2780 -0.1787 0.8211 0.3670 0.0536 0.9335 0.1444 0.1461 0.9798 0.0655 0.1162 0.9965 0.0205 0.0590
KI /(2 /(3
0.6257 1.1411 0.2645
0.6847 1.2487 0.2895
0.8092 1.4756 0.3421
0.8429 1.5371 0.3564
0.8850 1.6140 0.3742
0.9046 1.6497 0.3825
0.9234 1.6840 0.3904
vl v2
0.6257 0.6257
0.6847 0.6847
0.8092 0.8092
0.8429 0.8429
0.8850 0.8850
0.9046 0.9046
0.9234 0.9234
pl al bl P2 a2 b2 P3 a3 b3 P4 a4 b4
P5 a5 b5 P6 a6
b6 p7 a7
b7 P8 a8 b8
Optimal linear inference using selected order statistics in location-scale models
199
n~0 = 7.38, n37 = 11.80, n91 = 23.65, n98 = 30.57, n~00 = 34.04 . (4.18) The table also yields al = 0.0824, a2 = 0.3962, a3 = 0.4014, a4 = 0.0954, a5 = 0.0247 . (4.19) Thus, ~. --- 0.0824 • 7.38 + 0.3962 - 11.80 + 0.4014.23.65 + 0.0954 • 30.57 + 0.0247 • 34.04 -= 18.53 .
(4.20)
The table also yields bl = - 0 . 1 4 1 5 ,
b2 = - 0 . 2 5 3 4 ,
b 3 = 0.1712, b 4 = 0.1544, a5 = 0.0693 .
(4.21) Thus, = - 0.1415 • 7.38 - 0.2534- 11.80 + 0.1712.23.65 + 0.1544- 30.57 + 0.0693 • 34.04 = 7.09 .
(4.22)
Table 4.2 also yields KI = 0 . 8 4 2 9 , K 2 = 1.5371, and / £ 3 = 0 . 3 5 6 4 . Thus A = 0.8429,1.5371 - 0 . 3 5 6 4 2 = 1.1686. Substituting these in (2.31), the asymptotic variance o f ~ is (62/100)(1.5371/1.1686). Using 7.09 for 6, this is approximately 0.6612. Again, using (2.31) the asymptotic variance o f 6 is (62/100) (0.8429/1.1686). Using 7.09 for 6, this is approximately 0.3626.
4.4. Weibull distribution The Weibull distribution is second only to the exponential distribution in importance as a life distribution. Specifically, consider the Weibull distribution given by
F~,~(x)
f 0 1 - e x p ( - ( x - 2)/6) ~)
tbr x < )~ for fl < x .
(4.23)
In some applications, authors consider e k n o w n and get optimal spacings for the estimation o f 2 and 6 for various values o f c~. Using this approach, ui = ( - l n ( 1 _pi))O/~) and J} = e(1 -Pi)(-ln(1 -pi)) "@'.As before, these do not lead to simple forms for g l , K 2 , and /£3. Nonetheless, numerical results are available for the joint A B L U E o f fl and 6 for k = 2, 4, and 6 based on the generalized variance criterion in Hassanein (1971). A n o t h e r fruitful a p p r o a c h comes from considering the shape-scale Weibull distribution, whose distribution function is given by
F~,B(x) =
{ 0 1 - exp(-(x//~) ~)
for x < 0 for 0 <_ x .
(4.24)
200
M. M. All and D. Umbach
The natural logarithm of a random variable with this distribution has the extreme value distribution (4.14) with 2 = In fi and 6 = 1/co Thus, the estimators k
lnAfl =
~ailnX~,:~ i=l
=
=- ~ bilnXn~:,
=
(4.25)
k
1/~
3
i=1
are obtained. Thus, for an arbitrary spacing, estimators of the form = e i and ~ = =1
(4.26)
can be considered. Using this approach Mann (1970) obtains the optimal spacing for the scale parameter. Umbach and All (1996b) used the same approach to find a conservative spacing for the Weibull distribution. Using the estimators (4.20), they find that j) and ~ are asymptotically unbiased and obtain the asymptotic variance-covariance matrix for (fl, ~) as (fl2/n)M*, where M * = A -1
7
-9
"~
(4.27)
Here, f i , g2, K3, and A are calculated using the extreme value distribution (4.14). They also note that for the Weibull distribution (4.18), the information matrix is given by l*
1j,
1 (
~2
-(1-7)fi
B2)
(4.28)
Let q(~, fl) represent a sufficiently smooth function of ~ and ft. The asymptotic efficiency of q(& fl) can be expressed as ~ t J *-~ ¢o
aE(q(& fl)) - re'M'co
(4.29)
Thus, AE(q(~ fl)) lies between v~ and v~, where vl < v~ are the characteristic roots ofJ*-~M *-~. Interestingly, it turns out that v~ and v~ do not depend on e and fi, even though J* and M* do. In fact, v~ can be expressed as 3 v~ =~-5 {((1 - y) 2 + ~2/6)K1 +I£2 + 2(1 - y)K3 - [{((I - 7) 2 + zc2/6)Xl - K2} 2 + 4{((1
- 7) 2 + a:2/6)K3 +
(1
(4.30)
- y ) K 2 } { K 3 + (1 - y ) K l } ] 1/2 } .
They present the results of numerically maximizing v~ for k = 2(1)8.
Optimal linear inference using selected order statistics in location-scale models
201
4.5. N o r m a l distribution
The normal distribution has received a fair amount of attention in this area also. Here the mean is 2 and the standard deviation is 6. One of the earliest works in selecting order statistics for the location scale model was an application to the normal distribution by Yamanouchi (1949). He justifies the use of the spacing Pi = (i - ½)/k. Kulldorff (1963b) finds the optimal spacing using the generalized variance criterion under symmetric spacing. Kulldorff (1964) takes an interesting approach. He uses simple estimates based on the selected order statistics, such as their average, instead of the B L U E or ABLUE. He finds optimal selections of order statistics within these simplified classes. Chan and Chan (1973) consider optimal spacings for the joint estimation of 2 and 6. Hassanein, Saleh and Brown (1986b) and Saleh, Hassanein and Ali (1988) consider estimation of the quantiles of the normal distribution. Ali and Umbach (1994) find the spacings that maximize v~ in (3.12) under symmetric spacings. For this problem J1 -- 1, J2 = 2, and J3 ~ 0. Symmetric spacings reduce (3.12) to vl = min{Kl,/£2/2} .
(4.31)
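As an illustration (not from the chapter), the reduced criterion (4.31) can be maximized over symmetric spacings by a simple grid search. The K1 and K2 below use the standard Ogawa-type sums, which is an assumption about the definitions of Section 3:

```python
# Illustrative grid search for the symmetric spacing maximizing
# v1 = min{K1, K2/2} of (4.31) for the standard normal distribution.
# K1, K2 are the standard Ogawa-type sums (an assumption about Section 3),
# with f_0 = f_{k+1} = 0, p_0 = 0, p_{k+1} = 1.
from statistics import NormalDist

nd = NormalDist()

def K_terms(ps):
    """K1 and K2 for a spacing 0 < p_1 < ... < p_k < 1 (standard normal)."""
    pts = [0.0] + list(ps) + [1.0]
    u = [nd.inv_cdf(p) for p in ps]
    f = [0.0] + [nd.pdf(ui) for ui in u] + [0.0]
    fu = [0.0] + [nd.pdf(ui) * ui for ui in u] + [0.0]
    K1 = sum((f[i] - f[i-1]) ** 2 / (pts[i] - pts[i-1]) for i in range(1, len(pts)))
    K2 = sum((fu[i] - fu[i-1]) ** 2 / (pts[i] - pts[i-1]) for i in range(1, len(pts)))
    return K1, K2

# k = 2 with symmetric spacing (p, 1 - p): scan p over a coarse grid.
def crit(p):
    K1, K2 = K_terms((p, 1.0 - p))
    return min(K1, K2 / 2.0)

best = max((crit(i / 1000.0), i / 1000.0) for i in range(1, 500))
print(best)
```

As a sanity check on the sums, the single median spacing p = 0.5 gives K1 = 2/π, matching the classical asymptotic efficiency of the sample median.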
They present results of maximizing (4.31) for k = 2(1)8.

4.6. Cauchy distribution
Consider the Cauchy distribution whose distribution function is given by

F_{λ,δ}(x) = ½ + (1/π) arctan((x − λ)/δ) .          (4.32)
This distribution has a knack for producing interesting results. It happens with selected order statistics as well. For this distribution u_i = tan(π(p_i − ½)) and

f_i = 1/[π(1 + tan²(π(p_i − ½)))] .          (4.33)
As usual, these do not produce expressions for K1, K2, and K3 that can be easily manipulated. Nonetheless, some activity is reviewed below. Bloch (1966) considers estimation of λ with known δ. He shows that the spacing p1 = .13, p2 = .40, p3 = .50, p4 = .60, p5 = .87 is 95% efficient. Chan (1970) finds the optimal spacings for the estimation of λ alone and δ alone. Chan, Chan and Mead (1973) show that the generalized variance is minimized when the uniform spacing p_i = i/(k + 1) is used. Cane (1974) restricts attention to symmetric spacings. She confirms that the symmetric spacing is optimal and tables the coefficients of the estimators for k = 2(1)10. Balmer, Boulton and Sack (1974) show that for estimating λ with known δ an asymmetric spacing may be optimal. They show that this is indeed the case when k = 4m − 1 for some positive integer m.
M. M. Ali and D. Umbach
For the Cauchy distribution, note that J1 = 1/2, J2 = 2, and J3 = 0. By restricting to a symmetric spacing, v1 in (3.12) is given by

v1 = min{2K1, K2/2} .
(4.34)
Maximization of this expression was carried out by Umbach (1994) for k = 2, 3, 4, and 5. He finds that a uniform spacing is optimal under this criterion only for k = 3.
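The quantities in (4.32)-(4.34) are straightforward to evaluate numerically. A sketch (the K1, K2 below are the standard Ogawa-type sums, an assumption about the definitions of Section 3):

```python
# Evaluate the symmetric-spacing criterion (4.34), v1 = min{2*K1, K2/2},
# for the standard Cauchy under the uniform spacing p_i = i/(k + 1).
import math

def cauchy_u_f(ps):
    """u_i = F^{-1}(p_i) and f(u_i) from (4.32)-(4.33), standard Cauchy."""
    u = [math.tan(math.pi * (p - 0.5)) for p in ps]
    f = [1.0 / (math.pi * (1.0 + ui * ui)) for ui in u]
    return u, f

def K1_K2(ps):
    # Standard Ogawa-type sums (an assumption about Section 3), with
    # f_0 = f_{k+1} = 0 and f_0*u_0 = f_{k+1}*u_{k+1} = 0.
    u, f = cauchy_u_f(ps)
    pts = [0.0] + list(ps) + [1.0]
    fe = [0.0] + f + [0.0]
    fu = [0.0] + [fi * ui for fi, ui in zip(f, u)] + [0.0]
    K1 = sum((fe[i] - fe[i-1]) ** 2 / (pts[i] - pts[i-1]) for i in range(1, len(pts)))
    K2 = sum((fu[i] - fu[i-1]) ** 2 / (pts[i] - pts[i-1]) for i in range(1, len(pts)))
    return K1, K2

for k in (3, 4):
    ps = [i / (k + 1) for i in range(1, k + 1)]
    K1, K2 = K1_K2(ps)
    print(k, min(2.0 * K1, K2 / 2.0))
```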
4.7. Pareto distribution

Another distribution that has received quite a bit of attention in this area is the Pareto distribution. There are a number of forms of the distribution that can be considered. First consider the form

F_{λ,δ}(x) = 0                        for x < λ + δ ,
           = 1 − ((x − λ)/δ)^{−α}     for λ + δ ≤ x .          (4.35)
For the small sample problem, Vännman (1976) uses the generalized variance criterion to find the BLUE of λ and δ. Umbach, Ali and Hassanein (1981b) find the optimal spacing for the BLUE of the quantile function

Q(ξ) = λ + δ(1 − ξ)^{−1/α}          (4.36)

based on two order statistics. Note that u_i = (1 − p_i)^{−1/α} and f_i = α(1 − p_i)^{(α+1)/α}. Kulldorff and Vännman (1973) use these values to discuss the optimal spacing of the order statistics for estimation of λ and δ for known α in the asymptotic case. Chan and Cheng (1973) discuss the optimal spacing for estimation of δ with α and λ known. Umbach, Ali and Hassanein (1981b) find the optimal spacing for the estimation of Q(ξ) for k = 2. A few authors have considered the form of the distribution function given by
F_{α,β}(x) = 0                  for x < β ,
           = 1 − (x/β)^{−α}     for β ≤ x .          (4.37)
The natural logarithm of a random variable with this distribution has the exponential distribution (4.1) with λ = ln β and δ = 1/α. Thus, estimators

λ̂ = Σ_{i=1}^{k} a_i ln X_{n_i:n}   and   δ̂ = Σ_{i=1}^{k} b_i ln X_{n_i:n}          (4.38)

can be formed. Hence, for an arbitrary spacing, estimators of the form

β̂ = e^{λ̂}   and   α̂ = 1/δ̂          (4.39)

can be considered.
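The log-transformation behind (4.38)-(4.39) can be checked by simulation. A minimal sketch (the parameter values β = 2 and α = 3 are illustrative assumptions):

```python
# If X follows the Pareto form (4.37) with parameters beta and alpha, then
# ln X has the exponential location-scale law with lambda = ln(beta) and
# delta = 1/alpha, so exponential-based spacings and estimators carry over.
import math
import random

random.seed(1)
beta, alpha = 2.0, 3.0          # illustrative values, not from the chapter
# Inverse-CDF sampling from F(x) = 1 - (x/beta)^(-alpha), x >= beta:
xs = [beta * (1.0 - random.random()) ** (-1.0 / alpha) for _ in range(200_000)]
ys = sorted(math.log(x) for x in xs)

# ln X should start near ln(beta) and have mean ln(beta) + 1/alpha:
print(min(ys), sum(ys) / len(ys))   # about 0.693 and 1.026
```

Any exponential-optimal spacing can then be applied to the ordered ln X values, and β̂ = e^{λ̂}, α̂ = 1/δ̂ recovered as in (4.39).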
Koutrouvelis (1981) used this technique to estimate the parameters using the generalized variance criterion. Saleh, Ali and Umbach (1985) use this idea to form a nonlinear estimator of the quantile function β(1 − ξ)^{−1/α} = e^{λ}(1 − ξ)^{−δ}.
4.8. Other distributions

A number of other distributions have received some attention in this area. For the gamma distribution these include Särndal (1964) and Hassanein (1977). Some results for the double exponential distribution are available in Cheng (1978), Ali, Umbach and Hassanein (1981a) and Umbach, Ali and Saleh (1986). Dyer and Whisenand (1973) consider optimal spacing for the Rayleigh distribution. See the bibliography for others.

5. Tests of significance

In this section, tests of significance are considered. Specifically, four null hypotheses are considered:

H1: λ = λ0, δ unspecified,
H2: δ = δ0, λ unspecified,
H3: λ = λ0, δ = δ0,
H4: Q(ξ) = Q0, λ and δ otherwise unspecified.

Define the quadratic form Ω as

Ω = (X − λ1 − δu)′W⁻¹(X − λ1 − δu) .          (5.1)
Mosteller's Theorem implies that nΩ/δ² has a chi-square distribution with k degrees of freedom. Now, Ω can be orthogonally partitioned as Ω = Ω0 + Ω1, where

Ω0 = (X − λ̂1 − δ̂u)′W⁻¹(X − λ̂1 − δ̂u) ,          (5.2)
Ω1 = K1(λ̂ − λ)² + 2K3(λ̂ − λ)(δ̂ − δ) + K2(δ̂ − δ)² .          (5.3)
See Saleh, Ali and Umbach (1984). Thus, nΩ0/δ² has a chi-square distribution with k − 2 degrees of freedom and nΩ1/δ² has a chi-square distribution with 2 degrees of freedom. The minimum of Ω1 under the restriction λ = λ0 is (λ̂ − λ0)²Δ/K2. Following Anderson (1958, pp. 189-190), Saleh, Ali and Umbach (1984) propose the test statistic

t1 = √(nΔ/K2) (λ̂ − λ0) / √(nΩ0/(k − 2))          (5.4)

for testing H1. Under H1, t1 has a (central) t-distribution with k − 2 degrees of freedom. Under the alternative hypothesis, t1 has a noncentral t-distribution with noncentrality parameter
√(nΔ/K2) (λ − λ0)/δ .          (5.5)

The spacing that maximizes the power of the test is obtained by maximizing Δ/K2. This is, of course, the spacing that minimizes the variance of the ABLUE of λ with δ unknown. The minimum of Ω1 under the restriction δ = δ0 is (δ̂ − δ0)²Δ/K1. Again, they propose the test statistic
t2 = √(nΔ/K1) (δ̂ − δ0) / √(nΩ0/(k − 2))          (5.6)

for testing H2, which under H2 has a t-distribution with k − 2 degrees of freedom. Under the alternative hypothesis, t2 has a noncentral t-distribution with noncentrality parameter

√(nΔ/K1) (δ − δ0)/δ .          (5.7)
The spacing that maximizes the power of the test is obtained by maximizing Δ/K1. This is, of course, the spacing that minimizes the variance of the ABLUE of δ with λ unknown. For symmetric distributions and symmetric spacings, these reduce to maximization of K1 and K2, respectively. For testing H3, Saleh and Sen (1985) propose the test statistic L° = nΩ1/δ0². Under H3, L° has a chi-square distribution with 2 degrees of freedom. They consider the Pitman A.R.E. of the test, which turns out to be exactly (3.11). Thus, one could consider any of the three optimality criteria at the end of Section 3 to choose the spacing. In a similar manner to the development of t1 and t2, Saleh, Umbach and Ali (1984) propose the test statistic
t4 = (λ̂ + δ̂F⁻¹(ξ) − Q0) √(nΔ/{K2 + (F⁻¹(ξ))²K1 − 2F⁻¹(ξ)K3}) / √(nΩ0/(k − 2))          (5.8)

for testing H4. Again, t4 has a t-distribution with k − 2 degrees of freedom under H4. Under the alternative hypothesis, t4 has a noncentral t-distribution with noncentrality parameter

((λ + δF⁻¹(ξ) − Q0)/δ) √(nΔ/{K2 + (F⁻¹(ξ))²K1 − 2F⁻¹(ξ)K3}) .          (5.9)
The power of the test is maximized by choosing the spacing that maximizes this noncentrality parameter. This spacing also minimizes (3.5). Here again, the duality with the estimation problem is evident.
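The t-statistics of this section share one template. A hedged sketch of t1 in (5.4), assuming the quantities K1, K2, K3 and Ω0 have already been computed from the selected order statistics as in Sections 3 and 5, with Δ = K1K2 − K3²:

```python
# Sketch (not the authors' code) of the statistic t1 of (5.4) for
# H1: lambda = lambda0 with delta unspecified.
import math

def t1_statistic(lam_hat, lam0, K1, K2, K3, Omega0, n, k):
    """t1 of (5.4): compares the ABLUE lam_hat with lambda0.

    Under H1 this is t-distributed with k - 2 degrees of freedom.
    K1, K2, K3, Omega0 are assumed precomputed from the selected
    order statistics; Delta = K1*K2 - K3**2 as in Section 3."""
    Delta = K1 * K2 - K3 ** 2
    return (math.sqrt(n * Delta / K2) * (lam_hat - lam0)
            / math.sqrt(n * Omega0 / (k - 2)))

# Reject H1 for large |t1| relative to the t(k - 2) critical value.
```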
6. Testing goodness-of-fit
In this section, tests concerning the suitability of the assumed underlying distribution F of the location-scale model, F_{λ,δ}(x) = F((x − λ)/δ), are considered. The test statistics will vary depending on which parameters are assumed known. First, consider the goodness-of-fit hypothesis

H5: F_{λ,δ}(x) = F0((x − λ0)/δ0) ,          (6.1)
where F0 is a completely specified distribution function, with density f0, and λ0 and δ0 are specified. For a fixed spacing, let W0 and u0 be the values of W and u, respectively, calculated using F = F0 and f = f0. See (2.21) and following. Now, let

Ω5 = (X − λ0 1 − δ0 u0)′W0⁻¹(X − λ0 1 − δ0 u0) .          (6.2)
Under H5, nΩ5/δ0² has a limiting chi-square distribution with k degrees of freedom. Umbach and Ali (1990) propose testing H5 with a critical region given by nΩ5/δ0² > χ²_α(k), where χ²_α(k) represents the (1 − α)100%-ile of the chi-square distribution with k degrees of freedom. For a few different spacings, they show that the power of the test is comparable to Pearson's test, but that it is somewhat less powerful than the Kolmogorov-Smirnov test. Modifications are necessary if one or both of the parameters is unknown. For example, consider

H6: F_{λ,δ}(x) = F0((x − λ)/δ0) ,          (6.3)
where F0 is a completely specified distribution function and δ0 is specified. A value for the location parameter λ is not specified. To carry out the test, λ must be estimated. For this use the estimator λ̂ of (2.32) calculated using F = F0 and f = f0 and using δ = δ0. Similar to Ω5, let

Ω6 = (X − λ̂1 − δ0 u0)′W0⁻¹(X − λ̂1 − δ0 u0) .          (6.4)
Under H6, nΩ6/δ0² has a limiting chi-square distribution with k − 1 degrees of freedom. Umbach and Ali (1990) propose testing H6 with a critical region given by nΩ6/δ0² > χ²_α(k − 1). They also present results of a Monte Carlo power study. Things are not so straightforward for

H7: F_{λ,δ}(x) = F0((x − λ0)/δ) ,          (6.5)
where F0 is a completely specified distribution function and λ0 is specified. A value for the scale parameter δ is not specified. To carry out the test, estimate δ with δ̂ of (2.35) calculated using F = F0 and f = f0 and using λ = λ0. Now, let
Ω7 = (X − λ0 1 − δ̂u0)′W0⁻¹(X − λ0 1 − δ̂u0) .          (6.6)
Under H7, nΩ7/δ² has a limiting chi-square distribution with k − 1 degrees of freedom. However, a test statistic cannot be built directly on this quantity, since δ is an unknown parameter. Umbach and Ali (1990) propose the following fix for distributions F0 with finite variance σ0². Under H7, the sample variance S² converges in law to δ²σ0². This yields
nΩ7σ0²/S² = (nΩ7/δ²)(δ²σ0²/S²)          (6.7)
converging in law to a chi-square distribution with k − 1 degrees of freedom. They propose the use of nΩ7σ0²/S² as a test statistic for H7. Since alternatives to H7 may contain distributions whose second moments are not finite, which would tend to make (6.7) small, they propose a two-tailed test in this case. The case where both parameters are unknown is handled in a similar manner in Ali and Umbach (1989a). Consider
H8: F_{λ,δ}(x) = F0((x − λ)/δ) ,          (6.8)
where F0 is a completely specified distribution function. To carry out the test, estimate λ with λ̂ and δ with δ̂ calculated using F = F0 and f = f0. Then calculate the test statistic nΩ8σ0²/S² in a manner similar to the previous case. The limiting distribution is chi-square with k − 2 degrees of freedom in this case. They propose a two-tailed test in this case also. The results of an extensive Monte Carlo simulation of power are included in their work. LaRiccia (1991) presents a test based on all of the order statistics for H8. Umbach and Ali (1996a) propose a variant which is based on a few order statistics. It is outlined below. Choose appropriate functions h1, h2, …, hr and define
h_j = (h_j(u01), h_j(u02), …, h_j(u0k))′   for j = 1, 2, …, r ,          (6.9)
where u0i is the i-th element of u0. Form the linear model

X = λ1 + δu0 + Σ_{j=1}^{r} β_j h_j + e ,          (6.10)
where e has mean 0 and variance-covariance matrix (δ²/n)W0. Now, instead of testing H8 directly, they test

H9: β = 0 ,          (6.11)
where β = (β1, β2, …, βr)′. The X-matrix for a regression analysis based on the previous linear model can be partitioned as
X1 = (1, u0) ,   X2 = (h1, h2, …, hr) .          (6.12)
Now, for i and j = 1, 2, let C_ij = X_i′W0⁻¹X_j, forming

C = [ C11  C12
      C21  C22 ] .

Let C^{ij} represent the corresponding elements of C⁻¹. Then the regression estimator of β can be expressed as

β̂ = (C²¹X1′ + C²²X2′)W0⁻¹X .          (6.13)
Thus, for this model, β̂ is asymptotically normal with mean β and variance-covariance matrix given by (δ²/n)C²². Under H9, the quadratic form

(n/δ²) β̂′(C²²)⁻¹β̂          (6.14)

is asymptotically chi-square with r degrees of freedom. To form a test statistic with a chi-square distribution under H9, they replace δ in the denominator of (6.14) with δ̂. Thus, they propose the test statistic

(n/δ̂²) β̂′(C²²)⁻¹β̂          (6.15)
for testing the goodness-of-fit hypothesis H9. They include results of a fairly extensive Monte Carlo simulation that indicates the test statistic (6.15) converges to its limiting chi-square distribution faster than does LaRiccia's statistic. Umbach and Ali (1991) have also developed a test of exponentiality with both location and scale unspecified based on a few selected order statistics. It is based on the non-central t-distribution.
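As an illustration of the chi-square test for H5 (not the authors' code), Ω5 of (6.2) can be computed without explicitly inverting W0, using the Brownian-bridge form of Mosteller's covariance matrix; that identity is an assumption about the W of Section 2, not taken verbatim from the chapter:

```python
# Sketch of the goodness-of-fit statistic n*Omega5/delta0^2 of (6.1)-(6.2).
# It uses the identity that, with z_i = f0(u_0i)*(x_i - lambda0 - delta0*u_0i),
# z_0 = z_{k+1} = 0, p_0 = 0, p_{k+1} = 1,
#   (x - lambda0*1 - delta0*u0)' W0^{-1} (x - lambda0*1 - delta0*u0)
#     = sum_i (z_i - z_{i-1})^2 / (p_i - p_{i-1}),
# i.e. the Brownian-bridge form of the quantile covariance (an assumption).
from statistics import NormalDist
import random

def omega5(x_sel, ps, lam0, delta0, F_inv, pdf):
    """Quadratic form Omega5 of (6.2) for selected sample quantiles x_sel."""
    u0 = [F_inv(p) for p in ps]
    f0 = [pdf(ui) for ui in u0]
    z = [0.0] + [fi * (xi - lam0 - delta0 * ui)
                 for fi, xi, ui in zip(f0, x_sel, u0)] + [0.0]
    pts = [0.0] + list(ps) + [1.0]
    return sum((z[i] - z[i-1]) ** 2 / (pts[i] - pts[i-1])
               for i in range(1, len(pts)))

# Example: test H5 of standard normality (lambda0 = 0, delta0 = 1).
random.seed(7)
nd = NormalDist()
n, ps = 2000, [0.1, 0.3, 0.5, 0.7, 0.9]
sample = sorted(random.gauss(0.0, 1.0) for _ in range(n))
x_sel = [sample[int(n * p) - 1] for p in ps]       # selected order statistics
stat = n * omega5(x_sel, ps, 0.0, 1.0, nd.inv_cdf, nd.pdf) / 1.0 ** 2
print(stat)   # compare with the chi-square(5) critical value, e.g. 11.07
```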
References

[1] Adatia, A. and L. K. Chan (1981). Some relations between stratified, grouped and selected order statistics samples. Scand. Actuar. J., 193-202.
[2] Ali, M. Masoom (1994). Inference based on optimal inference: Some recent results. Proc. ICCS IV, Lahore, Pakistan 7, 145-153.
[3] Ali, M. Masoom and D. Umbach (1984). Testing Pareto quantiles using selected order statistics. J. Statist. Stud. 4, 107-123.
[4] Ali, M. Masoom and D. Umbach (1989a). A Shapiro-Wilk type goodness-of-fit test using a few order statistics. J. Statist. Plann. Infer. 22, 251-261.
[5] Ali, M. Masoom and D. Umbach (1989b). Estimation of quantiles of symmetrically truncated logistic distribution using a few optimally selected order statistics. J. Infor. Opt. Sci. 10(2), 303-307.
[6] Ali, M. Masoom and D. Umbach (1990). Nonlinear estimation for the log-logistic distribution using selected order statistics. J. Statist. Stud. 10, 1-8.
[7] Ali, M. Masoom and D. Umbach (1994). Estimating functions of the parameters of normal and logistic distributions using conservative spacings. Pak. J. Statist. 10(1)A, 179-188.
[8] Ali, M. Masoom, D. Umbach and K. M. Hassanein (1981a). Estimation of quantiles of exponential and double exponential distributions based on two order statistics. Commun. Statist. - Theory Meth. 10, 1921-1932.
[9] Ali, M. Masoom, D. Umbach and K. M. Hassanein (1981b). Small sample quantile estimation of Pareto populations using two order statistics. Aligarh J. Statist. 1, 139-164.
[10] Ali, M. Masoom, D. Umbach and A. K. Md. E. Saleh (1982). Small sample quantile estimation of the exponential distribution using optimal spacings. Sankhyā B 44, 135-142.
[11] Ali, M. Masoom, D. Umbach and A. K. Md. E. Saleh (1985). Tests of significance for the exponential distribution based on selected order statistics. Sankhyā B 47, 310-318.
[12] Ali, M. Masoom, D. Umbach and A. K. Md. E. Saleh (1992). Estimating life functions of chi distribution using selected order statistics. IIE Trans. 24(5), 88-98.
[13] Ali, M. Masoom, D. Umbach, A. K. Md. E. Saleh and K. M. Hassanein (1983). Estimating quantiles using optimally selected order statistics. Commun. Statist. - Theory Meth. 12(19), 2261-2271.
[14] Anderson, T. W. (1958). Introduction to Multivariate Statistical Analysis. John Wiley & Sons.
[15] Arnold, B. C., N. Balakrishnan and H. N. Nagaraja (1992). A First Course in Order Statistics. John Wiley & Sons, Inc., New York.
[16] Balakrishnan, N. and A. Clifford Cohen (1991). Order Statistics and Inference - Estimation Methods. Academic Press, Inc.
[17] Balmer, D. W., M. Boulton and R. A. Sack (1974). Optimal solutions in parameter estimation problems for the Cauchy distribution. J. Amer. Statist. Assoc. 69, 238-242.
[18] Benson, F. (1948). A note on the estimation of mean and standard deviation from quartiles. J. Roy. Statist. Soc. B 11, 21-100.
[19] Beyer, J. N., A. H. Moore and H. L. Harter (1976). Estimation of the standard deviation of the logistic distribution by the use of selected order statistics. Technometrics 18, 313-332.
[20] Bloch, D. (1966). A note on the estimation of the location parameter of the Cauchy distribution. J. Amer. Statist. Assoc. 61, 852-855.
[21] Bofinger, E. (1975). Optimal condensation of distributions and optimal spacing of order statistics. J. Amer. Statist. Assoc. 70, 151-154.
[22] Cane, G. J. (1974). Linear estimation of parameters of the Cauchy distribution based on sample quantiles. J. Amer. Statist. Assoc. 69, 243-245.
[23] Chan, L. K. (1969). Linear quantile estimates of the location and scale parameters of the Cauchy distribution. Statist. Hefte 10, 277-282.
[24] Chan, L. K. (1970). Linear estimation of the location and scale parameters of the Cauchy distribution based on sample quantiles. J. Amer. Statist. Assoc. 65, 851-859.
[25] Chan, L. K. and N. N. Chan (1973). On the optimum best linear unbiased estimates of the parameters of the normal distribution based on selected order statistics. Skand. Aktuarietidskr. 1973, 120-128.
[26] Chan, L. K., N. N. Chan and E. R. Mead (1971). Best linear unbiased estimates of the parameters of the logistic distribution based on selected order statistics. J. Amer. Statist. Assoc. 66, 889-892.
[27] Chan, L. K., N. N. Chan and E. R. Mead (1973). Linear estimation of the parameters of the Cauchy distribution using selected order statistics. Utilitas Math. 3, 311-318.
[28] Chan, L. K. and S. W. H. Cheng (1971). On the student's test based on sample percentiles from the normal, logistic, and Cauchy distributions. Technometrics 13, 129-137.
[29] Chan, L. K. and S. W. Cheng (1972). Optimum spacing for the asymptotically best linear estimate of the location parameter of the logistic distribution when samples are complete or censored. Statist. Hefte 13, 41-57.
[30] Chan, L. K. and S. W. Cheng (1973). On the optimum spacing for the asymptotically best linear estimate of the scale parameter of the Pareto distribution. Tamkang J. Math. 4, 1-21.
[31] Chan, L. K. and S. W. Cheng (1974). An algorithm for determining the asymptotically best linear estimate of the mean from multiply censored logistic data. J. Amer. Statist. Assoc. 69, 1027-1030.
[32] Chan, L. K. and S. W. Cheng (1982). The best linear unbiased estimates of parameters using order statistics. Soochow J. Math. 8, 1-13.
[33] Chan, L. K. and S. W. Cheng (1988). Linear estimation of the location and scale parameters based on selected order statistics. Commun. Statist. - Theory Meth. 17(7), 2259-2278.
[34] Chan, L. K., S. W. H. Cheng and E. R. Mead (1972). An optimum t-test for the scale parameter of an extreme-value distribution. Naval Res. Logist. Quart. 19, 715-723.
[35] Chan, L. K. and A. B. M. L. Kabir (1969). Optimum quantiles for the linear estimation of the parameters of the extreme-value distribution in complete and censored samples. Naval Res. Logist. Quart. 16, 381-404.
[36] Chan, L. K. and E. R. Mead (1971a). Linear estimation of the parameters of the extreme-value distribution based on suitably chosen order statistics. IEEE Trans. on Reliab. R-20, 74-83.
[37] Chan, L. K. and E. R. Mead (1971b). Tables to facilitate calculation of an asymptotically optimal t-test for equality of location parameters of a certain extreme-value distribution. IEEE Trans. on Reliab. R-20, 235-243.
[38] Cheng, S. W. (1975). A unified approach to choosing optimum quantiles for the ABLE's. J. Amer. Statist. Assoc. 70, 155-159.
[39] Cheng, S. W. (1978). Linear quantile estimation of parameters of double-exponential distribution. Soochow J. Math. 4, 39-49.
[40] Cheng, S. W. (1983). On the most powerful quantile test of the scale parameter. Ann. Inst. Statist. Math. 35, 407-414.
[41] Chernoff, H. (1971). A note on optimal spacings for systematic statistics. Mimeographed Report, Department of Statistics, Stanford University.
[42] Chernoff, H., J. L. Gastwirth and M. V. Johns Jr. (1967). Asymptotic distribution of linear combinations of functions of order statistics with applications to estimation. Ann. Math. Statist. 38, 52-72.
[43] Csörgő, M. (1983). Quantile Processes with Statistical Applications. Society for Industrial and Applied Mathematics, Philadelphia.
[44] Dalenius, T. (1950). The problem of optimum stratification. Scand. Aktuarietidskr. 33, 203-213.
[45] David, H. A. (1981). Order Statistics. 2nd edn. John Wiley and Sons, Inc., New York.
[46] Dyer, D. D. (1973). Estimation of the scale parameter of the chi distribution based on sample quantiles. Technometrics 15, 489-496.
[47] Dyer, D. D. and C. W. Whisenand (1973). Best linear unbiased estimator of the parameter of the Rayleigh distribution - Part II: Optimum theory for selected order statistics. IEEE Trans. on Reliab. R-22, 229-231.
[48] Eisenberger, I. (1968). Testing the mean and standard deviation of a normal distribution using quantiles. Technometrics 10, 781-792.
[49] Eisenberger, I. and E. C. Posner (1965). Systematic statistics used for data compression in space telemetry. J. Amer. Statist. Assoc. 60, 97-133.
[50] Epstein, B. (1956). Simple estimators of parameters of exponential distributions when samples are censored. Ann. Inst. Statist. Math. 8, 15-26.
[51] Epstein, B. (1960). Estimation from life test data. Technometrics 2, 447-454.
[52] Eubank, R. L. (1981a). A density-quantile function approach to optimal spacing selection. Ann. Statist. 9, 494-500.
[53] Eubank, R. L. (1981b). Estimation of the parameters and quantiles of the logistic distribution by linear functions of sample quantiles. Scand. Actuar. J. 1981, 229-236.
[54] Eubank, R. L. (1986). Optimal Spacing Problems. In: Encyclopedia in Statistical Science (N. L. Johnson and S. Kotz, eds.), Vol. 6, pp. 452-458, John Wiley & Sons, New York.
[55] Gupta, S. S. and M. Gnanadesikan (1966). Estimation of the parameters of the logistic distribution. Biometrika 53, 565-570.
[56] Gupta, S. S., A. S. Qureshi and B. K. Shah (1967). Best linear unbiased estimators of the parameters of the logistic distribution using order statistics. Technometrics 9, 43-56.
[57] Hammersley, J. M. and K. W. Morton (1954). The estimation of location and scale parameters from grouped data. Biometrika 41, 296-301.
[58] Harter, H. L. (1961). Estimating parameters of negative exponential populations from one or two order statistics. Ann. Math. Statist. 32, 1078-1090.
[59] Harter, H. L. (1971). Some optimization problems in parameter estimation. In: Optimizing Methods in Statistics (J. S. Rustagi, ed.), pp. 33-62, Academic Press, New York.
[60] Hassanein, K. M. (1968). Analysis of extreme-value data by sample quantiles for very large samples. J. Amer. Statist. Assoc. 63, 877-888.
[61] Hassanein, K. M. (1969a). Estimation of the parameters of the logistic distribution by sample quantiles. Biometrika 56, 684-687.
[62] Hassanein, K. M. (1969b). Estimation of the parameters of the extreme value distribution by use of two or three order statistics. Biometrika 56, 429-436.
[63] Hassanein, K. M. (1971). Percentile estimators for the parameters of the Weibull distribution. Biometrika 58, 673-676.
[64] Hassanein, K. M. (1972). Simultaneous estimation of the parameters of the extreme value distribution by sample quantiles. Technometrics 14, 63-70.
[65] Hassanein, K. M. (1974). Linear estimation of the parameters of the logistic distribution by selected order statistics for very large samples. Statist. Hefte 15, 65-70.
[66] Hassanein, K. M. (1977). Simultaneous estimation of the location and scale parameters of the gamma distribution by linear functions of order statistics. Scand. Actuar. J., 88-93.
[67] Hassanein, K. M., A. K. Md. E. Saleh and E. F. Brown (1984). Quantile estimates in complete and censored samples from extreme-value and Weibull distributions. IEEE Trans. on Reliab. R-33, 370-373.
[68] Hassanein, K. M., A. K. Md. E. Saleh and E. F. Brown (1985). Best linear unbiased quantile estimators of the logistic distribution using order statistics. J. Statist. Comput. Simul. 23, 123-131.
[69] Hassanein, K. M., A. K. Md. E. Saleh and E. F. Brown (1986a). Estimation and testing of quantiles of the extreme-value distribution. J. Statist. Plann. Inf. 14, 389-400.
[70] Hassanein, K. M., A. K. Md. E. Saleh and E. F. Brown (1986b). Best linear unbiased estimators for normal distribution quantiles for sample sizes up to 20. IEEE Trans. on Reliab. R-35, 327-329.
[71] Hosking, J. R. M. and J. R. Wallis (1987). Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29, 339-349.
[72] Kaigh, W. D. and P. A. Lachenbruch (1982). A generalized quantile estimator. Commun. Statist. - Theor. Meth. 11, 2217-2238.
[73] Kaminsky, K. S. (1972). Confidence intervals for the scale parameter using optimally selected order statistics. Technometrics 14, 371-383.
[74] Kaminsky, K. S. (1973). Comparison of approximate confidence intervals for the exponential scale parameter from sample quantiles. Technometrics 15, 483-487.
[75] Kaminsky, K. S. (1974). Confidence intervals and tests for two exponential scale parameters based on order statistics in compressed samples. Technometrics 16, 251-254.
[76] Kaminsky, K. S. and P. I. Nelson (1975). Best linear unbiased prediction of order statistics in location and scale families. J. Amer. Statist. Assoc. 70, 145-150.
[77] Kappenman, R. F. (1987). Improved distribution quantile estimation. Commun. Statist. - Simul. Comput. 16, 30~320.
[78] Koutrouvelis, I. A. (1981). Large sample quantile estimation of Pareto laws. Commun. Statist. - Theor. Meth. A(10)2, 189-201.
[79] Kubat, P. and B. Epstein (1980). Estimation of quantiles of location-scale distributions based on two or three order statistics. Technometrics 22, 575-581.
[80] Kulldorff, G. (1961a). Contributions to the Theory of Estimation from Grouped and Partially Grouped Samples. Almqvist and Wiksell, Stockholm.
[81] Kulldorff, G. (1961b). Estimation from Grouped and Partially Grouped Samples. John Wiley & Sons, New York.
[82] Kulldorff, G. (1963a). Estimation of one or two parameters of the exponential distribution on the basis of suitably chosen order statistics. Ann. Math. Statist. 34, 1419-1431.
[83] Kulldorff, G. (1963b). On the optimum spacing of sample quantiles from a normal distribution. Part I. Scand. Aktuarietidskr. 46, 143-156.
[84] Kulldorff, G. (1964). On the optimum spacing of sample quantiles from a normal distribution. Part II. Scand. Aktuarietidskr. 47, 71-87.
[85] Kulldorff, G. (1973). A note on the optimum spacing of sample quantiles from the six extreme value distributions. Ann. Statist. 1, 562-567.
[86] Kulldorff, G. and K. Vännman (1973). Estimating location and scale parameters of a Pareto distribution by linear functions of order statistics. J. Amer. Statist. Assoc. 68, 218-227.
[87] LaRiccia, V. N. (1991). Smooth goodness-of-fit tests: A quantile function approach. J. Amer. Statist. Assoc. 86, 427-431.
[88] Likeš, J. (1985). Estimation of the quantiles of Pareto distribution. Math. Operationsforsch. und Statist., Ser. Statist. 16, 541-547.
[89] Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
[90] Mann, N. R. (1970). Estimators and exact confidence bounds for Weibull parameters based on a few ordered observations. Technometrics 12, 345-361.
[91] Miké, V. (1971). Efficiency-robust systematic linear estimates of location. J. Amer. Statist. Assoc. 66, 594-601.
[92] Miyamoto, Y. (1972). Three-parameter estimation for a generalized gamma distribution by sample quantiles in large samples. J. Japan Statist. Soc. 3, 9-18.
[93] Mosteller, F. (1946). On some useful "inefficient" statistics. Ann. Math. Statist. 17, 377-408.
[94] Ogawa, J. (1951). Contributions to the theory of systematic statistics. I. Osaka Math. J. 3(2), 175-213.
[95] Ogawa, J. (1960). Determination of optimal spacings for the estimation of the scale parameters of an exponential distribution based on sample quantiles. Ann. Inst. Statist. Math. 12, 135-141.
[96] Ogawa, J. (1976). A note on the optimal spacing of the systematic statistics - normal distribution. In: Essays in Probability and Statistics (Ikeda et al., eds.), pp. 467-474, Shinku Tsusho, Tokyo.
[97] Parzen, E. (1978). A density-quantile function perspective on robust estimation. In: Robustness in Statistics (R. L. Launer and G. N. Wilkinson, eds.), pp. 237-258, Academic Press, New York.
[98] Rao, C. R. (1965). Linear Statistical Inference and Its Applications. John Wiley & Sons, Inc., New York.
[99] Reiss, R.-D. (1980). Estimation of quantiles in certain nonparametric models. Ann. Statist. 8, 87-105.
[100] Rényi, A. (1953). On the theory of order statistics. Acta Math. Acad. Sci. Hung. 4, 191-231.
[101] Rukhin, A. L. and W. E. Strawderman (1982). Estimating a quantile of an exponential distribution. J. Amer. Statist. Assoc. 77, 159-162.
[102] Saleh, A. K. Md. E. (1966). Estimation of the parameters of the exponential distribution based on order statistics in censored samples. Ann. Math. Statist. 37, 1717-1735.
[103] Saleh, A. K. Md. E. (1967). Determination of the exact optimum order statistics for estimating the parameters of the exponential distribution from censored samples. Technometrics 9, 279-292.
[104] Saleh, A. K. Md. E. (1981). Estimating quantiles of exponential distribution. In: Statistics and Related Topics (M. Csörgő, D. Dawson, J. N. K. Rao and A. K. Md. E. Saleh, eds.), pp. 279-283, North-Holland Publishing Co.
[105] Saleh, A. K. Md. E. and M. M. Ali (1966). Asymptotic optimum quantiles for the estimation of the parameters of the negative exponential distribution. Ann. Math. Statist. 37, 143-151.
[106] Saleh, A. K. Md. E., M. Masoom Ali and D. Umbach (1983a). Estimating the quantile function of location-scale family of distributions based on a few selected order statistics. J. Statist. Plann. Infer. 8, 75-86.
[107] Saleh, A. K. Md. E., M. Masoom Ali and D. Umbach (1983b). Simplified estimation of the parameters of exponential distribution from samples censored in the middle. Commun. Statist. - Simula. Computa. 12(5), 609-627.
[108] Saleh, A. K. Md. E., M. Masoom Ali and D. Umbach (1984). Tests of significance using selected sample quantiles. Statist. Prob. Lett. 2(5), 295-298.
[109] Saleh, A. K. Md. E., M. Masoom Ali and D. Umbach (1985). Large sample estimation of Pareto quantiles using selected order statistics. Metrika 32, 49-56.
[110] Saleh, A. K. Md. E. and K. M. Hassanein (1986). Testing equality of location parameters and quantiles of s(≥ 2) location-scale distributions. Bull. Inst. Math. Acad. Sinica 14, 39-49.
[111] Saleh, A. K. Md. E., K. M. Hassanein and M. Masoom Ali (1988). Estimation and testing of hypotheses about the quantile function of the normal distribution. J. Infor. Opt. Sci. 9(1), 85-98.
[112] Saleh, A. K. Md. E., K. M. Hassanein and M. Masoom Ali (1992). Estimation of quantiles using selected order statistics. In: Handbook of the Logistic Distribution (N. Balakrishnan, ed.), pp. 98-113, Marcel Dekker, New York.
[113] Saleh, A. K. Md. E., K. M. Hassanein and E. F. Brown (1985). Optimum spacings for the joint estimation and tests of hypothesis of location and scale parameters of the Cauchy distribution. Commun. Statist. - Theory Meth. 14, 247-254.
[114] Saleh, A. K. Md. E. and P. K. Sen (1985). Asymptotic relative efficiency of some joint-tests for location and scale parameters based on selected order statistics. Commun. Statist. - Theory Meth. 14, 621-633.
[115] Sarhan, A. E. and B. G. Greenberg (eds.) (1962). Contributions to Order Statistics. John Wiley and Sons, Inc., New York.
[116] Sarhan, A. E. and B. G. Greenberg (1967). Linear estimates for doubly censored samples from exponential distribution with observations also missing from the middle. Bull. ISI, 36th Session 42, Book 2, 1195-1204.
[117] Sarhan, A. E., B. G. Greenberg and J. Ogawa (1963). Simplified estimates for the exponential distribution. Ann. Math. Statist. 34, 102-116.
[118] Särndal, C. E. (1962). Information from Censored Samples. Almqvist & Wiksells, Uppsala.
[119] Särndal, C. E. (1964). Estimation of the parameters of the gamma distribution by sample quantiles. Technometrics 6, 405-414.
[120] Shelnutt, J. W., III (1966). Conditional Linear Estimation of the Scale Parameters of the Extreme Value Distribution by the Use of Selected Order Statistics. Unpublished M.S. Thesis, Air Force Institute of Technology.
[121] Shelnutt, J. W., III, A. H. Moore and H. L. Harter (1973). Linear estimation of the scale parameter of the first asymptotic distribution of extreme values. IEEE Trans. on Reliab. R-22, 259-264.
[122] Siddiqui, M. M. (1963). Optimal estimators of the parameters of negative binomial distributions from one or two order statistics. Ann. Math. Statist. 34, 117-121.
[123] Siddiqui, M. M. and K. Raghunandanan (1967). Asymptotically robust estimators of location. J. Amer. Statist. Assoc. 62, 950-953.
[124] Tischendorf, J. A. (1955). Linear Estimation Techniques Using Order Statistics. Ph.D. Thesis, Purdue University.
[125] Ukita, Y. (1955). On the efficiency of order statistics. J. Hokkaido College of Sci. and Art 6, 54-65.
[126] Umbach, D. (1994). Estimating functions of location and scale for the t distribution. Nonpar. Statist. 3, 369-377.
[127] Umbach, D. and M. Masoom Ali (1990). Three goodness-of-fit tests based on selected order statistics. Soochow J. Statist. 16, 37-52.
[128] Umbach, D. and M. Masoom Ali (1991). A goodness-of-fit test for the exponential distribution. Pak. J. Statist. 7(3), 39-52.
[129] Umbach, D. and M. Masoom Ali (1993). Conservative spacings for the estimation of functions of location and scale. J. Infor. Opt. Sci. 14, 309-319.
Optimal linear inference using selected order statistics in location-scale models
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
L-Estimation
J. R. M. Hosking
1. Introduction

We denote by X_{k:n} the kth smallest observation from a sample of size n drawn from the distribution of a random variable X. The ordered sample is X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n}; the X_{k:n} are the order statistics. When basing inference on the order statistics, it is often convenient to work with linear combinations of them, of the form

T = Σ_{i=1}^{n} w_{i,n} X_{i:n} .   (1)
These linear combinations are known as L-statistics or L-estimates. L-statistics have several advantages for use in statistical inference. If the random variable Y is a linear transformation of X, Y = α + βX, then the order statistics of X and Y satisfy Y_{i:n} = α + β X_{i:n}, and L-statistics computed from them satisfy (in an obvious notation) T^{(Y)} = α Σ w_{i,n} + β T^{(X)}. Thus T^{(Y)} = α + β T^{(X)} if Σ w_{i,n} = 1 and T^{(Y)} = β T^{(X)} if Σ w_{i,n} = 0; these are the required transformations for estimators of location and scale of the distribution of X, respectively. Because L-statistics are linear in the X_{i:n}, they can often be more easily computed, and have their sampling properties evaluated, than more general estimators. By appropriate choice of the weights w_{i,n} it is possible to derive robust estimators, whose properties are not excessively dependent on correct specification of a statistical model. L-statistics are also a natural choice when the sample is censored, with only a subset of the order statistics, say X_{i:n}, i = r, r + 1, ..., s (r > 1 or s < n), having been observed.

In this survey of L-estimation we first describe some typical uses of L-estimates, illustrating the ways that L-statistics are commonly constructed. We then discuss in a little more detail some of the principal applications of L-statistics: summarizing data samples and estimating parameters and quantiles of univariate distributions. Finally we briefly discuss some applications of L-estimation to multivariate statistics; these tend to be much more complicated because it is not straightforward to extend the definition of order statistics to multivariate observations.
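The equivariance properties in (1) are easy to verify numerically. The following minimal sketch computes an L-statistic from sorted data and checks the location and scale transformation rules; the particular weight choices (mean weights summing to 1, Gini-type weights summing to 0) are illustrative assumptions, not prescriptions from the text.

```python
# An L-statistic T = sum_i w_{i,n} * X_{i:n} computed from the ordered sample,
# illustrating the location/scale equivariance of (1).

def l_statistic(sample, weights):
    """Compute sum of weights[i] times the i-th order statistic."""
    xs = sorted(sample)
    return sum(w * x for w, x in zip(weights, xs))

x = [4.1, 2.0, 5.6, 3.3, 1.2]
n = len(x)

# Location-type weights: sum to 1 (here, the sample mean).
w_loc = [1.0 / n] * n
# Scale-type weights: sum to 0 (here, Gini-type weights 2(2i - n - 1)/(n(n - 1))).
w_scl = [2.0 * (2 * i - n - 1) / (n * (n - 1)) for i in range(1, n + 1)]

alpha, beta = 10.0, 2.0
y = [alpha + beta * xi for xi in x]

# A location L-statistic transforms as alpha + beta*T; a scale one as beta*T.
t_loc_x, t_loc_y = l_statistic(x, w_loc), l_statistic(y, w_loc)
t_scl_x, t_scl_y = l_statistic(x, w_scl), l_statistic(y, w_scl)
assert abs(t_loc_y - (alpha + beta * t_loc_x)) < 1e-9
assert abs(t_scl_y - beta * t_scl_x) < 1e-9
```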
Many estimators used in Statistics can be regarded as L-estimates, but not all are most conveniently treated as such, and there is considerable overlap between L-estimation and other approaches to parameter estimation and robust statistics. It should be borne in mind, therefore, that any account of L-estimation must necessarily overlap with several other topics, notably parameter estimation, kernel estimation, robust statistics and nonparametric regression.
2. Introductory examples

2.1. Efficient methods using the entire sample

For a distribution with cumulative distribution function F((x − λ)/σ), completely specified except for a location parameter λ and a scale parameter σ, optimal L-estimates of the parameters can be derived from least-squares theory. The estimators were first obtained by Lloyd (1952). The order statistics of a random sample of size n have means and covariances of the form

E X_{i:n} = λ + σ α_{i:n} ,   (2)
cov(X_{i:n}, X_{j:n}) = σ² ω_{i,j:n} ,   (3)

where the α_{i:n} and ω_{i,j:n} are constants depending only on the base distribution F. We write these expressions in vector form. Let θ = [λ σ]^T be a vector containing the parameters; let X = [X_{1:n} ... X_{n:n}]^T be a vector containing the order statistics; let 1 be an n-vector each of whose elements is 1; let α = [α_{1:n} ... α_{n:n}]^T; let Ω be the matrix with (i,j) element ω_{i,j:n}. Then we can write (2)-(3) as

E X = λ 1 + σ α = A θ ,
var X = σ² Ω ,

where A is the n × 2 matrix A = [1  α]. We seek L-estimates λ̂ = Σ w_{i,n}^{(λ)} X_{i:n} and σ̂ = Σ w_{i,n}^{(σ)} X_{i:n} of λ and σ respectively. The best linear unbiased estimator ("BLUE") θ̂ = [λ̂ σ̂]^T of θ is given by the Gauss-Markov theorem (see, e.g., Odell, 1982). It has the form

θ̂ = (A^T Ω^{-1} A)^{-1} A^T Ω^{-1} X .   (4)

It is optimal in the sense that it has minimum variance among unbiased estimators, i.e. it satisfies E θ̂ = θ and, if θ̃ is any other L-statistic that is an unbiased estimator of θ, then the matrix var θ̃ − var θ̂ is positive definite. Disadvantages of Lloyd's approach are that it requires the tabulation of means and variances of order statistics for every distribution and sample size for which it is used, and the inversion of the n × n matrix Ω. Gupta's (1952) "simplified linear estimates" overcome the second disadvantage by replacing the matrix Ω with the identity matrix. This gives surprisingly good results, at least for the Normal
distribution. Blom's (1958, 1962) "unbiased nearly best linear estimates" overcome both disadvantages. The covariance ω_{i,j:n} is first replaced by an asymptotic approximation (David, 1981, Eq. (4.6.5)),

ω_{i,j:n} ≈ p_i (1 − p_j) / {(n + 2) f(F^{-1}(p_i)) f(F^{-1}(p_j))} ,  i ≤ j ,   (5)

where p_i = i/(n + 1) and f(x) = dF(x)/dx is the probability density function of the standardized variate (X − λ)/σ. This yields an Ω matrix whose inverse can be explicitly written down (e.g., Balakrishnan and Cohen, 1991, p. 217), and after some algebraic manipulations minimum-variance estimators are derived that require inversion of only a 2 × 2 matrix.

The BLUEs and their derivatives are optimal only among the restricted class of L-estimates. There is no assurance that other estimators, such as maximum-likelihood estimators, may not be more efficient. However, it is possible to derive L-estimates that are asymptotically fully efficient in large samples (as n → ∞). For location-scale models, the estimators were obtained by Bennett (1952) and more thoroughly studied by Chernoff et al. (1967). An L-statistic can be defined for arbitrary sample size n by permitting the weights w_{i,n} to approximate a smooth weight function, with n w_{i,n} ≈ J(i/(n + 1)). Chernoff et al. (1967) showed that by appropriate choice of the function J(·), estimates of location and scale parameters could be obtained that attained the Cramér-Rao lower bound asymptotically and were therefore maximally efficient. Consider the L-statistic

T = n^{-1} Σ_{i=1}^{n} J(i/(n + 1)) X_{i:n} .   (6)
By Chernoff et al. (1967, Theorem 3), T is asymptotically Normally distributed, with asymptotic mean and variance given by

E T ≈ ∫_0^1 J(u) Q(u) du ,
n var T ≈ 2 ∫∫_{0<u<v<1} J(u) J(v) Q'(u) Q'(v) u(1 − v) du dv ,

where Q(u) = F^{-1}(u) is the quantile function. Define

L_1(y) = −f'(y)/f(y) ,
L_2(y) = −{1 + y f'(y)/f(y)} ,

and define weight functions J^{(λ)}(·) and J^{(σ)}(·) by the matrix equation

[J^{(λ)}(u)  J^{(σ)}(u)] = [L_1'(Q(u))  L_2'(Q(u))] I^{-1} ,

where I is the Fisher information matrix of a random sample drawn from the distribution of X. Then the statistics T^{(λ)} and T^{(σ)}, defined as in (6) but with
weight functions J^{(λ)} and J^{(σ)} respectively, are asymptotically unbiased and maximally efficient estimators of λ and σ.

The foregoing discussion has concentrated on the case in which a complete sample is available. In doubly censored samples only a subset of the order statistics, say X_{i:n}, i = r, r + 1, ..., s (r > 1 or s < n), is observed. Both the BLUEs and the asymptotically efficient L-estimates may also be derived for such samples, by essentially the same approach as for complete samples; however, the asymptotically efficient L-estimates take the slightly modified form

w_{r,n} X_{r:n} + w_{s,n} X_{s:n} + n^{-1} Σ_{i=r}^{s} J(i/(n + 1)) X_{i:n} ,   (7)

in which additional weight is given to the most extreme of the observed order statistics.
2.2. Quick methods using a few order statistics

L-statistics that involve only a small number of order statistics are quicker to compute than statistics that involve the entire sample, yet can yield useful and accurate estimates of distribution properties. An example is the sample range, R_n = X_{n:n} − X_{1:n}, which has a long history. As well as giving an informal measure of the spread of the sample values, it can be used in formal inference procedures. For the Normal distribution, for example, constants d_n have been tabulated (e.g., David, 1981, p. 185) such that R_n/d_n is an unbiased estimator of the standard deviation of the distribution. Relative to the minimum-variance unbiased estimator, the estimator R_n/d_n has efficiency that decreases as the sample size increases, being 85% when n = 10 and 70% when n = 20. Formal hypothesis tests can also be based on the sample range. To test the hypothesis that the mean of a Normal distribution is equal to a specified value μ_0, for example, an analogue of the Student's t statistic is t* = (X̄ − μ_0)/R_n. Relative to the (optimal) t test, the test based on t* loses a fairly small amount of power (Lord, 1950). However, it can be sensitive to departures of the underlying distribution from exact Normality; Prescott (1975) suggested alternatives that are more robust.

With the availability of modern computing power, the choice of statistics on the basis of computational simplicity has become less compelling. However, it may still be that data are difficult or expensive to obtain, and methods based on a few order statistics can still be worthwhile. Balakrishnan and Cohen (1991, p. 240) describe an experiment of Benson (1949) in which observations of textile strength made by an automatic testing machine were recorded as points on a chart, and note that "as accurate conversion of the points to numerical values would be time-consuming, and also an estimate based on inaccurate conversion would
usually cause gross accumulated error, sample quantiles were used for estimation since they could easily be picked up from the chart". A useful approach in such cases is to determine an optimal choice of which order statistics to use for estimation. For location-scale models, this is straightforward in principle using the BLUE approach. For any subset of k order statistics, the BLUE and its accuracy based on these k order statistics can be derived; the optimal choice is then the subset that yields the L-estimate with the smallest variance. In practice an exhaustive search over all (n choose k) subsets of k order statistics is prohibitively expensive, but becomes feasible for moderate values of n if the approximation (5) for ω_{i,j:n} and the analogous approximation α_{i:n} ≈ F^{-1}(p_i) are used to simplify the A and Ω matrices in the definition of the BLUE. Chan and Cheng (1988) investigated this approach and found its asymptotic efficiency to be quite high, even when only a small number of order statistics are used.
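The range estimator R_n/d_n above is simple to sketch. The constants d_n (the expected range of n standard Normal variates) are tabulated in the literature; rather than quote tabled values here, this sketch approximates d_n by simulation, which is an assumption of convenience, not the original procedure.

```python
import random

# "Quick" scale estimate R_n / d_n for Normal data, with d_n = E[range of
# n standard Normal variates] approximated by Monte Carlo simulation.
def d_n(n, reps=20000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(z) - min(z)
    return total / reps

def range_sd_estimate(sample):
    return (max(sample) - min(sample)) / d_n(len(sample))

rng = random.Random(7)
sample = [rng.gauss(5.0, 2.0) for _ in range(10)]
est = range_sd_estimate(sample)  # a rough estimate of sigma = 2
assert 0 < est < 10
```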
2.3. Robust estimation using L-statistics
The previous sections have described L-estimates that are in some sense optimal when the underlying distribution of the sample is known apart from location and scale parameters. In practice one may not have complete confidence that the underlying distribution can be exactly specified. In particular, the distribution may have longer tails than expected, the sample may be contaminated by values drawn from some other distribution, or some sample values may be subject to measurement error. These situations can give rise to outliers, observations that are discordant with the main body of the sample. Considerable effort has been directed towards devising statistics that are robust to deviations from assumptions of randomness and distributional form. Because outlying values are the most extreme order statistics, L-statistics that give zero weight to these order statistics are particularly good candidates to be robust estimators. A widely studied robust estimate of location is the trimmed mean,

m_α = (n − 2r)^{-1} Σ_{i=r+1}^{n−r} X_{i:n} ,   (8)

where r = [nα]. Here the largest and smallest r observations, each representing a fraction α of the entire sample, are ignored when calculating the mean. Extreme cases of trimmed means are m_0, the usual sample mean, and m_{1/2}, interpreted as lim_{α↑1/2} m_α, the sample median. These are the maximum-likelihood estimators of the centers of the Normal and double exponential distributions respectively. For estimating the center λ of a symmetric distribution, the trimmed mean is unbiased, approximately Normally distributed in large samples (Bickel, 1965; Stigler, 1973) and can be used as the basis of a t-like statistic for hypothesis tests on the value of λ (Tukey and McLaughlin, 1963). Appropriate choice of α, the amount of trimming, depends on the degree of robustness required: larger amounts of trimming (larger values of α) give protection against heavier-tailed distributions. For example, Crow and Siddiqui (1967) recommend a value of
α = 1/5 when possible distributions range from the Normal to the double exponential, and values of α between 1/4 and 1/3, depending on n, when possible distributions range from the Normal to the Cauchy (a heavier-tailed distribution than the double exponential).
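A direct transcription of the trimmed mean (8): drop the r = [nα] smallest and largest order statistics and average the rest.

```python
import math

def trimmed_mean(sample, alpha):
    """Trimmed mean m_alpha of (8), with r = [n*alpha]."""
    xs = sorted(sample)
    n = len(xs)
    r = math.floor(n * alpha)
    if n - 2 * r <= 0:
        raise ValueError("alpha too large for this sample size")
    return sum(xs[r:n - r]) / (n - 2 * r)

data = [9.1, 1.2, 2.7, 3.3, 2.9, 3.1, 2.5, -4.0, 3.0, 2.8]
# With alpha = 0.1 and n = 10, r = 1: the outliers 9.1 and -4.0 are discarded.
assert abs(trimmed_mean(data, 0.0) - sum(data) / 10) < 1e-9
assert abs(trimmed_mean(data, 0.1) - 2.6875) < 1e-9
```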
2.4. Adaptive L-statistics

The appropriate degree of trimming of the mean depends on the distribution from which the sample is drawn, and in particular on the tail weight of the distribution. It is therefore reasonable to estimate the distribution's tail weight from the observed sample and to choose the degree of trimming on the basis of this estimate. This is an example of an adaptive L-statistic, in which the weights w_{i,n} in (1) depend on the actual observed sample. Hogg (1967) proposed to choose the trimming parameter α based on the sample kurtosis, but later (Hogg, 1974) found that a better indicator of tail weight was

Q_1 = {Ū(.2) − L̄(.2)} / {Ū(.5) − L̄(.5)} ,

where Ū(β) and L̄(β) denote the averages of the [nβ] largest and the [nβ] smallest order statistics respectively. The population analogue of Q_1 takes the value 1.75 for the Normal distribution and 1.93 for the double exponential distribution. Hogg (1974) therefore suggested the adaptive L-estimate

m_{1/8}  if Q_1 < 1.81 ,
m_{1/4}  if 1.81 ≤ Q_1 < 1.87 ,
m_{3/8}  if Q_1 ≥ 1.87 .

De Wet and van Wyk (1979) proposed a modification of Hogg's scheme in which α is a continuous function of Q_1. This estimator compares favorably in performance with a number of other robust estimates of location (Barnett and Lewis, 1994, p. 146).
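Hogg's scheme above can be sketched directly: estimate tail weight by Q_1, then pick the trimming fraction from the cutoffs 1.81 and 1.87. The max(1, ...) guard for very small samples is our own addition, not part of Hogg's rule.

```python
import math

def tail_avg(xs_sorted, beta, largest):
    """Average of the [n*beta] largest (or smallest) order statistics."""
    k = max(1, math.floor(len(xs_sorted) * beta))
    part = xs_sorted[-k:] if largest else xs_sorted[:k]
    return sum(part) / k

def hogg_adaptive_mean(sample):
    xs = sorted(sample)
    q1 = ((tail_avg(xs, 0.2, True) - tail_avg(xs, 0.2, False))
          / (tail_avg(xs, 0.5, True) - tail_avg(xs, 0.5, False)))
    alpha = 1 / 8 if q1 < 1.81 else (1 / 4 if q1 < 1.87 else 3 / 8)
    r = math.floor(len(xs) * alpha)
    return sum(xs[r:len(xs) - r]) / (len(xs) - 2 * r)

data = [2.1, 1.9, 2.0, 2.3, 1.8, 2.2, 2.4, 1.7, 2.05, 12.0]
est = hogg_adaptive_mean(data)
assert 1.7 < est < 2.5  # the gross outlier 12.0 is trimmed away
```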
3. Single-sample problems

3.1. Estimation of location

Estimation of a location measure of a probability distribution, given a random sample drawn from the distribution, can be divided into two cases. In parametric estimation, the distribution is regarded as being completely specified apart from a finite number of parameters. L-estimates in this case can be based on the BLUE or asymptotically efficient approaches described previously. In nonparametric estimation, no firm assumptions are made about the form of the distribution. In
this case, choice of an appropriate estimator combines two considerations: efficiency or accuracy, when the distribution has some particularly plausible form; and robustness against the possibility that the distribution has some unexpected form or against contamination of the sample by aberrant observations drawn from a completely different distribution. Many robust L-estimates of location have been proposed, but robust estimation can be based on other approaches. Robust estimation in general has been described by Andrews et al. (1972), Huber (1981) and Hampel et al. (1986). Here we merely describe and compare a few of the more common L-estimators. The sample mean is unbiased and an efficient estimator of location for the Normal distribution, but is not particularly robust. The trimmed mean is defined in (8). The Winsorized mean is similar to the trimmed mean, but extreme observations are replaced by less extreme order statistics rather than being ignored altogether:
n^{-1} { (r + 1) X_{r+1:n} + Σ_{i=r+2}^{n−r−1} X_{i:n} + (r + 1) X_{n−r:n} } ,   (9)
where r = [nα]. The sample median is

X̃ = X_{(n+1)/2:n} ,                 n odd,
X̃ = ½ (X_{n/2:n} + X_{n/2+1:n}) ,   n even.
Two other simple estimators, each based on three sample quantiles, are Gastwirth's (1966) estimator,
0.3 X_{[n/3]+1:n} + 0.4 X̃ + 0.3 X_{n−[n/3]:n} ,   (10)

and the trimean,

0.25 X_{[n/4]+1:n} + 0.5 X̃ + 0.25 X_{n−[n/4]:n} .   (11)
An example of the performance of some L-estimators of location is given in Table 1. The tabulated values are the variances, obtained by simulation of samples of size 20, of estimates of the center of symmetry of various symmetric distributions. The performance of the mean is seriously degraded when the parent distribution is non-Normal. The median is somewhat inefficient for the Normal distribution and small deviations from it, but is the most robust of these statistics; it performs relatively well for the Cauchy distribution. The other location measures have intermediate performance characteristics.

The foregoing discussion has concentrated on symmetric distributions - for which it is appropriate to use symmetric L-estimators, with w_{i,n} = w_{n+1−i,n} in (1) - because for such distributions the center of symmetry is a particularly obvious and natural measure of location. Robustness to asymmetry is also important in many applications, but is less easy to assess in general, because choice of a statistic to measure location in samples from asymmetric distributions is also a choice of a particular feature of the distribution as a location measure. Use of the sample median as a location measure, for example, implies that the population
Table 1
Empirical variances of some L-estimators of location. Results reproduced from Exhibit 5 of Andrews et al. (1972). CN(ε, β) denotes a contaminated Normal distribution: X ~ N(μ, σ²) with probability 1 − ε, X ~ N(μ, (βσ)²) with probability ε. Sample size 20.

Estimator            Normal   CN(10%,3)   CN(10%,10)   Laplace   Cauchy
Mean                 1.00     1.88        11.54        2.10      12548.0
10% trimmed mean     1.06     1.31        1.46         1.60      7.3
25% trimmed mean     1.20     1.41        1.47         1.33      3.1
Median               1.50     1.70        1.80         1.37      2.9
Gastwirth, (10)      1.23     1.45        1.51         1.35      3.1
Trimean, (11)        1.15     1.37        1.48         1.43      3.9
median is regarded as the appropriate measure of location for an arbitrary asymmetric probability distribution. This may or may not be the case in practice, depending on the application for which the data are being used.
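The simple location estimators compared in Table 1 can be transcribed directly; [.] denotes the integer part, as in (10) and (11).

```python
import math

def sample_median(sample):
    xs = sorted(sample)
    n = len(xs)
    return xs[(n - 1) // 2] if n % 2 else 0.5 * (xs[n // 2 - 1] + xs[n // 2])

def gastwirth(sample):
    """Gastwirth's (1966) estimator (10)."""
    xs = sorted(sample)
    n = len(xs)
    k = math.floor(n / 3)
    return 0.3 * xs[k] + 0.4 * sample_median(xs) + 0.3 * xs[n - k - 1]

def trimean(sample):
    """The trimean (11)."""
    xs = sorted(sample)
    n = len(xs)
    k = math.floor(n / 4)
    return 0.25 * xs[k] + 0.5 * sample_median(xs) + 0.25 * xs[n - k - 1]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]  # one gross outlier
assert abs(sample_median(data) - 3.5) < 1e-9
assert abs(gastwirth(data) - 3.5) < 1e-9
assert abs(trimean(data) - 3.5) < 1e-9
```

All three estimates ignore the outlying value 100.0 entirely, unlike the sample mean.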
3.2. Estimation of scale

Nonparametric estimation of scale is analogous to estimation of location for asymmetric distributions, in that choice of a particular scale measure for a data sample implies belief that a particular feature of a probability distribution is an appropriate measure of scale. Again, considerations of robustness apply - the effect of outliers and long-tailed distributions can be more serious for scale estimators than for location estimators (Davidian and Carroll, 1987) - and L-estimates are only one kind of estimate that may be considered. However, some reasonable scale measures for a probability distribution with quantile function Q(u) have the form of a linear functional of Q(·), ∫_0^1 J(u) Q(u) du, with ∫_0^1 J(u) du = 0. In this case the smooth L-estimator (6) is a natural estimator of the scale measure.

Some L-estimators of scale have a long history. The sample range, X_{n:n} − X_{1:n}, has been widely used as a quick estimator of scale, but it is not robust to outliers and its properties are dependent on the sample size. Gini's mean difference statistic

G = (n choose 2)^{-1} Σ_{1≤i<j≤n} (X_{j:n} − X_{i:n})

can be written as an L-estimator,

G = Σ_{i=1}^{n} {2(2i − n − 1)/(n(n − 1))} X_{i:n} ;

its history was traced back to Von Andrae (1872) by David (1968).
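The equivalence of the two forms of Gini's mean difference is easy to verify numerically:

```python
from itertools import combinations

# Gini's mean difference two ways: the pairwise definition and the
# equivalent L-statistic with weights 2(2i - n - 1) / (n(n - 1)).
def gini_pairwise(sample):
    n = len(sample)
    return sum(abs(a - b) for a, b in combinations(sample, 2)) / (n * (n - 1) / 2)

def gini_l_statistic(sample):
    xs = sorted(sample)
    n = len(xs)
    return sum(2 * (2 * i - n - 1) / (n * (n - 1)) * x
               for i, x in enumerate(xs, start=1))

data = [3.0, 1.0, 4.0, 1.5, 5.9, 2.6]
assert abs(gini_pairwise(data) - gini_l_statistic(data)) < 1e-12
```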
Estimators initially derived for use with specific distributions may, of course, be suitable for more general use if they are suitably robust. An example is the asymptotically efficient L-estimator for the Normal standard deviation,

n^{-1} Σ_{i=1}^{n} Φ^{-1}(i/(n + 1)) X_{i:n} ,

where Φ^{-1}(u) is the quantile function of the standard Normal distribution. A trimmed version of this estimator,

n^{-1} Σ_{i=[nα]+1}^{[n(1−α)]} Φ^{-1}(i/(n + 1)) X_{i:n} ,

was recommended by Welsh and Morrison (1990) for making inferences in small contaminated samples, and was used by them in an astronomical application involving comparison of distributions of stellar velocities.

Scale estimators that are analogs of trimmed means can alternatively be based on the ordered values Z_{i:n} of the deviations Z_i = |X_i − m| of the sample values from the estimated location measure m. Such estimators include the trimmed standard deviation,

{ (n − [nα])^{-1} Σ_{i=1}^{n−[nα]} Z_{i:n}² }^{1/2} ,

with m taken to be the mean or the trimmed mean m_α, and the median deviation Z_{[n/2]:n}, with m taken to be the median. These are not L-estimators as defined in (1), though they are clearly related to L-estimators. Related statistics have been used to test homogeneity of scale in several samples (Conover et al., 1981; Tiku and Balakrishnan, 1984; Carroll and Schneider, 1985).
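The deviation-based scale estimators just described can be sketched as follows. The normalization of the trimmed standard deviation follows the reconstruction above (averaging the kept squared deviations) and should be checked against the original source; the median-deviation part is standard.

```python
import math

def _median(xs):
    s = sorted(xs)
    n = len(s)
    return s[(n - 1) // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def trimmed_std(sample, alpha, m):
    """Trimmed standard deviation: drop the [n*alpha] largest deviations from m."""
    z = sorted(abs(x - m) for x in sample)
    keep = len(z) - math.floor(len(z) * alpha)
    return (sum(d * d for d in z[:keep]) / keep) ** 0.5

def median_deviation(sample):
    """Median of the absolute deviations from the sample median."""
    m = _median(sample)
    return _median([abs(x - m) for x in sample])

data = [1.0, 2.0, 3.0, 4.0, 100.0]
assert median_deviation(data) == 1.0   # unaffected by the outlier 100.0
assert trimmed_std(data, 0.2, 3.0) < 2.0  # the deviation 97.0 is trimmed away
```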
3.3. Estimation of quantiles

The quantile of nonexceedance probability p of a random variable X is denoted by Q(p) and is the value such that a proportion p of the realizations of X are less than or equal to Q(p). Quantiles are an important topic in statistical inference, being often of more direct interest than the parameters of the probability distribution that is fitted to the data. Nonparametric inference for quantiles, in which no assumptions are made about the distributions of the underlying random variable X, gives robust and often reasonably efficient procedures. Nonparametric confidence intervals for quantiles based on individual order statistics are described by David (1981, pp. 15-16). Here we are concerned with some recent developments that involve more complicated L-statistics. The sample quantile is the value analogous to Q(p) but computed from the observed data; for a random sample of size n the sample quantile of nonexceedance probability p is
Q̂(p) = X_{i:n}   for (i − 1)/n < p ≤ i/n .   (12)
As a function of p, Q̂(p) has discontinuities at the points p = i/n, i = 1, ..., n − 1, and a continuous estimator may be preferred. It is reasonable to estimate Q(p) by X_{i:n} when p = (i − 1/2)/n and by a weighted average of the order statistics whose indices are adjacent to the fractional value np + 1/2 otherwise. Thus, following Parzen (1979), we are led to the estimator

Q̃(p) = n{(i + 1/2)/n − p} X_{i:n} + n{p − (i − 1/2)/n} X_{i+1:n}
    for (i − 1/2)/n ≤ p ≤ (i + 1/2)/n ,
with Q̃(p) left undefined for p < 1/(2n) or p > 1 − 1/(2n). Further extensions of this approach involve all of the order statistics in the estimation of a single quantile, greatest weight being given to the order statistics X_{i:n} for which i/(n + 1) is close to p. Harrell and Davis (1982) defined a quantile estimator that is the L-statistic (1) with weights

w_{i,n} = w_{i,n}^{HD} = {Γ(n + 1) / (Γ(p(n + 1)) Γ((1 − p)(n + 1)))} ∫_{(i−1)/n}^{i/n} t^{p(n+1)−1} (1 − t)^{(1−p)(n+1)−1} dt .
Heuristically, this is justified by arguing that Q(p) ≈ E X_{(n+1)p:n} for large n, and estimating the mean of the order statistic X_{(n+1)p:n}, given by

E X_{i:n} = {Γ(n + 1) / (Γ(i) Γ(n + 1 − i))} ∫_0^1 u^{i−1} (1 − u)^{n−i} Q(u) du ,
by the corresponding integral with Q(u) replaced by the sample quantile function Q̂(u) (the sample quantile function is Q̂(p) as in (12), regarded as a function of p, 0 < p < 1). An alternative interpretation of the Harrell-Davis estimator, noted by Sheather and Marron (1990), is that it is the bootstrap estimator of E X_{(n+1)p:n}.

Kaigh and Lachenbruch (1982) proposed an estimator of Q(p) that is a U-statistic, the average, over all size-k subsamples drawn without replacement from the full size-n sample, of the sample quantile Q̂(p) for the subsample. This can also be written as an L-statistic, the weights in (1) being given by

w_{i,n} = w_{i,n}^{KL} = (i − 1 choose r − 1)(n − i choose k − r) / (n choose k) ,

where r = [p(k + 1)] and (a choose b) is defined to be zero if a < b. The parameter k must be chosen by the user. When k = n, the estimator is just the sample quantile Q̂(p) for the full sample. Smaller values of k cause more order statistics to contribute to the estimator, and tend to reduce its variability, but may cause bias because E X_{p(k+1):k} can be a poor approximation to Q(p) when k is small. Figure 1 shows
Fig. 1. Relative contributions w_{i,n} of each observation to the Harrell-Davis and Kaigh-Lachenbruch estimators of the upper quartile Q(0.75), for sample size n = 19; the Kaigh-Lachenbruch estimator uses k = 7.
examples of the weights of the Harrell-Davis and Kaigh-Lachenbruch estimators of the upper quartile Q(0.75).

Sheather and Marron (1990) considered kernel estimators of Q(p). The idea of kernel estimation is to take a rough estimator of a function and a smooth "kernel function" and combine them via convolution to yield a smooth estimator. In the case of quantile estimation, the rough estimator is the sample quantile function Q̂(u), which has discontinuities at the points u = i/n. The kernel is a function K(·) that integrates to 1 and is generally taken to be positive and symmetric about 0, i.e., it is a symmetric probability density function. The kernel estimator is

Q̂_K(p) = ∫_0^1 h^{-1} K((u − p)/h) Q̂(u) du
        = Σ_{i=1}^{n} {∫_{(i−1)/n}^{i/n} h^{-1} K((u − p)/h) du} X_{i:n} ;

the latter form exhibits the kernel estimator as an L-estimator. The parameter h is a "bandwidth" that controls the amount of smoothing of the sample quantile function; the contribution of Q̂(u) to Q̂_K(p) is determined by the size of the distance |u − p| as a multiple of h. The alternative form

Σ_{i=1}^{n} K({(i − 1/2)/n − p}/h) X_{i:n} / Σ_{j=1}^{n} K({(j − 1/2)/n − p}/h)

is often more convenient for computation and includes a normalization of the weights so that they sum to 1. Sheather and Marron (1990) pointed out that both the Harrell-Davis and Kaigh-Lachenbruch quantile estimators can be regarded as kernel estimators. Their weight functions are essentially the density function of the beta distribution and the probability function of the hypergeometric distribution, respectively, and both of these can be approximated by the Normal density function for large n; the asymptotic bandwidths are h = {p(1 − p)/(n + 1)}^{1/2} for the Harrell-Davis estimator and h = {p(1 − p)/k}^{1/2} for the Kaigh-Lachenbruch estimator.
The mean square error of Q̂_K(p) as an estimator of Q(p) contains terms that are of asymptotic order n^{-1}, n^{-1}h and h^4, with signs +, − and + respectively (Sheather and Marron, 1990, Theorem 1). The O(n^{-1}) term is the asymptotic variance of the sample quantile Q̂(p). The optimal choice of the bandwidth parameter is therefore h = O(n^{-1/3}), and the mean square error of the optimal kernel estimator differs only by a term of order n^{-4/3} from that of the sample quantile. This suggests that kernel estimators are not likely to give great improvements in accuracy over the sample quantile. In simulation experiments using samples of size 50 and 100 from the Normal, lognormal, exponential and double exponential distributions, Sheather and Marron (1990) found that this was generally the case, improvements of 15% in mean square error being the best that could be obtained overall. Nonetheless kernel estimates have been found to be useful in practice; recent examples of their use include Alam and Kulasekera (1993) and Moon and Lall (1994).
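The computational form of the kernel quantile estimator is a normalized weighted sum of order statistics. The sketch below uses a Gaussian kernel (whose normalizing constant cancels in the ratio) together with the Harrell-Davis asymptotic bandwidth h = {p(1 − p)/(n + 1)}^{1/2} mentioned above; both choices are illustrative.

```python
import math

def kernel_quantile(sample, p):
    """Kernel quantile estimate: normalized Gaussian-kernel weights on X_{i:n}."""
    xs = sorted(sample)
    n = len(xs)
    h = math.sqrt(p * (1 - p) / (n + 1))
    # Unnormalized Gaussian kernel values; the 1/(h*sqrt(2*pi)) factor cancels.
    w = [math.exp(-0.5 * (((i - 0.5) / n - p) / h) ** 2)
         for i in range(1, n + 1)]
    return sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)

# For a symmetric sample the median estimate recovers the center.
est = kernel_quantile(list(range(1, 20)), 0.5)
assert abs(est - 10.0) < 1e-6
```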
3.4. Estimation of distribution parameters
For the nonparametric estimation problems discussed in the previous sections, it is sometimes unclear which features of a distribution or an estimator are most appropriate as the basis for inference, and L-estimates often have appeal on account of their simplicity and demonstrable robustness to outliers. In parametric problems of fitting a distribution to data, the asymptotic optimality of likelihood-based methods is well established and alternative methods need to possess comparable desirable properties. Optimality properties of L-estimates can be based on the BLUE or asymptotic efficiency properties discussed above, and when these can be achieved simultaneously with a convenient explicit form for the estimators the L-estimates become candidates for practical use.

The BLUE and related approaches have been outlined above, and are further described by David (1981, Sections 6.2-6.3) and Balakrishnan and Cohen (1991, Chapters 4 and 7). The asymptotically efficient estimators of Chernoff et al. (1967), also mentioned above, are useful in that for some distributions they have a simple explicit form. For example, for the Normal distribution the estimators of the mean and standard deviation are

μ̂ = n^{-1} Σ_{i=1}^{n} X_{i:n} ,   σ̂ = n^{-1} Σ_{i=1}^{n} Φ^{-1}(i/(n + 1)) X_{i:n} ;

for the Cauchy distribution, with probability density function f(x) = σ^{-1} π^{-1} {1 + ((x − λ)/σ)²}^{-1}, −∞ < x < ∞, the estimates of the parameters are

λ̂ = n^{-1} Σ_{i=1}^{n} {sin(4πp_i)/tan(πp_i)} X_{i:n} ,   σ̂ = (8/n) Σ_{i=1}^{n} tan(πp_i) cos⁴(πp_i) X_{i:n} ,

where p_i = i/(n + 1) − 1/2. The weights w_{i,n} of these estimators are illustrated in Figure 2, and illustrate how for long-tailed distributions such as the Cauchy
Fig. 2. Relative contributions w_{i,n} of each observation to Chernoff et al.'s estimates of parameters of the Normal and Cauchy distributions, for sample size n = 19. (Panels: Normal scale; Cauchy location; Cauchy scale.)
efficient estimators give less weight to the extreme observations, which are often outliers.

The methods discussed thus far have been concerned with location-scale families of distributions, but can be extended to more general distributions. An approach worthy of attention was used by Maritz and Munro (1967) for the generalized extreme-value distribution and by Munro and Wixley (1970) for the three-parameter lognormal distribution. It can be applied to any distribution with location and scale parameters and one or more shape parameters. It can be regarded as an iterative procedure based on Blom's modification of the BLUE. When a shape parameter β is present the BLUE cannot be computed directly because the means and covariances of the order statistics in (2)-(3) depend on the unknown β. To overcome this an iterative procedure is used.
1. Guess an initial value for β.
2. Compute means and variances of order statistics, and the A and Ω matrices in (4), using the assumed value of β.
3. Estimate the parameters, including β, by the BLUE (4).
4. Repeat steps 2 and 3 until converged.
Rather than use the full BLUE, to minimize computations it is convenient to use the asymptotic approximations (5) for ω_{i,j:n} and α_{i:n} ≈ F^{-1}(p_i) in the definition of the A and Ω matrices. Computation of the estimates is then very straightforward, comprising iterations of least-squares calculations. For many distributions this will be considerably simpler than full maximum-likelihood estimation. LaRiccia and Kindermann (1983) proved that the Munro-Wixley parameter estimators for the three-parameter lognormal distribution are asymptotically as efficient as the maximum-likelihood estimators. It seems likely that this asymptotic optimality holds for other distributions too.
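The explicit Cauchy estimators given earlier in this section can be transcribed directly. The weight sin(4πp)/tan(πp) has the finite limit 4 as p → 0, which the sketch substitutes explicitly for the middle order statistic; the simulated check with λ = 3, σ = 2 is ours.

```python
import math
import random

def cauchy_l_estimates(sample):
    """Chernoff et al.'s estimates of Cauchy location and scale, p_i = i/(n+1) - 1/2."""
    xs = sorted(sample)
    n = len(xs)
    lam = sig = 0.0
    for i, x in enumerate(xs, start=1):
        p = i / (n + 1) - 0.5
        # sin(4*pi*p)/tan(pi*p) -> 4 as p -> 0 (the middle order statistic).
        w = 4.0 if p == 0.0 else math.sin(4 * math.pi * p) / math.tan(math.pi * p)
        lam += w * x
        sig += 8.0 * math.tan(math.pi * p) * math.cos(math.pi * p) ** 4 * x
    return lam / n, sig / n

# Check on simulated Cauchy data with lambda = 3, sigma = 2, using the
# inverse-CDF transform x = lambda + sigma * tan(pi*(u - 1/2)).
rng = random.Random(42)
data = [3.0 + 2.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2001)]
lam_hat, sig_hat = cauchy_l_estimates(data)
assert abs(lam_hat - 3.0) < 0.5 and abs(sig_hat - 2.0) < 0.5
```

Note how the weights vanish at both extremes, so the enormous extreme order statistics of a Cauchy sample contribute almost nothing.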
J. R. M. Hosking
3.5. Summarization of distributional shape using L-moments

L-statistics can be used to define quantities that are analogous to the moments of a probability distribution or a data sample. Hosking (1990), drawing on earlier work by Downton (1966a,b), Greenwood et al. (1979), and Sillitto (1951, 1969), among others, showed that these quantities, which he called L-moments, form the basis of a unified approach to the summarization and description of theoretical probability distributions, the summarization and description of observed data samples, estimation of parameters and quantiles of probability distributions, and hypothesis tests for probability distributions. The L-moments of a probability distribution are defined by

$\lambda_1 = E(X_{1:1})$ ,
$\lambda_2 = \tfrac{1}{2} E(X_{2:2} - X_{1:2})$ ,
$\lambda_3 = \tfrac{1}{3} E(X_{3:3} - 2X_{2:3} + X_{1:3})$ ,
$\lambda_4 = \tfrac{1}{4} E(X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4})$ ,

and in general

$\lambda_r = r^{-1} \sum_{j=0}^{r-1} (-1)^j \binom{r-1}{j} E(X_{r-j:r})$ .
The L-moment $\lambda_1$ is the mean of the distribution, a location measure; $\lambda_2$ is a dispersion measure, being half the expected value of Gini's mean difference. It is convenient to define dimensionless versions of L-moments; this is achieved by dividing the higher-order L-moments by the dispersion measure $\lambda_2$. The L-moment ratios

$\tau_r = \lambda_r / \lambda_2$ ,  $r = 3, 4, \ldots$ ,
measure the shape of a distribution independently of its scale of measurement. In particular, $\tau_3$ and $\tau_4$ are measures of skewness and kurtosis respectively. The L-moment ratios $\tau_r$, $r \ge 3$, all take values between $-1$ and $+1$.

In practice, L-moments must usually be estimated from a random sample drawn from an unknown distribution. Because $\lambda_r$ is a function of the expected order statistics of a sample of size $r$, it is natural to estimate it by a U-statistic, i.e. the corresponding function of the sample order statistics averaged over all subsamples of size $r$ that can be constructed from the observed sample of size $n$. This U-statistic is the sample L-moment $\ell_r$. By its construction it is also an L-statistic, and it is an unbiased estimator of $\lambda_r$. For example, the dispersion measure $\lambda_2$ is estimated from a sample of size $n$ by

$\ell_2 = \tfrac{1}{2} \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} (X_{j:n} - X_{i:n})$ ;

this is one half of Gini's mean difference statistic.
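The equivalence between the U-statistic definition of $\ell_2$ and half of Gini's mean difference can be checked directly; the function names below are of course illustrative only.

```python
import numpy as np
from itertools import combinations

def l2_ustat(x):
    """Sample second L-moment as a U-statistic: half the average
    absolute difference over all pairs (half of Gini's mean difference)."""
    pairs = list(combinations(x, 2))
    return np.mean([abs(a - b) for a, b in pairs]) / 2.0

def l2_lstat(x):
    """Equivalent L-statistic form: a weighted sum of order statistics,
    with weight (2i - n - 1) on x_{i:n}."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * xs) / (n * (n - 1))
```

The second form needs only a single sort, which is why the explicit L-statistic representation below is preferred in practice.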
L-Estimation
Hosking (1990) gives alternative expressions that do not involve the explicit construction of all possible subsamples, but instead represent the sample L-moments as L-statistics,

$\ell_r = n^{-1} \sum_{i=1}^{n} w_{i:n}^{(r)} x_{i:n}$ .
The weights $w_{i:n}^{(r)}$ are illustrated in Figure 3 for the case n = 19: this shows the relative contributions of each observation to each sample L-moment. The weights $w_{i:n}^{(r)}$, $i = 1, \ldots, n$, have a pattern that resembles polynomials of degree $r-1$ in $i$. Indeed, in the notation of Neuman and Schonbach (1974), $w_{i:n}^{(r)}$ is the discrete Legendre polynomial $(-1)^{r-1} r P_{r-1}(i-1, n-1)$.

L-moments are analogous to ordinary moments, but have several advantages. For L-moments of a probability distribution to exist, the distribution must have a finite mean, but no higher-order moments need exist (Hosking, 1990). A distribution is uniquely defined by its L-moments (Hosking, 1990), whereas it is possible for two different distributions to have the same ordinary moments. Although moment ratios can be arbitrarily large, sample moment ratios have algebraic bounds (Dalén, 1987); sample L-moment ratios can take any values that the corresponding population quantities can (Hosking, 1990). L-moments are less sensitive than ordinary moments to outlying data values (Royston, 1992; Vogel and Fennessey, 1993). L-moments can be used to fit distributions to data, by equating the first few sample and population L-moments, analogously to the method of moments. The resulting estimators of parameters and quantiles are sometimes more accurate than the maximum-likelihood estimates. This was found to be the case for the generalized extreme-value distribution (Hosking et al., 1985) and some instances of the generalized Pareto distribution (Hosking and Wallis, 1987).
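The first four sample L-moments can be computed without forming subsamples. The sketch below takes the probability-weighted-moment route of Hosking (1990), computing the unbiased statistics $b_r$ and combining them linearly; the function name is illustrative.

```python
import numpy as np

def sample_lmoments(x):
    """First four sample L-moments l1..l4 via the unbiased
    probability-weighted moments b_r (Hosking, 1990)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    i = np.arange(1, n + 1)
    b0 = xs.mean()
    b1 = np.sum((i - 1) * xs) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * xs) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * xs) / (n * (n - 1) * (n - 2) * (n - 3))
    # linear combinations giving the sample L-moments
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4
```

The sample L-moment ratios are then t3 = l3/l2 and t4 = l4/l2, mirroring the population definitions.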
[Figure 3: bar charts of the weights $w_{i:n}^{(r)}$, $r = 1, \ldots, 4$.]
Fig. 3. Relative contributions of each observation to the first four sample L-moments, for sample size n = 19.
The earliest work on L-moments was directed towards the environmental sciences, and most of the published applications of L-moments have been concerned with identifying distributions and estimating frequencies of extreme environmental events. Recent examples include Guttman (1993), Guttman et al. (1993), Benson (1993), Pearson (1993, 1995), Fill and Stedinger (1995) and Hosking and Wallis (1997). In other applications, Royston (1992) showed that L-moments gave useful summary statistics for medical data, and Hosking (1995) extended L-moments to censored samples.
4. More complicated problems
4.1. Correlation measures based on L-statistics

Extension of L-statistic methods to multivariate samples is hampered by the difficulty of defining a unique ordering for multivariate data. The most widely studied approach involves ordering the values of one variable, thereby inducing an ordering on the other variables. For example, given a bivariate sample $(X_i, Y_i)$, $i = 1, \ldots, n$, the $Y$ value associated with $X_{j:n}$ is denoted by $Y_{[j:n]}$. The $Y$ values ordered in this manner are known as concomitants of order statistics (David, 1973) or induced order statistics (Bhattacharya, 1974). Schechtman and Yitzhaki (1987) described measures of association between random variables or data samples that involve L-statistics based on concomitants of order statistics, and Olkin and Yitzhaki (1992) applied these measures to regression analysis. An analog of covariance is the "Gini covariance"
$\mathrm{Gcov}(X, Y) = n^{-1} \sum_{i=1}^{n} (2i - n - 1) Y_{[i:n]}$ ,

and an analog of correlation is

$\sum_{i=1}^{n} (2i - n - 1) Y_{[i:n]} \Big/ \sum_{i=1}^{n} (2i - n - 1) Y_{i:n}$ .  (13)
The term "Gini" is appropriate because the denominator of (13) is a multiple of Gini's mean difference statistic. The Gini covariance can also be regarded as the covariance between the $Y$ values and the ranks of the $X$ values. For random variables $X$ and $Y$, with cumulative distribution functions $F_X$ and $F_Y$ respectively, the Gini covariance is defined to be $\mathrm{Gcov}(X, Y) = \mathrm{cov}(X, F_Y(Y))$. It may be regarded as intermediate between the usual covariance, $\mathrm{cov}(X, Y)$, and the rank covariance, $\mathrm{cov}(F_X(X), F_Y(Y))$. It may therefore be expected to share some of the robustness characteristics of the latter measure. It has the disadvantage of not being symmetric in $X$ and $Y$; in general we have

$\frac{\mathrm{cov}(Y, F_X(X))}{\mathrm{cov}(Y, F_Y(Y))} \ne \frac{\mathrm{cov}(X, F_Y(Y))}{\mathrm{cov}(X, F_X(X))}$ .
However, if $(X, Y)$ has a bivariate Normal distribution with correlation $\rho$, the two variants of the Gini correlation of $X$ and $Y$ are both equal to $\rho$.
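The sample Gini covariance and correlation of (13) are easy to compute from the concomitants; a minimal sketch (function names assumed, not from the source):

```python
import numpy as np

def gini_cov(x, y):
    """Sample Gini covariance: L-statistic in the concomitants y_[i:n],
    the y values reordered by the ranks of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    y_conc = y[np.argsort(x)]              # concomitants of the x order statistics
    w = 2 * np.arange(1, n + 1) - n - 1    # weights 2i - n - 1
    return np.sum(w * y_conc) / n

def gini_corr(x, y):
    """Sample Gini correlation (13): gini_cov(x, y) normalized by the same
    statistic applied to y ordered by itself (a multiple of Gini's mean
    difference of y)."""
    return gini_cov(x, y) / gini_cov(y, y)
```

Note that, as in the text, gini_corr(x, y) and gini_corr(y, x) generally differ, since each conditions the ordering on a different variable.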
4.2. Regression models

It is natural to seek to extend L-estimation from single-sample applications to the linear regression model, in order to obtain estimators that are robust to deviant observations and long-tailed distributions of the error term in the model. Because regression is a multivariate problem, involving the dependent variable and one or more explanatory variables, the definition of order statistics in this context needs some thought. The residuals of a fitted regression model, however, can form the basis for ordering the observations. Consider the linear regression model

$y_t = \sum_{j=1}^{k} \beta_j x_{jt} + e_t$ ,  $t = 1, \ldots, n$ ,  (14)

with dependent variable $y_t$, explanatory variables $x_{jt}$, $j = 1, \ldots, k$, and mutually independent errors $e_t$. Given preliminary estimates $\hat\beta_j$, $j = 1, \ldots, k$, of the regression coefficients, residuals are defined by

$\hat e_t = y_t - \sum_{j=1}^{k} \hat\beta_j x_{jt}$ ,  $t = 1, \ldots, n$ .
Denote by $r(t)$ the rank of the residual $\hat e_t$. Then L-estimators of a regression coefficient can be defined to be

$\sum_{t=1}^{n} w_{r(t)} y_t$ ;  (15)

when the regression is null, in which case $y_t = e_t$, this definition reduces to that of a standard L-statistic, as in (1). As a simple example of (15), we can take

$w_{r(t)} = \begin{cases} 0 & \text{if } r/(n+1) < \alpha \text{ or } r/(n+1) > 1-\alpha , \\ 1 & \text{if } \alpha \le r/(n+1) \le 1-\alpha , \end{cases}$  (16)
the regression analog of a trimmed mean. Ruppert and Carroll (1980) investigated this estimator, but found it to be very sensitive to the particular choice of the preliminary estimates $\hat\beta_j$. The approach based on ordering residuals from a preliminary fitted model originated with Bickel (1973), though his estimators were computationally complex and not equivariant under reparametrization of the model. Welsh (1987) defined similar estimators that were simpler to compute, and performed well when the preliminary fit used ordinary least squares estimation; his estimates are similar in spirit to a Winsorized mean of the residuals, rather than the trimmed mean defined in (15)-(16). Improvements to Welsh's estimates have been suggested by de Jongh et al. (1988) and Ren and Lu (1994).
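A residual-trimming scheme in the spirit of (15)-(16) can be sketched as follows. This is a pragmatic variant, not the estimator studied by Ruppert and Carroll: it refits least squares on the retained cases rather than forming the weighted sum (15) directly, and it inherits the sensitivity to the preliminary fit noted in the text.

```python
import numpy as np

def residual_trimmed_fit(X, y, alpha=0.1):
    """Two-stage trimmed regression sketch: (1) preliminary OLS fit,
    (2) rank the residuals, (3) drop cases whose residual rank r satisfies
    r/(n+1) < alpha or r/(n+1) > 1-alpha, as in (16), (4) refit by OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)   # preliminary estimates
    r = y - X @ beta0
    ranks = np.argsort(np.argsort(r)) + 1           # ranks r(t) of residuals
    frac = ranks / (n + 1.0)
    keep = (frac >= alpha) & (frac <= 1.0 - alpha)
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta
```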
An interesting alternative approach is based on "regression quantiles", introduced by Koenker and Bassett (1978). First observe that for a single sample the sample quantile can be regarded as the solution of the minimization problem

$\min_{\beta \in \mathbb{R}} \Big[ \sum_{\{t : y_t \ge \beta\}} \theta |y_t - \beta| + \sum_{\{t : y_t < \beta\}} (1 - \theta) |y_t - \beta| \Big]$ .  (17)

For $0 < \theta < 1$ the solution to this problem is the sample quantile $\hat\beta = y_{t:n}$ if $(t-1)/n < \theta < t/n$, and any point in the closed interval $y_{t:n} \le \hat\beta \le y_{t+1:n}$ if $\theta = t/n$. Write the regression model (14) as

$y_t = x_t^{\mathsf T} \beta + e_t$ ,  $t = 1, \ldots, n$ ,  (18)
where $\beta$ is a $k$-vector of regression coefficients and $x_t$ a $k$-vector containing the values of the explanatory variables for the $t$th case. By analogy with the foregoing minimization problem, Koenker and Bassett (1978) defined the regression quantile $\hat\beta(\theta)$ to be the solution to the minimization problem

$\min_{\beta \in \mathbb{R}^k} \Big[ \sum_{\{t : y_t \ge x_t^{\mathsf T}\beta\}} \theta |y_t - x_t^{\mathsf T}\beta| + \sum_{\{t : y_t < x_t^{\mathsf T}\beta\}} (1 - \theta) |y_t - x_t^{\mathsf T}\beta| \Big]$ .
The regression quantile $\hat\beta(1/2)$ is the least absolute error estimator of $\beta$. In general, a regression quantile is a value of $\beta$ that fits a regression surface to exactly $k$ of the $n$ cases, by solving a set of simultaneous linear equations $y_t = x_t^{\mathsf T}\beta$ for $t$ belonging to a size-$k$ subset of the set $\{1, \ldots, n\}$. This definition of quantiles in a regression context enables the definition of a regression analogue of a smooth L-estimator: analogously to (6) we write

$\hat\beta = \int_0^1 J(\theta)\, \hat\beta(\theta) \, d\theta$ .  (19)

For example, a trimmed mean can be defined to be

$\frac{1}{1 - 2\alpha} \int_{\alpha}^{1-\alpha} \hat\beta(\theta) \, d\theta$ .
Koenker (1987) found that it performed extremely well in his simulation experiments. Koenker and Portnoy (1987) illustrated the use of the trimmed mean of regression quantiles, and considered methods of estimating its asymptotic variance. Koenker and d'Orey (1987) found simple ways of computing all regression quantiles for $0 < \theta < 1$, using linear programming techniques: in practice the number of distinct solutions to the minimization problem (17) is $O(n)$ as $n$ increases. Portnoy and Koenker (1989) defined adaptive L-estimates for slope parameters of regression models, in which the weight function $J(\cdot)$ in (19) is estimated by a kernel estimator of the error distribution.
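The single-sample characterization (17) is easy to verify numerically: for $(t-1)/n < \theta < t/n$ the asymmetric absolute-loss objective is minimized exactly at $y_{t:n}$. A small sketch (the data values here are arbitrary):

```python
import numpy as np

def check_loss(b, y, theta):
    """Objective in (17): theta-weighted absolute ('check') loss."""
    r = y - b
    return np.sum(np.where(r >= 0, theta * r, (theta - 1.0) * r))

# arbitrary sample; for theta = 0.25 and n = 7 the minimizer is y_{2:7} = 1.5,
# since (t-1)/n < theta < t/n holds for t = 2
y = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.3])
theta = 0.25
grid = np.linspace(y.min(), y.max(), 2001)
best = grid[np.argmin([check_loss(b, y, theta) for b in grid])]
```

Because the objective is piecewise linear and convex in $b$, the minimum is always attained at an observation (or on an interval between two adjacent observations when $\theta = t/n$).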
References

Alam, K. and K. B. Kulasekera (1993). Estimation of the quantile function of residual life time distribution. J. Statist. Plan. Infer. 37, 327-337.
Andrews, D. F., P. J. Bickel, F. R. Hampel, P. J. Huber, W. H. Rogers and J. W. Tukey (1972). Robust Estimates of Location: Survey and Advances. Princeton University Press, Princeton, N.J.
Balakrishnan, N. and A. C. Cohen (1991). Order Statistics and Inference. Academic Press, San Diego, Calif.
Barnett, V. and T. Lewis (1994). Outliers in Statistical Data, 3rd ed. Wiley, New York.
Bennett, C. A. (1952). Asymptotic Properties of Ideal Linear Estimators. Ph.D. thesis, University of Michigan.
Benson, C. (1993). Probability distributions for hydraulic conductivity of compacted soil liners. J. Geotech. Engg. 119, 471-486.
Benson, F. (1949). A note on the estimation of mean and standard deviation from quantiles. J. Roy. Statist. Soc. Ser. B 11, 91-100.
Bhattacharya, P. K. (1974). Convergence of sample paths of normalized sums of induced order statistics. Ann. Statist. 2, 1034-1039.
Bickel, P. J. (1965). On some robust estimates of location. Ann. Math. Statist. 36, 847-858.
Bickel, P. J. (1973). On some analogues to linear combinations of order statistics in the linear model. Ann. Statist. 1, 597-616.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. Almqvist and Wiksell, Uppsala, Sweden.
Blom, G. (1962). Nearly best linear estimates of location and scale parameters. In: Contributions to Order Statistics, eds. A. E. Sarhan and B. G. Greenberg, pp. 34-46. Wiley, New York.
Carroll, R. J. and H. Schneider (1985). A note on Levene's test for equality of variances. Statist. Prob. Lett. 3, 191-194.
Chan, L. K. and S. W. Cheng (1988). Linear estimation of the location and scale parameters based on selected order statistics. Comm. Statist. - Theory Meth. 17, 2259-2278.
Chernoff, H., J. L. Gastwirth and M. V. Johns (1967). Asymptotic distribution of linear combinations of functions of order statistics with applications to estimation. Ann. Math. Statist. 38, 52-72.
Conover, W. J., M. E. Johnson and M. M. Johnson (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics 23, 351-361.
Crow, E. L. and M. M. Siddiqui (1967). Robust estimation of location. J. Amer. Statist. Assoc. 62, 353-389.
Dalén, J. (1987). Algebraic bounds on standardized sample moments. Statist. Prob. Lett. 5, 329-331.
David, H. A. (1968). Gini's mean difference rediscovered. Biometrika 55, 573-575.
David, H. A. (1973). Concomitants of order statistics. Bulletin de l'Institut International de Statistique 45, 295-300.
David, H. A. (1981). Order Statistics, 2nd ed. Wiley, New York.
Davidian, M. and R. J. Carroll (1987). Variance function estimation. J. Amer. Statist. Assoc. 82, 1079-1091.
de Jongh, P. J., T. de Wet and A. H. Welsh (1988). Mallows type bounded-influence regression trimmed means. J. Amer. Statist. Assoc. 83, 805-810.
De Wet, T. and J. W. J. Van Wyk (1979). Efficiency and robustness of Hogg's adaptive trimmed means. Comm. Statist. - Theory Meth. 8, 117-128.
Downton, F. (1966a). Linear estimates with polynomial coefficients. Biometrika 53, 129-141.
Downton, F. (1966b). Linear estimates of parameters in the extreme value distribution. Technometrics 8, 3-17.
Fill, H. D. and J. R. Stedinger (1995). L moment and PPCC goodness-of-fit tests for the Gumbel distribution and impact of autocorrelation. Water Resour. Res. 31, 225-229.
Gastwirth, J. L. (1966). On robust procedures. J. Amer. Statist. Assoc. 61, 929-948.
Greenwood, J. A., J. M. Landwehr, N. C. Matalas and J. R. Wallis (1979). Probability weighted moments: definition and relation to parameters of several distributions expressible in inverse form. Water Resour. Res. 15, 1049-1054.
Gupta, A. K. (1952). Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika 39, 260-273.
Guttman, N. B. (1993). The use of L-moments in the determination of regional precipitation climates. J. Climate 6, 2309-2325.
Guttman, N. B., J. R. M. Hosking and J. R. Wallis (1993). Regional precipitation quantile values for the continental U.S. computed from L-moments. J. Climate 6, 2326-2340.
Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw and W. A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
Harrell, F. E. and C. E. Davis (1982). A new distribution-free quantile estimator. Biometrika 69, 635-640.
Hogg, R. V. (1967). Some observations on robust estimation. J. Amer. Statist. Assoc. 62, 1179-1186.
Hogg, R. V. (1974). Adaptive robust procedures: a partial review and some suggestions for future applications and theory. J. Amer. Statist. Assoc. 69, 909-927.
Hosking, J. R. M. (1990). L-moments: analysis and estimation of distributions using linear combinations of order statistics. J. Roy. Statist. Soc. Ser. B 52, 105-124.
Hosking, J. R. M. (1995). The use of L-moments in the analysis of censored data. In: Recent Advances in Life-Testing and Reliability, ed. N. Balakrishnan, pp. 545-564. CRC Press, Boca Raton, Fla.
Hosking, J. R. M. and J. R. Wallis (1987). Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29, 339-349.
Hosking, J. R. M. and J. R. Wallis (1997). Regional Frequency Analysis: An Approach Based on L-moments. Cambridge University Press, Cambridge, England.
Hosking, J. R. M., J. R. Wallis and E. F. Wood (1985). Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 27, 251-261.
Huber, P. J. (1981). Robust Statistics. Wiley, New York.
Kaigh, W. D. and P. A. Lachenbruch (1982). A generalized quantile estimator. Comm. Statist. - Theory Meth. 11, 2217-2238.
Koenker, R. (1987). Discussion of "The trimmed mean in the linear model" by A. H. Welsh. Ann. Statist. 15, 20-36.
Koenker, R. and G. Bassett (1978). Regression quantiles. Econometrica 46, 33-50.
Koenker, R. and V. d'Orey (1987). Computing regression quantiles. Appl. Statist. 36, 383-393.
Koenker, R. and S. Portnoy (1987). L-estimation for linear models. J. Amer. Statist. Assoc. 82, 851-857.
LaRiccia, V. N. and R. P. Kindermann (1983). An asymptotically efficient closed form estimator for the three-parameter lognormal distribution. Comm. Statist. - Theory Meth. 12, 243-261.
Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
Lord, E. (1950). Power of the modified t test (u test) based on range. Biometrika 37, 64-77.
Maritz, J. S. and A. H. Munro (1967). On the use of the generalised extreme-value distribution in estimating extreme percentiles. Biometrics 23, 79-103.
Moon, Y.-I. and U. Lall (1994). Kernel quantile function estimator for flood frequency analysis. Water Resour. Res. 30, 3095-3103.
Munro, A. H. and R. A. J. Wixley (1970). Estimators based on order statistics of small samples from a three-parameter lognormal distribution. J. Amer. Statist. Assoc. 65, 212-225.
Neuman, C. P. and D. I. Schonbach (1974). Discrete (Legendre) orthogonal polynomials: a survey. Internat. J. Num. Meth. Engg. 8, 743-770.
Odell, P. L. (1982). Gauss-Markov theorem. In: Encyclopedia of Statistical Sciences, Vol. 3, eds. S. Kotz and N. L. Johnson, pp. 314-316. Wiley, New York.
Olkin, I. and S. Yitzhaki (1992). Gini regression analysis. Internat. Statist. Rev. 60, 185-196.
Parzen, E. (1979). Nonparametric statistical data modeling. J. Amer. Statist. Assoc. 74, 105-131.
Pearson, C. P. (1993). Application of L moments to maximum river flows. New Zealand Statist. 28, 2-10.
Pearson, C. P. (1995). Regional frequency analysis of low flows in New Zealand rivers. J. Hydrology (NZ) 33, 94-122.
Portnoy, S. and R. Koenker (1989). Adaptive L-estimation for linear models. Ann. Statist. 17, 362-381.
Prescott, P. (1975). A simple alternative to Student's t. Appl. Statist. 24, 210-217.
Ren, J.-J. and K.-L. Lu (1994). On L-estimation in linear models. Comm. Statist. - Theory Meth. 23, 137-151.
Royston, P. (1992). Which measures of skewness and kurtosis are best? Statist. Med. 11, 333-343.
Ruppert, D. and R. J. Carroll (1980). Trimmed least squares estimation in the linear model. J. Amer. Statist. Assoc. 75, 828-837.
Schechtman, E. and S. Yitzhaki (1987). A measure of association based on Gini's mean difference. Comm. Statist. - Theory Meth. 16, 207-231.
Sheather, S. J. and J. S. Marron (1990). Kernel quantile estimators. J. Amer. Statist. Assoc. 85, 410-416.
Sillitto, G. P. (1951). Interrelations between certain linear systematic statistics of samples from any continuous population. Biometrika 38, 377-382.
Sillitto, G. P. (1969). Derivation of approximants to the inverse distribution function of a continuous univariate population from the order statistics of a sample. Biometrika 56, 641-650.
Stigler, S. M. (1973). The asymptotic distribution of the trimmed mean. Ann. Statist. 1, 472-477.
Tiku, M. L. and N. Balakrishnan (1984). Testing equality of variances the robust way. Comm. Statist. - Theory Meth. 13, 2143-2159.
Tukey, J. W. and D. H. McLaughlin (1963). Less vulnerable confidence and significance procedures for location based on a single sample: trimming/Winsorization 1. Sankhyā Ser. A 25, 331-352.
Vogel, R. M. and N. M. Fennessey (1993). L-moment diagrams should replace product-moment diagrams. Water Resour. Res. 29, 1745-1752.
Von Andrae, C. C. G. (1872). Über die Bestimmung des wahrscheinlichen Fehlers durch die gegebenen Differenzen von m gleich genauen Beobachtungen einer Unbekannten. Astronomische Nachrichten 79, 257-272.
Welsh, A. H. (1987). One-step L-estimators for the linear model. Ann. Statist. 15, 626-641.
Welsh, A. H. and H. L. Morrison (1990). Robust L estimation of scale with an application in astronomy. J. Amer. Statist. Assoc. 85, 729-743.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
On Some L-estimation in Linear Regression Models
Soroush Alimoradi and A. K. Md. Ehsanes Saleh
1. Introduction
Linear combinations of order statistics, or L-estimators, play an extremely important role in the development of robust methods for location parameters. An important precursor of the trimmed L-estimator is the L-estimator consisting of a few fixed order statistics, for example the median, the trimean, and Gastwirth's estimator. Many researchers have studied L-estimators based on a few selected optimum sample quantiles for some parametric models to evaluate the asymptotic relative efficiencies of such procedures. Hogg (1974) preferred trimmed means, and Stigler (1977) concluded that the 10% trimmed mean emerges as the victor in analysing data from 18th and 19th century experiments designed to measure some physical constants. As a first attempt, Bickel (1973) considered a class of one-step L-estimators of regression coefficients in a linear model depending on a "preliminary estimator". Although these estimators have good asymptotic properties, they are computationally complex and apparently not invariant to reparameterization. The object of this paper is to consider the well accepted definition of regression quantiles of Koenker and Bassett (1978) in the linear model and pursue the question of L-estimation based on a few selected regression quantiles and trimmed least squares estimation. The results about the trimmed least squares estimator in this paper are obtained by improving the conditions initially given by Ruppert and Carroll (1980) and Jurečková (1984) and are new. In Section 2 we present the definition of regression quantiles and its properties in detail as developed by Koenker and Bassett (1978) and Bassett and Koenker (1978). The sections that follow contain new results. Section 3 describes the L-estimation of regression parameters based on a few selected regression quantiles for some known error distributions in order to assess the asymptotic relative efficiency (ARE) properties.
Section 4 contains the discussion of trimmed L-estimators under weaker conditions than those of Jurečková (1984). Section 5 concludes with some improved estimators of regression parameters based on the trimmed L-estimators, yielding Stein-type estimators.
2. Regression quantiles and their properties

Consider the linear model

$Y_{n\times 1} = \beta_0 1_n + X_{n\times p}\, \beta_{p\times 1} + Z_{n\times 1}$ ,  (2.1)

where $Z_{n\times 1} = (Z_1, \ldots, Z_n)'$ consists of i.i.d. random variables with a continuous cdf $F_0(z)$ and pdf $f_0(z)$, $z \in \mathbb{R}$. It is well-known that the least squares estimator (LSE) of $\beta^* = (\beta_0, \beta')'$ is given by

$\tilde\beta_n^L = (D_n' D_n)^{-1} D_n' Y$ ,  (2.2)

where $D_n = (1_n, X)$ is the design matrix, and the quadratic estimator of variance is given by

$s_n^2 = \frac{(Y - \tilde\beta_0^L 1_n - X\tilde\beta_n^L)'(Y - \tilde\beta_0^L 1_n - X\tilde\beta_n^L)}{n - p - 1}$ .  (2.3)
It is further well-known that the LSE of $\beta^*$ is not robust. A natural curiosity is to develop analogues of linear combinations of order statistics and simplified quantile estimators for the linear model. Bickel (1973) considered ordering of the residuals based on a preliminary estimator, but his method is computationally complicated and basically needs iteratively re-weighted least squares techniques. However, Koenker and Bassett (1978) extended a well-accepted definition of sample quantiles to linear models, which they call regression quantiles. We shall devote our attention to this concept of quantiles, present their properties as in Koenker and Bassett (1978) and apply them to the estimation of regression parameters.

DEFINITION 2.1. The regression quantile of order $\lambda$ is the solution of the minimization problem

$\min_{b \in \mathbb{R}^{p+1}} \Big[ \sum_{j \in J} \lambda\, |y_j - d_j' b| + \sum_{j \notin J} (1 - \lambda)\, |y_j - d_j' b| \Big]$ ,  (2.4)

where $J = \{j : y_j - d_j' b > 0\}$ and $d_j'$ is the $j$th row of the design matrix $D_n$. For $p = 0$ and $d_j = 1$ the minimization problem yields $\hat b_0$ equal to the sample quantile of order $\lambda$, while $\lambda = 1/2$ yields $\hat b_0$ equal to the sample median. Thus, the median is the least absolute error estimator (LAE) of $\beta_0$.

Let $\hat{\mathbf B}(\lambda)$ be the solution set, consisting of the vectors $(\hat\beta_0(\lambda), \hat\beta_n'(\lambda))'$ obtained by minimizing (2.4). Also, let $\mathscr{M}$ denote the $(p+1)$-element subsets of $\mathscr{J} = \{1, 2, \ldots, n\}$; there are $\binom{n}{p+1}$ possible subsets in $\mathscr{M}$. An element $h \in \mathscr{M}$ has complement $\bar h = \mathscr{J} - h$, and $h$ and $\bar h$ partition the vector $Y$ as well as the design matrix $D_n$. Thus, the notation $Y(h)$ stands for the $(p+1)$-element vector of $y$'s from $Y$, i.e. $\{y_j, j \in h\}$, while $D_n(h)$ stands for the $(p+1) \times (p+1)$ matrix with rows $\{(1, x_j'),\ j \in h\}$ and $D_n(\bar h)$ for the $(n-p-1) \times (p+1)$ matrix with rows $\{(1, x_j'),\ j \in \bar h\}$. Finally, let

$H = \{h \in \mathscr{M} : \mathrm{rank}\, D_n(h) = p + 1\}$ .  (2.5)
Then the following theorem gives the form of the solution of (2.4).

THEOREM 2.1. If the rank of $D_n$ is $p + 1$, then the set of regression quantiles $\hat{\mathbf B}(\lambda)$ has at least one element of the form

$\hat\beta_n^*(\lambda) = D_n(h)^{-1} Y(h)$  (2.6)

for some $h \in H$. Moreover, $\hat{\mathbf B}(\lambda)$ is the convex hull of all solutions of the form (2.6).

The proof of Theorem 2.1 follows from the linear programming formulation of the minimization problem given below:

$\min \big[ \lambda\, 1_n' r^+ + (1 - \lambda)\, 1_n' r^- \big]$ ,  $1_n = (1, \ldots, 1)'$ ,

subject to

$Y = D_n b + r^+ - r^-$ ,  $(b, r^+, r^-) \in \mathbb{R}^{p+1} \times \mathbb{R}_+^n \times \mathbb{R}_+^n$ .  (2.7)

For example, see Abdelmalek (1974). Sample quantiles in the location model are identified with a single order statistic in a sample. Theorem 2.1 generalizes this feature to regression quantiles, where normals to hyperplanes defined by subsets of $(p+1)$ observations play the role of order statistics. The following theorems describe various other properties of regression quantiles (RQ's) given in Koenker and Bassett (1978).
~*,(2, cV, D,) =c1*,(2, ^* V,D,),
c E [0, o0).
(ii)
^* - 2',dr, O.)= I*.(1
(iii)
^* Y + D,~,D,) = 1.n(2, 1.,(2, ^* Y , D , ) + ~,
(iv)
^* v, D.C) = C 1..(2',
dL(2', r,D.),
1.,(2, ^* Y,D,),
d E
~E 1Rp+I
Cp+l×p+1 non-singular matrix (2.8)
The following theorem gives the conditions when RQ's are unique. THEOREM 2.3. If the error distribution Fo(z) is continuous, then with probability ^. one: 1.,(2) = D,(h) -1 Y(h) is a unique solution if and only if (2'--l)lp+ 1 < Z[1/2
-- 1/2 sgn(yj - d j f '^* l.)-
2']djD.(h) I < 2.1;+1
jCh (2.9) where lp+l is a vector of ones.
240
S. Alimoradi and A. K. Md. E Saleh
The next theorem states the relationship of the number of residual errors /^, ilj = y j - djpn(2), J = 1 , . . . , n that are positive, negative and zero with that of n2. THEOREM 2.4. Let ~ = (ill,...,iin) / and P(it),N(ii) and Z(fi) be the number of positive, negative and zero residuals. Then, N(it) <_ n2<_ n - P(h) = N(/t) + Z(~)
(2.10)
for every ~,*(2) E 1~(2). If ~ ( 2 ) is unique, i.e., ~,*(2) = I~(2) then inequalities hold strictly. The theorem that follows has the following geometric interpretation: Consider a scatter diagram of sample observations in ]R2 with 2th-regression quantiles slicing through the scatter. Now, consider the effect (on the position of the 2-regression quantile) of moving observations up or down in the scatter. The result states that as long as these movements leave observations on the same side of the original line, its position is unaffected. THEOREM 2.5. If ~;(2) E 1](2, Y, Dn), then ~*(2) E B(2, DnI~; +Ait, Dn) where /i = Y - D,~* and A is any n x n diagonal matrix with nonnegative elements. Before stating the next Theorem it is necessary to consider the following conditions on the distribution function and the design matrix. (A0) F0 is continuous and has continuous density f0 in the neighborhood of Q0(2i) when q0(2i) > 0, i = 1,...,k. (A1) n -1/2 maxij Idij[ = o(1). (A2) limn~o~ 2;n = limn--~o~n lDInDn = 2;. p.d. Then, 1 for fixed ^* ) with 0 < 2 1 < ' " < ) o ~ < THEOREM 2.6. Let (^* /~n(21),...,/~n(2k) k(p + 1 < k < n), b e t h e sequence of ufiique RQ's from the model (2.1). Define
~*(2) = (fl0+aQ0(2),fll,...,flp)
.
(2.11)
Then
nl/2[(#;()~l)-]~(~l)) (2.12) where ( ( 2 i _A2j -_ 2,2j'~'~ f2 = \ \q0(2i) , qo(2j)J J "
(2.13)
We recognize that I2 is the covariance matrix of k sample quantiles with the spacing 2 = (21,..., 2k) / in the location model. We will also prove this theorem via weighted empirical processes in Section 4.
241
On some L-estimation in linear regression models
Finally, we present the following Theorem 2.7 which is the basis of studying the problem of estimation of the regression parameters based on a few selected "regression quantiles" in some known error distributions.
THEOREM 2.7. Let a(2) = (a()~l),..., a(2k))' be coefficients such that ~-~=1 a ( ~ i )
=
1 and let A0-A2 hold. Then ^,
~*(2) = Z
i=1
is
invariant
to
(2.14)
a()Li)~n(2i)
location,
scale
and
reparameterization
of
design
and
nl/2(l~;()~ ) --fl*()~)) converges in distribution to a ( p + l)-variate Gaussian distribution with mean 0 and covariance matrix a(2)'f~a(2)Z -1. 3. L-estimation of the parameters of a linear model based on a few selected regression quantiles with known error distributions
In this section we consider the simplified L-estimation of the regression and scale parameters of a linear model based on a few selected regression quantiles. Consider the linear model
Y : D~* + oZ
(3.1)
where Z = (Z1,..., Zn) I consists of i.i.d, error variables with a continuous known cdf Fo(z) and pdf f0(z), z E IR and/~* and a are the regression and scale parameters respectively. It is well-known that least squares estimation leads to the best linear unbiased estimator (BLUE) of/~* given by (2.2) and the optimum quadratic estimator of cr2 is given by (2.3). These estimators are not robust. Also the asymptotic efficiency of this BLUE depends on the error distribution. In this section, we introduce the estimation of (/~*', o-)' based on a few selected regression quantiles (RQ) of Koenker and Bassett (1978) which will relate to simple robust estimators of ,~*. This study will also extend the work of Ogawa (1951) on the estimation of location and scale model leading to simplified estimation of the regression and scale parameters of a linear model. Our program is to propose asymptotically best linear (in regression quantiles) unbiased estimators (ABLUE) of/~* and a based on a few fixed, say k, (p + 2 _< k < n) selected regression quantiles (RQ) when the error distribution is known. We shall also assess their asymptotic relative efficiency properties when the error distribution is the Cauchy-distribution. For a given integer k ( p + 2 _< k < n), consider the spacing vector (21, . . . , 2~)/ whose components satisfy the constraints 0 < 21 < - • • < 2k < 1. Now, by the minimization problem (2.4), we obtain the k RQ's given by the vector
n(~)=
n(Al),...,fljn(/~k
,
j=O, 1,...,p
(3.2)
Then, using Theorem 2.7, one obtains that the $k(p+1)$-dimensional random variable

$n^{1/2} \big( (\hat\beta_{0n}(\lambda) - \beta_0 1_k - \sigma u)',\ (\hat\beta_{1n}(\lambda) - \beta_1 1_k)',\ \ldots,\ (\hat\beta_{pn}(\lambda) - \beta_p 1_k)' \big)'$  (3.3)

converges in law to the $k(p+1)$-dimensional normal distribution with mean 0 and covariance matrix $\sigma^2(\Sigma^{-1} \otimes \Omega)$, where $u = (u_1, \ldots, u_k)'$ with $u_j = Q_0(\lambda_j)$, $j = 1, \ldots, k$, $1_k = (1, \ldots, 1)'$ a $k$-tuple of ones, and $\Omega$ is given at (2.13).
3.1. Joint estimation of regression and scale parameters

In this subsection, we propose asymptotically best linear (in regression quantiles) unbiased estimators (ABLUE) of (β*', σ)'. The ABLUE of (β0, β1, ..., βp)' and σ can be obtained by minimizing, using (3.3), the quadratic form

Σ_{i=0}^{p} Σ_{j=0}^{p} σ_{i+1,j+1} (θ_in − θ_i)' Ω^{-1} (θ_jn − θ_j) ,    (3.4)

where σ_{ij} is the (i, j)th element of the covariance matrix Σ and

θ_0n = β̂_0n(λ),  θ_0 = β0 1k + σu,   θ_jn = β̂_jn(λ),  θ_j = βj 1k ,    (3.5)

j = 1, ..., p, to obtain the normal equations:
⎡ K1 I_{p+1}   K3 e1 ⎤ ⎡ β̃*_n ⎤   ⎡  V   ⎤
⎣ K3 m'        K2    ⎦ ⎣ σ̃_n  ⎦ = ⎣ m'V* ⎦ ,   e1 = (1, 0, ..., 0)' .    (3.6)

The explicit expressions for β̃*_n and σ̃_n are given respectively by

β̃*_n = (1/Δ)[K2 V − K3 V*]    (3.7)

and

σ̃_n = (1/Δ)[K1 m'V* − K3 m'V] ,    (3.8)

where

V = (V0, V1, ..., Vp)' ,   Vj = 1k' Ω^{-1} β̂_jn(λ) ,   j = 0, 1, ..., p ,    (3.9)

V* = (V0*, V1*, ..., Vp*)' ,   Vj* = u' Ω^{-1} β̂_jn(λ) ,   j = 0, 1, ..., p ,    (3.10)

K1 = 1k' Ω^{-1} 1k ,   K2 = u' Ω^{-1} u ,   K3 = 1k' Ω^{-1} u ,    (3.11)

Δ = K1 K2 − K3² ,   m = (1, d̄1, ..., d̄p)' ,    (3.12)

with d̄j denoting the (limiting) mean of the jth non-constant column of the design matrix Dn.
The asymptotic covariance matrix of (β̃*_n', σ̃_n)' is given by

(σ²/(nΔ)) ⎡ K2 Σ^{-1}       −K3 Σ^{-1} m ⎤
          ⎣ −K3 m' Σ^{-1}   K1           ⎦ .    (3.13)
Also, the explicit forms of K1, K2 and K3 are given by

K1 = Σ_{i=1}^{k+1} [q0(λi) − q0(λ_{i−1})]² / (λi − λ_{i−1}) ,    (3.14)

K2 = Σ_{i=1}^{k+1} [q0(λi)Q0(λi) − q0(λ_{i−1})Q0(λ_{i−1})]² / (λi − λ_{i−1}) ,    (3.15)

K3 = Σ_{i=1}^{k+1} [q0(λi) − q0(λ_{i−1})][q0(λi)Q0(λi) − q0(λ_{i−1})Q0(λ_{i−1})] / (λi − λ_{i−1}) ,    (3.16)

with λ0 = 0 and λ_{k+1} = 1 (so that q0(λ0) = q0(λ_{k+1}) = 0). If F0 is symmetric and λi = 1 − λ_{k−i+1}, i = 1, ..., k, then K3 = 0 and the ABLUE of β* and σ simplify to

β̃*_n = K1^{-1} V   and   σ̃_n = m'V*/K2 ,    (3.17)

with variances and covariance given by

Var(β̃*_n) = (σ²/(nK1)) Σ^{-1} ,   Var(σ̃_n) = σ²/(nK2)    (3.18)

and

Cov(β̃*_n, σ̃_n) = 0 .    (3.19)
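The closed forms (3.14)–(3.16) can be checked numerically against the matrix definitions in (3.11). The sketch below — our own, not from the text — assumes Ω has entries ω_ij = (λi ∧ λj − λi λj)/(q0(λi) q0(λj)) (as in Corollary 4.2 later in this chapter) and uses the logistic error law, for which Q0(λ) = log(λ/(1−λ)) and q0(λ) = f0(Q0(λ)) = λ(1−λ); only the standard library is used.

```python
# Numerical check (a sketch under the stated assumptions) that the
# closed forms (3.14)-(3.16) agree with K1 = 1'Ω⁻¹1, K2 = u'Ω⁻¹u,
# K3 = 1'Ω⁻¹u of (3.11) for a logistic error distribution.
import math

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting (k is small)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][j] * x[j] for j in range(c + 1, n))) / M[c][c]
    return x

Q0 = lambda lam: math.log(lam / (1 - lam))   # logistic quantile function
q0 = lambda lam: lam * (1 - lam)             # f0(Q0(λ)) for the logistic

lams = [0.2, 0.4, 0.6, 0.8]                  # a symmetric spacing, k = 4
k = len(lams)
Om = [[(min(a, c) - a * c) / (q0(a) * q0(c)) for c in lams] for a in lams]
u = [Q0(l) for l in lams]
x1 = solve(Om, [1.0] * k)                    # Ω⁻¹ 1
xu = solve(Om, u)                            # Ω⁻¹ u
K1m = sum(x1)
K2m = sum(ui * xi for ui, xi in zip(u, xu))
K3m = sum(xu)

ext = [0.0] + lams + [1.0]                   # λ0 = 0, λ_{k+1} = 1
q = [0.0] + [q0(l) for l in lams] + [0.0]    # q0 vanishes at both ends
qQ = [0.0] + [q0(l) * Q0(l) for l in lams] + [0.0]
K1s = sum((q[i] - q[i-1])**2 / (ext[i] - ext[i-1]) for i in range(1, k + 2))
K2s = sum((qQ[i] - qQ[i-1])**2 / (ext[i] - ext[i-1]) for i in range(1, k + 2))
K3s = sum((q[i] - q[i-1]) * (qQ[i] - qQ[i-1]) / (ext[i] - ext[i-1])
          for i in range(1, k + 2))
print(abs(K1m - K1s), abs(K2m - K2s))  # both ≈ 0
```

By the symmetry of the spacing, K3 vanishes in both representations, as stated below (3.16) for symmetric F0.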
Now, the joint asymptotic relative efficiency (JARE) of the ABLUE (β̃*_n', σ̃_n)' relative to the MLE (β̂*_n', σ̂_n)' may be computed as

JARE[ABLUE : MLE] = K1^p Δ / [I11^p (I11 I22 − I12²)]    (3.20)

and

ARE[β̃*_n : MLE] = Δ^{p+1} I22^{p+1} / [K2^{p+1} (I11 I22 − I12²)^{p+1}]    (3.21)

and

ARE[σ̃_n : MLE] = I11 Δ / [K1 (I11 I22 − I12²)] ,

where I11, I22 and I12 are the elements of the information matrix for a location-scale distribution. Further, for K3 = 0 and I12 = 0 (note that I12 = 0 for symmetric distributions), we obtain

JARE[ABLUE : MLE] = K1^{p+1} K2 / (I11^{p+1} I22)    (3.22)

and

ARE[β̃*_n : MLE] = K1^{p+1} I11^{−(p+1)}   and   ARE[σ̃_n : MLE] = K2 I22^{−1} .    (3.23)
Another method of estimating (β0, σ) and β would be to consider the two marginal distributions, namely the marginals of

(i)  n^{1/2}(β̂_0n(λ) − β0 1k − σu)'   and

(ii)  n^{1/2}( (β̂_1n(λ) − β1 1k)', ..., (β̂_pn(λ) − βp 1k)' )' .    (3.24)

The first one yields the estimators

β̃_0n = (1/Δ){K2 V0 − K3 V0*}   and   σ̃_n = (1/Δ){K1 V0* − K3 V0} ,    (3.25)

and the second one yields the estimators of β as

β̃_n = K1^{-1} V ,   V = (V1, ..., Vp)' .    (3.26)

The estimator of σ in (3.25) is non-negative, but (3.8) may not be non-negative. The ARE of these estimators relative to the corresponding marginal likelihood estimators may be determined easily. Now, we consider the explicit expressions for the coefficients in (3.7) and (3.8). They are
a = (1/Δ){ K2 1k' Ω^{-1} − K3 u' Ω^{-1} } = (a1, ..., ak)'    (3.27)

and

b = (1/Δ){ K1 u' Ω^{-1} − K3 1k' Ω^{-1} } = (b1, ..., bk)' ,    (3.28)

where, for i = 1, 2, ..., k,

ai = (q0(λi)K2/Δ) { [q0(λi) − q0(λ_{i−1})]/(λi − λ_{i−1}) − [q0(λ_{i+1}) − q0(λi)]/(λ_{i+1} − λi) }
   − (q0(λi)K3/Δ) { [q0(λi)Q0(λi) − q0(λ_{i−1})Q0(λ_{i−1})]/(λi − λ_{i−1}) − [q0(λ_{i+1})Q0(λ_{i+1}) − q0(λi)Q0(λi)]/(λ_{i+1} − λi) } ,    (3.29)

bi = (q0(λi)K1/Δ) { [q0(λi)Q0(λi) − q0(λ_{i−1})Q0(λ_{i−1})]/(λi − λ_{i−1}) − [q0(λ_{i+1})Q0(λ_{i+1}) − q0(λi)Q0(λi)]/(λ_{i+1} − λi) }
   − (q0(λi)K3/Δ) { [q0(λi) − q0(λ_{i−1})]/(λi − λ_{i−1}) − [q0(λ_{i+1}) − q0(λi)]/(λ_{i+1} − λi) } .    (3.30)
Using these coefficients, we may write the estimators (3.7) and (3.8) as follows:

β̃*_n = ( a1 β̂_0n(λ1) + ... + ak β̂_0n(λk), a1 β̂_1n(λ1) + ... + ak β̂_1n(λk), ..., a1 β̂_pn(λ1) + ... + ak β̂_pn(λk) )'    (3.31)

and

σ̃_n = Σ_{j=0}^{p} dj ( b1 β̂_jn(λ1) + ... + bk β̂_jn(λk) ) ,   d0 = 1 ,

where (1, d1, ..., dp)' = m. Note that the coefficients for the estimators of the components of β* are the same and satisfy the condition Σ_{i=1}^{k} ai = 1, as in Theorem 4.3 of Koenker and Bassett (1978), while the coefficients of the estimator of σ satisfy Σ_{j=1}^{k} bj = 0. The estimator of σ also involves the vector m' = (1, d1, ..., dp). Unless m' = (1, 0, ..., 0), one may not be able to guarantee σ̃_n > 0. This suggests the course of estimation using (3.25) and (3.26), where there is no such anomaly. In order to obtain the ABLUE of (β*', σ)' optimally, we maximize K1^p Δ with respect to λ1, λ2, ..., λk satisfying 0 < λ1 < λ2 < ... < λk < 1. Once we have obtained the optimum spacing vector λ0 = (λ1^0, ..., λk^0)', we can determine the optimal coefficients a0 = (a1^0, ..., ak^0)' and b0 = (b1^0, ..., bk^0)'. To illustrate the methodology, we consider the Cauchy distribution. Here, we note that
Ω = ( π² (λi ∧ λj − λi λj) / ( cos²[π(λi − 1/2)] cos²[π(λj − 1/2)] ) )    (3.32)

and

Q0(λj) = tan[π(λj − 1/2)] ,   j = 1, ..., k .    (3.33)
The explicit forms of K1, K2 and K3 are

K1 = (1/π²) Σ_{i=1}^{k+1} [sin²(πλi) − sin²(πλ_{i−1})]² / (λi − λ_{i−1}) ,

K2 = (1/(4π²)) Σ_{i=1}^{k+1} [sin(2πλi) − sin(2πλ_{i−1})]² / (λi − λ_{i−1}) ,    (3.34)

K3 = −(1/(2π²)) Σ_{i=1}^{k+1} [sin²(πλi) − sin²(πλ_{i−1})][sin(2πλi) − sin(2πλ_{i−1})] / (λi − λ_{i−1}) .
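As a quick numerical check — our own sketch, not part of the text — for the symmetric spacing (0.25, 0.5, 0.75) used for k = 3 in Table 3.1 below, these expressions should recover K1 = K2 = 4/π² ≈ 0.4053 and K3 = 0:

```python
# Sketch: evaluating the Cauchy expressions in (3.34) on a given spacing
# vector, checked against the k = 3 entries of Table 3.1.
import math

def cauchy_Ks(lams):
    ext = [0.0] + list(lams) + [1.0]          # λ0 = 0, λ_{k+1} = 1
    K1 = K2 = K3 = 0.0
    for i in range(1, len(ext)):
        d = ext[i] - ext[i - 1]
        s2 = math.sin(math.pi * ext[i])**2 - math.sin(math.pi * ext[i - 1])**2
        s = math.sin(2 * math.pi * ext[i]) - math.sin(2 * math.pi * ext[i - 1])
        K1 += s2 * s2 / d / math.pi**2
        K2 += s * s / d / (4 * math.pi**2)
        K3 -= s2 * s / d / (2 * math.pi**2)
    return K1, K2, K3

K1, K2, K3 = cauchy_Ks([0.25, 0.5, 0.75])
print(round(K1, 4), round(K2, 4))  # 0.4053 0.4053  (= 4/π²), with K3 ≈ 0
```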
The optimum spacings for the estimation of (β*', σ)' may be obtained by maximizing 2^{p+2} K1^p Δ in this case, since I11 = I22 = 1/2 and I12 = 0. We know that the optimum spacings for the location-scale parameters are given by

λ0 = ( 1/(k+1), 2/(k+1), ..., k/(k+1) )'    (3.35)
[see, for example, Balmer et al. (1974) and Cane (1974)]. Using these spacings, we obtain the optimum coefficients as

ai = −(4/(k+1)) sin²(πi/(k+1)) cos(2πi/(k+1)) ,   i = 1, ..., k ,

and    (3.36)

bi = −(4/(k+1)) sin²(πi/(k+1)) sin(2πi/(k+1)) ,   i = 1, ..., k .

These coefficients may be used to obtain the ABLUE of β* and σ respectively.
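A small sanity check of these coefficients — our own sketch of the reconstructed formulas, not from the text — confirms the constraints noted in (3.31): the a_i sum to one, the b_i sum to zero, and for k = 3 they reduce to a = (0, 1, 0) and b = (−1/2, 0, 1/2):

```python
# Sketch: the optimum Cauchy coefficients of (3.36) for uniform spacings
# λ_i = i/(k+1).  We check the normalizations Σa_i = 1, Σb_i = 0 and the
# explicit k = 3 values.
import math

def opt_coeffs(k):
    a = [-4 / (k + 1) * math.sin(math.pi * i / (k + 1))**2
         * math.cos(2 * math.pi * i / (k + 1)) for i in range(1, k + 1)]
    b = [-4 / (k + 1) * math.sin(math.pi * i / (k + 1))**2
         * math.sin(2 * math.pi * i / (k + 1)) for i in range(1, k + 1)]
    return a, b

a, b = opt_coeffs(3)
print(round(sum(a), 6), round(abs(sum(b)), 6))  # 1.0 0.0
```

For k = 3 and Cauchy errors the b-weights give σ̃ = (β̂(0.75) − β̂(0.25))/2, half the interquartile distance, which is unbiased for σ since Q0(0.75) = −Q0(0.25) = 1.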
The JARE of the estimator for this case will be ( ((k+1)/π) sin(π/(k+1)) )^{2p+4}. Now, we present some numerical examples using the computational algorithms of Koenker and d'Orey (1987, 1994) for computing the RQ's for two sets of data.

EXAMPLE 3.1. Consider the simple regression model y = β0 + β1 x + z, with the following data:

X: 78 78 78 78 78 77 76 76 76 76 76 75 77 76 75 73 71 70 68 68 66 66 65 63 62 60 60

Y: 17.3 17.6 15.0 18.1 18.7 17.9 18.4 18.1 16.3 19.4 17.6 19.5 12.7 17.0 16.1 17.3 18.4 17.3 16.1 15.9 14.6 16.5 15.8 19.5 15.3 17.4 17.6
Then, the different fitted lines are as follows:

LSE:                                14.72224 + 0.0329x
Regression Median (RQ(.5)):         9.9999 + 0.1x
Regression 1st Quantile (RQ(.25)):  9.1 + 0.099999x
Regression 3rd Quantile (RQ(.75)):  15.72499 + 0.03125x
Now, we present the solutions for the k = 7 optimum regression quantiles for the normal distribution. They are:

  i    λi        β̂0n(λi)        β̂1n(λi)
  1   0.0171    26.0000019     −0.1727273
  2   0.0857    16.46249771    −0.01874998
  3   0.2411    12.499999       0.0499999
  4   0.50       9.9999294      0.1000009
  5   0.7589    14.599999       0.049999
  6   0.9143    19.984619141   −0.00769
  7   0.9829    19.49999        0.000000089

Based on these regression quantiles, we obtain the estimates of β0 and β1 as (14.0088, 0.04139). The following figure shows the scatter plot of the data with the different fitted lines.
[Scatter plot 3.1: the data of Example 3.1 with the LSE, FSRQ and RQ(.25, .5, .75) fitted lines.]
EXAMPLE 3.2. In this example we consider the regression model

Y = β0 + β1 x1 + β2 x2 + β3 x3 + Z

with the following data:

X1: 4.3 4.5 4.3 6.1 5.6 5.6 6.1 5.5 5.0 5.6 5.2 4.8 3.8 3.4 3.6 3.9 5.1 5.9 4.9 4.6 4.8 4.9 5.1 5.4 6.5 6.8 6.2

X2: 62 68 74 71 78 85 69 76 83 70 77 84 63 70 77 63 70 77 63 70 77 56 63 70 49 56 63

X3: 78 78 78 78 78 77 76 76 76 76 76 75 77 76 75 73 71 70 68 68 66 66 65 63 62 60 60

Y: 17.3 17.6 15.0 18.1 18.7 17.9 18.4 18.1 16.3 19.4 17.6 19.5 12.7 17.0 16.1 17.3 18.4 17.3 16.1 15.9 14.6 16.5 15.8 19.5 15.3 17.4 17.6
Then, the LSE fit and the Regression Median fit are given by

LSE: 5.344783 + 0.9234785 x1 + 0.04712587 x2 + 0.05217684 x3
RM:  5.48703623 + 0.84914058 x1 − 0.01677862 x2 + 0.11797349 x3
In this case we obtain the solutions for the k = 7 optimum regression quantiles for the normal distribution. The results are:

  i    λi        β̂0            β̂1           β̂2            β̂3
  1   0.0211    −5.79232454    1.84005880    0.07403532    0.08877762
  2   0.1009    −7.45990467    1.93643999    0.07792763    0.10249341
  3   0.2642     3.67410445    0.86474842    0.02345317    0.09597122
  4   0.5000     5.48703623    0.84914058   −0.01677862    0.11797349
  5   0.7358     9.57583427    0.45074245    0.05087731    0.03373820
  6   0.8991    12.98698711    0.79262042    0.07539532   −0.04833032
  7   0.9789    11.04361057    1.01503742    0.06348020   −0.02330833
The estimates of β0, β1, β2 and β3 based on these regression quantiles are given respectively by (5.065928, 0.941837, 0.0356317, 0.0665379). We provide here a sample table of optimum spacings, coefficients and related quantities for ARE computation where the error distribution of the linear model (p = 1) is Cauchy. In this case, as an example, if we choose k = 7 the ARE is

ARE = K1 I11^{-1} = (0.4774) × 2 = 0.9548 .

Table 3.1
Optimum spacings, coefficients and ARE quantities for the L-estimation of the regression line with Cauchy errors [k = 3(1)10]

k = 3:   λ = (0.2500, 0.5000, 0.7500)
         a = (0.0000, 1.0000, 0.0000)
         K1 = 0.4053, K2 = 0.4053, K3 = 0.0000, K1Δ = 0.0666

k = 4:   λ = (0.1837, 0.4099, 0.5901, 0.8163)
         a = (−0.0769, 0.5769, 0.5769, −0.0769)
         K1 = 0.4469, K2 = 0.4244, K3 = 0.0000, K1Δ = 0.0848

k = 5:   λ = (0.1535, 0.3465, 0.5000, 0.6535, 0.8465)
         a = (−0.0730, 0.2664, 0.6132, 0.2664, −0.0730)
         K1 = 0.4631, K2 = 0.4457, K3 = 0.0000, K1Δ = 0.0956

k = 6:   λ = (0.1321, 0.2941, 0.4364, 0.5636, 0.7059, 0.8679)
         a = (−0.0593, 0.0892, 0.4701, 0.4701, 0.0892, −0.0593)
         K1 = 0.4712, K2 = 0.4619, K3 = 0.0000, K1Δ = 0.1025

k = 7:   λ = (0.1147, 0.2500, 0.3853, 0.5000, 0.6147, 0.7500, 0.8853)
         a = (−0.0446, 0.0000, 0.3147, 0.4599, 0.3147, 0.0000, −0.0446)
         K1 = 0.4774, K2 = 0.4713, K3 = 0.0000, K1Δ = 0.1074

k = 8:   λ = (0.1013, 0.2170, 0.3431, 0.4501, 0.5499, 0.6569, 0.7830, 0.8987)
         a = (−0.0333, −0.0350, 0.1860, 0.3823, 0.3823, 0.1860, −0.0350, −0.0333)
         K1 = 0.4822, K2 = 0.4770, K3 = 0.0000, K1Δ = 0.1109

k = 9:   λ = (0.0911, 0.1926, 0.3074, 0.4089, 0.5000, 0.5911, 0.6926, 0.8074, 0.9089)
         a = (−0.0254, −0.0453, 0.0948, 0.2934, 0.3650, 0.2934, 0.0948, −0.0453, −0.0254)
         K1 = 0.4855, K2 = 0.4813, K3 = 0.0000, K1Δ = 0.1135

k = 10:  λ = (0.0827, 0.1732, 0.2765, 0.3736, 0.4591, 0.5409, 0.6264, 0.7235, 0.8268, 0.9173)
         a = (−0.0197, −0.0454, 0.0354, 0.2117, 0.3180, 0.3180, 0.2117, 0.0354, −0.0454, −0.0197)
         K1 = 0.4880, K2 = 0.4846, K3 = 0.0000, K1Δ = 0.1154

(The last entry in each row is K1Δ = K1(K1K2 − K3²), the criterion maximized for p = 1.)
3.2. Estimation of the conditional quantile function

In this subsection, we consider the estimation of a conditional quantile function defined by

Q(λ) = t0'β* + σ Q0(λ) ,   0 < λ < 1 ,    (3.37)

based on a few, say k (p + 2 ≤ k < n), selected RQ's, where t0 is a prefixed vector. To achieve this, we simply substitute the ABLUE of (β*', σ)' from (3.7) and (3.8) in (3.37) and obtain

Q̃n(λ) = t0'β̃*_n + σ̃_n Q0(λ) ,   for each λ ∈ (0, 1) .    (3.38)

Then the asymptotic variance is given by

Var(Q̃n(λ)) = (σ²/(nΔ)) { K2 t0'Σ^{-1}t0 − 2K3 t0'Σ^{-1}m Q0(λ) + K1 Q0²(λ) } .    (3.39)

Similarly, the MLE of Q(λ), say Q̂n(λ), has the asymptotic variance given by

Var(Q̂n(λ)) = (σ²/(n(I11I22 − I12²))) { I22 t0'Σ^{-1}t0 − 2I12 t0'Σ^{-1}m Q0(λ) + I11 Q0²(λ) } .    (3.40)
The ARE of Q̃n(λ) relative to Q̂n(λ) is given by (3.40) divided by (3.39). The ARE expression lies between

Chmin(K*I*^{-1}) ≤ ARE[Q̃n(λ) : Q̂n(λ)] ≤ Chmax(K*I*^{-1}) ,    (3.41)

where

K* = ⎡ K1 Σ    K3 m ⎤        I* = ⎡ I11 Σ    I12 m ⎤
     ⎣ K3 m'   K2   ⎦ ,           ⎣ I12 m'   I22   ⎦ ,    (3.42)

and Chmin(A) and Chmax(A) are the minimum and maximum characteristic roots of A. Now, consider two quantities, namely (i) tr[K*I*^{-1}] and (ii) |K*I*^{-1}|^{1/(p+2)}. Then (p+2)^{-1} tr[K*I*^{-1}] provides the average ARE of Q̃n(λ) relative to Q̂n(λ), which lies between Chmin(K*I*^{-1}) and Chmax(K*I*^{-1}), while |K*I*^{-1}|^{1/(p+2)} gives the geometric mean of the characteristic roots of K*I*^{-1}. Hence, we determine the average ARE as

(1/(p+2)) tr[K*I*^{-1}] = [ (p+1) K1 I22 − 2K3 I12 + K2 I11 ] / [ (p+2)(I11I22 − I12²) ] .    (3.43)

On the other hand,

|K*I*^{-1}| = K1^p Δ / [ I11^p (I11I22 − I12²) ] .    (3.44)

The optimization of (3.43) or (3.44) will provide the required optimum spacings for the determination of the regression quantiles required to estimate (β*', σ)' and thereby Q(λ). It is clear that (3.43) and (3.44) do not depend on the design matrix Dn. Alternatively, if f0(z) is symmetric, then the (p+2) characteristic roots of K*I*^{-1} are

K1 I11^{-1}   (p times)   and

(K1 I11^{-1} + K2 I22^{-1})/2 ± (1/2) [ (K1 I11^{-1} − K2 I22^{-1})² + 4K3² I11^{-1} I22^{-1} ]^{1/2} .    (3.45)

Thus, we maximize (K1 I11^{-1} + K2 I22^{-1})/2 + (1/2)[(K1 I11^{-1} − K2 I22^{-1})² + 4K3² I11^{-1} I22^{-1}]^{1/2} with respect to Q0(λ) to obtain the optimum spacing vector, which is independent of Dn.
3.3. Estimating functions of regression and scale parameters

Let g(β*, σ) be a function of the regression and scale parameters of the linear model. We consider the estimation of g(β*, σ) based on k (p + 2 ≤ k < n) regression quantiles β̂_jn(λ) = (β̂_jn(λ1), ..., β̂_jn(λk))', j = 0, 1, ..., p, as in (3.4), and then estimate g(β*, σ) by the substitution method as g(β̃*, σ̃). To study the properties of this estimator, we use the local linearization theorem as in Rao (1973). This allows us to obtain the asymptotic variance of g(β̃*, σ̃). Now, assume that g(β*, σ) is defined on ℝ^{p+1} × ℝ+ and possesses continuous partial derivatives of order at least two at each point of ℝ^{p+1} × ℝ+. Then, using Rao (1973), the asymptotic variance of g(β̃*, σ̃) may be written as

Var(g(β̃*, σ̃)) = (σ²/n) ω' K*^{-1} ω ,   ω = (ω0, ω1, ..., ω_{p+1})' ,    (3.46)

where

ωi = ∂g(β*, σ)/∂βi ,   i = 0, 1, ..., p ;   ω_{p+1} = ∂g(β*, σ)/∂σ ,    (3.47)

and K*^{-1} is the inverse of K* given by (3.42). Similarly, let g(β̂*, σ̂) be the MLE of g(β*, σ), with asymptotic variance

Var(g(β̂*, σ̂)) = (σ²/n) ω' I*^{-1} ω .    (3.48)

The ARE of g(β̃*, σ̃) relative to g(β̂*, σ̂) is given by

ARE[ g(β̃*, σ̃) : g(β̂*, σ̂) ] = (ω' I*^{-1} ω) / (ω' K*^{-1} ω) ,    (3.49)

where I* is given by (3.42).

To obtain the optimum spacing vector for the estimation of g(β*, σ), one has to maximize (3.49) with respect to Q0(λ1), ..., Q0(λk) for a fixed vector ω. However, the function g(β*, σ) is nonlinear in general and the vector ω depends on the parameters (β*', σ)', which are unknown. Therefore, we may apply the Courant–Fischer theorem (Rao, 1973) and find that

Chmin(K*I*^{-1}) ≤ ARE ≤ Chmax(K*I*^{-1}) .    (3.50)

Now, we follow Section 3.2 for the optimization problem to determine the optimum spacings. In this way one is able to estimate functions of (β*', σ)' such as (i) the conditional survival function and (ii) the conditional hazard function.
4. Trimmed least-squares estimation of regression parameters and its asymptotic distribution

In the previous section we considered the L-estimation of regression and scale parameters based on a few selected regression quantiles. These resulted in quick and robust estimators of β and σ against outliers for each distribution under consideration, while sacrificing very little asymptotic relative efficiency. The basic idea of the derivation was the generalized least squares principle. In this section we consider another L-estimator, namely the trimmed least squares estimator (TLSE) of β. Consider the regression model

Y = Dn β + Z ,   β = (β0, β1, ..., βp)' ∈ ℝ^{p+1} ,    (4.1)

where Z = (Z1, ..., Zn)' has components which are i.i.d. r.v.'s from a continuous cdf F0(z) (with pdf f0(z)). The least squares estimator (LSE) of β is given by

β̂n = (Dn'Dn)^{-1} Dn'Y ,    (4.2)

which is obtained by minimizing

(Y − Dnβ)'(Y − Dnβ)    (4.3)

with respect to β. For the trimmed least squares estimation (TLSE) of β, we first define an n × n diagonal matrix A whose elements are

a_ii = 0   if Yi < di'β̂n(λ1) or Yi ≥ di'β̂n(λ2) ,
a_ii = 1   otherwise ,    (4.4)

where β̂n(λ1) and β̂n(λ2) are the regression quantiles of orders λ1 and λ2 respectively, for 0 < λ1 < 1/2 < λ2 < 1. We obtain the TLSE of β by minimizing

(Y − Dnβ)' A (Y − Dnβ)    (4.5)
with respect to β, obtaining

β̂n(λ1, λ2) = (Dn'ADn)^{-1} Dn'AY .    (4.6)
Ruppert and Carroll (1980) studied the TLSE under the following conditions:

1. F0 has a continuous density f0 that is positive on the support of F0.
2. The first column of the design matrix is 1, i.e., d_{i1} = 1, i = 1, ..., n, and Σ_{i=1}^{n} d_{ij} = 0 for j = 2, ..., p + 1.
3. lim_{n→∞} ( max_{i,j} n^{-1/2} |d_{ij}| ) = 0.
4. There exists a positive definite Σ such that lim_{n→∞} n^{-1}(Dn'Dn) = Σ.

Jurečková (1984) considered the Bahadur representation and asymptotic properties of β̂n(λ1, λ2) under the following conditions:

(i) F0 is absolutely continuous with density function f0 such that f0(·) > 0 and f0'(·) is bounded in a neighborhood of F0^{-1}(λ).
(ii) max_{i,j} n^{-1/4} |d_{ij}| = O(1), as n → ∞.
(iii) lim_{n→∞} n^{-1}(Dn'Dn) = Σ, where Σ is a positive definite (p + 1) × (p + 1) matrix.
(iv) n^{-1} Σ_{i=1}^{n} d_{ij}^4 = O(1), as n → ∞, j = 1, ..., p + 1.
(v) lim_{n→∞} d̄_{nj} = d_j, j = 1, ..., p + 1.

We obtain the same results as in Ruppert and Carroll (1980) and Jurečková (1984) under weaker conditions on the design matrix Dn and the distribution function F0. We use the weighted empirical approach of Koul (1992), providing new proofs for the theorems and lemmas that follow. Before we state the main theorem for the TLSE we need an asymptotic linearity result for β̂n(λ). First, we consider the conditions A0, A1 and A2 from Section 2 (after Theorem 2.5) and define

β(λ) = β + Q0(λ) e1 ,   e1 = (1, 0, ..., 0)' .    (4.7)

Now, consider the weighted empirical process

Tn(t; λ) = n^{-1/2} Σ_{i=1}^{n} di { I(yi − di't ≤ 0) − λ } ,    (4.8)

which is basic for Theorem 4.1 given below.
which is basic for Theorem 4.1 given below. THEOREM 4.1. Under the conditions A0, A1 and A2 for each 2 C (0, 1) we have
253
On some L-estimation in linear regression models
//1/2(fln(2 ) -- [J(2)) ~ -{qo()@Sn}-l Tn(fl()O;,~)
q- op(1)
(4.9)
where
Tn(~(,~);,~, ) = / / - 1 / 2 ~ d i { l ( Z i=1
i ~ Q0(,)))_de}
.
(4.10)
We prove this Theorem with the aid of two Lemmas. LEMMA 4.1. Let the conditions of Theorem 4.1 hold. Then for each 0 < 2 < 1 and b > 0
sup Tn(fl(,~)q-//-1/2t;.)~)- Tn(fl(,~);,~)--q0(,~)~nt [Itll_
=Op(1) . (4.11)
PROOF. This lemma is a weaker version of Theorem 2.3.1 of Koul (1992), where the result is proved uniformly in λ when A0 is replaced by the assumption that f0 is uniformly continuous. Write Tn(t; λ) = (Tn0(t; λ), ..., Tnp(t; λ))'. We prove that the jth component satisfies

sup_{||t|| ≤ b} | Tnj(β(λ) + n^{-1/2}t; λ) − Tnj(β(λ); λ) − q0(λ) n^{-1} Σ_{i=1}^{n} dij di't | = op(1) ,    (4.12)

by considering the centered process

Tnj^0(t; λ) = Tnj(β(λ) + n^{-1/2}t; λ) − E[ Tnj(β(λ) + n^{-1/2}t; λ) ]    (4.13)

and the compact set

𝒦(b) = { t ∈ ℝ^{p+1} : ||t|| ≤ b } ,   b > 0 .    (4.14)

Then, the expression in (4.12) is bounded above by

sup_{||t|| ≤ b} | Tnj^0(t; λ) − Tnj^0(0; λ) | + sup_{||t|| ≤ b} | E[Tnj(β(λ) + n^{-1/2}t; λ)] − E[Tnj(β(λ); λ)] − q0(λ) n^{-1} Σ_{i=1}^{n} dij di't | = I1 + I2 .    (4.15)

First, we consider I2, where
I2 = sup_{||t|| ≤ b} | n^{-1/2} Σ_{i=1}^{n} dij { F0(Q0(λ) + n^{-1/2}di't) − F0(Q0(λ)) − q0(λ) n^{-1/2}di't } |    (4.16)

≤ n^{-1/2} Σ_{i=1}^{n} |dij| sup_{||t|| ≤ b} | F0(Q0(λ) + n^{-1/2}di't) − F0(Q0(λ)) − q0(λ) n^{-1/2}di't | .

By the mean value theorem there exists Δn(t) ∈ (0, n^{-1/2}di't) such that

I2 ≤ n^{-1/2} Σ_{i=1}^{n} |dij| sup_{||t|| ≤ b} |n^{-1/2}di't| | f0(Q0(λ) + Δn(t)) − f0(Q0(λ)) | ,    (4.17)

and by taking δn = n^{-1/2} max_i ||di|| b we have

I2 ≤ n^{-1} Σ_{i=1}^{n} |dij| ||di|| b sup_{Δn(t) ≤ δn} | f0(Q0(λ) + Δn(t)) − f0(Q0(λ)) | = o(1)    (4.18)

as n → ∞, by conditions A0 and A2.
Now, we prove that I1 = op(1), where

I1 = sup_{||t|| ≤ b} | Tnj^0(t; λ) − Tnj^0(0; λ) | .    (4.19)

First, we prove it for each t ∈ 𝒦(b):

Tnj^0(t; λ) − Tnj^0(0; λ) = n^{-1/2} Σ_{i=1}^{n} dij { I(Q0(λ) < Zi ≤ Q0(λ) + n^{-1/2}di't) − F0(Q0(λ) + n^{-1/2}di't) + F0(Q0(λ)) }
= n^{-1/2} Σ_{i=1}^{n} dij { Vi(t) − vni(t) } ,    (4.20)

where

Vi(t) = I(Q0(λ) < Zi ≤ Q0(λ) + n^{-1/2}di't) ,
vni(t) = F0(Q0(λ) + n^{-1/2}di't) − F0(Q0(λ)) .    (4.21)

Now, we apply the Chebyshev inequality to get the result:

P( | n^{-1/2} Σ_{i=1}^{n} dij { Vi(t) − vni(t) } | > ε )
≤ (1/(ε²n)) E( Σ_{i=1}^{n} dij { Vi(t) − vni(t) } )²
= (1/(ε²n)) Σ_{i=1}^{n} dij² E( Vi(t) − vni(t) )² ≤ ε   as n → ∞ ,    (4.22)

by conditions A0 and A2,
which implies I1 = op(1) for fixed t. For tightness, we use the compactness of the set 𝒦(b). Since 𝒦(b) is compact, it suffices to show that for each s ∈ 𝒦(b) and for every ε > 0, there is a δ > 0 such that

limsup_n P( sup_{||t−s|| ≤ δ} | Tnj^0(t; λ) − Tnj^0(s; λ) | > 2ε ) < ε .    (4.23)

We need the following relations to prove (4.23):

di't = di's + (di't − di's) ,    (4.24)

so that di't − di's = di'(t − s) and

|di't − di's| = |di'(t − s)| ≤ ||di|| ||t − s|| .    (4.25)

Thus,

di's − ||di||δ ≤ di't ≤ di's + ||di||δ .    (4.26)

Therefore, we have

| Tnj^0(t; λ) − Tnj^0(s; λ) |
≤ | n^{-1/2} Σ_{i=1}^{n} dij [ I(Zi ≤ Q0(λ) + n^{-1/2}di't) − I(Zi ≤ Q0(λ) + n^{-1/2}di's) ] |
+ n^{-1/2} Σ_{i=1}^{n} |dij| | F0(Q0(λ) + n^{-1/2}di't) − F0(Q0(λ) + n^{-1/2}di's) | .    (4.27)

Now, using the monotonicity of the indicator function together with the relation (4.26), we have the sandwich

I(Zi ≤ Q0(λ) + n^{-1/2}di's − n^{-1/2}||di||δ) ≤ I(Zi ≤ Q0(λ) + n^{-1/2}di't) ≤ I(Zi ≤ Q0(λ) + n^{-1/2}di's + n^{-1/2}||di||δ) ,    (4.28)

so that

| I(Zi ≤ Q0(λ) + n^{-1/2}di't) − I(Zi ≤ Q0(λ) + n^{-1/2}di's) | ≤ I(Zi ≤ Q0(λ) + n^{-1/2}di's + n^{-1/2}||di||δ) − I(Zi ≤ Q0(λ) + n^{-1/2}di's − n^{-1/2}||di||δ) ,    (4.29)

and for the second term we have

| F0(Q0(λ) + n^{-1/2}di't) − F0(Q0(λ) + n^{-1/2}di's) | ≤ F0(Q0(λ) + n^{-1/2}di's + n^{-1/2}||di||δ) − F0(Q0(λ) + n^{-1/2}di's − n^{-1/2}||di||δ) .    (4.30)

If we substitute (4.29) and (4.30) in (4.27), and then add and subtract the corresponding means, we obtain a bound of the form I11 + I12, where I11 is a centered sum of the same type treated above for fixed t and I12 is the sum of the F0-increments in (4.30). Neither term depends on t any more, and s is fixed. The same argument as in the proof of I1 for fixed t gives I11 = op(1), and by conditions A0 and A2, I12 < ε for sufficiently small δ, which completes the proof of Lemma 4.1.  □
The next lemma shows that the regression quantiles are bounded in probability.

LEMMA 4.2. Under the conditions of Theorem 4.1,

n^{1/2}(β̂n(λ) − β(λ)) = Op(1) .    (4.31)

We need the following steps to prove this lemma:

(i)  Tnj(β̂n(λ); λ) = n^{-1/2} Σ_{i=1}^{n} dij { I(yi − di'β̂n(λ) ≤ 0) − λ } = op(1) ;    (4.32)

(ii)  Tnj(β(λ); λ) = Op(1) ;    (4.33)

(iii)  for all ε > 0 and 0 < u < ∞ there exist b (= bε) and an N (= Nε) such that for each λ ∈ (0, 1),

P( inf_{||t|| > b} || Tn(β(λ) + n^{-1/2}t; λ) || > u ) > 1 − ε .    (4.34)

PROOF of (i). We prove this step via Koul and Saleh (1995). By assumption A0, F0 is continuous; hence β̂n(λ) is the unique solution of the minimization problem. Now, use of Theorem 3.3 of Koenker and Bassett (1978) yields, after writing sgn(x) ≡ 1 − 2I(x ≤ 0) + I(x = 0), for each λ ∈ (0, 1) with probability one (w.p.1),

(λ − 1) 1'_{p+1} ≤ Σ_{i∈h^c} di' { I(yi − di'β̂n(λ) ≤ 0) − λ } Dn^{-1}(h) ≤ λ 1'_{p+1} ,    (4.35)

where Dn(h) is defined in Section 2. Let wn'(λ) denote the vector inside the inequalities, i.e.,

(λ − 1) 1'_{p+1} ≤ wn'(λ) ≤ λ 1'_{p+1} .    (4.36)

Also, for each i ∈ h, by Theorem 2.1 we have

di'β̂n(λ) = di' Dn^{-1}(h) Y(h) = (0, ..., 1, ..., 0) Y(h) = yi   (with the 1 in the ith place) ,    (4.37)

which implies that w.p.1

I(yi − di'β̂n(λ) = 0) = 0   for all i ∈ h^c .    (4.38)

Then, w.p.1 we have
n^{-1/2} Σ_{i=1}^{n} di { I(yi − di'β̂n(λ) ≤ 0) − λ } − n^{-1/2} Σ_{i∈h} di { I(yi − di'β̂n(λ) ≤ 0) − λ } = n^{-1/2} Dn'(h) wn(λ) .    (4.39)

Note that (4.38) implies w.p.1

I(yi − di'β̂n(λ) ≤ 0) = 1   for all i ∈ h .    (4.40)

Therefore, by the definition of Tn(β̂n(λ); λ), we have

Tn(β̂n(λ); λ) − n^{-1/2} Σ_{i∈h} di (1 − λ) = n^{-1/2} Dn'(h) wn(λ) ,    (4.41)

which implies that

|| Tn(β̂n(λ); λ) || ≤ n^{-1/2} Σ_{i∈h} ||di|| (1 − λ) + n^{-1/2} || Dn'(h) wn(λ) || .    (4.42)

From (4.36), we further obtain

|| Dn'(h) wn(λ) || ≤ || Dn'(h) 1_{p+1} || ,    (4.43)

where

Dn'(h) 1_{p+1} = ( Σ_{j∈h} d_{0j}, ..., Σ_{j∈h} d_{pj} )' .    (4.44)

But each component is bounded above by max_i ||di|| √(p+1), which implies

n^{-1/2} || Dn'(h) 1_{p+1} || ≤ n^{-1/2} max_i ||di|| (p + 1) .    (4.45)

Also we have

n^{-1/2} Σ_{i∈h} ||di|| (1 − λ) ≤ n^{-1/2} max_i ||di|| (p + 1) .    (4.46)

Thus, we get w.p.1

|| Tn(β̂n(λ); λ) || ≤ 2(p + 1) n^{-1/2} max_i ||di|| ,    (4.47)

and the R.H.S. converges to zero by A1. Now, we prove step (ii).

PROOF of (ii). To show (4.33), we look at the asymptotic distribution of Tnj(β(λ); λ). Since the Zi's are i.i.d., we use the Hájek–Šidák C.L.T., Theorem 3.3.6 of Sen and Singer (1993). For this we check its condition, which is satisfied by conditions A1 and A2, i.e.,

( max_i d_{ij}² ) / ( Σ_{i=1}^{n} d_{ij}² ) → 0   as n → ∞ ,    (4.48)

which implies that for each λ ∈ (0, 1)

n^{-1/2} Σ_{i=1}^{n} dij { I(Zi ≤ Q0(λ)) − λ } / [ λ(1 − λ) n^{-1} Σ_{i=1}^{n} dij² ]^{1/2} →d N(0, 1) .    (4.49)

Therefore, this property implies that for every ε > 0 there exists an Mε such that

P( | Tnj(β(λ); λ) | ≤ Mε ) > 1 − 2ε/3 .    (4.50)
We apply this result to prove step (iii).

PROOF of (iii). We prove this part via Koul and Saleh (1995). First, for simplicity define the process

Tn*(t; λ) = Tn(β(λ) + n^{-1/2}t; λ) .    (4.51)

Now, every t ∈ ℝ^{p+1} with ||t|| > b can be written as t = rθ, |r| > b, ||θ|| = 1. Also note that Tn*(rθ; λ)·θ is a nondecreasing function of r for every θ ∈ ℝ^{p+1}. Therefore, the Cauchy–Schwarz inequality implies for each λ ∈ (0, 1)

inf_{||t|| > b} || Tn*(t; λ) || ≥ inf_{|r| > b, ||θ|| = 1} | Tn*(rθ; λ)·θ | ≥ inf_{|r| = b, ||θ|| = 1} | Tn*(rθ; λ)·θ | .    (4.52)

Now, let

T̄n*(r, θ; λ) = Tn*(0; λ)·θ + q0(λ) r θ' Σn θ    (4.53)

and

kn = inf{ θ'Σnθ : ||θ|| = 1 } ,   k = lim kn = inf{ θ'Σθ : ||θ|| = 1 } ,
Aε = [ (1 − ε)k ≤ kn ≤ (1 + ε)k ] .    (4.54)

Since Σ is positive definite we have k > 0. This, with the result of step (ii), implies that for every ε > 0 there exist Mε, N1ε and N2ε such that

P(Aε) = P( |kn − k| ≤ εk ) ≥ 1 − ε/3 ,   for all n ≥ N1ε ,    (4.55)

and

P( sup_{||θ|| = 1} | Tn*(0; λ)·θ | ≤ Mε ) ≥ 1 − ε/3 ,   for all n ≥ N2ε .    (4.56)

Also, by Lemma 4.1 there exists an N3ε such that

P( inf_{|r| = b, ||θ|| = 1} | Tn*(rθ; λ)·θ | > u ) ≥ P( inf_{|r| = b, ||θ|| = 1} | T̄n*(r, θ; λ) | > u ) − ε/3 ,   n ≥ N3ε .    (4.57)

Now, we use the fact that |d| − |c| ≤ |d + c|, d, c ∈ ℝ. Then, we have

P( inf_{|r| = b, ||θ|| = 1} | T̄n*(r, θ; λ) | > u )
≥ P( q0(λ) b kn − | Tn*(0; λ)·θ | > u for all ||θ|| = 1 )
≥ P( sup_{||θ|| = 1} | Tn*(0; λ)·θ | ≤ −u + b k(1 − ε) q0(λ) ; Aε )
≥ 1 − (2ε/3) ,   for all n ≥ Nε = N1ε ∨ N2ε ∨ N3ε ,    (4.58)

by (4.55) and (4.56), as long as b ≥ (Mε + u)/(k(1 − ε)q0(λ)), which completes the proof of step (iii). Now, we use the results (4.32) and (4.34) to prove the lemma, similarly to Jurečková (1984), by defining

t = n^{1/2}(β̂n(λ) − β(λ)) .    (4.59)

Thus, for all ε > 0 and 0 < u < ∞ there exist b (= bε) > 0 and Nε so that

P( || n^{1/2}(β̂n(λ) − β(λ)) || > b ) ≤ P( inf_{||t|| > b} || Tn(β(λ) + n^{-1/2}t; λ) || ≤ u ) + P( || Tn(β̂n(λ); λ) || ≥ u ) ≤ ε/2 + ε/2 = ε .    (4.60)

Therefore, the proof of Lemma 4.2 is complete.  □
PROOF OF THEOREM 4.1. It follows from Lemma 4.1 that

Tn(β(λ) + n^{-1/2}tn; λ) − Tn(β(λ); λ) − q0(λ) Σn tn = op(1)    (4.61)

for every sequence of random vectors tn such that ||tn|| = Op(1). Hence, substituting tn = n^{1/2}(β̂n(λ) − β(λ)), which is Op(1) by Lemma 4.2, we have

Tn(β̂n(λ); λ) − Tn(β(λ); λ) − q0(λ) Σn [ n^{1/2}(β̂n(λ) − β(λ)) ] = op(1) .    (4.62)–(4.63)

But, by (4.32), Tn(β̂n(λ); λ) = op(1). Therefore, the proof of Theorem 4.1 is complete.  □

The following corollaries give the asymptotic distribution of the RQ's.

COROLLARY 4.1. Assume the conditions of Theorem 4.1 are satisfied. Then

n^{1/2}(β̂n(λ) − β(λ)) →d N_{p+1}( 0, λ(1 − λ) q0^{-2}(λ) Σ^{-1} ) .    (4.64)

COROLLARY 4.2. Let the conditions A0–A2 hold. Then, for each λ1, ..., λk such that 0 < λ1 < ... < λk < 1, the asymptotic joint distribution of the regression quantiles is given by

[ n^{1/2}(β̂n(λ1) − β(λ1))', ..., n^{1/2}(β̂n(λk) − β(λk))' ]' →d N_{k(p+1)}( 0, Ω ⊗ Σ^{-1} ) ,    (4.65)

where Ω = [ω_ij] and ω_ij = λi(1 − λj) / (q0(λi) q0(λj)), i ≤ j.

PROOF. By Theorem 4.1 we write

n^{1/2}(β̂n(λj) − β(λj)) = −{ q0(λj) Σn }^{-1} Tn(β(λj); λj) + op(1) ,   j = 1, ..., k ,    (4.66)

and then by Corollary 4.1 and the Cramér–Wold device we get the result.  □

Now, we state the main theorem for the TLSE, similar to Jurečková (1984).
THEOREM 4.2. Assume that the conditions A0, A1 and A2 hold. Then, for each λ1 and λ2 such that 0 < λ1 < 1/2 < λ2 < 1,

(i)  n^{1/2}(β̂n(λ1, λ2) − β) = n^{-1/2} { (λ2 − λ1) Σn }^{-1} Σ_{i=1}^{n} di ( ψ(Zi) − γ ) + op(1) ,    (4.67)

(ii)  n^{1/2}(β̂n(λ1, λ2) − β) →d N_{p+1}( 0, σ²(λ1, λ2) Σ^{-1} ) ,    (4.68)

where ψ(z), γ and σ²(λ1, λ2) are defined as

ψ(z) = Q0(λ1)   if z < Q0(λ1) ,
ψ(z) = z        if Q0(λ1) < z ≤ Q0(λ2) ,    (4.69)
ψ(z) = Q0(λ2)   if z > Q0(λ2) ,

γ = λ1 Q0(λ1) + (1 − λ2) Q0(λ2)

and

σ²(λ1, λ2) = (λ2 − λ1)^{-2} { ∫_{λ1}^{λ2} (Q0(u) − δ0)² du + λ1 (Q0(λ1) − δ0)² + (1 − λ2) (Q0(λ2) − δ0)²
− [ λ1 (Q0(λ1) − δ0) + (1 − λ2) (Q0(λ2) − δ0) ]² } ,    (4.70)

with

δ0 = (λ2 − λ1)^{-1} ∫_{λ1}^{λ2} Q0(u) du .
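The variance (4.70) is easy to evaluate numerically. The sketch below — our own illustration, not from the text — does so for standard normal errors using the standard library's Φ⁻¹ and a midpoint Riemann sum; as the trimming proportions shrink, σ²(λ1, λ2) approaches Var(Z) = 1, so the TLSE loses little efficiency at the normal model for mild trimming.

```python
# Sketch: evaluating the TLSE asymptotic variance (4.70) for standard
# normal errors.  Q0 = Φ⁻¹ comes from the standard library; the integral
# is a midpoint Riemann sum.  Function names are ours.
from statistics import NormalDist

Q0 = NormalDist().inv_cdf

def tlse_avar(lam1, lam2, grid=20000):
    w = lam2 - lam1
    h = w / grid
    us = [lam1 + (j + 0.5) * h for j in range(grid)]
    delta0 = sum(Q0(u) for u in us) * h / w               # δ0 in (4.70)
    integral = sum((Q0(u) - delta0)**2 for u in us) * h
    t1 = lam1 * (Q0(lam1) - delta0)**2
    t2 = (1 - lam2) * (Q0(lam2) - delta0)**2
    cross = (lam1 * (Q0(lam1) - delta0) + (1 - lam2) * (Q0(lam2) - delta0))**2
    return (integral + t1 + t2 - cross) / w**2

print(round(tlse_avar(0.10, 0.90), 3))   # ≈ 1.06: mild efficiency loss
print(round(tlse_avar(0.01, 0.99), 3))   # ≈ 1.004: almost no trimming
```

For heavy-tailed errors such as the Cauchy the comparison reverses, and trimming yields a finite asymptotic variance where the LSE has none.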
PROOF. This theorem will also be proved with the aid of two lemmas. First, we define the processes

T̄n(t; λ) = n^{-1/2} Σ_{i=1}^{n} di Zi I(Zi ≤ Q0(λ) + n^{-1/2}di't)    (4.71)

and

T̄n^0(t; λ) = n^{-1/2} Σ_{i=1}^{n} di { Zi I(Zi ≤ Q0(λ) + n^{-1/2}di't) − μ(Q0(λ) + n^{-1/2}di't) } ,    (4.72)

where μ(·) is defined by

μ(Q0(λ) + n^{-1/2}a) = E[ Zi I(Zi ≤ Q0(λ) + n^{-1/2}a) ]    (4.73)

and μ'(u) = u f0(u).

LEMMA 4.3. Under the conditions of Theorem 4.2, for each b > 0,

sup_{||t|| ≤ b} || T̄n(t; λ) − T̄n(0; λ) − q0(λ) Q0(λ) Σn t || = op(1) .    (4.74)
PROOF. Let T̄nj(t; λ), j = 0, ..., p, denote the jth component of T̄n(t; λ). Then we have

sup_{||t|| ≤ b} | T̄nj(t; λ) − T̄nj(0; λ) − q0(λ) Q0(λ) n^{-1} Σ_{i=1}^{n} dij di't |
≤ sup_{||t|| ≤ b} | T̄nj^0(t; λ) − T̄nj^0(0; λ) | + sup_{||t|| ≤ b} | E[T̄nj(t; λ)] − E[T̄nj(0; λ)] − q0(λ) Q0(λ) n^{-1} Σ_{i=1}^{n} dij di't | = J1 + J2 .    (4.75)

First, note that

J2 ≤ n^{-1/2} Σ_{i=1}^{n} |dij| sup_{||t|| ≤ b} | μ(Q0(λ) + n^{-1/2}di't) − μ(Q0(λ)) − q0(λ) Q0(λ) n^{-1/2}di't | .    (4.76)

Then, by the mean value theorem there is a Δ(t) ∈ (0, n^{-1/2}di't) such that

J2 ≤ b n^{-1} Σ_{i=1}^{n} |dij| ||di|| sup_{||t|| ≤ b} | (Q0(λ) + Δ(t)) f0(Q0(λ) + Δ(t)) − Q0(λ) f0(Q0(λ)) | = o(1) ,    (4.77)

since ||t|| ≤ b implies Δ(t) ≤ δn = n^{-1/2} max_i ||di|| b; conditions A0 and A2 then give the result. To prove Lemma 4.3 it is enough to show that J1 = op(1), where

J1 = sup_{||t|| ≤ b} | T̄nj^0(t; λ) − T̄nj^0(0; λ) | .    (4.78)

First, we consider, for each t ∈ 𝒦(b),

T̄nj^0(t; λ) − T̄nj^0(0; λ) = n^{-1/2} Σ_{i=1}^{n} dij { Zi I(Q0(λ) < Zi ≤ Q0(λ) + n^{-1/2}di't) − [ μ(Q0(λ) + n^{-1/2}di't) − μ(Q0(λ)) ] }
= n^{-1/2} Σ_{i=1}^{n} dij { V̄i(t) − v̄ni(t) } ,    (4.79)
where

V̄i(t) = Zi I(Q0(λ) < Zi ≤ Q0(λ) + n^{-1/2}di't) ,
v̄ni(t) = μ(Q0(λ) + n^{-1/2}di't) − μ(Q0(λ)) .    (4.80)

Now, we apply the Chebyshev inequality to show that (4.79) goes to zero in probability:
P( | n^{-1/2} Σ_{i=1}^{n} dij { V̄i(t) − v̄ni(t) } | > ε )
≤ (1/(ε²n)) E( Σ_{i=1}^{n} dij { V̄i(t) − v̄ni(t) } )²
= (1/(ε²n)) Σ_{i=1}^{n} dij² E( V̄i(t) − v̄ni(t) )² ≤ ε   as n → ∞ ,    (4.81)

by conditions A0 and A2. Therefore J1 = op(1) for fixed t. For the tightness, we follow a similar argument as in expression (4.23) and use the compactness of 𝒦(b). So, it is necessary to show that for every ε > 0 there is a δ > 0 such that for every s ∈ 𝒦(b),

limsup_n P( sup_{||t−s|| ≤ δ} | T̄nj^0(t; λ) − T̄nj^0(s; λ) | > 2ε ) < ε .    (4.82)

In this case we have

| T̄nj^0(t; λ) − T̄nj^0(s; λ) |
≤ n^{-1/2} Σ_{i=1}^{n} |dij Zi| | I(Zi ≤ Q0(λ) + n^{-1/2}di't) − I(Zi ≤ Q0(λ) + n^{-1/2}di's) |
+ n^{-1/2} Σ_{i=1}^{n} |dij| | μ(Q0(λ) + n^{-1/2}di't) − μ(Q0(λ) + n^{-1/2}di's) | .    (4.83)

By inequality (4.29) the above expression is bounded above by

n^{-1/2} Σ_{i=1}^{n} |dij Zi| { I(Zi ≤ Q0(λ) + n^{-1/2}di's + n^{-1/2}||di||δ) − I(Zi ≤ Q0(λ) + n^{-1/2}di's − n^{-1/2}||di||δ) }
+ n^{-1/2} Σ_{i=1}^{n} |dij| | μ(Q0(λ) + n^{-1/2}di't) − μ(Q0(λ) + n^{-1/2}di's) | = J11 + J12 .    (4.84)
By the mean value theorem and the same argument as in (4.77), we have

sup_{||t−s|| ≤ δ} J12 = o(1) .    (4.85)

To show sup_{||t−s|| ≤ δ} J11 = op(1), we note that J11 does not depend on t, and so it is enough to show that J11 = op(1). For this we center J11 to obtain

J11 = n^{-1/2} Σ_{i=1}^{n} |dij| { [ |Zi| I(Zi ≤ Q0(λ) + n^{-1/2}di's + n^{-1/2}||di||δ) − μ*(Q0(λ) + n^{-1/2}di's + n^{-1/2}||di||δ) ]
− [ |Zi| I(Zi ≤ Q0(λ) + n^{-1/2}di's − n^{-1/2}||di||δ) − μ*(Q0(λ) + n^{-1/2}di's − n^{-1/2}||di||δ) ] }
+ n^{-1/2} Σ_{i=1}^{n} |dij| { μ*(Q0(λ) + n^{-1/2}di's + n^{-1/2}||di||δ) − μ*(Q0(λ) + n^{-1/2}di's − n^{-1/2}||di||δ) } .    (4.86)

By the same argument as we used for fixed t in J1, the first term on the R.H.S. of (4.86) goes to zero in probability. The second term is also less than ε by continuity of μ* for sufficiently small δ. Therefore, the proof of the lemma is complete.  □

Before we state the next lemma it is necessary to introduce the following processes, namely
Ujk(t) = n^{-1} Σ_{i=1}^{n} dij dik I(Zi ≤ Q0(λ) + n^{-1/2}di't)    (4.87)

and

Ujk^0(t) = Ujk(t) − E[Ujk(t)] .    (4.88)

Then,

LEMMA 4.4. Under the conditions of Theorem 4.2, for 0 < b < ∞ and λ ∈ (0, 1),

sup_{||t|| ≤ b} | Ujk(t) − λ σjk | →p 0 ,    (4.89)

where σjk is the (j, k)th element of Σ.
PROOF. First, we prove this lemma for each t. Then,
=
Z dijaij Zi <_ Qo(2) + n I/2di't -
i=1
= n-l~= 1 dijdik{I(Zi <_ Qo(2)+n l/2ditt ) + F0 ( 0 0 ( 2 ) +
-Fo(Qo(2)+n-1/2ditt )
n-1/Zdi't) } -ajkFo(Oo(2))
~ // 1~i=1dijdik{I(Zi ~ O°(A)-~-/'/-1/24¢t) -F°(O°(/~)~-n-1/2ditt)} +
F/-1~i=1dijdikFo(Oo(2) + n
'/2ai't ) - ajkFo(Oo(2))
= 11 +/2 . First we show that 11 =
P
(4.90)
op(1)
that is to say that for every e > 0, we have
~ dijdik{I(Z i ~ Qo(I~)--~l/l-1/2ditt)-Fo(O0(~ ) -~FI '/2dirt)}
~ c2
< LLE~ e2 n 2 ~,~i=1dijdik{l(Zi < O°()~)+n-l/2ditt) - F°(O°()O +n-1/2d/t)}} 2 -
-
_
-
-
11 e2 n 2 +/i=1~dZijdZE{i(Zi < Oo(2)+n_l/Zdi, t) _
1, -- 6_2n 2 Z_~ ~iJ~ik < -- -'~ 1
"
< L--V'~'42"/2
i=1
<easn--+oo
F0 (Q0()0
+n-l/Zdi 't)
}2
n
n-I/Zmax]dijlJ i,j / "n-l~-~d2z.a ik
i=I
byA1 a n d A 2 . (4.91
Continuity of \(F_0\) together with the C–S inequality and condition A2 implies that \(I_2 = o(1)\). Therefore, we have the result for fixed \(t\). For tightness of \(I_1\) we follow the proof of Lemma 4.3. Next we consider \(I_2\) uniformly in \(t\). For this we have
\[
\sup_{\|t\|\le b} |I_2|
 = \sup_{\|t\|\le b}\Big| n^{-1}\sum_{i=1}^n d_{ij} d_{ik}\, F_0(Q_0(\lambda)+n^{-1/2}d_i't) - \sigma_{jk} F_0(Q_0(\lambda)) \Big|
\]
\[
 \le n^{-1}\sum_{i=1}^n |d_{ij} d_{ik}|\, \sup_{\|t\|\le b}\big|F_0(Q_0(\lambda)+n^{-1/2}d_i't) - F_0(Q_0(\lambda))\big|
 + \Big| n^{-1}\sum_{i=1}^n d_{ij} d_{ik} - \sigma_{jk}\Big|\, F_0(Q_0(\lambda))
\]
\[
 \le \Big(n^{-1}\sum_{i=1}^n d_{ij}^2\Big)^{1/2}\Big(n^{-1}\sum_{i=1}^n d_{ik}^2\Big)^{1/2} \cdot o(1) + o(1)
 < \varepsilon \quad \text{as } n \to \infty ,
\]
by the C–S inequality and A0 and A2. \hfill (4.92)
Now we change the process \(U_{jk}(\cdot)\) to
\[
U^{*}_{jk}(t) = n^{-1}\sum_{i=1}^n d_{ij} d_{ik}\, I\big(Q_0(\lambda_1)+n^{-1/2}d_i't < Z_i \le Q_0(\lambda_2)+n^{-1/2}d_i't\big) \tag{4.93}
\]
for each \(\lambda_1, \lambda_2\) with \(0 < \lambda_1 < \lambda_2 < 1\). Then, we have
\[
\sup_{\|t\|\le b} \big|U^{*}_{jk}(t) - (\lambda_2-\lambda_1)\sigma_{jk}\big| = o_p(1) . \tag{4.94}
\]
We apply (4.94) to prove Theorem 4.2.

PROOF OF THEOREM 4.2. (i) By using the result of Lemma 4.3 in matrix form we have
\[
n^{-1} D_n' A D_n = n^{-1}\sum_{i=1}^n d_i d_i'\, I\big(Q_0(\lambda_1)+n^{-1/2}d_i't_1 < Z_i \le Q_0(\lambda_2)+n^{-1/2}d_i't_2\big)
 = (\lambda_2-\lambda_1)\Sigma + o_p(1) \quad \text{as } n \to \infty . \tag{4.95}
\]
By the structure of the matrix \(A\),
\[
n^{-1/2} D_n' A Z = n^{-1/2}\sum_{i=1}^n d_i \big\{ I\big(Z_i \le Q_0(\lambda_2)+d_i'(\beta_n(\lambda_2)-\beta(\lambda_2))\big)
 - I\big(Z_i \le Q_0(\lambda_1)+d_i'(\beta_n(\lambda_1)-\beta(\lambda_1))\big) \big\} Z_i
 = \tilde T_n(t_2;\lambda_2) - \tilde T_n(t_1;\lambda_1) , \tag{4.96}
\]
where
\[
t_i = n^{1/2}(\beta_n(\lambda_i) - \beta(\lambda_i)) , \quad i = 1,2 . \tag{4.97}
\]
But,
\[
\tilde T_n(t_2;\lambda_2) - \tilde T_n(t_1;\lambda_1)
 = [\tilde T_n(t_2;\lambda_2) - \tilde T_n(0;\lambda_2) - q_0(\lambda_2)Q_0(\lambda_2)\Sigma_n t_2]
 - [\tilde T_n(t_1;\lambda_1) - \tilde T_n(0;\lambda_1) - q_0(\lambda_1)Q_0(\lambda_1)\Sigma_n t_1]
\]
\[
 \qquad + [\tilde T_n(0;\lambda_2) - \tilde T_n(0;\lambda_1) + q_0(\lambda_2)Q_0(\lambda_2)\Sigma_n t_2 - q_0(\lambda_1)Q_0(\lambda_1)\Sigma_n t_1] . \tag{4.98}
\]
By Lemma 4.2 the first and second terms of (4.98) are \(o_p(1)\). Also, by substituting \(t_i\) with its representation given in Theorem 4.1 we have
\[
n^{-1/2} D_n' A Z = T_n(\beta(\lambda_1);\lambda_1)Q_0(\lambda_1) - T_n(\beta(\lambda_2);\lambda_2)Q_0(\lambda_2)
 + n^{-1/2}\sum_{i=1}^n d_i Z_i\, I\big(Q_0(\lambda_1) < Z_i \le Q_0(\lambda_2)\big) + o_p(1) , \tag{4.99}
\]
and finally, by the definition of \(T_n(\beta(\lambda);\lambda)\) and using the fact that \(I(Z_i \le a) = 1 - I(Z_i > a)\), we have
\[
n^{-1/2} D_n' A Z = n^{-1/2}\sum_{i=1}^n d_i \big\{ Q_0(\lambda_1) I(Z_i \le Q_0(\lambda_1)) + Z_i I(Q_0(\lambda_1) < Z_i \le Q_0(\lambda_2)) + Q_0(\lambda_2) I(Z_i > Q_0(\lambda_2)) - \lambda_1 Q_0(\lambda_1) - (1-\lambda_2)Q_0(\lambda_2) \big\} + o_p(1)
\]
\[
 = n^{-1/2}\sum_{i=1}^n d_i \{\phi(Z_i) - \bar\gamma\} + o_p(1) , \tag{4.100}
\]
where \(\phi(\cdot)\) and \(\bar\gamma\) are given in (4.69). Finally we have
\[
(n^{-1} D_n' A D_n)\, n^{1/2}(\hat\beta_n(\lambda_1,\lambda_2) - \beta) = n^{-1/2} D_n' A Z , \tag{4.101}
\]
or
\[
n^{1/2}(\hat\beta_n(\lambda_1,\lambda_2) - \beta) = \big[(\lambda_2-\lambda_1)\Sigma + o_p(1)\big]^{-1} n^{-1/2} D_n' A Z . \tag{4.102}
\]
As a result,
\[
n^{1/2}(\hat\beta_n(\lambda_1,\lambda_2) - \beta) = (\lambda_2-\lambda_1)^{-1}\Sigma^{-1}\, n^{-1/2}\sum_{i=1}^n d_i\{\phi(Z_i) - \bar\gamma\} + o_p(1) . \tag{4.103}
\]
Therefore, the proof of the first part of Theorem 4.2 is complete.

PROOF OF THEOREM 4.2. (ii) The result of the second part is given by using the result of part (i), considering \(F_0\) symmetric with \(\lambda_1 = 1 - \lambda_2\). In this case \(\bar\gamma = 0\) and \(E(\phi(Z_i)) = 0\). Thus, by a similar argument as before, we have
\[
\frac{n^{-1/2}\sum_{i=1}^n d_{ij}\,\phi(Z_i)}{\big[\sigma^2(\lambda_1,\lambda_2)\, n^{-1}\sum_{i=1}^n d_{ij}^2\big]^{1/2}} \xrightarrow{d} \mathcal N(0,1) , \tag{4.104}
\]
and the proof of the second part is complete. \(\square\)
As we saw in Theorem 4.2, the TLSE converges to a normal distribution with covariance matrix \(\sigma^2(\lambda_1,\lambda_2)\Sigma^{-1}\). Here we define a consistent estimator of \(\sigma^2(\lambda_1,\lambda_2)\). Let \(Y_1, Y_2, \ldots, Y_n\) be a sample from the regression model (4.1). Then the estimator of \(\sigma^2(\lambda_1,\lambda_2)\) is given by
\[
s_n^2(\lambda_1,\lambda_2) = (\lambda_2-\lambda_1)^{-2}\Big\{ (n-1)^{-1}\sum_{i=1}^n (Y_i - \bar a_n)^2\, I\big(d_i'\hat\beta_n(\lambda_1) < Y_i \le d_i'\hat\beta_n(\lambda_2)\big)
\]
\[
 \qquad + \lambda_1(Q_n(\lambda_1) - \bar a_n)^2 + (1-\lambda_2)(Q_n(\lambda_2) - \bar a_n)^2
 - \big[\lambda_1(Q_n(\lambda_1) - \bar a_n) + (1-\lambda_2)(Q_n(\lambda_2) - \bar a_n)\big]^2 \Big\} , \tag{4.105}
\]
where
\[
\bar a_n = (\lambda_2-\lambda_1)^{-1}\, n^{-1}\sum_{i=1}^n Y_i\, I\big(d_i'\hat\beta_n(\lambda_1) < Y_i \le d_i'\hat\beta_n(\lambda_2)\big)
 = c\, n^{-1}\sum_{i=1}^n Y_i\, I\big(Q_0(\lambda_1)+n^{-1/2}d_i't_1 < Z_i \le Q_0(\lambda_2)+n^{-1/2}d_i't_2\big) , \tag{4.106}
\]
with \(c = (\lambda_2-\lambda_1)^{-1}\) and \(t_i = n^{1/2}(\hat\beta_n(\lambda_i) - \beta(\lambda_i))\).
The following theorem states the consistency of \(s_n^2(\lambda_1,\lambda_2)\).

THEOREM 4.3. Let the conditions of Theorem 4.2 hold. Then,
\[
s_n^2(\lambda_1,\lambda_2) \xrightarrow{P} \sigma^2(\lambda_1,\lambda_2) . \tag{4.107}
\]
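In the single-intercept (location) case the regression quantiles in (4.105)–(4.106) reduce to ordinary sample quantiles, so the variance estimator can be sketched directly. The following minimal Python sketch is ours, not the chapter's; the function name and the empirical-quantile convention are assumptions:

```python
# A location-model sketch of the variance estimator s^2(lam1, lam2) of
# (4.105)-(4.106): with a single intercept column the regression quantiles
# reduce to ordinary sample quantiles.
def trimmed_sigma2(y, lam1, lam2):
    n = len(y)
    ys = sorted(y)
    # empirical quantiles standing in for Q_n(lam1), Q_n(lam2)
    q1 = ys[int(lam1 * n)]
    q2 = ys[min(int(lam2 * n), n - 1)]
    kept = [v for v in y if q1 < v <= q2]
    c = 1.0 / (lam2 - lam1)
    a_bar = c * sum(kept) / n                      # trimmed mean a_bar_n of (4.106)
    s_core = sum((v - a_bar) ** 2 for v in kept) / (n - 1)
    t1, t2 = lam1 * (q1 - a_bar), (1 - lam2) * (q2 - a_bar)
    # assemble (4.105): core sum plus the two tail terms minus the squared bias term
    return c ** 2 * (s_core + t1 * (q1 - a_bar) + t2 * (q2 - a_bar) - (t1 + t2) ** 2)
```

By convexity the tail terms dominate the squared bias term, so the estimate is non-negative whenever any observations are kept.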
PROOF. To prove this theorem, first we shall show that
\[
\bar a_n \xrightarrow{P} \bar a , \tag{4.108}
\]
where
\[
\bar a = (\lambda_2-\lambda_1)^{-1}\int_{Q_0(\lambda_1)}^{Q_0(\lambda_2)} y\, dF(y) = d'\beta + \bar a_0 , \tag{4.109}
\]
and \(\bar a_0\) is given by
\[
\bar a_0 = (\lambda_2-\lambda_1)^{-1}\int_{\lambda_1}^{\lambda_2} Q_0(u)\, du . \tag{4.110}
\]
Equivalently, it is necessary to show that
\[
\bar a_{0n} \xrightarrow{P} \bar a_0 , \tag{4.111}
\]
where
\[
\bar a_{0n} = c\, n^{-1}\sum_{i=1}^n Z_i\, I\big(Q_0(\lambda_1)+n^{-1/2}d_i't_1 < Z_i \le Q_0(\lambda_2)+n^{-1/2}d_i't_2\big) . \tag{4.112}
\]
For this reason we note
\[
|\bar a_{0n} - \bar a_0|
 \le c\,\Big| n^{-1}\sum_{i=1}^n Z_i\, I(Z_i \le Q_0(\lambda_1)+n^{-1/2}d_i't_1) - \nu(Q_0(\lambda_1))\Big|
 + c\,\Big| n^{-1}\sum_{i=1}^n Z_i\, I(Z_i \le Q_0(\lambda_2)+n^{-1/2}d_i't_2) - \nu(Q_0(\lambda_2))\Big|
 = c(I_1 + I_2) , \tag{4.113}
\]
where
\[
\nu(u) = \int_{-\infty}^{u} z\, dF_0(z) . \tag{4.114}
\]
So, we have to show that \(I_1 = o_p(1) = I_2\), where
\[
I_1 \le \Big| n^{-1}\sum_{i=1}^n \big\{ Z_i\, I(Z_i \le Q_0(\lambda_1)+n^{-1/2}d_i't_1) - \nu(Q_0(\lambda_1)+n^{-1/2}d_i't_1)\big\}\Big|
 + \Big| n^{-1}\sum_{i=1}^n \nu(Q_0(\lambda_1)+n^{-1/2}d_i't_1) - \nu(Q_0(\lambda_1))\Big| = I_{11} + I_{12} . \tag{4.115}
\]
In this case we may write \(I_{11} = |I^{*}_{11}|\), where
\[
I^{*}_{11} = n^{-1}\sum_{i=1}^n \big\{ Z_i\, I(Z_i \le Q_0(\lambda_1)+n^{-1/2}d_i't_1) - \nu(Q_0(\lambda_1)+n^{-1/2}d_i't_1)\big\} . \tag{4.116}
\]
It may be shown that \(I^{*}_{11} = o_p(1)\); also, by continuity of \(\nu(\cdot)\) we have \(I_{12} = o(1)\). Therefore \(I_1 = o_p(1)\) and similarly \(I_2 = o_p(1)\).
By a similar argument we have
\[
n^{-1}\sum_{i=1}^n Z_i^2\, I\big(Q_0(\lambda_1)+n^{-1/2}d_i't_1 < Z_i \le Q_0(\lambda_2)+n^{-1/2}d_i't_2\big)
 - \int_{Q_0(\lambda_1)}^{Q_0(\lambda_2)} z^2\, dF_0(z) = o_p(1) . \tag{4.117}
\]
Further, we have
\[
Q_n(\lambda) \xrightarrow{P} Q_0(\lambda) . \tag{4.118}
\]
Combining the above results we obtain the proof of Theorem 4.3. \(\square\)
5. Trimmed estimation of regression parameters under uncertain prior information

In this section we discuss the estimation of \(\beta\) when there exists some uncertain prior information on \(\beta\) in the form of a hypothesis. Consider the usual linear model (4.1). We are interested in the robust estimation of \(\beta\) when it is suspected, but not certain, that the null hypothesis
\[
H_0: R\beta = r \tag{5.1}
\]
may hold, where \(R\) is a \(q \times (p+1)\) matrix of rank \(q\ (\le p+1)\). Saleh and Han (1990) considered the problem based on the least-squares estimator of \(\beta\), while Saleh and Shiraishi (1989) dealt with R- and M-estimators. In this chapter, we consider the \((\lambda_1,\lambda_2)\)-trimmed estimator for the linear model (4.1) as discussed in Section 4 and the modifications thereof. For the model (4.1), we denote the \((\lambda_1,\lambda_2)\)-trimmed estimator of \(\beta\) by \(\hat\beta_n(\lambda_1,\lambda_2)\); it is called the unrestricted trimmed L-estimator (UTLE), obtained after trimming off the \(\lambda_1\) lower and \((1-\lambda_2)\) upper proportions of observations using (4.4). When \(H_0\) is true, the corresponding trimmed estimator of \(\beta\) is denoted by \(\tilde\beta_n(\lambda_1,\lambda_2)\) and is called the restricted trimmed L-estimator (RTLE). It may be pointed out that \(\tilde\beta_n(\lambda_1,\lambda_2)\) has smaller asymptotic dispersion than \(\hat\beta_n(\lambda_1,\lambda_2)\) under \(H_0: R\beta = r\), although \(\tilde\beta_n(\lambda_1,\lambda_2)\) may be biased and even inconsistent. However, when \(H_0\) is uncertain, we consider a preliminary test (PT) on \(H_0\) and consider
\(\tilde\beta_n(\lambda_1,\lambda_2)\) or \(\hat\beta_n(\lambda_1,\lambda_2)\) according as \(H_0\) is accepted or rejected, using a reasonable test statistic. We denote this estimator by \(\hat\beta_n^{PT}(\lambda_1,\lambda_2)\); it is called the preliminary test trimmed L-estimator (PTTLE). Further, we consider some Stein-type estimators incorporating \(\hat\beta_n(\lambda_1,\lambda_2)\), \(\tilde\beta_n(\lambda_1,\lambda_2)\) and the test statistic. The usual shrinkage trimmed L-estimator (STLE) will be denoted by \(\hat\beta_n^{S}(\lambda_1,\lambda_2)\) and the positive-rule shrinkage trimmed L-estimator (PRSTLE) by \(\hat\beta_n^{S+}(\lambda_1,\lambda_2)\). In subsection 5.1, five trimmed estimators of \(\beta\) are introduced. In subsection 5.2, the asymptotic distributional quadratic risks (ADQR) of these estimators under local alternatives are investigated. In subsection 5.3, the main results on the relative dominance of these trimmed estimators are obtained. References on the preliminary test approach to shrinkage estimation may be found in Saleh and Sen (1978-87).
5.1. Five trimmed estimators

This subsection contains the proposed five estimators of \(\beta\) stemming from the TLSE of \(\beta\). In Theorem 4.2(ii) we showed that
\[
\sqrt{n}(\hat\beta_n(\lambda_1,\lambda_2) - \beta) \xrightarrow{d} \mathcal N_{p+1}\big(0, \sigma^2(\lambda_1,\lambda_2)\Sigma^{-1}\big) . \tag{5.2}
\]
Now, by analogy with the LSE of \(\beta\) subject to \(R\beta = r\), we propose the RTLE as
\[
\tilde\beta_n(\lambda_1,\lambda_2) = \hat\beta_n(\lambda_1,\lambda_2) - \Sigma_n^{-1}R'(R\Sigma_n^{-1}R')^{-1}\big(R\hat\beta_n(\lambda_1,\lambda_2) - r\big) . \tag{5.3}
\]
To obtain the PTTLE of \(\beta\), we consider the following test statistic \(L_n\) given by
\[
L_n = \frac{[R\hat\beta_n(\lambda_1,\lambda_2) - r]'[R\Sigma_n^{-1}R']^{-1}[R\hat\beta_n(\lambda_1,\lambda_2) - r]}{s_n^2(\lambda_1,\lambda_2)} , \tag{5.4}
\]
where \(s_n^2(\lambda_1,\lambda_2)\) is given in (4.105). By Theorem 4.3, \(s_n^2(\lambda_1,\lambda_2) \xrightarrow{P} \sigma^2(\lambda_1,\lambda_2)\). Hence, as \(n \to \infty\), \(L_n\) converges in law to the central chi-squared distribution with \(q\) degrees of freedom (df) under \(H_0\). Hence, we define the PTTLE as
\[
\hat\beta_n^{PT}(\lambda_1,\lambda_2) = \hat\beta_n(\lambda_1,\lambda_2) - \Sigma_n^{-1}R'(R\Sigma_n^{-1}R')^{-1}\big(R\hat\beta_n(\lambda_1,\lambda_2) - r\big)\, I(L_n \le \chi^2_{q,\alpha}) , \tag{5.5}
\]
where \(\chi^2_{q,\alpha}\) is the upper \(100\alpha\%\) point of the central chi-square distribution with \(q\) df and \(I(A)\) is the indicator function of the set \(A\). As for the STLE and PRSTLE, we propose the following:
\[
\hat\beta_n^{S}(\lambda_1,\lambda_2) = \hat\beta_n(\lambda_1,\lambda_2) - c\, L_n^{-1}\,\Sigma_n^{-1}R'(R\Sigma_n^{-1}R')^{-1}\big(R\hat\beta_n(\lambda_1,\lambda_2) - r\big) \tag{5.6}
\]
and
\[
\hat\beta_n^{S+}(\lambda_1,\lambda_2) = \tilde\beta_n(\lambda_1,\lambda_2) + [1 - c\,L_n^{-1}]\, I(L_n > c)\,\big(\hat\beta_n(\lambda_1,\lambda_2) - \tilde\beta_n(\lambda_1,\lambda_2)\big) , \tag{5.7}
\]
respectively, where \(0 < c < 2(q-2)\). Note that as \(L_n \to \infty\), \(\hat\beta_n^{S}(\lambda_1,\lambda_2)\ \{\hat\beta_n^{S+}(\lambda_1,\lambda_2)\} \to \hat\beta_n(\lambda_1,\lambda_2)\), and as \(L_n \to c\), \(\{\hat\beta_n^{S+}(\lambda_1,\lambda_2)\} \to \tilde\beta_n(\lambda_1,\lambda_2)\), which is similar to the property of the PTTLE; but (5.6) and (5.7) are smooth versions of the PTTLE.
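Given the pieces already computed (the UTLE, \(\Sigma_n^{-1}\), \(s_n^2\), \(R\), \(r\) and a critical value), the estimators (5.3)–(5.7) are plain matrix algebra. The following Python sketch is ours, not the text's: helper names are invented, and list-based linear algebra is used only to keep the sketch self-contained. `Sinv` is assumed symmetric.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inv(M):
    # Gauss-Jordan inverse of a small square matrix
    n = len(M)
    A = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(A[k][i]))
        A[i], A[p] = A[p], A[i]
        piv = A[i][i]
        A[i] = [x / piv for x in A[i]]
        for k in range(n):
            if k != i:
                f = A[k][i]
                A[k] = [x - f * y for x, y in zip(A[k], A[i])]
    return [row[n:] for row in A]

def five_estimators(beta_hat, Sinv, R, r, s2, c, chi2_q_alpha):
    RS = matmul(R, Sinv)                                             # R Sigma^{-1}
    RSRt_inv = inv(matmul(RS, [list(col) for col in zip(*R)]))       # (R Sigma^{-1} R')^{-1}
    resid = [x - y for x, y in zip(matvec(R, beta_hat), r)]          # R beta_hat - r
    # common correction term Sigma^{-1} R' (R Sigma^{-1} R')^{-1} (R beta_hat - r)
    corr = matvec([list(col) for col in zip(*RS)], matvec(RSRt_inv, resid))
    Ln = sum(x * y for x, y in zip(resid, matvec(RSRt_inv, resid))) / s2   # (5.4)
    rtle = [b - g for b, g in zip(beta_hat, corr)]                         # (5.3)
    ptle = rtle if Ln <= chi2_q_alpha else list(beta_hat)                  # (5.5)
    stle = [b - (c / Ln) * g for b, g in zip(beta_hat, corr)]              # (5.6)
    shrink = (1.0 - c / Ln) if Ln > c else 0.0
    prstle = [t + shrink * (b - t) for b, t in zip(beta_hat, rtle)]        # (5.7)
    return rtle, Ln, ptle, stle, prstle
```

As \(L_n\) grows, `shrink` tends to 1 and both shrinkage estimators tend to the UTLE, mirroring the limiting behaviour noted above.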
5.2. Asymptotic distributional risks

We consider the asymptotic distributional risks under local alternatives, noting that the test based on \(L_n\) is consistent for fixed \(\beta\) such that \(R\beta \ne r\), and that the PTTLE, STLE and PRSTLE are asymptotically equivalent to the UTLE under fixed alternatives. Hence, we study the asymptotic risks under the local alternatives:
\[
H_n : R\beta = r + n^{-1/2}\delta , \quad \delta = (\delta_1,\ldots,\delta_q)' . \tag{5.8}
\]
Then we have the following theorem.

THEOREM 5.1. Under \(\{H_n\}\) and the conditions of Theorem 4.2, the asymptotic distributions of \(L_n\), \(\hat\beta_n(\lambda_1,\lambda_2)\), \(\tilde\beta_n(\lambda_1,\lambda_2)\) and \(\hat\beta_n^{PT}(\lambda_1,\lambda_2)\) are given by

(i)
\[
\lim P\{L_n \le x\} = H_q(x; \Delta) , \quad \Delta = \frac{\delta'(R\Sigma^{-1}R')^{-1}\delta}{\sigma^2(\lambda_1,\lambda_2)} , \tag{5.9}
\]

(ii)
\[
\lim P\{\sqrt{n}(\hat\beta_n(\lambda_1,\lambda_2) - \beta) \le x\} = G_{p+1}\big(x;\, 0,\, \sigma^2(\lambda_1,\lambda_2)\Sigma^{-1}\big) , \tag{5.10}
\]

(iii)
\[
\lim P\{\sqrt{n}(\tilde\beta_n(\lambda_1,\lambda_2) - \beta) \le x\}
 = G_{p+1}\big(x + \Sigma^{-1}R'(R\Sigma^{-1}R')^{-1}\delta;\, 0,\, \sigma^2(\lambda_1,\lambda_2)V\big) , \tag{5.11}
\]

(iv)
\[
\lim P\{\sqrt{n}(\hat\beta_n^{PT}(\lambda_1,\lambda_2) - \beta) \le x\}
 = H_q(\chi^2_{q,\alpha};\Delta)\, G_{p+1}\big\{x + \Sigma^{-1}R'(R\Sigma^{-1}R')^{-1}\delta;\, 0,\, \sigma^2(\lambda_1,\lambda_2)V\big\}
\]
\[
 \qquad + \int_{E(\delta)} G_{p+1}\big\{x - \Sigma^{-1}R'(R\Sigma^{-1}R')^{-1}z;\, 0,\, \sigma^2(\lambda_1,\lambda_2)V\big\}\, dG_q\big(z;\, 0,\, \sigma^2(\lambda_1,\lambda_2)(R\Sigma^{-1}R')\big) , \tag{5.12}
\]

where
\[
V = \Sigma^{-1} - \Sigma^{-1}R'(R\Sigma^{-1}R')^{-1}R\Sigma^{-1} , \tag{5.13}
\]
\(H_q(\cdot;\Delta)\) is the cdf of a non-central chi-square distribution with \(q\) df and noncentrality parameter \(\Delta\), \(G_m(x;\mu,\Sigma)\) is the cdf of an \(m\)-dimensional normal distribution with mean \(\mu\) and covariance matrix \(\Sigma\), and
\[
E(\delta) = \Big\{ z : \frac{(z+\delta)'(R\Sigma^{-1}R')^{-1}(z+\delta)}{\sigma^2(\lambda_1,\lambda_2)} > \chi^2_{q,\alpha} \Big\} . \tag{5.14}
\]
Furthermore,

(v)
\[
\sqrt{n}(\hat\beta_n^{S}(\lambda_1,\lambda_2) - \beta) \xrightarrow{d}
 U - c\,\sigma^2(\lambda_1,\lambda_2)\,\frac{\Sigma^{-1}R'(R\Sigma^{-1}R')^{-1}(RU + \delta)}{(RU + \delta)'(R\Sigma^{-1}R')^{-1}(RU + \delta)} , \tag{5.15}
\]
where
\[
U \sim \mathcal N_{p+1}\big(0, \sigma^2(\lambda_1,\lambda_2)\Sigma^{-1}\big) . \tag{5.16}
\]
Similarly,

(vi)
\[
\sqrt{n}(\hat\beta_n^{S+}(\lambda_1,\lambda_2) - \beta) \xrightarrow{d}
 U - \Sigma^{-1}R'(R\Sigma^{-1}R')^{-1}(RU + \delta)
 + \big\{ 1 - c\,\sigma^2(\lambda_1,\lambda_2)\,[(RU+\delta)'(R\Sigma^{-1}R')^{-1}(RU+\delta)]^{-1} \big\}
\]
\[
 \qquad \times I\{(RU+\delta)'(R\Sigma^{-1}R')^{-1}(RU+\delta) \ge c\}\;\Sigma^{-1}R'(R\Sigma^{-1}R')^{-1}(RU+\delta) . \tag{5.17}
\]
Assume that, for an estimator \(\beta_n^{*}(\lambda_1,\lambda_2)\) of \(\beta\),
\[
G^{*}(x) = \lim P\{\sqrt{n}(\beta_n^{*}(\lambda_1,\lambda_2) - \beta) \le x\} . \tag{5.18}
\]
Then, we define the asymptotic distributional quadratic risk (ADQR) of \(\beta_n^{*}(\lambda_1,\lambda_2)\) by
\[
\mathscr R(\beta_n^{*}(\lambda_1,\lambda_2); W) = \int_{\mathbb R^{p+1}} x'Wx\, dG^{*}(x) = \operatorname{tr}(\Sigma^{*}W) , \tag{5.19}
\]
where
\[
\Sigma^{*} = \int xx'\, dG^{*}(x) \tag{5.20}
\]
and \(W\) is a positive definite matrix associated with the quadratic loss function
\[
L(\beta_n^{*}(\lambda_1,\lambda_2); \beta) = n\big(\beta_n^{*}(\lambda_1,\lambda_2) - \beta\big)'W\big(\beta_n^{*}(\lambda_1,\lambda_2) - \beta\big) . \tag{5.21}
\]
THEOREM 5.2. Under the assumptions of Theorem 5.1, the ADQR's of the estimators are given by

(i) \(\mathscr R(\hat\beta_n(\lambda_1,\lambda_2); W) = \sigma^2(\lambda_1,\lambda_2)\operatorname{tr}(\Sigma^{-1}W)\),

(ii) \(\mathscr R(\tilde\beta_n(\lambda_1,\lambda_2); W) = \sigma^2(\lambda_1,\lambda_2)[\operatorname{tr}(\Sigma^{-1}W) - \operatorname{tr}(B)] + \delta'(R\Sigma^{-1}R')^{-1}B\delta\),

(iii) \(\mathscr R(\hat\beta_n^{PT}(\lambda_1,\lambda_2); W) = \sigma^2(\lambda_1,\lambda_2)\operatorname{tr}(\Sigma^{-1}W) - \sigma^2(\lambda_1,\lambda_2)\operatorname{tr}(B)H_{q+2}(\chi^2_{q,\alpha};\Delta) + \delta'(R\Sigma^{-1}R')^{-1}B\delta\,\{2H_{q+2}(\chi^2_{q,\alpha};\Delta) - H_{q+4}(\chi^2_{q,\alpha};\Delta)\}\),
(iv) \(\mathscr R(\hat\beta_n^{S}(\lambda_1,\lambda_2); W) = \sigma^2(\lambda_1,\lambda_2)\operatorname{tr}(\Sigma^{-1}W) - \sigma^2(\lambda_1,\lambda_2)\, c\,\operatorname{tr}(B)\{2E[\chi^{-2}_{q+2}(\Delta)] - cE[\chi^{-2}_{q+4}(\Delta)]\} + c(c+4)\{\delta'(R\Sigma^{-1}R')^{-1}B\delta\, E[\chi^{-4}_{q+4}(\Delta)]\}\),

(v) \(\mathscr R(\hat\beta_n^{S+}(\lambda_1,\lambda_2); W) = \mathscr R(\hat\beta_n^{S}; W) - \sigma^2(\lambda_1,\lambda_2)\operatorname{tr}(B)\, E[(1 - c\chi^{-2}_{q+2}(\Delta))^2 I(\chi^2_{q+2}(\Delta) \le c)] + \delta'(R\Sigma^{-1}R')^{-1}B\delta\,\{2E[(1 - c\chi^{-2}_{q+2}(\Delta))\, I(\chi^2_{q+2}(\Delta) \le c)] - E[(1 - c\chi^{-2}_{q+4}(\Delta))\, I(\chi^2_{q+4}(\Delta) \le c)]\}\),

where
\[
B = R\Sigma^{-1}W\Sigma^{-1}R'(R\Sigma^{-1}R')^{-1} . \tag{5.22}
\]
PROOF. (i) and (ii) follow by straightforward computations using Theorem 5.1 (ii) and (iii). Using Theorem 5.1 (iv)-(vi), parts (iii), (iv) and (v) are given by the same argument as in Judge and Bock (1978). \(\square\)

5.3. Comparison of ADQR
First, we compare \(\tilde\beta_n(\lambda_1,\lambda_2)\) with \(\hat\beta_n(\lambda_1,\lambda_2)\) by the ratio of the ADQR of \(\hat\beta_n(\lambda_1,\lambda_2)\) to that of \(\tilde\beta_n(\lambda_1,\lambda_2)\):
\[
\operatorname{ARE}[\tilde\beta_n(\lambda_1,\lambda_2); \hat\beta_n(\lambda_1,\lambda_2)]
 = \Big[ 1 - \frac{\operatorname{tr}(B)}{\operatorname{tr}(\Sigma^{-1}W)} + \frac{\delta'(R\Sigma^{-1}R')^{-1}B\delta}{\sigma^2(\lambda_1,\lambda_2)\operatorname{tr}(\Sigma^{-1}W)} \Big]^{-1} . \tag{5.23}
\]
Hence, \(\operatorname{ARE}[\tilde\beta_n(\lambda_1,\lambda_2); \hat\beta_n(\lambda_1,\lambda_2)] \gtrless 1\) according as
\[
\delta'(R\Sigma^{-1}R')^{-1}B\delta \lessgtr \sigma^2(\lambda_1,\lambda_2)\operatorname{tr}(B) . \tag{5.24}
\]
In particular, if \(\delta = 0\), i.e., under \(H_0\), \(\operatorname{ARE}(\tilde\beta_n(\lambda_1,\lambda_2); \hat\beta_n(\lambda_1,\lambda_2)) \ge 1\). Let \(\operatorname{ch}_{\min}(B)\) and \(\operatorname{ch}_{\max}(B)\) be the smallest and largest characteristic roots of \(B\); then using the relation
\[
\operatorname{ch}_{\min}(B)\Delta \le \frac{\delta'(R\Sigma^{-1}R')^{-1}B\delta}{\sigma^2(\lambda_1,\lambda_2)} \le \operatorname{ch}_{\max}(B)\Delta , \tag{5.25}
\]
we obtain
\[
\Big[1 - \frac{\operatorname{tr}(B)}{\operatorname{tr}(\Sigma^{-1}W)} + \frac{\operatorname{ch}_{\max}(B)\Delta}{\operatorname{tr}(\Sigma^{-1}W)}\Big]^{-1}
 \le \operatorname{ARE}(\tilde\beta_n(\lambda_1,\lambda_2), \hat\beta_n(\lambda_1,\lambda_2)) \le
 \Big[1 - \frac{\operatorname{tr}(B)}{\operatorname{tr}(\Sigma^{-1}W)} + \frac{\operatorname{ch}_{\min}(B)\Delta}{\operatorname{tr}(\Sigma^{-1}W)}\Big]^{-1} . \tag{5.26}
\]
Both sides of inequality (5.26) are decreasing in \(\Delta\). Next, we investigate the ARE of \(\hat\beta_n^{PT}(\lambda_1,\lambda_2)\) relative to \(\hat\beta_n(\lambda_1,\lambda_2)\):
\[
\operatorname{ARE}(\hat\beta_n^{PT}(\lambda_1,\lambda_2), \hat\beta_n(\lambda_1,\lambda_2)) = [1 + h(\delta)]^{-1} , \tag{5.27}
\]
where
\[
h(\delta) = \sigma^{-2}(\lambda_1,\lambda_2)\,\{\delta'(R\Sigma^{-1}R')^{-1}B\delta\}\,
 \frac{2H_{q+2}(\chi^2_{q,\alpha};\Delta) - H_{q+4}(\chi^2_{q,\alpha};\Delta)}{\operatorname{tr}(\Sigma^{-1}W)}
 - \frac{\operatorname{tr}(B)\,H_{q+2}(\chi^2_{q,\alpha};\Delta)}{\operatorname{tr}(\Sigma^{-1}W)} . \tag{5.28}
\]
Hence we get \(\operatorname{ARE}(\hat\beta_n^{PT}(\lambda_1,\lambda_2), \hat\beta_n(\lambda_1,\lambda_2)) \gtrless 1\) according as
\[
\delta'(R\Sigma^{-1}R')^{-1}B\delta \lessgtr \sigma^2(\lambda_1,\lambda_2)\,
 \frac{\operatorname{tr}(B)\,H_{q+2}(\chi^2_{q,\alpha};\Delta)}{2H_{q+2}(\chi^2_{q,\alpha};\Delta) - H_{q+4}(\chi^2_{q,\alpha};\Delta)} . \tag{5.29}
\]
Also let us define
\[
h_{\max}(\Delta) = \operatorname{ch}_{\max}(B)\Delta\,
 \frac{2H_{q+2}(\chi^2_{q,\alpha};\Delta) - H_{q+4}(\chi^2_{q,\alpha};\Delta)}{\operatorname{tr}(\Sigma^{-1}W)}
 - \frac{\operatorname{tr}(B)\,H_{q+2}(\chi^2_{q,\alpha};\Delta)}{\operatorname{tr}(\Sigma^{-1}W)} , \tag{5.30}
\]
\[
h_{\min}(\Delta) = \operatorname{ch}_{\min}(B)\Delta\,
 \frac{2H_{q+2}(\chi^2_{q,\alpha};\Delta) - H_{q+4}(\chi^2_{q,\alpha};\Delta)}{\operatorname{tr}(\Sigma^{-1}W)}
 - \frac{\operatorname{tr}(B)\,H_{q+2}(\chi^2_{q,\alpha};\Delta)}{\operatorname{tr}(\Sigma^{-1}W)} . \tag{5.31}
\]
Then we obtain the inequality
\[
[1 + h_{\max}(\Delta)]^{-1} \le \operatorname{ARE}(\hat\beta_n^{PT}(\lambda_1,\lambda_2), \hat\beta_n(\lambda_1,\lambda_2)) \le [1 + h_{\min}(\Delta)]^{-1} . \tag{5.32}
\]
Next we compare \(\hat\beta_n^{S}(\lambda_1,\lambda_2)\) with \(\hat\beta_n(\lambda_1,\lambda_2)\). The difference of the risks is given by
\[
\mathscr R(\hat\beta_n(\lambda_1,\lambda_2); W) - \mathscr R(\hat\beta_n^{S}(\lambda_1,\lambda_2); W)
 = \sigma^2(\lambda_1,\lambda_2)\, c\,\operatorname{tr}(B)\{2E[\chi^{-2}_{q+2}(\Delta)] - cE[\chi^{-2}_{q+4}(\Delta)]\}
 - c(c+4)\{\delta'(R\Sigma^{-1}R')^{-1}B\delta\, E[\chi^{-4}_{q+4}(\Delta)]\} . \tag{5.33}
\]
Using the relation (5.25), the R.H.S. of (5.33) is bounded from below by
\[
\sigma^2(\lambda_1,\lambda_2)\, c\,\big\{\operatorname{tr}(B)\{2E[\chi^{-2}_{q+2}(\Delta)] - cE[\chi^{-2}_{q+4}(\Delta)]\}
 - (c+4)\operatorname{ch}_{\max}(B)\,\Delta\, E[\chi^{-4}_{q+4}(\Delta)]\big\} . \tag{5.34}
\]
We also have the relation
\[
E[\chi^{-2}_{q+2}(\Delta)] - (q-2)E[\chi^{-4}_{q+2}(\Delta)] = \Delta E[\chi^{-4}_{q+4}(\Delta)] . \tag{5.35}
\]
We may rewrite (5.34) as
\[
\sigma^2(\lambda_1,\lambda_2)\, c\,\operatorname{tr}(B)\big\{ (2-g)E[\chi^{-2}_{q+2}(\Delta)] + \{(q-2)g - c\}E[\chi^{-4}_{q+2}(\Delta)] \big\} , \tag{5.36}
\]
where \(g = (c+4)h\) and \(h = \operatorname{ch}_{\max}(B)/\operatorname{tr}(B)\). Thus, for (5.33) to be non-negative for all \(\delta\) it suffices to choose \(c\) such that
\[
0 < g \le 2 \quad \text{and} \quad (q-2)g > c > 0 . \tag{5.37}
\]
Note that \(g \le 2\) is equivalent to \(c \le 2(h^{-1} - 2)\). This implies that \(\operatorname{tr}(B) > 2\operatorname{ch}_{\max}(B)\) is the minimum requirement for (5.33) to be non-negative. Also, from the non-negativeness of (5.36) at \(\delta = 0\), it follows that \(c\) has to be less than or equal to \(2(q-2)\) (so \(q \ge 3\)). Hence, we get

THEOREM 5.3. A sufficient condition for the asymptotic dominance of the STLE over the UTLE [i.e., \(\mathscr R(\hat\beta_n(\lambda_1,\lambda_2); W) \ge \mathscr R(\hat\beta_n^{S}(\lambda_1,\lambda_2); W)\) for all \(\delta \in \mathbb R^q\) and \(W\) (p.d.)] is that the shrinkage factor \(c\) is positive and satisfies the following inequality:
\[
2E[\chi^{-2}_{q+2}(\Delta)] - cE[\chi^{-2}_{q+4}(\Delta)] - (c+4)h\Delta E[\chi^{-4}_{q+4}(\Delta)] \ge 0 \quad \text{for all } \Delta \ge 0 , \tag{5.38}
\]
which, in turn, requires that \(q \ge 3\), \(0 < c < 2(q-2)\) and \(\operatorname{tr}(B) > 2\operatorname{ch}_{\max}(B)\).

By the same argument as in the proof of Theorem 4.3 of Sen and Saleh (1987), we get

THEOREM 5.4. Under the sufficient condition of Theorem 5.3 the PTTLE fails to dominate the STLE. Also, if for \(\alpha\), the level of significance of the preliminary test, we have
\[
H_{q+2}(\chi^2_{q,\alpha}; 0) > r[2(q-2) - r]/[q(q-2)] \quad \text{with } r = (q-2) \wedge (2h^{-1} - 4) ,
\]
then the STLE fails to dominate the PTTLE.

Finally, we compare \(\hat\beta_n^{S+}(\lambda_1,\lambda_2)\) with \(\hat\beta_n^{S}(\lambda_1,\lambda_2)\); the difference of the risks is given by
\[
\mathscr R(\hat\beta_n^{S+}(\lambda_1,\lambda_2); W) - \mathscr R(\hat\beta_n^{S}(\lambda_1,\lambda_2); W)
 = -\Big\{ \sigma^2(\lambda_1,\lambda_2)\operatorname{tr}(B)\, E\big[(1 - c\chi^{-2}_{q+2}(\Delta))^2 I(\chi^2_{q+2}(\Delta) \le c)\big]
\]
\[
 \qquad + \delta'(R\Sigma^{-1}R')^{-1}B\delta\,\big( E[(1 - c\chi^{-2}_{q+4}(\Delta))\, I(\chi^2_{q+4}(\Delta) \le c)]
 - 2E[(1 - c\chi^{-2}_{q+2}(\Delta))\, I(\chi^2_{q+2}(\Delta) \le c)] \big) \Big\} , \tag{5.39}
\]
by using the relation
\[
E[\chi^{-2}_{q+2}(\Delta)] - E[\chi^{-2}_{q+4}(\Delta)] = 2E[\chi^{-4}_{q+4}(\Delta)] . \tag{5.40}
\]
The R.H.S. of (5.39) is always negative, so we have

THEOREM 5.5. The PRSTLE dominates the STLE. In general, the dominance picture may be stated as
\[
\mathscr R(\tilde\beta_n(\lambda_1,\lambda_2); W) \le \mathscr R(\hat\beta_n^{S+}(\lambda_1,\lambda_2); W)
 \le \mathscr R(\hat\beta_n^{S}(\lambda_1,\lambda_2); W) \le \mathscr R(\hat\beta_n(\lambda_1,\lambda_2); W) , \tag{5.41}
\]
while \(\mathscr R(\hat\beta_n^{PT}(\lambda_1,\lambda_2); W)\) may sit between \(\mathscr R(\tilde\beta_n(\lambda_1,\lambda_2); W)\) and \(\mathscr R(\hat\beta_n(\lambda_1,\lambda_2); W)\), depending on the size of the test.
5.4. Conclusion

In this section, we proposed several alternative estimators of the regression parameters based on trimmed least squares in the linear model. It is observed that the performance of the preliminary test estimator near the null hypothesis is generally better than that of the other estimators, depending on the size of the test, while the shrinkage estimators enjoy uniform dominance, never depend on the size of the test, and as such are preferable. The positive-rule estimator is preferred over the usual shrinkage estimator. However, we note that the shrinkage estimators depend on the rank of the matrix R, which is q, and the requirement for their application is that q >= 3, while the preliminary test estimator does not depend on the size of q. Thus for q <= 2, one is forced to use the preliminary test estimator. For an optimum choice of the size alpha, one may benefit from using the preliminary test estimator, while the shrinkage estimators are always preferred provided q >= 3.
Acknowledgement

This research has been supported by NSERC grant A-3088 of the second author.
References

Abdelmalek, N. N. (1974). On the discrete linear L1 approximation and L1 solution of overdetermined linear equations. J. Approx. Theory 11, 38-53.
Bahadur, R. R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580.
Balakrishnan, N. and A. C. Cohen (1991). Order Statistics and Inference. Academic Press.
Balmer, D. W., M. Boulton and A. Sack (1974). Optimal solutions in parameter estimation for Cauchy distribution. JASA 69, 238-242.
Bassett, G. and R. Koenker (1978). Asymptotic theory of least absolute error regression. JASA 73, No. 363, 618-622.
Bickel, P. J. (1968). Some contributions to the theory of order statistics. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics, Vol. 1, 575-591.
Bickel, P. J. (1973). On some analogues to linear combinations of order statistics in the linear model. Ann. Statist. 1, 597-616.
Cane, G. J. (1974). Linear estimation of parameters of the Cauchy distribution based on sample quantiles. JASA 69, 242-245.
Ferguson, T. S. (1967). Mathematical Statistics, a Decision Theoretic Approach. Academic Press.
Gastwirth, J. L. (1966). On robust procedures. JASA 65, 946-973.
Hogg, R. V. (1974). Adaptive robust procedures: A partial review and some suggestions for future application and theory. JASA 69, 909-927.
Huber, P. J. (1981). Robust Statistics. Wiley, New York.
Jureckova, J. (1984). Regression quantile and trimmed least squares estimator under a general design. Kybernetika 20, No. 5, 345-357.
Koenker, R. and G. Bassett (1978). Regression quantiles. Econometrica 46, No. 1.
Koenker, R. and V. D'Orey (1987). Algorithm AS 229: Computing regression quantiles. J. Roy. Statist. Soc. Ser. C 36, 383-393.
Koenker, R. and V. D'Orey (1993). Remark AS R92. A remark on Algorithm AS 229: Computing dual regression quantiles and regression rank scores. J. Roy. Statist. Soc. Ser. C 43, 410-414.
Koul, H. L. (1992). Weighted Empiricals and Linear Models. Institute of Mathematical Statistics, Lecture Notes-Monograph Series, Vol. 21.
Koul, H. L. and A. K. Md. E. Saleh (1993). R-estimation of the parameters of autoregressive [AR(p)] models. Ann. Statist. 21, No. 1, 534-551.
Koul, H. L. and A. K. Md. E. Saleh (1995). Autoregression quantiles and related rank-scores processes. Ann. Statist. 23, No. 2, 670-689.
Mosteller, F. (1946). On some useful inefficient statistics. Ann. Math. Statist. 17, 377-408.
Ogawa, J. (1951). Contributions to the theory of systematic statistics. I. Osaka Math. 74, 105-121.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. 2nd ed. Wiley, New York.
Ruppert, D. and R. J. Carroll (1980). Trimmed least squares estimation in the linear model. JASA 75, No. 372, 828-838.
Saleh, A. K. Md. E. and C. P. Han (1990). Shrinkage estimation in regression analysis. Estadistica 42, 139, 40-63.
Saleh, A. K. Md. E. and P. K. Sen (1978). Nonparametric estimation of location parameter after a preliminary test on regression. Ann. Statist. 6, 154-168.
Saleh, A. K. Md. E. and P. K. Sen (1983). Nonparametric test of location after a preliminary test on regression in the multivariate case. Comm. Statist. Theory Meth. 12(16), 1855-1872.
Saleh, A. K. Md. E. and P. K. Sen (1984a). Least squares and rank order preliminary test estimation in general multivariate linear models. Proceedings of the Indian Statistical Institute Golden Jubilee Conference in Statistical Applications and New Directions, J. K. Ghosh and J. Roy (eds), 237-253.
Saleh, A. K. Md. E. and P. K. Sen (1984b). Nonparametric preliminary test inference. Handbook of Statistics, Vol. 4, P. R. Krishnaiah and P. K. Sen (eds), North-Holland, Amsterdam, 275-297.
Saleh, A. K. Md. E. and P. K. Sen (1985a). Nonparametric shrinkage estimation in a parallelism problem. Sankhya 47A, 156-165.
Saleh, A. K. Md. E. and P. K. Sen (1985b). On shrinkage M-estimators of location parameters. Comm. Statist. Theory Meth. 14(10), 2313-2329.
Saleh, A. K. Md. E. and P. K. Sen (1985c). Preliminary test prediction in general multivariate linear models. Proceedings of the Pacific Area Statistical Conference, North-Holland, 619-638.
Saleh, A. K. Md. E. and P. K. Sen (1985d). Shrinkage least squares estimation in a general multivariate linear model. Proceedings of the Fifth Pannonian Symposium on Mathematical Statistics, 307-325.
Saleh, A. K. Md. E. and P. K. Sen (1986a). On shrinkage least squares estimation in a parallelism problem. Comm. Statist. Theory Meth. 15(5), 1451-1466.
Saleh, A. K. Md. E. and P. K. Sen (1986b). On shrinkage R-estimation in a multiple regression model. Comm. Statist. Theory Meth. 15(7), 2229-2244.
Saleh, A. K. Md. E. and T. Shiraishi (1989). On some R- and M-estimators of regression parameters under uncertain restriction. J. Japan Statist. Soc. 19, No. 2, 127-137.
Sclove, S. L., C. Morris and R. Radhakrishnan (1972). Non-optimality of preliminary test estimators of the mean of a multivariate normal distribution. Ann. Math. Statist. 43, 1481-1490.
Sen, P. K. and A. K. Md. E. Saleh (1987). On the preliminary test and shrinkage M-estimation in linear models. Ann. Statist. 15, 1580-1592.
Sen, P. K. and J. M. Singer (1993). Large Sample Methods in Statistics. Chapman and Hall.
Stigler, S. M. (1977). Do robust estimators work with real data? Ann. Statist. 5, 1055-1098.
Part III Inferential Methods
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
The Role of Order Statistics in Estimating Threshold Parameters
A. Clifford Cohen
1. Introduction
It is well known that the first order statistic, \(x_{1:n}\), in a random sample of size \(n\) (i.e., the smallest sample observation) contains more information about the threshold parameter, \(\gamma\) (the end point), in various skewed distributions than any other sample observation, and often more information than all other sample observations combined. Consequently, this statistic plays a major role in estimating parameters of the two-parameter exponential, the three-parameter Weibull, lognormal, gamma, inverse Gaussian, and other skewed distributions that often serve as models in survival studies where life spans and/or reaction times are measured. The range of the random variable of interest is \(\gamma < X < \infty\), and it follows that \(\lim_{n\to\infty} x_{1:n} = \gamma\). Thus in any sample, the first order statistic constitutes an upper bound on the threshold parameter, and for sufficiently large samples, this statistic provides a satisfactory estimate of that parameter. In small or moderate size samples, the expected values \(E(X_{1:n})\) or \(E[F(X_{1:n})]\), where \(F(X_{1:n})\) designates the cumulative distribution function, lead to improved estimators. Although essential derivations are sketched, primary concern here is on applications. In this chapter, we present modified moment and/or modified maximum likelihood estimators which employ the first order statistic in estimating parameters of the exponential, Weibull, lognormal, gamma, and inverse Gaussian distributions from both complete and censored samples. This topic has recently been considered by Smith (1995) in considerable detail, and previously by Smith (1985), Cheng and Iles (1989), Harter and Moore (1966), Hill (1963), Wingo (1973), Rockette et al. (1974), Lemon (1975), Wilson and Worcester (1945), Lockhart and Stephens (1994), Cohen (1951, 1965, 1975, 1991, 1995), Cohen and Whitten (1980, 1982a,b, 1985, 1986), Cohen, Whitten and Ding (1984, 1985), Chan, Cohen and Whitten (1984), Griffiths (1980), Balakrishnan and Cohen (1991), Epstein (1954, 1960), Epstein and Sobel (1953, 1954), Sarhan (1954, 1955), Sarhan and Greenberg (1962), Viveros and Balakrishnan (1994), Kambo (1978), and others.
The principal motivation for developing the modified estimators presented in this chapter has been a desire to obtain estimators that are easy to calculate and that are endowed with improved statistical properties. Various simulation studies have indicated that the modified estimators exhibit smaller biases and variances than the traditional moment or maximum likelihood estimators.

For any of the distributions under consideration here with pdf \(f(x)\) and cdf \(F(x)\), the pdf of the first order statistic in a random sample of size \(n\) is
\[
f_{1:n}(x_{1:n}) = n[1 - F(x_{1:n})]^{n-1} f(x_{1:n}) , \tag{1.1}
\]
and the expected value of the first order statistic is
\[
E(X_{1:n}) = \int_{\gamma}^{\infty} x_{1:n}\, f_{1:n}(x_{1:n})\, dx_{1:n} . \tag{1.2}
\]
The expected value of the corresponding cumulative distribution is
\[
E[F(X_{1:n})] = 1/(n+1) . \tag{1.3}
\]
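Relation (1.3) holds for any continuous parent, since \(F(X_{1:n})\) is the smallest of \(n\) uniform \((0,1)\) variates. A quick Monte Carlo check (ours, not the chapter's, here with a unit exponential parent) illustrates it:

```python
import math
import random

# Monte Carlo check of E[F(X_{1:n})] = 1/(n+1) in (1.3), using a unit
# exponential parent with F(x) = 1 - exp(-x).
random.seed(1)
n, reps = 5, 20000
total = 0.0
for _ in range(reps):
    x_min = min(random.expovariate(1.0) for _ in range(n))
    total += 1.0 - math.exp(-x_min)      # F evaluated at the first order statistic
estimate = total / reps
print(round(estimate, 3), round(1 / (n + 1), 3))
```

With \(n = 5\) the simulated average settles close to \(1/6\).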
2. The exponential distribution

In the notation employed here, the pdf and the cdf of the two-parameter exponential distribution are written as
\[
f(x; \gamma, \beta) = \frac{1}{\beta}\, e^{-(x-\gamma)/\beta} , \quad \gamma < x < \infty , \quad \text{zero elsewhere} , \tag{2.1}
\]
and
\[
F(x; \gamma, \beta) = 1 - e^{-(x-\gamma)/\beta} . \tag{2.2}
\]
The expected value, variance, \(\alpha_3\), \(\alpha_4\), and median are
\[
E(X) = \gamma + \beta , \quad V(X) = \beta^2 , \quad \alpha_3(X) = 2 , \quad \alpha_4(X) = 9 , \quad \operatorname{Me}(X) = \gamma + \beta \ln 2 . \tag{2.3}
\]
It now follows from (1.1) that
\[
f_{1:n}(x_{1:n}; \gamma, \beta) = \frac{n}{\beta}\, e^{-n(x_{1:n}-\gamma)/\beta} \tag{2.4}
\]
and
\[
E(X_{1:n}) = \gamma + \beta/n . \tag{2.5}
\]
Maximum likelihood estimators (MLE) of the parameters based on a complete sample of size \(n\) are
\[
\hat\beta = \bar x - x_{1:n} \quad \text{and} \quad \hat\gamma = x_{1:n} . \tag{2.6}
\]
Although these estimators are quite satisfactory in samples that are sufficiently large, they are not unbiased. Modified maximum likelihood estimators (MMLE) that are not only unbiased, but are also minimum variance unbiased (MVUE) and best linear unbiased (BLUE), follow when we equate \(E(X) = \bar x\) and \(E(X_{1:n}) = x_{1:n}\). We thus obtain
\[
\hat\gamma + \hat\beta = \bar x , \quad \hat\gamma + \hat\beta/n = x_{1:n} , \tag{2.7}
\]
which on solution yield
\[
\hat\gamma = \frac{n x_{1:n} - \bar x}{n-1} \quad \text{and} \quad \hat\beta = \frac{n(\bar x - x_{1:n})}{n-1} . \tag{2.8}
\]
These estimators were originally derived as BLUE by Sarhan (1954), who employed a somewhat laborious least-squares technique of Lloyd (1952). At about the same time, Epstein and Sobel (1954) independently derived these estimators as functions of the MLE and demonstrated that they are both MVUE and BLUE. Exact variances and the covariance of the estimates, obtained by standard expected value procedures, are
\[
V(\hat\beta) = \frac{\beta^2}{n-1} , \quad V(\hat\gamma) = \frac{\beta^2}{n(n-1)} , \quad \operatorname{Cov}(\hat\gamma, \hat\beta) = \frac{-\beta^2}{n(n-1)} . \tag{2.9}
\]
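The estimators (2.8) are one-line computations from the sample mean and minimum; the following sketch (ours, with an invented function name) implements them:

```python
def exponential_mmle(sample):
    """Unbiased (MVUE/BLUE) estimators (2.8) for the two-parameter
    exponential distribution, from a complete sample."""
    n = len(sample)
    x_bar = sum(sample) / n
    x_min = min(sample)
    gamma_hat = (n * x_min - x_bar) / (n - 1)
    beta_hat = n * (x_bar - x_min) / (n - 1)
    return gamma_hat, beta_hat

g, b = exponential_mmle([1.2, 2.0, 3.0, 5.8])   # g ~ 0.6, b ~ 2.4
```

For this sample \(\bar x = 3.0\) and \(x_{1:4} = 1.2\), so \(\hat\gamma = (4.8 - 3.0)/3 = 0.6\) and \(\hat\beta = 4(1.8)/3 = 2.4\).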
Censored samples

Samples that may be singly or progressively censored arise frequently in survival studies. In all censored samples let \(n\) designate the total number of randomly selected items in the sample. In a \(k\)-stage progressively censored sample, let \(T_1 < T_2 < \cdots < T_j < \cdots < T_k\) designate times (points) at which censoring occurs. At time \(x = T_j\), \(c_j\) sample items are randomly removed (censored) from further observation, and for these it is known only that \(x > T_j\). The total number of censored observations is \(\sum_{j=1}^k c_j\), and the number of complete (full term) observations is \(m = n - \sum_{j=1}^k c_j\). For Type I censoring, the \(T_j\) are fixed (known) constants. For Type II censoring, the \(T_j\) are random variables and the \(c_j\) are fixed constants. Singly right censored samples may be considered as a special case of progressively censored samples in which \(k = 1\). In these samples, \(n\) is the total sample size; \(m\), in the context of a life test, is the number of failures; and \(c\) is the number of survivors when the test is terminated at time \(T\). Of course \(n = m + c\). In a Type I censored sample, \(T\) is a fixed constant and \(c\) is a random variable. In a Type II censored sample, \(c\) is a fixed constant and \(T\) is a random variable; in this case, \(T\) is the \(m\)th order statistic in a random sample of size \(n\). The log-likelihood function of a \(k\)-stage progressively censored sample as described above is
\[
\ln L = -m \ln\beta - \frac{1}{\beta}\sum_{i=1}^{m}(x_i - \gamma) - \frac{1}{\beta}\sum_{j=1}^{k} c_j (T_j - \gamma) + \text{const} . \tag{2.10}
\]
This is an increasing function of \(\gamma\), and the MLE of \(\gamma\) is therefore the largest permissible value. Accordingly, \(\hat\gamma = x_{1:n}\). The estimating equation for \(\beta\) is \(\partial\ln L/\partial\beta = 0\), and this follows from (2.10) as
\[
\frac{\partial\ln L}{\partial\beta} = -\frac{m}{\beta} + \frac{1}{\beta^2}\Big[\sum_{i=1}^{m}(x_i - \gamma) + \sum_{j=1}^{k} c_j(T_j - \gamma)\Big] = 0 . \tag{2.11}
\]
The MLE can thus be written as
\[
\hat\gamma = x_{1:n} , \quad \hat\beta = [ST - n x_{1:n}]/m , \tag{2.12}
\]
where \(ST\) is the "sum total" of all observations, both complete and censored, i.e.,
\[
ST = \sum_{i=1}^{m} x_i + \sum_{j=1}^{k} c_j T_j . \tag{2.13}
\]
For the modified maximum likelihood equations, we replace the first equation of (2.12) with \(E(X_{1:n}) = x_{1:n}\), and the resulting equations become
\[
\hat\gamma = x_{1:n} - \hat\beta/n , \quad \hat\beta = [ST - n\hat\gamma]/m . \tag{2.14}
\]
The required estimators then follow as
\[
\hat\gamma = [m x_{1:n} - ST/n]/(m-1) , \quad \hat\beta = [ST - n x_{1:n}]/(m-1) . \tag{2.15}
\]
It is to be noted that the estimators of (2.8) for complete (uncensored) samples of size \(n\) follow as a special case of (2.15) with \(m = n\) and \(ST = n\bar x\). When \(k = 1\), the estimators of (2.15) apply to a singly censored sample with \(ST = \sum_{i=1}^{m} x_i + cT\).
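A sketch (ours, with invented names) of the censored-sample estimators (2.15), for a progressively censored sample supplied as failure times plus censoring times and counts:

```python
def exponential_mmle_censored(failures, censor_times, censor_counts):
    """MMLE (2.15) for the two-parameter exponential from a progressively
    censored sample: `failures` holds the m complete observations, and
    c_j items are censored at time T_j."""
    m = len(failures)
    n = m + sum(censor_counts)
    x_min = min(failures)
    # "sum total" ST of (2.13): complete plus censored observations
    st = sum(failures) + sum(c * t for c, t in zip(censor_counts, censor_times))
    gamma_hat = (m * x_min - st / n) / (m - 1)
    beta_hat = (st - n * x_min) / (m - 1)
    return gamma_hat, beta_hat
```

For example, with failures at 1.0, 2.0, 3.0 and two items censored at \(T = 4.0\) (so \(n = 5\), \(m = 3\), \(ST = 14\)), the formulas give \(\hat\gamma = (3 - 2.8)/2 = 0.1\) and \(\hat\beta = (14 - 5)/2 = 4.5\).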
3. The Weibull distribution

The pdf and the cdf of the three-parameter Weibull distribution in the notation employed here are
\[
f(x; \gamma, \beta, \delta) = \frac{\delta}{\beta^{\delta}}\,(x-\gamma)^{\delta-1} \exp\{-[(x-\gamma)/\beta]^{\delta}\} , \quad \gamma < x < \infty , \quad \text{zero elsewhere} ,
\]
and
\[
F(x; \gamma, \beta, \delta) = 1 - \exp\{-[(x-\gamma)/\beta]^{\delta}\} , \tag{3.1}
\]
where \(\gamma\), \(\beta\), and \(\delta\) are the threshold, scale and shape parameters, respectively. When \(\delta < 1\), this distribution is reverse-J shaped. When \(\delta = 1\), this special case becomes the exponential distribution. When \(\delta > 1\), the distribution approaches a bell shape. The mean and variance of this distribution are
\[
E(X) = \gamma + \beta\Gamma_1 , \quad V(X) = \beta^2[\Gamma_2 - \Gamma_1^2] , \tag{3.2}
\]
where \(\Gamma_k = \Gamma(1 + k/\delta)\) and \(\Gamma(\cdot)\) is the gamma function \(\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt\).

In some instances, notably with the MLE and the MMLE, in order to obtain simpler estimating equations, it becomes desirable to employ \(\theta\) rather than \(\beta\) as the scale parameter, where
\[
\theta = \beta^{\delta} . \tag{3.3}
\]
The first order statistic in a random sample of size \(n\) from a Weibull distribution also has a Weibull distribution. More specifically, if \(X\) is distributed as \(W(\gamma, \beta, \delta)\), then \(X_{1:n}\) is distributed as \(W(\gamma, \beta_1, \delta)\), where \(\beta_1 = \beta/n^{1/\delta}\). Therefore the expected value of the first order statistic becomes
\[
E(X_{1:n}) = \gamma + (\beta/n^{1/\delta})\Gamma_1 . \tag{3.4}
\]
Modified moment estimators (MME) are to be recommended when samples are complete, but when samples are censored, MMLE are to be preferred. Both the MLE and the MMLE are valid only if \(\delta > 1\), and computational problems are likely to be encountered unless \(\delta \gg 1\). For a progressively censored sample, calculation of maximum likelihood estimates (MLE) requires the simultaneous solution of the following three equations:
\[
\frac{\partial\ln L}{\partial\delta} = \frac{m}{\delta} + \sum_{i=1}^{m}\ln(x_i - \gamma) - \frac{1}{\theta}\,\Sigma^{*}(x_i - \gamma)^{\delta}\ln(x_i - \gamma) = 0 ,
\]
\[
\frac{\partial\ln L}{\partial\theta} = -\frac{m}{\theta} + \frac{1}{\theta^2}\,\Sigma^{*}(x_i - \gamma)^{\delta} = 0 , \tag{3.5}
\]
\[
\frac{\partial\ln L}{\partial\gamma} = -(\delta-1)\sum_{i=1}^{m}(x_i - \gamma)^{-1} + \frac{\delta}{\theta}\,\Sigma^{*}(x_i - \gamma)^{\delta-1} = 0 ,
\]
where \(\Sigma^{*}\) indicates summation over both complete and censored observations; i.e.,
\[
\Sigma^{*} x_i^{\delta} = \sum_{i=1}^{m} x_i^{\delta} + \sum_{j=1}^{k} c_j T_j^{\delta} , \quad \text{etc.}
\]
Calculation of the MMLE requires the simultaneous solution of the three-equation system consisting of the first two equations of (3.5) plus \([E(X_{1:n}) = x_{1:n}]\),
where E(X~:,) is given in (3.4). Additional details concerning these calculations can be found in Cohen (1991). Modified m o m e n t estimators For complete samples, modified moment estimators (MME) in a sample of size n that are valid for all values of 6 are 7 + flF1 = x, f12[F2 - F 2] = s 2, "~ -}- (fl/nl/6)F1
= Xl:n
(3.6) •
After a few simple algebraic manipulations, the three equations of (3.6) become

$$\frac{s^2}{(\bar{x} - x_{1:n})^2} = \frac{\Gamma_2 - \Gamma_1^2}{[(1 - n^{-1/\delta})\Gamma_1]^2} = W(n, \delta),$$

$$\hat{\gamma} = \frac{n^{1/\hat{\delta}} x_{1:n} - \bar{x}}{n^{1/\hat{\delta}} - 1}, \qquad (3.7)$$

$$\hat{\beta} = \frac{n^{1/\hat{\delta}}(\bar{x} - x_{1:n})}{(n^{1/\hat{\delta}} - 1)\,\Gamma_1}.$$

It is to be noted that when δ = 1 (so that Γ₁ = 1), the last two equations of (3.7) are identical to equations (2.8), which apply in the exponential distribution. With x̄, s², and x_{1:n} available from the sample data, the first equation of (3.7) can be solved for the estimate of the shape parameter, δ̂. Then γ̂ and β̂ follow from the last two equations. As aids to facilitate the solution of the first equation of (3.7), a chart and a table of the function W(n, δ) are reproduced here from Cohen, Whitten, and Ding (1984) as Figure 1 and Table 1, respectively. An extended version of the table is given in Cohen and Whitten (1988).
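The solution procedure just described can be sketched numerically (an illustration, not part of the original text); the root-finding bracket for δ and the simulated parameter values are assumptions:

```python
import numpy as np
from math import gamma
from scipy.optimize import brentq

def weibull_mme(x):
    """Modified moment estimates (gamma, beta, delta) via equations (3.7).
    A sketch: the search bracket for delta is an assumption."""
    x = np.asarray(x, dtype=float)
    n, xbar, s2, x1 = len(x), x.mean(), x.var(ddof=1), x.min()

    def W(d):  # first member of (3.7)
        g1, g2 = gamma(1 + 1 / d), gamma(1 + 2 / d)
        return (g2 - g1 ** 2) / ((1 - n ** (-1 / d)) * g1) ** 2

    d = brentq(lambda t: W(t) - s2 / (xbar - x1) ** 2, 0.3, 30.0)
    r = n ** (1 / d)
    gam = (r * x1 - xbar) / (r - 1)                       # second member
    beta = r * (xbar - x1) / ((r - 1) * gamma(1 + 1 / d)) # third member
    return gam, beta, d

# illustrative use on simulated data (parameter values are hypothetical)
rng = np.random.default_rng(7)
g_hat, b_hat, d_hat = weibull_mme(10.0 + 2.0 * rng.weibull(1.5, 5000))
```

In practice the bracket should enclose the root of the first equation; W(n, δ) is decreasing in δ, so a sign change over the bracket is easy to verify.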
4. T h e l o g n o r m a l distribution
If a random variable Y is distributed normally (μ, σ²), and if Y = ln(X − γ), then X is distributed lognormally with pdf

$$f(x; \gamma, \mu, \sigma^2) = \frac{1}{(x - \gamma)\,\sigma\sqrt{2\pi}} \exp\{-[\ln(x - \gamma) - \mu]^2 / 2\sigma^2\}, \qquad \gamma < x < \infty, \ \sigma^2 > 0,$$
$$= 0, \quad \text{elsewhere}. \qquad (4.1)$$
In this notation, γ is the threshold parameter, μ becomes the scale parameter, and σ becomes the shape parameter. However, it is often preferable to employ ω = exp(σ²) as the shape parameter and β = exp(μ) as the scale parameter. The expected value (mean), variance, and third and fourth standard moments of this distribution, as given by Yuan (1933), are
The role of order statistics in estimating threshold parameters
[Figure 1. Graphs of the Weibull estimation function W(n, δ) = (Γ₂ − Γ₁²)/[(1 − n^{−1/δ})Γ₁]². © 1996 American Society for Quality Control. Reprinted with Permission. Reprinted from Cohen, Whitten and Ding (1984).]
$$E(X) = \gamma + \beta\omega^{1/2},$$
$$V(X) = \beta^2[\omega(\omega - 1)], \qquad (4.2)$$
$$\alpha_3 = (\omega + 2)(\omega - 1)^{1/2},$$
$$\alpha_4 = \omega^4 + 2\omega^3 + 3\omega^2 - 3.$$
Table 1. The Weibull estimation function

$$W(n, \delta) = \frac{\Gamma_2 - \Gamma_1^2}{[(1 - n^{-1/\delta})\Gamma_1]^2}$$

δ \ n
5
10
15
20
25
0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.3 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 6.0 7.0 8.0
10.2276 5.4253 3.5613 2.6423 2.1174 1.7864 1.5625 1.1935 1.0647 0.8942 0.8126 0.7663 0.7370 0.7172 0.7031 0.6926 0.6783 0.6691 0.6629
9.9277 5.1015 3.2284 2.3075 1.7839 1.4555 1.2346 0.8737 0.7489 0.5844 0.5054 0.4601 0.4310 0.4109 0.3963 0.3852 0.3697 0.3593 0.3519
9.8877 5.0447 3.1597 2.2309 1.7023 1.3708 1.1480 0.7851 0.6603 0.4966 0.4185 0.3737 0.3451 0.3253 0.3109 0.3000 0.2846 0.2743 0.2669
9.87602 5.02509 3.13318 2.19917 1.66678 1.33266 1.10803 0.74259 0.61715 0.45329 0.37551 0.33113 0.30278 0.28325 0.26904 0.25826 0.24306 0.23290 0.22564
9.87129 5.01604 3.11992 2.18241 1.64730 1.31116 1.08507 0.71728 0.59121 0.42694 0.34927 0.30509 0.27693 0.25757 0.24349 0.23283 0.21781 0.20777 0.20061
δ \ n
30
35
40
50
60
0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.3 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 6.0 7.0 8.0
9.86898 5.01113 3.11225 2.17227 1.63514 1.29753 1.07015 0.70035 0.57368 0.40893 0.33127 0.28721 0.25920 0.23995 0.22599 0.21542 0.20055 0.19061 0.18353
9.86770 5.00817 3.10737 2.16557 1.62689 1.28794 1.05969 0.68817 0.56095 0.39571 0.31801 0.27403 0.24612 0.22697 0.21309 0.20260 0.18785 0.17800 0.17099
9.86693 5.00626 3.10405 2.16086 1.62096 1.28099 1.05194 0.67894 0.55123 0.38551 0.30775 0.26382 0.23598 0.21691 0.20310 0.19267 0.17802 0.16826 0.16131
9.86609 5.00400 3.09993 2,15477 1.61307 1.27155 1.04123 0.66581 0.53725 0.37067 0.29274 0.24886 0.22112 0.20217 0.18847 0.17814 0.16365 0.15401 0.14716
9.86569 5.00278 3.09753 2.15107 1.60809 1.26546 1.03419 0.65686 0.52760 0.36025 0.28215 0.23827 0.21061 0.19713 0.17811 0.16786 0.15349 0.14396 0.13719
δ \ n
70
80
90
100
150
0.4 0.5 0.6
9.86546 5.00204 3.09600
9.86532 5.00156 3.09496
9.86523 5.00123 3.09422
9.86517 5.00100 3.09367
9.86505 5.00044 3.09226
Table 1 (Contd.)

70
80
90
100
150
0.7 0.8 0.9 1.0 1.3 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 6.0 7.0 8.0
2.14861 1.60470 1.26121 1.02920 0.65033 0.52048 0.35246 0.27418 0.23029 0.20267 0.18386 0.17030 0.16010 0.14584 0.13638 0.12967
2.14688 1.60226 1.25809 1.02548 0.64533 0.51498 0.34636 0.26791 0.22400 0.19641 0.17764 0.16413 0.15398 0.13980 0.13041 0.12376
2.14561 1.60042 1.25570 1.02260 0.64137 0.51058 0.34142 0.26282 0.21888 0.19130 0.17257 0.15910 0.14899 0.13488 0.12555 0.11894
2.14464 1.59899 1.25382 1.02030 0.63814 0.50697 0.33733 0.25857 0.21460 0.18704 0.16834 0.15490 0.14482 0.13077 0.12149 0.11493
2.14202 1.59496 1.24836 1.01347 0.62807 0.49548 0.32399 0.24458 0.20044 0.17289 0.15428 0.14095 0.13098 0.11714 0.10803 0.10161
δ \ n
200
250
300
500
1000
0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.3 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 6.0 7.0 8.0
9.86501 5.00025 3.09170 2.14090 1.59313 1.24574 1.01008 0.62271 0.43919 0.31640 0.23651 0.19221 0.16464 0.14605 0.13278 0.12288 0.10916 0.10017 0.09384
9.86500 5.00016 3.09142 2.14029 1.59209 1.24422 1.00805 0.61933 0.48514 0.31138 0.23109 0.18665 0.15904 0.14048 0.12724 0.11738 0.10375 0.09484 0.08858
9.86499 5.00011 3.09126 2.13992 1.59144 1.24323 1.00670 0.61698 0.48228 0.30775 0.22713 0.18256 0.15492 0.13636 0.12315 0.11332 0.09976 0.09090 0.08470
9.86498 5.00004 3.09099 2.13928 1.59024 1.24132 1.00401 0.61197 0.47599 0.29942 0.21787 0.17292 0.14515 0.12658 0.11341 0.10365 0.09024 0.08153 0.07545
9.86498 5.00001 3.09086 2.13891 1.58946 1.23999 1.00200 0.60771 0.47036 0.29138 0.20860 0.16308 0.13508 0.11643 0.10328 0.09357 0.08031 0.07176 0.06582
© 1996 American Society for Quality Control. Reprinted with Permission. Reprinted from Cohen, Whitten and Ding (1984)

Modified moment estimating equations (MME), which employ the first order statistic, were given by Cohen, Whitten and Ding (1985) for an uncensored (complete) random sample of size n as

$$\gamma + \beta\omega^{1/2} = \bar{x},$$
$$\beta^2[\omega(\omega - 1)] = s^2, \qquad (4.3)$$
$$\gamma + \beta\exp[\sqrt{\ln\omega}\, E(Z_{1:n})] = x_{1:n},$$
Table 2. Expected values of the first order statistic in the standard normal distribution

n
E(Z_{1:n})
n
E(Z_{1:n})
n
E(Z_{1:n})
10 12 14 16 18 20 22 24 26 28 30 32
-1.53875 -1.62923 -1.70338 -1.76599 -1.82003 -1.86748 -1.90969 -1.94767 -1.98216 -2.01371 -2.04276 -2.06967
34 36 38 40 45 50 55 60 65 70 75 80
-2.09471 -2.11812 -2.14009 -2.16078 -2.20772 -2.24907 -2.28598 -2.31928 -2.34958 -2.37736 -2.40299 -2.42677
85 90 95 100 125 150 175 200 250 300 350 400
-2.44894 -2.46970 -2.48920 -2.50759 -2.58634 -2.64925 -2.70148 -2.74604 -2.81918 -2.87777 -2.92651 -2.96818
Extracted from Harter's (1961) tables. © 1996 American Society for Quality Control. Reprinted with Permission. Reprinted from Cohen, Whitten and Ding (1985)

where s² = Σ(xᵢ − x̄)²/(n − 1), and where Z_{1:n} is the first order statistic in a random sample of size n from a standard normal distribution (0, 1). After a few simple algebraic manipulations, the three equations of (4.3) become
$$\frac{s^2}{(\bar{x} - x_{1:n})^2} = \frac{\omega(\omega - 1)}{[\sqrt{\omega} - \exp\{\sigma E(Z_{1:n})\}]^2} = A(n, \sigma),$$

$$\hat{\gamma} = \bar{x} - s(\hat{\omega} - 1)^{-1/2}, \qquad (4.4)$$

$$\hat{\beta} = s[\hat{\omega}(\hat{\omega} - 1)]^{-1/2}.$$
The first equation of (4.4) can be solved for σ̂ (equivalently, ω̂ = exp(σ̂²)). Estimates of γ and β then follow from the second and third equations. When they are required, σ̂ and μ̂ follow as

$$\hat{\sigma} = (\ln\hat{\omega})^{1/2}, \quad \text{and} \quad \hat{\mu} = \ln\hat{\beta}. \qquad (4.5)$$

Values of E(Z_{1:n}) can be obtained from tables compiled by Harter (1961). Selected values were extracted from this source and are included as Table 2 in the paper by Cohen, Whitten and Ding (1985). This abridged table is reproduced here as Table 2 with permission of the American Society for Quality Control. As computational aids, a table and a chart of A(n, σ) from (4.4) are reproduced here from Cohen, Whitten, and Ding (1985) as Table 3 and Figure 2, also with permission from ASQC.
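In place of table interpolation, the same solution can be sketched numerically (an illustration, not from the text); E(Z_{1:n}) is computed by quadrature and the bracket for σ is an assumption:

```python
import numpy as np
from scipy import integrate, optimize, stats

def ez1n(n):
    """E(Z_{1:n}) for a standard normal sample (cf. Table 2):
    n * Integral of z * phi(z) * [1 - Phi(z)]**(n-1) dz."""
    f = lambda z: n * z * stats.norm.pdf(z) * stats.norm.sf(z) ** (n - 1)
    return integrate.quad(f, -np.inf, np.inf)[0]

def lognormal_mme(x):
    """Sketch of the MME of (4.4): solve A(n, sigma) = s^2/(xbar - x_{1:n})^2
    for sigma, then recover gamma and beta."""
    x = np.asarray(x, float)
    n, xbar, s, x1 = len(x), x.mean(), x.std(ddof=1), x.min()
    e1 = ez1n(n)

    def A(sig):
        w = np.exp(sig ** 2)
        return w * (w - 1) / (np.sqrt(w) - np.exp(sig * e1)) ** 2

    sig = optimize.brentq(lambda t: A(t) - s ** 2 / (xbar - x1) ** 2,
                          0.01, 3.0)
    w = np.exp(sig ** 2)
    return xbar - s / np.sqrt(w - 1), s / np.sqrt(w * (w - 1)), sig

# illustrative use on simulated data (gamma=5, beta=1, sigma=0.5 assumed)
rng = np.random.default_rng(3)
data = 5.0 + np.exp(0.5 * rng.standard_normal(2000))
g_hat, b_hat, sig_hat = lognormal_mme(data)
```

A(n, σ) is increasing in σ, so the bracket only needs to straddle the sample value of s²/(x̄ − x₁:ₙ)².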
5. The gamma distribution

The pdf of the three-parameter gamma distribution with threshold, scale, and shape parameters γ, β, ρ, respectively, is
$$f(x; \gamma, \beta, \rho) = \frac{1}{\Gamma(\rho)\beta^{\rho}}\,(x - \gamma)^{\rho - 1} \exp[-(x - \gamma)/\beta], \qquad \gamma < x < \infty, \ \beta > 0, \ \rho > 0,$$
$$= 0, \quad \text{elsewhere}. \qquad (5.1)$$
The expected value (mean), variance, and third standard moment of this distribution are

$$E(X) = \gamma + \rho\beta, \qquad V(X) = \rho\beta^2, \qquad \alpha_3 = 2/\sqrt{\rho}. \qquad (5.2)$$

Moment estimating equations based on a random sample of size n are

$$\hat{\gamma} + \hat{\rho}\hat{\beta} = \bar{x}, \qquad \hat{\rho}\hat{\beta}^2 = s^2, \qquad 2/\sqrt{\hat{\rho}} = a_3. \qquad (5.3)$$
Modified moment estimators that employ the first order statistic are obtained by replacing the third equation of (5.3) with an appropriate function of the first order statistic. In this case, we employ E[F(X_{1:n})] = 1/(n + 1) = F̂(x_{1:n}), and the resulting equations become

$$\hat{\gamma} + \hat{\rho}\hat{\beta} = \bar{x}, \qquad \hat{\rho}\hat{\beta}^2 = s^2, \qquad \hat{F}(x_{1:n}) = 1/(n + 1). \qquad (5.4)$$
When the pdf of (5.1) is standardized with α₃ as the shape parameter, it becomes

$$g(z; 0, 1, \alpha_3) = \frac{(2/\alpha_3)^{4/\alpha_3^2}}{\Gamma(4/\alpha_3^2)}\,(z + 2/\alpha_3)^{(4/\alpha_3^2) - 1} \exp\!\left[-\frac{2}{\alpha_3}(z + 2/\alpha_3)\right], \qquad -2/\alpha_3 < z < \infty,$$
$$= 0, \quad \text{elsewhere}, \qquad (5.5)$$

and the standardized distribution function becomes

$$G(z; 0, 1, \alpha_3) = \int_{-2/\alpha_3}^{z} g(t; 0, 1, \alpha_3)\, dt. \qquad (5.6)$$

With Ê(X) = x̄ and V̂(X) = s², we write z_{1:n} = (x_{1:n} − x̄)/s, and since E[F(X_{1:n})] = E[G(Z_{1:n})] = 1/(n + 1), the estimators of (5.4) become

$$G(z_{1:n}; 0, 1, \hat{\alpha}_3) = 1/(n + 1),$$
$$\hat{\beta} = s\hat{\alpha}_3/2, \qquad \hat{\gamma} = \bar{x} - 2s/\hat{\alpha}_3, \qquad \hat{\rho} = 4/\hat{\alpha}_3^2. \qquad (5.7)$$
The first equation of (5.7) can be solved for α̂₃. Although this equation could be solved directly for ρ̂, it seems preferable to consider α₃ as the primary shape parameter. With α̂₃ thus calculated, the remaining parameters follow from the other equations of (5.7).
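A numerical sketch of this procedure (an illustration, not from the text), using SciPy's gamma cdf for (5.5)-(5.6); the search bracket for α₃ and the simulated parameter values are assumptions:

```python
import numpy as np
from scipy import optimize, stats

def gamma_mme(x):
    """Sketch of the MME of (5.7): solve G(z_{1:n}; 0, 1, a3) = 1/(n + 1)
    for the shape a3 using the standardized gamma cdf of (5.5)-(5.6)."""
    x = np.asarray(x, float)
    n, xbar, s, x1 = len(x), x.mean(), x.std(ddof=1), x.min()
    z1 = (x1 - xbar) / s

    def G(a3):
        # standardized three-parameter gamma: shape 4/a3^2, scale a3/2,
        # threshold -2/a3, so that the mean is 0 and the variance is 1
        return stats.gamma.cdf(z1, a=4 / a3 ** 2, loc=-2 / a3, scale=a3 / 2)

    a3 = optimize.brentq(lambda t: G(t) - 1 / (n + 1), 0.05, 10.0)
    beta = s * a3 / 2                 # equations (5.7)
    gam = xbar - 2 * s / a3
    rho = 4 / a3 ** 2
    return gam, beta, rho, a3

# illustrative use: gamma data with rho=4, beta=2 (a3 = 2/sqrt(rho) = 1),
# threshold 10; all parameter values here are hypothetical
rng = np.random.default_rng(11)
g_hat, b_hat, rho_hat, a3_hat = gamma_mme(10.0 + rng.gamma(4.0, 2.0, 2000))
```

G(z₁:ₙ; 0, 1, α₃) is decreasing in α₃ for z₁:ₙ < 0, so the bracket needs only to straddle 1/(n + 1).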
Table 3. The lognormal estimating function

$$A(n, \sigma) = \frac{\omega(\omega - 1)}{[\sqrt{\omega} - \exp\{\sigma E(Z_{1:n})\}]^2}$$

σ \ n
10
15
20
25
30
35
40
50
60
0.01 0.02 0.03 0.04 0.05 0.06 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.94 0.95 0.96 0.97 0.98 0.99 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50
0.42615 0.43007 0.43411 0.43826 0.44253 0.44692 0.46574 0.49230 0.52257 0.55700 0.59609 0.64046 0.69079 0.74792 0.81279 0.88655 0.97050 1.06620 1.17548 1.30051 1.44387 1.60860 1.79834 1.97105 2.01746 2.06524 2.11446 2.16515 2.21736 2.27116 2.56573 2.90875 3.30941 3.77887 4.33073 4.98165 5.75206 6.66717 7.75811 9.06354
0.33575 0.33975 0.34386 0.34807 0.35239 0.35682 0.37572 0.40216 0.43212 0.46603 0.50444 0.54793 0.59723 0.65317 0.71670 0.78897 0.87129 0.96522 1.07261 1.19562 1.33683 1.49928 1.68661 1.85730 1.90319 1.95045 1.99914 2.04929 2.10097 2.15422 2.44598 2.78605 3.18361 3.64980 4.19823 4.84552 5.61211 6.52316 7.60980 8.91066
0.29061 0.29457 0.29864 0.30281 0.30708 0.31146 0.33011 0.35617 0.38565 0.41901 0.45678 0.49958 0.54811 0.60322 0.66587 0.73719 0.81852 0.91142 1.01773 1.13962 1.27966 1.44091 1.62700 1.79667 1.84230 1.88931 1.93774 1.98763 2.03904 2.09202 2.38244 2.72112 3.11725 3.58196 4.12886 4.77456 5.53950 6.44882 7.53366 8.83263
0.26272 0.26664 0.27065 0.27476 0.27897 0.28329 0.30168 0.32736 0.35643 0.38933 0.42660 0.46885 0.51682 0.57132 0.63334 0.70402 0.78467 0.87688 0.98248 1.10364 1.24295 1.40345 1.58879 1.75785 1.80333 1.85018 1.89845 1.94819 1.99945 2.05228 2.34191 2.67980 3.07513 3.53903 4.08509 4.72995 5.49401 6.40244 7.48634 8.78433
0.24341 0.24727 0.25123 0.25529 0.25945 0.26372 0.28186 0.30722 0.33593 0.36845 0.40532 0.44715 0.49467 0.54871 0.61026 0.68045 0.76061 0.85231 0.95741 1.07806 1.21686 1.37685 1.56168 1.73034 1.77571 1.82247 1.87064 1.92027 1.97143 2.02416 2.31329 2.65068 3.04551 3.50891 4.05447 4.69882 5.46236 6.37026 7.45362 8.75104
0.22906 0.23288 0.23679 0.24080 0.24491 0.24912 0.26706 0.29214 0.32055 0.35275 0.38928 0.43077 0.47793 0.53161 0.59278 0.66259 0.74237 0.83369 0.93840 1.05868 1.19710 1.35672 1.54119 1.70955 1.75486 1.80154 1.84964 1.89921 1.95029 2.00295 2.29174 2.62879 3.02328 3.48635 4.03159 4.67561 5.43883 6.34639 7.42941 8.72649
0.21786 0.22164 0.22551 0.22947 0.23354 0.23771 0.25546 0.28030 0.30845 0.34038 0.37663 0.41783 0.46470 0.51807 0.57894 0.64844 0.72792 0.81894 0.92335 1.04333 1.18145 1.34079 1.52499 1.69314 1.73840 1.78502 1.83307 1.88259 1.93362 1.98623 2.27477 2.61158 3.00584 3.46868 4.01369 4.65750 5.42050 6.32785 7.41064 8.70750
0.20131 0.20501 0.20881 0.21270 0.21669 0.22079 0.23823 0.26267 0.29039 0.32189 0.35769 0.39843 0.44483 0.49774 0.55813 0.62717 0.70618 0.79674 0.90070 1.02025 1.15796 1.31690 1.50071 1.66857 1.71375 1.76031 1.80829 1.85774 1.90870 1.96124 2.24945 2.58595 2.97992 3.44248 3.98722 4.63077 5.39352 6.30062 7.38317 8.67978
0.18946 0.19311 0.19685 0.20068 0.20461 0.20865 0.22584 0.24995 0.27735 0.30850 0.34395 0.38434 0.43038 0.48293 0.54298 0.61166 0.69034 0.78057 0.88422 1.00346 1.14089 1.29955 1.48311 1.65078 1.69592 1.74243 1.79036 1.83977 1.89069 1.94319 2.23119 2.56751 2.96131 3.42372 3.96833 4.61174 5.37437 6.28136 7.36380 8.66029
© 1996 American Society for Quality Control. Reprinted with Permission. Reprinted from Cohen, Whitten and Ding (1985)
Table 3 (Contd.)

σ \ n
70
80
90
100
150
200
300
400
0.01 0.02 0.03 0.04 0.05 0.06 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.94 0.95 0.96 0.97 0.98 0.99 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50
0.18044 0.18404 0.18773 0.19151 0.19539 0.19938 0.21637 0.24021 0.26733 0.29821 0.33338 0.37348 0.41924 0.47151 0.53127 0.59969 0.67810 0.76809 0.87150 0.99052 1.12773 1.28620 1.46958 1.63712 1.68223 1.72871 1.77661 1.82599 1.87688 1.92935 2.21723 2.55343 2.94713 3.40946 3.95399 4.59734 5.35991 6.26685 7.34924 8.64569
0.17327 0.17682 0.18047 0.18421 0.18805 0.19199 0.20881 0.23243 0.25932 0.28996 0.32489 0.36476 0.41029 0.46233 0.52187 0.59007 0.66827 0.75805 0.86128 0.98012 1.11718 1.27550 1.45875 1.62620 1.67128 1.71774 1.76562 1.81498 1.86585 1.91830 2.20610 2.54223 2.93587 3.39815 3.94264 4.58596 5.34852 6.25544 7.33782 8.63426
0.16738 0.17090 0.17451 0.17822 0.18202 0.18592 0.20259 0.22601 0.25270 0.28314 0.31788 0.35755 0.40287 0.45472 0.51408 0.58210 0.66013 0.74975 0.85282 0.97153 1.10845 1.26667 1.44982 1.61719 1.66226 1.70870 1.75657 1.80591 1.85677 1.90921 2.19694 2.53303 2.92664 3.38889 3.93337 4.57669 5.33924 6.24617 7.32856 8.62501
0.16243 0.16592 0.16950 0.17317 0.17694 0.18081 0.19734 0.22060 0.24712 0.27739 0.31195 0.35144 0.39660 0.44828 0.50748 0.57534 0.65323 0.74272 0.84566 0.96425 1.10108 1.25920 1.44228 1.60960 1.65465 1.70109 1.74895 1.79828 1.84913 1.90155 2.18925 2.52531 2.91889 3.38114 3.92562 4.56894 5.33151 6.23845 7.32087 8.61734
0.14576 0.14913 0.15259 0.15614 0.15979 0.16354 0.17959 0.20223 0.22812 0.25776 0.29170 0.33058 0.37515 0.42625 0.48489 0.55224 0.62964 0.71868 0.82122 0.93946 1.07598 1.23384 1.41671 1.58389 1.62892 1.67533 1.72316 1.77247 1.82330 1.87571 2.16334 2.49937 2.89298 3.35528 3.89984 4.54328 5.30598 6.21308 7.29566 8.59230
0.13582 0.13911 0.14249 0.14596 0.14953 0.15320 0.16892 0.19115 0.21664 0.24588 0.27942 0.31791 0.36209 0.41284 0.47115 0.53818 0.61529 0.70408 0.80640 0.92445 1.06082 1.21857 1.40135 1.56850 1.61352 1.65993 1.70776 1.75706 1.80789 1.86030 2.14795 2.48403 2.87771 3.34011 3.88480 4.52837 5.29123 6.19849 7.28124 8.57806
0.12385 0.12704 0.13031 0.13368 0.13715 0.14071 0.15601 0.17771 0.20267 0.23138 0.26441 0.30240 0.34612 0.39642 0.45432 0.52097 0.59776 0.68626 0.78834 0.90620 1.04243 1.20009 1.38283 1.54998 1.59501 1.64142 1.68926 1.73857 1.78942 1.84184 2.12958 2.46580 2.85964 3.32224 3.86715 4.51097 5.27408 6.18162 7.26465 8.56176
0.11654 0.11965 0.12286 0.12616 0.12956 0.13305 0.14807 0.16942 0.19403 0.22240 0.25510 0.29278 0.33620 0.38622 0.44387 0.51030 0.58689 0.67523 0.77719 0.89496 1.03113 1.18876 1.37152 1.53870 1.58374 1.63016 1.67801 1.72734 1.77820 1.83064 2.11848 2.45482 2.84882 3.31158 3.85667 4.50068 4.26400 6.17175 7.25499 8.55232
[Figure 2. Graphs of the lognormal estimation function A(n, σ) = ω(ω − 1)/[√ω − exp{σE(Z_{1:n})}]², where ω = exp(σ²). © 1996 American Society for Quality Control. Reprinted with Permission. Reprinted from Cohen, Whitten and Ding (1985)]
As computational aids, a table and a chart of α̂₃ as a function of z_{1:n} and n were given by Cohen and Whitten (1986). These are reproduced here as Table 4 and Figure 3 with permission from ASQC.

6. The Inverse Gaussian distribution
The Inverse Gaussian (IG) distribution is a positively skewed distribution that provides a useful alternative to the Weibull, lognormal, gamma, and other similar distributions as a model in various survival studies. When a threshold parameter is present, the pdf of this distribution can be written as

$$f(x; \gamma, \mu, \sigma) = \sqrt{\frac{\mu^3}{2\pi\sigma^2 (x - \gamma)^3}}\; \exp\!\left[-\frac{\mu\,(x - \gamma - \mu)^2}{2\sigma^2 (x - \gamma)}\right], \qquad \gamma < x < \infty,$$
$$= 0, \quad \text{elsewhere}, \qquad (6.1)$$

where μ > 0, σ > 0. The expected value (mean), variance, and shape parameter, α₃, are

$$E(X) = \gamma + \mu, \qquad V(X) = \sigma^2, \qquad \alpha_3 = 3\sigma/\mu. \qquad (6.2)$$
In modified moment estimators (MME) that feature the first order statistic, the third equation of (6.2) is replaced by E[F(X_{1:n})] = 1/(n + 1), where F(·) is the IG cumulative distribution function of the first order statistic. Since F(X_{1:n}) = G(Z_{1:n}), where Z_{1:n} = [X_{1:n} − E(X)]/σ and the corresponding sample value is z_{1:n} = (x_{1:n} − x̄)/s, the estimating equations become

$$\hat{\gamma} + \hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = s^2, \qquad G(z_{1:n}; 0, 1, \hat{\alpha}_3) = 1/(n + 1). \qquad (6.3)$$

The cdf of the standardized IG distribution (0, 1, α₃) can be expressed as the sum of two normal components as follows:

$$G(z; 0, 1, \alpha_3) = \Phi\!\left[\frac{z}{\sqrt{1 + (\alpha_3/3)z}}\right] + \exp(18/\alpha_3^2)\,\Phi\!\left[\frac{-(z + 6/\alpha_3)}{\sqrt{1 + (\alpha_3/3)z}}\right], \qquad (6.4)$$

where Φ(·) is the cdf of the standard normal distribution (0, 1). In this notation α₃ is now the shape parameter, and the MME of (6.3) become

$$\hat{\mu} = 3s/\hat{\alpha}_3, \qquad \hat{\sigma} = s, \qquad \hat{\gamma} = \bar{x} - 3s/\hat{\alpha}_3. \qquad (6.5)$$
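Equations (6.3)-(6.5) can also be solved numerically; this sketch (not from the text) evaluates (6.4) in log space to avoid overflow in exp(18/α₃²), and the bracket for α₃ and the simulated parameters are assumptions:

```python
import numpy as np
from scipy import optimize, stats

def standardized_ig_cdf(z, a3):
    """Equation (6.4): the standardized IG cdf as a sum of two normal
    components, with the second term computed in log space."""
    u = 1.0 + (a3 / 3.0) * z
    if u <= 0.0:
        return 0.0          # below the threshold of the standardized IG
    r = np.sqrt(u)
    log_term2 = 18.0 / a3 ** 2 + stats.norm.logcdf(-(z + 6.0 / a3) / r)
    return stats.norm.cdf(z / r) + np.exp(log_term2)

def ig_mme(x):
    """Sketch of the MME (6.3)/(6.5)."""
    x = np.asarray(x, float)
    n, xbar, s, x1 = len(x), x.mean(), x.std(ddof=1), x.min()
    z1 = (x1 - xbar) / s
    a3 = optimize.brentq(
        lambda t: standardized_ig_cdf(z1, t) - 1.0 / (n + 1), 0.05, 10.0)
    mu = 3.0 * s / a3              # equation (6.5)
    return xbar - mu, mu, s, a3    # (gamma, mu, sigma, alpha3)

# illustrative use: IG with mean mu=4, sigma=4/3 (alpha3=1), threshold 10;
# scipy's invgauss takes shape mu/lambda and scale lambda, lambda = mu^3/sigma^2
lam = 4.0 ** 3 / (4.0 / 3.0) ** 2          # = 36
data = 10.0 + stats.invgauss.rvs(4.0 / lam, scale=lam, size=2000,
                                 random_state=5)
g_hat, m_hat, s_hat, a3_hat = ig_mme(data)
```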
Table 4. α̂₃ as a function of z_{1:n} and n in the gamma distribution

10
20
25
30
40
50
100
250
500
2.4994 2.4382 2.3798 2.3240 2.2706
2.3250 2.2719
1000
-.50 -.52 -.54 -.56 -.58
3.9914 3.8327 3.6837 3.5428 3.4087
3.9993 3.8448 3.7013 3.5675 3.4423
3.4474
-.60 -.62 -.64 -.66 -.68
3.2804 3.3245 3.1567 3.2132 3.0367 3.1077 2.9195 3.0072 2.8044 2.9111
3.3319 3.2235 3.1216 3.0253 2.9342
3.3325 3.2245 3.1230 3.0273 2.9368
3.2250 3.1237 3.0283 2.9382
3.1243 3.0293 2.9396
3.0297 2.9402
-.70 -.72 -.74 -.76 -.78
2.6904 2.8188 2.5770 2.7299 2.4632 2.6438 2.3482 2.5602 2.2309 2.4787
2.8477 2.7652 2.6864 2.6109 2.5383
2.8511 2.7696 2.6919 2.6176 2.5464
2.8529 2.8548 2.7720 2.7744 2.6949 2.6981 2.6214 2.6254 2.5510 2.5560
2.8556 2.7756 2.6996 2.6274 2.5585
2.7772 2.7018 2.6303 2.5623
-.80 -.82 -.84 -.86 -.88
2.1104 1.9852 1.8535 1.7130 1.5602
2.4684 2.4008 2.3353 2.2716 2.2096
2.4779 2.4120 2.3483 2.2866 2.2268
2.4835 2.4186 2.3560 2.2956 2.2371
2.4896 2.4259 2.3646 2.3056 2.2487
2.4927 2.4296 2.3692 2.3110 2.2549
2.4975 2.4357 2.3766 2.3200 2.2656
-.90 -.92 -.94 -.96 -.98
1.3897 2.0175 2.1492 2.1685 2.1803 1.1919 1.9429 2.0900 2.1118 2.1250 .9453 1.8683 2.0320 2.0563 2.0712 .5713 1.7936 1.9751 2.0021 2.0187 1.7185 1.9191 1.9489 1.9673
2.1936 2.1401 2.0883 2.0378 1.9886
2.2008 2.2134 2.1484 2.1631 2.0977 2.1146 2.0484 2.0677 2.0005 2.0223
2.2194 2.2210 2.1703 2.1723 2.1231 2.1256 2.0777 2.0807 2.0339 2.0375
2.2217 2.1732 2.1267 2.0821 2.0392
2.3990 2.3208 2.2438 2.1677 2.0924
-1.00 -1.02 -1.04 -1.06 -1.08
1.6429 1.5664 1.4890 1.4103 1.3302
1.8638 1.8093 1.7554 1.7019 1.6489
1.8967 1.8453 1.7947 1.7447 1.6954
1.9169 1.8675 1.8190 1.7712 1.7241
1.9405 1.8935 1.8475 1.8023 1.7580
1.9538 1.9082 1.8636 1.8200 1.7773
1.9783 1.9356 1.8940 1.8536 1.8142
1.9917 1.9508 1.9113 1.8730 1.8358
1.9959 1.9558 1.9170 1.8795 1.8432
1.9980 1.9582 1.9199 1.8829 1.8472
-1.10 -1.12 -1.14 -1.16 -1.18
1.2483 1.1645 1.0783 .9895 .8976
1.5962 1.5438 1.4916 1.4396 1.3876
1.6465 1.5981 1.5501 1.5024 1.4550
1.6777 1.6318 1.5863 1.5413 1.4967
1.7144 1.6714 1.6291 1.5873 1.5460
1.7354 1.6942 1.6537 1.6137 1.5744
1.7757 1.7381 1.7012 1.6652 1.6298
1.7996 1.7645 1.7302 1.6968 1.6643
1.8080 1.7739 1.7407 1.7085 1.6771
1.8126 1.7791 1.7466 1.7150 1.6844
-1.20 -1.22 -1.24 -1.26 -1.28
.8022 .7028 .5985 .4887 .3720
1.3356 1.2836 1.2315 1.1793 1.1268
1.4078 1.3607 1.3139 1.2670 1.2203
1.4524 1.4084 1.3647 1.3211 1.2777
1.5052 1.4647 1.4246 1.3849 1.3454
1.5355 1.4972 1.4592 1.4217 1.3845
1.5951 1.5610 1.5274 1.4943 1.4618
1.6324 1.6013 1.5708 1.5409 1.5116
1.6465 1.6166 1.5875 1.5590 1.5311
1.6546 1.6256 1.5973 1.5697 1.5428
-1.30 -1.32 -1.34 -1.36 -1.38
.2471 .1117
1.0742 1.0212 .9679 .9142 .8601
1.1735 1.1267 1.0798 1.0329 .9858
1.2344 1.1913 1.1482 1.1051 1.0621
1.3062 1.2672 1.2285 1.1899 1.1514
1.3476 1.3111 1.2748 1.2387 1.2029
1.4297 1.3980 1.3667 1.3357 1.3051
1.4828 1.4545 1.4268 1.3994 1.3725
1.5038 1.4771 1.4508 1.4251 1.3998
1.5165 1.4908 1.4656 1.4409 1.4168
Table 4 (Contd.) 10
20
25
30
40
50
100
250
500
1000
- 1.40 -1.42 -1.44 -1.46 -1.48
.8055 .9385 1.0190 1.1131 1.1673 1.2749 .7504 .8911 .9759 1.0749 1.1318 1.2449 .6947 .8434 .9327 1.0367 1.0965 1.2152 .6384 .7954 .8895 .9987 1.0613 1.1857 .5814 .7472 .8461 .9606 1.0262 1.1565
1.3460 1.3198 1.2940 1.2685 1.2434
1.3750 1.3505 1.3265 1.3028 1.2795
1.3931 1.3698 1.3469 1.3245 1.3024
-1.50 -1.52 -1.54 -1.56 -1.58
.5237 .4652 .4059 .3456 .2844
.6987 .6498 .6006 .5509 .5008
.8026 .7589 .7150 .6710 ,6267
.9226 .8846 .8466 .8086 .7705
.9913 .9564 .9216 .8868 .8521
1.1275 1.0987 1.0701 1.0417 1.0135
1.2185 1.1940 1.1696 1.1456 1.1217
1.2565 1.2338 1.2114 1.1893 1.1674
1.2807 1.2593 1.2382 1.2174 1.1969
.2221 .1587 .0941
.4503 .3992 .3476 .2955 .2427
.5822 .5374 .4923 .4470 .4013
.7324 .6942 .6559 .6175 .5790
.8174 .7827 .7481 .7134 .6787
.8954 .9575 .9297 .9020 .8744
1.0981 1.0748 1.0516 1.0286 1.0058
1.1458 1.1245 1.1033 1.0824 1.0617
1.1767 1.1568 1.1371 1.1176 1.0984
.1893 .1352 .0804
.3552 .3088 .2620 .2148 .1672
.5404 .5017 .4628 .4237 .3845
.6440 .6092 .5744 .5395 .5046
.8469 .8196 .7923 .7651 .7380
.9831 1.0412 .9606 1.0209 .9383 1.0008 .9161 .9808 .8941 .9610
1.0793 1.0605 1.0419 1.0235 1.0052
.1190 .0704
.3451 .3055 .2657 .2257 .1854
.4696 .4345 .3993 .3640 .3286
.7109 .6839 .6570 .6301 .6032
.8722 .8504 .8287 .8072 .7857
.9414 .9219 .9025 .8833 .8642
.9872 .9693 .9515 .9339 .9165
.1449 .1042 .0631
.2931 .2574 .2216 .1857 .1496
.5764 .5496 .5228 .4961 .4694
.7644 .7431 .7220 .7009 .6799
.8453 .8992 .8264 .8820 .8077 .8649 .7891 .8480 .7706 .8312
.1134 .0769
.4427 .4160 .3893 .3625 .3358
.6590 .7522 .6382 .7339 .6174 .7157 .5967 .6975 .5761 .6795
.8145 .7979 .7814 .7651 .7488
-2.10 -2.12 -2.14 -2.16 -2.18
.3091 .2824 .2556 .2289 .2021
.5555 .5350 .5146 .4942 .4738
.6615 .6436 .6258 .6081 .5904
.7326 .7165 .7005 .6846 .6688
-2.20 -2.22 -2.24 -2.26 -2.28
.1753 .1484 .1216 .0946 .0677
.4535 .4332 .4130 .3928 .3726
.5728 .5553 .5378 .5204 .5030
.6530 .6373 .6217 .6062 .5907
-1.60 -1.62 -1.64 -1.66 -1.68
-1.70 -1.72 -1.74 -1.76 -1.78
-1.80 -1.82 -1.84 -1.86 -1.88
-1.90 -1.92 -1.94 -1.96 -1.98
-2.00 -2.02 -2.04 -2.06 -2.08
Table 4 (Contd.) 10
20
25
30
40
50
100
250
500
1000
-2.30 -2.32 -2.34 -2.36 -2,38
.3525 .3324 .3123 .2923 .2722
.4857 .4684 .4512 .4341 .4169
.5753 .5599 .5446 .5294 .5142
-2.40 -2.42 -2.44 -2.46 -2.48
.2522 .2323 .2123 .1924 .1724
.3999 .3828 .3658 .3489 .3319
.4991 .4840 .4690 .4541 .4392
© 1996 American Society for Quality Control. Reprinted with Permission. Reprinted from Cohen and Whitten (1986)
As computational aids, Cohen and Whitten (1985) provided a table and a chart of α̂₃ as a function of z_{1:n} and n. These are reproduced here with permission of ASQC as Table 5 and Figure 4. With x̄, s², and x_{1:n} available from sample data, α̂₃ can be obtained by inverse interpolation in Table 5, or it can be read from the graphs of Figure 4. Estimates γ̂, μ̂, and σ̂ then follow from (6.5).

7. Errors of estimates
Exact variances and the covariance of modified estimates of the exponential parameters were given in Section 2 of this paper. These, of course, are applicable for both small and large samples. Simulation studies by Cohen, Whitten, and Ding (1984, 1985), by Cohen and Whitten (1985, 1986), and by Chan, Cohen and Whitten (1984) revealed that variances and covariances of modified parameter estimates in the Weibull, lognormal, gamma, and inverse Gaussian distributions are closely approximated by the asymptotic variances and covariances of the corresponding maximum likelihood estimates. Thus, by using the MLE variances, approximate (1 − α)100% confidence intervals on a parameter θ can be calculated as θ̂ ± z_{α/2}√V(θ̂), where z is the standard normal variate. For a 95% confidence interval, z_{α/2} = 1.96. In moderately large samples (n > 50), the degree of approximation should be adequate for most practical purposes.
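The interval calculation above can be sketched as follows (the point estimate and variance below are hypothetical numbers, not taken from this chapter's tables):

```python
import math
from scipy import stats

# Hypothetical values: a point estimate and its asymptotic variance,
# e.g. V(delta_hat) assembled from a (7.1)-type factor.
theta_hat, v_theta = 2.31, 0.0145

z = stats.norm.ppf(0.975)                       # z_{alpha/2} for 95%
half = z * math.sqrt(v_theta)
lo, hi = theta_hat - half, theta_hat + half     # approximate 95% CI
```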
Weibull distribution

Asymptotic variances and covariances of MLE for Weibull parameters, valid when δ > 2, were given by Cohen and Whitten (1988) as

$$V(\hat{\gamma}) = (\beta^2/n)\phi_{11}, \qquad V(\hat{\delta}) = (\delta^2/n)\phi_{22}, \qquad V(\hat{\beta}) = (\beta^2/n)\phi_{33},$$
$$\mathrm{Cov}(\hat{\gamma}, \hat{\delta}) = (\beta\delta/n)\phi_{12}, \qquad \mathrm{Cov}(\hat{\gamma}, \hat{\beta}) = (\beta^2/n)\phi_{13}, \qquad \mathrm{Cov}(\hat{\delta}, \hat{\beta}) = (\beta\delta/n)\phi_{23}, \qquad (7.1)$$
[Figure 3. Graphs of α̂₃ as a function of z_{1:n} and n in the gamma distribution. © 1996 American Society for Quality Control. Reprinted with Permission. Reprinted from Cohen and Whitten (1986)]
Table 5. α̂₃ as a function of z_{1:n} and n in the inverse Gaussian distribution

5
l0
20
25
30
40
50
100
250
500
1000
-.30 -.32 -.34 -.36 -.38
9.5177 8.8591 8.2734 7.7480 7.2743
9.6810 9.0343 8.4607 7.9484 7.4879
9.7689 9.1284 8.5612 8.0551 7.6006
9.7886 9.1494 8.5833 8.0790 7.6259
9.8026 9.1642 8.5996 8.0958 7.6437
9.8215 9.1843 8.6212 8.1187 7.6677
9.8340 9.1978 8.6354 8.1340 7.6841
9.8645 9.2302 8.6696 8.1703 7.7222
9.8914 9.2588 8.7006 8.2025 7.7569
9.9058 9.2745 8.7175 8.2194 7.7757
9.9177 9.2868 8.7280 8.2333 7.7907
-.40 -.42 -.44 -.46 -.48
6.8441 6.4508 6.0891 5.7551 5.4454
7.0708 6.6906 6.3435 6.0243 5.7289
7.1902 6.8168 6.4763 6.1639 5.8758
7.2169 6.8448 6.5058 6.1948 5.9083
7.2356 6.8651 6.5268 6.2164 5.9312
7.2614 6.8918 6.5551 6.2465 5.9622
7.2784 6.9096 6.5736 6.2658 5.9829
7.3184 6.9524 6.6189 6.3128 6.0323
7.3550 6.9897 6.6576 6.3545 6.0752
7.3746 7.3906 7.0117 7.0253 6.6802 6.6964 6.3776 6.3961 6.0973 6.1180
-.50 -.52 -.54 -.56 -.58
5.1565 4.8865 4.6322 4.3924 4.1647
5.4559 5.6094 5.6434 5.6675 5.7000 5.7213 5.7729 5.8182 5.8423 5.2019 5.3623 5.3976 5.4234 5.4567 5.4792 5.5322 5.5797 5.6042 4.9642 5.1322 5.1694 5.1956 5.2309 5.2544 5.3093 5.3590 5.3864 4.7418 4.9174 4.9564 4.9834 5.0199 5.0445 5.1024 5.1540 5.1791 4.5324 4.7162 4.7563 4.7848 4.8224 4.8479 4.9086 4.9620 4.9887
-.60 -.62 -.64 -.66 -.68
3.9487 4.3359 3.7424 4.1499 3.5447 3.9730 3.3540 3.8058 3.1706 3.6466
4.5274 4.3492 4.1811 4.0222 3.8706
4.5684 4.5983 4.3922 4.4227 4.2260 4.2577 4.0688 4.1017 3.9190 3.9529
4.6375 4,6633 4.4635 4.4907 4.3004 4.3278 4.1452 4.1749 3.9991 4.0289
4.7266 4.5552 4.3946 4.2427 4.0999
4.7804 4.6129 4.4538 4.3043 4.1636
4.8109 4.6424 4.4844 4.3361 4.1965
4.8343 4.6650 4.5107 4.3615 4.2212
-.70 -.72 -.74 -.76 -.78
2.9920 2.8196 2.6493 2.4828 2.3179
3.4936 3.3482 3.2080 3.0745 2.9452
3.7279 3.5912 3.4611 3.3370 3.2175
3.7777 3.8126 3.6425 3.6793 3.5147 3.5523 3.3918 3.4312 3.2754 3.3145
3.8595 3.7277 3.6031 3,4833 3.3688
3.8914 3.7606 3.6351 3.5171 3.4035
3.9652 3.8361 3.7141 3.5985 3.4871
4.0290 3.9020 3,7818 3.6680 3.5601
4.0649 3.9407 3.8194 3.7046 3.5993
4.0928 3.9639 3.8495 3.7339 3.6313
-.80 -.82 -.84 -.86 -.88
2.1546 1.9894 1.8226 1.6510 1.4750
2.8200 2.7001 2.5829 2.4696 2.3600
3.1033 3.1631 2.9939 3.0546 2.8873 2.9515 2.7858 2.8516 2.6890 2.7563
3.2047 3.0977 2.9959 2.8981 2.8033
3.2601 3.1543 3.0552 2.9575 2.8660
3.2956 3.1914 3.0930 2.9977 2.9067
3.3814 3.2809 3.1835 3.0908 3.0007
3.4576 3.3568 3.2642 3.1726 3.0853
3.4958 3.3973 3.3069 3.2176 3.1291
3.5270 3.4277 3.3333 3.2497 3.1636
-.90 -.92 -.94 -.96 -.98
1.2859 1.0785 .8308 .4855
2.2525 2.5935 2.1482 2.5023 2.0457 2.4136 1.9447 2.3273 1.8452 2.2432
2.6639 2.5740 2.4881 2.4044 2.3228
2.7113 2.6233 2.5392 2.4572 2.3773
2.7755 2.6906 2.6064 2.5272 2.4485
2.8183 2.9162 2.7324 2.8340 2.6502 2.7553 2.5729 2.6785 2.4960 2.6049
3.0019 2.9221 2.8458 2.7699 2.6971
3.0447 3.0814 2.9670 3.0028 2.8926 2.9276 2.8213 2.8555 2.7474 2.7865
-1.02 -1.04 -1.06 -1.08
1.7471 2.1626 2.2446 1.6517 2.0839 2.1668 1.5562 2.0057 2.0922 1.4607 1.9306 2.0204 1.3676 1.8571 1.9489
2.2993 2.2257 2.1525 2.0821 2.0118
2.3731 2.3006 2.2311 2.1617 2.0949
2.4223 2.3501 2.2809 2.2130 2.1477
2.6300 2.6819 2.5656 2.6137 2.5010 2.5534 2.4390 2.4903 2.3767 2.4346
2.7147 2.6512 2.5901 2.5314 2.4749
-1.10 -1.12 -1.14 -1.16 -1.18
1.2732 1.1799 1.0854 .9897 .8929
1.9443 2.0306 1.8792 1.9663 1.8153 1.9043 1.7537 1.8446 1.6909 1.7846
2.3168 2.2615 2.2082 2.1521 2.1025
2.4204 2.3680 2.3079 2.2592 2.2122
-1.00
1.7851 1.7147 1.6456 1.5767 1.5102
1.8801 1.8127 1.7477 1,6815 1.6177
2.5343 2.4639 2.3988 2.3337 2.2710
2.0849 2.2107 2.0220 2.1502 1.9613 2.0942 1.9028 2.0379 1.8464 1.9836
2.3711 2.3196 2.2651 2.2126 2.1665
5.8621 5.6260 5.4073 5.1993 5.0105
Table 5 (Contd.) 5
10
20
25
30
40
50
100
250
500
1000
-1.20 -1.22 -1.24 -1.26 -1.28
.7928 .6940 .5900 .4810 .3674
1.4439 1.3798 1.3158 1.2539 1.1920
1.5561 1.4945 1.4349 1.3752 1.3175
1.6302 1.5717 1.5151 1.4583 1.4012
1.7290 1.6731 1.6169 1.5626 1.5101
1.7897 1.7349 1.6819 1.6286 1.5770
1.9311 1.8804 1.8292 1.7798 1.7319
2.0501 2.0039 1.9549 1.9075 1.8658
2.1130 2.0657 2.0199 1.9714 1.9328
2.1579 2.1142 2.0719 2.0225 1.9830
-1.30 -1.32 -1.34 -1.36 -1.38
.2412 1.1300 1.0680 1.0081 .9480 .8860
1.2617 1.2036 1.1474 1.0929 1.0363
1.3481 1.2926 1.2389 1.1869 1.1365
1.4592 1.4060 1.3584 1.3083 1.2598
1.5292 1.4788 1.4300 1.3827 1.3369
1.6855 1.6407 1.5972 1.5511 1.5103
1.8213 1.7783 1,7365 1.6921 1.6529
1.8872 1.8430 1.8082 1.7665 1.7261
1.9448 1.8997 1.8559 1.8135 1.7878
-1.40 -1.42 -1.44 -1.46 -1.48
.8259 .7676 .7074 .6490 .5888
.9852 .9282 .8766 .8229 .7707
1.0838 1.0328 .9832 .9351 .8849
1.2128 1.1635 1.1193 1.0727 1.0276
1.2887 1.2456 1.2002 1.1560 1,1132
1.4669 1.4249 1.3840 1.3443 1.3058
1.6111 1.5742 1.5384 1.5037 1.4628
1.6870 1.6489 1.6120 1.5761 1.5412
1.7476 1.7087 1.6708 1.6340 1.5983
-1.50 -1.52 -1.54 -1.56 -1.58
.5303 .4701 .4082 .3480 .2862
.7166 .6640 .6130 .5600 .5084
.8361 .7887 .7392 .6912 .6444
.9802 1.0716 1.2684 1.4300 1.5073 1.5776 .9376 1.0276 1.2320 1.3913 1.4744 1.5436 .8928 .9884 1.1932 1.3604 1.4423 1.5106 .8493 .9468 1.1554 1.3236 1.4111 1.4784 .8069 .9064 1.1220 1.2945 1.3807 1.4470
-1.60 -1.62 -1.64 -1.66 -1.68
.2196
.4551 .4032 .3527 .2973 .2433
.5957 .5483 .4990 .4542 .4043
.7625 .8671 1.0862 1.2596 1.3511 1.4165 .7225 .8257 1.0514 1.2320 1.3223 1.3868 .6803 .7853 1.0143 1.1988 1.2943 1.3579 .6393 .7460 .9845 1.1665 1.2606 1.3297 .5963 .7077 .9492 1.1412 1.2279 1.3022
© 1996 American Society for Quality Control. Reprinted with permission. Reprinted from Cohen and Whitten (1985).

where the φij are tabulated in Table 6. This table was originally included in Cohen and Whitten (1988). It is reproduced here with permission from the publisher, Marcel Dekker.
Lognormal distribution

Asymptotic variances and covariances of local maximum likelihood estimators (LMLE) of the lognormal parameters were given by Cohen (1951) as

V(γ̂) ≈ β²σ²H/(nω) = (σ²β²/n) φ11 ,
V(β̂) ≈ (β²σ²/n)[1 + H] = (σ²β²/n) φ22 ,
V(σ̂) ≈ (σ²/n)[1 + 2σ²H] = (σ²/n) φ33 ,
Cov(γ̂, β̂) ≈ −σ³β²H/(n√ω) = (σ²β²/n) φ12 ,
Cov(γ̂, σ̂) ≈ σ³β²H/(n√ω) = (σ²β²/n) φ13 ,
Cov(σ̂, β̂) ≈ −σ³β²H/n = (σ²β²/n) φ23 ,   (7.2)

where ω = exp(σ²) and

H(σ) = [ω(1 + σ²) − (1 + 2σ²)]⁻¹ .   (7.3)
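The φ-factors in (7.2) can be checked numerically against Table 7, whose rows are indexed by the skewness α3. The sketch below recovers ω from α3 via the standard lognormal relation α3 = (ω + 2)√(ω − 1) and then evaluates the factors; the function name and the bisection bracket are my own choices, not part of the original text.

```python
import math

def lognormal_vcov_factors(a3, tol=1e-12):
    """Evaluate the factors of (7.2)-(7.3) for a given skewness a3.
    omega = exp(sigma^2) is solved from a3 = (omega + 2)*sqrt(omega - 1)
    by bisection (bracket (1, 10) is an assumption that covers the table)."""
    lo, hi = 1.0 + 1e-12, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (mid + 2) * math.sqrt(mid - 1) < a3:
            lo = mid
        else:
            hi = mid
    w = (lo + hi) / 2                      # omega
    s2 = math.log(w)                       # sigma^2
    H = 1.0 / (w * (1 + s2) - (1 + 2 * s2))   # H(sigma) of (7.3)
    return {
        "omega": w,
        "phi11": H / w,
        "phi22": 1 + H,
        "phi33": 1 + 2 * s2 * H,
        "phi12": -math.sqrt(s2) * H / math.sqrt(w),
    }

# Compare with the alpha_3 = 8.00 row of Table 7:
f = lognormal_vcov_factors(8.0)
print(f["omega"], f["phi11"], f["phi22"], f["phi33"], f["phi12"])
```

The computed values agree with the tabulated row (ω ≈ 3.28840, φ11 ≈ 0.07956, φ22 ≈ 1.26164, φ33 ≈ 1.62290, φ12 ≈ −0.15742) to tabulation accuracy.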
A. C. Cohen
Fig. 4. Graphs of α3 as a function of z1:n and n in the inverse Gaussian distribution. © American Society for Quality Control. Reprinted with permission. Reprinted from Cohen and Whitten (1985).
Table 6 Variance-covariance factors for maximum likelihood estimates of Weibull parameters*
α3
δ
φ11
φ22
φ33
φ12
φ13
φ23
0.05 0.06 0.07 0.08 0.09
3.40325 3.36564 3.32873 3.29249 3.25691
1.81053 1.73452 1.66133 1.59085 1.52298
3.11756 3.02575 2.93695 2.85105 2.76792
2.11502 2.03759 1.96301 1.89116 1.82193
-7.25440 -6.89240 -6.54777 -6.21964 -5.90715
1.91206 1.83425 1.75926 1.68696 1.61727
3.64075 3.36154 3.10390 2.86620 2.64695
0.10 0.11 0.12 0.13 0.14
3.22197 3.18766 3.15397 3.12087 3.08836
1.45760 1.39464 1.33400 1.27559 1.21932
2.68748 2.60962 2.53424 2.46125 2.39057
1.75523 1.69097 1.62905 1.56939 1.51190
-5.60953 -5.32601 -5.05590 -4.79851 -4.55323
1.55009 1.48531 1.42284 1.36261 1.30452
2.44475 2.25832 2.08648 1.92813 1.78226
0.15 0.16 0.17 0.18 0.19
3.05642 3.02503 2.99419 2.96388 2.93409
1.16513 1.11292 1.06263 1.01419 0.96753
2.32212 2.25580 2.19156 2.12930 2.06897
1.45651 1.40313 1.35170 1.30215 1.25440
-4.31945 -4.09661 -3.88417 -3.68162 -3.48850
1.24849 1.19446 1.14235 1.09208 1.04359
1.64794 1.52430 1.41054 1.30591 1.20974
0.20 0.21 0.22 0.23 0.24
2.90481 2.87603 2.84773 2.81990 2.79254
0.92259 0.87930 0.83760 0.79744 0.75876
2.01049 1.95381 1.89884 1.84555 1.79887
1.20840 1.16409 1.12140 1.08027 1.04066
-3.30433 -3.12869 -2.96118 -2.80141 -2.64900
0.99681 0.95169 0.90817 0.86618 0.82567
1.12138 1.04024 0.96579 0.89752 0.83495
0.25 0.26 0.27 0.28 0.29
2.76563 2.73917 2.71314 2.68753 2.66234
0.72150 0.68563 0.65109 0.61783 0.58581
1.74374 1.69511 1.64794 1.60216 1.55774
1.00251 0.96578 0.93040 0.89635 0.86357
-2.50361 -2.36491 -2.23260 -2.10636 -1.98591
0.78660 0.74890 0.71254 0.67746 0.64363
0.77766 0.72525 0.67734 0.63359 0.59367
0.30 0.31 0.32 0.33 0.34
2.63756 2.61317 2.58918 2.56557 2.54233
0.55498 0.52530 0.49674 0.46926 0.44281
1.51463 1.47279 1.43217 1.39273 1.35444
0.83201 0.80165 0.77244 0.74434 0.71732
-1.87100 -1.76136 -1.65675 -1.55694 -1.46171
0.61099 0.57950 0.54914 0.51985 0.49161
0.55730 0.52420 0.49412 0.46681 0.44206
0.35 0.36 0.37 0.38 0.39
2.51946 2.49694 2.47478 2.45296 2.43148
0.41736 0.39289 0.36934 0.34670 0.32494
1.31726 1.28115 1.24608 1.21201 1.17891
0.69133 0.66636 0.64236 0.61930 0.59716
-1.37085 -1.28417 -1.20147 -1.12258 -1.04732
0.46437 0.43811 0.41279 0.38837 0.36483
0.41967 0.39946 0.38123 0.36485 0.35014
0.40 0.41 0.42 0.43 0.44
2.41032 2.38950 2.36899 2.34879 2.32889
0.30401 0.28390 0.26458 0.24601 0.22819
1.14675 1.11551 1.08514 1.05564 1.02695
0.57590 0.55550 0.53593 0.51716 0.49917
-0.97554 -0.90707 -0.84178 -0.77951 -0.72014
0.34215 0.32028 0.29921 0.27891 0.25935
0.33699 0.32525 0.31482 0.30557 0.29741
0.45 0.50 0.55 0.60 0.63
2.30929 2.21560 2.12856 2.04757 2.00166
0.21107 0.13533 0.07410 0.02514 0.00083
0.99907 0.87087 0.75933 0.66209 0.60978
0.48193 0.40630 0.34639 0.30002 0.27791
-0.66353 -0.41794 -0.22546 -0.07555 -0.00249
0.24050 0.15627 0.08681 0.02991 0.00100
0.29025 0.26648 0.25688 0.25552 0.25695
*Valid only if δ > 2. Reprinted from Cohen and Whitten (1988), p. 48 by courtesy of Marcel Dekker, Inc.
Although these results are not strictly applicable to the MME, simulation studies by Cohen and Whitten (1980) and by Cohen, Whitten and Ding (1985) indicated that the asymptotic variances were in reasonably close agreement with the corresponding simulated variances of the MME. Variance-covariance factors are reproduced here from Cohen and Whitten (1985) as Table 7 with permission from Marcel Dekker.

Gamma distribution
Asymptotic variances and covariances of maximum likelihood estimators of gamma distribution parameters were given by Cohen and Whitten (1986) as
V(γ̂) = (σ²/n) a11 ,   Cov(γ̂, β̂) = (σ²/n) a12 ,   V(β̂) = (σ²/n) a22 ,
V(ρ̂) = (σ²/(nβ²)) a33 ,   Cov(γ̂, ρ̂) = (σ²/(nβ)) a13 ,   Cov(β̂, ρ̂) = (σ²/(nβ)) a23 .   (7.4)
A table of the variance-covariance factors as functions of the shape parameter is reproduced here as Table 8 with permission from ASQC. Simulation studies disclose that MLE variances are in reasonably close agreement with the corresponding simulated variances of the MME.

Inverse Gaussian distribution
As in the case of the Weibull, lognormal, and gamma distributions, simulation studies of modified estimators of inverse Gaussian parameters indicate that variances of the MLE are in close agreement with corresponding simulated variances. Asymptotic variances and covariances of the MLE as given by Cohen and Whitten (1985) are
Var(γ̂) = (σ²/n) φ11 ,   Cov(γ̂, μ̂) = (σ²/n) φ12 ,   Var(μ̂) = (σ²/n) φ22 ,
Cov(γ̂, σ̂) = (σ²/n) φ13 ,   Cov(μ̂, σ̂) = (σ²/n) φ23 ,   Var(σ̂) = (σ²/n) φ33 ,   (7.5)
where
φ11 = 2/D ,   φ12 = −φ11 ,   φ22 = φ11 + 1 ,
φ13 = (α3³/9)/D ,   φ33 = (BC − E²)/D ,   φ23 = −α3(C − AE)/D ,

with

A = (α3²/9) + 1 ,   B = (α3²/2) + 1 ,   C = (7Aα3⁴/54) + B ,   E = (α3²A/2) + 1 ,
D = 2(BC − E²) + α3²(2AE − A²B − C) = 2(C − 1) − α3²A² .   (7.6)
A table of the variance-covariance factors, φij, is reproduced here from Cohen and Whitten (1985), with permission of ASQC, as Table 9.
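Since (7.6) gives the factors in closed form, they can also be evaluated directly rather than read from Table 9; a minimal Python sketch (the function name is mine):

```python
def ig_vcov_factors(a3):
    """Variance-covariance factors (7.6) for the three-parameter
    inverse Gaussian MLE, as functions of the skewness alpha_3."""
    A = a3**2 / 9 + 1
    B = a3**2 / 2 + 1
    C = 7 * A * a3**4 / 54 + B
    E = a3**2 * A / 2 + 1
    D = 2 * (C - 1) - a3**2 * A**2   # = 2(BC - E^2) + a3^2 (2AE - A^2 B - C)
    phi11 = 2 / D
    return {
        "phi11": phi11,
        "phi12": -phi11,
        "phi22": phi11 + 1,
        "phi13": (a3**3 / 9) / D,
        "phi33": (B * C - E**2) / D,
        "phi23": -a3 * (C - A * E) / D,
    }

# Spot check against the alpha_3 = 1.50 row of Table 9:
f = ig_vcov_factors(1.5)
print(round(f["phi11"], 5), round(f["phi33"], 5),
      round(f["phi13"], 5), round(f["phi23"], 5))
# 5.33333 1.25 1.0 0.25
```

The printed values reproduce the tabulated row (φ11 = 5.33333, φ33 = 1.25000, φ13 = 1.00000, φ23 = 0.25000).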
Table 7 Variance-Covariance factors for maximum likelihood estimates of lognormal parameters
α3
ω
φ11
φ22
φ33
φ12
φ23
0.50 0.55 0.60 0.65 0.70
1.02728 1.03289 1.03898 1.04555 1.05258
885.23263 607.65422 431.38171 315.03444 235.69195
910.38125 628.63763 449.19715 330.38447 249.08577
49.95010 41.61692 35.27801 30.34403 26.42824
-147.19413 -111.08790 -85.98520 -67.98669 -54.74136
-149.18831 -112.89975 -87.64504 -69.51786 -56.16220
0.75 0.80 0.85 0.90 0.95
1.06007 1.06799 1.07634 1.08510 1.09426
180.05004 140.07179 110.73199 88.79275 72.11193
191.86570 150.59584 120.18559 97.34930 79.90929
23.26838 20.68148 18.53674 16.73867 15.21622
-44.77401 -37.12694 -31.15990 -26.43369 -22.64022
-46.09919 -38.36839 -32.32744 -27.53552 -23.68324
1.00 1.05 1.10 1.15 1.20
1.10380 1.11372 1.12398 1.13460 1.14554
59.23866 49.16952 41.19733 34.81515 29.65392
66.38784 55.76087 47.30516 40.50113 34.96965
13.91565 12.79571 11.82430 10.97615 10.23114
-19.55896 -17.02923 -14.93205 -13.17803 -11.69914
-20.54904 -17.97141 -15.83068 -14.03690 -12.52156
1.25 1.30 1.35 1.40 1.45
1.15679 1.16835 1.18020 1.19233 1.20472
25.44114 21.97300 19.09523 16.68980 14.66545
30.43014 26.67221 23.53622 20.89971 18.66773
9.57311 8.98892 8.46783 8.00100 7.58107
-10.44293 -9.36858 -8.44393 -7.64347 -6.94674
-11.23182 -10.12653 -9.17324 -8.34620 -7.62472
1.50 1.55 1.60 1.65 1.70
1.21736 1.23025 1.24336 1.25669 1.27023
12.95095 11.49026 10.23888 9.16122 8.22860
16.76599 15.13585 13.73062 12.51284 11.45225
7.20190 6.85830 6.54590 6.26099 6.00038
-6.33721 -5.80144 -5.32841 -4.90903 -4.53575
-6.99210 -6.43475 -5.94150 -5.50313 -5.11201
1.75 1.80 1.85 1.90 1.95
1.28397 1.29790 1.31200 1.32628 1.34071
7.41778 6.70977 6.08899 5.54257 5.05983
10.52422 9.70861 8.98879 8.35100 7.78379
5.76133 5.54149 5.33881 5.15152 4.97805
-4.20229 -3.90336 -3.63449 -3.39191 -3.17240
-4.76172 -4.44691 -4.16304 -3.90627 -3.67330
2.00 2.25 2.50 2.75 3.00
1.35530 1.43024 1.50791 1.58758 1.66869
4.63185 3.08881 2.16869 1.58683 1.20085
7.27756 5.41775 4.27019 3.51921 3.00385
4.81705 4.16174 3.68628 3.32880 3.05208
-2.97322 -2.20975 -1.70671 -1.35930 -1.11001
-3.46134 -2.64270 -2.09578 -1.71271 -1.43389
3.25 3.50 3.75 4.00 4.25
1.75079 1.83355 1.91669 2.00000 2.08331
0.93447 0.74441 0.60491 0.50000 0.41942
2.63607 2.36492 2.15943 2.00000 1.87378
2.83263 2.65498 2.50865 2.38629 2.28263
-.92535 -.78485 -.67550 -.58871 -.51863
-1.22440 -1.06276 -.93519 -.83255 -.74858
4.50 5.00 6.00 7.00 8.00
2.16650 2.33211 2.65871 2.97764 3.28840
0.35637 0.26578 0.16333 0.11031 0.07956
1.77207 1.61983 1.43425 1.32847 1.26164
2.19380 2.04972 1.84925 1.71682 1.62290
-.46121 -.37349 -.26335 -.19884 -.15742
-.67886 -.57037 -.42941 -.34311 -.28546
φ13 ≈ −φ12. Reprinted from Cohen and Whitten (1988), p. 65 by courtesy of Marcel Dekker, Inc.
Table 8 Variance-covariance factors for maximum likelihood estimates of gamma distribution parameters
α3
ρ
a11
a22
a33
a12
a13
a23
.30 .35 .40 .45 .50
44.4444 32.6531 25.0000 19.7531 16.0000
2662.9967 1381.8510 773.5243 457.9562 282.8694
1.4777 1.4697 1.4605 1.4502 1.4387
11070.8895 5828.5582 3318.9110 2004.3848 1266.7474
61.7633 44.1110 32.6715 24.8465 19.2675
-5408.0304 -2822.2097 -1590.3115 -948.7509 -591.1488
-127.4143 -92.0694 -69.1440 -53.4415 -42.2249
.55 .60 .65 .70 .75
13.2231 11.1111 9.4675 8.1633 7.1111
180.4944 118.0880 78.7503 53.2735 36.4083
1.4262 1.4127 1.3981 1.3826 1.3662
829.7439 559.2873 385.8140 271.2024 193.5812
15.1580 12.0509 9.6514 7.7662 6.2640
-380.9304 -251.9866 -170.1249 -116.6711 -80.9522
-33.9415 -27.6571 -22.7822 -18.9301 -15.8385
.80 .85 .90 .95 1.00
6.2500 5.5363 4.9383 4.4321 4.0000
25.0460 17.2843 11.9260 8.1991 5.5950
1.3490 1.3310 1.3125 1.2933 1.2738
139.9051 102.1284 75.1431 55.6236 41.3553
5.0532 4.0683 3.2614 2.5968 2.0475
-56.6283 -39.8075 -28.0315 -19.7084 -13.7851
-13.3243 -11.2567 -9.5401 -8.1034 -6.8925
1.05 1.10 1.15 1.20 1.25
3.6281 3.3058 3.0246 2.7778 2.5600
3.7725 2.4987 1.6128 1.0021 .5869
1.2539 1.2338 1.2137 1.1936 1.1738
30.8345 23.0220 17.1884 12.8143 9.5253
1.5927 1.2161 .9047 .6484 .4386
-9.5510 -6.5188 -4.3493 -2.8031 -1.7097
-5.8663 -4.9922 -4.2449 -3.6040 -3.0530
1.30 1.35 1.40
2.3669 2.1948 2.0408
.3104 .1318 .0219
1.1544 1.1356 1.1174
7.0486 5.1833 3.7801
.2685 .1324 .0256
-.9459 -.4225 -.0741
-2.5784 -2.1691 -1.8159
Valid only if ρ > 2. © 1996 American Society for Quality Control. Reprinted with permission. Reprinted from Cohen and Whitten (1986).
8. Illustrative examples

Example 1

This example is from McCool (1974). The data consist of fatigue life in hours of ten bearings of a certain type. The sample observations, listed in increasing order of magnitude, are

152.7, 172.0, 172.5, 173.3, 193.0, 204.7, 216.5, 234.9, 262.6, 422.6.

In summary, n = 10, x̄ = 220.4804, s² = 6147.6733, s = 78.407, a3 = 1.8636, x1:10 = 152.7, and s²/(x̄ − x1:10)² = 1.33814. If we consider this sample to be from a Weibull distribution, we find δ̂ = 0.95 (approximately 1) as the estimate for δ. This suggests that perhaps the exponential distribution is the appropriate model. Furthermore, with a3 = 1.8636 (approximately 2), it seems reasonable to conclude that this sample is from a two-parameter exponential distribution, and we employ equations (2.8) to calculate estimates as
Table 9 Variance-Covariance factors for m a x i m u m likelihood estimates of inverse Gaussian parameters
0~3
(911
0.50 0.55 0.60 0.65 0.70
777.60000 520.18733 359.19540 254.68604 184.68582
0.60000 0.62007 0.64172 0.66491 0.68956
5.40000 4.80812 4.31034 3.88573 3.51929
5.15000 4.53312 4.01034 3.56073 3.16929
0.75 0.80 0.85 0.90 0.95
136.53333 102.64044 78.30306 60.51803 47.31804
0.71563 0.74304 0.77177 0.80176 0.83298
3.20000 2.91955 2.67155 2.45098 2.25385
2.82500 2.51955 2.24655 2.00098 1.77885
1.00 1.05 1.10 1.15 1.20
37.38462 29.81606 23.98443 19.44521 15.87907
0.86538 0.89895 0.93364 0.96945 1.00634
2.07692 1.91755 1.77352 1.64299 1.52439
1.57692 1.39255 1.22352 1.06799 0.92439
1.25 1.30 1.35 1.40 1.45
13.05348 10.79709 8.98215 7.51246 6.31489
1.04431 1.08335 1.12344 1.16458 1.20677
1.41639 1.31784 1.22775 1.14523 1.06954
0.79139 0.66784 0.55275 0.44523 0.34454
1.50 1.55 1.60 1.65 1.70
5.33333 4.52442 3.85435 3.29660 2.83020
1.25000 1.29427 1.33958 1.38594 1.43335
1.00000 0.93602 0.87708 0.82271 0.77249
0.25000 0.16102 0.07708 -0.00229 -0.07751
1.75 1.80 1.85 1.90 1.95
2.43851 2.10821 1.82858 1.59098 1.38836
1.48180 1.53131 1.58188 1.63352 1.68622
0.72605 0.68306 0.64322 0.60625 0.57192
-0.14895 -0.21694 -0.28178 -0.34375 -0.40308
2.00 2.25 2.50 2.75 3.00
1.21500 0.64831 0.36593 0.21650 0.13333
1.74000 2.02524 2.33824 2.67964 3.05000
0.54000 0.41026 0.31765 0.25014 0.20000
-0.46000 -0.71474 -0.93235 -1.12486 -1.30000
3.25 3.50 3.75 4.00 4.25
0.08500 0.05584 0.03766 0.02601 0.01833
3.44977 3.87931 4.33890 4.82877 5.34909
0.16210 0.13300 0.11034 0.09247 0.07819
-1.46290 -1.61700 -1.76466 -1.90753 -2.04681
4.50 5.00 6.00 7.00 8.00
0.01317 0.00713 0.00245 0.00099 0.00045
5.90000 7.09404 9.85294 13.10854 16.86226
0.06667 0.04954 0.02941 0.01882 0.01274
-2.18333 -2.45046 -2.97059 -3.48118 -3.98726
(922 = q~. + 1
(933
(913
(923
(912 = - ( 9 .
Reprinted from Cohen and Whitten (1988), p. 82 by courtesy of Marcel Dekker, Inc.
A. C. Cohen
310
γ̂ = [10(152.7) − 220.4804]/9 = 145.17 ,   β̂ = [10(220.4804 − 152.7)]/9 = 75.31 .

From (2.9), we calculate the estimate variances as V(γ̂) = 63.02 and V(β̂) = 630.2, and approximate 95% confidence intervals become

129.6 < γ < 152.7   and   26.1 ≤ β ≤ 124.5 .
Note that the upper confidence limit on γ is the first order statistic of the sample.
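The point estimates above can be reproduced directly from the data; in the sketch below, the general form of the estimators is inferred from the substitutions shown (equations (2.8) themselves are not reproduced in this excerpt):

```python
# Two-parameter exponential estimates for Example 1 (McCool bearing data):
#   gamma_hat = [n*x_(1) - xbar]/(n - 1),  beta_hat = n*(xbar - x_(1))/(n - 1)
x = [152.7, 172.0, 172.5, 173.3, 193.0, 204.7, 216.5, 234.9, 262.6, 422.6]
n = len(x)
xbar = sum(x) / n          # 220.4804
x1 = min(x)                # first order statistic, 152.7
gamma_hat = (n * x1 - xbar) / (n - 1)
beta_hat = n * (xbar - x1) / (n - 1)
print(round(gamma_hat, 2), round(beta_hat, 2))   # 145.17 75.31
```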
Example 2

This example consists of progressively censored survival data assumed to be from an exponential population. It was originally presented by Freireich et al. (1963), was subsequently considered by Gehan (1970), and was later used as an illustration in a graphical hazard function analysis by Gross and Clark (1975). The data consist of ordered times to remission, in weeks, of twenty-one leukemia patients who had undergone a chemotherapy treatment to maintain previously induced remissions. The sample data, which include 9 complete and 12 censored observations, are tabulated below.

Time to remission of 21 patients

Remission times in weeks, xi: 6, 6, 6, 7, 10, 13, 16, 22, 23

Censoring time Tj:    6   9  10  11  17  19  20  25  32  34  35
Number censored cj:   1   1   1   1   1   1   1   1   2   1   1

n = 21, m = 9, T1 = x1:21 = 6, Σ xi = 109, Σ cjTj = 250, ST = 109 + 250 = 359.

Substitution in equations (2.15) yields the estimates

γ̂ = [9(6) − 359/21]/(9 − 1) = 4.61 ,   β̂ = [359 − 21(6)]/(9 − 1) = 29.13 .
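The substitutions into (2.15) can be replayed in a few lines; the grouping of the censoring counts cj by censoring time Tj follows the tabulated data, and the general form of the estimators is inferred from the printed substitution:

```python
# Example 2: two-parameter exponential under progressive censoring.
#   gamma_hat = [m*T1 - ST/n]/(m - 1),  beta_hat = [ST - n*T1]/(m - 1)
x_obs = [6, 6, 6, 7, 10, 13, 16, 22, 23]            # complete remission times
censored = {6: 1, 9: 1, 10: 1, 11: 1, 17: 1, 19: 1,
            20: 1, 25: 1, 32: 2, 34: 1, 35: 1}      # T_j: c_j
n = len(x_obs) + sum(censored.values())             # 21 patients
m = len(x_obs)                                      # 9 complete observations
T1 = min(x_obs)                                     # x_{1:21} = 6
ST = sum(x_obs) + sum(c * T for T, c in censored.items())   # 109 + 250 = 359
gamma_hat = (m * T1 - ST / n) / (m - 1)
beta_hat = (ST - n * T1) / (m - 1)
print(round(gamma_hat, 3), round(beta_hat, 3))      # 4.613 29.125
```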
The estimated mean remission time based on a two-parameter exponential distribution is then E(X) = γ̂ + β̂ = 4.61 + 29.13 = 33.74 weeks. If we assume the origin to be at zero, as did Gross and Clark, the estimate of the mean based on a one-parameter exponential distribution becomes E(X) = β̂ = ST/m = 359/9 = 39.89 weeks. With γ assumed to be zero, Gross and Clark obtained 36.2 weeks as the estimated mean remission time based on a cumulative hazard plot.

Example 3

This example was previously used by Dumonceaux and Antle (1973) to illustrate estimation in the extreme value distribution. Here, we employ it to illustrate estimation in the Weibull, lognormal, gamma, and inverse Gaussian distributions. Sample data consist of maximum flood levels in millions of cubic feet of water per second flowing in the Susquehanna River at Harrisburg, Pa., over 20 four-year intervals from 1890-1969, as collected by Professor Byron Reich of the Civil Engineering Department of Pennsylvania State University. Observations in chronological order are as follows:

0.654 0.402 0.269 0.416
0.613 0.379 0.740 0.338
0.315 0.423 0.418 0.392
0.449 0.379 0.412 0.484
0.297 0.3235 0.494 0.265
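The summary statistics quoted for these data can be verified directly (the ordering of the observations is immaterial for this purpose):

```python
# Summary statistics for the Example 3 flood-level data.
x = [0.654, 0.613, 0.315, 0.449, 0.297,
     0.402, 0.379, 0.423, 0.379, 0.3235,
     0.269, 0.740, 0.418, 0.412, 0.494,
     0.416, 0.338, 0.392, 0.484, 0.265]
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)    # sample variance
ratio = s2 / (xbar - min(x)) ** 2
print(n, round(xbar, 6), round(s2, 7), round(ratio, 4))
# 20 0.423125 0.0156948 0.6277
```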
In summary, n = 20, x̄ = 0.423125, s² = 0.0156948 (s = 0.1253), x1:20 = 0.265, s²/(x̄ − x1:20)² = 0.62771, z1:20 = −1.262, and a3 = 1.0673243. Estimates of the Weibull parameters, calculated as described in Section 3, are δ̂ = 1.479, β̂ = 0.202, and γ̂ = 0.241. We subsequently calculate α̂3 = 1.0966, which is to be compared with a3 = 1.0673. Of course, for the MME, E(X) = x̄ = 0.423125 and V(X) = s² = 0.0156948. With the parent population assumed to be lognormal, estimates of the parameters calculated as described in Section 4 are σ̂ = 0.4703, ω̂ = 1.245, μ̂ = −1.492, β̂ = 0.225, γ̂ = 0.171, α̂3 = 1.6159, with E(X) = 0.423125 and V(X) = 0.0156948. With the parent population assumed to be gamma, estimates of the parameters calculated as described in Section 5 are ρ̂ = 2.902, β̂ = 0.0735, γ̂ = 0.210, α̂3 = 1.174, and again E(X) = 0.423125 and V(X) = 0.0156948. With the parent population assumed to follow the inverse Gaussian distribution, estimates of the parameters calculated as described in Section 6 are σ̂ = 0.1253, μ̂ = 0.301, γ̂ = 0.125, α̂3 = 1.2468, with E(X) = 0.423125 and V(X) = 0.0156948. Of the four estimates of α3, that obtained when the parent distribution is assumed to be Weibull (α̂3 = 1.0966) is in closest agreement with the
corresponding sample value, a3 = 1.0673. Although not conclusive, this suggests that perhaps the Weibull provides the best fit to the sample data and that perhaps our best estimate of γ is the Weibull estimate γ̂ = 0.241.
Acknowledgement

Appreciation and thanks are extended to Marcel Dekker, Inc. for permission to reprint tables from Cohen and Whitten (1988). Likewise, appreciation and thanks are extended to the American Society for Quality Control for permission to reprint tables and charts from the Journal of Quality Technology.
References

Balakrishnan, N. and A. C. Cohen (1991). Order Statistics and Inference: Estimation Methods. Academic Press, San Diego.
Barlow, R. E., A. Madansky, F. Proschan and F. Scheuer (1968). Statistical estimation procedures for the "burn-in" process. Technometrics 10, 51-62.
Chan, M., A. C. Cohen and B. J. Whitten (1984). Modified maximum likelihood and modified moment estimators for the three-parameter inverse Gaussian distribution. Comm. Statist. Simul. Comp. B 13, 4~68.
Cheng, R. C. H. and T. C. Iles (1989). Embedded models in three-parameter distributions and their estimation. J. Roy. Statist. Soc. B 52, 135-149.
Cohen, A. C. (1951). Estimating parameters of logarithmic-normal distributions by maximum likelihood. J. Amer. Statist. Assoc. 44, 518-525.
Cohen, A. C. (1963). Progressively censored samples in life testing. Technometrics 5, 237-339.
Cohen, A. C. (1966). Life testing and early failure. Technometrics 8, 539-549.
Cohen, A. C. (1991). Truncated and Censored Samples: Theory and Applications. Marcel Dekker, Inc., New York.
Cohen, A. C. (1995). MLE's under censoring and truncation, and inference. Chapter 4 of The Exponential Distribution: Theory, Methods, and Applications, edited by N. Balakrishnan and A. P. Basu. Gordon & Breach, Langhorne, PA, 33-51.
Cohen, A. C. and R. Helm (1973). Estimation in the exponential distribution. Technometrics 14, 841-846.
Cohen, A. C. and B. J. Whitten (1980). Estimation in the three-parameter lognormal distribution. J. Amer. Statist. Assoc. 75, 399-404.
Cohen, A. C. and B. J. Whitten (1982a). Modified moment and maximum likelihood estimators for parameters of the three-parameter gamma distribution. Comm. Statist. Simul. Comp. 11, 197-214.
Cohen, A. C. and B. J. Whitten (1982b). Modified maximum likelihood and modified moment estimators for the three-parameter Weibull distribution. Comm. Statist. Theory Methods 11, 2631-2656.
Cohen, A. C. and B. J. Whitten (1985). Modified moment estimation for the three-parameter inverse Gaussian distribution. J. Qual. Tech. 17, 147-154.
Cohen, A. C. and B. J. Whitten (1986). Modified moment estimation for the three-parameter gamma distribution. J. Qual. Tech. 18, 53-62.
Cohen, A. C. and B. J. Whitten (1988). Parameter Estimation in Reliability and Life Span Models. Marcel Dekker, Inc., New York.
Cohen, A. C., B. J. Whitten and Y. Ding (1984). Modified moment estimation for the three-parameter Weibull distribution. J. Qual. Tech. 16, 159-167.
Cohen, A. C., B. J. Whitten and Y. Ding (1985). Modified moment estimation for the three-parameter lognormal distribution. J. Qual. Tech. 17, 92-99.
David, H. A. (1970). Order Statistics. John Wiley & Sons, New York.
David, H. A. (1981). Order Statistics, 2nd ed. John Wiley & Sons, New York.
Dumonceaux, R. and C. E. Antle (1973). Discrimination between the lognormal and the Weibull distributions. Technometrics 15, 923-926.
Engelhardt, M. E. and L. J. Bain (1977). Simplified statistical procedures for the Weibull or extreme-value distributions. Technometrics 19, 323-331.
Epstein, B. (1954). Truncated life tests in the exponential case. Ann. Math. Statist. 25, 555-564.
Epstein, B. (1960). Estimation from life test data. Technometrics 2, 447-454.
Epstein, B. and M. Sobel (1953). Life testing. J. Amer. Statist. Assoc. 48, 485-502.
Epstein, B. and M. Sobel (1954). Some theorems relevant to life testing from the exponential distribution. Ann. Math. Statist. 25, 373-381.
Freireich, E. J. et al. (1963). The effect of 6-mercaptopurine on the duration of steroid-induced remission in acute leukemia. Blood 21, 699-716.
Gehan, E. A. (1990). Unpublished Notes on Survivability Theory. Univ. of Texas, M. D. Anderson Hospital and Tumor Institute, Houston, Texas.
Griffiths, D. A. (1980). Interval estimation for the three-parameter lognormal distribution via the likelihood equations. Appl. Statist. 29, 58-68.
Gross, A. J. and V. A. Clark (1975). Survival Distributions: Reliability Applications in the Biomedical Sciences. John Wiley & Sons, New York.
Harter, H. L. (1961). Expected values of normal order statistics. Biometrika 48, 151-166.
Harter, H. L. and A. H. Moore (1966). Local-maximum-likelihood estimation of three-parameter lognormal populations from complete and censored samples. J. Amer. Statist. Assoc. 61, 842-851.
Hill, B. M. (1963). The three-parameter lognormal distribution and Bayesian analysis of a point-source epidemic. J. Amer. Statist. Assoc. 58, 72-84.
Kambo, N. S. (1978). Maximum likelihood estimation of the location and scale parameters of the exponential distribution from a censored sample. Comm. Statist. Theory Methods 7, 1129-1132.
Lemon, G. H. (1975). Modified maximum likelihood estimation for the three-parameter Weibull distribution based on censored samples. Technometrics 17, 247-254.
Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
Lockhart, R. A. and M. A. Stephens (1994). Estimation and tests of fit for the three-parameter Weibull distribution. J. Roy. Statist. Soc. B 56, 491-500.
Mann, N. R., R. E. Schafer and N. I. Singpurwalla (1974). Methods for Statistical Analysis of Reliability and Life Data. John Wiley & Sons, New York.
McCool, J. L. (1974). Inferential Techniques for Weibull Populations. Aerospace Research Laboratories Report ARL TR 74-0180, Wright-Patterson AFB, Ohio.
Nelson, W. (1968). A Method for Statistical Hazard Plotting of Incomplete Failure Data That Are Arbitrarily Censored. TIS Report 68-C-007, General Electric Research and Development Center, Schenectady, New York.
Nelson, W. (1969). Hazard plotting for incomplete failure data. J. Qual. Tech. 1, 27-52.
Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics 14, 945-966.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley & Sons, New York.
Rockette, H., C. Antle and L. A. Klimko (1974). Maximum likelihood estimation with the Weibull model. J. Amer. Statist. Assoc. 69, 246-249.
Smith, R. L. (1985). Maximum likelihood estimation in a class of nonregular cases. Biometrika 72, 67-90.
Smith, R. L. (1995). Likelihood and modified likelihood estimation for distributions with unknown endpoints. Chapter 24 of Recent Advances in Life Testing, edited by N. Balakrishnan. CRC Press, Boca Raton, Florida, 455-474.
Viveros, R. and N. Balakrishnan (1994). Interval estimation of parameters of life from progressively censored data. Technometrics 36, 84-91.
Wilson, E. B. and J. Worcester (1945). The normal logarithmic transform. Rev. Econ. Statist. 27, 17-22.
Wingo, D. R. (1973). Solution of the three-parameter Weibull equations by constrained modified quasilinearization (progressively censored samples). IEEE Trans. Reliab. R-22, 96-102.
Yuan, P. T. (1933). On the logarithmic frequency distributions and the semi-logarithmic correlation surface. Ann. Math. Statist. 4, 30-74.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
11
Parameter Estimation under Multiply Type-II Censoring
Fanhui Kong
1. Introduction
For a population with density function f(x, θ), where θ ∈ R^q is the parameter of interest, the estimation of θ under different censoring schemes has been a major topic of statistical inference. In this chapter, we concentrate on parameter estimation under multiply Type-II censoring. Assume n items are put on a life test, but only the r_1th, ..., r_kth failures are observed; the rest are unobserved. That is, for some items we may not know their exact failure times, or even their orders, but for each one of these items we have observed failures x_{r_{i-1}} and x_{r_i} such that it fails between x_{r_{i-1}} and x_{r_i}. This is multiply Type-II censoring. Multiply Type-II censoring is a generalization of Type-II censoring, in which only the first k failure times are observed. It is also different from group censoring, in which one separates (0, ∞) into a partition {(t_{i-1}, t_i], i = 1, ..., k + 1} with t_0 = 0, t_{k+1} = ∞ and records the number of failures or censorings in each interval. It is a frequently practiced censoring scheme, arising in particular when one fails to record the exact failure times of some subjects, so that only several failure times and the numbers of failures between them are recorded. This happens frequently in follow-up studies.
2. Best linear estimation
2.1. Best linear unbiased estimators

Suppose the distribution of the random variable X comes from a location-scale family, which has a density function of the form

f(x; μ, σ) = (1/σ) g((x − μ)/σ)

for a density function g(x) defined on R. The standardized variate Y = (X − μ)/σ has density function g(y), free of μ and σ. Let −∞ < x_{r_1,n} < ... < x_{r_k,n} < ∞ be a multiply Type-II censored sample from the population f(x; μ, σ); then −∞ < y_{r_1,n} < ... < y_{r_k,n} < ∞ is a multiply Type-II censored sample from the population
g(y). Denote α_i = E(Y_{r_i,n}) and β_{ij} = Cov(Y_{r_i,n}, Y_{r_j,n}) for i, j = 1, ..., k; then α_i and β_{ij} depend on g(y) only, not on the parameters, and they can be determined once and for all. Put X = (X_{r_1,n}, ..., X_{r_k,n})', α = (α_1, ..., α_k)', β = (β_{ij})_{k×k}, and A = (1, α) with 1 = (1, ..., 1)' and θ' = (μ, σ); then

E X = μ 1 + σ α = A θ ,   Var(X) = σ² β ,   (2)

where A and β are all known. By the generalized Gauss-Markov theorem, the least squares estimator of θ is the best linear unbiased estimator (BLUE), which is

θ* = (A' β⁻¹ A)⁻¹ A' β⁻¹ X ,   (3)

with covariance matrix σ²(A' β⁻¹ A)⁻¹. Denote Δ = det(A' β⁻¹ A) and Γ = β⁻¹(1 α' − α 1') β⁻¹/Δ; then (3) becomes

μ* = −α' Γ X ,   σ* = 1' Γ X .   (4)

For given values of n and k, denote A_{k,n} = α' β⁻¹ α/Δ, B_{k,n} = −1' β⁻¹ α/Δ, and L_{k,n} = 1' β⁻¹ 1/Δ; then μ* and σ* have the following variances and covariance:

Var(μ*) = σ² A_{k,n} ,   Var(σ*) = σ² L_{k,n} ,   Cov(μ*, σ*) = σ² B_{k,n} .
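Equations (3)-(4) are a generalized least squares computation, so once α and β are available the BLUE is straightforward to evaluate; a minimal sketch (the function name is mine):

```python
import numpy as np

def blue_location_scale(x, alpha, beta):
    """BLUE (mu*, sigma*) of location and scale from a multiply Type-II
    censored sample x, given the means alpha and covariance matrix beta
    of the corresponding standardized order statistics (equation (3))."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    A = np.column_stack([np.ones_like(alpha), alpha])   # design matrix (1, alpha)
    beta_inv = np.linalg.inv(np.asarray(beta, dtype=float))
    M = A.T @ beta_inv @ A                              # A' beta^{-1} A
    mu_star, sigma_star = np.linalg.solve(M, A.T @ beta_inv @ x)
    return mu_star, sigma_star

# Sanity check: for data lying exactly on x = mu + sigma*alpha, the GLS fit
# recovers (mu, sigma) exactly, for any positive definite beta.
alpha = np.array([-1.0, 0.0, 1.2])
beta = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.3], [0.1, 0.3, 1.0]])
mu, sigma = blue_location_scale(2.0 + 3.0 * alpha, alpha, beta)
print(round(mu, 6), round(sigma, 6))   # 2.0 3.0
```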
Since the estimators are linear functions of X and the coefficients depend only on α and β, to derive these estimators it suffices to calculate the means and the variance-covariance matrix of the order statistics from the distribution g(y). It could be quite hard to construct comprehensive tables of such moments if n is large; see Balakrishnan and Chan (1992) and the references therein. It is more convenient to develop a computer program to generate these moments and, at the same time, to derive the estimates of the parameters.

2.2. Best linear invariant estimators

From μ* and σ* given above, one can derive the best linear invariant estimators (BLIE) for μ and σ (see Mann et al., 1974),

μ̃ = μ* − (B_{k,n}/(1 + L_{k,n})) σ* ,   σ̃ = σ*/(1 + L_{k,n}) .

They have the mean squared errors

MSE(μ̃) = [A_{k,n} − B²_{k,n}/(1 + L_{k,n})] σ²

and

MSE(σ̃) = [L_{k,n}/(1 + L_{k,n})] σ² .
The BLIE μ̃ and σ̃ are usually not unbiased, but in the sense of mean squared error they are better than the BLUE μ* and σ*. Similar to regular Type-II censoring, both
BLUE and BLIE for the parameters μ and σ are asymptotically efficient and asymptotically normal (see Mann, 1968b).

2.3. A special case: Weibull distribution

Assume T is a lifetime random variable having a two-parameter Weibull distribution Wei(θ, β) with the density function

f(t; θ, β) = (β/θ)(t/θ)^{β−1} exp{−(t/θ)^β} ,   t > 0 ,   (5)

for θ > 0 and β > 0, where β is the shape parameter and θ is the scale parameter. Suppose T_{r_1,n}, ..., T_{r_k,n} is a multiply Type-II censored sample from the Weibull distribution Wei(θ, β); then X_{r_i,n} = ln T_{r_i,n}, i = 1, ..., k, is a multiply Type-II censored sample from the extreme-value distribution EV(μ, σ), with the density function

f(x; μ, σ) = (1/σ) e^{(x−μ)/σ} exp{−e^{(x−μ)/σ}} ,   −∞ < x < ∞ ,   (6)

for −∞ < μ < ∞ and σ > 0, where σ = 1/β and μ = ln θ. Notice this is a location-scale family with μ and σ, so their BLUE and BLIE can be derived, and these estimators can immediately be used to derive estimators for θ and β in the Weibull distribution. In fact, the corresponding estimators are

β* = 1/σ* ,   θ* = e^{μ*} ,   (7)

β̃ = 1/σ̃ ,   θ̃ = e^{μ̃} .   (8)
Being nonlinear functions of σ* and μ*, respectively, the estimators β* and θ* do not enjoy the nice property of being unbiased. In fact, unbiased estimators for β and θ do not exist. In this subsection, through the asymptotic distribution of σ*, we provide an approximately unbiased estimator for β with mean squared error smaller than that of both β* and β̃. Denote σ* = Σ_{i=1}^{k} c_{r_i,n} X_{r_i,n} in (4) and

b_1 = −c_{r_1,n} ,   b_2 = −(c_{r_1,n} + c_{r_2,n}) , ... ,   b_{k−1} = −Σ_{i=1}^{k−1} c_{r_i,n} = c_{r_k,n} ;

then σ* can be expressed as

σ* = Σ_{i=1}^{k−1} b_i (X_{r_{i+1},n} − X_{r_i,n})
   = Σ_{i=1}^{k−1} b_i Σ_{j=1}^{r_{i+1}−r_i} (X_{r_i+j,n} − X_{r_i+j−1,n})
   = σ Σ_{i=1}^{k−1} Σ_{j=1}^{r_{i+1}−r_i} b_i E(Z_{r_i+j,n} − Z_{r_i+j−1,n}) H_{ij} .
Here

H_{ij} = (X_{r_i+j,n} − X_{r_i+j−1,n}) / [σ E(Z_{r_i+j,n} − Z_{r_i+j−1,n})] ,   i = 1, ..., k − 1 ,   j = 1, ..., r_{i+1} − r_i ,
and E(Zr,+j,,) is the expectation of the (ri + j)th order statistic of a sample of size n from the standard extreme-value distribution EV(0, 1). According to David (1981), each//,7 approximately has an exponential distribution with mean 1 and variance 1. For j ¢ m, H/j and Him have approximately zero covariance, and 2H/j has approximately a chi-square distribution with 2 degrees of freedom. Therefore, a*/a is approximately a weighted sum of independent chi-square random variables with # = E ( a * / a ) = 1 and v = V a r ( a * / a ) = L ~ , . . According to Patnaik (1949), as n--+ ec, 2tr*/aLk,n has a chi-square distribution with 2#2/v = 2/Lk,. degrees of freedom, that is 2a* o l d , , ~ x2(2/L
")
"
Similar to Zhang, Fei and Wang (1982), $\sigma L_{k,n}/(2\sigma^*)$ approximately has an inverse Gamma distribution, so approximately

$$E\!\left(\frac{\sigma L_{k,n}}{2\sigma^*}\right) = \frac{L_{k,n}}{2(1 - L_{k,n})} \quad \text{and} \quad \mathrm{Var}\!\left(\frac{\sigma L_{k,n}}{2\sigma^*}\right) = \frac{L_{k,n}^2}{4(1 - L_{k,n})^2\,(L_{k,n}^{-1} - 2)} .$$

Therefore an approximately unbiased estimator for $\beta$ is

$$\beta^{**} = \frac{1 - L_{k,n}}{\sigma^*} = (1 - L_{k,n})\,\beta^* \qquad (9)$$

with $\mathrm{Var}(\beta^{**}) = \beta^2/(L_{k,n}^{-1} - 2)$. It is also easy to calculate the MSE's of $\beta^*$ and $\hat\beta$:

$$\mathrm{MSE}(\beta^*) = \frac{1 + 2L_{k,n}}{(1 - L_{k,n})(L_{k,n}^{-1} - 2)}\,\beta^2 \quad \text{and} \quad \mathrm{MSE}(\hat\beta) = \frac{1 + 7L_{k,n}}{(1 - L_{k,n})(L_{k,n}^{-1} - 2)}\,\beta^2 .$$

So

$$\mathrm{MSE}(\beta^{**}) < \mathrm{MSE}(\beta^*) < \mathrm{MSE}(\hat\beta) .$$
(10)
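The ordering (10) can be checked numerically from the closed forms above. In the sketch below the values of $L_{k,n} = \mathrm{Var}(\sigma^*/\sigma)$ are hypothetical inputs (in practice $L_{k,n}$ comes from the BLUE coefficients for the given censoring pattern); the formulas require $L_{k,n} < 1/2$.

```python
def mse_factors(L):
    """MSE/beta^2 for beta**, beta*, and beta-hat under the chi-square
    approximation, using the closed forms above (valid for L < 1/2)."""
    denom = 1.0 / L - 2.0                               # L^{-1} - 2
    mse_bb = 1.0 / denom                                # beta** = (1 - L) beta*
    mse_star = (1.0 + 2.0 * L) / ((1.0 - L) * denom)    # BLUE-based beta*
    mse_hat = (1.0 + 7.0 * L) / ((1.0 - L) * denom)     # BLIE-based beta-hat
    return mse_bb, mse_star, mse_hat

# Hypothetical values of L_{k,n} for illustration.
for L in (0.05, 0.1, 0.2):
    bb, star, hat = mse_factors(L)
    assert bb < star < hat                              # ordering (10)
print("ordering (10) holds for the tested L values")
```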
Parameter estimation under multiply Type-II censoring
319
3. Maximum likelihood estimation
3.1. MLE in multiply Type II censoring

Let us consider maximum likelihood estimation for a general population. Suppose $X_{r_1,n}, \ldots, X_{r_k,n}$ with $0 \le X_{r_1,n} < \cdots < X_{r_k,n} < \infty$ is a multiply Type II censored sample from a population with density $f(x,\theta)$ and cumulative distribution function $F(x,\theta)$ for $\theta \in R^q$. Denote the likelihood function of the sample by $L(x,\theta) = L(x_{r_1,n}, \ldots, x_{r_k,n}, \theta)$; then with $x_{r_0,n} = 0$ we have

$$L(x,\theta) = C \prod_{i=1}^{k} \{F(x_{r_i,n},\theta) - F(x_{r_{i-1},n},\theta)\}^{r_i - r_{i-1} - 1} \times \{1 - F(x_{r_k,n},\theta)\}^{n-r_k} \prod_{i=1}^{k} f(x_{r_i,n},\theta) , \qquad (11)$$

where $C$ is a constant not depending on $\theta$. The likelihood equation becomes

$$\frac{\partial \log L(x,\theta)}{\partial \theta} = \sum_{i=1}^{k} (r_i - r_{i-1} - 1)\, \frac{\partial \log\{F(x_{r_i,n},\theta) - F(x_{r_{i-1},n},\theta)\}}{\partial \theta} + (n - r_k)\, \frac{\partial \log\{1 - F(x_{r_k,n},\theta)\}}{\partial \theta} + \sum_{i=1}^{k} \frac{\partial \log f(x_{r_i,n},\theta)}{\partial \theta} = 0 . \qquad (12)$$

In general, this equation does not admit a closed-form solution, so the MLE does not have a closed form, and a numerical method, such as Newton-Raphson iteration, has to be used to determine it. Due to this restriction, the finite sample properties of the MLE cannot be obtained.
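As a minimal numerical sketch of this step, the code below maximizes the multiply Type II censored log-likelihood (11) for a one-parameter exponential population $F(x) = 1 - e^{-x/\theta}$. The censoring pattern and data are invented for illustration, and a golden-section search stands in for the Newton-Raphson iteration mentioned above (the exponential censored log-likelihood is unimodal in $\theta$, so a bracketing search suffices).

```python
import math

def loglik(theta, x, r, n):
    """Multiply Type II censored log-likelihood (11), up to the constant C,
    for an exponential population F(x) = 1 - exp(-x/theta)."""
    F = lambda t: 1.0 - math.exp(-t / theta)
    f = lambda t: math.exp(-t / theta) / theta
    ll = 0.0
    prev_x, prev_r = 0.0, 0
    for xi, ri in zip(x, r):
        ll += (ri - prev_r - 1) * math.log(F(xi) - F(prev_x))  # gap terms
        ll += math.log(f(xi))                                  # observed terms
        prev_x, prev_r = xi, ri
    ll += (n - r[-1]) * math.log(1.0 - F(x[-1]))               # right tail
    return ll

def mle_theta(x, r, n, lo=0.05, hi=50.0, tol=1e-8):
    """Golden-section maximization; stands in for Newton-Raphson."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if loglik(c, x, r, n) > loglik(d, x, r, n):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Hypothetical censored sample: ranks r and observed order statistics x
# from a sample of size n = 20 (values invented for illustration).
n, r = 20, [2, 5, 9, 14]
x = [0.3, 0.9, 1.8, 3.1]
print(round(mle_theta(x, r, n), 4))
```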
3.2. Large sample properties of MLE

To study the large sample properties of the MLE for multiply Type II censored data, let us define the gap between $X_{r_{i-1},n}$ and $X_{r_i,n}$ as $r_i - r_{i-1} - 1$, which is the number of unobserved failures between them, and $g = \max_i (r_i - r_{i-1} - 1)$ as the maximum gap. For a Type II censored sample, as the number of observations grows, the MLE is consistent, asymptotically normal and efficient; see Halperin (1952) and Bhattacharyya (1985). So it is interesting to know whether such properties remain valid for a multiply Type II censored sample, especially when some gaps between observed failures go to $\infty$. For the desired properties to remain valid, we need to impose some extra restrictions on the population distribution, especially on the tails of the density function. In addition, we need to restrict the speed at which these gaps go to $\infty$.
Let us first introduce the following assumptions.

ASSUMPTION 1. For almost all $x$, the derivatives

$$\frac{\partial^i \log f(x,\theta)}{\partial \theta^i}, \quad i = 1,2,3, \qquad \text{and} \qquad \frac{\partial^{i+1} \log f(x,\theta)}{\partial x\, \partial \theta^i}, \quad i = 1,2,3,$$

exist and are piecewise continuous for every $\theta$ belonging to a non-degenerate area $I$ in $R^q$ and $x$ in $[0,\infty)$.

ASSUMPTION 2. There exist positive numbers $A_1$, $A_2$, $\gamma_{ij}$, $i = 1,2$, $j = 1,\ldots,5$, such that when $\theta$ is in some neighborhood of the true value $\theta_0$ and $x$ is large enough,

$$\left|\frac{\partial^i \log f(x,\theta)}{\partial \theta^i}\right| \le A_1 x^{\gamma_{1i}}, \quad i = 1,2, \qquad \left|\frac{\partial^{i+1} \log f(x,\theta)}{\partial x\, \partial \theta^i}\right| \le A_1 x^{\gamma_{1,i+2}}, \quad i = 1,2,3 , \qquad (13)$$

and when $x$ is small enough, meaning close enough to zero,

$$\left|\frac{\partial^i \log f(x,\theta)}{\partial \theta^i}\right| \le A_2 x^{-\gamma_{2i}}, \quad i = 1,2, \qquad (14)$$

$$\left|\frac{\partial^{i+1} \log f(x,\theta)}{\partial x\, \partial \theta^i}\right| \le A_2 x^{-\gamma_{2,i+2}}, \quad i = 1,2,3 .$$

Here the $\gamma_{ij}$ could be different for different components of $\theta$. Also assume there exists a function $H(x)$ such that for every $\theta$ in $I \subset R^q$ and all $x$,

$$\left|\frac{\partial^3 \log f(x,\theta)}{\partial \theta^3}\right| \le H(x) ,$$

and an $M$ independent of $\theta$ such that

$$\int H(x)\, f(x,\theta)\, dx \le M < \infty .$$
For simplicity, we define

$$\gamma_1 = \max\{2\gamma_{11} + \gamma_{13},\ \gamma_{11} + \gamma_{14},\ \gamma_{12} + \gamma_{13},\ \gamma_{15}\}, \qquad \gamma_2 = \max\{2\gamma_{21} + \gamma_{23},\ \gamma_{21} + \gamma_{24},\ \gamma_{22} + \gamma_{23},\ \gamma_{25}\} .$$

ASSUMPTION 3. For $x$ large enough, there exist positive numbers $C_1$ and $\alpha$ such that

$$f(x,\theta) \ge C_1 \{1 - F(x,\theta)\}^{\alpha} . \qquad (15)$$
For $x$ small enough, there exist a real number $\beta$ and a positive number $C_2$ such that

$$f(x,\theta) \ge C_2 \{F(x,\theta)\}^{\beta} . \qquad (16)$$

ASSUMPTION 4. For every $\theta$ in $I$, the integral

$$I(\theta) = \int_0^{\infty} \left(\frac{\partial \log f(x,\theta)}{\partial \theta}\right)^{\!2} f(x,\theta)\, dx \qquad (17)$$

is finite and positive.

Note that for different components of $\theta$, the values of $\gamma_1$, $\gamma_2$ could be different. Under these assumptions, we have the following theorems.

THEOREM 1. For the constants $\alpha$, $\beta$, $\gamma_1$, $\gamma_2$ defined as above, assume Assumptions 1-4 are valid. Further assume there are small positive numbers $D$, $\phi$, $\delta_1$ and $\delta_2$ with $0 < \phi < 1$, such that
$$\int_0^D \{F^{-1}(1 - \phi x, \theta)\}^{\gamma_1}\, x^{-(\alpha - 1)}\, dx < \infty , \qquad (18)$$

$$\int_0^D \{F^{-1}((1 - \phi)x, \theta)\}^{-\gamma_2}\, x^{-(\beta - 1)}\, dx < \infty , \qquad (19)$$

and, for some positive $y$, as $n \to \infty$,

$$n^{\alpha - 2} \left\{F^{-1}\!\left(1 - \frac{y}{n(\log n)^{1+\delta_1}},\ \theta\right)\right\}^{\gamma_1} \to 0 , \qquad (20)$$

$$n^{\beta - 2} \left\{F^{-1}\!\left(\frac{y}{n(\log n)^{1+\delta_2}},\ \theta\right)\right\}^{-\gamma_2} \to 0 . \qquad (21)$$

Then, if the maximum gap $g$ is always bounded, the likelihood equation (12) has a solution converging in probability to the true value $\theta_0$ as $n \to \infty$.

To derive the asymptotic normality, we only need to add very limited assumptions.

THEOREM 2. In Theorem 1, instead of conditions (18)-(21), if we have
$$\int_0^D \{F^{-1}(1 - \phi x, \theta)\}^{\gamma_1}\, x^{-(\alpha - 1/2)}\, dx < \infty , \qquad (22)$$
$$\int_0^D \{F^{-1}((1 - \phi)x, \theta)\}^{-\gamma_2}\, x^{-(\beta - 1/2)}\, dx < \infty , \qquad (23)$$
and
$$n^{\alpha - 3/2} \left\{F^{-1}\!\left(1 - \frac{y}{n(\log n)^{1+\delta_1}},\ \theta\right)\right\}^{\gamma_1} \to 0 , \qquad (24)$$

$$n^{\beta - 3/2} \left\{F^{-1}\!\left(\frac{y}{n(\log n)^{1+\delta_2}},\ \theta\right)\right\}^{-\gamma_2} \to 0 , \qquad (25)$$
then, when $g$ is bounded, the solution of (12) is an asymptotically normal and efficient estimator of $\theta_0$.

Finally, consider the case where $g \to \infty$ along with $n$.

THEOREM 3. Under Assumptions 1-4 for the distribution, assume that as $n \to \infty$, $n e^{-(n/g)\varepsilon} \to 0$ for any $\varepsilon > 0$. At the two tails of the order statistics, we assume $(r_{j+1} - r_j - 1)/(r_j - 1)$ on the left tail and $(r_{j+1} - r_j - 1)/(n - r_{j+1} - 1)$ on the right tail are bounded. Further, instead of (20) and (21) we need

$$n^{\alpha - 2}(n - r_k)^2 \left\{F^{-1}\!\left(1 - \frac{y}{n(\log n)^{1+\delta_1}},\ \theta\right)\right\}^{\gamma_1} \to 0 , \qquad (26)$$

the analogous lower-tail condition with $\beta - 2$ and $-\gamma_2$ in place of $\alpha - 2$ and $\gamma_1$, (27)

and (18), (19) in Theorem 1. Then the MLE is consistent. Furthermore, if together with (22) and (23) in Theorem 2, (26) is modified by replacing $\alpha - 2$ with $\alpha - 3/2$ and (27) by replacing $\beta - 2$ with $\beta - 3/2$, then the derived MLE is asymptotically normal and efficient.

The proofs of these theorems are provided in Kong and Fei (1996). Besides the regularity conditions for Type II censoring in Halperin (1952), we have introduced (13)-(16) and the additional assumptions presented in the theorems. Although these additional conditions seem complicated and sometimes difficult to verify, they are all restrictions on the tails of the population distribution. Under these conditions, the scores at the two tails of the order statistics do not fluctuate too dramatically, so the unobserved scores differ from the observed scores only on a small scale. As a consequence, the score statistic of the incomplete sample is essentially equivalent to that of the complete sample. Similar assumptions are used by Hall (1984) and Khashimov (1988) in deriving limit theorems for sums of general functions of spacings.
3.3. Examples

EXAMPLE 1. First, consider the Gamma distribution. It has the density function

$$f(x; k, \theta) = \frac{1}{\theta^k \Gamma(k)}\, x^{k-1} e^{-x/\theta}, \qquad k > 0, \quad \theta > 0, \quad x > 0 . \qquad (28)$$
Take $k$ as a constant and only $\theta$ as a parameter; then Assumption 2 is valid and we have $\gamma_1 = 2$ and $\gamma_2 = 0$. Choose $1 < \alpha < 2$ and $1 - 1/k < \beta < 2$; then Assumption 3 and conditions (18)-(21) are satisfied, so Theorem 1 is valid. If further $1 < \alpha < 3/2$ and $1 - 1/k < \beta < 3/2$, Theorem 2 is also valid. For the case that the maximum gap $g \to \infty$, under the conditions in Theorem 3, that theorem is valid as well.

EXAMPLE 2. Consider the two-parameter Weibull distribution, which has the density function in (5) and the log-density

$$\log f(x; \theta, \mu) = \log \mu - \mu \log \theta + (\mu - 1)\log x - (x/\theta)^{\mu} .$$

Here we use $\mu$ instead of $\beta$ for the shape parameter to avoid confusion. By taking derivatives with respect to $\mu$, we find that for any $\varepsilon > 0$ the values of the $\gamma$'s satisfying Assumption 2 are $\gamma_1 = 3\mu - 1 + \varepsilon$ and $\gamma_2 = 1 + \varepsilon$. Apparently, Assumption 3 is satisfied if $\alpha > 1$ and $\beta > 1 - 1/\mu$; note that when $\mu < 1$, $\beta$ could be negative. Furthermore, if $\alpha$, $\beta$ also satisfy $\alpha < 2$ and $\beta < 2 - 1/\mu$, one can find a small positive $\phi$ such that conditions (18)-(21) are satisfied. For (22)-(25) to be satisfied, one needs to choose $\alpha$, $\beta$ such that $1 < \alpha < 3/2$ and $1 - 1/\mu < \beta < 3/2 - 1/\mu$. By taking derivatives with respect to $\theta$, we can easily determine $\gamma_1 = 3\mu - 1$ and $\gamma_2 = 1 - \mu$ if $\mu < 1$, and $\gamma_2 = 0$ otherwise; for $\mu < 1$, choose $1 < \alpha < 2$ and $1 - 1/\mu < \beta < 2 - (1-\mu)/\mu$, while for $\mu \ge 1$, choose $1 < \alpha < 2$ and $1 - 1/\mu < \beta < 2$. Putting these together, for the different values of $\gamma_1$ and $\gamma_2$ we can find common values of $\alpha$ and $\beta$ such that all the assumptions of Theorem 1 and Theorem 2 are satisfied; therefore the MLE of $(\theta, \mu)$ is consistent, asymptotically normal and efficient. Theorem 3 is also valid if the gaps satisfy the assumptions there. Notice that the values of $\gamma_1$ and $\gamma_2$ need not be the same for the two components, but the values of $\alpha$ and $\beta$ must be the same.

After modifying the assumptions, these results can be extended to the situation where the random variables are defined on $(-\infty, \infty)$. In this situation, the corresponding assumptions (19), (21), (23) and (25) are modified simply by replacing $-\gamma_2$ with $\gamma_2$ in these expressions.

4. Approximate maximum likelihood estimation

In view of the fact that the MLE does not have a closed form, we often try to derive the approximate MLE. The approximate MLE can be easily derived for
the location and scale family. For this family, instead of solving the likelihood equations directly, we keep only the linear terms of suitable Taylor expansions, so the likelihood equations become linear or quadratic equations and closed-form solutions exist.

Just as in Section 2, we consider $X$ from a location and scale family with density $f(x, \mu, \sigma)$ of the form (1). Denote $Y = (X - \mu)/\sigma$; then $Y$ has density function $g(y)$ and distribution function $G(y)$ free of the parameters $\mu$ and $\sigma$. Suppose $0 \le x_{r_1,n} < \cdots < x_{r_k,n} < \infty$ is a multiply Type II censored sample from $f(x, \mu, \sigma)$. Its likelihood function has the form

$$L(\mu, \sigma; x) = C_1\, \frac{1}{\sigma^k} \prod_{i=1}^{k} \{G(y_{r_i,n}) - G(y_{r_{i-1},n})\}^{r_i - r_{i-1} - 1} \times \{1 - G(y_{r_k,n})\}^{n - r_k} \prod_{i=1}^{k} g(y_{r_i,n}) , \qquad (29)$$

where $y_{r_i,n} = (x_{r_i,n} - \mu)/\sigma$ for $i = 1, \ldots, k$ and, for convenience, we choose $y_{r_0,n}$ such that $G(y_{r_0,n}) = 0$. The likelihood equations become

$$\frac{\partial \log L}{\partial \mu} = -\frac{1}{\sigma}\left[\sum_{i=1}^{k} (r_i - r_{i-1} - 1)\, \frac{g(y_{r_i,n}) - g(y_{r_{i-1},n})}{G(y_{r_i,n}) - G(y_{r_{i-1},n})} - (n - r_k)\, \frac{g(y_{r_k,n})}{1 - G(y_{r_k,n})} + \sum_{i=1}^{k} \frac{g'(y_{r_i,n})}{g(y_{r_i,n})}\right] = 0 \qquad (30)$$

and

$$\frac{\partial \log L}{\partial \sigma} = -\frac{1}{\sigma}\left[k + \sum_{i=1}^{k} (r_i - r_{i-1} - 1)\, \frac{y_{r_i,n}\, g(y_{r_i,n}) - y_{r_{i-1},n}\, g(y_{r_{i-1},n})}{G(y_{r_i,n}) - G(y_{r_{i-1},n})} - (n - r_k)\, \frac{y_{r_k,n}\, g(y_{r_k,n})}{1 - G(y_{r_k,n})} + \sum_{i=1}^{k} \frac{y_{r_i,n}\, g'(y_{r_i,n})}{g(y_{r_i,n})}\right] = 0 . \qquad (31)$$

For $i = 1, \ldots, k$, denote $p_{r_i} = 1 - q_{r_i} = r_i/(n+1)$ and $\xi_{r_i} = G^{-1}(p_{r_i})$. Then $G(Y_{r_i,n})$ is the $r_i$th order statistic of a sample of size $n$ from the uniform distribution, and $E\,G(Y_{r_i,n}) = p_{r_i}$. As $n$ grows, $G(Y_{r_i,n}) \to p_{r_i}$, or $Y_{r_i,n} \to \xi_{r_i}$, for $i = 1, \ldots, k$. So for a function $h(\cdot)$ with a certain number of derivatives, $h(y_{r_i,n})$ can be expanded in a Taylor series at $\xi_{r_i}$. Keeping only the linear terms of these expansions, one gets
$$g'(y_{r_i,n})/g(y_{r_i,n}) \approx \alpha_i - \beta_i\, y_{r_i,n}, \qquad i = 1, \ldots, k ,$$
$$g(y_{r_i,n})/[G(y_{r_i,n}) - G(y_{r_{i-1},n})] \approx \eta_{0i} + \eta_{1i}\, y_{r_{i-1},n} + \eta_{2i}\, y_{r_i,n}, \qquad i = 1, \ldots, k ,$$
$$g(y_{r_{i-1},n})/[G(y_{r_i,n}) - G(y_{r_{i-1},n})] \approx \eta^*_{0i} + \eta^*_{1i}\, y_{r_{i-1},n} + \eta^*_{2i}\, y_{r_i,n}, \qquad i = 1, \ldots, k ,$$
$$g(y_{r_k,n})/[1 - G(y_{r_k,n})] \approx \eta^*_{0,k+1} + \eta^*_{2,k+1}\, y_{r_k,n} .$$

In fact, we have

$$\beta_i = -\left\{\frac{g''(\xi_{r_i})}{g(\xi_{r_i})} - \frac{g'^2(\xi_{r_i})}{g^2(\xi_{r_i})}\right\}, \qquad \alpha_i = \frac{g'(\xi_{r_i})}{g(\xi_{r_i})} + \xi_{r_i}\beta_i, \qquad i = 1, \ldots, k ,$$

$$\eta_{1i} = \frac{g(\xi_{r_i})\, g(\xi_{r_{i-1}})}{(p_{r_i} - p_{r_{i-1}})^2}, \qquad \eta_{2i} = \frac{g'(\xi_{r_i})}{p_{r_i} - p_{r_{i-1}}} - \frac{g^2(\xi_{r_i})}{(p_{r_i} - p_{r_{i-1}})^2} ,$$

$$\eta_{0i} = \frac{g(\xi_{r_i})}{p_{r_i} - p_{r_{i-1}}} - \xi_{r_{i-1}}\eta_{1i} - \xi_{r_i}\eta_{2i}, \qquad i = 1, \ldots, k ,$$

with, for the starred coefficients,

$$\eta^*_{1i} = \frac{g'(\xi_{r_{i-1}})}{p_{r_i} - p_{r_{i-1}}} + \frac{g^2(\xi_{r_{i-1}})}{(p_{r_i} - p_{r_{i-1}})^2}, \qquad \eta^*_{2i} = -\eta_{1i} ,$$

$$\eta^*_{0i} = \frac{g(\xi_{r_{i-1}})}{p_{r_i} - p_{r_{i-1}}} - \xi_{r_{i-1}}\eta^*_{1i} - \xi_{r_i}\eta^*_{2i}, \qquad i = 1, \ldots, k ,$$

and, for the boundary terms (using $G(y_{r_0,n}) = 0$ for $i = 1$, and $i = k+1$ for the right tail),

$$\eta_{01} = \frac{g(\xi_{r_1})}{p_{r_1}} - \xi_{r_1}\eta_{21}, \qquad \eta_{21} = \frac{g'(\xi_{r_1})}{p_{r_1}} - \frac{g^2(\xi_{r_1})}{p_{r_1}^2} ,$$

$$\eta^*_{0,k+1} = \frac{g(\xi_{r_k})}{1 - p_{r_k}} - \xi_{r_k}\eta^*_{2,k+1}, \qquad \eta^*_{2,k+1} = \frac{g'(\xi_{r_k})}{1 - p_{r_k}} + \frac{g^2(\xi_{r_k})}{(1 - p_{r_k})^2} .$$
For various specific distributions these expansions have been derived; see Balakrishnan, Gupta and Panchapakesan (1992, 1995a,b) and Fei, Kong and Tang (1995). Substituting these expansions into (30), one gets a linear equation in $\mu$ and $\sigma$. The solution for $\mu$ is

$$\hat\mu = B - C\hat\sigma , \qquad (32)$$

where, collecting the coefficients of the $y$'s, the constant terms, and the $x$-weighted terms, respectively,

$$m = \sum_{i=1}^{k} (r_i - r_{i-1} - 1)\left(\eta_{1i} + \eta_{2i} - \eta^*_{1i} - \eta^*_{2i}\right) - (n - r_k)\,\eta^*_{2,k+1} - \sum_{i=1}^{k} \beta_i ,$$

$$C = \frac{1}{m}\left[(n - r_k)\,\eta^*_{0,k+1} - \sum_{i=1}^{k} (r_i - r_{i-1} - 1)(\eta_{0i} - \eta^*_{0i}) - \sum_{i=1}^{k} \alpha_i\right] ,$$

and

$$B = \frac{1}{m}\left[\sum_{i=1}^{k} (r_i - r_{i-1} - 1)\left\{(\eta_{1i} - \eta^*_{1i})\, x_{r_{i-1},n} + (\eta_{2i} - \eta^*_{2i})\, x_{r_i,n}\right\} - (n - r_k)\,\eta^*_{2,k+1}\, x_{r_k,n} - \sum_{i=1}^{k} \beta_i\, x_{r_i,n}\right] . \qquad (33)$$

Similarly, substituting the expansions into (31), one gets a quadratic equation in $\mu$ and $\sigma$. Replacing $\mu$ by (32) in that equation, one derives

$$\hat\sigma = \frac{-D + \sqrt{D^2 + 4kE}}{2k} , \qquad (34)$$
where

$$D = \sum_{i=1}^{k} \alpha_i\, x_{r_i,n} + \sum_{i=1}^{k} (r_i - r_{i-1} - 1)\left(\eta_{0i}\, x_{r_i,n} - \eta^*_{0i}\, x_{r_{i-1},n}\right) - (n - r_k)\,\eta^*_{0,k+1}\, x_{r_k,n} + 2C \sum_{i=1}^{k} (r_i - r_{i-1} - 1)\,\eta_{1i}\left(x_{r_i,n} - x_{r_{i-1},n}\right) + mBC$$

and

$$E = \sum_{i=1}^{k} \beta_i\, x_{r_i,n}^2 - \sum_{i=1}^{k} (r_i - r_{i-1} - 1)\left(\eta_{2i}\, x_{r_i,n}^2 - \eta^*_{1i}\, x_{r_{i-1},n}^2\right) + (n - r_k)\,\eta^*_{2,k+1}\, x_{r_k,n}^2 + 2B \sum_{i=1}^{k} (r_i - r_{i-1} - 1)\,\eta_{1i}\left(x_{r_i,n} - x_{r_{i-1},n}\right) + mB^2 .$$
For some specific distributions one can prove that $D^2 + 4kE > 0$, so (34) always has a solution; see Balakrishnan, Gupta and Panchapakesan (1992, 1995a,b) and Fei, Kong and Tang (1995). The asymptotic variances and covariance of $\hat\mu$ and $\hat\sigma$ have the forms $\sigma^2 V_1/m^2$ and $\sigma^2 V_2/m^2$ for some quantities $V_1$ and $V_2$ which usually have complicated expressions. For various distributions the expressions for $V_1$ and $V_2$ have been derived; for details, see Balakrishnan, Gupta and Panchapakesan (1992, 1995a,b) and Fei, Kong and Tang (1995).

Unlike the regular MLE, the approximate MLEs for $\mu$ and $\sigma$ have closed forms. Since the difference between the approximate MLE and the MLE is $O(1/\sqrt{n})$, as $n \to \infty$ the approximate MLE enjoys all the nice asymptotic properties of the MLE, such as consistency, asymptotic normality and asymptotic efficiency. Extensive simulation studies performed by Balakrishnan and Cohen (1991) and Fei, Kong and Tang (1995) show that the approximate MLEs are, in various situations, nearly as efficient as the best linear unbiased estimators and the MLE. The approximate MLEs for $\beta$ and $\theta$ in the Weibull distribution can also be derived by applying the transformations (7) and (8) as in Section 2.
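The key ingredient of the AMLE is the linearization of the score terms at the quantiles $\xi_{r_i} = G^{-1}(p_{r_i})$. The sketch below checks this numerically for the standard extreme-value model $G(y) = 1 - \exp(-e^y)$, where $g'(y)/g(y) = 1 - e^y$; the chosen value of $p$ is a hypothetical $p_{r_i} = r_i/(n+1)$.

```python
import math

# Standard extreme-value model: G(y) = 1 - exp(-e^y), g(y) = exp(y - e^y).
def score(y):                  # g'(y)/g(y)
    return 1.0 - math.exp(y)

def linearized_score(p):
    """Linear Taylor approximation g'/g ~ alpha - beta*y expanded at
    xi = G^{-1}(p), as used in the AMLE construction."""
    xi = math.log(-math.log(1.0 - p))      # G^{-1}(p)
    beta = math.exp(xi)                    # -(d/dy)(g'/g) at xi
    alpha = score(xi) + xi * beta          # so that alpha - beta*xi = score(xi)
    return alpha, beta, xi

p = 0.4                                    # hypothetical p_{r_i}
alpha, beta, xi = linearized_score(p)
for dy in (-0.05, 0.0, 0.05):
    y = xi + dy
    print(round(y, 3), round(score(y) - (alpha - beta * y), 5))
```

The approximation is exact at $\xi$ and its error shrinks quadratically in the distance from $\xi$, which is why replacing the score terms by these linear forms turns (30) and (31) into linear and quadratic equations.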
5. Interval estimation for exponential distribution
In this section, we consider interval estimation for the parameters of the exponential distribution under multiply Type II censoring. One-parameter and two-parameter
exponential distributions are discussed separately. For each parameter, three different confidence intervals are provided, two approximate and one exact. The exact confidence interval can be used in most cases, except in some special situations.

Suppose $X$ has a two-parameter exponential distribution $\exp(\theta, \eta)$ with density function

$$f(x; \theta, \eta) = \frac{1}{\theta} \exp\left\{-\frac{x - \eta}{\theta}\right\}, \qquad x \ge \eta, \quad \theta > 0 , \qquad (35)$$

and cumulative distribution function

$$F(x; \theta, \eta) = 1 - \exp\left\{-\frac{x - \eta}{\theta}\right\}, \qquad x \ge \eta, \quad \theta > 0 , \qquad (36)$$

where $\eta \ge 0$ is the warranty time or threshold and $\theta$ is the residual mean survival time. If we make the transformation $Y = (X - \eta)/\theta$, then $Y \sim \exp(1, 0)$. For the order statistics $X_{1,n}, \ldots, X_{n,n}$ of a sample of size $n$ from the population $\exp(\theta, \eta)$, $Y_{1,n}, \ldots, Y_{n,n}$ are the corresponding order statistics of a sample from the population $\exp(1, 0)$. First we have the following lemma.

LEMMA 1. Assume $X \sim \exp(1, 0)$ and $X_{1,n}, \ldots, X_{n,n}$ are the order statistics of a sample of size $n$. Define $W_1 = X_{1,n}$, $W_2 = X_{2,n} - X_{1,n}$, \ldots, $W_n = X_{n,n} - X_{n-1,n}$; then

1. $W_1, \ldots, W_n$ are mutually independent, and $W_i \sim \exp\!\left(\frac{1}{n-i+1},\ 0\right)$, $i = 1, \ldots, n$.

2. $X_{i,n}$ has the representation

$$X_{i,n} = \sum_{j=1}^{i} \frac{Z_j}{n - j + 1}, \qquad i = 1, \ldots, n , \qquad (37)$$

where $Z_1, \ldots, Z_n$ are i.i.d. from $\exp(1, 0)$.

The proof can be found in Balakrishnan and Cohen (1991) or Arnold, Balakrishnan and Nagaraja (1992).
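The spacings representation (37) is easy to check by simulation: the sketch below builds exponential order statistics both by sorting and via (37), and compares the empirical means of each $X_{i,n}$ to the exact values $\sum_{j \le i} 1/(n-j+1)$ (sample size and replication count are arbitrary choices for the sketch).

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 20000

# Direct construction: sort n standard-exponential draws.
direct = np.sort(rng.exponential(1.0, size=(reps, n)), axis=1)

# Spacings construction (37): X_{i,n} = sum_{j<=i} Z_j / (n - j + 1).
z = rng.exponential(1.0, size=(reps, n))
spacing = np.cumsum(z / (n - np.arange(1, n + 1) + 1.0), axis=1)

# The two constructions agree in distribution; compare means, which
# for X_{i,n} equal sum_{j<=i} 1/(n-j+1).
exact = np.cumsum(1.0 / (n - np.arange(1, n + 1) + 1.0))
print(np.max(np.abs(direct.mean(axis=0) - exact)))
print(np.max(np.abs(spacing.mean(axis=0) - exact)))
```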
5.1. Confidence intervals for the one-parameter exponential distribution

In this subsection we assume the survival time has an exponential distribution $\exp(\theta, 0)$ and consider confidence intervals for $\theta$. Suppose $0 \le X_{r_1,n} < \cdots < X_{r_k,n} < \infty$ is a multiply Type II censored sample from $\exp(\theta, 0)$. According to Lemma 1, we have

$$X_{r_i,n} = \theta \sum_{j=1}^{r_i} \frac{Z_j}{n - j + 1} ,$$

where $Z_1, \ldots, Z_{r_k}$ are i.i.d. $\sim \exp(1, 0)$. Define

$$Q_k = \frac{2}{\theta} \sum_{i=1}^{k} X_{r_i,n} ; \qquad (38)$$
then

$$Q_k = \frac{2}{\theta}\sum_{i=1}^{k} X_{r_i,n} = \sum_{i=1}^{k} (k + 1 - i) \sum_{j=r_{i-1}+1}^{r_i} \frac{2Z_j}{n - j + 1} , \qquad (39)$$
with $r_0 = 0$. This is a pivotal statistic whose distribution does not depend on $\theta$. To derive its distribution, note that since $Z_j \sim \exp(1,0)$, $2Z_j$ has a $\chi^2(2)$ distribution. On the other hand, any $\chi^2(2)$ random variable can be expressed as a sum of two independent $\chi^2(1)$ random variables. Therefore $Q_k$ has the form of a weighted sum of $2r_k$ independent $\chi^2(1)$ random variables. Based on this observation, we can derive the distribution of $Q_k$ and thereby confidence intervals for the parameter $\theta$. In fact, we will provide three different confidence intervals for $\theta$, two approximate and one exact.

Method I. Denote the mean and variance of $Q_k$ by $\mu_k$ and $V_k$; then

$$\mu_k = E\,Q_k = 2\sum_{i=1}^{k} (k + 1 - i) \sum_{j=r_{i-1}+1}^{r_i} \frac{1}{n - j + 1} \qquad (40)$$

and

$$V_k = \mathrm{Var}(Q_k) = 4\sum_{i=1}^{k} \sum_{j=r_{i-1}+1}^{r_i} \frac{(k + 1 - i)^2}{(n - j + 1)^2} . \qquad (41)$$
According to Patnaik (1949), approximately,

$$\frac{2\mu_k Q_k}{V_k} \sim \chi^2\!\left(\frac{2\mu_k^2}{V_k}\right) , \qquad (42)$$

a chi-square distribution with $2\mu_k^2/V_k$ degrees of freedom. This approximation matches the first two moments exactly. If $2\mu_k^2/V_k$ is an integer, the quantiles of $2\mu_k Q_k/V_k$ can be found directly in a regular $\chi^2$ table. In general, however, this may not be the case. Then, if $v$ is small, quantiles such as $\chi^2_{\alpha}(v)$ or $\chi^2_{1-\alpha}(v)$ can be derived by interpolation. If $v$ is larger than 10, a very accurate approximation can be obtained through the Wilson-Hilferty transformation (see Lawless (1982) or Johnson, Kotz and Balakrishnan (1994)). That is, if $\chi^2(v)$ is a $\chi^2$ random variable with $v$ degrees of freedom, then approximately

$$\left\{\left(\frac{\chi^2(v)}{v}\right)^{\!1/3} - \left(1 - \frac{2}{9v}\right)\right\} \Big/ \left(\frac{2}{9v}\right)^{\!1/2} \sim N(0, 1) . \qquad (43)$$

As an example, if $z_{1-\alpha}$ is the $100(1-\alpha)$th quantile of $N(0,1)$, then correspondingly

$$\chi^2_{1-\alpha}(v) = v\left\{1 - \frac{2}{9v} + z_{1-\alpha}\left(\frac{2}{9v}\right)^{\!1/2}\right\}^{3} .$$

In our case $v = 2\mu_k^2/V_k$, so

$$\chi^2_{1-\alpha}\!\left(\frac{2\mu_k^2}{V_k}\right) = \frac{2\mu_k^2}{V_k}\left\{1 - \frac{V_k}{(3\mu_k)^2} + \frac{V_k^{1/2}}{3\mu_k}\, z_{1-\alpha}\right\}^{3} . \qquad (44)$$
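As a concrete sketch of Method I, the code below computes $\mu_k$ and $V_k$ from (40)-(41), the Patnaik degrees of freedom, the Wilson-Hilferty quantile (44), and the resulting lower bound $\theta_L = 4\mu_k \sum_i x_{r_i,n} / (V_k\, \chi^2_{1-\alpha})$, which follows from (38) and (42). The censoring pattern and data are invented for illustration.

```python
import math

def patnaik_bound(x_obs, ranks, n, z):
    """1 - alpha lower confidence bound for theta by Method I: Patnaik's
    chi-square approximation (42) with the Wilson-Hilferty quantile (44).
    z is the standard normal quantile z_{1-alpha}."""
    k = len(ranks)
    mu_k = v_k = 0.0
    prev = 0
    for i, r in enumerate(ranks, start=1):
        w = k + 1 - i
        for j in range(prev + 1, r + 1):
            mu_k += 2.0 * w / (n - j + 1)           # (40)
            v_k += 4.0 * w * w / (n - j + 1) ** 2   # (41)
        prev = r
    dof = 2.0 * mu_k ** 2 / v_k                      # Patnaik degrees of freedom
    c = math.sqrt(v_k) / (3.0 * mu_k)
    chi2_q = dof * (1.0 - c * c + z * c) ** 3        # (44), Wilson-Hilferty
    # 2 mu_k Q_k / V_k <= chi2_q with Q_k = 2 sum(x)/theta gives:
    return 4.0 * mu_k * sum(x_obs) / (v_k * chi2_q)

# Hypothetical censored sample (values invented for illustration).
n, ranks = 20, [2, 5, 9, 14]
x_obs = [0.3, 0.9, 1.8, 3.1]
print(round(patnaik_bound(x_obs, ranks, n, 1.645), 4))
```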
We can then easily obtain the $1-\alpha$ lower confidence bound for $\theta$; similarly, we can derive the $1-\alpha$ confidence interval for $\theta$.

Method II. Instead of transforming $Q_k$ to an approximate $N(0,1)$ variable, this method directly gives its approximate distribution. First we introduce the following lemma.

LEMMA 2. Assume $X_1, \ldots, X_n$ are i.i.d. random variables from $N(0,1)$ and $d_1, \ldots, d_n$ are positive numbers with $\sum_{i=1}^{n} d_i = 1$. Define $X = \sum_{i=1}^{n} d_i X_i^2$; then the density function of $X$ is approximately
$$g_d(x) = \frac{1}{2}\sum_{i=1}^{n} \frac{1}{\Gamma\!\left(\frac{1}{2d_i}\right)} \left(\frac{x}{2d_i}\right)^{\frac{1}{2d_i} - 1} e^{-x/(2d_i)} , \qquad x > 0 . \qquad (45)$$
For a proof, see Gabler and Wolff (1987). This formula gives a quite accurate approximation to the density of $X$. In fact, as pointed out by Gabler and Wolff (1987), the first three moments of $g_d(x)$ are identical to those of $X$, and the fourth and fifth moments are also quite close. A slight difference appears only when the $d_i$'s are close to each other.

In expression (39) the $2Z_j$, $j = 1, \ldots, r_k$, are i.i.d. $\sim \chi^2(2)$, so each can be expressed as a sum of squares of two independent $N(0,1)$ random variables. For each $j$, let $X_{j,1}$, $X_{j,2}$ be two such variables, that is, $2Z_j = X_{j,1}^2 + X_{j,2}^2$; then (39) becomes

$$Q_k = \sum_{i=1}^{k} \sum_{j=r_{i-1}+1}^{r_i} (k + 1 - i)\, \frac{X_{j,1}^2 + X_{j,2}^2}{n - j + 1} . \qquad (46)$$

Put $M_k = \mu_k/2$; then $Q_k/(2M_k)$ satisfies the conditions in Lemma 2. Defining

$$d_{ij} = \frac{k + 1 - i}{2M_k(n - j + 1)}, \qquad i = 1, \ldots, k, \quad j = r_{i-1} + 1, \ldots, r_i ,$$

the density of $Q_k/(2M_k)$ is approximately

$$g_d(x) = \sum_{i=1}^{k} \sum_{j=r_{i-1}+1}^{r_i} \frac{1}{\Gamma\!\left(\frac{1}{2d_{ij}}\right)} \left(\frac{x}{2d_{ij}}\right)^{\frac{1}{2d_{ij}} - 1} e^{-x/(2d_{ij})} , \qquad x > 0 . \qquad (47)$$
As an example, if $u_{1-\alpha}$ is the $100(1-\alpha)$th quantile of $Q_k/(2M_k)$, then the $1 - \alpha$ lower confidence bound for $\theta$ is

$$\theta_L = \frac{\sum_{i=1}^{k} x_{r_i,n}}{M_k\, u_{1-\alpha}} . \qquad (48)$$
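The gamma-mixture form of the Gabler-Wolff density is straightforward to evaluate. The sketch below implements the density as reconstructed in (45) (shape $1/(2d_i)$, scale $2d_i$, weight $d_i$ per component — a consequence of this reconstruction, not a formula quoted from Gabler and Wolff) and checks numerically that it has total mass 1 and mean 1, as the true distribution of $X$ must; the weights are an arbitrary example.

```python
import math

def gabler_wolff_density(dlist):
    """Approximate density (45) of X = sum d_i X_i^2 (with sum d_i = 1),
    written as a gamma mixture with component shapes 1/(2 d_i)."""
    def g(x):
        total = 0.0
        for d in dlist:
            nu = 1.0 / (2.0 * d)
            total += 0.5 / math.gamma(nu) * (x / (2.0 * d)) ** (nu - 1.0) \
                     * math.exp(-x / (2.0 * d))
        return total
    return g

d = [0.5, 0.3, 0.2]
g = gabler_wolff_density(d)

# Riemann-sum checks: total mass ~ 1 and mean ~ sum d_i E(X_i^2) = 1.
xs = [i * 0.002 for i in range(1, 15000)]
mass = sum(g(x) for x in xs) * 0.002
mean = sum(x * g(x) for x in xs) * 0.002
print(round(mass, 3), round(mean, 3))
```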
The $1-\alpha$ confidence interval for $\theta$ can be derived in the same way. In order to calculate $u_{1-\alpha}$, one needs to solve an equation that involves calculating a sum of $r_k$ partial Gamma functions; with high-speed computers and powerful software packages this becomes quite easy.

Method III. In many situations, the exact distribution of $Q_k$ can be found. Let us introduce the following lemma, due to Kamps (1990).

LEMMA 3. Let $Z_1, \ldots, Z_n$ be i.i.d. random variables with distribution $\exp(\theta, 0)$ for $\theta > 0$, and let $a_1, \ldots, a_n$ be distinct positive numbers. Define $T = \sum_{i=1}^{n} Z_i/a_i$; then the cumulative distribution function of $T$ is given by

$$G_n(t) = 1 - (-1)^{n-1} \prod_{i=1}^{n} a_i \sum_{k=1}^{n} a_k^{-1} \prod_{j=1,\, j \ne k}^{n} (a_k - a_j)^{-1}\, e^{-a_k t/\theta}, \qquad t > 0 . \qquad (49)$$
Q* = Qk/2 =
(50)
r,., _
where Zj (/" = 1 , . . . , rk are i.i.d with exp(1,0) distribution. Define
(n-j+l)/k (n-j+l)/(k-1) aj =
j = 1,...,rl j--rl+l,...,r2
.
(51)
n-j+l
j=rk-l+l,...,rk
,
then according to L e m m a 3, the cumulative distribution function of Q~ is rk
Grk(t)=
r/~
1-(-1)rk-lHaiZa~' i=1
h=l
rk
H
(a~-aj)-'e
ah,, , > 0 .
j=l,j=~h
(52) As an example, if t, ~ is the 100(1 - @th quantile of 1 - c~ lower confidence bound for 0 0L -
tl -~
Grk(t), we
have the exact
(53)
The exact $1-\alpha$ confidence interval for $\theta$ can be derived similarly. Again, complicated calculations are needed to derive the quantiles. Besides, for (52) to be valid, the $r_k$ numbers $a_1, \ldots, a_{r_k}$ defined in (51) must be distinct, so one cannot always use Lemma 3 to derive the exact confidence intervals.

5.2. Confidence intervals for the two-parameter exponential distribution
Consider a two-parameter exponential distribution $\exp(\theta, \eta)$ with $\eta > 0$, in which one believes that death or failure cannot occur before a certain time $\eta$. As earlier, suppose $X_{r_1,n}, \ldots, X_{r_k,n}$ is a multiply Type II censored sample obtained from a sample of size $n$ from $\exp(\theta, \eta)$. We have the following simple lemma, which is proved in Balakrishnan and Cohen (1991) and Arnold, Balakrishnan and Nagaraja (1992).

LEMMA 4. Assume $X \sim \exp(1, \eta)$ and $X_{1,n}, \ldots, X_{r,n}$ are the first $r$ order statistics of a sample of size $n$; then for $1 \le s < r \le n$, $V_{s,1} = X_{s+1,n} - X_{s,n}$, $V_{s,2} = X_{s+2,n} - X_{s,n}$, \ldots, $V_{s,r-s} = X_{r,n} - X_{s,n}$ are the first $r - s$ order statistics of a sample of size $n - s$ from the population $\exp(1, 0)$.

According to this lemma, $X_{r_2,n} - X_{r_1,n}, \ldots, X_{r_k,n} - X_{r_1,n}$ are the $(r_2 - r_1)$th, \ldots, $(r_k - r_1)$th order statistics of a sample of size $n - r_1$ from the one-parameter exponential distribution $\exp(\theta, 0)$. Therefore the results in Section 5.1 can be directly used to derive confidence intervals for $\theta$. Define

$$Q_1 = \frac{2(X_{r_1,n} - \eta)}{\theta} \qquad (54)$$

and

$$Q_{k,1} = \frac{2}{\theta}\sum_{i=2}^{k} (X_{r_i,n} - X_{r_1,n}) = 2\left[\sum_{i=1}^{k} X_{r_i,n} - kX_{r_1,n}\right] \Big/ \theta . \qquad (55)$$
The pivotal quantity $Q_{k,1}$ only involves the parameter $\theta$, so confidence intervals for $\theta$ can be constructed through $Q_{k,1}$ alone; the three methods provided in the previous section can be adopted here for that purpose. To derive confidence intervals for $\eta$, consider the pivotal quantity

$$Z = \frac{Q_1}{Q_{k,1}} = \frac{X_{r_1,n} - \eta}{\sum_{i=1}^{k} X_{r_i,n} - kX_{r_1,n}} , \qquad (56)$$

which is a ratio of two independent random variables, each a weighted sum of exponential or $\chi^2$ random variables. The approximate and exact distributions of $Q_1$ and $Q_{k,1}$ have been given earlier, so those of $Z$ are easy to derive. Since it is of particular interest to test the hypothesis $H_0$: $\eta = 0$ versus $H_1$: $\eta > 0$, besides constructing confidence intervals for $\eta$ we also provide three different tests of $H_0$.
Method I. By Lemma 1 and (42), approximately,

$$\frac{2\mu_1 Q_1}{V_1} \sim \chi^2(f_1) \quad \text{and} \quad \frac{2\mu_{k,1} Q_{k,1}}{V_{k,1}} \sim \chi^2(f_2) , \qquad (57)$$

where $f_1 = 2\mu_1^2/V_1$, $f_2 = 2\mu_{k,1}^2/V_{k,1}$, and $\mu_1$, $V_1$ and $\mu_{k,1}$, $V_{k,1}$ are the means and variances of $Q_1$ and $Q_{k,1}$. So, approximately,

$$F = \frac{2\mu_1 Q_1/(V_1 f_1)}{2\mu_{k,1} Q_{k,1}/(V_{k,1} f_2)} = \frac{\mu_{k,1}}{\mu_1} \cdot \frac{X_{r_1,n} - \eta}{\sum_{i=1}^{k} X_{r_i,n} - kX_{r_1,n}} \sim F(f_1, f_2) . \qquad (58)$$
An F-table is needed to find the quantiles; if $f_1$, $f_2$ are not integers, interpolation can help to determine the approximate quantiles. Denote by $F_{1-\alpha}(f_1, f_2)$ the $100(1-\alpha)$th quantile of $F(f_1, f_2)$; then the $1-\alpha$ lower confidence bound for $\eta$ is

$$\eta_L = X_{r_1,n} - \frac{\mu_1}{\mu_{k,1}}\, F_{1-\alpha}(f_1, f_2)\left[\sum_{i=1}^{k} X_{r_i,n} - kX_{r_1,n}\right] . \qquad (59)$$
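As a sketch of this computation, the code below evaluates the bound (59): $\mu_1$ follows from Lemma 1 applied to $Q_1$, and $\mu_{k,1}$ from the block weights of $Q_{k,1}$. The data are invented, and the F quantile is passed in as a plain number (a hypothetical value standing in for, e.g., `scipy.stats.f.ppf`, to keep the sketch standard-library only).

```python
def eta_lower_bound(x_obs, ranks, n, F_quantile):
    """1 - alpha lower confidence bound (59) for the threshold eta.
    F_quantile is F_{1-alpha}(f1, f2), supplied by the caller."""
    k = len(ranks)
    r1 = ranks[0]
    # mu_1 = E Q_1 = 2 sum_{j=1}^{r_1} 1/(n-j+1)
    mu_1 = 2.0 * sum(1.0 / (n - j + 1) for j in range(1, r1 + 1))
    # mu_{k,1} = E Q_{k,1}: weights k+1-i over the blocks i = 2..k
    mu_k1, prev = 0.0, r1
    for i, r in enumerate(ranks[1:], start=2):
        w = k + 1 - i
        mu_k1 += 2.0 * w * sum(1.0 / (n - j + 1) for j in range(prev + 1, r + 1))
        prev = r
    spread = sum(x_obs) - k * x_obs[0]
    eta_l = x_obs[0] - (mu_1 / mu_k1) * F_quantile * spread
    return max(eta_l, 0.0)            # truncated at zero, as in the text

# Hypothetical data; 1.94 is an assumed stand-in for an F quantile.
n, ranks = 20, [2, 5, 9, 14]
x_obs = [0.8, 1.4, 2.3, 3.6]
print(round(eta_lower_bound(x_obs, ranks, n, 1.94), 4))
```

A positive returned value would reject $H_0$: $\eta = 0$ at the corresponding level, mirroring the test described in the text.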
It is replaced by zero if $\eta_L$ is negative. In testing $H_0$: $\eta = 0$ versus $H_1$: $\eta > 0$, one can reject $H_0$ if $\eta_L > 0$.

Method II. In Section 5.1, Lemma 2 gives approximate distributions of both $Q_1/(2M_1)$ and $Q_{k,1}/(2M_{k,1})$, where $M_1 = \mu_1/2$ and $M_{k,1} = \mu_{k,1}/2$. One can use these approximations to derive the approximate distribution of $Z$. But if $r_1$ is too small, Lemma 2 may not give a good approximation to the distribution of $Q_1$, and therefore the approximate distribution derived for $Z$ may not be accurate. However, in this situation one can always use Lemma 3 to obtain the exact distribution of $Q_1$, and therefore a more accurate approximation to the distribution of $Z$. To do that, define

$$d_{jh} = \frac{k + 1 - j}{2M_{k,1}(n - h + 1)}, \qquad h = r_{j-1} + 1, \ldots, r_j, \quad j = 2, \ldots, k . \qquad (60)$$

Combining the exact distribution of $Q_1$ with the approximation of Lemma 2 applied to $Q_{k,1}/(2M_{k,1})$ yields an approximate density $f_Z(z)$, $z > 0$, for $Z$, whose terms involve factors of the form $\{(n - i + 1)z/2 + 1/(2d_{jh})\}^{-(1/(2d_{jh}) + 1)}$, $i = 1, \ldots, r_1$, with $(j, h)$ ranging over the index pairs in (60). Denote by $\zeta_{1-\alpha}$ the $100(1-\alpha)$th quantile of this approximate distribution; then the $1 - \alpha$ lower confidence bound for $\eta$ is

$$\eta_L = X_{r_1,n} - \zeta_{1-\alpha}\left[\sum_{i=1}^{k} X_{r_i,n} - kX_{r_1,n}\right] . \qquad (61)$$
Again, if $\eta_L$ is negative we replace it with zero; the null hypothesis $H_0$: $\eta = 0$ is rejected if $\eta_L > 0$. From Section 5.1, this approximation is quite accurate.

Method III. We can also derive the exact distribution of $Z$ through Lemma 3. Define

$$Q_1^* = Q_1/2 , \qquad Q_{k,1}^* = Q_{k,1}/2 ;$$

then $Q_1^*$ and $Q_{k,1}^*$ have the forms

$$Q_1^* = \sum_{i=1}^{n_1} Z_{1i}/a_i , \qquad Q_{k,1}^* = \sum_{i=1}^{n_2} Z_{2i}/b_i ,$$

where $Z_{11}, \ldots, Z_{1n_1}, Z_{21}, \ldots, Z_{2n_2}$ are i.i.d. $\exp(1, 0)$ random variables. If $a_1, \ldots, a_{n_1}$ and $b_1, \ldots, b_{n_2}$ are all distinct positive numbers, we can use Lemma 3 to derive the exact distribution of $Z$. In fact, $n_1 = r_1$, $a_i = n - i + 1$ for $i = 1, \ldots, r_1$, and $n_2 = r_k - r_1$, $b_j = (n - j + 1)/(k - i + 1)$ for $r_{i-1} < j \le r_i$, $i = 2, \ldots, k$. Note that $a_1, \ldots, a_{n_1}$ are automatically all distinct, so if the $b_j$, $r_{i-1} < j \le r_i$, $i = 2, \ldots, k$, are all distinct, the distribution of $Z = Q_1^*/Q_{k,1}^*$ is

$$G_Z(z) = 1 - (-1)^{r_k} \prod_{i=1}^{r_1} a_i \prod_{j=r_1+1}^{r_k} b_j \sum_{k'=1}^{r_1} \sum_{h=r_1+1}^{r_k} a_{k'}^{-1} \prod_{j=1,\, j \ne k'}^{r_1} (a_{k'} - a_j)^{-1} \prod_{l=r_1+1,\, l \ne h}^{r_k} (b_h - b_l)^{-1}\, \frac{1}{a_{k'} z + b_h} \qquad (62)$$

for $z > 0$. Assume $t_{1-\alpha}$ is the $100(1-\alpha)$th quantile of $G_Z(z)$; then the $1-\alpha$ lower confidence bound for $\eta$ can be derived as earlier, and it can be used to test $H_0$: $\eta = 0$. For the special case $r_1 = 1$, $a_1 = n$, and the distribution of $Z$ becomes

$$G_Z(z) = 1 - (-1)^{r_k} \prod_{j=2}^{r_k} b_j \sum_{h=2}^{r_k} \prod_{l=2,\, l \ne h}^{r_k} (b_h - b_l)^{-1}\, \frac{1}{nz + b_h} , \qquad z > 0 . \qquad (63)$$

Because of the structure of $G_Z(z)$, its quantile is actually easier to calculate than that of $f_Z(z)$.
References

Aitken, A. C. (1935). On least squares and linear combinations of observations. Proc. Roy. Soc. Edin. 55, 42-48.

Arnold, B. C., N. Balakrishnan and H. N. Nagaraja (1992). A First Course in Order Statistics. John Wiley & Sons, New York.

Bain, L. J. (1978). Statistical Analysis of Reliability and Life-Testing Models. Dekker, New York.

Balakrishnan, N. (1990). On the maximum likelihood estimation of the location and scale parameters of exponential distribution based on multiply Type II censored samples. J. Appl. Statist. 17, 55-61.

Balakrishnan, N. and P. S. Chan (1992). Order statistics from extreme value distribution, I: Tables of means, variances and covariances. Commun. Statist. - Simul. 21, 1199-1217.
Balakrishnan, N. and A. C. Cohen (1991). Order Statistics and Inference: Estimation Methods. Academic Press, Boston.

Balakrishnan, N., S. S. Gupta and S. Panchapakesan (1992). Estimation of the location and scale parameters of the extreme-value distribution based on multiply Type-II censored samples. Technical Report #91-38C, Purdue University.

Balakrishnan, N., S. S. Gupta and S. Panchapakesan (1995a). Estimation of the mean and standard deviation of the normal distribution based on multiply Type-II censored samples. J. Ital. Statist. Soc., in press.

Balakrishnan, N., S. S. Gupta and S. Panchapakesan (1995b). Estimation of the mean and standard deviation of the logistic distribution based on multiply Type-II censored samples. Statistics, in press.

Balasubramanian, K. and N. Balakrishnan (1992). Estimation for one- and two-parameter exponential distributions under multiple Type-II censoring. Statistical Papers 33, 203-216.

David, H. A. (1981). Order Statistics, Second Edition. John Wiley & Sons, New York.

Epstein, B. (1954). Truncated life tests in the exponential case. Ann. Math. Statist. 25, 555-564.

Fei, H. and F. Kong (1994). Interval estimations for one- and two-parameter exponential distributions under multiply Type II censoring. Commun. Statist. - Theor. Meth. 23, 1717-1733.

Fei, H., F. Kong and Y. Tang (1995). Estimation for two-parameter Weibull distribution and extreme-value distribution under multiply Type-II censoring. Commun. Statist. - Theor. Meth. 24, 2087-2104.

Gabler, S. and C. Wolff (1987). A quick and easy approximation to the distribution of a sum of weighted chi-square variables. Statistical Papers 28, 317-323.

Hall, P. (1984). Limit theorems for sums of general functions of m-spacings. Math. Proc. Camb. Phil. Soc. 96, 517-532.

Halperin, M. (1952). Maximum likelihood estimation in truncated samples. Ann. Math. Statist. 23, 226-238.

Johnson, N. L., S. Kotz and N. Balakrishnan (1994). Continuous Univariate Distributions, 2nd edn., Vol. 1. John Wiley & Sons, New York.

Kambo, N. S. (1978). Maximum likelihood estimators of the location and scale parameters of the exponential distribution from a censored sample. Commun. Statist. - Theor. Meth. A7(12), 1129-1132.

Khashimov, Sh. A. (1988). Asymptotic properties of functions of spacings. Theory Probab. Appl. 34, 298-306.

Kong, F. and H. Fei (1996). Limit theorems for the maximum likelihood estimators under multiple Type II censoring. Ann. Inst. Statist. Math. 48, 731-755.

Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. John Wiley & Sons, New York.

Lieblein, J. (1954). A new method of analyzing extreme value data. Nat'l Advisory Comm. for Aeronautics, Tech. Note 3053.

Lieblein, J. and M. Zelen (1956). Statistical investigation of the fatigue life of deep groove ball bearings. Research Paper 2719, J. Research, Nat'l Bur. Standards 57, 273-316.

Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.

Mann, N. R. (1967). Tables for obtaining the best linear invariant estimates of parameters of the Weibull distribution. Technometrics 9, 629-645.

Mann, N. R. (1968a). Results on statistical estimation and hypothesis testing with application to the Weibull distribution and extreme-value distribution. Aerospace Research Laboratories Report ARL 68-0068, Office of Aerospace Research, United States Air Force, Wright-Patterson Air Force Base, Ohio.

Mann, N. R. (1968b). Point and interval estimation procedures for the two-parameter Weibull and extreme-value distributions. Technometrics 10, 231-256.
Mann, N. R. (1984). Statistical estimation of parameters of the Weibull and Frechet distributions. In: J. Tiago de Oliveira, ed., Statistical Extremes and Applications, 81-89. D. Reidel Publishing Company, Dordrecht, Holland.

Mann, N. R. and K. W. Fertig (1973). Tables for obtaining Weibull confidence bounds and tolerance bounds based on best linear invariant estimates of parameters of the extreme-value distribution. Technometrics 15, 87-102.

Mann, N. R., R. E. Schafer and N. D. Singpurwalla (1974). Methods for Statistical Analysis of Reliability and Lifetime Data. John Wiley & Sons, New York.

Patnaik, P. B. (1949). The non-central chi-square and F distributions and their applications. Biometrika 36, 202-232.

Sirvanci, M. and G. Yang (1984). Estimation of the Weibull parameters under Type I censoring. J. Amer. Statist. Assoc. 79, 183-187.

White, J. S. (1964). Least-squares unbiased censored linear estimation for the log Weibull (extreme-value) distribution. J. Industrial Math. Soc. 14, 21-60.

Zhang, J., H. Fei and L. Wang (1982). Contrast among the accuracy of parameter estimate methods for the Weibull distribution. Acta Mathematicae Applicatae Sinica 5, 397-411.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
13
On Some Aspects of Ranked Set Sampling in Parametric Estimation
Nora Ni Chuiv and Bimal K. Sinha
1. Introduction
In environmental sampling situations, one is often involved with a random selection of a few experimental units (e.g., so-called "hot spots") from an available hazardous waste site, and subsequent measurement and analysis of a variety of chemicals which are believed to contribute to the negative effects in the environment. Since measurements of these chemicals can often be quite expensive, it has been a major objective to devise practically feasible and informative sampling strategies with as few actual measurements as possible. Similar situations also arise quite frequently in agriculture and forestry. In real life sampling situations, when the variable of interest from the experimental units can be more easily ranked than quantified, it turns out that, for estimation of the population mean, an old concept of McIntyre (1952), namely "Ranked Set Sampling" (RSS), is highly beneficial and much superior to standard simple random sampling (SRS). Fortunately, in many agricultural and environmental studies, it is indeed possible to rank the experimental or sampling units without actually measuring them. For example, rankings for hazardous waste sites should be made according to contamination levels. These levels can usually be indicated either by visual cues such as defoliation or soil discoloration, or by inexpensive indicators such as special chemically-responsive papers or electromagnetic readings. Sometimes information exists about the spatial distribution of a suspected chemical which can be used to create ranks. As another example, when estimating the level of PCB contamination along a pipeline in order to find contaminated sections, distance is a special factor. Since the longer the distance from the pipeline the lower the level of contamination, it is easy to rank samples based on distance. Analogously, in the area of forestry, there are quite a few situations where ranking based on cheap covariates can be done and used efficiently.
For applications in this field, we refer to Cobby et al. (1985), Halls and Dell (1966) and Martin et al. (1980). The basic concepts behind RSS can be briefly described as follows. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from $F(x)$ with mean $\mu$ and a finite variance $\sigma^2$.
Then the standard nonparametric estimator of $\mu$ is $\bar{X} = \sum_{i=1}^{n} X_i/n$ with $\operatorname{var}(\bar{X}) = \sigma^2/n$. In contrast to SRS, RSS uses only one observation, namely $X_{1:n} = X_{(11)}$, the lowest observation, from this set, then $X_{2:n} = X_{(22)}$, the second lowest from another independent set of $n$ observations, and finally $X_{n:n} = X_{(nn)}$, the largest observation from a last set of $n$ observations. This process can be described in a table as follows (see Table 1.1). The important point to emphasize is that although RSS requires identification of as many as $n^2$ experimental or sampling units, only $n$ of them, namely $\{X_{(11)}, \ldots, X_{(nn)}\}$, are actually measured, thus making a comparison of this sampling strategy with SRS of the same size $n$ meaningful. Obviously, RSS would be a serious contender to SRS in situations where the task of assembly of the sampling units is easy and their relative rankings in terms of the characteristic under study can be done with negligible cost. The new sample $X_{(11)}, X_{(22)}, \ldots, X_{(nn)}$, known in the literature as a Ranked Set Sample (RSS), consists of observations that are independent but not identically distributed. Moreover, marginally, $X_{(ii)}$ is distributed as $X_{i:n}$, the $i$th order statistic in a sample of size $n$ from $F(x)$. McIntyre (1952) proposed

$$\hat{\mu}_{\mathrm{rss}} = \sum_{i=1}^{n} X_{(ii)}/n \qquad (1.1)$$
as a rival estimator of $\mu$ as opposed to $\bar{X}$. It is easy to verify that $E(\hat{\mu}_{\mathrm{rss}}) = \mu$, and a somewhat surprising result (Takahasi and Wakimoto, 1968) which makes RSS a serious contender is that

$$\operatorname{var}(\hat{\mu}_{\mathrm{rss}}) < \operatorname{var}(\bar{X})\,. \qquad (1.2)$$
A direct proof of this variance inequality follows from the well known positive association property of the order statistics (Tukey, 1958; Bickel, 1967). Dell (1969) and Dell and Clutter (1972) provided the following explicit expression for the variance of $\hat{\mu}_{\mathrm{rss}}$, where $\mu_{(i)}$ is the mean of $X_{i:n}$:

$$\operatorname{var}(\hat{\mu}_{\mathrm{rss}}) = \sigma^2/n - \sum_{i=1}^{n} (\mu_{(i)} - \mu)^2/n^2\,. \qquad (1.3)$$
Many aspects of RSS have been studied in the literature. Takahasi and Wakimoto (1968) have shown that the relative precision (RP) of $\hat{\mu}_{\mathrm{rss}}$ relative to $\bar{X}$, defined as $\mathrm{RP} = \operatorname{var}(\bar{X})/\operatorname{var}(\hat{\mu}_{\mathrm{rss}})$, satisfies $1 \le \mathrm{RP} \le (n+1)/2$, with $\mathrm{RP} = (n+1)/2$ in the case where the population is uniform. Patil et al. (1992a) computed the expression for RP for many discrete and continuous distributions. David and Levine (1972) and Ridout and Cobby (1987) discussed the consequences of the presence of errors in ranking. For some other aspects of RSS, we refer to Kvam and Samaniego (1991), Muttlak and McDonald (1990a,b), Stokes (1977, 1986), Stokes and Sager (1988), Takahashi (1969, 1970), Takahashi and Futatsuya (1988), Yanagawa and Shirahata (1976), Yanagawa and Chen (1980), and Patil et al. (1992b).
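The variance reduction in (1.2)–(1.3) is easy to verify numerically. The following sketch (not from the chapter; a minimal Monte Carlo simulation under an assumed N(0, 1) population) draws many RSS and SRS samples of the same size and compares the empirical variances of $\hat{\mu}_{\mathrm{rss}}$ and $\bar{X}$:

```python
import random
import statistics

def srs_mean(n, rng):
    # Simple random sample of size n: all n units are measured.
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))

def rss_mean(n, rng):
    # Ranked set sample: rank n independent sets of n units each and
    # measure only the i-th smallest from the i-th set (the diagonal
    # of Table 1.1), so n measurements in all.
    total = 0.0
    for i in range(n):
        row = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        total += row[i]          # X_(ii)
    return total / n

rng = random.Random(12345)
n, reps = 5, 20000
var_srs = statistics.variance([srs_mean(n, rng) for _ in range(reps)])
var_rss = statistics.variance([rss_mean(n, rng) for _ in range(reps)])
print(var_srs, var_rss)   # empirical check of (1.2): var_rss < var_srs
```

For $n = 5$ and a standard normal population, $\operatorname{var}(\bar{X}) = 0.2$, while (1.3) gives $\operatorname{var}(\hat{\mu}_{\mathrm{rss}}) \approx 0.07$; the simulation reproduces both to Monte Carlo accuracy.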
Since an application of RSS involves assembly of $n^2$ units and repeated ranking of $n$ units at a time, naturally it is most suitable for small values of $n$. However, this has the negative implication that the resultant RSS estimator may not be very efficient. Moreover, the SRS-based estimator, $\bar{X}$, does not suffer from such artificial constraints. To increase the efficiency of the RSS-based estimator of $\mu$, McIntyre (1952) suggested replicating the entire RSS process several times. Thus, for example, to compare against a SRS of size 15, instead of working with an RSS based on $n = 15$, which requires assembly of 225 units and ranking 15 units at a time and may involve ranking errors, one may consider an RSS procedure based on 3 units and replicate the whole process 5 times, thus requiring an assembly of 45 units and a total of 15 actual measurements as under the SRS scheme. Another possibility is to use an RSS procedure based on 5 units and replicate the entire process 3 times. Quite generally, if $n = r \times s$, with $r \le s$, we can use an RSS procedure based on $r$ units at a time, and repeat the process $s$ times. This would require an assembly of $r^2 s$ units and actual measurements of $n$ units as under the SRS scheme. Referring to Table 1.1 with $n = r$, the data collected from the $i$th cycle can be denoted by $\{X^{(i)}_{(11:r)}, \ldots, X^{(i)}_{(rr:r)}\}$, $i = 1, \ldots, s$. The overall estimator of $\mu$ is then given by

$$\hat{\mu}_{\mathrm{rss}}(n = r \times s) = \frac{1}{s}\sum_{i=1}^{s}\left[\sum_{j=1}^{r} X^{(i)}_{(jj:r)}/r\right] = \frac{1}{r}\sum_{j=1}^{r}\left[\sum_{i=1}^{s} X^{(i)}_{(jj:r)}/s\right]\,. \qquad (1.4)$$

The expression within $[\,\cdot\,]$ in the first part of (1.4) is clearly the average of the $r$ order statistics for a fixed cycle, while that in the last part of (1.4) is the average of the $j$th order statistic (in a sample of size $r$) over the $s$ cycles. The variance of $\hat{\mu}_{\mathrm{rss}}(n = r \times s)$ is given by (see (1.3))

$$\operatorname{var}(\hat{\mu}_{\mathrm{rss}}(n = r \times s)) = \left[\sigma^2/r - \sum_{j=1}^{r}(\mu_{(j:r)} - \mu)^2/r^2\right]\Big/ s\,, \qquad (1.5)$$

where $E(X^{(i)}_{(jj:r)}) = \mu_{(j:r)}$, thus indicating its superiority over $\bar{X}$ based on $n = r \times s$ units. Writing

$$\operatorname{var}(X^{(i)}_{(jj:r)}) = \sigma^2_{(j:r)}\,, \qquad (1.6)$$
Table 1.1
Display of $n^2$ observations in $n$ sets of $n$ each

X_(11)   X_(12)   ...   X_(1(n-1))   X_(1n)
X_(21)   X_(22)   ...   X_(2(n-1))   X_(2n)
 ...      ...     ...      ...        ...
X_(n1)   X_(n2)   ...   X_(n(n-1))   X_(nn)
it follows easily that

$$E(\hat{\mu}_{\mathrm{rss}}(n = r \times s)) = \sum_{j=1}^{r} \mu_{(j:r)}/r = \mu\,, \qquad (1.7)$$

$$\operatorname{var}(\hat{\mu}_{\mathrm{rss}}(n = r \times s)) = \sum_{j=1}^{r} \sigma^2_{(j:r)}\big/(r^2 s)\,.$$

What we have described above can be called an equal allocation scheme, in the sense that each of the $r$ order statistics is replicated an equal number of times, namely $s$ times. It is quite possible to use unequal allocation schemes as well. Starting with $r$ units at a time and measuring (after ranking) the smallest unit $s_1$ times, the second smallest $s_2$ times, and so on, we end up with a collection of $n = s_1 + \cdots + s_r$ measurements. An unbiased estimator of $\mu$ is then constructed as

$$\hat{\mu}_{\mathrm{rss}}(\text{unequal}) = \frac{1}{r}\sum_{j=1}^{r}\left[\sum_{i=1}^{s_j} X^{(i)}_{(jj:r)}/s_j\right] \qquad (1.8)$$

with its variance given by

$$\operatorname{var}(\hat{\mu}_{\mathrm{rss}}(\text{unequal})) = \frac{1}{r^2}\sum_{j=1}^{r} \sigma^2_{(j:r)}/s_j\,. \qquad (1.9)$$
It is also possible to discuss the problem of optimal allocation of the replications $s_1, \ldots, s_r$ among the $r$ order statistics for a fixed number $n$ of actual measurements. It is readily seen from (1.9) that the optimal allocation corresponds to what is popularly known as Neyman allocation in the context of stratified sampling. Henceforth we will rarely discuss the question of replications in the rest of this chapter, and assume throughout that Table 1.1 provides our basic setup. Admittedly, the concept of RSS is nonparametric in nature, and $\hat{\mu}_{\mathrm{rss}}$ is a natural candidate for unbiased estimation of $\mu$ on the basis of RSS as described above when $F(x)$ is completely unknown. In this chapter we explore the concept of RSS and exploit its full potential for estimation of parameters in some specific parametric models. As mentioned earlier, this is a survey paper based primarily on some very recent works of these authors and their associates. We discuss in detail several common parametric models, namely normal, exponential, two parameter exponential, logistic, Cauchy, Weibull, and extreme value distributions, and discuss the usefulness of RSS methods as well as their suitable modifications for estimation of the relevant parameters. Most of the variations of RSS stem from addressing such issues as (i) what is the best linear unbiased estimator (BLUE) based on $X_{(11)}, \ldots, X_{(nn)}$? (ii) what is the best selection of $n$ order statistics, one each from $n$ sets of $n$ observations each? (iii) why not use $X_{(11)}, X_{(21)}, \ldots, X_{(n1)}$, all smallest, or $X_{(1n)}, X_{(2n)}, \ldots, X_{(nn)}$, all largest? (iv) can we get away with a smaller (i.e., a partial) RSS? As expected, the answers to the above questions depend on the particular model under study as well as the nature of the parameter being estimated.
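The unequal-allocation variance (1.9) and its Neyman-type optimum can be sketched numerically. In the following, the within-rank standard deviations $\sigma_{(j:r)}$ are hypothetical illustrative numbers, not values from the chapter; the code only demonstrates that allocating replications proportionally to $\sigma_{(j:r)}$ lowers (1.9) relative to equal allocation:

```python
def var_unequal(sigmas, s):
    # (1.9): var = (1/r^2) * sum_j sigma_(j:r)^2 / s_j
    r = len(sigmas)
    return sum(sig * sig / sj for sig, sj in zip(sigmas, s)) / (r * r)

def neyman_allocation(sigmas, n):
    # Continuous Neyman allocation: s_j proportional to sigma_(j:r).
    total = sum(sigmas)
    return [n * sig / total for sig in sigmas]

sigmas = [0.6, 1.0, 1.4]     # hypothetical sigma_(j:3)
r, s_cycles, n = 3, 5, 15    # n = r x s measurements in all

v_equal = var_unequal(sigmas, [s_cycles] * r)
v_neyman = var_unequal(sigmas, neyman_allocation(sigmas, n))
print(v_equal, v_neyman)     # Neyman allocation can only lower (1.9)
```

Under continuous Neyman allocation the minimum of (1.9) is $(\sum_j \sigma_{(j:r)})^2/(n r^2)$, exactly as in stratified sampling; in practice the $s_j$ must of course be rounded to integers.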
Section 2 is devoted to the case of a normal population. We discuss estimation of a normal mean as well as a normal variance using an RSS. For the estimation of a normal variance $\sigma^2$, we explore the possibility of obtaining an improved estimator based on $X_{(11)}, X_{(22)}, \ldots, X_{(nn)}$ compared to the standard nonparametric estimator $\sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)$. Specifically, we investigate if $\sum_{i=1}^{n} (X_{(ii)} - \hat{\mu}_{\mathrm{rss}})^2/(n-1)$ can be used fruitfully. It may be noted that Stokes (1980) provided some asymptotic results for estimation of $\sigma^2$ based on an RSS. In Section 3 we consider the problem of estimation of an exponential mean. In Section 4 we discuss the estimation of the location and scale parameters of a two parameter exponential distribution. Section 5 is devoted to the problem of estimation of the location parameter of a Cauchy distribution with an unknown scale parameter. In Section 6 we discuss the problem of estimation of the location and scale parameters of a logistic distribution. Finally, Section 7 addresses the problem of estimation of the location and scale parameters of a Weibull distribution and an extreme value distribution. As is obvious from the above discussion, the theory of RSS is primarily based on properties of order statistics, and hence some standard text books on order statistics (Arnold and Balakrishnan, 1989; Balakrishnan and Cohen, 1991; David, 1981) as well as many related tables (Tietjen et al., 1977; Watanabe et al., 1957) play a crucial role.

2. Estimation of a normal mean and a normal variance
In this section we consider the situation when the underlying population is normal, and discuss the use of RSS for estimation of its mean and variance.
2.1. Estimation of mean: BLUE

We first address the issue of how best to use the RSS, namely $X_{(11)}, \ldots, X_{(nn)}$, for estimation of $\mu$. Recall that $E(X_{(ii)}) = \mu + \alpha_i\sigma$ and $\operatorname{var}(X_{(ii)}) = v_i\sigma^2$, where $\alpha_i$ and $v_i$ are respectively the expected value and the variance of the $i$th order statistic in a sample of size $n$ from a standard normal population. Obviously, $\sum \alpha_i = 0$, and by the symmetry of the normal distribution, $\sum \alpha_i/v_i = 0$. It then easily follows that the BLUE of $\mu$ is given by

$$\hat{\mu}_{\mathrm{blue}} = \frac{\sum_{i=1}^{n} X_{(ii)}/v_i}{\sum_{i=1}^{n} 1/v_i} \qquad (2.1)$$

with minimum variance $= \sigma^2\left(\sum_{i=1}^{n} 1/v_i\right)^{-1}$, which is always smaller than $\sigma^2 \sum_{i=1}^{n} v_i/n^2 = \operatorname{var}(\hat{\mu}_{\mathrm{rss}})$. Thus $\hat{\mu}_{\mathrm{blue}}$ offers the first improvement over the standard RSS estimator $\hat{\mu}_{\mathrm{rss}}$. Incidentally, we can also derive the BLUE of $\mu$ based on a partial RSS, namely $X_{(11)}, \ldots, X_{(ll)}$, for $l < n$. Starting with $\sum_{i=1}^{l} c_i X_{(ii)}$ and minimizing $\operatorname{var}(\sum_{i=1}^{l} c_i X_{(ii)}) = \sigma^2 \sum_{i=1}^{l} c_i^2 v_i$ subject to the unbiasedness conditions $\sum_{i=1}^{l} c_i = 1$, $\sum_{i=1}^{l} c_i\alpha_i = 0$, leads to the BLUE of $\mu$ as

$$\hat{\mu}_{\mathrm{blue}}(l) = \frac{\left(\sum_{i=1}^{l} \alpha_i^2/v_i\right)\left(\sum_{i=1}^{l} X_{(ii)}/v_i\right) - \left(\sum_{i=1}^{l} \alpha_i/v_i\right)\left(\sum_{i=1}^{l} \alpha_i X_{(ii)}/v_i\right)}{\left(\sum_{i=1}^{l} \alpha_i^2/v_i\right)\left(\sum_{i=1}^{l} 1/v_i\right) - \left(\sum_{i=1}^{l} \alpha_i/v_i\right)^2} \qquad (2.2)$$

with

$$\operatorname{var}(\hat{\mu}_{\mathrm{blue}}(l)) = \frac{\sigma^2 \sum_{i=1}^{l} \alpha_i^2/v_i}{\left(\sum_{i=1}^{l} \alpha_i^2/v_i\right)\left(\sum_{i=1}^{l} 1/v_i\right) - \left(\sum_{i=1}^{l} \alpha_i/v_i\right)^2}\,. \qquad (2.3)$$
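The BLUE (2.1) for $n = 3$ can be checked with exact normal order-statistic moments. The constants below are standard results for $N(0,1)$ order statistics of a sample of size 3 (the means are $\mp 3/(2\sqrt{\pi}), 0$; the variances are $1 + \sqrt{3}/(2\pi) - 9/(4\pi)$ for the extremes and $1 - \sqrt{3}/\pi$ for the median); the code itself is our illustration, not the authors':

```python
import math

# Exact N(0,1) order-statistic moments for n = 3 (standard results).
a3 = 3.0 / (2.0 * math.sqrt(math.pi))
v_ext = 1.0 + math.sqrt(3.0) / (2.0 * math.pi) - 9.0 / (4.0 * math.pi)
v_med = 1.0 - math.sqrt(3.0) / math.pi

alpha = [-a3, 0.0, a3]      # alpha_i = E(X_{i:3})
v = [v_ext, v_med, v_ext]   # v_i = var(X_{i:3})

def mu_blue(x):
    # (2.1): weighted mean of X_(ii) with weights 1/v_i.
    w = [1.0 / vi for vi in v]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

var_blue = 1.0 / sum(1.0 / vi for vi in v)   # minimum variance, sigma^2 = 1
var_rss = sum(v) / 9.0                       # sigma^2 * sum v_i / n^2
print(var_blue, var_rss)                     # both beat var(Xbar) = 1/3
```

Numerically $\operatorname{var}(\hat{\mu}_{\mathrm{blue}}) \approx 0.1723 < \operatorname{var}(\hat{\mu}_{\mathrm{rss}}) \approx 0.1742 < 1/3$, illustrating the chain of improvements claimed above.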
The following table provides, for $n \le 20$, minimum values of $l$ for which $\operatorname{var}(\hat{\mu}_{\mathrm{blue}}(l)) < \operatorname{var}(\bar{X})$, and shows that often a partial RSS combined with optimum weights does better than a SRS of size $n$. Thus, for example, $\hat{\mu}_{\mathrm{blue}}(6)$ based on a partial RSS of size $l = 6$ is nearly twice as efficient as the mean $\bar{X}$ of a SRS of size $n = 10$. It should be noted that without some knowledge of $F(x)$, a partial RSS cannot be used even to construct an unbiased estimator of $\mu$.

2.2. Estimation of mean: which order statistic?
We next address the issue of the right selection of order statistics in the context of RSS, given that we must select one from each set of $n$ observations, there are $n$ such sets, and that the resultant estimator of $\mu$ is unbiased. Recall that McIntyre's estimator $\hat{\mu}_{\mathrm{rss}}$ is based upon selecting the diagonal elements $(X_{(11)}, \ldots, X_{(nn)})$ in Table 1.1. The following variance inequality for order statistics of a normal distribution is useful for our subsequent discussion. Its validity for $n \le 50$ can be seen from an inspection of the tables in Tietjen et al. (1977), and asymptotic validity follows from David and Groeneveld (1982). Throughout this chapter, the sample median is defined in the usual way as $X_{\mathrm{median}:n} = X_{k+1:n}$ for $n = 2k+1$, and $X_{\mathrm{median}:n} = (X_{k:n} + X_{k+1:n})/2$ for $n = 2k$.

Table 2.1
Values of $l$ and $n$ for which $\operatorname{var}(\hat{\mu}_{\mathrm{blue}}(l)) < \operatorname{var}(\bar{X})$

n    var(Xbar)    l    var(mu_blue(l))
5    0.2000s2     4    0.0937s2
6    0.1667s2     4    0.1103s2
7    0.1429s2     4    0.1396s2
8    0.1250s2     5    0.0776s2
9    0.1111s2     5    0.0933s2
10   0.1000s2     6    0.0574s2
11   0.0909s2     6    0.0667s2
12   0.0833s2     6    0.0770s2
13   0.0769s2     7    0.0501s2
14   0.0714s2     7    0.0566s2
15   0.0667s2     7    0.0633s2
16   0.0625s2     8    0.0484s2
17   0.0588s2     8    0.0479s2
18   0.0556s2     8    0.0525s2
19   0.0526s2     9    0.0375s2
20   0.0500s2     10   0.0435s2

(s2 denotes $\sigma^2$.)

LEMMA 2.1. $\operatorname{var}(X_{\mathrm{median}:n}) \le \operatorname{var}(X_{r:n})$ for any $r$ and $n$.

In view of the above result, we can recommend the use of the median from each set of $n$ observations, and the mean of all such medians as an estimator of $\mu$, namely

$$\hat{\mu}(n:n) = \left[X^{(1)}_{\mathrm{median}:n} + \cdots + X^{(n)}_{\mathrm{median}:n}\right]/n\,, \qquad (2.4)$$
where $X^{(i)}_{\mathrm{median}:n}$ is the sample median from the $i$th row of Table 1.1. Obviously, $\hat{\mu}(n:n)$ is unbiased for $\mu$ and, by Lemma 2.1, much better than the mean based on an ordinary RSS. We should note however that this procedure requires measuring exactly $n$ units for $n$ odd, but $2n$ units for $n$ even. Thus, for $n$ odd, the above variation of RSS is certainly preferable. We point out below that a further variation of $\hat{\mu}(n:n)$ works equally well for $n$ even. Towards this end, slightly more efficiently, we propose measuring only $m$ medians from the first $m$ rows of Table 1.1, where $m \le n$, and use

$$\hat{\mu}(m,n) = \left[X^{(1)}_{\mathrm{median}:n} + \cdots + X^{(m)}_{\mathrm{median}:n}\right]/m \qquad (2.5)$$
as an estimator of $\mu$. Note that this estimator could potentially use fewer measurements than McIntyre's, depending on $m$ and on whether $n$ is odd or even. Clearly,

$$E(\hat{\mu}(m,n)) = \mu\,, \qquad \operatorname{var}(\hat{\mu}(m,n)) = \operatorname{var}(X_{\mathrm{median}:n})/m\,. \qquad (2.6)$$

A comparison of $\operatorname{var}(\bar{X})$ and $\operatorname{var}(\hat{\mu}(m,n))$ for $m = n, (n-1), (n-2), (n-3)$ appears in the following table. It is clear from the table that, for every $n$, a lot of scope for improvement over the usual sample mean exists with the use of fewer than $n$ medians. It turns out that a general result in this direction is that $m = 2$ is enough for every $n$! This is because of the following interesting variance inequality, whose validity can again be seen from an inspection of the tables in Tietjen et al. (1977). A formal proof appears in Sinha et al. (1996). (A plausibility argument in favor of this inequality follows from the asymptotic expressions: $n\,\operatorname{asympvar}(\bar{X}_n) = \sigma^2$ and $n\,\operatorname{asympvar}(X_{\mathrm{median}:n}) = \pi\sigma^2/2$.)
Table 2.2
Values of $\operatorname{var}(\bar{X})$ and $\operatorname{var}(\hat{\mu}(m,n))$ for $\sigma^2 = 1$

n    var(Xbar)   var(mu(n,n))   var(mu(n-1,n))   var(mu(n-2,n))   var(mu(n-3,n))
5    0.2         0.0574         0.0717           0.0956           0.1434
10   0.1         0.0151         0.0167           0.0189           0.0216
15   0.0667      0.0068         0.0076           0.0079           0.0086
THEOREM 2.1. $\operatorname{var}(\bar{X}_n) < \operatorname{var}(X_{\mathrm{median}:n}) < 2\operatorname{var}(\bar{X}_n)$.

In fact, as the proof of the theorem demonstrates, for $n$ even $= 2m$, a slightly stronger variance inequality holds, namely

$$\operatorname{var}(X_{m:2m}) < 2\operatorname{var}(\bar{X}_{2m})\,. \qquad (2.7)$$

In view of Theorem 2.1 and the above inequality, it follows that $\hat{\mu}(2,n)$, which is the average of only two suitably selected ranked set observations for $n$ odd (namely, two medians), and $\hat{\mu}^*(2,n)$, which is also based on only two ranked set observations for $n$ even, will be better than the simple average of $n$ observations, whatever be $n$. This is a very powerful and interesting result. Referring to Table 1.1, we would need only its first two rows and use

$$\hat{\mu}(2,n) = \left(X_{(1(\frac{n+1}{2}))} + X_{(2(\frac{n+1}{2}))}\right)/2 \quad \text{for } n \text{ odd} \qquad (2.8)$$

and

$$\hat{\mu}^*(2,n) = \left(X_{(1(\frac{n}{2}))} + X_{(2(\frac{n}{2}))}\right)/2 \quad \text{for } n \text{ even}\,. \qquad (2.9)$$
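Theorem 2.1 and the two-median estimator (2.8) are easy to probe by simulation. The sketch below (ours, assuming an N(0, 1) population) averages the medians of the first two rows of Table 1.1 for $n = 9$ and compares with $\operatorname{var}(\bar{X}) = 1/n$:

```python
import random
import statistics

def mu_two_medians(n, rng):
    # (2.8) for n odd: average the sample medians of two independent
    # ranked sets of n units each (rows 1 and 2 of Table 1.1).
    m1 = sorted(rng.gauss(0.0, 1.0) for _ in range(n))[n // 2]
    m2 = sorted(rng.gauss(0.0, 1.0) for _ in range(n))[n // 2]
    return (m1 + m2) / 2.0

rng = random.Random(2024)
n, reps = 9, 20000
var_pair = statistics.variance([mu_two_medians(n, rng) for _ in range(reps)])
var_mean = 1.0 / n      # var(Xbar) for sigma^2 = 1
print(var_pair, var_mean)
```

Since $\operatorname{var}(X_{\mathrm{median}:9}) \approx 0.166$, the pair average has variance $\approx 0.083 < 1/9$, while $2\,\operatorname{var}(\hat{\mu}(2,9)) = \operatorname{var}(X_{\mathrm{median}:9})$ stays between $\operatorname{var}(\bar{X}_9)$ and $2\operatorname{var}(\bar{X}_9)$, as Theorem 2.1 requires.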
We now discuss, following Dixon (1957), another procedure for improved estimation of $\mu$ based on a RSS, if we are permitted to use four observations in all, whatever be $n$. Towards this end, given $X_1, \ldots, X_n$, we first find the best pair of observations $(X_{r:n}, X_{n-r+1:n})$ in the sense of determining $r$ such that the unbiased estimator $(X_{r:n} + X_{n-r+1:n})/2$ has the smallest variance. It is tempting to conclude from Lemma 2.1 that the sample median, which corresponds to $r = n/2$ for $n$ even and $r = (n+1)/2$ for $n$ odd, would be the best choice. This is far from being true, as demonstrated in Dixon (1957), who provided the best values of $r$ along with the smallest variance for $3 \le n \le 20$. Values of $r$ along with the smallest variance for $21 \le n \le 40$ appear in Table 2.3 below. We have taken $\sigma^2 = 1$ without any loss of generality. For large $n$, the best $r = [np]$ is obtained by minimizing $\psi(p) = \operatorname{asympvar}\left(\sqrt{n}\,(X_{r:n} + X_{n-r+1:n})/2\right) = p\sigma^2/2\phi^2(\eta_p)$ with respect to $p$, where $\eta_p$ is the $p$th quantile of a standard normal population. This results in $p_{\mathrm{opt}} = 0.27$,

Table 2.3
Values of best $r$ and $\operatorname{var}[(X_{r:n} + X_{n-r+1:n})/2]$ for $21 \le n \le 40$

n    r    var[(X_r:n + X_{n-r+1:n})/2]      n    r    var[(X_r:n + X_{n-r+1:n})/2]
21   6    0.0578                            31   9    0.0394
22   6    0.0553                            32   9    0.0382
23   7    0.0529                            33   9    0.0371
24   7    0.0507                            34   10   0.0360
25   7    0.0487                            35   10   0.0349
26   7    0.0469                            36   10   0.0340
27   8    0.0452                            37   10   0.0331
28   8    0.0435                            38   11   0.0322
29   8    0.0421                            39   11   0.0314
30   9    0.0407                            40   11   0.0306
as observed in Dixon (1957), with the resultant minimum $\psi(p_{\mathrm{opt}}) = 1.2344\sigma^2$. Incidentally, the asymptotic validity of Theorem 2.1 for this best pair is obvious. It thus follows that, for every $n$, there is a unique pair $(X_{r:n}, X_{n-r+1:n})$ such that $(X_{r:n} + X_{n-r+1:n})/2$ performs better than the sample median, and hence satisfies the variance inequality of Theorem 2.1. It is then clear that, whatever be $n$, the unbiased estimator $\hat{\mu}_r = \left(X_{(1r)} + X_{(1(n-r+1))} + X_{(2r)} + X_{(2(n-r+1))}\right)/4$, based on the use of only four suitably selected measurements (see Table 1.1), does indeed represent the best combination of four observations, significantly outperforming the sample mean based on all $n$ observations. As mentioned before, the optimum $r$ is available for $n \le 20$ in Dixon (1957), and for $21 \le n \le 40$ in Table 2.3 above. For large $n$, we can use $r = [0.27n]$.

2.3. Estimation of mean: why not all smallest or all largest?

We now discuss some other variations of RSS. In many experimental studies, locating the smallest or the largest of a set of observations could be very easy compared to locating the median as suggested above, or locating all order statistics as required in McIntyre's setup. In this part, we need to assume that $\sigma^2$ is known ($= 1$, without any loss of generality), and propose $\hat{\mu}_{\min} = \sum_{i=1}^{n} X_{(i1)}/n - \alpha_1$, the bias-corrected mean of the $n$ smallest observations (see Table 1.1), as an estimator of $\mu$. More efficiently, we suggest the use of

$$\hat{\mu}_{\min}(m) = \sum_{i=1}^{m} X_{(i1)}/m - \alpha_1 \qquad (2.10)$$
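The bias-correction constant $\alpha_1$ and the minimum's variance $v_1$ have no simple closed form for general $n$, but a short Monte Carlo sketch (ours; known-variance N($\mu$, 1) population assumed, with $\mu = 0$ for convenience) estimates them and checks the $n = 5$, $m = 3$ entry of Table 2.4:

```python
import random
import statistics

rng = random.Random(7)
n, reps = 5, 40000
minima = [min(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(reps)]
alpha1 = statistics.fmean(minima)    # E(X_{1:5}), about -1.163
v1 = statistics.variance(minima)     # var(X_{1:5}), about 0.448

m = 3
var_min_m = v1 / m                   # var of mu_min(m) in (2.10)
var_srs = 1.0 / n                    # var of Xbar
print(alpha1, var_min_m, var_srs)
```

With $m = 3$ measured minima, $\operatorname{var}(\hat{\mu}_{\min}(3)) \approx 0.149 < 0.2 = \operatorname{var}(\bar{X}_5)$, consistent with the allocation in Table 2.4 below.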
based on the $m\ (\le n)$ smallest observations. Note that $E(\hat{\mu}_{\min}(m)) = \mu$ and $\operatorname{var}(\hat{\mu}_{\min}(m)) = v_1/m$. Analogously, we could use the average of the maximums if that is more convenient. The following table provides, for $n \le 20$, values of the minimum $m$ for which $\hat{\mu}_{\min}(m)$ is better than $\bar{X}$ based on a SRS of size $n$, and demonstrates that this variation of RSS is quite efficient. Thus, for example, when the population variance is known, the bias-corrected average of only 6 smallest observations, one from each set of 20 observations, does better than the simple average of as many as 20 observations.

Table 2.4
Values of $m$, $n$ satisfying $\operatorname{var}(\hat{\mu}_{\min}(m)) < \operatorname{var}(\bar{X})$

n    2-4    5-8    9-12    13-17    18-20
m    2      3      4       5        6

2.4. Estimation of mean: concept of expansion

We next explore another interesting idea of expansion by raising the possibility of observing $l\ (> n)$ experimental units but measuring at most $m\ (< n)$ of them. The object is to efficiently use these $m$ measurements to get an estimator of $\mu$ which is
unbiased and has less variance compared to the sample mean based on $n$ measurements. As an example, for $n = 2$, to dominate $\bar{X}$ with variance $= \sigma^2/2$, we can record one more observation (i.e., 3 observations in all) but use only the median (i.e., only one measurement) to yield $\operatorname{var}(\hat{\mu}_{\mathrm{median}:3}) = 0.45\sigma^2 < \sigma^2/2$! It turns out that this is a common phenomenon, and can be done for every sample size $n$. The following table provides the minimum $m$ and corresponding optimum estimators of $\mu$ for $n \le 10$. Note that in some cases the notion of a RSS based on a total of $l$ observations and suitable splitting is used. For example, when $n = 6$, we can expand it to a sample of size either 8 and use Dixon's (1957) best pair, 9 and use the median, or 10 and use the notion of a partial RSS. Thus, whenever the underlying experimental units can be more easily ranked than quantified, observing a few more than in a SRS but measuring only a handful of suitably chosen ranked units pays off.
Table 2.5
Improved estimators based on "expansion"

n    var(Xbar)   l        m    estimator                  variance
3    0.3333s2    4        2    (X_2:4 + X_3:4)/2          0.2980s2
3    0.3333s2    5        1    X_3:5                      0.2868s2
3    0.3333s2    5        2    (X_2:5 + X_4:5)/2          0.2310s2
4    0.25s2      5        2    (X_2:5 + X_4:5)/2          0.2310s2
4    0.25s2      6        2    (X_2:6 + X_5:6)/2          0.1930s2
5    0.20s2      6        2    (X_2:6 + X_5:6)/2          0.1930s2
5    0.20s2      7        2    (X_2:7 + X_6:7)/2          0.1682s2
6    0.1667s2    8        2    (X_3:8 + X_6:8)/2          0.1490s2
6    0.1667s2    9        1    X_5:9                      0.1661s2
6    0.1667s2    10=5x2   2    (X_(13) + X_(23))/2        0.1434s2
7    0.1429s2    9        2    (X_3:9 + X_7:9)/2          0.1320s2
7    0.1429s2    10       2    (X_3:10 + X_8:10)/2        0.1190s2
8    0.125s2     10       2    (X_3:10 + X_8:10)/2        0.1190s2
8    0.125s2     11       2    (X_3:11 + X_9:11)/2        0.1090s2
8    0.125s2     12       2    (X_4:12 + X_9:12)/2        0.1000s2
8    0.125s2     12=6x2   2    (X_(13) + X_(24))/2        0.123s2
9    0.1111s2    11       2    (X_3:11 + X_9:11)/2        0.1090s2
9    0.1111s2    12       2    (X_4:12 + X_9:12)/2        0.1000s2
10   0.100s2     12       2    (X_4:12 + X_9:12)/2        0.1000s2
10   0.100s2     13       2    (X_4:13 + X_10:13)/2       0.0924s2

2.5. Estimation of variance: ordinary RSS

For estimation of $\sigma^2$ based on an RSS of size $n$, we first observe that a common sense estimator, namely $\sum_{i=1}^{n}(X_{(ii)} - \hat{\mu}_{\mathrm{rss}})^2/(n-1)$, is not unbiased for $\sigma^2$, because

$$E\left[\sum_{i=1}^{n}(X_{(ii)} - \hat{\mu}_{\mathrm{rss}})^2\right] = (1 - 1/n)E\left[\sum_{i=1}^{n} X_{(ii)}^2\right] - (1/n)\sum_{i \ne j} E(X_{(ii)})E(X_{(jj)})$$
$$= (1 - 1/n)E\left[\sum_{i=1}^{n} X_{(1i)}^2\right] - (1/n)\sum_{i \ne j} E(X_{(1i)})E(X_{(1j)})$$
$$> (1 - 1/n)E\left[\sum_{i=1}^{n} X_{(1i)}^2\right] - (1/n)\sum_{i \ne j} E(X_{(1i)}X_{(1j)})$$
$$= E\left[\sum_{i=1}^{n}(X_{(1i)} - \bar{X}_1)^2\right] = (n-1)\sigma^2\,, \qquad (2.11)$$

where $X_{(11)}, \ldots, X_{(1n)}$ are the order statistics of the first row of Table 1.1, which share the marginal distributions of $X_{(11)}, \ldots, X_{(nn)}$ but, unlike them, are positively correlated, and $\bar{X}_1$ is their average; the inequality uses the positive association of the order statistics.
We therefore determine $c_n$ such that $\hat{\sigma}^2 = c_n \sum_{i=1}^{n}(X_{(ii)} - \hat{\mu}_{\mathrm{rss}})^2$ is unbiased for $\sigma^2$, and subsequently compute $\operatorname{var}(\hat{\sigma}^2)$ as $k_n\sigma^4$. Recall that $\operatorname{var}(\hat{\sigma}^2_{\mathrm{ordinary}}) = 2\sigma^4/(n-1)$. The following table provides values of $c_n$ and $k_n$ for $n \le 5$, and demonstrates that it pays to use RSS rather than SRS even for estimation of the normal variance.
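The constant $c_3 = 0.4036$ of Table 2.6 can be reproduced exactly. For independent $X_{(ii)}$ with means $m_i$ and variances $v_i$, $E[\sum(X_{(ii)} - \hat{\mu}_{\mathrm{rss}})^2] = (1 - 1/n)\sum v_i + \sum(m_i - \bar{m})^2$, and $c_n$ is the reciprocal of this expectation. The normal order-statistic moments used below are standard results for $n = 3$; the computation itself is our illustration:

```python
import math

n = 3
mu3 = 3.0 / (2.0 * math.sqrt(math.pi))        # E(X_{3:3}) for N(0,1)
means = [-mu3, 0.0, mu3]
v_med = 1.0 - math.sqrt(3.0) / math.pi        # var(X_{2:3})
v_ext = 1.0 + math.sqrt(3.0) / (2.0 * math.pi) - 9.0 / (4.0 * math.pi)
variances = [v_ext, v_med, v_ext]

mbar = sum(means) / n
expected_ss = (1.0 - 1.0 / n) * sum(variances) \
    + sum((m - mbar) ** 2 for m in means)
c3 = 1.0 / expected_ss
print(round(c3, 4))   # 0.4036, matching Table 2.6
```

The corresponding unbiased estimator $c_3\sum(X_{(ii)} - \hat{\mu}_{\mathrm{rss}})^2$ then has variance $k_3\sigma^4 = 0.7725\sigma^4$, well below $\sigma^4 = 2\sigma^4/(n-1)$.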
2.6. Estimation of variance: modifications of RSS

As in the case of estimation of $\mu$, here also we have explored the possibility of using the medians, the smallest, and the largest observations from Table 1.1 in order to come up with an improved estimator of $\sigma^2$ compared to $\hat{\sigma}^2_{\mathrm{ordinary}}$. Our computations for $n = 2$ reveal that the variance of $\hat{\sigma}^2_{\mathrm{smallest}} = (X_{(11)} - X_{(21)})^2/2v_1$, the unbiased estimator of $\sigma^2$ based on the two smallest order statistics (see Table 1.1), is given by $3.728\sigma^4$. This is more than $2\sigma^4 = \operatorname{var}(\hat{\sigma}^2_{\mathrm{ordinary}})$. Similarly, for $n = 3$, use of $\hat{\sigma}^2_{\mathrm{median}} = \sum_{i=1}^{3}(X_{(i2)} - \bar{X}_{(.2)})^2/2v_2$, where $\bar{X}_{(.2)} = \sum_{i=1}^{3} X_{(i2)}/3$, which is the unbiased estimator of $\sigma^2$ based on the three medians, results in $\operatorname{var}(\hat{\sigma}^2_{\mathrm{median}}) = 1.012\sigma^4$, which is more than $\sigma^4$! Thus, the use of the above variations of RSS does not seem to work for estimation of $\sigma^2$.

3. Estimation of an exponential mean
In this section, which is based on Sinha et al. (1996), we assume that $X_1, \ldots, X_n$ is a random sample from $f(x|\theta) = \exp(-x/\theta)/\theta$, $x > 0$, $\theta > 0$. Recall that $E(X_{i:n}) = \sum_{j=1}^{i} \theta/(n-j+1) = \theta c_{i:n}$ (say) and $\operatorname{var}(X_{i:n}) = \sum_{j=1}^{i} \theta^2/(n-j+1)^2 = \theta^2 d_{i:n}$ (say).
Table 2.6
Values of $c_n$ and $k_n$

n    2/(n-1)    c_n              k_n
2    2          (2 + 2/pi)^(-1)  2 + 3/pi - 1/pi^2
3    1          0.4036           0.7725
4    0.6667     0.2798           0.5372
5    0.5        0.2156           0.3401

3.1. Ordinary RSS

The traditional unbiased estimator of $\theta$ based on a SRS of size $n$ is given by $\hat{\theta}_1 = \bar{X}$ with $\operatorname{var}(\hat{\theta}_1) = \theta^2/n$. The McIntyre unbiased estimator of $\theta$ based on the usual RSS as described before is given by
$$\hat{\theta}_2 = \sum_{i=1}^{n} X_{(ii)}/n \qquad (3.1)$$

with

$$\operatorname{var}(\hat{\theta}_2) = \theta^2 \sum_{i=1}^{n} d_{i:n}/n^2\,. \qquad (3.2)$$

Trivially, $\operatorname{var}(\hat{\theta}_2) < \operatorname{var}(\hat{\theta}_1)$, as expected.
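The constants $c_{i:n}$, $d_{i:n}$ and the comparison (3.2) versus $\theta^2/n$ are exact finite sums and can be checked directly (our sketch; note the useful identity $\sum_{i=1}^{n} d_{i:n} = \sum_{i=1}^{n} 1/(n-i+1) = c_{n:n}$, obtained by counting how often each term $1/(n-j+1)^2$ appears):

```python
def c_d(n):
    # Cumulative sums defining c_{i:n} and d_{i:n} for the exponential.
    c, d, cs, ds = [], [], 0.0, 0.0
    for j in range(1, n + 1):
        cs += 1.0 / (n - j + 1)
        ds += 1.0 / (n - j + 1) ** 2
        c.append(cs)
        d.append(ds)
    return c, d

n = 10
c, d = c_d(n)
var_theta2 = sum(d) / n**2   # (3.2), in units of theta^2
var_theta1 = 1.0 / n
print(var_theta2, var_theta1)
```

For $n = 10$, $\sum_i d_{i:n} = H_{10} \approx 2.929$, so $\operatorname{var}(\hat{\theta}_2) \approx 0.0293\,\theta^2$, roughly a threefold improvement over $\operatorname{var}(\hat{\theta}_1) = 0.1\,\theta^2$.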
3.2. BLUE

We now discuss a few variations of RSS. Since, for every $i$, $Y_{i:n} = X_{(ii)}/c_{i:n}$ provides an unbiased estimator of $\theta$ with $\operatorname{var}(Y_{i:n}) = \theta^2 d_{i:n}/c_{i:n}^2 = \theta^2 a_{i:n}$, the BLUE of $\theta$ based on the RSS is readily seen to be

$$\hat{\theta}_3 = \left[\sum_{i=1}^{n} Y_{i:n}/a_{i:n}\right]\Big/\left[\sum_{i=1}^{n} 1/a_{i:n}\right] \qquad (3.3)$$

with

$$\operatorname{var}(\hat{\theta}_3) = \theta^2\Big/\left[\sum_{i=1}^{n} 1/a_{i:n}\right]\,. \qquad (3.4)$$

Obviously, $\operatorname{var}(\hat{\theta}_3) < \operatorname{var}(\hat{\theta}_2)$, and thus $\hat{\theta}_3$ offers a uniform improvement over McIntyre's $\hat{\theta}_2$. Incidentally, if we have available only a partial RSS, namely $X_{(11)}, \ldots, X_{(ll)}$ for some $l < n$, the above argument shows that the BLUE of $\theta$ based on $\{X_{(11)}, \ldots, X_{(ll)}\}$ is given by

$$\hat{\theta}_3(l) = \left[\sum_{i=1}^{l} Y_{i:n}/a_{i:n}\right]\Big/\left[\sum_{i=1}^{l} 1/a_{i:n}\right] \qquad (3.5)$$

with

$$\operatorname{var}(\hat{\theta}_3(l)) = \theta^2\Big/\left[\sum_{i=1}^{l} 1/a_{i:n}\right]\,. \qquad (3.6)$$
Table 3.1 Values of n and l for which var(03(/)) < var(01) n l
35 3
6-9 4
10-14 5
15-20 6
dominate the usual estimator based on a SRS, provided we choose the weights appropriately. Thus, for example, the weighted mean of as few as 6 ranked set measurements is more efficient than the mean of a SRS of size 20.
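The last claim is an exact computation via (3.6): for $n = 20$ the partial-RSS BLUE with $l = 6$ beats the SRS mean because $\sum_{i=1}^{6} 1/a_{i:20} > 20$. A sketch (ours) of the check:

```python
def a_values(n):
    # a_{j:n} = d_{j:n} / c_{j:n}^2 for the one-parameter exponential.
    a, cs, ds = [], 0.0, 0.0
    for j in range(1, n + 1):
        cs += 1.0 / (n - j + 1)          # c_{j:n}
        ds += 1.0 / (n - j + 1) ** 2     # d_{j:n}
        a.append(ds / cs**2)
    return a

n, l = 20, 6
a = a_values(n)
precision = sum(1.0 / a[i] for i in range(l))   # sum_{i<=l} 1/a_{i:n}
var_blue_l = 1.0 / precision                    # (3.6), in units of theta^2
print(var_blue_l, 1.0 / n)
```

Here $\sum_{i=1}^{6} 1/a_{i:20} \approx 20.9$, so 6 weighted ranked-set measurements indeed edge out the SRS mean of 20 observations; note also $a_{1:n} = 1$ always, since $d_{1:n} = 1/n^2 = c_{1:n}^2$.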
3.3. Which order statistic?

We next address the issue of the right selection of order statistics, given that we must select exactly one from each set of $n$ observations. It is clear from the preceding discussion that $Y_{r:n}$ is the best choice, where the index $r$ makes $a_{r:n}$ the smallest among $a_{1:n}, \ldots, a_{n:n}$. In the notations of Table 1.1, our proposed estimator of $\theta$ is then given by

$$\hat{\theta}_4 = \sum_{i=1}^{n} (X_{(ir)}/c_{r:n})/n \qquad (3.7)$$

with

$$\operatorname{var}(\hat{\theta}_4) = \theta^2 a_{r:n}/n\,. \qquad (3.8)$$

The following table provides the optimum values of $r$ as well as $a_{r:n}$ for $n \le 40$. More efficiently, as in the case of estimation of a normal mean, we propose measuring only $m\ (< n)$ of the $r$th order statistics, where $r$ is as defined above, and the use of

$$\hat{\theta}_4(m) = \sum_{i=1}^{m} (X_{(ir)}/c_{r:n})/m \qquad (3.9)$$

as an estimator of $\theta$. Obviously,

$$\operatorname{var}(\hat{\theta}_4(m)) = \theta^2 a_{r:n}/m\,. \qquad (3.10)$$

We now state the following result regarding the optimum selection of $m$. Its proof appears in Sinha et al. (1996).
Table 3.2
Values of best $r$ and $a_{r:n}$ for $n \le 40$

n    r    a_{r:n}    n    r    a_{r:n}    n    r    a_{r:n}    n    r    a_{r:n}
1    1    1          11   9    0.13       21   17   0.07       31   25   0.05
2    2    0.56       12   10   0.12       22   18   0.07       32   26   0.05
3    3    0.40       13   11   0.11       23   19   0.07       33   27   0.05
4    4    0.33       14   12   0.11       24   20   0.06       34   28   0.05
5    5    0.28       15   12   0.10       25   20   0.06       35   28   0.04
6    5    0.23       16   13   0.09       26   21   0.06       36   29   0.04
7    6    0.20       17   14   0.09       27   22   0.06       37   30   0.04
8    7    0.18       18   15   0.08       28   23   0.05       38   31   0.04
9    8    0.16       19   16   0.08       29   24   0.05       39   31   0.04
10   8    0.15       20   16   0.07       30   24   0.05       40   32   0.04
THEOREM 3.1. Let $r$ be such that $a_{r:n}$ is the smallest among $a_{1:n}, \ldots, a_{n:n}$. Then $a_{r:n} < 2/n$.

In view of the above theorem, it follows that, irrespective of the value of $n$, once the optimum selection of the order statistic in each row of Table 1.1 is made, it is enough to repeat the sampling process only once. In other words, the weighted mean $\hat{\theta}_4(2) = (X_{(1r)} + X_{(2r)})/(2c_{r:n})$ of exactly two optimum order statistics, one each from the first two rows of Table 1.1, is better than the mean of all $n$ observations in one row of it. This is a very powerful and interesting result, in the same spirit as in the case of estimation of a normal mean. The optimum $r$ for $n \le 40$ appears in Table 3.2 above. For large $n$, one can take $r = [0.8n]$. This is because, writing $r = [np]$, $n a_{r:n}$ can be approximated as $\frac{p}{1-p}\,[\ln(1-p)]^{-2}$. The latter expression has a minimum near $p = 0.8$, and the resultant minimum value is seen to be $1.5625 < 2$.
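Both the finite-sample bound of Theorem 3.1 and the asymptotic approximation can be checked numerically (our sketch; the grid minimum of $\psi(p) = \frac{p}{1-p}[\ln(1-p)]^{-2}$ lands close to $p = 0.8$, with a minimum value below 2, consistent with the theorem):

```python
import math

def a_values(n):
    a, cs, ds = [], 0.0, 0.0
    for j in range(1, n + 1):
        cs += 1.0 / (n - j + 1)
        ds += 1.0 / (n - j + 1) ** 2
        a.append(ds / cs**2)
    return a

for n in (5, 10, 20, 40):
    a = a_values(n)
    r = min(range(n), key=a.__getitem__) + 1   # optimal rank (1-based)
    assert a[r - 1] < 2.0 / n                  # Theorem 3.1
    print(n, r, round(a[r - 1], 4))

def psi(p):
    # Asymptotic form of n * a_{[np]:n}.
    return (p / (1.0 - p)) / math.log(1.0 - p) ** 2

p_grid = [i / 1000.0 for i in range(500, 950)]
p_opt = min(p_grid, key=psi)
print(p_opt)   # close to 0.8
```

The exact stationarity condition is $\ln(1-p) = -2p$, whose root is $p \approx 0.797$; $\psi$ is extremely flat there, which is why $r = [0.8n]$ is a perfectly serviceable rule.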
3.4. Why not all smallest or all largest?

We now discuss another variation of RSS, namely whether it makes sense to measure only the smallest or only the largest in Table 1.1 rather than across the diagonal as in RSS. Since $a_{1:n} = 1$, it follows trivially that $\sum_{i=1}^{n} nX_{(i1)}/n$, the (unbiased) mean based on the $n$ smallest order statistics, behaves just like $\bar{X}$, and hence offers no improvement. This is somewhat disappointing, because in practical problems involving reliability and life distributions, observing the smallest does indeed make sense in terms of feasibility and cost. On the other hand, since $a_{n:n} < 1$ (by the Cauchy–Schwarz inequality), it is clear that the use of

$$\hat{\theta}_5 = \sum_{i=1}^{n} (X_{(in)}/c_{n:n})/n\,, \qquad (3.11)$$

the normalized mean of the $n$ largest order statistics, always results in an improved estimator of $\theta$. In fact, as the following table indicates, often

$$\hat{\theta}_5(l) = \sum_{i=1}^{l} (X_{(in)}/c_{n:n})/l\,, \qquad (3.12)$$

based on the $l$ largest order statistics, will do for $l$ much less than $n$.
3.5. Concept of expansion

Finally, we address the consequences of the notion of expansion. Given an initial SRS of size $n$ with its usual estimator $\bar{X}$ and $\operatorname{var}(\bar{X}) = \theta^2/n$, we expand the sample

Table 3.3
Values of $n$ and $l$ for which $\operatorname{var}(\hat{\theta}_5(l)) < \operatorname{var}(\hat{\theta}_1)$

n    2-12    13-29    30-40
l    2       3        4
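The boundary entry of Table 3.3 can be checked exactly (our sketch): for $n = 12$, only $l = 2$ normalized maxima are needed, since $a_{12:12}/2 < 1/12$, with $c_{n:n} = H_n$ and $d_{n:n} = \sum_{k=1}^{n} 1/k^2$:

```python
def c_d_last(n):
    c = sum(1.0 / k for k in range(1, n + 1))       # c_{n:n} = H_n
    d = sum(1.0 / k**2 for k in range(1, n + 1))    # d_{n:n}
    return c, d

n, l = 12, 2
c_nn, d_nn = c_d_last(n)
a_nn = d_nn / c_nn**2        # a_{n:n} < 1 (Cauchy-Schwarz)
var_theta5_l = a_nn / l      # (3.12) variance, in units of theta^2
print(a_nn, var_theta5_l, 1.0 / n)
```

Numerically $a_{12:12} \approx 0.1625$, so $\operatorname{var}(\hat{\theta}_5(2)) \approx 0.0813\,\theta^2 < 0.0833\,\theta^2$ — a near-boundary case, which is why $n = 13$ onwards requires $l = 3$.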
size to $l$ and measure only one order statistic, namely the optimum order statistic $X_{r:l}$ for which $a_{r:l}$ is the smallest among $a_{1:l}, \ldots, a_{l:l}$. The resulting estimator of $\theta$ is then given by

$$\hat{\theta}_6(r:l) = X_{r:l}/c_{r:l} \qquad (3.13)$$

with

$$\operatorname{var}(\hat{\theta}_6(r:l)) = \theta^2 a_{r:l}\,. \qquad (3.14)$$

The following table provides the values of $l$ and $r$, for $n \le 20$, for which $\operatorname{var}(\hat{\theta}_6(r:l)) < \operatorname{var}(\bar{X})$, and demonstrates the usefulness of the concept of expansion. Thus, for example, the use of the 8th order statistic in a sample of size 9 does better than using all the observations in a sample of size 6.
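The $n = 6$ example is again a one-line exact check via (3.14) (our sketch): a single measured observation, $X_{8:9}$ from an expanded sample of 9 ranked units, has $a_{8:9} < 1/6$:

```python
def a_rl(r, l):
    # a_{r:l} = d_{r:l} / c_{r:l}^2 for the one-parameter exponential.
    c = sum(1.0 / (l - j + 1) for j in range(1, r + 1))       # c_{r:l}
    d = sum(1.0 / (l - j + 1) ** 2 for j in range(1, r + 1))  # d_{r:l}
    return d / c**2

n, l, r = 6, 9, 8
var_theta6 = a_rl(r, l)    # (3.14), in units of theta^2
print(var_theta6, 1.0 / n)
```

Here $a_{8:9} \approx 0.1614 < 1/6 \approx 0.1667$: one suitably chosen measurement beats the mean of a full SRS of 6.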
4. Estimation of parameters in a two-parameter exponential distribution

In this section, which is based on Lam et al. (1994), we extend the discussion to the estimation of the parameters in a two-parameter exponential distribution. One main point of difference between this section and the previous sections, where mostly estimation of the mean is discussed, is that here neither of the parameters of interest is the mean. We note that the pdf of a two-parameter exponential distribution can be written as

    f(x|θ, σ) = (1/σ) exp[−(x − θ)/σ],  x > θ,  σ > 0 .  (4.1)

We note in passing that if X₁, ..., X_n is a SRS of size n from (4.1), then

    θ̂ = X_(1) − σ̂/n,  σ̂ = Σ_{i=1}^{n} (X_i − X_(1)) / (n − 1)  (4.2)
are the uniformly minimum variance unbiased estimators (UMVUEs) of θ and σ respectively. Moreover,

Table 3.4
Values of l and r for which var(θ̂₆(r:l)) < var(X̄)

n    l    r        n    l    r
1    2    2        11   16   13
2    3    3        12   18   15
3    4    4        13   20   16
4    6    5        14   21   17
5    8    7        15   24   20
6    9    8        16   25   20
7    11   11       17   28   23
8    13   11       18   29   24
9    14   12       19   30   24
10   15   12       20   35   28
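The entries of Table 3.4 can be recomputed exactly: for the exponential distribution c_{r:l} = Σ_{j=l−r+1}^{l} 1/j and d_{r:l} = Σ_{j=l−r+1}^{l} 1/j², so a_{r:l} = d_{r:l}/c_{r:l}². A small sketch (ours, using exact rational arithmetic; the search bound l_max is an arbitrary safety cap):

```python
from fractions import Fraction

def a(r, l):
    """a_{r:l} = d_{r:l} / c_{r:l}^2 for the exponential, where
    c_{r:l} = sum_{j=l-r+1}^{l} 1/j and d_{r:l} = sum_{j=l-r+1}^{l} 1/j^2."""
    c = sum(Fraction(1, j) for j in range(l - r + 1, l + 1))
    d = sum(Fraction(1, j * j) for j in range(l - r + 1, l + 1))
    return d / (c * c)

def expansion(n, l_max=60):
    """Smallest l (with the minimizing r) such that min_r a_{r:l} < 1/n."""
    for l in range(2, l_max + 1):
        r, val = min(((r, a(r, l)) for r in range(1, l + 1)), key=lambda t: t[1])
        if val < Fraction(1, n):
            return l, r
    return None

# expected [(3, 3), (6, 5), (9, 8)], matching the n = 2, 4, 6 rows of Table 3.4
print([expansion(n) for n in (2, 4, 6)])
```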
N. N. Chuiv and B. K. Sinha
352
    var(θ̂) = σ² / [n(n−1)],  var(σ̂) = σ² / (n−1) .  (4.3)
4.1. Estimation of θ: BLUE

We first address the issue of how best to use the RSS, namely, X_{(11)}, ..., X_{(nn)}, for estimation of θ. Recall that E(X_{(ii)}) = θ + c_{i:n}σ and var(X_{(ii)}) = d_{i:n}σ², where, as in Section 3, c_{i:n} and d_{i:n} are respectively the expected value and the variance of the ith order statistic in a sample of size n from a standard exponential distribution. Starting with Σ_{i=1}^{n} c_i X_{(ii)} and minimizing var(Σ_{i=1}^{n} c_i X_{(ii)}) subject to the unbiasedness conditions Σ_{i=1}^{n} c_i = 1, Σ_{i=1}^{n} c_i c_{i:n} = 0, leads to the best linear unbiased estimator (BLUE) of θ as

    θ̂_blue = [(Σ_{i=1}^{n} X_{(ii)}/d_{i:n})(Σ_{i=1}^{n} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{i:n})(Σ_{i=1}^{n} c_{i:n}X_{(ii)}/d_{i:n})] / [(Σ_{i=1}^{n} 1/d_{i:n})(Σ_{i=1}^{n} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{i:n})²]  (4.4)

with

    var(θ̂_blue) = σ² (Σ_{i=1}^{n} c_{i:n}²/d_{i:n}) / [(Σ_{i=1}^{n} 1/d_{i:n})(Σ_{i=1}^{n} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{i:n})²] .  (4.5)
The above formulae can be simplified a little using the fact that

    Σ_{i=1}^{n} d_{i:n} = Σ_{i=1}^{n} 1/(n − i + 1) .  (4.6)
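The identity as reconstructed in (4.6) follows from the spacings representation X_{i:n} = Σ_{j=1}^{i} E_j/(n−j+1) with E_j iid standard exponential; a quick exact check (our own sketch, assuming that representation):

```python
from fractions import Fraction

def c_d(n):
    """Exact means c_{i:n} and variances d_{i:n} of standard-exponential order
    statistics, via X_{i:n} = sum_{j=1}^{i} E_j/(n-j+1), E_j iid Exp(1)."""
    c = [sum(Fraction(1, n - j + 1) for j in range(1, i + 1)) for i in range(1, n + 1)]
    d = [sum(Fraction(1, (n - j + 1) ** 2) for j in range(1, i + 1)) for i in range(1, n + 1)]
    return c, d

for n in (5, 10):
    c, d = c_d(n)
    # sum of the variances collapses to the harmonic sum of (4.6)
    assert sum(d) == sum(Fraction(1, n - i + 1) for i in range(1, n + 1))
print("identity verified exactly for n = 5, 10")
```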
Table 4.1 provides a comparison of var(θ̂) and var(θ̂_blue) through RP = var(θ̂)/var(θ̂_blue) for n = 6, 7, 8, 9, 10 and clearly reveals the superiority of RSS over the use of SRS. Incidentally, we can also derive the BLUE of θ based on a partial RSS, namely, X_{(11)}, ..., X_{(ll)}, for l ≤ n. Starting with Σ_{i=1}^{l} c_i X_{(ii)} and minimizing var(Σ_{i=1}^{l} c_i X_{(ii)}) subject to the unbiasedness conditions Σ_{i=1}^{l} c_i = 1, Σ_{i=1}^{l} c_i c_{i:n} = 0, leads to

    θ̂_blue(l) = [(Σ_{i=1}^{l} X_{(ii)}/d_{i:n})(Σ_{i=1}^{l} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{l} c_{i:n}/d_{i:n})(Σ_{i=1}^{l} c_{i:n}X_{(ii)}/d_{i:n})] / [(Σ_{i=1}^{l} 1/d_{i:n})(Σ_{i=1}^{l} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{l} c_{i:n}/d_{i:n})²]  (4.7)
Table 4.1
Comparison of var(θ̂) and var(θ̂_blue) through RP

n     6        7        8        9        10
RP    1.0439   1.1333   1.2177   1.2870   1.3537
with

    var(θ̂_blue(l)) = σ² (Σ_{i=1}^{l} c_{i:n}²/d_{i:n}) / [(Σ_{i=1}^{l} 1/d_{i:n})(Σ_{i=1}^{l} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{l} c_{i:n}/d_{i:n})²] .  (4.8)
The following table provides, for n = 8, 9 and 10, minimum values of l for which RP = var(θ̂)/var(θ̂_blue(l)) > 1, and shows that often a partial RSS combined with optimum weights does better than a SRS of size n. Thus, for example, θ̂_blue(7) based on a partial RSS of size l = 7 is as efficient as the UMVUE of θ based on a SRS of size n = 10. As an extreme example, one can verify that θ̂_blue(9) based on only 9 observations is as efficient as θ̂ based on n = 100.
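The relative precisions can be computed exactly from (4.5) and (4.8); a sketch (our own code, not the authors'; the interval checks below simply bracket the Table 4.1 values 1.0439 and 1.3537):

```python
from fractions import Fraction

def cd(n):
    """Exact c_{i:n} and d_{i:n} for the standard exponential."""
    c = [sum(Fraction(1, n - j) for j in range(i)) for i in range(1, n + 1)]
    d = [sum(Fraction(1, (n - j) ** 2) for j in range(i)) for i in range(1, n + 1)]
    return c, d

def var_theta_blue(n, l=None):
    """var(theta_blue(l))/sigma^2 from (4.8); l = n reduces to (4.5)."""
    l = n if l is None else l
    c, d = cd(n)
    s1 = sum(1 / d[i] for i in range(l))
    sc = sum(c[i] / d[i] for i in range(l))
    sc2 = sum(c[i] ** 2 / d[i] for i in range(l))
    return sc2 / (s1 * sc2 - sc ** 2)

rp = {n: float(Fraction(1, n * (n - 1)) / var_theta_blue(n)) for n in (6, 10)}
print(rp)  # close to the Table 4.1 entries for n = 6 and n = 10
# partial RSS: for n = 10, l = 7 already beats the SRS UMVUE (cf. Table 4.2)
print(var_theta_blue(10, 7) < Fraction(1, 90))
```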
4.2. Estimation of θ: which order statistic?

We next address the issue of the right selection of order statistics in the context of RSS, given that we must select one from each set of n observations, there are n such sets, and that the resultant estimator of θ is unbiased. Recall that McIntyre's scheme is based upon selecting the diagonal elements (X_{(11)}, ..., X_{(nn)}) in Table 1.1, where X_{(ij)} refers to the jth order statistic in the ith row of this table. Unfortunately, unlike in the case of normal and exponential distributions, there is no obvious clear-cut choice of any 'optimum' order statistic in the present problem. We first discuss the case of the minimum order statistic and examine the performance of the use of (X_{(11)}, ..., X_{(l1)}) for estimation of θ for various choices of l = 1, ..., n in an attempt to determine the minimum value of l for which dominance over θ̂ holds. Noting that the pdf of X_{(i1)} is of the same form as (4.1) with σ replaced by σ* = σ/n, and that X_{(11)}, ..., X_{(l1)} are iid, we can readily use (4.2) to conclude that

    θ̂_min(l) = Y_(1) − Σ_{i=1}^{l} (X_{(i1)} − Y_(1)) / [l(l−1)]  (4.9)

is the UMVUE of θ based on (X_{(11)}, ..., X_{(l1)}), where Y_(1) = min{X_{(11)}, ..., X_{(l1)}}, with the resultant variance given by

    var(θ̂_min(l)) = σ² / [n² l(l−1)] .  (4.10)
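A minimal Monte Carlo sketch (ours) of (4.9)-(4.10): with n = 6 and only l = 2 measured minima, the variance σ²/(n²l(l−1)) = 1/72 is already well below var(θ̂) = σ²/(n(n−1)) = 1/30 from (4.3):

```python
import random

def theta_min(theta, sigma, n, l, rng):
    """theta_min(l) of (4.9): l minima, each from its own set of n units
    drawn from the two-parameter exponential (4.1)."""
    mins = [theta + min(rng.expovariate(1.0 / sigma) for _ in range(n)) for _ in range(l)]
    y1 = min(mins)
    return y1 - sum(x - y1 for x in mins) / (l * (l - 1))

rng = random.Random(2024)
theta, sigma, n, l, reps = 2.0, 1.0, 6, 2, 40000
est = [theta_min(theta, sigma, n, l, rng) for _ in range(reps)]
m = sum(est) / reps
v = sum((e - m) ** 2 for e in est) / (reps - 1)
print(m, v)  # mean near theta = 2; variance near 1/72 ~ 0.0139 < 1/30
```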
Table 4.2
Values of l and n for which RP > 1

n     8        9        10
l     7        7        7
RP    1.1049   1.0692   1.0374

A direct comparison of var(θ̂) and var(θ̂_min(l)) immediately shows that l = 2 does the job. In other words, appropriate use of only the two smallest order statistics
from Table 1.1 results in an unbiased estimator of θ better than the use of a SRS of size n, whatever be n! This powerful result is similar to those observed in Sections 2 and 3 for normal and exponential means, in that in each case l = 2 is sufficient. However, the value of r differs in each case. It should be noted that the optimality of the minimum order statistic for estimation of θ is essentially due to its (partial) sufficiency under the model (4.1), and cannot be expected to hold for other distributions. We next address the problem of using a subset of the rth order statistics, namely, X_{(1r)}, ..., X_{(lr)} from the rth column of Table 1.1 for some fixed r > 1 and for some 1 ≤ l ≤ n to efficiently estimate θ, and in the sequel determine the minimum l for every value of r for which the desired dominance holds. Note that for an arbitrary r > 1, X_{(1r)}, ..., X_{(lr)} are iid with a common pdf of the form (1/σ) g_{r:n}((x − θ)/σ), x > θ, for some g_{r:n}(·). It may be noted that, unless r = 1, an exact optimum inference in the form of a UMVUE of θ is extremely difficult in the general situation. Following the idea given in (4.2), we therefore propose to use
    Z_(l:r) = min{X_{(1r)}, ..., X_{(lr)}}  (4.11)

and

    Z*_(l:r) = Σ_{i=1}^{l} (X_{(ir)} − Z_(l:r)) / (l − 1) .  (4.12)

Noting that

    E(Z_(l:r)) = θ + σ δ_(l,r,n),  E(Z*_(l:r)) = σ δ*_(l,r,n)  (4.13)

where δ_(l,r,n) and δ*_(l,r,n) are two absolute constants, we may use

    θ̂_(l,r) = Z_(l:r) − δ_(l,r,n) Z*_(l:r) / δ*_(l,r,n)  (4.14)
as an estimator of θ based on X_{(1r)}, ..., X_{(lr)}. We have numerically computed the values of δ_(l,r,n) and δ*_(l,r,n), and performed extensive simulation to evaluate the variance of θ̂_(l,r) for various values of l, r and n = 5, 10. It turns out that for these values of n, only r = 2 works. Values of δ_(l,2,n) and δ*_(l,2,n) for n = 5 and 10 respectively are given in Table 4.3 and Table 4.4. In Table 4.5 and Table 4.6 we present the simulated values of var(θ̂_(l,2)) for all the combinations of l when r = 2, for n = 5 and 10 respectively.

Table 4.3
Values of δ_(l,2,n) and δ*_(l,2,n) for n = 5

l             2        3        4        5
δ*_(l,2,n)    .3389    .3511    .3595    .3659
δ_(l,2,n)     .2806    .2159    .1804    .1573

Table 4.4
Values of δ_(l,2,n) and δ*_(l,2,n) for n = 10

l             2        3        4        5        6        7        8        9        10
δ*_(l,2,n)    .1584    .1643    .1683    .1713    .1737    .1756    .1773    .1787    .1799
δ_(l,2,n)     .1318    .1015    .0848    .0740    .0663    .0605    .0559    .0522    .0491

The combinations for which dominance over θ̂ holds, namely, var(θ̂_(l,2)) < σ²/(n(n−1)), are denoted by * (see (4.3) for var(θ̂)). Thus, for n = 5, the unbiased estimator θ̂_(3,2) based on three second order statistics {X_(1,2), X_(2,2), X_(3,2)} performs better than θ̂, while for n = 10, the unbiased estimator θ̂_(6,2) based on {X_(1,2), X_(2,2), X_(3,2), X_(4,2), X_(5,2), X_(6,2)} is better than θ̂. The simulated values of the variances of θ̂_(l,r) are based on 10,000 replications of the values of θ̂_(l,r), each θ̂_(l,r) in turn being generated from n² simulated standard exponential variables. To examine the stability of the above simulated values, we generated 20 sets of values of simulated var(θ̂_(l,2)) for n = 5 and l = 2, 3, 4, 5, each set in turn being based on 10,000 replications of standard exponential variables. The standard errors of these 20 values are given below in Table 4.7. It is clear that the amount of variation is very small, and dominance of θ̂_(l,2) over θ̂ for n = 5 and l ≥ 3 always holds.
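The constants δ_(l,r,n) and δ*_(l,r,n) are easy to approximate by simulation; the sketch below (ours, with our own seed and replication count) reproduces the l = 2 column of Table 4.3 to Monte Carlo accuracy:

```python
import random

def delta_consts(l, r, n, rng, reps=100000):
    """Monte Carlo estimates of delta_(l,r,n) = E[Z_(l:r)] and
    delta*_(l,r,n) = E[Z*_(l:r)] under theta = 0, sigma = 1 (standard exponential)."""
    s_min = s_star = 0.0
    for _ in range(reps):
        xs = [sorted(rng.expovariate(1.0) for _ in range(n))[r - 1] for _ in range(l)]
        z = min(xs)
        s_min += z
        s_star += sum(x - z for x in xs) / (l - 1)
    return s_min / reps, s_star / reps

rng = random.Random(7)
delta, delta_star = delta_consts(l=2, r=2, n=5, rng=rng)
print(delta, delta_star)  # Table 4.3 lists .2806 and .3389 for l = 2
```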
4.3. Estimation of σ: BLUE

In this section we discuss the problem of estimation of the parameter σ in (4.1), and point out that the use of RSS and its suitable variations results in much improved estimators compared to the use of a SRS. To derive the BLUE of σ based on the entire McIntyre sample X_{(11)}, ..., X_{(nn)}, we minimize the variance of Σ_{i=1}^{n} c_i X_{(ii)} subject to the unbiasedness conditions Σ_{i=1}^{n} c_i = 0, Σ_{i=1}^{n} c_i c_{i:n} = 1. This results in

    σ̂_blue = [(Σ_{i=1}^{n} c_{i:n}X_{(ii)}/d_{i:n})(Σ_{i=1}^{n} 1/d_{i:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{i:n})(Σ_{i=1}^{n} X_{(ii)}/d_{i:n})] / [(Σ_{i=1}^{n} 1/d_{i:n})(Σ_{i=1}^{n} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{i:n})²]  (4.15)

with

    var(σ̂_blue) = σ² (Σ_{i=1}^{n} 1/d_{i:n}) / [(Σ_{i=1}^{n} 1/d_{i:n})(Σ_{i=1}^{n} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{i:n})²] .  (4.16)

Table 4.5
Simulated values of var(θ̂_(l,2)) for n = 5 (* denotes dominance over θ̂)

l               2          3           4           5
var(θ̂_(l,2))   0.1078σ²   0.0368σ²*   0.0191σ²*   0.0128σ²*
Table 4.6
Simulated values of var(θ̂_(l,2)) for n = 10 (* denotes dominance over θ̂)

l               2          3          4          5          6
var(θ̂_(l,2))   0.1096σ²   0.0372σ²   0.0192σ²   0.0129σ²   0.0092σ²*

l               7           8           9           10
var(θ̂_(l,2))   0.0064σ²*   0.0049σ²*   0.0042σ²*   0.0037σ²*

Table 4.7
Simulated standard errors of simulated var(θ̂_(l,2)) for n = 5

l      2             3             4             5
s.e.   6.1074×10⁻⁶   3.3474×10⁻⁷   5.1550×10⁻⁸   1.9139×10⁻⁸
As before, the above expressions can be simplified a bit using (4.6). In Table 4.8 we have presented the values of RP = var(σ̂)/var(σ̂_blue) for n = 4, 5, 6, 7, 8, 9, 10. The overwhelming dominance of σ̂_blue over σ̂ in all the cases is obvious. For example, σ̂_blue based on a RSS of size 7 is twice as efficient as σ̂ based on n = 7. As in the case of estimation of θ, here also we can use a partial RSS, namely, X_{(11)}, ..., X_{(ll)} for some l ≤ n. Starting with Σ_{i=1}^{l} c_i X_{(ii)} and minimizing var(Σ_{i=1}^{l} c_i X_{(ii)}) subject to the usual unbiasedness conditions Σ_{i=1}^{l} c_i = 0, Σ_{i=1}^{l} c_i c_{i:n} = 1, we readily obtain the BLUE of σ based on the partial RSS as

    σ̂_blue(l) = [(Σ_{i=1}^{l} c_{i:n}X_{(ii)}/d_{i:n})(Σ_{i=1}^{l} 1/d_{i:n}) − (Σ_{i=1}^{l} c_{i:n}/d_{i:n})(Σ_{i=1}^{l} X_{(ii)}/d_{i:n})] / [(Σ_{i=1}^{l} 1/d_{i:n})(Σ_{i=1}^{l} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{l} c_{i:n}/d_{i:n})²]  (4.17)

with

    var(σ̂_blue(l)) = σ² (Σ_{i=1}^{l} 1/d_{i:n}) / [(Σ_{i=1}^{l} 1/d_{i:n})(Σ_{i=1}^{l} c_{i:n}²/d_{i:n}) − (Σ_{i=1}^{l} c_{i:n}/d_{i:n})²] .  (4.18)
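The ratios in Table 4.8 can be recomputed from (4.16) with exact c_{i:n} and d_{i:n}; this sketch (ours) agrees with the table to roughly three decimal places, small last-digit discrepancies presumably reflecting rounding:

```python
from fractions import Fraction

def cd(n):
    """Exact c_{i:n} and d_{i:n} for the standard exponential."""
    c = [sum(Fraction(1, n - j) for j in range(i)) for i in range(1, n + 1)]
    d = [sum(Fraction(1, (n - j) ** 2) for j in range(i)) for i in range(1, n + 1)]
    return c, d

def var_sigma_blue(n):
    """var(sigma_blue)/sigma^2 from (4.16)."""
    c, d = cd(n)
    s1 = sum(1 / di for di in d)
    sc = sum(ci / di for ci, di in zip(c, d))
    sc2 = sum(ci ** 2 / di for ci, di in zip(c, d))
    return s1 / (s1 * sc2 - sc ** 2)

# RP = var(sigma_hat)/var(sigma_blue) with var(sigma_hat) = sigma^2/(n-1)
rp = {n: float(Fraction(1, n - 1) / var_sigma_blue(n)) for n in range(4, 11)}
print(rp)
```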
Table 4.9 provides the minimum values of l, for n = 6, 7, 8, 9, 10, for which dominance of σ̂_blue(l) over σ̂ holds, and clearly shows the superiority of this

Table 4.8
Comparison of var(σ̂) and var(σ̂_blue) through RP

n     4        5        6        7        8        9        10
RP    1.1865   1.4535   1.7241   1.9988   2.2755   2.5615   2.8414
Table 4.9
Values of l and n for which RP > 1

n     6        7        8        9        10
l     5        6        6        6        7
RP    1.1422   1.4420   1.2059   1.0280   1.3086
method. For example, σ̂_blue(6) based on a RSS of size 6 is as efficient as σ̂ based on n = 9.

4.4. Estimation of σ: which order statistic?

In this section we explore the possibility of finding "optimum" order statistics X_{(1r)}, ..., X_{(nr)} corresponding to the rth column in Table 1.1, so that a suitable combination of a subset of these order statistics, namely, X_{(1r)}, ..., X_{(lr)} for some l ≤ n, would produce an unbiased estimator of σ better than σ̂. A close inspection of the formula given in (4.3), coupled with the observations made in Section 4.2, reveals that r = 1 will not do the job because

    σ̂_min(l) = n Σ_{i=1}^{l} (X_{(i1)} − Y_(1)) / (l − 1) ,  (4.19)

the UMVUE of σ based on X_{(11)}, ..., X_{(l1)}, satisfies var(σ̂_min(l)) = σ²/(l−1), which is always bigger than σ²/(n−1). We have done extensive simulation for all r > 1 when n = 5 and found that there is no value of l for which the dominance of

    σ̂_(l,r) = Z*_(l:r) / δ*_(l,r,n)  (4.20)

over σ̂ holds. We also noted that the performance of σ̂_(l,r), which is always worse than σ̂, quickly deteriorates as r increases.

REMARK 4.1. Throughout this section, we have reported our results for n up to 10, the minimum value of n in each of the tables being the one for which the desired dominance holds.

REMARK 4.2. Since the standard errors of all our proposed estimators of θ and σ based on an RSS depend on σ, these can be easily estimated by plugging in the proposed RSS estimators of σ.

REMARK 4.3. The concept of expansion can again be used in this context. We omit the details.
5. Estimation of the location parameter of a Cauchy distribution
The object of this section, which is based on Ni Chuiv et al. (1994), is to explore suitable modifications of RSS for the estimation of the location parameter of a Cauchy distribution with an unknown scale parameter. However, several main points of difference exist between this problem and those of the previous sections. First, in this case the parameter of interest is a location parameter, while the mean is nonexistent, so that McIntyre's formulation is not directly applicable. Secondly, unlike the previous cases, there is no nontrivial sufficient statistic in this case and consequently no uniformly minimum variance unbiased estimator (UMVUE) of the location parameter. This means that the UMVUE is no longer a standard for comparison under SRS. We note that the pdf of a Cauchy distribution can be written as

    f(x|θ, σ) = (1/πσ) · 1/[1 + ((x − θ)/σ)²],  −∞ < x < ∞,  σ > 0  (5.1)
where θ is the location parameter and σ is the scale parameter. First we give a brief discussion of some standard estimators of θ based on a SRS of size n, namely, X₁, ..., X_n. It is well known that for the model (5.1) the maximum likelihood estimators (MLEs) of θ and σ are solutions of the equations

    Σ_{i=1}^{n} (X_i − θ)/[σ² + (X_i − θ)²] = 0,  Σ_{i=1}^{n} (X_i − θ)²/[σ² + (X_i − θ)²] = n/2 .  (5.2)
There are some severe difficulties with the MLE of θ even when σ is known, namely, numerical solution of the first equation of (5.2) with σ known, identification of the solution which actually maximizes the likelihood, and, most importantly, the small sample properties of the MLE (Barnett, 1966a). Since the order statistics X_{1:n}, ..., X_{n:n} corresponding to X₁, ..., X_n are jointly sufficient for θ and σ, quite a few simple estimators of θ based on linear combinations of these order statistics have been suggested in the literature. Most notable among them are Lloyd's (1952) best linear unbiased estimator (BLUE) based on the order statistics, Blom's (1958) nearly best linear estimator (NBLE) and modified nearly best linear estimator (MNBLE) based on the order statistics, Bloch's (1966) "quick estimators" which are also linear in the X_{i:n}'s, and Rothenberg et al.'s (1964) simple average of a few intermediate order statistics. Obviously, Lloyd's (1952) BLUE θ̃_blue of θ is the best among those mentioned above, and throughout this section we have taken this as the main standard for comparison against RSS estimators. Since

    E(X_{r:n}) = θ + c_{r:n}σ,  var(X_{r:n}) = d_{rr:n}σ²,  cov(X_{r:n}, X_{s:n}) = d_{rs:n}σ²  (5.3)
where c_{r:n}, d_{rr:n} are the mean and variance of X_{r:n}, and d_{rs:n} is the covariance between X_{r:n} and X_{s:n}, from a standard Cauchy distribution with θ = 0 and σ = 1, the BLUE θ̃_blue of θ can be written as (Lloyd, 1952; Barnett, 1966b)

    θ̃_blue = 1'(Σ*)⁻¹X* / 1'(Σ*)⁻¹1  (5.4)

with

    var(θ̃_blue) = σ² / 1'(Σ*)⁻¹1  (5.5)

where X* = (X_{3:n}, ..., X_{n−2:n})' and Σ* = var(X*). It should be noted that the BLUE of θ as displayed above is based on the intermediate (n − 4) order statistics, namely, X_{3:n}, ..., X_{n−2:n}, because the first and the last two order statistics from a Cauchy distribution have infinite variances. This naturally puts the restriction that n ≥ 5. Barnett (1966b) provides a table of values of var(θ̃_blue) for n = 5(1)16(2)20. We also note in passing that the Rao-Cramér lower bound (RCLB) of the variance of any unbiased estimator of θ based on the X_i's is given by

    RCLB = 2σ²/n .  (5.6)
5.1. Ordinary RSS

Throughout Section 5 we have taken (X_{(33)}, ..., X_{((n−2)(n−2))}) as the relevant McIntyre's RSS, ignoring the first and the last two order statistics from the original McIntyre's RSS (X_{(11)}, ..., X_{(nn)}). Following McIntyre's concept, our first estimator of θ can be taken as

    θ̂_rss = Σ_{r=3}^{n−2} X_{(rr)} / (n − 4)  (5.7)

with

    var(θ̂_rss) = σ² Σ_{r=3}^{n−2} d_{rr:n} / (n − 4)² .  (5.8)
5.2. BLUE based on RSS

We now address the issue of how best to use the RSS, namely, X_{(33)}, ..., X_{((n−2)(n−2))}, for estimation of θ. Recall that E(X_{(ii)}) = θ + c_{i:n}σ and var(X_{(ii)}) = d_{ii:n}σ², where (Barnett, 1966b)

    d_{ii:n} = μ_{ii:n} − c_{i:n}² ,  (5.9)

    c_{i:n} = [n! / ((i−1)!(n−i)!)] ∫_{−π/2}^{π/2} π⁻ⁿ tan(x) (π/2 + x)^{i−1} (π/2 − x)^{n−i} dx ,  (5.10)

    μ_{ii:n} = [n! / ((i−1)!(n−i)!)] ∫_{−π/2}^{π/2} π⁻ⁿ tan²(x) (π/2 + x)^{i−1} (π/2 − x)^{n−i} dx .  (5.11)
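The moments (5.10)-(5.11) are one-dimensional integrals that can be evaluated directly; the following sketch (ours, plain Simpson's rule with the endpoints nudged inward) recovers d_{33:5} ≈ 1.2213, the variance of the median of 5 standard Cauchy observations that appears throughout the n = 5 row of Table 5.1:

```python
import math

def cauchy_cd(i, n, steps=20000):
    """c_{i:n} and d_{ii:n} for the standard Cauchy via (5.10)-(5.11)."""
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    a, b = -math.pi / 2 + 1e-9, math.pi / 2 - 1e-9
    h = (b - a) / steps
    c1 = mu2 = 0.0
    for k in range(steps + 1):
        x = a + k * h
        w = (4 if k % 2 else 2) if 0 < k < steps else 1  # Simpson weights
        base = coef * math.pi ** (-n) * (math.pi / 2 + x) ** (i - 1) * (math.pi / 2 - x) ** (n - i)
        t = math.tan(x)
        c1 += w * t * base
        mu2 += w * t * t * base
    c = c1 * h / 3
    mu = mu2 * h / 3
    return c, mu - c * c  # d_{ii:n} = mu_{ii:n} - c_{i:n}^2 by (5.9)

c35, d35 = cauchy_cd(3, 5)
print(c35, d35)  # symmetry gives c_{3:5} = 0; d_{33:5} should be near 1.2213
```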
Starting with Σ_{i=3}^{n−2} c_i X_{(ii)} and minimizing var(Σ_{i=3}^{n−2} c_i X_{(ii)}) subject to the unbiasedness conditions Σ_{i=3}^{n−2} c_i = 1, Σ_{i=3}^{n−2} c_i c_{i:n} = 0, leads to the BLUE of θ as

    θ̂_blue = [(Σ_{i=3}^{n−2} X_{(ii)}/d_{ii:n})(Σ_{i=3}^{n−2} c_{i:n}²/d_{ii:n}) − (Σ_{i=3}^{n−2} c_{i:n}/d_{ii:n})(Σ_{i=3}^{n−2} c_{i:n}X_{(ii)}/d_{ii:n})] / [(Σ_{i=3}^{n−2} 1/d_{ii:n})(Σ_{i=3}^{n−2} c_{i:n}²/d_{ii:n}) − (Σ_{i=3}^{n−2} c_{i:n}/d_{ii:n})²]  (5.12)

with

    var(θ̂_blue) = σ² (Σ_{i=3}^{n−2} c_{i:n}²/d_{ii:n}) / [(Σ_{i=3}^{n−2} 1/d_{ii:n})(Σ_{i=3}^{n−2} c_{i:n}²/d_{ii:n}) − (Σ_{i=3}^{n−2} c_{i:n}/d_{ii:n})²] .  (5.13)

Using the fact that, for a Cauchy distribution,

    Σ_{i=3}^{n−2} c_{i:n} = 0,  Σ_{i=3}^{n−2} c_{i:n}/d_{ii:n} = 0 ,  (5.14)

we get

    θ̂_blue = (Σ_{i=3}^{n−2} X_{(ii)}/d_{ii:n}) / (Σ_{i=3}^{n−2} 1/d_{ii:n}),  var(θ̂_blue) = σ² / (Σ_{i=3}^{n−2} 1/d_{ii:n}) .  (5.15)
Table 5.1 provides a comparison of var(θ̃_blue), var(θ̂_rss), var(θ̂_blue) and the RCLB for n = 5(1)16(2)20. Without loss of generality, we have taken σ² = 1. It is clear from Table 5.1 that, as expected, θ̂_blue performs much better than McIntyre's estimator θ̂_rss for n ≥ 7, which in turn does better than θ̃_blue for all n ≥ 6. Moreover, a comparison of var(θ̂_blue) with the RCLB reveals another striking feature: θ̂_blue performs better than any unbiased estimator of θ based on the SRS for all n ≥ 8! This latter fact can be proved theoretically, as the following result shows. Its proof appears in Ni Chuiv et al. (1994).

THEOREM 5.1. For n ≥ 8, var(θ̂_blue) < 2σ²/n.

REMARK 5.1. For n = 5, var(θ̃_blue), var(θ̂_rss) and var(θ̂_blue) are all the same because all the estimators involved coincide with X_{3:5}, the sample median.

REMARK 5.2. Unlike Lloyd's (1952) BLUE θ̃_blue, the estimators θ̂_rss and θ̂_blue do not involve the covariances of the order statistics, and so avoid the computational difficulties associated with θ̃_blue.
Table 5.1
Comparison of variances (θ̃_blue is Lloyd's BLUE under SRS; θ̂_blue is the RSS BLUE (5.15))

n     var(θ̃_blue)   var(θ̂_rss)   var(θ̂_blue)   RCLB
5     1.2213         1.2213        1.2213         0.4000
6     0.8607         0.5454        0.5452         0.3333
7     0.6073         0.3519        0.3126         0.2857
8     0.4733         0.2651        0.2041         0.2500
9     0.3865         0.2170        0.1441         0.2222
10    0.3263         0.1870        0.1074         0.2000
11    0.2820         0.1667        0.0832         0.1818
12    0.2481         0.1521        0.0664         0.1667
13    0.2214         0.1411        0.0542         0.1538
14    0.1998         0.1327        0.0451         0.1429
15    0.1820         0.1260        0.0381         0.1333
16    0.1671         0.1205        0.0327         0.1250
18    0.1435         0.1122        0.0247         0.1111
20    0.1257         0.1062        0.0194         0.1000
Incidentally, we can also derive the BLUE of θ based on a partial RSS, namely, X_{(33)}, ..., X_{((l+2)(l+2))}, for l ≤ (n − 4). Starting with Σ_{i=3}^{l+2} c_i X_{(ii)} and minimizing var(Σ_{i=3}^{l+2} c_i X_{(ii)}) subject to the unbiasedness conditions Σ_{i=3}^{l+2} c_i = 1, Σ_{i=3}^{l+2} c_i c_{i:n} = 0, leads to

    θ̂_blue(prss, l) = [(Σ_{i=3}^{l+2} X_{(ii)}/d_{ii:n})(Σ_{i=3}^{l+2} c_{i:n}²/d_{ii:n}) − (Σ_{i=3}^{l+2} c_{i:n}/d_{ii:n})(Σ_{i=3}^{l+2} c_{i:n}X_{(ii)}/d_{ii:n})] / [(Σ_{i=3}^{l+2} 1/d_{ii:n})(Σ_{i=3}^{l+2} c_{i:n}²/d_{ii:n}) − (Σ_{i=3}^{l+2} c_{i:n}/d_{ii:n})²]  (5.16)

with

    var(θ̂_blue(prss, l)) = σ² (Σ_{i=3}^{l+2} c_{i:n}²/d_{ii:n}) / [(Σ_{i=3}^{l+2} 1/d_{ii:n})(Σ_{i=3}^{l+2} c_{i:n}²/d_{ii:n}) − (Σ_{i=3}^{l+2} c_{i:n}/d_{ii:n})²] .  (5.17)
The following table provides, for n = 5(1)16(2)20, minimum values of l for which (i) var(θ̂_blue(prss, l)) < var(θ̃_blue), and (ii) var(θ̂_blue(prss, l)) < 2σ²/n (RCLB). Again, we have taken σ² = 1 without any loss of generality. Clearly, it follows from Table 5.2 that often a partial RSS, based on relatively very few actual measurements and combined with optimum weights, does better than Lloyd's θ̃_blue and, more importantly, better than any unbiased estimator. Thus, for n = 10, θ̂_blue(prss, 4) based on a partial RSS of size l = 4 is more than 50% more efficient than θ̃_blue, as well as better than any unbiased estimator based on a SRS of the same size. We now discuss another variation of a partial RSS. Instead of working with (X_{(33)}, ..., X_{((l+2)(l+2))}) for some l ≤ (n − 4), we begin with the central diagonal
Table 5.2
Minimum values of l, indicating dominance of PRSS over SRS and RCLB

       var(θ̂_blue(prss, l)) < RCLB       var(θ̂_blue(prss, l)) < var(θ̃_blue)
n      l    var(θ̂_blue(prss, l))         l    var(θ̂_blue(prss, l))
8      4    0.2041                        3    0.2614
9      4    0.1619                        3    0.3533
10     4    0.1724                        4    0.1724
11     5    0.1091                        4    0.2289
12     5    0.1251                        5    0.1251
13     6    0.0818                        5    0.1602
14     6    0.0954                        6    0.0954
15     6    0.1184                        6    0.1184
16     7    0.0753                        6    0.1497
18     8    0.0610                        7    0.1113
20     8    0.0860                        8    0.0860
element(s) in McIntyre's RSS and spread out along the diagonal in both directions to include a few more terms. Thus, for n odd = 2m + 1, we propose to use the BLUE based on the central diagonal (2l + 1) order statistics {X_{((m+1)(m+1))}, (X_{(mm)}, X_{((m+2)(m+2))}), ..., (X_{((m+1−l)(m+1−l))}, X_{((m+1+l)(m+1+l))})}, which, in view of symmetry and (5.14), is given by

    θ̂_blue(mprss, 2l+1) = [Σ_{r=−l}^{l} X_{((m+1+r)(m+1+r))}/d_{(m+1+r)(m+1+r):n}] / [Σ_{r=−l}^{l} 1/d_{(m+1+r)(m+1+r):n}]  (5.18)

with

    var(θ̂_blue(mprss, 2l+1)) = σ² / [Σ_{r=−l}^{l} 1/d_{(m+1+r)(m+1+r):n}] .  (5.19)

On the other hand, for n even = 2m, we intend to use the BLUE based on 2l central diagonal order statistics, namely, {(X_{(mm)}, X_{((m+1)(m+1))}), ..., (X_{((m−l+1)(m−l+1))}, X_{((m+l)(m+l))})}, which, again in view of symmetry and (5.14), is given by

    θ̂_blue(mprss, 2l) = [Σ_{r=−(l−1)}^{l} X_{((m+r)(m+r))}/d_{(m+r)(m+r):n}] / [Σ_{r=−(l−1)}^{l} 1/d_{(m+r)(m+r):n}]  (5.20)

with

    var(θ̂_blue(mprss, 2l)) = σ² / [Σ_{r=−(l−1)}^{l} 1/d_{(m+r)(m+r):n}] .  (5.21)
The following result, whose proof is given in Ni Chuiv et al. (1994), shows that this modification of a partial RSS often pays off in the sense that, whatever be n ≥ 8, the weighted average of only 4 or 5 selected middle order statistics from the
McIntyre's RSS, depending on whether n is even or odd, does better than any unbiased estimator based on a SRS of size n.
THEOREM 5.2. (i) For n even ≥ 8, var(θ̂_blue(mprss, 4)) < 2σ²/n. (ii) For n odd ≥ 9, var(θ̂_blue(mprss, 5)) < 2σ²/n.
5.3. Which order statistic?

In this section, as in Sections 2.2, 3.3, and 4.2, we address the issue of the right selection of order statistics in the context of RSS. The following variance inequality for order statistics of a Cauchy distribution is useful for our subsequent discussion. Its small sample validity follows from Table 2 of Barnett (1966b), and its asymptotic validity is obvious (see also David and Groeneveld, 1982). Recall that a sample median is defined as X_{median:n} = X_{m+1:2m+1} when n = 2m + 1, and as [X_{m:2m} + X_{m+1:2m}]/2 when n = 2m.

LEMMA 5.1. var(X_{median:n}) ≤ var(X_{r:n}) for any r and n.

In view of the above result, we can recommend the use of the sample median from each row of n observations in Table 1.1, and the mean of all such medians as an estimator of θ, namely,

    θ̂_median:n(n) = [X⁽¹⁾_median:n + ··· + X⁽ⁿ⁾_median:n] / n  (5.22)

where X⁽ⁱ⁾_median:n is the sample median from the ith row of Table 1.1. Clearly, θ̂_median:n(n) is unbiased for θ, and, by Lemma 5.1, θ̂_median:n(n) is much better than the ordinary McIntyre's estimator θ̂_rss. Slightly more efficiently, we propose measuring only m medians from the first m rows of Table 1.1, where m ≤ n, and use

    θ̂_median:n(m) = [X⁽¹⁾_median:n + ··· + X⁽ᵐ⁾_median:n] / m  (5.23)

as an estimator of θ. Clearly,

    E(θ̂_median:n(m)) = θ,  var(θ̂_median:n(m)) = var(X_{median:n}) / m .  (5.24)
The following result, whose proof again appears in Ni Chuiv et al. (1994), shows that it is enough to measure only 2 or 3 medians to achieve universal dominance over any unbiased estimator of θ based on a SRS, whatever be n. This result is very similar to those in the cases of normal and exponential distributions.

THEOREM 5.3. (i) For 8 ≤ n ≤ 21, m = 2 will do, i.e., var(θ̂_median:n(2)) < 2σ²/n. (ii) For n ≥ 22, m = 3 will do, i.e., var(θ̂_median:n(3)) < 2σ²/n.

We finally discuss another variation of the above concept. Since the merit of using a RSS depends on the ability to rank the experimental units without their actual measurements, it is obvious that the fewer the number of units we need to rank the better. This suggests the strategy of ranking exactly 5 units at a time (the
minimum number for a Cauchy distribution), measuring only the median, repeating the process m* times, and eventually using the average θ̂_median:5(m*) of these medians as the resultant estimator of θ. The following table provides, for n = 5(1)16(2)20, minimum values of m* for which var(θ̂_median:5(m*)) is smaller than var(θ̃_blue) and also the RCLB (= 2σ²/n), based on a SRS of size n. Thus, for example, the average of 4 sample medians, each based on 5 observations (but only one measurement), is better than Lloyd's BLUE based on 10 measurements. Similarly, the average of 7 such medians dominates any unbiased estimator of θ based on a sample of size 10.

REMARK 5.3. It is again possible to explore the notion of expansion in this problem. We omit the details.
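Table 5.3 follows by pure arithmetic from var(X_{median:5}) ≈ 1.2213 and the Table 5.1 values of var(θ̃_blue); a sketch (ours, embedding those published values as data):

```python
# var(median of 5 standard Cauchy observations), the n = 5 entry of Table 5.1
V5 = 1.2213
# var(theta_tilde_blue) under SRS, from Table 5.1
lloyd = {5: 1.2213, 6: 0.8607, 7: 0.6073, 8: 0.4733, 9: 0.3865, 10: 0.3263,
         11: 0.2820, 12: 0.2481, 13: 0.2214, 14: 0.1998, 15: 0.1820,
         16: 0.1671, 18: 0.1435, 20: 0.1257}

def m_star(bound):
    """Smallest m* with V5/m* <= bound (variance of an average of m* medians)."""
    m = 1
    while V5 / m > bound:
        m += 1
    return m

rclb_col = {n: m_star(2.0 / n) for n in lloyd}     # dominance over the RCLB
lloyd_col = {n: m_star(lloyd[n]) for n in lloyd}   # dominance over Lloyd's BLUE
print(rclb_col)
print(lloyd_col)
```

Both dictionaries reproduce the two columns of Table 5.3 exactly.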
6. Estimation of location and scale parameters of a logistic distribution

This section is based on Lam et al. (1995). Here we apply the concept of RSS and its suitable modifications to estimation of the location and scale parameters of a logistic distribution. We note that the pdf of a logistic distribution can be written as

    f(x|θ, σ) = (1/σ) e^{−(x−θ)/σ} / [1 + e^{−(x−θ)/σ}]²,  −∞ < x < ∞,  σ > 0  (6.1)

where θ is the location parameter and σ is the scale parameter. First we give a discussion of some standard estimators of θ and σ based on a SRS of size n, namely, X₁, ..., X_n.
Table 5.3
Minimum values of m* for which var(θ̂_median:5(m*)) is smaller than var(θ̃_blue) and RCLB

n     m*(RCLB)    m*(θ̃_blue)
5     4           1
6     4           2
7     5           3
8     5           3
9     6           4
10    7           4
11    7           5
12    8           5
13    8           6
14    9           7
15    10          7
16    10          8
18    11          9
20    13          10
It is clear that the conventional maximum likelihood estimators of θ and σ are extremely difficult to get in the context of (6.1), and their small sample properties are completely unknown. However, based on the order statistics X_{1:n}, ..., X_{n:n}, Lloyd's (1952) best linear unbiased estimators (BLUEs) are quite popular. Throughout this section, we have taken the BLUEs of θ and σ as our main standards for comparison against RSS estimators. Note that

    E(X_{r:n}) = θ + c_{r:n}σ,  var(X_{r:n}) = d_{rr:n}σ²,  cov(X_{r:n}, X_{s:n}) = d_{rs:n}σ²  (6.2)

where c_{r:n} and d_{rr:n} are respectively the mean and variance of X_{r:n}, and d_{rs:n} is the covariance between X_{r:n} and X_{s:n}, from a standard logistic distribution with θ = 0 and σ = 1. The values of c_{r:n}, d_{rr:n} and d_{rs:n} are given in Balakrishnan and Malik (1994). Using (6.2), the BLUEs of θ and σ are given by (Lloyd, 1952; Balakrishnan and Cohen, 1991)

    θ̃_blue = 1'(Σ*)⁻¹X* / 1'(Σ*)⁻¹1  (6.3)

    σ̃_blue = α'(Σ*)⁻¹X* / α'(Σ*)⁻¹α .  (6.4)

In the above, X* = (X_{1:n}, ..., X_{n:n})', α = (c_{1:n}, ..., c_{n:n})' and Σ* = var(X*). Moreover,

    var(θ̃_blue) = σ² / 1'(Σ*)⁻¹1  (6.5)

    var(σ̃_blue) = σ² / α'(Σ*)⁻¹α .  (6.6)
We also note in passing that the Fisher information matrix in our problem is given by

    I_n(θ, σ) = (n/σ²) diag(1/3, c)  (6.7)

where

    c = 2 ∫₀^∞ x² eˣ (eˣ − 1)² / (1 + eˣ)⁴ dx = 2.43  (6.8)

which provides the Rao-Cramér lower bound (RCLB) of the variance of any unbiased estimator d₁θ̂ + d₂σ̂ of d₁θ + d₂σ as

    RCLB(d₁θ̂ + d₂σ̂) = 3d₁²σ²/n + d₂²σ²/(cn) .  (6.9)

This immediately gives

    RCLB(θ̂) = 3σ²/n  (6.10)

    RCLB(σ̂) = σ²/(cn) .  (6.11)
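The constant in (6.8), as reconstructed here (the integrand is our reading of the garbled original; it equals E[Z² tanh²(Z/2)] for a standard logistic Z), can be checked numerically:

```python
import math

def c_constant(upper=60.0, steps=60000):
    """Simpson evaluation of 2 * int_0^inf x^2 e^x (e^x - 1)^2 / (1 + e^x)^4 dx,
    our reconstruction of (6.8); the tail beyond x = 60 is negligible."""
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        x = k * h
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        ex = math.exp(x)
        total += w * x * x * ex * (ex - 1) ** 2 / (1 + ex) ** 4
    return 2 * total * h / 3

c = c_constant()
print(c)  # about 2.43, matching the value quoted in (6.8)
```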
6.1. Estimation of θ: ordinary RSS

Since E(X) = θ, following McIntyre's concept, our first estimator of θ can be taken as

    θ̂_rss = Σ_{r=1}^{n} X_{(rr)} / n  (6.12)

with

    var(θ̂_rss) = σ² Σ_{r=1}^{n} d_{rr:n} / n² .  (6.13)
6.2. Estimation of θ: BLUE based on RSS

We now address the issue of how best to use the RSS, namely, X_{(11)}, ..., X_{(nn)}, for estimation of θ. Recall that E(X_{(ii)}) = θ + c_{i:n}σ and var(X_{(ii)}) = d_{ii:n}σ². Starting with Σ_{i=1}^{n} c_i X_{(ii)} and minimizing var(Σ_{i=1}^{n} c_i X_{(ii)}) subject to the unbiasedness conditions Σ_{i=1}^{n} c_i = 1, Σ_{i=1}^{n} c_i c_{i:n} = 0, leads to the BLUE of θ as

    θ̂_blue = [(Σ_{i=1}^{n} X_{(ii)}/d_{ii:n})(Σ_{i=1}^{n} c_{i:n}²/d_{ii:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{ii:n})(Σ_{i=1}^{n} c_{i:n}X_{(ii)}/d_{ii:n})] / [(Σ_{i=1}^{n} 1/d_{ii:n})(Σ_{i=1}^{n} c_{i:n}²/d_{ii:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{ii:n})²]  (6.14)

with

    var(θ̂_blue) = σ² (Σ_{i=1}^{n} c_{i:n}²/d_{ii:n}) / [(Σ_{i=1}^{n} 1/d_{ii:n})(Σ_{i=1}^{n} c_{i:n}²/d_{ii:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{ii:n})²] .  (6.15)

Using the fact that, for a logistic distribution,

    Σ_{i=1}^{n} c_{i:n} = 0,  Σ_{i=1}^{n} c_{i:n}/d_{ii:n} = 0 ,  (6.16)

we get

    θ̂_blue = (Σ_{i=1}^{n} X_{(ii)}/d_{ii:n}) / (Σ_{i=1}^{n} 1/d_{ii:n}),  var(θ̂_blue) = σ² / (Σ_{i=1}^{n} 1/d_{ii:n}) .  (6.17)
Table 6.1 provides a comparison of var(θ̃_blue), var(θ̂_rss), var(θ̂_blue) and the RCLB(θ) for n = 3(1)10. Without loss of generality, we have taken σ² = 1. It is clear from Table 6.1 that, as expected, θ̂_blue performs better than McIntyre's estimator θ̂_rss, which in turn does better than θ̃_blue for all n. Moreover, a comparison of var(θ̂_rss) with the RCLB reveals another striking feature: var(θ̂_rss) is smaller than the RCLB for all n, so that θ̂_rss performs better than any unbiased estimator of θ based on the SRS for all n. This latter fact can be proved theoretically, as the following result shows. Its proof appears in Lam et al. (1995).
Table 6.1
Comparison of variances for estimation of θ

n    var(θ̃_blue)   var(θ̂_rss)   var(θ̂_blue)   RCLB(θ)
3    1.0725         0.5968        0.5695         1.0000
4    0.7929         0.3711        0.3379         0.7500
5    0.6284         0.2553        0.2227         0.6000
6    0.5201         0.1872        0.1576         0.5000
7    0.4435         0.1438        0.1171         0.4286
8    0.3866         0.1142        0.0905         0.3750
9    0.3425         0.0931        0.0720         0.3333
10   0.3073         0.0776        0.0586         0.3000
Table 6.2
Minimum values of l, indicating dominance of PRSS over SRS and RCLB(θ)

       var(θ̂_blue(prss, l)) < RCLB       var(θ̂_blue(prss, l)) < var(θ̃_blue)
n      l    var(θ̂_blue(prss, l))         l    var(θ̂_blue(prss, l))
3      3    0.5695                        3    0.5695
4      3    0.4913                        3    0.4913
5      4    0.2755                        4    0.2755
6      4    0.3058                        4    0.3058
7      4    0.3823                        4    0.3823
8      5    0.2113                        5    0.2113
9      5    0.2541                        5    0.2541
10     6    0.1552                        5    0.3055
THEOREM 6.1. var(θ̂_rss) < 3σ²/n (RCLB for θ) for any n.

As in the previous sections, we can also derive the BLUE of θ based on a partial RSS (PRSS), namely, X_{(11)}, ..., X_{(ll)}, for l ≤ n. Starting with Σ_{i=1}^{l} c_i X_{(ii)} and minimizing var(Σ_{i=1}^{l} c_i X_{(ii)}) subject to the unbiasedness conditions Σ_{i=1}^{l} c_i = 1, Σ_{i=1}^{l} c_i c_{i:n} = 0, leads to the same form as (6.14) with n replaced by l as the upper limit in all the summations. The expression for the variance of the resultant estimator θ̂_blue(prss, l) is exactly as in (6.15) with the above change. Table 6.2 provides, for n = 3(1)10, minimum values of l for which (i) var(θ̂_blue(prss, l)) < var(θ̃_blue), and (ii) var(θ̂_blue(prss, l)) < 3σ²/n (RCLB). Again, we have taken σ² = 1 without loss of generality. It is clear from this table that substantial savings in the number of measurements can occur in some cases.
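A Monte Carlo sketch (ours) of Theorem 6.1 at n = 5: the simulated variance of θ̂_rss lands near the 0.2553 of Table 6.1, well below the RCLB 3σ²/n = 0.6:

```python
import random, math

def rss_diagonal_mean(n, rng):
    """One draw of theta_rss from (6.12): the r-th order statistic from the
    r-th set of n standard-logistic observations, averaged over r = 1..n."""
    total = 0.0
    for r in range(1, n + 1):
        xs = sorted(math.log(u / (1 - u)) for u in (rng.random() for _ in range(n)))
        total += xs[r - 1]
    return total / n

rng = random.Random(99)
n, reps = 5, 20000
est = [rss_diagonal_mean(n, rng) for _ in range(reps)]
m = sum(est) / reps
v = sum((e - m) ** 2 for e in est) / (reps - 1)
print(m, v)  # mean near 0; variance near 0.2553, below the RCLB 0.6
```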
6.3. Estimation of θ: which order statistic?

Once again, as in the previous sections, we address the issue of the right selection of order statistics in the context of RSS. The following variance inequality for
order statistics of a logistic distribution is useful for our subsequent discussion. Its validity essentially follows from David and Groeneveld (1982). Recall that a sample median is defined as X_{median:n} = X_{m+1:2m+1} when n = 2m + 1, and as [X_{m:2m} + X_{m+1:2m}]/2 when n = 2m.

LEMMA 6.1. var(X_{median:n}) ≤ var(X_{r:n}) for any r and n.

In view of the above result, we can recommend the use of the sample median from each row of n observations in Table 1.1, and the mean of all such medians as an estimator of θ. Slightly more efficiently, we propose measuring only m medians from the first m rows of Table 1.1, where m ≤ n, and use

    θ̂_median:n(m) = [X⁽¹⁾_median:n + ··· + X⁽ᵐ⁾_median:n] / m  (6.18)

as an estimator of θ. Here X⁽ⁱ⁾_median:n is the sample median from the ith row of Table 1.1. The following result, whose proof again appears in Lam et al. (1995), shows that it is enough to measure only two (2) experimental units to achieve universal dominance over any unbiased estimator of θ based on a SRS, whatever be n. This result is similar to those in the previous sections.

THEOREM 6.2. (i) var(X_{m+1:2m+1})/2 < 3σ²/(2m+1) (RCLB for θ) for n = 2m + 1. (ii) var(X_{median:2m})/2 < 3σ²/(2m) (RCLB for θ) for n = 2m.
6.4. Estimation of σ: BLUE based on RSS

In this section we discuss the problem of estimation of the scale parameter σ in (6.1), and point out that the use of RSS and its suitable variations results in much improved estimators compared to the use of the BLUE (σ̃_blue) under SRS. To derive the BLUE of σ based on the entire McIntyre sample X_{(11)}, ..., X_{(nn)}, we minimize the variance of Σ_{i=1}^{n} c_i X_{(ii)} subject to the unbiasedness conditions Σ_{i=1}^{n} c_i = 0, Σ_{i=1}^{n} c_i c_{i:n} = 1. This results in

    σ̂_blue = [(Σ_{i=1}^{n} c_{i:n}X_{(ii)}/d_{ii:n})(Σ_{i=1}^{n} 1/d_{ii:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{ii:n})(Σ_{i=1}^{n} X_{(ii)}/d_{ii:n})] / [(Σ_{i=1}^{n} 1/d_{ii:n})(Σ_{i=1}^{n} c_{i:n}²/d_{ii:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{ii:n})²]  (6.19)

with

    var(σ̂_blue) = σ² (Σ_{i=1}^{n} 1/d_{ii:n}) / [(Σ_{i=1}^{n} 1/d_{ii:n})(Σ_{i=1}^{n} c_{i:n}²/d_{ii:n}) − (Σ_{i=1}^{n} c_{i:n}/d_{ii:n})²] .  (6.20)

Using (6.16), the above expressions can be simplified as

    σ̂_blue = (Σ_{i=1}^{n} c_{i:n}X_{(ii)}/d_{ii:n}) / (Σ_{i=1}^{n} c_{i:n}²/d_{ii:n})  (6.21)
On some aspects of ranked set sampling
369
var(σ̃_blue) = σ² / (Σ_1^n c²_{i:n}/d_{ii:n}) .   (6.22)

Table 6.3
Comparison of variances for estimation of σ

n     var(σ̂_blue)   var(σ̃_blue)   RCLB(σ)
3     0.3333        0.4533        0.1372
4     0.2254        0.2521        0.1029
5     0.1704        0.1627        0.0823
6     0.1370        0.1143        0.0686
7     0.1145        0.0850        0.0588
8     0.0984        0.0658        0.0514
9     0.0864        0.0525        0.0457
10    0.0768        0.0428        0.0412

Table 6.4
Minimum values of l indicating dominance of PRSS over SRS: var(σ̃_blue(prss, l)) < var(σ̂_blue)

n     l     var(σ̃_blue(prss, l))
8     7     0.0891
9     8     0.0681
10    8     0.0718
In Table 6.3 we have presented the values of var(σ̂_blue), var(σ̃_blue) and the RCLB(σ) for n = 3(1)10. The dominance of σ̃_blue over σ̂_blue holds for n ≥ 5, although there is no dominance over RCLB(σ). As in the case of estimation of θ, here also we can use a partial RSS, namely X_(11), ..., X_(ll) for some l ≤ n, as well as the concept of expansion; Table 6.4 gives, for n = 8, 9, 10, the minimum values of l for which the PRSS-based BLUE of σ dominates σ̂_blue.
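For the standard logistic, the moments of order statistics needed in (6.21)-(6.22) are available in closed form, E(Z_{i:n}) = ψ(i) − ψ(n+1−i) and var(Z_{i:n}) = ψ′(i) + ψ′(n+1−i), so the BLUE of σ from a RSS can be computed directly. A sketch (integer-argument digamma/trigamma via harmonic sums); for n = 5 the variance factor reproduces the 0.1627 of Table 6.3:

```python
import numpy as np

GAMMA = 0.5772156649015329

def digamma_int(i):   # psi(i) for a positive integer i
    return -GAMMA + sum(1.0 / k for k in range(1, i))

def trigamma_int(i):  # psi'(i) for a positive integer i
    return np.pi**2 / 6 - sum(1.0 / k**2 for k in range(1, i))

def logistic_os_moments(n):
    c = np.array([digamma_int(i) - digamma_int(n + 1 - i) for i in range(1, n + 1)])
    d = np.array([trigamma_int(i) + trigamma_int(n + 1 - i) for i in range(1, n + 1)])
    return c, d   # c_{i:n} and d_{ii:n}

def sigma_blue_rss(x):
    # BLUE of sigma from a McIntyre RSS X_(11), ..., X_(nn); eq. (6.21)
    c, d = logistic_os_moments(len(x))
    denom = (c**2 / d).sum()
    return (x * c / d).sum() / denom, 1.0 / denom   # estimate and var/sigma^2, eq. (6.22)

_, v5 = sigma_blue_rss(np.zeros(5))
print(round(v5, 4))   # variance factor for n = 5; compare Table 6.3
```

The symmetry c_{i:n} = −c_{n+1−i:n} used in passing from (6.19) to (6.21) is visible in the computed moments.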
7. Estimation of parameters in Weibull and extreme-value distributions

In this section, which is based on Fei et al. (1994), we discuss the estimation of the parameters in two-parameter Weibull and extreme-value distributions. We note that the pdf of a two-parameter Weibull distribution can be written as

f(y|θ, β) = (β/θ)(y/θ)^{β−1} exp{−(y/θ)^β} ,   y > 0 ,   (7.1)

where θ > 0, β > 0 are the scale and shape parameters respectively. Let x = ln y (the natural logarithm of y); then x has a Type I asymptotic distribution of smallest (extreme) values given by

f(x|μ, σ) = (1/σ) exp{(x − μ)/σ − exp[(x − μ)/σ]} ,   −∞ < x < ∞ ,   (7.2)

where μ = ln θ, σ = 1/β are the location and scale parameters respectively. Section 7.1 is devoted to the estimation of μ and σ based on McIntyre's RSS as well as a partial RSS (PRSS). Section 7.2 is devoted to a discussion of the relevance of smallest order statistics in this context. Estimators of the original parameters β and θ are obtained as β̂ = 1/σ̂, θ̂ = exp(μ̂).

If Y_1, Y_2, ..., Y_n is a simple random sample (SRS) of size n from (7.1), then X_1, X_2, ..., X_n is a SRS of size n from (7.2). Let X_{1:n}, X_{2:n}, ..., X_{n:n} be the order statistics; then Z_{i:n} = (X_{i:n} − μ)/σ, i = 1, 2, ..., n, are the order statistics from a SRS of size n from a standard extreme-value distribution. We shall use the notations

E(Z_{i:n}) = α_{i:n} ,   i = 1, 2, ..., n ,   (7.3)
Cov(Z_{i:n}, Z_{j:n}) = d_{ij} ,   i, j = 1, 2, ..., n ,   (7.4)
V = (d_{ij})_{n×n} ,   V^{−1} = (d^{ij})_{n×n} .   (7.5)
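The Weibull-to-extreme-value reduction is easy to check by simulation; the parameter values below are arbitrary illustrative choices. The mean of the smallest-extreme-value law (7.2) is μ − γσ, with γ ≈ 0.5772 Euler's constant:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, beta = 2.0, 1.5                        # Weibull scale and shape (hypothetical)
y = theta * rng.weibull(beta, size=500_000)   # numpy's weibull has unit scale
x = np.log(y)                                 # should follow (7.2)

mu, sigma = np.log(theta), 1.0 / beta
EULER = 0.5772156649
print(x.mean(), mu - EULER * sigma)           # the two should agree closely
```

The same transformation carries the RSS over: ranking the y's and ranking the x's give the same ranked sets.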
It is well known that the unique minimum variance linear unbiased estimators (UMVLUE) of μ and σ based on X_1, ..., X_n (i.e., Y_1, ..., Y_n) are given by

μ̂ = Σ_{i=1}^n a_{i,n} X_{i:n} ,   (7.6)
σ̂ = Σ_{i=1}^n c_{i,n} X_{i:n} .   (7.7)

The variances of the above estimators μ̂ and σ̂ are as follows:

Var(μ̂) = A_n σ² ,   (7.8)
Var(σ̂) = E_n σ² ,   (7.9)

and the covariance of μ̂ and σ̂ can be written as

Cov(μ̂, σ̂) = B_n σ² ,   (7.10)

where

A_n = (1/Δ) Σ_{i=1}^n Σ_{j=1}^n α_{i:n} α_{j:n} d^{ij} ,   (7.11)
B_n = −(1/Δ) Σ_{i=1}^n Σ_{j=1}^n α_{j:n} d^{ij} ,   (7.12)
E_n = (1/Δ) Σ_{i=1}^n Σ_{j=1}^n d^{ij} ,   (7.13)

and

Δ = (Σ_{i}Σ_{j} d^{ij})(Σ_{i}Σ_{j} α_{i:n} α_{j:n} d^{ij}) − (Σ_{i}Σ_{j} α_{j:n} d^{ij})² .   (7.14)

The coefficients a_{i,n} and c_{i,n} in the formulas (7.6) and (7.7) may be obtained as

a_{i,n} = A_n Σ_{j=1}^n d^{ij} + B_n Σ_{j=1}^n α_{j:n} d^{ij} ,   (7.15)
c_{i,n} = B_n Σ_{j=1}^n d^{ij} + E_n Σ_{j=1}^n α_{j:n} d^{ij} .   (7.16)

We refer to Balakrishnan et al. (1990) for values of the α_{i:n}'s, d_{ii}'s and d^{ij}'s, from which the above expressions can be evaluated. The coefficients a_{i,n} and c_{i,n} can also be obtained from Table 5.3 in Mann et al. (1974).
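The coefficient formulas (7.15)-(7.16) are a direct matrix computation once α and V are available. Since the exact tables cited above are not reproduced here, the sketch below approximates the moments by simulation; the unbiasedness identities Σ a_{i,n} = 1, Σ a_{i,n}α_{i:n} = 0, Σ c_{i,n} = 0, Σ c_{i,n}α_{i:n} = 1 then hold exactly by construction:

```python
import numpy as np

def blue_coeffs(alpha, V):
    """Lloyd-type BLUE coefficients a_{i,n}, c_{i,n} from eqs. (7.11)-(7.16)."""
    Vinv = np.linalg.inv(V)
    one = np.ones(len(alpha))
    s11 = one @ Vinv @ one        # sum_i sum_j d^{ij}
    saa = alpha @ Vinv @ alpha    # sum_i sum_j alpha_i alpha_j d^{ij}
    s1a = one @ Vinv @ alpha      # sum_i sum_j alpha_j d^{ij}
    delta = s11 * saa - s1a**2    # eq. (7.14)
    A, B, E = saa / delta, -s1a / delta, s11 / delta
    a = A * (Vinv @ one) + B * (Vinv @ alpha)   # (7.15)
    c = B * (Vinv @ one) + E * (Vinv @ alpha)   # (7.16)
    return a, c

# Moments of standard (smallest) extreme-value order statistics by Monte Carlo
rng = np.random.default_rng(2)
n = 4
z = np.sort(np.log(-np.log(rng.uniform(size=(200_000, n)))), axis=1)
alpha, V = z.mean(axis=0), np.cov(z, rowvar=False)

a, c = blue_coeffs(alpha, V)
print(a.sum(), c @ alpha)   # unbiasedness: both equal 1
```

In practice one would substitute the exact α_{i:n} and d_{ij} values; the simulated moments only serve to make the sketch self-contained.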
7.1. Estimation of μ and σ based on RSS

In this section we first discuss the problem of estimation of the parameters in (7.2) using a RSS. Clearly X_(11), X_(22), ..., X_(nn) are independent, and X_(ii) is distributed as X_{i:n}, the i-th order statistic in a sample of size n from (7.2). Then we have

E(X_(ii)) = μ + σ E(Z_{i:n}) = μ + α_{i:n} σ ,   i = 1, 2, ..., n ,   (7.17)
Var(X_(ii)) = σ² Var(Z_{i:n}) = σ² d_{ii} ,   i = 1, 2, ..., n ,   (7.18)
Cov(X_(ii), X_(jj)) = 0 ,   i ≠ j, i, j = 1, 2, ..., n .   (7.19)

We use the notation

X = (X_(11), X_(22), ..., X_(nn))' .   (7.20)
Using the generalized Gauss–Markov theorem, we can obtain the unique minimum variance linear unbiased estimators μ* and σ* of μ and σ based on the RSS (X_(11), X_(22), ..., X_(nn)). Writing Δ* = (Σ_1^n 1/d_{ii})(Σ_1^n α²_{i:n}/d_{ii}) − (Σ_1^n α_{i:n}/d_{ii})², the explicit formulas of the estimators μ* and σ* are

μ* = [ (Σ α²_{i:n}/d_{ii})(Σ X_(ii)/d_{ii}) − (Σ α_{i:n}/d_{ii})(Σ α_{i:n} X_(ii)/d_{ii}) ] / Δ* ,   (7.26)
σ* = [ (Σ 1/d_{ii})(Σ α_{i:n} X_(ii)/d_{ii}) − (Σ α_{i:n}/d_{ii})(Σ X_(ii)/d_{ii}) ] / Δ* .   (7.27)

The variances of the estimators μ* and σ* are

Var(μ*) = σ² (Σ α²_{i:n}/d_{ii}) / Δ* ,   (7.28)
Var(σ*) = σ² (Σ 1/d_{ii}) / Δ* .   (7.29)
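With the covariance matrix diagonal, (7.28)-(7.29) are one-liners. Approximating α_{i:n} and d_{ii} by simulation (rather than by the exact tables) reproduces the n = 5 entries of Table 7.1, 0.0898 and 0.1495, to Monte Carlo accuracy:

```python
import numpy as np

rng = np.random.default_rng(3)
n, R = 5, 400_000
# Order-statistic moments of the standard smallest-extreme-value law, by simulation
z = np.sort(np.log(-np.log(rng.uniform(size=(R, n)))), axis=1)
alpha, d = z.mean(axis=0), z.var(axis=0)     # alpha_{i:n} and d_{ii}

s1, sa, saa = (1 / d).sum(), (alpha / d).sum(), (alpha**2 / d).sum()
delta = s1 * saa - sa**2                     # Delta*
var_mu_star, var_sigma_star = saa / delta, s1 / delta   # (7.28)-(7.29), sigma^2 = 1
print(round(var_mu_star, 4), round(var_sigma_star, 4))  # compare Table 7.1, n = 5
```

The same two lines with the sums truncated at l < n give the PRSS variances discussed below.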
Table 7.1 provides a comparison of Var(μ̂), Var(σ̂) with Var(μ*), Var(σ*) respectively for n = 2, 3, ..., 10. The superiority of our proposed estimators of μ and σ based on a RSS is obvious from the table. We now discuss the use of PRSS in the context of our estimation problems. Let X_(11), X_(22), ..., X_(ll) (l < n) be a partial RSS. It is then easy to verify that the BLUEs of μ and σ based on the PRSS (denoted as μ̃_l and σ̃_l) are given exactly by (7.26) and (7.27), except that in all the summations n is replaced by l. Moreover, the corresponding variances of these BLUEs are given by (7.28) and (7.29) respectively with the same changes. Table 7.2 provides, for n = 5(1)10, the minimum values of l_1 and l_2 for which Var(μ̃_{l_1}) < Var(μ̂) and Var(σ̃_{l_2}) < Var(σ̂), respectively. It may be noted that there is some benefit in using a partial RSS for the estimation of μ and σ.

Table 7.1
Comparison of var(μ̂) with var(μ*) and var(σ̂) with var(σ*) when σ² = 1

n     Var(μ̂)    Var(μ*)   Var(σ̂)    Var(σ*)
2     0.6595    0.5859    0.7119    1.2118
3     0.4028    0.2453    0.3448    0.4468
4     0.2935    0.1387    0.2253    0.2382
5     0.2314    0.0898    0.1666    0.1495
6     0.1912    0.0631    0.1320    0.1031
7     0.1629    0.0468    0.1090    0.0756
8     0.1419    0.0361    0.0929    0.0579
9     0.1258    0.0288    0.0809    0.0458
10    0.1133    0.0234    0.0716    0.0372
Table 7.2
Minimum values of l_1 and l_2 (and corresponding variances) for which μ̃_{l_1} beats μ̂ and σ̃_{l_2} beats σ̂ when σ² = 1

n     var(μ̂)    l_1   var(μ̃_{l_1})   var(σ̂)    l_2   var(σ̃_{l_2})
5     0.2314    4     0.1914         0.1666    5     0.1495
6     0.1912    5     0.1101         0.1320    6     0.1031
7     0.1629    5     0.1531         0.1090    7     0.0756
8     0.1419    6     0.0937         0.0929    7     0.0875
9     0.1258    6     0.1226         0.0809    8     0.0661
10    0.1133    7     0.0796         0.0716    9     0.0517
7.2. Relevance of smallest order statistic

In this section we study the possibility of using only the minimum order statistics, and examine the performance of (X_(11), X_(21), ..., X_(l1)) for estimation of μ and σ for various choices of l = 1, 2, ..., n, in an attempt to find the minimum value of l for which dominance over μ̂ and σ̂ holds. It is easy to verify that X_(11), X_(21), ..., X_(l1) are iid with a common extreme-value distribution whose pdf is given by

g_{1,n}(x|u, σ) = (1/σ) exp[(x − u)/σ] exp{−exp[(x − u)/σ]} ,   −∞ < x < ∞ ,   (7.30)

where the new location parameter is u = μ − σ ln(n). To discuss estimation of μ and σ based on X_(11), ..., X_(l1), let X_{1:l}(1, n) < ... < X_{l:l}(1, n) be their order statistics. Then the UMVLUE of u and σ may be obtained from (7.6), (7.7) as

û_{1,l} = Σ_{i=1}^l a_{i,l} X_{i:l}(1, n) ,   (7.31)
σ̂_{1,l} = Σ_{i=1}^l c_{i,l} X_{i:l}(1, n) ,   (7.32)

which result in the UMVLUE of μ as

μ̂_{1,l} = û_{1,l} + σ̂_{1,l} ln(n) .   (7.33)

By setting n = l in (7.6) to (7.14), we immediately obtain the variances and covariances of the estimators û_{1,l} and σ̂_{1,l} as

Var(û_{1,l}) = A_l σ² ,   (7.35)
Var(σ̂_{1,l}) = E_l σ² ,   (7.36)
Cov(û_{1,l}, σ̂_{1,l}) = B_l σ² .   (7.37)

Then the variance of μ̂_{1,l} and the covariance of μ̂_{1,l} and σ̂_{1,l} can be derived as

Var(μ̂_{1,l}) = [A_l + 2B_l ln(n) + E_l (ln(n))²] σ² ,   (7.38)
Cov(μ̂_{1,l}, σ̂_{1,l}) = [B_l + E_l ln(n)] σ² .   (7.39)

Using (7.11) to (7.14), we may obtain A_l, B_l, E_l and comparisons of Var(μ̂) with Var(μ̂_{1,l}) and of Var(σ̂) with Var(σ̂_{1,l}) for various values of l and n. Our computation (Table 7.3) shows that there is no improvement for estimation of μ and σ even in the best possible case of l = n.

REMARK 7.1. Here again it is possible to use the concept of expansion for efficient estimation of the relevant parameters. We omit the details.
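The key fact of this section — that each X_(i1) follows the extreme-value law (7.30) with location u = μ − σ ln(n) and unchanged scale — can be corroborated by simulation (parameter values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, R = 1.0, 0.5, 6, 300_000
# R replications of the minimum of n extreme-value(mu, sigma) observations
x = mu + sigma * np.log(-np.log(rng.uniform(size=(R, n))))
mins = x.min(axis=1)

u = mu - sigma * np.log(n)      # claimed location of the law of the minimum
EULER = 0.5772156649
print(mins.mean(), u - EULER * sigma)   # both means should agree
```

Because only the location shifts, the variance of the minima matches σ²π²/6, which is why the σ columns of Table 7.3 coincide for l = n.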
Table 7.3
Comparison of var(μ̂) with var(μ̂_{1,l}) and var(σ̂) with var(σ̂_{1,l}) for l = n when σ² = 1

n     Var(μ̂)    Var(μ̂_{1,l})   Var(σ̂)    Var(σ̂_{1,l})
2     0.6595    1.0907         0.7119    0.7119
3     0.4028    0.7645         0.3448    0.3448
4     0.2935    0.6303         0.2253    0.2253
5     0.2314    0.5535         0.1666    0.1666
6     0.1912    0.5025         0.1320    0.1320
7     0.1629    0.4643         0.1090    0.1090
8     0.1419    0.4351         0.0929    0.0929
9     0.1258    0.4113         0.0809    0.0809
10    0.1133    0.3621         0.0716    0.0716
References

Arnold, B. C. and N. Balakrishnan (1989). Relations, Bounds and Approximations for Order Statistics. Lecture Notes in Statistics No. 53, Springer-Verlag, New York.
Balakrishnan, N. and A. C. Cohen (1991). Order Statistics and Inference. Academic Press, Inc., Boston.
Balakrishnan, N. and H. J. Malik (1994). Means, variances, and covariances of logistic order statistics for sample sizes up to fifty. Private communication.
Balakrishnan, N., P. S. Chan and J. Varadan (1990). Means, variances and covariances of order statistics and best linear unbiased estimates of the location and scale parameters of extreme value distribution for complete and censored samples of size 30 and less. Private communication.
Barnett, V. D. (1966a). Evaluation of the maximum-likelihood estimator where the likelihood equation has multiple roots. Biometrika 53, 151-165.
Barnett, V. D. (1966b). Order statistics estimators of the location of the Cauchy distribution. J. Amer. Statist. Assoc. 61, 1205-1218.
Bickel, P. J. (1967). Some contributions to the theory of order statistics. In: Proc. 5th Berkeley Symp. 1, 575-591.
Bloch, D. (1966). A note on the estimation of the location parameter of the Cauchy distribution. J. Amer. Statist. Assoc. 61, 852-855.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley & Sons, New York.
Cobby, J. M., M. S. Ridout, P. J. Bassett and R. V. Large (1985). An investigation into the use of ranked set sampling on grass and grass-clover swards. Grass and Forage Science 40, 257-263.
David, H. A. and D. N. Levine (1972). Ranked set sampling in the presence of judgement error. Biometrics 28, 553-555.
David, H. A. (1981). Order Statistics. 2nd edn. Wiley, New York.
David, H. A. and R. A. Groeneveld (1982). Measures of local variation in a distribution: Expected length of spacings and variances of order statistics. Biometrika 69, 227-232.
Dell, T. R. (1969). The theory of some applications of ranked set sampling. Ph.D. thesis, University of Georgia, Athens, GA.
Dell, T. R. and J. L. Clutter (1972). Ranked set sampling theory with order statistics background. Biometrics 28, 545-555.
Dixon, W. J. (1957). Estimates of the mean and standard deviation of a normal population. Ann. Math. Statist. 28, 806-809.
Fei, H., B. K. Sinha and Z. Wu (1994). Estimation of parameters in two-parameter Weibull and extreme-value distributions using ranked set sample. J. Statist. Res. 28, 149-161.
Halls, L. K. and T. R. Dell (1966). Trial of ranked set sampling for forage yields. Forest Science 12(1), 22-26.
Kvam, P. H. and F. J. Samaniego (1991). On the inadmissibility of standard estimators based on ranked set sampling. In: 1991 Joint Statistical Meetings of ASA Abstracts, 291-292.
Lam, K., B. K. Sinha and Z. Wu (1994). Estimation of parameters in a two-parameter exponential distribution using ranked set sample. Ann. Inst. Statist. Math. 46, 723-736.
Lam, K., B. K. Sinha and Z. Wu (1995). Estimation of location and scale parameters of a Logistic distribution using a ranked set sample. In: Nagaraja, Sen and Morrison, eds., Papers in Honor of Herbert A. David, 187-197.
Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
Mann, N. R., R. E. Schafer and N. D. Singpurwalla (1974). Methods for Statistical Analysis of Reliability and Life Data. John Wiley & Sons, New York.
Martin, W. L., T. L. Sharik, R. G. Oderwald and D. W. Smith (1980). Evaluation of ranked set sampling for estimating shrub phytomass in Appalachian oak forest. Publication No. FWS-4-80, School of Forestry and Wildlife Resources, Virginia Polytechnic Institute and State University, Blacksburg, Virginia.
McIntyre, G. A. (1952). A method for unbiased selective sampling, using ranked sets. Aust. J. Agri. Res. 3, 385-390.
Muttlak, H. A. and L. L. McDonald (1990a). Ranked set sampling with respect to concomitant variables and with size biased probability of selection. Commun. Statist. - Theory Meth. 19(1), 205-219.
Muttlak, H. A. and L. L. McDonald (1990b). Ranked set sampling with size biased probability of selection. Biometrics 46, 435-445.
Ni Chuiv, N., B. K. Sinha and Z. Wu (1994). Estimation of the location parameter of a Cauchy distribution using a ranked set sample. Technical Report, University of Maryland Baltimore County.
Patil, G. P., A. K. Sinha and C. Taillie (1992a). Ranked set sampling and ecological data analysis. Technical Reports and Reprints Series, Department of Statistics, Penn State University.
Patil, G. P., A. K. Sinha and C. Taillie (1992b). Ranked set sampling in the presence of a trend on a site. Technical Reports and Reprints Series, Department of Statistics, Penn State University.
Ridout, M. S. and J. M. Cobby (1987). Ranked set sampling with non-random selection of sets and errors in ranking. Appl. Statist. 36(2), 145-152.
Rothenberg, T. J., F. M. Fisher and C. B. Tilanus (1966). A note on estimation from a Cauchy sample. J. Amer. Statist. Assoc. 59, 460-463.
Sinha, B. K., B. K. Sinha and S. Purkayastha (1996). On some aspects of ranked set sampling for estimation of normal and exponential parameters. Statistics & Decisions 14, 223-240.
Stokes, S. L. (1977). Ranked set sampling with concomitant variables. Commun. Statist. - Theor. Meth. A6(12), 1207-1211.
Stokes, S. L. (1980). Estimation of variance using judgement ordered ranked set samples. Biometrics 36, 35-42.
Stokes, S. L. (1986). Ranked set sampling. In: S. Kotz, N. L. Johnson and C. B. Read, eds., Encyclopedia of Statistical Sciences 7. Wiley, New York, 585-588.
Stokes, S. L. and T. W. Sager (1988). Characterization of a ranked set sample with application to estimating distribution functions. J. Amer. Statist. Assoc. 83, 374-381.
Takahasi, K. and K. Wakimoto (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann. Inst. Statist. Math. 20, 1-31.
Takahasi, K. (1969). On the estimation of the population mean based on ordered samples from an equicorrelated multivariate distribution. Ann. Inst. Statist. Math. 21, 249-255.
Takahasi, K. (1970). Practical note on estimation of population means based on samples stratified by means of ordering. Ann. Inst. Statist. Math. 22, 421-428.
Takahasi, K. and M. Futatsuya (1988). Ranked set sampling from a finite population (Japanese). Proc. Inst. Statist. 36(1), 55-68.
Tietjen, G. L., D. K. Kahaner and R. J. Beckman (1977). Variances and covariances of the normal order statistics for sample size 2 to 50. In: Selected Tables in Mathematical Statistics 5, 1-73.
Tukey, J. W. (1958). A problem of Berkson, and minimum variance orderly estimators. Ann. Math. Statist. 29, 588-592.
Watanabe, Y., M. Isida, S. Taga, Y. Ichijo, T. Kawase, G. Niside, Y. Takeda, A. Horisuzi and I. Kuriyama (1957). Some contributions to order statistics. J. Gakugei (Tokushima University) 8, 41-90.
Yanagawa, T. and S. Shirahata (1976). Ranked set sampling theory with selective probability matrix. Austral. J. Statist. 18(1,2), 45-52.
Yanagawa, T. and S-H. Chen (1980). The MG-procedure in rank set sampling. J. Statist. Plan. Infer. 4, 33-44.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
11
Some Uses of Order Statistics in Bayesian Analysis
Seymour Geisser
1. Introduction

Order statistics, although playing a prominent role in frequentist methodology, especially in nonparametric inference, are not often featured in Bayesian analysis. One area, however, where order statistics can be of interest to Bayesians is the detection of outliers or discordant observations. These situations are such that there is an observation that appears to be somewhat removed from the remaining ones, but no discernible alternative can readily be specified for the potential discordancy. Alternatives, although sometimes considered, Dixit (1994), require additional distributional assumptions and prior probabilities over and above the original model assumptions that initially an investigator may not be prepared to contemplate. In these situations a simple Bayesian test of significance may be appropriate in determining whether an observation or several of them are discordant, or whether it is necessary to contemplate alternative model assumptions for the entire data set. Sections 2 through 6 will consider Bayesian discordancy testing. Another area involves the calculation of the probability that the minimum (maximum) of a set of future observables is greater (smaller) than some critical threshold. More generally, we shall be interested in the chance that R out of M future values are in some interval or some set. This essentially involves the Rth future order statistic, and will be the subject of Sections 7 and 8.
2. Discordancy testing

The notion of a Bayesian significance test was introduced by Box (1980) for goodness-of-fit problems. This view was adopted for discordancy testing by Geisser (1980, 1989, 1990). In this paper we delineate some approaches for the use of Bayesian significance testing in the detection of potentially discordant observations. These tests can be useful in situations where no distributional alternative is readily contemplated or easily modelled. These cases presumably may have arisen from errors in transcribing data, numbers misread, digits transposed, an incorrect sign before a number, a stipulated experimental condition that did not obtain, or any of a host of possibilities that would serve to flaw one or more
observations in an experiment. Therefore the surprise spawned by a small P-value of an appropriate significance test is useful in detecting potentially flawed observations. If the "discordant" observation(s) make(s) an appreciable difference in a potential inference or decision then a determination needs to be made as to whether the apparently discordant observables are really incompatible with the rest of the observations or whether the modeling requires revision or both. In univariate situations these discordancies often take the form of outliers in that one or several observations appear to be distantly removed from the others. We shall present a framework for such discordancy tests that depend on (a) the identification of a potentially discordant observation because of the intrusion of some untoward event connected with the particular observation (b) taking into account a diagnostic ransacking of the data in search of potentially discordant observations. Approaches are discussed that depend on the differing circumstances in identifying the suspect observations.
3. Suspicious circumstances

Assume Y_1, ..., Y_N are independently distributed with known covariates x_1, ..., x_N such that the distribution function of Y_i is specified as F(y_i|x_i, θ). In addition, we assume a prior density p(θ) for θ. Hence based on this model we can compute the predictive distribution of a future value or set of such values z^(n) = (z_1, ..., z_n) given covariates w^(n) = (w_1, ..., w_n), by averaging the sampling density of z^(n) over the posterior; here y^(N) = (y_1, ..., y_N) is the observed set of values of Y^(N) = (Y_1, ..., Y_N) given x^(N) = (x_1, ..., x_N), and the expectation defining the predictive distribution is over P(θ|y^(N)).

In the process of sampling a particular value Y_i, say, some untoward event or suspicious circumstance occurred that may have affected the observable y_i. A determination can be made as to whether the observation was discordant and, if so, its effect on the inference or decision. Discordancy can be assessed using the predictive distribution of the particular observable Y_i = y_i given the rest of the sample y_(i), which denotes all of the observations in y^(N) with y_i deleted. If it is determined that y_i is likely to have been flawed, then a comparison of either posterior distributions of particular parameters of interest or the predictive distribution of future observations, with and without the potentially flawed observable, is in order. Since we are dealing essentially with "potentiality" it is clear that there is little concern with observables well within the ambit of those for which no doubt is manifest. Hence we would restrict our attention to those that appear extreme in some sense. When dealing with independent and identically distributed univariate observables, our attention is directed to those that are possibly extreme, i.e. the largest and the smallest observables. For an extreme single value for which an untoward event occurred, we can construct a significance test by calculating the predictive probability
P_M = Pr[Z_M ≥ y_M | y_(M)]   (3.1)

for the largest observation y_M, and for the smallest observation y_m,

P_m = Pr[Z_m ≤ y_m | y_(m)] ,   (3.2)

where y_(i) is y^(N) with y_i deleted. If both are to be tested simultaneously, then

P_{m,M} = Pr[Z_m ≤ y_m, Z_M ≥ y_M | y_(m,M)] ,   (3.3)

where y_(i,j) is y^(N) with y_i and y_j deleted. As long as the largest or smallest y_M and y_m were tagged because of a prior potential problem, there is no need to concern ourselves with the distribution of order statistics. We have restricted ourselves to extreme points because nonextreme points are not likely to be of concern, unless values more extreme than they are also of concern because of their removal from the bulk of observations. More generally, for non-identically distributed variables (usually because of known covariates), we would calculate

P_i = Pr[Z_i ∈ R | y_(i), x^(N)] ,   (3.4)

where R is a region indicated for discordancy by some diagnostic procedure.
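As a minimal sketch of (3.1) — assuming, purely for illustration, i.i.d. N(μ, 1) data and a flat prior on μ, in which case the predictive of a new Z given y_(M) is N(ȳ_(M), 1 + 1/(N−1)) — the P-value has a closed form; the data below are hypothetical, with one planted suspicious value:

```python
import numpy as np
from math import erfc, sqrt

def p_max_normal(y):
    """P_M = Pr[Z >= y_M | y_(M)] for i.i.d. N(mu, 1) data with a flat prior on mu."""
    y = np.asarray(y, float)
    i = np.argmax(y)
    rest = np.delete(y, i)
    s = sqrt(1.0 + 1.0 / len(rest))          # predictive standard deviation
    return 0.5 * erfc((y[i] - rest.mean()) / (s * sqrt(2.0)))

rng = np.random.default_rng(5)
y = np.append(rng.normal(size=20), 4.5)      # hypothetical data + a suspect point
p = p_max_normal(y)
print(p)                                     # a small P-value flags the tagged point
```

The same recipe — posterior for θ given y_(i), then the predictive tail at y_i — applies for any model, with Monte Carlo replacing the closed form when necessary.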
4. Examples

As an example we consider the exponential distribution. Let Y_1, ..., Y_N be a random sample from

f(y|α, γ) = α e^{−α(y−γ)} ,   y > γ,  α > 0 .

Let y_1, ..., y_d represent fully observed values, and y_{d+1}, ..., y_N be censored at y_{d+1}, ..., y_N respectively. Further let m = min(y_1, ..., y_d), and for reasons previously discussed (Geisser 1984), assume that m < min(y_{d+1}, ..., y_N). Let the conjugate prior density be p(γ, α) = p(γ|α)p(α), where

p(γ|α) = N_0 α e^{αN_0(γ−m_0)} ,   γ < m_0 ,

and

p(α) ∝ α^{d_0−2} e^{−αN_0(ȳ_0−m_0)} ,   α > 0,  ȳ_0 > m_0 ,

where 1 < d_0 ≤ N_0. Then the posterior densities are

p(γ|α, y^(N)) ∝ e^{αN*(γ−m*)} ,   γ < m* ,
p(α|y^(N)) ∝ α^{d*−2} e^{−αN*(ȳ*−m*)} ,   ȳ* > m*,  α > 0 ,

for

d* = d_0 + d ,   N* = N_0 + N ,   m* = min(m_0, m) ,   ȳ* = (N_0 + N)^{−1}(N_0 ȳ_0 + N ȳ) ,   (4.1)

where N ȳ = Σ_1^N y_i .
The predictive distribution of a future observable Z is

F(z) = (1/(N*+1)) ((ȳ* − m*)/(ȳ* − z))^{d*−1} ,   z < m* ,
F(z) = 1 − (N*/(N*+1)) {N*(ȳ* − m*)}^{d*−1} / {z − m* + N*(ȳ* − m*)}^{d*−1} ,   z ≥ m* .

Note that for the noninformative prior p(γ, α) ∝ α^{−1},

m* → m ,   ȳ* → ȳ ,   d* → d ,   N* → N .
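A quick Monte Carlo check of the predictive distribution above, under the noninformative prior and no censoring (so d = N): draw (γ, α) from the posterior, then a future Z, and compare the empirical tail with the closed form for z ≥ m. The data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)
y = np.array([1.2, 1.5, 1.9, 2.4, 3.1, 4.0])   # hypothetical fully observed data
N, m, ybar = len(y), y.min(), y.mean()

# Posterior under p(gamma, alpha) ∝ 1/alpha:
# alpha ~ Gamma(d - 1, rate N(ybar - m)), and (m - gamma) | alpha ~ Exp(N*alpha)
R = 400_000
alpha = rng.gamma(N - 1, 1.0 / (N * (ybar - m)), size=R)
gamma = m - rng.exponential(1.0 / (N * alpha))
z = gamma + rng.exponential(1.0 / alpha)       # predictive draws of Z

zq = 3.0                                       # any point above m
mc = (z >= zq).mean()
closed = (N / (N + 1)) * (N * (ybar - m) / (N * (ybar - m) + zq - m)) ** (N - 1)
print(mc, closed)                              # the two tails should agree
```

Note numpy's gamma sampler is parameterized by shape and scale, so the rate N(ȳ − m) enters as its reciprocal.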
In what follows we shall remove the stars, although it is clear that when the proper prior is available we can replace the unstarred values with starred values. Clearly, then, for (3.1) we would calculate a P-value for the largest,

P_M = ((N−1)/N) {(N−1)(ȳ_(M) − m)}^{c−1} / {y_M − m + (N−1)(ȳ_(M) − m)}^{c−1} ,   (4.2)

where c = d − 1 if y_M is fully observed and c = d if y_M is censored, and for the smallest,

P_m = (1/N) ((ȳ_(m) − m_2)/(ȳ_(m) − m))^{d−2} ,   (4.3)
where m_2, the second smallest, is smaller than the censored observations when m is excluded. For combinations of the largest and smallest we first calculate (bereft of stars) the predictive probability of a pair of future observables Z_1 and Z_2:
Pr[Z_1 ≤ z_1, Z_2 ≤ z_2 | y^(N)] = {N(ȳ−m)}^{d−1} [ {N(ȳ−v)}^{−(d−1)} − (N/(N+1)) {N(ȳ−v) + z_1 − v}^{−(d−1)} − (N/(N+1)) {N(ȳ−v) + z_2 − v}^{−(d−1)} + (N/(N+2)) {Nȳ + z_1 + z_2 − (N+2)v}^{−(d−1)} ]   (4.4)

for v = min(z_1, z_2) and max(z_1, z_2) ≤ m,

Pr[Z_1 ≤ z_1, Z_2 > z_2 | y^(N)] = (N/((N+1)(N+2))) {N(ȳ−m)}^{d−1} / {N(ȳ−z_1) + z_2 − z_1}^{d−1}   (4.5)

for z_1 ≤ m ≤ z_2, and

Pr[Z_1 > z_1, Z_2 > z_2 | y^(N)] = (N/(N+2)) {N(ȳ−m)}^{d−1} / {N(ȳ−m) + z_1 + z_2 − 2m}^{d−1}   (4.6)

for min(z_1, z_2) ≥ m. Hence, for the two smallest we can calculate P_{m,m_2} by substituting in (4.4) N − 2, ȳ_(m,m_2), d − 2 and m_3 for N, ȳ, d and m; for z_1 and z_2 we use m and m_2. For the smallest and the largest, P_{m,M} can be calculated from (4.5) with N − 2, ȳ_(m,M), m_2, c substituted for N, ȳ, m, d, where c = d − 1 if M is censored and c = d − 2 if M is uncensored, and substitute m for z_1 and M for z_2. For the two largest, say M and M_2, we can use (4.6) with N − 2, ȳ_(M,M_2) and c substituted for N, ȳ, d, where c =
d − 2 if neither of M or M_2 is censored, d − 1 if only one is uncensored, d if both are censored,
and set z_1 = M, z_2 = M_2. For Y_1, ..., Y_N independent in normal linear regression the setup is as follows:
Y = Xβ + U, where Y = (Y_1, ..., Y_N)', X = (x_{ij}) is the known N × p design matrix, β = (β_1, ..., β_p)' is unknown, and U ~ N(0, σ²I_N).
U =
~ N ( O , a2IN)
.
Hence, e - ( 1/262) (y-X[1)' (y-Xfl) fy(y)
=
(2g)N/2ff N
where y is the realization o f the vector Y.
(4.7)
S. Geisser
384
Although there is no more difficulty with the usual normal-gamma prior for (fl, a2), we shall illustrate this with the noninformative prior 1
p(fi, cr2) oc a~ , so that p(fl, a2 lY) ~x
N1 2 +1 e-(1/2~2)(Y x~)'(y-X~) (~)//1
Suppose we are interested in predicting a set of M new variates
at known design matrix
m z
I Wl 1 " WM1
"" " "
Wlp I "
• ''
WMp /
such that
z= w~+u where U ~ N ( o , aZlM) .
Then for the future set Z f(zly) =
f f(zt 2, w)p( 2,fily,X)
da 2 dfl .
Let A = I+ W(X'X)
~W',
fi = ( X ' X ) - I X ' y
then we can calculate F((N + M - p)/2) f(zly)
=
~M/2 F( (N - p ) / 2 ) ( N - p)m/2lsZA[1/2
(X+M-p)/2 x
1-+
(N_p)s
2
(4.8)
Some uses of orderstatisticsin Bayesiananalysis
385
an M-dimensional student density. Further, it is easy to show that the predictive distribution of
(N - p)s 2
= FM,N-p •
(4.9)
Hence, to test whether a tagged subset y(n) of y(N) is an outlier group (usually n will be 1 or 2), we can calculate
~:n,N-n-p ~
p(n) = Pr
(N - n ~ ~ . (4.10)
where the subscript N - n indicates that the values SN 2 n' fiN-n, ^ AN-n, XN n are based on the undeleted N - n observations and Fn,x_n_p is an F variate with n and N - n - p degrees of freedom. F o r the special case where p = 1 and xil = 1 for i = 1 , . . . , N and n = 1 we have
(yi - Y(z)) ~ -
S(i)~/-~
1
~'~tN-2
(4.11)
a " S t u d e n t " variable with N - 2 degrees of freedom where Y(i)= ( N - 1) -l ~ j ¢ i Y J and ( N - 2)s~0 = ~ j g i ( Y ; - y ) 2 . If a direction (too large or too small) is considered an o u t c o m e of the u n t o w a r d event than one-sided Pi can be computed. If the u n t o w a r d event does not imply a direction then the two-sided significance value is appropriate. These m e t h o d s can easily be extended for a n o r m a l - g a m m a conjugate prior. F o r Poisson regression we assume Pr[Y = Ylx,
O] -
e-x°(Ox)Y y! ,
y = O, 1 , . . .
and obtain independent values Y1,..., YN with k n o w n covariates xl, • • •, XN. Again the conjugate prior is a g a m m a but we shall illustrate with the conventional i m p r o p e r prior p(O) oc -~. A future value YN+I has predictive probability function
Pr[YN+I=ZIXN+I'y(N)'x(N)]=(t+z--1)(t-- 1
bl@XN+l,jXN+l~z(\u_t_XN+lU )t (4.12)
N
z = 0, 1 , . . . , t =- ~-,1 Yi _> 1, and u = ~ N x i . Hence if Yi were tagged we would calculate the significance level o f y i by replacing (z, t, XN+I,U) with (Yi, t(i),xi, u(i)) where t(i) and u(i) are c o m p u t e d with yi and xi deleted. Call this qi and if this is larger than the prescribed Pi for rejection then one can stop, otherwise if this is smaller than the prescribed value, then one could calculate tails by probability ordering for the values of z.
386
S. Geisser
5. Ransacked data Here the situation is such that at the time the data were generated, no known untoward event occurred to influence the observables. However the data were ransacked, whether graphically or numerically, to determine whether the set of observations are consonant with the assumed model. We shall present methods that are appropriate in these circumstances. In univariate situations where the values are generated by an i.i.d, process, potentially discordant values are generally extremes, i.e. particularly large or small values. For the translated exponential, if the largest is chosen by ransacking, then we first calculate FM(ulO) the distribution of the largest M conditional on 0 = (e, ~). Then, for p(O) the proper gamma prior of section 4,
PM = 1 - f FM(u]O)p(O) dO so that
P. j=l \ J
No (No(Yo :m~o)+ o o-mo, )o-1 (M - mo)j
( - 1 ) J - ' No +--~
(5.1)
Since this represents the probability that the maximum is at least as large as its observed value, the result is appropriate for the maximum observation whether fully observed or censored. Similarly, for the smallest observation m, we obtain
Pm :
1 - ( No ~ {No______~o: mo)____}a°-I k.No + NJ {N(m - mo) + No (Yo - mo) }ao-1, N f~o__mo ~ d0-1
m _> mo,
No + N \ Y o
m < mo
_ m j
,
Of course, the above tests exist only for a proper prior distribution and that this prior will have a rather considerable effect on the significance test. Unless these prior parameters Y0, m0, No, do can be specified or perhaps approximated with some precision this may be impractical in many situations. In contrast the frequency approach attempts to obtain a statistic that reflects in some way whether the maximum (or minimum) is discrepant and has a sampling distribution independent of the parameters. For example in the non-censored case d = N, a statistic used is
M-M2 T - - M- _- m
(5.2)
where as before M, M2, m are largest, second largest and smallest values, Dixon (1951), Likes (1966), Kabe (1970). Here it is easily shown that (2 - t N - 2"] Pr[V _> t] = 1 - ( X - 1 ) ( X - 2)B.l\ t'
J
(5.3)
Some uses ~?[order statistics in Bayesian analysis
where
B(u, v) is the m2 -
T - - - -
387
beta function. Similarly for the smallest a statistic used is m
m-m
whose sampling distribution yields for a P-value, Pr[T < t] =
(N-2)BC+(n-2)t 1-t
-
,n - 2 ) .
(5.4)
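Equation (5.3) is easy to corroborate by simulation: the scale and translation cancel in T, so a standard exponential sample suffices. The following sketch (an illustrative check, not part of the original text) compares the Monte Carlo tail probability of T with the closed form:

```python
import numpy as np
from math import gamma as G

def beta_fn(u, v):
    return G(u) * G(v) / G(u + v)

rng = np.random.default_rng(7)
N, R = 6, 300_000
y = np.sort(rng.exponential(size=(R, N)), axis=1)
T = (y[:, -1] - y[:, -2]) / (y[:, -1] - y[:, 0])   # (M - M2)/(M - m)

t = 0.4
mc = (T >= t).mean()
closed = (N - 1) * (N - 2) * beta_fn((2 - t) / (1 - t), N - 2)
print(mc, closed)   # the two tail probabilities should agree
```

The derivation rests on the independence of the normalized exponential spacings, which is also why the null distribution is parameter-free.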
When d < N, straightforward frequentist solutions are not available. In general, then, assume some diagnostic, say D, is used in ransacking to order the y_i. The transformation D(Y_i) = D_i yields random variables D_1, D_2, ..., D_N. Hence we need to find F_{D_M}(d|θ), the conditional distribution of D_M, the transform associated with the most removed Y_i given θ — namely the one which yields the observed y that is most discrepant. Then

P_M = 1 − ∫ F_{D_M}(d|θ) p(θ) dθ ,   (5.5)

where p(θ) is a proper prior. Tests of this sort were termed Unconditional Predictive Discordancy (UPD) tests, Geisser (1989). They allow the prior to play a substantial role in determining the outlier status of an observation. One can continue by finding the joint distribution of the ordered D_i's given θ and test whether some most discrepant subset, in terms of the diagnostic's discrepancy ordering, is jointly a discordant subset. For a simple normal case we assume Y_i, i = 1, ..., N, are i.i.d. N(μ, σ²) with σ² known and μ ~ N(0, τ²). Now unconditionally Y_i, i = 1, ..., N, are an exchangeable set of normal variates with mean 0, variance σ² + τ² and covariance τ². This might imply that V_i = (Y_i − 0)²/(σ² + τ²) is the appropriate diagnostic, with max_i V_i = V_0 being used to construct the significance test for the largest deviation, namely

P_M = Pr[V_0 ≥ v] .   (5.6)
It is clear that V_1, ..., V_N are all exchangeable and marginally distributed as χ² with one degree of freedom. Although the distribution of V_0 is not analytically explicit, P_M can be calculated by either numerical approximation or Monte Carlo techniques; see also the tables of Pearson and Hartley (1966). However, this is not the critical issue. The question is whether V_i is an appropriate discrepancy measure, because V_i only reflects distance from the prior mean, and this could cause some discomfort as it need not reflect sufficient distance from the rest of the observations. The latter is often the implied definition of an outlier or a discordant observation. One could also use

max_i N(Y_i − Ȳ)² / ((σ² + τ²)(N − 1)) = max_i Z_i = Z_0 ,   (5.7)
388
S. Geisser
again a joint distribution of exchangeable normal random variables, each marginally χ² with one degree of freedom, and though slightly more complex, it is still calculable. Again, this is essentially the frequentist approach for τ² = 0, which in essence is independent of the prior assumptions. Perhaps this goes too far in the other direction, i.e., disregarding the prior information. Some compromise may be needed, and the one that suggests itself is
W_i = (Y_i − Nτ²Ȳ/(Nτ² + σ²))² , (5.8)
where the deviation is from the posterior mean, an appropriate convex combination of the sample mean and the prior mean. Although unconditionally W_1, ..., W_N are still exchangeable, the marginal distribution of W_i is essentially proportional to a non-central χ², thus complicating matters still further for W_0 = max_i W_i. However, deviations such as W_i seem more sensible in that both prior and likelihood play a part, in contrast to only either prior or likelihood. Further distributional complications ensue when the usual conjugate gamma prior is assumed for σ⁻². In addition, the two hyperparameters of the gamma prior also must be known. Extension to multiple linear regression with normally distributed errors, though clear for all 3 approaches, involves further unknown hyperparameters. For Poisson regression we also would require a discordancy ordering, perhaps based on the least probable Y_i as the potential outlier. As this becomes quite complicated, we shall merely illustrate for i.i.d. Poisson variates with a gamma prior for θ,

p(θ|γ, δ) = γ^δ θ^{δ−1} e^{−γθ} / Γ(δ) .
If the maximum Y_i has the smallest probability, we let Z = max_i Y_i, assuming this is the potential outlier. Then

Pr[Z ≥ z] = 1 − ∫ [Σ_{y=0}^{z−1} e^{−θ} θ^y / y!]^N p(θ) dθ
          = 1 − (γ^δ/Γ(δ)) ∫ θ^{δ−1} e^{−(N+γ)θ} (1 + θ + θ²/2! + ⋯ + θ^{z−1}/(z−1)!)^N dθ . (5.9)
Clearly one can write the multinomial expansion for the term raised to the N th power in the integrand and integrate termwise to obtain a complex but finite and explicit solution involving gamma functions. If min Y_i = W has the smallest probability, then
Some uses of order statistics in Bayesian analysis
Pr[W ≤ w|θ] = 1 − [1 − Σ_{y=0}^{w} e^{−θ} θ^y / y!]^N ,

Pr[W ≤ w] = 1 − ∫ p(θ|γ, δ) [1 − Σ_{y=0}^{w} e^{−θ} θ^y / y!]^N dθ . (5.10)
Again this is complex but explicitly computable in terms of a finite series involving gamma functions. Although simple analytic expressions, except when dealing with the exponential distribution, are rare, Monte Carlo methods are generally available to handle such situations. However, the major difficulty is of course the assignment of the proper prior distribution and the ensuing set of hyperparameters. Because of these difficulties we shall present another way of handling these situations which can be used with proper or improper priors.
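The integral in (5.9) has an explicit but cumbersome series form; in practice it is just as easy to evaluate by Monte Carlo over the gamma prior, as in this illustrative sketch (hyperparameter names follow the density above):

```python
import math
import random

def pr_max_at_least(z, n, delta, gamma_, n_sim=20000, seed=2):
    """Monte Carlo evaluation of (5.9): Pr[Z >= z] for Z the maximum of n iid
    Poisson(theta) observables, with theta ~ Gamma(shape=delta, rate=gamma_).
    (Illustrative helper; argument names are assumptions, not from the text.)
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        theta = rng.gammavariate(delta, 1.0 / gamma_)  # scale = 1/rate
        # Pr[Y <= z-1 | theta] = e^{-theta} * sum_{y<z} theta^y / y!
        cdf = math.exp(-theta) * sum(theta ** y / math.factorial(y) for y in range(z))
        total += 1.0 - cdf ** n
    return total / n_sim
```

For n = 1, delta = gamma_ = 1, the marginal of Y is geometric with parameter 1/2, so Pr[Z ≥ 1] = 1/2, which provides a quick sanity check.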
6. Conditional predictive discordancy (CPD) tests

We shall now present a method which (a) turns out to be much easier to calculate, (b) can be used for the usual improper priors, (c) depends on a proper prior and its hyperparameters when a proper prior is used, (d) is seamless in its transition from a proper prior to an improper prior and to censoring, and (e) in certain instances when an improper prior is used will yield a result identical to a frequency significance test. The idea is relatively simple: if D(Y_i) represents the scalar diagnostic which characterizes the discrepancy of the observation from the model and orders the observables from most to least discrepant as D_1, D_2, ..., D_N, then a significance test is

P = Pr[D_1 ≥ d_1 | D_1 ≥ d_2, d_(1)] , (6.1)

where d_(1) refers to d^(N) with d_1 deleted. Here we assume only D_1 is random and conditioning is on D_(1), i.e., all but the largest discrepant value. Alternatively, we could condition on D_(1,2), i.e., all but the largest and second largest discrepant values, which would result in

P = Pr[D_1 ≥ d_1 | D_2 ≥ d_2, d_(1,2)] . (6.2)
As an example we consider the exponential case of Section 4. For testing the largest for discordancy using (6.1) we obtain

P_M = Pr[Z ≥ M | Z ≥ M_2, y_(M)] = [(N(ȳ − m) − (M − M_2)) / (N(ȳ − m))]^c = (1 − t)^c , (6.3)

where
t = (M − M_2) / (N(ȳ − m)) ,

with c = d − 1 if M were censored and c = d − 2 if M were uncensored, when the non-informative prior is used. For the conjugate prior we need only affix stars to ȳ, m, d, and N, using the previous definitions of (4.1). Using (6.2) we obtain
P′_M = Pr[Z_1 ≥ M | Z_2 ≥ M_2, y_(M,M_2)] = Pr[Z_1 ≥ M, Z_2 ≥ M_2 | y_(M,M_2)] / Pr[Z_2 ≥ M_2 | y_(M,M_2)]
     = ((N − 1)/N) [(N(ȳ − m) − (M − m)) / (N(ȳ − m))]^c = ((N − 1)/N)(1 − t)^c , (6.4)

where

t = (M − m) / (N(ȳ − m))

and

c = d − 1 if M and M_2 are censored,
  = d − 2 if one of M or M_2 is censored,
  = d − 3 if M and M_2 are uncensored.
We know that if d = N, the uncensored case, the sampling distribution of the statistic

T = (M − M_2) / (N(Ȳ − m)) , (6.5)

which can be used to test for the largest being an outlier, is such that Pr[T > t] = (1 − t)^{N−2} = P_M, i.e., the same value as (6.3). Hence we have a seamless transition from a proper prior with censoring to the usual non-informative prior without censoring, yielding the sampling-distribution statistic. The second method, illustrated by (6.4), does not provide a frequentist analogue, for the sampling distribution of T = (M − m)/(N(Ȳ − m)) cannot be reconciled with (6.4). For the smallest observation we obtain, basically using (6.1),

P_m = Pr[Z ≤ m | Z ≤ m_2, y_(m)] ,
where

P_m = A(m)/A(m_2),  m_0 ≤ m ,
    = B(m)/A(m_2),  m < m_0 ≤ m_2 ,
    = B(m)/B(m_2),  m ≤ m_2 ≤ m_0 ,

with

A(z) = 1 − ((N* − 1)/N*) [(N* − 1)(ȳ_(m) − m_0) / ((N* − 1)(ȳ_(m) − m_0) + z − m_0)]^{d*−2}

and

B(z) = (1/N*) [(ȳ_(m) − m_0) / (ȳ_(m) − z)]^{d*−2} .

The noninformative prior, however, yields the simple form

P_m = [(ȳ_(m) − m_2) / (ȳ_(m) − m)]^{d−2} = (1 − (N − 1)t)^{d−2} ,

where

t = (m_2 − m) / (N(ȳ − m)) .
For d = N in the uncensored case, the sampling distribution of T yields Pr[T > t] = P_m, so again we have a seamless transition, Geisser (1989). For normal linear regression, a CPD test for the ransacked potential outlier suggests using as a diagnostic

U_i = (y_i − x_i′ β̂_(i))² / [(1 + x_i′ A_(i)^{−1} x_i) s²_(i)] ,

where the notation (i) refers to the entire set y = (y_1, ..., y_N) with y_i and the associated x_i′ = (x_{i1}, ..., x_{ip}) deleted, β̂_(i) and A_(i) = X_(i)′X_(i) are the least-squares quantities from the reduced data, and (N − 1 − p)s²_(i) is the corresponding residual sum of squares. Once the U_i's are ordered, with largest U_c and second largest U_{c−1}, and with values y_c and y_{c−1} corresponding to U_c and U_{c−1}, we can compute as the significance level

P_c = Pr[U_c > u_c | U_c > u_{c−1}, y_(c)] ,

where u_c and u_{c−1} are the realized values of U_c and U_{c−1}. We suggest that the significance computation be made as follows:

P_c = Pr[U > u_c] / Pr[U > u_{c−1}] , (6.6)
where U is distributed as an F-variate with 1 and N − 1 − p degrees of freedom. Similarly, for Poisson regression we can order (3.1) using

p_i = Pr[Y_i = y_i | x_i, y_(i), x_(i)] = (t_(i) + y_i − 1 choose t_(i) − 1) (x_i/(u_(i) + x_i))^{y_i} (u_(i)/(u_(i) + x_i))^{t_(i)} , (6.7)

where t_(i) = Σ_{j≠i} y_j and u_(i) = Σ_{j≠i} x_j, and p_c and p_{c−1} are the smallest and second smallest probabilities, corresponding to, say, y_c and y_{c−1}. At this point one could use as significance level the simple computation

P_c = p_c / (p_{c−1} + p_c)

if x_c ≠ x_{c−1}. However, if x_c = x_{c−1}, alternatively one can use the tail-area CPD approach, i.e.,

Pr[Y_c ≥ y_c | Y_c ≥ y_{c−1}, y_(c)] = Pr[Y_c ≥ y_c | y_(c)] / Pr[Y_c ≥ y_{c−1} | y_(c)] . (6.8)
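The negative-binomial predictive probability in (6.7) can be checked numerically; the helper below is illustrative (t_(i) and u_(i) are computed from the remaining data exactly as defined in the text):

```python
from math import comb

def poisson_pred_prob(y_i, x_i, y_rest, x_rest):
    """Predictive probability (6.7): Pr[Y_i = y_i | x_i, y_(i), x_(i)], which
    takes a negative-binomial form with t = sum of the remaining y's and
    u = sum of the remaining x's. (Illustrative helper, not the authors' code.)
    """
    t = sum(y_rest)
    u = sum(x_rest)
    a = x_i / (u + x_i)
    # comb(t + y - 1, y) equals the binomial coefficient choose(t + y - 1, t - 1)
    return comb(t + y_i - 1, y_i) * a ** y_i * (1.0 - a) ** t
```

Since (6.7) is a proper probability mass function in y_i, the values sum to one over y_i, a useful check on any implementation.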
7. Combinations of largest and smallest

In order to derive CPD tests for two observations at a time, i.e., combinations of the smallest and largest, we need the joint predictive distribution of two future observations given y^(N). We shall derive results for the translated exponential case of the previous section. Here

Pr[Z_1 ≤ z_1, Z_2 ≤ z_2 | y^(N)] = (1/(N+1)) [N(ȳ − m)/(N(ȳ − v))]^{d−1} − (N/((N+1)(N+2))) [N(ȳ − m)/(N(ȳ − v) + z_1 + z_2 − 2v)]^{d−1} (7.1)

for v = min(z_1, z_2) and max(z_1, z_2) ≤ m,

Pr[Z_1 ≤ z_1, Z_2 ≤ z_2 | y^(N)] = (1/(N+1)) [N(ȳ − m)/(N(ȳ − z_1))]^{d−1} − (N/((N+1)(N+2))) [N(ȳ − m)/(N(ȳ − z_1) + z_2 − z_1)]^{d−1}

for z_1 ≤ m < z_2, and

Pr[Z_1 > z_1, Z_2 > z_2 | y^(N)] = (N/(N+2)) [N(ȳ − m)/(N(ȳ − m) + z_1 + z_2 − 2m)]^{d−1}

for min(z_1, z_2) ≥ m. For a joint discordancy test of the smallest and largest (m, M), using the predictive exchangeability of Z_1 and Z_2, we obtain, for c = d − 2 if M is censored and c = d − 3 if M is uncensored, and with ȳ_(M,m) the mean of all the observations excluding m and M, the significance level

P_{m,M} = Pr[Z_1 ≤ m, Z_2 > M | Z_1 ≤ m_2, Z_2 > M_2, y_(m,M)]
        = [((N−2)(ȳ_(M,m) − m_2) + M_2 − m_2) / ((N−2)(ȳ_(M,m) − m_2) + M − m_2 + (N−1)(m_2 − m))]^c .

For (M − m)t_M = M − M_2 and (M − m)t_m = m_2 − m,

P_{m,M} = {1 − u(t_M + (N − 1)t_m)}^c . (7.2)
For d = N, it can easily be shown that the unconditional-frequency calculation gives

Pr[(T_M + (N − 1)T_m)U ≥ (t_M + (N − 1)t_m)u] = P_{m,M} , (7.3)

where u = (M − m)/(N(ȳ − m)), and u, t_M and t_m are the realized values of the random variables U, T_M and T_m respectively, Geisser (1989). For the two smallest (m, m_2), where m_3 is the third smallest, and assuming m_3 ≤ min(y_{d+1}, ..., y_N), it seems plausible to calculate P_{m,m_2}
= Pr[Z_1 ≤ m, m < Z_2 ≤ m_2 | Z_1 ≤ Z_2 ≤ m_3, y_(m,m_2)] (7.4)

for N > d ≥ 3. For the two largest (M, M_2), similarly we may calculate

P_{M,M_2} = Pr[Z_1 > M, M_2 < Z_2 ≤ M | Z_1 > Z_2 ≥ M_3, y_(M,M_2)]
          = 2{Pr[Z_1 > M, Z_2 > M_2 | y_(M,M_2)] − Pr[Z_1 > M, Z_2 > M | y_(M,M_2)]} / Pr[Z_1 > M_3, Z_2 > M_3 | y_(M,M_2)] ,

where M_3 is the third largest observation. Then for

c = d − 1 if M and M_2 are censored,
  = d − 2 if one of M and M_2 is censored,
  = d − 3 if M and M_2 are uncensored,

we have
P_{M,M_2} = 2[((N−2)(ȳ_(M,M_2) − m) + 2(M_3 − m)) / ((N−2)(ȳ_(M,M_2) − m) + M + M_2 − 2m)]^c
          − 2[((N−2)(ȳ_(M,M_2) − m) + 2(M_3 − m)) / ((N−2)(ȳ_(M,M_2) − m) + 2(M − m))]^c . (7.5)

It is of interest to point out that plausible alternative regions can be used for testing the two largest or the two smallest observations, which have frequentist analogues when d = N. It is not difficult to show that defining

P′_{M,M_2} = Pr[Z_1 > M, Z_2 > M_2 | y_(M,M_2)] / Pr[Z_1 > M_3, Z_2 > M_3 | y_(M,M_2)] (7.6)

will result in P′_{M,M_2} = (1 − ur)^c, where c is defined as before in the censored case and

r = ((M − M_2) + 2(M_2 − M_3)) / (M − m) .

Further, for d = N, the unconditional-frequency calculation for the random variable UR observed as ur is

Pr[UR > ur] = P′_{M,M_2} . (7.7)

A similar calculation for the two smallest gives

P′_{m,m_2} = Pr[Z_1 < m, Z_2 < m_2 | y_(m,m_2)] / Pr[Z_1 < m_3, Z_2 < m_3 | y_(m,m_2)] = (1 − us)^{d−3} , (7.8)

where

s = ((N − 1)(m_2 − m) + (N − 2)(m_3 − m_2)) / (M − m) .

Again, for d = N, the frequency calculation for the random variable US observed as us yields

Pr[US > us] = P′_{m,m_2} . (7.9)
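For the uncensored, noninformative-prior case, the CPD test (6.3) for the largest observation reduces to a one-line computation; the sketch below is an illustrative helper (name and censoring flag are assumptions), following the definitions of t and c in the text:

```python
def cpd_largest(y, d=None, censored=False):
    """CPD significance level (6.3) for the largest of translated-exponential
    data y under the non-informative prior:

        P_M = (1 - t)^c,  t = (M - M2)/(N(ybar - m)),

    with c = d - 1 if M was censored and c = d - 2 if M was uncensored,
    d being the number of uncensored observations (d = N if none censored).
    (Illustrative helper, not the author's code.)
    """
    n = len(y)
    d = n if d is None else d
    ys = sorted(y)
    m, M2, M = ys[0], ys[-2], ys[-1]
    ybar = sum(y) / n
    t = (M - M2) / (n * (ybar - m))
    c = d - 1 if censored else d - 2
    return (1.0 - t) ** c
```

For y = (1, 2, 3, 7): t = 4/9 and P_M = (5/9)², a small value signalling that 7 is discordant with the rest.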
All of the CPD tests can be given in terms of the original proper prior distribution by substituting m*, ȳ*, d*, N* for m, ȳ, d, N, respectively. Advantages of the CPD tests given here over the usual frequency-discordancy tests are that they
can include prior information and censoring. The comparative advantage of the CPD tests over the UPD tests is that the former can be used with certain useful improper priors and are always much easier to calculate. All of these tests are basically subjective assessments and, in general, are not frequency-based, even though under very particular circumstances some of them can be shown to have a frequency analogue. Of course, when alternatives can be specified, a full Bayesian treatment will supersede the methods suggested here for unknown alternatives.
8. Ordering future values

In certain problems where there are data on an observable, such as the yearly high water mark on the banks of a river or a dam, there is an interest in calculating the probability that no flood will occur in the next M years, i.e., that the maximum high water mark does not exceed a given value. Conversely, another situation is the use of possibly harmful drugs, serving a limited number of patients, given to alleviate very serious but rather rare diseases. Here the drug may be lethal, or may severely damage some important bodily function, if some measured physiological variable falls below (or above) some defined value. A more mundane situation is where a buyer of, say, M light bulbs, whether connected in series or not, wants to calculate the probability of no failures in a given time based on previous information on bulb lifetimes. In the last two situations we require the chance that the minimum physiological value (or failure time) exceeds a certain threshold. In the first case we are interested in calculating the probability that the maximum Z of future values, say Y_{N+1}, ..., Y_{N+M}, does not exceed a given value z, i.e., the distribution function of the maximum Z,

Pr[Z ≤ z | y^(N)] = ∫ Pr[Z ≤ z | y^(N), θ] p(θ | y^(N)) dθ . (8.1)

In the second case we are interested in calculating 1 − F_W(w), where W is the minimum of the future set Y_{N+1}, ..., Y_{N+M}, or

Pr[W > w | y^(N)] = ∫ Pr[W > w | y^(N), θ] p(θ | y^(N)) dθ . (8.2)
For the exponential case we can obtain explicit results for the previously discussed exponential sampling with a gamma prior. Here we obtain the predictive probability that the maximum Z will not exceed z to be
Pr[Z ≤ z | y^(N)] = Σ_{j=0}^{M} (M choose j)(−1)^j (N*/(N* + j)) [N*(ȳ* − m*)/(N*(ȳ* − m*) + j(z − m*))]^{d*−1}  for z > m* ,

Pr[Z ≤ z | y^(N)] = [N*(ȳ* − m*)/(N*(ȳ* − m*) + N*(m* − z))]^{d*−1} Σ_{j=0}^{M} (M choose j)(−1)^j N*/(N* + j)  for z ≤ m* .

For the problem where the minimum W should exceed a value w, we obtain

Pr[W > w | y^(N)] = (N*/(N* + M)) [N*(ȳ* − m*)/(N*(ȳ* − m*) + M(w − m*))]^{d*−1}  for w > m* ,

Pr[W > w | y^(N)] = 1 − (M/(N* + M)) [(ȳ* − m*)/(ȳ* − w)]^{d*−1}  for w ≤ m* ,
c.f. Dunsmore (1974). Sometimes the situation is such that we are interested more generally in the chance that at least the r th largest will not exceed a given value. We first obtain the probability that exactly r out of M will not exceed the threshold w, Geisser (1984). Let

V_i = 1  if Y_{N+i} < w,  i = 1, ..., M ,
    = 0  otherwise ,

and set R = Σ_{i=1}^{M} V_i. Then after some algebra we obtain

Pr[R = r | y^(N)] = (M choose r) [(ȳ* − m*)/(ȳ* − w)]^{d*−1} Σ_{j=0}^{r} (r choose j)(−1)^j N*/(N* + M − r + j)  if r > 0, w ≤ m* ,

Pr[R = 0 | y^(N)] = 1 − (M/(N* + M)) [(ȳ* − m*)/(ȳ* − w)]^{d*−1}  if r = 0, w ≤ m* ,

Pr[R = r | y^(N)] = (M choose r) Σ_{j=0}^{r} (r choose j)(−1)^j (N*/(N* + M − r + j)) (1 + (M − r + j)(w − m*)/(N*(ȳ* − m*)))^{1−d*}  if w > m* .
Thus

Pr[R ≤ r_0 | y^(N)] = 1 − (M/(N* + M)) [(ȳ* − m*)/(ȳ* − w)]^{d*−1} + [(ȳ* − m*)/(ȳ* − w)]^{d*−1} Σ_{r=1}^{r_0} (M choose r) Σ_{j=0}^{r} (r choose j)(−1)^j N*/(N* + M − r + j)  if w ≤ m* ,

Pr[R ≤ r_0 | y^(N)] = Σ_{r=0}^{r_0} (M choose r) Σ_{j=0}^{r} (r choose j)(−1)^j (N*/(N* + M − r + j)) (1 + (M − r + j)(w − m*)/(N*(ȳ* − m*)))^{1−d*}  if w > m* , (8.3)

and 1 − Pr[R ≤ r_0 | y^(N)] is also the distribution function of the r th order statistic of the future random variables Y_{N+i}, i = 1, ..., M. For further ramifications on interval estimation of the r th order statistic, see Geisser (1985). Other sampling distributions cum conjugate priors are generally not amenable to explicit results, but numerical approximations or Monte Carlo simulations are often capable of yielding appropriate numerical answers for the most complicated situations.
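The closed form for Pr[W > w | y^(N)] above is easy to evaluate; the sketch below codes the noninformative-prior version (for the conjugate prior, the starred quantities would simply replace the unstarred arguments; helper name is illustrative). Note that the two branches agree at w = m, a continuity check worth verifying:

```python
def pr_min_exceeds(w, ybar, m, n, d, M):
    """Predictive probability that the minimum W of M future translated-
    exponential observables exceeds w, given data with mean ybar, minimum m,
    sample size n and d uncensored observations (illustrative sketch).
    """
    S = n * (ybar - m)
    if w > m:
        return (n / (n + M)) * (S / (S + M * (w - m))) ** (d - 1)
    # for w <= m the expression simplifies via n*(ybar - w) = S + n*(m - w)
    return 1.0 - (M / (n + M)) * ((ybar - m) / (ybar - w)) ** (d - 1)
```

At w = m both branches give n/(n + M), which is exactly the exchangeability probability that all M future observations exceed the current minimum of n.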
Now for Y_1, ..., Y_N, Y_{N+1}, ..., Y_{N+M} i.i.d. N(μ, σ²),

Pr[Y_{N+i} ≤ w | μ, σ²] = Φ((w − μ)/σ) = p , (8.4)

and assuming that p(μ, σ²) ∝ 1/σ²,

Pr[p ≤ p_0] = Pr[Φ((w − μ)/σ) ≤ p_0] = Pr[(w − μ)/σ ≤ Φ^{−1}(p_0)] = Pr[β ≤ Φ^{−1}(p_0)] ,

where (w − μ)/σ = β. For fixed w, using the posterior density of μ and σ² given y^(N), we obtain the density of β to be

f(β | y^(N)) = (√N / (√(2π) Γ((N−1)/2))) e^{−Nβ²/2} Σ_{j=0}^{∞} (√2 N d β)^j Γ((N−1+j)/2) / (j! (1 + Nd²)^{(N−1+j)/2}) , (8.5)

where d = (w − ȳ)/[(N − 1)s²]^{1/2} and Σ_{i=1}^{N} (y_i − ȳ)² = (N − 1)s². Thence Pr[R = r | y^(N)] may be evaluated,
which can be approximated numerically or by simulation techniques, Geisser (1987). From (7.9) one can obtain the distribution function of the r th order statistic among the future set Y_{N+1}, ..., Y_{N+M}. Although the presentation here is for the number in a semi-infinite interval, it is easily extended to finite intervals and to a set of exchangeable normal variates, Geisser (1987).
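In the normal case, rather than working with the series density (8.5), Pr[R ≤ r_0 | y^(N)] can be simulated directly from the standard posterior under p(μ, σ²) ∝ 1/σ². The helper below is an illustrative sketch (name and arguments are assumptions):

```python
import math
import random

def pr_R_le(r0, w, y, M, n_sim=20000, seed=3):
    """Monte Carlo estimate of Pr[R <= r0 | y] for R = number of M future
    N(mu, sigma^2) observables falling below w, under the reference prior
    p(mu, sigma^2) proportional to 1/sigma^2. Draws (sigma, mu) from the
    posterior, then R | mu, sigma ~ Binomial(M, Phi((w - mu)/sigma)).
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)          # (N - 1) s^2
    hits = 0
    for _ in range(n_sim):
        # sigma^2 | y ~ ss / chi^2_{n-1};  mu | sigma, y ~ N(ybar, sigma^2/n)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n - 1))
        sigma = math.sqrt(ss / chi2)
        mu = rng.gauss(ybar, sigma / math.sqrt(n))
        p = 0.5 * (1.0 + math.erf((w - mu) / (sigma * math.sqrt(2.0))))
        r = sum(rng.random() < p for _ in range(M))
        if r <= r0:
            hits += 1
    return hits / n_sim
```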
9. Multivariate problems

In situations where Y_1, ..., Y_N, Y_{N+1}, ..., Y_{N+M} are q-dimensional vector variables, let

V_i = 1  if Y_{N+i} ∈ G ,
    = 0  otherwise ,

where G is some specified region; again interest is in R = Σ_{i=1}^{M} V_i, the number of future variables in G. In the medical arena, interest would be in the future number of patients who would be administered a therapeutic agent, exchangeable with past patients who had received the agent. Hence if

P[Y ∈ G | θ] = β , (9.1)

then symbolically

P(R = r | y^(N)) = (M choose r) ∫ β^r (1 − β)^{M−r} f(β | y^(N)) dβ . (9.2)
In essence, this generalizes the problem of order statistics. For Y_i ~ N(μ, Σ), we use

p(μ, Σ^{−1}) ∝ |Σ|^{(q+1)/2} , (9.3)

suggested by Geisser and Cornfield (1963), where

P(Y ∈ G | μ, Σ^{−1}) = ∫_G f(y | μ, Σ^{−1}) dy = β(μ, Σ^{−1}) . (9.4)

In any practical application G will not be an arbitrary region but one in which each component of the vector lies in an interval (finite or semi-infinite), so that G = I_1 × I_2 × ⋯ × I_q, a hyperrectangle. This obviously precludes any explicit analytical solution for the problem. Although an alternate form for P(R = r | y^(N)) is available, it is not clear which would be more susceptible
to numerical approximation and/or simulation, if either. Here one finds the joint predictive distribution of Y_{N+1}, ..., Y_{N+M} directly,

Pr[R = r | y^(N)] = ∫_{I_1} ⋯ ∫_{I_q} f(y_{N+1}, ..., y_{N+M} | y^(N)) dy_{N+1} ⋯ dy_{N+M} , (9.5)

where f(y_{N+1}, ..., y_{N+M} | y^(N)) is the matrix Student joint predictive density, whose normalizing constant involves (2π)^{−qM/2}, K(q, N − 1), |(N − 1)S| and |Ω|^{q/2}, with

K^{−1}(q, ν) = 2^{νq/2} π^{q(q−1)/4} Π_{j=1}^{q} Γ((ν + 1 − j)/2) ,

ỹ = M^{−1} Σ_{i=1}^{M} y_{N+i} ,  (N − 1)S = Σ_{j=1}^{N} (y_j − ȳ)(y_j − ȳ)′ ,

Ω = I + ee′, e′ = (1, ..., 1) an M-dimensional vector, and y = (y_{N+1}, ..., y_{N+M})′, c.f. Geisser (1993, p. 207).
Acknowledgement
This work was supported in part by NIGMS 25271.
References
Box, G. E. P. (1980). Sampling and Bayes' inference in scientific modelling and robustness. J. Roy. Statist. Soc. A 143 383–430.
Dixit, U. J. (1994). Bayesian approach to prediction in the presence of outliers for Weibull distribution. Metrika 41 127–136.
Dixon, W. J. (1951). Ratios involving extreme values. Ann. Math. Statist. 22 68–78.
Dunsmore, I. R. (1974). The Bayesian predictive distribution in life testing models. Technometrics 16 455–460.
Geisser, S. (1980). Contribution to Discussion. J. Roy. Statist. Soc. A 143 416–417.
Geisser, S. (1984). Predicting Pareto and exponential observables. Canad. J. Statist. 12 143–152.
Geisser, S. (1985). Interval prediction for Pareto and exponential observables. J. Econometrics 29 173–185.
Geisser, S. (1987). Some remarks on exchangeable normal variables with applications. In Contributions to the Theory and Applications of Statistics. Academic Press, 127–153.
Geisser, S. (1989). Predictive discordancy testing for exponential observations. Canad. J. Statist. 17 (2) 19–26. Correction (1991) 19 (4) 453.
Geisser, S. (1990). Predictive approaches to discordancy testing. In Bayesian and Likelihood Methods in Statistics and Econometrics, S. Geisser et al., eds. North-Holland, Amsterdam, 321–335.
Geisser, S. (1993). Predictive Inference. Chapman and Hall, New York.
Geisser, S. and J. Cornfield (1963). Posterior distributions for multivariate normal parameters. J. Roy. Statist. Soc. B 25 368–376.
Kabe, D. G. (1970). Testing outliers from an exponential population. Metrika 15 15–18.
Likes, J. (1966). Distribution of Dixon's statistics in the case of an exponential population. Metrika 11 46–54.
Pearson, E. S. and H. O. Hartley (1966). Biometrika Tables for Statisticians, Volume I. Cambridge University Press, Cambridge.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Inverse Sampling Procedures to Test for Homogeneity in a Multinomial Distribution
S. Panchapakesan, Aaron Childs, B. H. Humphrey and N. Balakrishnan
1. Introduction
Let X_1, X_2, ..., X_k be a random sample from a multinomial distribution with parameters p_1, p_2, ..., p_k, with joint probability function

f(x_1, x_2, ..., x_k) = (N! / (x_1! x_2! ⋯ x_k!)) p_1^{x_1} p_2^{x_2} ⋯ p_k^{x_k} ,  Σ_{i=1}^{k} x_i = N ,  Σ_{i=1}^{k} p_i = 1 ,
and let X_(1) ≤ X_(2) ≤ ⋯ ≤ X_(k) be the order statistics obtained by arranging the k X_i's in increasing order of magnitude. The multinomial distribution provides a model for studying the diversity within a population which is categorized into several classes according to qualitative characteristics. Such studies arise in ecology, sociology, genetics, economics and other disciplines. In engineering, the multinomial distribution arises in a model for a system which can be in any one of a finite number of states. In this paper, we are interested in testing the homogeneity hypothesis

H_0: p_1 = p_2 = ⋯ = p_k = 1/k .

The standard χ²-test for H_0 is based on a fixed sample size N, with test statistic

T = Σ_{i=1}^{k} (X_i − N/k)² / (N/k) , (1.1)
whose distribution is approximately χ² with k − 1 degrees of freedom under H_0 for large N; see, for example, Rao (1973, p. 391) or Hogg and Craig (1995, p. 296). Johnson and Young (1960) studied fixed sample size procedures based on the following statistics:

R_k = (X_(k) − N/k) / √(N/k)  and  W_k = (X_(k) − X_(1)) / √(N/k) .

Their results are based on large-sample theory. Young (1962) considered tests based on

X_(k) − X_(1)  and  M_k = Σ_{i=1}^{k} |X_i − N/k| / N .
He compared these tests and the standard χ²-test using large-sample approximations. He also made some comparisons of exact and approximate significance levels. Gupta and Nagel (1967) examined the mean and variance of the two extreme order statistics from a multinomial distribution. In this paper, we propose in Section 2 two new test procedures, both using an inverse sampling procedure, but with different stopping rules. The critical values, power and expected sample size of these tests are discussed in Section 3. We then compare these procedures with the standard χ²-test in Section 4. A third procedure, which combines the stopping rules of the first two, will then be studied in Section 5. Inverse sampling rules with the stopping rule of Procedure 1 and with an unbounded version of the stopping rule of Procedure 2 have been used by Cacoullos and Sobel (1966) and Alam (1971), respectively, in the context of selecting the most probable multinomial cell. A combination of their stopping rules was used by Ramey and Alam (1979) in their inverse sampling procedure for the selection problem.
2. The proposed inverse sampling procedures

Procedure 1

In this inverse sampling procedure, observations are taken one at a time and the sampling is terminated when the count in any one of the cells reaches a specified number M. Let X_1, X_2, ..., X_k denote the cell counts at termination. Obviously, the largest X_j, X_(k), is equal to M and X_(j) ≤ M − 1 for j ≠ k. Let N = Σ_{i=1}^{k} X_i. Then

M ≤ N ≤ k(M − 1) + 1 ,

with the minimum occurring when one cell count is M and all remaining k − 1 cell counts equal zero, and the maximum occurring when one cell count is M and all remaining k − 1 cell counts are equal to M − 1. We now propose a test based on the range,

W_k = X_(k) − X_(1) = M − X_(1) .

We reject H_0 when W_k ≥ D_1, where D_1, a nonnegative integer, is the critical value to be determined so that the test has significance level α, i.e.,

P_{H_0}(W_k ≥ D_1) ≤ α .
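The stopping rule just described is easy to simulate directly. The sketch below is illustrative (the function name and return convention are not from the paper): it draws cells one at a time until some count reaches M and then applies the range test:

```python
import random

def procedure1(k, M, D1, probs=None, rng=None):
    """Simulate one run of Procedure 1: sample cells one at a time until some
    cell count reaches M; reject H0 when W_k = M - X_(1) >= D1.
    Returns (reject, total sample size N). (Illustrative sketch.)
    """
    rng = rng or random.Random()
    probs = probs or [1.0 / k] * k          # H0 by default
    counts = [0] * k
    while max(counts) < M:
        u, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[i] += 1
                break
        else:
            counts[-1] += 1                 # guard against round-off in acc
    return (M - min(counts) >= D1), sum(counts)
```

Every simulated N necessarily satisfies the bounds M ≤ N ≤ k(M − 1) + 1 derived above.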
Procedure 2

We again take one observation at a time, and let X_{1t}, X_{2t}, ..., X_{kt} be the cell counts at time t. We continue sampling, looking for the first time t when

(∗)  X_(k)t − X_(k−1)t ≥ D_2 (to be specified) .

If (∗) happens for some t ≤ N_0, where N_0 is a positive integer to be specified, stop sampling and reject H_0. If (∗) does not happen for t = 1, 2, ..., N_0, terminate sampling and accept H_0.
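Procedure 2's stopping rule can be sketched the same way (again an illustrative helper, not the authors' code):

```python
import random

def procedure2(k, D2, N0, probs=None, rng=None):
    """Simulate one run of Procedure 2: after each draw, stop and reject H0 the
    first time X_(k)t - X_(k-1)t >= D2; accept H0 if this never happens by
    t = N0. Returns (reject, stopping time). (Illustrative sketch.)
    """
    rng = rng or random.Random()
    probs = probs or [1.0 / k] * k          # H0 by default
    counts = [0] * k
    for t in range(1, N0 + 1):
        u, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[i] += 1
                break
        else:
            counts[-1] += 1                 # guard against round-off in acc
        top = sorted(counts)
        if top[-1] - top[-2] >= D2:
            return True, t                  # reject H0
    return False, N0                        # accept H0
```

With D_2 = 1 the rule rejects after the very first observation (the leading cell is then exactly one ahead), while a D_2 larger than N_0 can never trigger, so the run always ends in acceptance at t = N_0.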
3. Critical values, power and expected sample size

The critical values D_1 for Procedure 1 are obtained by first noting that

P_{H_0}(W_k ≥ D_1) = k P_{H_0}(X_k = M, X_i ≤ M − 1, i = 1, 2, ..., k − 1, X_(1) ≤ M − D_1)
= k P_{H_0}(X_k = M, X_i ≤ M − 1, i = 1, 2, ..., k − 1) − k P_{H_0}(X_k = M, M − D_1 + 1 ≤ X_i ≤ M − 1, i = 1, 2, ..., k − 1)
= 1 − k Σ_{x_1 = M−D_1+1}^{M−1} ⋯ Σ_{x_{k−1} = M−D_1+1}^{M−1} ((M − 1 + x_0)! / ((M − 1)! x_1! ⋯ x_{k−1}!)) (1/k)^{M + x_0}
= 1 − kB (say) ,
where x_0 = x_1 + x_2 + ⋯ + x_{k−1}. Then D_1 is the smallest integer among {1, ..., M} for which 1 − kB ≤ α. Note that for D_1 = 1, P_{H_0}(W_k ≥ D_1) = 1. Therefore, D_1 is obtained by computing 1 − kB for all possible values of D_1, starting with D_1 = 2, until 1 − kB ≤ α for the first time. In Table 1, we present the critical values D_1 for k = 3(1)8, M = 8(1)15, and α = 0.01, 0.05, 0.10 and 0.15. A missing entry in Table 1 indicates that the given significance level was not attainable. To compute the power of Procedure 1, we consider the following slippage alternative,

H_a: p_1 = ⋯ = p_{k−1} = p  and  p_k = λp,  λ > 0 . (3.1)

In this case, p = 1/(k − 1 + λ), and the power is obtained as follows:

P_{H_a}(M − X_(1) ≥ D_1) = Σ_{i=1}^{k} P_{H_a}(i th cell is the terminating cell, M − X_(1) ≥ D_1)
= (k − 1) P_{H_a}(1st cell is the terminating cell, M − X_(1) ≥ D_1) + P_{H_a}(k th cell is the terminating cell, M − X_(1) ≥ D_1)
= (k − 1)A + C (say) .
Then,

A = P_{H_a}(1st cell is the terminating cell) − P_{H_a}(1st cell is the terminating cell, M − X_(1) < D_1)
= Σ_{x_2=0}^{M−1} ⋯ Σ_{x_k=0}^{M−1} ((M − 1 + x_2 + ⋯ + x_k)! / ((M − 1)! x_2! ⋯ x_k!)) p^{M + x_2 + ⋯ + x_{k−1}} (λp)^{x_k}
− Σ_{x_2=M−D_1+1}^{M−1} ⋯ Σ_{x_k=M−D_1+1}^{M−1} ((M − 1 + x_2 + ⋯ + x_k)! / ((M − 1)! x_2! ⋯ x_k!)) p^{M + x_2 + ⋯ + x_{k−1}} (λp)^{x_k} .

Also,

C = P_{H_a}(k th cell is the terminating cell) − P_{H_a}(k th cell is the terminating cell, M − X_(1) < D_1)
= Σ_{x_1=0}^{M−1} ⋯ Σ_{x_{k−1}=0}^{M−1} ((M − 1 + x_1 + ⋯ + x_{k−1})! / ((M − 1)! x_1! ⋯ x_{k−1}!)) p^{x_1 + ⋯ + x_{k−1}} (λp)^{M}
− Σ_{x_1=M−D_1+1}^{M−1} ⋯ Σ_{x_{k−1}=M−D_1+1}^{M−1} ((M − 1 + x_1 + ⋯ + x_{k−1})! / ((M − 1)! x_1! ⋯ x_{k−1}!)) p^{x_1 + ⋯ + x_{k−1}} (λp)^{M} .
In Tables 2 and 3, we present the power for λ = 1/5, 1/4, 1/3, 1/2, 1(1)5, k = 3(1)7, M = 8(1)15, and α = 0.05 and 0.10, respectively. Of course, the value for λ = 1 is the attained significance level. Since the sample size N is not fixed for either procedure, we also computed the expected sample sizes for the two procedures. For Procedure 1, the expected sample size E(N) may be obtained using the following formula:

Pr(N = M + j) = Σ_{x_2+⋯+x_k=j, x_i≤M−1} ((M − 1 + j)! / ((M − 1)! x_2! ⋯ x_k!)) p_1^{M} p_2^{x_2} ⋯ p_k^{x_k}
+ ⋯ + Σ_{x_1+⋯+x_{k−1}=j, x_i≤M−1} ((M − 1 + j)! / ((M − 1)! x_1! ⋯ x_{k−1}!)) p_1^{x_1} ⋯ p_{k−1}^{x_{k−1}} p_k^{M} ,
0 ≤ j ≤ (k − 1)M − k + 1 .

In Table 5, we present the expected sample size for Procedure 1 in the case of the slippage configuration in (3.1) for λ = 1/5, 1/4, 1/3, 1/2, 1(1)5, k = 3(1)7 and M = 8(1)15. Naturally, λ = 1 gives the expected sample size under the null hypothesis. The critical values, powers and expected sample sizes for Procedure 2 are considerably more difficult to derive than for Procedure 1. As a result, these quantities were all obtained by simulation, and are presented in Tables 7–11. Work on the exact derivation of these quantities is currently in progress. The values of N_0 used for Procedure 2 were chosen so as to calibrate the expected sample size under the null hypothesis with that of Procedure 1.
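As a rough cross-check of the tabulated values, the power and E(N) of Procedure 1 under the slippage configuration in (3.1) can also be estimated by direct simulation; the helper below is illustrative (name and defaults are assumptions):

```python
import random

def simulate_procedure1(k, M, D1, lam, n_rep=2000, seed=7):
    """Monte Carlo estimate of the power and expected sample size E(N) of
    Procedure 1 under the slippage alternative (3.1):
    p_1 = ... = p_{k-1} = p, p_k = lam * p, p = 1/(k - 1 + lam).
    lam = 1 recovers the null configuration. (Illustrative sketch.)
    """
    rng = random.Random(seed)
    p = 1.0 / (k - 1 + lam)
    probs = [p] * (k - 1) + [lam * p]
    rejections, total_n = 0, 0
    for _ in range(n_rep):
        counts = [0] * k
        while max(counts) < M:
            u, acc = rng.random(), 0.0
            for i, q in enumerate(probs):
                acc += q
                if u < acc:
                    counts[i] += 1
                    break
            else:
                counts[-1] += 1             # floating-point guard
        rejections += (M - min(counts) >= D1)
        total_n += sum(counts)
    return rejections / n_rep, total_n / n_rep
```

For k = 3, M = 8, D_1 = 8, the simulated level and the simulated power at λ = 5 should land near the exact Table 2 entries (0.0209 and 0.3975) up to Monte Carlo error.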
Table 1
Critical values D_1 for Procedure 1
k
3
4
5
6
7
8
e
M 8
9
10
11
12
13
14
15
0.01 0.05 0.10 0.15
8 7 7
9 8 8
10 9 8 8
11 10 9 8
11 10 9 9
12 11 10 9
13 11 10 10
13 12 11 10
0.01 0.05 0.10 0.15
8 8 8
9 8 8
10 9 9
11 10 10 9
12 11 10 9
12 1l 10 10
13 12 ll 10
14 12 1l 11
0.01 0.05 0.10 0.15
8 8
9 9 8
10 9 9
II 10 10 9
12 11 10 10
13 12 11 10
13 12 11 11
14 13 12 11
0.01 0.05 0.10 0.15
8 8
9 9 9
10 10 9
11 10 10
12 11 11 10
13 12 11 11
14 12 12 11
14 13 12 12
0.01 0.05
-
-
10
11
0.10
-
9
10
10
0.15
8
9
9
10
12 12 11 11
13 12 12 11
14 13 12 12
14 13 13 12
0+01 0.05
. -
0.10
0.15
11 11 10
12 11 11
13 12 12 11
14 13 12 12
15 13 13 12
.
.
.
-
10
-
9
10
8
9
10
In order to get an idea of the accuracy of the simulated values for Procedure 2, we simulated for Procedure 1 the powers at the 5% significance level and the expected sample sizes. These results are presented in Tables 4 and 6. Comparing the simulated powers in Table 4 with the exact values given in Table 2, we see that there is close agreement (often to two decimal places) for all values of k, M, and λ. The agreement between the simulations and exact values for the expected sample size is even closer, as can be seen by comparing the simulated values in Table 6 with the exact values in Table 5. Many times, the simulated values are within .01 of the real values! As a result, we can be fairly confident that the simulations for Procedure 2 are also quite close to the exact values. Comparing the two procedures, we observe that when there is slippage to the right, i.e., λ > 1, Procedure 2 becomes clearly superior to Procedure 1 in power. When the slippage is large, it is even superior in expected sample size. For example, looking at the fourth row of k = 5 when λ = 5 in Tables 2 and 8, we see that the procedures have essentially the same attained significance level (.0484 and .0486, respectively), but Procedure 2 has a power of 98.3% while the power of
Table 2
Exact powers of Procedure 1 at 5% significance level
k
3
M
2
DI
1/5
1/4
1/3
1/2
2
3
4
5
1
8 9 10 11 12 13 14 15
0.3019 0.2536 0.5251 0.4704 0.6755 0.6280 0.7764 0.7387
0.2295 0.1850 0.4148 0.3584 0.5523 0.4974 0.6542 0.6054
0.1485 0.1120 0.2765 0.2247 0.3787 0.3227 0.4611 0.4055
0.0681 0.0451 0.1241 0.0894 0.1676 0.1273 0.2019 0.1592
0.0770 0.0515 0.1410 0.1025 0.1910 0.1463 0.2306 0.1832
0.1843 0.1405 0.3381 0.2784 0.4564 0.3940 0.5479 0.4882
0.2968 0.2425 0.5163 0.4535 0.6642 0.6085 0.7640 0.7187
0.3975 0.3393 0.6504 0.5939 0.7961 0.7539 0.8796 0.8508
0.0209 0.0107 0.0311 0.0172 0.0343 0.0200 0.0338 0.0204
8 9 9 10 10 11 11 12
8 9 10 1l 12 13 14 15
0.3457 0.2907 0.2450 0.5199 0.4664 0.6766 0.6299 0.7813
0.2712 0.2186 0.1768 0.4085 0,3534 0.5523 0.4981 0.6591
0.1851 0.1392 0.1051 0.2695 0.2191 0.3770 0.3213 0.4646
0.0949 0.0624 0.0412 0.1193 0.0855 0.1654 0.1252 0.2028
0.1135 0.0763 0.0512 0.1467 0.1069 0.2043 0.1569 0.2509
0.2561 0.1981 0.1523 0.3723 0.3094 0.5058 0.4410 0.6066
0.3984 0.3314 0.2740 0.5757 0.5120 0.7307 0.6778 0.8273
0.5180 0.4509 0.3899 0.7209 0.6673 0.8590 0.8237 0.9281
0.0379 0.0197 0.0102 0.0315 0.0176 0.0365 0.0214 0.0373
8 9 10 10 11 11 12 12
0.0306 0.0160 0.0484 0.0274 0.0152 0.0332 0.0194
9 10 10 II 12 12 13
0.0429 0.0227 0.0118 0.0386 0.0217 0.0467 0.0276
9 10 11 11 12 12 13
0.0302 0.0159
10 ll
8
9 10 11 12 13 14 15 8
9 10 11 12 13 14 15
.
.
.
.
9
.
0.3049 0.2572
.
0.2453 0.1981 0.4452 0.3861 0.3339 0.5343 0.4814 .
. .
0.1515 0.1138
.
0.1006 0.0677 0.1875 0.1376 0.1001 0.1981 0.1519 .
0.0961 0.0627 0.0411 0.1235 0.0879 0.1741 0.1310
. .
0.2321 0.1875
.
0.1833 0.1374 0.1036 0.2731 0.2218 0.3867 0.3297
. .
.
0.0793 0.0520 0.1468 0.1050 0.0749 0.1507 0.1134
.
0.2682 0.2161 0.1749 0.4124 0.3571 0.5622 0.5078 .
.
0.1623 0.1222 0.3050 0.2483 0.2016 0.3573 0.3040
.
0.3424 0.2881 0.2433 0.5241 0.4709 0.6856 0.6393
8
10 11
.
0.3189 0.2687 0.5546 0.4989 0.4475 0.6619 0.6158
.
0.1243 0.0840 0.0565 0.1666 0.1219 0.2356 0.1820 .
.
0.0734 0.0478
.
0.2493 0.1937 0.4478 0.3774 0.3147 0.5208 0.4559 .
0.2952 0.2315 0.1802 0.4348 0.3666 0.5838 0.5170 .
.
0.1000 0.0674
.
0.4054 0.3396 0.6627 0.5987 0.5357 0.7594 0.7088 .
0.4678 0.3964 0.3330 0.6649 0.6022 0.8152 0.7694 .
.
0.2663 0.2087
.
0.5382 0.4722 0.8011 0.7529 0.7019 0.8868 0.8558 .
0.6076 0.5399 0.4756 0.8119 0.7657 0.9240 0.8995
. .
0.4460 0.3781
. .
0.5960 0.5303
Procedure 1 is 80.1%. Furthermore, from Tables 5 and 10 we see that Procedure 2 has a smaller expected sample size (18.95 compared to 19.80). For smaller values of λ > 1, Procedure 2 continues to have significantly higher power, but only a slightly larger expected sample size. On the other hand, when there is slippage to the left (λ < 1), Procedure 1 is uniformly better in terms of both power and expected sample size.
Table 3
Exact powers of Procedure 1 at 10% significance level
k
3
M
2
D~
1/5
1/4
1/3
1/2
2
3
4
5
1
8 9 10 11 12 13 14 15
0.6464 0,5839 0.7705 0.7232 0.8469 0.8126 0.8959 0.8714
0.5482 0.4780 0.6690 0.6097 0.7511 0.7030 0.8096 0.7710
0.4121 0.3385 0.5110 0.4416 0.5836 0.5207 0.6399 0.5837
0.2354 0.1715 0.2833 0.2188 0.3163 0.2539 0.3410 0.2813
0.2599 0.1924 0.3148 0.2467 0.3533 0.2871 0.3822 0.3187
0.4843 0.4068 0.5940 0.5235 0.6710 0.6093 0.7281 0.6748
0.6520 0.5829 0.7725 0.7193 0.8461 0.8067 0.8934 0.8645
0.7633 0.7074 0.8715 0.8355 0.9276 0.9053 0.9582 0.9444
0.0967 0.0553 0.0958 0.0578 0.0880 0.0550 0.0781 0.0500
7 8 8 9 9 10 10 11
8 9 10 11 12 13 14 15
0.3457 0.6394 0.5776 0.5199 0.7237 0.8505 0.8168 0.9008
0.2712 0.5397 0.4705 0.4085 0.6093 0.7553 0.7075 0.8164
0.1851 0.4029 0.3303 0.2695 0.4397 0.5876 0.5244 0.6481
0.0949 0.2288 0.1657 0.1193 0.2170 0.3200 0.2559 0.3483
0.1135 0.2686 0.1996 0.1467 0.2630 0.3809 0.3112 0.4164
0.2561 0.5219 0.4434 0.3723 0.5739 0.7262 0.6671 0.7849
0.3984 0.7071 0.6413 0.5757 0.7811 0.8952 0.8634 0.9344
0.5180 0.8217 0.7728 0.7209 0.8905 0.9607 0.9460 0.9806
0.0379 0.0966 0.0557 0.0315 0.0613 0.0963 0.0605 0.0881
8 8 9 10 10 10 11 11
8 9 10 11 12 13 14 15
0.3797 0.3189 0.6148 0.5546 0.7555 0.7086 0.8419 0.8079
0.3050 0.2453 0.5118 0.4452 0.6488 0.5903 0.7425 0.6945
0.2168 0.1623 0.3734 0.3050 0.4857 0.4178 0.5697 0.5070
0.1213 0.0793 0.2041 0.1468 0.2607 0.1989 0.3018 0.2398
0.1489 0.1006 0.2527 0.1875 0.3250 0.2554 0.3781 0.3087
0.3184 0.2493 0.5248 0.4478 0.6570 0.5884 0.7454 0.6881
0.4798 0.4054 0.7259 0.6627 0.8498 0.8067 0.9150 0.8872
0.6082 0.5382 0.8456 0.8011 0.9366 0.9138 0.9730 0.9619
0.0577 0.0306 0.0841 0.0484 0.0922 0.0559 0.0912 0.0572
8 9 9 10 10 11 11 12
8 9 10 11 12 13 14 15
0.4087 0.3424 0.2881 0.5819 0.5241 0.7325 0.6856 0.8269
0.3347 0.2682 0.2161 0.4751 0.4124 0.6196 0.5622 0.7209
0.2461 0.1833 0.1374 0.3355 0.2731 0.4514 0.3867 0.5407
0.1477 0.0961 0.0627 0.1730 0.1235 0.2299 0.1741 0.2731
0.1831 0.1243 0.0840 0.2255 0.1666 0.3011 0.2356 0.3589
0.3728 0.2952 0.2315 0.5096 0.4348 0.6515 0.5838 0.7468
0.5463 0.4678 0.3964 0.7266 0.6649 0.8566 0.8152 0.9227
0.6772 0.6076 0.5399 0.8541 0.8119 0.9445 0.9240 0.9783
0.0798 0.0429 0.0227 0.0674 0.0386 0.0776 0.0467 0.0793
8 9 10 10 11 11 12 12
8 9 10 11
0.3631 0.3049 0.6047
0.2890 0.2321 0.5008
0.2031 0.1515 0.3629
0.1131 0.0734 0.1986
0.1476 0.1000 0.2610
0.3366 0.2663 0.5612
0.5208 0.4460 0.7749
0.6635 0.5960 0.8902
0.0564 0.0302 0.0880
9 10 10
S. Panchapakesan, A. Childs, B. H. Humphrey and N. Balakrishnan
408
Table 4 Simulated powers of Procedure 1 at 5% significance level k
3
M
λ
D1
1/5
1/4
1/3
1/2
2
3
4
5
1
8 9 10 11 12 13 14 15
0.2977 0.2549 0.5271 0.4677 0.6716 0.6152 0.7840 0.7412
0.2356 0.1837 0.4129 0.3578 0.5507 0.4907 0.6622 0.6013
0.1493 0.1102 0.2808 0.2294 0.3743 0.3216 0.4528 0.4078
0.0685 0.0489 0.1259 0.0935 0.1695 0.1313 0.2038 0.1609
0.0830 0.0524 0.1422 0.1038 0.1867 0.1496 0.2285 0.1790
0.1809 0.1432 0.3362 0.2736 0.4631 0.4047 0.5464 0.4926
0.2962 0.2408 0.5260 0.4523 0.6692 0.6050 0.7635 0.7206
0.3994 0.3414 0.6544 0.5872 0.8000 0.7555 0.8765 0.8516
0.0208 0.0111 0.0326 0.0181 0.0325 0.0199 0.0330 0.0181
8 9 9 10 10 11 11 12
8 9 10 11 12 13 14 15
0.3477 0.2859 0.2437 0.5272 0.4739 0.6781 0.6328 0.7811
0.2685 0.2163 0.1778 0.4011 0.3463 0.5600 0.4957 0.6625
0.1865 0.1405 0.1001 0.2656 0.2221 0.3798 0.3162 0.4671
0.0990 0.0663 0.0394 0.1171 0.0913 0.1679 0.1311 0.2062
0.1122 0.0772 0.0483 0.1510 0.1057 0.2043 0.1554 0.2547
0.2574 0.1992 0.1569 0.3751 0.3104 0.4968 0.4354 0.6102
0.3930 0.3313 0.2746 0.5728 0.5076 0.7322 0.6816 0.8243
0.5166 0.4532 0.3855 0.7251 0.6713 0.8547 0.8280 0.9257
0.0386 0.0184 0.0101 0.0328 0.0180 0.0375 0.0210 0.0377
8 9 10 10 11 11 12 12
0.5373 0.4705 0.7999 0.7561 0.7012 0.8874 0.8527
0.0300 0.0149 0.0455 0.0258 0.0164 0.0321 0.0194
9 10 10 11 12 12 13
0.6106 0.5424 0.4782 0.8157 0.7691 0.9206 0.9010
0.0405 0.0251 0.0118 0.0375 0.0222 0.0473 0.0277
9 10 11 11 12 12 13
0.0282 0.0153 0.0091 0.0274 0.0163 0.0378
10 11 12 12 13 13
(The rows of Table 4 for the larger values of k are scrambled beyond recovery in the source and are omitted.)
Table 4 (Contd.) k
8
M
10 11 12 13 14 15
λ
D1
1/5
1/4
1/3
1/2
2
3
4
5
1
0.3161 0.2639 0.2269 0.5007 0.4551 0.6800
0.2471 0.2058 0.1673 0.3998 0.3448 0.5462
0.1684 0.1246 0.0906 0.2498 0.2098 0.3801
0.0785 0.0561 0.0358 0.1070 0.0838 0.1619
0.1157 0.0812 0.0534 0.1626 0.1204 0.2396
0.3005 0.2297 0.1830 0.4507 0.3844 0.6148
0.4868 0.4073 0.3498 0.7021 0.6378 0.8480
0.6322 0.5864 0.5114 0.8511 0.8082 0.9455
0.0413 0.0209 0.0116 0.0368 0.0225 0.0442
10 11 12 12 13 13
4. Comparison with the standard χ²-test

The significance level and power of the standard χ²-test are obtained by evaluating the sum

$$\sum_{\substack{x_1 + \cdots + x_k = N \\ T > \chi^2_{k-1,\alpha}}} \frac{N!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}\,, \eqno(4.1)$$
where T is the χ²-statistic given in (1.1). The pi's in (4.1) are taken to be 1/k when finding significance levels, and are taken to be the values specified by the alternative hypothesis when finding powers. In Tables 12 and 13, we present powers of the χ²-test for sample sizes that coincide with the expected sample sizes of Procedures 1 and 2 under the null hypothesis. The significance levels used were 3% and 7%, respectively, so as to provide close agreement with the actual attained significance level of the other two procedures, which was always less than the nominal significance levels of 5% and 10%. For k ≥ 6, the approximation of Patnaik (1949) described in Young (1962) has been used. From Tables 12 and 13, we see that the power of the χ²-test is usually as good as or better than those of the other two procedures. However, for slippage to the left (λ < 1), the powers of Procedure 1 are quite comparable to those of the χ²-test. But Procedure 1 has the distinct advantage that the expected sample size decreases as λ decreases, whereas the sample size of the χ²-test is fixed. For example, we see from the seventh row of k = 4, λ = 1/5 in Table 3 and the sixth row of k = 4, λ = 1/5 in Table 13 that the respective attained significance levels (.0605, .0708) and powers (.8168, .8342) are comparable, but the sample size for the χ²-test is 42 whereas the expected sample size for Procedure 1 is seen from Table 5 to be 35. The same observations can be made regarding Procedure 2 when λ > 1. This time, however, the advantage of decreased expected sample size is quite significant. For example, in the fifth row of k = 5 of Tables 9 and 13 we see that the attained significance levels of the two procedures are similar (.0728, .0692), and
Table 5 Exact expected sample size when using Procedure 1 k
M
λ
1/5
1/4
1/3
1/2
2
3
4
5
1
3
8 9 10 11 12 13 14 15
14.14 16.13 18.12 20.13 22.14 24.17 26.20 28.23
14.46 16.49 18.53 20.59 22.65 24.72 26.79 28.87
14.98 17.09 19.21 21.34 23.48 25.63 27.78 29.94
15.94 18.21 20.50 22.79 25.10 27.40 29.72 32.04
15.23 17.32 19.39 21.46 23.52 25.57 27.62 29.67
13.21 14.91 16.60 18.28 19.96 21.64 23.31 24.99
11.97 13.48 14.99 16.49 18.00 19.50 21.00 22.50
11.19 12.60 14.00 15.40 16.80 18.20 19.60 21.00
17.20 19.76 22.34 24.95 27.57 30.21 32.86 35.52
4
8 9 10 11 12 13 14 15
18.35 21.08 23.83 26.61 29.41 32.22 35.05 37.89
18.64 21.41 24.21 27.03 29.87 32.72 35.60 38.48
19.11 21.95 24.82 27.72 30.63 33.56 36.51 39.47
19.98 22.98 26.01 29.06 32.13 35.21 38.31 41.42
18.69 21.32 23.94 26.55 29.16 31.75 34.33 36.91
15.79 17.84 19.88 21.91 23.94 25.95 27.97 29.97
13.95 15.72 17.48 19.24 20.99 22.75 24.50 26.25
12.79 14.39 16.00 17.60 19.20 20.80 22.40 24.00
21.26 24.54 27.87 31.22 34.61 38.02 41.46 44.91
5
8 9 10 11 12 13 14 15
22.32 25.77 29.26 32.78 36.34 39.92 43.53 47.16
22.58 26.08 29.61 33.18 36.77 40.40 44.05 47.72
23.02 26.58 30.19 33.82 37.49 41.19 44.91 48.66
23.85 27.56 31.31 35.09 38.91 42.75 46.62 50.51
22.06 25.24 28.42 31.58 34.73 37.86 40.98 44.09
18.35 20.76 23.16 25.53 27.90 30.26 32.61 34.96
15.93 17.96 19.97 21.98 23.99 25.99 28.00 30.00
14.38 16.19 17.99 19.80 21.60 23.40 25.20 27.00
25.11 29.10 33.15 37.24 41.38 45.55 49.75 53.98
6
8 9 10 11 12 13 14 15
26.12 30.27 34.47 38.73 43.03 47.37 51.74 56.14
26.37 30.56 34.81 39.10 43.45 47.83 52.24 56.68
26.78 31.04 35.36 39.72 44.14 48.58 53.07 57.58
27.57 31.97 36.43 40.94 45.49 50.09 54.71 59.37
25.36 29.10 32.82 36.54 40.24 43.92 47.58 51.23
20.89 23.67 26.42 29.15 31.86 34.56 37.26 39.94
17.91 20.19 22.46 24.73 26.99 29.24 31.49 33.75
15.98 17.99 19.99 22.00 24.00 26.00 28.00 30.00
28.82 33.50 38.25 43.07 47.94 52.85 57.81 62.81
7
8 9 10 11
29.78 34.62 39.53 44.50
30.02 34.90 39.85 44.86
30.42 35.36 40.38 45.46
31.18 36.26 41.41 46.64
28.60 32.89 37.17 41.44
23.42 26.56 29.67 32.75
19.88 22.42 24.95 27.47
17.57 19.78 21.99 24.19
32.41 37.77 43.22 48.74
the powers for λ = 2, 3, 4, and 5 are all comparable (.4027, .8299, .9652, .9941 in Table 9 and .4322, .8562, .9764, .9967 in Table 13). However, for each value of λ, the sample size of the χ²-test is fixed at 45, whereas the expected sample size of Procedure 2 can be seen from Table 11 to be 39.25 when λ = 2, 29.61 when λ = 3, 22.84 when λ = 4, and 18.88 when λ = 5.
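For small N, the sum (4.1) can be evaluated exactly by enumerating all compositions of N into k cells. The following Python sketch is our own illustration, not the authors' code: the function names are ours, and the hard-coded critical value 5.991 used in the usage example is the 5% point of χ² with 2 degrees of freedom, chosen purely for illustration (the paper itself works at the 3% and 7% levels).

```python
from math import comb, prod

def compositions(n, k):
    """Yield every way to place n observations into k ordered cells."""
    if k == 1:
        yield (n,)
        return
    for x in range(n + 1):
        for rest in compositions(n - x, k - 1):
            yield (x,) + rest

def multinomial_pmf(xs, ps):
    """Multinomial probability of cell counts xs under cell probabilities ps."""
    coeff, rem = 1, sum(xs)
    for x in xs:
        coeff *= comb(rem, x)
        rem -= x
    return coeff * prod(p ** x for p, x in zip(ps, xs))

def chi2_power(n, k, ps, crit):
    """Exact P(T > crit) as in (4.1): T is the chi-square statistic with
    equal expected cell counts n/k, counts drawn from Multinomial(n, ps)."""
    e = n / k
    return sum(multinomial_pmf(xs, ps)
               for xs in compositions(n, k)
               if sum((x - e) ** 2 / e for x in xs) > crit)
```

Setting ps = (1/k, ..., 1/k) gives the attained significance level; replacing ps by the alternative cell probabilities, for instance one cell weighted proportionally to λ under the slippage alternative, gives the corresponding power.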
Table 6 Simulated expected sample size when using Procedure 1 k
M
λ
1/5
1/4
1/3
1/2
2
8 9 10 11 12 13 14 15
14.14 16.12 18.12 20.11 22.15 24.18 26.19 28.21
14.47 16.49 18.55 20.59 22.66 24.72 26.74 28.94
14.96 17.12 19.19 21.31 23.47 25.63 27.79 29.95
15.97 18.18 20.50 22.80 25.10 27.41 29.71 32.08
15.20 17.27 19.37 21.49 23.57 25.60 27.64 29.64
8 9 10 11 12 13 14 15
18.32 21.10 23.83 26.58 29.36 32.27 35.01 37.90
18.68 21.42 24.19 27.06 29.88 32.70 35.58 38.51
19.13 21.97 24.85 27.65 30.61 33.50 36.46 39.40
19.95 23.01 26.09 29.05 32.05 35.20 38.21 41.39
5
8 9 10 11 12 13 14 15
22.39 25.78 29.39 32.73 36.35 39.82 43.51 47.25
22.59 26.09 29.46 33.26 36.79 40.39 44.12 47.75
22.96 26.60 30.15 33.81 37.51 41.14 45.04 48.69
6
8 9 10 11 12 13 14 15
26.09 30.30 34.48 38.70 43.01 47.36 51.78 56.37
26.39 30.55 34.80 39.11 43.47 47.90 52.33 56.72
7
8 9 10 11 12 13 14 15
29.79 34.66 39.58 44.58 49.65 54.51 59.80 64.88
8
8 9 10 11 12 13 14 15
33.45 38.86 44.46 50.20 55.85 61.69 67.68 73.44
3
3
4
5
1
13.21 14.89 16.61 18.31 19.90 21.54 23.37 24.96
11.99 13.47 14.97 16.52 17.93 19.53 20.98 22.54
11.17 12.63 13.98 15.43 16.78 18.23 19.60 20.99
17.22 19.78 22.40 25.01 27.61 30.23 32.88 35.61
18.64 21.29 23.96 26.52 29.15 31.77 34.39 36.86
15.76 17.81 19.79 21.86 23.94 26.04 28.02 29.93
13.99 15.66 17.49 19.27 21.07 22.78 24.49 26.30
12.78 14.40 16.01 17.56 19.20 20.80 22.37 23.98
21.22 24.51 27.82 31.24 34.72 37.90 41.46 44.77
23.88 27.60 31.29 35.17 39.00 42.75 46.66 50.52
22.07 25.27 28.46 31.61 34.69 37.77 40.97 44.32
18.32 20.78 23.15 25.55 27.86 30.16 32.62 35.06
15.95 18.05 19.99 21.97 24.02 25.96 28.04 30.03
14.42 16.21 17.99 19.82 21.57 23.40 25.15 27.06
25.18 29.09 33.15 37.28 41.29 45.61 49.71 54.00
26.82 31.10 35.35 39.72 44.07 48.67 53.16 57.60
27.51 31.91 36.45 41.01 45.53 50.02 54.63 59.41
25.42 29.00 32.78 36.42 40.22 44.02 47.53 51.10
20.93 23.67 26.34 29.23 31.91 34.38 37.28 39.95
17.90 20.26 22.40 24.66 27.04 29.16 31.53 33.75
15.95 17.96 20.04 21.97 23.96 25.96 28.04 29.90
28.83 33.60 38.09 43.01 47.96 52.74 57.73 62.78
30.01 34.87 39.93 44.89 49.92 55.14 60.23 65.33
30.46 35.36 40.46 45.36 50.68 55.90 61.03 66.12
31.29 36.18 41.42 46.69 51.97 57.08 62.60 67.97
28.62 33.03 37.08 41.47 45.60 50.05 54.14 58.57
23.48 26.60 29.65 32.65 35.72 38.88 41.88 45.07
19.88 22.36 24.92 27.55 30.00 32.43 34.91 37.63
17.53 19.78 21.99 24.22 26.41 28.56 30.79 32.96
32.44 37.78 43.21 48.80 54.38 60.05 65.79 71.40
33.56 39.12 44.81 50.45 56.30 62.10 67.90 73.91
33.95 39.65 45.32 51.14 57.03 62.96 68.89 74.92
34.80 40.60 46.45 52.08 58.13 64.22 70.30 76.51
31.77 36.72 41.55 46.18 51.01 55.99 60.78 65.18
25.96 29.43 32.97 36.32 39.82 43.25 46.47 49.76
21.85 24.71 27.50 30.33 32.99 35.80 38.59 41.18
19.21 21.57 24.02 26.34 28.93 31.18 33.55 35.95
35.81 41.93 48.09 54.24 60.78 66.87 73.31 79.90
Table 7 Critical values D2 for Procedure 2 k
3
4
5
6
7
8
λ
No 18
21
24
27
30
33
36
0.01 0.05 0.10 0.15
9 7 6 6
10 8 7 6
10 8 7 7
11 9 8 7
12 9 8 7
12 10 9 8
12 10 9 8
0.01 0.05 0.10 0.15
No 22 9 7 6 5
26 9 7 6 6
30 10 8 7 6
34 10 8 7 6
38 11 8 7 7
42 11 9 8 7
46 11 9 8 7
0.01 0.05 0.10 0.15
No 25 8 6 6 5
30 8 7 6 5
35 9 7 6 6
40 10 7 7 6
45 10 8 7 6
50 10 8 7 7
55 11 9 8 7
0.01 0.05 0.10 0.15
No 29 8 6 5 5
35 8 6 6 5
41 9 7 6 6
47 9 7 6 6
53 10 8 7 6
59 10 8 7 6
65 10 8 7 7
0.01 0.05 0.10 0.15
No 33 7 6 5 5
40 8 6 6 5
47 8 7 6 5
54 9 7 6 6
61 9 8 7 6
68 10 8 7 6
75 10 8 7 7
0.01 0.05 0.10 0.15
No 37 7 6 5 5
45 8 6 5 5
53 8 7 6 5
61 9 7 6 6
69 9 7 6 6
77 10 8 7 6
85 10 8 7 6
5. The combined procedure

We have seen in the last section that Procedure 1 is the better procedure when λ < 1, while Procedure 2 performs better when λ > 1. Since the form of the alternative hypothesis is not always known, ideally we would like to have a procedure that performs optimally for both forms of the alternative hypothesis.
Table 8 Powers of Procedure 2 at 5% significance level k
No
λ
D2
1/5
1/4
1/3
1/2
2
3
4
5
1
3
18 21 24 27 30 33 36
0.1431 0.1225 0.1617 0.1257 0.1598 0.1352 0.1551
0.1315 0.1152 0.1468 0.1272 0.1496 0.1209 0.1458
0.1169 0.0907 0.1395 0.1081 0.1430 0.1034 0.1329
0.0870 0.0652 0.1045 0.0795 0.1006 0.0764 0.1030
0.1987 0.1915 0.2647 0.2567 0.3148 0.3003 0.3639
0.4713 0.4968 0.6217 0.6323 0.7171 0.7187 0.7915
0.6811 0.7371 0.8299 0.8418 0.9104 0.9081 0.9461
0.8162 0.8591 0.9244 0.9345 0.9674 0.9735 0.9878
0.0367 0.0253 0.0387 0.0280 0.0396 0.0255 0.0350
7 8 8 9 9 10 10
4
22 26 30 34 38 42 46
0.0559 0.0869 0.0636 0.0868 0.1082 0.0780 0.0939
0.0551 0.0859 0.0587 0.0877 0.0994 0.0766 0.0938
0.0529 0.0794 0.0579 0.0764 0.1000 0.0678 0.0897
0.0404 0.0680 0.0460 0.0621 0.0828 0.0621 0.0781
0.1432 0.2215 0.2130 0.2847 0.3609 0.3259 0.3996
0.4330 0.5720 0.5950 0.7080 0.7780 0.7812 0.8432
0.6752 0.8107 0.8322 0.8994 0.9428 0.9520 0.9742
0.8318 0.9231 0.9392 0.9713 0.9874 0.9907 0.9964
0.0214 0.0384 0.0259 0.0316 0.0462 0.0312 0.0425
7 7 8 8 8 9 9
5
25 30 35 40 45 50 55
0.0729 0.0512 0.0714 0.0949 0.0652 0.0854 0.0556
0.0691 0.0487 0.0653 0.0905 0.0613 0.0812 0.0531
0.0643 0.0424 0.0675 0.0860 0.0578 0.0695 0.0485
0.0561 0.0386 0.0572 0.0792 0.0532 0.0664 0.0435
0.1868 0.1727 0.2506 0.3232 0.3117 0.3702 0.3488
0.4928 0.5166 0.6513 0.7533 0.7537 0.8283 0.8337
0.7362 0.7830 0.8765 0.9371 0.9413 0.9696 0.9756
0.8823 0.9143 0.9641 0.9834 0.9892 0.9948 0.9952
0.0389 0.0256 0.0405 0.0486 0.0325 0.0435 0.0254
6 7 7 7 8 8 9
6
29 35 41 47 53 59 65
0.0482 0.0815 0.0504 0.0673 0.0423 0.0580 0.0713
0.0461 0.0741 0.0477 0.0694 0.0416 0.0598 0.0723
0.0495 0.0715 0.0485 0.0701 0.0467 0.0553 0.0658
0.0465 0.0663 0.0475 0.0575 0.0410 0.0518 0.0635
0.1544 0.2360 0.2254 0.2934 0.2730 0.3475 0.4203
0.4723 0.6276 0.6502 0.7408 0.7541 0.8259 0.8767
0.7411 0.8597 0.8911 0.9406 0.9517 0.9743 0.9876
0.8809 0.9588 0.9635 0.9893 0.9917 0.9969 0.9989
0.0326 0.0489 0.0299 0.0414 0.0243 0.0337 0.0403
6 6 7 7 8 8 8
7
33 40 47 54 61 68 75
0.0383 0.0698 0.0380 0.0619 0.0304 0.0459 0.0617
0.0419 0.0590 0.0389 0.0548 0.0366 0.0437 0.0596
0.0353 0.0614 0.0347 0.0495 0.0329 0.0451 0.0553
0.0315 0.0529 0.0328 0.0525 0.0287 0.0381 0.0550
0.1339 0.2151 0.2066 0.2730 0.2547 0.3253 0.3983
0.4530 0.6042 0.6317 0.7331 0.7472 0.8272 0.8795
0.7396 0.8616 0.8884 0.9400 0.9553 0.9768 0.9881
0.8848 0.9585 0.9741 0.9890 0.9925 0.9974 0.9986
0.0268 0.0462 0.0294 0.0374 0.0222 0.0300 0.0406
6 6 7 7 8 8 8
8
37 45 53 61 69 77 85
0.0307 0.0544 0.0341 0.0453 0.0610 0.0406 0.0497
0.0326 0.0576 0.0316 0.0450 0.0594 0.0402 0.0490
0.0308 0.0434 0.0305 0.0415 0.0570 0.0405 0.0462
0.0312 0.0452 0.0259 0.0400 0.0555 0.0344 0.0468
0.1243 0.2029 0.1867 0.2673 0.3286 0.3058 0.3715
0.4434 0.5901 0.6197 0.7358 0.8143 0.8255 0.8810
0.7305 0.8612 0.8862 0.9430 0.9721 0.9754 0.9888
0.8905 0.9577 0.9751 0.9907 0.9972 0.9972 0.9991
0.0234 0.0387 0.0206 0.0331 0.0432 0.0241 0.0349
6 6 7 7 7 8 8
Table 9 Powers of Procedure 2 at 10% significance level No
λ
D2
1/5
1/4
1/3
1/2
2
3
4
5
1
3
18 21 24 27 30 33 36
0.2541 0.1991 0.2559 0.1973 0.2469 0.1917 0.2296
0.2389 0.1903 0.2301 0.1877 0.2273 0.1796 0.2156
0.2131 0.1664 0.2166 0.1700 0.2103 0.1640 0.1956
0.1704 0.1285 0.1777 0.1280 0.1696 0.1204 0.1579
0.3007 0.2777 0.3666 0.3506 0.4192 0.3950 0.4635
0.6086 0.6125 0.7100 0.7211 0.7982 0.7903 0.8436
0.7931 0.8097 0.8869 0.8979 0.9386 0.9385 0.9658
0.9012 0.9148 0.9521 0.9617 0.9832 0.9833 0.9916
0.0928 0.0542 0.0784 0.0532 0.0784 0.0541 0.0629
6 7 7 8 8 9 9
4
22 26 30 34 38 42 46
0.1228 0.1642 0.1214 0.1520 0.1859 0.1388 0.1598
0.1167 0.1589 0.1100 0.1451 0.1815 0.1331 0.1518
0.1046 0.1542 0.1150 0.1365 0.1751 0.1218 0.1493
0.0897 0.1262 0.0928 0.1205 0.1486 0.1101 0.1272
0.2490 0.3305 0.3008 0.3784 0.4666 0.4336 0.4918
0.5621 0.6891 0.6855 0.7863 0.8510 0.8460 0.8893
0.7780 0.8795 0.8866 0.9398 0.9690 0.9695 0.9845
0.8909 0.9537 0.9657 0.9842 0.9930 0.9941 0.9983
0.0594 0.0856 0.0531 0.0724 0.0945 0.0643 0.0721
6 6 7 7 7 8 8
5
25 30 35 40 45 50 55
0.0729 0.1030 0.1439 0.0949 0.1215 0.1434 0.0961
0.0691 0.1017 0.1450 0.0905 0.1138 0.1431 0.0992
0.0643 0.0934 0.1294 0.0860 0.1061 0.1387 0.0949
0.1868 0.2681 0.3629 0.3232 0.4027 0.4698 0.4484
0.7362 0.8588 0.9256 0.9371 0.9652 0.9847 0.9834
0.8823 0.9517 0.9832 0.9834 0.9941 0.9977 0.9987
0.0389 0.0601 0.0877 0.0486 0.0728 0.0839 0.0548
6 6 6 7 7 7 8
6
29 35 41 47 53 59 65
0.1215 0.0815 0.1064 0.1404 0.0946 0.1149 0.1336
0.1197 0.0741 0.1007 0.1346 0.0891 0.1113 0.1352
0.1189 0.0715 0.1042 0.1351 0.0883 0.1065 0.1267
0.1094 0.0663 0.0935 0.1218 0.0751 0.1017 0.1199
0.2511 0.2360 0.3212 0.4073 0.3736 0.4406 0.5207
0.5948 0.6276 0.7382 0.8247 0.8233 0.8826 0.9228
0.8311 0.8597 0.9256 0.9636 0.9687 0.9853 0.9925
0.9351 0.9588 0.9840 0.9943 0.9955 0.9984 0.9992
0.0863 0.0489 0.0673 0.0928 0.0575 0.0716 0.0884
5 6 6 6 7 7 7
7
33 40 47 54 61 68 75
0.1010 0.0698 0.0907 0.1188 0.0770 0.0901 0.1076
0.1014 0.0590 0.0851 0.1105 0.0687 0.0921 0.1112
0.0942 0.0614 0.0826 0.1144 0.0683 0.0887 0.1057
0.0931 0.0529 0.0805 0.1064 0.0631 0.0847 0.0986
0.2376 0.2151 0.3010 0.3761 0.3491 0.4170 0.4933
0.5870 0.6042 0.7245 0.8098 0.8219 0.8807 0.9142
0.8250 0.8616 0.9293 0.9674 0.9690 0.9868 0.9933
0.9372 0.9585 0.9870 0.9934 0.9965 0.9990 0.9996
0.0741 0.0462 0.0613 0.0848 0.0503 0.0639 0.0769
5 6 6 6 7 7 7
8
37 45 53 61 69 77 85
0.0863 0.1322 0.0749 0.0997 0.1297 0.0836 0.0953
0.0858 0.1223 0.0732 0.1016 0.1310 0.0764 0.0965
0.0805 0.1182 0.0668 0.0876 0.1196 0.0703 0.0846
0.2212 0.3129 0.2783 0.3611 0.4369 0.3938 0.4720
0.5748 0.7110 0.7173 0.8113 0.8803 0.8757 0.9217
0.8242 0.9202 0.9348 0.9668 0.9849 0.9858 0.9951
0.9372 0.9764 0.9861 0.9950 0.9988 0.9989 0.9999
0.0863 0.1241 0.0743 0.0992 0.1149 0.0787 0.0961
0.0561 0.0893 0.1238 0.0792 0.1037 0.1262 0.0788
0.4928 0.6403 0.7467 0.7533 0.8299 0.8828 0.8821
0.0670 0.0995 0.0612 0.0796 0.0966 0.0588 0.0706
5 5 6 6 6 7 7
Table 10 Expected sample size when using Procedure 2 at 5% significance level k
No
λ
1/5
1/4
1/3
1/2
2
3
4
5
1
3
18 21 24 27 30 33 36
17.28 20.34 22.89 26.09 28.62 31.78 34.39
17.37 20.37 23.05 26.10 28.76 31.94 34.55
17.44 20.53 23.09 26.27 28.83 32.12 34.73
17.60 20.69 23.35 26.48 29.25 32.42 35.07
17.07 20.06 22.35 25.28 27.47 30.46 32.46
15.54 18.10 19.30 21.88 22.87 25.49 26.19
14.07 16.16 16.76 19.03 19.19 21.55 21.72
12.83 14.69 14.99 16.99 16.93 18.90 18.97
17.83 20.88 23.78 26.82 29.70 32.82 35.69
4
22 26 30 34 38 42 46
21.67 25.36 29.49 33.18 36.83 41.08 44.81
21.68 25.38 29.55 33.19 36.90 41.15 44.78
21.70 25.44 29.54 33.29 36.93 41.29 44.86
21.79 25.54 29.68 33.45 37.19 41.36 45.09
21.21 24.43 28.34 31.33 34.04 38.28 40.58
19.32 21.28 24.51 25.87 27.00 30.35 31.04
17.24 18.20 20.81 21.31 21.73 24.17 24.43
15.35 15.91 18.11 18.27 18.35 20.49 20.38
21.88 25.74 29.81 33.73 37.57 41.69 45.53
5
25 30 35 40 45 50 55
24.50 29.61 34.30 38.89 44.18 48.79 54.20
24.51 29,60 34.35 38.92 44.21 48.88 54.25
24.57 29.67 34.37 38.98 44.30 49.05 54.32
24.63 29.73 34.47 39.12 44.41 49.12 54.41
23.72 28.65 32.50 36.18 41.08 44.49 49.67
21.09 25.27 27.28 28.68 32.57 33.67 37.57
18.22 21.42 22.29 22.56 25.88 25.73 28.90
15.78 18.47 18.79 18.95 21.36 21.40 23.97
24.73 29.81 34.63 39.43 44.62 49.43 54.67
6
29 35 41 47 53 59 65
28.62 34.17 40.45 46.13 52.40 58.03 63.69
28.64 34.22 40.47 46.05 52.44 58.05 63.69
28.61 34.26 40.45 46.09 52.37 58.12 63.79
28.64 34.34 40.48 46.26 52.41 58.19 63.93
27.79 32.54 38.45 43.00 49.09 53.00 56.66
24.67 27.16 32.08 33.90 38.64 39.82 40.65
21.03 22.19 25.75 26.30 29.86 29.96 30.08
18.23 18.54 21.51 21.59 24.29 24.52 24.43
28.76 34.51 40.68 46.49 52.67 58.51 64.28
7
33 40 47 54 61 68 75
32.65 39.20 46.51 53.09 60.53 67.15 73.77
32.62 39.34 46.52 53.16 60.46 67.23 73.78
32.67 39.29 46.59 53.29 60.49 67.15 73.87
32.72 39.41 46.62 53.23 60.57 67.32 73.90
31.80 37.51 44.43 49.81 56.71 61.58 66.12
28.31 31.52 36.89 39.28 44.48 45.94 47.05
23.97 25.33 29.25 30.02 33.86 33.95 34.27
20.50 20.88 24.08 24.33 27.43 27.39 27.41
32.76 39.51 46.66 53.46 60.66 67.48 74.21
8
37 45 53 61 69 77 85
36.70 44.34 52.51 60.23 67.78 76.21 83.85
36.67 44.26 52.53 60.26 67.85 76.19 83.93
36.70 44.43 52.60 60.33 67.91 76.16 83.92
36.68 44.43 52.64 60.36 67.98 76.33 83.88
35.79 42.38 50.39 56.33 62.14 70.40 75.75
31.95 35.71 41.88 44.32 46.20 52.39 53.29
27.08 28.57 33.17 33.70 34.01 38.42 38.51
22.78 23.36 26.98 26.95 27.06 30.43 30.43
36.75 44.50 52.73 60.49 68.19 76.55 84.27
Table 11 Expected sample size when using Procedure 2 at 10% significance level
No
λ
1/5
1/4
1/3
1/2
2
3
4
5
1
3
18 21 24 27 30 33 36
16.49 19.74 21.98 25.40 27.58 31.06 33.34
16.63 19.83 22.25 25.54 27.82 31.24 33.61
16.81 19.95 22.38 25.69 28.07 31.49 33.89
17.09 20.27 22.75 26.06 28.52 31.96 34.42
16.34 19.34 21.29 24.21 26.19 29.26 30.97
14.27 16.81 17.82 20.31 20.98 23.70 24.13
12.53 14.64 15.07 17.22 17.50 19.64 19.73
11.18 13.06 13.40 15.14 15.28 17.11 17.11
17.54 20.62 23.45 26.61 29.43 32.64 35.40
4
22 26 30 34 38 42 46
21.20 24.62 28.92 32.39 35.74 40.28 43.71
21.23 24.67 29.00 32.49 35.77 40.30 43.87
21.33 24.74 29.01 32.59 35.86 40.54 43.90
21.43 25.04 29.20 32.81 36.28 40.73 44.26
20.41 23.31 27.37 30.02 31.98 36.31 38.56
17.90 19.39 22.76 23.44 24.30 27.69 28.13
15.43 16.02 18.76 19.01 19.07 21.78 21.78
13.72 13.80 16.03 16.11 16.27 18.39 18.37
21.66 25.32 29.53 33.31 36.94 41.30 45.08
5
25 30 35 40 45 50 55
24.50 29.10 33.36 38.89 43.35 47.65 53.51
24.51 29.06 33.36 38.92 43.46 47.73 53.41
24.57 29.19 33.58 38.98 43.60 47.87 53.47
24.63 29.25 33.64 39.12 43.65 48.06 53.79
23.72 27.61 30.86 36.18 39.25 41.98 47.31
21.09 23.11 24.65 28.68 29.61 30.45 34.12
18.22 19.15 19.42 22.56 22.84 22.88 25.95
15.78 16.08 16.31 18.95 18.88 19.03 21.36
24.73 29.50 34.03 39.43 44.09 48.75 54.26
6
29 35 41 47 53 59 65
27.87 34.17 39.66 44.87 51.47 56.91 62.34
27.83 34.22 39.72 45.00 51.60 57.02 62.16
27.92 34.26 39.71 44.97 51.62 57.09 62.39
28.00 34.34 39.85 45.24 51.87 57.23 62.67
26.66 32.54 36.89 40.56 46.87 50.47 53.36
22.50 27.16 29.00 30.40 35.13 35.80 36.74
18.52 22.19 22.85 23.09 26.65 26.94 26.76
15.55 18.54 18.82 18.95 21.67 21.80 21.61
28.20 34.51 40.14 45.52 52.14 57.71 63.37
7
33 40 47 54 61 68 75
31.93 39.20 45.70 51.96 59.66 66.16 72.52
31.90 39.34 45.80 52.09 59.79 66.08 72.44
32.02 39.29 45.85 52.05 59.75 66.21 72.59
32.02 39.41 45.83 52.27 59.86 66.34 72.73
30.54 37.51 42.65 47.20 54.49 58.96 62.36
25.84 31.52 33.74 35.50 40.64 41.72 42.40
21.15 25.33 25.99 26.45 30.37 30.27 30.44
17.66 20.88 21.16 21.18 24.34 24.33 24.44
32.22 39.51 46.12 52.75 60.22 66.79 73.26
8
37 45 53 61 69 77 85
36.02 42.99 51.75 59.09 66.04 75.14 82.53
36.01 43.19 51.80 59.18 66.12 75.30 82.52
36.00 43.16 51.83 59.14 66.45 75.23 82.55
36.07 43.29 51.94 59.38 66.38 75.42 82.79
34.39 40.24 48.52 53.80 58.59 67.43 71.53
29.21 31.95 38.37 40.08 41.33 47.29 48.18
23.81 24.77 29.38 29.61 29.90 34.24 34.23
19.79 19.92 23.42 23.82 23.58 27.00 27.14
36.31 43.49 52.13 59.64 66.99 75.89 83.10
Table 12 Powers of the chi-square test at 3% significance level k
3
N
λ
1/5
1/4
1/3
1/2
2
3
4
5
1
18 21 24 27 30 33 36
0.3145 0.5428 0.5362 0.6350 0.7740 0.7981 0.8465
0.2398 0.4334 0.4229 0.5064 0.6518 0.6782 0.7336
0.1555 0.2940 0.2788 0.3339 0.4601 0.4823 0.5330
0.0702 0.1361 0.1187 0.1382 0.2044 0.2107 0.2338
0.1375 0.2250 0.2066 0.2473 0.3275 0.3366 0.3683
0.4001 0.5583 0.5545 0.6365 0.7386 0.7594 0.8012
0.6308 0.7828 0.7904 0.8585 0.9197 0.9327 0.9533
0.7820 0.8970 0.9062 0.9479 0.9764 0.9821 0.9896
0.0168 0.0315 0.0213 0.0222 0.0331 0.0296 0.0294
22 26 30 34 38 42 46
0.2288 0.3434 0.3945 0.5111 0.5716 0.6532 0.7071
0.1874 0.2733 0.3122 0.4035 0.4595 0.5272 0.5810
0.1356 0.1895 0.2127 0.2704 0.3121 0.3546 0.3978
0.0740 0.0968 0.1028 0.1250 0.1408 0.1530 0.1712
0.1664 0.2183 0.2365 0.2899 0.3077 0.3478 0.3741
0.4736 0.5824 0.6336 0.7211 0.7537 0.8099 0.8396
0.7225 0.8215 0.8665 0.9208 0.9393 0.9637 0.9741
0.8642 0.9296 0.9560 0.9798 0.9867 0.9939 0.9963
0.0254 0.0297 0.0268 0.0298 0.0284 0.0277 0.0278
25 30 35 40 45 50 55
0.1648 0.2001 0.2708 0.3734 0.4232 0.5026 0.5858
0.1411 0.1658 0.2189 0.2991 0.3352 0.3985 0.4720
0.1097 0.1227 0.1556 0.2085 0.2276 0.2682 0.3213
0.0686 0.0706 0.0828 0.1059 0.1084 0.1232 0.1453
0.1719 0.1952 0.2299 0.2801 0.3009 0.3388 0.3777
0.4914 0.5655 0.6441 0.7247 0.7692 0.8184 0.8582
0.7475 0.8220 0.8831 0.9293 0.9522 0.9705 0.9820
0.8864 0.9354 0.9668 0.9845 0.9918 0.9961 0.9982
0.0301 0.0266 0.0269 0.0310 0.0273 0.0280 0.0297
29 35 41 47 53 59 65
0.1883 0.2315 0.2766 0.3229 0.3695 0.4159 0.4615
0.1609 0.1966 0.2341 0.2730 0.3128 0.3529 0.3931
0.1234 0.1484 0.1749 0.2026 0.2315 0.2611 0.2913
0.0731 0.0839 0.0952 0.1071 0.1195 0.1324 0.1457
0.1609 0.1966 0.2341 0.2730 0.3128 0.3529 0.3931
0.5458 0.6493 0.7365 0.8068 0.8614 0.9025 0.9327
0.8495 0.9199 0.9596 0.9806 0.9910 0.9960 0.9983
0.9631 0.9874 0.9960 0.9988 0.9997 0.9999 1.0000
0.0300 0.0300 0.0300 0.0300 0.0300 0.0300 0.0300
33 40 47 54 61 68 75
0.1648 0.2032 0.2438 0.2859 0.3289 0.3723 0.4156
0.1417 0.1733 0.2069 0.2420 0.2783 0.3154 0.3529
0.1100 0.1322 0.1557 0.1806 0.2066 0.2335 0.2611
0.0674 0.0770 0.0871 0.0977 0.1088 0.1203 0.1323
0.1541 0.1894 0.2268 0.2657 0.3058 0.3464 0.3871
0.5510 0.6590 0.7489 0.8202 0.8744 0.9142 0.9426
0.8686 0.9351 0.9700 0.9868 0.9945 0.9978 0.9991
0.9747 0.9927 0.9981 0.9995 0.9999 1.0000 1.0000
0.0300 0.0300 0.0300 0.0300 0.0300 0.0300 0.0300
37 45 53 61 69 77 85
0.1478 0.1823 0.2192 0.2578 0.2977 0.3383 0.3792
0.1278 0.1562 0.1866 0.2186 0.2520 0.2863 0.3214
0.1004 0.1202 0.1415 0.1640 0.1877 0.2123 0.2378
0.0632 0.0718 0.0810 0.0906 0.1006 0.1111 0.1221
0.1478 0.1823 0.2192 0.2578 0.2977 0.3383 0.3792
0.5515 0.6631 0.7554 0.8277 0.8819 0.9211 0.9484
0.8808 0.9446 0.9760 0.9903 0.9963 0.9986 0.9995
0.9813 0.9953 0.9990 0.9998 1.0000 1.0000 1.0000
0.0300 0.0300 0.0300 0.0300 0.0300 0.0300 0.0300
Table 13 Powers of the chi-square test at 7% significance level k
3
N
λ
1/5
1/4
1/3
1/2
2
3
4
5
1
18 21 24 27 30 33 36
0.6497 0.7594 0.7198 0.8376 0.8934 0.8955 0.9236
0.5493 0.6566 0.6055 0.7379 0.8066 0.8083 0.8442
0.4090 0.4993 0.4370 0.5659 0.6382 0.6362 0.6724
0.2246 0.2760 0.2145 0.2976 0.3451 0.3319 0.3476
0.3057 0.3718 0.3177 0.4084 0.4655 0.4539 0.4820
0.6214 0.7147 0.6871 0.7842 0.8395 0.8423 0.8730
0.8136 0.8857 0.8784 0.9341 0.9606 0.9637 0.9761
0.9091 0.9546 0.9540 0.9802 0.9904 0.9918 0.9956
0.0755 0.0872 0.0499 0.0708 0.0797 0.0636 0.0597
22 26 30 34 38 42 46
0.4670 0.5051 0.6427 0.7039 0.7555 0.8342 0.8519
0.3892 0.4198 0.5420 0.5984 0.6485 0.7327 0.7537
0.2906 0.3087 0.4020 0.4435 0.4839 0.5586 0.5790
0.1704 0.1712 0.2196 0.2345 0.2526 0.2915 0.2998
0.2929 0.3013 0.3724 0.4010 0.4318 0.4882 0.5010
0.6371 0.6708 0.7639 0.8054 0.8427 0.8892 0.9031
0.8435 0.8745 0.9311 0.9527 0.9690 0.9836 0.9876
0.9358 0.9552 0.9812 0.9893 0.9944 0.9978 0.9986
0.0705 0.0599 0.0717 0.0675 0.0657 0.0708 0.0654
25 30 35 40 45 50 55
0.3276 0.4155 0.4990 0.5797 0.6545 0.7078 0.7677
0.2824 0.3515 0.4167 0.4876 0.5531 0.6021 0.6619
0.2226 0.2679 0.3099 0.3629 0.4102 0.4463 0.4963
0.1439 0.1615 0.1775 0.2028 0.2220 0.2354 0.2596
0.2710 0.3068 0.3527 0.3923 0.4322 0.4611 0.4997
0.6164 0.6857 0.7618 0.8104 0.8562 0.8852 0.9138
0.8366 0.8900 0.9367 0.9592 0.9764 0.9850 0.9914
0.9359 0.9651 0.9850 0.9923 0.9967 0.9983 0.9993
0.0700 0.0686 0.0678 0.0702 0.0692 0.0662 0.0672
29 35 41 47 53 59 65
0.3036 0.3572 0.4102 0.4619 0.5117 0.5592 0.6040
0.2681 0.3141 0.3603 0.4060 0.4508 0.4943 0.5361
0.2169 0.2513 0.2864 0.3217 0.3570 0.3922 0.4268
0.1425 0.1591 0.1762 0.1937 0.2114 0.2295 0.2478
0.2681 0.3141 0.3603 0.4060 0.4508 0.4943 0.5361
0.6820 0.7703 0.8382 0.8886 0.9249 0.9502 0.9676
0.9173 0.9604 0.9819 0.9921 0.9967 0.9986 0.9995
0.9837 0.9952 0.9987 0.9996 0.9999 1.0000 1.0000
0.0700 0.0700 0.0700 0.0700 0.0700 0.0700 0.0700
33 40 47 54 61 68 75
0.2736 0.3228 0.3722 0.4211 0.4689 0.5151 0.5592
0.2427 0.2847 0.3274 0.3701 0.4125 0.4541 0.4946
0.1983 0.2295 0.2616 0.2942 0.3270 0.3599 0.3926
0.1337 0.1488 0.1643 0.1802 0.1964 0.2130 0.2298
0.2594 0.3054 0.3518 0.3980 0.4434 0.4877 0.5304
0.6869 0.7783 0.8476 0.8979 0.9331 0.9571 0.9730
0.9295 0.9689 0.9871 0.9949 0.9981 0.9993 0.9998
0.9894 0.9974 0.9994 0.9999 1.0000 1.0000 1.0000
0.0700 0.0700 0.0700 0.0700 0.0700 0.0700 0.0700
37 45 53 61 69 77 85
0.2512 0.2967 0.3429 0.3891 0.4347 0.4794 0.5225
0.2238 0.2625 0.3022 0.3422 0.3823 0.4220 0.4610
0.1844 0.2131 0.2427 0.2729 0.3036 0.3345 0.3655
0.1272 0.1410 0.1552 0.1698 0.1848 0.2001 0.2157
0.2512 0.2967 0.3429 0.3891 0.4347 0.4794 0.5225
0.6876 0.7818 0.8524 0.9030 0.9379 0.9611 0.9761
0.9371 0.9741 0.9900 0.9964 0.9987 0.9996 0.9999
0.9925 0.9984 0.9997 0.9999 1.0000 1.0000 1.0000
0.0700 0.0700 0.0700 0.0700 0.0700 0.0700 0.0700
To accomplish this, we propose a combined procedure that makes use of both of the previous procedures. It is performed as follows. Procedure 3
For a fixed value of k and α, we first choose values of M and N0 in Procedures 1 and 2, respectively, with corresponding critical values D1 and D2 given in Tables 1 and 7, respectively. We then take one observation at a time, and continue sampling until the first time t when one of the following three events occurs:
(1) X(k)t − X(k−1)t ≥ D2,
(2) X(k)t = M,
(3) N0 observations have been taken.
If (1) occurs, then we stop sampling and reject H0. If (2) occurs, we perform the test in Procedure 1 and draw the appropriate conclusion. And if (3) occurs, then we stop sampling and accept H0. If any two, or all three, of the above events occur simultaneously, we perform the action corresponding to the lowest numbered event that occurs. Ideally, once k and α have been fixed, we would like to choose M and N0 so that the procedure that is optimal, for a particular value of λ, is the one that determines the outcome of the test described above most of the time. That is, when λ < 1 and Procedure 1 is optimal, we would like (2) to occur before (1) and (3), and (1) or (3) to occur before (2) when λ > 1. In Tables 14 and 15, we present the powers of Procedure 3 performed when Procedures 1 and 2 are used at the 5% and 10% significance levels, respectively. The values of M and N0 were chosen according to the considerations described above. In addition to the powers, we also present (in brackets below the powers) the percentage of times that the outcome was determined by Procedure 1, that is, the percentage of times that (2) occurred before (1) and (3), or that (2) and (3) occurred simultaneously, both before (1). The ideal situation would be for the percentages given in brackets to be close to 1 when λ < 1, and close to 0 when λ > 1. Indeed, we see this occurring for some of the larger values of M and N0 for each value of k.
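The stopping scheme above is straightforward to simulate. The following Python sketch is our own illustration, not the authors' code; in particular, it assumes, as a hypothetical reading, that the Procedure 1 test applied at event (2) rejects H0 when the largest cell count exceeds the smallest by at least D1 (the exact Procedure 1 statistic is defined earlier in the paper).

```python
import random

def procedure3_run(ps, M, N0, D1, D2, rng):
    """One run of the combined Procedure 3 (a sketch, not the authors' code).

    Events are checked in the stated order (1), (2), (3), so that when events
    coincide the lowest numbered one determines the action.  The Procedure 1
    test at event (2) is an assumed rule: reject when max count - min count
    >= D1.  Returns (reject_H0, decided_by_procedure_1).
    """
    counts = [0] * len(ps)
    for _ in range(N0):
        counts[rng.choices(range(len(ps)), weights=ps)[0]] += 1
        ordered = sorted(counts)
        if ordered[-1] - ordered[-2] >= D2:      # event (1): reject via Procedure 2
            return True, False
        if ordered[-1] == M:                     # event (2): apply the Procedure 1 test
            return ordered[-1] - ordered[0] >= D1, True
    return False, False                          # event (3): N0 reached, accept H0
```

Averaging the first component of the returned pair over many runs estimates the power; averaging the second estimates the bracketed percentages reported in Tables 14 and 15.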
For example, in the last line of k = 5 in Table 14, we see that Procedure 1 determines the outcome 91% of the time when λ = 1/5, while it determines the outcome only 15% of the time when λ = 5. Comparing the powers in this situation when λ < 1 to the optimal Procedure 1, we see that with a similar attained significance level (0.0252 for Procedure 3 compared with 0.0194 for Procedure 1), the powers of Procedure 3 are nearly as good as the optimal powers of Procedure 1. For example, when λ = 1/4, the power of Procedure 3 (0.4699) is quite close to that of Procedure 1 (0.4814). Furthermore, the power is clearly superior to that of Procedure 2 in the same situation (0.0531). When λ > 1, we compare the powers to those of the optimal Procedure 2. Both procedures have almost identical attained significance levels (0.0252 for Procedure 3 and 0.0254 for Procedure 2), and the powers of Procedure 3 are fairly close to the optimal powers of Procedure 2. For example, when λ = 5, the power of
S. Panchapakesan, A. Childs, B. H. Humphrey and N. Balakrishnan
Table 14
Powers of Procedure 3 at 5% significance level, for k = 3, 4, 5 and the listed pairs (N₀, M), at λ = 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5, together with the critical values D₂ and D₁; the percentage of times the outcome was determined by Procedure 1 is given in brackets below each power. [Tabulated entries not recoverable from the scan.]
Table 14 (Contd.)
Powers of Procedure 3 at 5% significance level, for k = 6, 7, 8. [Tabulated entries not recoverable from the scan.]
Table 15
Powers of Procedure 3 at 10% significance level, for k = 3, 4, 5, in the same layout as Table 14. [Tabulated entries not recoverable from the scan.]
Table 15 (Contd.)
Powers of Procedure 3 at 10% significance level, for k = 6, 7, 8. [Tabulated entries not recoverable from the scan.]
Table 16
Expected sample size when using Procedure 3 at 5% significance level, for k = 3, ..., 8 and the listed pairs (N₀, M), at λ = 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5. [Tabulated entries not recoverable from the scan.]
Table 17
Expected sample size when using Procedure 3 at 10% significance level, for k = 3, ..., 8 and the listed pairs (N₀, M), at λ = 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5. [Tabulated entries not recoverable from the scan.]
Procedure 3 (0.9473) is quite close to the optimal power (0.9952) of Procedure 2. Furthermore, the power is clearly superior to that of Procedure 1 (0.8558) in the same situation. Similar results hold at the 10% significance level and are given in Table 15. Aside from being nearly optimal in certain situations under both forms of the alternative hypothesis, the combined procedure has the further advantage of always having an expected sample size that is smaller than that of both Procedure 1 and Procedure 2. This can be seen by comparing the expected sample sizes for Procedure 3 given in Tables 16 and 17 (for the 5% and 10% significance levels, respectively) with the expected sample sizes given in Tables 5, 10 and 11 for Procedures 1 and 2.
6. Conclusions

In this paper, we have presented three alternatives to the standard χ²-test of homogeneity in a multinomial distribution. In each of these procedures, the sample size is not fixed. We have seen that Procedure 1 has a distinct advantage over the χ²-test in terms of expected sample size (with the power being comparable) when there is slippage to the left, while Procedure 2 has a very significant advantage when there is slippage to the right. In addition, the test statistic for each procedure is much simpler to compute than the χ²-test statistic. Therefore, if the form of the alternative hypothesis is known, we recommend use of one of the first two tests (whichever one is optimal under the given alternative hypothesis). But if the form of the alternative is not known, we recommend the use of Procedure 3 with the large values of M and N₀ given in Tables 14 and 15. This will provide a test that is nearly optimal under both forms of the alternative hypothesis.
Acknowledgements The second and fourth authors would like to thank the Natural Sciences and Engineering Research Council of Canada for funding this research.
Part IV Prediction
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Prediction of Order Statistics
Kenneth S. Kaminsky and Paul I. Nelson
1. Introduction
Let X = (X(1), X(2), ..., X(m))' and Y = (Y(1), Y(2), ..., Y(n))' be the order statistics of two independent random samples from the same family of continuous probability density functions {f(x|θ)}, where θ is an unknown parameter vector. Our main goal is to describe procedures where, having observed some of the components of X, say X₁ = (X(1), X(2), ..., X(r))', it is desired to predict functions of the remaining components of X, namely X₂ = (X(r+1), X(r+2), ..., X(m))', called the one sample problem, or of Y, called the two sample problem. Motivation for this type of prediction arises in life testing, where X represents the ordered lifetimes of m components simultaneously put on test. If the test is stopped after the r-th failure so that X₁ represents the only available data, we have a type II censored sample. In the one sample problem it may be of interest, for example, to predict: (i) X(m), the time at which all the components will have failed, (ii) a sample quantile X(s) of X₂, where s is the greatest integer in mλ, 0 < λ < 1, s > r, (iii) the mean failure time of the unobserved lifetimes in X₂. In the two sample problem, it may also be of interest to predict such functions of Y as: (i) the range, (ii) quartiles, (iii) the smallest lifetime. Prediction of order statistics can also be used to detect outliers or a change in the model generating the data. See for example Balasooriya (1989). We will describe interval and point prediction. Much of the past work on prediction intervals concerns approximations to complicated distributions of pivotals on a case by case basis. The availability of high speed computing has somewhat diminished the need for such approximations. Accordingly, we will focus on this computational approach to constructing prediction intervals. Patel (1989) gives an extensive survey of prediction intervals in a variety of settings, including order statistics.
While overlapping his review to some extent, we will for the most part complement it. Throughout, we use boldface to denote vectors and matrices and A' to denote the transpose of the matrix or vector A.
2. Prediction preliminaries

Since prediction of random variables has received less attention in the statistical literature than parameter estimation, we begin by giving a brief general description of point and interval prediction. Let U and W be vectors of random variables whose joint distribution (possibly independence) depends on unknown parameters θ. Having observed U = u, it is desired to predict T = T(W), some function of W. Let T̂ = T̂(U) be a function of U used to denote a generic predictor of T. Good choices for T̂ may be defined relative to specification of either the loss L(T̂, T) incurred when T̂ is used to predict T or some inference principle such as maximum likelihood. When the former approach is used, L(T̂, T) is typically some measure of distance between T̂ and T. An optimal choice for T̂ is then made by finding (if possible) that function which minimizes

E{L[T̂(U), T(W)]} , (2.1)

where "E" denotes expectation over all joint realizations of U and W. (Under squared-error loss, for example, this minimizer is the conditional expectation E[T(W) | U].) The set A = A(u) computed from the observed value of U is called a 1 − 2γ prediction region for T = T(W) if for all θ in the parameter space:
P_θ(T ∈ A(U)) = 1 − 2γ . (2.2)
This statement is to be interpreted in the following frequentist perspective. Let (Uᵢ, Wᵢ), i = 1, 2, ..., M, be independent copies of (U, W) and let N(M) denote the number of these for which Tᵢ = Tᵢ(Wᵢ) ∈ A(Uᵢ) holds. Then as M → ∞, N(M)/M → 1 − 2γ. Unlike the typical case of parameter estimation, where the true value of the parameter may never be known, an experimenter is often able to ascertain whether or not T lies in A(u). Thus, if an experimenter makes many forecasts where there is a real cost associated with making an incorrect prediction, it becomes important to control the ratio N(M)/M. To apply this general setting to the prediction of order statistics, we associate the vector U with the elements of X₁ = (X(1), X(2), ..., X(r))'. In the one sample problem we associate W with X₂ = (X(r+1), X(r+2), ..., X(m))', while in the two sample problem we associate W with Y. Specifically, we have:
U = X₁ = (X(1), X(2), ..., X(r))' ,
W = X₂ in the one sample problem, (2.3)
W = Y in the two sample problem .
In all cases we consider the function being predicted, T = T(W), to be linear with generic form:

T = Σᵢ κᵢ Wᵢ = κ'W , (2.4)

where the {κᵢ} are constants. Note that by taking all of the components of κ except one equal to zero, predictions can be made for individual components of X₂ and Y.
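The frequentist claim N(M)/M → 1 − 2γ in (2.2) can be checked by simulation in any tractable special case. The sketch below is my own toy example, not from the text: U and W are taken to be independent N(θ, 1) observations, so W − U ~ N(0, 2) and A(u) = [u − z√2, u + z√2] is an exact 1 − 2γ prediction interval for every θ (here γ = 0.025, z ≈ 1.96).

```python
import math
import random

def coverage(M=100_000, theta=3.0, seed=1):
    """Monte Carlo estimate of N(M)/M for the prediction interval
    A(u) = [u - z*sqrt(2), u + z*sqrt(2)], z = 1.96 (gamma = 0.025)."""
    rng = random.Random(seed)
    z = 1.959964
    hits = 0
    for _ in range(M):
        u = rng.gauss(theta, 1.0)   # observed U
        w = rng.gauss(theta, 1.0)   # future W, independent of U
        if abs(w - u) <= z * math.sqrt(2.0):
            hits += 1
    return hits / M
```

Because the pivot W − U has a distribution free of θ, the estimated long-run hit rate settles near 0.95 no matter what value of theta is used.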
3. Assumptions and notation

Most of the parametric work on prediction of order statistics assumes that the order statistics X and Y are based on random samples from a location-scale family of continuous probability density functions (pdf's) of the form:

f(x|θ) = (1/σ) g((x − μ)/σ) , (3.1)
where σ > 0 and the pdf g (known) generates a distribution with finite second moment. Some early papers assumed for convenience that one of these parameters was known. Except when μ is the left endpoint of the support of the distribution, as is the case when g is a standard exponential distribution, this assumption is unrealistic, and we will assume for the most part that both location and scale parameters are unknown. We will also describe the construction of nonparametric prediction intervals, where it is only assumed that the pdf of the underlying observations is continuous. Recall that we partition X' = (X₁', X₂'), where X₁ represents the first r observed order statistics. Denote the vectors of standardized order statistics Z_{X₁}, Z_{X₂}, and Z_Y by:

Z_{Xᵢ} = (Xᵢ − μ1ᵢ)/σ , i = 1, 2 ,
Z_Y = (Y − μ1_Y)/σ , (3.2)

where 1₁, 1₂ and 1_Y are columns of ones of appropriate dimension. The Z's have known distributions generated by the pdf g. We let α_{i,j} be the expected value of the i-th standardized order statistic of a sample of size j from (3.1), and express these as vectors:
α₁ = E(Z_{X₁}) = (α_{1,m}, α_{2,m}, ..., α_{r,m})' ,
α₂ = E(Z_{X₂}) = (α_{r+1,m}, α_{r+2,m}, ..., α_{m,m})' , (3.3)
α_Y = E(Z_Y) = (α_{1,n}, α_{2,n}, ..., α_{n,n})' .

Partition the variance-covariance matrix of X as:

Var(X) = σ² Var(Z_X) = σ² [ V₁,₁  V₁,₂
                            V₂,₁  V₂,₂ ] ≡ σ² V_X , (3.4)

and represent the covariance matrix of Y by:

Var(Y) = σ² Var(Z_Y) ≡ σ² V_Y . (3.5)
For s > r, let σ²ω_s denote the r-dimensional column vector of covariances between X₁ and X(s), so that:

Cov(X₁, X(s)) = σ²ω_s = σ² v₁,s , (3.6)

where v₁,s is the column of V₁,₂ corresponding to X(s). The covariances given in V_X and V_Y and the expected values in the α's specified above do not depend on unknown parameters. They can be obtained: (i) from tables in special cases, (ii) by averaging products of simulated standardized order statistics and appealing to the law of large numbers, (iii) by numerical integration. Note that X and Y are uncorrelated since they are independent. Prediction can also be carried out in a Bayesian setting. Let π(θ) denote a prior distribution on the parameter θ, π(θ|x₁) the posterior distribution of θ given x₁, and f(x₁, t|θ) the joint pdf of (X₁, T) conditional on θ. The predictive density of t is then given by:
f(t|x₁) = ∫ [f(x₁, t|θ)/f(x₁|θ)] π(dθ|x₁) . (3.7)
In the two sample case, T and X₁ are conditionally independent given θ, and (3.7) simplifies accordingly.
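The α's and V's needed above can be obtained by route (ii): simulation plus the law of large numbers. A minimal sketch for a standard normal g (the helper name is mine; for serious use, tables or numerical integration give more accuracy):

```python
import random

def simulate_moments(m, reps=20_000, seed=7):
    """Monte Carlo estimates of alpha_{i,m} = E(Z_(i)) and
    v_{ik} = Cov(Z_(i), Z_(k)) for the order statistics of m standard
    normal variates, by averaging over simulated samples."""
    rng = random.Random(seed)
    s1 = [0.0] * m
    s2 = [[0.0] * m for _ in range(m)]
    for _ in range(reps):
        z = sorted(rng.gauss(0.0, 1.0) for _ in range(m))
        for i in range(m):
            s1[i] += z[i]
            for k in range(m):
                s2[i][k] += z[i] * z[k]
    alpha = [s / reps for s in s1]
    V = [[s2[i][k] / reps - alpha[i] * alpha[k] for k in range(m)]
         for i in range(m)]
    return alpha, V
```

For m = 2, the exact values are α_{1,2} = −1/√π ≈ −0.564 and α_{2,2} = +1/√π, which the simulated means reproduce to about two decimals with a few tens of thousands of replications.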
4. Point prediction

When predicting future outcomes from past outcomes, whether in the context of order statistics or otherwise, prediction intervals and regions probably play a more important role than point prediction. Yet, just as there are times when we desire a point estimator of a parameter, accompanied, if possible, by an estimate of its precision, there are times when a point predictor of a future outcome may be preferred to an interval predictor. And when a point predictor is accompanied by an estimate of its precision (such as mean square error), it could be argued that little is lost in comparison with having intervals or regions. Point predictors of order statistics, and their errors, are the topics of this section.
4.1. Linear prediction

Using the notation established in Section 3, we assume that X₁ has been observed, and that X₂ and/or Y have yet to be observed (or are missing). Let

W(s) = X(s) in the one sample problem, Y(s) in the two sample problem,

and

W = X₂ in the one sample problem, W = Y in the two sample problem . (4.1)
We will concentrate on the prediction of a single component W(s) of W, although we could equally well predict all of W, or some linear combination of its components. Hence, for point prediction we will drop the subscript s and simply let Cov(X₁, X(s)) = ω_s = ω = (ω₁, ω₂, ..., ω_r)'. From Lloyd (1952), the BLUE estimators of μ and σ based on X₁, and their variances and covariance, are

μ̂ = −α₁' Γ X₁ ,   σ̂ = 1₁' Γ X₁ ,
Var(μ̂) = (α₁' Ω₁₁ α₁) σ²/Δ ,
Var(σ̂) = (1₁' Ω₁₁ 1₁) σ²/Δ , (4.2)
Cov(μ̂, σ̂) = −(1₁' Ω₁₁ α₁) σ²/Δ ,

where

Γ = Ω₁₁ (1₁α₁' − α₁1₁') Ω₁₁ / Δ ,
Δ = (1₁' Ω₁₁ 1₁)(α₁' Ω₁₁ α₁) − (1₁' Ω₁₁ α₁)² ,

and

Ω = [ Ω₁₁  Ω₁₂
      Ω₂₁  Ω₂₂ ] = V_X⁻¹ .
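Equation (4.2) is generalized least squares with design matrix (1₁, α₁) and weight matrix Ω₁₁. A direct numerical sketch (function name is mine; it solves the normal equations rather than forming Γ explicitly):

```python
import numpy as np

def lloyd_blue(x1, alpha1, V11):
    """BLUEs of (mu, sigma) from the first r order statistics x1, where
    E(X1) = mu*1 + sigma*alpha1 and Var(X1) = sigma^2 * V11 (Lloyd, 1952)."""
    x1 = np.asarray(x1, float)
    alpha1 = np.asarray(alpha1, float)
    A = np.column_stack([np.ones_like(alpha1), alpha1])  # design (1_1, alpha_1)
    Omega = np.linalg.inv(np.asarray(V11, float))        # Omega_11 = V_{1,1}^{-1}
    mu_hat, sigma_hat = np.linalg.solve(A.T @ Omega @ A, A.T @ Omega @ x1)
    return mu_hat, sigma_hat
```

For the standard exponential, the moments α_{i,m} = Σ_{j=1}^{i} 1/(m − j + 1) and v_{ik} = Σ_{j=1}^{min(i,k)} 1/(m − j + 1)² are available in closed form, which makes a convenient test case.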
A linear predictor Ŵ(s) = ξ'X₁ is called the best linear unbiased predictor (BLUP) of W(s) if

E(Ŵ(s)) = E(W(s)) (4.3)

and E(Ŵ(s) − W(s))² is a minimum, where ξ is a vector of constants. Note that in the context of the model (3.2), two constraints are imposed on ξ by (4.3), namely ξ'1₁ = 1 and ξ'α₁ = α_{s,k}, where k = m in the one sample case and k = n in the two sample case. Goldberger (1962) derived the best linear unbiased predictor of the regression parameters in the generalized linear model (i.e., the linear model with correlated errors). Kaminsky and Nelson (1975a) first applied Goldberger's result to the order statistics of model (3.2) for the prediction of the components of X₂ and Y, or for linear combinations of X₂ and Y. Combining the one and two sample problems, the BLUP of W(s) can be written

Ŵ(s) = X̂(s) = (μ̂ + σ̂ α_{s,m}) + ω' Ω₁₁ (X₁ − μ̂1₁ − σ̂α₁) in the one sample problem,
Ŵ(s) = Ŷ(s) = μ̂ + σ̂ α_{s,n} in the two sample problem, (4.4)
where 2 ≤ r < s ≤ m in the one sample problem, and 2 ≤ r ≤ m, 1 ≤ s ≤ n in the two sample problem. The simpler two sample predictor results from the fact that, by our independence assumption, Cov(X₁, Y(s)) = 0. The mean square error (mse) of Ŵ(s) is
mse(Ŵ(s)) = σ² {v_{ss} − ω'Ω₁₁ω + c₁₁} in the one sample case,
mse(Ŵ(s)) = σ² {α₁'Ω₁₁α₁ + α²_{s,n} 1₁'Ω₁₁1₁ − 2α_{s,n} 1₁'Ω₁₁α₁}/Δ in the two sample case, (4.5)

where c₁₁ = Var{(1 − ω'Ω₁₁1₁)μ̂ + (α_{s,m} − ω'Ω₁₁α₁)σ̂}/σ². We point out that in the two sample case, the BLUP of Y(s) is just the BLUE of E(Y(s)), and we call Ŷ(s) the expected value predictor of Y(s). In the one sample case, similar simplifications occur to the BLUP of X(s) for a fairly wide class of underlying pdf's. To see what these simplifications are, we state some results from Kaminsky and Nelson (1975b).
Regularity conditions. The underlying standardized pdf g is such that the limits

lim_{m→∞} α_{r,m} = p₁ ,   lim_{m→∞} α_{s,m} = p₂ (4.6)

exist, and

lim_{m→∞} (m · v_{r,s}) = λ₁(1 − λ₂)/[g(p₁) g(p₂)] , (4.7)

where 0 < λ₁ < λ₂ < 1, r = ⟨(m + 1)λ₁⟩, s = ⟨(m + 1)λ₂⟩, pᵢ = G⁻¹(λᵢ), i = 1, 2, and ⟨a⟩ denotes the closest positive integer to a. Conditions under which the above limits hold may be found in Blom (1958) and van Zwet (1964). We are now ready to state the theorem which characterizes the conditions under which the BLUP of X(s) simplifies to the BLUE of its expected value:
THEOREM 4.1. Consider the order statistics of a random sample of size $m$ from (3.1) and suppose that regularity conditions (4.6) and (4.7) are satisfied. Then the following are equivalent:
(i) The (standardized) distribution of the random sample is a member of the family $\mathscr{G} = \{G_1, G_2, G_3\}$, where
\[
G_1(x) = x^{c}, \quad 0 < x < 1, \quad c > 0 ; \qquad
G_2(x) = e^{x}, \quad x < 0 ; \qquad
G_3(x) = (-x)^{-c}, \quad x < -1, \quad c > 0 .
\]
(ii) For all $r, s$ and $m$, $2 \le r < s \le m$, the BLUP of $X_{(s)}$ and the BLUE of $E(X_{(s)})$ are identical (i.e., $\omega_s'\Omega_{11}^{-1}(X_1 - \hat{\mu}\mathbf{1} - \hat{\sigma}\alpha_1) \equiv 0$).
(iii) For all $r$ and $m$ ($2 \le r \le m$), the BLUE of $E(X_{(r)})$ is $X_{(r)}$ itself.
Another characterization based on the form of the BLUP of $X_{(s)}$ is contained in the following theorem:
Prediction of order statistics
THEOREM 4.2. Assume regularity conditions (4.6) and (4.7) are satisfied. Then
(a) The BLUP of $X_{(s)}$ does not depend (functionally) on $\mu$ if this parameter is considered known, or on $\hat{\mu}$ if it is not, if and only if the underlying standardized distribution is $G_4(x) = 1 - e^{-x}$, $x > 0$.
(b) The BLUP of $X_{(s)}$ does not depend (functionally) on $\sigma$ if this parameter is considered known, or on $\hat{\sigma}$ if it is not, if and only if the underlying standardized distribution is either $G_5(x) = 1 - x^{-c}$, $x > 1$, $c > 0$, or $G_6(x) = 1 - (-x)^{c}$, $-1 < x < 0$, $c > 0$.
For the proofs of Theorems 4.1 and 4.2, the reader is referred to Kaminsky and Nelson (1975b). The proofs are based on results of Watson (1972). It is not difficult to show, with the help of Malik (1966, 1967), that for any of the distributions in the family $\{G_i\}$, $i = 1, 2, \ldots, 6$, we have

\[
\omega_s'\Omega_{11}^{-1} = (0, 0, \ldots, 0, q) ,
\tag{4.8}
\]

where $q = \omega_i/v_{ir}$, $i = 1, 2, \ldots, r$ (the ratio being the same for each $i$). This means that for any member of this family, the BLUP of $X_{(s)}$ takes the form

\[
\hat{X}_{(s)} = q\,X_{(r)} + (1 - q)\,\hat{\mu} + (\alpha_{s,m} - q\,\alpha_{r,m})\,\hat{\sigma} .
\tag{4.9}
\]

In fact, in examining (4.9) and Theorems 4.1 and 4.2, we remark further that

if $G \in \mathscr{G}$, then $\hat{X}_{(s)} = \hat{\mu} + \hat{\sigma}\alpha_{s,m}$;
if $G = G_4$, then $\hat{X}_{(s)} = X_{(r)} + (\alpha_{s,m} - \alpha_{r,m})\hat{\sigma}$; \hfill (4.10)
if $G = G_5$ or $G_6$, then $\hat{X}_{(s)} = (\alpha_{s,m}/\alpha_{r,m})\,X_{(r)} + (1 - \alpha_{s,m}/\alpha_{r,m})\,\hat{\mu}$.

A linear predictor $\tilde{W}_{(s)} = \tilde{\kappa}'X_1$ is called the best linear invariant predictor (BLIP) of $W_{(s)}$ if $E(\tilde{W}_{(s)} - W_{(s)})^2$ is a minimum in the class of linear predictors of $W_{(s)}$ whose mean square error is proportional to $\sigma^2$. This condition imposes the constraint $\tilde{\kappa}'\mathbf{1} = 1$ on the coefficient vector $\tilde{\kappa}$. Recall that if in addition $\tilde{\kappa}'\alpha_1 = \alpha_{s,k}$, where $k = m$ in the one sample case and $k = n$ in the two sample case, then $\tilde{\kappa}'X_1$ is an unbiased predictor of $W_{(s)}$. Thus, the mean square error of the BLIP of $W_{(s)}$ never exceeds that of the BLUP. On the other hand, unbiasedness is lost for the BLIP. Kaminsky, Mann and Nelson (1975) derive the BLIP of the components of $X_2$ in the model (3.2). The BLIP $\tilde{W}_{(s)}$ is

\[
\tilde{W}_{(s)} =
\begin{cases}
\tilde{X}_{(s)} = \hat{X}_{(s)} - \dfrac{c_{12}}{1 + c_{22}}\,\hat{\sigma} & \text{in the one sample problem,}\\[1.5ex]
\tilde{Y}_{(s)} = \hat{Y}_{(s)} - \dfrac{d_{12}}{1 + d_{22}}\,\hat{\sigma} & \text{in the two sample problem,}
\end{cases}
\tag{4.11}
\]
where $c_{12} = \mathrm{Cov}\{\hat{\sigma},\, (1 - \omega_s'\Omega_{11}^{-1}\mathbf{1})\hat{\mu} + (\alpha_{s,m} - \omega_s'\Omega_{11}^{-1}\alpha_1)\hat{\sigma}\}/\sigma^2$, $c_{22} = d_{22} = \mathrm{Var}(\hat{\sigma})/\sigma^2$, and $d_{12} = \mathrm{Cov}\{\hat{\sigma},\, \hat{\mu} + \alpha_{s,n}\hat{\sigma}\}/\sigma^2$. The mean square error of $\tilde{W}_{(s)}$ is

\[
\mathrm{mse}(\tilde{W}_{(s)}) =
\begin{cases}
\mathrm{mse}(\hat{X}_{(s)}) - c_{12}^2\sigma^2/(1 + c_{22}) & \text{in the one sample case,}\\
\mathrm{mse}(\hat{Y}_{(s)}) - d_{12}^2\sigma^2/(1 + d_{22}) & \text{in the two sample case.}
\end{cases}
\tag{4.12}
\]
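The exponential ($G_4$) case of the simplified BLUP in (4.10) is easy to compute directly. The sketch below is illustrative only: it assumes a one parameter exponential sample ($\mu = 0$), uses the exact means $\alpha_{r,m} = \sum_{j=1}^{r} (m-j+1)^{-1}$ of standard exponential order statistics, and takes $\hat{\sigma} = S_r/r$ (total time on test over $r$); the sample sizes and seed are arbitrary choices, not from the text.

```python
import random

def alpha(r, m):
    # E(Z_(r)) for a standard exponential sample of size m:
    # alpha_{r,m} = sum_{j=1}^{r} 1/(m - j + 1)
    return sum(1.0 / (m - j + 1) for j in range(1, r + 1))

def blup_exponential(x_obs, m, s):
    """BLUP of X_(s) from the first r order statistics x_obs of an
    exponential sample (mu = 0), via the G4 case of (4.10):
    X_(s)-hat = X_(r) + (alpha_{s,m} - alpha_{r,m}) * sigma_hat,
    with sigma_hat = S_r / r (total time on test over r)."""
    r = len(x_obs)
    s_r = sum(x_obs) + (m - r) * x_obs[-1]
    sigma_hat = s_r / r
    return x_obs[-1] + (alpha(s, m) - alpha(r, m)) * sigma_hat

random.seed(1)
m, r, s, sigma = 20, 10, 15, 2.0
sample = sorted(random.expovariate(1.0 / sigma) for _ in range(m))
pred = blup_exponential(sample[:r], m, s)
print("X_(r) =", round(sample[r - 1], 3),
      "BLUP of X_(s) =", round(pred, 3),
      "actual X_(s) =", round(sample[s - 1], 3))
```

Since $\alpha_{s,m} > \alpha_{r,m}$ and $\hat{\sigma} > 0$, the predictor always moves upward from the last observed failure, as it must.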
These quantities can be found with the help of (4.2). Kaminsky and Nelson (1975a) derive the asymptotically best linear unbiased predictor (ABLUP) of $X_{(s)}$ based on $k$ sample quantiles. They find that the ABLUP of $X_{(s)}$ always takes a form similar to (4.9), where the quantities $q$, $\hat{\mu}$, $\hat{\sigma}$, $\alpha_{r,m}$ and $\alpha_{s,m}$ are replaced by their asymptotic counterparts. Kaminsky and Rhodin (1978) find that the prediction information contained in the largest observed order statistic $X_{(r)}$ in large samples is relatively high over a wide range of failure distributions. Nagaraja (1984) obtains the ABLUP and the asymptotically best linear invariant predictor (ABLIP) of $X_{(s)}$ based upon $X_{(1)}, X_{(2)}, \ldots, X_{(r)}$, where $r$ and $s$ are both small relative to the sample size, $m$. This is done as follows. Assume that, appropriately normalized, $X_{(1)}$ converges in law to a nondegenerate distribution function $G$. That is, there are sequences of constants $c_m$ and $d_m$ such that

\[
P\{(X_{(1)} - c_m)/d_m \le x\} = 1 - [1 - F(c_m + d_m x)]^m \to G(x)
\tag{4.14}
\]

as $m \to \infty$. Then the order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(r)}$ behave (approximately) as the first $r$ upper record values from $G(\mu + \sigma x)$, where $\mu = c_m$ and $\sigma = d_m$. The problem of predicting $X_{(s)}$ then reduces to predicting the $s$th upper record value from the first $r$ record values from $G$, a problem already studied by others. Nagaraja (1986) used two criteria as alternatives to mean square error to study prediction error in predicting $X_{(s)}$ from $X_{(1)}, X_{(2)}, \ldots, X_{(r)}$, in sampling from the two-parameter exponential distribution. These criteria were probability of nearness (PN) and probability of concentration (PC). A predictor $X^{(1)}_{(s)}$ has higher probability of nearness (PN) to $X_{(s)}$ than predictor $X^{(2)}_{(s)}$ if

\[
P\{|X^{(1)}_{(s)} - X_{(s)}| < |X^{(2)}_{(s)} - X_{(s)}|\} > 0.5 .
\tag{4.15}
\]

We say that $X^{(1)}_{(s)}$ is preferable to $X^{(2)}_{(s)}$ in the PN sense if (4.15) holds. A predictor $X^{(1)}_{(s)}$ has higher probability of concentration (PC) around $X_{(s)}$ than predictor $X^{(2)}_{(s)}$ if

\[
P\{|X^{(1)}_{(s)} - X_{(s)}| < c\} \ge P\{|X^{(2)}_{(s)} - X_{(s)}| < c\}
\tag{4.16}
\]

for all $c > 0$, with strict inequality for at least one $c$. When (4.16) holds, we say that $X^{(1)}_{(s)}$ is preferable to $X^{(2)}_{(s)}$ in the PC sense. Nagaraja's (1986) study of prediction error via these criteria and mean square error was restricted to the two-parameter exponential distribution. It would be useful to expand such a study to other failure distributions.
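The PN criterion (4.15) is easy to estimate by Monte Carlo. The sketch below compares two hypothetical competitors in the exponential case, the BLUP of $X_{(s)}$ against the naive predictor $X_{(r)}$; the sample sizes, seed and choice of competitors are illustrative assumptions, not Nagaraja's actual study design.

```python
import random

def alpha(r, m):
    # expected value of the r-th standard exponential order statistic
    return sum(1.0 / (m - j + 1) for j in range(1, r + 1))

def pn_estimate(trials=20000, m=20, r=10, s=15, seed=2):
    """Monte Carlo estimate of the probability of nearness (4.15) that
    the exponential BLUP of X_(s) beats the naive predictor X_(r)."""
    rng = random.Random(seed)
    a_gap = alpha(s, m) - alpha(r, m)
    wins = 0
    for _ in range(trials):
        x = sorted(rng.expovariate(1.0) for _ in range(m))
        sigma_hat = (sum(x[:r]) + (m - r) * x[r - 1]) / r
        blup = x[r - 1] + a_gap * sigma_hat
        if abs(blup - x[s - 1]) < abs(x[r - 1] - x[s - 1]):
            wins += 1
    return wins / trials

p = pn_estimate()
print("estimated PN of the BLUP over X_(r):", round(p, 3))
```

An estimate above 0.5 means the BLUP is preferable in the PN sense for this configuration.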
4.2. Other point predictors

Kaminsky and Rhodin (1985) apply the principle of maximum likelihood to the prediction of order statistics. We will once again assume that we want to predict a single component,

\[
W_{(s)} =
\begin{cases}
X_{(s)} & \text{in the one sample problem,}\\
Y_{(s)} & \text{in the two sample problem,}
\end{cases}
\]
where the sampling structure is the same as in Section 4.1 except that we no longer restrict sampling to location and scale families. Specifically, consider the joint pdf, $f(x_1, w_{(s)}; \theta)$, of $X_1$ and $W_{(s)}$, where $\theta$ is an unknown vector of parameters. Viewed as a function of $w_{(s)}$ and $\theta$, define

\[
L(w_{(s)}, \theta; x_1) = f(x_1, w_{(s)}; \theta)
\tag{4.17}
\]

to be the predictive likelihood function (PLF) of $w_{(s)}$ and $\theta$. Let $\hat{W}_{(s)}$ and $\theta^*$ be statistics depending on $X_1$ for which

\[
L(\hat{W}_{(s)}, \theta^*; x_1) = \sup_{(w_{(s)},\, \theta)} f(x_1, w_{(s)}; \theta) .
\tag{4.18}
\]

We call $\hat{W}_{(s)}$ the maximum likelihood predictor (MLP) of $W_{(s)}$ and $\theta^*$ the predictive maximum likelihood estimator (PMLE) of $\theta$. Kaminsky and Rhodin (1985) obtain some uniqueness results and derive the MLP's for several distributions. Raqab and Nagaraja (1992) prove that in the one sample case with $\theta$ known, the MLP is an unbiased predictor if the mode equals the mean in the conditional distribution of $X_{(s)}$ given $X_{(r)}$. This condition does not hold for the one parameter exponential distribution, where the MLP is negatively biased. Raqab and Nagaraja (1992) also show that when $X_{(r)}$ and $X_{(s)}$ are sample quantiles, the mean square prediction error of the MLP based on samples from a one parameter exponential distribution or a uniform distribution, both with known scale parameter, converges to zero as the sample size $m \to \infty$. However, general results for maximum likelihood prediction of order statistics rivaling results for maximum likelihood estimation of parameters do not exist at this time. Takada (1991) develops the general form of a predictor $\delta(W_{(s)})$ based on a sample from a location-scale family which is better than both the BLUP and BLIP according to the PN criterion given in (4.15). The predictor $\delta(W_{(s)})$ is median unbiased in the sense that $P(\delta(W_{(s)}) \le W_{(s)}) = P(\delta(W_{(s)}) \ge W_{(s)})$ for all values of the location and scale parameters $(\mu, \sigma)$. Takada (1991) computes $\delta(W_{(s)})$ for samples from the two parameter exponential distribution. Raqab and Nagaraja (1992) also define a predictor $\eta(W_{(s)})$ to be the median of the conditional distribution of $X_{(s)}$ given $X_{(r)}$, which typically depends on unknown parameters. Let $\hat{\eta}(W_{(s)})$ denote this predictor with parameters replaced by estimators. Raqab and Nagaraja (1992) derive $\hat{\eta}(W_{(s)})$ for the one parameter exponential and uniform distributions and show by a small scale simulation that it performs well in terms of mse.
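With $\theta$ known, maximizing the PLF (4.17) in $w_{(s)}$ reduces to maximizing the conditional density of $X_{(s)}$ given $X_{(r)}$. The sketch below does this numerically for the one parameter exponential case, where (by the memoryless property) $D = X_{(s)} - X_{(r)}$ is distributed as the $(s-r)$th order statistic of $m-r$ iid exponentials, and checks the grid answer against the mode $\sigma\ln((m-r)/(m-s+1))$ obtained by differentiating; the grid search and parameter values are illustrative, not the authors' procedure.

```python
import math

def log_cond_density(d, r, s, m, sigma=1.0):
    """Log conditional density (up to a constant) of D = X_(s) - X_(r)
    given X_(r) for a one parameter exponential sample: by the memoryless
    property D is the (s-r)th order statistic of m-r iid exponentials."""
    if d <= 0:
        return -math.inf
    u = math.exp(-d / sigma)
    return (s - r - 1) * math.log(1 - u) - (m - s + 1) * d / sigma

def mlp_increment(r, s, m, sigma=1.0, grid_n=200000, d_max=10.0):
    # maximize the predictive likelihood in w = x_(r) + d by grid search
    best_d, best_val = None, -math.inf
    for k in range(1, grid_n):
        d = k * d_max / grid_n
        v = log_cond_density(d, r, s, m, sigma)
        if v > best_val:
            best_d, best_val = d, v
    return best_d

r, s, m = 10, 15, 20
d_grid = mlp_increment(r, s, m)
d_mode = math.log((m - r) / (m - s + 1))   # mode in closed form
print(round(d_grid, 4), round(d_mode, 4))
```

The MLP is then $X_{(r)} + d$, where $d$ is the maximizing increment; its being the conditional mode (below the conditional mean here) is exactly the source of the negative bias noted above.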
Adatia and Chan (1982) use adaptive methods to estimate the unknown parameters and predict future order statistics in sampling from the Weibull distribution. Their procedures assume an unknown scale parameter and a shape parameter which is also unknown, but assumed to lie in a restricted interval. Balasooriya and Chan (1983) and Balasooriya (1987) used the cross-validatory method of Stone (1974) and Geisser (1975) to predict future order statistics in samples from Weibull distributions and gamma distributions, respectively.
5. Interval prediction
5.1. Intervals based on pivotals
Suppose in the general setting of Section 2 we want to predict $T = T(W)$ from $\hat{T}(U)$ and that $Q = \psi(T, \hat{T})$ is a pivotal. In the parametric case this means that for data generated by a pdf $f$ as given in (3.1), $Q$ has a distribution free of $(\mu, \sigma)$, and in the nonparametric case the distribution of $Q$ is the same for all continuous densities. In either case, a $1 - 2\gamma$ prediction region for $T$ is obtained by finding a set $B$ that contains $Q$ with probability $1 - 2\gamma$ and then inverting:

\[
1 - 2\gamma = P(Q \in B) = P(T \in A(U)) ,
\tag{5.1}
\]
yielding $A(U)$ as a $1 - 2\gamma$ prediction region. The concept of invariance allows this procedure to be used for location-scale families. See Appendix G of Lawless (1982) for a concise discussion of this concept. The estimators $(\hat{\mu}(X_1), \hat{\sigma}(X_1))$ computed from $X_1$ are called equivariant if for all constants $c$ and $d$, $d > 0$,

\[
\hat{\mu}(c\mathbf{1} + dX_1) = c + d\hat{\mu}(X_1) , \qquad
\hat{\sigma}(c\mathbf{1} + dX_1) = d\hat{\sigma}(X_1) .
\tag{5.2}
\]

BLUE's, maximum likelihood estimators and Mann's (1969) best linear invariant estimators (BLIE's) based on samples from (3.1) are equivariant. We consider the problem of predicting linear combinations of unobserved random variables so that, using $\kappa$ to denote a generic vector of constants, $T = \kappa'X_2$ in the one sample problem and $T = \kappa'Y$ in the two sample problem. For any equivariant estimators $(\hat{\mu}, \hat{\sigma})$, the following are pivotals:

\[
(\hat{\mu} - \mu)/\hat{\sigma} , \qquad \hat{\sigma}/\sigma .
\tag{5.3}
\]
Let $\hat{X}_{(s)} = a_s'X_1$ be a linear unbiased predictor of $X_{(s)}$, $r < s \le m$, let $\hat{Y}_{(s)} = b_s'X_1$ be a linear unbiased predictor of $Y_{(s)}$, $1 \le s \le n$, and let $\hat{\sigma}$ be an equivariant estimator of $\sigma$. Form arrays of predicted values and predicting coefficients denoted by:

\[
\hat{X}_2 = (\hat{X}_{(r+1)}, \hat{X}_{(r+2)}, \ldots, \hat{X}_{(m)})' , \qquad
\hat{Y} = (\hat{Y}_{(1)}, \hat{Y}_{(2)}, \ldots, \hat{Y}_{(n)})' ,
\]
\[
A = (a_{r+1}, a_{r+2}, \ldots, a_m) , \qquad
B = (b_1, b_2, \ldots, b_n) .
\tag{5.4}
\]

We obtain from (3.2) and (5.3) that the following are pivotals:

\[
Q_{X,\kappa} = \kappa'(X_2 - \hat{X}_2)/\hat{\sigma} = (\sigma/\hat{\sigma})\,\kappa'(Z_{X_2} - A'Z_{X_1}) , \qquad
Q_{Y,\kappa} = \kappa'(Y - \hat{Y})/\hat{\sigma} = (\sigma/\hat{\sigma})\,\kappa'(Z_Y - B'Z_{X_1}) ,
\tag{5.5}
\]

where $Z$ denotes the corresponding vector of standardized variables (e.g., $Z_{X_1} = (X_1 - \mu\mathbf{1})/\sigma$).
Let $Q^{\delta}_{X,\kappa}$ and $Q^{\delta}_{Y,\kappa}$ represent the $100\delta$ percentage points of the distributions of $Q_{X,\kappa}$ and $Q_{Y,\kappa}$, respectively:

\[
P(Q_{X,\kappa} \le Q^{\delta}_{X,\kappa}) = \delta , \qquad
P(Q_{Y,\kappa} \le Q^{\delta}_{Y,\kappa}) = \delta .
\tag{5.6}
\]

We again note that for a given location-scale family the percentage points $Q^{\delta}_{X,\kappa}$ and $Q^{\delta}_{Y,\kappa}$ can easily be found by simulating data from the standardized pdf $g(x)$ as defined in (3.1) and computing $Q_{X,\kappa}$ and $Q_{Y,\kappa}$. Iterate this procedure a large number of times and invert the empirical distribution functions to estimate the desired percentage points. Some authors, e.g., Lawless (1971) and Likeš (1974), have used other versions of $Q_{X,\kappa}$ and $Q_{Y,\kappa}$ of the form $k_1(X_1)Q_{X,\kappa} + k_2(X_1)$ and $k_3(X_1)Q_{Y,\kappa} + k_4(X_1)$, where the $\{k_j\}$ are random variables free of unknown parameters. In either case, level $1 - 2\gamma$ prediction intervals are then given by:

\[
\text{for } \kappa'X_2: \quad
[\kappa'\hat{X}_2 + \hat{\sigma}Q^{\gamma}_{X,\kappa},\ \kappa'\hat{X}_2 + \hat{\sigma}Q^{1-\gamma}_{X,\kappa}] ,
\tag{5.7}
\]
\[
\text{for } \kappa'Y: \quad
[\kappa'\hat{Y} + \hat{\sigma}Q^{\gamma}_{Y,\kappa},\ \kappa'\hat{Y} + \hat{\sigma}Q^{1-\gamma}_{Y,\kappa}] .
\tag{5.8}
\]
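The simulate-and-invert recipe just described can be sketched as follows for the two sample problem in a normal location-scale family; the choice of equivariant estimators (sample mean and standard deviation), the sample sizes, and the replication count are all illustrative assumptions.

```python
import random
import statistics

def pivot_quantiles(m, n, s, gamma=0.05, reps=5000, seed=3):
    """Monte Carlo percentage points of the pivotal
    Q = (Y_(s) - mu_hat)/sigma_hat for the two sample normal problem,
    with equivariant estimators mu_hat = sample mean and sigma_hat =
    sample standard deviation computed from the first sample.
    Simulating with mu = 0, sigma = 1 suffices because Q is a pivotal."""
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(m)]
        y = sorted(rng.gauss(0, 1) for _ in range(n))
        qs.append((y[s - 1] - statistics.mean(x)) / statistics.stdev(x))
    qs.sort()
    return qs[int(gamma * reps)], qs[int((1 - gamma) * reps) - 1]

m, n, s = 30, 10, 8
q_lo, q_hi = pivot_quantiles(m, n, s)

# 90% prediction interval for Y_(8) computed from an observed sample x:
rng = random.Random(4)
x = [2.0 + 0.5 * rng.gauss(0, 1) for _ in range(m)]
mu_hat, sd_hat = statistics.mean(x), statistics.stdev(x)
print((round(mu_hat + sd_hat * q_lo, 3), round(mu_hat + sd_hat * q_hi, 3)))
```

The same loop works for any standardized density $g$ and any equivariant $(\hat{\mu}, \hat{\sigma})$; only the sampling line changes.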
Let $[L_X(X_1), U_X(X_1)]$ and $[L_Y(X_1), U_Y(X_1)]$ denote the intervals computed from $X_1$ given in (5.7) and (5.8), respectively. One sided lower $1 - \gamma$ prediction intervals corresponding to the two-sided intervals above are:

\[
[L_X(X_1), \infty) , \qquad [L_Y(X_1), \infty) .
\tag{5.9}
\]
5.2. Intervals based on best linear predictors

Optimality regarding the choice of which unbiased linear predictors $\{\hat{X}_{(s)}\}$ and $\{\hat{Y}_{(s)}\}$ to use in (5.7) and (5.8) has not been established in general. However, the BLUP's provide a reasonable basis on which to make this choice. From (4.4) they have the form:

\[
\hat{X}_{(s)}(\hat{\mu}, \hat{\sigma}) = \hat{\mu} + \alpha_{s,m}\hat{\sigma} + \omega_s'\Omega_{11}^{-1}(X_1 - \hat{\mu}\mathbf{1} - \alpha_1\hat{\sigma}) , \qquad
\hat{Y}_{(s)}(\hat{\mu}, \hat{\sigma}) = \hat{\mu} + \alpha_{s,n}\hat{\sigma} ,
\tag{5.10}
\]

where $\hat{\mu}$ and $\hat{\sigma}$ are the BLUE's of $\mu$ and $\sigma$. Using any equivariant estimators $\hat{\mu}$ and $\hat{\sigma}$, pivotals based on the form of the predictors for individual order statistics in (5.10) are (using (5.3)) given by:
\[
Q_{X,s}(\hat{\mu}, \hat{\sigma}) = \frac{X_{(s)} - \hat{X}_{(s)}(\hat{\mu}, \hat{\sigma})}{\hat{\sigma}}
= \frac{\sigma}{\hat{\sigma}}\left(Z_{X,s} - \omega_s'\Omega_{11}^{-1}Z_{X_1}\right)
- \frac{\hat{\mu} - \mu}{\hat{\sigma}}\left(1 - \omega_s'\Omega_{11}^{-1}\mathbf{1}\right)
- \left(\alpha_{s,m} - \omega_s'\Omega_{11}^{-1}\alpha_1\right) ,
\]
\[
Q_{Y,s}(\hat{\mu}, \hat{\sigma}) = \frac{Y_{(s)} - \hat{Y}_{(s)}(\hat{\mu}, \hat{\sigma})}{\hat{\sigma}}
= \frac{\sigma}{\hat{\sigma}}\,Z_{Y,s} - \frac{\hat{\mu} - \mu}{\hat{\sigma}} - \alpha_{s,n} ,
\tag{5.11}
\]

where $Z_{X,s} = (X_{(s)} - \mu)/\sigma$, $Z_{Y,s} = (Y_{(s)} - \mu)/\sigma$, and $Z_{X_1} = (X_1 - \mu\mathbf{1})/\sigma$. Expression (5.11), whose right hand sides involve only the pivotals (5.3) and standardized variables, verifies our statement that these functions are distributed free of $\mu$ and $\sigma$. Letting $Q^{\delta}_{X,s}(\hat{\mu}, \hat{\sigma})$ and $Q^{\delta}_{Y,s}(\hat{\mu}, \hat{\sigma})$ denote the corresponding $100\delta$ percentage points, prediction intervals constructed from (5.6) and (5.11) parallel those in (5.7) and (5.8):

\[
\text{for } \kappa'X_2: \quad
[\kappa'\hat{X}_2(\hat{\mu}, \hat{\sigma}) + \hat{\sigma}Q^{\gamma}_{X,\kappa},\ \kappa'\hat{X}_2(\hat{\mu}, \hat{\sigma}) + \hat{\sigma}Q^{1-\gamma}_{X,\kappa}] ,
\tag{5.12}
\]
\[
\text{for } \kappa'Y: \quad
[\kappa'\hat{Y}(\hat{\mu}, \hat{\sigma}) + \hat{\sigma}Q^{\gamma}_{Y,\kappa},\ \kappa'\hat{Y}(\hat{\mu}, \hat{\sigma}) + \hat{\sigma}Q^{1-\gamma}_{Y,\kappa}] .
\tag{5.13}
\]
5.3. The exponential distribution

Some analytic results for the above procedure have been obtained for the exponential model $f(x \mid \mu, \sigma) = (1/\sigma)\exp(-(x - \mu)/\sigma)$ for $x > \mu$, and $f(x \mid \mu, \sigma) = 0$ elsewhere. When $\mu$ is known to equal zero, these results are derived from the fact that the scaled spacings $\{2(m - i + 1)(X_{(i)} - X_{(i-1)})/\sigma\}$ are distributed as independent chi-square variates. Let $S_r = \sum_{i=1}^{r} X_{(i)} + (m - r)X_{(r)}$, which is $r$ times the BLUE of $\sigma$. For predicting $X_{(s)}$ from $X_1$ with $\mu$ known and taken as zero, Lawless (1971) derived the distribution of the pivotal

\[
R(X_1, X_{(s)}) = (X_{(s)} - X_{(r)})/S_r
\tag{5.14}
\]

as:

\[
P(R(X_1, X_{(s)}) \ge t) = \frac{1}{B(s - r,\, m - s + 1)} \sum_{i=0}^{s-r-1} \binom{s-r-1}{i} \frac{(-1)^i\,[1 + (m - s + i + 1)t]^{-r}}{m - s + i + 1} ,
\tag{5.15}
\]

for all $t > 0$, where $B(a, b) = (a - 1)!(b - 1)!/(a + b - 1)!$. Percentage points $\{R_{\delta}(X_1, X_{(s)})\}$, $0 < \delta < 1$, of the distribution given in (5.15) can be approximated by a Newton-Raphson iteration, yielding a $1 - 2\gamma$ prediction interval for $X_{(s)}$ of the form:

\[
[X_{(r)} + R_{\gamma}(X_1, X_{(s)})S_r,\ X_{(r)} + R_{1-\gamma}(X_1, X_{(s)})S_r] .
\tag{5.16}
\]
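The survival function (5.15) is a finite alternating sum, so its percentage points are easy to obtain numerically. The sketch below evaluates (5.15) and inverts it by bisection (used here in place of the Newton-Raphson iteration mentioned above, purely for robustness); the values of $r$, $s$, $m$, $X_{(r)}$ and $S_r$ are illustrative stand-ins, not data from the text.

```python
import math

def surv_R(t, r, s, m):
    """P(R >= t) as in (5.15) for the pivotal R = (X_(s) - X_(r))/S_r,
    exponential sample of size m with mu = 0."""
    if t <= 0:
        return 1.0
    d = s - r
    # 1/B(s-r, m-s+1) with B(a, b) = (a-1)!(b-1)!/(a+b-1)!
    const = math.gamma(m - r + 1) / (math.gamma(d) * math.gamma(m - s + 1))
    total = 0.0
    for i in range(d):
        lam = m - s + i + 1
        total += math.comb(d - 1, i) * (-1) ** i * (1 + lam * t) ** (-r) / lam
    return const * total

def r_quantile(delta, r, s, m, lo=0.0, hi=100.0, tol=1e-10):
    # bisection for R_delta satisfying P(R <= R_delta) = delta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - surv_R(mid, r, s, m) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r, s, m, gam = 10, 15, 20, 0.05
x_r, S_r = 0.85, 9.3   # illustrative observed values of X_(r) and S_r
lo_q, hi_q = r_quantile(gam, r, s, m), r_quantile(1 - gam, r, s, m)
print("90% interval for X_(15):",
      (round(x_r + lo_q * S_r, 3), round(x_r + hi_q * S_r, 3)))
```

A quick sanity check is that the survival function equals 1 at $t = 0$ and decreases in $t$, which the alternating-sum identity guarantees.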
Kaminsky and Nelson (1974) show that the percentage points of the distribution given in (5.15) can be approximated by scaled percentage points of an appropriate F distribution. This F-approximation follows from using a Satterthwaite type approximation to the numerator and denominator of $R(X_1, X_{(s)})$. They show that this approximation can also be used when the prediction is based on a subset of $X_1$. An approximate $1 - 2\gamma$ prediction interval for $X_{(s)}$ based on $X_1$ is given by:

\[
[X_{(r)} + A_1 F_{\gamma}(a, b)\hat{\sigma},\ X_{(r)} + A_1 F_{1-\gamma}(a, b)\hat{\sigma}] ,
\tag{5.17}
\]

where $a = 2A_1^2/A_2$, $A_i = \sum_{j=r+1}^{s} (m - j + 1)^{-i}$, $i = 1, 2$, $b = 2r$, $F_{\delta}(a, b)$ is the $100\delta$ percentage point of an F-distribution with $a$ and $b$ degrees of freedom, and $\hat{\sigma}$ is the BLUE of $\sigma$ based on $X_1$. Likeš (1974) extended Lawless's (1971) result to an exponential family with unknown location parameter $\mu$. However, the exact results are quite complicated and given implicitly. Instead, we recommend using simulation as outlined above. Lawless (1977) gives explicit formulas for prediction intervals for $Y_{(s)}$ in the two sample problem in the same setting. Abu-Salih et al. (1987) generalize prediction from the one parameter exponential to samples from a mixture of exponential pdf's of the form $f(x \mid \sigma_1, \sigma_2, \beta) = \beta(1/\sigma_1)\exp(-x/\sigma_1) + (1 - \beta)(1/\sigma_2)\exp(-x/\sigma_2)$. The mixing proportion $\beta$ is assumed known. They derive an exact prediction interval for $X_{(s)}$ when $\sigma_1/\sigma_2$ is known. Their formulas are fairly complicated. Their intervals can also be obtained through simulation.
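For the unknown-$\mu$ case just mentioned, the simulation route is straightforward because $Q = (X_{(s)} - X_{(r)})/\hat{\sigma}$ is free of both parameters. The sketch below simulates its percentage points for the two-parameter exponential model; the censoring configuration, replication count and estimator $\hat{\sigma}$ (the usual unbiased estimator from a type II censored sample) are illustrative assumptions rather than Likeš's exact construction.

```python
import random

def sim_Q(r, s, m, reps=20000, seed=5):
    """Simulate the pivotal Q = (X_(s) - X_(r))/sigma_hat for the
    two-parameter exponential model. Q is location- and scale-free, so
    mu = 0, sigma = 1 may be used in the simulation."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        x = sorted(rng.expovariate(1.0) for _ in range(m))
        sigma_hat = (sum(x[i] - x[0] for i in range(r))
                     + (m - r) * (x[r - 1] - x[0])) / (r - 1)
        out.append((x[s - 1] - x[r - 1]) / sigma_hat)
    out.sort()
    return out

r, s, m, gam = 10, 15, 20, 0.05
q = sim_Q(r, s, m)
q_lo, q_hi = q[int(gam * len(q))], q[int((1 - gam) * len(q)) - 1]
print("interval: [X_(r) + %.3f*sigma_hat, X_(r) + %.3f*sigma_hat]" % (q_lo, q_hi))
```

The resulting multipliers plug directly into the template of (5.12) for the single order statistic $X_{(s)}$.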
5.4. Optimality

Takada (1985, 1993) obtains some optimality results in the one sample problem for the one sided version of the prediction interval given in (5.16), based on samples from an exponential distribution with known location parameter. Consider the class $C(1 - \gamma)$ of lower $1 - \gamma$ prediction intervals for $X_{(s)}$ of the type given in (5.9) which are invariant in the sense that for any positive constant $c$, $L_X(cX_1) = cL_X(X_1)$. Takada (1985) showed that the conditional mean length $E(X_{(s)} - L_X(X_1) \mid X_{(s)} \ge L_X(X_1))$ is minimized over $C(1 - \gamma)$, for all $\sigma$, by the lower bound of Lawless's (1971) interval. In the same setting, Takada (1993) also shows that $\delta(X_1) = X_{(r)} + R_{\gamma}(X_1, X_{(s)})S_r$, the lower bound of Lawless's (1971) interval as given in (5.16), minimizes $P(X_{(s)} > \delta(X_1) + a)$ for all positive constants $a$ and all values of the scale parameter $\sigma$, among all invariant lower $1 - \gamma$ prediction limits $\delta(X_1)$. Takada (1993) calls such optimal intervals uniformly most accurate equivariant. This type of optimality minimizes the probability that the value being predicted is more than any specified distance above the lower endpoint of the prediction interval.
5.5. Adaptive and distribution free intervals

Suppose that the exact form of the standardized density given in (3.1) is not known, but we are willing to assert that it lies in some specified finite collection $\mathscr{G}$ of pdf's. In such a situation, Ogunyemi and Nelson (1996) proposed a two stage procedure. In the first stage, $X_1$, the available data, is used to select a density $g$ from $\mathscr{G}$, and the same data are then used in the second stage to construct prediction intervals from (5.12) and (5.13) via simulation. This procedure can be used for both the one and two sample problems. Fully nonparametric (for any continuous pdf) prediction intervals in the two sample problem for individual components $Y_{(s)}$ were given by Fligner and Wolfe (1976). They showed, using the probability integral transform and basic properties of uniform order statistics, that for $1 \le i < j \le m$,

\[
P(X_{(i)} < Y_{(s)} < X_{(j)}) = \sum_{k=i}^{j-1} \binom{s+k-1}{k}\binom{n-s+m-k}{m-k} \Bigg/ \binom{m+n}{m} \equiv \Lambda .
\tag{5.18}
\]

Thus, $(X_{(i)}, X_{(j)})$ provides a $100\Lambda\%$ prediction interval for $Y_{(s)}$. For samples from a discrete distribution the right hand side of (5.18) provides a lower bound on the coverage probability. Fligner and Wolfe (1979), for $m$ and $n$ large and $m/(m+n)$ not close to 1, approximate the coverage probability $\Lambda$ in (5.18) in terms of the standard normal CDF $\Phi$ by $\Phi(A_j) - \Phi(A_i)$, where

\[
A_k = \frac{(m(n+2))^{0.5}(n+1)}{(s(n-s+1)(n+m+1))^{0.5}}\left(\frac{k - 0.5}{m} - \frac{s}{n+1}\right) , \qquad k = i, j .
\]

These nonparametric intervals are easy to use and perform well in predicting middle sample quantiles. However, if $m$ and $n$ are very different, the possible levels of coverage are very limited. For example, if $s$ is close to $n$ and $m/n$ is small, the coverage rate in (5.18) will be close to zero. See Patel (1989) for a further discussion of nonparametric prediction intervals.
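The exact coverage (5.18) is a short combinatorial sum and can be computed directly; the sketch below does so (sample sizes and indices are illustrative), together with the sanity check that summing over all $m + 1$ possible positions of $Y_{(s)}$ among the $X$'s gives total probability 1.

```python
from math import comb

def coverage(i, j, s, m, n):
    """Exact nonparametric coverage P(X_(i) < Y_(s) < X_(j)) of (5.18):
    sum over k, the number of X's below Y_(s), of
    C(s+k-1, k) * C(n-s+m-k, m-k) / C(m+n, m)."""
    return sum(comb(s + k - 1, k) * comb(n - s + m - k, m - k)
               for k in range(i, j)) / comb(m + n, m)

m, n, s = 15, 10, 5
total = coverage(0, m + 1, s, m, n)   # summing over all m+1 positions
cov = coverage(3, 13, s, m, n)        # coverage of (X_(3), X_(13)) for Y_(5)
print(round(total, 12), round(cov, 3))
```

Scanning achievable pairs $(i, j)$ with such a routine makes the limitation noted above concrete: only a discrete set of coverage levels is attainable, and for extreme $s$ with small $m/n$ all of them may be low.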
5.6. Bayesian prediction

Specification of a prior on the unknown parameter $\theta$ allows the derivation of a predictive distribution for the quantity being predicted, as given in (3.7). The frequentist interpretation of prediction regions given in Section 2 does not apply to probabilities obtained from this distribution, and inferences drawn from it can be highly dependent on the prior chosen. The use of what are called noninformative priors provides some level of objectivity. See Geisser (1993) for a full treatment of the application of the Bayesian paradigm to prediction. Dunsmore (1974) applied the predictive distribution to life testing and proposed constructing a $1 - 2\gamma$ highest posterior density region for $T$ of the form:

\[
A = \{t : f(t \mid x_1) > b\} ,
\tag{5.19}
\]

where the constant $b$ is chosen by the requirement that $\int_A f(t \mid x_1)\,dt = 1 - 2\gamma$. Dunsmore derives these regions for samples from the one and two parameter exponential distribution with a variety of priors. From the Bayesian perspective the region given in (5.19) would be interpreted as a $1 - 2\gamma$ prediction region for $T$. In some cases, with noninformative priors, Dunsmore's regions turn out to be identical to frequentist prediction regions.
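Given any predictive density evaluated on a grid, the region (5.19) can be approximated by keeping the highest-density cells until the required mass accumulates. The sketch below is generic; the Lomax predictive density used to exercise it (which arises, for example, for exponential data under a gamma prior) and its hyperparameters are hypothetical stand-ins, not Dunsmore's worked cases.

```python
def hpd_region(pdf, grid, mass=0.90):
    """Grid approximation to the highest predictive density region (5.19):
    accumulate grid cells in decreasing order of density until the
    requested predictive mass is reached; report the extremes kept."""
    dt = grid[1] - grid[0]
    cells = sorted(((pdf(t), t) for t in grid), reverse=True)
    acc, kept = 0.0, []
    for f, t in cells:
        if acc >= mass:
            break
        acc += f * dt
        kept.append(t)
    return min(kept), max(kept)

# Lomax density f(t) = (a/lam)(1 + t/lam)^-(a+1) with illustrative a, lam:
a, lam = 3.0, 2.0
pdf = lambda t: (a / lam) * (1 + t / lam) ** (-(a + 1))
grid = [i * 0.001 for i in range(40000)]   # covers [0, 40)
lo, hi = hpd_region(pdf, grid, 0.90)
c_exact = lam * (0.10 ** (-1.0 / a) - 1)   # exact upper endpoint here
print(round(lo, 3), round(hi, 3), round(c_exact, 3))
```

Because this density is decreasing, the HPD region is an interval $[0, c]$, and the grid answer can be checked against the closed-form $c$ obtained by inverting the Lomax CDF.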
Calabria and Pulcini (1994) consider the two sample problem where the data are generated by an inverse Weibull distribution. Specifically, $1/X$ has a Weibull distribution with parameters $\alpha$ and $\beta$. They derive the predictive distribution of $Y_{(s)}$ for a noninformative prior of the form $\pi(\alpha, \beta) = c/(\alpha\beta)$, $\alpha > 0$, $\beta > 0$. They also use a log inverse gamma prior on $P(X > t)$, for some specified $t$. This is converted into a conditional prior on $\alpha$ given $\beta$ and finally into a joint prior by placing a noninformative prior on $\beta$. Lingappaiah (1983) placed a normal prior on $\mu$ and an independent noninformative prior on $\sigma^2$ when observations are from a normal distribution with mean $\mu$ and variance $\sigma^2$. He derived the predictive distribution of $Y_{(s)}$. Lingappaiah (1978) obtained predictive distributions for future order statistics based on several random samples from a gamma family with pdf $f(x \mid \alpha, \sigma) = x^{\alpha-1}\exp(-x/\sigma)/(\Gamma(\alpha)\sigma^{\alpha})$, $x > 0$, with shape parameter $\alpha$ being a known integer, by placing a gamma prior on $1/\sigma \equiv \theta$. There is an initial sample from which all the data are used to construct a posterior for $\theta$. He also supposes that $p$ additional samples resulting in selected order statistics are available, the $k_i$th order statistic being retained from sample $i$, $1 \le i \le p$. The posterior for $\theta$ is successively updated by using it as a prior with the next sample. Suppose now it is desired to predict $Y_{(s)}$, the $s$th order statistic obtained from a future random sample from the same underlying gamma distribution. Lingappaiah (1978) obtains the predictive distribution of $Y_{(s)}$ by using the posterior of $\theta$ as a prior in the expression given in (3.7). The formulas are quite complicated unless all the order statistics are minima. Lingappaiah (1978) shows that the variance of the predictive distribution decreases as $p$ increases and more data become available. Lingappaiah (1986) follows a similar program for samples from the power family of pdf's of the form:

\[
f(x \mid \theta) = \theta x^{\theta - 1} , \qquad 0 < x < 1 , \quad \theta > 0 ,
\tag{5.20}
\]

with an exponential prior on $\theta$. He also allows censoring of both the smallest and largest order statistics and obtains predictive distributions for ratios of the form $Y_{(n - s_2 + 1)}/Y_{(s_1 + 1)}$. Predictive distributions are also discussed below in the context of samples that allow for outliers and a shift in models.
5.7. Multiple future samples

Let $X, Y_1, Y_2, \ldots, Y_p$ be vectors of order statistics obtained from independent random samples of sizes $m, n_1, n_2, \ldots, n_p$ from a continuous pdf. Use $Y(i, j)$ to denote the $j$th component of $Y_i$, $j = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, p$. Given indices $\{q_i,\ q_i \le n_i\}$, based on $X$ it is desired to construct lower prediction intervals for $\{Y(i, n_i - q_i + 1)\}$ of the form $\{I_i = [L_i(X), \infty),\ i = 1, 2, \ldots, p\}$ so that the probability that all $p$ intervals are simultaneously correct is $1 - \gamma$. Note that if the $i$th interval actually contains $Y(i, n_i - q_i + 1)$, then at least $q_i$ of the components of $Y_i$ lie in $I_i$. In this setting, Chou and Owen (1986a) give a complex distribution free expression for the probability of simultaneous coverage of intervals of the form $\{[X_{(q)}, \infty),\ i = 1, 2, \ldots, p\}$ in terms of the multivariate hypergeometric distribution. A trial and error process must be used to attain a value close to the desired $1 - \gamma$, if this is possible. If $q_i > n_i$ for some $i$, this procedure cannot be used.

Chou (1988) assumes that the observations are selected from a one parameter exponential distribution with unknown scale parameter $\sigma$. The functions $\{Y(i, n_i - q_i + 1)/\bar{X}\}$ are pivotals, where $\bar{X}$ is the mean of $X$. Thus, the joint coverage probability $P(t)$ of intervals of the form $\{I_i(t) = [t\bar{X}, \infty)\}$ does not depend on $\sigma$. Chou (1988) gives an explicit expression for the joint coverage of the intervals $\{I_i(t)\}$ as a $p$-fold alternating sum

\[
P(t) = \sum_{j_1=q_1}^{n_1} \sum_{j_2=q_2}^{n_2} \cdots \sum_{j_p=q_p}^{n_p} (\cdots)\,(-1)^i\,[1 + (J + i)t/m]^{-m} ,
\]

where $K = \sum_{i=1}^{p} n_i$ and $J = \sum_{i=1}^{p} j_i$. $P(t)$ is a decreasing function of $t$ with $P(0) = 1$. Therefore, there is a unique solution, call it $t_0$, to the equation $P(t) = 1 - \gamma$, allowing iterative construction of simultaneous $1 - \gamma$ intervals. Chou (1988) shows that $P(t) \ge (1 + Kt/m)^{-m}$. This implies that $t_0 \ge (m/K)((1 - \gamma)^{-1/m} - 1)$. Chou and Owen (1986b) obtained similar results for samples from a normal distribution with unknown mean $\mu$ and standard deviation $\sigma$. Let $\hat{\sigma}$ denote the unbiased estimate of standard deviation computed from $X$. Here, $\{(\bar{X} - Y(i, n_i - q_i + 1))/\hat{\sigma}\}$ are pivotals leading to intervals of the form $\{[\bar{X} - t\hat{\sigma}, \infty)\}$. Chou and Owen (1986b) table values of $t$ for $p = 2$; $1 - \gamma = 0.90, 0.95$; $n_1 = n_2 = n = 20(10)80$; $q_1 = q_2 = 2$ and $q_1 = q_2 = 3$; $m = 2(1)15, 20, 25, 30, 40, 50, \infty$. Values for $t$ under different settings and for different location scale families can be found by simulation.
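Chou's lower bound on $t_0$ is a one-line computation, and the joint coverage $P(t)$, being free of $\sigma$, can be checked by simulation with $\sigma = 1$. The sketch below does both for an illustrative configuration (two future samples, arbitrary sizes and seed); since the bound is attained at a $t$ no larger than $t_0$, the simulated coverage there should be at least $1 - \gamma$.

```python
import random

def t_lower_bound(m, K, gamma):
    """Chou's (1988) bound t0 >= (m/K)((1 - gamma)^(-1/m) - 1) on the
    multiplier t0 solving P(t0) = 1 - gamma."""
    return (m / K) * ((1 - gamma) ** (-1.0 / m) - 1)

def joint_coverage(t, m, ns, q_list, reps=20000, seed=7):
    """Monte Carlo estimate of P(t): the probability that every interval
    [t*xbar, inf) simultaneously contains Y(i, n_i - q_i + 1).
    P(t) is free of sigma, so simulate with sigma = 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(m)) / m
        ok = True
        for n_i, q_i in zip(ns, q_list):
            y = sorted(rng.expovariate(1.0) for _ in range(n_i))
            if y[n_i - q_i] < t * xbar:   # Y(i, n_i - q_i + 1) fell below t*xbar
                ok = False
                break
        hits += ok
    return hits / reps

m, ns, q_list, gam = 20, [10, 10], [2, 2], 0.10
t0 = t_lower_bound(m, sum(ns), gam)
jc = joint_coverage(t0, m, ns, q_list)
print("lower bound on t0:", round(t0, 4), "joint coverage there:", round(jc, 3))
```

In practice one would increase $t$ from this bound (by bisection, say) until the simulated or exact $P(t)$ drops to $1 - \gamma$.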
5.8. Outliers and model shifts

Although robustness is potentially an important issue for prediction of order statistics, relatively little work has been done on this problem. Investigating situations where the underlying data are not the realizations of identically distributed random variables is a good place to start. Observations may not have the same distribution because of a systematic shift in the model or the occurrence of outliers. Lingappaiah (1991a) introduced a particularly tractable version of a shift in model scenario. Let $O = \{X_i,\ i = 1, 2, \ldots, N\}$ be independent random variables with one parameter exponential distributions. Partition the components of $O$ into $k$ successive blocks of random variables, where the $i$th block consists of $n_i$ elements, each having mean $i\sigma$, $\sigma$ unknown, $\sum_{i=1}^{k} n_i = N$. Let $Y$ be independent of $O$ and consist of the order statistics obtained from a random sample of size $n$ from a one parameter exponential distribution with mean $k\sigma$. The goal is to construct a prediction interval for $Y_{(s)}$. Lingappaiah (1991a) places an exponential prior with mean 1 on $1/\sigma$ and obtains the predictive distribution $H(u, s) = P(Y_{(s)} > u \mid O)$ in terms of the quantity $S = 1 + \sum_{i=1}^{k} iS_i$, where $S_i$ is the sum of the observations in the $i$th block, $i = 1, 2, \ldots, k$. For example, $H(u, 1) = [S/(S + nku)]^{N+1}$, so that a $1 - \gamma$ lower predictive interval for $Y_{(1)}$ is given by $[u_0, \infty)$, where $u_0$ is the solution to $H(u_0, 1) = 1 - \gamma$. The cases where $s > 1$ are more complicated but can be handled similarly. Lingappaiah (1985) allowed a more general shift model and used the posterior from previous stages to serve as the prior for the next stage.

The presence of outliers can cause serious distortions for both estimation and prediction. Dixit (1994) considered the possibility that the available data contain some outliers. Specifically, let $X = (X_{(1)}, X_{(2)}, \ldots, X_{(m)})'$ and $Y = (Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)})'$ be collections of order statistics constructed by sorting independent random variables $\{X_i\}$ and $\{Y_j\}$ constructed as follows. Both $\{X_i\}$ and $\{Y_j\}$ have distributions in the Weibull family with pdf's of the form:

\[
f(x \mid \theta, \delta, \beta) = \beta\theta\delta x^{\beta - 1} \exp[-\delta\theta x^{\beta}] , \qquad x > 0 .
\tag{5.22}
\]

The $\{X_i\}$ are distributed independently with $\delta = 1$. The $\{Y_j\}$ are independent, with $k$ components having pdf with $\delta \ne 1$. These $k$ random variables represent outliers. The remaining $n - k$ components have $\delta = 1$. It is not known which components are outliers, but the values of $\delta$ and $k$ are assumed known. Dixit (1994) places a gamma prior on $\theta$ of the form:

\[
\pi(\theta \mid a, h) = h(\theta h)^{a-1}\exp(-\theta h)/\Gamma(a) , \qquad \theta > 0 ,
\tag{5.23}
\]

where $a$ and $h$ are specified hyperparameters. The goal here is to construct a predictive interval for the future order statistic $Y_{(s)}$, which may or may not be an outlier. Since the estimator $\tilde{\theta} = \sum_{i=1}^{r} X_{(i)}^{\beta} + (m - r)X_{(r)}^{\beta}$, the total time on test of the $\beta$th power of the available data, is sufficient for $\theta$, the predictive interval may conveniently be based on $\tilde{\theta}$. Dixit (1994) obtains an expression for $H(u, s) \equiv P(Y_{(s)} > u \mid X)$, the predictive probability that $Y_{(s)}$ exceeds $u$. Things simplify when predicting $Y_{(1)}$. An upper $1 - \gamma$ predictive interval for $Y_{(1)}$ is given by $(0, b)$, where $b = [(h + \tilde{\theta})/(\delta k + n - k)][\gamma^{-1/(a+r)} - 1]$. Dixit (1994) also extends these results to situations where several samples are available. In related work, Lingappaiah (1989a) allowed a single outlier with mean $(1/\sigma) + \delta$ in samples from a one parameter exponential distribution with mean $\sigma$. Using a gamma prior on $1/\sigma$ he obtained the predictive distribution for $X_{(r)}$ in terms of confluent hypergeometric functions. Also see Lingappaiah (1989c) for predictive distributions of maxima and minima in the one and multiple sample problems based on data from an exponential distribution with outliers. Lingappaiah (1989b) considered samples from a generalized logistic distribution with pdf of the form:

\[
f(x \mid b, \delta) = c\,e^{-x}/(1 + e^{-x})^{c+1} ,
\tag{5.24}
\]

where $c = b\delta$ or $b + \delta$. In both cases $c \ne b$ corresponds to an outlier, with $\delta$ a known value. A gamma prior is placed on $b$ and the predictive distribution for $Y_{(s)}$ based on an independent random sample from (5.24) with $c = b$ is obtained. Lingappaiah (1991b) derived the distribution of $X_{(s)} - X_{(r)}$ in samples from a gamma distribution with shape parameter $b$, both shape and scale parameters assumed known, in the presence of a single outlier with shape parameter shifted a
known amount, $b + \delta$. This distribution can be used to construct a prediction interval for $X_{(s)}$ based on $X_{(r)}$.
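The shift-model predictive distribution $H(u, 1) = [S/(S + nku)]^{N+1}$ described above inverts in closed form, since $H(u_0, 1) = 1 - \gamma$ gives $u_0 = S((1-\gamma)^{-1/(N+1)} - 1)/(nk)$. The one-line sketch below implements this; the data summary $S$ and the other inputs are hypothetical illustrative values.

```python
def lower_predictive_limit(S, n, k, N, gamma):
    """Invert H(u0, 1) = [S/(S + n*k*u0)]^(N+1) = 1 - gamma in closed
    form, giving the 1 - gamma lower predictive interval [u0, inf) for
    Y_(1) in the shift model."""
    return S * ((1 - gamma) ** (-1.0 / (N + 1)) - 1) / (n * k)

# illustrative (hypothetical) inputs:
S, n, k, N, gam = 50.0, 10, 3, 30, 0.05
u0 = lower_predictive_limit(S, n, k, N, gam)
H = (S / (S + n * k * u0)) ** (N + 1)   # plug back in to verify
print(round(u0, 5), round(H, 6))
```

Plugging $u_0$ back into $H(\cdot, 1)$ recovers $1 - \gamma$, confirming the inversion; the $s > 1$ cases would require numerical root-finding instead.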
6. Concluding remarks

Much has been learned about the prediction of order statistics in the past 25 years. Linear point predictors and prediction intervals based on samples from location-scale families have been extensively studied. Relatively little is known about prediction based on samples from more general families of distributions, nonlinear prediction, and optimality. Future research in these and other areas will undoubtedly expand our knowledge of this interesting problem.
References

Abu-Salih, M. S., M. S. Ali Khan and K. Husein (1987). Prediction intervals of order statistics for the mixture of two exponential distributions. Aligarh J. Statist. 7, 11-22.
Adatia, A. and L. K. Chan (1982). Robust procedures for estimating the scale parameter and predicting future order statistics of the Weibull distribution. IEEE Trans. Reliability R-31(5), 491-499.
Balasooriya, Uditha (1987). A comparison of the prediction of future order statistics for the 2-parameter gamma distribution. IEEE Trans. Reliability R-36(5), 591-594.
Balasooriya, Uditha (1989). Detection of outliers in the exponential distribution based on prediction. Commun. Statist. - Theory Meth., 711-720.
Balasooriya, Uditha and K. Lai Chan (1983). The prediction of future order statistics in the two-parameter Weibull distributions - a robust study. Sankhyā B 45(3), 320-329.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. Almqvist and Wiksell, Uppsala, Sweden; Wiley, New York.
Calabria, R. and D. Pulcini (1994). Bayes 2-sample prediction for the inverse Weibull distribution. Commun. Statist. - Theory Meth. 23(6), 1811-1824.
Chou, Youn-Min (1988). One-sided simultaneous prediction intervals for the order statistics of l future samples from an exponential distribution. Commun. Statist. - Theory Meth. 17(11), 3995-4003.
Chou, Youn-Min and D. B. Owen (1986a). One-sided distribution free and simultaneous prediction limits for p future samples. J. Qual. Tech. 18, 96-98.
Chou, Youn-Min and D. B. Owen (1986b). One-sided simultaneous lower prediction intervals for l future samples from a normal distribution. Technometrics 28(3), 247-251.
Dixit, Ulhas J. (1994). Bayesian approach to prediction in the presence of outliers for a Weibull distribution. Metrika 41, 127-136.
Dunsmore, I. R. (1974). The Bayesian predictive distribution in life testing models. Technometrics 16(3), 455-460.
Fligner, M. A. and D. A. Wolfe (1976). Some applications of sample analogues to the probability integral transformation and a coverage probability. Amer. Statist. 30, 78-85.
Fligner, M. A. and D. A. Wolfe (1979). Methods for obtaining a distribution-free prediction interval for the median of a future sample. J. Qual. Tech. 11, 192-198.
Geisser, S. (1975). The predictive sample reuse method with application. JASA 70, 320-328.
Geisser, S. (1993). Predictive Inference: An Introduction. Chapman Hall, New York.
Goldberger, A. S. (1962). Best linear unbiased prediction in the generalized regression model. JASA 57, 369-375.
Kaminsky, K. S. and P. I. Nelson (1974). Prediction intervals for the exponential distribution using subsets of the data. Technometrics 16(1), 57-59.
Prediction of order statistics
449
Kaminsky, K. S. and P. I. Nelson (1975a). Best linear unbiased prediction of order statistics in location and scale families. JASA 70(349), 145-150. Kaminsky, K. S. and P. I. Nelson (1975b). Characterization of distributions by the form of the predictors of order statistics. In: G. P. Patil et al. ed., Statistical Decisions in Scientific Work 3, 113115. Kaminsky, K. S., N. R. Mann and P. I. Nelson (1975). Best and simplified prediction of order statistics in location and scale families. Biometrika 62(2), 525 527. Kaminsky, K. S. and L. S. Rhodin (1978). The prediction information in the latest failure. JASA 73, 863-866. Kaminsky, K. S. and L. S. Rhodin (1985). Maximum likelihood prediction. AnInStMa 37, 507-517. Lawless, J. F. (1971). A prediction problem concerning samples from the exponential distribution with application in life testing. Technometrics 13(4), 725-729. Lawless, J. F. (1977). Prediction intervals for the two parameter exponential distribution. Technometrics 19(4), 469472. Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. Wiley, New York. Like~, J. (1974). Prediction of sth ordered observation for the two-parameter exponential distribution. Technometries 16(2), 241-244. Lingappaiah, G. S. (1978). Bayesian approach to the prediction problem in gamma population. Demonstratio Mathematica 11(4), 907420. Lingappaiah, G. S. (1983). Prediction in samples from a normal population. J. Statist. Res. 17 (1, 2), 43 50. Lingappaiah, G. S. (1985). A study of shifting models in life tests via Bayesian approach using semi-orused priors (Soups). Ann. Inst. Statist. Math. 37(A), 151-163. Lingappaiah, G. S. (1986). Bayesian approach to prediction in censored samples from the power function population. J. Bihar Math. Soc. 10, 60-70. Lingappaiah, G. S. (1989a). Prediction in life tests based on an exponential distribution when outliers are present. Statistica 49(4), 585 593. Lingappaiah, G. S. (1989b). 
Prediction in samples from a generalized logistic population of first or second kind when an outlier is present. Rev. Mat. Estat., Sao Paulo, 7, 87-95. Lingappaiah, G. S. (1989c). Bayes prediction of maxima and minima in exponential life tests in the presence of outliers. J. Indust. Math. Soc. 39(2), 169-182. Lingappaiah, G. S. (1991a). Prediction in exponential life tests where average lives are successively increasing. Pak. J. Statist. 7(1), 33-39. Lingappaiah, G. S. (1991b). Prediction in samples from a gamma population in the presence of an outlier. Bull. Malaysian Soc. (Second Series), 14, 1-14. Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95. Malik, H. J. (1966). Exact moments of order statistics from the Pareto distribution. Skandinavisk Aktuarietidskrift 49, 3-4, 144-157. Malik, H. J. (1967). Exact moments of order statistics from a power-function distribution. Skandinavisk Aktuarietidskrift 50, 3 4 , 64-69. Mann, N. R. (1968). Optimum estimators for linear functions of location and scale parameters. Ann. Math. Stat. 40, 2149 55. Nagaraja, H. N. (1984). Asymptotic linear prediction of extreme order statistics. Ann. Inst. Statist. Math. 289-299. Nagaraja, H. N. (1986). Comparison of estimators and predictors from two-parameter exponential distribution. Sankhy~ Ser. B 48(1), 10-18. Ogunyemi, O. T. and P. I. Nelson (1996). Adaptive and exact prediction intervals for order statistics. Commun. Statist. B., 1057 1074. Patel, J. K. (1989). Prediction intervals a review. Commun. Statist. Theory Meth. 18(7), 2393-2465. Raqab, M. Z. and H. N. Nagaraja, (1992). On some predictors of future order statistics. Tech. Report. No. 488, Ohio State Univ.
450
K. S. Kaminsky and P. L Nelson
Stone, M. (1974). Cross-validatory choice and assessment of statistical prediction (with discussion). J. Roy. Stat. Soc. B 36, 111-147. Takada, Y. (1985). Prediction limit for observation from exponential distribution. Canad. J. Statist. 13(4). 325-330. Takada, Y. (1991). Median unbiasedness in an invariant prediction problem. Stat. and Prob. Lett. 12, 281-283. Takada, Y (1993). Uniformly most accurate equivariant prediction limit. Metrika 40, 51-61. Van Zwet, W. R. (1964). Convex Transformations of Random Variables. Mathematisch Centrum, Amsterdam. Watson, G. S. (1972). Prediction and the efficiency of least squares. Biometrika 59, 91-98.
Part V Goodness-of-fit Tests
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
16
The Probability Plot: Tests of Fit Based on the Correlation Coefficient
R. A. Lockhart and M. A. Stephens
1. Introduction
1.1. The probability plot

Suppose a random sample X₁, X₂, …, X_n comes from a distribution F₀(x) and let X_(1), X_(2), …, X_(n) be the order statistics. F₀(x) may be of the form F(w) with w = (x − α)/β; α is then the location parameter and β > 0 is the scale parameter of F₀(x). There may be other parameters in F(w), for example, a shape parameter; here we assume such parameters known, but α and β are unknown. We can suppose that the random sample of X-values has been constructed from a random sample w₁, w₂, …, w_n from F(w), by the transformation
x_i = α + β w_i .   (1)
If the order statistics of the w-sample are w_(1) < w_(2) < ⋯ < w_(n), we have also

x_(i) = α + β w_(i) .   (2)
Let m_i be E(w_(i)) and let v_ij be E(w_(i) − m_i)(w_(j) − m_j); let V be the n × n matrix with entries v_ij. V is the covariance matrix of the order statistics w_(i). From (2) we have
E(X_(i)) = α + β m_i   (3)
and a plot of X_(i) against m_i should be approximately a straight line with intercept α on the vertical axis and slope β. The values m_i are the most natural numbers to plot along the horizontal axis to achieve a straight-line plot, but for most distributions they are difficult to calculate. Various authors have therefore proposed alternatives T_i which are convenient functions of i; then (2) can be replaced by the model
X_(i) = α + β T_i + e_i   (4)
where e_i is an "error" which has mean zero only for T_i = m_i.
A common choice for T_i is H_i ≡ F⁻¹{i/(n + 1)} or similar expressions which approximate m_i. A plot of X_(i) against T_i is called a probability plot and the T_i are plotting positions. Historically, the plot was often made with T_i on the vertical axis and X_(i) on the horizontal axis, but we shall think of the plot with these axes reversed. Also, when H_i is used, the values i/(n + 1) were marked along the H_i axis at the actual value of H_i; this axis is thus distorted, but the resulting paper (called probability paper) is then much easier to use, since only the values i/(n + 1) are required and the actual values of H_i need not be calculated. When the plot is made, a test of

H₀: the sample comes from F₀(x) ,   (5)
can then be based on how well the data fit the line (3) or (4). As an example, suppose it is desired to test that the X-sample is normally distributed, with unknown mean μ and variance σ². Then F(w) = ∫_{−∞}^{w} (2π)^{−1/2} e^{−t²/2} dt and the w-sample is standard normal. Then (3) becomes

E(X_(i)) = μ + σ m_i

where the m_i are the expected values of standard normal order statistics. For this distribution, α = μ and β = σ.
1.2. Measures of fit

The practice of plotting the X_(i) against m_i (or against T_i) and looking to see if a straight line results is time-honoured as a quick technique for testing fit, particularly for testing normality. However, historically this appears to have been done by eye. An improvement is clearly to find a statistical measure of how well the data fit the line (4), and it is remarkable that this does not seem to have been done for many years after the introduction of the plot. Almost certainly this would have been because the tools were not then available to give tables for an appropriate test statistic. Three main approaches to measuring the fit can be identified. The first is simply to measure the correlation coefficient R(X, T) between the paired sets X_(i) and T_i. A second method is to estimate the line α + β T_i, using generalized least squares to take into account the covariance of the order statistics, and then to base the test of fit on the sum of squares of residuals. A closely related procedure is to fit a higher-order regression equation for X_(i) against T_i, and then to test that the coefficients of the higher-order terms are zero. Finally, a third technique is to estimate β from (2) using generalized least squares, and to compare this estimate with the estimate of scale given by the sample standard deviation. For all these methods an investigation of the null distribution of the resulting test statistic, for finite n, would require Monte Carlo methods, and high-speed computers were not available when the probability plot was first used; even the asymptotic theory is greatly facilitated by modern probability methods which arrived only much later. In this article we give the asymptotic theory for the first of the methods above,
that based on the correlation coefficient. In a later section, the other techniques will be briefly surveyed, and connections made.
1.3. The correlation coefficient

The correlation coefficient R(X, T) is an attractive measure of straight-line fit, if only for the reason that the concept of correlation is well known to applied workers. In what follows we extend the usual meaning of correlation, which applies to two random variables, and also that of variance and covariance, to apply when one of the pair, T_i, is a constant. Thus let X refer to the vector X_(1), X_(2), …, X_(n), let m refer to the vector m₁, m₂, …, m_n, and let T refer to the vector T₁, T₂, …, T_n; let X̄ = Σ_{i=1}^n X_(i)/n and T̄ = Σ_{i=1}^n T_i/n, and define the sums

S(X, T) = Σ_{i=1}^n (X_(i) − X̄)(T_i − T̄) ,
S(X, X) = Σ_{i=1}^n (X_(i) − X̄)² ,
S(T, T) = Σ_{i=1}^n (T_i − T̄)² .

S(X, X) will often be called S². The variance of X is then V(X, X) = S(X, X)/(n − 1), the variance of T is V(T, T) = S(T, T)/(n − 1), and the covariance of X and T is V(X, T) = S(X, T)/(n − 1). The correlation coefficient between X and T is

R(X, T) = V(X, T)/√{V(X, X) V(T, T)} = S(X, T)/√{S(X, X) S(T, T)} .
We now see another reason why the statistic R(X, m) (sometimes called simply R) is an attractive statistic for testing the fit of X to the model (2): if a "perfect" sample is given, that is, a sample whose ordered values fall exactly at their expected values, R(X, m) will be 1; and with a real data set, the value of R(X, m) can be interpreted as a measure of how closely the sample resembles a perfect sample. Then tests based on R(X, m), or equivalently on R²(X, m), will be one-tailed; rejection of H₀ occurs only for low values of R². Suppose X̂_(i) = α̂ + β̂ T_i, where α̂ and β̂ are the usual regression estimators of α and β (ignoring the covariance between the X_(i)). From the standard ANOVA table for straight-line regression:
Regression SS = S²(X, T)/S(T, T)

Error SS = S² − S²(X, T)/S(T, T) = Σ_{i=1}^n (X_(i) − X̂_(i))²

Total SS = S² = S(X, X)

it is clear that

Error SS / Total SS = 1 − R²(X, T) .
Define, for any T vector,
Z(X, T) = n{1 − R²(X, T)} .
Then Z(X, T) is a test statistic equivalent to R²(X, T), based on the sum of squares of the residuals after the line (4) has been fitted. Z(X, T) has, in common with many other goodness-of-fit statistics (e.g., chi-square and EDF statistics), the property that the larger Z(X, T) is, the worse the fit. Furthermore, in many practical situations (as in Case 1 of Section 2.1 below), Z has a limiting distribution, whereas R² tends in probability to one. Sarkadi (1975) and Gerlach (1979) have shown consistency for correlation tests based on R(X, m), or equivalently Z(X, m), for a wide class of distributions, including all the usual continuous distributions. This is to be expected, since for large n we expect a sample to become perfect in the sense above. We can expect the consistency property to extend to R(X, T) provided T approaches m sufficiently rapidly for large samples.
1.4. Censored data

R(X, T) can easily be calculated for censored data, provided the ranks of the available X_(i) are known. These X_(i) are paired with the appropriate T_i and R(X, T) is calculated using the same formula as above, with the sums running over the known i. For example, if the data were right-censored, so that only the r smallest values X_(i) were available, the sums would run for i from 1 to r; if the data were left-censored, with the first s values missing, the i would run from s + 1 to n. Tables of Z(X, T) = n{1 − R²(X, T)} for T = m or T = H, for testing for the uniform, normal, exponential, logistic, or extreme-value distributions, and with various fractions of the data censored, have been published by Stephens (1986a). Note that the factor n, and not the number of observations available, is used in calculating Z(X, T) for censored data. The only exception to this is when the test is for the uniform distribution (see Section 4).
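The statistic Z(X, T) = n{1 − R²} for complete or right-censored data can be sketched as below (an illustration only; the ordered sample is invented, and normal plotting positions H_i stand in for T):

```python
# Illustrative sketch: R(X, T) over the available order statistics, and
# Z(X, T) = n{1 - R^2} with n kept at the FULL sample size under censoring,
# as the text requires.  Sample values are invented.
from math import sqrt
from statistics import NormalDist

def corr(x, t):
    """Correlation R(X, T) over the available pairs."""
    n = len(x)
    xbar, tbar = sum(x) / n, sum(t) / n
    sxt = sum((a - xbar) * (b - tbar) for a, b in zip(x, t))
    sxx = sum((a - xbar) ** 2 for a in x)
    stt = sum((b - tbar) ** 2 for b in t)
    return sxt / sqrt(sxx * stt)

def z_statistic(x_avail, t_avail, n_full):
    """Z(X, T) = n{1 - R^2}; n_full stays the full sample size when censored."""
    r = corr(x_avail, t_avail)
    return n_full * (1.0 - r * r)

n = 10
inv = NormalDist().inv_cdf
h = [inv(i / (n + 1)) for i in range(1, n + 1)]              # positions H_i
x = [-1.8, -1.1, -0.6, -0.3, 0.0, 0.2, 0.5, 0.9, 1.2, 1.9]  # ordered sample
z_complete = z_statistic(x, h, n)          # complete sample
z_censored = z_statistic(x[:7], h[:7], n)  # right-censored, r = 7
```

The censored call simply truncates both vectors at r while keeping the factor n = 10.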
2. Distribution theory for the correlation coefficient
2.1. The general case

We now discuss the asymptotic behaviour of Z(X, m) for the general test of H₀ given in (5). Suppose F(w) is a continuous distribution, and let f(w) be its density. We observe ordered values X_(k) < X_(k+1) < ⋯ < X_(r) from a sample of size n from the distribution F₀(x). We can assume that the sample comes from F₀(x) with α = 0 and β = 1, that is, from F(w), although (3) is fitted without this knowledge. Note that the sample may be left- and/or right-censored; suppose the given number of observations is n* = r + 1 − k. We shall proceed heuristically, assuming firstly that the limit of Z(X, m) equals that of Z(X, H). We therefore study Z(X, H), which for convenience we abbreviate to Z_n. The expression X_(i) − X̂_(i) = X_(i) − α̂ − β̂ H_i may be written X_(i) − H_i − (X̄ − H̄) − (β̂ − 1)(H_i − H̄), where α̂ and β̂ are the estimates of α and β in (4) when T_i = H_i, X̄ = Σ_{i=k}^r X_(i)/n*, and H̄ = Σ_{i=k}^r H_i/n*. Then
Z(X, H) = n{1 − R²(X, H)} = Σ_{i=k}^r (X_(i) − X̂_(i))² / [(1/n) Σ_{i=k}^r (X_(i) − X̄)²] .
Let p = (k − 1)/n and q = r/n, and let W = 1/(q − p). Also let q* = F⁻¹(q) and p* = F⁻¹(p). It will be convenient to define the function ψ(t) = (F⁻¹(t) − μ)/σ, where the parameters μ and σ are defined by
μ = ∫_p^q F⁻¹(s) W ds = ∫_{p*}^{q*} x f(x) W dx ,

and

σ² = ∫_p^q (F⁻¹(s) − μ)² W ds = ∫_{p*}^{q*} x² f(x) W dx − μ² .
Finally, with m = n + 1, define the quantile process
Q_n(t) = √n {X_([mt]) − F⁻¹(t)} ;

note that when t = i/m, Q_n(t) = √n {X_(i) − H_i}. Also define the process Y_n(t) derived from Q_n(t) as follows:

Y_n(t) = Q_n(t) − ∫_p^q Q_n(s) W ds − ψ(t) ∫_p^q ψ(s) Q_n(s) W ds .
We now consider the terms in X_(i) − X̂_(i) expressed in terms of these processes. Let K = ∫_p^q Q_n(s) W ds; then √n (X̄ − H̄) = K + o_p(1).
Also
√n (β̂ − 1) = (W/σ) ∫_p^q ψ(t){Q_n(t) − K} dt + o_p(1) .
Then insertion of these expressions into X_(i) − X̂_(i) = X_(i) − H_i − (X̄ − H̄) − (β̂ − 1)(H_i − H̄) gives the numerator of Z_n equal to ∫_p^q Y_n²(t) dt + o_p(1). For the limiting behaviour of Z_n, suppose Z_∞ is a random variable with the limiting distribution of Z_n as n → ∞. The process Q_n(t) tends to Q(t), a Gaussian process with mean 0 and covariance

ρ₀(s, t) = {min(s, t) − st} / [f(F⁻¹(s)) f(F⁻¹(t))] .

Also, the process Y_n(t) tends to
Y(t) = Q(t) − ∫_p^q Q(s) W ds − ψ(t) ∫_p^q ψ(s) Q(s) W ds ;
this process is a Gaussian process with mean 0 and covariance
ρ(s, t) = ρ₀(s, t) − ψ(s) ∫_p^q ψ(u) ρ₀(u, t) W du − ψ(t) ∫_p^q ψ(u) ρ₀(s, u) W du
− ∫_p^q ρ₀(u, t) W du − ∫_p^q ρ₀(s, u) W du + ∫_p^q ∫_p^q ρ₀(u, v) W² du dv
+ ψ(s) ψ(t) ∫_p^q ∫_p^q ρ₀(u, v) ψ(u) ψ(v) W² du dv
+ {ψ(s) + ψ(t)} ∫_p^q ∫_p^q ρ₀(u, v) ψ(u) W² du dv .
The numerator of Z_∞ is then T = ∫_p^q Y²(t) dt, and the denominator is σ². Thus the limiting behaviour of Z_n depends on the behaviour of T = ∫_p^q Y²(t) dt, which depends on ∫_p^q Q²(t) dt; the behaviour of this integral in turn depends ultimately on the covariance function ρ₀(s, t) through the following two integrals:

J₁ = ∫_p^q ρ₀(t, t) dt

and

J₂ = ∫_p^q ∫_p^q ρ₀²(s, t) ds dt .
The first integral decides whether or not Z_n has a finite limiting mean, and the second whether it has a finite limiting variance. There are then three possible cases
guiding the limiting behaviour of Z_n, depending on whether J₁ and J₂ are finite or infinite. These will be described and then illustrated with examples of tests on well-known distributions.

CASE 1. In this case both J₁ and J₂ are finite. This is a situation which occurs often with other goodness-of-fit statistics which are asymptotically functionals of a Gaussian process, for example, statistics based on the empirical distribution function (EDF). There is then a well-developed theory to obtain the limiting distribution of the functional T = ∫_p^q Y²(t) dt and hence of Z_n above; see, for example, Stephens (1976). The limiting distribution of Z_n takes the form
Z_∞ = (1/σ²) Σ_{i=1}^∞ v_i/λ_i ,   (6)
where the v_i are independent χ²₁ variables and the λ_i are eigenvalues of the integral equation

f(s) = λ ∫_p^q ρ(s, t) f(t) dt .   (7)
The mean of Z_∞ is Σ_{i=1}^∞ λ_i⁻¹/σ² and the variance is Σ_{i=1}^∞ 2λ_i⁻²/σ⁴; these will be finite when both J₁ and J₂ are finite.

CASE 2. Suppose J₂ is finite but J₁ = ∞. In this case the limit of the mean of Z_n is Σ_{i=1}^∞ λ_i⁻¹ = ∞, and there exist constants a_n → ∞ such that

Z_n − a_n = n(1 − R²) − a_n ⇒ (1/σ²) Σ_{i=1}^∞ λ_i⁻¹(v_i − 1) ,   (8)
where the λ_i are again the eigenvalues of (7), and the v_i are again independent χ²₁ variables.

CASE 3. In this case both integrals J₁ and J₂ are infinite. Then in regular cases there exist constants a_n and b_n such that
(Z_n − a_n)/b_n = {n(1 − R²) − a_n}/b_n ⇒ N(0, 1) .   (9)
2.2. Examples

1. Test for the uniform distribution - Case 1
Suppose the test is for the uniform distribution over the interval (a, b), with parameters a and b unknown. For any p or q Case 1 applies and (r − k + 1)(1 − R²) has the same limiting distribution regardless of p or q. This test will be discussed in detail in Section 4.
2. Test for the exponential distribution - Cases 1 and 3
The test is for F(x) = 1 − e^{−x/θ}, 0 ≤ x < ∞, with θ > 0 and unknown. This test has been extensively examined by Lockhart (1985), who gave the following results. For q < 1 (right-censored data) Case 1 applies and the distribution is a sum of weighted chi-squared variables. This case is important when the exponential distribution is used to model, for example, survival times. For q = 1 we have Case 3; a_n = log n and b_n = 2√(log n), so that

{n(1 − R²) − log n} / {2√(log n)} ⇒ N(0, 1) .
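The Case 3 normalization for a complete exponential sample can be tried numerically; the following seeded simulation is an illustration only (sample size and seed are arbitrary choices, not from the chapter):

```python
# Illustrative, seeded simulation: for a complete exponential sample the text
# gives Case 3 with a_n = log n and b_n = 2*sqrt(log n), so the normalized
# statistic {n(1 - R^2) - log n} / {2 sqrt(log n)} is roughly N(0, 1).
import math
import random

random.seed(1)
n = 200
x = sorted(random.expovariate(1.0) for _ in range(n))
# Plotting positions H_i = F^{-1}(i/(n + 1)) for F(x) = 1 - exp(-x):
h = [-math.log(1.0 - i / (n + 1)) for i in range(1, n + 1)]

xbar, hbar = sum(x) / n, sum(h) / n
sxh = sum((a - xbar) * (b - hbar) for a, b in zip(x, h))
r2 = sxh * sxh / (sum((a - xbar) ** 2 for a in x)
                  * sum((b - hbar) ** 2 for b in h))
t_norm = (n * (1.0 - r2) - math.log(n)) / (2.0 * math.sqrt(math.log(n)))
```

Repeating this over many samples and comparing the empirical distribution of `t_norm` with N(0, 1) illustrates the (slow, log n rate) convergence.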
3. Test for the logistic distribution - Cases 1 and 3
This test is for F(x) = 1/(1 + e^{−x}), −∞ < x < ∞. For p > 0 and q < 1 we get Case 1. Thus the logistic test, when both tails are censored, is similar to the exponential test. For p = 0 or q = 1 or both we get Case 3. For complete samples, where p = 0 and q = 1, McLaren and Lockhart (1987) have shown that a_n = log n and b_n = 2^{3/2}√(log n).

4. Test for the extreme-value distribution - Cases 1 and 3
Suppose the tested distribution is F(x) = exp(−e^{−x}), −∞ < x < ∞; we shall call this distribution EV1. When q < 1 Case 1 occurs, and Case 3 occurs when q = 1. McLaren and Lockhart have shown that for complete samples a_n = log n and b_n = 2√(log n), as for the exponential test. When the test is for the extreme-value distribution (EV2) in the form F(x) = 1 − exp(−e^{x}), −∞ < x < ∞, Case 1 occurs when p > 0 and for any value of q; for p = 0, Case 3 occurs. This is to be expected, since EV2 is the distribution of −X when X has distribution EV1.

5. Test for the normal distribution - Cases 1 and 2
Suppose the test is for the normal distribution with mean and variance unknown. Then it may be shown that, for p > 0 and q < 1, that is, for data censored at both ends, we get Case 1 (both J₁ and J₂ are finite), while for p = 0 or q = 1 or both we get Case 2. In the next section this test is examined and compared with similar tests.
3. Tests for the normal distribution

3.1. The correlation test

The probability plot for testing normality has such a long history and has been used so often that it seems worthwhile to treat this example in greater detail. Historically, it was for testing normality that probability paper was much used, especially for the various effects arising in the analysis of factorial experiments; see, for example, Davies (1956). It is also for the normal test that most of the more recent work has been done to bring some structure into tests based on such plots, starting with the well-known test of Shapiro and Wilk (1965). We
show in the next section that there are interesting connections between this test and the correlation test (the two are asymptotically equivalent), but only for the test for normality. Also in the next section we discuss briefly extensions of the probability-plot technique which are developed by fitting a polynomial to the plot rather than the simple linear equation (2). As was stated above, when the test is for the normal distribution with mean and variance unknown, we have that, for p > 0 and q < 1, that is, for data censored at both ends, Case 1 occurs (both J₁ and J₂ are finite), while for p = 0 or q = 1 or both Case 2 occurs. The results for p = 0 and q = 1 (that is, for a complete sample) were shown by de Wet and Venter (1972), using an approach somewhat different from that given above. De Wet and Venter show
Z_n − a_n = n(1 − R²) − a_n ⇒ Σ_{i=1}^∞ λ_i⁻¹(v_i − 1) ,   (10)
that is, Case 2 of Section 2.1 above with σ = 1. These authors give a table of values of a_n for given n, and also tabulate the asymptotic distribution. They also considered the case where both μ and σ are given, so that the test reduces to a test for the standard normal distribution, and also the cases where one of the parameters is known and the other unknown. In all these cases the estimates of unknown parameters must be efficient; the obvious choices are the usual maximum likelihood estimates. A natural choice of a_n, in Cases 2 and 3, will be the mean of Z_n, and for the normal tests this can be found. Consider again the case where both parameters are unknown, and suppose the test statistic is Z(X, m). The statistic is scale-free, and we can assume the true σ = 1. Consider R²(X, m) = T/m′m, where T = N/D, N = S²(X, m) is the numerator of T and D = S(X, X) is the denominator, using the notation of Section 1. Because D is a completely sufficient statistic for σ, T is distributed independently of D. It follows that the mean of T is the ratio of the mean of N and the mean of D. The mean of D is n − 1, so it remains to find the mean of N; therefore we need E{S²(X, m)}. Let V be the covariance matrix of standard normal order statistics. We have S(X, m) = m′X, so S²(X, m) = m′XX′m, and its expectation is m′(V + mm′)m = m′Vm + (m′m)². Thus the mean of R²(X, m) is

E{R²(X, m)} = {(m′Vm/m′m) + m′m}/(n − 1) .   (11)

Using (11) in a_n = E(Z_n) we obtain

a_n = {n/(n − 1)}{(n − 1) − (m′Vm/m′m) − m′m} .

Asymptotically, using the result Vm → m/2 (Stephens, 1975; Leslie, 1987), we find

a_n = n − 1.5 − m′m = trace(V) − 1.5 .
Tables of the m_i exist for a wide range of values of n; they can also be obtained in some computer packages. Balakrishnan (1984) gives an algorithm to calculate m′m directly, and Davis and Stephens (1977) give an algorithm for V.
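A rough numerical sketch of a_n = n − 1.5 − m′m follows; since the exact m_i come from tables or an algorithm such as Balakrishnan's, Blom's approximation Φ⁻¹{(i − 0.375)/(n + 0.25)} is substituted here as an explicit assumption:

```python
# Sketch: the constant a_n = n - 1.5 - m'm.  Blom's approximation to the
# expected normal order statistics m_i stands in for exact tabled values
# (an assumption of this sketch, not the chapter's exact computation).
from statistics import NormalDist

def a_n_approx(n):
    inv = NormalDist().inv_cdf
    m = [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mm = sum(v * v for v in m)     # m'm
    return n - 1.5 - mm            # asymptotically trace(V) - 1.5
```

The value grows only very slowly with n, in keeping with Case 2 (infinite limiting mean approached at a slow rate).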
When the parameters are known, it is best to substitute X′_i = (X_(i) − μ)/σ. The test statistic corresponding to Z_n is, say, Z_{0,n} = Σ_{i=1}^n (X′_i − m_i)²; the mean of Z_{0,n} is a_{0n}, equal to trace(V) = n − m′m. Asymptotically, a_n = a_{0n} − 1.5. When both parameters are unknown and H is used instead of m, algebra similar to the above gives for the new a_n, say a_{nH},
a_{nH} = {n/(n − 1)}{(n − 1) − (H′VH/H′H) − (H′m)²/H′H} .

De Wet and Venter noted the relationship a_n = a_{0n} − 1.5 (approximately, for finite n) between the constants obtained in their consideration of the statistic Z(X, H). This is to be expected, since for large n the limiting distributions of Z(X, m) and Z(X, H) are the same (Leslie, Stephens and Fotopoulos, 1986). The expression for a_{0n} used by de Wet and Venter is

a_{0n} = {1/(n + 1)} Σ_{i=1}^n j(1 − j)/φ²{Φ⁻¹(j)}

where j = i/(n + 1), and where φ(·) and Φ(·) denote the standard normal density and distribution functions respectively. The term in the sum is the first term in a classic formula for approximating v_ii; see, for example, David (1981). An interesting feature of the various limiting distributions is that the weights in the infinite sum of weighted χ²₁ variables are the harmonic series 1/j, j = 1, 2, …. The terms in the sum start at j = 1 when both parameters are given; they start at j = 1 but omit j = 2 when the mean is known but the variance is estimated from the sample; they start at j = 2 when the mean must be estimated but the variance is known; and they start at j = 3 when both parameters must be estimated.
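The de Wet-Venter constant is straightforward to evaluate directly; a minimal sketch:

```python
# Sketch of the de Wet-Venter constant
# a_0n = {1/(n+1)} * sum_i j(1-j)/phi^2{Phi^{-1}(j)}, with j = i/(n+1);
# each term is the leading approximation to v_ii.
from statistics import NormalDist

def a_0n(n):
    nd = NormalDist()
    total = 0.0
    for i in range(1, n + 1):
        j = i / (n + 1)
        total += j * (1 - j) / nd.pdf(nd.inv_cdf(j)) ** 2
    return total / (n + 1)
```

The text's relation a_n ≈ a_{0n} − 1.5 can then be checked numerically for moderate n.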
3.2. Other test procedures

(a) The Shapiro-Wilk and Shapiro-Francia tests
There is a fascinating connection between Z(X, m) and the well-known Shapiro-Wilk (1965) and Shapiro-Francia (1972) statistics for testing normality with complete samples and parameters unknown; these statistics derive from the third method of testing given in Section 1.2, namely to estimate β from (2) using generalized least squares, and to compare this estimate with the estimate of scale given by the sample standard deviation. In the test for normality, the estimate β̂, which is now an estimate of σ, becomes

β̂ = σ̂ = m′V⁻¹X/m′V⁻¹m ,   (12)
and the Shapiro-Wilk statistic is, to within a constant,
W = (m′V⁻¹X)² / {(m′V⁻¹m) S²}   (13)

where S² is defined in Section 1. The remarkable result is that, as n → ∞, this statistic becomes
W′ = (m′X)² / {(m′m) S²} ,   (14)
which, since m̄ is zero for the normal distribution, can be seen to be exactly the squared correlation coefficient R²(X, m). The statistic W′ was in fact proposed by Shapiro and Francia (1972) for large samples. The connection between (13) and (14) goes back to an observation by Gupta (1952), who observed that, as n → ∞, the estimate of σ given by (12) approaches σ̂ = m′X/m′m, as though one could simply "ignore" the V. At the time this was a very useful result for estimating σ from order statistics, since tables of V were available only for very small n. Note that ignoring V really means estimating the parameters by simple least squares rather than generalized least squares. Of course one cannot really ignore V, or, in more mathematical terms, replace it by the identity matrix, so there is some interest in seeing why Gupta's observation is correct, especially as it appears to hold for the normal distribution only. The reason is that, for this distribution, V⁻¹m ≈ 2m; this was shown heuristically by Stephens (1975) and later proved rigorously by Leslie (1987), who showed ‖V⁻¹m − 2m‖ ≤ C(log n)^{−1/2}, where C is a constant independent of n and ‖b‖² = b′b for any vector b. When this is used in (12) one obtains the simpler estimate for σ; also the statistic W approaches W′ = R²(X, m). Finally, the fact that the asymptotic behaviour of R(X, m) is the same as that of R(X, H) was proved, in the case of normality, by Leslie, Stephens and Fotopoulos (1986). The asymptotic theory of the Shapiro-Wilk W and the Shapiro-Francia W′ is therefore that given by de Wet and Venter for R(X, H), reproduced above, for the normal distribution with both parameters estimated. Since R(X, m) always gives a consistent test, the implication here is that, for testing the normal distribution, the Shapiro-Wilk and Shapiro-Francia tests are consistent.
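The large-sample form (14) can be sketched directly; as an explicit assumption of this sketch, Blom's approximation to the m_i replaces exact tabled values, and the sample is invented:

```python
# Sketch: the Shapiro-Francia statistic W' = (m'X)^2 / {(m'm) S^2}, identified
# in the text with R^2(X, m) for the normal test (since mbar = 0).  Blom's
# Phi^{-1}((i - 0.375)/(n + 0.25)) stands in for the exact m_i.
from statistics import NormalDist

def shapiro_francia(sample):
    x = sorted(sample)
    n = len(x)
    inv = NormalDist().inv_cdf
    m = [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x)        # S^2 = S(X, X)
    mx = sum(mi * v for mi, v in zip(m, x))     # m'X
    mm = sum(mi * mi for mi in m)               # m'm
    return mx * mx / (mm * s2)

w_prime = shapiro_francia([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 4.7, 5.8])
```

Values near 1 support normality; the one-tailed test rejects for small W′, equivalently large Z = n(1 − W′).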
However, consistency is true only for the normal distribution; if the Shapiro-Wilk procedure is applied to testing any other distribution, the statistic obtained is not asymptotically the same as R(X, m); in fact the resulting test will not be consistent (Sarkadi, 1975; Lockhart and Stephens, 1995). Spinelli and Stephens (1987) give an example, for the exponential test, of an alternative where the test statistic has power decreasing with sample size.
(b) Use of higher-order regression
As was stated in Section 1, a procedure closely related to the correlation-coefficient test is to fit a higher-order regression equation of the form

E(X_(i)) = α + β T_i + β₂ w₂(T_i) + β₃ w₃(T_i) + ⋯   (15)
where w_j(·) is a j-th order polynomial, and then to test that the coefficients of the terms of order higher than the first are zero. This method has been developed for the test for normality, using T_i = m_i, by LaBrecque (1977), who gives extensive tables of coefficients for the w₂(m_i) and w₃(m_i) so that the covariances of the estimates of the coefficients, when estimated by generalized least squares, shall be zero. Related distribution theory was given by Puri and Rao (1975) and by
Stephens (1975). Puri and Rao also consider tests of normality obtained by combining the Shapiro-Wilk test (effectively the same as the test that β/σ = 1) with tests for β₂ = 0 or β₃ = 0. They find that the extra test for β₂ = 0 adds power against asymmetric alternatives, and the extra test for β₃ = 0 adds power against long-tailed alternatives, as one would expect. LaBrecque (1977) gives power studies for one or other of the extra tests and also for the tests combined together. The method has been extended to tests for other distributions by Coronel-Brizio and Stephens (1996).

4. Tests for the uniform distribution
4.1. Four cases

Suppose the test is for the uniform distribution for X, between limits a and b, written U(a, b); then F(w) = w, 0 ≤ w ≤ 1, and m_i = E(w_(i)) = i/(n + 1); also H_i = m_i. The model is written X_i = a + (b − a)w_i, so that α = a and β = b − a in (1) to (3). The order statistics X_(i) could be plotted against i instead of against i/(n + 1); the scale factor 1/(n + 1) does not change the correlation coefficient, and R(X, m) = R(X, H) = R(X, T) where T_i = i. We can distinguish four cases for this test; for the first two the test statistic simplifies because the range b − a is known. The cases are:

Case 0: a and b both known;
Case 1: a unknown, but (b − a) known;
Case 2: a known, but (b − a) unknown;
Case 3: both a and b unknown.
CASE 0. Here a and b are known; then the transformation X′ = (X − a)/(b − a) reduces the problem to a test that X′ is U(0, 1). There are of course many tests for this case (see, e.g., Stephens, 1986b). In the present context, the test will be based on the residuals from the line F(x′) = x′, 0 ≤ x′ ≤ 1; that is, on the statistic Z_{0,n} = Σ_{i=1}^n {X′_(i) − i/(n + 1)}². Durbin and Knott (1972) considered the statistic (n + 1)Z_{0,n}/n, which they called M²; they showed that the mean of M² is 1/6 and the variance is 1/45 for all n. These authors pointed out that the statistic is very closely related to the Cramér-von Mises statistic W² = Σ_{i=1}^n {X′_(i) − (2i − 1)/(2n)}² + 1/(12n), and in fact it has the same asymptotic distribution as W². Lehman (1973) found the correlation between Z_{0,n} and W² to be (2n − 1)/{n(4n − 3)}^{1/2}; the value of this correlation approaches 1 rapidly with n. Durbin and Knott concluded that it was not worth tabulating Z_{0,n} since W² is so well known, and Lehman agreed. For small samples, the two statistics will have much the same power properties.

CASE 1. Here the model is X_(i) = a + β w_(i), with β = b − a known. Substitute X′_(i) = X_(i)/β; then the model becomes X′_(i) = a/β + w_(i), and E(X′_(i)) = α + (m_i − m̄), where α = a/β + m̄. Ordinary least squares gives α̂ = X̄′. Hence
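The Case 0 quantities are easily computed; the sketch below uses an invented sample, and the Lehman correlation formula is as reconstructed above:

```python
# Sketch (invented sample): Case 0 quantities for a U(0, 1) test --
# Z_0n = sum {X'_(i) - i/(n+1)}^2, Durbin-Knott's M^2 = (n+1) Z_0n / n,
# and the Cramer-von Mises statistic W^2.
def case0_statistics(x):
    x = sorted(x)
    n = len(x)
    z0 = sum((xi - i / (n + 1)) ** 2 for i, xi in enumerate(x, 1))
    m2 = (n + 1) * z0 / n
    w2 = sum((xi - (2 * i - 1) / (2 * n)) ** 2
             for i, xi in enumerate(x, 1)) + 1 / (12 * n)
    return z0, m2, w2

def lehman_corr(n):
    """(2n - 1)/{n(4n - 3)}^{1/2}; approaches 1 rapidly with n."""
    return (2 * n - 1) / (n * (4 * n - 3)) ** 0.5

z0, m2, w2 = case0_statistics([0.1, 0.3, 0.5, 0.7, 0.9])
```

For this particular sample the ordered values fall exactly at (2i − 1)/(2n), so W² reduces to its constant term 1/(12n), while Z_{0,n} is small but nonzero.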
X̂′_(i) = X̄′ + m_i − 0.5, and the test statistic based on residuals is Z_{1,n} = Σ_{i=1}^n {X′_(i) − X̄′ − (m_i − 0.5)}². Z_{1,n} has similar properties to the Watson U² statistic U² = Σ_{i=1}^n [X′_(i) − X̄′ − {(2i − 1)/(2n) − 0.5}]² + 1/(12n), and has the same asymptotic distribution.

CASE 2. For Cases 2 and 3 it becomes much more difficult to obtain the asymptotic distributions of the test statistic n Σ_{i=1}^n (X_(i) − X̂_(i))² / Σ_{i=1}^n (X_(i) − X̄)², the denominator being necessary, and complicating the analysis, because for these two cases the scale must be estimated. We state the results and give proofs later. For Case 2, the model is E(X_(i)) = a + β m_i, with β unknown and a known. Set X′_(i) = X_(i) − a, so that E(X′_(i)) = β m_i, and estimate β by least squares; then β̂ = Σ_{i=1}^n X′_(i) m_i / Σ_{i=1}^n m_i². Thus X̂_(i) = a + β̂ m_i, and the test statistic is
$$Z_{2,n} = n\sum_{i=1}^{n}\bigl(X_{(i)} - \hat X_{(i)}\bigr)^2 \Big/ \sum_{i=1}^{n}\bigl(X_{(i)} - \bar X\bigr)^2 . \qquad (16)$$
The asymptotic distribution of $Z_{2,n}$ is that of $12\sum_{i=1}^{\infty}\nu_i/\lambda_i$, where the $\nu_i$, for $i = 1, 2, \ldots$, are independent $\chi_1^2$ variables; the $\lambda_i$ form an infinite set of positive weights given by $\lambda_i = \theta_i^2$, where the $\theta_i$ are the solutions of $\tan\theta_i = \theta_i$, $\theta_i > 0$. This result is shown in the Appendix. Table 1 gives percentage points for $Z_{2,n}$ for a range of values of $n$, and also the asymptotic points. Those for finite $n$ have been obtained by Monte Carlo sampling. The table also gives points for a modification of $Z_{2,n}$, called $Z_{2A,n}$. This is the statistic (assuming $a = 0$)

$$Z_{2A,n} = n\sum_{i=1}^{n}\bigl(X_{(i)} - \hat X_{(i)}\bigr)^2 \Big/ \sum_{i=1}^{n} X_{(i)}^2 . \qquad (17)$$
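As a concrete illustration (an addition, not part of the original text), statistics (16) and (17) can be computed directly from their definitions with $m_i = i/(n+1)$; the simulated uniform sample, seed and sample size below are arbitrary choices for the sketch:

```python
import random

def z2_statistics(x_sorted, a=0.0):
    """Compute Z_{2,n} (eq. 16) and Z_{2A,n} (eq. 17) for Case 2 with known origin a."""
    x = [v - a for v in x_sorted]                       # shift by the known origin
    n = len(x)
    m = [i / (n + 1) for i in range(1, n + 1)]          # m_i = i/(n + 1)
    beta = sum(xi * mi for xi, mi in zip(x, m)) / sum(mi * mi for mi in m)
    sse = sum((xi - beta * mi) ** 2 for xi, mi in zip(x, m))
    xbar = sum(x) / n
    z2 = n * sse / sum((xi - xbar) ** 2 for xi in x)    # (16): sample-variance denominator
    z2a = n * sse / sum(xi * xi for xi in x)            # (17): raw sum-of-squares denominator
    return z2, z2a

random.seed(1)
sample = sorted(random.random() for _ in range(20))     # a U(0, 1) sample under the null
z2, z2a = z2_statistics(sample)
```

Since the numerators of (16) and (17) agree, $Z_{2A,n}/Z_{2,n} = \sum(X_{(i)} - \bar X)^2/\sum X_{(i)}^2$; the two statistics differ only in how the scale is estimated.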
For this statistic $\sum_{i=1}^{n} X_{(i)}^2/n$ is used to estimate $\beta^2$ instead of the sample variance. This is a natural denominator in Case 2 with $a = 0$, where the model is $E(X_{(i)}) = \beta m_i$. If the known $a \ne 0$, the new variable $X'_{(i)} = X_{(i)} - a$, instead of $X_{(i)}$, should be used in (17) to calculate the test statistic. An advantage of using $Z_{2A,n}$ is that the percentage points in the table are much less variable for small $n$, and interpolation is easier. The asymptotic points for $Z_{2A,n}$ are 0.25 times those for $Z_{2,n}$.

CASE 3. For Case 3, the model is $E(X_{(i)}) = \alpha + \beta(m_i - 0.5)$, with $\alpha$ and $\beta$ unknown, and least squares gives $\hat\alpha = \bar X$ and $\hat\beta = \sum_{i=1}^{n}\{(X_{(i)} - \bar X)m_i\}/\sum_{i=1}^{n}(m_i - \bar m)^2$. The test statistic is now the correlation coefficient $R(X, m)$, or equivalently $Z_{3,n} = Z(X, m)$. $Z_{3,n}$ has the asymptotic distribution of $12\sum_{i=1}^{\infty}\nu_i/\lambda_i$, where, as above, the $\nu_i$ are independent $\chi_1^2$ variables. The constants $\lambda_i$ are positive weights given in two infinite sets:
Set 1: $\lambda_i = 4\pi^2 i^2$, $i = 1, 2, \ldots$. Set 2: $\lambda_k = 4\phi_k^2$, $k = 1, 2, \ldots$, where the $\phi_k$ are the solutions of $\tan\phi_k = \phi_k$, $\phi_k > 0$. Table 2 gives Monte Carlo percentage points for $Z_{3,n}$, for a range of values of $n$, and also asymptotic points.
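The two $\lambda$-sets are easy to generate numerically. The sketch below (an illustration, with an arbitrary truncation point of 2000 terms) finds the solutions of $\tan\phi = \phi$ by bisection on $g(\phi) = \sin\phi - \phi\cos\phi$, and checks that $\sum 1/\lambda_i$ over both sets is close to $1/15$, the value of $\int_0^1 \rho_3(s,s)\,ds$ given in Appendix A.3:

```python
import math

def tan_phi_roots(count):
    """Positive solutions of tan(phi) = phi, i.e. zeros of g(phi) = sin(phi) - phi*cos(phi).
    The k-th root lies in (k*pi, k*pi + pi/2), where g changes sign."""
    g = lambda p: math.sin(p) - p * math.cos(p)
    roots = []
    for k in range(1, count + 1):
        lo, hi = k * math.pi + 1e-9, (k + 0.5) * math.pi - 1e-9
        for _ in range(80):                  # plain bisection
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots

N = 2000
set1 = [4 * math.pi ** 2 * i ** 2 for i in range(1, N + 1)]   # Set 1: 4*pi^2*i^2
set2 = [4 * p * p for p in tan_phi_roots(N)]                  # Set 2: 4*phi_k^2
total = sum(1.0 / lam for lam in set1) + sum(1.0 / lam for lam in set2)
```

The first solution of $\tan\phi = \phi$ is $\phi_1 \approx 4.4934$, so the leading weights are $\lambda_1 = 4\pi^2 \approx 39.48$ from Set 1 and $4\phi_1^2 \approx 80.76$ from Set 2.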
R. A. Lockhart and M. A. Stephens
Table 1
Critical points for Z_{2,n} and Z_{2A,n}

                   Upper tail significance level
Statistic   n     0.50    0.25    0.15    0.10    0.05    0.025   0.01
Z_{2,n}     4     0.690   1.240   1.94    3.47    8.67    20.3    47.0
            6     0.763   1.323   1.89    2.59    4.74    8.49    17.0
            8     0.806   1.364   1.85    2.37    3.78    6.29    11.4
            10    0.832   1.388   1.88    2.34    3.40    5.30    8.9
            12    0.848   1.407   1.89    2.33    3.27    4.80    7.8
            18    0.877   1.438   1.91    2.32    3.12    4.26    6.3
            20    0.881   1.444   1.92    2.32    3.10    4.18    6.0
            40    0.907   1.470   1.93    2.32    3.03    3.82    5.1
            60    0.916   1.480   1.93    2.32    3.00    3.73    4.9
            80    0.920   1.485   1.94    2.32    2.99    3.71    4.9
            100   0.922   1.488   1.94    2.32    2.98    3.70    4.8
            ∞     0.932   1.497   1.94    2.31    2.98    3.67    4.6
Z_{2A,n}    4     0.140   0.245   0.333   0.411   0.545   0.707   1.010
            6     0.166   0.287   0.379   0.467   0.616   0.796   1.065
            8     0.184   0.307   0.403   0.494   0.648   0.830   1.089
            10    0.193   0.320   0.420   0.512   0.670   0.848   1.102
            12    0.200   0.330   0.432   0.523   0.683   0.861   1.111
            18    0.209   0.346   0.452   0.543   0.705   0.882   1.121
            20    0.212   0.349   0.455   0.547   0.708   0.886   1.124
            40    0.224   0.362   0.472   0.563   0.727   0.903   1.138
            60    0.228   0.367   0.477   0.568   0.734   0.909   1.146
            80    0.229   0.369   0.479   0.570   0.736   0.911   1.149
            100   0.230   0.370   0.480   0.572   0.737   0.912   1.150
            ∞     0.233   0.374   0.485   0.578   0.744   0.917   1.155
Table 2
Critical points for Z_{3,n}

      Upper tail significance level
n     0.50    0.25    0.15    0.10    0.05    0.025   0.01
4     0.344   0.559   0.734   0.888   1.089   1.238   1.388
6     0.441   0.703   0.901   1.053   1.325   1.590   1.918
8     0.495   0.792   1.000   1.163   1.474   1.739   2.100
10    0.535   0.833   1.068   1.245   1.532   1.846   2.294
12    0.560   0.864   1.093   1.280   1.608   1.918   2.360
18    0.605   0.940   1.147   1.348   1.672   2.008   2.503
20    0.610   0.960   1.200   1.370   1.680   2.025   2.520
40    0.640   0.980   1.215   1.396   1.732   2.076   2.580
60    0.648   0.988   1.227   1.410   1.750   2.092   2.590
80    0.658   0.997   1.228   1.418   1.760   2.104   2.610
∞     0.666   0.992   1.234   1.430   1.774   2.129   2.612
The derivation of the weights for Cases 2 and 3 is given in the Appendix.
4.2. Use of the tables with censored data

Suppose origin and scale are both unknown (Case 3), and the data are censored at both ends, so that $n^* = r - k + 1$ observations are available, consisting of all those between $X_{(k)}$ and $X_{(r)}$. $R(X, T)$ may be calculated, using the usual formula, but with sums for $i$ from $k$ to $r$, and with $T_i = i/(n+1)$ or $T_i = i$, or even with $T_k, T_{k+1}, \ldots, T_r$ equal to $1, 2, \ldots, n^*$; these latter values for $T_i$ are possibilities because $R(X, m)$ is scale and location invariant. In effect, for this test, the sample can be treated as though it were complete. Then $n^*\{1 - R^2(X, T)\} = Z(X, T)$ will be referred to Table 2, using the values for sample size $n^*$.
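Because the correlation coefficient is unchanged by location and scale changes in either argument, the three choices of $T$ above give identical values of $R(X, T)$. A short sketch (an addition; the data are the breakdown times used in the example of Section 4.3, with $k = 9$, $r = 20$, $n = 20$) verifies this:

```python
def corr(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

n, k, r = 20, 9, 20
x = [82, 93, 120, 135, 137, 142, 162, 163, 210, 228, 233, 261]  # X_(k), ..., X_(r)
t1 = [i / (n + 1) for i in range(k, r + 1)]    # T_i = i/(n + 1)
t2 = list(range(k, r + 1))                     # T_i = i
t3 = list(range(1, r - k + 2))                 # T_i = 1, 2, ..., n*
r1, r2, r3 = corr(x, t1), corr(x, t2), corr(x, t3)
```

The three $T$ vectors are affine transforms of one another, which is why the three correlations agree exactly.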
4.3. Example

It is well known that if times $Q_{(i)}$, $i = 1, 2, \ldots, n$, represent times of random events, occurring in order with the same rate, the $Q_{(i)}$ should be proportional to uniform order statistics $U_{(i)}$. Thus the $Q_{(i)}$ may be regressed against $i/(n+1)$, or equivalently against $i$ as described above, to test that the events are random. Suppose $Q_{(9)}, Q_{(10)}, \ldots, Q_{(20)}$ represent a subset of such times, denoting times of breakdown of an industrial process. We wish to test that these are uniform; times $Q_{(1)}$ to $Q_{(8)}$ have been omitted because the process took time to stabilize, and those events are not expected to have occurred at the same rate as the later ones. The times $Q_{(9)}, Q_{(10)}, \ldots, Q_{(20)}$ are 82, 93, 120, 135, 137, 142, 162, 163, 210, 228, 233, 261. The value of $Z(Q, T) = 12\{1 - R^2(Q, T)\}$ is 0.464. Reference to Table 2 at line $n = 12$ shows that there is not significant evidence, even at the 50% level, to reject the hypothesis of uniformity.
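The arithmetic of the example is easily checked; the following sketch (an addition) recomputes $Z(Q, T) = n^*\{1 - R^2(Q, T)\}$ with $T = 1, 2, \ldots, n^*$:

```python
def z_statistic(x):
    """Z(X, T) = n*{1 - R^2(X, T)} with T = 1, 2, ..., n* (Section 4.2)."""
    n = len(x)
    t = list(range(1, n + 1))
    mx, mt = sum(x) / n, sum(t) / n
    sxt = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    sxx = sum((a - mx) ** 2 for a in x)
    stt = sum((b - mt) ** 2 for b in t)
    r2 = sxt * sxt / (sxx * stt)
    return n * (1.0 - r2)

times = [82, 93, 120, 135, 137, 142, 162, 163, 210, 228, 233, 261]
z = z_statistic(times)   # approx 0.464, below even the 50% point of Table 2 at n = 12
```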
5. Power of correlation tests
In this section we make some general remarks about the power of tests based on the correlation coefficient. First, it is well known that the Shapiro-Wilk test gives very good power for testing normality; although not as superior as originally suggested, it is marginally better than the EDF statistics $W^2$ and $A^2$ (Stephens, 1974). Because of the connection noted in Section 3, we can expect the correlation test to have similarly good power for the normal test, where it has been mostly used. Also, the correlation test for the uniform distribution can be expected to be good, because of its connection with the EDF statistic $W^2$, which is powerful for this test. However, the uniform and normal distributions have relatively short tails, and power results are less impressive, at least for large samples, when we consider tests for heavy-tailed distributions. The EV and logistic distributions have tails similar to the exponential, and for testing these and similar distributions McLaren and Lockhart (1987) show that correlation tests can have asymptotic relative
efficiency zero when compared with EDF tests. If the tests are conducted at level $\alpha$ and the alternative distribution approaches the null at the rate $1/n^{1/2}$, EDF tests will have power greater than $\alpha$, whereas correlation-coefficient tests will have power equal only to $\alpha$. It is clear that these results on relative efficiency apply because the values in the heavy tails are the important ones influencing efficiency of the tests; for the effect of heavy tails see Lockhart (1991). The effect may be seen from the probability plots themselves; it is a well-known feature of regression that widely spaced observations at one end of the set will be the influential values in determining the fit of the line. A direct comparison was made for the exponential test, by Spinelli and Stephens (1987), using complete samples of sizes 10, 20 and 50. The results for these sample sizes are somewhat difficult to interpret, but they suggest that the EDF statistics $W^2$ and $A^2$ are overall better than the correlation-coefficient statistics, although the latter occasionally score well. In cases where the data are censored in the heavy tail (for this test, right-censored), and the asymptotic distribution reverts to Case 1, power results might well be somewhat different. More work is needed on comparisons for small, censored samples. It should be emphasised also that the rather negative theoretical results above are asymptotic results, and may only have a serious effect for very large samples. For small samples, the advantages of familiarity and graphical display will make correlation tests appealing in many circumstances; this will be especially so in the important situations where data are censored, since for $Z$ tables exist (Stephens, 1986a), where they may not for EDF tests.
A. Appendix

A.1. Asymptotic theory for the uniform test - Case 3
In this section the asymptotic theory of $Z(X, m)$ is given for Case 3 (the most difficult case, when both the parameters in the uniform distribution are unknown), following the general theory developed in Section 2. It is convenient to write the fitted model as

$$X_{(i)} = \alpha + \beta(m_i - \bar m) + \varepsilon_i . \qquad (1)$$
As before, the process $Q_n(t)$ is formed from the $X_{(i)} - m_i$, and $Y_n(t)$ from the residuals $X_{(i)} - \hat X_{(i)}$, using the notation of Section 2; also

$$\sqrt{n}(\bar X - \bar m) = \int_0^1 Q_n(s)\,ds + O_p(n^{-1/2})$$

and

$$\sqrt{n}(\hat\beta - 1) = 12\int_0^1 \Bigl(t - \tfrac12\Bigr)\Bigl\{Q_n(t) - \int_0^1 Q_n(s)\,ds\Bigr\}\,dt + O_p(n^{-1/2}) ,$$

recalling that $\bar m = 1/2$ and $\sum_{i=1}^{n}(m_i - \bar m)^2/n \to 1/12$. Then $Y_n(t)$ becomes

$$Y_n(t) = Q_n(t) - \int_0^1 Q_n(s)\,ds - 12\Bigl(t - \tfrac12\Bigr)\int_0^1 \Bigl(u - \tfrac12\Bigr)\Bigl\{Q_n(u) - \int_0^1 Q_n(s)\,ds\Bigr\}\,du + o_p(1) .$$
As before, when $n \to \infty$, let $Q(t)$ and $Y(t)$ be limiting processes for $Q_n(t)$ and $Y_n(t)$ respectively. $Q(t)$ is the well-known Brownian bridge, with mean $E\{Q(t)\} = 0$ and covariance $\rho_0(s,t) = \min(s,t) - st$. The process $Q_n(t) - \int_0^1 Q_n(s)\,ds$ has already been studied in connection with the Watson statistic $U^2$ (Watson, 1961; Stephens, 1976). For the asymptotic distribution of $Z(X, m)$ we now need the distribution of

$$Z_\infty = 12\int_0^1 Y^2(t)\,dt . \qquad (2)$$
The covariance function of $Y(t)$ requires considerable algebra but the calculation is straightforward; the result is

$$\rho_3(s,t) = \rho_0(s,t) - \tfrac12 s(1-s) - \tfrac12 t(1-t) + \tfrac{1}{12} + v(s)v(t)/5 - v(s)w(t) - w(s)v(t) ,$$

with $v(s) = s - 1/2$ and $w(s) = s(1-s)(2s-1)$. The distribution of $Z_\infty$ takes the form (Case 1 of Section 2)

$$Z_\infty = 12\sum_{i=1}^{\infty} \nu_i/\lambda_i , \qquad (3)$$

where the $\nu_i$ are independent $\chi_1^2$ variables. The $\lambda_i$ are eigenvalues of the integral equation

$$\lambda \int_0^1 f(s)\,\rho_3(s,t)\,ds = f(t) . \qquad (4)$$

Suppose the corresponding eigenfunctions are $f_i(t)$. The solution of (4) is found as follows. The covariance $\rho_3(s,t)$ can be written $\rho_3(s,t) = \min(s,t) + g(s,t)$, with

$$g(s,t) = \tfrac{6}{5}st - \tfrac{11}{10}s + 2s^2 - s^3 - \tfrac{11}{10}t + 2t^2 - t^3 + \tfrac{2}{15} - 3st^2 + 2st^3 - 3s^2t + 2s^3t .$$

Differentiation of (4) twice with respect to $t$ then gives
$$-f(t) + (4 - 6t)\int_0^1 f(s)\,ds + (12t - 6)\int_0^1 s f(s)\,ds = \frac{1}{\lambda} f''(t) . \qquad (5)$$

Differentiation again gives

$$-f'(t) - 6\int_0^1 f(s)\,ds + 12\int_0^1 s f(s)\,ds = \frac{1}{\lambda} f^{(3)}(t) \qquad (6)$$

and finally

$$-f''(t) = \frac{1}{\lambda} f^{(4)}(t) .$$

Thus

$$f(t) = A\cos\sqrt{\lambda}\,t + B\sin\sqrt{\lambda}\,t + Ct + D . \qquad (7)$$

Let $K_0 = \int_0^1 f(s)\,ds$ and $K_1 = \int_0^1 s f(s)\,ds$. Set $\theta = \sqrt{\lambda}$; then

$$K_0 = \frac{A}{\theta}\sin\theta - \frac{B}{\theta}(\cos\theta - 1) + \frac{C}{2} + D \qquad (8)$$

and

$$K_1 = \int_0^1 s f(s)\,ds = A I_1 + B I_2 + \frac{C}{3} + \frac{D}{2} , \qquad (9)$$

where

$$I_1 = \int_0^1 s\cos\theta s\,ds = \frac{\theta\sin\theta + \cos\theta - 1}{\theta^2}
\quad\text{and}\quad
I_2 = \int_0^1 s\sin\theta s\,ds = \frac{\sin\theta - \theta\cos\theta}{\theta^2} .$$

Substituting $f(t)$ into (5) gives $-Ct - D + (4 - 6t)K_0 + (12t - 6)K_1 = 0$ for all $t$; thus, equating coefficients, we have $-C - 6K_0 + 12K_1 = 0$ and $-D + 4K_0 - 6K_1 = 0$. Hence $C/3 + D/2 = K_1$ and $C/2 + D = K_0$. Thus from (8) we have $A\sin\theta - B(\cos\theta - 1) = 0$, and from (9) we have

$$A I_1 + B I_2 = 0 . \qquad (10)$$

Hence $\theta$ must satisfy

$$\frac{\sin\theta}{\cos\theta - 1} = \frac{B}{A} = -\frac{I_1}{I_2} = \frac{1 - \theta\sin\theta - \cos\theta}{\sin\theta - \theta\cos\theta} . \qquad (11)$$

So $\theta$ satisfies $2 - \theta\sin\theta - 2\cos\theta = 0$, by cross-multiplication of (11). Let $\phi = \theta/2$; then $2 - 4\phi\sin\phi\cos\phi - 2[1 - 2\sin^2\phi] = 0$, and hence $\sin\phi = 0$ or $\sin\phi - \phi\cos\phi = 0$. Then $\phi_i = \pi i$, $i = 1, 2, \ldots$; or alternatively $\phi_k$ is the solution
of $\tan\phi_k = \phi_k$, $k = 1, 2, \ldots$. Finally, $\lambda_i = 4\phi_i^2 = 4\pi^2 i^2$ for the first $\lambda$-set, and $\lambda_k = 4\phi_k^2$ for the second $\lambda$-set.
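As a numerical check on this factorization (an addition, not in the original), the zeros of $h(\theta) = 2 - \theta\sin\theta - 2\cos\theta$ on $(0, 30)$ should be exactly the union of $\{2\pi i\}$ and $\{2\phi_k\}$:

```python
import math

def h(t):
    return 2 - t * math.sin(t) - 2 * math.cos(t)

def roots_by_scan(f, lo, hi, step=0.01):
    """Locate sign changes of f on a grid and refine each by bisection."""
    roots, a = [], lo
    while a < hi:
        b = min(a + step, hi)
        if f(a) * f(b) < 0:
            x, y = a, b
            for _ in range(80):
                m = 0.5 * (x + y)
                if f(x) * f(m) <= 0:
                    y = m
                else:
                    x = m
            roots.append(0.5 * (x + y))
        a = b
    return roots

found = roots_by_scan(h, 0.5, 30.0)
phi = [4.493409, 7.725252, 10.904122, 14.066194]   # first solutions of tan(phi) = phi
predicted = sorted([2 * math.pi * i for i in (1, 2, 3, 4)] + [2 * p for p in phi])
```

The eight roots found agree with the predicted values $2\pi, 2\phi_1, 4\pi, 2\phi_2, \ldots$, alternating between the two sets.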
A.2. Asymptotic theory - Case 2

For Case 2 the test statistic is $Z_{2,n} = n\sum_{i=1}^{n}(X_{(i)} - \hat X_{(i)})^2/\sum_{i=1}^{n}(X_{(i)} - \bar X)^2$. We can take $a = 0$ in the model $E(X_{(i)}) = a + \beta m_i$, so that $E(X_{(i)}) = \beta m_i$, and least squares gives $\hat\beta = \sum_{i=1}^{n} X_{(i)} m_i/\sum_{i=1}^{n} m_i^2$. Hence $\hat\beta - 1 = \sum_{i=1}^{n}(X_{(i)} - m_i)m_i/\sum_{i=1}^{n} m_i^2$. Similar reasoning to that for Case 3 gives the asymptotic distribution of $Z_{2,n}$ to be that of $12\int_0^1 Y_2^2(t)\,dt$, where

$$Y_2(t) = Q(t) - 3t\int_0^1 s\,Q(s)\,ds . \qquad (12)$$
$Q(t)$ is as defined in the previous section, and then $Y_2(t)$ is a Gaussian process with mean 0; its covariance function (after some algebra) is

$$\rho_2(s,t) = \min(s,t) - \tfrac{9}{5}st + \tfrac12 st^3 + \tfrac12 s^3t . \qquad (13)$$

Thus for the weights in the asymptotic distribution of $Z_{2,n}$ we need the eigenvalues of $\lambda\int_0^1 \rho_2(s,t)f(s)\,ds = f(t)$. Steps similar to those for Case 3 give $f(t) = A\cos\theta t + B\sin\theta t + Ct + D$ with $\theta = \sqrt{\lambda}$, as before. Also, $f(0) = 0$, so $D = -A$, and

$$-f(t) + 3t\int_0^1 s f(s)\,ds = \frac{1}{\lambda} f''(t) . \qquad (14)$$
Thus $f''(0) = 0$, so $D = A = 0$. Then, from (14), we have

$$-B\sin\theta t - Ct + 3t\Bigl[B\int_0^1 s\sin\theta s\,ds + \int_0^1 Cs^2\,ds\Bigr] = -B\sin\theta t .$$

Hence $\int_0^1 s\sin\theta s\,ds = 0$; thus $\theta_j$ is the solution of $\sin\theta_j - \theta_j\cos\theta_j = 0$, that is, $\tan\theta_j = \theta_j$, $j = 1, 2, \ldots$. Finally, $\lambda_j = \theta_j^2$. These are the weights given in Section 4.
A.3. Asymptotic percentage points

The final step is to calculate the percentage points of, say, $Z_{3\infty} = 12\sum_{i=1}^{\infty}\nu_i/\lambda_i$, where the $\lambda_i$ are the weights for Case 3. The mean of $\sum_{i=1}^{\infty}\nu_i/\lambda_i$ is $\int_0^1 \rho_3(s,s)\,ds = 1/15$, so that the mean of $Z_{3\infty}$ is $\mu_3 = 12/15$. The 80 smallest $\lambda_i$ were found, and $Z_{3\infty}$ was approximated by $S_1 = S^* + T$, where $S^* = 12\sum_{i=1}^{80}\nu_i/\lambda_i$ and $T = \mu_3 - 12\sum_{i=1}^{80}\lambda_i^{-1}$. $S_1$ differs from $Z_{3\infty}$ by $12\sum_{i=81}^{\infty}\lambda_i^{-1}(\nu_i - 1)$, which is a random variable with mean 0 and variance

$$288\sum_{i=81}^{\infty}\lambda_i^{-2} = 288\Bigl\{\int_0^1\!\!\int_0^1 \rho_3^2(s,t)\,ds\,dt - \sum_{i=1}^{80}\lambda_i^{-2}\Bigr\} ;$$
this value is negligibly small. Thus critical points of $Z_{3\infty}$ are found by finding those of $S^*$, using Imhof's (1961) method for a finite sum of weighted $\chi^2$ variables, and then adding $T$. These points are given in the last line of Table 2. Similar methods were used to give the asymptotic points for $Z_{2,n}$ and $Z_{2A,n}$ in Table 1.
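Imhof's method inverts the characteristic function of the finite sum numerically; as a rough substitute, the truncated approximation $S_1 = S^* + T$ can be simulated directly. The sketch below assumes, following eq. (3) above, that the tabulated $Z_{3\infty}$ corresponds to $12\sum\nu_i/\lambda_i$ over both $\lambda$-sets; the truncation point (80 weights per set), seed and replication count are illustrative, and Monte Carlo error of a few percent remains in the tail points:

```python
import math
import random

random.seed(7)

def tan_phi_roots(count):
    """Positive solutions of tan(phi) = phi, by bisection on sin(phi) - phi*cos(phi)."""
    g = lambda p: math.sin(p) - p * math.cos(p)
    roots = []
    for k in range(1, count + 1):
        lo, hi = k * math.pi + 1e-9, (k + 0.5) * math.pi - 1e-9
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots

K = 80
lams = [4 * math.pi ** 2 * i ** 2 for i in range(1, K + 1)] + \
       [4 * p * p for p in tan_phi_roots(K)]
weights = [12.0 / lam for lam in lams]          # contribution 12*nu_i/lambda_i per retained term
mu3 = 12.0 / 15.0                               # mean of Z_3,inf
T = mu3 - sum(weights)                          # mean correction for the discarded tail
reps = 10000
draws = sorted(sum(random.gauss(0.0, 1.0) ** 2 * w for w in weights) + T
               for _ in range(reps))
q05 = draws[int(0.95 * reps)]                   # upper 5% point; Table 2 gives 1.774
```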
References

Balakrishnan, N. (1984). Approximating the sum of squares of normal scores. Appl. Statist. 33, 242-245.
Coronel-Brizio, H. C. and M. A. Stephens (1996). Tests of fit based on probability plots. Research Report, Department of Mathematics and Statistics, Simon Fraser University, Burnaby, B.C., Canada, V5A 1S6.
David, H. A. (1981). Order Statistics. Wiley, New York.
Davies, O. L. (Ed.) (1956). The Design and Analysis of Industrial Experiments. Hafner, New York.
Davis, C. S. and M. A. Stephens (1977). The covariance matrix of normal order statistics. Comm. Statist. Simul. Comput. B6, 135-149.
De Wet, T. and J. H. Venter (1972). Asymptotic distributions of certain test criteria of normality. S. African Statist. J. 6, 135-149.
Durbin, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function. Regional Conference Series in Applied Mathematics, 9. SIAM, Philadelphia.
Durbin, J. and M. Knott (1972). Components of Cramér-von Mises statistics. J. Roy. Statist. Soc. B 34, 290-307.
Gerlach, B. (1979). A consistent correlation-type goodness-of-fit test; with application to the two-parameter Weibull distribution. Math. Operationsforsch. Statist. Ser. Statist. 10, 427-452.
Imhof, J. P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika 48, 419-426.
LaBrecque, J. (1977). Goodness-of-fit tests based on nonlinearity in probability plots. Technometrics 19, 293-306.
Lehman, H. Eugène (1973). On two modifications of the Cramér-von Mises statistic. J. Roy. Statist. Soc. B 35, 523.
Leslie, J. R. (1987). Asymptotic properties and new approximations for both the covariance matrix of normal order statistics and its inverse. In: Goodness-of-Fit (P. Révész, K. Sarkadi and P. K. Sen, eds), 317-354. Elsevier, New York.
Leslie, J. R., M. A. Stephens and S. Fotopoulos (1986). Asymptotic distribution of the Shapiro-Wilk W for testing for normality. Ann. Statist. 14, 1497-1506.
Lockhart, R. A. (1985). The asymptotic distribution of the correlation coefficient in testing fit to the exponential distribution. Canad. J. Statist. 13, 253-256.
Lockhart, R. A. (1991). Overweight tails are inefficient. Ann. Statist. 19, 2254-2258.
Lockhart, R. A. and M. A. Stephens (1995). The probability plot: consistency of tests of fit. Research Report, Department of Mathematics and Statistics, Simon Fraser University.
McLaren, C. G. and R. A. Lockhart (1987). On the asymptotic efficiency of certain correlation tests of fit. Canad. J. Statist. 15, 159-167.
Puri, M. L. and C. Radhakrishna Rao (1975). Augmenting Shapiro-Wilk test for normality. In: Contributions to Applied Statistics: Volume Dedicated to A. Linder, 129-139. Birkhäuser, New York.
Sarkadi, K. (1975). The consistency of the Shapiro-Francia test. Biometrika 62, 445-450.
Shapiro, S. S. and R. S. Francia (1972). An approximate analysis-of-variance test for normality. J. Amer. Statist. Assoc. 67, 215-216.
Shapiro, S. S. and M. B. Wilk (1965). An analysis-of-variance test for normality (complete samples). Biometrika 52, 591-611.
Spinelli, J. J. and M. A. Stephens (1987). Tests for exponentiality when origin and scale parameters are unknown. Technometrics 29, 471-476.
Stephens, M. A. (1974). EDF statistics for goodness-of-fit and some comparisons. J. Amer. Statist. Assoc. 69, 730-737.
Stephens, M. A. (1975). Asymptotic properties for covariance matrices of order statistics. Biometrika 62, 23-28.
Stephens, M. A. (1976). Asymptotic results for goodness-of-fit statistics with unknown parameters. Ann. Statist. 4, 357-369.
Stephens, M. A. (1986a). Tests based on regression and correlation. Chap. 5 in Goodness-of-Fit Techniques (R. B. D'Agostino and M. A. Stephens, eds). Marcel Dekker, New York.
Stephens, M. A. (1986b). Tests for the uniform distribution. Chap. 8 in Goodness-of-Fit Techniques (R. B. D'Agostino and M. A. Stephens, eds). Marcel Dekker, New York.
Watson, G. S. (1961). Goodness-of-fit tests on a circle. Biometrika 48, 109-114.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Distribution Assessment
Samuel Shapiro
1. Introduction
Many statistical analysis procedures require that the analyst assume some form for the distributional model which gave rise to the data. Early in the development of mathematical statistics the normal distribution was the model of choice; today there is a wide range of models to choose from. The accuracy of the analyses which require such assumptions depends on how close the chosen model is to the actual distribution. Thus it is not surprising that the history of "goodness of fit" goes back to the beginnings of the development of modern statistics; the initial procedure in this area was developed by Pearson (1900) and is the well-known chi-squared goodness-of-fit test. Since Pearson's beginnings in this area there has been a plethora of procedures developed for a wide range of statistical models. Each of these procedures attempts to make use of some property of the model being tested, and to use this property to differentiate between the model and other possible distributions. Since distributions can be described by their order statistics, it follows that properties of their order statistics can also be used to construct distributional tests. Some of these procedures use the sample order statistics directly, while others use the spacings between adjacent sample order statistics. In the following chapter we will limit the discussion to distributional tests based on order statistics which are composite (no assumptions about the values of the parameters are needed) and omnibus (good power against a wide range of possible alternative models). Most of these procedures can only be used for location and scale parameter families, or with distributions which can be transformed to a location-scale format. The procedures presented will be based on one of two rationales.
The first will use the regression relationship of the sample order statistics on the expected values of the order statistics from the standardized hypothesized model, i.e., the model stated in the null hypothesis with its location parameter equal to zero and its scale parameter equal to one. Letting $Y(i, n)$ be the $i$th order statistic from a sample of size $n$ from population $f(y; \mu, \sigma)$, we have the relationship

$$Y(i,n) = \mu + \sigma\, m(i,n) + \varepsilon(i), \qquad i = 1, 2, \ldots, n , \qquad (1)$$
where $\mu$ and $\sigma$ are the location and scale parameters, $m(i, n)$ is the expected value of the $i$th order statistic from a sample of size $n$ from $f(y; 0, 1)$, and $\varepsilon(i)$ is the random error. The rationale behind these regression-type tests of fit is that if the data were sampled from the hypothesized model, the regression would be a straight line, or equivalently the correlation between the $Y(i, n)$ and the $m(i, n)$ would be close to one. The second group of tests are based on some function of the weighted spacings of the sample order statistics, defined as

$$X(i, n) = K[Y(i, n) - Y(i-1, n)], \qquad i = 2, \ldots, n , \qquad (2)$$
where $K$ is some weighting function. These ideas will be developed in the following sections of this chapter. An extensive discussion of testing for distributional assumptions can be found in D'Agostino and Stephens (1986); for a handbook description of such procedures see Shapiro (1990) or Chapter 6 of Wadsworth (1990).
2. Probability plotting

2.1. Introduction

One of the earliest techniques which used the order statistics in distributional assessment was a graphical procedure known as probability plotting. Probability plotting can be used with scale and location parameter families of distributions. This technique, while not an objective procedure, yields a graphical representation of the goodness of fit of the data to the hypothesized model; the extent and magnitude of the departures from the model are apparent. The underlying rationale is to use the regression equation expressed in (1) and plot the ordered observations against the expected values of the order statistics from the null distribution. If the selected model is correct, this plot will be approximately linear, up to perturbations due to random error; the slope of the line will yield an estimate of the scale parameter, $\sigma$, and the intercept an estimate of the location parameter, $\mu$. If the model is incorrect, then the plot will deviate from linearity, usually in a systematic pattern, and the analyst will be able to reject the hypothesized distribution. The procedure also highlights outlying observations. While a subjective assessment as to linearity must be made, it is possible to make informed decisions, and the ability to distinguish between the null and alternative models gets easier as the sample size increases. Some of the earliest work in the area of probability plotting was done by Mosteller and Tukey (1949) in connection with the use of binomial probability paper. Chernoff and Lieberman (1956) discussed the use of generalized probability paper, and Birnbaum (1959) and Daniel (1959) developed the concepts involved in using half-normal probability plots in connection with the analysis of two-level factorial designs. Wilk et al. (1962) discussed probability plotting for the
gamma distribution and Wilk and Gnanadesikan (1961) applied the technique in connection with graphical analysis of certain multivariate experiments. Elementary discussions on the construction of probability plots appear in many statistical texts such as Hahn and Shapiro (1967), Shapiro (1990), Nelson (1986) and D'Agostino and Stephens (1986).
2.2. Construction of plots

One of the major assets of this procedure is the ease of preparing the plots. It is not necessary to know the parameters of the distribution being hypothesized, nor the expected values of the order statistics from the null, standardized distribution. Special paper is available for a number of distributions, where one of the scales has been transformed so that the user need only plot some function of the order number and sample size; the scaling of the paper transforms it to the corresponding value of $m(i, n)$ in equation (1). The choice of the function depends on the null distribution and is based on the work of Blom (1958), who suggested that a good approximation to the mean of the $i$th order statistic is

$$m(i,n) \approx F^{-1}(\pi_i) , \quad \text{where} \quad \pi_i = \frac{i - \alpha_i}{n - \alpha_i - \beta_i + 1} .$$

While various authors have made recommendations for specific values of $\alpha_i$ and $\beta_i$ for a variety of distributions (for example, Blom (1958) suggested using 3/8 for both of these constants with the normal model), in many cases the plotting positions $(i - 0.5)/n$ or $i/(n+1)$ are used as a general compromise. There is commercially available probability paper for the following distributions: normal, lognormal, exponential, extreme value, logistic, Weibull and chi-squared (with degrees of freedom known, up to ten). The latter can be used for a gamma plot with known shape parameter if the shape parameter corresponds to the degrees of freedom of the chi-squared paper. Wilk et al. (1962) describe how to construct probability paper for any gamma distribution when the shape parameter is known. Most statistical software packages have routines for the construction of probability plots, although for many of these the output is difficult to use. The following are the steps for constructing a probability plot if the computer is not used and the plotting paper is available.

1. Select the model to be tested and obtain a sheet of probability paper for the chosen model.
2. Let $Y_i$, $i = 1, 2, \ldots, n$, be the unordered sample values. Obtain the sample order statistics by ordering the observations from smallest to largest, and denote these as $Y(i, n)$, where $Y(1,n) \le Y(2,n) \le \cdots \le Y(n,n)$.
3. Note that $i$ denotes the order number.
4. Plot the $Y(i, n)$ vs $P(i) = (i - 0.5)/n$. (Other plotting positions may be appropriate, depending on the model and the user's preference.)
5. The $P(i)$'s are always plotted on the percentile scale preprinted on the paper. This may be the ordinate or the abscissa, depending on the brand of paper selected.
6. Note that it is not necessary to have all the sample values to prepare the plot; however, the order numbers of those values plotted must be known, as in the case of censored samples. The procedure can also be used with suspended values. This situation arises in a life test where units still on test have running times less than those of units that have already failed, so that the order numbers are not known. This situation is discussed in Nelson (1986).
7. A decision must be made as to whether the plotted values tend to fall on a straight line. Use of a straight edge usually assists in making this decision. Since the order statistics are not independent, there can be runs of points above and below a hypothetical straight line even when the null model is appropriate; thus tests of runs cannot be used in the assessment. In addition, the variance of the order statistics in unrestricted tails of the distribution is higher than in the central region or restricted tails, so greater deviations from linearity must be allowed in these regions. Systematic departures from linearity are the indicators of lack of fit and hence of model rejection. Also, the smaller the sample size, the greater the tolerance that must be allowed for departures from a straight line. Examples of probability plots from null and non-null distributions are shown in Hahn and Shapiro (1967) and can be used as a guide for rejection of a null hypothesis. Often an analytical test procedure is used in conjunction with the plot so that an objective assessment can be made.
In these cases the plot serves as a graphical guide to what the analytical procedure is indicating: it can show, when the model is rejected, whether the rejection was due to one or a few outlying observations, and give some idea of where the discrepancy between the data and the model occurred. For example, the discrepancy could occur because the upper tail of the hypothesized model was too short. This then serves as a guide to the selection of another model to test.
8. If the model is not rejected, then the plot can be used to estimate the parameters. Since these procedures depend on the null model and the paper being used, this topic will not be covered here; the reader is referred to Shapiro (1990). Usually parameter estimates can be obtained more accurately from a computer output.

2.3. Example

All probability plots are constructed in a similar manner; the exact steps depend on the paper selected and the null distribution. The following example illustrates the preparation of a plot for the case where the null distribution is the normal and the data have been left-censored: the smallest five readings are missing. In this experiment twenty-two specimens were measured for the quantity of a contaminant. The level of the contaminant in five samples was so small that it could not be measured by the instrument. The other 17 readings were ordered and are given in Table 1.
Table 1
Contaminants in samples

NA      NA      NA      NA      NA
0.006   0.008   0.010   0.012   0.013
0.015   0.020   0.027   0.030   0.036
0.037   0.039   0.041   0.044   0.057
0.066   0.091

The plotting points, $P(i) = (i - 0.5)/22$, were obtained from Appendix A in Nelson (1986) and are available in that source up to $n = 50$. Note that the sample size is 22 even though there are only 17 values available for plotting. The smallest recorded observation, corresponding to $i = 6$, is plotted against $(6 - 0.5)/22$, since the five missing observations are all less than this value. The plot is shown in Figure 1. Examination of the plot reveals that the plotted points fall along a straight line except for the largest value. Thus it is possible that 0.091 is an outlier and that further investigation of that reading is necessary. A replot of the data omitting this value is shown in Figure 2. There is no indication from this plot that the normal model should be rejected.
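The plotting coordinates for this censored sample can be generated in a few lines; the sketch below (an addition, not part of the original chapter) uses the standard library inverse normal for the probability scale and $P(i) = (i - 0.5)/22$, $i = 6, \ldots, 22$, as in the text:

```python
from statistics import NormalDist

values = [0.006, 0.008, 0.010, 0.012, 0.013, 0.015, 0.020, 0.027, 0.030,
          0.036, 0.037, 0.039, 0.041, 0.044, 0.057, 0.066, 0.091]
n = 22                                  # full sample size, including the 5 unmeasured readings
order = list(range(6, n + 1))           # order numbers of the recorded observations
p = [(i - 0.5) / n for i in order]      # plotting positions P(i)
scores = [NormalDist().inv_cdf(pi) for pi in p]   # normal scores for the probability axis

# correlation of the plotted points, as a rough measure of linearity
m = len(values)
mv, ms = sum(values) / m, sum(scores) / m
num = sum((v - mv) * (s - ms) for v, s in zip(values, scores))
den = (sum((v - mv) ** 2 for v in values) * sum((s - ms) ** 2 for s in scores)) ** 0.5
r = num / den
```

The smallest recorded value is plotted at $P(6) = 5.5/22 = 0.25$, exactly as described above; plotting `values` against `scores` reproduces Figure 1.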
Fig. 1. Normal probability plot of contaminants in samples.
Fig. 2. Replot of Fig. 1 after removal of largest value.

3. Regression type tests

3.1. Introduction
The original work in the use of equation (1) for goodness of fit was done by Shapiro and Wilk (1965) for the normal distribution,

$$f(y) = (2\pi\sigma^2)^{-1/2}\exp\bigl[-\tfrac12\{(y - \mu)/\sigma\}^2\bigr], \qquad -\infty < y < \infty, \quad -\infty < \mu < \infty, \quad \sigma > 0 .$$

The rationale behind this procedure was that an assessment of the linearity of the probability plot could be made using an analysis-of-variance approach. If the hypothesized model is correct then, as indicated above, the slope is an estimate of the scaling parameter, $\sigma$; if it is incorrect, then the slope is not an estimate of the scaling parameter. Thus a test of the null hypothesis could be made by comparing the estimate of the scaling parameter based on the slope with another estimate which does not depend on whether the hypothesized model is correct. The test statistic proposed for the normal distribution was a scaled ratio of these two estimators; it is also the square of the coefficient of correlation between the ordered observations and the expected values of the order statistics. This work gave rise to many suggestions: many attempted to eliminate the need for the coefficients required in the computation of the statistic, some attempted to simplify obtaining the percentiles of the test statistic, and some extended the rationale to test other null models. Some of these efforts will be described below.
3.2. Regression tests for normality Shapiro and Wilk (1965) proposed a test for normality which used the regression of the ordered observations on the expected value of the order statistics expressed in equation (1) to obtain an estimate of the scale parameter. Since the ordered observations are dependent, i.e. the ~i tS a r e correlated, generalized least squares [Aitken (1935) and Lloyd (1952)] were used as described in Chapter 1 in the companion volume to obtain the BLUE estimate for a. This was compared to the total sum of squares about the sample mean (another estimator, up to a constant, of a 2. This resulted in the test statistic
where Y(i,n) is the i-th order statistic from a sample of size n, the Y_i are the unordered observations,
a' = (a_1, ..., a_n) = (m'V^{-1}) / (m'V^{-1}V^{-1}m)^{1/2} , and m and V are the vector of expected values and the covariance matrix of the order statistics from the standard normal distribution. The authors tabled the values of the a_i's for sample sizes from 3 to 50 and supplied percentiles for a lower-tail test. Non-normality is indicated by a shift of the test statistic to lower values. Shapiro, Wilk and Chen (1968), Shapiro and Wilk (1965), Stephens (1974) and many others have compared the power of the W test to other procedures for testing normality and have concluded that the W test is a good omnibus test against a wide range of alternatives. The major disadvantage of the original procedure is that tables of constants and percentiles were needed, and these were limited to samples up to size 50. The first modification to (3) was proposed by Shapiro and Francia (1972), who extended the sample size to 100 by using a remark by Gupta (1952) that the estimate of σ was little affected if the covariance matrix, V, in (3) was replaced by the identity matrix, I. Using the test statistic
W' = [Σ b_i Y(i,n)]^2 / Σ(Y_i - Ȳ)^2 , (4)

where b' = (b_1, ..., b_n) = m' / (m'm)^{1/2} ,
the coefficients and percentiles for this procedure were given. The coefficients a_i and b_i can be found in Shapiro (1990). While this procedure extended the sample size range of the W test, it still required a table of constants and percentiles. In order to eliminate the need for the table of constants, a modification was proposed by Weisberg and Bingham (1975) which replaced the b_i's in (4) by H[(i - 3/8)/(n + 1/4)], where H(·) is the inverse of the standard normal distribution. (This suggestion uses Blom's (1958) approximation cited above.) Thus these two suggestions made it easier to calculate the statistic and increased the sample size range, but had the effect of reducing the power of the test for certain classes of alternative distributions as compared to the original. Ryan and Joiner (1973) and Filliben (1975) interpreted W' as the square of the correlation coefficient of the ordered observations and the expected values of the
S. Shapiro
order statistics. Filliben (1975) proposed use of the median of the order statistics in place of the expected value yielding the test statistic
r_F = Σ{Y(i,n) - Ȳ}{M(i,n) - M̄} / [Σ{Y(i,n) - Ȳ}^2 Σ{M(i,n) - M̄}^2]^{1/2} , (5)
where M(i,n) is the median of the ith order statistic for a sample of size n and M̄ is the average of the medians of the order statistics. This procedure gave power results comparable to W'. Royston (1992) took the next step by deriving approximations to the coefficients of the W test defined in (3) and proposed a normalizing transformation for the statistic so that the p-value of the test could be obtained from normal tables. Thus it was now possible to use the procedure for large samples without need of coefficients or special tables for percentiles. Work on the asymptotic properties of these procedures was done by De Wet and Venter (1972, 1973), where the asymptotic distribution of r_F was obtained using H[i/(n + 1)] instead of the Blom suggestion, and by Verrill and Johnson (1987), who showed the asymptotic equivalence of the Shapiro-Francia, Filliben, Weisberg-Bingham and De Wet-Venter procedures. A similar correlation-type test was proposed by Smith and Bain (1976), where the function H(·) was replaced by an approximation to it. In general, while the computational effort required for obtaining the test statistic was reduced, a comparison of the powers of these procedures with the original test (3) showed that the power was reduced for many alternative distributions, especially when the sample size was small. One advantage of the correlation-type tests is that they can be used with censored samples, since it is not necessary to calculate S. Another attempt to simplify the W procedure was suggested by D'Agostino (1971, 1972, 1973), where he suggested using
DA = [Σ Y(i,n){i - 0.5(n + 1)}] / [S n^{3/2}] , (6)

where S = [Σ(Y(i,n) - Ȳ)^2]^{1/2}. This procedure is two-tailed, and percentage points were given for a standardized version of the statistic in the cited references. The choice as to which of these procedures to use in any one application will generally depend on the computer software package available to the analyst. In general the original W test gives good overall power results against a wide range of alternatives; however, the differences in power among the various tests are small since they are all based on the same concept. It should be noted that for small samples these procedures are superior in power to the classical tests such as the chi-squared goodness-of-fit test and the Kolmogorov-Smirnov procedure.
Example
The following data set will be used to illustrate the details of some of the test procedures referred to in the text. It should be noted that most of the procedures
were done using a computer program and the intermediate results presented have been rounded. Therefore there may be round-off differences between these and the final results. The data are as follows after ordering:

9.1 9.3 9.4 9.5 9.8 10.2 10.5 10.8 11.3 11.9 12.5 12.9 13.6 14.2 16.0 18.2

n = 16 , ΣY = 189.2 , S^2 = 103.3904 .

Shapiro-Wilk W test and Royston's approximation
The coefficients a_i used with the highest eight values needed for equation (3) are, in reverse order (starting with the largest value): 0.5056, 0.3290, 0.2521, 0.1939, 0.1447, 0.1005, 0.0593 and 0.0196. The coefficients for the remaining values are the negatives of those listed; thus 18.2 is multiplied by 0.5056 and 9.1 by -0.5056. This gives b = 9.5829 and W = (9.5829)^2/103.3904 = 0.888. The lower-tail percentiles for n = 16, obtained from Shapiro (1990), are W_0.05 = 0.887 and W_0.10 = 0.906. Thus the p-value for the test is between 0.05 and 0.10. Using Royston's (1992) approximations for the coefficients yields a value of W of 0.888, and using his normalizing transformation gives a standardized normal Z-value of 1.62, which corresponds to a p-value of 0.0526.

Filliben's test
It is necessary to calculate the medians of normal order statistics in order to compute equation (5). Filliben (1975) suggested the following approximation. Let

m_1 = 1 - m_n ,
m_i = (i - 0.3175)/(n + 0.365) , i = 2, 3, ..., n - 1 ,
m_n = (0.5)^{1/n} .

Then M(i,n) = Φ^{-1}(m_i), where Φ^{-1}(·) is the inverse of the standard normal distribution. Note only half of these need be computed, since the medians are symmetric about zero. Once the M(i,n) are obtained, the statistic for the test can be obtained by calculating the Pearson product-moment coefficient of correlation. Using the above data set, the m_i's for the highest eight values, in reverse order, are: 0.9576, 0.8972, 0.8361, 0.7750, 0.7139, 0.6528, 0.5917 and 0.5305. The corresponding inverse values are: 1.72, 1.27, 0.98, 0.76, 0.56, 0.39, 0.23 and 0.08. The remaining values run from -0.08 to -1.72. Calculating the coefficient of correlation of the inverse values and the data points yields r_F = 0.944. The lower-tail percentiles from Filliben (1975) are r_0.05 = 0.940 and r_0.10 = 0.952, yielding a result similar to the W test.

D'Agostino's DA test
Using equation (6) we find the sum in the numerator to be 175.6, which yields a value of DA of 0.2698. The percentiles are tabulated for a normalized version of the statistic

Y = √n (DA - 0.28209)/0.029986 ,
which equals -1.63 for this example. Referring to the table in D'Agostino and Stephens (1986), the hypothesis of normality would be rejected if the normalized test statistic were less than -3.12 or greater than 0.526 for a test level of 0.05. Note this result is less discriminating than the prior procedures; however, this test was designed primarily for larger sample sizes.
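As a check on the arithmetic, the three computations above can be reproduced in a few lines. The sketch below is ours, not part of the original procedures, and assumes NumPy and SciPy; it recomputes W from the tabled coefficients, Filliben's statistic from his median approximation, and the standardized DA for the 16-point example.

```python
import numpy as np
from scipy.stats import norm, pearsonr

y = np.array([9.1, 9.3, 9.4, 9.5, 9.8, 10.2, 10.5, 10.8,
              11.3, 11.9, 12.5, 12.9, 13.6, 14.2, 16.0, 18.2])
n = len(y)
S2 = np.sum((y - y.mean()) ** 2)                 # 103.3904

# Shapiro-Wilk W, eq. (3): tabled upper-half coefficients (Shapiro, 1990);
# the lower-half coefficients are the negatives of these.
a = np.array([0.5056, 0.3290, 0.2521, 0.1939, 0.1447, 0.1005, 0.0593, 0.0196])
b = np.sum(a * (y[::-1][:8] - y[:8]))            # b ≈ 9.583
W = b ** 2 / S2                                  # W ≈ 0.888

# Filliben's r_F, eq. (5): medians of normal order statistics
m = np.empty(n)
m[-1] = 0.5 ** (1 / n)
m[0] = 1 - m[-1]
m[1:-1] = (np.arange(2, n) - 0.3175) / (n + 0.365)
rF = pearsonr(norm.ppf(m), y)[0]                 # r_F ≈ 0.944

# D'Agostino's DA, eq. (6), and its standardized version
i = np.arange(1, n + 1)
DA = np.sum(y * (i - 0.5 * (n + 1))) / (np.sqrt(S2) * n ** 1.5)  # ≈ 0.2698
Ystd = np.sqrt(n) * (DA - 0.28209) / 0.029986                    # ≈ -1.63
```

The same symmetry used in the text (positive coefficients paired with their negatives) is exploited in the line computing b.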
3.3. Regression test for the exponential distribution

Regression tests using the order statistics for the exponential distribution,

f(y) = λ exp[-λ(y - μ)] ; y > μ, λ > 0, -∞ < μ < ∞ ,
follow the same rationale as for the normal distribution. For an extensive description of the subject of testing for the exponential model see Balakrishnan and Basu (1995). The equivalent of the normality W test (equation (3)) for the exponential distribution, as proposed by Shapiro and Wilk (1972), was

W_e = b^2 / S^2 , (7)
where S is defined as in (6), b = [Ȳ - Y(1,n)]√(n/(n - 1)) and Y(1,n) is the first order statistic in the sample. The quantity b is the generalized least squares estimator of the slope of the regression line, multiplied by a constant. Unlike the normality test this is a two-tailed procedure, and percentiles of the test statistic were given for samples up to 100 in the above cited article and in Shapiro (1990). The above procedure makes no assumption regarding the origin parameter, μ. If μ is known, Stephens (1978) suggested including the known value as the smallest data point and treating the sample as one of size n + 1, thus increasing the power of the test. The W_e procedure is not consistent. D'Agostino and Stephens (1986) point out that nW_e converges in probability to one for the exponential distribution, and hence this test procedure will have no power against alternatives which have this same property. De Wet and Venter (1973) proposed a version of their normality test for use with the exponential distribution when the origin parameter is known to be zero, based on the quantities H(i,n) = -ln[1 - i/(n + 1)]; they derived the asymptotic distribution of this test statistic when sampling from an exponential model. The correlation coefficient version for the exponential distribution was proposed by Filliben (1975) and is
r^2 = [Σ{Y(i,n) - Ȳ}{m(i,n) - m̄}]^2 / [Σ{Y(i,n) - Ȳ}^2 Σ{m(i,n) - m̄}^2] , (8)
where m(i,n) = F^{-1}[i/(n + 1)] = -ln[1 - i/(n + 1)] and m̄ is the mean of the m(i,n)'s. This results in a lower-tailed test. An approximation to the percentiles for this test was given by Gan (1985). Two modifications of equation (8) were given by Gan and Koehler (1990). Their first suggestion, denoted k^2, substitutes Z(i,n) and the mean of the Z(i,n) for Y(i,n) and Ȳ, and p(i,n) and p̄ for m(i,n) and m̄ in equation (8), where Z(i,n) = F[{Y(i,n) - â}/b̂], â = Y(1,n) - b̂/n, b̂ = n[Ȳ - Y(1,n)]/(n - 1), p(i,n) = i/(n + 1) and p̄ is the mean of the p(i,n)'s. Their second version uses 1/2 for both the mean of the Z(i,n) and p̄. The percentiles for the test statistics are given by the function 1 - k_p^2 = (A_p + B_p n)^{-1}, and the values of A_p and B_p are given by the authors for each of the two statistics and can also be found in Balakrishnan and Basu (1995). Other correlation-type tests were suggested by Smith and Bain (1976), who proposed computing the product-moment correlation coefficient between the order statistics and -ln[1 - i/(n + 1)].
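A minimal sketch of Gan and Koehler's first P-P statistic k^2 follows, assuming the exponential form F(z) = 1 - e^{-z} for the fitted CDF and reusing the 16-point data set from the earlier worked example; the variable names are ours and no tabled A_p, B_p values are used here.

```python
import numpy as np

y = np.sort(np.array([9.1, 9.3, 9.4, 9.5, 9.8, 10.2, 10.5, 10.8,
                      11.3, 11.9, 12.5, 12.9, 13.6, 14.2, 16.0, 18.2]))
n = len(y)

# Estimators a-hat and b-hat given in the text
bhat = n * (y.mean() - y[0]) / (n - 1)
ahat = y[0] - bhat / n

# P-P coordinates: Z(i,n) = F[{Y(i,n) - a}/b] with F(z) = 1 - exp(-z)
Z = 1.0 - np.exp(-(y - ahat) / bhat)
p = np.arange(1, n + 1) / (n + 1)

# k^2: squared correlation of Z(i,n) with p(i,n), as in (8)
k2 = (np.sum((Z - Z.mean()) * (p - p.mean())) ** 2
      / (np.sum((Z - Z.mean()) ** 2) * np.sum((p - p.mean()) ** 2)))
```

For data as close to exponential as this sample, k^2 sits near one; the decision would be made against the Gan-Koehler percentile function quoted above.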
Example

Shapiro-Wilk test
Using the prior data set and calculations yields b = 2.8144, S^2 = 103.3904 and W_e = 0.07661 .
Referring to the table of percentiles in Shapiro (1990) for n = 16 it is found that the p-value for the test is close to 0.50.
Filliben test
Using the data with equation (8), which computes the square of the coefficient of correlation of the ordered data with -ln[1 - i/(n + 1)], yields r^2 = 0.997.
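Both exponential-test computations above can be sketched as follows; this is our illustrative code (NumPy assumed), reusing the 16-point data set.

```python
import numpy as np

y = np.array([9.1, 9.3, 9.4, 9.5, 9.8, 10.2, 10.5, 10.8,
              11.3, 11.9, 12.5, 12.9, 13.6, 14.2, 16.0, 18.2])
n = len(y)
S2 = np.sum((y - y.mean()) ** 2)

# Shapiro-Wilk exponential test, eq. (7)
b = (y.mean() - y.min()) * np.sqrt(n / (n - 1))   # ≈ 2.8144
We = b ** 2 / S2                                   # ≈ 0.0766

# Filliben-type correlation test, eq. (8), with m(i,n) = -ln[1 - i/(n+1)]
m = -np.log(1 - np.arange(1, n + 1) / (n + 1))
r2 = np.corrcoef(y, m)[0, 1] ** 2                  # ≈ 0.997
```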
3.4. Regression tests for the Weibull and extreme value distributions

The Weibull distribution,

f(y) = (β/α)(y/α)^{β-1} exp[-(y/α)^β] ; y ≥ 0, β > 0, α > 0 ,
has a shape parameter, β, and hence the regression-type tests described above are not directly applicable to this model; however, the logarithmic transformation W(i,n) = ln[Y(i,n)] converts this model to an extreme value distribution for minimums, and hence the W(i,n) are the order statistics from the extreme value minimum distribution,

f(y) = (1/σ) exp[u - exp(u)] , -∞ < y < ∞ , u = (y - μ)/σ , -∞ < μ < ∞ , σ > 0 ,
which has no shape parameter. Thus regression tests for the Weibull distribution are the same as for the extreme value for minimums model except that the data
are first transformed prior to applying the test. In the paragraphs below the regression test for the extreme value for minimums will be described; however, these can be used with the Weibull by using the above transformation. A test for the extreme value for maximums distribution,

f(y) = (1/σ) exp[-u - exp(-u)] , -∞ < y < ∞ , u = (y - μ)/σ , -∞ < μ < ∞ , σ > 0 ,
is obtained by using the transformation W(i,n) = -Y(i,n). The direct adaptation of the original normality W test to the extreme value distribution is difficult since it requires the covariance matrix of the order statistics, which is not directly available. However, since the expected values of the order statistics can be approximated using Blom's approach, it is possible to use the W' procedure, equation (4), described above. Shapiro and Brain (1984) suggested using α = 0.48 and β = 0.33 in Blom's approximation. Since F^{-1}(π_i) = ln ln{1/(1 - π_i)}, where π_i = (i - α)/(n + 1 - α - β), this gives the approximation

m(i,n) = ln ln{1/[1 - (i - 0.48)/(n + 0.19)]} .

Using the above in (4) and noting that Σ m(i,n) ≈ -nγ (γ is Euler's constant) yields
W_e' = b'^2 / S^2 , (9)
where b' = [Σ{m(i,n) + γ}Y(i,n)] / [Σ{m(i,n) + γ}^2]^{1/2} and S^2 is as defined previously. Note that for the test of a Weibull hypothesis Y(i,n) is replaced by W(i,n). This statistic uses no information about the covariance structure of the order statistics, and unlike the case of the normal distribution this reduces the power of the procedure. Shapiro and Brain (1987) described a modification of this test, replacing the estimate of the slope of the regression line in (9) by an estimator suggested by D'Agostino (1971a). The modified procedure
W_e = n b^2 / S^2 (10)

uses
b = [0.6079 L_2 - 0.2570 L_1]/n ,

where L_1 = Σ_{i=1}^{n} v_i Y(i,n), L_2 = Σ_{i=1}^{n} v_{n+i} Y(i,n), and

v_i = ln[(n + 1)/(n + 1 - i)] , i = 1, 2, ..., n - 1 ; v_n = n - Σ_{i=1}^{n-1} v_i ;

v_{n+i} = v_i[1 + ln v_i] - 1 , i = 1, 2, ..., n - 1 ; v_{2n} = 0.4228 n - Σ_{i=1}^{n-1} v_{n+i} .
The resulting statistic provides a two tailed test for the extreme value distribution for minimums. The authors provided the following regression approximation to
the percentiles of the test statistic for p = 0.005, 0.025, 0.050, 0.950, 0.975 and 0.995:

W_{e,p} ≈ β_{0p} + β_{1p} ln[n] + β_{2p} {ln[n]}^2 .
The values of the coefficients can be found in Shapiro and Brain (1987), Shapiro (1990) and Wadsworth (1990). The power of this procedure was shown to be superior to the procedure defined in (9) and to a test based on spacings proposed by Mann et al. (1973). Ozturk and Korukoglu (1988) modified the above procedure, replacing the denominator in (10) by the probability-weighted moment estimator of σ,

σ̂ = Σ[(2i - n - 1)Y(i,n)] / [0.693147 n(n - 1)] ,

and defined the statistic

W* = b/σ̂ , (10a)
where b and σ̂ are defined in the above paragraphs. The authors obtained the null distribution of a normalized version of the test statistic via simulation and gave percentage points for a range of p values for samples up to size 1000. They showed via simulation that the power of this procedure was equal to or better than that of the test given in (10) for most of the alternatives included in their study. Distributional tests for the extreme value distributions (minimum, maximum or Weibull) based on the coefficient of correlation are given in D'Agostino and Stephens (1986), Gerlach (1979) and Stephens (1986). The advantage of these tests is that they are applicable to censored samples. The reader should note that the above procedures for the Weibull are composite tests only when the origin parameter is assumed to be zero or is known; they are not valid if this parameter is unknown.
Example
The data from the prior example are again used to illustrate the calculations. The test for the extreme value distribution uses the original data in equation (10) above. This yields

L_1 = 228.7757 , L_2 = 173.8541 , b = 2.9307 , S^2 = 103.39 , W_e = 1.329 .

Using the coefficients in Shapiro (1990) with p = 0.995 yields a value of W_{e,p} = 1.178. Since the computed value exceeds this value, the hypothesis of the extreme value distribution is rejected at a test level of 0.01; stated otherwise, the p-value of the test is less than 0.01. The test for the Weibull distribution is calculated using the natural logs of the data. This yields

L_1 = 42.2260 , L_2 = 23.6134 , b = 0.2189 , S^2 = 0.6425 , W_e = 1.193 .
Since this value exceeds 1.178, the p-value of the test is less than 0.01. The reader will note that this same set of data was used to test for the exponential distribution, a special case of the Weibull, and that a p-value close to 0.50 was obtained. The difference in the results is due to the fact that in the exponential test the origin was unspecified, while in the Weibull case it was assumed to be zero.
Ozturk and Korukoglu statistic
Calculation of W* in equation (10a) uses the same value of b as above but substitutes σ̂ for S in the statistic. For this set of data σ̂ = 0.1718 and W* = 1.2742.
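The statistic (10) and Ozturk-Korukoglu's W* can be sketched together; the helper below is our own packaging (NumPy assumed) of the weight scheme given for b, applied to the raw data for the extreme value test and to the logged data for the Weibull test.

```python
import numpy as np

def shapiro_brain_We(y):
    """Sketch of the modified test statistic (10) and of W* from (10a)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    i = np.arange(1, n)                        # i = 1, ..., n-1
    v = np.empty(n)
    v[:-1] = np.log((n + 1) / (n + 1 - i))
    v[-1] = n - v[:-1].sum()                   # v_n
    vn = np.empty(n)
    vn[:-1] = v[:-1] * (1 + np.log(v[:-1])) - 1
    vn[-1] = 0.4228 * n - vn[:-1].sum()        # v_2n
    L1, L2 = v @ y, vn @ y
    b = (0.6079 * L2 - 0.2570 * L1) / n
    We = n * b ** 2 / np.sum((y - y.mean()) ** 2)
    # probability-weighted moment estimator of sigma, for W* = b / sigma-hat
    sigma = np.sum((2 * np.arange(1, n + 1) - n - 1) * y) / (0.693147 * n * (n - 1))
    return We, b / sigma

y = np.array([9.1, 9.3, 9.4, 9.5, 9.8, 10.2, 10.5, 10.8,
              11.3, 11.9, 12.5, 12.9, 13.6, 14.2, 16.0, 18.2])
We_ev, _ = shapiro_brain_We(y)               # extreme-value test on the raw data
We_wb, Wstar = shapiro_brain_We(np.log(y))   # Weibull test on the logged data
```

Run on the worked example, this reproduces W_e ≈ 1.329 (extreme value), W_e ≈ 1.193 and W* ≈ 1.274 (Weibull).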
3.5. Correlation tests for other distributions

It should be clear that correlation-type tests can be generated for any location-scale parameter family as long as the appropriate H(i,n) function using Blom's approximation can be found. Then one need only use Monte Carlo simulation to generate percentiles of the distribution and either use a regression technique to summarize the results as a function of n or prepare appropriate tables. Examples for the logistic and Cauchy distributions are given in D'Agostino and Stephens (1986) and for the exponential power distribution in Smith and Bain (1976).
4. Use of spacings of the order statistics
4.1. Spacing tests for the normal distribution

Distributional tests based on equation (2) have been developed for the distributions cited above. One such test for the normal distribution was developed by Chen and Shapiro (1995) and adapts the rationale of the original W procedure for use with the normalized spacings. In this procedure the estimate of σ in the numerator is obtained by considering the regression of Y(i+1,n) - Y(i,n) on m(i+1,n) - m(i,n). Thus from equation (1) we have
Y(i+1,n) - Y(i,n) = σ[m(i+1,n) - m(i,n)] + (ε_{i+1} - ε_i) , i = 1, 2, ..., n-1 .

Hence

[Y(i+1,n) - Y(i,n)]/[m(i+1,n) - m(i,n)] = σ + [ε_{i+1} - ε_i]/[m(i+1,n) - m(i,n)] , i = 1, 2, ..., n-1 .

Let

Z(i,n) = [Y(i+1,n) - Y(i,n)]/[m(i+1,n) - m(i,n)]

and
f_i = (ε_{i+1} - ε_i)/[m(i+1,n) - m(i,n)] ,

yielding the regression relationship

Z(i,n) = σ + f_i , i = 1, 2, ..., n-1 .
Minimizing Σ f_i^2 results in the estimator of σ

σ̂ = (n - 1)^{-1} Σ Z(i,n) .
Now the m(i,n) are replaced using Blom's approximation; let

H(i,n) = Φ^{-1}[(i - 3/8)/(n + 1/4)] ,

where Φ^{-1}(·) is the inverse of the standard normal distribution, as was done in Weisberg and Bingham (1975) described above. The test was defined as
QH* = √n (1 - QH) , (11)

where

QH = [(n - 1)s]^{-1} Σ [Y(i+1,n) - Y(i,n)]/[H(i+1,n) - H(i,n)]

and s = [Σ(Y_i - Ȳ)^2/(n - 1)]^{1/2} is the sample standard deviation.
The properties of the test, the regression function to get the percentiles for samples up to 2000, and power comparisons with the regression tests referenced above are given in Chen and Shapiro (1995). In summary, it was shown that for many alternatives the power of this procedure was as good as or better than that of the original W test.
Example
Chen and Shapiro statistic
Again using the same set of 16 observations as above with equation (11) yields the following results:

H(i) = -1.7688, -1.2816, -0.9882, -0.7618, -0.5692, -0.3957, -0.2335, -0.0772, 0.0772, 0.2335, 0.3957, 0.5692, 0.7618, 0.9882, 1.2816, 1.7688

and s = 2.6254. This yields a value of QH = 0.9863 and QH* = 0.0548. The corresponding p-value is 0.052.
4.2. Spacing tests for the exponential distribution

The order statistics from an exponential distribution have the property that the weighted spacings
S. Shapiro
490
X(i,n) = (n - i + 1)[Y(i,n) - Y(i-1,n)] , i = 1, 2, ..., n , (12)
where Y(0,n) = μ, the origin parameter, are independently and identically distributed exponential variates with μ = 0. This property was utilized by Gnedenko, Belyayev and Solovyev (1969) to construct a test for the exponential distribution which assesses whether the hazard rate function is constant over time. They divided the spacings into two groups, the first r and the remaining s = n - r. The test statistic was defined as
G(r,s) = [s Σ_{i=1}^{r} X(i,n)] / [r Σ_{i=r+1}^{n} X(i,n)] . (13)
If the null hypothesis is true, then both the numerator and denominator are sums of independent exponential variates with the same origin and scale parameters, and hence G(r,s) will have an F distribution with 2r and 2s degrees of freedom. If the origin parameter, μ, is not known, then one spacing is lost and the test is run with the remaining spacings. The procedure is useful with censored samples. The test is not completely objective since the value of r must be selected arbitrarily; the authors gave suggestions for making the choice. Other authors have made suggestions for similar procedures; these include Fercho and Ringer (1972), Harris (1976) and Lin and Mudholkar (1980). The latter authors evaluated the power of these procedures and concluded that for a wide range of alternatives their procedure, the bivariate F test, and (13) had the highest power. The bivariate F procedure was more powerful for alternatives with non-monotone hazard rate functions such as the lognormal. This procedure requires the user to divide the weighted spacings into three groups, the first consisting of the initial r spacings, the second consisting of the next n - 2r spacings and the third consisting of the last r spacings. Two F ratios are computed: F1 is the ratio of the sum of the first r spacings to the sum of the middle spacings, and F2 is the ratio of the sum of the middle spacings to the sum of the upper spacings. If the hazard function is constant then the two resulting statistics will have a bivariate F distribution. The authors provide suggestions for the choice of r and show how the univariate F can be used to obtain percentiles of the test statistic. Brain and Shapiro (1983) suggested another method of using the weighted spacings defined in (12) to assess whether the hazard rate function is constant. They suggested regressing the spacings versus their order number and testing whether the slope of this regression was zero.
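A sketch of the Gnedenko test (13) under the unknown-origin convention is given below (NumPy and SciPy assumed). The split r = 7 is an arbitrary illustrative choice, and the 16-point data set from the earlier examples is reused; how to turn the F tail probability into a two-sided decision is left to the reader.

```python
import numpy as np
from scipy.stats import f as f_dist

y = np.array([9.1, 9.3, 9.4, 9.5, 9.8, 10.2, 10.5, 10.8,
              11.3, 11.9, 12.5, 12.9, 13.6, 14.2, 16.0, 18.2])
n = len(y)

# Weighted spacings X(2,n), ..., X(n,n) from eq. (12);
# the origin is unknown, so the first spacing is lost.
x = (n - np.arange(1, n)) * np.diff(y)

r = 7                      # arbitrary split for illustration
s = len(x) - r
G = (s * x[:r].sum()) / (r * x[r:].sum())
p_upper = f_dist.sf(G, 2 * r, 2 * s)   # upper-tail probability from F(2r, 2s)
```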
Two procedures were investigated, one simply estimated the slope of the regression line and tested the hypothesis that it was zero and the second fit a second degree orthogonal polynomial and jointly tested whether the coefficients of the linear and quadratic terms were zero. The latter would be useful against alternatives which have non-monotone hazard rates. The first procedure for the case where the origin parameter is unknown was defined as
where a_i = i - n/2. The distribution of Z is approximately standard normal even for relatively small samples and the test is two-tailed. The quadratic term is given by

Z_q = [5/{4(n + 1)(n - 2)(n - 3)}]^{1/2} .
The distribution of Z_q is also approximately standard normal and independent of Z. The combined test for the linear and quadratic components is given by

Z* = Z^2 + Z_q^2 . (15)
Since both the components of Z* are orthogonal contrasts and are squares of independent approximately normal variates, Z* has a distribution which converges with increasing sample size to the chi-squared distribution with 2 degrees of freedom. The authors gave the following small-sample corrections to selected percentiles for the case where the sample size is greater than 14:

Z*_{0.90} = 4.605 - 2.5/n
Z*_{0.95} = 5.991 - 1.25/n
Z*_{0.975} = 7.378 + 3.333/n .

The constant term is the corresponding percentile from the chi-squared two-degree-of-freedom distribution. The authors described the modifications necessary for the case where the origin parameter is known. Another procedure based on spacings and the analysis of variance approach used in (3) was given in Tiku (1980). Two estimates of the standard deviation based on spacings are computed, and the proposed statistic for complete samples is

Z = 2 Σ(n - 1 - i) D_i / [(n - 2) Σ D_i] , (16)
where D_i = (n - i)[Y(i+1,n) - Y(i,n)]. He showed that the distribution of Z/2 is the same as that of the mean of (n - 2) i.i.d. uniform random variates, which was given by Hall (1927). This test can also be used with censored samples.

Example
Brain and Shapiro statistic
The prior data will be used to illustrate the use of equation (15) to test for an exponential distribution. It is assumed that the origin is unknown; hence there are 15 spacings available. The value of Z is 0.05096 and Z_q is -0.59894, yielding a value of Z* of 0.3613. Using the equation for the 90th percentile, the critical value for the test is 4.4488. Since the computed value is less than this, the p-value is greater than 0.10.

4.3. Spacing tests for the extreme value and Weibull distributions
The procedure described in (11) using the spacings to test for the normal distribution by Chen and Shapiro (1996) can also be used with the extreme value
distributions and the Weibull using the transformations described in paragraph 3.4. Chen and Shapiro (1996) proposed the statistic

QH = [(n - 1)b]^{-1} Σ [{Y(i+1,n) - Y(i,n)}/{H(i+1,n) - H(i,n)}] , (17)

where H(i,n) = G{(i - 0.5)/n}, G(a) = ln{-ln(1 - a)}, and b is as defined in (10). When n is equal to or greater than 15 the following function of QH is approximately standard normally distributed:

Z_QH = {QH - 1.008518 - 0.24756 n^{-1/2} + 1.251697 n^{-1}} / [0.212212 n^{-1/2} - 0.025916 n^{-1}] .
This is a two-tailed procedure. A Monte Carlo study showed that the procedure suggested by Ozturk and Korukoglu (1988) described above, (10a), had slightly higher power than (17); its power was on the average two percent higher. However, the procedure in (17) does not require any tables, and p-values can be calculated using the normal distribution.

Example
Chen and Shapiro statistic
The prior data and the calculation of b = 0.2189 for the Weibull test in 3.4 are used to illustrate the above procedure. The H(i,n) are first computed: H(1,16) = -3.4499, H(2,16) = -2.3183, H(3,16) = -1.7726, etc. This yields a value of QH = 0.9072 and Z_QH = -1.65. Thus the p-value for the test is approximately 0.10.
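The Weibull-case computation of (17) can be sketched as follows (NumPy assumed); b from (10) is recomputed inline on the logged data so the fragment is self-contained, and the layout is ours.

```python
import numpy as np

# Weibull test: work with the natural logs of the example data
y = np.log(np.array([9.1, 9.3, 9.4, 9.5, 9.8, 10.2, 10.5, 10.8,
                     11.3, 11.9, 12.5, 12.9, 13.6, 14.2, 16.0, 18.2]))
n = len(y)

# b from (10), using the v-weights given in paragraph 3.4
i = np.arange(1, n)
v = np.empty(n)
v[:-1] = np.log((n + 1) / (n + 1 - i))
v[-1] = n - v[:-1].sum()
vn = np.empty(n)
vn[:-1] = v[:-1] * (1 + np.log(v[:-1])) - 1
vn[-1] = 0.4228 * n - vn[:-1].sum()
b = (0.6079 * (vn @ y) - 0.2570 * (v @ y)) / n

# QH of (17) with H(i,n) = ln{-ln(1 - (i - 0.5)/n)}
H = np.log(-np.log(1 - (np.arange(1, n + 1) - 0.5) / n))
QH = np.sum(np.diff(y) / np.diff(H)) / ((n - 1) * b)        # ≈ 0.907

# Normalizing transformation for n >= 15
ZQH = (QH - 1.008518 - 0.24756 / np.sqrt(n) + 1.251697 / n) / (
    0.212212 / np.sqrt(n) - 0.025916 / n)                    # ≈ -1.65
```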
References

Aitken, A. C. (1935). On least squares and linear combination of observations. Proc. Roy. Soc. Edin. 55, 42-48.
Balakrishnan, N. and A. P. Basu (1995). The Exponential Distribution: Theory, Methods and Applications. Langhorne, PA: Gordon and Breach.
Birnbaum, A. (1959). On the analysis of factorial experiments without replication. Technometrics 1, 343-348.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. New York: John Wiley and Sons.
Brain, C. W. and S. S. Shapiro (1983). A regression test for exponentiality: Censored and complete samples. Technometrics 25, 69-76.
Chen, L. and S. S. Shapiro (1995). An alternate test for normality based on normalised spacings. J. Statist. Comput. Simul. 53, 269-287.
Chen, L. and S. S. Shapiro (1996). Can the idea of the QH test for normality be used for testing the Weibull distribution? J. Statist. Comput. Simul.
Chernoff, H. and G. J. Lieberman (1956). The use of generalized probability paper for continuous distributions. Ann. Math. Statist. 27, 806-818.
D'Agostino, R. B. (1971). An omnibus test of normality for moderate and large sample sizes. Biometrika 58, 341-348.
D'Agostino, R. B. (1971a). Linear estimation of the Weibull parameters. Technometrics 13, 171-182.
D'Agostino, R. B. (1972). Small sample probability points for the D test of normality. Biometrika 59, 219-221.
D'Agostino, R. B. (1973). Monte Carlo comparison of the W' and D test of normality for N = 100. Commun. Statist. 1, 545-551.
D'Agostino, R. B. and M. A. Stephens (1986). Goodness-of-Fit Techniques. New York: Marcel Dekker.
Daniel, C. (1959). Use of half-normal plots in interpreting factorial two-level experiments. Technometrics 1, 311-341.
De Wet, T. and J. H. Venter (1972). Asymptotic distribution of certain test criteria of normality. S. Afr. Statist. J. 6, 135-149.
De Wet, T. and J. H. Venter (1973). Asymptotic distributions for quadratic forms with application to tests of fit. Ann. Statist. 1, 380-387.
Fercho, W. W. and L. J. Ringer (1972). Small sample power of some tests of constant failure rate. Technometrics 14, 713-724.
Filliben, J. J. (1975). The probability plot correlation coefficient test for normality. Technometrics 17, 111-117.
Gan, F. F. (1985). Goodness-of-Fit Statistics for Location-Scale Distributions. Unpublished Ph.D. thesis, Iowa State University, Department of Statistics.
Gan, F. F. and K. J. Koehler (1990). Goodness-of-fit tests based on P-P probability plots. Technometrics 32, 289-303.
Gerlach, B. (1979). A consistent correlation-type goodness-of-fit test; with application to the two-parameter Weibull distribution. Math. Operationsforsch. Statist. Ser. Statist. 10, 427-452.
Gnedenko, B. V., Y. K. Belyayev and A. D. Solovyev (1969). Mathematical Methods of Reliability. New York: Academic Press.
Gupta, A. K. (1952). Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika 39, 260-273.
Hahn, G. J. and S. S. Shapiro (1967). Statistical Models in Engineering. New York: John Wiley.
Hall, P. (1927). Distribution of the mean of samples from a rectangular population. Biometrika 19, 240.
Harris, C. M. (1976). A note on testing for exponentiality. Nav. Res. Log. Quart. 23, 169-175.
Lin, C. C. and G. S. Mudholkar (1980). A test of exponentiality based on the bivariate F distribution. Technometrics 22, 79-82.
Lloyd, E. H. (1952). Least squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
Mann, N. R., E. M. Scheuer and K. W. Fertig (1973). A new goodness of fit test for the Weibull distribution or extreme value distribution with unknown parameters. Commun. Statist. 2, 383-400.
Mosteller, F. and J. W. Tukey (1949). The uses and usefulness of binomial probability paper. J. Amer. Statist. Assoc. 44, 174-212.
Nelson, W. (1986). How to Analyze Data with Simple Plots. Volume 1, ASQC Basic References in Quality Control: Statistical Techniques. Milwaukee: American Society for Quality Control.
Ozturk, A. and S. Korukoglu (1988). A new test for the extreme value distribution. Commun. Statist. Simul. 17(4), 1375-1393.
Pearson, K. (1900). On the theory of contingency and its relation to association and normal correlation. Philos. Mag. 50, 157-175.
Royston, J. P. (1992). Approximating the Shapiro-Wilk W test for non-normality. Statist. Comput. 2, 117-119.
Ryan, T. and B. Joiner (1973). Normal probability plots and tests for normality. Technical report, Pennsylvania State University.
Shapiro, S. S. (1990). How to Test Normality and Other Distributional Assumptions. Volume 3, ASQC Basic References in Quality Control: Statistical Techniques. Milwaukee: American Society for Quality Control.
Shapiro, S. S. and C. W. Brain (1984). Some new tests for the Weibull and extreme value distributions. Colloquia Mathematica Societatis Janos Bolyai 45, 511-527.
Shapiro, S. S. and C. W. Brain (1987). W test for the Weibull distribution. Commun. Statist. Part B Simul. Comput. 16, 209-219.
Shapiro, S. S. and R. S. Francia (1972). An approximate analysis of variance test for normality. J. Amer. Statist. Assoc. 67, 215-225.
Shapiro, S. S. and M. B. Wilk (1965). An analysis of variance test for normality (complete samples). Biometrika 52, 591-611.
Shapiro, S. S. and M. B. Wilk (1972). An analysis of variance test for the exponential distribution (complete samples). Technometrics 14, 355-370.
Shapiro, S. S., M. B. Wilk and H. J. Chen (1968). A comparative study of various tests for normality. J. Amer. Statist. Assoc. 63, 1343-1372.
Smith, R. M. and L. J. Bain (1976). Correlation type goodness of fit statistics with censored data. Commun. Statist. Theory and Methods 5, 119-132.
Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. J. Amer. Statist. Assoc. 69, 730-737.
Stephens, M. A. (1978). On the W test for exponentiality with origin known. Technometrics 20, 33-35.
Stephens, M. A. (1986). Goodness-of-fit for censored data. Technical Report, Department of Statistics, Stanford University.
Tiku, M. L. (1980). Goodness of fit statistics based on the spacings of complete or censored samples. Austral. J. Statist. 22, 260-275.
Verrill, S. and R. A. Johnson (1987). The asymptotic equivalence of some modified Shapiro-Wilk statistics: complete and censored sample cases. Ann. Statist. 15, 413-419.
Wadsworth, H. M. (1990). Handbook of Statistical Methods for Engineers and Scientists. New York: McGraw-Hill.
Weisberg, S. and C. Bingham (1975). An approximate analysis of variance test for non-normality suitable for machine calculation. Technometrics 17, 133-134.
Wilk, M. B. and R. Gnanadesikan (1961). Graphical analysis of multi-response experimental data using ordered distances. Proceedings of the National Academy of Sciences 47, 1209-1212.
Wilk, M. B., R. Gnanadesikan and M. J. Huyett (1962). Probability plots for the gamma distribution. Technometrics 4, 1-20.
Part VI Applications
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Application of Order Statistics to Sampling Plans for Inspection by Variables
Helmut Schneider and Frances Barbera
1. Introduction
Acceptance sampling is a statistical process used to determine whether an incoming (or outgoing) quantity of goods conforms to specifications and should be accepted or rejected. Acceptance sampling means that either the producer or the consumer (or both) checks the quality of the product to determine its disposition. More specifically, when it is used, acceptance sampling is usually performed by (1) a company receiving a shipment of goods, or (2) a company producing a batch of goods. There are two broad categories of acceptance sampling: (1) attribute sampling and (2) variable sampling. Attribute sampling consists of counting the number of defectives or defects, while variable sampling consists of assessing the chosen quality characteristic by measuring its value on a continuous scale. For variables sampling plans, sample statistics, usually the sample mean and sample standard deviation, are computed from the sample and used to form a test statistic. A decision is then made to either accept or reject the lot by comparing the test statistic with the product's specification limit. The primary advantage of variable sampling plans over attribute sampling plans is their efficiency (i.e., achieving the same discriminating power with a smaller sample). There are primarily two areas where order statistics are used in acceptance sampling. First, since the performance of variable sampling plans is very sensitive to outliers, various techniques employing order statistics have been suggested to make variable sampling plans more robust. Second, when the quality characteristic is time to failure, censored sampling plans employing order statistics may be used. This article discusses the application of order statistics to variable sampling plans. The paper is organized as follows. In the following section we introduce variable sampling plans. Section 3 then deals with the application of order statistics for increasing the robustness of variable sampling plans.
Section 4 discusses the use of order statistics for failure censored sampling plans, while Section 5 deals with methods of reduced test time such as accelerated life test sampling
plans and sampling schemes where several sets of units are tested. Section 6 gives a conclusion and outlook.
2. Sampling plans for inspection by variables

Consider the situation where the quality characteristic under investigation, X, follows a normal distribution Φ(x; μ, σ). The well-known k method of Lieberman and Resnikoff (1955) is applied for variable sampling plans with a one-sided specification limit. A sampling plan is created by specifying a sample size n and an acceptability constant k. Let μ̂ be an estimate of the mean and σ̂ an estimate of the standard deviation of X; then the value

t = μ̂ − kσ̂
(1)
is compared with L, a lower specification limit. On the basis of this comparison, each lot is either accepted (t ≥ L) or rejected (t < L). Equivalently one may choose the statistic
t′ = (μ̂ − L)/σ̂
(2)
and reject the batch if t′ < k. If an upper specification limit, U, is used, then t = μ̂ + kσ̂ is compared with U and the lot is rejected when t > U; otherwise the lot is accepted. Again, using t′ = (U − μ̂)/σ̂, the batch will be rejected if t′ < k. Two critical quality levels are usually associated with acceptance sampling plans: an acceptable quality level (AQL) and a lot tolerance percent defective (LTPD), which is also called the rejectable quality level (RQL). The AQL represents the percent defective considered acceptable as a process average. The LTPD represents the level of quality that the consumer wants to have rejected. The acceptance criterion and sample size are often chosen such that the probability of rejecting a lot coming from a process operating at the AQL and the probability of accepting a lot coming from a process operating at the LTPD are preassigned values α and β, respectively (see, for example, Owen, 1963). Hence, the probability of accepting a lot coming from a process operating at the AQL is 1 − α, and the probability of rejecting a lot coming from a process operating at the LTPD is 1 − β. Alpha (α) is the producer's risk, and β is the consumer's risk. For normally distributed characteristics one uses the statistics μ̂ = x̄ and σ̂ = s. We will denote the latter plans as the sampling plans of Lieberman and Resnikoff (1955). An identical statement of the sampling-plan specification problem can be made in terms of hypothesis testing. Essentially we are seeking tests of the hypothesis concerning the fraction of defectives p:
H_0 : p = AQL
and
H_a : p = LTPD
.
(3)
Thus we state that P{acceptlHo }=1-~
and
P{accept | H_a} = β ,
(4)
and we seek to determine n and k which satisfy this requirement.
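For normally distributed characteristics, the pair (n, k) satisfying (4) can be computed from the two OC points with the standard large-sample approximation. The sketch below, using only the Python standard library, illustrates this textbook approximation, not the exact Lieberman and Resnikoff procedure; the factor 1 + k²/2 accounts for estimating σ by s, and the quality levels in the example are illustrative.

```python
from math import ceil
from statistics import NormalDist

def normal_plan(p_aql, alpha, p_ltpd, beta):
    """Approximate (n, k) for a one-sided variables plan with sigma unknown.

    Accept the lot when xbar - k*s >= L.  Large-sample normal theory,
    not the exact tables.
    """
    q = NormalDist().inv_cdf
    z1, z2 = q(p_aql), q(p_ltpd)        # percentiles of the fractions defective
    u_a, u_b = q(1 - alpha), q(beta)    # OC-curve ordinates 1 - alpha and beta
    k = (u_b * z1 - u_a * z2) / (u_a - u_b)
    n = (1 + k * k / 2) * ((u_a - u_b) / (z1 - z2)) ** 2
    return ceil(n), k

# AQL = 1% with producer's risk 5%, LTPD = 6% with consumer's risk 10%
n, k = normal_plan(0.01, 0.05, 0.06, 0.10)
```

For these levels the sketch gives k ≈ 1.89 and n ≈ 41, in line with published normal-theory plans for comparable risks.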
3. Robustness of variable sampling plans for normally distributed characteristics

Although the elegance of the variables method and its efficiency when the assumption of normality holds and all observations are available make this procedure superior to the attributes procedure, its sensitivity to the normality assumption leads to the attributes procedure being used even when the variables method could be applied. Many authors have therefore studied the effect of non-normality on variables sampling plans. Owen (1969) gives a summary of the work in this area. More recent work on sampling plans for non-normal populations includes Masuda (1978), Rao et al. (1972), Schneider et al. (1981), Srivastava (1961) and Takagi (1972). In this article we will only discuss methods involving order statistics. Farlie (1983) has described some undesirable features of acceptance sampling by variables that have hindered its widespread use. He also gives an example to demonstrate the role outliers play in variable sampling plans. He considers an (n, k) sampling plan with n = 3 and k = 1.12. A lower limit L = 0 is specified. Hence, if x̄ − 1.12 s ≥ 0, the batch is accepted; otherwise, it is rejected. From two batches, samples are taken where x_1 = 0.15, x_2 = 1.15, x_3 = 2.15 and y_1 = 0.15, y_2 = 1.15, y_3 = 3.05. The first sample leads to acceptance of the associated batch because x̄ = 1.15, s_x = 1. The second sample leads to rejection of the associated batch since ȳ = 1.45 and s_y = 1.473. The result seems paradoxical since the y sample is intuitively better than the x sample, yet the better sample leads to rejection of the batch and the poorer sample to acceptance. This paradox is caused by the large observation 3.05. The normality assumption translates the large value (far away from the lower specification limit) into evidence for large deviations in the other direction as well.
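Farlie's example is easily reproduced; a minimal sketch of the (n = 3, k = 1.12) decision rule:

```python
from statistics import mean, stdev

def accept(sample, k=1.12, L=0.0):
    """Accept the batch when xbar - k*s >= L."""
    return mean(sample) - k * stdev(sample) >= L

x = [0.15, 1.15, 2.15]   # accepted: xbar = 1.15, s = 1.0
y = [0.15, 1.15, 3.05]   # rejected: ybar = 1.45, s = 1.473
```

The intuitively better y sample fails because its larger standard deviation, driven entirely by the single large observation 3.05, inflates the implied lower tail.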
Thus, due to the symmetry of the normal distribution, one expects small values in the batch close to the lower specification limit. Farlie (1983) states that sampling plans should have a property which relates sample quality to batch acceptance. Consider two samples A = (x_1, x_2, ..., x_n) and B = (y_1, y_2, ..., y_n), and let x_(i) and y_(i) be the ith order statistics of samples A and B, respectively. Sample A is preferred to sample B (with respect to a lower specification limit) if, and only if, x_(i) ≥ y_(i) for all i and x_(i) > y_(i) for at least one i. Intuitively, no sample that is preferred to a sample leading to acceptance of a batch should itself lead to rejection. Farlie calls this "Property Q" and develops sampling plans based on order statistics which have this intuitive property. He considers the statistic
T(x_1, x_2, ..., x_n) = Σ_{i=1}^{n} a_i x_(i) ,   (5)
where the weights a_i, i = 1, 2, ..., n, are chosen to minimize the variance of the estimator T under the restrictions that T is an unbiased estimator of μ − kσ and that a_i ≥ 0 for i = 1, 2, ..., n. The latter requirement is needed to satisfy Property Q mentioned above. Farlie's sampling plans for lower specification limits turn out to be censored from above, i.e., a_i = 0 for i = r + 1, r + 2, ..., n for some r < n, and the degree of censoring (the number of sample items with weight zero) increases with the acceptability constant k. The relative efficiency, measured by the ratio of the asymptotic variances of the Lieberman and Resnikoff plans to the Property Q plans, is very high; for instance, for a sample size of n = 10 and 50% censoring from above the reported efficiency is still 95%. Symmetrical censoring was proposed by Tiku (1980) as a method of obtaining robust test procedures. He showed that symmetrical Type II censoring, where a fixed number of sample items is discarded at both ends of the ordered sample, is a powerful method of obtaining robust estimates of the location parameter of a population and compares well with other well-known robust estimators. This is so because non-normality essentially manifests itself in the tails of the distribution, and once the extreme observations (representing the tails) are censored, there is little difference between a non-normal sample and a normal sample. Subsequently, Kocherlakota and Balakrishnan (1984) used symmetrical censoring to obtain robust two-sided variable sampling plans. The authors use Tiku's modified maximum likelihood (MML) estimators to estimate the mean and standard deviation. A simulation study (Kocherlakota and Balakrishnan, 1984) suggests that these censored sampling plans are quite robust when applied to various non-normal distributions.
This means that while variable sampling plans by Lieberman and Resnikoff are very sensitive to deviations from normality, symmetrical censoring of the sample will result in probabilities of acceptance which are closer to the expected ones regardless of the distribution of the population.
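The censoring step itself is simple to illustrate. The helper below is only a sketch of symmetric Type II censoring, not of Tiku's MML estimators, which additionally reweight the retained order statistics rather than simply dropping the extremes:

```python
def symmetric_censor(sample, r):
    """Drop the r smallest and r largest order statistics (symmetric Type II)."""
    if 2 * r >= len(sample):
        raise ValueError("r too large for this sample size")
    ordered = sorted(sample)
    return ordered[r:len(ordered) - r]

# A heavy outlier leaves the retained middle order statistics untouched:
middle = symmetric_censor([0.2, 0.9, 1.1, 1.3, 9.7], r=1)  # -> [0.9, 1.1, 1.3]
```

Estimates of location and scale computed from the retained middle order statistics are then far less sensitive to the tails, which is exactly where non-normality enters.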
4. Failure censored sampling plans

Consider a life test where the quality characteristic is time to failure, T, and the distribution function F(x; μ, σ) belongs to the location-scale family, i.e., the distribution of F(z), where z = (x − μ)/σ, is parameter-free. Examples discussed later are the normal and the extreme value distribution. We note, however, that the sampling plans also apply to the Weibull and lognormal distributions because of the relationship between the two pairs of distributions: the logarithm of a Weibull distributed random variable has an extreme value distribution, while the logarithm of a lognormal random variable is normally distributed.
Variable sampling plans such as those discussed earlier and published, for instance, in MIL-STD-414 (1957) may be used. However, since it is time consuming to wait until the last item fails, these plans are not well suited for life tests. To save time, tests can be terminated before all test units have failed. The test can be discontinued after a prechosen time (time censored) or after a prechosen number of items have failed (failure censored). This paper is restricted to failure-censored sampling for location-scale family distributions for two reasons. First, it is easier to draw inference from failure-censored samples than from time-censored samples. The reason is that the covariance matrix of the estimators of the location, μ, and scale, σ, parameters of the distribution depends on the true values of μ and σ only through the pivotal quantity

u = (x_(r) − μ)/σ ,
(6)
where x_(r) is the censoring point in time, i.e., the largest observed failure. For failure-censored samples this quantity u is fixed, but for time censoring it has to be estimated. Consequently, the design and performance of failure-censored sampling plans do not depend on the unknown parameters of the distribution function, as do the design and performance of time-censored sampling plans. Second, under time censoring there might not be any failures at all, in which case it is impossible to estimate both parameters of the distribution. In practice, however, time-censored sampling may be preferred. This is partly because most life-test sampling plans have a constraint on the total amount of time spent on testing. Although the test time of failure-censored sampling plans is random, the distribution of the test time can be estimated from historical data. If pooled parameter estimates from batches accepted in the past are used, the distribution of the test time for good quality can be estimated very accurately. (Poor quality will obviously lead to a shorter test time.) This distribution can then be used as a guide for choosing a plan. The accuracy of these parameter estimates does not influence the performance of the failure-censored sampling plans, but it does influence the distribution of the time the experimenter has to wait until the failures have occurred. Failure-censored sampling plans for the doubly exponential distribution were discussed by Fertig and Mann (1980) and by Hosono, Okta and Kase (1981). Schneider (1989) presented failure-censored sampling plans for the lognormal and Weibull distributions. Bai, Kim and Chun (1993) extended these plans to accelerated life-test sampling plans. The main difference between the various published plans is the type of estimators used. We will first describe the general methodology for developing failure-censored sampling plans.
To prepare for this description, note that an analogy can be made between the sampling plans presented in this section and the variable sampling plans of Lieberman and Resnikoff, which control the fraction defective of a product, where an item is defective if its measured variate, X, is less than some limit, L. Since reliability is just the fraction of items with failure times greater than the specified mission time, the time an item is required to perform its stated mission, we may equate unreliability at the mission time with the fraction defective below L. Hence we
will use L as the notation for mission time. Analogously, one may define (Fertig and Mann, 1980) the Acceptable Reliability Level (ARL) as the reliability level at which we want a 1 − α probability of acceptance, and the Lot Tolerance Reliability Level (LTRL) as that level at which we want a 1 − β probability of lot rejection. The statistic whose realization is used to decide whether or not a lot should be accepted in the normal case is (x̄ − L)/s, with x̄ the sample mean and s the adjusted sample standard deviation. This is the statistic which gives uniformly most accurate unbiased confidence bounds for the fraction defective p = Φ(L; μ, σ). Even though the Weibull case does not admit complete sufficient statistics as in the normal situation, there do exist statistics which can be used to obtain confidence intervals on the reliability R(L). Consider failure censored data generated by placing n items on life test and waiting until the rth failure has occurred. Let the order statistics of the sample from the location-scale family be given by

x_(1),n ≤ x_(2),n ≤ ... ≤ x_(r),n ,   (7)
where x_(i),n is the ith order statistic of a sample of size n. For simplicity we will omit the index n and write x_(i). Note that for the lognormal and Weibull distributions we take the logarithm of the failure times to obtain a location-scale distribution. These order statistics can be used to test hypotheses concerning R(L). We may use t = μ̂ − kσ̂, where μ̂ and σ̂ are estimates of the location and scale parameters, respectively, and compare t with the mission time L. If t < L the batch is rejected; otherwise the batch is accepted. Equivalently, we may use t′ = (μ̂ − L)/σ̂ and reject the batch if t′ < k. The value k is called the acceptability constant and depends on the sample size, the censoring, the percentage defective and the covariance matrix of the estimators used. For the variables sampling plans of Lieberman and Resnikoff, specifying the combination of consumer's and producer's risk levels is sufficient to define both the acceptance criterion and the sample size, because censoring is not considered. When censoring is allowed, however, an added degree of freedom is introduced that requires the user to specify another criterion. In designing sampling plans one seeks the smallest sample size satisfying the specified levels of consumer's and producer's risk. The purpose of censoring is usually to obtain more failure information in a shorter period of time. Thus, placing three items on test and waiting until all three fail will take, on the average, longer than placing 10 items on test and waiting for three failures. Fertig and Mann (1980) therefore suggest that, with the introduction of censoring, some function of sample size and test time should be minimized subject to the consumer's and producer's risks in order to find a compromise between sample size and test time.
4.1. Plans based on the chi-square approximation for the statistic t

Fertig and Mann (1980b) consider the extreme value distribution and use the best linear invariant estimators (BLIEs) of its location and scale parameters μ and σ,
respectively, which are, for instance, described by Mann (1968). The distribution of t given in (1) (or t′ given in (2)) depends on μ and σ only through the standard extreme value distribution. For the extreme value case the unreliability, or fraction defective, is

p = 1 − R(L) = 1 − exp{ − exp[(L − μ)/σ] } ,   (8)

and thus

t_p = μ + σ ln(−ln[1 − p])   (9)
is the p100th percentile of the reduced extreme value distribution. The distribution of t has been tabulated by Mann and Fertig (1973) using Monte Carlo procedures. Engelhardt and Bain (1977) and Mann, Schafer and Singpurwalla (1974) offer approximations that have been used to determine percentiles of the distribution of t for various ranges of p, n and r, the number of failures of the censored data. The approximations were developed in order to construct confidence intervals on the reliability R(L) as well as tolerance bounds on x_(r) for specified p. Unfortunately, these approximations are not universally valid, as pointed out by Fertig and Mann (1980b). Lawless (1973) offers an exact procedure for obtaining confidence bounds on reliability and thus performing hypothesis tests. However, his method, which is based on an ancillary statistic, requires a numerical integration for each new sample taken and is therefore not amenable to the construction of tables. Moreover, it is not clear how one could easily determine the risk under unacceptable alternatives (e.g., when the process is operating at the LTRL). Fertig and Mann (1980b) developed a chi-square approximation to define the sampling plans they presented.

4.2. Plans based on the normal approximation for the statistic t
Schneider (1989) used the maximum likelihood estimators of the location and scale parameters μ and σ and applied a large-sample approximation to the statistic t defined in (1). However, other estimators may be used as well. In what follows we shall use the best linear unbiased estimators (BLUEs) of Gupta (1952). The main difference between the plans described by Schneider (1989) and the plans described below is the covariance matrix used. Consider the BLUEs of μ and σ for a failure censored sample of size n where only the first r failures are observed. The estimators are weighted sums of the order statistics x_(i), i = 1, 2, ..., r, of a sample of size n:

μ̂ = Σ_{i=1}^{r} a_{i,n} x_(i) ,   (10)

σ̂ = Σ_{i=1}^{r} b_{i,n} x_(i) ,   (11)
where the a_{i,n} and b_{i,n} are coefficients depending on the sample size and the distribution of the measured characteristic. For the normal distribution they are tabulated in Sarhan and Greenberg (1962) for sample sizes up to n = 20; for the extreme value distribution they are tabulated in Nelson (1982) for sample sizes up to n = 12. Let μ̂ and σ̂ be the best linear unbiased estimators of μ and σ, respectively. We consider the statistic given in (1), i.e., t = μ̂ − kσ̂, which is an unbiased estimator of μ − kσ and is asymptotically normally distributed (Plackett, 1958). Let the covariance matrix of the estimators be given by

Cov(μ̂, σ̂) = σ² [ γ_11  γ_12 ; γ_12  γ_22 ] .   (12)

For the normal distribution the factors γ_ij are tabulated in Sarhan and Greenberg (1962) for sample sizes up to n = 20; Nelson (1982) gives the factors γ_ij for the extreme value distribution for sample sizes up to n = 12. The variance of the statistic t is therefore

Var(t) = σ²{γ_11 + k²γ_22 − 2kγ_12} .   (13)
In the following, large-sample theory is used to derive equations for the sample size and the acceptability constant for a given degree of censoring and two given points on the operating characteristic (OC) curve. The standardized variate

U = [t − (μ − kσ)] / √( σ²{γ_11 + k²γ_22 − 2kγ_12} )   (14)

is parameter-free and asymptotically standard normally distributed. Thus, let z_p be the p100th percentile of the log lifetime distribution (normal or extreme value) corresponding to the fraction nonconforming p; then the operating characteristic curve, which gives the probability of acceptance for various percent defectives, is approximately given by

P_a(p) ≈ Φ( √n (−z_p − k) / √γ_{n,r}(k) ) ,   (15)

where

γ_{n,r}(k) = n{γ_11 + k²γ_22 − 2kγ_12}   (16)

and Φ(x) is the cumulative standard normal distribution function. Suppose we would like to determine an (n, k) sampling plan for two given points on the OC curve, (p_α, 1 − α) and (p_β, β). It can be shown (Schneider, 1989) that the acceptability constant k is (asymptotically) dependent only on the percentiles of the log lifetime distribution and of the standard normal distribution, i.e.,

k = (u_β z_{p_α} − u_{1−α} z_{p_β}) / (u_{1−α} − u_β) ,   (17)

where u_{1−α} and u_β denote the (1 − α)100th and β100th percentiles of the standard normal distribution.
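Because (17) involves only percentiles, k is straightforward to compute. A sketch for Weibull (extreme value) log lifetimes, where z_p = ln(−ln[1 − p]) as in (9); the quality levels chosen in the example are illustrative:

```python
from math import log
from statistics import NormalDist

def acceptability_constant(p_alpha, alpha, p_beta, beta):
    """k from (17) for extreme-value (Weibull) log lifetimes."""
    q = NormalDist().inv_cdf
    def z(p):                       # standardized extreme-value percentile
        return log(-log(1.0 - p))
    u_a, u_b = q(1 - alpha), q(beta)
    return (u_b * z(p_alpha) - u_a * z(p_beta)) / (u_a - u_b)

# 1% nonconforming accepted with prob. 0.95; 10% accepted with prob. 0.10
k = acceptability_constant(0.01, 0.05, 0.10, 0.10)
```

The remaining step is the search for the sample size, since the covariance factors entering γ_{n,r}(k) must be recomputed from the tables for each trial n.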
Thus k can be determined independently of n and the degree of censoring. The sample size n satisfies the equation

n = ( (u_{1−α} − u_β) / (z_{p_α} − z_{p_β}) )² γ_{n,r}(k) .   (18)
Unfortunately, the right side of the last equation involves the sample size n; however, a solution can be found through a search procedure. Notice also that for any (n, k) sampling plan discussed here, the OC curve can be determined by Monte Carlo simulation. This is possible because the distribution of the standardized variate U is parameter-free, depending only on the sample size n, the acceptability constant k, and the number of failures r. Thus a simulated OC curve can be used to select a sampling plan. The same procedure is used when the maximum likelihood estimators (Schneider, 1989) are used instead of the BLUEs. For the MLEs, however, the equation for n is easier to solve because the asymptotic covariance factors on the right-hand side depend only on the percent censoring and not on the sample size. It was shown (Schneider, 1989) that even for small sample sizes the asymptotic results for the MLEs are accurate enough for practical purposes. Since the small-sample covariance matrix is used for the BLUEs, the results can be expected to be accurate as well.

4.3. Distribution of the test length
The sampling plans derived in this article are failure censored. In practice, it is often desirable to know the length of the test in advance. Percentiles of the test time distribution can be obtained from the distribution of the order statistic x_(r) (David, 1981), which gives the time of the rth failure. If the test time X is lognormally or Weibull distributed then, after a logarithmic transformation, the p100th percentile x_(r),p is computed as

x_(r),p = antilog{ μ + z_(r),p σ } ,
(19)
where z_(r),p is the p100th percentile of the rth order statistic from a standard normal or smallest-extreme-value distribution. These percentiles may be obtained from Pearson and Hartley (1970). Note that the computation of the percentage points of the test time requires estimates of the parameters μ and σ. They should be estimated from historical data based on an acceptable quality level p_α. In this case the estimated test times are conservative; that is, if the quality of the current lot is good (at p_α), then the estimated times are valid. If the quality of the lot is poor (p > p_α), however, then the true test times will be shorter. Therefore, for planning purposes the test time distribution is conservative. To protect against long testing of a very good product (p < p_α) one can introduce a time at which tests are terminated. A combination of failure censored and time censored sampling plans was discussed by Fertig and Mann (1980b). For these plans the tests are terminated if
a predetermined number of failures, r_c, occurs or the test time for the items being tested exceeds a predetermined feasible test time x_t, whichever comes first. If the test is terminated because the test time exceeds x_t, the lot is accepted provided fewer than r_c failures have occurred. The actual test time is then the minimum of x_t and the test time x_{r_c} needed to obtain r_c failures. Fertig and Mann (1980) give the median test time for Weibull distributed data based on the test statistic t.
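Percentiles of the rth order statistic can also be computed directly from the binomial relation P{X_(r) ≤ x} = P{Bin(n, F(x)) ≥ r}. The sketch below assumes the natural-log convention for the lognormal parameters in (19); the values of n, r, μ and σ in the usage line are hypothetical:

```python
from math import comb, exp
from statistics import NormalDist

def order_stat_quantile(n, r, p):
    """Return q with P{Bin(n, q) >= r} = p, so that F(x_(r),p) = q."""
    def tail(q):  # P{at least r of the n items have failed by the q-quantile}
        return sum(comb(n, j) * q**j * (1 - q)**(n - j) for j in range(r, n + 1))
    lo, hi = 0.0, 1.0
    for _ in range(80):                  # bisection; tail(q) increases in q
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if tail(mid) < p else (lo, mid)
    return (lo + hi) / 2

def lognormal_test_time(n, r, p, mu, sigma):
    """p100th percentile of the time of the r-th failure, as in (19)."""
    z = NormalDist().inv_cdf(order_stat_quantile(n, r, p))
    return exp(mu + sigma * z)           # 'antilog' under the natural-log convention

# median time of the 3rd failure when 10 items are placed on test
t_med = lognormal_test_time(n=10, r=3, p=0.5, mu=2.0, sigma=0.5)
```

This reproduces the tabulated percentiles numerically and makes the conservatism discussion above easy to check: larger μ or σ (better quality) pushes the whole test-time distribution upward.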
5. Reduction of test times for life-test sampling plans

A common problem with life-test sampling plans is the excessive length of the test times. Lifetimes of good products are usually long, making life testing time consuming and expensive. There is thus an interest in methods which help to reduce test times; the next two sections deal with such methods. One way to shorten the test time is to use accelerated tests, discussed in the next section. The last section discusses ways of reducing test time by testing groups of items and stopping the test when the first item in each group fails.
5.1. Accelerated life-test sampling plans

Bai, Kim and Chun (1993) extended the life-test sampling plans developed by Schneider (1989) to accelerated tests. Accelerated life tests make use of a functional relationship between stress level and failure time, which has to be known; in many cases a linear function is used. When accelerated tests are used, the test time can be reduced substantially. Test items are usually tested at much higher stress levels than normally experienced in actual applications, the latter being referred to as the design stress. The failure times are then used to estimate the linear (or other) relationship, and the estimated function is used to extrapolate to failure times at the (usually untested) design stress conditions. Bai, Kim and Chun consider the following model. The location parameter μ is a linear function of the stress level,

μ(s) = γ_0 + γ_1 s ,
(20)
where s is the stress level and γ_0 and γ_1 are unknown constants. The scale parameter σ is constant and independent of stress. The life test uses two predetermined stress levels s_1 and s_2 with s_1 < s_2. A random sample of size n is taken from a lot and allocated to the two stress levels. The tests at each stress level are failure censored, with r_1 and r_2 failures observed at the respective levels. The test procedure is the same as for the sampling scheme discussed in Section 4.2. The test statistic used at the design stress s_0 is t = μ̂(s_0) − kσ̂, where
(21)
μ̂(s_0) = γ̂_0 + γ̂_1 s_0 .
(22)
The statistic t is compared to a lower limit L. The lot is accepted if t ≥ L; otherwise the lot is rejected. The sample size n and acceptability constant k are to be determined so that the OC curve of the test plan passes through the two points (p_α, 1 − α) and (p_β, β). Bai et al. (1993) use the maximum likelihood estimators of the parameters of the model. To obtain the optimum proportions of the sample allocated to each stress level, the following reparametrization is convenient. Let

ξ = (s − s_0) / (s_2 − s_0) ;   (23)
then the mean μ may be written in terms of ξ as

μ = β_0 + β_1 ξ ,   (24)

where

β_0 = γ_0 + γ_1 s_0   (25)

and

β_1 = γ_1 (s_2 − s_0) .   (26)
Bai et al. (1993) choose the proportion π of the sample allocated to the low stress level so as to minimize the determinant of the covariance matrix Cov(β̂_0, β̂_1, σ̂). Nelson and Kielpinski (1976), however, argue that an optimum plan uses just two stresses, where the highest allowable test stress, s_2, must be specified, while the low stress, s_1, and the proportion, π, are determined by minimizing the variance of the estimator under consideration. Hence the sampling plans of Bai et al. (1993) may not be optimal. This explains why the required sample sizes of the accelerated test plans can actually exceed the sample sizes of the failure censored sampling plans suggested by Schneider (1989) for a given risk. Bai et al. (1993) also give an approximation for the expected log test time:
E[x_(r_i),n_i] = β_0 + β_1 ξ_i + σ Φ^{-1}( (r_i − 3/8)/(n_i + 1/4) )  for the lognormal case,
E[x_(r_i),n_i] = β_0 + β_1 ξ_i + σ Ψ^{-1}( (r_i − 1/4)/(n_i + 1/4) )  for the Weibull case,   (27)

where Φ and Ψ are the standard normal and standard extreme value distribution functions, and the adjustments (3/8) and (1/4) are based on Kimball's (1960) plotting positions on probability paper.
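The lognormal branch of (27) is essentially Blom's plotting position for normal order statistics; a sketch with illustrative parameter values (the β_0, β_1, ξ, σ, r, n below are assumptions for the example, not values from Bai et al.):

```python
from statistics import NormalDist

def expected_log_test_time(beta0, beta1, xi, sigma, r, n):
    """Approximate E[x_(r),n] at standardized stress xi, lognormal case of (27)."""
    z = NormalDist().inv_cdf((r - 3.0 / 8.0) / (n + 1.0 / 4.0))
    return beta0 + beta1 * xi + sigma * z

# e.g. waiting for the 3rd of 10 failures at the high stress level (xi = 1)
e_log_t = expected_log_test_time(beta0=5.0, beta1=-2.0, xi=1.0, sigma=0.5, r=3, n=10)
```

Such approximations let the experimenter budget the expected test duration at each stress level before any units are placed on test.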
5.2. Group-Test sampling plans

Balasooriya (1993) presents failure censored reliability sampling plans for the two-parameter exponential distribution where m sets of n units are tested. Each set of n
508
H. Schneider and F. Barbera
units is tested until the first failure occurs. Balasooriya considers the two-parameter exponential distribution function

f(x; μ, σ) = (1/σ) exp[ −(x − μ)/σ ] ,  x ≥ μ, σ > 0 .   (28)
Let

x_(1),i ≤ x_(2),i ≤ ... ≤ x_(n),i   (29)

be the order statistics of a random sample of size n from (28) for the ith set, i = 1, 2, ..., m. The first order statistic x_(1),i of sample i, i = 1, ..., m, has the probability density function

f(x_(1); μ, σ) = (n/σ) exp[ −n(x_(1) − μ)/σ ] ,  x_(1) ≥ μ, σ > 0 .   (30)
Let

x_(1),(1) ≤ x_(1),(2) ≤ ... ≤ x_(1),(m)   (31)

be the order statistics of the smallest values from each sample of size n; then the maximum likelihood estimator of μ in (30) is

μ̂ = x_(1),(1)   (32)

and the maximum likelihood estimator of σ is

σ̂ = Σ_{i=1}^{m} ( x_(1),(i) − x_(1),(1) ) / m .   (33)
Sampling plans are then constructed in the usual way. The statistic

t = μ̂ ± kσ̂   (34)
is compared to a lower specification limit L, and the lot is rejected if t < L; otherwise the lot is accepted. The operating characteristic curve is based on a result of Guenther et al. (1976). For t = μ̂ − kσ̂ the probability of acceptance is easily obtained; when t = μ̂ + kσ̂, however, the operating characteristic curve is more complicated and the solutions for m and k have to be found iteratively (Balasooriya, 1993). The expected test times depend on the setup of the tests. When the m sets are tested consecutively, the total test time is
T_c = Σ_{i=1}^{m} x_(1),(i)   (35)

and, assuming μ = 0, the expected test time is

E[T_c] = mσ/n .   (36)

For simultaneous testing one obtains
T_s = x_(1),(m)   (37)

and thus

E[T_s] = (σ/n) Σ_{i=1}^{m} (1/i) .   (38)
Balasooriya (1993) provides tables for these sampling plans.
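The estimators (32) and (33) and the decision rule are easy to sketch; the first-failure times, k and L below are hypothetical, and σ̂ follows (33) exactly as printed:

```python
def group_test_decision(first_failures, k, L):
    """Accept/reject from the m first-failure times, following (32)-(34).

    mu_hat  = smallest of the first-failure times, eq. (32)
    sig_hat = mean excess over mu_hat, eq. (33)
    Uses the '+' variant of the statistic (34).
    """
    m = len(first_failures)
    mu_hat = min(first_failures)
    sig_hat = sum(t - mu_hat for t in first_failures) / m
    t = mu_hat + k * sig_hat
    return ("accept" if t >= L else "reject"), mu_hat, sig_hat

# m = 4 groups; each value is the time of the first failure within its group
decision, mu_hat, sig_hat = group_test_decision([2.0, 5.0, 3.0, 8.0], k=0.5, L=3.0)
```

Note how little data the procedure needs: only one failure time per group of n units, which is precisely what makes the expected test times (36) and (38) so short.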
6. Conclusion

Order statistics are employed in many ways in acceptance sampling. First, order statistics are used to improve the robustness of sampling plans by variables. Second, in life testing one uses order statistics to shorten test times. Since life is an important quality characteristic and life tests are time consuming and expensive, recent work focuses on reducing the test times of sampling plans. Traditionally, only the sample size and the acceptability constant k were considered the design parameters for variable sampling plans. When test plans are censored, a new design parameter, the degree of censoring, is added, and a compromise between sample size and test length has to be found. Further research needs to be done to compare different sampling schemes and to determine sampling plans which are a best compromise between the various objectives.
References

Bai, D. S., J. G. Kim and Y. R. Chun (1993). Design of failure censored accelerated life test sampling plans for lognormal and Weibull distributions. Eng. Opt. 21, 197-212.
Balasooriya, U. (1993). Failure-censored reliability sampling plans for the exponential distribution - A special case. J. Statist. Comput. Simul., to appear.
Cohen, A. C., Jr. (1961). Tables for maximum likelihood estimates: Singly truncated and singly censored samples. Technometrics 3, 535-541.
Das, N. G. and S. K. Mitra (1964). The effect of non-normality on sampling inspection. Sankhyā 26, 169-176.
David, H. A. (1981). Order Statistics. John Wiley, New York.
Engelhardt, M. and L. J. Bain (1977). Simplified procedures for the Weibull or extreme-value distribution. Technometrics 19, 323-331.
Farlie, D. J. G. (1983). Sampling plans with property Q. In: Frontiers in Statistical Quality Control, H. J. Lenz et al., eds., Physica-Verlag, Würzburg, West Germany.
Fertig, K. W. and N. R. Mann (1980a). An accurate approximation to the sampling distribution of the studentized extreme-value statistic. Technometrics 22, 83-97.
Fertig, K. W. and N. R. Mann (1980b). Life-test sampling plans for two-parameter Weibull populations. Technometrics 22, 165-177.
Guenther, W. C., S. A. Patil and V. R. Uppuluri (1976). One-sided β-content tolerance factors for the two-parameter exponential distribution. Technometrics 18, 333-340.
Gupta, A. K. (1952). Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika 39, 260-273.
Harter, H. L. (1970). Order Statistics and Their Use in Testing and Estimation (Vol. 2). U.S. Government Printing Office, Washington, DC.
H. Schneider and F. Barbera
Hosono, Y., H. Ohta and S. Kase (1981). Design of single sampling plans for doubly exponential characteristics. In: Frontiers in Statistical Quality Control, H. J. Lenz et al., eds., Physica-Verlag, Würzburg, West Germany, 94-112.
Kimball, B. F. (1960). On the choice of plotting position on probability paper. J. Amer. Statist. Assoc. 55, 546-560.
Kocherlakota, S. and N. Balakrishnan (1985). Robust two-sided tolerance limits based on MML estimators. Commun. Statist. - Theory Meth. 14, 175-184.
Lawless, J. F. (1973). Conditional versus unconditional confidence intervals for the parameters of the Weibull distribution. J. Amer. Statist. Assoc. 68, 665-669.
Lawless, J. F. (1975). Construction of tolerance bounds for the extreme-value and Weibull distributions. Technometrics 17, 255-261.
Lieberman, G. J. and G. J. Resnikoff (1955). Sampling plans for inspection by variables. J. Amer. Statist. Assoc. 50, 457-516.
Mann, N. R. (1967a). Results on location and scale parameter estimation with application to the extreme-value distribution. ARL 67-0023, Aerospace Research Laboratories, Office of Aerospace Research, USAF, Wright-Patterson Air Force Base, Ohio.
Mann, N. R. (1967b). Tables for obtaining best linear invariant estimates of parameters of the Weibull distribution. Technometrics 9, 629-645.
Mann, N. R. (1968). Point and interval estimation procedures for the two-parameter Weibull and extreme-value distributions. Technometrics 10, 231-256.
Mann, N. R. and K. W. Fertig (1973). Tables for obtaining confidence bounds and tolerance bounds based on best linear invariant estimates of the extreme-value distribution. Technometrics 15, 86-100.
Mann, N. R., R. E. Schafer and N. D. Singpurwalla (1974). Methods for Statistical Analysis of Reliability and Life Data. John Wiley and Sons, New York.
Masuda, K. (1978). Effect of non-normality on sampling plans by Lieberman and Resnikoff. Proceedings of the International Conference on Quality Control, Tokyo, Japan, D3, 7-11.
MIL-STD-414 (1957). Sampling Procedures and Tables for Inspection by Variables for Percent Defectives. U.S. Government Printing Office, Washington, D.C.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York.
Nelson, W. and T. Kielpinski (1976). Theory for optimum accelerated life tests for normal and lognormal distributions. Technometrics 18, 105-114.
Nelson, W. and J. Schmee (1979). Inference for (log) normal life distributions from small singly censored samples and BLUE's. Technometrics 21, 43-54.
Owen, D. B. (1963). Factors for one-sided tolerance limits and for variables sampling plans. SCR-607, Sandia Corporation monograph.
Owen, D. B. (1969). Summary of recent work on variables acceptance sampling with emphasis on non-normality. Technometrics 11, 631-637.
Pearson, E. S. and H. O. Hartley (1970). Biometrika Tables for Statisticians (Vol. 1, 3rd ed.). Cambridge University Press, Cambridge, U.K.
Rao, J. N. K., K. Subrahmaniam and D. B. Owen (1972). Effect of non-normality on tolerance limits which control percentages in both tails of the normal distribution. Technometrics 14, 571-575.
Sarhan, A. E. and B. G. Greenberg (eds.) (1962). Contributions to Order Statistics. Wiley, New York.
Schneider, H. and P. Th. Wilrich (1981). The robustness of sampling plans for inspection by variables. In: Computational Statistics, H. Büning and P. Naeve, eds., Walter de Gruyter, Berlin, New York.
Schneider, H. (1985). The performance of variable sampling plans when the normal distribution is truncated. J. Qual. Tech. 17, 74-80.
Schneider, H. (1989). Failure-censored variable-sampling plans for lognormal and Weibull distributions. Technometrics 31, 199-206.
Srivastava, A. B. L. (1961). Variables sampling inspection for non-normal samples. J. Sci. Engg. Res. 5, 145-152.
Takagi, K. (1972). On designing unknown-sigma sampling plans based on a wide class of non-normal distributions. Technometrics 14, 669-678.
Thoman, D. R., L. J. Bain and C. E. Antle (1969). Inferences on the parameters of the Weibull distribution. Technometrics 11, 445-460.
Tiku, M. L. (1967). Estimating the mean and standard deviation from a censored sample. Biometrika 54, 155-165.
Tiku, M. L. (1980). Robustness of MML estimators based on censored samples and robust test statistics. J. Statist. Plann. Infer. 4, 123-143.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
19
Linear Combinations of Ordered Symmetric Observations with Applications to Visual Acuity
Marlos Viana
1. Introduction

In vision research, the Snellen chart is commonly used to assess visual acuity and is made up of letters of graduated sizes. By combining letter size and chart distance it is possible to determine the minimum visual angle of retinal resolution. A visual acuity of 20/30 means that at 20 feet the minimum angle of resolution is 30/20 times the resolution standard (about 5 minutes of arc). The vision of a normal eye is recorded as 20/20 and corresponds to zero in the scale determined by the logarithm of the minimum angle of resolution (Log MAR). Normally, a single measure of visual acuity is made in each eye, say Y1, Y2, together with one or more covariates X, such as the subject's age, reading performance, physical condition, etc. Because smaller values of Log MAR correspond to better visual acuity, the extremes of visual acuity are defined in terms of the "best" acuity Y(1) = min{Y1, Y2} and the "worst" acuity Y(2) = max{Y1, Y2}. Ordered acuity measurements are also required to determine the person's total vision impairment, defined as

Total Impairment = (3 Y(1) + Y(2))/4

[e.g., Rubin, Munoz, Fried and West (1984)]. Consequently, there is interest in making inferences on the covariance structure

\Lambda = Cov(X, w_1 Y(1) + w_2 Y(2)) ,

which includes the assessment of the correlation and linear predictors between X and linear combinations w_1 Y(1) + w_2 Y(2) of the extreme acuity measurements. In particular, the correlations among average vision (w_1 = w_2 = 0.5), best vision (w_1 = 1, w_2 = 0), worst vision (w_1 = 0, w_2 = 1), acuity range (w_1 = -1, w_2 = 1), vision impairment (w_1 = 0.75, w_2 = 0.25) and one or more of the patient's conditions can be assessed.

Other applications of extreme bivariate measurements include the current criterion for an unrestricted driver's license, which in the majority of states is
based on the visual acuity of the best eye [see Fishman et al. (1993), Szlyk et al. (1993)]; the assessment of defective hearing in mentally retarded adults based on the ear with best hearing (Parving and Christensen 1990); the predictive value of the worst vision following surgery in the eyes of glaucoma patients (Frenkel and Shin 1986); sports injury data on the reduction of best vision in damaged eyes (Aburn 1990); and the analysis of worst vision among patients treated for macular edema (Rehak and Vymazal 1989).

2. Models and basic results
Because of the natural symmetry between responses from fellow eyes, as well as between the additional measurement X and the response of either eye, it is assumed that the vector of means associated with (X, Y1, Y2) is given by \mu' = (\mu_0, \mu_1, \mu_1) and the corresponding covariance matrix \Sigma by

\Sigma = ( \sigma^2           \gamma\sigma\tau   \gamma\sigma\tau
           \gamma\sigma\tau   \tau^2             \rho\tau^2
           \gamma\sigma\tau   \rho\tau^2         \tau^2 ) ,
\gamma^2 \le (1 + \rho)/2 ,   \rho^2 \le 1 ,   (2.1)
where the range of the parameters is necessary and sufficient for \Sigma to be positive semi-definite. When there are p Y-values, the restriction is \gamma^2 \le [1 + (p-1)\rho]/p. In general, the correlation \rho between Y1 and Y2 is in the interval [-1, 1]. However, in the present context, in which Y1 and Y2 represent measurements on each eye, the correlation may be assumed to be non-negative.

A key result is that the covariance between X and Y(i) is equal to the covariance between X and Y_i, which is surprising. In the bivariate case, because Y(2) = |Y1 - Y2|/2 + (Y1 + Y2)/2, it obtains that

cov(X, Y(2)) - cov(X, Y2) = (1/2) cov(X, |Y1 - Y2|)
  = \int_{y_1 > y_2} (x - \mu_0)(y_1 - y_2) dP = \int_x (x - \mu_0) R(x) dP_X ,

where the next-to-last equality follows from the fact that the distribution P(x, y) is symmetric in y, and the last equality from defining

R(x) = \int_{y_1 > y_2} (y_1 - y_2) dP_{Y|x} .
However, under the assumption of an exchangeable bivariate normal distribution, the expected conditional range R(x) of Y given x is constant in x and can be factored out of the integral, showing that cov(X, Y(2)) = cov(X, Y2). This consequence of the exchangeability properties of the normality model described by (2.1) is discussed further in Olkin and Viana (1995), where the following propositions are proved (see also David (1996) and Viana and Olkin (1997)).
PROPOSITION 2.1. If Y1, ..., Yp are normally distributed with common mean \nu, common variance \tau^2 and common correlation \rho, then the covariance matrix of the order statistics \mathcal{Y}' = (Y(1), ..., Y(p)) is

Cov(\mathcal{Y}) = \tau^2 [\rho ee' + (1 - \rho)\mathcal{C}] ,

where \mathcal{C} is the covariance matrix of the order statistics of p independent standard normal random variables.

PROPOSITION 2.2. If the distribution of X, Y1, ..., Yp is multivariate normal with Y1, ..., Yp exchangeable and with (X, Yi) and (X, Yj) equally distributed, then

Cov(X, \mathcal{Y}) = Cov(X, Y) .

As a consequence, the covariance matrix \Lambda of (X, w'\mathcal{Y}), where w indicates the column vector of real coefficients (w_1, w_2), is given by

\Lambda = Cov(X, w'\mathcal{Y}) = ( \sigma^2                 \gamma\sigma\tau w'e
                                    \gamma\sigma\tau w'e     \tau^2[\rho(w'e)^2 + (1 - \rho) w'\mathcal{C}w] ) .   (2.2)
In the bivariate case, the covariance matrix \Psi of (X, Y(1), Y(2)) is

\Psi = ( \sigma^2           \gamma\sigma\tau                  \gamma\sigma\tau
         \gamma\sigma\tau   \tau^2[\rho + (1-\rho)c_{11}]     \tau^2[\rho + (1-\rho)c_{12}]
         \gamma\sigma\tau   \tau^2[\rho + (1-\rho)c_{21}]     \tau^2[\rho + (1-\rho)c_{22}] ) ,   (2.3)

where

\mathcal{C} = ( c_{11}  c_{12}     =   ( 0.6817  0.3183
                c_{21}  c_{22} )         0.3183  0.6817 )   (2.4)

is the covariance matrix between the largest and the smallest of two independent standard normal variables (Beyer 1990, Table VII.2, p. 243). Also note that, when p = 2,

w'\mathcal{C}w = (w_1^2 + w_2^2) c_{11} + 2 w_1 w_2 c_{12} ,   w'e = w_1 + w_2 .
3. Correlations and linear regressions
From (2.2), the correlation \delta between X and a non-null linear combination w'\mathcal{Y} of \mathcal{Y} is

\delta = Corr(X, w'\mathcal{Y}) = \gamma w'e / \sqrt{\rho(w'e)^2 + (1 - \rho) w'\mathcal{C}w} .   (3.1)

In addition,
\delta^2 \le \frac{(1 + \rho)/2}{\rho + (1 - \rho) w'\mathcal{C}w/(w'e)^2} \le 1 ,   since 2 w'\mathcal{C}w \ge (w'e)^2 ,

holds for \rho \in [0, 1] and all w \in R^2 such that w'e \ne 0.

Note that the correlation \gamma between X and Y_i is zero if and only if the correlation between X and Y(i) is also zero [this is a direct consequence of Proposition 2.2]. Therefore, \delta = 0 implies \gamma = 0, which implies (because of normality) that X and Y_i are independent. It then follows that X and w'\mathcal{Y} are also independent. The fact that \delta = 0 implies the independence of X and w'\mathcal{Y} is not obvious, because the joint distribution of (X, \mathcal{Y}) is no longer multivariate normal. When w'e = 0, then \delta = 0. This implies that the correlation between X and the range of Y is necessarily zero under the equicorrelated-exchangeable model (2.1). Conversely, because of the normality assumption, \delta = 0 implies the independence of X and the range of Y.

PROPOSITION 3.1.

sup_{w'e = 1} Corr(X, w'\mathcal{Y}) = \gamma / \sqrt{(1 + \rho)/2} ,   \gamma > 0, \rho > 0 .

The maximum value is obtained when w' = (1/2, 1/2), in which case w'\mathcal{Y} is the average of the components of Y.

PROOF. Write equation (3.1) as

Corr(X, w'\mathcal{Y}) = \gamma / \sqrt{\rho + (1 - \rho) f} ,   f = w'\mathcal{C}w / (w'e)^2 .

Because \mathcal{C} is positive definite, a solution w to the constrained minimization problem for w'\mathcal{C}w, and equivalently for f, needs to satisfy \mathcal{C}w = \lambda e, where \lambda is a Lagrangian multiplier. The fact that \mathcal{C} is stochastic shows that the unique constrained solution is w' = (1/2, 1/2). □

The correlation \theta between the extreme values Y(1) and Y(2) is

\theta = Corr(Y(1), Y(2)) = [\rho + (1 - \rho)c_{12}] / [\rho + (1 - \rho)c_{22}] .   (3.2)
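Formula (3.1) and the extremal property in Proposition 3.1 are easy to verify numerically. The sketch below (illustrative; the function name is ours) evaluates \delta for any weight vector, using the constants of (2.4).

```python
import math

# Covariance matrix of the order statistics of two independent
# standard normal variables, eq. (2.4)
C = [[0.6817, 0.3183],
     [0.3183, 0.6817]]

def delta(gamma, rho, w):
    # eq. (3.1): correlation between X and the combination w'Y of the
    # ordered pair (Y(1), Y(2))
    we = w[0] + w[1]
    wCw = sum(w[i] * C[i][j] * w[j] for i in range(2) for j in range(2))
    return gamma * we / math.sqrt(rho * we * we + (1.0 - rho) * wCw)
```

Any w with w'e = 0, such as the acuity range w = (-1, 1), gives \delta = 0, and the average w = (1/2, 1/2) attains the supremum \gamma/\sqrt{(1+\rho)/2} among weights with w'e = 1.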
For non-negative values of \rho, it holds that

0.4669 \approx c_{12}/c_{22} \le \theta \le 1 ,

whereas the partial correlation of Y(1) and Y(2) given X is

\theta_{12 \cdot 0} = Corr(Y(1), Y(2) | X) = \frac{\rho + (1 - \rho)c_{12} - \gamma^2}{\rho + (1 - \rho)c_{22} - \gamma^2} \le \theta .   (3.3)
Thus, the partial correlation is always a contraction of the product-moment correlation, regardless of the composition of the covariate.

The minimum mean-squared error (m.s.e.) linear predictor of w'\mathcal{Y} from X follows from the fact that

E[w'\mathcal{Y}] = \mu_1 w'e + \tau \sqrt{1 - \rho} \, w'c ,

where

c' = (c_1, c_2) = (-0.56419, 0.56419)   (3.4)

is the expected value of the smallest and largest of two independent standard normal variables [e.g., Beyer (1990)], and from the fact that \Lambda_{10}\Lambda_{00}^{-1} = \gamma\tau w'e / \sigma. The resulting equation is

\widehat{w'\mathcal{Y}} = \mu_1 w'e + \tau \sqrt{1 - \rho} \, w'c + (\tau\gamma/\sigma)(X - \mu_0) w'e .   (3.5)
The corresponding mean-squared error can be expressed as

\Lambda_{11 \cdot 0} = \tau^2 (1 - \gamma^2) [\rho_{\cdot 0}(w'e)^2 + (1 - \rho_{\cdot 0}) w'\mathcal{C}w] ,   \rho_{\cdot 0} = (\rho - \gamma^2)/(1 - \gamma^2) ,   (3.6)
whereas the multiple correlation coefficient is equal to \delta^2, the squared correlation between w'\mathcal{Y} and X. Similarly, the best (minimum m.s.e.) linear regression of X on w'\mathcal{Y} is described by

\widehat{X} = \mu_0 + \frac{\sigma\gamma w'e}{\tau[\rho(w'e)^2 + (1 - \rho) w'\mathcal{C}w]} \, w'(\mathcal{Y} - \mu_1 e - \tau\sqrt{1 - \rho} \, c) ,   (3.7)

with corresponding mean-squared error

\Lambda_{00 \cdot 1} = \sigma^2 \left[ 1 - \frac{\gamma^2 (w'e)^2}{\rho(w'e)^2 + (1 - \rho) w'\mathcal{C}w} \right] .   (3.8)
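The predictor (3.5) can be transcribed directly; the sketch below (our own naming and argument order) checks it against the numbers the chapter later reports in Table 3 for the worst-vision combination w = (0, 1).

```python
import math

c1, c2 = -0.56419, 0.56419   # eq. (3.4)

def predict_wY(x, w, mu0, mu1, sigma, tau, rho, gamma):
    # eq. (3.5): minimum-m.s.e. linear predictor of w'Y from X
    we = w[0] + w[1]
    wc = w[0] * c1 + w[1] * c2
    return (mu1 * we + tau * math.sqrt(1.0 - rho) * wc
            + (tau * gamma / sigma) * (x - mu0) * we)
```

At x = 0 the value is the regression intercept; the slope per unit of x is \tau\gamma w'e/\sigma.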
Also of interest are best linear predictors of one extreme of Y based on X and the other extreme of Y. The appropriate partitioning of \Psi shows that the two linear regression equations

\widehat{Y}_{(1)} = b_{1|2} + b_1 Y_{(2)} + b_2 X ,
\widehat{Y}_{(2)} = b_{2|1} + b_1 Y_{(1)} + b_2 X ,

are defined by parallel planes, in which the coefficient b_1 is the partial correlation \theta_{12 \cdot 0} given by (3.3), the coefficient of X is

b_2 = \frac{\tau\gamma(1 - \rho)(c_{22} - c_{12})}{\sigma[\rho + (1 - \rho)c_{22} - \gamma^2]} ,   (3.9)

whereas the intercept coefficients are, respectively,

b_{1|2} = \mu_1(1 - b_1) - b_2\mu_0 - (c_2 - b_1 c_1)\tau\sqrt{1 - \rho} ,
b_{2|1} = \mu_1(1 - b_1) - b_2\mu_0 + (c_2 - b_1 c_1)\tau\sqrt{1 - \rho} .   (3.10)

Because c_1 = -c_2, the vertical distance between the planes is 2 c_2 \tau \sqrt{1 - \rho} (1 + b_1). In addition, the model mean-squared error and corresponding multiple correlation coefficient R^2 can be expressed as

m.s.e. = \frac{\tau^2(1 - \rho^2)(c_{22} - c_{12})}{\rho + (1 - \rho)c_{22} - \gamma^2} ,   R^2 = 1 - \frac{m.s.e.}{\tau^2[\rho + (1 - \rho)c_{22}]} .
4. Maximum likelihood and large-sample estimates

Given a sample (x_\alpha, y_{1\alpha}, y_{2\alpha}), \alpha = 1, ..., N, of size N with means \bar{x}, \bar{y}_i and cross-product matrix A = (a_{ij}), i, j = 0, 1, 2, the maximum likelihood estimates of \delta and \theta are given by

\hat{\delta} = \hat{\gamma} w'e / \sqrt{\hat{\rho}(w'e)^2 + (1 - \hat{\rho}) w'\mathcal{C}w} ,   \hat{\theta} = [\hat{\rho} + (1 - \hat{\rho})c_{12}] / [\hat{\rho} + (1 - \hat{\rho})c_{22}] ,   (4.1)

where

\hat{\sigma}^2 = a_{00}/N ,   \hat{\tau}^2 = (a_{11} + a_{22})/(2N) ,   \hat{\rho} = \frac{a_{12}}{(a_{11} + a_{22})/2} ,   \hat{\gamma} = \frac{(a_{01} + a_{02})/2}{\sqrt{a_{00}(a_{11} + a_{22})/2}}   (4.2)

are the maximum likelihood estimates of \sigma^2, \tau^2, \rho and \gamma, based on

a_{00} = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})^2 ,   a_{0j} = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(y_{j\alpha} - \bar{y}_j) ,   a_{ij} = \sum_{\alpha=1}^{N} (y_{i\alpha} - \bar{y}_i)(y_{j\alpha} - \bar{y}_j) .
The delta method [e.g., Anderson (1985, p. 120)] shows that the asymptotic joint distribution of \sqrt{N}(\hat{\delta} - \delta, \hat{\theta} - \theta) is normal with means zero, variances
AVar(\hat{\delta}) = [2\rho^2 + 6\rho^2\gamma^2 f + \gamma^2 - 5\rho^2\gamma^2 - 4\gamma^2\rho + 4\gamma^4\rho + 2\rho^3 + 4f\rho - 4\rho^3 f
  + 2\rho^3 f^2 - 2\rho^2 f^2 + 2f^2 - 2\rho f^2 + 4\gamma^4 f - 6\gamma^2 f - 4\gamma^4\rho f] / [-4(f + (1 - f)\rho)^3] ,   (4.3)

where f = w'\mathcal{C}w / (w'e)^2, w'e \ne 0;

AVar(\hat{\theta}) = \frac{(c_{12} - c_{22})^2 (1 - \rho)^2 (1 + \rho)^2}{(c_{22} + (1 - c_{22})\rho)^4} ,   (4.4)

and covariance

ACov(\hat{\delta}, \hat{\theta}) = \frac{(-2\rho^2 + 3\rho^2 f + 2\gamma^2\rho - 2\gamma^2\rho f - \rho + 2\gamma^2 f - 3f + 1)(c_{12} - c_{22})\gamma(1 - \rho)}{2(f + (1 - f)\rho)^{3/2}(c_{22} + (1 - c_{22})\rho)^2} .   (4.5)

In particular, note that when \rho = 0 and \gamma = 0 the asymptotic covariance ACov(\hat{\delta}, \hat{\theta}) vanishes, so that \hat{\delta} and \hat{\theta} are asymptotically independent when X, Y1, Y2 are jointly independent.
5. An exact test for \gamma = 0

As indicated earlier in Section 3, Proposition 2.2 implies that the following conditions are equivalent under the exchangeable multivariate normal model with covariance structure given by (2.1):
1. \gamma = Corr(X, Y_i) = 0
2. \delta = Corr(X, w'\mathcal{Y}) = 0
3. X and Y are independent
4. X and \mathcal{Y} are independent
5. X and w'\mathcal{Y} are independent

The hypothesis \gamma = 0 can be assessed as follows. Let A_{00} = a_{00}, A_{01} = (a_{0j}), j = 1, ..., p, A_{11} = (a_{ij}), i, j = 1, ..., p, and A_{10} = A_{01}'. Further, let r denote the sample intraclass correlation coefficient

r = \frac{\sum_{i<j} a_{ij} / [p(p-1)/2]}{\sum_{i=1}^{p} a_{ii} / p}

associated with the sample p × p matrix of cross-products A_{11}. The distribution of A_{11} is Wishart W_p(\tau^2[\rho ee' + (1 - \rho)I], n), n = N - 1. Further, let r_{\cdot 0} denote the sample intraclass correlation coefficient based on the conditional cross-product matrix

A_{11 \cdot 0} = A_{11} - A_{10}A_{00}^{-1}A_{01} \sim W_p(\tau^2(1 - \gamma^2)[\rho_{\cdot 0} ee' + (1 - \rho_{\cdot 0})I], n) ,

where \rho_{\cdot 0} = (\rho - \gamma^2)/(1 - \gamma^2). It follows [e.g., Wilks (1946)] that

U_1 = \frac{tr \, nS_{11 \cdot 0}}{p} (1 + (p-1) r_{\cdot 0}) \sim \tau^2 [1 + (p-1)\rho - p\gamma^2] \chi^2_{n-p} ,
U_2 = (p-1) \frac{tr \, nS_{11 \cdot 0}}{p} (1 - r_{\cdot 0}) \sim \tau^2 (1 - \rho) \chi^2_{(p-1)(n-p)} ,
V_1 = \frac{tr \, nS_{11}}{p} (1 + (p-1) r) \sim \tau^2 (1 + (p-1)\rho) \chi^2_n ,
V_2 = (p-1) \frac{tr \, nS_{11}}{p} (1 - r) \sim \tau^2 (1 - \rho) \chi^2_{(p-1)n} .

Furthermore, U_1 is independent of U_2, and V_1 is independent of V_2. In addition, when \gamma = 0, it follows from Anderson (1985), Corollary 4.3.2, that

V_1 - U_1 \sim \tau^2 [1 + (p-1)\rho] \chi^2_p ,

independently of U_1. Consequently, when \gamma = 0,

\frac{n}{p} \cdot \frac{V_1 - U_1}{V_1} = \frac{n}{p} \left[ 1 - \frac{(1 + (p-1) r_{\cdot 0}) \, tr \, S_{11 \cdot 0}}{(1 + (p-1) r) \, tr \, S_{11}} \right] \sim F_{p,n} .   (5.1)

Similarly, when \rho = 0, directly from the canonical representation of A_{11},

(p-1) \frac{V_1}{V_2} = \frac{1 + (p-1) r}{1 - r} \sim F_{n,(p-1)n} ,   (5.2)

so that (5.1) and (5.2) can be used to assess the corresponding hypotheses. Note that when \gamma is different from zero, larger values of (5.1) are expected; when \rho is positive, larger values of (5.2) are expected. In the unrestricted case, smaller values are expected when \rho is negative.
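For p = 2 both statistics are direct functions of the 3 × 3 cross-product matrix of (X, Y1, Y2). The sketch below (illustrative naming; NumPy assumed) follows (5.1) and (5.2); since both statistics are scale-free, a covariance matrix works as well as the raw cross-products.

```python
import numpy as np

def gamma_rho_tests(A, N):
    # A: 3x3 matrix of cross-products of (X, Y1, Y2) about the means;
    # p = 2 responses, n = N - 1
    p, n = 2, N - 1
    A00, A01, A11 = A[0, 0], A[0, 1:], A[1:, 1:]
    A11_0 = A11 - np.outer(A01, A01) / A00      # conditional cross-products
    def intraclass(M):
        return M[0, 1] / (np.trace(M) / p)      # p = 2: one off-diagonal
    r, r0 = intraclass(A11), intraclass(A11_0)
    F_gamma = (n / p) * (1 - (1 + (p - 1) * r0) * np.trace(A11_0)
                         / ((1 + (p - 1) * r) * np.trace(A11)))   # eq. (5.1)
    F_rho = (1 + (p - 1) * r) / (1 - r)                           # eq. (5.2)
    return F_gamma, F_rho
```

Applied to the covariance matrix of the Section 6 data (N = 42), it gives values in line with the reported 9.48 and 2.97; the small difference in the first reflects rounding of the published covariance entries.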
6. Numerical examples

The following statistics are based on N = 42 subjects participating in a larger experiment reported by Fishman et al. (1993), in which the evaluation of patients with Best's vitelliform macular dystrophy included the measurement of their bilateral visual acuity loss, denoted by (Y1, Y2), and age, denoted by X. Because the visual acuities Y1, Y2 in (respectively left and right) fellow eyes are expected to be about the same, to have about the same variability, and to be equally correlated with age, the model defined in Section 2 to describe the data on
(X, Y1, Y2) is used. The correlation structure between age and linear combinations of extreme visual acuities will be considered next. The starting point is the sample means

(\bar{x}, \bar{y}_1, \bar{y}_2) = (28.833, 0.412, 0.437) ,

covariance matrix

S = A/N = ( 367.996  4.419  4.200
            4.419    0.135  0.074
            4.200    0.074  0.163 )

based on the (X, Y1, Y2) data, and corresponding correlation matrix

R = ( 1.000  0.627  0.542
      0.627  1.000  0.499
      0.542  0.499  1.000 ) .
Age is expressed in years and visual acuity measurements are expressed in Log MAR units. The maximum likelihood estimate (4.2) of the correlation \rho between vision in fellow eyes is 0.496, whereas the estimated correlation \gamma between the patient's age and vision in either eye is 0.581. In addition, the estimated standard deviation of vision in either eye is 0.386, the standard deviation for age is 19.182, and the estimated mean vision and age are \hat{\mu}_1 = 0.424 and \hat{\mu}_0 = 28.83, respectively. The maximum likelihood estimate (4.1) of the correlation \theta between extreme acuities is 0.782. Table 1 summarizes the coefficients needed to estimate the correlation and linear regression parameters between X and a linear combination w'\mathcal{Y} of extreme

Table 1
Linear combinations of extreme vision acuity

w'            w'\mathcal{Y}       w'e   w'\mathcal{C}w                w'c
(0.5, 0.5)    average vision      1     0.5                           0
(1, 0)        best vision         1     c_{11} = 0.6817               c_1 = -0.5642
(0, 1)        worst vision        1     c_{22} = 0.6817               c_2 = 0.5642
(-1, 1)       range               0     2(c_{11} - c_{12}) = 0.7268   2c_2 = 1.1284
(0.75, 0.25)  visual impairment   1     0.5454                        -0.2821
Table 2
Linear combinations of extreme vision acuity and corresponding estimates

w'\mathcal{Y}       \hat{\delta}   AVar(\hat{\delta})   ACov(\hat{\delta}, \hat{\theta})   AVar(\hat{\delta} | \gamma = 0)
average vision      0.671          0.105                0.104                              1.000
best vision         0.634          0.155                0.236                              0.733
worst vision        0.634          0.155                0.236                              0.733
visual impairment   0.661          0.117                0.151                              0.916
acuities. From (4.1), (4.3) and (4.5), the corresponding estimates of \delta, AVar(\hat{\delta}), ACov(\hat{\delta}, \hat{\theta}) and AVar(\hat{\delta} | \gamma = 0) are shown in Table 2. The estimated large-sample variance of \hat{\theta}, given by (4.4), is 0.6115. The value of the test statistic (5.1) for \gamma = 0 is F_{p,n} = 9.48, which supports the conclusion of a non-null correlation \gamma between age and vision. Consequently, there is evidence to support the hypothesis of association between the patient's age and non-null linear combinations of extreme vision measures, such as those indicated in Table 1. Note that the range of vision acuity is necessarily independent of the patient's age under the equicorrelated-exchangeable model described by (2.1). The test statistic (5.2) for \rho = 0 is F_{n,(p-1)n} = 2.97, which also supports the claim of a positive correlation \rho between vision of fellow eyes.

The estimates of the regression lines (3.5) predicting the linear combination of extreme visual acuity from the patient's age, and the corresponding standard errors s.e. derived from (3.6), are shown in Table 3. Similarly, the estimates of the regression lines (3.7) predicting the patient's age from the linear combination of extreme visual acuity, and the corresponding standard errors s.e. obtained from (3.8), are shown in Table 4.

A more realistic application, in this case, is the prediction of the subject's reading performance from a linear combination of extreme acuities, such as the subject's total visual impairment (3 Y(1) + Y(2))/4, defined earlier in Section 1 [see also Rubin et al. (1984)]. Tables 5 and 6 show the corresponding minimum m.s.e. estimates for these models, obtained from sample means and cross-products of (X, Y(1), Y(2)). These estimates will be contrasted with those obtained from data on (X, Y1, Y2). The usual estimates obtained from (X, Y(1), Y(2)), although optimum in the m.s.e. sense, fail to carry over the multivariate normal assumption and properties.
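As a check on Table 2, the \hat{\delta} column and \hat{\theta} = 0.782 follow from (4.1) with \hat{\rho} = 0.496 and \hat{\gamma} = 0.581; the snippet below (our own) reproduces them to table accuracy.

```python
import math

rho, gamma = 0.496, 0.581       # Section 6 estimates
c11, c12 = 0.6817, 0.3183       # eq. (2.4); c22 = c11

def delta_hat(w1, w2):
    # eq. (4.1): estimated correlation between age and w'(Y(1), Y(2))
    we = w1 + w2
    wCw = (w1 * w1 + w2 * w2) * c11 + 2.0 * w1 * w2 * c12
    return gamma * we / math.sqrt(rho * we * we + (1.0 - rho) * wCw)

theta_hat = (rho + (1.0 - rho) * c12) / (rho + (1.0 - rho) * c11)
```

delta_hat(0.5, 0.5), delta_hat(1, 0) and delta_hat(0.75, 0.25) give 0.671, 0.634 and 0.661 to table accuracy, and theta_hat is approximately 0.782.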
Table 3
MLE linear regression estimates of w'\mathcal{Y} on age

w'\mathcal{Y}       constant   coefficient   X     s.e.    r^2
average vision      0.0866     0.0117        age   0.247   0.450
best vision         -0.068     0.0117        age   0.273   0.401
worst vision        0.2412     0.0117        age   0.273   0.401
range of vision     0.3092     0             age   0.233   0
visual impairment   0.0093     0.0117        age   0.254   0.436
Table 4
MLE linear regression estimates of age on w'\mathcal{Y}

X     constant   coefficient   w'\mathcal{Y}       s.e.     r^2
age   12.464     38.599        average vision      14.222   0.450
age   8.932      34.389        best vision         14.834   0.401
age   19.565     34.389        worst vision        14.834   0.401
age   28.83      0             range of vision     19.181   0
age   15.845     37.4453       vision impairment   14.393   0.436
Table 5
(x, y_{(1)}, y_{(2)})-based linear regression estimates of w'\mathcal{Y} on age

w'\mathcal{Y}       constant   coefficient   X     s.e.     r^2
average vision      0.0876     0.0117        age   0.2506   0.4516
best vision         -0.037     0.0114        age   0.2431   0.4548
worst vision        0.2124     0.0119        age   0.3260   0.3376
range of vision     0.2494     0.0005        age   0.2823   0.0381
visual impairment   0.0253     0.0115        age   0.2365   0.4743
Table 6
(x, y_{(1)}, y_{(2)})-based linear regression estimates of age on w'\mathcal{Y}

X     constant   coefficient   w'\mathcal{Y}       s.e.     r^2
age   12.427     38.568        average vision      14.381   0.4519
age   17.193     39.778        best vision         14.340   0.4548
age   13.114     28.165        worst vision        15.805   0.3376
age   28.134     2.621         range of vision     19.407   0.0381
age   14.118     40.990        vision impairment   14.080   0.4743
Under these data, the covariance matrix (2.3) would be estimated by

\hat{\Psi}_0 = ( 367.996  4.207  4.411
                 4.207    0.105  0.092
                 4.411    0.092  0.156 ) ,

with resulting correlation matrix

Corr_0(X, Y(1), Y(2)) = ( 1.000  0.674  0.581
                          0.674  1.000  0.716
                          0.581  0.716  1.000 ) .

In contrast, the corresponding maximum likelihood estimate obtained from Section 4 under the equicorrelated-exchangeable model is

Corr(X, Y(1), Y(2)) = ( 1.000  0.634  0.634
                        0.634  1.000  0.782
                        0.634  0.782  1.000 ) .
The differences can be remarkable: for example, from Table 3, the estimated range of vision is 0.3092, whereas the unrestricted estimated value from Table 5 is 0.2494. The difference is numerically nearly equivalent to the difference between normal vision (Log MAR = 0) and a reduced vision of 20/40 (Log MAR = 0.3). The unrestricted model for best vision overestimates \delta^2 by about 12%, and underestimates it by about 21% for the worst vision. Tables 4 and 6 show that the
expected ages corresponding to a normal best vision (the model's intercept) differ by about 8 years. Proposition 3.1 is particularly important to justify the choice of the average vision against other convex linear combinations when the purpose is to obtain the best m.s.e. linear model relating X and the convex combination w'\mathcal{Y}, under the equicorrelated-exchangeable model. Table 3 shows that the correlation between the subject's age and the average vision dominates the correlation with best vision, worst vision or visual impairment. This is a mathematical fact and not sampling variation.
Acknowledgement

Research was supported in part by unrestricted departmental grants from Research to Prevent Blindness, Inc., New York, New York.
References

Aburn, N. (1990). Eye injuries in indoor cricket at Wellington Hospital: A survey January 1987 to June 1989. New Zealand Medic. J. 103(898), 454-456.
Anderson, T. W. (1985). An Introduction to Multivariate Statistical Analysis. 2nd edn., John Wiley, New York.
Beyer, W. (1990). Standard Probability and Statistics - Tables and Formulae. CRC Press, Boca Raton.
David, H. A. (1996). A general representation of equally correlated variates. J. Amer. Statist. Assoc. 91(436), 1576.
Fishman, G. A., W. Baca, K. R. Alexander, D. J. Derlacki, A. M. Glenn and M. A. G. Viana (1993). Visual acuity in patients with best vitelliform macular dystrophy. Ophthalmology 100(11), 1665-1670.
Frenkel, R. and D. Shin (1986). Prevention and management of delayed suprachoroidal hemorrhage after filtration surgery. Arch. Ophthal. 104(10), 1459-1463.
Olkin, I. and M. A. G. Viana (1995). Correlation analysis of extreme observations from a multivariate normal distribution. J. Amer. Statist. Assoc. 90, 1373-1379.
Parving, A. and B. Christensen (1990). Hearing of the mentally retarded living at home. Ugeskrift for Laeger 152(43), 3161-3164.
Rehak, J. and M. Vymazal (1989). Treatment of branch retinal vein occlusion with argon laser photocoagulation. Acta Universitatis Palackianae Olomucensis Facultatis Medicae 123, 231-236.
Rubin, G. S., B. Munoz, L. P. Fried and S. West (1984). Monocular vs binocular visual acuity as measures of vision impairment. Vision Science and Its Applications, OSA Technical Digest Series 1, 328-331.
Szlyk, J. P., G. A. Fishman, K. Sovering, K. R. Alexander and M. A. G. Viana (1993). Evaluation of driving performance in patients with juvenile macular dystrophies. Arch. Ophthal. 111, 207-212.
Viana, M. A. G. and Olkin, I. (1997). Correlation analysis of ordered observations from a block-equicorrelated multivariate normal distribution. In: S. Panchapakesan and N. Balakrishnan, eds., Advances in Statistical Decision Theory and Applications, Birkhäuser, Boston, chapter 21, 305-322.
Wilks, S. S. (1946). Sample criteria for testing equality of means, equality of variances and equality of covariances in a normal multivariate distribution. Ann. Math. Statist. 17, 309-326.
20
Order-Statistic Filtering and Smoothing of Time-Series: Part I
Gonzalo R. Arce, Yeong-Taeg Kim and Kenneth E. Barner
1. Introduction
The processing of time-series is of fundamental importance in economics, engineering, and some social fields. Estimation methods based on structural time-orderings are extensively used in time-series smoothing and forecasting. Their designs vary from ad hoc to very sophisticated, where the dynamical nature of the underlying time-series is taken into account. Unfortunately, many time-series filtering problems have not been satisfactorily addressed through the use of linear filters. As we illustrate in this tutorial, nonlinear filters can outperform linear methods in applications where the underlying random processes are non-Gaussian or when system nonlinearities are present. Nonlinear and non-Gaussian processes are quite common in signal processing applications. Example waveforms include sea clutter in radar, speech waveforms, image and video signals, and many digital communication signals. For instance, image and video signals contain edges, details, scenes, and colors that can abruptly change from one sample to another. If linear filters are used to estimate these signals from their corresponding noisy observations, the resulting linear estimates will unavoidably yield blurred signals which, in many cases, are objectionable to the end user. Linear filters fail to preserve those fine features that are of great importance to visual perception. These facts agree with statistical principles which dictate that nonlinear estimation is advantageous for time series which are non-Gaussian in nature (Priestley 1988; Tong 1990). While second-order moments are sufficient to effectively process Gaussian processes, more powerful statistics must be exploited for the processing of non-Gaussian or nonlinear time series. In our case, we exploit traditional temporal statistics and order statistics jointly. Robustness is another issue that must be considered in the design of time-series filters.
During the past decades it has become increasingly accepted that statistical procedures optimized under the assumption of Gaussianity are excessively sensitive to minor deviations from the Gaussian assumption (Huber 1981). Thus, the need for "robust" estimation frameworks for non-Gaussian sequence processing has become highly apparent. Since order statistics provide the basis for
G. R. Arce, Y. T. Kim and K. E. Barner
reliable inferences such as the estimation of location and scale, it is not surprising that the ordering information provided by the observation samples can significantly enhance the capability of time-series filters. This idea was first explored by Tukey (1974) when he introduced the running median for time-series analysis. The running median is a special case of the running L-filter whose output can be written as

y(n) = Σ_{i=1}^{N} w_i x_(i) ,   (1)
where x_(i) are the sample order statistics at time n, and where the set of weights {w_i} is individually designed for each particular application. If the weights are chosen uniformly as w_i = 1/N, the running L-estimator reduces to the running mean. In fact, the mean is the only filter which is both a linear FIR filter and an L-filter. If the weights are assigned as

w_i = { 1/(N - 2α)   for i = α + 1, ..., N - α
      { 0            otherwise ,                  (2)

the estimator obtained is the symmetric trimmed mean, where the bottom and top order statistics have been removed and where the remaining samples are averaged to produce the output. As described by Bednar and Watt (1984), trimmed means provide a connection between average smoothing and median smoothing, as is illustrated here. Consider a segment of the voiced waveform "a", shown at the bottom of Fig. 1. This speech signal is placed at the input to several running trimmed mean filters of size 9. The outputs of the trimmed means as we vary the trimming parameter α from zero to four are also shown in Fig. 1. The vertical index denotes the trimming: the top signal is the median filtered output, the second signal from the top is the output of the trimmed mean with α = 1, and the other trimmed means are displayed successively in Fig. 1. The different characteristics of the filtered signals as we vary the trimming can be seen immediately. Notice that while the running mean results in a smooth blurring of the signal, the running median smooths the signal while retaining its sharp discontinuities. This is due to the fact that the running median restricts the output value to be identical to the value of one of the input samples in the observation window. Depending on the amount of trimming, the α-trimmed filter removes narrow impulses, but it also does some edge smoothing. The running L-filter has many desirable attributes which have been exploited in several applications (Bovik et al. 1983). However, L-filters fail to exploit the temporal structure of time series (Pitas and Venetsanopoulos 1989). Our goal is to define estimators that utilize both the temporal and ranking configurations of the permutation mapping p: xe(n) → xL(n), where xe(n) and xL(n) are the observation vector and its corresponding sorted order-statistic vector, respectively.
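The running trimmed mean of Eqs. (1)-(2) is straightforward to prototype. The following Python sketch is illustrative only (the function name and the edge-replication padding are our choices, not part of the original formulation); α = 0 recovers the running mean and α = (N - 1)/2 recovers the running median.

```python
import numpy as np

def running_trimmed_mean(x, N=9, alpha=0):
    """Running symmetric trimmed mean of odd window size N, per Eqs. (1)-(2).

    alpha = 0 gives the running mean; alpha = (N - 1) // 2 the running median.
    """
    assert N % 2 == 1 and 0 <= alpha <= (N - 1) // 2
    K = N // 2
    xp = np.pad(np.asarray(x, dtype=float), K, mode='edge')  # replicate ends
    y = np.empty(len(x))
    for n in range(len(x)):
        window = np.sort(xp[n:n + N])          # sample order statistics
        y[n] = window[alpha:N - alpha].mean()  # trim alpha from each end
    return y

# A single narrow impulse is removed entirely once alpha >= 1.
x = [0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(running_trimmed_mean(x, N=5, alpha=1))
```

Sweeping alpha from 0 to (N - 1)/2 reproduces the mean-to-median continuum illustrated in Fig. 1.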
It will be shown that for the analysis and processing of time series, one attains significant advantages by exploiting the information embedded in the mapping p, rather than using the marginal information contained in either xe(n) or xL(n). We denote the
Order-Statistic filtering and smoothing of time-series: Part I
Fig. 1. Trimmed mean filtering of a speech signal for various levels of trimming.
estimators which exploit the permutation xe(n) ↔ xL(n) as permutation LJg-filters, where L refers to the use of order statistics, g denotes the temporal ordering of the linear combiner forming the estimate, and j refers to the amount of information extracted from the permutation mapping p. The structure and optimization methods used for the class of LJg filters parallel those of the linear finite-impulse-response (FIR) filters widely used in signal processing for engineering applications. The underlying concepts, however, may be extended to other filter structures that may be more amenable to other fields.
2. The estimators
2.1. Lg filters

Consider the real-valued sequence {x(n)}, and define the N-long observation vector at discrete time n as xe(n) = [x1(n), x2(n), ..., xN(n)]^T, where xi(n) = x(n + i - (K + 1)) with N = 2K + 1. Thus xe(n) is a temporally ordered observation vector centered at x(n). The observation samples can also be ordered by rank, which defines the vector xL(n) = [x_(1)(n), x_(2)(n), ..., x_(N)(n)]^T, where x_(1)(n) ≤ x_(2)(n) ≤ ... ≤ x_(N)(n) are the sample order statistics. When there can be no confusion, for the sake of notational simplicity, the temporal index n is dropped from the notation. The temporal-order and rank-order observations are
then expressed simply as xe = [x1, x2, ..., xN]^T and xL = [x_(1), x_(2), ..., x_(N)]^T. The subscripts e and L refer to the temporal and rank orderings of the elements in xe and xL, respectively. We define ri as the rank of xi among the elements of xe. Hence, the sample xi in xe gets mapped to x_(ri) in xL. In the case of rank ties among a subset of input samples, stable sorting is performed, where a lower rank is assigned to the sample with the lower time index in the subset containing rank ties. xe and xL respectively contain local temporal and ranking information of the underlying time series. It is useful to combine the marginal information of both vectors into one. To this end, the N²-long vector xLe is next defined as (Ghandi and Kassam 1991; Palmieri and Boncelet 1994)

xLe = [x1(1), x1(2), ..., x1(N) | ... | xi(1), ..., xi(N) | ... | xN(1), xN(2), ..., xN(N)]^T ,   (3)

where

xi(j) = { xi   if xi ↔ x_(j)
        { 0    else ,               (4)

and where xi ↔ x_(j) denotes the event that the ith element in xe is the jth smallest in the sample set. Thus, the ith input sample is mapped into the bin of samples xi(1), xi(2), ..., xi(N), of which N - 1 are zero and only one is non-zero, having the same value as xi. The location of the non-zero sample, in turn, characterizes the ranking of xi among the N input samples. The decomposition xe ∈ R^N ↔ xLe ∈ R^{N²} specified in (3) and (4) is a one-to-one nonlinear mapping, where xe can be reconstructed from xLe as

xe = [I_N ⊗ e_N^T] xLe ,   (5)

where I_N is the N × N identity matrix, e_N is an N × 1 one-valued vector, and ⊗ is the matrix Kronecker product. Since the xLe vector contains both time- and rank-ordering information, it is not surprising that we can also obtain xL from xLe as

xL = [e_N^T ⊗ I_N] xLe .   (6)
EXAMPLE 1. Consider a length-3 filter and let the observation vector be xe = [3, 5, 2]^T. The ranks of the elements in the observation vector are r1 = 2, r2 = 3, and r3 = 1; thus,

xL = [2, 3, 5]^T ,
xLe = [0, 3, 0 | 0, 0, 5 | 2, 0, 0]^T .   (7)

The xe vector can be reconstructed from xLe as

xe = [ e3^T  0^T   0^T
       0^T   e3^T  0^T
       0^T   0^T   e3^T ] [0, 3, 0 | 0, 0, 5 | 2, 0, 0]^T ,   (8)

where e3 = [1, 1, 1]^T and 0 = [0, 0, 0]^T. Similarly, xL is obtained from xLe as

xL = [I3 | I3 | I3] [0, 3, 0 | 0, 0, 5 | 2, 0, 0]^T ,   (9)

where I3 is the 3 × 3 identity matrix.  □
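Example 1 can be verified numerically. In the sketch below (the helper name to_xLe is ours), the xLe vector of Eqs. (3)-(4) is built from xe = [3, 5, 2]^T, and the reconstructions (5)-(6) are carried out with explicit Kronecker products:

```python
import numpy as np

def to_xLe(xe):
    """Map the time-ordered vector xe into the N^2-long vector xLe of
    Eqs. (3)-(4): sample xi goes into bin i, at the slot given by its
    rank (stable ranking breaks ties by time index)."""
    xe = np.asarray(xe, dtype=float)
    N = len(xe)
    ranks = np.empty(N, dtype=int)
    ranks[np.argsort(xe, kind='stable')] = np.arange(N)  # 0-based ranks
    xLe = np.zeros(N * N)
    for i in range(N):
        xLe[i * N + ranks[i]] = xe[i]
    return xLe

xe = np.array([3.0, 5.0, 2.0])
N = len(xe)
xLe = to_xLe(xe)                 # [0, 3, 0 | 0, 0, 5 | 2, 0, 0], Eq. (7)
eN = np.ones(N)
IN = np.eye(N)
xe_back = np.kron(IN, eN) @ xLe  # Eq. (5): recovers [3, 5, 2]
xL = np.kron(eN, IN) @ xLe       # Eq. (6): recovers the sorted [2, 3, 5]
```

Note that np.kron(eN, IN) is exactly the block row [I3 | I3 | I3] of Eq. (9), while np.kron(IN, eN) is the block-diagonal matrix of Eq. (8).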
The decomposition xe ↔ xLe thus maps a vector with time-ordered samples into a vector whose elements are both time- and rank-ordered. The manner in which each element xi(j) is given its value is particularly important. The rank of xi, ri, determines the value of all the elements xi(1), xi(2), ..., xi(N), regardless of how the other samples in the window are ranked. Given the value of ri, there are (N - 1)! ways of assigning ranks to the remaining N - 1 samples such that the values of xi(1), xi(2), ..., xi(N) are not modified. This can be considered a coloring of the full permutation space described by the N! ways in which the ranks of N elements can be assigned. The class of Lg filters, introduced by Palmieri and Boncelet (1994) and independently by Ghandi and Kassam (1991), builds on this idea of combining temporal and rank orderings. Here, the output is a linear combination of the observation vector where the coefficient associated with the ith input sample also depends on its rank among all N samples. The output of the filter at time n can be expressed as

d̂(n) = W^T xLe(n) ,   (10)

where the weight vector is W = [(w1)^T | (w2)^T | ... | (wN)^T]^T, in which wi is the N-long tap weight vector associated with the ith input sample, and where d̂(n) is the Lg estimate of a desired signal statistically related to the observation vector xe(n). The values given to the weights in W must be designed according to an optimization criterion; this topic will be addressed shortly. It is useful at this point to present an example illustrating the advantages of Lg filters over traditional linear filters. Consider the information signal of Fig. 2(a), used in (Ghandi and Kassam 1991), which is the superposition of two sinusoids at normalized frequencies 0.03 and 0.25 with amplitudes 10 and 20, respectively. The desired signal is transmitted through a channel that exhibits saturation. The saturation can be modeled by a sigmoid function followed by a linear time-invariant channel. The sigmoid function is given by A(1 - e^{-αd(n)})/(1 + e^{-αd(n)}), where d(n) is the desired signal, and the FIR channel is a low-pass filter. The signal distorted by the sigmoid function and the FIR channel is depicted in Fig. 2(b), where A = 20 and α = 0.2 are used. In addition, the channel also introduces additive contaminated Gaussian noise, whose probability density function is given by (1 - δ)G(0, σ1²) + δG(0, σ2²), where G(0, σ²) represents a Gaussian distribution function, δ is the density of outliers, and σ1 < σ2. Contaminated Gaussian noise with σ1 = 3, σ2 = 15 and δ = 0.1 is added to the signal. The corrupted observed signal is depicted in Fig. 2(b). Figure 3 shows segments of the linear filter output and of the Lg filter output for a window of size 9. The figures show that the output of the linear filter is severely affected whenever an outlier is present. On the other hand, single outliers
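The rank-dependent weighting of Eq. (10) can be sketched compactly by storing the Lg weights as an N × N matrix whose row i is the tap vector wi, and selecting entry W[i, ri] at each time step. The function and the two example weight matrices below are ours, chosen to show that the Lg family contains both the mean and the median:

```python
import numpy as np

def lg_filter(xe, W):
    """Lg filter output per Eq. (10): the weight applied to the i-th
    input sample is selected by that sample's rank, i.e. W[i, r_i]."""
    xe = np.asarray(xe, dtype=float)
    N = len(xe)
    ranks = np.empty(N, dtype=int)
    ranks[np.argsort(xe, kind='stable')] = np.arange(N)  # 0-based r_i
    return sum(W[i, ranks[i]] * xe[i] for i in range(N))

N = 3
W_mean = np.full((N, N), 1.0 / N)   # every rank weighted 1/N -> running mean
W_med = np.zeros((N, N))
W_med[:, 1] = 1.0                   # keep only the middle rank -> median
xe = [3.0, 5.0, 2.0]
print(lg_filter(xe, W_mean), lg_filter(xe, W_med))
```

With uniform rows the output is the sample mean 10/3; with weight concentrated on the middle rank the output is the median 3, regardless of which temporal sample holds that rank.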
have minor effects on the Lg filter output. These observations can be readily seen in the time interval (20-25) of Figs. 2-3, where an outlier was present in the observation sequence. The mean squared errors attained by the linear and Lg filters are 70.55 and 58.16, respectively.

Fig. 2. Segment of (a) the original signal, and (b) the nonlinearly distorted signal and the observed signal, which is corrupted by additive contaminated Gaussian noise.
Fig. 3. Segment of the estimated signals using (a) a linear filter, and (b) an Lg filter, where (--) represents the original signal and (-) represents the estimate.
Before we conclude this section, we further elaborate on this example to motivate the introduction of the more general class of LJg filters. This can be done by noticing that neither the linear filter nor the Lg filter is effective in reversing the nonlinear saturation of the channel. The definition of the Lg filter incorporates
both rank and temporal information to some extent. It does not, however, fully exploit the information provided by the mapping xe ↔ xL. This follows from the fact that the weight given to the sample xi depends on the rank of xi, but the rank distribution of the remaining samples does not affect the weight applied to xi. In the following, these concepts are generalized into a broader filtering framework. The resulting estimators are denoted as permutation LJg filters, where j is the parameter that determines the amount of time-rank information used by the estimator (Kim and Arce 1994). To illustrate the performance of these estimators as j is increased, Fig. 4 shows segments of the L2g filter output and of the L3g filter output, whose mean squared errors are 28.5 and 4.6, respectively. These outputs are clearly superior to those of the linear and L1g filters shown in Fig. 3. The higher-order LJg filters, in particular, are more effective in removing the saturation effects of the nonlinear channel.

2.2. LJg filters
Consider the observation vector xe = [x1, x2, ..., xN]^T and its corresponding sorted vector xL = [x_(1), x_(2), ..., x_(N)]^T. Define the rank indicator vector

Ri = [Ri1, Ri2, ..., RiN]^T ,   (11)

where

Rik = { 1   if xi ↔ x_(k)
      { 0   else ,            (12)
where xi ↔ x_(k) denotes that the ith temporal sample occupies the kth order statistic. The variable ri is then defined as the rank of xi; hence, Ri,ri = 1 by definition. Assuming the rank indicator vector Ri is specified, if we would like to jointly characterize the ranking characteristics of xi and its adjacent sample xi+1 contained in xe, then an additional indicator vector is needed which does not contain the information provided by Ri. Hence, we define the reduced rank indicator of xi+1 as Ri^1, formed by removing the ri-th element from the rank indicator vector Ri+1. The two indicators Ri and Ri^1 fully specify the rank permutation characteristics of xi and xi+1. We can generalize this concept by characterizing the rank permutation characteristics of a set of j samples. Here, a reduced rank indicator of xi⊕a, denoted Ri^a, is formed by removing the ri-th, ri⊕1-th, ..., ri⊕(a-1)-th elements from the rank indicator vector Ri⊕a, where ⊕ denotes the modulo-N addition i ⊕ a = (i + a) Mod N.¹ The parameter a specifies the sample, xi⊕a, whose rank information is being considered in addition to the rank information of the samples (xi, xi⊕1, ..., xi⊕(a-1)); i.e., a = 1 considers the rank information of the sample xi⊕1 when the rank information of the sample xi is known, and a = 2 considers the rank information of the sample xi⊕2 when the rank information of the samples (xi, xi⊕1) is known. For
¹ The Modulo N operation defined here is in the group {1, 2, ..., N}, such that N Mod N = N and (N + 1) Mod N = 1.
Fig. 4. Segment of the estimated signals using (a) the L2g permutation filter, and (b) the L3g permutation filter, where (--) represents the original signal and (-) represents the estimate.
example, if xe = [6, 3, 10, 1]^T and xL = [1, 3, 6, 10]^T, then the rank indicator vectors and their respective rank parameters are

R1 = [0, 0, 1, 0]^T ,   r1 = 3 ,
R2 = [0, 1, 0, 0]^T ,   r2 = 2 ,
R3 = [0, 0, 0, 1]^T ,   r3 = 4 ,
R4 = [1, 0, 0, 0]^T ,   r4 = 1 .   (13)
The reduced rank indicator vectors R3^1 and R3^2 are, for instance,

R3^1 = [1, 0, 0, ∅r3]^T = [1, 0, 0]^T ,
R3^2 = [∅r4, 0, 1, ∅r3]^T = [0, 1]^T ,   (14)
where the r3-th sample was removed from R3⊕1 = R4 to obtain R3^1, and where the r3-th and r4-th samples were deleted from R3⊕2 = R1 to get R3^2. Note that the notation ∅ri used in (14) represents the deletion of the sample "0" which is the ri-th element of the rank indicator vector. R3^1 indicates that the sample x4 is the first-ranked sample among (x1, x2, x4), and similarly R3^2 indicates that x1 is the second-ranked sample among (x1, x2). The general idea behind the reduced rank indicator Ri^a is that it characterizes the rank information of the sample xi⊕a under the situation in which the rank information of the samples (xi, xi⊕1, ..., xi⊕(a-1)) is known. The rank indicator vector and the reduced rank indicator vectors are next used to define the rank permutation indicator Pi^j as

Pi^j = Ri ⊗ Ri^1 ⊗ ... ⊗ Ri^(j-1) ,   (15)
for 1 ≤ j ≤ N, where ⊗ denotes the matrix Kronecker product. Note that while the vector Ri is of length N, the vector Pi^j in (15) has length P_N^j ≝ N(N - 1) ... (N - j + 1), which is the number of permutations of j samples chosen from N distinct samples. The vector Pi^j effectively characterizes the relative ranking of the samples (xi, xi⊕1, ..., xi⊕(j-1)), i.e., the rank permutation of (xi, xi⊕1, ..., xi⊕(j-1)). Hence, Pi^0 does not unveil any rank information, whereas Pi^1 provides the rank information of xi but ignores the ranks of the other N - 1 input samples. Similarly, Pi^2 provides the rank information of xi and xi⊕1 but eludes the ranking information of the other N - 2 input samples. Clearly, Pi^N accounts for the ranks of all input samples x1 through xN. In order to illustrate the formulation of the vector Pi^j, again let xe = [6, 3, 10, 1]^T and xL = [1, 3, 6, 10]^T. The rank indicator vectors Ri for this example vector xe were listed in (13); the rank permutation indicators for j = 2 are then found as

P1^2 = R1 ⊗ R1^1 = [0, 0, 1, 0]^T ⊗ [0, 1, 0]^T ,
P2^2 = R2 ⊗ R2^1 = [0, 1, 0, 0]^T ⊗ [0, 0, 1]^T ,
P3^2 = R3 ⊗ R3^1 = [0, 0, 0, 1]^T ⊗ [1, 0, 0]^T ,
P4^2 = R4 ⊗ R4^1 = [1, 0, 0, 0]^T ⊗ [0, 1, 0]^T .   (16)
To see how Pi^j characterizes the rank permutation, let us carry out the matrix Kronecker product in the first equation in (16); that is,

P1^2 = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0)]^T ,   (17)

where parentheses are inserted for ease of reference. Note that the 1 located in the second position of the third parenthesis in P1^2 implies that the rank of x1 is 3 and the rank of x2 among (x2, x3, x4) is 2. Thus, the P1^2 obtained in this example clearly specifies the rank permutation of x1 and x2 as (r1, r2) = (3, 2). Notice that the vectors Pi^2 can be found recursively from (16) as Pi^2 = Pi^1 ⊗ Ri^1. In general, it can easily be seen from (15) that this recursion is given by Pi^j = Pi^(j-1) ⊗ Ri^(j-1). The rank permutation indicator forms the basis for the rank permutation vectors Xj, defined as the N·P_N^j-long vector

Xj = [x1 (P1^j)^T | x2 (P2^j)^T | ... | xN (PN^j)^T]^T .   (18)
Note that Xj places each xi based on the rank of the j time-ordered samples (xi, xi⊕1, ..., xi⊕(j-1)). Consequently, we refer to it as the LJg vector, where we have borrowed the notation from the order-statistics terminology (L and g refer to the rank and time variables, respectively). It should be mentioned here that there are other ways of defining rank permutation indicators. For instance, we could let Pi^j characterize the rank permutation of the samples (xi⊕1, xi⊕3, ..., xi⊕(2j+1)), or it could characterize the rank permutation of (x1, x2, ..., xj) regardless of the index i. Here, we use the definition of Pi^j in (15) since it provides a systematic approach to the design. Associated with Xj, Kim and Arce (1994) define the LJg estimate as

d̂j = Wj^T Xj ,   (19)

where the weight vector is

Wj = [(w1^j)^T | (w2^j)^T | ... | (wN^j)^T]^T ,   (20)

in which wi^j is the P_N^j-long tap weight vector associated with the ith input sample, and where d̂j is the LJg estimate of a desired signal statistically related to the observation vector xe. Notice that for j = 0, the permutation filter L0g reduces to a linear FIR filter. For j = 1, the permutation filter is identical to the Lg filter introduced earlier.
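The constructions (11)-(18) can be reproduced directly with Kronecker products. The following Python sketch (the function names are ours) rebuilds the rank indicators of (13), the reduced rank indicators, and P1^2 of (16)-(17) for xe = [6, 3, 10, 1]^T; stacking xi Pi^j then gives the LJg vector Xj of Eq. (18):

```python
import numpy as np

def rank_indicators(xe):
    """Rank indicator vectors R_i of Eq. (11): row i has a single 1 at
    position r_i, the (0-based) rank of x_i; stable sorting breaks ties."""
    xe = np.asarray(xe, dtype=float)
    N = len(xe)
    ranks = np.empty(N, dtype=int)
    ranks[np.argsort(xe, kind='stable')] = np.arange(N)
    R = np.zeros((N, N))
    R[np.arange(N), ranks] = 1.0
    return R, ranks

def perm_indicator(xe, i, j):
    """P_i^j of Eq. (15): Kronecker product of R_i with the reduced rank
    indicators of x_{i+1}, ..., x_{i+j-1} (indices taken modulo N)."""
    R, ranks = rank_indicators(xe)
    N = len(xe)
    P = R[i]
    removed = [ranks[i]]                 # slots deleted so far
    for a in range(1, j):
        Ria = np.delete(R[(i + a) % N], sorted(removed))  # reduced indicator
        removed.append(ranks[(i + a) % N])
        P = np.kron(P, Ria)
    return P

xe = np.array([6.0, 3.0, 10.0, 1.0])
R, ranks = rank_indicators(xe)   # ranks + 1 == [3, 2, 4, 1], matching (13)
P1 = perm_indicator(xe, 0, 2)    # [(0,0,0),(0,0,0),(0,1,0),(0,0,0)], Eq. (17)
Xj = np.concatenate([xe[k] * perm_indicator(xe, k, 2) for k in range(4)])  # Eq. (18)
```

The LJg estimate (19) is then simply a dot product of Xj with a weight vector of the same length, here 4 · P_4^2 = 48.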
3. α-trimmed LJg filters

One of the advantages of order-statistic filters is their ability to isolate extreme observation samples at the beginning and end of the vector xL (David 1982). Thus, LJg estimates can be made more robust to outliers simply by always ignoring these extreme observations (i.e., trimming). The α-trimmed LJg filters are easily defined by introducing the α-trimmed rank indicator Ri,α, which is formed by removing
the 1st through the αth, and the (N - α + 1)th through the Nth elements from Ri, where α = 0, 1, ..., ⌊(N - 1)/2⌋. For instance, suppose we have Ri = [0, 1, 0, 0, 0]^T; then the α-trimmed rank indicator will be Ri,α = [1, 0, 0]^T for α = 1 and Ri,α = [0]^T for α = 2. The α-trimmed rank permutation indicator vector Pi,α^j is easily defined as

Pi,α^j = Ri,α ⊗ Ri,α^1 ⊗ ... ⊗ Ri,α^(j-1) ,   (21)

and the α-trimmed LJg vector is then defined as

Xj^α = [x1 (P1,α^j)^T | x2 (P2,α^j)^T | ... | xN (PN,α^j)^T]^T .   (22)

The α-trimmed LJg estimate immediately follows as d̂j^α = (Wj^α)^T Xj^α, where Wj^α = [(w̃1^j)^T | (w̃2^j)^T | ... | (w̃N^j)^T]^T, in which w̃i^j is the P_(N-2α)^j-long tap weight vector associated with the ith input sample.
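The α-trimming of the rank indicator amounts to deleting α slots from each end of Ri, so weights can never select the extreme order statistics. A one-line sketch (the function name is ours) reproduces the two cases worked in the text for Ri = [0, 1, 0, 0, 0]^T:

```python
import numpy as np

def trimmed_indicator(Ri, alpha):
    """alpha-trimmed rank indicator: drop the first alpha and last alpha
    slots of R_i, per the definition of the trimmed LJg filters."""
    Ri = np.asarray(Ri)
    return Ri[alpha:len(Ri) - alpha]

Ri = np.array([0, 1, 0, 0, 0])
print(trimmed_indicator(Ri, 1))  # → [1 0 0]
print(trimmed_indicator(Ri, 2))  # → [0]
```

A sample whose rank falls in a trimmed slot yields an all-zero indicator and therefore contributes nothing to the α-trimmed estimate.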
4. Optimization

The Wiener approach

Given the LJg filtering framework, the goal is to minimize the error e(n) between the desired signal d(n) and the permutation filter estimate. Under the MSE criterion, the optimization is straightforward, since the output of the LJg filter is linear with respect to the samples in Xj. Hence, it is simple to show that the optimal LJg filter is found as (Kim and Arce 1994)

Wj^opt = Rj^{-1} pj ,   (23)

where pj = E{d(n) Xj} and Rj is the N·P_N^j × N·P_N^j moment matrix

Rj = E{Xj Xj^T}   (24)

   = [ R11^j  R12^j  ...  R1N^j
       ...
       RN1^j  RN2^j  ...  RNN^j ] ,   (25)

in which

Ruv^j = E{xu xv Pu^j (Pv^j)^T} .   (26)

From (26), it can be seen that each diagonal sub-matrix Ruu^j of Rj constitutes a diagonal matrix whose off-diagonal elements are all zeros, since Pu^j has a single non-zero element. Also, each off-diagonal sub-matrix Ruv^j for u ≠ v forms a matrix whose diagonal elements are all zeros, since two distinct samples cannot occupy the same rank permutation slot.
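As a concrete sketch of the Wiener design (23) for the simplest case j = 1 (the Lg filter), R_1 and p_1 can be estimated by sample averages over training data and the normal equations solved directly. Everything below (the synthetic signals, seed, noise levels, and window size) is illustrative, not taken from the chapter; an SVD-based least-squares solve is used so a rank-deficient sample estimate does not break the design, echoing the remark about singular correlation matrices:

```python
import numpy as np

N = 3  # window size; for j = 1, X_1 is the N^2-long xLe vector of Eq. (3)

def X1(window):
    """Build X_1 (equivalently xLe) for one observation window."""
    w = np.asarray(window, dtype=float)
    ranks = np.empty(N, dtype=int)
    ranks[np.argsort(w, kind='stable')] = np.arange(N)
    X = np.zeros(N * N)
    X[np.arange(N) * N + ranks] = w
    return X

# Illustrative training data: a sinusoid in contaminated Gaussian noise.
rng = np.random.default_rng(0)
d = np.sin(2 * np.pi * 0.02 * np.arange(2000))
outliers = (rng.random(2000) < 0.1) * rng.normal(0.0, 5.0, 2000)
x = d + rng.normal(0.0, 0.5, 2000) + outliers

R = np.zeros((N * N, N * N))  # sample estimate of R_1 = E{X_1 X_1^T}
p = np.zeros(N * N)           # sample estimate of p_1 = E{d(n) X_1}
for n in range(len(x) - N + 1):
    Xn = X1(x[n:n + N])
    R += np.outer(Xn, Xn)
    p += d[n + N // 2] * Xn
W_opt = np.linalg.lstsq(R, p, rcond=None)[0]  # Eq. (23)
```

The resulting N²-long weight vector assigns a separate tap to each (time slot, rank) pair, which is exactly the structure of W in Eq. (10).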
The solution of the optimal filter in (23) is unique only when the correlation matrix in (25) is nonsingular. Certainly, if the correlation matrix is singular, a solution could be found by use of the singular value decomposition (Haykin 1991). Although in most applications encountered in practice, where broadband noise is present, the autocorrelation matrices will be nonsingular, it is useful to identify conditions under which the autocorrelation matrix in (25) has full rank or is rank deficient. In the following we address this issue. Consider the permutation xe ↔ xL, and let gj(i) denote the event that the set of samples xi, ..., xi⊕(j-1) permutes to the order statistics x_(ri), ..., x_(ri⊕(j-1)). If we enumerate all P_N^j such possible events by gj^m(i), for m = 1, 2, ..., P_N^j, then a necessary condition for Rj to be nonsingular is that

Pr{gj^m(i)} ≠ 0 ,   (27)

for all m and for all i = 1, 2, ..., N. Equivalently, a sufficient condition for Rj to be singular is that

Pr{gj^m(i)} = 0 ,   (28)

for some m and i. The above follows from the fact that in order for Rj to be nonsingular, Xj cannot contain any null components. Consider the class of LJg filters whose estimates are based on the vector Xj ∈ S_Xj ⊂ R^{NM}, where S_Xj is the sample space of Xj, M = P_N^j, and R denotes the set of real numbers. We first invoke the standard theorem on the rank of a distribution in R^N.

THEOREM (Cramer 1946). A distribution in R^N is nonsingular when and only when there is no hyperplane in R^N that contains the total mass of the distribution.

From the above theorem it follows that the structure of the sample space is sufficient to analyze the singularity conditions of a distribution. In particular, singularity will occur if the N-dimensional space degenerates into a linear space of lower dimension. In reference to the LJg estimator, the Xj vector is formed as

x ∈ S_x ⊂ R^N  ↔  Xj ∈ S_Xj ⊂ R^{NM} .   (29)

It follows that if one of the components of Xj is always null, the sample space S_Xj degenerates into a set of dimension less than NM, leading to the singularity of E{Xj Xj^T}. This fact leads directly to the following conditions: (A) a necessary condition for E{Xj Xj^T} to be nonsingular is that Pr[(xi, xi⊕1, ..., xi⊕(j-1)) ↔ (x_(ri), x_(ri⊕1), ..., x_(ri⊕(j-1)))] ≠ 0, for all i = 1, 2, ..., N and for all possible orderings of xi, xi⊕1, ..., xi⊕(j-1); (B) a sufficient condition for E{Xj Xj^T} to be singular is that Pr[(xi, xi⊕1, ..., xi⊕(j-1)) ↔ (x_(ri), x_(ri⊕1), ..., x_(ri⊕(j-1)))] = 0, for some i = 1, 2, ..., N and for some of the P_N^j possible orderings of xi, xi⊕1, ..., xi⊕(j-1). Notice that in order to obtain necessary and sufficient conditions for the nonsingularity of Rj, we must break the sample space S_x into all the subsets
defined by the permutation (xi, ..., xi⊕(j-1)) ↔ (x_(ri), ..., x_(ri⊕(j-1))). From Cramer's theorem, the matrix Rj will be nonsingular if the aforementioned conditions are met, and if the mass of the distribution in all these subsets does not belong to a linear subset of dimension less than N. Clearly, as j is increased, the nonsingularity conditions on the matrix Rj become more restrictive; thus, in some cases it may be possible that the solution in (23) exists for some given values of j but not for higher values. Before we conclude this subsection on optimality, it should be noted that the optimal LJg filter Wj^opt specifies all other optimal filters Wp^opt for 0 ≤ p ≤ j. Letting Tj,p denote the matrix which maps Xj into Xp, i.e., Xp = Tj,p Xj, we have

Rp = Tj,p Rj Tj,p^T   (30)

and

pp = Tj,p pj .   (31)

Letting Wp^opt and Wj^opt be the optimal weight vectors of the LPg and LJg filters, respectively, then by use of (30) and (31) the relation between these two optimal filters becomes

Wp^opt = (Tj,p Rj Tj,p^T)^{-1} Tj,p Rj Wj^opt .   (32)

Thus, the optimal Wiener solution of the LPg filter can be obtained from Wj^opt whenever 0 ≤ p ≤ j.
Least mean square (LMS) optimization

The complexity of permutation filters increases rapidly with j; thus, the computational cost of inverting Rj becomes very expensive. An alternative method for the filter design can be developed by means of Widrow's LMS adaptive algorithm (Haykin 1991). Here, the filter weights are iteratively updated as

Wj(n + 1) = Wj(n) + 2μ e(n) Xj(n) ,   (33)

where e(n) = d(n) - d̂j(n) = d(n) - Wj^T(n) Xj(n) is the estimation error at time instant n, and μ is the step size of the LJg-LMS algorithm, which controls its rate of convergence. In the update (33), the simplicity of the LMS algorithm may be diluted by the rapid increase of the dimension of the vector update equation as the order of the permutation lattice filter increases. Notice, however, that at any time instant n there are only N non-zero elements in Xj(n), so that the number of updates required in (33) is N per observation. Letting Ii be the location of the sample xi in the P_N^j-long vector Pi^j, which is the ith subvector of
Xj(n), the LJg-LMS algorithm in (33) can be implemented as summarized in Table 1. In the algorithm, wi^j(Ii, n) represents the Ii-th element of the ith sub-weight vector wi^j. Note that the computation of Ii is essential and can be readily carried out. Recall that Pi^j = Ri ⊗ Ri^1 ⊗ ... ⊗ Ri^(j-1). Let a^m denote the location of the non-zero element in Ri^m, for m = 0, 1, ..., j - 1, with Ri^0 = Ri. Note that the a^m represent the relative ranks of the first j samples of xe starting from xi. Since a^m ∈ {1, 2, ..., N - m}, Ii can be computed by the following recursion:

    Ii = 0;
    for m = 0 to m = j - 1
        Ii = (N - m) Ii + (a^m - 1)
    end
    Ii = Ii + 1;
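The recursion above is a mixed-radix index computation: a^0 has radix N, a^1 has radix N - 1, and so on. A direct transcription (the function name is ours):

```python
def sample_location(a, N):
    """Location I_i (1-based) of the single non-zero element of P_i^j,
    given the relative ranks a = [a^0, ..., a^(j-1)], a^m in {1, ..., N-m}."""
    Ii = 0
    for m, am in enumerate(a):
        Ii = (N - m) * Ii + (am - 1)  # mixed-radix accumulation
    return Ii + 1

# For xe = [6, 3, 10, 1] and j = 2: r1 = 3, and x2 ranks 2nd among
# (x2, x3, x4), so P_1^2 has its 1 at position 8, cf. Eq. (17).
print(sample_location([3, 2], 4))  # → 8
```

Only j multiplications are needed per sample, which is what keeps the sparse LMS update cheap even for high-order permutation filters.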
The number of multiplications required in computing Ii is j; thus, the computational advantage of the LMS algorithm still holds even for higher-order permutation filters. There are several factors which determine the convergence behavior of the LMS adaptation in (33). It is well known, for instance, that the step-size parameter μ has a direct effect on the stability and convergence of the algorithm. In addition, the parameter j of the LJg permutation filter plays an important role in the LMS adaptation. In order to make the analysis of the algorithm mathematically tractable, we follow the fundamental assumptions which make up the so-called independence theory (Haykin 1991). Although these conditions are not strictly valid in general, for either the standard LMS or the LJg-LMS algorithm, they make the analysis of the adaptive algorithms tractable and, more importantly, provide the designer with reliable design guidelines. For convergence in the mean, it is simple to show that the LJg-LMS algorithm will converge provided that the step size in (33) satisfies
Table 1
The LMS adaptation algorithm for the LJg-permutation filters.

Step 1:  Obtain xe(n), xL(n) and d(n).
Step 2:  For i = 1, 2, ..., N
             find ri(n)
             compute Ii(n)
Step 3:  Estimate the output and compute the error:
             e(n) = d(n) - d̂j(n)
Step 4:  Update the weight vector:
             for i = 1 to N
                 wi^j(Ii, n + 1) = wi^j(Ii, n) + 2μ e(n) xi(n)
Step 5:  Go to Step 1
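Table 1's sparse update is easy to sketch for j = 1 (the Lg filter), where Ii is just the rank ri. In the following illustrative Python (the function name and the running-mean identification experiment are ours, not from the chapter), only the N active weights are touched per sample, as Step 4 prescribes:

```python
import numpy as np

def lms_lg(x, d, N=3, mu=0.01):
    """Sparse LMS adaptation of an Lg (j = 1) filter per Table 1."""
    x = np.asarray(x, dtype=float)
    W = np.zeros((N, N))                # W[i, r] = weight of sample i at rank r
    for n in range(len(x) - N + 1):
        win = x[n:n + N]
        ranks = np.empty(N, dtype=int)
        ranks[np.argsort(win, kind='stable')] = np.arange(N)
        dhat = sum(W[i, ranks[i]] * win[i] for i in range(N))  # Step 3
        e = d[n + N // 2] - dhat
        for i in range(N):              # Step 4: only N weights updated
            W[i, ranks[i]] += 2 * mu * e * win[i]
    return W

# Identify a running-mean system: every adapted weight should approach 1/N.
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
d = np.convolve(x, np.ones(3) / 3, mode='same')
W = lms_lg(x, d, N=3, mu=0.01)
print(np.round(W, 2))
```

Because the target system here is linear, the adapted weight matrix is flat across ranks; for a nonlinear or non-Gaussian target the columns would differentiate, which is precisely the extra freedom the Lg structure buys.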
0 < μ < 1/λmax ,   (34)

where λmax is the maximum eigenvalue of the correlation matrix Rj. Moreover, if in addition to the set of fundamental assumptions we also assume that the variable Ii is independent of xi, then it can be shown that the condition on the step size in (34) becomes

0 < μ < P_N^j / λmax^x ,   (35)

where λmax^x is the maximum eigenvalue of Rx = E{xe(n) xe(n)^T}. This shows that the upper bound on the step size increases with the order of the permutation filter. The condition in (35) must also satisfy the condition for convergence in the mean square given by (Haykin 1991)

0 < μ < 2/Trace[Rj] = 2/Trace[Rx] .   (36)

Since the eigenvalue spread of Rj will in general grow as the parameter j is increased, for a fixed value of μ satisfying (34) and (35), the speed of convergence of the LJg-LMS filter decreases as we increase j (Pitas and Venetsanopoulos 1991; Pitas and Vougioukas 1991). On the other hand, if we increase the step size in proportion to P_N^j, as the upper bound in (35) implies, the speed of convergence of the LJg-LMS algorithm can be significantly improved. A different method to speed up the learning characteristics of LJg-LMS filters was recently proposed in (Arce and Tian 1996), where the OS filter weights are adapted in the transform domain.
5. Filter lattice structures
For an observation vector of size N, the LJg-filtering framework defines a wide range of filters, each utilizing a different amount of rank and temporal ordering information. As can be expected, these filters are coupled to each other. For a fixed observation vector of size N, there is a well-defined structure on the set of all permutation filters P_Lg = {L0g, L1g, ..., L(N-1)g}. In particular, it has been shown that the set P_Lg constitutes a complete lattice (Kim and Arce 1994). That is, the set P_Lg is a well-defined poset which has a least upper bound and a greatest lower bound for any two of its elements. For this reason, we will refer to the set P_Lg as a permutation filter lattice. Before we describe the properties of the LJg filter lattice, we introduce some basic notation. A poset (partially ordered set) is a set equipped with an ordering relation, denoted ≤, obeying the well-known conditions for elements x, y, z of the set:
(I)   x ≤ x                           (reflexivity)
(II)  x ≤ y and y ≤ x imply x = y    (antisymmetry)
(III) x ≤ y and y ≤ z imply x ≤ z    (transitivity).
The order relation is a binary relation that may represent an inequality between two numbers, set inclusion, or an information-inclusion relation. For instance, we can say 0 ≤ 1, or {1, 2, 3} ⊆ {0, 1, 2, 3, 5}. The definition of a lattice requires the concepts of an upper bound and a lower bound of a poset. Let P be a poset and S ⊆ P. An element x ∈ P is an upper bound of S if s ≤ x for all s ∈ S. The set of all upper bounds of S is denoted by S^u. Thus, S^u = {x ∈ P | s ≤ x, ∀s ∈ S}. A lower bound is defined dually, and S^l denotes the set of all lower bounds of S. Furthermore, if S^u has a least element, then it is called the least upper bound, or supremum, of S, and is denoted by sup S. Dually, we can define the greatest lower bound, or infimum, of S, which is denoted by inf S. Notationally, x ∨ y and x ∧ y denote sup{x, y} and inf{x, y}, respectively. A nonempty poset P is called a lattice if x ∨ y and x ∧ y exist in P for all x, y ∈ P. Moreover, if sup S and inf S exist in P for all S ⊆ P, then P is called a complete lattice. Lastly, a covering relation is defined as follows. Let P be an ordered set and x, y ∈ P. We say x covers y (or y ≺ x) if y < x and y ≤ z < x implies z = y. The latter condition demands that there be no element z of P with y < z < x. Similarly, y ≻ x will be used to represent "y covers x". One of the most useful features of a poset or a lattice is that, in the finite case, it can be drawn to visualize the entire structure of the set. In a diagram of a poset, the elements are depicted by circles, and two circles are connected by a straight line if one element covers the other. If x covers y, the circle representing x is placed above the circle representing y, and the two circles are connected by a line segment. Based on these definitions, the lattice structure of the set of permutation filters can be constructed. Starting with the LJg filters, it is easy to see that the constraints on the weights applied to each sample xi are relaxed as j increases.
It follows that the weights of an L^jℓ filter can be selected such that it is made identical to an L^{j-1}ℓ filter. We thus say that an L^jℓ filter covers an L^{j-1}ℓ filter. Using this argument recursively, we can write

L^{N-1}ℓ ≻ L^{N-2}ℓ ≻ ⋯ ≻ L^0ℓ .  (37)
Thus, the class of L^jℓ filters constitutes an ordered set with a greatest lower bound (L^0ℓ) and a least upper bound (L^{N-1}ℓ). Consequently, L^jℓ filters comprise a complete lattice (Donnellan 1968), and as such, can be depicted in a lattice diagram. Figure 5 depicts an L^jℓ filter lattice for N = 7, where the infimum (linear filter) and the supremum (L^6ℓ) are clearly shown. The lattice structure of L^jℓ filters illustrates the modularity of this filter class. While a filter in the lower region of the lattice may suffice for simple applications, a filter in the higher regions of the lattice is more desirable for more difficult estimation problems.
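The lattice operations just defined (upper bounds, suprema, and the covering relation) are straightforward to compute for a finite poset. The following is a minimal sketch, assuming the poset is given as a list of elements together with an order predicate `leq`; the subset lattice of {1, 2} serves as the example. The function names (`upper_bounds`, `supremum`, `covers`) are illustrative, not from the chapter.

```python
from itertools import combinations

def upper_bounds(P, leq, S):
    """S^u = {x in P : s <= x for all s in S}."""
    return {x for x in P if all(leq(s, x) for s in S)}

def supremum(P, leq, S):
    """Least element of S^u, if it exists (sup S, the join)."""
    Su = upper_bounds(P, leq, S)
    least = [x for x in Su if all(leq(x, y) for y in Su)]
    return least[0] if least else None

def covers(P, leq, x, y):
    """True if x covers y: y < x and no z in P lies strictly between them."""
    if not (leq(y, x) and x != y):
        return False
    return not any(leq(y, z) and leq(z, x) and z not in (x, y) for z in P)

# Example poset: subsets of {1, 2} ordered by inclusion (a complete lattice).
P = [frozenset(s) for r in range(3) for s in combinations({1, 2}, r)]
leq = lambda a, b: a <= b  # set inclusion
print(supremum(P, leq, [frozenset({1}), frozenset({2})]))  # frozenset({1, 2})
print(covers(P, leq, frozenset({1}), frozenset()))         # True
```

The same predicate-based interface applies to the L^jℓ filter classes: ordering by constraint relaxation yields the chain of Eq. (37).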
G. R. Arce, Y. T. Kim and K. E. Barner
542
Fig. 5. The L^jℓ permutation filter lattice for N = 7. The least upper bound is L^6ℓ and the greatest lower bound is L^0ℓ (i.e., the FIR filter).
6. Piecewise linear structure of L^jℓ filters
A fundamental property behind the structure of non-linear filters is their ability to model non-linear systems. While it is clear that polynomial filters approximate non-linearities by a truncated Volterra series expansion, the non-linear modeling structure of permutation filter lattices is not immediately seen. Here we describe the modeling structure behind permutation filter lattices, where a non-linear surface is approximated by a set of piecewise linear hyperplanes. The domain of the piecewise linear structure is determined by the ordering characteristics of the underlying random processes. For a given window size, how fine the approximation is depends on the sub-lattice chosen. As we progressively truncate the L^jℓ lattice to lower orders, the non-linear modeling becomes increasingly coarse.

Consider a finite-memory, zero-mean, non-linear discrete-time system f(x_ℓ), where x_ℓ = [x_1, x_2, ..., x_N]^T. The best fit produced by a linear filter is that of a hyperplane f(x_ℓ) = c^T x_ℓ, where c is chosen to minimize the error fit. Clearly, the linear fit is only satisfactory if the non-linear surface is nearly flat. Otherwise, the linear fit will yield large errors and non-linear methods should be employed. Volterra non-linear filters constitute one such technique, where the surface is approximated by a polynomial surface

f(x_ℓ) = Σ_{i_1=0}^{N-1} c_{i_1} x_{N-i_1} + Σ_{i_1=0}^{N-1} Σ_{i_2=0}^{N-1} c_{i_1 i_2} x_{N-i_1} x_{N-i_2} + ⋯ + Σ_{i_1=0}^{N-1} ⋯ Σ_{i_p=0}^{N-1} c_{i_1 ... i_p} x_{N-i_1} ⋯ x_{N-i_p} ,  (38)
specified by the parameters c_{i_1}, c_{i_1 i_2}, ..., c_{i_1 ... i_p}. The representation of non-linear surfaces by Volterra kernels has been studied for the past several decades. Each
Order-Statistic filtering and smoothing of time-series: Part I
543
Volterra kernel characterizes a particular order of non-linearity, making these filters very intuitive (Pitas and Venetsanopoulos 1989). Permutation lattice filters have a similar modular non-linear modeling capability. In this case, non-linear surfaces are fitted through piecewise linear hyperplanes. The boundaries of the partitions depend on the relative ordering of the stochastic observation vector of the system. To illustrate this concept, consider a system with a memory length of 3. In this case, there are six possible permutations of x_1, x_2, x_3:

(I) x_1 < x_2 < x_3;    (II) x_1 < x_3 < x_2;
(III) x_2 < x_1 < x_3;  (IV) x_2 < x_3 < x_1;
(V) x_3 < x_1 < x_2;    (VI) x_3 < x_2 < x_1;

where each possible ordering of the three samples has been labeled, and where equalities between the samples cannot occur since we assume stable sorting is used when ordering the variables x_1, x_2, and x_3. Consider the least upper bound of the filter lattice, the L^2ℓ filter, which has a unique weight vector for each possible permutation; thus, within the boundaries of each partition the non-linear surface is independently modeled by a hyperplane. In general, the L^{N-1}ℓ filter approximates the non-linear surface through N! independent hyperplanes. The parameters specifying these are found through the normal equations of the filter optimization. It should be emphasized that the boundaries are determined by the relative order between the variables x_i. This is illustrated in Fig. 6, where the six hyperplane partitions are shown on the x_1 and x_2 axes with the third variable x_3 set to a constant. As x_3 is moved, so do the partition boundaries in the (x_1, x_2) plane, as depicted in Fig. 6.
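The piecewise linear operation described above is easy to state in code: the observation window is stably sorted, the resulting permutation selects one of the N! weight vectors, and the output is the corresponding linear combination of the samples. The sketch below uses random hypothetical weights purely for illustration; in practice the weights would come from the normal equations of the filter optimization.

```python
from itertools import permutations
import numpy as np

def permutation_filter(x, weights):
    """Piecewise linear (full permutation) filter: the stable-sort ordering
    of the window selects the hyperplane applied to the samples."""
    pi = tuple(np.argsort(x, kind="stable"))  # ties broken by time index
    return float(np.dot(weights[pi], x))

# One hypothetical weight vector per permutation of N = 3 samples.
rng = np.random.default_rng(0)
weights = {pi: rng.normal(size=3) for pi in permutations(range(3))}

x = np.array([0.2, 1.5, -0.7])  # ordering x3 < x1 < x2, i.e., region (V)
y = permutation_filter(x, weights)
```

Within each of the six regions the filter is linear, so the surface fitted over the whole domain is piecewise linear, with boundaries set by the sample ordering exactly as in Fig. 6.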
Fig. 6. Boundaries for the partition of the piecewise linear approximation of the non-linear surface f(x). The value of the variable x_3 is fixed. Partition I: x_1 < x_2 < x_3; Partition II: x_1 < x_3 < x_2; Partition III: x_2 < x_1 < x_3; Partition IV: x_2 < x_3 < x_1; Partition V: x_3 < x_1 < x_2; Partition VI: x_3 < x_2 < x_1.
The hyperplanes used by the L^{N-1}ℓ filter are independently chosen to best fit the non-linear surface. If this lattice is truncated at a lower level, then the number of partitions remains the same, but the hyperplanes are no longer independently determined. That is, the coefficients for a variable x_i must be equal for a particular set of partitions. As the lattice is reduced, the constraints on the weights of x_i are tightened and the non-linear surface fit becomes increasingly coarse.
7. Applications

In order to visualize the L^jℓ filter lattice performance described above, we present several applications where order-statistic based filters can be successfully used in time-series analysis and processing. In particular, we consider applications in digital communications and image processing.
7.1. Narrowband interference suppression in DSMA systems

Wideband communications find applications in military, cellular, and indoor radio communications, where a superior ability to operate against several forms of interference is achieved. In these multiple-access systems, where multiple users share a common broadcast communication channel, the received waveform can be modeled by the superposition of K modulated signals observed in additive noise: r(t) = S_t(b) + I_t + N_t, where S_t(b), I_t, and N_t are the desired signal, narrowband interference, and wideband channel noise, respectively. The desired composite signal is given by S_t(b) = Σ_{k=1}^{K} A_k Σ_{i=-M}^{M} b_k(i) s_k(t - iT), where K, A_k, 2M + 1, and T are the number of active users, the received amplitude of the kth user, the frame length, and the symbol interval, respectively. The waveform and the binary information bit of the kth user are s_k(t) and b_k(i), respectively. The demodulation of this multiple-access communication system can be performed by a network of weighted chip matched filters followed by a decision algorithm. It is well known that significant performance degradation occurs if narrowband interference is present, and thus the suppression of narrowband interference enhances the communications performance significantly. Due to the non-Gaussian nature of the multiple-access signal, non-linear suppression methods perform better than their linear counterparts. In (Vijayan and Poor 1990), a non-linear transversal adaptive filter was used to suppress the interference, where a single user signal was assumed and where the channel noise was assumed to be Gaussian. The signal was first basebanded and sampled, so that the received signal is given by r(m) = d(m) + i(m) + n(m), where m denotes the discrete time index and d(m) is the desired binary signal. The interference and channel noise can be modeled by an ARMA(p, q) model. We consider a permutation filter lattice framework for the narrowband interference rejection.
The LMS adaptive approach discussed in Section 4 was used to find the various permutation filters optimized for the suppression of the interference. The interference and noise were modeled by an ARMA(4, 2) process and the desired
signal was given by a pseudorandom, or pseudonoise (PN), binary sequence of values +10 and -10. A data set of 10,000 samples was used to train the filters, which were then tested on a different realization of the underlying random processes. Fig. 7 depicts a segment of the desired signal and the narrowband interference ARMA(4, 2) process. In this filtering scheme, the PN sequence is treated as noise, and the interference sequence is regarded as the desired signal. The binary sequence is recovered by subtracting the estimate of the interference sequence from the received signal. The input signal-to-noise ratio (SNR) is defined as the PN signal to interference ratio. Table 2 summarizes the MSE obtained after the narrowband suppression by the various L^jℓ permutation filters as we vary the order of the filter in the lattice, always keeping the size of the window constant. From these results we find that significant gains are obtained for j > 3. The error signals of some of the L^jℓ permutation filters are drawn in Fig. 8 to visually demonstrate their performance, where the error signal is the difference between the interference signal and its estimate. For comparison, the results of the techniques introduced in (Iltis and Milstein 1985) using linear prediction, and in (Vijayan and Poor 1990) using non-linear prediction, are also shown in Fig. 8. The performance improvement of the non-linear technique in (Vijayan and Poor 1990) over the linear prediction technique in the rejection of interference signals was small. The superiority of the permutation filter approach over the previous solutions is readily seen.
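The signal model in this experiment is straightforward to simulate. The sketch below generates a ±10 PN sequence and a narrowband ARMA(4, 2) interference; the ARMA coefficients here are hypothetical stand-ins (the AR part is built from two stable AR(2) resonances), since the chapter does not list the actual model parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 2000

# Desired wideband signal: pseudonoise (PN) binary sequence of +/-10.
d = 10.0 * rng.choice([-1.0, 1.0], size=M)

# Narrowband ARMA(4, 2) interference; the AR polynomial is the product of
# two stable AR(2) factors, so the recursion cannot blow up (hypothetical values).
ar = [1.7, -1.3, 0.33, -0.09]
ma = [1.0, 0.5, 0.2]
e = rng.normal(size=M)
i = np.zeros(M)
for m in range(M):
    i[m] = sum(a * i[m - 1 - k] for k, a in enumerate(ar) if m - 1 - k >= 0)
    i[m] += sum(b * e[m - k] for k, b in enumerate(ma) if m - k >= 0)

# Received signal; a permutation filter would estimate i, and the PN
# sequence is then recovered as d_hat = r - i_hat.
r = d + i
```

Because the interference is highly correlated while the PN sequence is white, a predictor trained on r tracks i and largely ignores d, which is the basis of the rejection scheme above.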
Fig. 7. Segment of the wideband spread-spectrum signal and the narrowband interference ARMA(4,2) process. The desired signal is shown as (...), and the interference as (--).
Table 2
Narrowband interference rejection MSE obtained for the L^jℓ filter lattice for an observation vector of size N = 7

Order j    MSE of L^jℓ filter
0          23.71
1          20.16
2          18.56
3          15.70
4          6.40
5          3.21
6          2.38
Fig. 9 shows the SNR gains attained by some of the L^jℓ filters for different input SNRs. Note that the SNR gains of the filters are close at low SNR, since the received signal approximately becomes the ARMA signal. As the SNR increases, and thus as the non-linearity of the signal increases, the difference between the SNR gains of the high-order and low-order permutation filters becomes more apparent. Note that the L^6ℓ and L^4ℓ filters gain more than 10 dB even at an SNR of 20 dB, whereas the linear filter becomes useless.
7.2. Inverse halftoning
Digital halftoning is the method of rendering the illusion of continuous-tone (contone) pictures on displays capable of producing only binary picture elements.
Fig. 8. Segments of the error signals between the interference signal and the estimates of the linear prediction, non-linear prediction, and L^2ℓ, L^4ℓ and L^6ℓ filter predictions, for an observation vector of size N = 7.
Fig. 9. The SNR improvements of the linear, L^2ℓ, L^4ℓ and L^6ℓ filters, for various input SNRs, for an observation vector of size N = 7.
"Inverse halftoning" is needed whenever halftones must be enlarged, reduced, or re-sampled, and only the halftoned images are available. Although such format conversions could be performed in the binary sample domain of the halftones, it is more effective to first reconstruct the multilevel images from their halftones, reformat the greyscale images to the desired specifications, and, if desired, halftone the reformatted images again. In addition to format conversion, image reconstruction from halftones can prove useful whenever there is a need to view the images on multilevel displays. A simple method to reconstruct gray-scale images from their halftones is to low-pass filter the binary images. The reconstructions obtained using low-pass filters give good tone renditions, but are unacceptable due to blurred edges and details. We can apply permutation L^jℓ filters to perform inverse halftoning, provided we extend the underlying concepts to account for ranking in a binary multiset. In multiset permutations, the number of weights for a given j is significantly smaller than that of regular permutation filters. A more in-depth description of L^jℓ filters for inverse halftoning can be found in (Kim et al. 1995). Fig. 10(a) shows the halftone "Bridge" image obtained by the method of error diffusion (Ulichney 1987). The reconstructed images using a mean filter (MSE = 835) and the optimal linear filter (MSE = 293) are shown in Figs. 10(b) and 11(a), respectively. All filters use a window of size 7 × 7. The reconstructed image obtained from this halftone image using a binary permutation filter for j = 9 (MSE = 220) is shown in Fig. 11(b). It can be seen in these figures that permutation filters reconstruct image details more precisely than the mean filter or the linear filter.
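The low-pass baseline mentioned above can be sketched in a few lines: average the binary halftone over a 7 × 7 window and rescale to the 0-255 range. This is only the crude comparison method, not the binary permutation filter of (Kim et al. 1995); the window handling and test image are illustrative.

```python
import numpy as np

def mean_reconstruct(halftone, w=7):
    """Low-pass (mean filter) reconstruction of a gray-scale image from a
    binary halftone: average the 0/1 pixels in a w x w window."""
    pad = w // 2
    h = np.pad(halftone.astype(float), pad, mode="edge")
    H, W = halftone.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = h[i:i + w, j:j + w].mean()
    return 255.0 * out  # back to the 0-255 gray-scale range

# Toy halftone: a 50% checkerboard should reconstruct to roughly mid-gray.
cb = np.indices((16, 16)).sum(axis=0) % 2
gray = mean_reconstruct(cb)
```

As the MSE figures above indicate, plain averaging gives reasonable tone rendition but blurs edges; the permutation filter replaces the uniform average with order- and permutation-dependent weights.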
Fig. 10. (a) Error-diffused halftone, (b) reconstruction using a 7 × 7 mean filter.
7.3. Robust DPCM coding for noisy digital channels
Differential Pulse Code Modulation (DPCM) is a well known technique for the compression of correlated sources, and has been widely used in the compression of speech, images, and video. It is simple and remarkably effective. The encoder and decoder are shown in Fig. 12. In its basic form, using a fixed filter and a fixed quantizer, DPCM can compress 8 bits per pixel images to 3 or 4 bits per pixel with minimal distortion. More sophisticated variations with adaptive filtering and/or quantization will do better. DPCM is appropriate in applications where the hardware costs need to be kept low. Examples include telephone systems and personal wireless communication systems. It is also appropriate in applications such as digital television, where the large data rate requires state of the art high speed electronics, which may prohibit the use of complex compression schemes. Because of its differential nature, DPCM can suffer from acute sensitivity to channel bit errors that may occur during transmission. That is, a single bit error during transmission will lead to error propagation as the signal is being reconstructed in the receiver. The noise immunity of DPCM can be greatly improved by the introduction of a mechanism that catches errors inside the feedback loop
Fig. 10(b).
and prevents propagation (Khayrallah 1995). This is achieved by replacing the usual linear filter by a simple Lℓ filter. If the communication system includes error control coding (ECC), the noise immunity can be further improved by exploiting the reliability information produced by the ECC decoder. When the decoder has low confidence in its output, it enables a center weighted median filter (see Part II of this paper), which operates on the received prediction difference v̂. When it has high confidence, it leaves v̂ unchanged. This selective filtering is required because the prediction difference sequence has low correlation, and full-time filtering would alter it too much. Further details are given in the next example (Khayrallah 1995). Consider the 256 × 256, 8 bits per picture element (pixel), "Lenna" image. The usual predictor is a three-point linear filter

ŵ_{i,j} = α_1 û_{i-1,j-1} + α_2 û_{i-1,j} + α_3 û_{i,j-1} .  (39)
This scheme is referred to as ℓ-DPCM. The coefficients α_n are chosen such that the mean squared error (MSE) in the absence of the quantizer is minimized. The quantizer is a k-bit scalar quantizer; here k = 4. Again, the usual step of
Fig. 11. The reconstructed images from the error diffusion halftone using (a) a binary permutation filter with j = 0 (linear filter), and (b) a binary permutation filter with j = 9.
choosing a nonuniform quantizer optimized for the Laplacian probability distribution is taken. Khayrallah (1995) proposes to use Lℓ-DPCM, where the Lℓ filter is given by

ŵ_{i,j} = α̂_1 û_{i-2,j} + α̂_2 û_{i-1,j-1} + α̂_3 û_{i-1,j} + α̂_4 û_{i,j-2} + α̂_5 û_{i,j-1}  (40)
and the coefficients α̂_i depend on the ranking of the five elements in the window. The Lℓ filter structure is such that the only ranking information used consists of the locations of the largest and smallest elements. The largest and smallest elements are discarded by making their coefficients equal to zero. For the remaining three elements, the prediction coefficients are chosen as the best MSE filter in the absence of a quantizer. Since the Lℓ filter is a cross between a linear filter and a median filter, it is also useful to implement an M-DPCM, where the filter is a simple median

ŵ_{i,j} = med(û_{i-1,j-1}, û_{i-1,j}, û_{i,j-1}) .  (41)
The ℓ-DPCM and M-DPCM are considered to be baseline schemes, and it is useful to compare the Lℓ-DPCM method to them. The ECC is a (7, 4; 3)
Fig. 11(b).
Hamming code. Each quantizer output v̂ is converted into a 4-bit block x. The ECC encoder accepts x and produces a 7-bit codeword y. The channel is a binary symmetric channel with raw error rate ε = 0.0316. The ECC decoder receives a 7-bit block z and maps it to the closest codeword y′ in Hamming distance. Then y′ is converted into x′, its corresponding 4-bit input, which in turn is converted to v̂′. For this particular code, the distance d(z, y′) can only be 0 or 1. If the distance is 0, the decoder has high confidence and leaves v̂′ unchanged. If the distance is 1, the decoder has less confidence and applies a center weighted median filter to v̂′ to produce v̂″
v̂″_{i,j} = med(v̂′_{i,j} ◇ 7, v̂′_{i-1,j-1}, v̂′_{i-1,j}, v̂′_{i-1,j+1}, v̂′_{i,j-1}, v̂′_{i,j+1}, v̂′_{i+1,j-1}, v̂′_{i+1,j}, v̂′_{i+1,j+1})  (42)
where ◇7 indicates duplication 7 times. The results are shown in Fig. 13. Clearly, the Lℓ-DPCM is superior to the other two methods. The Lℓ-DPCM system is able to correct or conceal almost all channel errors. In passing, it should be noted that a similar DPCM method, in which the prediction is formed using weighted median filters (described in Part II), was proposed in (Salo and Neuvo 1988).
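A minimal sketch of the center weighted median just described: the center sample is counted seven times in the 3 × 3 neighborhood median, so it passes through unchanged unless it disagrees sharply with its neighbors. The function name and test values are illustrative, not from the chapter.

```python
import numpy as np

def center_weighted_median(v, i, j, center_weight=7):
    """Center weighted median over the 3x3 neighborhood of v[i, j]: the
    center sample is counted center_weight times in the median multiset."""
    neigh = v[i - 1:i + 2, j - 1:j + 2].ravel().tolist()  # 9 samples, center once
    samples = neigh + [v[i, j]] * (center_weight - 1)     # center counted 7x total
    return float(np.median(samples))

v = np.array([[1.0, 2.0, 3.0],
              [4.0, 250.0, 6.0],   # impulsive error at the center
              [7.0, 8.0, 9.0]])
print(center_weighted_median(v, 1, 1))  # -> 9.0: the outlier is rejected
```

With a well-behaved center (e.g. 5 instead of 250), the same call returns the center value itself, which is why the filter is only enabled when the ECC decoder reports low confidence.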
Fig. 12. DPCM scheme: (a) encoder, and (b) decoder.
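The DPCM loop of Fig. 12 can be sketched in one dimension with a fixed one-tap predictor and a uniform quantizer; the chapter's image codecs use the 2-D predictors of Eqs. (39)-(40) and a nonuniform 4-bit quantizer, so the parameters below are simplified stand-ins.

```python
import numpy as np

def dpcm(x, step=4.0, a=0.95):
    """Fixed-predictor DPCM: encode prediction differences with a uniform
    quantizer; the decoder mirrors the same predictor loop, so encoder and
    decoder stay in sync in the absence of channel errors."""
    codes, recon = np.zeros_like(x), np.zeros_like(x)
    pred = 0.0
    for n in range(len(x)):
        e = x[n] - pred                    # prediction difference
        codes[n] = np.round(e / step)      # quantizer index (transmitted)
        recon[n] = pred + codes[n] * step  # decoder reconstruction
        pred = a * recon[n]                # predictor inside the loop
    return codes, recon

x = np.cumsum(np.ones(50)) + np.sin(np.arange(50))
codes, recon = dpcm(x)
print(np.max(np.abs(x - recon)))  # never exceeds half the quantizer step
```

Because the predictor operates on the reconstructed samples, the decoder can run the identical loop; a channel error in codes[n], however, perturbs every subsequent prediction, which is exactly the error propagation the Lℓ and median predictors are designed to suppress.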
Fig. 13. (a) Original 8-bit image, (b) ℓ-DPCM decoder (MSE = 104), (c) M-DPCM decoder (MSE = 129), (d) Lℓ-DPCM decoder (MSE = 71).
8. Conclusion

L^jℓ filters constitute a large class of order-statistic based estimators which combine the attributes of linear and L filters. The theory of L^jℓ filters is reviewed in the first part of this paper. The filter structure parallels that of traditional linear filters; however, the vector of input samples is first nonlinearly transformed into a vector of greater dimension, in which the temporal and ranking characteristics of the input vector are considered jointly. A general filtering framework was described where the degree of time-rank information extracted from the mapping x_ℓ ↔ x_L is chosen by the designer by specifying the value of the parameter j. We next described optimization methods which minimize the MSE criterion, assuming stationarity of the underlying signals. The well-known LMS adaptive optimization method was described in the framework of L^jℓ filters. To demonstrate the effectiveness of this class of estimators, several applications were reviewed, including narrowband interference cancellation in spread-spectrum systems, image reconstruction from halftones, and DPCM coding in noisy channels. Even though permutation L^jℓ filters are rather well understood, there are many open questions and research problems of considerable interest. One example is found in the issue of dimensionality, where the number of parameters increases rapidly with j. Methods that reduce the number of parameters while preserving the effectiveness of L^jℓ filters are needed. Research focusing on the joint use of higher-order statistics with order statistics (ranking) may also lead to useful results. Finally, work in identifying novel applications for L^jℓ filters in engineering and other disciplines is important.
References

Arce, G. R. and M. Tian (1996). Order-statistic filter banks. IEEE Transactions on Image Processing, Vol. 5.
Bednar, J. B. and T. L. Watt (1984). Alpha-trimmed means and their relationship to median filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-32.
Bovik, A. C., T. S. Huang and D. C. Munson, Jr. (1983). A generalization of median filtering using linear combinations of order statistics. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 31.
Cramer, H. (1946). Mathematical Methods of Statistics. New Jersey, Princeton University Press.
David, H. A. (1982). Order Statistics. New York, Wiley Interscience.
Donnellan, T. (1968). Lattice Theory. New York, Pergamon Press.
Ghandi, P. and S. A. Kassam (1991). Design and performance of combination filters. IEEE Transactions on Signal Processing, Vol. 39.
Haykin, S. (1991). Adaptive Filter Theory. New Jersey, Prentice Hall.
Huber, P. J. (1981). Robust Statistics. New York, John Wiley & Sons.
Iltis, R. A. and L. B. Milstein (1985). An approximate statistical analysis of the Widrow LMS algorithm with application to narrow-band interference rejection. IEEE Transactions on Communications, Vol. 33.
Khayrallah, A. (1995). Nonlinear filters in joint source channel coding of images. IEEE International Symposium on Information Theory, Sep. 17-22, 1995, Whistler, BC, Canada.
Kim, Y.-T. and G. R. Arce (1994). Permutation filter lattices: a general order-statistic filtering framework. IEEE Transactions on Signal Processing, Vol. 42.
Kim, Y.-T., G. R. Arce and N. Grabowski (1995). Inverse halftoning using binary permutation filters. IEEE Transactions on Image Processing, Vol. 4.
Palmieri, F. and C. G. Boncelet, Jr. (1994). Ll-filters: a new class of order statistic filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 42.
Pitas, I. and A. N. Venetsanopoulos (1989). Non-linear Filters. Kluwer.
Pitas, I. and A. Venetsanopoulos (1991). Adaptive filters based on order statistics. IEEE Transactions on Signal Processing, Vol. 39.
Pitas, I. and S. Vougioukas (1991). LMS order statistic filters adaptation by back propagation. Signal Processing, Vol. 25.
Priestley, M. B. (1988). Nonlinear and Nonstationary Time Series. London, Academic Press.
Salo, V. H. J. and Y. Neuvo (1988). Improving TV picture quality with median type operations. IEEE Transactions on Consumer Electronics, Vol. 34.
Tong, H. (1990). Nonlinear Time Series. New York, Oxford University Press.
Tukey, J. W. (1974). Nonlinear (nonsuperimposable) methods for smoothing data. In: Conf. Rec., EASCON.
Ulichney, R. (1987). Digital Halftoning. Cambridge, MIT Press.
Vijayan, R. and H. V. Poor (1990). Nonlinear techniques for interference suppression in spread-spectrum systems. IEEE Transactions on Communications, Vol. 38.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Order-Statistic Filtering and Smoothing of Time-Series: Part II
Kenneth E. Barner and Gonzalo R. Arce
1. Introduction
Data time-series occur naturally in numerous fields of study, including economics, engineering, medicine, and many social fields. These time-series must often be processed, or filtered, to extract some information of interest. Traditionally, this filtering has been linear. Certainly, linear filters have a sound theoretical basis and have been extensively studied. Unfortunately, linear filters suffer from poor performance in many applications. Among the signals that linear filters perform poorly on are those with changing levels and corrupting noise that is either heavy-tailed or signal dependent (Pitas and Venetsanopoulos 1989). This poor performance has led to the investigation of nonlinear filtering methodologies. The design of nonlinear filters can follow many approaches, since there is no single underlying theory of nonlinear filters. Thus, nonlinear filters range from simple ad hoc methods designed to tackle a single problem, to increasingly theoretically founded approaches that are more widely applicable. One nonlinear filtering approach that has received considerable attention, and for which much theoretical study has been conducted, is that based on rank-order. Indeed, much attention has been paid to rank-order filters since the running median filter was first applied to the smoothing of time-series by Tukey in 1974 (Pitas and Venetsanopoulos 1989; Arce et al. 1986; Ghandi and Kassam 1991; Hardie and Boncelet 1993; Wendt et al. 1986; Tukey 1974). The rank-ordering of samples allows the design of filter structures that (a) are robust in environments where the assumed statistics deviate from Gaussian models and are possibly contaminated with outliers, and (b) track signal discontinuities without introducing the transient or blurring artifacts that linear filters do. Filter classes that operate on rank-order information can be broadly broken down into two categories according to how the estimate is formed.
The two filter categories are weighted sum and selection type. The weighted sum type filters form estimates by weighting the input samples, often as a function of temporal- and rank-order, and then summing the weighted samples to obtain an estimate. Such filters were discussed in Part I of this chapter. The selection type filters take a
different approach, restricting the output to be one of the input samples. As in the weighted sum case, the input samples can be weighted to reflect importance, but the filter output must be one of the observation samples. Selection rank-order filters have advantages over their weighted sum counterparts in many applications. This is particularly true for signals with numerous edges, such as images or biomedical signals where the measured process can change states abruptly. Weighted sum based filters tend to blur the edges of such signals, even if their weights are a function of temporal- and rank-order. In images, accurate tracking of edges is vital due to the nonlinear nature of the human visual system. Selection type filters have considerable advantages in edge tracking as compared to weighted sum filters. To illustrate this and motivate the selection approach, consider the raster-scan order filtering of an image corrupted by impulsive type noise. A common approach to limiting the effect of impulsive outliers is through trimming. In a weighted sum approach, this leads to the α-trimmed mean. The output of this filter at instant n is

y(n) = [1/(N - 2(α - 1))] Σ_{i=α}^{N-α+1} x_{(i)} ,  (1)

where x_{(1)} ≤ x_{(2)} ≤ ⋯ ≤ x_{(N)} are the N observed samples in rank order. Thus, the α-trimmed mean averages over all but the α - 1 smallest and largest samples. If α = 1, the sample mean is realized, while for α = (N + 1)/2, the sample median is realized. A comparable trimming method that is selection type is the center weighted median (Ko and Lee 1991), which can be expressed as

y(n) = Med[x_{(α)}, x(n), x_{(N+1-α)}] .  (2)
For this filter, the output is identical to the input as long as x_{(α)} ≤ x(n) ≤ x_{(N+1-α)}. If x(n) is outside this range, the output is trimmed to either x_{(α)} or x_{(N+1-α)}. To compare the weighted sum and selection approaches, consider the single image scan line shown in Fig. 1. This figure shows the original scan line, the scan corrupted by impulsive noise, and the running trimming statistics x_{(α)} and x_{(N+1-α)}. As the figure shows, these statistics form a band between which the samples are either averaged (weighted sum approach) or the input is passed to the output (selection approach). Figure 2 shows the results of the two filtering operations. While both suppress outliers, the selection approach clearly performs better than the weighted sum approach, which excessively smoothes all edges. The advantages of the selection approach can more clearly be seen by examining the image in Fig. 3, whose upper left quarter is the original "aerial" image; the upper right quarter is the corresponding quarter of the image corrupted by noise; the bottom left quarter is the output of a weighted-sum type order-statistic filter; and the bottom right quarter is the output of a selection-type order-statistic filter. Both filters operate on a raster scan and have a width of seven. This example illustrates that the selection approach to filtering has clear advantages for certain signals. It is this general category of selection order-statistic
Fig. 1. A single scan line from the image "aerial." The original, corrupted, and running order statistics x_{(α)} and x_{(N+1-α)} are shown. The corruption is additive Laplacian noise with probability of occurrence 0.15 and σ = 75. Also, N = 13 and α = 4.
filters that we cover in this chapter. We begin by giving a brief review of the most well known and thoroughly studied selection order-statistic filter, the median filter. The median filter is also the starting point for many generalizations that have been developed. Therefore, a thorough understanding of the median filter is necessary to fully understand the principles behind the generalizations. The median filter, as will be shown, possesses many optimality properties. However, the filter offers little flexibility and is temporally blind. That is, all temporal information is lost in the filtering process. Permuting the time-ordered observations, for instance, does not alter the filter output. This lack of temporal information causes performance to suffer. As a result, numerous generalizations of the median filter have been introduced that incorporate some form of temporal information (Ghandi and Kassam 1991; Wendt et al. 1986; Brownrigg 1984; Coyle et al. 1989; Hardie and Barner 1994 and 1996). Temporal information can be incorporated into order-statistic filtering through weighting of the time-ordered samples prior to rank ordering. This leads to the class of Weighted Median filters and Weighted Order Statistic (WOS) filters (Yli-Harja et al. 1991). Through weighting, certain temporal samples can be emphasized while others are deemphasized. This weighting scheme incorporates temporal information and results in considerable performance gain over temporally blind (strictly rank-order) filters. Still, the temporal-order weighting followed by
Fig. 2. The output of the selection and weighted sum filters operating on the corrupted scan line in Fig. 1.
Fig. 3. The image "aerial" broken into four quadrants: upper left, original; upper right, noisy; lower left, weighted sum filter output; lower right, selection filter output.
Order-Statistic filtering and smoothing of time-series: Part II
559
rank ordering decouples the temporal from rank information during weighting. Due to this decoupling, these filters use only a fraction of the temporal and rank information contained in the two orderings. The full temporal and rank information is represented by the mapping that takes one ordering to the other, P : x → x_L, where x and x_L are vectors containing the temporally and rank ordered observation samples, respectively. The full permutation mapping information can be utilized by coupling the temporal- and rank-order during weighting. This results in the powerful class of Permutation Weighted Order-Statistic (PWOS) filters (Barner and Arce 1994; Arce et al. 1995). While the performance achieved by using the full permutation information can be impressive, the explosive growth in the parameter set limits the number of samples for which the full information can be used. To combat this problem, an ℓL^j lattice approach to coupling temporal- and rank-order information is used. In the lattice terminology, ℓ and L refer to temporal- and rank-order, respectively. The exponent j governs the amount of rank coupling used. Thus, the amount of temporal and rank coupling is easily controlled. This offers flexibility in performance as well as control over the parameter set. In addition to the lattice approach, we detail alternative methods for reducing the permutation information while retaining performance gains. To effectively utilize these classes of filters, the parameters must be set appropriately for the task at hand. To this end, we present two adaptive optimization techniques. Lastly, numerous examples are given illustrating the performance of the various filters.
2. The median filter
The running median filter was the genesis for the broad array of rank order based filtering techniques that exist today, and that continue to be developed. The running median filter was first suggested as a nonlinear smoother for time series data by Tukey in 1974. Since median filters are the foundation upon which current rank order based filtering techniques are based, a thorough understanding of the median filter and its properties is crucial to the development and understanding of current techniques. As such, a brief review of the median filter is given in this section. The review includes formal definitions and a survey of statistical and deterministic properties developed to characterize the median filter performance. Also included is a review of threshold decomposition, which was instrumental in developing many of the median filter properties. This review serves as a starting point for the median filter generalizations developed in the following sections.
2.1. The running median filter

To define the running median filter, let {x} be a discrete time sequence. The running median passes a window over the sequence {x} that selects, at each instant n, an odd number of samples to comprise the observation vector x(n). The observation window is typically symmetric and centered at n, resulting in
x(n) = [x(n - N_1), ..., x(n), ..., x(n + N_1)]^T , (3)
where N1 may range in value over the nonnegative integers and N = 2N1 + 1 is the (odd valued) window size. While processing such non-causal observation vectors has traditionally been referred to as smoothing, we loosen the terminology somewhat and refer to the processing of both causal and non-causal observations as simply filtering. The median filter operating on the input sequence {x} produces the output sequence {y}, where at time index n
y(n) = Med[x(n)]
(4)
= Median value of [x(n - N_1), ..., x(n), ..., x(n + N_1)] .
(5)
That is, the samples in the observation window are sorted and the middle, or median, value is taken as the output. The input sequence {x} may be either finite or infinite in extent. For the finite case, the samples of {x} can be indexed as x(1), x(2), ..., x(L), where L is the length of the sequence. Due to the symmetric nature of the observation window, the window extends beyond a finite extent input sequence at both the beginning and end. These end effects are generally accounted for by appending N_1 samples at the beginning and end of {x}. Although the appended samples can be arbitrarily chosen, typically these are selected so that the points appended at the beginning of the sequence have the same value as the first signal point, and the points appended at the end of the sequence all have the value of the last signal point. To illustrate the appending of the input sequence and the median filtering operation, consider the input signal {x} of Figure 4. In this example, {x} consists of 20
Fig. 4. The operation of the window width 5 median filter, o: appended points.
observations from a 6-level process, {x : x(n) ∈ {0, 1, ..., 5}, n = 1, 2, ..., 20}. The figure shows the input sequence and the resulting output sequence for a window size 5 median filter. Note that to account for edge effects, two samples have been appended to both the beginning and end of the sequence. The median filter output at the window location shown in the figure is y(9) = Med[x(7), x(8), x(9), x(10), x(11)] = Med[1, 1, 4, 3, 3] = 3. The median filtering operation is clearly nonlinear. As such, the median filter does not possess the superposition property. Thus, traditional frequency and impulse response analysis are not applicable. The impulse response of a median filter is, in fact, zero for all time. Consequently, alternative methods for analyzing and characterizing median filters must be employed. Broadly speaking, two types of analysis have been applied to the characterization of median filters: statistical and deterministic. Statistical properties examine the performance of the median filter, through such measures as optimality and output variance, for the case of white noise time sequences. Conversely, deterministic properties examine the filter output characteristics for specific types of commonly occurring deterministic time sequences. In the following, we review some of the statistical and deterministic properties of running median filters.
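The running median with endpoint replication can be sketched as follows. This is an illustrative sketch, not the authors' code; the signal values are invented except that the 1-based samples x(7) through x(11) match the worked example above.

```python
def running_median(x, N1):
    """Window size N = 2*N1 + 1 running median with the endpoint
    replication described above: N1 copies of the first and last
    samples are appended before filtering."""
    pad = [x[0]] * N1 + list(x) + [x[-1]] * N1
    N = 2 * N1 + 1
    return [sorted(pad[i:i + N])[N1] for i in range(len(x))]

# Hypothetical 6-level signal; samples x(7)..x(11) = 1, 1, 4, 3, 3 as in
# the example, plus an isolated impulse of value 5 at x(18).
x = [0, 0, 1, 1, 2, 2, 1, 1, 4, 3, 3, 2, 2, 1, 1, 0, 0, 5, 0, 0]
y = running_median(x, 2)        # window size 5
assert y[8] == 3                # y(9) = Med[1, 1, 4, 3, 3] = 3
assert y[17] == 0               # the isolated impulse is removed
```

Note that the window size 2*N1 + 1 is always odd, so the middle element of the sorted window is well defined.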
2.2. Statistical properties

The statistical properties of median filters can be examined through the derivation of output distributions and statistical conditions on the optimality of median estimates. These analyses generally assume that the input to the median filter is a constant signal with additive white noise. The assumption that the noise is additive and white is quite natural and is made similarly in the analysis of linear filters. The assumption that the underlying signal is a constant is certainly convenient, but more importantly, often valid. This is especially true for the types of signals median filters are most frequently applied to, such as images. Signals such as images are characterized by regions of constant value separated by sharp transitions, or edges. Thus, the statistical analysis of a constant region is valid for large portions of these commonly used signals. By calculating the output distribution of the median filter over a constant region, the noise smoothing capabilities of the median can be measured through statistics such as the filter output variance. The median filter properties covered here are for time series signals consisting of white noise observation samples with known distribution. Since the observation sequence is probabilistic, the time index can be dropped and attention focused on a single observation vector. In this case, and others for which the time index n can be dropped without confusion, we do so and denote the observation vector as simply x = [x_1, x_2, ..., x_N]. Consider first the case where the observation samples are white noise with a double exponential, or Laplacian, distribution. In this case, the common probability density function (pdf) is given by f_x(t) = (1/(σ√2)) e^{-√2|t-μ|/σ}, where μ and σ² are the mean and variance, respectively. For a vector of samples, the joint pdf is
562
K. E. Barner and G. R. Arce
f_x(t) = (1/(σ√2))^N e^{-(√2/σ) Σ_{i=1}^{N} |t_i - μ|} . (6)
Given an observation vector x, the Maximum Likelihood (ML) estimate of the mean, or location parameter, is found by maximizing (6) with t = x. To simplify the notation, define the distance operator D^p(·) as

D^p(β) = Σ_{i=1}^{N} |x_i - β|^p . (7)

Then the ML estimate of the location, for Laplacian distributed samples, is the value β that minimizes D^p(β) with p = 1. It is easy to show that

Med[x_1, x_2, ..., x_N] = arg min_β D^1(β) . (8)
Thus, the median of the samples x_1, x_2, ..., x_N is the value β that minimizes D^1(β), and consequently, the ML estimate of location for samples with a Laplacian distribution. As a comparison,

Mean[x_1, x_2, ..., x_N] = (1/N) Σ_{i=1}^{N} x_i = arg min_β D^2(β) (9)
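The optimality claims in (7)-(9) can be checked by brute force; a sketch over hypothetical data, with the outlier at 100 included deliberately:

```python
# Brute-force check that the median minimizes D^1 and the mean minimizes D^2.
xs = [1.0, 2.0, 2.5, 7.0, 100.0]  # hypothetical samples with one outlier

def D(beta, p):
    """Distance operator of (7): sum over i of |x_i - beta|^p."""
    return sum(abs(x - beta) ** p for x in xs)

grid = [i / 100 for i in range(10001)]        # candidate locations in [0, 100]
b1 = min(grid, key=lambda b: D(b, 1))         # arg min of D^1
b2 = min(grid, key=lambda b: D(b, 2))         # arg min of D^2
assert b1 == sorted(xs)[len(xs) // 2] == 2.5  # the sample median, per (8)
assert b2 == sum(xs) / len(xs) == 22.5        # the sample mean, per (9)
```

The outlier barely moves the D^1 minimizer but drags the D^2 minimizer far to the right, illustrating the robustness remark that follows.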
is the ML estimate of location for samples with a Gaussian distribution. The median and sample mean are, thus, optimal estimates of location for the Laplacian and Gaussian distributions, respectively. This shows that for heavy tailed distributions, such as the Laplacian, the median has advantages over the linear combination based sample mean. A further examination of D^1(·) and D^2(·) reinforces this point. The median is clearly the least absolute error estimate of the center of the distribution for x_1, x_2, ..., x_N, while the mean is the least squared error estimator. The reliance on the absolute error criterion means that the median is less influenced by outliers than the squared error based mean. Having established the types of signals for which median filters are optimal, the filtering operation can be further characterized through the determination of output distributions. Assume again that the input time series consists of white noise samples with pdf f_x(·) and cumulative distribution function (cdf) F_x(·). Under these conditions on the input samples, it is well known that the median filter output cdf, F_med(·), and pdf, f_med(·), are given by

F_med(t) = Σ_{i=N_1+1}^{N} C(N, i) F_x(t)^i (1 - F_x(t))^{N-i} (10)

and

f_med(t) = [N! / (N_1! N_1!)] f_x(t) F_x(t)^{N_1} (1 - F_x(t))^{N_1} (11)
respectively (David 1982). From these expressions it can be shown that for t_1 and t_2 such that F_x(t_1) = 1 - F_x(t_2), then F_med(t_1) = 1 - F_med(t_2) also holds. By setting t_1 = t_0.5, where by definition t_0.5 is the point satisfying F_x(t_0.5) = 0.5, we see that the median is statistically unbiased in the sense that the median of the input is the median of the output. Moreover, the median behaves consistently for samples with asymmetric distributions. The calculation of statistics such as the output mean and variance from the expressions in (10) and (11) is often quite difficult. Insight into the smoothing characteristics of the median filter can, however, be gained by examining the asymptotic behavior (N → ∞) of these statistics, where, under some general assumptions, results can be derived. For the case of white noise input samples, the asymptotic mean, μ_med, and variance, σ²_med, of the median filter output are

μ_med = t_0.5 (12)

and

σ²_med = 1 / (4N (f_x(t_0.5))²) . (13)
Thus, the median produces a consistent (lim_{N→∞} σ²_med = 0) and unbiased estimate of the input distribution median, irrespective of the input distribution. Note that the output variance is not proportional to the input variance, but rather to 1/(f_x(t_0.5))². For heavy tailed noise such as impulsive noise, the input variance is governed by the impulse magnitude, to which 1/(f_x(t_0.5))² is not related. Thus, the output variance of the median in this case is not proportional to the input variance. This is not true for the sample mean, which further explains the more robust behavior of the median. The variances for the sample mean and median filter output are given in Table 1 for the uniform, Gaussian, and Laplacian input distribution cases (David 1982). The results hold for all N in the uniform case and are asymptotic for the
Table 1
Asymptotic output variances for the window size N mean and median filters for white input samples with uniform, Gaussian, and Laplacian distributions

Input sample pdf                                                  Mean      Median
Uniform:   f_x(t) = 1/(2√3 σ) for |t - μ| < √3 σ, 0 otherwise     σ²/N      3σ²/(N + 2)
Gaussian:  f_x(t) = (1/√(2πσ²)) e^{-(t-μ)²/(2σ²)}                 σ²/N      πσ²/(2N)
Laplacian: f_x(t) = (1/(σ√2)) e^{-√2|t-μ|/σ}                      σ²/N      σ²/(2N)
Gaussian and Laplacian cases. Note that the median performs about 3 dB better than the sample mean for the Laplacian case and 2 dB worse in the Gaussian case. The median filter possesses numerous statistical properties in addition to those discussed above. Among those properties that illustrate the optimality of the median are (Coyle et al. 1989): [1] The conditional median at each time instant n is the minimum Mean Absolute Error (MAE) estimator of the signal value at time n, where the conditioning is on the past history up to time n of the noise corrupted observations of the signal. [2] The running median is, with high probability, a maximum a posteriori estimator of a constant signal in symmetric impulsive noise. These statistical properties are complemented by a set of deterministic properties, which are discussed next.

2.3. Deterministic properties
Statistical properties give considerable insight into the median filter performance. The median filter cannot, however, be sufficiently characterized through statistical properties alone. For instance, an important question not answered by the statistical properties is what type of signal, if any, is passed through the median filter unaltered. Linear filters, for example, can be analyzed in the frequency domain to determine, among other things, pass- and stop-band frequencies. If the frequency content of the input signal lies exclusively in the filter passband, then the signal passes through the filter unaltered¹. Conversely, signal content in the stop band does not pass through, or is at least attenuated by, the filter. Somewhat analogous results do in fact exist for the median filter. For median filters, passband or invariant signals are referred to as root signals. The concept of root signals is important to the understanding of median filters and their effect on general signal structures. A review of the significant results in root signal analysis is given in the following along with the main median filter properties resulting from this analysis. The definition of a root signal is quite simple: a signal is a median filter root if the signal is invariant under the median filtering operation. Thus, a signal {x} is a root of the window size N = 2N_1 + 1 median filter if x(n)
= Med[x(n - N_1), ..., x(n), ..., x(n + N_1)] (14)
for all n. As an example, consider the signal shown in Fig. 5. This signal is filtered by three different window size median filters (N_1 = 1, 2, and 3). Note that for the window size three case (N_1 = 1), the filter output is a root. That is, further filtering of this signal with the window size three median filter does not alter the signal. Notice, however, that if this same signal is filtered with a larger window size

¹ In general, the pass-band is defined in terms of the magnitude response. Thus, there may be some time shifting of signals in the pass-band, depending on the filter phase response.
[Figure 5 panels: input signal x(n); output signal for a window of size 3; output signal for a window of size 5; output signal for a window of size 7.]
Fig. 5. Effects of window size on a median filtered signal, o: appended points.
median, the signal will be modified. Thus, the signal in Fig. 5(b) is in the passband, or a root, of an N_1 = 1 median filter but outside the passband, or not a root, of the N_1 = 2 and N_1 = 3 filters. The goal of root analysis is to relate the filtering of desired signals corrupted by noise to root and non-root signals. If it can be shown that certain types of desired signals are in the median filter root set, while noise is outside the root set, then the median filtering of a time series will preserve desired structures while altering the noise. Such a result does in fact hold and will be made clear through the following definitions and properties. First note that, as the example above illustrates, whether or not a signal is a median filter root depends on the window size of the filter in question. Clearly, all signals are roots of the window size one median (identity) filter. To investigate this dependence on window size, median filter root signals can be characterized in terms of local signal structures, where the local signal structures are related to the filter window size. Such a local structure based analysis serves two purposes. First, it defines signal structures that, when properly combined, form the median filter root set. Second, by relating the local structures to the filter window size, the effect of window size on roots is made clear. The local structure analysis of median filter roots relies on the following definitions.
DEFINITION 2.1. A constant neighborhood is a region of at least N_1 + 1 consecutive identically valued points. □

DEFINITION 2.2. An edge is a monotonic region between two constant neighborhoods of different value. The connecting monotonic region cannot contain any constant neighborhoods. □

DEFINITION 2.3. An impulse is a constant neighborhood followed by at least one, but no more than N_1 points which are then followed by another constant neighborhood having the same value as the first constant neighborhood. The two boundary points of these at most N_1 points do not have the same value as the two constant neighborhoods. □

DEFINITION 2.4. An oscillation is a sequence of points which is not part of a constant neighborhood, an edge or an impulse. □

These definitions may now be used to develop a description of those signals which do and those which do not pass through a median filter without being perturbed. In particular, Gallagher and Wise (1981) have developed a number of properties which characterize these signal sets for the case of finite length sequences. First, any impulse will be eliminated upon median filtering. Secondly, a finite length signal is a median filter root if it consists of constant neighborhoods and edges only. Thus, if a desired signal is constructed solely of constant neighborhoods and edges, then it will not be altered by the median filtering operation. Conversely, if observation noise consists of impulses (as defined above), it will be removed by the median filtering operation. These median filter root properties are made exact by the following.

DEFINITION 2.5. A sequence {x} is said to be locally monotonic of length m, denoted LOMO(m), if the subsequence x(n), x(n + 1), ..., x(n + m - 1) is monotonic for all n ≥ 1. □

PROPERTY 2.1.
Given a length L sequence to be median filtered with a length N = 2N_1 + 1 window, a necessary and sufficient condition for the signal to be invariant (a root) under median filtering is that the extended (beginning and end appended) signal be LOMO(N_1 + 2). □

Thus, the set of signals that forms the passband or root set (invariant to filtering) of a size N median filter consists solely of those signals that are formed of constant neighborhoods and edges. Note that by the definition of LOMO(m), a change of trend implies that the sequence must stay constant for at least m - 1 points. It follows that for a median filter root signal to contain both increasing and decreasing regions, these regions must be separated by a constant neighborhood of at least N_1 + 1 identically valued samples. It is also clear from the definition of LOMO(·) that a LOMO(m_1) sequence is also LOMO(m_2) for any two
positive integers m_1 ≥ m_2. This implies that the roots for decreasing window size median filters are nested, i.e., every root of a window size M filter is also a root of a window size N median filter for all N < M. This is formalized by:

PROPERTY 2.2. Let S denote a set of finite length sequences and R_{N_1} be the root set of the window size N = 2N_1 + 1 median filter operating on S. Then the root sets are nested such that ⋯ ⊆ R_{N_1+1} ⊆ R_{N_1} ⊆ R_{N_1-1} ⊆ ⋯ ⊆ R_1 ⊆ R_0 = S. □

In addition to the above description of the root signal set for a median filter, it can be shown that any signal of finite length is mapped to a root signal by repeated median filtering. In fact, it is simple to show that the first and last points to change value on a median filtering operation remain invariant upon additional filter passes, where repeated filter passes consist of using the output of the prior filter pass for the input of an identical filter on the current pass. This fact, in turn, indicates that any length L non-root signal (oscillations and impulses) will become a root structure after a maximum of (L - 2)/2 successive filterings. This simple bound was improved in [17] where it was shown that at most

3 ⌈(L - 2) / (2(N_1 + 2))⌉ (15)

passes of the median filter are required to reach a root. This bound is conservative in practice since in most cases a root signal is obtained after ten or so filter passes. The median filter root properties are illustrated through an example in Fig. 6. This figure shows an original signal and the resultant root signals after multiple passes of window size 3, 5, and 7 median filters. Note that while it takes only a single pass of the window size 3 median filter to obtain a root, it takes two passes for the window sizes 5 and 7 median filters. Clearly, the locally monotonic structure requirements of the root signals are satisfied in Fig. 6. For the window size 3 case, the input sequence becomes LOMO(3) after a single pass of the filter. Thus, this sequence is in the root set of the window size 3 median filter, but not a root of the window size N > 3 median filter since it is not LOMO(N_1 + 2) for N_1 > 1. The deterministic and statistical properties form a powerful set of tools for describing the median filtering operation and performance. Together, they show that the median filter is an optimal estimator of location for Laplacian noise and that common signal structures, e.g., constant neighborhoods and edges in images, are in the filter pass-band (root set). Moreover, impulses are removed by the filtering operation and repeated passes of the median filter always result in the signal converging to a root, where the root consists of a well defined set of structures related to the filter window size.

2.4. Median filtering and threshold decomposition
A fundamental property of median filters is threshold decomposition (Fitch et al. 1984). This property was the key to deriving many of the median filter statistical and deterministic properties. Moreover, threshold decomposition is instrumental
[Figure 6 panels: input signal x(n); root signal for a window of size 3 (1 filter pass); root signal for a window of size 5 (2 filter passes); root signal for a window of size 7 (2 filter passes).]
Fig. 6. Root signals obtained by median filters of size 3, 5, and 7. o: appended points.
in the optimization of the median filter generalizations discussed in the following sections. A review of this important property is therefore in order. Threshold decomposition is simply a means of decomposing an M-level signal into an equivalent set of M - 1 binary sequences². Let x(n) = [x_1, x_2, ..., x_N] be an N element observation vector where the signal is quantized to M levels such that x(n) ∈ Z_M = {0, 1, ..., M - 1}. The threshold decomposition of x(n) results in the set of binary vectors X^1(n), X^2(n), ..., X^{M-1}(n), where X^i(n) ∈ {0, 1}^N is the observation vector thresholded at level i for i = 1, 2, ..., M - 1. As a function of the threshold operator T_i[·],

X^i(n) = T_i[x(n)] (16)
       = [T_i[x_1], T_i[x_2], ..., T_i[x_N]] (17)
       = [x^i_1, x^i_2, ..., x^i_N] , (18)

² For now we restrict the discussion to quantized signals. This restriction is lifted in Section 3.4.
where T_i[·] is defined as

x^i_j = T_i[x_j] = { 1 if x_j ≥ i, 0 otherwise } (19)

for i = 1, 2, ..., M - 1 and j = 1, 2, ..., N. In terms of the time indexed samples, X^i(n) = T_i[x(n)]. Threshold decomposition can be reversed by simply adding the threshold decomposed signals,

x(n) = Σ_{i=1}^{M-1} X^i(n) and x(n) = Σ_{i=1}^{M-1} x^i(n) . (20)
Furthermore, it was shown by Fitch et al. that the median operation commutes with thresholding (see Fitch et al. 1985). Stated more formally, the median filtering of an M-level signal x(n) ∈ {0, 1, ..., M - 1} is equivalent to filtering the M - 1 threshold signals X^1(n), X^2(n), ..., X^{M-1}(n), and summing the results,

Med[x(n)] = Σ_{i=1}^{M-1} Med[X^i(n)] (21)
for all n. Thus, threshold decomposition is a weak superposition property. A related property is the partial ordering property known as the stacking property.

DEFINITION 2.6. Let X and Y be N element binary vectors. Then X stacks on Y, which is denoted as Y ≤ X, if and only if Y_i ≤ X_i for i = 1, 2, ..., N. A function f(·) possesses the stacking property if and only if

Y ≤ X ⇒ f(Y) ≤ f(X) . (22) □
The median filter was shown to possess the stacking property (Fitch et al. 1985), which can be stated as follows. In the threshold decomposition domain, the binary median filter output at threshold level i is always less than or equal to the binary median filter output at lower threshold levels:

Med[X^i(n)] ≤ Med[X^j(n)] (23)

for all i, j such that 1 ≤ j < i ≤ M - 1. The stacking property is a partial ordering property. It states that the results of applying the median filter to each of the binary sequences obtained by thresholding the original signal will have a specific structure to them. Thus, in median filtering by threshold decomposition, the input sequence is first decomposed into M - 1 binary sequences, and each of these is then filtered by a binary median filter. Furthermore, the set of output sequences possesses the stacking property. As a simple example, consider the median filter of window size three (N = 3) being applied to a 4-level input signal as shown in Fig. 7. The outputs of the multi-level median filter and of the threshold decomposition median filter are identical because of the weak superposition property.
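The reconstruction (20), the weak superposition (21), and the stacking property (23) can all be verified on a small signal; a sketch with an invented 4-level sequence (endpoint replication as before, not necessarily the exact values of Fig. 7):

```python
def med3(v):
    """Window size 3 running median with endpoint replication."""
    pad = [v[0]] + list(v) + [v[-1]]
    return [sorted(pad[i:i + 3])[1] for i in range(len(v))]

M = 4
x = [1, 1, 0, 2, 0, 3, 3, 1, 2, 2]   # hypothetical M-level signal
# Threshold decomposition (19): one binary signal per level i = 1..M-1.
levels = [[1 if s >= i else 0 for s in x] for i in range(1, M)]
outs = [med3(b) for b in levels]      # binary median filters
assert [sum(col) for col in zip(*levels)] == x        # reconstruction (20)
assert [sum(col) for col in zip(*outs)] == med3(x)    # weak superposition (21)
for hi, lo in zip(outs[1:], outs[:-1]):               # stacking (23)
    assert all(h <= l for h, l in zip(hi, lo))
```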
[Figure 7 diagram: the 4-level input signal is median filtered directly (top path) and, equivalently, thresholded at levels 1, 2, and 3, passed through binary median filters, and the binary outputs added (bottom path).]
Fig. 7. Median Filtering by threshold decomposition. The 4-valued input signal is filtered by the running sorting method in the top part of the figure. In the bottom part of the figure, the signal is first decomposed into a set of binary signals and each of these is filtered by a binary median filter. The output is produced by adding together the outputs of the binary median filters.
3. Weighted median filters
Numerous generalizations to the median filtering operation have been introduced since Tukey first suggested the median filter as a smoother in 1974 (see Tukey 1974). While many different approaches have been taken in an attempt to improve the median filter performance, most have, in some way, attempted to include temporal information into the filtering process. For most signals, and certainly those of practical interest, it is clear that certain observation samples have a higher degree of correlation with the desired estimate than do others. In the linear filter case, this correlation is reflected in the weight given each sample. A similar weighting approach can be taken to generalize the median filter. The sample weighting approach to generalizing the median filter is developed in this section. We begin by discussing the Center Weighted Median (CWM) filter, in which only one sample, the sample centrally located in the observation window, is weighted. This is then generalized to the Weighted Median (WM) filter case in which all observation samples are weighted. In both the CWM and WM filter cases the output is the median value of the weighted set. A further generalization can be achieved by allowing the output to be an order statistic other than the median. This leads to the class of Weighted Order Statistic (WOS) filters. Following the development of these generalizations, we show that each possesses the threshold decomposition property. As noted earlier, threshold decomposition is an extremely powerful tool for both filter analysis and optimization, and is the final topic covered in this section.
3.1. Center weighted median filters
The median filter is strictly a rank order operator. Thus, all temporal locations within the observation window are considered equivalent. That is, given a window of observation samples, any permutation of the samples within the observation window results in an identical median filter output. As stated above, for most signals certain samples within the observation window are more correlated with the desired estimate than are others. Due to the symmetric nature of the observation window, the sample most correlated with the desired estimate is, in general, the center observation sample. The center observation sample can be weighted to reflect its importance, or correlation with the desired estimate. Since median filters select the output in a different fashion than do linear filters, i.e., ranking versus summing, the observation samples must also be weighted differently. In the median filtering case, weighting is accomplished through repetition. Thus, the output of the CWM filter is given by

y(n) = Med[x_1, ..., x_{c-1}, x_c ◊ w_c, x_{c+1}, ..., x_N] , (24)
where x_c ◊ w_c denotes the replication operator x_c ◊ w_c = x_c, x_c, ..., x_c (w_c times) and c = (N + 1)/2 = N_1 + 1 is the index of the center sample. The center sample is thus repeated w_c times, where w_c is an odd positive integer. Consequently, the output of the CWM filter is the median over an extended set containing multiple center samples. When w_c = 1, the operator is a median filter, and for w_c ≥ N, the CWM reduces to an identity operation. On the right side of (24) the time index n has been dropped for notational simplicity and the observation samples indexed according to their location in the observation window. In terms of the time series, the samples in the observation window are x_i = x(n - (N_1 + 1) + i) for i = 1, 2, ..., N. The effect of varying the center sample weight is perhaps best seen by way of an example. Consider a segment of recorded speech. The voiced waveform "a" is shown at the top of Fig. 8. This speech signal is taken as the input of a CWM filter of size 9. The outputs of the CWM, as the weight parameter w_c varies from 1 to 9, are also shown in Fig. 8. The vertical index denotes the value given to w_c. The signal at the top is the original signal, or the output signal of the CWM when w_c = N, or 9 in this example. The weight w_c is successively decreased until w_c = 1, in which case the CWM filter reduces to the standard median. The smoothing characteristics of the CWM filter, as a function of the center sample weight, are illustrated in the previous example and figure. Clearly, as w_c is increased less smoothing occurs. This response of the CWM filter is explained by the following property which relates the weight w_c and the CWM filter output to select order statistics (OS). The N observation samples x_1, x_2, ..., x_N can be written as an OS vector,

x_L = [x_(1), x_(2), ..., x_(N)] , (25)
K. E. Barner and G. R. Arce
Fig. 8. Effects of increasing the center weight of a CWM filter of size N = 9 operating on the voiced speech "a". The CWM filter output is shown for w_c = 1, 3, 5, 7, 9. Note that for w_c = 1 the CWM reduces to a median filter, and for w_c = 9 it becomes the identity filter.
where x_(1) ≤ x_(2) ≤ ⋯ ≤ x_(N). The following relation (Hardie and Boncelet 1993; Ko and Lee 1991) utilizes this notation.

PROPERTY 3.1. Let {y} be the output of a CWM filter operating on the sequence {x}. Then

y(n) = Med[x_1, ..., x_{c-1}, x_c ◊ w_c, x_{c+1}, ..., x_N]    (26)
     = Med[x_(k), x_c, x_(N-k+1)] ,    (27)

where k = (N + 2 - w_c)/2 for 1 ≤ w_c ≤ N, and k = 1 for w_c > N. □

From this property we can write the CWM filter output y(n) as

       { x_c          if x_(k) ≤ x_c ≤ x_(N+1-k)
y(n) = { x_(k)        if x_c ≤ x_(k)                    (28)
       { x_(N+1-k)    if x_c ≥ x_(N+1-k) .
Since x(n) is the center sample in the observation window, i.e., x_c = x(n), equation (28) indicates that the output of the filter is identical to the input as long as x(n) lies in the interval [x_(k), x_(N+1-k)]. If the center input sample is greater than x_(N+1-k) the filter outputs x_(N+1-k), guarding against a high rank order (large)
Order-Statistic fihering and smoothing of time-series. Part H
aberrant data point being taken as the output. Similarly, the filter's output is x_(k) if the sample x(n) is smaller than this order statistic. This CWM filter performance characteristic is illustrated in Figs. 9 and 10. Figure 9 shows how the input sample is left unaltered if it is between the trimming statistics x_(k) and x_(N+1-k), and mapped to one of these statistics if it is outside this range. Figure 10 shows an example of the CWM filter operating on a Laplacian sequence. Along with the input and output, the trimming statistics are shown. It is easily seen how increasing k tightens the range in which the input is passed directly to the output.
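Property 3.1 can be checked numerically. The following Python sketch (function names and the endpoint-replication edge handling are our own choices, not specified in the text) implements a CWM smoother both directly, as the median of the replication-expanded window, and through the order-statistic form Med[x_(k), x_c, x_(N-k+1)]:

```python
import statistics

def cwm_via_expansion(window, wc):
    """Median of the window with the center sample repeated wc times."""
    half = len(window) // 2
    expanded = window[:half] + [window[half]] * wc + window[half + 1:]
    return statistics.median(expanded)

def cwm_via_os(window, wc):
    """Property 3.1: y = Med[x_(k), x_c, x_(N-k+1)] with k = (N + 2 - wc)/2."""
    N = len(window)
    k = (N + 2 - wc) // 2 if wc <= N else 1
    s = sorted(window)
    return statistics.median([s[k - 1], window[N // 2], s[N - k]])

def cwm_filter(x, N, wc):
    """Slide an odd-size window over x; endpoints are replicated at the edges."""
    half = N // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [cwm_via_expansion(padded[n:n + N], wc) for n in range(len(x))]
```

For w_c = 1 this is the running median; for w_c ≥ N every sample passes through unchanged, matching the identity behavior noted above.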
Fig. 9. The center weighted median filtering operation. The center observation sample is mapped to the order statistic x_(k) (x_(N+1-k)) if the center sample is less (greater) than x_(k) (x_(N+1-k)), and left unaltered otherwise.
Fig. 10. An example of the CWM filter operating on an i.i.d. Laplacian sequence with unit variance. Shown are the filter input and output sequences as well as the trimming statistics x_(k) and x_(N+1-k). The filter window size is 25 and k = 7.
3.2. Weighted median filters

The weighting scheme used by CWM filters can be naturally extended to include all input samples. To this end, let w = [w_1, w_2, ..., w_N] be an N-long weight vector with positive integer elements that sum to an odd number, i.e., ∑_{i=1}^N w_i is odd. Given this vector of weights, the WM filter operation is defined as (Brownrigg 1984)
y(n) = Med[x(n) ◊ w]    (29)
     = Med[x_1 ◊ w_1, x_2 ◊ w_2, ..., x_N ◊ w_N] .    (30)
Thus, WM filters incorporate temporal-order information by weighting samples according to their temporal order prior to rank filtering. The filtering operation is illustrated through the following example.

EXAMPLE 3.1. Consider the window size 5 WM filter defined by the symmetric weight vector w = [1, 2, 3, 2, 1]. For the observation x(n) = [12, 6, 4, 1, 9], the filter output is found as
y(n) = Med[12 ◊ 1, 6 ◊ 2, 4 ◊ 3, 1 ◊ 2, 9 ◊ 1]
     = Med[12, 6, 6, 4, 4, 4, 1, 1, 9]
     = Med[1, 1, 4, 4, 4, 6, 6, 9, 12]
     = 4 ,    (31)
where the median value is the middle (fifth-ranked) sample in equation (31). The large weighting on the center input sample results in this sample being taken as the output. As a comparison, the standard median output for the given input is y(n) = 6. □

The WM filtering operation can be schematically described as in Fig. 11. This figure illustrates that as the filter window slides over an input sequence, the observation samples are duplicated (weighted) according to their temporal order within the window. This replication forms an expanded observation set which is then ordered according to rank, and the median sample is selected as the output. In this fashion specific temporal-order samples can be emphasized, and others de-emphasized. The figure also illustrates that, structurally, the WM filter is similar to the linear FIR filter. This relationship between linear and WM filters can be further explored through an alternative WM filter definition.

The constraint that the WM filter weights be integer valued can be relaxed through a second, equivalent, filter definition. Thus, let w be an N-element weight vector with positive (possibly non-integer) elements. The output of the WM filter defined by w and operating on the observation x(n) can be defined as
y(n) = arg min_β D_w^1(β) ,    (32)

where D_w^η(·) is the weighted distance operator

D_w^η(β) = ∑_{i=1}^N w_i |x_i - β|^η .    (33)
Fig. 11. The weighted median filtering operation.
Note that D_w^1(β) is piecewise linear and convex for w_i ≥ 0, i = 1, 2, ..., N. Thus, arg min_β D_w^1(β) is guaranteed to be one of the input samples x_1, x_2, ..., x_N. The WM filter output for non-integer weights can be determined from (32) as follows:
1. Calculate the threshold w_0 = ½ ∑_{i=1}^N w_i.
2. Sort the samples in the observation vector x(n).
3. Sum the weights corresponding to the sorted samples, beginning with the maximum sample and continuing down in order.
4. The output is the sample whose weight causes the sum to become ≥ w_0.
The following example illustrates this procedure.

EXAMPLE 3.2. Consider the window size 5 WM filter defined by the real-valued weights w = [0.1, 0.1, 0.2, 0.2, 0.1]. The output for this filter operating on the observation x(n) = [12, 6, 4, 1, 9] is found as follows. Summing the weights gives the threshold w_0 = ½ ∑_{i=1}^5 w_i = 0.35. The observation samples, sorted observation samples, their corresponding weights, and the partial sums of weights (from each ordered sample to the maximum) are:

observation samples         12,   6,    4,    1,    9
corresponding weights       0.1,  0.1,  0.2,  0.2,  0.1

sorted observation samples  1,    4,    6,    9,    12
corresponding weights       0.2,  0.2,  0.1,  0.1,  0.1     (34)
partial weight sums         0.7,  0.5,  0.3,  0.2,  0.1
Thus, the output is 4 since, starting from the right (maximum sample) and summing the weights, the threshold w_0 = 0.35 is not reached until the weight associated with 4 is added. The partial sum 0.5 associated with the sample 4 is the first sum which meets or exceeds the threshold. □
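The four-step procedure above is direct to implement. The following Python sketch (function name ours) computes the WM output for positive, possibly non-integer, weights:

```python
def wm_filter_output(x, w):
    """Weighted median by the threshold procedure: sort the samples,
    accumulate weights from the maximum sample downward, and output
    the sample at which the running sum first reaches w0."""
    w0 = 0.5 * sum(w)
    acc = 0.0
    for sample, weight in sorted(zip(x, w), reverse=True):
        acc += weight
        if acc >= w0:
            return sample
```

On the observation [12, 6, 4, 1, 9] this reproduces both examples: the real-valued weights of Example 3.2 give 4, and the integer weights of Example 3.1 also give 4.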
In the previous section the median and sample mean filters were related through the distance operator D^η(·). There, it was shown that Med[x(n)] = arg min_β D^1(β) while Mean[x(n)] = arg min_β D^2(β). Similar results hold relating the WM and linear FIR filters by means of the weighted distance measure D_w^η(·). As stated in (32), the WM of x(n) is arg min_β D_w^η(β) for η = 1. Interestingly, if the distance norm is changed to two, then

arg min_β D_w^2(β) = (∑_{i=1}^N w_i x_i) / (∑_{i=1}^N w_i) ,    (35)
which is a normalized linear FIR filter (Yli-Harja et al. 1991). Before ending the discussion on WM filters it is important to point out that the two filter definitions given (equations (30) and (32)) are identical. It has been shown that any WM filter based on real-valued weights has an equivalent integer-valued weight representation (Yli-Harja et al. 1991). As an illustration, multiplying a weight vector by a positive constant results in an identical filter. Thus, the WM filter defined by the weight vector w = [1, 1, 2, 2, 1] is identical to that used in Example 3.2. Consequently, there are only a finite number of WM filters for a given window size. The number of WM filters, however, grows rapidly with window size. For instance, there are only 4 window size 5 WM filters, but 114 and 172,958 window size 7 and 9 WM filters, respectively (Yli-Harja et al. 1991).
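The connection between the WM filter (η = 1) and the normalized FIR filter (η = 2) can be verified numerically. In this hedged sketch, the η = 1 cost is minimized over the input samples (valid because D_w^1 is piecewise linear and convex), and the η = 2 minimizer found on a fine grid is compared against the closed form of (35):

```python
def weighted_distance(beta, x, w, eta):
    """D_w^eta(beta) = sum_i w_i |x_i - beta|^eta."""
    return sum(wi * abs(xi - beta) ** eta for xi, wi in zip(x, w))

x = [12, 6, 4, 1, 9]
w = [1, 2, 3, 2, 1]

# eta = 1: the minimizer is one of the samples -> the weighted median.
wm = min(x, key=lambda b: weighted_distance(b, x, w, 1))

# eta = 2: the minimizer is the normalized FIR output of (35).
fir = sum(wi * xi for xi, wi in zip(x, w)) / sum(w)
grid = [i / 100 for i in range(0, 1300)]
fir_numeric = min(grid, key=lambda b: weighted_distance(b, x, w, 2))
```

Here wm recovers the weighted median of Example 3.1, while fir_numeric agrees with the closed-form normalized linear combination to grid precision.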
3.3. Weighted order statistic filters

The weighting scheme used in WM filters is an effective method for emphasizing samples in certain observation window locations and de-emphasizing others. However, the WM filter output is restricted to be the median of the weight (repetition) expanded set. This lack of freedom in choosing the rank of the output can limit performance in certain cases. This limitation can be eliminated by allowing the rank of the output to be an adjustable parameter. This leads to the class of WOS filters, which includes WM and all rank-order filters as a subset. Moreover, the more powerful generalizations developed in the remainder of the paper are based on the WOS filtering operation.

The operation of a window size N WOS filter is defined by the N-element weight vector w and the rank parameter w_0. For positive integer-valued weights and rank parameter (the integer constraint will be lifted shortly), the output of the WOS filter is computed as
y(n) = w_0-th Largest[x(n) ◊ w] .    (36)
Note that if w_0 = ½(1 + ∑_{i=1}^N w_i) (or, for non-integer weights, w_0 = ½ ∑_{i=1}^N w_i), then the WOS filter reduces to a WM filter. The WOS filters also contain rank-order filters as a special case. By restricting each of the weights to be unity, w_i = 1 for i = 1, 2, ..., N, the WOS filter output becomes y(n) = w_0-th Largest[x(n) ◊ w] = x_(w_0), where again x_(1), x_(2), ..., x_(N) are the order statistics. While rather simple, there are several applications where rank-order filters can be
effectively utilized. The demodulation of AM signals is one such example, where the output rank is selected so as to track the envelope function of the AM signal. Figure 12 depicts the AM detection of a 5 kHz tone signal on a 31 kHz carrier, sampled at 250 kHz, using an eighth-rank-order operation with a window size of 9. Figure 12(a) shows the envelope detection when no noise is present, whereas Fig. 12(b) shows the envelope detection in an impulsive noise environment. Note that while impulsive noise is very disruptive with most envelope detectors, the output of the rank-order filter is hardly perturbed by the noise.

As with WM filters, the restriction that the weights, and in this case w_0, be integer valued can be relaxed. For non-integer values, w_0 is referred to as the threshold, and the WOS filter output is determined by the same procedure used to find the WM filter output for non-integer weights. The only difference is that w_0 is free to be chosen and not restricted to w_0 = ½ ∑_{i=1}^N w_i. Thus, WOS filters have N + 1 degrees of freedom. The freedom to set the threshold, in addition to
Fig. 12. Rank-order AM demodulation. The window size is 9, and the output is the 8th largest in the window. The baseband signal is at 5 kHz with a carrier of 31 kHz. The sampling frequency is 250 kHz. (a) Noiseless reception. (b) Noisy reception with impulsive noise, Pr = 0.15, σ = 20.
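A rank-order envelope detector of this kind takes only a few lines of code. The sketch below is ours: the tone and carrier frequencies follow the figure, but the amplitudes and the window edge handling are illustrative assumptions.

```python
import math

def rank_order_filter(x, N, r):
    """Output the r-th order statistic (r-th smallest of N) of each
    window; the endpoints are replicated at the signal edges."""
    half = N // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sorted(padded[n:n + N])[r - 1] for n in range(len(x))]

fs, fc, fb = 250e3, 31e3, 5e3                       # sample, carrier, tone (Hz)
t = [n / fs for n in range(500)]
envelope = [1.5 + math.cos(2 * math.pi * fb * ti) for ti in t]
am = [e * math.cos(2 * math.pi * fc * ti) for e, ti in zip(envelope, t)]

detected = rank_order_filter(am, N=9, r=8)          # 8th of 9: near the local peak
```

A 9-sample window at 250 kHz spans roughly one carrier period at 31 kHz, so the eighth order statistic rides close to the positive peaks of the carrier, i.e., the envelope. Because the output is always one of the observed samples, an isolated impulse can shift it by at most one rank, which is why the detector in Fig. 12(b) is barely perturbed.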
the weights, makes WOS filters a powerful class of filters with wide-ranging applications. Moreover, effective (adaptive) optimization procedures exist for WOS filters. Furthermore, since the WM and WOS filters are simple generalizations of the median, we can expect some properties of the median filter to extrapolate to these more general filters. This is in fact the case for the root signal properties and threshold decomposition. We revisit threshold decomposition next because of its importance in the analysis and optimization of WM and WOS filters.

3.4. Threshold decomposition and logic

As stated above, threshold decomposition extends to the class of WOS filters. To show this, we begin by again restricting the input signal to have M levels. After proving that WOS filters possess the threshold decomposition and stacking property, the conditions on the input signal are relaxed to allow for the case of real-valued inputs. To begin, denote the input vector as x(n) = [x_1, ..., x_N]^T, where x_i ∈ Z_M = {0, 1, ..., M - 1}. Recall that x(n) can be decomposed into M - 1 binary vectors X^1(n), X^2(n), ..., X^{M-1}(n), where the elements of the binary vectors are X_i^m = T^m[x_i] for m = 1, 2, ..., M - 1 and i = 1, 2, ..., N. Also, the decomposition is reversible, x_i = ∑_{m=1}^{M-1} X_i^m for i = 1, 2, ..., N. The decomposition can now be applied to the WOS filter operation:

y(n) = w_0-th Largest[x_1 ◊ w_1, ..., x_N ◊ w_N]    (37)
     = w_0-th Largest[(∑_{m=1}^{M-1} X_1^m) ◊ w_1, ..., (∑_{m=1}^{M-1} X_N^m) ◊ w_N] .    (38)
Since replicating each sample x_i w_i times is equivalent to replicating each binary sample X_i^m w_i times and adding all of these together, the above can be written as

y(n) = w_0-th Largest[∑_{m=1}^{M-1} (X_1^m ◊ w_1), ..., ∑_{m=1}^{M-1} (X_N^m ◊ w_N)] .    (39)
The next step is to invoke the stacking property of threshold decomposition, which states that if X_i^p = 1 for a given threshold level p, then X_i^q = 1 for all levels q < p. Similarly, if X_i^p = 0, then X_i^q = 0 for all q > p. Thus, finding the w_0-th largest sample in the set x_1 ◊ w_1, ..., x_N ◊ w_N is equivalent to finding the maximum level m at which there are w_0 or more ones in the set X_1^m ◊ w_1, ..., X_N^m ◊ w_N. Finding the maximum level which satisfies this condition, in turn, can be accomplished by counting the levels which have w_0 or more ones in the binary vectors. Hence, the output of the WOS filter can be written as

y(n) = ∑_{m=1}^{M-1} w_0-th Largest[X_1^m ◊ w_1, ..., X_N^m ◊ w_N] .    (40)

This expression can be further simplified as (Yli-Harja et al. 1991)
y(n) = ∑_{m=1}^{M-1} f(X^m; w) ,    (41)

where the function f(X^m; w) is a linearly separable threshold function

f(X^m; w) = { 1 if ∑_{i=1}^N w_i X_i^m ≥ w_0
            { 0 otherwise .    (42)

The output of a WOS filter can finally be expressed as

y(n) = ∑_{m=1}^{M-1} U(W^T X̄^m) ,    (43)

where U(·) is the unit step function, and where W = [w_0, w_1, w_2, ..., w_N]^T and X̄^m = [-1, X_1^m, ..., X_N^m]^T are the extended weight and extended observation vectors, respectively. Hence, the WOS filter output is shown not only to satisfy the threshold decomposition property but also to be characterized by a sum of linear threshold functions. Notice that in the threshold domain (40), the WOS filter weights, including w_0, are required to be positive but can also be real-valued. The restriction that the input be integer-valued can also be relaxed to allow for real-valued observations. Next, we generalize the threshold decomposition architecture to handle real-valued signals (Yin and Neuvo 1994).

Take x(n) to be nonnegative and real-valued. The nonnegative constraint is taken for convenience, and will be relaxed shortly. As in the integer-valued case, a real-valued observation x(n) can be decomposed into a set of binary signals,
x^Γ(n) = U(x(n) - Γ) ,   0 ≤ Γ < ∞ ,    (44)

from which x(n) can be recovered,

x(n) = ∫_0^∞ x^Γ(n) dΓ = ∫_0^∞ U(x(n) - Γ) dΓ .    (45)
The WOS filtering of a real-valued signal can now be implemented using threshold decomposition as

y(n) = ∫_0^∞ U(W^T X̄^Γ) dΓ ,    (46)

where X̄^Γ = [-1, x_1^Γ, ..., x_N^Γ]^T. The integration is simplified by the fact that the observation vector contains, at most, N different valued samples. Consequently, there are at most N + 1 different binary vectors X^Γ. The possible vectors are
      { (1, 1, ..., 1)^T                           if Γ ∈ [0, x_(1)]
X^Γ = { (U(x_1 - x_(i)), ..., U(x_N - x_(i)))^T    if Γ ∈ (x_(i-1), x_(i)]    (47)
      { (0, 0, ..., 0)^T                           if Γ ∈ (x_(N), ∞) .
Using this fact it can be shown that

y(n) = x_(1) + ∑_{i=2}^N (x_(i) - x_(i-1)) U(W^T X̄^{x_(i)}) .    (48)
This decomposition holds for both integer- and real-valued signals, as well as those that are not strictly positive. Moreover, this decomposition is much more efficient than that originally derived for integer-valued signals since it requires only N + 1 threshold logic operations rather than M. This reduction in complexity simplifies WOS analysis and optimization, both of which are performed in the threshold domain. By combining this threshold decomposition with unit step function approximations, fast adaptive optimization algorithms can be developed. This is the approach taken in Section 5, which describes the optimization of WOS filters.
4. Time-rank coupling extensions: PWOS filters
The generalizations of the median filter discussed in the previous section are based on the weighting of samples. In the most general case covered, WOS filters, the observation samples are weighted according to their temporal order prior to rank ordering. This median filter generalization method, as well as others such as stack filters (Wendt et al. 1986; Coyle et al. 1989), have been proposed to incorporate some form of temporal-order information into rank-order filters. Still, due to their constrained nature, these methods do not fully utilize the information contained in both the temporal and rank ordering of the observed data (Barner and Arce 1994).

An observation set of samples can, of course, be ordered in many ways³. In most practical situations samples are observed on a time-ordered basis, e.g., from a sensor which is regularly sampled. This results in the temporally ordered observation x. The samples comprising x can be permuted such that they are ordered according to a different criterion, such as rank. The rank-ordered samples are written as x_L. Thus, the mapping x ↦ x_L is simply a permutation of samples. Moreover, this permutation mapping contains both the temporal and rank orderings of an observation set of samples.

³ Two orderings that arise naturally are temporal and rank. Other natural orderings include spatial, spectral and likelihood.

The temporal and rank natural orderings are important for the filtering process. Rank-order information is vital for reducing the effect of outliers in non-Gaussian environments and accurately tracking non-stationary signal discontinuities. Conversely, temporal-order information is essential for preserving/rejecting signal frequency content and processing temporally correlated signals.

The class of permutation (℘) filters has been designed to take full advantage of the permutation mapping x ↦ x_L, and consequently, both the temporal- and rank-order of observation samples. By utilizing both orderings, permutation filters have been shown to be both robust and frequency selective (Barner and Arce 1994; Arce et al. 1995; Kim and Arce 1994). Moreover, the temporal- and rank-order information can be simply augmented with additional statistics, resulting in extended ℘ filters (Hardie and Barner 1996), which have additional capabilities. Selection ℘ filters contain WOS filters, stack filters (Wendt et al. 1986), and some compositions of discrete morphological operators, as a proper subset.

The use of the observation permutation as a basis for filtering has considerable advantages. However, the factorial growth in the number of permutations, as a function of window size, limits the practicality of using the full permutation information. Thus, a subset of the permutation information must be used in practice. Optimizing on what, and how much, temporal and rank information should be used is very difficult. Therefore, we adopt a nested lattice formulation of permutation filters. This lattice formulation gives a well structured method for controlling the amount of temporal and rank information used. Each vertex of the lattice defines a class of ℘ filters which uses a fixed amount of temporal- and rank-order information. This lattice is an extension of the Lℓ ordering used in (Ghandi and Kassam 1991; Palmieri and Boncelet 1989). This extension results in an L^jℓ time/rank ordering and lattice, where j indicates the amount of rank information incorporated. To illustrate the concept, the following discussion starts with the simple L¹ℓ case. Extensions are then made to the more general cases.

4.1. L¹ℓ PWOS filters
The WOS filter operates on limited temporal- and rank-order information. Clearly, samples are weighted according to their temporal order, or equivalently, their location within the observation window. The expanded set is then rank ordered and the w_0-th sample taken as the output. The observation samples are ordered only after weighting. That is, the weight applied to a sample is not dependent upon its rank order. For instance, if the center sample is heavily weighted to reflect its importance, then the observation sample in that location is emphasized regardless of whether it is a "good" sample or an outlier. In fact, all outliers are emphasized under this scheme, since each outlier occupies the center observation window location once, assuming the window is sequentially shifted over the sequence one sample at a time.

The samples in the observation window can be more appropriately weighted by considering the temporal- and rank-order of each sample. To accomplish this weighting, define the rank indicator vector ℛ_i = [ℛ_{i1}, ℛ_{i2}, ..., ℛ_{iN}]^T, where

ℛ_{ik} = { 1 if x_i ↔ x_(k)
         { 0 otherwise ,    (49)
and x_i ↔ x_(k) means that the k-th order statistic occupies the i-th temporal location in x. Let the variable r_i be the rank of x_i in x_L; hence, ℛ_{i,r_i} = 1 by definition. Thus, ℛ_i is a length-N binary vector with a "1" in position r_i. The other N - 1 positions in the vector are zeros. The N rank indicators can be combined into an N² × N matrix P that indicates the rank of each sample⁴,

    P = [ ℛ₁  0   ⋯  0
          0   ℛ₂  ⋯  0
          ⋮        ⋱  ⋮
          0   0   ⋯  ℛ_N ] ,    (50)
where 0 is an N-long vector of zeros.

EXAMPLE 4.1. Consider the 4-sample (temporally ordered) observation x = [6, 3, 10, 1], which results in the rank-ordered vector x_L = [1, 3, 6, 10]. The four rank indicator vectors and their respective rank parameters are

ℛ₁ = [0, 0, 1, 0]^T, r₁ = 3        ℛ₂ = [0, 1, 0, 0]^T, r₂ = 2
ℛ₃ = [0, 0, 0, 1]^T, r₃ = 4        ℛ₄ = [1, 0, 0, 0]^T, r₄ = 1 .    (51)
Combining them into the P matrix produces (shown transposed for compactness),

          [ 0,0,1,0 | 0,0,0,0 | 0,0,0,0 | 0,0,0,0 ]
    P^T = [ 0,0,0,0 | 0,1,0,0 | 0,0,0,0 | 0,0,0,0 ]    (52)
          [ 0,0,0,0 | 0,0,0,0 | 0,0,0,1 | 0,0,0,0 ]
          [ 0,0,0,0 | 0,0,0,0 | 0,0,0,0 | 1,0,0,0 ] ,
where the vertical separation bars have been added for convenience. Thus, the first section of the P matrix gives the rank of x₁, the second gives the rank of x₂, and so on until the last section, which gives the rank of x_N. □

Having defined P, which gives the temporal- and rank-order of each sample, we can now define a corresponding weight vector. Since the goal is to weight each sample according to its temporal- and rank-order, the weight vector must have N² entries. Consider the i-th temporal sample x_i. This sample can take on N rank values, so N weights must be associated with this sample. Define the weight vector
w_i = [w_{i,(1)}, w_{i,(2)}, ..., w_{i,(N)}]^T    (53)
⁴ Note that this same rank information could be represented by an N²-element vector. We use the less efficient representation only to allow simple matrix products. This will simplify the notation used shortly.
with positive-valued elements to be that associated with x_i. Thus, each x_i has N weights, and the single weight used at any given instant will depend on the rank of x_i. Recalling that r_i is the rank of x_i, the weight used at each instant is w_{i,(r_i)}. Thus, each observation sample is weighted according to both its temporal- and rank-order. The N weight vectors can be stacked to form a single PWOS weight vector,

W = [w₁^T | w₂^T | ⋯ | w_N^T]^T .    (54)
The appropriate weights from W (only N weights are used at any given time) can be selected using P. Once the weights are selected, the output of a PWOS filter is found in a manner analogous to the WOS filter output. Formally, the PWOS output is defined as

y(n) = W₀-th Largest[x^T ◊ W^T P]    (55)
     = W₀-th Largest[x₁ ◊ w₁^T ℛ₁, x₂ ◊ w₂^T ℛ₂, ..., x_N ◊ w_N^T ℛ_N]    (56)
     = W₀-th Largest[x₁ ◊ w_{1,(r₁)}, x₂ ◊ w_{2,(r₂)}, ..., x_N ◊ w_{N,(r_N)}] .    (57)
Thus, each input sample is weighted according to its temporal- and rank-order, and the W₀-th largest sample is chosen as the output from the expanded set. Since the weight of each sample depends on the temporal- and rank-order of one sample (itself), this filter is said to use L¹ℓ temporal/rank information and reside at the L¹ℓ location on the Lℓ lattice, which is defined shortly.

The following examples illustrate the operation of PWOS filtering. The weights in the example are integer-valued. However, like WOS filters, PWOS filter weights need only be positive. We give only an integer-valued weight PWOS example, as the output for real-valued weights is found similarly to the WOS case.

EXAMPLE 4.2. Consider the window size 3 PWOS filter with W₀ = 6. Let x = [x₁, x₂, x₃] = [5, 1, 4]; then, x_L = [x_(1), x_(2), x_(3)] = [1, 4, 5]. Let the PWOS weight vector be

W = [w_{1(1)}, w_{1(2)}, w_{1(3)} | w_{2(1)}, w_{2(2)}, w_{2(3)} | w_{3(1)}, w_{3(2)}, w_{3(3)}]^T
  = [1, 3, 2 | 3, 5, 4 | 4, 3, 3]^T .    (58)

From the observation vector, we can compute the matrix P. The rank indicator vectors for x are:

ℛ₁ = [0, 0, 1]^T, r₁ = 3
ℛ₂ = [1, 0, 0]^T, r₂ = 1    (59)
ℛ₃ = [0, 1, 0]^T, r₃ = 2 .
The weights obtained for the replication of the input samples are computed via W^T P, which evaluates to
Fig. 13. Optimal window size 9 PWOS filter weights plotted as a mesh function of temporal- and rank-order. The greatest weight is given to those samples that are centrally located in both time and rank.
                                      [ 0, 0, 0 ]
                                      [ 0, 0, 0 ]
                                      [ 1, 0, 0 ]
                                      [ 0, 1, 0 ]
W^T P = [1, 3, 2 | 3, 5, 4 | 4, 3, 3] [ 0, 0, 0 ]    (60)
                                      [ 0, 0, 0 ]
                                      [ 0, 0, 0 ]
                                      [ 0, 0, 1 ]
                                      [ 0, 0, 0 ]
      = [2, 3, 3] .    (61)
The output of the filter is

y(n) = W₀-th Largest[x^T ◊ W^T P]
     = 6th Largest[5 ◊ 2, 1 ◊ 3, 4 ◊ 3]
     = 6th Largest[5, 5, 1, 1, 1, 4, 4, 4]
     = 4 .    (62)
□

The advantage of considering both temporal- and rank-order when assigning weights is that outliers can be detected and given a smaller weight. This is illustrated in Fig. 13, which shows optimal PWOS filter weights plotted as a mesh function of temporal- and rank-order. The input to this filter was an image corrupted by heavy-tailed noise. As the figure shows, the samples given the most weight are centrally located in both time and rank. This makes intuitive sense, as the central temporal samples are expected to be more correlated with the desired center sample than those which are temporally distant. Similarly, samples that lie in the extreme ranks may be outliers and should be given smaller weight. Next, we extend temporal/rank coupling to include more than one sample.
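To make the mechanics concrete, here is a small Python sketch (names ours) of the L¹ℓ PWOS selection rule: look up each sample's weight by its rank, replicate, sort, and pick the W₀-th element of the sorted expanded set (which, by the x_(w₀) identity of (36), is what the W₀-th selection in Example 4.2 amounts to).

```python
def pwos_output(x, W, w0):
    """L^1-ell PWOS filter. W[i][k] is the weight applied to sample x[i]
    when x[i] has rank k+1 (0-based k); w0 indexes the sorted expanded set."""
    N = len(x)
    order = sorted(range(N), key=lambda i: x[i])
    rank = [0] * N
    for k, i in enumerate(order):
        rank[i] = k                      # 0-based rank of x[i]
    weights = [W[i][rank[i]] for i in range(N)]
    expanded = sorted(s for xi, wi in zip(x, weights) for s in [xi] * wi)
    return weights, expanded[w0 - 1]
```

For Example 4.2, pwos_output([5, 1, 4], [[1, 3, 2], [3, 5, 4], [4, 3, 3]], 6) selects the weights [2, 3, 3] and returns the output 4.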
4.2. L^jℓ PWOS filters
The weighting scheme derived in the previous section can be extended to include information on the rank order of multiple samples. Thus, each input sample can be weighted according to not only its temporal- and rank-order, but also the rank order of its neighbors. This scheme allows the ranks of adjacent samples to be compared during the weighting process. Through such comparisons, it can be better determined whether a sample is truly an outlier. For instance, if two adjacent samples both have high rank, then they may simply be samples which crossed an edge. If only one sample has high rank, then with higher probability it is an outlier. To take advantage of neighboring rank information, a general L^jℓ rank coupling technique is developed next.

In the previous section rank indicators were used to characterize the rank of each (temporal) sample x_i. Suppose that we want to jointly characterize the ranks of two input samples, x_i and x_{i+1}. If the rank indicator vector for x_i, ℛ_i, is given, then we can form an additional indicator vector for x_{i+1} that does not contain the information provided in ℛ_i. This vector, denoted by ℛ_i¹, is the (N - 1)-length reduced indicator vector formed by removing the r_i-th element from ℛ_{i+1}. Thus, ℛ_i gives the rank of x_i and ℛ_i¹ gives the rank of x_{i+1}, given that we know the rank of x_i already.

We can extend this concept to more than two samples. Associated with the i-th input sample, the reduced rank indicator ℛ_iᵃ is formed by removing the r_i-th, r_{i⊕1}-th, ..., r_{i⊕(a-1)}-th elements from the vector ℛ_{i⊕a}, where ⊕ denotes Modulo N addition, i ⊕ a = (i + a) Mod N.⁵ For example, if x = [6, 3, 10, 1] and x_L = [1, 3, 6, 10], then the rank indicator vectors and their respective rank parameters are

ℛ₁ = [0, 0, 1, 0]^T, r₁ = 3        ℛ₂ = [0, 1, 0, 0]^T, r₂ = 2
ℛ₃ = [0, 0, 0, 1]^T, r₃ = 4        ℛ₄ = [1, 0, 0, 0]^T, r₄ = 1 .    (63)
The reduced rank indicator vectors ℛ₄¹ and ℛ₃², for example, are

ℛ₄¹ = [0, 1, 0]^T        ℛ₃² = [0, 1]^T ,    (64)

where the r₄-th element was removed from ℛ_{4⊕1} = ℛ₁ to get ℛ₄¹, and where the r₃-th and r₄-th elements were deleted from ℛ_{3⊕2} = ℛ₁ to get ℛ₃².

The rank indicator vectors ℛ_i, ℛ_i¹, ..., ℛ_i^{j-1} can be used to express the ranks of j consecutive samples starting at x_i. The rank permutation indicator associated with the i-th input sample is defined as
⁵ The Modulo N operation defined here is on the group {1, 2, ..., N}, such that (N Mod N = N) and (N + 1 Mod N = 1). The ranks can, of course, be coupled in a fashion other than the cyclical Modulo N method used here; e.g., the next sample coupled to x_i could be that of minimum temporal distance from x_i, resulting in the coupling progression x_i, x_{i+1}, x_{i-1}, x_{i+2}, .... Such couplings result in similar filter structures and results. For simplicity, we use the notationally simple cyclic Modulo N coupling here.
P_i^j = ℛ_i ⊗ ℛ_i¹ ⊗ ⋯ ⊗ ℛ_i^{j-1}    (65)

for 1 ≤ j ≤ N, where ⊗ denotes the matrix Kronecker product. Note that the vector P_i^j has length U_N ≡ N(N - 1) ⋯ (N - j + 1). The indicator vector P_i^j characterizes the relative ranking of the samples x_i, x_{i⊕1}, ..., x_{i⊕(j-1)}. Thus, P_i⁰ contains no ranking information; P_i¹ provides the rank information of x_i, but does not contain information relating to the ranks of any other samples. P_i² provides the rank information of x_i and x_{i⊕1} but none related to the other N - 2 samples. Clearly, P_i^N accounts for the ranks of all input samples in the observation vector.

In order to illustrate the formulation of the vector P_i², consider again the observation vector x = [6, 3, 10, 1]^T and its corresponding sorted vector x_L = [1, 3, 6, 10]^T. The rank permutation indicators for j = 2 are found as

P₁² = ℛ₁ ⊗ ℛ₁¹ = [0, 0, 1, 0]^T ⊗ [0, 1, 0]^T
P₂² = ℛ₂ ⊗ ℛ₂¹ = [0, 1, 0, 0]^T ⊗ [0, 0, 1]^T
P₃² = ℛ₃ ⊗ ℛ₃¹ = [0, 0, 0, 1]^T ⊗ [1, 0, 0]^T    (66)
P₄² = ℛ₄ ⊗ ℛ₄¹ = [1, 0, 0, 0]^T ⊗ [0, 1, 0]^T .
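The indicator constructions above can be sketched in a few lines of Python (pure lists; helper names are ours, and indices are 0-based in the code versus 1-based in the text):

```python
def kron(a, b):
    """Kronecker product of two vectors represented as lists."""
    return [ai * bi for ai in a for bi in b]

def rank_indicator(x, i):
    """R_i and r_i: a 1 at the (0-based) rank of x[i], plus that rank."""
    order = sorted(range(len(x)), key=lambda j: x[j])
    r = order.index(i)
    e = [0] * len(x)
    e[r] = 1
    return e, r

def reduced_indicator(x, i):
    """R_i^1: the indicator of x[(i+1) mod N] with the r_i-th entry removed."""
    _, r_i = rank_indicator(x, i)
    e_next, _ = rank_indicator(x, (i + 1) % len(x))
    return e_next[:r_i] + e_next[r_i + 1:]

def rank_permutation_indicator(x, i):
    """P_i^2 = R_i (kron) R_i^1, a length N(N-1) binary vector."""
    e_i, _ = rank_indicator(x, i)
    return kron(e_i, reduced_indicator(x, i))
```

On x = [6, 3, 10, 1] this reproduces (63), (64) and (66): rank_indicator(x, 0) gives ([0, 0, 1, 0], 2) (rank 3 in the text's 1-based counting), and reduced_indicator(x, 3) gives [0, 1, 0].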
P₁² in (66), for example, indicates that x₁ is the third smallest sample in x, and that x₂ is the second smallest of the remaining samples, making it the second smallest overall in x.

The rank permutation indicator vectors are next used to formulate the class of PWOS[j] filters. Given the input vector x and the corresponding permutation indicator vector P_i^j, we define the U_N-long weight vector W_i^j associated with the i-th input sample as

W_i^j = [w_{i(1)}^j, w_{i(2)}^j, ..., w_{i(U_N)}^j]^T ,    (67)

where the elements are constrained to take on positive real values. These vectors are then stacked to form the PWOS weight vector

W^j = [(W₁^j)^T | (W₂^j)^T | ⋯ | (W_N^j)^T]^T .    (68)
As in the previous development, only N weights are used at a given instant. The N appropriate weights can be selected from W^j as a matrix product with the N·U_N × N matrix

    P^j = [ P₁^j  0    ⋯  0
            0     P₂^j ⋯  0
            ⋮           ⋱  ⋮
            0     0    ⋯  P_N^j ] ,    (69)
formed from the N rank-permutation indicator vectors, and where 0 is a U_N-long null vector. Thus, the weight applied to each input sample x_i is (W_i^j)^T P_i^j, which depends on the ranking characteristics of the samples x_i, x_{i⊕1}, ..., x_{i⊕(j-1)}. Since the
Fig. 14. The PWOS filter structure with time-varying replication coefficients that adjust to the varying characteristics of the observations.
weight applied to each sample depends on the temporal-order of one sample (itself) and the rank-order o f j samples, the filter based on this weighting is said to use Ug temporal/rank information. The PWOS filter that uses this temporal/rank information is denoted as a PWOS[j] filter, the output of which is defined as
y(n) = w_0-th Largest[ x^T ◊ ((W_j)^T P_j) ]   (70)
     = w_0-th Largest[ x_1 ◊ (W_1^j)^T P_1^j, …, x_N ◊ (W_N^j)^T P_N^j ] ,   (71)

where ◊ denotes weight replication.
The PWOS filtering structure constitutes, in essence, a time-varying WOS filter whose varying coefficients adjust to the rank and time ordering characteristics of the input samples. Figure 14 depicts the structure of a PWOS[j] filter. Note that in the above formulation, the PWOS[0] filter reduces to the WOS filter, where the filter weights for each sample x_i are assigned the same value regardless of the observation permutation. In the PWOS[N − 1] filter, the complete mapping from location to rank for all samples in the observation vector is used to assign the weights.
4.3. PWOS filter lattices

For an observation vector of size N, the PWOS[j] filtering framework defines a wide variety of filters where each filter uses different rank and temporal ordering information. As expected, the set of filters are coupled to each other. This coupling takes on a well defined structure and ordering. Let 𝒫 = {(PWOS)_1, (PWOS)_2, …, (PWOS)_N}, where (PWOS)_i stands for the set of PWOS[i − 1] filters. Much like Lℓ filters were shown to comprise a complete lattice, it can be shown that permutation weighted order statistic filters also comprise a complete lattice. For this reason, we refer to the set 𝒫 as a Permutation Weighted Order Statistic Filter Lattice.
K. E. Barner and G. R. Arce
Fig. 15. The Permutation Weighted Order Statistic Filter Lattice for N = 7. The least upper bound is the PWOS[6] filter and the greatest lower bound is the WOS filter.

The basic notation used to describe ordered sets and lattices was described in Part I of this paper. Here we omit these definitions and present the lattice structure of PWOS filters. Starting with the PWOS[j] filters, it is easy to see that the constraints on the weights applied to each sample x_i are relaxed as j increases. It follows that the weights of a PWOS[j] filter can be selected such that it is made identical to a PWOS[j − 1] filter. We thus say that a PWOS[j] filter covers a PWOS[j − 1] filter. Using this argument recursively, we can write

PWOS[N − 1] ≽ PWOS[N − 2] ≽ ⋯ ≽ PWOS[0] .   (72)
Thus, the class of PWOS[j] filters constitutes an ordered set with a greatest lower bound (PWOS[0]) and a least upper bound (PWOS[N − 1]). Consequently, PWOS[j] filters comprise a complete lattice (Donnellan 1968), and as such, can be depicted in a lattice diagram. Figure 15 depicts a PWOS[j] filter lattice for N = 7, where the infimum (WOS filter) and the supremum (permutation filter (Barner and Arce 1994)) are clearly shown. The lattice structure of PWOS filters illustrates the modularity of this filter class. While a filter in the lower region of the lattice may suffice for simple applications, a filter in higher regions of the lattice is more desirable for more difficult estimation problems.

4.4. Model order (complexity) reduction
Perhaps the most significant shortcoming of PWOS filters is the number of parameters, or weights, required for large window sizes or large values of j. For example, a PWOS[2] filter for a window size of N = 11 has 11 · 11 · 10 = 1,210 weights. Besides the obvious memory requirements, the filter may also be difficult to train using adaptive algorithms, requiring many iterations to adequately train all of the weights. In practice, some weights may never be updated in the adaptive
training process. Clearly, a subclass of filters that possesses the desirable attributes of the class of PWOS filters yet has fewer parameters to train and store would be quite valuable. We next describe a method for reducing the filter complexity which is based on coloring, or quantizing, the temporal or rank indexes. Through this quantization, the number of indices, and consequently the number of permutations and combinations of indices, is reduced. Coloring also makes intuitive sense since, to form an acceptable estimate, it is often not necessary to know the exact rank or time of each sample. For instance, the goal of rank-ordering is often to identify outliers. These outliers, for symmetric noise, occupy the two extremes of the ranked set. Thus, the minimum and maximum sample are generally assumed to be outliers. For this case, the minimum and maximum rank could be assigned a common color which indicates outlier samples. Assigning the minimum and maximum samples a common color results in no loss of information, assuming that these samples are non-information bearing noise. For such signals, the number of ranks has been reduced through coloring, and consequently the number of permutations and combinations, but no loss of information has occurred. Similar coloring can be applied to temporal order, for instance, when the input signal is symmetric with respect to temporal order. Coloring is achieved in both temporal and rank order by mapping the numerical indices of each ordering to a set of colors, where the number of colors is less than or equal to the number of indices. The coloring is developed similarly for each ordering; therefore, we give only the development for rank coloring here. For a full discussion on coloring, see Barner and Arce (1997). Coloring is essentially a mapping of indices to a set of colors. For convenience, the colors can be represented by integers. Let c be the coloring operator.
Then for a window size N observation and an M element color set (M ≤ N), the coloring operator c maps each of the N rank indices to one of M colored
Fig. 16. The 9 sample rank coloring.
indices. The updated rank indicator vector R_i = [R_{i1}, R_{i2}, …, R_{iM}]^T will have M entries, where

R_{ik} = { 1   if c(r_i) = k
         { 0   otherwise .   (73)
Note that this definition of the rank indicator vector contains the previous definition as a special case: the two definitions are identical for M = N and c(i) = i for i = 1, 2, …, N. The following example illustrates the rank coloring concept.

EXAMPLE 4.3. Consider an N = 9 observation vector and an M = 3 color set with mapping c = [1, 1, 1, 2, 2, 2, 3, 3, 3]. Thus, this coloring partitions the order statistics into three sets (lower, middle, and upper third) where each set is mapped to a different color, as illustrated in Fig. 16. Suppose x = [7, 3, 11, 9, 2, 1, 5, 4, 12]. Then x_L = [1, 2, 3, 4, 5, 7, 9, 11, 12] and the 9 colored rank indicator vectors are

R_1 = [0, 1, 0]^T   R_2 = [1, 0, 0]^T   R_3 = [0, 0, 1]^T
R_4 = [0, 0, 1]^T   R_5 = [1, 0, 0]^T   R_6 = [1, 0, 0]^T   (74)
R_7 = [0, 1, 0]^T   R_8 = [0, 1, 0]^T   R_9 = [0, 0, 1]^T .

□
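Example 4.3 can be reproduced with a few lines of NumPy (the function name is ours; `c` is the coloring of the example): rank each sample, then pass its rank through the coloring.

```python
import numpy as np

def colored_rank_indicators(x, c, M):
    """Colored rank indicator vectors: rank each sample (1..N), then map
    the rank through the coloring c to one of M colors, eq. (73)."""
    ranks = np.argsort(np.argsort(x)) + 1        # rank of each sample, 1..N
    R = np.zeros((len(x), M))
    for i, r in enumerate(ranks):
        R[i, c[r - 1] - 1] = 1.0                 # unit entry at the rank's color
    return R

x = [7, 3, 11, 9, 2, 1, 5, 4, 12]
c = [1, 1, 1, 2, 2, 2, 3, 3, 3]                  # lower / middle / upper third
R = colored_rank_indicators(x, c, 3)
print(R[0])    # x1 = 7 has rank 6, colored 2
```

Each row of `R` is one colored rank indicator vector of (74).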
The colored rank indicator vectors can be combined and coupled to form colored permutation or combination matrices. This then leads naturally to colored PWOS filters. The coloring concept is an important tool for reducing the filter parameter space. Indeed, for the conditions in the example above, coloring reduces the number of permutations from 362,880 to 1,680, resulting in a significant reduction in the parameter space. The importance of reducing the parameter space increases as the window size grows. Moreover, decreasing the parameter space through coloring does not necessarily lead to decreases in performance. In fact, through coloring it is possible to increase the window size beyond that which is practical without coloring. This performance gain achieved through increasing the window size generally more than offsets the cost of coloring. This is especially true if the color mapping is well chosen. In the example above, the coloring was chosen to uniformly quantize the ranks. Such a quantization is often not optimal. While optimally selecting the coloring is difficult due to the nonlinear nature of the problem, a sub-optimal forward sequential optimization procedure can be employed in which the coloring is restricted to be increasing, 1 = c(1) ≤ c(2) ≤ ⋯ ≤ c(N) = M, and the color count (M) is sequentially increased by splitting the color which results in the largest performance gain (Barner and Arce 1997). Figure 17 shows the results of this procedure operating on a speech signal. The figure shows the estimate MSE for filters with window size N = 50, 75, 101, 125, and 151. The temporal/rank information used is the L_1 ℓ_1. Clearly, using the full L_1 ℓ_1 information is not possible for these large window sizes since this would require N^2 weights. Figure 17 shows the decrease in MSE as the number of colors increases. Clearly, the improvement in performance is minimal after roughly 10
Fig. 17. Filter estimate MSE values as a function of the number of colors for window size N = 50, 75, 101, 125, and 151 filters operating on speech sampled at 8 kHz. The dashed lines are the Final Prediction Error (FPE), which penalizes the addition of filter parameters. Each curve is stopped at the FPE minimum.
colors. Interestingly, the optimization results in nonuniform colorings. Figure 18 shows the size of the quantization (coloring) bin that each rank index is mapped to for the final coloring determined by the optimization. As the figure shows, the optimization produces a finer quantization at the rank extremes (near the minimum and maximum) and a coarser quantization of the central ranks (near the median). This example shows that coloring can enhance the temporal/rank coupling concept by reducing the size of the parameter set and allowing larger window sizes.
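The permutation counts quoted for Example 4.3 (362,880 reduced to 1,680) follow from elementary counting: the 9! rank permutations collapse to the number of distinct color sequences, 9!/(3! 3! 3!). A quick check, assuming the uniform three-color partition of the example:

```python
from math import factorial

N = 9
bin_sizes = [3, 3, 3]                 # number of ranks mapped to each color
full = factorial(N)                   # distinct rank permutations: 9!
colored = factorial(N)
for b in bin_sizes:
    colored //= factorial(b)          # permutations within one color are indistinct
print(full, colored)                  # 362880 1680
```

The same multinomial count applies to any nonuniform coloring, with `bin_sizes` set to the chosen bin sizes.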
5. Optimization techniques 5.1. Problem formulation The filtering operations of WOS and PWOS filters are determined by the replication weights and threshold. In many cases, the number of parameters governing the operation can be quite large. Therefore, in order for these filter classes to be useful there must be an automated way to set the parameters. The most common way of setting the filter parameters is adaptively, based on a training set of data. Two MAE optimal adaptive techniques are presented here. The WOS and PWOS filters discussed in this paper have similar structures and only differ in how they select the weights to be applied to the observation samples.
Fig. 18. The nonuniform rank quantization (coloring) resulting from the optimization for the window size N = 125 case. For each rank (1 to 125) the plot shows the size of the quantization bin that each index is mapped into (or equivalently, the number of rank indexes mapped to the same color). The plot shows that at the rank extremes (near the minimum and maximum) the indices are finely quantized while the central ranks (near the median) are coarsely quantized.
This difference has minimal effect on the optimization; at each step only the appropriate weights (according to the observation permutation or combination) need be updated. We therefore develop the optimization using the simpler WOS filter notation. The optimization is derived using threshold decomposition and assumes positive real-valued signals and weights. Let d be the desired signal to be estimated from the noisy observation vector x. Letting W̃ = (w_0, W^T)^T and X̃^R = (−1, (x^R)^T)^T, then by threshold decomposition, the estimate of the desired signal is

d̂ = ∫_0^∞ U(W̃^T X̃^R) dR .   (75)
Using the above expression, we develop two different algorithms for finding the optimal WOS filter. The first uses a linear approximation to the unit step function, resulting in an expression that can be easily optimized using well-known steepest descent and LMS algorithms (Yin and Neuvo 1994). The second algorithm also uses an approximation to the unit step function in (75); however, it uses a sigmoid function, which more closely resembles the unit step in shape and in its upper and lower output limits. Combined with some simplifications, this approximation results in an adaptive algorithm for updating the WOS filter weights in a manner similar to an LMS algorithm (Yin et al. 1993).
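For intuition, (75) can be checked numerically for non-negative integer-valued samples by replacing the integral with a sum over thresholds (a sketch; the name `wos_td` is ours):

```python
import numpy as np

def wos_td(x, W, w0):
    """WOS estimate via threshold decomposition, a discrete version of (75):
    threshold the input at each level R, apply the binary WOS decision
    U(W^T x^R - w0), and stack (sum) the binary outputs."""
    x = np.asarray(x, dtype=float)
    out = 0
    for R in range(1, int(x.max()) + 1):
        xR = (x >= R).astype(float)     # binary threshold signal x^R
        out += int(W @ xR >= w0)        # U(W~^T X~^R) with X~^R = (-1, x^R)
    return out

print(wos_td([1, 5, 3], np.ones(3), 2))   # prints 3, the sample median
```

With unit weights and w0 = (N + 1)/2 the stacked binary decisions reproduce the median, illustrating the stacking property the derivation below relies on.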
5.2. Algorithm I

The linear approximation approach is based on the WOS filter training algorithm first reported in Yin and Neuvo (1994). The error between the desired signal and the WOS estimate is e = d − d̂, or

e = d − ∫_0^∞ U(W̃^T X̃^R) dR .   (76)
The MAE to be optimized can now be written as

E{|e|} = E{ | ∫_0^∞ (D^R − U(W̃^T X̃^R)) dR | } .

This error can be simplified with the use of several WOS filter properties. First, we make use of the stacking property: if d > d̂, then D^R ≥ D̂^R ∀R; if d < d̂, then D^R ≤ D̂^R ∀R, where D̂^R = U(W̃^T X̃^R) (Wendt et al. 1986). This allows us to move the absolute value operator inside the integral, since [D^R − U(W̃^T X̃^R)] is a monotonic function of R,

E{|e|} = E{ ∫_0^∞ |D^R − U(W̃^T X̃^R)| dR } .

Next, the order of the expected value operator and the integral is switched, which is possible because both are linear operators. Lastly, we change the power of the error norm. Because D^R and U(W̃^T X̃^R) are binary-valued, the absolute value of their difference is equal to the square of their difference. Thus,

E{|e|} = ∫_0^∞ E{ (D^R − U(W̃^T X̃^R))^2 } dR .   (81)
The above expression does not lend itself to a gradient-based optimization algorithm, because it would have the derivative of the unit step function (a Dirac delta function) in it. Therefore, we need to replace the unit step with another function. The simplest approximation is to replace the step function with a linear approximation, U(X) ≈ X. Substituting this approximation into the mean absolute error formula (81) and denoting the approximate error as ẽ,

E{|ẽ|} = ∫_0^∞ E{ (D^R − W̃^T X̃^R)^2 } dR .

Expanding the square and moving the weight vector outside of the expected value and integral operators results in

E{|ẽ|} = ∫_0^∞ E{(D^R)^2} dR − 2 W̃^T E{ ∫_0^∞ D^R X̃^R dR } + W̃^T E{ ∫_0^∞ X̃^R (X̃^R)^T dR } W̃ .   (83)
This error expression is structurally similar to the classical square error expression for a linear filter. This is made more clear with the following definitions:

R̂ = E{ ∫_0^∞ X̃^R (X̃^R)^T dR }   (84)

and

ρ = E{ ∫_0^∞ D^R X̃^R dR } ,   (85)

which are the morphological autocorrelation and cross-correlation. To find the optimal weight vector, W̃_0, we take the gradient of (83) with respect to W̃ and set it to zero. This results in the familiar normal equation

−2ρ + 2 R̂ W̃_0 = 0   or   R̂ W̃_0 = ρ ,   (86)
subject to the constraints W̃_i ≥ 0 ∀i. Due to the positivity constraints, the normal equation cannot, in general, be solved by matrix inversion. It can be solved using an iterative non-negative least squares algorithm, however. Since this is computationally expensive, especially for large R̂ and ρ, a simple iterative solution is preferred. Therefore, we use a steepest descent algorithm to find the optimal filter weights.

5.2.1. The steepest descent and LMS algorithms
The steepest descent algorithm converges to a minimum mean squared error solution through iterative updates on the weight vector. To enforce the positivity constraints on the weights, define the projection operator P[·] as

P[X] = { X,   if X > 0 ;
       { 0,   otherwise .   (87)
The steepest descent optimization is based on the weight iteration

W̃(k + 1) = P[ (I − μ R̂) W̃(k) + μ ρ ] ,   (88)

where 0 < μ < 2/(λ_min + λ_max) is the required restraint on μ for guaranteed convergence. The λ_min and λ_max are the smallest and largest eigenvalues of R̂. The weights are initialized to some arbitrary value and then updated until the change in weights is less than some predefined stopping criterion. The disadvantage of this approach is that the correlation and cross-correlation matrices must be estimated. These matrices are often sparse, and can be quite large, especially for PWOS and WOS filters. A simpler approach, in which only N weights are updated at each iteration regardless of the filter class, is to use an
LMS type update. Again the weights are updated according to (88), where now instantaneous estimates of R̂ and ρ are used,

R̂(k) = ∫_0^∞ X̃^R(k) (X̃^R(k))^T dR   (89)

and

ρ̂(k) = ∫_0^∞ D^R(k) X̃^R(k) dR .   (90)
Substituting these estimates into (88) results in the update

W̃(k + 1) = P[ W̃(k) + μ(k) ∫_0^∞ (D^R(k) − W̃^T(k) X̃^R(k)) X̃^R(k) dR ] .   (91)
This update is similar to the commonly used linear LMS algorithm. The major difference between this and the traditional update is that the signals here are binary and the projection operator has been added to ensure that the weights are nonnegative. The fact that the signals are binary can be exploited to eliminate the threshold decomposition altogether. Note that for binary signals, a · b = min{a, b}. Thus, multiplication can be replaced by the minimum operator. Moreover, the minimum operator obeys the threshold decomposition and stacking properties, which allows the update to revert to the multi-level (or real-valued) signal case. Making these substitutions results in the simple update

W_i(k + 1) = P[ W_i(k) + μ(k) ( min{x_i(k), d(k)} − Σ_{j=1}^N W_j(k) min{x_i(k), x_j(k)} + W_0(k) x_i(k) ) ]   (92)

for i = 1, 2, …, N, and

W_0(k + 1) = P[ W_0(k) + μ(k) ( −d(k) + Σ_{j=1}^N W_j(k) x_j(k) − W_0(k) x_max ) ] .   (93)
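One step of the multi-level update in (92)-(93) can be sketched as follows (a sketch under our reading of the update signs, assuming non-negative signals; the function name is ours):

```python
import numpy as np

def wos_lms_step(W, w0, x, d, mu):
    """One LMS-type WOS weight update in the spirit of (92)-(93): for
    binary signals a*b = min{a, b}, so products become minimum operations."""
    x = np.asarray(x, dtype=float)
    cross = np.minimum(x, d)            # min{x_i(k), d(k)}
    G = np.minimum.outer(x, x)          # G[i, j] = min{x_i(k), x_j(k)}
    # weight recursion, with the projection clipping negatives to zero
    W_new = np.maximum(W + mu * (cross - G @ W + w0 * x), 0.0)
    # threshold recursion
    w0_new = max(w0 + mu * (-d + W @ x - w0 * x.max()), 0.0)
    return W_new, w0_new

W, w0 = wos_lms_step(np.ones(3), 2.0, [1.0, 5.0, 3.0], 4.0, 0.01)
```

Only a single window of samples and the scalar desired value enter each step, so no explicit threshold decomposition is performed.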
5.3. Algorithm II

Recall from Section 3.4 that the N-length binary vector x^R can take on, at most, N + 1 different values depending on R, which are given in equation (47). Using this fact the estimate of the desired signal can be written as

d̂ = ∫_0^{x_(1)} U(W̃^T X̃^R) dR + Σ_{i=2}^N ∫_{x_(i−1)}^{x_(i)} U(W̃^T X̃^R) dR ,   (94)
or, substituting in the possible binary vector values and simplifying,

d̂ = x_(1) + Σ_{i=2}^N (x_(i) − x_(i−1)) U(W̃^T X̃^{x_(i−1)}) .   (95)
This simplified expression for the output can be used in conjunction with an approximation of the unit step function to solve the optimization problem. By approximating the unit step with the sigmoid function

U_s(X) = 1 / (1 + e^{−X}) ,   (96)

which has derivative

d{U_s(X)}/dX = e^{−X} / (1 + e^{−X})^2 = U_s(X)[1 − U_s(X)] ,   (97)
a steepest descent optimization algorithm can be derived. Taking the derivative of the MAE expression in (81) with the step approximation and estimate expressions above,

∇_W̃ E{|e|} = −2 ∫_0^∞ E{ (D^R − U_s(W̃^T X̃^R)) U_s(W̃^T X̃^R) (1 − U_s(W̃^T X̃^R)) X̃^R } dR .   (98)

Denoting the estimate of the desired signal as D̂_s^R = U_s(W̃^T X̃^R), the gradient becomes

∇_W̃ E{|e|} = −2 ∫_0^∞ E{ (D^R − D̂_s^R) D̂_s^R (1 − D̂_s^R) X̃^R } dR   (99)
           = −2 ∫_0^∞ E{ Δ^R X̃^R } dR ,   (100)

where Δ^R = (D^R − D̂_s^R) D̂_s^R (1 − D̂_s^R). An instantaneous estimate of the gradient in (100) can be obtained by simply removing the expected value operator. Using the instantaneous estimate, an LMS-like update equation for finding the optimal weights can be found (Yin et al. 1993):

W_i(n + 1) = P[ W_i(n) + 2μ ( x_(1) Δ^0 + Σ_{j=2}^N (x_(j) − x_(j−1)) Δ^{x_(j−1)} X̃_i^{x_(j−1)} ) ]   (101)

for i = 1, 2, …, N, where the integrals have been replaced by their equivalent summations. Yin et al. show that further simplifications can be made by using only those vectors near the decision boundary, which have the most impact on the weight update equation. This is true since Δ^R peaks at the point R = d̂, and rapidly falls off to zero as R moves away from this point. This can be approximated by saying that Δ^R is non-zero only at the point R = d̂. Thus, all the terms in (101) that do not contain Δ^{d̂} are zero. Using this simplification and the fact that d̂ is constrained to be one of the input samples, we can write (101) as

W_i(n + 1) = P[ W_i(n) + 2μ (x_(ℓ+1) − x_(ℓ)) Δ^{x_(ℓ)} U(x_i − x_(ℓ)) ] ,   (102)

where x_(ℓ) = d̂ and x_(ℓ+1) is the next strictly larger sample in x. Next, we make the observation that the Δ^{x_(ℓ)} term is either a positive or negative constant depending upon whether d > d̂ or d < d̂. Absorbing the magnitude of Δ^{x_(ℓ)} into the stepsize (μ), (x_(ℓ+1) − x_(ℓ)) Δ^{x_(ℓ)} reduces to (x_(ℓ+1) − x_(ℓ)) sgn(d − d̂). In order to simplify the weight update equation, we make the following qualitative assessment. Since the objective of gradient-based training algorithms is to update the weights by a factor proportional to the difference between the desired signal and the filter estimate, (d − d̂) is a valid approximation to (x_(ℓ+1) − x_(ℓ)) sgn(d − d̂). Therefore, the final form of the fast adaptive WOS algorithm is (Yin et al. 1993):
W_i(n + 1) = P[ W_i(n) + 2μ (d − d̂) U(x_i − d̂) ] ,   i = 1, …, N   (103)

W_0(n + 1) = P[ W_0(n) − 2μ (d − d̂) ] .   (104)
Thus, when the filter output is less than the desired signal, those weights corresponding to the observation samples that are larger than the filter output are
Fig. 19. Frame 35 of original video signal (resolution: 720 x 486).
Fig. 20. Frame 35 with contaminated Gaussian noise plus interference, MSE = 965.
incremented. The threshold is decremented if the filter output is smaller than the desired signal and incremented if the filter output is larger than the desired signal. Obviously, this algorithm is less complex than Algorithm I, requiring only one multiplication per iteration. Yin, Astola and Neuvo (1993) state that Algorithm II is more sensitive to the initial weight vector than Algorithm I; however, we have always achieved convergence with an initial weight vector in which all weights are equal and the threshold equals half of the sum of the N weights (i.e., the median).
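The final recursion (103)-(104) is simple enough to state in a few lines (a sketch; the WOS output helper is our own, computed by accumulating weight mass in descending sample order):

```python
import numpy as np

def wos_out(x, W, w0):
    """Current WOS filter output for weights W and threshold w0."""
    mass = 0.0
    for i in np.argsort(x)[::-1]:          # descend through the samples
        mass += W[i]
        if mass >= w0:
            return x[i]
    return x[np.argsort(x)[0]]             # threshold exceeds total mass

def fast_wos_step(W, w0, x, d, mu):
    """One iteration of (103)-(104): weights of samples at or above the
    current output move with the error; the threshold moves against it."""
    x = np.asarray(x, dtype=float)
    y = wos_out(x, W, w0)                  # current estimate d_hat
    e = d - y
    W_new = np.maximum(W + 2 * mu * e * (x >= y), 0.0)   # gate U(x_i - d_hat)
    w0_new = max(w0 - 2 * mu * e, 0.0)
    return W_new, w0_new
```

Only a comparison per weight and a single multiplication per iteration are needed, which is the low complexity the text emphasizes.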
6. Applications to image restoration

In this simulation we demonstrate the effectiveness of PWOS filters in the reconstruction of a distorted video sequence. The video sequence has a 720 by 486 pixel resolution per frame. The distortion consists of additive impulsive noise and a tone interference i(x, y) = 100 sin[0.25π(x − y)]. The probability that an impulse occurs is 0.05. We compare the results of linear, WOS, permutation WOS (j = 2), and colored PWOS (j = 2) filters, all using a square window of width 5. The output of a permutation Lℓ filter is also shown, which has very distinct characteristics compared to selection-type filters. All filters were optimized using the first 5 frames of the sequence and used to filter 60 additional frames. Figures 19-23 show frame 35 of the original video signal, its noisy observation, and its various estimates. The respective reconstruction MSEs are shown in the captions. Clearly,
Fig. 21. Optimal WOS filter estimate (enlarged), MSE = 87.5.

Fig. 22. Permutation WOS filter estimate (enlarged) with j = 2, MSE = 22.3.
the linear filter destroys the fine details, and the WOS filter does not completely remove the tone interference. The PWOS filters all attain good performance with various degrees of complexity. The colored PWOS filter yields a very good performance at a small cost in complexity.
Fig. 23. Colored PWOS filter estimate (enlarged) with j = 3, MSE = 21.4; five colors are used.
Fig. 24. Optimal linear filter estimate (enlarged), MSE = 89.2.
Fig. 25. Permutation Lℓ filter estimate (enlarged), MSE = 31.6.
7. Conclusion
The fundamentals of weighted order-statistic filters and permutation weighted order-statistic filters are presented in this paper. The simplest of these, the median and center weighted median (CWM) filters, were described through their statistical and deterministic properties. It was shown that simple, yet effective, filters can be designed by varying the center weight of CWM filters. For the larger class of weighted median and permutation weighted order statistic filters, we presented a design methodology based on the minimization of the mean absolute error (MAE) of the estimate. These methods rely on the threshold decomposition property characteristic of these filters. Two simple adaptive filter algorithms were presented which can be used to train permutation WOS filters. Thus, given a desired signal and a corresponding observation sequence, a PWOS filter can be easily designed for any application where the training signals can be made available. In order to illustrate the performance of WOS and PWOS filters, we reviewed the image restoration problem where a video sequence is corrupted by impulsive noise and a tone interference. Using the adaptive algorithms described in this paper, we restored the video sequence using several PWOS filters and compared them with Lℓ-type filters. We showed that PWOS filters preserve edges and discontinuities more effectively than Lℓ filters; however, Lℓ filters have more flexibility since their output is not constrained by the input value set. The applications presented here have been biased toward digital image and video communications. WOS and PWOS filters, however, can be readily used in other applications of time series analysis.
References

Arce, G. R., N. C. Gallagher, Jr. and T. A. Nodes (1986). Median filters: theory and applications. In: Advances in Computer Vision and Image Processing (T. S. Huang, ed.), vol. 2, JAI Press.
Arce, G. R., T. A. Hall and K. E. Barner (1995). Permutation weighted order statistic filters. IEEE Transactions on Image Processing, vol. 4.
Barner, K. E. and G. R. Arce (1994). Permutation filters: A class of non-linear filters based on set permutations. IEEE Transactions on Signal Processing, vol. 42.
Barner, K. E. and G. R. Arce (1997). Design of permutation order statistic filters through group colorings. Vol. 44, no. 7, July.
Brownrigg, D. R. K. (1984). The weighted median filter. Commun. Assoc. Comput. Mach., vol. 27.
Coyle, E. J., J.-H. Lin and M. Gabbouj (1989). Optimal stack filtering and the estimation and structural approaches to image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37.
David, H. A. (1982). Order Statistics. New York, Wiley Interscience.
Donnellan, T. (1968). Lattice Theory. New York, Pergamon Press.
Fitch, J. P., E. J. Coyle and N. C. Gallagher, Jr. (1984). Median filtering by threshold decomposition. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32.
Fitch, J. P., E. J. Coyle and N. C. Gallagher, Jr. (1985). Threshold decomposition of multidimensional ranked-order operations. IEEE Transactions on Circuits and Systems, vol. 32.
Gallagher, N. C., Jr. and G. L. Wise (1981). A theoretical analysis of the properties of median filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29.
Gandhi, P. and S. A. Kassam (1991). Design and performance of combination filters. IEEE Transactions on Signal Processing, vol. 39.
Hardie, R. C. and C. G. Boncelet, Jr. (1993). LUM filters: A class of rank order based filters for smoothing and sharpening. IEEE Transactions on Signal Processing, vol. 41.
Hardie, R. C. and K. E. Barner (1994). Rank conditioned rank selection filters for signal restoration. IEEE Transactions on Image Processing, vol. 3.
Hardie, R. C. and K. E. Barner (1996). Extended permutation filters and their application to edge enhancement. IEEE Transactions on Image Processing, vol. 5.
Kim, Y.-T. and G. R. Arce (1994). Permutation filter lattices: a general order-statistic filtering framework. IEEE Transactions on Signal Processing, vol. 42.
Ko, S.-J. and Y. H. Lee (1991). Center weighted median filters and their applications to image enhancement. IEEE Transactions on Circuits and Systems, vol. 38.
Palmieri, F. and C. G. Boncelet, Jr. (1989). Ll-filters - a new class of order statistic filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37.
Pitas, I. and A. N. Venetsanopoulos (1989). Non-linear Filters. Kluwer.
Tukey, J. W. (1974). Nonlinear (nonsuperimposable) methods for smoothing data. In: Conf. Rec., Eascon.
Wendt, P., E. J. Coyle and N. C. Gallagher, Jr. (1986). Stack filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34.
Wendt, P., E. J. Coyle and N. C. Gallagher, Jr. (1986). Some convergence properties of median filters. IEEE Transactions on Circuits and Systems, vol. 33.
Yin, L., J. Astola and Y. Neuvo (1993). Adaptive stack filtering with applications to image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 41.
Yin, L. and Y. Neuvo (1994). Fast adaptation and performance characteristics of FIR-WOS hybrid filters. IEEE Transactions on Signal Processing, vol. 42.
Yli-Harja, O., J. Astola and Y. Neuvo (1991). Analysis of the properties of median and weighted median filters using threshold logic and stack filter representation. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 39.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Order Statistics in Image Processing
Scott T. Acton and Alan C. Bovik
1. Introduction
In the age of the home computer and desktop workstation, the processing, analysis, storage and display of visual information as digital imagery is becoming commonplace. High definition television (HDTV), direct TV, photo CDs, the World-Wide Web and video teleconferencing are all examples of emerging technologies that utilize digital images. This technological explosion has also been accompanied by many advances in the processing of digital media. Thus, digital image processing has become one of the fastest-growing areas of research and development in the electrical and computational sciences. Current areas of significant research in image processing include the development of theory and algorithms for the acquisition, display, decomposition, reconstruction, compression, enhancement, restoration and analysis of digital image data. The environments in which digital image data are captured are often subject to unpredictable noise levels, interference and modeling uncertainties. Thus, statistical techniques in general have been widely adopted in digital image processing and numerous papers have elaborated optimization criteria for performing image processing tasks. Not surprisingly, in practical applications, the lack of knowledge of signal contamination suggests the use of robust techniques to accomplish many of these tasks, and so, order statistics have found an important place in the image processing literature. Although the statistical properties of local order statistic-based estimators are well-understood, and their application to signal and image processing might be guessed at, there are many serendipitous non-statistical attributes of order statistic methods that make them particularly appealing. Some of these attributes are so striking that order statistic based methods have come to dominate certain areas of computerized image processing. 
It is hoped that making these applications accessible to statisticians, applied mathematicians, non-electrical engineers, etc., will increase the cross-fertilization of new concepts. Thus, this chapter seeks to summarize several of the most significant applications of order statistics in the area of digital image processing. It is appropriate at this point to give some background into the problem of digital image processing, and the overall motivation for using order statistics in
S. T. Acton and A. C. Bovik
image processing problems. Digital images may be obtained by sampling and quantizing a continuous image, whether color or black-and-white. The continuous image may be represented as a continuous electrical waveform, such as a television signal, or as a physical negative image, positive slide, or photograph (strictly speaking, a photograph is quantized at the level of the grains in the emulsion; usually, however, this is much finer than the sampling resolution). Digital images may also be acquired directly, as from a digital camera, for example; such instruments are now becoming widespread and extremely inexpensive. Often, the image obtained will be corrupted by noise or cluttered with unessential detail. The noise may arise from electrical noise, from imperfections in the imaging environment (ranging from faulty sensors to dust on the camera lens!), or from channel errors arising in the transmission of the digital data. The techniques of digital signal processing (DSP) offer powerful methods to improve and analyze these data, and have found application in the area of image processing for over three decades. Most basic among these tools is the digital filter, which, most generically, is a transformation of a digital signal or image into another signal or image, according to some prescription. However, most of the existing theory, and certainly, practical implementations have involved the theory and design of linear digital filters. Linear filters are, of course, transformations that obey the linear superposition property, examples of which can be found in many aspects of everyday life. Automobiles are equipped with mufflers (acoustic filters) and shock absorbers (mechanical filters); hi-fi systems are supplied with electrical filters (now often of the digital variety) that provide a clear signal output. Such filters can be implemented (or modeled) by linear convolution, and can be described by a frequency response.
Thus, linear filtering may always be viewed as a process which amplifies, attenuates, and phase shifts the individual frequency components of the input signal. The design of linear filters thus always implies some specification based on frequency selection. Linear filters can be expeditiously implemented in the frequency domain via the Cooley-Tukey Fast Fourier Transform (FFT) techniques, or other recent fast algorithms. Indeed, the initial euphoria surrounding signal processing in the 1960s may be attributed to the efficacy and the straightforward implementation of FFT-based linear filters. Despite their many advantages, linear filtering approaches suffer from limitations which have emerged in certain application domains, and which prove to be difficult to surmount without abandoning the linear approach entirely. For example, a common task in image processing is the removal of broadband additive noise (e.g., white noise), which occurs in many image signals, such as radar data, electron micrographs, tomographic images, etc. When the noise is broadband, it is inevitable that the signal and noise frequencies will overlap, often in regions of the spectrum that support key pieces of signal information. Linear filtering methods have no ability to distinguish between noise and signal components occupying the same parts of the spectrum, and so, the process of noise reduction can also result in the distortion of important signal properties such as image edges and image detail.
Order statistics in image processing
Thus, nonlinear image filtering, and as we will see nonlinear image processing paradigms in general, have become an area of energetic research. Numerous techniques have been proposed, implemented, demonstrated, and argued, but in only one area of nonlinear image processing has there emerged a substantive discipline based on mathematical rigor, general applicability, and repeatable performance. That area may be accurately described as order statistic-based image processing, although the wealth of research that has emerged has revealed a wide variety of new nonlinear methods that are not intrinsically involved with order statistics, but which are, nevertheless, direct descendants of order statistic-based methods. In this overview, we will highlight several established developments in the area of order statistic-based image processing. In addition, a number of recent innovations and extensions to the basic theory will be illuminated. Throughout, the practical application of these techniques to digital signal processing, and image processing in particular, will be emphasized. At this point in time, a number of image processing devices, such as the median filter, the order statistic filters, and many related techniques are no longer novelty items, but instead, are standard image processing tools. The goal of this contribution is to give an overview of the role that order statistics have played in this development. The plan is not to be exhaustive, but instead, to give suggestive results while maintaining a modicum of analysis. Because of the disparity between the notations usually used by statisticians and by image processing engineers, we shall adopt a simple format and exposition in an effort to maximize communication.
2. Order statistic filters
An M-dimensional digital image will be denoted {x_n : n ∈ V ⊆ Z^M}, where Z is the set of integers, and where the domain V is discrete and countable (and in practice, finite): typically an M-dimensional integer interval. Such digital images are also quantized to have discrete, finite ranges; computer processing requires this, although the analysis does not always make this explicit. The elements x_n represent samples of the image, called pixels (picture elements). Usually, this equates to samples of optical image intensity, or brightness (which is positive-valued), but it may also represent other attributes of sensed radiation. Indeed, the variety of image types is as broad as the different bands of sensed radiation, which may be electromagnetic, sonic, or atomic, and also depends on the types of sensors used to capture the data. Some images may even be complex-valued, as in the case of synthetic aperture radar. However, for simplicity of exposition, we will assume that the samples x_n are real-valued samples of optical images, and all of the image processing examples given will be standard optical images; however, the techniques summarized in this chapter can be, and have been, applied to an amazing diversity of image types. Two-dimensional digital images (the most common variety, representing samples of static, spatial imagery) are usually represented as a two-dimensional
matrix of values, {x_{i,j}}, where i indexes the image row and j the image column. However, in order to maintain generality, we will utilize the M-dimensional form x_n. There are two reasons for this. First, many image signals are of a higher dimensionality, the most obvious example being digital video, where there are typically two spatial indices and one time index. There are also images naturally having three spatial dimensions, such as tomographic data or optical sectioning microscopic images. The second reason we will adopt the general M-dimensional form is that most order statistic-based processing paradigms do not make use of the spatial ordering of the data beyond gathering the data locally; instead, the algebraic ordering is considered (prior to weighting, processing, etc.). Regardless of dimensionality, a digital filter transforms a signal {x_n : n ∈ V ⊆ Z^M} (hereafter simply {x_n}) into another, filtered signal {y_n}, usually of the same dimensionality and domain, with the intention that the result will be somehow improved: less noisy, for example, or with certain features enhanced or emphasized.

Median and rank-order filters
The median filter (Tukey 1971; Tukey 1974) is a simple nonlinear device which does precisely what linear filtering paradigms could not accomplish - remove high-frequency noise from signals without disturbing certain high-frequency informational components - specifically, sharp, sustained transitions and trends, such as edges at the boundaries of objects, or other details representing minute, yet significant structure in the image signal. And more - it had the capability to eradicate impulsive noise occurrences with amazing efficiency. And best of all, it was conceptually trivial and easy to implement - all one had to do was define a moving window, and replace incoming signal points with the computed sample medians. Shortly after the conception of the median filter, a number of DSP engineers immediately foresaw that it might prove to be useful for such significant application tasks as speech signal filtering (Rabiner et al., 1975) and image filtering (Frieden, 1976). However, at that time there was no DSP theory for the median filter, and while there did exist a vast statistical literature on the sample median and other order statistics, it was both written in a different technical language, and dealt mostly with static population samples, and very little with signals. Since then, a rich DSP theory of order statistics has begun to emerge, which may be regarded as the DSP engineer's contribution to the field of order statistics. A significant part of this contribution has been made by engineers concerned with image signals, which are strongly characterized by the type of information that the median filter tends to favor. Given an M-dimensional image {x_n}, the median filter requires the definition of an M-dimensional moving window with a fixed number N of elements, to compute each individual output value.
A window is really a pre-specified geometric or temporal law for collecting image samples around each coordinate, in order to perform the median or other operation on them. Once the samples are collected within the window, their relative positions in space (within the window) are discarded, and so it suffices to define a one-dimensional vector to contain
them. Therefore, at image coordinate n, define the windowed set of pixel values by the one-dimensional vector x. =
(1)
This notation indicates that there are N image pixels, indexed 1, ..., N, that have been captured from the vicinity of image coordinate n. It will always be assumed that the window covers an odd number N = 2m + 1 of image samples. This is easily justified: at each image coordinate n the filter window is centered directly over the sample to be created at the same coordinate in the output image; the window is usually desired to be (even) symmetric along each dimension; and the window ordinarily contains the current pixel over which the window is centered.

Median filter windows
The windowing law used to collect spatial image samples operates according to a geometric structure such as a square-shaped, cross-shaped, hexagonally-shaped, or other such window, several of which are depicted in Figure 1. Usually, windows having a nearly circular geometry are favored, since this encourages isotropic (direction-insensitive) processing of the data (the ROW and COLUMN filter geometries being an exception to be explained shortly). Of course, it is sometimes desirable to encourage directional processing, in which case a more
[Figure 1 panels: 3x3 SQUARE, 5x5 SQUARE, 3x3 CROSS, 5x5 CROSS, 3x3 XSHAPE, 5x5 XSHAPE, 1x3 ROW and 3x1 COLUMN, 1x5 ROW and 5x1 COLUMN]
Fig. 1. Typical window geometries for the median filter and other order statistic filters. The white square denotes the center pixel covered by the window.
oblate window may be selected. The windowing law is defined by a set of vector shifts B = {b_1, ..., b_N} (usually centered around the zero vector shift 0) that determine the coordinates of the image samples that will be captured into the windowed set. With respect to a given sample x_n, the window B may be considered an operator (a kernel) such that (x_{n+b_1}, ..., x_{n+b_N})^T = x_n^T. Thus, in Fig. 1, the 3 x 3 SQUARE window B = ((-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)), and at each coordinate n = (i,j) the windowed set associated with B is
x_n = (x_{i-1,j-1}, x_{i-1,j}, x_{i-1,j+1}, x_{i,j-1}, x_{i,j}, x_{i,j+1}, x_{i+1,j-1}, x_{i+1,j}, x_{i+1,j+1})^T .
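To make the windowing operation concrete, the following is a minimal sketch (our own illustrative code, not from the chapter) of how the windowed set is gathered for the 3 x 3 SQUARE window of shifts B:

```python
# Sketch: gather the windowed set x_n at coordinate (i, j) using the
# window shifts B. Names are illustrative; interior coordinates only.

def windowed_set(image, i, j, offsets):
    """Collect the pixel values around (i, j) selected by the shifts B."""
    return [image[i + di][j + dj] for (di, dj) in offsets]

# The 3 x 3 SQUARE window of vector shifts, centered on (0, 0).
B = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]

print(windowed_set(image, 1, 1, B))  # row-major scan of the 3x3 patch
```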
Before actually defining the simpler filters, one other detail needs explanation. When the window index n of a nonlinear filter is brought next to or near the edge of the image, part of the window will overlap with "empty space"; viz., certain elements of x_n will be undefined. This is depicted in Figure 2 for a 3 x 3 SQUARE filter. Although there is not strong theoretical justification for it (and other methods are used), the usual convention that is followed in defining the windowing operation near the borders of the image is to (conceptually) fill the "empty" window slots with the value of the nearest well-defined image pixel. This is called pixel replication. It is usually regarded as more satisfactory than filling with zero values, since zero-filling implies that the scene being imaged falls to a zero intensity beyond the border, which is generally not the case. Another approach is to reflect pixel values, i.e., define the windowed pixels beyond
Fig. 2. Depiction of a 3 x 3 SQUARE window overlapping the borders of an image at different locations.
the image border by symmetrically reflecting pixels across the axis defined by the image border. This often has the advantage of preserving the continuity of patterns, but it is also harder to program.
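The two border conventions just described can be sketched for a one-dimensional signal as follows; the function names and the half-width parameter m are our own illustrative choices:

```python
# Sketch of the two border conventions for a 1-D signal and window
# half-width m: pixel replication vs. reflection across the border.

def pad_replicate(signal, m):
    """Pixel replication: repeat each boundary sample m times."""
    return [signal[0]] * m + list(signal) + [signal[-1]] * m

def pad_reflect(signal, m):
    """Reflection: mirror samples across each border (border not repeated)."""
    left = signal[1:m + 1][::-1]      # mirror of x[1..m]
    right = signal[-m - 1:-1][::-1]   # mirror of x[-m-1..-2]
    return left + list(signal) + right

x = [3, 1, 4, 1, 5]
print(pad_replicate(x, 2))  # [3, 3, 3, 1, 4, 1, 5, 5, 5]
print(pad_reflect(x, 2))    # [4, 1, 3, 1, 4, 1, 5, 1, 4]
```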
Median filter definition and examples

With this notation in hand, it becomes possible to define and develop a large class of nonlinear filters. Given a windowed set x_n of image samples collected at and near coordinate n, define the vector of order statistics of x_n by

x_(·) = (x_{(1):n}, ..., x_{(N):n})^T ,    (2)

where x_{(1):n} ≤ ... ≤ x_{(N):n}. The output of the median filter is then easily defined: if {y_n} = med{x_n}, then

y_n = x_{(m+1):n} .    (3)
As an example of the median filter's ability to eliminate impulse noise, see Fig. 3. Figure 3a displays a digitized optical image of a tower on the University of Texas at Austin campus. The image has been digitized to a spatial resolution of 256 x 256 pixels (65,536 pixels total), with each pixel's intensity quantized to one of 256 possible gray levels (8 bits), indexed from 0 (black) to 255 (white). The image in Fig. 3(b) contains "salt and pepper" impulse noise - 25% of the image brightness values have been randomly changed to extreme white or black values (with equal probability). Figures 3(c), 3(d), and 3(e) show the result of median filtering the noisy image in Fig. 3(b) using (square) windows of sizes 3 x 3, 5 x 5, and 7 x 7 (covering 9, 25, and 49 pixels), respectively. The quality of the enhanced images in Figs 3(c)-3(e) is quite variable - there is a strong dependence in general between median filter results and filter geometry, as might be expected. The 3 x 3 filter was not able to remove clustered outliers very effectively - although the overall density of impulses was reduced, other groupings of outlying values were merged into larger structures, which could lead to misinterpretation of their nature. By contrast, using a larger 5 x 5 filter window provided superb noise suppression - certainly better than attainable with any linear-type filter - while not degrading the informational part of the image much - the edges, or object boundaries, and much of the detail remain sharp. This result can definitely be interpreted as an enhancement of the image. Taking a yet larger filter window, however, as evidenced by the result attained using the 7 x 7 window, can lead to excessive destruction of some of the useful information. There is clearly a tradeoff between noise suppression and information retention in the selection of an effective median filter window for image smoothing.
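A minimal sketch of such a median smoother (our own illustrative code, not the authors' implementation), using a (2m+1) x (2m+1) SQUARE window with pixel replication at the borders; a single impulse in a constant patch is removed exactly:

```python
# Sketch of eq. (3) for a 2-D image: a (2m+1)x(2m+1) SQUARE median
# filter with pixel replication at the borders. Names are illustrative.

def median_filter(image, m=1):
    rows, cols = len(image), len(image[0])

    def pix(i, j):  # pixel replication: clamp indices to the image
        return image[max(0, min(i, rows - 1))][max(0, min(j, cols - 1))]

    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = sorted(pix(i + di, j + dj)
                            for di in range(-m, m + 1)
                            for dj in range(-m, m + 1))
            row.append(window[len(window) // 2])  # middle order statistic
        out.append(row)
    return out

# A constant patch corrupted by one "salt" impulse is restored exactly:
noisy = [[5, 5, 5],
         [5, 255, 5],
         [5, 5, 5]]
print(median_filter(noisy, m=1))
```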
Median filter roots

Prior to 1980, little work was done on characterizing the temporal or spatial properties of the median filter - as with all nonlinear filters, difficulties in analysis often lead to intractable problems in time- or space-domain signal processing. Indeed, the only statistical treatment of the subject prior to 1980 that we have been able to
[Figure 3: panels (a)-(f); see caption below.]
find that actually incorporates time- or space-dependent information is H. A. David's work characterizing the correlations of ranges from overlapping samples (David 1955). This work has since been extended towards developing the spectral properties (second-order distribution and selected moment properties) of stationary signals that have been median filtered or filtered by other order statistic-related filters (Bovik and Restrepo 1987; Kuhlman and Wise 1981). The first significant characterizations of the non-static properties of the median filter demonstrated that certain interesting one-dimensional signals, called root signals (or fixed points), are invariant to one-dimensional median filtering (Gallagher and Wise 1981; Tyan 1981; Longbotham and Bovik 1989). While we do not plan to spend much time in discussion of one-dimensional signals (they are discussed at great length elsewhere in this volume), some mention of the basic root signal properties will enhance the discussion of two-dimensional signals. The root signals are characterized by a novel measure of signal smoothness, termed local monotonicity. A one-dimensional signal {x_n} is locally monotonic of degree d, or LOMO-(d), if every subsequence of {x_n} of d successive elements forms a monotonic (increasing, decreasing, or constant) sequence.

THEOREM 1. Tyan (1981), Longbotham and Bovik (1989): Suppose that the one-dimensional signal {x_n} contains at least one monotonic segment (x_k, ..., x_{k+m}) of length m + 1. Then the output of a length N = 2m + 1 median filter satisfies y_n = x_{(m+1):n} = x_n for every n if and only if {x_n} is LOMO-(m + 2).
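The LOMO-(d) property is easy to test directly; the following is an illustrative sketch (names are our own):

```python
# Sketch of the LOMO-(d) predicate: every d successive samples must
# form a monotonic (nondecreasing or nonincreasing) run.

def is_lomo(signal, d):
    for k in range(len(signal) - d + 1):
        seg = signal[k:k + d]
        up = all(a <= b for a, b in zip(seg, seg[1:]))
        down = all(a >= b for a, b in zip(seg, seg[1:]))
        if not (up or down):
            return False
    return True

print(is_lomo([0, 1, 2, 2, 1, 0], 3))  # ramps joined by a plateau: True
print(is_lomo([0, 1, 0, 1, 0, 1], 3))  # oscillation: False
```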
This unexpected result gives deeper insights into the type of signal that the median filter "prefers," and it also suggests the possibility of a limited eigenfunction-like analysis for median filter design and performance study. Even more surprising is the fact that repeated median filtering reduces signals to roots: repeated application of the median filter to most finite-length signals (excluding only oscillating bi-valued signals) results in convergence to a root signal in a finite number of iterations:

THEOREM 2. Gallagher and Wise (1981), Tyan (1981): A one-dimensional median filter with window size N = 2m + 1 will reduce a length-L signal {x_n} to a root signal that is LOMO-(m + 2) in at most (L - 2)/2 repeated passes.

Some mention needs to be made at this point about signal duration. In Theorem 1, it is assumed that the signal may be doubly infinite in duration, although it may have finite support. In Theorem 2, the signal is assumed finite in extent (and undefined beyond) - and so there must be some mechanism for defining the filter operation
Fig. 3. (a) Original "Tower" image. (b) "Tower" corrupted by 25% salt-and-pepper noise. (c) 3 x 3 SQUARE median filter applied to noisy image. (d) 5 x 5 SQUARE median filter applied to noisy image. (e) 7 x 7 SQUARE median filter applied to noisy image. (f) 50 iterations of 3 x 3 SQUARE median filter applied to noisy image
(so that the window is filled with samples near the signal endpoints). The agreed-upon method of doing this is to "pad" the signal, by appending m samples at the beginning of the signal (all equal to the first sample), and m samples to the end of the signal (all equal to the last sample). This has the added benefit of introducing the requisite monotonic segment of length m + 1 of Theorem 1. In fact, for finite-length signals with padding, Theorem 1 can be modified. In the context of image filtering, this process becomes identical to pixel replication. Both of these results hold only for signals of one dimension; similar results for true two-dimensional windows, such as SQUARE, CROSS and XSHAPE in Fig. 1, have not been developed. Nevertheless, understanding the behavior of the one-dimensional median filter, and its root structures, does yield insights into its efficacy in image processing applications. In image processing parlance, the median filter is frequently referred to as an "edge-preserving, impulse-suppressing" image smoothing device. For one-dimensional signals, the preceding theorems make clear that the median filter does not disturb signal trends - even sharp ones - and of course, the sample median possesses high efficiency for estimating data contaminated by outliers or by heavy-tailed noise influences. We will not spend much time on the statistical characterization of the median filter here, other than reviewing some distribution formulae for two-dimensional windows. The reason for this is that, although statistical considerations are equally important for image processing applications as for standard statistical applications, most of them are available elsewhere in this volume, or already familiar to the reader. As mentioned above, there has been no theory developed that describes the root-signal structures of median filters defined with two-dimensional windows, such as those depicted in Fig. 1.
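Theorem 2, together with the padding convention just described, can be illustrated by iterating a one-dimensional median filter until the signal stops changing; this sketch (our own, using replication padding) converges to a LOMO-(m + 2) root:

```python
# Sketch of Theorem 2 in action: iterate a length 2m+1 running median
# (with replication padding) until the signal is a fixed point (root).

def median_pass(x, m):
    """One pass of a length 2m+1 running median with replication padding."""
    padded = [x[0]] * m + list(x) + [x[-1]] * m
    return [sorted(padded[k:k + 2 * m + 1])[m] for k in range(len(x))]

def filter_to_root(x, m):
    """Apply the median filter repeatedly until the output stops changing."""
    while True:
        y = median_pass(x, m)
        if y == x:
            return x
        x = y

noisy = [0, 0, 9, 0, 0, 5, 5, 5, 0, 5, 5]
print(filter_to_root(noisy, m=1))  # a LOMO-3 step signal
```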
Indeed, it is not at all clear that a well-defined class of such root signals exists, although it is trivial to construct examples of two-dimensional root signals for any median filter window (for example, a two-dimensional signal that is constant along one dimension, and locally monotonic of sufficient order along the orthogonal direction). Part of the reason for this is that there is no clear definition of two-dimensional local monotonicity, since the orientation of each image path along which the pixel values are to be locally monotonic must be considered.
Separable median filter

Actually, there is a type of two-dimensional median filter that does admit a root signal analysis, although the filter windows that define it are not truly two-dimensional. The so-called separable median filter is defined by filtering the image row-by-row with a horizontally-oriented filter window, then filtering the result column-by-column with a vertically-oriented filter. Figure 1 depicts length-three and length-five ROW and COLUMN windows that define the separable median filter. The term "separable" is not accurate, since the so-called separable filter is not equivalent to a true two-dimensional filter result (i.e., not equivalent to the CROSS filter). Also, row filtering followed by column filtering will not generally yield the exact same result as column filtering followed by row filtering, although the two results will generally be qualitatively nearly identical.
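A sketch of the separable median (our own illustrative code): each row is filtered with a 1 x (2m+1) window, then each column of the result with a (2m+1) x 1 window:

```python
# Sketch of the separable median: a row stage followed by a column
# stage, each a 1-D running median with replication padding.

def median_pass(x, m):
    padded = [x[0]] * m + list(x) + [x[-1]] * m
    return [sorted(padded[k:k + 2 * m + 1])[m] for k in range(len(x))]

def separable_median(image, m=1):
    rows = [median_pass(r, m) for r in image]          # row stage
    cols = [median_pass(list(c), m) for c in zip(*rows)]  # column stage
    return [list(r) for r in zip(*cols)]               # transpose back

img = [[0, 0, 0, 0],
       [0, 9, 0, 0],
       [0, 0, 0, 0]]
print(separable_median(img, m=1))  # the isolated impulse is removed
```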
The roots of the separable median filter are described by Nodes and Gallagher (1983), who found that the two-dimensional roots are images that must be roots of both the row and column filters individually (locally monotonic along both directions), except for certain instances where there may be an image region over which the image is strictly bi-valued and oscillates in a certain way in that region. However, such binary oscillating regions are so rare in digital imagery that the bi-directional root property practically characterizes the two-dimensional roots of the separable median filter. Stated another way, the practical roots of a separable median that utilizes windows of span 1 x (2m + 1) and (2m + 1) x 1 are images whose rows and columns are all LOMO-(m + 2). This property approximates many naturally-occurring images, and often is taken as a simple and natural definition of two-dimensional local monotonicity. Nodes and Gallagher also show that repeated application of a separable median filter to an image of finite extent will nearly always eventually reduce the image to a root in a finite number of applications. The exceptions are, again, certain bi-valued oscillating images that are not encountered in practice. Further root image analysis for recursive modifications of the separable median filter has been supplied by McLoughlin and Arce (1987).

Two-dimensional roots
However, there has as yet been no generalized root image theory developed for true (non-separable) two- and higher-dimensional signals, nor is there evidence that repeated two-dimensional median filter applications must converge to a root signal. Thus, M-dimensional root signal theories remain the subject of intense inquiry. Not much progress has been reported, although for strictly bi-valued images, Heijmans (1994a) has introduced a modified median filter that will converge to a fixed signal. Certainly, part of the problem lies in the fact that the broad diversity of possible two-dimensional windows makes simple root-image characterization unlikely. Further, experiments do not show that two-dimensional median filters yield root images upon repeated application. For example, Fig. 3(f) depicts the divergent behavior of the median filter for digital imagery. After 50 iterations of the two-dimensional median filter, blurring of the image continues to evolve, with the image changing at least slightly with each application. Regardless of the existence or otherwise of two-dimensional root images, the median filter demonstrably smoothes image information quite effectively in such a way that information-bearing sharp transitions, or "edges", are maintained. Image edges contain a wealth of information about the objects contained in the image: their locations, shapes, sizes, and contrast relative to the background. Much of image analysis theory and algorithm development relies upon first detecting the edges in a digital image. By contrast, low-pass linear filtering invariably blurs edges and promotes inter-region smoothing, thus reducing much of the key information available in the edge data.
Streaking and blotching in median filtered images

Because of its amicable properties for image processing, the median filter has enjoyed tremendously widespread use in an amazing diversity of image processing systems, including, for example, commercial biomedical image processing systems, medical radiographic systems, military imaging hardware, etc. However, the median filter is not a panacea! A significant drawback associated with median filtering is the pair of phenomena known as streaking and blotching (Bovik 1987). When a median filter is iteratively applied to an image, or when a filter is used that has a large window span, significant runs of constant or nearly constant values can create streaking artifacts in the output signal. These are a consequence of the median filter's tendency to create root-like signals (see Theorem 2). Bovik (1987) supplies an extensive analysis of the streaking and blotching effects of the median filter. The statistics of median filter streak lengths for one-dimensional signals can be arrived at straightforwardly, although the analysis of two-dimensional streaks (blotches) is rather involved. For the case of one-dimensional filters, suppose that the one-dimensional signal {x_n} is a sequence of independent and identically distributed (i.i.d.) random variables with common arbitrary continuous probability distribution function F. As before, denote the output of a length N = 2m + 1 median filter by y_n = x_{(m+1):n}; define a streak of length L in {y_n} to be an event of the form
{y_{n-1} ≠ y_n = y_{n+1} = ... = y_{n+L-1} ≠ y_{n+L}} ,    (4)

that is, a constant-valued output subsequence of duration exactly L. Given the initial coordinate n, L is a discrete random variable with a finite range space (since, under the assumption of continuity of F, nonoverlapping windows will yield equal outputs with zero probability). The probability mass function of the streak length is given by

P_L(λ) = Pr{L = λ} = ((2m + 1)/(m + 1)) [Q_m(λ + 2) + Q_m(λ) - 2 Q_m(λ + 1)]    (5)

for 1 ≤ λ ≤ 2m + 1 and P_L(λ) = 0 elsewhere, where, defining r_1 = max(1, λ - m), r_2 = min(λ, m + 1),

Q_m(λ) = Pr{y_n = ... = y_{n+λ-1}} = ((2m - λ + 2)/(2m + λ)) Σ_{r=r_1}^{r_2} C(λ - 1, r - 1) C(2m - λ + 1, m - r + 1) / C(2m + λ - 1, m + r - 1)    (6)

for 1 ≤ λ ≤ 2m + 1 and Q_m(λ) = 0 elsewhere; here C(a, b) denotes the binomial coefficient. In the above expression, there is no dependence on the window position n, since the input {x_n}, and hence, the output {y_n}, is stationary. Note also that there is no dependence on the input distribution F, other than the property of continuity. Thus, median filter streaks tend to occur with lengths independent of the input distribution. It can be shown that for large window spans N = 2m + 1, P_L(λ) → 2^{-λ}. The expected streak length is
μ_L = (2m + 1)/(m + 1) ,    (7)
hence for m large, the mean streak length μ_L ≈ 2. The variance σ²_L of the streak length is not so simple to express; however, it is easily shown that for m large, σ²_L ≈ 2 also (Bovik 1987). At first glance, this appears counter to the obvious fact that larger windows create extensive streaks and blotches. However, in practice, the streaking effect is not so much one where large windows create very long streaks, but instead, larger windows give rise to a larger number of moderate-length streaks which often occur in succession. For two-dimensional median filter geometries, the analysis is much more complex owing to the varying degrees of overlap between filter windows that are positioned at different orientations and distances from one another in the image. Ideally, one would like to be able to express the average "size" of a blotch, but the complexity of the problem, as well as the fact that different window geometries will yield different results, has so far left this unaccomplished. In a limited analysis (Bovik 1987), the probability that two overlapping windows would yield identical outputs was calculated for a variety of window geometries. Supposing that y_n and y_p are the outputs of a two-dimensional median filter at coordinates n and p, where the window geometry spans N = 2m + 1 pixels, and where the windows centered at n and p overlap over exactly K ≤ N pixels, then defining s_1 = max(1, m + 2 - K), s_2 = min(m + 1, 2m + 2 - K), the probability that the median outputs will be equal (hence both potentially part of a blotch) is
Pr{y_n = y_p} = (K/(4m + 2 - K)) Σ_{s=s_1}^{s_2} [C(2m + 1 - K, s - 1)]² C(K - 1, m - s + 1) / C(4m + 1 - K, m + s - 1) .    (8)
This expression does not reduce easily, but Bovik (1987) tabulates these blotch statistics for three types of median filter window geometries (SQUARE, CROSS, XSHAPE) shown in Fig. 1, and for window spans ranging from 3 to 15 (pixel coverage depending on the window). It is found, not unexpectedly, that the CROSS and XSHAPE geometries tend towards strong horizontal/vertical streaks and strong diagonal streaks, respectively, whereas the SQUARE geometry yields streaks with little directional preference, hence tends instead to create "blotches". This blotching effect may be observed in several of the examples in Fig. 3. Refer to (Bovik 1987) for a series of image processing examples illustrating this effect. The effects of streaking and blotching are undesirable, since they may lead an observer (or an automated processing system) to believe that there is an image structure where there is not any physical correlate. Image streaks and blotches can be reduced somewhat by postprocessing the median-filtered image with a short duration low-pass linear filter (Rabiner et al., 1975), or by instead defining the filter output to be a linear combination of the windowed signal samples, rather than just the median (Bovik et al., 1983; Bednar and Watt 1984; Lee and Kassam 1985). The latter approach defines the so-called OS filters, to be discussed shortly.
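The streak and blotch formulas can be checked numerically. The sketch below evaluates eqs. (5)-(8) in exact rational arithmetic; the particular binomial-coefficient forms used here are our reading of the formulas and should be checked against Bovik (1987). With them, the streak-length pmf sums to one and has mean (2m+1)/(m+1), as eq. (7) requires, and identical windows (K = N) agree with probability one:

```python
# Numerical sketch of eqs. (5)-(8), using exact rational arithmetic.
# The binomial forms are our reading of the formulas (see Bovik 1987).

from fractions import Fraction
from math import comb

def Q(m, lam):
    """Eq. (6): Pr{y_n = ... = y_{n+lam-1}}; zero outside 1..2m+1."""
    if not 1 <= lam <= 2 * m + 1:
        return Fraction(0)
    r1, r2 = max(1, lam - m), min(lam, m + 1)
    total = sum(Fraction(comb(lam - 1, r - 1) * comb(2 * m - lam + 1, m - r + 1),
                         comb(2 * m + lam - 1, m + r - 1))
                for r in range(r1, r2 + 1))
    return Fraction(2 * m - lam + 2, 2 * m + lam) * total

def streak_pmf(m):
    """Eq. (5): the streak-length pmf P_L for a length 2m+1 median filter."""
    c = Fraction(2 * m + 1, m + 1)
    return {lam: c * (Q(m, lam + 2) + Q(m, lam) - 2 * Q(m, lam + 1))
            for lam in range(1, 2 * m + 2)}

def blotch_prob(m, K):
    """Eq. (8): Pr{y_n = y_p} for windows sharing K of N = 2m+1 pixels."""
    s1, s2 = max(1, m + 2 - K), min(m + 1, 2 * m + 2 - K)
    total = sum(Fraction(comb(2 * m + 1 - K, s - 1) ** 2 * comb(K - 1, m - s + 1),
                         comb(4 * m + 1 - K, m + s - 1))
                for s in range(s1, s2 + 1))
    return Fraction(K, 4 * m + 2 - K) * total

p = streak_pmf(2)                                           # N = 5 window
assert sum(p.values()) == 1                                 # a valid pmf
assert sum(l * q for l, q in p.items()) == Fraction(5, 3)   # eq. (7)
assert blotch_prob(3, 7) == 1                               # K = N: windows agree
print({lam: str(prob) for lam, prob in p.items()})
```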
Output distributions of 2-D median filters

As is probably evident from the previous analysis of median filter streaking, the analysis of the second-order statistical properties of two-dimensionally median-filtered images is quite challenging. Nevertheless, Liao, Nodes and Gallagher (1985) compute expressions for the multivariate output distributions of the separable median filter, and two-dimensional SQUARE median filters. Since the expressions obtained are truly cumbersome, the reader is referred to this reference.

Computation of the moving median

Another area of concern regarding the median (and other OS-related filters) is the computational cost. As the window size N increases, the per-sample sorting cost grows (O(N log N) comparisons for a full sort at each position) and becomes prohibitively slow on a standard serial computer. The expense of arithmetic ranking has spurred the development of fast algorithms to accelerate the computation (Huang et al., 1979; Ataman et al., 1980) and special-purpose architectures tailored to the median filter operation (Oflazer 1983). The algorithm due to Huang et al. exploits the fact that many of the pixels are shared between adjacent or overlapping filter positions, and so need not be ranked anew at each filter position. Other fast implementations utilize the threshold decomposition property of the median filter (discussed in another chapter in this volume), which for quantized data, allows all operations to be expressed as binary median filtering (i.e., binary majority filtering).
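The running-histogram idea of Huang et al. (1979) can be sketched in one dimension for 8-bit data (illustrative code, not the published implementation): slide the window, update a 256-bin histogram incrementally, and walk the bins to the (m+1)-st value instead of re-sorting:

```python
# Sketch of the running-histogram median of Huang et al. (1979) for
# 8-bit data in one dimension, with replication padding.

def running_median(x, m):
    n, N = len(x), 2 * m + 1
    padded = [x[0]] * m + list(x) + [x[-1]] * m
    hist = [0] * 256
    for v in padded[:N]:                # histogram of the first window
        hist[v] += 1
    out = []
    for k in range(n):
        if k > 0:                       # update: drop left pixel, add right
            hist[padded[k - 1]] -= 1
            hist[padded[k + N - 1]] += 1
        count, level = 0, 0
        while True:                     # walk bins up to the (m+1)-st value
            count += hist[level]
            if count >= m + 1:
                break
            level += 1
        out.append(level)
    return out

print(running_median([10, 10, 200, 10, 10], 1))  # impulse removed
```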
Rank-order filters

With the broad acceptance of the median filter as an effective image smoothing device, modifications and generalizations began to appear. The simplest extensions are the rank-order (RO) filters, which deliver the kth order statistic at the filter output. Nodes and Gallagher (1982) studied the basic time-domain properties of RO filters. Using the notation for the median filter, the output of the kth rank-order filter is defined as follows: if {y_n} = rank_k{x_n}, then

y_n = x_(k):n ,   (9)
where 1 ≤ k ≤ N. RO filters of particular interest include the max filter and the min filter. The max filter, also called the dilation filter since it dilates signal peaks, is defined by k = N = 2m + 1, the highest ranking OS. The min filter, also called the erosion filter since it erodes the peaks in a signal, is implemented with k = 1, the first or lowest ranking OS. The root signals produced by 1-D RO filters are described in (Eberly et al., 1991). RO filters other than the max, median, and min find relatively few practical applications. In a later section, the erosion and dilation filters will be discussed more extensively; it will be found that erosion and dilation operations can be combined in a variety of ways to create smoothing filters having interesting properties. In fact, an entire discipline, which we may term digital mathematical morphology, has arisen that is largely based on these two simple operations.
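A direct (unoptimized) realization of eq. (9) makes the max/median/min relationships concrete; the edge-replicated boundary handling here is an assumption:

```python
import numpy as np

def rank_order_filter(x, m, k):
    """kth rank-order filter with window length N = 2m+1 (eq. (9)):
    the output at n is the kth smallest windowed sample, x_(k):n.
    k = 1 gives the min (erosion) filter, k = N the max (dilation)
    filter, and k = m+1 the median."""
    x = np.asarray(x)
    N = 2 * m + 1
    xp = np.pad(x, m, mode='edge')                          # border handling
    windows = np.lib.stride_tricks.sliding_window_view(xp, N)
    return np.sort(windows, axis=1)[:, k - 1]               # kth order statistic
```

For example, `rank_order_filter(x, m, 1)` and `rank_order_filter(x, m, 2*m+1)` are the erosion and dilation filters discussed later.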
Order statistics in image processing
617
Detail-Preserving filters

An important ingredient in assessing the efficacy of a median filter is the effect that the geometry of the window has on information-bearing image structures. Figure 4 is instructive on this point. Figs. 4(a) and 4(b) depict close-ups of small patches of an image that are to be median filtered (for simplicity, the subimages shown are bi-valued, but the observations made here are generally applicable). The curvilinear structures that appear in the image patches could represent significant image detail, such as parts of text, minute detail in a fabric, etc. Reducing the clarity of these features as part of a noise-reducing process could greatly deteriorate interpretation of the image content. Applying a SQUARE median filter of any size will completely eradicate the structure shown in both Figs. 4(a) and 4(b), leaving nothing but blank patches: the filter will regard the local patterns as "outlying values." However, applying a CROSS median filter of moderate size will leave the pattern in Fig. 4(a) completely untouched, while an XSHAPE median filter will leave the structures in Fig. 4(b) nearly unscathed. If the filters are reversed, viz., a CROSS median filter applied to 4(b) or an XSHAPE filter applied to 4(a), the patterns would again be destroyed. Clearly, it would be desirable to incorporate selection of the window geometry into a design procedure for median filters. Apart from the difficulty of finding such an optimal geometry for a given image (according to some criterion, which would require identifying the "desirable" structures to be preserved), the problem is made more difficult by the fact that images are generally nonstationary, and apt to contain different types of features at different spatial locations.
With these problems in mind, several authors have proposed "detail-preserving filters," which possess qualities of the median filter (and usually are defined in terms of combined median operations), but with additional sensitivity to structures of variable orientation such as those depicted in Fig. 4. Filters in this "class" include multistage median filters, which combine the responses of several
(a)
(b)
Fig. 4. Depiction of two simple image patterns for which median filtering with different windows yields radically different results.
S. T. Acton and A. C. Bovik
618
directionally-oriented median filters at each coordinate (Arce and Foster 1989); FIR-median hybrid filters, which combine linear filters with median filters (Nieminen, Heinonen and Neuvo 1987); and max/median filters, which combine median operations with other RO operations (Arce and McLaughlin 1987). Because of the similarity in performance of these filters (Arce and Foster (1989) perform a careful statistical analysis), only a few members of the class are described here. The class can be divided into two types, the so-called unidirectional multistage filters and the bidirectional multistage filters. The unidirectional multistage filters utilize four one-dimensional median filter outputs, which are then combined. Fig. 5 depicts four one-dimensional windows, labeled B1 through B4. These include the ROW and COLUMN windows from before, plus two diagonal windows. The unidirectional filters are defined as follows. At each coordinate n, denote the output of a median filter using window B_i as y_{n,i} for i = 1, 2, 3, 4. Denote the maximum and minimum of these outputs, respectively, by
y_{n,max} = max{y_{n,i}; i = 1, 2, 3, 4}  and  y_{n,min} = min{y_{n,i}; i = 1, 2, 3, 4} .   (10)
The unidirectional multistage max/median filter is then defined to be the median of these and the current input (over which the windows are centered):

y_{n,max/med} = med{x_n, y_{n,max}, y_{n,min}} .   (11)
The unidirectional multistage median filter is computed in several stages of medians, instead of maxima and minima. Using the same windows, define

y_{n,(1,3)} = med{x_n, y_{n,1}, y_{n,3}}  and  y_{n,(2,4)} = med{x_n, y_{n,2}, y_{n,4}} .   (12)

The output of the unidirectional multistage median filter is then

y_{n,multi-med} = med{x_n, y_{n,(1,3)}, y_{n,(2,4)}} .   (13)
Fig. 5. Subwindows used in the definition of the unidirectional multistage max/median filter.
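The combination in eqs. (12) and (13) can be transcribed directly. The sketch below is illustrative, not the authors' implementation; it assumes the four subwindows are the row, column, and two diagonals of length 2m+1, with edge-replicated borders:

```python
import numpy as np

def multistage_median(img, m):
    """Unidirectional multistage median filter: four 1-D medians
    (row, column, and the two diagonals), each combined with the
    center sample by 3-sample medians, per eqs. (12)-(13)."""
    img = np.asarray(img, dtype=float)
    p = np.pad(img, m, mode='edge')
    rows, cols = img.shape
    offs = range(-m, m + 1)
    out = np.empty_like(img)
    for r in range(rows):
        for c in range(cols):
            pr, pc = r + m, c + m                              # padded coords
            x = p[pr, pc]
            y1 = np.median([p[pr, pc + d] for d in offs])      # B1: row
            y2 = np.median([p[pr + d, pc] for d in offs])      # B2: column
            y3 = np.median([p[pr + d, pc + d] for d in offs])  # B3: diagonal
            y4 = np.median([p[pr + d, pc - d] for d in offs])  # B4: diagonal
            y13 = np.median([x, y1, y3])                       # eq. (12)
            y24 = np.median([x, y2, y4])
            out[r, c] = np.median([x, y13, y24])               # eq. (13)
    return out
```

A one-pixel-wide horizontal line, which a SQUARE median filter would erase, passes through this filter unchanged because the ROW subwindow response protects it.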
The bidirectional versions of these filters are defined simply by using CROSS and XSHAPE two-dimensional windows instead of combinations of one-dimensional windows. The bidirectional multistage max/median filter is defined as the median of three values: the current sample, and the maximum and the minimum of the CROSS and XSHAPE median filter outputs. The bidirectional multistage median filter is defined simply as the three-sample median of the current sample with the two filter outputs. Arce and Foster (1989) perform an extensive statistical and empirical analysis comparing the various proposed detail-preserving filters. In addition to computing breakdown probabilities of each of the filters, empirical mean-square error and mean absolute error figures are computed from images of real scenes as well as test images. They found that the relative performances of the various filters studied vary as a function of the signal-to-noise ratio (SNR), the window spans, and the noise type. Generally it was found that the detail-preserving filters have a considerable advantage relative to the median filter for preserving image detail, unless the noise level is very high (when the detail is obscured). Tables and detailed comparisons can be found in (Arce and Foster 1989).
OS filters

The so-called order statistic filters (OS filters), or occasionally L-filters (since they are really moving L-estimators), output a linear combination of the windowed order statistics. Thus, the filter definition requires the specification of a length-N vector of filter coefficients a. In image processing applications the condition

Σ_{k=1}^{2m+1} a_k = 1   (14)

is usually applied; from a statistical perspective, this amounts to an unbiased-mean condition. In simpler terms, it means that the average level of an image will remain unchanged when filtering with an OS filter whose coefficients a satisfy (14). The output of an OS filter with coefficient vector a is then defined: if {y_n} = OS_a{x_n}, then

y_n = Σ_{k=1}^{N} a_k x_(k):n .   (15)
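Eq. (15) reduces to a sort followed by a dot product. A minimal sketch (edge replication at the boundaries is an assumption), together with coefficient vectors for some familiar special cases:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def os_filter(x, a):
    """OS (L-) filter, eq. (15): y_n = sum_k a_k x_(k):n, a weighted
    sum of the order statistics of each length-N window."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    m = len(a) // 2
    windows = sliding_window_view(np.pad(x, m, mode='edge'), len(a))
    return np.sort(windows, axis=1) @ a

# Familiar special cases as coefficient vectors satisfying eq. (14), N = 5:
median_a   = np.array([0., 0., 1., 0., 0.])   # median filter
average_a  = np.full(5, 1 / 5)                # the (linear) average filter
midrange_a = np.array([.5, 0., 0., 0., .5])   # midrange filter
```

The same routine realizes any OS filter; only the coefficient vector a changes.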
From an image processing perspective, the introduction of a filter coefficient set a opens the possibility of a design procedure (whereas the median, RO, and detail-preserving filters are "automatic") based on knowledge of the image and/or noise statistics. In addition to the median and rank-order filters discussed previously, several other useful filters fall in the class of OS filters. Of course, all of these are moving versions of standard (possibly optimal) L-estimators, and so enjoy statistical behavior partially described by the static properties that are amply available in the statistics literature. For example, the midrange filter is implemented with a_k = 1/2 for k = 1, a_k = 1/2 for k = 2m + 1, and a_k = 0 otherwise. Standing apart as the only linear OS filter, the average filter is constructed using a_k = 1/(2m + 1) for all k. An L-inner mean filter (a moving trimmed mean, in essence) may be defined by setting a_k = 1/(2L + 1) for (m + 1 − L) ≤ k ≤ (m + 1 + L) and a_k = 0 otherwise. Fig. 6 depicts an OS filter application. The image in Fig. 6(a) was immersed in additive Laplacian-distributed noise of standard deviation σ = 20.0, yielding Fig. 6(b). An OS filter was designed such that the weighting vector a is triangular in form. This device, referred to as the Δ-OS filter (read "triangle OS filter"),
(a)
(b)
(c)
(d)
Fig. 6. (a) Original "Old Central" image. (b) "Old Central" corrupted by Laplacian-distributed additive noise (σ = 20.0). (c) 5 × 5 SQUARE Δ-OS applied to noisy image. (d) 3 × 3 SQUARE WMMR-MED filter applied to noisy image.
emphasizes the OS that neighbor the window median intensity, while suppressing the extreme windowed values. The result of Δ-OS filtering the image using a 5 × 5 SQUARE window is depicted in Fig. 6(c). Clearly, the noise content of the image has been effectively removed, although at some expense of image blurring.

Optimal OS filters
If the image samples are corrupted by additive i.i.d. noise samples, and if the image is considered sufficiently smooth that it can be modeled as nearly constant within each window position, then standard techniques may be used to design the OS filter under, for example, a standard least-squares criterion (Lloyd 1952; Sarhan 1954, 1955a,b) or some statistical robustness criterion (David 1982; Crow and Siddiqui 1967; Gastwirth and Cohen 1970). Such topics have been considered in (Bovik et al., 1983; Restrepo and Bovik 1988). Hence, for a sufficiently smooth signal (viz., approximately constant within the span of a window) immersed in an additive (or multiplicative) i.i.d. noise process, it is possible to construct an optimal OS filter under the least-squares or MSE criterion, and such a filter can be used with great effect to smooth signals without the streaking effect introduced by median filtering (Bovik et al., 1983). It is worth noting that since the (linear) average filter is also an OS filter, it is always possible to derive an OS filter (under these conditions) that produces an MSE no larger than that of the optimal linear filter! When the noise becomes non-independent, or when the signal becomes fast-varying, this simple procedure breaks down. With a little effort, however, it is possible to compute an optimal MSE OS filter to estimate a non-constant image immersed in i.i.d. noise, although the computation involved in calculating the optimal coefficients is non-trivial. The following discussion is based on the work of Naaman and Bovik (1991). Here, an observed image signal {x_n} is defined by the pointwise sum x_n = s_n + η_n, where {s_n} is the uncorrupted signal and {η_n} represents a constant-mean noise sequence. The mean-squared error (MSE) between the OS-filtered signal y_n and the signal s_n is

ε_n = E[(y_n − s_n)²] = aᵀH_n a − 2s_n aᵀμ_n + s_n² ,   (16)
where

H_n = E[x_(n) x_(n)ᵀ]   (17)

is the OS correlation matrix, whose elements are given by

H_{i,j:n} = E[x_(i):n x_(j):n] .   (18)
The OS mean vector is defined as

μ_n = E[x_(n)] ,   (19)
with elements

μ_{i:n} = E[x_(i):n] .   (20)
To create an optimal OS filter for this image processing problem, the MSE is minimized over the OS coefficients a. For an image containing K pixels, the MSE can be minimized over the entire set of signal samples, or over a subset of the samples. For a generalized solution over K components, the MSE is

ε = Σ_{k=1}^{K} ε_{n_k} = aᵀĤa − 2aᵀM̂s + sᵀs ,   (21)

where

Ĥ = Σ_{k=1}^{K} H_{n_k}   (N × N) ,   (22)

M̂ = [μ_{n_1} | μ_{n_2} | ⋯ | μ_{n_K}]   (N × K) ,   (23)

and

s = (s_{n_1}, s_{n_2}, …, s_{n_K})ᵀ .   (24)
In this paradigm, the signal components to be estimated, s, may or may not contain shared (overlapping) estimates. So, the signal may be estimated from K overlapping windows given by the vectors

x_{n_k} = s_{n_k} + η_{n_k} ,   (25)

where

x_{n_k} = (x_{1:n_k}, x_{2:n_k}, …, x_{N:n_k})ᵀ ,   (26)

s_{n_k} = (s_{1:n_k}, s_{2:n_k}, …, s_{N:n_k})ᵀ ,   (27)

and

η_{n_k} = (η_{1:n_k}, η_{2:n_k}, …, η_{N:n_k})ᵀ .   (28)
Although the estimation process itself is not affected by the use of overlapping windowed components, the optimal coefficients will differ from those of the non-overlapping case. Since all of the involved correlation matrices are positive definite, the minimum MSE solution can be realized by a straightforward quadratic optimization (with a global minimum) as follows:

∂ε/∂a = 2Ĥa − 2M̂s = 0 ,   (29)
which yields the optimal OS coefficients a*:

a* = Ĥ⁻¹M̂s .   (30)

The optimal coefficients produce an MSE of

ε_min = sᵀ[I − M̂ᵀĤ⁻¹M̂]s ,   (31)
where I is the K × K identity matrix. For a given i.i.d. noise process with distribution F(·) known a priori, closed-form expressions for the OS correlation matrix and the OS mean vector are obtainable, but at a significant computational cost (Naaman and Bovik 1991). Direct computation of these marginal and joint statistics by numerical integration involves O(K) integrals with O(N!)-term integrands. Hence, for large values of K, as encountered in image processing, and for large filter window sizes N, direct computation is not practical. The composite OS moment matrices can, however, be estimated in polynomial time (Boncelet 1987). Using a recursive, discrete approximation algorithm, the OS filter coefficients for image processing applications can be computed in a reasonable period of time. The multivariate distribution G of m order statistics x_(k_1), …, x_(k_m), corresponding to a sample of independent random variables x = (x_1, …, x_N)ᵀ of size N ≥ m, can be estimated using the recursion

G(k_1, …, k_m; N) = Pr{x_(k_j) ≤ t_j, j = 1, …, m}
  = Σ_{j=1}^{m+1} G(k_1, …, k_{j−1}, k_j − 1, …, k_m − 1; t_1, …, t_m; N − 1) × Pr{t_{j−1} < x_N ≤ t_j} ,   (32)

where k_1 < k_2 < ⋯ < k_m are integers and −∞ = t_0 < t_1 < ⋯
To impose the unbiasedness constraint (14) on the optimal design, a Lagrangian is formed:

Λ(λ_c, a) = ε + λ_c(1 − aᵀe) ,   (33)

where e is the unit (all-ones) vector of length N and λ_c is a Lagrange multiplier. Differentiating (33) with respect to the coefficient vector a (and equating the result to zero), the optimal coefficients are computed as

a* = Ĥ⁻¹M̂s + (λ_c/2) Ĥ⁻¹e ,   (34)

which, combined with (14), yields

λ_c = 2(1 − eᵀĤ⁻¹M̂s) / (eᵀĤ⁻¹e) .   (35)
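Eqs. (30) and (33)-(35) can be exercised numerically. In this sketch the OS moments are estimated by Monte Carlo simulation rather than by the closed-form or recursive computations discussed in the text (an assumption), for a single window position (K = 1) of a constant signal in i.i.d. Laplacian noise:

```python
import numpy as np

rng = np.random.default_rng(0)
N, s, trials = 5, 10.0, 200_000

# Monte Carlo stand-ins for the OS moment matrices: each row is one
# sorted window of the constant signal s plus Laplacian noise.
x = np.sort(s + rng.laplace(size=(trials, N)), axis=1)
H = x.T @ x / trials            # OS correlation matrix, eqs. (17)-(18)
mu = x.mean(axis=0)             # OS mean vector, eqs. (19)-(20)
e = np.ones(N)

a_unc = np.linalg.solve(H, mu * s)            # unconstrained optimum, eq. (30)

# Enforce sum(a) = 1 via the Lagrange multiplier of eqs. (33)-(35)
Hinv_e = np.linalg.solve(H, e)
lam = 2.0 * (1.0 - e @ a_unc) / (e @ Hinv_e)  # eq. (35)
a_con = a_unc + 0.5 * lam * Hinv_e            # eq. (34)
```

Because a_unc minimizes the empirical quadratic form of eq. (21), its empirical MSE cannot exceed that of the plain averaging weights, illustrating the remark above that an OS filter can always match or beat the optimal linear filter under these conditions.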
The results can then be used to determine a new minimum MSE. At this point, methods for computing optimal OS coefficients by a standard global MSE technique and by a local structure-preserving least squares technique have been described. Using these solutions, optimal OS filters for smoothing digital imagery and other signals can be defined. The OS filter has proven to be a powerful tool in image enhancement applications. The impulse rejection and edge preservation properties of OS filters cannot be matched by linear filters, since signal discontinuities lead to overlapping frequency spectra (Restrepo and Bovik 1986). OS filters are also translation-invariant and preserve linear signal trends (Lee and Kassam 1985).

WMMR filters

The weighted majority with minimum range (WMMR) filters are OS filters that weight the ordered values in the subset of window samples having minimum range (Longbotham and Eberly 1993). Typically, the set of m + 1 samples with minimum range is found within a filter window of 2m + 1 samples; the m + 1 samples are then combined linearly according to a weight vector a. For example, the weights may select the median (WMMR-MED) or the average (WMMR-AVE) of the m + 1 values. The WMMR concept is a generalization of the trimmed mean filters, which "trim" away a set number of the largest and smallest order statistics in the window and then average the remaining samples (Bednar and Watt 1984). The WMMR filters have proven to be very effective edge enhancers and smoothers for signals that are piecewise constant (PICO). For an example, see Fig. 6. The image in Fig. 6(b), corrupted with Laplacian noise, has been enhanced by a 9 × 9 WMMR-MED filter (see Fig. 6(d)). Note the very sharp retention of image features despite the excellent noise suppression. An interesting root-signal analysis has also been developed for the WMMR filters.
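The WMMR-MED operation can be sketched directly: the sorted window is scanned for the run of m + 1 consecutive order statistics with minimum range, whose median becomes the output. Edge replication at the borders is an assumption:

```python
import numpy as np

def wmmr_med(x, m):
    """WMMR-MED sketch: in each window of 2m+1 samples, find the m+1
    consecutive order statistics with minimum range (max - min) and
    output their median."""
    x = np.asarray(x, dtype=float)
    N = 2 * m + 1
    xp = np.pad(x, m, mode='edge')
    out = np.empty(len(x))
    for n in range(len(x)):
        w = np.sort(xp[n:n + N])
        ranges = w[m:] - w[:m + 1]       # range of each length-(m+1) run
        i = int(np.argmin(ranges))       # tightest cluster of samples
        out[n] = np.median(w[i:i + m + 1])
    return out
```

On an ideal step edge the tightest cluster always lies entirely on one side of the edge, which is why the filter preserves (indeed sharpens) PICO transitions that a moving average would blur.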
A one-dimensional signal is defined to be piecewise constant of degree d, or PICO-d, if each sample belongs to a constant segment of at least length d. Using the WMMR filter, concepts similar to linear lowpass, highpass, and bandpass filtering for PICO signals have been conceived (Longbotham and Eberly 1992). Just as Gallagher and Wise (1981) derived a root signal theory for median filters,
Longbotham and Eberly in like fashion provide a root signal characterization for the WMMR filter. To achieve a PICO-(m + 1) root signal by iterative application of the WMMR filter, they show that the weights a must be nonnegative and sum to unity (normalized). Also, the first and last weights (a_1 and a_{m+1}) must be unequal (Longbotham and Eberly, 1993). Although convergence to a PICO root signal for 2-D signals is not guaranteed, the WMMR is an extremely powerful tool for smoothing images of natural and man-made scenes that consist of approximately piecewise constant regions. The WMMR filter has been successfully applied to problems in biomedicine (extracting dc drift) and in electronic circuit inspection (removing impulse noise from digital images of printed circuit boards) (Longbotham and Eberly 1992).
Relationship of OS filters to linear FIR filters

One may note that the definition of the OS filter is very similar to that of a finite impulse response (FIR) linear filter. An FIR filter is simply a linear filter that is not recursive, i.e., the current output does not depend in any way on other outputs. Using our prior terminology, a one-dimensional linear FIR filter with associated coefficient vector b has an output defined as follows: if {z_n} = FIR_b{x_n}, then

z_n = Σ_{k=1}^{N} b_k x_{k:n} .   (36)
Hence, the signal samples are not placed in rank order prior to weighting. Thus, an FIR filter weights the signal samples within a window according to either spatial or temporal order, whereas the OS filter employs an additional algebraic sorting step so that the samples are weighted according to their arithmetic rank. The hybridization of FIR and OS filters, such as the FIR-median filter (Heinonen and Neuvo 1987), is a current topic of research. The difference or similarity in performance between FIR and OS filters is signal-dependent. For a signal that is sufficiently smooth, in the sense of being locally monotonic (LOMO), OS and FIR smoothing filters operate identically. This result is formalized in Theorem 3 for 1-D filters:

THEOREM 3. Longbotham and Bovik (1989): The outputs of a linear FIR filter and an OS filter are equal,

OS_a{x_n} = FIR_a{x_n} ,   (37)
for all even-symmetric coefficient vectors a of length N = 2m + 1, if and only if {x_n} is LOMO-(2m + 1). This theorem, which is fairly obvious, makes it clear that the description of signal smoothness in terms of local monotonicity has significance not only for the median filter, but for all OS filters. While setting the foundation for analytical
characterization of OS filters, these results have also deepened the significance of the property of local monotonicity as a measure of signal smoothness. A more general and provocative theorem is given next:

THEOREM 4. Longbotham and Bovik (1989): Suppose that the elements of a length N = 2m + 1 even-symmetric coefficient vector a satisfy a_1 = a_2 = ⋯ = a_{2m+1−k}, where m + 1 ≤ k ≤ 2m, and one of the following: (a) a_1 < min{a_{2m+2−k}, …, a_{m+1}}; (b) a_1 > min{a_{2m+2−k}, …, a_{m+1}}. If {x_n} contains at least one monotonic segment of length k, then

OS_a{x_n} = FIR_a{x_n}   (38)
if and only if {x_n} is LOMO-(k + 1). An OS/FIR filter pair satisfying Theorem 4 is said to be LOMO-(k + 1) equivalent. Some interesting filters can immediately be seen to be LOMO-equivalent:

COROLLARY. Longbotham and Bovik (1989): Linear averaging filters of length 2k + 1 and k-inner mean filters are LOMO-(k + 1) equivalent.

Many signal processing operations are simple to define in one dimension but difficult to extend to 2-D or higher dimensions; Kalman filtering is one example. Most digital filtering operations have disparate definitions in 1-D and in 2-D (or M-D), since the filter operation must be redefined relative to the windowing geometry or other spatial criteria. This is not true of OS filters. Once each element is ranked, OS filters are defined in the same manner for the one-dimensional, two-dimensional, and M-dimensional cases; they remain equally simple conceptually (although the theory may not extend across dimensions!). As a result, OS filters may be discussed in terms of general signal processing; the one-dimensional, two-dimensional, and M-dimensional cases do not have to be treated separately.
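The LOMO equivalence of Theorem 3 is easy to check numerically for a single window position; the particular even-symmetric coefficient vector below is an arbitrary example:

```python
import numpy as np

def os_out(w, a):
    """OS filter response for a single window: sort, then weight."""
    return np.sort(w) @ a

def fir_out(w, a):
    """Linear FIR response for the same window: weight in spatial order."""
    return np.asarray(w) @ a

a = np.array([0.1, 0.2, 0.4, 0.2, 0.1])     # even-symmetric, sums to one

inc = np.array([1.0, 2.0, 5.0, 7.0, 9.0])   # monotone window: spatial order
dec = inc[::-1]                             # equals rank order (the reversal
                                            # is absorbed by the symmetry of a)
bump = np.array([1.0, 5.0, 2.0, 5.0, 1.0])  # not locally monotonic; here
                                            # the two responses differ
```

On the monotone windows the two filters agree exactly; on the non-monotonic window sorting rearranges the samples against the weights, and the outputs separate.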
3. Spatial/temporal extensions

Despite the broad success of OS-related filters in the area of image processing, which resides primarily in the fact that OS filters provide image smoothing performance that linear filters cannot, one must be careful not to apply the OS technique blindly. For example, an OS filter does not capitalize on the original (spatial or temporal) ordering of the signal or image within a filter window. In this respect, there are certain processing tasks that are not well-suited to a standard OS filter. The idea behind several of the OS generalizations has been to combine the spatial/temporal ordering information of linear filters with the rank-order information contained in the responses of OS filters.
C, Ll and permutation filters

The so-called combination filters (C filters) (Gandhi and Kassam 1991) and Ll filters (Palmieri and Boncelet 1989) attempt to fuse OS filters (denoted by L) with linear filters (denoted by l). The C and Ll filters use a weighted sum of the input samples to compute the filter output, with the weights defined to be a function of both rank and temporal/spatial position. The results that have been reported indicate that improved signal estimation performance may be gained relative to the standard OS approach (i.e., the median filter) for removing signal outliers. Unfortunately, the C and Ll filters may perform poorly, or be difficult to design, for more complex or nonstationary signals, where signal discontinuities such as image region boundaries exist (Barner and Arce 1994). Consequently, the relationship between spatial/temporal order and rank order remains a current topic of research within signal processing. In the same spirit as C and Ll filters, permutation filters (P filters) generalize the OS filter by exploiting the spatial or temporal context of a signal (Barner and Arce 1994). Since the mapping of a spatially or temporally ordered vector to a rank-ordered vector is essentially a permutation, permutation theory is used to develop the filtering theory. Each permutation filter output is defined to be the order statistic associated with mapping the original vector to the ordered vector. The P filters do not use weighted sums; each filter output is restricted to lie within the set of input samples. The operation that maps the spatially/temporally ordered vector x_n to a rank-ordered vector x_(n) is called the observation permutation, p_x. The sample given as the output of the P filter is a function of p_x. If S_N is the set of possible permutations, and H is a set whose elements (called blocks) are possibly empty subsets of S_N, then the P filter is defined as follows:

F_P(x_n; H) = x_(l_x) ,   (39)

where p_x ∈ H_{l_x} (l_x is the index of the block that contains p_x). So, in this construction, specific outputs are associated with permutations in the blocks of H. The arrangement of H can be tailored to specific applications such as estimation. The initial results for these very recent filters show that P filters can track discontinuities of nonstationary signals and reduce outliers effectively while also allowing effective frequency selection. The complexity of P filters is an obvious drawback, as the number of possible permutations in S_N explodes combinatorially with increasing window size. Reduced set permutation filters (RP filters) are P filters which treat several permutations as isomorphic and therefore equivalent, thus reducing the number of possible instances. Rank-order filters, weighted rank-order filters, and stack filters have a limited permutation transformation set compared to the general class of P filters; in fact they are a subset of the P class (Barner and Arce 1994).
Stack filters

Wendt, Coyle, and Gallagher (1986) developed the so-called stack filters. The power of these filters lies in their versatility and efficient implementation: they form a class of filters which includes the RO filters as well as many of the OS filters. A VLSI implementation of the stack filter class is facilitated by exploiting the stacking property and the superposition property, also called threshold decomposition. It is shown in (Fitch et al., 1985) that RO operations can be implemented by first decomposing a K-valued discrete signal (by thresholding) into K binary signals, then applying binary filter operations (such as the binary median) to each of the binary signals, and finally "stacking" the outputs of the Boolean operations to form a K-valued output signal. Filters that can be implemented by threshold decomposition and stacking belong to the class of stack filters. The conditions under which a stack filter preserves edges and LOMO regions are given in (Wendt et al., 1986), where the rate of convergence and the basic statistical properties of the stack filter are also provided. Since stack filters are easily realized on a special-purpose chip and incorporated into a real-time signal processing system, research into stack filter theory has been quite extensive. Moreover, optimal stack filters can be effectively and very naturally designed according to the mean absolute error (MAE) criterion (Coyle and Lin, 1988); indeed, optimal stack filtering under the MAE criterion bears analogies to optimal linear filtering under the MSE criterion: both procedures are made possible by superposition properties, and both have tractable design procedures. Indeed, stack filtering would almost certainly find nearly ubiquitous application were it not for the fact that the threshold decomposition does not easily yield to intuitive or simple design.
Nevertheless, impressive results have been obtained in image filtering applications (Coyle et al., 1989). The other main drawback of the stack filters is their constrained structure; these limitations are discussed in (Barner et al., 1992). Recent extensions to the stack filter theory include the creation of signal-dependent adaptive stack filters (Lin et al., 1990).
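Threshold decomposition is easy to demonstrate for the median: a K-valued signal is thresholded at each level, each binary signal is majority-filtered, and the binary outputs are stacked (summed). A sketch, with edge replication at the borders assumed:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stack_median(x, m, K):
    """Median via threshold decomposition and stacking: for each level t,
    the binary median (majority) of the thresholded window is 1 exactly
    when at least m+1 samples reach t; summing over levels recovers the
    K-valued median output."""
    x = np.asarray(x, dtype=np.int64)       # values assumed in 0..K-1
    N = 2 * m + 1
    windows = sliding_window_view(np.pad(x, m, mode='edge'), N)
    out = np.zeros(len(x), dtype=np.int64)
    for t in range(1, K):
        binary = (windows >= t)             # threshold-decomposed signal
        out += (binary.sum(axis=1) > m)     # binary majority filter, stacked
    return out
```

The sum counts the levels t at which the (m+1)-th largest windowed sample reaches t, which is precisely the window median — the superposition property the text describes.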
4. Morphological filters

Although the connection is not always made, OS filters are also closely related to a relatively new area of image processing called morphology. As the name implies, morphological operations alter the "shape" of objects in signals and images, and are particularly useful in the representation and description of image features such as regions, boundaries, and skeletons. In a fundamental paper unifying the theory of OS filters and morphological filters, Maragos and Schafer (1987) demonstrated that any OS filter can be implemented via morphological operations. It will be convenient to introduce a new notation for the basic morphological operators. The morphological operations are highly dependent upon the size and shape of the structuring element, or window, B. Given the structuring element B, if k = 1, then {y_n} = rank_k{x_n} = {x_n} ⊖ B, the so-called erosion of {x_n} by B.
Likewise, if k = N = 2m + 1, then {y_n} = rank_k{x_n} = {x_n} ⊕ B, which is the dilation of {x_n} by B. The erosion operation can be viewed as finding the infima of the windowed sets; likewise, the dilation operation consists of computing suprema of the windowed sets defined by the structuring element. It is noteworthy that erosion distributes over arbitrary infima and dilation distributes over arbitrary suprema. In this respect, dilation (or erosion) can be implemented by the Minkowski addition (or subtraction) of the image {x_n} and the structuring element B. For binary-valued images, the way in which dilation affects the foreground objects of an image, erosion similarly affects the background. This duality relationship may be summarized by the following theorem and corollary, where X = {x_n}, X^c is the logical complement of X, and X^T is the transpose of X:

THEOREM 5. (Haralick and Shapiro 1992): (X ⊖ B)^c = X^c ⊕ B^T.

COROLLARY. (Haralick and Shapiro 1992): (X ⊕ B)^c = X^c ⊖ B^T.

For gray-scale images, the erosion and dilation operations have the dual effects of eliminating positive-going noise impulses and negative-going noise impulses, respectively. However, each also biases the signal upward or downward. An example illustrating this point is shown in Fig. 7. The dilated image of Fig. 7(b) has highlighted, expanded bright regions, while the eroded image (Fig. 7(c)) has been biased negatively, producing a dark, lower-contrast image. Bias-reduced operators can be defined by concatenating opposite operations. The open operation is defined as the dilation of the erosion:

{x_n} ∘ B = ({x_n} ⊖ B) ⊕ B .   (40)
The opening is (Heijmans 1994b):
(1) increasing, so that if {x_n} ⊆ {y_n}, then {x_n} ∘ B ⊆ {y_n} ∘ B;
(2) translation invariant;
(3) idempotent, ({x_n} ∘ B) ∘ B = {x_n} ∘ B;
(4) anti-extensive, {x_n} ∘ B ⊆ {x_n}.
An open operation performed on an image will smooth the image, preserve edge information, and reject positive-going impulses. The closing is defined by

{x_n} • B = ({x_n} ⊕ B) ⊖ B .   (41)

The close operation possesses the same four properties as the open operation, except that it is extensive rather than anti-extensive; therefore, {x_n} ⊆ {x_n} • B. Note that dilation and erosion are also increasing and translation invariant. When applied to an image, the close operation will again smooth noise without removing edges, but will eradicate negative-going, not positive-going, impulses. This characteristic difference between open and close can be exploited in the
Fig. 7. (a) Original "Truck" image. (b) Dilation with 3 × 3 SQUARE. (c) Erosion with 3 × 3 SQUARE.

design of simple peak and valley detectors for digital imagery. Both open and close have important morphological implications in processing and interpreting images. For example, the opening will tend to remove small image regions and will separate loosely connected objects in the image. The closing, on the other hand, will remove small "holes" and gaps in the image. Because they are bias-reduced operators, they do not significantly affect image region areas. So, like other OS filters, these filters, constructed from a succession of OS operations, can be used to smooth image data without altering the image structure. However, the open and close operations by themselves have limitations, as depicted in Fig. 8. The image of Fig. 8(a) was subjected to salt and pepper noise, where 5% of the
pixels have been randomly changed to white or black, simulating transmission errors, as depicted in Fig. 8(b). The close operation result (Fig. 8(c)) and the open operation result (Fig. 8(d)) contain impulse noise artifacts. This situation can be mitigated by combining open and close into further-combined operators. By using further concatenation of operations we obtain the two-sided operators

({x_n} • B) ∘ B   (42)

and

({x_n} ∘ B) • B ,   (43)
Fig. 8. (a) Original "Friends" image. (b) Image corrupted by 5% salt-and-pepper noise. (c) Closing with 3 × 3 SQUARE. (d) Opening with 3 × 3 SQUARE. (e) Close-open with 3 × 3 SQUARE. (f) Open-close with 3 × 3 SQUARE. (g) Median filtered with 3 × 3 SQUARE.

called open-close and close-open, respectively. Both of these operations have the ability to smooth noise, especially impulse noise of both the positive-going and negative-going type, with little bias. Morphologically, open-close tends to link neighboring objects, whereas close-open tends to link neighboring holes. Here, it is assumed that the objects have higher mean intensities than the holes. Where open and close alone failed (see Figs. 8(c), 8(d)), the close-open (Fig. 8(e)) and open-close (Fig. 8(f)) filters successfully eliminated the majority of the image noise. However, the performance of open-close and close-open did not match the results given by a median filter (Fig. 8(g)) using the same structuring element.
However, the open-close and close-open operations represent the result of four consecutive processing iterations, and so represent a more intense smoothing of the image. Since morphological operators can be implemented without computing the complete set of order statistics for each windowed set (only the max and min are needed), open-close and close-open offer affordable alternatives to median smoothing for real-time processing. Furthermore, morphological operations can be realized using simple logic circuits and can be implemented on high-speed, locally interconnected parallel processors. With this in mind, Haralick and Shapiro (1992) give a morphological approximation to the median. For an input image {x_n}, let {y_n} = ({x_n} ∘ B) and {z_n} = ({x_n} • B). Each sample n of the output image {q_n} is given by

q_n = z_n if y_n ≥ x_n , and q_n = y_n otherwise .

If {x_n} is monotone at n, the median filter response {r_n} = med{x_n} at sample n satisfies x_n = y_n = z_n = q_n = r_n. If {x_n} is not monotone at n and x_n is a neighborhood minimum, then y_n ≤ r_n and z_n ≥ r_n. At this point q_n = z_n, so that the minimum points are changed to pixel values greater than or equal to the median response. However, since x_n is a neighborhood minimum, the new value is less than the neighborhood maximum. In this way, negative-going outliers can be eliminated by this expeditious approximation of the median. The same argument holds in the case of positive-going neighborhood maxima, where the opening q_n = y_n is the appropriate response. In addition to filtering, these OS-based operations are extremely effective tools that can be utilized to describe the shape of particular image regions. A morphological shape decomposition (Pitas and Venetsanopoulos 1990) gives a coarse-to-fine description of a given image, similar to, but more meaningful than, a succession of linear filterings. The morphological representation preserves edge localization, while linear filtering tends to blur and distort image objects.
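A minimal 1-D sketch of this approximation follows; function names are illustrative, y is taken as the opening and z as the closing over a flat window, and boundaries are handled by clipping:

```python
def erode(x, h):
    """Sliding min filter over a flat window of half-width h."""
    n = len(x)
    return [min(x[max(0, i - h):min(n, i + h + 1)]) for i in range(n)]

def dilate(x, h):
    """Sliding max filter over a flat window of half-width h."""
    n = len(x)
    return [max(x[max(0, i - h):min(n, i + h + 1)]) for i in range(n)]

def morph_median_approx(x, h):
    """Morphological approximation to the median, per the rule above."""
    y = dilate(erode(x, h), h)   # opening of x
    z = erode(dilate(x, h), h)   # closing of x
    # Where the opening leaves x_n unchanged (y_n >= x_n, i.e. x_n is not
    # a positive spike), output the closing z_n; otherwise output y_n.
    return [z[i] if y[i] >= x[i] else y[i] for i in range(len(x))]

# A flat signal with one positive-going and one negative-going outlier.
x = [5.0] * 20
x[5], x[12] = 9.0, 1.0
q = morph_median_approx(x, 1)   # both outliers are eliminated
```

Only the windowed min and max are ever computed, which is what makes the approximation cheaper than a full median.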
A simple shape decomposition can be created by a succession of open-close filters with increasing structuring element size (Vincent 1994). This progression is often called the alternating sequential filter (ASF) method (Serra and Vincent 1992). The Pitas-Venetsanopoulos decomposition has also been applied to several imaging applications (Pitas and Venetsanopoulos 1994). Given an image object defined by a set of samples {y_n} and a convex structuring element B, the largest constant r_1 (called the radius) is found such that {y_n} ∘ r_1B ≠ ∅. The first-order approximation in the shape decomposition is defined as

{y(1)_n} = {y_n} ∘ r_1B .   (44)
Then, the largest radius r_2 is computed such that ({y_n} \ {y(1)_n}) ∘ r_2B ≠ ∅, which leads to the second-order approximation

{y(2)_n} = {y(1)_n} ∪ ({y_n} \ {y(1)_n}) ∘ r_2B .   (45)

Given {y(0)_n} = ∅, the following recursion defines the decomposition:
{y(t + 1)_n} = {y(t)_n} ∪ ({y_n} \ {y(t)_n}) ∘ r_{t+1}B ,   (46)
where t ≥ 0 and r_{t+1} is the radius of the maximal structuring element r_{t+1}B that can be inscribed in {y_n} \ {y(t)_n}. Using (46), a union of disjoint shape components is created and may be used for high-level image analysis. The multi-scale representation can then be utilized for image segmentation, edge detection, image coding, or for correspondence processes such as stereo vision, motion analysis, and feature registration. An example morphological image application is presented in Fig. 9. Here, the morphological operators are used in an optical character recognition process. First, the gray-scale image of Fig. 9(a) is thresholded (t = 150) such that the
Fig. 9. (a) Original "Letters" image. (b) Binary segmentation. (c) Close-open with 9 × 9 SQUARE. (d) Medial axis transformation. (e) Inverse medial axis transform.

characters are black and the background white, yielding Fig. 9(b). However, an optimal threshold does not exist to provide a complete separation of the characters and the background. To remove the unwanted holes and small regions, a 9 × 9 close-open filter is successfully applied (Fig. 9(c)). Moreover, the medial axis transform (MAT) (Blum 1973; Serra 1988) is used to provide a compact representation of this binary image (see Fig. 9(d)). One can exactly recreate the original via the inverse MAT. Image morphology, which can be viewed as a method of order statistics, is growing rapidly in its theory and applications. These economical operators offer powerful image smoothers and shape descriptors. In addition to the applications presented here, image morphology can be used to analyze granulometry (the size distribution of objects), to create morphological skeletons, to bound derivative operators, to measure object-to-object distances, and to define a new method of sampling based on image shapes. Indeed, only a minute portion of the related current theory and applications can be represented here.
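The decomposition recursion (46) can be sketched for 1-D binary shapes, with an object represented as a set of integer sample positions, rB as the segment {-r, ..., r}, and the set-theoretic opening taken as the union of all translates of rB contained in the set. Function names and the example shape below are illustrative, not from the chapter:

```python
def opening_set(s, r):
    """Opening of a set of integer samples s by rB = {-r, ..., r}:
    the union of all translates of rB entirely contained in s."""
    out = set()
    for t in s:
        ball = set(range(t - r, t + r + 1))
        if ball <= s:          # translate fits inside the shape
            out |= ball
    return out

def decompose(s):
    """Recursion (46): peel off components of maximal radius until
    the whole shape is reconstructed."""
    approx, radii = set(), []
    while approx != s:
        rest = s - approx
        r = 0                  # largest radius with (rest o rB) nonempty
        while opening_set(rest, r + 1):
            r += 1
        radii.append(r)
        approx = approx | opening_set(rest, r)
    return radii, approx

# Two intervals of different widths: a coarse component of radius 4
# and a finer one of radius 1.
shape = set(range(0, 10)) | set(range(20, 23))
radii, recon = decompose(shape)
```

The returned radii give the coarse-to-fine description, and the union of the peeled components reconstructs the original shape exactly.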
5. Related OS applications

Edge detection
Since OS filters can be defined to have powerful edge-preserving capability, they can also be employed to locate image discontinuities. Edge detection is an important task in image processing, since it provides the subdivision of an image into delineated, structurally significant regions. Many higher-level image understanding or vision tasks depend on the success of edge detection. Unfortunately, many edge detection schemes are sensitive to noise and are expensive to implement.
Bovik and Munson (1986) proposed an edge detection scheme that is both inexpensive and resilient to outliers. Their method uses median comparisons, instead of average comparisons, between local neighborhoods on each side of a prospective edge. Statistical and deterministic results show that the median-based edge detector is more effective than average-based detectors in certain circumstances. The median filter (and other OS filters) may be used to pre-process an image for traditional edge detection techniques. Median prefiltering can dramatically improve the performance of edge detectors in terms of increased noise suppression away from edges, and increased preservation of detail (Bovik et al., 1987).
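The median-comparison idea can be sketched in 1-D by scoring a prospective edge between samples i-1 and i with the difference of a location statistic over the k samples on either side. The signal and values below are illustrative assumptions, not taken from the paper; they show the median-based score ignoring an outlier that fools the mean-based score:

```python
from statistics import mean, median

def edge_strength(x, i, k, stat):
    """|stat(k samples left of i) - stat(k samples right of i)| for a
    prospective edge located between samples i-1 and i."""
    return abs(stat(x[i - k:i]) - stat(x[i:i + k]))

# Ideal step edge between samples 9 and 10, plus one outlier at sample 3.
x = [0.0] * 10 + [10.0] * 10
x[3] = 50.0
k = 3

med_scores = {i: edge_strength(x, i, k, median)
              for i in range(k, len(x) - k + 1)}
mean_scores = {i: edge_strength(x, i, k, mean)
               for i in range(k, len(x) - k + 1)}
# med_scores peaks only around the true edge; mean_scores is largest
# near the outlier, a spurious edge response.
```

Replacing the median with other robust order statistics in `stat` gives the same kind of outlier resistance.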
Image enhancement and restoration

Least squares methods

The most common application of OS filters is in signal enhancement and restoration. Order statistics have also been incorporated into more sophisticated enhancement/restoration algorithms. Bovik et al. designed an image restoration algorithm that uses OS-based hypothesis testing to preserve edges and smooth between edges (Bovik et al., 1985). The order statistics within a moving window are used to detect an edge of minimum height. If an edge is present, the output value is given by the order-constrained least squares fit of the window samples. If an edge is not present, the average of the windowed samples is the output value. In this way, edge information is incorporated into the restoration process. The algorithm compared favorably to both the median and average filters, in terms of both subjective perception and mean squared error.

Locally monotonic regression

Certain properties of OS filters have developed into a theory of signal transformations which are not OS filtering operations themselves, but are global characterizations of the result of OS filtering. Locally monotonic (LOMO) regression is a device for enhancing signals by computing the "closest" LOMO signal to the input, where closeness is defined by a given distance norm (Restrepo and Bovik, 1993). Hence, the computation of a LOMO regression may be compared to finding the root signal produced by iterative application of a median filter; however, the signal computed by LOMO regression is optimal in the sense of similarity to the input signal. Furthermore, LOMO regression yields maximum likelihood estimates of locally monotonic signals contaminated with additive i.i.d. noise (Restrepo and Bovik 1994). The high computational cost of LOMO regression is a drawback. As the number of samples in the input signal increases, the number of operations required to compute a LOMO regression increases exponentially (Restrepo and Bovik, 1993).
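For comparison with the root-signal view, here is a minimal sketch of iterating a window-3 median filter to a fixed point (endpoints are simply passed through; LOMO regression itself solves a different, norm-optimal problem, so this is only the reference point mentioned above):

```python
from statistics import median

def med3(x):
    """One pass of a window-3 median filter; endpoints are left unchanged."""
    return [x[0]] + [median(x[i - 1:i + 2])
                     for i in range(1, len(x) - 1)] + [x[-1]]

def root_signal(x, max_iter=100):
    """Iterate the median filter until the output stops changing:
    the result is a root (invariant) signal of the filter."""
    for _ in range(max_iter):
        y = med3(x)
        if y == x:
            break
        x = y
    return x

# Two isolated impulses on a constant signal are removed in one pass,
# after which the signal is invariant under further filtering.
x = [2, 2, 9, 2, 2, 2, 0, 2, 2]
r = root_signal(x)
```

A root signal is locally monotonic, which is why iterated median filtering is the natural baseline against which LOMO regression is measured.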
Faster windowed LOMO methods for 1-D signals can be implemented at the expense of relaxing the requirements for optimality (Restrepo and Bovik, 1991). Unlike standard OS filtering, the extension of LOMO regression to higher-dimensional signals is a difficult, ill-posed problem. Approximate methods which allow small deviations from the characteristic property of local monotonicity are
described and applied to corrupted images in (Acton and Bovik 1993). The success of LOMO regression-based image enhancement has also led to the development of other forms of regression, including piecewise constant (PICO) regression. The result of a PICO regression is comparable to the result of applying the WMMR OS filter multiple times. The example in Fig. 10 depicts the ability of PICO regression to eliminate heavy-tailed noise within piecewise constant image regions. Notice the detail preservation of the PICO regression result in Fig. 10(c). PICO regression has been utilized to enhance and restore inherently piecewise
Fig. 10. (a) Original "South Texas" image. (b) "South Texas" corrupted by Laplacian-distributed additive noise (σ = 11.0). (c) PICO regression of the noisy image.
constant signals (Acton and Bovik 1993) and also to segment images into PICO regions (Acton and Bovik 1994a). A more difficult problem in image processing occurs when a blurring process is concurrent with noise degradation. For example, consider a digital image taken from a moving automobile, resulting in a linearly blurred scene. In addition, the image is subject to additive thermal noise. To restore such an image, it is necessary to simultaneously sharpen (de-blur) and smooth (de-noise) the image, which are conflicting tasks. The straightforward application of an OS filter or locally monotonic regression would fail to deblur the image. Optimization-based algorithms which iteratively deconvolve the blurred image while enforcing local monotonicity to remove noise have been applied successfully (Acton and Bovik 1994b); this approach, which is very recent, seems very promising.
6. Conclusions

The available theory and tools for processing digital imagery have been significantly enhanced by the addition of order statistic techniques. From Tukey's discovery of the median filter to the current research on morphology, the brief history of order statistics in image processing reveals a rich, fundamental contribution. The OS filter theory provides a general approach to image enhancement and optimal nonlinear filter design. The basic concepts have been extended to capitalize on both spatial ordering and rank ordering. Also, the filters have formed the basis of a new area in image processing that explores higher level questions regarding object integrity and shape. Numerous practical applications to significant image processing problems have been reported. Nevertheless, several important theoretical questions are as yet unanswered, and a plethora of ripe application areas have yet to be explored.
References

Acton, S. T. and A. C. Bovik (1993). Nonlinear regression for image enhancement via generalized deterministic annealing. Proc. of the SPIE Symp. Visual Commun. Image Process., Boston, Nov. 7-12.
Acton, S. T. and A. C. Bovik (1994). Segmentation using piecewise constant regression. Proc. of the SPIE Symp. Visual Commun. Image Process., Chicago, September 25-28.
Acton, S. T. and A. C. Bovik (1994). Piecewise and local class models for image restoration. IEEE Int. Conf. Image Process., Austin, TX, Nov. 13-16.
Arce, G. A. and M. P. McLoughlin (1987). Theoretical analysis of the max/median filter. IEEE Trans. Acoust., Speech, Signal Process. ASSP-35, 60-69.
Arce, G. A. and R. E. Foster (1989). Detail-preserving ranked-order based filters for image processing. IEEE Trans. Acoust., Speech, Signal Process. ASSP-37, 83-98.
Ataman, E., V. K. Aatre and K. M. Wong (1980). A fast method for real-time median filtering. IEEE Trans. Acoust., Speech, Signal Process. ASSP-28, 415-420.
Barner, K. E., G. R. Arce and J.-H. Lin (1992). On the performance of stack filters and vector detection in image restoration. Circuits Syst. Signal Process. 11.
Barner, K. E. and G. R. Arce (1994). Permutation filters: A class of nonlinear filters based on set permutations. IEEE Trans. Signal Process. 42, 782-798.
Bednar, J. B. and T. L. Watt (1984). Alpha-trimmed means and their relationship to median filters. IEEE Trans. Acoust., Speech, Signal Process. ASSP-32, 145-153.
Blum, H. (1973). A transformation for extracting new descriptors of shape. In: Wathen-Dunn, W., ed., Models for the Perception of Speech and Visual Forms. MIT Press, Cambridge.
Boncelet, C. G. (1987). Algorithms to compute order statistic distributions. SIAM J. Sci. Stat. Comput. 8(9), 868-876.
Bovik, A. C., T. S. Huang and D. C. Munson, Jr. (1983). A generalization of median filtering using linear combinations of order statistics. IEEE Trans. Acoust., Speech, Signal Process. ASSP-31, 1342-1350.
Bovik, A. C., T. S. Huang and D. C. Munson, Jr. (1985). Edge sensitive image restoration using order-constrained least-squares methods. IEEE Trans. Acoust., Speech, Signal Process. ASSP-33, 1253-1263.
Bovik, A. C. and D. C. Munson, Jr. (1986). Edge detection using median comparisons. Comput. Vision, Graphics, Image Process. 33, 377-389.
Bovik, A. C. (1987). Streaking in median filtered images. IEEE Trans. Acoust., Speech, Signal Process. ASSP-35, 493-503.
Bovik, A. C. and A. Restrepo (Palacios) (1987). Spectral properties of moving L-estimates of independent data. J. Franklin Inst. 324, 125-137.
Bovik, A. C., T. S. Huang and D. C. Munson, Jr. (1987). The effect of median filtering on edge estimation and detection. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 181-194.
Coyle, E. J. and J.-H. Lin (1988). Stack filters and the mean absolute error criterion. IEEE Trans. Acoust., Speech, Signal Process. ASSP-36, 1244-1254.
Coyle, E. J., J.-H. Lin and M. Gabbouj (1989). Optimal stack filtering and the estimation and structural approaches to image processing. IEEE Trans. Acoust., Speech, Signal Process. ASSP-37 (12).
Crow, E. L. and M. M. Siddiqui (1967).
Robust estimation of location. J. Amer. Statist. Assoc. 62, 353-389.
David, H. A. (1955). A note on moving ranges. Biometrika 42, 512-515.
David, H. A. (1982). Order Statistics. Wiley, New York.
Eberly, D., H. G. Longbotham and J. Aragon (1991). Complete classification of roots of one-dimensional median and rank-order filters. IEEE Trans. Signal Process. 39, 197-200.
Fitch, J. P., E. J. Coyle and N. C. Gallagher (1984). Median filtering by threshold decomposition. IEEE Trans. Acoust., Speech, Signal Process. ASSP-32, 1183-1188.
Fitch, J. P., E. J. Coyle and N. C. Gallagher (1985). Threshold decomposition of multidimensional ranked-order operations. IEEE Trans. Circuits Syst. CAS-32, 445-450.
Frieden, B. R. (1976). A new restoring algorithm for the preferential enhancement of edge gradients. J. Opt. Soc. Amer. 66, 280-283.
Gallagher, N. C. and G. L. Wise (1981). A theoretical analysis of the properties of median filters. IEEE Trans. Acoust., Speech, Signal Process. ASSP-29, 1136-1141.
Gastwirth, J. L. and M. L. Cohen (1970). Small sample behavior of some robust linear estimates of location. J. Amer. Statist. Assoc. 65, 946-973.
Gandhi, P. and S. A. Kassam (1991). Design and performance of combination filters. IEEE Trans. Signal Process. 39(7).
Heijmans, H. J. A. M. (1994a). Construction of self-dual morphological operators and modifications of the median. IEEE Int. Conf. Image Process., Austin, TX, Nov. 13-16.
Heijmans, H. J. A. M. (1994b). Mathematical morphology as a tool for shape description. In: Ying-Lie O et al., eds., Shape in Picture: Mathematical Description of Shape in Gray-level Images. Springer-Verlag, Berlin.
Heinonen, P. and Y. Neuvo (1987). FIR-median hybrid filters. IEEE Trans. Acoust., Speech, Signal Process. ASSP-35, 145-153.
Huang, T. S., G. J. Yang and G. Y. Tang (1979). A fast two-dimensional median filtering algorithm. IEEE Trans. Acoust., Speech, Signal Process. ASSP-27, 13-18.
Kuhlman, F. and G. L. Wise (1981). On the second moment properties of median filtered sequences of independent data. IEEE Trans. Commun. COM-29, 1374-1379.
Lee, Y. H. and S. A. Kassam (1985). Generalized median filtering and related nonlinear filtering techniques. IEEE Trans. Acoust., Speech, Signal Process. ASSP-33, 672-683.
Liao, G.-Y., T. N. Nodes and N. C. Gallagher (1985). Output distributions of two-dimensional median filters. IEEE Trans. Acoust., Speech, Signal Process. ASSP-33, 1280-1295.
Lin, J.-H., T. M. Selke and E. J. Coyle (1990). Adaptive stack filtering under the mean absolute error criterion. IEEE Trans. Acoust., Speech, Signal Process. ASSP-38(6), 938-953.
Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
Longbotham, H. G. and A. C. Bovik (1989). Theory of order statistic filters and their relationship to linear FIR filters. IEEE Trans. Acoust., Speech, Signal Process. ASSP-37, 275-287.
Longbotham, H. G. and D. Eberly (1992). Statistical properties, fixed points, and decomposition with WMMR filters. J. Math. Imaging and Vision 2, 99-116.
Longbotham, H. G. and D. Eberly (1993). The WMMR filters: A class of robust edge enhancers. IEEE Trans. Signal Process. 41, 1680-1684.
Maragos, P. and R. W. Schafer (1987). Morphological filters, Part II. IEEE Trans. Acoust., Speech, Signal Process. ASSP-35.
McLoughlin, M. P. and G. A. Arce (1987). Deterministic properties of the recursive separable median filter. IEEE Trans. Acoust., Speech, Signal Process. ASSP-35, 98-106.
Naaman, L. and A. C. Bovik (1991). Least squares order statistic filters for signal restoration. IEEE Trans. Circuits and Syst. 38, 244-257.
Nieminen, A., P. Heinonen and Y. Neuvo (1987). A new class of detail-preserving filters for image processing. IEEE Trans. Pattern Anal. Machine Intell. PAMI-9.
Nodes, T. A. and N. C. Gallagher (1982). Median filters: Some modifications and their properties. IEEE Trans.
Acoust., Speech, Signal Process. ASSP-30, 739-746.
Nodes, T. A. and N. C. Gallagher (1983). Two-dimensional root structures and convergence properties of the separable median filter. IEEE Trans. Acoust., Speech, Signal Process. ASSP-31, 1350-1365.
Oflazer, K. (1983). Design and implementation of a single-chip median filter. IEEE Trans. Acoust., Speech, Signal Process. ASSP-31.
Palmieri, F. and C. G. Boncelet, Jr. (1989). Ll-filters: A new class of order statistic filters. IEEE Trans. Acoust., Speech, Signal Process. ASSP-37.
Pitas, I. and A. N. Venetsanopoulos (1990). Morphological shape decomposition. IEEE Trans. Pattern Anal. and Mach. Intell. PAMI-12, 38-45.
Rabiner, L. R., M. R. Sambur and C. E. Schmidt (1975). Applications of a nonlinear smoothing algorithm to speech processing. IEEE Trans. Acoust., Speech, Signal Process. ASSP-23, 552-557.
Restrepo (Palacios), A. and A. C. Bovik (1986). Spectral analysis of order statistic filters. IEEE Int. Conf. Acoust., Speech, Signal Process., Tokyo.
Restrepo (Palacios), A. and A. C. Bovik (1988). Adaptive trimmed mean filters for image restoration. IEEE Trans. Acoust., Speech, Signal Process. 36, 1326-1337.
Restrepo (Palacios), A. and A. C. Bovik (1991). Windowed locally monotonic regression. IEEE Int. Conf. Acoust., Speech, Signal Process., Toronto.
Restrepo (Palacios), A. and A. C. Bovik (1993). Locally monotonic regression. IEEE Trans. Signal Process. 41, 2796-2810.
Restrepo (Palacios), A. and A. C. Bovik (1994). On the statistical optimality of locally monotonic regression. IEEE Trans. Signal Process. 42.
Sarhan, A. E. (1955). Estimation of the mean and standard deviation by order statistics. Ann. Math. Stat. 25, 317-328.
Sarhan, A. E. (1955a). Estimation of the mean and standard deviation by order statistics, Part II. Ann. Math. Stat. 26, 505-511.
Sarhan, A. E. (1955b). Estimation of the mean and standard deviation by order statistics, Part III. Ann. Math. Stat. 26, 576-592.
Serra, J. (1988). Image Analysis and Mathematical Morphology, Volume 2: Theoretical Advances. Academic Press, London.
Serra, J. and L. Vincent (1992). An overview of morphological filtering. Circuits, Systems, and Signal Processing 11(1), 47-108.
Tukey, J. W. (1971). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
Tukey, J. W. (1984). Nonlinear (nonsuperimposable) methods for smoothing data. In: Conf. Rec., EASCON, 673, 1974 (also available in The Collected Works of John W. Tukey, II, Time Series: 1965-1984, D. R. Brillinger, ed., Wadsworth, Monterey, CA).
Tyan, S. G. (1981). Median filtering: Deterministic properties. In: Huang, T. S., ed., Two-dimensional Signal Processing: Transforms and Median Filters. Springer-Verlag, New York.
Velleman, P. F. (1980). Definition and comparison of robust nonlinear data smoothing algorithms. J. Amer. Statist. Assoc. 75, 609-615.
Vincent, L. (1994). Morphological area openings and closings for grey-scale images. In: Ying-Lie O, ed., Shape in Picture: Mathematical Description of Shape in Grey-level Images. Springer-Verlag, Berlin, 197-208.
Wendt, P. D., E. J. Coyle and N. C. Gallagher, Jr. (1986). Stack filters. IEEE Trans. Acoust., Speech, Signal Process. ASSP-34, 898-911.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Order Statistics Application to CFAR Radar Target Detection
R. Viswanathan
1. Introduction
A constant false alarm rate (CFAR) detector is required for automatic detection and tracking of targets using radars (Nathanson 1991; Nitzberg 1992; Skolnik 1980). The first CFAR test proposed was called cell averaging (CA)-CFAR (Finn and Johnson 1968). Two variations of this were introduced later to address some of the shortcomings of the CA-CFAR test. More recent works have considered tests based on order statistics for CFAR detection (Gandhi and Kassam 1988). Even though the tests based on order statistics perform better than CA-CFAR and its variations over a wide range of situations, there exists no single test that performs best in every possible situation. The term "situations" will become clear from the discussions presented later in this section. As an optimal test is difficult to find even in a restricted set of situations, there have been a number of attempts to design "improved" CFAR detectors. The bulk of these, which are based on order statistics, is the focal point of this paper. Denote by Z the squared envelope detector output corresponding to a test cell of a radar search volume (Di Franco and Rubin 1980; Nitzberg 1992). Based on Z, the problem is to decide whether the signal received at the radar receiver is due to reflections from a target plus background noise and clutter (hypothesis H1) or due to only clutter plus noise (hypothesis H0). When the clutter is dominant, one can neglect the effect of noise. Typically, the form of the density of W = √Z is known to a reasonable degree, but some parameters associated with it are unknown. The exact density of W depends on the nature of the target and clutter, but, under the null hypothesis, there exist four possible standard densities, namely Rayleigh, Weibull, log-normal, and K-distribution. Except for the first, these are distributions with two parameters. One of the standard target models is exponential for Z (a Rayleigh for W).
For the exponential target (H1) and the exponential clutter (H0), the mean under the alternative is larger than the mean under the null hypothesis. That is, the distributions under the two hypotheses are stochastically ordered. Therefore, the target detection problem in this case can be written as
H1: Z ~ exponential with mean λ1
H0: Z ~ exponential with mean λ0   (1)
with λ1 > λ0, but both parameters being unknown. Even though a UMP test of (1) exists and is of the form

decide H1 iff Z ≥ b ,   (2)
the threshold b cannot be determined for a given probability of false alarm, PF (same as the type I error probability), since λ0 is not known. The PF of the test (2) with a fixed b varies significantly even with small changes in λ0. A method to obtain a CFAR test is then based on obtaining several reference samples X = (X1, X2, ..., Xn) as the output of the squared envelope detector corresponding to the radar returns in the cells adjacent to the test cell (typical values of n range from 10 to 30). The hope is that the noise plus clutter present in these reference cells is similar to the noise plus clutter in the test cell, and therefore a reasonable estimator of λ0 using the reference samples can be obtained. Typically it is assumed that the samples X1, X2, ..., Xn are independent among themselves and are independent of Z. Correlation between the samples might occur when the samples are converted to digital signals using an A/D operation. A/D sampling frequencies higher than the radar bandwidth cause adjacent resolution cell voltages to be statistically correlated (Nitzberg 1992). As a first degree analysis, such correlation effects can be ignored. Denoting the estimator as S(X), a test inspired by (2) is given by
decide H1 iff Z ≥ tS ,   (3)

where t is an appropriate constant to be determined. Fig. 1 shows a conceptual block diagram of a typical CFAR test. The corresponding false alarm probability is given by

PF = P(Z ≥ tS | λ0)
(4)
If λ0 is the scale parameter of the density of S and if it is the only parameter, then PF is independent of λ0, and a constant t that achieves the required false alarm probability can be found. In the case of the exponential density with identically distributed {Xi}, the sample mean (1/n) Σ_{i=1}^n Xi is a UMVUE of λ0, and the test (3) with S as the sample mean is called the cell averaging (CA-CFAR) test:
decide H1 iff Z ≥ t Σ_{i=1}^{n} Xi .   (5)
In the above equation the constant n has been absorbed into the threshold t, which is to be determined for a desired false alarm rate. The CA test is very
Fig. 1. A CFAR test based on adjacent resolution cells.

appealing because it uses the UMVUE. If the problem (1) is modified such that, under H0, i.i.d. exponential reference samples {Xi} with mean λ0 are available, then it has been shown recently by Gandhi and Kassam that the CA-CFAR test is indeed UMP for a specified PF (see Gandhi and Kassam 1994). The CA test is not totally satisfactory because the Xi's may not always be identically distributed. It is well known that the sample mean is not a robust estimator when outliers are present. Realistically, with a reasonably large number of adjacent cells, it is likely that 1) some of these samples are from other interfering targets and 2) two groups of samples may be from differing clutter power backgrounds when a clutter transition occurs within the range of resolution cells. There are several models for clutter transition, such as ramp, step, etc. (Nitzberg 1992), but we consider only the step clutter transition in the sequel, as it seems to be predominant in many practical situations. Fig. 2 illustrates a step clutter, showing two values for clutter power, one to the left and the other to the right of the transition. When a clutter transition occurs, ideally we want an estimate of the power level (which is the mean in the case of the exponential) of the clutter-plus-noise background that is present in the test cell under investigation. Since estimates are based on a finite number of samples, the ideal value cannot be realized. The tradeoff parameters are 1) good probability of detection performance in the homogeneous background, that is, with no interfering targets or clutter power variations, 2) good resolution of multiple targets that may be closely spaced within the resolution cells, and 3) low false alarm rate swings during clutter transitions.
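For i.i.d. exponential reference cells, the CA test of (5) admits a closed-form threshold: PF = P(Z ≥ t Σ Xi) = (1 + t)^(-n), which follows from the moment generating function of the exponential distribution and does not depend on λ0; this is exactly the CFAR property. A sketch (function names are illustrative) that sets the threshold and checks the scale invariance by simulation:

```python
import random

def ca_cfar_threshold(n, pf):
    """Threshold t for the CA test of (5), 'decide H1 iff Z >= t * sum(Xi)'.
    With i.i.d. exponential reference cells under H0, PF = (1 + t)**(-n),
    independent of the unknown mean lambda0 (the CFAR property)."""
    return pf ** (-1.0 / n) - 1.0

def simulate_pf(n, t, lam0, trials, rng):
    """Monte Carlo estimate of P(Z >= t * sum(Xi)) under H0."""
    false_alarms = 0
    for _ in range(trials):
        s = sum(rng.expovariate(1.0 / lam0) for _ in range(n))
        z = rng.expovariate(1.0 / lam0)
        false_alarms += z >= t * s
    return false_alarms / trials

rng = random.Random(1)
n, pf = 16, 0.1                       # a large PF keeps the check cheap;
t = ca_cfar_threshold(n, pf)          # radar practice uses PF near 1e-6
est_a = simulate_pf(n, t, lam0=3.7, trials=200_000, rng=rng)
est_b = simulate_pf(n, t, lam0=0.2, trials=50_000, rng=rng)
# est_a and est_b should both be close to pf, whatever lambda0 is
```

The same simulation with an interferer injected into one reference cell would show the empirical PF (and detection probability) degrading, which is the target masking effect discussed next.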
Fig. 2. Step clutter.
Two practical problems associated with the performance of CA-CFAR can now be seen. The threshold t in the CA test of (5) is computed using a homogeneous background and a specified probability of false alarm, which is typically of the order of 10^-6. If there is even a single interfering target causing one of the reference samples in the set {Xi, i = 1, ..., n} to have a larger value, then the estimate S will assume a value larger than it should. The consequence of this is that if a target is present in the test cell, the probability that Z exceeds tS will be diminished, yielding a low probability of detection. This is called the target masking phenomenon. Another undesirable situation is when a clutter transition occurs within the reference cells such that the test cell is in the high clutter power region along with half of the reference cells, with the remaining cells in the low clutter. In this case the estimate S will be lower than it should be, and if there is no target in the test cell, the probability that Z exceeds tS will be increased, yielding a large increase in the probability of false alarm (upper false alarm swing). It is to be understood that each of the CFAR tests discussed here will be designed (that is, an appropriate threshold constant and test statistic will be found) so that the test has the desired false alarm probability under a homogeneous noise background. Historically, two variations of CA-CFAR, termed GO-CFAR and SO-CFAR, were proposed to remedy some of the problems associated with CA-CFAR. With reference to Fig. 1, the SO-CFAR, which stands for smallest-of-CFAR, takes as the estimate of λ0 the minimum of the two arithmetic means formed using the samples to the right of the test cell (lagging window) and the samples to the left of the test cell (leading window), respectively (Weiss 1982).
Therefore, if one or more interfering targets are present in only one of the lagging or leading windows, then the estimate S will not be large as in the CA case, and target masking does not happen. However, target masking does happen if interfering targets appear in both windows. Also, the SO-CFAR is unable to control the upper false alarm swing that accompanies a clutter transition. The GO-CFAR, or greatest-of-CFAR, computes as its estimate the maximum of the two arithmetic means from the two windows (Hansen 1973). This controls the upper false alarm swing during clutter transitions, but target masking occurs when interfering targets are present. Thus, these two variations of CA-CFAR are able to address one or the other, but not both, of the problems encountered with the CA-CFAR. Estimators based on order statistics are known to be robust. Also, not using the samples from interfering targets in obtaining an estimate is essentially a problem of estimation with outliers, and therefore estimators such as the trimmed mean, linear combinations of order statistics, etc., should prove useful. The rest of the chapter is organized as follows. In Section 2 we discuss order statistics based CFAR tests for target detection in Rayleigh clutter. Section 3 presents order statistics based CFAR tests for Weibull, log-normal, and K-clutter. In Section 4 we conclude the chapter.
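The CA, SO, and GO estimators can be compared directly on illustrative window contents (the values below are assumptions chosen to exhibit the effect, not data from the chapter):

```python
def window_mean(cells):
    return sum(cells) / len(cells)

def ca_estimate(lead, lag):
    """Cell averaging: mean of all reference cells."""
    return window_mean(lead + lag)

def so_estimate(lead, lag):
    """Smallest-of: the smaller of the leading and lagging window means."""
    return min(window_mean(lead), window_mean(lag))

def go_estimate(lead, lag):
    """Greatest-of: the larger of the two window means."""
    return max(window_mean(lead), window_mean(lag))

# A single interfering target (the 50.0 sample) in the lagging window:
lead = [1.0, 1.0, 1.0, 1.0]
lag = [1.0, 1.0, 1.0, 50.0]
# CA and GO estimates are inflated by the interferer (target masking);
# SO still tracks the clean leading window.
```

The symmetric case, a clutter step raising one window, shows the opposite ranking: SO underestimates the background and produces the upper false alarm swing, which GO controls.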
2. Order statistics based CFAR tests for Rayleigh clutter
When the clutter amplitude is Rayleigh distributed, the squared-envelope detector output is exponentially distributed. Since the noise also has a Rayleigh amplitude, the squared-envelope detector output when noise plus clutter is present at its input will also be exponentially distributed. When a clutter power transition occurs, for the sake of convenience, we call the high clutter plus noise simply clutter (or high clutter) and the low clutter plus noise simply noise (or low clutter). Reflections from an airplane due to the transmission of a single pulse can be modeled as a Gaussian process, with the corresponding amplitude distributed as Rayleigh. Such a model for the target is called the Rayleigh fluctuating target model or simply the Rayleigh target (Di Franco and Rubin 1980; Nathanson 1991). For multiple pulse transmission, the target returns can be classified into four categories, Swerling I through Swerling IV. Swerling I and II have Rayleigh distributed amplitude, whereas Swerling III and IV have Rayleigh-plus-one-dominant amplitude distribution (Di Franco and Rubin 1980; Nathanson 1991). For the most part we concern ourselves with the single pulse case and the Rayleigh target. With a Rayleigh target, under the target hypothesis H1, the envelope W will be Rayleigh distributed (and Z will be distributed as an exponential). Therefore, the test sample and the reference samples are all independent and exponentially distributed. The mean of each of these samples is determined according to the scenario. The mean of the squared-envelope detector output when only noise is present at its input is assumed to be λ0. The mean when a target reflection is present is taken as λ0(1 + SNR), where SNR stands for the signal-to-noise power ratio, which is the ratio of the means of the squared envelope corresponding to target and noise, respectively. Similarly, one can define INR, the interfering target-to-noise power ratio, and
R. Viswanathan

Table 1
The mean values (of the exponential distributions) of different cells

Cell                                     Rayleigh target (H1)   No target (H0)
Test cell Z, noise only                  2σ²(1 + SNR)           2σ²
Test cell Z, clutter (high)              2σ²(1 + SNR + CNR)     2σ²(1 + CNR)
Reference cell X_i, noise only           2σ²                    2σ²
Reference cell X_i, clutter (high)       2σ²(1 + CNR)           2σ²(1 + CNR)
Reference cell X_i, interfering target   2σ²(1 + INR)           2σ²(1 + INR)
CNR, the clutter (high)-to-noise power ratio. The detection problem can be summarized as follows.

2.1. Fixed order statistics test (OS-CFAR)
The fixed order statistics test (OS-CFAR) is based on the following:

Z ≷_{H0}^{H1} t Y_r ,   (6)
where Y_r is the r-th order statistic of the samples {X_i, i = 1, ..., n}. Since the sample size n is clear from the context, the r-th order statistic is denoted by Y_r instead of the customary Y_{r:n}. The first thing to observe is that (6) is a CFAR test, because under a homogeneous background, 2σ² is also the scale parameter of the density of Y_r. The probability of false alarm and the probability of detection under a homogeneous background are given by (Gandhi and Kassam 1988; Rohling 1983)
P_F = P(Z ≥ t Y_r | H0) = ∏_{i=0}^{r−1} (n − i) / (n − i + t) ,   (7)

P_D = P(Z ≥ t Y_r | H1) = ∏_{i=0}^{r−1} (n − i) / (n − i + t/(1 + SNR)) .   (8)
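As a quick numerical companion to (7), the sketch below evaluates the false alarm product and inverts it for the scale factor t by bisection. The function names and the bisection routine are ours, not from the chapter; only the product formula is taken from (7).

```python
def pf_os_cfar(t, n, r):
    """Eq. (7): P_F = prod_{i=0}^{r-1} (n - i) / (n - i + t)."""
    p = 1.0
    for i in range(r):
        p *= (n - i) / (n - i + t)
    return p

def solve_threshold(n, r, pf_target, hi=1e9, iters=200):
    """Find t with pf_os_cfar(t, n, r) = pf_target; P_F decreases in t."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pf_os_cfar(mid, n, r) > pf_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: the chapter's working point n = 24, r = 21, P_F = 1e-6.
t_star = solve_threshold(24, 21, 1e-6)
```

By the remark following (8), the same routine gives P_D by calling it with t/(1 + SNR) in place of t.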
Notice that P_D can be found from the P_F expression by replacing t with t/(1 + SNR). Because of the assumed Rayleigh target and Rayleigh clutter models, this relation between P_F and P_D is valid for any CFAR test. Rohling first proposed an order statistic test for CFAR detection (Rohling 1983). It was later analyzed by Gandhi and Kassam (1988), and Blake (1988). Detailed analyses of CA-CFAR and OS-CFAR are presented in (Gandhi and Kassam 1988). As that paper shows, in a homogeneous background, for n = 24 and P_F = 10⁻⁶, the probability of detection of OS-CFAR, when r is greater
Order statistics application to CFAR radar target detection

Fig. 3. Performance in homogeneous background (N = 24, P_F = 10⁻⁶): probability of detection vs. SNR (dB) for several values of r, including r = 8 and r = 24.
than or equal to n/2, is not far from that of the CA-CFAR, which is UMP. Only the order statistics well below the median give poor detection performance; r = 21 gave the best performance in the sense that its P_D curve with respect to SNR is closest to that of the CA test. Fig. 3 shows a sample performance curve for the homogeneous background. In general, r values of the order of 3n/4 are recommended (Rohling 1983). For understanding the performance in the nonhomogeneous situation, we consider two cases: 1) interfering targets and 2) clutter transition. Even though it is possible that both these situations occur together, by considering their effects separately it is possible to predict the effects when both occur together. By choosing r < n, the OS-CFAR can avoid target masking when interfering targets appear in the reference window. Specifically, it can tolerate (n − r) interfering targets in the reference window and achieve a monotonically increasing probability of detection as a function of SNR. A simplified but effective model for the interfering-target situation is to assume that all the interferers have the same strength, so that the corresponding X_i samples all have mean value 2σ²(1 + INR). With k interferers, there exist two groups of reference samples, with the mean of each of the k samples in one
group being 2σ²(1 + INR), and the mean of each of the (n − k) samples in the other being 2σ². By finding the density of Y_r with these two groups of random samples, we can obtain an expression for P_D (Gandhi and Kassam 1988):
P_D = ∑_{i=r}^{n} ∑_{j=max(0, i−k)}^{min(i, n−k)} (⋯) / [ t/(1 + SNR) + n − k − j + (⋯ + k − i + j)/(1 + CNR) ]   (9)
Fig. 4 is a plot of P_D vs. SNR, with INR = SNR and different numbers of interferers, k. As expected, the OS test with r = 21 and n = 24 can tolerate up to 3 interfering targets without test cell target masking. When the number of interfering targets exceeds (n − r), P_D asymptotically approaches a value less
Fig. 4. Performance of OS-CFAR in the interfering target situation (N = 24, P_F = 10⁻⁶, INR = SNR, r = 21; k = number of interfering targets, with curves for k = 1, ..., 4).
than 1 as SNR approaches ∞. This value decreases as the difference (k − (n − r)) increases. The performance in step clutter can also be examined using (9). With k cells from the high clutter and the rest from the noise (low clutter plus noise), P_F is given by (9) with SNR replaced by CNR when the test cell is from the high clutter, and with SNR = 0 when the test cell is from the noise. Fig. 5 shows the false alarm change performances of OS-CFAR and CA-CFAR. Three things can be noticed. 1) The maximum increase in false alarm for the OS test (r = 21) is less than that of the CA test. 2) Although r = 24 has the lowest false alarm increase, it cannot be chosen because of its inability to prevent target masking even with one interferer. 3) The decrease in the false alarm rate well below the designed value of 10⁻⁶, which happens when the number of clutter cells is less than n/2, is also undesirable because it is accompanied by reduced P_D. We digress briefly to mention two other parameters that are sometimes used to assess the performance of a CFAR detector. One is called the average detection threshold (ADT), which describes the performance in a homogeneous background (Gandhi and Kassam 1988; Rohling 1983). The detection probability of any test of the type (3) increases when the threshold on the right hand side of the inequality decreases, and vice versa. Therefore, comparison of the fixed optimum threshold
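A small simulation makes the (n − r) interferer tolerance concrete. The sketch below is ours, with illustrative parameter values (t, SNR, INR) and means normalized so that 2σ² = 1: with r = 21 and n = 24, detection survives 3 equal-strength interferers but collapses once the interferers capture the r-th order statistic.

```python
import random

def pd_os_mc(n=24, r=21, t=5.0, snr=100.0, inr=100.0, k=0,
             trials=5000, seed=1):
    """Monte Carlo P_D of the OS-CFAR test (6) with k equal-strength
    interferers among the n reference cells (parameter values illustrative)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # k interferer cells with mean (1 + INR), the rest noise with mean 1
        refs = [rng.expovariate(1.0 / (1.0 + inr)) for _ in range(k)]
        refs += [rng.expovariate(1.0) for _ in range(n - k)]
        y_r = sorted(refs)[r - 1]               # r-th order statistic
        z = rng.expovariate(1.0 / (1.0 + snr))  # test cell under H1
        hits += z >= t * y_r
    return hits / trials

pd_tolerated = pd_os_mc(k=3)  # k <= n - r: interferers trimmed away
pd_masked = pd_os_mc(k=6)     # k > n - r: interferers inflate Y_r
```

The drop from `pd_tolerated` to `pd_masked` illustrates the asymptotic masking loss described above.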
Fig. 5. False alarm rate change with clutter transition (N = 24, CNR = 10 dB): log₁₀ P_F vs. number of clutter cells (4 to 24), for CA, OS (r = 21), and OS (r = 24).
with the average detection threshold (ADT) of a CFAR detector should give a measure of the loss of detection. ADT is defined as

ADT = E(tS) / (2σ²) .   (10)

The performances of two detectors can then be measured by forming the ratio ADT1/ADT2, which can also be expressed in decibels as 10 log₁₀(ADT1) − 10 log₁₀(ADT2) dB. The other measure is to compare the SNRs required by two tests to achieve the same probability of detection, at a prescribed false alarm probability. This measure is usually expressed as the difference of the two SNRs in dB. Related remarks: Levanon approximated the detection probability using Stirling's formula (Levanon 1988). His approximation provides a quick estimate of the detection loss (in terms of SNR) in both homogeneous and interfering target situations. The performance was more thoroughly analyzed by Shor and Levanon in (Shor and Levanon 1991). They considered Rayleigh as well as Rayleigh-plus-one-dominant target models. The clutter background was taken to be a Rayleigh distributed amplitude. They also derived an expression for the probability of false alarm for Weibull clutter with a known shape parameter. Lim and Lee derived analytical expressions for the detection probability in a homogeneous background, for the case of multiple pulse noncoherent integration (Lim and Lee 1993). They considered all four Swerling target models. The effect of quantization on the performance of a CFAR detector was investigated by Gandhi (Gandhi 1996). The term quantization refers to the situation where each real number {X_i} and Z is represented in digital form by a finite number of bits. A general conclusion is that for both CA and OS detectors in a homogeneous background, with a 12-bit uniform quantizer, adequate false alarm control near the designed P_F is possible if the noise power fluctuates by less than 15 dB. This noise variation margin is reduced if there are interfering targets in the reference window. An interesting analysis of the effectiveness of an order statistic test versus an averager, for post detection multiple pulse integration, was provided by Saniie, Donohue, and Bilgutay (1990).
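For the OS-CFAR, the expectation in (10) has a simple closed form, since the mean of the r-th order statistic of n i.i.d. exponentials is a tail harmonic sum. The sketch below is ours (notation and normalization 2σ² = 1 are assumptions), and checks the closed form against simulation:

```python
import random

def adt_os(t, n, r):
    """Eq. (10) for OS-CFAR: ADT = E(t Y_r)/(2σ²) = t * sum_{j=n-r+1}^{n} 1/j
    when the reference cells are i.i.d. exponential with mean 2σ²."""
    return t * sum(1.0 / j for j in range(n - r + 1, n + 1))

def adt_os_mc(t, n, r, trials=50000, seed=2):
    """Monte Carlo check of the same quantity with 2σ² = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += t * sorted(rng.expovariate(1.0) for _ in range(n))[r - 1]
    return total / trials
```

Dividing two such ADTs (or differencing them in dB) compares two OS designs as described above.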
The threshold for either of those tests is fixed knowing all the parameters of the clutter distribution, and in this sense the tests are not CFAR. However, by realizing that the order statistics are asymptotically consistent estimators of the quantiles, and by looking at the separation between the inverse CDFs of the clutter-only and target-plus-clutter samples, the authors predict the detection performances of the two schemes. In summary, OS-CFAR exhibits an overall improvement in performance over the CA-CFAR: (i) it can tolerate more interfering targets without target masking, the exact number depending on (n − k), whereas the CA test cannot tolerate a single interferer; (ii) the worst case false alarm increase with the OS test is smaller than that with the CA test; and (iii) the OS test achieves the improvements (i) and (ii) without sacrificing too much detection probability under a homogeneous background. A drawback of the OS test is that k has to be fixed a priori. It can be fixed from a knowledge of the maximum number of expected targets in the reference cells, but such knowledge may not always exist. Also, the maximum false alarm increase with clutter transition is larger than what is desirable. These factors motivated other researchers to look for better tests, almost all of them based on order statistics. These attempts have met with only partial success.
2.2. Other order statistics based tests

First we consider tests that are based on a subset of the order statistics {Y_i, i = 1, ..., n}, with the subset selection determined a priori, before the reference cell samples are obtained.
2.2.1. A priori selected order statistics subset
Gandhi and Kassam (1988) considered the trimmed mean test of the form (3) with S given by

S = ∑_{j=T1+1}^{n−T2} Y_j .   (11)
In (11), the trimming points (n − T2) and (T1 + 1) are to be fixed a priori. The actual value of T2 depends on the maximum number of interfering targets that may be present in the reference window. The value of T1 should be small to attain good detection performance in a homogeneous background. However, for good false alarm control during clutter transition, T1 should be large and T2 should be small. Therefore, a compromise between the two settings is needed. With a judicious choice of the trimming constants, the authors reported a very marginal overall performance improvement over an OS-CFAR test. Observe that the OS-CFAR is a special case of (11). Ritcey analyzed another test, called the censored mean level detector (CMLD), which is again a special case of (11) with T1 = 0 (Ritcey 1986). This test was first considered by Rickard and Dillard (1977). Ritcey and Hines considered another modification of (11) by using a winsorized mean (Ritcey and Hines 1989):
S = (1/k) ( ∑_{j=1}^{k−1} Y_j + (n + 1 − k) Y_k ) .   (12)
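A quick numerical check of the estimator (12) is straightforward (the sketch and its sample size, trimming point, and mean values are our illustrative choices): averaging it over many exponential samples should recover the true mean, consistent with its unbiasedness.

```python
import random

def winsorized_scale(x, k):
    """Eq. (12): S = (1/k) [ sum_{j=1}^{k-1} Y_j + (n + 1 - k) Y_k ],
    built from the k smallest order statistics of the n samples in x."""
    y = sorted(x)
    n = len(x)
    return (sum(y[:k - 1]) + (n + 1 - k) * y[k - 1]) / k

rng = random.Random(3)
n, k, true_mean = 16, 10, 2.0
avg = sum(winsorized_scale([rng.expovariate(1.0 / true_mean)
                            for _ in range(n)], k)
          for _ in range(20000)) / 20000
# avg comes out close to true_mean, despite ignoring the n - k largest cells.
```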
For the homogeneous noise case, the estimator S is a UMVUE of 2σ² based on any k samples of the unordered {X_i}. The improved estimation of (12) over a single order statistic marginally improves the detection probability under a homogeneous background, but the detection performance in the multiple target situation is essentially unchanged. The reason for this is that the trimming point k needs to be fixed a priori. Another modified form of CMLD was analyzed recently (El Mashade 1995). Ritcey and Hines combined the ideas of OS-CFAR and the
GO-CFAR of Section 1 to formulate the maximum (MAX) family of order statistic detectors (Ritcey and Hines 1991). For the MAX family,

S = max(S1, S2) ,   (13)

where

S1 = ∑_{i=1}^{n/2} c_i · (i-th OS from {X_j, j = 1, ..., n/2}) ,
S2 = ∑_{i=1}^{n/2} c_i · (i-th OS from {X_j, j = n/2 + 1, ..., n}) .
The constants c_i are to be fixed a priori. The authors considered three tests: the maximum of winsorized means, for which the appropriate constants c_i can be determined from (12); the maximum of censored mean levels (MX-CMLD), for which c_i = 0 for i greater than some l; and the maximum of order statistics (MX-OSD), for which c_i = 0 except for i equal to some r. The first was also analyzed by Al-Hussaini, who called it a censored-greatest-of detector (CGO) (Al-Hussaini 1988). The MX-OSD performs nearly as well as the MX-CMLD. The maximum increase in the false alarm rate during clutter transition for MX-OSD or MX-CMLD is smaller than that of the OS-CFAR, because of the use of the maximum operation in (13). In interfering target situations, the probability of detection achieved with MX-OSD (r such that (n/2 − r) ≥ maximum number of interfering targets) is as good as that achieved with OS-CFAR (r such that (n − r) ≥ maximum number of interfering targets). Thus, MX-OSD is an improvement over OS-CFAR. A similar conclusion was reached by Di Vito, Galati, and Mura (1994), and Elias-Fusté, de Mercado, and Reyes Davo (1990), who called the MX-OSD the OSGO. The first authors evaluated the MX-OSD and the OS tests for false alarm rate change during clutter transition, and for detection performance with multiple target interference, using importance sampling techniques. Their multiple target interference model was slightly different, in the sense that an interfering target produced target returns in three adjacent resolution cells, with amplitudes correlated in some specific fashion. Some related performance analysis can be found in (Guan and He 1995), (He and Guan 1995), and (Wilson 1993). Ritcey also derived analytical expressions and obtained numerical results for the MX-MLD, when multiple pulses with noncoherent integration are employed (Ritcey 1990). He considered all four Swerling target models.
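For concreteness, here is a minimal sketch (ours) of the MX-OSD member of the MAX family (13): a single order statistic from each half-window, combined by the maximum.

```python
def mx_osd(x, r):
    """MX-OSD: S = max(S1, S2), with S1, S2 the r-th order statistics of
    the leading and lagging half-windows of the reference vector x."""
    half = len(x) // 2
    s1 = sorted(x[:half])[r - 1]
    s2 = sorted(x[half:])[r - 1]
    return max(s1, s2)
```

With n = 24 reference cells, r here ranges over 1, ..., 12; choosing r = 10, say, leaves room for two interferers per half-window.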
Barkat, Himonas, and Varshney analyzed the performance of the CMLD for both Swerling IV and II targets (Barkat et al. 1989). Nagle and Saniie considered for S the best linear unbiased estimator (BLUE) (Arnold et al. 1992), based on a partial set of order statistics (Nagle and Saniie 1995):

S = ∑_{i=r+1}^{m} a_i Y_i .   (14)
The optimum coefficients are determined for homogeneous background noise. Some increase in detection probability in the homogeneous background over the
OS-CFAR is obtained. They did not evaluate the false alarm rate change during clutter transition. It is expected that the worst case false alarm increase for the BLUE detector will be larger than the increase for the OS detector, because of the inclusion of lower order statistics in the estimator. Another related test is based on the sum of two order statistics, one from each of the lagging and leading windows (He 1994). We summarize the lessons learned from these studies as follows. Compared to the OS-CFAR, the inclusion of more order statistics in estimators such as TM, CMLD, or BLUE provides only a marginal increase in the detection probability under the homogeneous background noise situation. However, the worst case increase in the false alarm probability of these detectors during clutter transition is a little larger than the increase seen with the OS-CFAR. In any case, all these tests are unsatisfactory in terms of false alarm control. These comments should be applicable to Rayleigh as well as Rayleigh-plus-one-dominant target models, and to single as well as multiple pulse noncoherent integration, even though a thorough analysis has been done only for the Rayleigh target and the single pulse model. A reasonable false alarm control during clutter transition is obtained by splitting the reference window into two halves and then applying the maximum operation, as in the MX family of detectors. The MX-OSD is simple and is nearly as efficient as the others in the MX family. One drawback of the MX-OSD is the decrease in false alarm probability below the design value, and the consequent loss of detection, when the test cell is in the low clutter and nearly half of the reference cells are in the high clutter. An improvement in this situation may be possible with an adaptive order statistic based estimator, where the selection of the order statistics is dictated by the reference samples themselves, and not fixed a priori.
2.2.2. Tests with data dependent selection of order statistics
Among this class of tests, we first consider two procedures: the variable trimmed mean (VTM) detector and the selection and estimation (SE) test. The selection procedures employed in these tests to select the order statistics subset are similar, but they differ in the choice of the final estimate of the noise power. They have similar performances under homogeneous and multiple target situations, but the SE test has better false alarm control during clutter transition. Ozgunes, Gandhi, and Kassam proposed and analyzed the VTM detector in (Ozgunes et al. 1992). The VTM detector is of the form (3) with S given by

S = (1 / (K2 − k + 1)) ∑_{i=k}^{K2} Y_i ,   (15)

where the lower trimming point k is to be fixed a priori but K2 is selected according to the rule: set K2 = k2 if

Y_{k2} ≤ (1 + v) Y_k < Y_{k2+1} .   (16)
The design parameters of the VTM are the constants k and v ≥ 0, and for a given design of the VTM, the threshold t in (3) is chosen to satisfy the P_F requirement under
homogeneous conditions. The VTM reduces to the OS-CFAR with the choice v = 0. k should be large enough to achieve good false alarm control during clutter transition, but not so large as to encounter target masking. As in the OS-CFAR, a reasonable choice is k such that (n − k) equals the maximum expected number of interferers. For a given k, a larger value of v better limits the false alarm increase during clutter transition. However, too large a value decreases the detection probabilities under the homogeneous and interferer situations. Using the property that the order statistics of i.i.d. observations from an exponential density form a Markov chain (David 1981), the authors obtained a closed form expression for P_F in the homogeneous case. They used computer simulations to assess the performance under nonhomogeneous conditions. The results show that the VTM detector performs slightly better than the OS-CFAR detector. Viswanathan and Eftekhari considered the SE test in (Viswanathan and Eftekhari 1992). They applied a selection and ranking procedure used in reliability applications to select the order statistics subset. The SE test is based on (3) with S given by

S = Y_β ,   (17)

where β depends on the size r (the number of elements) of the selected subset, and the subset is determined by including in it all X_i satisfying

X_i ≤ d Y_b .   (18)
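The selection step (18) and the table-driven choice of β can be sketched as follows. The code is ours, and the lookup table passed in is hypothetical, since the chapter describes its construction only qualitatively:

```python
def se_statistic(x, d, b, beta_table):
    """SE test sketch: select every X_i with X_i <= d * Y_b (rule (18)),
    let r be the selected-subset size, and set S = Y_beta with beta read
    from a lookup table indexed by r. Returns (S, r)."""
    y = sorted(x)
    y_b = y[b - 1]
    r = sum(1 for xi in x if xi <= d * y_b)
    beta = beta_table[r]
    return y[beta - 1], r
```

With a couple of strong outliers in the window, r comes out small, and the table can steer β upward (toward n) to hold the false alarm rate down.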
The design parameters are the constants d and b, with d ≥ 1 and 1 < b < n. b has to be chosen smaller than n/2 so as to make possible an inference about the presence of a clutter transition near the middle of the reference window; d has to be chosen as a compromise between detection performances under homogeneous and interfering target situations. For a Rayleigh target in Rayleigh clutter, closed form expressions for P_F, corresponding to homogeneous and interfering target situations, are obtained in (Viswanathan and Eftekhari 1992). In the homogeneous case, the probability of selecting a subset of size r is given by
P_s(r) = b ∑_{i=0}^{b−1} C(b−1, i) (−1)^i beta(r − b + 1, ⋯) ,   (19)
where beta(·,·) is the standard beta function. Based on (19) and another expression for P_s(r) corresponding to the interfering target situation, it is possible to reasonably optimize d for a given b. Ideally, the subset selection should meet the following. 1) All X_i's should be selected if they are from a homogeneous background. 2) If there are multiple targets, the samples due to these should not be selected, but the rest should be included. 3) If clutter backgrounds with differing power levels exist, the samples whose power levels are the same as the one present in the test cell must be selected, and the rest should not be selected. In practice, all
these requirements cannot be met. The design of (b, d) can be completed by considering a few b values below n/2 and then obtaining the best (b, d) value over all the choices considered. A smaller d is better when interfering targets exist, whereas a larger d is preferable for the homogeneous case. As shown in (Viswanathan and Eftekhari 1992), a compromise value for d can usually be chosen from a study of P_s(r). The design of the SE test is then completed by specifying β. This can be done by means of a "look up" table that provides the proper choice of β for every r value. The proper choice, as explained in (Viswanathan and Eftekhari 1992), is based on logical reasoning. For example, if r is determined as being close to n/2, it implies a possible clutter transition situation, and therefore β needs to be kept close to n to control the upper false alarm swing. It is shown that for a given OS-CFAR, an SE test can be designed so that (i) it can tolerate an additional target over the OS, and (ii) its false alarm increase in clutter transition is much below the level of the OS, as well as that of the VTM. The false alarm control during clutter transition gets better as CNR increases. This is to be anticipated, because as CNR increases, it is much easier to identify the "outliers" (the high clutter samples) within the composite group. The subset selection (18) is identical to the one used in the VTM (16), but by choosing b smaller than n/2, and by having a better estimation procedure, the SE test is able to provide better false alarm control during clutter transition. Gandhi and Kassam considered another test, called the adaptive order statistic detector (AOS) (Gandhi and Kassam 1989). The AOS uses a statistic similar to (17), where β takes one of two possible values, k1 or k0, with k1 ≥ k0. These two numbers are the design parameters. A hypothesis test on the reference samples yields a decision on whether a clutter transition is present within the window.
If the clutter-present decision is made, the order k1 is chosen; otherwise the order k0 is used. Like the SE test, the AOS can be designed to limit the false alarm increase during clutter transition. Lops and Willett considered another order statistic based scheme, called LI-CFAR (Lops and Willett 1994). It is based on a combination of linear and ordering operations. Denote the rank ordered version of the reference vector X as X_r = (X_(1), X_(2), ..., X_(n)); here, for convenience, the r-th order statistic Y_r is denoted as X_(r). The test is again based on (3) with S given by

S = c^T w ,   (20)

where

w = ( X_(1)1, ..., X_(1)n | X_(2)1, ..., X_(2)n | ... | X_(n)1, ..., X_(n)n )^T ,   (21)

X_(j)k = X_k if X_(j) ↔ X_k, and X_(j)k = 0 otherwise,   (22)
and the notation X_(j) ↔ X_k means that the k-th element of X occupies the j-th location in the ranked vector X_r. Both c and w are column vectors of length n². The design of the LI test (or filter) is controlled by the elements of
c, namely c_{j,k}, j = 0, 1, ..., n − 1; k = 1, 2, ..., n. c is obtained as the solution to the optimization problem

c = arg min E(2σ² − S)² ,   (23)

subject to the constraints

c ≥ 0 and E(S) = 2σ² .   (24)
This is a quadratic programming problem for which an efficient solution exists. The solution to (23) depends on two quantities, R_w = E(ww^T) and p = E(2σ² w). If (23) is solved for the homogeneous background, the solution turns out to be the CA-CFAR, because the sample mean is the minimum mean square error unbiased estimator. Since analytical expressions for R_w and p are not available, these have to be estimated from realistic realizations of the vector w. That is, the LI filter must be trained with realizations of X that best describe the different heterogeneous and homogeneous situations. A model for the generation of X is then based on the following. Each reference cell is affected independently by an interfering target return with probability p_i (the subscript i denotes interferer). A step clutter occurs (or does not occur) within the window with probability p_c (respectively, 1 − p_c). Whether the step is low to high or high to low is then decided on an equally likely basis. In order to generate the interferer and clutter samples, the parameters INR and CNR are also needed. The authors show that the LI filter is an average of a collection of linear combinations of order statistic filters (L-filters). They provide performance equations for a Rayleigh target in Rayleigh clutter, but simulation yields computationally faster error probability estimates than a direct evaluation of the analytical expressions, which is combinatorially explosive. Even though the design of the LI filter is elaborate and requires a training process, once the coefficients are obtained, the actual on-line implementation is simple (notice that only n terms in the vector w are nonzero). Based on the results from this study, it can be said that the LI filter provides better false alarm control than a simple OS-CFAR. It is not known whether an LI-type filter or the SE test performs better in an overall fashion, as no comparative study has been done.
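The vector w of (21)-(22) is easy to construct explicitly. The sketch below is ours; it shows that only n of the n² entries of w are nonzero, and that a uniform coefficient vector recovers the cell-averaging statistic:

```python
def build_w(x):
    """Eqs. (21)-(22): block j of w (length n each) holds X_k at position k
    when X_k is the (j+1)-th smallest sample, and zeros elsewhere."""
    n = len(x)
    order = sorted(range(n), key=lambda k: x[k])  # order[j] = index of rank j+1
    w = [0.0] * (n * n)
    for j, k in enumerate(order):
        w[j * n + k] = x[k]
    return w

def li_statistic(c, w):
    """Eq. (20): S = c^T w."""
    return sum(ci * wi for ci, wi in zip(c, w))
```

For example, c with every entry 1/n turns S into the sample mean, consistent with the remark that the homogeneous-background solution of (23) is the CA-CFAR.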
Barkat, Himonas, and Varshney proposed a generalized censored mean level detector (GCMLD) (Barkat et al. 1989). The data dependent censoring scheme of the GCMLD determines the number of interfering targets present in the reference window, for a given probability of false censoring. This detector, however, is designed to operate satisfactorily in homogeneous backgrounds and multiple target environments only, and would exhibit a considerable increase in the false alarm rate in regions of clutter power transitions. Finn considered a multistage CFAR design procedure, which is not order statistic based, but which uses maximum likelihood estimates (Finn 1986). The scheme tests the reference samples for homogeneity, for a possible clutter transition, for the position of the clutter transition if one is suspected, and for the samples from possible interferers. These tests are parametric and use the knowledge that the target and clutter are Rayleigh distributed. The author has quoted encouraging results based on simulation studies. The drawback of such a multistage procedure is that it introduces statistical dependencies and cannot be analytically evaluated.

2.3. Distributed CFAR tests based on order statistics
Distributed radar target detection refers to the situation where geographically separated radars look for targets in a search volume. The radars then communicate their information, including their decisions with regard to the presence of targets, to a central site called the fusion center. The fusion center then combines the information from all the radars to form target tracks. In a very general situation, the separation would be so large that the same target will not be seen by all the radars at the same time. We consider a simpler situation where a search volume is simultaneously searched by multiple radars. A typical scenario would be a search with two or three radars. As before, only the target detection problem is addressed. Sending all the reference samples and the test samples would require more communication capacity in the links between the fusion center and the radars, and would also demand increased processor capability at the fusion center. To ease these requirements, two approaches are considered in the literature. 1) Individual radars send their decisions to the fusion center, and the fusion center makes a final decision based on the individual decisions. 2) The radars send condensed information in the form of a few statistics to the fusion center, and the fusion center makes the decision regarding the presence of a target in a test cell in the search volume. Uner and Varshney analyzed distributed CFAR detection performance in homogeneous and nonhomogeneous backgrounds (Uner and Varshney 1996). Each radar conducts a test based on its own reference samples and the test sample, and sends its decision to the fusion center. Let u_i denote the decision of the i-th radar, such that
u_i = 1 if the i-th sensor decides H1, and u_i = 0 if the i-th sensor decides H0.   (25)
If the distributions of the Bernoulli variables {u_i}, under the two hypotheses, are known completely, then the fusion center can employ an optimal likelihood ratio test based on {u_i} (Chair and Varshney 1986; Thomopoulos et al. 1987). However, in the radar situation with nonhomogeneous reference cells and unknown target strength, it is not possible to construct such a test. Therefore, the fusion center employs a reasonable nonparametric test of the type of a counting rule. That is, the counting rule is given by

∑_{i=1}^{n} u_i ≷_{H0}^{H1} k ,   (26)
where k is an integer. k = 1 is called the OR rule, because in that case (26) is nothing but the OR operation on the Boolean variables {u_i}. Similarly, k = n
corresponds to the AND rule, and k = (n + 1)/2 (for n odd) corresponds to the majority logic rule. The authors considered OS-CFAR and CA-CFAR for the individual radar tests, and considered AND and OR rules for the fusion site. The distributed OS test is more robust with respect to interfering targets and false alarm changes than the distributed CA test. Amirmehrabi and Viswanathan evaluated a distributed CFAR test called signal-plus-order statistic CFAR (Amirmehrabi and Viswanathan 1997). The radar modeling assumes that the returns of the test cells of the different radars are all independent and identically distributed. In this scheme, each radar transmits its test sample, and a designated order statistic of its surrounding observations, to the fusion center. At the fusion center, the sum of the test cell samples is compared to a constant multiplied by either (1) the minimum of the order statistics (called the mOS detector) or (2) the maximum of the order statistics (called the MOS detector). For detecting a Rayleigh target in Rayleigh clutter with two radars, closed form expressions for the false alarm probabilities, under homogeneous and nonhomogeneous conditions, are obtained. The results indicate that the MOS detector performs much better than the OS detector with the AND or the OR fusion rule. Of course, this is achieved at the price of sending two real numbers from each sensor instead of only binary information (a bit), as in the case of a counting rule. The MOS detector performs better than the mOS detector, and performs nearly as well as a central order statistic detector that compares the sum of the test samples against a constant times an order statistic of the reference samples from all the sensors.
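The fusion-side operations above are simple to state in code. The sketch is ours, with illustrative values: the counting rule (26) for decision fusion, and the MOS comparison for the statistic-forwarding scheme.

```python
def counting_rule(decisions, k):
    """Rule (26): declare H1 when at least k of the local binary
    decisions u_i equal 1; k = 1 is OR, k = n is AND."""
    return 1 if sum(decisions) >= k else 0

def mos_test(test_samples, order_stats, t):
    """MOS detector sketch: compare the sum of the sensors' test samples
    against t times the maximum of their transmitted order statistics
    (the mOS variant would use min instead of max)."""
    return 1 if sum(test_samples) >= t * max(order_stats) else 0
```

Using max rather than min in `mos_test` mirrors the MX-OSD idea: a clutter transition at one sensor inflates the fused threshold and thus limits the false alarm increase.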
The general superiority of the MOS is no surprise in light of the earlier result that the MX-OSD performs better than a fixed OS test in terms of its ability to control the false alarm increase, without sacrificing detection performance in interfering target situations. A drawback of the MOS is the assumption that the returns in the test cells of the two radars are identical. In a more realistic situation, the noise powers of the test cells of the two radars would be different, and the actual probability of false alarm of the MOS test would deviate from the designated value, with the exact value being a function of the ratio of the noise powers. Elias-Fusté, Broquetas-Ibars, Antequera, and Marin Yuste considered the k-out-of-n fusion rule with CA and OS tests for the individual radars (Elias-Fusté et al. 1992). Rayleigh target and Rayleigh clutter were the models used in the analysis. Necessary conditions that the individual thresholds at the radars and the value of k should satisfy, in order to maximize the probability of detection for a given probability of false alarm at the fusion center, are derived. Numerical results indicate that there exists no unique k that is optimum for a wide range of system parameters, such as the individual SNRs, the assumed order of the OS detector at each radar, the number of reference cells at each site, etc. Distributed order statistic CFAR was investigated by Blum and Qiao for detecting a weak narrowband signal in Gaussian noise (Blum and Qiao 1996). A two sensor system using either the AND or the OR rule was considered. The signal modeling allows for statistical dependency between the test cells of the two sensors. However, weak signal detection has more relevance to sonar than to radar targets.
Order statistics application to CFAR radar target detection
3. Order statistics based tests for non-Rayleigh clutter

In this section we consider order statistics based tests for Weibull, log-normal and K-clutter distributions. Unlike the Rayleigh case, these are distributions with two parameters. Therefore, the CFAR detectors designed for the case where both parameters are unknown are, in general, less powerful than the detectors designed for the case where one of the parameters is known.

3.1. Weibull clutter
Let W denote the output of the envelope detector corresponding to the test cell. W is distributed as Weibull if the corresponding CDF is given by

F_W(w) = 1 - \exp\{-(w/B)^C\}, \quad w \ge 0, \; B > 0, \; C > 0 . \qquad (27)
Rayleigh clutter is a member of the Weibull family since it can be obtained from (27) with C = 2. The parameter C controls the skewness of the distribution (the "shape" parameter), whereas B is the scale parameter. Smaller values of C result in heavier-tailed distributions, viz. spiky clutter. Notice that Z = W^2 is also distributed as another Weibull. The moments of (27) are given by

E(W^r) = B^r \, \Gamma\!\left(\frac{r}{C} + 1\right), \quad r = 1, 2, \ldots \qquad (28)
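The moment formula (28) can be checked by simulation; the sketch below draws Weibull variates by inverting the CDF (27) and compares the sample clutter power E(W^2) with B^2 Γ(2/C + 1). The parameter values are arbitrary illustrative choices.

```python
import math, random

def weibull_mean_power(B, C, trials=200_000, seed=2):
    """Estimate E(W^2) for W ~ Weibull(B, C) by Monte Carlo and return it
    together with the exact value from (28): E(W^r) = B^r * Gamma(r/C + 1).
    Samples are drawn by inverting the CDF (27): W = B * (-ln U)^(1/C)."""
    rng = random.Random(seed)
    m2 = sum((B * (-math.log(1.0 - rng.random())) ** (1.0 / C)) ** 2
             for _ in range(trials)) / trials
    return m2, B ** 2 * math.gamma(2.0 / C + 1.0)

print(weibull_mean_power(1.5, 2.0))   # sample vs. exact clutter power
```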
Thus, for a fixed C, the clutter power (E(W^2)) variation is due to a change of B, and a CFAR test maintains a constant false alarm rate irrespective of the value of B. Also, for a fixed C, the CDF is a stochastically ordered family with respect to B. Therefore, if radar clutter data fit reasonably well a Weibull with a fixed C value, then an order statistic can be used as an estimator for B, and a CFAR test formulated as in the Rayleigh case. Even though an order statistic estimator is not very efficient for small sample sizes, the OS-CFAR can easily tolerate interfering targets and provide some false alarm control during clutter power transitions. Notice that the squared-envelope detector output Z = W^2 is distributed as Weibull with parameters (C/2, B^2). Therefore, for single pulse detection with an OS-CFAR, it does not matter if a test is formulated with W or Z. An OS-CFAR test based on the squared envelope is of the form (6) with t determined for a given false alarm requirement. The order number r is a design parameter. The probability of false alarm is given by (Levanon and Shor 1990)

P_F = \frac{n!}{(n-r)!} \cdot \frac{(t^{C/2} + n - r)!}{(t^{C/2} + n)!} \qquad (29)

where factorials of non-integer arguments are interpreted via the gamma function.
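For a given n, r, and known C, the threshold t satisfying (29) for a desired P_F can be found numerically, since P_F is monotonically decreasing in t. A sketch (the factorials in (29) are evaluated through log-gamma functions; the parameter choices are illustrative):

```python
from math import exp, lgamma

def pfa_os_weibull(t, n, r, c):
    """P_F of the one-parameter OS-CFAR test, eq. (29), for squared-envelope
    Weibull samples with known shape C; factorials via the gamma function."""
    a = t ** (c / 2.0)
    return exp(lgamma(n + 1) - lgamma(n - r + 1)
               + lgamma(a + n - r + 1) - lgamma(a + n + 1))

def threshold_for(pfa, n, r, c, lo=1e-9, hi=1e6):
    """Bisection for the t achieving a desired P_F; works because P_F is
    continuous and decreasing in t (hi must bracket the solution)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pfa_os_weibull(mid, n, r, c) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t = threshold_for(1e-6, n=24, r=18, c=2.0)
print(t, pfa_os_weibull(t, 24, 18, 2.0))
```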
Notice that the solution for t requires the knowledge of C. We can call the test (6) the one-parameter OS test for Weibull clutter. The shape parameter significantly affects the probability of false alarm. Fig. 6 shows the variation of P_F with C, for different values of r.

R. Viswanathan

Fig. 6. False alarm changes with changes in the shape parameter of Weibull (test designed for P_F = 10^{-6} at C = 2); N = 24, with curves for different r (e.g., r = 24).

In this figure, for a given r, and C = 2 (Rayleigh), t has been fixed at a value that provides P_F = 10^{-6}. When the clutter is nonstationary, and its envelope distribution changes, it may not be reasonable to assume that C is known. Also, when clutter variations take place within the adjacent resolution cells, both C and B might change. Therefore, it is desirable to have a CFAR test that is independent of both B and C. Such a test, called the two-parameter OS test, was formulated by Weber and Haykin (1985):
Z \;\overset{H_1}{\underset{H_0}{\gtrless}}\; Y_i^{\,1-\beta}\, Y_j^{\,\beta} \qquad (30)

where Z is the test cell sample and Y_i (Y_j) is the i-th (j-th) order statistic of the reference samples (X_1, X_2, \ldots, X_n). Since the ordering is preserved under scale transformations and under raising to a positive power, and the distribution of the resulting test statistic under H_0 is independent of B and C,

P_F = P\left(Z \ge Y_i^{\,1-\beta} Y_j^{\,\beta} \mid H_0\right) \qquad (31)
is independent of both the parameters, that is, (30) is a CFAR test. Whereas the order numbers i and j are design parameters, the constant β is to be adjusted to achieve a desired P_F. An expression for (31) involving a double integral is available (Levanon and Shor 1990; Weber and Haykin 1985). It is shown in (Levanon and Shor 1990) that the test (30) can be derived based on the estimator proposed by Dubey (1967). We digress briefly to discuss a few CFAR tests that are not based on order statistics. Assuming that the reference samples are homogeneous, other estimators for B and C have been considered. The ML estimates of C and B lead to a transcendental equation for the estimate of C (Gandhi et al. 1995). Hence, no attempt has been made to derive a CFAR test based on ML estimates. However, Anastassopoulos and Lampropoulos formulated a CFAR test called OW, based on the ML estimate of B, for a known shape parameter (Anastassopoulos and Lampropoulos 1995). They compared the performance of OW against CA, among others. Gandhi, Cardona, and Baker (1995) have analyzed the so-called log-t test
\frac{\ln\!\left[W \big/ \left(\prod_{i=1}^{n} W_i\right)^{1/n}\right]}
{\sqrt{\frac{1}{n}\sum_{k=1}^{n}\left\{\ln\!\left[\left(\prod_{i=1}^{n} W_i\right)^{1/n} \big/ W_k\right]\right\}^{2}}}
\;\overset{H_1}{\underset{H_0}{\gtrless}}\; t \qquad (32)
where W_i = \sqrt{Z_i}. The log-t test was originally proposed by Goldstein and was shown to have the CFAR property (Goldstein 1973). The name log-t becomes obvious by noticing that the left-hand side of (32), after a logarithmic transformation of the variables, is a t statistic. Levanon and Shor looked at the effect of the values of i and j, of the two-parameter OS test (30), on the variance of the estimator of C (this estimator can be used to derive the test (30), as shown in (Levanon and Shor 1990)). They conclude that a large value of j (close to n) and a small value of i (close to 1) lead to a smaller variance for the estimator of C (a similar statement was made in (Weber and Haykin 1985)). However, from the viewpoint of tolerating interfering targets, it is imperative that a large j not be selected. A consequence of this is a decrease of the probability of detection in a homogeneous background. Analytical evaluation of the probability of detection gets complicated because the density of the envelope of a Rayleigh target in Weibull clutter can only be represented in an integral form (Schleher 1976). For C = 2, Weibull becomes Rayleigh clutter, and hence analytical results for the detection probability are possible. These results in (Levanon and Shor 1990) compare the detection probabilities of the two-parameter OS, the one-parameter OS, and the log-t test, under a homogeneous Rayleigh background. The two-parameter OS exhibits considerable detection loss, in terms of the required SNR for a specified detection probability, as compared to the single-parameter OS. Of course, the single-parameter OS is designed with the knowledge that C = 2, and its superior performance is anticipated. However, the loss for the two-parameter OS test was smaller than that of the log-t test. In that
sense the two-parameter OS may be preferable to the log-t test. However, the value for j in (30) was taken to be close to n, which means that the normal immunity to multiple targets expected from an OS test is not available. They also looked at the variation of the detection probability of a single-parameter OS test that was designed for several assumed values of C < 1, when the actual C is 2. A general conclusion was that if C varies only over a small interval such as (1, 1.5), the single OS test with an assumed C within the range may still show a larger detection probability than a two-parameter OS test. Again, the order number j that they used in the evaluation of the two-parameter OS for this comparison was equal to n, which implies no tolerance to an interfering target. It seems more evaluations are required before any conclusive argument with regard to these three tests can be made. Rifkin evaluated the performance of a MX-OS detector operating in Weibull clutter (Rifkin 1994). In terms of the envelope detector output, the test is given by

W \;\overset{H_1}{\underset{H_0}{\gtrless}}\; t R \qquad (33)

where R = \max(R_1, R_2), R_1 is the r-th order statistic of the samples from the leading window, (W_1, \ldots, W_{n/2}), and R_2 is the r-th order statistic of the samples from the lagging window, (W_{n/2+1}, \ldots, W_n). Given the knowledge of the shape parameter C, (33) is a CFAR test under homogeneous Weibull clutter. The threshold t needed to achieve a desired false alarm rate can be obtained from

P_F = \int_0^{\infty} F_R(x/t)\, f_W(x)\, dx \qquad (34)

where F_R denotes the CDF of R.
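In place of evaluating (34) in closed form, the false alarm rate of the MX-OS test (33) can also be estimated by direct simulation. The sketch below assumes Weibull clutter with known C, generated by inversion of (27); the values of n, r, and t are illustrative choices (r must not exceed the half-window size n/2):

```python
import math, random

def mxos_pfa(t, n=24, r=9, C=2.0, B=1.0, trials=100_000, seed=3):
    """Monte Carlo P_F of the MX-OS test (33): alarm when W >= t*max(R1,R2),
    with R1 (R2) the r-th order statistic of the leading (lagging)
    half-window of n/2 reference cells. Illustrative sketch only."""
    rng = random.Random(seed)
    half = n // 2
    def wb():
        # inverse-CDF draw from the Weibull CDF (27)
        return B * (-math.log(1.0 - rng.random())) ** (1.0 / C)
    alarms = 0
    for _ in range(trials):
        w = wb()
        r1 = sorted(wb() for _ in range(half))[r - 1]
        r2 = sorted(wb() for _ in range(half))[r - 1]
        if w >= t * max(r1, r2):
            alarms += 1
    return alarms / trials

# the false alarm rate falls as the multiplier t grows, and (for fixed C)
# is unaffected by the scale parameter B -- the CFAR property
print(mxos_pfa(1.5), mxos_pfa(3.0))
```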
Rifkin considered the average detection threshold (ADT) performance measure to characterize detection performance. For Weibull, the ADT can be defined as

\mathrm{ADT} = \frac{t^2 \, E(R^2)}{E(W_i^2)} . \qquad (35)
Comparison of the ADT with a threshold that is set for a required false alarm rate, when both C and B are known, gives an idea of the additional detection loss that occurs with the CFAR scheme. Results show that for a given r, the ADT increases with decreasing shape parameter C. Also, with r selected as the 75th percentile, larger values of n seem to be helpful for smaller C, in the sense that greater reductions of the ADT are achieved. An approximate interferer model assumes the interferer power to be high, so that these samples would occupy the extreme positions in the rank order. With this model, it was seen that the MX-OS scheme is quite robust to an interferer (Rifkin 1994). The additional ADT due to one interferer was less than 1 dB, whereas a CA designed for C = 2 would exhibit a 10 dB loss. False alarm changes with clutter transition were numerically evaluated using the model that the low power clutter is Rayleigh distributed whereas the high power clutter is Weibull. The worst case increase in false alarm happens for
spiky clutter, i.e., with low C values. The least increase occurred for Rayleigh (the C considered was in the range (0.5, 2)). Guida, Longo, and Lops considered a CFAR test based on BLUE estimates of the parameters of a transformed Weibull variate (Guida et al. 1992). They applied the transformations G = \ln W and G_i = \ln W_i to the envelope detector outputs of the test and the reference cells. A Weibull is then transformed into a Gumbel density

f_G(g) = \frac{1}{b} \exp\!\left(\frac{g-a}{b}\right) \exp\!\left(-\exp\!\left(\frac{g-a}{b}\right)\right) \qquad (36)

where the location parameter a and the scale parameter b are related to the Weibull parameters (see (27)) by

a = \ln B, \qquad b = C^{-1} . \qquad (37)
If \hat{a} and \hat{b} are two equivariant estimators of a and b respectively, then the test

\frac{G - \hat{a}}{\hat{b}} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; h \qquad (38)

is a CFAR test, because the test statistic is independent of the parameters a and b (Guida et al. 1992). In fact, the estimates \hat{a} and \hat{b} can be chosen as BLUEs based on the type II censored (upper and lower orders censored) samples of the variates (G_1, G_2, \ldots, G_n). By choosing the upper censoring point, r_2, less than n, the test (38) can be made tolerant to interfering targets. A difficulty with (38) is that the exact density of the test statistic cannot be obtained. Some reasonable approximation to the density can be made so that the threshold h can be computed for a given false alarm requirement. In their performance evaluation, the authors considered n values ranging from 8 to 32, P_F in the range (10^{-6}, 10^{-3}), C values in the range (1, 3), and signal-to-clutter ratio (SCR) from 0 to 40 dB. In order to compare the detection performance with CA, the authors first assumed a Rayleigh target in Rayleigh clutter (Weibull with C = 2). In homogeneous Rayleigh clutter, both log-t and BLUE have detection losses as compared to CA. This is to be anticipated because the former two are bi-parametric whereas the CA is designed with the knowledge that the clutter is Rayleigh. The interesting point is that the losses decrease with n, and for n = 32 the losses for both of the detectors are small as compared to CA. Next, they considered a Rayleigh target in Weibull clutter. A difficulty is that the exact density of the test cell under the target hypothesis depends on both the in-phase and the quadrature components of both the clutter and the target return, whereas only the amplitude distributions of the target (Rayleigh) and the clutter (Weibull) are specified. An approximation is made that for large SCRs, the amplitude distribution of the signal plus clutter is effectively that of the signal distribution. It is observed that for a given SCR, highly skewed clutter yields a smaller detection probability. BLUE has a 1 to 2 dB advantage over the log-t test.
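The Gumbel transformation (36)-(37) is easy to verify numerically: if W is Weibull(B, C), then G = ln W has mean a - γb and variance (π²/6)b², where γ ≈ 0.5772 is Euler's constant and a = ln B, b = 1/C. A sketch with arbitrary parameter values:

```python
import math, random

def log_weibull_moments(B, C, trials=200_000, seed=11):
    """Sample G = ln W for W ~ Weibull(B, C) and return (mean, variance).
    Per (36)-(37), G is Gumbel with a = ln B, b = 1/C, so theory predicts
    mean = a - gamma*b and variance = (pi^2/6) * b^2."""
    rng = random.Random(seed)
    g = [math.log(B) + math.log(-math.log(1.0 - rng.random())) / C
         for _ in range(trials)]
    m = sum(g) / trials
    v = sum((x - m) ** 2 for x in g) / trials
    return m, v

B, C = 2.0, 1.5
a, b = math.log(B), 1.0 / C
m, v = log_weibull_moments(B, C)
print(m, a - 0.5772156649 * b)        # sample vs. theoretical mean
print(v, math.pi ** 2 / 6 * b ** 2)   # sample vs. theoretical variance
```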
An examination of the censoring depths, (n - r2) and (n - rl), shows that the detection probability
loss in a homogeneous background is significant only if a heavy censoring depth is used. Whereas heavy censoring may be useful in controlling the worst case false alarm increase in a clutter transition, small values of censoring depth can provide reasonable detection probability under both homogeneous conditions and a small number of interferers. Unfortunately, this paper did not compare the performance of BLUE against the other OS based bi-parametric procedure, the two-parameter OS test (30).
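Returning to the two-parameter OS test (30), its CFAR property with respect to both B and C can be checked by Monte Carlo: the estimated false alarm rate should be unchanged, up to sampling error, when the clutter parameters are varied. The sketch below is illustrative only; the choices of β, i, j, and n are arbitrary, not values recommended in the literature.

```python
import math, random

def weibull(rng, B, C):
    """Inverse-CDF sampling from (27): F(w) = 1 - exp(-(w/B)^C)."""
    return B * (-math.log(1.0 - rng.random())) ** (1.0 / C)

def wh_pfa(beta, i, j, n=24, B=1.0, C=2.0, trials=100_000, seed=7):
    """Monte Carlo false-alarm rate of the Weber-Haykin test (30):
    declare a target when Z >= Y_(i)^(1-beta) * Y_(j)^beta."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(trials):
        z = weibull(rng, B, C)
        y = sorted(weibull(rng, B, C) for _ in range(n))
        if z >= y[i - 1] ** (1.0 - beta) * y[j - 1] ** beta:
            alarms += 1
    return alarms / trials

# P_F is (statistically) the same for very different B and C:
print(wh_pfa(1.2, i=2, j=22, B=1.0, C=2.0))
print(wh_pfa(1.2, i=2, j=22, B=10.0, C=0.7))
```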
3.2. Log-normal clutter

There have been two order statistics based CFAR tests proposed for the log-normal clutter, whose amplitude distribution is given by

f_W(w) = \frac{1}{\sqrt{2\pi}\,\sigma_c w} \exp\!\left(-\frac{(\ln w - \mu_c)^2}{2\sigma_c^2}\right), \quad w > 0 . \qquad (39)
The log-t test in (32) is a non-OS based CFAR test for log-normal clutter, and it was originally suggested by Goldstein (1973). It can be seen that with the transformation Q = \ln W, Q is distributed as normal with mean \mu_c and variance \sigma_c^2. With the log envelope output transformation, the transformed variables of the test and reference cells can be denoted as Q and \{Q_i\}, respectively. Since the order statistics based on the Q variates preserve scale and location information, a CFAR test can be designed using order statistics of \{Q_i\}. One such test was formulated in (Gandhi and Holladay 1992) and its performance was compared against the log-t detector. The detection probabilities of both tests were numerically computed for a nonfluctuating target (point target) case. The density of W under the target hypothesis was obtained using an integral form expression (Schleher 1976). The OS detector of (Gandhi and Holladay 1992) exhibits some detection loss as compared to log-t in homogeneous clutter, but its performance with a single interferer is much better than that of the other. More comparative evaluations, especially for a fluctuating target model, are needed. As in the case of Weibull, Guida, Longo, and Lops proposed a CFAR test based on the BLUE estimator (Guida et al. 1993). The test is to be conducted with the log-transformed variates as given below:
\frac{Q - \hat{\mu}}{\hat{\sigma}} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; h \qquad (40)

where

U = \frac{\mu_c - \hat{\mu}}{\sigma_c}, \qquad V = \frac{\hat{\sigma}}{\sigma_c} \qquad (41)

and the estimators are of the form

\hat{\mu} = \sum_{i=r_1+1}^{n-r_2} a_i Q_{(i)}, \qquad \hat{\sigma} = \sum_{i=r_1+1}^{n-r_2} b_i Q_{(i)} . \qquad (42)
It has been shown that (40) is a CFAR test with respect to both the parameters of the log-normal clutter. However, as in the Weibull case, the exact density of the test statistic in (40) cannot be determined. Therefore, the authors suggest an approximation. The performance results show that the BLUE test with small r_1 and r_2 is indeed more resistant to interfering targets than the log-t detector.

3.3. K-distributed clutter
The K-distribution given below as (43) has been found to be a good fit for the amplitude distribution of certain radar sea clutter as well as clutter in certain synthetic aperture radar (SAR) imagery:

f_W(w) = \frac{2b}{\Gamma(\nu)} \left(\frac{bw}{2}\right)^{\nu} K_{\nu-1}(bw), \quad w > 0 \qquad (43)

where K_{\nu-1}(\cdot) is the modified Bessel function of the second kind.
As the shape parameter \nu becomes unbounded, the K-distribution becomes a Rayleigh, and when \nu = 0.5, it becomes an exponential. Smaller values of \nu correspond to heavier tails; b is the scale parameter of the density. If only b varies with clutter inhomogeneity across the reference cells, then the CA and OS tests are CFAR tests, because both averaging and ordering preserve the scale parameter. These tests are generally called CFAR in the literature, but it has to be understood that this property holds with respect to the parameter b only. Armstrong and Griffiths evaluated the performances of the CA, GO, and OS CFAR tests in K-distributed clutter (Armstrong and Griffiths 1991). Envelope detected outputs are assumed available. All the CFAR tests can be put in the form

W \;\overset{H_1}{\underset{H_0}{\gtrless}}\; t R \qquad (44)
where R is the sum of the reference samples for the CA-CFAR test, R is an order statistic of the reference samples for the OS test, etc. With the knowledge of \nu, the t required for a desired probability of false alarm can be obtained for each of the tests. A Rayleigh fluctuating target and single pulse processing were assumed. For evaluating the probability of detection, an accurate approximation is made that the density of the envelope of a Rayleigh signal in K-distributed clutter is another Rayleigh with an appropriate mean value. Only detection performance in the homogeneous situation was considered. The ranges of the different parameters considered are as follows: n \in (8, 32), P_F \in (10^{-8}, 10^{-3}) and \nu \in (0.1, \infty). The general conclusion is that the OS detector exhibits the maximum detection loss, and the CA detector the least, among the three. The losses, with respect to Rayleigh clutter, increase with the spikiness (smaller \nu) of the distribution for all the tests. The effect on the false alarm rate change due to an incorrect assumption of \nu
was also studied. General conclusions are that all the tests are equally sensitive to errors in the assumed value of \nu, the value of n does not greatly affect the sensitivity, sensitivity is higher at a lower false alarm rate (\approx 10^{-8}), and sensitivity increased for all the tests with decreasing \nu. It is possible that a test designed with a Rayleigh assumption and P_F = 10^{-6} could actually have a false alarm rate of 10^{-3}, even with moderately spiky clutter (\nu \le 2). The OS test is expected to show more immunity to interferers than the other two. But, because of the greater loss in homogeneous backgrounds, the authors recommend a CMLD over the OS test. It remains to be explored whether other order statistics based tests can be designed to perform better than CA in the homogeneous background and at the same time show masking immunity to interfering targets. It is to be mentioned that tests can be designed based on a spherically invariant random process (SIRP) model for the K-distributed clutter process (Conte et al. 1991). However, these tests are not CFAR in the sense that they require a complete knowledge of the parameters that generate the SIRP process. Finally, we just mention that OS based tests have recently been applied to extract targets from clutter encountered in SAR imagery (Kuttikad and Chellappa 1995; Novak and Hesse 1991).
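For simulation studies of the kind suggested above, K-distributed amplitudes are conveniently generated through the standard compound (gamma texture times Rayleigh speckle) representation; the normalization below (unit-mean texture) is an illustrative choice, not the parameterization of (43). The intensity moment ratio E(W^4)/E(W^2)^2, which equals 2 for Rayleigh, is 2(1 + 1/ν) under this model and makes the increasing spikiness for small ν visible:

```python
import math, random

def k_samples(nu, n=200_000, seed=9):
    """K-distributed amplitudes via the compound representation:
    W = sqrt(tau)*S with gamma texture tau (shape nu, unit mean) and
    Rayleigh speckle S. As nu -> infinity this tends to pure Rayleigh;
    small nu gives the heavier 'spiky' tails discussed in the text."""
    rng = random.Random(seed)
    return [math.sqrt(rng.gammavariate(nu, 1.0 / nu))
            * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
            for _ in range(n)]

def intensity_moment_ratio(w):
    """E(W^4)/E(W^2)^2: equals 2 for Rayleigh, 2(1 + 1/nu) for this model."""
    m2 = sum(x * x for x in w) / len(w)
    m4 = sum(x ** 4 for x in w) / len(w)
    return m4 / (m2 * m2)

print(intensity_moment_ratio(k_samples(0.5)))   # spiky: about 6
print(intensity_moment_ratio(k_samples(50.0)))  # near-Rayleigh: about 2
```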
4. Conclusion
In this chapter we have reviewed various order statistics based tests for CFAR radar target detection in different clutter environments. The performances of several OS based tests for detecting a Rayleigh target in Rayleigh clutter have been extensively studied. This is understandable because an analytical evaluation of a majority of these tests is possible under the exponential assumption. Among these tests, MX-OS, SE, AOS, and perhaps the LI filter, or BLUE with the GO operation, seem to provide an overall superior performance. No close comparative studies among these have been done to determine the best choice. However, given a scenario involving the maximum expected number of interfering targets and the nature of inhomogeneity in clutter, it should be possible to compare these tests and determine the best choice. Also, the performances of these tests will have to be investigated further for multiple pulse detection. Though other possible tests, including multi-stage procedures, can be formulated, it is clear that these will have to include a "ranking operation" in order to provide the required immunity to clutter inhomogeneity. The performances of OS based tests in non-Rayleigh clutter have been studied only to a limited extent. Our discussion of Weibull clutter considered the bi-parametric OS test given by (30), the single-parameter OS test, and the BLUE test. A more thorough comparative study of these will have to be done in order to determine their effectiveness against inhomogeneity in the reference cell samples. In the case of K-distributed clutter, under the condition of a known shape parameter and homogeneous clutter, the CA test performs significantly better than the OS test. Even though the OS is expected to be more tolerant to interfering
targets, because of its detection loss with respect to CA, it is necessary to look into the performances of other OS based tests. Perhaps intractability of the analytical evaluation of detection performance would dictate simulation studies. Also, in the case of K-distributed clutter, it is not clear if CFAR detection based on the envelope is "adequate". Further investigation of the SIRP model for the K-distributed clutter may provide an answer to this in the future (Conte et al. 1991; Watts et al. 1990).
Acknowledgement

The author would like to thank Professor Prashant Gandhi and Professor Pramod Varshney for their comments on an earlier version of this paper. He is also thankful to Mr. Chandrakanth Gowda for his help in generating the illustrations. The work was supported by BMDO and managed by the Office of Naval Research under contract N00014-94-1-0736.
References

Al-Hussaini, E. K. (1988). Performance of the greater-of and censored greater-of detectors in multiple target environments. IEE Proc. F, Commun., Radar & Signal Process. 135, 193–198.
Amirmehrabi, H. and R. Viswanathan (1997). A new distributed constant false alarm rate detector. IEEE Trans. AES 33, 85–97.
Anastassopoulos, V. and G. A. Lampropoulos (1995). Optimal CFAR detection in Weibull clutter. IEEE Trans. AES 31, 52–64.
Armstrong, B. C. and H. D. Griffiths (1991). CFAR detection of fluctuating targets in spatially correlated K-distributed clutter. IEE Proc. F, Radar & Signal Process. 138, 139–152.
Arnold, B. C., N. Balakrishnan and H. N. Nagaraja (1992). A First Course in Order Statistics. Wiley, New York.
Barkat, M., S. D. Himonas and P. K. Varshney (1989). CFAR detection for multiple target situations. IEE Proc. F, Commun., Radar & Signal Process. 136, 193–209.
Blake, S. (1988). OS-CFAR theory for multiple targets and nonuniform clutter. IEEE Trans. AES 24, 785–790.
Blum, R. S. and J. Qiao (1996). Threshold optimization for distributed order-statistic CFAR signal detection. IEEE Trans. AES 32, 368–377.
Chair, Z. and P. K. Varshney (1986). Optimal data fusion in multiple sensor detection systems. IEEE Trans. AES 22, 98–101.
Conte, E., M. Longo, M. Lops and S. L. Ullo (1991). Radar detection of signals with unknown parameters in K-distributed clutter. IEE Proc. F, Radar & Signal Process. 138, 131–138.
David, H. A. (1981). Order Statistics. Wiley, New York.
Di Franco, J. V. and W. L. Rubin (1980). Radar Detection. Artech House, Dedham, MA.
Di Vito, A., G. Galati and R. Mura (1994). Analysis and comparison of two order statistics CFAR systems. IEE Proc. F, Radar & Signal Process. 141, 109–115.
Dubey, S. D. (1967). Some percentile estimators for Weibull parameters. Technometrics 9, 119–129.
Elias-Fusté, A. R., M. G. G. de Mercado and E. Reyes Davo (1990). Analysis of some modified order statistic CFAR: OSGO and OSSO CFAR. IEEE Trans. AES
26, 197–202.
Elias-Fusté, A. R., A. Broquetas-Ibars, J. P. Antequera and J. C. M. Yuste (1992). CFAR data fusion center with inhomogeneous receivers. IEEE Trans. AES 28, 276–285.
El Mashade, M. B. (1995). Analysis of the censored-mean level CFAR processor in multiple target and nonuniform clutter. IEE Proc. Radar, Sonar Navig. 142, 259–266.
Finn, H. M. and R. S. Johnson (1968). Adaptive detection mode with threshold control as a function of spatially sampled clutter-level estimates. RCA Review 29, 414–464.
Finn, H. M. (1986). A CFAR design for a window spanning two clutter fields. IEEE Trans. AES 22, 155–169.
Gandhi, P. P. and S. A. Kassam (1988). Analysis of CFAR processors in nonhomogeneous background. IEEE Trans. AES 24, 427–445.
Gandhi, P. P. and S. A. Kassam (1989). An adaptive order statistic constant false alarm rate detector. Proc. IEEE International Conf. Systems Engr., Wright State University, 85–88.
Gandhi, P. P. and S. A. Kassam (1994). Optimality of the cell averaging CFAR detector. IEEE Trans. Inform. Theory 40, 1226–1228.
Gandhi, P. P. (1996). Data quantization effects in CFAR signal detection. IEEE Trans. AES 32, 1277–1289.
Gandhi, P. P., E. Cardona and L. Baker (1995). CFAR signal detection in nonhomogeneous Weibull clutter and interference. Proc. IEEE Inter. Radar Conf., 583–588.
Gandhi, P. P. and G. J. Holladay (1992). Constant false alarm rate detectors in log-normal clutter. Proc. 1992 Conf. Inform. Scien. Sys., Princeton Univ., NJ, 749–754.
Goldstein, G. B. (1973). False-alarm regulation in log-normal and Weibull clutter. IEEE Trans. AES 9, 84–92.
Guan, J. and Y. He (1995). Performance analysis of GOSCA CFAR detector in clutter edge situation. Proc. IEEE International Radar Conference, 592–594.
Guida, M., M. Longo and M. Lops (1992). Biparametric linear estimation for CFAR against Weibull clutter. IEEE Trans. AES 28, 138–151.
Guida, M., M. Longo and M. Lops (1993). Biparametric CFAR procedures for lognormal clutter. IEEE Trans. AES 29, 798–809.
Hansen, V. G. (1973). Constant false alarm rate processing in search radars. Proc. IEEE 1973 Inter. Radar Conf., London, 325–332.
He, Y. (1994). Performance of some generalised modified order statistics CFAR detectors with automatic censoring technique in multiple target situations. IEE Proc. Radar, Sonar Navig. 141, 205–212.
He, Y.
and J. Guan (1995). A new CFAR detector with greatest of selection. Proc. IEEE International Radar Conference, 589–591.
Kuttikad, S. and R. Chellappa (1995). Building wide area 2D site models from high resolution polarimetric synthetic aperture radar images. Tech. Report CAR-TR-776, Dept. Elec. Engr. and Center for Automation Res., Univ. Maryland.
Levanon, N. (1988). Detection loss due to interfering targets in ordered statistics CFAR. IEEE Trans. AES 24, 678–681.
Levanon, N. and M. Shor (1990). Order statistics CFAR for Weibull background. IEE Proc. F, Radar & Signal Process. 137, 157–162.
Lim, C. H. and H. S. Lee (1993). Performance of order-statistics CFAR with noncoherent integration in homogeneous situations. IEE Proc. F, Radar & Signal Process. 140, 291–296.
Lops, M. and P. Willett (1994). LI-CFAR: A flexible and robust alternative. IEEE Trans. AES 30, 41–54.
Nagle, D. T. and J. Saniie (1995). Performance analysis of linearly combined order statistic CFAR detectors. IEEE Trans. AES 31, 522–533.
Nathanson, F. E. (1991). Radar Design Principles. McGraw Hill, New York.
Nitzberg, R. (1992). Adaptive Signal Processing for Radar. Artech House, Boston.
Novak, L. M. and S. R. Hesse (1991). On the performance of order-statistics CFAR detectors. Proc. 25th Asilomar Conf. on Signals, Systems, and Computers, 835–840.
Ozugunes, I., P. P. Gandhi and S. A. Kassam (1992). A variably trimmed mean CFAR radar detector. IEEE Trans. AES 28, 1002–1014.
Rickard, J. T. and G. M. Dillard (1977). Adaptive detection algorithm for multiple target situations. IEEE Trans. AES 13, 338–343.
Rifkin, R. (1994). Analysis of CFAR performance in Weibull clutter. IEEE Trans. AES 30, 315–329.
Ritcey, J. A. (1986). Performance analysis of the censored mean-level detector. IEEE Trans. AES 22, 48–57.
Ritcey, J. A. and J. L. Hines (1989). Performance of the Max mean level detector with and without censoring. IEEE Trans. AES 25, 213–223.
Ritcey, J. A. and J. L. Hines (1991). Performance of MAX family of order-statistics CFAR detectors. IEEE Trans. AES 27, 48–57.
Ritcey, J. A. (1990). Detection analysis of the MX-MLD with noncoherent integration. IEEE Trans. AES 26, 569–576.
Rohling, H. (1983). Radar CFAR thresholding in clutter and multiple-target situations. IEEE Trans. AES 19, 608–621.
Saniie, J., K. D. Donohue and N. M. Bilgutay (1990). Order statistic filters as postdetection processors. IEEE Trans. ASSP 38, 1722–1731.
Schleher, D. C. (1976). Radar detection in Weibull clutter. IEEE Trans. AES 12, 736–743.
Skolnik, M. (1980). Introduction to Radar Systems. McGraw Hill, NY.
Shor, M. and N. Levanon (1991). Performance of order statistics CFAR. IEEE Trans. AES 27, 214–224.
Thomopoulos, S. C. A., R. Viswanathan and D. C. Bougoulias (1987). Optimal distributed decision fusion in multiple sensor systems. IEEE Trans. AES 23, 644–653.
Uner, M. K. and P. K. Varshney (1996). Distributed CFAR detection in homogeneous and nonhomogeneous backgrounds. IEEE Trans. AES 32, 84–96.
Viswanathan, R. and A. Eftekhari (1992). A selection and estimation test for multiple target detection. IEEE Trans. AES 28, 505–519.
Watts, S., C. J. Baker and K. D. Ward (1990). Maritime surveillance radar Part 2: Detection performance prediction in sea clutter. IEE Proc. F, Radar & Signal Process. 137, 63–72.
Weber, P. and S. Haykin (1985). Ordered statistics CFAR for two-parameter distributions with variable skewness. IEEE Trans. AES 21, 819–821.
Weiss, M. (1982). Analysis of some modified cell-averaging CFAR processors in multiple target situations. IEEE Trans. AES
18, 102–113.
Wilson, S. I. (1993). Two CFAR algorithms for interfering targets and nonhomogeneous clutter. IEEE Trans. AES 29, 57–72.
Author Index
Basu, A.P. 3, 4, 22 Basu, E.P. 484, 485 Bednar, J.B. 526, 615, 624 Belyayev, Y.K. 490 Bennett, C.A. 217 Benson, C. 218 Benson, F. 184, 230 Berkson, J. 127 Beyer, J.N. 195 Beyer, W. 515, 517 Bhattacharya, P.K. 230, 319 Bickel, P.J. 219, 221,237, 238, 338 Bilgutay, N . M . 652 Bingham, C. 481,489 Birnbaum, A. 476 Blake, S. 648 Bloch, D. 201,358 Block, H. 4 Blom, G. 160, 164, 217, 358, 436, 477, 481 Blum, H. 635, 660 Boncelet, C.G. 529, 623, 627 Boulton, M. 201 Bovik, A.C. 611,614, 615, 621,523-626, 635-637 Brain, C.W. 486, 487, 490 Broquetas-Ibars, A. 660 Brown, E.F. 184, 197, 201
Abdelmaek, N . N . 239 Abramowitz, M. 62, 131 Aburn, N. 514 Abu-Salih, M.S. 443 Acton, S.T. 637 Adatia, A. 439 Aitken, A.C. 161,481 Alam, K. 226, 402 Al-Hussaini, E.K. 654 All, M . M . 117, 164, 183, 184, 191, 193, 195197, 200-207 Amirmehrabi, H. 659 Anderson, T.W. 203, 518, 520 Andrews, D . F . 221,222 Antequera, J.P. 660 Antle, C.E. 311 Arce, G . A . 613, 618, 619, 627 Arce, G . R . 535 Arnold, B.C. 26,65,88,90,91,104, 117, 125, 132, 136, 143, 160, 162-164, 175, 198, 327, 331,341 Astola, J. 597 Ataman, E. 616 Bai, D.S. 501,506, 507 Bai, Z . D . 160, 164-166, 179 Bain, L.J. 61,482, 485, 488, 503 Baker, L. 663 Bakir, S.T. 63 Balakrishnan, N. 3, 4, 22, 25, 26, 43~,5, 47, 63, 65, 66, 72, 76, 77, 86-88, 90, 91, 102, 104, 117, 119, 124, 125, 127-130, 132, 136, 143, 159, 160, 16~164, 175, 178, 198, 217, 218, 223, 226, 283, 316, 325-328, 331,341, 365, 371,461,484, 485, 500 Balasooriya, Uditha, 431,439, 507-509 Balmer, D.W. 201,246 Barkat, M. 658 Barner, K.E. 627 Barnett, V. 163, 220, 358, 359, 363 Bassett, G. 232, 237-239, 241,245, 257
Cacoullos, T. 402 Calabria, R. 445 Cane, G.J. 246 Cardona, P.E. 663 Carroll, R.J. 222, 223, 231,237, 252 Chan, L.K. 164, 183, 195, 196, 201,219, 439 Chan, M. 283, 300 Chan, N . N . 183, 184, 190, 201 Chan, P.S. 66, 76, 316 Chen, H.I. 163, 481 Chen, L. 488, 489, 491,492 Chen, S-H. 338 Cheng, R. C.H. 283 673
674
Author mdex
Cheng, S.W. 183, 184, 190, 195, 203, 219 Chernoff, H. 164, 185, 217, 226, 476 Chou, Youn-Min 445, 446 Christensen, B. 514 Chun, Y.R. 501, 506 Clark, V.A. 310 Clutter, J.L. 338 Cobby, J.M. 337, 338 Cohen, A.C. 3, 25, 43-45, 61, 63, 72, 77, 87, 104, 163, 164, 178, 217, 218, 226, 283, 288, 289, 291, 292, 296, 297, 300, 301, 303-307, 309, 312, 326, 327, 331, 341, 365, 621 Conover, W.J. 223 Coronel-Brizio, H.C. 464 Coyle, E.J. 628 Craig, A.T. 401 Cramer, H. 537 Crawford, D.E. 178 Crow, E.L. 219, 621 D'Agostino, R.B. 476, 477, 484, 486-488 d'Orey, V. 232, 246 Daniel, C. 476 Das Gupta, S. 166 David, H.A. 3, 25, 26, 28, 63, 65, 72, 77, 81, 88, 91, 104, 117, 132, 136, 143, 162-164, 218, 222, 223, 226, 230, 318, 338, 341, 363, 368, 462, 505, 611 Davidian, M. 222 Davidson, R.R. 127, 129, 130 Davies, O.L. 460 Davis, C.E. 224, 461 de Jongh, P.J. 231 de Mercado, M.G.G. 654 Dell, T.R. 337, 338 DeWet, T. 220, 461, 482, 484 di Vito, A. 654 DiCiccio, T.J. 63 Dillard, J.M. 653 Ding, Y. 283, 288, 289, 291, 292, 296, 300, 306 Dixit, U.J. 379, 447 Dixon, W.J. 344-346, 386 Donohue, K.D. 652 Downton, F. 228 Dubey, S.O. 127, 662 Dumonceaux, R. 311 Dunsmore, I.R. 396, 444 Durbin, J. 464 Dyer, D.D. 203 Dykstra, R.L. 14, 16-18 Eberly, D. 616, 624, 625
Eftekhari, A. 656 Eisenberger, I. 183, 184 Elias-Fusté, A.R. 654, 660 Engelhardt, M. 503 Epstein, B. 43, 183, 191, 283, 285 Escobar, L.A. 179 Eubank, R.L. 190, 191, 196 Farlie, D.J.G. 499 Fei, H. 318, 322, 325, 326, 370 Fennessey, N.M. 229 Fercho, W.W. 490 Fertig, K.W. 501-503, 505, 506 Fill, H.D. 230 Filliben, J.J. 481-484 Finney, D.J. 127 Fintch, J.P. 628 Fishman, G.A. 513, 520 Fligner, M.A. 444 Foster, R.E. 618, 619 Fotopoulos, S. 160, 163, 462, 463 Francia, R.S. 462, 463, 481 Freireich, E.J. 310 Frenkel, R. 514 Fried, L.P. 513 Frieden, B.R. 606 Futatsuya, M. 338 Gabler, S. 329 Galati, G. 654 Gan, F.F. 163, 485 Gastwirth, J.L. 185, 221, 621 Gehan, E.A. 310 Geisser, S. 379, 381, 387, 391, 393, 396-398, 439, 444 Gerlach, B. 456, 487 Gallagher, Jr., N.C. 566, 611, 613, 616, 624, 628 Gandhi, D. 529, 627, 648, 652, 653, 655, 657, 663 Gnanadesikan, M. 183, 195, 477 Gnedenko, B.V. 490 Goldberger, A.S. 435 Goldstein, G.B. 663, 666 Greenberg, B.G. 3, 43, 163, 183, 184, 188, 193, 194, 283, 504 Greenwood, J.A. 228 Griffiths, D.A. 283 Groeneveld, R.A. 342, 363, 368 Gross, A.J. 310 Guenther, W.C. 508 Guida, M. 664, 666 Gumbel, E.J. 127, 129
Gupta, A.K. 481, 503 Gupta, S.S. 47, 63, 128, 164, 183, 195, 216, 325, 326, 402, 463 Guttman, N.B. 230 Hahn, G.J. 161, 163, 166, 167, 178, 179, 477, 478 Hall, P. 322, 491 Halls, L.K. 337 Halperin, M. 319, 322 Hamouda, E.M. 178 Hampel, F.R. 221 Han, C.P. 271 Harrell, F.E. 224 Harris, C.M. 490 Harter, H.L. 159, 163, 169, 170, 183, 193, 195, 283, 292 Hartley, H.O. 387, 505 Hassanein, K.M. 183, 184, 190, 191, 195-197, 199, 201-203 Haykin, S. 661 Heijmans, H.J.A.M. 613, 629 Heinonen, P. 618, 625 Helm, R. 43 Hill, B.M. 283 Himonos, S.D. 658 Hoaglin, D.C. 163 Hogg, R.V. 220, 237, 401 Hosking, J.R.M. 87, 228, 229, 230 Hosono, Y. 501 Huang, T.S. 616 Huber, P.J. 221 Iglewicz, B. 163 Iles, T.C. 283 Imhof, J.P. 472 Johns, M.V. 185 Johnson, N.L. 164, 328, 401 Johnson, R.A. 482 Joiner, B. 481 Joshi, P.C. 26, 28, 30, 45, 50, 117 Jurečková, J. 237, 252, 260, 261 Kabe, D.G. 386 Kabir, A.B.M.L. 196 Kadane, J. 22 Kaigh, W.D. 224 Kale, B.K. 3 Kambo, N.S. 43, 283 Kaminsky, K.S. 183, 435-439, 443 Karlin, S. 5, 166 Kase, S. 501
Kassam, S.A. 529, 615, 624, 627, 648, 653, 655, 657 Khan, A.H. 117 Khashimov, Sh. A. 322 Khayrallah, A. 550 Kielpinski, T. 507 Kim, J.G. 501, 506 Kim, Y-T. 535 Kimball, B.F. 507 Kindermann, R.P. 227 Kish, J.F. 127 Kjelsberg, M.O. 47 Knott, M. 464 Kocherlakota, S. 47, 117, 119, 124, 500 Koenker, R. 232, 237-239, 241, 245, 246, 257 Koehler, K.J. 163, 485 Kong, F. 322, 325, 326 Korukoglu, S. 487, 492 Kotz, S. 328 Koul, H.L. 252, 253, 257, 259 Koutrouvelis, I.A. 183, 191, 203 Kubat, P. 183 Kuhlman, F. 611 Kulasekera, K.B. 226 Kulldorff, G. 183, 193, 196, 201, 202 LaBrecque, J. 463, 464 Lachenbruch, P.A. 224 Lall, U. 226 Lam, K. 351, 364, 366, 368 LaRiccia, V.N. 206, 227 Lawless, J.F. 3, 6, 9, 43, 56, 61, 63, 79, 163, 328, 440-443, 503 Lee, Y.H. 615, 624 Lehman, Eugene H. 464 Lemon, G.H. 283 Leone, F.C. 178 Leslie, J.R. 160, 163, 462, 463 Leung, M.Y. 127-130 Levanon, N. 652, 663 Levine, D.N. 338 Lewis, T. 163, 220 Liao, G-Y. 616 Lieberman, G.J. 164, 476, 498 Lieblein, J. 66, 76, 79 Likes, J. 386, 441, 443 Lin, C.C. 490 Lin, J-H. 628 Lindley, D.V. 22 Lingappaiah, G.S. 445-447 Liniger, L. 10 Lloyd, E.H. 77, 106, 161, 183, 216, 285, 358-360, 365, 435, 481, 621
Lockhart, R.A. 283, 460, 463, 467, 468 Longbotham, H.G. 611, 624-626 Longo, M. 664, 666 Lops, M. 657, 664, 666 Lord, E. 218 Lu, K.L. 231 Malik, H.J. 117, 365 Mann, N.R. 43, 61, 200, 316, 317, 371, 437, 440, 487, 501-503, 505, 506 Maragos, P. 628 Marron, J.S. 224-226 Martin, W.K. 337 Maritz, J.S. 227 Masuda, K. 499 McCool, J.L. 308 McDonald, L.L. 338 McIntyre, G.A. 337-339 McLaren, C.G. 460, 467 McLaughlin, D.H. 219 McLoughlin, M.P. 613, 618 Mead, E.R. 183, 195, 196, 201 Moon, Y.I. 226 Moore, A.H. 159, 169, 170, 195, 283 Morrison, H.L. 223 Mosteller, F. 183, 187, 194, 476 Moussa, E.A. 178 Mudholkar, G.S. 490 Munoz, B. 513 Munro, A.H. 227 Munson, Jr., D.C. 635 Mura, R. 654 Muttlak, H.A. 338 Naaman, L. 621, 623 Nagaraja, H.N. 88, 91, 117, 132, 136, 143, 162-164, 175, 198, 327, 331, 438, 439 Nagel, K. 402 Nelson, W. 43, 57, 116, 163, 166, 167, 178, 179, 435-438, 443, 444, 477-479, 504, 507 Neuman, C.P. 229 Neuvo, Y. 597, 618, 625 Ni Chuiv, N. 358, 360, 363, 364 Nodes, T.A. 613, 616 O'Hagan, A. 22 Odell, P.L. 216 Ogawa, J. 183, 194, 195, 241 Ogunyemi, O.T. 444 Ohta, H. 501 Oliver, F.R. 127 Olkin, I. 230, 514 Owen, D.B. 445, 446, 498, 499
Ozturk, A. 487, 492
Palmieri, F. 529, 627 Panchapakesan, S. 325, 326 Parrish, R.S. 163 Parving, A. 514 Parzen, E. 185, 224 Patel, J.K. 431, 444 Patil, G.P. 338 Patnaik, P.B. 318, 328, 409 Pearl, R. 127 Pearson, K. 475, 505 Pearson, C.P. 230 Pearson, E.S. 387 Pitas, I. 633 Portnoy, S. 232 Posner, E.C. 183, 184 Prentice, R.L. 62, 63, 128 Prescott, P. 218 Press, S.J. 22 Pulcini, D. 445 Puri, M.L. 463 Qiao, J. 660
Rabiner, L.R. 606, 615 Ramey, J.T. 402 Rao, C.R. 176, 191, 250, 251, 401, 463, 499 Raqab, M.Z. 439 Reed, L.J. 127 Rehak, J. 514 Ren, J.J. 231 Rényi, A. 5 Resnikoff, G.J. 498 Restrepo, A. 611, 621, 624, 636 Reyes davo, E. 654 Rhodin, L.S. 438, 439 Richard, J.T. 653 Ridout, M.S. 388 Rifkin, R. 663 Ringer, L.J. 490 Robertson, T. 14, 16-18 Rockette, H. 283 Rothenberg, T.J. 358 Royston, J.P. 482, 483 Royston, P. 229, 230 Rubin, G.S. 513, 522 Ruppert, D. 231, 237, 252 Ryan, T. 481 Sach, R.A. 201 Sager, T.W. 338
Saleh, A.K. 43, 45, 183, 184, 193-197, 201, 203, 204, 257, 259, 271, 272, 277 Salzer 66 Saniie, J. 652 Sarhan, A.E. 3, 43, 163, 183, 184, 188, 193, 194, 283, 285, 504, 621 Sarkadi, K. 160, 456, 463 Sarkar, S.K. 160, 164-166, 179 Särndal, C.E. 183, 190, 203 Schafer, R.E. 61, 503 Schafer, R.W. 628 Schechtman, E. 230 Schneider, H. 223, 499, 501, 503-507 Schonbach, D.I. 229 Schultz, H. 127 Sen, P.K. 184, 204, 259, 272, 277 Serra, J. 633, 635 Shah, B.K. 86, 88, 91, 102 Shapiro, S.S. 160, 161, 163, 166, 167, 460, 462, 463, 476-478, 480, 481, 483-489, 492, 629 Sheather, S.J. 224-226 Shin, D. 519 Shirahata, S. 338 Shiraishi, T. 271 Shor, M. 652, 663 Siddiqui, M.M. 183, 193, 219, 621 Sillitto, G.P. 228 Singer, J.M. 259 Sinha, S.K. 3, 343, 349 Smith, R.L. 283 Smith, R.M. 482, 485, 488 Sobel, M. 43, 283, 285, 402 Solovyev, A.D. 490 Spinelli, J.J. 463, 468 Srivastava, A.B.L. 499 Stedinger, J.R. 230 Stegun, I.A. 62, 131 Stephens, M.A. 160, 163, 283, 456, 459, 461-464, 467-469, 476, 477, 481, 484, 487, 488 Stigler, S.M. 219, 237 Stokes, S.L. 338, 341 Stone, M. 439 Sukhatme, P.V. 4, 25 Szlyk, J.P. 513 Takada, Y. 439, 443 Takagi, K. 499 Takahasi, K. 338 Tang, Y. 325, 326 Tarter, M.E. 47 Taylor, H.M. 5
Tierney, L. 22 Tietjen, J.L. 341-343 Tiku, M.L. 223, 491, 499 Tukey, J.W. 219, 338, 476, 526, 555, 559, 606 Umbach, D. 184, 191, 193, 195-197, 200-207 Uner, M.K. 659
Vännman, K. 202 van Zwet, W.R. 436 van Wyk, J.W.I. 220 Varshney, P.K. 658, 659 Venetsanopoulos, A.N. 633 Venter, J.H. 461, 482, 484 Verhulst, P.J. 127 Verrill, S. 482 Viana, M.A.G. 514 Vincent, L. 633 Viswanathan, R. 656, 659 Viveros, R. 57, 283 Vogel, R.M. 229 Von Andrae, C.C.G. 222 Vymazal, M. 514 Wadsworth, H.I. 476, 487 Wakimoto, K. 338 Wallis, J.R. 229 Wang, L. 318 Wang, W. 160, 164-166, 179 Watanabe, Y. 341 Watson, G.S. 437, 469 Watt, T.L. 526, 615, 624 Weber, P. 661 Weisberg, S. 481, 489 Welsh, A.H. 223, 231 Wendt, P.D. 628 West, S. 513 Whisenand, C.W. 203 White, J.S. 66 Whitten, B.J. Wilk, M.B. 160, 161, 163, 166, 167, 460, 462, 476, 477, 480, 481 Wilks, S.S. 520 Willett, P. 657 Wilson, E.B. 127, 283 Wingo, D.R. 283 Wise, G.L. 566, 611, 624 Wixley, R.A.J. 227 Wolff, C. 329 Worcester, J. 127, 283 Wright, T.W. 14, 16-18 Yamanouchi, Z. 183, 201
Yanagawa, T. 338 Yin, L. 597 Yitzhaki, S. 230 Young, D.H. 63, 401, 402, 409 Yuan, P.T. 288
Yuste, J.C.M. 660
Zelen, M. 76, 79 Zelterman, D. 87, 128 Zhang, J. 318
Subject Index
c(-trimmed LJg estimate 536 c~-trimmed Ug filters 545 s-trimmed LJg vector 536 A2 476, 468 Accelerated life testing 178, 497 Accelerated life-test sampling plans 501,506 Accelerated test plans 507 Acceptable quality level (AQL) 498 Acceptable reliability level (ARL) 502 Acceptance sampling 497, 499, 509 Acoustic filters 604 Adaptive algorithm 538, 592 Adaptive filtering 548 Adaptive interval 443 Adaptive L-estimates 232 Adaptive L-statistics 220 Adaptive methods 439 Adaptive order statistic detector (AOS) 657 Additive Markov chain 5, 25 ADQR 275 Agricultural production 159 Agriculture 337 Algorithms for the acquisition 603 Alternating sequential filter 633 Alternative hypothesis 409, 412, 426 Alternative model 476 AM signals 577 Analysis of digital image data 603 Analysis of variance 480, 491 Annual rainfall 198 ANOVA table 455 Antisymmetric 15 APL 108 Applications accelerated life testing 178 agricultural production 159 annual rainfall 198 astronomical 223 biological sciences 5 biological organisms 127 biological assays 45, 127, 159
cancer research 5, 6, 8 clinical trials 5, 6, 8 data compression in space telemetry 184 demographic studies 127 dosage-response studies 45 engineering 5 fatigue test 79 human populations 127 insulating fluid breakdowns 57 lifetesting 3, 6, 7, 25, 26, 45, 55, 56, 63, 82, 163, 196 metal ability 159 physicochemical phenomena 159 population growth 159 reliability theory 3, 6, 163 survival analysis 3, 6, 159 target analyses 45 thread strength in textiles 184 voltage stress 56 Approximate 310 Approximate confidence intervals 26, 45, 56, 57 Approximate maximum likelihood estimation 323 Approximate MLE 326 ARE 244, 248-250, 275, 276 ARMA signal 546 Assessment of defective hearing 514 Astronomical application 223 Asymmetric alternatives 464 Asymmetric distribution 221 Asymmetric spacing 201 Asymptotic covariance factor 505 Asymptotic covariance matrix 242 Asymptotic dispersion 271 Asymptotic distribution 258, 261,317 Asymptotic distributional quadratic risk 272, 274 Asymptotic efficiency 192, 193,200, 219, 241, 317, 326 Asymptotic normality 321,326 679
Asymptotic percentage points 471 Asymptotic points 465, 471 Asymptotic properties 252 Asymptotic relative efficiency 237, 241,251, 468 Asymptotic variance 191, 192, 199, 226, 232, 249, 250, 300, 303, 306, 326, 500 Asymptotic variance-covariance matrix 200 Asymptotically best linear invariant predictor (ABLIP) 438 Asymptotically best linear unbiased predictor (ABLUP) 438 Asymptotically best linear unbiased estimator (ABLUE) 183, 187, 189, 191, 194, 196, 199, 201,204, 242-246, 249 Asymptotically efficient L-estimates 218 Asymptotically efficient L-estimator 222 Asymptotically normal 317, 319, 322, 323 Asymptotically unbiased 191,200, 218 Asympvar 343 Attribute sampling 497 Autocorrelation 537, 594 Autocorrelation matrix 537 Average ARE 249 Average detection threshold (ADT) 651 Average filter 620 Average level 619 Average smoothing 526 Bahadur representation 252 Bandwidth 225, 226 Bartlett type correction 13, 14 Batch acceptance 499 Bayes estimator 21, 22 Bayesian analysis 379 Bayesian inference 3, 20 Bayesian paradigm 444 Bayesian prediction 444 Bayesian setting 434 Bayesian tests of hypotheses 22 Bayesian test of significance 379 Baye's theorem 20 Bell shaped 287 Bernoulli numbers 131 Best linear invariant estimators (BLIE) 316, 317, 440, 502 Best linear invariant predictor (BLIP) 437 Best linear predictor 517 Best linear unbiased estimator (BLUE) 26, 43-45, 56, 57, 63, 7(~80, 82, 86, 101-104, 108, 112, 113, 115, 128, 143-146, 149, 150, 152, 154, 160-163, 165, 168, 169, 171, 172, 174, 176, 178, 179, 183, 186, 187, 191, 193, 195, 201,202, 216-220, 226, 227, 241,285,
315 317, 326, 340, 341,348, 352, 355, 356, 358-362, 364-366, 36%369, 373, 435, 436, 440, 441,481,503-505, 654-666, 668 Best linear unbiased predictor (BLUP) 435 437, 441 Best MSE filter 550 Best selection 340 Best's vitelliform macular dystrophy 520 Beta distribution 166, 225 Beta function 129, 387 Bi-variate F test 490 Bias 111-115, 148, 149, 170, 284 Bias-corrected mean 345 Biased 271 Bidirectional multistage filters 618 Binary permutation filter 550 Binary relation 15, 541 Binary signals 570 Binomial expansion 73 Binomial probability paper 476 Binomial regression data 128 Biological assays 45, 127, 159 Biological organisms 127 Biological sciences 5 Biomedical image processing systems 614 Bivariate F distribution 490 Bivariate normal distribution 231 Block-diagonal matrix 178 Blom's approximation 488, 489 Blue 366 BLUE detector 654 BLUE estimates 664 BLUE test 668 Bonferroni argument 112 Bootstrap estimator 224 Brain and Shapiro statistic 491 Breakdown of an industrial process 467 Brownian bridge 469 387, 388 X2(2) 328, 329 Z2(fl) 332 ehfl-test 401,402, 409, 410, 417, 418, 426 g~ variables 459, 465, 469, 472 C.L.T. 258 C-S inequality 266, 267 CA detector 667 CA test 668 Cancer research 5, 6, 8 Cauchy 220, 248 Cauchy distribution 201,202, 221,226, 227, 21,245, 340, 341,358, 359, 360, 363, 364, 488 Cauchy-Schwartz inequality 173, 259, 350 chi z
Subject index Cellular 544 Censored 215, 287, 381 383, 390, 393, 394, 460, 490, 506 Censored data 456, 467, 503 Censored observations 310 Censored samples 230, 283, 285, 478,482, 487, 491 Censored sampling plans 497, 500 Censored scale regression model 177 Censoring time 6, 9, 10, 310 Censoring 3, 5, 6, 104, 152, 160 171, 173, 193, 195, 196, 315, 389, 502 Center weighted median (CWM) filter 570574, 598 Central chi-squared distribution 272 CFAR detection 648, 653 CFAR property 663 CFAR radar test detection 668 CFAR test 644-646, 648, 658, 659, 664-667 Characteristic function 131, 132 Characteristic roots 192, 197, 198, 200, 249, 250, 275 Characterization 4, 127, 436 Chebyshev inequality 254, 264 Chemo-therapy treatment 310 Chert and Shapiro statistic 489 Chi-square approximation 26, 43, 45, 502, 503 Chi-square distribution Chi-square statistics 456 Chi-square variates 442 Chi-squared goodness of fit test 475, 482 Chi-squared variables 460 Clinical trials 5, 6, 8 Close-open 632 Closed convex cone 15, 19 Closure property 16 Coefficient of correlation 483, 487 Coefficient of kurtosis 26, 45-47, 55, 88, 89, 130 Coefficient of skewness 26, 45-47, 55, 88, 89, 115, 130 Colored PWOS filters 590 Combination filters 627 Compact set 253, 255 Compactness property 255 Comparable 15 Complete lattice 540, 541 Complete sufficient statistics 502 Compounding distributions 129 Compression 603 Computer algorithms 107 Computer program 483 Concomitants of order statistics 230
Conditional confidence intervals 57 Conditional distribution 387 Conditional hazard function 251 Conditional inference 57 Conditional mean length 443 Conditional median 564 Conditional predictive discordancy (CPD) tests 389, 391,392, 394, 395 Conditional quantile function 249 Conditional survival function 251 Confidence intervals 45, 81, 82, 184, 300, 310, 327, 329, 330, 331,503 Conjugate prior 390, 397 Conservative spacing 19(~198, 200 Consistency 326 Consistent 319, 322, 323, 463, 484 Consistent estimator 269 Constant false alarm rate (CFAR) detector 643, 651,652 Constant neighborhood 566 Consumer's risk 498, 502 Contaminated Gaussian noise 529 Contaminated normal distribution 222 Contaminated samples 223 Contamination 221 Convariance structure 513 Convergence in the mean 539 Convergence in the mean square 540 Convex hull 239 Cooley-Tukey fast Fourier transform 604 Coordinate shifts 608 Correlation 79, 108, 150, 152, 230, 231,476, 480, 513 516, 521, 522, 524, 549, 644 Correlation coefficient 453-455, 457, 463465, 467, 481,484, 485 Correlation coefficient statistics 468 Correlation-coefficient test 463, 468 Correlation matrix 521, 523, 537, 540, 621, 623 Correlation measures 230 Correlation structure 520 Correlation tests 456, 460, 461,467, 488 Correlation type tests 485 Courant-Fisher theorem 192, 251 Covariance function 457, 471 Covariance matrix 166, 240, 241,242, 273, 316, 433, 453,461,463,481,503, 504, 507, 514, 515, 521,522 Covariances 5, 63, 67, 76, 77 80, 86, 98, 102,104, 111-115, 117, 125, 128, 132, 136 138, 144, 148, 149, 160, 164, 183, 216, 217, 227, 230, 243,285, 300, 303, 306, 316, 317, 326, 359, 360, 370, 374, 387, 434, 435,454, 457, 469, 519
682 Covariates 380, 381,385 Coverage probability 444 Covering relation 541 Cram~r-Rao lower bound 217 Cramfir-von Mises statistic 464 Cram6r-Wold device 261 Cramer's theorem 538 Critical points 466, 472 Critical values 402-405, 412, 419 Cross-correlation 594 Cross-validatory method 439 Cumulative hazard plot 311 CWOS filters 589 D'Agostino's DA test 483 Data compression in space telemetry 184 Decomposition 603 Decreasing failure rate (DFR) family 87 Demographic studies 127 Dependent variable 231 Design matrix 238, 240, 250, 252, 384 Detail-preserving filters 617 Determinant 190, 507 De Wet-Venter procedure 482 Diagonal matrix 178, 240, 251 Diagnostic procedure 381 Differential equation 88, 117 Differential pulse code monitoring (DPCM) 548 Digamma (psi) function 62, 63, 65, 130 Digital communication signals 525 Digital communications 544 Digital filter 604, 606 Digital halftoning 546 Digital image processing 603 Digital imagery 603 Digital mathematical morphology 616 Digital television 548 Dilation 629 Dilation filter 616 Dimensional normal distribution 242, 273 Dirac delta function 593 Directionally-oriented median filter 618 Discordancy ordering 388 Discordancy testing 379, 380 Discordant observations 379, 380, 387 Discordant values 386 Discrete Legendre polynomial 229 Dispersion measure 228 Display 603 Distance operator 576 Distribution (p + l)-variate Oaussian 241 t 203, 204
Subject index TP2 166 asymmetric 221 asymptotic 258, 261 beta 166, 225 bivariate normal 231 bivariate F 490 Cauchy 201,202, 221,227, 245, 340, 341, 359, 360, 363, 365, 488 chi-square 8, 11 14, 26, 45, 56, 57, 203207, 318, 328, 477, 491 compounding 129 conditional 387 contaminated normal 222 DFR 87 dimensional normal 242, 273 double exponential 166, 203, 210, 220, 226, 501 doubly truncated exponential 26 doubly truncated logistic 117, 119, 124 doubly truncated generalized logistic 85, 116, 117, 119 error 232, 248 EV 467 exchangeable bivariate normal 514 exponential family 16 exponential power 488 exponential 3 5, 7-9, 11, 18-20, 22, 25, 26, 45, 47, 55, 56, 159, 161,163, 164, 166- 170, 176, 185, 193, 195, 199, 202, 207, 226, 284, 287, 288, 308, 311,326, 327, 331,340, 351, 352, 363, 381,433, 438, 439,442, 446, 456, 460, 477, 484, 488-491,507, 508 exponential-gamma 127, 128 extreme value 61, 66, 76, 79, 127, 196-198, 200, 311,340, 341,370, 374, 456, 460, 477, 485, 487, 491,500, 502-504 F 11, 12, 443, 490 gamma 128, 159, 166, 203, 283, 292, 297, 298, 300, 301,306, 307, 311,323, 447, 477 Gaussian 562 generalized logistic generalized Pareto 229 generalized gamma 63, 79 generalized extreme value 129, 227, 229 generating 161, 163-165, 178 half logistic 26 heavier tailed 219, 220 heavy-tailed 467 hypergeometric 225 IFR 87 inverse gamma 304 inverse Gaussian 283, 297, 300, 302, 306, 311 inverse Weibull 445
Subject index inverted gamma 20, 21 joint predictive 392, 399 K 643 K-clutter 660 life 193, 195, 199, 350 location 243 location-scale 143, 159, 160, 163, 167, 174, 175, 177, 179, 183, 186, 187, 201,205 log-concave 160, 166, 179 log-gamma 61, 63, 67 log lifetime 504 log-logistic 26, 195 log-Weibull 127 logistic 26, 86-88, 91, 102, 127, 129, 130, 159, 166, 168-171, 179, 195-197, 340, 341,364, 366, 368, 456, 460, 467, 477, 488 lognormal 226, 227, 283, 288, 297, 300, 303, 306, 311,477, 500-502, 660 long-tailed 222, 231 multinomial 401,402, 426 multivariate normal 187 multivariate hypergeometric 446 noncentral chi-square 273 noncentral t 203, 204, 207 normal 62, 127, 128, 131-133, 159, 160, 163, 164, 166-170, 174, 201,207, 21%221, 223,226, 227, 246, 247,269, 297, 340, 342, 446, 456, 460, 461,463,475, 477, 480, 484, 488, 491,492, 498, 500, 504, 505 null 454, 477, 478, 487 Pareto 26, 264, 202 posterior 434 power function 26, 264 predictive 380, 382, 385, 444-446 prior 386 proper prior 389 Rayleigh 203 right-truncated exponential 25, 26, 45, 47, 48, 55 scale 243 skewed 283, 297 standard normal 292, 489 symmetric continuous 127, 129, 204 symmetric 243 truncated exponential 117 truncated logistic 26, 47 truncated log-logistic 117 type I generalized logistic 127, 128 type II generalized logistic 127, 128 type III generalized logistic 127-131 type IV generalized logistic 128 type V generalized logistic 87
683
uniform 108, 159, 161, 164, 166, 167, 174, 184, 439, 456, 459, 464, 467 Weibull 79, 159, 199, 200, 283, 286, 287, 297, 300, 306, 308, 311,317, 323, 326, 340, 341,370, 439, 445, 477, 491,500-502, 660 Distribution 561, 563 Distribution free intervals 443 DLINDS routine 144 DNEQNF routine 148 DNEQNJ routine 148, 170 Dosage-response studies 45 Double exponential 561 Double exponential distribution 166, 203, 219, 220, 226, 501 Double moments 25, 26, 28, 30, 43, 45, 49, 50 Double precision 137, 144 Doubly censored samples 218 Doubly truncated exponential distribution 26 Doubly truncated generalized logistic distribution 85, 116, 117, 119 Doubly truncated logistic distribution 117, 119, 124 Doubly Type-II censoring 26, 43-46, 56, 57, 76, 160, 168, 170, 174, 175 DPCM 551 DPCM coding 553 DPCM scheme 552 DQDAG routine 137 DTWODQ routine 137 Dynamic programming 190 ECC decoder 549, 551 Ecology 401 Economics 401 EDF statistics 456, 467, 468 Edge 566 Edge detection 634, 635 Efficiency 183, 218, 339 Efficient 319, 323 Efficient estimator 322 Eigenfunctions 469 Eigenvalue 459, 540 Electrical filters 604 Electron micrographs 604 Empirical distribution function (EDF) 185, 441,459 Engineering 5, 401 Enhancement 603 Environmental sampling situations 337 Environmental sciences 230 Equal allocation scheme 340 Equivariant 440 Equivariant estimator 441,664
Erosion filter 616 Error control coding (ECC) 549 Error diffusion halftone 550 Error distribution 232, 239, 248 Estimating equations 148, 286 Estimator 224 Euclidean space 15 Euler's constant 197, 486 EV distribution 467 Exchangeable 387, 388 Exchangeable bivariate normal distribution 514 Exchangeable normal variates 398 Exp(0,0) 333 Exp(0,1) 330 Expectation 318 Expectation vector 172 Expected live-time 56, 57, 81, 82 Expected sample size 402-407, 410, 411, 415, 416, 424-426 Expected value 284, 287, 288, 293, 297, 34I, 352, 433, 476, 481 Expected value operator 593 Expected value predictor 436 Experimental designs 476 Explanatory variables 231,232 Exponential 300, 331,347, 355, 389,467, 490, 643° 644, 645 Exponential case 395 Exponential density 17 Exponential distribution 3 5, 7-9, 11, 1820, 22, 25, 26, 45, 47, 55, 56 159, 161,163, 164, 166-170, 175, 176, 185, 193, 195, 199, 202, 207, 226, 284, 287, 288,308, 311,318, 326, 327, 331,340, 351,352, 363,381,389,433, 438,439, 442, 446, 456,460, 477, 484, 488491,507, 508 Exponential family 16, 443 Exponential-gamma distribution 127, 128 Exponential mean 341,354 Exponential population 310 Exponential power distribution 488 Exponential prior 445, 446 Exponential test 463, 468 Extreme environmental events 230 Extreme observations 227 Extreme order statistics 219, 221 Extreme value 486, 504 Extreme value distribution 61, 66, 76, 79, 127, 196-198, 200, 311,317, 318,340, 341,370, 374, 456, 460, 477, 485,487, 491,500, 502 504, 507 F-approximation
443
F-distribution 11, 12, 443,490 F-table 332 F-variate 385, 392 Factorial experiments 460 Failure 497 Failure censored 505 Failure censored sampling 501 Failure censored sampling plans 500° 507 Failure times 5, 6, 9, 10 False alarm 651 False alarm probability 652 False alarm rate 654 Fast adaptive optimization algorithm 580 Fast adaptive WOS algorithm 597 Fatigue life 308 Fatigue test 79 Feature registration 634 Filliben procedure 482 Filliben's test 483, 485 Filter lattice structures 540 Filtering 523, 555 Final prediction error (FPE) 591 Finite impulse response (FIR) linear filter 625 FIR-median hybrid filters 618 FIR-median filter 625 First-order moments 125 Fisher information matrix 217, 365 Fisher's information 10, 190, 196, 197, 200 Fixed order statistics test 648 Flood 395 Forecasting 525 Forestry 337 FORTRAN language 168 Fourth standard moments 288 Frequence significance test 389 Frequency-discordancy tests 394 Frequency response 604 Frequentist prediction regions 444 Gamma 385, 445 Gamma distribution 128, 159, I66, 203, 283, 292, 297, 298, 300, 30l, 306, 308, 311,323, 439, 447, 477 Gamma family 445 Gamma function 64, 131,287, 388, 389 Gamma plot 377 Gamma prior 386, 388, 395, 447 Gastwirth's estimator 221,237 Gauss-Markov theorem 162, 186, 187, 216, 316, 372 Gaussian 563 Gaussian distribution 562 Gaussian noise 660 Gaussian process 457, 459, 471,525, 647
Generalized censored mean level detector (GCMLD) 658 Generalized extreme value distribution 129, 227, 229 Generalized gamma distribution 63, 79 Generalized least squares 454, 462, 463, 481, 484 Generalized linear model 435 Generalized logistic distribution 85-87, 97, 98, 101-105, 111, 113, 114, 119, 124, 127, 447 Generalized Pareto distribution 229 Generalized probability paper 476 Generalized variance 76, 190, 192, 194-196, 201 Generalized variance criterion 195, 199, 201-203 Generating density function 159, 162, 173 Generating distribution 161, 163-165, 178 Genetics 401 Geometric mean 249 Gini correlation 231 Gini covariance 230 Gini's mean difference 228, 230 Gini's mean difference statistic 222 Goodness of fit 475, 476, 480 Goodness-of-fit problems 379 Goodness-of-fit statistics 456, 459 Goodness-of-fit tests 132, 184, 205, 207 Gradient-based optimization algorithm 593 Graphical hazard function analysis 310 Greatest lower bound 541, 542 Group-test sampling plans 507 Growth function 127 Guarantee time 8 Gumbel density 664 Half logistic distribution 26 Half-normal probability plots 476 Halftone 548 Halftones 553 Hamming code 551 Hamming distance 551 Harrell-Davis estimator 224, 225 Hazard function 87, 148, 191, 490 Hazard rate function 490 Hazardous waste site 337 Heavier-tailed distribution 219, 220 Heavy-tailed distribution 467 Hessian matrix 173 High-level image analysis 634 High-order regression equation 463 Higher order moments 25 Highest posterior density region 444
Homogeneity 426 Hot spots 337 Human populations 127 HYBRD1 routine 148 HYBRDJ routine 148 Hypergeometric distribution 225 Hyperparameters 388, 389 Hyperrectangle 398 Hypothesis tests 228, 498 Identity matrix 216, 463, 528, 529, 623 Image 525 Image coding 634 Image enhancement and restoration 636 Image filtering 606 Image processing 544, 603-605, 637, 638 Image restoration 597 Image segmentation 634 Image signals 604 Improper prior 385, 389 Impulse 566 IMSL routines 137, 144, 148, 168, 170 Incomplete beta ratio 129 Incomplete gamma function 61 Inconsistent 271 Increasing failure rate (IFR) family 87 Indicator function 256 Indoor radio communications 544 Induced order statistics 230 Information matrix 11, 113, 115, 243 Informative sampling strategies 337 Input signal 578 Input signal to noise ratio 545 Insulating fluid breakdowns 57 Interpolation 185 Interval 431 Interval estimation 326 Interval prediction 432, 440 Intraclass correlation coefficient 519 Invariance 440 Invariant 241 Inverse gamma distribution 304, 318 Inverse Gaussian 309 Inverse Gaussian distribution 283, 297, 300, 302, 306, 311 Inverse halftoning 546, 547 Inverse sampling 401, 402 Inverse Weibull distribution 445 Inverted gamma distribution 20, 21 Isotonic 15, 16 Isotonic regression 14-17 Jacobian 7
Joint asymptotic relative efficiency (JARE) 243, 246 Joint coverage probability 446 Joint discordancy test 393 Joint predictive distribution 392, 399 k-out of-n fusion rule 660 K-clutter 647 K-clutter distribution 660 K-distribution 643 K-distributed clutter 666, 668 K-distributed clutter process 667 Kaigh-Lachenbruch estimator 225 Kalman filtering 626 Kernel estimation 216, 225 Kernel estimator 225, 232 Kolmogorov-Smirnov procedure 482 Kolmogorov-Smirnov test 205 Kronecker product 528, 534, 535 Kurtosis 220, 228 g-DPCM 550 g-DPCM decoder 552 Ug-LMS algorithm 539, 540 Ug estimate 535 LJg estimator 537 / i f filter 531,532, 537, 539, 540, 542, 543, 546, 547, 553 LJg filter lattice performance 544 Ug-LMS filters 540 Ug lattice approach 559 L/g permutation filter 545 LJg permutation filter lattice 541 Ug PWOS filters 581, 584 Ug type filters 601 Lg-CFAR 657 Lg-DPCM 550, 551 Lg DPCM decoder 552 Lg filter 527, 529-531,535, 549,550, 627, 658, 668 L£ ordering 581 L-estimates 216-218, 220, 222, 226 L-estimation 215, 231,237, 241,248, 251 L-estimators 161, 164, 165, 221,222, 225, 231,237, 251,271,619 L-filters 658 L-inner mean filter 620 L-moments 228-230 L-statistics 215-219, 223, 224, 228-231 Lack-of-memory property 193 Lagrange multiplier 624 Laplacian 563 Laplacian statistic 563 Lattice 541
Law of large numbers 433 Least absolute error estimator 232, 238 Least mean square (LMS) optimization 538 Least squares 465, 471 Least squares projection 14, 19 Least-squares calculations 227 Least-squares criteria 621 Least-square estimation 241 Least-squares estimator (LSE) 164, 238, 246, 247, 251,272, 316 Least-square technique 285, 624 Least-squares theory 161,216 Least upper bound 54l, 542 Left censored 478 Left truncation 116 Level of significance 277 Life distributions 193, 195, 199, 300 Life span model 87 Life spans 283 Life test 315, 478 Life testing 444, 509 Life-time model 128 Lifetesting 3, 6, 7, 25, 26, 45, 55, 56, 63, 82, 163, 196, 431 Lifetimes 5, 9, 431 Likelihood-based methods 226 Likelihood equation 319, 321,324 Likelihood function 7, 9, 12, 13, 20, 111,147, 319, 324 Likelihood ratio approach 12 Likelihood ratio test 12-14, 17-19, 659 Limiting theorems 322 Line regression 455, 517, 521-523 Linear convolution 604 Linear digital filters 604 Linear estimation 61 Linear filter 525, 529, 547, 549, 550, 555, 564, 593, 618, 626 Linear FIR filters 625 Linear functional 222 Linear LMS algorithm 595 Linear model 278 Linear model 160, 206, 238, 248, 250, 271 Linear operators 593 Linear point predictors 448 Linear prediction 434, 545 Linear predictor 435, 437, 513, 517 Linear programming 238 Linear programming techniques 232 Linear regression model 231,237 Linearly separable threshold function 578 LMS adaptive optimization method 553 LMS algorithm 591,592, 594 Local alternatives 272, 273
Subject index Local linearization theorem 191,250 Local maximum likelihood estimators (LMLE) 303 Local monotonicity 611 Locally monotonic of length m 566 Location 215, 220, 221,227, 501,526, 562, 567 Location and scale family 315, 324, 439 Location and scale parameter families 475 Location distribution 243 Location estimators 222 Location measures 228 Location model 239-241 Location parameter 63, 76, 86, 104, 105, 128, t43, 144, 147, 150, 152, 159, 160, 161,170, 177, 183, 186, 190, 191,196, 205, 207, 216, 217, 237, 341,358, 364, 370, 374, 443, 453, 476, 500, 502, 503, 506, 664 Location-scale distributions Location-scale families 227, 317, 440, 446, 448, 500-502 Location-scale family 433, 441 Location-scale models 217, 219 Log lifetime distribution 504 Log inverse gamma prior 445 Log-concave distribution 160, 166, 179 Log-gamma distribution 61, 63, 67 Log-gamma regression model 63 Log-likelihood function 9, 16, 285 Log-logistic distribution 26, 295 Log-normal clutter 665 Log-Weibull distribution 127 Logistic distribution 26, 86-88, 91, 102, 127, 129, 130, 159, 166, 168-171, 179, 195 197, 340, 341,364-366, 368, 456, 460, 467, 477, 488 Logistic function 127 Logistic model 128 Lognormal 294, 296, 307, 643, 647, 666 Lognormal distributed 505 Lognormal distribution Long-tailed distributions 222, 226, 231 Longer tails 219 Loss 432 Lot tolerance percent defective (LTPD) 498 Lot tolerance reliability level (LTRL) 502 Lower bound 541 Lower confidence bound 329, 330, 332, 333 Lower confidence limit 56, 57 Lower prediction intervals 445, 447 Lower specification limit 498, 500, 508 M-dimensional student M-DPCM 550
385
M-DPCM decoder 552
Maple V 110
Marginal likelihood estimators 244
Markov chain 655
max/median filters 618
max filter 616
Maximum flood levels 311
Maximum gap 319
Maximum likelihood 432, 438
Maximum likelihood (ML) estimates 518, 521, 523, 562, 658
Maximum likelihood estimator (MLE)
Maximum likelihood method 116
Maximum likelihood predictor (MLP) 439
Maximum of the order statistic 659
Maximum-likelihood estimation 227
Mean absolute error 599
Mean absolute error (MAE) criterion 628
Mean absolute error (MAE) estimator 564
Mean absolute error formula 593
Mean failure time 431
Mean filter 547
Mean life-time 26, 79
Mean remission time 311
Mean square errors 128, 148, 149, 161, 168-170, 177, 226, 316, 317, 434, 435, 437, 438, 517, 518, 530, 621
Mean square prediction error 439
Mean value theorem 254, 265
Mean vector 621
Means 5, 26, 45-47, 55, 62, 63, 66, 77, 86, 91, 97, 104, 115, 117, 125, 128, 130, 136, 137, 144, 174, 183, 201, 206, 207, 216, 217, 219, 223, 224, 226, 227, 241, 242, 249, 273, 287, 288, 297, 316, 318, 332, 337, 338, 340, 343, 346, 347, 349-351, 359, 365, 387, 402, 446, 454, 457, 459, 460-462, 469, 471, 485, 497, 498, 500, 514, 515, 518, 556, 561-564, 615, 666
Measure of association 230
Measure of fit 454
Measure of skewness 228
Mechanical filters 604
Medial axis transform (MAT) 635
Median 129, 162, 219, 221, 237, 238, 284, 343, 345-347, 363, 364, 368, 439, 483, 506, 550, 560, 562-564, 565, 569, 574, 576, 606, 648
Median deviation 223
Median filter windows 607
Median filter 550, 557, 559-571, 576, 580, 606, 609, 612, 614-618, 627, 632, 633, 636
Median filter generalization method 580
Median filter roots 609
Median smoothing 526
Median unbiased 439
Medical data 230
Medical radiographic systems 614
Mental ability 159
Method of error diffusion 547
Method of moments 115, 116, 229
Midrange 127, 129
Midrange filter 619
Military 544
Military imaging hardware 614
min filter 616
Minimal sufficient statistic 9
Minimax approach 190
Minimum mean square error 167, 168, 517, 594
Minimum mean square error unbiased estimator 657
Minimum variance 341
Minimum variance unbiased (MVUE) 285
Minimum of the order statistic 659
Minimum-variance estimators 217
Minimum-variance unbiased estimator 218
Minkowski addition 629
MINPACK routines 148
Mixture of exponential pdf's 443
ML estimates 662
MLE of σ 442
Model order (complexity) reduction 588
Modified estimates 300
Modified estimators 83, 306
Modified maximum likelihood equations 286
Modified maximum likelihood estimators (MMLE) 283, 285, 287
Modified mean filter 613
Modified moment 283
Modified moment estimators (MME) 287, 288, 291, 293, 297, 306, 311
Modified nearly best linear estimator (MNBLE) 358
Modulo N addition 532
Modulo N operation 532
Moment 132, 284
Moment estimating equations 293
Moment estimators 115
Moment generating function 61, 64, 130, 135
Moment matrix 536
Monotone property 256
Monte Carlo percentage points 465
Monte Carlo process 148
Monte Carlo sampling 465
Monte Carlo simulation 167, 168, 205-207, 397, 488, 505
Monte Carlo study 492
Monte-Carlo 387
Monte-Carlo methods 389, 454
Morphological filters 628
Morphology 628
mOS detector 659, 660
MOS detector 659, 660
Mosteller's theorem 194, 203
Motion analysis 634
Moving median 616
Moving window 606
mse 439
Multi-level median 569
Multinomial distribution 401, 402, 426
Multinomial expansion 388
Multiple correlation coefficient 517, 518
Multiple linear regression 388
Multiply type-II 315
Multiply type-II censored samples 317, 324, 327, 331
Multiply type-II censoring 319, 326
Multistage median filters 617
Multivariate experiments 477
Multivariate hypergeometric distribution 446
Multivariate normal distribution 187
Multivariate normal 515, 516, 519, 521
Multivariate samples 230
Multivariate statistics 215
MX-OS detector 663
N(0,1) 328, 329
N(μ) 395
N(μ, σ²) 397
Nearly best linear estimator (NBLE) 358
Nearly best nearly unbiased linear estimator 160, 166, 179
Nearly best unbiased linear estimator 160, 164, 166
Newton-Raphson iteration 319, 442
Newton's method 148
Neyman allocation 340
Non-Gaussian 525
Non-Gaussian environments 580
Non-informative prior 390, 391
Non-linear estimators 161, 166, 171, 179, 180, 203
Non-normal 221
Non-singular matrix 239
Noncentral χ² 388
Noncentral chi-square distribution 273
Noncentral t distribution 203, 204, 207
Nonexceedance probability 223
Noninformative prior 20, 382, 384, 444, 445
Nonlinear filter design 638
Nonlinear filters 525, 542, 555
Nonlinear image filtering 605
Nonlinear prediction 448
Nonlinear smoother 559
Nonlinear time series 525
Nonparametric 444
Nonparametric confidence intervals 223
Nonparametric estimation 220, 226
Nonparametric estimator 338
Nonparametric inference 223, 379
Nonparametric prediction intervals 433
Nonparametric regression 216
Normal 349, 350, 354, 363, 387, 444, 445, 478, 482, 483, 499, 502, 515, 517, 518
Normal approximation 503
Normal density 225
Normal distribution 62, 127, 128, 131-133, 159, 160, 163, 164, 166-170, 174, 201, 207, 217-221, 223, 226, 227, 246, 247, 269, 297, 340, 342, 446, 456, 460, 461, 463, 475, 477, 480, 484, 486, 488, 491, 492, 498, 500, 504, 505, 507
Normal equations 242
Normal linear regression 383
Normal model 477, 479
Normal population 341
Normal probability plot 479
Normal random variable 388
Normal standard deviation 223
Normal variance 347
Normal-gamma prior 384
Normal-gamma conjugate prior 385
Normality 461, 462, 467, 481, 486
Normalized spacings 24, 45, 488
Normally 288, 388
Normally distributed 217, 454
Nuisance parameter 16
Null distribution 454, 477, 478, 487
Null hypothesis 203, 271, 278, 333, 404, 409, 475, 478, 480, 643
Null model 476
Numerical integration 63, 73, 134
Omnibus 475
Omnibus test 481
One sided interval 441
Open-close 632
Operating characteristic (OC) curve 504, 507, 508
Optimal adaptive techniques 590
Optimal kernel estimator 226
Optimal L-estimates 216
Optimal linear inference 183
Optimal spacing 184, 190, 191, 195-198, 200-203, 245
Optimum regression quantiles 246, 247
Optimum sample quantiles 237
Optimum spacings 248-251
Order restricted inference 3
Order statistic (OS) detectors 653, 660, 667
Order statistics (OS) filters 605, 615, 619, 621, 623-628, 630, 636
Ordering relation 540
Ordinary least squares 464
Ordinary least squares criterion 231
Orthogonal contrasts 491
Orthogonal polynomial 490
OS based tests 668
OS-CFAR test 653, 656
Oscillation 566
Outliers 163, 222, 226, 227, 251, 379, 380, 385, 387, 388, 390, 391, 431, 446, 447, 479, 497, 529, 530, 535, 555, 556, 562, 580, 583, 627
Outlying observations 476
Outlying values 219
Overlapping samples 611
Ozturk and Korukoglu statistic 488
(p + 1)-variate Gaussian distribution 241
P-value 380, 382, 387
p-values 108, 111, 114, 115, 482, 485, 488, 491, 492
Pareto distribution 26, 164, 202
Partial RSS 341, 342, 348, 352, 353, 356, 361, 369
Partial correlation 516, 517
Partial order 15
Patients 310
Pearson's test 205
Percentage points 163, 441-443, 487
Percentiles 483-485, 487, 489-491, 503
Permutation filter 540, 541, 546, 581, 627
Permutation filter approach 545
Permutation filter lattice 540
Permutation lattice filters 543
Permutation space 529
Permutation weighted order statistic filter lattice 587, 588
Permutation weighted order-statistics (PWOS) filters 559, 580, 582-584, 586-591, 594, 597-601
Personal wireless communication systems 548
Photograph 604
Physical negative image 604
Physicochemical phenomena 159
Piecewise constant (PICO) regression 637
Pitman asymptotic relative efficiency 204
Pivotal quantity 501
Pivotal statistics 328
Pivotals 431, 440, 442, 446
Pixel replication 608
Plotting positions 454
Point prediction 431, 432, 434
Poisson regression 385, 388, 392
Poisson variates 388
Population growth 159
Poset 540
Positive definite 216, 252
Positive definite matrix 173, 274
Positive slide 604
Positive-rule shrinkage trimmed L-estimator 272
Posterior densities 381, 397
Posterior distribution 434
Posterior mean 388
Posteriori estimator 564
Powell's hybrid algorithm 148, 170
Power 204, 206, 218, 402-409, 413, 414, 417-420, 422, 426, 464, 467, 475, 481, 482, 484, 486, 489, 490, 492
Power function distribution 26, 164
Prediction 431-435, 448
Prediction error 438
Prediction limits 443
Prediction region 432, 440, 444
Predictive density 434
Predictive distribution 380, 382, 385, 444-446
Predictive intervals 431, 434, 441-444, 446-448
Predictive likelihood function (PLF) 439
Predictive maximum likelihood estimator (PMLE) 439
Predictive probability 380, 382, 385, 395
Preliminary estimator 237, 238
Preliminary test 271, 277
Preliminary test estimator 278
Preliminary test trimmed L-estimator 272
Prior density 380, 381
Prior distribution 386
Prior information 395
Prior mean 387
Probability detection 648, 649
Probability filters 542
Probability integral transform 444
Probability of concentration (PC) 438
Probability of false alarm 648, 661
Probability of nearness (PN) 438
Probability paper 454, 460, 477, 507
Probability plot 453, 454, 460, 476, 478, 480
Probability transform method 108
Probability-plot technique 461
Probability-weighted moment estimator of σ 487
Producer's risk 498, 502
Product moment correlation 517
Product moments 63, 72, 73, 75, 76, 85, 86, 88, 91, 92, 102, 103, 117, 119, 124, 132, 134, 137
Progressively censored sample 285, 287
Progressively censored survival data 310
Projection operator 594
Proper prior 387, 389
Proper prior distribution 389
PRSS 362, 367, 369, 370, 373
PRSTLE 272, 273
Pseudonoise (PN) 545
Psi function (see Digamma function)
PSTLE 278
PTTLE 272, 273, 277
PWOS filter lattices 586
PWOS filter structure 587
PWOS[j] filters 586-588
Q-Q plots 79, 81, 150-153
Quadratic estimator 238, 241
Quadratic form 203, 207, 242
Quadratic loss function 274
Quadratic programming 657
Quadruple moments 25, 26, 36, 43, 45, 52, 55
Quantile function 185, 191, 194, 196, 202, 203, 217, 222-225
Quantile process 457
Quantiles 108, 183, 184, 190, 191, 193, 201, 215, 219, 221-224, 226, 228, 229, 232, 238-240, 328, 330-333, 344, 431, 438, 439, 444
Quartiles 431
Quasi-order 15
Quasi-ordered set 15
Quick estimators 358
R-estimator 271
Radar data 604
Radar target detection 643
Random censoring 3, 6, 8, 14
Random error 476
Random processes 525
Range 218, 222, 431, 611
Rank covariance 230
Rank ordering 638
Rank permutation 535
Rank-order (RO) filters 555, 606, 616, 628
Ranked set sampling 337
Ranking 338
Ranks 187, 231, 271, 278, 456
Ransacked 391
Ransacked data 386
Ransacking 387
Rao-Cramér lower bound (RCLB) 359, 360, 362, 364, 365, 367-369
Rayleigh 643, 647
Rayleigh clutter 647, 665, 666, 668
Rayleigh distributed 652
Rayleigh distribution 203
Re-weighted least squares techniques 238
Reconstruction 603
Recurrence relations 26-38, 40, 41, 43, 48, 50, 52, 55, 65, 66, 76, 85, 86, 88, 90-92, 96, 100, 102, 117, 119-124, 136
Recursive computational algorithm 103, 104, 117, 125
Reflexive 15
Region 434
Regression 1st quantile 246
Regression 3rd quantile 246
Regression 475, 476, 481, 488, 489
Regression analysis 206
Regression coefficients 160, 161, 178, 231, 232
Regression equation 454
Regression estimator 207, 455
Regression line 486, 522
Regression median 246, 247
Regression model 161, 178, 231, 232, 247, 251, 269
Regression parameter 165, 166, 237, 238, 241, 242, 250, 251, 271, 278, 435
Regression quantiles 232, 237-241, 246, 248, 249-251, 257
Regression tests 481, 484, 485
Regression type tests 480
Rejectable quality level (RQL) 498
Relative efficiency 128, 149, 150, 468
Relative maximum likelihood 79
Relative precision 338
Reliability 350, 503
Reliability procedure 656
Reliability theory 3, 6, 163
Remission time 310
Residual errors 240
Residual mean survival time 327
Residual sum of squares 161
Residual-based estimators 174, 175
Residuals 231, 465
Restoration 603
Restricted least squares method 14
Restricted maximum likelihood estimate 17, 18
Restricted trimmed L-estimator 271
Reverse J shaped 287
Right censored samples 285
Right truncation 116
Right-censored 456-458
Right-censored data 460
Right-truncated exponential distribution 25, 26, 45, 47, 48, 55
Robust 218, 238, 241, 497, 535, 555
Robust DPCM coding 548
Robust estimates 220
Robust estimation 219, 271, 500
Robust estimator 215, 219, 241, 251
Robust L-estimates 221
Robust methods 237
Robust statistics 216
Robust techniques 603
Robust two-sided variable sampling plans 500
Robustness 219, 221, 222, 226, 230, 499, 509, 525
Robustness criteria 621
Robustness studies 128, 132
Root signals 611
Rounding errors 26
Royston's approximation 483
RP 356, 357
RQ's 261
RSS 338-348, 350, 352, 353, 355, 356-359, 363-368, 370-373
RTLE 272
Running 526
Running L estimator 526
Running L-filter 526
Running median 526
Running median filter 555, 559
Sample median 360
Sampling plans 497, 498, 500, 504, 508, 509
Scale 454, 462, 501, 526
Scale and location invariant 467
Scale distribution 243
Scale estimators 215, 222
Scale model 241
Scale parameter 3, 8, 20, 26, 43, 56, 63, 76, 86, 104, 105, 128, 143, 144, 147, 150, 152, 159-161, 164-166, 168, 170, 177-179, 183, 186, 190, 191, 199, 200, 205, 207, 216, 217, 227, 241, 242, 250, 251, 287, 288, 292, 317, 341, 358, 364, 368, 370, 439, 443, 446, 447, 453, 475, 481, 502, 503, 661, 664
Scale-location parameter family 488
Scaling parameter 480
Scatter diagrams 240
Scatter plot 246, 247
Score statistic 322
Sea clutter in radar 525
Second-order moments 125, 525
Selection and estimation (SE) test 655
Selection and ranking procedure 656
Selection type filters 555, 556
Separable median filter 612, 613
Shape parameter 63, 91, 103, 105, 111, 113-115, 117, 124, 125, 128, 129, 131, 143, 147, 150, 152, 159, 166, 199, 217, 227, 287, 288, 292, 293, 297, 306, 317, 370, 439, 445, 447, 453, 477, 485, 661, 663
Shapiro-Francia procedure 482
Shapiro-Francia test 462, 464
Shapiro-Francia W' 463
Shapiro-Wilk statistic 160
Shapiro-Wilk test 462, 464, 467, 485
Shapiro-Wilk W 463, 483
Shift model 448
Shrinkage estimation 272
Shrinkage estimators 278
Shrinkage factor 277
Shrinkage trimmed L-estimator 272
Signal processing 525
Significance test 380, 386, 387
Significance level 385, 392, 393, 405-408, 413-420, 422, 424-426
Simple least squares 463
Simple linear unbiased estimator (SLUE) 179
Simple loop order 16
Simple order 15
Simple random sampling (SRS) 337-339, 345-347, 349-351, 353-355, 358, 360-364, 366, 367, 369, 370
Simple regression model 245
Simple tree order 16
Simplified L-estimation 241
Simplified linear estimator 164
Simplified linear estimates 216
Simplified quantile estimators 238
Simulation 404, 487
Simulation study 111, 161, 284, 306
Single moments 25-28, 43, 45, 48, 63, 65, 85, 86, 88, 90, 91, 117, 133, 135-137
Singly 285
Singly censored 286
Singular value decomposition 537
Skewed distributions 283, 297
Skewness 661
Skew-symmetric matrix 77
Slippage 405, 406, 409, 426
Slippage alternative 403
Slope of the regression 490
Slope parameters 232
Smallest lifetime 431
Smooth kernel function 225
Smooth L-estimator 222, 232
Smooth weight function 217
Smoothing 525, 555
Snellen chart 513
Sociology 401
Space signal processing 609
Spacing 187, 191-194, 196, 197, 201, 204, 205, 241, 246, 322
Spacing tests 489, 491
Spacings 240, 442, 475, 487, 488, 490, 491
Spatial ordering 638
Speech signal filtering 606
Speech waveforms 525
Spherically invariant random process (SIRP) 667
Sports injury data 514
Squared loss function 21
Stack filters 628
Stacking property 569, 628
Stacks 569
Standard deviation 82, 174, 201, 218, 226, 446, 462, 491, 497, 498, 500, 502, 521, 620
Standard error 82, 108, 109, 111, 112, 114, 152, 357
Standard errors 522
Standard normal 491
Standard normal density 462
Standard normal distribution 292, 489
Standard normal population 344
Standard normal variate 300
Standard normally 492
Statistical software package 477
Steepest descent 591, 594
Steepest descent algorithm 198
Steepest descent optimization algorithm 596
Stein-type estimators 237, 272
Stellar velocity 223
Stereo vision 634
Stirling's formula 652
STLE 272, 273, 277, 278
Stopping rules 402
Stratified sampling 340
Streak 614
Streaking 614
Streaking artifacts 614
Student's t statistics 218
Sufficient statistic 358
Sum of squares of residuals 454
Superposition property 628
Survival analysis 3, 6, 159
Survival function 7, 9, 191
Survival studies 283, 285
Survival time 327, 460
Swerling target models 652
Symmetric 243
Symmetric censoring 179
Symmetric continuous distribution 127, 129, 204
Symmetric distributions 243
Symmetric L-estimators 221
Symmetric spacing 190, 201, 202, 204
Symmetric trimmed mean 526
Symmetrical censoring 500
Symmetrical type II censoring 500
t distribution 203, 204
Target analyses 45
Taylor expansion 324
Telephone systems 548
Television signal 604
Test of homogeneity 401
Test statistic 272
Tests of fit 453
Tests of hypotheses 56, 184
Tests of normality 464
Tests of significance 203
Textile strength 218
Third standard moments 288, 293
Thread strength in textiles 184
Threshold 327, 396
Threshold decomposition 567-570, 579, 591, 628
Threshold decomposition property 616
Threshold decomposition signal 569, 577
Threshold parameter 8, 283, 287, 288, 292
Time series analysis 526, 544
Time censored sampling plans 505
Time-series 525, 528, 555, 562
TLSE 252, 261, 269, 272
Tolerance bounds 503
Tomographic images 604
Total test time 508
Total time on test 12, 19
Total variance 190
Total vision impairment 513
Totally positive of order 2 (TP2) 166
Trace 190, 461
Transitive 15
Translated exponential 386, 392
Tri-diagonal matrix 188
Trigamma function 62, 63
Trimean 221
Trimmed 237
Trimmed estimation 271
Trimmed estimators 272
Trimmed L-estimator 237
Trimmed least squared estimation 237, 251
Trimmed least squares 278
Trimmed mean 219, 221, 223, 231, 232, 526
Trimmed mean filter 526, 624
Trimmed mean test 653
Trimmed standard deviation 223
Trimming 220
Trimming method 556
Trimming statistic 573
Triple moments 25, 26, 31, 35, 43, 45, 50, 52
Truncated exponential distribution 117
Truncated logistic distribution 26, 47
Truncated log-logistic distribution 117
Truncated Volterra series expansion 542
Truncation parameters 125
Two-sided intervals 441
Two-tails test 206
Type I censoring 3, 6, 8, 20-22, 285
Type I generalized logistic distribution 127, 128
Type II censored 665
Type II censoring 3, 6-8, 11-13, 17, 18, 20, 21, 63, 160, 285, 315, 316, 322
Type II generalized logistic distribution 127, 128
Type III generalized logistic distribution 127-131
Type IV generalized logistic distribution 127
Type V generalized logistic distribution 87
Type-I errors 112
Type-II right censoring 26, 44, 56, 63, 82, 86, 103, 105, 108, 143, 147, 150, 152
Type-II censored sample 319, 431
U-statistic 224, 228
UMP test 644
Unbiased 285, 316, 317, 346, 347
Unbiased confidence bounds 502
Unbiased estimate 114, 167
Unbiased estimator 216, 218, 228, 317, 340, 342, 347, 354, 355, 357, 359, 361, 363, 365, 366, 368, 500
Unbiased nearly best linear estimates 217
Unbiased predictor 437, 439
Unbiasedness 352, 369, 437
Uncertain prior information 271
Unconditional predictive discordancy (UPD) 387, 395
Unidirectional multistage filters 618
Unidirectional multistage max/median filter 618
Unidirectional multistage median filter 618
Unequal allocation schemes 340
Uniform 338, 443, 467, 563
Uniform distribution 108, 159, 161, 164, 166, 167, 174, 184, 324, 439, 456, 459, 464, 467
Uniform spacing 201, 202
Uniform test 468
Uniformly minimum variance unbiased estimator (UMVUE) 163, 351, 353, 354, 357, 358, 644, 645, 653
Unique minimum variance linear unbiased estimators (UMVLUE) 370, 372, 374
Unrestricted driver's license 513
Unrestricted trimmed L-estimator 271
Upper bound 541
Upper confidence limit 310
Upper quartile 225
Upper tail significance 466
UTLE 277
Variable trimmed mean (VTM) 655
Variable sampling plans 497-502
Variable sampling 497
Variance inequality 343, 344, 363, 367
Variance-covariance matrix
Variance-covariance 305
Variance-covariance factors 306-309
Variances 5, 26, 44-47, 55, 62-64, 66, 77-80, 86, 91, 98, 102-104, 111-115, 117, 125, 128, 130, 135-138, 144, 148, 149, 160, 162, 159, 177, 189, 191, 192, 195, 204, 206, 216, 217, 238, 243, 269, 284, 285, 287, 288, 293, 297, 300, 316, 318, 332, 337, 340, 341, 345, 346, 352, 355, 359, 361, 365, 367-370, 372-374, 387, 401, 435, 445, 454, 455, 459-462, 465, 500, 515, 518, 561, 563, 573, 615, 666
Video signals 525
Visual acuity 513
Visual acuity measurements 521
Visual acuity loss 520
Voltage stress 56
Volterra kernel 542, 543
Volterra non-linear filters 542
VTM detector 655, 656
W² 467, 468
W statistic 163
W test 481, 586
Warranty time 327
Watson U²-statistic 465, 469
Weak superposition property 569
Weibull 289, 290, 305, 312, 485-488, 492, 502, 643, 647
Weibull clutter 665, 668
Weibull distributed 505, 506
Weibull distribution 79, 159, 199, 200, 283, 286, 287, 297, 300, 306, 308, 311, 317, 323, 326, 340, 341, 370, 439, 445, 477, 491, 500-502, 660
Weibull family 447, 661
Weibull test 492
Weighted distance measure 575
Weighted distance operator 574
Weighted empirical approach 252
Weighted empirical processes 240, 252
Weighted Euclidean distance 15
Weighted least squares approach 161, 162, 165
Weighted least squares objective function 171
Weighted majority with minimum range (WMMR) filter 624, 635
Weighted mean (WM) filter 549, 551, 557, 570, 574, 575-577
Weighted median 556, 599
Weighted order statistic (WOS) filter 557, 570, 576-583, 586, 590-592, 594, 598, 601
Weighted residual sum of squares 166
Weighted spacings 476
Weighted sum filters 558
Weighted sum type filters 555
Weight function 217, 218, 225, 232
Weisberg-Bingham procedure 482
White noise 563
Wiener approach 536
Wilson-Hilferty transformation 328
Window set 607
Winsorized mean 221, 653
Winsorized mean of the residuals 231
Without replacement 224
Worst vision among patients 514
Worst vision following surgery 514
Handbook of Statistics
Contents of Previous Volumes
Volume 1. Analysis of Variance
Edited by P. R. Krishnaiah
1980 xviii + 1002 pp.
1. Estimation of Variance Components by C. R. Rao and J. Kleffe
2. Multivariate Analysis of Variance of Repeated Measurements by N. H. Timm
3. Growth Curve Analysis by S. Geisser
4. Bayesian Inference in MANOVA by S. J. Press
5. Graphical Methods for Internal Comparisons in ANOVA and MANOVA by R. Gnanadesikan
6. Monotonicity and Unbiasedness Properties of ANOVA and MANOVA Tests by S. Das Gupta
7. Robustness of ANOVA and MANOVA Test Procedures by P. K. Ito
8. Analysis of Variance and Problems under Time Series Models by D. R. Brillinger
9. Tests of Univariate and Multivariate Normality by K. V. Mardia
10. Transformations to Normality by G. Kaskey, B. Kolman, P. R. Krishnaiah and L. Steinberg
11. ANOVA and MANOVA: Models for Categorical Data by V. P. Bhapkar
12. Inference and the Structural Model for ANOVA and MANOVA by D. A. S. Fraser
13. Inference Based on Conditionally Specified ANOVA Models Incorporating Preliminary Testing by T. A. Bancroft and C.-P. Han
14. Quadratic Forms in Normal Variables by C. G. Khatri
15. Generalized Inverse of Matrices and Applications to Linear Models by S. K. Mitra
16. Likelihood Ratio Tests for Mean Vectors and Covariance Matrices by P. R. Krishnaiah and J. C. Lee
17. Assessing Dimensionality in Multivariate Regression by A. J. Izenman
18. Parameter Estimation in Nonlinear Regression Models by H. Bunke
19. Early History of Multiple Comparison Tests by H. L. Harter
20. Representations of Simultaneous Pairwise Comparisons by A. R. Sampson
21. Simultaneous Test Procedures for Mean Vectors and Covariance Matrices by P. R. Krishnaiah, G. S. Mudholkar and P. Subbiah
22. Nonparametric Simultaneous Inference for Some MANOVA Models by P. K. Sen
23. Comparison of Some Computer Programs for Univariate and Multivariate Analysis of Variance by R. D. Bock and D. Brandt
24. Computations of Some Multivariate Distributions by P. R. Krishnaiah
25. Inference on the Structure of Interaction in Two-Way Classification Model by P. R. Krishnaiah and M. Yochmowitz
Volume 2. Classification, Pattern Recognition and Reduction of Dimensionality
Edited by P. R. Krishnaiah and L. N. Kanal
1982 xxii + 903 pp.
1. Discriminant Analysis for Time Series by R. H. Shumway
2. Optimum Rules for Classification into Two Multivariate Normal Populations with the Same Covariance Matrix by S. Das Gupta
3. Large Sample Approximations and Asymptotic Expansions of Classification Statistics by M. Siotani
4. Bayesian Discrimination by S. Geisser
5. Classification of Growth Curves by J. C. Lee
6. Nonparametric Classification by J. D. Broffitt
7. Logistic Discrimination by J. A. Anderson
8. Nearest Neighbor Methods in Discrimination by L. Devroye and T. J. Wagner
9. The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis by G. J. McLachlan
10. Graphical Techniques for Multivariate Data and for Clustering by J. M. Chambers and B. Kleiner
11. Cluster Analysis Software by R. K. Blashfield, M. S. Aldenderfer and L. C. Morey
12. Single-link Clustering Algorithms by F. J. Rohlf
13. Theory of Multidimensional Scaling by J. de Leeuw and W. Heiser
14. Multidimensional Scaling and its Application by M. Wish and J. D. Carroll
15. Intrinsic Dimensionality Extraction by K. Fukunaga
16. Structural Methods in Image Analysis and Recognition by L. N. Kanal, B. A. Lambird and D. Lavine
17. Image Models by N. Ahuja and A. Rosenfeld
18. Image Texture Survey by R. M. Haralick
19. Applications of Stochastic Languages by K. S. Fu
20. A Unifying Viewpoint on Pattern Recognition by J. C. Simon, E. Backer and J. Sallentin
21. Logical Functions in the Problems of Empirical Prediction by G. S. Lbov
22. Inference and Data Tables and Missing Values by N. G. Zagoruiko and V. N. Yolkina
23. Recognition of Electrocardiographic Patterns by J. H. van Bemmel
24. Waveform Parsing Systems by G. C. Stockman
25. Continuous Speech Recognition: Statistical Methods by F. Jelinek, R. L. Mercer and L. R. Bahl
26. Applications of Pattern Recognition in Radar by A. A. Grometstein and W. H. Schoendorf
27. White Blood Cell Recognition by E. S. Gelsema and G. H. Landweerd
28. Pattern Recognition Techniques for Remote Sensing Applications by P. H. Swain
29. Optical Character Recognition - Theory and Practice by G. Nagy
30. Computer and Statistical Considerations for Oil Spill Identification by Y. T. Chinen and T. J. Killeen
31. Pattern Recognition in Chemistry by B. R. Kowalski and S. Wold
32. Covariance Matrix Representation and Object-Predicate Symmetry by T. Kaminuma, S. Tomita and S. Watanabe
33. Multivariate Morphometrics by R. A. Reyment
34. Multivariate Analysis with Latent Variables by P. M. Bentler and D. G. Weeks
35. Use of Distance Measures, Information Measures and Error Bounds in Feature Evaluation by M. Ben-Bassat
36. Topics in Measurement Selection by J. M. Van Campenhout
37. Selection of Variables Under Univariate Regression Models by P. R. Krishnaiah
38. On the Selection of Variables Under Regression Models Using Krishnaiah's Finite Intersection Tests by J. L. Schmidhammer
39. Dimensionality and Sample Size Considerations in Pattern Recognition Practice by A. K. Jain and B. Chandrasekaran
40. Selecting Variables in Discriminant Analysis for Improving upon Classical Procedures by W. Schaafsma
41. Selection of Variables in Discriminant Analysis by P. R. Krishnaiah
Volume 3. Time Series in the Frequency Domain
Edited by D. R. Brillinger and P. R. Krishnaiah
1983 xiv + 485 pp.
1. Wiener Filtering (with emphasis on frequency-domain approaches) by R. J. Bhansali and D. Karavellas
2. The Finite Fourier Transform of a Stationary Process by D. R. Brillinger
3. Seasonal and Calendar Adjustment by W. S. Cleveland
4. Optimal Inference in the Frequency Domain by R. B. Davies
5. Applications of Spectral Analysis in Econometrics by C. W. J. Granger and R. Engle
6. Signal Estimation by E. J. Hannan
7. Complex Demodulation: Some Theory and Applications by T. Hasan
8. Estimating the Gain of a Linear Filter from Noisy Data by M. J. Hinich
9. A Spectral Analysis Primer by L. H. Koopmans
10. Robust-Resistant Spectral Analysis by R. D. Martin
11. Autoregressive Spectral Estimation by E. Parzen
12. Threshold Autoregression and Some Frequency-Domain Characteristics by J. Pemberton and H. Tong
13. The Frequency-Domain Approach to the Analysis of Closed-Loop Systems by M. B. Priestley
14. The Bispectral Analysis of Nonlinear Stationary Time Series with Reference to Bilinear Time-Series Models by T. Subba Rao
15. Frequency-Domain Analysis of Multidimensional Time-Series Data by E. A. Robinson
16. Review of Various Approaches to Power Spectrum Estimation by P. M. Robinson
17. Cumulants and Cumulant Spectra by M. Rosenblatt
18. Replicated Time-Series Regression: An Approach to Signal Estimation and Detection by R. H. Shumway
19. Computer Programming of Spectrum Estimation by T. Thrall
20. Likelihood Ratio Tests on Covariance Matrices and Mean Vectors of Complex Multivariate Normal Populations and their Applications in Time Series by P. R. Krishnaiah, J. C. Lee and T. C. Chang
Volume 4. Nonparametric Methods
Edited by P. R. Krishnaiah and P. K. Sen
1984 xx + 968 pp.
1. Randomization Procedures by C. B. Bell and P. K. Sen
2. Univariate and Multivariate Multisample Location and Scale Tests by V. P. Bhapkar
3. Hypothesis of Symmetry by M. Hušková
4. Measures of Dependence by K. Joag-Dev
5. Tests of Randomness against Trend or Serial Correlations by G. K. Bhattacharyya
6. Combination of Independent Tests by J. L. Folks
7. Combinatorics by L. Takács
8. Rank Statistics and Limit Theorems by M. Ghosh
9. Asymptotic Comparison of Tests - A Review by K. Singh
10. Nonparametric Methods in Two-Way Layouts by D. Quade
11. Rank Tests in Linear Models by J. N. Adichie
12. On the Use of Rank Tests and Estimates in the Linear Model by J. C. Aubuchon and T. P. Hettmansperger
13. Nonparametric Preliminary Test Inference by A. K. Md. E. Saleh and P. K. Sen
14. Paired Comparisons: Some Basic Procedures and Examples by R. A. Bradley
15. Restricted Alternatives by S. K. Chatterjee
16. Adaptive Methods by M. Hušková
17. Order Statistics by J. Galambos
18. Induced Order Statistics: Theory and Applications by P. K. Bhattacharya
19. Empirical Distribution Function by E. Csáki
20. Invariance Principles for Empirical Processes by M. Csörgő
21. M-, L- and R-estimators by J. Jurečková
22. Nonparametric Sequential Estimation by P. K. Sen
23. Stochastic Approximation by V. Dupač
24. Density Estimation by P. Révész
25. Censored Data by A. P. Basu
26. Tests for Exponentiality by K. A. Doksum and B. S. Yandell
27. Nonparametric Concepts and Methods in Reliability by M. Hollander and F. Proschan
28. Sequential Nonparametric Tests by U. Müller-Funk
29. Nonparametric Procedures for some Miscellaneous Problems by P. K. Sen
30. Minimum Distance Procedures by R. Beran
31. Nonparametric Methods in Directional Data Analysis by S. R. Jammalamadaka
32. Application of Nonparametric Statistics to Cancer Data by H. S. Wieand
33. Nonparametric Frequentist Proposals for Monitoring Comparative Survival Studies by M. Gail
34. Meteorological Applications of Permutation Techniques based on Distance Functions by P. W. Mielke, Jr.
35. Categorical Data Problems Using Information Theoretic Approach by S. Kullback and J. C. Keegel
36. Tables for Order Statistics by P. R. Krishnaiah and P. K. Sen
37. Selected Tables for Nonparametric Statistics by P. K. Sen and P. R. Krishnaiah
Volume 5. Time Series in the Time Domain
Edited by E. J. Hannan, P. R. Krishnaiah and M. M. Rao
1985 xiv + 490 pp.
1. Nonstationary Autoregressive Time Series by W. A. Fuller
2. Non-Linear Time Series Models and Dynamical Systems by T. Ozaki
3. Autoregressive Moving Average Models, Intervention Problems and Outlier Detection in Time Series by G. C. Tiao
4. Robustness in Time Series and Estimating ARMA Models by R. D. Martin and V. J. Yohai
5. Time Series Analysis with Unequally Spaced Data by R. H. Jones
6. Various Model Selection Techniques in Time Series Analysis by R. Shibata
7. Estimation of Parameters in Dynamical Systems by L. Ljung
8. Recursive Identification, Estimation and Control by P. Young
9. General Structure and Parametrization of ARMA and State-Space Systems and its Relation to Statistical Problems by M. Deistler
10. Harmonizable, Cramér, and Karhunen Classes of Processes by M. M. Rao
11. On Non-Stationary Time Series by C. S. K. Bhagavan
12. Harmonizable Filtering and Sampling of Time Series by D. K. Chang
13. Sampling Designs for Time Series by S. Cambanis
14. Measuring Attenuation by M. A. Cameron and P. J. Thomson
15. Speech Recognition Using LPC Distance Measures by P. J. Thomson and P. de Souza
16. Varying Coefficient Regression by D. F. Nicholls and A. R. Pagan
17. Small Samples and Large Equation Systems by H. Theil and D. G. Fiebig
Volume 6. Sampling
Edited by P. R. Krishnaiah and C. R. Rao
1988 xvi + 594 pp.
1. A Brief History of Random Sampling Methods by D. R. Bellhouse
2. A First Course in Survey Sampling by T. Dalenius
3. Optimality of Sampling Strategies by A. Chaudhuri
4. Simple Random Sampling by P. K. Pathak
5. On Single Stage Unequal Probability Sampling by V. P. Godambe and M. E. Thompson
6. Systematic Sampling by D. R. Bellhouse
7. Systematic Sampling with Illustrative Examples by M. N. Murthy and T. J. Rao
8. Sampling in Time by D. A. Binder and M. A. Hidiroglou
9. Bayesian Inference in Finite Populations by W. A. Ericson
10. Inference Based on Data from Complex Sample Designs by G. Nathan
11. Inference for Finite Population Quantiles by J. Sedransk and P. J. Smith
12. Asymptotics in Finite Population Sampling by P. K. Sen
13. The Technique of Replicated or Interpenetrating Samples by J. C. Koop
14. On the Use of Models in Sampling from Finite Populations by I. Thomsen and D. Tesfu
15. The Prediction Approach to Sampling Theory by R. M. Royall
16. Sample Survey Analysis: Analysis of Variance and Contingency Tables by D. H. Freeman, Jr.
17. Variance Estimation in Sample Surveys by J. N. K. Rao
18. Ratio and Regression Estimators by P. S. R. S. Rao
19. Role and Use of Composite Sampling and Capture-Recapture Sampling in Ecological Studies by M. T. Boswell, K. P. Burnham and G. P. Patil
20. Data-based Sampling and Model-based Estimation for Environmental Resources by G. P. Patil, G. J. Babu, R. C. Hennemuth, W. L. Meyers, M. B. Rajarshi and C. Taillie
21. On Transect Sampling to Assess Wildlife Populations and Marine Resources by F. L. Ramsey, C. E. Gates, G. P. Patil and C. Taillie
22. A Review of Current Survey Sampling Methods in Marketing Research (Telephone, Mall Intercept and Panel Surveys) by R. Velu and G. M. Naidu
23. Observational Errors in Behavioural Traits of Man and their Implications for Genetics by P. V. Sukhatme
24. Designs in Survey Sampling Avoiding Contiguous Units by A. S. Hedayat, C. R. Rao and J. Stufken
Volume 7. Quality Control and Reliability
Edited by P. R. Krishnaiah and C. R. Rao
1988 xiv + 503 pp.
1. Transformation of Western Style of Management by W. Edwards Deming
2. Software Reliability by F. B. Bastani and C. V. Ramamoorthy
3. Stress-Strength Models for Reliability by R. A. Johnson
4. Approximate Computation of Power Generating System Reliability Indexes by M. Mazumdar
5. Software Reliability Models by T. A. Mazzuchi and N. D. Singpurwalla
6. Dependence Notions in Reliability Theory by N. R. Chaganty and K. Joag-Dev
7. Application of Goodness-of-Fit Tests in Reliability by H. W. Block and A. H. Moore
8. Multivariate Nonparametric Classes in Reliability by H. W. Block and T. H. Savits
9. Selection and Ranking Procedures in Reliability Models by S. S. Gupta and S. Panchapakesan
10. The Impact of Reliability Theory on Some Branches of Mathematics and Statistics by P. J. Boland and F. Proschan
11. Reliability Ideas and Applications in Economics and Social Sciences by M. C. Bhattacharjee
12. Mean Residual Life: Theory and Applications by F. Guess and F. Proschan
13. Life Distribution Models and Incomplete Data by R. E. Barlow and F. Proschan
14. Piecewise Geometric Estimation of a Survival Function by G. M. Mimmack and F. Proschan
15. Applications of Pattern Recognition in Failure Diagnosis and Quality Control by L. F. Pau
16. Nonparametric Estimation of Density and Hazard Rate Functions when Samples are Censored by W. J. Padgett
17. Multivariate Process Control by F. B. Alt and N. D. Smith
18. QMP/USP - A Modern Approach to Statistical Quality Auditing by B. Hoadley
19. Review About Estimation of Change Points by P. R. Krishnaiah and B. Q. Miao
20. Nonparametric Methods for Changepoint Problems by M. Csörgő and L. Horváth
21. Optimal Allocation of Multistate Components by E. El-Neweihi, F. Proschan and J. Sethuraman
22. Weibull, Log-Weibull and Gamma Order Statistics by H. L. Harter
23. Multivariate Exponential Distributions and their Applications in Reliability by A. P. Basu
24. Recent Developments in the Inverse Gaussian Distribution by S. Iyengar and G. Patwardhan
Volume 8. Statistical Methods in Biological and Medical Sciences
Edited by C. R. Rao and R. Chakraborty
1991 xvi + 554 pp.
1. Methods for the Inheritance of Qualitative Traits by J. Rice, R. Neuman and S. O. Moldin
2. Ascertainment Biases and their Resolution in Biological Surveys by W. J. Ewens
3. Statistical Considerations in Applications of Path Analysis in Genetic Epidemiology by D. C. Rao
4. Statistical Methods for Linkage Analysis by G. M. Lathrop and J. M. Lalouel
5. Statistical Design and Analysis of Epidemiologic Studies: Some Directions of Current Research by N. Breslow
6. Robust Classification Procedures and Their Applications to Anthropometry by N. Balakrishnan and R. S. Ambagaspitiya
7. Analysis of Population Structure: A Comparative Analysis of Different Estimators of Wright's Fixation Indices by R. Chakraborty and H. Danker-Hopfe
8. Estimation of Relationships from Genetic Data by E. A. Thompson
9. Measurement of Genetic Variation for Evolutionary Studies by R. Chakraborty and C. R. Rao
10. Statistical Methods for Phylogenetic Tree Reconstruction by N. Saitou
11. Statistical Models for Sex-Ratio Evolution by S. Lessard
12. Stochastic Models of Carcinogenesis by S. H. Moolgavkar
13. An Application of Score Methodology: Confidence Intervals and Tests of Fit for One-Hit-Curves by J. J. Gart
14. Kidney-Survival Analysis of IgA Nephropathy Patients: A Case Study by O. J. W. F. Kardaun
15. Confidence Bands and the Relation with Decision Analysis: Theory by O. J. W. F. Kardaun
16. Sample Size Determination in Clinical Research by J. Bock and H. Toutenburg
Volume 9. Computational Statistics
Edited by C. R. Rao
1993 xix + 1045 pp.
1. Algorithms by B. Kalyanasundaram
2. Steady State Analysis of Stochastic Systems by K. Kant
3. Parallel Computer Architectures by R. Krishnamurti and B. Narahari
4. Database Systems by S. Lanka and S. Pal
5. Programming Languages and Systems by S. Purushothaman and J. Seaman
6. Algorithms and Complexity for Markov Processes by R. Varadarajan
7. Mathematical Programming: A Computational Perspective by W. W. Hager, R. Horst and P. M. Pardalos
8. Integer Programming by P. M. Pardalos and Y. Li
9. Numerical Aspects of Solving Linear Least Squares Problems by J. L. Barlow
10. The Total Least Squares Problem by S. Van Huffel and H. Zha
11. Construction of Reliable Maximum-Likelihood-Algorithms with Applications to Logistic and Cox Regression by D. Böhning
12. Nonparametric Function Estimation by T. Gasser, J. Engel and B. Seifert
13. Computation Using the QR Decomposition by C. R. Goodall
14. The EM Algorithm by N. Laird
15. Analysis of Ordered Categorical Data through Appropriate Scaling by C. R. Rao and P. M. Caligiuri
16. Statistical Applications of Artificial Intelligence by W. A. Gale, D. J. Hand and A. E. Kelly
17. Some Aspects of Natural Language Processes by A. K. Joshi
18. Gibbs Sampling by S. F. Arnold
19. Bootstrap Methodology by G. J. Babu and C. R. Rao
20. The Art of Computer Generation of Random Variables by M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie
21. Jackknife Variance Estimation and Bias Reduction by S. Das Peddada
22. Designing Effective Statistical Graphs by D. A. Burn
23. Graphical Methods for Linear Models by A. S. Hadi
24. Graphics for Time Series Analysis by H. J. Newton
25. Graphics as Visual Language by T. Selker and A. Appel
26. Statistical Graphics and Visualization by E. J. Wegman and D. B. Carr
27. Multivariate Statistical Visualization by F. W. Young, R. A. Faldowski and M. M. McFarlane
28. Graphical Methods for Process Control by T. L. Ziemer
Volume 10. Signal Processing and its Applications
Edited by N. K. Bose and C. R. Rao
1993 xvii + 992 pp.
1. Signal Processing for Linear Instrumental Systems with Noise: A General Theory with Illustrations for Optical Imaging and Light Scattering Problems by M. Bertero and E. R. Pike
2. Boundary Implication Rights in Parameter Space by N. K. Bose
3. Sampling of Bandlimited Signals: Fundamental Results and Some Extensions by J. L. Brown, Jr.
4. Localization of Sources in a Sector: Algorithms and Statistical Analysis by K. Buckley and X.-L. Xu
5. The Signal Subspace Direction-of-Arrival Algorithm by J. A. Cadzow
6. Digital Differentiators by S. C. Dutta Roy and B. Kumar
7. Orthogonal Decompositions of 2D Random Fields and their Applications for 2D Spectral Estimation by J. M. Francos
8. VLSI in Signal Processing by A. Ghouse
9. Constrained Beamforming and Adaptive Algorithms by L. C. Godara
10. Bispectral Speckle Interferometry to Reconstruct Extended Objects from Turbulence-Degraded Telescope Images by D. M. Goodman, T. W. Lawrence, E. M. Johansson and J. P. Fitch
11. Multi-Dimensional Signal Processing by K. Hirano and T. Nomura
12. On the Assessment of Visual Communication by F. O. Huck, C. L. Fales, R. Alter-Gartenberg and Z. Rahman
13. VLSI Implementations of Number Theoretic Concepts with Applications in Signal Processing by G. A. Jullien, N. M. Wigley and J. Reilly
14. Decision-level Neural Net Sensor Fusion by R. Y. Levine and T. S. Khuon
15. Statistical Algorithms for Noncausal Gauss Markov Fields by J. M. F. Moura and N. Balram
16. Subspace Methods for Directions-of-Arrival Estimation by A. Paulraj, B. Ottersten, R. Roy, A. Swindlehurst, G. Xu and T. Kailath
17. Closed Form Solution to the Estimates of Directions of Arrival Using Data from an Array of Sensors by C. R. Rao and B. Zhou
18. High-Resolution Direction Finding by S. V. Schell and W. A. Gardner
19. Multiscale Signal Processing Techniques: A Review by A. H. Tewfik, M. Kim and M. Deriche
20. Sampling Theorems and Wavelets by G. G. Walter
21. Image and Video Coding Research by J. W. Woods
22. Fast Algorithms for Structured Matrices in Signal Processing by A. E. Yagle
Volume 11. Econometrics
Edited by G. S. Maddala, C. R. Rao and H. D. Vinod
1993 xx + 783 pp.
1. Estimation from Endogenously Stratified Samples by S. R. Cosslett
2. Semiparametric and Nonparametric Estimation of Quantal Response Models by J. L. Horowitz
3. The Selection Problem in Econometrics and Statistics by C. F. Manski
4. General Nonparametric Regression Estimation and Testing in Econometrics by A. Ullah and H. D. Vinod
5. Simultaneous Microeconometric Models with Censored or Qualitative Dependent Variables by R. Blundell and R. J. Smith
6. Multivariate Tobit Models in Econometrics by L.-F. Lee
7. Estimation of Limited Dependent Variable Models under Rational Expectations by G. S. Maddala
8. Nonlinear Time Series and Macroeconometrics by W. A. Brock and S. M. Potter
9. Estimation, Inference and Forecasting of Time Series Subject to Changes in Time by J. D. Hamilton
10. Structural Time Series Models by A. C. Harvey and N. Shephard
11. Bayesian Testing and Testing Bayesians by J.-P. Florens and M. Mouchart
12. Pseudo-Likelihood Methods by C. Gourieroux and A. Monfort
13. Rao's Score Test: Recent Asymptotic Results by R. Mukerjee
14. On the Strong Consistency of M-Estimates in Linear Models under a General Discrepancy Function by Z. D. Bai, Z. J. Liu and C. R. Rao
15. Some Aspects of Generalized Method of Moments Estimation by A. Hall
16. Efficient Estimation of Models with Conditional Moment Restrictions by W. K. Newey
17. Generalized Method of Moments: Econometric Applications by M. Ogaki
18. Testing for Heteroskedasticity by A. R. Pagan and Y. Pak
19. Simulation Estimation Methods for Limited Dependent Variable Models by V. A. Hajivassiliou
20. Simulation Estimation for Panel Data Models with Limited Dependent Variable by M. P. Keane
21. A Perspective on Application of Bootstrap Methods in Econometrics by J. Jeong and G. S. Maddala
22. Stochastic Simulations for Inference in Nonlinear Errors-in-Variables Models by R. S. Mariano and B. W. Brown
23. Bootstrap Methods: Applications in Econometrics by H. D. Vinod
24. Identifying Outliers and Influential Observations in Econometric Models by S. G. Donald and G. S. Maddala
25. Statistical Aspects of Calibration in Macroeconomics by A. W. Gregory and G. W. Smith
26. Panel Data Models with Rational Expectations by K. Lahiri 27. Continuous Time Financial Models: Statistical Applications of Stochastic Processes by K. R. Sawyer
Volume 12. Environmental Statistics
Edited by G. P. Patil and C. R. Rao
1994 xix + 927 pp.
1. Environmetrics: An Emerging Science by J. S. Hunter
2. A National Center for Statistical Ecology and Environmental Statistics: A Center Without Walls by G. P. Patil
3. Replicate Measurements for Data Quality and Environmental Modeling by W. Liggett
4. Design and Analysis of Composite Sampling Procedures: A Review by G. Lovison, S. D. Gore and G. P. Patil
5. Ranked Set Sampling by G. P. Patil, A. K. Sinha and C. Taillie
6. Environmental Adaptive Sampling by G. A. F. Seber and S. K. Thompson
7. Statistical Analysis of Censored Environmental Data by M. Akritas, T. Ruscitti and G. P. Patil
8. Biological Monitoring: Statistical Issues and Models by E. P. Smith
9. Environmental Sampling and Monitoring by S. V. Stehman and W. Scott Overton
10. Ecological Statistics by B. F. J. Manly
11. Forest Biometrics by H. E. Burkhart and T. G. Gregoire
12. Ecological Diversity and Forest Management by J. H. Gove, G. P. Patil, B. F. Swindel and C. Taillie
13. Ornithological Statistics by P. M. North
14. Statistical Methods in Developmental Toxicology by P. J. Catalano and L. M. Ryan
15. Environmental Biometry: Assessing Impacts of Environmental Stimuli Via Animal and Microbial Laboratory Studies by W. W. Piegorsch
16. Stochasticity in Deterministic Models by J. J. M. Bedaux and S. A. L. M. Kooijman
17. Compartmental Models of Ecological and Environmental Systems by J. H. Matis and T. E. Wehrly
18. Environmental Remote Sensing and Geographic Information Systems-Based Modeling by W. L. Myers
19. Regression Analysis of Spatially Correlated Data: The Kanawha County Health Study by C. A. Donnelly, J. H. Ware and N. M. Laird
20. Methods for Estimating Heterogeneous Spatial Covariance Functions with Environmental Applications by P. Guttorp and P. D. Sampson
21. Meta-analysis in Environmental Statistics by V. Hasselblad
22. Statistical Methods in Atmospheric Science by A. R. Solow
23. Statistics with Agricultural Pests and Environmental Impacts by L. J. Young and J. H. Young
24. A Crystal Cube for Coastal and Estuarine Degradation: Selection of Endpoints and Development of Indices for Use in Decision Making by M. T. Boswell, J. S. O'Connor and G. P. Patil
25. How Does Scientific Information in General and Statistical Information in Particular Input to the Environmental Regulatory Process? by C. R. Cothern
26. Environmental Regulatory Statistics by C. B. Davis
27. An Overview of Statistical Issues Related to Environmental Cleanup by R. Gilbert
28. Environmental Risk Estimation and Policy Decisions by H. Lacayo Jr.
Volume 13. Design and Analysis of Experiments
Edited by S. Ghosh and C. R. Rao
1996 xviii + 1230 pp.
1. The Design and Analysis of Clinical Trials by P. Armitage
2. Clinical Trials in Drug Development: Some Statistical Issues by H. I. Patel
3. Optimal Crossover Designs by J. Stufken
4. Design and Analysis of Experiments: Nonparametric Methods with Applications to Clinical Trials by P. K. Sen
5. Adaptive Designs for Parametric Models by S. Zacks
6. Observational Studies and Nonrandomized Experiments by P. R. Rosenbaum
7. Robust Design: Experiments for Improving Quality by D. M. Steinberg
8. Analysis of Location and Dispersion Effects from Factorial Experiments with a Circular Response by C. M. Anderson
9. Computer Experiments by J. R. Koehler and A. B. Owen
10. A Critique of Some Aspects of Experimental Design by J. N. Srivastava
11. Response Surface Designs by N. R. Draper and D. K. J. Lin
12. Multiresponse Surface Methodology by A. I. Khuri
13. Sequential Assembly of Fractions in Factorial Experiments by S. Ghosh
14. Designs for Nonlinear and Generalized Linear Models by A. C. Atkinson and L. M. Haines
15. Spatial Experimental Design by R. J. Martin
16. Design of Spatial Experiments: Model Fitting and Prediction by V. V. Fedorov
17. Design of Experiments with Selection and Ranking Goals by S. S. Gupta and S. Panchapakesan
18. Multiple Comparisons by A. C. Tamhane
19. Nonparametric Methods in Design and Analysis of Experiments by E. Brunner and M. L. Puri
20. Nonparametric Analysis of Experiments by A. M. Dean and D. A. Wolfe
21. Block and Other Designs in Agriculture by D. J. Street
22. Block Designs: Their Combinatorial and Statistical Properties by T. Calinski and S. Kageyama
23. Developments in Incomplete Block Designs for Parallel Line Bioassays by S. Gupta and R. Mukerjee
24. Row-Column Designs by K. R. Shah and B. K. Sinha
25. Nested Designs by J. P. Morgan
26. Optimal Design: Exact Theory by C. S. Cheng
27. Optimal and Efficient Treatment-Control Designs by D. Majumdar
28. Model Robust Designs by Y-J. Chang and W. I. Notz
29. Review of Optimal Bayes Designs by A. DasGupta
30. Approximate Designs for Polynomial Regression: Invariance, Admissibility, and Optimality by N. Gaffke and B. Heiligers
Volume 14. Statistical Methods in Finance
Edited by G. S. Maddala and C. R. Rao
1996 xvi + 733 pp.
1. Econometric Evaluation of Asset Pricing Models by W. E. Ferson and R. Jagannathan
2. Instrumental Variables Estimation of Conditional Beta Pricing Models by C. R. Harvey and C. M. Kirby
3. Semiparametric Methods for Asset Pricing Models by B. N. Lehmann
4. Modeling the Term Structure by A. R. Pagan, A. D. Hall, and V. Martin
5. Stochastic Volatility by E. Ghysels, A. C. Harvey and E. Renault
6. Stock Price Volatility by S. F. LeRoy
7. GARCH Models of Volatility by F. C. Palm
8. Forecast Evaluation and Combination by F. X. Diebold and J. A. Lopez
9. Predictable Components in Stock Returns by G. Kaul
10. Interest Rate Spreads as Predictors of Business Cycles by K. Lahiri and J. G. Wang
11. Nonlinear Time Series, Complexity Theory, and Finance by W. A. Brock and P. J. F. de Lima
12. Count Data Models for Financial Data by A. C. Cameron and P. K. Trivedi
13. Financial Applications of Stable Distributions by J. H. McCulloch
14. Probability Distributions for Financial Models by J. B. McDonald
15. Bootstrap Based Tests in Financial Models by G. S. Maddala and H. Li
16. Principal Component and Factor Analyses by C. R. Rao
17. Errors in Variables Problems in Finance by G. S. Maddala and M. Nimalendran
18. Financial Applications of Artificial Neural Networks by M. Qi
19. Applications of Limited Dependent Variable Models in Finance by G. S. Maddala
20. Testing Option Pricing Models by D. S. Bates
21. Peso Problems: Their Theoretical and Empirical Implications by M. D. D. Evans
22. Modeling Market Microstructure Time Series by J. Hasbrouck
23. Statistical Methods in Tests of Portfolio Efficiency: A Synthesis by J. Shanken
Volume 15. Robust Inference
Edited by G. S. Maddala and C. R. Rao
1997 xviii + 698 pp.
1. Robust Inference in Multivariate Linear Regression Using Difference of Two Convex Functions as the Discrepancy Measure by Z. D. Bai, C. R. Rao and Y. H. Wu
2. Minimum Distance Estimation: The Approach Using Density-Based Distances by A. Basu, I. R. Harris and S. Basu
3. Robust Inference: The Approach Based on Influence Functions by M. Markatou and E. Ronchetti
4. Practical Applications of Bounded-Influence Tests by S. Heritier and M-P. Victoria-Feser
5. Introduction to Positive-Breakdown Methods by P. J. Rousseeuw
6. Outlier Identification and Robust Methods by U. Gather and C. Becker
7. Rank-Based Analysis of Linear Models by T. P. Hettmansperger, J. W. McKean and S. J. Sheather
8. Rank Tests for Linear Models by R. Koenker
9. Some Extensions in the Robust Estimation of Parameters of Exponential and Double Exponential Distributions in the Presence of Multiple Outliers by A. Childs and N. Balakrishnan
10. Outliers, Unit Roots and Robust Estimation of Nonstationary Time Series by G. S. Maddala and Y. Yin
11. Autocorrelation-Robust Inference by P. M. Robinson and C. Velasco
12. A Practitioner's Guide to Robust Covariance Matrix Estimation by W. J. den Haan and A. Levin
13. Approaches to the Robust Estimation of Mixed Models by A. H. Welsh and A. M. Richardson
14. Nonparametric Maximum Likelihood Methods by S. R. Cosslett
15. A Guide to Censored Quantile Regressions by B. Fitzenberger
16. What Can Be Learned About Population Parameters When the Data Are Contaminated by J. L. Horowitz and C. F. Manski
17. Asymptotic Representations and Interrelations of Robust Estimators and Their Applications by J. Jurečková and P. K. Sen
18. Small Sample Asymptotics: Applications in Robustness by C. A. Field and M. A. Tingley
19. On the Fundamentals of Data Robustness by G. Maguluri and K. Singh
20. Statistical Analysis With Incomplete Data: A Selective Review by M. G. Akritas and M. P. LaValley
21. On Contamination Level and Sensitivity of Robust Tests by J. Á. Víšek
22. Finite Sample Robustness of Tests: An Overview by T. Kariya and P. Kim
23. Future Directions by G. S. Maddala and C. R. Rao
Volume 16. Order Statistics - Theory and Methods Edited by N. Balakrishnan and C. R. Rao 1997 xix + 688 pp.
1. Order Statistics: An Introduction by N. Balakrishnan and C. R. Rao
2. Order Statistics: A Historical Perspective by H. Leon Harter and N. Balakrishnan
3. Computer Simulation of Order Statistics by Pandu R. Tadikamalla and N. Balakrishnan
4. Lorenz Ordering of Order Statistics and Record Values by Barry C. Arnold and Jose A. Villasenor
5. Stochastic Ordering of Order Statistics by Philip J. Boland, Moshe Shaked and J. George Shanthikumar
6. Bounds for Expectations of L-Estimates by Tomasz Rychlik
7. Recurrence Relations and Identities for Moments of Order Statistics by N. Balakrishnan and K. S. Sultan
8. Recent Approaches to Characterizations Based on Order Statistics and Record Values by C. R. Rao and D. N. Shanbhag
9. Characterizations of Distributions via Identically Distributed Functions of Order Statistics by Ursula Gather, Udo Kamps and Nicole Schweitzer
10. Characterizations of Distributions by Recurrence Relations and Identities for Moments of Order Statistics by Udo Kamps
11. Univariate Extreme Value Theory and Applications by Janos Galambos
12. Order Statistics: Asymptotics in Applications by Pranab Kumar Sen
13. Zero-One Laws for Large Order Statistics by R. J. Tomkins and Hong Wang
14. Some Exact Properties of Cook's D_I by D. R. Jensen and D. E. Ramirez
15. Generalized Recurrence Relations for Moments of Order Statistics from Non-Identical Pareto and Truncated Pareto Random Variables with Applications to Robustness by Aaron Childs and N. Balakrishnan
16. A Semiparametric Bootstrap for Simulating Extreme Order Statistics by Robert L. Strawderman and Daniel Zelterman
17. Approximations to Distributions of Sample Quantiles by Chunsheng Ma and John Robinson
18. Concomitants of Order Statistics by H. A. David and H. N. Nagaraja
19. A Record of Records by Valery B. Nevzorov and N. Balakrishnan
20. Weighted Sequential Empirical Type Processes with Applications to Change-Point Problems by Barbara Szyszkowicz
21. Sequential Quantile and Bahadur-Kiefer Processes by Miklós Csörgő and Barbara Szyszkowicz