PITMAN'S MEASURE OF CLOSENESS
A Comparison of Statistical Estimators
Society for Industrial and Applied Mathematics Philadelphia 1993
Library of Congress Cataloging-in-Publication Data

Keating, Jerome P.
   Pitman's measure of closeness : a comparison of statistical estimators / Jerome P. Keating, Robert L. Mason, Pranab K. Sen.
      p. cm.
   Includes bibliographical references and index.
   ISBN 0-89871-308-0
   1. Pitman's measure of closeness. I. Mason, Robert Lee, 1946- . II. Sen, Pranab Kumar, 1937- . III. Title.
   QA276.8.K43 1993
   519.5'44—dc20
   93-3059
Copyright 1993 by the Society for Industrial and Applied Mathematics. All rights reserved. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the Publisher. For information, write the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, Pennsylvania 19104-2688.
To our esteemed colleague, teacher, and mentor, C. Radhakrishna Rao on the occasion of his seventy-second birthday.
Contents

Foreword
Preface

1 Introduction
  1.1 Evolution of estimation theory
    1.1.1 Least squares
    1.1.2 Method of moments
    1.1.3 Maximum likelihood
    1.1.4 Uniformly minimum variance unbiased estimation
    1.1.5 Biased estimation
    1.1.6 Bayes and empirical Bayes
    1.1.7 Influence functions and resampling techniques
    1.1.8 Future directions
  1.2 PMC comes of age
    1.2.1 PMC: A product of controversy
    1.2.2 PMC as an intuitive criterion
  1.3 The scope of the book
    1.3.1 The history, motivation, and controversy of PMC
    1.3.2 A unified development of PMC

2 Development of Pitman's Measure of Closeness
  2.1 The intrinsic appeal of PMC
    2.1.1 Use of MSE
    2.1.2 Historical development of PMC
    2.1.3 Convenience store example
  2.2 The concept of risk
    2.2.1 Renyi's decomposition of risk
    2.2.2 How do we understand risk?
  2.3 Weaknesses in the use of risk
    2.3.1 When MSE does not exist
    2.3.2 Sensitivity to the choice of the loss function
    2.3.3 The golden standard
  2.4 Joint versus marginal information
    2.4.1 Comparing estimators with an absolute ideal
    2.4.2 Comparing estimators with one another
  2.5 Concordance of PMC with MSE and MAD

3 Anomalies with PMC
  3.1 Living in an intransitive world
    3.1.1 Round-robin competition
    3.1.2 Voting preferences
    3.1.3 Transitiveness
  3.2 Paradoxes among choice
    3.2.1 The pairwise-worst simultaneous-best paradox
    3.2.2 The pairwise-best simultaneous-worst paradox
    3.2.3 Politics: The choice of extremes
  3.3 Rao's phenomenon
  3.4 The question of ties
    3.4.1 Equal probability of ties
    3.4.2 Correcting the Pitman criterion
    3.4.3 A randomized estimator
  3.5 The Rao-Berkson controversy
    3.5.1 Minimum chi-square and maximum likelihood
    3.5.2 Model inconsistency
  3.6 Remarks

4 Pairwise Comparisons
  4.1 Geary-Rao Theorem
  4.2 Applications of the Geary-Rao Theorem
  4.3 Karlin's Corollary
  4.4 A special case of the Geary-Rao Theorem
    4.4.1 Surjective estimators
    4.4.2 The MLR property
  4.5 Applications of the special case
  4.6 Transitiveness
    4.6.1 Transitiveness Theorem
    4.6.2 Another extension of Karlin's Corollary

5 Pitman-Closest Estimators
  5.1 Estimation of location parameters
  5.2 Estimators of scale
  5.3 Generalization via topological groups
  5.4 Posterior Pitman closeness
  5.5 Linear combinations
  5.6 Estimation by order statistics

6 Asymptotics and PMC
  6.1 Pitman closeness of BAN estimators
    6.1.1 Modes of convergence
    6.1.2 Fisher information
    6.1.3 BAN estimates are Pitman closest
  6.2 PMC by asymptotic representations
    6.2.1 A general proposition
  6.3 Robust estimation of a location parameter
    6.3.1 L-estimators
    6.3.2 M-estimators
    6.3.3 R-estimators
  6.4 APC characterizations of other estimators
    6.4.1 Pitman estimators
    6.4.2 Examples of Pitman estimators
    6.4.3 PMC equivalence
    6.4.4 Bayes estimators
  6.5 Second-order efficiency and PMC
    6.5.1 Asymptotic efficiencies
    6.5.2 Asymptotic median unbiasedness
    6.5.3 Higher-order PMC

Bibliography
Index
Foreword

I have great pleasure in writing a foreword to "Pitman's Measure of Closeness (PMC): A Comparison of Statistical Estimators" by Keating, Mason, and Sen, for many reasons. It is the result of a fruitful collaboration by three major research workers on PMC. It is a comprehensive survey of recent contributions to the subject. It discusses the merits and deficiencies of PMC, throws light on recent controversies, and formulates new problems for further research. Finally, there is a need for such a book, as PMC is not generally discussed in statistical texts. Its role in estimation theory and its usefulness to the decision maker are not well known. Since 1980, I have expressed the belief that the decision maker would benefit from examining the performance of any given estimator under different criteria (or loss functions). I suggested the use of PMC as one of the criteria to be seriously considered because of its intuitive appeal. I am glad to see that during the last ten years, PMC has been a topic of active research. The contributions by the authors of this book have been especially illuminating in resolving some of the controversies surrounding PMC. The authors deserve to be congratulated for their excellent effort in putting together much useful material for the benefit of statistical theorists and practitioners.

C.R. Rao
Eberly Professor of Statistics
Preface

This book presents a unified development of the origins, nature, methods, and applications of Pitman's measure of closeness (PMC) as a criterion in estimation. PMC is based on the probabilities of the closeness of competing estimators to an unknown parameter. Although there had been limited exploration of the PMC methodology, renewed interest has been sparked in the last twenty years, especially in the last ten years after Rao (1981) pointed out its use as an alternative criterion to minimum variance. Since 1975 over 100 research articles, authored by many prominent statisticians, have appeared on this criterion. With this renewed interest has come better understanding of this method of comparison and its usefulness. Posed as an alternative to the concept of MSE (mean squared error), PMC has been extensively explored through theorems and examples. The goal of this monograph is to acquaint readers with this information in a single comprehensive source. We refer researchers and practitioners in multivariate analysis to Sen (1991) and (1992a) for a comprehensive review of the known results about PMC in the multivariate and multiparameter cases. The recent proliferation of published results on Pitman's measure of closeness makes it difficult to provide the readership with a relatively current monograph. To do so, we make some restrictions. We have, for example, written the book at the level of a graduate student who has completed a traditional two-semester course in mathematical statistics. We hope that the holistic presentation of the known results about PMC presented under a common notation will accelerate the integration of the beneficial features of this criterion into the mainstream of statistical thought. The intended audience for this book consists of two groups. The first group includes statisticians and mathematicians who are research oriented and would like to know more about a new and growing area of statistics. It also includes those who have some formal training in the theory of statistics or interest in the estimation field, such as practicing statisticians who work
in areas of application where estimation problems need solutions. The second group for whom this book is intended includes graduate students in statistics courses in colleges and universities. This book is appropriate for courses in which statistical inference or estimation techniques are the main topics. It also would be useful in courses on research topics in statistics. It thus is appropriate for graduate-level courses devoted to the theoretical foundations of estimation techniques. The topics selected for inclusion in this book constitute a compilation of the many research papers on PMC. In all decisions regarding topics, we were guided by our collective experiences in working in this field and by our desire to produce a book that would be informative, readable, and for which the techniques discussed would be understandable to individuals trained in statistics. The book contains six chapters. The first chapter begins with a philosophical perspective, includes the least amount of technical detail, and focuses on basic results. The book then gradually increases in mathematical complexity with each succeeding chapter. Throughout the book a serious effort has been made to present both the merits and the drawbacks of PMC. More technical multivariate results are treated only briefly, to allow a thorough discussion of univariate procedures and methodologies. The first three chapters relate the history and philosophy of PMC. The material is illustrated through realistic estimation problems and presented with a limited degree of technical difficulty. The Introduction in Chapter 1 presents the motivation for exploring PMC and the notation to be used throughout the book. Chapter 2 contains discussions on the development of PMC, including its history, the concept of risk, the competition with MSE, and the role of the loss function. Chapter 3 explores the operational issues involved in adopting PMC as a criterion. Discussions are given on the issues of intransitiveness and probability paradoxes as well as on a useful methodology for resolving ties in probability. The last three chapters present a unified development of the extensive theoretical and mathematical research on PMC. Taken together, they serve as a single comprehensive source on this important topic. We unify notation, denote overlap, and present a common foundation from which the known results are deduced. The text is highly referenced, allowing researchers to readily access the original articles. Many new findings not yet published also are presented. Chapter 4 establishes the fundamental results of pairwise comparisons based on PMC. Chapter 5 connects results of PMC with well-accepted notions in statistical inference such as robustness, equivari-
ance, and median unbiased estimation. The last chapter contains results on asymptotics, including the optimality of BAN estimators according to the Pitman closeness criterion and general equivalence results of Pitman, Bayes, and maximum likelihood estimators. The examples used throughout the second half of the book are often important and practical applications in the physical and engineering sciences. This element should strengthen the appeal of the book to many statisticians and mathematicians. The last three chapters also could serve as a useful reference for an advanced course in statistical inference. We are indebted to many individuals for contributing to this work. C. Radhakrishna Rao, Eberly Professor of Statistics (Pennsylvania State University), has been a constant source of guidance and inspiration for over a decade and his work has been a chief motivator in this venture. Colin Blyth, Professor Emeritus (Queen's University), reviewed several earlier versions of this manuscript and has provided many valuable suggestions on content and validity of the results. We gratefully acknowledge the extensive editorial review of this book by Norman L. Johnson, Professor Emeritus (University of North Carolina). We also acknowledge the influence of Malay Ghosh (University of Florida) not only on the many technical results which bear his name but also on his foresight to recommend a special issue of Communications in Statistics devoted to recent developments in Pitman's closeness criterion. We thank the reviewers, H. T. David (Iowa State University), R. L. Fountain (Portland State University), and T. K. Nayak (George Washington University) for their careful reading of and commentary on this manuscript. These reviews helped us produce an improved presentation of the fundamental issues and results associated with Pitman's measure of closeness. We are also grateful to Vickie Kearn, the SIAM book editor, her assistant, Susan Ciambrano, and our editor, Laura Helfrich, for their editorial support of our work. Finally, we extend thanks to Catherine Check, Shelly Eberly, Nata Kolton, and Alan Polansky for the excellent typing support they provided for the project. J. P. Keating R. L. Mason P. K. Sen September 10, 1992
Chapter 1
Introduction

There are many different ways to estimate unknown parameters. Such estimation procedures as least squares, method of moments, maximum likelihood, unbiasedness with minimum variance, skillfully chosen biased estimation, and Bayes and empirical Bayes are only a few of the many techniques that are currently available. Most of these techniques have been developed in response to specific practical problems for which no useful or existing solution was available, or to provide improvements to well-accepted estimation procedures. Certainly it is fortunate that there are many different ways to estimate unknown parameters, but which one should we use? It seems circular to use the criterion by which a best estimator was obtained as a basis for determining which estimator is best. What is clearly lacking is a systematic treatment of the comparison of such estimators based on criteria other than the currently popular mean squared error (MSE), which we will define in §1.1.4.

The use of mean squared error as a criterion in the comparison of estimators dates back to Karl Friedrich Gauss. As a simple illustration, consider the location (or intercept) model with an additive error component. This model is expressed in the following equation:

$$Y_i = \theta + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where $Y_i$ is the $i$th observed value of $n$ observations taken to measure an unknown parameter $\theta$, and $\varepsilon_i$ is the additive experimental error. For example, a scientist may try to reconcile differences in observations of the length $Y_i$ of a chord of the earth along its axis of rotation, where $\theta$ is the true distance from the North Pole to the South Pole of the Earth.
Gauss took as his estimate of $\theta$ the value $\hat\theta$ that minimizes the sum of squares for errors (SSE),

$$\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat\theta)^2.$$
The value $\hat\theta$, termed the least squares estimator, is given by

$$\hat\theta = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i.$$
Gauss tried to support his mathematical result through probabilistic methods but he (1821)
found out soon that determination of the most probable value of an unknown quantity is impossible unless the probability distribution of errors is known explicitly. Since the error component $\varepsilon_i$ in the location model had to have a completely specified distribution, Gauss reasoned that the process might be made simpler if he transposed the ordering. Gauss (1821) intended to determine the distribution of the experimental errors, $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, contained in a system of observations that in the simplest case would result in the rule, generally accepted as good, that the arithmetic mean of several values for the same unknown quantity obtained from equally reliable observations shall be considered the most probable value. By this inverted process, Gauss found the desired density function of experimental errors to be that of a normal distribution with mean zero and variance $\sigma^2$; i.e.,

$$f(\varepsilon) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\varepsilon^2/(2\sigma^2)}, \qquad -\infty < \varepsilon < \infty.$$
Using the criterion of minimizing the SSE and assuming normality for the distribution of experimental errors in the system of observations, Gauss found that the sample mean $\bar{Y}$ became the most probable estimate of $\theta$. Gauss (1821) was able to separate his construction of the least squares estimator from assumptions about the distribution of experimental errors.
To obtain this separation, Gauss modified his proposed estimation criterion and chose the value $\hat\theta$ that minimizes

$$E\left[\sum_{i=1}^{n} (Y_i - \hat\theta)^2\right].$$
This Gaussian modification is the precursor to MSE. When $\hat\theta = \bar{Y}$, the sum of squares for error measures the total sum of squares in the system of observations, whereas if $\hat\theta = E(Y)$, the MSE becomes $n$ times the population variance. In 1774 Pierre Simon Laplace proposed as an estimate of $\theta$ the value $\hat\theta$ that minimizes the expected sum of absolute deviations,

$$E\left[\sum_{i=1}^{n} |Y_i - \hat\theta|\right].$$
This method was the precursor of the mean absolute deviation (MAD) criterion. In the location model, Laplace showed that the sample median $\tilde{Y}$ was the most probable estimate of $\theta$ whenever the experimental errors are independent and identically distributed with a common double-exponential density function given by

$$f(\varepsilon) = \frac{1}{2\lambda}\, e^{-|\varepsilon|/\lambda}, \qquad -\infty < \varepsilon < \infty,$$
where $\lambda$ is a scale parameter that measures dispersion in the experimental errors. He proved that the Gaussian procedure of minimizing the expected sum of squares and his own procedure of minimizing the expected sum of absolute deviations produce the same estimate $\bar{Y}$ if and only if the experimental errors were normally distributed. During this period of the birth of estimation theory, the chosen loss function was clearly at the center of its development. The Gaussian selection of normally distributed experimental errors was a direct consequence of his conjugate choice of quadratic loss. Likewise, the Laplacian selection of double-exponentially distributed experimental errors stemmed from his conjugate choice of absolute deviation. We can add to these examples the distribution named after Augustin Louis Cauchy, who postulated for the experimental errors in the location model a density function given by

$$f(\varepsilon) = \frac{\lambda}{\pi(\lambda^2 + \varepsilon^2)}, \qquad -\infty < \varepsilon < \infty,$$
where $\lambda$ is a scale parameter that measures the scatter in the system of observations. Under the Cauchy law for the distribution of errors, the criteria of Gauss and Laplace were inappropriate since neither expectation exists. With this background, estimation theory, as we know it, was born. Gauss and Laplace set forth the two most important loss functions, based on MSE and MAD, and clearly stated that what was "best" depended upon the distribution for experimental errors. Cauchy's contribution was his example that minimizing the "expected" dispersion in a system of observations could be flawed. These contradictory results produced two important questions:

(i) Is there an estimation criterion that is insensitive (i.e., robust) to the experimenter's choice for measuring loss?

(ii) Can this criterion be applied without focusing only on the expectation of the experimental error?

In response to these questions, we present in this book a systematic treatment of pairwise comparisons of statistical estimators based on an alternative criterion termed Pitman's measure of closeness (PMC). We consider other criteria as a basis of comparison, but for the most part we focus on the criterion developed by Pitman (1937). It is defined as follows.
Definition 1.0.1 Let $\hat\theta_1$ and $\hat\theta_2$ be two real-valued estimators of the real parameter $\theta$. Then Pitman's measure of closeness of these two competing estimators is denoted by $\mathbb{P}(\hat\theta_1, \hat\theta_2 \mid \theta)$ and defined by

$$\mathbb{P}(\hat\theta_1, \hat\theta_2 \mid \theta) = \Pr\bigl[\,|\hat\theta_1 - \theta| < |\hat\theta_2 - \theta|\,\bigr]. \qquad (1.1)$$
Whenever the probability in (1.1) does not depend upon $\theta$, we will suppress the conditioning notation and simply write $\mathbb{P}(\hat\theta_1, \hat\theta_2)$. When the value of $\mathbb{P}(\hat\theta_1, \hat\theta_2 \mid \theta)$ depends upon the value of $\theta$, PMC varies over the parameter space $\Omega$. In its simplest form PMC is the relative frequency with which the estimator $\hat\theta_1$ will be closer than its competitor $\hat\theta_2$ to the true but unknown value of the parameter $\theta$. However, $\mathbb{P}(\hat\theta_1, \hat\theta_2 \mid \theta)$ does not measure how much closer $\hat\theta_1$ is to $\theta$ than is $\hat\theta_2$. Pitman made his original definition under the assumption that

$$\Pr\bigl[\,|\hat\theta_1 - \theta| = |\hat\theta_2 - \theta|\,\bigr] = 0. \qquad (1.2)$$
In such cases, the ratio $\mathbb{P}(\hat\theta_1, \hat\theta_2 \mid \theta)/\mathbb{P}(\hat\theta_2, \hat\theta_1 \mid \theta)$ is the odds that $\hat\theta_1$ will produce an estimator closer to $\theta$ than $\hat\theta_2$. In §3.4, we extend Definition 1.0.1
to the interesting and important comparison when the probability in (1.2) is not necessarily zero. Our viewpoint in this book is intentionally limited to PMC. Chapters 1-3 contain our reasons for this narrowness. In the space of this modest monograph we cannot hope to sort through all the known competing criteria for estimation. Savage (1954) has presented an analysis of pairwise preferences between estimators derived under seven different criteria. We propose PMC as an alternative to other ways of comparing the relative value of two or more statistical estimators in a variety of situations, some of which have no previously known solution. The similarities between PMC and MSE will also be thoroughly explored. We begin this chapter with a brief discussion of the evolution of estimation theory. The discussion will help us place in perspective the different estimation procedures that we will compare in this book and the reason for the emergence of Pitman's measure of closeness as a viable alternative to MSE.
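Definition 1.0.1 lends itself to direct Monte Carlo evaluation. The following sketch (our own illustration, not from the original text) estimates $\mathbb{P}(\hat\theta_1, \hat\theta_2 \mid \theta)$ for an arbitrary pair of estimators; the choice of a normal sample and of the sample mean and sample median as competitors is purely illustrative.

```python
import numpy as np

def pitman_closeness(est1, est2, sampler, theta, n, reps=100_000, seed=0):
    """Monte Carlo estimate of P(|est1 - theta| < |est2 - theta|)."""
    rng = np.random.default_rng(seed)
    samples = sampler(rng, theta, (reps, n))      # reps independent samples of size n
    d1 = np.abs(est1(samples) - theta)
    d2 = np.abs(est2(samples) - theta)
    # Ties are assumed to occur with probability zero, as in (1.2).
    return np.mean(d1 < d2)

# Illustration: sample mean versus sample median for N(theta, 1) data.
normal_sampler = lambda rng, theta, size: rng.normal(theta, 1.0, size)
pmc = pitman_closeness(lambda x: x.mean(axis=1),
                       lambda x: np.median(x, axis=1),
                       normal_sampler, theta=0.0, n=9)
print(f"Estimated PMC(mean, median | theta) = {pmc:.3f}")
```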
1.1 Evolution of estimation theory
The evolution of estimation theory has been driven by a need to solve real problems. It has progressed from least squares to the method of moments, to maximum likelihood, to Bayes and empirical Bayes procedures, to risk-reduction approaches, to robustness, and to resampling techniques. The following subsections provide brief discussions of these historical events in estimation theory. They also set the stage for our later discussions on comparisons of estimators.
1.1.1 Least squares
One of the most widely used estimation procedures is the method of least squares (LS). It was first derived in the beginning of the nineteenth century and its origins can be traced to independent work by Gauss and Legendre, and later input by Laplace. Plackett (1972) provides extensive details on the discovery of this method and the controversy of its origins. Its use of MSE as an estimation criterion is well known and is briefly discussed in §2.1. Consider a simple linear regression model in which

$$E[Y_i \mid x_i; \theta] = \theta x_i, \qquad i = 1, \ldots, n,$$
where $E[Y_i \mid x_i; \theta]$ is the conditional mean of $Y_i$, $\theta$ is an unknown parameter, and the $x_i$'s are known values. Assume that the conditional variances of the $Y_i$'s are equal. To find the least squares estimator of $\theta$ for $n$ pairs of observations $(y_i, x_i)$, we determine the value of $\hat\theta$ that minimizes

$$\sum_{i=1}^{n} (y_i - \hat\theta x_i)^2.$$
The least squares estimators have several useful properties that add to their appeal. For example, when $Y_1, \ldots, Y_n$ have independent normal distributions, the LS estimators maximize the likelihood function, are functions of sufficient statistics, and are uniformly minimum-variance unbiased estimators of regression parameters. They also are the best linear unbiased estimators in the class of estimators that are linear functions of $Y$, in the sense of having the minimum MSE or convex risk. However, the LSEs are not invariant under arbitrary monotone (such as logarithmic or power) transformations on the observations, a case that often arises in practice, especially in biomedical studies. During the last fifty years many new developments have occurred in least squares estimation. In each situation new methodologies were introduced to improve the performance of older techniques. For example, regression diagnostics have progressed from simple residual plots and correlation matrices to such sophisticated techniques as partial leverage plots, condition numbers, Cook's distance measure, and dfbetas (e.g., see Belsley, Kuh, and Welsch (1990)). Similar advancements have been made in such new areas as robust regression analysis, influence functions, quasi-least squares analysis, weighted least squares, inverse regression, and generalized linear models. In each of these settings, new estimation schemes were developed to remedy past problems. For example, robust regression and influence functions are useful improvements in accommodating data with outliers or influential observations. Weighted least squares is an improvement for addressing the problem of heteroscedasticity of the error variance, and inverse regression is a helpful solution for use in calibration problems. The wide popularity of regression in data analysis is a tribute to its ability to integrate these new estimation techniques into its overall approach to parameter estimation.
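For the single-parameter regression model above (reconstructed here as $E[Y_i \mid x_i; \theta] = \theta x_i$), minimizing the sum of squares has the familiar closed form $\hat\theta = \sum x_i y_i / \sum x_i^2$. A minimal sketch, with made-up data and assuming that reading of the model:

```python
import numpy as np

def ls_through_origin(x, y):
    """Least squares estimate of theta in E[Y_i | x_i] = theta * x_i."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(x * y) / np.sum(x * x)   # minimizes sum (y_i - theta*x_i)^2

# Hypothetical data: true theta = 2, additive noise.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 20)
y = 2.0 * x + rng.normal(0.0, 1.0, x.size)
print(f"theta_hat = {ls_through_origin(x, y):.3f}")
```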
1.1.2 Method of moments
Karl Pearson introduced the method of moments (MM) at the beginning of the twentieth century. His empirical procedure involved equating the first
$k$ moments of the sample with the first $k$ moments of the population, when there are $k\ (\ge 1)$ unknown parameters associated with the population. This method results in $k$ equations consisting of $k$ unknowns. Although the MM procedure is very much an ad hoc technique, its simplicity and optimality in some specific cases made it useful in the absence of theoretically well-founded alternatives. For distributions with one unknown parameter, say, $\theta = h(\mu)$ (i.e., a function of the population mean $\mu$), the method of moments estimator (MME) is given by

$$\hat\theta_M = h(\bar{X}),$$
where $\bar{X}$ is the sample mean taken from a random sample of size $n$ from the distribution of interest. When the distribution has two parameters, namely, the population mean $\mu$ and variance $\sigma^2$, the MMEs are given by

$$\hat\mu = \bar{X} \qquad \text{and} \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Pearson realized that his procedure might focus too closely on the first two moments of the distribution without regard to higher-order effects such as skewness and kurtosis. In such cases, alternative strategies were developed in which $\mu$ and $\sigma$ were estimated based on equating third and fourth population moments with the third and fourth sample moments. In addition, Pearson developed a family of distributions to allow for incorporation of these higher-order moments. This extension introduced to the underlying distribution additional parameters that were estimated as before. The MM procedure survived in part because of its ability to estimate reasonably well the parameters in the normal distribution. Indeed, the method of moments was shown by Blyth (1970) to be a special case of the method of least squares under relatively general conditions. However, MM estimation has obvious shortcomings. For example, it cannot be used with a Cauchy distribution under its present formulation since no moments exist (N.B., the MM can be modified in the Cauchy to estimate its scale parameter by using fractional moments). Finally, in distributions with complicated moments such as the Weibull, solving for the unknown parameters produces nonlinear equations that can only be solved numerically. This drawback is also shared by LS estimation for nonlinear models. Although the MME has several undesirable features, it seems reasonable to question how it compares with its competitors. To answer this query we
must decide on the criterion we will use for the basis of the comparison. Suppose we select as our criterion the bias $B(\hat\theta, \theta)$ of an estimator, which is given by

$$B(\hat\theta, \theta) = E(\hat\theta) - \theta.$$
To illustrate the effect of this choice, let us consider the MME of $\theta$ in the uniform distribution.

Example 1.1.1 Consider a random sample $X_1, X_2, \ldots, X_n$ chosen from the uniform distribution

$$f(x; \theta) = \frac{1}{\theta}\, I_{(0,\theta)}(x), \qquad (1.6)$$
where $I_{(a,b)}(x)$ is the indicator function on the open interval $(a, b)$. The MME is given by

$$\hat\theta_M = 2\bar{X}, \qquad (1.7)$$
which does not depend solely on the value of the sufficient statistic $X_{n:n}$, the largest order statistic. Since $\hat\theta_M$ in (1.7) is formed from the sample mean of a randomly chosen sample of size $n$, then $E(\hat\theta_M) = 2E(\bar{X}) = 2(\theta/2) = \theta$. Thus the MME produces an unbiased estimator of $\theta$. However, it is easy to show that many other estimators have zero bias (e.g., twice any convex linear combination of the unordered $X_i$'s is unbiased). Thus, the bias by itself does not have enough discriminating power, and we may need additional criteria to compare these estimators.
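A quick numerical check (an illustrative addition, not part of the original example) makes the point concrete: several estimators of $\theta$ in the uniform model are unbiased, so bias alone cannot separate them, even though their variances differ greatly.

```python
import numpy as np

# Several unbiased estimators of theta for Uniform(0, theta); theta and n are arbitrary.
rng = np.random.default_rng(2)
theta, n, reps = 1.0, 10, 200_000
x = rng.uniform(0.0, theta, size=(reps, n))

candidates = {
    "2 * Xbar (MME)":          2.0 * x.mean(axis=1),
    "2 * X_1":                 2.0 * x[:, 0],
    "2 * (0.3 X_1 + 0.7 X_2)": 2.0 * (0.3 * x[:, 0] + 0.7 * x[:, 1]),
}
for name, est in candidates.items():
    print(f"{name:26s}  mean ~= {est.mean():.4f}   variance = {est.var():.4f}")
```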
1.1.3 Maximum likelihood
Sir Ronald A. Fisher in the 1920s introduced the method of maximization of the likelihood function. He examined the joint distribution of the observed sample values $x_1, \ldots, x_n$ as a function of the unknown parameters. The likelihood function is given by

$$l(\theta \mid x_1, \ldots, x_n) = f(x_1, \ldots, x_n; \theta)$$
and reduces to

$$l(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$$
whenever the $x_1, \ldots, x_n$ are independent and identically distributed with common density function $f(x; \theta)$. In the ex-post-facto sense, after the data are collected, the likelihood function is only a function of $\theta$. Interpretations of the likelihood function can lead to wide philosophical differences, which we wish to avoid. Readers interested in further details of these differences will benefit from reading Efron (1982). It suffices to say that the maximum likelihood estimator (MLE) is the value of $\theta$ that produces the absolute maximum of $l(\theta \mid x_1, \ldots, x_n)$ and is a function of $x_1, \ldots, x_n$. However, it may not always be possible to express this estimator in closed form as a function of $x_1, \ldots, x_n$, and in some cases, the very existence of an absolute maximum may be difficult to establish. Most practitioners of statistics will frequently choose the MLE technique because they can ordinarily obtain reasonable estimators of the unknown parameters. While the exact distributional properties of the MLEs are unknown except for distributions having sufficient statistics, the asymptotic properties remain appealing features. The asymptotic unbiasedness, normality, consistency, sufficiency, and efficiency of these estimators make analysis of large data sets, such as consumer preference surveys and clinical trials, straightforward. This set of properties forms the condition of an estimator being best asymptotic normal (BAN). We will discuss such estimators in Chapter 6. For a general class of models, the concept of transformational invariance is a property shared by the ML and MM estimators. For location and scale parameter families of distributions, the consequent MLEs are equivariant, which is a condition we will use throughout Chapter 5. MLEs have the additional desirable aspect that whenever a sufficient statistic $T(X)$ exists for an estimation problem, the MLE, by virtue of its origins from the joint density function, will be a function of that sufficient statistic. These features of unbiasedness, normality, consistency, invariance, and efficiency have become the principal properties sought in estimators. In the example of the uniform distribution given in (1.6) the MLE of $\theta$ is

$$\hat\theta_L = X_{n:n}.$$
Since $X_{n:n}/\theta$ has a beta distribution with parameters $\alpha = n$ and $\beta = 1$, then $E(\hat\theta_L) = \theta \cdot E(X_{n:n}/\theta) = n\theta/(n+1)$. Thus the MLE produces a biased estimator of $\theta$ whereas the MME yields an unbiased estimator. However, the MLE is asymptotically unbiased and has a smaller variance since

$$\mathrm{Var}(\hat\theta_L) = \frac{n\theta^2}{(n+1)^2(n+2)} < \frac{\theta^2}{3n} = \mathrm{Var}(\hat\theta_M) \qquad \text{for all } n \ge 1.$$
The uniform distribution is an interesting case because it exemplifies the Hodges-LeCam superefficiency property primarily because a regularity condition pertaining to the Fréchet-Cramér-Rao inequality does not hold. To be more specific, $\partial f(x; \theta)/\partial\theta$, where $f(x; \theta)$ is defined in (1.6), does not exist for $x = \theta$ and moreover, since the essential range of $X$ depends on $\theta$, differentiation under the integral may no longer be valid. Note that since $\mathrm{Var}(\hat\theta_M)$ is proportional to $1/n$, the rate of convergence of $\mathrm{Var}(\hat\theta_M)$ to zero is consistent with the optimum rate spelled out by the Fréchet-Cramér-Rao inequality (whenever it is applicable) and the Central Limit Theorem. However, the reader can readily verify that $\mathrm{Var}(\hat\theta_L)$ is proportional to $1/n^2$ and its subsequent rate of convergence to zero is denoted $O(1/n^2)$. The method of moments and the maximum likelihood estimators are consistent in that for each $\epsilon > 0$

$$\lim_{n \to \infty} \Pr\bigl[\,|\hat\theta - \theta| > \epsilon\,\bigr] = 0.$$
We may want to discriminate between $\hat\theta_M$ and $\hat\theta_L$ as consistent estimators of $\theta$ based on their rates of convergence. In the above example, the two rates of convergence are different (i.e., $1/n$ and $1/n^2$, respectively), and hence, in an asymptotic framework we can compare them. However, one might be concerned that the MLE underestimates $\theta$ (a.e.), whereas the MME has a distribution that is symmetric about $\theta$. If we graph the corresponding marginal distributions, we have an excellent example of the trade-off between an unbiased estimator with a reasonable variance compared to a biased estimator with a very small variance. It is exactly these types of differences that make estimator comparisons a difficult task and illustrate the need for useful comparison criteria. In the next subsection we introduce MSE as one way to reconcile variance and bias considerations and illustrate this reconciliation by constructing an unbiased estimator from the MLE in the uniform distribution.
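The contrast in convergence rates can be seen in a small simulation (an illustrative sketch; the sample sizes and value of $\theta$ are arbitrary): the variance of $\hat\theta_M$ shrinks like $1/n$ while that of $\hat\theta_L$ shrinks like $1/n^2$, at the cost of a downward bias.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, reps = 1.0, 200_000

for n in (10, 20, 40):
    x = rng.uniform(0.0, theta, size=(reps, n))
    mme = 2.0 * x.mean(axis=1)        # theta_M = 2 * Xbar, unbiased
    mle = x.max(axis=1)               # theta_L = X_(n:n), biased downward
    print(f"n={n:3d}  bias(MME)={mme.mean()-theta:+.4f}  var(MME)={mme.var():.5f}"
          f"  bias(MLE)={mle.mean()-theta:+.4f}  var(MLE)={mle.var():.6f}")
# var(MME) ~ theta^2/(3n) shrinks like 1/n; var(MLE) ~ n*theta^2/((n+1)^2(n+2)) like 1/n^2.
```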
1.1.4 Uniformly minimum variance unbiased estimation (UMVUE)
The notion of MSE was introduced as

$$\mathrm{MSE}(\hat\theta) = E\bigl[(\hat\theta - \theta)^2\bigr] = \mathrm{Var}(\hat\theta) + B^2(\hat\theta, \theta).$$
Among the class of unbiased estimators (for which $E(\hat\theta) = \theta$), the estimator with minimum variance also has minimum MSE. This observation motivated the construction of unbiased estimators with minimum variance. The subsequent procedure produced small sample estimators that had two of the three appealing asymptotic properties of MLEs. This procedure was especially embraced by decision theorists because it could be developed from squared error loss functions. The examples of the MME and the MLE of $\theta$ in the uniform distribution (see §§1.1.2 and 1.1.3) illustrate the trade-off between minimizing the squared bias and reducing the variance of an estimator. Many researchers have been intrigued with the mathematical elegance of the derivation of the Fréchet-Cramér-Rao inequality, the Rao-Blackwell Theorem, and the Lehmann-Scheffé Theorem. From a practical perspective, the last two theorems provided practitioners with a method for the construction of unbiased estimators with minimum variance. We need only to find the conditional expectation of any unbiased estimator of the target parameter, given the sufficient statistic. For a unique best estimator in this sense, we may need the concept of completeness. The MSE of $\hat\theta_M$ is given by

$$\mathrm{MSE}(\hat\theta_M) = \frac{\theta^2}{3n}$$
and the MSE of $\hat\theta_L$ is given by

$$\mathrm{MSE}(\hat\theta_L) = \frac{2\theta^2}{(n+1)(n+2)}.$$
Using MSE as a criterion, the biased estimator $\hat\theta_L$ is preferred to the unbiased estimator $\hat\theta_M$. Since $E[\hat\theta_L] = n\theta/(n+1)$, we construct the following unbiased estimator:

$$\hat\theta_U = \frac{n+1}{n}\, X_{n:n}.$$
The MSE of $\hat\theta_U$ is given by

$$\mathrm{MSE}(\hat\theta_U) = \frac{\theta^2}{n(n+2)}$$
and is smaller than the MSE of $\hat\theta_L$ whenever $n > 1$. The Fréchet-Cramér-Rao inequality does not apply to the uniform distribution because the essential range of the uniform distribution depends upon $\theta$.
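The three closed-form MSEs above are easy to confirm numerically; the following sketch (illustrative, with arbitrary $n$ and $\theta$) reproduces the ordering $\mathrm{MSE}(\hat\theta_U) < \mathrm{MSE}(\hat\theta_L) < \mathrm{MSE}(\hat\theta_M)$ for $n > 1$.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 1.0, 10, 500_000
x = rng.uniform(0.0, theta, size=(reps, n))

estimators = {
    "MME  2*Xbar":           2.0 * x.mean(axis=1),
    "MLE  X_(n:n)":          x.max(axis=1),
    "UMVUE (n+1)/n X_(n:n)": (n + 1) / n * x.max(axis=1),
}
theory = {
    "MME  2*Xbar":           theta**2 / (3 * n),
    "MLE  X_(n:n)":          2 * theta**2 / ((n + 1) * (n + 2)),
    "UMVUE (n+1)/n X_(n:n)": theta**2 / (n * (n + 2)),
}
for name, est in estimators.items():
    mse = np.mean((est - theta) ** 2)
    print(f"{name:24s}  simulated MSE = {mse:.5f}   formula = {theory[name]:.5f}")
```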
In general, we may not be able to reduce the bias of the MLE so simply as we did in the uniform distribution. In the more complex models, we may need to reduce the bias by more sophisticated techniques such as jackknifing the MLE. The jackknifing process reduces the size of the bias through higher-order expansions of the likelihood function.
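As a hedged illustration of the idea (the generic delete-one formula is standard; its application to the uniform MLE is our own example, not the book's), the jackknife replaces $\hat\theta$ by $n\hat\theta$ minus $(n-1)$ times the average of the leave-one-out estimates:

```python
import numpy as np

def jackknife_bias_corrected(x, estimator):
    """Delete-one jackknife: theta_jack = n*theta_hat - (n-1)*mean(leave-one-out estimates)."""
    x = np.asarray(x, float)
    n = x.size
    full = estimator(x)
    loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return n * full - (n - 1) * loo.mean()

# Illustration: the downward-biased MLE X_(n:n) for Uniform(0, theta).
rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, size=15)
print(f"MLE = {x.max():.3f},  jackknifed = {jackknife_bias_corrected(x, np.max):.3f}")
```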
1.1.5 Biased estimation
After World War II, researchers began to question the conventional methods of estimation detailed above and more closely scrutinized the resultant estimators. These individuals recognized that the MSE of an estimator can be made small by manipulating the bias to reduce the variance of the estimator. For example, estimators that are constants have a zero variance but are almost always biased; under the MSE criterion, they are always admissible but have no practical value. Within a short span, statisticians began producing biased estimators with uniformly smaller mean squared error than that attained by the UMVUE. The Hodges-Lehmann (1951) admissibility result for the sample mean in random samples chosen from a normal distribution helped to dilute the influence of these biased estimators. Its utility in large samples through the Central Limit Theorem aided the cause for unbiased estimators. However, Charles Stein in 1956 began to produce biased estimators in three-or-higher dimensions which performed better than the traditional MLE, even for the multivariate normal distribution. Estimation rules with smaller total risk than the MLE were shown to exist for estimating each of the means of the sets of data. Stein's paradox generated much debate and caused many researchers to consider radical departures from traditional statistical thinking. The Stein effect helps illustrate the usefulness of Pitman's measure of closeness in one dimension in the following example.

Example 1.1.2 Efron (1975) proposed the following problem. Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a normal distribution with a mean of $\theta$ and unit variance (i.e., $\mathcal{N}(\theta, 1)$). As an alternative to the unbiased sample mean, $\hat\theta_1 = \bar{X}$, Efron proposed the following estimator $\hat\theta_2 = \bar{X} - \Delta_n(\bar{X})$, where

$$\Delta_n(\bar{X}) = \frac{1}{2}\,\mathrm{sign}(\bar{X})\,\min\!\left(|\bar{X}|,\; \frac{\Phi(-\sqrt{n}\,|\bar{X}|)}{\sqrt{n}}\right) \qquad (1.12)$$
is an odd function, and $\Phi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-s^2/2}\, ds$ is the standard normal distribution function. Figure 1.1 illustrates the relationship between $\hat\theta_1$ and $\hat\theta_2$ for $n = 1$ and $\bar{X} > 0$. When $\bar{X}$ is between $-c/\sqrt{n}$ and $c/\sqrt{n}$, $\hat\theta_2 = \bar{X}/2$; when $|\bar{X}| > c/\sqrt{n}$, then

$$\hat\theta_2 = \bar{X} - \frac{\mathrm{sign}(\bar{X})\,\Phi(-\sqrt{n}\,|\bar{X}|)}{2\sqrt{n}},$$
where $c$ is the unique zero of $q(x) = x - \Phi(-x)$. The numerical value of $c$ is 0.35962.
Figure 1.1: A graph of the unbiased estimator and the Efron estimator.

To paraphrase Efron, $\hat\theta_2$ has none of the nice properties of $\hat\theta_1$. It is not unbiased, not invariant, not minimax, nor even admissible in the sense of MSE. Yet $\hat\theta_2 = \hat\theta_2(\bar{X})$ will more often be closer than $\hat\theta_1 = \hat\theta_1(\bar{X})$ to the true value of $\theta$ regardless of the value of $\theta$! In the sense of PMC, $\bar{X}$ is inadmissible to $\hat\theta_2$, which illustrates the presence of the Stein effect in the univariate case. Thus the sanctuary afforded by the Hodges-Lehmann landmark admissibility result for the sample mean does not extend to the criterion of PMC. The term $\Delta_n(\bar{X})$ in (1.12) is known as a shrinkage factor since $\hat\theta_2$ shrinks $\bar{X}$ toward zero. The Efron estimator $\hat\theta_2(\bar{X})$ is not unique in that $\Delta_n(\bar{X})$ may
be defined in many ways so that $\hat\theta_1(\bar{X})$ is inadmissible to the associated estimator $\hat\theta_2(\bar{X})$ in the PMC sense. This form of inadmissibility was also presented by Salem and David in 1973. The form of $\Delta_n(\bar{X})$ presented by them can be found in Salem (1974, p. 16). David and Salem (1991) and Yoo and David (1992) show that the inadmissibility surfaces in the general location parameter case or in the general scale parameter case. Moreover, Yoo and David (1992) present the proof that $\hat\theta_1(\bar{X})$ is inadmissible in the PMC sense to any estimator $\hat\theta_3(\bar{X})$ that satisfies the conditions that $\hat\theta_3(0) = 0$ and, for all $\bar{X} \neq 0$,
This latter condition implies that any estimator which lies strictly between the two estimators in Figure 1.1 is also better than $\hat\theta_1(\bar{X})$ in the sense of PMC. Thus the "Stein effect" surfaces under the Pitman closeness criterion even in one dimension. Whereas many statisticians would not see this as sufficient evidence to abandon the unbiased estimator $\hat\theta_1$, Efron (1975) uses Example 1.1.2 to raise

... a serious point: in more complicated estimation situations involving the estimation of several parameters at the same time, statisticians have begun to realize that biased estimation rules have definite advantages over the usual unbiased estimators. This represents a radical departure from the traditional unbiased estimation which has dominated statistical thinking since Gauss' development of the least squares method.

The remarks with regard to Gauss are especially important and are addressed in §2.1.1.
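The PMC dominance claimed in Example 1.1.2 can be checked by simulation. In the sketch below the explicit form of $\Delta_n$ follows the reconstruction of (1.12) given above and should be treated as an assumption rather than as Efron's exact rule; the Monte Carlo comparison itself is generic.

```python
import numpy as np
from scipy.stats import norm

def efron_estimator(xbar, n):
    """Shrinkage rule of Example 1.1.2; Delta_n follows the reconstruction of (1.12) (assumed)."""
    shrink = norm.cdf(-np.sqrt(n) * np.abs(xbar)) / np.sqrt(n)
    delta = 0.5 * np.sign(xbar) * np.minimum(np.abs(xbar), shrink)
    return xbar - delta

rng = np.random.default_rng(6)
n, reps = 1, 200_000
for theta in (0.0, 0.5, 1.0, 2.0):
    xbar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)
    closer = np.abs(efron_estimator(xbar, n) - theta) < np.abs(xbar - theta)
    print(f"theta = {theta:4.2f}   P(theta_2 closer than Xbar) ~= {closer.mean():.3f}")
```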
1.1.6 Bayes and empirical Bayes
Bayesian analysis can be used by practitioners for situations where scientists have a priori information about the values of the parameters to be estimated. This information is formalized into a prior distribution on the parameter, and estimators are formed from the posterior distribution of the parameter given the data. This approach contains as a special case the fiducial procedure, which was used by Fisher. Our intent in this book is to present Pitman's concept of closeness under Bayesian, fiducial, and frequentist assumptions. Whether the prior distribution
should be subjectively based or empirically constructed is an issue of considerable debate among Bayesians. The fiducial approach of Fisher essentially uses noninformative priors and the resulting estimates often have a relevant classical interpretation. In the Bayesian sense, the Pitman estimator (see Pitman (1939)) has an interesting interpretation. With respect to a uniform weight function (see Strasser (1985)), the Pitman estimator becomes the mean of the posterior distribution and as such is the Bayes estimator for quadratic loss. For Pitman's concept of closeness, the differences between frequentists and Bayesians are mainly philosophical, since the estimator that is "best" in the sense of PMC is the same for the different groups for a large class of estimation problems. Bayesians could certainly use PMC whenever two estimates of a parameter are derived from different measures of central tendency of the posterior distribution. For example, the Bayes estimator under a squared error loss function is the mean of the posterior distribution of $(\theta \mid x_1, \ldots, x_n)$, whereas the Bayes estimator under an absolute loss function is the median of the posterior distribution. Hence, if we are uncertain about the true error incurred, we may want to use a posterior version of the PMC definition. This comparison has been completed by Ghosh and Sen (1991) and is contained in §5.4 (see Definition 5.4.1).
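A small sketch (our own illustration, anticipating the posterior version of PMC in §5.4) of how a Bayesian might compare the two point estimates: for a binomial proportion with a Beta(1, 1) prior, sample from the posterior and ask how often the posterior mean is closer to $\theta$ than the posterior median. The data values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
m, k = 20, 3                       # hypothetical data: 3 successes in 20 trials
a, b = 1 + k, 1 + (m - k)          # Beta posterior parameters under a Beta(1, 1) prior

draws = rng.beta(a, b, size=500_000)        # theta sampled from the posterior
post_mean = a / (a + b)                     # Bayes estimate under squared error loss
post_median = np.median(draws)              # Bayes estimate under absolute error loss
closer = np.abs(post_mean - draws) < np.abs(post_median - draws)
print(f"posterior P(mean closer than median) ~= {closer.mean():.3f}")
```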
1.1.7 Influence functions and resampling techniques
John Tukey in the early 1960s began a minor but much needed revolution in estimation theory with his Exploratory Data Analysis (EDA). Tukey believed that before we start using sophisticated estimation theory, we should explore the data under investigation through a variety of techniques for depicting its true underlying distribution. Moreover, as early as 1922, Fisher recognized the poor MSE performance of the sample mean and sample variance except in a highly confined subset of Pearson curves concentrated about the normal distribution. Two significant theoretical spin-offs that came from EDA were Huber's concept of robustness together with Hampel's formulation of influence functions and Efron's renovation of some resampling techniques. Huber, through the use of influence functions, developed methods that were robust (i.e., insensitive) to families of distributions within a neighborhood of the assumed distribution. Efron, with the novel bootstrap resampling technique, introduced a valuable procedure that offered an alternative to the jackknife estimator, although the latter may have smaller MSE.
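A minimal sketch of the bootstrap idea mentioned above (the data and the chosen statistic are illustrative): resample the observed data with replacement and use the spread of the recomputed statistic as an estimate of its standard error.

```python
import numpy as np

def bootstrap_se(x, estimator, reps=2_000, seed=0):
    """Bootstrap standard error: resample the data with replacement and recompute."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    stats = [estimator(rng.choice(x, size=x.size, replace=True)) for _ in range(reps)]
    return np.std(stats, ddof=1)

# Illustrative data: standard error of the sample median.
rng = np.random.default_rng(8)
x = rng.normal(0.0, 1.0, size=50)
print(f"bootstrap SE of the median ~= {bootstrap_se(x, np.median):.3f}")
```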
Exam        Consistent   Overconfident   Procrastinator
1               72             96              60
2               75             94              65
3               74             12              59
4               73             92             110
Average        73.5           73.5            73.5
Median         73.5           93              62.5
To illustrate the concerns of Tukey, consider the following example. Example 1.1.3 The data in the table above represent the scores on four examinations for three different students: one who is consistent, one who is overconfident, and one who is a procrastinator. The question concerns what grade should be assigned to each student. To those who advocate sample averages, it is easy to see that each student has an average exam score of 73.5 and should therefore receive the same grade. However, if we remember that the purpose of a grade is to provide the best indicator of a student's performance, we will find ourselves in a dilemma. Clearly the overconfident student is capable of better work than a 12 on Exam 3, but that is precisely his score. The grade of 12 is considered an outlying observation that is not representative of the student's performance. Researchers in influence functions have produced estimation procedures which are robust to one's assumption about the true unknown distribution of the grades. A simplistic interpretation of their research is that sample medians tend to be better indicators of true performance because our knowledge of the distribution of grades is at best imprecise. Using the sample median, the consistent student's grade remains at 73.5, but the overconfident student's grade increases to 93. The contrast in the estimated grades for the latter student is remarkable. However, before students begin demanding a robust estimator of their grade, they should consider the grade of the procrastinator. This student has precisely the same average as the other two students, but the median grade is substantially lower. The question surfaces: what grade should the procrastinating student receive? Is the sample median a Pitman-closer estimator than the sample mean? Eisenhart, Deming, and Martin (1963) initiated such inquiries via simulation of the comparison of the sample mean with the sample median for the Gaussian, Laplace, uniform, Cauchy, hyperbolic
secant, and hyperbolic secant-squared distributions. This numerical study strongly suggests that the sample median may be a more robust choice across a broad spectrum of distributions. This kind of example shows the impossibility of describing a student's performance on a set of tests by any one number. Similarly, random loss cannot be identified with any single value (see §3.1.2).
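A simulation in the spirit of the Eisenhart, Deming, and Martin (1963) comparison (our own sketch; sample size and scale choices are arbitrary) estimates the Pitman closeness of the sample median relative to the sample mean under several error laws.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps, theta = 15, 100_000, 0.0
samplers = {
    "Gaussian": lambda s: rng.normal(theta, 1.0, s),
    "Laplace":  lambda s: rng.laplace(theta, 1.0, s),
    "uniform":  lambda s: rng.uniform(theta - 1.0, theta + 1.0, s),
    "Cauchy":   lambda s: theta + rng.standard_cauchy(s),
}
for name, draw in samplers.items():
    x = draw((reps, n))
    mean_err = np.abs(x.mean(axis=1) - theta)
    med_err = np.abs(np.median(x, axis=1) - theta)
    print(f"{name:8s}  P(median closer than mean) ~= {np.mean(med_err < mean_err):.3f}")
# The ordering flips with the tail weight of the error distribution.
```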
1.1.8 Future directions

Many other estimation techniques exist in addition to the major ones discussed above, and ongoing research continues to advance the evolution of estimation theory. While our concern is primarily directed toward useful criteria for the comparison of these estimators, it is worthwhile to reflect on some comments by Efron (1978) on these statistical foundations. He states:

The field of statistics continues to flourish despite, and partly because of, its foundational controversies. Literally millions of statistical analyses have been performed in the past 50 years, certainly enough to make it abundantly clear that common statistical methods give trustworthy answers when used carefully. In my own consulting work I am constantly reminded of the power of the standard methods to dissect and explain formidable data sets from diverse scientific disciplines. In a way this is the most important belief of all, cutting across the frequentist-Bayesian divisions: that there do exist more or less universal techniques for extracting information from noisy data, adaptable to almost every field of inquiry. In other words, statisticians believe that statistics exists as a discipline in its own right, even if they can't agree on its exact nature.

In Chapters 5 and 6, we introduce optimum estimators in the PMC sense. We show that these estimators have not only a frequentist interpretation but a Bayesian one as well. Hence, this inquiry into the Pitman closeness criterion produces a "best" estimator, which transcends the frequentist-Bayesian division. Moreover, these methods are shown to be of practical value in such diverse disciplines as political science, economics, marketing, quantum physics, engineering, clinical sciences, and bioassay.
1.2 PMC comes of age
Pitman's measure of closeness has existed as a criterion in statistical estimation for more than fifty years. However, its development has progressed at a much slower rate than the discipline itself. In §2.1, we will see that this delayed development is due primarily to researchers' need to develop inferential procedures that had superior mathematical properties to their predecessors or provided reasonable estimators in a wider variety of estimation problems. A brief discussion of the origins of PMC will help us better understand the role it plays for the statistics community.
1.2.1 PMC: A product of controversy
In the 1930s, an extensive controversy arose between Karl Pearson and Ronald Fisher concerning the merits of the minimum chi-square and the maximum likelihood estimators. In a 1936 Biometrika paper, Pearson defended the minimum chi-square procedure and noted that Fisher's MLE did not give "better" results in an example where a distribution is fitted to a set of data in two separate ways: where the parameters are estimated by minimum chi-square and MLE. The fitted distributions were then compared using a chi-square goodness-of-fit test. Pearson questioned the value of the MLE and Fisher's definition of the "best" estimate of a parameter in the following quotation: But the main point in this paper is to suggest to Professor Fisher that he should state the method of likelihood has taken hold of so many of the younger generation of mathematical statisticians, wherein he conceives it to give "better" results than the method of minimum chi-square. The latter has a firmer basis and gives the practical statistician what he requires, the "best" fitting curve to his data according to chi-square. But what does the maximum likelihood give him? If it gives the "best" parameters to the frequency curves, how does he define "best"? It cannot be because it gives the minimum [obviously maximum meant] likelihood, for that would be arguing in a circle. He must define what he means by "best," before he can prove that the principle of likelihood provides it. Pearson suggests that Fisher admit that he has converted many younger statisticians into users of maximum likelihood with the promise that it gives
better results than the method of minimum chi-square. It is ironic that while Pearson warned Fisher against circular reasoning, Pearson's explicit assumption of what is best was equally circular in nature. For the purpose of our book, Pearson's quotation is essential because it is the prime mover in Pitman's endeavor to provide an impartial basis of comparison of the method of minimum chi-square with that of maximum likelihood. Pitman proposed PMC as a criterion that is intuitive, natural, and easily understood. Pearson's article is cited in the first sentence of Pitman's cornerstone article that appeared in 1937. Without this controversy, Pitman might not have proposed the criterion. Over forty years after the Pearson-Fisher controversy, Berkson and Rao engaged in a more refined debate over the usefulness of minimum chi-square as a criterion in estimation. Berkson claimed that by any traditional criterion, the procedure of minimum chi-square estimation would produce a "better" estimator than the MLE. Just as the Pearson-Fisher controversy precipitated Pitman's suggestion of PMC as a basis for the comparison of estimators, the Rao-Berkson controversy precipitated Rao's appeal to other criteria (including PMC) as methods for the comparison of estimators. In §3.5, we provide empirical evidence from Berkson's 1955 bioassay problem that the outcome of this comparison via the Pitman closeness criterion is mixed over the parameter space. What have we learned from these controversies? We, as well as many other researchers, have found answers in a multitude of desirable properties associated with PMC. This book is written to share these findings with a broader spectrum of the scientific community. The methods given here provide reasonable estimators in practical problems which lack well-founded estimation techniques. The development of Pitman-closest estimators is an earnest effort to present the "best" of what the Pitman closeness criterion has to offer the practitioner.
1.2.2 PMC as an intuitive criterion
Pitman's criterion has been studied by many generations of statisticians. Geary (1944), Scheffe (1945), Johnson (1950), Zacks (1965), Blyth (1972), David (1973), Efron (1975), and Rao (1980) are the principal scholars who kept the criterion alive until the wealth of information about PMC was sufficient for it to be regarded on its own merits. Although several developments in estimation theory, especially in the 1940s (see §2.1), rightfully overshadowed the concept of PMC, many articles
have been written about it in the past twenty years. The relevance of PMC, as discussed in Efron's (1975) article, is illustrated by the experience of a reliability engineer at Bell Helicopter Textron in 1976. The junior engineer was asked by the Chief Engineer of Research and Development to discuss different methods for estimation of the mission reliability of a critical AH-1J (the Cobra) attack helicopter component. The data in question came from an experiment that had to be curtailed due to cost overruns. As such the data were considered as Type II censored data from an exponential failure model. The reliability engineer presented discussions of the MLE and the UMVUE of the mission reliability under this exponential assumption. The MLE procedure was part of MIL-STD 781, which was developed in the 1950s. Thus, the reliability engineer's suggestion that the UMVUE might perform better in small samples was met with criticism from the project engineers. Nonetheless, they raised a multitude of valid questions about his choice. They objected to the fact that the UMVUE could provide an estimated mission reliability of zero even though every component in the sample had a positive chance of completing the mission. They disliked the fact that if they took the estimated mission reliability and back-solved for the estimated failure rate, the result disagreed with the reliability engineer's best estimate of the failure rate obtained by the same procedure. These practical questions raised by the engineers illustrated the point that the UMVUE was not necessarily range-preserving, nor was it invariant under transformations. The engineers did like the fact that the UMVUE gave a higher estimate of the mission reliability than the MLE for the AH-1J data. Faced with these conflicting views, the Chief Engineer of Research and Development asked the reliability engineer to determine which of the two estimators gave the "better" (i.e., closer) estimate more often. Although his question was stated simply, the solution was not evident and could not be provided for several months.

Example 1.2.1 To present the more technical details of this example, let $X_1, \ldots, X_n$ be $n$ independent and identically distributed exponential random variables with common density function

$$f(x; \lambda) = \lambda e^{-\lambda x}, \qquad x > 0.$$
The parameter $\lambda$ is known as the failure rate of the process. The values $x_1, \ldots, x_n$ are the lifetimes of the $n$ components in the study. The time $t_0$ is the mission time for which the component is to be used. The reliability at the time $t_0$ is given by

$$R(t_0) = \Pr[X > t_0] = e^{-\lambda t_0}.$$
The maximum likelihood estimator $\hat{R}_L(t_0)$ is given by

$$\hat{R}_L(t_0) = e^{-t_0/\bar{X}},$$
where $\bar{X}$ is the average lifetime of the $n$ observations in the sample. The uniformly minimum variance unbiased estimator is given by

$$\hat{R}_U(t_0) = \left(1 - \frac{t_0}{n\bar{X}}\right)^{n-1}, \qquad \bar{X} > t_0/n.$$
If $\bar{X} < t_0/n$, then $\hat{R}_U(t_0) = 0$. However, in the asymptotic setting, we can show that

$$\hat{R}_U(t_0) - \hat{R}_L(t_0) \xrightarrow{\;P\;} 0 \qquad \text{as } n \to \infty.$$

The reliability engineer calculated the values of $\mathbb{P}(\hat{R}_L(t_0), \hat{R}_U(t_0) \mid R(t_0))$, which can be found in Dyer, Keating, and Hensley (1979). However, he was not content with the outcome that the comparison was mixed over the parameter space. After completing the requested pairwise comparison, the junior engineer found that the 50% lower confidence bound on $R(t_0)$ uniformly outperformed the MLE and UMVUE in the sense of Pitman's closeness criterion. Also, the 50% lower confidence bound (which is a median unbiased estimator in this case) was range preserving and the back calculation of the estimated failure rate produced a 50% upper confidence bound on $\lambda$. Thus, this estimator was better than the standard ones and did not have the counterintuitive complications raised by the engineers. Due to the success of his techniques the junior engineer was moved from Reliability Engineering into Research and Development. He continued adding distributions to the list for which the 50% confidence bound outperformed the MLE and UMVUE under the Pitman closeness criterion.

PMC remains a reasonable alternative to MSE because of persistent problems in conventional estimation procedures and because among these "good" available estimators it provides a simple, readily understood way of choosing between two alternatives. The Pearson-Fisher example, the Rao-Berkson example, and the reliability engineer's example arise because of specific problems inherent to the application (see §§3.5 and 4.4, respectively, for detailed discussions). In irregular cases, such as the uniform distribution in §1.1, we would find it to be an appealing alternative. However, PMC
has arrived because of the compelling result that PMC gives rise to an optimal estimator (within a broad class) related to the intrinsic property of median unbiased estimation. The importance of this result is reflected in the fact that we devote Chapter 5 to synthesizing the major works on optimality by Ghosh and Sen (1989) and (1991), Kubokawa (1991), and Nayak (1990). PMC's existence does not rely on moments of the underlying family of distributions and it has origins in diverse areas of estimation such as Bayesian, influence-function-based, and conditional estimation procedures. These results, which are given in Chapter 5, were more readily accepted due to PMC's ability to explain the Stein effect in multivariate analysis (see Sen, Kubokawa, and Saleh (1989)).
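Returning to Example 1.2.1, the comparison the junior engineer carried out can be sketched by simulation. The code below uses a complete (uncensored) exponential sample rather than the Type II censored data of the original problem, and the failure rate, mission time, and sample size are made-up values; the 50% lower confidence bound is computed from the chi-square pivot $2\lambda n\bar{X} \sim \chi^2_{2n}$.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(10)
lam, t0, n, reps = 0.2, 2.0, 10, 200_000      # made-up failure rate, mission time, sample size
R_true = np.exp(-lam * t0)

x = rng.exponential(scale=1.0 / lam, size=(reps, n))
T = x.sum(axis=1)                                             # total time on test
R_mle = np.exp(-t0 * n / T)                                   # exp(-t0 / Xbar)
R_umvue = np.where(T > t0, (1.0 - t0 / T) ** (n - 1), 0.0)    # UMVUE, zero when Xbar < t0/n
R_lcb = np.exp(-t0 * chi2.ppf(0.5, 2 * n) / (2.0 * T))        # 50% lower confidence bound

print("P(LCB closer than MLE)   ~=",
      np.mean(np.abs(R_lcb - R_true) < np.abs(R_mle - R_true)))
print("P(LCB closer than UMVUE) ~=",
      np.mean(np.abs(R_lcb - R_true) < np.abs(R_umvue - R_true)))
```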
1.3 The scope of the book
When several different estimators are available for comparison, it is natural to ask which one is best. The answer depends upon many items such as the use to be made of the chosen estimator and the loss incurred in selecting the wrong one. Judging the "goodness" of an estimator requires criteria for making this determination. Some of these criteria, such as MSE, are very popular among researchers due to their experience in using them. Others, such as PMC, are less frequently used due to a lack of general familiarity with the criterion and its associated properties.
1.3.1 The history, motivation, and controversy of PMC
The first three chapters of this book focus on the basic concepts, history, controversy, paradoxes, and examples associated with the PMC criterion. The attributes as well as the shortcomings of PMC are discussed in an attempt to impart a balanced perspective. Chapters 2 and 3 contain illustrations of the use of PMC in practical estimation problems, which arise in such diverse disciplines as political science, economics, athletics, engineering, sociology, marketing, and game theory. Chapter 2 contains discussions on the intuitive, intrinsic, and simplistic characteristics of PMC. We begin with a discussion of the history of PMC and introduce the concepts of risk, the competition between PMC and MSE, and the role of loss functions in the comparison of estimators. These results are illustrated through realistic estimation problems and presented with a limited amount of technical detail.
However, the PMC criterion is not presented as a panacea to be used in all estimation problems. Throughout Chapter 3, we discuss the shortcomings of PMC and provide a sort of Surgeon General's Warning about its use (as suggested by Efron (1975)). The lack of transitiveness of PMC is discussed and the paradoxical consequences are illustrated. We compare preferences obtained by the use of PMC with those obtained by other methods whenever the comparison provides insight into the estimation process or clear weaknesses in other criteria. Through simple practical problems the reader can discover both the usefulness of PMC and the caveats associated with its limitations. Chapter 3 explores the many controversies and paradoxes associated with PMC and emerges with a unified conclusion that echoes the sentiments of Johnson and Rao about the usefulness of different criteria in estimation.
1.3.2 A unified development of PMC
The last three chapters of this book contain a current account of theoretical and mathematical research in PMC. The examples used to illustrate and support these theoretical findings are either related to the controversies put forth in the first part of the book or are important practical examples from the physical and engineering sciences. In Chapter 4, the role of PMC in pairwise comparisons is developed through a sequence of successive generalizations that expand the class of estimation problems for which PMC can be computed. An extremely useful result is given whenever competing estimators are functions of the same univariate statistic, which happens frequently in the presence of a sufficient statistic. Chapter 5 contains a general development of Pitman-closest estimators through the use of topological groups and group invariant structures. The results are unified through the works of Ghosh and Sen (1989), Nayak (1990), and Kubokawa (1991) and result in a Rao-Blackwell-type theorem for PMC. An alternative posterior Bayesian interpretation of PMC produces a posterior Pitman-closest estimator from the work of Ghosh and Sen (1991). Linear estimation is a rich source of estimators and special methods are developed to allow us to compare different linear estimators. An optimal linear estimation procedure is given which relates the Kagan-Linnik-Rao form of the Pitman estimators to medians obtained by conditioning on ancillary statistics. Chapter 6 contains the important asymptotic properties that can be obtained for PMC and Pitman-closest estimators. BAN estimators are shown
to be Pitman-closest within a broad class of consistent asymptotic normal (CAN) estimators. The Pitman-closest estimators derived in Chapter 5 are examined in Chapter 6 but in an asymptotic framework. Under suitable regularity conditions these estimators are shown to be not only first-order efficient in the sense of Fisher but also second-order Pitman-closest. These latter results are made possible by a first-order asymptotic normal representation for the Pitman-closest estimators (Sen (1992b)). This representation is also useful in the asymptotic comparison of Pitman estimators, linear estimators, and MLEs. We conclude this chapter with a brief remark on multiparameter estimation problems, which have received much of the impact of PMC during the past six years. The role of a loss function is far more complex in a multiparameter estimation problem, and the mathematical complexities involved in the study of optimal properties of estimators make it difficult for us to include these developments in this volume. In the multiparameter context, Stein (1956) presented the paradox that even if each coordinate of a multivariate estimator is marginally optimal for the corresponding coordinate of the parameter, the vector-valued estimator may not be jointly optimal. For the normal mean vector, James and Stein (1962) constructed a shrinkage estimator that dominates the MLE in quadratic risk. This remarkable finding led to a steady growth of results in multiparameter estimation and the Pitman closeness criterion has contributed as a viable criterion in this context. Sen, Kubokawa, and Saleh (1989) and Sen (1992a) contain useful accounts of these related developments. We hope that readers interested in studying these more advanced topics will consult the references at the end of this book.
Chapter 2
Development of Pitman's Measure of Closeness

The concepts underlying Pitman's measure of closeness (PMC) are tied to both an intuitive and a philosophical understanding of risk. These issues can become complicated as we consider pairwise and multiple comparisons of estimators. Nevertheless, the basic ideas are easy to understand and are tied closely to comparison examples that occur daily in our lives. This chapter begins with a discussion that traces the historical development of PMC and the motivation for its use as an alternative to MSE. We consider the more popular criterion of minimum mean squared error (MSE) in order to place PMC in its proper perspective. This is particularly pertinent in view of the strong emphasis given to MSE in the statistical literature. We explore the concept of risk as well as our understanding of it. Risk is shown to be sensitive to the choice of the underlying loss function. Illustrations are given of situations where MSE does not exist and where PMC is a helpful alternative. Finally, we discuss when an estimator is compared both to an absolute ideal and to other estimators, to show the usefulness of joint versus marginal information.
2.1 The intrinsic appeal of PMC
An exploration of the historical development of Pitman's measure of closeness will be useful in our understanding of its merits. To do this, however, we will need to look at the major statistical criterion used in comparing estimators, namely, minimum mean squared error. Then these two procedures, MSE and PMC, will be compared and contrasted.
2.1.1 Use of MSE
The comparison technique based on minimizing the mean squared error has an extensive history. It was first discussed by Gauss, who stated the following in a paper that was read to the Royal Society of Göttingen, Germany, on February 15, 1821 (see Farebrother (1986)): From the value of the integral ∫_{−∞}^{∞} x φ(x) dx, i.e., the average value of x (defined as the deviation of the estimator from the true value of the parameter) we learn the existence or non-existence of a constant error as well as the value of this error; similarly, the integral ∫_{−∞}^{∞} x² φ(x) dx, i.e., the average value of x², seems very suitable for defining and measuring, in a general way, the uncertainty of a system of observations. ... If one objects that this convention is arbitrary and does not appear necessary, we readily agree. The question which concerns us here has something vague about it from its very nature, and cannot be made really precise except by some principle which is arbitrary to a certain degree. ... It is clear to begin with that the loss should not be proportional to the error committed, for under this hypothesis, since a positive error would be considered as a loss, a negative error would be considered as a gain; the magnitude of loss ought, on the contrary, to be evaluated by a function of the error whose value is always positive. Among the infinite number of functions satisfying this condition, it seems natural to choose the simplest, which is, without doubt, the square of the error, and is the way proposed above. The MSE methodology stemmed directly from Laplace's earlier work in 1774 on estimation based on minimization of the mean absolute error of estimation. An MMSE estimator is defined as follows. Definition 2.1.1 (Minimum Mean Squared Error (MMSE)). If θ is a parameter to be estimated and θ̂_1, ..., θ̂_k are competing estimators, the estimator with the smallest MSE is found by selecting θ̂_j such that

E[(θ̂_j − θ)²] ≤ E[(θ̂_i − θ)²]  for all i = 1, ..., k.
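A minimal simulation sketch of Definition 2.1.1: approximate the MSE of several competing estimators and select the one with the smallest value. The normal sampling model and the particular estimators (sample mean, sample median, midrange) are illustrative choices and not part of the definition itself.

```python
import numpy as np

def empirical_mse(estimator, theta=0.0, n=15, reps=50_000, seed=2):
    """Approximate E[(theta_hat - theta)^2] under N(theta, 1) sampling."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=theta, scale=1.0, size=(reps, n))
    return np.mean((estimator(samples) - theta) ** 2)

estimators = {
    "mean": lambda x: x.mean(axis=1),
    "median": lambda x: np.median(x, axis=1),
    "midrange": lambda x: 0.5 * (x.min(axis=1) + x.max(axis=1)),
}

mses = {name: empirical_mse(est) for name, est in estimators.items()}
best = min(mses, key=mses.get)  # the empirical MMSE choice of Definition 2.1.1
print(mses, "->", best)
```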
The basis of the MMSE estimator in decision theoretic settings is its minimization under a squared error loss function. Frequently, the MSE comparison does not produce one estimator that is uniformly better for all θ. Indeed, a uniformly MMSE estimator is impossible within an unrestricted class of estimation rules. It is very popular, however, less because of its practical relevance and more because of its tradition, mathematical convenience, and simplicity (to summarize Gauss). This fact has been noted by several authors, including Karlin (1958), who gave the following explanation of its merits: The justification for the quadratic loss as a measure of the discrepancy of an estimate derives from the following two characteristics: (i) in the case where a(x) represents an unbiased estimate of h(ω), MSE may be interpreted as the variance of a(x) and, of course, fluctuation as measured by the variance is very traditional in the domain of classical estimation; (ii) from a technical and mathematical viewpoint square error lends itself most easily to manipulation and computation. Due to its many attractive features, the criterion of MSE has been ingrained in statistical education. Although numerous estimation techniques such as maximum likelihood and minimum chi-square exist, few are as readily advocated and well accepted as the MMSE estimator. Another popular attribute of the MSE criterion can be developed if we consider a smooth loss function in a neighborhood of the parameter. Then for consistent estimators, MSE is a natural risk function in large samples. To illustrate this point, consider the following loss function

ℒ(θ̂, θ) = ρ(θ̂ − θ)
with the properties:
(i) ρ(0) = 0 (i.e., whenever θ̂ = θ no loss is incurred);
(ii) ρ′(0) = 0 (i.e., the loss function is smooth in a neighborhood of 0);
(iii) ρ(x) is nondecreasing for x > 0 and nonincreasing for x < 0;
(iv) ρ^(k)(0) exists for each k = 0, 1, 2, ....
Then the MacLaurin series expansion of ρ(θ̂ − θ) is given by

ρ(θ̂ − θ) = ρ(0) + ρ′(0)(θ̂ − θ) + (1/2)ρ″(0)(θ̂ − θ)² + (1/6)ρ‴(0)(θ̂ − θ)³ + ⋯,
where the first two terms vanish due to properties (i) and (ii) of the loss function. If θ̂ is a consistent estimator, θ̂ tends to be close to θ in large samples; consequently, the cubic, quartic, and subsequent terms of the above series become negligible. This yields the result that for large sample sizes, a smooth loss function for the consistent estimator θ̂ in a neighborhood of θ is given by

ρ(θ̂ − θ) ≈ (1/2)ρ″(0)(θ̂ − θ)²,
which is the quadratic loss function. (Note that ρ″(0) > 0 is not guaranteed by property (iii) alone but requires that ρ(x) be concave upward in a neighborhood of the origin.) This approximation certainly provides the asymptotic appeal for MSE. In the asymptotic setting, comparisons made using the Pitman closeness criterion will be shown to be concordant with those obtained by MSE (see §6.2). With small sample sizes, however, the higher-order terms in the above series are not necessarily negligible and thus the quadratic loss function may not be appropriate. Condition (iv) can be relaxed so that a broader class of loss functions may be included in this appeal to MSE. If we replace condition (iv) with the condition that ρ″(x) exists and is continuous for all x, then the second-order Taylor expansion of ρ(θ̂ − θ) is given by

ρ(θ̂ − θ) = (1/2)ρ″(h(θ̂ − θ))(θ̂ − θ)²,
where 0 < h < 1. With small sample sizes, however, setting a = h(θ̂ − θ), ρ″(a) − ρ″(0) may not be negligible. This modification allows us to include loss functions of the form ρ(y) = |y|^r, where r = 2 + δ for some 0 < δ < 1. Another popular loss function is absolute error loss, defined by

ρ(θ̂ − θ) = |θ̂ − θ|,
and the estimator for which the expected value of |θ̂ − θ| is smallest among the competing estimators is known as the minimum mean absolute deviation (MMAD) estimator. Note that this loss function does not satisfy property
(ii) since ρ′(0) does not exist. Hence the MacLaurin expansion is not applicable in this case. In a similar way loss functions defined as powers of the absolute loss may violate property (iv) as well, thus making the MacLaurin expansion even less plausible. Another problem with MSE is that, even in large samples, since MSE may heavily weight large errors, it will not necessarily provide for high probabilities that the estimator is near the true value of the parameter. This result has been recognized by many authors (e.g., Rao (1981)) and has led to much controversy about the use of MSE as a criterion. The following simplistic example illustrates one such criticism by Rao (1981) that quadratic and higher-order loss functions may place "... undue emphasis on large deviations which occur with small probability." Example 2.1.2 Suppose a tour bus company conducts excursions for school children who must pay the driver $2.00 for each trip. Two different tours are available. The first driver, an experienced but somewhat lazy person, knows that the schools always send groups in multiples of ten students. Rather than count each student, he mainly watches the loading process and allows students to deposit their own $2.00 fares in the token box. Since he knows the bus holds exactly 50 students, he counts the number of apparently empty seats using his overhead mirror. He then subtracts the count from 50 and rounds to the nearest 10 to obtain an estimate of the number of students on the bus. His count is made inaccurate by children whose heads are not seen, who move during the count, who sit more than two to a seat, and so forth. In contrast, a second driver meticulously collects each fare and individually counts the students as they pay their money. However, he always pockets $2.00 before depositing the collected fares in the token box. He then provides an estimated count one less than the true number. Let θ̂_1 represent the estimate given by the first bus driver and θ̂_2 the estimate given by the second bus driver of the unknown student count θ. Suppose θ̂_1 equals θ with probability .99 and equals θ ± 10 with probability .01. Thus θ̂_1 errs (in absolute value) by 10 with a frequency of only 1%. Suppose also that θ̂_2 equals θ − 1 with probability 1. Define the risk R_1(q) associated with the first driver's estimate as

R_1(q) = E|θ̂_1 − θ|^q = (.01)(10)^q,
whereas the risk R_2(q) in using the second driver's estimate is

R_2(q) = E|θ̂_2 − θ|^q = 1,
where q > 0. If one is certain about the "cost" of erring and therefore knows q, then the choice of which estimator to use is clear. When q = 1, R_1(1) = 0.1 and R_2(1) = 1, implying that with respect to absolute risk, θ̂_1 is the preferred estimator as the risk associated with θ̂_1 is the smaller of the two. However, when q = 2, the "infrequent" but large error associated with θ̂_1 (a result of the first bus driver rounding the count to the nearest 10 students) is squared so that R_1(2) = 1 and R_2(2) = 1. The estimates now are equivalent with respect to MSE. With higher-order loss functions, such as q = 3, a reversal of preference occurs and θ̂_2 is the choice. As can be seen from this example, MSE and higher-order loss functions place heavy emphasis on the error in estimation made by the first bus driver. Yet such an error has a frequency rate of only 1%! In protecting against large errors, the MSE criterion fails to choose the estimator which is more frequently closer to the true count θ. To find this probability we compute

Pr(|θ̂_1 − θ| < |θ̂_2 − θ|) = Pr(θ̂_1 = θ) = .99.
Therefore, θ̂_1 provides a closer estimate of θ than θ̂_2 with a frequency of 99%. More importantly, this choice remains the same regardless of the form of the loss function. Example 2.1.2 is counterintuitive for the mean squared error criterion in that θ̂_1 is exactly correct with 99% frequency, whereas θ̂_2 always errs by exactly 1. From a practical point of view, if one worked for a company and had such drastic odds, which estimator would one use to estimate an unknown parameter, θ, such as the fraction defective? For a realistic example of this simplified situation, the reader is referred to the problem of estimation of the fraction defective in the normal distribution in Example 4.5.2. Consider this comparison when the true fraction defective is .05. The outcome of Example 2.1.2 should not be overemphasized. Consider two pharmaceutical machines that fill a medication, such as Accutane, Digitalis, or Lithium, which can be toxic if taken in excessive amounts. The first machine produces a 40 mg capsule of Accutane with 99% frequency but produces a 60 mg capsule with 1% frequency. The latter dose can be toxic with such severe side effects as blindness and irreversible liver damage. The second filling machine always produces a capsule with 39 mg of Accutane. In this case the first machine produces a capsule closer to the doctor's specifications with a relative frequency of 99%. However, the consequences of an overdose are so severe that even a 99% better fill rate does not justify use of the first filling machine.
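The arithmetic of Example 2.1.2 is simple enough to check directly. The short script below reproduces the risks R_1(q) and R_2(q) for q = 1, 2, 3 and the closeness probability of .99.

```python
# Exact risks and closeness probability for Example 2.1.2:
# driver 1 errs by 0 with probability .99 and by 10 with probability .01;
# driver 2 always errs by exactly 1.
def risk_driver1(q):
    return 0.99 * 0 ** q + 0.01 * 10 ** q   # E|theta1_hat - theta|^q

def risk_driver2(q):
    return 1.0 ** q                          # E|theta2_hat - theta|^q

for q in (1, 2, 3):
    print(f"q={q}: R1={risk_driver1(q):6.2f}  R2={risk_driver2(q):4.2f}")
# q=1: driver 1 preferred; q=2: the risks tie; q=3: driver 2 preferred.

# Pitman closeness: driver 1 is strictly closer exactly when his count is exact.
print("Pr(|e1| < |e2|) =", 0.99)
```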
The following example by Robert, Hwang, and Strawderman (1993) exhibits a much different problem from that suggested by Rao in that the estimator θ̂_1 has large deviations from the target parameter θ but the probability of occurrence, which equals .49, is reasonably large. Example 2.1.3 Let θ̂_1 and θ̂_2 be two estimators of θ ∈ ℝ such that θ̂_1 equals θ with probability .51 and θ ± 10^23 with probability .49, while θ̂_2 = θ ± .1 with probability 1. Then it follows that

Pr(|θ̂_1 − θ| < |θ̂_2 − θ|) = .51.
Therefore, θ̂_1 is Pitman-closer than θ̂_2 in estimation of θ but "errs badly almost half the time." Whenever p > (1/24) log_10(2),

E|θ̂_2 − θ|^p < E|θ̂_1 − θ|^p.
This shows that θ̂_2 is preferred to θ̂_1 for almost any ℓ_p-loss function because θ̂_1 errs "badly" with probability .49. Note that, even in this example (despite the enormous deviation given to the Pitman-closer estimator in cases where it is not preferred), there exist values of p (< (1/24) log_10(2)) for which the estimator preferred under Pitman's criterion coincides with that obtained through risk. Nonetheless, Robert et al. (1993) illustrate that the appeal of the Pitman closeness criterion lies in its basis as a majority preference rule. The importance of this observation and the subsequent role of the Pitman closeness criterion in game theory are discussed in §3.2. This example illustrates that when we have firm knowledge of the relative seriousness of the possible errors, our use of PMC may be inappropriate. Like other criteria, PMC is not universally applicable. The context of these examples provides no overwhelming endorsement of either criterion but can serve to manifest complications with both. It does serve to demonstrate that the choice of the loss function should be neither capricious nor automatic. Many statisticians concur with this conclusion, which seems consistent with the stance taken by Rao (1991).
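The threshold cited above can be checked from the reconstructed distributions in Example 2.1.3 (reading the large deviation as 10^23). The risks under ℓ_p loss cross at a very small value of p, below which the Pitman-preferred estimator also has the smaller risk.

```python
import numpy as np

# l_p risks for Example 2.1.3: theta1_hat errs by 0 w.p. .51 and by 1e23 w.p. .49;
# theta2_hat errs by 0.1 with probability 1.
def risk1(p):
    return 0.49 * 10.0 ** (23 * p)

def risk2(p):
    return 0.1 ** p

# Crossover: 0.49 * 10^(23p) = 10^(-p)  =>  p = log10(1/0.49)/24, roughly (1/24) log10(2)
p_star = np.log10(1 / 0.49) / 24
print("crossover p =", p_star)
print("p = 0.005:", "theta1 smaller risk" if risk1(0.005) < risk2(0.005) else "theta2 smaller risk")
print("p = 0.05 :", "theta1 smaller risk" if risk1(0.05) < risk2(0.05) else "theta2 smaller risk")
```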
2.1.2 Historical development of PMC
Pitman's (1937) measure of closeness is based on the probabilities of the relative closeness of competing estimators to an unknown parameter. For example, a confidence interval is a measure of the concentration (closeness)
of an estimator about the true value of the parameter. PMC can be used to define a Pitman-closer estimator and a Pitman-closest estimator within a class D. Definition 2.1.4 (Pitman-Closer Estimator). If θ̂_1 and θ̂_2 are estimators of a common parameter θ, then θ̂_1 is said to be a Pitman-closer estimate than θ̂_2 if

ℙ(θ̂_1, θ̂_2 | θ) = Pr(|θ̂_1 − θ| < |θ̂_2 − θ|) ≥ 1/2
for all values of θ, with strict inequality for at least one θ ∈ Ω. Definition 2.1.5 (Pitman-Closest Estimator). Let D be a nonempty class of estimators of a common parameter θ. Then θ̂* is Pitman-closest among the estimators in D provided, for every θ̂ ∈ D such that θ̂ ≠ θ̂* (a.e.),

ℙ(θ̂*, θ̂ | θ) ≥ 1/2
for all values of θ, with strict inequality for at least one value of θ. For a variety of technical reasons, there has been a general lack of interest in PMC. Pitman originally showed that the measure was intransitive. Blyth (1972) also noted this problem, as well as some paradoxes illustrating the inconsistency of PMC in selecting estimators based on pairwise or simultaneous comparisons. There were difficulties in evaluating the probability statement in Definition 1.0.1 and the accompanying theory was not of a sufficiently general nature to make the measure useful. The controversy between Berkson (1980) and Rao (1980), as mentioned in §1.2.1, over the use of MSE as a criterion in estimation sparked new interest in PMC as an alternative measure. Rao (1981) successfully argued for PMC as an intrinsic measure of acceptability and presented many diverse univariate examples in which shrinking the MSE of an unbiased estimator to an MMSE estimator did not yield a better estimator in the sense of PMC. Keating (1983) observed a similar phenomenon for percentile estimators under absolute error risk and he later framed Rao's observations in the context of a risk unbiased estimator. He showed that the phenomenon holds uniformly for certain estimation problems under absolute error risk. With this renewed interest in PMC has come more understanding of the methodology. Keating and Mason (1985b) provide some intuitive examples that illustrate and clarify the paradoxes raised by Blyth. These same
authors (1988b) use PMC to give an alternative perspective to James-Stein estimation. Sen, Kubokawa, and Saleh (1989) shed light on Stein's paradox in the sense of PMC. Many other illustrations of the expanded attention being given to PMC can be seen by scanning the titles of the references at the end of this book. Readers will find Pitman's (1937) seminal paper to be essential reading in the discussions that follow. It has been conveniently reprinted in a 1991 special issue of Communications in Statistics-Theory and Methods, A20(11), devoted to Pitman's measure of closeness. Pitman defined the criterion, presented examples where the MLE might be misleading, articulated the central role played by the median in his criterion, introduced the notion that we know as equivariance, and recognized the intransitive nature of pairwise comparisons under his criterion. In the latter regard the reader will undoubtedly be surprised at Pitman's declaration: "In any case, this (intransitiveness) does not detract from the definition of closest." Although Pitman gave no rationale for this declaration, he was clearly convinced of the lack of necessity for transitiveness whenever a Pitman-closest estimator existed. His ability to segregate the concept of a Pitman-closest estimator among a class from that of a Pitman-closer estimator between a pair reflects an intuition which places his understanding of the criterion years ahead of his colleagues. In this regard, it suffices to say that more than five decades passed before the mathematical foundations of all the concepts introduced by Pitman were completely established. In the historical discussion which follows, we trace various ideas of Pitman to their ultimate solution in mathematical rather than chronological order. The ingenious nature of Pitman's original work was given little attention in its day. Shortly after the appearance of Pitman's work, Germany precipitated World War II with its invasion of Poland in 1939. Obviously many statistical pursuits were diverted to support the war effort. More importantly, between 1940 and 1950 many theoretical advances in decision theory were made, including the Frechet-Cramer-Rao inequality, the Rao-Blackwell Theorem, and the Lehmann-Scheffe Theorem. These hallmark results based on the criterion of mean squared error clearly dominated the statistical literature at the time and for decades to follow. A review of Pitman's personal research shows he derived the Pitman (1939) estimators (PE) of location and scale based on squared error loss. Later, we shall see that his adherence to quadratic risk probably prevented him from recognizing the connection of his closeness criterion with Pitman estimators. This connection will be made in §§5.1 and 5.2. The relationship
between Pitman-closest estimators and Pitman estimators was no doubt stimulated by the work of Nayak (1990), who explained the role of location and scale invariance in the construction of Pitman-closest estimators. However, Ghosh and Sen (1991) explain the connection via a Bayesian perspective and Kubokawa (1991) extends these results to a general group invariant structure. Although Kubokawa's discussion is quite brief, a full development is given in §5.3. In 1944 Geary made a significant contribution in the determination of the closeness probability for unbiased estimators of a common mean. His approach will be used in a generalization known as the Geary-Rao Theorem in §4.2. His example of the comparison of the sample mean to the sample median in random samples chosen from a normal distribution is presented asymptotically (see Example 4.6.8). The asymptotic nature of Geary's example prompted Scheffe (1945) to advocate the Pitman closeness criterion as a procedure for discrimination among consistent estimators of a common parameter. This idea of Scheffe's is discussed in §6.2 and has connections with Rao's (1961) concept of second-order efficiency. The formal investigation of the asymptotic nature of the Pitman closeness criterion was started by Sen (1986a) with his study of best asymptotic normal (BAN) estimators. Its full development is the subject of Chapter 6. Much of the research from 1945-1975 involved the tedious comparison of estimators for known distributions. The importance of these comparisons (e.g., Landau (1947), Johnson (1950), Eisenhart et al. (1963), Zacks and Even (1966), and Maynard and Chow (1972)) should not be overlooked because they gave reasonable examples in which mean unbiased estimators were not necessarily Pitman-closer. Johnson (1950) generalized the results of Geary and Landau by dropping the condition of unbiasedness with respect to estimation of the mean in a normal distribution (see Example 4.2.1). In the uniform distribution, Johnson showed that the MLE is inadmissible under the Pitman closeness criterion and, like Pitman, he constructed a Pitman-closest estimator (see Example 3.3.2). An exception to these articles was given by Blyth (1972), who discussed several paradoxes surrounding the Pitman closeness criterion. The first paradox was related to the criterion's intransitive nature. The issue of transitiveness was given a partial solution by Ghosh and Sen (1989) and Nayak (1990) through restrictions of invariance. Ghosh and Sen (1991) also furthered this cause by showing that the posterior Pitman closeness criterion produced transitive comparisons. Hence, in the posterior setting established by Ghosh and Sen (1991), the paradox vanishes. Blyth also introduced the "pairwise best-simultaneous
worst paradox," a useful discussion of which is given in §3.2. Later Keating and Gupta (1984) and Keating and Mason (1985a) developed techniques for these simultaneous evaluations based on PMC for a specific class of estimators. Blyth (1951) extended the well-known centerpiece of estimation theory in showing that the sample mean in random samples chosen from a normal distribution was admissible in the sense of mean squared error across the class of all decision rules. He stated: For classes of decision rules unrestricted by unbiasedness or invariance, it is clear that no decision rule is uniformly best in the sense of MSB. Blyth's generalization of the Hodges-Lehmann admissibility result allayed the apprehension among researchers about biased estimators that might have smaller MSB than their unbiased counterparts. Its importance to much of statistical inference arises from large sample application through the Central Limit Theorem. Note that constant estimators are also admissible under the MSE criterion but admissibility alone does not make them useful in any circumstance. Within a decade, Stein (1956) showed that in three or more dimensions, the MLE is an inadmissible estimator of the normal mean vector. He also constructed biased estimators, known as Stein-rule or shrinkage estimators, which dominate the MLE in terms of quadratic risk. Efron (1975) showed (see Example 1.1.2) that the Stein effect occurred in the one-dimensional case for the Pitman criterion. Robert et al. (1993) presented a class of decision rules for which one-dimensional UMVUEs are inadmissible in the Pitman sense. Keating and Mason (1988b) discussed the importance which the Pitman closeness criterion had on the subsequent MSE comparisons in two dimensions. This started a new but fragmented age of research in the Pitman closeness criterion. Prom 1979-1981, Dyer and Keating, in a sequence of articles, began to derive some general results for the determination of the Pitman closeness criterion whenever the estimators were functions of a common (usually sufficient) statistic. Within restricted classes, they derived Pitman-closest estimators of percentiles and scale parameters. Keating (1983), (1985) extended the earlier results for Pitman-closest estimators to location and scale parameter families. An extensive method for the determination of the Pitman closeness criterion was given by Rao, Keating, and Mason (1986), and
the results of Geary (1944) were shown to be a special case of the Geary-Rao Theorem. Much of the work completed between 1945-1985 resulted in Pitman-closest estimators which were median unbiased. Recall that an estimator, θ̂, is median unbiased for a parameter θ provided

Pr(θ̂ ≤ θ) ≥ 1/2  and  Pr(θ̂ ≥ θ) ≥ 1/2.
This observation had already been made by Pitman, but the class of examples was now sufficiently large to produce a general result. Ghosh and Sen (1989) gave conditions under which a median-unbiased estimator (MUE) would be Pitman-closest within a class. Nayak (1990) actually provided a technique for the construction of Pitman-closest estimators of location and scale which were median unbiased (see §§5.1 and 5.2, respectively). Kubokawa (1991) generalized this median-unbiased property for a general group invariant structure (see §5.3). However, the renewed interest in the study of the Pitman closeness criterion was begun by Rao (1980), (1981). He offered examples in which shrinking an unbiased estimator to a minimum mean squared error estimator within a class did not make the latter Pitman-closer. His approach was directed at the indiscriminate use of the mean squared error criterion. He proposed that, in the comparison of "improved" estimators derived under various conditions, the process should improve such natural properties as the Pitman closeness criterion and stochastic domination with respect to the classical estimates such as UMVUE and MLE. He questioned the usefulness of the improvement and clearly opposed Berkson's (1980) stance on minimum chi-square estimation. Keating (1985) discussed this phenomenon in the estimation of scale parameters. Keating and Mason (1985b), at the suggestion of Rao, discussed many practical examples in which the Pitman closeness criterion may be more useful than MSE. These practical papers made it easier for the general readership to understand the questions raised by Rao and one example is given in the following subsection. From the reliability estimation problem contained in Example 1.2.1, the reader can begin to see why Rao called PMC an intrinsic criterion. It arises naturally out of the estimation problem and can be interpreted simply by engineers and scientists with some formal training in estimation theory. Moreover, the intrinsic nature dominates some estimation problems where emphasis is placed on closeness to the target value. We present a relevant example due to Keating and Mason (1985b) illustrating a situation where closeness probability is a meaningful criterion for comparing estimators.
2.1.3 Convenience store example
Suppose there are k competing convenience stores located in the town of Pitmanville. The loci of the stores will be denoted by the points x_1, x_2, ..., x_k in the plane of Pitmanville. We will assume that it is unknown whether or not the grid system of the town is rectangular, and in the absence of this knowledge will choose the Mahalanobis distance to measure the distance between points. Each x_i is thus the rectangular coordinate pair associated with the location of the ith convenience store. The two coordinate axes are, respectively, east-west and north-south, and the origin (0,0) is the location of the town square. The population distribution over Pitmanville will be represented by f(x), the joint density function of X, the coordinate location of an arbitrary individual. Of concern is the location of this individual relative to the ith store. We will represent the weighted squared distance between the individual and the ith store by

(X − x_i)′ Σ^{−1} (X − x_i),    (2.3)
where Σ is the covariance matrix of X. The expectation of the value in (2.3) is the bivariate analogue of MSE and is referred to as quadratic risk. Individuals usually patronize the convenience store that is closest to their location (all other considerations being the same) rather than the one that has the smallest weighted squared distance (i.e., MSE). To illustrate this consider Figure 2.1, which locates three stores (denoted by x_1, x_2, and x_3) in the plane of Pitmanville. The town has been subdivided into three regions (denoted V_1, V_2, and V_3) in order that the persons located in a given region V_i are closer to the ith store than to any other. If the three x_i are not collinear, the perpendicular bisectors of the line segments joining each pair of x's determine the boundaries of these regions. The simultaneous closeness probability is simply the probability of V_i, denoted Pr(V_i). Suppose that the coordinate location of the population density has a bivariate normal distribution with mean vector 0 and covariance matrix Σ. With this assumption the weighted squared distance in (2.3) has a noncentral chi-squared distribution with two degrees of freedom and a noncentrality parameter given by x_i′ Σ^{−1} x_i. Using this result it follows that the MSE is given as

E[(X − x_i)′ Σ^{−1} (X − x_i)] = 2 + x_i′ Σ^{−1} x_i.
Figure 2.1: The Voronoi tessellation of three convenience stores.
Figure 2.2: Example of stores distributed on the unit circle.
The weighted MSE is at a minimum when x_i is at the origin 0. Thus MSE places a value on each store without considering the location of competing stores, while the closeness probability accounts for the relative positions of the competing stores through its determination of the size of the region V_i. Figure 2.2 illustrates three stores located on the unit circle at coordinates given by (1/2, √3/2), (−√2/2, √2/2), and (0, −1). In this example we will assume that Σ is the 2 × 2 identity matrix. Note that each store would be closest to exactly 50% of the town population in any of the three pairwise comparisons and that all three stores have the same weighted MSE value of 3. The conclusion at this stage would be that store location should not be a factor in deciding which store to patronize. However, consider what happens when the closeness probability is used as a criterion of comparison. The probability for V_i represents the fraction of the population that is located closer to x_i than any other store. These probabilities can be calculated using the probability content of sectors of the bivariate normal. A simpler geometric approach has been described by Farebrother (1986) and is given below. Note in Figure 2.2 that the three stores are located at the same distance from the town square. One is in the direction N 30° E; the other at N 45° W; and the third located due S. The angles between these directions are 75°, 135°, and 150°, so the first store has a sector with a central angle of 37.5 + 75 = 112.5°; the second store a sector of 37.5 + 67.5 = 105°; and the third store a sector of 67.5 + 75 = 142.5°. Dividing by 360°, we find that the stores attract the following proportions of the population:

Store   Pr(V_i)
  1      .3125
  2      .2917
  3      .3958

Based on these calculations, a larger proportion of the population in Pitmanville will live closer to the location of store 3 than to the location of the two competing stores. In general, all the stores on the same contour of constant population density will have the same weighted MSE but they will not necessarily share the same fraction of customers located in Pitmanville. The above example illustrates another useful aspect of PMC as an alternative criterion to MSE in comparing estimators. In this case, we illustrate the importance not of pairwise comparisons but rather simultaneous ones (see §3.2). This example, however, is not a universal result, as similar examples could be constructed to show that MSE is often preferred to PMC.
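The sector proportions above are easy to confirm numerically. The sketch below draws a large standard bivariate normal population (Σ = I, as assumed above), assigns each individual to the nearest store, and also evaluates each store's weighted MSE, 2 + x_i′x_i.

```python
import numpy as np

# Store locations on the unit circle (with Sigma = I the Mahalanobis distance
# reduces to Euclidean distance), listed in the order used in the table above.
stores = np.array([
    [0.5, np.sqrt(3) / 2],              # N 30 deg E
    [-np.sqrt(2) / 2, np.sqrt(2) / 2],  # N 45 deg W
    [0.0, -1.0],                        # due S
])

rng = np.random.default_rng(3)
pop = rng.standard_normal(size=(200_000, 2))            # the town's population
dists = np.linalg.norm(pop[:, None, :] - stores[None, :, :], axis=2)
nearest = dists.argmin(axis=1)
print("Pr(V_i) ~", np.bincount(nearest, minlength=3) / len(pop))
# Sector angles / 360 give [.3125, .2917, .3958]

print("weighted MSE per store:", 2 + (stores ** 2).sum(axis=1))  # all equal 3
```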
What should be gained from this discourse is the need to carefully consider the merits and limitations of other criteria besides MSE when comparing competing estimators. For example, when we decide to purchase a gallon of milk at 11:00 pm from the local convenience store we usually do not stop and compute the MSE of the population density of each store. Instead, we probably go to the store that is closest in travel time to our homes. While the placement of new stores according to MSE would always be at the town center, the placement using PMC would be based on the proportion of the population closest to the store. We note that the partition, V_1, V_2, and V_3, is the Voronoi tessellation of ℝ² determined by the points x_1, x_2, and x_3. Nearest neighbor algorithms have been developed to determine the regions whenever many points or stores are involved. Keating and Mason (1988b) discuss the extensive impact on the Voronoi tessellation created by changing to the ℓ_1 metric, which is frequently used in robotics. For example, the Voronoi tessellation becomes random if the coordinates x_1, x_2, and x_3 represent the locations of three police patrol cars in Pitmanville (see Stoyan, Kendall, and Mecke (1987)). The convenience store example also serves to illustrate the important connection of PMC with Bayesian estimation (e.g., see Keating and Mason (1988b)). Suppose we have a bivariate parameter θ = (θ_1, θ_2) and that g(θ) is a bivariate prior distribution defined over the parameter space. Using different estimation criteria, such as MSE and MAD, certain estimators θ̂_1, θ̂_2, ..., θ̂_k are optimal based on the measures of central tendency of the posterior distribution. These estimators are fixed given the data and the parameter(s) in the prior distribution. In the posterior case, θ̂_1, θ̂_2, ..., θ̂_k play the same role as the points representing convenience stores. The regions V_1, ..., V_k, based on the underlying Mahalanobis metric, partition Pitmanville into k sets in which the posterior loss due to estimator θ̂_i is less than the posterior loss due to any other estimator. The calculation of Pr(V_i) is the value of posterior PMC and the Mahalanobis metric serves as the loss function. Thus, as in Bayesian estimation theory, the individual posterior PMCs can be compared to select the estimator of choice. The concept of posterior Pitman closeness is developed in §5.4. A similar bivariate example involving election data is given by Romer and Rosenthal (1984), where the electorate is jointly distributed on public spending on parks and police. The candidates assume fixed positions, x_1, ..., x_k, on spending on parks and police.
2.2 The concept of risk
The language of decision theory is often used in evaluating different measures of closeness. We want to know how close two estimators are to the true value of a parameter, but we lack a methodology for determining the error we make in choosing each alternative. Since an estimator helps make a decision on the value of the parameter, we term the estimator a decision function. Associated with each decision function is a measure of the error in not estimating the correct parameter value. This error is formally called a loss function. We have seen in the previous subsection that loss functions can take a variety of forms. With the MSE criterion the loss function is based on squared error, while the PMC criterion is expressed as a probability based on the absolute errors. Selection of an appropriate loss function is neither an arbitrary nor a simple matter. But once it is chosen, the problem is to select the estimator that makes the loss small. Doing this across many different samples of data can be difficult as some errors will be small while others will be large. To remove the dependence of the loss on the sample values, we usually choose estimators that minimize the average loss, the risk function. Choosing estimators with the smallest loss requires us to have an understanding of the concept of risk. This is particularly important in our study of PMC.
2.2.1 Renyi's decomposition of risk
One useful result that serves as an aid in understanding the risk associated with PMC is contained in a theorem given by Renyi (1970). Theorem 2.2.1 (Theorem of Total Expectation). Let θ̂_1, θ̂_2, ..., θ̂_k (k ≥ 2) be k real-valued estimators of the real parameter θ. Denote the loss of the ith estimator by ℒ_i, where ℒ_i = ρ(|θ̂_i − θ|) for i = 1, ..., k, and ρ(y) is a strictly increasing and nonnegative-valued function of y for y > 0. The mean loss E(ℒ_1), decomposed over the closeness partitions ℒ_1 < ℒ_2, ℒ_1 = ℒ_2, and ℒ_1 > ℒ_2, is as follows:

E(ℒ_1) = E(ℒ_1 | ℒ_1 < ℒ_2) Pr(ℒ_1 < ℒ_2) + E(ℒ_1 | ℒ_1 = ℒ_2) Pr(ℒ_1 = ℒ_2) + E(ℒ_1 | ℒ_1 > ℒ_2) Pr(ℒ_1 > ℒ_2).
Using the above theorem, Dyer and Keating (1979b) determined a general solution for the decomposition components and introduced PMC as an interpretable element of the total risk, E(ℒ_i). The probability statements in the first two terms of the preceding equation are the values of ℙ(θ̂_1, θ̂_2 | θ), while the term E(ℒ_i | ℒ_j = min(ℒ_1, ℒ_2)) for i, j = 1, 2, is the conditional risk incurred in estimating θ with θ̂_i, given that θ̂_j is closer to θ. Thus the pairwise Pitman probability has a unique role in the decomposition and evaluation of risk. In this regard Johnson (1950) remarked that: It must be admitted that the closeness criterion does provide information not given by the mean square error. In particular, it is always possible to use the closeness criterion, regardless of the existence or non-existence of the first and second moments of the estimators. Ideally, of course, it would be desirable to use both criteria. An interesting property of PMC is discovered by extending the loss functions in Theorem 2.2.1. We call it the invariance property, due to its origins in Bayesian hypothesis testing (see Keating and Mason (1985a)). Theorem 2.2.2 (Invariance Property). If ℒ_i = |θ̂_i − θ|^q is the ℓ_q-norm loss function of the estimator θ̂_i, i = 1, 2, and q > 0, then

E(ℒ_i) = E(ℒ_i | ℒ_1 < ℒ_2) ℙ(θ̂_1, θ̂_2 | θ) + E(ℒ_i | ℒ_1 ≥ ℒ_2) [1 − ℙ(θ̂_1, θ̂_2 | θ)],  i = 1, 2.    (2.4)
The closeness partition defined by ℒ_1 < ℒ_2 in (2.4) is the same as the closeness partition defined by |θ̂_1 − θ| < |θ̂_2 − θ|, whatever the value of q. Note that while varying the value of q does not alter the Pitman closeness probabilities, it does arbitrarily increase or decrease the associated conditional risks. The reader can verify this with Examples 2.1.2 and 2.1.3. Thus, the Pitman closeness probabilities may be more important than the conditional risks in evaluating (2.4) because they are not changed by the arbitrary choice of the loss function within this class of ℓ_q-norms, q > 0. This is another natural property underlying the appeal of PMC which is not generally shared by MSE.
However, the invariance of PMC does not remain true when we include an asymmetric loss function such as the entropy loss defined by

ℒ(θ̂, θ) = θ̂/θ − ln(θ̂/θ) − 1,
which is a useful loss function in the comparison of estimators of a scale parameter.
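To see how the entropy loss can disturb this invariance, consider a small numerical check that is not an example from the text: the unbiased estimator S² of a normal variance and its minimum-MSE multiple (n − 1)S²/(n + 1). Both are multiples of the same chi-squared variable, so each closeness event reduces to a cutoff on S²/σ², but the cutoff solves a different equation under absolute error than under entropy loss, and the two closeness probabilities therefore differ.

```python
import numpy as np
from scipy.stats import chi2

n = 10
df = n - 1
c = (n - 1) / (n + 1)   # theta2_hat = c * theta1_hat, the minimum-MSE multiple of S^2

# With t = S^2 / sigma^2, theta1_hat is closer exactly when t falls below a cutoff:
t_abs = 2 / (1 + c)               # cutoff where |t - 1| = |c*t - 1| (absolute error)
t_ent = -np.log(c) / (1 - c)      # cutoff where t - ln(t) = c*t - ln(c*t) (entropy loss)

# df * t has a chi-square(df) distribution, so each probability is a CDF value.
p_abs = chi2.cdf(df * t_abs, df)
p_ent = chi2.cdf(df * t_ent, df)
print(f"Pr(S^2 closer), absolute error: {p_abs:.4f}")
print(f"Pr(S^2 closer), entropy loss  : {p_ent:.4f}")   # not equal: invariance fails
```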
2.2.2 How do we understand risk?
Prevalent in our discussions of PMC is the issue of risk (e.g., see Mason (1991)). It is easy enough to define risk as the average loss, but it is not always as simple to express our understanding of it. Consider, for example, the risk involved in gambling. If we were to bet money on a particular outcome, we often think of the risk we incur as a function of the amount we bet as well as the odds of winning. Our major concerns are with the loss we would suffer if we were to lose the bet or the amount we would gain if we were to win the bet. But the true risk is a function of both of these elements and combines the respective probabilities of winning and losing with the amount wagered. Suppose this example is extended to a situation in which two alternative methods of betting are available to us. How do we proceed to compare them? One approach might be to try each method for a fixed number of bets using the same amount of money for the bets. The "better" method would be the one that yielded the smallest total loss, or, equivalently, the smaller average loss. We would say that it had the smaller risk. Pitman, in his 1937 paper, addresses the problem of such pairwise comparisons in discussing the limitations of PMC. He states that
second estimator as £2 = £(^2>#)- The difference in the risks, A.R, which we incur in using #2 instead of #1, can be expressed as follows:
This difference in risks is a function of the average amount we would expect to lose (in using 62 when 0\ is preferred) times the probability of losing, plus the average amount we would expect to win (when #2 is preferred) times the probability of winning. In this setting, ties become irrelevant to the choice of estimator since they occur when the two risks are the same. The equation shows that the associated conditional risks are weighted by the closeness probabilities: Pr(£i < £2) and Pr(£i > £2)- This result is stated in a more mathematical way in the form of the following characterization of preference based on risk. Theorem 2.2.3 Let 9\ and #2 be real-valued estimators of the real-valued parameter 9 and let £(0,0) be the chosen loss function. Then it follows that
If the conditional risks are the same, then it is the odds of winning or losing that determine which estimator is preferred. Thus the risk is highly sensitive to the value of PMC. These ideas are expressed in the use of the word pay in the preceding quotation by Pitman. In practical situations we do not always have enough information to evaluate the average loss. Risk then is based on our prior beliefs about winning and losing as well as our expectations for loss. When comparing two estimators, PMC is an interpretable component of the associated risk. It expresses our belief that one method will have a smaller loss than the other. When these probabilities are combined with the conditional average losses, we are then able to determine which estimator has the smallest risk.
DEVELOPMENT OF PMC
2.3
45
Weaknesses in the use of risk
There are inherent weaknesses in relying totally on risk as a criterion in comparing two estimators. One concern is that it may not always be possible to determine the risk. This can occur when the losses associated with incorrect decisions are unknown or cannot be formulated. The risk is also very sensitive to the choice of the loss function. As noted earlier, an arbitrary choice for the loss function can definitely influence the risk but would have no effect on PMC across monotone functions of |0 - 0| and a small effect in classes including both entropy and £q loss functions. Another problem, discovered by Rao (1981), is that shrinking the risk of an unbiased estimator does not necessarily produce an estimator which is Pitman-closer than the unbiased one. However, some shrinkage estimators, such as the Stein rule, are Pitman-closer estimators than the classical MLE, even under less restrictive conditions than with quadratic or other risks (see Sen, Kubokawa, and Saleh (1989) or Sengupta and Sen (1991)). Finally, when comparing an estimator to a fixed standard, as in calibration studies, risk can be a poor criterion. The following subsections contain discussions on these various situations. They explore in more detail some of the inherent weaknesses associated with the use of risk and illustrate settings where PMC is a viable alternative. As has been stressed throughout this book, one should not arbitrarily choose risk or PMC as the sole criterion in comparing estimators. Concern must also be given to the use of the estimator as well as the loss in making an incorrect decision.
2.3.1
When MSE does not exist
An interesting aspect of risk is that it may not always exist. By definition the risk function is the expected value of the loss; i.e.,
Unfortunately, as mentioned by Johnson (1950), there are situations where MSE does not exist or cannot be determined for a variety of reasons. This subsection discusses some commonly encountered examples, which are used to demonstrate the value of PMC as an alternative criterion for the comparison of estimators.
Example 2.3.1 (Divergent Estimators). We consider a random sample X_1, ..., X_n of size n from a Bernoulli distribution with probability function

f(x; θ) = θ^x (1 − θ)^{1−x},  x = 0, 1,

where θ is the probability that a component is defective. A complete sufficient statistic for estimation of θ is given by Y = nX̄, which has a binomial distribution under the usual assumptions. From a frequentist's perspective, θ is the frequency with which defective components are produced. Its reciprocal τ = 1/θ is the period, or expected number of produced components between defects. From the invariance property of MLEs, τ̂_1 = n/Y = 1/X̄ is the MLE of τ. However, we can see that τ̂_1 is undefined with positive probability (1 − θ)^n. MLEs have the convenient property that if θ̂_L is the MLE of θ, then g(θ̂_L) is the MLE of g(θ). However, this can be a great weakness in that the MLE may be an inferior estimator for many g's, even if θ̂_L happens to be a good estimator for θ. Further consideration of the MLE would be fruitless due to its singularity at Y = nX̄ = 0. This is clearly the Achilles' heel of the logistic regression problem posed by Berkson (1980). He suggests the ad hoc procedure of adding 1/(2n) to X̄ whenever X̄ = 0. Consider the estimator of τ motivated by a 50% confidence interval on θ,
where b_.50(α, β) is the median of a beta random variable with parameters α and β. The motivation for this estimator stems from the relationship between cumulative binomial probabilities and values of the beta distribution function. Since the denominator never vanishes, τ̂_2(X) is well defined over the sample space. Moreover
For example, if 1,000 components are inspected without defect then
which is a reasonable estimate of τ when X̄ = 0. Even the ad hoc Berkson estimator produces an estimate of τ of 2,000. MSE can be calculated for
τ̂_2(X), whereas it cannot be for τ̂_1(X). The estimator τ̂_2(X) arises out of a Bayesian development which is discussed in §5.4. MSE may fail to exist due to problems inherent to the underlying distribution, such as the lack of moments of any order. To illustrate this problem consider the example of the Cauchy distribution. Example 2.3.2 (The Cauchy Distribution). Suppose X_1 and X_2 are a random sample of size two from the Cauchy probability density given by

f(x; θ) = 1/{π[1 + (x − θ)²]},  −∞ < x < ∞,
where θ is an unknown parameter. Although the above density function is symmetric about the parameter θ, its mean and higher-order moments do not exist. Consider the following two estimators of θ:

θ̂_1 = X_1  and  θ̂_2 = (X_1 + X_2)/2.
Blyth and Pathak (1985) show that θ̂_2 is Pitman-closer than θ̂_1 to θ in spite of their marginal distributions being identical. While moments of any positive order diverge, both estimators underestimate θ with probability 0.50. If we want to determine which of these estimators is closer on the average to the unknown parameter and use MSE as the criterion, we cannot solve the problem. This results from the fact that the expectations of X_1 and X_2 do not exist. Thus we must use some alternative estimation criterion. If PMC is selected, we need to evaluate the probability Pr(|θ̂_1 − θ| < |θ̂_2 − θ|). Consider the following transformation, which parallels the argument in Blyth (1986). Let α = arctan(X_1 − θ) and β = arctan(X_2 − θ). Thus, it follows that

f(α, β) = 1/π²,  −π/2 < α < π/2,  −π/2 < β < π/2.
The random vector (α, β) is uniformly distributed over a square with side length π and centered at the origin. With this transformation, PMC is given by

ℙ(θ̂_1, θ̂_2 | θ) = Pr(|tan α| < |tan α + tan β|/2).
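A direct Monte Carlo check of this probability (a sketch, with θ set to zero since both estimators are location equivariant) gives a value noticeably below one half:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, reps = 0.0, 1_000_000
x1 = theta + rng.standard_cauchy(reps)
x2 = theta + rng.standard_cauchy(reps)
est1 = x1                       # theta1_hat: a single observation
est2 = 0.5 * (x1 + x2)          # theta2_hat: the mean of the two observations
pmc = np.mean(np.abs(est1 - theta) < np.abs(est2 - theta))
print("Pr(|theta1_hat - theta| < |theta2_hat - theta|) ~", pmc)
# The value falls below .50 even though neither estimator possesses a mean.
```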
Figure 2.3: Preference region for X_1 in the Cauchy distribution.
An illustration of this region of preference in the α-β plane is shown in the shaded part of Figure 2.3. Using numerical integration and the rectangular symmetry of f(α, β), we find that PMC reduces to
The probability is less than .50, indicating that θ̂_2, the sample mean, is Pitman-closer to θ than θ̂_1, the individual sample value. Thus, although we were unable to compare the two estimators with the MSE criterion, we were able to discriminate between them using the PMC criterion. Example 2.3.3 (The Calibration Problem). A calibration experiment is one in which a response variable Y is measured as a function of a known set of predictor-variable values X, at least two of which must be distinct. A calibration line is fitted to the data and then used to estimate unknown
predictor-variable values from corresponding measured values of the response variable. For example, consider the problem of calibrating a measuring instrument such as a gas pressure gauge. To calibrate the gauge, measurements are taken at several known gas pressures. Some technique, such as least squares regression, is used to fit a calibration line to the gauge measurements as a function of the known gas pressures. In the future, when the gauge is used to take measurements at unknown gas pressures, the calibration line can be employed to estimate these pressures. Estimating an unknown X, associated with a known Y value, can be achieved by classical regression methods. Suppose the responses are obtained from the model

Y_i = α + β X_i + ε_i
and the unknown parameters α and β are estimated by least squares as α̂ and β̂. Given a known value of Y, say y, the estimate of X is given by

X̂_1 = (y − α̂)/β̂.
This estimator is supported by the fact that it is also the maximum likelihood estimator when the errors, ε_i, are normally distributed. Alternatively, one could use an inverse regression procedure to estimate X. In this approach, instead of regressing Y on X we do the "inverse" and regress X on Y. The resultant prediction equation yields the estimate

X̂_2 = α̂* + β̂* y
for the known value y, where α̂* and β̂* are the least squares estimates of the unknown parameters α* and β* in the model

X_i = α* + β* Y_i + ε_i*.
There has been much debate in the statistical literature over which of these two estimators is superior in estimating the unknown X value (e.g., see Krutchkoff (1971); Halperin (1970); Vecchia, Iyer, and Chapman (1989)). Much of the controversy stems from the fact that the MSE of X̂_1 is infinite, so that a random drawing from any distribution with a finite variance would produce a better estimate than this one under the MSE criterion. In short, MSE is an inappropriate criterion in this situation since it does not exist for one of the two estimators.
PMC has been introduced as an effective alternative criterion to solve this problem (Halperin (1970); Krutchkoff (1971); Keating and Mason (1991)). The relative closeness of X̂₁ and X̂₂ to the unknown X value can be found by evaluating the following probability
This quantity cannot be evaluated directly due to the complicated integration that is required, but it can be approximated by considering the appropriate asymptotic distributions and using the fact that the variables are normally distributed. When this is done, it can be shown that neither estimate dominates the other, although the classical estimator appears to be far superior in practice.

This example, like the previous one involving the Cauchy distribution, demonstrates a situation where the MSE criterion is ineffective due to its lack of existence. In these settings alternative criteria such as PMC have a meaningful role to play in the comparison of competing estimators. However, even when MSE does exist, the PMC criterion can be very helpful. The advantage in using PMC is that, while it is an interpretable component of the associated risk, its existence does not depend on the arbitrary choice of the loss function from the class of monotone functions of |θ̂ − θ|. The entropy loss function provides a nice example showing that we may be unable to extend this result to a much larger class of functions. Computing probabilities can pose as many problems as encountered in evaluating risk functions, but at least the PMC is unchanged by our choice of loss functions.
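The invariance just claimed admits a one-line justification. For any loss of the form H(|θ̂ − θ|) with H(0) = 0 and H strictly increasing for positive arguments (the class discussed in the next subsection), a strictly increasing H preserves the ordering of the two absolute errors, so that

    \Pr\left( H(|\hat{\theta}_1 - \theta|) < H(|\hat{\theta}_2 - \theta|) \right)
      = \Pr\left( |\hat{\theta}_1 - \theta| < |\hat{\theta}_2 - \theta| \right),

which is exactly the Pitman closeness probability, no matter which such H is chosen.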
2.3.2 Sensitivity to the choice of the loss function
The choice of the loss function determines the risk. A squared error loss function such as (θ̂ − θ)² has the feature that estimates within a unit of the true parameter θ produce smaller losses than those obtained under absolute error loss. However, the squared error loss function may give undue influence to large errors, while an absolute error loss function such as |θ̂ − θ| does not down-weight small errors. With an arbitrary loss function the risk can be highly sensitive to the selection.
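A small invented two-point example makes this sensitivity concrete. Suppose θ̂₁ always misses θ by 0.5, while θ̂₂ equals θ with probability .9 and misses by 2 with probability .1; the numbers are hypothetical and serve only as an illustration. Then

    \mathrm{MSE}(\hat{\theta}_1) = (0.5)^2 = 0.25 \;<\; 0.40 = (.9)(0)^2 + (.1)(2)^2 = \mathrm{MSE}(\hat{\theta}_2),

while

    E\,|\hat{\theta}_1 - \theta| = 0.5 \;>\; 0.20 = (.9)(0) + (.1)(2) = E\,|\hat{\theta}_2 - \theta|,

so the squared error risk prefers θ̂₁ and the absolute error risk prefers θ̂₂; the ranking of the same two estimators is reversed merely by the choice of loss.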
This apparent sensitivity is one that has encouraged many researchers to look to other criteria in evaluating estimators. The convenience store example and the electorate example of Romer and Rosenthal (1984) are practical examples where PMC is more relevant than a squared error loss function. Halperin's inverse regression problem provided an example in which PMC is a useful alternative for a situation where the risk associated with a squared error loss function cannot be evaluated. In each of these situations the risk was highly sensitive to the choice of the loss function. The selection of an appropriate loss function has been addressed by Laplace, Gauss, Pitman, Karlin, and Rao. The problems associated with inappropriate choices are documented throughout Chapters 2 and 3 of this book. We raise the need for estimators that are insensitive, that is, robust, to such choices. The importance of this issue has been cited by Ghosh, Keating, and Sen (1993):

    Robustness of estimators with respect to the loss function is an issue of utmost concern to the decision theorist. ... With a few exceptions such as Hwang (1985), (1988) or Brown and Hwang (1989), this issue is usually inadequately addressed in the decision theory literature and one can safely assert that this robustness issue is yet to be satisfactorily resolved. In this regard, an appealing feature of the Pitman closeness criterion is its invariance with regard to choice of loss functions from the H and larger classes.

The last sentence is made in light of Theorem 2.2.2, and the conditions for the larger class are specified by Keating (1991). The previous discussions raised the issue of the choice of a loss function. Indiscriminate choices may well prove to be disastrous. Many recent advances in estimation theory have centered on the concept of robustness of estimation techniques to departures in observed values from the underlying distribution. A thorough discussion of robust techniques is given in Hampel et al. (1986). Their focus is based on influence functions, which provide formalizations of the bias created by an outlier. Among the popular robust procedures are those based on Huber's influence function

    ψ_c(x) = x for |x| ≤ c,    ψ_c(x) = c · sign(x) for |x| > c,
where c is labeled the breakdown point. A graph of ψ_c(x) is given in Figure 2.4. Note that at x = ±c, this function changes from a linear influence to
Figure 2.4: The Huber influence function with breakdown point c.

a constant influence. If we let ℓ′(x) = ψ_c(x), the corresponding loss function uses quadratic loss in the interval [−c, c] and an absolute loss function elsewhere. The appeal of the loss function generated by the Huber influence function is that it incorporates the desirable features of absolute and squared-error loss. However, controversy still centers on the selection of the breakdown point c. It should be noted that, due to Nayak's (1991) observation, Pitman's measure of closeness is invariant under any loss function defined by H(|x − θ|) where H(0) = 0 and H(x) is strictly increasing for x > 0. Thus, the result of invariance in Theorem 2.2.2 applies to loss functions constructed out of such breakdown functions. A very popular influence function is the one associated with redescending M-estimators, which we will discuss in Chapter 6. Consider the following "normal" loss function given by
This loss function is graphed in Figure 2.5 and its complement has the same shape as a normal density function. All losses are bounded below by zero and above by one. In the sense of robust statistics, it has the desired
Figure 2.5: The "normal" loss function.

property of down-weighting extreme observations. A convenient feature of this loss is that if one defines ψ_N(x) = ℓ′_N(x), the influence function does not descend too rapidly. The influence function defined by ψ_N(x) is given in Figure 2.6 and is very similar to that of the tanh-estimator under the normal distribution. Consider the influence function, ψ*, defined as follows:
The constants c and β_c can be determined from Hampel et al. (1986). The optimum estimator determined by ψ* is known as a median-type hyperbolic tangent estimator. This influence function is depicted in Figure 2.7 and differs from the one in Figure 2.6 in that it goes to zero at ±c, and in the neighborhood of x = 0, ψ* has a jump discontinuity. However, ψ_N(x) does not descend to zero for all x such that |x| > c, as does ψ*(x). It also has the desirable feature that the influence function is nearly linear in a large neighborhood of the origin. Moreover, the "normal" loss function ℓ_N satisfies the property of losses of the form H(|x|) such that H(0) = 0 and H(x) is strictly increasing for x > 0. Hence the results of Theorem 2.2.2 apply and a Pitman-closest estimator derived under absolute loss will also be Pitman-closest under this "normal" loss. Furthermore, the optimal estimator under an influence function is frequently related to medians, which is also the case for Pitman-closest estimators (see §5.3).
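For readers who want the pieces side by side, the sketch below codes Huber's influence function, the loss it generates (quadratic inside [−c, c], absolute-type outside), and a bounded "normal" loss. The particular form 1 − exp(−x²/2) used for the "normal" loss is an assumption made for the illustration, chosen so that its complement has the shape of a normal density as described above.

    import numpy as np

    def huber_psi(x, c):
        """Huber influence function: linear inside [-c, c], constant +/-c outside."""
        return np.clip(x, -c, c)

    def huber_loss(x, c):
        """Loss whose derivative is huber_psi: x^2/2 inside, c|x| - c^2/2 outside."""
        return np.where(np.abs(x) <= c, 0.5 * x**2, c * np.abs(x) - 0.5 * c**2)

    def normal_loss(x):
        """Bounded 'normal' loss on [0, 1); assumed form 1 - exp(-x^2/2)."""
        return 1.0 - np.exp(-0.5 * x**2)

    x = np.linspace(-4.0, 4.0, 9)
    print(huber_psi(x, c=1.5))
    print(huber_loss(x, c=1.5))
    print(normal_loss(x))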
Figure 2.6: The influence function of the "normal" loss.
Figure 2.7: The influence function for a median-type tanh-estimator.
2.3.3 The golden standard

The concept of risk, as defined in (2.6), assesses penalties to departures from the truth (i.e., the golden standard). Certainly it is within the domain of the statistician to question by what authority one has chosen, for example, quadratic risk. Some would contend that the user with a real problem knows the potential loss. If one does not know the loss function, should one universally apply a squared error loss function? It is clear that one can artificially create large departures from the truth by an imprudent choice of loss.

In the evolution of science, competing theories are compared on how well they explain observed phenomena. For example, for most of recorded history, it was commonly believed that the earth was flat. This belief was held despite the brilliant experimental work of Eratosthenes (c. 230 B.C.) associated with the summer solstice at Syene and Alexandria. This same kind of "common sense" was used via the MacLaurin series to support the motivation of MSE. In layman's terms, the world is "locally flat" or "locally Euclidean." The great truth that Riemannian geometry models the geometry on the surface of a sphere has done little to diminish the value of the Euclidean approximation but should make us more prudent in applying it. Another example of competing theories is seen in a study of gravitational attraction, which initially was well described by Sir Isaac Newton. However, with the advent of Albert Einstein's special theory of relativity, it became clear that Newtonian laws were simply inadequate to explain the behavior of relativistic particles. This superior explanation of Einstein's does not diminish the fact that much of human endeavor is well explained by a world which is "locally Newtonian." It seems consistent with the history and philosophy of science that Einstein's theory of relativity is a better approximation to an "ideal" world that we do not fully understand.

In using Pitman's measure as a criterion of estimation we also compare two competing theories (in the form of estimators) with a true ideal (the unknown parameter). The choice of θ̂₁ over θ̂₂ in estimating θ based on PMC allows us to quantify "how frequently" we have chosen better. In any given sample the statistician does not know whether his choice was closer but does know how frequently he can expect to choose the better of the two theories. The simple philosophy of Pitman's measure was coined centuries ago (see Rao (1989)) in a discourse by Descartes: "... when it is not in our power to know what is true, we ought to do what is more probable." Descartes' idealized view must be considered in light of the fact that comparisons based
on PMC may be mixed over the parameter space. Our probabilities depend upon an unknown parameter, and what is more probable may itself depend on this parameter, which we do not know.
2.4 Joint versus marginal information
In comparing estimators using the MSE or PMC criterion, an issue that often arises concerns the use of marginal versus joint information. PMC depends on the joint distribution of the competing estimators while MSE is based on their separate marginal distributions. Which is better depends on the particular situation under study. Several different authors have addressed this topic and have provided useful examples of the value of both types of information. Savage began the debate in his 1954 book on the foundations of statistics. After reviewing Pitman's (1937) original paper on PMC he concluded that
    On any ordinary interpretation of estimation known to me, it can be argued (as it was in Criterion 3) that no criterion need depend on more than the separate distributions.

A careful reading of Criterion 3, stochastic domination, reveals that Savage states only that if the concentration functions of two estimators are identical, then there is nothing to choose between them since they are marginally identical. Thus Savage does not actually give an argument in support of this statement. However, Blyth (1972), with accompanying discussants, and Blyth and Pathak (1985) offer a series of examples where knowledge of the joint distribution can be more helpful. These examples are essential in establishing the role of PMC in estimation theory. Robert, Hwang, and Strawderman (1993) criticize the Pitman closeness criterion because its value depends on the joint distribution of two competing estimators. The basic reason given is that only one estimator is used in practice. They note that the Pitman closeness criterion depends upon the value of the correlation between the estimators. An informative example in this regard is given by Peddada and Khattree (1986), who compare two unbiased but correlated estimators of the common mean of two normal distributions. Sarkar (1991) provides a generalization of their comparison to several populations. In the asymptotic setup of Chapter 6, we show that the MSE and PMC criteria depend on a common correlation coefficient which has its origin in the Fisher efficiency.
Example 2.4.1 UMVU estimation is certainly one of the key procedures in estimation theory and its origins in decision theory are well accepted. Consider two (mean) unbiased estimators θ̂₁ and θ̂₂ of a common parameter θ such that both have finite second moments, and θ̂₂ is the UMVUE of θ attaining the Fréchet-Cramér-Rao information bound. Then the mean squared error relative efficiency of θ̂₁ to θ̂₂ is
where ρ is the correlation coefficient between θ̂₁ and θ̂₂. Whenever the family of distributions is complete, we have that Var(θ̂₂) = Cov(θ̂₁, θ̂₂) by the Lehmann-Scheffé Theorem, since θ̂₂, the UMVUE, is uncorrelated with all unbiased estimators of zero, in particular θ̂₁ − θ̂₂. When the family of distributions is not complete, Var(θ̂₂) = Cov(θ̂₁, θ̂₂) since Var[αθ̂₁ + (1 − α)θ̂₂] ≥ Var(θ̂₂) for any α ∈ [0, 1] and
Since any convex linear combination of θ̂₁ and θ̂₂ is unbiased, its variance cannot be smaller than that of the UMVUE. Thus, even in the classical context of unbiased estimation, the correlation coefficient provides a quantitative value to the mean squared error relative efficiency of an unbiased estimator to the UMVUE! This connection between an element from the class of unbiased estimators with the UMVUE, pointed out by Fisher (1938), is seldom accentuated in texts on statistical inference, although it plays a natural role in finding the Fisher efficiency.

In the absence of a complete sufficient statistic, the joint distribution of two estimators may contain more information than either of the marginals. The Pitman closeness criterion extracts such additional information in a natural way. In Example 2.3.2 on the Cauchy distribution, the pair (θ̂₁, θ̂₂) forms a sufficient statistic for estimation of θ. The Pitman closeness criterion, based on the joint distribution of θ̂₁ and θ̂₂, contains this sufficient statistic through a linear transformation and should intuitively be more informative. This example of the Cauchy distribution has been widely discussed by Blyth (1972), Mood et al. (1974), Blyth and Pathak (1985), and Robert et al. (1993). It has been proposed as an unintuitive aspect of the Pitman closeness criterion. We interpret this result very positively, since larger
samples provide sample averages that are more frequently closer to the parameter of interest, which is indeed a very reasonable conclusion that cannot be obtained using conventional procedures. The sample mean, considered as a convex linear combination of independent Cauchy random variables, has a Cauchy distribution with a median of θ and a unit scale parameter (see Kagan, Linnik, and Rao (1973)). Consequently, this phenomenon persists for increasing sample sizes in the Cauchy distribution and is not simply a special result for samples of size 2. Decision theorists could reasonably contend that absolute or quadratic loss is inappropriate for such distributions as the Cauchy. We illustrate loss functions for which the risk would exist even in the Cauchy distribution. Consider the "normal" loss function defined previously by
which provides a bounded loss in the interval [0, 1). Since θ̂₁ and θ̂₂ have the same marginal distributions, if their risks exist they will be identical in value. In this case,
Although the risks exist in this case, they produce the same value for both candidate estimators and thus we conclude that the two estimators are equivalent. Again, we reiterate that the Pitman closeness criterion produces the intuitive result that θ̂₂ is Pitman-closer than θ̂₁.

The following subsections contain discussions of situations where joint information is more helpful than marginal information. Initially the concept of comparing estimators to the true value of the parameter being estimated (as used in the MSE criterion) is discussed. It is then shown that, by determining which of the estimators is closer in a relativistic sense (as in the PMC criterion), more useful results frequently can be attained.
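As a quick empirical check on the Cauchy comparison discussed above, the following sketch estimates the Pitman closeness probability of the sample mean of two observations against a single observation by Monte Carlo; it takes θ = 0 purely for the simulation and is an approximation rather than the numerical integration over the (α, β) square described earlier. The estimate should exceed one half, consistent with the conclusion that θ̂₂ is Pitman-closer than θ̂₁.

    import numpy as np

    rng = np.random.default_rng(7)
    theta = 0.0                       # location parameter; 0 is used only for the simulation
    n_rep = 1_000_000

    x = rng.standard_cauchy((n_rep, 2)) + theta
    est1 = x[:, 0]                    # a single observation
    est2 = x.mean(axis=1)             # the sample mean of two observations

    closer_mean = np.abs(est2 - theta) < np.abs(est1 - theta)
    print("estimated Pr(|mean - theta| < |X1 - theta|):", closer_mean.mean())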
2.4.1 Comparing estimators with an absolute ideal
Use of the MSE criterion in comparing estimators is based on examining only the marginal information available on each estimator. This is the common procedure in statistical estimation theory. There exists a true but unknown value of the parameter. We believe in the existence of this ideal value but we do not know it and cannot determine it with probability one. Each available
estimator is to be compared with this ideal using MSE as the criterion. The estimators thus stand alone and we are forced to use only their separate distributions. The popular procedures of statistical inference such as maximum likelihood estimation, minimum variance unbiased estimation, and Bayes estimation (using noninformative priors) produce estimators that are optimal within a class of decision rules under a specific criterion. Thus with respect to a chosen criterion, a best estimator (i.e., an MLE, a UMVUE, or a Bayes estimator) of the ideal value exists. In the previous section, we discussed examples in which one or all of these optimal decision rules may not exist. In such estimation problems, and in the absence of optimum estimators, we are left with only the ability to make relative comparisons in pairs. Pitman's closeness criterion provides an alternative technique for comparing available estimators of the true but unknown value of the parameter.

Even in estimation problems where all the optimal procedures exist, the question surfaces as to how these different procedures should be compared. Each technique has its own set of favorable properties, but if there is disagreement over the criteria then the Pitman closeness criterion offers an impartial method for comparing the optimal decision rules. Some authors (e.g., Rao (1980) and Severini (1991)), when comparing optimum estimators derived from loss functions with those that are not (such as the maximum likelihood estimator), have advocated the use of the Pitman closeness criterion as an impartial procedure for the pairwise comparison. Keating (1985) also advocated the use of Pitman's closeness criterion in comparing the best estimators derived under different loss functions, such as absolute and quadratic loss. This point accentuates our view that the sensitivity of the optimal estimator in the decision theoretic framework is highly conditioned on the presumed loss function. It is simple to overlook the use of other loss functions because quadratic loss is employed so pervasively throughout the discipline. Further, in estimation theory, when the class of estimators is unrestricted, optimum estimators with respect to traditional but different criteria fail to exist (i.e., we are left with at best a set of admissible rules). It therefore seems judicious to use some impartial criterion such as the Pitman closeness criterion to compare estimators.

Contrast this situation with the setting where we use PMC as the criterion. Here we take a relativistic approach in that our interest is in which estimator is more frequently closer to the ideal value. We do not really care about the size of the error as much as we do about the estimator that has
the smaller error. For example, in comparing the times of runners at a track meet, we are not so concerned with how fast each preliminary winner runs a particular race as we are with which runner wins the final race. It would be helpful to have the marginal results of each runner for individual races, but we are mainly concerned with which one wins in head-to-head competition. This comparison requires knowledge of the joint distribution of the two race times, which in general need not be independent.
2.4.2 Comparing estimators with one another
Consider the following example to demonstrate the usefulness of joint versus marginal information in the comparison of one estimator to another. Consider two mechanical instruments that drill holes on some metallic strip. It is desired that the holes be drilled at the location 0 on a measurement scale. Let us assume that the two instruments are subject to error and hence do not always drill at the location of interest. Suppose the joint distribution of the location of the holes drilled by the two instruments is given as follows:

                  θ̂₂ = .001    θ̂₂ = 1.000
    θ̂₁ = 0          .495          .495
    θ̂₁ = 10         .005          .005

In the context of estimation, θ̂₁ and θ̂₂ are two competing estimators of the known parameter θ = 0. We want to choose the most appropriate estimator (the drilling instrument). Note that

    ℙ(θ̂₁, θ̂₂ | θ = 0) = Pr(|θ̂₁ − 0| < |θ̂₂ − 0|) = .495 + .495 = .99,

so that θ̂₁ is superior to θ̂₂ in terms of PMC. Consider now the marginal distributions of these two estimators. These are as follows:

    θ̂₁:        0      10
    Pr(θ̂₁):   .99    .01

    θ̂₂:       .001    1.0
    Pr(θ̂₂):    .5     .5
Using these distributions the MSEs of the two estimators are

    MSE(θ̂₁) = (.99)(0 − 0)² + (.01)(10 − 0)² = 1.0

and

    MSE(θ̂₂) = (.5)(.001 − 0)² + (.5)(1.0 − 0)² ≈ .5.

Hence, the estimator θ̂₂ is preferable to θ̂₁ since it has the smaller MSE. We are confronted with the conflicting results that θ̂₁ is preferable using the PMC criterion and θ̂₂ is superior under the MSE criterion. This gives rise to the question as to which estimator is of more practical value. Observe, from the marginal distributions, that θ̂₁ drills at the correct location 99% of the time and at the wrong location 1% of the time. In contrast, θ̂₂ never drills at the location of interest although, in terms of MSE, it is closer to the true parameter value than θ̂₁. Based on these results, our choice should be clear: select the first instrument as it is better in drilling at the preferred location.

The above examples demonstrate that a user of PMC can defend the value of joint information in many types of estimation problems. We will discuss this point in more detail in Chapter 5, where we will introduce the appropriate roles of ancillarity, equivariance, sufficiency, etc. They also add an important link in the historical development of PMC, as they relate some of the concerns that needed to be addressed to make it a viable alternative. Fortunately, researchers have been able to provide opposing views to procedures based on only examining marginal information. These arguments raised other issues of concern that in turn opened new paths of development.
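A few lines of code confirm the conflict between the two criteria for this joint distribution; the locations and probabilities are exactly those of the table above.

    import numpy as np

    theta = 0.0
    # Joint distribution of (theta1_hat, theta2_hat): support points and probabilities.
    t1 = np.array([0.0, 0.0, 10.0, 10.0])
    t2 = np.array([0.001, 1.0, 0.001, 1.0])
    p = np.array([0.495, 0.495, 0.005, 0.005])

    pmc = np.sum(p[np.abs(t1 - theta) < np.abs(t2 - theta)])
    mse1 = np.sum(p * (t1 - theta) ** 2)
    mse2 = np.sum(p * (t2 - theta) ** 2)

    print("PMC of instrument 1 over instrument 2:", pmc)    # .99
    print("MSE of instrument 1:", mse1)                     # 1.0
    print("MSE of instrument 2:", round(mse2, 6))           # about .5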
2.5 Concordance of PMC with MSE and MAD
In the preceding sections of this chapter, we presented examples in which PMC may be a better criterion than the traditional criterion of mean squared error for the comparison of competing estimators. Many of these examples were nonregular cases that pose difficulties for most criteria. In many regular cases, there is considerable agreement of PMC with MSE on the better estimator in a pairwise comparison. To present a more balanced perspective of the interplay of these criteria, we discuss here, in very general terms, the
relative behavior of PMC, MSE, and MAD. We refer the reader to examples in subsequent chapters for illustration of these observations. In the presentation of a more balanced perspective of the general agreement between PMC and MSE in regular cases, certain questions arise naturally. For example, given cases where MSE performs well, what can we say about the Pitman-closeness criterion and vice versa? At this stage it is not possible to provide a complete answer to such a broad question. Nevertheless, we provide some guidelines in this section to motivate the more technical deliberations which will follow in Chapters 4-6. Throughout Chapter 4 we provide many examples in which PMC has an influence on the comparison of two estimators based on MSE and MAD. In estimator comparisons that are mixed over the parameter space Ω, we can often determine the point in the parameter space where MSE or MAD change preference by noting dramatic shifts in the numerical values of the Pitman closeness probabilities. This observation is illustrated in Example 4.5.2 of the proportion defective in the normal distribution. This example illustrates the effect that PMC can have on risk-based comparisons. However, it does not demonstrate complete agreement between the criteria. In §4.2.2, we engage the problem as to whether an unbiased estimator with known variance is Pitman-closer than another unbiased estimator with a larger variance. If the estimators have a bivariate normal distribution, then we prove that PMC and MSE are equivalent. Khattree and Peddada (1987) extend this concordance result to estimators having elliptically symmetric distributions. Through these results we lay the foundation for the asymptotic comparison of the estimators. A preliminary example of an asymptotic PMC comparison comes from the comparison of the sample mean and sample median of random samples from a normal distribution in Example 4.6.8. The theoretical asymptotic results are thoroughly studied throughout Chapter 6. In §§6.1 and 6.2, the theory of BAN estimators is shown to support the concordance of PMC and MSE within a broad class of estimators having asymptotically normal (AN) distributions. A direct consequence of these asymptotically equivalent criteria is that PMC is asymptotically transitive. Thus, PMC can asymptotically share the same standing as the MSE criterion. Moreover, PMC's flexibility in small sample sizes may often make it a more appealing criterion than MSE or MAD. In §§5.1, 5.2, and 5.3, we construct small-sample Pitman-closest estimators via the properties of equivariance and ancillarity. Under some reasonable conditions, these Pitman-closest estimators coincide with the median
unbiased estimator having smallest MAD within the equivariant class. Once again, PMC is transitive when one compares estimators from an equivariant class. It is important to observe that for Pitman's measure of closeness, median unbiasedness replaces unbiasedness from the MSE perspective and the properties of equivariance and ancillarity replace sufficiency and completeness as prerequisites. In §5.4 we present a Bayesian interpretation of Pitman's measure known as posterior Pitman closeness. We mentioned this concept at the end of the convenience store example in §2.1.3. Posterior Pitman-closest estimators are frequently Pitman estimators obtained under absolute error loss. We show that posterior Pitman closeness is transitive and that the posterior Pitman-closest estimator is the one which minimizes posterior MAD. In §6.4, we unify the convergent equivalent nature of the MLE, the Pitman estimator, and the Bayes estimator under mild regularity conditions. These unifying results manifest a remarkable concordance of PMC with MSE. This important but little-known result is established under the suitable regularity conditions needed for the BAN estimators. However, the regularity conditions needed for PMC are less restrictive than those generally assumed.
Chapter 3
Anomalies with PMC

We have seen in the last chapter that there are many different reasons for preferring PMC as a criterion in parameter estimation. These arguments should help solidify our understanding of this procedure and how it relates to the concept of risk. It now is helpful to turn our attention to certain operational aspects of PMC and when it is appropriate to adopt it as a criterion. Most of these issues center on probability properties and apply to many facets of our daily lives. Understanding them will help us recognize estimation problems when PMC is a useful alternative to other estimation criteria and when it is not. For example, in many types of consumer preference tests, it is common to ask each sampled individual to make a pairwise choice between several brands, say A, B, and C. One could then determine the proportion of consumers who preferred A to B, B to C, and C to A. If all three proportions exceed 50%, it is difficult to choose the best alternative as any one appears to be preferred over exactly one of the other two brands. Yet such an event is entirely possible.

The above example illustrates a key criticism associated with PMC: the fact that it lacks the transitive property. By this we mean that Pr(X < Y) > .50 and Pr(Y < Z) > .50 does not guarantee that Pr(X < Z) > .50. Blyth (1991), in a discussion of Pitman's original paper on PMC, counters these criticisms with arguments that transitiveness may not describe reality, particularly when making social preferences. We live in an intransitive world in which there are many situations, such as athletic competitions, political races, or chess matches, where the transitive property is irrelevant. The aspect of intransitiveness is explored in this chapter and several examples of practical settings in which it occurs are described.
A discussion is also given of the probability paradoxes in choice that may occur in using a pairwise comparison procedure. Some of these are similar to problems of the comparison of multiple treatment means in a one-way ANOVA. In using PMC it may be possible to find an estimator that is worst in pairwise comparisons but best in a simultaneous comparison. Likewise it is possible to have a pairwise-best estimator that is simultaneous-worst. Such paradoxes are given careful study to aid in understanding PMC. The problems of intransitiveness and paradoxes among choices can be partially resolved by the manner in which probability is assigned to ties between two estimators. A useful resolution for this assignment is discussed and leads to judicious treatment of adaptive estimators under PMC. The chapter ends with a discussion of the Rao-Berkson controversy concerning the comparison of minimum chi-square estimators with maximum likelihood estimators which are equal with positive probability.
3.1 Living in an intransitive world
Many daily decisions are made which produce intransitive preferences among the available choices. For example, in the long history of the Southwest Athletic Conference (SWC) only the season of 1959 produced a three-way tie for the conference football championship. In that season, three teams, Texas Christian University (TCU), the University of Arkansas, and the University of Texas, finished conference play with identical won-lost records of 5-1. Moreover, in head-to-head competition, TCU defeated Texas 14-9, Texas defeated Arkansas 13-12, and Arkansas defeated TCU 3-0. This outcome produced the type of intransitiveness that Savage (1954) so well characterized as the feeling of being caught in a contradiction. The final scores of that season are given in Table 3.1 for each tri-champion.

Since final scores are available we can compute a total-point differential to select a champion. In head-to-head competition, TCU had a +2 point differential, Arkansas had a +2 point differential, and Texas had a −4 point differential. Unfortunately, this comparison still leaves TCU and Arkansas tied for the championship. Based on all the conference opponents, TCU had a +70 point differential, Arkansas had a +30 point differential, and Texas had a +43 point differential. This comparison would declare TCU the conference champion. However, if a minimax criterion is used then Arkansas would be the champion, since its lone defeat was by a single point. It appears that a partisan fan could probably find an established statistical procedure to support the right of each of these three teams to the championship. To those
Table 3.1: Final scores of the SWC tri-champions.

    Opponent     TCU      Arkansas   Texas
    TCU           --      3-0        9-14
    Arkansas     0-3       --        13-12
    Texas        14-9     12-13       --
    A&M          14-8     12-7       20-17
    Baylor       14-0     23-7       13-12
    Rice         35-6     14-10      28-6
    SMU          19-0     17-14      21-0

    Taken from "The Official Southwest Athletic Conference Football Roster-Records Book," 1960, Volume XL.
of us who are sports fans such paradoxical outcomes are indeed fascinating. One should be impressed by the very close outcomes and low scores of most of these games. As a partial explanation, one should recall that at this time, most of the participants played both offense and defense.

The SWC employs a unique rule in the event of a tie for the football championship: the team with the least recent appearance in the Cotton Bowl becomes the conference representative. TCU had won championships in 1955 and 1958, Arkansas had won a championship in 1954, but Texas had neither won nor shared a championship since 1953. Therefore, the University of Texas was chosen to represent the SWC in the 1960 Cotton Bowl against Syracuse University. Syracuse defeated Texas 23-14 in the bowl game and also won the national title.

Usually when several alternatives are available, we use pairwise comparisons to help guide our selection but, as shown in this example, intransitiveness among choices may pose a problem. Examples where this paradox can occur range from round-robin competitions, to politics, and even to psychology, such as in the study of the notion of rationality in thinking patterns. Understanding these phenomena is important in our use of PMC, due to its emphasis on preference of choice between two alternatives. Before beginning this section we need a formal definition of intransitiveness as it applies to PMC. An excellent source is David (1988), who defines it as follows.

Definition 3.1.1 For any three real-valued random variables A, B, and C, stochastic intransitiveness occurs whenever Pr(A < B), Pr(B < C), and Pr(C < A) all exceed .50.
Suppose that we wish to compare three estimators, θ̂₁, θ̂₂, and θ̂₃, of a common parameter θ. Then it is possible for ℙ(θ̂₁, θ̂₂ | θ), ℙ(θ̂₂, θ̂₃ | θ), and ℙ(θ̂₃, θ̂₁ | θ) to all exceed .50. Such circularities were dubbed by Sir Maurice Kendall as circular triads more than half a century ago. David's (1988) monograph on paired comparisons is essential reading on this topic. Each pairwise comparison, θ̂₁ vs. θ̂₂, θ̂₂ vs. θ̂₃, and θ̂₁ vs. θ̂₃, has two possible outcomes based on PMC (ignoring ties). Hence, there are 2³ = 8 possible outcomes for the three pairwise comparisons. Based on PMC, the three pairwise comparisons may not have independent outcomes. Among these eight results, two possibilities produce circular triads. These triads can be represented by means of a directed graph where θ̂ᵢ → θ̂ⱼ symbolizes that θ̂ᵢ is Pitman-closer than θ̂ⱼ to θ, as depicted in Figure 3.1. Whereas Savage lamented the contradictory nature of a circular triad, David (1988) provided a decidedly different perspective: "It is a valuable feature of the method of paired comparisons that it allows such contradictions to show themselves ..."
3.1.1 Round-robin competition
The simplest example of intransitiveness can be seen in round-robin competitions. For example, in a round-robin tournament involving three soccer teams from Argentina, Brazil, and Italy, Argentina may be preferred over Brazil, Brazil may have a better chance of beating Italy, but Italy may be the choice over Argentina. Which team should we choose? Clearly our choice is difficult as no one team appears to be preferred over all the others. Blyth (1972) gives an example of an athletic event where the times TA, TB, and TC to run a particular course are recorded for three runners, labeled A, B, and C. If Pr(TA < TB), Pr(TB < TC), and Pr(TC < TA) are all greater
Figure 3.1: A directed graph of a circular triad based on PMC.
than .50, then these three probabilities indicate that there is better than a 50% chance that each of the following events occurs: A beats B, B beats C, and C beats A. In many Olympic team competitions, such as basketball, water polo, or ice hockey, this same property arises in comparisons. Several national teams are placed together in a league and then compete in round-robin competition to see who advances to the medal round. During the competition, pairwise comparisons between teams are made by the various sports writers. When enough information is gathered to assess the necessary probabilities of winning, the newscasters select the teams with the best chance to make it to the final round. Unfortunately, the intransitiveness property often arises, making it difficult to predict the winner.

In the Games of the XVI Winter Olympiad (1992), a circular triad occurred in the preliminary round of the ice hockey competition. The teams from Canada, Czechoslovakia, and the Unified Team of the former Soviet Union completed the preliminary round-robin with identical records of four wins and one defeat. Czechoslovakia defeated the Unified Team 4-3, the Unified Team defeated Canada 5-4, and Canada defeated Czechoslovakia 5-1. In terms of team placement the Olympic committee uses a goal differential in head-to-head competition to break ties. Canada had a +3 goal differential, the Unified Team had a 0 goal differential, and Czechoslovakia had a −3 goal differential. Accordingly, the teams were ranked first, second, and third, respectively, for the medal round. The ice hockey medal round uses a single-elimination approach in the determination of the gold medal winner. Thus placement can definitely affect the results. For example, although the Unified Team lost to Czechoslovakia in the preliminary round, it did not face Czechoslovakia in the medal round as the Czech team was eliminated in its first medal-round game. As was predicted by the placement order, the Unified Team won the gold medal, the Canadian team won the silver medal, and the Czech team won the bronze medal.

Intransitiveness is not necessarily a negative aspect as transitiveness may be an artificial requirement. For example, in chess it is entirely possible that with high consistency player A wins over player B, B wins over C, and C wins over A. This does not make the rules of chess inappropriate but instead increases interest in the game. This is why round-robin events are popular. A dominant team will usually excel but when no team has a clear advantage over all the others then the intransitiveness leads to interesting and often exciting results.
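A circular triad is easy to construct explicitly. In the hypothetical sketch below, each of three runners records one of three equally likely times (the values are invented for illustration); every one of the three pairwise probabilities in the cycle works out to 5/9, so A tends to beat B, B tends to beat C, and C tends to beat A.

    from itertools import product
    from fractions import Fraction

    # Hypothetical race times (minutes); each value occurs with probability 1/3.
    times = {
        "A": [1, 6, 8],
        "B": [2, 4, 9],
        "C": [3, 5, 7],
    }

    def prob_faster(x, y):
        """Pr(runner x records a smaller time than runner y), assuming independent times."""
        wins = sum(a < b for a, b in product(times[x], times[y]))
        return Fraction(wins, len(times[x]) * len(times[y]))

    for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
        print(f"Pr(T{x} < T{y}) = {prob_faster(x, y)}")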
3.1.2 Voting preferences
The issue of intransitiveness is most prevalent in discussions on voting preferences. For example, Blyth (1972) presents the problem where voters are asked to make pairwise comparisons between alternatives A, B, and C. A majority (70%) of the voters prefer A over B, a majority (60%) prefer B over C, and a majority (70%) prefer C over A. So which alternative is most popular? Election data often result in intransitive popularity contests. For example, prior to the 1980 Presidential elections certain political opinion polls revealed that in head-to-head competition, President Jimmy Carter would defeat Senator Edward Kennedy, California Governor Ronald Reagan would defeat President Carter, but Senator Kennedy would defeat Governor Reagan. The presence of such intransitiveness in elections is common and should not cause us to ignore this information. Instead, it should help us to understand that elections are based on closeness probabilities. Hoffman (1988) describes a preference test concerning America's favorite fast food, the hamburger. The choices are to buy a hamburger from McDonald's, Burger King, or Wendy's. He notes that society's preference may be paradoxically intransitive in that McDonald's may be preferred to Burger King, Burger King may be preferred to Wendy's, but Wendy's may be preferred to McDonald's. With three individuals, such a selection would occur if each chain is ranked first by only one person, second by only one person, and third by only one of the individuals. Another example described by Blyth (1972) concerns the choice of pie that brings an individual the most satisfaction. Suppose there are three alternatives: apple, blueberry, and cherry. It is entirely conceivable that the individual prefers apple to blueberry, blueberry to cherry pie, and cherry to apple pie. This preference would imply a probability exceeding .50 of receiving more satisfaction from one pie as compared to the other. Here the preference within an individual, as opposed to across individuals, is intransitive.
3.1.3 Transitiveness
The lack of transitiveness is clearly evident in these examples and requires extensive treatment and discussion. We are concerned, although to a lesser extent than decision theorists, over the possible intransitiveness of Pitman's closeness criterion. Our diminished concern over transitiveness is based on a
multitude of results. In Theorem 5.4.5, we demonstrate that transitiveness exists among competing estimators in the Bayesian setting known as posterior Pitman closeness (see Ghosh and Sen (1991)) to which we previously referred in the convenience store example. From Ghosh and Sen (1991), the posterior median is posterior Pitman-closest, and as such is unique, whenever the posterior distribution is continuous. Likewise, if we use absolute loss, the estimator that minimizes the Bayes risk (using a noninformative prior) coincides with that obtained from the Pitman closeness criterion. If one restricts the class of estimators under consideration to those that are equivariant under a suitable group of transformations (as in Ghosh and Sen (1989); or Nayak (1990)), the Pitman closeness criterion produces a median-unbiased estimator as the Pitman-closest within the class. More importantly, among these estimators transitiveness holds (see Nayak (1990) and §5.3). Restriction of the class of estimators by unbiasedness or equivariance is virtually inescapable in classical analysis and there is no reason not to involve such "common" restrictions in using the Pitman closeness criterion. Likewise, the Pitman-closest estimator under such restrictions is the Pitman estimator under absolute risk. In decision theory the lack of transitiveness among competing estimators under Pitman's criterion is usually unavoidable. This happenstance is a consequence of the fact that majority preference alone is sufficient to guarantee the status of being Pitman-closer. Game theorists will recognize that as a majority preference rule the Pitman closeness criterion parallels the democratic voting process in the United States. If 51% of the citizens favor a candidate, it does not matter how much he or she is disliked by the 49% who voted for the opposition. Neither does it matter how close the decision was for the 51% who favored the winner. This criterion follows closely the well-known paradoxes in elections or preference polls. In this regard, the example given in §3.2.3 is very similar to the vast literature of Brams (1985). As an illustration, consider a primary election in which there are three candidates, A, B, and C. Let us suppose that each voter orders the candidates with only the following possible outcomes:
The ability to order makes individual preference transitive. (Note that this situation differs from the pie example discussed earlier because in that case individual choices were intransitive.) Let us suppose that the electorate
supports each of the three permissible outcomes with the stated percentages. Even though each member of the electorate has ordered the candidates, the following pairwise elections would result:
where X ≫ Y denotes that candidate X defeats candidate Y. Hence if one candidate withdraws from the election, the consequences will be disastrous for one of the two remaining individuals. With three candidates in the race, the polls provide the picture of a very competitive election. In this example the two candidates, B and C, will be chosen as the top two winners. Even though candidate B avoided elimination by only 1% of the electorate, in the following run-off election B will overwhelmingly defeat C by a landslide margin of 30%.

The axiom of von Neumann and Morgenstern (1944, p. 27), that individual preference is transitive, is often called plausibility. However, these authors later remark that preference by society or by n participants in a game, even with the postulate of plausibility,

    ... is indeed not transitive. This lack of transitivity may appear to be annoying and it may even seem to be desirable to make an effort to get rid of the theory of it. Yet ... [it is] a most typical phenomenon in all social organizations. Cyclical dominations — y over x and z over y and x over z — is indeed one of the most characteristic difficulties that a theory of these phenomena must face.

This incoherence (i.e., intransitiveness) of Pitman's closeness criterion is well known among social scientists through Arrow's (1951) Impossibility Theorem for the collective rationality of social decision making. Blyth (1991) discusses the context of this point and its relation to the Pitman closeness criterion. Following von Neumann and Morgenstern, Arrow (1951) hypothesizes an electorate of voters whose preferences are "rational," meaning that they satisfy the following pair of axioms:

    I. Either x > y or y > x (Decidability);
    II. If x > y and y > z, then x > z (Transitiveness).

In the mathematical sense, Axiom I applied to the real number line is very much an outcome of the principle of trichotomy (i.e., x > 0 or x < 0).
Arrow constructs five more conditions to avoid trivial cases and proves the Impossibility Theorem: that no social ordering satisfies the pair of axioms and the quintet of conditions. Reflecting upon his discovery, Arrow (1951) remarks that

    ... the only part of the conditions that seems to me at all in dispute is the assumption of rationality. The consequences of dropping this assumption are so radical that it seems worthwhile to maintain it and consider restrictions on individual preferences.

Readers who are interested in a complete presentation of the two axioms and the five conditions are referred to the presentation by Thompson (1989). He also provides a proof of Arrow's Impossibility Theorem. In fact Savage (1954) constructed his Foundations of Statistics on an axiomatic treatment of pairwise preferences. He makes the analogy that the occurrence of an intransitive comparison creates the same sensation as when one is confronted with the fact that some of his beliefs are logically contradictory.

With the Pitman closeness criterion and its subsequent results, we are often better equipped to address many of the pressing issues of politics. An example of this can be seen in the usage of the system of approval voting, a method which is used for the election of certain officers in scientific and engineering societies (see Brams and Fishburn (1992)). In an approval vote involving three candidates, each voter selects one or two candidates. For example, in our previous primary election discussion with candidates A, B, and C, A would appear on 67% of the ballots, B on 65%, and C on 68%. Approval voting allows us to rank the candidates; thus, to some extent our degree of preference is reflected in the approval ballot. Haunsperger (1992) and Esty (1992) very recently suggested different ways to use these simultaneous comparisons to find a Condorcet (1785) candidate, who would defeat all competitors in a two-candidate election. Haunsperger (1992) suggests an innovative use of the Kruskal-Wallis test in her procedure, whereas Esty employs a maximum likelihood-based approach. These recent approaches arising from practical problems in diverse disciplines illustrate the realism and necessity of the PMC criterion.

In our example of the 1959 SWC football season, approval voting could be used to determine the conference champion by ranking the three teams by their margin of victory in conference games. In this approach suggested by Haunsperger (see Table 3.2), TCU receives three first, three second, and one third place ranks, whereas Arkansas and Texas each receive two first, two
Table 3.2: Ranks of tri-champions by magnitude of victory.

    Opponent     Rank 1      Rank 2      Rank 3
    TCU          Arkansas    TCU         Texas
    Arkansas     Texas       Arkansas    TCU
    Texas        TCU         Texas       Arkansas
    A&M          TCU         Arkansas    Texas
    Baylor       Arkansas    TCU         Texas
    Rice         TCU         Texas       Arkansas
    SMU          Texas       TCU         Arkansas
second, and three third place ranks. Hence, as illustrated in Table 3.2, TCU wins approval voting based on ranks ordered by the magnitude of victory. This nonparametric rank sum is known among social scientists as a Borda count (see Brams (1985)). Along the lines of Arrow's axioms, the Possibility Theorem of Sen (1966) also has application in our discussion of transitiveness. Keating's (1991) proof (see Theorem 4.6.1) that the Pitman closeness criterion is transitive among a class of ordered estimators is a special case of Sen's more general result applied to the comparison of estimators. The intransitiveness of the Pitman closeness criterion in estimation theory is certainly to be expected given the vast research on majority preference in the larger domain of game theory. Moreover, Pitman's (1937) closing remarks that intransitiveness "... does not detract from the definition of closest" challenge the necessity of transitiveness among all pairwise comparisons. In other words, if we can exhibit a decision rule, a Condorcet (1785) estimator, that is Pitman-closer than all its competitors on a pairwise basis, is transitiveness among the defeated a realistic concern? The mathematical significance of Pitman's intuitive remark is discussed in §3.4.

If transitiveness can only be guaranteed by restricting the class of candidate estimators, then a natural question arises as to how frequently transitiveness occurs. Brams (1985, p. 66) gives us some limited insight into the relevance of intransitiveness based on the number of estimators under consideration through calculations of probabilities of a Condorcet candidate, which is a Pitman-closest estimator. For three competing estimators, Brams calculates that approximately 92% of the comparisons would produce a Pitman-closest estimator due to chance alone. Moreover, in asymptotic setups, we will show in Chapter 6 that transitiveness holds under fairly general regularity conditions.
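A quick arithmetic check of the Borda (rank-sum) comparison described above: the script below totals the ranks of Table 3.2 for each team, with the smallest total indicating the preferred team. TCU's total of 12 beats the total of 15 shared by Arkansas and Texas.

    # Ranks of each tri-champion against the seven conference opponents,
    # read row by row from Table 3.2 (1 = largest margin of victory).
    ranks = {
        "TCU":      [2, 3, 1, 1, 2, 1, 2],
        "Arkansas": [1, 2, 3, 2, 1, 3, 3],
        "Texas":    [3, 1, 2, 3, 3, 2, 1],
    }

    # Borda-style rank sum: the smallest total identifies the preferred team.
    for team, r in sorted(ranks.items(), key=lambda kv: sum(kv[1])):
        print(team, sum(r))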
3.2 Paradoxes among choice
Intransitiveness is only one of several paradoxes associated with the concept of probability of choice. Even if a pairwise criterion such as PMC were transitive, the preferred alternative may be the pairwise-worst but simultaneously-best. Likewise, the choice could be pairwise-best but simultaneously-worst. These paradoxes need to be understood when we use PMC. An excellent example in politics is used to illustrate these concepts.
3.2.1 The pairwise-worst simultaneous-best paradox
A formal description of the pairwise-worst simultaneous-best paradox is given by Blyth (1972). It is defined as follows.

Definition 3.2.1 For real-valued random variables X₁, X₂, and X₃, it is possible for Pr(Xᵢ = min{X₁, X₂, X₃}) to be the largest for i = 3, even though Pr(X₁ < X₃) and Pr(X₂ < X₃) exceed .50.

Thus X₃ is preferred over both X₁ and X₂ in any simultaneous comparison even though it is worse than X₁ and X₂ in pairwise comparisons. Hoffman (1988) describes an interesting problem taken from a 1948 American Mathematical Monthly. It concerns three men, Al, Ben, and Charlie, who are competing in a dart game with balloons as the target. Each person is holding a balloon tied to a string and remains in the game until his balloon is broken by a dart. The person left with an unbroken balloon is the winner. The order of play is determined randomly and then each person gets to throw one dart on each rotating turn. Suppose that Al pops a balloon 80% of the time, Ben pops one 60% of the time, and Charlie pops one 40% of the time. In any pairwise competition we would expect Al to beat Ben, Al to beat Charlie, and Ben to beat Charlie. In simultaneous competition, however, the winner depends on the strategy used by the players. For example, if each player tries to pop the balloon of the strongest opponent, then the worst player may be the most likely to emerge as the winner. In such a situation, the probability that Charlie, the worse shot, wins is .37 while the probability that Al, the best shot, wins is .30, and the probability that Ben wins is .33. In short, Charlie survives because Ben and Al concentrate on each other and ignore him.

Another illustration of this point can be found in the primary election suggested earlier with three candidates A, B, and C. Suppose that each voter orders the candidates with only the following possible outcomes:
With these stated percentages, candidate C is the "simultaneous-best" candidate but is also the "pairwise-worst." The following pairwise elections would result:
Both candidates A and B would defeat candidate C because he is the least-liked candidate among 60% of the electorate, whereas he is the first choice among 40% of the voters. This example also manifests candidate A as the "simultaneous-worst" (since he garners 25% of the primary vote) but "pairwise-best."

The above example is useful in illustrating the politics of extremes (i.e., see §3.2.3). For example, consider a state whose Democratic electorate is 40% African-American. Assume that in the state's Democratic primary election, candidate C is of African-American descent whereas candidates A and B are Caucasians. In the run-off election candidate B will challenge C since candidate A would be eliminated in the primary. This definitely results in an election of extremes since among the 40% who vote for candidate C, B is least acceptable. Among the 35% who vote for candidate B, candidate C is least acceptable. If the electorate votes strictly along racial lines then the election hypothesized above becomes very plausible. Issues such as race, abortion, or war can be quite divisive, resulting in the polarization of the electorate along a single issue. This can eliminate candidates of moderation, such as A. In the final analysis, 40% of the electorate will have an elected official who is least acceptable among the available candidates. Note that candidate A, who was the favorite among only 35% of the electorate, was not considered to be the least acceptable by anyone. With approval voting, candidate A would have been chosen on 100% of the ballots, candidate B would have been chosen on 60% of the ballots, and candidate C would have been chosen on only 40% of the ballots.

An election example in which a simultaneous-best did not prevail in a runoff election occurred in San Antonio's Mayoral election of May 4, 1991. Eleven candidates competed in a race that centered on issues of a water reservoir and city taxes. The outcome of that election is given in Table 3.3. The outcome of this election advanced Councilwoman Berriozabal and Councilman Wolff into a runoff election. However, a survey by the Southwest Voter Registration Institute showed a polarization of the electorate
Table 3.3: 1991 San Antonio mayoral primary.

    Candidate            Number of Votes    Percentage
    Maria Berriozabal         40,319           30.95
    Nelson Wolff              34,075           26.15
    Lila Cockrell             26,939           20.68
    Van Archer                20,988           16.11
    Others                     7,964            6.11
    Total                    130,285          100.0
    Results taken from the San Antonio Express News, May 5, 1991.

along ethnic lines. Maria Berriozabal succeeded in winning 78% of the Hispanic vote, whereas Nelson Wolff captured only 10% of the Hispanic vote. Among non-Hispanic voters, Berriozabal won only 5% to Wolff's 35%. An estimated 36.1% of the ballots were cast by persons of Hispanic descent. In the runoff election, approximately 50,000 of the 55,891 uncommitted voters were non-Hispanics. The ethnic composition of the uncommitted voters indicated that Wolff was clearly in the lead. Berriozabal's best issue among non-Hispanic voters was her opposition to the water reservoir, which Wolff supported. In the initial May election, however, the continued funding of the water reservoir was defeated by a 52:48 margin. With the resolution of the reservoir issue, Berriozabal lost the opportunity of winning the support of the uncommitted non-Hispanic voters who opposed the water reservoir. Thus Wolff won the runoff election and became San Antonio's mayor.

We can extend this paradox to our previous hamburger preference problem. Suppose a group of consumers ranks the three chains according to the preference shown in Table 3.4. Suppose also that the proportion of people choosing set 1 is .25, the proportion of people choosing set 2 is .35, and the proportion choosing set 3 is .40. If it is known that 65% prefer McDonald's to Wendy's, 60% prefer McDonald's to Burger King, and 60% prefer Wendy's to Burger King in pairwise competition, McDonald's wins all pairwise comparisons. However, the opposite occurs in a simultaneous comparison, where the preferred choice is Wendy's with 40% of the votes and the least preferred spot is McDonald's with 25% of the votes.
3.2.2
The pairwise-best simultaneous-worst paradox
As in §3.2.1 the definition of a pairwise-best simultaneous-worst paradox is due to Blyth (1972). It is given as follows.
PITMAN'S MEASURE OF CLOSENESS
78
Table 3.4: Reference set for fast food chains.
Set 1 1. McDonald's 2. Wendy's 3. Burger King
Set 2 1. Wendy's 2. McDonald's 3. Burger King
Set3 1. Wendy's 2. Burger King 3. McDonald's
Definition 3.2.2 For real-valued random variables X\, X%, and X$, it is possible for Pr{JQ = mm(Xi,X<2,X3)} to be the smallest for i=\, even though Pr(Xi < X%) and Pr(Xi < X3) exceed .50. Thus X\ is the least preferred over X^ and X$ in the simultaneous comparison even though it is preferred over each in any pairwise comparison. Hoffman's (1988) dart-throwing problem is a useful example to illustrate this concept. If we return to the same problem as illustrated in §3.2.1, we note that Al was pairwise-best over Ben and Charlie. However, when the obvious strategy of attacking the strongest was utilized, Al became the simultaneous-worst. Because everyone attacked him first, Al was the biggest loser even though in pairwise competition he was far superior to either of the two contestants. In the hamburger preference problem in the previous subsection, all pairwise comparisons between McDonald's, Wendy's, and Burger King resulted in McDonald's as the winner. However, when a comparison was made with all three alternatives, McDonald's was the least preferred spot. Example 3.2.3 To exemplify this paradox in the context of estimation theory, let us consider a random sample of size n from a normal distribution with mean fj, and standard deviation a. The sample standard deviation S is a complete sufficient statistic for the estimation of a. We consider three estimators which are scalar multiples of S. The maximum likelihood estimator of a is denoted by ffi(S) = QI^, where a\ = ^/(n - l)/n. The median unbiased estimator of a is denoted by ^2(8) = ot^S, where 0:2 = Y/(n — l)/m n _i. The median of a chi-square with / degrees of freedom is denoted by m/. We consider a third estimator 03(8) = 0:3^', where 0:3 = 1/ai. The mode-median-mean ordering for the chi-square distribution,
ANOMALIES WITH PMC
79
(see Johnson and Kotz (1970) or Sen (1989a)), produces the following ordering of the three estimators:
Also, we calculate PMC through the result that whenever oti < QJ, then
where &ij = (ai + aj)/2 and X2(x>f) denotes the distribution function of a chi-square random variable with / degrees of freedom. We demonstrate a simple illustration of the pairwise-best simultaneous-worst paradox for a sample of size ten. The scalar coefficients become QI = .9487, c*2 = 1.0386, and as = 1.0541. When we apply the PMC formula to these three estimators, we obtain JP(aiS,ot2S] = .427, JP(a2<S, a^S) = .512, and ]P(aiS, ct^S) = .439. Thus $2(8) is the pairwise-best estimator of a, and in §5.2 we show that it is the Pitman-closest estimator of a. However, since QI < Q2 < as, then
Our "pairwise-best" estimator has been sandwiched between the MLE and &•$($}. In simultaneous comparison, because ai is slightly smaller and as is slightly larger than c*2, then o\(S] and (73(5) obtain the largest shares of the simultaneous probability, leaving the median unbiased estimator with a very modest proportion of the total probability. From this example the reader can understand the mechanics of this process in estimation theory and can readily create an example of the pairwise-worst simultaneous-best paradox by adjusting ai, 02, and 0:3. In many ways the type of paradox that occurs when the pairwise-best becomes the simultaneous-worst, or vice versa, is not a direct result of irrational behavior. In preference tests, for example, we have seen that selection strategies can play an important role in the choice of the winner. The same holds true in political elections. Such unintuitive outcomes often defy predicted results. In such difficult settings closeness probabilities, and the criterion of PMC, are a great aid in ranking the available alternatives.
80
PITMAN'S MEASURE OF CLOSENESS
Figure 3.2: A partition of the electorate along a single issue.
3.2.3 Politics: The choice of extremes Politics serve as an excellent setting to illustrate the impact of the probability paradoxes discussed in the previous subsections of this chapter, and the advantages of the use of PMC over other criteria such as MSE. Consider, as an example, an election that is centered around a single issue, namely, the attitude of voters within a population regarding the economy. Suppose the distribution of the electorate on the economic issue of concern is as illustrated in Figure 3.2. Let — oo represent an extreme liberal and +00 an extreme conservative. Assume further that candidates C i , . . . , Cjt take specific positions x\ < • • • < Xk on the economy and that f ( x ) represents the distribution of the attitude of the electorate toward the issue of the economy. One method of comparing the candidates is to determine the mean squared error (MSE) of each one and to select the candidate with the smallest MSE. Let X be the random variable that measures the economic attitude of randomly chosen voters within the electorate. The MSE of each candidate is calculated as Mi = E((X — Xi)2) for i — 1,..., k. The smallest possibe value of Mi is found by setting Xi = E(X). Such an impersonal calculation normally is not made by a voter. Instead, individuals are most likely to vote for the candidate whose position is
ANOMALIES WITH PMC
81
closest to their own. In this example, the set of voters closest to candidate Ci on the given economic issue is denoted by
Applying this equation one finds that the shaded region in Figure 3.2 represents Pr(V2), the fraction of the electorate that favors the second candidate. Similar probabilities would have to be calculated across all k sets to obtain the simultaneous-best candidate with respect to the economic issue. Example 3.2.4 To illustrate the inconsistency in the electorate's candidate of choice depending on whether the election is a primary or a run-off election we return to the Democratic Presidential primary of 1972. This primary will also illustrate the thesis of the politics of extremes. In that presidential primary, the four principal candidates were Hubert Humphrey, George McGovern, Ed Muskie, and George Wallace. Humphrey had been the Democratic nominee who opposed Richard Nixon in the 1968 Presidential election. Wallace, a southern conservative, ran as the presidential nominee of the defunct American Party in 1968. However, in 1972 he returned to the Democratic fold and gained support for his stance as a "hawk" on the Vietnam war, support of American labor, and opposition to racial integration. Ed Muskie, who had become the clear Democratic frontrunner, began to pass President Nixon in national preference polls in early 1971. However, Muskie's candidacy had been severely weakened by information from the Nixon White House. George McGovern was a liberal "dove" who opposed the Vietnam War, supported labor, and preferred governmentally mandated social reforms. It is difficult to imagine two candidates in the same party as ideologically different as McGovern and Wallace. We shall place McGovern at —1 (an extreme liberal) on our liberal/conservative scale and Wallace at +1 (an extreme conservative). Among the two candidates with intermediate views, Hubert Humphrey would generally be considered more liberal than Muskie and we shall assign them (our subjective) ranks -.50 and 0.00, respectively. These ranks and the assumed distribution for the primary electorate are depicted in Figure 3.3. Although we conjecture a form for the distribution of the electorate, the conjectured form is supported by the actual primary balloting given in Table 3.5. The Democratic voters of this divided nation clearly voted for candidates whose positions on the issues were closest to their own. The attempted assassination of George Wallace stopped the candidacy of the Democrat with
82
PITMAN'S MEASURE OF CLOSENESS
Figure 3.3: Distribution of Democratic Primary Voters in 1972. Table 3.5: 1972 Democratic preference primaries. Total Wallace Humphrey McGovern Muskie 11,724,795 2,647,676 2,202,840 1,743,023 3,354,360 18.79% 14.87% 100% 22.58% 28.61% Results taken from AMERICA VOTES 10 by Richard M. Scammon (1972), Congressional Quarterly, Washington D.C. 1973
the most primary votes on May 15, 1972. After the primaries on May 16, 1972, the cumulative primary balloting was as given in Table 3.5. The percentages in the row beneath the number of primary ballots represent the Pr(V^) for each of the four candidates. To readers who are too young to remember these times, this enumeration of the ballots in the Democratic primary may be enlightening. Senator George McGovern won the Presidential nomination on the first ballot and was very successful in courting uncommitted delegates by casting Humphrey as an unelectable candidate, and Muskie as a weakened alternative. It is also abundantly clear from Nixon's landslide reelection that almost 30% of the Democratic voters probably saw President Nixon as a more acceptable alternative than McGovern. This example is meant to be strictly illustrative, as we well realize that true elections are multivariate phenomena subject to many quantitative variables such as foreign policy, regional economics, prejudices, etc. The ordering of the candidates along this scale does illustrate the distance between the supporters of McGovern and Wallace.
ANOMALIES WITH PMC
3.3
83
Rao's phenomenon
One notable weakness in using risk for the comparison of estimators was uncovered by Rao (1981). He found that shrinking the risk of a risk unbiased estimator to a minimum risk estimator does not necessarily yield an estimator that is better in the sense of PMC. Rao considered the case where squared error is the loss function and showed that such a loss function may treat large errors inappropriately. Keating (1985) showed these observations also hold uniformly for an absolute error loss function under some mild regularity conditions. Example 3.3.1 As an illustration of Rao's phenomenon consider the problem of estimation of the population variance from random samples chosen from a AT(/x, a2) distribution. It is well known that S2, the sample variance, is a complete sufficient statistic and UMVUE for estimation of a2. We restrict our attention to a class B of estimators that are scalar multiples of S2. The class B = {c£2 : c > 0} was also considered by Rao (1981), Keating and Gupta (1984), and Ghosh and Sen (1989). When c = GI = 1, we have an estimator, S2eB, which is the UMVUE of a2. Using the fact that
we can derive the minimum mean squared error estimator (MMSE) of a2 in the class B as the one for which c = c% = (n — l)/(n + 1). Since c\ > 02 for all n > 2, then
The latter result is due to the fact that the chi-square distribution is positively skewed and unimodal and therefore has the median-mean ordering. Hence, we see that shrinking an unbiased estimator to a MMSE did not produce a Pitman-closer estimator. In §5.2, we shall show that the Pitmanclosest estimator in B is specified by setting c = CQ = (n - l)/(m n _i). Note that which shrinks the unbiased estimator toward 0.
84
PITMAN'S MEASURE OF CLOSENESS
Example 3.3.2 Rao's phenomenon may not always occur. The following example, given by Schrodinger (see Geary (1944)) and thoroughly discussed in Johnson (1950), concerns the closest estimate of the number of cars in Dublin and is counterintuitive to Rao's phenomenon. The cars are numbered consecutively from 1 to an unknown total 0, and the numbers on a sample of n cars are known. Let us make the problem continuous so that X, say, has a uniform frequency distribution on the interval (0,0). It is desired to estimate the unknown 6. In a random sample of size n, the largest number Xn:n is a sufficient statistic for estimating 0. Since 6 is a scale parameter, we consider a class of estimators of the form B = {aXn:n : a > 0}. When a = 1, we obtain the maximum likelihood estimator 6 = Xn:n. Prom the statement following (1.10) we know that 9/9 has a beta distribution with a cumulative distribution function given by Fn(t) = (t/0)n, Q
To obtain the relative minimum we must set a = (n + 2)/(n+1). This yields the MMSE estimator Note that the three estimators we derived are aligned as
for all 0 < Xn:n < 0. Thus, the unbiased estimator 9\ is closer to 0 than the MMSE estimator 62 whenever (9\ + #2)/2 < 9. In this situation, PMC for these two estimates is given by
Only for n = 1 is the unbiased estimator Pitman-closer than the MMSE estimator. For n > 2, the MMSE estimator is Pitman-closer and exemplifies a counterexample to Rao's phenomenon. Pitman (1937) shows that the
ANOMALIES WITH PMC
85
Pitman-closest estimator in B is given by a = ao = 21/71. For n > 4, one can verify that
Since the distribution of Xn:n/0 is negatively skewed for n > 2 with a mode of 1, then the mean-median-mode ordering occurs. Hence in this case, the MMSE will provide a Pitman-closer estimator to 0. Likewise, following Scheffe's suggestion, the asymptotic value of the Pitman closeness comparison between the UMVUE and the MMSE was determined by Johnson (1950) from the arbitrary sample expression as e~l. As n increases in size, 9\ shrinks to 9%, since both estimators approach Xn:n. However, as the sample size increases the PMC decreases in value. Thus, shrinking the unbiased estimator to a MMSE estimator in this car example improves the intrinsic property of PMC. These results indicate the need to be careful in using risk as a criterion in comparing estimators. Keating (1985) found that shrinking risk under absolute error loss to a minimum risk estimator does not affect the intrinsic property of PMC as severely as shrinking the risk under squared error loss to a minimum squared error risk estimator. Nevertheless, use of risk as a sole criterion in such comparisons can be a problem because of the size of the associated conditional risks in Theorem 2.2.1.
3.4
The question of ties
Keating and Mason (1988a) give a cursory discussion of the question of ties between competing estimators and suggest that this probability be equally distributed between the two. Some maladies in Pitman's original definition would surface without modifications for distribution of probabilities of ties. For example, if we compare an estimator B\ with itself, then JP(0\,Oi\9} = 0. However, it seems only plausible that under a comparison criterion an estimator should be equivalent to itself; that is, PMC should be one-half. Also, for two estimators 9\ and #2, if we transpose the order of consideration we would hope to obtain complementary values. Nagata also made the same suggestion to Kubokawa (1991) who used the modification throughout his paper. To do so, define the following corrected Pitman relation as:
86
PITMAN'S MEASURE OF CLOSENESS
This relation or functional defined on the Cartesian product of the set of decision rules corrects the Pitman's criterion by equally distributing the probability of ties between two competing estimators. We can use the Pitman functional to modify Definition 2.1.4 of a Pitmancloser estimator, and Definition 2.1.5 of a Pitman-closest estimator. In this regard we say that r(0i,02\0) > 0 with strict inequality for at least one 9 e J? <=$• 0i is corrected Pitman-preferred to 62The corrected relation, although intransitive, has the following properties: (i) reflexive, r(§i,6i\0) = 0; and
(ii) skew symmetric, r(0i,0z\0) = —r(02,0i\0). This modification brings the criterion closer to the decision theorist's vision of an equivalence relation defined on the class of decision rules. However, in general, Arrow's (1951) result shows that further pursuit is useless without some restrictions indicated by Sen (1966). By virtue of the corrected relation defined from the original Pitman closeness criterion, we are able to illustrate the intuitive nature of Pitman's (1937) declaration that intransitiveness should not detract from the concept of being Pitman-closest. Although his statement is unequivocal he provides no mathematical basis for it. To construct the mathematical framework for Pitman's declaration about the unnecessity of the transitiveness of the Pitman-closer criterion with respect to the Pitman-closest concept, we restrict our attention to a convex class of decision rules. In the functional analysis sense, we recognize the Pitman relation as a functional, albeit nonlinear, defined on D x D. Let D be a convex class of decision rules within which there exists a Pitman-closest estimator 0*. Then let 6 be any other decision rule in the convex set D. Then by virtue of its role as being most-corrected Pitmanpreferred: r(^*, 0\0) > 0 with strict inequality for at least one 0 e Q for every 0 e D such that 0 =£ 6* (a.e). In the sense of ordering, we may say that no element of D lies to the right of 0* and as such 0* becomes an extreme point of the set D. An invariant class (see §§5.1 and 5.2) of decision rules for an unknown parameter 6 is convex, as is the class of surjective decision rules (see §4.5).
ANOMALIES WITH PMC
3.4.1
87
Equal probability of ties
A paradox can arise in the use of PMC when there is a nonzero probability that the tow estinmatiors beuing comtpkated are ewyql. cinstuder a simple example where X\ and X^ are both identically distributed as a uniform distribution with parameter 0 as defined in Example 3.3.2. Consider the order statistic estimator #2 — max(Xi,X2), the maximum likelihood estimator of 6. It manifests the following oddity, in the sense of PMC, when compared with the estimator Q\ = X\:
The reason for this oddity, of course, is due to the fact that, with n = 2, Pr(#i = #2) = -50. Consequently the two estimators are identical with 50% frequency. Hence #2 is always at least as close as 0\ and is closer to 0 than 01 with probability .50. One simple solution to avoid this problem would be to divide the probability equally between the two estimators since in essence they have tied. If this equitable distribution of the probability of ties is used, the paradox vanishes. The importance of the example of the uniform distribution has been previously cited within §3.3 on Rao's phenomenon, in which this family of distributions provided a counterexample to Rao's observations. Again this fruitful example helps illustrate the problems associated with ties. The comparison of estimators of the one-parameter uniform distribution has broad interest in statistics because it allows for superefficient estimation better than that of the UMVUE due to the fact that it has infinite Fisher information. Example 3.4.1 In the case of estimators with discrete distributions, the problem of ties becomes more pronounced. We consider a special case of Example 2.3.1 of a random sample of size 3 from a Bernoulli distribution; i.e., Y = 3X has a binomial distribution, J3(3,0), where Y is the number of successes in three Bernoulli trials. The maximum likelihood estimator of 0 is 0i PO = X. It is to be compared to the following estimator:
88
PITMAN'S MEASURE OF CLOSENESS
where e = v/3/12. Actually the choice of e only requires 0 < e < g and it follows that for every 0 < 6 < 1,
However, for 0 in the interval between -jfe -e/2 and ^+e/2; i.e., for .3445 < 0 < .6555. Consider the effect of ties in this example. Note that
Thus with at least 25% frequency the two estimators agree and are Pitman equivalent. However, by setting #2(3) > \ and #2(3) < f > we guarantee that, for all values of 0, 02(X) will be preferred over 9 for either [X = 5] or [X = f]. Since Pr[§i(X) = ±] = 36(1 -0)2 and Pi[0i(X) = §] = 302(1 -9),
Thus, by counting ties we focus on the values of the parameter space for which the two estimators are equal and assure that #2 is as close as 0\ with probability 1. If the probability of ties were equally divided between the two estimators, our conclusion changes drastically and 02 is only slightly preferred over B\. The median plays a central role throughout this discourse. Its culmination comes in Chapter 5, in which Pitman-closest estimators among an equivariant class are shown to be median-unbiased estimators. Then Oi (X) = X is the MLE of the unknown proportion of successes 0. It is also the UMVUE of 0 and is efficient in that its variance attains the FrechetCramer-Rao lower bound on the variance of an unbiased estimator. Without modification of the Pitman closeness criterion, X may be inadmissible in this sense. However, the example can be modified to construct an alternative estimator that is not confounded with the problem of ties. This alternative estimator has its origins in Bayesian analysis and has the form
where 6.50 (a, (3) is the median of a beta random variable with parameters a and (3. The Bayesian origins of this median-rank estimator of 0 are explained
ANOMALIES WITH PMC
89
Table 3.6: Numerical values of estimators of the binomial proportion.
X 0 1 3 2
3
1
0i 02 0 0.000 0.000 .159 .333 .356 .386 .667 .644 .614 1.000 1.000 0.841
in Example 5.4.8. It should be noted that the values of 0(X) coincide with the median ranks of the order statistics taken from a random sample of size four. Table 3.6 contains the numerical values of the three competing estimators. These estimators are graphed in Figure 3.4. In the earlier discussion, the issue of ties was brought forth to clarify that the disparity between Q\ and 02 was not nearly as drastic as first thought. The table provides insight to the superiority of 02 in the middle of the parameter space in that 02(5) > 0i(g) and 02(|) < 0i(|)- Hence for 0 between .3445 and .6555, the values of X = ^ and X = | will always contribute to the Pitman closeness of 02 over 01. By slightly increasing the estimate at X = ^ by e and decreasing it at X = | by e, we produce an estimator which is preferred over a subset of the parameter space. This strategy of improving traditional estimators based on the Pitman closeness criterion was first suggested by Keating and Mason (1985a). The modification obtained by using 0 provides the same strategy to 0i for all values of X. All three estimators are increasing functions of X and as such they will be discussed in §4.4. By connecting the estimates at each value of X with straight lines as depicted in Figure 3.4, the estimators can be made continuous. However, the median-rank estimator 0 is Pitman-closer to 0 than 0i with probability 1, whenever .3707 < 0 < .6293. This unusual result is obtained whether probabilities of ties are split or not because 0 uses the same e-strategy on the extreme values of X. This can be seen from Figure 3.4, where no value of X produces an estimate defined by 0 whose value is closer to 0 than that given by 0 over the restricted parameter space. In essence, 0 shrinks 0i toward .50 much like a shrinkage or Stein-rule estimator. Whereas one may lack sufficient knowledge to place a prior distribution on 0, it may well be known that 0 lies in the interval (.3707, .6293). In such
90
PITMAN'S MEASURE OF CLOSENESS
Figure 3.4: Three estimators of the success probability. cases, one would be certain that 0 is closer to 6 than 0\. We would reasonably counter that, with such prior information, an estimator superior to 6 could be constructed. The importance of the result is that MSE comparisons over the same subset of the parameter space do not disclose this complete preference. A second practical problem that may occur frequently is that if the true parameter space is (0,1) then the estimates at X = 0 and X = 1 fall in the closure of the parameter space, but not in the parameter space itself. This issue, that estimators should map the data space into the parameter space, was very much a central theme in the work of Hoeffding (1984).
3.4.2
Correcting the Pitman criterion
In the above example we see that PMC, given in Definition 1.0.1, results in a comparison criterion that is not only intransitive but also is neither reflexive nor symmetric. While we cannot easily correct the problem of intransitiveness, we can solve the latter two problems. To do this, define the following weighted function:
ANOMALIES WITH PMC
91
where C\ is the loss associated with estimator 9\ and £2 is the loss associated with #2- If £2 < £1 there is no penalty; if the losses are equal there is a penalty of ^; and if £2 > £1 the loss is 1. Using these results, let us set forth a weaker definition, denoted by JP*, as the expectation of the weighted function in (3.2). It can be expressed as follows:
Thus the weight function J serves to split the probability assigned to events that produce ties between the loss functions. The weaker definition of Keating and Mason (1988b) has the following properties:
Thus, the corrected Pitman relation is reflexive and skew-symmetric but still intransitive. To illustrate the usefulness of the corrected Pitman criterion, reconsider the special case of the two observations from a uniform distribution as mentioned at the beginning of §3.4.1. In this special case, we have that
because Pr (9\ = 62) = .50 (i.e., ties should occur in half the cases due to chance alone). Using the corrected Pitman criterion, we have that
and JP*(02) 0i) = 1 — JP*(0\, #2) = -75. These corrected values give us a more reasonable picture of this pairwise comparison than the perplexing values determined from Definition 1.0.1.
92
PITMAN'S MEASURE OF CLOSENESS
The problem of ties can occur with many adaptive estimators such as in comparing 6\ — X and
Since these two estimators agree over a large part of the sample space, there is a problem with ties. By utilizing an equal assignment of probability of the ties this problem can be avoided. More importantly, use of this procedure produces a Pitman relation which is more nearly an equivalence relation. There remains only the problem of lack of transitiveness. From a mathematical perspective, if we define a new relation r*(0i,02\0) = \r(0i,02\0)\ then r*(x,y\0) is symmetric. In the language of probabilistic metric spaces, our questions about transitiveness are then converted into questions about whether the triangle inequality is true.
3.4.3 A randomized estimator Randomized estimators are frequently used in estimation theory, especially for the estimation of parameters arising from discrete distributions. The process of randomization is often used to produce an exact test of a hypothesis when the distribution of the test statistic is discrete (see Lehmann (1986)). To assess the capacity of the corrected Pitman criterion to judiciously treat randomization, let us consider 6\ and 62 to be two real-valued estimators of the real-parameter 9. Consider the random variable J7, having a uniform distribution on (0,1), as being stochastically independent of 0\ and 62. Traditionally we obtain values of U externally by means of a random number generator. In this context, define the randomized estimator OR as
Let us suppose that 6\ is Pitman-closer to 0 than $2 (by Definition 2.1.4), and compare OR with 02 (i-e., the worse of 6\ and 62}, using
Note that whenever U > |, the comparison results in a tie, OR = 02- Consequently, by the independence of t/,
ANOMALIES WITH PMC
93
According to Pitman's original definition, the randomized estimator would not produce an estimator that is Pitman-closer than the worse of the two competing estimators. Let us reconsider this same comparison in light of the corrected Pitman criterion.
The second term, on the right-hand side of the preceding equation, is a consequence of the tie between OR and 0% whenever U > \. So we have
Therefore, the randomized estimator will be corrected Pitman-preferred to 02. Using a similar argument we can show that the estimator 9\ will be corrected Pitman-preferred to OR. This provides a transitive ordering among Oi, OR, and 02 under the corrected Pitman criterion. One might reasonably conjecture that the corrected Pitman criterion is artificially difficult, and modify Definition 1.0.1 as follows:
Note that under this definition
By using < in Definition 1.0.1, we would produce a criterion that says that by randomizing we would always produce an estimator that is Pitman-closer than 0i or 02. This result emphasizes the judicious way in which the corrected Pitman criterion handles ties. Pitman's original definition assigns all the probability of ties to the second estimator. The less than or equal to modification assigns all the probability of ties to the first estimator. Randomization is a convenient means for illustrating these pitfalls and the practicality of the corrected Pitman criterion.
94
3.5
PITMAN'S MEASURE OF CLOSENESS
The Rao—Berkson controversy
A controversy important to the understanding of PMC is one stimulated by Berkson (1980), who questions the sovereignty of maximum likelihood estimation. Berkson's preference is for minimum chi-square estimation, where
... a chi-square function is defined as any function of the observed frequencies and their expectations (or estimations of their expectations) that is asymptotically distributed in the tabular chisquare distribution. Such a professed choice stirred much discussion and debate and ultimately led to a resurgence of interest in PMC. Rao (1980), in his discussion of Berkson's results, questioned Berkson's reliance on the principles of mean squared error. Rao noted some anomalies in using minimum mean squared error as a criterion and proposed that estimation procedures be reexamined using other criteria such as PMC. He also clarified the concepts associated with his criterion of second-order efficiency (SOE). A discussion of the Rao-Berkson controversy and of the principles involved in the usage of SOE is contained in this section.
3.5.1 Minimum chi-square and maximum likelihood In categorical data models, there are many functions that are asymptotically distributed as a chi-square random variable. Each could be used as a criterion in estimation. Berkson (1980) lists five such functions, one of which is the likelihood chi-square denoted by
where O is the observed frequency in n trials and E is the expectation of the corresponding frequency and depends on the unknown parameters. Minimizing this chi-square with respect to the unknown parameter yields the estimate of interest, labeled the minimum chi-square estimate (MCE). Asymptotically equivalent estimates can be obtained by minimizing any of the other chi-square functions. These procedures belong to a general class of BAN estimators, which we will consider in Chapter 6. Berkson (1980) argues that minimizing the likelihood chi-square yields the same estimate as would be obtained using the maximum likelihood procedure, and thus the MCE should be preferred over the maximum likelihood
ANOMALIES WITH PMC
95
estimate (MLE). The maximum likelihood procedure selects the parameter estimate that maximizes the likelihood function, or, equivalently, minimizes the log-likelihood function, associated with the sample data. Since the MLE can be derived as a MCE, it is argued that minimum chi-square is the primary principle of estimation. To illustrate the controversy aroused by Berkson, we discuss a simplified variant of his 1955 bioassay problem in logistic regression. We assume that two independent groups, consisting of n patients each, are exposed to different dose levels of an experimental medication. The first n patients receive the (lower) dose level d\ = —1, and the second n receive (the higher) dose level d-2 = +1. Then each patient in the experiment constitutes a Bernoulli trial as in Example 2.3.1 and Example 3.4.1. Fountain, Keating, and Rao (1991) discuss this bioassay problem under different criteria. Let nX\ and nX-2 be the number of survivors due to treatments 1 and 2, respectively. Under the Bernoulli conditions described we know that nX\ ~ B(n, Q\) and nX-2 ~ J3(n, #2)- In this simplified logistic regression problem, the survival proportions 0\ and 6-2 for the two dose levels are functions of a common parameter /3 and given by
The parameter space fi for (#i,02) is defined as the open square lying in the first quadrant with sides of unit length and a vertex at the origin. The logit is defined as TTJ = ln[0j/(l — fy)] = ftdi, for i = 1,2. Berkson suggested estimating (3 by minimizing a weighted least squares term of the following form (which is the basis of logistic regression):
where the estimated logit, Tri = ln[Xi/(l — Xi)} and C(fl\X\,X<2) is one of the five forms mentioned by Berkson. An absolute minimum of this function is called a minimum logit chi-square estimator. However, the procedure diverges whenever Xi = 0 or 1 and as such is seriously flawed. This situation is likely to occur when some TTJ is close to zero or one. In this occurrence, |/31 may be quite large so that the corresponding variance estimate may be unreliable. These situations occur frequently in practice when the physician is trying to determine the minimum dose (threshold) to obtain a beneficial effect or the maximum dose (saturation) beyond which no improvement is observed.
96
PITMAN'S MEASURE OF CLOSENESS
This divergence means that one cannot compute MSE as in Example 2.3.1. Whenever the experiment results in a value of (X 1^X2) that fallson the boundary of f2, Berkson's procedure diverges because the parameterization based on /3 does not permit 0\ or 62 to attain either endpoint, 0 or 1. This emphasizes the issue raised by many statisticians that the estimators must map the data space into the parameter space. However, the maximum likelihood procedure is equally flawed. In this procedure, we maximize the likelihood function
Divergence of the MLE occurs for similar reasons as the minimum logit chisquare estimator. Hence, we propose to replace Xi with 0$ in C((3\X\,X2) and l(fl\Xi, X-i) where
which has been motivated in Examples 2.3.1 and 3.4.1. By adopting this convention we produce quasi-MLEs and quasi-minimum logit chi-square estimators of /?. Due to the simplicity of the form of the logit, we can express the quasi-estimators in closed form as follows:
and
In Figure 3.5, we illustrate how the modified estimators produce estimates away from the boundary and toward the center of the parameter space (i.e., the open unit square in the first quadrant) for a sample of size five, much like a Stein rule or empirical Bayes procedure. The open circles represent the original bivariate data space and the shaded circles are the loci of the transformed data space. The resultant minimum logit chi-square estimators are given in Table 3.7 for the possible values of (X\,X2). The missing values in the table can be obtained from the observation that the matrix of estimates is skew-symmetric. The quasi-maximum likelihood estimator also satisfies the skew symmetric property and its corresponding numerical values are given in Table 3.8. We note that the two estimators produce identical estimates along the principal diagonal and anti-diagonal. Because of the prevalence of ties, we will
ANOMALIES WITH PMC
97
Figure 3.5: The median transformed data space. Table 3.7: The quasi minimum logit cm-square estimator.
X2 .4 .8 .6 Xi 0.0 .2 1.0 .0 0.0 .01757 .37189 .82517 1.38185 2.09995 .2 .27764 .63029 1.02301 1.38184 0.0 .4 0.0 .31699 .63029 .82517 .27764 .37189 .6 0.0 .01757 .8 0.0 0.0 1.0 use the adaptation of ties suggested in the previous subsection and remark that its impact on this important estimation procedure demonstrates its need. We also note that even in these quasi-procedures, which were chosen to remedy common symptoms in both estimation techniques, the minimum logit chi-square estimator at each point of the essential range produces an estimate as close or closer to zero than the MLE. The values of Pitman's measure can be calculated when a value for /3 is specified. If the survival
98
PITMAN'S MEASURE OF CLOSENESS Table 3.8: The quasi maximum likelihood estimator.
X2 .4 .6 .8 1.0 Xi 0.0 .2 0.0 0.0 .31322 .64620 1.01833 1.47112 2.09995 .2 .31655 .65028 1.02301 1.47112 0.0 .4 0.0 .31699 .65028 1.01833 .6 0.0 .31655 .64620 .31322 .8 0.0 1.0 0.0 rate at the lower dose level is expected to be only 6\ = .10, whereas the survival rate at the higher dose level is 0% = .90, then (3 = In (9). When the values of Q\ and 6-2 are specified, the joint distribution of (Xi^X^) can be obtained. If /? = ln(9), then JP*(J3L,Pc\P) = .7683; if (3 = ln(3) then P*(f*L, Pc\P) = -6262; if /3 = ln(|) then P*(/9L,/?c|/?) = -5589; and if /3 = 0 then JP*(/3L,PC\P) = .2461. The values of /3 were chosen to illustrate that, as mentioned in §2.4, the comparison of estimators is frequently mixed over the parameter space, which is precisely the result we find here. The value of ln(|) was chosen because Berkson used it and in that case PMC favors the quasi-MLE. The values of ln(9) and 0 were chosen to illustrate the extreme values which IP* could attain. We depart from this example with the observation that the quasi-MLE produces a Pitman-closer estimator for values away from (3 = 0. The minimum logit chi-square produces a closer estimate of /3 whenever 0\ = 62 = .50 and the regressor variable d explains less of the observed variation in the proportion of survivors.
3.5.2 Model inconsistency Rao (1980) counters these arguments by attacking the concepts of consistency and minimization of MSE that are advocated by Berkson. Rao lists several different examples of estimates that have smaller MSE but perform poorly in terms of other criteria such as PMC. For example, suppose X is a normally distributed variate with mean zero and variance cr2. Based on the quadratic loss function,
ANOMALIES WITH PMC
99
so the estimator X2/3 is better than X2 as an estimator of cr2. The reader can verify this result by observing that X2/a2 ~ Xi- From this result we also recognize that X2 is an unbiased estimator of cr2. However, X2/3 is not more frequently closer to the true value of cr2 than X2 since
We can verify the result on PMC by referring to Example 3.3.1. Thus there is a question of whether quadratic loss is appropriate in all situations. Rao presents a corresponding example on the issue of consistency; the example is similar to the one given at the end of §3.4. Let X = 0 + e where 6 is an unknown parameter and e is the error with the properties that E(e) = 0 and V(e) = O2. Berkson notes that X is a model consistent estimator since X = 0 when e = 0. Rao counters that X/2 is better than X by the principle of minimum MSE since
yet X/2 is not model consistent. In later years, Fisher (see Rao (1992)) tried to adjoin model-consistency to the well-accepted definition of consistency. Again contradictory results are obtained, suggesting that the principles advocated by Berkson are not always acceptable. The interesting part of this controversy is not a decision that the MLE is better than the MCE, or vice versa. The real point is that different methods of estimation are applicable in different situations. Sometimes the MCE may be the best and sometimes the MLE may be preferred. As we have seen in this book, there are many settings where an estimator based on PMC is best. Which estimation procedure is actually chosen depends on many factors, including both the use to be made of the estimate as well as the loss incurred in making an incorrect decision.
3.6
Remarks
The discussions in the first three chapters of this book have concentrated on the concepts and philosophies surrounding Pitman's measure of closeness. We have examined the extensive development of PMC as well as many of its anomalies. Numerous illustrations and examples have been given to provide further insight into various aspects of this procedure. To achieve
100
PITMAN'S MEASURE OF CLOSENESS
an appreciation for the usefulness of PMC, the level of these discussions has remained relatively nonmathematical, devoid of theorems and proofs. It is hoped that this approach has helped the reader's understanding of this estimation criterion and provided time for reflection and appreciation of its value. We close this chapter with a citation from Rao (1991) that echoes the sentiments of Johnson (1950) (see the paragraph preceding Theorem 2.2.2). The similarity of their messages, given more than four decades apart, speaks to the timelessness of their content.
/ believe that different criteria of estimation are useful for different purposes and it would be of interest to examine the performance of any given estimator from many different points of view. PMC has some advantages over the others as it is based on probabilities of certain events rather than on the expectations of certain functions of random variables, and is therefore more generally applicable. Further, it might be of additional help in deciding between estimators which are indistinguishable by criteria based on individual distributions. On the other hand, it has some disadvantages like lack of transitivity as pointed out by many authors.
Chapter 4
Pairwise Comparisons In this chapter, we present a general methodology for the comparison of two competing estimators 9\ and 0% of a common parameter 6 using Pitman's measure of closeness as a criterion. The chronological development of these methods does not parallel the presentation given here. Rather, the results are presented in deductive sequences; primary results and their subsequent corollaries are developed from a geometric perspective, which retrospectively provides considerable insight into the mechanics of preference between estimators. The fundamental theorem of the chapter is named in honor of the distinguished Irish statistician, Geary (1944), who provided an ingenious technique for the evaluation of Pitman's measure of closeness for unbiased estimators, and for the renown statistician, Rao, who coauthored a generalization of Geary's earlier work (see Rao, Keating, and Mason (1986)). Geary's findings were very close to a complete solution to the problem. To apply his results, Geary used asymptotic theory that was in vogue at that time. The significance of his work went relatively unnoticed for more than forty years, although it was favorably reviewed by Scheffe in 1945. Finally, Rao, Keating, and Mason (1986), using a geometric extension of Geary's technique, proved this fundamental theorem of comparison for Pitman's measure of closeness. As discussed in Chapter 2, although PMC had been explored through some cursory investigations (not all of which were favorable), it seemed imprudent to dismiss a measure of comparison until a means was established for its determination. Johnson (1950) conjectured that a general procedure for the determination of PMC would not be simple: Here again, however, difficulties of computation militate against the use of the closeness criterion. 101
102
PITMAN'S MEASURE OF CLOSENESS
Therefore, necessity mixed with the challenge of an open (but perhaps somewhat obscure) question became the motivation for much of the research given in this chapter.
4.1
Geary-Rao Theorem
Let 61 and 0% be univariate estimators of the real parameter 0 and let /(•,•) be their joint probability density function. We want to evaluate Pitman's measure of closeness for these two competing estimators in this general setting. To do this, we will introduce some helpful definitions, whose utility in estimation theory will be manifested through subsequent theorems. References will also be made to their utility in Bayesian hypothesis testing. Definition 4.1.1 Let 6\ and 02 be two univariate estimators of the real parameter 0. The set of all possible values of 6\ and 02 for which 6\ = 02 is called the line of equality. The set known as the line of equality consists of all the sample values for which 6\ and 02 coincide. Along this line the two estimators produce the same estimate. Section 3.4 addresses estimation problems in which this line has positive probability. As illustrated in §3.4, this situation can be quite common when the estimators have discrete distributions, or when one continuous estimator is adapted from another. Definition 4.1.2 Under the same conditions given in Definition 4.1.1, define the set of all 6\ and 02 such that 0\ + 02 = 10 as the switching line of the estimators 9\ and 02 at the parameter 0. The set known as the switching line consists of all the sample values for which the average of Q\ and 02 equals the true but unknown value of the parameter 0. The points on the line of equality do not depend upon the value of 0, whereas the points on the switching line are parameterized by the value of 0. Along these perpendicular lines, the estimates are equidistant from 0, in the former case because of their identical values and in the latter because one estimator overestimates the parameter as much as the other underestimates. The motivation for the names attached to these sets and their geometric interpretation will be thoroughly discussed later in this chapter. Define the following regions in the plane determined by the Cartesian product of the range spaces of the two estimators, 0\ and 62- These are
PAIWISE COMPARISONS
103
depicted in Figure 4.1 and will be helpful in evaluating Pitman's measure of closeness.
Figure 4.1: Regions of preference of B\. Theorem 4.1.3 (Geary-Rao Theorem) Let 9\ and 62 be two univariate estimators of the real parameter 9. Then Pitman's measure of closeness can be evaluated as follows
Proof: In T^i, we note that 0\ < 9%, which implies that 9i — 9 < fa ~ 9 and for each (0i,02)efti, we have that 0i + 92 > 29 and -§i + fa > 0. By adding these inequalities, we obtain that 02 > 0. Moreover, these two
104
PITMAN'S MEASURE OF CLOSENESS
inequalities imply that -(62 - 0) < 6\ - 0 < (§2 - 0) or, equivalently, that |0i-0|<|0 2 -0|. In 7^3, we observe that 9\ > 62 so that 6\ — 6 > $2 — 0 and for each ordered pair (0i,02)eft3, we have that 0i + 0% < 20 and -0\ + 02 < 0. By adding these inequalities, we obtain that 02 < 0. Likewise, the two inequalities jointly imply that — (0 — #2) < 01 — 0 < (0 — 02) or equivalently that |0i-0| < |02-0|. In a similar way we can show that in 7^2 and 7^4 that |02 — 0| < |0i — 0|Hence, it follows that P(0i,0210) = Pr(R,i) + Pr(7e3). Note that Jl\ and 7£s are the first and third quadrants in a coordinate system whose origin in located at the point (0,0) and whose axes are rotated through an angle of 45° from the original. We have been careful in this theorem to state Pitman's measure of closeness in terms of the strict inequality to avoid the problem of ties discussed in Chapter 3. If the line of equality and the switching line are events with probability zero, then both variants of PMC yield the same result (i.e., the comparison is the same whether one defines PMC according to (1.1) or as in the Pitman relation in §3.4). The result of the Geary-Rao theorem can be stated in a rectangularform whenever the line of equality and the switching line have zero probability. Define the bivariate random vector 0 by 9 = [ 9\ 02 ]', the 2 x 1 vector of ones as 1 and the matrix A by
These definitions give rise to the transformed axes depicted as dashed lines in Figure 4.1 and defined by the bivariate random vector U as follows:
The advantage obtained in using this transformation resides in the simplicity with which Pitman's measure of closeness can be determined. This simplification is illustrated in the following theorem. Theorem 4.1.4 (Rectangular Form) Let 9\ and 62 be two univariate estimators of the real parameter 0 and let /(iti, 1^2) be a joint density where U\ and C/2 o,re defined according to (4.3). Then it follows that
PAIRWISE COMPARISONS
105
The proof of the rectangular-form theorem is an obvious consequence of the Geary-Rao Theorem. Also, note that since this transformation from the coordinate system defined on the bivariate random vector 6 to U is unitary, then the joint density function of U can be found from that of 9 in a straightforward way. The rectangular-form theorem is also valid when the joint distribution of 0 is discrete. In this case the integrals in (4.4) would be replaced with summation signs and the joint density functions would be replaced with joint probability functions. From the PMC inequality, however, the indices of summation cannot be initialized at zero but rather must be initialized at some minimum value. Theorem 4.1.5 (Discrete Rectangular Form) LetO\ and 6-2 be two discrete univariate estimators of the real parameter 0 and let p(u\,u<2) be a joint probability function where U\ and C/2 are defined according to (4.3). Then
The integrand in (4.4) and the summand in (4.5) simplify whenever f ( x , y ) or p(x,y) satisfies bivariate symmetry (i.e., the surface of /(x,y), which intersects any plane determined by a line passing through the origin and laying in the (#, y] plane, produces a graph that is symmetric about the 2-axis). Then (4.4) in the continuous case can be simplified as follows:
The bivariate normal is just one example in which the above simplification could be applied. Note that we do not state a simplification of (4.6) under the assumption that u\ and U2 are independent. These random variables are uncorrelated if and only if they have equal variances. Consequently, independence seems to be an unlikely condition. For details the reader is referred to the discussion in §4.6. Another variation of the Geary-Rao Theorem can be drawn from Sheppard's (1898) Theorem; we also refer the reader to the work of Peddada and Khattree (1986) for some related developments. When comparing two estimators of the mean n from a normal population, they applied the polar coordinate transformation to the random vector U, in (4.3), when U has a bivariate normal distribution. This transformation was used by Sheppard (1898) to simplify the evaluation of probabilities in a bivariate normal distribution. We discuss a simpler form using Sheppard's result in the next
106
PITMAN'S MEASURE OF CLOSENESS
section. In general this transformation can be applied as follows. Define the two variables U\ = rcos(o;) and C/2 = rsin(a), then we may restate the results of the Geary-Rao Theorem in yet another way. Theorem 4.1.6 (Polar Form) Under the conditions of Theorem 4.1.3 and provided the bivariate random vector 0 has a continuous joint density function then Pitman's measure of closeness may be evaluated as follows:
where /(r, a) is the joint density junction of the random variables r and a. The polar form becomes especially useful whenever /(r, a) can be factored into the marginal densities on r and a, respectively.
4.2
Applications of the Geary—Rao Theorem
Example 4.2.1 (Pooled Estimators) Consider the contaminated population problem suggested by Tukey (1960). Consider a random sample of size n from a designated population, which has a normal distribution with mean \L and variance cr2 (i.e., Af(//,<7 2 )). The sample mean Xn is the traditional estimator of // and Xn is Af(p,,a2/n). Consider a random sample of size m from an independent but spurious population, which is Af(v, cr2) and the mean Ym of the second sample is AA(i/, cr2/m). Recall that we are interested in the estimation of // from the designated population. For example, the first population may be the batting average of George Brett, third baseman for the Kansas City Royals. He has a true batting average // and an associated standard deviation a. As we take box scores of Brett's hitting performance, we can obtain an estimate of his true batting average from a sample of his actual times at bat. Early in the season, however, batting averages can be quite misleading. Since Brett has been a perennial All-Star and an American League Batting Champion, could we obtain a better estimate of his seasonal batting average by augmenting his average with data from a different but pertinent source, for example, the five American League players with the highest batting averages at an early stage of the season? In this problem v becomes the true batting average of the top five American League hitters, and the selection of the group chosen to augment the data of interest is critical. In the context of this example,
PAIRWISE COMPARISONS
107
it seems reasonable to assume homogeneity of variances among players of this caliber. This example is modeled after that given by Efron and Morris (1977). The estimators of // considered by Johnson (1950) and Rao, Keating, and Mason (1986) are 0i = Xn and 02 = (nXn + mYm)/(n + m). This latter combined estimator is called a contaminated estimator of //, since it is found by pooling the observations from samples of the true and spurious populations. In reality, this circumstance usually arises when we unknowingly collect a sample of data, which came from two distinct independent normal populations. Here, we make the simplifying assumption that the mixing proportion is known. Another special case of this estimation problem frequently occurs in practice in attempting to combine intrablock and interblock estimates of linear contrasts of the main effects in the analysis of a balanced incomplete block design (BIBD) with random block and error effects. The method for the recovery of interblock information was proposed by Yates (1940) and is based on a convex linear combination of unbiased estimators of a common parameter A = c'r, which is a linear contrast of the treatment effects. We also refer the readers to Rao (1947) for an excellent review of the analysis of incomplete block designs. The form of the combined estimator is given by
where AI and A2 are unbiased and uncorrelated estimators of the contrast A obtained from the intrablock and interblock equations, respectively, and the weights wi and W2 are inversely proportional to the respective variances of the unbiased estimators. In our exposition of the contaminated population example if // = v, the estimators are unbiased and the respective weights are given by w\ = n/a2 and W2 = m/a2, so these problems appear similar. However, they differ in practice since the weights w\ and W2 are estimated in the analysis of BIBDs (i.e., see Yates (1940) or Rao (1947)). Our illustration is simple in that the variances of the two populations are assumed to be equal and consequently A is not a function of a. For treatment of the nonhomogeneous case, in which the different variances must be estimated, the reader is referred to the important and complex work of Kubokawa (1989). From normal distribution theory, since 0 = [0\ 62}' is a linear form of the vector of sample means, it is straightforward to show that 9 is A/"(/x, X), with
108
PITMAN'S MEASURE OF CLOSENESS
a mean vector p! = [p, (np + mz/)/(n + m)] and a covariance matrix S with off-diagonal element given by cr2/(n + m) and diagonal components cr2/n and
and the covariance matrix S* has off-diagonal element given by Cov(J7i, U^) = — ra<j2/[2n(n + m)], and diagonal elements given by
and
In (4.6), the probability content of quadrants of the bivariate normal can be used to evaluate Pitman's measure of closeness. Since the inner integral in (4.6) can be expressed through the standard normal distribution function, such probabilities can be found through the equation T\(x', A) = 2 f£° $(xu — A)0(it)dit, where T\(x; A) is the distribution function of a noncentral ^-distribution with one degree of freedom and noncentrality parameter A. Then one can ascertain Pitman' s measure as i
where 6 = (y — n)/cr and
Rao, Keating, and Mason (1986) provide a small set of tables of Pitman's measure calculated from (4.8) for small values of m. From Rao (1981) we obtain the condition that if 62 < (1/ra) + (1/n) then the pooled estimator has smaller MSE. In their tables, Rao et al. highlight the regions of disagreement between Pitman's measure and MSE. Although the regions of disagreement in these tables are small, Pitman's measure, in contrast to MSE, indicates a preference for the unbiased estimator for smaller values of 6 (i.e., when the differences in the two populations are more subtle). Table 4.1 lists the closeness probabilities for n = 5(5)25, m = 1(3)10, and 6 = 0(.5)1.5. The values chosen for this table allow us to calculate Pitman's measure for two unbiased estimators (i.e., 8 = 0), which will be
PAIRWISE COMPARISONS
109
discussed in Example 4.3.2, whereas the values of 6 equal to 1 or 1.5 give us distinct references (i.e., the number of standard deviations) on the differences between the two populations. The sample sizes are chosen for the purposes of the comparison of these results to some obtained by Peddada and Khattree (1986) and Kubokawa (1989). For the nonzero values of £, PMC is an increasing function of the sample size, m, of the spurious population. For almost all values of n and ra in the latter two parts of the table, the unbiased estimator is preferred to the contaminated estimator in the PMC sense. However, increases in sample size of the true population do not necessarily imply increases in PMC (e.g., the values for 6 = 1). Example 4.2.2 (Correlation and Unbiased Estimators) This contamination problem gives rise to an interesting problem about MSE and Pitman's measure of closeness. Suppose that X\ and X^ are unbiased estimators of the common parameter 0 and that they have correlation coefficient p but different standard deviations cr\ and a^. For simplicity, we assume that they are jointly distributed according to a bivariate normal distribution. Clearly MSE(Xi) = a\ and MSE(^2) = o\ and the consequent mean squared error relative efficiency apparently does not involve the correlation coefficient p. This problem was first posed by Geary (1944) and extensively explored by Landau (1947) without a successful resolution until the work of Johnson (1950). From the transformation in (4.3), it can be shown that U has a bivariate normal distribution with mean vector 0 and covariance matrix E in which the off-diagonal element is given by Cov(C/i,C/2) = — v\ 4- v\, whereas the diagonal components are respectively given by Var(C/i) = a\ 4- a\ + IPG\GI and the Var(t/2) = v\ + a\ — Ipaia^. Now if a\ = <72, then Pitman's measure and MSE agree that X\ and X% are equivalent. However, if the variances differ, then the numerical value of Pitman's measure varies with p, whereas that of MSE does not. This does not imply that PMC and MSE will disagree but that PMC's numerical value is inherently dependent upon the correlation p between the two estimators, while MSE remains unchanged by the existence of nonzero correlation (i.e., dependency) between the variables. This example thus serves as a useful illustration of the fact that PMC bases preference on joint information as opposed to MSE which uses only marginal information (i.e., see §2.4). To determine the preference between X\ and X
Table 4.1: Values of PMC of Unbiased to Pooled Estimator of Mean.

                    Size of the Augmenting Set (m)
    δ = 0.0
    n        1        4        7       10
    5      .4300    .3661    .3300    .3041
    10     .4501    .4025    .3739    .3524
    15     .4591    .4196    .3952    .3766
    20     .4646    .4300    .4085    .3918
    25     .4683    .4372    .4177    .4025

    δ = 0.5
    n        1        4        7       10
    5      .4516    .4748    .5100    .5399
    10     .4666    .4975    .5428    .5834
    15     .4730    .5039    .5505    .5942
    20     .4767    .5064    .5521    .5962
    25     .4793    .5075    .5517    .5953

    δ = 1.0
    n        1        4        7       10
    5      .5064    .6695    .7651    .8203
    10     .5082    .6600    .7634    .8289
    15     .5078    .6455    .7471    .8158
    20     .5073    .6334    .7311    .8006
    25     .5068    .6236    .7172    .7863

    δ = 1.5
    n        1        4        7       10
    5      .5727    .8071    .8987    .9400
    10     .5580    .7726    .8788    .9328
    15     .5493    .7441    .8547    .9174
    20     .5436    .7225    .8334    .9016
    25     .5395    .7055    .8150    .8866
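The entries of Table 4.1 can be checked by direct simulation. The sketch below (Python with NumPy) assumes the sampling model as we read it from the discussion above: n observations from the target normal population and m observations from a population whose mean is shifted by δ standard deviations, both with unit variance. The two calls reproduce the entries .4300 and .4748 to Monte Carlo accuracy.

    import numpy as np

    rng = np.random.default_rng(0)

    def pmc_unbiased_vs_pooled(n, m, delta, reps=400_000):
        # n observations from the target population (taken as N(0, 1)) and
        # m observations from the augmenting population N(delta, 1)
        x = rng.normal(0.0, 1.0, size=(reps, n))
        y = rng.normal(delta, 1.0, size=(reps, m))
        unbiased = x.mean(axis=1)
        pooled = (x.sum(axis=1) + y.sum(axis=1)) / (n + m)
        # proportion of samples in which the unbiased estimator is the closer one
        return np.mean(np.abs(unbiased) < np.abs(pooled))

    print(round(pmc_unbiased_vs_pooled(5, 1, 0.0), 3))   # Table 4.1 lists .4300
    print(round(pmc_unbiased_vs_pooled(5, 4, 0.5), 3))   # Table 4.1 lists .4748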
Suppose that Y = [Y₁ Y₂]′ has a bivariate normal distribution with mean vector 0, Var(Y₁) = τ₁², Var(Y₂) = τ₂², and Cov(Y₁, Y₂) = ρτ₁τ₂. Then it follows from Sheppard's Theorem that
From the Geary-Rao Theorem it follows that
where η = (σ₂² − σ₁²)/√((σ₁² + σ₂²)² − 4ρ²σ₁²σ₂²). The trichotomy of η corresponds to the trichotomy of σ₂² − σ₁², and consequently to the trichotomy of P(X₁, X₂|θ) − ½ (see Johnson (1950, p. 282)). This establishes the result for PMC that, between unbiased estimators having normal distributions, the one with smaller variance is preferred, just as it is in comparisons based on MSE. Peddada and Khattree's adaptation of Sheppard's Theorem produces the following equivalent expression, provided |ρ| ≠ 1:
where γ = σ₂/σ₁. Then it follows that P(X₁, X₂|θ) > ½ if and only if γ > 1 (i.e., the variance of X₂ exceeds the variance of X₁). The simplicity of (4.9) over (4.10) is obvious, and (4.9) does not exclude the case in which |ρ| = 1. This case may surface when we compare two equally efficient estimators. The connection between the works of Johnson (1950), Peddada and Khattree (1986), and Rao, Keating, and Mason (1986) can be established through a few observations. It is assumed that the estimators are unbiased in this latter example, so that the parameter δ = 0 in (4.8) and the noncentral t-distribution reduces to a central t. Moreover, it is well known that a central t-distribution with one degree of freedom is a Cauchy distribution whose distribution function is expressed through the arctangent.
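For two correlated unbiased normal estimators, the quadrant-probability argument reduces the evaluation of PMC to an arcsine. A minimal numerical sketch (Python with NumPy) is given below; the closed form used, P = ½ + arcsin(η)/π with η as displayed above, is our reading of Johnson's representation. It returns exactly ½ when the variances are equal, and it reproduces the first entry of Table 4.1 (δ = 0, n = 5, m = 1), since in that case the unbiased and pooled means are both unbiased with correlation √(n/(n + m)).

    import numpy as np

    def pmc_unbiased_pair(s1, s2, rho):
        # P(|X1 - theta| < |X2 - theta|) for jointly normal unbiased estimators
        # with standard deviations s1, s2 and correlation rho (|rho| < 1)
        eta = (s2**2 - s1**2) / np.sqrt((s1**2 + s2**2)**2 - 4 * rho**2 * s1**2 * s2**2)
        return 0.5 + np.arcsin(eta) / np.pi

    print(pmc_unbiased_pair(1.0, 1.0, 0.3))              # equal variances: 0.5
    n, m = 5, 1                                          # unbiased vs pooled mean, delta = 0
    s1, s2 = np.sqrt(1.0 / n), np.sqrt(1.0 / (n + m))
    rho = np.sqrt(n / (n + m))
    print(round(pmc_unbiased_pair(s1, s2, rho), 4))      # .4300, as in Table 4.1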
4.3  Karlin's Corollary
We commonly encounter situations in which θ̂₁ and θ̂₂ are functions of a common statistic T. For example, in the estimation of a parameter θ from a distribution which admits a complete sufficient statistic T, we know that both the maximum likelihood (MLE) and the uniformly minimum variance unbiased (UMVUE) estimators are functions of T. Recall that we discussed this issue in §2.4 under Savage's remarks with regard to the use of marginal versus joint information in the assessment of competing estimators of a common parameter. The joint distribution of θ̂₁ and θ̂₂ will become singular and the
bivariate solution given in the Geary-Rao Theorem simplifies to a univariate problem, whose solution was given by Rao, Keating, and Mason (1986). We, however, name this theorem in honor of the distinguished statistician Samuel Karlin, who in 1957 proved a similar result but in the context of Bayesian hypothesis testing.

Suppose that θ̂₁ and θ̂₂ are functions of a common statistic T. Looking at Figure 4.1, visualize a continuous contour defining the functional relationship between θ̂₁ and θ̂₂ in the (θ̂₁, θ̂₂) plane. From the Geary-Rao Theorem it is easy to see that preference (i.e., the estimator that is closer to θ) of one estimator over another will change whenever the continuous path crosses the line θ̂₁(T) = θ̂₂(T) or the line θ̂₁(T) + θ̂₂(T) = 2θ (i.e., whenever the continuous path crosses the line of equality or the switching line), except when the continuous path passes through the origin of the coordinate system of u from R₁ into R₃ or from R₂ into R₄ or vice versa. The values of T at which the continuous path crosses the line of equality will be known as crossing points of the estimators θ̂₁ and θ̂₂. Usually such points will merely be the points of intersection of the two estimators as functions of t, a realization of the statistic T. The values of T for which the continuous path crosses the switching line, determined by θ̂₁(T) + θ̂₂(T) = 2θ, will be known as switching points, which will be functions of the true but unknown value of θ. The term switching point is used to denote that preference (i.e., the estimator which is closer to θ) switches from one estimator to another as the continuous path crosses this line. Note that the rare exceptions, when the continuous path crosses diagonally through the origin of the coordinate system defined on u from one odd-numbered quadrant to another or from one even-numbered quadrant to another, are precisely the cases when preference does not switch from one estimator to another. In such cases, a crossing point is coincidentally a switching point. For visual thinkers, this discussion is probably sufficient to understand the results of Karlin's Corollary. However, for the sake of logical rigor, we present the same concepts in a more mathematical vein.
Definition 4.3.1 Let θ̂₁(T) and θ̂₂(T) be univariate functions of the common statistic T. A point t in the essential range of the random variable T is said to be a crossing point of θ̂₁ and θ̂₂ provided θ̂₁(T) − θ̂₂(T) changes sign at t. Thus t is a crossing point of θ̂₁ and θ̂₂ provided there exists an η > 0 such that for all ε, 0 < ε < η, and Q(t) = θ̂₁(t) − θ̂₂(t), we have Q(t + ε)Q(t − ε) < 0.
Note that this definition uses Karlin's (1957) definition of a "change point of the first kind." However, he proposed the definition in a decision-theoretic context related to Bayesian hypothesis testing. Zacks (1971) gives an elegant exposition of these concepts.

Definition 4.3.2 Under the same conditions given in Definition 4.3.1, a point s in the essential range of the random variable T is said to be a switching point of θ̂₁ and θ̂₂ provided θ̂₁(T) + θ̂₂(T) − 2θ changes sign at s. Thus s is a switching point of θ̂₁ and θ̂₂ provided there exists an η > 0 such that for all ε, 0 < ε < η, and for the function K(t) = θ̂₁(t) + θ̂₂(t) − 2θ, it follows that K(s + ε)K(s − ε) < 0.

A switching point s is a change point of the first kind for the function K(t). Hence, we restate the Geary-Rao Theorem in the context given by Rao et al. (1986).

Corollary 4.3.3 (Karlin's Corollary) Let θ̂₁ and θ̂₂ be univariate estimators of the real parameter θ and continuous functions, a.e., of the statistic T. Let A = {t₁, . . . , t_m} be the set of m crossing points of θ̂₁ and θ̂₂, let B = {s₁, . . . , s_n} be the set of n switching points of θ̂₁ and θ̂₂ at θ, and denote A Δ B = {y₁, . . . , y_j}, where j ≤ m + n, as the symmetric difference, (A ∪ B) − (A ∩ B), of the sets A and B. Define the ordered points of the closeness partition as x₀ = −∞, x₁ = min{y₁, . . . , y_j}, . . . , x_j = max{y₁, . . . , y_j}, and x_{j+1} = ∞. If |θ̂₁ − θ| < |θ̂₂ − θ| in (x₀, x₁) then
where the double brackets in the upper index of summation in (4.11) denote the greatest integer function. Some assumptions given in Karlin's Corollary require explanation. He assumes that the sets of crossing points and switching points are finite. Later we shall demonstrate that the corollary is still true for infinite sets, which must be at most countable. A second assumption in Karlin's approach is that the essential range I of the random variable T is an interval. We will show that the essential range need only be an open set of the reals. From Zacks (1971), we use the following notation, which is presented there in a Bayesian context. Let ℒ₀(X, θ) be the absolute loss function
defined on the Cartesian product of the essential range I and the parameter space Ω, and define
where T is a statistic which possesses the monotone likelihood ratio property for the parameter θ, and d₁(T) and d₂(T) are two decision rules (or estimators). In the Bayesian context, ℒ₀[d_i(T), θ] is the loss due to choosing action d_i(T) when the true state of nature is θ. Using Karlin's concepts applied to preference in terms of Pitman's measure, we can see that preference (i.e., choice of the one with smaller loss) between d₁(T) and d₂(T) will change according to the changes in sign of the function N(T|θ), that is,
where Q(T) and K(T) are defined in Definitions 4.3.1 and 4.3.2, respectively. Define Z(N) as the set of all zeros of N(T|θ) and C(N) as the set of all changes in sign of N(T|θ). From the previous discussion we can see that N(T|θ) will change sign according to the changes in sign of K(T) or Q(T), except when these functions change sign simultaneously. Karlin (1957) and Zacks (1971) observe this in the Bayesian context for testing two hypotheses (i.e., hypothesis A_i : θ ∈ Ω_i, for i = 1, 2). The set over which θ̂₁ is closer to θ than θ̂₂ corresponds to the acceptance region determined by the test function for the first hypothesis (i.e., A₁ : θ ∈ Ω₁) in the Bayesian framework. Denote the indicator function on the positive real number line by I⁺(x). Then I⁺[−N(T|θ)] plays the corresponding (but classical) role of the Bayes test function for the above hypothesis (or action). It is defined as
We see in the continuous case that
The numerical value of Pitman's measure becomes the expected value of the classical analogue of the Bayes test function. The partition defined in Karlin's Corollary consists of at most j + 1 intervals, and on these, θ̂₁ has smaller loss than θ̂₂ on exactly [[j/2]] of the intervals. The estimator with smaller loss alternates between the two competitors over adjacent intervals defined by the partition.

We do not emphasize the assumptions used in Karlin's Corollary since they can be made less restrictive without affecting the truth of the corollary. However, some remarks about the assumptions are noteworthy. The definitions of crossing, switching, and change points allow N(T|θ) to be discontinuous at such points. Consequently, every change point of N(T|θ) need not be a zero of N(T|θ). Moreover, elements of Z(N) need not be change points of N(T|θ). Previously, it had been assumed that Z(N) must be a subset of C(N). In fact it is conceivable that Z(N) ∩ C(N) = ∅. Earlier research in this area had required that all zeros be modal (i.e., every zero of N(T|θ) had to also be a change point). However, the use of the symmetric difference of the set of crossing points A = C(Q) and the set of switching points B = C(K) precludes this requirement. The previous researchers in this area were aware of the unnecessary nature of this condition, and they circumvented the obstacle in another way by requiring that the nonmodal zeros of N(T|θ) be counted twice. By Theorem 2.2.2, the corollary remains true under a wide class of transformations on ℒ₀. Thus it follows that the sets A and B suffice for all loss functions in the family. Ferguson (1967) made the same observation in the context of Bayesian hypothesis testing. The importance of this result to Bayesian decision theory lies in the explicit statement given for C(N) = C(K) Δ C(Q). We can be more articulate in this case since we have a limited family F of loss functions. In practice, the Bayesian can start by solving for the value(s) of θ that satisfy d₁ + d₂ = 2θ.

Earlier we claimed that Karlin's Corollary remains true even in the event that the number of changes of sign is countably infinite and that we could drop the condition that the statistic T must satisfy the monotone likelihood ratio (MLR) property. Toward this end, we present an extension of Karlin's Corollary given by Keating (1991).
Corollary 4.3.4 (Karlin's Extended Corollary) Using the definition given in (4.12), suppose that N(U|θ) is a piecewise continuous function of the statistic U, whose essential range I contains at most a countable number
of discontinuities in the set D. Then there exists at most a countable number of disjoint open intervals, I₁, I₂, . . . , over which |θ̂₁(U) − θ| < |θ̂₂(U) − θ|. Consequently, we have that
Proof: By assumption N is a continuous (a.e.) function such that N : I − D → ℝ. Now the interval (−∞, 0) is an open set in ℝ under the Euclidean metric. So from the continuity of N, the inverse image N⁻¹[(−∞, 0)|θ] is a relatively open set in I − D, which is an open subset of ℝ. For the real number line, every open set can be written as at most a countable union of disjoint open intervals, which are mutually exclusive events defined on the random variable U. Thus it follows that
such that I_j ∩ I_k = ∅ for j ≠ k. From the definition of N
The subsequent result in (4.18) is a direct consequence of Kolmogorov's axiom of countable additivity. Note that Karlin used the MLR property to obtain a complete (or essentially complete) class of decision rules for the two hypotheses. The end points of the intervals given above satisfy either Definition 4.3.1 or Definition 4.3.2, except that we must add the condition "less than or equal to" zero as opposed to "strictly less than" zero.
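The closeness partition of Karlin's Corollary is easy to carry out numerically. The sketch below (Python with SciPy) uses an artificial configuration chosen purely for illustration, not an example from the text: T ~ N(2, 1), θ = 1.5, θ̂₁(t) = t, and θ̂₂(t) = t²/2. It locates the crossing points (sign changes of Q) and switching points (sign changes of K) on a grid, and then sums the probability content of the intervals on which θ̂₁ is the closer estimator. Testing a point inside each interval, rather than relying on strict alternation, also covers the case in which a crossing point coincides with a switching point.

    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import norm

    theta = 1.5
    def est1(t): return t               # crossing points: sign changes of Q = est1 - est2
    def est2(t): return t * t / 2.0     # switching points: sign changes of K = est1 + est2 - 2*theta
    def Q(t): return est1(t) - est2(t)
    def K(t): return est1(t) + est2(t) - 2.0 * theta

    def sign_change_points(f, grid):
        pts, vals = [], [f(g) for g in grid]
        for a, b, fa, fb in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]):
            if fa == 0.0:
                pts.append(a)
            elif fa * fb < 0.0:
                pts.append(brentq(f, a, b))
        return pts

    grid = np.linspace(-10.0, 10.0, 4001)
    cuts = sorted(set(np.round(sign_change_points(Q, grid) + sign_change_points(K, grid), 10)))
    edges = [-np.inf] + list(cuts) + [np.inf]

    pmc = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        mid = float(np.clip((a + b) / 2.0, -20.0, 20.0))   # a test point inside the interval
        if abs(est1(mid) - theta) < abs(est2(mid) - theta):
            pmc += norm.cdf(b, loc=2.0) - norm.cdf(a, loc=2.0)
    print(round(pmc, 4))   # about .8388 for this toy configuration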
4.4  A special case of the Geary-Rao Theorem
The calculation of Pitman's measure of closeness can be greatly simplified for a special case arising out of the Geary-Rao Theorem. Before presenting
the special case, we define a class of estimators that can be compared based on PMC via the simplification.
4.4.1  Surjective estimators
Definition 4.4.1 An estimator θ̂(T), which is a function of the statistic T, is a nondecreasing full-range estimator of the real parameter θ provided: (i) θ̂ : I → Ω is surjective (i.e., the estimator maps the essential range of the random variable onto the parameter space Ω); (ii) θ̂(T) is a nondecreasing function of T.

Property (i) of Definition 4.4.1 (i.e., surjective estimators) has concerned statisticians since Halmos (1946), Lehmann (1951), and, more recently, Hoeffding (1984) and Bar-Lev and Enis (1988). The simple intent of the condition is that the set of possible values that the estimator can assume is a subset of the parameter space Ω. An estimator should not assume values which are not possible for the parameter θ. However, the history of statistics contains many estimation procedures that do not satisfy the surjective condition. One such example involves estimators that can provide negative estimates of variance components, such as Rao's (1971) MINQUE estimator of a linear combination of variances. If Property (ii) is strictly increasing, as opposed to just nondecreasing, then the estimator becomes injective in the topological sense. Of course, if an estimator is both injective and surjective it becomes bijective, or a homeomorphism. If θ̂₁(T) and θ̂₂(T) are two estimators that satisfy Definition 4.4.1 and are not both constant on the same subset C (a.e.) of the domain I, then θ̂(T) = [θ̂₁(T) + θ̂₂(T)]/2 will be a homeomorphism on I. By the intermediate value theorem there exists at least one switching point s₁ in I, and by the strictly increasing nature of the average of the two estimators, s₁ is also unique. These observations produce the following corollary, which was derived by Dyer and Keating (1979a).

Corollary 4.4.2 Let θ̂₁(T) and θ̂₂(T) be nondecreasing full-range estimators of θ, such that both are not constant (a.e.) on I. Let s₁ be the unique switching point at θ and suppose that θ̂₁(T) and θ̂₂(T) have at most a finite number of crossing points, say t₁, . . . , t_m. If θ̂₁(t) > θ̂₂(t) whenever t is in (−∞, min(t₁, . . . , t_m)), then
where x₀ = −∞, x₁ = min(s₁, t₁, . . . , t_m), . . . , x_{m+1} = max(s₁, t₁, . . . , t_m), x_{m+2} = ∞, and j = m + 1. In this statement, we assume that the unique switching point is not a crossing point. If it is, we can derive a simpler result through the symmetric difference in Karlin's Corollary, since Corollary 4.4.2 is a special case of Karlin's Corollary. The wide applicability of Corollary 4.4.2 is demonstrated in the following subsection.
4.4.2  The MLR property
Earlier in this chapter, we cited Karlin's use of the MLR property to establish a complete class of decision rules. In this subsection, we begin the process of disclosing results which the MLR property supports in classical estimation theory.

Definition 4.4.3 The family f(x, θ) of probability functions indexed by the parameter θ is said to have the monotone likelihood ratio (MLR) property if there exists a real-valued function T(x) such that for any θ < θ′ the distribution functions F(x; θ) and F(x; θ′) are distinct and the ratio f(x; θ′)/f(x; θ) is a nondecreasing function of T(x).

Remark 4.4.4 Using Lemma 2 from Lehmann (1986, p. 85), we can see that the median unbiased estimator (MUE) of θ from a family which satisfies the MLR property is a nondecreasing function of the statistic T(x). Since the median unbiased estimator is a 50% upper confidence limit on the parameter, the result is a natural consequence of hypothesis testing.

Remark 4.4.5 Using some other associated lemmas from Lehmann (1986), we can prove (through a contrapositive argument) that the maximum likelihood estimator (MLE) of θ is a nondecreasing function of T(x).

Remark 4.4.6 The uniformly minimum variance unbiased estimator is also a nondecreasing function of T(x) under some general regularity conditions.

Hence, many of the popular and conventional point estimators satisfy Definition 4.4.1 for a family of distributions that has the MLR property.
For such families of distributions the methods of median unbiased estimation, maximum likelihood estimation, and mean unbiased estimation can be compared through Corollary 4.4.2. The corollary provides a convenient shortcut for the determination of PMC.
4.5  Applications of the Special Case
Example 4.5.1 (Estimation of the Characteristic Life in the Exponential) Let X_{1:n} ≤ X_{2:n} ≤ · · · ≤ X_{r:n} be the first r ordered failure times of a random sample of size n chosen from the exponential failure model with probability density function given by f(x; θ) = (1/θ) exp(−x/θ) I⁺(x). We assume that the experiment is truncated at the time of the rth failure so that we have a Type II censored sample, and the parameter θ, the characteristic life, is unknown. In reliability theory, scientists, engineers, actuaries, and epidemiologists are frequently concerned with estimation of θ or, more generally, λ = cθ, where c is a known constant. When c = −ln(p), λ is the 100(1 − p)th percentile of f(x; θ) (i.e., x_p = −ln(p) θ is the reliable life corresponding to the survival proportion p). Several estimators of θ are based on the complete sufficient statistic T = Σ_{i=1}^r X_{i:n} + (n − r)X_{r:n}. The statistic T represents the total time on test and has a gamma distribution with shape parameter r and scale parameter θ. We can equivalently say that 2T/θ has a chi-squared distribution with 2r degrees of freedom. Nagaraja (1986) discusses estimation of θ based on invariant functions of T. The maximum likelihood estimator (MLE) of θ is
Since T has a gamma distribution with shape parameter r and scale parameter θ, then E(T) = rθ and Var(T) = rθ², and therefore θ̂₁ is also the uniformly minimum variance unbiased estimator (UMVUE) of θ. Moreover, it is efficient in the sense that its variance, θ²/r, attains the Fréchet-Cramér-Rao lower bound on the variance of an unbiased estimator. For most statisticians, there would seem to be little need for the consideration of other estimators. To produce a competitor to the MLE, we introduce the class of estimators Q, where Q = {aT : a > 0}, and determine the element in Q that has smallest mean absolute deviation. For any estimator in Q, the absolute risk is given by
where f(t) = t^{r−1} exp(−t/θ) I⁺(t)/[θ^r Γ(r)]. By Leibniz' rule, we can differentiate the absolute risk in (4.20) with respect to a and obtain
where χ²(x; f) is the distribution function of the χ² distribution with f degrees of freedom. Upon equating the first derivative with respect to a to zero and solving for a, we obtain a unique critical value of the absolute risk. Noting that d²E(|aT − θ|)/da² > 0, we obtain from the second derivative test that the minimum mean absolute deviation (MMAD) estimator of θ is given by

Let θ̂_i(T) = a_i T, for i = 1, 2, be two estimators in the class Q. Since the estimators in Q are functionally ordered for a specified value of r, we assume without loss of generality that a₁ > a₂. Since the two estimators do not intersect over the positive reals, which is the essential range of T, the set A of crossing points in Karlin's Corollary is empty, and from Corollary 4.4.2, B consists of the unique switching point defined by θ/a′, where a′ = (a₁ + a₂)/2. Then it follows that the closeness partition separates the positive reals into two complementary intervals, (0, θ/a′) and [θ/a′, ∞). From Corollary 4.4.2, Pitman's measure becomes
From Groeneveld and Meeden's (1977) observation about the mode-median-mean inequality, we can establish that m_{2r} < 2r < m_{2r+2} (i.e., m_{2r} < 2r because the median of a chi-squared random variable is less than its mean (with 2r degrees of freedom), and 2r < m_{2r+2} because the mode is less than the median (with 2r + 2 degrees of freedom)). Thus, it follows that θ̂_L(T) > θ̂_A(T) for all T > 0. Table 4.2 contains Pitman's measure of closeness of the MLE relative to the MMAD estimator of θ in the exponential failure model for Type II censored samples of size r.
Table 4.2: PMC of the MLE to the MMAD estimator.

    Censor Size (r)     1      2      3      4      5      7     10     15     20
    PMC              .7144  .6665  .6410  .6246  .6128  .5967  .5818  .5674  .5587
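Table 4.2 can be reproduced directly from the closeness partition derived above: since the MLE is the closer estimator exactly when T < θ/a′, the closeness probability equals Pr(χ²_{2r} < 2/a′) with a′ = [(1/r) + (2/m_{2r+2})]/2. A brief sketch (Python with SciPy, where scipy.stats.chi2.median supplies the medians m_{2r+2}) follows.

    from scipy.stats import chi2

    def pmc_mle_vs_mmad(r):
        m = chi2.median(2 * r + 2)                 # median of a chi-square with 2r + 2 df
        a_prime = 0.5 * (1.0 / r + 2.0 / m)        # midpoint of the two multipliers of T
        return chi2.cdf(2.0 / a_prime, 2 * r)      # Pr(T < theta / a') on the chi-square scale

    for r in (1, 2, 3, 4, 5, 7, 10, 15, 20):
        print(r, round(pmc_mle_vs_mmad(r), 4))     # matches the entries of Table 4.2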
From Table 4.2, we conjecture that the MMAD estimator of the characteristic life is inadmissible relative to the MLE in the sense of PMC. If 0 < a (= m_{2r}) < b (= 2r) < c (= m_{2r+2}), then one can readily show that a < 2bc/(b + c). The inequality, although algebraically straightforward, may be somewhat unclear. It is a simple statement that 2bc/(b + c), as the harmonic mean of the distinct values b and c, must exceed b, which exceeds a. Using this result in conjunction with (4.23) and noting that, for this comparison, a′ = [(1/r) + (2/m_{2r+2})]/2, we have
Thus, the truth of the inadmissibility conjecture drawn from Table 4.2 is established mathematically. The fact that Pitman's measure of closeness does not depend upon the characteristic life was observed by Dyer and Keating (1981). Keating (1985) later showed, in the context of Rao's phenomenon (see §3.3), that this phenomenon was true for the comparison of multiples of a scale-invariant statistic of a scale parameter. As we shall see in Chapter 5, Nayak (1990) uses an elegant argument based on group-invariant transformations to show that the phenomenon holds for the comparison of any two equivariant estimators of a location or scale parameter from a location and scale parameter distribution.

Example 4.5.2 (The Proportion of Survivors in the Normal Distribution) Let X₁, X₂, . . . , Xₙ be independent and identically distributed normal random variables with common mean μ and variance σ². In quality control studies, engineers are quite interested in the fraction of a population that falls below a specified tolerance level x₀. For example, in assessing the material strength of rotor blades, an engineer would be concerned about the proportion of rotor blades having a mechanical fatigue strength X (or endurance limit) that falls below the maximum stress level x₀ exerted on the
blade during a specific mission. Since such stresses lead to rotor damage, material fatigue cracks, and catastrophic accidents, rotor analysts are concerned with the estimation of the proportion that falls below this maximum stress level. In this example for the normal distribution, we consider the equivalent problem of the comparison of estimators of the proportion of survivors above some level x₀. The proportion P of survivors at x₀ is given in the normal distribution by
Observe that P(x₀) depends on x₀ and the unknown parameters μ and σ only through the ratio (μ − x₀)/σ. In other words, Φ⁻¹[P(x₀)] = (μ − x₀)/σ, where Φ⁻¹(x) is the inverse function of the standard normal distribution function. Φ⁻¹(P) is more frequently referred to as the standard normal deviate associated with the 100Pth percentile of the distribution of X. Let the statistic T be defined as follows
where n is the sample size, X̄ is the sample mean, and S is the sample standard deviation (i.e., S² = Σ_{i=1}^n (X_i − X̄)²/(n − 1)). It follows that
where t(f, Δ) represents a noncentral t random variable with f degrees of freedom and noncentrality parameter Δ. Let
where c₁ = √(2π)/[Γ(f/2) 2^{(f−2)/2}], denote the distribution function of a noncentral t random variable having f degrees of freedom and noncentrality parameter Δ. The classical estimators of the survival proportion P(x₀), maximum likelihood and minimum variance unbiased, are functions of the complete sufficient statistic (X̄, S). From the invariance property of maximum likelihood, the MLE of P(x₀) is given by
The uniformly minimum variance unbiased estimator (UMVUE) of P(x₀) can be determined through the Rao-Blackwell Theorem as
where I_x(α, β) is the incomplete beta function ratio. Thus it is obvious that P̂_L(T) and P̂_V(T) can be expressed as functions of the common statistic T, and the two estimators satisfy Definition 4.4.1, which is necessary in Corollary 4.4.2, in that each is a continuous, nondecreasing function of T. The range of P̂_L(T) is (0, 1), whereas the range of P̂_V(T) is [0, 1]. Graphs of P̂_L(T) and P̂_V(T) are given in Figures 4.2 and 4.3 for n = 3 and 8, respectively. These estimators have been compared by Zacks and Milton (1972) and Brown and Rutemiller (1973) on the basis of MSE, and by Boullion, Cascio, and Keating (1985) in terms of absolute risk.
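The two estimators can be examined numerically as functions of t. In the sketch below (Python with SciPy) the forms P̂_L(t) = Φ(t/√(n − 1)) and P̂_V(t) = I_{(1+t/(n−1))/2}[(n − 2)/2, (n − 2)/2] are the ones consistent with the symmetry relations used later in this example, and the clipping to [0, 1] reflects the stated range of P̂_V; treat them, the bracketing interval, and the definition of T itself (displayed earlier in this example) as the assumptions of the sketch. The loop locates the unique positive crossing point z₁ for n = 3, . . . , 10.

    import numpy as np
    from scipy.stats import norm
    from scipy.special import betainc
    from scipy.optimize import brentq

    def p_mle(t, n):
        # maximum likelihood estimator of the survival proportion as a function of t
        return norm.cdf(t / np.sqrt(n - 1.0))

    def p_umvue(t, n):
        # UMVUE written through the regularized incomplete beta function ratio
        x = np.clip(0.5 * (1.0 + t / (n - 1.0)), 0.0, 1.0)
        a = (n - 2.0) / 2.0
        return betainc(a, a, x)

    for n in range(3, 11):
        z1 = brentq(lambda t: p_mle(t, n) - p_umvue(t, n), 0.5, n - 1.0 - 1e-9)
        print(n, round(z1, 4))     # the positive crossing point z1 for each sample size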
Figure 4.2: Estimators of the proportion when the sample size is 3.

To apply Corollary 4.4.2, we must determine the crossing points, which are the points of intersection, of P̂_L(T) and P̂_V(T). In this section, we note that P̂_L(T) and P̂_V(T) have precisely three crossing points. First observe that at t = 0, we have P̂_L(0) = Φ(0) = ½ and likewise that P̂_V(0) = I_{1/2}[(n − 2)/2, (n − 2)/2] = ½.
Figure 4.3: Estimators of the proportion when the sample size is 8.

Consequently, t = 0 is a point of intersection for P̂_L(t) and P̂_V(t). Since
P̂_L(t) = Φ(t/√(n − 1)) = 1 − Φ(−t/√(n − 1)) = 1 − P̂_L(−t),
then the function P̂_L(t) − ½ is an odd function of t. Furthermore, for |t| < n − 1, we have that
Therefore, the function P̂_V(t) − ½ is an odd function of t. This property of symmetry about (0, ½) implies that for each positive crossing point of the graphs of P̂_L(t) and P̂_V(t), a negative crossing point exists whose t-coordinate is equal in absolute value. The fact that there are exactly three points of intersection can be established by proving that there is a unique positive point of intersection (see Keating (1980) for the mathematical details). Using numerical procedures, we determined the positive crossing point z₁ of the MLE and UMVUE for n from 3 to 10. From a numerical perspective, for estimators that satisfy Definition 4.4.1, the associated switching point s₁ must be bounded by Φ⁻¹(P)√(n − 1)
and (n − 1){2 I⁻¹_P[(n − 2)/2, (n − 2)/2] − 1}. The first value is the value of t for which P̂_L(t) = P and the latter is the value for which P̂_V(t) = P. Hence, we can use the secant method to solve for s₁ for each value of P and n through IMSL subroutines for the inverse functions of the standard normal and the beta distribution functions. We shall present results for P > .50 and note that the PMC is symmetric about P = .50. Observe that P̂_L(t) > P̂_V(t) whenever t ∈ (−∞, −z₁), so that Corollary 4.4.2 implies that the ordered endpoints are given by x₀ = −∞, x₁ = −z₁, x₂ = 0, x₃ = min(s₁, z₁), x₄ = max(s₁, z₁), and x₅ = ∞, which yields
which is a linear combination of noncentral t-probabilities, where
Table 4.3 contains the closeness probabilities of the MLE relative to the UMVUE of the survival function in the normal distribution for values of n = 3, 4, 6, 7, 9, and 10 and P = .50(.05).95. We note that the closeness probabilities approach extremes as P approaches .50 and .95. As the survival proportion approaches .50, P̂_V(T) is highly preferred to the MLE in the sense of Pitman's measure of closeness, whereas the MLE is highly preferred in the same sense when the survival proportion approaches .95. These extreme values of Pitman's measure are produced by the same geometric phenomenon. When P = .50, the crossing point at t = 0 also becomes a switching point of the two estimators and thus, according to Karlin's Corollary, is deleted from the symmetric difference of the sets of crossing points and switching points. At this proportion of survivors, the UMVUE is closer to P = .50 for all values of T from −z₁ to z₁, and hence the subsequent dominance of P̂_V in terms of Pitman's closeness criterion is readily understood. Likewise, when the survival proportion approaches .95, the switching point s₁ approaches the positive crossing point z₁. Consequently, the values of s₁ and z₁ are deleted from the symmetric difference (or, more exactly, the probability content of that interval approaches zero) of the sets of crossing points and switching points. In this way the MLE is closer to a value of P
Table 4.3: PMC of the MLE to the UMVUE.

                        Sample Size (n)
      P        3        4        6        7        9       10
     .50    .1939    .0804    .0158    .0073    .0016    .0007
     .55    .2801    .1778    .1326    .1328    .1433    .1499
     .60    .3766    .2819    .2462    .2505    .2692    .2797
     .65    .4802    .3887    .3502    .3525    .3682    .3772
     .70    .5882    .4957    .4425    .4368    .4388    .4417
     .75    .6973    .6016    .5260    .5084    .4903    .4844
     .80    .8035    .7071    .6093    .5797    .5411    .5257
     .85    .9004    .8141    .7066    .6697    .6176    .5941
     .90    .9766    .9247    .8368    .8036    .7551    .7295
     .95    .9824    .9660    .9813    .9903    .9988    .9871
near .95 for all T ∈ (−∞, −z₁) ∪ (0, ∞). Of course, for P near .95, the latter set in this union contains most of the probability. One should also observe that the magnitude of the preference of the MLE in terms of Pitman's measure of closeness becomes a more local phenomenon in the parameter space as the sample size increases. For the cases in which the sample size is near 10, the closeness probability of the MLE relative to the UMVUE drops in value by about .30 when P moves from .95 down to .90. From Table 4.3, we see that for small sample sizes, the decided preference of the MLE in terms of PMC is spread over a larger subset of the parameter space [0, 1].

A median unbiased estimator P̂_S(T) can be determined from medians of the noncentral t-distribution. Let t_{.50}(f, δ) denote the median of a noncentral t-distribution with f degrees of freedom and noncentrality parameter δ. For a fixed number of degrees of freedom, t_{.50}(f, δ) is an increasing function of δ. Hence, there exists a unique value δ = h_{.50,f}(x) such that
The median unbiased estimator is then determined by solving the following equation for P̂_S(t):
Consequently, the median unbiased estimator of P is given by
In Chapter 5, we will show that median unbiased estimators (which are functions of a complete sufficient statistic) are Pitman-closest within a restricted class of estimators.

Example 4.5.3 (The Efron Rule Revisited) Another illustration of Corollary 4.4.2 arises from Example 1.1.2. In Chapter 1, we merely stated that Efron's rule was Pitman-closer than the sample mean X̄. Efron (1975) does not detail his evaluation of PMC for these estimators. We note that for θ = 0, P(θ̂₁, θ̂₂|0) = 0, which is obvious from Figure 1.1. We present a proof for the case when θ > 0 and note that PMC in this example is an even function of θ. From Corollary 4.4.2, we have
where s is the unique switching point of θ̂₁ and θ̂₂, and 0 is obviously the sole crossing point. Two cases must be considered, depending upon the two possible definitions of θ̂₂(X). The first case occurs whenever 0 < θ < c/(2√n) and can be readily handled. In the second case, where θ > c/(2√n), we have by Definition 4.1.2
which implies that

Moreover, since s > θ > c/(2√n), then we have
The expression on the right-hand side of this inequality is a strictly increasing function of θ and is bounded above by one-half.
4.6  Transitiveness
Although Pitman's measure of closeness of three competing estimators need not be transitive (i.e., see §3.1), there are certain sufficient conditions that produce transitiveness. Let θ̂₁(X), θ̂₂(X), and θ̂₃(X) be three competing estimators of the real parameter θ based on a common random vector X. If the three estimators are ordered for all X in the sample space ℝⁿ, Pitman's measure of closeness is transitive.
4.6.1  Transitiveness Theorem
Theorem 4.6.1 (Transitiveness Theorem) Let θ̂₁(X), θ̂₂(X), and θ̂₃(X) be real-valued estimators of the real parameter θ based on the data contained in the n-dimensional random vector X. If the three estimators are functionally ordered for all X, then Pitman's measure of closeness is transitive.

Proof: Assume that θ̂₁(X) ≥ θ̂₂(X) ≥ θ̂₃(X). Since the estimators are ordered, then from the Geary-Rao Theorem we can conclude that for i = 1, 2; j = 2, 3 and i < j
Moreover, by the assumed ordering we know that
Since these random variables are functionally ordered according to the ordering given in inequality (4.35) and the common essential range E of the three estimators is an open subset of ℝⁿ, we have that
for all x ∈ (−∞, ∞). In particular, when x = 2θ, we have from inequalities (4.34) and (4.36) that
Case 1. Suppose that P(θ̂₁, θ̂₂|θ) > .50. Then P(θ̂₂, θ̂₃|θ) > .50 and P(θ̂₁, θ̂₃|θ) > .50. In this case, we assumed that P(θ̂₁, θ̂₂|θ) > .50 (i.e., θ̂₁ is
better in the sense of PMC than θ̂₂), and consequently the two conclusions follow from the transitive property of real numbers and inequality (4.37). The two conclusions are that θ̂₂ is Pitman-closer to θ than θ̂₃ and that θ̂₁ is Pitman-closer to θ than θ̂₃.

Case 2. Suppose that P(θ̂₁, θ̂₂|θ) < .50 and P(θ̂₁, θ̂₃|θ) > .50. Then P(θ̂₂, θ̂₃|θ) > .50. From the first assumption in Case 2, we have Pr(θ̂₁ + θ̂₂ < 2θ) < .50, and from the second that Pr(θ̂₁ + θ̂₃ < 2θ) > .50. From these assumptions, we have that θ̂₂ is Pitman-closer to θ than θ̂₁, and θ̂₁ is Pitman-closer to θ than θ̂₃. However, by inequality (4.36) and the transitive property of real numbers, we have that
The transitiveness of the Pitman closeness criterion is now established in this second case; i.e., θ̂₂ is preferred over θ̂₃.

Case 3. Suppose that P(θ̂₂, θ̂₃|θ) < .50. Then P(θ̂₁, θ̂₂|θ) < .50 and P(θ̂₁, θ̂₃|θ) < .50. Notice that this assumption presumes that θ̂₃ is Pitman-closer to θ than θ̂₂. This case is easily established by inequality (4.36). Since P(θ̂₂, θ̂₃|θ) < .50, then by the transitiveness of real numbers, we have that P(θ̂₁, θ̂₂|θ) < P(θ̂₁, θ̂₃|θ) < .50. The transitiveness of Pitman's measure of closeness is now established in this last case.
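The content of the theorem is easy to see in a small simulation. The sketch below (Python with NumPy; the estimators, sample size, and seed are arbitrary choices for illustration) takes three functionally ordered estimators of a normal mean, estimates the three pairwise closeness probabilities by Monte Carlo, and lets one check that whichever of the three cases applies, the implied preferences are internally consistent.

    import numpy as np

    rng = np.random.default_rng(2)
    theta, n, reps = 0.0, 9, 200_000
    xbar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)
    e1, e2, e3 = xbar + 0.3, xbar, xbar - 0.5          # e1 >= e2 >= e3 for every sample

    def pmc(a, b):
        # Monte Carlo estimate of P(a, b | theta)
        return np.mean(np.abs(a - theta) < np.abs(b - theta))

    print(round(pmc(e1, e2), 3), round(pmc(e2, e3), 3), round(pmc(e1, e3), 3))
    # here P(e1,e2) < .50 while P(e1,e3) > .50, so Case 2 applies and P(e2,e3) > .50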
4.6.2  Another extension of Karlin's Corollary
Remark 4.6.2 Note that (4.34) includes an intervening equation that states the following result:
where these functions are defined, respectively, after Definitions 4.3.1 and 4.3.2, except that the domains of definition are extended to ℝⁿ. We can now denote the inverse image of a set A ⊂ ℝ under the function Q(·) as Q⁻¹(A), which is a subset of ℝⁿ, as we did previously. If the product of Q(·) and K(·) is negative, then the regions of preference defined in the Geary-Rao Theorem are U₁ = K⁻¹[(0, ∞)] ∩ Q⁻¹[(−∞, 0)] and U₂ = K⁻¹[(−∞, 0)] ∩ Q⁻¹[(0, ∞)]. Therefore, it follows that
The regions given in (4.39) are subsets of ℝⁿ, and consequently in the continuous case these results have expressions in terms of n-fold integrals,
Let E be the essential range of the random variable X and let D be the set of discontinuities of the function N. If K and Q are restricted continuous maps from E − D → ℝ¹, so that the restricted domain, E − D, of N is an open subset of ℝⁿ, then the pre-images must be open in ℝⁿ since the sets (−∞, 0) and (0, ∞) are open in ℝ¹. Since open sets in this separable metric space (ℝⁿ, ρ) satisfy the Lindelof Property, the preference region, which is an open subset of ℝⁿ, can be written as a countable union of open sets, O₁, O₂, . . . , in ℝⁿ. However, an open subset of ℝⁿ cannot in general be written as a countable union of "disjoint" open cubes (see Dugundji (1966, p. 95)). Since we have assumed that E − D is also an open subset of ℝⁿ, the pre-image under the continuous restricted map N can be written as the union of at most a countable number of nonoverlapping "closed" cubes, C₁, C₂, . . . ⊂ ℝⁿ (see Dugundji (1966)). Thus it follows that
The importance of the result in (4.41) resides in the simplicity with which one can evaluate each n-fold integral in the infinite sum. Observe that each n-dimensional closed cube C_k has exactly 2ⁿ vertices, v_{k1}, v_{k2}, . . . , v_{k2ⁿ}, and the probability content of this n-dimensional cube can be determined through a linear combination of values of the distribution function F(x) at each of the vertices.

Remark 4.6.3 Note that if θ̂₁(X) ≥ θ̂₂(X), then Pitman's measure of closeness can be viewed as a risk function defined on the estimator which is the average of θ̂₁(X) and θ̂₂(X). Define the simple loss function as ℒ*(θ̂, θ) = I⁺(θ̂ − θ). Then from (4.34), we have
Theorem 4.6.4 (Median Unbiased Estimation) Let θ̂₂(X) be a median unbiased estimator of the real parameter θ. Suppose that θ̂₁(X) ≥ θ̂₂(X) ≥ θ̂₃(X) for all X ∈ ℝⁿ. Then P(θ̂₁, θ̂₂|θ) ≤ .50 and P(θ̂₂, θ̂₃|θ) ≥ .50.

Proof: By the assumed ordering, it follows that θ̂₁(X) + θ̂₂(X) ≥ 2θ̂₂(X). From inequality (4.34), we conclude that
Once again, from the assumed ordering, 2θ̂₂(X) ≥ θ̂₂(X) + θ̂₃(X), and by inequality (4.34), we observe that
We conclude that the median unbiased estimator (MUE) is optimal in the sense of Pitman's measure of closeness over a class of ordered estimators.

Example 4.6.5 (Estimation of the Characteristic Life) In Example 4.5.1, we discussed the estimation of the characteristic life θ in an exponential failure model and we considered Type II censored data from this distribution. The maximum likelihood estimator θ̂_L satisfied the criteria of unbiasedness, maximizing the likelihood function, minimum variance, consistency, and efficiency, and was uniformly preferred in terms of Pitman's measure over the minimum absolute risk estimator in Q. In the following discussion, we show that the MLE is inadmissible relative to the median unbiased estimator in the sense of Pitman's measure of closeness. Recall from the remarks following (4.22) that the estimators in Q are functionally ordered for a specified value of r. Then from the previous optimality theorem of the MUE, we know that all other estimators in Q will be inadmissible relative to the MUE in Q provided it exists. Since Pr(2T/θ < m_{2r}) = .50, it follows that θ̂_U = 2T/m_{2r}. The median unbiased estimator is as likely to overestimate θ as it is to underestimate θ. From another perspective, the median of the statistic θ̂_U is the true but unknown value of θ. Brown, Cohen, and Strawderman (1976) also established that, among the median unbiased estimators, the one that is a function of the sufficient statistic will have smallest absolute risk.
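The resulting comparison is again a one-line probability computation: since θ̂_U = 2T/m_{2r} exceeds θ̂_L = T/r for every T (because m_{2r} < 2r), the MLE is the closer estimator only when T exceeds the switching point θ/a′ with a′ = [(1/r) + (2/m_{2r})]/2, and by Theorem 4.6.4 this probability cannot exceed one half. A short check of this inadmissibility (Python with SciPy):

    from scipy.stats import chi2

    def pmc_mle_vs_mue(r):
        m = chi2.median(2 * r)                     # median of a chi-square with 2r df
        a_prime = 0.5 * (1.0 / r + 2.0 / m)        # midpoint of the multipliers of T
        return chi2.sf(2.0 / a_prime, 2 * r)       # Pr(T > theta / a'), i.e., MLE closer

    for r in (1, 2, 5, 10, 20):
        print(r, round(pmc_mle_vs_mue(r), 4))      # each value stays below .50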
Remark 4.6.6 Suppose that the sample size of a random sample, chosen from a population with continuous distribution function F_θ(x), is odd. Then the sample median X̃ is a median unbiased estimator of the population median θ.
Let G_m(x) denote the distribution function of the sample median, which is given by X_{m:2m−1}. Obviously, we are assuming a sample size of 2m − 1. Then it follows that
The simplification results from the symmetry of the binomial coefficients, which holds for each j = 1, . . . , m. So the conclusion follows that the sample median is a median unbiased estimator of the population median. For even sample sizes, if we define X̃ = ½(X_{m:2m} + X_{m+1:2m}) and use the fact that F_θ is symmetric about θ, then X̃ is median unbiased for θ.

Remark 4.6.7 Reconsider Example 2.3.2 but with a random sample of size 3. Let X₁, X₂, X₃ be three independent observations from the Cauchy distribution whose density function is
where θ is the location parameter. It is quite natural to take the sample mean X̄ as an estimator of θ. Note that X̄ has precisely the same Cauchy density, so that E(X̄) and Var(X̄) diverge. As an alternative, we consider the sample median X̃ = X_{2:3}, which has a symmetric distribution around θ, and moreover E(X̃) = θ, so that X̃ is both median unbiased and unbiased (while X̄ is only median unbiased). However, since Var(X̃) does not exist, we cannot use MSE in comparing X̄ and X̃. Since E(X̄) does not exist either, the MMAD criterion would present the same dilemma as in the calibration problem, Example 2.3.3. However, there is no problem in incorporating the PMC as a valid means for comparing the performance of X̄ and X̃. In §1.1.2, we made the parenthetical note that fractional moments could be used in the method of moments to obtain estimates of the scale parameter in the Cauchy distribution. Along these lines one may also consider a loss
ℒ_q = |θ̂ − θ|^q as given in Theorem 2.2.2, where 0 < q < 1. Such choices allow for the existence of E[ℒ_q(X̄, θ)] and E[ℒ_q(X̃, θ)]. Two problems arise: (i) for q < 1, this loss function is not convex; and (ii) there is not a natural choice for q (0 < q < 1). What would be a natural interpretation of X̄ or X̃ being better than the other under a nonconvex loss function? While some answers to these queries are presented in §2.3, fortunately, PMC rescues us from such unnecessary controversies and complications by Theorem 2.2.2. Not surprisingly, more than forty years ago, Johnson (1950), being aware of these controversies, suggested the Pitman closeness criterion for this Cauchy population! From the preceding remark it follows that the sample median is also a consistent estimator of the population median. In reference to estimation of the Cauchy location parameter, we see that the sample median is a consistent estimator of θ whereas the sample mean is not, although both are median unbiased estimators of θ.

Example 4.6.8 (Asymptotically Unbiased Estimators) In the normal distribution, for odd sample sizes 2m − 1, the sample mean X̄ and the sample median X̃ = X_{m:2m−1} are median unbiased estimators of the population median μ. Notice that we choose an odd sample size to evade the problems of randomization required in even sample sizes. Hence, these two median unbiased estimators of μ must cross. Note that both are convex linear combinations of the order statistics X_{1:2m−1}, . . . , X_{2m−1:2m−1}.

For example, Geary (1944) compared the sample mean and the sample median as median unbiased estimators of the population mean μ in the normal distribution. From asymptotic theory, we know that the sample mean X̄ and the sample median X̃ are mean unbiased estimators having a bivariate normal distribution. The respective asymptotic variances (see Geary (1944)) are given by σ²/n and πσ²/2n, with correlation coefficient √(2/π), which from (4.9) results in η = (π − 2)/√((π + 2)² − 16). Then from (4.9), we can calculate Pitman's asymptotic measure of closeness as .6149685, which agrees with the numerical value given by Geary (1944) as .615. We can conclude that asymptotically the sample median is an inadmissible estimator relative to the sample mean for the estimation of the normal population mean μ. Scheffe (1945) encouraged research in Pitman's asymptotic measure of closeness. Intuitively, one can conjecture about the comparison of consistent estimators of an unknown parameter θ based on Pitman's asymptotic measure of
closeness. In this regard, we observe that both estimators are consistent asymptotically normal (CAN) estimators, but the best asymptotically normal (BAN) estimator is favored by Pitman's measure. This example is a precursor to the unifying results of Sen (1986a). He established that, in this asymptotic sense, a BAN estimator is always preferred to one which is not. Note that the sample median is not efficient in that the ratio of its variance to the Fréchet-Cramér-Rao lower bound exceeds 1.
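Geary's asymptotic value is immediate from the arcsine form associated with (4.9); the two lines below (Python with NumPy) reproduce the figure quoted above.

    import numpy as np

    eta = (np.pi - 2.0) / np.sqrt((np.pi + 2.0) ** 2 - 16.0)
    print(0.5 + np.arcsin(eta) / np.pi)    # 0.61496..., Geary's value of .615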
Chapter 5
Pitman-Closest Estimators

Chapter 4 was primarily devoted to procedures for the determination of Pitman's measure of closeness. With such procedures we can proceed with theoretical developments of, and inferential connections to, PMC. Two topics of particular interest are the existence and construction of "Pitman-closest estimators." In some examples of previous chapters, we exhibited a "Pitman-closest" estimator within a restricted class. The Transitiveness Theorem 4.6.1 and its consequence, the Median-Unbiased-Estimator Theorem 4.6.4, provide some insight into a prominent candidate for such inquiries. To further illustrate these latter two results, let us consider the perplexing problem of estimation of the population median in the Laplace distribution from random samples with an even number of observations.

Example 5.0.1 Let X₁, . . . , X_{2m} be a random sample of size 2m from a one-parameter Laplace distribution with median θ (see Ghosh and Sen (1989), and Keating (1991)). The Laplace density function is given by
for all x ∈ ℝ. For even-numbered sample sizes there is no unique solution for maximizing the likelihood function with respect to θ, and there is no univariate sufficient statistic for estimation of θ. All maxima of the likelihood function lie between X_{m:2m} and X_{m+1:2m}, the middle two order statistics. We consider the class C of convex linear combinations of X_{m:2m} and X_{m+1:2m}, since C is the closure of the set of all MLEs. Define C as
such that 0 ≤ a₁, a₂ ≤ 1 and a₁ + a₂ = 1. We note that all the estimators in C are placed in increasing order as functions of a₂. Now by Theorem 4.6.1, if C contains a median unbiased estimator, then the MUE is optimal in the sense of PMC. Since the Laplace distribution is symmetric about θ, the linear combination formed by setting a₁ = a₂ = ½ produces the unique median unbiased estimator within C and hence is Pitman-closest. Every estimator in C can be written as
where θ̂ = (X_{m:2m} + X_{m+1:2m})/2 is the median unbiased estimator, Z = (X_{m+1:2m} − X_{m:2m})/2 has a parameter-free distribution, and u ∈ [−1, 1]. The utility of the theorems in Chapter 4 is limited in that the estimators must be considered in pairs. However, in this chapter those theorems will be used to prove results which construct a Pitman-closest estimator of the parameter within a restricted class.
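A quick simulation illustrates the optimality of the median unbiased choice within C; the sample size, seed, and the rival weights (a₁, a₂) = (.8, .2) in the sketch below (Python with NumPy) are arbitrary choices made only for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    m, theta, reps = 5, 0.0, 200_000
    x = np.sort(rng.laplace(loc=theta, scale=1.0, size=(reps, 2 * m)), axis=1)
    lo, hi = x[:, m - 1], x[:, m]                 # the two middle order statistics
    mue = 0.5 * (lo + hi)                         # a1 = a2 = 1/2: the MUE in C
    rival = 0.8 * lo + 0.2 * hi                   # another convex combination in C
    print(round(np.mean(np.abs(mue - theta) < np.abs(rival - theta)), 3))   # above .50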
5.1  Estimation of location parameters
Consider a random vector X (n-dimensional) from the data space (as in Kempthorne (1989)), E_n ⊂ ℝⁿ, with density function f(x; θ) for some θ ∈ Ω, the parameter space. In modern statistical inference much interest has been devoted to estimation problems of a real parameter θ based on a vector of observations X arising from a sample where the loss function ℒ(θ̂, θ) is invariant under a group G of transformations defined on E_n (i.e., for each g ∈ G, g : E_n → E_n). For example, suppose that loss is measured by way of a metric ρ(x), defined for x ≥ 0, which satisfies some of the global conditions set out in Chapter 2 (i.e., ρ(0) = ρ′(0) = 0, and ρ′(x) > 0 for all x > 0) so that a symmetric loss function is produced in accord with the suggestion of Efron (1978) as

Let G(Ω) and G(D) denote the associated groups of transformations defined on Ω and D, the class of all estimators of θ. For any g ∈ G, let ḡ and g̃ be the corresponding transformations defined on Ω and D, respectively. Note that ḡ : Ω → Ω and g̃ : D → D. To continue with the example started above with the loss function ℒ, consider the simple group of common translations of the observed x's,
This transformation arises naturally in the estimation of location parameters. This translation induces the associated transformations

The class D of estimators that satisfy g̃_c(·) in the preceding equation are known as location invariant estimators. For the symmetric loss function ℒ note that
This example illustrates what is meant by a loss function being invariant under a group, G(E_n), of transformations. In a more mathematical vein, a loss function is invariant under a group of transformations provided
and

In this context an estimator θ̂ is said to be invariant under the group of transformations G provided θ̂[g(x)] = ḡ[θ̂(x)] for all g ∈ G. Eaton (1983) is an excellent reference on the concept of invariance in statistical inference. The following theorem, which is due to Nayak (1990), states that PMC is constant on the orbits of the group Ḡ defined on Ω.

Theorem 5.1.1 If θ̂₁ and θ̂₂ are invariant estimators of the real parameter θ, then for all ḡ ∈ G(Ω)

Proof: We follow Nayak's proof for nonrandomized estimators. By Definition 1.0.1 and (5.2),
Moreover, since θ̂₁ and θ̂₂ are invariant estimators of θ, the previous equation reduces to
by equation (5.1) of invariance. The conclusion in Theorem 5.1.1 is a straightforward application of the definition of PMC. This result explains the common discovery that many comparisons resulted in PMC values that were independent of θ. Such observations (see Dyer and Keating (1981), (1983), and Keating and Gupta (1984)) were made with respect to estimation of scale parameters and percentiles in location and scale parameter families of distributions. This independence was also observed (see Khattree (1987)) in the multivariate setting in estimation of dispersion matrices. The theorem also extends the previous efforts (Keating (1983), (1985)) at explaining this general phenomenon in estimation of scale parameters and percentiles. A central concept in Nayak's construction of Pitman-closest estimators is his generalization of a result given in Pitman's original paper.
Lemma 5.1.2 (Pitman's Lemma I) Let X be a random variable with median M. Let ρ be a function whose range is a subset of the nonnegative real numbers such that ρ(x) is strictly increasing for x > 0 and strictly decreasing for x < 0. Then for all k ∈ ℝ, k ≠ M,

Proof: Let us assume that M < k. From the monotonicity conditions imposed upon ρ(·), whenever x < M then ρ(x − M) < ρ(x − k). Hence
Since M is a median of X, the conclusion of Pitman's lemma is an immediate consequence. A similar argument produces a parallel proof for the case when M > k. Pitman established the lemma for the special case in which ρ(x) = |x|. Nayak makes the observation that if M < k₁ < k₂ or if M > k₁ > k₂ then
This elegant observation consolidates many examples worked by researchers. However, it can be generalized along the lines suggested by Nayak. If in addition ρ(·) is continuous then, because of its monotonicity, there exists a unique value k₀ such that k₁ < k₀ < k₂ and
where k₀ is the midpoint of the line segment [k₁, k₂] under the metric ρ(·). This definition allows for asymmetric loss functions. Whenever the loss function is symmetric about zero, then k₀ = (k₁ + k₂)/2.

Lemma 5.1.3 (Midpoint Lemma I) Let X be a random variable with a median M. Let k₁ and k₂ be distinct real numbers such that k₁ < k₂. If k₀ > M then
Proof: If k₀ > M, then from the strict monotonicity of ρ it follows that
Hence it follows that
This result is not accidentally similar to those in Chapter 4. To see this perspective, consider X − k₁ and X − k₂ as ordered estimators of a population median M. From the results in Chapter 4 on ordered estimators, the determination of Pitman's measure depends only upon the switching point, the value of s that satisfies
The solution is given by s = k₀, where k₀ = (k₁ + k₂)/2, the Euclidean midpoint. Hence, preference between these estimators is dictated by the sign of M − k₀. The monotonicity of ρ is essential to this sequence of results. Without the monotonicity, the value of k₀ in the midpoint lemma is no longer unique. Using Pitman's Lemma, we present the following result about location invariant estimators of a location parameter θ. Let θ̂₁ = X − k₁ and θ̂₂ = X − k₂ be two distinct estimators of θ. For loss functions satisfying the conditions in Pitman's Lemma
from Theorem 5.1.1. This observation, together with Pitman's Lemma, produces the following lemma for the class D of translations of X, D = {θ̂ : θ̂ = X − k, where k ∈ ℝ}.
Lemma 5.1.4 (Nayak's Lemma) Let the loss function ℒ(θ̂, θ) = ρ(θ̂ − θ), where ρ satisfies the conditions in Pitman's Lemma. A Pitman-closest estimator of θ ∈ Ω is given by
where M₀ is the median of X when θ = 0. Pitman's and Nayak's Lemmas are not true if the strict monotonicity of ρ(·) is relaxed. The preceding lemmas fail, for example, in the case of a simple loss function ℒ_s(θ̂, θ) on a symmetric interval about θ defined by
Then it follows that the Pitman-closest estimator of θ employs the value of k which maximizes Pr[(k − c) < X < (k + c)].

Theorem 5.1.5 (Nayak's Location Theorem) Let X be a random vector having a joint density function with location parameter θ given by f(x₁ − θ, . . . , xₙ − θ), for θ ∈ Ω = ℝ¹. Let the loss function ℒ(θ̂, θ) = ρ(θ̂ − θ), where ρ(·) satisfies the conditions in Pitman's Lemma. Let Y_i = X_i − X_n, i = 1, . . . , n − 1. Then a Pitman-closest location invariant estimator (CLIE) of the location parameter θ is given by
where M₀(X_n|Y) is a median of the conditional distribution of X_n given Y when θ = 0.

Proof: The (n − 1)-dimensional vector Y is a maximal invariant, and a location invariant estimator must be of the form (see Eaton (1983))
Let θ̂ be an estimator of the form given in (5.10). Then by (5.1)
which, from (4.15), is E[I⁺(−N(X|θ))]. However, by the law of iterated expectation,
However, the term inside the braces has a minimum value of ½ by Nayak's Lemma. It thus follows that
Remark 5.1.6 θ̂_CLIE is a median unbiased estimator of θ.
However, the term inside the braces has a minimum value of ½, which produces the result that Pr[X_n − M₀(X_n|Y) ≤ 0] ≥ ½. A similar argument produces the complementary inequality.

Remark 5.1.7 θ̂_CLIE, whenever the first absolute moment exists, is the Pitman estimator of θ under the absolute or Laplacian loss function (see Kagan, Linnik, and Rao (1973, p. 226)). However, the proof of Nayak's Location Theorem does not require the existence of moments, and the solution provides an optimal estimator for distributions, such as the Cauchy, for which standard estimation procedures are often lacking. In addition, since the Pitman estimators have origins in Bayesian estimation, these dogmatically different procedures produce the same estimate of θ whenever we use the noninformative prior (see Jeffreys (1961)). This observation will be presented in §5.4 through the work of Ghosh and Sen (1991).

Remark 5.1.8 The Pitman estimator under absolute risk is Pitman-closer to θ than the Pitman estimator obtained under quadratic risk. They coincide whenever the median and mean of the Pitman empirical distribution function are equal. This occurrence will be discussed in §6.4. This result provides further evidence of the frequent occurrence of Rao's phenomenon (see Keating (1985)) in estimation.

Remark 5.1.9 If the structure of Nayak's Theorem is preserved as the data space is transformed so that X_n = T becomes a complete sufficient statistic, then

Moreover, the vector Y is ancillary to T, so that via Basu's (1955) Theorem, T and Y are stochastically independent, which reduces the CLIE to
Thus under sufficiency and invariance the Pitman-closest estimator becomes a function of the sufficient statistic alone.

Remark 5.1.10 The results of this subsection are directly related to the Kagan-Linnik-Rao (KLR) Theorem. A simple linear transformation of the data space can transform X_n into X̄, and as such the Pitman-closest estimator may be expressed as

where a is the ancillary vector with ith component a_i = X_i − X̄. The KLR Theorem gives necessary and sufficient conditions under which no improvement is provided by conditioning the sample mean on the ancillary vector. This result is similar to Rao's form (see Kagan, Linnik, and Rao (1973)) of the Pitman estimator. Rao uses the following linear combination of the X_i's:
We can see that Nayak's choice of X_n or Pitman's choice of X₁ could have been extended to any estimator satisfying Rao's form:

where e_i = X_i − R for i = 1, . . . , n. This general formulation of the Pitman estimator leads to many new characterization problems that parallel those given by Kagan, Linnik, and Rao. We will revisit Rao's form of the Pitman estimator of location in §5.5. Note that M₀ can be determined in an alternative way using the following equation:
This procedure can prove very useful when the conditional distribution of X_n|Y is complex. The result in (5.16) is the integral analogue of the Pitman estimator obtained under quadratic loss (see Kagan, Linnik, and Rao (1973, p. 225)). Nayak's integral result determines the Pitman estimator under absolute loss when it exists. Although the integral solution exists even in cases where the absolute risk does not, the resulting estimator previously lacked mathematical foundations and properties.
Example 5.1.11 Consider the exponential family of distributions with a truncation parameter μ and unit scale parameter, so that its density function is given by This problem is frequently of interest in estimation of the guarantee time in component or system reliability. Let T_1,...,T_n be a random sample of size n from this exponential distribution and let T_{i:n} be their ordered values. Let X_i = T_{n−i+1:n} for each i = 1,...,n so that X_n = T_{1:n}. Note that this is a permissible candidate for the random vector X under the conditions of Nayak's Location Theorem (NLT). Define Y_i according to the conditions in NLT: for each i = 1,...,n−1. Let D_{n−i+1} = T_{n−i+1:n} − T_{n−i:n}, which are the times between successive failures (or interarrival times). In the exponential distribution it is well known that X_n and the D_i's are independently distributed. Define Y_1 = D_2 and for i = 3,...,n
Hence, according to the NLT we must determine M_0(X_n|Y), which is the median of X_n given Y when μ = 0. Since Y can be written as a linear form of the D_i's according to (5.18), and, by virtue of the independence of X_n and the D_i's, X_n and Y are independent. Hence the determination of the median of X_n given Y when μ = 0 reduces to the determination of the median of X_n when μ = 0. It is well known that, when μ = 0, X_n = T_{1:n} has an exponential distribution with failure rate n.
Thus when μ = 0 the median of X_n is (ln 2)/n.
Hence the Pitman-closest invariant estimator of μ is given by μ̂ = T_{1:n} − (ln 2)/n.
This example illustrates the point that independence and sufficiency (see Remark 5.1.9) play key roles in applications of the NLT. Our choice of defining X_n = T_{1:n} and its sufficiency for estimation of μ accelerated the process of applying the NLT.
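A small numerical sketch of this example (ours, not from the text) follows. It assumes the closed form μ̂ = T_{1:n} − (ln 2)/n derived above and checks its median unbiasedness by simulation; the sample size, seed, and variable names are illustrative.

```python
# Hypothetical check of Example 5.1.11: under a unit-scale exponential model with
# guarantee (truncation) parameter mu, T_{1:n} - mu ~ Exp(rate n), so the estimator
# T_{1:n} - ln(2)/n should sit below mu about half of the time.
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 3.0, 10, 20_000

t_min = mu + rng.exponential(scale=1.0, size=(reps, n)).min(axis=1)
mu_hat = t_min - np.log(2.0) / n            # Pitman-closest invariant estimator

print("P(mu_hat <= mu) ~", np.mean(mu_hat <= mu))   # should be close to 0.5
```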
Given sufficiency and ancillarity, it is unnecessary to compute the conditional median, and a simpler result is given in the next theorem.
Example 5.1.12 The extreme-value distribution has intrigued researchers from both practical and theoretical perspectives. From a theoretical point of view there is no minimal sufficient statistic for estimation with dimension less than n, and from a practical perspective there is wide applicability in research problems, as exemplified by Gumbel (1954) and Weibull (1951). The density function of the one-parameter extreme-value distribution is given by
Let X_1,...,X_n be a random sample of size n from the distribution given in (5.23) and define Y_i according to the change of variable given in NLT. Due to the messy nature of the conditional distribution of X_n|Y, we will use (5.16) to calculate M_0(X_n|Y). To use this process we need to simplify the expression in the integrand as follows:
Define u_i = e^{y_i} for i = 1,...,n − 1 and make the change of variable t = e^x. These changes simplify (5.24) into
Let a = (u_1 + ⋯ + u_{n−1} + 1). Substitution of this simplified version into (5.17) produces
One can see that the integrands can be transformed into those of χ² distributions with 2n degrees of freedom. This transformation produces
Consequently, e^{M_0}(2a) = m_{2n}. Thus, it follows that
Thus by the NLT the Pitman-closest invariant estimator of θ is given by θ̂ = X_n − M_0(X_n|Y) = ln(2 ∑_{i=1}^n e^{X_i} / m_{2n}).
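The following sketch (ours, not from the text) checks this closed form numerically. It assumes the parametrization f(x;θ) = exp((x − θ) − e^{x−θ}) for the density in (5.23), so that e^{X−θ} is a unit exponential variate and 2∑e^{X_i−θ} is a χ² variate with 2n degrees of freedom; all names and constants are illustrative.

```python
# Hypothetical check of Example 5.1.12: theta_hat = ln(2*sum(exp(X_i))/m_{2n})
# should be median unbiased under the assumed extreme-value parametrization.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
theta, n, reps = 1.5, 8, 20_000
m2n = chi2.ppf(0.5, df=2 * n)               # median of a chi-square with 2n d.f.

x = theta + np.log(rng.exponential(1.0, size=(reps, n)))   # exp(X - theta) ~ Exp(1)
theta_hat = np.log(2.0 * np.exp(x).sum(axis=1) / m2n)

print("P(theta_hat <= theta) ~", np.mean(theta_hat <= theta))   # should be near 0.5
```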
With later results we will be able to verify this solution. The next theorem enables us to obtain this Pitman-closest estimator in an easier way. Example 5.1.11, which was presented by Ghosh and Sen (1989) under a separate development, illustrates that if X_n via transformation is a complete sufficient statistic for estimation of θ and Y becomes ancillary, then by Basu's Theorem, X_n and Y are stochastically independent. The subsequent Pitman-closest estimator was simple to find through Nayak's Theorem. The fact that the Pitman-closest estimator from Nayak's Theorem is a median unbiased estimator leads to a more general result which provides for situations in which the median unbiased estimator is Pitman-closest within a larger class C, which is easier to characterize.
Theorem 5.1.13 (Ghosh-Sen Theorem I) Let θ̂ be a median unbiased estimator of θ and consider the class C of all statistics of the form V = θ̂ + Z, where θ̂ and Z are stochastically independent. Then for all θ ∈ Ω and V ∈ C
Proof: Using Theorem 2.2.2, we have that
Since θ̂ and Z are assumed to be independent, then
The latter inequality is a consequence of the assumption that θ̂ is a median unbiased estimator of θ. The conclusion of the theorem follows from the observation that Under the loss function given in Pitman's Lemma, Ghosh and Sen (1989) show that every equivariant estimator is PC inadmissible relative to θ_CLIE. Unlike Nayak's Theorem, the Ghosh-Sen Theorem is not confined to estimation problems that admit an equivariant structure that is common for location and scale parameter families of distributions. Since Y is ancillary, θ̂ is median unbiased, and θ̂ and Z are independent, M(θ̂|Z) = M(θ̂) = θ. Thus, we can avoid the conditional distribution setup by simply referring to Pitman's Lemma. Thus there are estimation problems in which the two theorems can be coupled to produce some general results.
Example 5.1.14 Let X_1,...,X_n be independent and identically distributed normal random variables with mean θ and unit variance (i.e., X_i ~ N(θ,1) for each i = 1,...,n). It is well known that θ̂ = X̄_n ~ N(θ, 1/n) and as such is a median unbiased estimator of θ. θ̂ is distributed independently of the ancillary vector W, with ith component X_i − θ̂. By Basu's (1955) Theorem θ̂ and W are independently distributed so that θ̂ is the Pitman-closest equivariant estimator of θ. This latter observation is again due to the expression in (5.11). As shown by Ghosh and Sen, the independence of θ̂ and Z is not crucial for the proof of their theorem. The independence condition can be replaced by the less restrictive condition that the conditional distribution of θ̂ given Z is symmetric about θ.
Example 5.1.15 Reconsider the problem of estimation of the median in the Cauchy distribution based on a sample of size 2. In Example 2.3.2, we showed that θ̂ = (X_1 + X_2)/2 was Pitman-closer to the Cauchy median θ than X_1 despite the fact that they have identical marginal distributions. Since θ̂ ~ Cauchy(θ,1), θ̂ is a median unbiased estimator of θ. In fact, any convex linear combination of the two random variables will also be median unbiased. Note that θ̂ seems to have few, if any, other desirable properties. Define Z = (X_1 − X_2)/2 ~ Cauchy(0,1). Then X_1 = θ̂ + Z. It should be noted that θ̂ is the MLE of θ if and only if |Z| < 1. The joint distribution of T = θ̂ − θ and Z can be written as
Thus it follows that the conditional distribution of T|Z can be written as
which is symmetric about the origin. It follows that the conditional distribution of θ̂ given Z is symmetric about θ. Therefore, by the remark following Example 5.1.14, θ̂ is the median unbiased estimator within C that is Pitman-closest. This example illustrates that independence is not necessary but rather a sufficient condition. McCullagh (1992) recently considers conditional inference in the Cauchy distribution with specific attention to the role of ancillarity. Another example, which arises from the two-parameter uniform distributions, is due to Pitman (1937) and illustrated by Ghosh and Sen (1989). Let X_1, X_2,...,X_n be independent and identically distributed random variables having a uniform distribution on the interval [θ − δ/2, θ + δ/2]. It is well known that (X_{1:n}, X_{n:n}) is jointly sufficient for (θ, δ). The midrange,
is a median unbiased estimator of θ. Define the sample range as
The conditional distribution of θ̂ given Z is symmetric about θ and therefore the midrange is the Pitman-closest estimator within the class C of estimators of the form θ̂ + aZ (i.e., the linear combinations of X_{1:n} and X_{n:n}). The symmetry of the conditional distribution of θ̂ given Z is not needed in this example. It suffices to show that θ̂ is also conditionally median unbiased for θ given Z. This property, known as uniform conditional median unbiasedness (UCMU) (see Sen and Saleh (1992)), is the weakest one in this setup. Some examples of conditional median unbiased estimators will be given in the last section of this chapter.
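A short simulation sketch of the midrange example (ours, not from the text) follows. It compares the midrange against one other member of the class C, with an arbitrary illustrative coefficient a; seed and sample sizes are likewise illustrative.

```python
# Hypothetical simulation: for U(theta - delta/2, theta + delta/2) data, the midrange
# should be Pitman-closer than midrange + a*Z, where Z is the sample range.
import numpy as np

rng = np.random.default_rng(3)
theta, delta, n, reps, a = 0.0, 2.0, 5, 20_000, 0.25

x = rng.uniform(theta - delta / 2, theta + delta / 2, size=(reps, n))
lo, hi = x.min(axis=1), x.max(axis=1)
midrange = (lo + hi) / 2                    # Pitman-closest member of the class C
competitor = midrange + a * (hi - lo)       # another member of the class C

pmc = np.mean(np.abs(midrange - theta) <= np.abs(competitor - theta))
print("PMC(midrange vs. competitor) ~", pmc)    # should be at least 0.5
```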
5.2
Estimators of scale
Consider the argument given in §5.1 and assume that X ~ (1/θ)f(x/θ), where x ∈ ℝ⁺ and θ ∈ Ω = ℝ⁺. The corresponding loss function depends upon θ̂ and θ only through the ratio θ̂/θ (i.e., L(θ̂,θ) = ρ(θ̂/θ)), such that ρ(1) = 0, ρ(x) is strictly increasing for x > 1 and strictly decreasing for x < 1, and ρ(x) is typically asymmetric about x = 1.
Lemma 5.2.1 (Pitman's Lemma II) Let X be a positive random variable with median M. Let ρ be a function whose range is a subset of the nonnegative real numbers and for which ρ(x) is strictly increasing for x > 1 and strictly decreasing for x < 1. Then
for all k ∈ ℝ⁺, the positive reals, such that k ≠ M. The proof can be replicated in a direct way from Pitman's Lemma I by letting Y = ln(X) and β = ln(θ). Note that Y and β satisfy the conditions in Pitman's Lemma I. Also, one should observe that if M is a median of X then ln(M) is a median of Y. If M < k_1 < k_2 or if M > k_1 > k_2 then
If ρ(·) is continuous then, because of monotonicity, there exists a unique value k_0 such that k_1 < k_0 < k_2 and
Thus, k_0 is the midpoint of the line segment [k_1, k_2] under the distance function ρ(·). These are asymmetric loss functions. Whenever the loss function is symmetric about 0 then ln(k_0) = [ln(k_1) + ln(k_2)]/2, or k_0 = √(k_1 k_2), the geometric mean of k_1 and k_2.
Lemma 5.2.2 (Midpoint Lemma II) Let X be a positive random variable with a median M. Let k_1 and k_2 be distinct real numbers such that k_1 < k_2. If k_0 ≤ M then
The proof follows directly from the logarithmic change of variable.
Lemma 5.2.3 (Nayak's Lemma II) Let the loss function L(θ̂,θ) = ρ(θ̂/θ), where ρ satisfies the conditions given in Pitman's Lemma II. Then a Pitman-closest scale invariant estimator of a scale parameter θ in D is given by
The proof of this lemma parallels the one given in Nayak's Lemma I under the same transformation given after Pitman's Lemma II. However, evaluation of ln(θ) = 0 implies that θ = 1. Let θ̂_1 = X/k_1 and θ̂_2 = X/k_2 be two distinct scale invariant estimators of the scale parameter θ. Thus it follows that for loss functions satisfying the conditions in Pitman's Lemma II,
using Theorem 5.1.1. This observation, together with Pitman's Lemma, produces the following theorem for the class D of dilations of X, D = {θ̂ : θ̂ = X/k, where k ∈ ℝ⁺}.
Theorem 5.2.4 (Scale Theorem) Let X be a random vector having density function with scale parameter θ given by (1/θⁿ)f(x_1/θ,...,x_n/θ), for θ ∈ Ω = ℝ⁺. Let the loss function L(θ̂,θ) = ρ(θ̂/θ), where ρ(·) satisfies the conditions in Pitman's Lemma II. Let Y_i = X_i/X_n, i = 1,...,n−1. Then a Pitman-closest scale invariant estimator of the scale parameter θ is given by
where M_1(X_n|Y) is a median of the conditional distribution of X_n given Y when θ = 1.
Example 5.2.5 Consider a random sample U_1, U_2,...,U_n of size n from a uniform distribution on the interval (0,θ). Consequently f(u;θ) = I_{(0,θ)}(u)/θ. Define X_i = U_{i:n} for each i = 1,...,n (i.e., the ith order statistic from the random sample). This produces a random vector X that satisfies the hypothesis of the Scale Theorem. Define the (n−1)-dimensional maximal invariant Y with ith component given by Y_i = X_i/X_n = U_{i:n}/U_{n:n} for each i = 1,...,n−1. It is also well known that Y is distributed independently of the sample maximum X_n and that the joint distribution of the components of Y is that of the order statistics taken from a random sample of size n−1 from a uniform distribution on the interval (0,1). Earlier we noted that The median of this beta random variable can be obtained as M_1(X_n) = 2^{−1/n}. From Theorem 5.2.4, we have that θ̂_1 = 2^{1/n} X_n = 2^{1/n} U_{n:n}, which agrees with the result given previously.
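The following minimal sketch (ours, not from the text) checks this estimator numerically, comparing it with the MLE as another dilation of the sample maximum; seed, sample size, and parameter values are illustrative.

```python
# Hypothetical check of Example 5.2.5: for a U(0, theta) sample,
# theta_hat = 2^(1/n) * U_{n:n} is median unbiased and Pitman-closer than the MLE.
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 5.0, 6, 20_000

u_max = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
theta_hat = 2.0 ** (1.0 / n) * u_max        # Pitman-closest scale invariant estimator
mle = u_max                                 # another dilation of X_n (k = 1)

print("P(theta_hat <= theta) ~", np.mean(theta_hat <= theta))          # ~ 0.5
print("PMC(theta_hat vs. MLE) ~",
      np.mean(np.abs(theta_hat - theta) <= np.abs(mle - theta)))       # >= 0.5
```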
Again, M_1 can be solved in an alternative way, whenever the conditional distribution of X_n|Y is complex, by using the equivariance of M_1 under an arbitrary monotone transformation of the random variables. If we consider the following equation:
then a logarithmic transformation of the variables can yield a simpler result, which we illustrate in the following example.
Example 5.2.6 Let X_1,...,X_n be independent and identically distributed normal random variables with zero means and common standard deviation σ (i.e., X_i ~ N(0,σ²)). We shall use (5.38) to solve for M_1.
Let a² = (y_1² + ⋯ + y_{n−1}² + 1) and v = (ax)². Substitution of this expression into (5.39) produces
It follows that
Thus it follows from the Scale Theorem that σ̂_1 = (∑_{i=1}^n X_i² / m_n)^{1/2}, where m_n denotes the median of the χ² distribution with n degrees of freedom.
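A brief numerical check of this estimator (ours, not from the text) is given below, assuming the closed form just stated; the constants are illustrative.

```python
# Hypothetical check of Example 5.2.6: with X_i ~ N(0, sigma^2),
# sum(X_i^2)/sigma^2 ~ chi-square(n), so sqrt(sum(X_i^2)/m_n) is median unbiased.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
sigma, n, reps = 2.0, 7, 20_000
m_n = chi2.ppf(0.5, df=n)                   # median of the chi-square(n) distribution

ss = (rng.normal(0.0, sigma, size=(reps, n)) ** 2).sum(axis=1)
sigma_hat = np.sqrt(ss / m_n)               # Pitman-closest equivariant estimator

print("P(sigma_hat <= sigma) ~", np.mean(sigma_hat <= sigma))   # should be near 0.5
```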
As noted by Ghosh and Sen (1989), this result was given by Keating and Gupta (1984) under much narrower constraints on the class D of estimators of σ. However, the result given here broadens the class over which this median unbiased estimator is Pitman-closest to include all equivariant estimators. A further simplification of this procedure is given in the following theorem.
Theorem 5.2.7 (Ghosh-Sen Theorem II) Let D be the class of all statistics of the form V = θ̂(1 + Z), where θ̂ is a median unbiased estimator of the scale parameter θ. If θ̂ and Z are nonnegative and independently distributed then for all θ ∈ Ω and V ∈ D, The proof follows in a straightforward way from that given for the Ghosh-Sen Theorem I.
5.3
Generalization via topological groups
The results for location and scale parameter distributions can be combined under one general development. The individual cases for location and scale parameters were given separately because of their wide applicability and to allow the reader to use some of these essential developments without having to digest all the mathematics presented here. The extension of Nayak's work to a general group structure was first published by Kubokawa (1991). Although his presentation is very brief, we present a full development along the lines given in §§5.1 and 5.2. The key to this generalization is embedded in the recognition that for location parameters, the parameter space Ω = ℝ is an abelian group under addition with identity element 0, and for scale parameters, the parameter space Ω = ℝ⁺ is an abelian group under multiplication with identity element 1. Then two questions arise: (1) Does the Pitman-closest invariant estimator simply depend upon the median M_e of the conditional distribution X_n|Y when θ = e, the identity element in the group associated with the parameter space? (2) Can the Pitman-closest estimator be determined by performing the binary operation of the group on X_n and the inverse of M_e? If the data space E and the parameter space Ω are groups under the common operation ∘, we say that (E, Ω, J) form a topological group whenever the function h : E × Ω → Ω defined by the projection
is continuous. By this we mean that h(·) is continuous provided the inverse image of every open subset of Ω with respect to the topology defined by J is open in the product topology defined on E × Ω. For the special case of surjective estimators discussed in Chapter 4, E = Ω, and this condition produces the foundation for the structure of topological groups. Consider the loss functions of the form defined by
such that ρ(e) = 0, where e is the common identity of the groups (E, ∘) and (Ω, ∘), and ρ(x) is a continuous strictly increasing function for x > e and strictly decreasing for x < e. Herein we assume that h(x,ω) is an increasing function of x and a decreasing function of ω. Let J be the topology of the real numbers restricted to Ω. Such loss functions will be denoted as continuous topological group (TG) loss functions since they consist of the continuous compositions with the function h.
Lemma 5.3.1 (Pitman's Lemma III) Let X be a random variable with a median M. Let ρ be a continuous TG loss function. Then
for all k ∈ E, the data space, such that k ≠ M. Proof: Suppose that M < k so that X∘M⁻¹ > X∘k⁻¹ by the assumption that h(X,ω) is a decreasing function of ω. If X < M, then e > X∘M⁻¹ and, since ρ(z) is a decreasing function for z < e
Hence by the assumptions on the continuous TG loss functions,
Since M is the median of X the result follows immediately. Suppose that k_1 < k_2; then h(x,k_1) > h(x,k_2) for all x. From the conditions on ρ it follows that:
and
Define N(x; k_1, k_2) = ρ(x∘k_1⁻¹) − ρ(x∘k_2⁻¹). For x ∈ [k_1, k_2], ρ(x∘k_1⁻¹) is a continuous increasing function of x and ρ(x∘k_2⁻¹) is a continuous decreasing function of x, which makes N(x) an increasing continuous function on [k_1, k_2]. Since N(k_1) < 0 and N(k_2) > 0, then by the intermediate-value theorem there exists a unique value k_0 such that N(k_0) = 0, where k_1 < k_0 < k_2 or, equivalently, This value k_0 is the midpoint of the line segment [k_1, k_2] under the distance function ρ[h(·)].
Lemma 5.3.2 (Midpoint Lemma III) Let X be a random variable with a median M. Let k_1, k_2 ∈ E such that k_1 < k_2. If k_0 ≤ M then
Proof: If k_0 > M, then from the strict monotonicity of ρ it follows that
Hence it follows that
For the class D of transformations of X of the form {θ̂ : θ̂ = X∘k⁻¹}, where k ∈ E and X satisfies the axioms of Lemma 5.3.2, an optimal estimator can be constructed directly from Lemmas 5.3.1 and 5.3.2.
Lemma 5.3.3 (Nayak's Lemma III) Let the loss function L(θ̂,θ) = ρ(θ̂∘θ⁻¹), where ρ is a continuous TG loss function. Then a Pitman-closest estimator of θ in D is given by
where M_e is a median of X when θ equals e, the group identity.
Proof: Let θ̂_1 = X∘M_e⁻¹ and θ̂_2 = X∘k⁻¹ be two distinct estimators of θ. For continuous TG loss functions:
However, M_e∘θ = M, the median of X, so that by Pitman's Lemma the conclusion of Nayak's Theorem follows.
Remark 5.3.4 If the distribution function of X is denoted by F(x∘θ⁻¹), where F(·) is a parameter-free distribution function, then it follows that M∘θ⁻¹ = M_e.
Theorem 5.3.5 (Kubokawa-Nayak Theorem) Let X be a random vector of dimension n from a distribution that is invariant under the componentwise transformation Z_i = X_i∘ω⁻¹ (i.e., f(z) has a parameter-free distribution). Let the loss function L(θ̂,θ) = ρ(θ̂∘θ⁻¹), where ρ(·) is a continuous TG loss function. Let Y_i = X_i∘X_n⁻¹, i = 1,...,n−1. Then a Pitman-closest invariant estimator of θ is given by
where M_e is a median of the conditional distribution of X_n given Y when θ = e. Proof: The (n−1)-dimensional vector Y is a maximal invariant and a group invariant estimator must be of the form (see Eaton (1983))
Let θ̂ be an estimator of the form given in (5.52). Then by (5.1)
which, from (4.15), is E[I⁺(−N(X|e))]. But by the law of iterated expectation,
However, since Y is fixed, the term inside the braces has a minimum value of ½ by Nayak's Lemma. It thus follows that
Remark 5.3.6 Computation of M_e(X_n|Y) can be mathematically complicated. A simplified approach would result by finding a median unbiased (MU) estimator θ̂ of θ and an ancillary vector Y, so that θ̂ and Y are stochastically independent, and coupling the proof of Theorem 5.1.13 with Lemma 5.3.2 to obtain the following result.
Theorem 5.3.7 Let θ̂ be a median unbiased estimator of θ and let C be the class of estimators of the form θ̂∘Z for which θ̂ and Z are stochastically independent. Then θ̂ is the Pitman-closest group invariant estimator of θ within the class C.
For illustrative examples of Theorem 5.3.7, we refer the reader to Example 5.1.12 about the extreme-value distribution. The transformation u(x) = e^x satisfies the needed regularity conditions and illustrates the invariance of median unbiased estimators under monotone transformations of the observations. Another example, arising from bioassay problems, is discussed by Sen (1963), who uses the group of strictly monotone functions for which the vector of ranks composes the maximal invariants. The consequent rank estimators are not only most appropriate in the conventional sense but are also median unbiased as well.
5.4
Posterior Pitman closeness
In Chapter 2, we introduced the convenience store example as it applied to Pitman's concept of closeness. We then partitioned the plane of Pitmanville into sectors in which the residents lived closest to a given convenience store. At the end of that example, we explained that the concept had a Bayesian interpretation if we considered the location of the stores as fixed points, which is the case of estimators in the posterior sense. The posterior distribution becomes the population density over the plane of Pitmanville. Again, in Chapter 3, we introduced the example of the distribution of the electorate on a given topic as ordered from liberal to conservative. The candidates were seen as having fixed positions on the issue. This example also has interpretations in the Bayesian sense, where the positions of
the candidates are considered as fixed points given the parameters in the prior distribution. These Bayesian examples are useful illustrations of the methodology developed by Ghosh and Sen (1991). Their contribution can be summarized succinctly in three results. They (i) coherently formulate Pitman's measure within a Bayesian context; (ii) prove that the subsequent formulation produces a relation which is transitive and thus an equivalence relation; and (iii) prove that the median of the posterior distribution produces the optimal estimator under this new formulation. In this section, we present the development of this Bayesian aspect of Pitman's measure, known as posterior Pitman closeness. Later we connect these results with the classical results presented in §§5.1 and 5.2. In the following discussion, we use δ(x) to denote a Bayes decision rule and reserve the notation θ̂(x) strictly for the classical setting.
Definition 5.4.1 Let Π(θ) be a prior distribution defined on θ. Let x be a vector of observations from the data space E. Then the posterior Pitman closeness (PPC) of a Bayes estimator δ_1(x) to another δ_2(x) in estimation of θ under the prior Π(θ) is given by
As in the classical setting, the definition of posterior Pitman closeness can be refined to define an estimator which is optimal in the sense of PPC.
Definition 5.4.2 Under the conditions of Definition 5.4.1, the estimator δ_1 is said to be posterior-Pitman-closer to θ than δ_2 under the prior distribution Π(θ) provided for all x ∈ E, with strict inequality for some x ∈ E. If the posterior distribution of θ given x is continuous, then the inequality within the parentheses in (5.56) can be replaced by a strict inequality. The classical interpretation, formulated by frequentists, of Pitman's measure is conditioned on θ and produces a fundamentally different interpretation of PMC than that of PPC.
Lemma 5.4.3 (PPC Determination Lemma) Let Π(θ) be a prior distribution defined on the real-valued parameter θ such that the subsequent posterior
distribution is continuous with an interval domain. Let δ_1(x) and δ_2(x) be nonrandomized estimators of θ such that δ_i(x) is an element of the domain of Π(·) for each x ∈ E and i = 1,2. If δ_1(x) < δ_2(x),
Proof: Since the estimators are fixed given X = x, then |δ_1(x) − θ| < |δ_2(x) − θ| whenever θ < (δ_1 + δ_2)/2 (i.e., whenever θ is less than the midpoint of the interval determined by δ_1(x) and δ_2(x)). Since the PPC is calculated under the posterior distribution of θ|x, the result is straightforward. This result is very similar to that given in Chapter 4, when applying Karlin's Corollary to ordered estimators in the classical sense. The simple procedure for the determination of PPC allows us to state sufficient conditions for the preference based on PPC.
Lemma 5.4.4 (PPC Preference Lemma) Under the conditions of the PPC Determination Lemma, a Bayes rule δ_1(x) is posterior Pitman-closer to θ than another δ_2(x) provided
where M(θ|x) is a posterior median of θ under the prior Π(θ). Proof: From (5.57), then
This result parallels that given in Chapter 4 for ordered estimators in the classical sense. It is also quite similar to the preceding midpoint lemmas given in §§5.1, 5.2, and 5.3. Its importance lies in the simplicity with which one can establish that posterior Pitman closeness is a transitive relation.
Theorem 5.4.5 (Posterior Transitiveness Theorem) Let Π(θ) be a prior distribution defined on the real-valued parameter θ such that the subsequent posterior distribution is continuous with an interval domain. Let δ_1(x), δ_2(x), and δ_3(x) be nonrandomized estimators of θ. If δ_1 is posterior Pitman-closer than δ_2, and δ_2 is posterior Pitman-closer than δ_3, then δ_1 is posterior Pitman-closer than δ_3.
Proof: Ghosh and Sen (1991) give a very detailed proof of this theorem. Since the estimators δ_1(x), δ_2(x), and δ_3(x) are elements of the domain of Π(θ) for each x ∈ E (i.e., Bayes estimators are surjective), then for X = x the estimators are ordered. The detailed proof follows along the lines of the arguments given in the Transitiveness Theorem 4.6.1. The last theorem in this section provides a posterior Pitman-closest estimator as a posterior median.
Theorem 5.4.6 (Posterior Median Theorem) Let M(θ|x) be a posterior median of θ|x under the prior Π(θ). Then,
for all x ∈ E and for every estimator δ(x) of θ. Proof:
Since M(θ|x) is a median of the posterior distribution of θ|x, then zero becomes a conditional median of [θ − M(θ|x)][M(θ|x) − δ(x)] given x. Since the right-hand side of the previous equation is negative, then the subsequent probability must be at least one-half. The following sequence of remarks has been excerpted from Ghosh and Sen (1991).
Remark 5.4.7 If the posterior distribution of θ|x is continuous under the prior Π(θ), M(θ|x) is unique and as such becomes the unique posterior Pitman-closest estimator of θ. The continuity of the posterior distribution is preserved if the prior distribution Π(θ) is continuous but X has a discrete distribution.
Example 5.4.8 Suppose that X|π ~ Binomial(n, π), whereas π has a uniform prior on the interval [0,1]. Then it follows that the posterior distribution of π|X is Beta(1 + x, n + 1 − x). In this problem, the posterior Pitman-closest estimator would be given by M(π|x). By using the well-known relationship between the Beta and F-distributions, the unique posterior median is given by
where F_{.50}(n, m) is the unique median of an F-distribution with n and m degrees of freedom, respectively. The range-preserving nature of this Bayes rule can be seen in that all the values of x produce an estimate in the interval (0,1). In the classical sense this estimator is also a median unbiased estimator (MUE) of π. This particular observation is especially important in the context of logistic regression, since occurrence of x on the boundary of the data space {0, n} results in divergence of the procedures of both maximum likelihood and minimum chi-square estimation (see Berkson (1980), Fountain, Keating, and Rao (1991), and Rao (1980)).
Remark 5.4.9 The prior distribution used in Example 5.4.8 is a noninformative prior or a prior of ignorance. The generalizations of such noninformative priors to parameters in which the parameter space (or domain of support) is not bounded are known as improper prior distributions. As previously mentioned in §5.1, the Pitman estimator of location for n random variables, X_1,...,X_n, with a joint density function of the form f(x_1 − θ,...,x_n − θ) is also a generalized Bayes rule under absolute loss and Jeffreys' noninformative (or improper) prior (see Strasser (1985)). In this narrow sense, the classical and Bayesian procedures give rise to the same estimator.
Remark 5.4.10 Ghosh and Sen (1991) give an argument for a numerical integration comparable to (5.17). For example, in the Laplace distribution the ordered observations from a random sample of size n constitute a minimal sufficient statistic for estimation of the median. For any prior distribution on θ, as long as the subsequent posterior is proper, the posterior median of θ|x can be obtained through numerical integration. Their observation generalizes that of Nayak given in (5.17).
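The posterior median of Example 5.4.8 is also easy to compute directly from the beta quantile function, which avoids the Beta/F conversion quoted there. The short sketch below (ours, not from the text) does this; the sample size and the values of x are illustrative.

```python
# Hypothetical sketch of Example 5.4.8: the posterior Pitman-closest estimate of pi
# is the median of the Beta(1 + x, n + 1 - x) posterior under a uniform prior.
import numpy as np
from scipy.stats import beta

n = 10
for x in (0, 3, 10):                 # note the range-preserving behavior at x = 0 and x = n
    post_median = beta.ppf(0.5, 1 + x, n + 1 - x)
    print(f"x = {x:2d}   posterior median = {post_median:.4f}")
```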
5.5
Linear combinations
In the preceding theory of this chapter, no assumptions of independence were imposed upon the vector of observations X. It is well known that the vector X* of order statistics from a random sample of size n forms a complete sufficient statistic for estimation of θ. In some cases, such as the extreme-value distribution or the Cauchy distributions, no reduction in dimension will produce a minimal sufficient statistic. This complication has forced researchers to produce alternative procedures, at least for the Laplace and extreme-value distributions. The procedures given in §§5.1 and 5.2 provide alternative estimators for these distributions.
From the location invariant and scale invariant estimators discussed extensively in §§5.1 and 5.2, one is logically led to consideration of linear combinations of the order statistics. Many of the popular estimators of location or scale parameters are linear combinations of the order statistics. The best linear (mean) unbiased estimators (BLUEs) of location and scale parameters which Lloyd (1952) developed using methods of least squares are linear combinations of the order statistics. Mann (1969) developed best linear invariant estimators (BLIEs) via a Gram-Schmidt orthogonalization of the BLUEs. In this section, we consider the class D of linear combinations of the order statistics, i.e.,
Two estimators θ̂_1 and θ̂_2 in D are functions of a common statistic T if and only if a_1 and a_2 are linearly dependent. If these vectors are linearly dependent then the estimators θ̂_1 and θ̂_2 become scalar multiples of one another (see Keating (1985) for a discussion of results in this regard for scale-invariant estimators of scale parameters). Many of the results in this section were motivated by the results of Mason et al. (1990). They establish a theoretical framework for the comparison of linear forms of a random vector X. Because of the importance of estimation of location and scale parameters based on linear combinations of order statistics, the application of their work to this reduced dimension is certainly appropriate. The substantive results of this section were developed independently by Fountain (1991) and Peddada and Khattree (1991). To remain consistent with the prior presentation, we adopt the presentation of Fountain with its development in linear algebra.
Lemma 5.5.1 (Fountain's Lemma) Let a and b be linearly independent vectors in ℝⁿ and define the n×n matrix C = aa′ − bb′. Then the following three properties are true: 1) the vectors a and b form a basis for the column space of C; 2) the eigenvalues of C are given by λ_1 and λ_2, where
3) the corresponding eigenvectors, v_1 and v_2, are given by
Proof: Let the projection matrix P be defined by P = bb'/(b'b) and the vector v
From the definition of v it follows that Cv = a and thus a is an element of the column space of C. Using the same technique, one can show that b is an element of the column space of C. To show that {a, b} spans the column space of C, let z be an arbitrary element of the column space of C. Then there exists a vector w such that z = Cw and
so that z ∈ SPAN{a, b}. Since the vectors a and b are linearly independent by hypothesis, then {a, b} is a basis of the column space of C. If v is an eigenvector of C then it must lie in the column space of C. Without loss of generality, one can assume that v is of the form v = a + δb. Now by definition, Cv = λv, which, upon substitution, implies that
The second half of this definition produces
Since {a, b} is a basis, then one can merely equate the coefficients of a and b in the last two expressions. The subsequent system of two equations can be reduced to a single equation in one unknown by eliminating δ. This elimination produces the secondary equation δ = (λ − a′a)/(a′b). The resulting characteristic second-degree polynomial is given by
The two solutions given in (5.61) and (5.62) can be obtained by completing the square on the quadratic equation in (5.65). The subsequent form of the eigenvectors follows from the secondary equation.
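The structure asserted by Fountain's Lemma is easy to check numerically. The sketch below (ours, not from the text) uses randomly drawn vectors and compares the sum and product of the two nonzero eigenvalues with the values implied by the characteristic polynomial; all names and the random seed are illustrative.

```python
# Hypothetical check of Fountain's Lemma: C = aa' - bb' has exactly two nonzero
# eigenvalues, with sum a'a - b'b and product (a'b)^2 - (a'a)(b'b).
import numpy as np

rng = np.random.default_rng(7)
n = 6
a, b = rng.normal(size=n), rng.normal(size=n)
C = np.outer(a, a) - np.outer(b, b)

eig = np.linalg.eigvalsh(C)                     # C is symmetric
nonzero = eig[np.abs(eig) > 1e-10]
print("nonzero eigenvalues:", nonzero)          # one positive, one negative
print("sum  check:", nonzero.sum(), "vs", a @ a - b @ b)
print("prod check:", nonzero.prod(), "vs", (a @ b) ** 2 - (a @ a) * (b @ b))
```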
Remark 5.5.2 From the characteristic polynomial the product of the eigenvalues is λ_1λ_2 = (a′b)² − (a′a)(b′b). The Cauchy-Schwarz inequality implies that the product is negative and thus λ_1 > 0 and λ_2 < 0. Also the sum of the eigenvalues is given by λ_1 + λ_2 = a′a − b′b.
This latter result will prove useful in applications. Fountain's Lemma leads one to the following canonical form for the comparison of two linear estimators of a parameter θ based on PMC. It will be convenient in the following formulation to normalize the eigenvectors by e_i = v_i/‖v_i‖ for i = 1,2.
Theorem 5.5.3 (Canonical Form) Let X be a random vector of dimension n and let θ̂_1 = a′X and θ̂_2 = b′X be linear estimators of the unknown parameter θ. If a and b are linearly independent vectors in ℝⁿ then
where Y_i = e_i′(X − θz), for i = 1,2, κ = −λ_2/λ_1, the n×n matrices P and Q are given by P = bb′/‖b‖² and Q = aa′/‖a‖², and
Proof:
The reader may want to refer to the proof of Fountain's Lemma, specifically to (5.62), for the construction of z. From the Spectral Decomposition Theorem, C can be rewritten as
Substitution of this form into the previous inequality results in
where W_i = e_i′X for i = 1,2. This inequality can be reduced by completing the square on both W_1 and W_2, which simplifies this inequality to
At first inspection the above inequality appears to be that of the interior of a hyperbola centered at (θz′e_1, θz′e_2) in the (W_1, W_2) plane. However, from the manner in which z is constructed via orthogonal projections, z is orthogonal to Cz, where
Hence it follows that (a′z − 1)a − (b′z − 1)b = 0. Due to the unique representation of a basis, a′z = b′z = 1. Consequently, z′Cz = z′(a − b) = 0. We conclude this section with some pertinent remarks on Theorem 5.5.3. The suggested canonical form has practical utility if the joint distribution of (Y_1, Y_2) is tractable (say, bivariate normal). However, if the elements of X are independently distributed as logistic, Cauchy, or Laplace variates, the joint distribution of (Y_1, Y_2) is not of any simple form for a fixed n. For large n, the joint normality of (Y_1, Y_2) may be obtained by the classical Central Limit Theorem except for the case of the Cauchy distributions. The consequent value of PMC in (5.68) can be explicitly obtained by using Sheppard's (1898) Theorem. We clarify this point in Chapter 6 by dealing with the asymptotic case in a far more general setup. If we assume that (a′X, b′X) has a jointly bivariate normal law (which would immediately imply that (Y_1, Y_2) also has a bivariate normal law), a more simplified version of (5.68) follows. We denote the covariance matrix of (a′X, b′X) by Σ = ((σ_ij))_{i,j=1,2}, and let (U_1, U_2) be a bivariate normal random vector with null mean vector and covariance matrix Σ. If we assume that Y_1 and Y_2 are asymptotically unbiased, the left-hand side of (5.66) is
where
This is based on Sheppard's (1898) Theorem in §4.2. Thus, we arrive at the same conclusion that follows (4.9). We not only have a simplified version of P(Y_1, Y_2), but also its value is greater than ½ whenever σ_22 is larger than σ_11. For normally distributed linear estimators, this not only renders transitiveness under the Pitman closeness criterion but also induces an isomorphism with respect to the same under the MSE criterion. For details, refer to Sen (1992c). If Y_1 and Y_2 are biased, by invoking positive association between U_1 and U_2 (when σ_22 > σ_11), we may get a lower bound on PMC that depends upon E(Y_1) and E(Y_2). The PMC lower bound may exceed, equal, or be less than ½ depending on the mean values. Mason and Blaylock (1991) present numerical examples of these intricate calculations as they relate to problems in ridge regression. Nonetheless, the situation is less complex than in (5.66).
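A quick Monte Carlo companion to this remark (ours, not from the text) appears below. It takes two unbiased, jointly normal linear estimators with σ_22 > σ_11 and confirms that the one with the smaller variance has Pitman closeness probability above one-half; the covariance entries, seed, and replication count are illustrative.

```python
# Hypothetical Monte Carlo sketch: PMC of two unbiased bivariate-normal linear
# estimators of theta, with Var(b'X) > Var(a'X).
import numpy as np

rng = np.random.default_rng(8)
theta, reps = 1.0, 200_000
cov = np.array([[1.0, 0.6],          # sigma_11 = Var(a'X)
                [0.6, 1.5]])         # sigma_22 = Var(b'X) > sigma_11

y = rng.multivariate_normal(mean=[theta, theta], cov=cov, size=reps)
pmc = np.mean(np.abs(y[:, 0] - theta) < np.abs(y[:, 1] - theta))
print("PMC(a'X vs. b'X) ~", pmc)     # should exceed 0.5
```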
5.6
Estimation by Order Statistics
Let X_{1:n},...,X_{n:n} be the n order statistics from a random sample of size n taken from a location and scale parameter family of distributions. The random variable X has a parameter space Ω = {(μ, σ) : μ ∈ ℝ, σ ∈ ℝ⁺}, the upper half plane, and its distribution function is given by
where H(·) is a parameter-free distribution function. Define the random vector Z so that its ith component Z_i is given by
for each i = 1,... ,n. This transformation produces the following results:
for i, j = 1,2,...,n. Define E(Z) = e and Cov(Z) = V, where e is an n × 1 column vector of known constants such that e_i = E(Z_i). The n × n matrix V is composed of known constants such that the element in the ith row and jth column, v_ij, is given by
The Z_i's have the same distribution as the order statistics, formed from a random sample of size n, generated from a family with distribution function H(·). The following presentation is found in more general detail in Fountain and Keating (1993). This formulation produces a linear model of the following form:
where E(ε) = 0, Cov(ε) = σ²V, C = [1 e] is the design matrix, and the vector of parameters is β = [μ, σ]′. From the Gauss-Markov Theorem the best linear unbiased estimator (BLUE) of β is given by
From Lloyd (1952) an expression for each BLUE of μ and σ, respectively, can be simplified by partitioning the matrix C′V⁻¹C. The subsequent BLUEs are given by
where F is the skew-symmetric matrix given by
where Δ = |C′V⁻¹C|. In this way it is easy to see that both estimators are linear combinations of the components of X. The variances and covariance of these estimators are given by
Mann (1969) showed that the best linear invariant estimators (BLIEs) of σ and μ are given by
where d = Δ/(Δ + 1′V⁻¹1) and c = −1′V⁻¹e/(Δ + 1′V⁻¹1). Both σ̂_1 and σ̂_2 are linear combinations of X, but they are simply scalar multiples of the same linear combination. In this case, the matrix aa′ − bb′ becomes singular and the simpler comparison results. This comparison of scalar multiples of a scale invariant estimator of a scale parameter was discussed by Keating (1985). Earlier, in Examples 3.3.1 and 3.3.2, we illustrated Rao's phenomenon based on estimators of scale parameters. Preference in terms of PMC depends upon the skewness of the distribution of σ̂_1. In the context of §5.5, μ̂_1 and μ̂_2 are both of the form a′X and b′X and as such can be compared by the methods given there. As noted by Lloyd, if the underlying distribution of X is symmetric about μ, then μ̂_1 will be uncorrelated (i.e., c = 0) with σ̂_1. Thus in such cases the BLUE μ̂_1 will coincide with the BLIE μ̂_2 of μ. Moreover, the sign of (μ̂_2 − μ̂_1) is completely determined by Cov(μ̂_1, σ̂_1). If μ̂_1 and σ̂_1 are positively correlated (i.e., c > 0) then μ̂_2 < μ̂_1 for all X. If the correlation is negative (i.e., c < 0) then the order of the estimators is reversed. In either case, using Corollary 4.4.2, we can show that where T_1 = (μ̂_1 − μ)/σ̂_1. It is well known, see Lawless (1982), that T_1 has a parameter-free distribution since μ̂_1 and σ̂_1 are equivariant estimators of μ and σ, respectively, and E(T_1) = 0. Thus,
Hence, we have a necessary and sufficient condition for which the best linear unbiased estimator of μ will be Pitman-closer than its best linear invariant counterpart. This is the framework of Rao's phenomenon in which we compare the MVUE of μ in the class of linear estimators with the member which has the smallest MSE. For detailed discussions of BLUEs in light of the Pitman closeness criterion, we refer the reader to Sen (1989b). The framework of this problem, in which we illustrate the comparison of estimators of a location parameter, can be used to produce similar comparisons of estimators of the percentiles of X (see Keating (1984) and Keating and Tripathi (1985)). However, these pairwise comparisons do not manifest a Pitman-closest member of the class of linear estimators of location, scale, or percentiles of the distribution.
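As a concrete illustration of the Lloyd-type BLUE computation described above, the sketch below (ours, not from the text) carries out the generalized least-squares step for a uniform location-scale model, using the standard moments of uniform(0,1) order statistics, E(Z_i) = i/(n+1) and Cov(Z_i, Z_j) = i(n−j+1)/((n+1)²(n+2)) for i ≤ j. The function name, seed, and parameter values are illustrative.

```python
# Hypothetical GLS sketch for the linear model X_sorted = mu*1 + sigma*e + error,
# giving the BLUEs (mu_hat, sigma_hat) = (C'V^-1 C)^-1 C'V^-1 x.
import numpy as np

def blue_uniform(x_sorted):
    n = len(x_sorted)
    i = np.arange(1, n + 1)
    e = i / (n + 1.0)                                # E of uniform(0,1) order statistics
    ii, jj = np.meshgrid(i, i, indexing="ij")
    lo, hi = np.minimum(ii, jj), np.maximum(ii, jj)
    V = lo * (n - hi + 1) / ((n + 1.0) ** 2 * (n + 2.0))
    C = np.column_stack([np.ones(n), e])             # design matrix [1  e]
    Vinv = np.linalg.inv(V)
    return np.linalg.solve(C.T @ Vinv @ C, C.T @ Vinv @ x_sorted)

rng = np.random.default_rng(9)
mu, sigma, n = 2.0, 3.0, 8
x = np.sort(mu + sigma * rng.uniform(size=n))
print("BLUE (mu_hat, sigma_hat):", blue_uniform(x))
```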
Define the ith ancillary component
for each i = 1,..., n. The subsequent vector a is well known as the ancillary vector. Consider the conditional confidence interval,
Equation (5.83) produces the conditional lower confidence bound on μ as μ̂_1 − t_γ σ̂_1. If we set γ to be one-half then we produce a conditional median unbiased estimator (CMU) of μ given by
where t_{.50} = M(T_1|a). This 50% level confidence interval on μ constructed by the conditional procedure is identical in value to the 50% Bayes posterior probability interval obtained under the customary improper prior (see Lawless (1982)). McCullagh (1992) provides a very thoughtful review of such conditional procedures applied to the Cauchy distribution. Takada (1991) discusses the general role of median unbiased estimators in invariant prediction problems. In this case, as in inverse regression, the estimand may not be a parameter but rather a future observation. Different equivariant estimators will produce different unconditional confidence intervals and different unconditional medians. Hence our search for an unconditional Pitman-closest estimator seems undirected. The necessary and sufficient condition given above for the Pitman-closer estimator of μ (see (5.82)) was made possible by the fact that the equivariant estimators arose naturally from the procedure of weighted least-squares. However, different equivariant estimators lead to the same confidence intervals when we use the conditional approach. Keating (1984) establishes this result for estimation of percentiles in location and scale parameter families of distributions. Thus μ̂_c given in (5.85) is the conditionally Pitman-closest estimator of μ and the posterior Pitman-closest estimator of μ under the noninformative prior. Again, we have been able to find this bridge between a conditional procedure and a Bayesian procedure to be useful in the construction of the Pitman-closest estimator. Furthermore, this CMU estimator is precisely the one obtained by Nayak (see §5.1). He obtained Pitman-closest invariant estimators conditioned on an ancillary vector. In this way, we can now see the coincidence of the three procedures in this
special case for location parameters, scale parameters, and percentiles from location and scale parameter families of distributions. Finally, the conditional approach is also similar to Rao's form of the Pitman estimators since Rao likewise conditioned on an ancillary vector a. Mathematically stronger results than these can be established through the Kagan-Linnik-Rao Theorem but are beyond the scope of this text. There are some specific models where the distribution function H(·) admits a minimal sufficient statistic for (μ, σ). The normal, exponential, and uniform distribution functions are notable examples. In such cases, two specific linear combinations of X_{1:n},...,X_{n:n} are jointly sufficient for (μ, σ).
Chapter 6
Asymptotics and PMC
The merits and demerits of PMC, mostly in finite sample size situations, have been the main targets of our discussions in this book. In Chapter 5 we observed that equivariance, under certain groups of transformations that map the sample space onto itself, plays a key role in a characterization of the Pitman-closest property of estimators within an appropriate class. We also saw that equivariance eliminated some of the complications of the Pitman closeness criterion such as inadmissibility and intransitiveness. Restrictions of the class of estimators via equivariance or unbiasedness are not at all uncommon with most other criteria which are used in a decision theoretic setting to judge the performance of competing estimators of a common parameter. The Pitman closeness criterion, like other criteria, becomes more appealing under such restrictions. Asymptotic methods relate to the situation where the sample size is usually made to increase indefinitely to induce various simplifications which may be tenable for moderate sample sizes as well. In classical estimation problems, asymptotic methods play a vital role not only in enlarging the class of competing estimators but also in weakening the regularity conditions relative to the small sample situations. Parametric and nonparametric estimation are uniquely blended in this formulation. Of course, the applicability of an asymptotic result depends on its rate of convergence. These aspects of the asymptotics are considered in this chapter with due emphasis on Pitman's measure of closeness and its concordance with other criteria. In passing, we add that Scheffe (1945) strongly encouraged the development and use of the asymptotic results associated with the Pitman closeness criterion and the study of Sen (1986a) established the fundamental results. We may also add that the last two sections of this chapter deal with MLEs,
Pitman estimators, and Bayes estimators in a unified manner and are somewhat more technical than the rest (although there is a smooth gradation of the level of technicality throughout this chapter). Scanning these two sections should not hamper the general comprehension of the text, although these two sections add much to the completeness of discussions and expand the scope of asymptotic optimality of estimators in a broader perspective.
6.1
Pitman closeness of BAN estimators
Let X_1,...,X_n be n independent and identically distributed random variables with a distribution function F(x;θ) admitting a density f(x;θ) with respect to some sigma-finite measure μ. Typically, f(·) stands for the probability density function in the (absolutely) continuous case, and for the probability function in the discrete case. The form of the density f(·) is assumed to be given, while θ is an unknown parameter belonging to the parameter space Ω, a subset of ℝ. Our main objective is to construct a suitable estimator θ̂_n = θ̂(X_1,...,X_n) having some optimal (or desirable) properties. In Chapter 5, we stressed the formulation of a Pitman-closest estimator in a fixed sample size setup where n may not be large enough to induce some of the asymptotic simplifications that can generally be obtained by the use of refined probabilistic tools. Here, we allow n to be sufficiently large to yield such simplifications without compromising much on the underlying regularity assumptions.
6.1.1 Modes of convergence
Before we begin our formal presentation of asymptotic results, we present the definitions of convergence in probability and convergence in distribution. For an application-oriented treatment of these modes, the reader may refer to Sen and Singer (1993).
Definition 6.1.1 A sequence {T_n}_{n=1}^∞ of random variables converges in probability to a constant b provided for each ε > 0,
We denote this mode of convergence by T_n →_P b.
We have encountered this mode of convergence earlier when we introduced the concept of consistency in §1.1.3. The sequence {θ̂_n}_{n=1}^∞ of estimators indexed by the sample size is consistent if and only if θ̂_n →_P θ. A second occurrence is given in Remark 4.6.6. We note that F(X̃_n) →_P ½, where X̃_n is the sample median. If F(·) is monotone and continuous in a neighborhood of θ, then Moreover, F⁻¹(½) equals the population median. In this sense, the sequence of sample medians {X̃_n}_{n=1}^∞ converges in probability to the population median. In many cases, our interest in a sequence {T_n}_{n=1}^∞ of random variables is not simply restricted to the limiting value of a sequence but extends to the limiting behavior of the distribution functions of suitably normalized random variables in the sequence. In this regard we introduce the definition of convergence in distribution.
Definition 6.1.2 Let {T_n}_{n=1}^∞ be a sequence of random variables and let F_n(x) be the distribution function of T_n. Let X be a random variable with distribution function F(x). The sequence {T_n}_{n=1}^∞ of random variables converges in distribution to a random variable X provided
at each point of continuity of F(x). We denote this mode of convergence by T_n →_D X.
at each x, a point of continuity of F, where F is the joint distribution function of the random vector X. If the vector X has a p-variate normal distribution with mean vector p, and covariance matrix £, we will replace the random vector X by .A/^(/i, S) in the convergence notation, that is
These modes of convergence will be used throughout this chapter to obtain asymptotic results. Another useful but more restrictive mode of convergence focuses on the probability of the limiting behavior of a sequence of random variables.
Definition 6.1.3 A sequence {T_n}_{n=1}^∞ of random variables converges almost surely to 0, provided
We denote this type of convergence by T_n →_{a.s.} 0. This definition can be modified to form convergence of a sequence to a random variable.
Definition 6.1.4 A sequence {T_n}_{n=1}^∞ of random variables converges almost surely to a random variable T provided the sequence {T_n − T}_{n=1}^∞ converges almost surely to 0. We write this mode of convergence as T_n − T →_{a.s.} 0.
Another mode of convergence frequently used in asymptotic settings involves the limiting behavior of the moments of a sequence of random variables. Definition 6.1.5 A sequence of random variables {Tn}^=l converges in the rth mean to a random variable T, provided for the positive number r, there exists an integer UQ, such that
We denote this type of convergence by Tn -^> T. The most frequently used mode of this type is quadratic or second mean convergence, which occurs when r = 2. Since E(\Tn\r) and E(\T\r)
exist, Minkowski's inequality implies that E(|T_n − T|^r) exists. This mode of convergence is related to the previous modes of convergence in the following way: Consequently, asymptotic results that only require convergence in distribution apply to a larger class of sequences than those asymptotic results that require convergence in the rth mean. Convergence in the rth mean for a sequence {T_n}_{n=1}^∞ to a limiting random variable T implies convergence in the sth mean whenever r > s. Thus second mean convergence of {T_n}_{n=1}^∞ to T implies first mean convergence of {T_n}_{n=1}^∞ to T and consequently mean squared error convergence of {T_n}_{n=1}^∞ to T. In the asymptotic setting, we need to extend Definition 1.0.1 of Pitman's measure of closeness.
Definition 6.1.6 (Asymptotic PMC) Let {θ̂_n}_{n=1}^∞ and {θ̂′_n}_{n=1}^∞ be two sequences of estimators of the real parameter θ. We define Pitman's asymptotic measure of closeness of θ̂_n to θ̂′_n as
The use of the square root of n in the definition is important because it allows us to provide a more refined comparison of the asymptotic variances of θ̂_n and θ̂′_n and a more meaningful (i.e., nonsingular) limiting bivariate random vector for (θ̂_n, θ̂′_n). Definition 6.1.6 allows us to extend Definition 2.1.5 of a Pitman-closest estimator in a meaningful way for asymptotic situations.
Definition 6.1.7 (Asymptotically Pitman-closest Estimator) A sequence {θ̂*_n}_{n=1}^∞ of estimators is said to be an asymptotically Pitman-closest estimator (APC) of a parameter θ within a class C provided
for every θ ∈ Ω and each {θ̂_n}_{n=1}^∞ ⊂ C. In this formulation, θ̂*_n may not be unique. The manner in which the APC estimator is defined is equivalent to the corrected Pitman criterion but carefully evades the calculation of the probability of ties.
6.1.2 Fisher information
The likelihood function, due to the independence of the sample observations X_1,...,X_n, with observed values x_1,...,x_n, is given in (1.9) as
for each θ ∈ Ω. The log-likelihood score statistic is defined as
where f′_θ stands for the partial derivative with respect to θ, and the log-likelihood score statistic is tacitly assumed to exist for almost all x and θ. Actually, we assume more than this. Let
for each θ ∈ Ω be the Fisher information on θ. The integration is defined with respect to the Lebesgue-Stieltjes measure μ, and we assume that for each θ ∈ Ω
Let us denote by {H_n(θ)}_{n=1}^∞ the sequence of expectations defined by
Assuming that differentiation with respect to θ can be carried out under the integral sign on the right-hand side of (6.5), we obtain that
as E(U_n) = (∂/∂θ) ∫⋯∫ ∏_{i=1}^n f(x_i;θ) dμ(x_i) = (∂/∂θ) 1 = 0, under the same regularity conditions pertaining to the interchange of the order of differentiation with respect to θ and integration with respect to μ. By combining (6.2) and (6.3), we have
so that by (6.6) and (6.7) and the Cauchy-Schwarz inequality,
This yields the Frechet-Cramer-Rao inequality:
for all {θ̂_n}_{n=1}^∞ and θ ∈ Ω. The strict equality in (6.9) holds when
where k is a nonzero constant, which may depend on θ and n. If we restrict our discussion to the class of unbiased estimators of θ, we have H_n(θ) = 0, so that H′_n(θ) = 1, and (6.9) provides a lower bound, called the information limit, to the variance of any unbiased estimator of θ.
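The score and information quantities above are easy to check by simulation in a simple model. The sketch below (ours, not from the text) does this for the exponential density f(x;λ) = λe^{−λx}, for which the score per observation is 1/λ − x and I(λ) = 1/λ²; all constants are illustrative.

```python
# Hypothetical numerical illustration: E(U_n) = 0 and Var(U_n) = n * I(lambda)
# for the exponential model.
import numpy as np

rng = np.random.default_rng(10)
lam, n, reps = 2.0, 25, 20_000

x = rng.exponential(scale=1.0 / lam, size=(reps, n))
U_n = (1.0 / lam - x).sum(axis=1)            # log-likelihood score statistic

print("E(U_n)   ~", U_n.mean(), "(should be near 0)")
print("Var(U_n) ~", U_n.var(), "vs n*I(lam) =", n / lam ** 2)
```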
6.1.3 BAN estimates are Pitman closest
In an asymptotic setting, the bestness, the state of being best, of an estimator is usually judged by the minimum mean square error (MSE), or other generally convex risk functions. The class of estimators under consideration consists of asymptotically normal (AN) estimators. This provides the genesis of BAN (best asymptotically normal) estimators, as introduced by Neyman (1949) in the context of categorical data models. This traditional asymptotic framework led Sen (1986a) to pose a very natural question:
How do such BAN estimators behave when used with Pitman's measure of closeness?
This may well be the vantage point for the asymptotic situation where we may not need to confine ourselves to the class of unbiased estimators. We say that θ̂_n is a BAN estimator of θ if the following conditions hold: (i) (AN part): As n increases indefinitely,
where σ² is the asymptotic variance. An implication of this convergence in distribution is that θ̂_n →_P θ and we can replace σ² by the variance of the asymptotic normal distribution of √n(θ̂_n − θ), so that the second mean convergence result is not that essential. (ii) the condition of best is attained by {θ̂_n}_{n=1}^∞ in the sense that
If we combine (6.10) along with (6.11) and (6.12), we conclude that for a BAN estimator θ̂_n,
where U*_n = U_n/√n, so that by (6.2) and (6.7), as n → ∞,
If θ̂_n is AN but not necessarily BAN, then (6.11) holds with σ² ≥ 1/I(θ). In an asymptotic setup, (σ² I(θ))⁻¹ becomes the Fisher efficiency of θ̂_n, which is always less than or equal to one, where the equality holds for a BAN estimator. Let C be the class of all estimators of θ, such that as n increases indefinitely,
where Γ is the 2 × 2 matrix with elements
Equation (6.6), coupled with the asymptotic unbiasedness of θ̂_n, insures that γ_12 = 1. The class C is nonempty since, by virtue of (6.13), all BAN estimators of θ belong to this class. Then the following optimality theorem is due to Sen (1986a).
Theorem 6.1.8 A BAN estimator (say, θ̂*_n) is asymptotically Pitman-closest within the class C.
where For any 0n for which cr2 is greater than 1/9(0), Rn is asymptotically independent of Un and normally distributed with 0 mean and variance or2 1/9(0). If cr2 = 1/9(0), we still have a degenerate normal law with 0 mean and 0 variance. On the other hand, 0* is a BAN estimator, so that by (6.13),
Since R*_n →_P 0 while R_n and U*_n are asymptotically independent, by (6.18) and (6.19), we have
In the nondegenerate case (i.e., for σ² > 1/I(θ)), by (6.17), the right-hand side of (6.20) converges to a positive limit, while in the degenerate case (of R_n →_P 0), the right-hand side of (6.20) converges to 0. Hence the inequality in Definition 6.1.7 holds, where the equality sign holds when both θ̂*_n and θ̂_n are BAN estimators, and in that case, by (6.13), both are convergent equivalent to θ + U*_n/[√n I(θ)].
Example 6.1.9 To illustrate the comparison of a BAN estimator with one that is not, let us review Example 4.6.8. In this example, we compare the sample mean X̄_n with the sample median X̃_n taken from a random sample from a normal population. Then from Geary (1944) we have that
where the covariance matrix B is given by
Since π/2 > 1, X̃_n is AN but not a BAN estimator, and X̄_n is an asymptotically Pitman-closer estimator. Moreover, since |B| > 0, the bivariate distribution is nonsingular. We also note that the asymptotic mean squared error relative efficiency of {X̃_n} to {X̄_n} is simply the ratio of their asymptotic variances. This ratio is
where ρ is the correlation coefficient between X̄_n and X̃_n. In this asymptotic construct, ρ² is the asymptotic Fisher efficiency and the asymptotic version of (2.8). In Example 4.6.8, if the underlying distribution is Laplace rather than normal, the sample median X̃_n is BAN while X̄_n is not. In this case the covariance matrix of √n(X̄_n − θ) and √n(X̃_n − θ) is given by
where λ is the scale parameter in the Laplace distribution. Hence, X̃_n is asymptotically Pitman-closer to θ than X̄_n, and the Fisher efficiency of X̄_n with respect to X̃_n is 1/2.
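The asymptotic comparison in this example can also be checked numerically. The following sketch (an illustration added here, not part of the original development; the sample size, seed, and replication count are arbitrary) estimates the Pitman closeness probability P(|X̄_n − θ| < |X̃_n − θ|) under normal and Laplace sampling.

```python
import numpy as np

def pitman_closeness(draw, est1, est2, theta=0.0, n=50, reps=20000, seed=1):
    """Monte Carlo estimate of P(|est1 - theta| < |est2 - theta|)."""
    rng = np.random.default_rng(seed)
    x = draw(rng, (reps, n))
    d1 = np.abs(est1(x) - theta)
    d2 = np.abs(est2(x) - theta)
    # ties have probability zero for continuous data and are ignored here
    return np.mean(d1 < d2)

mean = lambda x: x.mean(axis=1)
median = lambda x: np.median(x, axis=1)

normal = lambda rng, size: rng.normal(0.0, 1.0, size)
laplace = lambda rng, size: rng.laplace(0.0, 1.0, size)

print("normal :", pitman_closeness(normal, mean, median))   # above 1/2: the mean is Pitman-closer
print("laplace:", pitman_closeness(laplace, mean, median))  # below 1/2: the median is Pitman-closer
```

The two printed values sit on opposite sides of 1/2, in agreement with the BAN characterizations above.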
Example 6.1.10 To illustrate the degenerate case, let us reconsider Example 3.3.2, in which we compared estimators of a normal population variance θ = σ². The uniformly minimum variance unbiased estimator is given by θ*_n = S²_n and the maximum likelihood estimator is given by θ̂_n = (n − 1)S²_n/n. Since (n − 1)S²_n/σ² ~ χ²_{n−1}, Var(θ*_n) = 2σ⁴/(n − 1), Var(θ̂_n) = 2(n − 1)σ⁴/n², and Cov(θ*_n, θ̂_n) = 2σ⁴/n. Thus, we have that
where the covariance matrix is singular. In this example both estimators are BAN estimators and illustrate their convergent-equivalent nature.
The regularity conditions are posed to insure the asymptotic normality. To illustrate this point, consider a counterexample. Suppose that a radar device is used for detecting the speed of a moving object. The accuracy of the device is guaranteed to be within one mph of the true speed. If we assume that the measured speed X has a uniform distribution about the true speed θ, the density function of X is given by
Let X₁, ..., X_n be independent and identically distributed random variables from this uniform distribution. Define the random variables
as one-half of the sample range and the midrange, respectively. The joint density function of U_n and V_n is given by
From this result, the reader can verify that
where L represents a Laplace distribution with median 0 and unit scale parameter. Thus the AN condition does not hold here, although the Pitman-closeness property follows from the discussion made at the end of §5.1.
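The non-normal limit for the midrange is easy to see by simulation. The sketch below (our illustration, with arbitrary constants) generates uniform samples about a true speed θ and examines n times the centered midrange; a Laplace limit has excess kurtosis 3, whereas a normal limit would have excess kurtosis 0.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 5.0, 200, 20000

x = rng.uniform(theta - 1.0, theta + 1.0, size=(reps, n))
midrange = 0.5 * (x.min(axis=1) + x.max(axis=1))
z = n * (midrange - theta)      # approximately Laplace-distributed, not normal

m2 = np.mean(z**2)
m4 = np.mean(z**4)
print("excess kurtosis of n*(midrange - theta):", m4 / m2**2 - 3.0)  # close to 3
```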
Remark 6.1.11 The weak convergence result in (6.18) provides the key to the sketched proof. In this respect, we may not need the second mean convergence of R*_n (otherwise needed for the bestness in the quadratic risk sense). In many cases, we may require less stringent regularity conditions, which will be clarified further in the next section. Comparison of estimators of the reciprocal of a binomial parameter is another example that clarifies this point (see Example 2.3.1).
Example 6.1.12 Another interesting example involves estimation of the scale parameter λ in the exponential failure model. As in Example 1.2.1, λ is the failure rate of the exponential distribution. Let X₁, ..., X_n be n independent and identically distributed random variables as in Example 1.2.1. The total time on test T_n has a gamma density function with shape parameter n and scale parameter 1/λ. As estimators of λ, consider the scalar class whose elements are of the form λ̂(a) = a/T_n, where a > 0. Then we can readily verify that
and
Thus the minimum mean-squared error estimator in this class is obtained by setting a₁ = n − 2, while the MLE is obtained by setting a₂ = n. Note that the MMSE estimator of the mean time between failures θ = 1/λ is given by T_n/(n + 1), while the MLE is T_n/n. Thus, even in this simple case, MMSE estimators do not satisfy transformational invariance, whereas MLEs clearly do. Using the Ghosh–Sen Theorem II, we obtain a scale-equivariant Pitman-closest estimator of λ by setting a₃ = m_{2n}/2. Note that the Pitman-closest estimator of θ is given in Chapter 5 as 2T_n/m_{2n}, so that Pitman-closest estimators satisfy this intrinsic invariance property not held by MMSE estimators. The median m_{2n} of the chi-square distribution having 2n degrees of freedom can be approximated by 2n − 2/3 + O(1/n). Hence all the estimators discussed here are BAN estimators, and consequently the MMSE and the Pitman-closest estimators are asymptotically convergent-equivalent. Nevertheless, for finite n this invariance property of Pitman-closest estimators contributes to their intrinsic appeal and in this respect makes them more advantageous than the MMSE estimator.
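As a hedged numerical sketch (added here; the choices of n, λ, seed, and replication count are arbitrary), the three constants a₁ = n − 2, a₂ = n, and a₃ = m_{2n}/2 can be computed directly and the resulting estimators a/T_n of λ compared by simulated pairwise Pitman closeness.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, lam, reps = 10, 0.5, 50000

a_mmse = n - 2                     # minimum MSE within the class a/T_n
a_mle = n                          # maximum likelihood
a_pc = chi2.median(2 * n) / 2      # scale-equivariant Pitman-closest choice, a = m_{2n}/2

T = rng.gamma(shape=n, scale=1.0 / lam, size=reps)   # total time on test
est = {"MLE": a_mle / T, "MMSE": a_mmse / T, "PC": a_pc / T}

for name1 in est:
    for name2 in est:
        if name1 < name2:
            p = np.mean(np.abs(est[name1] - lam) < np.abs(est[name2] - lam))
            print(f"P(|{name1} - lam| < |{name2} - lam|) ~ {p:.3f}")
```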
6.2 PMC by asymptotic representations
In the previous section we studied the Pitman-closest property of asymptotically normal (AN) estimators with emphasis on the related BAN estimators. Although BAN estimators are typically nonlinear, they can often be expressed in terms of a linear estimator plus a remainder term which converges to zero in a certain mode of convergence (e.g., in probability). A first-order asymptotic representation (FOAR) holds for a large class of estimators in diverse estimation problems arising in parametric, semiparametric, or nonparametric models. For the sake of completeness of discussion and for the purpose of illustration, we present an elementary introduction to FOAR and stress its role in the development of PMC in an asymptotic framework. We shall also emphasize the diversifications achieved from the first-order asymptotic representation.
6.2.1 A general proposition
Consider a class C of estimators {θ̂_n} of a parameter θ. Suppose that there exists a score function φ(x; θ), possibly dependent on θ, such that it is possible to write
where the remainder term R_n is o_p(1/√n) (as n → ∞) and the score function is normalized so that
Then θ̂_n is said to admit a first-order asymptotic representation. It follows from the Central Limit Theorem that as n → ∞,
Consequently, by (6.21)–(6.23), we conclude that as n → ∞,
For this reason, (6.21) is often termed a first-order asymptotic normal representation (FOANR). If this class C includes a BAN estimator θ*_n of θ, then we have
Defining &(0) as in (6.3), we have
e = df/dO. Thus E[*(Xi\0)] = 0 and oj = E[(4>*(Xi;0))2] = 1/3(0).
Moreover, by the Fréchet–Cramér–Rao inequality,
for all φ ∈ Φ, where Φ is the class of all score functions corresponding to the class of estimators {θ̂_n} for which a FOANR holds. Further, using (6.21) and (6.23), and the identity E{∂(θ̂_n − θ)/∂θ} = −1, we obtain that for every φ ∈ Φ, the covariance with φ* is
By combining (6.21), (6.23), and (6.27)-(6.28), we have the following convergence in distribution:
and hence, by Theorem 6.1.1, we arrive at the following result.
Theorem 6.2.1 Within the class C of estimators {θ̂_n} admitting an FOANR, whenever a BAN estimator exists, it is asymptotically first-order Pitman closest.
Let θ̂_n^(1) and θ̂_n^(2) be two estimators of θ, such that (6.21) holds for θ̂_n^(j) with φ = φ_j, j = 1, 2. We let σ_{jℓ} = E[φ_j(X_i; θ) φ_ℓ(X_i; θ)], for j, ℓ = 1, 2. Then, by the asymptotic bivariate normality of √n(θ̂_n^(1) − θ, θ̂_n^(2) − θ), (6.22), and Sheppard's Theorem (see §4.2), we have
where η = (σ₂₂ − σ₁₁)/{(σ₁₁ + σ₂₂)² − 4σ₁₂²}^{1/2}. Note that σ₁₁ < σ₂₂ implies that η > 0. So from (4.9) we conclude that P_∞(θ̂_n^(1), θ̂_n^(2) | θ) is greater than or equal to 1/2. Thus the usual transitiveness (under the MSE criterion) is retained in an asymptotic sense under the PMC criterion, for all θ̂_n^(j) admitting a FOANR. We refer to Sen (1992c) for more details.
Theorem 6.2.2 Within the class C of estimators {θ̂_n} admitting a first-order asymptotic normal representation, asymptotic PMC, P_∞(·, · | θ), is transitive, and the consequential ordering on C is determined by the values of the asymptotic variances of the estimators in C.
The implication of this asymptotic result is far-reaching. For the class of FOANR estimators there is an isomorphism between the PMC and MSE criteria in a very general asymptotic setup. Thus PMC becomes transitive without necessarily requiring that nE[(θ̂_n − θ)²] has a limit equal to the variance of the asymptotic normal law for √n(θ̂_n − θ). To reiterate this point, in the next section we consider a general class of AN estimators of location and discuss the relevance of the PMC in a very natural setup.
6.3 Robust estimation of a location parameter
Let us consider the simple model in which X₁, ..., X_n are independent and identically distributed random variables with a probability density function f(x; θ) = h(x − θ), θ ∈ Θ, and the form of the density h(y) does not depend on θ. The classical least squares estimator (LSE) of θ, discussed in §1.1.1 and obtained by minimizing
with respect to θ, is given by the sample mean X̄_n, which is also the MLE of θ if the density h is normal. If the density h admits a finite variance σ², then, as in (6.31), σ² ≥ 1/I(θ), where I(θ) is the Fisher information on θ. For normal h, the equality sign in (6.31) holds, so that X̄_n is fully efficient; X̄_n is also the Pitman-closest estimator of θ (see Example 5.1.15). This ideal picture may change abruptly if the assumed density h deviates from a normal one, either locally or globally. Moreover, in actual practice, very rarely can we take normality of h for granted (even if suitable transformations on the variate X are considered to induce more symmetry). As such, practitioners may consider alternative estimators which are robust to such possible departures from the model-based assumptions. The L-, M-, and R-estimators are three such possibilities.
6.3.1 L-estimators
Let X_{1:n} ≤ ... ≤ X_{n:n} be the order statistics corresponding to the unordered X₁, ..., X_n. By the assumed continuity of F (the distribution function corresponding to the probability density function f), ties among the X_i (and hence, the X_{i:n}) can be neglected in probability. Let k be a nonnegative integer (0 ≤ k < n/2), and let
The family {T_{n,k} : 0 ≤ k < n/2} constitutes the class of trimmed means. For k = 0, T_{n,0} = X̄_n, and for k = [n/2], it reduces to the sample median X̃_n. Within this family, T_{n,0} is usually least robust, while X̃_n is most robust, against outliers or gross error contamination. On the other hand, for small departures from the assumed normality of f, in terms of efficacy, X̃_n does not compare well with T_{n,0} or T_{n,1}. Based on such considerations, we choose a small α (0 < α < 1/2) and consider an α-trimmed mean T_{n,k_n}, k_n ~ nα. For any k < n/2, T_{n,k} is a linear combination of the order statistics {X_{i:n}}, and belongs to the class of L-estimators discussed in §5.6. A variant form of this trimmed mean T_{n,k} is the so-called Winsorized mean
for k = 1, ..., [n/2], where for k = 0, W_{n,0} = X̄_n, and where W_{n,k} belongs to the class of L-estimators of location. Let us also consider a related estimator, termed the rank-weighted mean by Sen (1964). Consider a subset (i₁ < i₂ < ... < i_{2k+1}) of 2k + 1 distinct indices (out of 1, ..., n) and let X̃_{i₁,...,i_{2k+1}} be the median of X_{i₁}, ..., X_{i_{2k+1}}. Let
be the average over all possible subsample medians, where ₙC_r is the number of combinations of n elements selected r at a time. Some algebraic manipulations lead us to
where S_{n,0} = X̄_n and S_{n,[(n−1)/2]} = X̃_n. These S_{n,k} also belong to the class of L-estimators or linear functions of order statistics discussed in §5.6. We may formally define an L-statistic as
where the c_{n,i} are known coefficients, not all zero. If we let k = k_n ~ np, for some 0 < p < 1, and c_{n,i} = 0 or 1 according as i ≠ k_n or i = k_n, then L_n reduces to a sample p-quantile. In general, if only a fixed number (say, r ≥ 1) of the c_{n,i} are nonzero and the nonzero indices i₁, ..., i_r correspond to [np₁], ..., [np_r], respectively, where 0 < p₁ < ... < p_r < 1, we have a linear combination of r sample quantiles, and L_n is termed a Type I L-statistic. Although sample quantiles are nonlinear functions of the X_i, the elegant Bahadur (1966) representation holds under quite general regularity conditions, and that implies that (6.21) holds for each such quantile. As such, (6.21) holds for Type I L-statistics. In a variety of situations, as in (6.32), (6.34), and (6.36), the c_{n,i} are some smooth functions (of (i, n)), and we may write
where J_n = {J_n(u), 0 < u < 1} converges (as n → ∞) to a smooth function J = {J(u), 0 < u < 1}. In all the examples, we consider J to be bounded. In this case, L_n defined by the coefficients c_{n,i} in (6.37) is a linear function of order statistics with "smooth" weights and is termed a Type II L-statistic. If we denote by
the sample or empirical distribution function, then by (6.36) and (6.38), we have
The Glivenko-Cantelli Theorem (see Pitman (1979)) states that
So it is quite intuitive to formulate θ as
Expanding F_n around F, it can be shown that for such an L_n, (6.21) holds with
and
We may refer to Serfling (1980, Chap. 8) and Sen (1981, Chap. 7) for details and thereby omit the proofs of (6.42) and (6.43). Thus, the FOANR holds for a general class of L-statistics where we may not need to confine ourselves to a parametric setup. On the other hand, in a parametric framework (especially with respect to location and scale parameters), BLUEs (best linear unbiased estimators) are also L-statistics of the form given in (6.36) and, for them, (6.42) holds under fairly general regularity conditions. Particularly, the ABLUEs (asymptotically BLUE) satisfy (6.42) for a large class of densities belonging to the location-scale family, and hence, the APC characterization discussed here pertains to a large class of L-statistics. Using Theorem 6.2.1 we can state the following characterization of an APC estimator.
Theorem 6.3.1 Let C₁ be the class of L-estimators of a location parameter θ. If C₁ is restricted to the L-estimators which admit an FOANR, an ABLUE of θ is asymptotically Pitman-closest in C₁.
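The two simple L-estimators introduced above are easy to compute directly. The following sketch (our own illustration, not from the monograph; the trimming depth and data are arbitrary) contrasts them with the sample mean in the presence of a single gross outlier.

```python
import numpy as np

def trimmed_mean(x, k):
    """T_{n,k}: average of the order statistics after deleting k observations from each tail."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    return xs[k:n - k].mean()

def winsorized_mean(x, k):
    """W_{n,k}: pull the k smallest/largest observations in to the nearest retained order statistic."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    xs[:k] = xs[k]
    xs[n - k:] = xs[n - k - 1]
    return xs.mean()

x = [1.2, -0.4, 0.3, 0.9, 7.5, -0.1, 0.6, 0.2, -0.8, 0.4]   # 7.5 acts as a gross error
print(np.mean(x), trimmed_mean(x, 1), winsorized_mean(x, 1))
```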
6.3.2 M-estimators
Let us next consider the usual M-estimators of location. Let X₁, ..., X_n be independent and identically distributed random variables with distribution function H(x − θ),
for all x ∈ ℝ and θ ∈ Ω ⊂ ℝ. We assume that H is such that its probability density function is symmetric about 0. Let Π be the class of all odd functions defined over the entire real line. Then the M-estimator associated with the odd function ψ(x) is given by
We note that if ψ(x) is nondecreasing, then M_n(t) is a nonincreasing function of t. Moreover, from the symmetry of h(·) about the origin and the fact that ψ(·) is an odd function, it follows that
Therefore, equating M_n(t) to 0, we obtain the following M-estimator of θ based on the influence function ψ:
where θ̂_{n,1} = sup{t : M_n(t) > 0} and θ̂_{n,2} = inf{t : M_n(t) < 0}. In robust estimation, ψ(x) is generally taken to be a bounded function, so that θ̂_n is not that sensitive to outliers or gross error contamination. A notable influence function, due to Huber (1964), is given in (2.7). Whenever ψ(x) is nondecreasing, we may identify (as in §2.3.3) a convex function ρ(x) such that ρ′(x) = ψ(x). In such cases, we may express the M-estimator in (6.46) as
In particular, if we let ρ(y) = |y|, we have ψ(y) = sign y, so that θ̂_n reduces to the sample median. Thus, the L₁-norm estimator belongs to the class of M-estimators. Similarly, if we let ρ(y) = y², (6.46) reduces to the classical least squares estimator (LSE) of θ. In a slightly different slant, let ψ(y) = −f′(y)/f(y), where f, the density function corresponding to the distribution function F, is absolutely continuous. Then θ̂_n corresponds to the classical maximum likelihood estimator (MLE). In this special case ψ(·) need not be an odd function. Thus, the class of M-estimators includes various notable members, as in the normal influence function discussed in §§2.3 and 2.4. The regularity conditions on f may be adjusted according to the parallel ones for the influence function ψ(·). For example, if ψ(·) is
absolutely continuous and bounded, we may allow f to be quite arbitrary. However, as we make ψ(·) more general, additional regularity conditions on f are needed to achieve simple asymptotic results for the M-estimators. Here we allow ψ(x) to be more arbitrary, but square integrable, and we assume that f has a finite Fisher information,
Furthermore, define
Then, for monotone ψ(x) or a difference of two monotone functions, using (6.45) and the usual proof of the Glivenko-Cantelli Theorem, we can show that for every k > 0, as n → ∞,
This lemma in turn implies that as n → ∞,
so that an FOANR holds with φ = ψ/γ(ψ, f). As such, with the APC characterizations, Theorem 6.2.1 holds for the usual M-estimators as well. Since the MLE belongs to the class of M-estimators for which (6.50) holds, and for the MLE, γ(ψ, f) = I(f), we obtain by virtue of the BAN property of the MLE that it is also asymptotically Pitman-closest within the class of M-estimators of location.
Theorem 6.3.2 Let C₂ be the class of M-estimators of a location parameter θ. If C₂ is restricted to those M-estimators which admit an FOANR, the MLE (or any other BAN estimator) of θ is asymptotically Pitman-closest in C₂.
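As a minimal sketch (added here, assuming the Huber ψ of (2.7) with an arbitrary tuning constant c), the defining equation Σψ(X_i − t) = 0 can be solved by a simple iteratively reweighted averaging scheme started at the sample median.

```python
import numpy as np

def huber_psi(u, c=1.345):
    # Huber's influence function: linear near zero, bounded in the tails
    return np.clip(u, -c, c)

def huber_m_estimate(x, c=1.345, tol=1e-8, max_iter=200):
    """Solve sum(psi(x_i - t)) = 0 by iteratively reweighted averaging."""
    x = np.asarray(x, dtype=float)
    t = np.median(x)
    for _ in range(max_iter):
        r = x - t
        # weights w_i = psi(r_i)/r_i, set to 1 where r_i == 0
        w = np.where(r == 0.0, 1.0, huber_psi(r, c) / np.where(r == 0.0, 1.0, r))
        t_new = np.sum(w * x) / np.sum(w)
        if abs(t_new - t) < tol:
            break
        t = t_new
    return t

x = np.array([0.1, -0.3, 0.2, 0.05, 0.4, -0.2, 8.0])   # 8.0 is an outlier
print(np.mean(x), np.median(x), huber_m_estimate(x))
```

At convergence the weighted average satisfies Σψ(X_i − t) = 0, so the iteration returns an M-estimate that is pulled far less by the outlier than the sample mean.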
6.3.3 R-estimators
We conclude this section with a discussion of the rank-based R-estimators of location. In the same setup as in the case of M-estimators, we consider a set
of scores a_n(1) ≤ ... ≤ a_n(n) generated by a score function φ = {φ(u), 0 < u < 1} in the following manner:
where U_{1:n} < ... < U_{n:n} are the order statistics of a sample of size n from the uniform (0,1) distribution. We assume, without any loss of generality, that φ is monotone. For every real t, let R⁺_{n,i}(t) be the rank of |X_i − t| among |X₁ − t|, ..., |X_n − t|, for i = 1, ..., n, and let
It is known that S_n(t) is a nonincreasing function of t and S_n(θ) is distributed symmetrically around 0 (independently of F), so that, as in (6.46)–(6.47), we may define an R-estimator of θ by
where
For the particular case of φ ≡ 1, θ̂_n(φ) reduces to the sample median. Also, for φ(u) = u, the values of a_n(k) become the mean ranks. The subsequent R-estimator θ̂_n(φ) is the classical Wilcoxon score-estimator and is expressible as
Another notable member of this class is the normal scores estimator, which corresponds to the case φ(u) = Φ⁻¹(u), 0 < u < 1, where Φ(x), x ∈ ℝ, is the standard normal distribution function. In this case, and in general, θ̂_n(φ) may not be expressible explicitly in a closed algebraic form in X₁, ..., X_n, but can be solved by a convenient iterative process (where the Wilcoxon score-estimator can be taken as a good initial estimator). In an asymptotic setup, we require that φ ∈ L₂(0,1), so that A²_φ = ∫₀¹ φ²(u) du is finite.
In fact, the normal scores estimator is better than the sample mean for all nonnormal F, and for normal F, they are asymptotically equally efficient. Let us denote by φ₀ the score function generated by the density f, and assume that F has a finite Fisher information
Further, let φ*(u) = φ₀((1 + u)/2), where
Then, parallel to (6.49), we have for every k > 0, as n → ∞ (at θ = 0),
sup{ |S_n(t/√n) − S_n(0) + t γ/√n| : |t| ≤ k } → 0 in probability.   (6.59)
This convergence in probability yields the FOANR result
θ̂_n(φ) − θ = Σ_{i=1}^{n} φ(F(X_i − θ))/[n γ(φ, f)] + R_n,
which, in turn, implies that
Note that
so that (6.62) is minimized when ρ²(φ₀, φ_f) = 1, i.e., φ₀(u) = φ_f(u). This provides the APC characterization of the R-estimator θ̂_n(φ_f) within the class of R-estimators.
Theorem 6.3.3 Let C₃ be the class of R-estimators of a location parameter θ. If C₃ is restricted to those R-estimators which admit an FOANR, then θ̂_n(φ_f) is asymptotically Pitman-closest in C₃.
The results in this section apply to simple regression or linear models where the error components are independent and identically distributed random variables, but the observable random variables are subject to unknown deterministic components expressible as linear functions of known regression vectors and unknown coefficients.
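For the Wilcoxon score φ(u) = u, the R-estimator of location is commonly computed as the median of the Walsh averages (X_i + X_j)/2, i ≤ j (the Hodges–Lehmann estimator). A brief sketch of that special case (our own illustration, with arbitrary data) follows.

```python
import numpy as np

def hodges_lehmann(x):
    """Wilcoxon-score R-estimator of location:
    median of the Walsh averages (x_i + x_j)/2 over all pairs i <= j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x))        # all index pairs with i <= j (including i == j)
    return np.median((x[i] + x[j]) / 2.0)

x = np.array([0.3, -0.1, 0.4, 0.2, 5.0, 0.0, -0.3])   # 5.0 is an outlier
print(np.mean(x), np.median(x), hodges_lehmann(x))
```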
6.4 APC characterizations of other estimators
Pitman has made several outstanding contributions to statistical inference; paramount among these contributions are the Pitman estimators (PE), introduced in 1939. Pitman estimators enjoy various optimal properties in the location-scale families of distributions as well as for general exponential families admitting sufficient statistics. They are also known to be quite close to the classical MLEs when the sample size is large. Moreover, the PEs are Bayes estimators in a meaningful sense and are similar to general Bayes estimators as well, when n is large. In this section, we discuss asymptotic properties of PEs, Bayes estimators, and MLEs as they relate to the Pitman closeness criterion.
6.4.1 Pitman estimators
Let X₁, ..., X_n be independent and identically distributed random variables, each having the density f(x, θ) with respect to a sigma-finite measure μ, where θ ∈ Ω ⊂ ℝ. We rewrite the likelihood function in (6.1) as
for each θ ∈ Ω. In particular, for the location model discussed in §5.1, f(x, θ) = h(x − θ), so that (6.63) reduces to
for each θ ∈ Ω. For this location model, Pitman (1939) introduced an estimator of θ known as the Pitman estimator (PE), which is defined as
If we define the posterior density of θ with respect to the uniform weight function by
for each θ ∈ Ω, then, by (6.65) and (6.66), we have
where U stands for the improper prior distribution of θ generated by the uniform weight function on Ω. Thus, θ̂_{P,n} may also be interpreted as a Bayes estimator. In this location model, the form of the density f(·) is assumed to be independent of θ, and θ̂_{P,n} is translation-equivariant and unbiased for θ. Further, within the class of translation-equivariant estimators, θ̂_{P,n} is minimax for the location family with respect to a quadratic loss, and admissible under additional regularity conditions. For each θ ∈ ℝ, the density g_{P,n}(θ) gives rise to a corresponding distribution function G_{P,n}(θ), such that dG_{P,n}(θ) = g_{P,n}(θ)dθ, which, following Sen (1992b), we may term the Pitman empirical distribution function of θ. If we let
then, according to the results discussed in §5.4, θ̃_{P,n} is a posterior Pitman-closest estimator of θ. The estimator θ̃_{P,n} is unique whenever G_{P,n} has a unique median. Moreover, in a general estimation problem, with respect to an absolute error loss, θ̃_{P,n} may dominate the Pitman estimator (PE), θ̂_{P,n}. This is particularly the case if the distribution function G_{P,n} is not symmetric, such as occurs with the Poisson and gamma distributions. Thus, for θ̂_{P,n} and its contender θ̃_{P,n}, the relative comparison depends on the loss function and, in a general setup, neither dominates the other under all such loss functions. We have observed in earlier chapters that sufficiency, equivariance, and median unbiasedness play basic roles in this theory of estimation. Note that if an estimator θ̂_n = θ̂(X₁, ..., X_n) is a sufficient statistic for the estimation of a parameter θ, then defining the likelihood function l_n(θ|X) as in (6.63), we have, by the Neyman factorization theorem,
for each θ ∈ Ω, where h_n(·) stands for the density of θ̂_n and depends on θ as well as on X₁, ..., X_n (through θ̂_n only), while f*(·) does not depend on θ. As such, by (6.65) and (6.69), we obtain that θ̂_{P,n} = ∫ θ h_n(θ̂_n, θ)dθ / ∫ h_n(θ̂_n, θ)dθ = ψ_n(θ̂_n), and it is a function of the sufficient statistic θ̂_n. This feature of the Pitman estimator θ̂_{P,n}, along with its unbiasedness, makes it possible to use the classical Rao–Blackwell Theorem to show that θ̂_{P,n} has the minimal risk property under a quadratic loss or a suitable convex loss. The sufficiency and unbiasedness of θ̂_{P,n} may not, however, suffice for its Pitman-closest characterization.
In this respect, we refer to Theorem 5.1.13, wherein a Pitman-closest characterization has been established under median unbiasedness of a sufficient statistic. Note that under (6.69), the density g_{P,n}(θ), for each θ ∈ Ω, depends only on θ̂_n, and, as such, the Pitman empirical distribution function G_{P,n}(θ), for each θ ∈ Ω, depends solely on θ̂_n through its density h_n(θ̂_n, θ). G_{P,n}(·) is a random distribution function defined on Ω and is itself a sufficient statistic (process) whenever θ̂_n is sufficient. Thus, instead of the mean of G_{P,n}(θ), if we choose its median (as defined by (6.68)), denoted by θ̃_{P,n}, then we have θ̃_{P,n} = ψ_{MED}(θ̂_n), a median-unbiased sufficient statistic. By Theorem 5.1.14, we have, within the class C of estimates of θ of the form U_n = θ̃_{P,n} + Z_n, where Z_n is ancillary,
for all θ ∈ Ω, U_n ∈ C. In this respect, we may refer to the Ghosh–Sen Theorem I (Theorem 5.1.14), where equivariance, sufficiency, and ancillarity considerations were incorporated in a formulation of the class of estimators C. However, it may be remarked that ancillarity of Z_n is only a sufficient but not a necessary condition. Note that if the conditional distribution of θ̃_{P,n} given Z_n has median θ (a.s., Z_n), then (6.70) holds. This property, termed by Sen and Saleh (1992) the uniform conditional median unbiasedness, insures the usual median unbiasedness (although the converse may not be true), and removes the need to assume that Z_n is ancillary.
6.4.2 Examples of Pitman estimators
Considerations of minimal sufficiency and maximal invariants often lead us to consider the class of estimators C, where the Z_n depend only on the maximal invariants. Thus, in order that the Pitman-closest θ̃_{P,n} is within this class C, we need to verify that θ̂_{P,n} is median unbiased; a sufficient condition for this is that the distribution of θ̂_{P,n} is symmetric about θ. As an illustration, consider the following examples.
Example 6.4.1 Let X₁, ..., X_n be n independent and identically distributed random variables having the normal density with unknown mean θ and variance σ². For simplicity, let us assume that σ² is known and, without any loss of generality, we take σ² = 1. Then θ̂_{P,n} = X̄_n has a normal distribution with mean θ and variance 1/n, so that the PE, θ̂_{P,n}, is median unbiased for θ, and hence is Pitman-closest within the class of equivariant estimators.
However, without equivariance, this Pitman-closest characterization of the PE may not hold even for this simple example. Example 1.1.2, taken from Efron (1975), illustrates a case where θ̂_{P,n} is dominated in the Pitman sense.
Example 6.4.2 Consider Example 5.1.11, where the Pitman estimator of the guarantee time in the one-parameter exponential distribution is given by
and is an unbiased estimator of θ. Although θ̂_{P,n} is not median unbiased for θ, a median-unbiased version of θ̂_{P,n} is given by
so that within the class of translation-equivariant estimators of θ, the Pitman-closest estimator is given by θ̃_{P,n}, not θ̂_{P,n}.
Example 6.4.3 Let X₁, ..., X_r be r independent random variables, where each X_i has the Poisson distribution with parameter θ. Then T = Σ_{i=1}^{r} X_i is sufficient for θ and X̄_r = T/r is unbiased for θ; X̄_r is also the MLE of θ. But
so that the MLE and θ̂_{P,r}, the Pitman estimator, are different. Further, the Poisson distribution does not belong to the location-scale family, and it is easy to verify that θ̂_{P,r} is not median unbiased for θ (see Sen and Saleh (1992)); the median of θ̂_{P,r} depends on θ. So here the Pitman estimator θ̂_{P,r} is not the Pitman-closest. With regard to the exponential distribution discussed in Example 6.1.12, we can obtain the Pitman estimator of λ from (6.73) as (n + 1)/T_n, but the posterior Pitman-closest estimator of λ is given by a(n) = m_{2n+2}/2, where m_{2n+2} is the median of a chi-square random variable having 2n + 2 degrees of freedom. Thus, for this example, as well as Example 6.1.12, the classical Pitman estimator does not satisfy an intrinsic invariance condition because the Pitman estimator of θ = 1/λ is T_n/n. However, the posterior Pitman-closest estimator of θ is 2T_n/m_{2n+2}. In light of the engineers' remarks in
Example 1.2.1, this invariance condition gives the Pitman-closest estimator a practical advantage over the MMSE estimator or the Pitman estimator. Once again, in the asymptotic context, we will see that as n increases these estimators, as BAN estimators in the conventional sense, share the APC characterization. A similar situation exists for estimation of the normal variance σ² and the precision θ = σ⁻². In view of the similarities of these estimation problems we leave the details to the reader.
Example 6.4.4 Let us consider a final example where sufficiency does not hold in a finite sample setup. Let X₁, ..., X_n be n independent and identically distributed random variables with each X_i having the double exponential or Laplace distribution discussed in Example 5.0.1, i.e.,
In this case, l_n(θ|x) = exp{−Σ_{i=1}^{n} |X_i − θ|}/2ⁿ, so that an MLE of θ is the sample median X̃_n. However, the factorization in (6.69) does not hold here, and hence X̃_n is not a sufficient statistic. Note that the probability density function f in (6.74) belongs to the location family, and by (6.66),
The computation of θ̂_{P,n} in (6.75) poses a serious numerical problem, especially when n is not very small. Similarly, by (6.66) and (6.68), we have
and the computation may also become very complicated as n increases.
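In practice the required integrals can be handled by brute force. The sketch below (our own illustration, not part of the original text; the grid and data are arbitrary) evaluates the posterior under the uniform weight function on a fine grid, so that the Pitman estimator is the posterior mean and the posterior Pitman-closest contender is the posterior median.

```python
import numpy as np

def pitman_estimators_laplace(x, grid_pts=20001, pad=10.0):
    """Pitman estimator (posterior mean under a uniform weight) and its
    posterior-median contender for the Laplace location model."""
    x = np.asarray(x, dtype=float)
    theta = np.linspace(x.min() - pad, x.max() + pad, grid_pts)
    # Laplace location log-likelihood, up to an additive constant
    loglik = -np.abs(x[None, :] - theta[:, None]).sum(axis=1)
    w = np.exp(loglik - loglik.max())              # unnormalized posterior on the grid
    w /= np.trapz(w, theta)
    post_mean = np.trapz(theta * w, theta)         # Pitman estimator
    cdf = np.cumsum(w) * (theta[1] - theta[0])
    post_median = theta[np.searchsorted(cdf, 0.5)] # posterior Pitman-closest estimator
    return post_mean, post_median

x = np.array([0.2, -0.5, 1.3, 0.1, -0.2, 0.7, 0.0])
print(pitman_estimators_laplace(x), np.median(x))  # compare with the MLE (sample median)
```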
6.4.3 PMC equivalence
In general, for probability density functions not admitting a sufficient statistic, the computation of the Pitman estimator θ̂_{P,n} or its contender θ̃_{P,n} may be cumbersome, and may require some iterative solutions. This inherent computational difficulty with θ̂_{P,n} and θ̃_{P,n} can largely be eliminated by taking recourse to asymptotic methods, which rest on the affinity of these Pitman estimators to the MLE (which is asymptotically sufficient in a broad sense). The local asymptotic normality (LAN) condition pertaining to the asymptotic
normality of the MLE also pertains to the asymptotic equivalence of Pitman estimators and the MLE, and this will be outlined here. In this setup, location-scale or exponential families of probability density functions, or even sufficiency and/or equivariance, do not play any significant roles. Relative to the likelihood function l_n(θ|x) in (6.63), let θ*_n be the MLE (i.e., l_n(θ*_n|x) = sup{l_n(θ|x) : θ ∈ Θ}), so that we may rewrite (6.64) as
As a result, by (6.66) and (6.67), we have
Now, under the usual regularity conditions permitting the LAN condition for l_n(θ|x) (see, for example, Theorem 8.1 of Ibragimov and Has'minskii (1981)), the following holds uniformly in θ belonging to a compact set K, as n → ∞:
where
is the per unit information on θ. As such, by using (6.77) and (6.78), we obtain that under the same regularity conditions pertaining to the LAN condition of θ*_n, as n → ∞,
Or, in other words, as n → ∞, √n |θ̂_{P,n} − θ*_n| → 0. In a similar manner, Sen (1992b) shows that for every fixed t (|t| < ∞), as n → ∞,
where
is the sample counterpart of I(θ), and Φ(·) is the standard normal distribution function. By (6.67) and (6.82) (and the a.s. convergence of V_n to I(θ)), we conclude that as n → ∞,
Hence, from (6.81) and (6.84), we obtain that as n → ∞,
Also, by the BAN property of the MLE θ*_n, we obtain by Theorem 6.2.1 that θ*_n is APC. Thus, by (6.85), we conclude that both forms of the Pitman estimators, θ̂_{P,n} and θ̃_{P,n}, are BAN, and hence they share the APC characterization along with the MLE θ*_n. For simplicity of presentation, we have considered here the case of θ being real valued. When θ is a p-vector, for some p > 1, (6.81) extends directly to a quadratic form involving θ − θ*_n (a p-vector) and the information matrix I(θ) (a p × p matrix), so that (6.78) again leads to (6.81) and (6.82), and extends to a multinormal distribution function with null mean vector and dispersion matrix V_n, the sample counterpart of I(θ). As such, (6.85) extends to the vector case (see Sen (1992b)), and by Theorem 6.2.3, we arrive at the APC characterization of θ̂_{P,n} and θ̃_{P,n} (along with θ*_n) within the class of AN estimators. Note that by (6.13) and (6.21), an FOANR holds for both θ̂_{P,n} and θ̃_{P,n} (even in the multiparameter case), and the results discussed in detail in §6.2 hold for such Pitman estimators as well. For this reason, in small samples, there may not be unequivocal cause for advocating the MLE θ*_n instead of either of the two Pitman estimators θ̂_{P,n} and θ̃_{P,n}; the choice may be guided to a greater extent by the nature of the Pitman empirical distribution function G_{P,n}. For large sample sizes, there may not be any real difference in these estimators, and in that context, the basic criterion for choosing an estimator from among these three may be dictated by their computational simplicities. In that respect, the Pitman estimator θ̃_{P,n} may be more complicated than θ̂_{P,n}, which, in turn, may be more cumbersome than the MLE θ*_n. In any case, iterative schemes may have to be employed, and they may be of comparable order of difficulty. For BAN estimators, it is possible to construct suitable shrinkage versions which dominate them under the quadratic norm as well as APC criteria (see Sen (1986b)). In view of the FOANR results for these Pitman estimators, such asymptotic dominance results hold for θ̂_{P,n} and θ̃_{P,n} when θ is vector valued. These shrinkage versions do not belong to the class of AN
estimators, and hence their APC characterization may not follow from the results in §6.2. Also, a Pitman-closest optimal shrinkage version may not exist even asymptotically, although in many cases an optimal shrinkage estimator (under quadratic risk) may exist. We may refer to Sen and Sengupta (1991) for some details (pertaining to a finite sample setup) which carry over to the asymptotic situation.
6.4.4 Bayes estimators
In §5.4, we have presented the basics of the Bayesian aspects of Pitman's closeness measure, termed the posterior Pitman closeness (Ghosh and Sen, 1991). Definitions 5.4.1 and 5.4.2 pertain to these developments, while Theorems 5.4.5 and 5.4.6 pertain to the basic results in this domain. Whereas the classical PC definition may suffer from lack of transitiveness and may also depend on the joint distribution of the two estimators being compared (as has been thoroughly discussed in earlier chapters), these drawbacks disappear to a larger extent in the asymptotic situations (as studied in §§6.2 and 6.3) as well as in the posterior Pitman closeness approach. As we have seen in §5.4, the posterior Pitman closeness measure has added a new dimension to Bayesian estimation theory, raising genuine concern with the unreserved use of squared error or quadratic loss in practice. For example, a Bayesian decision theorist favoring squared error loss may be naturally tempted to advocate the posterior mean as the most appropriate estimator of the parameter of interest, while a second Bayesian more inclined to absolute error loss may strongly support the use of the posterior median. For sensibly asymmetric (posterior) distributions, these prescriptions may differ substantially! Fortunately, the nonrobustness of Bayes estimators (with respect to the choice of loss functions and/or the prior on θ) has captured the close attention of Bayesians aspiring to prior-robustness, and it has already come to be recognized that Bayesian Pitman closeness may offer some convincing features (such as invariance with regard to the choice of loss functions which are arbitrary strictly monotone functions of the absolute error), which may not be shared by other conventional Bayesian estimation criteria. Contrary to the problem of finding estimators that minimize the risk R(θ, δ) = E_θ[L(θ, δ)], corresponding to a given loss function L(·), at every value of θ, in a Bayesian setup one seeks to incorporate a prior distribution Λ(θ) of θ (over Θ) in the computation of the Bayes risk,
and then to choose that δ for which R_B(δ) is a minimum. This δ is termed a Bayes estimator of θ with respect to the prior Λ. It turns out that minimizing (6.86) is equivalent to minimizing E{L(θ, δ(x)) | X = x}, and for the squared error loss function L(a, b) = k(a − b)², k > 0, the Bayes estimator with respect to the prior Λ is
Viewed from this point, it is clear that both the chosen prior Λ(θ), θ ∈ Θ, and the adopted loss function L(·) may considerably influence the choice of a Bayes estimator. In a uniparameter model, a squared error loss is often adopted, so that the Bayes estimator is given by (6.87). In this respect, it may be remarked that if Π = {Λ(θ), θ ∈ Θ} is a class of prior distributions of θ and if, for every Λ ∈ Π, the posterior distribution of θ, given X = x, is symmetric about a statistic δ₀(x) and admits a finite first moment, then δ₀(X) remains the Bayes estimator of θ with respect to the entire class Π. However, such a setup may demand a somewhat restrictive class Π for the prior distributions Λ. Even a small degree of asymmetry of this posterior distribution of θ, given X, may induce a greater degree of change in the Bayes estimators depending on the chosen loss functions. Such restrictions may become more evident if we want to achieve some optimality properties (namely, efficiency) of the Bayes estimator in a broader setup of robust priors. In an asymptotic setup, however, Bayes estimators are asymptotically efficient under quite general regularity conditions and become asymptotically independent of the prior distribution Λ(θ). We would like to assert a similar property of Bayes estimators in the light of the Pitman closeness criterion. In a simple setup of a real-valued parameter θ ∈ Θ, we define the likelihood function l_n(θ|x) as in (6.63). Under the usual (Cramér-type) regularity conditions on the density f, it is known (see Sen and Singer (1993, Chap. 5)) that for every K : 0 < K < ∞, when θ holds, as n → ∞,
where, for the sake of notational simplicity, we let l_n(θ|x) = l_n(θ), and where U*_n, I(θ), etc., are defined in (6.3) and (6.13)–(6.14). This quadratic approximation for the log-likelihood function in a neighborhood of the true parameter θ not only insures that a consistent solution of the likelihood equation exists and belongs to an O(1/√n) neighborhood of θ, but also that this solution (often termed the A-MLE) satisfies (6.13), providing the BAN and APC characterizations as in Theorem 6.1.8. Let us expand a BAN estimator as
Also, let Λ(θ|x⁽ⁿ⁾) be the posterior density of θ given x⁽ⁿ⁾ = (x₁, ..., x_n). Then we have
Therefore, under squared error loss, the Bayes estimator of θ, as formulated in (6.87), is given by
If we let
then we may naturally be tempted to use the quadratic approximation in (6.88) to provide a good approximation for (6.91). We may write
so that by (6.91),
As such, by some standard analysis, it can be shown that
where T_n is a version of the A-MLE and hence is BAN for θ. Hence, the Bayes estimator θ̂_{B,n} is BAN, so that by Theorem 6.1.8, it is an APC estimator of θ. Let us look at the posterior density Λ(θ|x⁽ⁿ⁾). Then, it can be shown that for every real t, as n → ∞,
where Φ is the standard normal distribution function. In other words, suppose that θ̃_{B,n} is the median of the posterior distribution of θ (so that θ̃_{B,n} is the posterior Pitman-closest estimator of θ with respect to the prior Λ(θ)). Then by (6.96), as n → ∞,
so that by (6.95) and (6.97), as n → ∞,
Consequently, the posterior Pitman-closest estimator θ̃_{B,n} is also asymptotically efficient in the conventional sense and is asymptotically equivalent to the Bayes estimator θ̂_{B,n}. These asymptotic equivalence results also hold for the multiparameter problem. The finiteness of the first moment of the prior density Λ(θ) is taken for granted in the derivation of the desired asymptotic result. Also, the continuity and positivity of Λ(θ) for every θ is presumed in this context. This particular class of priors {Λ(θ)} does not include the improper prior relating to the Pitman estimators θ̂_{P,n} and θ̃_{P,n}, treated earlier in this section. Note that both the Pitman estimators and the Bayes estimators are asymptotically equivalent via their asymptotic equivalence to the MLE. In either case, the posterior Pitman-closest estimators do not entail the second mean convergence condition of the MLE, and hence may require less stringent regularity conditions.
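The small-sample contrast between the two Bayes-type estimators, and their asymptotic agreement with the MLE, can be made concrete with a conjugate example (a hedged sketch added here, not from the text; the prior hyperparameters and true rate are arbitrary). For exponential data with a gamma prior on the rate θ, the posterior is gamma and visibly asymmetric for small n, so the posterior mean and posterior median differ, while both approach the MLE as n grows.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)
a0, b0 = 2.0, 1.0          # gamma(shape a0, rate b0) prior on the exponential rate theta
theta_true = 1.5

for n in (5, 50, 500):
    x = rng.exponential(1.0 / theta_true, size=n)
    a_post, b_post = a0 + n, b0 + x.sum()                       # conjugate update
    post_mean = a_post / b_post                                 # Bayes estimator under squared error loss
    post_median = gamma.ppf(0.5, a_post, scale=1.0 / b_post)    # posterior Pitman-closest estimator
    mle = n / x.sum()
    print(n, round(post_mean, 4), round(post_median, 4), round(mle, 4))
```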
6.5 Second-order efficiency and PMC
In the last three sections we have seen that BAN estimators are first-order efficient in the light of the usual mean squared error (or quadratic risk) as well as the Pitman closeness criterion, and in this context an FOANR plays a basic role. Since such BAN estimators include a large class (as depicted in §6.3 and 6.4), further studies are in order so as to force possible discrimination among them. In a conventional setup of quadratic risks, it has been observed that the concept of efficiency in estimation is linked with the degree of closeness of approximation to the derivative of the log-likelihood function (i.e., the efficient score statistic as defined in (6.89)). This relationship plays a fundamental role in asymptotic statistical inference and we refer to (6.13) for further clarification (see Rao (1961), (1962)). Rao innovated a formulation of second-order asymptotic efficiency (SOAE) of estimators, and this
led to further stimulating works in diverse areas of applications. The past three decades have seen phenomenal growth of literature on SOAE of estimators, where efficiency has been interpreted in terms of mean squared error (or quadratic risk) and by the information contained in the sample as well as in specific statistics. We refer to Ghosh (1993) for an excellent and current survey of higher-order asymptotic efficiency and its statistical applications. Higher-order median unbiasedness properties of estimators (see Akahira and Takeuchi (1981)) are known to have a basic role in the formulation of higher-order asymptotic efficiency of estimators. In this section, we provide a clear motivation for SOAE of estimators under the Pitman closeness criterion. Some of these results are adapted with some simplifications, although in less generality, from Ghosh, Sen, and Mukerjee (1992).
6.5.1 Asymptotic efficiencies
Let X⁽ⁿ⁾ be the sample point (belonging to the sample space E⁽ⁿ⁾), and let p(X⁽ⁿ⁾, θ) be the density of X⁽ⁿ⁾, where θ is a parameter belonging to a parameter space Θ. For simplicity of presentation, we treat θ as a real-valued parameter, so that Θ ⊂ ℝ. Let T_n = T_n(X⁽ⁿ⁾) be an estimator of θ, and let p(T_n, θ) be its density. Following Rao (1962), we define
Further, let a (usually, - ^) be so chosen that
Then Rao's definitions of first- and second-order efficiency can be presented as follows.
Definition 6.5.1 A statistic T_n is said to be first-order efficient if, for a suitable choice of α in (6.100),
If T_n is AN, then (6.101) is essentially related to the first-order efficiency of T_n in the conventional sense.
Definition 6.5.2 The second-order efficiency of T_n is defined by
where I(X⁽ⁿ⁾) and I(T_n) stand for the amount of information (in the Fisher sense) contained in the sample and in the statistic T_n, respectively. Actually, (6.102) examines the amount of information lost in using the statistic T_n instead of the whole sample X⁽ⁿ⁾, so that a minimum value of (6.102) (over the choice of T_n) leads to the most efficient estimator.
There are some distinctive features of Rao's definition of first- and second-order efficiency. The usual definition of the first-order efficiency of T_n entails that
so that the bias of T_n as well as its sampling variance both have important roles in (6.103). On the other hand, when efficiency is defined in terms of the score functions in (6.56), one does not need to compute the bias or sampling variance of T_n, so that there is a natural emphasis on the information contained in X⁽ⁿ⁾ relative to T_n. To stress this point further, consider any invertible transformation T*_n = g_n(T_n) from T_n to T*_n. Then
so that if T_n is (first- or second-order) efficient for θ, so is T*_n.
6.5.2 Asymptotic median unbiasedness
We have already emphasized in Chapter 5 the role of median unbiasedness of sufficient statistics in the context of Pitman-closest characterizations, and such a property remains true for a change from T_n to T*_n whenever T*_n is a strictly monotone function of T_n. Similarly, if T*_n = g_n(T_n), where g_n(t) is a strictly monotone function of t, then I(T_n) = I(T*_n). As such, if T_n is a (first- or second-order) efficient estimator of θ and if g_n(T_n) is a strictly monotone function of T_n, then T*_n = g_n(T_n) is also efficient in the same mode. On the other hand, in §4.7 we have seen the effect of ordering of estimators on Pitman closeness comparisons. This allows us to improve on an estimator in the light of the Pitman closeness criterion. Toward this end, we present the following.
Lemma 6.5.3 Let T_n be a median unbiased estimator of θ and let g_n(t) be a strictly monotone function satisfying the additional condition that either g_n(t) ≤ t for all t, or g_n(t) ≥ t for all t, with strict inequality holding somewhere. Then
for all θ, with strict inequality holding for some θ.
Proof. Consider first the case of T*_n ≤ T_n (a.e.), so that T_n and g_n(T_n) have no crossing points. Then med(T_n + T*_n − 2θ) ≤ med[2(T_n − θ)] ≤ 0, so that the right-hand side in (6.106) is at least 1/2. A similar argument holds when T_n ≤ T*_n a.e. Note that (6.91) implies that I(T_n) = I(T*_n). The transformation T_n → T*_n does not affect the efficiency picture of Rao's definitions. According to the Pitman-closer definition, an improvement is possible by inducing median unbiasedness, as can always be done in the location-scale model. Thus, it may be possible to choose an estimator T_n which is efficient in the Rao sense but can be improved on by a median unbiased version of it. This prescription provides an operating manual for finding a Pitman-closest estimator within the class of Rao's second-order efficient estimators. We consider the following examples.
Example 6.5.4 Reconsider Example 5.1.11, and note that θ̃_{P,n} in (6.72) is a median unbiased estimator of θ, and within the class of translation-equivariant estimators, it is the Pitman-closest one. Now X_{1:n} is sufficient for θ, and X_{1:n} = g_n(θ̃_{P,n}) satisfies the hypothesis of Lemma 6.5.3 with g_n(t) < t, for all t. Therefore, θ̃_{P,n} dominates X_{1:n} in the Pitman-closeness sense, although they are both second-order efficient. Incidentally, in this example, neither X_{1:n} nor θ̃_{P,n} is BAN (in the sense that the AN property does not hold), so that for Lemma 6.5.3 to hold, we do not need to restrict ourselves to the class of BAN estimators.
Example 6.5.5 Let X₁, ..., X_n be independent identically distributed random variables with the density function given in Example 4.5.1 as
Then X̄_n is unbiased for θ, it has the minimum variance, and it is efficient in the Rao sense. Let γ_n be the median of the gamma (n, 1) density, and let T_n = nX̄_n/γ_n. Then, knowing that γ_n < n for all n and X̄_n > 0 (a.e.), we have that (i) T_n is median unbiased for θ, and (ii) X̄_n = γ_nT_n/n < T_n (a.e.). From Lemma 6.5.3 we obtain that T_n dominates X̄_n in the Pitman-closeness sense; in fact, it is the Pitman-closest estimator of θ within the class of scale-equivariant estimators of θ. Here T_n belongs to the class of BAN estimators, and dominates X̄_n in the PC sense.
Example 6.5.6 A closely related example is the normal density
where both θ and σ are unknown. Let X̄_n and s²_n = Σ_{i=1}^{n}(X_i − X̄_n)²/n be the MLEs of θ and σ², based on X₁, ..., X_n. Invoking the joint sufficiency of (X̄_n, s_n), we may restrict ourselves to the class of estimators of (θ, σ) which are functions of (X̄_n, s_n) only. Let s_n*² = [n/(n − 1)]s²_n, so that s_n*² is unbiased for σ². Then (X̄_n, s_n*²) enjoys the efficiency properties in the Rao sense. If we let s_n⁰² = ns²_n/m_{n−1}, as suggested at the end of Example 3.3.1, then s_n⁰² is median unbiased for σ², so that (X̄_n, s_n⁰²) is preferable to (X̄_n, s_n*²) on the grounds of the Pitman closeness criterion.
6.5.3 Higher-order PMC
In all the examples considered above, sufficiency plays a vital role. In the absence of sufficiency, first-order efficiency holds for a broad class of (BAN) estimators, although their second-order efficiency picture may be quite different. Rao (1962) has a useful account of this. For this reason, we introduce the notion of higher-order Pitman closeness, which will be compatible with the notion of higher-order efficiency in the Rao sense.
Definition 6.5.7 Consider two sequences {T_n} and {T*_n} of competing estimators of a common parameter θ (based on the same sequence of samples). Suppose that there exists a positive number Δ, such that as n increases,
for some real c(θ) (which may depend on θ). Then the Pitman-closeness measure, P_∞(T_n, T*_n), is said to be of higher order. In particular, for Δ = 1/2,
it is said to be of second order. If c(θ) ≥ 0 for all θ, with strict inequality for some θ, then T_n is said to be second-order closer to θ than T*_n, in the Pitman sense. If this holds for all T*_n belonging to a class C, then T_n is second-order Pitman closest within the class C. In the study of this second-order Pitman closeness property of estimators, we find it convenient to make use of the following definition (where c_n is typically √n):
Definition 6.5.8 (Akahira and Takeuchi) For each positive integer k, a {c_n}-consistent estimator θ̂_n of θ is kth-order asymptotically median unbiased (denoted by k-AMU) if, for any ϑ in Θ, there exists a positive number δ, such that
Note that for c_n = √n, we have, equivalently,
uniformly in θ in a closed interval, k = 1, 2, ...; for k = 1, this reduces to the usual definition of the asymptotic median unbiasedness property. Guided by Lemma 6.5.3 and (6.111), we obtain the following result.
Lemma 6.5.9 If θ̂_n = g_n(T_n) is any k-AMU estimator of θ, while T_n is median unbiased for θ, and if g_n(t) satisfies the hypothesis of Lemma 6.5.3, then
so that θ̂_n and T_n are PC-equivalent of order k, for k ≥ 1.
The proof parallels the proof of Lemma 6.5.3, where we need to use (6.111) instead of the first-order median unbiasedness of θ̂_n and T_n. We omit the details.
Example 6.5.10 As illustrations of the utility of Lemma 6.5.9, reconsider Examples 6.5.5 and 6.5.6. In either case, the median m_n has been tabulated
207
for small values of n, while for large n, some good approximations are available. For example, we know from Johnson and Kotz (1970, p. 177) that for the chi-square distribution function with n degrees of freedom,
so that for n > 3,
and hence using s* or sn instead of s° entails only second-order Pitmancloseness. In the above examples, the gn(t] satisfies the ordering axiom in Lemma 6.5.1. This ordering enables us to study the higher-order PMC of some estimators which have the same second-order efficiency in the sense of Definition 6.5.2. As such Lemma 6.5.9 may be more informative than Definition 6.5.2. If such an ordering does not hold, we may not be in a position to apply Lemma 6.5.2. Nevertheless, we may be able to draw conclusions on second-order Pitman-closeness under some alternative regularity conditions. Recall that if (6.99) is to hold for all 0 and some A > 0, then
for all 0, so that Tn and T* are first-order PC equivalent. As such, by virtue of the results in §§6.2 and 6.3, we may confine ourselves to the class of BAN estimators which are known to be first-order PC equivalent. To study the second-order Pitman-closeness property, we tacitly assume that such BAN estimators admit second-order representations of the following form: where
(stemming from the BAN character of T n ),Tn thogonal, and
and Tn are mutually or-
where Y has a nondegenerate distribution function. Let Tn\ and Tni be two BAN estimators, each of which admits a second-order representation of
the form (6.114)–(6.116). We attach the subscript j (= 1, 2) to denote the corresponding components T_{nj}^(k), j = 1, 2 and k = 1, 2. By virtue of the BAN condition, T_{n1}^(1) = T_{n2}^(1) = T_n^(1), say. Also, both T_{n1}^(2) and T_{n2}^(2) are orthogonal to T_n^(1), and hence
As such, we have the following.
Now, by (6.104), n²[(T_{n1}^(2))² − (T_{n2}^(2))²] + o_p(1) = O_p(1), so that whenever n(T_{n1}^(2) − T_{n2}^(2)) + o_p(1) does not converge to 0 (in probability) while P{√n T_n^(1) ≤ a/√n} = 1/2 + O(1/√n), then, by (6.109) and the fact that (6.118) is 1/2 ± O(1/√n), one of T_{n1} and T_{n2} is second-order Pitman closer than the other. If, however, n(T_{n1}^(2) − T_{n2}^(2)) is o_p(1), then T_{n1} and T_{n2} are second-order PC-equivalent. This decomposition (i.e., (6.114)) is a direct extension of (6.21) to a second-order representation, and it holds for BAN estimators under additional regularity conditions (see Ghosh et al. (1992)) as well as for a general class of nonparametric statistics, treated in §6.3. However, this may not be the case if the score function φ(·) in (6.21) admits jump discontinuities (as was the case with the sample median or Type I L-statistics), where T_n^(2) is typically O_p(n^{−3/4}), so that (6.116) may not hold. If, in (6.116), n is replaced by n^{3/4}, then the corresponding term in (6.118) has to be replaced by one which is itself O_p(1). Hence (6.118) may not be 1/2 + O(1/√n), although it converges to 1/2 as n → ∞. This leads us to consider "smooth" scores
for L- or M- or R-estimators, so that the second-order Pitman closeness property can be studied by using (6.118). In a parametric framework, more precise bounds, based on more complicated analysis, have been established by Ghosh, Sen, and Mukerjee (1992). These are, however, beyond the scope of our contemplated level of presentation, and hence will not be pursued here. In closing, we remark that the primary emphasis of Ghosh et al. (1992) has been on the usual Edgeworth-type expansions for the distribution of √n(T_n − θ), whereas in (6.114) we present the second-order decomposition in terms of orthogonal components. In many cases, especially in nonparametrics, it may be less cumbersome to establish (6.114) than to develop the Edgeworth expansion. This point becomes more crucial in the multiparameter case, where an expansion for each coordinate variable can be made in light of (6.114), but an Edgeworth expansion of the joint multivariate distribution of the vector of coordinates will most likely be considerably more complicated.
We conclude this chapter and close the monograph with some parting remarks to facilitate further reading of research literature dealing with problems more complex than those treated here. For simplicity of presentation and ease of comprehension, we have exclusively discussed one-parameter estimation problems. In practice, aside from the binomial, Poisson, and exponential laws, one will commonly encounter models with at least two parameters, such as the univariate normal and the entire class of location and scale parameter families including the Cauchy, Gumbel, Laplace, logistic, and Weibull. In many other cases, such as the multivariate normal or t-distributions, the distributions have a larger number of parameters. In these multiparameter cases a primary concern in estimation theory for the practitioner is the formation of a suitable loss function. However, the consequent Pitman-closest characterizations are much more complex. Loss functions arising out of nonnegative definite quadratic forms are natural extensions of squared error loss and provide the practitioner with some reasonable choices, and there are intrinsic alternative measures in many other contexts, such as entropy loss. Even in the former case of quadratic loss functions, a proper choice of the discriminant may depend on the interrelations among the estimators, and hence uniform optimality for all such loss functions may not exist. In this framework it will become quite natural to restrict our consideration to appropriate classes of estimators by
equivariance, unbiasedness, and other properties as we did in Chapter 5. These restrictions yield some generalizations of the Pitman closeness criterion along the lines suggested in Sen (1990), (1992a) and Sen, Nayak, and Khattree (1991). Most of the results developed in Chapter 5 have been extended to these multiparameter problems, and the reader will find a recent account of these developments in Sen (1992a). In the asymptotic case we can bring BAN estimators into the multivariate framework, and Theorems 6.1.8 and 6.2.1 remain true. The Pitman estimators, Bayes estimators, and MLEs are readily adapted to these multiparameter families. With respect to p-dimensional location parameters, robust estimators may frequently satisfy the property of also being APC within a confined class. But a basic problem in the multiparameter case is that shrinkage or Stein-rule estimators of the multinormal mean vector do not have even asymptotically multinormal laws. Since these estimators are not AN, the conclusions drawn for such AN estimators are not directly applicable. Although there are some PMC derivations for shrinkage estimators of location, the techniques involved are somewhat more complicated than the ones given here (see Mason et al. (1990)). Even in the asymptotic context, these shrinkage rules can at best be described in terms of a function of a vector of BAN estimators. Sen and Sengupta (1991) have made some progress in this regard, but more suitable techniques are needed for dealing with general APC results for such functions of multinormal variables. Finally, we have restricted our results to fixed sample size estimation procedures. However, many of the results given herein are also applicable in sequential analysis. The Ghosh-Sen (1989) theorems have been extended to the sequential setup in a natural way (see Sen (1990)) and are readily extended to sequential asymptotic development. Given the advanced developments of the Pitman closeness criterion in the asymptotic setup for multiparameter models and sequential sampling schemes, it would be appropriate to say that the Pitman closeness criterion merits equal consideration with other conventional measures customarily used in asymptotic optimality studies. We echo the sentiments of Johnson and Rao in saying that even in an asymptotic construct there is no need to abandon either PMC or its generalizations or MSE in favor of the other. Rather, we need to examine more crucially the appropriate nature of loss functions and the robustness of underlying distributional models in deciding upon a criterion. It is our earnest hope that, given the discussions in this monograph, the reader will be encouraged to weigh the Pitman closeness criterion on its natural merits.
Bibliography

Arrow, K. J. (1951), Social Choice and Individual Values, Yale University Press, New Haven, CT.
Akahira, M. and Takeuchi, K. (1981), The concept of asymptotic efficiency and higher order asymptotic efficiency in statistical estimation theory, Lecture Notes, Univ. of Tokyo, Japan.
Bahadur, R. R. (1966), A note on quantiles in large samples, Ann. Math. Statist., 37, pp. 577-580.
Bar-Lev, S. K. and Ennis, P. (1988), Sign-preserving unbiased estimators in linear exponential families, J. Amer. Statist. Assoc., 83, pp. 1187-1189.
Basu, D. (1955), On statistics independent of a complete sufficient statistic, Sankhya, Series A, 15, pp. 377-380.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1990), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd Ed., John Wiley, New York.
Berkson, J. (1955), Maximum likelihood and minimum chi-square estimates of the logistic function, J. Amer. Statist. Assoc., 50, pp. 130-162.
(1980), Minimum chi-square, not maximum likelihood!, Ann. Statist., 8, pp. 457-469.
Blyth, C. R. (1951), On minimax statistical decision procedures and their admissibility, Ann. Math. Statist., 22, pp. 22-42.
(1970), On the inference and decision models of statistics, Ann. Math. Statist., 41, pp. 1034-1058.
(1972), Some probability paradoxes in choice from among random alternatives, J. Amer. Statist. Assoc., 67, pp. 366-381.
Blyth, C. R. (1986), Convolutions of Cauchy distributions, Amer. Math. Monthly, 93, pp. 645-647.
(1991), Comments on 'The closest estimates of statistical parameters', Comm. Statist., A20, pp. 3445-3452.
Blyth, C. R. and Pathak, P. K. (1985), Does an Estimator's Distribution Suffice?, Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, 1, pp. 45-52.
Boullion, T. L., Cascio, G., and Keating, J. P. (1985), Comparison of estimators of the fraction defective in the normal distribution, Comm. Statist., A14, pp. 1511-1529.
Brams, S. J. (1985), Rational Politics: Decisions, Games, and Strategy, Congressional Quarterly Press, Washington, D.C.
Brams, S. J. and Fishburn, P. C. (1992), Approval voting in scientific and engineering societies, Group Decision and Negotiation, 1, pp. 41-55.
Brown, G. G. and Rutemiller, H. C. (1973), The efficiencies of maximum likelihood and minimum variance unbiased estimators of the fraction defective in the normal case, Technometrics, 15, pp. 849-855.
Brown, L. D., Cohen, A., and Strawderman, W. E. (1976), A complete class theorem for strict monotone likelihood ratio with applications, Ann. Statist., 4, pp. 712-722.
Condorcet, M. (1785), Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix, Paris.
David, H. A. (1981), Order Statistics, John Wiley, New York.
(1988), The Method of Paired Comparisons, Griffin, London.
David, H. T. and Salem, A. S. (1991), Three shrinkage constructions for Pitman-closeness in the one-dimensional location case, Comm. Statist., A20, pp. 3605-3627.
Dugundji, J. (1966), Topology, Allyn and Bacon, Boston.
Dyer, D. and Keating, J. P. (1979a), Pitman closeness efficiency of estimators of reliability with applications to the exponential model, Proceedings of the 24th Conference on the Design of Experiments in Army Research, Development, and Testing, ARO Report 79-2, pp. 381-409.
Dyer, D. and Keating, J. P. (1979b), A further look at the comparison of normal percentile estimators, Comm. Statist., A8, pp. 1-16.
(1981), On the relative behavior of estimators of the characteristic life in the exponential failure model, Comm. Statist., A10, pp. 489-501.
(1983), On the comparison of estimators in a rectangular distribution, Metron, 41, pp. 155-165.
Dyer, D., Keating, J. P., and Hensley, O. L. (1979), On the relative behavior of estimators of reliability/survivability, Comm. Statist., A8, pp. 398-416.
Eaton, M. L. (1983), Multivariate Statistics: A Vector Space Approach, John Wiley, New York.
Efron, B. (1975), Biased vs. unbiased estimation, Adv. Math., 16, pp. 259-277.
(1978), Controversies in the foundations of statistics, Amer. Math. Monthly, 85, pp. 231-246.
(1982), Maximum likelihood and decision theory, Ann. Statist., 10, pp. 340-356.
Efron, B. and Morris, C. (1977), Stein's paradox in statistics, Scientific American, 236, pp. 119-127.
Eisenhart, C., Deming, L. S., and Martin, C. S. (1963), Tables Describing Small Sample Properties of the Mean, Median, and Standard Deviation and Other Statistics in Sampling from Various Distributions, National Bureau of Standards, Technical Note 191.
Esty, W. W. (1992), Votes or competitions which determine a winner by estimating expected plurality, J. Amer. Statist. Assoc., 87, pp. 373-375.
Farebrother, R. W. (1986), Pitman's measure of closeness, Amer. Statist., 40, pp. 179-180.
Ferguson, T. S. (1967), Mathematical Statistics, Academic Press, New York.
Fisher, R. A. (1922), On the mathematical foundations of theoretical statistics, Philos. Trans. Roy. Soc., London, A222, pp. 309-368.
(1938), Theory of Estimation, Readership Lectures, Calcutta University Press, Calcutta.
Fisher, R. A. (1973), Statistical Methods and Scientific Inference, 3rd Ed., Hafner Press, London.
Fountain, R. L. (1991), Pitman closeness comparison of linear estimators: a canonical form, Comm. Statist., A20, pp. 3535-3550.
Fountain, R. L. and Keating, J. P. (1993), The Pitman comparison of unbiased linear estimators, Statist. Probab. Lett., 11, to appear.
Fountain, R. L., Keating, J. P., and Rao, C. R. (1991), An example arising from Berkson's conjecture, Comm. Statist., A20, pp. 3457-3472.
Gauss, K. F. (1821), Theoria combinationis observationum erroribus minimis obnoxiae. Pars prior.
Geary, R. C. (1944), Comparison of the concepts of efficiency and closeness for consistent estimates of a parameter, Biometrika, 33, pp. 123-128.
Ghosh, J. K. (1993), Higher order asymptotics, NSF-CBMS Lecture Notes, IMS Series, to appear.
Ghosh, J. K., Sen, P. K., and Mukerjee, R. (1992), Second order Pitman closeness and Pitman admissibility, Institute of Statistics, University of North Carolina, Mimeograph Series, Report No. 2071.
Ghosh, M., Keating, J. P., and Sen, P. K. (1993), Comment on 'Is Pitman closeness a reasonable criterion?', J. Amer. Statist. Assoc., 88, to appear.
Ghosh, M. and Sen, P. K. (1989), Median unbiasedness and Pitman closeness, J. Amer. Statist. Assoc., 84, pp. 1089-1091.
(1991), Bayesian Pitman closeness, Comm. Statist., A20, pp. 3659-3678.
Groeneveld, R. A. and Meeden, G. (1977), The mode-median-mean inequality, Amer. Statist., 31, pp. 120-121.
Gumbel, E. J. (1954), Statistical Theory of Extreme Values and Some Practical Applications, National Bureau of Standards, Applied Mathematics Series, Volume 33.
Halmos, P. R. (1946), The theory of unbiased estimation, Ann. Math. Statist., 17, pp. 34-43.
Halperin, M. (1970), On inverse estimation in linear regression, Technometrics, 12, pp. 69-82.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986), Robust Statistics: The Approach Based on Influence Functions, John Wiley, New York.
Haunsperger, D. B. (1992), Dictionaries of paradoxes for statistical tests on k samples, J. Amer. Statist. Assoc., 87, pp. 149-155.
Hodges, J. L. and Lehmann, E. L. (1951), Some applications of the Cramér-Rao inequality, Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, Los Angeles, pp. 53-73.
Hoeffding, W. (1984), Range preserving unbiased estimators in the multinomial case, J. Amer. Statist. Assoc., 79, pp. 712-714.
Hoffman, P. (1988), Archimedes' Revenge, Fawcett Crest, New York.
Ibragimov, I. A. and Has'minskii, R. Z. (1981), Statistical Estimation: Asymptotic Theory, Springer-Verlag, New York.
Jeffreys, H. (1961), Theory of Probability, Clarendon Press, Oxford.
Johnson, N. L. (1950), On the comparison of estimators, Biometrika, 37, pp. 281-287.
Johnson, N. L. and Kotz, S. (1970), Distributions in Statistics, Vol. II, Continuous Distributions, Houghton-Mifflin, Boston.
Kagan, A. M., Linnik, Y. V., and Rao, C. R. (1973), Characterization Problems in Mathematical Statistics, John Wiley, New York.
Karlin, S. (1957), Polya type distributions, II, Ann. Math. Statist., 28, pp. 281-308.
(1958), Admissibility for estimation with quadratic loss, Ann. Math. Statist., 29, pp. 406-436.
Keating, J. P. (1980), Theory and Applications in the Comparison of Estimators, Ph.D. Dissertation, Univ. of Texas at Arlington.
(1983), Estimators of percentiles based on absolute error loss, Comm. Statist., A12, pp. 441-447.
(1984), A note on the estimation of percentiles and reliability in the extreme-value distribution, Statist. Probab. Lett., 2, pp. 143-146.
(1985), More on Rao's phenomenon, Sankhya, Series B, 47, pp. 18-21.
Keating, J. P. (1991), Karlin's corollary: A topological approach to Pitman's measure, Comm. Statist., A20, pp. 3729-3750.
Keating, J. P. and Gupta, R. C. (1984), Simultaneous comparison of scale estimators, Sankhya, Series B, 46, pp. 275-280.
Keating, J. P. and Mason, R. L. (1985a), Pitman's measure of closeness, Sankhya, Series B, 47, pp. 22-32.
(1985b), Practical relevance of an alternative criterion in estimation, Amer. Statist., 39, pp. 203-205.
(1988a), Preference regions and their probabilities based on a rectangular grid, Comm. Statist., A17, pp. 1973-1983.
(1988b), James-Stein estimation from an alternative perspective, Amer. Statist., 42, pp. 160-164.
(1991), Closeness comparison of classical and inverse regression estimators, Comput. Statist. Data Anal., 12, pp. 4-11.
Keating, J. P. and Tripathi, R. C. (1985), Percentiles, estimation of, Encyclopedia of Statistical Sciences, VI, pp. 668-674.
Kempthorne, O. (1989), The fate worse than death and other curiosities and stupidities, Amer. Statist., 43, pp. 133-134.
Kendall, M. G. and Stuart, A. (1977), The Advanced Theory of Statistics, Vol. I, 4th Ed.
Khattree, R. and Peddada, S. D. (1987), A short note on Pitman nearness for elliptically symmetric estimators, J. Statist. Planning Inference, 16, pp. 257-260.
Khattree, R. (1987), On comparison of estimates of dispersion using generalized Pitman closeness, Comm. Statist., A16, pp. 263-274.
Krutchkoff, R. G. (1971), The calibration problem and closeness, J. Statist. Comput. Simul., 1, pp. 75-89.
Kubokawa, T. (1989), Closer estimators of a common mean in the sense of Pitman, Ann. Inst. Statist. Math., 41, pp. 477-484.
(1991), Equivariant estimation under the Pitman closeness criterion, Comm. Statist., A20, pp. 3499-3523.
Landau, H. G. (1947), On the relations between certain criteria for the estimation of statistical parameters, Univ. Pittsburgh Bull., 43, pp. 143-150.
Laplace, P. S. (1774), Mémoire sur la probabilité des causes par les événements.
Lawless, J. F. (1982), Statistical Models and Methods for Lifetime Data, John Wiley, New York.
Lehmann, E. L. (1951), A general concept of unbiasedness, Ann. Math. Statist., 22, pp. 587-592.
(1983), Theory of Point Estimation, John Wiley, New York.
(1986), Testing Statistical Hypotheses, John Wiley, New York.
Lloyd, E. H. (1952), Least squares estimation of location and scale parameters using order statistics, Biometrika, 39, pp. 88-95.
Mann, N. (1969), Optimum estimators for linear functions of location and scale parameters, Ann. Math. Statist., 40, pp. 2149-2155.
Mason, R. L. (1991), Comments on 'The closest estimates of statistical parameters', Comm. Statist., A20, pp. 3453-3456.
Mason, R. L. and Blaylock, N. W. (1991), Ridge regression estimator comparisons using Pitman's measure of closeness, Comm. Statist., A20, pp. 3629-3642.
Mason, R. L., Keating, J. P., Sen, P. K., and Blaylock, N. W. (1990), Comparison of linear estimators using Pitman's measure of closeness, J. Amer. Statist. Assoc., 85, pp. 579-581.
Maynard, J. M. and Chow, B. (1972), An approximate Pitman-type 'close' estimator for the negative binomial parameter p, Technometrics, 14, pp. 77-88.
McCullagh, P. (1992), Conditional inference and Cauchy models, Biometrika, 79, pp. 247-259.
Mood, A. M., Graybill, F. A., and Boes, D. C. (1974), An Introduction to the Theory of Statistics, McGraw-Hill, New York, pp. 290, 363.
Nagaraja, H. N. (1986), Comparison of estimators and predictors from two parameter exponential distribution, Sankhya, Series B, 48, pp. 10-18.
Nayak, T. K. (1990), Estimation of location and scale parameters using generalized Pitman nearness, J. Statist. Planning Inference, 24, pp. 259-268.
Nayak, T. K. (1991), Comments on Bayesian Pitman closeness, Comm. Statist., A20, pp. 3679-3684.
Neyman, J. (1949), Contributions to the theory of the χ² test, Proceedings of the First Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman, ed., University of California Press, Berkeley, Los Angeles, pp. 239-273.
Pearson, K. (1936), Method of moments and method of maximum likelihood, Biometrika, 28, pp. 34-59.
Peddada, S. D. and Khattree, R. (1986), On Pitman nearness and variance of estimators, Comm. Statist., A15, pp. 3005-3017.
(1991), Comparison of estimators of the location parameter using Pitman's closeness criterion, Comm. Statist., A20, pp. 3525-3534.
Pitman, E. J. G. (1937), The closest estimates of statistical parameters, Proc. Cambridge Philos. Soc., 33, pp. 212-222.
(1939), Tests of hypotheses concerning location and scale parameters, Biometrika, 31, pp. 200-215.
(1979), Some Basic Theory for Statistical Inference, Chapman-Hall, London.
Plackett, R. L. (1972), The discovery of the method of least squares, Biometrika, 59, pp. 239-252.
Rao, C. R. (1947), General methods of analysis for incomplete block designs, J. Amer. Statist. Assoc., 42, pp. 541-561.
(1961), Asymptotic efficiency and limiting information, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, Los Angeles, 1, pp. 531-566.
(1962), Efficient estimates and optimum inference procedures in large samples, J. Roy. Statist. Soc., Series B, 24, pp. 46-72.
(1971), Estimation of variance and covariance components - MINQUE theory, J. Multivariate Anal., 1, pp. 257-275.
(1973), Linear Statistical Inference, John Wiley, New York.
(1980), Discussion of 'Minimum chi-square not maximum likelihood!', Ann. Statist., 8, pp. 482-485.
Rao, C. R. (1981), Some comments on the minimum mean square error as a criterion in estimation, in Statistics and Related Topics, North-Holland, Amsterdam, pp. 123-143.
(1989), Statistics and Truth, Council of Scientific and Industrial Research, New Delhi, India.
(1991), Remarks to Pitman nearness in statistical estimation, Comput. Statist. Data Anal., 12, pp. 1-17.
(1992), R. A. Fisher: the founder of modern statistics, Statist. Sci., 7, pp. 34-48.
(1993), Comment on 'Is Pitman closeness a reasonable criterion?', J. Amer. Statist. Assoc., 88, to appear.
Rao, C. R., Keating, J. P., and Mason, R. L. (1986), The Pitman nearness criterion and its determination, Comm. Statist., A15, pp. 3173-3191.
Renyi, A. (1970), Foundations of Probability, Holden-Day, San Francisco.
Robert, C., Hwang, J. T., and Strawderman, W. E. (1993), Is Pitman closeness a reasonable criterion?, J. Amer. Statist. Assoc., 88, to appear.
Romer, T. and Rosenthal, H. (1984), Voting models and empirical evidence, Amer. Sci., 72, pp. 465-473.
Salem, A. S. (1974), Alternative multivariate normal admissibility criteria, Ph.D. Dissertation, Iowa State Univ.
Sarkar, S. K. (1991), On estimating the common mean of several normal populations using the Pitman closeness criterion, Comm. Statist., A20, pp. 3487-3498.
Savage, L. J. (1954), The Foundations of Statistics, John Wiley, New York.
Scheffe, H. (1945), Geary, R. C.: Comparison of the concepts of efficiency, Math. Reviews, 6, p. 2.
Sen, A. K. (1966), A possibility theorem on majority decisions, Econometrica, 34, pp. 491-499.
Sen, P. K. (1963), On the estimation of relative potency in dilution (direct) assays by distribution-free methods, Biometrics, 19, pp. 532-552.
Sen, P. K. (1964), On some properties of the rank weighted means, J. Indian Soc. Agri. Statist., 16, pp. 51-61.
(1981), Sequential Nonparametrics: Invariance Principles and Statistical Inference, John Wiley, New York.
(1985), Theory and Applications of Sequential Nonparametrics, CBMS-NSF Regional Conference Series, No. 49, Society for Industrial and Applied Mathematics, Philadelphia, PA.
(1986a), Are BAN estimators the Pitman-closest ones too?, Sankhya, Series A, 48, pp. 51-58.
(1986b), On the asymptotic distributional risks of shrinkage and preliminary test versions of maximum likelihood estimators, Sankhya, Series A, 48, pp. 354-371.
(1989a), The mean-median-mode inequality and noncentral chi-square distributions, Sankhya, Series A, 51, pp. 106-114.
(1989b), Optimality of BLUE and ABLUE in the light of Pitman closeness of statistical estimators, Colloquia Mathematica Societatis Janos Bolyai: Limit Theorems in Probability and Statistics, pp. 459-476.
(1990), On the Pitman closeness of some sequential estimators, Sequential Anal., 9, pp. 383-400.
(1991), Some recent developments in Pitman closeness and its applications, Comput. Statist. Data Anal., 12, pp. 11-16.
(1992a), Pitman closeness of statistical estimators: latent years and the renaissance, in Current Issues in Statistical Inference: Essays in Honor of D. Basu, M. Ghosh and P. K. Pathak, eds., IMS Lecture Notes, 17, pp. 52-74.
(1992b), On Pitman empirical distribution and statistical estimation, in Data Analysis and Statistical Inference: Festschrift in Honour of Friedhelm Eicker, S. Schach and G. Trenkler, eds., Eul-Verlag, Germany, pp. 65-82.
(1992c), Isomorphism of quadratic norm and PC ordering of estimators admitting first order AN representations, Sankhya, Series A, 54, to appear.
Sen, P. K., Kubokawa, T., and Saleh, A. K. M. E. (1989), Stein's paradox in the sense of Pitman's measure of closeness, Ann. Statist., 17, pp. 1375-1386.
Sen, P. K., Nayak, T. K., and Khattree, R. (1991), Comparison of estimators of a dispersion matrix under generalized Pitman nearness criterion, Comm. Statist., A20, pp. 3473-3486.
Sen, P. K. and Saleh, A. K. M. E. (1992), On Pitman closeness of Pitman estimators, Gujarat Statistical Reviews, C. G. Khatri Memorial Volume, to appear.
Sen, P. K. and Sengupta, D. (1991), On characterizations of Pitman closeness of some shrinkage estimators, Comm. Statist., A20, pp. 3551-3580.
Sen, P. K. and Singer, J. M. (1993), Large Sample Methods in Statistics: An Introduction with Applications, Chapman-Hall, New York, to appear.
Sengupta, D. and Sen, P. K. (1991), Shrinkage estimation in a restricted parameter space, Sankhya, Series A, 53, pp. 389-411.
Severini, T. A. (1991), A comparison of the maximum likelihood estimator and the posterior mean in the single parameter case, J. Amer. Statist. Assoc., 86, pp. 997-1000.
Sheppard, W. F. (1898), On the application of the theory of error to cases of normal distributions and normal correlations, Proc. Roy. Soc., 62, p. 170.
Stein, C. (1956), Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Univ. of California Press, Berkeley, Los Angeles, 1, pp. 197-206.
Stoyan, D., Kendall, W. S., and Mecke, J. (1987), Stochastic Geometry and Its Applications, John Wiley, New York.
Strasser, H. (1985), Pitman estimators, Encyclopedia of Statistical Sciences, VI, pp. 735-739.
Takada, Y. (1991), Median unbiasedness in an invariant prediction problem, Statist. Probab. Lett., 12, pp. 281-283.
Thompson, J. R. (1989), Empirical Model Building, John Wiley, New York.
Tukey, J. W. (1960), A survey of sampling from contaminated distributions, in Contributions to Probability and Statistics, I. Olkin et al., eds., Stanford Univ. Press, Stanford.
Vecchia, D. F., Iyer, H. K., and Chapman, P. L. (1989), Calibration with randomly changing standard curves, Technometrics, 31, pp. 83-90.
Von Neumann, J. and Morgenstern, O. (1944), Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ.
Weibull, W. (1951), A statistical distribution of wide applicability, J. Appl. Mech., 18, pp. 293-297.
Yates, F. (1940), The recovery of interblock information in balanced incomplete block designs, Ann. Eugenics, 10, pp. 317-325.
Yoo, S. and David, H. T. (1993), Shrinkage constructions for Pitman domination, Sankhya, Series A, 55, to appear.
Zacks, S. (1971), The Theory of Statistical Inference, John Wiley, New York.
Zacks, S. and Even, M. (1966), The efficiencies in small samples of the maximum likelihood and best unbiased estimators of reliability functions, J. Amer. Statist. Assoc., 61, pp. 1033-1061.
Zacks, S. and Milton, R. (1972), Mean square errors of the best unbiased and maximum likelihood estimators of tail probabilities in normal distributions, J. Amer. Statist. Assoc., 66, pp. 590-593.
Index

approval voting, 73, 77
asymptotic: efficiencies, 203, 204; median unbiasedness, 204; methods, 171; normal, 178; PMC, 175; properties, 24
Bahadur representation, 186
Bayesian estimation, 14, 40, 157, 199
best asymptotically normal, 9, 136, 177
bias, 8, 11, 12
bioassay, 19, 46, 96, 157
bootstrap, 16
Borda count, 74
calibration, 48
change point, 115
circular triads, 68
collective rationality, 72
concordance, 61
Condorcet: estimator, 74; candidate, 73, 75
confidence interval, 21
consistency, 9, 10, 99, 135, 136
contaminated sample, 108
convergence: almost surely, 174; in distribution, 173; in probability, 172; in rth mean, 174
convex class, 87
convex risk, 6, 135, 188
corollaries: Karlin, 114, 115, 131; Karlin's extended, 118
crossing point, 115
distribution: Bernoulli, 46; beta, 9, 85, 160; binomial, 88, 160; bivariate normal, 37, 110, 111, 113, 135, 178, 180; Cauchy, 3, 7, 47, 57, 134, 148; chi-square, 84, 123, 147; double exponential, 3, 196; empirical, 187; exponential, 20, 121, 133, 145, 195; extreme value, 146, 157; Laplace, 137, 161, 180, 196; noncentral t, 110, 124, 127, 129; normal, 2, 79, 84, 108, 124, 135, 148, 152, 178, 180, 208; Poisson, 195; uniform, 8, 9, 84, 88, 92, 93, 147, 151, 181
Efron rule, 12, 129
equivariance, 33, 171, 195
equivariant, 15
estimation theory, 5
estimators: asymptotically BLUE, 170, 187; asymptotic median unbiased, 204; asymptotic Pitman-closest, 175, 183, 187, 189, 192; asymptotic unbiased, 135, 176; BAN, 9, 136, 177, 178; Bayes, 14, 199; best, 1, 17, 18, 177; biased, 12; bijective, 119; BLIE, 162, 167; BLUE, 162, 167; conditional median unbiased, 169; consistent, 10, 136; hyperbolic tangent, 54; higher order PMC, 206, 207; injective, 119; L-, 185, 186; least squares, 5; location invariant, 139; M-, 188; MMAD, 122; maximum likelihood, 8, 95, 196; median unbiased, 36, 120, 129, 133, 138; method of moments, 7; minimum chi-square, 18, 95, 161; minimum logit, 96; normal score, 190; Pitman, 15, 33, 143, 161, 192; Pitman-closer, 32; Pitman-closest, 32, 137; posterior Pitman-closer, 158; R-, 190; rank weighted mean, 185; randomized, 93; surjective, 119; trimmed mean, 185; UMVUE, 10, 56; Wilcoxon score, 190; Winsorized mean, 185
exponential characteristic life, 121, 133
first-order efficiency, 203
Fisher efficiency, 56, 180
Fisher information, 176, 184, 189
inequality: Cauchy-Schwarz, 164, 177; Fréchet-Cramér-Rao, 11, 177; Minkowski's, 175; mode-median-mean, 79, 123; triangle, 93
influence function, 6, 15, 51, 188, 189
interblock information, 109
intransitive, 75, 184
invariance property, 42
inverse regression, 6, 49
jackknife, 12, 16
joint information, 56, 113
least squares, 184
lemmas: Fountain, 162; Midpoint I, 141; Midpoint II, 150; Midpoint III, 155; Nayak I, 142; Nayak II, 150; Nayak III, 155; Pitman I, 140; Pitman II, 150; Pitman III, 154; PPC determination, 158; PPC preference, 159
likelihood function, 8, 176
Lindelöf, 132
linear combinations, 161
line of equality, 104
location model, 1, 142, 161, 192
loss function: absolute error, 33, 50, 56, 58, 63; definition, 41; entropy, 43, 45; normal, 52, 58; quadratic, 33, 50, 58
M-estimator, 53, 188
marginal information, 56, 113
mean absolute deviation, 3, 28, 61
mean ranks, 190
mean squared error, 10, 16, 26, 70, 99
median, 79, 121, 123, 134, 137, 207
median ranks, 89
median unbiased, 36
minimum MSE, 26
mode, 79, 123
monotone likelihood ratio, 120
order statistics, 8, 83, 121, 134, 135, 145, 162, 166
pairwise: best, 65, 75; comparisons, 103; worst, 75
paradox, 23, 80
plausibility, 72
PMC: anomalies, 65; asymptotic, 175, 180; corrected criterion, 91, 94; definition, 4; corrected for ties, 86, 91; corrected preferred, 86; controversy, 18; higher-order closeness, 206, 207; history, 31; paradox, 34, 66, 75; politics, 70, 76, 80
pooled estimator, 109
posterior Pitman closeness, 40, 158
quasi-MLE, 97
Rao-Berkson controversy, 19, 65, 94
Rao's phenomenon, 83, 143
rate of convergence, 10, 171
regression diagnostics, 6
resampling techniques, 15
risk, 29, 41, 43, 45
risk decomposition, 41
robust: estimation, 5, 15, 184; procedure, 51; regression, 6
round-robin competition, 67
score function, 189
second-order efficiency, 95, 203, 206
shrinkage, 14
simultaneous worst, 75, 78
skew symmetric, 92, 191
Stein rule, 45
superefficiency, 10
switching line, 104
switching point, 114, 115
theorems: Arrow's impossibility, 72; Basu, 144, 147, 148; canonical, 164; Central Limit, 10, 165; discrete rectangular form, 107; Geary-Rao, 105; Ghosh-Sen I, 147; Ghosh-Sen II, 153; Glivenko-Cantelli, 187; Kagan-Linnik-Rao, 144; Kubokawa-Nayak, 156; median unbiased estimation, 133; Nayak's location, 142; polar form, 108; Possibility, 74; posterior median, 159; posterior transitiveness, 159; rectangular form, 106; scale, 151; Sheppard's, 107, 113, 165, 170, 183; total expectation, 41; transitiveness, 130
ties, 86, 91, 93
topological group, 154
transitiveness, 71, 130
unbiased, 11
uniform conditional median unbiasedness, 149
uniformly minimum variance unbiased, 10
variance, 11
Voronoi tessellation, 38, 40
voting preference, 70