Theory of Preliminary Test and Stein-Type Estimation with Applications
A. K. Md. Ehsanes Saleh Carleton University Ottawa, Canada
WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: David J. Balding, Noel A. C. Cressie, Nicholas I. Fisher, Iain M. Johnstone, J. B. Kadane, Geert Molenberghs, Louise M. Ryan, David W. Scott, Adrian F. M. Smith, Jozef L. Teugels
Editors Emeriti: Vic Barnett, J. Stuart Hunter, David G. Kendall
A complete list of the titles in this series appears at the end of this volume.
Copyright © 2006 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Saleh, A. K. Md. Ehsanes.
Theory of preliminary test and Stein-type estimation with applications / A. K. Md. Ehsanes Saleh.
p. cm.
Includes bibliographical references and index.
ISBN-13 978-0-471-56375-4 (acid-free paper)
ISBN-10 0-471-56375-7 (acid-free paper)
1. Parameter estimation. 2. Regression analysis. 3. Bayesian statistical decision theory. I. Title.
QA276.8.S257 2006
2005050196

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
To SHAHIDARA, my wife
Contents

List of Figures
List of Tables
Preface

1 Introduction
   1.1 Objective of This Book
   1.2 Statistical Decision Principle
   1.3 Quadratic Loss Function
   1.4 Some Statistical Models with Preliminaries
      1.4.1 Mean and Simple Linear Models
      1.4.2 One-Sample Multivariate Model
      1.4.3 ANOVA Models
      1.4.4 Parallelism Models
      1.4.5 Multiple Regression Model and General Linear Hypothesis
      1.4.6 Simple Multivariate Linear Model
      1.4.7 Discrete Data Models
   1.5 Organization of the Book
   1.6 Conclusions
   1.7 Problems

2 Preliminaries
   2.1 Normal Distribution
   2.2 Chi-square Distribution and Properties
   2.3 Some Results from Multivariate Normal Theory
   2.4 Beta Distribution and Applications
   2.5 Discrete Distributions
      2.5.1 Binomial Distribution
      2.5.2 Multinomial Distribution
   2.6 Matrix Results
   2.7 Large Sample Theory
      2.7.1 Four Types of Convergence
      2.7.2 Law of Large Numbers
      2.7.3 Central Limit Theorems
   2.8 Nonparametric Theory: Preliminaries
      2.8.1 Order-Statistics, Ranks, and Sign Statistics
      2.8.2 Linear Rank-Statistics (LRS)
      2.8.3 Rank Estimators of the Parameters of Various Models
   2.9 Problems

3 Preliminary Test Estimation
   3.1 Simple Linear Model, Estimators, and Tests
      3.1.1 Simple Linear Model
      3.1.2 Estimation of the Intercept and Slope Parameter
      3.1.3 Test for the Slope Parameter
   3.2 PTE of the Intercept Parameter
      3.2.1 UE, RE, and PTE of the Intercept Parameter
      3.2.2 Bias and MSE Expressions
      3.2.3 Comparison of Bias and MSE Functions
      3.2.4 Optimum Level of Significance of Preliminary Test
   3.3 Two-Sample Problem and Pooling of Means
      3.3.1 Model
      3.3.2 Estimation and Test of the Difference between Two Means
      3.3.3 Bias and MSE Expressions of the Three Estimators of a Mean
   3.4 One-Sample Problem: Estimation of Mean
      3.4.1 Model
      3.4.2 Unrestricted, Restricted, and Preliminary Test Estimators
      3.4.3 Bias, MSE, and Analysis of Efficiency
   3.5 An Alternative Approach
      3.5.1 Introduction
      3.5.2 One-Sample Problem
      3.5.3 Comparison of PTE and SE
      3.5.4 Simple Linear Model and Shrinkage Estimation
      3.5.5 The Two-Sample Problem and Shrinkage Estimation
   3.6 Estimation with Nonnormal Errors
      3.6.1 Unrestricted, Restricted, Preliminary Test and Shrinkage Estimators, and the Test of Slope
      3.6.2 Conditions for Asymptotic Normality of the Unrestricted Estimators of Intercept and Slope Parameters
      3.6.3 Asymptotic Distributional Bias and Mean Square Error Expressions, and Efficiency Analysis
   3.7 Two-Sample Problem and Estimation of Mean
   3.8 One-Sample Problem and Estimation of the Mean
   3.9 Stein Estimation of Variance: One-Sample Problem
   3.10 Nonparametric Methods: R-Estimation
      3.10.1 Model and Assumptions
      3.10.2 Test of Hypothesis
      3.10.3 Estimation of Intercept and Slope Parameters
      3.10.4 Asymptotic Distribution of Various Estimators and Their ADB and ADMSE Expressions
   3.11 Conclusions
   3.12 Problems

4 Stein-Type Estimation
   4.1 Statistical Model, Estimation, and Tests
   4.2 Preliminary Test Estimation
   4.3 Stein-Type Estimators
      4.3.1 Introduction
      4.3.2 James-Stein Estimator (JSE)
      4.3.3 Positive-Rule Stein Estimator (PRSE)
      4.3.4 Sclove-Morris-Radhakrishnan Modifications
   4.4 Derivation of the Stein-Type Estimators
      4.4.1 Risk Difference Representation Approach
      4.4.2 Empirical Bayes Estimation (EBE) Approach
      4.4.3 Quasi-empirical Bayes or Preliminary Test Estimation Approach
      4.4.4 How Close Is the JS Estimator to the Bayes Estimator?
   4.5 Stein-Type Estimation When the Variance Is Unknown
      4.5.1 Introduction: Model, Estimators, and Tests
      4.5.2 Preliminary Test and Stein-Type Estimators
      4.5.3 Empirical Bayes Estimation When the Variance Is Unknown
      4.5.4 Bias, MSE Matrices, and Risk Expressions
      4.5.5 Risk Analysis of the Estimators
      4.5.6 An Alternative Improved Estimator of $\theta$
   4.6 Stein-Type Estimation: Nonnormal Distributions
      4.6.1 Model, Estimation, and Test
      4.6.2 Preliminary Test (or Quasi-empirical Bayes) Approach to Stein-Type Estimation of the Mean Vector
      4.6.3 Asymptotic Distributional Bias Vector, Quadratic Bias, MSE Matrix, and Risk Expressions of the Estimators
   4.7 Improving the James-Stein Estimator Toward an Admissible Estimator
      4.7.1 Introduction
      4.7.2 Improving $\hat{\theta}_n^S$ via PTE
      4.7.3 Iterative PTE to Obtain an Admissible Estimator
      4.7.4 Extension to the Case Where the Variance Is Unknown
   4.8 Confidence Set Estimation Based on Stein-Type Estimators
      4.8.1 Introduction
      4.8.2 Properties of the Recentered Confidence Set Based on PRSE
      4.8.3 Confidence Set Based on the Preliminary Test Estimator
      4.8.4 Asymptotic Theory of Recentered Confidence Sets and Domination of Positive-Rule Coverage Probability
   4.9 Nonparametric Methods: R-Estimation
      4.9.1 Model and Assumptions
      4.9.2 Test of Hypothesis
      4.9.3 Estimation of the Location Parameter
      4.9.4 ADB, ADQB, ADMSE, and ADQR of the Estimators of Location Parameters
      4.9.5 Asymptotic Properties of Confidence Sets
   4.10 Conclusions
   4.11 Problems

5 ANOVA Model
   5.1 Model, Estimation, and Tests
      5.1.1 ANOVA Model
      5.1.2 Estimation of the Parameters of the One-way ANOVA Model
      5.1.3 Test of Equality of the Treatment Means
   5.2 Preliminary Test Approach and Stein-Type Estimators
      5.2.1 Preliminary Test Approach (or Quasi-empirical Bayes Approach)
      5.2.2 Bayes and Empirical Bayes Estimators of Treatment Means
   5.3 Bias, Quadratic Bias, MSE, and Risk Expressions
      5.3.1 Bias Expressions
      5.3.2 MSE Matrix and Risk Expressions
   5.4 Risk Analysis and Risk Efficiency
      5.4.1 Comparison of $\tilde{\theta}_n$ and $\hat{\theta}_n$
      5.4.2 Comparison of $\hat{\theta}_n^{PT}$ and $\tilde{\theta}_n$ ($\hat{\theta}_n$)
      5.4.3 Comparison of $\hat{\theta}_n^{S}$, $\hat{\theta}_n^{S+}$, and $\tilde{\theta}_n$
   5.5 MSE Matrix Analysis and Efficiency
      5.5.1 Comparison of $\tilde{\theta}_n$ and $\hat{\theta}_n$
      5.5.2 Comparison of $\hat{\theta}_n^{PT}$ Relative to $\tilde{\theta}_n$ and $\hat{\theta}_n$
      5.5.3 Comparison of $\hat{\theta}_n^{S}$ and $\hat{\theta}_n^{S+}$
   5.6 Improving the PTE
   5.7 ANOVA Model: Nonnormal Errors
      5.7.1 Estimation and Test of Hypothesis
      5.7.2 Preliminary Test and Stein-Type Estimators
   5.8 ADB, ADQB, ADMSE, and ADQR of the Estimators
      5.8.1 Asymptotic Distribution of the Estimators under Fixed Alternatives
      5.8.2 Asymptotic Distribution of the Estimators under Local Alternatives
      5.8.3 ADB, ADQB, MSE Matrices, and ADQR of $\hat{\theta}_n^{PT}$, $\hat{\theta}_n^{S}$, and $\hat{\theta}_n^{S+}$
   5.9 Confidence Set Estimation
      5.9.1 Confidence Sets and Coverage Probabilities
      5.9.2 Analysis of the Confidence Sets
   5.10 Asymptotic Theory of Confidence Set Estimation
      5.10.1 Asymptotic Representations of Normalized Estimators under Fixed Alternatives
      5.10.2 Asymptotic Coverage Probability of the Confidence Sets under Local Alternatives
   5.11 Nonparametric Methods: R-Estimation
      5.11.1 Model, Assumptions, and Linear Rank Statistics (LRS)
      5.11.2 Preliminary Test and Stein-Type Estimators
      5.11.3 Asymptotic Distributional Properties of R-Estimators
      5.11.4 ADB, ADQB, ADMSE, and ADQR
   5.12 Conclusions
   5.13 Problems

6 Parallelism Model
   6.1 Model, Estimation, and Test of Hypothesis
      6.1.1 Parallelism Model
      6.1.2 Estimation of the Intercept and Slope Parameters
      6.1.3 Test of Parallelism
   6.2 Preliminary Test and Stein-Type Estimators
      6.2.1 The Estimators of Intercepts and Slopes
      6.2.2 Bayes and Empirical Bayes Estimators of Intercepts and Slopes
   6.3 Bias, Quadratic Bias, MSE Matrices, and Risk Expressions
      6.3.1 Unrestricted Estimators of $\beta$ and $\theta$
      6.3.2 Restricted Estimators of $\beta$ and $\theta$
      6.3.3 Preliminary Test Estimators of $\beta$ and $\theta$
      6.3.4 James-Stein-type Estimators of $\beta$ and $\theta$
      6.3.5 Positive-Rule Stein Estimators of $\beta$ and $\theta$
   6.4 Comparison of the Estimators of the Intercept Parameter
      6.4.1 Bias Comparison of the Estimators of the Intercept Parameter
      6.4.2 MSE-matrix Comparisons
      6.4.3 Weighted Risk Comparisons of the Estimators
   6.5 Estimation of the Regression Parameters: Nonnormal Errors
      6.5.1 Unrestricted, Restricted, Preliminary Test, James-Stein, and Positive-Rule Stein Estimators and Test of Hypothesis
      6.5.2 Conditions for Asymptotic Properties of the Estimators and Their Distributions
      6.5.3 Asymptotic Distributions of the Estimators
      6.5.4 Expressions for ADB, ADQB, ADMSE, and ADQR of the Estimators
   6.6 Asymptotic Distributional Risk Properties
      6.6.1 Comparison of $\tilde{\theta}_n$ and $\hat{\theta}_n$
      6.6.2 Comparison of $\hat{\theta}_n^{PT}$ and $\tilde{\theta}_n$ ($\hat{\theta}_n$)
      6.6.3 Comparison of $\hat{\theta}_n^{S}$ and $\tilde{\theta}_n$ ($\hat{\theta}_n$)
      6.6.4 Comparison of $\hat{\theta}_n^{S}$ and $\hat{\theta}_n^{S+}$
      6.6.5 Comparison of $\hat{\theta}_n^{S+}$ and $\tilde{\theta}_n$, $\hat{\theta}_n$, $\hat{\theta}_n^{PT}$
   6.7 Asymptotic Distributional MSE-matrix Properties
   6.8 Confidence Set Estimation: Normal Case
      6.8.1 Confidence Sets for the Slope Parameters
      6.8.2 Analysis of Coverage Probabilities
      6.8.3 Confidence Sets for the Intercept Parameters when $\sigma^2$ is Known
   6.9 Confidence Set Estimation: Nonnormal Case
   6.10 Nonparametric Methods: R-Estimation
      6.10.1 Model, Assumptions, and Linear Rank Statistics
      6.10.2 R-Estimation and Test of Hypothesis
      6.10.3 Estimation of the Intercepts $\theta_a$ and the Slope $\beta_a$
      6.10.4 Asymptotic Distribution of the R-Estimators of the Slope Vector
      6.10.5 Asymptotic Distributional Properties of the R-Estimators of Intercepts
      6.10.6 Confidence Sets for Intercept and Slope Parameters
   6.11 Conclusions
   6.12 Problems

7 Multiple Regression Model
   7.1 Model, Estimation, and Tests
      7.1.1 Estimation of Regression Parameters of the Model
      7.1.2 Test of the Null Hypothesis $H\beta = h$
   7.2 Preliminary Test and Stein-Type Estimation
      7.2.1 Preliminary Test (or Quasi-empirical Bayes) Approach
      7.2.2 Bayes and Empirical Bayes Estimators of the Regression Parameters
   7.3 Bias, Quadratic Bias, MSE, and Quadratic Risks
      7.3.1 Bias Expressions
      7.3.2 MSE Matrices and Weighted Risks of the Estimators
   7.4 Risk Analysis of the Estimators
   7.5 MSE-Matrix Analysis of the Estimators
   7.6 Improving the PTE
   7.7 Multiple Regression Model: Nonnormal Errors
      7.7.1 Introduction
      7.7.2 Estimation of Regression Parameters and Test of the Hypothesis
      7.7.3 Preliminary Test and Stein-Type Estimation
   7.8 Asymptotic Distribution of the Estimators
      7.8.1 Asymptotic Distribution of the Estimators under Fixed Alternatives
      7.8.2 Asymptotic Distribution of the Estimators under Local Alternatives, and ADB, ADQB, ADMSE, and ADQR
      7.8.3 ADQR Analysis
   7.9 Confidence Set Estimation
      7.9.1 Preliminaries
      7.9.2 Confidence Sets and the Coverage Probabilities
      7.9.3 Analysis of the Coverage Probabilities
   7.10 Asymptotic Theory of Confidence Sets
      7.10.1 Confidence Sets
      7.10.2 Asymptotic Properties of Confidence Sets
   7.11 Nonparametric Methods: R-Estimation
      7.11.1 Linear Rank Statistics, R-Estimators, and Confidence Sets
      7.11.2 Asymptotic Distributional Properties of the R-estimators
      7.11.3 Asymptotic Properties of the Recentered Confidence Sets Based on R-Estimators
   7.12 Conclusions
   7.13 Problems

8 Regression Model: Stochastic Subspace
   8.1 The Model, Estimation, and Test of Hypothesis
      8.1.1 The Model Formulation
      8.1.2 Mixed Model Estimation
      8.1.3 Test of Hypothesis
      8.1.4 Preliminary Test and Stein-type Mixed Estimators
   8.2 Bias, MSE, and Risks
      8.2.1 Bias and Quadratic Bias Expressions
      8.2.2 MSE Matrix and Risk Expressions
      8.2.3 MSE Matrix Comparisons of the Estimators
      8.2.4 Risk Comparisons of the Estimators
   8.3 Estimation with Prior Information
      8.3.1 Estimation of $\beta_1$ and Test of Hypothesis
      8.3.2 The Mixed Estimators
      8.3.3 Bias Expressions
      8.3.4 MSE Matrix and Risk Expressions
   8.4 Stochastic Subspace Hypothesis: Nonnormal Errors
      8.4.1 Introduction
      8.4.2 Estimation of the Parameters and Test of Hypothesis
      8.4.3 Preliminary Test and Stein-type Estimators
   8.5 Asymptotic Distribution of the Estimators
      8.5.1 Asymptotic Distribution of the Estimators under Fixed Alternatives
      8.5.2 Asymptotic Distribution of the Estimators under Local Alternatives
   8.6 Confidence Set Estimation: Stochastic Hypothesis
   8.7 R-Estimation: Stochastic Hypothesis
   8.8 Conclusions
   8.9 Problems

9 Ridge Regression
   9.1 Ridge Regression Estimators
      9.1.1 Ridge Regression with Normal Errors
      9.1.2 Nonparametric Ridge Regression Estimators
   9.2 Ridge Regression as Bayesian Regression Estimators
   9.3 Bias Expressions
      9.3.1 Bias Vector of $\hat{\beta}_n^{PT}(k)$
   9.4 Covariance, MSE Matrix, and Risk Functions
   9.5 Performance of Estimators
   9.6 Estimation of the Ridge Parameter
   9.7 Conclusions
   9.8 Problems

10 Regression Models with Autocorrelated Errors
   10.1 Simple Linear Model with Autocorrelated Errors
      10.1.1 Estimation of the Intercept and Slope Parameters when $\rho$ is Known
      10.1.2 Preliminary Test and S-Estimation of $\beta$ and $\theta$
      10.1.3 Estimation of the Intercept and Slope Parameters When Autocorrelation Is Unknown
   10.2 Multiple Regression Model with Autocorrelation
      10.2.1 Estimation of $\beta$ and Test of the Hypothesis $H\beta = h$
      10.2.2 Preliminary Test, James-Stein, and Positive-Rule Stein-Type Estimators of $\beta$
   10.3 Bias, MSE Matrices, and the Risk of Estimators When $\rho$ Is Known
   10.4 ADB, ADMSE, and ADQR of the Estimators ($\rho$ Unknown)
   10.5 Estimation of Regression Parameters When $\rho$ Is Near Zero
      10.5.1 Preliminary Test and Stein-Type Estimators (Chen and Saleh, 1993)
      10.5.2 Design of Monte Carlo Experiment
      10.5.3 Empirical Results and Conclusions
   10.6 Estimation of Parameters of an Autoregressive Gaussian Process
      10.6.1 Estimation and Test of Hypothesis
      10.6.2 Asymptotic Theory of the Estimators and the Test Statistics
      10.6.3 ADB, ADMSE Matrices, and ADQR of the Estimators
   10.7 R-Estimation of the Parameters of the AR[p] Models
      10.7.1 R-Estimation of the Parameters of the AR[p] Model
      10.7.2 Tests of Hypothesis and Improved R-Estimators of $\theta$
      10.7.3 Asymptotic Bias, MSE Matrix, and Risks of the R-Estimators
   10.8 R-Estimation of the Parameters with AR[1] Errors
   10.9 Conclusions
   10.10 Problems

11 Multivariate Models
   11.1 Point and Set Estimation of the Mean Vector of an MND
      11.1.1 Model, Estimation, and Test of Hypothesis
      11.1.2 Bias, QB, MSE Matrix, and Weighted Risk Expressions of the Estimators
      11.1.3 Risk and MSE Analysis of the Estimators
   11.2 U-statistics Approach to Estimation
      11.2.1 Asymptotic Properties of Point and Set Estimation under Fixed Alternatives
      11.2.2 Asymptotic Properties of the Point and Set Estimation under Local Alternatives
   11.3 Nonparametric Methods: R-estimation
      11.3.1 Asymptotic Properties of the Point Estimators
      11.3.2 Asymptotic Properties of Confidence Sets
   11.4 Simple Multivariate Linear Regression Model
      11.4.1 Model, Estimation, and Tests
      11.4.2 Preliminary Test and Stein-Type Estimators
      11.4.3 Bias, Quadratic Bias, MSE Matrices, and Risk Expressions of the Estimators
      11.4.4 Two-Sample Problem and Estimation of the Means
      11.4.5 Confidence Sets for the Slope and Intercept Parameters
   11.5 R-estimation and Confidence Sets for the Simple Multivariate Model
      11.5.1 Introduction
      11.5.2 Asymptotic Properties of the R-estimators
   11.6 Conclusions
   11.7 Problems

12 Discrete Data Models
   12.1 Product of Bernoulli Models
      12.1.1 Model, Estimation, and Test
      12.1.2 Bayes and Empirical Bayes Estimation
      12.1.3 Asymptotic Theory of the Estimators and the Test of Departure
      12.1.4 ADB, ADQB, ADMSE, and ADQR of Estimators
      12.1.5 Analysis of the Properties of Estimators
      12.1.6 Baseball Data Analysis
      12.1.7 Asymptotic Properties of Confidence Sets
   12.2 Product Binomial Distributions
      12.2.1 Introduction
      12.2.2 Model, Estimation, and Test of Hypothesis
      12.2.3 Asymptotic Theory of the Estimators and the Test Statistics
      12.2.4 ADB, ADQB, ADMSE, and ADQR of the Estimators
      12.2.5 Estimation of Odds Ratio under Uncertain Zero Partial Association
      12.2.6 Odds Ratios: Application to Meta-analysis of Clinical Trials
   12.3 Product of Multinomial Models
      12.3.1 The Product of Multinomial Models
      12.3.2 Estimation of the Parameters
      12.3.3 Test of Independence in an $r \times c$ Contingency Table
      12.3.4 Preliminary Test and Stein-Type Estimators of the Cell Probabilities
      12.3.5 Bayes and Empirical Bayes Method
      12.3.6 Asymptotic Properties
      12.3.7 Asymptotic Properties of the Estimators under Local Alternatives
      12.3.8 Analysis of the Asymptotic Properties of the Estimators
   12.4 Conclusions
   12.5 Problems

References
Glossary
Author Index
Subject Index
List of Figures

1.1 Display of predicted batting averages based on Stein's formula
3.2.1 Graph of quadratic bias functions of the estimators
3.2.2 Graph of MRE($\hat{\theta}_n$; $\tilde{\theta}_n$) and MRE($\hat{\theta}_n^{PT}$; $\tilde{\theta}_n$)
3.3.1 Graph of MRE($\hat{\mu}_1$; $\tilde{\mu}_1$) and MRE($\hat{\mu}_1^{PT}$; $\tilde{\mu}_1$)
3.5.1 Graph of the relative efficiency of SE and PTE for different values of $\alpha$
3.6.1 Graph of AMRE of $\hat{\theta}_n^{PT}$ and $\hat{\theta}_n^{S}$ relative to $\tilde{\theta}_n$
3.9.1 Graph of the risk components of the Stein-type variance estimator involving $\phi_s(\mathcal{L}_n)$ and $E(\chi^2 \mid \mathcal{L}_n)$
4.2.1 Graphs of $p[M_2(\hat{\theta}_n^{PT})]^{1/p}$ and $R_2(\hat{\theta}_n^{PT}; I_p)$
4.3.1 Geometrical representation of Stein's idea
4.3.2 Graphs of $R_3(\hat{\theta}_n^{S}; I_p)$ and $p[M_3(\hat{\theta}_n^{S})]^{1/p}$
4.3.3 Graphs of MRE = MRE($\hat{\theta}_n^{S}$; $\tilde{\theta}_n$) and RRE = RRE($\hat{\theta}_n^{S}$; $\tilde{\theta}_n$)
4.3.4 Graph of QB of estimators: PTE = $\hat{\theta}_n^{PT}$, JSE = $\hat{\theta}_n^{S}$, and PRSE = $\hat{\theta}_n^{S+}$
4.3.5 Graph of the risks of the estimators
4.3.6 Graph of $R(\hat{\theta}_n^{PT+}; I_p)$ and $R_2(\hat{\theta}_n^{PT}; I_p)$
4.3.7 Graphs of $R_4(\hat{\theta}_n^{S+}; I_p)$ and $R(\hat{\theta}_n^{PT+}; I_p)$
4.4.1 Empirical Bayes tree
4.5.1 Graph of QB of estimators: PTE, JSE, and PRSE
4.5.2 Graph of $R_2(\hat{\theta}_n^{PT}; \sigma^{-2}I_p)$, $R_3(\hat{\theta}_n^{S}; \sigma^{-2}I_p)$, and $R_4(\hat{\theta}_n^{S+}; \sigma^{-2}I_p)$
4.5.3 Graph of $R_3(\hat{\theta}_n^{S}; \sigma^{-2}I_p)$, $R_4(\hat{\theta}_n^{S+}; \sigma^{-2}I_p)$, and $R(\tilde{\theta}_n; \sigma^{-2}I_p)$
12.2.1a Predicted odds ratios
12.2.1b Confidence intervals of odds ratios
12.2.2a Predicted odds ratios (deleting Fallis)
12.2.2b Confidence intervals of odds ratios (deleting Fallis)
List of Tables

1.1.1 Batting Averages of 18 Players
3.2.1 Maximum and Minimum Guaranteed Efficiencies for n = 8
3.2.2 Maximum and Minimum Guaranteed Efficiencies for n = 12 and Z²/Q = 0.1(0.2)0.9
3.3.1 Maximum and Minimum Guaranteed Efficiencies
3.3.2 Maximum and Minimum Guaranteed Efficiencies
3.3.3 Maximum and Minimum Guaranteed Efficiencies
3.4.1 Minimum and Maximum Efficiency of PTE
3.5.1 Maximum and Minimum Efficiencies of SE and Efficiency of PTE at $\Delta$ for Selected $\alpha$
3.5.2 Minimum and Maximum Relative Efficiency of SE and PTE for n = 8, $\alpha$ = 0.05(0.10)0.45, and $\Delta$ = 1(0.5)5
3.5.3 Minimum and Maximum Relative Efficiency of SE and PTE for $\alpha$ = 0.05(0.10)0.45 and for Selected Samples
3.6.1 Maximum and Minimum Guaranteed Asymptotic Efficiencies of PTE
3.6.2 Maximum and Minimum Guaranteed Asymptotic Efficiencies of PTE
4.2.1 Maximum and Minimum Guaranteed MSE-Based Efficiencies
4.2.2 Maximum and Minimum Guaranteed Risk-Based Efficiencies
4.3.1 Risk Gain of PRSE over JSE
4.8.1 Decomposition of the Coverage Probability
4.8.2 Some Upper Bounds of c for $\gamma$ = 0.10 and 0.05
4.8.3 Coverage Probabilities for the Set $C^{PT}(\hat{\theta}_n^{PT}(\alpha))$ with $\gamma$ = 0.10 and $\alpha$ = 0.05
4.8.4 Coverage Probabilities for the Set $C^{PT}(\hat{\theta}_n^{PT}(\alpha))$ with $\gamma$ = 0.10 and $\alpha$ = 0.10
4.10.1 Properties of Estimators
5.5.1 Maximum and Minimum Guaranteed Efficiencies
10.4.1 Empirical Risks for Different Estimators Prior to Testing
10.4.2 Empirical Risks for Different Estimators Prior to Testing: Shrinkage Estimates
10.4.3 Empirical Risk Values for PTE Based on D-W and $G_1$ Statistic, $\alpha$ = 0.01
10.4.4 Empirical Risk Values for Shrinkage PTE Based on D-W and $G_1$ Statistic, $\alpha$ = 0.01
10.4.5 Empirical Risk Values for PTE Based on D-W and $G_1$ Statistic, $\alpha$ = 0.05
10.4.6 Empirical Risk Values for Shrinkage PTE Based on D-W and $G_1$ Statistic, $\alpha$ = 0.05
10.4.7 Empirical Risks for Different Estimators Prior to Testing
10.4.8 Empirical Risks for Different Estimators Prior to Testing: Shrinkage Estimates
10.4.9 Empirical Risk Values for PTE Based on D-W and $G_1$ Statistic, $\alpha$ = 0.01
10.4.10 Empirical Risk Values for Shrinkage PTE Based on D-W and $G_1$ Statistic, $\alpha$ = 0.01
10.4.11 Empirical Risk Values for PTE Based on D-W and $G_1$ Statistic, $\alpha$ = 0.05
10.4.12 Empirical Risk Values for Shrinkage PTE Based on D-W and $G_1$ Statistic, $\alpha$ = 0.05
12.1.1 Maximum Relative Efficiencies of the RMLE, PTE, and SE and the Intersection Efficiencies for the PTE and SE for Each $\alpha$ with Corresponding $\Delta_0$-Values, for $\alpha$ = 0.05(0.05)0.25 and p = 4(2)16
12.1.2 True Values ($\theta_i^0$) and Estimated Values of $\theta_i$ Based on Efron-Morris (EM), Empirical Bayes (EB), and Ali and Saleh Estimators $\hat{\theta}_i^{PT}$, $\hat{\theta}_i^{S}$, and $\hat{\theta}_i^{S+}$
12.1.3 Estimated Average Loss for the Estimators
12.2.1 Incidence of Pre-eclampsia in Nine Randomized Trials
12.2.2 Various Estimators of Odds Ratios
12.2.3 Revised Estimators of ORs after Deleting OR "Fallis"
Preface

The estimation of parameters of a model with "uncertain prior information" on the parameters of interest began with Bancroft (1944, Annals of Mathematical Statistics 15: 190-204). Bancroft introduced "preliminary test estimation" on the classical front, although Bayesian methods already existed. But the real breakthrough came when Stein (1956, Proceedings of the Third Berkeley Symposium 1, pp. 197-206) and James and Stein (1961, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 2, pp. 361-379) proved that the sample mean in a multivariate normal model is not admissible, under a quadratic loss, for dimensions more than two. This discovery, known as the Stein paradox, basically undermined the criteria, like unbiasedness, equivariance, and the MLE properties, that were the backbone of statistical estimation theory. Stein's theory not only improves the point estimators of location parameters (mean, regression parameters, etc.) but also improves the traditional confidence sets by sets recentered at the Stein estimators. Preliminary test estimation of locations generally performs poorly and is not uniformly better than the MLEs/LSEs, but Stein (1964, Annals of the Institute of Statistical Mathematics 16, pp. 155-160) showed that, in the case of variance estimation, preliminary test estimators are uniformly better than the usual variance estimators. In addition, the preliminary test approach improves upon the standard James-Stein estimators and leads to admissible (generalized Bayes) estimators. Due to the immense impact of Stein's approach on estimation, scores of technical papers appeared in the literature in various areas of application. The book The Statistical Implications of Pre-test and Stein-Rule Estimations in Econometrics by Judge and Bock (1978) was the first attempt directed toward the econometrics area of applications. Recently, another book, Improving Efficiency by Shrinkage: The James-Stein and Ridge Regression Estimators by Marvin H. J. Gruber (1998), has directed the estimation topic toward students and teachers as descriptive applied research material. So far, there has been no book of statistics on the topic of preliminary test and Stein-type estimation. This led me to develop such a book that would be beneficial to graduate students, researchers, and teachers alike. The mathematics and statistics prerequisites for this book are modest. It is assumed that the reader has had a basic course in mathematical statistics,
preferably having used textbooks such as An Introduction to Probability Theory and Mathematical Statistics by Rohatgi and Saleh (2001, Wiley), Statistical Inference by Casella and Berger (1990, Brooks/Cole), Mathematical Statistics: Basic Ideas and Selected Topics by Bickel and Doksum (2001, Prentice Hall), or Introduction to Mathematical Statistics by Hogg, McKean, and Craig (2005, Prentice Hall). The aim of the book is to provide a clear and balanced introduction to preliminary test and Stein-type estimation theory for students and teachers of statistics programs. We start with the two-sample problem of pooling means in a general setup, in order to help the readers understand the results and calculations that are being used in every step of the development of the solution to the problem. Then, from chapter to chapter, we raise the level of discussion.

The book consists of twelve chapters. Chapter 1 gives the introduction to the preliminary test and Stein-type estimation, with details on each model, including simple linear regression, the ANOVA model, the parallelism model, the multiple regression model, and the multivariate simple linear model. Chapter 2 contains some basic results used in the book. Chapter 3 covers the introduction to the preliminary test estimation, with applications to the one-sample problems, two-sample problems, and simple linear models. Also included is Stein's variance estimation. Chapter 4 gives an introduction to Stein-type estimation. In addition, the chapter shows that the Stein-type estimator of the mean vector is an empirical Bayes estimator, and the Sclove, Morris, and Radhakrishnan (1972, Annals of Mathematical Statistics 43: 1481-1490) modification is carried out to obtain the positive-rule Stein estimator. Some asymptotic results are given for nonnormal situations. To complete the chapter, recentered confidence sets are studied for the mean vector. Chapter 5 contains similar details for the ANOVA model, and Chapter 6 deals with the parallelism models. Chapter 7 contains the discussion of the multiple regression model with subset restrictions, while Chapter 8 is concerned with the multiple regression model with stochastic subset restrictions. The topic of ridge regression, introduced by Hoerl and Kennard (1970, Technometrics 12: 55-67), is presented in Chapter 9. Chapter 10 contains the multiple regression models with autocorrelated errors. Finally, in Chapters 11 and 12 we discuss the one-sample to two-sample problems and simple linear models in a multivariate setup and some discrete data models. The contents of the book may be covered in one semester. Various problems are included to enhance the knowledge of application of the theory.

I am grateful to my wife, Shahidara Saleh, for her support over our 54 years of marriage. While I remained a student all my life, she bore more than her share of chores and provided me with words of encouragement. My granddaughters, Jasmine and Sarah, watched me work on the manuscript, prepared tea at frequent intervals, and as a result, Sarah became interested in mathematics and statistics. Thanks are due to Professor E. O. Kreyszig, who read the manuscript thoroughly, with great interest, and edited the early versions page by page to
bring it to this final form. I am also grateful to Professor Vijay K. Rohatgi for checking some portions of the book for clarity. I wish to thank Gillian Murray for diligently typing the manuscript; without her help, the book could not have been completed. Also, special thanks are due to H. M. Kim, who read the typed version and produced all the graphs and tables in the book, as well as to Ann Woodside, Drs. Bashir Ullah, M. Oulde Haye, Patrick Farrell, Shahjahan Khan, and B. M. Golam Kibria, who read various chapters and made many good suggestions to improve readability. My thanks are due to Wiley & Sons, Inc. and to authors for the copyright permissions on several books, which helped me to finish the project. Finally, I am grateful to NSERC for supporting my research for the last three decades, the outcome of which is, besides extensive publications, twelve Ph.D. theses, several postdoctoral fellows and M.Sc. students, and this book.

A. K. Md. Ehsanes Saleh
Chapter 1

Introduction

Outline
1.1 Objective of This Book
1.2 Statistical Decision Principle
1.3 Quadratic Loss Function
1.4 Some Statistical Models with Preliminaries
1.5 Organization of the Book
1.6 Conclusions
1.7 Problems
In problems of statistical inference, the use of prior information on some or all of the parameters in a statistical model usually leads to improved inference procedures. For some or all parameters of the model, the prior information may be known or may be uncertain. Known prior information is generally incorporated in the model in the form of a constraint, giving rise to a restricted model. When such constraints hold, the analysis of a restricted model leads to an improved statistical procedure compared to the unrestricted model. The estimators (or tests) resulting from a restricted (unrestricted) model are the restricted (unrestricted) estimators (or tests) of the parameters of the model. The validity and efficiency of the restricted model analysis hold over the restricted parameter space induced by the constraints, while the same holds for the unrestricted model analysis over the entire parameter space. Therefore, the results of an analysis of the restricted/unrestricted model need to weigh the loss of efficiency against the validity of the constraints in order to choose between these two extreme inference techniques. Choosing restricted estimation/tests may be justified when full confidence may be placed in the prior information. When we encounter statistical models with "uncertain prior information" in the form of a constraint on the parameters, we may consider the "uncertain constraints" as a "nuisance parameter" in the statistical analysis of the full model. The preferred procedure to eliminate this uncertainty of the prior
information in a model is to use Fisher's recipe. This procedure calls for elimination of the nuisance parameter by a so-called preliminary test. The restricted or unrestricted model is chosen based on the validity of the uncertain prior information. The result is a compromise inference procedure between the two extremes. Bancroft (1944, 1964, 1965) was the first to implement the idea of preliminary test estimation (PTE), in an ANOVA setup, to analyze the effect of a preliminary test (PT) on the estimation of variance. The idea was borrowed from a suggestion in Snedecor's (1938) book on testing differences between two means after testing the equality of variances that are unknown. If the test accepts the equality of the two variances, then the usual t-test is to be used with the pooled estimate of variance; otherwise, the problem falls in the category of the Behrens-Fisher problem. As it became clear, the preliminary test estimation procedure depends heavily on the level of significance and yields only two extreme choices as an estimator. The natural question is what would be the optimum size of the level of significance in a preliminary test estimation procedure. Later Han and Bancroft (1968) provided a maximin procedure to determine the size of a preliminary test for the estimation of a mean in a two-sample problem to obtain an optimum "preliminary test estimator" of the mean. Mosteller (1948) looked at the problem from a Bayesian point of view and came close to the idea of empirical Bayes estimation. Kitagawa (1963) followed the idea of preliminary tests on successive occasions and obtained the distribution and moments of the preliminary test estimators in the two-sample problem. Bozivich, Bancroft, and Hartley (1956) studied the properties of the size of the preliminary test as well as the power of the test followed by a preliminary test. Bennett (1952, 1956), Huntsberger (1955), and Asano and Sato (1962) followed the application of the preliminary test approach in various directions. Cohen (1965) proved that preliminary test estimators are inadmissible, with no alternative suggestion for a superior estimator. Charles Stein (1955, 1956; see also 1981), followed by James and Stein (1961), discovered a paradoxical statistic which undermined a century and a half of work on estimation theory, going back to Gauss, Legendre, Fisher, and Rao. The result was that the sample mean vector is inadmissible under a quadratic loss for the mean vector of a $p$-dimensional multivariate normal distribution for $p \ge 3$. After 25 years of resistance to Stein's ideas, the paradox has diminished, and Stein's ideas are being incorporated into many areas of applied and theoretical statistics. Not only that, Efron (1995) in his article "The Statistical Century" lists "empirical Bayes and James-Stein estimations" as important future topics among a dozen of his favorite topics in applied statistics. In traditional statistical methodology, usually the sample mean is appropriate to estimate the population mean, as it can be proved that no other estimation rule is uniformly better than the sample mean. The paradoxical element of Stein's work is that it contradicts this basic law of statistical theory. This assertion is shown by the baseball data analysis of Efron and Morris (1973, 1975). For example, if we have three or more baseball players and we are interested in predicting future batting averages for each of the
players, then the statistician who uses Stein's method can expect to predict the future batting averages more accurately than predictions based on the separate batting averages of the players. Stein's method can be illustrated based on the famous baseball data of Efron and Morris (1975). Table 1.1.1 shows the batting averages of 18 players after their first 45 at-bats.

Table 1.1.1 Batting Averages of 18 Players (column: average after the first 45 at-bats). Source: Adapted from Table 1 of Efron and Morris (1975).

The first step in applying Stein's method is to compute the grand mean of all these averages, which is $\bar{\theta} = 0.2654$ in this case. The idea is to shrink the 18 averages toward the grand mean, which is a reasonable quantity under consideration. The next step is to determine the weighted squared distance between the 18 averages in the table and the 18 grand mean values, which becomes 19.045, and to calculate the adjusting factor $c = 1 - 15/19.045 = 0.212$, which is called the shrinkage constant. Here 15 equals the degrees of freedom (d.f.) of the weighted distance minus two. The $i$th player's predicted average $\hat{\theta}_i^S$ is then obtained from the James-Stein formula
$$\hat{\theta}_i^S = \bar{\theta} + c(\theta_i - \bar{\theta}), \qquad i = 1, 2, \ldots, 18,$$
where $\theta_i$ is the $i$th observed average. This means that the first player has the predicted average
$$\hat{\theta}_1^S = 0.265 + 0.212(0.400 - 0.265) = 0.294,$$
and the 18th player has the average
$$\hat{\theta}_{18}^S = 0.265 + 0.212(0.156 - 0.265) = 0.243.$$
Actual batting averages were 0.346 and 0.200, respectively. The preliminary test estimator in this case is 0.265 at the 0.15 level of significance. The usual sample mean vector has the following properties: (1) best unbiased estimator, (2) best equivariant estimator, (3) maximum likelihood estimator, and (4) minimax estimator. Yet its performance is inferior in terms of the expected squared loss relative to the James-Stein estimator. Observe from Figure 1.1 that the predicted batting averages are more concentrated around the grand mean compared to the original batting averages. Stein's method pulls the original estimates toward the grand mean.

Figure 1.1 Display of predicted batting averages based on Stein's formula (original batting averages on the left, predicted batting averages on the right).
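The calculation above is easy to script. The following sketch is ours, not the book's: the function name is hypothetical, and the inputs (the list of averages and the variance used to weight the squared distance) are placeholder assumptions, since the individual table entries are not reproduced above.

```python
import numpy as np

def james_stein_predict(averages, var):
    """Sketch of James-Stein shrinkage of averages toward the grand mean,
    following the steps in the text. `averages` are the observed averages
    and `var` is an assumed common variance of each average."""
    avg = np.asarray(averages, dtype=float)
    grand_mean = avg.mean()                     # theta-bar (0.2654 in the text)
    # Weighted squared distance between the averages and the grand mean
    # (19.045 in the text).
    dist = np.sum((avg - grand_mean) ** 2) / var
    # Shrinkage constant c = 1 - (d.f. - 2)/distance; with 18 players the
    # distance has 17 d.f., so the numerator is 15 and c = 0.212 in the text.
    c = 1.0 - (len(avg) - 3) / dist
    # Predicted averages: theta-bar + c * (theta_i - theta-bar).
    return grand_mean + c * (avg - grand_mean)
```

Called with the 18 table values and the appropriate variance scaling, this should reproduce the hand calculation above: $\bar{\theta} = 0.2654$, the distance 19.045, $c = 0.212$, and predictions such as 0.294 for the 0.400 hitter.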
Berger (1980a, 1985) has a detailed account of some of these developments, mostly related to the classical multinormal and specific types of exponential
families of distributions. For discrete multivariate Stein-type estimators we refer the reader to Ghosh, Hwang, and Tsui (1983). Estimators that uniformly improve over standard (least squares or maximum likelihood, among others) estimators are usually termed Stein-type estimators in honour of Charles Stein. Basically, Stein-type estimators may be obtained via a decision-theoretic approach (risk-difference approach) due to Stein or the empirical Bayes approach (EBE) due to Robbins (1955) and Efron and Morris (1972, 1973, 1977) (see also Casella, 1985). "Empirical Bayes" is a term that has many meanings, reflecting different approaches to solving problems. Efron and Morris (1973) applied this method to justify the Stein-type estimation. The general approach of empirical Bayes estimation may be depicted as
                 [Statistical Estimation]
                /           |            \
        [Classical]  [Empirical Bayes]  [Bayes]
The empirical Bayes method sits in between the classical (Neyman-Pearson) and the Bayesian approaches, borrowing pieces from each. However, Saleh and Sen (1978-1986), Sen and Saleh (1979-1987), and Casella (1985) pointed out that the Stein-type estimators involve an appropriate test statistic for testing the adequacy of uncertain prior information on the parameter space, which is incorporated in the actual formulation of the estimator. Stein-type estimators adjust the unrestricted estimator (for the full model) by an amount of the difference between the unrestricted and restricted estimators scaled by the adjusted test statistic for the uncertain prior information. Generally, these test statistics are the normalized distance between the unrestricted and restricted estimators and follow a noncentral chi-square or F-distribution with appropriate degrees of freedom. The risk or MSE of Stein-type estimators depends on the noncentrality parameter, which reflects the distance between the full model and the restricted model. The preliminary test estimators (PTE) are the precursors of the Stein-type estimators, and a careful look at the PTE reveals that a simple replacement of the indicator function by a multiple of the reciprocal of the test statistic defines the Stein-type estimators. This procedure will be known in this book as the preliminary test approach (or alternatively, we may call it a quasi-empirical Bayes approach) to shrinkage estimators. This procedure has the far-reaching implication that it combines the two diverse areas of robust estimation and shrinkage estimation. On this ground Saleh and Sen (1978-1986) took this path to broaden the improvement of standard rank estimators to shrinkage estimators.
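To make the last remark concrete, here is a minimal sketch in our own notation, not the book's: `ue`, `re`, `test_stat`, the critical value, and the shrinkage constant `c` are all generic placeholders to be supplied by the user.

```python
def preliminary_test_estimator(ue, re, test_stat, critical_value):
    # PTE: switch abruptly -- keep the restricted estimator when the test
    # accepts the prior information, else keep the unrestricted one.
    indicator = 1.0 if test_stat <= critical_value else 0.0
    return ue - (ue - re) * indicator

def stein_type_estimator(ue, re, test_stat, c):
    # Stein-type: replace the indicator by a multiple of the reciprocal of
    # the test statistic, so the adjustment shrinks smoothly with evidence.
    return ue - (ue - re) * (c / test_stat)
```

The two rules differ only in the factor multiplying the difference between the unrestricted and restricted estimators, which is precisely the preliminary test (quasi-empirical Bayes) route to shrinkage estimation described above.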
Among many benefits of Stein-type estimators is that they can be used
as a predictive tool for events of interest. For example, they could be used to predict batting averages of individual baseball players from a collection of many batting averages of several baseball players in a season (see Efron, 1975), for meta-analysis of several case-control studies, or for forecasting based on time-series data by fitting autoregressive models, among others. A broad area known as "small area estimation" has evolved based on the Stein-type estimation for predicting in a small area where the sample size is very small. The methods described in this book provide the most useful techniques for combining data from various sources, such as in meta-analysis and data mining and in many other modern topics.
1.1 Objective of This Book
The main objective of the book is to provide the readers with the mathematical and statistical foundations of preliminary test and shrinkage estimation. This will enlarge the scope for use in applied areas. Basically, we will present analytical properties of the five estimators: the unrestricted estimator (UE), restricted estimator (RE), preliminary test estimator (PTE), Stein-type shrinkage estimator (SE), and positive-rule shrinkage estimator (PRSE), to show how they can be applied to the standard models, such as linear models, parallelism models, ANOVA models, multiple regression models and their variations, multivariate models, and discrete data models. We usually assume the error distribution in these models to be normal. We also include nonparametric methods and models with nonnormal errors and provide asymptotic results.
1.2 Statistical Decision Principle
The basic elements of a statistical decision theory are the following: (1) A random experiment represented by the triplet $(\mathcal{X}, \mathcal{B}, P_\theta)$, where $\mathcal{X}$ is the sample space of the experiment, $\mathcal{B}$ is the $\sigma$-algebra generated by $\mathcal{X}$, and $P_\theta$ is the class of probability distributions over $\mathcal{X}$ indexed by the parameter $\theta \in \Omega$, where $\Omega$ is the parameter space. (2) A nonempty set $\mathcal{A}$ of actions at the disposal of the statistician. (3) A loss function, $L(\theta, a)$, that is a real-valued function defined on $\Omega \times \mathcal{A}$. These elements are related as shown:
$$(\mathcal{X}, \mathcal{B}, P_\theta) \;\longrightarrow\; \delta(X) = a \in \mathcal{A} \;\longrightarrow\; L(\theta, a)$$
If $X_n = x_n = (x_1, \ldots, x_n)'$ is observed, the statistician takes the action $a = \delta(x_n) \in \mathcal{A}$, where $\delta$ is a decision rule belonging to a nonempty set $\mathcal{D}$. If $\mathcal{A} = \Omega$, then the problem is one of estimation. If $\mathcal{A} = \{a_1, a_2\}$, then any decision rule $\delta$ partitions the sample space $\mathcal{X}$ into two disjoint sets $C$ and $\bar{C}$. If $\delta(x) = a_1 \in C$ or $\delta(x) = a_2 \in \bar{C}$, then action $a_1$ or $a_2$ is taken, respectively. This problem relates to testing of hypotheses, where $a_1$ relates to rejecting a null hypothesis and $a_2$ relates to accepting the null hypothesis. The loss incurred during the process of choosing elements of $a \in \mathcal{A}$ is defined by $L(\theta, a)$. The action $a$ is the result based on a function $\delta \in \mathcal{D}$ such that $\delta(x) = a$. Then the loss $L(\theta, \delta(x))$ is indicative of the action $a \in \mathcal{A}$; it is then a random variable. The average of the loss $L(\theta, \delta(X))$ is called the risk
$$R(\theta, \delta) = E_\theta\{L(\theta, \delta(X))\} = \int L(\theta, \delta(x))\, dP_\theta(x)$$
associated with the use of the decision rule $\delta$ from the class $\mathcal{D}$. The fundamental problem of statistical decision theory is the selection of $\delta \in \mathcal{D}$ such that the risk $R(\theta, \delta)$ is minimum for all $\theta \in \Omega$. We thus need to specify some criterion to compare various decision rules.
Definition 1.1. A decision rule 6* is said to dominate another decision rule 6’ if for all 8 E 52, R(8,6*) 5 R(6,6’). If, in addition, for some 6 E R, R(6,6*)< R(6,do), then 6’ strictly dominates 6’. Definition 1.2. A decision rule 6* is said to be admissible if for any other decision rule, 6’ E ’D, R(6,6*)5 R(6,6’) for all 6 E R. The criteria used to limit the class of V of decision rules are “unbiasedness” “invariance” and “sufficiency”. Another possible criterion is the LLminimaxf’ principle. Definition 1.3. A decision rule 6*(x)is minimax within the class V ,if 6‘ E V and sup R(6,6*)5 sup R(6,6) for all 6 E V . BEn BE n The Bayes principle with respect to a prior distribution r ( 6 ) is frequently used to choose a minimax decision rule. Then the Bayes risk of the decision rule 6(x) is given by
611.
P ( T , 6 ) = &“6,
Definition 1.4. A decision rule bB is called a Bayes decision rule if
For the estimation of 6 E R c R’ based on the loss function (6 - 6)2, a Bayes decision rule hB is given by
bB(.) = qelx = 4 =
I
eh(qz)d6;
8
Chapter 1. Introduction
this is called the Bayes estimator of 0, where h(6iz)is the posterior distribution of 6 given X = z. For details the readers are referred t o Rohatgi and Saleh (2001). One of many methods of determining minimax decision rules is to find a Bayes rule based on the prior ~ ( 0 )say, , b B ( z )such that the risk, R ( 6 , S B ) = constant for all 6 E a; then S B is a minimax decision rule. See Rohatgi and Saleh (2001, thm 8.8.2) in connection with the estimation problem.
1.3
Quadratic Loss Function
In this book we evaluate the estimators in a decision-theoretic setup based on quadratic loss functions among other loss functions such as (1) the absolute deviation function l6-6(z)j and (2) the Linex function [exp(ab(z))-bb(s) - 11 for analytical simplicity. Let 6 = (61,. . . , 6,)’ be a vector of decision rules based on a sample of size n and let 6 = (01,. . . ,6,)’ E RP. Then we have a weighted square loss function defined by
q e ,6) = n(iqx) - e)’w(s(x)- e) = nIlW
-
~ I I L
where W is a positive semi-definite (p.s.d.) matrix of weights. As a special case we also discuss unweighted quadratic loss
L ( 6 , 6 )= 4 1w - 42, where W = I, is the identity matrix. Then the corresponding risk function is given by
~ ( 6 ~=6nEOIP(x) ) - 6IIL = nEellG(x) - 6jj2, if W = I,.
Under the weighted quadratic loss function, the Bayes estimator is the mean of the posteriori distribution given X = z corresponding to the prior distribution r ( 0 ) .If W = I,, then we obtain an unbiased estimator with minimum variance. If we formulate the estimator as unbiased and linear, then the results coincide with the Gauss-Markoff theory having minimum risk or best linear unbiased estimator (BLUE). Another quantity we may consider is the mean square error (MSE) matrix defined by
M = n ~ 6 [ ( q x-) e)(s(x)- e)q. Note that the weighted quadratic risk function is R(8,6)= t r ( M W ) . The idea of using the criterion based on M is to find estimators whose mean square
1.4. Some Statistical Models with Preliminaries
9
error is small, while the R(B,6) minimizes the weighted sum of the mean squares. An estimator 6 is as good as 6’ if
R(t’8,t’s) I R(t’6,i?’6’) for all 6 E RP for a given nonzero vector i? = ([I,..
. ,tP)’. In other words, we must have
t’ { ~ E O [ ( ~- ~~)(S*(X) ( Z ) - O)’] - ~ E O [ ( ~-( 6)(8(~) X) - B ) ’ ] } f? 2 0 for all t # 0 and for all 6 E RP for the mean square error.
1.4 Some Statistical Models with Preliminaries In this section we consider some basic statistical models that are frequently used in applied statistical methodology, with preliminary information with regard to estimation and testing of hypotheses. These will be used t o discuss the preliminary test and shrinkage estimation in later chapters. Some nonparametric methods are also given with preliminaries in Chapter 2.
1.4.1
Mean and Simple Linear Models
Consider the simple model
Y
= 61,
+ El
(1.4.1)
where Y = (Yl,.. . ,Y,)’ is the observation vector of n components, 1, = (1, ... , 1)’ is a vector of an n-tuple of one’s, 6 is a constant, and E = (€1, . . . ,E ~ ) ’is a vector of n independent components distributed as N,(O, u21n), I, is the identity matrix of order n. Using the sample information and the specification (1.4.1), we obtain the unrestricted estimator (UE) of 6 by minimizing the quadratic form
(Y - Ol,)’(Y
- 61,)
(1.4.2)
as
6, = (l;ln)-ll;Y
=
Y,
(1.4.3)
where ‘L is the sample mean, distributed as N(6,u 2 / n ) .Further, the unbiased estimator of o2 is given by s2 =
-(Y - f?,l,)’(Y n-1
- 6,1,).
(1.4.4)
Chapter 1. Introduction
10 In order to test the null hypothesis the test-statistic
L, =
7218,
No : 8 = 80 against H A : 8 #
- 8012 u2
80, we use
if o2 is known
(1.4.5) The exact distribution of C, under H A (r2known) follows a noncentral chisquare distribution with one degree of freedom (d.f.) and noncentrality parameter A2/2, where (1.4.6) while L , (a2unknown) follows a noncentral F-distribution with ( 1 , ~1) - d.f. having the same noncentrality parameter (1.4.6). If 6 = 80, the distributions above reduce t o the central chi-square and the F-distribution, respectively. Next, consider the simple linear model
Y=Bl,fpx+€,
(1.4.7)
. . . ,Y,)’and x = (q, . . . ,2,)’ are the observation vector and where Y = (Yl, fixed vector of known constants, while E = ( € 1 , . . . ,E,)’ is the error vector of the model distributed as Nn(0,021,). Based on the LS/ML principle, the estimators of (8,p)’ is given by ( 1.4.8) where
1 Q = X’X - - ( 1 ’ , ~ ) ~ . n
( 1.4 9)
The exact distribution of (en,,&)’ is a bivariate normal distribution with mean vector (8,p)’ and covariance matrix
(1.4.10) In order to test the null hypothesis Ho : p = ,& against H A : p use the test-statistic
L, =
IPn - Pol2&
s:
1
#
Po, we
(1.4.11)
where (1.4.12)
1.4. Some Statistical Models with Preliminaries
11
The exact distribution of L, under H A follows a noncentral F-distribution with (1,n - 2) d.f. and noncentrality parameter A2/2,
( 1.4.13) Under H o , it follows a central F-distribution. Similarly, in order to test the null hypothesis HO : 6 = 60 against H A : 8 # 60, we use the test-statistic
(1.4.14) which follows a noncentral F-distribution with (1,n - 2) d.f. and noncentrality parameter A2/2, where
(1.4.15) Under Ho, it follows a central F-distribution. Further, in order to test the null hypothesis HO : 8 = 60, 0 = PO against HA : 8 # 80, ,d # 00,we use the test-statistic
C,
= ~,’(6, - 6 0 ,
,Bn - 00)’
-
nx n?f
Q+n?f2
)(
1:
)
(1.4.16)
whose exact distribution is a noncentral F-distribution with (2, n - 2) d.f. and noncentrality parameter A2/2, where
Under Ho, C, follows a central F-distribution. Finally, let x = (0,. . . , 0 , 1,. . . , 1)’ in the linear model where there are n1 zero’s and 722 1’s in the vector. Then we obtain the two-sample problem of estimating means and testing difference between two means. Here p1 = 8,
p2 = 8 + P ,
and p2
--PI =
0.
(1.4.18)
n n Also Z = Thus, the mle of p1 is y1 Q = *,and - n i ( nn2 Q i+nz)’ and that of p2 is j j 2 . For testing Ho : p2 = p1 against H A : 1.12 # p ~ we , use the likelihood test -2
(1.4.19)
12
Chapter 1. Introduction
where 13, follows a noncentral F-distribution with (1,nl noncentrality parameter A2/2
+ n2 - 2) d.f. and (1.4.21)
Under Ho, it follows a central F-distribution. These models will be considered in Chapter 3 for further estimators of 6 or (0, p)’.
1.4.2 One-Sample Multivariate Model Let
Y1,Y2,.. .
, Y N be N observation vectors satisfying the model
Y , = ~ + E , a = 1 , 2 ,..., N .
(1.4.22)
Here Y, = (Yal,.. . ,Yap)‘,8 = (61,.. . ,eP)’,and E, = (€,I,. . . ,E,~)’ for E, Np(O,E);X is the covariance matrix of the error vectors E,, and the error vectors €1, € 2 , ... ,E N are mutually independent. The LSE/MLE of 8 is given by the sample mean vector Y = (71,. . . ,Yp)’ such that
-
( 1.4.23) In order to test the null hypothesis HO : 6 = 80 against H A : 8 # the likelihood ratio test-statistic
80,we
use
1 3 =~ N ( Y - 6o)’X-’(Y- 80) if X is known = NllY - &11&-1,
LN = N ( Y - & , ) ’ S - ’ ( y
- 80) if
= NllY - 6ol\&l= T2
X is unknown
(Hotelling’s T 2 ) ,
(1.4.24)
where N
s = C(Ya- Y)(Y, - Y)’. a=l
We then write
For detailed information, see Anderson (1984)’ Srivastava and Khatri (1979)’ and Rao (1973),
1.4. Some Statistical Models with Preliminaries
13
The exact distribution of C, (when C is known) is a noncentral chi-square distribution with p d.f. and (when C is unknown) it follows a noncentral Fdistribution with ( p , N - p ) d.f., having the noncentrality parameter A2/2 in both cases, where
A2 = N ( 6 - 60)’E-~(6 -60) = N1/6 - 601l5-1.
(1.4.25)
If 6 = 80, then it follows a central chi-squared and F-distribution, respectively. Now, if we assume X = u21plwe have to estimate only one parameter in the covariance matrix. Estimation of the mean vector 6 = (01,.. . ,6,)’ is considered in Chapter 4 when C = I,, and the general estimation problem when X is unknown is considered in Chapter 11.
1.4.3 ANOVA Models Let the observation vector Y be modeled as
Y=B6+&,
(1.4.26)
where
y 6 E
= (Yll,... ,Ylnl;Y21;...Yznz;... ;Ypl, ... 7 Ypn, )‘ = ( 0 1 , . . . ,6,)‘
= (€11,. * * ,Elnl;.
. . ;&PI,.* . ,E p n , ) /
B = Block diagonal matrix = Diag(l,, ,. . . , In,) Int = ( I , . . . ,1)’ a ni-tuple of ones 72
= n l + . . . + n.,
(1.4.27a) (1.4.27b) ( 1.4.27~) (1.4.27d) (1.4.27e) (1.4.27f)
The joint distribution of the components of E is &(O, a2B),and u2 is the variance of errors. Using the LS principle/MLE method, we obtain the estimator of 6 as
e,
=
(B’B)-’B’Y = N-~B’Y,
where N = B’B = Diag(n1,. . . , 72,). of u2 is then given by
en -
s; =
The corresponding unbiased estimator
-(Y - Be,)’(Y - Be,). n-p
(1.4.28)
(1.4.29)
N,(6, a2N-l) and (n - p)s2/a2 follows a central chi-square Moreover, distribution with ( n - p ) d.f. independent of 6,.
Chapter 1. Introduction
14
In order to test the null hypothesis HO : 6 = 601pagainst H A = 6 # &lP, where 90 is a constant, we use the likelihood ratio test-statistic (1.4.30) where H = ( I p - $1,1LN) and 601, = :lPlbN6, is the pooled estimator of 6 . The exact distribution of C, is a noncentral F-distribution with (p-1, n-p) d.f. and noncentrality parameter A2/2 where
A2 = (e’H’NH6)
(1.4.31)
62
If 6 = 601p,it follows the central F-distribution. This ANOVA model will be the subject of Chapter 5.
1.4.4 Parallelism Models Let Yna=6z1n*+flzxz+E,,
i=l,2,
-
... ,p,
(1.4.32)
where Y n z = ( y i i , . . . in,)', X, = (GI,. . . ,zznZ)’ and E , = (&,I,. .. , E , ~ , ) ’ . In addition, E , N ( 0 , a 2 1 , , ) , where In% is the identity matrix of order n,. Thus, we have p linear models with different intercepts and slope parameters. If = ... = p p = p (same), then we have p parallel simple linear models used in many problems of bioassay and shelf-life determinations in pharmaceutical problems. Now, consider the LSE/MLE - of the parameters {(6,,,5,)’;z= l , . .. ,p}. Let us define P = (Ti,... , Y p ) ’as the vector of means of the observations in p models, k = (Z1, . . . ,Zp)’ being the mean vector of the known z-vectors, and let = . . . ,&)’ be the LSE/MLE of the slopes in p-models. Then the intercept vector 6 = (61,. . . ,OP)’ may be estimated as 6, = ( e l , . .. ,I$)’, where
Pn (p,
8,=T,-p,Z,,
i = l , . . . ,p.
(1.4.33)
Let
T, = Diag(Z1,. . . ,Z,).
-
-
-
(1.4.34)
Then we can write the vector 6 , = (61,. . . ,6,)’ compactly as
0, = -iT - T,P,.
( 1.4.35)
If we want to test the null hypothesis HO : p = flol,, where POis a specified constant, against H A : /3 # polp (parallel vs. not parallel), we can use the LR test-statistic (1.4.36)
15
1.4. Some Statistical hdodels with Preliminaries where 1 nQ
H = I p - -1
P
l‘D;;,
nQ = X n i Q i ,
(1.4.3713)
D i . = Diag(niQ1,. . . ,npQp), 1
(1.4.37~)
niQi = X : X ~- -(Ikixi), ni
c(Yz P
s: = (n - 2p)-l
z=1
( 1.4.37a)
i=l
- &lnz - &xz)’(Yz- &lnZ - ,&xz).
(1.4.37d)
The exact distribution of L, follows a noncentral F-distribution with ( p , n - 2p) d.f. and noncentrality parameter A2/2, where
1 1 A2 = ,zP’H’D~~HP= - p ( P - Polp)’D22(P- P o l p ) .
(1.4.38)
Under Ho, L,, follows a central F-distribution with (p - 1, n - 2 p ) d.f. We will discuss this model in Chapter 6.
1.4.5 Multiple Regression Model and General Linear Hypothesis Consider the multiple regression model
Y=xp+€,
(1.4.39)
where Y is n x 1 vector, X is a n x p matrix of known constants, P = (PI,.. . ,,LIP)’ is a p x 1 vector and E is a n x 1 vector of errors distributed as N,(O, &,). In many applied problems one formulates the response variable Y as above and tries to assess the linear hypotheses of the form
(1.4.40) In this case, we are interested in estimating /3 under the constraint HP = h and to test a hypothesis concerning the constraint. Thus, the unrestricted LSE of ,f3 is given by
p, = (x’x)-’x’Y
=c-lx’y,
c = X’X.
When the restriction is applied, we obtain the estimator
bn = p , - C-lH-’(HC-’H’)-l(Hp,
(1.4.41)
p,,
- h).
(1.4.42)
Chapter 1. Introduction
16
In order t o test the null hypothesis Ho : HP = h against H A : HP # h, we use the test-statistic
c,
= q-1s;2
{ (HP, - h)’(HC-’H’)-l(HP,
- h)} ,
(1.4.43)
where s: = ( n - p)-’(Y - XP,)’(Y
-
XP,).
(1.4.44)
The exact distribution of C, is a noncentral F-distribution with (q, n - p ) d.f. and noncentrality parameter A2/2, where
A’ = (HP - h)’(HC-lH)-l(HP - h)/c2.
(1.4.45)
Under Ho, 13, follows a central F-distribution. Further, the estimation of p will be discussed in Chapter 7. There are three important variations of the basic model discussed above: (1) (X’X) is a singular/ill-conditioned matrix that prevents a reasonable inverse of (X’X). This means that there is multicollinearity among the elements of X, and X has rank ( 5 p ) or one of the characteristic roots of X’X is very small. (2) The restriction h = HP is not exact; rather, it is of the form
h = HP + V ,
v
N
Nq(6,a2S2),
(1.4.46)
where 6 # 0 and fl is a q x q matrix of known constants that reflects the precision of an earlier sample study. (3) The error vector in the model (1.4.39) has components related by a first-order stationery autoregressive process.
In case (l),the LSE of
p is obtained
by solving the normal equations
(X’X)P = X’Y.
(1.4.47)
The general solution for p is /3 = GX‘Y where G is the generalized inverse of (X’X) and the solution for p is non unique, and for the non-full rank, there exists an infinite number of solutions. Hoerl and Kennard (1970) devised a method to overcome the difficulties above. Thus, they proposed the solution called the “ridge estimator” of p, which is defined by
BHK = (X‘X + kIJ1X’Y
= (C
+ kIJlX’Y,
(1.4.48)
where k > 0 is a positive real number known as “ridge constant” for the estimator of p. The computation of the trace of
(C)-’
and
(C+kI,)-l
( 1.4.49)
shows that P
.
(1.4.50)
17
1.4. Some Statistical Models with Preliminaries
. , X p are the characteristic roots of X'X
where XI,..
tr[C+
kip]-' =
= C and
Xi
P
i=l ( X i
+ k)'
(1.4.51) '
Hence, tr[C
+ kip]-' 5 tr(C-'),
(1.4.52)
aHK
and the ridge estimator has smaller variability than /jn. There are various ways of deriving the ridge estimator of p and the basic problem becomes that of the estimation of k . The ridge estimator will be discussed in Chapter 9. As for the case (a), consider the model
Y=xp+E
(1.4.53)
-
subject to h = HP + v, where v N b ( O , a2C?),instead of assuming v to be non stochastic equaling 0. Here, C? is a known q x q invertible matrix. The covariance matrix may reflect the information from previous samples or some prior information regarding the sizes of the elements of 0. Thus, we can write (1.4.54) where (1.4.55) After suitable transformations we can write the model above as
[
Y f2t-'/2h
] [? =
C?-01/2
] [ c p+6 ] [ +
&
fi-'/'(v - 6)
]
(1.4.56)
subject t o =O
or
6=0.
(1.4.57)
It is clearly
Now, the unrestricted estimator (i.e. without model restriction) of
-
is
[ pn 3 h
[
by the LSE principle, with the unbiased estimator of o2 as s; =
-(Y'Y - y'xc-'X'Y). n-p
(1.4.59)
18
Chapter 1. Introduction
The restricted estimator of the parameters are
[
R A 6
] [ p,
- C-lH’[(HC-lH’) + O]-l(Hp, - h) h - a[(HC-’H’) + a]-l(HB, - h)
=
.
(1.4.60)
The test-statistic for testing the null hypothesis Ho : h = H,f3 is given by
cn =
(HB,
- h)’[HC-lH’
+ s2]-1Hp,
- h) 7
QS2
+
(1.4.61)
which under H A : h = HP 6, S # 0 follows a noncentral F-distribution with (9, n - p ) d.f. and noncentrality parameter A2/2, where a2
=
G’[HC-IH‘ + 0]-16
(1.4.62)
02
This model will be discussed in Chapter 10. Some background material may be obtained from Graybill (1976), Gruber (1998), Judge and Bock (1978) among others. Finally, the regression model that arises particularly in econometrics may be stated as
Y=Xp+E, where E = ( € 1 , . . . ,E,)’ and the components of stationary autoregressive process et = pet-1
+ at,
E
are related by a first-order
~ ( v t= ) 0, Var(v,2) = a : ,
(
for all t ,
7
(1.4.63)
and lpI < 1. This autoregressive specification implies that
[i
E(ee’)= D 2 R = o2
where
R-1 =
1
fn-‘
lt’
P 1
... ...
Pn-2
...
0
...
pn-1
... lfp2 -P
-p
j ,
(1.4.64)
(1.4.65)
and has n - 1 characteristic roots equaling (1 - p 2 ) - I with one root equal to one. Using the generalized LSE principle, we obtain the LSE defined by
b(R) = (XR2S1X)-1X’R-1Y,
(1.4.66)
where E[b(R)]= P and Cov[b(S2)]= o ~ ( X ’ R - ~ X ) -whenever ’ R is known. If R is unknown, then we estimate p, say, by b and plug in &3) in the expression above to obtain b(h2).Various other estimates are desired and will be obtained in Chapter 10.
19
1.4. Some Statistical Models with Preliminaries
1.4.6 Simple Multivariate Linear Model Consider the simple multivariate linear model
Y , = e + px,
+ E,,
E,
- N,(o,XI,
. . ,N ,
= 1,.
(1.4.67)
-
where Y , = (Y,l,. . . ,Y,,)’ is the observation vector, x, is a fixed known constant and E, = ( & , I , . . . ,E,,)’ N,(O, X).The LSE/MLE of 8 and p are given by SN = N PN =
T-PNT,
(1.4.68a)
(Y, - Y ) ( x , - z)
Q
,=l
( 1.4.68b)
with N
Q=
C(xa- S ) 2 .
(1.4.68~)
,=I
In order t o test the null hypothesis Ho : p = 0 against H A : p the test-statistic
CN
= Q ( & S - l f i N ) = T 2 (Hotelling’s T 2 ) ,
# 0, we use (1.4.69)
where
is the unbiased estimator of C. The exact distribution of C N is the Hotelling’s T 2 distribution. Equivalently, C, has a noncentral F-distribution with ( p , N - p ) d.f. and noncentrality parameter A2/2, where
A2 = Q(j3’X-1p).
(1.4.71)
As a special case, let us assume 2, = 0 if Q = 1,... ,N1, and 1 if Q = NI + 1,. .. ,N . Then we have the two-sample multivariate mean problem where
Also,
Nl
-
X=
Nl
+ N2
and
Q=
Nl N2 N1+ N2’
( 1.4.73)
Chapter 1 . Introduction
20
The mle of p 1 is 71 and that of pz is Yz based on the samples of sizes N1 and N2, respectively. Further, the mle of X is the pooled sample covariance
+
c N
( Y , -Yz)(Y, -Yz)’
a=Ni+l
In order to test the null hypothesis Ho : p 2 = p l against H A : pz # p l , we use the LR test-statistic as Y z - Y1) L N = N 1 N 2 (Pz- Y1)I s,-1 (Ni + Nz = T2
( 1.4.75)
(Hotelling’s 7’’)
The exact distribution is a noncentral F distribution with ( p , N1 d.f. and noncentrality parameter A 2 / 2 ,where
+ Nz - p ) (1.4.76)
Under Ho, it follows a central F-distribution. More estimation will be discussed in Chapter 11 for the regression and the two-sample problems.
1.4.7 Discrete Data Models Here we consider three selected discrete data models.
Product Bernoulli Distributions. Let { ( x i l , .. . ,xin, li = 1,.. . ,p } be a set of mutually independent Bernoulli random variables (r.v.’s) with joint distribution b
n,
(1.4.77) Let Y = ( y l , . . . , y p ) , y z = C;:, z t 3 .Then the MLE of 6 = (el,. . . ,e,)’ is given by en = (81,.. . ,jp)’, where 6z = n, ‘yi, i = 1,. . , p . If 81 = . . . = OP = 80 (unknown), then the MLE of 00 is 6on = n-l(y1 . . . y p ) . In vector notation, we then write
+ +
6, = (61, ... ,8,)’,
6, 6,
= 60,lp, = l,lbNB,,
1, = (1,. . . , l)’,
N = Diag(nl, . . . , n p ) .
(1.4.78)
1.4. Some Statistical Models with Preliminaries
21
In order to test the null hypothesis Ho : 8 = 6 0 1 p , we use the test-statistic
D, = n(6,
-
h,)'pg1(8, -
(1.4.79)
where
9;'
= [&(l - &)]-'Diag
( s l . . .3). n n
As n + m, D, approximately follows a chi-square distribution with ( p - 1) d.f. under Ho. Product Binomial Distributions. Consider k mutually independent 2 x 2
f""l
tables where each row is the outcome
...
521
n21 - 2 2 1
... n2J
- 523
52k
n2k
-52k
(1.4.80) of a binomial experiment. Then, the joint distribution of 1,.. . , k} is the product binomial given by
{(z131z23)jj =
(1.4.81) We are interested mostly in the estimation of the ((odds-ratios", 7c) = (41,.. . ,$k)' and homogeneity of the odds-ratios, meaning 7c) = $elk ($0 is a scalar), where
( 1.4.82) By invariance properties of MLE, we have the mle of II, as (1.4.83) and el, and 8z3 are the MLE of and 0z3, respectively. If 7c) = $ O l k holds, then the common estimator of $0 due to Gart (1992) is given by k
k
(1.4.84) j=1
where I;;
=
+ n I J & j ( l - &J 1
1 n2,&3(1
- &,)'
j = 1, ... ,k.
(1.4.85)
In order to test the homogeneity of the odds-ratiosl we consider the Waldtype statistic
Chapter 1. Introduction
22
where W = Diag(G1,. . . ,Gk). The asymptotic distribution of D , follows a central chi-square distribution with (k - 1) d.f. under Ho : = $olk. Product Multinomial Distributions. Consider a ( r x c)-contingency tables for two traits. Let xV stand for the cell frequency of the (z,j)-cell for z = l , . . . , T ( 2 2) and j = 1,. . . , c ( 2 a), and let xz+ = C,”=, xZJ,x + ~-
+
El=,x z J ,and n = Crzl xzJ. The probability distribution of the vector frequency n = (211 . ..xlc,x21, . . . ,xzC,.. . , z r l , . . . ,xrc)’ is given by the product multinomial distribution n!
{
-1 fi(X..d}
ITJ=1 r
c
(1.4.87)
JJ~zJ)21~,
z=1
where 6 = {0,,12 = 1,.. . ,r,
J
. . ,c} with 6 ’ l k
= 1,.
C
r
C
Oi+ =
= 1. Define
Oij
and
0+j =
C
oi,.
(1.4.88)
i=l
j=1
If there is independence structure in the table, then we have
Note that in (1.4.87) we have rc - 1 free parameters, while in (1.4.89) we deal with T + c - 2 free parameters. In order to test for the null hypothesis
Ho : 0ij = 0i+ . 0+3
for all ( i , j ) ,
we consider the following chi-square test:
(1.4.90) where 6, = (&+, . .. ,elc,.. . , & I , . . . ,BTC)’,
x-. 023. -- -2
and an=
with
I
8i,
.
.
( & + , . - . , & c , * - ., e r l , - . - ,6rc)’,
= &+ . 0 + j , 0i+ = x i + / n and
a+, = x + j / n . Also,
9 = Diag(811,. . . ,e l c , . . . ,&.I,. . . ,&).
(1.4.91)
It may be shown, following Bishop et al. (1975) and Agresti (1990), that D, closely follows the chi-square distribution with ( r - l)(c - 1) d.f. as n -+ m. The improved estimators of the parameters of the three models will be the subject of Chapter 12.
1.5. Organization of the Book
1.5
23
Organization of the Book
The book is divided into 12 chapters that cover most of the useful models in applied statistics. In Chapter 2 we present properties of the normal and chisquare distributions together with results involving the multivariate normal and discrete distributions. Matrix results and formulas for the calculations of risks of the estimators, together with the Stein formulas, are provided for ready references. Some preliminaries of nonparametric methods are also included in this chapter. Preliminary test estimation is introduced in Chapter 3 with the simple linear model. Chapter 4 involves the introduction of Stein-type shrinkage estimation in a simple multivariate model. A general development of the estimation of the mean vector when the covariance matrix is unknown is deferred until Chapter 11. The ANOVA model is discussed in Chapter 5, and Chapter 6 extends the results of the simple linear model to several linear models that may be parallel. Chapter 7 deals with the general linear model with linear hypothesis and estimation regression parameters. Chapters 8 and 9 contain extensions of Chapter 7 in two ways: (1) parameters are restricted by stochastic constraints, and (2) the design matrix may be ill-conditioned, leading to “ridge regression.” In Chapter 10 we consider the linear regression model where the errors are generated by a first-order stationary autoregressive process. Chapter 11 deals with the general problem of estimation of parameters with one-sample, twosample, and simple multivariate linear models when the error distribution is a pdimensional normal. Chapter 12 deals with three basic discrete data models, namely, product Bernoulli distributions, product binomial distributions, and the product multinomial distributions and the related estimation of the parameters and application to meta-analysis.
1.6
Conclusions
In this chapter we presented a historical perspective on the preliminary test and Stein-type estimators, illustrated by an example of the “baseball data” and the objective of the book. We also discussed the decision-theoretic approach to estimation and testing. Finally, we have presented the models that will be covered in Chapters 2 through 12 of this book.
1.7 Problems 1. (Refer to Section 1.4.1) Consider the model Y = 01, Show that
-
-
(a) en = y ,
(b) si = &(Y - &l,)’(Y - &l,)
+ E.
Chapter 1. Introduction
24
( c ) Show that the likelihood ratio test for HO : 8 = 60 versus H A : 6 # 80 is given by
L,
nl&
=
-
-
&,I2
0 2
- nI6n - 6oI2 -
if
0’
is known
if
0’
is unknown.
sE
(d) What is the distribution of Ln under Ho and under HA?
2. Consider the model (Equation 1.4.7)
Y
= 81,
+ Px +
E
E,
-
N,(O, 0’1,)
(a) Show that the LSE/MLE of (6,p) is given by i$[X’Y
Y
-pnz
-
;(lwlLY)l
with covariance matrix
(b) Show that the likelihood ratio test for HO : ,O = 00 against H A : /3 # ,f30is given by
L , = nIpn
-
02
,001~Q
if o2 is known
(c) Determine the distribution of L,.
3. Consider the two-sample problem by taking x = (0,. . . , O , 1,.. . , 1)’ in the linear model, then verify (1.4.19)-(1.4.21) and distribution of L,.
4. Let Y1,Y 2 , .. . , Y Nbe N pvector observations
Y , = e+E,, Let
y
= N-I
C,”=,Y,
a = 1,... , N ,
E,
N,(o,c).
and S = C,=I(Y,- y ) ( Y , N
-y)’.
(a) Show that
L,
=N
( y - Oo)’E-’(Y- 8,) if C is known
= N ( T - OO)’S-’(F- 00) if C is unknown
is the likelihood ratio statistic for testing Ho : 8 =
H~ : e #
eo.
60
against
25
1,7. Problems
(b) Determine the distribution of Ln under H0 and H A .
5. Consider the linear model (ANOVA)
Y = B6 + E ,
- Nn(O,a2B)
E
as defined by (1.4.26)-(1.4.27), where 6 = ($1,. . . , $ , ) l . (a) Show that the likelihood ratio statistic for testing HO : 8 = against H A : 6 # $01, is given by
1 -
13, = -$;H'NH6, Is2
-
( P - 1)s:
$01,
if rs2 is known
(8nHfNH8n)if
a2 is unknown.
(b) What is the distribution of L, under Ho as well as under HA? 6. Consider p simple linear models
-
Y n a = $,In,
+ ,&x, +
€2,
i = 1,. . . , p ,
where E , Nnt(O,0~1,~). Show that LSE/MLE of 6 = ($1,. . . ,19,)l and that /3 = (01,. .. ,&,)I are
en
= ($1, f . .
, e,),
e, = L, - b,T,,
i = 1,. . . , p ,
and
In order to test the null hypothesis Ho : ,f3 = the likelihood ratio statistic is given by
pol,,
polp against H A
: ,f3
#
Ln = (PlnH'D;iHBn) ( P - 1)s: (by (1.4.36)), where 0;; = Diag(nlQ1,. . . , n p Q p ) .
7. Consider the multiple regression model
Y = xp + E ,
E
-
Nn(0,o21,).
Find the likelihood ratio test for the null hypothesis H0 : HP = h against H A : H P # h given by (1.4.40), and determine its distribution.
26
Chapter 1. Introduction 8. Consider the multivariate simple linear model
Y, =8
+
PX,
+
E,
E,,
-
N,(O, C).
Show that the likelihood ratio statistic for testing HO : p = 0 against H A : p # 0 is given by CN =
Q(&s-~P~)
where S is defined by (1.4.70). What is the distribution of L N ? 9. (Refer to Section 1.4.7a)
(a) For- the product -Bernoulli distributions verify that the MLE of 6 i s 6 , = ( 8 1 , ... ,8,)'. (b) Show that Cov(8,) = Diag . . , e,( 1-op) I
(y,. n P
(c) If 6 = O01,, show that the MLE of O01, is
1, IbN -
eon= On, n
N = Diag(n1,. . . ,n,)
1P1; with Cov(60,) = -. n
-
) 0.,
(d) Consider the r.v. ( 6 , - 60,) = Ho : 6 = 801,
Show that under
(e) Show that
D, = n(8, - 60,) I C,-1 ( -6 , - eon), where C,' = [&( 1 - 60)]-'Diag n I / n , . .. ,n,/n), approximately follows the chi-square distribution with ( p - 1) d.f. under Ho.
(
10. (Refer to Section 1.4.7 and k (2 x 2) contingency tables) Define
(a) Show that the approximate variance of
j = 1,2,. .. ,k, as n + j = n l j nij/n+j -+ X i j (< a).
+ n2j
Gj
is
-+
co in such a way that
1.7. Problems (b) Let 40,
27
40, = w-’cf_,wjq,.
Show that the asymptotic variance of
is +:w-l.
(c) Show that
D, = G;& (:
- &,I~)’w(& - ?jIJnlk),
w = Diag(h1,. . . , ~ k )
follow approximately the chi-square distribution with ( k - 1) d.f. as n -+ 00.
11. Let (XI,. . . ,z,)’
follow the multinomial distribution
(a) Show that the MLE of 8 = (0, ,... ,&)’ is Oj = j = 1,... , k . I
?,
6,
=
(81,... , 6 k ) ’ ,
(b) n ~ o v ( 6 , )= Diag(B1,. .. , O k ) - 88’ = ~ ( 6 ) .
12. (Continuation of Problem 11) (a) Show that V &(en - 8)21Nk{O,x(O)}.
(b) D, = n(8, - ne)’Z;’(6, - no) approximately follows the chisquare distribution with k d.f. as n -+ c ~ . (c) (Refer to Section 1.4.7~)For the T x c contingency table, show that ..-I ,. D, = n(6, - 6 , ) ’ ~(8,~ - en)
where 6,, 6, and $, are given by (1.4.90)-(1.4.91). Show that D, approximately follows the chi-square distribution with (T - l)(c- 1) d.f. as n -+00 under HO : 8ij = Oi+ . O+j for all ( i , j ) .
This Page Intentionally Left Blank
Chapter 2
Preliminaries Outline 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10
Normal Distribution
Chi-square Distribution and Properties Some Multivariate Normal Theory Results Beta Distribution and Applications Discrete Distributions Matrix Results Large Sample Theory Nonparametric Theory: Preliminaries Conclusions Problems
In this chapter we provide summary of the properties of statistics related t o normal and chi-square distributions. Particularly, the Stein Lemma and its extension is provided for application in subsequent chapters. In addition, we provide some core results on matrix theory, multivariate discrete and continuous distributions, beta distributions, large sample results, and nonparametric theory for application in subsequent chapters.
2.1
Normal Distribution
The most important distribution in statistical theory is the normal distribution. The probability density function (pdf) of the normal distribution may be denoted as N ( p ,02), which is written as follows:
(2.1.1) where p is the mean and o2 is the variance of the random variable X . Some of the properties of a normal variable may be listed as follows:
29
Chapter 2. Preliminaries
30
If X is N ( p ,a’),then 2 = X - p / u is N(0,l ) , the standard normal variable. 0
-
If X1 and X2 are independent, such that X , N ( p i ,u;), ( i = 1,2), then X1 fX2 is distributed as a normal variable with mean p1 fpz and variance 0: u$.
+
0
If X I , . .. , X , are n independent N ( p ,u 2 ) variables, then the sample mean = :(XI . - . x,)is ~ ( p$). ,
x
+ +
If X I , .. . ,X, are independent random variables having any distribution with common mean p and common variance u z , then f i ( X - p ) / u is distributed as N ( 0 , l ) as n -+ 00. This is known as the central limit theorem (CLT). These properties are amply listed in Cramer (1946), Hogg, McKean and Craig (2005), Rohatgi and Saleh (2001), Judge and Bock (1978), and Bickel and Doksum (2001) among many others. 0
(Stein’s Indentity) If X is N(6,u’) and if p(X) is a differentiable function satisfying EIp’(X)I < 00, then
E [ Y ( X ) ( X- 611 = 02E[cp’(X)1.
(2.1.2)
(Extended Stein’s Indentity) If X = ( X I , .. . , X,)’ is Np(O,~ ’ 1 , ) and if a function p(X) = (pl(X), . . ,p,(X))’ is partially differentiable, that is, (2.1.3a) and pi(X) is a continuous function of X i for all vectors, Xlil = ( X I , . . . , X i - l , X i + l , , . . ,Xp)’ then (2.1.3b) provided that
EllP(X)1I2< Note that for p = 1, (2.1.3b) reduces to (2.1.2). For the Stein identities, see Casella and Berger (1990) and Hoffmann (1992) among others.
2.2
Chi-square Distribution and Properties
If X is N(0,l), then X 2 is distributed as a central chi-square variable with one degree of freedom denoted as x:. However, if X is N ( p , u 2 ) ,then X2 is distributed as a noncentral chi-square variable x:(A2) with one degrees of
2.2. Chi-square Distribution and Properties
31
&
freedom and noncentrality parameter = fa2.The pdf of a noncentral chi-square variable with v degrees of freedom and noncentrality parameter ;A2 may be written as
+
where hv+2,(x2;0) is the pdf of the central chi-square distribution with v 2r degrees of freedom. The pdf of a central chi-square distribution with v degrees of freedom may be written as
(2.2.1b) The mean and variance of this distribution are Y and 2v, respectively. The first four cumulants of the noncentral chi-square distribution at (2.2.la) are given by ~1 = v
+ A2,
and ~4
= 48(v
~2 =
2(v
+ 2A2),
63
+
= 8 ( ~3A2),
(2.2.2)
+ 4A').
We denote by Hy(x; A') the cdf of a noncentral chi-square distribution with v degrees of freedom, and noncentrality parameter A2/2. Some of the results involving noncentral chi-square distributions (see also Judge and Bock, 1978, Appendix B) are given below in theorem/corollary form. Note that Hv+2(x;0) = - h v + 2 ( z ; 0)
+ Hv(z;0).
(2.2.3)
Theorem 1. Let xz be a central chi-square variable with v degrees of freedom and let p(xz) be a measurable function of xz. Then
Theorem 2. Let x?(A') be a noncentral chi-square variable with v degrees of freedom and noncentrality parameter A2/2, and let (p(x;(A2)) be a measurable function of xE(A2). Then
Theorem 3. Let xz(A') be a noncentral chi-square variable with v degrees of freedom and noncentrality parameter A2/2, and let (p(x;(A2)) be a measurable function of xZ(A2). Then
32
Chapter 2. Preliminaries
Further to the above, we have the following results involving the expectation of a product of a normal variable and functions of a noncentral chi-square variable:
Theorem 4. If 2
-N(A7
1) and ~ ( 2 is a~Borel ) measurable function, then
E [ 2 d 2 2 ) ]= A + ( X W ) ) ] .
Theorem 5. If 2
- N ( A , 1)
(2.2.7)
and ~ ( 2 is a~Borel ) measurable function, then
r;.[2”(2”,]= + 4 x ; ( A 2 ) ) ]
+ A2E[C“(x;(A2))].
(2.2.8)
Theorem 6. Let Z = (21, . ..,Z,)’ be pdimensional normal vector distributed as N p ( p Ip). , Then for a measurable function p we have JqZdZ’Z)] = PE[io(x;+2(w)]*
(2.2.9)
Theorem 7. Let Z = (21, .. . ,2,)’ be a pdimensional normal vector distributed as N,(p, I,). Then for a measurable function p we have
Theorem 8. Let Z = (21, . . . 2,)’be a pdimensional normal vector distributed as N p ( p I,,). Then for a measurable function p we have for a positive definite matrix A E[Z’AzdZ’Z)] = t r ( A ) E [p(x;+2(A2))] + P’APE[V(X;+~(A~))]. (2.2.11) For proof of Theorems 1 through 8 see Judge and Bock (1978). Finally, we may obtain the following useful identities:
where Er stands for the expectation with respect to a Poisson variable r with mean A2/2.
A2E[Xc:~(A2)] = 1 - ( p - ~ ) E [ x ; ~ ( A ~ ) ] .
(2.2.13~)
2.3. Some Results from Multivariate Normal Theory
33 (2.2.13e)
(2.2.13f)
Some Results from Multivariate Normal Theory
2.3
Let x be a pvariate normal vector with mean vector p and covariate C be denoted by N P ( pX). , If XI,... ,X N be a random sample of size N from NPCP, The joint pdf of XI,. . . ,x~ is given by
Let f =
N & Ca=l x,,
and let N
v = C(X,- f)(x, - rr)'
(2.3.2)
a=l
Then we have the decomposition given by N
C(X,a=l
-
p)' =
v + N(%
-
-
4'.
(2.3.3)
Note that 2 and V / N - 1 are unbiased estimators of p and C, respectively, while V / N is the mle of C and is a biased estimator of C. With respect t o N.(p,C) the following properties may be noted:
I. X
N
NP(p,&X)
2. V W P ( Cn, ) ,where W P ( Cn, ) stands for the Wishart distribution with n = N - 1 degrees of freedom. Then the following results may be verified N
a. E [ V ]= nC, b. E [ V A V ]= n(n+l ) X A C + n t r ( A C ) C , where A is symmetric, and
Chapter 2. Preliminaries
34 c. E [ v - l ] =
Ax-!
3. Let T 2 = NnS’V-’% Then
n-p+lT2 N-pT2 v -=---F~ N, P (A ) P n P n
(2.3.4)
where V stands for “equal in distribution” and F p , ~ - p ( A 2denotes ) a noncentral F-distribution with ( p , N - p ) degrees of freedom and noncentrality parameter A2/2, where A2 = Np’E-lp. Further, we note that
which proves the assertion (2.3.4). Here Fm,,(A2) stands for the noncentral F-variable with (m,n)degrees of freedom and noncentral parameter
A2/2.
Theorem 1. If X
- Np($,X),
then
E[X’AX] = t r [ A E ]
Theorem 2. Let C function g we have
- Wp(n,E).
+ $’A$.
Then for any vector
P and measurable
Theorem 3. If X1,Xz ,... , X N are i.i.d.r.v. with distribution Afp($,E) and if 6 has the prior distribution AfP(&,A),then the joint distribution of (o’, XL)’is
For more properties of the multivariate normal distribution, see Anderson (1984) and Srivastava and Khatri (1979) among others.
2.4 Beta Distribution and Applications Let xz, and x:, be two independent central chi-square variables with u1 and u2
degrees of freedom , respectively. Define
X=
2 XU,
x:, + x:*
.
(2.4.1)
Then X is distributed as a beta-variable with the pdf given by (2.4.2)
2.4. Beta Distribution and Applications Themean and variance of X are v l / q respectively.
35
+ v2 and vlv2(vl+v2)-1(v1+v2+1)-1,
Application I. Compute E [ X I ( X > c ) ] ,where I ( A ) is the indicator function of the set A:
-
since
Al
v1 + vz
B ( % + I,?) B(%,%)
1 xu'+2/2-1
(1 - x ) V 2 / 2 - 1
B(Y,Y)
dx,
(2.4.3)
r(+)r(v) vl -r($$('1+,.z+2) v1 + VZ' Application 11. If F;,m = x:/xL with x: and x; independent chi-square variables, then the statistic G = x: + x; is independent of F;,m. Thus, we -
write
(2.4.4) Now, find the E[x;I(F;,, > c ) ] .In this case, we write
(2.4.5) where x is a B
(
p+2 fIf
2 7 2 )
-variable.
+
2 and F;+zr = x ; + ~ ~ / x ; Application 111. Let T = x;+~,. xm Consider . E[cp(F;+z,,,)x&2,], where cp is a B-measurable function. Then
Chapter 2. Preliminaries
36
Also,
and m
E[v(F;+2T,m)F;&m]
=
-2
+
2r
For more information on these properties, see Sclove et al. (1972).
2.5
Discrete Distributions
In this section we only discuss the binomial and the multinomial discrete distributions with their properties useful for our purposes.
Binomial Distribution Let z l , . . . ,x, be n independent Bernoulli variables with 2.5.1 y=
n
xi follows the binomial distribution (;)oy(i
- e)n--y,
parameter 6. Then
= 0, I , . . . , n.
(2.5.1)
The first moment and the next three central moments of y are given by n6, n6(1 - O ) , n6(1 - 6 ) ( l - 26), and n6(1 - 6)[l + 3(n - 2)6(l - B ) ] , respectively. One may see that the skewness and kurtosis of the distribution tend to 0 and 3, as n -+ co,supporting the fact that the binomial distribution may be approximated by the normal distribution and we have the De Moivre-Laplace theorem (see Renyi, 1970, page 204).
Theorem 1. The binomial distribution
may be approximated as follows:
and if
--OO
lim
n-05
< a < b < +co, then we have
C a < d Zn.e2(!i.--e<)b
Further, we may show
(~ ) B Y ( I
-
0)n-y
=-
e-z2/2dz.
(2.5.3)
2.5. Discrete Distributions
37
Theorem 2. The cumulative probability for the binomial distribution satisfies
uniformly in z E (--oo,co), where @ ( z ) is the cdf of the standard normal variable.
2.5.2
Multinomial Distribution
Let us consider a population which may be classified into k mutually exclusive categories, namely A l , . . . ,A k . Let (21,. . . ,xk)’ be the vector of frequencies observed in the k categories A1 . . .Ak in n trials. Then the probability distribution of x = (21,. . . ,xk)’ is (2.5.5)
+
+
where n = x1 ... xk, 6 = (81,. . . ,&)’, 6% = 1, and 1,. . . ,k). The mean and covariance matrix of x are
P(Ai)= 8i (i
E ( X ) = n0 and Cov(X) = n(D0 - OO’} = n V O ,
=
(2.5.6)
where DO = Diag(B1,. . . ,Bk). The following theorem (see Renyi, 1970, pp. 211-212) is an extension of Theorems 2.5.1 and 2.5.2 with respect t o the binomial distribution:
Theorem 3. k
-
-1
k
exp { (s) ID6 1
-
:(x
- nO)’Dil(x- n o ) }
2
Now, setting Zi = (xi - no,)f d
+ O(n-l/’). (2.5.7)
m - ,we obtain (2.5.7) as (2.5.8)
where 2 = (21,.. . ,Zn)’ and
with
b . - 6.- (1 - 9 3 ) 33
-
3
Bk
+ (1 - Oj),
j = 1,.. . ,k
Chapter 2. Preliminaries
38 and
(2.5.9) Hence,
Zk-1
3
exp{ - Z'BZ}
dzi . . .dzk-l
+ O ( ~ L - ' / ~(2.5.10) ).
For more properties of the binomial and multinomial distributions see Cram& (1946), Kendall and Stuart (1963), Bishop, Fienberg, and Holland (1975), Agresti (1990), Renyi (1970), and Sen and Singer (1993) among others.
2.6
Matrix Results
The following results are presented as theorems or corollaries: Let A be an n x n matrix. Then the roots of the determinantal (characteristic equation)
[A - A11 = 0
(2.6.1)
are called the characteristic roots of A or eigenvalues of A.
Theorem 1. If A is an n x n matrix, then there exists an orthogonal matrix I' such that r'Ar = D = Diag(d1,.. . ,d,). If A is positive semidefinite, then di 2 0 , i = 1,... ,n, and if A is positive definite, then di > 0, i = 1,. .. ,n. Theorem 2. Consider the partition of the nonsingular matrix A given by (2.6.2) where A22 is nonsingular. Let A11.2 = A11 - A12Ai.Azl. Then
Further, (2.6.4)
Theorem 3. If A is an n x n positive definite matrix and B is an n x m ( m < n ) matrix with rank m, then B'AB is positive definite.
39
2.6. Matrix Results
If A is an n x n idempotent matrix, meaning A = A’ and AA = A, then as long as the rank of A is r , A has r positive roots and n - r roots equal t o 0, the rank of A is equal to tr(A), and there exists an orthogonal matrix r such that (2.6.5)
Theorem 4. If A and B are two symmetric matrices, a necessary and sufficient condition for an orthogonal matrix I’ t o exist, such that I”Ar = D1 and r’Br = D2 where D1 and D2 are both diagonal matrices, is that A and B satisfy AB = BA, meaning A and B commute. Theorem 5 (Courant Theorem). Let XI, ’ . ,A, be the characteristic roots of an n x n matrix A such that min X i = A’, max X i = A,, and let q ,. . . ,21, be the characteristic vectors. Then A = A l q v i ... A n V n v L , I = V ~ V ; .. . u,vU:, : sup x’Ax = A, and inf
+ +
(;.;-)
+
+
(2.6.6) where min X i = Ch,i,(A)
max X i = Ch,,(A).
and
2
(2.6.7)
2
Theorem 6. If A is an n x n matrix and b is an n x 1 vector. Then max x
x’bb’x = b’A-’b. x‘Ax
(2 A.8)
~
Theorem 7. If A is an n x n positive definite matrix and b is an n x 1 vector, a necessary and sufficient condition for A-’ - bb’ to be positive definite is that b’Ab is less than 1. Further, bb’ - A-’ cannot be negative definite for n exceeding 1.
(
)
Theorem 8. Let A and D be nonsingular matrices of order B be a pl x p2 matrix. Then
(A
+ BDB’)-’
= A-1
- A-’B
pl
and pz and
(B ’ A - ~ B+ D - ~ ) - ’ B / A - ~ .
(2.6.9)
Theorem 9. If A and D are square matrices such that A is nonsingular, then (2.6.10)
Theorem 10. If A be a p x p matrix and U a column p x 1 vector, then
/A + UU’I= I A(I~ + U’A-’U)
if
+
I A ~o
(2.6.11)
Chapter 2. Preliminaries
40
Application I. (2.6.12)
+ 7’) and PX= X(X’X)-lX’ and X is a n x p matrix. where B = 02/(02 Here, B = X and D = (X’X)-’ of Theorem 8. Application 11. (a1
+ bwwy = -a1( I + a + bW’W
(2.6.13)
+ bWW‘( = a p(1 + -ab W’W).
(2.6.14)
and
laI
Theorem 11. Let A be a p x p positive definite matrix, B a p x m matrix, and B1 a p x n matrix such that the rank of B1 equals p minus the rank of B and B’B = 0. Then
is a positive semidefinite matrix of rank = p - rank B.
Theorem 12. If A and B are two p x p positive matrices, then
+
A - A(A + B)-’A = (A-’ B-’)-’, (2.6.15) (i) meaning the 1.h.s. equals the inverse of the sum of the inverse of A and B. (2.6.16) I - A(A + B)-l = B(A B)-’, (ii) meaning the 1.h.s. equals the product of B and the inverse of the sum of A and B.
+
For more results on matrix theory see Anderson (1984), Rao (1973), and Srivastava and Khatri (1979) among others.
2.7
Large Sample Theory
Finite sample theory is generally based on the assumption of normality. In practice, this assumption may not hold and asymptotic theory is invoked. The results are parallel to normal theory. In the understanding of asymptotic theory, we look forward to four distinct types of convergence of sequence of random variables to a limit. We provide only a limited amount of results which achieve our goal in the book. For details, we refer the readers t o Feller (1957), Renyi (1970), Rohatgi and Saleh(2001), Sen and Singer (1993), and Ferguson(1996). In the next section we provide the definition of each type of convergence.
41
2.7. Large Sample Theory
2.7.1
Four Types of Convergence
There are two basic forms of convergence (1) sequences of distribution functions F,(z) converging to a distribution function F ( z ) for all continuity points, z as n -+ 00, and (2) sequences of random variables { X , In = 1,2, . . . } converging t o a random variable X as n -+ cw. First we consider the convergence of sequences of distribution functions.
Definition 2.1. A sequence of random variables (r.v.'s) {T,} cohverges in V F T ( z ) for all continuity points
distribution to T, meaning T,+T, if FT, (z) z of F T ( 5 ) .
-+
Convergence in distribution does not imply convergence of moments, nor does it imply convergence of probability density functions.
Definition 2.2. A sequence of r.v. {T,} converges in probability to a r.v. T if for every E > 0,
P{IT,-TI>e}+O
as n-+oo.
This is written as
T,~T
or
T, = T + o , ( ~ ) .
The convergence says that the sequence of probabilities P{IT, - TI} > E} converges to 0 as n + 03. It does not say that IT, - TI < E for n 2 N for a suitable sample size, N. As a consequence of the definition we can easily see that g(Tn)'g(T)
as n + w , P
P
when g is a continuous function. Hence, g(T,)+g(c) when T,-+c, a constant. The following theorem is important: P
V
Theorem 1. T,+T implies T,+T as n
-+
where c is
03.
Definition 2.3. A sequence of r.v.'s {T,} such that EIT,I' < 03 for some > 0 converges in the rthmean t o the r.v. T if EITI' < 00 and EIT,-TI' -+0 as n -+ co.
T
This is written as
T,~T
as n+cw. tth
Convergence in rth means implies convergence in probability, meaning T, +T P for some T > 0 implies T,-+T. If T = 2, then Var(T,) -+ Var(T). In statistical methodology, convergence in the quadratic mean is very useful and will be used frequently in this book. Finally, we consider the concept of almost sure convergence, sometimes called convergence of a sequence of r.v.'s {T,} to T with probability one or strong convergence of {Tn}to T.
42
Chapter 2. Preliminaries
Definition 2.4. A sequence of r.v.'s {T,} converges almost sure to the r.v. T , meaning Tna3T,if P{limn-m T, = T } = 1. The relationship of the four types of convergence concepts is given by the following diagram:
Thus, we see that convergence in distribution is the weakest of all. We observe that this convergence in distribution is implied by the rest of the three and is most important for statistical inference since the limiting distribution is used to determine the asymptotic distributional bias, risks and confidence sets or the significance of tests of parameters of interest.
2.7.2
Law of Large Numbers
When {T,} convergences to a constant, say, c, then it falls in the domain of the P
V
law of large numbers. Accordingly, when T,+c or Tn-+c,then we have the weak law of large numbers. If TnaZclthen we have the strong law of large numbers. If a distribution has a finite second moment, the law of large numbers is simplest and useful for statistical inference and the convergence is in quadratic mean. Three laws of large numbers are related t o the idea of consistency of estimators/tests. Weak laws are proved based on characteristic functions and the continuity theorem. The following theorem states the weak/strong law of large numbers with regard to the sample mean: Theorem 2. Let X I , mean. Then:
. . ,X , be i.i.d., and let
xn= cy=l X , be the sample
(i) (Weak law) If E I X J < 00, then X , z E ( Y ) = p. (ii) (Strong law) X , " Z p if and only if E ( X ) < 00 and p = E ( X ) . (iii) If EIX1' < 00, then
-
xn2nd mean
p =E(X).
The proof may be obtained by (1) the characteristic function, (2) Chebycheff's inequality, or (3) the Borel-Cantelli Lemma (a.s.) , which are stated below.
2.7. Large Sample Theory
43
Chebycheff's Inequality If T is a nonnegative r.v. such that ElTl < co,
then for every c > 0,
1
P{T > c E ( T ) }< -. C
Borel-Cantelli Lemma. Let {A,} C:='=, P(A,) < 00. Then (1) P(limA,)
be a sequence of events such that
= 0,
(2) If {A,} are independent events, then P(limA,) = 1.
Markov Inequality Let T be a nonnegative r.v. with finite rth mean, meaning pi = E(T)' < 00, for some T > 0, Then for every € > 0,
P(T > c ) 5 c-'p;.
2.7.3
Central Limit Theorems
In determining confidence intervals or the level of significance a in test of a hypothesis, we need the exact distribution of the related statistics. In the nonnormal case, under certain regularity conditions, we may determine the asymptotic distribution that may be used to set up confidence intervals, and determine type I and power of tests based on the statistic in question. It is well known that if X I , . . . ,X , are i.i.d. r.v. from a cdf F ( z ) with finite mean p and variance 0 2 ,then for the sample mean X, we have
f i - - p ) N(O,1) ff
-(X,
as n --f
N
CQ.
This is the basic central limit theorem (CLT). However, there are situations where X I , . . . , X , are not i.i.d. r.v. but satisfy
E(X2) = 0 , V ( X 2 )= ffz2 < 00, Now, let T, = X1 question:
+ ... + X ,
When is T,/o,
N(0,l) as n
-
and --+
0 ;
= of
(2
= 1,. . . ,n).
+ - . -+ o;. Then we may ask the
CQ?
The answer lies in the Lindeberg-Feller theorem, which gives the uniform integrability condition: If for every 6 > 0,
then m
1
ffn
-
N(0,l).
44
Chapter 2. Preliminaries
The next theorem called the Hajek-Sidak CLT deals with the asymptotic normality of a linear combination of i.i.d. r.v.’s X I , . . . ,X , meaning of T, = a l X l + ... + a n X n , where E ( X i ) = p and E ( X i - p)’ are both finite.
Hajek-Sidak CLT. Let X I , . . . ,X, be i.i.d. r.v. such that E ( X i ) = p and VarXi = u2 are both finite. Let
T, = alX1 + . . . + anXn. Then
whenever the Noether condition
is satisfied. Sometimes we have to deal with functions of two or more statistics, say, P
g(T,,S,)where FT,,(z) -+ F ( z ) and Sn+c (constant). Then we need the limiting distribution of g(T,,Sn). In this case, we can use the Slusky theorem. Stated in a simplest form the Slutsky theorem is as follows: Let X I , . . . ,X, be a sequence of r.v. with distributions F l , . . . ,F,, and suppose F,(z) -+ F ( z ) as n + 03. Further, let Y1,.. . ,Y, be another set of r.v.’s P
such that Y, + c. Set V, = X, as n + 03, the distribution of
(1) V, tends to F ( z
+ Y,,
W, = X,Y,
and 2, = X,/Y,.
Then,
+ c)
(2) W, tends to F ( c z ) (3) 2, tends to F ( z / c ) for c > 0 as n
-+
co.
An important concept that dominates the asymptotic theory of rankstatistics is that of “contiguity of probability measures.” We define the concept in the following:
Definition 2.5. Let {p,} and {q,} be the sequence of simple hypothesis densities defined in a measure space ( y ,I?,, p n ) . If
Pn(A,)
-+
0 implies &,(A,)
+0
as n
+ 03,
then {Q,}are contiguous to {Pn}. Here 8 P ( A n ) / a p , = p , and aQ(An)/dpn= 4,. Generally, we are interested in the asymptotic distribution of statistics {T,(Y)}. Then convergence of {T,(Y)} -+ 0 under {P,} implies {T,(Y)} -+ 0 under {Q,} if {Q,} is contiguous to {P,}.
2.8. Nonparametric Theory: Preliminaries
45
In this context the likelihood ratio statistics plays an important part. Consider the likelihood ratio statistic
L,(Y)
=fi
if p,>O
=1 = co
ifp,>O if 0 = p ,
P77
< qn.
If log L,(Y) is asymptotically N ( - a 2 / 2 , a’) under {P,}, then {Q,} is contiguous to {P,} (LeCam’s lemma 1). Let T,(Y) be a statistic. Assume that under {P,}, (T,(Y),logL,(Y)) has asymptotically bivariate normal distribution with mean ( e l , & ) and covariance matrix
( :;:
+
012 022
) . Then, under
{ Q,}, the asymptotic distribution
of T,(Y) is N(p1 0 1 2 , all). (LeCam’s lemma 3). LeCam’s lemma 2 gives conditions when log L,(Y) is asymptotically ~ ( - ~ a ~ , a ~ ) . For further information on contiguity see Hajek and Sidak (1967), Hajek, Sidak, and Sen (1999), Puri and Sen (1986), and JureEkova and Sen (1996) among others. ~
2.8
Nonparametric Theory: Preliminaries
In this section we present some of the basic preliminaries of nonparametric methods which are relevant in the discussion of the R-estimators of the parameters of the models in various chapters. The proofs and details of technicalities are available in Hajek (1969), Hajek and Sidak (1967), Hajek, Sidak, and Sen (1999), Puri and Sen (1986), Randles and Wolfe (1979), and Sen (1981) among others.
2.8.1
Order-Statistics, Ranks, and Sign Statistics
Let Yl,Yz,. . . ,Y, be n i.i.d. r.v. with absolutely continuous cdf F ( y ) with absolutely continuous pdf f(y). Let Y ( . )= (ql), .. . , be the orderstatistic vector corresponding to the i.i.d. r.v. Y = (Y1,. .. ,Y,)’, which we write as
q,))’
y = (yl,. . * ,yn)’ = ( where Rj is the rank of Y ( . )is
n
~ R I ).,. 7 qR,))’
6 among Y1,.. . ,Y,.
(2.8.1)
Thus, the joint distribution of
n
n!
i=l
f ( Y ( , ) ) on - co < Y(1) < . . . < Y(n)
< 00,
(2.8.2)
and the joint distribution of R = (R1,.. . , R,)’ of the rank-vector is
1 n!
for R E R,,
(2.8.3)
Chapter 2. Preliminaries
46
where R, is the set of n! permutations of the integer-vector (1,2,. . . ,n). If the pdf of y3 ( j = 1,. . . ,n ) is symmetric about 0, then we may define four statistics relevant to the distribution: (i) The absolute-value statistics
(ii) The absolute-value order-statistics
(iii) The rank-statistic, Rt = (RF,. . . ,I?:)’,
where
Rl is the rank of lY,l among lY11,. . . , /Y,l.
(2.8.4~)
(iv) The sign-statistic S = (sign Yl ,. . . , sign Y,). Clearly,
Y , = ly31 signYj,
j = 1,. . . ,n.
(2.8.4d)
The distributions of lYl(.), R+, and S are independent with distributions given by (i) 2, n! (ii) (iii)
n:=,f ( z i ) , O < zi < ... <
z,
< 00,
5 for R+ E R,, & for S E,&,
the set of 2” points.
Linear rank-statistics (LRS) Let (Yl,. . . ,Y,) be a sample of size n from the cdf f(y)/pdf
(2.8.5)
2.8.2
f ( y ) that satisfies
the following conditions:
(i) F E F ,a class of absolutely continuous cdf’s with absolutely continuous pdf f(z) such that f(z)= f(-z) for all z E ( - c q 0 0 ) .
( 2.8.6) meaning f has finite Fisher’s information. Let {(~(u); - < u < l} be the class of nonconstant, nondecreasing, and square integrable functions. Further, let (2.8.7)
2.8. Nonparametric Theory: Preliminaries
47
Now, consider a sample of size n from the uniform distribution U ( 0 ,l ) ,and define the scores for every n ( 2 1)
(2.8.8) for i = I , 2,. . . ,n, when 0 < U1n < U2n < ... < U,, < 1 are the orderstatistics of the sample ( U l , . . . ,Un)’. In this book we are interested in the lanear rank-statistics (related to location and regression parameter) of the type: n
T,(o) = n - 1 / 2 C a , + ( R : ) sign yJ,
(2.8.9)
3=1
c(z, n
Ln(0) = n-1/2
- ?&)an(&),
(cz,). n
Zn = n-l
J=1
(2.8.10)
t=l
Some results are given in the following theorems (see also Hajek and Sidak (1967), Chapter V). First, the mean and variance of the statistics are given by: Theorem 1. Under the assumed regularity conditions, (i) E[Tn(O)]= 0, n
+ . 2 (211 >
(ii) v a r [ ~ n ( o )= l a=]
and (i) EfLn(O)I = 0,
Standard Scores. For the symmetric location distributions F ( z - 0) and for the test of 0 = 0, the scores are
where F-’(U), (0 < u < 1) is the quantile of order u of the cdf F ( y ) and the score generating function is defined by
Chapter 2. Preliminaries
48
The Wilcoxon test is associated with the score function = u,
+(ZL)
0
< ZL < 1,
(2.8.14a)
and the normal score test is associated with (2.8.14b) For the regression model, the scores are defined by (2.8.15a) with the score generating function (2.8.1513)
Theorem 2. Let n
T,(o) = n-1/2
CU:(R+) signx,
(2.8.16)
i=l
where the scores {uk ( 2 ) li = 1, . . . ,n} converge to some square integrable function & ( u ) such that J ; [ ~ + ( Z L ) ] ~ ~ Z C > 0. Then Tn(0)follows approximately the normal distribution with mean 0 and variance J : [ ~ + ( z L ) ] ’ ~ z L . If
‘ p + ( z ~ ) = ++(zL,
f),then the variance is I ( f ) .
Theorem 3. Let n
L,(o) = n-1/2
C(xi- ~,)a,(~i),
(2.8.17)
i=l
where the scores { u n ( i ) ( i= 1 , . . . ,n } converge to some square integrable function p(zi) such that A; = J;[q(u)- (pI2dzi > 0, (p = s,’q(u)dzi. Then L n ( 0 )follows approximately the normal distribution with mean 0 and variance AQ : and
=
( lim n-’Qn) n-cc
1
1
0
[p(u)- (pl2dzi, Qn =
c(zz n
- %n)2
i=l
Q = limn-’&,.
If ~ ( z L )= +(zL, f),then (p = 0 and (2.8.18)
49
2.8. Nonparametric Theory: Preliminaries Theorem 4. (Saleh and Sen, 1978). Let
T,(o)=
n
n-1/2Ca,+(~F) sign u,, z=1
n
Ln(0) = n-1/2
X(q 2=1
(2.8.19)
- Zn)an(Rz),
so
where { a R ( i ) } and { a n ( i ) } converge to some square integrable functions @(u) and ~ ( 2 1 )such that A; = J;[y(a) - 3]2d26 > 0, 3 = 1 cp(u)da and
$ [ y + ( ~ ) ] <~ d u Then 00.
[ h T n ( 0 ) h, L n ( 0 ) ] ’= NZ
{(
),A; D i a d L Q ) } .
(2-8.20)
Now we consider the following linear rank-statistics related to multiple regression models: n
T,(o) = n-li2
Ca,+(~,f) sign y ~ , z=1 n
L,(O) = n-1/2
C ( X Z- En)an(Rz) z=1 n
= (Lln(O),
.
*.
,Lq,n(O))’,
Zzn = 7l-l
Cxzln,(2.8.21)
3=1
where
x.- n-l
n
n
i=l
i=l
c x i and Qn = c ( x i - %,)(xi- Zin)’
is a q x q matrix satisfying the general Noether condition given by max
llisn
(xi - %in)’Qkl(xi- %in)
-+
0 as n
-+
co,
(2.8.22)
along with the generalized inverse Q,l of the matrix Qn. Then
where Q = limn-m n-lQ,. Let A: = ( n - 1) - l C~zl[[an(i) - &I2. Then L, = A;2L~(0)Q;1Ln(O) follows a central chi-square distribution with q degrees of freedom
Chapter 2. Preliminaries
50
2.8.3
Rank Estimators of the Parameters of Various Models
Location Model. Consider the location model
Y , = 91,
+ el
(2.8.24)
where Y , = (Y1,.. . ,Yn)’, 1, = (1,.. . ,1)’ and e = ( e l , . . . , e n ) and the error-distribution is the absolutely continuous and symmetric cdf F ( e ) with absolutely continuous symmetric pdf f ( e ) such that f ( e ) = f(-e) and I(f) < co (see equation 2.8.6). Then consider the linear rank-statistics
where R:
is the rank of lY, - a / among lY, - all.. . , lY, - a / and
(; +--
a,+(i)= Ep+(Uin) or p+ -
(2.8.26)
;n:l)l
as defined in Section 2.8.2. For every given Y ,T,(a) is a decreasing in a. Also, if a = 0, then T,(O) is symmetrically distributed about 0. Thus, following Adichie (1967) and Hodges and Lehmann (1963) among others, we define the R-estimator of the location parameter 6 as
(2.8.27) where = sup{a : T,(a) > 0) and 8i2) = inf{a : T,(a) on Puri and Sen (1986) we state the following theorem:
< O}. Then, based
Theorem 5. Under the assumed regularity condition as n 4 co,
where
(2.8.29)
Location and Regression Model. Consider the location and regression model
Yn = 91,
+ ,Bx+ el
(2.8.30)
51
2.8. Nonparametric Theory: Preliminaries
where Y , = (Yl,. . . ,Yn)',x = ( ~ 1 , ... ,xn)', and e = ( e l , . . . , e n ) and the error distribution satisfy the same condition as in Section 2.8.4. Then, consider the linear rank-statistics n
sign (Y,- a - bxi),
Tn(a,b) = n-l12 Ca:(R!) i=l where R: i s t h e r a n k o f IY,-a-bziI and
among Ifi-u-bx11,
... ,IYn-a-bxnj,
n
Ln(b)= nd112 c ( x i - I n ) a n ( R i ) ,
(2.8.31)
i=l
where Ri is the rank of Y, - bxi and Y1 - bxl, ... ,Yn - b,xn. Then the Restimators of (6, P) are given by
1 p" -- 2-(&') + pi2)), where pi2)= inf{b : ~ , ( b )< 01,
,&')
= sup{b : Ln(b) > 0},
and
1 -
+ 6?)), where 6:) = inf{a : ~,(a,b,) < 01. 6:)
= 2(6i1)
6;) = S U ~ { U: Tn(a,&) > 0) (2.8.32)
By the fact that T,(O,O) and Ln(0) are symmetrically distributed around the origin, 0, the asymptotic distribution of the estimators is as given in the following theorem:
Theorem 6. (Adichie, 1967). Under the assumed regularity condition and maxl<%
~
where Q = limn-lQ,,
Qn = C,"=,(xi - 2,)' and limTn = 5.
Multiple Regression Model. Consider the regression model
Y , = 61,
+ Xp + e,
= (Y1, . . . ,Yn)', X is a n x p matrix of constants, and e = .. ,en)'. Regression parameters are (6,p')' when /3 = (PI,. .. ,P,)'. Con-
where Y , (el,.
(2.8.34)
sider the LRS defined by
n
Tn(a,b) = nP1I2 i=l
a:(R')
sign ( y i - a - b'xi)
Chapter 2. Preliminaries
52 and
whereR,fistherankofIY,-u-b’x,j among IYl-a-b’xlI, ... ,lYn-a-b’x,I and R, is the rank of (Y1-a-b’xi) among (y1 -u-b’x), . . . , (Y,-a-b’x,). To obtain the point estimates of (6,/3’)’, we define the estimator of p as the center of gravity of the set D, given by
D, = {b, : IlL,(b)lI is minimum},
(2.8.36a)
p, = center of gravity of D,.
(2.8.36b)
namely
The R-estimator of 6 is defined by
where
= sup { u : T,(a,$,)
> 0)
and
(2.8.37)
The asymptotic distribution of the R-estimators of (6, p’)’ is given by
(2.8.38) where Q = limn-lQ, and Q, is defined after (2.8.21).
Asymptotic Linearity Results for Linear Rank-Statistics. We observed that R-estimators are not explicit functionals of ranks. The usual methodology fails to obtain an asymptotic distribution theory of these R-estimators. JureEkova (1969, 1971) opened the door toward asymptotic linearity results in rank theory. Rank estimators have thus been popularized since 1970 as a means of proving the asymptotic theory of the R-estimators (see Saleh and Sen, 1978). Here, we present only the results related t o the multiple regression model in Section 2.8.3 because they have broad application to the topics covered in this book. Theorem 7. Consider the statistics T,(a, b) and L,(b) defined by (2.8.35) and the’rank estimators of 6 and p given by (2.8.36a, b) and (2.8.37). Assume the following conditions together with (2.8.6) through (2.8.9): (i) lim n-’Q, 7L-m
-+
Q.
53
2.9. Problems
(ii) lmax (xi- R,)'Q;'(x~ qiln
-
x,)
= o(n),
where xi is the ithrow of X and Xn = n-l
c,"=, xi. (2.8.39)
Let llalj = C:=,I u ( ~ ) ~ , where a = (a('),. . . , a(P))'. Then the R-estimator of 0 is defined by any central point of the set Sb =
denoted by
where k with
{b : llLn(b)[12 is minimum}
(2.8.40)
p,. Under Ho : 6 = 0, 0 = 0, the linearity results are as follows:
> 0 and 6 E RP and 11 . 11 denotes the pdimensional Euclidean norm (2.8.42)
(ii)
lim Po
n-00
{
sup 1601
IT,(n-'/2S0,n-'/2S) - T,(O, 0)
+
+n-1/2(S0 G'x,)y(+,
I)+
>E}
--f
0.
(2.8.43)
These two results will help us to obtain asymptotic properties of the Restimators.
2.9
Problems
1. If X I , . . . , X , are i.i.d. r.v. with E ( X , ) = p and Var(Xz)= o2 (a) Show that the asymptotic distribution of
fiqis N ( 0 , l ) .
(b) Show that the asymptotic distribution of fi s. is N ( O ,I), where X = n-'(X1 .. . X,) and sz = n-l Cz"_l(Xi- X ) 2 .
+ +
2. Prove the Stein identities (2.1.2) through (2.1.3b). 3. Verify the expressions for the cumulants of the noncentral chi-square distribution with v degrees of freedom and noncentrality parameter A2/2 given by (2.2.2). 4. Prove Theorems 1 through 8.
Chapter 2. Preliminaries
54
5. Verify the identities given a t (2.2.13a) through (2.2.1321).
-
6. Refer t o Section 2.3. If V W P ( Xn, ) , then (a) E[V] = n X (b) E[VAV] = n(n 1)XAX n t r ( A E ) X , where A is a symmetric matrix, and (c) E[V-'] = ( n - p - l)-'X-'.
+
+
7. Prove Theorem 2 of Section 2.3. 8. Refer to Section 2.5. Prove that Z = Vi1'2(x - n6) converges to kvariate normal distribution with mean 0 and covariance matrix B as n -+ 00. 9. Refer t o Section 2.7.1. Prove Theorem 1. 10. Refer to Section 2.6.2. Prove Theorem 2.
11. Prove the Lindeberg-Feller theorem. 12. Refer t o Section 2.6.3. Prove the Hajek-Sidak central limit theorem. 13. Prove Slusky's theorem. 14. Prove LeCam's lemma 2. 15. Prove Theorem 2.8.1. 16. Prove Theorems 2.8.2 and 2.8.3. 17. Prove Theorem 1 of Section 2.8.4. 18. Prove that
where u2 = Ai/-y2(p,+).
Chapter 3
Preliminary Test Est irnat ion Outline 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12
Simple Linear Model, Estimators, and Tests Preliminary Test Estimation (PTE) of the Intercept Parameter Two-Sample Problem and Pooling of Means One-Sample Problem: Estimation of Mean with Uncertain Prior Information An alternative Approach to Preliminary Test Estimation Estimation of the Parameters of the Simple Linear Model with Nonnormal Errors Two-Sample Problem and Estimation of the Mean One Sample Problem and Estimation of the Mean Stein Estimation of Variance: One-Sample Problem Nonparametric Methods: R-Estimation Conclusions Problems
In the statistical literature preliminary test estimation of parameters was introduced by Bancroft (1944, 1964, 1965) to estimate the parameters of a model when it is suspected that some “uncertain prior information” on the parameter of interest is available. The method involves a statistical test of the ‘‘uncertain prior information” based on an appropriate statistic and a decision on whether the model based sample estimate or the prior information based estimate of the model parameters should be taken. In this chapter we consider the simple linear model. We introduce the preliminary test estimation (PTE) of the intercept parameter when it is a priori suspected that the slope parameter has a pre-specified value. This problem was studied by Ahsanullah and Saleh (1972) and extended by Ahmed and Saleh (1988). We analytically evaluate the performance of the preliminary
55
56
Chapter 3. Preliminary Test Estimation
test estimator of the intercept parameter relative to the standard estimate (least squares or maximum likelihood) with respect to the mean-square error (mse) criterion. Next, as a special case of the simple linear model, we consider the classical problem of pooling means in a two-sample situation due to Han and Bancroft (1968). Given two samples, the problem is to estimate the means when it is a priori suspected that the two population means may be equal. Accordingly we evaluate the preliminary test estimator against the traditional sample mean, using the rnse criterion. Finally, we consider the one-sample problem of estimating the mean based on a previous experimental result when it is suspected that the population mean has a pre-specified value. In this case we also assess the preliminary test estimator relative to the sample mean based on the mse criterion. In addition}we consider the one-sample problem of preliminary test estimation of variance when it is a priori suspected that the mean has a pre-specified value. An extended problem is discussed by Stein (1964). We also provide an alternative method of estimation that competes well, and discuss its merit compared t o the preliminary test method of estimation. All these studies are carried out assuming normal distribution of the errors with unknown variance. Also, we provide an asymptotic theory under nonnorma1 errors, and nonparametric methods of estimation due to Adichie (1967) and Saleh and Sen (1978). Lastly, problems are added to expand the results in various forms.
3.1 3.1.1
Simple Linear Model, Estimators, and Tests Simple Linear Model
Consider the simple linear model with slope ,B, and intercept 8, given by
(3.1.1)
Y =61,+,Bx+e,
where Y = (Yl,.. . ,Y,)’ is a vector of n observations, x = (q,. . . ,zn)’ is a vector of n known constants, 1, = (1,.. . ,1)’ is a vector of n l’s, and e = ( e l , . . . , en)’ is a vector of n independent errors such that e h/,(0, a21n) with I, the identity matrix of order n.
-
3.1.2
Estimation of the Intercept and Slope Parameter
Using the model (3.1.1) and the sample information from a normal distribution, we obtain the maximum likelihood estimator (MLE)/least squares estimator (LSE) of (0,p)’ as (3.1.2)
3.2. P T E of the Intercept Parameter
57
where
1 Q = X'X - -(l;x)'. n The exact distribution of covariance matrix
(&,fin)'
(3.1.3)
is a bivariate normal with mean (0,P)' and
(3.1.4) An unbiased estimator of the variance a2 is given by
(3.1.5) which is independent of (6,,&)', and (n - 2)(s:/u2) follows a central chisquare distribution with n - 2 d.f.
3.1.3 Test for the Slope Parameter Suppose that we want to test the null hypothesis HO : p = PO against H A : P # Po. Then we use the likelihood ratio (LR) test-statistic
(3.1.6) which follows a noncentral F-distribution with ( 1 , m = n - 2 ) d.f. and noncentrality parameter A2/2 where
(3.1.7) Under Ho, L, follows a central F-distribution. At the a-level of significance we obtain the critical value F I , ~ from ( ~ )this distribution and reject HO if L, 2 F~,~(cx); otherwise, we accept Ho.
3.2
PTE of the Intercept Parameter
This section deals with the problem of estimation of the intercept parameter 19 when it is suspected that the slope parameter may be 00. As a special case it covers the two-sample problem of estimating one mean when it is suspected that the two means may be equal. Also, one-sample estimation of mean is obtained by letting x = 0 and prior information 0 = 80.
58
Chapter 3. Preliminary Test Estimation
Unrestricted, Restricted, and Preliminary Test Estimators of the Intercept Parameter
3.2.1
From Section 3.1.2., we know that the unrestricted estimator (UE) of 8 is
e
n --
If we suspect
P-bnz.
(3.2.1)
P to be Po, then the restricted estimator (RE) of 6 is given by
8,
= P - 00%.
(3.2.2)
In practice, the prior information that /3 = PO is uncertain. The doubt on this prior information can be removed by using “Fisher’s recipe” of testing the hypothesis HO : p = PO against the alternative H A : /3 # PO.As a result of this test we choose 0, or 8, based on the rejection or acceptance of Ho. Accordingly, we write the estimator as
8zT = 8,1(Ln < F I , ~ ( + ~ en1(L, )) 2 F I , ~ ( ( Y ) )m, = n - 2,
(3.2.3)
called the preliminary test estimator (PTE), where F~,,((Y)is the a-level upper critical value of a central F-distribution with (1,n - 2) d.f. and I ( A ) is the indicator function of the set A. For more details on PTE, see Judge and Bock (1978), Ahmed and Saleh (1988), and Ahsanullah and Saleh (1972). We can rewrite enpT as
eLT = en + (fin
- Po)zr(L,
< Fl,m(a)).
(3.2.4)
If a = 1, 0, is always chosen, if a = 0, 8, is chosen. Since 0 < (Y < 1, 6zT in repeated samples will result in a combination of 6, and 6n. Note that the PTE procedure leads t o the choice of one of the two values, namely, either or 8,. Also, the PTE procedure depends on the level of significance a.
en
Bias and MSE Expressions of the Estimators of the Intercept Parameter
3.2.2
Since our interest is to compare the UE, RE, and PTE of 6 with respect to a bias and the mse, we obtain the expressions of these quantities in the following theorems and proofs. First we consider the theorem for the bias and the quadratic bias expressions.
Theorem 1. (i) (ii) (iii)
bl
(8,)
= 0,
B~(B,)
b2(8,) = (P - po)z, b 3 ( 8PT , ) -
= 0,
~ ~ ( 8=, )g ~ 2 ,
(3.2.5a) (3.2.5b)
(P - P ~ ) Z G ~ , , ( : F ~ , , ( ~~) ;2 1 m , =n -2 (3.2.5~)
3.2. PTE of the Intercept Parameter
59
where G,,,,, (.; A2) is the cdf of a noncentral F-distribution with (ml, m2) d.f. and noncentrality parameter A2/2. Here BI, B2, and B3 are quadratic bias expressions for en, en or 6,". Proof. have
The expressions for bl(8,) and Bl(8,) are obvious. For
b2(6n),
we
(3.2.6a)
where Z N ( A ,1) and Z is independent of .:s tation conditionally on ms2/u2. Thus, N
We can evaluate the expec-
and (3.2.6b) Next, we consider expressions for the mean square errors of tp.
Theorem 2. The mse expressions for
nz2
~~(8 = -(I , ) + -), n Q 02
and
respectively.
en, en, and
en, en, and ezT are given by 02
M2(6,) = -(1+ n
nf2
-A2)
Q
(3.2.7a)
Chapter 3. Preliminary Test Estimation
60
Proof. M l ( 8 , ) follows from (3.1.4). As for M 2 ( i n ) we have
iw2(in) = qin- el2 = E { ( & e) + (Pn -
o2
nZ2
02 -(I
nZ2 + -) Q
= -n( I + Q ) =
n
-
p0)q2
+Z2E(Pn-/30)2+25E[(8,-Q)(Pn-/30)] 222 + A2) - ----a2
+&(I
Q
(3.2.8a) (3.2.8b)
n
Let
-
VQ
=
(Pn - Po)--.ff
=
-.{ (Pn
Then, using the fact that
{. (e,
-
e)l(Pn - P o ) }
-
P o ) - (P - P o ) } ,
(3.2.9b)
We write the third term as equaling
Q
( P n - Pn)2,1(13n
< F1,,(a))
Q (3.2.9~) Hence, using Theorems 2.2.4 and 2.2.5, we have
61
3.2. PTE of the Intercept Parameter
3.2.3
Comparison of bias and mse functions
Since the bias and mse expressions are knownJo us, we may now compare them for the three estimators, namely, 6,, On, and SFT. Note that all the expressions are functions of A’, which is the noncentrality parameter of the noncentral F-distribution. Also, A2 is the standardized squared distance between p and Po. First, we compare the bias based on the quadratic bias (QB-)functions as in Theorem 3.2.1. The graphs of the quadratic bias (QB) are shown in Figure 3.2.1. For 5 = 0 or under Ho,
B,(en) = ~ 2 ( 6 n =) ~ 3 ( 6PT, ) -0.
(3.2.10)
7
otherwise,
Bl(6,)
=0
rc2 1 2 I B3(6,PT)= - A 2 { G ~ , m ( 3 F ~ , m ( a ) ; A 2 ) }I Bz(6,) (3.2.11) Q
for all A2 > 0 and 2 # 0. The quadratic bias of 6, is linear in A’, while the quadratic bias of 6FT increases to a maximum as A2 moves away from the origin, and then decreases toward zero as A2 co. Now we compare the MSE functions of the restricted and preliminary test estimators with respect to the traditional estimator, 6,. The mean-square relative efficiency (MRE) of 6, compared t o 6, may be written as ---f
(
( + -A2
MRE(6,;6,) = 1 + ni2) 1
nz2
)-’
.
(3.2.12)
The efficiency is a decreasing function of A’. Under Ho it has the maximum value (l+$)
21
(3.2.13)
Chapter 3. Preliminary Test Estimation
62
0
2-
.-_ 4 v)
0
QB
~.
0-
a = 0.15
4
O
2
0
4
A2
6
8
10
Figure 3.2.1 Graph of quadratic bias functions of the estimators and
>
< >
MRE(6,; 6,) (1 according as A2-1.
(3.2.14)
Thus, 6, performs better than 6, whenever A2 < 1; otherwise, 6, performs better. The relative efficiency of 8zT compared to 6, is given by
MRE(6,PT;6,) = [l + g(A')]-',
(3.2.15)
where
(3.2.16) Under Ho, it has the maximum value
MRE@,PT;e,) = (3.2.17)
3.2. PTE of the Intercept Parameter
63
a
a
= 0.15
Eflciency
= 0.20
a
= 0.25
of PTE re1 UE
l7
MRE N
a = 0.25
0
a = 0.20
5
20
15
10
A2
Figure 3.2.2 Graph of MRE(6,;
8,)
and MRE(6zT;6,)
and MRE(6zT;8,) 2 11 according as
Hence, 6zT performs better than 6, if A' better than 6,". Since
1 2G3,m (?F~,rn(a); A')
(3.2.18) is
I K ( F I , ~ (A'); ~ ) ;otherwise, 6,
- G5,m ( i F l , m ( a ) ; A')
> 0,
(3.2.19)
(3.2.20) The graphs of the MRE(6, : 8,) and MRE(6LT : 6,) are shown in Figure 3.2.2.
3.2.4 Optimum Level of Significance of Preliminary Test Consider the relative efficiency of
MRE(a, A'), we have
6,"
compared to
MRE(cu,A2) = [1+g(A2)]-'.
6,.
Denoting it by
(3.2.21)
Chapter 3. Preliminary Test Estimation
64
The graph of MRE(a,A’), as a function of A’ for fixed a , is decreasing, crossing the 1-line t o a minimum at A’ = A i ( a ) (say); then it increases toward the 1-line as A’ -+ m. The maximum value of MRE(a, A’) occurs a t A’ = 0 with the value max MRE(a, A’) = A2
for all a E A, the set of possible values of a. The value of MRE(a,O) decreases as a increases. On the other hand, if a = 0 and A2 varies, the graphs of MRE(0,A’) = 1 and MRE(1,A’) intersect a t A’ = 1. In general, MR.E(al,A’) and MRE(a2,A’) intersect within the interval 0 5 A’ 5 1; the value of A’ a t the intersection increases as a increases. Therefore, for two different a-values, MRE(a1, A2) and MRE(a2, A’) will always intersect below the 1-line. In order to obtain an estimator (PTE) with a minimum guaranteed efficiency Eo,we adopt the following procedure: If 0 5 A’ 5 1,we always choose 6, because MR.E(a,A’) 2 1 in this interval. However, in general, A’ is unknown, so there is no way to choose an estimator that is uniformly best. For this reason, we select a n estimator with minimum guaranteed efficiency, such as Eo, and look for a suitable a from the set A0 = {alMRE(a,A’) 2 Eo}. The estimator chosen maximizes MRE(a, A’) over all a E A0 and A’. Thus, we solve the following equation for the optimum a*: min MRE(a, A’) = E ( a ,A:(*)) A2
= Eo.
(3.2.23)
The solution a* obtained this way gives the P T E with minimum guaranteed efficiency Eo, which may increase toward MRE(a*, 0) given by (3.2.22). Tables 3.2.1 and 3.2.2 give selected values of T’/Q and a = 0.05(0.05)0.50 for the procedure of choosing the level ( a * )of significance for n = 8 and Z2/Q = 1(.5)5, n = 12, and Z’/Q = 0.1(0.1)0.9.
3.2. PTE of the Intercept Parameter
ff -
LO!
Table 3.2.1 Maximum and Minimum Guaranteed Efficiencies for n = 8 and T 2 / Q = 1.0(0.5)5.0
/&
-2
Emax Emin A2in
1.11
Emax Emin A2in
1.11
Emax Emin A L n
1.21
Emax Emin Akin
l.2!
Emax Emin A2in
1.31
Emax Emin Akin
j.31
Emax Emin A2in
1.41
Emax Emin Azin
1.4!
Emax Emin Akin
1.51
Emax Emin
-
65
+ n & I
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
3.29753 3.16716 3.81283 3.94493 4.04011 4.11195 4.16810 4.21320 4.25021 0.32597 0.31773 0.31353 0.31099 0.30929 0.30806 0.30714 0.30643 0.30585 7.05937 7.05937 5.53023 7.05937 7.05937 7.05937 7.05937 7.05937 7.05937 2.27544 2.39283 2.46001 2.50352 2.53400 2.55654 2.57389 2.58764 2.59883 0.44789 0.43855 0.43378 0.43087 0.42892 0.42752 0.42646 0.42564 0.42497 5.32801 5.32801 5.32801 5.32801 5.32801 5.32801 5.32801 5.32801 5.32801 1.82656 1.88654 1.91991 1.94117 1.95589 1.96676 1.97496 1.98149 1.98678 0.54227 0.53289 0.52805 0.52510 0.52311 0.52161 0.52061 0.51976 0.51909 4.50883 4.50883 4.50883 4.50883 4.50883 4.50883 4.50883 4.50883 4.50883 1.40934 1.56154 1.64097 1.68976 1.72277 1.74659 1.76458 1.77866 1.78997 0.67415 0.62561 0.60605 0.59549 0.58887 0.58434 0.58104 0.57853 0.57656 4.00897 4.00897 4.00897 4.00897 4.00897 4.00897 4.00897 4.00897 4.00897 1.38335 1.40405 1.41526 1.42230 1.42712 1.43063 1.43330 1.43540 1.43710 0.70912 0.70127 0.69719 0.69469 0.69299 0.69177 0.69085 0.69013 0.68955 0.60838 0.60838 0.60838 0.60838 0.60838 0.60838 0.60838 0.60838 0.60838 1.30073 1.31595 1.32416 1.32929 1.33280 1.33535 1.33729 1.3882 1.34005 0.74827 0.74109 0.73735 0.73505 0.73349 0.73237 0.73152 0.73086 0.73033 3.41621 3.41621 3.41621 3.41621 3.41621 3.41621 3.41621 3.41621 3.41621 1.22121 1.23169 1.23731 1.24082 1.24321 1.24495 1.24627 1.24731 1.24815 0.79911 0.79298 0.78978 0.78781 0.78647 0.78551 0.78478 0.78421 0.78375 3.22498 3.22498 3.22498 3.22498 3.22498 3.22498 3.22498 3.22498 3.22498 1.09488 1.12019 1.13192 1.13868 1.14309 1.14618 1.14847 1.15014 1.15165 0.89949 0.87846 0.86936 0.86427 0.86102 0.85877 0.85712 0.85585 0.85485 3.07468 3.07468 3.07468 3.07468 3.07468 3.07468 3.07468 3.07468 3.07468 1.11847 1.12359 1.12632 1.12802 1.12917 1.13001 1.13065 1.13115 1.13155 0.87938 0.87532 0.87318 0.87187 0.87097 0.87033 0.86984 0.86946 0.86915 2.95417 2.95417 2.95417 2.95417 2.95417 2.95417 2.95417 2.95417 2.95417 1.08512 1.08868 1.09058 1.09176 1.09256 1.09314 1.09358 1.09393 1.09420 0.90992 0.90677 0.90512 0.90410 0.90340 0.90290 0.90252 0.90223 0.90199 2.85621 2.85621 2.85621 2.85621 2.85621 2.85621 2.85621 2.85621 2.85621
Chapter 3. Preliminary Test Estimation
66
Table 3.2.2 bfaximum and Minimum Guaranteed Efficiencies for n = 12 and -2 L =O.l(O.2)O.g Q cy -
Z2/Q
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0.01 Em,, 1.85798 2.33498 2.63864 2.84893 3.00317 3.12114 3.21428 3.28968 3.35198
Em,, 0.47754 0.42471 0.40464 0.39408 0.38755 0.38313 0.37992 0.37750 0.37560 A”,,, 5.53022 5.53022 5.53023 5.53022 5.53022 5.53022 5.53022 5.53022 5.53022 0.11 Em,, 1.57358 1.82247 1.96155 2.05035 2.11197 2.15723 2.19189 2.21927 2.24145 Em,, 1.40934 1.56154 1.64097 1.68976 1.72277 1.74659 1.76458 1.77866 1.78997 A;,, 0.67415 0.62561 0.60605 0.59549 0.58887 0.58434 0.58104 0.57853 0.57656
0.11 Em,, 1.40934 1.56154 1.64097 1.68976 1.72277 1.74659 1.76458 1.77866 1.78997 Em,, 0.67415 0.62561 0.60605 0.59549 0.58887 0.58434 0.58104 0.57853 0.57656 A:, 3.94090 3.94090 3.94090 3.94090 3.94090 3.94090 3.94090 3.94090 3.94090 0.21 Em,, 1.30133 1.40191 1.45223 1.48234 1.50256 1.51695 1.52774 1.53613 1.54284 Em,, 0.73816 0.69495 0.67714 0.66744 0.66133 0.65713 0.65406 0.65173 0.64989 A:, 3.60055 3.60055 3.60055 3.60055 3.60055 3.60055 3.60055 3.60055 3.60055 0.2, Em,, 1.22209 1.29033 1.32345 1.34302 1.35594 1.36510 1.37195 1.37725 1.38148 Em,, 0.79359 0.75642 0.74086 0.73231 0.72691 0.72318 0.72046 0.71838 0.71674 A’,,n 3.35268 3.35268 3.35268 3.35268 3.35268 3.35268 3.35268 3.35268 3.35268 0.31 Em,, 1.16931 1.21843 1.24181 1.25548 1.26446 1.27080 1.27552 1.27917 1.28207 Em,, 0.83380 0.80206 0.78860 0.781 16 0.77644 0.77318 0.77079 0.76897 0.76753 A;,, 3.18236 3.18236 3.18236 3.18236 3.18236 3.18236 3.18236 3.18236 3.18236 0.31 Emax1.12716 1.16235 1.17885 1.18842 1.19468 1.19908 1.20235 1.20487 1.20688 Em,, 0.86973 0.84357 0.83234 0.82610 0.82213 0.81938 0.81737 0.81582 0.81461 A’,,, 3.04378 3.04378 3.04378 3.04378 3.04378 3.04378 3.04378 3.04378 3.04378 0.41 Em,, 1.09488 1.12019 1.13192 1.13868 1.14309 1.14618 1.14847 1.15014 1.15165 Em,, 0.89949 0.87846 0.86936 0.86427 0.86102 0.85877 0.85712 0.85585 0.85485 A’,,, 2.93352 2.93352 2.93352 2.93352 2.93352 2.93352 2.93352 2.93352 2.93352 0.4% Em,, 1.06997 1.08809 1.09642 1.10120 1.10431 1.10648 1.10810 1.10934 1.11032 Em,, 0.92393 0.90750 0.90031 0.89629 0.89372 0.89193 0.88961 0.88961 0.88881 A;,, 2.84422 2.84422 2.84422 2.84422 2.84422 2.84422 2.84422 2.84422 2.84422 0.5r Em,, 1.05070 1.06354 1.06940 1.07276 1.07493 1.07645 1.07758 1.07844 1.07913 Em,, 0.94378 0.93131 0.92584 0.92275 0.92077 0.91940 0.91838 0.91761 0.91700 2.77103 2.77103 2.77103 2.77103 2.77103 2.77103 2.77103 2.77103 2.77103 IA,:
3.3. Two-Sample Problem and Pooling of Means
67
To illustrate the use of the tables, we consider n = 12 and Z2/Q = 0.40. We want a P T E with minimum guaranteed efficiency EO = 0.7811 (say). Then, using the table for n = 12 and EO = 0.7811 we find that a* = 0.30 with maximum efficiency E* = 1.2548.
3.3 3.3.1
Two-Sample Problem and Pooling of Means Model
Consider again the simple linear model (3.2.1)
Y=$l,+px+e,
(3.3.1)
-
where x = (0,. .. '0, 1 , . . . ,1)' is an n x 1 vector of n1 zeros and n2 one's such that n = n1 n2. Also, e N,(O, u21,), and in this case
+
p1=8, p 2 = 8 + P
and p = p 2 - p 1 .
(3.3.2)
Further,
3.3.2
Estimation and Test of the Difference between Two Means
The maximum likelihood estimators of p1, p2, and u2 are
where
Here y1 is the sample mean of the first n1 y-observations while j j 2 is the sample mean of the last n2 y-observations, and s; is the pooled sample variance of the y-observations. For testing the null hypothesis HO : 112 = p1 against H A : p2 # p1, the likelihood ratio test-statistic is given by (3.3.5)
L, follows a noncentral F-distribution with (1, n1 trality parameter A2/2, where
+ 722.2- 2) d.f. and noncen(3.3.6)
68
Chapter 3. Preliminary Test Estimation
Under Ho, 13, follows a central F-distribution. Finally, the unrestricted, restricted, and preliminary test estimators of p1 are given by
and (3.3.7) where I ( A ) is the indicator function of the set A and F I , ~ (is ~the ) upper a-level critical value of the central F-distribution with (1, m = n1 722 - 2) d.f.
+
3.3.3 Bias and mse Expression of the Three Estimators of a Mean To obtain the bias, quadratic bias, and mse expressions of the estimators of p1, we simply substitute in the expressions of Theorems 3.2.1 and 3.2.2 corresponding values of Z and Q given by (3.3.3). Thus, for the bias and quadratic bias we have the following theorem:
(3.3.9)
3.3. Two-Sample Problem and Pooling of Means
69
It is clear from the expressions of the mse's of fi1 and fifT that the relative efficiencies of j i 1 and jiFT compared to @I are given by
MRE(,hl;bl) =
{2 +nz n1
-1
n1
(3.3.1Oa)
+nz
and
respectively, where
Note that MRE(ji1;bl) is a decreasing function of A2. The maximum value of M R E ( f i 1 ; bl) occurs at A' = 0 with the value 1 + n2/n1; and as A2 + 03, MRE(ji1;f i 1 ) tends to zero. Further, j i 1 performs better than b1 for A' in the interval [ O , l ] , while jll performs better than outside this interval. Next, we compare firTrelative to f i l . In this case, the efficiency has the maximum value
MRE(bFT;bl)=
(31F ~ , m ( ~ ) ;
-1
0)]
( 2 1),
(3.3.11)
so it decreases monotonically, crossing the 1-line a t A2 = A i i n ,and then increases toward the 1-line as A' tends to infinity. The graphs of the M R E ( j i 1 : b1) and the MRE(ji;* : fi1) are shown in Figure 3.3.1. Based on the analysis of Section 3.2.4, the optimum level of significance a* with minimum guaranteed efficiency EO is obtained by solving the following equation for (Y: min MRE(a, A2) = E ( a ,A $ ( a ) )= Eo A2
(3.3.12)
as in the case of (3.2.23). Tables 3.3.1 through 3.3.3 of maximum and minimum guaranteed values are provided for (Y = 0.05(0.05)0.50 selected pair of integers (711,722). To illustrate the use of the table, let us choose nl = 8, n 2 = 16, and EO= 0.66. Then using the corresponding tabular values, we obtain a* = 0.15. The efficiency can go up to 1.442 if A' -+ 0.
Chapter 3. Preliminary Test Estimation
70
Table 3.3.1 Maximum and Minimum Guaranteed Efficiencies a\(n1,722)
0.05 E'
1.6445 1.4451 1.3400 1.2750 2.0301 1.7587 1.6005 1.4969 0.4623 0.5477 0.6092 0.6558 0.4380 0.4878 0.5294 0.5647 Ao 7.0594 6.3232 5.9338 5.6934 5.9338 5.6934 5.5302 5.4123
Eo 0.10 E'
1.4605 1.3258 1.2522 1.2058 1.6746 1.5182 1.4208 1.3542 Eo 0.5905 0.6647 0.7157 0.7530 0.5572 0.6039 0.6415 0.6726 Ao 5.3280 4.9274 4.7087 4.5712 4.7087 4.5712 4.4767 4.4078
0.15 E'
1.3415 1.2458 1.1921 1.1577 1.4755 1.3745 1.3089 1.2629
Eo 0.6781 0.7410 0.7830 0.8131 0.6434 0.6851 0.7180 0.7446 Ao 4.5088 4.2448 4.0983 4.0053 4.0983 4.0053 3.9409 3.8938 0.20 E'
1.2578 1.1879 1.1479 1.1220 1.3471 1.2779 1.2317 1.1988
Eo 0.7447 0.7973 0.8317 0.8560 0.7119 0.7482 0.7764 0.7988 Ao 4.0090 3.8206 3.7150 3.5475 3.7150 3.5475 3.5005 3.5661 0.25 E'
1.1962 1.1443 1.1142 1.0946 1.2579 1.2089 1.1756 1.1514
Eo 0.7978 0.8411 0.8690 0.8884 0.7683 0.7993 0.8229 0.8415 Ao 3.5667 3.5266 3.4474 3.3965 3.4474 3.3965 3.3611 3.3350 0.30 E'
1.1495 1.1107 1.0880 1.0731 1.1931 1.1577 1.1333 1.1155
Eo 0.8409 0.8760 0.8983 0.9137 0.8154 0.8412 0.8606 0.8758 Ao 3.4162 3.3095 3.2488 3.2097 3.2488 3.2097 3.1824 3.1622 0.35 E'
1.1135 1.0845 1.0674 1.0561 1.1445 1.1188 1.1009 1.0877
Eo 0.8761 0.9042 0.9217 0.9338 0.8548 0.8758 0.8915 0.9037 Ao 3.2250 3.1426 3.0955 3.0651 3.0955 3.0651 3.0438 3.0280 0.40 E'
1.0854 1.0638 1.0510 1.0425 1.1076 1.0889 1.0757 1.0660
Eo 0.9049 0.9269 0.9405 0.9498 0.8877 0.9044 0.9168 0.9263
00 3.0747 3.0107 2.9739 2.9501 2.9739 2.9501 2.9335 2.9211 0.45 E'
1.0634 1.0475 1.0381 1.0318 1.0792 1.0657 1.0561 1.0490
Eo 0.9284 0.9451 0.9555 0.9625 0.9148 0.9278 0.9373 0.9446 Ao 2.9542 2.9045 2.8758 2.8573 2.8758 2.8573 2.8442 2.8345
0.50 E* 1.0462 1.0347 1.0279 1.0233 1.0573 1.0477 1.0408 1.0357 Eo 0.9472 0.9598 0.9674 0.9726 0.9369 0.9467 0.9538 0.9593 Ao 2.8562 2.8178 2.7956 2.7812 2.7956 2.7812 2.7710 2.7635
3.3. Two-Sample Problem and Pooling of Means
71
Table 3.3.2 Maximum and Minimum Guaranteed Efficiencies
a\(% ,nz) (4, 12) (6, 12) (8, 12) (10, 12) (4, 16) (6, 16) (8, 16) (10, 16) 0.05 E*
2.2873 1.9918 1.8067 0.4286 0.4637 0.4948 5.5302 5.4123 5.3232
1.6798 2.4712 2.1720 1.9740 0.5224 0.4234 0.4506 0.4753 5.2534 5.3232 5.2534 5.1973
1.8332 0.4979 5.1512
1.7993 1.6456 1.5416 0.5440 0.5780 0.6072 4.4767 4.4078 4.3554
1.4664 1.8811 1.7363 1.6324 0.6325 0.5369 0.5635 0.5873 4.3142 4.3554 4.3142 4.2809
1.5542 0.6085 4.2534
1.5480 1.4540 1.3876 0.6293 0.6603 0.6866 3.9409 3.8938 3.8578
1.3381 1.5934 1.5081 1.4442 0.7090 0.6216 0.6463 0.6679 3.8294 3.8578 3.8294 3.8064
1.3947 0.6871 3.7874
1.3932 1.3310 1.2859 0.6983 0.7258 0.7487 3.5005 3.5661 3.5397
1.2516 1.4213 1.3661 1.3238 0.7680 0.6908 0.7129 0.7320 3.5188 3.5397 3.5188 3.5019
1.2902 0.7487 3.4880
0.25 E* 1.28887 1.2458 1.2140 Eo 0.7559 0.7797 0.7992 Ao 3.3611 3.3350 3.3149
1.1895 1.3072 1.2696 1.2403 0.8155 0.7490 0.7683 0.7847 3.2991 3.3149 2.2991 3.2862
1.2167 0.7990 3.2756
0.30 E'
1.2143 1.1839 1.1610 0.8045 0.8246 0.8409 3.1824 3.1622 3.1467
1.1432 1.2269 1.2006 1.1797 0.8544 0.7985 0.8148 0.8287 3.1344 3.1467 3.1344 3.1245
1.1628 0.8406 3.1163
1.1594 1.1376 1.1210 0.8456 0.8621 0.8754 3.0438 3.0280 3.0159
1.1080 1.1681 1.1494 1.1345 0.8863 0.8405 0.8540 0.8654 3.0063 3.0159 3.0063 2.9986
1.1222 0.8751 2.9921
Eo Ao
1.1181 1.1024 1,0903 0.8802 0.8934 0.9040 2.9335 2.9211 2.9117
1.0809 1.1242 1.1108 1.1000 0.9126 0.8759 0.8868 0.8959 2.9041 2.9117 2.9041 2.8980
1.0911 0.9037 2.8929
0.45 E* Eo Ao
1.0866 1.0753 1.0666 0.9088 0.9192 0.9274 2.8442 2.8345 2.8270
1.0598 1.0909 1.0813 1.0736 0.9341 0.9055 0.9140 0.9211 2.8212 2.8270 2.8212 2.8164
1.0672 0.9271 2.8124
0.50 E*
1.0625 1.0545 1.0483 0.9323 0.9401 0.9463 2.7710 2.7635 2.7577
1.0434 1.0655 1.0587 1.9532 0.9514 0.9297 0.9362 0.9416 2.7530 2.7577 2.7530 2.7493
1.9487 0.9462 2.7462
Eo Ao 0.10 E'
Eo Ao 0.15 E"
Eo Ao 0.20 E'
Eo Ao
Eo Ao 0.35 E"
Eo Ao 0.40 E*
Eo Ao
72
Chapter 3. Preliminary Test Estimation Table 3.3.3 Maximum and Minimum Guaranteed Efficiencies a\(n1, n2) 0.05 E'
(117 12)
1.8113 1.6673
1.6291 0.5365 5.1973
1.6189 0.5428 5.1127
1.6114 0.5474 5.0521
1.4351 0.6451 4.2809
1.4280 0.6501 4.2304
1.422$ 0.6538 4.1941
1.3171 0.7199 3.8064
1.3119 0.7240 3.7715
1.3081 0.7265 3.7464
1.2367 0.7773 3.5019
1.2328 0.7806 3.4760
1.2301 0.783C 3.4574
1.1787 0.8233 3.2862
1.1758 0.8259 3.2667
1.1736 0.827s 3.2525
1.1354 0.8607 3.1245
1.1332 0.8629 3.1093
1.1316 0.8644 3.0983
1.1022 0.8915 2.9986
1.1006 0.8932 2.9867
1.0994 0.8944 2.9780
1.0766 0.9167 2.8980
1.0754 0.9180 2.8887
1.0745 0.9189 2.8819
1.0567 0.9372 2.8164
1.0558 0.9382 2.8090
1.0551 0.9389 2.8037
1.4120 0.9537 2.7493
1.0405 0.9545 2.7436
1.0400 0.9550 2.7394
Eo 0.4293 0.5133
Ao 7.0594 5.5302 0.10 E'
1.5633 1.4618
Eo 0.5579 0.6266
Ao 5.3280 4.4767 0.15 E'
1.4102 1.3364
Eo 0.6482 0.7048 Ao 4.5088 3.9409 0.20 E'
1.3059 1.2511
Eo 0.7158 0.7650
Ao 4.0090 3.5005 0.25 E"
1.2307 1.1895
Eo 0.7754 0.8133 Ao 3.5667 3.3611 0.30 E'
1.1746
1.1435
Eo 0.8222 0.8527 Ao 3.4162 3.1824 0.35 E'
1.1318 1.1084
Eo 0.8609 0.8851 Ao 3.2250 3.0438 0.40 E'
1.0987 1.0812
Ea 0.8928 0.9117 Ao 3.0747 2.9335 0.45 E'
1.0731 1.0601
Eo 0.9190 D.9334 Ao 2.9542 2.8442 0.50 E'
1.0531 1.0436
Eo 0.9402 9.9509
An
(13, 14) (15, 16:
(7, 8)
(3, 4)
2.8562
2.7710
73
3.4. One-Sample Problem: Estimation of Mean
a = 0 IS a = 0.20 a = 0.25 Efficcrency of RE re/ UE
Efficiency of PTE re1 UE forn,=l ondn,=4
a = 0.25
x!
a = 0.20
,
0
5
I
I
10
15
A2
I
20
Figure 3.3.1 Graph of MRE(,Gl; ,GI) and MRE($rT; ,GI)
3.4 3.4.1
One-Sample Problem: Estimation of Mean Model
Consider the simple linear model
Y =81,+0x+e
-
where e Nn(0,a21,)and x = 0. Then we have the reduced model for the mean given by
Y
= 81,
+ e,
(3.4.1)
and the variance of the errors is a’. Our problem is to estimate 8 when it is suspected that 6 may be equal to 80. As before, we define the unrestricted, restricted, and preliminary test estimators of 8 in the next section.
3.4.2 Unrestricted, Restricted, and Preliminary Test Estimators Using the sample information and (3.4.1), we minimize the quadratic form
(Y - 6l,)’(Y - 61,),
(3.4.2)
to obtain the unrestricted estimator of 8 as
8,
=(lkl,)-l~k= ~
7
(sample mean).
(3.4.3)
74
Chapter 3. Preliminary Test Estimation
Clearly, E(6,) = 6 and Var(8,) = u’/n. Further, the unbiased estimator of u2 is given by
1 m
sp = -(Y
-
6 , 1 , ) ’ ( ~ - 6,1,),
In order t o test the null hypothesis HO : 6 = use the likelihood ratio test-statistic
m = n - 1. 60
against H A : 6
(3.4.4)
#
60, we
(3.4.5) The exact distribution of C, is the noncentral F-distribution with (1,m ) d.f. and noncentrality parameter A2/2,where A2 = n(6 - 6o)’/u2. Under Ho, it follows a central F-distribution. Thus, a t the a-level of significance we reject HO if C, 2 Fl,,(a), where Fl,,(cr) is the upper a-level critical value of the central F-distribution with (1,m ) d.f. The restricted estimator of 6 is simply 80, though we could combine 6, and 60 as d6, (1 - d)&, where 0 < d < 1 is the coefficient of distrust on the value 6 = 60 (type I1 error in testing Ho). However, in this case we will assume d = 0, meaning no distrust on 6 = 60. So we define the preliminary test estimator of 6 as
+
GET
+
= @01(Cn< ~l,rn(a))6 n I ( C n 2 ~l,rn(a))
= 6, - (6, - Qo)1(Cn < ~l,rn(a))?
(3.4.6)
where Fl,,(a) is the a-level critical value.
3.4.3 Bias, mse, and Analysis of Efficiency The bias expressions of
6,
and OT:
are
b l(6 , ) = 0,
&(en) = 0,
(3.4.7a)
Similarly, U’
Ml(6J = n
(3.4.8a)
75
3.4. One-Sample Problem: Estimation of Mean and
A&(ET)= n [1 0’
-
(
G3,m ;FLm(a); A2)
{
+A2 2G3,m (;Fl,m(a); A’) Thus, the efficiency expression for
8,”
to
1
- G5,m ( s F l , m ( a ) ; A2)}]
&, is given by,
en) = [ 1 + g(A’)]-’,
MRE(6,PT;
(3.4.8b) (3.4.9)
where
Hence,
8KT performs better than gn 0
whenever
I A’ 5 K(Fi,,(a); A’),
(3.4.11)
where
To obtain a P T E with minimum guaranteed efficiency Eo, we solve the equation min M R E ( a ,A2) = E ( a ,Ai(a)) = Eo A2
(3.4.13)
to find the optimum a-value a* as before. Table 3.4.1 gives the maximum and
minimum efficiency of the PTE of 19for various a and sample sizes.
Chapter 3. Preliminary Test Estimation
76
Table 3.4.1 Minimum and Maximum Efficiency of PTE for n = 8 and T 2 / Q = 0.1(0.1)0.9 a\n 0.05 E'
Eo A0 0.10 E' Eo A0 0.15 E' Eo A0 0.20 E' Eo A0 0.25 E' Eo A0 0.30 E' Eo A0 0.35 E' Eo A, 0.40 E' Eo A0 0.45 E'
Eo A0 0.50 E'
Eo A0
3.5
10 4.2577 0.3350 6.1009 2.5564 0.4500 4.8042 0.6827 0.5405 4.1626 1.6406 0.6174 3.7616 1.4514 0.6844 3.4811 1.3256 0.7430 3.2756 1.2374 0.7941 3.1163 1.1732 0.8381 2.9903 1.1256 0.8756 2.8885 1.0899 0.9068 2.8054
15 4.0063 0.3600 5.5303 2.4529 0.4722 4.4762 0.6892 0.5601 2.9403 1.6033 0.6345 3.6006 1.4259 0.6990 3.3612 1.3077 0.7553 3.1824 1.2245 0.8043 3.0429 1.1639 0.8463 2.9335 1.1189 0.8820 2.8442 1.0851 0.9118 2.7695
20 3.8912 0.3720 5.2857 2.4052 0.4828 4.3337 0.6924 0.5693 3.8429 1.5860 0.6425 3.5278 1.4141 0.7059 3.3066 1.2994 0.7611 3.1402 1.2185 0.8090 3.0110 1.1596 0.8501 2.9077 1.1158 0.8850 2.8240 1.0829 0.9140 2.7541
25 3.8252 0.3790 5.1514 2.3778 0.4889 4.2535 0.6943 0.5747 3.7875 1.5760 0.6471 3.4873 1.4073 0.7098 3.2757 1.2946 0.7645 3.1150 1.2150 0.8117 2.9922 1.1571 0.8523 2.8930 1.1140 0.8867 2.8125 1.0816 0.9153 2.7461
30 3.7825 0.3836 5.0656 2.3600 0.4929 4.2022 0.6955 0.5782 3.7520 1.5695 0.6502 3.4612 1.4029 0.7124 3.2557 1.2914 0.7666 3.0997 1.2128 0.8135 2.9801 1.1555 0.8538 2.8835 1.1128 0.8878 2.8050 1.0808 0.9162 2.7403
35 3.7525 0.3868 5.0063 2.3475 0.4958 4.2657 0.6964 0.5807 3.7272 1.5649 0.6523 3.4430 1.3998 0.7134 3.2417 1.2892 0.7682 3.0890 1.2112 0.8148 2.9716 1.1543 0.8548 2.8768 1.1120 0.8887 2.7979 1.0802 0.9168 2.7362
40 3.7304 0.3893 4.9629 2.3383 0.4979 4.1397 0.6971 0.5825 2.7090 1.5616 0.6539 3.4295 1.3975 0.7156 3.2314 1.2876 0.7693 3.0811 1.2100 0.8157 2.9653 1.1535 0.8555 2.8719 1.1114 0.8892 2.7941 1.0797 0.9172 2,7332
An Alternative Approach to Preliminary Test Estimation
3.5.1 Introduction In the study of the properties of the preliminary test estimators, we note that the optimum preliminary test estimators have basically two characteristics:
(1) They depend heavily on the level of significance (generally a 2 0.20), which is a nonstandard choice of level of significance. (2) They provide only two choices for the estimator, namely the restricted estimator or the unrestricted estimator on the result of the preliminary test.
3.5. An Alternative Approach
77
To eliminate the dependency on the level of significance and the extreme choices for the estimator, we devise an alternative estimator that will shrink toward a targeted prior value of the parameter under consideration and provide interpolated values of the estimator between the extreme choices. The interpolated values depend on the value (not the test result) of the test-statistic. The targeted value of the parameter is called the natural origin. This approach was introduced by Rodrigues (1987) and elaborated on by Bolfarine and Zacks (1991) in connection with the prediction problem in finite populations. We elaborate the methodology in detail in our context. We first consider the one-sample problem of Section 3.4 to introduce the idea of such an alternative estimator.
3.5.2 One-Sample Problem Consider the one-sample model of Section 3.4, which is
Y = 81,
-
+ e,
(3.5.1)
where e Nn(O,0~1,). It is suspected that 8 may equal 80 (natural origin). So we wish to estimate 6 based on a random sample (yl,... , y n ) and the uncertain prior value 80. Then we have the preliminary test estimator of 8 given by (3.4.6) based on 80, the unrestricted estimator, 6, and the test-statistic L,, as
p T = g-n where
-
(6 - 8o)l(Ln < ~ l , m ( a ) ) ;( m = R. - 11,
(3.5.2)
C, is the test-statistic given by (3.4.5) (3.5.3)
F1,n(a) is the upper a-level critical value from a central F-distribution with (1,m) d.f. The mean-square error of 6cT is
while the mean-square error of is M l ( 8 , ) = u 2 / n . The expression (3.5.4) depends heavily on the level of significance. Thus applying the maxmin rule given a t (3.4.13), we resolve the problem of choosing optimum a. To avoid the dependency of the estimator on a , we devise the following shrinkage estimator(SE) based on 80 and the sample (y1,. . . ,y,)
78
Chapter 3. Preliminary Test Estimation
via On and C,: (3.5.5a) (3.5.5b) (3.5.5c) where c is an appropriate nonnegative bounded constant. This estimator was introduced by Khan and Saleh (2001). Notice that 0: is similar to 6," (see 3.5.2) where we have replaced the indicator function I ( & < F~,,(cY))by a continuous decreasing function c1Ci/2j-1 of L,. Thus, instead of two extreme choices, namely 60 or 6,, 0: provides the choice for all values between 80 and depending on the value of C, for a given sample. 6: is a smooth version of OrT,where 62 4 On and 6rT -+ 8, as C, + 0;) and 62 + $0 when 1LA'2j-1 -+ c-l, while 6rT -+ 80 for small values of C,. The bias of 6: is seen to be -
A
en
where 2
-
(3.5.6)
N ( A , 1) with A = fi(e - do)/.
Theorem 1. If 2
-
N ( A , l), then
[ti,1
E - =1-2@(-A),
(3.5.7)
where @ is the cdf of the standard normal distribution. The proof is straightforward (see the problems a t the end of the chapter; see also Rodrigues, 1987, and Bolfarine and Zacks, 1991). Hence the bias expression (3.5.6) becomes C(3
--Kn(2@(A) n
where K, =
- 1)'
The quadratic bias function of
B'&)
6: = {:c "
is further given by
2@(A)- 1)
2
.
(3.5.8)
The quadratic bias expression for the P T E is
(3.59)
3.5. An Alternative Approach
79
which is dependent on CY. B4(6:) is a nondecreasing function of A2 with initial value zero and increases monotonically to c2K2 as A2 m. On the other hand, B3(irT)is a function of ((.,A) and, for fixed C Y , begins with the initial value zero and increases to a maximum a t some point A = Ao; then it decreases gradually to zero again. Thus, except for small and large values of A, the bias of 6; is smaller than the bias of 6,"'. Next, we consider the mean-square error of 6:, which is .--)
(3.5.10)
To evaluate (3.5.11), we use the following theorem:
Theorem 2. If 2
-
E ( [ Z [=) Proof.
c
n/(A, l ) , then e-A2/2
+ A{2@(A)- l}.
(3.5.12)
The pdf of 1 2 1 is (3.5.13)
=
zQ(z)dz
=
e-A2/2
+
1;
z+(z)dz
+ A{2@(A)- I}
+ A{2@(A)- l}.
(3.5.14)
Hence, the mse of 6: is given by (35.15)
Chapter 3. Preliminary Test Estimation
80 The value of c that minimizes
M4(@)
is
(3.5.16)
7
which depends on A2. To make c* independent of A2, we choose c as
co = K , E .
(3.5.17)
Thus, the optimum value of M 4 ( i z ) reduces t o
(3.5.18)
3.5.3 Comparison of PTE, .:8
and SE
6,
The relative efficiency of 0: compared t o
2 { -~::[2
RE@: : 6,) = 1 while that of
8:
is given by
e-A2/2 -
7f
(3.5.19)
6,PT is given by
R E ( 8 c T :e n ) = [1 -G3,m(3F1,m(a);A2) 1 -1
+A2{ 2G3,m(3F~,m(a);A2) 1 - G S , ~ ( $ F I , ~ ( ~ ) ; A ~ .) } ](3.5.20) Under the null hypothesis Ho : 6 = 60, A2 = 0; hence,
[
1-1
R E ( @ : 6,) = 1 - ZK: ~
21,
(3.5.21)
while 1
RE(dcT : 6,) = [1 -G3,m(5F1,m(a);0)]-1 2 1
(3.5.22)
depending on the size of a. As A2 -+ co,
RE@
: 6,)
-+
[1+ ;K 2 :]
-1
(3.5.23)
and R E ( d r T : 6,)
-+
1.
5
(3.5.24) 1
In general, RE(@ : 6,) decreases from [I - K:]at A2 = 0 and crosses the 1-line at A2 = In 4 (= 1.38) then drops to the minimum value [l ZK:]-' ( 5
+
3.5. An Alternative Approach
81
+
1) as A2 + m. The loss of efficiency is 1 - [l :K:]-', while the gain in efficiency is [I - ZKi1-l. Thus, for 0 5 A2 5 ln4, 62 performs better than otherwise, 8, performs better outside this interval. In the case of 6fT, the graph of RE(6rT; decreases from (3.5.22) t o a minimum at A = A*, and then increases to 1 as A --t m. It performs better than in the range 0 5 A2 5 1. Thus, the range of better performance of 6: compared to OFT is increased by an amount 0.38. The following Table 3.5.1 gives the performance of 0; and if* (for each selected level of significance and sample sizes, n).
en;
an
1
en)
Table 3.5.1 Maximum and Minimum Efficiencies of SE and Efficiency of PTE at A0 for Selected a a\n 10 15 20 25 30 35 40 Em"" 2.6261 2.6692 2.6903 2.7029 2.7112 1.7171 1.7215 Emin 0.6176 0.6152 0.6141 0.6135 0.6131 0.6128 0.6125 0.05 En, 0.6408 0.6466 0.6498 0.6518 0.6532 0.6542 0.6550 Eo 0.3350 0.3600 0.3720 0.3790 0.3836 0.3868 0.3893 A0 6.1009 5.5303 5.2857 5.1514 5.0656 5.0063 4.9629 0.15 EA, 0.6827 0.6892 0.6924 0.6943 0.6955 0.6964 0.6971 Eo 0.5405 0.5601 0.5693 0.5747 0.5782 0.5807 0.5825 4,1626 2.9403 3,8429 3,7875 3.7520 3.7272 2.7090 A0 0.25 E A ~0.7133 0.7182 9.7206 0.7220 0.7229 0.7236 0.7241 Eo 0.6844 0.6990 0.7059 0.7098 0.7124 0.7134 0.7156 A0 3.4811 3.3612 3.3066 3.2757 3.2557 3.2417 3.2314 0.35 EA, 0.7361 0.7395 0.7410 0.7420 0.7426 0.7420 9.7433 Eo 0.7941 0.8043 0.8090 0.8117 0.8135 0.8148 0.8157 A0 3.1163 3.0429 3.0110 2.9922 2.9801 2.9716 2.9653 0.45 EA, 0.7536 0.7555 0.7564 0.7569 0.7572 0.7576 0.7578 Eo 0.8756 0.8820 0.8850 0.8867 0.8878 0.8887 0.8892 A0 2.8885 2.8442 2.8240 2.8125 2.8050 2.7979 2.7941
The first two rows of the table give the maximum and minimum relative efficiency of SE for selected sample sizes. The maximum relative efficiency of SE increases as the sample size n increases, and as n m, it tends to r/r - 2 while the minimum efficiency decreases as n increases. Finally, as n -+ ca, it 1 tends to .rr/r+ 2. The maximum relative efficiency at A' = 0 is [l - $K:]- , which varies depending on the sample sizes. The remaining rows of the Table 3.5.1 contain the minimum relative efficiency, Eo of the P T E at A = A,, which is recorded for each (Y = 0.05(0.10)0.45 with the corresponding efficiency ( E n o )of SE for A = A,. The graph of the relative efficiency of SE against A, is plotted in Figure 3.5.1 along with the graph of relative efficiency of P T E showing the relative positions of the efficiency over the Ao-values. From Figure 3.5.1 it is clear that the relative efficiency of PTE compared t o UE is smaller for A > 1 and that of SE relative to UE is smaller for A > 1.38. Thus, SE dominates UE over the wider interval [0, 1.381 relative to the PTE -+
Chapter 3. Preliminary Test Estimation
82
:1
RE of SE and PTE a:
\
a = 0.15
D
0
I
I
10
30
20
A?
40
Figure 3.5.1 Graph of the relative efficiency of SE and PTE for different values of a in the interval [0, 11. Also from Figure 3.5.1 we see that SE has higher relative efficiencythan PTE over all A at which efficiency of PTE is smallest.
In general, there is no uniform domination of SE over the PTE for all A and every a. When A is near the origin, SE over performs PTE for some a. Also, at A = 0, the relative efficiency of SE is much higher than that of the PTE except for small a. Further, SE is independent of a. Thus, considering the overall performance of SE relative to PTE,SE is preferable to PTE because it produces interpolated estimators that are free of the level of significance, a.
3.5.4 Simple Linear Model and Shrinkage Estimation Consider the simple model (3.1.1) again,
Y = 81,
+ px + e,
e
- N,(o,2 1 ~ ) ~
(3.5.25)
and the estimator of (0, p) as given by (3.1.2) and that for testing given by (3.1.6). These estimators are repeated here as
(k)
=
(
P - pnx
&[X’Y - i(l;x)(l;x)]
)
p = ,& as (3.5.26)
and
(3.5.27)
83
3.5. An Alternative Approach where Q = x’x - i(1;x)’ and s: = ( n - 2 ) - l ( Y
-
e,ln- PnX)’(Y- e,l,
- JnX).
(3.5.28)
Consider the P T E and SE of ,B following Section 3.5.1 as
finPT
- Pn - ( P n - P ~ ) I ( L
< ~ l , m ( a ) ) , rn = R. - 2,
(3.5.29)
and
(3.5.30) respectively, where c is the nonnegative shrinking constant. Similarly, the PTE and SE of 8 are given by =
e;T
e, + (Pn- po)zr(cn< F1,m(cy)),
rn = n - 2,
(3.5.31)
and
(3.5.32)
(e;,
respectively. The shrinkage estimators fi:) have been studied by Khan, Hoque, and Saleh (2002). The bias expressions for the SE of /3 and 8 may be obtained as C
a
b 4 ( f i 3 = --E(se)E
where Z
-
(3.5.33)
N ( A , 1) with A = (P - , B O ) Q ~ / ~and / Owhere (3.5.34)
r(d) G&. 2
with K , = Similarly the mse expression for the SE of are, respectively, given by
p and 8
Chapter 3. Preliminary Test Estimation
84 and
M4(6:) = E [ ( 6 , - 8)2]
C2P2 + -E(se) + 2-E(se)E C
2
a
Q
(3.5.36) using the optimum value of c as before. The relative efficiency of $2 with respect to are
p and of 02 with respect to 6,
RE(^: : pn) = [I - 7r
- l)] - l
(3.5.37)
and
en)= [I
R E ( ~ , :S
2
-
;K:-
9 1
-1
(; + -ii-) Q z2
(2
e-a2/2 -
i)]
-1
,
(3.5.38)
respectively. If A2 = 0,
2
RE@: : Pn) = [l - ;K;]
-1
(21)
(3.5.39)
and
RE(8f : 6,)
=
2 22 1 f 2 [ l - --K;--(7r Q n+Q
depending on the value of
(3.5.40)
z2/Q.On the other hand, as A2 + o,
RE@:
:
6,)
= [l
+ ;Kz] 2
-1
( 5 1)
(3.5.41)
and
RE(6f:in)=
(3.5.42)
Note that both relative efficiencies are decreasing functions of A2 with the highest efficiency at A2 = 0. Some values of the efficiency RE(@ : 6,) are given in Table 3.5.2 for chosen values of the coefficient of variation of the z-values. The efficiency RE(,@ : is available in Table 3.5.1, since it is similar to the one-sample problem.
P,)
85
3.5. An Alternative Approach
Table 3.5.2 Minimum and Maximum Relative Efficiencies of SE and PTE for n = 8, a = 0.05(0.10)0.45 and = 1(0.5)5
a\?'lQ E"""
Emin
0.05 En,
Eo A0 0.10 Ea, Eo A0 0.15 En, Eo A0 0.20 Ea, Eo A0 0.25 EA, Eo A0 0.30 En, Eo A0 0.35 EA, Eo A0 0.40 Eno Eo A0 0.45 Ea, Eo A0 0.50 Ea, Eo
A0
5
1 1.5 2 2.5 3 3.5 4 4.5 5 2.08696 2.17801 2.22951 2.26263 2.28571 2.30273 2.31579 2.32613 2.33452 0.65756 0.64901 0.64457 0.64186 0.64002 0.63870 0.63770 0.63692 0.63630 0.67101 0.66262 0.65827 0.65560 0.65380 0.65250 0.65152 0.65076 0.65014 0.32597 0.31773 0.31353 0.31099 0.30929 0.30806 0.30714 0.30643 0.30585 7.05937 7.05937 7.05937 7.05937 7.05937 7.05937 7.05937 7.05937 7.05937 0.69048 0.68236 0.67814 0.67555 0.67380 0.67254 0.67159 0.67084 0.67024 0.44787 0.43855 0.43378 0.43087 0.42892 0.42752 0.42646 0.42564 0.42497 5.32801 5.32801 5.32801 5.32801 5.32801 5.32801 5.32801 5.32801 5.32801 0.70845 0.70060 0.69651 0.69400 0.69231 0.69108 0.69016 0.68944 0.68886 0.54227 0.53289 0.52805 0.52510 0.52311 0.52168 0.52061 0.51976 0.51909 4.50883 4.50883 4.50883 4.50883 4.50883 4.50883 4.50883 4.50883 4.50883 0.72438 0.71678 0.71282 0.71040 0.70875 0.70757 0.70667 0.70597 0.70541 0.62136 0.61244 0.60782 0.60499 0.60309 0.60172 0.60068 0.59987 0.59922 4.00897 4.00897 4.00897 4.00897 4.00897 4.00897 4.00897 4.00897 4.00897 0.74109 0.73378 0.72997 0.72763 0.72605 0.72491 0.72405 0.72337 0.72283 0.70912 0.70127 0.69719 0.69469 0.69299 0.69177 0.69085 0.69013 0.68955 3.60838 3.60838 3.60838 3.60838 3.60838 3.60838 3.60838 3.60838 3.60838 0.75070 0.74358 0.73986 0.73758 0.73603 0.73492 0.73407 0.73341 0.73288 0.74827 0.74109 0.73735 0.73505 0.73349 0.73237 0.73152 0.73086 0.73033 3.41621 3.41621 3.41621 3.41621 3.41621 3.41621 3.41621 3.41621 3.41621 0.76154 0.75461 0.75100 0.74878 0.74728 0.74620 0.74538 0.74474 0.74422 0.7991 I 0.79298 0.78978 0.78781 0.78647 0.78551 0.78478 0.78421 0.78375 3.22498 3.22498 3.22498 3.22498 3.22498 3.22498 3.22498 3.22498 3.22498 0.77106 0.76432 0.76081 0.75865 0.75719 0.75613 0.75533 0.75471 0.75421 0.84262 0.83755 0.83489 0.83326 0.83215 0.83134 0.83074 0.83026 0.82988 3.07468 3.07468 3.07468 3.07468 3.07468 3.07468 3.07468 3.07468 3.07468 0.77941 0.77286 0.76943 0.76732 0.76590 0.76487 0.76409 0.76348 0.76299 0.87938 0.87532 0.87318 0.87187 0.87097 0.87033 0.86984 0.86946 0.86915 2.95417 2.95417 2.95417 2.95417 2.95417 2.95417 2.95417 2.95417 2.95417 0.78673 0.78033 0.77699 0.77493 0.77354 0.77253 0.77177 0.77118 0.77070 0.90992 0.90677 0.90512 0.90410 0.90340 0.90290 0.90252 0.90223 0.90199 2.85621 2.85621 2.85621 2.85621 2.85621 2.85621 2.85621 2.85621 2.85621
Note: 01 is the level of significance, Em"" is the maximum efficiency of SE, Emin is the minimum efficiency of SE, EOis the minimum efficiency of PTE, Eno is the minimum efficiency of SE a t A,, and A0 is the value of A a t which the minimum efficiency of PTE occurs.
Chapter 3. Preliminary Test Estimation
86
The Two-Sample Problem and Shrinkage Estimation Consider the linear method (3.5.25), where x = (0,. . ,0; 1 , . . . , 1)' is an n x 1 3.5.5
vector with n1 zero's and 122 1's such that n = n1 +n2. Also, e and in this case, n2
-
x=-
nl
71.1732
+ n 2 Q = -n1+ n2
For this model, Y = 61,
and
+ ,Ox+ e ,
,LL~=
6
and
'5
--
Q
- N,(O,
n2
ni (ni + n2) '
u21n),
(3.5.43)
p2 = 6 + P
(3.5.44)
fi2 = g2.
(3.5.45)
and p2 as
with the estimators of
= $3
fi1
and
The unbiased estimator of n2 is s;, where
+
s; = (ni n2 - 2)-l{
ni
n
j=1
j=nl+l
C(yj- 3il2 + C
(yj - 3212
1.
(3.5.46)
For the test of the hypothesis Ho,
HO : PI = PZ against PI # PZ we consider the statistic
L,
=
n1n2 n1 +n2
(fi2 -
~
s;
fid2
(3.5.47)
(3.5.48)
Now the shrinkage estimator of p1 may be written as
(3.5.49)
+
+
where f i n = nlfil n2fi2/n1 n2. The bias of jif is given by
-
=
-cod-
1 + -K, n1 1
722
(2@(A)- l),
where 2 N(A, 1) with A2 = nlnz(p2 - pl)'/(nl Similarly, the mse of f i f is then calculated as
M(b3 = E"+s
-Pd2]
+ n2)02.
(3.5.50)
3.5. An Alternative Approach
with the optimum c-value as m
87
K
, given earlier and K , is defined by (3.5.52)
Hence, the R E ( b f ; f i l )is
(3.5.53)
RE($;
$1)
is a decreasing function of A2. At A2 = 0 ,
while for A2 -+ 00,
(3.5.55) It is known from Section 3.3 that the relative efficiency of jirT is
see (3.3.10b). The maximum value of RE(jifT;b1) is
and the minimum value for each (Y occurs a t A2 = A: with the value [l + g(A;)]-' and A; depending on the size of a. Table 3.5.3 provides the relative efficiencies of SE and PTE for each (Y at A2 = A: to show that SE out forms the PTE for many a-values. Thus, SE is a preferred estimator of p1.
Chapter 3. Preliminary Test Estimation
88
for a = .05(.10).45 and for selected samples ~ \ ( n ina) ,
Em,, Emin .05 EA, Eo
.I0 .I5
.20
A0 EA, Eo A0
EA, Eo
A0 EA,
Eo A0 .25
.30
.35
.40
.45
I
.50
Ea, Eo A0 Ea, Eo A0 EA, Eo A0 EA, Eo A0 EA,
Eo A, Ea, Eo
Ao-
(4,4) 1.4144 0.7734 0.7838 0.4623 7.0594 0.7986 0.5905 5.3280 0.8120 0.6781 4.5088 0.8237 0.7447 4.0090 0.8338 0.7978 3.5667 0.8426 0.8409 3.4162 0.8502 0.8761 3.2250 0.8569 0.9049 3.0747 0.8627 0.9284 2.9542 0.8677 0.9472 2.8562
(43) 1.6771 0.7124 0.7341 0.4380 5.9338 0.7535 0.5572 4.7087 0.7694 0.6434 4.0983 0.7826 0.7119 3.7150 0.7939 0.7883 3.4474 0.8034 0.8154 3.2488 0.8117 0.8548 3.0955 0.8189 0.8877 2.9739 0.8251 0.9148 2.8758 0.8304 0.9369 2.7956
(4,121 1.8544 0.6846 0.7129 0.4286 5.5302 0.7340 0.5440 4.4767 0.7506 0.6293 3.9409 0.7643 0.6983 3.6005 0.7757 0.7559 3.3611 0.7855 0.8045 3.1824 0.7939 0.8456 3.0438 0.8012 0.8802 2.9335 0.8075 0.9088 2.8442 0.8129 0.9323 2.7710
(416) ( w 6 ) 1.9816 1.4770 0.6688 0.7559 0.7012 0.7870 0.4234 0.5474 5.3232 5.0520 0.7230 0.8041 0.5309 0.6538 4.3554 4.1941 0.7400 0.8172 0.6216 0.7269 3.8578 3.7462 0.7538 0.8276 0.6908 0.7830 3.5397 3.4574 0.7654 0.8362 0.7490 0.8279 3.3149 3.2525 0.7752 0.8433 0.7985 0.8644 3.1467 3.0983 0.7837 0.8494 0.8405 0.8944 3.0159 2.9780 0.7909 0.8546 0.8759 0.9189 2.9117 2.8819 0.7972 0.8591 0.9055 0.9389 2.8270 2.8037 0.8027 0.8630 0.9297 0.9550 2.7577 2.7394
~
1
Note: a is the level of significance, n1 and n2 are the sample sizes, Em,, is the maximum efficiency of SE, Emin is the minimum efficiency of SE, Eo is the minimum efficiency of PTE, EA, is the minimum efficiency of SE at Ao, A0 is the value of A a t which the minimum efficiency of P T E occurs.
3.6
Estimation with Non-Normal Errors
In this section we consider the properties of the unrestricted, restricted, and preliminary test and shrinkage estimators in a simple linear model when the components of the error vector e = ( e l , . . . ,en)’ in (3.2.1) are
3.6. Estimation with Nonnormal Errors
89
independent and
E ( e ) = 0 and E(ee’) = 021n and the distribution of e is nonnormal, say, F ( e ) =
3.6.1
n,”=,Fo(ej).
(3.6.1)
Unrestricted, Restricted, Preliminary Test and Shrinkage Estimators, and the Test of Slope
Following the notations of Section 3.1 and 3.2, the unrestricted and the restricted estimators of 6 and p are
6 and
-y - ,&I,
pn = $
6, . I . [
P = PO)
= ?j - POT (when
- ;(l:x)(l;x) 1
1
.
(3.6.2)
As for the test of HO : P = PO,the test-statistic L, as in (3.1.6) may be used (3.6.3) Therefore, the preliminary test estimator 6zT is simply,
iy
=
e, - (6, - ijn)l(Ln<
(3.6.4)
where is the upper a-level value of the distribution of C, under Ho. The shrinkage estimator 6: based on Section 3.5 is then defined by
6:
=
en - c(6,- .- .
Se
J&lA - Pol (3.6.5)
3.6.2 Conditions for Asymptotic Normality of the Unrestricted Estimators of Intercept and Slope Parameters We know that if the errors are independent normal with zero mean and variance L?, the exact distribution of is a bivariate normal with mean (6, P)’ and covariance matrix
(en,&)’
(3.6.6) Since e = (el,.. . ,en)’ is not normally distributed, we need the following three regularity conditions (in this section and the next one) for the asymptotic normality of (8,,&)’. See for example Sen and Singer (1993).
Chapter 3. Preliminary Test Estimation
90
Theorem 1. Assume the following: (i) n+m lirn Z = 2 0 , IZol < co.
(ii) Let qi = xi - T / J & and m a x l l i i n q:
-+
0 as n
4
(3.6.7)
00.
(iii) lim nP1Q= Qo < co. n-oo
Then
where D stands for Yn distribution.”
L,
As for the test of hypothesis, we see that by Sluskey’s theorem as n =
d.f.
-+ 00,
(Pn - , B o ) ~ Q /converges s~ t o a central chi-square distribution with one
Under the fixed alternative hypothesis As : P = Po + b we see that h C P n - PO)
&(pn
= f i ( P n - 0)
+6 6 ,
(3.6.9)
and under As, that - PO) N(&6,a2/Qo).Also, f i b n 00 which implies that 13, = - Po)’/&]/$ --+ 03 as n result, we have the following theorem:
[(p, N
-+
-+
.--)
co, as As a
00.
Theorem 2. Under fixed alternatives
As : p = Po
+ 6,
and the regularity conditions of Theorem 1, as n -+ co,
fi(ZT- P) = f i ( P - P ) + o p ( 1 ) ; (4 fi(Z- P) = &(P - P) + (i)
(3.6.10)
O p ( 0
Proof.
Under As, we have
n+m
n(brT
-,&)’I
= n-m lim E [ n ( j n- PO)’I(Ln
I &,a)]
91
3.6. Estimation with Nonnormal Errors Similarly,
Clearly, under A&,as n
-+ 00,
the asymptotic bias and mse expressions for
bLT and ,@ are the same as for ,&, namely, (i) limn+m E[,brT- 131 = 0 and
Similarly, under (2)
] -- p, (r2
A6 ,
EpfT
and (ii) limn--tmE
2
limn-m E [ ( b f T- P)
B[ :
- 81 = 0
- B] = 0
and
and
limn-m E[(h,PT- B)'] = u2(1
[
limn-.,m E (62 - B)"] = ' 0 (1
+ $-),
+ $).
To overcome the difficulty of identical asymptotic distributions of different estimators in large samples under fixed alternatives, we consider the local alternatives
where 6 is a fixed number. Then we see a t once that the asymptotic distribution of L, converges to a noncentral chi-square distribution, H I ( . ;A;) with one d.f. and noncentrality parameter A;/2, where 62
A$ = ;;iQo.
(3.6.14)
Hence, the power function of an cr-level test is given by
We will soon see that the asymptotic distributional bias and mse expressions of the various estimators of 0 are different under K(n).
Chapter 3. Preliminary Test Estimation
92
3.6.3
Asymptotic Distributional Bias and Mean Square Error Expressions, and Efficiency Analysis
The asymptotic distributional bias, quadratic bias, and mse expressions of the unrestricted, restricted, and preliminary test and shrinkage estimators of the slope and intercept parameters together with their asymptotic efficiency expressions are given in this section. The following theorem gives the asymptotic distributional bias expressions of estimators of 8 and 0:
Theorem 3. Under K ( n )and the regularity conditions of Theorem 3.6.1 the asymptotic distributional bias and quadratic bias expressions of the estimators of p and 8 are given by (i) b l ( & ) = 0
and
B1(Pn)= 0;
&(bzT)= Ag{H3(x:(a);Ai)}
(ii) b2(bET) = SH3(x?(a);A:) and (iii)
ClY
b3(bf)
= -kQ(A0) - 11 and
fl
2
;
2
B3(& = c2 [2Q(Ao) - I] . (3.6.16)
For the bias expressions of the estimators of 8 we respectively have
.&(en) = 0; 620 and &(en) = S A ; ;
(i) b l ( e n ) = 0 and (ii) b z ( i n ) =
(iii) b3(8ET) = --G?~?OH~(X~(CY); A;) and B3(@T)= A;$(H3(x:(a); (iv) b4(8:)
= % ~ @ ( A O ) - I]
and
2
A;)} ; 2
B4(&
= ~ ~ $ j A ; [ z @ ( Ao )11
.
(3.6.17) Proof. For b l ( b n ) = B1(jn)= 0 the proof is obvious. Consider b2(,8LT) and b 3 m .
93
3.6. Estimation with Nonnormal Errors Similarly,
=cu [z@(Ao)- 11
@
Hence,
2
B z ( j z )= c2 [2@(Ao)- 11
(3.6.19)
en,
Next, we consider the estimators of in,6rT,and 6,. We then have
B1(8,) = 0. Now, consider b2(dn) = lim E ( f i ( 6 , n-m
-
6,". Clearly, b l ( 8 , ) =
6 ) } = lim E{&(bn - Po)C} 12-00
(Pn
-Po)
Z
n-m
-
and normalizing, we have 5; &(in) = -A:.
N ( A o ,I )
(3.6.20)
Q
Now,
b3(i,PT)= n-oo lim ~ { f i ( i-,o))> P~
=
- o s A o H , ( X ? ( C Y ) ;A:)}. QO
Then, dividing by o Z 0 / Q o and squaring, we get
B 3 ( i 3 = $A:
{H~(~;(N);A:}~.
(3.6.21)
Finally, I
b4(63 =
fic'ose & l / {
J&
- 00 = IPn
-Pol
3[2@(A0)- 11,
Hence,
B4(6E) = c2% [2@(Ao)- I] 2 . Q0
(3.6.22)
Chapter 3. Preliminary Test Estimation
94
The next theorem gives the asymptotic distributional mse expressions.
Theorem 4. Under the regularity conditions of Theorem 1, the asymptotic mse expressions of the estimators of ,i3 are given by
That of the estimators of 8 are given by
and M4(6:) = ~ ~ [ 12 + ~ { 12 - - ( 2 e - ~ ~ / ~ Q0 x
(3.6.24)
Proof. Consider only the estimators of 6 . In this case,
M ~ ( B ,= ) lim E n(gn - e ) ~ ] n-cc
= U’
(1
[
+
$)
by (3.5.7).
(3.6.25)
Next, consider
M~(B,)
=
lim
n-oo
= lim n-cc
E [ ~ ( B ,- e ) 2 ] E n n(& - 0) + (pn - Po).]
[
= limE[n(B,
+ 2 l’olim o
-
2
el”] + n-oo lim ~ [ n ( pj o )~2 5 2
{ z E [ ~ ( B-,e)(pn - p O ) ] }
1
3.6. Estimation with Nonnormal Errors = O’
[
(1
+
g)+
+ %A; Q
-2
-(1
Q0
2
+ A:)
95 -
F H 3 ( x f ( a ) A: ; ) - H 5 ( x ? ( a ) A:)]}. ;
(3.6.27)
We used the limiting version of Theorems 2.2.4 and 2.2.5 and techniques in the proof of Theorem 3.3.2. Finally,
(3.6.28)
The asymptotic efficiency expressions for 6,, 6nPT, and 62 relative to
AMRE(6,;8,)
=
AMRE(6,PT;&) = [1+g(Ao)] 2 -1 , and
en are
Chapter 3. Preliminary Test Estimation
96
-
L M R E with xo=3
I
J
a = 0.25
s! 0
I
I
I
5
10
15
I
20
A;
Figure 3.6.1 Graph of AMRE of TO:
AM RE(^^; 6,)
=
5: { 1 - -r2 Q(1 + -+) o Q 52
and -1
82 relative to 6,
[2e-*;/'
-
11}-I,
(3.6.29)
where
Thus,
8,
performs better than
6,
whenever
0 5 A:
otherwise,
6,
is better. Similarly,
5 1;
8ET performs better than 8,
(3.6.31) whenever (3.6.32)
otherwise, 6, is better. Figure 3.6.1 shows the graph of the AMRE of 6cTand
8," relative to 6,.
We present below Tables 3.6.1 and 3.6.2 for maximum and minimum = 0.05(0.05)0.50 and various selected values of Zg/Qo = 0.1(0.1)0.9(0.5)5.
MRE's for a
3.6. Estimation with Nonnormal Errors
97
a 0 05
01
0 15
02
0 25
03
0 35
04
Em,,
&in
1.16931 1.21843 1.24181 125548
126446
0.83380 0.80206 0.78860 0.781 I6
0.77644 0.77318 0.77079 0.76897 0.76753
3.18236 3 18236 3.18236 3.18236 3 18236 3.18236 3.18236 3 18236 3.18236~
Ern,= 1.12716 1.16235
0 45
05
0.86973 0.84357 0.83234 0.82610 0.82213 0.81938 0.81737 0.81582
$,,in
3 04378 3 04378 3 04378 3.04378 3.04378 3.04378 3.04378 3.04378 3 04378
Em,,
1.09488 1.12019 1.13192 1.13868 1.14309 1.14618 1.14847 1 15024
&in
0.81461 1.15165
0.87846 0.86936 0 86427 0.86102 0.85877 0.85712 0.85585 0.85485
2.93352 2.93352 2.93352 2.93352 2 93352 2.93352 2.93352 2.93352 2 93352
Ern,, 1.06997 1.08809 1.09642 1.10120 1.10431 1.10648 1.10810 1.10934 1.11032 Emin
0.92393 0.90750 0.90032 0 89629 0.89372 0.89193 0.89061 0.88961
&,in
2.84422 2.84422 2.84422 2 84422 2.84422 2.84422
Ern,, 1.05070 1.06354 Ernin 094378
-
1 17885 1.18842 1.19468 1.19908 1.20235 1.20487 1.20688
Ernin
Ern,, 0 89949
-
1.27080 1.27552 1.27917 1.28207
A:in
0.88881
2.84422 2.84422 2.84422
1.06940 1.07276 1.07493 1.07645 1.07758 1.07844
1.07913
0.93131 0.92583 0 92275 0.92077 0.91940 0.91838 0.91761 0.91700
2.77103 2.77103 2.77103 2.77103 2.77103 2.77103 2.77103 2.77103 2.77103
Chapter 3. Preliminary Test Estimation
98
a -
7
0 05
01
0 15
02
0 25
03
0 35
-
E,,,,
&,in
1.12151
1.14945
1 16886
1.18312
1.19406
1.20270
1.20971
1.21550
1.22037
0.87447
0.85305
0.83935
0.82982
0.82282
0.81746
0.81322 0.80978
0.80694
3.02878 3.02878 3 02878 3 02878 3 02878 3.02878 3.02878
3.02878 3.02878,
Em,,
1.09181
1.11223
112627 1.13652
114433
1.15048
1.15545
1.15955 1.16298
Emin
0.90231 0.88502
0.87385 0.86605
0.86029 0.85586
0.85235
0.84950 0.84714
&in
2.92344 2.92344
2 92344 2.92344
2.92344
2.92344
2.92344
2.92344
04
Emax 1.06881 1.08372
1.09390 1.10128
1.10689
1.11129
1.11483
1.11775 1.12019
-
0.92507 0.91 141 0.90252 0.89628 0.89166
0.88810
0.88527 0.88296
2.83892
0 45
05
&,in
2 83892
2.83892
2 83892
2 83892 2.83892
1.07836
1.08149
2.83892
0.88105 2.83892
Em,,
1.05091 1.06172
1.06905
1.07435
1.08402
1.08609
1.08783
Ernin
094356
093303
0.92614
0.92128 0.91767 0.91488 0.91266
0.91085
0.90935
Aiin
2.76998
2 76998
2.76998
2 76998
2.76998
2.76998
2.76998
2.76998
2.76998
Emax 1.03698
1 04471
1.04992
1.05368
1.05652
1.05873
1.06051
1.06197
1.06319
0.95845
0.95055
0.94536 0.94168
0.93894
0 93682 0.93514 0.93376 0.93262
2.71315
2.71315
2.71315
2.71315
Emin
-
2.83892
2.92344
2.71315 Aiin
2.71315
2.71315
2.71315
2.71315
3.7. Twc-Sample Problem and Estimation of Mean
99
To determine the optimum level of significance for the preliminary test estimator, we follow the same procedure as in Section 3.2.4, using Ta-1 -1 ble 3.5.1. The AMRE(62;gn) lies between [l $$(l ] and
+
+
+
-1 -1
3)
[I - r r Q O (1 $) ] . 6: performs better than gn in the interval [0, In 41. The graph of AMRE(62;8,) shows the performance of 62 compared to 8, and 6,PT for all A: and selected a-values.
3.7 Two-Sample Problem and Estimation of the Mean
-
Consider the simple linear model as in Section 3.4 and Equation (3.4.1),where E ( e ) = 0 and E(ee’) = g21n,but the distribution of e F ( e ) nonnormal. Further, let x = (0,. . . ,O; 1,. . . , 1)’ with n 1 zero’s and 712 1’s be such that P I = @ and
p2=@+P,
P=p2-p1
as in (3.4.2). Thus, referring back to the conditions of Theorem I d (3.6.7), we have 712
lim 5 = lim ___ = A,
n-oo
71.1
lim n-lQ
=
n-cc
&h;i
(3.7.la)
+ 712
X(l
-
A),
(3.7.1b)
nz2 [Q1
= X ( 1 -A)-!
(3.7.1c)
The unrestricted, restricted, and preliminary test estimators of
p1
are
(3.7.2)
(3.7.3)
and
The test-statistic for testing the null hypothesis Ho : p2 = p1 against H A = # p1 is given by (3.4.4) as
p2
(3.7.4) Now, L,
D +
2 as 72 xl(a;)
4
00.
(3.7.5)
Chapter 3. Preliminary Test Estimation
100
Under the local alternatives,
K ( n ): p2 where A: = X ( l - A)$,
L
= ,u1+ n-'/'b,
b fixed
(3.7.6)
and approximation of the critical value
~=j, x:(a) ~ (a-level critical value).
(3.7.7)
Hence the asymptotic bias expressions of the three estimators are given by bl(bl)= 0 and bZ(ji1)
= n-oo lim
Bl(fi1) = 0,
{&(@I
- P I ) } = Xb
and
BZ(ji1) =
A$
Similarly, the mean square expressions for the estimators are
(3.7.9) The asymptotic efficiencies of
ji1
and jiyT relative to jil are
AMRE(G1 : b1) = [l - X + XA;]-',
AMRE(GyT : f i ~ = ) [l +g(A;)]-', (3.7.10)
101
3.8. One-Sample Problem and Estimation of the Mean
Further, jlf is better than j31 in the interval [0, h 4 ] . Like the estimation of the intercept when the slope is zero, @f is comparable t o the P T E of p1. j l f does not depend on the level of significance. Also, it interpolates between @1 and j i l . The minimum of guaranteed efficiency of j3f is { 1 with a possible maximum relative efficiency { 1 - %}-’.
+ %}-’
3.8
One-Sample Problem and Estimation of the Mean
Consider, again, the model (3.7.1),
Y where the components of
-
=
E
= 81,
,.. . , E,)’
( ~ 1
+ El
(3.8.1)
are independent and
E ( e ) = 0 and E(ee’) = ~’1,.
Further e F ( e ) is a nonnormal distribution. The LSE of 0 is sample mean, and by the central limit theorem 2)
&(in - 8) N N ( O ,0’1,)
as n + 00.
Further, the estimate of u2 is s, =
n-2
~
(Y - Pl,)’(Y -
nn),
in = Y ,the (3.8.2)
(3.8.3)
which converges in probability to u2. Hence, the test-statistic for the null hypothesis HO: 8 = 60 against H A : 6 # 80 is (3.8.4) which converges in distribution to x:(Ai) as n -+ tives
K(,) : 8(n)= B0
+ n-1/2S,
00
under the local alterna-
fixed 6,
(3.8.5)
where (3.8.6) Under Ho, Cn,a -+x!(a). The unrestricted, restricted, and the preliminary test and shrinkage estimators of 8 are
-
_
8, = Y , 6, = 8 0 , 6ET = 6, - (8, - Bo)I(C, < C,+), (3.8.7)
102
Chapter 3. Preliminary Test Estimation
where L,,,a is the a-level critical value of the L,-distribution. Since L,, converges in distribution to a central chi-square distribution with one d.f. as n co,we have L,,,a .+ x:(a). The bias and mse expressions for en,OT: and 6; are as follows: First, the bias expressions are given by ---f
(i) b I ( 8 , )
=
E[fi(Jn
-~ ( n ) = ) ]0
! v -,
for c =
Z-N(Ao,l).
hET compared t o On AMRE(e,PT;On) = [l + g(Ao)] 2 -1 ,
The asymptotic relative efficiency of
(3.8.8) is given by (3.8.9)
where g@;> = -Hdx?(a); A;>+A; {2H3(x&); A;) - H5(x?(4;A;)}
7
(3.8.10)
3.9. Stein Estimation of Variance: One-Sample Problem Hence, 6:*
performs better than
103
0, whenever (3.8.11)
otherwise,
en is better.
We can prepare a table for the maximum and minimum asymptotic relative efficiencies to determine the optimum level of significance for the PTE for application. Similarly,
(3.8.12) For 0 < A:
3.9
< ln4, 9: performs better than 0,; otherwise, Gn better
Stein Estimation of Variance: One-Sample Problem
Generally, we have seen that the preliminary test estimator performs better than the unrestricted estimator in a limited way. In this section we show that the preliminary test estimator of variance is uniformly better than the unrestricted estimator. An extended problem of variance is due to Stein (1964) with a review by Maatta and Casella (1990). We will be concerned with the one-sample model (3.4.1),
Y = 61,
+ e,
e
-
N~(o,~~I,),
(3.9.1)
and consider the problem of estimating o2 when it is suspected but not sure that 0 may be equal t o 00 (say). As before, we define various unrestricted, restricted, and preliminary test estimators of 0 2 . First, we recall that the likelihood ratio test for the null hypothesis HO: 6 = 60 against HA: 6 # 60 is given by
(3.9.2) where 6 , = P (sample mean) and
1 m
: s = -(Y - ~,I,)’(Y- 6,1n),
is the unbiased unrestricted estimator (UUE) of estimator (MLE ) of o2 is given by
m =n-1 0 2 .The
(3.9.3)
maximum likelihood
(3.9.4)
104
Chapter 3. Preliminary Test Estimation
which is a biased unrestricted estimator (BUE) of u2. The bias and mse of 6:’ are, respectively, -+2 -
b2(un
) - --
02
m+l
and
2(m+l)-l (m 1)2
+
M Z ( ~ ;=~ )
0 .
(3.9.5)
However, a better BUE of u2 may be obtained as (3.9.6) The bias and mse of this estimator are (3.9.7) Thus, we may order the estimators as follows according to the mse criterion,
6; t 6;’ + si (+ stands for domination).
(3.9.8)
Consider now that we have uncertain prior information on the mean of the normal distribution specified as HO : 8 = 80. If this hypothesis is true, then we define the restricted unbiased estimator (RUE) of u2 as
+ +
msp (m l ) ( &- 80)’ - eoi,) = . (3.9.9) m+l m+l The bias and mse expressions of this estimator are given in the following theorem:
1 6; = (Y - e0i,)’(y
Theorem 1. Under H A : 8 # 80, (3.9.10) and
Proof.
1 - B O l , ) ’ ( Y - 001,) n 1 = - { ( n - 1)s: n(8, - 00)~). n
5; = -(Y
+
(3.9.11)
Then we have (3.9.12) V
xi-1
where = stands for “equal in distribution to” and is a central chi-square variable with n - 1 d.f. and x:(A2) is a noncentral chi-square variable with degree of freedom and noncentrality parameter A2/2 and the two distributions are independent.
105
3.9. Stein Estimation o f Variance: One-Sample Problem Clearly,
E(62)= 12_ { (n - 1) + (1 +A2)} U2
02
= -(n+
n
(3.9.13)
A’).
Hence, - 2 - u2 b4(un) - m + 1A2.
(3.9.14)
Similarly,
+
2
M4(62) = Var (62) [b4(62)] ,
(3.9.15)
where
+ Var(xf(A2))}.
04
Var(62) = -{Var(x;--l) n2
(3.9.16)
Therefore,
+
0 4
Var(62) = -{2(n - 1) 2 ( l + 2A2)} n2 u4 { 2(m 1) 4A2} . (m 1)2
+ +
+
(3.9.17)
This results in the mse expression to be -2 M4(‘lJ
{ 2(m + 1) + A2(4 + A’)} .
u4 - ( m+ 1 ) 2
(3.9.18)
Thus, under Ho : 6 = 00 we obtain b4(02) = 0
and
2u4
M4(62) = m+ 1’
(3.9.19)
si
It is clear from above that 6; is better than under the null-hypothesis. As the departure constant A2 diverts from zero, the bias and mse values grow and become unbounded as A2 + co. However, the two mse graphs intersect at (3.9.20)
This means that if A2 E [0,2{ J(1 than
-4
si, whereas for A2 @ [0, 2{
+ !j(m+ 1)) - l}],6; performs better
range the bias of 6; varies from 0 to 2{
4-
- l}],si is better. In this - l}.
106
Chapter 3. Preliminary Test Estimation
In this context we may define another estimator Lehmann (1951), namely =2 0,
=
1 m+3
-(Y - 601,)’(Y - 801,)
=
ms;
$2, following Hodges and
+ ( m+ I)(& - B ~. m+3
) ~ (3.9.21)
Following the proof of Theorem 3.9.1, we find the bias to be (3.9.22)
and the mse of 5; is given by
M5(.2,)
=
04
+
{2(m+3)+A4}. ( m 3)2
(3.9.23)
Now the mse difference is =2
M4(8?) - M5(u,) =
mf3 + 4
+
+ + +
4 A 2 [ ( m 3)’ ( m 2 ) A 2 ] > 0 for all A’. ( m 1 ) 2 ( m 3)2 (3.9.24)
+
Similarly, -2 1 2A2 > 0 for all A’. b*(6:) - b5(0,) = m-t3 (m+l)(m+3)
+
(3.9.25)
-2
Thus, with respect t o the bias as well as the mse, 5n performs uniformly better than 62,meaning I c J 5 ( G i ) < M4(5:) for all A2. Since the null hypothesis Ho : B = 00 is uncertain, we may consider several preliminary test estimators of u2. However, we consider only the estimator based on the best of the unrestricted estimators and best of the restricted -2 estimators, namely based an 82 and 6,. Thus, we define the P T E of u2 as 6yT
2
= Gn1(L,
< F1,,(a))
+ 82I(Ln 2 F1,,(a))
where
Let us now set Fl,,(a) = m / ( m+ 2 ) and define
(3.9.26)
3.9. Stein Estimation of Variance: One-Sample Problem
107
where
4s(cn)=
1 m+2
m
+ 1 + &cn
2 7) m+L m+3
<
m+2
(3.9.30)
Thus, 6:' is the celebrated Stein estimator of g2. The following theorem shows the optimality of 82'sJrelative to m s i / ( m 2) with respect to the mse criterion:
+
Theorem 2. 8'['l
dominates 8; uniformly with respect to mse criteria.
Proof. Consider the mse of
82'l:
For fixed A2 and each C,, this expression is a quadratic form in 4s(Cn)with a minimum a t (3.9.32) Using
where
and
Chapter 3. Preliminary Test Estimation
108
We obtain
(3.9.36)
+
If L, < m / ( m 2), then (1
+ $L,)/(m + 3) < &,
which implies
1
for all A2. 4*(.Cn) i 4o(Ln> 5 m+2 That is, @O(C,) is closer to the minimizing value than l / ( m shown in Figure 3.9.1, it is clear that for each A’ and L,,
(3.9.37)
+ 2). So, as
(3.9.38) It follows that 4s(L,)msi dominates m.$/(m + 2) uniformly in A2. Further, it is clear that the PTE defined in (3.9.27) equals 6:”l
whenever F I , ~ ( C=Y )
3.10. Nonparametric Methods: R-Estimation
+
109
2). This may be obtained by minimizing the mse of 6:[pT1for varying Fl+(a). Thus, a* = 1 - Gl,,(m/(m 2);O) is the optimum level of significance for the PTE, 8,ZIPTI .
m(m
3.10
+
Nonparametric Methods: R-Estimation
In this section we consider nonparametric methods toward R-estimation of the intercept and the slope parameters of the simple linear model when it is suspected that the slope parameter may assume a pre-specified value. This procedure broadens the class of distributions and a t the same time achieve robustness of the estimators against contamination in the data set. R-estimation of a location parameter after a preliminary test on regression is due to Saleh and Sen (1978), and estimation of a location parameter after a preliminary test is due to Tamura (1967) based on U-statistics. We invoke asymptotic theory to develop the properties of the estimators we discuss. In the following sections we state the model, assumptions, and test of hypothesis, and define the estimators we consider.
3.10.1 Model and Assumptions Let y1, . . . ,y n be the response variables corresponding t o preassigned values x1, . . . ,2, satisfying the linear model y,=B+px,+e,,
j=l,.-.,n.
(3.10.1)
The distribution function (d.f.) F(e) of the errors belong t o a class, .T, of symmetric absolutely continuous distribution function (cdf) with absolutely continuous density function (pdf), f (e) such that the Fisher's information is finite. (3.10.2) In addition, the covariates (i) l i m , - , m ~ , (ii)
= 20,
+
= QO
X I ,. . .
,x,
satisfy the following conditions:
IzOl < co. < m, Q, = c:="=,xi
(iii) The 2's are bounded and maxi
(3.10.3a) - z,)~.
4
0 as n
(3.10.3b) -+
00.
(3.10.3~)
The linear model (10.3.1) together with the assumptions (3.10.2) through (3.10.3a,b,c) is the basis of further investigations on test of the hypothesis B = 0,p = 0 or ,d = 0, and the estimate of ( 6 , p ) using the sample { ( x l ? Y l ) ? - .,(zn,Yn)). -
Chapter 3. Preliminary Test Estimation
110
3.10.2
Test of Hypothesis
In nonparametric methods of statistics, one first develops test-statistic for testing the parameters of interest and in the second stage uses the test-statistic to define the estimators of the parameter by inversion technique. In the model (3.10.1) under consideration we consider the test of hypothesis, HO : 6' = 0,p = 0 against the alternatives that 8 # 0,p # 0 simultaneously. As such we consider the test-statistics well known in the literature (e.g., see Adichie, 1967; Hodges and Lehman, 1963; Puri and Sen, 1986) as follows: Let 4 = { & ( u )0, < u < 1) be a nondecreasing, skew-symmetric (i.e., 4(u) 4(1 - 21) = 0, 0 < u < 1) and square integrable score function f = { ~ * ( P L= ) 4((1 u ) / 2 ) ,0 < u < l}, and for every n(> l), let
+
+
or
where Unl < . - . < U,, are the ordered r.v's of a sample of size n from the rectangular cdf on (0.1). Finally, let yn = (yl,... ,y,)' and for every real ( a , b ) , and define y n ( a , b ) = yn - a l , - bx,, where 1, = (1,e.e ,1) and x, = (21, . . . ,z,)'. Consider then the statistics
istherankofyi-a-bzi (or Iyi-a-bsiI) among whereRni(a,b) (orR,+i(a,b)) y i - u - b z l , - . - ,yn-a-bz, (or ( y ~ - a - b s l j , . . . , /y,-a-bz,I),for i = I , . . . ,n. Note that R,i(a,b) = R,i(O,b) for every real a, and hence, R,i(a,b) does not depend on a. We write it as L,(b); also, we write R,i(O,O) = R,i for i = 1,... ,n* Note that for every given y n and b, T,(a,b), is \ in a : -00 < a < 00, and for every given y,, L,(b) is \ in b : -00 < b < 00. Also, if in the model (3.10.1) we let 8 = p = 0, then T,(O,O) and L,(O) both (marginally) have distributions symmetric about 0. Thus for the one-sided test of HO : p = 0 against H A : p > 0, our test consists in accepting or rejecting HO according as L,(O) is < or 2 L,,,, where P{L,(O) 2 L,,,IHo) 5 a , 0 < Q < 1, and Q is the level of significance of the test. Similarly, for the test of HO : 8 = 0 against H A : 8 > 0, the test consists of accepting and rejecting HO according as T,(O, 0) < or T,,,, where PO{Tn(O,O) 2 T,,IHo} F: Q. Since we will be estimating 8 and p when it is suspected that is zero, we concentrate on the test of p that is, Ho : p = 0 against H A : p > 0, using the statistics ILn(0)I or L ~ , ( o Let ).
>
111
3.10. Nonparametric Methods: R-Estimation A$ =
s:
(s,' 4 ( ~ ) d u ) ~ ,
(3.10.5a)
, 0 < 21 < 1,
(3.10.5b)
42(7~)d7 -~
f' F-' u))
+ ( u ) = - f;F-';u))
A$ = I ( f ) =
r(+,4) = s,:
Jt
(3.10.5~)
@(U)~W,
(3.10.5d)
+(71)4(7&)d74
and consider the estimate of A$, by (3.10.6)
Let z, be the upper 0-level critical value from h i ( 0 , l ) . Then
& ~ Q ; ~ / ~ A ; ~+ L ~z ,, , a s n + w .
(3.10.7)
Alternatively, we can write
nQ,1A,2LE,,
--f
x:(cu)
as n
-+ 03,
(3.10.8)
where x:(cr) is the a-level critical value from a central chi-square distribution with 1 d.f. For small samples, L,,,, can be computed by direct enumeration of the null distribution of L,(O) generated by n! equally likely permutation of the ranks R1, . . . ,R,. These results follow from the fact that under the condition (3.10.3a, b, c) and under Ho we have (see Hajek and Sidak, 1967; Hajek, Sidak, and Sen, 1999) (3.10.9) Consequently, lim P{nQ,1A,2L,(0)
n-cc
5 ZIH~} = H ~ ( 0). z;
(3.10.10)
Estimation of Intercept and Slope Parameters We focus now on the estimation of 6 and p based on the test-statistics T,(a, 3.10.3
b)
and L,(b) as follows (see Adichie, 1967; and Puri and Sen, 1986): Let
6i1) = sup{a : T,(a,O) > 0} and 6i2) = inf{a : T,(a,O) < 0 } ,
6n -- q2 & 1 )
(3.10.1l a )
+ &2)),
= sup{a : L,(b)
> 0 } and Bi2)= inf{a : L,(b) < 0 } ,
Chapter 3. Preliminary Test Estimation
112
pn -- 21 6;’)
( p+
p),
= sup{a : T,(a,pn)> 0}
en = +(&I)
(3.10.1 1b)
and
6i2’ = inf{a : Tn(a,,8,) < 0},
+ &Z’).
(3.10.1 lc)
Then 6, is a translation-invariant, robust, and consistent estimator of 8 when ,B = 0, while Jn is a similar estimator when ,B is unspecified. For the preliminary test estimator of p and 8, we consider the statistic
C,=~Q,’A,~L;(O)
V
=
x;
(3.10.12)
asn--,m
from (3.10.10) under Ho. Let x?(a)be the a-level upper critical value from a central chi-square distribution with 1 d.f., then the PTE of /? is defined by
BrT =
Pn
- PnI(Cn
<x~(Q.)).
(3.10.13)
The S-estimator of ,B is defined by (3.10.13)
Similarly, the PTE and S-estimator of 8 are given by
enp,
=
Jn - (6, - in)1(.cn< &a))
(3.10.15)
and (3.10.16)
respectively.
3.10.4 Asymptotic Distribution of Various Estimators
and Their ADB and ADMSE Expressions
In order to obtain the asymptotic distributions of various statistics, we will primarily use (3.10.9) and the JureEkova’s (1969) linearity of rank-statistics under Ho : p = 0 as n -i00. The following holds: (i) sup{fipn(n-1/2b) - ~ ~ ( 0n -)1 / 2 b ~ 0 ? ( 4 , 4 )jbl1 ;I IC) 5
+
o
(ii) s u p { h ~ ~ , ( n - ; a n-’/2b) , - T,(o, 0) +n-112(a
+ bZ,)?(.JJ, 4)l; la1 Ik , Ibl I k }
P
+
0
(3.10.17)
where k is a positive constant. Furthermore, under the stated conditions we have
3.10. Nonparametric Methods: R-Estimation
&(en
113
- N(0,
4)) when p # 0. Note that L, is a consistent test against /3 # 0. So, for fixed alternatives, the asymptotic distributions of fi@, - p),&(b:T - 0) and ,/E(,L?:Ip) are all equivalent to N(0,A:/?($, 4)Qo). Similar results hold for &(On (ii)
6 ) , fi(6,“ natives
- 6)
- 6 ) , and
&(I?,” - 6 ) . Thus we consider the class of local alter-
qn) : p(,)
(3.10.19)
= n-”2b
to obtain asymptotic distributions that provide fair comparisons of the properties of the estimators. Thus, Theorem 1. (Saleh and Sen, 1978). Under K(,) and the assumed regularity conditions, as n GO, ----f
(3.10.20)
Qo
)’
Proof. Note that by (3.10.11b), (3.10.9), and (3.10.17), we have fiIp-pI = Op(l), where under {K(,)},n1/2/3 = 6 = O(1). Thus, under {K(,)},&\jI = Op(l). Further under ,B = 0 by (3.10.17) we obtain n’/’L,(O) = n1/’pQ0y($, 4 ) + op(l). Hence, using contiguity of probability measures under {K(,)} and to that of P = 0, we obtain under {K(,)} as n + GO,
[ &@ ] { ( W+,4)Qo) ( - P)
&L,(o)
0
’N2
;A:
1/r2(4,4>Qo 1 / ~ ( $ 4) , 1/~(+,+)
Q0
)I.
(3.10.21)
114
Chapter 3. Preliminary Test Estimation
To prove (ii), we consider the hypothesis 0 = ,L? = 0 together with the linearity results based on (3.10.17)
J;IL(o)
=
hBn@r($, 4 ) + o p ( 1 )
(3.10.22)
and
Then we have
Hence, from the contiguity of measures under Kin, : 0 = 0, ,B(n) = n-1/2b, t o those under H,* : 0 = ,L? = 0, we obtain from above that as n -+m,
Finally, (fiT,(O, 0), fiLn(0))' under {KFn,} has the same distribution as (&Tn(O, n-1/26), &Ln(n-1/26)) under H,*. Hence, the asymptotic distribution of (&T,(O, n-1/2S), &Ln(n-1/2b)) is bivariate normal with mean vector by($, 4)(?&,So)'and dispersion matrix A; Diag(1, Q o ) . Thus under {K(,)l as n 00, ( f i m ( 0 , O), h L ( 0 ) ) '
- N2{b('IL,#)(zo,
QO)';
A'$ Diag(1, Qo)l (3.10.25)
The proof of (3.10.20(ii)) follows from (3.10.23) and (3.10.25). Again, by noting (3.10.11a) and (3.10.17), under {K(,l}, we have & ~ ~ T ( ' I L ,$1 = fiTn(0,O) + o p ( l ) ,
and the proof of (3.10.20(iii)) follows. Clearly, L,
3
(3.10.26)
xf(A2)as n -+ M.
Based on Theorem 3.10.1, we obtain the expressions for the asymptotic distributional bias (ADB) and the asymptotic distributional MSE (ADMSE) of various estimators of ,L? and 0 as given by the theorem below:
Theorem 2. Under {K(,)} and assumed regularity conditions, we have the ADB and ADMSE of the estimators of ,L? as n -+ M given in (A), (B), and (C) below. (A) UE,
A
(i) b l ( b , ) = 0 and (ii) M I ( & ) = 0 2 / Q 0with o2 = A$/-y2(+,4), (B) PTE, ,bzT
115
3.10. Nonparametric Methods: R-Estimation
Proof. A(i) and A(ii) are obvious. We prove (B) and (C). The PTE of P may be written as = ,6 - PI(& < x?(a)). Thus, the ADB may be written as
BLT
lim E[&Pnl(Ln < x ? ( ~ ) ) I
n- w
= - n-cc lim E[~P(n)l(nB;Q,1A,2v2(@, 4)
< x?(Q))]
= -SH3(x?((Y); A2)
- --
CCT
m[2+(A)
- 11,
since Z
-
N ( A , 1)
116
Chapter 3. Preliminary Test Estimation
The MSE of
8,:
is given by
-
=
-+
n-0;)
E
+ 2-AE[-] z Q0 Qo Izl z + c’ - 2cu2{E[lZl]- A’E[-I}]
0’
Q0
0’
= -[1
C(T2
U’
- 2c-E[IZl]
I4
Q0
=
+ n-oo lim
lim E[n(& -p)’]
=
2
(T’
- -(2ePA2/’
-{1 Qo
-
l)}, by taking c =
The next theorem gives the ADB and ABMSE of the estimators of 6.
Theorem 3. Under { K ( , ) } and the assumed regularity conditions, we have as n + 03, the ADB and ADMSE of the estimators of 6 as follows:
(A) UE,
en
(i) b l ( 8 , )
(B) RE,
=0
and
5).
(ii) M l ( 6 , ) = ~ ‘ ( 1 +
On
(i) b z ( e n ) = 650 and
+
(ii) M 2 ( i n )= (~’(1 $ A 2 )
(C) PTE, 6rT and
(i) b3(6cT) = bZoH3(X:(a);A’)
$)
(ii) M3(6LT) = ~ ’ ( 1 +
-
+%A2{2H3(x:(a); A’) (D) S-estimator, (9
^s
b4(On) -
9H3(x:(a);A2) -
H ~ ( x : ( o IA’)}. );
6: JQO P q A ) - 11
COZ06
(ii) M ~ ( J , s=) 0’ [(I
and
+ $){ 1 - 3 ( 2 e - ~ ’ / 2 - I
3.10. Nonparametric Methods: R-Estimation
117
Proof. A(i) and (ii) are obvious. We consider B(i) and (ii). By the linearity results (3.10.26) and (3.10.22), we have = &(&+p50)+0,(1). Hence, the ADB and ADMSE are given as follows:.
&en
(i) (ii)
lim
n-cc
E[J;;(G,-
= lim E [ & ~ ~ Z O ] = 6 ~ 0 n+cc
lim ~ [ n ( 8-, e)’]
n-ce
= lirn E [ n ( 8 ,- a)’] n-cc
+ n-m lirn
+
E[nB;Z2] 2 lirn E [ n ( 8 ,- 6)pnZo]
+ 8 ) 5;n-cc lim ~ [ n j :+]265: = $ (1 + 8 )- 5;($ + 6’) + 265; 2
(1 - Qo
n-cc
-
=$(1+$)-&$+&;
( + $A2)
= a’ 1
from (3.2.9a).
To prove C(i) and (ii), consider (i)
lirn E[&(6,PT - 6 ) ] = - n-m lim E[&PnZol(Ln
n-cc
< x:(a))]
= 650H3(XT(a);A2) by Theorem 3.10.1;
(ii)
JLir E[n(Bn- 6 ) 2 ] +%; lirn E[np;l(Ln < x:(a))] + 220 lim E[n(fin- 6)p1(Ln< xr(cr))] = o 2 ( 1 + $) Z; lim E [ ~ ~ ; I<( XL : ( c~r ) ) l + 2 6 2 ~ 2 i ~ ~ ( X ?A’) (a>; nlim E[n(eLT- f?)’] =
n-cc
n-m
n-
03
-
03
= CT’ (1
+ $) - $02H3(x:(a);
+ 26’%;H5(x:(a); = a’(l+
3)
A’)
- Z~’H~(XT((Y A2) );
A’)
- $02H3(x:(a);
A’)
+ @’{2H,(X:(Q.); A’) - Hdx?(a);A’)} = 0 2 { (1 + $) $H~(x:(Q);A’) -
+ $A’
[~H~(x:((Y); A’) - H~(x:(Q);A’)]}.
To prove D(i) and (ii), we consider
118
Chapter 3. Preliminary Test Estimation
+(en
First note that - 6,) = fi,8,Z0+op(l), f i L n ( 0 )= fi&QOr($J,$) + o p ( l ) and limn+00 A: = A24,1imn+mi Q , = Qo. Hence, the above reduces to
+ c2c2$
52
-
2c~0
[ @filbnI7(W) n ( &-@)&A+
3
From the result of Theorem 3.10.2, we may draw the same conclusion on the properties of the estimators of ,B and 6 as in Section 3.6.
3.11
Conclusions
In this chapter we introduced the preliminary test estimation of the intercept parameter when the slope parameter has an a priori fixed value, and we compared its performance with the usual estimator under normal theory. In Section 3.5, we introduced an alternative estimator to PTE. Also, we included the Stein estimation of the variance as a PTE of . ' a In Section 3.6, we presented the asymptotic theory when the underlying error distribution is nonnormal. Similarly, two-sample and one-sample problems of estimating the mean were considered in Section 3.7 and 3.8. In this context the question of whether to pool or not t o pool the sample is answered via preliminary test and shrinkage estimators. In Section 3.10, we included the theory of R-estimation of regression parameters and the PTE, together with the S-estimators. The corresponding results when the errors follow a multivariate Student's t-distribution are presented in the problem set together with other results. In this way, a whole array of model assumptions are covered for the estimation of parameters for simple linear models.
3.12
Problems
1. Let Y = 61, I
+ e, e
-
N,(0,a21n). Show that
(a) Bn = (1Lln)-'lnY,
119
3.12. Problems (b) E(8,) = 6 and E(6, - 6)2 = <. (c) Let sz = A ( Y - &l,)’(Y - 6,1n). Show that E(sz) = u2.
follows the noncentral F-distribution (d) Show that L, = with (1,n - 1 ) d.f. and noricentrality parameter A2/2 where A2 =
n(e - e)2/u2.
2. (a) Let 6, = d6,+(1-d)60. Show that (i) the bias of 8, is - ( l - d ) ( O 6 0 ) and (ii) the mean-square error of 6, is $ [ d 2 + (1 - d)2A2], where A2 = $(6-60)~. (iii) The relative efficiency of 6, compared to 6, is
MR.E(6,
: 6,) = id2
+ (1 - d)2A2]-’.
(b) Let OET = 6 J ( L n < F I , ~ ( ~ ) ) + ~ ,LI Fl,,(a)), (L~ where Fl,,(a) is the a-level critical value from the central F-distribution with ( 1 , n - 1 ) d.f. Show that the bias of 6ET is b(6ET) = ( 1 - d)(6 = n - 1. QO)G3,rn(pl,rn(a);A2), (c) The MSE of
GET is
(d) Compare 6ET and
+ +
8,.
Show the dominance region for
8,”.
3. Let Y = 61, Px e as in Section 3.1. It is suspected that ,B may be equal to PO.Define the shrinkage estimator = d p , (1- d),& of P, (0 < d < 1 and d is known), where d is the coeficient of distrust on the prior P = PO.
Bid)
+
(a) Find the bias, quadratic bias, and rnse of the estimator
/?id’.
(b) Compare the estimator ,bid) with respect to & using the bias and mse. (Find the efficiency of with respect t o fin and compare.)
pn
(c) Let ,hzT = - pnI(Cn bias and mse of
BET.
/?Ad’
< Fl,,(a)) as in Section 3.2.1. Find the
4. Consider the problem of estimating the intercept parameter in the model Y=Bl,+px+e when it is suspected that /3 may be of e:
00.Define the following estimators
120
Chapter 3. Preliminary Test Estimation
-
-
-
(i)
en = Y - ,&Z
(ii)
6;?
=
A
-
and 8, = Y
-
POZ.
7- [d,& + (1 - d),&]Z
(0 < d < 1, known).
+ g n l ( l n 2 ~ l , , ( a ) ) , m= n
Fl , m ( ~ ) ) (iii) 6kd)pt = , n$ ( d )n l<( ~
(pn
-
2,
where Q and sz are defined by (3.1.3) where C, = and (3.1.4), respectively, and 8‘1,,(a) is the a-level critical value of a central 8‘-distribution with (1, n - 2) d.f.
(a) Find the bias, quadratic bias, and mse of the three estimators. (b) Compare the estimators using the mse criterion. 5. Let x = (0,. . . , O ; 1,. . . , 1)’with n1 zero’s and 712 1’s. Then we have the two-sample problem, using the model Y = 81, Px e . Then p1 = 6 and p2 = 8 ,B and p2 - p1 = p. It is suspected that p may be equal to a known value Po. Then three estimators of p1 may be
+
+
6)
fi1 = g1,
(ii)
fir)
= dfil+
(1 - d ) ( j j 2 - PO),
where y1 and ij2 are the means of the first respectively. (a) Find the bias and mse of
+
n1
y-values and
722
y-values,
fil(4.
(b) Further, let firT = filI(Cn 2 F1,,(a))+fi~l(Ln < F~,,(Q)), where
L,
=
n1R.2 (YZ-Yl-Po) -
n1 +n2
SP
2 7
2 sp
=
2
1 (nl+n2 -
2)
.
2=1
and m = nl + 122 - 2. Find the bias and mse of with respect to j l 1 .
n,
X (YV- YJ27 j=1 and compare
6. Consider the n-dimensional normal distribution Nn(O,7’1,) and the Inverted Gamma distribution IG(v0,u 2 ) defined by
(a) Show that the mixture of these two distributions is the n-dimensional Student’s t-distributions Mt(O,o:ln) defined by the density
vo, u > 0, -a< ei
< 00.
(b) Show that
E ( e ) = 0 and E(ee’)= a21n,
02
=
vou2
(vo > 2). vo - 2
3.12. Problems
121
7. Consider the simple linear model where e
-
Y
= 61,
+ ,Ox + e ,
Mt(O,a,"I,).
(a) Show that the MLE of 6 and ,L? are the same as the least squares estimators, and the unbiased estimator of u," is s: = &(Y 6,l, - & X ) ' ( Y - e,1, - pnx,.
(b) The restricted estimator of 6 is 8, = P - ,&E when it is suspected that ,O may be equal to PO.Find its bias and mse. 8. Suppose vo > 2 and e
-
M t ( 0 , a,"In)for the linear model in Problem 7.
(a) Show that the LR test of the null hypothesis HO : p = 00 versus is given by test-statistic HA :p #
(b) Show that under H o , L, follows the central F-distribution with (1,m) d.f. (rn = n - a), while under H A , it follows the distribution defined by the pdf
g(L,;
vO) =
x(n
- 2)T+(1/2)
r>O
(which is not a noncentral F-distribution) where A*2=
3.
9. Define the preliminary test estimator of the intercept parameter as in (3.2.4) e,PT
= 6, - (6, - 6 , ) I ( L ,
< F1,m(a)).
(a) Show that the bias and MSE expressions are
02 (i + 5). Mz(6,) = c,"(i + $A*2),
(i) b l ( 8 , ) = 0, and M I ( & ) =
(ii) bz(8,) = -6T, and
6 = ,O - PO.
(iii) b3(eET) = -6GFA ( i F l , , ( ~ r ) ; A * ~ )m , = n - 2 , and
Chapter 3. Preliminary Test Estimation
122 where
where zo =
F1*m(a)
. and j
m+F1,,(a)'
= 1,2.
(b) Compare the three estimators, namely, 8,, conclude that the performance of 6, and similar to that under the normal theory.
in,and 6zT of 0 and
GET
compared to
8,
is
10. (Two-Sample Problem). Consider the simple linear model in Problem 2, and assume the vector x t o be (0,. . . ,O; 1,. . , 1)' with n1 zero entries and nz entries of 1's. Assume e Mt(O,OzIn), n = n1 nz.
-
(a) Show that the MLE of p1 and tively. (b) Show that the LR test for
+
p2
are f i 1 = 1 1 and
p = p 2 - p1
ji2 = 1 2 ,
respec-
= 0 is given by statistic
L, = n l n Z ( g 2 - & ) 2 (n1 + 4 s ;
.
(c) Show that C, follows the central F-distribution with (1,nz+nz-2) d.f. under HO and under H A : p1 # p1, it follows the distribution whose pdf is defined by g(C,; A*', YO) given in Problem 8(b) with 7LlR.2 (32 - Y d 2 n1n2 , A*z = ___ L, = ___ n1 + n 2 s; n~+n2
(d) Consider the preliminary test estimator of
byT = f i 1 - ( f i 1
p1
b 2
-Pd2
a:
as
- bl)l(Ln < Fl,m(a))
where FI,,(a) is the upper a-level critical value of the central Fdistribution with (1,nl n 2 - 2) d.f. Show that the bias and mse expressions of f i 1 , b1, and brT are given by
+
(i)
bl(ji1)
= 0, B1(fi1) = 0 and Ml(fi1) =
2, a: = $,
YO
> 2.
3.12. Problems
123
where Gp:2i,m (&Fl,,(a); in Problem Sa(iii).
A*2) has the same definition as
11. (Application to Survey Sampling). Let P = (1,. . . ,N } denote a finite population of N units, where N is known. Associated with the kth unit of P , there is a vector (yk,xk)’, k = 1,... , N , assumed to be a random sample from the bivariate normal distribution. Assume that the population P was generated according to the model
Y = 191~+ ,Ox+ e, E ( e z ) = 0, Cov(e) = 0 2 1 ~ ,and e
-
NN(O,O~IN).
N To gain information about the population total = Ckzl Y k , a sample s of n units is selected from P according to some specified plan. Thus, we are using a superpopulation model in survey sampling. Let T = N - s , the
unobserved part of the population. The following predictor is considered:
TG = N T + ~ ( N - n ) ( Z , - IC,)&, -
-
1 where Y , = ; CiEs yi, Zr = ~1 - Cj=, n xj, 5, = stands for the generic estimator. Let
(i)
-
Pn
=
x’y - ;(l;x)(l;y) x’x - i(lnx)2
en = ( A-s2 s,2 =
’
1
, Q = X’X - -(1n~)2, n
1 (Y- e,ln n-1
1
- &X)’(Y -
e,l, - &x).
xi and
,&
Chapter 3. Preliminary Test Estimation
124
T, and TzTare
(a) Show that the bias and mse expressions of
(i)
b(?,)
= 0 and b ( f z T = - ( N - n ) p l G ~ ,(~i . F ~ , ~ (A') a ) ;;
and
(b) If, in the model
Y
= 6'1~
+px+e,
e is distributed as M t ( 0 ,a z I ~as ) in Problem 6(a) show that = 0 and b(fLT) = - ( N - n)p1G3,rn(3Fl,m(a);A*2); (2) 1
6) b@,) (ii)
M1(Tn) = N f ( 1 - f)o: [l + Nf(1 -
f)w] and
where the G-functions are defined in Problem Sa(iii). 12. Prove Theorem 4 with respect to the estimate of
p.
13. Refer to Section 3.10 (a) Show that the exact distribution of T,(O,O) and L,(O) is symmetric of 0. (b) Show that the R-estimators 6, and ,& are invariant, robust, and consistent estimators of 6' and p, respectively. 14. Refer to Section 3.10.3. (a) Find the asymptotic efficiency of
fin
and relative to the LSE of
p.
(b) Find the asymptotic efficiency of 6 , and relative to the LSE of 6'.
6,)relative to the LSE
(c) Find the joint asymptotic efficiency of (6n, of (6'7 P).
Chapter 4
Stein-Type Estimation Outline 4.1 Statistical Model, Estimation, and Tests 4.2 Preliminary Test Estimation 4.3 Stein-Type estimators 4.4 Derivation of the Stein-Type Estimators 4.5 Stein-Type Estimation When the Variance Is Unknown 4.6 Stein-Type Estimation: Nonnormal Distributions 4.7 Improving James-Stein Estimator toward Admissible Estimator 4.8 Confidence Set Estimation Based on Stein-Type Estimators 4.9 Nonparametric Methods: R-Estimation 4.10 Conclusions 4.11 Problems
Stein-type estimation (also called shrinkage estimation ) was introduced by Stein(1956) and James and Stein (1961) in the statistical literature. The approach by Stein (1955, 1956) combines “uncertain prior information” on the parameters of interest and the sample observations from a Multiparameter statistical model. It is more sophisticated and improved than its pre-cursor, the preliminary test estimation. The application needs a parameter space with more than two dimensions. As pointed out in Section 3.5, the preliminary test estimation produces (1) only two values of the estimator, namely, the unrestricted estimator and the restricted estimator, and (2) depends heavily on the level of significance of the preliminary test. Stein-type estimators prcduce all possible values in between the unrestricted and restricted estimators depending on the sample values of the test-statistic used for the preliminary test, which shrinks toward the target vector parameter or its estimator. There is a substantial gain in the use of Stein-type estimators as the improvement carries over to the corresponding set-estimation problems. In this chapter we introduce the Stein-type estimation of the mean vector of a standard pvariate normal distribution. The usual maximum likelihood 125
Chapter 4. Stein-Type Estimation
126
estimation and test of hypothesis for the mean vector are presented. These results are then utilized to define (1) preliminary test estimator and (2) Steintype estimators of the mean vector, which includes the James-Stein and the positive rule estimators. Risk difference and empirical Bayes approaches are presented to derive the Stein-type estimators, beginning with the preliminary test approach. Comparisons of the preliminary test and Stein-type estimators are made relative to the maximum likelihood estimator based on the MSE matrices and the weighted risks under quadratic loss functions. Some asymptotic theory is presented for nonnormal distributions together with the R-estimation in a nonparametric setup. For the preliminary test estimators, tables are presented t o determine optimal size of the preliminary test. We also discuss recentered confidence sets with positive-rule Stein estimators for the normal model as well as for the nonnormal model and provide asymptotic domination of the recentered confidence set. Finite sample analysis of the efficiencies of the Stein-type estimators relative to the standard estimators are given along with the graphs of bias and efficiencies. Finally, problems are added to expand the results to multivariate t-distributions in addition to the related results.
4.1 Statistical Model, Estimation, and Tests Consider the simple location model
Y i= 8 + e i ,
i = 1, . . . , n ,
(4.1 .l)
where Yi = (Yli,.. . ,Y,i)’is the ith observation vector, ei = (eli,. . . ,epi)’ is the ith error vector and 6 = (61,.. . ,6p)’ is the vector parameter of the model. The primes mean transposition, as usual. I t is assumed that ei
- N,(O,I,),
i = 1,.. . ,n.
(4.1.2)
Using the information (4.1.1) and (4.1.2), we obtain the maximum likelihood estimator (MLE) of 6 as -
en = P = (F, . . ,Y,)’, *
where
is the sample mean vector and
(3
8 , = Y - N p @,-Ip .
(4.1.3)
(4.1.4)
Clearly, the bias vector, bl(6,) = 0 the quadratic bias, Bl(8,) = 0, and the normalized MSE matrix is Ml(8,) = E ( n ( 8 , - 6 ) ( 8 ,- 6 ) ’ } = I,. Finally, under the quadratic loss function
qe*,e)= nip* - 6&
=46*-
e)’w(e*- 6 )
(4.1.5)
4.1. Statistical hdodel, Estimation, and Tests the weighted risk of
6,
127
is and
Rl(6,; W) = tr[Ml(G,)W],
tr[M1(6,)W] = p if W = I,. (4.1.6) -
Notice that 6, consists of the marginal sample means &, = Y i , i = 1,2, . . . ,p , as its components (which are independent). For the test of the null hypothesis HO : 8 = 80 against H A : 8 # 8 0 , the likelihood ratio test yields the statistic
c,
so112.
(4.1.7)
eO1l2.
(4.1.8)
= rille, -
Under Ho, C, has a central chi-square distribution with p d.f. If 8 # 80, C, has the noncentral chi-square distribution with p d.f. and noncentrality
parameter A2/2, where
a2=
-
The estimator 6 , stands out because of the following optimality properties: it is a (i) (ii) (iii) (iv)
best unbiased estimator, best equivariant estimator, maximum likelihood estimator, minimax estimator.
In this chapter we utilize the expressions of the MSE matrices and risks to evaluate the performance of various estimators. We also consider the MSE and risk based efficiencies. At this stage it is useful t o remind the readers of the definition of inadmissibility of an estimator 8:: An estimator 8; is inadmissible for 8 if there exists an estimator 6 , such that
~ ~ w) ( 65 R2(8:; ~ ; w) for every value of 8 and strict inequality Ri(6,; W) < RZ(8:; W)
(4.1.9)
holds for some 8 E 0. In terms of MSE matrices we then have
for every value of 8 , and for some 8 E 0, strict inequality holds, meaning, (4.1.10) In order to compute the bias vector, quadratic bias, MSE matrix, and the weighted risk of an estimator of the type
8: = 6 n # ( n ~ ~ e n ~ ~ 2 ) ,
-
(4.1.11)
where $Jis a real-valued function and 6, Np(8,n-11p), we use the Stein’s (1956) formula given in the theorem below assuming 80 = 0 w.1.g.
128
Chapter 4. Stein-Type Estimation
Theorem 1. Under the assumed regularity conditions, we have the bias vector and the quadratic bias of 0; given by (i) b ( Q 3= - e E [ l - 4(x;+2(A2)] and (ii) B(@:) = -A2 { E [ 1- 4 ( ~ ; + 2 ( A ~ ) ] } ~A2 , = n//0jj2.
(4.1.12)
The MSE matrix and the weighted risk expressions under the loss function (4.1.5) are given by (iii)
M(@;) = I, E [ ~ ~ ( x ; + ~ ( A+~n) e) ]e ’ ~ [ 4 ~ ( ~ ; + ~ ( ~ 2 ) ) ] -
+
2nee’~[+(~;+~(a~))]
(4.1.13)
and (iv)
R(%;W) = tr(W)E[42(~g+2(A2))] fn@’We{1- 2E[4’(x;+2(A2))]
+E[~’(X;+~(A~))]}
+
= PE[42(~;+2(A2))] A2{ 1 - 2E[42(x;+2(A2))]
+ E[4(x;+4(A2))]}
and
if W = I.
(4.1.14)
4.2. Preliminary Test Estimation
129
This theorem will be used in the sequel to obtain relevant results. Our will be the following: choices of 4(n116n1/2)
and
4.2
Preliminary Test Estimation
One aspect of 6 , is that it ignores any relationship among the components of 6 = (61,. . . ,6,)’. If this question is raised, then the entire estimation procedure may have to be modified. As a start, we consider that the relationship among the components of 6 is defined by the null hypothesis HO : 6 = 0, meaning, 6 lies in the hyperplane defined by 8 = 0. So the restricted estimator of 6 is 6 , = 0. In this case, we consider the preliminary test estimator (PTE) of 6 as
where 0 is the restricted value defined by HOand I ( A ) is the indicator function of the set A and x;(a) is the upper a-level critical value of a central chi-square distribution with p d.f. As stated in Chapter 3, PTE was initiated by Bancroft (1944, 1964, 1965) and extended by Han and Bancroft (1968), among others, in a parametric setup. Saleh and Sen (1978) considered PTE within a nonparametric setup. The PTE of 6 heavily depends on the level of significance, a , of the test PT and the value of en fluctuates between two values, namely 6 , and 0 . N o A
Chapter 4 . Stein-Type Estimation
130
intermediate values between 0 and 6, are available. It is clear that the bias vector of 6, is bl(6,) = 0 with the quadratic bias (QB), = 0. The MSE matrix of 6, is Ml(8,) = I, with the risk Rl(6,; W) = t r ( W ) = p if w = I,.
&(a,)
..PT
..PT
), the quadratic bias (QB) &(On ), the PT MSE matrix Mz(8, ), and the weighted risk RZ(8, ;W) are given by the following expressions, by the results of Theorem 1 of Section 4.1: Similarly, the bias vector b’(8,
-
. PT
1
PT
(i) b ( 8 ,
Bz
) = - 8 H , + z ( x : ( a ) ; A’) and = A’ { H,+Z A’) }’,
(ET)
(X:w;
(4.2- 2 )
where H,(x:(a); A’) is the cdf of a noncentral chi-square distribution with I, d.f. and noncentrality parameter A 2 / 2 where A’ = nl18/j2. (ii)
MZ(~$)
= 1 , ~ -~ ~ + z ( x ; ( aA”>> >;
+ n68’{2H,+z(X;(Q); A2) - H p + 4 ( x 3 a ) ;A’)}.
(4.2.3)
and
&(DflT; =
W) = tr[M2(6ET)W] (1 - Hp+Z(X;(4; A”}
f(nB’W8) {2HP+’(X;(a); A’) - H p + 4 ( x 3 4 ;A”}
,
(4.2.4)
= P { l - Hp+Z(X;(Q);A”}
+A2 ( 2 H p + 2 ( x ; ( a ) A2) ; - H P + 4 ( x ; ( a ) A’)}, ;
if W
= I,.
(4.2.5)
- PT
Note that if A’ = 0, then &(On ) = 0; otherwise, it increases to a maximum and then drops down toward zero as A2 + 00. As regards the MSE matrix, we consider the pth root of the determinant of the matrix, that is,
+ A’ [ ~ H , + z ( x : ( ~A)2;) - H p + 4 ( x ; ( a ) ;A’)]}l/p. (4.2.6) APT l / p - PT Thus, p l M z ( 8 , I) = Rz(6, ;I,) = p { l - H p + ~ ( x ~ ( c y ) ; aOt )A’ } = 0 , and PT
..PT
as A’ diverts from the origin, both pIM2(6, ) / ‘ I p and RZ(8, ;I,) increase to a maximum crossing the pline, and then decrease monotonically toward the ,.PT value p as A2 + 03. See Figure 4.2.1. In general, the graph of pjMz(8, )I1/, PT remains below the graph of R2(8, ;I,) in the middle (0 < A’ < co),since these functions are drawn in the same scale. 6
A
4.2. Preliminary Test Estimation
* 0
131
,.* i’
-[
I
Figure 4.2.1 Graphs of Ma = p
1
I
I
I
M2 (*PT)/l’p 8,
I
1
. PT and R2 = R2(B, ; I p )
The following theorem states that there are points in the parameter space where the PTE has risk larger than the MLE 6,. It performs better in a small neighborhood of the origin. The theorem is due t o Scolve, Morris, and Radhakrishnan (1972).
Theorem 1.
(ne’we)> t r ( w ) if (netwe) < tr(W).
R ~ ( L ;W) ~ ;> t r ( W ) if < tr(W) Proof.
(4.2.7)
Chapter 4. Stein-Type Estimation
132
since x;+~is stochastically larger than x;+’. tr(W)
The r.h.s. of (4.2.8) equals
+ [(ne’WO) - tr(W)]Hp+2(xi(a);A’)
> tr(W)
if ( n o r w e ) > t r ( W ) . If W = I,, then
- PT
RZ(6, ;Ip)> p
(4.2.9)
if A’ > p .
(4.2.10)
Now, choose (n6’WB) such that t r ( W ) - 2(n6’W6) > 0. Then
R2(6ET;W) 5 [tr(W)- 2(n6’W6)]{l - Hp+4(x;(a);A2)] +(n6’W6)[1 - HP+4(xi(cy);A’)]
+ (n6’W6),
(4.2.11)
replacing Hp+2(x:(a);A’) by Hp+4(x:(a); A’). The r.h.s. of (4.2.11) after simplification equals tr(W)
+ [tr(W)
-
(n6’W6)]Hp+4(x~(a); A’) < t r ( W )
1 if ( n e ’ w 6 ) < - t r ( W ) 2
< tr(W).
(4.2.12)
. PT
If W = I,, then Rz(6, ; I p ) < p if A’ < i p < p. Since the risk Rl(8,;I,) = p and 8, is minimax, Theorem 4.2.1 states that no such P T E can be minimax. Further, as $ ( a ) --+ 00 (i.e., (Y -+ 0), the risk Rz(e,;I,)at the origin (i.e., A’ = 0) corresponding t o the PTE tends t o be p . That is to say, we can find a P T E that does arbitrarily well a t the origin at the cost of being poor elsewhere. Consider the properties of 6: with respect to 8, based on the MSE criterion. For this consider the difference C’MZ (6;’) C‘ - t ‘ C
(4.2.13)
for a given nonzero pvector C. Thus,
ere{1 - Hp+z(Xi(a); A’)}+ t’n66’t{ 2Hp+2(xi(a);A’) > <
- t’e-0
according as
for all t.
- Hp+4(xi(a);A’)}
(4.2.14)
4.2. Preliminary Test Estimation
133
(4.2.15)
We may define the MSE based relative efficiency (MR.E) as :
= [{I - H,+z(x;(a);A
2
+ A2(2H,+2(X;(a);
I} P-1 { (1 - Hp+’(x;(a);A”,)
A2) - Hp+4(x;(4; A”,)}]
= { 1 - HP+2(xg(a);A2)}-(p-1)/p
-lip
[{ 1 - Hp+2(x;(a); A’)}
+ A2{2Hp+2(x;(a); A2) - Hp+4(x;(a); A2)}]-1/p.
(4.2.17)
PT
For each a , the efficiency of 8, relative to 8, attains the maximum value Em, = (1 - Hp+2(xg(a);0)}-1 at A2 = 0 and the minimum value Emin *
APT -
a t Akin(a). The intersection of the graph of MRE(8, PT
;en)with the l-line
a,
occurs a t A:. Thus, if 0 L A2 5 A2, then 8, is better than A2 > A;, then 8, is better. The cut-off point A; may be determined based on the inequality
and if
The efficiency expression allows us to determine the optimum level of significance for minimum guaranteed efficiency, Eo, as the solution a* of the equation min MRE(a, A’) = E ( a ,A,,,in(a)) = EO, A=
(4.2.19)
where Eo is a preassigned efficiency number for the PTE, and MRE(a, A2) -PT stands for MRE(8, ;8,) as a function of a and A2. The efficiency based on the optimum level a* may go up to Em, = (1 - Hp+2(xg(a);O)}-’.
Chapter 4. Stein-Type Estimation
134
Table 4.2.1.Maximum and Minimum Guaranteed MSE Based Efficiencies 8 10 12 14 4 6 P Emax 6.75908 7.89292 8.72360 9.37375 9.90415 10.34988 Emin 0.83930 0.90622 0.93840 0.95653 0.96781 0.97533 12.55403 16.40307 19.66492 22.53881 25.14251 27.51153 An i: Emax 3.92601 4.491 18 4.89829 5.21311 5.46767 5.67991 Emin 0.86685 0.92253 0.94938 0.96451 0.97391 0.98015 Akin 10.62223 14.08485 17.06601 19.72510 22.15249 24.40138
Emax Emin A;in
r
Emin Akin Emax Emin A?nin Emax Emin Aiin
i"
Emin Aiin Emin
Emin Aiin Emin
2.89783 3.26722 3.53047 0.73248 3.89487 4.02965 2.89783 3.26722 3.53047 3.73248 3.89487 4.01965 9.47256 12.67623 15.46918 17.97908 20.28298 22.42997 2.35590 0.90103 8.64957
3.17317 0.98570 20.96061
2.01845 0.91385 8.00872
2.64502 0.98767 19.77572
1.78732 0.92509 7.48503
2.28535 0.93935 18.77429
1.61900 0.93513 7.04355
1.75471 1.84936 1.92081 1.97751 2.02408 0.96168 0.97513 0.98284 0.98764 0.99082 9.58252 11.89341 14.02413 16.01690 17.90156
1.49112 0.94421 6.66328
1.60350 1.68169 1.74059 1.78724 1.82549 0.96682 0.97845 0.98516 0.98935 0.99212 9.07737 11.29477 13.35346 15.28787 17.12444
1.39096 0.95247 6.33049
1.48475 1.54992 1.59894 1.63770 1.66944 0.97151 0.98148 0.98727 0.99089 0.99329 8.62734 10.75802 12.74924 14.62948 16.42136
1.31074 0.96000 6.03573
1.38937 1.44382 1.48480 1.51718 1.54367 0.97581 0.98425 0.98919 0.99229 0.99434 8.22239 10.27088 12.19820 14.02726 15.77741
Table 4.2.1 may be used t o determine the optimum level of significance for the preliminary test (PT) when the MSE based criterion is used. For example, let p = 6, and suppose that we wish t o obtain a PTE with minimum guaranteed efficiency Eo = 0.95. Then Table 4.2.1 gives the value a* = 0.25 at the intersection of (Y = 0.25 and p = 6. Hence, the optimum level for the PT in this case is 0.25 with a maximum possible efficiency 2.22631. . PT Now, consider the relative risk efficiency (RRE) based on the risk of en
4.2. Preliminary Test Estimation
135
given by
(4.2.20) APT.-
-PT .
The graph of RRE(0, ,On)is similar to the graph of MRE(0, ;On).The maximum occurs at A’ = 0 with the same value (1 - Hp+2($(a);O)}-’. The to the value efficiency decreases, crossing the 1-line a t min RRE(a, ALin(a))= RR.E(a,ALin(a)),
(4.2.21)
A2
and then increases toward 1 as A2 increases toward -PT
-
03.
Here RRE(a,A2) APT
-
stands for RRE(6 , ; 0,) as a function of a and A’. Clearly, RRE(6 , ; 6,) $1 according as
(4.2.22) The optimum value of a may be obtained by solving the equation
minRRE(a, A’) = E ( a ,Amin(a))= EO A2
(4.2.23)
with a pre-assigned value Eo. Table 4.2.2 facilitates the determination of the optimum level of significance a* based on MSE matrix and risk efficiencies. As an application of Table 4.2.2, let p = 6. We wish to obtain a P T E with a t least 95% efficiency. We find in the crossing of 0.40 and 0.95099 (E0.95) the desired a-value as a* = 0.40 with a maximum efficiency 1.60350 as in the
Chapter 4. Stein-Type Estimation
136 case of MRE.
Table 4.2.2. Maximum and Minimum Guaranteed risk Based Efficiencies
Emin Alin
8 8.72360 0.81707 12.79251
Emin Alin
4.89829 0.87014 11.68294
5.67991 0.93879 17.01821
Emin Akin
4.02965 4.95530 16.26681
Emin Alin
3.17317 0.96591 15.71231 2.64502 0.97339 15.26500 2.28535 0.97897 14.88455
0.35 Emax 1.61900 Emin 0.91005 Akin 5.96289 0.4 Emax 1.49112 Emin 0.92589 Akin 5.76832
1.75471 0.94021 7.79095
1.84936 1.92081 1.97751 2.02408 0.95827 0.96989 0.97776 0.98326 9.54048 11.24068 12.90716 14.54916
1.60350 0.95099 7.56281
1.68169 1.74059 1.78724 1.82549 0.96604 0.97567 0.98215 0.98666 9.28777 10.96826 12.61807 14.24553
0.45 Emax 1.39096 E,in 0.93933 Akin 5.59431
1.48475 0.96002 7.35598
1.54992 1.59894 1.63770 1.66944 0.97245 0.98039 0.98571 0.98938 9.05691 10.71809 12.35159 13.96478
0.5 Emax 1.31074 Emin 0.95079 Akin 5.43611
1.38937 0.96763 7.16522
1.44382 1.48480 1.51718 1.54367 0.97781 0.98430 0.98862 0.99160 8.84227 10.48427 12.10153 13.70053
-
-
4.3 Stein-Type Estimators 4.3.1
Introduction
Preliminary test estimation introduced by Bancroft (1944, 1964, 1965) was an early approach t o the problem of estimation and test of hypothesis in cases where some uncertain prior information exists. We saw in Section 4.2 that the PTE of 8 fails to be minimax, although there is a small neighborhood near the origin where it performs better than the MLE, 6,.
4.3. Stein-Type Estimators
137
However, real breakthroughs came when Stein (1956) proved that there exist estimators that can improve on 8, under the loss function nll6, - ell2. He demonstrated that the estimator (4.3.1)
-
improves on 8,. The genesis of this estimator stems out of a simple consideration: If 8, estimates 8 closely, does it mean 116,112 will estimate l18]j2closely? Clearly,
+
~[nlt~n1121 = nllett2 p .
(4.3.2)
Thus, n\16,112 has missed n118112 by an amount p. To obtain the estimator O:, we will pursue the following geometric approach together with the Pythagorean theorem: Since 6, is an estimator of 8 based on a sample of size n from NP(8,1,), we set
4len - q2= P
(4.3.3a)
nl16nl12= nllelt2 + P.
(4.3.3b)
and Stein showed that instead of 6, the projection of 6 on 6, is a better estimate of 8. Denote this projection by (1 - a)8,. Then, from the geometrical representation given in Figure 4.3.1, we can conclude the following: (i) 1 and
1 ~ 1 =1 ~ nll8, - ell2-
(4.3.4)
= p - na2116,112
= n1/8,/12- p (ii) 11~112= n11e112 - n ( 1 Equating (4.3.4) with (4.3.5), we obtain
-
n(1-
(~)~1j8,11~. (4.3.5) (4.3.6)
Hence, the projection of 8 on
8,
is given by (4.3.1) and repeated here:
(4.3.7) The bias vector and the quadratic bias (QB) of 0: are given by (using Theorem 1)
b(8:) = -pOE (x;:~(A~)) and B(8:) = p2A2{E (x;:,(A2))}', where 1 1
; ! ( 2- ~ l ~ ) ~ 2( + p 2r)-l
E[x,'(A2)] = e-A2/2 r>O
(4.3.8)
138
Chapter 4. Stein-Type Estimation
Figure 4.3.1 Geometrical representation of Stein’s idea In order to compare the performance of 0: relative to 6,’ we consider the
MSE matrix and the weighted risk of 0; given by
M(6:) = I , - & [ 2 E ( ~ , ; 2 2 ( A ~ )-)P E ( x ; ; ~ ( A ~ ) ) ]
+ p(n0e’)[2E(X,;22(A2)) - 2E(x;i4(A2)) - PE(X,;41(A2))]
(4.3.9)
and
R(%;W) = tdM(0:)WI
= tr(w)- p t r ( W ) [2E(xL22(A2)) -~ E ( x L ~ ~ ( A ~ ) ) ]
+P(n@’W0)[2E(X;22 ( A 2 ) -2E(x;i2( ) A’)) - ~ E ( x , (A”,)] S~~ = P - P2 [2E(X,;22(A2)) - PE(X,;2(A2))]
+
+PA2 [2E(X;i2(AZ))- ~ E ( x ; ~ ~ ( A pE(x;;,(A2))] ~))
(4.3.10)
if W = I,. By taking the risk difference R ( 6 n ; I p )- R(O:;Ip),we find that the risk of 0: is smaller than the MLE 6,. Similarly, we consider the pth root of the
4.3. Stein-Type Estimators
139
determinant of M ( 6 : ) :
IM(O:)l’/” =
{ 1 - P[2E(X,;Z2(A2)) - PE(X&(A2))]} { 1 - P[2E(X&(A2)) - p E ( x ; W ) ) ]
(p--l)/p
+ PA’ [~E(x;?~(A’)) - ~E(x;:~(A’)) + ~E(X;~~(A’))])’/~. (4.3.11) Thus, plM(O:)I’/P is smaller than p for all A’. Hence, the MLE 8, is inadmissible according t o the MSE criterion. As A’ + co,p/M(6:)11/p+ p , and for A’ = 0, we obtain plM(O:)l’/p = 2(1 - z)-’. On the other hand, P
co and a t the origin R ( 6 L ; I p )= 2(1 - $)-’ is the same as the MSE case. Except a t the endpoints the graph of plM(6L)lllp remains below the graph of R(6:; I p ) in the interval 0 < A’ < 00.
R ( O l ; I p ) p as A‘ --$
4.3.2
-+
James-Stein Estimator (JSE)
Although 6: dominates the MLE i j R , is it an optimum estimator of 6 ? To answer this question, consider the shrinkage estimator, (4.3.12) -S and determine the optimum constant c from the unweighted risk of O,(c). Now,
= P - 2c
+ ~~A’E(X;:~(A’)) ic’E(x~’(A’)).(4.3.13)
-S
Minimizing R ( 6 , (c);I p ) with respect to -2
c,
we obtain
+ 2 A 2 E ( ~ i $ 2 ( A 2 )+) 2d3(xp2(A2)) = 0
(4.3.14a)
Now, writing A ’ E [ x ~ ~ ~ ( A= ’ ) 1] - (p - ~)E[x,’(A’)], we obtain
-2
+ 2 { 1 - ( p - ~ ) E ( x , ~ ( A ~ )+) }2cE(xF2(A’))
= 0.
(4.3.1413)
This implies that the optimum value of c is Cf
=p-2.
(4.3.14~)
Chapter 4. Stein-Type Estimation
140
Thus, the optimum Stein-type estimator of 8 is given by James and Stein (1961) as (4.3.15) -S
Hence, the estimator 8, is known as the James-Stein estimator (JSE) in the literature. -s Note that if n [ ~ 6 , -+ ~ ~00, 2 then 8, = 6,, which matches with the PTE,
,.PT
8,
. However, if nl/6,/[2-+
-S
0, then 8, becomes a negative estimator. In order to obtain the right value, one has t o restrict n116,112 > p - 2. Stein (1966) defined the following estimator of 8:
(4.3.17) which is known as the positive-rule Stein estimator (PRSE) of 8. The -S
PRSE of 8 is a PTE. It combines 0 and 8, via testing Ho : 8 = 0 based on nl[6,1\2with the critical value p - 2. Since the arrival of JS estimator, there was a tremendous serge of research in Stein-type estimation to fill the void in different directions. See, for example, Baranchik (1970), Berger (1980, 1985), Bock (1988), Brown (1966, 1988), Casella (1989), Cellier, Fourdrinier and Robert (1989), Cohen and Strawderman (1973), Efron and Morris (1973), Saleh and Sen (1978-1986), Hoffmann (1992), Brewster and Zidek (1974), Casella (1985), and Kubokawa (1991, 1994) among others. In the sections that follow we present various techniques to improve the estimator 6,. -S In the sequel we will discuss the MLE 6,, the James-Stein estimator O,, . Si-
and the PRSE, 8, and variations thereof. We have seen that the Stein estimator 8: given in (4.3.1) is a biased estimator. The bias vector and the QB of the James-Stein estimator are b3(6;T) =
and
-(P - 2)8E(X&(A2))
-S
B3(On) = (P - 2)2A2{E.(~F:2(A2))}2,
(4.3.18) (4.3.19)
respectively. -S The MSE matrix and the weighted risk of 8, may be obtained as -S
M 3 ( 8 n ) = I,
-
(P - 2)I,{2JqX;:2(A2))
- (P - 2)E(XpS42(A2)))
+ (p2 - 4 ) n 8 8 ’ ~ ( ~ , ; 4 ~ ( ~ ~ ) )
(4.3.20)
4.3. Stein-TypeEstimators
141
and
R3(6:; W) = tr[M3(6z)W] = tr(W)
+ (P’
-
-
(P - 2 ) t r ( w ) { 2E(x,S22(A2) - ( p - ~)E(x;:~(A~))}
4) (nO’we)E(X& (A’))
= p - (p - 2)2E(~,2(A2))if
W = I,.
(4.3.21)
respectively. -S
The determinant of M3(6,) is given by
{ { 1- ( P
S
IM3(6,>/= 1 - ( P - ~)[~E(X;:~(A’))- ( p - ~)E(X;$,(A’,)]}~-~ x -
W E ( x ; 3 A 2 ) ) - ( P - 2)E(X,-,”2(A2))]
+(P2 - 4)A”(X&(A’))}.
(4.3.22)
-s
-s
At A’ = 0, R3(6,;Ip) = plM3(6n)11/P= 2 and as A’
4 m,
-s R3(6,;Ip)
-s and p[M3(6z)[’/pconverge to p . The graph of plM3(8,)l1/P remains below -S
the graph of R3(6,;IP) when A’ E (0, co).See Figure 4.3.2. -s Consider the comparison of 6 , relative to 6,. For a given nonzero vector t of p components we may evaluate the difference I
t’M3(6z)t- t’t.
(4.3.23)
Note that
s frM3(6,)t = t’t- ( p - 2)t’t{2E(x,;2,(A2))
- (p - ~ ) E ( x ; : ~ ( A ~ ) ) }
+ ( p 2 - 4)t’n66’tE(~;:~(A~)) -S
and t’M3 ( 6 , ) t (p
t’t < 0 according as
+ 2)t’n66’tE (
(A”,)
< t’t{~E(X;?~(A’))} - ( p - 2)E(x&(A2))). This implies
(4.3.24)
(4.3.25)
Chapter 4. Stein-Type Estimation
142
7
Risk of
S
M ; of S
-S
Figure 4.3.2 Graphs of R3 = R3(6:; Ip) and M3f = p[M3(8,)1'/P holds for all A2, which is equivalent t o pE[xP?2(A2))
< ( p + l ) ( p - ~ ) E [ x ; ~ ~ ( A ~for ) ] all A2
(4.3.28)
-S
and may not always hold. Hence, M3(8,) - Ml(6,) is not negative semi-S
definite (n.s.d.). Thus, 8, does not dominate 6, uniformly with respect to the MSE criterion. - s The efficiency of 8, based on the MSE matrix is given by
-s
MRE(8, : 6,) is a decreasing function of A2. At A2 = 0, it assumes the maximum value p/2 and as A2 -+ 00, the efficiency tapers down to unity from above. -S A consideration of the efficiency of 8, relative to 6, based on the risk
4.3. Stein-Type Estimators
143
(W = Ip)expression leads t o
-s -
-) 2
RRE(6,;6,) = [1 - p ( l - P
2
(4.3.30)
E(x;~(A~))]-’.
-s RRE(6,; 6,) is also a decreasing function of A’ with a maximum value p/2 and tends to 1 as A2 + 00 from below. See Figure 4.3.3 for both kinds of
efficiencies. -S Now we consider the risk analysis of 6 , relative to weighted risk difference
6,
for general W. The
R@,; W) - R2(6:;W) = (P - 2)tr(W{2E(x,;2(A2))
- (P - 2)E(X;:2(A2))}
- (p2 - 4 ) n 6 ’ W 6 E ( ~ & ( A ~ ) )
(4.3.31)
2 0 if the matrix W satisfies the condition (4.3.32) -S
Thus, 6 , dominates 6, uniformly for all W , satisfying (4.3.31). Further, as A2 + 00, the risk difference tends to zero from below.
4.3.3 Positive-Rule Stein Estimator (PRSE) ., S+
We have motivated the PRSE, 6 ,
-S
from the James-Stein estimator 6 , so that
..S+
it may have negative coordinates. Clearly, 6 , defined a t (4.3.17) is a PTE. Also, this estimator can never change the sign of its coordinates, and hence the name. First, note that we may write
e:+ = 6:I
(I(nll6,$ > p - 2))
(4.3.33) ., S+
-S
in terms of the James-Stein estimator. Note also that njl6, lj2 5 116n112. The bias vector, QB, MSE matrix, and the risk expressions may be obtained as given below. . S+
Chapter 4 . Stein-Type Estimation
144
x-
MRE of S re1 UE RRE of S re1 UE
7
2-
z
RRE
I
I
Figure 4.3.3 Graphs of MRE
and (iii)
M4 (6;’)
where
I
I
-s
= MR.E(6,;
-
I
I
-s
-
6,) and RRE = RRE(6,; 6,)
4.3. Stein-Type Estimators
145
where + stands for domination under the risk criterion. In passing we mention the properties of the QB of the estimators shown in Figure 4.3.4.
Chapter 4. Stein-Type Estimation
146
Ln
7
-
0
PTE
1
0
5
10
20
15
A2
25
30
. PT Figure 4.3.4 Graph of QB of estimators: PTE = 8, , JSE
,.S+
and PRSE = 8,
For a nonzero vector, l we have
-S
= 8,,
4.3. Stein-TypeEstimators
147
and the quantity in the second big bracket is negative. Thus, the whole expression is positive. Taking max (nl’$B’t/l’l)= A2, we have
E [ ( 1- ( P - 2)x,;22(A2))21(x~+2(A2) < P - 2)] +A2{2E[((p- 2)x&A2)
- 1)I(x&2(A2)
< p - 2)
+ E [ ( 1 - ( P - ~ ) x ~ ~ ~ ( A ’ ) ) ~ I ( x ~<, P+ ~ 2)]} ( A ”2 ) 0.
(4.3.43)
This is positive for all A2 and
-s+ - s en + en. The MSE based efficiency of 6:’ relative to
6,
is given by
-
AS+.
MRE(8,
(4.3.44)
1
en>= (1 - ( P - 2>[2E(X&(A2)) - ( P - 2)E(X;:2(A2))] E[{1 - ( P - 2 ) x ~ ~ 2 ( A 2 ) } 2 1 ( x ~ + z (< AP2 )- a)]) ( P - - l ) l P x [m4(A2)]-’lP.
(4.3.45)
where
m4(A2) = (1 - ( P - 2) [2E(x;:2(A2))
- (P - ~
)E(x~$~(A~))]
+ (P2 - 4 ) A 2 E ( x 3 A 2 ) ) + A2{ 2 -
w - ( P - 2)x~~2(A2))I(x2,+2(A2) < P - a)])
E [ ( 1- ( P - 2)X,;24(A2))21(X2,+4(A2)< P - 2)]}. (4.3.46) ., S+
Similarly, the risk efficiency of 0, obtained as
relative to
a,,
with W = I, may be
-St*-
RRE(% =
{1
-
ten)
p - ’ ( p - 2)2E[x,2(A2)]
-
E[{1 - ( P - ~ ) X ; ~ ~ ( A ~ ) } ~ ~ ( < X P~-+2)] Z(A~)
+
P
-
E [ ( 1- ( P - ~ ) X ; ? ~ ( A ~ ) ) ~ I ( X < ~ +p ~- (2)]}-l. A~)
-1
2
A {2E[{1- ( P - 2 ) x ~ ~ 2 ( A 2 ) } 1 ( x ~ + z ( < A 2P )- 2)]
,.S+
Table 4.3.1 presents some risk gains of 8,
-S
(4.3.47)
over 8,. Also, see Figure 4.3.5 - -PT - S ..S+ for the risk of the four estimators, namely On,$, ,On, and 8, .
Chapter 4. Stein-Type Estimation
148
7
1 ~ =0.15 1
p=4
of UE, PTE, S andS+
m
PTE
0
I
I
I
5
10
15
20
I
I
25
30
A2
Figure 4.3.5 Graph of the risks of the four estimators
(UE =
en,PTE = 6,.
PT
- S and S + , S= On, -
a;+)
F
0 1 2 3 4 5 6 7 8 9 10 1.35914 1.20103 1.52558 1.08236 1.05536 1.03769 1.02583 1.01776 1.01222 1.00841 1.00578 1.47781 1.26169 1.16530 1.11089 1.07668 1.05389 1.03822 1.02724 1.01945 1.01390 1.00993 1.54504 1.29510 1.18779 1.12778 1.08993 1.06447 1.04672 1.03405 1.02490 1.01823 1.01335 10 1.59024 1.31722 1.20302 1.13959 1.09951 1.07239 1.05329 1.03949 1.02938 1.02190 1.01634 12 1.62348 1.33333 1.21430 1.14855 1.10698 1.07872 1.05866 1.04405 1.03322 1.02512 1.01901
4.3.4 Sclove-Morris-RadhakrishnanModifications ..S+ It is interesting to note that the PRSE, 6,
-s+
0,
has two representations, namely
-s
= ~,I(n11~,1I2> P - 21,
(4.3.48a)
which is a PTE based on 62 and 0 with the critical value ( p - 2), and As+
6,
*PT
= 8,
I1- ( P - 2 > ( ~ l l ~ n I I ) - 1 1 ,
(4.3.48b)
which is a Stein-type estimator where 6, is replaced by
eLPT= e,I(nIien112 >
-
21,
(4.3.48~)
4.3. Stein-Type Estimators
0
0
149
I
I
I
5
10
15
1
20
A2
Figure 4.3.6 Graph of
R5
,.PT+
= R5 ( 6 ,
*
PT
;Ip) and R2 = R2(Bn ;Ip)
which is an ordinary PTE with critical value ( p - 2). We can represent the risk expression of 6,”’ in terms of the risk expression of 6:pT as follows: AS+.
Rs(6,
,W) =
R2(0:pT;W) - ( P - 2)tr(W)[2E(x,;22(A2)) - ( p - ~)E(X;:~(A’))] +(p2 - 4 ) ( n e ’ W 6 ) E ( ~ i($A2)) ~
{
[,+
1 + ( ~ - 2 ) t r ( W ) 2Ep 2THp+2r(P 1
w]
11
- 2; 0 ) [ + 2r)(p - 2 + 2T) 1 - ( p - 2)(n6’W0){ 2Ep [2rHP+2r(P - 2; 011
- ( p - 2)Ep ( p
4-2+2r(?J
+
-2Ep
[ ( p + 2r)(p1- 2 + 2T)
Hp-2+2r
1 + ( p - 2 ) ~ p( p + 2 + aT)( p + aT)~
[
( p - 2; O)] p + 2 (r p - 2;
o)]
}
(4.3.49)
where Ep stands for the expectation with respect to the Poisson r.v. r with mean A2/2 and H,(-; 0) is the cdf of a central chi-square variable with v d.f.
150
Chapter 4. Stein-Type Estimation
Also, we have the Sclove-Morris-Radhakrishnan (1972) modification of the
..PT
PTE, 8, as given below. First we have the usual one
..PT en = & ~ ( n i i k i i 2> x;(Q)),
(4.3.50)
. PT
and then we have the Stein-type modification of 8, given by (4.3.51) . PT+
The following theorem shows the uniform dominance of 8, PT+
Let R5(8,
;Ip)be the risk
PT+
of 8,
-PT APT+
. See Figure 4.3.6.
. PT
over 8,
.
- S+
Theorem 1. Let 8, , 8, and 8, be the estimators defined by (4.3.50), (4.3.51), and (4.3.17). Then, for p 2 3, . PT+
R2 (eET;Ip) 2 R5 (8,
;Ip) for all A’,
(4.3.52)
with strict inequality for at least one A’.
Proof.
. PT
Consider the risk difference R’(8,
RS(~;~+;I,) is the
replaced by
..PT+ ;Ip) - Rs(8, ;Ip), where
risk expression (4.3.36) with the critical value ( p - 2) difference is equal t o
x;(cr). Then the risk
P b - 2) [2E (X;;z(A’))
-
(P - 2 ) E (x;:,(A”,)]
1 (p+2r)(p- 2 + 2 r )
+p(p-2)[(p-2)Ep{
1 +A% - 2){ 2EP [%Hp+%
[
- (p2 - 4)A’E (x$(A2))
Hp-2fZr
(x;3(4;0)]
(x;(4;o)]
1
-2Ep ( p + 2r)(p - 2 + 2r) H p - 2 + 2 r 1 + ( P - 2 P P [ ( p + 2 r ) ( p- 2
+ 2T)
(x;(~);o)]
Hp+’r
( x ; ( 4 ;o)]
}.
All the terms in the brackets are positive, and the theorem follows.
(4.3.53)
4.4. Derivation of the Stein-Type Estimators
151
Further, consider the risk difference
The r.h.s. of (4.3.54) is positive if $ ( a ) S+
< ( p - 2). Thus, 6:'
APT+
dominates 6 ,
< ( p - 2 ) . 6 , is always minimax, and it is a minimax ,.PT ..PT+ substitute of 6 , whenever x;(a) I 2 ( p - 2 ) . If xE(a)> 2 ( p - 2 ) , then 6 , uniformly in A2 if $ ( a )
~
. PT ..S+ . PT ., PT+ dominates 8, but 6 , may not dominate 8, . In this case, 6 , may not be minimax. A P T APT+ S+ Thus, if x ; ( a ) < ( p - 2 ) , we can order the estimators 6 , , 6 , , and 8,
..
APT+
-PT
-S+
APT+
. See Figure 4.3.7.
as 8,
> 6 , for all $ ( a ) and A2 and 6 , > 6 ,
4.4
Derivation of the Stein-Type Estimators
In this section, we present some general methods of obtaining Stein-type estimators. They are
(1) Risk difference representation approach due to Stein (1956) and James and Stein (1961). (2) Empirical Bayes approach due to Efron and Morris (1973), and
(3) Preliminary test approach (or quasi-empirical Bayes approach) due t o Saleh and Sen (1978-1986).
4.4.1
Risk Difference Representation Approach
Let Ro(6,; 6 ) be the risk of 3, under the loss rille, - 6/12,and let R1(6,, 0) be the risk of another estimator 6,. In order to determine an efficient estimator, consider the risk difference
VR= ~~
~ ( 6e), -; ~ ~ ( 66,);.
(4.4.1)
Chapter 4. Stein-Type Estimation
152
N r
0 -
7
Risk of S + m-
01
0
I
I
I
I
5
10
15
20
Figure 4.3.7 Graphs of
A2 . S+
R4 = R4(8,
;Ip)and
R5
- PT+
= RS(8,
;Ip)
The aim is t o derive the estimator, S, such that VRo as a function of 6 is nonnegative and not identically zero. The Stein's identity (2.1.3) of Chapter 2 is an effective tool for transforming risk differences into an unbiased estimator of the "risk difference" considered as a parameter. This is given in the following theorem due to Hoffmann (1992):
Theorem 1. Let 6, = 8,-g($,), where g ( 8 , ) is partially differentiable and Ellg(6,112 < 03. Then
4.4. Derivation of the Stein-Type Estimators
153
= ~0 [h(en)]9
(4.4.3)
where
(4.4.4) Then h(6,) is an unbiased estimator of the risk difference. If we can show that h(6,) is a positive-valued function for certain g ( 6 , ) satisfying the regularity conditions of Stein’s lemma 2 (2.1.3)’ then Theorem 4.4.1 states that 6; = 6 , - g(8,) has smaller quadratic risk than 6,. Now consider the class of estimators of 8 defined by
(4.4.5) based on a sample of size n from Np(e,Ip),and let
Ro(&
Q ) - R1(6,;
6,
be the MLE of 8. Then
0)
+ 2c(p - 2)A2E(~G:2(A2))] c2(p - 2 ) 2 E ( ~ p 2 ( A 2 + ) )2 ~ ( p 2)
= p - [ p + c2(p - 2 ) 2 E ( ~ p 2 ( A 2) )2c(p - 2)
= -2c(p - 2){l - ( p - 2 ) E ( x i 2 ( A 2 ) ) }= [2c(p -
-
c2(p- 2)2]E(~,2(Az))
= ( p - 2)2c(2 - c)E(xi2(A2)) > 0 for 0
< c < 2,
(4.4 .S)
where E ( x i 2 ( A 2 )is) defined by (2.2.13) in Chapter 2. Here (4.4.7) Then
Chapter 4. Stein-Type Estimation
154
(ii) h(8,) =
c(2-c)(p-Z)
difference,
118- 112
, (0 < c < 2) is the unbiased estimator of the risk
where
All conditions of Stein’s lemma 2 (Section 2.1.3) are satisfied to justify the result (4.4.6). For c = 1, we obtain the JS estimator. Thus, the JS estimator has smaller risk than 6,. As a matter of fact, the class of estimators defined by (4.4.5) has lesser risk than 6,.
4.4.2
Empirical Bayes Estimation (EBE) Approach
Similar t o R.obbins (1955), Efron and Morris (1973) developed the empirical Bayes estimation approach to obtain the JS estimator of 8 as follows: Since 6, Np(8,iIn), they incorporated the null hypothesis HO : 8 = 0 into the description of the prior distribution T of 8. Thus, assuming 6 Np(O,$Ip), they could obtain the joint distribution of (8’,6:) as
-
NZp{ (
-
), n-’(
is given by
:zi
816,
)}. Here the posterior distribution of 6
(1 ::2)1
- Np{
1
1 (1 - B)6,, -(I - B ) I p n
6 6 ,
while the marginal distribution of
is given by
,
(4.4.8)
(4.4.9)
N P ( 0 ,B-’Ip),
where
B = (1 + T~)-’,0 < B < 1.
(4.4.10)
The results (4.4.8) to (4.4.10) may be represented as an empirical Bayes tree given in Figure 4.4.1. The p parameters are tied down together by the underlying common distribution of the 6’s, but there are some differences in the 6’s in general. Now, each &Il&+ N ((1 - B)&,,, (1 - B ) ) so that
-
0
E ~illiji,, = (1 - B)B,,,, and the marginal distribution of mator of 8 may be calculated as
6,
=
fig;,, -
s,(e,)
=
i = I , . .. ,p,
is N ( 0 , B - l ) . Then the Bayes esti(1 - ~
p,.
(4.4.11)
4.4. Derivation of the Stein-Type Estimators
155
Figure 4.4.1 Empirical Bayes tree The Bayes risk of 6, may be computed as P ( ~ , ; O ) = Ee{nII(Sr - e)I12}
+ B2Ee{nl18n112}- 2 B Ee{neb(6n- e)}
= Ee{nllen - 6112}
=~-2Bp+Bp=p(l-B).
(4.4.12)
However, the hyperparameter r2 or B in general is unknown. Therefore, we replace B by B, given by
(4.4.13) to obtain the JS estimator -s en=
(I - $-;!)". - - -
To justify the use of Bn, we note that
Bnljanl12
2
N
xp.
(4.4.14)
Thus,
(4.4.15)
156
Chapter 4. Stein-Type Estimation
and
E ( B , ) = ( p - 2)E,(nl/6,112)-’
(4.4.16)
= B.
This means that B, is an unbiased estimator of B and the J S estimator arises from the Bayes estimator when we replace B by B,. Notice that 6, can assume any value between 0 to 00 but B E (0,l). Therefore, we may restrict the value of n116,1/2 to more than ( p - 2 ) . This
,.S+
leads to the positiue-rule shrinkage estimator, 6,
We may summarize the empirical Bayes estimation approach as follows:
(1) Obtain a Bayes estimator based on a class of prior distributions (here it is N ( 0 , G I , ) ) that includes the “uncertain prior information” on the parameter of interest. (Here it is 6 = 0 . ) ( 2 ) Estimate the hyperparameter (here it is B ) based on the marginal distribution of the MLE/BLUE of 6 . (3) Plug in the estimator of B into the Bayes estimator of 6 , and obtain the EBE or Stein-type estimator. (Here it is (1 - B,)6,.) (4) If there is any restriction on B , adjust the estimator to set the estimator of B t o fulfill this requirement. (See positive-rule shrinkage estimator in Section 4.3.3.) For more information on EBE, see Deely and Lindley (1981). For the EBE approach a major difficulty is faced when formulating a suitable class of priors for a given problem and the estimation of the hyperparameter. In Chapter 12, we will be presenting the estimation of binomial and multinomial probabilities to illustrate these difficulties with approximate solutions. Finally, we consider the measure of improvement of an estimator 6 , over 60 as the ratio (4.4.18) where the numerator is the Bayes risk difference between 60 and 6 , and the denominator is the Bayes risk difference between 60 and 6,. Clearly,
K ( 6 , ; r ) 5 1.
(4.4.19)
If 6 , is uniformly better than 60, then
0 5 K(6,;n) 5 1. An almost Bayes estimator is characterized by a value of K(6,;
(4.4.20) T)
near 1.
4.4. Derivation of the Stein-Type Estimators
157
4.4.3 Quasi-empirical Bayes or Preliminary Test Estimation Approach To obtain the Stein-type estimators using the P T E approach, the following steps may be followed for a given model with uncertain prior information: (I) Obtain an optimum unrestricted estimator 8, of 8 E 52 and an optimum restricted estimator 6, of 8 E w (e.g., by likelihood/least squares method). (2) Obtain an optimum (e.g., likelihood ratio type) test-statistic, L, for testing the ('uncertain prior information", say, 8 E w. (3) Define the preliminary test estimator of 8 as
- PT = 8, - (b, - ~ , ) I ( L ,< L,,~),
8,
where
e E w.
(4.4.21)
is the a-level critical value of L, from its distribution under Ho :
(4) Replace the indicator function I ( & < by a smooth decreasing function c L i l , where c is a suitable constant. Then the Stein-type estimator is defined by (4.4.22a) and the positive-rule Stein-type estimator by AS+
6,
= 6n
+ (1
-
cL;'}I(L, > c ) ( b , - 6,) for the PRSE of 8. (4.4.22b)
For the James-Stein case, 6, = 0 and L, = ~~118,11~, and both the PTE approach and the EBE approach give identical results. Here, (4.4.22a) and (4.4.22b) are based on a simple decreasing function cL;', but we will choose a more general function such as c+(L,), where $(L,) is a monotone decreasing function of the test-statistic L,. This allows us to cover various types of estimators. As a matter of fact we can write the four estimators compactly as (see Stein, 1966) 0: = $(.c,)&,
(4.4.23)
where
+Vn)
= 1, = I(Ln > x;(c.)), = 1- C L ; l ,
or
=
> c),
(1 - CL;')I(L,
APT - s
,.S+
c =p -2
yields the estimators b,, 8, ,On, and 8, , respectively. The quasi-empirical Bayes (or PTE) approach results in an identical, similar, or an approximation
158
Chapter 4. Stein-Type Estimation
of the EBE approach, hence, the name “quasi-empirical Bayes estimation (QEBE)”. We will consider this approach as a useful tool in many complicated situations in the chapters that follow. (5) The constant c in the formulation of the Stein-type estimators is chosen to be (i) c = (v - 2) if C, has a central chi-square distribution with v d.f. (ii) - (.1-2).2 u l ( v 2 + 2 ) if C , has a central F-distribution with (v1,v2) d.f.
In the James-Stein case here, c = ( p - 2 ) . As an example consider the estimation of the mean vector 6 for the p variate normal distribution N p (6 , C ) ,where E is the covariance matrix based on the sample Y 1 , .. . ,Y N Let . 7 and S be the sample mean vector, and let the sample covariance matrix have the Wishart distribution with ( N - 1) d.f. and expectation ( N - 1)X. Suppose that we need to estimate 6 with the loss function - 6)’C-’(8;
L(6:6) = N ( 8 ;
-
6),
(4.4.24)
which is the Mahalonobis distance between 6; and 6 . Suppose that 6 0 is a prior uncertain value of 6 . Then, using the P T E approach, we obtain the Stein-type estimator as
ht
= 60
+ (1 - cTG2) (F- 6,) ,
(4.4.25)
where TN -
N ( Y - 60)’S-’(Y - 6 0 )
(4.4.26)
is the Hotellings T2-statistic. Now, the distribution of [ T 2 / ( N- l)][(NV
p ) / p ]= F p , ~ -follows p, the F-distribution with ( p , N - p ) d.f. Hence,
c=
( P - 2)m
p(m
+ 2 ) ( N - 1)’
m=N-p,
(4.4.27)
is the optimum shrinkage constant. The EBE needs more analysis (see Efron and Morris, 1972) than this simple PTE approach as given below. Assume that the prior distribution of 6 is Np(60,N - I A ) and we have 6, n/,(6,N - I X ) . It may be shown that the joint-distribution of (6’,6;)’
-
>>
N-lA N-’A N-lA N-’(A C) * Consequently, the conditional distribution of 6 given 6~ is
is given by
NzP
Np
+
{ + A ( A+ X)-’(e, 60
- 60),N - ’ (A-’ + X-’)-’}
Hence, the Bayes estimator of 6 is given by
e;
=
eo+ A(A + x ) - l ( e N- eo)
= 60
+ ( I - B ) ( ~ N6 0 ) , B = X ( A + C)-’, -
(4.4.28)
159
4.4. Derivation of the Stein-Type Estimators with
+
Since (A X) is unknown, so is B, and we estimate B using the marginal distribution of Y , ( a = 1,.. . ,N ) , which is Np(80,(A+E)).Now, C,=, N (Ya&)(Y,- So)' follows the Wishart distribution, W P ( A + X), N ) , and we estimate (A+E) by
[+ c , N _ ~ ( Y ,
-eo)(y,
-
coy] = + { S N + N ( i j N - e 0 ) ( 6 N -
eo)'}, where S N = C,=l(Y,- e,)(Y, - 6 given by B N , N
B N =SN[SN
= [Ip
~ ) 'Hence, . an estimate of B is
+N ( 6 N -80)(6N
+ NsG1(6N - 8 , ) ( 6 N
-
-
&)']-'
?I')&
Then the empirical Bayes estimator of 8 is defined by -EB
8,
=
eo+ (I - B N ) ( 6 N eo). -
(4.4.29)
-s
Compare this expression with that of 8 , a t (4.4.25).
4.4.4 How Close is the JS Estimator to the Bayes Estimator? -S
-
We consider the shrinkage estimator 8,. Based on the assumption that 8 Np(O,$ I p ) = 7r as in Section 4.3.2, we know that the a posteriori distribution of 8 is N p { ( l- B)6,; (1 - B)Ip},where B = (1 ~ ~ ) - 0l < , B < 1. Hence,
+
P ( L 7 r )
p(&,
7r)
=p
(4.4.30)
= (1 - B)P,
and ~ ( 6 ,7 r;) - p(6,;
7r)
= Bp.
(4.4.31)
On the other hand, p(6,;
7r)
- p ( i f ; 7r) = E,
[I?,@,)
- R 3 ( 6 3 ] = ( p - q2E1,[ E ( X , 2 ( A 2 ) ) ] .
(4.4.32)
Now, as in Hoffmann (1992), we have
The r.h.s. may be written following Hoffmann (1992) as
(4.4.34)
Chapter 4. Stein-Type Estimation
160
Using the Fubini theorem, we obtain the r.h.s. as
/ fl [ / { -;(iJ, W
=
P
W
exp
0
-
8,)2 - n7~6:,}(27r)-'/~
-w
j=1
P
(- -8:)mi
(4.4.36)
+22~)-~/~exp
(4.4.37)
x exp
1
05
=
(1
nu 1 27L
+
+
Setting v = 27~/(1 2 u ) in (4.4.33), we get -
1
1
(1 - v)?-'exp
{ -zA2}dv 21
for p 2 3.
(4.4.38)
Thus, E(xF2(A2)) has the integral representation (4.4.34) that restricts the -S
) a decreasing function of shrinkage estimator 8, for p 2 4,and E ( x i 2 ( A 2 ) is A2. The maximum value of E ( x i 2 ( A 2 )is) l / ( p - 2 ) which drops down to 0 as A2 + 00. -S Thus, the risk difference Rl(8,) - RS(8,) equals ~
(P- 212 2
Jd
1
(I - 8
) F - I
exp
{
-
52) A 2 } d v ,
(4.4.39)
which varies from ( p - 2 ) (maximum) to 0 (minimum). Next, we compute the -S
Bayes risk of 8, given by
[qx,"A",>I
=Lp[ T I
E 7 T
1
(P- 212
p-2-1 2
(z) { 1
x
Substituting z =
[-,
u + B (I-B) [
-
exp
= B ( p - 2).
n
{ - ;nl18112}dv]
ll8Il2}d81 . . .do,
(I - u ) - 2 d v (
B n) . p/2
(4.4.40)
we obtain
(P- 2)2 (1 - B ) 2
-
exp
/
00
B
i=E
(-1 B
~ - ~ / ~ d z
1-B
(4.4.41)
4.5. Stein-Type Estimation When the Variance is Unknown
-s
161
Thus, the value of K(8,; n) equals ( p - 2)/p = 1- (2/p). For reasonably large
-s
values of p , the estimator 6, closely resembles the Bayes estimator.
4.5
Stein-Type Estimation When the Variance is Unknown
4.5.1 Introduction: Model, Estimators, and Tests Consider the model (4.1.1) Yi=8+ei,
where Y i= (Yli,.. . ,Ypi)’ and ei = Further, ei
i = 1 , . . . , n, ( e l i , . . . ,epi)’
- N,(o,
and 8 = (61,... ,6=)’. (4.5.1)
a2rP).
The MLE/LSE of 8 is the same 6 , as before, meaning but the unbiased estimator of u2 is given by
6,
= @I,.
-
.. ,Y p ) ’ ,
(4.5.2) Further, p ( n - l)s;/u2 follows a central chi-square distribution with rn = p ( n - 1) d.f. In order t o test the null hypothesis HO : 8 = 80 against H A : 8 # 80,we use the LR-test (4.5.3) that follows a central F-distribution with ( p ,m ) d.f. under Ho and a noncentral F-distribution with ( p , m ) d.f. and noncentrality parameter A2/2 (4.5.4) under the alternative.
4.5.2
Preliminary Test and Stein-Type Estimators
If a priori it is suspected that 8 may be 0 , then the preliminary test estimator of 8 is given by
162
Chapter 4. Stein-Type Estimation
where F p , m (is~the ) a-level critical value of a central F-distribution with ( P t m ) d-f. Similarly, the James-Stein estimator of 8 according to Section 4.4.3 is defined by (4.5.6) and the positive-rule shrinkage estimator is defined by (4.5.7) -S
It may be noted that if we write 8, as (1 - cL;')en, then it is easy to show that the optimum c equals ( p - 2 ) m / p ( m 2 ) .
+
4.5.3 Empirical Bayes Estimation When the Variance Is Unknown To obtain the empirical estimator of 8,we consider the class of priors
(3
8 - Np 0,-Ip
Hence,
elen
+
where B = a2(cr2 defined by
-
.
(4.5.8)
(4.5.9)
Np((1 - B)Gn,(1 - B)Ip},
0 < B < 1. Thus, the Bayes estimator of 6 is
(4.5.10)
6 , = (1 - B)&,
with Bayes risk p ( 1 - B). In the expression for 6,, B is unknown since 'rc and 'T are both unknown. To estimate B , we follow the same procedure as (4.4.14) to (4.4.17) where we consider cr2(02 ? - 'instead of 1 r 2 . We note that the marginal distribution of is Np(0,B-'Ip). Thus, Bn116((2has the central chi-square distribution with p d.f. Accordingly, the uniformly minimum variance unbiased estimator of (0' T ' ) - ~ is ( p - 2)/nli6n112. Also, the best scale invariant estimator of ' o is m s i / ( m + 2 ) . Thus, inserting the estimators of (a2 r2>-land o', we obtain the empirical Bayes estimator of 8 as
+
+
fie,,
+
+
(4.5.11a)
4.5. Stein-Type Estimation When the Variance is Unknown
163 (4.5.1l b )
Thus, writing (4.5.12) where
din) = 1, = I(Cn
> Fp,m(a)),
= 1 - CLL1,
or = (1 - cL;l)I(L,
> c),
c=
(P - 2)"
+ 2) '
P("
APT - S
yields the estimators 8,,8, ,On,and b:+, respectively. Here the PT approach coincides with the EB approach. We may also consider the Stein estimator of u2 (see Section 3.8) in place of msp/m 2 in (4.5.11a, b) t o obtain another estimator of 6 (see Berry, 1994), namely
+
-IS
8, = (1 - c(m
+ 2)4s(~,)~;'}8,,
where
(
(4.5.13)
">
I Ln<m+2
+
'
and c = ( p - 2)m/p(m 2 ) . We study this estimator in Section 4.5.6. in terms of the test-statistic C, as in (4.5.3). . IS
-S
We will see later that 8, improves on On (4.5.6) uniformly in A2.
4.5.4
Bias, MSE Matrices, and Risk Expressions
The bias vector and quadratic bias expressions are (i) b1(8,)=O
and
B1(8,)=0.
PT
(4 h ( 8 , ) = 4Gp+2,,(&;A2)
(4.5.14a) . PT
and B 2 ( 8 ,
) = A2{G,+2,,(ta;A2)}
2
,
(4.5.14b) where Gul,u2 (.; A2) is the cdf of a noncentral F-distribution with (v1,uz) d.f. and noncentrality parameter A2/2 and
e.,
=
P + 2%&4-
(4.5.14~)
Chapter 4. Stein-Type Estimation
164
_I
n = 16
0 (9
S+
\
8QB
70
29 0
I
5
0
10
15
1
I
I
20
25
30
A2
Figure 4.5.1 Graph of QB of estimators: PTE, JSE, and PRSE
- PT , S = 8,,
(PT = 8,
and
B3
(6;)
= c’p2A2
-S
and S+ = 6;’)
{E ( x ~ ~ ~ ( A ’ ) ) } ~ ,
(4.5.14d)
The expressions are obtained following the procedure of Section 4.3 and applying the theorems in Chapter 2. Some graphs of QB of estimators are given in Figure 4.5.1. Similarly, the MSE matrices and the quadratic risk expressions may be obtained as (i) Ml(8,) = n’Ip and Rl(6,; W) = a 2 t r ( W ) = p if W = o - ~ I , . (4.5.15a)
4.5. Stein-Type Estimation When the Variance is Unknown
165
166
Chapter 4. Stein-Type Estimation
if W = u-’IP.
(4.5.15g)
Risk Analysis of the Estimators
4.5.5
First compare the unrestricted estimator difference in this case is given by R1
6,
. PT
and the PTE, 8, . The risk
(6,; W) - R2(6ET;W )
= u2tr(W)Gp+z,m(&;A’)
-(ne’W8) {2Gp+z,m(L;A’) - Gp+4,m(&; A’)}.
(4.5.16)
Thus, the r.h.s. is nonnegative whenever
PT
In this interval, 8, APT 6, whenever
performs better than
6,,
and
6,
performs better than
tr(W)Gp+z,m(a*; A2) > tl-w Chmin(W){2Gp+2,m(&; A’) - Gp+4,m(ez;A’)} - Chmin(W) ‘ (4.5.17b) -S Next, we compare 8, and 6,. Here, the risk difference A’ 2
R1(6,; W) - R&;
W)
+ [I - (’+2u2A2tr(W) 2 ) n 6 ’ W e ] 2 A 2 E ( x ~ ~ , ( A 2 ) ) } (4.5.18) . This risk difference is positive for all W satisfying the condition t r ( W ) > p-t2 C ~ r n a d W )- 2 ’
(4.5.19)
167
4.5. Stein-Type Estimation When the Variance is Unknown
and b: dominates 6, for all A’ whenever W satisfies (4.5.19).The risk difference converges to zero as A’ -+ 03, and a t A’ = 0, it is 2(p+ m ) / ( m 2), which tends to 2 as m 4 00.
+
..PT
If we compare 8,
-S
and O n , we notice that under HO : 8 = 0 ,
R3(b:; W) = R2(6ET;W) + 02p{ Gp+2,m(&;0) - c}
-S
PT
Thus, the risk of 8, is bigger than the risk of 8, for all levels of significance a, satisfying A
PT
otherwise, the risk of 8, is bigger. The picture changes as A2 moves away from the origin. As A’ + co,the risk difference approaches zero. For intermediate value of A’, the two risks intersect. ..S+ -S Comparison of 8, and 8, shows that the risk difference is
The r.h.s. is negative, since
Thus, for all A2 E (0, co),
R4 S+
-S
(6;’
W) 5 R3
(6:;
(4.5.24)
W) , -S
AS+. 1s
a simple modification of 8,. PT Now, consider the modification of the PTE, 8, as defined at (4.5.5):
8, dominates On, and 8,
APT+
-PT
en = e n
(1 -&).
(4.5.25)
168
Chapter 4. Stein-Type Estimation
4.5. Stein-Type Estimation When the Variance is Unknown
1 0 for all
(0, A2)
R5
169
and p 2 3. Hence,
(6ET+;W) 5 R2 (6ET;W)
for all ( a ,A2),
(4.5.31)
..PT+
PT
where strict inequality holds for some ( a ,A2). The risk of both 6 , and 6 , approaches a 2 t r ( W ) as A2 4 co. Both estimators share the property that as A2 + oc), their risks converge to a common limit a 2 t r ( W ) but the risk of PT+ PT . PT+ is remains below that of the risk of 6 , . This shows clearly that 6 , 6, ~
A
..PT
. Furthermore, neither
PT+
..PT
-
6, nor 6 , is superior t o 6,, their ..PT+ risk functions intersect as A2 increases leaving the origin. However, 6 , is ..PT a simple modification of 6 , similar to the positive-rule shrinkage estimator and valid for p 2 3. ..S+ APT+ Finally, we compare the PRSE, 6 , and the improved PTE, 6 , . Here the risk difference is superior t o 6 ,
A
Chapter 4. Stein-Type Estimation
170
“1 1
la = 015
R
0
0
1
I
I
1
I
1
5
10
15
20
25
30
A2
Figure 4.5.2 Graph of and
PT R2 = R2(8, ;oP2Ip), R4 = APT+. A
R5 = Rs(8,
R4(6:+;o-21,),
,O - ~ I , )
(4.5.32)
Note that la < c p / ( p implies F,,,(cY) < c.
+ 2 ) implies Fp,,(a)
< c and similarly l: < c p / ( p + 4) ..S+
The risk difference is positive for all Fp,m(cr)< c. Therefore, On
- PT+ formly better than 8, . The estimator 8,
AS+.
is uni-
,.S+
is minimax, and therefore, On is a minimax substitute for the PTE whenever F p , m ( ~<)2c. If F p , m ( ~>) 2c.
..PT+
Then 8,
..PT
dominates 8,
..S+
as we saw before at (4.5.31), but On . PT+
PT
may not
. In this case, 8, is not minimax. See Figure 4.5.2 for com-s+ PT+ parison of 8, ,On , and On . dominate 8,
APT
e
171
4.5. Stein-Type Estimation When the Variance is Unknown
4.5.6
An Alternative Improved Estimator of 8
In Section 4.5.2 we considered various estimators of 8, and we defined another IS
improved shrinkage estimator 8,
in (4.5.13) as
-IS
0, = (1 - c ( m
+2)4s(~n)~,~>8n,
where
1
4S(Ln) =
I
+
(e n > -
1 $Ln I mi:2)+m+p+2
(4.5.33)
(L,<- m + 2 ) , (4.5.34)
(see Berry, 1994). *IS -IS We now consider the bias vector and QB of 8, by expressing 8, in terms of the James-Stein estimator -IS
8,
= e- sn -
m+2
m+2
) a,,
(4.5.35)
where F* = n118,1/2/ms$. Then the bias vector and the QB are b6
(6;)
=
- 8 h ( F * ; A 2 ) and
&3
= A2h2(F*,A2),
(4.5.36)
where
with -eo = (p+;;nm+2)’ ..IS -S Next, we prove the following theorem showing 8, dominates 8, uniformly over A’ E [0, co) via unweighted risk analysis: IS
Theorem 1. R6 (6, ;o-’IP) 5 RJ (6:;o-’Ip) strict inequality for a t least one A’. -IS
Proof. First note that we can write 8, as
for all A’ E [ O , c o ) and
Chapter 4. Stein-Type Estimation
172
We know from the proof of Theorem 3.9.2 in Section 3.9 that 2
. [ { % 4 ( ~ nII ) -e 1n II} -IS
The risk of 8,
I&]
+
5 E [ { ( mms' 2)a2 -l}21Ln]
forallA2.
is given by
Now, by Stein's lemma, E [ ( Z- U ) h ( Z ) ]= E[h'(Z)],where h ( 2 ) is an absolutely continuous function, we have R6
(6;;
~7-~1,)
= P - E [ ( p --2 ) 2 2 ] + E [( p - -2 ) 2 2 ]E[{(zj)4s(,)-l}2], nI IenI I2 721 Ion I l2 nI 1'n1I2 while
R3
(6;; a-'I,)
Hence, R3
(6:; o-~I,)- R6
(6;;
o-~I,
Then, for all ( A 2 , p 2 3 , m > 1).
4.5. Stein-Type Estimation When the Variance is Unknown
173
n=16
0
0
5
10
15
20
30
25
A=
-
-IS
S+
Figure 4.5.3 Graph of R6 = R6(6, ;u-’IP), R4 = R4(6, ; c T - ~ I ~ ) , -S and R3 = R3(6n;a-21,)
-s
-IS
,.IS
Although 6 , improves over O n , the positive-rule version of 6, does not S+
improve over 6, .*
.
..IS
We can write the expression of the risk of 6, as
(4.5.39)
See Figure 4.5.3 for visual comparison of the estimators. ,.IS The MSE matrix of 6, can be obtained similarly.
174
Chapter 4. Stein-Type Estimation
4.6
Stein-Type Estimation: Nonnormal Distributions
4.6.1 Model, Estimation, and Test Consider the model (4.1.1) again:
Yi=e+ei,
i = 1 , 2 ,...,R ,
- n;=,
where Yi = (Yli, . . . ,ypi)’, ei = ( e l i , . . . , e p i ) ’ (61,. . . ,6,)’ is the vector of means. Further, assume that
Fo(eji) and 6 =
E(e+)= 0 and E(eieb) = a21,.
(4.6.1) -
It is well-known that the unbiased estimate of 8 is 6, = (TI,.. . ,Y,)’, and by the pvariate central limit theorem, (i) - e) N,(o,~~I,) as n -+ 00; (4.6.2a)
+(en
(ii) The unbiased estimator of u2 is sz = { p ( n - 1)>-ltr
-
6 , ) ( ~ -i 8,)’
(4.6.2b)
i=l
P
and s2--ta2as n -+ M. In order to test the null hypothesis HO : 8 = 8 0 against H A : 6 consider the statistic
#
80,
we
(4.6.3) 2,
Under Ho, as n 00, C,4x:, which is a central chi-square distribution with p d.f. Under the fixed alternative H J : 6 = 6 0 + 6 , -+
&(en - e,) = &(en - e) + A s , and njj61j2 --+ M as n -+ M. It follows that L, for all fixed alternatives H A ,
-3
00
as R
(4.6.4) --j
co. Consequently,
(4.6.5) for all z E ( 0 , ~ ) . Therefore, in order t o obtain a meaningful distribution, we consider the class of local alternatives K ( n )defined by
K(,) : 6(n)= 8 0 + n-1/26 for a fixed finite vector 6 .
(4.6.6)
175
4.6. Stein-Type Estimation: Nonnorrnal Distributions Under K(,),
by Sluskey's theorem where Hp(.;A2) is the cdf of a chi-square distribution with p d.f. and noncentrality parameter A2/2.
4.6.2
Preliminary Test (or Quasi-empirical Bayes) Approach to Stein-Type Estimation of the Mean Vector -
First, we consider 6, = (TI,.. . ,Yp)'as the unrestricted estimator of 8. If we suspect that 8 is 0, then the preliminary test estimator of 8 is defined bY -PT en =en -
(4.6.8)
where Ln,a is the upper a-level critical value from the exact distribution of C,, and under Ho as n 4 00, L,,* is approximated by $(a)-an a-level chisquare value with p d.f. Notice that the difference between (4.2.1) and (4.6.8) where $(a) in (4.2.1) is replaced by C,+ though Cn,a -+ $(a) as n 00. An empirical Bayes method to define the Stein-type estimator of €J is not easy t o formulate because we need to assign an appropriate prior distribution of 8 to obtain the a posterior distribution to use in developing the empirical Bayes estimator. In this case the preliminary test approach will overcome the difficulty by defining the Stein-type estimator as --f
-s en = en - ( p - 2 ) ~ 1;en,
(4.6.9)
since L, # 0 with probability one. Here we have simply replaced I ( C n < Cn,a) by ( p - 2)C;'. Further, following (4.2.3)'we define the positive-rule shrinkage estimator as
-s+em - {i
-
( p - 2 ) c 3 1 ( ~>,
- a)&.
(4.6.10)
4.6.3 Asymptotic Distributional Bias Vector, Quadratic Bias, MSE Matrix, and Risk Expressions of the Estimators Let 0; be any estimator of 8, let W be the p.s.d. matrix, and consider the quadratic loss function
qe;; e) = ..(e;
e)w(e; e) = tr[W{n(S: e)(e: - O)'}]. -
-
-
(4.6.11)
176
Chapter 4. Stein-Type Estimation
Let M(6;) = n E { ( 6 ; - 0)(6: - e)’}; then the risk is given by
R(6;;W) = tr[WM(6:)].
(4.6.12)
Consider the local alternatives
K(,) : 6(,1 = 801,
+ n-lI26
( 6 a fixed vector),
(4.6.13)
and let the asymptotic distribution function (a.d.f.)
G i ( x ) = 1lim P { fi(6; 2-m
- @(,I)
I xIJK(,)}
(4.6.14)
if it exists. Further let the asymptotic distributional MSE (ADMSE) of 6; be denoted as
M(6:) = /xx’dGi(x),
(4.6.15)
and let the asymptotic distributional bias vector (ADB) and the asymptotic distribution quadratic bias (ADQB) be
b(6;)
=
1
xdGi(x) and B(0:) = [b(O:)]’Z-’[b’(6;)],
(4.6.16)
where X is the covariance matrix of the unrestricted estimator. Finally, we have the asymptotic distributional risk (ADR) as
R(6:; W) = tr[WM(6:)].
(4.6.17)
Based on these definitions, we may obtain the ADB, ADQB, ADMSE, and
ADR by the following two theorems: Theorem 1. Under {K(,)} as n (i) bI(8,) = 0 and Bl(8,) APT
(ii) b2(0, ) -S
(iii) bS(6,)
=
=
4
03,
the ADB and ADQB are given by
= 0,
-6HP+2(x;((r); A2) and B2(6ET)= A2{HP+2(x3(r);A2)}2,
-(p
-
~ ) ~ E ( x ; : ~ ( A ~and ))
-s
B3(6n) = (P- 2)2A2{E(x;:2(A2))}2,
(i.1
S+
b4(6, ~
)
= - S [ H p + 2 ( p - 2; A2)
-(P
..S+
-
+ (p - ~ ) E ( x ; : ~ ( A ~ ) )
{
2)E x,;22(A2))1 (xi:’ (A?)
and B4(6, ) = A’ [ H p + 2 ( p- 2, A2)
+ ( p- ~)E(X;:~(A~))
}]
4.6. Stein-Type Estimation: Nonnormd Distributions
Proof. Under {K(,)} as n
-+
V
-+
(ii) f i s ~ ' ( 6 ,
PT
(iii) fisP'(6: *
cm,we have (see Sen and Singer (1999))
-
z Np(O,Ip), - O,,)) 5 Z - (Z + S * ) I ( l l Z + 6*jj2< x;(a)), 6* = a-I8, O(,)) 3 Z c(Z + 6*)llZ + c = p - 2, and 6,,)) 3 Z ( l - cllZ + 6 * ~ ~ - 2 ) I+( ~6*jj2 ~ Z > C ) - 6*,
(i) fisel(k2 - 6 ( , ) ) 6
177
-
S+
-
(iv) f i s ~ ' ( 6 , where c = ( p - 2), Then
(i) bl(6,) (ii)
=0
and Bl(6,)
= 0,
..PT bz(6, ) = aE(U1), where u1 = z - (Z + S*)I(l/Z+ 6*lj2 < $ ( a ) ) = 6 H p + ~ ( x ; ( a ) ; A 2 ) , ..PT
A2 = 116*jj2 and Bz(6, ) = A2{Hp+2(~~(a);A2)}2, -s
(iii) B3(6,) = aE(U2), where
Uz = Z - c(Z + S*)llZ + 6*11-2 = -cdE[~;:~(A~))l
&(at)
and
2
= c'A~{E[x;$~(A~)]} .
. s+
(iv) b4(O,
) = aE(U3), where
u3 = (z + 6*)(1- CllZ + 6*lI-2)I(jlz+ c5*1/-2 > c) - 6* = ~ E [ ( ~ - C X ~ ( A ' ) ) I ( & ~> ( Ac)]~ )- 6 =
-c6E[x;:2(A2)]
- ~ E [ ( ~ - c x , S ~ ~ ( A ~ )< ) Ic () ]X, ~~=(( Ap -~2 ))
= -6{CE[X;:2(A2)] -
+ Hp+2(c;A2)
.)I}
cE[x,S22(A2)I(x~$2(A2) <
and
s+
B4 (0,
) = A2{ CE[X,S22(A2)]+ Hp+2(c;A2)
Note that these expressions are the same as in Theorem 4.4.1. The next theorem gives the results on ADMSE and ADR expressions.
Theorem 2. Under {K(,)} as n + 00, (i) MI($,) = a21, and
Rl(6,; W) = a 2 t r ( W ) = p if W = C T - ~ I ~ ,
178
Chapter 4. Stein-Type Estimation
4.6. Stein-Type Estimation: Nonnormal Distributions
179
Hence,
(iii)
M3(8:) = a’E[UzU;]
+ S*)llZ + s*ll-’][z - c(Z + S*)llZ + ,*1 -’], = U’E[ZZ’] + a’c’E[(Z + S*)(Z+ S*)’(llZ + B*jl-4] 2ca2E[Z’(Z + S*)’llZ + S*ll-’] = U’E[Z - c(Z
-
c = p-2
Chapter 4. Stein-Type Estimation
180
Hence,
Similarly, the asymptotic MSE and risk expressions may be verified for ., S+
AS+.
M4(8, ) and R4(8, ,W), respectively.
Clearly, all the analysis and conclusions of Section 4.4.3 continue to hold.
4.7 4.7.1
Improving James-Stein Estimator Toward Admissible Estimator Introduction S+
In Section 4.3.2 we introduced the positive-rule Stein estimator, 8, , defining it as a preliminary test estimator of the mean vector 8 with critical value p - 2. A second improvement was done in Section 4.5.6 by substituting the -IS Stein estimator of the variance to obtain the estimator, 8, . We found that
S+ -S ..IS -s * S+ 8, improves 8,, and 8, improves only 8, but not 6, . In this section, we -S consider again an improvement of 8, using the preliminary test estimator approach (Kubokawa, 1991) to obtain an admissible estimator that turns out to ~
be the generalized Bayes estimator given by Strawderman (1971) and Berger (1976).
4.7.2
-S
Improving 8, via PTE
Let us again consider the model (4.1.1) where
Y , = 8 + e i , i = 1, ... ,n,
(4.7.1)
4.7. Improving James-Stein Estimator Toward Admissible Estimator
-
181
-
with ei Np(0,Ip). Then 6, = (Yl,. . . ,Y p ) ’is the MLE/LSE of 8 . Next, we consider two Stein-type estimators, namely
6:
=
(1 - ( p - 2)(nl]6,Il)-l)
6,
(4.7.2)
and
e:(c)
= (1 - c(7116,ll)-’)
e,,
(c > 0).
(4.7.3)
Now, we combine the two estimators via a preliminary test procedure with the critical value k , as follows:
6,“(c1k ) = 6;Tl(nl[6,112> k )
+ 6;T(c)l(n1lenll2L: k )
= 6: - (6: - b , ” ( C ) ) I(nllBnl12 5 k ) = 6, - (c - p + 2)(nllenll2)-l1(nlle,1125 k)&.
(4.7.4)
-S
Minimizing the quadratic risk of 6n(c, k ) given by P + (c - P
-2(c
-
+ 2)2E[llZll-21(11Zl12< k ) ]
p + 2)E[llzll-2z’(z - 6)1(ljz112 < k ) ]
(4.7.5)
with respect to c, we obtain the optimum value of c as
-
where Z Np(S,Ip), 6=6 8 . Now, c ; ( k ,A 2 )can be rewritten as
c;(k;A2)= ( p - 2)
I(IIZ2ll < k ) } + E { (1 - Jq$)
E
{~
~ ( 1 1 ~ 1< 1k )2 }
1
A2 = 116/121 (4.7.7)
where U = Z’S/llSll. The explicit form of (4.7.7) is given by
c;(k; A2) = ( p - 2)
; - EP { * ~ p - 2 + 2 r ( k ; o ) ) + ~ p ( ka2)
{
EP &Hp-z+24k;
,
(4.7.8)
O)}
and H,(x;A2)is the cdf. of a noncentral chi squared distribution with Y d.f. and noncentrality parameter A2/2, and Ep stands for the expectation operator with respect to Poisson distribution with mean A2/2. Using (2.2.3), we can write
Chapter 4. Stein-Type Estimation
182 Now, rewriting (4.7.9) as
c;(k;A2) = ( p - 2) -
2hp(k;A2)
[J:
(4.7.10)
t-lh,(t; A2)dt] ’
we see that cr(k;A’) is less than or equal to
c;(k; 0 ) = ( p - 2) -
2h,(k; 0)
(4.7.11)
[sd” t-lh,(t; O)dt] ’
which is due to (2.2.7). Also, c;(k;O)is an increasing function of k and 0 @ ( k ;0) < p - 2. Further, we can write C ; ( k ; 0) as
<
Then we have the following theorem: -S
Theorem 1. The risk function of 8,(c,k) is quadratic in c and minimized at c = cF(k,A2) where c ; ( k ; A 2 )5 c;(lc;O),and ci(k;O) is increasing in k , (0 < cT(k;0) < p - 2). The message of the theorem is that for all A2, ci(k;O) is closer to the -S
minimizing value of the risk of B,(c, k ) than p - 2. Hence, we obtain the main result. -S
-S
Theorem 2. 8,(cT(k; 0), k ) dominates 8,.
Iterative PTE to Obtain an Admissible Estimator
4.7.3
Now we select another critical value k’ such that 0 another (PTE)-type estimator
< k’ < k , and we define
D,S(c;(k;O ) , k’, k ) =
{ 1 c;(k’;O)(nlJB,11Z)-’} e,l(n116,[12 < k’) + { 1 - c;(k; 6J(k’ I + 6fl(nlI6,[l2L k). -
o)(n~~en[~2)-1}
-s
nl/6,lj2
-S
Then 8,(c*(k; 0);k’, k ) dominates 8,(c*(k; 0); k) by similar method as in Theorem 2. Now, as in Brewster and Zidek (1974), we define a sequence of estimators of the form
-s ~ , ( c ; ( ~ z , j ; o ) , ~=z {I ,~) c;(~i,j;O)(nl16,112)-1}6n,
(4.7.14)
4.7. Improving James-Stein Estimator Toward Admissible Estimator
183
if kzJ-l < n116112 5 IFv, for a sequence of a finite partition of [0,co)represented bY
0 = kz,o < . . . < kz,n,-l < kz,ns= 0;) such that maxlkz,3- kz,3-11 -+ 0 and kz,n,-l
---f
-S
(4.7.15)
co as i 4 co. The sequence of
estimators 8,(c*(kZ3;0), k z 3 )converges pointwise to 8;’ defined by (4.7.16) and this is the generalized Bayes estimator of 8 given by the theorem. Theorem 3. The estimator 8;’ -S
8, with respect to the prior r * ( 8 )=
1’
(~T)-~/’X-’
is an admissible estimator of 8 dominating
(L) 1 - X exp { -1 2 (^) 1-A
A’} dA
(4.7.17)
Proof. See Strawderman (1971), Berger (1976), and Brown and Hwang (1982).
4.7.4 Extension to the Case Where the Variance Is Unknown
In this section, we extend the results of Sections 4.7.1 to 4.7.3 to the case where the variance is unknown. For the unknown variance, the standard James-Stein estimator is given by (4.7.18) as shown by (4.5.6) where Ln = n/18,112/psi. Thus, we define the PTE by (4.7.19) where (4.7.20) Define c; ( k ,0) by -1
P-2 Then
zp2di]
.
(4.7.21)
Chapter 4. Stein-Type Estimation
184
-S
-S
Theorem 4. (Kubokawa, 1991) The estimator 6 , ( c a ( k ; 0), k ) dominates 6,. Proof. The noncentrality parameter of the noncentral F-distribution in this case is A2/2, where A2 = n/11311~/0~. -S
It may be shown that the minimum of the risk of On(c,k ) is given by
where Z = as
&a,,
6 = &6, and A2 = 116[12/a2. Then (4.7.22) can be written
(4.7.23) Integrating by parts we obtain
+ 2)
= (m
+2
Lw
L
vm/2e-u/2
lm
kv
t-'h,(t; A 2 ) d tdv
vm/2e-"/2h,(kv; A 2 ) d v ,
(4.7.24)
so that
c;(k; A') = ( p - 2) - 2
dA2)
+ + 2 g ( A 2 ) ]'
[m 2
(4.7.25)
where
d A 2 )=
vrn/2ev/2h,(kv;A 2 ) d v
[J:
vm/2ev/2J;" t-l h,(t; A 2 ) d t dv] '
Let B ( v )= 2-"/' { r ( v / 2 ) } - ' and let
(4.7.26)
4.8. Confidence Set Estimation Based on Stein-Type Estimators
185
then g(A2) may be rewritten as (4.7.28) Similar to (4.7.11) we can show that g(A2) 2 g(0). So from (4.7.16) we have
c;(k;A2) 5 c;(k;O),
(4.7.29)
where ca(k;A2) is given by (4.7.25) and c ; ( k ; O ) is increasing in k, and 0 ca(k;0) < ( p - 2)/(m 2). Hence, the theorem follows.
+
<
As a limiting form corresponding to (4.7.14), we consider the estimator 6*s , =
{ ( 1-c;
enI12
___ nttmsp
)
(4.7.30)
&}en?
which is identical to the generalized Bayes estimator derived by Lin and Tsai (1973).
4.8
Confidence Set Estimation Based on Stein-Type Estimators
4.8.1 Introduction In the last few sections, we dealt with the problem of improving on the usual point estimator of a multivariate normal mean. In this section, we discuss the companion problem, that of the set estimation of the normal mean. The development of the set estimation was slow due to increased difficulty, and many of the techniques presented in the last few sections do not carry over to set estimation. Classical confidence sets derived by Neyman-Pearson theory have been the subject of many criticisms, and alternative procedures have been proposed. The (1 - 7 ) confidence set for the mean vector 0 in the model (4.1.1) can be written as the Neyman-Pearson solution
~ ‘ ( 3 , )= ( 6 : rille - 8n112< kY},
(4.8.1)
where k, = $(y) is the upper y-level critical value from the central chi-square distribution with p d.f. The set C’(6,) is a sphere centered a t the sample mean vector, that has probability 1-7 of covering the point 6. C’(8,) enjoys many optimality properties such as, it is
a,,
(i) best unbiased, (ii) best translation invariant,
Chapter 4. Stein-Type Estimation
186
(iii) minimax, meaning C'(6,) minimizes the maximum expected volume with coverage probability 1 - y among all other procedures.
Stein (1962) raised the following question: Is C'(6,) the unique minimax set, or do others exist? If so, since the coverage probability of C'(6,) is constant for all 6 , there is room for improvement of the coverage probability without increasing the volume. He showed that there exists sets with more coverage probability than the set C'(6,). Later Brown (1966, 1988) and Joshi (1967) independently showed that there exists a dominating minimax set for p 2 3. -BJ They showed that if p 2 3, there exists another confidence set C B J ( 6 , ), where 6- B, J = (1 -
a
(b+nlle., /I2)
}a,
dominating C'(6,) in the sense that (1)
,.B J
P 6 { c B J ( 6 f J )2} P o { C ' ( ~ , ) }and (2) Volume of C B J ( 6 , ) 5 Volume of ?'(en) with strict inequality holding in either (1) or ( 2 ) for a set of 6 or 6,. A sequence of studies then followed by Chen and Hwang (1988), Cohen and Strawderman (1973), Faith (1976), Ki and Tsui (1985), Berger (1980b), Casella (1985), Shinazaki (1989), and Casella and Hwang (1986) among others. Finally, Hwang and Casella (1982, 1984) were able t o show analytically the following theorem: Theorem 1. For p 2 3, and 0 < c < c;fi7,the confidence set S+
. S+
cs+(6, (c)) = ( 6 : nip - 6 , (c)112 < kY} that is recentered at the positive-rule Stein estimator of 6 , given by
(4.8.2)
(4.8.3) has a higher coverage probability than C'(6,) for all 6 . . S+ { Cc(6, (c)} is negative. AS+ To extend the bound on c, they needed the derivative $Pe{Cc(6, ( c ) } > 0 in a certain range 0 < c 5 c;?.
The proof needed to show that the derivative
The number
c;,
is determined by solving the equation
This result holds for p 2 4.But to cover the case for p = 3, the number c ; ~is chosen as the minimum of the two solutions of the equations
and
An approximate value of c is 0.8(p - 2).
(4.8.5b)
4.8. Confidence Set Estimation Based on Stein-Type Estimators
187
4.8.2 Properties of the Recentered Confidence Set Based on PRSE As a set of O's, CS+(6;+(c)) is a spherical ball. Its shape is different when we look at the dual representation on the §ion:
c~+(ij~ =+ {e,( ~ ) )- i j z + ( c ) l 1 2
< k,} .
(4.8.6)
It is no longer a ball. Figures 4.8.1 and 4.8.2 give the representation of Cs+(8) for p = 5 , k, = 9.24 (corresponding to y = 0.10) and c = 2.536 for two values of A', namely and We have followed the convention adopted by Hwang and Casella (1982): we represented the sets in the (8,6,)-plane where they are invariant under rotation around the (0,8)-axis. First, we have the following theorem:
6.
Theorem 2. If A2 5 k,, (4.8.7)
,.s+
- -
Proof. Let us write On. (c) = -y(8,)8,, where
(4.8.8) Then 0 5 that
?(a,) 5 1. Since 8, E CS+(8)if and only if 8 E @+(a,), it follows Pe
{cs+(e)> = pe {cs+(B,)}.
(4.8.9)
Since A2 < k,, 0 E Co(S)and, hence, if 8, E Co(8),then -y(en)6,E Co(8) by convexity of CO(8). Hence, (4.8.7) holds for all A' 5 k,, since CS+(e) contains co(8) for convexity reasons. We may also note that Cs+(8)contains the ball with radius fi (as the value of the PRSE in this ball is 0). When 8 = 0 , the recentered set Cs+(8)is also a ball centered a t 0 with radius (& d w ) / 2 . As seen in Figures 4.8.1 and 4.8.2, the size of the set Cs+(6)is decreasing as A2 goes toward k , and the boundaries of CS+(0)and Co(8)then get closer. When ni18,112 = c, there exists a discontinuity in the shape of Cs+(0) provided that A2 = k,.
+
..s+
The discontinuity of the set Cs+(8(,)) is reflected by its coverage proba-
..S+
bility by the decomposition of Pe{8 E Cs+(8, PO
(c))} as
{~s+(i:+(c))} = pe{nllBnl12 5 C ) I ( A ~ 5 k7)
+Pe (nIl8 - h:(c)l12 5 Icy; ~~116~11~ 2 C } , (4.8.10)
Chapter 4. Stein-Type Estimation
188
I
Figure 4.8.1 Projections of the sets Cs+(0)(bigger) and Co(0) (smaller) on the (0,6n) plane for A = &/2, p = 5 , lc-, = 9.24, and cZ7 = 2.536. (The ball of radius
6
and center 0 are given for comparison.)
Figure 4.8.2 Projections of the sets C"+(0) (bigger) and Co(0) (smaller) on the (0,6,) plane for A = &/2, p = 5, k-, = 9.24, and cZ7 = 2.536. (The ball of radius
6
centered at 0 is also given.)
4.8. Confidence Set Estimation Based on Stein-Type Estimators where 6,(c) -S =
{ 1 -}en. -
C
189
(4.8.11)
n1Pn1l2
Notice that when A2 > k,, the ball of radius fi is excluded from Cs+(6)and the sets Cs+(6)and C’(6) intersect. Since Cs+(c) is the PRSE of 6 , Cs+(6) is farther away from the origin 6 than is C’(6). The r.h.s. of (4.8.10) may be written as 41 4 2 , where
+
= H&;
and
a2)1(n2 5 k,)
42 = pe {nip - ~
5 k,;
~ ( C ) I I ~
~LII~,II~
2 c} .
(4.8.12)
Clearly, the probability is discontinuous a t A’ = k, since the first term vanishes for A’ > k,. Before proving the next theorem, we consider the polar transformation t o describe the set Cs+(6).Let (4.8.13a) Then, for A’ = nII6/I2< k,, we have CS+(6)= { ( T I P ) : 7- 5 .!(P),
PE
[-T,T]},
(4.8.13b)
where (4.8.13~) and (4.8.13d)
(4.8.14a) k-r sin,& = -
A, (4.8.14b)
and
r!!(P) = A c o s P - d k , - A2sinp.
(4.8.14~)
Further, note that T- (P) > c. Also, T+ (P), is a decreasing function of 0,but the distance between 6 and the boundary of Cs+(6)increases with 0.The theorem that follows shows the discontinuity of the coverage probability.
190
Chapter 4. Stein-Type Estimation
Theorem 3. A2 = k y .
42
-S
= Pe(nll0 - 0,(c)/I2 5 k,; nlIan112 > c } is discontinuous a t
Proof. First, if A’ < k, we have 42
-s 2 = Pe(n.116 - 0,(c)II
5 k,; nllan[12> c } = 2K (4.8.15)
where
1 P-’sinp-2p exp { - - [ r 2 -2rAcosp+A2]} h(r,P)= 7and K = ( 2 7 ~ ) - ( P - ~ ) nfZ:(s,” /~ sin2 pdp), is the normalizing constant. As A increases to
A,r+(P)goes to
= k;. Therefore,
for A2 < k,, the r.h.s. of (4.8.15) reduces to
2K
ln1;‘’)
h(r,p)dr dp.
(4.8.16)
Second, if A2 > k,, the r.h.s. of (4.8.15) becomes (4.8.17)
I
since, as A decreases to
A,r+@)-+ Ic;
and r-(/3) -+ , h and ,& -+ n/2.
Combining (4.8.16) and (4.8.17), we can write
P ( C + ( 6 E ) )=
Hp(c;A2)1(A2< k,) + 2 :J~ J;(’) h(r,p)drdp
if
< k,,
H p ( c ;A2)1(A2< k,) + 2 J’/:~ J>(’) h(r,p)drdp if
> k-,.
Using the results given above, we present in Table 4.8.1 the components of the probability of coverage of Cs+(O,) for p = 3 , 6 , and 9.
4.8. Confidence Set Estimation Based on Stein-Type Estimators
191
Table 4.8.1 Decomposition of the Coverage Probability.
A $1 $2
+
$1
$2
A $1
=
$2
41 + $2 A
41 $2
41 + $2 A =
$1
$2 $1
+
$2
A
91
$2
$1
+ 42
0 0.25 0.50 0.75 1.0 1.25 1.5 0.1727 0.1682 0.1556 0.1366 0.1138 0.0900 0.0675 0.7799 0.7842 0.7964 0.8146 0.8364 0.8588 0.8795 0.9526 0.9524 0.9520 0.9512 0.9502 0.9488 0.9470 1.75 2.0 2.25 4 2.5 6 8 0.0479 0.0323 0.0206 0.0125 0.8969 0.9099 0.9182 ,9212 0.9060 0.9022 0.9012 0.9448 0.9422 0.9388 0.9337 0.9060 0.9022 0.9012
0 0.35 0.65 0.98 1.3 1.6 2 0.2383 0.2306 0.2087 0.1766 0.1394 0.1025 0.0698 0.7512 0.7588 0.7802 0.8116 0.8477 0.8833 0.9140 0.9895 0.9894 0.9889 0.9882 0.9871 0.9858 0.9838 2.3 2.6 2.9 3.3 5 7 9 .0441 .0257 .0138 .0068 0.9375 0.9530 0.9614 0.9637 0.9338 0.9169 0.9107 0.9816 0.9787 0.9752 0.9705 0.9338 0.9169 0.9107
0 0.38 1.1 0.77 1.5 1.9 2.3 0.2494 0.2380 0.2119 0.1741 0.1315 0.0909 0.0571 0.7497 0.7590 0.7849 0.8223 0.8644 0.9042 0.9370 .9991 0.9970 0.9968 0.9964 0.9959 0.9951 0.9941
A
2.7 3.1 3.4 3.8 5 7 9 11 0.0325 0.0166 0.0076 0.0031 0.9602 0.9742 0.9808 0.9821 0.9639 0.9388 0.9255 0.9176 $2 61 h .9927 0.9909 0.9884 0.9852 0.9639 0.9388 0.9255 0.9176 $1
=
+
The table shows the decomposition of the coverage probability (4.8.10) where
d1 = H&;
a2)1(a2 < k,)
(4.8.18a)
and (4.8.18b)
Chapter 4. Stein-Type Estimation
192
The different scales of A are intended t o provide homogeneous coverage of [0,&]. Some combined values of c were obtained in Hwang and Casella (1984)and given in the Table 4.8.2.
cTlo c'OS
3 0.892 0.810
4 1.710 1.602
5 2.536 2.408
6 3.666 3.223
7 4.199 4.042
8 5.035 4.866
9 5.872 5.692
10 6.711 6.521
In general, we note that 41 is decreasing as A moves toward and 4 2 is increasing in A. From Table 4.8.1 (111) we find that the maximum of 41 4 2 is almost unity (0.9991)and the minimum value is 0.9176,which is more than 0.90.The confidence coefficient of the set C("c;is thus always more than that
&
+
of CO(6). In concluding this section, we present without proof the asymptotic approximation of 4 2 as A' + 00 motivated by the fact that A2 > k,.
Theorem 4. (Hwang and Casella, 1984). As A' proximation of 4 2 is given by 42
C
(1 - y) - -[I 2A
- y - h ( y ) ] [c 2(p -
4 00,
the asymptotic ap-
a)] + O ( L ~ - ~ / ' ) ,(4.8.19a)
where (4.8.19b)
4.8.3 Confidence Set Based on Preliminary Test Estimator Now, consider the PTE of 6 :
where d is the critical value of the test-statistic nj6,11' a t some level of significance, say, a. The related confidence set is defined by
CPT(6y(a)) = {6 : nll6 - 6:T(a)j)25 k,}. From Section 4.2,we know that the quadratic risk of A
PT
Rz(6, ( a ) )= p
-
p H p ( d ;A')
+ A2{2Hp+2(d; A')
- PT
(4.8.20)
. PT en ( a )is given by
- Hp+4(d;A ')}.
(4.8.21)
The graph of the efficiency of 6, ( a )relative to 6, as a function of A2 may be . PT
described as follows: The maximum efficiency of 6, ( a )is [l-Hp+2(d; 0)I-l 2
4.8. Confidence Set Estimation Based on Stein-Type Estimators
193
1, which decreases, crossing the one-line a t some point near A2 = 1, and goes to a minimum value at, say, A2 = A:,,. It then increases monotonically toward the 1-line as A2 4 co. - PT The properties of the confidence set CPT(B, (a))are similar t o the properties of the point estimator given by Theorem 3. First we note the decomposition of the coverage probability
Pe {e E ~ p ~ ( i j z ~ ( a ) ) } = H,(d;
A2)1(A2< k,)
+ Pe {nlI8,/l2 > d; n118 ,.PT
I k,} . (4.8.22)
Then we conclude the inadmissibility of CPT(B, ( a ) )based on Theorem 5 given below. Theorem 5. (Robert and Saleh, 2001; Chiou and Saleh, 2002). (i) If A2 < k,, then P e { 0 E CPT(6:T(a))}2 Pe{e E Co((e,)}.
(ii) If k, 5 A2 I (&+
cE7(en)}.
(iii) If A' > (&+
&)2,
then Pe{O E CPT(6:T(a))} < Pe{e E
d)' then , Pe{8 E CPT(6ET(a))}= 1 - y.
(4.8.23)
..PT
If A2 < Ic, I: d, then Pe{O E CPT(Bn( a ) ) }is decreasing as a function of A2. At A2 = 0, it equals 1 - a , corresponding to the critical value d. As A2 increases to k,, the coverage probability of CPT(6zT(a)) drops to a minimum below 1 - y depending on the value of d, meaning the level of significance of the test for 0 = 0, and then monotonically increases toward 1 - y as A2 4 + Finally, the coverage probability becomes equal t o 1 - y as a2>
(6 a)'. (A+
There is a discontinuity of the coverage probability a t A2 = k,. Tabular values of the coverage probability are presented in Tables 4.8.3 and 4.8.4 for 90% confidence sets for p = 5(2)15 and the test levels a = 0.05 and 0.10. Proof.
(i) A2 < k,. From (4.8.22) we have
194
Chapter 4. Stein-Type Estimation
(ii) K7
< A’ 5 (A+ A)’.From (4.8.22) we have PA
(
CPT
(enPT ( a ) ) )= P
A ~ { ~>I d;nll6 I ~ ~- SnlI2 ~ I ~ 5 ICY}
5 P~2{nll6- en[[’ < Icy}
=
1 - y.
(6 + &)’. Note that (6 + 4)’< A. Now, njl6 k, ==+ fill6 en/[5 A,which in turn implies (iii) A’
>
-
en1I25
-
A - fillenll I fill6 - en11 5
and we have A 5
It follows that (4.8.22),
A,
fillenll + 6. Thus, we conclude that
& < fiI/enllwhich is equivalent to nllenll’ > d. Thus, from PA~{nj/8,11~ > d; ~~116 - 8,11’ < k-,} = P0{n//6- 6n1125 Icy} = 1 - y.
This completes the proof. Table 4.8.3 Coverage Probabilities for the Set and a = 0.05
A
p=5 0.9500 1 0.9356 2 0.9318 3 0.9289 4 0.9264 5 0.9243 6 0.9224 7 0.9206 8 0.9191 9 0.9176 10 0.5937 15 0.7773 20 0.8593 25 0.8890 50 0.9000 100 0.9000 0
p=7 0.9500 0.9350 0.9304 0.9271 0.9245 0.9222 0.9202 0.9184 0.9169 0.9154 0.9141 0.7296 0.8336 0.8777 0.9000 0.9000
p=9 0.9500 0.9347 0.9297 0.9262 0.9233 0.9210 0.9190 0.9172 0.9156 0.9142 0.9129 0.6871 0.8079 0.8650 0.9000 0.9000
p=ll 0.9500 0.9347 0.9293 0.9256 0.9226 0.9202 0.9181 0.9163 0.9147 0.9133 0.9121 0.9074 0.7826 0.8514 0.8999 0.9000
CPT(6F(a)) with y = 0.10 p=13 0.9500 0.9348 0.9291 0.9252 0.9221 0.9197 0.9176 0.9157 0.9141 0.9127 0.9115 0.9069 0.7580 0.8371 0.8997 0.9000
p=15 0.9500 0.9350 0.9289 0.9249 0.9218 0.9193 0.9171 0.9153 0.9137 0.9123 0.9110 0.9066 0.9039 0.8223 0.8995 0.9000
4.8. Confidence Set Estimation Based on Stein-Type Estimators
195
Table 4.8.4 Coverage Probabilities for the Set CPT(8ZT(a)) with y = 0.10 and Q! = 0.10
A
p=5 0.9000 0.9190 2 0.9196 3 0.9190 4 0.9180 5 0.9169 6 0.9157 7 0.9147 8 0.9136 9 0.9126 10 0.6937 15 0.8305 20 0.8809 25 0.8960 50 0.9000 100 0.9000 0 1
4.8.4
p=7 0.9000 0.9162 0.9167 0.9160 0.9151 0.9141 0.9130 0.9120 0.9111 0.9102 0.9094 0.7975 0.8657 0.8901 0.9000 0.9000
p=9 0.9000 0.9144 0.9148 0.9142 0.9133 0.9123 0.9114 0.9105 0.9096 0.9088 0.9080 0.7668 0.8498 0.8834 0.9000 0.9000
p=ll 0.9000 0.9131 0.9135 0.9129 0.9120 0.9111 0.9102 0.9094 0.9085 0.9078 0.9071 0.9044 0.8336 0.8758 0.9000 0.9000
p=13 0.9000 0.9122 0.9125 0.9119 0.9111 0.9102 0.9093 0.9085 0.9078 0.9071 0.9064 0.9039 0.8173 0.8675 0.8999 0.9000
p=15 0.9000 0.9114 0.9117 0.9111 0.9103 0.9095 0.9087 0.9079 0.9072 0.9065 0.9059 0.9036 0.9021 0.8588 0.8999 0.9000
Asymptotic Theory of Recentered Confidence Sets and Domination of Positive-Rule Coverage Probability
Consider the model (4.1.1) again, namely
Yi = 6 + e i , i = 1,... , n , where ei = (eli,. . . ,epi)’ vector. Further, assume
-
(4.8.24)
n,”=, Fo(eji) and 6 = (61,. . . ,6’)’
is the mean
E ( e i ) = 0 , and E(eie:)= 0~1,.
(4.8.25)
-
The unbiased estimator of 6 is 8, = (TI,.. . Yp)’and by the pvariate central limit theorem
&en
-
- 6)
The unbiased estimator of n2 is S$
= { p ( n - l)}-’tr
n/,(070~1,) as n -+ DC).
I’c(Yi
1
(4.8.26)
- 6)(Yi- 0)’
i=l
and s$ -+ g2 in probability as n + co. For the test of HO : 6 H A : 6 # 6 0 , we use the test-statistic
(4.8.27) = 60
against
(4.8.28)
196
Chapter 4. Stein-Type Estimation
Under Ho, as n -+ 00, the distribution of Ln is a central chi-square distribution with p d.f. For the asymptotic theory of confidence sets using PRSE, we assume that the prior uncertain information of 8 is equal to 0 (null vector). Then we define the Neyman-Pearson confidence set as follows: (4.8.29) is the y-level critical value of the chi-square distribution where k, = x:(y) with p d.f. Correspondingly, the confidence set based on PRSE is defined by
,.S+
(4.8.30)
where the positive-rule Stein estimator is given by (4.8.31a) and (4.8.31b) Thus, we have the following theorem.
Theorem 6. Under the fixed alternative 8 = 6 ,
Proof.
First consider the coverage probability
V
-
since f i ( 6 , - O ) / s y - + Z Np(O,Ip) as n 00, and k, = x;(y), a y-level critical value from the central chi-square distribution with p d.f. Note that under fixed alternative L, + 00 as n 00. Hence, L;' converges to 0 in first mean and -+
--f
~ ; r ( e f +-( ~e)) -+ SY
which implies that
0
q
as
n-+00,
4.9. Nonparametric Methods: R-Estimation
197
Because of this equivalence of probability content of the two sets, we may consider a sequence of Pitman-type local alternatives and obtain the following theorem, which shows that the confidence set with PRSE as center of the sphere is locally better than the usual confidence set, Co(en):
Theorem 7. Under the local alternatives K(n) : 8(nl = n-l/'6, vector, Iim
n-03
PK(,,
6 a fixed
{c~+(o(,))} = H&;A~)I(A< ~ IC,)
Proof. First note that under K(n) - ~ ( n )-) C f i ' n s y
where Z
-
SY
IIe n (I2
3
-
c(Z + 0-16) /IZ + c-1611''
A&(O, Ip).Therefore,
2 p6{11z/12< k,} = 1 - y by Theorem 4.8.1.
(4.8.33)
Thus, we can use the results of (4.8.10) on the decomposition of the coverage probability of positive-rule estimators in the normal case. Theorems 4.8.2 and 4.8.3 pertain to the nonnormal and asymptotic case. Thus, the domination of the probability of coverage in this case is given by Table 4.8.1, which holds for the asymptotic case.
4.9
Nonparamet ric Met hods: R-Est imat ion
In this section, we consider the nonparametric methods of estimation of several location parameters 8 = (01,. . . ,0,)' when it is suspected that 8 may be 0. This broadens the scope of the Stein-type estimators in a genera1 class of symmetric distributions, and the robust Steins procedure ensues.
Chapter 4. Stein-Type Estimation
198
4.9.1 Model and Assumptions Let Y1, Y z , . . . ,Y, be n independent pdimensional response vectors such that
Y 3 = 6 + e 3 , j = l ,. . . , n,
(4.9.1)
holds. The error distribution F ( e 3 )is given by P
F(e,)=nFo(e,,), j = I , . . . ,n,
(4.9.2)
2=1
F with
where Fo belongs to the class of absolutely continuous distributions, absolutely continuous pdf fo(e,,) such that
(4.9.3) That is, the Fisher’s information is bounded. Our objective is the estimation of 6 when it is suspected that 8 may be 0 .
4.9.2
Test of Hypothesis
Consider the data set Y with independent rows and the corresponding ranks R$(bi) of the absolute deviations [ y Z j - biI among 1x1 - bi 1,. .. , lyZn - bi 1, i = 1, ... , p : y11
1 . ’
Yln
Rfl(bl) RA(bi)
* * -
...
R,+l(bP) . * .
Rf,(bl) RL(bi) R,+,(bP)
(4.9.4) For every n ( 2 1) and j = 1,. . . , p , we define a set of scores by setting (4.9.5) where 0 < U1, 5 . . . < U,, < 1 are the order statistics of a sample of size n from U(0,l). For every 21 E (0, l ) , (4.9.6) the 4j are all assumed to be nondecreasing, nonconstant and square integrable score-generating functions. Recall that R$(bi) is the rank of lxj - bil among 1 x 1 - bil, 1 x 2 - bil,.. . lYin - bil for i = 1,.. . , p and j = 1,.. . ,n. Now, consider the vector of rank-statistics Tn(b) =
... ,Tpn(bp))’,
(~ln(bl),
(4.9.7a)
4.9. Nonparametric Methods: R-Estimation
199
where n
Tzn(bZ)= n-l xui(R;(b,))sign(Y,, j=1
- b,)
(4.9.7b)
for i = 1,.. . ,p. Note that Ti(bi) is \ in bi (i = 1,.. . , p ) (see Puri and Sen, 1971, Ch. 6). Let
A consistent estimator of A: may be defined as n
A: = ( n - 1)-' x[u:(i) - &:I2.
(4.9.9)
i=l
Then, as n + 03 &Tn(O)
2)
&(O,
A&).
(4.9.10)
(See Hajek, Sidak, and Sen, 1999; Sen, 1986; Puri and Sen, 1971). Thus, for testing HO: 6 against H A : 6 # 0, we consider the rank-test C N ,defined by
ell = n A i 2 llTn(O)Il2,
(4.9.11)
which approximately follows a central chi-square distribution with p d.f. under HO as n -+ 03 (e.g., Puri and Sen, 1986).
4.9.3 Estimation of the Location Parameter Now. we focus on the estimation of 6 based on the rank-statistic T(b), following Adichie (1967) and Puri and Sen (1986). Let
- (1) Bn = sup{b : T,(b) > 0},
- (2)
6 , = inf{b : T,(b) < O}.
(4.9.12)
Then. we define the UE of 6 as
-
6, The UE,
8,
=
1 -(1) - ( 2 ) ~ ( 6 f, 6 ,
).
(4.9.13)
is a translation-invariant and consistent estimator of 8 . The
RE of 6 is of course, 0 , which is prefixed by the null hypothesis H0.It may
Chapter 4. Stein-Type Estimation
200
be shown, following Puri and Sen (1971), Sen (1986), and Hajek, Sidak, and Sen (1999), that as n 00, ---f
-
fi(8n - 8)
(4.9.14)
&(O, A ; I y 2 ( + , 4 ) I p ) .
(See also Section 3.10.) Let x;(a) be the a-level critical value for testing Ho : 6 = 0 . Then, the PTE of 8 is defined by
-
APT
8, = 8, - B,I(L,
<
(4.9.15 )
Similarly, the Stein-type estimator is defined by
-s 8, = 6,
- d,C116,,
0 < d < 2 ( p - 2),
(4.9.1Sa)
but C, takes the value 0 with positive probability. We define a modified Steintype estimator as -S
e,(EN)
=
6,
- ~ L C , ~ I (>L ,
(4.9.1613)
where E, 0 as n -+ co. A better estimator of 6 without the confusion above S+ is the positive-rule Stein-type estimator 8, , where
-
--f
6:+ = (1 - dLC,lI(L, > d ) 6 , .
(4.9.17)
All these estimators parallel the estimators defined in Section 4.6.2.
4.9.4 ADB, ADQB, ADMSE, and ADQR of the Estimators of Location Parameters
It may be observed that the test-statistic L, defined by (4.7.10) is a consistent test as such, when n
--f
03,
., S+
APT - S
all three estimators, namely 8,
, On,and
6, that are equivalent in distribution to that of 6,. Hence, the asymptotic distributional bias, MSE matrices, and the risk expressions are identical to that of fi(6, - 8).As we have seen before, t o overcome this problem, we consider the class of local alternatives
K(,,
:
e(,)=
6 = (&, . . . ,bp)’.
(4.9.18)
Then we have the following theorem under K(,), using the linearity result of Section 3.10: sup
{ f i ITj,(n-l12bj)
for j = 1 , . . . , p .
+ n-l/’b,y(+,
- Tj,(O)
4)l; lbjl < k} 5 0 (4.9.19)
201
4.9. Nonparametric Methods: R-Estimation
Proof.
Based on the linearity results given at (4.9.19), we have as n -+ 00,
h T n ( 0 )= & ( e n - q n ) ) T ( ? h4) + O p ( 1 ) .
(4.9.20)
Hence, using contiguity of probability measures under {K,,)} and t o that of 8 = 0 , we obtain, as n -, 00,
using (4.9.10). Now,
4 ) A , ' ( C ( 4 - 8,n)) = fiy($, 0)Az1(en- e(,)) - h-~($, 4)AL1enI(L <
fi-Y($,
and as n
-+
(4.9.22a)
M, the r.h.s. of (4.9.22a) converges in distribution to the r.v. (4.9.22b) z - (Z + a-16)1(11Z + 0-16112 < k,),
where C,?, -+ xg(a)= k,. Similarly, f i T ( $ , 4 ) A L 1 ( E - q,)) =
&T(?hW3en -
and as n
--f
00,
c&y($,
- 0,n))
4)AG16,Ci1I(C, > E,),
(4.9.23)
r.h.s. converges in distribution to the r.v.
z - c(Z + a-1S)llZ + a-16I1-2,
c =p
-
2.
(4.9.24)
Finally, h T W 7 d)A,'(b,S+ =
- 0(n,)
,/~~(+,4)~;1(iz+ - q,,) - C ( ~ - Y ( $ , ~ ) A L ~ ~ , ~ L > I ~ ( (4.9.25) C,
which converges in distribution to the r.v.
(Z + a-16)(1
-
+
Cl/Z+ a - 1 f 5 [ [ - 2 ) I ( ~ ~ za-1112> c).
As a result of Theorem 1, we obtain Theorem 2.
(4.9.26)
202
Chapter 4. Stein-Type Estimation
Theorem 2. Under {K(,)} and the assumed regularity conditions, as n -+00, (i)
fi(8,
- Q(,))
~ ~ ( A;/lr2(+, 0 ,
(4.9.27)
(ii) limP~~,,)(~A,’[T,(0)]’[T,(0)] I z}= H p ( z ;A’),
(4.9.28a)
where LIP(.; A2) is the cdf of a noncentral chi-square distribution with p d.f. and noncentrality parameter A2/2, where A2 = ))6112/021 o’ = A$/r2(+,,4)
(4.9.28b) -PT
(iii) The ADB, ADQB, ADMSE, and the ADQR of 6,, 8, are given below: For UE, we have
(en), bl(8,) = 0 and &(en)= 0 and
M1(6,) = 0’1, and R l ( 6 , ; W) = a’tr(W). For PTE,
,.PT
AS
, 8,, and
S+
8,
(4.9.29)
(an”), we have
b2(8, )
=
-q+2(x34;
. PT
A2) and B2(8, ) = A 2 { H p + 2 ( x 3 4 ;A2)I2; (4.9.30)
Mz(6n) = a 2 1 p { 1
- H~+Z(X;(Q); A’)}
+66’{2Hp+2(x;(Q);A2) - Hp+4(x34;A”} and
Rz(6;; W) = a2tr(W){1 - Hp+2(x2,(a);A2)}
4.9. Nonparam etric Methods: R-Es timation
203
and
The expressions above are similar to the expressions in Theorems 1 and 2 of Section 4.6.3. Hence, the asymptotic properties of the four estimators are the same as given there.
Proof. let
(i) See Theorem 1. (ii) The proof follows from (4.9.21). For (iii), we
u1 = z
-
(Z + a-'6)l(ljZ
+ 0-16)[/2 < Ice).
(4.9.34)
Then the bias vector is given by b2(eET)= E(aU1) =
-6Hp+2(xg(cu),A2).
Similarly, PT
M2(6, ) = D ~ E [ U , U=~c2Ip ] - 02 I p H p + 2 ( x ; ( a ) A2) ; +66'{2Hp+2(x3a); A2) - Hp+4(x2,(a);A2H. (4.9.35) -S
For the bias vector and MSE matrix of 6,, let u2 =
z
-
c(Z
+ a-16)"Z + a-161j-2.
Then compute -S
b3(6:) = aE(U2) and M3(6,) = o2E[U2U;].
(4.9.36)
Chapter 4. Stein-Type Estimation
204
-St
Similarly, the bias vector and the MSE matrix of 8, using the r.v.: u3 =
can be computed by
(Z + a-16)(1 - C l j Z f a - 1 6 ~ ~ - 2 ) I (+ ~ la-16j/2 z > c).
Also, compute
- S+ )
b4(@, M4
aE[U3],
(en -s+= a2E[U3Uj].
(4.9.37)
Asymptotic Properties of Confidence Sets
4.9.5
In this section, we consider the asymptotic properties of the following (1--y)'% confidence sets: CO(6,)
=
{4I@ - enIl2I Icy},
cPT (GPT n ( a ) )= in110 - 6t/T(a)/~2 I ky>, and
. S+
CS+(8,
(.I)
. S+
= {rill@ - 8,
(.)I2
5 k,>,
(4.9.38) (4.9.39) (4.9.40)
where Ic, = $(y). Similar to Theorem 4.8.6, we show that under fixed alternative, the asympPT st totic distributions of fi(8, ( a )- 8) and &(On (c) - 8 ) are the same as that of - 8).Hence, we consider the asymptotic properties of the confidence sets (4.9.26) through (4.9.28) under the local alternatives, A
&(an
qn,qn)= n-1/26.
To begin, it is easy to see that lim P~,,){nljO(,)- &/I2
n-m
5 k,}
= P(llZ112 < k7) = 1 - y.
(4.9.41)
Next, we consider Theorem 4.9.1 and use the results (i) through (iii) to compute the probability of the different sets. Thus, the asymptotic coverage prob..S+ abilities of CpT(6ET(a))and CS+(8, (c)) reduce to
and
(4.9.421;) The two expressions are similar to the expressions given in Theorem 4.8.7. Hence, the asymptotic coverage probabilities are similar to the finite sample coverage probabilities as given in Theorems 4.8.1 through 4.8.7.
205
4.10. Conclusions
4.10
Conclusions
In this chapter, we introduced the basic Stein-type estimators, in particular the James-Stein estimator together with the preliminary test estimator of the mean vector of a pvariate normal distribution with known and unknown covariance matrix. We have also presented three basic methods of obtaining Stein-type estimators. They are (1) the risk difference approach, (2) the empirical Bayes approach and the (3) the preliminary test (or quasi-empirical Bayes) approach. Of all these models, we particularly discussed the empirical Bayes and the preliminary test (or quasi-empirical Bayes) approach to obtain the Stein-type estimators. In general, we considered the (1) unrestricted (2) PTE, (3) Stein-type, and (4) positive-rule shrinkage estimators and evaluated their properties according to Table 4.10.1. Table 4.10.1 Properties of Estimators
Estimators
4
We showed the following among other results: (i) 0 = B1 (8,) 5 B4 (6;') 5 B3 (6:) 5 B2 (6:T) conditions on the level of significance, a
(-
(ii) R1 8,;W (iii) R3 (6;;
) 2 R3 (6,;W ) 2 R4 (*'+;W) 8, 2 R5 (6;;
<
03,
with some
forallA2E [O,co)
O - ~ I for ~ ) all A' E [0,w)
(iv) R2 (6zT;W) 2 Rg (6:T+; W) for all ( a , A 2 ) . . S+
-s
We concluded that for p 2 3, 8, is preferable to 8, and 8,, and that PT -1s -S . PT+ en is preferable to 8, for p 5 2. The estimator On is preferable to 8, if p
S+
-IS
2 3 but not over 8, . The positive-rule version of On is not preferable A
over 6;' and should not be considered as a viable estimator. We discussed the recentered confidence set based on the PRSE and presented its properties in Section 4.8 with the asymptotic version under local alternatives around the uncertain prior information 8 = 0 . The confidence sets with recentered PSR.E proved to dominate the usual set under local alternatives. The confidence
206
Chapter 4. Stein-Type Estimation
set with unknown variance turned out to be a difficult problem that is open for solution and was hence avoided. We appealed t o the asymptotic result for this problem. We also included the theory of Stein-type R-estimation of the location parameter 8 and the asymptotic properties of the point and the confidence set estimation of 8 under local alternatives.
4.11 Problems 1. Show that (i)
pdq
= 1.
{
(ii) lM2(hET)I = 1 - Hp+2(x2,(cy); A2)}’-’ x
I
-S
{ [I -
ffp+2(x34;
A’)]
+A2 [~H~+Z(X;(U); A’) - f f p + 4 ( ~ ; ( a ) ; A2)]}.
{
(iii) IMs(8,) = 1 - ( ~ - 2 )kE(x;:z(A2))
-(~-2)E(x;:~ (A’))]
}’-’
x ( 1 - (P-2)[2E(X,;22(AZ)) - (P-2)E(X&(A2))] X(P2
- 4)A2E(X,;4(A’))},
and . S+
(iv) 1M4(8,
11 = (I
- ( P - 2)[2E(x,S22(A2)) - ( P - 2)E(x;+Z(A2))]
< ~P ,-+2)])’-lm4(A2), Z(A~) -E[{1 - ( P - ~ ) X ~ ~ ~ ( A ’ ) ) ~ ~ ( X where m4(A2)is given by (4.4.59).
2. Refer to Section 4.4.2.
..PT
-S
(i) Compare 6 , and 8, with respect to MSE matrices and risk expressions, and show that neither dominates the other for some a, the level of significance. . PT
(ii) Compare 8, and bf+ with respect to MSE matrices and risk expressions, and show that neither dominates the other for some a , the level of significance. 3. Let
den
+
+
-
-(4 = 8 e,; i = 1, 2 , . . . ,n,where e, N p ( O , I p ) .Let 8, (1 - d)601p where 80 is a fixed scalar parameter. Determine the
Yi
bias, quadratic bias, MSE matrix, and quadratic risk based on the loss function L ( 8 : ; 6 ) = (6* - 8)’W(6*- 8 ) (0 < d < 1) of the estimator,
en.
4.11. Problems 4. Let
207
Y i= 8 + e i , i
=
1,2,. . . , n, where
ei
is distributed as
Define
(i) (ii) (iii) (iv)
6, = P = (TI,.. . , FP)’. APT 8, = 6, - 6,1(nll6,11~ < x;(a)). -S 8, = 6 , - ( p - 2)(nll6,l\ 2 -18., &,”+ = 6 , - 6,1(_nll6,ll2 < p 2) - ( p - 2)(nll~~,112)8,+ ( p 2)(nl16,1\2)-11(nlj8,1/2< p - a)@, as in Sections 4.2 and 4.3. Here
$ ( a ) is the a-level upper critical value from a central chi-squared distribution with p d.f. Show that the bias expressions are (i) bI(6,) = 0 and
,.PT
&(en) = 0.
(ii) b2(8, ) = -t9HLy2(x;(a); A2) and . . .
BZ(&;~) = A 2 { H ~ : ) 2 ( ~ ~ (A2)}2, a); where
j = 1,2. -S
(iii) b3(8,) and
&(fif)
where
= - ( p - 2)8E(1)(~;:2(A2)) =
( p - 2)2A2{E(1)(~;j2(A2)}21
208
Chapter 4. Stein-Type Estimation
4.11, Problems
209
6. Establish the following identities:
(i) E(')(X,S22j,2(A2))
Y i = 6 + e i , i = l , ... , n as in (4.1.1), and assume that the joint distribution el,.. . , e n is given by multivariate t-distribution Mt (0, a21,,). f(e;v0,o2)= v0,2>
0. Let
r(+) (nvo)np/2r
(y
)
U ~ P
?
Chapter 4. Stein-Type Estimation
210
1
x 1, (z(P
where x = +,
1 + s + 2 r - 2); 5(m + 2)),
uo > 2.
4.11. Problems
21 1
(b) Show that the MSE matrices of the estimators are (i) MI(&) = a21p, PT
(ii) MZ(8, ) = uzI,{l - G$!2,m(d; A*’)} (1) +nee’{2GF,!2,m(4; A*2)- Gp+4,m(zz; A*’)}.
+ ~A*’E(’)(x;:~(A*’))} + npc(p + 2)6e’E(’)(~p;3~(A*’)). 8. (a) Find the expressions for the (i) bias vector, (ii) MSE matrix, and -S (iii) the risk of the estimator 8, (c, k ) given by (4.7.5). (b) Show that c ; ( k , A2) minimizes the risk of 6f(c, k ) . 9. Prove Theorem 4.7.1.
10. Prove Theorem 4.8.3.
11. Prove Theorem 4.8.4. 12. Verify the expressions of ADB, ADQB, ADMSE, and ADQR of the R-estimators of 6 of Theorem 4.9.1.
This Page Intentionally Left Blank
Chapter 5
ANOVA Model 0utline 5.1 hfodel, Estimation, and Tests 5.2 Preliminary Test Approach and Stein-Type Estimation 5.3 Bias, Quadratic Bias, MSE, and Risk Expressions 5.4 Risk Analysis and Risk Efficiency 5.5 MSE-Matrix Analysis and Efficiency 5.6 Improving the PTE 5.7 ANOVA Model: Nonnormd Errors 5.8 ADB, ADQB, ADhfSE, and ADQR of the Estimators 5.9 Confidence Set Estimation 5.10 Asymptotic Theory of Confidence Set Estimation 5.11 Nonparametric Methods: R-estimation 5.12 Conclusions 5.13 Problems
An important model belonging to the class of general linear hypotheses is the analysis of variance (ANOVA) model. In this model, p samples of differing sample sizes ~ 1 , ... ,np from p normal distributions N(81,u’), . . . ,N ( 4 ,a’), respectively, are collected, where 6 = (01,. . . ,BP)’ is the unknown treatment mean vector and u2 is the conimon unknown variance. It is usual to test the null hypothesis HO : 61 = . . . = eP = 60 (unknown), that is, the treatments are equal to an unknown mean 60 against the alternative hypothesis, H A , that at least one pair of treatment means differ. Here n = n 1 + ... n p is the total size of the samples collected from the p populations. The main object of this chapter is to study the properties of various improved estimators of 6 = (61,. . . ,t),,)’.Accordingly, we consider the unrestricted, restricted, preliminary test, and Stein-type estimators of 6. We also present the Bayes and the empirical estimators of 6.We compare various estimators using the MSE matrix and the quadratic risk criterion. We also present the
+
213
Chapter 5. ANOVA Model
214
confidence set analysis based on the estimators as centers of the set with fixed volume. Finally, we discuss the asymptotic properties of the estimators when errors follows some nonnormal distributions, as well as the point and confidence set R-estimation of the treatment means to conclude the chapter.
5.1 Model, Estimation, and Tests 5.1.1
ANOVA model
Consider the ANOVA model
Y =Be+€,
(5.1.1)
where
y = (Yll,... , Y l n 1 , - . . 6 = (el, . . . , E
=
(Ell,
*.
.
rYpl,...
,Ypn,)’,
ePy, 7
Elnl
I .
.. ,E p l , . . .
7
Epn,)‘
B = Block diagonal vectors = Diag(l,, , . . . , In,), I,, = (1,1,.. . ,I)’, an n,-tuple of 1’s, n = n1 + 722 +
- . - +n,.
(5.1.2)
The distribution of the vector E is Mn(O,021n),where I, is the identity matrix of order n and o2 is the variance of the errors. Generally, the main object of ANOVA problems is the test of the null hypothesis HO : 0 = 001, against the alternative H A : 6 # 1301,. But, in this chapter, our objective is t o improve the usual estimators of the treatment means. In the next section, we consider the usual estimation and test of hypotheses in the ANOVA model, with an example and calculations of various related statistics together with the proof of likelihood statistics for the test of the hypothesis HO : 0 = 601, as the basis of developing the improved estimators.
5.1.2 Estimation of the Parameters of the One-way ANOVA Model Consider the ANOVA model (5.1.1) and (5.1.2). From the sample information and the model, the unrestricted estimator (UE) of 6 based on least squares (LS) or maximum likelihood (ML) method is given by
en = (B’B)-~B’Y = N - ~ B Y= (Yl,. . . ,Y,)’
=
(el,.. . ,e&
(5.1.3)
5.1. Model, Estimation, and Tests
215
where N = B'B = Diag(n1,. . . ,n p ) .The corresponding unrestricted unbiased estimator (UUE) of g2 can be written as sz =
1 m
-(Y
- Be,)'(Y
-
Be,), m = n - p .
(5.1.4)
It is clear that 6, is distributed as N , ( ~ , U ~ Nindependent -~) of the distribution of ms:/a2, which has a central chi-square distribution with m degrees of freedom (d.f.). Suppose that in addition to the sample information and the model, we have some additional information which consists of the null hypothesis Ho : 6 = eel,. If this hypothesis is true, then we estimate the common value of 801, bY
1 6, = -1 I'Ne,. n p P
(5.1.5)
Thus, 6 , is the restricted estimator (RE) of 6 under the null hypothesis H,, : o = eoi,. Table 5.1.1 One-way ANOVA Data
1"' Sample I Mean
1
I pth Sample I Mean
Yll
YPl
Sample
The grand mean of the data set is 5 = &, 80,
5.1.3
= :(TI f . . .
given by
+ T,).
(5.1.6)
Test of Equality of the Treatment Means
In this section, we consider the test of the null hypothesis in the ANOVA model, namely
H,,
:
o = eoip,
against the alternative
H~ : 0 #
eoi,.
The test is given by the following theorem:
(5.1.7)
Chapter 5. ANOVA Model
216
Theorem 1. The likelihood ratio (LR) statistic is given by (5.1.8) Under Ho, C, has the central F-distribution with ( p - 1 , m ) d.f., and under H A it has the noncentral F-distribution with ( p - 1,m ) d.f. and noncentrality parameter A2/2, where
(5.1.9)
Proof. The likelihood ratio (LR) statistic is obtained using the maximum of the likelihood functions under Ho and H A given by
Lo = SO"(&)-"
exp
{ -'}2
, n ~ g= (Y - B8n)'(Y - Ben),
and
LA =
exp
{ -?}2
, ns:
=
(Y - Be,)'(Y
- Be,) = m s z ,
respectively. Hence, the LR statistic is the result of 2/n
(Y - Be,)'(Y (Y - Bb,)'(Y
- Be,)
-
1
,
(5.1.10)
+G L , - Be,) + eLH'NH8,.
- Bb,)
1
since (Y - B8,)'(Y - Be,) = (Y - Be,)'(Y Thus we take C, = eLH'NHe,/(p- 1)s; as the LR statistic for testing Ho against
HA.
To obtain the distribution L, under H A ( H o ) , we consider an orthogonal matrix r = (rl,rz),where I'l is a p x ( p - 1) matrix and r2 is a p vector such that I'kI'l = 0 and rlr; = I,. Also, it diagonalizes the symmetric idempotent matrix N-'/2H'NHN-1/2 of rank ( p - l ) , meaiiing
+
r'N-1/2H'NHN-
1/2r' =
(
-
).
Then, defining w = a-'r"'/26,
and q = a-11'N-1/26,we have w Np(q,Ip).Further, we partition w = (w;, wb)' and q = (q;, q;)', where w1 is a subvector of order ( p - 1) and w2 is a scalar random variables such that
Hence, llw1112 follows the noncentral chi-square distribution with ( p - 1) d.f. and noncentrality parameter A2/2, where 1
A' = -[6'H'NH6] 02
= 11q1j12.
(5.1.11)
5.1. hilodel, Estimation, and Tests
217
Then, under H A , 13, has the noncentral F-distribution with [ ( p- 1 ) , m ] d.f. and noncentrality parameter A2/2, since ms2/u2is independently distributed as a central chi-square variable with m d.f. The derivation of the LR test is given only to see the required transformations that come in the picture, which will be used later in subsequent discussions. Some of the distributional results are given in the following theorem that are easy to prove.
Theorem 2. Under the assumed conditions,
-
(i)
X(,)= (6, - 8 ) NP(o; 2W1)
(ii)
Y(,l
(iii)
=
(6, - 6,)
- N,(H8; ~ H N - ~ ) ;
-
Z(,) = (6, - $01,) N,((8 - 80)1,; $B) , B = l&, and 81, = ;lp1LN8;
We now present an example t o illustrate the test of the null hypothesis of equality of means in several samples. (Here it is four samples.)
Labs 1 2 3 4
<---Samples ----> 0.25,0.27,0.22,0.30,0.27,0.28,0.32,0.24,0.31,0.26,0.21,0.28 0.18,0.28,0.21,0.23,0.25,0.20,0.27,0.19,0.24,0.22,0.29,0.16 0.19,0.25,0.27,0.24,0.18,0.26,0.28,0.24,0.25,0.20,0.21,0.19 0.23,0.30,0.28,0.28,0.24,0.34,0.20,0.18,0.24,0.28,0.22,0.21
Total 3.21 2.72 2.76 3.00
Chapter 5. ANOVA Model
218
Preliminary Test Approach and Stein-Type Estimators
5.2
In this section, we discuss the preliminary test (or quasi-empirical Bayes) approach to shrinkage estimation in addition to the Bayes and empirical Bayes approach.
5.2.1
Preliminary Test Approach (or Quasi-empirical Bayes Approach)
Following Chapter 3, we write the preliminary test estimator (PTE) of 8 as -PT
8,
=
-
en - (en- 8,)I(Ln
< Fp-i,m(a)),
(5.2.1)
where Fp-l,m(~) is the upper a-level critical value of the F-distribution with ( p - 1,m ) d.f. P T E of 8 depends on the level of significance and the choice of estimators remains between the two values 6 , and 6,. To overcome this difficulty we define the James-Stein-type estimator
-s ,. 8, = en + {I - CL; =
en -
CL,1(&
en -en>
-
in),
(5.2.2)
where c = ( ~(-pl-)3()m m Notice that we have replaced I ( & < F p - l , m ( ~ ) )by + 2 ’) CL;’ to obtain this estimator. The constant c is defined based on the degrees of freedom [ ( p - l),m]. The quantity ( p - 3) is associated with the expected value of a reciprocal chi-square variable with ( p - 1) d.f. -S
The estimator, On may take values past On.Thus, we consider the positive-S
rule version (PRSE) of On given by en
= 6,
+ (1 -
CL,’)
I ( L , > c ) ( 6 , - &),
(5.2.3)
-s
which is a P T E based on 8, and 6, with critical value c. The estimators above were discussed by Ali and Saleh (1991~)and Khan, Bashir (1997). Now, using Theorem 2, we can write
(ii)
-S (en - 0)= x ( ~ - C)( P - ~ ) $ Y ( ~ ) I I N Y ( ~ ) I I G ’ ,
. S+
(iii) (6, - 8) = ~ ( ~ C(P 1 -- ~)s~Y(~)IIY(~)IIG’
-y(,)(1- C ( P -
1 ~ ~ ~ l l ~ ( n (& ) Ityl I nG[ IZN ~<~ ). (5.2.4)
These expressions can be used to compute the expressions for bias, MSE matrices, and the quadratic risks of the different estimators.
5.2. Preliminary Test Approach and Stein-Type Estimators
5.2.2
Bayes and Empirical Bayes Estimators of Treatment Means
219
-
To obtain a Bayes estimator of 6 , we consider the distribution of Y N,(BB, a21,) together with the prior distribution of 6 as Np(Oolp,T ~ N - ' ) , which includes the null hypothesis Ho : 6 = 001, (0, unknown) into the prior of 6 . Then, by Theorem 3 of Section 2.3, the posterior distribution of 6 given 6, is Np{Ool, + (1 - d)(8, - Oolp), ~ ~- d)N-l}, ( 1 where N = B'B and d = &. Thus, under the quadratic loss L ( 6 * , 6 )= sll6*-ell2, the Bayes estimator of 6 is given by -B en = eoi, + (1 - d ) ( 6 , - eoip)
(5.2.5)
with covariance matrix
~ ~- d)N-'. ( 1
(5.2.6)
The marginal distribution of 8, is Np(Ool,, N-l(a2 +-r2)-l). Accordingly, minimizing (3, - 001,)'N(6, - 001,) w.r.t. 00, we obtain the estimate of 80 as
B0,
=(i;~i~)-li;~~6,.
(5.2.7)
Let 6, = 60~1,.Clearly, E"(6,) = &lP, where Em stands for the expectation based on marginal distribution of 6,. Furthermore, (6, - 6,) = He, is marginally Np(O,(a2+ T~)-'HN-'H') and
(5.2.8) Therefore,
+
and the unbiased estimator of (a2
is given by
T ~ ) - ~
( P - 3)
(~LH'NH~,)
(5.2.10)
.
Notice that no prior information is available on the variance 02. To obtain a Bayes estimate of a2, we have to use the so-called noninformatiue prior n(02) = l/a2 and estimate a2 from the marginal distribution of s:, namely, ( n- p ) $ N a x,-~. This leads to sz as the Bayes estimate of 02.But (msz)/m+ 2 is a scale-invariant better estimator of 02. Thus, the estimator (p-3)m~-' Accordingly, substituting the estimators of the extra of d is d = (,-1)(";2)' parameters, we have the following empirical Bayes estimator (EBE) of 6: A
-EB
.,
m(P - 3 1 4 ( m+ 2)6LH'NH6,
( 6 , - en).
(5.2.11)
Chapter 5. ANOVA Model
220
This is the estimated version of the Bayes estimator (5.2.4(ii)), with weights depending logically on the test-statistics C, defined by (5.1.8). We proposed -S
the same estimator, 8,, using the preliminary test approach. this creates a problem in the definition (5.2.11) Now, if C, + 0, L;' + a, of the empirical Bayes estimation. To keep the estimator of d bounded below 1, we restrict C, to be greater than c as given in Chapter 4. Thus, we obtain st the positive-rule shrinkage estimator (PRSE) of 8 defined by 8, in (5.2.3) and repeated again here:
-s+-
en
-
6,
+ (1 - cc,l) qc, > c)(e,- 6,).
(5.2.12)
We proceed to write the quasi-empirical Bayes estimators as
6; = 6,
+ (6, - 6 , ) 4 ( C , )
(5.2.13)
where, by choosing
4 ( W = 1, = 0,
> Fp,m(a)), = 1 - cc,', = I(C,
. S+
APT -S
we obtain the estimators 6,, 6, 8, ,On,and 8, , respectively. The following example illustrates the computation of the various estimators: Example 1. Computation of the preliminary test and shrinkage estimators. Consider the data in Table 5.1.2. For these we have C, = 2.87 and c= = 0.319. Let a = 0.15. Then the preliminary test based on C,
5 (2)
PT
rejects Ho, and 8,
,.S+
= (0.268,0.227,0.230,0.250)' = 6,. To compute
-S
0, and
0, , we have
:8 8:
= 8, - c(el -
^
- 60)C;l
= 0.268 - (0.319)(0.268 - 0.242) = 0.210,
= 0.227 - (0.319)(0.227 - 0.242) = 0.232,
6; = 0.230 - (0.319)(0.230 - 0.242) = 0.234, 6; = 0.250 - (0.319)(0.250 - 0.242) = 0.252. -S
Hence, 848= (0.210,0.232,0.234,0.252).
-s
-s+
Now, C n = 2.87 > 0.319. Therefore, 848= 8,, . We note that the PTE of 8 PT
-
S+
is 8,, = (0.268,0.227,0.230,0.250), while 8,, = (0.210,0.232,0.234,0.252) in this case. A
5.3. Bias, Quadratic Bias, MSE, and R,isk Expressions
221
5.3 Bias, Quadratic Bias, MSE, and Risk Expressions In this section, we obtain the bias, quadratic bias (QB), MSE matrix, and risk expressions for the unrestricted, restricted, preliminary test, James-Stein and the positive-rule Stein estimators.
5.3.1
Bias Expressions
First, we note that
c,
=
~ ' H I N H ~-, m Z i Z 1 (P - 1)SZ ( P - 1)xf
-
Fp-1,m(A2), A2 =
B'H'NHB g2
.
Therefore, we obtain the bias and the quadratic bias expressions as given in Theorem 1 below:
Theorem 1. (i) bl(6,)
=
0 and Bl(6,) = 0,
(5.3.l a )
(ii) bl(6,) = -H6 = -6 and B2(6,) = (6'NS)g-2 = A2, PT
(iii) b3(6, ) = -HBG,+l,,(e,;
e,
= ~F p+l
p - ~ , m ( a and )
-S
(iv) h ( 6 , ) =
-C(P
A2) = -SGp+1,,(t,; A'), A
PT
B3(8, ) = A2{Gp+i,m(&;A2)}2.
(5.3.1~)
- l)8E[x;:l(A2)] 2
and B l ( 6 ; ) = c2(p- 1)2A2{E[x;~1(A2)]} .
- S+
(v) b5(6, ) = -6{Gp+1,,(c-;A2) -C-
(5.3.lb)
[Fp;ll,,
fc(p-1)E (p+l)
(
(5.3.1d )
[F-'p+l,m(A2)]
(A2)I Fp+1,, (A2) <
CM)] } . (5.3.le)
Here E[Fp21,,(A2]is the expectation of the reciprocal of the noncentral Fvariable with ( p I, m ) d.f. and noncentrality parameter A2/2. Similarly,
+
(5.3.2)
Chapter 5. ANOVA Model
222
is the truncated expectation of the reciprocal of a noncentral F-variable with ( p 1,m ) d.f. and noncentrality parameter A2/2. Further, G,,,,, (.; A2) is the cdf of a noncentral F-distribution with (v1, v2) d.f. and noncentrality parameter A2/2.
+
ProoJ (i) The proof is obvious. (ii) The bias vector of
h,
may be calculated as
1 bz(6,) = E ( 6 , - 8 ) = -1 1’NO - 8 n ’ P 1 = - ( I p - -1 1’N)6 = -H8 = -6. n ’ P Hence,
&(en) = (6’N6)0-’
= A’.
by using the transformations in the proof of Theorem 5.3.1 = cN-1/2J?l -
(+;N’/28)
GP+l,-.(ta,A2) by Theorem 2.1.16
~ 8 ~ , + ~ , , (a2) t , , = -=p+l,m(ta,a2).
-
-
Since He, N(H8,O2HN-l) and e’H’NH6 X ~ - ~ ( with A ~ )p - 1 d.f. and mse2/02 is a x$, with m d.f. independent of X & ~ ( A ~ ) . Consequently, PT
) = A2{Gp+i,tn(la,A2)}2-
5.3. Bias, Quadratic Bias, MSE, and Risk Expressions -S
(iv) b4(0,) = E(6,
= -c(p
-
-
1)uE
0) + E [ ( 8 , - b,)(l
-
223
cLc,l)]
{ N;;;f;lzl }
S+
(v) b5(8:+) = E(0, -0) = E(6;-0)-E{(l-cL;')I(LR
< c)(8,-en)}.
We evaluate the second term above as follows:
E((6,
- 6,)(l
-
cLL1)I(LC,< c)}
-
S+
Substituting the above expression, we obtain the bias vector bs(0, ) and B5(62+)as in the theorem. Since bias expressions are vectors, we use the quadratic bias expressions to study the properties of the biases of the five estimators. The quadratic bias of 6, is zero and that of 0, is A2, which is an unbounded line starting from
Chapter 5. ANOVA Model
224
the origin. The quadratic bias of PTE, SE, and PRSE are bounded in A2, satisfying the relation
,.S+ ~ ~ ( 6= ,o )I Bs(e, I ~
..PT
~ ( 65 ,~) ~ ( 6I B , ~ ( ~ ~(5.3.3) J -S
whenever
5.3.2
MSE Matrix and Risk Expressions
The MSE matrix and the risk expression under the quadratic loss function
qe*;e) = (e* e)lw(e*- e) = lp* el\& -
(5.3.4)
-
are given by the following theorem: Theorem 2. (i) Ml(8,) = a2NP1and Rl(6,; W) = a2tr(WN-l). (5.3.5a) (ii) Mz(6,) = $lplb+66’ and Rz(6,;W) = < l ~ W l , + S ’ W S = 1 + A 2 if W = D - ~ N . (5.3.5b)
. PT (iii) MJ(8, ) = a2N-l - 02(HN-1H’)Gp+l,m(!,; A2)
+(66’){2Gp+1,m(!a; A2) - Gp+3,m(!:;
!:
A’)}
,= s F p - l , m ( a ) , = SFp-l,m(a), and HN-IH’ = where ! HN-1.
Further,
and
5.3. Bias, Quadratic Bias, MSE, and Risk Expressions
225
Proof. (i) M1(6,) = E ( 6 , - 6 ) ( & - 0)' = n2N-l
Rl(6,; W) = E ( 6 , - 6)'W(6, The risk Rl(6,; W ) equals p if W
- 6) = n2tr(WN-').
(5.3.6a)
= CT-~N.
(ii) Mz(6,) = E ( 6 , - 6 ) ( 6 ,- 6)' =
66'
and
R2(6,; W) = E(an - 6)'W(6, - 6) = tr[WM2(6,)]
= $l'pWlp
+ 6'W6 = a2tr(WN-') - n2tr(WHN-') + 6'W6,
(5.3.6b)
which reduces to
- PT
R & , N ~ - ~ )= 1 + n2 if
- PT
(iii) M3(6, ) = E ( 6 ,
,.PT
-0)(6,
-6)'
w = NO-^.
Chapter 5. ANOVA Model
226
PT
Hence, the risk of 8,
is given by
if W = o-’N. (iv) M4(6:) = E{(6: - 6)(6: - 6 ) ’ } = E((6,
-
6)(6,
- 6)’ - 2cE
5.3. Bias, Quadratic Bias, MSE, and Risk Expressions
if
W = a-’N.
-S
227
(5.3.6d)
Rewriting this formula for R4(6, : W ) , we obtain the expression of the theorem.
Chapter 5. ANOVA Model
228
Evaluating the terms as before, we have
M5(6ff) = M4(6:) - a2(HN-l)
229
5.4. Risk Analysis and Risk Efficiency
..S+
The risk of 8,
is then easily computed as
F p + l , m ( A 2<)
w )] Pi-1
if W = o - ~ N .
5.4
(5.3.6e)
Risk Analysis and Risk Efficiency
5.4.1 Comparison of
6,
and
a,.
The risk-difference Rl(8,; W) - R2(bn;W) shows that than 6, if a-2(d’W6)
and
6,
performs better than
b,
8,
performs better
5 tr(WHN-l),
(5.4. l a )
if
a-2(S’W6) 2 tr(WHN-’). If W = o - ~ N ,then 6, performs better than 6, performs better than 6,.
6 , when A2 < p -
(5.4.1b)
1; otherwise,
Chapter 5. ANOVA Model
230 The risk-based efficiency of
6,
compared to
6,
is given by
(5.4.2)
RRE(6, : 8,) is a decreasing function of A’. At A’ = 0, the maximum efficiency of p is attained. However, 6, performs better when 6, is the interval [O,p - l),and outside this interval, 6, performs better than 8,. PT
Comparison of 9,
5.4.2
and 6,(8,). -PT
Here the risk-difference shows that 8,
while
6,
if
-PT
6,
If W
performs better than
performs better than 6, if
= o-’N,
PT
then 6, performs better than
6,
when (5.4.4)
otherwise, 6, performs better than 6,. PT Now, consider the null hypothesis HOand the risk-difference of 6, and 8, , namely Rz(6,; W) - R 3 ( 6 r ;W) 0’
= -1LW1,
n
=
-
cT’tr(WN-’)
- o’tr(WHN-’)(l
-
+ a’tr(WHN-l)G,+l,,(!,; 0)) < 0.
G,+l,,(t,;
PT
Thus, under Ho, 6, performs better than 6, . PT
than 6,
. PT
if
and 8, performs better than 6, when
0)
(5.4.5)
. In general, 6, performs better
5.4. R.isk Analysis and Risk Efficiency If W
= u-’N,
231
then the r.h.s. of (5.4.6) and (5.4.7) reduces t o ( P - 1)(1- Gp+l,m(&; A’))
+ Gp+3,,(f;;
{ 1 - 2Gp+l,,(L; A’)
PT
Now, consider the risk-efficiency of 6 , by
relative to
(5.4.8)
A’)}.
e, for W = o-’N,
given
APT RRE(6, : 6 , ) )
fA2p-l
{ 2GP+i,,(&;
A’) - Gp+3,,(!;; -PT
We note the following properties of RRE(6,
- PT : e,)$l
(i) RRE(6,
A’)}]
-1
.
(5.4.9)
-
; fin):
according as
from (5.4.3a) and (5.4.3b) (ii) The maximum efficiency is attained a t A2 = 0 with the value (5.4.11) APT
-
(iii) R.RE(6, ;6 , ) decreases as a function of A2 for fixed a E (0,l) crossing the 1-line to a minimum, say, at A2 = ALin; then monotonically increases toward unity as A’ 00. Since P T E is not an estimator with uniform dominance over 6,, we can obtain a PTE with minimum guaranteed efficiency, say, Eo by solving the inequality --f
APT
max minaz RRE(6, a6d
-
;6,) = E ( a ,A k i n ( a ) ) = Eo.
(5.4.12)
Tables of maximum and minimum efficiency may be obtained as before. (iv) As a
--f
-PT
0, 6,
-+
APT
O n , while 6 ,
-+
-
6 , as a ,. PT
---f
1.
Next, we consider the risk efficiency of 6 , relative to 8, for W = N given bY
Chapter 5. ANOVA Model
232 -PT
We note the following properties of RR.E(O, ;8,): -PT
..
(i) RRE(8, ; 8,)
5 1 according as
(5.4.14)
(ii) Under Ho, the efficiency is given by
,.PT
while that of 8, re1 6, is given by
Thus, under Ho, APT.-
-PT .
p-l I RRE(8, ; 8,) I RRE(8,
5.4.3
-s -s+
Comparison of On,8,
, and
,On).
-
8,.
First note that the risk-difference is
R,(6,; W) - R4(6,s;W) = c(p-
l ) a 2 t r ( W H N - l ) ( p - ~)E[x;:~(A~)]
+ 2A2E(x;&(A2))
[
1-
(P + W’W6) 2( d’N6)tr(W H N - 1 )
(5.4.15)
for all A2 and all matrices W such that tr(WHN-’) Ch,,,(WN-1) -S
Hence, 8, dominates
6,
p+ 1 2
> -. -
uniformly for all W satisfying (5.4.16).
(5.4.16)
5.4. Risk Analysis and Risk Efficiency
233 s+
Next, we consider the comparison of 6 , R4
(6;;W>-R5 (6;’
-s
and 8,. Again, the risk-difference
W>
which is nonnegative for all 6. If W = u-’N, the r.h.s. of risk-difference reduces to
Thus, we obtain the order in the risk functions of S+
en,O- Sn , and 8,S+ as
Rl(6,; W) 2 R4(6:; W) 2 Rs(0, ;W) for a11 A2, A
..S+
.?
(5.4.17)
-S
over 6 , as well as over 6, is established. -s It may be verified that RRE(6,; 6 , ) and R.RE(ez+;6,”) are decreasing functions of A’ bounded below by the 1-line. -S The risk-efficiency of 6, compared to 8, is given by and dominance of 6 ,
-s RRE( 6 , ;en)
(5.4.18)
Chapter 5. ANOVA Model
234
-s
-S
Similarly, the risk-efficiency of 8, compared to 8, may be written as R.RE@+;
e:)
- -1 (5.4.19)
5.5
MSE Matrix Analysis and Efficiency
In this section, we consider the MSE matrix analysis of the various estimators and look at the efficiency based on the MSE matrices.
5.5.1
Comparison of 6%and
8,
In this case, consider the matrix-difference
This MSE-difference is positive definite whenever for a given nonzero vector t =I!( , . . . ,lp)'we have
(5.5.2)
(5.5.3) Since t'N-'t
> 0, we consider
or
(5.5.4)
5.5. I1fSE Matrix Analysis and Efficiency
235
Hence, 6, pcrforms better than 6 , if 0 5 A2 5 p - 1; otherwise, better than 8,. The efficiency of 6, relative to 6 , is then given by
-
6,
performs
la2N-1 jl/P
[$lpl;+ 66’1’/p (5.5.5)
MRE(6,;6,) is meaningless. Thus, But, 111 1 ‘ N + 0-~66’Nl = 0, P p_ p MRE(8,;8,) is not the suitable expression for it. We must depend on
RRE(~,;6,).
5.5.2
,.PT Relative to 8,
Comparison of 8,
and
8,
The MSE matrix difference is given by . PT
~ ~ ( )8- ~,
( 6 ,=)- a 2 ( ~ ~ - 1 ) ~ p + l , , ( e , ; ~ 2 ) +66’{2Gp+ip(&;A2) - Gp+3,,(t:; A’)}
(5.5.6)
. PT
It can be shown by the method in Section 5.5.1 that On performs better than 6, if
otherwise,
6,
,.PT. Similarly, we can show that 6,
performs better than On ., PT
forms better than 8,
per-
if
..PT
Otherwise, 8, performs better than en.Thus, none of the estimators dominate the other. ,.PT Now, the MSE-based efficiency of 8, relative t o 6 , and 6, is given respectively by
(5.5.9)
t +
236
Chapter 5. ANOVA Model
Table 5.5.1 Maximum and Minimum Guaranteed Efficiencies (15,6) (158) (15710 1 (15,121 (1574) a\(% P1 7.5774 10.0840 12.2909 14.4987 17.0014 0.05 E' 1.0528 Eo 0.7986 0.8450 0.8661 0.8633 Ao 17.7443 31.0244 53.0571 98.2937 205.7908 4.2183 5.4015 6.4131 0.10 E' 8.5414 7.4080 0.7494 Eo 0.7004 0.7535 0.7703 0.7584 Ao 15.0004 25.7483 43.9494 85.8566 212.908 1 3.0422 3.7906 4.4183 5.0280 0.15 E' 5.7225 0.8624 Eo 0.8653 0.8999 0.9117 0.9054 58.4570 169.2741 Ao 11.6013 0.20 E" 2.4357 4.3139 3.8323 0.8808 Eo 0.8856 0.9146 0.9248 0.9197 Ao 10.2014 17.0557 27.2477 47.9643 133.8645 2.0642 2.4680 2.7979 3.1126 0.25 E' 3.4696 0.8974 Eo 0.9022 0.9265 0.9353 0.9311 9.1734 40.7270 106.7507 A, 0.30 E* 1.8130 2.6317 2.9074 0.9116 0.9406 Eo 0.9165 35.3589 86.3478 Ao 8.3718 1.6321 0.35 E' 2.5065 2.2879 0.9489 0.9238 Eo 0.9289 1.4961 0.40 E' 2.2065 2.0300 0.9346 0.9565 Eo 0.9399 27.7770 59.8843 Ao 7.1811 0.45 E* 1.3906 1.9738 1.8297 0.9627 0.9442 Eo 0.9497 24.9568 51.034 1 Ao 6.7219 0.50 En 1.3069 1.6698 1.7883 0.9685 0.9528 Eo 0.9584 22.5630 43.9497 Ao 6.3262
Since there exists an orthogonal matrix I' such that I'HI'' = Diag(1, 1, . . . , 1 , 0 ) and r66"I"Diag(A2,0.. . , O ) , we write (5.5.10) as =
{ 1 - GP+1,,(P,; Az)}-p-2'p [1 - G p + i , m ( LA') ;
X
+ A2{2,+i,,(&;
A2)
-Gp+3,m(P:,;A2111
(5.5.10)
Note that under Ho, PT MRE(B, A ) = {1-Gp+1,&;0)}
--1+O/P)
APT
( 2 11,
-
and under the alternatives, that is, A' > 0, MRE(6, ;tin) is decreasing function of A' for fixed cy until A' = A;,,. Then it increases toward 1 as
A'
+ 00.
5.5. MSE Matrix Analysis and Efficiency
237
To determine optimum level of significance a to obtain a minimum guaranteed efficiency Eo, we solve the equation for a: APT
minnz MRE(8,
-
;6,) = E ( a ,A$,(.))
= Eo.
(5.5.11)
Some tabular values are given in Table 5.5.1 for various p and n values.
5.5.3
Comparison of
6,
First, consider the comparison of difference is given by
-s
AS
S+
and 8, (8, and 6, )
6,
-s
and 8,. In this case, the MSE-matrix
(5.5.12) -S
For 6 , to dominate 6, based on MSE matrices, we must show that (5.5.13) is nonnegative definite. Thus, consider the quadratic form
02C’(HN-’C) {2E[xi;l(A2)] -(P
- (p-3)E[x;;i(A2)])
+ l)e’(ss’)eEIxi~3(A2)1
for a given nonzero vector C = maximizing over all t , we have
(tl,
(5.5.13)
. . . ,lP)’. Dividing by C’N-’l(> 0), and
(5.5.14)
-S
which does not always hold. Hence, 8, does not dominate 6, uniformly.
Chapter 5. ANOVA Model
238 -S
The MSE-efficiency of 6 , compared to 6 , is given by
-s Under Ho, the MRE (6,;6,) becomes
-s -
and as A2 -+ 00, MRE(6,; 6,) -+1. S+ -S In order to compare 6 , and 6,, we note from (5.3.6e) that
-
M4 (6;)
- M5
(6,”’)
The r.h.s. of (5.5.18) is positive-semidefinite, since
- S+
Thus, the MSE-based efficiency of 6 ,
-s
relative t o 6 , is given by
5.5. MSE Matrix Analysis and Efficiency
J51
--f
1 from above. -S . S+ The MSE-efficiency of 8, compared to 8, may be written as
239
Chapter 5. ANOVA Model
240
-s+ -s
It may be shown that under Ho, the MRE (8, ; O n ) 2 1, and as A2 -s+ -s MRE (8, ;en)tends to 1 from above.
5.6
-+ co,
Improving the PTE . PT
As in Chapter 4, we consider the improvement of the PTE 8, of 8 for p We recall that the P T E of 8 is given by PT
based on 6, and given by
n ) G I Fp-l,m(cy))
0,
= 3, - (8, - 6
6,.
Now, for p 2 4, we have the Stein-type estimator of 6
APT+
Thus, for p 2 4, we may consider the estimator 8, 6, via the test on 6 = 001,. We may write APT+
8,
2 4.
.
= & l ( L< F p - l , m ( Q ) )
(5.6.1)
-S
by combining 8, and
-S
+ 6,I(L,
2 -s - s
Fp-l,n(Q))
< F p - l , m ( Q ) ) + 0, - 6 , I ( L < Fp-l,m(cx)) -s = 8, - (6, - G n ) l ( L < F p - l , m ( @ ) ) =6,I(L
-s A
PT
= 8,
- c(8, - ~,)c;'I(L2
F ~ - ~ , ~ ( ~ ) ) .
,.PT+ are then given
The bias vector, b6(6cT+)and the quadratic bias of 8, by
(5.6.3)
5.6. Improving the PTE
241
respectively. Note that (5.6.5) and (5.3.lf) are similar except for t,. The corresponding MSE-matrix and the risk expressions are given by
M6 (6ET+)= h’f4 (6:)
,.PT+
PT
respectively. The percentage improvements of 6 , over 6 , under the null hypothesis Ho for cr = 0.05 and 0.10 are given in Table 5.6.1.
-s
Similarly, we can make an improvement on 6 , by using the Stein-typeestimator of the variance 0 2 ,as in Section 4.5.6 of Chapter 4. We write estimator of 6 as -IS
.,
6, = 6,
+ { 1 - c ( m + 2)LG145s(L,)}(6, a,),
(5.6.8)
-
where
-S
This estimator improves over 6 , uniformly. The risk function of by
R7 (6; : o-~I,)= R4 (6; : 0-~1,) -
+
-IS en
( P - 3)2 [ m ( p 2 m (p+m+1)2 m+2
is given
+ 3)
Chapter 5. ANOVA Model
242
Table 5.6.1 Tables for Percentage Improvement of
enPT+ over O PT n ~
a
2
18
20
22
24
26
28
30
32
0.05
4 5 6 7 8 9 10 11 12 13
19.62 36.20 46.82 52.87 56.14 57.82 58.52 58.49 57.74 56.08
20.16 37.29 48.27 54.54 57.97 59.86 60.90 61.39 61.41 60.92
20.59 38.15 49.39 55.78 59.28 61.29 62.52 63.30 63.74 63.83
20.95 38.85 50.28 56.74 60.28 62.34 63.39 64.66 65.37 65.82
21.25 39.43 51.01 57.51 61.06 63.15 64.59 65.70 66.59 67.82
21.50 39.91 51.61 58.13 61.68 63.80 65.29 66.51 67.54 68.40
21.72 40.32 52.11 58.65 62.20 64.32 65.87 67.16 68.30 69.29
21.91 40.68 52.54 59.09 62.63 64.76 66.34 67.70 68.90 70.01
a
2
18
20
22
24
26
28
30
32
0.10
4 5 6 7 8 9 10 11 12 13
22.59 38.94 48.39 53.40 56.02 57.40 58.01 58.00 57.32 55.79
23.12 39.94 49.67 54.85 57.64 59.27 60.26 60.80 60.89 60.48
23.54 40.73 50.66 55.94 58.83 60.60 61.83 62.70 63.25 63.43
23.88 41.37 51.45 56.79 59.73 61.61 63.00 64.09 64.93 65.48
24.17 41.89 52.09 57.47 60.45 62.40 63.90 65.16 66.19 66.99
24.42 42.33 52.62 58.03 61.03 63.03 64.62 65.99 67.18 68.15
24.63 42.71 53.07 58.49 61.51 63.55 65.21 66.67 67.97 69.07
24.81 43.03 53.45 58.89 61.92 63.98 65.69 67.24 68.62 69.82
P
P
Source: Bashir Khan Ph.D. Thesis, 1997
ANOVA Model: Nonnormal Errors
5.7
In this section, once again, we consider the ANOVA model
- nr='=, n;:,
Y=BB+E as in (5.1.1) and (5.1.2) except that E unknown but satisfies the conditions (i)
(5.7.1)
F ( E ~where ~ ) the cdf F is
E ( E )= 0 and (ii) E ( E E '= ) 021,, (0 < u2 < 00);
lim (A3,)
n-cc
= lim n-cc
lim (A,) = A
n+cc
(2) x j
and
=
J,
(0 < Aj
= I, - l,l&.
< l),
(5.7.2)
(5.7.3)
(5.7.4)
Further, lim J, = J = I, - l,l$
n-co
(5.7.5)
5.7. ANOVA Model: Nonnormal Errors
243
5.7.1 Estimation and Test of Hypothesis
We define unrestricted and restricted estimator of 6 as before, as - the 6, = (yl,.. . ,Yp)'and 6 , = tlPlLN6,, respectively. Also, the unbiased estimator of u2 is defined as P
ni
(5.7.6)
and s: is a consistent estimator of 0'. In order to test the null hypothesis HO : 6 = Ool,, we consider the statistic -I
L,
=
n6, Jkh,3,8,
(5.7.7)
s:
Under Ho, L, is distributed approximately as a central chi-square variable with ( p - 1) d.f. On the other hand, under fixed alternatives
Hd : 6 = 601,
+ 6,
(5.7.8)
we can write
and
Now, by the standard central limit theorem together with the Slusky's theorem, we have
(5.7.10) It follows that L, = ns;'l18, - 6n112+ 00 as n fixed alternatives H 6 ,
Pa{& > K }
-+ 1
for all
--f
co. Consequently, for all
K.
(5.7.11)
Therefore, to obtain a meaningful asymptotic distribution of L, as n we consider a class of local alternatives {IS(,)} defined by
q,): e(,) = eoip+ n-1/26.
+ 00,
(5.7.12)
Then we have the following theorem: Theorem 1. Under {K(,)} and the assumed regularity conditions, we obtain the following as n 4 co:
Chapter 5. ANOVA Model
244
(5.7.13) (v) C, is asymptotically distributed as a noncentral chi-square with p - 1 d.f. and noncentral parameter A2/2, A2 = C T - ~ ( ~ J ' A J= S )S*J'AJ6*. The proof follows by defining r2 = A:l2Ip, then r1J?i= I, - Ak121pI;A:12.
Preliminary Test and Stein-Type Estimators
5.7.2
We combine the unrestricted and the restricted estimators of 8 t o obtain the PTE and Stein-type estimators as follows: (i)
,.PT
e,
where (ii)
-s
= 6, -
(6, - &,)I(C, < c,,~),
(5.7.15a)
is the upper a-level critical value of C, for finite samples,
-
en = e,
(5.7.1513)
- c*(Bn - 6 , ) ~ ; 1 , c* = (p--J)m m+2 1
and
are the usual Stein-rule estimator and positive-rule Stein-estimator, respectively, since C, # 0 with probability one.
5.8
ADB, ADQB, ADMSE, and ADQR of the Estimators
In this section, we present the asymptotic distributional bias (ADB), quadratic bias (ADQB), MSE-matrices (ADMSE), and the quadratic risks (ADQR) of the estimators of 8. Accordingly, consider the estimator O;, and let W be a p.s.d. matrix and the quadratic loss function
qe;, e) = n(e:,- e)'w(e; - e) = tr[W{n(e; - e)(e; - B)'}]. (5.8.1) Let the ADM be M(8;) = E{71(0; - e)(e: e)'}. Then the ADQR -
of 0; is given by
R(0;;W) = tr[WM(e;)].
(5.8.2)
5.8. A D B , ADQB, ADMSE, and A D Q R of the Estimators
245
(0; -0) is equivalent In many situations, the asymptotic distribution of fis;' to f i u - ' ( 6 , - 0) under fixed alternatives 8 = 1901, 6, and thus t o obtain appropriate asymptotic distribution of 6s;' (0; - e),we consider the class of local alternatives
+
q,): qn)= oOi, + n-1/26
(5.8.3)
- @(,I) as
and define the asymptotic cdf of fis;l(e;
G;(x) = n-m lim P { &is;'(e;
-
t9(,)) 5 x/K(,)}
(5.8.4)
if it exists. Then the ABD and ADQB are defined by
b(8;)= J x d ~ ; ( x ) ,
(5.8.5)
5.8.1 Asymptotic Distribution of the Estimators under Fixed Alternatives First, note that under 0 = 001,
&is;l(en
+ 6, -
e) N,(o,A-~).
(5.8.9)
Now, consider the quadratic difference -2
ns,
APT
jp,
-
en112=
~S;~II(& -
= L,I(LTL
~ , ) I I ~ I ( L<, L,,~)
< L,,)5 LlL,J(Ln < Ln,*). (5.8.10)
Thus, by the consistency of the test based on L,,
E{L,I(L, < L,,,)} Hence, as n
--f
00,
4
0 as TI
-+ m.
(5.8.11)
Chapter 5. ANOVA Model
246 Similarly, consider the quadratic difference -S
ns;2110, - 8,112 = ns,211(8n - 8n)1/2c*2~;2 = c*'L;'. P
Since C, 2 0 and Ln-+oo as n C* -+ ( p - 3). Hence,
-s
---f
03,
- e) =
&is;l(e,
we have E[L;']
-+
(5.8.13)
0 as n -+ oc) while
& i d @ , - 0) + op(i).
(5.8.14)
Finally, we show that
,.hS;l(e,- S+ - e) = f i d ( 6 ,
as n
-+
03.
-
e) + op(i)
(5.8.15)
In this case,
Hence, (5.8.17) and
,.S+ - en)= &&(en
&iS;l(en
-
e) + op(i).
(5.8.18)
We conclude that under fixed alternatives 8 = 601, + 6 , asymptotic distribuAPT
,. S+
tion of J;Is;l(e, - e), ,/~s;1(8,S - el, a nd &s;' (6, - 8) is the same normal distribution N,(O, A-'). Hence, the ADB, ADQB, AMSE, and ADQR of the estimators are the same as that of fis;'(8, - O), while the asymptotic distribution of fisT'(8, - 8) is degenerate.
5.8.2
Asymptotic Distribution of the Estimators under Local Alternatives
In this section, we consider the local alternatives {K(,)} given at (5.8.3) and Theorem 5.7.1 to obtain the ADB, ADQB, ADM, and ADQR of various estimators of 8. Clearly,
5.8. ADB, ADQB, ADMSE, and ADQR of the Estimators (i) bl(8,) = 0 and Bl(6,)
247
=0
and (ii) b2(bn)= 6* and
(iii)
&(en) = A2
(6* = a-ld).
(5.8.19)
From Theorem 5.7.1, we write
z(X where (since
+ 6*)
-
+
(Y
+ J6*)1(/IY+ J6*112< X;-~(CY)),
(5.8.20b)
X ,2- ~ ( C Y ) ) as n 4 co),
(5.8.21) so we arrive a t the theorem
Theorem 1. Under {K(,)} and assumed regularity conditions, the asymptotic distribution of fis;'(O,
-PT
- 001,) is given by
G , P T ( X ) = @ p b ; 0,B)Hp-l(x;-l(4;
A2)
+ / . - - / C P p ( ~ - Y ; O , B ) d @ , ( Y JA-l), ;O,
(5.8.22)
n6') where
r(S*)= { z : (Y + J6*)'(Y+ JS*) 2 x ~ - ~ ( ( Y ) } ,
(5.8.23)
CPP(x;C ) cdf of a p-variate normal distribution with mean vector p and covariance matrix,
x.
Similarly, we write
-S fiS;l(en - 801,)
as
3 (X + S*)- ( p - 3)(Y + JS*){IIY+ JS*11-2}.(5.8.24) The r.h.s. of (5.8.24) is the asymptotic representation of f i ~ L l ( 6 -~ 001,).
A similar representation of fis;'(e, following such simple steps as
A
V
+
S+
- 001,) can be easily obtained by
(X+ a*) - ( p - 3)(Y + JG*)ljY+ J6*1)2}1(IIY+ J6*1I2> p - 3) (5.8.25) - (Y + JG*)I(JIY + J6*1I2 < p - 3).
Chapter 5. ANOVA Model
248
5.8.3 ADB, ADQB, MSE-Matrices, and ADQR of 6; -S ,.S+ 8, and 8, . PT
First, we consider 8, . In this case, we note that
&sz1
(eg- eoi,) 3 u1= x +6*
-
(Y + J S * ) I ( ~ ~+YJ S * I /<~x;-l(a)). (5.8.26)
Then
b3(6?)
PT
-601;) -s,’6}
= n+cc lim
E{&s;’(6,
= E(X
+ 6*)- J6*HP+1(&1(a);A2)
-
6’
( a ) A2). ;
= -JJ*Hp+l
(5.8.27)
Thus,
B3(6np‘) = A2 {Hp+i(x;-l(a);A2}2.
(5.8.28)
- PT may be obtained as
The MSE-matrix for 8,
M3 (6;)
= E(U1U;) = A-’ - (JA-’)HP+1(x;-1(a); X
A2)+ (J6*6*’J’)
{2Hp+1(X;-i(a);A2) - Hp+3(X;-1(a);A2)},
(5.8.29)
and the quadratic risk as
A2) + (S*’J’WJS*)
R 3 ( h r : W ) = tr(A-‘W) - tr(JA-lW)Hp+l($-l(a); X
{2Hp+l(X;-l(a); A2) - H p + 3 ( & 1 ( a ) ;
A’)} .
(5.8.30)
-S
Next, we consider 8,. In this case, fis;’
Thus,
(6: - qnIip) 3 uz= (x+ 6’) - ( p - 3
+
) ( ~J ~ * ) ~+IJY S*IJ-~.
-S
E(U2) = b4 (8,) = - ( p - 3)J6*E [ x ~ : ~ ( A ~ ) ] B4
(6:)
The expressions for M4
(6:)
= - ( p - 3)2A2{ E M4
= E(U2I.J;)
(*”I
[x;?,(A2)]}”
(5.8.31a) (5.8.31b)
8, and R4 8,; W are computed as
(*” )
= A-1 - ( ~ - 3 ) ( J n -( 2 ’ )E [x,s”1(A2)] - ( p - 3 ) E [x;:,(A2)])
+ (P- 3 ) b +
1)(JJ*6*’J’)E [x,-,”3(A2)] ,
(5.8.32)
5.8. ADB, ADQB, ADhilSE, and ADQR of the Estimators
249
and R4
(6:; w)
= tr(A-’W) - ( p - 3)tr(JA-’W)
( 2 E [X,-,”~(A’)]- ( p - 3 ) E [X;:~(A’)]}
+ ( p - 3 ) ( p + l)(S*J’WJG*)E[X;:~(A’)]
(5.8.33)
respectively. S+
The corresponding expressions for 8,
(6:’)
b5
=
are given by
-J6*{ (P- 3 ) E [xp:1(A2)I (x;:1(A2)
> p - 3)] ( 5.8.34)
-l Hp+l(P - 3; A’)},
M5 (6B’)
(6:)
= M4
- (JA-l)E
[(I - ( p - ~)X,;”~(A’))’ I (xg+l(A2)< p - 3)]
+ (J6*6*’Jr){2E [ ( ( p- 3)X;j1(A2) -
E [((P- 3)x;&(A2)
and the ADQR of
6:’
- 1) I (x:+l(A2)
- ~ ) ’ I ( x & ~ ( A<’ )P - 3)]
-
}
3)] (5.8.36)
is given by
R5 (6:’; W) = R4
(6;; W) - t r ( J A - l W ) E [ (1 - ( p - ~)X;:~(A’))’
x I (x;+l(A2)
< P - 3)]
+ (&*’J’wJa*){ [(P- 3)~;:~(A’) -
E
[(b- 3)x;:3(A2)
- 1)I(X;+~ (A’)
- 1)’ I (x:+3(A2)
- 3)]
- 3)]
.
(5.8.37)
It can be shown by risk-difference analysis that
RI (6,; W) 2
R4
(6;;
a,,
W) 2 R5
(6;’;
- s -s+
W)
(5.8.38) . PT
for all A2, and none of the estimators 8,, 8, , and 6, dominate 8, uniformly. This is due to the size of the preliminary test of Ho and unboundedness of the risk of 6,.
Chapter 5. ANOVA Model
250
5.9
Confidence Set Estimation
In this section, we consider the confidence set estimation of the treatment means 8 in the model (5.1.1) and (5.1.2)
Y = Be + E ,
E
-
N(0,a21,),
(5.9.1)
where B is the block-diagonal matrix of the vectors l, , . . . , lnpand a' is known. We have defined several estimators of 8 repeated here again: -
(B'B)-'B'Y = (TI,.. . , Y,)'.
(i)
6,
=
(ii)
6,
= ;lplLN6n
(iii)
8,
(iv)
-s 8, = 8,
(v)
8,
APT
-s+
when 8 = 1901,.
-
I ( L n <xt(a)), q = p - 1.
= 8,-
-
(-
^> L E I , o <
c 8, - 8,
-s
= e, - (1 - C L ; ~ ) I ( L ,
c
< 2(p - 3 ) .
< ). (6, - en),
where
L, Since 6,
N
= u-2
(~,HINH~,).
(5.9.2)
Np(8,a2N-'), the (1 - -y)% confidence set Co((e)is defined as (5.9.3)
where $(y) is chosen such that PO{$ 5 xg(y)} = 1 - y. Thus, CO(6,) has coverage probability 1-7. The set CO(8,) is minimax in the sense that among all the confidence sets with coverage probability, at least 1 - y. Also, Co(6,) minimizes the maximum volume. A s we saw in Chapter 4, there is a scope to improve upon CO(6,) for p 2 4 in the sense that (1) Pe{8 E C:(8:)} 2 Pe(8 E Co(6,)) and (2) The volume of C*(8:) 5 volume of CO(8,) with strict inequality holding in either case for a set of positive Lebesgue measure of 8 or Y , respectively. In this section, we first consider the confidence sets that are centered at the estimators of the form
e:, = ~ 6 +, (I - A ) & ~ ( L , ) ,
(5.9.4)
where (I - A) is an idempotent matrix of rank p - 1 and g ( L , ) is a nondecreasing function of the test-statistic L, = O-~(~;H'NH&).Particularly, we take A = i1,lLN and then H = I, - A.
5.9. Confidence Set Estimation
251
(i) If g(L,) = 1, then 6: = 8,.
(5.9.5)
(ii) If g(L,) = 0, then 6; = 6,. PT
(iii) If g(L,) = I ( L , > x:(a)), then 6: = 6 , A
( q = p - 1).
-*
-s
(iv) If g(C,) = (1 - c L i l ) ; 0 < c < 2(p - 3), then 8, = 6,. . S+
(v) If g(L,) = (l-cLil)I(L, > c ) , then 6; = On ( c ) ,which is the Lindley (1962) estimator of 6 . The estimator 0; shrinks toward the linear subspace defined by 6 = &l,.
5.9.1
Confidence Sets and Coverage Probabilities
We consider the class of confidence sets given by
c*(e;) = ( 6 ; a - 2 ~~6 6;iiL I x;(Y)}.
(5.9.6)
More specifically, we consider the following confidence sets: (i) CO(8,) = ( 6 : 0-~116- 6,I&
< $(y)}.
PT
(iii) C P T ( 6 , ( a ) )= ( 6 : 0-~116- 6,"(a)IIk
< xX(y)}.
-S
(iv) Cs(6:) = ( 6 : o-2)16 - 6,llL < $(r)}.
().
-
S+
CS+(~:+(~ =)()6 : a - 2 1 p - 6 ,
( c ) ~ <~ kx;(y)}.
(5.9.7)
Note that N-1/2(I- A)'N(I - A)N-'/', A = :l,lbN is a symmetric idempotent matrix of rank ( p - 1). An orthogonal matrix, = (rl,r2) exists such that
rN-1/2(1 - A)'N(I - A)N-'l2r' =
(
0
0
(see Section 5.1.3). Defining w1 = $I'iN1/28, and w2 = $I'iN'/28, we have
(5.9.8)
Chapter 5. ANOVA Model
252
As a result we write
lie e;I&
o - ~ -
=
(v2 -
2
+
.
(5.9.9)
dHl(t>O).
(5.9.10)
~ a ) ~llril - ~ g ( i i w 1 / 1 ~ ) 1 1
Thus we can write the coverage probability of (5.9.9) as pq
{ [(72 -
=f
( 7 )
W2I2
pq
+ 1171 - ~ 1 ~ ( 1 1 ~ 1 1 1 2 ) 151 x;(Y)} 2]
{ 1/71- wl~(llwl/12))lj2I(Xi(?)
- t)’}
where H1 ( t ;0) is the cdf of a central chi-squared distribution with one d.f. That is to say (i) If g(11w11I2) = 1, we have
Po { (712 - W 2 Y = Po
+ 1171 - 412 < x&))
(xi 5 x;cr,>
= Hp(x;(Y);o) = 1 - Y.
(5.9.11)
253
5.9. Confidence Set Estimation
5.9.2
Analysis of the Confidence Sets
We first note that P(C"(e,)) = 1-7, which is constant for all A'. Next, we see that
P
(c" (6.))
= H1(Xi(Y) - A2;0)
1
(5.9.16)
which is a decreasing function of A2 with a maximum H I (x: ( a ): 0) at A' = 0 and zero when A2 -+ x;(y). The coverage probabilities of C 0 ( e n )and C R(6,) are equal when A2 = A:, which is given by
A; = xi(y) - Hcl(l
-
y).
(5.9.17)
Now, consider the confidence set CPT(O;'(a)). In this case,
{
p CPT (
634))
+h
Pv, (11771
X;(-i)
= Hl(X;(Y)
-
- A2;o)HQ(x:(a); A2)
will2 5 (Xi(Y) - t ) + ;IIw11I2> Xi(Q>} dHl(t,O), (5.9.18)
and we have the following theorem: Theorem 1.
(4If A2 5 X;(Y), P
(cP*
(6?(a))) 2 1 - y,
(5.9.19)
(5.9.20)
254
The coverage probability (5.9.18) drops to
As a result, we have
Chapter 5. ANOVA Model
5.9. Confidence Set Estimation
255
S+
Next, we consider the coverage probability of CSf(6,
P
{c~+(et+(~))} = H~(X;(?)
+
1
X.37)
( c ) ) .In this case,
- AZ;O)H,(C; ~ 2 )
pq, { 11711 - Wl(1 - CIIWl 11-2)112
F X;(Y)
- t; IlWl
112 > c } d H l ( t ,0). (5.9.23)
Again, we note that (5.9.23) is always greater than or equal to 1 - y. To show this, we write
Jnx { 5q cll~lll-2)112 l F (xi($ 11711 - Wl(1 -
- t)+;IlW11l2
> c} dH1(t;O)
2 (1 - 7 ) - Hl(XE(7) - A2;O)H,(X:(4;A2).
(5.9.24)
From Chapter 4 and Hwang and Casella (1984), the condition for dominance of confidence set based on the positive-rule Stein estimator over the unrestricted estimator is that
(5.9.25a) and G(2)(c;x;(Y))
=[
xP(y) +
Jm] p-2
fi
-XP(Y)fi
[Jix;40+4c) ] - xP(y)
2&
21
(5.9.25b) for p 2 3. Note that Z1 Np-~(q1,Ip-1), so it is sufficient t o establish that G(l)(c,b) 2 1 and G@)(c,b)2 1 for 0 < c < co and 0 < b2 I xE(y). Let us prove G ( ' ) ( c , b ) 2 1 for 0 < b2 < x;(y). Note that for every value of b, @')(c,b) is decreasing in c. Hence, it is sufficient t o establish that there exists a c* such that G(')(c*,b) 2 1 and @ ) ( c * , b ) = 1. Also, G(l)(c,b) is strictly decreasing in b. Hence, @')(c*, b) either (1) strictly decreases to zero in b or (2) strictly increases to a unique maximum and then decreases to zero. Since @')(c*, 0) = G(')(c*; xp(y)) = 1, then case 1 is not true, and in the case
-
Chapter 5. ANOVA Model
256
2 we get B(')(c*,O) > 1 for 0 < b2 < x;(y). The proof with ($')(c,~,(y)) is similar. In our case, c* = p - 3, and computational results show that there is no significant difference in coverage probabilities. Now,
-mX;(Y)
-A ';
o ) H q ( x ; ( 4 ;A2)
2 (1 - Y) - Hl(X;(Y) - A2;O)Hq(X;(a);A2).
(5.9.26)
Consider further the solution of (5.9.25a) and (5.9.25b) for equality to 1 and let co be the minimum of two solutions. Table 5.9.1 gives the values of co is given from Casella and Hwang (1987). Table 5.9.1 Values of co from (5.9.25a) and (5.9.25b) for p = 4(1)25 and y = 0.05 and 0.10
A2\v y = 0.10 y = 0.05 4 5 6 7 8 9 10 11 12 13 14
0.669 1.517 2.392 3.211 4.036 4.865 5.697 6.532 7.369 8.207 9.047
0.633 1.458 2.281 3,983 3.893 4.710 5.531 6.355 7.182 8.011 8.842
P -
15 16 17 18 19 20 21 22 23 24 25
y = 0.10 y = 0.05 9.888 9.675 10.731 10.510 11.574 11.345 12.418 12.182 13,263 13,020 14.109 13.859 14.955 14.699 15.802 15.539 16.650 16.381 17.498 17.223 18.346 18.065
The coverage probability of CSf ( 6 z + ( c ) ) is given in Table 5.9.2 for p 4(4)24 and y = 0.10.
=
5.9. Confidence Set Estimation
257
Table 5.9.2 Coverage Probabilities of the Set Cs+ with c = CO, y = 0.10
AZ\P 0 1 2 3 4 6 8 10 20 50 100
4 0.940 0.936 0.927 0.910 0.905 0.902 0.901 0.901 0.900 0.900 0.900
8 0.991 0.990 0.985 0.976 0.956 0.930 0.918 0.912 0.903 0.901 0.900
12 0.998 0.998 0.997 0.994 0.988 0.962 0.941 0.928 0.908 0.901 0.900
16 0.999 0.999 0.999 0.998 0.997 0.983 0.962 0.946 0.914 0.902 0.901
20 0.999 0.999 0.999 0.999 0.999 0.994 0.978 0.961 0.920 0.904 0.901
(~E+(c))
24 0.999 0.999 0.999 0.999 0.999 0.998 0.989 0.974 0.927 0.905 0.901
Table 5.9.3: Coverage Probabilities of the Set CS+ ( h : ' ( c ) ) with co = p - 3, y = 0.10
A2\P 0 1 2 3 4 6 8 10 20 50 100
4 0.952 0.948 0.936 0.911 0.905 0.902 0.901 0.901 0.900 0.900 0.900
8 0.995 0.994 0.990 0.982 0.958 0.931 0.919 0.912 0.903 0.901 0.900
12 0.999 0.999 0.999 0.997 0.991 0.963 0.942 0.929 0.908 0.901 0.900
16 0.999 0.999 0.999 0.998 0.998 0.985 0.964 0.947 0.914 0.902 0.901
20 0.999 0.999 0.999 0.999 0.999 0.995 0.980 0.963 0.921 0.904 0.901
24 0.999 0.999 0.999 0.999 0.999 0.998 0.990 0.978 0.928 0.905 0.901
The value co is almost near p - 3, which is our choice, in the Stein-rule estimation. Table 5.9.3 shows that by this choice, co = ( p - 3). There is not much difference in coverage probability. (Tables 5.9.1 through 5.9.3 are due to Casella and Hwang (1987) reproduced with permission from Elsevier). Thus, from the analysis of the coverage probabilities, we conclude that under Ho,
depending on the size of the test given by x:(a). This may give rise to the ordering CR (6.)
2 cs+(if+)
(^"+I
2 CPT
(er) co 2
(6,).
(5.9.28)
As A2 diverts from the origin, the dominance ordering changes drastically, leaving room for only Cs+ 8, , which dominates Co O n uniformly. Thus,
(- 1
Chapter 5. ANOVA Model
258 for everyday use, we consider Cs+ coverage probability is a t least 1 - y.
5.10
which gives the guarantee that the
Asymptotic Theory of Confidence Set Estimation
In this section, we develop the asymptotic theory of confidence sets for the treatment means. We recall the conditions on the ANOVA model (5.7.1) as given by (5.7.2) through (5.7.5). The confidence sets corresponding to the (1 - y)% confidence coefficient as given below.
Co (6,) = ( 6 : nsi21j6 - 8,ljL
1
5 ny ,
(5.10.1a)
where P (C'(6,)) = 1 - y for niy= x;(y),
CR
Pn)
= ( 6 :n
CPT
~ ; ~ [-[6,lli, 6
py(a)) { 6 =
,.PT
- 6,
:
cs (6;) = ( e : n s ; 2 p
-
1
-
1 - y.
(5.10.1~)
5 K.}
(cy)I1Ln
(5.10. Id)
ij;+(c)llirL 5 .}.
We will study the properties of these five (1 - y)%
4
=
(5.10.1b)
B , S I I ~ ,5 n7}
CS+ (ez+(c))= {e : ns;21p n
since P(xE 5 xg(y))
(5.10.1e)
confidence sets as
00.
5.10.1 Asymptotic Representations of Normalized
Estimators under Fixed Alternatives
From Section 5.8.1 we know that under fixed alternatives 6 = 001, -6
) = o - 1 A 1 / 2 (-6 , - 6 ) +
6 s ; An
( (-3 n
fis,'ALf2
(bs - 6 ) = o
and 6 s :
1
OP(1)t
(
A,,112 ijs+ , - 6 ) = o-'A112
As in section 4.8.1 the asymptotic distribution of &-~s;'A:'~(e~
+ 6: (5.10.2a)
(5.10 . 2 ~ )
(5.10.2d)
- 6 ) degener-
ates. Hence, P (Co(6,)) , P ( CPT(6,"(.y))), P (Cs (8,)) and P(Cs+(6:' (c))) tend to 1 - y as n -+ co under fixed alternatives.
5.10. Asymptotic Theory of Confidence Set Estimation
259
5.10.2 Asymptotic Coverage Probability of the Confidence Sets under Local Alternatives In order to obtain meaningful asymptotic coverage probabilities of the confidence sets in (5.10.1a-e), we consider the local alternatives K ( n ) : 6(n)= 80lP+n-'f26 as given by (5.7.12). Accordingly, we start with the unrestricted estimator based confidence set CO(8,). In this case,
fis,'Akf2
(en -
6(,))
-
Z
-n/p(O,Ip).
Hence,
(-en1. In this case, lim pK(,,{ n s ~ ~ 1 1 6 ( , ) anlll- 5
which is constant for all A'. Next, we proceed to the set C R n-w
= Pql{ (772 - W 2 Y
K-,}
+ 11171112 5 x2,(r))
= Hl(X2,(7)- A2;0)
(5.10.4)
by similar transformations as in Section 5.9.1 and equation (5.9.12). We note that the asymptotic coverage probability is the same as in the finite case and analysis similar to that of Section 5.9.2. Accordingly, n-ce lim pK(,,)
PT
{ns,zl/o(n) - on *
iih, 5
K.}
= Hl(XE(Y) - A2; 0 ) H q ( x , 2 ( 4A2) ;
(5.10.5) In the case positive-rule Stein-estimator, we obtain
x d H , ( t ; 01, which is the same as (5.9.15). Hence, Cs+ all A2.
(5.10.6)
(a:+(,))
dominates Co (8,) for
Chapter 5. ANOVA Model
260
5.11
Nonparametric Methods: R-Estimation
In this section, we consider the nonparametric methods of estimation of several location parameters 8 = (61,62, . . . , OP)’ when it is suspected but not certain that 6 = &lP, 1, = (1 ... 1)’ is a p t u p l e of ones and 60 is a scalar. This methodology broadens the scope of applications of the five estimators defined in earlier sections.
5.11.1
Model, Assumptions, and Linear Rank Statistics
(LW Consider the ANOVA model (5.1.1) in explicit form, as given below where S, denotes the ith sample:
S i : Y , = 6 , 1 n , + ~ z ,i = l ,
...,p,
(5.11.1)
- a$).
(5.11.2)
where
with the cdf given by ni
j=1
and J’(€i,.
. . ,~
nn P
p
=)
n,
Fo(K,
z=1,=1
(2,.
+ +
Further, let D, = Diag . . , %) and n = n1 . . . np. (5.11.3) It is assumed that { E ~ , } are mutually independent and identically distributed with the cdf F ( E ~. .,. , E ~ defined ) by (5.11.2), where Fo(.) belongs t o the class 3 of absolutely continuous c.d.f. with absolutely continuous p.d.f. fo(-) having finite “Fisher’s information”, that is, (5.11.4)
(iii) Score function a,(.) is generated by a function d(u) that is nonconstant, nondecreasing, and square integrable and
5.11. Nonparametric h4ethods: R-Estimation
26 1
where Ukn is the order statistics with rank k in a random sample of size n from U ( 0 , l ) . Finally, let R,,(b,) be the rank of ( K J - b,) among {(XI - b,), . . . , (Y,,, - b,)li = 1 , . . . , p } , and set
j=1
i=l
(5.11.6) Denote by T,*(b,),the sum of Tnz(bi),
where
T,(b) = (Tnl ( b i ) , . . . ,Tn,(bp))'.
(5.11.7b)
As in Chapter 4, we define
and
rl
(5.11.8)
Further, let
~ 2 =%(nz - 1)C (an,(Rzj)12%
7
(5.11.9)
3=1
where R,, stands for R Z J ( 0and ) tin, = nL1C;:, a n , ( k ) . Then, by the basic theorems of Chapter 4 of Hajek and Sidak (1967) under 8 = 0,
-
- 1 / 2 ~ , ( ~ ) N~(o,A;A),
(5.11.10)
where A = limn-- D,. Note that T,,(b,) \ in b, and T,'(bo) \ in bo under the model (5.11.1) and under HO : 6 = 801, respectively. Also, under 8 = 0, TnZ(O) and T,*(O)are symmetric about 0. Thus, we define the unrestricted and the restricted estimators - of 6- as follows: Let On = (Bnl,. . . ,enp)' be the estimate of 8 = (61,. . . ,eP)'. Then the components of 6, are given by sup{b, : ~ ~ , ( b>, )0)
+ infib, : ~ , % ( b , <) 011,
i = 1,... ,p, (5.11.1la)
Chapter 5. ANOVA Model
262 and under 8 = BOl, we define
as the estimate of 60. Then 8, = 80,1p. Note that both 6 , and are translation-invariant robust and consistent estimators of 8 under the general model (5.11.1) and the restricted model with 8 = 601,. For the preliminary test of the hypothesis, HO : 8 = 60l,, we use the nonparametric test-statistic
an
13, = A i 2 [Tn(Gn)]'D,' [Tn(Gn)]',
an
=&nlp,
(5.11.12)
where n
(5.1I.13)
A: = ( n - l)-l C [ a , ( k ) - Znl2 k=l
and
T n ( 6 n )=
(Tn1 (&n),
. .. T n P(Cn))'.
(5.11.14)
Also, Theorem 3.1 of JureEkova (1969) implies that under 8 = 0 and for any finite k (> 0), sup{n-1/21Tn~(n-'/26,) -T,*(O)
+ X,n'/2bzX($,4)1
: lbzl < k }
50 (5.11.15a)
so that
s ~ p { ~ - ' / ~ I T ~ ( n - ~ / ~ b ) - T ~( e(z =A O 1 z)b+ z )ny'(/$~, 4 ) / : Ibl < k } 5 0 , (5.11.15b) where Ibl = max(lb11,. . . , Ibpl). Thus, under 8 = 0 and relative compactness of n-1/260n,we obtain the relations P
(i) n-lW,*(o)y-'(+, 4) - n-1/2&n-+ (ii) n - l ~(Gon) ~ ,or
o 4))
( n - 1 / 2 ~ (0) ~ , - ~,n1/2ijo~y(+,
n - 1 1 2 ~(ion> ~ ~ - (n-'/2~,,(0)
5o
- x , ~ - ' / ~ T , * ( o )---f )P 0.
(5.1 1.16)
In vector forms we may write
n - 1 / 2 ~ n ( B-n (I, ) - ~ , ~ ; A ) ~ - ~ / ~ T ,Z( o0.)
(5.11.17)
Then, we may write
C,
= Ai2[T,(0)]'JA-1J'[T,(O)]
+ op(l),
(5.11.18)
where J = I, - l&A. Hence, under 8 = 0, as n -+ 00, C, approximately follows the central chi-square distribution with ( p - 1) d.f., since JA-'J' is an idempotent matrix with rank ( p - 1).
5.11. Nonparametric Methods: R-Estimation
263
5.11.2 Preliminary Test and Stein-Type Estimators We combine the u n r e s t r i c t e d and the r e s t r i c t e d estimators of 0 to obtain the preliminary test estimator and the Stein-type estimators as in Section 5.7.2, which are given by
- PT = a, - (8, - hn)1(cn< c,,~),
(i) 8,
(5.11.19)
where is the a-level critical value from the distribution of C, under No which converges to as n -+ 03. The Stein-type estimators are given bY
~i-~(cr)
-s
-
(ii) 8, = 8, - c(an - B,)c;'I(c, where
E,
-s+
(ii) 8,
(5.11.20)
and c = ( p - 3) and
-+0
as n
= 8,
+ (1 - CC;~)I(C,
-+ CQ
> E,)
> c)(&
-
6,).
(5.11.21)
as in Section 5.7.2. Note that (5.11.20) involves I ( & > en), since C, assumes the value 0 with positive probability.
5.11.3
Asymptotic Distributional Properties of R-Es t imat ors
First of all we know from the basic theorems of Chapter 4 of Hajek and Sidak (1967), that as R CQ, -+
n-1/2T,(0) and under fixed alternatives C,
..PT
--+
00
-
Np(O,A$A)
as n
-s
-+
(5.1 1.22)
m. Also, under fixed alternatives ., S f
- 0) are equivalent all the estimators f i ( 0 , - 0), fi(0, - 0) and fi(0, in distribution t o that of fi(0, - 0). Hence, we consider the class of local alternatives
K(,) : qn) = 0~1,+ n-1/2S.
(5.11.23)
It may be mentioned that this class of alternatives is contiguous to that of 0 = 601,. Thus, we have the following theorem: Theorem 1. Under {K(,)} and the assumed regularity conditions w.r.t. the model (5.11.1), as n + 00, we have
&(a, (ii) &(a, (i)
- 601,)
- N,(6,02A-');
- 6,)
N,(J6,02JA-l), J = (I, - I,l;A),
-
o2 = A $ / T ~ ( $q5), ,
Chapter 5. ANOVA Model
264
and = H p - l ( z ;A’), A’ = a-’ (6’J’AJ6). (vi) lim P { L , 5 x:JK(,)} a-Di)
(5.11.24)
Proof. Note that both 8, and 6, (or 8,) are translation-invariant estimators. Thus, without loss of generality, we may assume 8 = 0. Also, by the definition of the estimators (5.11.11a) and (5.11.10) and (5.11.15a, b) we have n’/’16, 61 = O p ( l ) ,while under {K(,)} n’/’(6’ - &lP) = 6 = O(1). Thus, n1/216nl (and n1/’/6,1) = Op(l).Observe also that under 8 = 0, n-l/’T,(O)
= An1/’8,y($,
4 ) + op(l).
(5.11.25)
Hence, utilizing contiguity under { K(,)} t o those under 8 = 0, we obtain the distribution of n’/’(6, - &lP) as N,(6, a2A-‘), 0’ = A”,/r’($, 4). Similarly, the asymptotic distribution of f i ( 6 , -8,) is N,(J6, a’JA-l) under {K(,)}, while under 8 = 0,
+
(Ae1 - l p l ~ ) n - l ~ ’ T , ( 0 =)&(en - en) oP(l).
(5.11.26)
Also, -
( l p l ~ ) ~ - l ’ ’ T n (=0n) ‘ / ’ ( 6 , - 601,) and under { K(,)},
nq8,
- 601,)
-
+ oP(l),
(5.11.27)
Np(0,a2B).
As for the asymptotic distribution of L,, as n 8=0as
-+ 00,
we consider 13, under
L, = Ai2[T,(0)]’J’A-1J[T,(0)] + op(l),
(5.11.28)
which follows a central chi-square with ( p - 1) d.f. Under {K(,)}, Ln is approximately distributed as a noncentral chi-square with ( p - 1) d.f. and noncentrality parameter $A2, A’ = Ai2--y2(+,4)(6’J’hJ6). As a consequence of Theorem 1, we obtain ADB, ADQB, ADMSE, and ADQR of the five estimators, as given below following Section 5.8.3.
5.11. Nonparametric Methods: R-Estimation
265
5.11.4 ADB, ADQB, ADMSE, and ADQR For UE, ( 6 ) , (i) bl(6,) = 0 and Bl(6,) = 0; (ii) Ml(6,)
= a2A-’
(5.1I .28a)
(en),
For RE,
(i) bz(6,) = -6 and (ii)
and Rl(6,; W) = a2tr(WA-’).
M 2 ( e n ) = u2B
&(en) = A2,
A2 = 0-~(6’5’AJ6);
+ 66’ and Rz(6,; W) = a 2 t r ( W B )+ 6’Wd.
(5.11.28b)
.. PT
For PTE, (6, ),
- PT
(i) b3(6, ) = -JBHp+a(~;-l(~);A2) and
,.PT
2
B3(6, ) = A2{~p+1(X;-1(.);A’)} ;
..PT
(ii) M3(6, ) = u2A-’ - C ’ J A - ~ H , + ~ ( X E - ~A2) (~);
+ (Jdd’J’){
2Hp+1(x&1(a);A2) - Hp+3(x;-1(~);A2)}
and . PT &(en ;W) = cT2tr(WA-’)
- u2tr(WJA-1)Hp+1(x;-l(a); A’)
+ (”JWJ”) { 2 H p + 1 ( X;-I For SE, (i)
( a ) A’) ; - H p + 3 (x;- 1( a ) A’)} ;
*
(a:), -s
b4(6,) =
-(P - 3)JaE[x;-‘(a);
A’)]
and
B4(3:) = ( P -3)A2{E[~G:l(A2)]}2; (ii) M4(6:) = u’tr(WA-’) - (P-3)E[x;;I
R4(6:;W)
-
O~~~(WJA-’{~E[X~:~(A~)],
(A2)]} + (P-3) (P+ 1)(6’J’WJW [xi$ (A’)]
= a2tr(WA-’)
-
a2tr(WJA-’)
x {2E[Xp-:1(A2)]
- ( P - 3)E[X;:1(A2)]}
+ ( P - 3)(p + ~)(G’J’WJG)E[X~;~(A’)]. ..S+
For PRSE, (6, ),
(5.11.28d)
266
Chapter 5. ANOVA &lode1
where (5.11.30)
By computing (i) E[U1], a2EE[U1U’,]and a2E[U{WU1], (ii) E[Uz], a2E[U~Ua] and o’E[U;WUz],
and
5.11. Nonparametric hlethods: R-Estimation
267
(iii) E[U3],02E[U3Ui]and 02E[UiWU3],
(5.11.31)
we obtain the bias, MSE matrix, and the quadratic risks of the PTE, SE, and PRSE. Next, we discuss the recentered confidence sets based on the five R-estimators. First, we consider the usual (1-Y)% confidence set, namely Co(6,) defined by
~ ‘ ( 6 , )= { O
: nai2/16- l~,ll&5 k , } ,
(5.11.32)
a = A:/?:. For some positive a (< co),we define a where k, = $(y), and : consistent estimator of Y($, 4 ) by
1
2
Yn = P j=1 ~
T ~(6, ,
+ n-1/2alp) - T ~(6,, - n-I/2alP) 2njan
(5.11.33)
Then, the asymptotic coverage probability of the set, Co(an)may be written as 2 -1
P K ~ , , ) ( C * ( ~=n )n-wx lim ) PK(,,{A, n
n-cc lim
= Po(&2
5 Icy)
IlTn(O)ll%_l i Icy}
= 1 - y.
(5.11.34)
We define the confidence sets by
c*(e:) = ( 6 : na;21p - e;&
5 k,},
(5.1 1.35)
where 6; = A6,
+ ( I - A)eng(Ln),
A = l,lbDn,
(5.11.36)
as before (see Section 5.9.4).
Now, using the transformations as in Section 5.9.1, we have w1
where
n1/2on-11”D1/2an 1 n and w2 = n1/20;1rLDi/26n,
1
I’2 =
Dk‘21,. Then, as n
(5.11.37)
+ 00,
(5.11.38) Hence, as n 4 co,
na;2j16 - 6:11D,-(732 2 2 -~
2
+ )I I ~~ -I wlg(/IwiI12)I12.
(5.11.39)
The asymptotic coverage probability of the set C*(6:)is then given by lim P(C*(6:))
n-w
= Pq
{ “732
- w2)2
+ I l V l - ~lg(l/w1112)l1215 4 )
Chapter 5. ANOVA Model
268
which is the same as the expression at (5.9.10). The rest of the analysis follows based on the material in Section 5.9.1 and 5.9.2 and is not repeated.
5.12
Conclusions
In this chapter, we considered the simple ANOVA problem with regard t o the improved estimation of the parameters of the model (5.1.1). Accordingly, we discussed the unrestricted, restricted, PTE, SE, and the PRSE of the parameters based on normal theory as well as the nonnormal theory in an asymptotic setup. We added the nonparametric R-estimation methods t o enhance the scope of the methodology. In addition, we discussed the recentered confidence set estimation and their finite sample as well as asymptotic properties of the coverage probabilities under local alternatives.
5.13 Problems 1. Prove Theorem 2 of Section 5.1.3. APT -S ,.S+ 2. Determine the bias-vector of 6 , , 8, and 8, using the expressions at (5.2.4-iii).
..S+ , 8,, and 6,
-PT - S
3. Determine the MSE matrices of 8, at (5.2.4i-iii).
using the expressions
..S+
4. Show that 6;' is the convex combination of 6, and 6,, and hence On is a preliminary test estimator of 6 for p 2 4. -S
5. Determine lM1(an)1,M2(6,)/,M s ( 6 r ) I JM4(6:)1, and M~(6;+)l,and verify the expressions for the relative efficiencies based on MSE matrices. 6. Prove Theorem 1 of Section 5.7.1. s+ 7. Consider (5.8.16), and show that E{n~;~/j6, - 6 / ' } 0 as n 4 oc) under fixed alternatives 6 = BOl, 6. .--)
+
8. Show that
fis;'
(6;
- 601,) 2
[(X+ S*)- (Y + Jd*)I(llY + J6*j12 < ~ & ~ ( a under ) ) ] local al-
ternatives 8(nl = 901,
( fiii,
+ n-'f26
where
)} (Section 5.8.2). V
9. Verify that &s;' (6: - &l,)= X - ( p - 3)(Y given a t (5.8.24). 10. Prove Theorem 5.8.1, equation (5.8.22).
1,
I
11. Determine (Ms (6t['T) IM1 (6;) and IMs (6:') sions given at (5.8.29), (5.8.32), and (5.8.36).
+ JS*)llY -+ J6*/1-2
I
using the expres-
5.13. Problems 12. Let
Y i= 8 + e,, i
= 1,. . . , n , where e, is distributed as f(e,;v,a’) =
c:=, eze,I ( ~ + ~ );/v,a2 ’ > 0. Define a,,
-PT
-S
,.S+
6,, 8, , 6,, and 6 , 4 1 + vu2 as given in Section 5.2.1. Determine the bias vector, MSE matrices and weighted risk expression for the five estimators. Prove Theorem 1 of Section 5.9.2. Prove (5.20.2a-d) of Section 5.10.1. Prove the asymptotic linearity results given by (5.11.15a, b). Show that \
13. 14. 15. 16.
269
I
n’/%, = n 1 ’ 2 8 ,
+ op(l),
where 6, = A l e n l + . . . + Apen, using the asymptotic linearity results. 17. Verify the expressions for ADR and ADMSE matrices and for ADQR. given by (5.11.28a+) using (5.11.31) as the computational step. 18. Prove that 13, a t (5.11.12) approximately follows the central chi-square distribution with ( p - 1) d.f. under Ho : 6 = 6 0 1 ~ .
+
19. Show that C, = n(6, - 6,)’A(6, - 8,) op(l). 20. Verify the expressions for the asymptotic coverage probabilities of the confidence sets given by (5.11.35) explicitly with that of the Section 5.9.
This Page Intentionally Left Blank
Chapter 6
Parallelism Model Outline 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12
Model, Estimation, and Test of Hypothesis Preliminary Test and Stein-Type Estimators Bias, Quadratic Bias, MSE Matrices, and Risk Expressions Comparison of the Estimators of the Intercept Parameter Estimation of the R.egression Parameters: Nonnormal Errors Asymptotic Distributional Risk Properties Asymptotic Distributional MSE-matrix Properties Confidence Set Estimation: Normal Case Confidence Set Estimation: Nonnormal Case Nonparametric Methods: R-estimation Conclusions Problems
In this chapter, we consider another important model, namely the parallelism model, belonging to the class of linear hypotheses useful in the analysis of bioassay data, shelf-life determination of pharmaceutical products, and profitability analysis of factory products in terms of costs and outputs, among other applications. In this model, as in the ANOVA model, p independent bivariate samples { ( x a l yal , ), (xa2 ya2 ), . . . , (xan,,,yan, ) = 1 . .. ,PI are considered such that y a j n/(Sa Paxa5,o2) for each pair ( c x , ~ )with fixed x a j . The parameters 6 = (61, . . . OP)’ and ,6? = (PI . . . ,PP)’are the intercept and slope vectors of the p lines, respectively, and o2 is the common unknown variance. In this model, it is common to test the parallelism hypothesis NO : p = P o l p (where Po is an unknown scalar) against the alternative HA: a t least one pair of the components of p differ. Hence, we designate this model as the “parallelism model”. The main objective of this chapter is to study the properties of various estimators of 6 and ,f3 when it is suspected but not
-
+
271
Chapter 6. Parallelism Model
272
certain that the parallelism hypothesis HO : p = P o l p holds. Accordingly, we consider (1) the unrestricted estimator (UE), (2) the restricted estimator (R.E), (3) the preliminary test estimator (PTE), (4) the James-Stein-type estimator (JSE) and (5) the positive-rule Stein estimator (PRSE) of 8 and p. In addition, we consider the properties of the recentered confidence sets based on these estimators as the center of the sets with fixed volume. Both small and large sample cases are treated for the problems stated above. In addition, we include the R-estimation theory of point and confidence set estimation. Interested readers may refer to Lambert and Saleh (1985), Akritus, Saleh, and Sen (1985), Saleh and Sen (1985a), Saleh and Hassanein (1986), and Robert and Saleh (1991) for more information.
Model, Estimation, and Test of Hypothesis
6.1
6.1.1. Parallelism Model Consider the p simple linear models
+ Pax, +
ya =
€ ,,
a = 1, * .
. ,p,
(6.1.1)
where
Y , = ( ~ ~ 1..,y a. R n ) ’ , 1,xa =
=
(%I
( I , . . . ,I)’ - a n,-tuple of 1k,
7 .
. . ,%na
(6.1.2)
)I,
N,,( 0 ,021n,), In, being the identity matrix of order n,. and E, Generally, the main objective of this model is the test of the parallelism hypothesis HO : = P o l p (POis an unknown scalar) against the alternative H A : ,B # polp. In this chapter, we pursue the problem of improving the usual estimators of 6 and p when it is suspected but not evident that HO holds. N
6.1.2
Estimation of the Intercept and Slope Parameters
Consider (6.1.1) and (6.1.2) using the sample information and the model. The unrestricted estimator of 6 and p are given by the LSE/MLE as follows: (a)
where
-
3,
and
= Y - T,B,
(i) Y
=
(b)
p, =
PI^^, . .. ,&,,)’,
,,... ,Ypn,)’, (ii) T,
(v1,
(6.1.3) = Diag(zl,, ,... , znp),
)(I:, Y-) 1 (iii) Pan, = x L Y a - & (1k,xn , (iv) n,Q, = x&xa - n, (1;- x, l29 n,Q,, (6.1.4) (v) Zxna = ,(lb,Xol) 1 and yen, = L n, ( l n n Y , ) . Finally, the unbiased estimator of o2 is given by
s : = ( n - 2p1-l
c P
llYa
a=l
-&Jnn
- P*,n,Xall 2 *
(6.1.5)
6.1. hdodel, Estimation, and Test of Hypothesis In the case where the slopes are equal, meaning estimators of 8 and P are given by
where
(i)
an
(iii) H = I, -
273
P
=
pol,,
(ban,... ,An)' = h n l p t (ii) n Q ; ; , (iv) DT; = Diag(nlQ1,. . . ,n,Q,). nQ =
1,l ;D
the restricted
= CZ=ln,Qa,
(6.1.6)
The following theorem gives the exact distribution of the estimators, whose proof is straightforward.
Theorem 1. Under the assumed conditions
( il 1 ) - N2, { ( 8 ) ; a 2( -TnD22 D1l
-TnD22 D22
( Penn -' { ( TnHP ) ( gii -P ) ;u2
N2,
where (i)
pl,
=
,b
1 1 D;, nQ
P
D'2
D22
)} ,
(6.1.7a)
)},
(6.1.7b)
.
, (ii) D11
+
= N-l
T,i , i ; ~ ,
(6.1.7d)
+ TnD22Tn; and
N =
and DT2 = - 2 nQ1 P l'Tn, P Diag (nl,.. . , n,), (iii) DT1 = N-' nQ (6.1.8) (iv) H is defined by (6.1.5 (iii)) and D22 is defined by (6.1.5iv). We will need these distributions later to compute the bias, the MSE matrices, and the weighted quadratic risks of the estimators of the form
where A is an idempotent matrix of rank 1 and g ( L n ) is a nondecreasing function of the test-statistic L, for testing the null hypotheses HO : P = Pol, against the alternative H A : p # ,001, given in Section 6.1.3 below.
Chapter 6. Pardelism Model
274
Table 6.1.1 Parallelism Data
First Sample
I I
Second Sample
I
pth Sample
Data
Estimate
I
Variance Estimate I
In the sequel, we will need the following statistics: (a) Pooled estimate of slope:
boon= nlQ16lnI +...+n,Q,PPnp nQ
(b) Pooled estimate of variance:
sz =
( n - 2p)-'
c",=, llya
-
-
ba,naXaI12*
6.1.3
Test of Parallelism
In this section, we consider the testing of the parallelism hypothesis Ho : p = polp. The test-statistic is given by the following theorem:
Theorem 2. The likelihood ratio(LR.) statistic for the null hypothesis Ho is given by (6.1.10) Under Ho, L, follows the central F-distribution with (p - 1 , m ) d.f., where m = n - 2p and under H A , it follows the noncentral F-distribution with ( p - 1,m ) d.f. and noncentrality parameter A2/2, where
(6.1.11) Proof. The likelihood ratio statistic is obtained by using the maximum of the likelihood function under HO and H A , which are given by
6.2. Preliminary Test and Stein-Type Estimators
275
and LA
= Sin(&)-nexp
{ - ;},
c P
2
nsA =
llya - 8 a ~ n a paxa112 = ms:,
a=l
respectively. Hence, the likelihood ratio statistic is the result of
(6.I. 13)
since P
P
a=l
m=l
C I I Y-~ 6a1na - Baxalj' = C I I Y-~
e a l n a - paxa/l2
+ P;H~D;~@,. (6.1.15)
p' H D; ;HP- as the L R statistic for testing Ho against Thus, we take L, = (p-l)s,z
HA. To obtain the distribution of 13, under H A (or Ho), we consider the orthogonal matrix I' = (rl,r2). Here rl is a p x p - 1 matrix, and I'z is a pvector such that rLrl = 0 and rlr; + r2ri= I, SO that = I, - r2rL.Also, I' diagonalizes the symmetric idempotent matrix D22 1/2H'Di;HDip of rank ( p - I), that is,
112 1/2 Defining w = a-1rD22 0, and q = a-lrDz2 0,we have w = N p ( q ,Ip).Now partition w = (wi,wz)' and q = ( q i ,q 2 ) ' , where w1 is a ( p - 1)-vector and
( :;
("I
; I,( ti1 :I}.
is a scalar random variable. Then Hence, llw1112 follows the noncentral chi-square distribution with ( p - 1) d.f.
w2
and noncentrality parameter A2/2 with A2 = ljqlll2 - O'H'D-,'HP. 4; Then, L, under H A follows the noncentral F-distribution with ( p - 1,m) d.f. and noncentrality parameter A2/2 since ms2/a2 follows the central chi-square distribution with m d.f. independent of Clearly, under Ho, C n follows the central F-distribution with ( p - 1,m) d.f.
Pn.
6.2
Preliminary Test and Stein-Type Estimators
Following Chapter 5, we now consider various estimators of the intercept 6 and slope vectors p.
Chapter 6. Paralielism Model
276
6.2.1 The Estimators of Intercepts and Slopes The unrestricted and restricted estimators of 8 and @ are given in Section 6.1.2. We consider the preliminary test estimators of 8 and @ as follows:
KT= P , and
- PT
en
=
-
HP,W, < Fp--l,rn(a))
e, + T,HP,I(L,
<~
(6.2.1a)
b) ~ - ~ , ~ ( a ) (6.2.1 )
respectively, where Fp--l,m(~) is the upper a-quantile of the central Fdistribution with ( p - 1,m ) d.f. To overcome the dependency on a, we consider the James-Stein-type estimators of 8 and @, namely
-s
-
@, = /3,
rs
0, = 6 , cL;'
-
cHP,L,'
(6.2.2a)
+c ~ , ~ & ~ ; l ,
(6.2.2b)
(Ln <
Jotice that we have replaced in (6.2:la,b) to'obtain (6.2.2a,b).
Fp--l,rn(aN
-s
-S
by
The estimators 8, and p, may exceed the values 6 , and ,h, respectively.
-s
Thus, we consider the convex combinations of 6, and 8,, and ,hn and @,, via the preliminary test procedure with critical value c to obtain the positiverule Stein-type estimators of 8 and @, respectively, as -S
S+ -s 8, = 6,I(L, < c) + 8,I(Ln > c)
= 8, - ( 6 ,
-S
-
8,}1(Ln > C)
= 6, - (1 - c ~ ; l ) ~ ( ~ > cC)T,H& , = 6,
+ T,HP,{1
and S+
p,
= ,h,I(L, =
- (1 - cLC,')I(L,
> c)}
(6.2.3a)
< c) + P;tI(L, > c )
bn + (1 - cLG1)1(L, > c)HP,
(6.2.3b)
with p 2 4. We present the following data set related to shelf-life, to illustrate the summary statistics and the test-statistics. The tabular values contain results of six experiments conducted to determine the shelf-life of a drug by setting fixed concentrations of a component of the drug and observing the times (years) to destroy the concentration.
6.2. Preliminary Test and Stein-Type Estimators
277
Table 6.2.1 Shelf-life Data Batch
Time Concentration (years) 1 0.014 100.4 0.230 100.3 0.514 99.7 0.769 99.2 1.074 98.9 1.533 98.2 2.030 97.3 3.071 95.7 4.649 94.5
2
3
0.022 0.118 0.272 0.566 1.165 2.022 3.077
100.7 100.6 100.3 99.4 98.6 97.6 96.4
0.025 0.275 0.547 0.797 1.041 2.058 2.519
100.2 99.7 99.2 99.0 98.8 96.4 96.2
Batch 4
Time Concentration (years) (%I 0.066 100.4 0.343 100.0 0.533 99.5 0.802 99.3 1.033 99.3 1.538 98.2
5
0.011 0.310 0.624 1.063
100.5 99.8 99.1 98.4
6
0.011 0.310 0.624 1.063
100.1 99.5 98.5 98.4
Adapted from Roberg and Stegman (1991)
Table 6.2.2 contains the summary statistics related to Table 6.2.1. Table 6.2.2 Individual Batch Regression Estimates Intercept Slope Batch (%) (%/year) 1 100.49 -1.515 2 100.66 -1.449 3 100.25 -1.682 4 100.45 -1.393 -1.999 5 100.45 6 99.98 -1.701 Pooled batch regression with 99.90 -1.534
Weight MSE niQi 0.019 14.63 0.043 7.83 0.062 5.14 0.035 1.21 0.61 0.011 0.124 0.61 equivalent slopes 0.047
The value of 13, = 1.64 with p-value 0.186.
Chapter 6. Parallelism Model
278
6.2.2 Bayes and Empirical Bayes Estimators of Intercepts and Slopes
(@,,on)’
be the intercept and slope parameter, and let 8, = Let 6 = (6’, ..I ..I be the unrestricted estimator and 2, = (6,, 0,)’ the restricted estimator of (6’,/3’)’.We then write -I
-I
(6.2.4)
- NZp{ ( E:i Eiz )}
-
By sufficiency considerations, we can reduce the model to the observation of the statistic, 8,
6, a2
m*=n-2p+2and
a2x
c P
s: = ( n - 2p+ 2)-l
and m*s:
lly* - &Aa - pax,p.
~ where - ~ (6.2.5)
Cr=l
The estimator sp of a2 is the best scale invariant under the loss
L(a2,d ) = (1 -
$)
2
(6.2.6)
p
= p o l p , we
)} ,
(6.2.7)
and dominates the mle of a2.Since our null hypothesis Ho is select the prior
6
- NzP{ ( ) ( ; 7’
D1l
-TnD22
-TnD22 D22
which gives the posterior distribution of 6 conditionally on a2 as
where 6’ =
( :; ).
Conditionally on cr2, the associated posterior mean of
618, may be written as
Now, using the noninformative prior on a2, i.e., ..(a2)= 1/a2?we find that the Bayes estimate of a2 is s.: Similarly, the empirical Bayes estimator of 6* and (cr2 T ~ ) - ’ based on the marginal distribution of 8,, namely
+
( -ZTn-%2y22)} ’
(6.2.SO)
~
6.2. Preliminary Test and Stein-Type Estimators is given by
279
a, and 6,, since Em($,) = Polp
E(6,) = 6,
and
where Em[.]stands for the marginal expectation of the r.v. in and 8, are empirical Bayes estimators of P o l p and 6 . Further, COV[(&
-fin)] =
(2+ T’)HD~’H’
(6.2.11)
(.I.
Thus,
b,
(6.2.12)
and
(P, - b,)’Di;(P,
-
p,)
2
+ T2)&1.
(0’
(6.2.13)
Therefore,
and the estimator of 1/(02+ T’) is given by (6.2.15) Hence, the empirical Bayes estimator of 6 may be written as
=
8,
+ ( 1 - cL,1) (6, - 8,),
(6.2.16)
where
with m = n - 2p being the LR test for Ho : P = polp. The corresponding positive-rule empirical Bayes estimator of S is given by
&+:
= L,+
( 1 - c.c,1) I ( L , > c ) (6, - h,),
(6.2.18)
which coincides with the PTE approach for the shrinkage estimators (see Section 6.1.4).
Example 1. Consider the data of Tables 6.1.1 through 6.1.3. We illustrate the computation of estimators of the intercept vector 6 . The computation of the estimators of p may be obtained t o verify these computations. Table 6.1.2 shows that 13, = 1.64 with pvalue 0.186. At the 0.25 level of significance, the hypothesis of equality of the slopes is reAPT -S - S+ jected. The formulas for 0, , O n ,and 6 , are given by (6.2.la) through
Chapter 6. Parallelism Model
280
(6.2.3b). Accordingly, we have (i) The unrestricted estimator of 8 is 6 3 7 = (100.49,100.60,100.25,100.45,100.45,99.98)’. (ii) The restricted estimator of 6 is 6 3 7 = (99.50,99.90,99.90,99.90,99.90,99.90)’. (iii) The PTE of
-
-PT
6 is 8,, = 6 3 7 = (100.49,100.60,100.25,100.45,100.45,99.98)’. (iv) The -S JSE of 8 is 8,, = (100.50,100.62,100.23,100.47,100.37,99.48)’, here c = -s+ - - S 0.556 and L, = 1.64, and (v) The PRSE of 8 is then 8,, - 8,, = (100.50,100.62,100.23,100.47,100.37,99.48)’.
Bias, Quadratic Bias, MSE Matrices, and Risk Expressions
6.3
In this section, we present the expressions for bias vector, quadratic bias, MSE matrices, and weighted quadratic risks of various estimators. They are given in the sequel based on Theorem 6.1.1.
6.3.1
Unrestricted Estimators of p and 8
(i) bl(P,) = 0, and Bl(P,)
= 0;
(ii) MI@,) = u2D22,and Rl(P,;W)
= u2 tr(WD22).
(6.3.1a)
Similarly, (i) bl(6,) = 0, and
&(en)= 0
(ii) Ml(6,) = 02Dll, and Rl(6,; W) = o2 tr(WD11).
(6.3.1b)
Restricted Estimators of p and 8 bz(P,) = -HP, and Bz(bn)= A’ = o - ~ P ’ H ’ D ~ ~ H P
6.3.2 (i) (ii)
M2(Pn)
=
nQ
+ P’H‘WHP. + HPP’H‘ and R2(gn;W) = a21’iw1, Q (6.3.2a)
Similarly, (i) bZ(6,) = T,HP, and Bz(6,) = CT-~P’H’TLD,-,’T,H/?; (ii) M z ( ~=~02D;, )
+ T,HPP’H’T,
P‘H‘TkWT,HP, where Dil = N-’
and Rz(6,; W) = u2 tr(WD;,)
+
T : I ~ ~ ; T ~ nQ ’
+
(6.3.2b)
6.3. Bias, Quadratic Bias, MSE Matrices, and Risk Expressions
28 1
and
and
%(bf;W) = a’tr(WDz2) - c(p - l)a’tr(WHDzz){2E[~~~~(A’)] (P- 3)E[x;$1(A2)]} + C(P’ - 1)(P’H’WHP)E[x,54,(A2)]. (6.3.4a) Similarly, (i)
-S
b4(en)= -S
c ( p - ~)T,HPE[X;;~(A’)] and
B4(6,) = ~ ’ ( -p ~)‘CT-’(P’H’T,D~~T,HP){ E[x&(A’)]}’;
282
Chapter 6. Parallelism Model
6.4. Comparison of the Estimators of the Intercept Parameter . S+
(ii) Mg(6, ) X
x
= M4(6:) - a2(T,HD22H’T;)
E [ ( 1- c1F~~i,,(A2))21(F,+l,,(a2)< CI)]
{2
-
and
283
w - C1F,-:,(A2))~(Fp+2,,(A2)
< Cl)] < c2)]}*
E[(l - c2F,;13,,(A2))21(Fp+3,,(A2)
..S+
+ (T,HDzzH’T&)
Rg(6, ; W) = &(en;W) - cr2 tr(WT,HD22H’T’,) X
+
-S
E[(1- clF,-:l,,(A2))21(Fp+l,,(A2)
(2w
< .I)]
+ (P’H’TLWT,HP)
- C 1 F ~ ~ ~ , , ( A 2 ) ) 1 ( F P + ~ , , (< A 2C l)) ]
- E[(l - C Z F ~ ~ ~ , m ( A 2 ) ) 2 1 ( F p + 3 ,< m CZ)]} (A2)
(6.3.5b)
where E[Fp>11,,(A2] is the expectation of the reciprocal of the noncentral F-variable with ( p 1,m ) d.f. and noncentrality parameter A2/2. Similarly,
+
E [ F ~ ~ l , , ( A 2 > ~ ( F P + l , , ( A<2C) l ) ] is the truncated expectation of the reciprocal of a noncentral F-variable with ( p 1,m ) d.f. and noncentrality parameter A2/2. Further, G,,,,, (-;A2) is the CDF of a noncentral F-distribution with (v1,v2) d.f. and noncentrality parameter A2/2.
+
6.4
Comparison of the Estimators of the Intercept Parameter
In this section, we discuss the comparison of the estimators of the intercept parameters, 6 . The comparison of the estimators of the slope parameters p is similar to that of the treatments means in the ANOVA model given in Chapter 5.
6.4.1 Bias Comparison of the Estimators of the Intercept Parameter The bias comparisons of the estimators of 6 is made via quadratic bias exThese expressions depend on T, except for the pressions as they are scalars. unrestricted estimator 6,. Hence, if T, = 0 ,
- PT
-S
Bl(6,) = Bz(6,) = B3(6, ) = B4(6,)
=
,.S+ &(en ) = 0.
(6.4.1)
If T, # 0, BI(6,) is independent of p, and all the other expressions depend on p and thereby on A2, the departure parameter from the null hypothesis.
Chapter 6. Parallelism Model
284
First, note that B2(6,) is unbounded as a function of A2, since
A’Chmin [T;D22] I B2(6n)
I A2Chrnax[T:D22],
(6.4.2)
Further, under Ho,
and in general, (6.4.3) and . PT
0IB3(Bn ) I B2(&2), depending on the magnitude of the level of significance, a.
6.4.2
MSE-matrix Comparisons
Notice that all MSE-matrix expressions of the estimators of B involve the matrix T,. Thus, if T, = 0, then
..PT
-S
..S+
MI(6,) = Mz(6,) = M3(8, ) = M4(Bn)= M5(B, ) = a2N-’.
-
APT
-S
(6.4.4)
. S+
Clearly, for T, = 0 , B,, B,, 0, , M4(8,) and M5(Bn ) are MSE equivalent. Further, we have the following theorem: on the properties of the MSE matrices under Ho : /3 = p o l p .
Theorem 1. Under Ho and assumed conditions, the quantities Ml(8,) AS
M2(B,), M3(6ET)-M2(6,), Mi(8,) -M3(6ET), Mi(8,) -M4(0,), and -S S+ M4(Bn)- M5(B, ) are positive semidefinite. Proof.
Under Ho, we may write
(i) MI(&) - Mz(6,) = a2(T,HDz2H’Tn), (ii)
PT
MI(^) - M3(8, ) = a2(T,HD22H’T,)Gp+~,,(C,; 0), ~
. PT
(iii) M3(0, ) - Mz(6,) = a2(T,HD22H’T,)(1 - GP+1,,(&;O)), (iv) Mi(6,) - M4(6:)
= ca2(T,HD2zH’T,),
Sf
(v) M4(6:) - Ms(8, ) = a2(T,HD22H’T,) A
xE[(1-clF,-:l,,(0))21(F,+1,,(0)
< C d ] , (6.4.5)
6.4. Comparison of the Estimators of the Intercept Parameter
285
Expressions (i) through (v) are positive semidefinite matrices (p.s.d.) by the fact that
1
-(T,HD22H’T,) nQ
1 = -T,D(I,
nQ
- D-llpl~D-l)DT,,
(6.4.6)
where
D = m D i a g ( (nlQ I ) - ’ / ~.,. . , ( n p Q p ) - 1 / 2 )
(6.4.7)
and D-llplbD-l is of rank 1 and its only positive root is the trace of the matrix, the others being zero. Thus, we may write
for some orthogonal matrix l?, which proves the claim. Further,
Now, consider the MSE based comparisons in general
a,
Comparison of mators is given by
Relative to
8,.
The MSE-matrix difference of the esti-
n2(TnHD22H’T,) - (T,HPP’H’T,).
(6.4.8)
In order for (6.4.8) to be p.s.d, we must have
t’(TnHD22H’T,)t 2 o-~~’(T,HPP’H’T,)~
(6.4.9)
for a given nonzero vector t. Hence, we must have IY-~
t’(T,HPP’H’T,)t < t’(TnHD22H’T,)t t’(TnD22T,)t t’(T,D22T,)t ’
(6.4.10)
But max
t
T,HPP’H’ ~ , ) t = ,-2(P’H’DTiH,B) = A2. t’(T,D22Tn)t
(6.4.11)
Hence. (6.4.12)
286
Chapter 6. Parallelism Model
Thus, for A2 5 1, 6, performs better than 6,; otherwise, The efficiency of 6, relative to 6, is given by
~IN-’
1 + -T nQ
n1 p I’T, p
6,
performs better.
-1IP + ~-~(T,HPP/H/T,)/ . (6.4.13)
Now, we may write
IN-^ + =
1 -T nQ
n
1 11T, p
p
+a
Ia2Dll - T,D22H’T,
l- ’p
- 2 ( ~ , ~ ~ ~ ’ ~ ~ T , )
+ T,H,L3#?’H’T,I-’1p.
Then
M R E ( ~ ,: 6,) =
1~2Dll11/P la2Dll - TnD22H’T, + T,H,f3P’H’T,11/p
=
11, - DT;T,D22 (H’- 8DgSHPP’H’) T,
I-1IP ,(
6.4.14)
where H is positive semidefinite with rank p - 1 and D,-,’HPP’H‘ is positive semidefinite with rank 1 and its only positive characteristic root is A2 = D - ~ ( P ’ H ’ D ~ ~ HSince P ) . D22 is positive definite, H‘ - G-~D&~H/~P‘H‘ is positive definite. Thus, if A’ is small, MRE(6, : 6,) 2 1, and for large values of A2, MRE(6, : 6,) < 1. Further, as A2 --+ 00, the MRE(6, : 6,) goes to zero. Thus, 6, loses efficiency for large A2.
- PT
Comparison of 0, Relative to of the estimators is given by
Ml(6,)
6,.
In this case, the MSE matrix difference
- M3(6ET)
= 2 (T,HDzzH’T,)Gp+i,,(eol; A’)
-(TnHPP’HTn)
{ 2Gp+1,m(fa;A’)
- Gp+s,m(f:;
A”,>. (6.4.15)
Thus, for a given nonzero vector 4 we obtain by setting (6.4.15) 2 0 the expression below:
6.4. Comparison of the Estimators of the Intercept Parameter
287
Dividing by t’(TnD22T,)k?, and maximizing over all k?, we obtain
APT
Thus, 8,
6,
performs better than ,
PT
performs better than 8,
if A’ satisfies (6.4.17); otherwise, 6,
. PT
. The MR.E(B,
:
6,)
is given by
= 11, - DT; (T,HD22H’T,)Gp+l,m(ea; A’)
+ a-’D,-,’
(T,HPP’H’T,)
{
x 2GP+1,,(&; A’) - Gp+3,m(!:;
(6.4.18)
,.PT
Clearly, as a function of A’, MRE(8, : 6,) has a maximum at A’ = 0, decreases crossing the 1-line to a minimum, and then increases toward 1-line as A -+ 03. The same conclusion will be reached if we consider the trace of the matrices instead of the determinant. -S
Comparison of 8, Relative to is given by
6,.
In this case, the MSE-matrix difference
Ml(6,) - M1($) = c ( p - 1)(TnHD22H’T,) { ( P - 3)E[X;:1(A2)]
+ 2A2E[X,343(A2)]}
-c(p’ - 1)(T,HPP’H’TnE[~~:,(A2)].(6.4.19)
In order that the MSE difference is p.s.d., we must have for every nonzero vector 1, ( P + 1)(t’T,HPP’H’T,t)E
[xi:,
(A’ )]
I ( ~ ’ T , H D z z H ’ T , ~() p{ - 3)E[x;;1(A2)]
+ ~A’E[X;$~(A’)]}.
Dividing by t’TnD22T,t and maximizing over
(6.4.20)
t , we have
+
( P + 1)A2E[x&(A2)] I (P-~)E[X;$~(A’)] ~A’E[x;:~(A’)],
or
Chapter 6. Parallelism Model
288
-S
The inequality (6.4.21) does not hold for all A’. Hence, 8, does not dominate 0, uniformly. Here we have the efficiency expression as
..S+
which is positive. Hence, 6 ,
-s
dominates 6, uniformly under the MSE criteria.
6.4. Comparison of the Estimators of the Intercept Parameter
289
Here we have the efficiency expression as
< ci)] - (T,HPP’H’T,)
x E [ ( 1 - .1F,-:l,,(A2))21(F,+l,,(Az)
x E [ ( 1 - c 2 F 6 ; ’ 3 , , ( A ” ) ) 2 r ( F , + 3 , m ( ~ 2<) c2)] - 2(T,HPP’H’T,)
x [ (c1q---l,l,,(A’)
-
1
1)IF( p + l , m (A2) < C l >]} - l’p,
(6.4.26)
-S
P+l and c2 = c(p-l). P+3 MRE(6;’ : 6,) is a decreasing function of where c1 = cO A’ with maximum a t A’ = 0, and as A .+ 00, it converges to 1 from above. -S Hence, 6:’ dominates 6 , uniformly with respect to MSE matrices.
6.4.3 Weighted Risk Comparisons of the Estimators In this section, we compare the estimators of 6 based on the weighted quadratic risk functions.
Comparison of 6, Relative to 6,. The risk-difference of the two estimators In this case, is given by
Ri(8,
: W) - Rz(6, : W) = 02tr[W(D11 - Dil)] -
(P’H’TnWTnHP). (6.4.27)
Thus, the risk-difference is f 0 according as
u-~(P’H’T,WT,HP)
2 tr[W(D11 - DT1)]. <
(6.4.28)
Now,
6’ (P’H’T,WT,HP) = A2Chm,[(D11
5 A’tr(D11W) Hence, for A2 5
I A2Ch,,
- N-l)W]
(D2zT,WT,)
I A2Chm,,(D11W)
and tr[W(D11 - Dz1)] 5 tr(D11W).
C ~t rm( D =1 x l(W D l) l W )6, l
performs better than
- , performs better than 6,. A2 2 C htr(Di1W) ,i,(D1lW), 6 Thus, the risk-efficiency of 6, relative to 8, is R R E ( ~ , : 8,) =
a,,
(6.4.29)
and if (6.4.30)
R,(6, : W) R z ( 6 , : W)
(6.4.31)
290
Chapter 6. Parallelism hdodel . PT
Comparison of 8, Relative to risk-difference, we obtain
..PT
RI(6, : W) - R3(8,
. PT
R1(6, : W ) - R3 (8,
6, (6,). In this case, by considering the
> according as <
: W)-0
> according as <
: W) -0
We note that D11 - D;l is a positive semidefinite matrix of rank of at most p - 1. Therefore, for T, # 0, tr[W(D11 - DT1)] is a nonnegative number.
..PT
Hence, 8, performs better than
and
6,
6,
if
. PT
performs better than 8, if
PT
Similarly, b, performs better than 8, if
., PT
and 8, performs better than
PT
6 , if
The risk-efficency of 8, Relative to
en is given by
- Ri(6, : W) -
R2(6ET : W)
(6.4.36)
6.4. Comparison of the Estimators of the Intercept Parameter -S
Comparison of 6 , Relative to by
6,.
29 1
In this case, the risk-difference is given
Rl(6, : W) - &(bf : W) = a2c(p- l)tr[WT,HD22H’T,] x { (P- 3)E[X,;43(AZ)1+ 2A2E[X,S43(A2)I}
-c(p2
-
~)(P’H’T,WT,HP)E[X;:~(A’)] for p 2 4.
(6.4.37)
Thus, the risk-difference is a decreasing function of a-2(P’H’T, W T , H P ) . Hence, replacing o-~(P’H’T,WT,HP) by A’tr(D11W) in (6.4.37), we note that the risk-difference satisfies the equality
Rl(6, : W ) - R4(6: : W) 1 a2c(p- l)tr[W’TnHD22H’T,]
{
x (P- 3)E[X&(A2)]
-+’
+ 2A2E[x;-f3(A2)]}
- l)A2tr(D11W)E[x,S43(A’)],
(6.4.38)
and the R.H.S. is a decreasing function of A2 with a maximum value
a2c(p- l ) ’ ( p - 3)tr[W’TnHD22H’T,] at A2 = 0. Hence, 6 , dominates en uniformly. -S The risk-efficiency of 6 , relative to 6 , is given by the expression -S
RRE(6: : 6,) tr[WT,HD22H’Tn] trlWD11 I
{ ( p - 3)Ek;-f3(A2)j + 2A2E[Xi:3(A2)1}
(6.4.39)
Chapter 6. Parallelism Model
292
- S+
The risk-efficiency of 8,
-S
relative to 8, is the ratio of two risks.
-S
Comparison of 8, Relative to 6,. The risk of 6, is unbounded while the -S risk of 8, is bounded. Also under Ho,
&(en : W) = u2~ I - [ W D ; ~ ] R4(6: : W) = u2tr[WD11] - cu2 tr[W(D11 - D;l)]. -S
Hence, the risk-difference R4(8, : W) u2tr [WD,,]
-
(6.4.43)
&(en : W) In this case, is
cu2 tr [W(D11 - D;,)] - u2tr [WD;l]
= u2(1- c) tr[WD11] - u2(1 - c) t~-[wD;~] = a2(1- c) tr[W(D11 - D;l)]
L 0.
(6.4.44)
-s Hence, en does not dominates 6, a t A’ = 0. On the other hand as A’ moves away from the origin, E[X;$~(A’)], E [ x ; $ ~ ( A ~ ) ] ,E[XF&(A’)], and E[X;;~(A’)] all decrease, so the risk-difference is less than zero. Thus, neither -S
8, nor
6,
dominates the other. ., PT
-S
. PT
Comparison of 8, Relative to 8, . The risk of 8, depends on the level -s of significance, while that of 8, is free of this restriction. First, consider the risk-difference under Ho, : W ) = u2tr[WD11] - u2ctr[W(D11 - D;,)]
- PT
= R3(8,
:W )
+ u 2 t r [ W ( D i i - D;,)] [Gp+i,,(&;O)
- C]
whenever GP+l,m(!,;0) 2 c. -S
A
PT
Thus, 8, does not dominate 8,
: W)
(6.4.45)
under Ho whenever
Gp+l,rn(f&0 )
L c,
(6.4.46) -S
,.PT
which implies for certain values of a that neither 8, nor 8, another.
6.5
- PT
L R3(8,
dominate one
Estimation of the Regression Parameters: Nonnormal Errors
In this section, we consider the estimation of the intercept and slope parameters in the parallelism model when the error distribution is unknown. For each a ( a = 1,... , p ) , let yal ,... ,ya,, be independent random variables
6.5. Estimation of the Regression Parameters: Nonnormal Errors with distribution functions Fa,, . . . ,Fa,, line R = (-m, co).It is assumed that = F((Y*,
Ex,(Y)
for
Q
.
- ea
293
, respectively, all defined on the real
- Pa%,)/U),
Ya,
R,
(6.5.1)
= 1,. . , p and j = 1,. . . ,na. Without loss of generality, assume
1
(6.5.2)
ydFaJ (y) = 0 and
6.5.1
Unrestricted, Restricted, Preliminary Test, James-Stein and Positive-Rule Stein Estimators and Test of Hypothesis
As in Section 6.1.2, the unrestricted and restricted estimators of 8 and p are as follows:
8, where
=P -
T,P,
6, = (Pin,,. . . ,Ppn,)’,
and 6, = 6,
and
X&Ya
I
+ T,H,P,
1 - <(l;axa)(l;,Ya)
Pan, =
n*Qa
with
and P
and D;.! = Diag(nlQ1,. . . ,npQp).
no&,
nQ, = a=l
The restricted estimator is given by
Also, the estimate of u2 is given by P
= ( n - 2pl-l
C
ll~a- 6a1na
- Pa~a11~.
(6.5.3)
a=l
For the test of Ho : p = Polp, we consider the test-statistic L, defined by
L,
=
Therefore, the PTE of 6 and
s;2(B~~/,~;.n~,p,). p are defined by
(6.5.4)
Chapter 6. Parallelism Model
294 (i) i j r= (ii)
PT
P,
=
an + T,H,P,I(L, P,
- HnPnI(L <
< t,,,)
(6.5.6a)
.en,,),
(6.5.6b)
where .en,, is the upper a-level value of the distribution of C, under Ho. Similarly, the corresponding James-Stein and positive-rule Stein estimators of 8 are defined by -S
(i) 8, (ii) ij:+
+ cnC,'TnHnPn, = an + (1 - (1 - C , C , ~ ) I ( C ,
=
and that of (iii)
-s
p,
6,
> C,)}T,H,~,,
P by
= ,B,
- 6, 6.) 0, -
- c,H,P,Cil,
+ (1 - c n L i l ) I ( C n > c,)H,P,,
where c, =
6.5.2
(6.5.7)
and c,
c = (p - 3) as n
4
(6.5.8) -+ 00.
Conditions for Asymptotic Properties of the Estimators and Their Distributions
We know that if errors are independent normal with zero mean and vari-I -I ance c', the exact distribution of (6,,Pn)' is a 2pvariate normal with mean (O', @ ) I and covariance matrix (6.5.9)
%,*..
, -%L + h). %QP where Dlln = Diag(& + Since the errors are independent but not normally distributed, we need the following regularity conditions: (i) For some T ( 2 2), (6.5.10) This means o2 = f",z'dF(z) -
(ii) For each a (= 1 , 2 , . . . , p ) , as n -+ n, A, = - -+ A,, n ?an,
Qn,
-+
+
zdF(z))' < 00.
00,
0
< Aa < 1,
IGl < M,
La,
0 < Qa
Qa,
P
Qn .+
(s-",
<1
Q = C A ~ Q ~o < , Q < 1. a=l
6.5. Estimation of the Regression Parameters: Nonnormal Errors
(iii)
(Noether Condition). As n
6.5.3
295
-+ CQ,
Asymptotic Distributions of the Estimators
bn
Note that the estimators en and fin, 6n and are all linear estimators. Thus, under u2 < CQ and the Noether conditions above, the classical central P limit theorem remains applicable. Further, sz-+u2 as n 00. -+
Theorem 1. Under Ho : P = p o l p and assumed the regularity conditions, we have as n -+ 00, (i) ,.hi(
3- n -- P6 Pn
) - NZp{( ), u2 (
-ToA22
)},
A22
(6.5.11)
where A; = Diag(AlQ1,. . . ,ApQp)’, To = limn+m Tn = Diag(Z1,. . . , Z p ) , Aii = no + ToA22T0, A12 = A21 = ToA22;
with limn-+mH, = J = I, -
no+
T o l p l ’ To Q P
1
ipi;A;i Q
,
A,’
= Diag(A1, ... , A p ) , A;, =
A*1 2 -- A *2 1 -- -$1,1;To.
The results above coupled with the well-known Cochran theorem on quadratic forms in normal variables ensures that
xs-l
as n -+m.
(6.5.12)
On the other hand, sz -+ u2 in probability as n -+ 00 and using Sluskey’s V theorem, we conclude that tn xiFl as n CQ, and !n,a $ - ~ ( c Y ) as n -+ 00. Now, consider the fixed alternatives H6 : P = p o l p 6. We see that -+
-+
-+
+
fi@, - p o l p ) = &(Pn - 0) + fi~. Hence,
&(fin
-
an)
= HnficPn -P)+Hn(fid),
(6.5.13a)
(6.5.13b)
Chapter 6. Parallelism Model
296 where
-
H n h ( P n -0)
+ O(:)
since H, = J
n
and J = I, -
1,l;Ai;
. Further, nllJ6/I2 --+
Q
00
hence, it follows that Cn -+ 00 as n -+ 00. Consequently, P(Cn 1 as n 00 for all k E R1. Furthermore, under a fixed alternative Ha : p = 001, 6,
-+ 00;
k)
(6.5.13~)
N(O1u2JA22),
-+
as
>
+
--f
(6.5.14a)
(ii)
lim n-co
~ { n @ -
,B,)’D;.,(P,S
- fin>
>
= c2u2 lim ~ { s : n ( P , ~ , > ’ D & ~ ( ( P-~ n-co
= c2u2 lim
n-+co
E(CZ1) = 0, where
since C n 2 0 and Cn (iii)
--f
00
s+
> ’ ~ ; ; s+ ~ (-b ,s>>
E(c,~(I- C L , ’ ) ~ I ( C ,
< c)>
= 0.
,.
Bnl - P T on, -S
(6.5.14b)
s
lim ~ { n s : s ; ~ ( p ,- 6 n-w
= ( p - 3),
in probability;
n-w
= u2 lim
c
bn)~;2>
S+
(6.5.14~)
Hence, pn , and pn are risk equivalent under fixed alternatives. Further, computing the asymptotic dispersion matrix of we obtain
a,,
(6.5.15) 00. &(fin - p) has unbounded dispersion Note that nIj6112 4 00 as n matrix. Hence, the asymptotic distribution of is degenerate. Thus, for any --f
pn
APT -s + 6 asymptotic distribution of ,Bn , 0, and b,S+is the same as that of fi(Pn - p), and that of bn is degenerate.
fixed alternatives Hb : /3 = ,001,
To obtain a meaningful asymptotic distribution of the estimators, we confine ourselves to the local alternatives defined by ~
( : ~ ~ ( =1 ~POI, 1
+ n-1/26,
(6.5.16)
6.5. Estimation of the Regression Parameters: Nonnormai Errors
297
where 6 = (61,. . . ,bE)’ is fixed, ,& E R1,and 61,. .. ,bp are not equal such that Cg=,XaQaba/Q = 0. The following theorem: gives the asymptotic distribution of various estimators under { K(,)} :
Theorem 2. Under 6.5.3, as n -+ 00,
{A’(,)}and assumed regularity conditions of Section
(iv) L, has asymptotically a noncentral chi-square distribution with p - 1 d.f. and noncentrality parameter A2/2, where
A2 = K 2(6’J’A;: J6) = a-2 (6’A;i 6 ),
(6.5.17)
since J6 = 6 from (6.5.16). Now, by virtue of the definition of the PTE of
&(SET- P ) = &(Pn
- P) -
&(Dn
and 8, we have
- BJ(L7Z < &>
and
- e) = &(fin
-
el + T,&(P,
Therefore, using the fact that we obtain the following theorem:
-BJI(C,
-+ x 2p m l ( a T, ) , -+
< t , , ~ . (6.5.18)
To and Theorem 6.6.1,
Theorem 3. Under { Kc,)} and assumed regularity conditions, the cdf of the estimates of /3 and 6 are given by (i)
G , P T @ ) ( ~6, ) = 11-+W lim p{fi(bET - 0) F x l ~ ( , ) ) = lim n-oo
n-o3 lim
p{ h
( B , - P ) Ix ;Ln < L , ~ I K ( ~ ) }
p { f i ( D n- 0 ) L x
;LL L,~IK(,)}
= Q P ( x- 6 ;0 , CJ~B)H,-~ ($1
( a ) ;A’) (6.5.19a)
Chapter 6. Parallelism Model
298 where B = &lplL;
aP(x + Jz; 0, n2Ao)dGp(Z;0, u’Azz),
(6.5.19b)
+
E ( 6 ) = { Z ; O - ~ ( Z Js)’Ag.(Z + J6)2 x&-I(~)}. (6.5.19~) ap(x;p, E) is the pvariate normal distribution with mean p and covariance matrix C, and H,(-; A’) is the cdf of a noncentral chi-square distribution
where
function with m d.f. and noncentrality parameter A2/2.
-s
fi(gnS+
The asymptotic distributions of fi(pn- p ) and - p) are difficult to obtain but the asymptotic bias, MSE matrix, and weighted quadratic risk function can be computed writing
h~iii~ - (~= zAD,, ~ o l p )
1/2
-
(pn - ~
o l p)
cADzii2(bn-
an)^,'
and 6
1/2 -s+
~
(0,2
~
,- ~ o l p = )
J;EDT;,~,/’( P n - ~
O l p)
-1/2
C J ; E D ~ ~ , ~ -, / B~ n( P) L, i l
-
-{ (1 - CLi1)I(Ln < c ) } J ; E D ~(0, ~ ~- Bn).
(6.5.20a)
Similarly, we can write expressions for
(6ET- 0), 6DL:’2 (6: - 0) and fi
fiD;:” 6.5.4
(6:’
- 0). (6.5.20b)
Expressions for ADB, ADQB, ADMSE, and ADQR of the Estimators
Note that under (6.5.1) and (6.5.2) and the regularity conditions (6.5.10iiii), we can obtain the asymptotic distributional bias, MSE matrices, and the weighted risk expressions as follows:
Unrestricted Estimators. (i) b l ( p n ) = lim E n-m
{ fi p
(-n
- p ) IK
(ii) MI@,) = n-m lim E { n ( p n - P ) and
R1
Similarly,
(a,;W )
= 0% [wA22]
(4
=OandBl(pn)=O
( a n - P)’IK(n)} = a 2 A ~ 2
(6.5.21a)
6.5. Estimation of the Regression Parameters: Nonnorrnal Errors
and Rz(p,; W) = o2
1 W1’ P Q
E
+ G’WG.
299
(6.5.22a)
Similarly, (i) b2(6,) = lim E { &(en - O)lK(,)} = T o J G n-w
and
&(en) = U-’(S’J’TOA;~TOJG) = A*’;
(ii) M2(en)= lim E ( n ( 6 , - 0) (6, - e)’lK,,)} n-M
= 02(A;,
+ o-~ToJSG’J’TO)
+
and Rz(6,; W) = c~’tr[WA;~] (G’J’ToWToJG).
(6.5.2213)
Chapter 6. Parallelism Model
300
and R4(6:; W)
= a2tr[WA11] - a 2 ( p - 3)tr[W(ToJAzzJ’To)]
X{2E[X;:l(A2)1
-
(P - 3)E[X,541(A2)]}
+( P - 3)( p - 1)(G’J’ToWToJS)E [~;:3(
A’)].
(6.5.2413)
Positive-Rule Stein Estimators. (i) bs(b:+) = -JS{Hp+i(P-3;
A ’ ) + ( ~ - ~ ) E [ X ~ ~ ~ ( A ’ )>~ p(-X 3 )~] }+ ~ ( A ’ )
6.5. Estimation of the Regression Parameters: Nonnormal Errors
301
Chapter 6. Parallelism Model
302
Asymptotic Distributional Risk Properties
6.6
In this section, we provide the risk analysis of the estimators of 8 with the general quadratic loss function using the matrix W so that James-Stein and positiverule Stein estimators of p dominate. The properties of the estimators of /3 may be obtained following Chapter 5 with the ANOVA model.
6.6.1 Comparison of 6, and
bn
The risk-difference In this case, is given by
Rl(6, : W ) - Rz(6, :W) = a2{tr[W(All -A;,)]
- U-’(G’J’TOWTOJG)).
By Courant’s theorem,
Hence,
R l ( 6 , : W) - o’tr[W(All - A;,)]
+ ~’A’Chmin [ ( T o W T O ) A ~ ~ ]
5 Rz(6, : W) 5 Rl(6, : W) - u’tr[W(All - A i l ) ] +u2A2Chm, [(ToWTo)A22].
(6.6.1)
When A2 = 0, the bounds are equal. Thus, (6.6.1) means 6 , performs better than 6, whenever (6.6.2a) whereas
6,
performs better than
6, whenever (6.6.2b)
6, performs better than 6, in the interval For W = A;:, tr[Ag:(A1l and worse outside the interval. (O’ Chm, [ (TO&; To)A22)]
)
6.6.2
*
PT
Comparison of 6, * PT
Here we compare 8,
and
6,.
and en(8,)
The risk-difference In this case, is as follows:
PT
R l ( 6 , : W) - R3(On : W) = a’tr[W(All
- (~’J’ToWTO J6) { 2Hp+i(x-;
1( a ) ; A’)
- Ay,)]H,+i(x~-l(*);A’)
- Hp+3(x;-, ( a ) A’)}. ;
(6.6.3)
6.6. Asymptotic Distributional Risk Properties
303
Thus, the R.H.S of (6.6.3) is nonnegative whenever
. PT
In this range, 8, ever
performs better than
6,,
whereas
6,
performs better when-
t r [ w ( A l ~- A ; i ) ] H p + 1(xf-1 ( a ) A’) ; C ~ ~ ~ ~ [ ( T O W T O{2Hp+1(~;-1(a); )AZZ] A’) - H~+~(x;-~(cY); A’)}. (6.6.4b) ,.PT Under Ho, 8, is superior to since the risk-difference is positive for all a.
A’ 2
a,
. PT
We can describe the graph of R3(8, 0’
: W) as
follows: It begins with a value
t r [ W A l l ] - o’tr[W(Aii - A;l)]Hp+i(xg-l(a); 0)
(6.6.5)
at A’ = 0, then increases, crossing the risk of 6, t o a maximum, and then drops gradually toward u’tr[WA,,] as A’ 4 co. By setting a = 0, it can be shown that PT
R2(an: W) = R3(6, A
:W).
(6.6.6a)
For cr = 1, however, we obtain
~ ~ ( : 6w), = ~ ~., PT ( 8 : w). , ..PT and 6, . Both are superior to 8,
NOW,we compare 6, In general, the risk-difference is given by
~ ’ ( 6: ,w) - ~ = -u’tr[W(All
(6.6.6b) under Ho : /3 = polp.
PT
~ ( 8 : w) , - A;,)]
+
[I - H , + 1 ( ~ i - ~ ( a ) ; A ’ ) ] (G’J’ToWToJG)
~ ( 1 2Hp+1(xf-1(a);A2) + Hp+3(x:-1(a);A2)}. PT
Thus, 8,
(6.6.7)
performs better than 6n whenever
t f [ W ( A l l - A;,)] El - HP+l(x;-l(~);A’)] C h m m [ ( T o W T O ) A Z{~1]- 2Hp+l(X;-i(a); A’) + Hp+3(x~-1(a);A2)}’ (6.6.8a) while for
A2 5
tr[W(h11 - A;,)] [I - Hptl(x;-l(~);A’)] Chmin [(ToWTO)AZZ] { 1 - 2Hp+1(~;-1(a); A’) + Hp+J(x;-1(a); A’)} ’ (6.6.8b) ,.PT 6 , performs better than 8, . Under Ho, the risk ordering is given by
A’ 2
- PT
~ ~ ( :6w) , >~ ~ ( 8 , >~ : w) depending on the size of a.
~ (: w6 ) ,,
(6.6.9)
Chapter 6. Parallelism Model
304 -S
6.6.3 Comparison of 8, and
en(en)
-S
In order to compare 8, and 6,, we consider the risk-difference given by Rl(6, : W) - R4(6: : W)
(P- 3 ) E [xi21(A2)]
= a2( p - 3)tr [W(TO JAZZJ‘To)]
The risk-difference is positive for all W such that
+
p 1 tr[W(ToJA22J’To)] 2 2. C L , [(ToWTo)Azz] -S
Thus, 8, uniformly dominates
6,.
Further, as A2 -+ 03,
: W) -+
R&
(6.6.11)
R , ( e , : W).
-S
To compare 8, and 8,, we can write R4 (6: : W) = R2 (
6,
-(G’J’ToWToJG) +[l-
(p
-
: W)
+ a 2 t r[W(ToJAZZJ’To)]
a 2 ( p - 3)tr[W(ToJA22J’To)] ( p - ~ ) E [ x ; ~ ~ ( A ~ ) ]
+ ~)(G’J’T~WT~JS)
(6.6.12)
2a2A2tr[W(ToJA22J’To)]
Under Ho, this becomes
R4(6z :W)
=
&(en
:W )
+ a2tr[W(ToJA2zJ’To)]
x ( 1 - ( P - 3 ) 2E
x-4 p + l (A2)] 2
,
(6.6.13a)
: W)
while
R2(6, : W) = R,(e, : W) - o2tr[W(ToJAz2J’T0)] 5 Rl(6, : W). (6.6.13b) -S Thus, 6 , performs better than 8, under Ho. However, as 6 moves away from the origin, o-2(G’J’ToWJ6) increases, and the risk of 6 , becomes unbounded -S
while the risk of 8, remains below the risk of 6, and merges with it as A -S -s Thus, 8, dominates 8, outside an interval around the origin.
-+
00.
6.6. Asymptotic Distributional Risk Properties
305
,.PT
-S
Comparison of 8, and 8,
6.6.4
PT
-S
Now, we compare 6 , and 6 ,
. First note that under Ho,
HP+l(x;-l(a);o) > (P - 3) -S
(6.6.15)
- PT
for some a. The risk of 6 , is smaller than that of the risk of 6 , when the critical value ~ ; - ~ ( o l ) satisfies the opposite relation in (6.6.15). This means -PT
-S
that 6 , does not always dominate 6 , follows: . PT
R z ( 6 , : W) < R3(6,
under
Ho.We can order the risks as
< R4(6: : W) < Rl(8, : W)
: W)
(6.6.16)
xi
when satisfies (6.6.16). This picture changes as A’ moves away from 0. As A2 -+ 00, the risk of PT -S 6 , and 6 , converge to the risk of 8,. For a reasonable a-value, the risk of A
., PT
-S
6 , is smaller than the risk of 6 , for a , satisfying (6.6.15) and p 2 4. Thus, none of the estimators dominates the other uniformly.
-
Sf
Comparison of 8,
6.6.5
-S
-S
,.PT
and On,6,, 8,
,.S+ . The risk-difference is given by
Finally, we compare 6 , and 6 ,
R.F,(~:+ : W) - R4(6: : W) = -u2 tr[W(ToJA22J’To)] x E [ ( 1 - (P - 3)~,;2l(A~))~1(xZp+l(A’) < P - 3)]
-(G’J’TOWT~JG)(~E[(~ - ( P - ~ ) X , ~ ~ ~ ( A ’ ) ) I ( X<~p+- 3 ~ )(]A ~ ) - E [ ( l - (P- ~ ) X , ; Z ~ ( A ’ ) ) ’ I ( X ; + ~ ( A <~P) - 3)]}.
(6.6.17)
The R.H.S. of (6.6.17) is negative since the expectation of a positive random variable is positive, and we get (0
< x;+j(A2) < P - 3) =+ ((P - 3)x,S2j(A2) - 1) 2 0.
(6.6.18)
Thus,
E[((P - ~)X;:~(A’)
-
~)I(x;+~< P - 3)] 2 0.
(6.6.19)
Chapter 6. Parallelism Model
306 S+
Therefore, for all (8’,/3’)’, R5(8, write
Rl(8,
: W)
:
W) I R1(8: : W ) . Further, we can
- S+
2 R4(b: : W) 2 R5(8, : W)
for all A’ E ( 0 , ~ )(6.6.20)
and for p 2 4. - S+ - S+ To compare 6 , and 6 , , we first consider the risk of 6 , under Ho:
,.S+
R5(8,
: W) = Rz(6n : W )
{ p-1 2
-
+ a2tr[W(T0JAzzJ’To)]
E[(1- ( P -
-3)]}
2 R z ( 6 , : W),
(6.6.21)
since
E [ ( 1 - ( P - 3)x;f1)’1(~;+1 -2
2
5 E(1- (P -~ I X , + ~ ) Thus,
b,
under Ho. However, as 6 moves away s+ becomes unbounded while the risk of 8, remains
6,
S+
Finally, we compare 8,
{
(6.6.22)
s+
-
below the risk of 8, and merges with it as A’ 6 , outside an interval around the origin.
x
2
= p-l.
performs better than 6 ,
from zero, the risk of
R5 (6,
< P - 311
- PT
: W) = R3(6,
..PT
and 8,
-+
00.
,.s+ dominates
Thus, 8,
. Under Ho, we obtain
+ a2tr[W(ToJWJ’To)]
: W)
-)
(Hp+l(x;-l(a);o) - P - 3 P-1
-E
m - (P-3)X,;2JZ1(x;+1 < P
- PT
L R3(8, :W)
4 I j
(6.6.23)
for all a , satisfying the relation { a :f f p + l ( X ; - 1 ( 4 ; 0 )
,.s+
The risk of 8,
>P - 3 +E[(1- (P-3)x,;21)2qx;+1 < P - 3 ) ] }
*
P- 1
(6.6.24) ..PT is smaller than the risk of 6 , when the critical value ~ ; - ~ ( a ) S+
satisfies the opposite inequality in (6.6.24). Thus, 8, does not always domi- PT nate 8, when HO is true. We can order the risks under HO as
Rz(in : W) <
R~(GY W) < R ~ ( B : +W) < R~(G;W) < ~ 1 ( 8 ,w). :
:
:
:
(6.6.25)
6.7. Asymptotic Distributional A!fSSmatrix Properties
,.PT
The position of R3(8,
: W) may
-s
307 AS+.
shift in between Rz(6, : W) and R5(8,
.
W) t o in between R4(8, : W) and Rl(6, : W). Thus, the ordering under HO may change to ..s+ R2(6, : W) < R5(8, : W) < R4(6: : W) < : W) < Rl(6, : W), (6.6.26) depending on the size of a. The picture changes as 6 moves from the origin. Note that the risk of 6, is constant and the risk of 6, becomes unbounded as -PT - S ,.s+ 6 moves away from 0 while risk of 8, , 8,, and 8, converges t o the risk of . PT s+ nor 8, dominates 6, as 6 moves farthest away from 0 . Thus, neither 8, each other uniformly.
&(67
-
6.7
Asymptotic Distributional MSE-matrix Properties
In this section, we discuss the property of the asymptotic distributional MSE matrices of the estimators. Comparison of 6, and
6,.
Mi(8,) - Mz(6,)
Consider the difference of the MSE matrices
= 02(A11 - A;,)
-
(6.7.1)
(ToJ66’J’To).
This expression is positive semidefinite if for any nonzero vector 1 we have (6.7.2) Now, (6.7.3a) and max o-2
e
e’(ToA22To)e
5 Ch,,,(J)
= 1.
(6.7.3b)
If A2 < 1, then 6, performs better than 6,; otherwise, 6 , performs better than 6%.The asymptotic efficiency of 6, relative to 6, is given by
AMRE(6,
:
6,)
= =
11,
,1
-1lP
- AT;{ToJA2nJ’To - o-2T~J66’J’To}/
-11,
-
A ; ; { T ~ A ~ ~ ( J- c r - 2 ~ ; . ~ s s r ~ / )(6.7.4) ~o}I
where J is positive semidefinite with rank p - 1 while u-~AY~(JSG’J’)is positive semidefinite with rank 1 and the nonzero characteristic root of this matrix is o-26’J’A;i J6 = A2.
308
Chapter 6. Parallelism Model
Thus, (6.7.4) exceeds 1 whenever A’ < 1 and falls below 1, whenever A’ > 1. Further, if A’ --+ oc), (6.7.4) tends to zero, showing that 6, loses its efficiency as A’ moves away from the unit interval.
..PT
Comparison of 8, is given by
and 6,. In this case, the difference of the MSE matrices
Mi(8,) - M3(6ET) = a2(ToJA22J’To)H,+i(x~-l(a); A’)
-(ToJ6s’J’To){2Hp+1(~~-1(~); A’) - Hp+3(X:-1(Q);A’)}. (6.7.5) For a given nonzero vector
e, we obtain setting R.H.S. of
(6.7.5)
2 0 as
t r ( ~ o ~ 6 a l ~ rI~ o )e’(ToJAzzJ’To)H,+i(~:-l(a); e A2)l {2Hp+1(x:-l(a);A2)
- Hp+3(x;-1(a);A2)}.
Dividing by l’ToA22T& and maximizing over all
..PT
Thus, 0,
performs better than
otherwise,
6,
6,
e, we obtain
(6.7.6)
whenever
PT
performs better than 8,
PT
. The AMRE(6,
-
; 8,) is given by
AMRE(6zT : 6,) = 11, - a2Ac/ ( T o J A ~ ~ J ’ T o ) H ~ + ~
A’)
-AI;~(TOJGS’J’T~)(~H,+~(X;-~(~); A’) - Hp+3(&1(a); A’)}I-’lp. (6.7.8) This is a function of A2. It has its maximum at A2 = 0, then decreases crossing the 1-line to a minimum, and then increases as A’ increases toward 1-line.
-s
Comparison of 8, and by
6,.
In this case, the MSE-matrix difference is given
Mi(6,) - M4(bE) = ( p - ~)O~(TOJA~~J’TO)(~E[X~~~(A’)]
- (P- 3 ) E [xG:~ (A’)] }
- ( P - 1)(P- 3) (ToJ66’J’To) E [
xi&( A’)].
(6.7.9) For the MSE difference to be p.s.d., we must have for every nonzero vector t , ( p - 1)(e’ToJ66’J’Tol) E [x;:~(A’)]
5 ~ ’ ( T O J A ~ ~ J ’ T O ) ~ ( ~ E[X ( p~-~3)E[x;:1(A2)]}. ~(A’)]
(6.7.10)
6.7. Asymptotic Distributional h;lSE-matrixProperties
309
Dividing by t’(ToA22To)t and maximizing over l , we have ( p - l)A2E[XpS43(A2)] 5 2E[x;$1(A2)]
-
( p - 3)E[x;:1(A2)]
(6.7.11)
or
E [x;: 1(A211 L P E [x;:1 which does pot hold for all A2. Hence, 0, does not dominate
8,
(A213 ,
uniformly, as we have seen with respect
-, S to risk analysis. The asymptotic mean-square relative efficiency AMRE of 8 compared to 6, is given by
AMRE(eE : 6,) = 11,
-
A
S+
M4(0:)
(6.7.12)
-S
and 8,. In this case, the MSE-matrix difference equals
,. S+
- M5(8,
(ToJA~~J’T~)[~E[x;$(A~)]
+ (P - 1)(ToJ66’J’To)E[x,-t4,0l)j-l/q
- ( p - 3)E[x;;1(A2)]]
Comparison of 8,
( p - 3)A;;’{
)
=C~~(TOJA~~J’TO - )( p E-[ (~I) X ; ~ ~ ( A ~ ) ) ~ I ( X > ; +p~- (3)] A~)
+(ToJdh’J’To){ E[(1 - ( P - ~ ) X ; ; ~ ( A ~ ) ) ~ I ( X ~>+p~ (A 3)]~ ) +2E[(1 - ( P - ~ ) X , - , ~ ~ ( A ~ ) ) I ( X > ; +p~-( 3)]} A ~ ) 2 0. S+
(6.7.13)
-S
Hence, 8, dominates 8, uniformly. Furthermore, A
,.S+ ) 2 0
Mi(8,)
- M4(8,
M4(6:)
. S+ - M5(8, )
and
2 0.
(6.7.14)
MI(&),
(6.7.15)
So we can order the MSE matrices as
M5(6:+)
4
M4(0:)
4
where 4 means “domination”.
-s+ -s
: 8,) is then given by : 8,) -S = 1 1, - c~~{M~(~:)}-~(ToJA~~J’To)
The AMRE(8,
AMRE(h:+
Chapter 6. Parallelism Model
310
Under Ho, (6.7.17) becomes
P-3 a 2 ( A i i -A;,) - -0 P- 1
2
2 (ToJAzzJ’To) = - - - - - ~ ~ ( A i i -A;,). P-1
(6.7.18)
-s
The expression (6.7.18) is p.s.d. Hence, b, performed better than 6, under Ho. In general, the R.H.S. of (6.7.17) is negative-semidefinite(n.s.d) if for every nonzero vector f?, we have
t‘(Aii -(P
- A;,)t
- 3)f?’(ToJA22J’To)f?{2E:[x~~~(A2))I - (P - ~)E[x;:~(A~)]}
-e’(ToJAzzJ’To)t{ (P - l ) ( p - 3)E[x&(A2)] - 1) 5 0.
(6.7.19)
Dividing by t’ToA22Tot and maximizing over all 1, we obtain
1 - (P- 3){ 2E:[X,S21(A2)I- ( P - 3)qX;:l(A2)]}
-A2{ (P - 1 ) b- 3)E[x;:1(A2)]
- 1)
50
or
“S
Thus, 0, dominates 6, whenever (6.7.20) is satisfied; otherwise, 6 , dominates -S
-S
8,. Thus, neither 6, nor 6, dominates the other. The asymptotic efficiency -S of 6, relative to 6, is then given by
where AMRE(6: : 6,) and AMRE(6, : 6,) are given by (6.7.16) and (6.7.4), respectively.
6.8. Confidence Set Estimation: Normal Case -S
, ,
311
PT
Comparison of 6, and 6, . We may write the MSE-matrix difference as
~(63 ~ ~ ( 6 5 1Q2(nll ~) =
-
-ny,)[~,+~(x~-~(~);~2)
-(P - 3){2E[X;:l(A2)]
- ( P - 3)E[X;:1(A2)]}]
+(ToJ~~’J’T [(P o-) I ) ( P
- {2E [x;:l
>I - E Ix;A
(A
- 3)E[x&(A2)]
>I}I.
(6.7.22)
(A2
The expression (6.7.22) is p.s.d. whenever
PT
Hence, 6 ,
(6.7.23) -S -S is superior to 6 , whenever (6.7.23) is satisfied; otherwise, 6, is
. PT
superior to 8, . -S PT The asymptotic efficiency of 6, relative t o 6, is given by (6.7.24) where AMR,E(6; : (6.7.8), respectively.
8,)
and AMRE(6ET : PT
Similarly, we can compare 6, and are given in the problems section 6.12. A
6.8
en) are given by
..s en, 6,
., S+
and 6,, 6,
(6.7.12) and
..PT and 6, , which
Confidence Set Estimation: Normal Case
So far, we have considered point estimators of the intercept and slope parameters in the parallelism model from preliminary test and Stein’s perspectives. However, along with the point estimators, it is important t o provide some information on confidence sets. This section is set out t o study this problem.
6.8.1
Confidence Sets for the S l o p e Parameters
In this section, we consider the confidence set estimation of the slope parameters, p = ,,Bp)’ when it is suspected that p = p o l p holds in the parallelism model. Consider, again, the model (,&,.a.
-
Y , = doIne
+ pax, +
E,,
(Y
.
= 1,.. , p ,
(6.8.1)
N,, ( 0 ,a21,_) as stated in Section 6.1.1 and o2 is known. Thus, where E, we consider several estimators of p of the type P;C, = APn + (1,
- A)Png(Ln),
(6.8.2)
Chapter 6. Parallelism Model
312 where
L,
=
a - 2 B k ( ~ -p A)’D;;
(I, - A)&
(6.8.3)
and g(C,) is a nondecreasing function of the test-statistic for testing the null hypothesis HO : = p o l p against the alternative H A : P # polp. Here we consider A t o be
(6.8.4)
I, - A is an idempotent matrix of rank p - 1. We limit g(C,) to the following selected functions: (i) g(13,) = 1, then ,B: = (ii) g(L,)
= 0,
then ,f3: = P,;
(iii) g(&) = I ( L (iv) g(L,)
=
p,;
PT
> ~ ; - ~ ( a )then ) , PA = Pn
-s
(1 - cL;’), 0 < c < 2 ( p - 3) then 0: = P,;
and (v) g(L,)
;
=
(1 - cL;‘)I(L, > c ) , then ,Bz
=
- s+
P, .
(6.8.5)
Following Chapter 5 (Section 5.9.1), we define the five confidence sets as
We like to provide the properties of the coverage probabilities of these five sets. As before, let r = (r1,rz)be an orthogonal matrix, where is p x p - 1 and I’2 is a pvector such that = I, - r2I’;.Also, r diagonizes the symmetric idempotent matrix D:i2H’D:i2HD:i2 of rank ( p l), meaning rD2, 112H‘Dii2HD:i2r’ = Further, choose r2 = 0 0
(
).
6.8. Confidence Set Estimation: Normal Case
313
(6.8.7) where q1 = Hence,
$ r ; ~ i i /and ~ a772 = 'a r2 f2 ~2 - 1 / 2 ~ '
2
a-211P - P:I/*;:
2
= 11711 - w1g(11w11I2)II
+ (772 - w2)2 .
(6.8.8)
We can then write the coverage probability of the set C*(P7tl)=
{ P : a-211P -a$),
I XJY)) 2
(6.8.9)
as
PV{"772 - w2)2 + 11711 - ~ 1 ~ ( 1 1 ~ 1 1 1 2 ) 1 152x;(d} 1 =
I
x;(Y)
Pv{llrll
- wlg(l/w1112)112I (X;w
- t)+}dHl(t,O),
(6.8.10)
where H,(- : A') is the cdf of a noncentral chi-square distribution with v d.f. and noncentrality parameter A2/2. Specifically, we may write the coverage probability as follows: (i) If g(I[w11[2)= 1, then by definition
1 - y = Po{ (772 - w2I2 + 11711 - will2 I x;(Y))
= Hp(x;(m).
(6.8.11)
(ii) If g(j[wlj(2)= 0, then we have
(6.8.12)
314
Chapter 6. Parallelism Model
6.8.2 Analysis of Coverage Probabilities
Clearly, P ( C o ( p n ) = ) 1-7, which is a constant for all A '. Next, we note that
W R ( B , ) ) = Hl(x;(r) - A2;0),
(6.8.16)
which is a decreasing function of A ', has a maximum (11-7) at A' = 0, and decreases to zero when A ' 4 x;(y). The coverage probabilities of P ( C o ( p n ) ) and P ( C R ( B n )are ) equal when A' = A; where
A;
= x;(y) - Hcl(l- 7 ) .
Now, consider the confidence set ability is given by (6.8.13),
Hdx;(r)
+b
(6.8.17)
PT
CpT(p, ). In this case, the coverage probA
- A2;o)HP-l(X;-l(a); A2)
X;h)
~77{llrll
-w11I2 5 (x;(Y) - t)+;IIw11I2> x;-l(..))dHl(t,O),
and we have the following theorem: similar t o Theorem 5.9.1: Theorem 1.
(6.8.18)
6.8. Confidence Set Estimation: Normal Case
315
Chapter 6. Parallelism Model
316 or
A I llw1 /I + I(x;(Y) - t )+ I 1/2 . We then have
I($(?)
(6.8.24)
+ Xp-i(a) 1. < k i I I + /(x;(r)- t)+I‘/’.
- t)+I1/’
(6.8.25)
Thus, from (6.8.23) we have the R.H.S. as Hence, j/wlj12 > X:-~(CY).
{ llrll = Pql{
-
w1 112
(712 -
= Po{ ( 7 2 -
< (x;(Y)
42 + 11%
- t)+;IIw1
-0
1
112
5
I? > x;-lW}
x;w; ll
Wl
112 > x;-1w}
+ 11171 - w1 Il2 I x;(?.)} = 1 - Y-
(6.8.26)
This completes the proof. The graph of the coverage probability of CPT(p, ( a ) )as a function of A2 may be described as follows: As function of A2, the coverage probability is decreasing in the range 0 5 A2 5 xi(y) with a maximum a t A2 = 0. At A’ = $(y) there is discontinuity, and it drops to *
1
X;(Y)
PT
2 ~~-~(a)}dHl(t,O), (6.8.27) Prll{X;-15 (x;(Y)- t ) +.,xP-l(A2) 2
then increases as
A’ increases in the range
x;w
< A2 < ( X P ( Y ) + xp-l(4)2,
(6.8.28)
and eventually goes to 1 - y as A exceeds the limit
(XP(Y)+ Xp-1 Next, we consider the coverage probability of the set case, we have
P{CSf(p:+(c))}
Cs+(p, (c)).In this A
S+
= Hl(x;(r) - A2;0)Hp-1(c;A2)
5 (x;(y) - t ) + ;llw11I2> C}dHl(t,O)2 (1 - 7). (6.8.29) The first term on the R.H.S. of (6.8.29) is decreasing as a function of A’, and the second term is increasing in A2 and for all values of A’ the coverage prob. s+ ability is a t least (1 - 7). Hence, Cs+(p, (c)) uniformly dominates Co(B,) in coverage probability.
6.8. Confidence Set Estimation: Normal Case
317
Before we prove the assertion (6.8.29), we define the functions
(6.8.30a) and
(6.8.3013) . s+
Theorem 2. Cs+(pn(c)) has higher coverage probability than Co(p,) for all A2 such that 0 < c < CO, where co is the minimum of the two unique solution
Ml(G (x;(Y)
- t ) + )= 1
and
(6.8.31)
Proof. We have to show that for every 0 < b2 < x:(y),
HI(Xg(Y)
- A2;0)H,-1(c; A2)
It is sufficient to establish that M l ( c ; b) 2 1 and Mz(c; b) 2 1 for all c E (0, CO) and b2 E (0, $(y)). First, we prove that Ml(c;b) 2 1 for b E (O,.X;(Y)). Note that MI (c; b) for each fixed value of b is decreasing in c. Hence, it is sufficient to establish MI (c*;b) 2 1 where c* satisfies MI (c*;xp(y))= 1. Note that a2
-logMl(c*; b) < 0. db2
Chapter 6. Parallelism Model
318
0
logM1 c; b is strictly decreasing in b. Hence, M l ( c * ;b) is This is to say, (1) either strictly decreases to zero in b or (2) strictly increases to a unique maximum and then strictly decreases to zero. The first case does not hold, since
(6.8.33)
=1-y.
6.8.3
Confidence Sets for the Intercept Parameters when o2 is Known
In this section, we propose some confidence sets for the intercept parameters for the model (6.8.1) when it is suspected but not certain that Ho : p = pol, holds. Following Section 6.8.1, we define the five confidence sets given by the expressions below, assuming o2 is known:
We can write the confidence sets compactly as
6.9. Confidence Set Estimation: Nonnormal Case
319
b, + ( 6 , - b,)g(L,),
g ( L , ) is given by (6.8.2) and L, = o-~PLH’D;~H@,.By a suitable transformation (see Section 6.1.3), we then find that
where 6; =
O-2t18
- 8;112 = ( r l 2 -
4 + 11771 - wlg(llw1t12)112~
( Ei ) - { ( ;l ) ( ‘xl )}.
where Np (6.8.35) is then given by
;
P(c*(8:))= P{ 1171~ - wlg(itwl
(6.8.36)
The coverage probability of
tI2)P+ (72 - 4 I: X;(Y)},
(6.8.37)
which is the same as (6.8.10). Hence the details of Sections 6.8.1 and 6.8.2 are applicable t o evaluate properties of the confidence sets with center 8,, -PT - S . S+ 8, , On,and 8, , respectively, which are described in Section 6.8.3 replacing PA by 8:.
a,,
6.9
Confidence Set Estimation: Nonnormal Case
Now, consider the five estimators in a compact form as in Section 6.8.1:
Chapter 6. Parallelism Model
320
where A, = lplbDT;/nQ -+ lplbAi;/Q as n -+ m. Then, under {K(,)} as n -+ m, by Theorem 1 of Section 6.5.3, we can write
where (6.9.5) Thus, under {K(,)} as n -+ m, we obtain the following asymptotic coverage probabilities according t o the choice of g( Ilwl
1. If g(llw1112) = 1, then (6.9.4) becomes po(ll771 - 4 12 + (v2 - W 2 I 2 5 x;(Y)}.
(6.9.6)
32 1
6.10. Nonparametric Methods: R-Estimation
These expressions are similar to that of the expressions in Section 6.8.1. Hence, the analysis of the coverage probabilities is similar to that given in Section 6.8.2.
6.10
Nonparametric Methods: R-Estimation
In this section, we consider the nonparametric methods of estimation of the intercept and slope parameters of several simple linear models. These applications enlarge the scope of the theory of shrinkage estimation bringing it to a broader class of estimators with robustness properties.
6.10.1 Model, Assumptions, and Linear Rank Statistics Consider, again, the set of several simple linear models given by
Y, = &ln, +paxa + E,,
CY =
1,. .. , p ,
(6.10.1)
where
with the cdf defined by
and P
n,
(6.10.3)
Chapter 6. Parallelism Model
322
2)
+
Let n = n1+ . . . np and A,’ = Diag (%, . . . , . It is assumed that { E , ~} are mutually independent and identically distributed with cdf F(e1, . . . ,c P ) defined by (6.10.3), where PO(.)belongs to the class F of absolutely continuous cdf with absolutely continuous pdf, fo(-) having finite “Fisher information” as follows:
(6.10.4)
ni (iii) lim - = Xi (0 < X i < l), meaning, limn-wA;l n-iw n Diag(X1,. . . ,X p ) .
= A-’ 0
=
(iv) Score functions a,(.) (and a:(.)) are generated by a function ~ ( z L ) u , E (0, l ) , which is a nondecreasing, skew symmetric (i.e., 4(~)+4(1--2~) =0 for all 11 E (0, l)),and square integrable. Let ~ + ( z L )= 4
(F),E (0, l ) , and set ZL
and a:(k) = E[4+(Ukn)]or
4’
(&),
k = 1,... ,n,
(6.10.5)
where 0 < U1, 5 ... 5 U,, < 1 are the order-statistics of a sample of size n from U(0,l). Let R,j(a,, b,) be the rank of (Y,j - a, - b,zaj) among (Y,l - a, - b,z,i), . .. , - a, - bazan,) and similarly, let R,+J(a,,b,) be the rank of {/Yaj - a , - b,zajl among l(Yal - a, - b,z,l)l,. . . ,I(Y,,, a, - boz,,_)l for a = 1 , . .. , p . Let the vector of linear rank-statistics (LRS) be
where
and
where (6.10.6b)
323
6.10. Nonparametric Methods: R-Estimation Further, let
c P
=
j=1
Ln,(b,)
(6.10.7)
= lbLn(b).
We will use the LRS above for the estimation and test of hypothesis-related problems regarding the intercepts and the slopes.
6.10.2
R-Estimation and Test of Hypothesis
As in Chapters 4 and 5, we define
and
Y(+, 4) =
/
1
0
(6.10.8)
+r(2h)d(2h)dZh.
We let na
A:_ = (n, - 11-l C ( a n , ( k ) - an,>2 k=l with sin_ = n,lCLrl(an,(k) and A: = (n-p)-'
(6.10.9a)
xE=l(n,-l)A:,.
Further,
=C(~cr~--n~)~, ~ = 1 , . - ,P. n,
Qn,
(6.10.9b)
j=1
Note that given x, = (z,~,. . . ,zan_)',L,_ (b,) \ b, for b, E (--00, m) and T,_(a,,b,) \ a, for fixed b,. Under the model (6.10.1) with 8, = pa = 0, Trim (0,O) and Lnm(0), both have distributions symmetric about 0. First, we note that from the basic theorems of Hajek and Sidak (1967) in =0 Chapter 4, we have that under 6, =
where A,'
= Diag(X1,.
.. ,Ap)
and A;
= Diag(XlQ1,.
. . ,X,Qp).
(6.10.11)
6.10.3 Estimation of the Intercepts 6, and the Slope We consider two unrestricted estimators of 8, and and (6.10.6b), respectively, as follows:
,&= -21 [ sup{b,
: Ln_(b,)
0, based
> 0} + inf{b, : L,_(b,) < O } ] ,
on (6.10.6a) (6.10.12a)
Chapter 6. Parallelism Model
324
1
[ sup{aa : Tn, (aa 2
gna =
7
> 0) + inf{aa : Tn, (aa
Pn,
7
Pn,
p under p = polp is defined by
The restricted estimator of
1
Boon= 5[ sup{b : ~ i ( b > ) 01+ infib : LA(^) < 013. Hence, we denote the unrestricted estimators of 8 and
. . . ,JnP)’, and
6, = (g,
p, = (p,,,...
< 011. (6.10.12b)
(6.10.12~)
p by
respectively. (6.10.13)
,,&)I,
Let us denote the restricted estimator of 6 by
6n,
-
1
= ( 6 n l , - - -i o n p ) 7
(6.10.14a)
where
in,
=
1 2
{aa : Tn,(aa;h:on) > O }
-[SUP
+ inf {aa : ~n,(aa;h n > < o } ] (6.10.14b)
p = polp by p n = & l p . Test of Hypothesis: ,B = polp.For the test of the null hypothesis Ho : p =
and that of
p o l p ,we use the nonparametric test due to Sen (1969), which is based on the statistic P- 1
Ln =
C~ ~ ~ A L Z Q ; :
[ ~ n , ( o n ) ] ~ ,
(6.10.15)
a=l
where A i m and Q,, are given by (6.10.9a) and (6.10.9b), respectively. Under the null hypothesis Ho : p = ,&lp, L, is approximately distributed as a chisquare with ( p - 1) d.f. This allows us to define the PTE, SE, and PRSE of p as follows: (i)
bZT = P n - ( P n - B n ) I ( L n
(ii)
b: = P, where
(iii)
E,,
- d ( p n - B,>L,’
-+
0 as n -+ 00, and
< X;-,(~Y,>, I(L > E n ) ,
d =p -3
bf+ = 6, - (1 - ~ L Z ~ ) I (>Ld, ) ( ~ -, f i n > ,
(6.10.16a) (6.10.16b)
(6.10.16~)
respectively. Thus, we have defined five possible estimators of to generate five possible estimators of the intercept parameter 8 of the model (i) UE : 6 , = (ii) RE : hn =
(&, . . .
,en,)’;
(6.10.17a)
(en,,.. . ,en,,)’,;
- PT (6:?, . . . ,enP -PT ) ,
(iii) PTE : 8, =
I.
(6.10.17b) (6.10.18a)
6.10. Nonparametric Methods: R-Estimation
-s
(iv) SE : 8, =
325
.
(s,Sl,.. . ,6,Sp)',
(6.10.19aj
where
6.10.4 Asymptotic Distribution of the R-Estimators of the Slope Vector First, note that by JureEkova's (1969) asymptotic linearity result (see Section 2.8 of Chapter 2) we have under ,B = 0, as n --+ 00 and R > 0,
{
sup n-'/2/Lna(n-'/'b,)
- L,, (0)
+ X,n'/2b,Q,y(T,!I,
$)llb,l
50 (6.10.21)
so that
(6.10.22)
Also,
(6.10.23) Further, unde; 8 = p = 0 as n --+ sup
{
00
n - 1 / 2 p n a ( n - 1 / 2 ( a a , 6,))
+Xan1/'(aa
for k (> 0), we have - T,_(O, 0)
+ baza)r(T,!I,q5)Ila,l 5
(See Chapter 3, Section 3.10).
R,
[be[i R }
5 0.
(6.10.24)
326
Chapter 6. Parallelism Model
where
pn =
cLQa&, P
(Q)-l,
(6.10.25)
+ %(I).
(6.10.26)
a=l
n1/2QBon)Y(+,4)
As a consequence, we have
boon = D n + Op(1).
(6.10.27)
Further, from (6.10.21) we have under p = 0,
+
n-l/’L,(O) = A22n1/2P,~(+,$) op(l).
(6.10.28)
Hence, (6.10.29)
or
Thus, in vector form, we write
Hence, under p = 0 we write
L, = n-l A,2[Ln(0)I’JA22J’[Ln(O)1 + oP(l ) ,
(6.10.32)
6.10. Nonparametric Methods: R-Estimation
327
1 1'
where J = I, - Y A Y : and JA22J' is an idempotent matrix of rank ( p - 1). Thus, C, asymptotically approximates the central chi-square distribution with ( p - 1) d.f. We recall that under P = 0 as n + co, V = N,(o, A$;;),
-
n '/'L,(o)
fi(P: -P) = f i ( p nwhere 0: = 0, , 0, and 0, . Hence, we consider a
and under fixed alternatives, C,,
P ) + op(1) as n
--f
00,
--f
m as n APT
-+ 00
-s
and
S f
class of local alternatives
K(,,) : P ( ~ ) pol, =
+ n-'j26,
(6.10.33)
such that 6'&;1, = 0. We further note that this class of alternatives is contiguous to that of HO : P = polp. Thus, we have the following theorem:
Theorem 1. Under {K(,)} and the assumed regularity conditions with respect to the model (6.10.1) as n + 00, we have (i)
f i ( P n - P01p) N
(ii) (iii)
-Pn)
where
-Np(J6,02JA22);
-
fi(Bon -po)~,
(vi) n-oo lim
Np(6,c2A22), o2 = A$/r2(+,4);
~ ~ (0 2 6 ~, ) B, =
{c,,5 Z I K ~ , )=} H , - ~ ( z ;a2);
1 1' J = I, - >A;;
and A2 = (~-~(6'Ai;d).
Q
pn
p,,
(6.10.34)
PTOO~. First note that and are both translation-invariant estimators, and hence, we can assume without loss of generality, P = pol,. Also, by (6.10.12a) and (6.10.12~)together with (6.10.10) and the linearity results (6.10.21) through (6.10.23), we have n'/21pn-01 = O p ( l ) and n'/21fin-01 = O,(l), while under {K(,,)},n'/'pn = ,801, b = O(1). Thus under {K(,,)}, n'/2Pn = ,801, + 0,(1) and n1/2fin= pol, O,(l). Observe that under H' : 6 = P = 0 by (6.10.12a-c), n-'12Ln(0) = AY;n'/2fin-y(+,(6) + op(l). Hence, utilizing the contiguity of probability measures under KTn, : 0 = 0, P - ,801, = n-'126 to those under H', we obtain under Kin, as n 00, that n-'12Ln(0) has the same distribution a5 n-'12Ln(6) under
+
---f
+
Chapter 6. Parallelism Model
328
H*. Thus, by (6.10.10) n-'/2L,(n-1/2(6) is asymptotically pvariate normal with mean-vector A;i6y($, 4 ) and dispersion matrix A;&.. .-Thus, under
- Np{&16d$, $1, 4).
{K(,)}, n-lI2L,(O)
Hence, n1I2(P, - Polp)
A&2).
Np(6,g2A22), a2 = A$,/r2($,
-
Similarly, by the contiguity of probability measures under K:,, t o those un-
der H ' , we obtain under KT,, as n --+ co,n1/2,b, = hl' n-1/2L,(0)y-1($, Q
P
o p ( l ) .By a similar argument as before, we have fi(P, -Polp) B = lplL/Q. By similar steps, we obtain under {K(,)}, (iii)
TZ'/~(P,- P,)
+
= (A22 - ~ l p l ~ ) n ' / 2 L , ( 0 ) ~ - 1 (o~p ,( ~ l ))
-
NP(J6,a2JA22).
Further, under {K(,)} as n
(6.10.35)
00,
A22n'/2L,(0)7-1 ($, 4 )
n1/2(D, - P o l p n'/2(& -
-
+)+ Np(6;a2B),
P,)
N N2,
{(
6!
),
(
g2
A22 A22 J'
JA22
JA22
+OP(1)
)}
(6.10.36)
by the same argument as before. Similarly, under {K(,)} as n 3 co,
(vi) To prove limn-m P { L , 5 Z I K ( ~ ,= } H p - l ( z ;A2), we note that under (0 - Polp) = 0, L, = nA~[nL,(0)]'JA22J'[Ln(0)] o p ( l ) ,which approximately follows a central chi-square distribution with ( p - 1) d.f. Also, under {K(,)}, as n -+ 00, we can write
+
L,
=
n(p, - ~ , ) J ~ A ; ~ J-@&J, + op(i).
(6.10.38)
By (iii) the expression above follows approximately a noncentral chi-square distribution with ( p - 1) d.f. and a noncentrality parameter A2/2 with A2 = 0-2( dfA; 6).
As a consequence of Theorem 6.10.1, we obtain the ADB, ADQB, ADMSE, and ADQR of the five estimators its given below.
For UE,
B,,
(i) bl(P,) = 0 (ii)
and
MI@,) = a2A22
For RE,
a,,
&(&) and
= 0;
R l ( P , ; W ) = u2tr(WA22).
(6.10.39a)
6.10. Nonparametric Methods: R-Estimation
329
330
Chapter 6. Parallelism Model x I(X;+~(A~) <
- 311
+ (Jfifi’J’) { 2E [ (1 - (P-3)~;:~ (A2))I ( x &(A’) ~ < p - 3)] -E[(1-
(P - ~ ) ( x , ; ~ ~ ( A ’ ) ) ’ I ( x ; + < ~ (PA-’3)]} )
and ., S+
R5(Pn
R4
;w)=
(6,; W) -g’tr(WJAzz)E
[ (1- ( P - ~ ) X $ ~ (A2))21(xg+1(A’)
+(firJ’WJfi){2E[(1 - (P - ~ ) X ~ ~ ~ ( A ’ ) ) I ( X ~<+p~-( A 3)]’ )
- E [ ( 1 - (P- ~ ) x ~ ~ ~ ( A ’ ) ) ’ ~ ( x &< ~ p( A - ’3)]}. )
(6.10.39e)
It should be observed that these expressions are similar t o those in Section 6.5.4, and the properties of the five R-Estimators of p are the same as those given in Sections 6.5 and 6.6.
6.10.5 Asymptotic Distributional Properties of the R-Estimators of Intercepts Recall from Section 6.10.2 that the unrestricted and the restricted estimators of 6 are those given by
-
On =
I
, .. . ,OnP)’ and 6,
= (&
enp)’,
, .. . ,
(6.10.40)
respectively, as defined by (6.10.13a) and (6.10.14a). We will consider the distributional properties of these estimators under the local alternatives {K(,l}. Theorem 2. Under {K(,)} and the regularity conditions for the model (6.10.1), as n -+03,
where
To = Diag(21,. . . ,z p ) ,
and (6.10.42)
6.10. Nonparametric Methods: R-Estimation
33 1
Proof. Assume, without loss of generality, that 8 = 0. Relation (6.10.24) and the definition of 8, imply that under {K(,)}, n1l28, is asymptotically equivalent to
A ~ ~ - ~ / ~ T o, )( ?O- ~ , ( +4) , - T~dl2P,. Using relation (6.10.21) and the definition of
p,, we have under {K(,]},
1 A o ~ - ' / ~ T , ( O O)Y-'(+~ , 4 ) - =To A22n'/2L,(0)y-1(+, +),
which under
Q Kin, : 8 = 0,(p - pol,)
(6.10.43)
(6.10.44)
= n-'l26 is the same as
[Aon-1/2T,(0, n-'/26) - To A22n1/2L,(n-'/26)]y-1(+l 4).
(6.10.45)
As in (6.10.21) through (6.10.24), under HOthe above is asymptotically equivalent to the random vector 1 A o ~ - ' / ~ T , ( O ,O ) ~ - ' ( $ J ~4) - -TO A22n-1/2L,(0)y-1(+74).
Q
(6.10.46)
Thus, the last expression has the same asymptotic distribution under both {K(,)} and Ho. Since the asymptotic mean is AT;6y($, d), Theorem 6.10.2 follows from (6.10.10). The next theorem relates to the asymptotic distribution of &(6, - 8 ) . Theorem 3. Under {K(,)} and the assumed regularity conditions for the model (6.10.1) as n -+ 00, we have
where
B=Diag(?,
...
".) A,
-
(7) ,
i , j = 1,... , p .
(6.10.48)
Proof. Assume, without loss of generality, 8 = 0. From the relative compactness of n1/2g,, the definition of 6, and the relations (6.10.21) through (6.10.24), it follows that
Chapter 6. Parallelism hdodel
332
(i) n W n - ( A ~ ~ - ~ / ~ T o)T-l(+, , ( o , 4) - ~ ~ n 1 / 2 f 1i:~0,) (ii) n 1 / 2 P n - & l P n l / 2 t : ( 0 ) y - 1 ( + , 4) 5 0, and (iii) y-l (4,4)n-'/'L,(Pn) -(n-'I2Ln (0)- - - A ~ ~ T L ' / ~ P 4)) ~~5 - ' 0. ( $(6.10.49) , The relations (6.10.48i, ii) imply n'/2e,
-(
1
A ~ ~ - ~ / ~ Toly(+, , ( o , $1 - V ~ o i p n - 1 / 2 ~ : ( ~ ) y - l 4)) ( 1 , 3 0, (6.10.50)
while the relations (6.10.48ii, iii) imply
n-1/2Ln(bn)- (n-'/2Ln(0)
- -=ToA~ilpn-1/2L:(0)) 1 -+ P 0.
(6.10.51)
Q
Next, the relations (6.10.49) and (6.10.50) imply that under {K(,)},
( n'!$!Ln-$:)
)
is asymptotically equivalent t o the random vector
A ~ ~ ~ ~ - ~o )/ ~~- ~T( 4) +~ , ( go1 ~,o i p n - 1 / 2 ~ ; Y(+, - 1 4) c'/~L,(O)
- & n - 1 / 2 A ~ ~ 1 p L ; ( 0 ) y - 1 (4) +,
But the asymptotic distribution of (6.10.52) under
. (6.10.52)
{IY(~)} is the same as
A ; ~ ~ - ' / ~ T , ( o )n1/2d)T-1(+, , 4) - & T ~ ~ , ~ - ' / ~-Ln-1/2d)y-1(+, ; 4) n-1/2Ln(-n-'/2d) - Ln-'/2A-11 L* (-n-l/Zd)y-l(+, 4) Q 22 P n
)
(6.10.53) under H' : 6 = p = 0 by the fact that the distribution of T,(a, b) under 6 = a,3!, = b is the same as that of Tn(6- a,P - b) under 6 = 0, p = 0, and similarly for L,(O). Using again (6.10.21) through (6.10.25), we find that the row vectors under H* have the same distribution as
A,1n-1/2Tn(0, O)T-'($, n-1/2Ln(0) - 1A-11 22
p
4 ) - &TOlpR-1/2L;(a)y-'(+, 4)
+ Tod - & ( T o l P ) ( d ' A T ~ l ~ )
,,,,-1/2
-K(o)7-1(1> 4) + -
&w+, 4)
~ ( A y i l ~ ) ( d ' A ~ i l p ) y4) (+,
(6.10.54) However, by the definition of {K(,l}, we have d/ATilp = 0. Thus, it follows immediately that the constant term converges to (Tod,y(+, 4)AT;d) as in the theorem. Now, by (6.10.10), it follows after straightforward computations that the asymptotic covariance of the vector above is given by the expressions in the theorem, that is, E* =
( %' G2 ).
Corollary 1. The random vector independent.
,/%(en- 6 ) and the test-statistic L, are
333
6.10. Nonparametric Methods: R-Estimation
Corollary 2. The statistic under {K(,)} is asymptotically equivalent to
-1L'
A-2 +
n
,(O)(A22 -
1
= AT2n-'Lh(O)(JA2zJ')L;(O)
=l,l;)L;(O)
Q
(6.10.55)
as in the proof (vi) of Theorem 6.10.1. Based on the procedure above, we have the following theorem:
Theorem 4. Under {K(,)} and the assumed regularity conditions, as n we have (i)
( &(Pn
- P)
&(&I
02 =
) - { ( :) (
'A where r2(414'
~
where A;, = A0
;
N2,
A11 = A0
a2
- 9 3 2 2
A22
+ ToA22To.
+ &(Tol,lkTo) and
A;, = -11 l'To. Q P P
4
00
)},
(6.10.56)
(6.10.57)
Similarly, we can write
TO n1/2fjn = A22n-1/2Tn(0, 0 ) ~ - ' ( $ 1 @) -lpl~n-1/2L,(0)y-1($,
Q
4) + oP(l)
+ TO(A22- ~Q l p l ~ ) ~ - 1 ~ 2 L , ( 0 )+~0,(1) -1(~l~) (6.10.59) = d 2 8 , + T ~ J ( ~ ~+/ o,~ (I),& ) = n1/28,
where J = I, - an;;. Hence, straightforward computation under {K(,)} Q leads to the theorem. 1 1'
Chapter 6. Parallelism Model
334
The results above show us that the test-statistic, 13, can be represented as 13, = nc- 2 P, - l JIA 2’- ~P, J”
+ op(l),
(6.10.60)
where a’ = A$/-y2(6,4). Consequently, we can write the PTE, SE, and PRSE of 6 as follows:
+ T ~ J ( ~ ~ / ~ P , ) I<( x;-l(a)) L, +op(l), -S (ii) n1/’8, = n1I28, + (p - 3)ToJ(n’/2P,)13;1 + o p ( l ) , (i) n1/26ET = n1/28,
and (iii)
n1/’8:+
= n1/’8:
(6.10.61)
+ ToJ(d/’&J[l
- (p-
~),c;~]I(< L ,p - 3) + op(l).
Using Theorems 6.10.3 and 6.10.4, we can determine the ADB, ADQB, ADMSE, and ADQR of the &Estimators of 8 under { K(,,} as follows:
Unrestricted Estimator (6,)
(i) bl(8,) = 0 and
&(8,)
= 0,
(ii) Ml(8,) = o’A11 and Rl(6,; W) = a’tr[Whll].
6.10. Nonparametric hfethods: R-Estimation
335
-S
James-Stein-Type Estimator (6,)
-s
(i) b4(On) = -(P - ~ ) T O J S E [ X , S ” ~and (A~)]
B4 (6:) = (P- 3)2(6’J’T~A;;T~6) { E [xi& (A2)]}2;
These expressions are similar to those of Section 6.5. Hence, the conclusions on the properties of the estimators in Sections 6.6 and 6.7 hold.
336
Chapter 6. Parallelism &lode1
6.1 1. Conclusions where A = I, -
337
ipi;A&’ Q
. Then we can see that under {K(,)},
5 x2,(r)}
= limPK(,){n~,211~ - 6:lli;; = Po{ 11%
- ~ l ~ ( l / ~ l l I 2+ ) 1(772 l 2 - w2)2 <
x2,(r)},
which is similar to the expressions for the coverage probabilities given in Section 6.8. As such, the asymptotic coverage probabilities of each of the confidence sets are available in Section 6.8.1 with the analysis in Section 6.8.2 and are left to the readers to peruse.
6.11
Conclusions
In this chapter, we considered estimations of the intercept and slope vectors of several sample linear models to test the hypothesis of the equality of the slopes (i.e., the lines are parallel). We presented the normal theory of point estimation and confidence set estimation in Sections 6.1 through 6.4 and the nonnormal theory in Sections 6.5 through 6.7. In Section 6.8, we considered the confidence set estimation. Finally, we studied the nonparametric methods, namely the R-estimation by which we expanded the scope of the application of these estimators. In every case we considered the unrestricted, restricted, preliminary test, and Stein-type estimators of parameters. Asymptotic theory dominates the analysis of nonnormal and nonparametric methods under local alternative to parallelism hypothesis. Also, we discussed the confidence sets recentered a t the five estimators.
6.12
Problems
1. Find the joint distribution of ((6,- 6)’,(6, - 6,)’)’ based on the Theorem 6.1.1. 2. How would you test the hypothesis HOI : 6 = 60l, against the alternative HA^ : 6 # BOl,. Define the test of HO after the test of H o :~p = p o l p against HA^ : # pol,. 3. Verify the bias, quadratic bias, MSE matrices, and risk expressions of Section 6.3. ., PT ..PT 4 . Define the improved estimation of the PTE, 0, , and 6 , and obtain their bias, quadratic bias, MSE, and risk expressions.
a,),
5. Verify the expressions of MRE(6, : en),and MRE(6; : 6,). 6. Consider the usual simple linear models
,.PT
MRE(6,
:
en), MRE(6:
:
338
Chapter 6. Parallelism Model
where Y , is a (n, x 1)-vector, ,l is a 1-vector of n,- tuples of ones, E , = ( e e l , - . - ,e,,,)', x, = (z,~,... ,zan,)'and E, are i.i.d. random variables h/(0,o2I,_).Estimate 8 when it is suspected that 8 belongs to the subspace 8 = 801,. Define the unrestricted, restricted, PTE, SE, and PRSE. Determine the bias, quadratic bias, MSE, and risk expressions of these estimators. 7. Refer to the model in Problem 6. Now, assume that the distribution of E, is
Define the unrestricted, restricted, PTE, SE, and PRSE when p belongs to the subspace p = pol, ( p = ( P I , . . . ,p,)'), and find their bias, quadratic bias, MSE matrices, and risk expressions. 8. Refer again to Problem 7. Define the confidence sets for ,B and 8. Can you find the expressions for the coverage probabilities of these sets? to { ( V Z -wz)' Ilql - w1g(11w1112)112),and 9. Transform a-2118: - OlIb;,. show that
+
P{~-211~:, -;;8 1 ; = P((V2 - W 2 Y
5 x;(Y))
+ 11171 - wlg(llwll12)112I x;(Y)).
10. Prove Theorem 6.8.1 for the confidence sets for 8. 11. Verify the expressions of the asymptotic biases, ADMSE, and ADQR of the R-Estimators of p and 8 under 12. Refer to (6.10.27). Show that the rank estimator of common slope ,&,and the average of the slope parameter ,& (i = 1,. .. , p ) are asymptotically equivalent in probability, meaning $, = ,&,lp+ op(l).
Chapter 7
Multiple Regression Model Outline 7.1 Model, Estimation, and Tests 7.2 Preliminary Test Approach and Stein-Type Estimation of Regression
Parameters Bias, Quadratic Bias, MSE, and Quadratic Risk Expressions Risk Analysis of the Estimators ILISE-Matrix Analysis of the Estimators Improving the PTE Multiple Regression Model: Nonnormal Errors Asymptotic Distribution of the Estimators Confidence Set Estimation 7.10 Asymptotic Theory of Confidence Sets 7.1 1 Nonparametric Methods: R-Estimation 7.12 Conclusions 7.13 Problems 7.3 7.4 7.5 7.6 7.7 7.8 7.9
The most important model belonging to the class of general linear hypotheses is the multiple regression model. It is useful in the analysis of scientific, engineering, agricultural, biostatistics, and econometric data among other applications. In this model, n independent ( p 1)-variate samples { ( z a l , ... ,zap)’,yala = l , . .. ,n} are considered such that ya JV(P,Z,~ . . . PPzap; 0’) for each fixed pvector ( x , ~ ,. . . , zap)’.The vec= (PI,.. . PP)’ is the vector of regression parameters of the model, tor and ( ~ ~ 1. ., ,.xap)’ are the design/predictor variables. The main use of the model is to estimate efficiently the regression parameter, P = (PI,. . . ,&)’ as well as to test the hypothesis Ho : HP = h for a given matrix H q x pand hqxl against the alternative H A : HP # h. Accordingly, we consider the unrestricted, restricted, preliminary test, James-Stein-type, and positive-rule Stein-type estimators. In addition, we consider the confidence set estimator
+
+
N
+
339
Chapter 7. Multiple Regression Model
340
of p based on the five estimators as the center of the confidence sets with fixed volume, and we study their properties. We also provide some asymptotic theory for nonnormal errors and R-estimators in a nonparametric setup.
7.1
Model, Estimation, and Tests
Consider the multiple regression model (MR,M)
Y
=xp+.E,
(7.1.1)
where Y is a n x 1 vector of response variables, X is an (n x p ) design matrix of full rank (i.e., X’X is invertible), P = (01,.. . ,&,)’ is the vector of regression parameters, and E = ( & I , . . . ,E,)’ is the ( n x 1) vector of errors distributed according to normal distribution N,(O, c21,), I, is the identity matrix of order n and u2 is the common variance of the error variables. For more details of MRM, see Graybill (1976) and Arnold (1981) among others.
1,
Remark 1. This model includes the psample model to test the equality of the population means. This is done by choosing
L,
x=[:
0
...
0 l, 0
... 0
...
... l n p
where ln3= (1,.. . , 1)’, a vector of n3 one’s, j = 1 , 2 , . . . ,p. Generally, the main objective of the multiple regression model is the estimation of the regression parameters and the prediction of response for a given design matrix. This is in addition to testing relevant hypotheses such as Ho : HP = h, where H is a ( q x p ) matrix of rank q and h is a ( q x 1)-vector of known constants. The hypothesis is relevant for many situations and includes the equality of the components of the vector p and the subhypothesis Ho : (/3i,pL)’= (pi,0)’. In this chapter,, we will pursue the objective of improving the traditional point estimator of p by combining the results on the test of HO : HP = h based on normal, nonnormal, and nonparametric theory. In addition, we will discuss properties of recentered confidence sets based on the improved estimators.
7.1.1 Estimation of Regression Parameters of the Model Consider the MRM given by (7.1.1). On the basis of the sample information and the model, the unrestricted estimator (UE) of p by the least squares (LS) or maximum likelihood (ML) method is written as
p, = (x’x)-Ix’y= C-IX’Y, c = X’X.
(7.1.2)
7.1. Model, Estimation, and Tests
34 1
The corresponding unrestricted unbiased estimator (UUE) of u2 is sz =
1
-(Y
-
m
+ +
x/~,)’(Y - XB,),
-
m =n -p,
(7.1.3)
where n = n1 + n 2 . . . np. Clearly, p, Np(p,a2C-l) independent of the distribution of ms:/a2, which has a central chi-square distribution with m d.f. Suppose now that in addition to the sample information and the model, we have some additional information described by the null hypothesis Ho : HP = h. If this hypothesis is true, then our revised least squares or ML estimator of p is given by I
p, = p, - C-lH’(HC-lH’)-l (Hp, - h). Under Ho, E(p,) = p. Further, p, Np(P,a2A), where A
-
c-l - c - ~ H ’ ( H c - ~ H ’ ) - ~ H c - ~ .
=
(7.1.4)
(7.1.5)
Remark 2. For
6,
- Np(p,u2N-’),
. . , Tp)’
= g, = (Ti,. j j 1 ) 2 . Further, for
H=
6, = 2, have
2,
-
[
1 1
-1 0
..
.
0 -1
..
. 0 ...
1
= n-’(n1jjl
..0 0
... 0
.
.
and s: = m-l
and
C zp= 1 En” J = l(Yv . -
h = 0,
-1
-
+ ... + npy,) = Y. Under Ho : p1 = ... = p p = p we
N ( p ,u 2 / n ) .
7.1.2 Test of the Null Hypothesis, HP = h In this section, we consider the test of the null hypothesis Ho : HP = h against the alternative H A : H,B # h. The test is given by the following theorem: Theorem 1. The likelihood ratio statistic for testing HO against H A is given bY
c,
=
(Hp,
- h)’(HC-lH’)-’(HB, - h)
9s:
(7.1.6)
342
Chapter 7. Multiple Regression Model
Under H A , L, has a noncentral F-distribution with (q,m)d.f. and noncentrality parameter A2/2, where
(Hp - h)’(HC-’H’)-l(Hp
A =
- h)
Proof. The likelihood ratio test is given by L o / L A ,where LO
(7.1.7)
02
(-f)
= &;n(a)-nexp
and L A = a,”(v%)-”exp
with
(-5)
(7.1.8)
Now,
and
1
-
,
a being a constant free of C,.
-
Thus, C,-defined by (7.1.6) is the LR statistic for testing HO against H A . Since p, Np(/3,0 2 C - ’ ) ,HP, N,(HP, 02(HC-’H’)) so that (HB, h) N,(H/3 - h,02(HC-’H’)). Clearly, (HB, - h)’(HC-lH’)-’(H,?ln h)/02 follows a noncentral chi-squared distribution with q d.f. and noncentrality parameter A2/2. Hence, under H A , L, follows the noncentral Fdistribution with (q, m ) d.f. and noncentrality parameter A2/2. The theorem follows.
-
Remark 3. It is easy to see that the psample test-statistic for the equality of means is given by
343
7.2. Preliminary Test and Stein-Type Estimation
Some distributional results involving the estimators, in the following theorem, which is easy to prove:
pn and an are given
Theorem 2. Under the assumed conditions, (i)
-
v::; = (p,- p) N,(o,0 2 c - 1 ) ;
(ii) V::; =
(p,- p ) - N , ( - C - ~ H ’ ( H C - ~ H ’ ) ( H-~h),u2A}
where A = C-I - C-lH’(HC-lH’)-lHC-l; (iii) Vy:), = ( f i n
-
-bn)
Np{C-lH’(HC-lH’)-l(HP
- h),a2(C-l- A ) } ;
C- H‘(HCIH‘)- (HP -h) - H‘(HC - I,‘) - (Hp- h) );n2(
CY-A)};
Vil’lHpn - h} = C-lH’(HC-lH’)-l(Hfin - h) -C-1H’(HC-1H’)-1(H/3 - h) and
E {Vi1)IVi3)}= [Vp’ - C-lH’(HC-lH’)-’(HP
- h)];
(viii) P ( L n 5 z) = G q , m (A2) ~ ; where Gq,m(z; A2) is the cdf of a noncentral F-distribution with ( 4 , m ) d.f. and noncentrality parameter A2/2.
7.2
Preliminary Test Approach and Stein-Type Estimation of Regression Parameters
In this section, we discuss the preliminary test approach t o shrinkage estimation in addition t o the empirical Bayes approach estimation of the regression parameters .
7.2.1 Preliminary Test (or Quasi-empirical Bayes) Approach Following Chapter 5, we write the preliminary test estimate (PTE) of PT
Pn A
=Pn
- ( P n - bn)1(Ln< ~ q , m ( a ) ) ,
p
as
(7.2.1)
Chapter 7. Multiple Regression Model
344
where Fq,m(a) is the upper cx level critical value of the F-distribution with (q, rn) d.f. PTE depends on the level of significance, a and t o make it independent of a , we defined a James-Stein-type estimator, similar to the ANOVA and parallelism model as (7.2.2)
,.PT
-s
Notice that the forms of P, and p n are similar where (7.2.2) obtained from (7.2.1) by replacing I ( & < Fq,m(cx)) by dC;' to make it independent of a, the level of significance. -s The estimator 0, may go past the estimator fin. Thus, we consider the positive-rule Stein-type estimator given by
which is a PTE based on
Pn and P,- swith critical value d. -PT
Now, consider the expressions for Pn
-S
S+
, 0,and Pn given below:
(7.2.4)
These expressions will be used to compute the bias, MSE matrices, and quadratic risks of the estimators. For an application of PTE in the design of linear models, see Brenda (1996).
Bayes and Empirical Bayes Estimators of the Regression Parameters
7.2.2
-
-
Assume that YIP Mn(X,B,021n)as given for the model (7.1.1) with the assumption that C = X'X is of full rank. Next, we assume P N p ( u ,T ~ V ) , where u is a pcomponent vector and V is a p x p nonsingular matrix and r2 is a scalar. Then, we have the posterior distribution of P given Y in Theorem 1 that follows.
Theorem 1. The posterior distribution of p given Y = y is
{ + (C +
Np
Y
$V-')
-'}
-'X'(Y - XY), o2 (C + $V1)
. (7.2.5)
345
7.2. Preliminary Test and Stein-Type Estimation
Proof. First, note that c121Tl
+ r2XVX’
] }.
r2xv
r2V
(7.2.6)
Hence, using the conditional formula, the distribution of p for Y = y is given by
Np{v+ 72VX’(a2I, + PXVX’)(Y - Xv),
r2v- r4VX’(a2I, + r2XVX’)-1XY}.
(7.2.7)
Now, by Theorem 10 of Chapter 2, we have
(21,
+ ?xvx’)-‘
=
1
(r-2)
= (r-2)
[Z
71, -
74 -x 04
-l
(-X’X 72
02
+ v-1)
-1
XI]
We have from (7.2.7), the following expression:
r2VX’(2In + r2XVXt)-1 7-2
=U2
=
[,,. VX’X(X’X + -v-
( c + F2v - l
O2 72
)
)
-l
3
x
X‘.
(7.2.9)
Hence,
r2VX’(a2I, + T2xvx’)-1xv CV = V
-
U2
(C
02 + ;zV-’)l.
(7.2.10)
The conclusion of the theorem follows from (7.2.7) through (7.2.9). Now, since V = C-l, the Bayes estimator of
B:
= E(PIY = y) = Y
p is given by
+ (1 - B ) @ ,
-
v),
(7.2.11)
+ r2 and the a posterior distribution of /3 given Y = y is (7.2.12) Np{ v + (1 I?)@, v), 2 ( 1 - qc-’).
where B = c12/c12
-
-
346
Chapter 7. Multiple Regression Model
To obtain the empirical Bayes estimator of p, the parameters v , r2 and o2 have to be estimated because they are unknown. By taking V = C - ’ , we
obtain the marginal distribution of Y as
Y
- N;,
(XY,x)
(7.2.13)
7
where
r, = 0 2 1 ,
-+ ?PX,
& = xc-lx’,
(7.2.14)
and PXis the orthogonal projection on the subspace spanned by the columns
of the design matrix, X .
+
The next theorem provides the complete sufficient statistics for (v,02 r 2 ,u 2 ) based on the marginal distribution of Y b y direct computation.
Theorem 2. Based on the marginal distribution of Y and the assumption Hv = h,
{ p,, (Hp, - h)’(HC-’H’)-’(Hp, is a complete sufficient statistic for (v,o2
- h),ss}
(7.2.15)
+ r 2 ,0 2 ) .
Proof. First, note that
+
(021n ~’Px)-l= o-2 [I, - (1 - B)Px] (7.2.16) Hence, from (7.2.12), the probability density function of the marginal distribution of Y is proportional to exp
1 { -s IIY
- Xvllh.}
(7.2.17)
*
By the Pythagorean theorem,
p - xv1/;* = IIy - XP&* + llXPn - xyII:*.
(7.2.18)
Since Xp, = X(X’X)-’X’Y = P x Y , where PX= X(X’X)-lX’, and using the fact that & X = X and Y - Xp, = (I, - &)Y,
(7.2.19)
we have llXPn - xvllc, = (XP, - XY)”I, - (1 - B)Px](Xfi, - X v ) 2
=
(123,
- v)’X’[I, - (1 - B)Px](Xfi, - XV)
= B ( p , - v)’C(p, - v )=
2
Blip, - vllC
(7.2.20)
7.2. Preliminary Test and Stein-Type Estimation
347
and
IIY - xa,11;*
= (Y - XP,)"I,
- (1 - B)Px](Y-
xa,,
= Y'(1, - Px)"In - (1 - B)Px](I, - P x ) Y
= Y'(I, - Px)'(I, - PxY) = IIY - Xp,l12. (7.2.21)
b,
= [I,-C-'H'(HC-'H')-'H]~,, where I,- C-'H'(HC-'H')-'H Now, is an idempotent matrix and I, - (1 - B ) P x is the orthogonal projection on the subspace, Rq.Assume h = 0. Then, by the Pythagorean theorem, we have
where
lip, - bnll; = (HP, - h)'(HC-'H')-'(Hp,
- h).
Therefore, from (7.2.16) through (7.2.21), it follows that
a-21jY-xv11;,
= O-2(n-p)s:+(a2fT2)-'
{/lan-b,ll;
+ IIb, - vlIc}. 2
(7.2.23) Now, by the condition of sufficiency and completeness for the regular exponential family, the theorem follows. Note that
E(P,) = E ( C - ' X ' Y ) and
Hv
E ( b J = IJ (Ha,
and
- h)
(7.2.24a)
+ T~)C-'.
Cov(p,) = Cov(C-'X'Y)
Thus, imposing the condition
=v
(7.2.24b)
= (a2
= h, we obtain
- Nq(O,
+
(a2 T~)HC-'H').
(7.2.25)
Therefore, it follows from (7.2.25) that 1 L* = (HP, - h)](HC-'H')-'(HD, - h) 02fT2
Q
xq. 2
(7.2.26)
+T
Using Theorem 7.2.2 and (3.7.25), we have the UMVUE of (a2 2)L;', where
L, =
a,
(Hp,
- h)'(HC-'H)-'(Ha, 9s2
- h)
~ as) ( q
-
(7.2.27)
and is UMVUE of u. Further, the best invariant estimator of a2 is msz/m + 2. Inserting the estimators above of v , (a2+ T ~ ) - ' , and o2 in the
Chapter 7. Multiple Regression Model
348
Bayes estimator (7.2.10), we obtain the empirical Bayes estimator (EBE) of P as (7.2.28) Note that by the P T E approach, we have the same estimator of P. Now, as L, 0, L,' -+ 00, the estimate will have values past fin. Thus, we restrict L, > to obtain the positive-rule Stein estimator (PRSE), --f
Notice that (7.2.27) and (7.2.29) are the same as (7.2.3) and (7.2.4), respectively. The reader may refer to Ali and Saleh(l99lb) and Ghosh, Saleh, and Sen (1989) for more information on EBE
7.3
Bias, Quadratic Bias, MSE and Quadratic Risk Expressions
In this section, we obtain the bias, quadratic bias, MSE matrices, and the weighted risk expressions of the five estimators of P.
7.3.1 Bias Expressions The bias and quadratic bias expressions of the five estimators of the regression coefficients are given in the following theorem:
Theorem 1. (i) bl(P,) = 0 and B1(B,) = 0.
(ii)
-C-'H'(HC-'H')-'(HP
b2(Bn) =
&(a,)
(7.3.1) -
h) = 6 (say) and
= &(HP - h)'(HC-'H')-'(HP - h) = 6'C6/a2 = A2.
(iii) b3(p:T)
=
-C-'H'(HC-'H')-'(HP
- h)G,+2,,(e,;
A2)
= -6Gq+z,,(L; A2),
where & = &Fq,,(a), (iv)
bl(b:)
and
PT
&(On ) = A2{Gq+z,,(!,; A2)}2.
-dqC-'H'(HC-'H')-'(HP = -WE[x;?2 (A2)I
=
-~)E[x~:~(A~)]
7.3. Bias, Quadratic Bias, MSE, and Quadratic Risks
349
Proof. (i) is obvious. For others using Theorem 7.2.2, Theorem 2.2.6, and equation (7.2.4), we obtain the following:
(ii) E ( 6 , - P ) = -C-'H'(HC-lH')-l(HP (iii)
&(fin) = A'. (7.3.2) E(P, -0) = -C- 'H' (HC-lH')-lE{(HB - h)I(L, I Fq,,(a))} - h) and
..PT
= -C-'H'(HC-'H')-'(HP
by Theorem 2.2.6. Thus,
..PT
- h)Gq+2,,($;
A2)
) = A2{Gq+2,,(!,; A2)}2.
S
(iv) E(b, - P ) = -dqC-'HC-lH')-l(HP
- h)E[&2(A2)],
-s and B4(/3,) = d2q2A2{ E [ x ~ : ~ ( A ~ ) ]by } ~Theorem 2.1.16. (v)
a:+ -a)=E[Bz-P]-
E
[-P, - P , ) W n
+ dE [ ( P , - B,)L;'I(L,
I 41
I 41.
The first term is given by (iv). The second term equals
-S(Gq+z,,(di; A'),
dl = -d. Y+2
Now, the third term can be written as
E
[(p,- b , ) L z l l ( L n 5 d ) ] = di6E [F>'2,,I(Fq+2,m(A2)
Collecting the expressions in (7.3.2) and S+
b4(&),
I di)]
I
we obtain the expression
b5(p;+). The expression for &(On ) is easily verified. In order to study the bias of the estimators, we consider the quadratic bias * PT AS . S+ expressions B ~ ( P , ) , ~ 3 ~ 31, ,B~(P,),and B~(P,1. clearly, is
B~(B,),
a,
Chapter 7. Multiple Regression Model
350 unbiased and the bias of
a, is-unbounded. The bias of P , s -
. PT
S+
size of a and A2. The bias of P, and P, establish the relation
o = ~l(p,)I ~
s ( ~ f5+~ ) 4
depends on the
depend on A2 alone. Thus, we can
..
PT ( ~ 5 z ~) 3 ( ~ n 5. ~ 2 ( B n )
(7.3.3)
PT
under certain conditions on a. Note that for a = 0, B3(Pn ) = B2(Pn)7while A
. PT
-s
for a = 1,B3(p:T) = Bl(p,). The position of B3(P, ) and B4(Pn)switches, depending on the value of GU.
7.3.2
MSE Matrices and Weighted Risks of the Estimators
The MSE matrices and the weighted risk based on the loss function
L(P*,P)= (P’
- P)’W(P*- P ) =
are given in the following theorem:
p*- PI[;
(7.3.4)
7.3. Bias, Quadratic Bias, MSE, and Quadratic Risks
351
Proof. (i) is obvious. The remainder of the results follow by Theorems 7.2.2 and 7.2.6 and formula (7.2.4).First, consider
(4W b , ) = E ( b , - P)@, - P)’ = E { @ , - p) - C-lH’(HC-’H’)-l(Hfi,
{(p, - 0) - C-’H’(HC-’H’)-l(Hb,
= 02C-’
+ C-lH’(HC-lH’)-lE{(Hfi,
’
’
’
’
- h)} - h)}’
- h)(Hfi, - h)’} x
-’ ’
(HC- H’)- HC- - 2 C - H’(HC H’)- x E{(HP, - h M , - PI’>
= 02c-l+ C-~H’(HC-~H’)-~{~~(HC-~H’)
+ (HP - h)(HP- h)’}(HC-’H’)-lHC-’ -
~D~C-~H’(HC-’H’)-’HC-’
= 2c-1 - a
+
2 ~ - l ~ ’ ( ~ ~ - 1 ~ ’ ) - l ~68’. ~ - 1
(7.3.6a)
The risk expression is given by
Rz(fJ,; W) = 02tr[WA]+S’W6,A = C-’
- C-lH’(HC-’H’)-’HC-l.
(7.3.6b)
352
Chapter 7. hdultiple Regression Model
M ~ ( B : ~=) E
(iii) =
( ~ , P-~p)(jfT
-
p)’
{ ( P , - p) - C-’H’(HC-’H’)-’(HP, x { ( P - p) - C-’H’(HC-’H’)-’(HP,
= E [ ( P ,-
x
-
h)l(L, < Fq,m(a)} - h)l(L,
< Fq,m(a))}’
p ) ( P , - p)’] - 2E[C-’H‘(HC-’H’)-’(HP,
- h)
(P, - P>’l(Ln< % m ( a ) I
+ EIC-lH’(HC-lH’)-l(HPn
- h)(HP, - h)’
x (HC-~H’)-~HC-~I(L,
(7.3.7)
Now, we use the following general theorem t o evaluate (7.3.7).
Theorem 3. Under the assumed conditions and continuous function g(C,), we have (i)
E{C-lH’(HC-lH’)-l(HP, - h)@, - P)’g(Ln)}
(ii) E { [C-’H’ (HC-lH’)-l(Hp, - h)(HP, -h)’(HC-’H’)-’HC-’] x S 2 W }
(7.3.9) by Theorems 7.1.1 and 2.2.7
7.3. Bias,Quadratic Bias, MSE, and Quadratic Risks
353
Now, let g(13,) = I ( & < Fq,m(a)).Then
= Gq+4,m(lz : A')
for s = 1,2. Applying Theorem 7.3.3 in (7.3.7) and choosing g(&) Fq,m(a)),we obtain
=
i(c, <
,.PT M3(Pn ) = o2C-' - 02(C-'H'(HC-'H')-1HC-')Gq+2,m(t,; A2)
+ 66' {2Gq+2,m(fa;5') - Gp+4,m(l:;
and
(ii)
(iii)
A')}
354
Chapter 7. Multiple Regression Model
7.4. Risk Analysis of the Estimators and the expression of
7.4
R5
(6;’;
355
(
. S+
W) follows as we compute tr WM5(P, )).
Risk Analysis of the Estimators
In this section, we provide the weighted risk analysis of the five estimators with the general loss function
L(P*;P)= (P* - P)’W(P*- P),
(7.4.1)
where W is a p.s.d. matrix. We impose conditions on W so that Stein-type estimators dominate the unrestricted estimator for all A2.
Comparison of
Bn and p, p,
It is clear that the risk of is constant, meaning a2tr(WC-’). However, the risk of depends on 6’W6, since
a,
&(fin; W) = o2t r ( W C - l )
- o2tr[WC-lH’(HC-lH’)-lHC-l]
+ 6’W6,
(7.4.2)
where
6 = C-lH’(HC-’H’)-’(HP
- h).
Note that C-1/2Hf(HC-1H’)-1HC-1/2is a symmetric idempotent matrix with rank q ( 5 p ) . Therefore, there exists an orthogonal matrix I’ such that
rC-1/2H’(HC-1 Hf)-l HC-’/2rf = and
Then
(7.4.3)
356
Chapter 7. Multiple Regression Model
(7.4.5)
= v;Allvl?
where
v = r C 1 / 2 P - rC-1/2H’(HC-1H’)-1h
=
Thus, using (7.4.3) and (7.4.4), we obtain
R2(,6,;W) = u2tr(WC-l)
- u2tr(Al1)
+ q:A1lql
(7.4.6)
By Courant’s theorem (see Chapter 2), we may write (7.4.7)
or a2A2Chmi,(All) I: qiA11vl 5 u2A2Chmax(All),
(7.4.8)
where Chmin(A1l)and Ch,,(All) are minimum and maximum characteristic roots of A l l and A2 = qiql/u2. Then, we write R l ( P , ; W ) - u2tr(A11) + Chmin(A11)
I: ~ 2 ( b n ; ~ ) I ~1 (P,; W) - u2tr(Al1) + Chmm ( A l l
7
=P,the bounds are equal. Thus, (7.4.9) means that P,
When A2 better than
whereas
P,
p, whenever
performs better than
For W = C, we see outside this interval
performs
b, whenever
a, performs better than p,, in the interval
p,
(7.4.9)
performs better than
,6,.
[O,q] and
357
7.4. Risk Analysis of the Estimators
..PT
Comparison of ,B,
and
p,
Consider the risk-difference Rl(p,; W) - R3@ftT; W) = 0’ tr(All)Gq+2,m(ea;A’)
- ( V / ~ A I I V I{2G,+z,m(L; ) A’) - Gq+4,m(e:; A’)}.
(7-4.10)
The r.h.s. of (7.4.10) is nonnegative ( 2 0) whenever
..PT
In this range 0, performs better than
- PT
p,
p,, whereas p, performs better than
whenever
For W = C , tr(Al1) = q, and the required intervals follow from (7.4.11a) and (7.4.1lb).
- PT is superior t o p,, -
Under Ho : HP = h, P,
since (7.4.10) is positive for all
..PT
a-values. We can describe the graph of R3(Pn ;W) as follows: At A2 = 0, the graph has a value equal to o2tr (WC-l) - 0’ tr(A11)Gq+2,m(la;O);
(7.4.12)
then it increases, crossing the line 0’ tr(Wf2-l) t o a maximum as A2 increases from 0 t o the value Aka. The graph proceeds to decrease towards PT
as A’ 4 00. For (Y = 0, R3(Pn ;W ) = R,(p,; W ) , and for . PT ..PT ;W) = R1(B,; W). The efficiency of 0, relative to is
0’ t r ( W C - l )
a = 1,
,.PT
PT
p,,
p,
we must obR1(Pn)[R3(P, )]-I. Since 0, is not uniformly better than tain a PTE, selecting an optimum level of significance a* with a minimum guaranteed efficiency, say Eo, by solving the equation max min E(a,A’) 2 Eo,
O < a < l A220
(7.4.13a)
where
E ( a , A ’ ) = R l ( p l : W) [ R 3 ( b r : W1-l.
(7.4.13b)
Chapter 7. Multiple Regression Model
358 PT
Comparison of ,B,
and
bn
Note that both PETand 0, are superior to the risk-difference is given by
B, under HO : HP = h. In general,
(7.4.14)
(7.4.15a)
,.PT is superior to p,
and P,
whenever (7.4.15b)
Under Ho, the risk of the three estimators may be ordered as
R ~ ( B , ;5wR) ~ ( P ~ ; LWR l) ( B n ; W ) .
(7.4.16)
Comparison of ,B, and ,fl, -8s
The risk-difference is given by
(7.4.17) The risk-difference is positive whenever (7.4.18) Note that A l l involves the matrix W. -s Thus, P, uniformly dominates Further, as A' tends to 0 from below.
p,.
4
co,the risk-difference
359
7.4. Risk Analysis of the Estimators
-s
b,
Comparison of ,B, and In this case, the risk of
-s
0,
may be written as
R4(bZ;W) = R2(P,;W)
+ a2tr(All)q;A117jl + CJ2~4tr(A11){( 4 - 2)E [x;:2(A2)I
}
(7.4.19) Under Ho, this becomes
R4(B:; W) =
&(a,;
W)
+ u2(1- d)tr(A11)
2 ~ 2 0 3 , ; w),
(7.4.20)
while
-s
6,
Thus, performs better than p, under Ho. However, as ql moves away from the origin 0, meaning A’ diverts from 0, and the risk of ,h, becomes
-s
unbounded. The risk of p,, however, remains below the risk of with it as A2 origin.
.+ 00.
-s
Thus, p, dominates -S
p, and merges
b,, outside an interval around the
PT
Comparison of 0, and ,B, First, under HO the risk of
L.
-s
p,
is given by
&(bf;W) = &(BET;W) + u2tr(Aii)[Gq+2,m(fa;0) - d] PT
2 R3(P, ; W )
(7.4.22)
whenever Gq+2,7n(Ca;O) > d-
(7.4.23)
That is to say, (7.4.22) holds whenever there exists an a such that
(7.4.24) -S
The risk of p, is smaller than that of the risk of satisfies the opposite inequality, that is,
. PT
p, when the critical value (7.4.25)
Chapter 7. Multiple Regression Model
360
-s
This means that the James-Stein-type estimator p, does not always dominate the PTE,
. PT
p, under Ho, and we can order the risks as - PT
(7.4.26)
Rz(P,;W) L R3(Pn ; W ) L R1(P,;W) when
Q
satisfies (7.4.24).
The r.h.s. is positive semidefinite, since the expectation of a positive random variable is positive by the definition of an indicator function, (0 < Fq+2,,(A2) < di)
-
(diJ’i:z,m(Az)
- 1) 2 0.
(7.4.28)
We get
E [(dlF,-:,,,(A2)
- 1)
I (Fq+z,m(A2)< dl)] 2 0.
(7.4.29)
Thus, for all A’, R5
s+
@+:
; W) L R4
(6:;
W),
-s
(7.4.30)
and 0, not only confirms inadmissibility of p, but also provides a simple -s and p, ..Sf superior estimator. In passing, we remark that the risks of ,On, can be ordered as
P,,
s+ R5(Pn ; W ) I R4(b:;W) L R1(Pn;W) for all A’, provided that
(7.4.31)
7.4. R,isk Analysis of the Estimators
Comparison of
36 1
a,, and p,
Sf
First, consider the risk of
p,S+
under
Ho,in terms of the risk of b,,
Rs(bzt; W) = Rz(,dn;W) + g2tr(A11){(1 - d ) -
E [(I
-
diF,-:,,,(A2))21(~q+z,m(0) < di)]
2 R2(,dn;W),
} (7.4.32)
since
5 E [(l - d1Fi:2,m(0))z]= 1 - d.
- s+
(7.4.33)
Clearly, ,8, performs better than p, under Ho. However, as v1 moves away from 0, A2 increases, Then the risk of becomes unbounded while the risk
- s+ of p, remains below the risk of ,8, . S+
pn
and merges with it as A' dominates ,8, outside an interval near the origin.
2 for all
{
bn
QI
QI
Thus,
(7.4.34)
satisfying the condition
: Gq+z,m(L; 0)
The risk of
&Fq,,(a)
W)
-+ 03.
-St.
p,
[
2 d + E (1 - dlF,-:2,m(o))2 I(Fq+2,m(0)< dl,] . PT
}.
(7.4.35)
when the critical value ! ,= .. s+ satisfies the opposite relation to (7.4.35). Thus, p, does not
always dominate under HO as
IS
smaller than that of /3,
,.PT
0,
under
Ho.We can order the risk of the estimators
Chapter 7. Multiple Regression Model
362
,.PT
the position of the PTE ,On
shifts from “in between” R2(Pn;W) and
-s R5(Pn ;W) t o “in between” R4(pn;W) and R I ( ~ ,W). ; That is to say, the . S+
risk order is
The picture changes when q1 moves away from the origin, 0. The risk of remains constant, and the risk of unbounded, as A2 moves away t o APT - S S+ infinity. Also, the risks of p, , P,&, and P, converge to the risk of fJ, as PT A2 oc). But for a reasonable value of A2 near 0, the risk of p, is smaller
p,
Bn
A
---f
s+
p, . S+ nor p,
than
6;
-
Bf
PT
fin,
as well as that of while the risk of 0, risk of at some intermediate values of A2 depending on a.
7.5
. PT
as well as for a satisfying (7.4.35) for q 2 3. Thus, neither P, -s S+ nor 0, dominate each other. However, the risk of p, is below the exceeds that risk of
p
MSE-Matrix Analysis of the Estimators
In this section, we study the MSE-matrix analysis of the estimators and determine their dominance properties.
Comparison of ,bnand
p,
The MSE (matrix) difference of
~ ~ ( 6- M&,) ,)
p, and Bn is
= G ~ C - ~ H ’ ( H C - ~ H ’ ) - ~H66’. C-~
(7.5.1)
e = ( e l , .. . ,e,)’
The MSE difference is p.s.d. whenever for a nonzero vector we have
(7.5.2)
c ’ [ M ~ ( p n-)Mz(Bn)]f 2 0. That is to say,
c 2 e ’ { ~ - 1 ~ 1 ( ~ ~ - 1 ~ ’ ) - 1>~e166‘e. ~-1}t
(7.5.3)
Hence, we have
e W e < ~’{c-~H’(Hc-~H’)-~Hc-~}~ e. ~
02e’c-1e
-
t’C-1.t
Therefore,
A2 = max
c
e’ { c- H’(HC- H’)- HC1‘66‘4 < max aze’c-le f?C-lt
}e 7
or
A2 F Ch,,[H’(HC-lH’)-lHC-ll
= 1,
(7.5.4)
7.5. h/lSE-Dlatrix Analysis of the Estimators
363
since C-1/2H’(HC-1H’)-1HC-1/2 is an idempotent matrix with maximum characteristic value equal to 1. Thus, (7.5.1) is p.s.d. if and only if A2 5 1. Therefore, performs better than when A2 5 1; otherwise, performs better than The range of the domination of over is bigger in the case of risk analysis. For W = C, the range of risk domination is [0,q] and q 2 1. The MSEbased efficiency of relative to B, is meaningless, since
6, 6,.
M R E ( ~ , ;6,) =
6,
6,
a,
pP - H’(Hc-~H’)-~Hc-~
p,
p,
- a - 2 ~ 6 ~ / j= - ”0,~ (7.5.5)
while the risk eficiency, which is meaningful, is given by
(7.5.6) For W = C , tr(A11) = q and tr(WC-’) = p so that (7.5.6) reduces t o PKP - Q )
+ a21-l.
(7.5.7)
6,
Thus, the risk efficiency of is greatest when A2 5 q, while the range of A2 is smaller for the MSE-matrix comparison. PT
Comparison of ,On The MSE difference of
,
a,, and ,h,
p, and p,. PT is given by
MI@,) - M3(,hET)= u2 (C-lH’(HC-’H’)-’HC-’) Gq+2,m(la; A’) -
SS’{ 2Gq+2,,(la; A’)
- Gq+qm(!;;A’)}. (7.5.8)
The MSE difference is p.s.d. whenever for any nonzero vector t we have
and consequently, we may write
-
~’{c-~H’(Hc-~H’)Hc-~}~ u2t’c - 1t
(7.5.10)
Hence, taking the maximum over all nonzero vectors t we obtain
(7.5.11)
Chapter 7. Multiple Regression Model
364 This means
PT
p,
p,
is superior t o
..PT
p,
in the range of A’ is given by (7.5.11),
p,
. PT
otherwise is superior t o pn . The range of domination of p, over is bigger in the case of risk analysis. For W = C, the range of A’ is 5 q. The PT risk-based efficiency of p, relative to is given by
p,
-1
x {2Gq+z,,(&; A’)
-
Gq+4,m(f:;A”)}]
. (7.5.12)
For W = C5(7.5.12) equals
[
P P - qGq+z,m(&;A’)
+ A’
{ 2Gq+~,m(e,;A’)
-1 -
Gq+4,m(G; A’)}]
(7.5.13)
. PT Thus, the risk efficiency of p, is greatest when
. PT
The MSE-based efficiency of p, MRE =
1,
relative to
.
p, is given by
(SET: p,) - { H ’ ( H C - ~ H ’ ) - ~ H C - ~ } G , +A’) ~,~(~~;
+ ~-‘C66’{2G,+~,,(!,; = [1 - Gq+z,m(&; A’)
A’) - Gq+4,m(t;; A’)}
I-”’
+ A ’ { ~ G , + Z , ~ (A’) ~ , ; - Gq+4+(!;; A’)}]-l
x {I - G,+z,,(e,;
(7.5.15)
..PT is greatest when
Thus, MRE of p,
which is smaller than (7.5.14). PT To compare p, and f i n we consider the MSE difference .L
~2
(Pn)
- ~3
(bET)
= -u’ {C-lH’(HC-lH’)-lHC-’} {I - Gq+2,m(!,;A’)}
+ 66’ { 1 - 2Gq+z,m(L;A’) + Gq+i,m(t;;A’)} .
(7.5.17)
7.5. MSE-Matrix Analysis of the Estimators
365
The MSE difference is negative-semidefinite (n.s.d.) whenever for any nonzero vector i! we have
L
e{ C- l ~ ’H(C - ~ H ’-) l ~ ~ - l } e e’C-le
which implies that the MSE difference is n d . whenever
(7.5.19) Thus, in this range
6,
,.PT
6,
also the range of superiority of risk comparison. PT The MRE(P, ; f i n ) is given by
..PT
p, ; otherwise, p, is superior. Here . PT over p, is smaller than the range for the
is superior to
bi, p, and bn -s We compare p, and B,. The MSE difference in this case is M1 (B,) - M4 (6:) = dqa’ { C-lH’(HC-lH’)-lHC-l} Comparison of
x
{ ( 4 - 2)E [X;:2(A2)l
--dq(q
+ 2A2E [XqS44(A2)11
+ 2 W ’ E [xqS44(aZ)l.
(7.5.20)
The difference is p.s.d. whenever for a nonzero vector 4 we have
dqa2{e’{C-1H’(HC-’H’)-’HC-1e} { ( q - ~)E[X;:~(A’)] +~A’E[x&(A’)]}
- dq(q
+ 2)(e’ss’e)E[~,S4~(A’)]2 0.
(7.5.21)
This implies that (after simplification)
qA2E [x;:4(A2)]
L
(Q - 2)E [X,s“’(A”,]
,
(7.5.22)
Chapter 7. Multiple Regression Model
366
-s
which does not hold for all A2. Thus, P, is not uniformly better than under the MSEscriterion. The MRE(pn;P,) in this case is given by
p,
-s -
MRE(Pn; Pn) =
/Ip- dqH'(HC-'H')-'HC-'(
(q - 2)E [ x , S ~ ~ ( A ~ ) ]
I-'"
+ 2A2E [ x ~ : ~ ( A ~ )}]+ dq(q + 2)0-~C66'E[x;$~(A~)] =
(1 - dq{ (4 - 2)E[x,:2(A2)]
+ dq(q + 2)A2E [x;:~(A~)] X
+ 2A2E[~L:2(A2)]}
)-' +
(1 - dq { (q-2)E [x,S42(A2)] 2A2E [X;:~(A~)]})
Similarly, the comparison of ference
M2
P---llP
.
(7.5.23)
-s b, and Pn is obtained through the MSE dif-
(bn)- M4 ( b f ) a2C-' H'(HC-'H')-lHC-'
--
X {
1 - dq(q - 2)E [xi:2(A2)] - 2dqA2E [x;:,(A2)]
+ as'{ 1 - dq(q + 2 ) E [x,&(A2)]
}.
Under Ho, the MSE difference is n.s.d. Therefore, ..s
Pn
Pn.
} (7.5.24)
b,, performs better than -s
at A2 = 0. Except for small intervals around 0, P, performs better than
7.5. hlSE-Matrix Analysis of the Estimators
. PT
367
(7.5.27) whenever A2 satisfies (7.5.27); otherwise,
-s
Thus, 0, performs better than 0, -s PT 0,performs better than p, . Neither estimator dominates the other. . PT Similarly, we can find the range of A2 for which 0, performs better than ., S+ ., PT . s+ 0, using the expression (7.5.26) and observe that neither 0, nor 0, dominates the other. MRE expressions can be obtained by straightforward computation. Next, we consider the MSE differences under Ho for the comS+ parison of 0, and n
p,: M5 (X+) - M2 (p,)=
o2 [C-lH'(HC-lH')-lHC-']
{ 1- d -
[
E (1-d1F;:2,m(0))2 I (Fq+2,m(0)< dl)]}. (7.5.28) This is p.s.d., since (1 - 4 2 E [(I - dlF;:2,m(0))2
6,
I (Fq+2,m(0)< d d ] .
(7.5.29)
. S+
This implies that is superior t o 0, under Ho. . PT A similar comparison of with 0, under Ho shows that
M5
(bt')
- M3 (PET)
= o~C-'H'(HC-'H')-'HC-I{ Gq+2,m(t,;A2) - d
(7.5.30)
368
Chapter 7. Multiple Regression Model
whenever there exists a set A of a such that
{
A = a : Gq+2,m(ta; A2) L d
+ E [ (1 - d1F>12,m(0))2I (Fq+2,m(0)< d l ) ] } .
(7.5.31)
..S+
PT
-Sf.
Thus, 0, is superior t o pn whenever (7.5.31) is satisfied; otherwise, 0, IS ,.S+ PT superior. Therefore, pn does not always dominate pn when the hypothesis Ho : HP = h holds. Hence, we can order the MSE matrices as ~i
(P.)
L ~4
(a:)
2
~
(XT)L 3
M (Bn) ~ , -s
(7.5.32) . S+
where 2 stands for M2(pn) - Md(bZ) 2 0, M4(Pn) - M5(pn ) 2 0,
Ms(B:+)
- M3(bET)L 0 , and
A
PT
M3(Pn
-
M2(bn) 2 0 .
The position of PTE may be changed to the order
M I (Pn) 2 M3
(BET) L M4 (Bf)
2 M5
(Bf')
2 M2 ( P n )
7
(7.5.33)
depending on the size of a.
Comparison of Finally, we compare
M4
-s
p, , p, and p, A
S+
B:+
and
-s
p,. In this case, MSE difference yields
(b:) - M5 (8')
= n2
{ [C- 'H' (HC-'H' )-'HC - '1
[
x E (1 - d l F q + ~ , m ( A I~ )(Fq+2,m(A2) )~
+ U - ~ M ' E[(I - d2F'5,m(A2))2
I (Fq+4,m(A2) < dz)]
}
+ 266'E [(d;F;22,m(A2)- 1) 1 (Fg+2,m(A2) < d ; ) ] 2 0. (7.5.34) -s The MSE difference is p.s.d. for all A2, and hence, b:+ dominates pn uni-s
formly. But, p, does not dominate ..S+ a n d p , as
b:
M4
fin
uniformly. We can order the MSE of
(pa) 2 M5 ( B a t ) .
(7.5.35)
7.6. Improving the PTE
369
Improving the PTE
7.6
,.PT
As in Chapters 4, 5, and 6, we consider the improvement of the PTE, pn of ,.PT as P when q L 3. We may write the improved estimator of p, PAnP T + =
for q 2 3.
,8:T - d (B, PT+
The bias vector of P,
an)CL'I (C, 2 Fq,m(a))
(7.6.1)
and the quadratic bias of this estimator are given
bY
(BET+)
b5
= b3
(a:')
+ b4 ( b f )+ dl6E [Fi22,m(A2)1 (Fq+2,m(A2) <
fa)]
(7.6.2)
and
B6
(B:")
= A2{diE [F,-:,,,(A2)I - dqE [x;:2(A2)]
(Fg+2,m(A2)< fa)] - Gq+z,m(fa A2)} ;
2
. (7.6.3)
The corresponding MSE-matrix and the weighted risk expressions are given bY
Ms
(a:")
= M3
(XT)
- a2dl [C-'H'(HC-lH')-lHC-l]
E{ (2F[:2,m(A2) - '266'[E{
- d1F>22,77z(A2))
(Fq+2,m(A2)
(2F$4,m(A2) - d2F$4,m(A2)) I
> fa)
} }
> f:)
(Fg+4,m(A2)
-2E (f7>12,m(A2)1(Fq+2,7n(A2) > fa))]
(7.6.4)
and R6
(BET+;w)
= R3
. PT (Pn ;W) - a2dl tr[WC-lH'(HC-lH')-lHC-l~
E{ -
(2Fi22,m(A2> - dlF[-,22,m(A2))
d1(6'W6)
[E( (2F$4,,(A2) -2E (F,-l2,,(A2)1
I (Fq+2,m(A2) > ea)
}
- dzF$4,m(A2)) I (J'q+4,m(A2) > e:) ( F q + 2 , m ( A 2 )> fa))].
}
(7.6.5)
Chapter 7. Multiple R.egression Model
370
. PT+
For W = C, the percentage improvements of p, the tabular values in Table 5.6.1 of Chapter 5.
7.7 7.7.1
over
- PT are similar to
0,
Multiple Regression Model: Nonnormal Errors Introduction
Consider the multiple regression model
y , = X,P
+en,
(7.7.1)
where Y , = (y1,. . . ,y,)’ is the vector of observable response, X , is an n x p matrix of known constants, p = (PI,.. . ,pp)‘is the vector of regression parameters, generally unknown, and e , = ( e l , .. . ,en)’ is the vector of i.i.d. random errors assumed to follow a distribution, F ( e ) such that E ( e , ) = 0
-
and E(e,ek) = a21, (a2< 00). Further, assume that max X ; ( X ~ X , ) - ~ X ~ o as n 4 00, (7.7.2) (i) ljijn where xi is the a th row of X,; (ii) lim (n-’(X; X , ) ) = C, finite and positive-definite matrix. n-Dc)
The basic problem of this section is the estimation of the regression parameter p when it is suspected that p belongs to the subspace defined by Hp = h, where H is a known q x p matrix and h is a q x 1 vector of known constants. Accordingly, we consider the unrestricted, restricted, preliminary test and Stein-type estimators of p.
7.7.2 Estimation of Regression Parameters and Test of the Hypothesis Based on the least squares principle, the unrestricted and the restricted estimators of p are given by
where C, = ( X k X , ) . We define the estimator of u2 by
~ 1 =4 ( n- P ) - ’ ( Y , - x , P , ) ’ ( y ,
-
LP,)
(7.7.4)
One may show that sz -+ o2 almost surely (see for example Sen and Singer (1993, p. 281)). For testing the null hypothesis Ho : Hp = h, we consider test-statistic L, defined by L, = sL2(Hp,
-
h)’(HC,’H’)-’(Hp,
- h).
(7.7.5)
Under Ho, L, follows a central chi-square distribution with q d.f. as n (see Sections 7.8.1 and 7.8.2).
4
co
7.8. Asymptotic Distribution of the Estimators
7.7.3
371
Preliminary Test and Stein-Type Estimation
As usual (see Chapters 5 and 6), we combine the unrestricted and the restricted estimators to obtain the PTE and Stein-type estimators of P as follows:
,.PT = P ,
P,
-
(P, - P , ) G
< C,,,),
(7.7.6)
where is the upper a-level critical value of the exact distribution of under Ho. The James-Stein-type estimator (JSE) is defined by
C,
(7.7.7) The positive-rule Stein-type estimator (PRSE) is defined by
- S+ = P , + (1
P,
-
kJ,l)wn > W ( P , - P,)
since C,
# 0 with
7.8
Asymptotic Distribution of the Estimators
probability one.
In order to find the asymptotic distributional bias (ADB), quadratic bias (ADQB), MSE-matrices (ADMSE), and quadratic risks (ADQR) of the estimators of 0, we need the asymptotic distributions of the various estimators and of the test-statistic, C,. As in Section 5.8, we consider the generic estimator P: and a positive-semidefinite matrix W defining the loss function
m:;P ) = 40:
- P>'w(P: - P>
= tr[W{n(P: - P)(P: - P ) / } ] *
Let
M(PT,)denote the
0: is given by
(7.8.1)
- P)(P:
- P)'}. Then the ADQR of
R ( P 3 = t"WM(P:,)I.
(7.8.2)
E{ n(P:
In many situations, the asymptotic distribution of & s,'(p: -P) is equivalent to the fio-'(P, - P ) distribution as n --+ 00 under fixed alternatives
Kc : HP = h + (.
Then, to obtain a meaningful asymptotic distribution of &s,'(P,', consider the class of local alternatives, {K(,)} defined by
K(,) : HP = h + T Z - * / ~ ( .
(7.8.3) - P ) , we
(7.8.4)
372
Chapter 7. Multiple Regression Model
Let the asymptotic cdf of &s,l(P:
- P ) under
G p ( x )= n-o;, lim pq,,{ f i s , l ( P :
-
{K(,)} be
P) 5 x } .
(7.8.5)
If the asymptotic cdf exists, then the ADB and the ADQB are given by
b(P:) and
= n-m lirri
E [ f i ( P : -P)] =
/
xdG,(x),
w:) .-“b(P:)I’Cjb(P:)i,
(7.8.6b)
=
respectively, where a 2 C 1is the MSE-matrix of
(7.8.6a)
P,
as n
-+ 03.
Defining (7.8.7)
we have the weighted risk of ,B: given by
Asymptotic Distribution of the Estimators under Fixed Alternatives First, we consider the asymptotic distribution of f i s , ’ ( P , - p) under the 7.8.1
regularity conditions given by 7.7.1(i) and (ii). Theorem 1. Under the regularity conditions 7.7.1(i) and (ii) together with o2 < co,as n -+ 00, fiS,’(Pn
- 0)
N,(O, c-1).
(7.8.9)
For proof readers are referred to Sen and Singer (1993, p. 280). Next, we consider the asymptotic distribution of the test-statistic Ln for testing Ho : HP = h under some fixed alternative hypothesis of the form
K t : HP = h + E, where
(7.8.10)
6 is a fixed vector in Rg.Then
+ (HP - h)} = sZ1H(Pn- p) + s i l t ,
sll(HPn - h) = S;’{H(P,
- 0)
and the statistic Ln can be written as
c,
=
s;2(pn - P)’H’(Hc;~H’)-~H(P,- p) + s;2(’(~~;1~’)-1t + 2 s i 2 ( P n- P ) ’ H ’ ( H C ; ~ H ’ ) - ~ ~ . (7.8.11)
7.8. Asymptotic Distribution of the Estimators
373
Under the assumed regularity conditions and the CLT, we obtain as n
J ; ~ ' ~ ; ~ ( H-B11)~ 2 N , ( J ; E w ~ ,( H C - ~ H ~ ) )
--f
CO,
(7.8.12)
and nal2<'(HC-'H')-l< -+ CO. Similarly, the third term goes to infinity as n -+ 03. Hence, it follows that Cn -+ 00 as n -+ 00. Consequently, for all hypotheses, K< : HP = h <, P { L n > z} -+ 1 as n --+ 03 for all x E R+. Then we have the following theorem:
+
Theorem 2. Under the regularity conditions and fixed alternatives, K t as n -+ CO,
-
Jis;'(b:T
(i)
p-) = J;;;a-'(P,
- P ) + OP(l),
(7.8.13)
Pro0f. Consider the quadratic-difference
< Cn+)}
Then limn-w E {
Since Cn
2 0 and C,
-+
03
as n
-+ 00,
= 0. Hence, (i) is proved. Similarly,
we have limn.-+wkn
--f
k and
Finally,
Theorem 7.8.2 shows that the asymptotic distribution of
&(a:
-
S+
p), and &(Pn
-
&(P,
PT -
P),
0 ) is given by Np(Oto2C-l),and the ADB,
Chapter 7. Multiple Regression &lode1
374
-PT - S ..s+ ADQB, ADMSE, and ADQR of the estimators p, , p,, and 0, are all
equal and are given by
and
~ ' ( 6:,W) = R
PT
~ ( P: W) ~ =~
-s
4 ( :p W)~ =
. s+ R ~ ( P: W) ~ = u2t r ( W C - l )
(7.8.14)
while Si2
-
".j/c,
2
= L +co
and E(C,) 3 cc as n -+ 00. Thus, the asymptotic distributions of are not equivalent as n -+ 00. Now,
( f i n - p) and &s,'(p,
= &s,'(p,
&s,'(p,
- p) - J;ES,'C-~H'(HC,'H')-'(H~ ,h)
+r }
- p) - J;E~~~C-~H'(HC,~H')-~{H(~, - p) (7.8.16)
implies that the asymptotic distribution of fis;'(B, the fixed alternative K r .
7.8.2
- p)
-P)
J;ES,'(P,
=
&s';
(7.8.15)
as n + w ,
-p) is degenerate under
Asymptotic Distribution of the Estimators under Local Alternatives, and ADB, ADQB, ADMSE, and ADQR
To obtain meaningful asymptotic distributions of the various estimators and the test-statistics, L, we consider the following theorem: Theorem 3. Under K(nl : HP = h + n-1/2< and the assumed regularity conditions 7.7.2(i) and (ii), we have the following as n co: ---j
VL') = fi(D,
-0) -Np(0,2(C-')).
(7.8.17)
VL2)= fi(fi,--P) Np(-6,a2A), where 6 = C-'H'(HC-'H')-'i$, and A = C-' - C-'H'(HC-'H')-'HC-'. N
VL3) =
&(p,
- $,)
N
Np(6, u2[C-' - A]).
375
7.8. Asymptotic Distribution of the Estimators
+
J,(@
aP(x+ z;o , ~ ( c --A~) ) ~ @ ~ o( Z, ~, ( H c - ~ H ’ ) ) ,
+
+
where E ( 6 ) = { Z : (Z c~)’(HC-~H’)-’(Z 6) 2 xi(a)},and aP(.; p, E) is the cdf of a pvariate normal distribution with mean p and covariance matrix, E, and H,,(-;A2) is the cdf of a noncentral chi-square distribution with v d.f. and noncentrality parameter A2/2. It is difficult to obtain the asymptotic distributions of
fi(pE+
-s
&(On
- P ) and
- p), but we can obtain an asymptotic representation of these
estimators under { K(,)} to facilitate the computation of ADB, ADQB, ADMSE and ADQR of these estimators. Hence, we have under {IT(,)},
c-~H’(Hc-~H’)(Hz
J;;(,h:+-P)
=Z-k
<)I
+[C-’H’(HC-~H’)-~(HZ + k - a-2(HZ + <)’(HC-lH’)-l(HZ 6) x I{u-’(HZ + <)’(HC-’H’)(HZ <) < k},
{
where Z
Proof. (-11)
+ <)
{ r 2 ( H Z+ <)’(HC-lH’)-l(HZ + 6)
2,
(ix)
-
+
NP(0,a2C-’) and k
+
= q - 2.
I
(i) See Theorem 7.8.1.
fi{ (p, - p) - C;lH’(HC;’H’)-l(Hp, - h)} = fi(p, - p) - c , ~ H I ( H c , ~ H ’ ) - ’ H [ ~ ~ ( P -,p) +
vL2) -
fi(p,-P)
-
=
Np(-6,a2A) as n -+ co. (iii)
vL3)= fi@, - p,) N
=
( C , ’ H ’ ( H C , ’ H ’ ) - ~ H [ ~ ~ /; ~0) ( ~+,
NP(6,a2[C-’ - A]) as n -, 00.
(iv) and (v) follow similarly. To prove (vi), we note that fis;’(Hfi, h)ENq(o-’<,( H C ’ H ’ ) ) as n 03. Hence, ---f
nsL2(HP, as n --+
00
- h)’(HC,’H’)-l(Hp,
- h)gxi(A2)
where A’ = o-’<’(HC-’H’)-’< = a-l6’C6.
-
Chapter 7. Multiple Regression Model
376
+(b,
Since reduces to
- p) and
&(p,
-
bn)are independent, the first term
H q ( x i ( a ) A’)@,(x+ ; 6; 0,o’A) as n -+ co. The second term is obtained by conditional arguments as n --+ 03 and is given by QP(x - C-’H’(HC-lH’)-lZ; 0, a’[C-l - A]) x d@,(z; 0 , ~ ( H c - ~ H ’ ) ) ,
where E ( 6 ) = { Z; (Z
+ 6)’(HC-1H)-1(Z+ 6) 2 x;(a)}.
Proofs of (viii) and (ix) are obtained by writing the expressions in terms of (HP - h) = nF1/’< and then applying the distribution of the related statistics. Based on the theorem above we can easily obtain the ADB, ADQB, ADMSE and ADQR of the estimators. First, we consider the ADB and ADQB of the estimators. Clearly,
(i) bl(fJn)= lim E ( h ( P , - 0)) = 0 and B1(bn)= 0, n-w
(ii)
bz(bn)=
lim E(&(p,
11-00
(iii) b3(bn)= lim n-oo
and
-P)}
E(&(b,-p)}
(7.8.18)
= -6 and B2(pn)= 0-’(6’C6) = A2, = lim n-oc1
E(-&(6n-bn)I(Ln < Ln,a)}
= - d H , + ~ ( x ; ( a ) ;A’), since
---f
X ; ( C Y ) as n -+ co,
B3(btlT)= A2 {Hq+z(X;(a); (iv) bd(b;) = lirn E { n-w
&(b: - P ) } = -k
= -k6E[x$(AZ)]
and
Similarly,
,
&(a:)
lim E {
n-oo
h ( P n - b,)&’}
k = lim k, = (q - 2) n-w
= k’ A’ { E
[xi:’
(A’)]}’.
7.8. Asymptotic Distribution of the Estimators
377
We present the expressions for the ADQR and ADMSE of the estimators together with the ADMSE using Theorem 7.8.3. Notationally, they are similar to the expressions given in Section 7.3.2.
(i)
R1(Bn;W) = a2 tr(WC-l)
and MI@,) = a2C-l.
(ii) R2(Bn; W) = a' tr(WC-l) - a2tr(Al1)
M ~ & ) = a 2 ( ~ - 1 - A) + M',
+ q;Allql
(7.8.19) and
= WC-'H'(HC-~H')-~HC-~.
(iii) R3($ET;W) = a' tr(WC-') - a ' tr(A11)Hq+2(x,2(a);A2)
+ (rl;A11771){ 2 ~ q t n ( X : ( a ) ; A2) - Hq+4(x34;A"} and PT
M3(Bn ;W) = a2C-'
- a2[C-l - A]Hq+2(x;(a);A2)
+a&' {2Hpt2(x;(a); A')
- HP+l(x:('y); A",>.
378
Chapter 7. Multiple Regression Model
is a symmetric idempotent matrix with Since C-1/2H’(HC-1H’)-1HC-1/2 rank q ( 5p ) , there exists an orthogonal matrix such that
The matrices
A11
and
A12
are of order q and p - q , respectively. Hence,
7.8. Asymptotic Distribution of the Estimators
379
Chapter 7. Multiple Regression Model
380
7.8.3 ADQR Analysis First, compare is given by
P, and a,. The asymptotic risk efficiency of a,
relative t o
P,
7.8. Asymptotic Distribution of the Estimators
381
then
Under Ho,AR.RE(b,;
a,)
equals
However,
whenever
>
17/1A11171-$Al). APT
-
Next, the ARR.E(P, ;0,) is given by
(7.8.26)
Chapter 7. Multiple Regression Model
382 Under Ho, we have
and under non null situations, we have (v;Allvl
).-2
> tr( A l l ) 4 + 2 (xg( a ) ;A2) < t r ( W C - l ) ( 2 H , + z ( x ~ ( a ) ;A2) - Hq+4(x34;A"}.
-
(7.8.31)
p,
., PT
Since PTE, p, is not dominating uniformly, we can select a PTE such that it has at least a pre-specified asymptotic efficiency, say, Eo,by selecting an optimum level of significance, a o p t . This is achieved by solving the equation rnin E(a,A2) = E ( a , A2
= Eo,
(7.8.32)
+
where E ( a ,A') = [l g(A2)]-'. The resulting ( @ O p t , A') provide the minimum guaranteed ARRE for the application of PTE. -s -s Now, compare p,, and 0,. We can rewrite R4 (/3,,; W) as
5 R1(pn;W) for all (A2;W), and
q
2 3,
such that
(7.8.33) The risk-difference is R4&; W ) - Rt,(bE+; W) = o2{tr(Aii)E[(1 - ~
X ~ ? ~ ( A ~ ) ) ~ ~ (<Xk :) ]+ ~ ( A ~ )
383
7.9. Confidence Set Estimation
-s
&(On
;W). The ARRE (bn;pn) and ARRE (p, ;Pn) both tend to 1 as A’, for fixed a, while for a given a , they intersect a t say, A:, where A: is the solution of the equation Now, consider R 4 ( P n ; W )and APT
-
PT
{
= Hq+2(x:(4;AZ)- viA1lrll 2Hq+z(x;(a);A2)-Hq+4(X:(a);A”)). cJz tr(A11)
(7.8.36) Note that for some a , there may not be any intersection. In this case, superior t o
PT
-s
p, is
0, . When there is an intersection, -s
PT
p, is superior to p, on [0,A:], -S ..PT (ii) p, is superior t o p, on (A:, m), (i)
- PT and p,- sare equally risk-efficient if A’ = A:. ,.PT - s Clearly, neither p, nor p, dominates each other. Finally, under Ho, (iii)
pn
Rz(b,;W)
I &(bFT;W)I %(bs.W) n, L Ri(Pnr- W ) .
(7.8.37)
We conclude that if q 2 3, then on the basis of ARRE ordering, namely R S ( ~ ~ + ; W )
5 R~&;w) I Rl(Bn;W)*
..PT
,.S+
We should use p, , while for q < 3, we lean toward 0, unless there is clear indication that Ho is true to use the restricted estimator. Similarly, the ADMSE analysis may be carried out following Section 7.5.
b,,
Confidence Set Estimation
7.9 7.9.1
Preliminaries
In this section, we consider the confidence set estimation of the regression parameters P in the model
Y=Xp+E
(7.9.1)
when it is suspected that p may belong to the subspace defined by HP = h, as before, where .c Nn(0,a21n) and a2 is known. As in Sections 7.1.1 and 7.2.1, we consider the five estimators: N
(i)
6,
=
(x’x)-~x’Y.
Chapter 7. Multiple Regression Model
384 (ii)
fin
(iii)
P,
(iv)
P,
(v)
b:'
=
,.PT
0,
=
-s
=
- C-'H'(HC-'H')-'(HP,
6, - ( B n - Bn)I(Ln < x;(a)),
p, - d ( p , - b,)L;',
=
- h).
fi;
(7.9.2)
where d = q - 2.
- (1 - dC,')I(L,
< d ) ( P , - b,).
where Ln is the test-statistic for testing Ho : HP = h, defined by
c,
b,).
- & J ' ( H C - ~ H ' ) - ~ ( ~-~ ,
=
Sincep, Np(P,02C-'), C = (X'X), and the (1-y)% is defined by N
c;(B~) = { P : a - 2 1~P~n i l g
(7.9.3)
confidence set C;(B)
5 X;(Y));
(7.9.4)
where xg(y) is chosen such that P(xg 5 xg(y)) = (1 - y). Thus, Ci(p,) has the coverage probability 1 - y. The set C;(p,)is minimax in the sense that among all the confidence sets, C; has coverage probability with least 1 - y. Also, C;(p,) minimizes the maximum volume. As in Section 5.9, we notice that there is scope t o improve on C;(P,) for q 2 3, in the sense that (i) Pp{P E C;(P:)} 2 Pp{P E C i ( p , ) } and (ii) the volume of C;((p:) 5 volume of C;(a,) , where P: stands for the generic notation for any of the five estimators. In this section, we consider the confidence sets that are centered at the estimators of the form
(pn)
0: = P n + ( P n - B n ) g ( L ) ,
(7.9.5)
where g(13,) is a nondecreasing function of the test-statistic L,. In particular, we consider the various functions as in Chapter 5, Section 5.9. Accordingly, we have (i) (ii)
pi = p, 0; = p,
ifg(L,)
=
1,
if g(L,) = 0,
. PT
if g ( L n ) = I ( & 2 x;(a)), where x;(a) is the a-level (iii) 0; = P, critical value under Ho, (7.9.6)
-s
= P, if g(C,) = 1 - dL,',
(iv)
0:
(v)
p:, = p,
-s+.
If
g(L,) = (1 - dL,')I(L,
- s+ and P, , respectively.
> d).
In the next section we consider the five confidence sets with centers -PT
-S
P, , P,,
p,, b,,
7.9. Confidence Set Estimation
7.9.2
385
Confidence Sets and the Coverage Probabilities
We consider the confidence sets on the §ion of the form
(7.9.8)
Now, consider the orthogonal matrix I’ that diagonalizes the idempotent matrix C-112H(HC-1H’)-1C-112with rank q , meaning I’C-’12
r’ =
x H (HC -1H’)-1 C-l12
( ‘0. ),
and define the variable w and 77 as
~ = ~ - 1l/ 2p-rn ~ - g-lrjC-’/2H’(HC-1H’)-lh, c
(7.9.9)
and
(:;)-..{( where w1 is a q-vector and show that
g-211P - P:;l
;;)( w2
2
Ip:q)}*
(7.9.11)
is a ( p - q)-vector respectively. Hence, we can
= llril - wl~(l/w1112)112 + 11%
- w2ll2.
(7.9.12)
(See Problem 12 of Section 7.12). We can write the coverage probability of the set G‘(P;) as
G3{a-211P-P;II&
I x;(Y)>
= EJ{11771 - ~1g(11w1112)112 +
11772
- wall2 L
x;(Y)>
Chapter 7. Multiple R.egression Model
386
where H p V q ( t0) ; is the cdf of a central chi-square distribution with ( p - q ) d.f. This allows us to write the coverage probabilities of the five confidence sets as given below: (7.9.14) If g(L,) = 1, then
qJ{a-211P- P,II& L x;f(r>)= p,{ 11771 - Wll12 + l h 2 - W21I2 I X ; w ) 2 = po{x, L x;cr,> = 1 - Y. If q(L,)
= 0,
then
(7.9.16)
(7.9.17)
7.9. Confidence Set Estimation
387
7.9.3 Analysis of the Coverage Probabilities CR(p,):Note that P O ( U - ~ ~ ~ P -5Px;(-y)} ~~~& = 1-7, which is independent of A2. Thus, the coverage probability of Co is constant as a function of
P (p,)
A2. Next, consider the coverage probability of CR(p,), which is given by
{.-211P
-
all: L x;(Y))
112 + 11772 - W21I2 L x;(Y)}
= PV,{ 11771
= Hp-q(x;(Y) - A2;0).
(7.9.19)
Now, Hp-q(~;(-y)-A2; 0) is a decreasing function of A2. At A2 = 0, it attains the maximum value Hp-q(x;(y); 0), and it tends to zero as A' + xg(y). The coverage probability P0(Co(p,)) is equal to Hp-q(x;(y) - A2; 0) whenever
A2 = $(y) - H$q(l - 7 ) .
..PT
CPT(p, ):
The coverage probability of this set is
Then we have the following theorem:
(7.9.20)
Chapter 7. Multiple Regression Model
388
Theorem. (7.9.22)
(i) If 0 5 A2 < xg(y), then Pp(CPT(BET))2 1 - y. (ii)
If xg(y) < A2 < (xp(y)+ x,(cu))~, then PP(C’~(B:~))< 1 - y. ., PT + x,(cY))~,then Pp (CPT(p, )) = 1 - y.
(iii) If A2 2 (xp(y)
(iii) If A2 > (xp(y)+ ~,(a))~, then
(BET))
pp (CPT
=I
X;(Y)
Pv1{ llw 112
’xi(4;Il77l -
W1
/I2 L
(x;(y)
Note that xp(r)
+ X,(Q) < A = 1177111.
-
t)+}dHp-&, 0).
7.9. Confidence Set Estimation
389
Since
The reader is referred to Tables 4.8.2a and 4.8.2b for the tabular values of the coverage probabilities against A2-values for p = 5(2)15. Clearly, the coverage probability attains its maximum a t A’ = 0. Then it decreases t o A2 = x;(y) and drops to a minimum value (there is a discontinuity at A’ = x;(y)). As A’ > x:(r), the coverage probability increases monotone toward the value 1 - y as A2 4 cx). The picture of the coverage probability is similar to the risk of the PTE as a function of A2.
for 0 I d 5 2d by the fact that w l ( 1 - dllwlll-2) is the Stein function, while w1 is the usual q-variate normal with mean q 1 and covariance I,.
Chapter 7. Multiple Regression Model
390 S+
Cs+(/3, ):
In this case, we have the following theorem:
Theorem 1. If q 2 3 and A2 > 0, ~p
{ c+(p,S+)} L P ~ ( C O ( P , ) )for all A’ E ( 0 ,cm)
provided that 0
(7.9.27)
< d < do where do is the minimum of the two unique
(7.9.28)
Proof. The theorem is proved if we can show that for every 0 < b2 5 x$(y),
Pvl{ 11771 - Wl(1 - dlI~lll-2)~(llW1112 > 4112I b 2 } > PvJIl771 - 4 l 2 5 b”.
(7.9.29) Substituting the right-hand side of (7.9.28) for the integrand below, we get
-
Note that w1 Nq(ql, Iq);hence, it is sufficient t o establish that G$’)(d, b) 2 1 and Gi2’(d,b) 2 1 for 0 < d < do and 0 < b2 I xg(y).
Let us prove Gil)(d,xp(y)) 2 1 for 0 < b2 I x;(y). Note that for each b, Gil’(d,b) is decreasing in d where d* satisfies G F ) ( d * , x p ( ~=) 1. ) Hence, it is sufficient to prove that there exists a d* such that @)(d*,b) 2 1. Note that
7.10. Asymptotic Theory of ConfidenceSets
391
Gq(1)( d , b) is strictly decreasing in b. As a result, Gil)(d*,b) either (1) strictly decreases to zero in b or (2) strictly increases t o a unique maximum and then strictly decreases to zero. Since Gq(1)( d * , 0) = Gil'(d; x p ( y ) )= 1, then (1) is not true, and in the case of (2) we get Gil)(d*,b) > 1 for 0 < b2 < xi(y). The proof with &)(d,x;(r)) is similar. In our case where d* = q - 2, computational results show that there is no significant difference in the coverage probabilities. Notice that
[I1171 - W l ( 1
2
PTlI[I1171
-
( 4 - 2)11~111-21 I(lIw11l2 > Q - 2)lI2
- Wl(1 - ( 4 - 2)11~111-2} < X&)l-
< x;(Y)l (7.9.31)
s+
Thus, the confidence sets C"(p,") and Cs+(p, ) satisfy the dominance coverage probability
cs+(p,s+) 1 C"(P,").
(7.9.32)
Consequently,
cs+(P:+) L CS(P,) 2 CO(bn).
(7.9.33)
The reader is referred to Tables 5.9.2 and 5.9.3 for some values of coverage probabilities.
7.10
Asymptotic Theory of Confidence Sets
In Section 7.9 we discussed the finite sample properties of the confidence sets centered a t the unrestricted, restricted, preliminary test, James-Stein-type, and positive-rule Stein estimators when o2 is known. When o2 is unknown, the closed form answer cannot be obtained. Hence, we consider the asymptotic theory, noting the fact that the estimator s: of o2 converges almost surely t o o2 as n -+ co. This leads us to use the results of Section 7.9 for moderate to large sample sizes. However, when the error distribution of the regression model is nonnormal, we consider the asymptotic theory as discussed in Section 7.7.
7.10.1 Confidence Sets Again, consider the model
Y , = X,P
+ e,
(7.10.1)
Chapter 7. Multiple Regression Model
392
Pnl ,.
..s+
and the estimators P,, P, , P,, and p, of p given in Sections 7.7.1 and 7.7.2 and Sections 7.8.1 and 7.8.2. In this section, we consider the asymptotic theory of confidence sets defined in Section 7.6 repeated below: (i)
-PT
AS
c0(Pn) = { P : n s i ’ i i ~ PnI%=I x;(Y)>. -
(7.10.2)
where $(y) is y-level upper critical value of a central chi-square distribution with p d.f. We can write the confidence sets compactly as
C’(P3
=
{P : ns,’IlP
-
P:ll&,
< x;(Y)l,
(7.10.3)
where
P:
=
P, + (Pn -P n ) g ( L )
(7.10.4)
as before. We study the properties of these sets in the next section and show that asymptotically they are similar to that of Section 7.9.
7.10.2
Asymptotic Properties of Confidence Sets
From Section 7.8.1, we observe that the asymptotic coverage proFability of all the confidence sets is 1 - y under fixed alternative except C$(Pn). By Theorem 3 of Section 7.8.2, we obtain the asymptotic distribution of APT - S the statistics fis;’(p,’, - P ) , where P: = P, ,P,, and /?If+ under local alternatives
P,,p,,
K(n, : HP = h + n-1’2t.
(7.10.5)
First, we note that lim
n-m
PK(,,)(~S,~IIP -
PnII2C,, I x ; ( Y ) > = 1 - Y,
(7.10.6)
The rest of the expressions are obtained using the basic transformation and technique given in Section 7.9.2.
7.10. Asymptotic Theory of ConfidenceSets
393
(7.10.9)
and
respectively. Since the asymptotic coverage probabilities of the five sets are similar to those given in Section 7.9, the properties are same as before.
Chapter 7. Multiple Regression Model
394
7.11
Nonpararnetric Methods: R-Estimation
In this section, we consider nonparametric methods of estimation, namely the R-estimation of the regression parameters, /3 = (PI , . . . ,&)’ in the usual model
Y =Xp+&,
(7.11.1)
where X is a n x p design matrix of the known regression constants ( n 2 p 2 1) and the components of the random vector E are i.i.d.r.v., with continuous distribution function I?(€,) defined on the real line R1. Further, we consider the following possible restriction on P:
HP = h,
(7.11.2)
where H is a y x p matrix of known constants (rank y) and h is a y x 1 vector of known constants as well. Our problem is the robust estimation of ,f3 when it is suspected but one is not sure that (7.11.2) holds. We consider R-estimation of 0. Let be the unrestricted and be the restricted (under (7.11.2)) R-estimators of P respectively. Further, let LTtbe the rank-test of the null hypothesis Ho : HP = h against the alternative H A : HP # h. Then following previous sections, we can consider three more R-estimators of P, namely, PRE, SRE, and PRSRE given by
p,
b,,
- PT = P , l ( L 2
PTRE : P,
(7.11.3)
SRE :
(7.11.4)
B: = p,(l- d C L 1 I ( L , > c,)) ( d = p - a), PRSRE : b;’ = p, + (p, - B,)(1 - dCL1)l(C,> d ) ,
(7.11.5)
where C, -+ 0 as n + 03, and is the a-level critical value from the null distribution of C,. In this section, we compare and study various properties of the R-estimators of /3 in an asymptotic setup.
7.11.1 Linear Rank Statistics, R-Estimators and Confidence Sets Let Ri(b) be the rank of Y , - (xi-R,)b among y1- (XI-%,)b, . . . ,y, R,)b, where xi is the ith row of X = (xi,. . . ,xi)’and Rn = n-’ Now, consider the LRS,
L ( b ) = (Li,(h), . . . , L p n ( b p ) ) ’ =
n
C(xi- %)an(Ri(b)),
+
- (x, -
cr=lxi. (7.11.6)
i=l
where (i) a n ( k ) = E&(Uk,,) or 4 ( k / n . 1) for square integrable and nondecreasing score generating function &(-) defined on ( 0 , l ) as before (see Chapters 4,5, and 6 ) .
395
7.11. Nonparametric Methods: R-Estimation Let C, = ‘j& (xi - %,)(xi- Z,)’, and assume that
;c,
(i)
=
c,
(ii) maxlsis, {(xi - X,)C,’(X~
I(f) =
-
Zn)} = ~ ( n ) . and A$ = ~ ‘ Q 2 ( ~ ) - ( ~ ’ Q ( U ) d z ~ ) ’ .
{f’(.)/f(.)}2f(.)d.<.. --m
(7.11.7)
EL=,
Further, if we put llall = ~ ( ~ a1 = , (a(’),. . . , a ( p ) ) , then the uwestricted R-estimator of P is defined by any central point of the set
p,
S = { b : llLn(b)11 = minimum}.
(7.11.8)
Using JureEkova’s (1969) linearity result (see Section 2.8.4), we have
for any k > 0 and E > 0, where w is a pdimensional column vector and denotes the pdimensional Euclidean norm. We obtain
&(D,
-P)
Np(0,o2C-l),
cT2
To define the restricted R-estimator of mator of p under HP = h as
= A2,/r2(+,
4).
11 . 11
(7.11-10)
P, we mimic the least square esti-
b, = p, - CG1H’(HCi1H’)(HB,- h).
(7.11.11)
In order to test the null hypothesis Ho : HP = h against H A : HP consider the rank-test
# h, we (7.11.12)
13, = A,2[L,(b,)I’C,1 [L,(b,)l, where
A: = (n - 1)-’ C ( u n ( k )k=l
sin = n-l C u n ( k ) .
(7.11.13)
k=l
We obtain the following theorem on the null distribution of
en:
Theorem 1. Assume (7.11.10). Then, under the null hypothesis HO : HP = h as n 4 00, 13, approximately follows the central chi-square distribution with q d.f.
Chapter 7. Multiple Regression Model
396
Proof.
By the asymptotic linearity result (7.11.9) we have two relations: (i)
and
(ii)
L,(D,)
+ r(+,$)cfi(b, - 0)’ -Is, 0 L(P) - r(+,+)cfi(P, - P)’ 5 0 . - L(P)
(7.11.14)
From (i) and (ii) of (7.9.13), it follows that
C,
=
r2(+,$)A,’n(P,
- b,)’C(P,
= r2(+,4)A,’n(HP,
-
a,, +
OP(1)
- h)’(HC-lH’)-’(HP,
- h)
+ op(1).
(7.11.15)
After combining (7.11.10) and (7.11.14), we prove of the theorem. We note that the test based on 13, is consistent against fixed P such that HP # h. Thus limn-w P(C, > k) -+ 1 for all k E R1, and the PTRE, SRE, under a fixed alternaand PRSRE of p is asymptotically equivalent t o tive. Hence, we consider the properties of the five estimators under the local alternatives:
P,
7.11.2 Asymptotic Distributional Properties of the R-es t imators By the asymptotic linearity results of JureEkova (1969) (see also Puri and Sen (1986)), we have the following theorem:
Theorem 2. Under {K(,)} and the conditions (7.11.7) as n clude the following:
- a2C-1), A;/?($, 4), &(a, P ) - Np(-6, 6 C-’H’(HC-lH’)< A C-’ C-lH1(HC~HC-~.
6) f i ( P , - P ) (ii)
Np(0,
0’
=
a2A), where
-
=
(iii)
-+ CQ,
-
J;E(P, - j?,)
- N,(s,
=
we con(7.11.17)
and
IH/)-
a2(c-l - A)),
(iv) l i m p ( & 5 KIK(,))= H q ( z ; A 2 ) ,where A’ = a-’d’C6’ - a - z E I (HC-~H’)-~E. -
+
+
- P ) i KIK(,)) = H q ( x ? ( a ) ; A 2 ) @ , ( x6,0,02A) @ , ( x - c-~H’(Hc-~H’)z; 0,o ~ A ) ~ @ , 0, ( za;2 ( ~ ~ - 1 ~ i ) - 1 ) where E(<)= {z;a-’(HZ <)’(HC-’H’)-’(HZ 6) 2 x i ( a ) } .
(v) l i m P { f i ( $ r
+
(4fi(KT-P ) = z - (HZ + < ) I ( llHZ + where Z
- NP(O,
a’C-’).
+
Ell&c-lH!)-l
I xgca>> + OP(l),
7.11. Nonparametric Methods: R-Estimation
397
Theorem 3. Under {K(,)} and conditions (7.11.7), as n 4 00, the ADB, ADQB, ADMSE, and ADQR of the five estimators are given by
(i) UR.E(&: bl(&) = 0
(7.11.18) and B1(&) = 0;
MI(&) = a2C-l and R1(&; W) = o2t r ( C - l W ) .
Chapter 7. Multiple Regression Model
398
where d = q - 2. Notice that these expressions are similar to the expressions of Section 7.8.2. Hence, the properties of the five estimators are asymptotically similar.
7.11. Nonparametric hfethods: R-Estimation
399
7.11.3 Asymptotic Properties of the Recentered Confidence Sets Based on R-Estimators We consider the recentered confidence sets based on the R-estimators, namely
where ut = At/$(+, bY
4 ) and $($, 4) is a consistent estimator of T($, 4) given
C$) is the j t h row of C , , and ej is the unit-vector with 1 at the j t h place and zero elsewhere. We can, again, write the confidence sets compactly as
(p,
- bn)g(L,). We study the asymptotic ( n -+ m) propwhere P: = ,bn+ erties of the confidence sets under the local alternative
First, we note that
Chapter 7. Multiple R,egression Model
400
Next, we consider the coverage probability
by suitable transformation as in (7.9.9) and (7.9.10). Thus, we can write the asymptotic coverage probability as
This expression is the same as (7.9.13). Correspondingly, all conclusions derived in Section (7.9.3) hold. This expression is the same as (7.9.13). Hence, all analyses and conclusions are the same as given in Section 7.9.3 hold.
7.12
Conclusions
In this chapter, we studied the estimation of the regression parameters when it is suspected that the parameters belong to a subspace. Accordingly, the unrestricted, the restricted, the preliminary test, the Stein-type, and the positiverule Stein-type estimators are defined, and their properties are studied in small as well as in large samples. In addition, we discuss the confidence set estimation of the parameters in both small and large samples, which include the nonparametric R-estimation methods.
7.13 1. 2. 3. 4. 5. 6.
Problems
Derive the expression for f i n given in (7.1.4). Verify remark 2 of Section 7.1.1. Prove Theorem 2 of Section 7.1.2. APT Verify the expression for MRE(Pn ; Prove Theorem 3 of Section 7.3. Find the expressions for the covariance matrices of
on).
..s+
Pn . 7. Prove that the risk-difference of -S
-s on and @,
fin, 0, , on,and -PT
-5'
is given by (7.4.17).
8. Verify that the risk of 0, can be expressed as the r.h.s. of (7.4.19). APT
9. Verify that the MRE(Pn
;on) is given by the R.H.S of (7.5.15). PT+
10. Verify the expression for M~(fl:") tions 7.6.4 and 7.6.5.
L.
and R6(Pn
: W) given in Sec-
401
7.13. Problems
-s
11. Prove that the risk R4(Pn : W) may be expressed as R1(Pn : W) - W A n ) { ( q - 2)E[X,;22(A2)l
given by (7.8.33). 12. Prove that n-cc lim P K + )
{ nsi211P- P2I&, < x;(d}
= %{11171 - w7(ll~1112)112 + 11172
- w21I2
5 x2,(r)}
by suitable transformations and analysis, and verify (7.10.2) through (7.10.5). 13. Prove that
q 1 P - P$
&(p,
14. Prove that under K(nl.
-
= 11771
P)
-
fi(pn - bn)
15. Prove that under K(n). 16. 17. 18. 19.
-
-
-
wlg(l/wll12)J/2 + 1%
- w2ll2.
Np(0,a2(C-1)) where a2 =
N p ( - 6 , a2(C-l - A)) of Theorem 7.9.2
N p ( - 6 , a 2 A ) under K ( n ) . Prove that fi(bn - 0,) Show that limn-cc P { L , < zIK(,)} = Hg(z; A2). Verify Theorem 7.9.3. Numerically evaluate
for q = p - 1.
This Page Intentionally Left Blank
Chapter 8
Multiple Regression Model: Stochastic Subspace Hypothesis Outline 8.1 The Model, Estimation and Test of Hypothesis 8.2 Bias, Quadratic Bias, MSE Matrices, and the Quadratic Risks of the Estima-
tors 8.3 Estimation of Regression Parameters with Prior Information from a Previous
Sample 8.4 Multiple Regression Model and Stochastic Subspace Hypothesis: Nonnormal
Errors
8.5 Asymptotic Distribution of the Estimators
8.6 8.7 8.8 8.9
Confidence Set Estimation: Stochastic Hypothesis &Estimation: Stochastic Hypothesis Conclusions Problems
In Chapter 7, we considered the estimation of the regression parameters P in the multiple regression model Y = XP + e when it is a priori suspected that belongs to the subspace defined by HP = h. In this chapter, we consider the estimation of P when P belongs to the stochastic subspace defined by h = HP + v, where v is normally distributed with mean 6 and covariance matrix a2s2,with s2 being a known ( q x q ) positive-definite matrix. In addition, we consider the estimation of the regression parameters when s2 is unknown based on a sample from previous study. Asymptotic theory is also provided for nonnormal errors in the models. Here we consider the idea of mixed estimation involving combined sample information from the model and an independent prior stochastic subspace information. The methodology is 403
404
Chapter 8. Regression Model: Stochastic Subspace
due to Theil and Goldberger (1961), Theil (1963), and Nagar and Kakwani (1964) among others. Further, we provide the confidence set analysis involving various estimators of p for finite as well as for large sample.
8.1
The Model, Estimation, and Test of Hypothesis
In this section, we formally present the model, estimation of the parameters, and related test of hypothesis results that will be used in the discussions of the multiple regression model.
8.1.1 The Model Formulation Consider the estimation of the regression parameters in the model given by
Y=Xp+e when it is suspected that the parameter model, namely
h = HP + V,
v
(8.1.1)
p belongs to the stochastic subspace
- N,(S,
u2Q),
(8.1.2)
where h is a known q x 1 vector, H is a known q x p matrix of constants with rank g, and v is a q x 1 vector of random errors. In the model above Q is a known matrix but c2is unknown. This type of model appears in many econometric analyses. Reader may see Theil and Goldberger (1961) and Theil (1963). For a suitable analysis, consider the sample information and model (8.1.1) together with the stochastic subspace model (8.1.2) to obtain the following mixed model: (8.1.3) where (8.1.4) Thus, our problem reduces to the estimation of when it is suspected, but not certain, that
p
for the model (8.1.3)
(8.1.5) Rewriting (8.1.3) and (8.1.5), we have the problem of estimating 8 in the model
Y "= Z 8 + E ,
(8.1.6)
8.1. The Model, Estimation, and Test of Hypothesis
405
when it is suspected that 8 belongs to the subspace RB = 6 = 0, where
and
(8.1.8)
8.1.2
Mixed Model Estimation
Applying the least squares principle to the model (8.1.6), we have the unrestricted mixed estimator of 8 for
(8.1.9) Similarly, the mixed restricted estimator of 6 subject to RB = 0 is given by
8,
=
8,
-
C;~R’(RC;~R’)-~R~,, c, = z’z,
1
+ n)-’(HB, - h) . + a)-’ (Hp, - h)
C-lH’(HC-’H’ -n(HC-’H’
Let
a, denote the restricted estimator of ,O; then a, p, C-lH’(HC-’H’ + SZ)-l(Hp, =
-
(8.1.10)
- h)
from (8.1.10). Let us now consider the estimation of P based on the formulations (8.1.3) and (8.1.4) using the generalized least squares principle (GLSE). The mixed restricted estimator of P of the mixed model (8.1.3) is given by
= (C
+ H’W1H)-’(CP, + H’St-Ih).
(8.1.11)
Since
(C + H’n-’H)-’
=
C-’
-
C-lH’(HC-lH’+ O)-’HC-’
(8.1.12)
by Theorem 2.6.8 of Chapter 2 , we have
$f = (p,+ C-’H’n-’h)
-
C-lH’(HC-lH’+ n)-’H(p,
+ C-’H’O-’h).
(8.1.13)
406
Chapter 8. Regression Model: Stochastic Subspace
Is there any difference between difference
b and p- nR ? To find the answer, consider the
-R
an
=
-Pn
[Pn-c'H' (HC-'H/ + fl))-'(Hp, - h)] - [(p,+ C-'H'SI-'h)
+ O)-'HC-'(Cp + H'W'h)] (8.1.14) = C-'H'[O-' - (HC-'H' + SI)-'(HC-'H' + fl)SI-']h = 0 , which shows that 6 = 8:. Therefore, the restricted mzxed estimator of 8 can C-'H'(HC-'H'
-
be rewritten as
en =
( 3, )
(8.1.15)
+
If we premultiply 6, and 6n by the p x ( p q ) matrix ( I p , O p x q ) , we obtain and respectively. In the next section, we consider the test of the hypothesis Ho : 6 = R8 = 0 against the alternative H A : 6 # 0.
p,
a,,
8.1.3 Test of Hypothesis The joint distribution of
(on,p,)' -I
..I
is given by
(8.1.16) where An = C-' - C-'H'(HC-'H'+ SI)-'HC-'. For the estimation of o', we consider s: = m-'(Y*- Z6,)'(Y* - Zen) = Y*[I - z(z'z)-'z']y* = m-'(Y
- xpn)I(Y- XP,), m = 72 - p .
(8. I. 17)
Further,
(6, - e n ) ' C Z ( 6 -, 6,)
= eLR'[RC,lR']-lRen
= (HP, - h)'(HC,'H'
+ fl)-'(HPn - h).
(8.1.18)
Hence, we can define the statistics .Cz for testing the hypothesis HO : Re 6=Oby
=
.C* = ~;R'(RC;~R')- 1 ~ 6 , qs2 -
(Hp,
- h)'(HC-'H'
4 2
+ fl)--'(HBn- h) 1
(8.1.19)
8.1. The h'lodel, Estimation, and Test of Hypothesis
407
which is similar to the statistic C, given in (7.1.6) except that it contains a. The distribution of Cc under Ho : 6 = R6 = 0 follows the central Fdistribution with ( 4 ,m ) d.f. Under H A : 6 = R6 # 0, it follows the noncentral F-distribution with (q, m ) d.f. and noiicentrality parameter A2/2 where
A2 = a-2[6'R'(RC,1R')-1R6]
= K26'
(HC-lH'
+ a)-' 6.
(8.1.20)
These statements are proved in the same way as in Theorem 7.1.1.
8.1.4 Preliminary Test and Stein-type Mixed Estimators Following Section 7.2.1, we define the PTE of 6 as . PT
6, = 6, - (6, - 6,)l(C; < Fq,,(a)),
(8.1.21)
where F,,,(a) is the upper cu-level critical value from the central F-distribution with (q,m)d.f. Similarly, the James-Stein-type estimator of 6 is given by
-s 6, = 6 ,
-
d(8, - 6,)C;-',
(8.1.22)
and the PRSE of 6 is given by
-s+ 6, - 6,
+ (1 - dC;-'}l(L;
> d)(6, - k),
(8.1.23)
where (8.1.24) Explicitly, the expressions for PTE, SE, and PRSE are given by (8.1.25) where
a,
=
p, - C-'H'(HC-'H' + a)-lHC-l(H,h,
- h).
(8.1.26) and (8.1.27) -PT - S . S+ If we premultiply 6, , 6,, and 6, by the p x ( p + q ) matrix ( I p , O p x g ) , we APT - S ..S+ obtain the corresponding expressions for P, , P,, and P, (see also Khan, 1997).
408
Chapter 8. R,egression Model: Stochastic Subspace
Bias, Quadratic Bias, MSE Matrix, and the Quadratic Risk of the Estimators
8.2
In this section, we provide the bias, quadratic bias, MSE matrices, and risk expressions of the five estimators of 6 .
8.2.1
Bias and Quadratic Bias Expressions
We consider the bias and quadratic bias expressions of the estimators of 6 . They are given by
&(a,)
(i) bl(8,) = 0 and (ii)
(8.2.1)
b2(6n)
= -C,1R’(RC,1R’)-1R6
B2&)
= a-26;C,6,
PT
(iii) b3(6, ) (iv)
= 0;
-S b4(@,)
=
=
,.S f
(v) b 5 ( 6 n )
= A2;
-6,Gq+2,,(&; A’) and
-6, (say), then
- PT
-dq6,E[~;:~(A’)] and B4(6:)
=
)
= A2{Gq+2,m(t,;A2)}2;
= d2q2A2{E[~,S22(A2)1}2;
- 6 z { d 1 E [ F 4 ; ’ , , ~ ( A ~ ) r ( ~ q + ~<, dl)] m(A~) -d1EP,-:,,,(A2)1
and
=
-
Gq+2,m(d1;A”}
..S+
B5(6, ) = A2(dlE[Fq;lZ,,(A2)r(Fq+z,m(a2) < di)]
-4EP7,-:z,,(A2)1 - Gq+2,rn(4;A2)}2’ where dl = qd/q
+ 2.
From the formulas above, we obtain the expressions for bias vector and quadratic bias of the estimators (i) bl
(p,)= 0
and
B1
-S p,, b,, ,6,PT , p,,
B3
(p:’)
=
-C-’H(HC-’H’
= A2{;,!(
(a:)
n)-l6 and B2(bn)= A2.
+ f2)-16Gq+2,m(!,;A2)
and
2
A”} .
(iv) b4(bE) = -dqC-’H’(HC-’H’
B4
0..,s+ given below:
(b,)= 0.
(ii) bz(b,) = -C-’H’(HC-’H’+ ., PT (iii) b3(Pn )
and
= d2q2A2{ E
+ 52)-16E[~i:2(A2)] and
[x;:~(A’)] }”
8.2. Bias, &HE, and Risks
409
410
Chapter 8. Regression Model: StochczsticSubspace and . S+ Rs(6, ) = R4(6:) - a’ trWC,’R’(RC,’R’)-’RCr’
x E [(I - dlF,~z,,(A2))21(F+z,m(Az)< dl)]
+ s:wsz -E [(I
- W(Fq+2,m(A2)< dl)]
{ 2 E [(dlF,-:m(A2)
- d2F,-:4,,(A2))”(Fq+4,m(A2)
< &)I}.
p,, B,, - P T
-S
..S+
To obtain the MSE matrices of the estimators of p, ,p, and p, we pre- and postmultiply the MSE matrices above by the p x ( p + q ) matrix ( I p ,OP+,) and its transpose, respectively. The resulting expressions are given below:
(pn)= a2C-’ and Rl(p,; W) = tr(WC-’). (ii) ~ ~ ( p= ,0W-l ) - o~c-’H’(Hc-’H~ + o)-’Hc-~ + C-lH’(HC-’H’ + i2)-166’(HC-1H’ + i2)-’HC-] (i) MI
O’
and
R2(bn,W) = ~ ’ t r [ W C - ~-l a’ tr[WC-’H’(HC-’H’
+&’(HC-’H’+ i2)-1H’C-1 WC-’H’(HC-’H’
+ i2)-1HC-1] + i2)-1HC-16.
8.2. Bias, MSE, and Risks
41 1
and
8.2.3 MSE Matrix Comparisons of the Estimators Comparison of 6, and 6,. First, we compare between the MSE matrices is given by
~ ~ ( 6-, ~) ~ ( 6= ,a )2
6,
and
6,.
The difference
s,s:
~ , 1 ~ ’ ( ~ ~ ; 1 ~ ’ ) - 1- ~ ~ , 1
(8.2.3)
The r.h.s. of (8.2.3) is positive-semidefinite whenever for a nonzero vector we have 1 = (11, . . . ,
0 2 e ’ ~l ;~ ’RC( ;’ R’)- ~ R C ; ~2Le’s,s;e.
(8.2.4)
Dividing by t’C,.f, we have
Therefore,
Thus, the r.h.s. of (8.2.3) is a positive-semidefinite matrix whenever A2 5 1, and 6, performs better than 6, in this range; otherwise, 6, is better than 6,. MSE-based efficiency of 6, relative to 6, is meaningless, since
1
- R’(Rc;~R’)-~Rc;~- a - 2 ~ , 6 , 6 : ~ = 0.
412
Chapter 8. Regression Model: Stochastic Subspace -PT
-
Comparisons of 6 , , 6 , , and 6,. The difference between the MSE matrices 6, and 6..PT , is given by PT
Ml(6n) - M3(6, ) = O~C,'R'(RC,'R')-'RC,'G~+~,~(!~;A2)
-Jz6:{2Gq+2,m(&; A2) - Gq+4,m(e:; A2)}. (8.2.6) Following Section 8.2.3a, we note that the r.h.s. of (8.2.6) is p.s.d. whenever
,.PT is superior to 6, in this range of A2. For
Hence, 6 ,
A2 2
-
Gq+2,m(ea;A2) {2Gq+2,m(&;A2) - Gq+4,m(e:; A2)}'
PT
6 , is superior t o 6 , . PT Similarly, to compare 6 , and MSE matrices:
Mz(6n) -M3(6, A
PT
-
en, we consider the difference between the
) = -a2CJ'R'(RC; 1 R I )- ' RCL1{l - Gq+2,m(!,;A2)}
+ 6zJ:{1-2Gq+2,m(&;
which implies that
(8.2.8)
6,
,.PT
is superior to 6 ,
A2)
+ Gq+4,m(e:;
A2)}, (8.2.9)
whenever
(8.2.10) APT
otherwise, 6 ,
is superior to 6,. -s
-
-
-S
Comparisons of 6,, 6,, and 6,. First, we compare 6 , and 6,. In this case, -S the difference between the MSE matrices of 6 , and 6 , is given by MI(6n) - M4(6:) = ~~O'C;'R'(RC;'R')-'RC;'{~E[X~~,(A~)] - (9 - 2)E[x;:2(A2)1>
- dq(9
+ 2)M:E[x;&&12)1. (8.2.11)
The r.h.s. of (8.2.11) is p.s.d. whenever for a given vector l ,
dqo2{ l'C, 'R'(RC; 'R')-'RC;'l} { ~ E [ X(A2)] $ ~ - ( q - 2)E[~,-t4~ (A2)]} -&(q
+ 2 ) a z ~ : E [ ~ $ g ( A 22)0.]
(8.2.12)
After simplification, as in Section 8.2.1, we obtain
9A2E[X,S44(A2)15 ( 4 - 2)E[X,S42(A2)I1
(8.2.13)
8.2. Bias, &BE, and Risks
413 -S
which does not hold for all A2. Hence, 8, is not uniformly better than
-s -
The MRE(8,; 8,) is given by
-'
(A2)]+2A2E[xi;2 (A2)]}+ dq(g+ 2)A2E[x;;4 (A2)])
(1-&{ ( q - 2)E[x;;4 X(1
-
en.
d d ( q - ~)E[x;;~(A')]
Similarly, the comparison of between the MSE matrices:
6,
+ 2A2E[xi$4(A2)]})y. -s
and 6, is obtained through the difference
M2(6,) - M4(6:) - -a2C,1R'(RC,'R')-'RC,' -2dqA2E[xT:4(A2)]}
(8.2.14)
+ 6,6:{1
(1 - dq(q - ~)E[x;:~(A~)] - dq(q
+ 2)E[xi:4(A2)]}.
(8.2.15)
-s Under Ho, the MSE difference is n.s.d. Therefore, 6, performs better than 8, a t A2 = 0. Further,
6,
-s
performs better than 8, in the interval given by
(8.2.16) -S
Outside the interval, 6, performs better than better than 6,.
6,.
-s
Thus, 8, is not uniformly
-s+ - s PT Comparisons of 8, , 8,, and 8, . Consider the difference between the -S AS+. hlSE matrices of 8, and 8, . S+
M4(6:) - Mg(6, ) =
{
o2 C, 'R'(RCl1R')-lRCil E [(1 - d1 F>'2,m (A2))2I(Fq+2,m(A2) < dl)]
+ a-26,6:E[(1 - dzF,-:,,,(A'))21(Fq+4,~(A2)< d z ) ] } + N , ~ : E [ ( ~ I F [ ~ ~ , -~ 1)I(Fq+2,m(A2) (A~) < dl)] 2 0.
(8.2.17)
,.S+ The difference between the MSE matrices is p.s.d. for all A'. Hence, 6, -S
-
-s
dominates On uniformly. But, 6, does not dominate 8,. Hence, we can write ., S+
M4(6:) 2 Ms(6, ).
(8.2.18)
414
Chapter 8. R,egression hiodel: Stochastic Subspace
Next, we consider the difference between the MSE matrices, which is
and S+
~ ~ ( )6- ~, - &(q -
PT
~ ( 8= ,~ZC;’R’(RC;~R’)-’RC;’( -
~)E[X;:~(A’)] - 2dqA2E[~&(A2)]
E“1 - ~ l F ~ 1 z , , ( A 2 ) ) 2 ~ ( F q + z , m<( Adl)]} 2)
f 6Z6:{E[(1
-~ Z ~ ~ ~ , ~ ( ~ ’ ) ) ’ ~ ( ~ q < + 4 dZ)] , m ( ~ ’ )
-[2Gq+z,m(ca; A’) - Gq+4,m(C;A’) -
~ ~ + ’ , ~ ( A’) e,;
-
2E[(1- d1F,-:2,,(A2))’1(Fq+z,m(A’)
&(q
+~)E[x~:~(A~)]]
< di)]}.
(8.2.20)
The r.h.s. of (8.2.19) is p.s.d. whenever ~
Gq+a,m(!a; A’) - dq(q - 2)E[x,S42(A2)]- 2&A2E[x&(A2)] 2Gq+z,m(L;A’) - Gq+4,m(tA;A’) - dq(q 2)E[xi:z(Az)]
+
’
(8.2.21) . PT -S Thus, 6 , performs better than 6 , whenever A’ satisfies (8.2.21);otherwise, -S APT -S APT 6 , performs better than 6 , . Neither 6 , nor 6 , dominates the other. Simi. S+ . PT larly, using (8.2.20),we note that neither 6 , nor 6 , dominates each other. Next, we consider the difference between the MSE matrices under HO to ,.S+ compare 6 , and 6,: Ms(6:’)
-
Mz(6,) = O’C,~R’(RC;’R’)-’RC;~( 1 - d
- E [ ( 1 - d1F,-:2,,(A’))’1(Fq+2,m(A’) < d l ) ] } .
(8.2.22)
This R.H.S of (8.2.22) is p.s.d. whenever 1 - d 2 E[(1 - d1F~1~,,(A2))21(Fq+z,~(Az) < dl)].
(8.2.23)
Hence, 6, performs better than 6..S+ , under Ho, and 6;’ is not uniformly better than 6,. Similar conclusions hold on the properties of the five estimators of p.
415
8.2. Bias, MSE, and Risks
8.2.4
Risk Comparisons of the Estimations
In this section, we provide the weighted risk analysis of the five estimators with the loss function
q e * e) = (e*- e)’w(e*- el,
(8.2.24)
where W is a p.s.d. matrix. Comparisons of 6, and 6,. It is clear that the risk of 6, is constant, meaning u2tr(WC;’), and the risk of 6, depends on S:WS,, since
Rz(6, : W ) = 02tr(WC,’)
- u2tr[WC,lR’(RC,’R’)-lRC,’]
+ S:WS,.
(8.2.25) Further, C,1/2R’(RC,1R’)-1RC,1/2 is a symmetric idempotent matrix with rank y ( i p y). Therefore, there exists an orthogonal matrix such that
+
(8.2.26) and (8.2.27) Thus,
tr[RC,lWC,lR’(RC,lR’)-l] = tr(A11)
(8.2.28)
and # W S , = q i A l l q l , q = rC:/’$ - rC,1/2R’(RC,1R’)-1RB
=
G9(8.2.29)
Hence,, we may write : W) = o2tr(WC,’)
We conclude that
6,
- o2tr(Al1)
performs better than
6,
+ rl;A11~1.
(8.2.30)
whenever (8.2.31)
whereas
6,
performs better than
6,
whenever (8.2.32)
If W = C , , we see that 6, performs better than performs better than 6, outside this interval.
8,
6,
in the interval [O,y]and
416
Chapter 8. Regression Model: Stochastic Subspace APT
Comparisons of 8,
-
, 8,
and
en. Consider the risk-difference
- PT
Ri(6, : W) - R3(8,
: W) = ~ 7 ’ t r ( A 1 1 ) G ~ + 2 , ~ ( t ~ ; A ’ )
(8.2.33)
- d A 1 1 ~ 1 { 2 G , + 2 , ~ ( CA’) ~ ; - Gq+4,m(tT,;A’)}.
..PT
Thus, from risk-difference of (8.2.33), 8,
while
6,
.- PT
perform better than 8,
performs better than
6,
whenever
whenever
..PT
a,,
Since 8, is not uniformly better than we obtain a PTE by selecting an optimum level of significance a+with a minimum guaranteed efficiency, say, Eo, by solving the equation min E ( a ,A’) = E ( a , A=
= Eo,
(8.2.36)
where
E ( a , A 2 )= Rl(6, . PT
Next, we compare 8, and
6,. . PT
R z ( 6 , : W) - R3(8,
: W)[R3(6flT: W)]-’.
Here :W ) =
+ ~ ; A i i ~ 11 -{ 2Gq+2,,(ta; Thus,
6,
,.PT
..PT
performs better than 8,
and 8, performs
6,
(8.2.37)
-0’ tr(A11)[1
A’)
- G,+2,,(t,;A2)]
+ Gq+4,m(t:; A’)}.
whenever
whenever
Under Ho, the risks of the three estimators may be ordered as
(8.2.38)
8.2. Bias, MSE, and Risks
41 7
-s+ - s
Comparisons of 6, bY
, 6,, and 6,.
The risk-difference of 6, and -S
6,
is given
&(en: W) - R4(6: : W) = 02dq tr(All){ ( q - ~)E[X;$~(A’)] This risk-difference is p.s.d. for all d such that (8.2.43) Note that A l l involves W. Thus, en dominates 6, uniformly for all by (8.2.43). Now. consider the risk-difference -S
A
given
The r.h.s. of (8.2.44) is positive, since the expectation of a positive random variable is positive, which follows from the fact that
We thus obtain
..S+
R:,(6, ;W) 5 R4(8:;
W).
(8.2.47)
As a consequence, we obtain R5(6;+; W) 5 R4(@; W) 5 Rl(6;; W)
(8.2.48)
for all A’ provided (8.2.49) holds.
418
Chapter 8. Regression Model: Stochastic Subspace PT
-S
Comparisons of 6 , and 8,
-S
. Under Ho, the risk of 8, may be written as
R4(6z;W) = R ~ ( 6 5 1W ~ ;) + o2tr(A11) [Gq+2,,(l,; 0) - d] PT
2 R3(8, ; W )
(8.2.50)
whenever G q + 2 , m ( f a ; o )2
d.
(8.2.51)
That is to say, (8.2.35) holds whenever there exits a set of a such that (8.2.52) -S
*
PT
The risk of 8, is smaller than that of 8, whenever the opposite inequality in (8.2.52) holds. Further, under Ho, the risks may be ordered as
,.PT
R z ( k ; W )I R3(0, ;W) I R l ( 6 h
(8.2.53)
when Q satisfies (8.2.51). ..PT As A2 diverts from 0, the risk comparisons show that 8, gets worse and so does 6,. Only 6, has stable risk. None of the estimators dominate each other. Similar conclusions as in Sections 8.2.3 and 8.2.4 hold for the five estimators of 0.
8.3
Estimation of Regression Parameters with Prior Information from a Previous Sample
In the last two sections, we considered the estimation of the regression parameters p1in the model ~1
= xlpl
when it is suspected that
-
+ el :
el
- N,,(o,~~I,,)
(8.3.1)
P1 belongs t o the stochastic subspace defined by h = HIPI
+ V,
(8.3.2)
where v hi,(d, 02n)and 0 is known. In a practical situation, the assumption that flis known may not be valid. However, one may gather information about st based on a previous sample regression model
Yo = XoPo + eo :
eo
- N,,(o,
o2rn0),
(8.3.3)
8.3. Estimation with Prior Information
419
where YOis a (no x 1) vector of responses, XOis an no x PO known design matrix, Po = (pol,. . . ,Popo)' is a p~x 1 vector of unknown regression parameters, and eo is a no x 1 vector distributed normally with mean 0 and covariance 021no.Further, in addition to the sample information, it is suspected that Po and P1 are related through the constraint
HOP0 = HlP,,
(8.3.4)
where H, is a q x pi (i = 0 , l ) known matrices of constants. Now, estimating Po by = C,'XLYo, CO = XbXo, we obtain the stochastic constraint defined by
POno
-
(8.3.5)
ho = HOPOno = H I P I +VO,
-
where vo Nq(S0,o2f2o), and f20 = HoCilH;, SO = Hopo - HIP,. Note that S20 is known for this stochastic constraint. Thus, our objective is to estimate P1 based on the model
Y1= XlPl+ when it is suspected that
el
P1 belongs to the stochastic subspace defined by ho = HIPI + VO.
The problem is discussed in Khan and Saleh (2005).
8.3.1 Estimation of
PI and Test of Hopo = HIP, First, we note that the LSE of P1 from (8.3.1) is pln1= C,'X',Yi,
X',X1 and the restricted estimator of P1subject to (8.3.5) is given by
C1 =
Plnl = (Cl + H;f2;1H1)-1(XiY1 + H;f2i1ho) -R
=PI,,
- c;~H;(H~c;~H;+ aO)-'(HlB1,1
- ho) =
bln1.
(8.3.6)
-R
Under Hopo = HIP,, we have E ( P l n 1 = ) P1and
cOV(blnl) = 2(c1 + H~~I;~H;)-!
(8.3.7)
Further, the pooled estimator of o2 is given by
6; = ~ m{ Y ~ [ I , , , XoC;'Xb]Yo +Y:[I,, - X1C;'X;]Y1),
(8.3.8)
+
where m = n - (PO + P I ) and n = no 721. In order t o test the null hypothesis HOPo = HIP, against the alternative HOPo# HIP,, we consider the test-statistic, following Section 8.1.3, as
13,
=
(H1P1nl - ho)'(HIC,lHI, + no)-l(HIP1nl 962
- ho) 7
(8.3.9)
420
Chapter 8. Regression hdodel: Stochastic Subspace
which follows the noncentral F-distribution with ( q ,rn) d.f. and noncentrality parameter A2/2 with (8.3.10)
8.3.2
The Mixed Estimators
Let
e=[
HIP,
+ do ]
(8.3.11) '
Then the model (8.3.1) together with (8.3.5) can be written as
or
Y * =Z B + E . The unrestricted and the restricted estimators of 6 are given by
en =
( Pln, h0 )
(8.3.13)
and
=
(
)'
(8.3.14)
respectively. Similarly, the PTE, JSE, and PRSE of 6 are given by
,.PT en = 8,
-
(8,
-
hn)I(Ln < Fq,m(a)),
(8.3.15) (8.3.16)
and . S+ en
= S,
+ (6, -
- ~ L ; ~ ) I ( L>,d ) ,
(8.3.17 )
where Fq,m(a) is the ath-level upper critical value from the central Fdistribution with ( 4 , m ) d.f. Let
R = [-H,,Inl].
(8.3.18)
Then we write the bias, MSE matrices, and risk of the estimators (8.3.13) through (8.3.17) as follows:
8.3. Estimation with Prior Information
421
422
8.4
Chapter 8. Regression Model: Stochastic Subspace
Multiple Regression Model and Stochastic Subspace Hypothesis: Nonnormal Errors
8.4.1 Introduction Consider the multiple regression model
Y=X,p+e
(8.4.1)
with the suspected prior information on /3 provided by the stochastic subspace constraint
2
h = HP + v; E(v) = 6 and E(vv’) = --a n
+ ad’,
(8.4.2)
where e = ( e l , . . . , e n ) are i.i.d. r.v assumed t o be distributed with the cdf F ( e ) nonnormal such that E ( e ) = 0 and E ( e e ’ ) = (~’1,. Further, assume that (i) max x;C;lxi -+ 0 as n -+ 03, where xi is ith row of X,, (8.4.3) and [n-lc,] = C, C, = X L X , , meaning the generalized Noether (ii) condition is satisfied for asymptotic normality of the estimators t o be defined.
423
8.4. Stochastic Subspace Hypothesis: Nonnormal Errors
With this setup our basic problem is the estimation of 0 when it is suspected that belongs to the stochastic subspace defined by
h=HP+v.
(8.4.4)
Accordingly, we consider the unrestricted, restricted, preliminary test, JamesStein-type, and positive rule Stein-type estimators of P.
Estimation of the Parameters and Test of Hypothesis
8.4.2
As in Section 8.1.1, we consider the mixed model, ,1/2a2-1/2h
1-
-
where we assume that e and i2,1/2(v - 6) are independent so that (8.4.6) Defining
(8.4.7) we obtain the model
Y *= Z,8+ E,
(8.4.8)
where we wish to estimate 8 when it is suspected that 8 belongs to the subspace R8 = 6 = 0. Thus, the unrestricted estimator of 0 is given by
6,
=
[ ph. 1, p,
= C,'XkY,
(8.4.9)
and the restricted estimator of 8 subject to R8 = 0 is given by
en = 6, where
c,,
= ZkZ, =
- c;;R'(Rc;;,'R')-~R~,,
( "F.
(8.4.10)
)
(8.4.11)
Chapter 8. Regression Model: Stochastic Subspsce
424
a,nd n - l C n Z -+ C,. Then we assert that as n through (iii) of (8.4.3),
where C, = fi(6,
(
- 8)
-
fi@, - 8 ) N,+& 0
).
n-l
-+ 00
under the conditions (i) (8.4.12)
o~c;~),
Similarly, as n 4 co,
- Nn+q(-6z, a2[C,’
- C,’R’(RC,lR’)-lRC,l]),
(8.4.13)
where
6,
= C,’R’(RC,’R’)-1R8.
(8.4.14)
For the test of the null hypothesis, R8 = 0 , we consider the test-statistic
x,c,~x,]Y,m = n - p.
s: = r n - l ~ ’ [-~
(8.4.16)
P
and s2-+02 as n oc). Hence, as n 4 03, LE converges to the central chisquare distribution with m d.f. under the null hypothesis 6 = 0 . ---f
8.4.3 Preliminary Test and Stein-type Estimators Following Section 8.1.4, we define PTE, JSE, and PRSE, respectively: PT
e, A
=
e,
- (8, -
6 , ) q ~ <; x;(a)),
(8.4.17)
-
-s
6 , = 6 , - ( q - 2)(8, - 6,)L’721 6:’
8.5
=
en + (8, - 6,)(l - ( q - 2)L;-’)I(L;
(8.4.18)
> q - 2).
(8.4.19)
Asymptotic Distribution of the Estimators
In order to obtain the asymptotic distributional quadratic bias (ADQB), MSE matrices (ADMSE), and the quadratic risks (ADQR) of 8 , we need the asymptotic distributions of UE, RE, PTE, and PRSE of 8. As in Section 5.8, we consider the generic estimator 8; and a positive-semidefinite matrix W defining the loss function
qe;; e ) = n(e:,- e)’w(e:,- e ) = tr[W{n(8; - 8 ) ( 6 ; - O ) ’ } ] .
(8.5.1)
8.5. Asymptotic Distribution of the Estimators
425
Let M(0fi)denote the asymptotic distributional MSE matrix of 6:. Then the ADQR of 0: is given by
R(0fi : W) = tr[WM(6:)].
(8.5.2)
In many situations, the asymptotic distribution of &s;'(0: - 0) is equivalent to f i ~ l l ( 8-~0), as n -+ M, under fixed alternatives of real values as given bY
K t : 6 = 6,
where
< is a nonzero vector.
To obtain a meaningful asymptotic distribution of fis;l(0: the class of local alternatives
Iqn): a(,)
(8.5.3) - 6 ) we consider
(8.5.4)
= n-'/2<.
Let the asymptotic distribution of fis,'(6:
- 6 ) under
K ( n )be
Gp(x) = lim P~(~){,,&s;'(0: - 6 ) 5 x} 12-00
(8.5.5)
if it exits, then the ADB and the ADQB are given by
respectively, where Ml(8,) = u2Ci1.
(8.5.8)
8.5.1 Asymptotic Distribution of the Estimators under Fixed Alternatives First, we consider the asymptotic distribution of fi~;'(8~ - 6 ) under the regularity conditions (i) through (iii) of (8.4.3). From Theorem 7.8.1, we have
vGs;1(6n
-
6 ) 2 N,+,(O, C L l ) ,
(8.5.9)
where s% u2 converges surely as n -+ 00. Next, we consider the asymptotic distribution of the test-statistic 13; under the fixed alternative K t : 60 = <. Then ---f
J;;s,'RB,
= &s,'R(6,
+ &sL'<.
- 6)
(8.5.10)
So Is: can be written as
CG = nsL2(Sn- O)'R'(RCLiR')-lR(B,
+ nsL2<'(RC;I,'R')-'< + 2nsZ2(6,
- 0) - O)'R'(RCi;R')-'<.
(8.5.11)
Chapter 8. Regression Model: Stochastic Subspace
426
Under the conditions (i) through (iii) of (8.4.3) and CLT as n 4 co,we obtain
fis ; l ~ e ,3~~(6 c- lt,RC;
l ~ ’ )
(8.5.12)
and TZD-~E’(RC;~R’)-~[ -+ co,as n -+ 00, while the third term goes to zero as n co in probability. Hence, under K
e,
.--)
lim P(LG > z) = 1 for all z E R+.
n-co
(8.5.13)
Thus, we obtain the following results: (i)
J;ISl1(e,
PT
- e) + o P ( l ) .
e) =
- 6 ) = fio-l(e,
(ii) fis;l(i:
(iii)
-
fiS;l(e,- S+
-
e) =
-
(8.5.14)
e) + o P ( l ) .
-
e) + o P ( l )
by Theorem 7.8.2. The ADB and ADQB are given by
..PT -s S+ b,(e,) = b3(8, ) = b4(8,) = bS(8, ) = 0 and APT
-S
S+
~ ~ ( 6=, ~) ~ ( )8=,B4(en)= ~ ~ ( 6= O. , Similarly,
,.PT
Mi(8,) = M3(8, )
-S
-
S+
= M4(On)= M5(8,
)
(8.5.15)
= a2C,’,
and the ADQR is given by
-s S+ ~ (: w) 8 ,=~ ~ ( 8 : w) , = u2 t,[wc;’]. (8.5.16) Now, for the restricted estimator, we have ., PT ~ ~ ( : 8w), = R3(en : w) = ~
nse2ilin - Gn112 = L; and E(LE) -+ and fis;’(e,
00 -
-+ 00,
as n -+03. The asymptotic distributions of fis;’(6, 0) are not equivalent. Further,
(8.5.17) - 8)
f i ~ ; ~ ( e-, e) J;lS;l(8,
e) - J;I.,~C;I,’R’(RC,,~R’)-~R~, = J;Isel(e, - e) - J ; ~ s , ~ c ~ ~ R ’ ( R c , ~ R ’ -)e) - ~+[4, R (8.5.18) (~, and this implies that the asymptotic distribution of 6 s : ’ (en- 0) is degen=
-
erate under fixed alternatives K t , since ~ ~ s ~ C ; ~ R ’ ( R C ; I , ’ R ’-+ ) - ~00E as n -+ co.
8.5. Asymptotic Distribution of the Estimators
427
Asymptotic Distribution of the Estimators under Local Alternatives
8.5.2
To obtain a meaningful asymptotic distribution of the estimators and the test-statistic .L:, we consider the following theorem under { K(,)}: Theorem 1. Under IS(,) : 6 ( n )= n-'/'t and the assumed regularity conditions (i) through (iii) of (8.4.3), we have as n cu the following: (i)
v,$\
(ii) V;::
+(a,
-e)
= ,/Ti(&,
- 8)
=
-
(8.5.19)
N,+,(O,~'C;').
Nn+q(-6r,a2AZ),
where A, = C,' - C,'R'(RC,'R')-'RC,'
and
6, = C;~R/(RC;~R')R(.
(iii) V$$ =
&(a,
-
6,)
-
N,+q(6,,a2[C,1 - A,]).
(vi) i i t P ~ ( ~ ) {ILz}: = H q ( z ; A 2 ) (vii)
PT
+
lim P K ( , , { h ( 8 , - 8) 5 z} = H q ( x z ( a ): A2)@n+q(x6,; 0;a'A,)
n-cs
+ JE(6,)
+ U;0;a'[C;' - A,])d @'lz+q(u, 0;o'RC;'R'). {U;(U + S,)'(RC;'R')-l(U + 6,) 2 x:(a)} and
@.n+9(~
where E(6,) = @p(x;U;Z) is the cdf of a v-variate normal distribution with mean p and covariance matrix E and H9(.;A') is the cdf of a noncentral chisquare distribution with q d.f. and the noncentrality parameter A2/2,
428
Chapter 8. Regression Model: Stochastic Subspace
Based on the distribution, we find the ADB, ADQB, and ADMSE matrices and the ADQR. of the estimators. First, the ADB and the ADQB expressions are (i) bl(6,) = 0 and
&(en)= 0
(ii) bz(6,) = -6,
and &(&)
= A2, (8.5.20)
8.6. Confidence Set Estimation: Stochastic Hypothesis
429
&,(6:+ : W) = &(h: : W) - a2(tr[W(CL1 - A,)] x
w - kx,;22(A2))21(x~+2(A2) < k)l
-%wfi= {2E[(1-
k~;;z(A~))l(x;+z(A~)< k)]
-E"1 - kx~~4(A2))21(x;+2(A2) < k)]}. The expressions above may be used to compare the properties of the five estimators as has been done in Section 8.2.3 and (8.2.4).
8.6
Confidence Set Estimation: Stochastic Hypothesis
In this section, we consider the confidence set estimation of the regression parameters, P in the model (8.1.) together with (8.1.2), namely
Y=Xp+e
(8.6.1)
and
h = HP
+ 6 + V.
Thus, we consider the five estimators
p, = (c-'x'Y. (ii) a, = p, - C - ~ H ' ( H C - ~ H +/0)-'(HP, - h). (8.6.2) (iii) j - ~= :~ p, - C- 'H' (HC-'H'+ Q)-'(HP, - h)I(L, < x;(a)), (i)
where L, = ( T - ~ ( H-~h)'(HC-'H' ,
bt = p, - dC-'H'(HC-'H'+ (v) b:' = b, + C-'H'(HC-'H'+
(iv)
+ n)-'(Hp,
- h).
n)-l(Hp, - h)L,', n)(Hp, - h)(l - dL;')I(Ln < d ) .
These estimators may be compactly written as
PE = b n + ( P n - bn>s(L) as before. Now, we write
(8.6.3)
PE as
p;1 = [I - AH]p, - Ah + A(H6, where A = C-'H'(HC-'H' + a)-'. Then A'CA = (HC-'H'
- h)g(L,),
+ fl)-'(HC-'H')(HC-'H' + n)-'
(I - AH)'C(I - AH) = C - H'(HC-'H'
+ n)-'H
(8.6.4)
(8.6.5) (8.6.6)
Chapter 8. Regression Model: Stochastic S u b s p x e
430
Thus, we may consider the (1 -
7)sconfidence sets defined by
c * ( P ; ) = { P : a-211P - P X I x;cr,>. Accordingly, the five confidence sets of (i) CO(P,)= (ii)
P are given by
{ P : o-211P - P,lE 5 x;cr,>,
cR(B,) = { P : 0-'11b - P,II&I x;(r)>, ..PT
(iii) CPT(B;'(~)) (iv)
(8.6.7)
c@)
=
{ P :a
- 2 -~P,~ 11%~
= { P : a-211P -
F X;(Y>>,
b h F xgcr,>,
and S+
I x;(r)>,respectively.
(v) CS+(bEf)= { P : u - ~ I ~-PP,
(8.6.8)
Following Chapter 7, Section 7.9, we obtain the expressions after the coverage probabilities of these confidence sets by
P{a-2/lP- P;I& =
1772
L x;(Y)}
- Wl2 + 1711 - ~ l ~ ( l l ~ 1 1 1 2I )j2
x;cr>>.
(8.6.9)
As given in Section 7.9,
(see Problem 11). Similar conclusions hold for the nonnormal case considering the asymptotic setup.
8.7
R-Est imation: Stochastic Hypothesis
In this section, we consider the problem of R-estimation of the regression parameters P in the model (as in Section 8.1)
Y , = P O L+ X,P when it is suspected but not certain that defined by
h = hp
+ e,
(8.7.1)
belongs t o the stochastic subspace
+ 6 + vq,
(8.7.2)
where h is a q-vector of random variables, H is a q x p matrix of known constants, 6 is a q-vector of constants (unknown), and vg is a q-vector of
431
8.7. R-Estimation: Stochastic Hypothesis
random variables such that E ( v q ) = 0 and Cov(v,) = $a; here 0 is a known q x q matrix of constants. In order to obtain the R-estimator of 0, we first concentrate on the model (8.7.1) and consider the following toward estimating 0: Let R,(b) be the rank of Y , - (x,- %,)b among Yl - (x,- x,)b, . . . ,Y, (x, - R,)b, where x, is the ith row of X = (xi,.. . ,xk)' and X, = n-' c,"=l x,. Now, consider the LRS:
L,(b)
=
. . . Lpn(bp))' =
(h2(h),
7
c n
,=I
(x,- %L)GL(Rz(b)).
(8.7.3)
(i) For a square integrable and nondecreasing score generating function 4(.) defined on ( 0 , l ) as before (see Chapters 4, 5, and 6), a n ( k ) = E#(Uk,,) or
4(&>.
(ii) Let C, = C,"=l(xz - %,)(x, - R,)', and assume that limn-m C, = C , and max12zjn{ (x,- %,)C;'(x, - Xn)} = o ( n ) .
(8.7.4) Further, if we put [/all = ck=lp("),and a = (a('), . . . ,a(?')), then the unrestricted R-estimator (URE), 0, of p is defined by any central point of the set
S = { b : llL,(b)/l = minimum}.
(8.7.5)
Then, by JureEkova's (1969) linearity result (see Section 2.8.4), we have
for any k > 0 and E > 0, where w is a pdimensional column vector and denotes the pdimensional Euclidean norm. Then we obtain
h ( P , -P) -~p(o,O2c-'),
= A;/r2($,4).
I] . [I
(8.7.7)
To define the restricted R-estimator of p, we mimic the least square estimator of P under (8.5.2) as the restricted rank estimator (RRE) of P given by
b, = p, - Ci'H''
n
(8.7.8)
432
Chapter 8. Regression Model: Stochastic Subspace
where Cov(v,) = $fl and 0’ = A$/y’($J, 4) as defined in (8.7.4) and fl is a krlowr~q x q matrix of constants. In order t o test the hypothesis Ho : 6 = 0 against the alternative H.4 : 6 # 0, we consider the rank-test given by
13,
=
(~~(+,4)A,’}n(Hp,- h)’(nHC,’H’
where A: in($,
and
= ( n - 1)-’ CE=l(a,(k) - En)’,
+ fl)-’(HD, E n = n-l
4) = P-’ C,”= Crjn(47 4)
(8.7.9)
- h),
C;=lUn(k), and
(8.7.10) with the following two constraints: (i) C ( j )is the j t h row of C.
(ii) ej is the unit-vector with 1 at the j t h place and 0 elsewhere. Here, Tn($, 4) is a consistent estimator of ~ ( “ ~ 4 ) . Thus, by (8.7.7) as n + co, 13, converges in distribution to the central chi-square with q d.f. under Ho : 6 = 0. Further, under the fixed alternative, HE : 6 = 6, C, -+ 03 as n 4 03, since
C,
where 8, = HP, Note that (i)
6;’fiSn
-
n(8,
(8.7.11)
h.
- N{na-’<, (HC-’H’+ a)},
(ii) n<’(HC-’H’+ st)-’( (iii)
+ fl)-’8,,
= *’(+, #)A,’n8;(nHC,‘H’
-+
co as n
+
-+
00,
- ~ ) ’ ( H c - ~ H ’t1>-’6%0 as n
-+
co.
(8.7.12)
P
Hence, under a fixed alternative, C,+co as n -+ co and lirri,-,m P(Cn > k) 4 1 for all k E R. However, under the null hypothesis Ho : 6 = 0 , L, approximately follows a central chi-square distribution with q d.f. Now, we consider the mixed rank estimators of 8 = URE of 6 is then given by
8,
=
(h
6,
=
( Hi+&
>-
The
( % ), and the RRE of 8 is given by
Pn
+
+O(HC;’H’ $fl)-’(H& - h) >. Let x:(a) be the upper a-level critical value from a central chi-square distribut:on with q d.f. Then we define the PTRE of 8 as -
-. PT
on
= 6, - (6, - 8,)1(Cn
< &a)).
(8.7.13a)
8.7. R-Estimation: Stochastic Hypothesis
433
Similarly, we define the SRE and PRSRE of 6 as the following:
-s
-
+ 03,
and d = 1 - 3 and
6, = 6, - d(6, - 6,)LL11(L, > E where
E,
+0
as n
e:+
6, + (1 - dLi1)I(Ln > d)(6, - 6,)
=
~ ) ,
(8.7.13b)
(8.7.13~)
as in Section 8.4.3. Further, we can show, as in Section 8.7.1, that under the fixed alternative H3 : 6 = 6, we have (i) 6;1&(6ET
= a-1+(~,
-S
(ii) a;’&(~~
-
- 6)+op(l),
+ op(l), - 6 )+op(l),
0) = a-lfi(6, - 6 )
(iii) 6;’&(On -s+- 6 ) = a-’&(Gn
(8.7.14)
while the asymptotic distribution of &6,(6, - 6 ) is degenerate as n Thus, as in Section 8.5.1, we consider the local alternatives
K(,) : 6(,) = n-l/’E,
6 E R”
-+
00.
(8.7.15)
to obtain the following theorem: = nP1/’< and the assumed regularity condiTheorem 1. Under K(,] : tions of (8.6.4), we have the following as n -+ 03:
(i)
vJ~;= &(a,
(ii) VJ:;
=
&(an
A, = C;l(iii) VA ;:
- 6) - 6)
- N,+~(o, -
Nn+q(-Sx,a2AZ),where
C;’R’(RC;lR’)-’RC;’
= &(6, - 6,)
(8.7.16)
a2c;1).
-
and 6, = C;lR’(RC;lR’)R<.
Nn+q(6,,~2[C;1 -A,]).
Chapter 8. Regression Model: Stochastic Subspace
434
+
+
where E ( 6 , ) = { U ; ( U 6z)’(RC;’R’)-’(U 6,) 2 x;(a)} and @,(x;U;C)is the cdf of a v-variate normal distribution with mean p and covariance matrix E and H q ( . ; A 2 is ) the cdf of a noncentral chi-square distribution with q d.f. and noncentrality parameter A2/2.
(ix) fi(6,
V
- 6 ) = U-k
S+
+
+r )
C;’R’(RC;~R’)-~(RU
RU+()I(RC;~R~)-~(RU + ()
C;’R’(RC;~R’)-~(RU+ r )
Separating out the expressions for the properties of the estimators of the regression coefficients P, we have as n -+ 03, Theorem 2. Under K(,), we have
fi(& a,) -
- p) N,(-d, a2(C-’ - A ) ) where A = C-’H’(HC-’H’ (ii) &z@, O)-’HC-’, 6 = C-’H’(HC-lH’ + S2)-’H<;
(iii)
-
Np(6,a2(C-’ - A));
+
(iv) liinP(L, 5 zlK(,)) = H q ( z ; A 2 ) ;where A2 = o-~(’(HC-’H’+
a)-’&
Further,
435
8.7. R-Estimation: Stochastic Hypothesis
Based on Theorem 8.7.2, we may compute the ADB, ADQR, ADMSE, and the ADQR of the five estimators given in the following two theorems: Theorem 3. Under { K ( n ) }and conditions (8.1.4) as n 00, the ADB, ADQB, ADMSE and ADQR of the five estimators are given as follows: ---f
(i) U R E ( ~ , ) :
bl(P,)
=0
and Bl(p,) = 0
MI@,) = 02C-’ and Rl(p,; W) = o2tr(C-lW). (ii) RRE(~,): b2(bn)= -6
and B2(bn)= A2;
M2(b,) = 02(C-’ - A) + 66‘ and R2(bn;W) = o2tr[(C-’
-
A)W] + 6’W6.
(8.7.19)
Chapter 8. Regression Model: Stochastic Subspace
436
8.8 Conclusions In this chapter, we studied the estimation of regression parameters when it is suspected that the regression parameters belongs to a stochastic subspace under the assumption that the covariance of the errors in the description of the stochastic subspace is known. The problem of estimation of the regression parameters is also discussed when there exists prior sample information from a previous study. The usual five estimators are defined and their finite sample properties are discussed in detail. We find that the properties of these estimators are similar to the properties of estimators given in Chapter 7. We also obtain confidence sets based on these estimators and apply the developments of Chapter 7 to study them. The small sample as well as large sample properties of confidence sets are similar to that of Chapter 7.9. Some nonparametric methods leading to R-estimation are also discussed.
8.9
Problems
+
1. Show that (8.1.10) holds. First, show that RCLIR’ = HC-IH’ 0. 2. Verify (i) through (v) of (8.2.1). For (ii), write down the explicit form of S,. Show that 6,C,S, = A2.
437
8.9. Problems 3. Verify (i) through (v) of (8.2.2). . S+
4. Compare 6 , and 8, using MSE matrices and weighted risk expressions. 5. Find the expressions for A P T .-
-PT
MRE(6, ,On), and RRE(8,
6. Find the expressions of the covariance matrices of 7. Show that
-
-R 61nl = eln, - C;lH',(HlC;'H;
;en)
APT -s . Sf en,8, ,8,, and 8, .
+ f20)-~(Hl61,~ - h)
as given in (8.3.6). 8. Verify (i) through (v) of (8.3.19). 9. Verify (i) through (v) of (8.3.20). 10. Consider the system of two equations and a set of stochastic constraints
[I;: [ =
x1
1:
31i 2 I 0
+
[ ::]
where ( e ; ,e$,vi) is normally distributed with mean vector (0', 0', 6') and covariance matrix
C= (a) Find the estimator
("'
0
)
8
0%
(Pi,0;)'.
(b) Find the mean vector and covariance matrix of the estimators. (c) Determine the stochastically restricted estimator of suming C to be known.
(Pi,P;)'
as-
(d) Determine the test-statistic for testing RIP, = rl and its distribution.
(p,),
11. Verify the asymptotic expression of the coverage probabilities of Co
(BE')
C"(,hn ) ? CPT(/!I:'), Cs (,h:), and Cs+ from the developments of Section 8.7.3. 12. Prove Theorem 8.6.2. 13. Prove Theorem 8.6.3. 14. Consider the two sets of regression models with nonnormal errors Yn,
and
=X
I P+ ~ en,
7
438
Chapter 8. Regression Model: Stochastic Subspace where p 1 and Pz are pl- and pa-vectors of regression coefficients and X 1 and X z are n1 x pl and 72.2 x pa matrices of known constants and Y,, and Yn2are response vectors such that n i / n 1 + nz + X i (0 < X i < 1) as n(= n 1 + n 2 ) -+ 00. Let H1 and H2 be two n 1 x p l and 122 x pz known matrices. Formulate the (a) unrestricted, (b) restricted, (c) preliminary test estimators, (d) Stein-type estimators, and the (e) positive-rule Stein estimators when it is suspected but not certain that HIP, = H2P2 may hold using (a) LSE procedure and (b) R-estimation procedure.
Chapter 9
Ridge Regression Outline 9.1 9.2 9.3 9.4
9.5 9.6 9.7 9.8
Ridge R.egression Estimators (RRE) Ridge Regression as Bayesian Regression Estimators Bias Expression of Various Ridge Estimators Covariances, MSE Matrix and Weighted Risk Functions Estimators Performance of Estimators: Risk Analysis Estimation of the Ridge Parameter Conclusions Problems
In the last two chapters, we have studied the properties of five estimators of the regression parameters, namely the unrestricted, the restricted, the preliminary test, the James-Stein-type, and the positive-rule Stein-type estimators of the regression parameters p in the model
Y = Xp + e; e
-
N,(0,021,)
when it is suspected but not certain that p belongs to a nonstochastic subspace. It is observed from above that the usual least squares estimator (LSE) of p depends heavily on the characteristics of the matrix C = X’X. If the C matrix is ill-conditioned (near dependency among various columns of C , called “multicollinearity”), then the least squares estimator (LSE) produces unduly large sampling variances. Moreover, some of the regression coefficients may be statistically insignificant with the wrong sign, and meaningful statistical inference becomes difficult for the researcher. Hoerl and Kennard (1970) found that “multicollinearity” is a common problem in the field of engineering and econometrics, among many other fields. To resolve this problem, they suggested the use of C ( k )= X’X +kI, ( k 2 0), instead of C in the estimation of
439
440
Chapter 9. Ridge Regression
p. Accordingly, we consider the estimating equation (C + k I p ) p n ( k )= X'Y
+
instead of C p , = X'Y. The solution p , ( k ) = (I, kCP1)-lP, is known as the rzdge regression estimator ( R R E ) of p. Recent application of ridge regressions are given in Malthouse (1999) and the references there in. Ridge regression methods have been considered by many researchers, beginning with Hoed and Kennard (1970) and followed by Gibbons (1981), Vinod and Ullah (1981), Sarker (1992), Saleh and Kibria (1993), Kibria (1996a, b), Gruber (1998), Singh and Tracy (1999), Tabatabaey (1995), Wencheko (2000), Inoue (200l), Montgomery et al. (2001), and very recently Kibria and Saleh (2003, 2004) among others. The purpose of this chapter is twoIold: (1) to present the basic ideas and properties of ridge regression estimators by considering the estimation of the regression parameter p when C is ill-conditioned and it is suspected that /3 may be restricted to the subspace Ho : p = 0, and (2) to discuss the estimation techniques of the ridge parameter k . First we consider the regression estimators and variations thereof, to show that these can be further improved via ridge regression (Section 9.1). Consider the model given earlier. The usual unrestricted and the restricted estimators of p under the restriction p = 0 are given by
6,
= C-'X'Y
6,
and
= 0,
respectively. The usual preliminary test estimator (PTE) of ., PT
P,
=PJ(Cn
p is defined by
2 ~p,rn(a)), m =n -P
where F,+(cr) is the a-level critical value from the null distribution of L,, I ( A ) is the indicator function of the set A , C, is the test-statistic for testing the null hypothesis p = 0 (against the alternative hypothesis p # 0) given by
with
a2 = m-'(Y
- x ~ , ) ' ( Y-
~6,).
Similarly, the James-Stein-type estimator of
p
and the positive-rule Stein-type estimator of
p is given by
..S+
0,
is given by
= (1 - dL,l)Z(Cn> d)P,.
In the next section, we define the ridge regression estimators of p corresponding to each of the estimators, beginning with the unrestricted ridge regression estimator (URRE).
9.1. Ridge Regression Estimators
9.1 9.1.1
441
Ridge Regression Estimators (RRE) Ridge Regression with Normal Errors
In this section, we consider the ridge regression version of the four estimators of p. Accordingly, the unrestricted ridge regression estimator (URRE) of p is defined by the ridge matrix, R(k) as follows:
p , ( k ) = (C + kIp)-lX'Y = C-l(k)X'Y
= R(k)P,,
(9.1.1)
p,
where R ( k ) = (IP+kC-')-', k 2 0, is the unrestricted estimator of p and k is the ridge parameter. This is the well-known Hoerl and Kennard (1970) estimator of p. Further, due to Rao (1975), a generalized ridge regression estimator of p can be defined by choosing a positive-semidefinite matrix G, instead of kip, and writing the estimator as
j,(G) = (C + G)-lX'Y = (I, + GC-l)-'p,.
(9.1.2)
Under the null hypothesis Ho : p = 0 , the restricted ridge regression estimator (RRRE) of p is defined by p,(k) = 0, where 0 is the restricted estimator of p. Next, we consider the preliminary test ridge regression estimator (PTRRE) of p defined by
,bET(k) = R(k)bET= R(k)p,
- R(k)P,I(L,
5 F p , n ( ~ ) ) , (9.1.3)
where C, is the test-statistic for testing p = 0. These kinds of estimators were studied by Saleh and Kibria (1993) for the normal regression model and by Kibria (1996a) for the Student's t regression model. Parallel to PTR.RE, we define the Stein-type ridge regression estimator (SRRE) of as
Finally, we consider the positive-rule Stein-type ridge regression estimator (PRSRRE) of p defined by
s+
a t ' ( k ) = R ( k ) P , = R(k)bf - (1 - dL;')I(C, < d ) R ( k ) P , .
(9.1.5)
These kinds of estimators are considered by Kibria and Saleh (2004a, b) for the normal regression model and by Tabatabaey, Kibria, and Saleh (2005) for the Student's t regression model. -PT - S . St The four estimators 0, p,, and p, therefore give rise to four corresponding ridge regression estimators that are of interest in practical work in applied statistics.
p,,
442
Chapter 9. Ridge Regression
Nonparametric Ridge Regression Estimators
9.1.2
In this section, we consider ridge regression estimators based on rank statistics. As in Chapter 7,we consider the model
(9.1.6)
Y=Xp+E,
where X is a n x p matrix of known constants such that C, = cy=l(xi%,)(xi- ?in)' (where xi is the ith row of X and X, = ~ ~ ~ = lis xill-i ) conditioned. This means that some of the characteristic roots of C, are small, which makes some of the asymptotic variances very large, and as a result makes the inference problem difficult. Thus, as in Section 7.11.1,consider the linear rank statistics required for testing Ho : p = 0: n
(9.1.7)
Ln(b) = C ( X i - %Jan(&(b)) i=l
as given by (7.11.6). We assume (7.11.7i-iv). In particular, we mention limit,-+w(n-lC,) = C and max { (xi - %,)'C,'(xi - X,)} = o(n).
pr)
Let be the unrestricted rank-estimator of point of the set
p by choosing the central (9.1.8)
S = { b : I IL, (b)I I = minimum}. Then, as in Section 7.11.7,
h@r)P ) -
Jyp(o7
A;/?(+,
(9.1.9)
4c-').
We define the ridge matrix R ( k ) as
R ( k ) = lim (I, + k(n-'C,)-' new
= (I,
+ kc-')-'.
(9.1.lo)
-
Then the nonparametric ridge regression estimator of ,B is defined by p, ( k ) =
- (.)
R(k)P, , and under the local alternatives, f l ( n ) = n-l/'d, 6
&(&;- p)
N
= ($1,.
Np{ - kC-'(k)G, A;/r'[R(k)C-'R(k)]}.
. . ,a,)',
(9.1.11)
Thus, the asymptotic distributional bias, covariance, and MSE matrix can be computed. We mention that the asymptotic distribution of the least square theory based ridge regression estimator R(k)p,, where is the LSE of 0, under the local alternative K(,) : p(n)= n-'/'G is again
p,
N,{
- kc-'(k)6;
2(~(k)c-~~(k))}.
(9.1.12)
Hence, the asymptotic efficiency of the rank-based ridge regression estimator is +Y2(47,
&A,',
(9.1.13)
9.2. Ridge Regression as Bayesian Regression Estimators
443
which is the same as the efficiency of the R-estimator relative to the LSE of
P.
Based on the rank ridge regression estimator of more estimators, as before:
p, we can define the three (9.1.14)
(ii) fi,"(k) = ~ t ' ( k ) { l- ( p - ~ ) L ; ' } I ( L ,> En) where (iii)
E,
---$
0 as n -+ 03, and
S+
p, ( I C ) = P:'(~){I
- ( p-~
) L ; ~ } I ( L>, ( p - a)),
respectively, where
L, = A,2[Lh(0)C,1L,(O)]
(9.1.15)
EL=,
with A: = ( n- 1)-' C ; = l ( a n ( k ) a, = n-' a n ( k ) is used to test the null hypothesis Ho : = 0. These estimators parallel those given in (9.1.3through 9.1.5). The bias, covariance matrix, MSE matrix and risks of the estimators are given in Problems 10 and 11 and are left to the reader to verify.
9.2
Ridge Regression as Bayesian Regression Estimators
We justify the ridge regression estimator (RRE) as a Bayesian estimator of the regression parameter p. be Let Y given /3 be N,(XP,a21n), and let the prior distribution of N p ( y ,r2V). Then the posterior distribution of p given Y = y is y
+ (C+ -07V2-')-'X'(y
02 + -V-')-' 72
- Xy),a2(C
Thus, the Bayes estimator of /3 given Y = y under a quadratic loss function is the posterior mean
E [ P ( y ]= y
+ ( C + kV-l)-'x'(y - Xy),
(9.2.2)
where k = a 2 / r 2 This . estimator is the generalized ridge estimator of p. If V = I,, and y = 0, then E(PIy) is equal to the ridge estimator of /3 defined by Hoerl and Kennard (1970). Thus, the Bayes estimator of /3 in this case is given by
p,(k) = R(k)P,,
R(k) = (I,
+ kc-')-',
k 2 0.
(9.2.3)
The remaining three estimators are obtained by transforming the basic estimators given in the introduction. In addition, if k is estimated by some function,
444
Chapter 9. Ridge Regression
k ( y ) of the sample observations, then we obtain the so-called adaptive ridge estimators of P. The parameter k = c?/-r2 is called the ridge parameter of the ridge estimator. There exists a number of methods in the literature for the estimation of k. The interested reader is referred to Newhouse and Oman (1971), Farebrother (1975), Hoerl and Kennard (1975), Hoerl, Kennard, and Baldwin (1975), Marquardt and Snee (1975), McDonald and Galarneau (1975), Obenchain (1975), Dempster et al. (1977), Lawless and Wang (1976), Lawless (1978), Hemmerle and Brantle (1978), and Vinod and Ullah (1981) among others. More discussion on the ridge regression is available in Gruber (1998) and Montgomery and Peck and Vining (2001). Hoed and Kennard (1970) suggested use of the ridge trace t o find an appropriate value of k . The ridge trace is a plot of @,(k) versus k between 0 to 1. If the multicollinearity in the data is severe, the ridge trace will show the obvious instability in the regression coefficients. However, more work needs to be done on the ridge trace. An excellent review and study of this problem is given by Kibria (2003), details of which are provided in Section 9.6.
9.3
Bias Expression of Various Ridge Estimators
The bias vectors of various ridge estimators are given in this section. Bias Vector of The bias vector of the unrestricted RRE is given by
p,(lc).
bl(P,@)) =
=
9.3.1
E(P,(k)
-[Ip
-
-P) = ~
~ R ( k ) -PPI,
R ( k ) ] P= -kC-’(k)P.
(9.3.1)
Bias Vector of f(’(k) PT
Similarly, the bias vector of PTRRE, p, ( k ) can be written as
b Z ( P 3 k ) )= E(P%
- P)
= E ( P , ( k ) - PI - E { R ( k ) P , G
IFp,m(4)}
= - k c - ’ ( k ) P - R(k)PGp+2,,(&;
A2),
where e, = &Fp,,(a)
(9.3.2)
and A’ = (P’CP)O-~.
-s
-s
Bias Vector of P,(k). In a similar way, the bias vector of SRR.E, P,(k) can be written as b3
( P h ) = E (a%
- P ) = E(P,(k)
-
P ) - PdR(w(P,.C,l)
= - k C - l ( k ) P - p d R ( k ) P E [X,S~~(A’)] .
(9.3.3)
9.3. Bias Expressions
445
..S+
. S+
Bias Vector of P, (k). Finally, the bias vector of PRR.RE, P, ( k ) can be written as
bdP:+(w= E ( P : + ( k ) - P ) = E(p:(k) - P ) - qdR(k)E{B,(l - d L i l ) I ( L , =
- ~ c - W-Pdl~[~;&,@2)1~ +dlR(k)PE
=-kC-WP
- d l ~ ( ~ ) P ~ , + 2 , , ( dA’) l;
[Fp:z,m (A’ )I(Fp+z,n(A’)
< dl I]
- R ( k ) P { E [ U - ~l~~-:,,,(A2))~(~,+2,,~A2)
+ dlE[F;:2,,(A2)1)
(9.3.4)
1
where dl = &d. The corresponding expressions of the rank estimators are given in Problem 12. PT
Bias Comparison. For a = 0, the bias of P, ( k ) coincides with that of the restricted ridge regression estimator, 0 , while for Q = 1, it coincides with that of p , ( k ) , the unrestricted ridge regression estimator of P. Also, as the ..PT -s . S+ noncentrality parameter A2 + 03, bz(P, ( k ) )= b 3 ( P n ( k ) )= b4(Pn ( k ) ) = bl(B,) are unbounded. Under Ho : /3 = 0 all the estimators are unbiased. Now, we compare the biases of the proposed estimators under the alternative hypothesis. In order to present a clear-cut picture of the biases, we transform them in quadratic (scalar) form by defining Q B ( p * ) = b(p*)’b(p*) as the quadratic bias function of an estimator ,b*of the parameter vector P. The quadratic bias functions of the estimators can be expressed by the following theorem: Theorem 1. Quadratic bias of the URRE, PTRRE, SRRE, and PRSRRE estimators are given, respectively, by
QBi(B,(k))
= k2P’C-’(k)P,
(B:+ ( k ) )= P’R’(k)R(k)P[F(A2)I2 + 2kp’C-’(k)R(k)PF(A2)
QB4
+ k2P’c-2(k)P,
(9.3.5)
Chapter 9. Ridge Regression
446 where
F ( A ~= ) -E [(I - dl~;;2,,(~2))r
(Fqf2,,(A2) I 4 ) ]- ~ G [ Fp-+’a , m (A2)]*
The quadratic bias of the PTRRE depends on a , the size of the test. The magnitude of the RRRE increases without a bound and tends to 0;) as A2 -+03. Since both E[Fp72,,(A2)]and G q + 2 , n - p ( & , A2) are decreasing functions of A’, the quadratic biases of the PTRRE, SRRE, and PRSRRE estimators start from 0 and increase to a point, and then decrease gradually to k2P’C-2(Ic)Pas A2 + 03. Under the null hypothesis, the quadratic biases are the same for all estimators. However, under the alternative hypothesis, the bias of the PRSRRE estimator remains below the curve of SRRE and PTR.RE for all Ic > 0. Based on the analysis of equation (9.3.5), we can establish the following inequality:
QBi(P,(k))
I QB4(b:+(Ic))I Q&(b:(k)) 5 QB2(b:T(k))I P’P
whenever
otherwise.
whenever
9.4
Covariance, MSE Matrix, and Weighted Risk Functions
In this section, we obtain the expressions for covariance matrices, MSE matrices, and the weighted risk functions of the ridge regression estimators.
URRE,
,8,(Ic).The covariance matrix of p,(k) V l ( k ) = E[R(k)(P, - PHP,
is given by -
P)’RV)]
=a2~(k)~-1~’(k).
(9.4.1)
As a result, the MSE matrix of P,(k) is given by
M1
(p,( k )) = n2R(k )C-
R’( k ) + k2C-’ (k)Pp’C- ( I c ) .
(9.4.2)
9.4. Covariance, MSE R/Iatrix, and Risk Functions
447
The weighted risk function of p,(k) based on a quadratic loss function with the p.s.d matrix W is given by
Rl(p,(k); W) = u2tr[WR(k)C-’R’(k)] = a2tr[R(k)C-’R’(k)]
+ k2p’C-1(k)WC-1(k)P
+ k2P’C2(k)P,
for W = I,. (9.4.3)
- PT
PT
PTRRE, P, ( k ) . The covariance matrix of P, ( k ) is given by V2
(bz’(k)) = R(k)V2 (by)R’(k) + R(k)PGp+2,m(ea;A’)] - P ) + ~ ( k ) ~ ~ p + 2 a2)1’ ,m(~;
= EP(k)(bET - P)
XIR(~)(P;
= u2[R(k)C-’R’( k)] - u2[R(k )C -
’R’(k)]Gp+2,m(e,
;A2)]
+ [2Gp+2,m(Ja;A2) - Gp+4,m(C;A2) - {Gp+2,m(t,; A2)}2]R(k)PP’R’(k). (9.4.4)
Thus, the MSE matrix is given by
M2(pET(k))= u2R(k)C-lR’(k) - u2 [R(k)C-’R’(k)] Gp+2,m(t,; A2)
+ R(k)PP’R’(k)(aGp+2,m(e,; A2) - Gp+4,m(eL;A2)}
+ k[R(k)PP’c-‘(k) + C-’(k)PP’R(k)]Gp+2,m(e,; A2) + k2c-1 (k)PP’C -1 ( k ),
(9.4.5)
and the weighted risk expression is given by &(b;*(k);W)
= u2 tr[WR(k)C-’R’(k)]
- u2tr[WR’(k)C-1R(k)]Gp+2,m(!,; A2)
+ P’R’(k)WR(k)P{2Gp+2,m(e,; A2) - Gp+l,m(e:;
+ k2[P’C-’ (k)WC-’ ( k ) P ]+ k[P’C-’ ( k ) W R (k ) p
+ P’R’(k)WC-’
(k)P]Gp+z,m(ea;A2),
A2)} (9.4.6)
where ! := &Fp,m(a) and GP+4,,(1,;A2) is the cdf of a noncentral Fdistribution with ( p 4, m) degrees of freedom.
+
For W = I,, A
PT
R2(P, ( k ) ;I p ) =
o2tr[R(k)C-’R’(k)](l - Gp+2,,(!!,; A2))
+ P’R’(k)R(k)P(2Gp+2,m(e,; A2) - Gp+4,m(e:;
A2)}
+ k2P’C-2(k)P + kP’[C(k)R(k)+ R(k)C-’(k)]PGp+2,,(e,;
A2). (9.4.7)
Chapter 9. Ridge Regression
448
-s -s SRRE, P,(k). In this case, the covariance matrix of P,(k) is given by V 3 ( a f ( k ) ) = 02R(k)C-’R’(k) - 02pd[R(k)C-’R’(k)]
x {2E(X,-t22(A2))- ( P - 21E(xp;42(A2)))
+ dPR(k)PP’R(k) { (P+ 2)E[X;:4(A2)I
- dP
(E[x;:,(A”,l)”}
*
(9.4.8) The MSE matrix of af(k) is
M3
(af( k ) )
= o2(R(k)C-’R’(k)) - 02pd[R(k)C-1R’(k)]
x W(x;:2(A2))
- ( P - 2)E(X,S42(A2)))
+ P ~ ( +P ~ ) ( R ( ~ ) P P ’ R ’ ( ~ ) ) E ( x ; ~ ~ ( A ’ ) ) + d ~ k [ R ( k ) P P ’ C - l ( k )+ C - ’ ( ~ ) P P ’ R ’ ( ~ ) ] E ( X ~ : ~ ( A ~ ) ) + k2c-1(k)PP’C-1 ( k ). (9.4.9) -s
The weighted risk function of P,(k) follows immediately by calculating the tr[WM3(b:(k))] as given below: R 3 ( b 3 k ) ; w)
= o2tr[WR(k)C-’R’(k)] - 0 2 p d ~~[WR(~)C-’R(~)]{~E[X;~~(A~ - ( P - ~)EXL:~(A’)]}
+ k2P’C-’
+ ~ P ( +P 2)(P’R’(k)WR(k)P)E~xL~4(A2)]
+
(k)WC-’ ( k ) P pdk{ 0’C-l ( k ) W R ( k ) P
+ P’R’(k)WC-’
(k)P}E[x;:,(A2)]
(9.4.10)
= o2tr[R(k)C-lR’(k)] - 02pdtr[R(k)C-’R’(k)]
X{2E[X&(A2)1 - ( P - 2)E[X;:2(A2)]}
+ ~ P ( +P ~ ) ( P ’ R ’ ( ~ ) R ( ~ ) P ) E [ x+; k2P’C-2(k)P ~~(A~)] + pdk [P’C- ’( k )R(k ) P + P’R’( k )C - ’( k ) P ]E [xL2: (A2)]
(9.4.1 1)
for W = I,. Finally, we consider the covariance and MSE matrix t o obtain t h e risk of the positive-rule Stein-type ridge regression estimator (PRSRRE). They are provided in the following subsection. For the expressions for the R-estimators see Problem 13.
9.4. Covariance, MSE Matrix, and Risk Functions
449
Chapter 9. R,idge Regression
450
9.5
Performance of Estimators: Risk Analysis
Since C is a p.s.d matrix, there exists an orthogonal matrix I? such that
I”Cr = A
= Diag(X1, X2,.
. . ,A,),
where X 1 _> A 2 2 ... A, > 0 are the eigenvalues of C . It is easy to see that the eigenvalues of R ( k ) = (I, k c - ’ ) - ’ and of C ( k ) = ( C kI,) are X z + k , . .. , and ( X i k , XZ k , . . . ,A, k ) , respectively. Then
(x1xz
&)
+
+
+
+
+
we obtain the following identities: P
t r ( R ’ ( k ) C - ’ R ( k ) )=
i=l (Az
Xz
+ k ) 2’
(9.5.1)
(9.5.2)
For the risk comparison, we let W = I,.
C o m p a r i s o n of 6, and p , ( k ) . The risk function of p , ( k ) is given by (9.5.3)
If k = 0, then the first term equals u2Cr=lA,’, and the second term equals 0. The first term is a continuous monotonically decreasing function of k , and its derivative w.r.t. k approaches 00 as k -+ O+ and X p + Of as k -+ 0. The second term is also a continuous monotonically increasing function of k , and its derivative w.r.t. k approaches 0 as k -+ O+. Note that the second term approaches 11/3112 as k 00. Thus, differentiating (9.5.3) with respect to k , we get ---f
Next we define (9.5.5) where amaxis the largest element of a. We see that a sufficient condition for (9.5.4) t o be negative is that there exists a k E ( 0 , k l ) such that URRE, p , ( k ) has smaller risk than that of unrestricted estimator, Similarly, we can show that a sufficient condition for R1(fin(k);I,) t o be smaller than R l ( p , ; I , ) = u2trC-’ is that there exists a A’ E ( 0 , A 2 ( k ) where ]
p,.
A 2 ( k )=
tr[C-l - R ( k ) C - ’ R ( k ) ] . k2Ch,i, [C-2( k )c - 1 1
Combining (9.5.5) and (9.5.6), we have the following theorem:
(9.5.6)
45 1
9.5. Performance of Estimators
Theorem 1. A sufficient condition for the superiority of @,(k) that there exists a pair (A2,k) such that
(A2$)
E
(0,A2(kl)lx (Olkll.
Further, the MSE difference between
u2(c-l =
-
6,
over
@,
is
(9.5.7)
and B ] , ( k )is given by
R(~)c-~R’( ~k2c-1(k)pp’c-1(k) ))
(9.5.8)
r(h+ k ~ , ) - l {(..’(A + ~ I ~ ) A - ~+(ICI) A - 2~- k 2 a a ’ } ( + ~ krP)-T
= I’(A
+ kip)-' [ka2(21p + k A - ’ ) - k 2 a a ’ ] ( A+ kIP)-’I’’.
(9.5.9)
The r.h.s. of (9.5.8) is an n.n.d. matrix if and only if (1) a belongs to the range of (21, kA-’) and (2) a’(2IP k A - ’ ) - ’ a 5 a 2 / k , holds true (see Baksalary and Kala, 1983). Using Courant’s theorem, we have
+
+
[ (
1 A’ 5 - max ___ k 1SiSp 2Ai + k
)I-’
= A2(kl).
(9.5.10)
Comparison of PTRRE with PTE and URRE. Case 1: Null hypothesis Ho : p = 0 . In this case, the risk-difference of the two estimators, namely u’tr[C-’
-
PT
*
PT
p, and p, ( k ) ,is given by
R(k)C-’R’(k)](l - Gp+~,m(t,; 0)) 2 0.
PT
(9.5.11)
PT
Hence, p, (k)dominates p, uniformly for all k ( 2 0) and.,! Case 2: Alternative hypothesis HA : p # 0 . The risk-difference in this case is given by 0’
tr[C-’ - R(k)C-’R’(k)](l - Gp+2,m(!a; A2))
-p’k -k{I
- CC-2(k)C{2Gp+2,m(!,;
A’)
+ C-’(k)C}G,+z,,(!,;
- CC2(lc)
-
Gp+4,m(c;A’)} - k 2 C 2 ( k )
A’)]p.
(9.5.12)
The expression (9.5.12) is nonnegative if and only if
P’[k2C-2(k)+ (1- CC-2(k)C>{2Gp+~,m(L; A2) - Gp+~,rn(!;; A’)}
+k{CC-’(k)
5 u2tr[C-’
-
+ C-2(k)C}Gp+z,m(tcr; A2)]p R(k)C-’R’(k)](l - Gp+2,m(!a;A2)).
(9.5.13)
By standard calculations using Courant’s theorem, we obtain
A2 I
u’ tr[C-’ - R(k)C-’R’(k)](l - Gp+P,m(!,; A2)) = A’@, a ) , Chmi, [AC-’1 (9.5.14)
Chapter 9. R.idge Regression
452
where
A
+ { I - CC-2(k)}{2Gp+2,m(t,;A2)- Gp+4,,(t;; +k{CC-’(k) + C-’(k)C)G,+z,,(l,; A’).
= k2C-’(k)
Now, differentiating the risk of
PT
p, *
(9.5.15)
( k ) w.r.t. k , we obtain
-a2(1 - Gp+2,m(&;A’)) -Gp+4,m(f:,;
A’)}
-
X,2~?(2G~+z,,(&; A’)
A’))}.
(9.5.16)
A sufficient condition for (9.5.16) to be negative is that there exists a k E (0, k z ) where k’ = l min
[a2(1 - Gp+~,m(&; A’))
+ X?4(2Gp+2,,((,;
A’)
- Gp+4,m(t:;
A’))]
max [ 4 1 - Gp+2,n(L;A’))]
I
(9.5.17)
,.PT
For t, = 0, we have k2 = kl as in Section 9.5.1. Thus, 0, (k) is superior to ..PT 0, whenever k E [0,kz] under H A . This leads us to the following theorem: Theorem 2. A sufficient condition for there exists a pair (A2,k ) such that
PT
PT
p, ( k ) t o be superior t o p, is that A
A
(9.5.18)
(A2,k) E (0,A2(k2,Q)lx (0,kzI. PT
Consider the MSE comparison between p, difference is given by
and
. PT
0, (k). Here the MSE
a2[C-l - R(k)C-’R’(k)](l - Gp+2,,(l,; A’))
+[PO’
- R(k)PP’R’(k)I{2Gp+z,,(f,; A’) - Gp+4,,(t:;
A’)}
453
9.5. Performance of Estimators
x
[
lmax
( 2XiL + k (1) -
If ! ,= 0, (9.5.22) reduces to
I)-(
2 [ max
A’ i k
lSi<_p
1
2Xi
-1
+k
.
(9.5.23)
Now, we consider the efficiency of PTRR.E relative to URRE given by -1
J % ( A ’ ; ~=) [l - 9 k ( A 2 ? a ) ]
9
(9.5.24)
Chapter 9. Ridge Regression
454 where
- Gp+4,,([;;
A’),> + 2P’[C-’ (k)R(k) + R’(lc)C-’(k)]PGp+~,,(e,; A’)] (9.5.25a)
and
&)’(A2, a ) = o2tr(R(k)C-’R(k))
+ k2,f3’C-2(k)p.
(9.5.25b)
Note that &(A2; a ) attains its maximum at A’ = 0 with the value ~ I n , x ( ~ , Oa; ) =
[1 - Gp+2,m(Pn;0)]-1.
(9.5.26)
As A2 increases, Ek(A2;a) decreases, crossing the 1-line to a minimum Ek(Aki,(a);a) a t A2 = Akin, and then increases toward the 1-line as A’ -+ co. Since PTRRE does not dominate URRE uniformly, we select an optimum a by prefixing an efficiency E: and solving for a satisfying the equation
Ek(A;i,(a);a)
2 E:-
(9.5.27)
The solution a* obtained in this way is considered t o be appropriate for the preliminary test ridge regression e s t i m a t o r (PTRRE) for p.
-s
Comparison of SRRE, ,h:(k) a n d SE, p,.
Case 1: Null hypothesis Ho : p = 0. The risk-difference function, in this case, turns out t o be
R 3 ( ~ ~ ; I p ) - R 3 ( , h : ( k ) ;= I p~~tr[C-’-R(k)C-~R’(k)](l-d) ) > 0, (9.5.28)
which is nonnegative for all k (> 0). Hence, SRRE dominates SE uniformly in k under Ho. Case 2: A l t e r n a t i v e hypothesis HA : p # 0. In this case, the risk function is given by
R3 (b:( k ) ;1, = o2 tr[R(k)C-’R’(k)] [l - pd{ ( p - ~ ) E [ x ; $ ~ ( A ~ ) ]
(9.5.29)
9.5. Performance of Estimators
455
Differentiating (9.5.29) with respect to k , we obtain
Hence, a sufficient condition for (9.5.30) t o be negative is that there exists a k such that 0 < k < k3 where k3 = k~)(A’)/k~)(A’) with
and (2)
IC3
(A
’ -- max Xia?(l - pdE[~;$~(A’)l). l
-s
(9.5.31b)
-s
Hence, p,(k) has smaller risk than p, for k E [O,k3].Further, we can check the sufficient condition on A’ by considering the risk-difference R3(&
lp)
- R 3 ( X ( k ) I;p )
= a’ tr[C-’-
+P(P
R(k)C-’R(k)] [l -pd{ ~E[X;:~(A’)] -(~-~)E[x;:~(A’)]}]
+ 2)dP’[Ip - R’(lc)R(k)]P’E[x,-,(A’)] - k’p’C-’(k)p
-kP’[C-l (k)R(k)
+ R’(k)C-’
(k)]PE[x;$2(A2)].
(9.5.32)
Using Courant’s theorem, we find that SRRE is superior to SE whenever A’ < AZ(k), defined by (9.5.33) where
and
Ail(k) = Ch,,(AC-l)
(9.5.34b)
Chapter 9. Ridge Regression
436 with
A = k2C-2(k) + k[C-’(k)R’(k) -P(P
+ 2)@p
+R(~)C-’(~)]E[X,S~,(A~)]
- R(k)R’(lc)]E[x~$,(A’)~.
(9.5.35)
Based on the analysis above we state the following theorem:
Theorem 3. The SRRE is superior to SE in the sense of having smaller quadratic risk if there exists a pair (A2,k ) such that
(A2,k ) E ( 0 ,A2(k3)]x (0, k 3 ] .
(9.5.36)
Comparison of SRRE with U R R E . We consider the risk-difference
~ 1 ( D n W & )- R3(X(k);Ip) = pda2 tr[R(k)C-’R(k)]
( p- ~ ) E [ x ~ $ ~ ( A ~ ) ]
+
-kp’[C-’ ( k ) R ( k ) R ( k ) C - l (k)]PE[x;;2(A2)].
(9.5.37)
The risk-difference is nonnegative for all pairs (A2,k ) whenever t r [R(k )C - R’(k)] > -.P f 2 Ch,,,([Ip - R’(k)R(k)]C-’} - 2
(9.5.38)
On the other hand, the risk-difference is nonnegative whenever there exists a value of k E (0, k4], where
Hence, we obtain the following theorem:
(9.5.39)
Theorem 4. A sufficient condition for SRRE t o have smaller quadratic risk than UR.RE is that there exists a pair ( A 2 , k )E ( 0 , ~ x) (O,k4] and the following holds: (9.5.40)
9.5. Performance of Estimators
457
The expression (9.5.42) is an n.n.d. matrix if and only if for all k , we have
Chapter 9. Ridge Regression
458 where
AC-’ = [R’(k)R(k)C-l]
x { (2Gp+2,m(G;A2) - Gp+4,m(t:;A’)) -k [C-’ (k)R(k ) C - l
x{pdE[x;:2(A2)]
+ p d ( p + 2)E[x;&(A2)]}
+ R’(k )C-’ (k)C-’]
- Gp+a,m(L;A”}.
(9.5.46)
On the other hand, the risk-difference is nonpositive whenever there exists a value k E [0, k s ] ,where (9.5.47)
(9.5.48) and
kf)(A2, a ) = min
1lZlP
{ 2X:a:(Gp+2,m(&; A2) - ~ ~ E [ x F : ~ ( A ~ ) ) }(9.5.49) .
Hence, we state the following theorem:
Theorem 5 . A sufficient condition for SRRE to be superior t o PTRRE is that there exists a pair (A2,k) E (0,A2(k,a)]x (O,kg]for a E (0,l). Under Ho, the risk-difference becomes Is2
trlR(k)C-1R’(k)l{Gp+2,rn(e,; 0) - d } ,
(9.5.50)
which is nonnegative whenever
Gp+2,m(&r;O) > d. -s 6,PT ( k ) is superior to p,(k);
-s
otherwise, p , ( k ) is superior t o
(9.5.51) . PT
0, ( k ) .
9.5. Performance of Estimators
459
460
Chapter 9. Ridge Regression
The risk-difference of the two estimators is given by
(9.5.62)
9.6. Estimation of the Ridge Parameter
46 1
+ (1 - ( P2a2A2 i 2 ) h i a f ) ( ~ A ’ E [P+4 x - ~(A2)]))
and
fi(a2) = lmax
afXi{ 1
+ ~ [ ( ~ ~ F , ; : , , ( A ~ )1 1 ( ~ p + z , m ( <~ ~d1)l ) -
-PdE[X,-:,(A2)}.
(9.5.64)
. s+ -s Hence, p, ( k ) dominates P,(k) for k E (O,k6). Combining (9.5.59) and (9.5.62), we obtain the next theorem.
,.S+ is that there
..S+
Theorem 6. A sufficient condition for p, ( k )t o dominate P, exists a pair (A2,k) E (O,A?(k)] x (0,k6]. The MSE matrix comparison of
S+
p,
s+
0, ( k ) can A
be carried out by considering the difference M4(p, ) - Md(P, ( k ) ) ,and this is left to the reader as an exercise (Problem 7). S+
9.6
and
,.s+
Estimation of the Ridge Parameter
In this section, we discuss variety of estimators of the ridge parameter k . We first define a simple generalization of the ridge regression estimator and consider the estimators based on this generalization. As before, we consider the diagonalization of the design matrix C its A = Diag(X1,. . . ,A), so that the canonical form of the linear model (9.0.1) is written as
Y =X*a+e,
(9.6.1)
where X*= XI’and a = r’p and I? is defined as in Section 9.5. This allows us to write one simple generalization of the ridge regression estimator as
+
&(k) = (X*’X* K)-’X’*Y,
(9.6.2)
where K = Diag(k1, k z , . . . ,k p ) , (ki > 012 = 1,.. . , p ) while & = (X’*X*)X*’Y is the least squares estimator (LSE) of a. Clearly, the MSE expression for
Chapter 9. Ridge Regression
462
&(k) is P
C + ki)'
' P
k?a? (Xi (Xi + ki)" i=l i=l (9.6.3) Hoerl and Kennard (1970) showed that the optimal value of ki is c2/a;and they subsequently came up with the estimator of k for the case K = kIp as
MSE(&(k))= E(CU(k)- a)'(&(k)- a ) = c2
Xi
+
n
(9.6.4) There are two variations of the estimator proposed by Hoerl and Kennard (1970). One is due to Hoerl, Kennard, and Baldwin (1975), given by (9.6.5) which is the harmonic mean of ki = C2/CU~, (i = 1 , . . . ,p ) and the other is due to Hocking, Speed, and Lynn (1976), given by (9.6.6) In addition, a Bayesian approach by Lawless and Wang (1976) led t o the estimator (9.6.7) More recently, Kibria (2003) proposed the following three estimators of k, namely the arithmetic mean, geometric mean, and the median of 62/(-Y;(i = 1 , . . . , p ) . Explicitly, they are (9.6.8)
{ g},
(iii) kmed = med'=,lilp respectively. It is observed from equations (9.6.6) through (9.6.8) that for CUi = CU (i = 1,.. . ,p), all proposed estimators except ~ H S Lbecome 62/(.2.Also, all estimators except ~ H L Wand k ~ ares independent ~ of (Al,. .. ,A,). In order to study the properties of the estimators, we use the MSE criterian following Dempster et al. (1977), Lawless and Wang (1976), and Gibbons (1981) among others. All these authors used simulation studies to assess the finite
9.6. Estimation of the Ridge Parameter
463
sample properties of the estimators. It is reported that all the estimators have smaller MSE values than the LSE. It is also shown that ~ G M and k ~ w performed equivalently and little better than &B. For the small correlation, ~ G M performed better than i ~ and w for high correlation ~ L W performed better than LGM. Further, it is noted that performed quite well compared to the others. Anyway, more studies are necessary t o differentiate between the performances of the estimators, and as such the area remains open for research, since no mathematical analysis has yet been recorded.
An Example. In this section, we consider one real-life example to demonstrate the performance of the ridge estimators. The example was considered by Anderson et al. (1993, p. 579). This example contains data for 19 stocks on the following variables: profit margin, growth rate, type of industry and price to earning ratio. We consider the following linear regression model:
Y = Po + Pixi
+ P2X2 + ~ 3 ~ +x 3e ,
(9.6.9)
where Y is the price to earning ratio, X I is the profit margin, X2 is the growth rate, and X 1 is the type of industry. The random error is assumed to follow the normal distribution with mean 0 and variance c2. The correlation matrix of the variables in model (9.6.9) is presented in Table 9.1. Table 9.1 Correlations among the Variables
Source: Adapted from Kibria (2003) As observed in Table 9.1, the explanatory variables are strongly correlated. Moreover, the ratio of the largest and the smallest root of the design matrix, Al/Ap = 8072.09207/15.83687 = 509.7025, which implies the existence of multicollinearity in the data set. So it is adequate to compare the proposed ridge estimators with this real data set. The estimated MSE of the proposed estimators along with the ridge regression coefficients are presented in Table 9.2.
I
Table 9.2 MSE and the Estimated Regression Coefficients Estimators1 LSE k H K ~ H K B k ~ w~ H S L kCM ~ M E MSE I 10.308 0.319 0.312 0.666 0.319 0.384 0.332 0.311 a1 0.656 0.674 0.675 0.656 0.674 0.654 0.671 0.675 a 2 0.336 0.374 0.386 0.336 0.374 0.421 0.399 0.381 0.782 0.446 0.361 0.782 0.446 0.201 0.287 0.396 a 3
D
464
Chapter 9. Ridge Regression
From Table 9.2, we know that all the proposed estimators perform better than LSE in the sense of having smaller MSE. ~ M E Dand ~ H K perform B equivalently and little better than other estimators. However, k ~ performs w worse compared to other estimators. Again, the performance of ~ G M is between k ~ w and ~ H K B .
9.7
Conclusions
In this chapter, we discussed the estimation procedures of the regression parameters for the ill-conditioned data. First, we analyzed some finite sample theory of four well-known ridge regression estimators of p that are a combination of the sample and nonsample information. The RRRE performs best compared to other estimators in the neighborhood of the null hypothesis; however, it performs worse when A2 moves away from its origin. We may order the risk of the estimators under Ho as R2(Pn(k);Ip)
I R3cB:’ck);
Ip)
5 R5@;Tf(k);I p )
5 R 4 ( X ( k ) ;I p ) 5 R1(Pn(k);Ip), . S+
while the position of the PTR,RE shifts from “in between” R5(P, ( k ) ; I p ) and R 2 ( B n ( k ) ; I P ) to “in between” R 4 ( f i f ( k ) ; I p and ) R l ( f J n ( k ) ; I p )That . is to say, the risk order is
R2(Bn(k);I p )
I R d X + ( k ) ;I P ) I R4(k)(B;T(k);Ip) 5 R 3 ( X T ( k ) ;l p ) I R1(Pn(V;l p ) .
The picture changes when /3 moves away from the origin 0. We found that there exists a value of k such that ridge regression estimators dominate each others and over the traditional estimators. Note that the application of PRRRE and SRRE is constrained by the requirement q 2 3, while PTRRE does not need such constraint. However, the choice of the level of significance of the test has a dramatic impact on the nature of the risk function for the PT estimator. Thus, when q 2 3, we use PR.RRE; otherwise, PTRRE with some optimum size a. Second, we considered various estimators of ridge parameter k and specified their relative properties based on documented simulation results.
9.8
Problems
1. Prove Theorem 1. 2 . Show that Vl(0) - Vl(k) is n.n.d. matrix for all k > 0, where Vl(k) is given by (9.4.1).
9.8. Problems
465
3. Show that V2(0) - V2(k)is n.n.d. matrix for all Ic > 0, where V2(k) is given by (9.4.4). 4. Show that V3(0)- V3(k)is n.1i.d. matrix for all given by (9.4.8).
Ic > 0, where V3(Ic) is
5. Show that V4(0) - V4(k) is n.n.d. matrix for all Ic > 0, where Vz(Ic) is given by (9.4.12). -s -s 6. Determine the conditions when M3(Pn)- M3(Pn(k))is n.n.d. matrix for all Ic > 0. . S+
. S+
7. Determine the conditions when M4(Pn )-M4(Pn ( k ) )is n.n.d. matrix for all Ic > 0.
8. Show that Vl(0) - Vz(Ic) is n.n.d. matrix for all k
9. Show that Vl(0) - V3(Ic) is n.n.d. matrix for all
> 0.
Ic > 0.
10. Consider the regression model (9.0.1)
Y = x p + E,,
E,
N
N,(O, 2 1 , )
and the estimation of P when the restriction HqXPPpXl = hqxlholds. Thus, we have the unrestricted and the restricted estimators 6, = C-IX’Y and ,?I = , - C-lH’(HC-lH’)-l(HB, - h). Define the ridge estimators as follows:
P,
W~IB,, =R ( ~ ) P , ,
(i) B,(k> = (ii) P , ( k )
..PT
APT APT
(iii) P, (~c)= R ( ~ ) P ,, Pn =
P , (P, - B n ) ~ ( 5~ Fq,rn(a)), n -
where
L,
=
(HB,
- h)’(HC-lH’)-l(Hp,
- h) 9
45:
a) Find the bias vector of the estimators: P,(k),
b,(Ic),and P,. PT (Ic).
b) Find the covariance matrix and the MSE matrix of the estimators:
P , ( k ) , P , ( k ) > and
P:’(k).
c) Compare the estimators:
fin
vs.
b,(k), ,& vs. ,&(Ic),
and
..P T vs.
0,
P,PT ( k ) using covariance matrices and using MSE matrices.
Chapter 9. Ridge R.egression
466
11. Define the Stein estimator in the ridge setup as
a) Find the bias vectors of
st
-s
P, ( k ) and P, ( k ) .
-s
b) Find the covariance and MSE matrices of P,(k) and
s+
0, ( k ) . a
c) Compare the estimators:
using covariance and MSE matrices.
12. Consider the following rank estimators of
as given by (9.1.14a-c):
a) ~ : ) ( k = ) ~(k)pr), b) b;'(k)
=
b:'(k)
= &)(k){
c) &k)
= p:)(k)I(Ln
> xg(a))
1 - ( p - 2)L;1}I(Ln > E n )
d) bE+(k) = b:'(k){ 1 - ( p - 2 ) L L 1 } I ( L ,> ( p - 2)). 13. Show that asymptotic expressions of bias of the estimators are given as follows: a) bl ( p c ' ( k ) )= -kC-'(k)6. b) bz(D:'(k))
= -kC-l(k)G - R(k)6HP+2(xg(a);A')),
A' = 0-'(6'C6) where
O'
= A$/-y2($,4).
c ) be(bf(k)) = -kC-'(k)6 - ( p - 2)R(k)6E[~i;,(A')].
d)
. st
b4(/3,
( k ) )= -kC-'(k)6 I(X;+z(A')
-
R(k)S{E(l - ( p - 2 ) ~ ; 2 ~ ( A ' ) ) .
< P - 2 ) ) + (P - 2)E[X,S22(A2)l.
14. Continuation of Problem 12. Asymptotic variance MSE matrix and risk expressions of the four rank based ridge regression estimators are given
9.8. Problems
467
Chapter 9. Ridge Regression
468 and
Chapter 10
Regression Models with Autocorrelated Errors Outline Simple Linear Model with Autocorrelated Errors Multiple Regression Model with Autocorrelation Bias, MSE Matrices, and the Risk of Estimators When p Is Known ADB, ADMSE, and ADQR of the Estimators ( p Unknown) Estimation of Regression Parameters When p Is Near Zero Estimation of Parameters of an Autoregressive Gaussian Process R-Estimation the Parameters of the AR.[p] Models R-Estimation of the Parameters with AR[1] Errors 10.9 Conclusions 10.10 Problems 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8
In Chapters 7 and 8, we considered the estimation of the regression parameters when it is suspected that the regression parameters belongs t o a (1) subspace (Chapter 7) or to a (2) stochastic subspace (Chapter 8) while the errors of the regression model are iid r.v. (normal) with mean zero and variance cr2. In this chapter, estimation of the regression parameters concerns the case where the errors of the model are autocorrelated. These models are often satisfactory representation of time-series data in econometrics. We provide some limited nonparametric analyses for autoregressive models. With regard to a regression model with AR[l] errors, we attempt to provide only some initial results. In Section 10.1, we extend the simple linear model of Chapter 3 to introduce autocorrelated errors, and in Sections 10.2 through 10.5, we introduce autocorrelated errors t o the multiple regression model of Chapter 7. Section 10.6 contains the estimation of parameters in an autoregressive Gaussian process, and Section 10.7 deals with the R-estimation of regression parameters in 469
470
Chapter 10. Regression Models with Autocorrelated Errors
an AR[p] model. We sketch the R-estimation for the regression models with AR.[l] errors in Section 10.8.
10.1
Simple Linear Model with Autocorrelated Errors
Consider the simple linear regression model as in Chapter 3 with autocorrelated errors given by
Y n = 01,
+ ,OX+
where E , = (€1, . . . ,6,)’ and the components of stationary autocorrelated model defined by €2
= PE2-1
+
zlz,
Ipl
< 1;
(10.1.1)
E,,
2
E,
satisfy the first-order
= 1,.. . ,n,
(10.1.2)
with E(v) = 0, E(vv’) = u:In, where v = (zll,.-- ,v,)’. This specification for each ei implies that E, is a Gaussian vector with E ( E ~=) 0 and E(E,EL)
( 10.1.3)
= a,2Ep,
where P
1
(10.1.4)
Pn-2
with the inverse
= (I - p
y
L 1
-p
-P
l+P
0
...
2
0
0
...
0
0
0
..
..
-.. - p 10 1 ,
Further, from (10.1.1) it is clear that
10.1.1
Estimation of the Intercept and Slope Parameters when p is Known
The usual unbiased estimators of 0 and
p (when p = 0 ) are given by (10.1.6)
10.1. Simple Linear Model with Autocorrelated Errors
471
with the covariance matrix
(10.1.7) The unbiased estimator of u,” is then the usual sz given as
1 s2 = ( y n - 8 n l n - ,&x)’(Y, n-2
-i n ~ n ,&XI.
(10.1.8)
Now for the model ( l O . l . l ) , the unbiased estimators of 6 and /3 can be obtained based on the generalized least squares estimation (GLSE) procedure as
with the covariance matrix ‘OV
(
e,(PI
,&(p)
)
2 *7J
=
KlK2 -K:
( 10.1.10)
where X = (16,x’)’,
K1 = lhE;’l,
=
n
-
l + P (1 -
!)
( 10.I.I l a )
7
and (10.1.11c) are functions of the autocorrelation, p assumed known. Here, we suppress p in the quantities K1, K2, and K3. To test the null hypothesis, Ho : /3 = 0 against H A : /3 # 0, we first consider the X2-statistic given by
1
x2 = --[[Pn(P)l2[K1K2 02 Ki since p n ( p )
- N(,B,u:*$
- K,21
22 x?(A2),
(10.1.12)
o%Ki ) and A2 = P2‘K1Kz-K,”1
’
On the other hand, the estimate of 0,”is given by
1 3; = -[Y, n-2
- 6,(p)ln
- ,&(p)x]‘E;’[Y,
- B,(p)l,
- & ( p ) ~ ] . (10.1.13)
472
Chapter 10. Regression Models with Autocorrelated Errors a? D
-
xg
The exact distribution of rng = (m=n-2) and independent of Pn(p). Hence, we define the test-statistic, Ln as
2,
for testing Ho : ,B = 0, where Ln = Fl,, under Ho. If the restricted estimator of 0 can be written as
p = 0 with certainty,
with variance
10.1.2 Preliminary Test and S-Estimation of /3 and 8 Let Fl,,(a) be the a-level critical value of Ln (under Ho), then the PTE of ,8 and 0 are given by
P F ( P )= P n ( P ) - Pn(p)l(Ln< ~ l , r n ( a ) )
(10.1.17)
and e,PT(p) = B ~ ( P ) [Bn(P)- e = 4n
K3 + --PnI(Ln K1
<
n ( ~ ) ] ~ (~~l ,nr n ( a ) )
< ~I,rn(a))
(10.1.18)
respectively. Similarly, we may define the S-estimators of ,D and 6 following Section 3.5 as
( 10.1.19) and (10.1.20) respectively. Note that the restricted estimator of p is 0 and that of 0 is 6, (since P = 0). It can be shown, following Section 3.5, that the bias and the mean-square error of the estimators of p are given by the following: (i) Unrestricted estimator
10.1. Simple Linear hIodel with Autocorrelated Errors
473
(ii) P T E
(iii) S-estimator
using c =
f i ~K , ~ Gr;-). , =
Next we consider the bias, mean-square errors of e n ( p ) , & ( p ) , 6 r T ( p ) , and @ ( p ) . They are given as follows:
Chapter 10. Regression Models with Autocorrelated Errors
474
(iv) S-estimator
(10.1.27)
In general, for small samples, the variances of the estimators with known values of p are smaller than the traditional estimators given by (10.1.6). However, it can be shown that under the conditions of Theorem 3.6.1 of Chapter 3, as n 00, the variance and covariance matrices of the two sets of estimators are equal. Hence, asymptotically they have the same efficiency.
10.1.3 Estimation of the Intercept and Slope Parameters When Autocorrelation Is Unknown If p is unknown, the estimation of (6,p)' becomes a bit cumbersome, since p is a nuisance parameter. Fisher's recipe tells us t o estimate this parameter or test it out. Let us then consider the case where we want t o estimate p consistently. We follow Durbin's (1960) method of estimating p conditionally as follows: Let us transform the responses of the model (10.1.1) to the model (see Fuller, 1976)
y1 = e
+ pzl + w l ,
yt = (1 - p)6 + pX-1
+ p(zt - p z t - 1 ) + vt,
t = 2 , 3 , . - - ,n, (10.1.28)
where
v = (211,. .. ,vn)'
N
&{ 0 , o:Diag[( 1 - p 2 ) - l , 1, . . . ,1]}.
The conditional likelihood function of (6,p) for fixed n
-(n-1) log(27r)- (n-1) logo: -oZ2 c [ ( t=2
K -pX-1)
Y1
(10.1.29)
is given by
-(l-p)O-P(zt
-pzt-~)]~.
(10.1.30) We can use the likelihood function or the LSE method to obtain the estimator of p as (10.1.31)
10.1. Simple Linear Model with Autocorrelated Errors where
475
p(-1)= A ( Y 1 - f. . . + Yn-l) and Y(0)= A ( Y 2 +. . .+ Y,).Similarly,
?+-I)
= a 1( x l + . - - + x n - l ) andD(o)= , r1( x 2 + . . . + x n - 1 ) . In order to obtain the stable estimate fin of p, we follow the iterative procedure based on the initial estimates of 0 and p given in (10.1.6) t o obtain initial &'), which helps to obtain the second stage estimates of 0 and PI namely 6i1)(Pn)and ,&')(&) based on (10.1.30). The method is repeated until stable estimates are obtained. The method produces consistent and efficient estimators of p, 0 and P. Thus, the final estimates of 0 and /3 can be written as
( 10.1.34) There are two other methods of obtaining consistent estimators of p, namely (the Cochrane-Orcutt (1949) two stage procedure and the Prais-Winsten (1954) two-stage procedure. They are described below. Cochrane-Orcutt Two-Stage Procedure. Here one begins with the first stage regression estimators of p given by (10.1.35) are the estimates based on (10.1.6). where it = yt - jn - &xt and In the second stage, new variables are defined by
Y * = rcoY = r c o ( i n , x )
( ;)
+ r c o El
(10.1.36)
476
Chapter 10. Regression Models with Autocorrelated Errors
where -p
1
0
0
-p
1
0
0
...
0 0
.. .
...
...
... 0
(10.1.37)
is a ( n - 1) x n matrix to be used as the basis of second-round residuals yt - &&)q to obtain using (10.1.31). The process is repeated until stable values Pn is obtained and has the same asymptotic properties as the MLE as studied by Kmenta (1972). Prais-Winsten (1954) Two-Stage Procedure. Durbin’s method and Cochrane-Orcutt’s method lose one observation that can have a significant effect in evaluating the estimates. For this reason, Prais and Winsten (1954) proposed n x n matrix I’pw defined by
@A2)
en
0 1
0 0
0 0
...
-.. 0
I ‘
(10.1.38)
By way of a transformation, we obtain the model
Y *= rpWY= r P W ( i n , X )
(10.1.39)
and by the procedure of Cochrane-Orcutt, we obtain a consistent estimate of P P p. Note that Pn -+ p and Xc,, + E p as n -+ 00. Hence, as n 03,
d q n ( P n ) - 0) - P)
)
--j
’
0,”
N2{( :)’KlK2-Ki
(-K3
K1 (10.1.40) Now, in order to test the hypothesis HO : ,O = 0, we can use the statistic &(Pn(Pn)
(under Ho). From the results, we can define the PTE and S-estimators of as ^PT
Pn
-
( P n ) = Pn(Pn) - P n ( P n ) W n <
xm>;
P
10.1. Simple Linear Model with Autocorrelated Errors
477
and that of 8 as
(10.1.43)
The restricted estimator of ,# is 0 and that of 8 is gn((Pn)
+ mK 3p( Pnn )(- P n ) .
Now we may find the asymptotic expressions for the ADB and ADMSE of the estimators of p are given by the following: (i) Unrestricted estimator of ,#
(ii) PTE of ,#
and
(iii) S-estimator of
p
(10.1.46)
Similarly, the ADB and ADMSE expressions for the estimators of 8 are
Chapter 10. Regression Models with A utocorrelated Errors
478
given by
(10.1.47) (10.1.48)
( 10.1.49) respectively. Based on these results, the conclusion of Chapter 3 follows on the properties of the estimators of 0 and 6.
10.2
Multiple Regression Model with Aut ocorrelation
Consider the multiple regression model of Chapter 7 with autocorrelation p given by
Y=xp+E, where Y = satisfies
. . . ,Yn)’, X is a n x p design matrix, and
(Y1,
(102.1) E
= (€1,. . . ,6,)’
Thus, our problem is to estimate 0 when it is suspected that p belongs t o the subspace defined by HP = h as in Chapter 7, and also to test the hypothesis Ho :HP = h against HP # h.
479
10.2. Multiple R.egression Model with Autocorrelation
Estimation of ,Ll and Test of Hypothesis of HP=h
10.2.1
Assuming p to be known and using the generalized least squares principle, we minimize
(Y - XP)'XC,'(Y - XP) w.r.t
P to obtain the unrestricted estimator &(p)
=
(10.2.3)
P , ( p ) as
(x'c,lx)-'(x'E,'Y,)
(10.2.4)
with Cov(p,(p)) = oz(X'Ei'X)-'. The restricted estimator of P subject t o HP = h is then given by P , ( P ) = P J P ) - C,'H'(HCJ'H')-'(HP,(p)
- 4,
(10.2.5)
where C, = (X'Xi'X), and the covariance matrix of g , ( p ) is given by Cov(P,(p)) = oz[CP' - C,'H(HC,'H')-'HC,'].
The estimator of
0," can
1 6," = -m[ y n
( 10.2.6)
be written as
-xP,(P)I'E,'[Y, -xP~(P)I,
(10.2.7)
= n - p . Hence for the test of the hypothesis Ho : HP = h against H A : HP # h, we can use the test-statistic
where m
&(P)
=
(HB,(p)
- h)'(HC,'H')-'(HP,(p)
- h)
96:
(10.2.8)
2,
Clearly, Ln(p) = Fq,,(A2), where
1 A2 = >(HP - h)'(HC,'H')-'(HP - h). If p is unknown, we estimate p consistently by 10.1.3.
(10.2.9)
Cn by the procedure of Section
10.2.2 Preliminary Test, James-Stein and Positive-Rule Stein-Type Estimators of p Following Chapter 7, Section 7.1.2, we obtain the PTE, JSE, and PRSE of P as follows when p is known: (i) B,PT(P) = P , ( P ) - ( P , ( P ) 6)
P 3 P , =P,(d
-
~n(p))~(~< n (Fpq ), m ( Q ) ) ,
- d(P,(P) -P , ( P > ) W P ) , d =
(9-2)m q(m+2)'
(10.2.10)
Chapter 20. R,egression Models with Autocorrelated Errors
480
10.3
Bias, MSE Matrices, and the Risk of Estimators When p Is Known
When p is known, the generalized least squares estimator of P is P , ( p ) given by (10.1.5), and (p,(p) - P ) is distributed normally with mean 0 and covariance matrix u,2Ci1. Hence, the bias and quadratic bias of p , ( p ) are given by
bl(6,(P)) = 0
and
Bl(P,(P)) = 0,
(10.3.1)
respectively. Similarly, we find the MSE matrix given by M1(Pn(p))= u2(X’C-lX)-’ P = u2c,1,
(10.3.2)
and the quadratic risk under the loss function
L(P:,P)
= (0:- P)’WK - P )
(10.3.3)
is given by ~ 1 @ , ( p ): W) = u,”t r [ ~ ~ , l ] .
(10.3.4)
10.3. Bias, MSE Matrices, and the Risk of Estimators When p Is Known481
-
We know that the usual LSE of p ignoring p (i.e., assuming p = 0) is 0,= (X’X)-’X’Y, which is unbiased, having a positive-definite symmetric MSE matrix (u;(C-’C,C-’). Clearly, the MSE matrix difference of (X’X)-’X’Y and of p,(p) is given by
a,2[C-’C,C-’ - c-’ P I
(10.3.5)
p,
and is a positive-semidefinite matrix showing that p,(p) is better than = (X’X)-lX’Y whenever p # 0. The bias of the restricted estimator, p,(p) can be shown to be bz($,(p))
=
- h) =
-C,’H’(HCi’H’)(HP
-6,
(10.3.6)
and, say, = A;.
Bz(B,(p)) = a,26bC,6,
(10.3.7)
The MSE matrix and the risk expressions of $,(p) bY
under (10.3.3) are given
+
M2(Pn(p))= a2Ci1 S,6b
and
~ 2 @ , ( p ) : W)
(10.3.8)
+
(10.3.9)
= c2t r [ ~ ~ , l ]6 ; ~ 6 , ,
respectively.
-PT
Similarly, we find the bias, MSE matrices, and the risk of p, ..s+ . PT and 0, ( p ) as follows: First, we consider p, (p), and we obtain . PT
b3(Pn
( P ) ) = -6pGq+2,n(!a
: A;),
!a
=
-S
(p),p,(p)
(10.3.10)
4 2Fq,m(a)7 +
PT
~ 3 ( B n ( P ) ) = ~ : { ~ q + z , r n ( t a: A:)I~,
M3 ( p , ” ( p ) ) = a;Cil - o?C,1H’(HC,1H’)HC,’Gq+2,rn(!a : A”,, : A;) - Gq+4,,(!:
+S&{2Gq+z,,(&
: A;)},
& ( B F ( p ) ; W) = a,”t r [ W C i l ] - a,”tr[WCilH’(HC,lH’)HC,l] xGq+2,m(ta : A:) -~q+4,rn(C
+ abW6,(2Gq+2,rn(!a
: A;)
: A:)},
with ! := &Fq,m(a). -S
Next, we consider the Stein-type estimator, p,(p) and obtain the bias,
482
Chapter 10. Regression Models with Autocorrelated Errors
10.4. ADB, ADPISE, and ADQR of the Estimators ( p Unknown)
483
10.4 ADB, ADMSE, and ADQR of the
Estimators ( p Unknown) If p is unknown, we replace p by a consistent estimator, that is, the estimaP tor in that converges to p in probability (Pn--+p) as n --+ co to obtain the estimators given by (10.2.11). Then we can write (as n -+ m) under fixed alternatives Ha : HP = h + 6 ,
fi(P:,(Pn)
-P) =
fi(P,(P)
P , p,
- P)
+ Op(%
-
(10.4.1)
where 0;(.) stands for any one of (.), (-),P, (-),P, (.) and P, (.). Under the local alternatives H A : HP = h n-'I2<, Li(Pn) follows the noncentral chi-square distribution with q D.F. and noncentrality parameter A 3 2 as n 00, where A: = ~;~6bC,6,and 6, = C;'H'(HC;'H')-'C. These results allow us t o calculate the ADB, ADMSE, and ADQR of the estimators as follows: APT
-S
+
S+
--f
-s+
-
(i) For ADB and ADQB ofP,(P,),B,(p,),BnP'cP,),B,"(ii,), and P, (p,), bi(P,(~,))= 0 and bz(b,(&)) = -6,
~i(P,(p,))
=0
and B2(bn(ijn))= A;
-
-PT
b3(Pn (P,)) = - ~ , H q + 2 ( x ; ( 4 : A;), B3
(Pr
(Pn))
and
= A 3 H q + a ( X 3 4 A;)>2
-s
b4(P,(Pn)) = 4 9 - 2)6,E[X,S22(A;)l and B4(P,"(Pn))= qE[X;:2(A;)ll2 b5
-s+ -
(P,
(Pn)) = b4
-s+
Bs(1 - P,
-s
(P,(Pn)) - ~ , E W- ( 4 - 2 ) X , S 2 2 ( a ; ) ) I ( X ; + z ( A ; ) < q-2)l
-
(Pn)) = A;{(4
-
2)E[X,-:2(A;>lEI(1-(4-2)X;~,(A;>>
xI(x:+z(A;)
< 4 - 211 l2
(ii) For ADMSE and ADQR of P,(P,), -s+ -
P,
a,(&),b,"(P,),Bz(Pn),and
(Pn),
M1@,(&)) = a,"C,' and Rl(P,(P,)) = 0,"tr[WC,'];
M2(b,(Pn)) = o,"[CP' - C,lH'(HC,'H')-lHC,']
R2(Bn(Cn) : W) = 0," tr[WCp'] +6bW6,;
(10.4.2)
+ 6,Sb
- c,"tr[WC,'H'(HC,'H')-'HC,']
(10.4.3)
484
Chapter 10. Regression Models with Autocorrelated Errors
10.5. Estimation of Regression Parameters When p Is Near Zero
10.5
485
Estimation of Regression Parameters When p Is Near Zero
Judge and Bock (1978) considered the properties of the preliminary test estimator of p for the model (10.2.1) with autocorrelation p , when it is suspected that p is zero. We attempt to improve on the procedure in this section. To define the preliminary test estimator of p, Judge and Bock used the DurbinWaston test (1950, 1951), &, defined by (10.5.1) where Ell is the residual error -Xp,(P,), t = 1 , 2 , . . . ,n for testing HO : p = 0 against H A : p > 0. In this case, the critical region is of the form {&I& < DL( a ) } and , the decision is reached as follows: If f i w < D L( a ) ,HOis rejected, and if & > D,(a), the hypothesis HO is not rejected. If DL(CY)< f i w < D,(a) then Ho is inconclusive. Durbin and Watson provided the upper and lower critical values D U ( a )and DL(cY), respectively, with various sample sizes. A second test due to Berenblutt and Webb (1973) is indicated by the 91test, defined by =
where B is the matrix
B=
[
Y’[B - BX(X’BX)-lX’B]Y Y”I - X(X’X)-lX’]Y ’
2 -1 0 0 0
-1 2 0 0 0
0 -1 0
0 0 0
... -1 ... 0
.’. ..’ ... 2 -1
(10.5.2)
(10.5.3) -1 -1
test is generally higher than the power of the D-W-test The power of the for higher values of p; otherwise, 61 test is similar to the D-W-test. So the test can be reduced to the same canonical form as the D-W-test. Both tests have been obtained by approximating the likelihood function for p in two different ways. (Durbin and Watson, 1950, 1951, and Berenblutt and Webb, 1973, for details; for some related problems, see Kmenta and Gilbert 1968, 1970).
10.5.1 Preliminary Test and Stein-Type Estimators (Chen and Saleh, 1993) If p = 0, then the LSE is
p, = (X’X)-lXY. The estimator of p is given by
p,(pn) = (x’c;nlx)-lx’c;nlY,
(10.5.4)
486
Chapter 10. R,egression Models with Autocorrelated Errors
if p is unknown. Let L, be a suitable test-statistic for testing Ho : p = 0 against p > 0. Also, let Ln,a be the a-level critical value. Then the PTE of p is defined by
P,“(P~)
= P , ( P ~ )- ( P , ( P ~ ) - P
By the Durbin-Watson test, we can write the P T E of PT(D)
p,
( f i n )= P n ( P n ) I ( - 2
+p,qo
(10.5.5)
< Ln,a)*
n ) ~ ( f n
< fiw
< Dw
as
- ~ u ( a<)0)
- DL(cY) < 2 )
(10.5.6)
and by the 41 test,
PT((il)(pn ) = p n ( & ) I ( - 2 < 41 - D,(a) < 0) + P,I(O < 41 - D t ( a ) < 2).
P. n
Now, consider the Stein-type estimators of mator of p is defined as
p. For p
(10.5.7)
= 0, the Stein-type esti-
(10.5.8) with 6: = “(Y m - XP,)’(Y - XP,). However, for p and define the Stein-type estimator of p as
# 0, we estimate p by Pn
Thus, based on the Stein-type estimators, we define the preliminary test estimators of /3 as
(Pn)= P:I(-2 < Dw - D, < 0) + P,S(&)I(O < Dw - DL < 2 )
PT(1)
p,
and based on Dw and
(10.5.10)
41
tests,
ar(2’”(pn) = p:I(-2
< !&-DU< O)+p:(&)I(O < &-DL < 2). (10.5.11)
+
+
For the multiple regression model Y = XP E , where = p ~ i - 1 wi,Ipl < 1, i = 1 , 2 , . . . ,n,and v Nn(Ol o ~ E c pwe ) , consider the risk analysis of the four pairs of estimators: (1) two unrestricted estimators, ( 2 ) two PTEs, and (3) two Stein-type estimators, and (4) two PTEs based on the Stein-type estimators. From our simulation study, we learn that the (1) shrinkage estimators performs better than the unrestricted estimators and the ( 2 ) the shrinkage PTE performs better than the usual PTEs. Further, the PTEs based on the 41 test perform better than the PTEs based on the D-W-test. In the next section, we describe the design of the Monte Carlo study.
-
10.5. Estimation of Regression Parameters When p Is Near Zero
487
10.5.2 Design of Monte Carlo Experiment In the Monte Carlo study, two hundred samples of size 25 were generated using the orthonormal statistical model
Y = XP,
+ E , X ' X = I, ( n = 25),
(10.5.12)
where Po = (13.9,10.79,6.13,3.01,10.81)'. The autocorrelated errors were generated by the first-order autocorrelated model ct = p t - 1 vt, lpI < 1. The vector v = ( ~ 1 ,.. ,v25) was generated from N(0,60.8) and was assumed to be iid. The generated v's are then used to create the ct's by varying the autocorrelation in steps of tenths from 0.0 to 0.9. We obtained the MLE of the parameters for the 200 samples in order to compute the test-statistics. We only choose p to be nonnegative. The tests are based on the 0.01 and 0.05 levels of significance for the D-W-test and test.
+
10.5.3
Empirical Results and Conclusions
The simulation study reported by Judge and Bock (1978) gave performances of the PTE of based on both the D-W-test and &-test and they are the precursor of the study by Chen and Saleh (1993). In order to carry out the risk analysis of the estimators, we consider the unweighted and weighted loss function given by
LI(P* - P ) = (P* - P)'(P* - P ) and R(P* 1) = E"P* - P)'(P* - P)], ( 10.5.13) Lw(P* - P ) = (0'- P)'W(P* - P),
and
R(P* : W) = tr{WEl(P* - P)'(P* - P ) ] } ,
(10.5.14)
where we used the weight to be cG2C,. Based on these results we can calculate the performances of the estimators. The sampling experiment and the chosen model are as given in (10.3.10). Table 10.4.1 and Table 10.4.2 give the empirical risks based on (10.3.14). The major findings are as follows: (i) The shrinkage estimators have uniformly lower risks than the unrestricted estimators for all p-values. Generalized least squares estimators have smaller risk than the ordinary least squares estimators. The Durbin's two-stage estimator has the lowest risk compared to the C o c h r a n e O r c u t t (CO) and Prais-Winsten (PW) estimators, in Table 10.4.1 and Table 10.4.2, respectively. The same results were observed in the studies by Griliches and Rao (1969) and Judge and Bock (1978).
488
Chapter 10. R.egression Models with Autocorrelated Errors
Table 10.4.1 Empirical Risks for Different Estimators prior to Testing P -
OLS
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.8 0.9 -
280.56 290.94 308.07 335.57 380.18 454.95 587.68 436.42 208.83
GLS Durbin 280.56 287.68 296.94 310.44 331.31 364.49 418.03 643.91 848.30
co
PW
280.98 292.91 290.82 294.15 301.25 298.98 304.65 310.81 309.37 320.08 327.67 325.17 344.73 353.14 350.75 385.04 394.43 392.15 448.87 461.60 458.88 716.10 777.24 768.53 958.54 1244.43 1225.25
Table 10.4.2 Empirical Risks for Different Estimators prior to Testing Shrinkage Estimates
p 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 -
OLS 225.91 234.84 248.73 270.24 303.86 357.15 449.66 639.65 1010.84 2609.99
G LS Durbin 225.80 233.23 242.82 254.83 269.61 290.40 322.93 375.14 457.53 571.12
230.67 238.86 249.18 262.27 278.98 303.12 341.52 403.59 507.77 652.46
COI 236.30 244.57 254.66 267.65 284.93 310.22 350.36 418.13 548.22 865.44
PW 234.57 242.52 252.29 265.24 282.83 308.28 348.02 414.54 542.14 852.51
In Tables 10.4.3 through 10.4.6, we present the empirical results the two PTEs. One is based on the D-W-test, and the other on the &-test a t 1% and 5% levels of significance. The results shows that the P T E with shrinkage estimators performs better than the P T E based on the unrestricted estimators.
10.5. Estimation of Regression Parameters When p Is Near Zero
Table 10.4.3 Empirical Risk Values for PTE Based on D-W and 61 statistic, Q: = 0.01
F J
G1
D-W
61
287.51 285.91 286.43 297.16 297.28 295.67 308.97 309.58 307.34 326.45 326.84 324.42 352.91 353.96 351.03 394.63 397.25 392.38 461.67 464.02 458.98 573.59 569.22 596.04 777.24 768.86 768.53 1244.43 1232.51 1225.25
Table 10.4.4 Empirical Risk Values for Shrinkage PTE Based on D-W and 61 Statistic, a = 0.01
E p
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Durbin
D-W 271.15 274.03 280.15 289.07 298.38 326.04 357.54 407.99 508.92 707.72
61
260.54 262.93 268.05 277.00 286.41 305.72 343.72 407.07 507.77 652.46
CO D-W 273.59 276.62 282.67 291.59 301.16 329.05 362.50 422.07 548.93 881.06
I
PW
61
262.68 265.07 269.84 278.41 288.74 310.69 350.10 417.78 542.14 852.51
489
Chapter 10. Itegression Models with Autocorrelated Errors
490
Table 10.4.5 Empirical Risk Values for PTE Based on D-W and 61 Statistic, cy = 0.05 E
I
Durbin
41
co
D-w
61
PW D-W
81
283.96 288.19 288.69 286.90 287.10 292.25 296.77 297.10 295.38 295.32 303.76 308.71 309.59 306.95 307.42 320.21 326.36 326.94 324.42 324.66 344.84 353.05 353.23 351.05 350.85 385.04 394.48 394.42 392.21 392.15 448.87 461.59 461.59 458.88 458.88 545.12 573.35 573.35 568.80 568.80 716.10 777.24 777.24 768.53 768.53 958.54 1244.43 1244.43 1225.25 1225.25 Table 10.4.6 Empirical Risk Values for Shrinkage PTE Based on D-W and 41 Statistic, a = 0.05
E P 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 -
Durbin
T
The Tables 10.4.3 through 10.4.6 are prepared with the headers p/E. These headers indicate that for various values of p, the estimators (E) are based on the Durbin, Cochrane-Orcutt and Prais-Winsten two-stage procedures by way of the tests (T), D-W and for the PTEs, and risks are calculated by the formula (10.3.13). Table 10.4.7 through 10.4.12 give the risk tables for the same estimators based on the weighted loss function given by (10.3.15).
20.5. Estimation of Regression Parameters When p Is Near Zero
Table 10.4.7 Empirical Risks for Different Estimators prior to Testing P
Durbjn
5.76 5.77 5.79 5.82 5.84 5.86 5.87 5.92 6.02 6.04
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
-PW co 6.02 6.03 6.03 6.02 6.02 6.03 6.07 6.17 6.47 7.38
5.98 5.99 5.99 5.99 5.99 6.00 6.04 6.15 6.46 7.39 -
Table 10.4.8 Empirical Risks for Different Estimators prior to Testing-Shrinkage Estimates
--
P -
Durbin co PW
4.32 4.30 4.32 4.34 4.36 4.38 4.41 4.48 4.59 4.61
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 -
4.53 4.50 4.51 4.48 4.50 4.48 4.49 4.47 4.48 4.46 4.48 4.46 4.50 4.48 4.57 4.56 4.80 4.79 5.55 5.54 -
Table 10.4.9 Empirical Risk Values for PTE Based on D-W and
E p
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Statistic, a = 0.01
PWco- Durbin __ D-W 41 41 D-W 41 D-W -5.40 5.46 5.52 5.56 5.61 5.64 5.66 5.64 5.74 5.84
5.39 5.44 5.49 5.54 5.57 5.57 5.58 5.63 5.72 5.73
5.54 5.56 5.62 5.62 5.68 5.66 5.71 5.69 5.74 5.71 5.78 5.73 5.82 5.76 5.87 5.87 6.15 6.14 7.06 7.01 -
5.54 5.61 5.67 5.69 5.72 5.76 5.80 5.85 6.14 7.07 -
5.54 5.60 5.64 5.67 5.69 5.70 5.74 5.85 6.13 7.02 -
491
492
Chapter 10. Regression Models with Autocorrelated Errors
Table 10.4.10 Empirical Risk Values for Shrinkage PTE Based on D-W and 81 Statistic, cy = 0.01
E P
0.0 0.1 0.2 0.3 0.4 0.5 0.6
0.7 0.8 0.9
Durbin Gi D-W 4.23 4.27 4.32 4.37 4.42 4.47 4.48 4.50 4.60 4.68
PWD-W G1
4.21 4.34 4.33 4.24 4.40 4.38 4.29 4.44 4.42 4.36 4.47 4.47 4.38 4.50 4.47 4.38 4.54 4.48 4.41 4.56 4.50 4.49 4.58 4.57 4.59 4.80 4.80 5.61 4.61 5.54
4.34 4.32 4.40 4.37 4.44 4.40 4.47 4.45 4.50 4.46 4.53 4.46 4.54 4.48 4.57 4.56 4.80 4.79 5.63 5.55 -
Table 10.4.11 Empirical Risk Values for P T E Based on D-W and Statistic, Q = 0.05
-E Du D-W P -
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
5.40 5.43 5.48 5.53 5.58 5.57 5.58 5.63 5.72 5.73
-P' 7- D-W D-W GI Gl 81 in 5.40 5.43 5.48 5.53 5.55 5.56 5.58 5.63 5.72 5.73
C
)
5.57 5.58 5.61 5.63 5.66 5.68 5.69 5.70 5.71 5.72 5.73 5.73 5.76 5.76 5.86 5.86 6.14 6.14 7.01 7.01 -
5.55 5.55 5.59 5.60 5.63 5.65 5.67 5.68 5.69 5.69 5.70 5.70 5.74 5.74 5.84 5.84 6.13 6.13 7.02 7.02 -
10.6. Estimation of Parameters of an Autoregressive Gaussian Process 493
Table 10.4.12 Empirical Risk Values for Shrinkage PTE Based on D-W and 81 Statistic, cy = 0.05 E P -
-- Dui )in C 1 PW D-W D-W 8 1 81 D-W -- -
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 -
4.22 4.24 4.29 4.35 4.40 4.38 4.41 4.48 4.59 4.61 -
4.21 4.23 4.29 4.34 4.36 4.38 4.41 4.48 4.59 4.61
4.34 4.38 4.42 4.46 4.48 4.48 4.50 4.57 4.80 5.54
4.35 4.38 4.44 4.47 4.48 4.48 4.50 4.57 4.80 5.54
4.33 4.37 4.40 4.45 4.47 4.46 4.48 4.56 4.79 5.55
81
4.33 4.36 4.42 4.46 4.46 4.46 4.48 4.56 4.79 5.55
(Tables 10.4.1 - 10.4.12 are duc #oChen and Saleh (1993) reproduced here with permission of VSP publishers, an imprint of Brill Academic Publishers, Leiden, The Netherlands.)
10.6
Estimation of Parameters of an Autoregressive Gaussian Process
Consider the very often encountered autoregressive model in time-series defined by the Gaussian process yt, t = . . . , -1,O, 1,2, . .. , satisfying the difference equation
Yt - 81Yt-l - . . . - O p X - p = V t ,
(10.6.1)
where vt , t = . . . , -1, 0,1,2, . . . is a sequence of iid Gaussian random errors such that E ( q ) = 0 and E ( $ ) = a: > 0. The coefficient vector 8 = (61,. . . ,OP)’ is such that all the roots of the characteristic equation hp(Z)
= 1 - 61z - . . . - 8,ZP = 0
(10.6.2)
exceed one in absolute value. This model is often used by economists for timeseries analyses of data price index, unemployment, labor force, population, gross national product, and so forth. It can be verified that the parameters (O’,a2)’of (10.6.2) are uniquely determined by the spectral density function (sp.d.f) of yt,
Also, the p x p covariance matrix B of the autoregressive process yt of order p with sp.d.f (10.6.3) satisfies the so-called Yule-Walker equation (e.g.,
494
Chapter 10. Regression Models with Autocorrelated Errors
see Doob, 1953, and Hannan, 1970):
p ( ~-) &@(T - 1) - . . - O p / 3 ( 7 - p ) =
2
*
T
= 0,1,.
.,
(10.6.4)
where 60, = 1 or 0 accordingly as T = 0 or # 0 (see. also Brockwell and Davis, 1996). Now, suppose that 8 belongs t o the subspace H8 = h, where H is a q x p matrix and h is a pvector of known constants. Our problem is t o estimate 8 based on a sample Y I ,Y2,. . . , Y, when it is suspected that H8 = h holds. Accordingly, we define (as in Chapter 7) five estimators of 8: (1) the unrePT
stricted estimator 6,, (2) the restricted estimator en, (3) the PTE, 8, , (4) ..S+ the Stein-type (SE) 6: and (5) the positive-rule Stein estimator (PRSE) 8, , respectively, and study their properties.
10.6.1
Estimation and Test of Hypothesis
Our proposed estimators of 8 are based on the "principal part" (PP) of the logarithm of the likelihood function, L, = logp(Y1,. . . ,Y,) denoted by P since n-l/'(L, -L,)-10 as n .+ m (e.g., see Whittle, 1952, and Dzhaparidze, 1986). For the model (10.6.1) it has been shown that the approximation is given by
where 00 = 1 and I,(X)
P;(T) =
I-':
is the periodogram of the process {Y,}, while
I,(X)eix'dX
ytyt+.,,
= n-l
T
= 0,1, ... , n - 1,
(10.6.6)
t=l
are the elements of the estimated covariance matrix BE. Denoting by p i = (/?;(l),. . . ,p:(p)) ', we define the-unrestricted principal part maximum likelihood estimator (UPPMLE) as 8, as
6,
= Bi-'P:.
(10.6.7)
If 8 belongs to the subspace H8 = h, then the restricted principal part MLE (RPPMLE) of 8 can be written as
6 , = 6, - B:-lH'(HB:-lH')-l(HB,
- h).
(10.6.8)
We consider here the test of the hypothesis Ho : H8 = h against H A : HB based on the Wald-type statistic:
L, = n5;2(He,
- h)'(HB;-,-'H')-'(H6, - h),
#h
(10.6.9)
10.6. Estimation of Parameters of an Autoregressive Gaussian Process
495
where 52n - P
~ ( o-) P i e n .
( 10.6.10)
In the next section we will show that as n -+ ca, C, follows a central chisquares distribution with q D.F. Then we can define the principal part preliminary test estimator (PPPTMLE) as
. PT 6 , = 8, - (en- en)i(cn <
(10.6.11)
Similiarly, following Chapter 7, we define the Stein-type principal part estimator (SPPMLE) and the positive-rule principal part Stein estimator (PRPPMLE) as follows:
-s -
( 10.6.12)
6, =8,-d(6,-ffn)C,',d=q-2 and
-s+-
en
-
e, + (1 - dC;')I(C,
> d)(&
-
en),
( 10.6.13)
respectively. See also Saleh (1992).
10.6.2
Asymptotic Theory of the Estimators and the Test-Statistics
Our main purpose is the comparison of the five estimators based on the asymptotic theory of the estimators and loss function
L(6:, 6 ) = n(6: - 6)'W(O:
- 6),
( 10.6.14)
where W is a positive-semidefinite weight matrix. It can be shown that C, is a consistent test, and for any fixed alternative H6 : H6 = h 6,all the estimators will be asymptotically risk equivalent t o 6, except that 6, tends to be degenerate as n -+ co. Thus, we resolve this issue by considering the class of local alternatives (as in Chapter 7) t o be
+
K(,] : H6 = h + n-1/2[,[ # 0. First, we note that the following theorem holds as n
( 10.6.15) -+
co under {K(,]}:
Theorem 1. Under {K(,)} and the assumed regularity conditions for the model (10.6.1), we have as n --$ co
[ '(1)
(i) lim PK(,,,{
B=
&(en
P(0)
-6)5
x} = a P ( x 0; , a:B),
P(1)
P(2)
.'. P ( P - 1) ... P ( P - 2 )
P ( P - 1) P ( P - 2 ) ... P(0) is the Toeplitz matrix associated with sp.d.f.;
( 10.6.16)
Chapter 10. Regression Models with Autocorrelated Errors
496
{&(in - 0) 5 XI = a P ( x , 6,o; , c,”V),
(ii) lim PK(,,
where V = B-’ - B-’H’(HB-lH’)-lHB-’ 6 =B-~H’(HB-~H’)-~~;
(10.6.17)
and
(iii) limPK(,)(C, 5 X) = HQ(x; A’), A’ = O;’~’(HB-’H’)-~~;(10.6.18) - 0) I x} = Hg(x;(cr);A2)@’p(x6;O;o;V)
(iv) h P K ( , l { & ( 6 y
+ J @,(x- Z; 0 ;a,”V)d@,(Z,0;u;(HB-lH’)-l),
+
(10.6.19)
+
where E ( t ) = {Z : (HZ t)’(HB-lH’)-’(HZ t ) 2 x:(cr)} and where Q P ( x0; ; E) stands for the pvariate normal cdf with mean 0 and covariance matrix E, and Hg(x; A’) is the cdf of a noncentral chi-square distribution with q-D.F. and noncentrality parameter A2/2.
-s
(v) the asymptotic representation of fi(6,
J;;;(’:
-
where Z
V
0) --t
N
0) given by
1
z - d { b , . ’ ( H z ~ ~ ) . ( H B - l H , ) - l ~ ~ ~ ~ ~ ) (10.6.20) B-’H’(HB-’H’)-’(Hz+
)
7
NP(O,0;B-l) and d = q - 2;
’
and (vi)
-
fi(e? -
{ o , ’ ( H z + t ) , ( H*8 - l H ’(HZ+ ) - l ~ H ~ + t )1 B- H‘(HB- H’)-
-
)
+ <>’(HB-lH’)-l(HZ+ t)}-’ x I{(HZ + E)’(HB-’H’)-I(HZ + t )< d } x { B - ~ H ’ ( H B - ~ H ’ ) - ~ (+ HtZ) } ] . (10.6.21)
- [I - oi2d{(HZ
Proof.
P
(i) Since B;+B as n
3,
-+
03,
we can write
= B;-~P;(T) = B-’P;(T)
+ 0,(1).
( 10.6.22)
V
- O)=Np(O,a;B-l), being a PPMLE (see DzhaFurther as n -+ 03, paridze, 1986). Hence, we set Z = &(en - 6 ) = &B-’(~;(T) - P(T))~N,(O, a;B-’), as n
-+
00.
(10.6.23)
(ii) Restricted estimator
&(in
- 6)=
=
Asn-+m,
&{(an
- 6) - ~B;-lH’(HB~-lH’)-l(HB,- h)}
z - B-~H’(HB-’H’)-~(Hz+ <) + op(i).
(10.6.24)
10.6. Estimation of Parameters of an Autoregressive Gaussian Process 497 where V = B-'
-
B-lH'(HB-'H')-lHB-l.
(iii) Test-Statistic
13, = n6Gz(H8, - h)'(HB;--'H')-'(H8, - h)
+ <)'(HB-lH')-'(HZ + 6) + op(l).
= aY2(HZ
(10.6.26)
As n -+ 00, lirnPK(,){Ln 5 x} = PAz{aG2(HX+ <)'(HB-'H')-'(HZ
+ <) 5 x}
= Hq(x;A2), A' = O,~~'(HB-'H')-'S
(10.6.27)
Next (iv), (v) and (vi) follows from (i)-(iii) and noting that {&6, are statistically independent as n -+M.
e)', &(en - en)'}' 10.6.3
-
ADB, ADMSE Matrices, and ADQR of the Estimators
Using Theorem 10.6.1 we can find expressions of the asymptotic distribution bias (ADB), MSE (ADMSE) matrices, and the risks (ADQR), which are given below estimator by estimator. (i) UPPMLE:
bl(6,) = 0 and B'(8,)
=0
Ml(8,) = a2B-I and Rl(6, : W) = a;tr(WB-').
(10.6.28)
(ii) RPPMLE:
bz(6,) = -6 and Bz(6,)= A 2 , Mz(6,)
+ 66', where V = B-' = a,"t r ( W V ) + ~ ' w s .
= a:V
~ z ( 6 ,: W)
- B-'H(HB-'H')-'HB-l,
(10.6.29)
498
Chapter 10. Regression hlodels with Autocorrelated Errors
The expressions above are similar t o the asymptotic expressions of the regression estimators of Chapter 7. Hence, the properties of the five estimators are similar t o those of Chapter 7. Similarly, we can develop the confidence sets similar to Section 7.9 of Chapter 7 and arrive a t the same conclusion as given there.
10.7
R-Estimation of the Parameters of the AR[p]-Models
E O , E * ~ , E ~ *,..., ~ be i.i.d.r.v.’s with cdf F defined on R.Let Y o = (YO, Yl,.. . ,Y I - ~ )be ’ an observable random vector independent of v1, v2,. . .
Let
10.7. R-Estimation of the Parameters of the AR[p]-Models
499
for some fixed integer p ( 2 1). Consider the pth order autoregressive, AR[p] model where the observations Y1, Y2, . . . ,Y, satisfy the relation
Y , = ely,-l
+ ... + e,~,-, + vi,
1i i
I n,
(p 1 1)
(10.7.1)
and assume that all roots of xp
-
elxP-’ - ( 3 2 2 P - 2 . .
. - 8P -- 0
(10.7.2)
are in (-1,l). Here 8 = (01,. . . ,BP)’ E RP is the vector of unknown autoregressive parameters. Our primary interest is to obtain R-estimators of 8 when it is suspected but not certain that 8 belongs to the subspace defined by HB = h, where H is a p x q matrix and h is a q-vector of constants, respectively. For a simple subhypothesis problem, see also Koul and Saleh (1993).
10.7.1 R-Estimation of the Parameters of the AR[p] Model In this section, we introduce a class of R-estimators of 8 for the AR[p] model (10.7.1) and discuss their asymptotic properties. Let Y, = (K,. . . ,yZ-p+l)’, 1 5 i 5 n, and define Ri(b) as the rank of Y , - b’Yi-1 among {q- b’Yj-1lj = l , . . . ,n}. Set Ri(b) = 0 for i 5 0. Let 4 be a nondecreasing function from [O, 11 to the real line, and define L,(b) = (Ll(b), . . . ,Lp(b))’, where (10.7.3) with b = (bl, . . . ,bp)’ E RP.The class of rank-statistic L,(b), one for each 4, is similar t o a class of similar rank-statistics for the linear regression model discussed in Chapter 7 where we replaced the weights {yZ-j} by appropriate design points as discussed in Hajek, Sidak, and Sen (1999) and Puri and Sen (1986) among others. For an alternative class of rank-statistics, see Hallin and Puri (1988). It is natural to define 6, as the unrestricted (UE) R-estimator of 8 by the relationship (10.7.4) Then we can state the asymptotic linearity result as given below. Theorem 1. Assume that (10.7.1) and (10.7.2) hold. In addition, assume that the following conditions hold. A. (i) E ( v ) = 0, 0 < E(v4) < 03 and q , . .. ,v, are i.i.d.r.v. with cdf, F , (ii) F has a uniformly continuous pdf f (> 0), almost everywhere.
Chapter 10. Regression Models with Autocorrelated Errors
500
B. Q is nondecreasing and differentiable with the derivative Q‘being uniformly continuous on [O, 11. Then, for every 0 < k < 00, (10.7.5)
n
y=
i=j+l
1
f d 4 ( F ) and E = ( ( p ( i- j ) ) ) is the Toeplitz matrix with elements
P ( k ) = COV(X0, Xk), 1 I k I P.
(10.7.6)
The theorem above covers Wilcoxon-type scores but not the normal score. For a proof, see Koul and Saleh (1993) and Problem 14. As a consequence of Theorem 1, we can write
n1/2(6,- e) = yc-lL,(o) + op(i).
( 10.7.7)
Observe ,... that Ln is a vector of square integrable mean zero martingales with E[L,Lk] = u;E, where 0: = Var(Q(V)). Thus, by a routine Cramer-Wold device and Corollary 3.1 of Hall and Heyde (1980), we obtain
L,
-
NP(O,o p )
n1/2(6, - 0)
and
N
Np(0,~-2u$E-1).
(10.7.8)
The proof is left to the reader; see Problem 15.
10.7.2
Tests of Hypothesis and Improved &Estimators of e
Since our objective is to improve upon 6, when 6 belongs to subspace defined by HB = h, we need to test the null hypothesis H,-, : HO = h against the alternative H A : He # h. For this, we first consider the restricted R-estimators of 0 as
a, where C,
= ((&,ij))
= 8, - $nH’(HE,1H)-’(H6, - h),
(10.7.9)
consistent estimator of X with
n-rnax(t,j)
u^ n. t’ ~-
(Yk-i k=l
- %)(Yk-j
-
F)1
15
2,
j
5 72,
(10.7.10)
10.7. R-Estimation of the Parameters of the AR[p]-Models P
since E(V?)< 03 and the ergodic theorem implies ;e,-+X Now, we consider the test-statistic C,, defined by
c,
=
n&;~;(b,)k;'~,(ij,)l
(A) 6,)
as n
+ co.
(10.7.11)
6,
2
501
(&).
where 6; = n-l ($ , and = n-' 4 It can be shown (see Problem 16) that under Ho, C, has asymptotically the central chi-square distribution with q D.F. Let x:(a) be the upper 100a% critical value of the central chi-square distribution with q D.F. Then we define the preliminary test rank estimator (PTRE) as (10.7.12) The Stein-type rank estimator (SRE) is defined by
-s
-
8, = 6 ,
+ (1- C L ; ' ) ( ~ ,
- b,), c =
-
( 10.7.13 )
2,
and the positive-rule Stein-type rank estimator (PRSR.E)is defined by
ij:+
10.7.3
=
6, + (1 - CC,')I(L, > C ) ( B n
-
bn),
q
2 3.
(10.7.14)
Asymptotic Bias, MSE Matrix, and Risks of the R-Estimators
In order to obtain the expressions of the ADB, ADQB, ADMSE, and ADQR of the five R-estimators, we follow Saleh and Sen (1986), Sen and Saleh (1987), and Saleh (1992) and use the sequence of total alternatives. K(,) : H8 = h n-'/'<, # 0. We need the following additional assumptions (in addition to the assumptions in Theorem 1): A* (ii) The error distribution F has an absolutely continuous pdf f such that
<
+
I(f)= J-",
(-)
2
f(t)dt < 03. It can be shown (see Akritus and Johnson, 1982) that under A (i), (ii), B, and A* (ii), I((,) is contiguous to Ho. Using this argument and that of Theorem 3.2 of Sen and Saleh (1982), we obtain Theorem 2 as given below. Theorem 2. Under (10.7.1) and (10.7.2), A(i), (ii), A*(ii), and B, the ADB, ADQB, ADMSE, and ADQR are given as follows: A. For URE
e,,
&(en)
= 0, (i) bl(6,) = 0 and (ii) Ml(8,) = ~ ~ g J 2 -and l Rl(8,;Q) = a;tr[QX-'].
B. For RRE (i)
en,
b2(bn)= -6, 6 = X-'H'(HX-'H')-')E ~ ~ (= 8 = ~ a ); 2 d ' ( ~ ~ - 1 ~ ' ) - 1 6
and 1
502
Chapter 10. Regression h/lodels with A utocorrelated Errors
10.8. R-Estimation of the Parameters with AR[l] Errors
503
and Rs(6;’;
Q) = R4(6:; Q ) - c; tr[QX-lH’(HX-lH’)-’HX:-l] x{E[(1 - cx,;22(A2))21(xi+2(A2)<
}I).
+(a’Qa){aE[l - C X ~ : ~ ( A ~ ) ) I ( X ~ + ~<( A C)’ )
.)I}
-~[(1 - C X , - ~ ~ ~ ( A ~ ) ) ~<~ ( X ~ + ~ ( A ~ ) The proof is left to the reader; see Problem 18. The asymptotic dominance picture of the R-estimators are the same as that of Chapter 7 and Section 10.6. The asymptotic properties of the recentered confidence sets are similar to Section 7.9, and can be obtained by way of Problem 19.
10.8
R-Estimation of the Parameters with AR[l] Errors
Consider the simple linear regression model Y1 = 00 + Plz,
+ e,,
e, = pe,-l
+ v,,
Ip( < 1,
i = 1 , 2 , . . . ,n, (10.8.1)
where II,are i.i.d. standardized r.v.’s with distribution function F , assumed t o be symmetric around 0. The design variables z, are real known constants and nonrandom. We focus on the asymptotic distribution of a number of signed rank estimators of Po and PI. Let # be a nondecreasing function from [0, I] t o R with 4 ( z i ) = -+(l - Z L ) , for all zi E [0,1]. Let b = (b0,bl)’ E IR2, e,,b := Y,- bo - biz,, and RZb denote the rank of le3,bl among /eJ,bl. 1 5 j 5 n. Define @(u) = ~ ( Z L 1)/2) and S(b) := n-l
( ii ) 4’
(s) n f l
+
sign(ei,b),
(10.8.2) Let p := (Po,,&)’. Then under the symmetry o f f around zero, E [ S ( P ) ]= 0. Define the signed rank estimator vector to be a minimizer of ilS(b)ll with respect to b E JR2 or a solution of the equations S(b) = 0. Let X denote the n x 2 design matrix whose ith row is (l,zi),1 5 i 5 n. Then from Koul (1977) we obtain the following result:
p,
Theorem 1. In addition to the assumptions above, suppose that the following holds: The d.f. F has a uniformly continuous Lebesgue density f, the score function 4 is right-continuous and bounded, and the design matrix X
504
Chapter 10. Regression Models with Autocorrelated Errors
I z Z I / d m= o(1).Also,
is of the full rank and satisfies nl/’ maxi<,<, assume that the integral y := J’ f d + ( F ) >%.Then
nqp,’
- p) = yl(X’X)-ln1/2S
+ op(l).
(10.8.3)
and Now, let $(e,) := +(F(~e,l)sign(e,), (10.8.4) Then the covariance matrix of the leading vector on the right-hand side can be written as
E { ( X ’ X ) - l n S P (X ’ X ) -l} = ( X ‘ X )-
cn( p )(x’x)-’.
( 10.8.5)
Assuming that this matrix ha5 a positive definite limit, say, X ( p ) , we can further conclude that
n1/2(p,’- P ) ===+ Nz(o,7-z%N.
(10.8.6)
Further characterization of the covariance matrix X ( p ) depends on the design, F and the score function. See, for example, Section 4 of Koul (1977) for the details involved in the case where e, is a Gaussian process with mean zero, variance 1, and the lag k correlation plkl. It is really cumbersome. The main difficulty is how t o compute a reasonable approximation t o the covariances E [ $ ( e z ) $ ( e 3 )as ] a function of (i - j ) in general when e,’s are a moving average of the i.i.d. r.v.’s. If p if not known, following Durbin (1960), we can write the model (10.8.1) as
y, = X + py,-1+
Pl(X, - px,-1)
+ v,,
i = 2 , . . . ,n,
(10.8.7)
conditionally on Yl, where X = (1- p)po. Now, we consider the rank-statistics
Sn( u ) defined by
where Ri(&,Pln) is the rank of residuals x-1 --Pon -Plnxi-1 among { qboon - plnx312 5 j 5 n } . The first residuals are computed based on the Rand P, assuming p t o be zero as in Chapter 3. Then estimators of
fii1)= 1[sup { z t
01
011.
: Sn(tL) > + inf {u : Sn(tt) < 2 as given below. Next, we compute p1 and 1 = [sup { b : ~ , ( b j f i i l ) )> inf { b : ~,(blbc))< o}]
(10.8.9)
01 +
and
jk)= 51 [sup { a : Tn(a,fiii)lfiil)) > 01+ inf { a : Tn(a, ^ ( I )lcn
~ 1 , (1)
< 011, (10.8.10)
10.9. Conclusions
505
where
and n
Ln(bjfip) = n-l
C(x;-
.;)Un(Rt(bj&1))),
(10.8.1I )
t=2
respectively and Y t = yt - pyt-1 and x; = zt - pxt-1 for t = 1,. . . ,n (see Chapter 3 ) . The process is repeated until the R-estimators are stabilized. The asymptotic distribution of &{ (Pon - P o ) , (Pin - P I ) } can be proved to be an asymptotically bivariate normal covariance matrix as given by (10.8.6) (see Problem 20), so this is an open problem.
10.9
Conclusions
In this chapter, we considered the simple linear model and the multiple regression model with autocorrelated errors and discussed various estimators as in Chapter 7. If the autocorrelation in the model is unknown, we showed that by asymptotic theory consistent estimation of the autocorrelation parameter ( p ) and asymptotic properties of the various estimators can be obtained. Further, we discussed the asymptotic properties of the estimators of the parameters of an autoregressive model. In addition, we discussed R-estimation of the regression parameters, where autocorrelation may or may not be known, based on Durbin's (1960) method.
10.10
Problems
1. Verify that the inverse of the matrix,
to be
I=, defined by
506
Chapter 10. Regression Models with Autocorrelated Errors
-
+
2. Show that if Y, N(191, /3x,a:C,), where Y , = (Yl,... ,Yn)’ and x = (21,. . . ,zn)’,then the following hold:
(a) Y
- N ( I+~
PZ,
$xP),
(b) If p is known, the estimators of 8 and covariance matrix given by (10.1.10).
/3 are given by (10.1.9) with
: (m = n - 2) (c) The unbiased estimator of a
1
5: = -[Y, m
- B , ( p ) l , - Pn(P)xl’x:,l[Y,
- % ( P ) L - &(p)xI,
and m5?/c?follows the central chi-square distribution with m D.F. independent of 8,(p) and P,(p). 3. Show that the bias and MSE of the estimators &(p(P),BLT(p) and B:(p) are given by (10.1.20) through (10.1.22). 4. Show that the bias and MSE of the estimators 8,(p), 6,(p), h z T ( p ) , and 6 z ( p ) are given by (10.1.23) through (10.1.26). 5. Consider the model (10.1.1) and the prediction of Y value given 2 = 20, where p is known, %bO)
=k(P)
+ P,(p).o.
Find the bias and MSE of the following estimators:
(9 Y J X O ) ,
(ii) Y,(xo) = Y,
(iii) Y / T ( z o ) = Y I ( L < ~ l , ~ ( + a p) n) ( ~ o ) l (2~~: ln, , ( a ) )
6 . Consider the model (10.1.1) when p is unknown. Show that (a) ,& = p + O,(n-l/’), holds as n + 00. (b) & = 8 + O , ( T L - ~ / ~ ) , (c) = P + 0,(n-’/2) and
(4
Pn
[&(en
-
$1, V
W n - P), &(Pn
- P)]’
7. Verify that expressions for ADB and ADMSE of the estimators of p, /3 and 8 as given by (10.1.20) through (10.1.26). 8. Consider the multiple regression model (10.2.1) when p is known. Show that
10.10. Problems
507
9. Verify the expressions for the bias, quadratic bias, MSE matrices, and the risks for the estimators of ,B given by (10.3.6) through (10.3.13). 10. Following Chapter 7, Section 7.8.1, show that the estimators of p when p is unknown are asymptotically risk equivalent under fixed alternatives (refer to section 10.6). 11. Verify the expression for the ADB, ADQB, ADMSE, and ADQR of p when p is consistently estimated by p,, given by (10.4.4) through (10.4.9). 12. Following Chapter 7, Section 7.8.1, show that the estimators of 6 are asymptotically risk equivalent under fixed alternatives. 13. Following Chapter 7, Section 7.9, develop the asymptotic properties of the five confidence sets, where the centers are en,6- ,P T, 6- S, , and 6. S+ , , respectively (Ref 10.6). 14. Prove Theorem 1 of Section 10.7.2. 15. Prove T Z - ~ / ~ L , ( O ) Mp(Ola;X) and f i ( 6 , - 6 ) Np(O,y-2a;X). 16. Prove that as n 3 oc),
en,
-
-
L, = a;(He, - h)’(HX-’H’)-’(Ha, - h)
+ op(l),
where L, is defined by (10.7.11). 17. Prove that the local alternatives {K(,)} defined by H6 = h n-1/2t is contiguous to Ho : H8 = h. 18. Prove Theorem 2 of Section 10.7.4. 19. Formulate the recentered (1-a)% confidence interval when E is known, and study the dominance properties as Section 7.9. 20. Develop the R-estimation of the intercept and multiple regression parameters in the model
+
when (i) p is known and (ii) when p is unknown.
This Page Intentionally Left Blank
Chapter 11
Multivariate Models Outline 11.1 Point and Set Estimation of the Mean Vector of an MND 11.2 U-Statistics Approach to Estimation 11.3 Nonparametric Methods: R-Estimation 11.4 Simple Multivariate Linear Regression Model 11.5 R-Estimation and Confidence Sets for the Simple Multivariate Linear Model 11.6 Conclusions 11.7 Problems
Real-life data sets involve measurements of two or more characteristics of individual samples chosen for study. For example, height, weight, chest-girth, and the like, can be treated as vectors of observations, Y = (Yl, Y2,. .. ,Y,)’, of pcomponent characteristics. The component characteristics are not necessarily independent but correlated. The usual distribution that is used for analysis of a vector data set of fixed sample size is the multivariate normal distribution (MND). Thus, for statistical analysis we consider a random sample Y 1 , - . ,. Y Nof size N from the MND, n/,(0,E) with mean vector 0 and the p x p covariance matrix, X. In general, (0,X) is unknown, and the simplest statistical problem is the estimation of ( 6 ,E) based on the sample, Yl,YZ,-.. , Y N .It is well-known (e.g., see Anderson, 1984; Srivastava and Khatri, 1986) that the maximum likelihood estimator of (0,E) is (YN, S), N where Y N = C,=] Yi and ( N - 1)s= CEl(Y,-&)(Yi -YN)’. Further, (YN, S ) satisfies the estimation criteria of Fisher, sufficiency, unbiasedness, uniform minimum variance unbiased estimation (UMVUE), as well as the decision theoretic criterion of “minimaxity”. As we mentioned in Chapter 4, Stein (1956) proposed that the usual estimator of 0, namely YN, may be uniformly improved by a nonlinear estimator under a quadratic loss function. In this chapter, we begin with the JamesStein estimator of 0 that uniformly improves over the sample mean YNunder
&
509
510
Chapter 11. Multivariate Models
a quadratic loss. It has inspired many researchers t o travel in different paths to appreciate the impact of the Stein’s theory in the parametric estimation. We present the Bayes and the empirical Bayes methods (see Efron and Morris, 1973) of estimating 8 as well. We consider the simple linear model in a multivariate setup and propose preliminary test and Stein-type estimators of the i n t e r c e p t and the slope-vector parameters. Further, we discuss the related improved confidence-set problem. (e.g., see Berger, 1980b, Lu and Berger, 1989). As a particular case of the simple linear model, we obtain the results for the two-sample problem of estimating the mean vectors when they may be equal. In the course of various developments, Saleh and Sen (1978-’86) merged two diverse areas of the preliminary test and Stein-type estimation in a nonparametric setup that includes robust statistics. In this regard we considered U-statistics and the problem of estimating functionals of pvariate nonnormal distributions, and we extend the methodology t o R-estimation of the intercept and slope-vectors when the error distribution is nonnormal but diagonally symmetric.
11.1
Point and Set Estimation of the Mean Vector of an MND
In this section, we consider the simplest one-sample problem of estimating the mean vector 8 of the MND, Np(8,E)based on the sample, Y 1 , . . -, Y Nof size N when C is unknown.
11.1.1 Model, Estimation, and Test of Hypothesis Consider the one-sample multivariate model: y i = e + E i ,
E ~ - N ~ ( O , E )i, = i , . . . , ~ ,
(11.1.I)
where Yi = (XI,... ,Kp)’,8 = ( e l , . . . ,eP)’,and ~i = (Q, ... ,eip)’. Our main objective is the estimation of 8 when it is suspected that 8 may be 0 and to study the properties of the estimator(s) under a quadratic loss function. Accordingly, we propose (1) the u n r e s t r i c t e d MLE, (2) the restricted MLE, (3) the PTE, (4) the James-Stein estimator (JSE), and ( 5 ) the positive-rule Stein estimator (PRSE) of 8. It is well-known that the MLE of (8,E) is (YN, S ) . So we designate the unrestricted estimator (UE) of 8 to be 6 ~ ( y = ~and ) the restricted estimator to be 6 ~ ( 0=in this case). Since 8 may be 0 , we test the null hypothesis Ho : 8 = 0 against 8 # 0 based on the liklihood ratio test given by
m np
C N = -Ne‘,S-le~,
m =N -p
and n = N - 1.
(11.1.2)
11.1. Point and Set Estimation of the Mean Vector of an MND
511
Then, from Rao (1973) and Srivastava and Khatri (1979, p. 113), we have
(11.1.3) For a given
e~, ( 11.1.4)
and we know that
6,
V
N
Np(8,N - I X ) which leads to conclude that
~e',x:-leN
Ex;(~2),
~2
(11S.5)
= Ne'x-le,
so the distributions (11.1.4) and (11.1.5) are independent. That is t o say,
(11.1.6) where C N follows a noncentral F-distribution with ( p ,m) d.f. and noncentrality parameter A2/2. If 8 = 0, the F-distribution is central. Thus, the PTE of 8 can be written as -PT
-
= ON
-
< Fp,m(a)),
(11.1.7)
~ N I ( L N
is the upper a-level critical value from the central F-distribution. where Similarly, the James-Stein estimator of 8 can be written as
(11.1.8) and the PRSE of 8 is given by
..S+
8 N = (1 - dL&'))I(LN > d)6N.
(11S.9) -PT
-S
..S+
We now prove that the proposed estimators, namely B N , 8, , O N , and 8, , belong to the class of "empirical Bayes" estimators. First, we note that ON Np(8;N - ' C ) , and since 8 may be 0, we assume the prior distribution of 8 to be Np(O,r 2 N - ' C ) , where T~ is a scalar. Thus, a plausible estimator of 8 may be of the form k6N, where k is not known to have been chosen appropriately. Applying Bayes Theorem, we have the conditional distribution of 818, as N p { k e N ,k N - ' C } , where k = ~ ~ T/ ~1.
-
-B
+
Hence, the Bayes estimator 8,is given by 8 N = k e N , where e~ and 0 are the extreme estimators of 8 when + 00 and r2 -+0, respectively. Here k is still unknown, and equivalently r2 is unknown. In order t o estimate k , we write
Chapter 11. Multivariate Models
512 it as [l - 1/(1
+
T ~ )so ]
that
where 1/(1
+T
~ can )
be
estimated. Thus, we consider the marginal distributions of n S and 6, that are V independent and are given by n S = W p ( E ,n ) and 6, = Np(O,(1+ T ~ ) N - ’ X ) , V
-I
respectively. Then ( ~ + T ~ ) - ’ I V ~ ~ E -and ’ ~ LN N ==XzN8,S-16wg(l+ ~ T ~ ) - ‘ F ~So , ~it. is easy t o see that (1 + T y E ( L , 1 ) = E [ F p 3=
~
P-2
for p > 2.
(11.1.10)
- ’ be estimated by a scalar multiple of L i ’ . More generally, Hence, ( 1 + ~ ~ )can let g ( 1 3 N ) a real valued function of LN,be an estimator of (1+ r 2 ) - ’ .Accordingly, we define a class of “empirical Bayes” estimators of 8 as a substitute of the Bayes estimator:
ey
We consider the following five choices of g(L,).
..E B
(2) ~ L N=) 1, 8,
(4) g(L,)
=
-EB
(1) g ( L N ) = 0, 8,
-EB
= 0; (3) g ( L N ) = ~ ( L N 5 F p , m ( a ) ) , 8, -EB
(p-2)m
dL&’, d = p ( m + ’ ) ’ 8, -EB
=
-s+
-
= 8,; -PT
= 8, ; 8,; and ( 5 ) C l ( L N ) = 1 - (1 -
. Note that PSRE is a PTE, since . S+ > d ) = B,I(LN > d ) = 8, (see also Efron and
dLN1)I(LN > d ) . In this case, 8,
+
(11.1.11)
= (1 -g(L,))S,.
-S
= 8,
-S
~NI(L< N d ) 8,I(LN Morris, 1972-’77); and Morris, 1983).
11.1.2 Bias, QB, MSE Matrix, and Weighted Risk Expressions of the Estimators Consider the following formulas for the bias, MSE matrices, and risks of the estimators: (1) b(8fv)= E[8; - 81 and B(8fv)= N[b(8fv)]’X:-’[b(8fv], (2) = E[N(efv- 8)(8; - e)’],and (3) R(8;;Q) = tr[QM(B;)], where 8; stands for any of the five estimators of 8. For the unrestricted estimator we have bl(6,) = 0 and Bl(6,) = 0, M l ( 6 ~=) E and R ~ ( ~ N ; = Q t)r ( Q E ) . (11.1.12)
..
PT b2(8, ) = 2 PT
For the PTE, we obtain
..
..P T )
-8Gp+2,m(&(a);A2) and &(ON
A2{Gp+2,m(trr(~);A2)}, M2(8, ) = E - EGp+~,m(!cr(a); A’)
+ NO8
=
X{2Gp+2,m(&(a);A2) - Gp+4,m(l:(a);A2)} and R z ( 6 F ; Q ) = tr(QX) t r ( Q X ) G p + z , m ( L ( a ) ;A’) N8’Q@(2Gp+2,, ( & ( a ) A2) ; - Gp+4,m (!:(a);A2)} (11.1.13)
+
-S
here & = &Fp,m(a) and t: = &Fp,m(a). The expressions for JSE, 6,, are given by
b3(6;) = - ~ P Q E [ x ~ : ~ ( A ~and )]
2
&(is,) = d2p2A2(E[~;:2(A2)]} ,
11.1. Point and Set Estimation of the Mean Vector of an MND
x(1-
513
+
( p 2)N6’Q6 2A2tr(QX) )2AzE[Xi:4(AZ)]} 2 0 (11.1.17)
for all A’ provided that the matrix Q belongs to the set satisfying
(11.1.18)
514
Chapter 11. Multivariate Models
Further,
since
The term 2E[(1- d l F ~ ~ z , m ( A 2 ) ) 1 ( F p + 2 ,<~ (dA l )2] )is negative, whereas E[(1- d2F,;’4,,(A2))2r(Fp+4,,(A2) < &)I is positive. Hence, the third term of (11.2.5) is positive, and (11.2.1) holds. Thus, all together, we obtain Rl(eN; Q) 2 R3(6;; Q) 2
. S+
&(ON
; Q) v(A2,Q)
(11.1.21)
where Q belongs to the set (11.1.16) with p 2 3. A
PT
Next, we observe that 8,
A2 5
6,
whenever
tr(QE)Gp+z,m (L( a ) ;A2) . Chmaz(Qx) [2Gp+2,m(&( a ) A’) ; - Gp+4,m([G(~);A’)] ’
otherwise,
A2 2
is superior to
(11.1.22)
is superior whenever
tr(Qx)Gp+2,m(-%(a); A2) Chmin (Qx) [2Gp+2,m([a( a ) ;A2) - Gp+4,m ([A ( a ) ;A’)]
The efficiency of *
PT ON
(11.1.23) *
can be written, for example, as . PT
PT
Eff[ON : e N ] = Rl(6,; Q ) [ R ~ ( O N;Q)]-’ = E [ a , A 2 ] ,
(11.1.24)
it depends on cy and A2. As we know P T E is not uniformly better than since ON. Nevertheless, we can obtain a PTE with a minimum guaranteed efficiency, say, Eo, by choosing a suitable level of significance, a*,in solving the equation max min E(cy,A2) 2 Eo. a
Az
(11.1.25)
Some idea of optimum a* may be found in the Table 4.2.2 of Chapter 4. Next we consider an MSE analysis: . PT Comparison of 6, and ON . Consider the MSE-matrix difference M I ( 6 N ) - M2(6y)
= x G p + ~ , m ( &A2) ; - N8O’{ 2Gp+2,,(ea; A2) - Gp+4,m(!:;
A2)}.(11.1.26)
11.1. Point and Set Estimation of the Mean Vector of an MND
515
The matrix difference is n.n.d. if and only if
., PT
This means that 8,
is superior to PT wise, 8, is superior to 8, .
-
Comparison of case,
-s
4,
S+
. First, we consider 6 ,
and 8,
eN, 8N,
whenever A’ satisfies (11.2.11); other-
-s
vs. 8,. In this
s Ml(6.w) - M3(en)= ~ P X { ~ E [ X L ~ ~-( A (P~ )~)E[x;:~(A~)]} ] - dp(p
+~)NB~’E[~;$(A~)].
(1 1.1.28)
This difference is n.n.d. if and only if
< (P- 2)E[X;;2(A2)]
PECX;:2(A2)l
(11.1.29)
-s
for all A’, which may not hold always. Hence, 8, does not dominate 6, uniformly based on the MSE criterian.
-s
*
s+
For the comparison of 8, and 8, given by
M3(6:)
, we consider the MSE-matrix difference
- M4(6:)
=xE[(1-
dlF,;’z,,(A2))2J(F,+z,,L(A2)< di)]
eel{ 2 ~ [ ( 1 -
<
~ l ~ ~ ~ 2 , , ( ~ 2 ~ ) ~ ( ~ p + 2 , mdl)l ( ~ 2 )
- E [ ( 1 - dzF,;’,,,(A2))21(F,+4.,(a2) < The expression (11.1.28) is n.n.d., since Fp+z,,(A2) -s nates 8, uniformly:
Mq
(6y)5 M3 (6;)
We may note that 8%’ may not dominate
-
-PT
-S
&)I}.
(11.1.30)
- S+ domi< d l . Hence, 8,
VA2.
(11.1.31)
-s
e ~since , 8,
..S+
does not dominate
For the comparison of 8, , 8 N , and 8, , see Problem 11. Finally, we consider the confidence set estimators that are similar to those described in Chapter 4. The confidence sets are given below when X is known. ON.
(i)
(ii)
and
c 0 ( 8 N ) = { e : Nile - 6 N l l $ - 1 I xE(y)} PT hPT - PT (
N
( a ) )= { e : N l l e - e N
(iii)
-s 2 cs(6sN ) = ( 8 : Nile - eNl1X-l
(iv)
C
s+ es+
(
N
- s+
(a)ll$-l
(1 1.1.32)
5 x;(y)},
L x;(y)},
) = ( 8 : Nil8 - 8, ( N ) ~ I & - ~
I x;(y)}.
Chapter 11. h/luitivariate Models
516
The results hold good if X is unknown and N + a.The analysis of the properties of the confidence sets are similar to those given in Section 4.8. For the estimation of mean vector under possible subhypothesis restriction, see Srivastava and Saleh (2005).
11.2
U-statistics Approach to Estimation
This section contains the estimation of functionals of distributions based on U-statistics and its modifications. The idea of U-statistics is due to Hoeffding (1948). Parallel t o pvariate normal theory, we consider the unrestricted, PTE, SE, and PRSE of the functionals which, as a special case, contains the mean vector of distributions as target for estimation enlarging the scope of application. Let {Xi, i 2 1) be a sequence of i.i.d. random vectors with a distribution function F defined on EQfor some q 2 1. Let 3 be the space of all distribution functions, and for every F E 3,consider the vector of functionals
6 = e ( F )= ( h ( F ) ,.. . ,O,(F))’ whose domain is 3.If there exists a kernel pII(XI,. arguments of degree m3 ( 2 l ) ,such that
for p 2 1
(11.2.1)
. . ,xm,),symmetric in its
~ J ( F=EF{(Pj(Xl,-.. ) >xm,)},v F E 3, j = 1,. . . , p , then 6 is an admissible vector parameter. For max(m1,. .. ,m p ) ,we can define the vector of U-statistics,
Un = ( u n 1 , . . .
i
unp)’,
(11.2.2) TZ
2 m*
=
(11.2.3)
by letting
(11.2.4) For a detailed account of U-statistics, see Rohatgi and Saleh (2001), Puri and Sen (1971), and Randles and Wolfe (1979) among others. U n is symmetric and unbiased estimator of 6 with minimum variance property. Assume that the kernels pj are all integrable. Then we define the following conditional expectations:
vj,c=E~[Cpj(Xl, ... ,Xc; Xc+l ,... ,xrnj)], C E O ,... , m j , and
(11.2.5)
11.2. U-statistics Approach to Estimation
517
for j , d = 1,.. . , p , and c = 0,1,. . . ,min(mj, me). From Hoeffding (1948) (see also Puri and Sen, 1971), we have
nE[(U,
- 6)(U, - 6)']= E
+ ~(n-l),
(11.2.6)
A convenient estimator of E is available by way of where E = ((rnjme&je,,)). the jackknife technique. For this, let
un = U(X1,. .
*
,X,),
and for every i = 1,.. . ,n, define one-skip U-statistics as
u;ll
= U(X1,. . . ,X&1, XZfl, .. *
,X,)
and
u,,z = nu,
- (n - l)U,(2) -,.
(11.2.7)
Then the jackknife estimate of E is given by
9,
n
= ( n - 1)-1
C[Un,3 - U,][U,,j
- U,]'.
(11.2.8)
j=1
e,z
Further, E as n --t 00 (see Puri and Sen, 1971). For testing the null hypothesis HO: 8 = 0 , we can use the test-statistic ,. -1 -I -1 L, = nU',I=, U, = nO,E, 6,, A
(11.2.9)
setting 8, = U,. Then, under Ho as n 4 00, L N converges to the xgdistribution. Now, consider various estimators of 0 when it is suspected that 8 is 0. We have the unrestricted estimator of 0 given by 8, = U,. The three estimators of 0 based on 6, and LN are as follows:
. PT
en = en%& 2 L , O ) , e,- s = e,(i - CLC,'), (C = p - a),
(11.2. lo) We define the corresponding confidence sets as
respectively. See also Sen (1984) and Saleh (2003).
Chapter 11. fiiultivariate Models
518
11.2.1 Asymptotic Properties of Point and Set Estimation under Fixed Alternatives First, we note that
-
h ( 8 , - e) N,(o,E) and lim P
12-00
{nlje -
I x;(y)}
~ ~ 1 1-l2 -
for
e# o (11.2.12)
= 1 - y.
. PT
Next, we consider the PTE, 8, , and notice that
-S
Similarly, for the SE, 0, we obtain (11.2.14) Sf
and for the PRSE, 8,
lie,
-
- S+ en
we obtain
(c)ljE n 2 ~
-l
=
c,qc, <
+ c2c;1(c,
>
(11.2.15)
Assume that ~~~p~~~~ < 00
for some T 2 I.
(11.2.16)
converges to the finite < Cn,a)l where Now, E F I [ L , < C,,,] = limit xg(cy) as n -+ 00. Since 2, + E almost surely as n -+ co due to (11.13.16) and e,"z6 as n -+ co,then n - l C , -+ OIE-'B a.s. and 13, 4 co in probability as n co. Therefore, for every 0 # 0, meaning A2 > 0, limn+m P(Cn< 0. Hence, the r.h.s. of (11.2.13) converges to 0 as n -+ 00. Further, from the discussion above nC;' converges to (OrE-lO)-l almost surely, which implies that as n 00, -+
-+
-+
C , '
as well as L,'I(C,
> c) -+ 0
in probability.
Hence, limn.-+mEF(C;l) = 0, and lirn,,,EF[C;'I(C, result,
> c ) ] = 0. As a
(11.2.17)
11.2. U-statistics Approach to Estimation
519
-
APT
-s
S+
in the first mean as n + 00, and we conclude that On, 8, , On, and 8, equivalent in quadratic mean under fixed alternatives, 0 # 0 . Hence,
,/G@~
,/K(G; (iii)
e) = &(en e) + op(i), e) = &(en - e) + Op(i),
-
-
are
-
,,%(en e) = ,,%(en - e) + op(l), . S+
(1 1.2.18)
-
and
(1 1.2.19) where
(1 1.2.20) and
< &,a)
g(L,) =
= (1 - C L Z I ) = CL,’
+ (1 - cL,’)I(L,
< c),
(1 1.2.21)
respectively. Also, the asymptotic distributional MSE and risks are
M(8:) = E
11.2.2
and
R(0:; Q) = t r ( E Q ) .
(1 1.2.22)
Asymptotic Properties of the Point and Set Estimation under Local Alternatives
As before, we consider a class of local alternatives
q,,: e(,) = R.- 1/26,
6 = (iil,. . . ,iip)’,
(11.2.23)
fixn-1/2
to obtain meaningful- differences in the asymptotic distributions of (6;e,,)), where 0: = 0, - 8,g(L,) as n + 00. Accordingly, consider the point I
APT
-s
AS+
estimators 8, ( a ) ,O,, and 0,
(c) under K(,) as follows:
520
Chapter 11. Multivariate Models
It is known that and
%n
--f
X a.s. as n . -1/2
J;E2,'/2en= &En V
-+
+
a~and
-
+ a,;
(6, - 6(,))
Z + 6' as n -+ 00, 6'
. -1/2
fix,
-
-
(6n-6(n)> Np(O,Ip)
(6; = 9 y 2 6 )
= Plim
a,;
Hence, we have the following theorem:
Theorem 2. Under {K(,)} and the assumed regularity conditions
11.2. U-statistics Approach to Estimation (i) Unrestricted estimator b,(&) = 0 and Bl(8,) = 0, M1(8,) = E and RI(8, : Q) = tr(QE). (ii) PTE
521
(11.2.28a)
Chapter 11. Multivariate Models
522
These results may be obtained by using the random variable
where Z
11.3
- Np(O,Ip)
and g ( . ) is defined by (11.2.21).
Nonparametric Methods: R-estimation
Consider the model Y j = 6 + ~ j ,j = 1 ,
. . . , N,
(11.3.1)
where Y1,.. . ,Y Nare i.i.d. r.v. having pvariate continuous cdf, FO(Y), defined on the Euclidean space R p that is diagonally symmetric about its location parameter, 6 = (61,.. . , BP)’,
F e ( Y )= F ( Y
-
6 ) , Y E RP,
(11.3.2)
where F is diagonally symmetric about 0. Let F[jl be the marginal cdf corresponding to F and F [ i , j ]i, # j be the bivariate marginal cdf corresponding to F . Further, let Frjl possess absolute continuous pdf frjl, and let
In addition, for every N ( 2 1) and j (= 1,. . . ,p)’ define N scores
aftij(h) = E[4T(UNk)] or 4;
(-)N k+ l
for IC = I,.. . ,N,
(11.3.4)
where UNk is the kth order statistic from sample size N from U ( 0 , l ) and for every u E (0, 1),consider
(11.3.5) where the functions { & ~ ( z L ) } are square integrable and nondecreasing. To define the rank statistics, let Rf(b) be the rank of l y Z j - bl among /Y&b(.. . . , 1 Y ~ j bl, for i = 1,. . . ,N and j = 1,. . . , p , and consider the vector of LRS
where
11.3. Nonparametric i'vfethods: R-estimation
523
Notice that T N( b~) \ in b (see Puri and Sen, 1986), and define the unrestricted estimator of 6, as
1 & = ~ ( s u p ( b : T ~ ~>O}+inf{b;T~,(b)
j = l ,... , p . (11.3.7)
Then define 8, = (81,.. . ,6,)' as the estimator of 0 = (61,. . . ,Op)'. It may be verified that 6, is robust, translation-invariant and consistent estimator having (coordinatewise) median unbiasedness property (see Puri and Sen, 1986). To study various other estimators, we set ~a(F[,1(2))#3(Fiil(z))dFIa~l(z,~) for i , j = 172,
,P,
and (11.3.8)
Let
A = ((Xij)),
v = Diag(y1,. . . , y p ) , and
E = v-'Av-'.
(11.3.9)
It can be verified that f i T ~ ( 0- N ) p ( O , A ) , TN(O)= ( T ~ 1 ( 0 ).,. . ,T,vP(O))'.
(11.3.10)
As a result, we can prove that (Puri and Sen, 1971, ch. 6)
-
d E ( i j N - e) N,(o, E).
(11.3.11)
Suppose that it is a priori suspected that 0 is 0 (null). Then, if we wish t o test the hypothesis HO : 0 = 0, we consider the test-statistic LN defined by
L N = NITN(0)]'MXITN(O)],
(11.3.12)
where N
M" - ((m;e)),m;! = N-' xaAj(R;)aAe(R$)signY& sign&,
(11.3.13)
i=l
j , B = I , . . . , p , and
MX is the generalized inverse of Mf, (see Puri and Sen, 1967). In the sequel, we will need a consistent estimator of E defined by
2~ =~;'MN?;',
M N = ((mje)),
(11.3.14)
where mje = N-' uN,(Rij)aNe(Rie) and Rij is the rank of Y,j among Ylj,. . . , Y N ~( j = 1,... , p ) . Then MN is robust and translation-invariant estimator of v. Also, V N = (51,. . . ,;Yp)', where
Chapter 11. Multivariate Models
524
j = 1,. ,. , p and a is a prefixed positive number. Let C N , be ~ the 0-level critical value from the null distribution of C N . Also, as N -+ 03, C N + ~ $(a). This allows us t o define the following three estimators of 8 as in (11.1.7) through (11.1.9):
..PT
8,
= 8 N - 6,I(LC,
is -8 N -
< Lh,,)
N - C ~ N C ; ' ( C N2 E N ) ,
62 = 6 N ( l - cCG1)I(C >
C),
where C
EN
--f
0 as N
-+
M
( p 2 3), and
( 11.3.16)
= p - 2,
respectively. Associated with the point estimators given above, we consider the corresponding confidence sets:
respectively. See also Sen and Saleh (1985, 1987).
11.3.1
Asymptotic Properties of the Point Estimators
We assume that for some positive b (not necessarily 2 l), E F I Y ~ < ~ (M ~ for j = 1.. . . , p , $ + j ( u ) , T = 0,1,2 exist for ZL E (0, I ) almost everywhere, and there exist positive constants k and that l+!)(U)l
5 k(u(1 - u ) > - E - '
Finally, we assume that the derivative SUP f[j] ).( 3
{ qj]).( [I - F[j]( 4 1}
< M,
T
(11.3.18) E
(<
= 0,1,2.
a) such
(11.3.19)
fL1 is bounded almost everywhere and -€-v
< m, 1 I j I P,
(11.3.20)
where E and 77 are both positive. Then (see Sen, 1980) we conclude that for each j (= 1,. . . ,p ) as n -+ 03,
J N ( 8 j - 6,) - yylTNj(6j) = w j -+ o ElWjIk -+ 0 v k < (1 - a€)/&(> 2), NE(6N - e ) ( e N - 6)'
I=.
almost surely,
(11.3.21)
525
11.3. Nonparametric Methods: R-estimation
We want to determine the asymptotic bias, MSE matrix, and quadratic risk of the estimators t o compare the dominance characteristics of the estimators. First, we discuss briefly the case under fixed alternatives 8 # 0. Note that
llC -S
lpN
- e N 1 y = Ij6Nlj21(LN 2
- 6Nll
= lj6NJ12{I(LN
S+
lleN - 6,112 *
=
< LN,u),
< En)
+ C21(LN L E n ) L G 2 } ,
ljeN112{~(~N < +c
2 ~ ( ~2 Nc)~;2}.
(11.3.22)
Now by (11.3.12),
~ { L
I minP{ lT~j1< N-ll2&}
for every k
> 0.
(11.3.23)
Then it follows (see Sen 1970, Ths. 1 and 3) that under (11.3.19), whenever 8 # 0, the r.h.s. of (11.3.23) is O ( N - 2 ) , while by (11.12.21), has bounded expectation. Thus, by standard analysis, we conclude that the I ( L N 5 L N ~converges ) in first mean to 0 as N -+ co. Next noting that L', -+ 0 as N -+ co,E{I(LN > c ) L i 2 } 0 as N + co as well. Thus, for fixed 8 # 0, as N -+ 00, ---f
limsupE{ IpY, - 6N11218# 0} = 0, N+CC
(11.3.24)
: N g) ( L N ) = I ( L N < L N ~ )(p, - 2)L,', where 6;; = 6, - ~ N ~ ( L and (p - 2)L;' { 1 - (p - 2 ) L i 1 ) 1 ( L ~< (p - 2)). Thus, we have
+
Theorem 3. Under fixed alternatives 8 # 0, the following holds as N
(.1) (..11) and (iii)
~
1
/
2
~
~ N ~- 0 / ) ~= ~ ( 1 i/ 2 j1 : ~- 1 /~2 ( 6 ~ - 0 )
NW&-'/'(~' N N
-
and + 00:
+op(l).
0 ) = ~ 1 / 2 1 : - 1 / 2 ( ~-~8 ) + op(l).
-1/2 -s+ ~ 1 / 2 & (8, ~ -
e ) = N ~ / ~ I : c - ~ / ~-( e~ )N+ op(1).
(11.3.25)
As a result, the asymptotic distributional bias, ADB, ADMSE, and ADQR of the estimators are 0, I: and tr( QI:) respectively for each of the four estimators. Also, (11.3.26) Hence, under the fixed 8 # 0 , there is no difference among the ADB, ADMSE, and ADQR of the estimators and the coverage probabilities of the confidence sets. The situation changes when we consider local translation alternatives
qN) : qN) = N - ' / ~ & , 6 = (dl,. . . ,dJ.
(11.3.27)
Chapter 11. Multivariate Models
526
By virtue of the translation-invariance of 6, and the fact that Y N ;= -N-'/'6 has the cdf F under K ( N ) and do not depend on N , we conclude that under {K")}, lim E [ N ( ~ -Ne(,))'Q(eN
"03
-6
(~= ) )tr(QE),
and -
2
lim P [ N I I ~-~~ (N ~ l/& ) - i 5 x;(y)] = 1 - y.
N-+W
N
(11.3.28)
Further, we have under {K")},
e ~ ,
Now, because of translation-invariance of (11.3.11) holds under { K ( N ) } as well (replacing 6 by 6 ( ~ so ) )that for every k (> 0) under { K ( N ) with } N
large enough and Q p.d. matrix, ( N 6 k Q 6 ~ and ) ~ " ( 6 ~- €J,,))'Q(6, 8 ( N ) ) I k are integrable uniformly in 6 belonging to a compact set c*.Hence, by (11.3.21), we obtain under {K")} uniformly in 6 E C , as N -+ co,
(1) L,'
+VN,
= N6',E-16~
Therefore, for
(2)
~ ; 1 -
E,
VN
4
0 in probability (or a s . )
> 0 under { K ( N ) }uniformly in 6 E C*,
( ~ e k ~ - l-16 4 ~P o)
as N
co.
Finally, under { K ( N ) by } (11.3.21),uniformly in 6 E C* (3)
Z N ~ ( E - ' ~ ~ ~as, IN ~ ) oo ,
N I / ~ E : - ~ ~ ~ ~ N
--f
(1 1.3.30)
(see also Huskova, 1971). As a result, we have the following theorem: Theorem 4. Under { K ( N ) and } regularity conditions given above as N 00, the following holds:
-+
(i) Unrestricted estimator
bI(6,) = 0 and B l ( 6 ~=) 0;
M1(8N)= Z: and Rl(6,; Q) = tr(QX)
=p
if Q = X-'.
(11.3.31a)
(ii) PTE PT b2(hN
) = - 6 H p + 2 ( x ; ( a ) ; A') and B z ( e y ) = A2{Hp+~(X;(a);Az)}2;
M 2 ( 6 3 = X{ 1 - Hp+2(x;(o); A"}
+ as'{ 2Hp+2(x;(4; A2) - Hp+4(x;(4;A"}
11.3. Nonparametric Methods: R-estimation
527
and A’)} Rz(6Y; 9 )= tr(QE){ 1 - Hp+2(x;(a);
+ 6’Q6{2Hp+z(~;(~); A’) (iii) SE
Proof.
Consider the expression
- Hp+4(x:(a);
A’)}.
(11.3.31b)
528
Chapter 11. Multivariate Models
where
~ ( L N=) I ( L N < L N , ~ )and L N , 4 ~ xi(a)as N = ( p - 2 ) L i 1 1 ( L2~E N ) = ( p - 2)Lk1 -1/2
and
EN --+
-+
0 as N
co
-+
00
+ { 1 - ( p - 2 ) L k 1 } I ( L<~p - 2 ) .
(11.3.33)
-
Let Z = f l X N (8, - 8 ( N ) ) . Then Z 2 N p ( 0 , 1 p ) and , N6’,*,’6, jjZ 6*jj2,where 6* = C-1/26. Hence, under { K ( N ) } ,
+
&(e;
z - (z + S * ) S ( ~ I+Z s*ii2)+ O P ( ~ ) .
- 8“)) =
rz
(11.3.34)
Direct calculation yields ADB : b(8;) = -6E[g(x;+2(A2))] ADMSE : M G ) =
- ~ { x ; + , c A ’ ) )I E[S2(X;+z(AZ))]}
+a&’{
2E [dx;,2(A2),>l- 2E [s(x;+z(A”,>]
+ E[S2( x i + 4 ( A 2 ) ) l }
(11.3.35)
3
and ADQR : R(8;;Q) = tr(QM(8;)). Then, specializing for g ( L N ) , we obtain the expressions in Theorem 2. For example, if ~ ( L N=)I ( L , < L N , * ) , we have E[I(x;+2,.(A2)L x;(a))]= Hp+2r(x;(a);A2); T = 1,2. Then we obtain
b3(6y) = -6HP+z(xi(a); A2),
W(6y)= X{ 1
- H p + 2 ( x ; ( a ) ;A ’ ) }
+ 6 q 2 H P + 2 ( x ; b ) ;A 2 )- H p + 4 ( x 3 4 ; A ” } , and
11.3.2
Asymptotic Properties Confidence Sets
Theorem 1 of Section 11.3.2 shows that under fixed alternatives 8 N-CQ lim
P(Nl16
- 8;)1&1
5 x ; ( y ) } = 1 - y.
Thus. we consider the local alternatives
# 0, (11.3.37)
11.3. Nonparametric Methods: R-estimation
529
a compact set in R P . Accordingly, as before, where 6 E C*,
. -1/2
fiki1”(6k - e ( N ) ) - fix:, Since k~
-+
fikk1I2eN
X as N
-+
co and
.
..- 1 / 2 -
-
(ON - e ( N ) ) = f i x ,
fixN
-1/2
-
(ON - t ? ( N ) )
N
..-1/2 -
and
2 en ljz + 6 *112 =nxP(a2), a2= 6
--1-
(11.3.39)
Np(O,Ip), we obtain
(6, - 6 ( N ) ) + ki1/263 z + 6*, 6* = x-’l26, (11.3.40a)
=fiEN
N~’,x=,
@Ng(LN).
2,
~
6 (11.3.40b) .
z - (Z + s*)g(lIz+ 6*112),
(11.3.41)
-+
~
Hence, f i ~ i ’ / ~ ( e ( N-)
as N
-+
ek)
-+
co, and the coverage probability of the confidence set 2
( 0 :N l l e ( N ) - e N l / e ; 1
i x:(y)} is given by
C.(t?h)=
The expression above can be rewritten as = -6*
+ (1 - C l l Z + 6*ll-2)I(jjz + 6*112 > c ) ( Z + a*>.
(11.3.44)
Chapter 11. Multivariate Models
530
Hence, under K ( N ): 8") = N-'I26, we have
= Pa*{
p*- (1 - cjjz + 6*lj-2)I(jlz + b*j12 > c)(Z + s')112 < X2,(T)}
= Hp(c; A
2 ) W < xg(Y))
+Q*{ jlZ - c(Z + 6*jl-2112 < x;4(r);jjz + 6*jj2 > c},
(11.3.45)
which is the same as (4.8.32) in Theorem 4.8.7. The analysis of the confidence sets is the same as in Chapter 4 and therefore is not repeated.
11.4 Simple Multivariate Linear Regression Model In this section, we consider a simple multivariate linear regression model that is an extension of the simple linear model of Chapter 3 in a multivariate setup. In this model, one sample {(XI,Y I ) .,. . ,(ZN, Y N ) }of size N is drawn from the pvariate normal distributions {Np(8 Pz,, X ) I a = 1 , 2 , . - . ,N } , where 8 = (61,. . . ,6,)' and ,B = (PI,. . . ,Pp)' are the intercept and slope vectors and X is the unknown covariance matrix. The main object of this section is the study of the properties of several estimators of 8. As a special case, we study the two-sample problem involving two pvariate normal distributions. For more details, the reader is referred t o Ahmed and Saleh (1990, 1999) and Sen and Saleh (1979).
+
11.4.1 Model, Estimation and Tests Model. Let Y,
=8
+ pz, + E,,
Q
= 1,2,'. . , N ,
(11.4.1)
where Y , = (Y,l,-.- ,Yap)' is the response pvector, z, is a scalar, E = (€,I, '. . ,cap)' is the pvector of errors, 8 = (61,.. , 6,)' is the intercept vector, and P = (PI,... ,&)' is the slope-vector of the model. We assume E, Np(O,E). Our problem is to estimate 8 when it is suspected that p may be null, meaning, 0 without loss of generality.
-
Estimation. Using LSE/MLE methods, we obtain the unrestricted estimators of p and 8 as
11.4. Simple Multivariate Linear Regression Model
h(1"~)~
531
with x = ( X I , - .,.X N ) ' and Z N = 51"~. If where Q N = x'x p = 0, then the restricted estimator of 8 is given by -
8N = Y N =
-1k N1
[ ' 1.
(11.4.3)
Y N
Test of Hypothesis. For testing the null-hypothesis Ho : p = 0 against H A : # 0 , we use the likelihood ratio statistic given by (11.4.4) where N
Sa = (N-2)Se = C((Ya-Y~)-PN(z, -ZN)}{
(Y,- V N ) - ~ Z ~ N ( Z , - Z N ) } ' .
ff=l
(11.4.5) It can be shown that the likelihood ratio statistic is equivalent to the Hotelling's T2-statistic, given by
&-G-';
(11.4.6)
(N -21-1~; = QN&S;'P~
T$ as the test-statistic equivalent to 5";. We define C N = Under H A , C N follows the noncentral F-distribution with (p,m) d.f. ( m= N - p - 1) and noncentrality parameter A2/2 with A2 = &~p'I=-'p.
(11.4.7)
11.4.2 Preliminary Test and Stein-Type Estimators Let F p , m ( ~be) the cr-level critical value of the CN-statistic under Ho. Then we define the preliminary test and the Stein-type estimators of p as follows: A
PT
PN
=P N -PNwN
6; = P N and
-s+
PN
-
d P N c i1i
-s
=p N -pN(1
<~p,m(~)),
(11.4.8a)
d-
(11.4.8b)
(p-2)m
p(m+2)
- d c ~ ' ) I ( L N< d ) .
( 11.4.8~)
Similarly, the corresponding estimators of 6 are given by -PT
ON
-
=y N I ( c N
-PNzN)I(cN
+PNzNI(LN < F ~ , ~ ( Q ) ) , 8; = PN + d P N z N c , l , -s+ 8 N = O N + (8, - 6 ~ ) ( 1dCi1)I(CN >d) = 8; + P N S N { 1 - (1 - d c G 1 ) I ( c N < d ) } . = 3,
and
< Fp,m(a)) + ( Y N
> Fp,m(a)) (11.4.9~~) (11.4.9b)
(11.4.9~)
532
Chapter 1 1 . Multivariate Models
11.4.3 Bias, Quadratic Bias, MSE Matrices, and Risk Expressions of the Estimators This section contains the bias, quadratic bias, MSE matrices, and weighted risk expressions of the estimators of p and 6 following the definitions given in the introduction of Section (11.1.2).
Theorem 1. The bias, quadratic bias, MSE matrix, and weighted risk expressions of the estimator of ,f3 and 8 are given as follows: (i) Unrestricted estimator and = 0.
&(PN)
bl(PN) =0
1 M I @ , ) = ---C QN
bl(8N) =0 M I ( ~ N )=
and
and
&(a,)
(+ + %)E.
R 1 ( P N ; Q) =
= 0.
1 -tr[QE] QN
P
= - if Q = C-’. QN
(11.4.10a)
11.4. Simple Multivariate Linear Regression Model
and
533
534
Chapter 11. Multivariate Models
11.4. Simple Multivariate Linear Regression Model
535
--
11.4.4 Two-Sample Problem and Estimation of the Means Consider the model (11.4.1). Set x = ( 0,. . . ,0 , 1, .. . , 1 )'. Then we have N~ -zero's
N2-one's
respectively. Further, let
p 1 = 6 and p Z = O + P
(11.4.16)
536
Chapter 11. Multivariate Models
so that p2 - p1 = 0. Clearly, the unrestricted estimators of p l and p2 are = Y 2 . AS a N1 Y, = Y 1 and ji2 = N given by f i l = result, the estimator of the difference of the means is given by
c,=1
&Ca=Nl+lYa
& - p1
= Y2 - Y1.
( 11.4.17)
The test-statistic for testing Ho : p1 = p2 against H A : p1 # p2 follows as Hotelling’s T2-test, defined by =
(N1+N2 - 2)s=
N1N2 (Y2- Y1)/s-1(Y2 -Y1). N1+ N2
C(Y,Y1)(Y, N1
-
-Y1)’
a=l
+
N
C
a=N1+1
(11.4.18)
(Y,- Y2)(Ya -Y2)’. (11.4.19)
Now, under Ho, the pooled estimator of p1 is given by
(11.4.20)
respectively. The bias expressions of the five estimators are given by
bl(@,) = 0 and B1(&) = 0,
B4(bLS) =
N2d2p2 Nl(Nl +Nz)
(11.4.22a)
N2
)
-l{
E[x;:~(A~)]}~. (11.4.22d)
11.4. Simple 2lIultivariate Linear Regression Model
537 Finally,
Next, we consider MSE matrices and the weighted risk expressions of the five estimators of pl. They are given by Mi(fi1)
=
k X and R ~ ( f i i ; Q )= & t r [ Q x ] ,
M2(Gl) = + ( I and R2(fil; Q) =
(11.4.23a)
+ %A2)X, (1
+ %A2)tr[QX],
(11.4.23b)
538
Chapter 11. Multivariate Models
11.4.5
Confidence Sets for the Slope and Intercept Parameters
P based on the four estimators,
Now, consider the confidence sets for -s
APT
PN
i
PNi
and
1
S+
pN,
PN
c*(P;)= { P : QNIJP- P;V/I& I x;(Y)} where
P;\r = /?N (1 - g(L*,)), 1 (L;V I Fp,m((.)) (P- W k l and > (p - 2 ) L > 4 + (1 - (p - 2)L;V41(&*, < (p - 2))
d q v ) = 0,
1
and L*, = Q N @ ~ C - ' ~ Nwhere , E is known. Similarly, the confidence sets based on the five estimators of 6 can be written as
where
e;
= 8,
+ (8,
-e N > g ( ~ k ) .
The coverage probabilities may be calculated as in Chapter 5 for the parallelism problem.
11.5. R-estimation and Confidence Sets for Simple Multivariate hdodel 539
11.5
R-estimation and Confidence Sets for the Simple Multivariate Linear Model
11.5.1
Introduction
Let {X, = ( X , l , . . . , XZp)’, i 2 l} be a sequence of independent random vectors with p ( 2 1)-variant distribution functions {F?, z 2 l}, where for every z 2 1,
F,(Y)= F(Y
-6 -
OX,),Y E RP . . ,Pp)’ are the unknown
(11.5.1)
where 6 = (81,. . . ,Op)’ and p = (/?I,. intercept and slope parameters, respectively, and z,(z 2 1) are sequences of known constants. As in Section 11.14, we are interested in the R-estimation of 6 and p when it is suspected that /3 may be null vector ( 6 ) based on the rank method. Let R,,(a,b) and R t ( a ,b) be the rank of yZ, - a - bx, (or /l’iJ - a - bz,l) among Y1, - a - bz,, . . . ,Yn, - a - bx, (or IY1, - a - bzl 1,. . . , lYnl - a - bz,) for z = 1 , . . . ,n; J = 1 , . . . , p , where a , b are real numbers. Then define the LRSs: Tn(a,b)= ( T n l ( a l , b l ) , . .
,Tnp(ap,bp))’
and Ln(b) = (Lnl(h), . .. , L n p ( b p ) ) ’ , (11.5.2)
where n
T ~ , ( u , ~=) n-’ C ~ X ( ~ : ( a , b ) s i g n ( ~ ,-, a - bz,) 2=1
n
and
Lnj(b) = n-l
C(xz - zn)an(RzJ(b)),
( 11.5.3)
i=l
respectively, with scores defined by a,’ (k)= E#j(Unk)
(11.5.4)
a n ( k ) = E#:(Unk)
for i = 1,. . . , n and u n k is the order statistic with rank k among n independent samples from U(0,l). The score functions #j = {#j(u);0 < ZL < 1) is absolutely continuous, nondecreasing skew-symmetric, meaning q 5 j ( ~ ~+ ) #j(l - Z L ) = 0 V ZL E ( 0 , l ) and are square-integrable inside (0,l) whereas = #j u E (0, l), j = 1,... , p . Assume that
#T(ZL)
(y),
c(zi n
Qn =
- Z,)
and
i=l
lim Z n = 3: (121 < m)
n-ca
lim n-’Q, = Q’
,-a,
(11.5.5)
540
Chapter 11. Multivariate Models
both exist. F E 3,where 3 is the class of all pvariate continuous distribution functions that are diagonally symmetric about 0 having a finite Fisher information matrix. (11.5.6) Note that Ti,j(a,b ) \ a for fixed b and Lnj(a,b) \ in b independent of a for every ;I = 1,. . . ,p . Also, under HO : 8 = P = 0 , [TL(O,0 ) ,Lk(O)]’is symmetrically distributed around ( 0 , O ) . Hence, as in Sen and Saleh (1979), P’)’ as follows: define the unrestricted estimators of (O’, and
a n = ( & , . - . ,ap)’
Under Ho :
P n = ( P 1 , . . * ,Pp>’t
= 0, the restricted estimator of B is given by
en = ( 6 1 , . . . ,ep)’,
+
(11.5.7)
where 6, = ${ sup[a, : Tn,(a,;O)> 01 inf[a, : Tn,(a,;O)< 01) since the restricted estimator of is 0 . Then enis a translation-invariant, robust, and consistent estimator of 8 when P = 0, while as well as are similar types of R-estimators when P # 0. In order to define the rest of the modified R-estimators, we consider first the test of hypothesis HO : P = 0 based on the rank-test Cn given by
an
Ln
=
an
[nLL(O)M,Ln(o)]Qn,
(11.5.8)
where
Mn = ((m,e)), m,e = n
c,=l
-1
n
C[an,(Rz,) - ajl[ane(&e)
-
at],
(11.5.9)
2=1
n
an,(i); j , L! = 1 , . . . , p , and M, is the generalized inverse with ii, = n-l of M,. Now, conditionally, C, is a distribution-free statistic, and for large n, Cn is approximately distributed as a central chi-square with p d.f. when M, is of full rank. Let Ln,a be a upper a-level critical value from the distribution of C,, and then Ln,@-+ $ ( a ) as n -+ co. Based on this information, we define the three estimators of P given by ,. PT
Pn
-s
0,
= P n - PnI(Ln < C n p )
=
Pn - CP,L,~I(C~> E ~ ) ,
E,
-+
o as n -+
co; c = p - 2, (11.5.10)
Similarly, we have four additional estimators of 8 as follows:
11.5. R-estimation and Confidence Sets for Simple Multivariate Model 541
6 , the restricted estimator of 8 when P = 0. . PT
= 8, -
8,
-s -
(en e , ) q ~ ,< L,,J -
8, = 8, - c(6, - 6,)L;'I(Ln
and -s+ 8, = 8, n
2 E,),
E,
-+ 0 as n --f
00
and c = ( p - 2),
+ (1 - C L ; ~ ) I ( . C>, c)(en-&I,
(11.5.11)
respectively.
11.5.2
Asymptotic Properties of the R-estimators
Let 4,l and Fjel be the marginal and the joint-distribution of the j t h and (j,!)th variable corresponding to F in (11.5.1) as described in Section 11.3.1 and consider the matrices defined by A and E as in (11.3.9). We assume that they are of full rank and M, is a consistent estimator of E. Then, one may show that under fixed alternative P # 0, the following results hold
JnM,- l / 2 ( j I n &M;'/'(fIZ and
PT
-P) =
-P) =
fiM,1/2(/?lZ+
J;EE-'/2(p, - P ) + o p ( l ) ,
@-1/2(p,
-P) =
-
P ) + op(l),
&E-1/2(pn - 0) + op(1).
(11.5.12)
Thus, we consider the class of contiguous alternatives
K(,) : P(,) = n- 12 6 , 6 = (61,. . . ,bp)'.
(11.5.13)
Now, from the basic theorems of Chapter 8 (Puri and, Sen 1985), it follows that under 8 = P = 0 and (11.5.2)
(11.5.14) Further, from theorems of Chapter 6 (Puri and Sen, 1985), we have the linearity results, where k (0 < k < 00):
+
l~,,(n-1/2(a,,b , ) ) - T,,(o,o) n'/2(a, sup < k2
la31
Ib,/
+ b , ~ ) y , j50 as n
+
m,
< kl
(11.5.15)
so
f' where y, = 1 +J(u)4,(zi)duand +,(a) = -LLfor j = 1 , . . . , p . Then we fb1 ( u ) have the following theorems based on (11.5.14) and (11.5.15).
542
Chapter 11. Multivariate Models
Theorem 1. Under the assumed regularity conditions given in (11.5.5) through (11.5.15), as n + 00, we have
and when p
= 0,
h(6, - 6 ) 5 Np(O,E).
( 11.5.16b)
Theorem 2. Under {K(,)} and assumed regularity conditions, as n
r*Q*
( %<,Fay) )
Nzp
---f
00
(11.5.17a)
{ ( 7fi* ) E ;
8 Diag(l,Q*)}
(11.5.17b)
Proof. Under 6 = P = 0 and using componentwise linearity results, we have as n --+ 00, h L ( 0 )=
vhb,Q*+ O P ( ~ )
and
&T,(O, 0) = u&(6, so that u h e , = &T,(O,O)
-
+ pnz)+ op(1)
(11.5.18)
+ op(1).
(11.5.19)
z
----hL,(O)
Q
Hence, utilizing contiguity of probability measures to those under H,* : 6 P = 0 , we obtain under {K(,)} as n + 00 the joint-distribution of (fiT:,(O,O), fiL:,(0)lf as
=
Further, since
~ 6 6 = ,&T,(O,O) the proof of (11.5.16b) is complete.
-I-o p ( l ) ,
11.5. R-estimation and Confidence Sets for Simple Multivariate Model Corollary 1. The marginal distributions of fi(6, under {K(,)} as n -+ 00 are
&(en - 6 ) and
-
&(en
- 6 ) and
fi(8,
-
- 8 ) and f i ( 6 n - 8 )
Np(S%;I:)
&(6, - 6,) Also,
543
(11.5.21)
Np
6,) are mutually independent.
By the corollary we find the ADB, ADMSE, and ADQR of the estimators of 8 and 0 as given below. (i) Unrestricted estimators
bl(P,) = 0 and B1(P,) = 0;
bl(6,) = 0 and 1
&(a,) = 0;
and Ml(8,) = (1 + P 2 ) I:;
MI@,) = -I:
Q’
3
1 Rl(6,; W) = -tr(WI:) and Rl(8,; W) = 1 + - tr(WI:). (11.5.22)
(
Q’
(ii) Restricted estimators
bz(b,) = -6 and b2(6,) = 6% and M2(/?,) = 66’
B2(/?,) =
A’;
&(en) = ?’A2;
and Mz(6,) = I:
+ P’66’;
R,(p,; W) = 6’W6 and R2(bn;W ) = tr(WI:)
+ Z’(d’W6).
(11.5.23)
544
Chapter 11. Multivariate Models
11.6. Conclusions
R5
11.6
(6:’;W)
545
=
Conclusions
In this chapter, we first discussed the estimation of the mean vector of an MND when the covariance matrix is unknown, extending the results of Chapter 4. We covered the estimation of functionals of distribution based on U statistics to enlarge the scope of nonnormal multivariate distributions as well as R-estimation methodology to include robustness properties of the location parameter estimators. In addition, we considered the simple multivariate linear model and discussed the various estimators of location and regression parameters with MND errors and arbitrary multivariate errors with symmetric marginals. In all cases, we included confidence set estimations with their properties.
Problems 1. Let Y1,... , Y Nbe a sample of size N
11.7
from the MND, “ ( 8 , E). The problem is the estimation of 8 when it is suspected that Ho : 8 = 801, where 80 is a scalar and unknown. (a) Show that the point estimator of 8 under H0 is the restricted estimator en defined by 6N
= lp(lbS-’lp)-’lbS-’e,;
8,
= yhl,
546
Chapter 11. Multivariate h4odels where S is the unbiased estimate of
X.
(b) Show that the LR-statistic for testing HO : 6 = 801, against H A : 6 # 801, is given by
LN
=
m
N8h(S-'
n ( P - 1)
-s-'~p~-'lp)-'~~s-')6N,
where n = ( N - 1) and m = N - p . (c) Show that (i) H = SC(C'SC)-'C', where C'l, = 0 is an unbiased estimate of H = XC(C'XC)-'C', (ii) E(Hx:-~H)=
[el c(c'xc)-~c'.
(d) Show that a class of empirical Bayes estimators of 6 is
6EB = 8 N
+ 11 - g ( L C , ) ] ( a N - 8 N ) ,
where g ( L N ) = 0, 1, ~ ( L N < F p - l , m ( ~ ) )cL;', , and 1 - (1 -
CLb1)I(LN >
..S+
and 6 ,
p-3 m
c ) , with c = ( p ! l ) ( i + z ) yielding
,a,
$N,
APT - S
6,
, 6,,
, respectively.
[fix2+;r*)]},
2. Show that the bias vector and MSE matrix of 6;
g ( L N ) are b(6;) = -HOE ( 9
=
6,
- (6, - ~ N ) X
H = XC(C'XC)-W
as defined in Problem 1 (part b)
M(6k) = X - (EC(C'XC)-'C'E)
where A2 = Ne'(X-' - E - ' l ~ ( l ~ ~ - 1 1 p ) - ' ~ ~ E - ' ) 6 .
3. Write down the bias vector, MSE matrix, and the quadratic risks of the five estimators explicitly for each estimator of 6 .
547
11.7. Problems
4. Apply the U-statistic approach to the problem of estimating 8 under I9 = Bol,, and derive its asymptotic MSE matrix.
5. Apply the R-estimation approach to the problem of estimating 0 under 8 = &lP, and derive its asymptotic MSE matrix.
6. Prove Theorem 1. 7. Prove Theorem 2. 8. Prove Theorem 3. 9. Prove Theorem 4. 10. Prove
APT - S
11. Compare 8,
, 8,,
and
8:
based on the MSE matrix criterion.
This Page Intentionally Left Blank
Chapter 12
Discrete Data Models Outline 12.1 Product of Bernoulli Models, Estimation, and Test of Hypothesis 12.2 Product of Binomial Distributions, Estimation, and Test of Hypothesis: Homogeneity of odds ratios 12.3 Product of Multinomial Distributions, Estimation, and Test of
Hypothesis 12.4 Conclusions 12.5 Problems
In this chapter, we consider several important discrete models that appear frequently in many applied areas, along with the related estimation strategies. We choose the following models:
Product Bernoulli models. Baseball data analysis Product Binomial models. Pooling of odds ratios or meta-analysis Product Multinomial Models. T x c contingency tables We began Chapter 1 with the illustration of the Stein-type estimation methodology using the baseball data discussed by Efron and Morris (1975). In this chapter, we provide the related theoretical basis towards the Stein strategy based on the Bernoulli models. Next, as a natural extension, we consider the product binomial models to discuss the pooling of odds ratios. The pooling of odds ratios arises in many epidemiological and pharmaceutical studies. The problem occurs explicitly in combining several 2 x 2 tables for metaanalysis. As a further extension of our discussions, we investigate the product multinomial models to consider the estimation of cell probabilities in a twoway contingency table when it is plausible that the table has independent structure relating to the two traits.
549
550
Chapter 12. Discrete Data Models
In Section 12.1, we consider the product Bernoulli model with differing probability of success and the related problem of estimation of the probabilities when it is suspected that they are equal. These results provide the basis of baseball data analysis as discussed by Efron and Morris (1975). In many situations, statistical data are stratified, as in meta-analysis, into several 2 x 2 tables in order to control for the confounding factors. Such instances occur in many case-control studies, where the primary goal is to access the disease association by means of estimating the common odds ratio. It is often uncertain that odds ratios of several tables are homogeneous across strata. Then the object is to estimate the odds ratios when homogeneity is uncertain. In Section 12.2, we consider the problem of estimating odds ratio when we suspect homogeneity of several 2 x 2 tables. In many survey studies and in epidemiological surveys, data are often presented in a two-way classification leading to r x c contingency tables according to two traits. One of many statistical problems that can arise is the estimation of the cell probabilities when it is plausible that the table might have independent structure relating to the two traits. We will discuss this problem in Section 12.3 using a product of multinomial models. For some early developments on discrete data, see Kale and Bancroft (1967).
12.1
Product of Bernoulli Models, Estimation, and Test of Hypothesis
12.1.1
Model, Estimation, and Test
Let us consider the “baseball data” given in Table 1.1.1 of Chapter 1. The formulation of its analysis is based on several Bernoulli models as follows: Consider a team consisting of p players. Let (zil,zi2,-.. ,xin,)’be the result of hit or miss of a sequence of ni battings of the ith player of the team. Let 6 = (01,. . . ,0,)’ be the vector of the batting averages for the team of p players. Now, consider the outcome x = ( ~ 1 1 ,... ,qnl ; . . . ;x p l , .. . ,zpn,)’of n(= n1 n2 . n,) battings of the p players in a game. The likelihood function for the parameter 8 = (01,. . . ,6,)’ given x is
+ +. +
D
n,
i=l j=1
(12.1.1)
since the performances of the players are independent of each other. Then the batting averages of the p players can be written as
( 12.1.2) If the players are assumed t o be equally good, we expect the vector of the batting averages to be 8 = &l,, 1, = (1,. . . , 1)’ due to strict selection of
12.1. Product of Bernoulli Models
551
the quality of the players in the team. In this case, the likelihood function for 8 = 601,~based on Y = (y1, . . . ,y,)', is given by
L(fjOjx)= 0?:=1 y'(l- ~,)"-C:=IYz,
(12.1.3)
and estimated common batting average of the players is
-
6 , = 6 o n l p , 1, = (1;'. ,1)' and 8,
=
11"-
pp 8, 7 n
(12.1.4)
where N = Diag(n1,. . . ,n,). Thus, we obtain two estimates of 8, one where the players are different and the other where they are similar. If we are not sure that 8 = &l,, then it is natural t o determine any deviation statistically. Thus, we test the null hypothesis
Ho : 8 = 601, versus H A : 8 # 801,
( 12.1.5)
based on the statistics of departure from the null hypothesis:
D , = n(6,
-
..-I
-
..
6 , ) ' ~(8,~ - en),
(12.1.6)
where
( 12.1.7) It is easy to prove that for large n, D, closely follows a central chi-square distribution with ( p - 1) d.f. Thus, for a given level of significance, a(0 < a < l),the critical value D,,a may be approximated by ~ ; - ~ ( athe ) ,upper a-level value from the central chi-square distribution with ( p - 1) d.f. This allows us to define the PTE of 8 as . PT
en
=
e,
-
(e, - ~ , ) I ( D ,<
( 12.1.8)
Similarly, we can define the Stein-type estimator (SE) of 8 as
-s en = 8,
- d(6,
-6
, ) ~ ; ~d ~= ( p - 3) (> 0).
(12.1.9)
Finally, the positive-rule Stein-type estimator (PRSE) of 8 is given by ., S+
8,
=
6 , + (1 - do;l)r(o,> d ) ( e , - 6,) -s
= 8,
- (1 - ~ D ; ~ ) I ( D<,d ) ( e , - en),
(12.1.10)
(see further Ali and Saleh, 1991a). In order to assess the batting average of the players of the team, we compare the five estimators using a quadratic loss function
q e ; ; e ) = n(e; - e)'w(e;- el, APT -S
-
S+
(12.1.11)
where 8: stands for 6,, On, 8, ,O n , and 8, . Before doing so, we present the Bayes and empirical Bayes estimators of 8.
552
Chapter 12. Discrete Data Models
12.1.2
Bayes and Empirical Bayes Estimation
In Section 12.1.1, we presented the five estimators of 8 = (el,-..,6JP)’. In this section, we consider the Bayes and the empirical Bayes method of estimating 8. For the Bayes estimation of 8,we first assume that 61,. . . ,6, are independently and identically distributed beta variables with parameter ( K q , K ( l - v)),where 0 < q < 1 and 0 < K < 00. Thus, the joint-distribution of 61, . . ,ep is given by the conjugate prior conditional on ( K ,77):
(I2.1.12) and the hyperparameters ( K , q )are both known. where B ( a , b ) = Hence, the joint-distribution of (8’,Y’)’ given (K,17) is obtained as
and the marginal distribution of Y = (yl, . . . ,yp)’ is given by
(12.1.13)
(12.1.14) where a ( b )= a ( a + l ) - . . ( a + b - l ) , (a(’) = I). Consequently, the posterior distribution of 8 given (Y’, K)’ is
n ( w , K )=
1
~ ( 8 , Y l ~ , ~ ) ~ l ( Y-t q)I-l&l, ~ , ~ ) ~ ~ (12.1.15) ( l
by the fact that E[qIY, K ] = Cml (Y1 K, q)$(q). Then, assigning the noninformative prior
( 12.1.16) +(?) = Id1 - 7l)l-l to q yields a mixture of a product of beta densities. To obtain the Bayes estimator of 8 = (61,.. . ,OP)’, we compute the posterior means ( 1 2.1.1 7) and the posterior variances as
V[6iIY] = E [Var(&lY,q)]
+ Var [E(eijY,q)]
(12.1.18)
553
12.1. Product of Bernoulli Models
respectively, for i = 1,. . . ,p . The Bayes estimators of 6 = ( B l , . . . ,Bp)’ depend on K and the conditional distribution of r] given Y. If K is known, we still have a problem of integration. Generally, K is unknown, then we take recourse to the empirical Bayes method due to Robbins (1956) and specifically due to Efron and Morris (1975). To better understand the properties of the Bayes estimators, we note that K E (0, co) and let K 00. Then we can show that the posterior asymptotic (K co) distribution of r] given Y is a beta distribution with parameters (nij,n - nij), where i j = yi (see Problem 5). Hence, for example, --f
cf,
--f
lim E[BiIY]= lim E[r]IY]= ij = 6n and
K-w
K-CC
lim Var[Bilg] = lirn Var[r]Iy] = -ij(1 . - 5) K-CC n+l On the other hand, if K -+ 0, then K-w
Yi ti(1 - 5 ) lim E[Bi/y]= - = Bi and Var[BiIy] = -. ni ni 1
+
K-0
( 12.1.19) (1 2.1.20)
(12.1.21)
Note that as K -+ co or 0, the posterior mean approaches ij and Bi, respectively, while the limit of variance of the estimators are similar (only n replaced by ni) in the two cases. Since K is unknown, we can estimate K based on the marginal distribution of Y = (y1, . . . ,yp)’. Also, we can develop a simple approximation, as in Albert and Gupta (1981), by using a Taylor expansion t o develop the normal posterior distribution of r]. Thus, following Albert and Gupta (1981), we find the empirical Bayes estimator of Bi to be
(1 2.1.22) and Var[BijY] 21 &(l- 8,) ni 1
+
Instead of &, rior mean
-
K
we estimate A(K) =
(5). (12.1.23)
& (6 = p-’ C;=’=, ni) by its poste-
h
W K ) = W(K)IYI
-
:s
X(K)
[s;
W ( Y I K ?r])[r](1- s)l-’dr]] d K I K
[so” s,’ ~l(YIK?r])[r]U - r])I-‘WK/K]
. (12.1.24)
The evaluation of these integrals are complicated except by numerical methods. However, we can show, as in Albert (1987), that the empirical Bayes estimators can be written as -EB
6,
=
an
-
(8,
- e,)g(Dn),
(12.1.25a)
Chapter 12. Discrete Data Models
554 where g( D,) = min [1,( p - 3) D;
'1
(12.1.25b)
(see Problem 8).
If we set g(D,), where D, is the test-statistic, then for g ( D n ) = 0, we have -EB
-EB
= 6,; for g ( D n )= 1, we have 8, = 6,; for g(D,) = I ( D , < x ~ - ~ ( c Y ) ) , APT -EB -S -EB we have 8, = 8, ; for g ( D n ) = dD;', we have On = 8,, d = p - 3; for
8,
-EB
g(Dn) = 1 - (1 - dD;')I(D, > d), we have 8,
-S+
= 8,
.
(12.1.26)
Thus, our estimators in Section 12.1.1 are approximate empirical Bayes estimators of 8 = (61, . . . ,OP)'.
12.1.3 Asymptotic Theory of the Estimators and the Test of Departure Let 6, be an estimator of 8,and let W be a positive-semidefinite matrix, and consider the quadratic loss equation function
L,(s,
: e ) = n(6, -
e)'w(s, e ) -
= n t r [ W ( 6 , - q ( 6 , - e)'].
( 12.1.27)
For the pdimensional cube R, let w be the subspace for which 8 = 601,. Further, the departure statistic D, provides a consistent test as such for fixed 00 as n -.+ 00. As a consequence, under alternatives, 86 = 801, 6, D ,
+
---f
fixed alternatives the asymptotic distributions of
..S+
&(enPT - O ) , fi(8,- S - 8 ) A
and &(On - 8 ) are equivalent to the asymptotic distribution of while the asymptotic distribution of &(en - 8)is degenerate as n show these findings in the following: . PT
(i) PTE, 8, have
. Consider
-O ) , 00.
We
the fixed alternatives 86 as given above. Then we
..PT APT n(e, 6,)'C;l(e, - 6,) -
= n(6, - en)'C,1(6n - 6 , ) I ( D n < X;-'((Y)) = D,I(D,
< x;-l(c.r)) i X2p - l ( 4 w L < X;-'(Q)).
(12.1.28) PT
By consistency of D,, I ( D , < ~ ; - ~ ( ( ~ ) -.+ [ 8 06as ) n M. Hence, &(On 8) = - 8) o,(l) as n -+ DC) and the asymptotic distribution of
&(an
&(6ET
A
+
- 8 ) is equivalent to that of -S
---f
&(en
- 8).
-s
-
-s
(ii) SE, 8,. We note that the quadratic form n(8, - 8,)'!Z;1(8, - 6,) = -s d2D;'. Therefore, on the set {D, = 0}, we have 8, = 8, = &,,Ip. On the other hand, if we know that E[D,'] + 0 as n -+ 03, then we obtain the desired result. Lemma 1 below relates t o this assertion.
12.1. Product of Bernoulli Models
555
Lemma 1. (Ali and Saleh 1991). EIDnl(O < D , < e)] -+ 0 as n Proof. Define k , = m a x l < j l h ( 8 , -&,I. -s
---f
m.
(12.1.29)
Then D, 2 ;ki. If kn = 0, then
D, = 0 and 6 , = 6, = &,lPholds. So we consider the case where k,E > 0 and kn are less than 1 with probability 1. In any case, we may set k, > 1 with probability 1. Thus, on the set {D, > 0}, we write DL'I(0 < D , 5
E)
= DLlI
( < D , < -3+ DL'I (%- < D , < ) . 0
e
(12.1.30)
Let ko,
=+ h e .
Then
E,9 [D)nll(O< D, 5 E ) ] n 1 -(J?3(3[kL21(< kn < I)] + 8&1[k,~1(15 kn 5 ken)]). (12.1.31) 4 n Replacing k, by the largest integer [k,], we find
E6[kL21(1< kn < kOn)] 5 E6[kL21(15
c
Ikon 1
5
j-"Pe{[kn]
j= I
[ken I [kn]5
kOn)]=
5 k } + [kOn]-2p6{[kn] 5 [ken]}.
j=1
j-1p6{[kn] = j } (12.1.32)
Now the d.f. for D, is p - 1. Hence, there are kp-' configurations for which k , 5 k . For each configuration the order of probability is O(n-(p-')/2). Thus,
Pe{k, 5 k } = O ( n - ( P - 1 ) / 2 )
(12.1.33)
for every k 5 ko,. Hence, by (12.1. 30) and (12.1.31),
-
o ( J P - ~ for ) / ~p )2 4.
(12.1.34)
Since c is small, (12.1.34) can be made small. For the first term, set [nk]= Icon so that on < k , < 1, we obtain 1 5 k; 5 n. Thus,
(12.1.35)
556
Chapter 12. Discrete Data Models
By a similar argument, we can show that (12.2.35) is O ( r ~ - ( p - ~ ) / ’ )p, 2 4. Hence, we have proved that
E ~ { D ; ’ I ( o< D, < 6 ) )
-+
o
as n
+ 03.
(12.1.36)
Applying the lemma, we see that
( 12.1.37)
d 2 ~ [ ~ , 1 ] + 0as n + w .
Hence, we have
A(&: - e) = ~
( 8 -,e) + op(i).
(12.1.38)
AS+.
Similarly, we consider PRSE, 8, .
@,”+ - e)’x;l
= ~zD,’ - (2d - i ) ~ , (-i ~ D ; ~ ) I ( D <,d )
5 d’Di1
- d(2d - 1)(1- dD,l)I(D,
< d). (12.1.39)
By Lemma 1, we can show that as n
-+
00,
E[dZD,1 - d(2d - 1)(1- dD,l)I(D, < d ) ] 4 0.
(12.1.40)
Hence,
&(enS+ - e) = ,hi(&- e) + op(i). On the other hand, as n
(12.1.41)
+ 00,
&(en - e)Y(ooi,- e) = qo,
(12.1.42)
so
n(e, - e)’x;l(&,- qZco as
.+ co
(12.1.43)
and also
n(8, - 0)’(6,- 6) = Diag[ where X i =
- 81)
A1
’
. . . q 1 - 6,)
’
XP
(12.1.44)
2 ,(i = 1 , . .. , p ) . This result implies th; -
e)’x;l(e,- e)(<03)
is bounded. It further implies the following theorem:
Theorem 1. Under fixed alternatives
06
= 001,
+ 6,
(12.1.45)
12.I. Product of Bernoulli Models (i)
6,
557
has asymptotically unbounded risk,
APT - S
(ii) 8, ,8, and 6;’ bounded risk.
are all asymptotically equivalent to 6, and have
To compare the estimators approximately, we consider the sequence of local alternatives
K(,) : 8(,) = 601, + n-l/’< such that A’(
= 0,
(12.1.46)
where A = (Al,. . . , A,)’ t o arrive at the next theorem on asymptotic distributions of the various statistics.
Theorem 2. (Ali and Saleh 1991). Under {K(,)} and the assumed regularity conditions, as n -+ CO, the following hold: (i)
h(8,
-
eel,)
- N,(e, xo),xo
=
eo(i - 0dA-l
where A = Diag(A1,-.- ,Ap). (ii) h(6, - 6,) (iii)
N
h ( 6 , - 001,)
N,(J<,XOJ’), J = I,
- N,(O,
-
1,lLA.
B), B = &(l- 60)1,lL.
(vi) lim P { D , 5 z/K(,)} = H,-i(z : A2),A2 = 6’I=,’6, S = J(. n-o5
For a proof, see Problem 9.
(12.1.47)
Chapter 12. Discrete Data Models
558
12.1.4 ADB, ADQB, ADMSE, and ADQR of Estimators We can use Theorem 3 to obtain the ADB, ADQB, ADMSE, and ADQR of the estimators as follows:
6,
(i) UE,
bI(6,) = 0 and BI(6,) = 0;
M I ( & ) = EO and R l ( 6 , : Q) = tr[QEo].
b,
(ii) RE,
b2(On)= -JE = -6 and B2(6,) = A 2
M2(bn)= B
+ 6s’ and &(en
: Q) = tr[QB]
+ S’QS.
(iii) PTE, 6ET . PT
b3(On
) = - 6 H , + 1 ( ~ ~ - ~ ( a ) ; A ’ )and
B ~ ( X= ~A )~ { H(x;-, ~ + (~a ) ;
a2112
. PT
M3(0, ) = EO- &J’Hp+1(X;-l(a);
A’)
+SS’{2Hpt1(X%-i(a); A’) - Hpt3(x;-1(a); A’)} and
R3(h:T; Q) = ~ ~ [ Q E -o ~ ]~ [ Q E O J ’ ] H , + ~ ( X ; -A’) ~(~);
+ S’Q6{2Hpt1(X;--1(a); A2) - Hp+3(X;-i(a); A’)}. (iv) SE, 6; -S
b4(en) = - ( p - ~)SE[XL:~(A’)] and
-s
B4 (6,) =
( P - 3)’A’ {E[xi:1 (A’)]}’;
-S
M4(en)= EO- (P- ~ ) X O J ’ { ~ E [ X , S ~ ~ ( A( ’p)] ~)E[X,-;’~(A’)]} +(P - 3)(p
+ ~)SS’E[X;~~(A’)]and
R4(bf : Q) = tr[QEo] - ( p - 3) ~~[QEOJ’]{~E[X;;~(A’)] -(P-3)E[Xi:i(A2)II
+ ( P - 3)(p+ l)S’QSE[xi&(A’)].
(12.1.48)
12.1. Product of Bernoulli h4odels
559
12.1.5 Analysis of the Properties of Estimators In this section, we consider the ADQR analysis of the five estimators based on the results of the theorems below.
Theorem 3. Under Ho, the positive-semidefinite matrix, Q in the loss funcAS s+ tion, ADQR of estimators, en,8,-PT,en, and 8, can be ordered as follows:
en,
: Q)
L~
. Q) 5 Rd(6: : Q) 5 Rl(6, : Q),
(i)
RZ(&
(ii)
~ ’ ( e :,Q) L ~ ~ ( :6Q): I ~R ~ ( B:,Q), PT
(iii) R3(8,
: Q)
5 ( 8 ,
. S+
5 R5(8,
: Q)
-s
5 R4(6, . Q) 5 Ri(6n : Q),
where the ordering depends on the size of the level of significance, a for the
..PT
PTE, en . Proof.
Let 0; = 6, - (6, - k ) g ( O , ) ,
(12.1.49)
where g(D,) is a decreasing function of D,. Take the forms g ( D n ) = 0, l,I(Dn < ~ : - ~ ( a[)l)- ,( p - 3 ) D ; 1 ] , and 1 - [l- (p-3)O;’]1(Dn > p-3)
Chapter 12. Discrete Data Models
560
- s+ ,On and 8, , respectively. Under Ho,
APT - S
to obtain the estimators 6,,6,,8, ADQR of the estimators are
R(8; : Q) = tr[Q&] if 6; = 8, = trIQB] if 6: = 6 ,
+
= tr[QI=oI{l- Hptl(xE-i(a);O)} tr[QB]Hp+l(xg-i(a);O)
- PT
if 6; = 6, 2 P-3 - -tr[Q%] -tr[QB] P-1 P-1 2 - tr[QZo]{ -- (p - 3) P-1
-s
+
P-3 + tr[QB]{ + ( p - 3) P-1
if 6; = 6, P-3
11 - (P - 3 ) ~ - ' ] ~ d H ~ +O1) }( ~ ;
1p-3jl
0
- (p - 3 ) ~ - ~ ] ~ d H ~ + 0)} l(z;
. S+
if 6; = 8, . The risk-difference between
(12.1.50)
6,
and
6,
is
tr[Q(& - B)] > 0. Hence, Rz(6,; Q) <
&(a,;
Q).
The risk-difference between
6,
(12.1.51) (12.1.52a)
PT
and 6, is
tr[Q(Zo - B ) ] H ~ + I ( x ~ - ~ ( ~2) ;OO V)
(Y
E [O, I],
,.PT
and this implies R3(6, ;Q) 5 R1 (8,; Q). PT The risk-difference between 6 , and 8, is
(12.1.52b)
tr[Q(I=o - B)]{l - Hp+l(x~-l(a);O)}2 0 b'a E [O, 11,
- PT ; Q )
and this implies RZ(6,; Q) 5 &(On
(12.1.52~)
V a E [O,1]. Hence, we have
&(en; Q) 5 R3(6ET; Q) 5 Rl(8,; Q) under Ho. -s ,.PT The risk-difference between 8, and 8, is
%(6:;Q)
(12.1.52d)
@}, 1
- &(6ET;Q)=tr[Q(Co - B ) l ~ p t ~ ( x ~ - l ( a ) ;O)
P-
~ 2 4 .
PT
Hence, 6, has smaller ADQR when Hp+l(X;-l(a);O)
P-3
2 -; P-1
(12.1.52e)
12.1. Product of Bernoulli Models
56 1
-s
otherwise, 6 , has smaller ADQR,. Also,
2
Q) = & ( e n ; Q) + ---tr[Q(Co
R4(6:;
- B)] 2
P-1
Rz(an;Q).
Hence, R z ( k ; Q ) I fb(6:;Q). Thus, whenever Hp+l(x~--l(cy); 0) > -S
and 6 , under HO is given by
(12.1.52f)
s,
the order of ADQR of
- PT
-S
,
+ str[QB] =
-S
Also note that R(6,;Q) - Rl(8,;Q) = - s t r [ Q & ] - B)] < 0. Hence,
PT
( 12.1.52g)
Rz(6,; Q) I R3(6, ;Q) I R4(6,; Q). - dP-t r1[ Q ( &
en,en,6- ,
R4(6:; Q) 5 Rl(6,; Q), which implies that
- PT 6 , , and 6 , under Ho is the ADQR ordering of en,in, -S
&(an;Q) I R3(6:T;
Q) I R416,) I Ri(8,; Q>. >) provided that H p + l ( ~ ~ - l ( a ) ; O -S
5.
. S+
Now consider the ADQR comparison of 6 , ..s
Ho. First, compare 6 , and
-
6:'.
and
(12.1.53)
..s en, en,6 , , and 6..PT , under I
A
The risk-difference
S+
&(6:; Q) - R5(8, ; Q) P-3
[I - (P- ~ ) Z - ~ ] ~ ~ H ,0)+ I ( I ;
= ( p - 3)tr[Q(Eo - B)]
5 -tr[Q(& P-1
- B)] ( 2 0).
Hence, under Ho,
R5(6:'; Q) 5 R4(4,; Q) i S
PT
Next, consider the comparison of 6 , case is
&(6:')
and
&(en;Q).
6:'.
(12.1.54)
The risk-difference in this
- Rs(6FI'; Q) = tr[Q(Eo - B)l{Hp+i(~g-~(a); 0)
We know that 6 , is better than en whenever Hp+l(x&-I(cy);O) > Hence, the risk-difference above is nonnegative whenever PT
-S
s.
(12.1.56)
Chapter 12. Discrete Data Models
562
Since (12.2.53b) is equivalent to the density inequality
2 2 2hp+1(xp-1(4;0) 5 -+ a , P L 4 1 P-1 *
PT
we note that for a range of C Y , 6 , as given below:
A
s+
dominates 6 ,
(12.1.57)
and satisfies the ordering
. S+
. PT
Rz(6,; Q) i R3(6, ; Q) I R5(6, ; Q) I R4(6:; Q) I Ri(6,; Q).
(12.1.58)
. S+ -s The next theorem gives the dominance of 6 , and 8, over
6,
for all A’.
Theorem 4. Let p 2 4 and let the matrix Q in the loss function, L(6:; 6 ) = n(6: - 8)’Q(6: - 6)be positive-semi-definite, satisfying (12.1.59) -Sf
Then, under {K(,]}, the ADQR of 6 ,
,6- ,s and 6,
R s ( e + )I R4(6:; Q) I
&(a,;
can be ordered as
Q)
(12.1.60)
uniformly in A’. -S
Proof. The ADQR expression for 6 , can be written as -S
R4(6,;
Q) = tr[QXo] - ( P - 3)tr[QEoJ’]{
( p - ~)E[X;:~(A’)]
-S
The risk-difference R1 (8,; Q) - R4(6,; Q) is nonnegative for all Q , satisfyiilg (12.1.59), and for all A’. Hence, R4(62; Q) I & ( e n ; Q) b’ (A’, Q).
(12.1.62)
Similarly, the risk-difference AS+.
R5(8, =
1
Q) -
&(a:;
Q)
-trlQxoJ’lE[(1 - ( P - ~ ) X ~ ~ ~ ( A ’ ) ) ~ ~ ( < X p~ -+ 3)] ~(A’) f d’Qd{2E[(1-
-
( P - ~ ) X ~ ~ ~ ( A ’ ) ) I ( X ~<+p~ (A 3)]} ’)
E[(1- ( P - 3>X,-t23(A2))21(x~+3(A2)
-
3)].
563
12.1. Product of Bernoulli Models
Now, we consider the asymptotic distributional relative efficiency (ADRE) of the estimators compared to 8,. (i) AREle,; =
a,] = Ri(8,; Q)[R2(en;
(12.1.64)
Q)I-l
+
[tr[QB] 6'Q6]-'tr[QCo]
= p[l
+A2]]-'
= p(l
+ A2)-',
if Q = Ilk',
which is a decreasing function of A2. In general, [tr[QB]
+ A2CChmaS(QCa)]-'tl-[QCo]F ARE(6,; 8,)
I [tr[QBl+ A2Ch,i,(QCo)I-'tr[Q~:o1. APT
-
(ii) ARE[6, ;6,] = [I
+ SPT(CY,A')]-',
where SPT = - [tr(QEoJ')][tr(Q&)]-' H p + l
(x;-
+
( a ) A2) ; d'Qd[tr( Q&)l-'
x{2HP+l(x;-1(4;A2) - Hp+3(x;--1(4; A")> = (1 - ;)Hp+iJ~;--i(a);A~) +p-lA2{2HP+i(x;-1(cr);A2)
-Hp+3(xz-1((Y);A2)} if Q = X i 1 . -PT
-
The graph of ARE[6, ; 6,] as a function of A2 attains its maximum a t A2 = 0 for all CY (0 < a < 1) and decreases monotonically, crossing the 1-line. Table 12.1.1 gives the maximum and minimum values of the ARE together with the intersecting efficiencies of the PTE for a = 0.05(0.05)0.5 and p = 4(2)16. -s
-
(iii) ARE(6,; 6,) where
= [l
+ g,(A2)]-',
Chapter 12. Discrete Data Models
564
Table 12.1.1 gives some ARE-values of
-S en(&),6PT , (E3) and 6,(E4) and the
PT
-S
intersecting ARE-values denoted by Eaa for 6, , and 6 , and also the A:values where the intersection occurs. For example, for p = 6, (Y = 0.1, and A’ E . PT -S [0,1.1514], from the table it appears that 6 , dominates 6,, but outside this interval, domination is reversed. For A2 = 0, 6, has larger ARE (El = 6.000)
- PT (E3 = 4.2350) and en(E4 = 2.5000) relative t o 8,. -S
than 6 ,
12.1.6
Baseball Data Analysis
We can illustrate the theory above with baseball data as given in Efron and Morris (1975). For these data, Efron and Morris used the James-Stein (1961) estimator to predict the batting averages of 18 major league players in the remainder of the 1970 season. The data consist of the number of hits y in the first 45 bats observed for each player (i = 1, ... ,18). The problem is to estimate 6 = (61, . . . ,618)’, where OZ denotes the final season batting average of the ith player. Efron and Morris (1975) used the arc-sin transformation on each yz to obtain an approximate normal distribution with constant variance, and then used the James-Stein estimator on the transformed counts. EM This we denote by 6 , . Albert (1984) proposed an approximate empir*
EB
ical Bayes estimator, 8, -EM
. We present
- E B -PT
-S
the true batting average
(eT)to-
gether with 6,, 6 , , 6 , , 6 , , and 6,. The true batting average obtained from the Efron-Morris paper is 0.267 with standard deviation 0.037. To assess the performance of various estimators they used the loss defined by (6%- 67)2/(0.037)2 for each estimator. The true values ):6( and the estimated values of 8, based on Efron-Morris (EM), empirical Bayes (EB), and Ali and Saleh estimators, 6,PT,@,and 6,”’ are given in Table 12.1.2. The PTE is (0.265,. . . ,0.265)’ as a result of testing Ho : 61 = . . = 618 at the 15% level of significance. In Table 12.1.3 we present the loss ( L a _ L , E M ,L E B ,LPT, and 1
-s -s+ L s ) of the estimators, respectively. A comparison shows that 6 , / 6 , by our ,.E B . EB method is close to 6 , and 6 , for all i = 1 , e - a ,18, which supports the validity of the theory.
12. I. Product of Bernoulli Models
565
Table 12.1.1 Maximum Relative Efficiencies of the RMLE, PTE, and SE and the Intersection Efficiencies for the PTE and SE for each (Y with Corresponding A,-Values for pValues for a = 0.05(0.05)0.25 and p = 4(2)16
OJP 0.05
Ei
0.10
0.15
0.20
0.25
-425
4 4.0000 5.9971 1.5000 1.2167 2.0988 4.0000 3.5396 1.5000 1.2913 1.6932 4.0000 2.6415 1.5000 1.3199 1.3593 4.0000 2.1684 1.5000 1.3511 1.0462 4.0000 1.8729 1.5000 1.3873 .7340
6 6.0000 7.3761 2.5000 1.8128 1.8974 6.0000 4.2350 2.5000 1.9996 1.1514 6.0000 3.1003 2.5000 2.2163 ,5482 6.0000 2.5038 2.5000 2.4973 .0044 6.0000 2.1328 2.5000 2.1328 .OOOO
8 10 12 14 16 9.0000 10.0000 12.0000 14.0000 16.0000 8.3363 9.0666 9.6514 10.1362 10.5481 3.5000 4.5000 5.5000 6.5000 7.5000 3.8010 2.4102 3.0680 4.6266 5.5660 1.5048 1.3048 1.7020 1.1025 3983 9.0000 10.0000 12.0000 14.0000 16.0000 5.0648 5.3466 4.7092 5.5783 5.7739 3.5000 4.5000 5.5000 6.5000 7.5000 4.0761 2.9013 5.3466 5.5783 5.7739 .3048 .7081 .oooo .oooo .oooo 9.0000 10.0000 12.0000 14.0000 16.0000 3.8177 3.4085 3.6375 3.9652 4.0891 5.5000 3.5000 4.5000 6.5000 7.5000 3.6375 3.8177 3.4085 3.9652 4.0891 .oooo .oooo .oooo .0000 .oooo 9.0000 10.0000 12.0000 14.0000 16.0000 2.8925 3.0218 2.7274 3.2155 3.1272 3.5000 4.5000 5.5000 7.5000 6.5000 3.0218 2.7274 2.8925 3.1272 3.2155
.oooo
9.0000 2.3049 3.5000 2.3049 .OOOO
.oooo
10.0000 2.4314 4.5000 2.4314 .OOOO
.oooo
12.0000 2.5300 5.5000 2.5300 .OOOO
.oooo
14.0000 2.6101 6.5000 2.6101 .0000
.oooo
16.0000 2.6771 7.5000 2.6771 .OOOO
(Table 12.1.1 - 12.1.3 are due t o Ali and Saleh (1991), and reproduced with the permission of Statistica Sinica.)
Chapter 12. Discrete Data Models
566
Table 12.1.2 True Value (@) and Estimated Values of 64 Based on Efron-Morris (EM), Empirical B a y a (EB) and Ali and Saleh Estimators, 6PT, 62, and 6;' -
e -?p i 3__ --7 0.400 0.290 0.265 0.279 1 0.346 -2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 mean s.d.
~
0.300 0.279 0.223 0.276 0.273 0.266 0.211 0.271 0.232 0.266 0.258 0.306 0.267 0.228 0.288 0.318 0.200 0.267 0.037
I
~
0.378 0.356 0.333 0.31 1 0.311 0.289 0.267 0.244 0.244 0.222 0.222 0.222 0.222 0.222 0.200 0.178 0.156 0.265 0.068
0.286 0.277 0.265 0.281 0.274 0.265 0.277 0.272 0.265 0.273 0.270 0.265 0.273 0.270 0.265 0.268 0.268 0.265 0.264 0.265 0.265 0.259 0.263 0.265 0.259 0.263 0.265 0.254 0.261 0.265 0.254 0.261 0.265 0.254 0.261 0.265 0.254 0.261 0.265 0.254 0.261 0.265 0.249 0.259 0.265 0.233 0.257 0.265 0.265 0.254 0.208 0.261 0.265 0.265 0.019 0.007 0.000
67/67 0.294 0.289 0.242 0.280 0.275 0.275 0.270 0.266 0.261 0.261 0.256 0.265 0.265 0.256 0.265 0.252 0.247 0.242 0.263 0.014
Table 12.1.3 Estimated Average Loss for the Estimators
--i LMLE LEM LEB LPT 1 2.130 2.291 3.279 4.793 4.444 2 4.331 3 8.839 4 0.895 5 6 1.055 0.386 7 2.291 8 0.533 9 10 0.105 11 1.414 0.947 12 13 5.154 14 1.479 15 0.026 16 5.657 17 14.317 18 1.414 mean 3.079 s.e. 3.565
~
0.143 0.386 0.003 0.018 2.130 1.754 0.007 0.026 0.000 0.007 0.003 0.003 2.052 2.130 0.105 0.047 0.533 0.702 0.105 0.018 0.012 0.007 1.975 1.479 0.123 0.026 0.494 0.795 1.111 0.614 5.278 2.718 0.047 __ 2.130 0.912 0.897 1.346 1.045 -
0.895 0.143 1.289 0.088 0.047 0.001 2.130 0.026 0.795 0.001 0.036 1.228 0.003 1.ooo 0.386 2.052 3.086 1.000 1.270
LSlS
1.975 0.088 0.844 2.373 0.001 0.003 0.012 2.210 0.073 0.614 0.073 0.003 1.826 0.088 0.573 0.947 3.682 1.289 0.926 1.051
12.1. Product of Bernoulli Models
567
12.1.7 Asymptotic Properties of Confidence Sets In this section, we consider confidence sets, which are spheres with centers
-S ., S+ en,en,8,- P T , en, and 8, , respectively. Accordingly, we have the five confi-
dence sets corresponding to the assigned estimators as follows:
Co(6n)= ( 6 :
-
8nI12-
E
-1
IL
I X:(Y)),
Co(en)= ( 8 : njle - enI12- -1
En
I x;(Y)),
(12.1.66)
P(nl16n - ell? - l 5 x;(T)) = 1 - y and $(y) En
where
is the upper
critical value from a central chi-square distribution with p d.f. From Section 12.1.1, we know that under fixed alternatives, the asymptotic distribution of estimators are equivalent to that of fi(6, - 6 ) N,(O, Eo),except for the restricted estimator whose distribution is degenerate
-
as n
-+
00.
. PT
S
Hence, the asymptotic coverage probability of CPT(Bn ), Cs(hn),
and Cs(b:+) is 1 - y (as n -+ 03). Thus, to obtain meaningful asymptotic coverage probabilities of confidence sets, we consider the sequence of local alternatives, K ( n ): 6 ( n )= 801, n-ll2i$ as given by (12.1.46). Accordingly, we have
+
P[c'((~,)]= lim P{nlp(n)- 8,11212-00
-l
E3 l
I $(y))
= 1 -y
(12.1.67)
Next, we consider the sets C*(6:). In this case, (12.1.68) We now evaluate the asymptotic coverage probability of C ( 6 : ) (as n 00) under {K(,)}. We know that as n -+ co,
&(en
- 6(,)), N,(O,U~A-~), u2 = &(l- 60).
-3
(12.1.69)
Now, consider the orthogonal matrix I? = (I'l,l?2), where rl is a p x ( p 1) orthogonal matrix and I'2 is a p x 1 unit vector. Then r$1 = 0 and rlr; r2r;= I, so that rlr; = I, For our purpose, we consider r2 = A;I21, where A = Diag(?,.-. ,?) with X i = ni/n, (i = 1;-- , p ) . Thus, J, = A;1/2r1r',AL/2 = IP -A-1/2A1/21 1' A-1/2A1/2 = I -1 1'A,. n n ~ p nn p P P
+
Now, consider the transformation
Chapter 12. Discrete Data Models
568
Hence,
Further, nl[O(,)- O;l['-
-, asymptotically equals
En
(772 - w2I2
where 22
w1
+72,
-
-
+ 61, Z1 J I ( ~ - ~ ( O , I ~and - ~ )6 = N ( 0 , l ) .Hence,
= Z1 22
+ 11171 - w10(llw1112)ll JC. Similarly,
w2
=
12.2. Product Binomial Distributions
12.2
Product Binomial Distributions, Estimation, and Test of Hypothesis: Homogeneity of Odds Ratios
12.2.1
Introduction
569
In many practical situations such as an epidemiological survey or a metaanalysis, statistical data are often stratified into several 2 x 2 tables in order to control for confounding factors. For instance, consider the case-control studies where medical records are examined and questionnaires are administered involving, say, nlj experimental cases and nz3 control cases of the j t h stratum ( j = 1 , 2 , . . . ,k) to determine the number of exposed cases xlj and exposed controls x2j. The primary goal is to assess the disease exposure association by examining, the common “odds ratio” (OR) $0. Very often it is uncertain that the 0% $j, ( j = 1 , 2 , . . . ,k),of k tables are homogeneous across strata. Then the objective is t o estimate the OR of the k tables, meaning Q = ( $ I , - - . ,&)’ when one suspects that &! = q ! ~ ~ l k . This section contains the study of the problem of estimating 9 when it is suspected that HO : Q = $elk holds.
12.2.2
.
Model, Estimation, and Test of Hypothesis
Consider k pairs of mutually independent binomial distributions { B l j (nlj,O l j ) x &j(nzj,62j)lj I , . - - ,k},6 = {(6lj,62j);j = l , . . . , k } . Consider the j t h 2 x 2 table. Denote the number of exposed cases and exposed controls by zlj
Chapter 12. Discrete Data Models
570
and xz3 from nI3 and nz3 samples, respectively. Then the likelihood function for 6 given x = { ( x l 3 , x z 3 ) ; j = 1 , - . ., k } is
qqX)=
nJJ~ ; q i 2
k
-B ~ ~ ) ~ ~ + ~ J
(12.2.1)
z=13=1
We are interested in the estimation of the “odds ratio” (OR) vector, k =
(41
7
‘
.’
7
‘d!k)’, where
(12.2.2) when it is suspected that $1 = ... = $k = $0; that is, k = ?+bOlk, $o is the common odds ratio of the k stratum involving the k , 2 x 2 tables 1 Exp I l ~ l lI n 1 1 - 5 1 1 I n11 . . . Control [ z21 I n21-1~21 I n 2 1
k
j
I ml-ml ~
l
n j lj-l~lj
( 12.2.3) Using the likelihood function (12.2.1), we can write the unrestricted maximum likelihood estimator (UMLE) of P as &n
= (‘&1,”’
(12.2.4a)
,qk)’
where
I f & = . . . = ?&!) = $0 holds, then maximizing L(6lx) subject to ko = $ 0 l k yields the restricted MLE (RMLE) of !PO,denoted @ M L . The estimator is complicated to obtain, as it turns out to be the solution of a system of ( k - 1) simultaneous cubic equations. To avoid this difficulty, Gart (1962) proposed a noniterative solution. The restricted estimator(RE) @On = ?,60nlk is obtained by considering the information matrix elements for $0 as follows: k
E[*]
=
j=1
]
~ [a2alog6 ~ ~= a0, 6j ~# k~, (12.2.5a)
(12.2.5b)
( 12.2.5~)
12.2. Product Binomial Distributions
571
Further, letting
(12.2.6) we invert the information matrix to obtain the asymptotic variance of the RMLE, as
GML
A V ~ T ( ? J M=L$‘,02W-1, )
W
= W1
+ . .. + wk,
(12.2.7)
where
Then there are three possible noniterative estimators, given by
60, of $0
which are
k
(i)
$0, = ti-lCtij?Jj, 6 =til+ . . - + t i k j=1
where the estimates of wj(j = 1,.. . ,Ic) are given by
w,-’=
1 n,je,j
(1 - elj)
+
1 n z j e z j (1 - & j )
(1 2.2.10) *
We can use any one of these estimators in our study. We prefer t o use (i) in our discussion. The asymptotic variance of,&, given by (12.2.7) is the same as that of G M L.Thus, we can write the restricted estimator as @On
= (‘&On,. ‘ ’ 3 40,)’ = ‘$Onlk.
(12.2.11)
The next step is to test the null hypothesis H o : 9 = $‘Olk against the alternatives H A : \k # $ t g l k . For this, we consider the Wald-type test-statistic defined by the departure statistic due to Gupta and Saleh (1997) given by
D, = n?J;:(G,
-
Go,)’t2,(@,
-
Go,)’;
k
n = C(n12-tnzz),
(12.2.12)
i=Z
where hn = Diag(ti1, ... , z j j k ) , which is a consistent estimator of S2 = Diag(w1,. . . ,wk). Under the null hypothesis, D, closely approximates the central chi-square distribution with ( k - 1) d.f. Thus, we can define the following class of estimators of \k
9;= Gn - (G,
- @o,)g(Dn),
where g(D,) is a function of the statistic, D,:
(12.2.13)
Chapter 12. Discrete Data h!!odels
572 (i) If g ( D n ) = 0, then 9: = G,.
(12.2.14a)
(ii) If g(Dn) = 1, then 9;= 80,.
(12.2.14b)
(iii) If g(Dn) = I ( D , < x;&>),
then 9: = 9, = 9,- (&, - 6 , ) l ( D n < x ~ - ~ ( Q ) ) . ( 12.2.14~) *
PT
-s (iv) If g(Dn) = dD;l, then 9: = Q, = Q, - d(\k, - 80,)D;l,
d = Ic - 3.
(v) If g(D,) = 1 - (1 - d D ; l ) I ( D , > d ) ,
+ (9,- @on)(l - dD;l)I(D, > d ) , -s = 9, - (9, - 90,)(1- ~ D ; ' ) I ( D , < d ) .
then Q: = 8,
(12.2.144
or
( 12.2.14e)
*:
12.2.3 Asymptotic Theory of the Estimators and the Test-Statistics Note that &, and 80,are consistent estimators of Q and 90= ( $ 0 , . . ,&)', respectively. Similarly, since D, is a consistent test-statistic, under fixed a1t erna tives , 9 6 = $Olk
we obtain
D , = n4;2($~- 9 o n ) ' f i ,
..-I
(9- 90,)
Consequently, for fixed alternatives
\E6
for all
5
--+ m
= $elk
P96{D,>z)-+1
( 12.2.15)
f6 ,
as n
+ m.
+ 6 , we have ( 12.2.17)
as n - + m
2 0. Further, for fixed alternatives,
( 12.2.16)
9 6 .
n4O;a2Il(8Y-*,)1I2= D J ( D n <XE-I(Q)) fin
I x2-&)I(Dn
<xE-&)).
Then, taking expectation on both sides, we have
P[D, < X ~ - ~ ( C Y+ ) ] 0 as n
m by (12.2.17)
(12.2.18a)
E I D i l ] + 0 as n -+ co,
(12.2.1813)
-+
Similarly,
n$G:lI&E - *,/I&
= d 2 D i 1 and
and
n?ZO-,2 II 9,7 .2s+ -
= d2D,11(Dn
= d2D,1
> d ) + D,'I(D,
+ (1 - d2)D,11(Dn
> d) < d).
(12.2.18~)
The r.h.s. of (12.2.18a,18b) tends to 0 in the first mean as n + 03. Hence, under fixed alternatives * 6 , the asymptotic distribution of the estimators are equivalent t o that of &,. Further,
12.2. Product Binomial Distributions (90, -~
P
) - = q (&lk
573
- S ) as n
so that n&2l1&on - \kilh,
+ 0;)
---f
(12.2.19a)
0;)
as n -+ m.
(12.2.19b)
Also,
E[n(Gn- \k)(Gn- S)’]= +in-’, S2 = Diag(w1, ... which is bounded,
n7j;2(Gn - \k)R(Gn - \k) < 53
(12.2.21)
as n + 0;).Thus, we have the following theorem:
Theorem 1. Under fixed alternatives (i)
4onlk APT
S 6
= &lk
+6, ( 12.2.22)
has unbounded risk, -s
s+
(ii) S n , \kn and 9, are all asymptotically equivalent to tion and have bounded risks.
Gn in distribu-
In order to compare the five estimators, we consider the asymptotic setup along with the quadratic loss function
L ( @ ; ;S ) =
n(&i- S)’Q(@Z- \k),
(12.2.23)
where 9;is any of the estimators for different choices of the g-function given a t (12.2.13) and Q is a positive-semidefinite (k x k) matrix. Then the asymptotic risk of @: is given by
R(\k: : Q) = tr(QVG),
( 12.2.24)
where Vz is the asymptotic distributional dispersion matrix of &(S; - S ) . Further, \k: will be called an asymptotically inadmissible estimator of S if there exists an alternative estimator S: such that
R ( 9 : : Q) > R(@E: Q ) VA2,
(12.2.25)
where A2 is the departure function of the parameters of the models from the null hypothesis, with strict inequality for some A2. The problem is the computation of the asymptotic distributional quadratic risks (ADQRs) of estimators, \k;. To enable such a computation, we consider a sequence {K(,)) of local alternatives, namely
K(,) : \k(n)= $ O l k
<
+ n-1’2t
(12.2.26)
with a real finite vector t. Note that for = 0 , we have the null hypothesis. For an estimator \k;, the asymptotic distribution of &(\k; - \k) is given by (12.2.27)
Chapter 12. Discrete Data Models
574
whenever the limit exists. Also, let the asymptotic distributional mean square error (ADMSE) matrix be given by (12.2.28) Then the ADQR of 8; is given by
ADQR(8:; Q) = tr(QV:).
(12.2.29)
Note that for fixed alternatives, all estimators of 8 are risk-equivalent t o $,, (except RE, which has unbounded risk) due t o the consistency of the proposed test-statistic L,. Thus, our results allow us to compute biases and risks of estimators under the local alternatives presented above. First, we consider the asymptotic distribution under {K(,)} of some of the basic random variables related to the estimators in the form of a theorem given below. Theorem 2. (Gupta and Saleh 1997). Under {K(,)} and the assumed regularity conditions, as n -+ 00,the following results hold: (i) X, =
+(qn
- \EOlk)
- Nk(<,+@-'), 0
= Diag(w1,.
.. ,wk),
1 1 where wzl = xI9el3(i-el3)+ xz,e2,(i-ez,)' j = 1 , . . . ,k , and
3 n -+ A, (< 00);
(ii) Y , = +($n (iii)
z,
= &(&on
-
i = 1,2, j = 1, . . . , k.
-
Gon) N ( - J E , + ; ~ - ~ JJ = ' ) &a. ,
- \Eo)lk
N
21'
Nk (%a<,+&-'lkl;).
(vi) limn-m P ( D n I zIK(,)) = Hp-i(z : A2),A2 = +k26'a8, where 6 = J t .
(12.2.30)
For a proof, see Problem 10.
12.2.4 ADB, ADQB, ADMSE, and ADQR of the
Estimators
With Theorem 12.2.2, we can obtain the ADB, ADQB, ADMSE, and ADQR of the five estimators given below. First, we note that the expressions for ADB, ADQB, ADMS, ADMSE, and ADQR of 8: are given by
(9 b " ( 8 3 = -6E[g(xE+1(A2))l,
12.2. Product Binomial Distributions
575
Chapter 12. Discrete Data h4odels
576 = 4;sZ-l
M4($:)
- (k - 3)4isZ-1J’{2E[~~:l(Az)]
-(k - 3)E[X&(A2)1> S
and
= $$tr[QsZ-’]
- ( k - 3)E[x&(A2)]}
(v)
*:+,
-
+ ( k - 3)(k + 1)68’E[X&(A2)1
( k - 3)q!~~tr[QsZ-’J’]{2E[x;:~(A~)]
+ (k - 3)(k+ ~ ) ~ ’ Q ~ E [ X ; : ~ ( A(12.2.32d) ~)].
PRSE
bs(@z+)= - 6 { ( k - 3)E[x;il(A2)] - ~ [ ( 1 -( k - 3)x;~l(a2))~(x2k+l(A2) < k - 311)
s+
and &(9, ) = A2{(k - ~ ) E [ x L : ~ ( A ~ ) ]
- ~ [ ( 1 -( k -
< k - 3)1i2;
..s+ M s ( 9 , = M4(*;T) -?+!J~~Z-~J’{E[(~ - ( k - 3 ) x ~ ~ l ( A 2 ) ) 2 1 ( x ~ + l< ( Ak 2-) 3)]} +66’{2E[(1- ( k - 3)x~:l(A2))1(~E+l(A2) < k - 3)]
-E[(1 - ( k - 3 ) ~ ~ : 3 ( A ~ ) ) ~ 1 ( ~ ~<+k3-( 3)]} A~) s+
and Rs(9, ) = ,
-$;tr[Qst-lJ’]E[(l
- ( k - ~ ) X ; : ~ ( A ~ ) ) ~ ~ ( X E< + ~k (-A3)] ~)
+d’Q6{2E[(1 - ( k - 3)x~:l(A2))1(x~+1(A2)< k
-E[(1- ( k - 3)xL~3(A2))’1(x~+3(A2)
-
3)]
- 3)l).
Based on the results above, we present the ADQR analysis of the estimators using the following theorem:
Theorem 3. Under HO : 9 = $elk and loss function L ( 9 : : 9)= n(9: *)’Q(*: - \k) with positive-semi-definite matrix, Q, ADQR of the five estiAPT - S s+ mators (!@n, *,, 9, ,9, and 9, ) can be ordered as follows:
6) R2(Gn;Q) I Rs(*z+; Q) < - &(@’. n , Q)5 &(*,; (ii) Rz(@n;Q) I R3(@ET;Q) L Ri(*,;
..PT
(iii) R3(9,
; Q)
Q).
Q).
L R s ( i z + ;Q) L R4(?i.z; Q) L Rl(*,; Q).
(12.2.33)
The ordering of these estimators depends on the level of significance Q for the PTE.
12.2. Product Binomial Distributions Proof.
577
Recall Q: = G , - (6,- 9 , ) g ( D , ) ,
( 12.2.34)
where g ( D n ) is decreasing function of D,, the test-statistic for Ho. If
~ ( 0 ,=) 0, then =
@ ! :
=
9,
1, then Q: = G,
= I(&
(RE)
< X : - ~ ( C Y ) ) , then
= ( k - 3)D;', = 1-
(UE)
,.PT
Q: = Q,
-s
then Q: = 9,
(I - (IC- 3)D;l)I(Dn > k
- 3), then
Q;t
. s+ = \k, .
Now, under Ho, the ADQR of Q: is +;w-'
I)
tr[Qlkl;l { ~ E M X ;1+ (A2)
-J%?"X:+~(A~))I> + Y$ trIQn-'IE[l-
g(x:+i (A2))121
which is a increasing function of E[1 - ~ ( x E + ~ ( A ~Hence, ) ) ] ~ .we get
and the assertion of the theorem holds. The next theorem deals with dominance of the Stein-type estimators. Theorem 4. Let k > 4 and the matrix Q satisfy the condition
( 12.2.36)
-s
- s+ outperform G , with respect
Then under {K(,l}, the estimators, Q, and Q, to ADQR and
R E , ( G ~Q) + ;I &($:;
Q) I Ri(*,; Q).
(12.2.37)
Chapter 12. Discrete Data Models
578 Proof. R4(&:
The ADQR of
&:
can be written as
: Q ) = $: tr[Qn-'] - ( k - 3)$:w-l
tr[Qn-'J']{ ( k - ~ ) E [ x ~ : ~ ( A ~ ) ] (12.2.38)
The risk-difference Rl(G,; Q) - Rq(4,; Q) (12.2.36). Hence,
2 0 for all (A2,Q), Q satisfying
R,(e:; Q) i RI(*,; Q) v(A2,Q).
(12.2.39)
Similarly, the risk-difference R5(*:+;
=
Q) - R4(&:: Q)
-& tr[Qn-'J']E[(l -a'Qa{W(1 - (k
- (k -3)~i$(A'))~1(xg+~(A < ~k )- 3)]
-
3 ) x i ~ l ( A 2 ) ) I ( x i + 1 ( A 2<) k - 311)
-E[(1 - (k - 3 ) ~ ~ ~ ~ ( A ~ ) ) ~ 1 ( ~ 2
(12.2.40)
Hence,
Rs(GZ+;Q) 5 R4(@:; Q) i R I ( @ ~Q). ; The next theorem states the range of dominance of 9: over
R4(&:;Q) and R5(@ft; Q)).
(12.2.41)
@,
(excluding
..PT
Theorem 5 . Under {K(,)} and positive-semi-definite, Q, &, and 9, have smaller ADQR than G , whenever
otherwise,
@,
dominates
&n
and
,.PT
*n
(12.2.42)
,
(12.2.43) The theorem is proved by considering the risk-difference of the estimators and noting that Ch,in(QQ-')
5 s'Qs Chmaz(Qn-'). s'ns < . s+
(12.2.44)
From the theorem above it is clear that PRSE, is the best practical . PT application if k 2 4, while for k < 4, 9, is a reasonable compromise.
12.2. Product Binomial Distributions
579
12.2.5 Estimation of Odds Ratio under Uncertain Zero Partial Association In this case, we like to estimate 9 = ($1,. . . ,$k)’ when it is suspected that Ho : $1 = . . . = $k = 1. This null hypothesis holds if and only if 13,j = 62j,j = 1, . . ,k , meaning that in each stratum, the probability of success is the same for the cases (treated) and control groups. This is known as &homogeneity. The large sample test for 6-homogeneity may be given by the asymptotic distribution of the test-statistic L;, defined by
L: = n(Gn - l k ) ’ i i n ( G n - ~ k ) 1 , 1= , (1,... , I)’,
(12.2.45)
where the components of hnare given by (12.2.8) and 9, is the unrestricted estimator of 9,since the common value of the odds ratio is 1. The asymptotic distribution of 13; is a chi-square distribution under the k d.f. The preliminary test estimator of 9 is defined by
&y= l k I ( L ; < C;,J
+ a n r ( L ; > c;,&
(12.2.46)
where L:+ is the a-level critical value from the distribution of L;. Similarly, the shrinkage estimators (SE) of 9 are defined by
-s
9n= l k
+ (1
-
( k - 2)C;4}(Gn - lk),
(12.2.47)
and
s+ 9, - l k + (1 - ( k - 2)c;-1}qc:
>k
.?
- a)(@, - l k )
(12.2.48)
which is the positive-rule shrinkage estimator of 9. For the asymptotic distributional biases and risks of the estimators, we use local alternatives of the form
Kin) :
= lk
+ n-’l2t.
(12.2.49)
Thus, following previous developments in Section 3, we obtain the biases as
W a n ) = -6~k+2(x:(a); a2), b(&f)= - ( k - 2 ) d E [ ~ i : ~ ( A ~ ) ] , .
s+
b(9,
= -s(Hk+l(k - 2; A2)
+ ( k - 2)E[X&(A”l,
-(k - 2 ) E [ X 3 A 2 ) I ( X & < k - 2)l>, where A2 = 6’0x5. The asymptotic distributional quadratic risks (ADQR) are given by ADQR(G,) = tr[Qn-l]
Chapter 12. Discrete Data Models
580
ADQR(@ET) = tr[QCt-l]
+ (1 - Hk+z(x;(a); A’)}
+S’QS{~H~+Z(XE ( a ) A’) ; - Hk+4(x;( a ) A’)} ;
ADQR(@z) = tr[QC2-’]
- ( k - 2)tr[QC2-1J’]{2E[~;:z(A2)],
-(k - ~)E[x&(A’)J + (k - 2)(k + ~)~’QSE[X&(A’)],
A D Q R ( @ ~ += ) AUQR(G~) -tr[QC2-’]E[(l - ( k - 2 ) x ~ ~ 2 ( A 2 ) ) 2 1 < ( xk~-~2)] ,
< k - a)]}
+b’QSE[2(1 - ( k - ~)X;~~(A’))I(X;:,
-E[(l - ( k - 2)x&(A2))’1(x;:4
-
a)].
From these risk expressions, the asymptotic properties can be established as before with similar conclusions.
12.2.6 Odds Ratios: Application to Meta-analysis of Clinical Trials Meta-analysis is a systematic and quantitative review of results of a set of individual studies intended to integrate their findings. Statistical methods are used as a fundamental tool in reviewing and combining the evidences from various clinical trials in medical research. There are a number of reasons why meta-analysis is an important technique in clinical trails:
1. Narrative reviews of a set of individual clinical trials can be misleading and have the potential of misleading and distorting trial results.
2. Explosion of research evidence in the form of published trials often are not easily assimilated without formal review.
3. In assessing the benefits of a particular medical treatment, judgment should be based on the totality of evidences from well-conducted randomized clinical trials.
4. Since individual clinical trials may involve small samples, a collection
of several similar clinical trials would increase the sample to provide reliable evidence on the medical treatment.
Meta-analysis therefore has several objectives: (1) to provide systematic qualitative as well as quantitative summaries of results from individual studies and (2) to combine these results across studies and provide overall interpretation t o suggest effectiveness of the medical treatments. In this section, we consider the principal results from nine randomized trials of diuretics on incidences of pre-eclampsia using a fixed effect design. The
12.2. Product Binomial Distributions
581
data shown in the Table 12.2.1 come from a meta-analysis of nine randomized studies investigating the use of diuretics t o prevent pre-eclampsia (see Thompson and Pocock, 1991). Table 12.2.1 Incidence of pre-eclampsia in nine randomized trials
Source: Adapted from Thompson and Pocock (1991)
The object of the studies is to find whether diuretics are effective treatment for pre-eclampsia. The odds ratios and their confidence intervals are presented as initial steps nine studies. OR. less than unity represents beneficial effects of diuretics. In order to combine the nine OR values into one common value, we use the traditional log-scale and consider the modified test of homogeneity of the odds 0 ratios by a chi-square test given by xi = wi(!n& - !nGo)’ = 27.31 > xg(.05) = 15.51. Thus, we have the following table for the estimators of the 0% with graphical representation in Fig. 12.2.la. Table 12.2.2 Various Estimators of Odd-Ratio
It may be noted that 95% CI for
40 is (0.5680 - 0.8005).
Now, we observe that the null hypothesis Ho : $1 = $2 = . . . = $9 = $0 is rejected at 5% level of significance. Even the Stein method of pulling the 5th value of the OR is not pulling enough towards the common value (metaanalysis) 0.6743. Also, the confidence intervals do not always include the common value 0.6743. Thus, there is enough evidence to suspect heterogeneity of
Chapter 12. Discrete Data Models
582
ElU"3.l.d
ca**R.dOl
P..d,et.<
om,
R*,,O',
Figure 12.2.la Predicted odds ratios
I -1 cp--II
t (
-4
Figure 12.2.lb Confidence Intervals of odds ratios
22.2. Product Binomial Distributions
Figure 12.2.2a Predicted Odds Ratios (Deleting Fallis)
C"ad,OS
Landesman KrmS
Tswila
Campbell
Mela-anaiyrlr
Figure 12.2.2b Confidence Intervals of odds ratios (Deleting Fallis)
583
Chapter 12. Discrete Data Models
584
nine 2 x 2 tables. Most of the heterogeneity is caused by the OR value 0.2292 (Fallis) or .2488 (Cuadros). The recomputed D7-statistic yields the value 17.99 by deleting the “Fallis” or ‘Tuadros” study. But the weight for “Fallis” is less than that of Tuadros”. Hence, “Fallis” is designated as an outlier. Deleting this value, we obtain the common value as 4 0 8 = 0.7244, with the corresponding chi-square value 0 7 = 17.99 with 7 d.f. and the null hypothesis is marginally rejected. Thus, based on this D-value we obtain the following modified Table 12.2.3 of the estimators. Note that the 95% CI for is 0.6065 - 0.8652. Table 12.2.3 Revised Estimators of ORs after Deleting “Fallis”
Samples
*
UE
Weseley Flowers Menzies Cuadros Landerman Tervillis Krans Campbell
1.0427 0.9371 0.3256 0.2488 0.7431 2.9705 0.7699 1.1449
RE 409
0.7244 0.7244 0.7244 0.7244 0.7244 0.7244 0.7244 0.7244
PTE
SE/PR.SE
1.0427 0.9371 0.3256 0.2488 0.7431 2.9705 0.7699 1.1449
0.9423 0.4693 0.4066 0.3155 0.7378 2.0069 0.7570 1.0081
jPT qP/4”’
95% CI for OR. based on 0.4307 - 2.0618 0.2396 - 0.9193 0.1779 - 0.9296 0.1081 - 0.9215 0.5870 - 0.9273 0.3957 - 10.1787 0.3811 - 1.4956 0.6050 - 1.6000
GS+
The chi-square value together with the power of CI of the ORs reveal that 8 data sets are homogeneous and the common OR value is 0.7244 with 95% CI (0.6065-0.8652) indicating the degree of benefit of the diuretics to prevent pre-eclampsia via the odds interval (13%-39%). The reader is referred to Saleh et al. (2006) for some details.
12.3
Product of Multinomial Models, Estimation, and Test of Hypothesis
It is common practice in population census and epidemiological surveys to classify data according to two traits to obtain r x c contingency tables. The usual problem in this case is to estimate the cell probabilities when it is suspected that the two traits may be independent. For the estimation of the cell probabilities under independence structure, we consider the formulation of the model and the estimation strategies in the following subsection.
12.3.1 The Product of Multinomial Models Let nZ3stand for the observed cell frequency for the ( 2 , j)-cell in an T x c contingency table for i = 1,.. . ,r ( 2 2) and j = 1, ... ,c ( 2 2). Let n,+ = nZJ, and let n+3= nZ3and n = n,+ = n+3.The probability distribution of the random vector n = (rill,. . . , nlc,.. . ,n,.~,.. . ,nTc)’is given
I:=,
c:==,xi=,
cg_,
12.3. Product of Multinornial Models
585
by the product multinomial distribution r
r
c
i=l j = 1
c
(12.3.1)
i=l j = l
where 8 = (ell,.. . ,e l c , .. . , O r l , . . . ,OrC)’ with 6’lrc= 1 being the vector of the true cell probabilities. Define
ei+ =
cOij,
3=1
XOij, (1 5 j 5 c ) . 2=1 T
C
(1 5 i 5 T ) , and 6+j =
(12.3.2)
If the independence structure of the two traits holds, then we specify the null hypothesis as
H~ : e,, = e,+. e+3
v (i,j).
(12.3.3)
Note that in (12-3.1) we have ( T C - 1) independent parameters, while under Ho, we encounter ( T - 1) ( c - 1) = T c - 2 parameters.
+
12.3.2
+
Estimation of the Parameters
For the estimation of Bi; in the stricted MLE of 6 given by
T
x c contingency table, consider the unre-
( 12.3.4) while the marginal probabilities &+.(l5 i 5 defined by
T)
and 8+,(1 5 j 5 c ) are
,. = -. n+3 &+ = yna’ and 8+,
n Then the restricted MLE of 8,, under Ho is given by
e n = ( & I , . - . ,6lc,... 12.3.3
, 6 ? 1 , . - . ,&,),
withi,, =eZ+6+,
Test of Independence in an Table
T-
(12.3.5)
V ( i , j ) . (12.3.6)
x c Contingency
In order to test the null hypothesis
H~ : ezj = ei+.e+j
v (i,j),
( 12.3.7)
we use the departure statistic D,, defined by
D , = n(6, - bn)’Ei1(6,- b,),
(12.3.8)
En = Diag(611,. . . ,elc,. . . ,&I,. . . ,&).
(12.3.9)
where It may be shown, following Bishop et al. (1975) and Agresti (1990), that under Ho, D, closely follows the central chi-square distribution with m (= ( T - I)(c - 1)) d.f.
586
Chapter 12. Discrete Data Models
12.3.4 Preliminary Test and Stein-Type Estimators of the Cell Probabilities Let dn,a be the a-level critical value from the exact distribution of D,, which may be closely approximated by xk(a),and is the a-level critical value from the central chi-square distribution with m d.f. Then we define the class of estimators of 8 as 8: =
en + (1 - g(~,))(en - Pn),
( 12.3.10)
where g ( D n ) is a decreasing function of D,. (i) If we choose g(Dn) = I ( D , * PT
8,
< xk(a)),we obtain the PTE of 8 as
= e, - (6, - ~ , ) I ( D , < x2,(a)).
(12.3.11)
d = m - 2, then we obtain the Stein-type
(ii) If we set g(Dn) = estimator (SE) as
-s 8, = 8, - d ( 8 , - 6
,)~;~.
(iii) If we choose g(Dn) = 1 - (1 - d D i l ) I ( D , positive-rule Stein-type estimator (PRSE) as -s+ 8, - 8,
+ (1 - dD;’)I(D,
> d ) , then we obtain the
> d)(&
-
6,).
We note that for g(D,) = 0 or 1, we obtain the unrestricted and restricted estimators of 8, respectively. Our estimators are restricted t o the choices above of g(D,) function.
12.3.5 Bayes and Empirical Bayes Estimation of the Cell Probabilities In order to obtain the Bayes estimator of 6 , we consider a Dirichlet distribution as prior on 8 with parameter K q where K is a scalar with q = (qll;.. , v I ~ , . - *, ~ r l , . - . ,qrc)’ having marginal row sum vi+;i = I , . - - ,c, and marginal column sum v + ~j; = 1 , 2 , . . . ,r. The independency configuration is then given by nij = ni+ . n+j V ( z , j ) . If K and q are known, then the Dirichlet distribution with parameters (Kq)is given by
Hence, the joint distribution of (n’,8’)’ is given by
587
12.3. Product of &!ultinorniaJ Models Accordingly, the marginal density of { nij} is then obtained as
+
+
= u ( u 1) . . . ( u b - 1) with u(O) = 1. Consequently, the posterior where distribution of 8 given n for fixed K is given by
(12.3.15) where
(12.3.16)
+
is the Jeffreys’ noninformative prior on the marginal totals and 7r(81Kq n) is the posterior distribution of 8 given Kq n. To obtain the Bayes estimator of 8 , we compute the posterior means. Thus, for fixed K, the posterior expectation of Bij is given by
+
E(&jIn,K) =
nij
+ KE(%ijIn, K) n+K
+ K 2( n+ K ) ( 1n+ K + 1) WVijIn, K),
( 12.3.17)
(12.3.18)
respectively. Note that the expressions for the posterior moments of & j are complicated and intractable. Thus, we need some approximations (as in Albert and Gupta, 1982).First, note that the covariance terms of the multinomial Dirichlet distribution are given by
( 12.3.19) Now, defining the random variables
(12.3.20)
Chapter 12. Discrete Data Models
588 we have
We approximate the marginal density M l ( n j K , q ) given in (12.3.14) by a 77’)., Then M l ( n l K ,7)is approximultiriomial density with parameter ( n ~ mated by
Hence, the approximate marginal distribution n given K is obtained by integrating (12.3.22) with respect to the prior of (qk,qL)’given by (12.3.16) to obtain the expression
which is the approximate density of n given K . Now, if M l ( n l K , q ) at (12.3.22) is combined with the prior given at (12.3.16). Note that q k and qc have approximately independent posterior Dirichlet distribution with parameter T n k and ~ n , ,respectively, where n, = ( n 1 + , . . -,nr+)’ and n, = ( n + 1 , . -,.T L + ~ ) ’Then . the approximation of the posterior moments (12.3.17) and (12.3.18) are obtained as
K = (I - X(K))&
1
+ X(K)6ij =
&j,
where X(K)= K ( n
+ K)-’,
(12.3.24)
and Var(BZJlnz3,K) N gZ3(l - &,)(n
+
+ K 2 ( n K)-l(n
+K f
X[6+3(1 - 1!+~)(72.T 4-
+ K + 1)-l
1)-’{[6,+(1 - 6,+)(nT f 6:J]} - [6,+ . 6+2]2.
+ iz”,] (12.3.25)
Thus, the Bayes estimator of Oz2 depends on X ( K ) through K . If K is unknown we can use a noninformative prior K-’ and estimate K from the posterior distribution. The empirical Bayes method suggests that K is to be estimated from the marginal distribution M(n1K) given at (12.3.23). Let i ( K ) be the estimate
12.3. Product of Multinornial Models
589
of X(K). Then the estimate of 0i, is obtained as
iy = X(K)02j + (1 -,.
- X(K))&j, V(2,j)
x
= 0ij - X(K)(Bij- &),
(12.3.26)
and this can be written compactly as (12.3.27a) We can show (see Albert, 1987) that
(12.3.27b) (see Problem 17). Note that (G2- D,) -+ 0, as n -+ 00 (under Ho and local alternatives). Hence, the empirical Bayes estimator of 8 can be written as -EB
8,
=
6, + (1 - ~D;')I(D, > d ) ( 8 , - 8,).
(12.3.28)
In the derivation of the empirical Bayes estimators of 8, we have noticed the degree of difficulty in calculating the estimator while the quasi-empirical Bayes (PTE approach) achieves the same estimators without the sequence of assumptions and derivation. Our estimators are obtainable from (12.3.28) by setting -EB X(K) = 0 t o obtain 8, = 8, -EB
= 1 to obtain 8, = I(D,
..
= 8, -EB
5 & ( a ) ) to obtain 8,
-EB
= d D i l , ( d = rn - 2) to obtain 8, = 1 - (1 - dD;l)I(D,
APT
= 8,
-S
= 8, -EB
> d ) to obtain 8,
-S+
= 8,
, (12.3.29)
respectively.
12.3.6
Asymptotic Properties of the Estimators of Cell-Probabilit ies under Fixed Alternatives
For the rc-dimensional simplex R = (8: 8'1 = l}, let w c R be the subspace for which 8 satisfies the independence hypothesis. By virtue of the consistency of the D,-test, we note that for fixed 8 E w , D, + 00 in probability as n + 00. APT -s S+ Thus, for any fixed 8 $! w and large n(-+ co),the estimators 8, ,8,, and 8, are equivalent in probability to the unrestricted estimator, and %, has
a,,
Chapter 12. Discrete Data Models
590
bounded risk. Thus, t o obtain differing risks for the estimators, we consider Pitman’s local alternatives
K(,) : 8(,)
= Q0
+ n-’I2J, B0 E w and 6’1 = 0,
(12.3.30)
where J has fixed elements. First, we look a t the fixed alternatives. For a positive-semidefinite matrix, W the loss is
The elements of X, are nonnegative and bounded by 1. Also, I ( D , < 0 as n 03 by the consistency of D,. Hence,
xk(a))
-+
---f
E{D,I(D, < x2,(a))p $ w }
o
-+
as n
-, 03.
(12.3.33)
., PT
Thus, for 8 @ w , 8, and 6, are asymptotically risk equivalent. Next, for g(D,) = dD;’, ( d = m - a), we have
( 12.3.34)
d2D,2n(6, - Gn)’W(6, - 6,) 5 d2D,1Chmaz(W&).
-s
-
On the set {D, = 0}, we have 8, = 8, =
6. Further,
we prove that
EB{D;~I(D, > O ) l 8 $ w } -+ 0 as n 03, which implies that 8, are empirically risk equivalent for every 8 $ w . --j
6,” and
Lemma 1. (Gupta, Saleh, and Sen, 1989).
{ D ; ’ I ( D ~> O)~Q w }
o
as n
-+
4
03
(12.3.35)
Proof. We need to show that for every c > 0,
E{D,’I(O < D, < E }
-+
This we verify as follows for { ( i , j ) i i= 1;-. UiJ
=
(nij -
neij) ,
4%
as n
0
Zli+
,T,
=
-+
co.
j = 1 , . - -,c},
(ni+- no,+)
6 ’ (12.3.36)
12.3. Product of Multinornial Models
591
Using the Stirling approximation together with standard steps, we see that the approximate large sample distribution of { Vij} is given by
>
for every n E N, = {n : nij 0 'd(z,j);n'l = n}. Similarly, the joint density function of the two marginal of ni+ and n+?is given by
Therefore, the conditional density of n given marginals (n,+ ,n+j,8)is
(12.3.39)
where rn = ( r - l ) ( c - 1)(>3). Let
k , = max
{I
nij -
n. n 2+ n
1
+j : 1 5 i 5 r; 1 5 j 5 c}.
(12.3.40)
Note that D, cannot be smaller than the parallel quantity for any 2 x 2 table (out of an T x c table) with probability one. Thus, we have
D, 2 16n-lk:
with probability one.
(12.3.41)
Now, if (ni+n+j)n-' is an integer for every i ( l 5 i 5 r ) and j ( 1 5 j 5 c), then k , = 0 with positive probability. Then k, = 0 implies D, = 0. Hence, -s 8, = 6, = 6,. On the other hand, if (ni+n+j)n-l is not an integer for at least one ( i , j ) ,then k, # 0, though it can be less than 1 with positive probability. In any case, k, > n-l with probability 1. Thus, on the set { D , > 0}, we can write
DL'I(0 < D, < 6 ) = D,'{1(0
< D, < 16n-l)
+ DL11(16n-'
5 D, 5 6 ) ) .
(12.3.42) Based on (12.3.41) and (12.3.42), for every 8 (under K(,) as well) we obtain
Ee[DL1l(O < D, < E ) ] I: -{Ee[ki21(15 k , 5 k:)] n 16
+ Ee[kl21(n-'
5 k, 5 l)]},
(12.3.43)
where we choose k: = [ i ( n ~ ) ' / ~so ] that < (k: + 1)2.Here [ ] 5 stands for the largest integer contained in [ ] for k, > 1. Replacing k , by [k,]
Chapter 12. Discrete Data Models
592 we have from (12.3.39), Ee{kG21(1 < kn k:
=
< kz)} I { ~ % { [ k n ] - ~ IL( 1k n I k:)} kz-1
C ~ c - ~ ~ e ( [ =k ,k)] I 2 C { k 2 ( k +
~))-'~e{[kn]
5 IC}
k=l
k=l
+(k:)2Pe{[kn]= k } .
(12.3.44)
Now for the conditional model in (12.3.39), the row or column sums {nv } are zero, and essentially there are ( r - l ) ( c - 1) = m n linearly independent entries. Hence, given the marginals and the estimated marginal probabilities held fixed, there are (2k)" configurations for which k , 5 k. For each configuration, the conditional probability by (12.4.39) is O(n-"/'). Thus, Pe{[k,]5 k } = O(k"n-"/') for every k 5 k z . In this formulation, we actually take Pe(A) = Ee{Pe(A)Ini+n+j} and use (12.3.39). Then, by (12.3.43) and (12.3.44), we have
- 0(~-"/2+1 -
k=l = ~ ( ~ - m / Z (k,) +0 l "-2
)-0 ( @ - 1
)
= O(E'/'),
m 2 3.
(12.3.45)
Thus, (12.3.45) may be made arbitrarily small by suitably choosing E . For second term of (12.3.43), writing s, = [nk,],so that I k , 5 1 and 1 5 s , 5 n. Thus, as in (12.3.44),
Note that we have k, < 1 so that s, < n. Hence, by the definition of k, (see 12.3.39), n,, differ from (n,+n+,)/nby some number within (-1,l). Thus, for a given the number of configurations of n(s, < n - 1) is equal t o 2" a t most. Also, the number of configuration of n,+ and n+, for
(n"+z+3)
which (n,+n+,)/ndiffer from some number 5 k,(= ~-IS,) is O ( S ? - ' - ~ ) / )~ Further, total number of the n,+ (or n+,) is equal t o n, and for every integer, a , b, (n,+ a)(n+3 b)/n = (n,+n+,)/n an+,/n bn,+/n ab/n, where both n,+/nand n+,/n are less than 1. Thus, ab needs to be of the order O(S,), which reduces the number of configurations t o the order O ( s(7--1)/Z+(c-l)/' , )= o(s;+C-2)/2 ). Finally, by (12.3.37) and (12.3.38), for each such configurations
+
+
+
+
+
12.3. Product of Multinomial Models
593
of n, the probability is 0(nwmI2) and for the marginals , it is O(n-(T+c-2)/2 )* Therefore, Pe(s, I s) = O ( s ( T f c - 2 ) / 2 n - " / 2 ) ) 0 ( nm/2 - ) for s = I , . . . , n - I. Hence, (12.3.46) reduces to
5
)o{
-
O(n3-m/2-
-
O(n3-m/2-(T+c-2)/2 ) o ( ~ ( T + c ) / ~ = - ~0)( ~ - m / 2 + 1
(T+C-2)/2
S(T+c)/2-3
s=l
= O(n-1/2),
as m 2 3.
) (12.3.47)
Hence, (12.3.46) converges t o 0 as n -+ co. This completes the proof that for every positive E ,
E ~ { D , I ( O< D , < 6)le
w} --+ o
(12.3.48)
as n -+ co.The proof of the lemma is complete. Next, we note that
n ( 6 , - 8)'W(6,
-
8)
--f
00
in probability as n
-+
ca.
(12.3.49)
As a result, the asymptotic risk of 6, for any 8 $! w goes to 03. Also, n(6, 8 ) ( 6 , - 8)' = Diag(B11,. . . , BTC) - 88' so that the risk of 6, is bounded for ., S+
every 8 E a. Finally, we consider the asymptotic risk equivalence of 8, and 8, as follows:
..S+ -e)'w(e, - S+
n(8,
-8)
-S n(e, - D,)'W(~:- en)+ n(i - ~ D ; ' ) ~ I ( D<, d ) ( 6 , - ij,)'w(&- 6,) -S - 2 4 8 , - e,)'w(e, - 6,)(i - ~ D ; ~ ) I ( D<,d ) -s -s ,. = n(e, - 6,)'w(e, 8,)
=
-
-2n(62 - e,)'w(e,- 6,)(l - dD,l)I(D, < d ) .
(12.3.50)
-S
We observe that 8, is asymptotically risk equivalent to 6, for fixed alternatives. Now, consider the last two terms of (12.3.49) pooled together as [(l- dD,1)2I(D,
< d)D,
x n(6, - 6,)'W
+ 2 n d D 3 1 - dD,')I(D,
< d)]
(6, - 6,)
5 [DB(l - dD,')2I(D,
< d ) + 2 7 4 1 - dD,')I(D,
< d)]Chmi,(WI=,). (12.3.51)
Chapter 12. Discrete Data hdodels
594
The item in the square bracket may be shown to tend to 0 as n + 03 using (12.3.32) and using Lemma 12.3.1 or a small modification of it. Hence, we arrive at the following theorem:
Theorem 1. Under the local alternatives 86 = 80
6, the asymptotic risk of 6,
+ 6,
(i) the asymptotic risk of
becomes unbounded as n -+co,
(ii)
is bounded,
-PT - S
. S+
(iii) the asymptotic risk of 8, , O n , and 6,
are equivalent to that of
6 , as
72 + 00.
12.3.7 Asymptotic Properties of the Estimators under Local Alternatives Now, we consider the asymptotic distributions of under the local alternatives
- P T ,. . S+ 6,, 6,, 8, ,ens,and 8,
K(,) : 8(,) = B0 + T L - ' / ~ [ ,['l = 0 and
e0 E w
( 12.3.52)
t o obtain the required risk comparisons. First, define
where 8'
=
(&+,-.. ,6,+,i3+1,... ,6+,-)'.
(12.3.53)
In this part of the discussion we shall assume the regularity conditions of Chapter 2 borrowed from Agresti (1990) and Fienberg and Holland (1975), and also Ferguson (1996). Then we arrive a t the following theorem. For a proof, see Problem 17.
Theorem 2. Under K ( n )given by (12.3.52) and assumed regularity conditions,
6,) -
(i) X ( n )= n'/2Xi1/2(6n- 8 0 )
Nrc(6, J), where 6 = Xi1/2<;
(ii) Y ( , )= n1/2Xi1/2(8, - -
NrC(J*6, A), where J* = I - K;
(iii) Z(,) =
n1/2-p/2 0
.
(8, - 80) Nrc(K6, K);
595
Product of Multinomiai Models
( p&NZrc{(
l)};
;;):(;
lim P{D, 5 X I K ( ~ )=} H,(q A’), m = (T - l)(c - l),
12-00
where A2 = 6‘J*6; - e,)
lim
12-00
Ixl~(,))
arc(x- K6; 0, K)H,(x;(cx); A’) = J . .. J arc(x- 6 - Z; 0,K)d@(z: 0,A), M(6) =
where M ( 6 ) = {z : ( Z
+ J*S)’(z+ J*6) 2 x;(a)}.
S
f i ~ ; ~ / ~-(e,) 6 ~= x ( ~- ()m - ~ ) Y ( ~ ) ( Y ; Y , J - ~
2 (X + 6 ) - ( m- 2)(Y+ J*6){(Y+ J*6)’(Y+ J*6)}-l, where
( )
tribution as n
N
---$
Nzrc 00;
{ ( ) ( 2 )} :
and + V converges in dis-
Application of Theorem 12.3.2results in the asymptotic distributional bias expressions as
(1) b1(6,) = 0 and Bl(6,) = 0;
(2) b2(6,) = -J’S and Bz(6,) = A’;
. PT
(3) b3(8, ) = -J*~H,+z(x;((Y);A~) and
-S
(4) b4(On) = -(m - ~ ) J * ~ E [ x E ; ~ ( A and ~)]
&(a:)
= ( m- 2 ) 2 A 2 { E [ ~ ~ ~ 2 ( A 2 ) ] } 2 ;
..S+
(5) bj(On ) = -(m - ~ ) J * ~ E [ x L : ~ ( A ~ ) ]
rn2-) 2)} and -J*dE{(l - ( m- 2 ) - 1 ~ ~ ~ 2 ( A 2 ) ) 1 ( ~ ; + 2<( A
596
Chapter 12. Discrete Data Models
597
12.3. Product of Multinomjal Models
12.3.8 Analysis of the Asymptotic Properties of the Estimators We consider the asymptotic risk analysis of the five estimators.
Unrestricted and Restricted Estimators. Consider ARRE(6, : 6,) = R z ( e n : W ) 21 Ri(8":W) <
according as ( S ' J * ' W J * S ) ~ t r [ W A(> ] 0),
(12.3.574
<
which implies that 6, works well near the origin. When 6 moves away from the origin, the risk of 6 , is unbounded while that of 6, is constant. -PT
Unrestricted, restricted and PTE. Here we have ARRE(8, R3(6:T:W) Ri(On:w)
-
: 8,) =
$1 according as
- PT works well near the origin. Under Ho,we have
Here also, 8,
. PT
Rz(6, : W ) 5 RS(8,
: W)
5 R l ( 6 , : W).
(12.3.58)
Note that J, J " , and A are all idempotent matrices when the rank of A is m and the rank of J* is greater than m. Thus, by Courant's theorem, 6'J"wJ*6 5 Ch,,,(W) for all S. Consequently, let W be the class of W 6'5=6 matrices such that
(12.3.59) Then we have -S
R4(8n
' w, <- 1
R,(6, : W)
'd G,and W E W .
(12.3.60)
We may note the choice of W based on (12.3.59) in the loss function for the -S dominance of 8, over 8, for all m > 3. If, in particular, W = I,, or J, then ChmaX(W)= 1, and the rank of I,,A is m and the rank of JA 2 m- 1. In this
Chapter 12. Discrete Data Models
598
case, W E W . The dominance may not hold for arbitrary choice of W because the condition (12.3.59) may not hold simultaneously for all A (i.e., K ) , and -S
in such cases, the Stein-type estimator 8, works with some modification due to Berger et al. (1977) as . MS+
8,
=
6, + (1 - cdnD,lw-lE:,1}(6,
- 6,),
(12.3.61)
where c is the shrinkage constant and d, = C I L ~ ~ , ( W - ~ / ~ E ~ ~ ) .
As for the Stein-type estimator, en,it has smaller risk than 8, uniformly for all A2 by (12.3.59). However, for the comparison of risk of the restricted -S estimator en with that of 8,, we first note that -S
+ tr[WA]
R4(6: : W) = &(en : W)
-
(S‘J*’WJ*d)
- (m - 2) tr[WAl{(m - 2)E[X;:2(A2)j
1
+
+
[l-
(m 2)(S*J*’WJ*S) ~A’E[X;:~ (A2)]. (12.3.62) 2(S’J*S) t r [WA]
Under Ho, (12.3.62) becomes
R4(6: : W) = R2(6,, : W) + - tr[WA] L 0. 2 m
-S
Hence, asymptotic risk of 8, is greater than that of
6,
(12.3.63)
under Ho. However, -s
if A2 diverts from 0, the risk of 6 , grows bigger while risk of 8, becomes -S
6,
smaller and 8, dominates -S
neither 8, nor
6,
outside an interval around the origin 0. Thus,
dominate each other completely.
-5-
c.
PT
In the case of 8, and 8,
we have under Ho,
..PT
-s
+
R 4 ( 8 , : W) = Rs(8, : W) tr[WA]{l - H,+z(x:(a) 2 . PT + -tr[WA] 2 R3(8, : W ) V m 2 3. m
: 0))
(12.3.64)
The PTE has smaller ADQR than that of 6; at A’ = 0, and 6: does not - PT dominate 8, when independence hypothesis holds. Clearly, under Ho the risk ordering is as follows:
R2(6, : W) 5
: W)
5 &(6:
: W)
5 R1(8, : W).
(12.3.65)
The picture changes as soon as A2 moves away from 0, namely (1) the risk of 6 , remains constant, (2) the risk of 6 , becomes unbounded, (3) the risk . PT of 8, grows t o a maximum and then drop toward the risk of 6,, and (4)
599
12.4. Conclusions -S
the risk of 8, merges to the risk of 6 , as A' -+ 00 from below. It may be -S ,. PT calculated that the risk of 8, is greater than that of 8, whenever 2hrn+z(x;(a)
: 0)
2
I+ a, m
(12.3.66)
where hm+z(-;0) is the pdf of Hrn+2(.: 0) with (m + 2) d.f. and Q is the level of the preliminary test of significance. As soon as (12.3.66) is violated, the ordering of the risks becomes
~ ' ( 6 , w) I ~ ~ ( :6w) : 5 R3(e, : w). *
PT
-s
(12.3.67)
,.PT
Finally, we conclude that for m 2 3, 8, is a prefered estimator than 8, , PT while for m 5 2, 8, is preferable.
-s+ -s
As regards, 8, , 8 , and
a,, the asymptotic risks may be ordered as follows:
Rs(h:+ : W) 5 R4(6: : W) 5 Rl(6, : W) 'd A'. Hence, among the five estimators of 8,6;'
12.4
(12.3.68)
is the most preferred for m 2 3.
Conclusions
In this chapter, we considered three discrete models related to the baseball data analysis, meta-analysis and T x c contingency tables. Asymptotic properties of the point estimators were studied, and the usual conclusions on the dominance properties of the estimators were arrived at.
12.5 Problems 1. (Refer to Section 12.1.2). Verify that the joint density of (8',Y')'given (K177)is
2. (Continuation.) Verify that the marginal distribution of Y given ( K ,77) is
3. (Continuation.) Verify that the posterior distribution of 8 given (Y',K)' is
cl'
.(e,Ylh',77)ml(YI~,71)~7-]( 77 1)1-1d77,
where C is the constant of the integration.
Chapter 12. Discrete Data Models
600 4. Verify, based on Problem 3, that
and
=E
+ Var[E(OzIY,77)1
[ss)+
(b) Var(0,IY) = E[Var(4IY, q)l
(1 -
(n, K - 1)-1]
+ V a rn,+K [ m
I
.
5. Show that as K -+ co,the posterior distributioii of 77 is approximately a beta distribution with parameters ng and n(1 - jj). 6. Verify that limK,, E(0,IY) = IimK,, G(1-G). limK,, Var(qlY) = n+l
E(q1Y) = jj and limK,,
Var(0,IY) =
7. Verify that limK-o E(0,IY) = limK-o E(q1Y) = g and 1imK-o Var(0,jY) = limK-0 Var(7jY) = G(1-8). n+l
8. Show that the empirical Bayes estimators of 8 can be written as
where g(Dn) = min(1, ( p - 3)D;').
9. Prove Theorem 12.1.2. 10. Refer t o Section 12.1.4. Verify the expressions for ADB, ADQB, ADMSE, and ADQR from (i) through (v). 11. Prove that the asymptotic variance of the three estimators given at (12.2.8) iS $:W-l, W = W1 f * . ' + wk.
12. Prove Theorem 12.2.2. 13. Refer to Section 12.2.4. Verify the expressions for ADB, ADQB, ADMSE, and ADQR from (i) through (v).
14. Refer t o Section 12.1.7. Verify the expressions for the asymptotic coverage probabilities of C*(8:).
15. Show that M ( n l K ) N Crm/2exp {
-
i G i } as n + co,where
See formula a t (12.3.33).
16. Refer to (12.3.27b). Show that X(K) 17. Prove Theorem 12.3.2.
(1-dDZ1)I(D, > d) as n -+ ca.
References 1. Adichie, J. N. (1967). Estimates of regression parameters based on ranks. An. Math. Statist. 38:894-904. 2. Agresti, A. (1990). Categorical Data Analyszs. Wiley, New York. 3. Ahmed, S. E., and Saleh, A. K. Md. E. (1988). Estimation strategy using a preliminary test in some normal models. Soochow J. of Math. 14:135165. 4. Ahmed, S. E., and Saleh, A. K. Md. E. (1990). Estimation strategies for intercept vector in a simple multivariate regression model. Comput. Statist. Data Anal., 10:193-206. 5. Ahmed, S. E., and Saleh, A. K. Md. E. (1993). Improved estimation for the component mean vector. J. of the Jpn. Statist. SOC.43:177-195. 6. Ahmed, S. E., and Saleh, A. K. Md. E. (1999). Improved nonparametric estimation of location vectors in multivariate regression models. J. Nonparumetric Statist. 11:51-78. 7. Ahsanullah, M. (1971), On the estimation of means in a bivariate normal distribution with equal marginal variances. Biometrika 58:23@233. 8. Ahsanullah, M., and Saleh, A. K. Md. E. (1972), Estimation of intercept in a linear regression model with one dependent variable after a preliminary test on the regression coefficient, Int. Statist. Rev. 40:139-145. 9. Akritas, M. G., and Johnson, R. A. (1982a). Efficiencies of tests and estimators in autoregression under onnormal error distribution. An. Znst. Math. Statist, A34:579-589. 10. Akritas, hl. G., and Johnson, R. A. (1982b). Asyptotic inference in continuous time, diffusions and Gaussian processes with known covariance. J. Mult. Anal. 12:123-135. 11. Akritas, M., Saleh, A. K. Md. E. and Sen, P. K. (1985). Nonparametric estimation of intercepts after a preliminary test on parallelism of several regression lines. Baostatistics: Statistics in Bzomedical, Public Health and Environmental Sciences Ed. P. K. Sen, Elsevier Science, North-Holland, pp. 221-235. 12. Albert, J. H. (1984). Empirical Bayes estimation of a set of binomial probabilities. J. Statist. Comput. Szmulation 20:129-144. 13. Albert, J. H. (1987). Empirical Bayes estimation in contingency tables. Commun. Statist.-Theory Meth., 16:2459-2485. 14. Albert, J.H., and Gupta, A.K. (1981). Bayesian methods for binomial data with applications to a nonresponse problem. Technical report, Department of Mathematics and Statist., Bowling Green State University.
601
602
References
15. Albert, J. H., and Gupta, A. K. (1982). Mixture of Dirichlet distributions and estimation in contingency tables. A n . Statist. 10:1260-1268. 16. Albert, J. H., and Gupta, A. K. (1983a). Bayes estimation methods for 2 x 2 contingency tables using mixtures of Dirichlet distributions. J. Amer. Statist. ASSOC.,78:708-717. 17. Albert, J. H., and Gupta, A. K. (198313). Estimation in contingency tables using prior information. J . Roy. Statist. SOC.B 45:60-69. 18. Ali, M. Abdunnabi. (1990). Interface of preliminary test approach and empirical Bayes approach to shrinkage estimation. Ph.D. thesis. Carleton University, Ottawa, Canada. 19. Ali, M. Abdunnabi, and Saleh, A. K. Md. E. (1991). Estimation of means and treatment effects in a one-way ANOVA model. Soochow J. of Math. 17:287309. 20. Ali, hl. Abdunnabi, and Saleh, A. K. Md. E. (1991a). Asymptotic Theory for Simultaneous Estimation of Binomial Means. Statist. Sinica 1:271-294. 21. Ali, A.M., and Saleh, A.K.Md.E. (1991b). Preliminary test and empirical Bayes approach to shrinkage estimation of regression parameters. J. Jpn. Statist. SOC.,21(1):401-416. 22. Anderson, T. W. (1984). Introductzon to Multivariate Analysis. Wiley, New York. 23. Anderson, D. R., Sweeney, D. J., and Williams, T. A. (1993). Statistics for Business and Economica, 5th ed., West Publishing, Boulder, CO. 24. Arnold, Steven F. (1981). The Theory of Linear Models and Multivariate Analysis. Wiley, New York. 25. Asano, C., and Sato, S. (1962). A Bivariate Analogue of Pooling Data. Bull, Math. Statzst. 10:39-59. 26. Baksalary, J. K., and Kala, R. (1983). Partial ordering between matrices one of which is of rank one. Bull. Polish Acad. of Sci., Math., 31:5-7. 27. Bancroft, T. A. (1944). On biases in estimation due to the use of preliminary tests of significance. An. Math. Statist., 15:190-204. 28. Bancroft, T. A. (1964). Analysis and inference for incompletely specified models involving the use of preliminary test(s) of significance. Biometn’cs 20:427-442. 29. Bancroft, T. A. (1965). Inference for incompletely specified models in the physical sciences (with discussion). Bull. ISI, Proc. 35th Session, 41(1):497-515. 30. Bancroft, T.A., and Han, C.-P. (1977). Inference based on conditional specification: A note and a bibliography. ISI Rev. 45:117-127. 31. Baranchik, A. (1970). A family of minimax estimators of the mean of a multivariate normal distribution. An. Math. Stat. 41542-645. 32. Bends, N. (1996). Pre-test estimation and design in linear model. J. Statist. Plan and Infer. 52:225-240. 33. Bennett, B. M. (1952). Estimation of means on the basis of preliminary tests of significance. An. Inst. Stat. Math. 4 31-43. 34. Bennett, B.M. (1956). On the use of preliminary tests in certain statistical procedures, An. Inst. Stat. Math. 8:45-57. 35. Berenblutt, I. I., and Webb, G. I. (1973). A new test for autocorrelated errors in the linear regression model. J. Roy. Statist. SOC.B35:33-50.
R,eferences
603
36. Berger, J. 0. (1976). Admissible minimax estimation of a multivariate normal mean with arbitrary quadratic loss. An. Statist. 4:223-226. 37. Berger, J. 0. (19804. Statistical Decision Theory. Springer-Verlag, New York. 38. Berger, 3 . 0 . (1980b). A robust generlized Bayes estimator and confidence region for a multivariate normal mean. An. Statist. 8:716-761. 39. Berger, J. 0. (1985). Statistical Decision Theory and Bayesian Analysis, (2nd edition). Springer-Verlag, New York. 40. Berger, J.O., Bock, M E . , Brown, L.D., Casella, G., and Gleser, L. (1977). Minimax estimation of a normal mean vector for arbitrary quadratic loss and unknown covariance matrix. An. Statist. 5:736771. 41. Berry, J. C. (1994). Improving the James-Stein estimator using the Stein variance estimator. Statist. Prob. Lett. 20:241-245. 42. Bickel, P. J., and Doksum, K. A. (2001). Mathematical Statistzstics: Basic Ideas, Vol. 1. Holden-Day, Oakland, CA 43. Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975). Discrete Multivariate Analysis. MIT Press, Cambridge. 44. Bock, M. E. (1988). Shrinkage estimators: pseudo-Bayes rules for normal mean vectors, Vol 1. Proc. Fourth Purdue Symp. Stat. Dec. Theo. Rel. Topics, (eds. S. S . Gupta and J. Berger) Springer Verlag, New York. 45. Bock, M. E., Yancey, T. A., and Judge, G. G. (1973). The statistical consequence of preliminary test estimators in regression. J. Amer. Statist. Assoc. 68:109-1 16. 46. Bolfaine, H., and Zacks, S. (1992). Prediction Theory f o r Finite Population. Springer Verlag, New York. 47. Bozivich, H., Bancroft, T.A., and Hartley, H.O. (1956). Power of analysis of variance test procedure for certain incompletely specified models. An. Math. Statist. 27:1037-1043. 48. Brewster, J.F., and Zidek, J. V. (1974). Improving on equivariant estimators. An. Statist. 2:21-38. 49. Brockwell, P. J., and Davis R.A. (1996). Introduction to Times Series and Forecasting. Spinger, New York. 50. Brown, L.D. (1966). On the admissibility of invariant estimators of one or more location parameters. An. Math. Statist. 37:1087-1136. 51. Brown, L. D. (1988). The differential inequality of a statistical estimation problem. Proc. Fourth Purdue Symp. Stat. Dec. Theo. Rel. Topics, Vol 1 (eds. S.S. Gupta and J. Berger). Springer Verlag, New York. 52. Casella, G. (1985). An introduction to empirical Bayes data analysis. J. Amer. Statist. Assoc. 39 (2):83-87. 53. Casella, G., and Berger, R.L. (1990). Statist. inference. Duxbury Press. Belmount, California. 54. Casella, G., and Hwang, J. T. (1983). Empirical Bayes confidence sets for the mean of a multivariate normal distribution, J. of Amer. Statist. Assoc., 78:688698. 55. Casella, G., Hwang, J.T. (1986). Confidence sets and the Stein-effect, Com. Stat. Theo. Meth. 15:2043-2063. 56. Casella, G., and Hwang, J.T. (1987). Employing vague prior information in the construction of confidence sets. J . Mult. Anal., 21:7%104.
604
R,eferences
57. Cellier, D., Fourdrinier, D., and Robert, C. (1989). Robust shrinkage estimators of the location parameter for elliptically symmetrical distributions. J . Mult. Anal. 29:39-52. 58. Chen, J., and Hwang, J. T. (1988). Improved set estimators for the coefficients of a linear model when the error distribution is spherically symmetric with unknown variances. Can. J. Statist. 16:293-299. 59. Chen E.J., and Saleh, A. K. Md. E. (1993). Estimation of regression parameters when the errors are autocorrelated. Proc. 3rd Pacific Area Stat. Conference Stat. Sci. and Data Analysis. VSP Utrect, The Netherlands (eds. Matrusita, Puri and Hawakawa) 61-76. 60. Chiou, P. C., and Saleh, A. K. Md. E. (2002). Preliminary test confidence sets for the mean of a multivariate normal distribution. J. Prop. Prob. Statist. 2: 177-189. 61. Cochrane, D., and Orcutt, G. H. (1949). Application of least squares regression to relationships containing autocorrelated error terms. J. Amer. Statist. Assoc., 44:32-61. 62. Cohen, A. (1965). Estimates of linear combinations of the parameters in the mean vector of a multivariate distribution. An. Math. Statist. 36:78-87. 63. Cohen, A,, and Strawderman, W. (1973). Admissibility implications for different criteria in confidence estimation. An. Stat. 1:363-366. 64. Cramer, H. (1946). Mathematical Methods of Statzstics. Princeton University Press, Princeton. 65. Deely, J., and Lindley, D. (1981). Bayes empirical Bayes. J. Amer. Statist. ASSOC.76:833-841. 66. Dempster, A.P., Schatzoff, M., and Wermuth, N. (1977). A simulation study of alternatives to ordinary least squares. J. Amer. Statist. Assoc. 72:77-91. 67. Doob, J.L. (1953). Stochastic Processes. Wiley, New York. 68. Durbin, J. (1960). Estimation of parameters in time series regression models. J . Roy. Statist. SOC.B22:139-153. 69. Durbin, J., and Watson, G.S. (1951). Testing for serial correlation in least squares regression. I. Biometrzka 37409-428. 70. Durbin, J., and Watson, G.S. (1950). Testing for serial correlation in least squares regression. 11. Biornetrika 38:159-177. 71. Dzhaparidze, K. (1986). Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series. Springer Verlag, New York. 72. Efron, B. (1975). Bias versus unbiased estimation. Adv. Math. 16:259-277. The Statistical Century. RSS News 22(5):1-2. 73. Efron, B., and Morris, C. (1972). Limiting the risk of Bayes and empirical Bayes estimators - Part 11: The empirical Bayes case. J . Amer. Statist. Assoc. 67:130-139. 74. Efron, B., and Morris, C. (1973). Stein’s estimation rule and its competitors An empirical Bayes approach. J. Amer. Statist. Assoc., 68:117-130. 75. Efron, B., and Morris, C. (1975). Data analysis using Stein’s estimator and its generalizations. J. Amer. Statist. Assoc. 70:311-319. 76. Efron, B., and Morris, C. (1977). Stein’s paradox in statistics. Sci. A m . 236(5):119-127.
References
605
77. Faith, R. E. (1976). Minimax Bayes set and point estimators of a multivariate normal mean. Tech. Report 66, University of Michigan, Ann Arbor. 78. Farebrother, R. W. (1975). The minimum mean square error linear estimator and ridge regression. Technometrics, 17:127-128. 79. Feller, W. (1954). A n Introduction to Probability Theory and its Applications. Wiley, New York. 80. Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretical Approach. Academic Press, New York. 81. Ferguson, T. S. (1996). A course in Large Sample Theory. Chapman and Hall, New York. 82. Fuller, W. (1976). Introduction to time series. Wiley, New York. 83. Gart, J. J. (1962). On the combination of relative risks. Biometrica 18:471-475. 84. Gart, J. 3. (1992). Pooling 2 x 2 tables: Asymptotic moments of estimators. J . Roy. Statist. SOC.B54:531-539. 85. Ghosh, M., Hwang, J., and Tsui, K. (1983). Construction of improved estimators in multiparameter estimation for discrete exponential families. Discussion by J. 0. Berger, H. Malcolm Hudson and Carl Morris. An. Statist. 11:351-376. 86. Ghosh, M., Saleh, A. K. Md. E., and Sen, P. K. (1989). Empirical Bayes subset estiation in regression models. Stat. Dec., 7:15-35. 87. Gibbons, D. G. (1981) A simulation study of some ridge estimators. J. Amer. Stat. ASSOC.,76:131-139. 88. Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press. 89. Griliches, Z., and Rao, P. (1969). Small sample properties of several two-stage regression methods in the context of autocorrelated errors. J. Amer. Stat. AsSOC., 64:253-272. 90. Gruber, hl.H.J. (1998). Zmprowing Eficiency b y Shrinkage: The James-Stein and Ridge Regression Estimators. hlarcel Dekker, New York. 91. Gupta, A.K., Saleh, A. K. Md. E., and Sen, P. K. (1989). Improved estimation in a contingency table: Independence structure. J. Amer. Stat. Assoc., 84:525532. 92. Gupta, A.K., and Saleh, A. K. Md. E. (1997). Estimation Odds-Ratio: Homogeneity Constraints. J. Ital. Stat. SOC.6(1):67-81. 93. Hajek, J. (1969). Nonparametric Statistics. Holden Day, San Francisco. 94. Hajek, J., and Sidak, Z. (1967). Theory of Rank Tests. Academic Press, New York. 95. Hajek, J., Sidak, Z., and Sen, P.K. (1999). Theory of Rank Tests. Academic Press, New York. 96. Hald, A. (1952). Statistical Theory with Engineering Applications. Wiley, New York. 97. Hall, P., and Heyde, C. C. (1980). Martingale limit theory and its applications. Academic Press, New York. 98. Hallin, M., and Puri, M. L. (1988). Optimal rank-based procedures for timeseries analysis: testing an ARMA model against other ARMA models. An. Stat&. 16:402-432. 99. Hannan, E. J. (1970). Multiple Time Series. Wiley, New York.
100. Han, C. P., and Bancroft, T. A. (1968). On pooling means when variance is unknown. J. Amer. Statist. Assoc. 63:1333-1342.
101. Haq, M. S., and Kibria, B. M. G. (1996). A shrinkage estimator for the restricted linear regression model: Ridge regression approach. J. Appl. Stat. Sci. 3(4):301-316.
102. Hemmerle, W. J., and Brantle, T. F. (1978). Explicit and constrained generalized ridge estimation. Technometrics 20:109-120.
103. Hocking, R. R., Speed, F. M., and Lynn, M. J. (1976). A class of biased estimators in linear regression. Technometrics 18:425-438.
104. Hodges, J. L., and Lehmann, E. L. (1963). Estimates of location based on rank tests. An. Math. Statist. 34:598-611.
105. Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. An. Math. Statist. 19:293-325.
106. Hoerl, A. E., and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12:55-67.
107. Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge regression: Some simulations. Commun. Statist. 4:105-123.
108. Hoffmann, K. (1992). Improved Estimation of Distribution Parameters: Stein-type Estimators. Teubner Verlagsgesellschaft, Stuttgart.
109. Hogg, R. V., McKean, J. W., and Craig, A. T. (2005). Introduction to Mathematical Statistics (7th ed.). Prentice Hall, NJ.
110. Huntsberger, D. V. (1955). A generalization of a preliminary testing procedure for pooling data. An. Math. Statist. 26:734-743.
111. Hušková, M. (1971). Asymptotic distribution of rank statistics used for multivariate testing of symmetry. J. Mult. Anal. 1:461-484.
112. Hwang, J. T. (1985). Universal domination and stochastic domination: Estimation simultaneously under a broad class of loss functions. An. Statist. 13:295-315.
113. Hwang, J. T., and Casella, G. (1982). Minimax confidence sets for the mean of a multivariate normal distribution. An. Statist. 10:868-881.
114. Hwang, J. T., and Casella, G. (1984). Improved set estimators for a multivariate normal mean. Stat. Decisions, Suppl. 1:3-16.
115. Hwang, J. T., and Chen, J. (1986). Improved confidence sets for the coefficients of a linear model with spherically symmetric errors. An. Statist. 14:444-460.
116. Hwang, J. T., and Ullah, A. (1989). Confidence sets recentered at James-Stein estimators - a surprise concerning the unknown variance case. Technical Report, Mathematics Department, Cornell University.
117. Inoue, T. (2001). Improving the "HKB" ordinary type ridge estimators. J. Jpn. Stat. Soc. 31(1):67-83.
118. James, W., and Stein, C. (1961). Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Statist. Prob., University of California Press, Vol. 1, pp. 361-379.
119. Joshi, V. M. (1967). Inadmissibility of the usual confidence set for the mean of a multivariate normal population. An. Math. Statist. 38:1868-1875.
120. Joshi, V. M. (1969). Admissibility of the usual confidence sets for the mean of a univariate or bivariate normal population. An. Math. Statist. 40:1042-1067.
121. Johnson, N., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate Distributions, Vol. 1. Wiley, New York.
122. Judge, G. G., and Bock, M. E. (1978). The Statistical Implications of Pre-test and Stein-rule Estimators in Econometrics. North-Holland, Amsterdam.
123. Jurečková, J. (1969). Asymptotic linearity of a rank statistic in regression parameter. An. Math. Statist. 40:1889-1900.
124. Jurečková, J., and Sen, P. K. (1996). Robust Statistical Procedures: Asymptotics and Interrelations. Wiley, New York.
125. Kakwani, T. (1968). Note on the unbiasedness of a mixed regression estimator. Econometrica 36:610-611.
126. Kale, B. K., and Bancroft, T. A. (1967). Inference for some incompletely specified models involving normal approximations to discrete data. Biometrics 23:335-348.
127. Kendall, M. G., and Stuart, A. (1963). The Advanced Theory of Statistics, 2 vols. Hafner Publishing, New York.
128. Khan, Bashir Ul. (1997). Some contributions to positive part shrinkage estimation in various models. Ph.D. thesis, University of Regina, Regina, Canada.
129. Khan, Bashir Ul., and Saleh, A. K. Md. E. (2005). Estimation of regression parameters: Parallelism restriction. J. Statist. Theory Appl. 4:91-107.
130. Khan, S., and Saleh, A. K. Md. E. (1997). Shrinkage pre-test estimator of the intercept parameter for a regression model with multivariate Student's t-errors. Biomed. J. 2:131-147.
131. Khan, S., and Saleh, A. K. Md. E. (2001). On the comparison of pre-test and shrinkage estimators for the univariate normal mean. Stat. Papers 42(4):451-473.
132. Khan, S., Hoque, Z., and Saleh, A. K. Md. E. (2002). Estimation of the slope parameter for linear regression model with uncertain prior information. J. Stat. Res. 36(2):55-74.
133. Ki, F., and Tsui, K. (1985). Improved confidence set estimators of a multivariate normal mean and generalizations. An. Inst. Statist. Math. 37:487-498.
134. Kibria, B. M. G. (1996a). On preliminary test ridge regression estimator for the restricted linear model with non-normal disturbances. Commun. Statist. Theory Meth. 25:2349-2369.
135. Kibria, B. M. G. (1996b). On shrinkage ridge regression estimators for restricted linear models with multivariate t disturbances. Student 1(3):177-188.
136. Kibria, B. M. G. (2003). Performance of some new ridge regression estimators. Commun. Statist. Simul. Comput. B32:429-435.
137. Kibria, B. M. G., and Saleh, A. K. Md. E. (1993). Performance of shrinkage preliminary test estimator in regression analysis. Jahangirnagar Review A17:133-148.
138. Kibria, B. M. G., and Saleh, A. K. Md. E. (2003). Effect of W, LR, and LM tests on the performance of preliminary test ridge regression estimators. J. Jpn. Stat. Soc. 33:119-136.
139. Kibria, B. M. G., and Saleh, A. K. Md. E. (2004a). Preliminary test ridge regression estimators with Student's t errors and conflicting test-statistics. Metrika 59:105-124.
140. Kibria, B. M. G., and Saleh, A. K. Md. E. (2004b). Performance of positive rule ridge regression estimators for the ill-conditioned Gaussian regression models. Calcutta Stat. Assoc. Bull. 55, to appear.
141. Kim, P. T. (1987). Recentered confidence sets for the mean of a multivariate distribution when the scale parameter is unknown. Ph.D. thesis, Department of Mathematics, University of California at San Diego.
142. Kim, H. M., and Saleh, A. K. Md. E. (2003). Preliminary test estimators of the parameters of simple linear model with measurement errors. Metrika 57:223-251.
143. Kim, H. M., and Saleh, A. K. Md. E. (2004). Improved estimation of regression parameters in measurement error model. J. Mult. Anal. 95:273-300.
144. Kitagawa, T. (1963). Estimation after preliminary tests of significance. UC Publ. Stat. 3:147-186.
145. Kmenta, J., and Gilbert, R. F. (1968). Small sample properties of alternative estimators of seemingly unrelated regressions. J. Amer. Statist. Assoc. 63:1180-1200.
146. Kmenta, J., and Gilbert, R. F. (1970). Estimation of seemingly unrelated regressions with autoregressive disturbances. J. Amer. Statist. Assoc. 65:186-197.
147. Koul, H. L. (1977). Behavior of robust estimators in the regression model with dependent errors. An. Statist. 5:681-699.
148. Koul, H. L., and Saleh, A. K. Md. E. (1993). R-estimation of the parameters of autoregressive AR[p] models. An. Statist. 21:534-551.
149. Koul, H. L., and Saleh, A. K. Md. E. (1995). Autoregression quantiles and related rank-scores processes. An. Statist. 23:670-689.
150. Kubokawa, T. (1991). An approach to improving the James-Stein estimator. J. Mult. Anal. 36:121-126.
151. Lambert, A., Saleh, A. K. Md. E., and Sen, P. K. (1985). On least squares estimation of intercept after a preliminary test on parallelism of regression lines. Commun. Statist. Theory Meth. 14(4):793-807.
152. Lawless, J. F. (1978). Ridge and related estimation procedures. Commun. Statist. A7:139-164.
153. Lawless, J. F., and Wang, P. (1976). A simulation study of ridge and other regression estimators. Commun. Statist. A5:307-323.
154. Lindley, D. (1962). Discussion of Professor Stein's paper, "Confidence sets for the mean of a multivariate normal distribution". J. Roy. Statist. Soc. B24:265-296.
155. Lindley, D., and Smith, A. F. M. (1972). Bayes estimates for the linear model (with discussion). J. Roy. Statist. Soc. B34:1-41.
156. Lu, K., and Berger, J. (1989). Estimated confidence procedures for multidimensional normal means. J. Stat. Plann. Inf. 23(1):1-20.
157. Maatta, J. M., and Casella, G. (1990). Variance estimation. Statist. Sci. 5:107-109.
158. Malthouse, E. C. (1999). Shrinkage estimation and direct marketing scoring model. J. Interactive Market. 13(4):10-23.
159. Marquardt, D. W., and Snee, R. D. (1975). Ridge regression in practice. Amer. Statistician 29:3-20.
160. McDonald, G. C., and Galarneau, D. I. (1975). A Monte Carlo evaluation of some ridge-type estimators. J. Amer. Statist. Assoc. 70:407-416.
161. Montgomery, D. C., Peck, E. A., and Vining, G. G. (2001). Introduction to Linear Regression Analysis, 3rd ed. Wiley, New York.
162. Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and applications. J. Amer. Statist. Assoc. 78:47-65.
163. Mosteller, F. (1948). On pooling data. J. Amer. Statist. Assoc. 43:231-242.
164. Nagar, A. L., and Kakwani, N. C. (1965). Note on the use of prior information in statistical estimation of economic relations. Sankhya A27:105-112.
165. Newhouse, J. P., and Oman, S. D. (1971). An Evaluation of Ridge Estimators. Rand Corporation, P-716-PR.
166. Obenchain, R. L. (1975). Ridge analysis following a preliminary test of the shrunken hypothesis. Technometrics 17:431-441.
167. Prais, S. J., and Winsten, J. A. (1954). Trend estimators and serial correlation. Cowles Commission Discussion Paper 383, Chicago.
168. Puri, M. L., and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis. Wiley, New York.
169. Puri, M. L., and Sen, P. K. (1986). Nonparametric Methods in General Linear Models. Wiley, New York.
170. Rao, C. R. (1973). Linear Statistical Inference and Its Applications. Wiley, New York.
171. Rao, C. R. (1975). Simultaneous estimation of parameters in different linear models with applications to biometric problems. Biometrics 31:545-554.
172. Randles, R. H., and Wolfe, D. A. (1979). Introduction to the Theory of Nonparametric Statistics. Wiley, New York.
173. Rényi, A. (1970). Foundations of Probability. Holden-Day, San Francisco.
174. Robbins, H. E. (1955). An empirical Bayes approach to statistics. Proc. 3rd Berkeley Symp. Math. Statist. Prob. 1:157-163.
175. Ruberg, S. J., and Stegeman, J. W. (1991). Pooling data for stability studies: Testing the equality of batch degradation slopes. Biometrics 47:1059-1069.
176. Robert, C., and Casella, G. (1987). Improved confidence sets in spherically symmetric distributions. J. Mult. Anal. 22:300-315.
177. Robert, C. P., and Saleh, A. K. Md. E. (1991). Point estimation and confidence set estimation in a parallelism model: An empirical Bayes approach. An. Econ. Statist. 23:65-89.
178. Rodrigue, J. (1987). Some shrunken predictors in finite population with a multinormal superpopulation model. Stat. Prob. 5:347-351.
179. Rohatgi, V. K., and Saleh, A. K. Md. E. (2001). An Introduction to Probability and Statistics. Wiley, New York.
180. Saleh, A. K. Md. E. (1992). On shrinkage estimation of the parameters of an autoregressive Gaussian process. Theory Prob. Appl. 37:290-300.
181. Saleh, A. K. Md. E. (2003). Asymptotic properties of Stein-type confidence sets - a U-statistics approach. Sankhya 65:1-11.
182. Saleh, A. K. Md. E., and Han, C.-P. (1990). Shrinkage estimation in regression analysis. Estadistica 42:40-43.
183. Saleh, A. K. Md. E., and Hassanein, K. M. (1986). On various F-estimation in a parallelism problem. Soochow J. Math. 12:83-94.
184. Saleh, A. K. Md. E., Hassanein, K. M., Hassanein, R. S., and Kim, H. M. (2006). Quasi-empirical Bayes methodology for improving meta-analysis. J. Biopharmaceutical Stat. 16:77-90.
185. Saleh, A. K. Md. E., and Kibria, B. M. G. (1993). Performances of some new preliminary test ridge regression estimators and their properties. Commun. Statist. Theory Meth. 22:2747-2764.
186. Saleh, A. K. Md. E., and Sen, P. K. (1978). Nonparametric estimation of location parameter after a preliminary test on regression. An. Statist. 6:154-168.
187. Saleh, A. K. Md. E., and Sen, P. K. (1983). Nonparametric tests of location after a preliminary test on regression in the multivariate case. Commun. Statist. Theory Meth. 12(16):1855-1872.
188. Saleh, A. K. Md. E., and Sen, P. K. (1984). Least squares and rank order preliminary test estimation in general multivariate linear models. Proc. ISI Golden Jubilee Conf. on Statist.: Applications and New Directions (Dec. 16-19, 1981), pp. 237-253.
189. Saleh, A. K. Md. E., and Sen, P. K. (1984). Nonparametric preliminary test inference. Handbook of Statist. 4 (eds. P. R. Krishnaiah and P. K. Sen), North-Holland, Amsterdam, pp. 275-297.
190. Saleh, A. K. Md. E., and Sen, P. K. (1985). On shrinkage M-estimators of location parameters. Commun. Statist. Theory Meth. 14(10):2313-2329.
191. Saleh, A. K. Md. E., and Sen, P. K. (1985a). Nonparametric shrinkage estimation in a parallelism problem. Sankhya 47A:156-165.
192. Saleh, A. K. Md. E., and Sen, P. K. (1985b). Preliminary test prediction in general multivariate linear models. Proc. Pacific Area Statistical Conf. (ed. M. Matusita) (Dec. 15-19, 1982), North-Holland, Amsterdam, pp. 619-638.
193. Saleh, A. K. Md. E., and Sen, P. K. (1985c). Shrinkage least squares estimation in a general multivariate linear model. Proc. 5th Pannonian Symp. on Math. Statistics (eds. J. Mogyorodi, I. Vincze, and W. Wertz), pp. 307-325.
194. Saleh, A. K. Md. E., and Sen, P. K. (1985). On shrinkage least squares estimation in a parallelism problem. Commun. Statist. Theory Meth. 15:1451-1466.
195. Saleh, A. K. Md. E., and Sen, P. K. (1986). On shrinkage R-estimation in a multiple regression model. Commun. Statist. Theory Meth. 15(7):2229-2244.
196. Sarker, N. (1992). A new estimator combining the ridge regression and the restricted least squares methods of estimation. Commun. Statist. Theory Meth. 21:1987-2000.
197. Sclove, S. L., Morris, C., and Radhakrishnan, R. (1972). Non-optimality of preliminary test estimators for the mean of a multivariate normal distribution. An. Math. Statist. 43:1481-1490.
198. Sen, P. K. (1969). On a class of rank order tests for the parallelism of several regression lines. An. Math. Statist. 40:1668-1683.
199. Sen, P. K. (1970). On some convergence properties of one sample rank order statistics. An. Math. Statist. 41:2140-2143.
200. Sen, P. K. (1980). On almost sure linearity theorems for signed rank statistics. An. Statist. 8:313-321.
201. Sen, P. K. (1981). Sequential Nonparametrics: Invariance Principles and Statistical Inference. Wiley, New York.
202. Sen, P. K. (1984). A James-Stein type detour of U-statistics. Commun. Statist. Theory Meth. 13(22):2725-2747.
203. Sen, P. K. (1986). On the asymptotic distributional risks of shrinkage and preliminary test versions of maximum likelihood estimators. Sankhya A48:354-371.
204. Sen, P. K., and Saleh, A. K. Md. E. (1979). Nonparametric estimation of location parameter after a preliminary test on regression in the multivariate case. J. Mult. Anal. 9(2):322-331.
205. Sen, P. K., and Saleh, A. K. Md. E. (1985). On some shrinkage estimators of multivariate location. An. Statist. 13:272-281.
206. Sen, P. K., and Saleh, A. K. Md. E. (1987). On preliminary test and shrinkage M-estimation in linear models. An. Statist. 15(4):1580-1592.
207. Sen, P. K., and Singer, J. M. (1993). Large Sample Methods in Statistics: An Introduction with Applications. Chapman and Hall, New York.
208. Shinozaki, N. (1989). Improved confidence sets for the mean of a multivariate normal distribution. An. Inst. Statist. Math. 41(2):331-346.
209. Singh, S., and Tracy, D. S. (1999). Ridge regression using scrambled responses. Metrika 147-157.
210. Snedecor, G. W. (1938). Statistical Methods. Collegiate Press, Iowa.
211. Srivastava, M. S., and Khatri, C. B. (1979). An Introduction to Multivariate Statistics. North-Holland, Amsterdam.
212. Srivastava, M. S., and Saleh, A. K. Md. E. (2004). Estimation of the mean vector of a multivariate normal distribution: Subspace hypothesis. J. Mult. Anal. 96:55-72.
213. Stein, C. (1955). Inadmissibility of the usual estimator of the mean of a multivariate normal distribution. Proc. 3rd Berkeley Symp. Math. Statist. Prob. 1, pp. 197-206.
214. Stein, C. (1955). A necessary and sufficient condition for admissibility. An. Math. Statist. 26:518-522.
215. Stein, C. (1956). Inadmissibility of the usual estimator of the mean of a multivariate normal distribution. Proc. 3rd Berkeley Symp. Math. Statist. Probability 1:197-206.
216. Stein, C. (1962). Confidence sets for the mean of a multivariate normal distribution. J. Roy. Statist. Soc. B24:265-296.
217. Stein, C. (1964). Inadmissibility of the usual estimators of variance of a normal distribution with unknown mean. An. Inst. Statist. Math. 16:155-160.
218. Stein, C. (1966). An approach to the recovery of inter-block information in balanced incomplete block designs. Research Papers in Statistics.
219. Stein, C. (1973). Estimation of the mean of a multivariate distribution. Proc. Prague Symp. Asymptotic Statist., pp. 345-381.
220. Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. An. Statist. 9:1135-1151.
221. Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. An. Math. Statist. 42:385-388.
222. Tabatabaey, S. M. (1995). Preliminary test approach estimation: Regression model with spherically symmetric errors. Ph.D. thesis, Carleton University, Ottawa.
223. Tamura, R. (1967a). Small sample properties of a sometimes pooled estimate in the nonparametric case. Bull. Math. Statist. 12:75-83.
224. Tamura, R. (1967b). Some estimation procedures with a nonparametric preliminary test II. Bull. Math. Statist. 12:47-59.
225. Tamura, R. (1967c). Some weighted estimates in the nonparametric case. Bull. Math. Statist. 12:61-73.
226. Theil, H. (1963). On the use of incomplete prior information in regression analysis. J. Amer. Statist. Assoc. 58:401-414.
227. Theil, H. (1971). Principles of Econometrics. Wiley, New York.
228. Theil, H., and Goldberger, A. S. (1961). On pure and mixed statistical estimation in economics. Int. Econ. Rev. 2:65-78.
229. Thompson, S. G., and Pocock, S. J. (1991). Can meta-analysis be trusted? Lancet 338:1127-1130.
230. Vinod, H. D., and Ullah, A. (1981). Recent Advances in Regression Methods. Marcel Dekker, New York.
231. Wencheko, E. (2000). Estimation of the signal-to-noise in the linear regression model. Statist. Papers 41:327-343.
232. Whittle, P. (1952). Estimation and information in time series analysis. Skand. Aktuar. 35:48-60.
Glossary

UE
Unrestricted estimator
UMLE
Unrestricted maximum likelihood estimator
ULSE
Unrestricted least-squares estimator
URE
Unrestricted rank estimator
UPPE
Unrestricted principal part estimator
URRE
Unrestricted ridge regression estimator
RE
Restricted estimator
RMLE
Restricted maximum likelihood estimator
RLSE
Restricted least-squares estimator
RRE
Restricted rank estimator
RPPE
Restricted principal part estimator
RUE
Restricted unbiased estimator
R(.; .)
Relative efficiency
RRE(.; .)
Risk-based relative efficiency
PTE
Preliminary test estimator
PTMLE
Preliminary test maximum likelihood estimator
PTLSE
Preliminary test least-squares estimator
PTPPE
Preliminary test principal part estimator
PTRE
Preliminary test rank estimator
PTRRE
Preliminary test ridge regression estimator
PRSE
Positive-rule Stein-type estimator
PRSPPE
Positive-rule Stein-type principal part estimator
PRSRRE
Positive-rule Stein-type ridge regression estimator
JSE
James-Stein estimator
SE
Stein-type estimator
SMLE
Stein-type maximum likelihood estimator
SLSE
Stein-type least-squares estimator
SRE
Stein-type rank estimator
SRRE
Stein-type ridge regression estimator
SPPE
Stein-type principal part estimator
LSE
Least-squares estimator
MLE
Maximum likelihood estimator
MSE
Mean square error
MRE(., .)
Mean square error based relative efficiency
ADB
Asymptotic distributional bias
ADQB
Asymptotic distributional quadratic bias
ADMSE
Asymptotic distributional mean square error
ADQR
Asymptotic distributional quadratic risk
D.F.
Degrees of freedom
C.L.T.
Central limit theorem
EBE
Empirical Bayes estimation
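The asymptotic distributional quantities abbreviated above (ADB, ADQB, ADMSE, ADQR) recur throughout the text. As a hedged orientation only - these are the generic forms standard in this literature, not formulas quoted from the text - with $\theta_n^*$ any estimator of $\theta$ and $\mathbf{W}$ a positive semidefinite weight matrix (both placeholders):

% A sketch under the stated assumptions; exact normalizations follow
% the chapter in which each quantity is used.
\[
  \mathrm{ADB}(\theta_n^*) = \lim_{n\to\infty}
      \mathrm{E}\bigl[\sqrt{n}\,(\theta_n^* - \theta)\bigr],
  \qquad
  \mathrm{ADMSE}(\theta_n^*) = \mathbf{\Gamma}
      = \lim_{n\to\infty}
      \mathrm{E}\bigl[n\,(\theta_n^* - \theta)(\theta_n^* - \theta)^{\top}\bigr],
\]
\[
  \mathrm{ADQR}(\theta_n^*; \mathbf{W}) = \mathrm{tr}(\mathbf{W}\,\mathbf{\Gamma}),
  \qquad
  \mathrm{ADQB}(\theta_n^*) = \mathrm{ADB}(\theta_n^*)^{\top}\,
      \mathbf{V}^{-1}\, \mathrm{ADB}(\theta_n^*),
\]

where $\mathbf{V}$ is a normalizing dispersion matrix (for instance, the asymptotic covariance matrix of the unrestricted estimator).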
Symbols and Meaning

UE, RE, PTE, SE, PRSE - the estimator notations listed above
Bias (vector); Quadratic bias; MSE; Risk
D.F. - degrees of freedom
CDF of a noncentral chi-square distribution with ν D.F.
CDF of a noncentral F distribution with (ν₁, ν₂) D.F.
Test statistic; noncentrality parameter
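For orientation, a hedged sketch of how the estimators abbreviated in this glossary are typically constructed in this literature; the symbols below ($\tilde{\theta}_n$, $\hat{\theta}_n$, $\mathcal{L}_n$, $c_\alpha$, $c$) are generic placeholders rather than notation quoted from the text:

% A sketch under the stated assumptions, not the book's exact definitions.
\[
  \hat{\theta}_n^{\mathrm{PT}} = \tilde{\theta}_n
      - (\tilde{\theta}_n - \hat{\theta}_n)\, I(\mathcal{L}_n \le c_\alpha)
  \quad \text{(preliminary test estimator)},
\]
\[
  \hat{\theta}_n^{\mathrm{S}} = \tilde{\theta}_n
      - c\,\mathcal{L}_n^{-1}\,(\tilde{\theta}_n - \hat{\theta}_n)
  \quad \text{(Stein-type estimator)},
\]
\[
  \hat{\theta}_n^{\mathrm{S+}} = \hat{\theta}_n
      + \bigl(1 - c\,\mathcal{L}_n^{-1}\bigr)^{+}\,(\tilde{\theta}_n - \hat{\theta}_n)
  \quad \text{(positive-rule Stein-type estimator)},
\]

where $\tilde{\theta}_n$ is the unrestricted estimator (UE), $\hat{\theta}_n$ the restricted estimator (RE), $\mathcal{L}_n$ the test statistic for the restriction with critical value $c_\alpha$, $c$ a shrinkage constant, and $(x)^{+} = \max(0, x)$. The biases and risks of such estimators are then expressed through the CDFs of the noncentral chi-square and noncentral $F$ distributions listed above.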
Authors Index

Adichie, J.N. 50, 56, 110, 111
Agresti, A. 22, 38, 594, 595
Ahmed, S.E. 55, 58, 530
Ahsanullah, M. 55, 58
Akritas, M.G. 272
Albert, J.H. 553, 587
Ali, A.M. 348, 551, 555, 557
Anderson, T.W. 12, 34, 40
Anderson, D.R. 463
Arnold, Steven F. 512
Asano, C. 2
Baksalary, J.K. 451
Baldwin, K.F. 444, 462
Bancroft, T.A. 2, 55, 56, 129, 136, 550
Baranchik, A. 140
Benda, Norbert 344
Bennett, B.M. 2
Berenblutt, I.I. 485
Berger, J.O. 3, 140, 183, 186, 510
Berger, R.L. 30
Berry, J.C. 171
Bickel, P.J. 30
Bishop, Y.M.M. 22, 38, 585
Bock, M.E. 18, 30, 32, 58, 485, 487
Bolfaine, H. 77, 78
Bozivich, H. 2
Brantle, T.F. 444
Brewster, J.F. 140, 182
Brockwell, P.J. 494
Brown, L.D. 140, 183, 186
Casella, G. 5, 30, 103, 140, 186, 192, 253, 256, 257, 258
Cellier, D. 140
Chen, J. 186
Chen, E.J. 487
Chiou, Paul C. 193
Cochrane, D. 475, 476, 487
Cohen, A. 2, 140, 186
Craig, A.T. 30
Cramer, H. 30, 38
Davis, R.A. 494
Deely, J. 156
Dempster, A.P. 444, 462
Doksum, K.A. 30
Doob, J.L. 494
Durbin, J. 474, 485, 504
Dzhaparidze, K. 494
Efron, B. 2, 3, 4, 5, 140, 151, 154, 158, 510, 512, 549, 550, 553
Faith, R.E. 186
Farebrother, R.W. 444
Feller, W. 40
Ferguson, T.S. 40, 594
Fienberg, S.E. 38, 594
Fourdrinier, D. 140
Fuller, W. 474
Galarneau, D.I. 444
Gart, J.J. 21, 570
Ghosh, M. 5, 348
Gibbons, D.G. 440, 462
Gilbert, R.F. 485
Goldberger, A.S. 404
Graybill, F.A. 18
Griliches, Z. 487
Gruber, M.H.J. 18, 137, 440, 444
Gupta, A.K. 553, 587
Hajek, J. 45, 48, 111, 199, 200, 263, 500
Hall, P. 500
Hallin, M. 500
Han, C.-P. 56, 129
Hannan, E.J. 494
Hartley, H.O. 2
Hassanein, K.M. 272
Hassanein, R.S. 584
Hemmerle, W.J. 444
Heyde, C.C. 500
Hocking, R.R. 462
Hodges, J.L. 50, 110
Hoeffding, W. 516, 517
Hoerl, A.E. 16, 440, 444, 462
Hoffmann, K. 30, 140, 152
Hogg, R.V. 30
Holland, P.W. 38, 594
Hoque, Z. 83
Huntsberger, D.V. 2
Hušková, M. 526
Hwang, J.T. 5, 183, 186, 192, 253, 256, 257, 258
Inoue, Takakatsu 440
James, W. 125, 151
Joshi, V.M. 186
Judge, G.G. 18, 30, 32, 58, 485, 487
Jurečková, J. 52, 112, 263, 397, 431
Kakwani, T. 404
Kale, B.K. 550
Kendall, M.G. 38
Kennard, R.W. 16, 440, 444, 462
Khan, S. 78, 83
Khan, Bashir 242, 419
Khatri, C.B. 12, 34, 40, 511
Ki, F. 186
Kibria, B.M.G. 440, 441, 462, 463
Kim, H.M. 584
Kitagawa, T. 2
Kmenta, Jan 476, 485
Koul, H.L. 500, 504
Kubokawa, T. 140, 184
Lambert, Annick 272
Lawless, J.F. 444, 462
Lehmann, E.L. 50, 110
Lindley, D. 156
Lu, K. 510
Lynn, M.J. 462
Maatta, J.M. 103
Malthouse, E.C. 440
Marquardt, D.W. 444
McDonald, G.C. 444
Montgomery, D.C. 440
Morris, C.N. 2, 3, 5, 131, 140, 151, 154, 158, 510, 512, 549, 550, 553
Mosteller, F. 2
Nagar, A.L. 404
Newhouse, J.P. 444
Obenchain, R.L. 444
Oman, S.D. 444
Orcutt, G.H. 475, 476, 487
Peck, E.A. 444
Pocock, S.J. 581
Prais, S.J. 475, 476, 487
Puri, M.L. 45, 110, 111, 199, 200, 397, 500, 516, 517, 523, 541
Radhakrishnan, R. 131
Randles, R.H. 45, 516
Rao, C.R. 12, 40, 511
Rao, P. 487
Rényi, A. 36, 38, 40
Robbins, H.E. 5, 154
Robert, Christian P. 140, 193, 272
Rodrigue, J. 77, 78
Rohatgi, V.K. 8, 30, 40, 516
Saleh, A.K.Md. Ehsanes 5, 8, 30, 40, 52, 55, 56, 58, 78, 83, 109, 113, 129, 140, 192, 193, 272, 348, 440, 441, 500, 501, 510, 516, 517, 524, 525, 530, 551, 555, 557, 564, 573, 579
Sarker, N. 440
Sato, S. 2
Schatzoff, M. 444, 462
Sclove, S.L. 36, 131
Sen, P.K. 5, 38, 40, 45, 52, 56, 109, 110, 111, 113, 115, 129, 140, 152, 199, 200, 272, 348, 397, 500, 501, 510, 516, 517, 524, 525, 530, 532, 541
Shinozaki, N. 186
Sidak, Z. 45, 48, 111, 199, 200, 263, 500
Singer, J.M. 38, 40
Singh, S. 440
Smith, A.F.M. 156
Snedecor, G.W. 2
Snee, R.D. 444
Speed, F.M. 462
Srivastava, M.S. 12, 34, 40, 511, 516
Stein, C. 2, 56, 125, 127, 137, 151, 186, 509
Strawderman, W.E. 140, 183, 186
Stuart, A. 38
Tabatabaey, S.M. 440, 441
Tamura, R. 109
Theil, H. 404
Thompson, S.G. 581
Tracy, D.S. 440
Tsui, K. 5, 186
Ullah, A. 440, 444
Vining, G.G. 444
Vinod, H.D. 440, 444
Wang, P. 444, 462
Watson, G.S. 485
Webb, G.I. 485
Wencheko, E. 440
Wermuth, N. 450, 469
Whittle, P. 494
Winsten, J.A. 475, 476, 487
Wolfe, D.A. 45, 516, 525
Yancey, T.A. 493, 495
Zacks, S. 77, 78
Zidek, J.V. 140, 182
Subject Index

ANOVA model 2, 6, 13, 14, 23, 213
Asymptotic
  distribution 22, 43, 45, 51, 53, 297, 375, 424, 427, 557
  distributional bias 92, 176, 248, 377, 426, 512, 558
  distributional MSE 94, 177, 248, 378, 426, 428, 513, 559, 566
  distributional risk 177, 180, 248, 378, 426, 428, 513, 559
  distributional quadratic risk 180, 378, 426, 428, 513, 559
Autocorrelation 469, 470, 474, 478
Bayes risk 7, 8
Bayes rule 7, 8
Courant theorem 39
Convergence
  in distribution 41
  in mean 41
  in probability 41
  almost sure 42
CLT 43
Hajek-Sidak CLT 44
Law of large numbers 42
Chebycheff's inequality 43
Borel-Cantelli lemma 43
Markov inequality 43
Contiguity 44
Decision rule 7, 8
Decision theory 6
Distribution
  Bernoulli 20, 36, 549, 550
  Beta 29
  Binomial 22, 36, 549
  Chi square 29, 30, 31
  F-distribution 10, 12, 14, 15
  Multinomial 22, 37
  Multivariate normal 33, 34
  normal 29
  Wishart 33
Estimators
  Admissible 180
  Bayes 154, 278, 344, 443
  BLUE 8, 446
  Confidence set 185, 250, 311, 386
  Empirical Bayes 154, 162, 278, 344, 511
  James-Stein 139, 140, 218, 276, 344, 407, 441
  Maximum likelihood 127, 510
  Meta analysis 580
  minimax 127
  mixed 405
  order statistics 46
  Preliminary test 129, 218, 264, 343, 407, 441
  Positive-rule Stein 140, 264, 344, 407, 441
  PTE of variance 130
  Quasi-empirical Bayes 157, 175, 343
  Restricted 340, 341, 405, 532
  R-estimator 111, 198, 264, 395, 430, 522, 539
  Ridge regression 439
  Sclove modified 149
  sign statistics 45, 46
  Stochastic restricted 403
  unrestricted 340, 341, 405, 532
Gauss-Markoff 8
Hypothesis
  likelihood ratio test 216, 274, 342
  test statistic 2, 10, 12, 14, 16, 18, 19, 21, 22
inadmissible 2
loss function 8
  Linex function 8
Linear model 56
Mean 2, 3, 8
Mean square error 8, 127
Noncentrality parameter 10, 11, 12, 14, 16, 19, 21, 22
Odds-ratio 21, 579, 580
Prior distribution 552, 586
Prior information 1
Posterior distribution 552, 586