Generalized Estimating Equations

Author: James W. Hardin | Joseph M. Hilbe



© 2003 by Chapman & Hall/CRC


Chapman & Hall/CRC
A CRC Press Company
Boca Raton London New York Washington, D.C.

Library of Congress Cataloging-in-Publication Data

Hardin, James W. (James William)
Generalized estimating equations / James W. Hardin, Joseph M. Hilbe.
p. cm.
Includes bibliographical references and index.
ISBN 1-58488-307-3 (alk. paper)
1. Generalized estimating equations. I. Hilbe, Joseph. II. Title.
QA278.2 .H378 2002
519.5'36--dc21

2002067404

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2003 by Chapman & Hall/CRC

No claim to original U.S. Government works
International Standard Book Number 1-58488-307-3
Library of Congress Card Number 2002067404
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper


To our wives, Mariaelena Castro-Hardin and Cheryl Lynn Hilbe, and our children, Taylor Antonio Hardin, Conner Diego Hardin, Heather Lynn Hilbe O'Meara, Michael Joseph Hilbe, and Mitchell Jon Hilbe.


Preface

Generalized Estimating Equations is written for the active researcher as well as for the theoretical statistician. Our goal throughout has been to clarify the nature and scope of generalized estimating equations (GEE) and to demonstrate its relationship to alternative panel models.

This text assumes that the reader has a fundamental understanding of generalized linear models (GLM). We shall provide an overview of GLM, but intend it to be merely a review. The more familiar a reader is with GLM, the easier it will be to recognize how the basic GLM algorithm can be extended to incorporate the modeling of longitudinal and clustered data by means of generalized estimating equations.

Generalized Linear Models is essentially a unified method of analyzing certain types of data situations. It is based on the exponential family of probability distributions, which includes the Gaussian or normal, the binomial, Poisson, gamma, inverse Gaussian, geometric, and, for a given ancillary parameter, the negative binomial. The binomial models themselves include the logit, probit, log-log, and complementary log-log, among others. Hence, one may use GLM to model OLS regression as well as logistic, probit, and Poisson regression models. The ability to compare parameter estimates, standard errors, and summary statistics between models gives the researcher a powerful means by which he or she may arrive at an optimal model for a given dataset.

However, being likelihood based, GLMs assume that individual rows in the data are independent from one another. In the case of longitudinal and clustered data, this assumption may fail: the data are correlated. The clustering units are often called panels; hence the term panel data.
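The "basic GLM algorithm" referred to above is commonly implemented as iteratively reweighted least squares (IRLS). What follows is a minimal sketch, not the authors' code, of IRLS for Poisson regression with the canonical log link, written with NumPy. The function name `irls_poisson`, the starting values mu = y + 0.5, and the convergence rule are assumptions made for this illustration.

```python
import numpy as np


def irls_poisson(X, y, tol=1e-10, max_iter=50):
    """Fit a Poisson GLM with the canonical log link by IRLS (illustrative sketch).

    X : (n, p) design matrix, including a column of ones for the intercept.
    y : (n,) vector of nonnegative counts.
    Returns the (p,) vector of estimated coefficients.
    """
    y = np.asarray(y, dtype=float)
    # Assumed starting values: mu = y + 0.5 keeps log(mu) finite when y == 0.
    mu = y + 0.5
    eta = np.log(mu)
    beta = None
    for _ in range(max_iter):
        # For the canonical log link the working weights reduce to V(mu) = mu.
        w = mu
        # Working response: linear predictor plus a first-order correction.
        z = eta + (y - mu) / mu
        # Weighted least squares step: solve (X' W X) beta = X' W z.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)  # inverse of the log link
        if beta is not None and np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

At convergence, the score equations X'(y - mu) = 0 hold. Substituting a different variance function and link in the weight and working-response lines yields other members of the family, which is one concrete sense in which GLM is a unified method.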
Although statisticians created methods within the GLM framework to help correct for correlated data, it became evident that these methods were not sufficient. GEE was explicitly developed to serve as a means to extend the GLM algorithm to accommodate the modeling of correlated data that would have otherwise been modeled using straightforward GLM methods. We note as well that GEE has itself been extended, and at times in a manner that substantially varies from the original GLM approach.

Our intent in writing this text is to provide an overview of the GEE methodology in all of its variations as well as to compare it with other methods that are used to model correlated and clustered data. However, we concentrate our discussion on the general GEE approach.

We have organized the text into four divisions, represented by four main chapters; a fifth chapter lists data and useful programs. The first chapter provides an introduction to the subject matter. The second chapter serves as a review of generalized linear models. We first offer an historical perspective to the development of GLM methodology and point out methods by which the GLM algorithm has been extended to meet particular modeling purposes. We then review basic modeling strategies wherein we focus on the nature and scope of the estimating equation. By focusing attention on the estimating equation of familiar models, we believe it is easier to understand the more complex generalized estimating equation. Finally, we use Chapter 2 to introduce panel data and discuss many of the available likelihood-based models that have been used to deal with such data situations.

Chapter 3 concentrates on the varieties of generalized estimating equations. In fact, we have specifically organized the chapter to facilitate a comparison of the different types of GEE models. The prime division is between marginal or population-averaged models and subject-specific models. Wherever possible we attempt to demonstrate the source of observed differences in output between different software applications when they occur. Typically they differ because of alternative formulae in the estimating algorithms. Computational variations are usually minor, and involve an extra term in the denominator of an ancillary equation.

Chapter 4 deals with residual analysis and model goodness of fit. We demonstrate many graphical and statistical techniques that can be applied to GEE analysis. Numerous journal articles have recently been published dealing with GEE fit analysis; we attempt to summarize and demonstrate the methods that seem most appropriate. We do recognize, however, that there are as yet few commercial software applications implementing these methods.

We have tried to remain faithful to the title of our text. Notably, we focus our attention on the varieties of GEE models without overly expanding the discussion to include alternative approaches to the modeling of panel data, e.g., hierarchical models, mixed models, and random-effects models. However, we do discuss and show output from some of these alternatives when they are either equivalent or nearly so to the GEE models of primary interest. Ignoring the likelihood-based and simulation-based models would have been shortsighted since we desire the reader to be aware of these available alternative choices.

We perhaps present more mathematical and algorithmic detail than other texts in the area. It is our belief that this approach will be of value to a wider audience. Our goal is to address the needs of the practicing researcher rather than limiting the presentation to the theoretical statistician. However, we hope that the text will be of use to the latter as well. We focus on origins, applications, relationships, and interpretation, all of which we perceive to be useful to the researcher. We try not to present too many theoretical derivations, and we make our presentation in summation notation rather than in matrix notation wherever possible. When matrix results or arguments are required, we include the sizes of matrices to more clearly illustrate the results. Consequently, there is often more explanation than is necessary for the more statistically erudite reader, but we hope that it makes for more meaningful reading and application for those analysts who are not as grounded in statistical theory.

We have gathered a great deal of information related to GEE methodology. To distinguish each approach, we have developed a taxonomy of models. Various labels can be found in the literature, particularly with respect to GEE extensions. We attempt to adopt those published labels where reasonable. However, because of the variation found in the literature, we have created a common taxonomy related to all relevant models. Care should be taken when reading original articles to understand labels by means of context. As in all aspects of our life, care and common sense should dictate.

In attempting to illustrate as many techniques as possible, we occasionally include examples of fitting models that are not the best choice for the data in use. We fit these "wrong" models for the pedagogical purpose of illustrating techniques and algorithms even though these examples sacrifice correct modeling strategies. We hope the readers will forgive these transgressions on our part.

We wish to recognize many who have contributed to the ideas expressed in this text. John Nelder has been our foremost influence. Others whom we consider most important to our efforts include Scott Zeger, Kung-Yee Liang, Roger Newson, Raymond J. Carroll, H. Joseph Newton, Vince Carey, Henrik Schmiediche, Norman Breslow, Berwin Turlach, Gordon Johnston, Thomas Lumley, Bill Sribney, the Department of Statistics faculty at Texas A&M University, and a host of others. We also wish to thank Helena Redshaw, supervisor of the editorial project development of Chapman & Hall/CRC Press, for her encouragement and support for this project.

At Chapman & Hall/CRC Press, we thank Marsha Hecht, Michele Berman, and Jasmin Naim for providing editorial guidance, arranging reviews, and keeping us on schedule. Finally, we express our gratitude and appreciation to Kirsty Stroud, Chapman & Hall/CRC statistics editor, for her initiation, confidence, and support throughout this project.

J.W.H.
J.M.H.

Datasets from this book are available in tab-delimited plain text format from:
http://www.crcpress.com/e_products/downloads/download.asp?cat_no=C3073


Contents

1 Introduction
1.1 Notational conventions
1.2 A short review of generalized linear models
1.2.1 Historical review
1.2.2 Basics
1.2.3 Link and variance functions
1.2.4 Algorithms
1.3 Software
1.3.1 S-PLUS
1.3.2 SAS
1.3.3 Stata
1.3.4 SUDAAN
1.4 Exercises

2 Model Construction and Estimating Equations
2.1 Independent data
2.1.1 The FIML estimating equation for linear regression
2.1.2 The FIML estimating equation for Poisson regression
2.1.3 The FIML estimating equation for Bernoulli regression
2.1.4 The LIML estimating equation for GLMs
2.1.5 The LIMQL estimating equation for GLMs
2.2 Estimating the variance of the estimates
2.3 Panel data
2.3.1 Pooled estimators
2.3.2 Fixed-effects and random-effects models
2.3.2.1 Unconditional fixed-effects models
2.3.2.2 Conditional fixed-effects models
2.3.2.3 Random-effects models
2.3.3 Population-averaged and subject-specific models
2.4 Estimation
2.5 Summary
2.6 Exercises

3 Generalized Estimating Equations
3.1 Population-averaged (PA) and subject-specific (SS) models
3.2 The PA-GEE for GLMs
3.2.1 Parameterizing the working correlation matrix
3.2.1.1 Exchangeable correlation
3.2.1.2 Autoregressive correlation
3.2.1.3 Stationary correlation
3.2.1.4 Nonstationary correlation
3.2.1.5 Unstructured correlation
3.2.1.6 Fixed correlation
3.2.1.7 Free specification
3.2.2 Estimating the scale variance (dispersion parameter)
3.2.2.1 Independence models
3.2.2.2 Exchangeable models
3.2.3 Estimating the PA-GEE model
3.2.4 Convergence of the estimation routine
3.2.5 ALR: Estimating correlations for binomial models
3.2.6 Summary
3.3 The SS-GEE for GLMs
3.3.1 Single random-effects
3.3.2 Multiple random-effects
3.3.3 Applications of the SS-GEE
3.3.4 Estimating the SS-GEE model
3.3.5 Summary
3.4 The GEE2 for GLMs
3.5 GEEs for extensions of GLMs
3.5.1 Generalized logistic regression
3.5.2 Cumulative logistic regression
3.6 Further developments and applications
3.6.1 The PA-GEE for GLMs with measurement error
3.6.2 The PA-EGEE for GLMs
3.6.3 The PA-REGEE for GLMs
3.7 Missing data
3.8 Choosing an appropriate model
3.9 Summary
3.10 Exercises

4 Residuals, Diagnostics, and Testing
4.1 Criterion measures
4.1.1 Choosing the best correlation structure
4.1.2 Choosing the best subset of covariates
4.2 Analysis of residuals
4.2.1 A nonparametric test of the randomness of residuals
4.2.2 Graphical assessment
4.2.3 Quasivariance functions for PA-GEE models
4.3 Deletion diagnostics
4.3.1 Influence measures
4.3.2 Leverage measures
4.4 Goodness of fit (population-averaged models)
4.4.1 Proportional reduction in variation
4.4.2 Concordance correlation
4.4.3 A χ² goodness of fit test for PA-GEE binomial models
4.5 Testing coefficients in the PA-GEE model
4.5.1 Likelihood ratio tests
4.5.2 Wald tests
4.5.3 Score tests
4.6 Assessing the MCAR assumption of PA-GEE models
4.7 Summary
4.8 Exercises

5 Programs and Datasets
5.1 Programs
5.1.1 Fitting PA-GEE models in Stata
5.1.2 Fitting PA-GEE models in SAS
5.1.3 Fitting PA-GEE models in S-PLUS
5.1.4 Fitting ALR models in SAS
5.1.5 Fitting PA-GEE models in SUDAAN
5.1.6 Calculating QIC in Stata
5.1.7 Calculating QICu in Stata
5.1.8 Graphing the residual runs test in S-PLUS
5.1.9 Using the fixed correlation structure in Stata
5.1.10 Fitting quasivariance PA-GEE models in S-PLUS
5.2 Datasets
5.2.1 Wheeze data
5.2.2 Ship accident data
5.2.3 Progabide data
5.2.4 Simulated logistic data
5.2.5 Simulated user-specified correlated data
5.2.6 Simulated measurement error data for the PA-GEE

References

CHAPTER 1

Introduction

In this text we address the general field of panel data analysis including longitudinal data analysis, but our main focus is on those models generally classified as generalized estimating equations, or GEE. Throughout, we have endeavored to remain consistent in our use of terms and notation defined in the following paragraphs. Applying strict definitions to these terms will enable the reader to traverse the relevant subject literature.

All GEE models consider an estimating equation that is written in two parts. The first part estimates the regression parameters, and the second estimates the association parameters or the parameters of the second-order variance distribution. We present below a schema of the various categories of GEE models. The remainder of the text is devoted to filling in the details.

GEE1: Any GEE model that assumes orthogonality of the estimating equations for the regression and association parameters.
  PA: Population-averaged model focusing on the marginal distribution of the outcome.
    PA-GEE: A GEE model using moment estimates of the association parameters based on Pearson residuals.
    ALR: A GEE model using logistic regression of the odds ratios to estimate the association parameters.
    PA-EGEE: A GEE model using the extended quasilikelihood as its genesis rather than the quasilikelihood. The model can use either Pearson residuals or odds ratios for the association parameters.
    PA-REGEE: A resistant GEE model using downweighting to remove the influence of outliers from the estimation. The model can use either Pearson residuals or odds ratios for the association parameters.
  SS: Subject-specific model.
    SS-GEE: A GEE model assuming a parametric distribution for the random component and modeling the entire population distribution rather than the marginal distribution.
GEE2: Any GEE model that does not assume orthogonality of the estimating equations for the regression and association parameters.


1.1 Notational conventions

Throughout the text, we use the following acronyms according to the given descriptions:

FIML      Full information maximum likelihood
LIML      Limited information maximum likelihood
LIMQL     Limited information maximum quasilikelihood
RE        Random effects
FE        Fixed effects
GEE       Generalized estimating equation
GEE1      GEE application where the estimating equation for the second-level variance parameters is ancillary and assumed to be orthogonal to the estimating equation of the regression coefficients
GEE2      GEE application where the estimating equation for the second-level variance parameters is not assumed to be orthogonal to the estimating equation of the regression coefficients
PA-GEE    GEE-constructed model focusing on the marginal distribution (also known as a population-averaged or marginal model)
SS-GEE    GEE-constructed model focusing on the individuals (also known as a subject-specific model)
PA-EGEE   GEE-constructed binomial model focusing on the marginal distribution (also known as a population-averaged or marginal model) that provides simultaneous estimation of coefficients and association parameters. This technique differs from PA-GEE in the manner in which the association parameters are estimated
PA-REGEE  Resistant GEE-constructed model focusing on the marginal distribution (also known as a population-averaged or marginal model) where the model downweights the data to remove influence

We also use the following notation:

L()     Likelihood function.
ℒ()     Log-likelihood function.
Q()     Quasilikelihood function.
Q⁺()    Extended quasilikelihood function.
Ψ()     The estimating equation or generalized estimating equation.


1.2 A short review of generalized linear models

Generalized Estimating Equations (GEE), the prime subject of this text, is traditionally presented as an extension to the standard array of Generalized Linear Models (GLMs) as initially constructed by Wedderburn and Nelder in the mid-1970s. As such, we shall provide an overview of GLM and discuss the various ways that the GLM algorithm has been extended to allow the modeling of correlated data.

1.2.1 Historical review

Peter McCullagh of the University of Chicago and John Nelder of the Imperial College of Science and Technology, London, authored the seminal work on Generalized Linear Models in 1983, in a text with the same name. Major revisions were made in McCullagh and Nelder (1989), which is still the most current edition. This text remains the mainstay and most referenced book on the topic. More importantly, for our purposes, it is the basis upon which Liang and Zeger (1986) introduced a method for handling correlated longitudinal and clustered data.

As likelihood-based models, GLMs are based on the assumption that individual subjects or observations are independent. This assumption is commonly referred to as the iid requirement; i.e., observations are independent and identically distributed. However, there are many common data situations for which responses are correlated.
For instance, consider a dataset consisting of patient records taken from various hospitals within a state or province. Also, suppose that the data of interest relate to a certain type of medical procedure. It is likely that each hospital has its own treatment protocol such that there is a correlation of treatment effects within hospitals that is absent between hospitals. When such a condition exists, the individual data records are not independent, and, hence, violate the iid assumption upon which many likelihood and quasilikelihood models are based.

In the late 1970s, John Nelder designed the first commercial software developed exclusively for GLMs. Called GLIM, for Generalized Linear Interactive Modeling, the software was manufactured and distributed by Numerical Algorithms Group in Great Britain.
Later, Nelder and GLIM team members introduced capabilities into GLIM that allowed adjustment of the variance-covariance or Hessian matrix so that the effects of extra correlation in the data would be taken into account with respect to standard errors. This was accomplished through estimation of the dispersion statistic. There are two types of dispersion statistics in GLM methodology. The first type is based on the deviance statistic; the second on the Pearson χ² statistic. As we discuss later, the overall model deviance and Pearson χ² statistics are summary measures of model fit that are traditionally included in model output. The deviance dispersion is derived by dividing the deviance statistic by the model residual degrees of freedom. Likewise, the Pearson χ² dispersion is calculated by dividing the summary Pearson χ² statistic by the same model degrees of freedom. The residual degrees of freedom is itself defined as (n − p), where n is the number of cases in the model and p refers to the number of model predictors, including a constant if applicable.

Depending on the type of correlation effect, we characterize response data on counts and binomial trials as underdispersed or overdispersed. If we are to more appropriately model such data, we must amend the usual GLM and estimating algorithm to address the correlation effects.

The earliest method used to adjust standard errors due to perceived correlation effects was to multiply each parameter standard error by the square root of either the deviance dispersion or the Pearson χ² dispersion. This procedure is called the scaling of standard errors. It is a post-estimation adjustment of standard errors that has no effect on the fitted regression coefficients.
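To make the two dispersion statistics and the scaling adjustment concrete, here is a minimal sketch assuming NumPy and the simplest possible Poisson model, one containing only a constant (p = 1), so that the fitted mean of every observation is the sample mean and no iterative fitting is needed. The function names are hypothetical. The sketch scales by multiplying the model-based standard error by the square root of the Pearson dispersion, so a dispersion above 1.0 widens it; the coefficient itself is unchanged.

```python
import numpy as np


def poisson_intercept_dispersion(y):
    """Deviance and Pearson dispersion for an intercept-only Poisson GLM.

    With only a constant in the model, the maximum likelihood fit is
    mu_hat = mean(y) for every observation, so no iteration is needed.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    p = 1                       # one model predictor: the constant
    mu = np.full(n, y.mean())   # fitted means

    # Deviance: 2 * sum[y ln(y/mu) - (y - mu)], with y ln(y/mu) = 0 when y == 0
    ratio = np.where(y > 0, y / mu, 1.0)
    deviance = 2.0 * np.sum(y * np.log(ratio) - (y - mu))

    # Pearson chi-square: sum (y - mu)^2 / V(mu), with V(mu) = mu for the Poisson
    pearson = np.sum((y - mu) ** 2 / mu)

    df_resid = n - p
    return {
        "deviance_dispersion": deviance / df_resid,
        "pearson_dispersion": pearson / df_resid,
        "df_resid": df_resid,
    }


def scaled_se_intercept(y):
    """Model-based and Pearson-scaled standard errors of beta_0 = ln(mean)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    mu = y.mean()
    # Fisher information for beta_0 under the log link is sum(mu_i) = n * mu
    se_model = 1.0 / np.sqrt(n * mu)
    phi = poisson_intercept_dispersion(y)["pearson_dispersion"]
    # Scaling multiplies the standard error by sqrt(dispersion);
    # the coefficient estimate ln(mean(y)) is not touched.
    return se_model, se_model * np.sqrt(phi)


stats = poisson_intercept_dispersion([0, 1, 2, 3, 4])
se, se_scaled = scaled_se_intercept([0, 1, 2, 3, 4])
print(stats, se, se_scaled)
```

For y = [0, 1, 2, 3, 4] there are 4 residual degrees of freedom, the Pearson dispersion is 5/4 = 1.25 (the deviance dispersion is about 1.65), and the scaled standard error is about 0.354 against a model-based 0.316.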
For binomial and count models estimated using GLM methodology, a dispersion statistic greater than 1.0 indicates possible extra correlation in the data. Scaling is an attempt to adjust the standard errors to values that would be observed if the data were not overdispersed. That is, scaling provides standard errors that would be obtained if the dispersion statistic were 1.0.

The above description of scaling is somewhat naive, as we shall see. However, the idea behind scaling is to use straightforward model statistics to accommodate data that are marginally correlated. This method still proves useful to the analyst as a first-run look at the data.

We should mention at this point that there are occasions when a model may appear to be overdispersed when in fact it is not. For instance, if the deviance-based dispersion of a Poisson model is greater than 1.0, this provides prima facie evidence that the model may be overdispersed.
In practice, analysts typically start describing a Poisson model as overdispersed when the dispersion statistic is greater than 1.5 and the number of cases in the model is large. Just how much greater than 1.5 and just how large a dataset depend on the number of predictors, the predictor profile, and the pattern of covariates in the model. Hence, there is no definitive dispersion value over which a model is specifically categorized as overdispersed.

In addition to the above caveat regarding model overdispersion, a model that otherwise appears to be overdispersed may in reality be what we call apparently overdispersed.
Apparent overdispersion results when a model omits relevant explanatory predictors, when the data contain influential and possibly mistakenly coded outliers, when the model has failed to account for needed interaction terms, when one or more predictors need to be transformed to another scale, or when the assumed linear relationship between the response and predictors is in fact some other relationship. When any of the above situations occurs, the result may be an inflation of the dispersion statistic. Applying remedies to accommodate the above conditions may reduce the reestimated dispersion statistic to near 1.0. When this occurs, the original model is shown to have been apparently overdispersed.

On the other hand, if one tests for or makes appropriate changes in the model and the dispersion statistic is still high, then it is likely that the overdispersion is real. Other checks may be used to assess overdispersion, including the comparison of the mean and variance of the response, or evaluation of residuals. The important point is that signs of model overdispersion must be evaluated; and if overdispersion is found to be real, it must be dealt with in an appropriate manner. The manner in which overdispersion is dealt with depends in large part on the perceived source of the overdispersion, which itself represents excess correlation in the data. Standard methods include scaling, using robust variance estimators, or implementing models that internally adjust for correlated data.

Scaling standard errors is a post hoc method of analyzing correlated data. It is performed after the model has been estimated, and only adjusts standard errors. It has no effect on parameter estimates. As such, the major deficiency is that it does not capture, or appropriately adjust for, an identified cluster or correlation effect.
The method simply provides an overall adjustment.

Another method that applies an overall adjustment to standard errors has also found favor in a number of scientific disciplines. This method, an alternative variance estimate, has been called by various names over the past several decades, many times depending on the academic discipline employing it. We shall simply refer to it as the sandwich variance estimator. Over time, other related variance estimators have been proposed to more directly address nonindependence, and we discuss one general modification in particular. These alternative variance estimators represent a more sophisticated approach to adjusting inference than simply scaling the standard errors based on the dispersion statistic. However, the adjustment is still post hoc and only affects the standard errors, not the parameter estimates themselves.
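The sandwich estimator can be sketched for a Poisson regression as follows, assuming NumPy; the function names are hypothetical. The "bread" is the model-based covariance (the inverse of the information matrix) and the "meat" is the outer product of the score contributions, taken per observation or, when cluster labels are given, summed within each cluster. The coefficients are never modified, which matches the post hoc character of the adjustment. No finite-sample correction is applied, so results may differ slightly from statistical packages that apply one.

```python
import numpy as np


def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Poisson regression with the canonical log link via Newton-Raphson.

    For the canonical link, Newton-Raphson and Fisher scoring coincide.
    Assumes column 0 of X is the constant term.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start at the intercept-only fit
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        info = X.T @ (X * mu[:, None])      # Fisher information X' diag(mu) X
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def sandwich_cov(X, y, beta, clusters=None):
    """Return (model-based, sandwich) covariance matrices for a Poisson fit.

    With clusters=None the score contributions are treated as independent;
    otherwise they are summed within each cluster before forming the meat.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))   # model-based covariance
    scores = X * (y - mu)[:, None]                   # one score row per observation
    if clusters is not None:
        clusters = np.asarray(clusters)
        scores = np.vstack([scores[clusters == g].sum(axis=0)
                            for g in np.unique(clusters)])
    meat = scores.T @ scores
    return bread, bread @ meat @ bread


# Intercept-only example: the sandwich variance is sum((y - ybar)^2) / (n * ybar)^2
y = np.array([0, 0, 1, 5, 4])
X = np.ones((y.size, 1))
beta = fit_poisson(X, y)
model_cov, robust_cov = sandwich_cov(X, y, beta)
_, cluster_cov = sandwich_cov(X, y, beta, clusters=[0, 0, 1, 1, 2])
print(model_cov[0, 0], robust_cov[0, 0], cluster_cov[0, 0])
```

For this intercept-only example the three variances are 0.1 (model-based), 0.22 (sandwich), and 0.24 (clustered sandwich): the point estimate ln(2) is identical in all three cases, and only the estimated uncertainty changes.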

In the mid-1980s, researchers at Johns Hopkins University in Baltimore developed methods to deal with longitudinal and cluster data using the GLM format. In so doing, they created a 2-step algorithm that first estimates a straightforward GLM, and then calculates a matrix of scaling values. The scaling matrix adjusts the Hessian matrix at the next algorithm iteration. Each subsequent iteration in the algorithm updates the parameter estimates, the adjusted Hessian matrix, and a matrix of scales. Liang and Zeger (1986) provided further exposition of how the matrix of scales could be parameterized to allow user control over the structure of the dependence in the data.

Although this is the barest description of their method, hopefully it illustrates the logic behind the initial versions of the extended GLMs introduced through generalized estimating equations. The method arose to better address the dependence of longitudinal and clustered data.
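The two-step logic just described can be sketched for a Poisson model with a log link and an exchangeable working correlation, in which every pair of observations within a cluster shares a single correlation α. This is a simplified illustration in the spirit of Liang and Zeger (1986), assuming NumPy, moment estimates of the dispersion φ and of α from Pearson residuals, and no standard errors for β; the function name is hypothetical and it does not reproduce any particular package's implementation.

```python
import numpy as np


def gee_poisson_exchangeable(X, y, groups, tol=1e-8, max_iter=50):
    """Minimal GEE sketch: Poisson mean with log link, exchangeable working
    correlation, moment estimates of phi and alpha from Pearson residuals.

    Assumes column 0 of X is the constant and that alpha stays in the range
    where every working correlation matrix is positive definite.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_obs, p = X.shape

    # Step 1: ordinary Poisson GLM fit, i.e. independence within clusters
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())
    for _ in range(100):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        r = (y - mu) / np.sqrt(mu)                 # Pearson residuals
        phi = np.sum(r ** 2) / (n_obs - p)         # dispersion estimate

        # Step 2: moment estimate of alpha from within-cluster residual products
        cross, n_pairs = 0.0, 0
        for g in labels:
            rg = r[groups == g]
            cross += (rg.sum() ** 2 - np.sum(rg ** 2)) / 2.0   # sum over pairs j < k
            n_pairs += rg.size * (rg.size - 1) // 2
        alpha = cross / (phi * (n_pairs - p))

        # Step 3: Fisher-scoring update of beta with V_i = phi A^1/2 R(alpha) A^1/2
        H = np.zeros((p, p))
        U = np.zeros(p)
        for g in labels:
            idx = groups == g
            Xg, yg, mug = X[idx], y[idx], mu[idx]
            m = mug.size
            R = np.full((m, m), alpha)
            np.fill_diagonal(R, 1.0)
            sd = np.sqrt(mug)
            V = phi * sd[:, None] * R * sd[None, :]
            D = Xg * mug[:, None]                  # d mu / d beta under the log link
            VinvD = np.linalg.solve(V, D)
            H += D.T @ VinvD
            U += VinvD.T @ (yg - mug)              # D' V^-1 (y - mu), V symmetric
        step = np.linalg.solve(H, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, alpha, phi


# Ten clusters of three counts each; counts in a cluster move together.
groups = np.repeat(np.arange(10), 3)
y = np.concatenate([[g, g + 3, g + 1] for g in range(10)]).astype(float)
X = np.ones((y.size, 1))
beta, alpha, phi = gee_poisson_exchangeable(X, y, groups)
print(np.exp(beta[0]), y.mean(), alpha, phi)
```

In this balanced, constant-only example the exchangeable weights cancel, so exp(β̂) equals the overall mean of y, while α̂ (roughly 0.76) summarizes how strongly counts within a cluster move together.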
As should be expected, the original GEE algorithm served as a springboard for the development of other methods for dealing with correlated data.

Because GEE is traditionally presented as an extension of generalized linear models, we outline the various features that characterize a GLM. A much more thorough examination can be found in Hardin and Hilbe (2001).


1.2.2 Basics

Many of the models that have now been incorporated under the rubric Generalized Linear Models (GLMs) were previously (and many still are) estimated using maximum likelihood methods. Examples include logistic regression, Poisson regression, and probit regression. Each of these regression routines was in use prior to the creation of the GLM algorithm. Why duplicate what was already available?

In the early 1970s, computing was usually performed on mainframe computers. Academics could purchase execution time on campus computers, typically located within the newly developing Departments of Computer Science. Sometimes researchers and analysts were fortunate to have easy access to computing facilities, but that was rather rare. Computer use was absolutely necessary in order to estimate parameters using maximum likelihood optimization techniques. The simple matrix inversion of the Hessian required for maximum likelihood algorithms is not simple at all if one has to calculate the inverse by hand. Moreover, maximum likelihood optimization algorithms require tractable starting values and substantial computing power, especially for large datasets.

There was a clear need to find an optimization method by which otherwise nonlinear models could be estimated using standard OLS methods. Wedderburn and Nelder discovered that the methods used to estimate weighted linear regression could be adjusted to model many data situations that were previously estimated via maximum likelihood, particularly for those maximum likelihood models based on the exponential family of distributions. They accomplished this by applying the Iterative Weighted Least Squares (IWLS) algorithm already in use. In addition, they employed a link function which linearized such functions as the logistic, probit, and log. The IWLS algorithm was later renamed IRLS, meaning Iterative Re-weighted Least Squares, to emphasize the updating step for the weights in the algorithm. Also, it was renamed to distinguish it from the traditional weighted least squares (WLS) algorithm. Hardin and Hilbe (2001) point out that the name change is not without some etymological controversy; Nelder felt that "reweighted" put too much emphasis on the updating of the weights in the OLS calculation, given that the synthetic dependent variable is also updated.

Despite some reservations about the name change of the algorithm, IRLS became a common framework for estimating models derived from the exponential family of probability distributions. The algorithm takes advantage of the form of the variance estimate available from Fisher scoring to develop an easy framework from which computer code can be developed. Later, when computing memory and processor speed became more available, GLM algorithms were extended to incorporate varieties of Newton-Raphson based estimation. This allowed more complex models to be estimated within an expanded GLM framework.

Generalized linear models, as previously mentioned, are based on the exponential family of distributions. Members of this family include the Gaussian or normal, binomial, gamma, inverse Gaussian, Poisson, geometric, and the negative binomial for a specified ancillary parameter. Liang and Zeger's GEE extension of GLM focused on the traditional Gaussian, binomial, gamma, and Poisson family members, though their application clearly extends to other members.

All members of the traditional class of generalized linear models are based on one of the above probability functions. The likelihood function is simply a re-parameterization of the probability function or density. A probability function estimates a probability based on given location and scale parameters. A likelihood function, on the other hand, estimates the parameters on the basis of given probabilities or means. The idea is that the likelihood estimates parameters that make the observed data most probable or likely. Statisticians use the log transform of the likelihood, however, because it is (usually) more tractable to use in computer estimation.
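The IRLS scheme described earlier in this section can be sketched for the Poisson family with its canonical log link, assuming NumPy; the function name and the starting values are illustrative choices, not those of a specific package. Each pass solves a weighted least-squares problem for a synthetic dependent variable and then recomputes both the weights and that synthetic variable (the double update behind Nelder's reservation about the name). Convergence is judged on the change in deviance, and the returned log-likelihood is Σ[y ln(μ) − μ − ln Γ(y + 1)].

```python
import math

import numpy as np


def irls_poisson(X, y, tol=1e-10, max_iter=50):
    """Fit a Poisson GLM with the canonical log link by IRLS.

    Each pass solves a weighted least-squares problem for the synthetic
    (working) dependent variable z, then recomputes both the weights and z.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = (y + y.mean()) / 2.0          # simple positive starting values (avoid log(0))
    eta = np.log(mu)
    dev_old = np.inf
    for _ in range(max_iter):
        # Log link: d(eta)/d(mu) = 1/mu; Poisson variance V(mu) = mu; so weight = mu
        w = mu
        z = eta + (y - mu) / mu        # synthetic dependent variable
        XtW = X.T * w                  # X' W without forming the diagonal matrix
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta
        mu = np.exp(eta)
        # Poisson deviance, with y ln(y/mu) taken as 0 when y == 0
        ratio = np.where(y > 0, y / mu, 1.0)
        dev = 2.0 * np.sum(y * np.log(ratio) - (y - mu))
        if abs(dev_old - dev) < tol:
            break
        dev_old = dev
    # Poisson log-likelihood: sum of y ln(mu) - mu - ln Gamma(y + 1)
    loglik = float(np.sum(y * eta - mu) - sum(math.lgamma(v + 1.0) for v in y))
    return beta, dev, loglik


y = np.array([0, 1, 2, 3, 4])
X = np.ones((y.size, 1))
beta, dev, loglik = irls_poisson(X, y)
print(beta, dev, loglik)
```

With only a constant and y = [0, 1, 2, 3, 4], IRLS converges to β̂ = ln(2), the log of the sample mean, which is also the closed-form maximum likelihood estimate for this model.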
More detailed justification for working with the log-likelihood can be found in Gould and Sribney (1999).

Members of the exponential family of distributions have the unique property that their likelihood formulation may be expressed as

$$ f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} - c(y, \phi) \right\} \qquad (1.1) $$

For instance, consider the Poisson probability function

$$ f(y; \mu) = \frac{e^{-\mu}\mu^{y}}{y!} \qquad (1.2) $$

We may rewrite rewrite this this function function in exponential family family form form as We may in exponential as

f(y; ft) f (y ; N)

I)}

In(p) p = exp exp {yy In(~) - ft -lnf(y + 1) In F(y + = 1

(1.3) (1.3)
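A quick numerical check that (1.2) and (1.3) are the same function. The function names are illustrative only. Matching (1.3) against (1.1) term by term gives θ = ln(μ), b(θ) = exp(θ), a(φ) = 1, and c(y, φ) = ln Γ(y + 1).

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    # Equation (1.2): exp(-mu) * mu^y / y!
    return math.exp(-mu) * mu**y / math.factorial(y)

def poisson_expfam(y: int, mu: float) -> float:
    # Equation (1.3): exp{ y ln(mu) - mu - ln Gamma(y + 1) }
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

# The two forms agree for a range of means and counts.
for mu in (0.5, 2.0, 7.3):
    for y in range(10):
        assert math.isclose(poisson_pmf(y, mu), poisson_expfam(y, mu), rel_tol=1e-12)
print("forms (1.2) and (1.3) agree")
```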

As mentioned previously, there are a number of distributions for which the associated likelihood follows this general form. The power of GLM lies in the ability to develop or derive techniques, statistics, and properties for the entire group simply based on the form of the likelihood.

The expected value of the exponential family distribution is related to the outcome variable of interest. There is a natural connection between these two quantities that allows us to introduce covariates into the model in place of the expected value. This connection is the θ parameter. When a particular distribution is written in exponential family form, the θ parameter is represented by some monotonic differentiable function of the expected value μ. This function links the outcome variable y to the expected value μ. The particular function that results from writing a distribution in exponential form is called the canonical link. In general, we can introduce covariates into the model through any monotonic differentiable link function, though we can encounter numeric difficulties if the function fails to enforce range restrictions that define the particular distribution of the exponential family.

For any member distribution of the exponential family of distributions, there is a general link function, called the canonical link, that relates the linear predictor η = Xβ to the expected value μ. These canonical links occur when θ = η. For the Poisson model, we see that θ = ln(μ), implying that the canonical link is given by the log-link η = ln(μ). Since there is no compelling reason that the systematic components of the model should be linear on the scale of the canonical link, we can, as previously mentioned, choose any monotonic differentiable function.

Subsequent to introducing this class of regression models, Wedderburn (1974) showed that the theoretical results could be justified through an assumption of independence of the observations and an assumption that the variance could be written as a function of the mean (up to a scale factor). This set of assumptions is much less restrictive than the original assumption of particular parametric distributions. As a consequence, the class of GLMs allows not only a specification of the link function relating the outcome to the covariates, but also a specification of the form of the variance in terms of the mean. These two choices are not limited to specific distributions in the exponential family. Substituting a given link function and variance function into the IRLS algorithm implies a quasilikelihood. If the link and variance functions coincide with choices for a particular distribution of the exponential family, the quasilikelihood is a likelihood proper.


1.2.3 Link and variance functions

There are a number of standard choices in the data analyst's toolbox for specifying the relationship of the expected value of the outcome variable to the linear combination of covariates Xβ. Usually, these choices are driven by the range and nature of the outcome variable. For instance, when the outcome is binary, analysts naturally choose inverse link functions that map any possible value of the linear combination of the covariates and associated parameters to the range (0,1) implied by the outcome. The inverse link function is what converts the linear predictor Xβ into an estimate of the expected value μ. Positive outcomes similarly lead analysts to choose inverse link functions that transform the linear predictor η = Xβ to positive values.

Some standard choices of link and inverse link functions are listed in Table 1.1. Variance functions corresponding to member distributions in the exponential family are listed in Table 1.2.
Other common choices for link functions include the general power link function (which includes the log, reciprocal, and inverse square as special cases) and the odds power link. See Hardin and Hilbe (2001) for a more complete list of link functions and variance functions along with useful expressions for derivatives and range restrictions.

Confusion can arise in reading various texts on GLMs. The link function is the function that converts the expected value μ (which may be range restricted) to the unrestricted linear predictor Xβ. The function is invertible, and often texts will list the inverse link function instead of, or as well as, the link function. Terminology to differentiate these functions derives from the associated name of the link function. For example, with a positive outcome variable, textbook authors may choose the log-link such that ln(μ) = Xβ while still mentioning the inverse link μ = exp(Xβ).

Table 1.1 Standard link and inverse link functions.

Link name               Link function η = g(μ)    Inverse link μ = g^(-1)(η)
Complementary log-log   ln{-ln(1 - μ)}            1 - exp{-exp(η)}
Identity                μ                         η
Inverse square          1/μ²                      1/√η
Log                     ln(μ)                     exp(η)
Log-log                 -ln{-ln(μ)}               exp{-exp(-η)}
Logit                   ln{μ/(1 - μ)}             e^η/(1 + e^η)
Probit                  Φ^(-1)(μ)                 Φ(η)
Reciprocal              1/μ                       1/η

Φ denotes the standard normal cumulative distribution function.

Table 1.2 Variance functions for distributions in the exponential family.

Distribution        Variance V(μ)
Bernoulli           μ(1 - μ)
Binomial(k)         μ(1 - μ/k)
Gamma               μ²
Gaussian            1
Inverse Gaussian    μ³
Negative binomial   μ + kμ²
Poisson             μ
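The two tables can be transcribed directly into code. The sketch below uses only the Python standard library. The dictionary names and keys are hypothetical labels, not taken from the text.

- Each link is stored with its inverse.
- Each variance function is stored as a callable. The binomial and negative binomial entries take the extra parameter k as a second argument.
- The probit entries use `statistics.NormalDist` (Python 3.8 or later) for Φ and its inverse.

The loop at the end checks that every inverse link undoes its link for a mean inside all of their valid ranges.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used by the probit link

# (link g, inverse link g^-1) pairs transcribed from Table 1.1.
LINKS = {
    "cloglog":        (lambda mu: math.log(-math.log(1 - mu)),
                       lambda eta: 1 - math.exp(-math.exp(eta))),
    "identity":       (lambda mu: mu, lambda eta: eta),
    "inverse_square": (lambda mu: 1 / mu**2, lambda eta: 1 / math.sqrt(eta)),
    "log":            (math.log, math.exp),
    "loglog":         (lambda mu: -math.log(-math.log(mu)),
                       lambda eta: math.exp(-math.exp(-eta))),
    "logit":          (lambda mu: math.log(mu / (1 - mu)),
                       lambda eta: math.exp(eta) / (1 + math.exp(eta))),
    "probit":         (_std_normal.inv_cdf, _std_normal.cdf),
    "reciprocal":     (lambda mu: 1 / mu, lambda eta: 1 / eta),
}

# Variance functions V(mu) from Table 1.2; k is the binomial denominator
# or the negative binomial ancillary parameter.
VARIANCE = {
    "bernoulli":         lambda mu: mu * (1 - mu),
    "binomial":          lambda mu, k: mu * (1 - mu / k),
    "gamma":             lambda mu: mu**2,
    "gaussian":          lambda mu: 1.0,
    "inverse_gaussian":  lambda mu: mu**3,
    "negative_binomial": lambda mu, k: mu + k * mu**2,
    "poisson":           lambda mu: mu,
}

# mu = 0.3 lies inside the valid range of every link above.
mu = 0.3
for name, (g, g_inv) in LINKS.items():
    assert math.isclose(g_inv(g(mu)), mu, rel_tol=1e-9), name
print("all round trips OK")
```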

1.2.4 Algorithms

The estimates of the GLM regression parameter vector β are the solution of the estimating equation given by

$$\frac{\partial\mathcal{L}}{\partial\beta} = 0 \tag{1.4}$$

where ℒ is the log-likelihood for the exponential family. A Taylor series expansion of this estimating equation about a current value β, evaluated at the solution β*, is given by

$$0 = \frac{\partial\mathcal{L}}{\partial\beta}(\beta^{*}) = \frac{\partial\mathcal{L}}{\partial\beta}(\beta) - \frac{\partial^{2}\mathcal{L}}{\partial\beta\,\partial\beta^{T}}(\beta)\,(\beta - \beta^{*}) + \cdots \tag{1.5}$$

such that a recursive relationship for finding the solution is

$$\beta^{(r)} = \beta^{(r-1)} + \left[ -\frac{\partial^{2}\mathcal{L}}{\partial\beta\,\partial\beta^{T}}\left(\beta^{(r-1)}\right) \right]^{-1} \frac{\partial\mathcal{L}}{\partial\beta}\left(\beta^{(r-1)}\right) \tag{1.6}$$
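As a small worked illustration of (1.6) (not part of the original text), take an intercept-only Poisson model with the log link. Then η_i = β and μ_i = e^β for every observation, and

$$\mathcal{L}(\beta) = \sum_{i=1}^{n}\left\{ y_i\beta - e^{\beta} - \ln\Gamma(y_i+1) \right\}, \qquad \frac{\partial\mathcal{L}}{\partial\beta} = \sum_{i=1}^{n} y_i - n e^{\beta}, \qquad \frac{\partial^{2}\mathcal{L}}{\partial\beta^{2}} = -n e^{\beta}$$

so that the recursion (1.6) becomes

$$\beta^{(r)} = \beta^{(r-1)} + \frac{\sum_{i} y_i - n e^{\beta^{(r-1)}}}{n e^{\beta^{(r-1)}}} = \beta^{(r-1)} + \bar{y}\,e^{-\beta^{(r-1)}} - 1$$

The update stops changing when ȳe^(-β) = 1, that is, at β = ln(ȳ), the log of the sample mean.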

The method of Fisher scoring substitutes the expected value of the Hessian matrix, resulting in

$$\beta^{(r)} = \beta^{(r-1)} + \left[ E\left( \frac{\partial\mathcal{L}}{\partial\beta}\,\frac{\partial\mathcal{L}}{\partial\beta^{T}} \right) \right]^{-1} \frac{\partial\mathcal{L}}{\partial\beta}\left(\beta^{(r-1)}\right) \tag{1.7}$$

where the expectation is evaluated at β^(r-1). Under the usual regularity conditions, the expected outer product of the score equals the negative of the expected Hessian, so (1.7) is (1.6) with the Hessian replaced by its expectation. Filling in the details, the updating steps are defined by

$$\left\{ \sum_{i=1}^{n} \frac{1}{V(\mu_i)\,a(\phi)} \left(\frac{\partial\mu}{\partial\eta}\right)_i^{2} x_{ji}\,x_{ki} \right\}_{p\times p} \beta^{(r)}_{p\times 1} = \left\{ \sum_{i=1}^{n} \frac{1}{V(\mu_i)\,a(\phi)} \left(\frac{\partial\mu}{\partial\eta}\right)_i^{2} \left[ (y_i-\mu_i)\left(\frac{\partial\eta}{\partial\mu}\right)_i + \eta_i \right] x_{ji} \right\}_{p\times 1} \tag{1.8}$$

for j = 1, ..., p. Here, we note that

$$W = \operatorname{Diag}\left\{ \frac{1}{V(\mu)\,a(\phi)} \left(\frac{\partial\mu}{\partial\eta}\right)^{2} \right\}_{(n\times n)} \tag{1.9}$$

$$Z = \left\{ (y-\mu)\left(\frac{\partial\eta}{\partial\mu}\right) + \eta \right\}_{(n\times 1)} \tag{1.10}$$

so that we may rewrite the updating step in matrix notation (with X of dimension n × p) as the weighted OLS equation

$$\left(X^{T}WX\right)\beta^{(r)} = X^{T}WZ \tag{1.11}$$

Hence, the IRLS algorithm is succinctly described by the following steps:

1. Initialize the vector of expected values μ.
2. Calculate the linear predictor using the link function η = g(μ).
3. Initialize the scalars OldDeviance and NewDeviance to zero.
4. Initialize the scalar tolerance to 1e-6 (or another small tolerance value), and the scalar DeltaDeviance to one (or some other value larger than tolerance).
5. If |DeltaDeviance| < tolerance then stop.
6. Calculate the weight matrix W.
7. Calculate the synthetic dependent variable Z.
8. Regress Z on X using OLS with weights W to get β^(r).
9. Calculate the linear predictor η = Xβ^(r) from the regression results.
10. Calculate the expected values using the inverse link μ = g^(-1)(η).
11. Set OldDeviance equal to NewDeviance.
12. Calculate the deviance and store in NewDeviance.
13. Store into DeltaDeviance the difference of NewDeviance and OldDeviance.
14. Return to step 5.
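The IRLS steps can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code. Several details are assumptions:

- The names `irls`, `dmu_deta` and `poisson_deviance` are hypothetical.
- The starting values μ = (y + ȳ)/2 and the `max_iter` safeguard are assumed choices.
- The usage example assumes a Poisson model with the log link.

The loop continues while the change in deviance exceeds the tolerance and stops once it falls below it. W is kept as a vector of diagonal weights rather than an n × n matrix.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y ln(y/mu) - (y - mu)], with y ln(y/mu) = 0 when y = 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def irls(X, y, link, inv_link, dmu_deta, variance, deviance,
         a_phi=1.0, tol=1e-6, max_iter=100):
    """IRLS (Fisher scoring) for a GLM, following the numbered steps.

    X is the n x p design matrix and y the length-n outcome; link, inv_link,
    dmu_deta (derivative of the inverse link), variance and deviance are callables.
    """
    mu = (y + y.mean()) / 2.0                 # Step 1 (assumed starting values)
    eta = link(mu)                            # Step 2
    new_dev = 0.0                             # Step 3: NewDeviance starts at zero
    delta_dev = 1.0                           # Step 4 (the tolerance is `tol`)
    beta = None
    for _ in range(max_iter):
        if abs(delta_dev) < tol:              # Step 5: deviance has settled
            break
        d = dmu_deta(eta)
        w = d**2 / (variance(mu) * a_phi)     # Step 6: diagonal of W, eq. (1.9)
        z = (y - mu) / d + eta                # Step 7: Z, eq. (1.10)
        XtW = X.T * w                         # X^T W without building an n x n matrix
        beta = np.linalg.solve(XtW @ X, XtW @ z)  # Step 8: solve eq. (1.11)
        eta = X @ beta                        # Step 9
        mu = inv_link(eta)                    # Step 10
        old_dev = new_dev                     # Step 11
        new_dev = deviance(y, mu)             # Step 12
        delta_dev = new_dev - old_dev         # Step 13
    return beta                               # Step 14 is the loop back to step 5

# Usage: Poisson regression with the log link on a small hypothetical data set.
X = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
beta = irls(X, y, link=np.log, inv_link=np.exp, dmu_deta=np.exp,
            variance=lambda mu: mu, deviance=poisson_deviance)
print(beta)  # close to [ln 2, ln 3], the logs of the group means 2 and 6/2
```

The function only needs a link, its inverse and derivative, and a variance function. Other combinations can be plugged in the same way, which is the sense in which choosing a link and variance for IRLS implies a quasilikelihood. A suitable deviance (or quasi-deviance) must still be supplied for the convergence check.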


The substitution given by the method of Fisher scoring admits the use of weighted OLS to iteratively define weights W and the synthetic dependent variable Z. Later theoretical developments relaxed the assumption that the estimating equation must be defined as the derivative of a log-likelihood.

If we maintain the assumption that the estimating equation is the derivative of the log-likelihood, we can skip the substitution of the expected Hessian matrix given by the method of Fisher scoring. In this approach, we use a Newton-Raphson algorithm to iterate to the maximum likelihood estimates of β. This algorithm is succinctly described by the following steps:

1. Initialize the coefficient vector β.
2. Calculate the log-likelihood ℒ for the initial β.
3. Set the scalar BetaTol and the scalar LikTol to a desired tolerance level.
4. Set the old log-likelihood value ℒ_old to ℒ + 2LikTol (or some other large value).
5. Initialize the coefficient vector β_old to 10β + 1 (or some other large values).
6. If ||β - β_old|| < BetaTol and |ℒ - ℒ_old| < LikTol then stop.
7. Calculate the gradient g = ∂ℒ/∂β evaluated at the current β.
8. Calculate the negative Hessian H = -∂²ℒ/(∂β∂β^T) evaluated at the current β.
9. Set β_old = β.
10. Set ℒ_old = ℒ.
11. Calculate the new coefficient vector β = β_old + H^(-1)g.
12. Calculate the new log-likelihood ℒ.
13. Return to step 6.
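The Newton-Raphson steps can be sketched in Python with NumPy as well. This is a minimal illustration under stated assumptions:

- The caller supplies the log-likelihood, gradient, and negative Hessian as functions.
- The names `newton_raphson`, `neg_hess` and the small logistic-regression data set are hypothetical.
- The norm in step 6 is taken to be the Euclidean norm, which is also an assumption.

The loop ends once both the coefficient change and the log-likelihood change fall below their tolerances.

```python
import numpy as np

def newton_raphson(loglik, grad, neg_hess, beta, beta_tol=1e-8, lik_tol=1e-8, max_iter=100):
    """Maximize loglik by Newton-Raphson, following the numbered steps."""
    beta = np.asarray(beta, dtype=float)                  # Step 1
    ll = loglik(beta)                                     # Step 2
    # Step 3: beta_tol and lik_tol are the tolerance arguments.
    ll_old = ll + 2 * lik_tol                             # Step 4
    beta_old = 10 * beta + 1                              # Step 5
    for _ in range(max_iter):
        if (np.linalg.norm(beta - beta_old) < beta_tol
                and abs(ll - ll_old) < lik_tol):          # Step 6
            break
        g = grad(beta)                                    # Step 7
        H = neg_hess(beta)                                # Step 8: H = -d2L/(dbeta dbeta^T)
        beta_old, ll_old = beta, ll                       # Steps 9-10
        beta = beta_old + np.linalg.solve(H, g)           # Step 11: beta_old + H^-1 g
        ll = loglik(beta)                                 # Step 12
    return beta                                           # Step 13 is the loop back to step 6

# Usage: logistic regression (logit link) on a small hypothetical data set.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

def loglik(b):
    eta = X @ b
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))  # sum[y*eta - ln(1 + e^eta)]

def grad(b):
    p = 1.0 / (1.0 + np.exp(-(X @ b)))
    return X.T @ (y - p)

def neg_hess(b):
    p = 1.0 / (1.0 + np.exp(-(X @ b)))
    return (X.T * (p * (1.0 - p))) @ X

beta_hat = newton_raphson(loglik, grad, neg_hess, np.zeros(2))
print(beta_hat)  # close to [ln(1/3), 2 ln 3]: group proportions 1/4 and 3/4
```

The logit link is canonical for the Bernoulli distribution, so the negative Hessian here does not involve y. It therefore equals its expected value, and this Newton-Raphson run takes the same steps as Fisher scoring (IRLS) would on these data.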

The complete theory and derivation of these algorithms are given in Hardin and Hilbe (2001).

1.3 Software

There are a number of general purpose statistical packages that offer different levels of support for fitting the models described in this text. In the following subsections, we discuss only those packages that were used in preparing the output for the various examples we have used. While we specifically give information and show output from these particular packages, we should emphasize that none of these products fully supports all of the models and diagnostics that we discuss. Each of the packages mentioned in this section has built-in support for user-written programs, so that the included output and examples can sometimes be obtained with minimal programming effort.

A researcher who intends to investigate all of the details and techniques outlined here will ultimately need to engage in programming. For the diagnostics and tests, the level of programming is minimal, so experience with any of these packages should give the interested reader sufficient expertise to obtain results. A more profound level of programming expertise is required for models that are extensions from the principal collection of models. The commercial vendors for the software packages used in this text can provide license information as well as technical support. Depending on the breadth of analyses an individual must perform, having access to more than one of these packages may be a great advantage. Horton and Lipsitz (1999) present a particularly useful review of software that focuses on the four software packages used in this text. Hilbe (1994a) offers a similarly useful review of software for GLMs.

1.3.1 S-PLUS

S-PLUS is a general purpose statistical package available from Insightful Corporation. See http://www.insightful.com for information and pricing. S-PLUS is the commercial version of S code originally developed by research statisticians at AT&T. The S language became popular in academic circles, where it was freely distributed and supported. Once commercialized, S-PLUS maintained its popularity with academicians. Gradually S-PLUS found favor with corporate research institutions, and it is currently one of the most popular statistical packages in use.

After GLIM, S-PLUS was one of the first packages to implement an algorithm for generalized linear models, or GLMs. First authored by Trevor Hastie, the GLM procedure has remained virtually unchanged since its inception. It uses a basic IRLS algorithm with an expected information matrix used to derive standard errors. S-PLUS supports the traditional GLM families, but does not estimate geometric or negative binomial models.

S-PLUS has no built-in support for fitting GEE models. However, there are various user-written macros or packages available on the Internet for fitting these and other similarly oriented models. S-PLUS is in wide use by researchers due to its feature-rich collection of commands, object-oriented design, and support for linking directly to externally compiled code. The original user-written software for fitting PA-GEE models was a program written in the C language. This program is accessed directly from within the S-PLUS system.

Other user-written programs are available, including YAGS (yet another GEE solver). This package was originally developed for use with S-PLUS version 3.4, which has since been superseded by a number of version enhancements. At this writing the current version is S-PLUS 6.0. Since the methods for accessing externally compiled code have changed through the versions of S-PLUS, the YAGS package is not usable with the current version of S-PLUS. A community-based programming language called R, founded on the syntax of S-PLUS, may be substituted in many analyses; the current version of YAGS can also be used with R. The R language is fast becoming a popular means of performing statistical analysis within the academic community, much like S was in the 1980s.

geex is another GEE package for use with S-PLUS. It is available from http://lib.stat.cmu.edu/S/geex. While this package is somewhat older than currently developed packages, the geex software can be used with current versions of S-PLUS and is the GEE module we use for many examples in this text.

Graphics and support for low-level control of graphics are a particularly strong feature of the package. Nearly all of the graphics in this text were produced using S-PLUS.

1.3.2 SAS

SAS is a general purpose statistical package with the largest user base of any statistical software product. Information and pricing may be obtained from the SAS Institute company website at http://www.sas.com.

A user-defined SAS macro was the only recognized program for estimating GEE models in the late 1980s. The macro estimated PA-GEE models for the Gaussian, binomial, Poisson, and gamma families. It was used exclusively until other packages, including SAS, implemented the routines into their main statistical offerings.

SAS incorporates a full range of PA-GEE model estimation as part of its STAT/GENMOD built-in procedure. GENMOD is an all-purpose GLM modeling facility that was written by Gordon Johnston in the early 1990s. User-written facilities were brought together from sources such as Hilbe (1994b) to complete this procedure for GLMs. In the mid-1990s a REPEATED option was added to the main GENMOD program. The new option allows estimation of the standard PA-GEE models using either moment estimators or alternating logistic regressions. It also allows estimation of the dispersion parameter via a user-specified option for two different moment estimators. It is possible to perform GEE estimation on all standard GLM families.

There are a number of user-written macro additions that may be obtained and used as alternatives to those commands included in the software, as well as additions to the base package.

In addition, SAS has excellent features for fitting likelihood-based models for panel data. Two of the highlights of this support include mixed linear regression models and mixed nonlinear regression models.

1.3.3 Stata

Stata is an all-purpose statistical package that has excellent support for a variety of panel models. Stata's GEE program, called xtgee, is built upon its glm program. However, unlike SAS, Stata's GEE program is a separate command. Hilbe wrote the first comprehensive glm program for Stata in 1993. Hardin wrote the xtgee code several years afterwards. Hilbe (1993a) wrote the original glm user-written command, and the current Stata version of glm was written by Hardin and Hilbe in 2000.

Stata has good support for PA-GEE models along with options for specifying two different estimators for the dispersion parameter. All GLM families and links are included as modeling options, including the power and odds power links. In addition to its basic xtgee program, Stata employs separate program procedures for specific models. For instance, the command xtpois can be used to estimate either population averaged (PA-GEE), conditional fixed effects, or random effects Poisson models. The random effect may in turn be designated as belonging to the gamma or Gaussian distribution. This facility allows the user to easily compare alternative models.

In addition to the built-in support for PA-GEE models, Stata users have contributed programs for fitting generalized linear and latent models that include multi-level random effects models. This addition includes an adaptive quadrature optimization routine that outperforms the usual (nonadaptive) Gauss-Hermite quadrature implementations of many random-effects models.

Stata also has a rather extensive suite of survey models, allowing the user to select strata and survey weights.

Information and pricing are available from http://www.stata.com.

1.3.4 SUDAAN

SUDAAN, developed by Research Triangle Institute in North Carolina, is a general purpose survey and panel data modeling package that can be used alone or as a callable program from within the SAS package. Information and pricing are available from http://www.rti.org/sudaan.

SUDAAN was specifically designed to analyze complex survey data. However, it can also be used for data without a specified sampling plan. While SUDAAN does not allow the modeling of several mainstream GLM families and links, it does include certain features not found in the other packages. Most notable among these is support for PA-GEE multinomial logistic regression models (demonstrated in this text). All SUDAAN commands include support for specification of strata and survey weights.

SUDAAN provides an option to estimate standard errors for its PA-GEE models using jackknife procedures. Stata allows this option, as well as bootstrapped standard errors, for its glm command, but it is not yet implemented for the xtgee command.

Bieler and Williams (1997) have written an excellent user manual for SUDAAN that provides numerous examples of how to use and interpret GEE models using SUDAAN software.


1.4 Exercises

1. Obtain the documentation for the GLM and GEE commands that are supported by the software packages that you will use.
2. The table of link functions does not include entries for the negative binomial distribution (for a specified ancillary parameter). Derive the entries for this function.
3. Identify and discuss two methods for dealing with overdispersion in a GLM.
4. Construct a GLM IRLS algorithm specifically for the log-linked Poisson regression model using the programming language of your software of choice.
5. Create a list of GLM and GEE capabilities for each software package that you use. Include in your list the available link functions, variance functions, dispersion statistics, variance estimators, and other options.
6. Show how the link and variance functions can be abstracted from the exponential family form of the GLM probability distribution.

CHAPTER 2

Model Construction and Estimating Equations

In this chapter we review various modeling techniques in order to provide a common glossary of terms to be used throughout the text. This review also provides a valuable base to which one can refer when generalizations are introduced.

We begin with a review of likelihood-based regression models. Our discussion of the standard techniques for deriving useful models begins with illustrations on independent data. Our focus is on the derivation of the likelihood and estimating equation. After illustrating the standard techniques for building estimating equations for likelihood-based models, we review the estimating equation for generalized linear models (GLMs). We point out the generalizations and relationship of GLMs to the probability-based models that precede them in the discussion.

After reviewing the techniques for model construction with independent data, we introduce the concepts associated with panel data and highlight the likelihood-based techniques for addressing second-order dependence within the data. Finally, we present estimators for the variance of the regression coefficients so that similar estimators for generalized estimating equations may be subsequently introduced in context.

2.1 Independent data

A common introduction to likelihood-based model construction involves several standard steps, which follow:

1. Choose a distribution for the outcome variable.
2. Write the joint distribution for the data set.
3. Convert the joint distribution to a likelihood.
4. Generalize the likelihood via introduction of a linear combination of covariates and associated coefficients.
5. Parameterize the linear combination of covariates to enforce range restrictions on the mean and variance implied by the distribution.
6. Write the estimating equation for the solution of unknown parameters.

Once the model is derived, we may choose to estimate the fully specified log-likelihood with any extra parameters, or we may consider those extra parameters ancillary to the analysis. The former is called full information maximum likelihood (FIML); the latter is called limited information maximum likelihood (LIML). Estimation may then be carried out using an optimization method. The most common technique is that of Newton-Raphson, or a modification of the Newton-Raphson estimating algorithm.

We present an overview of this and other optimization techniques in section 2.4. For a detailed derivation of the Newton-Raphson and iteratively reweighted least squares (IRLS) algorithms, see Hardin and Hilbe (2001). For a practical discussion of optimization in general, see Gill, Murray, and Wright (1981).

The next three subsections illustrate model construction for three specific distributions. In carrying out the derivation of each respective model, we emphasize the steps to model construction, the need for parameterization, and the identification of the estimating equation.

2.1.1 The FIML estimating equation for linear regression

Let us assume that we have a data set where the outcome variable of interest is (effectively) continuous with a large range. In this situation the normal (Gaussian) distribution is typically used as the foundation for estimation. The density for the normal distribution $N(\mu, \sigma^2)$ is expressed as

$$ f(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\} \tag{2.1} $$

where

$$ E(y) = \mu \in \Re \tag{2.2} $$
$$ V(y) = \sigma^2 > 0 \tag{2.3} $$

and $\Re$ indicates the range of real numbers. We may write this density for a single outcome as

$$ f(y_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu)^2}{2\sigma^2} \right\} \tag{2.4} $$

The joint density for $n$ independent outcomes subscripted from $1, \ldots, n$ is the product of the densities for the individual outcomes

$$ f(y_1, \ldots, y_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu)^2}{2\sigma^2} \right\} \tag{2.5} $$
$$ = \prod_{i=1}^{n} \exp\left\{ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i-\mu)^2}{2\sigma^2} \right\} \tag{2.6} $$

The likelihood is simply a restatement of the joint density where we consider the outcomes as given, and model the parameters as unknown

$$ L(\mu, \sigma^2 \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \exp\left\{ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i-\mu)^2}{2\sigma^2} \right\} \tag{2.7} $$
$$ = \exp\left[ \sum_{i=1}^{n} \left\{ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i-\mu)^2}{2\sigma^2} \right\} \right] \tag{2.8} $$

Since our goal is to introduce covariates that model the outcome, we add a subscript to the notation, changing $\mu$ to $\mu_i$, allowing the mean to reflect a dependence on a linear combination of the covariates and their associated coefficients.

$$ L(\mu, \sigma^2 \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \exp\left\{ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i-\mu_i)^2}{2\sigma^2} \right\} \tag{2.9} $$
$$ = \exp\left[ \sum_{i=1}^{n} \left\{ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i-\mu_i)^2}{2\sigma^2} \right\} \right] \tag{2.10} $$

We introduce covariates into the model as a function of the expected value of the outcome variable. We also assume that we have a collection of independent covariates with associated coefficients to be estimated. The linear combination of the covariates and the associated coefficients is called the linear predictor, $\eta_i = x_i\beta \in \Re$, where $x_i$ is the $i$th row of the $X$ matrix. The linear predictor is introduced into the model in such a way that the range restrictions of the distribution are observed.

For this particular case, the variance of the outcome is $V(y_i) = \sigma^2$, which does not impose any range restrictions on the expected value of the outcome. Further, the range of the expected value matches the range of the linear predictor. As such, we could simply replace the expected value $\mu$ with the linear predictor. Formally, we use the identity function to parameterize the mean as

$$ g(\mu_i) = \mu_i = x_i\beta \tag{2.11} $$

Under this approach, equation 2.12 is our likelihood-based model for linear regression. Replacing the expected value with our suitably parameterized linear predictor results in the log-likelihood

$$ \mathcal{L}(\beta, \sigma^2 \mid X, y_1, \ldots, y_n) = \sum_{i=1}^{n} \left\{ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(y_i - x_i\beta)^2}{2\sigma^2} \right\} \tag{2.12} $$

Even though the identity parameterization is the natural, canonical, parameterization from the derivation, we are not limited to that choice. In the case that the outcomes are always positive, we could choose

$$ g(\mu_i) = \ln(\mu_i) = x_i\beta \tag{2.13} $$

resulting in the familiar log-linear regression model. The parameterization via the log function implies that $g^{-1}(x_i\beta) = \exp(x_i\beta) = \mu_i$, and ensures a desired nonnegative fit from the linear predictor. Under this log parameterization, our final log-likelihood model for log-linear regression is

$$ \mathcal{L}(\beta, \sigma^2 \mid X, y_1, \ldots, y_n) = \sum_{i=1}^{n} \left\{ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{\{y_i - \exp(x_i\beta)\}^2}{2\sigma^2} \right\} \tag{2.14} $$

the next next step step is is to to specify specify the the estimating estimating equaequaFor aa likelihood-based likelihood-based model, model, the For tion. The solution to the estimating equation provides the desired estimates. tion . The solution to the estimating equation provides the desired estimates . In the case case of of aa likelihood-based likelihood-based model, model, the the estimating estimating equation equation is is the In the the derivaderivative of the log-likelihood. We either derive an estimating equation in terms of of tive of the log-likelihood . We either derive an estimating equation in terms o = (,8, (72) (a FIML model), or we specify an estimating equation in terms 0 = (Q, Q2 ) (a FIML model), or we specify an estimating equation in terms of 0 = = (Q) (,8) where where Q2 (72 is is ancillary ancillary (a (a LIML LIML model) model).. The The ancillary ancillary parameters parameters of 0 in a LIML model are either estimated separately, or specified. The resulting in a LIML model are either estimated separately, or specified . The resulting estimates for the parameters are conditional on the ancillary parameters being estimates for the parameters are conditional on the ancillary parameters being correct. correct . Using the the identity identity link link for for parameterization parameterization of of the the linear linear predictor, predictor, the the Using linear regression FIML FIML estimating estimating equation equation T(0) '!I(0) = = 00 for for (3 (,8pXI' is given linear regression given p,, ,or(72) 2 ) is by by

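$$ \Psi(\Theta) = \begin{bmatrix} \left\{ \dfrac{\partial \mathcal{L}}{\partial \beta_j} = \displaystyle\sum_{i=1}^{n} \frac{1}{\sigma^2}(y_i - x_i\beta)\,x_{ji} \right\}_{j=1,\ldots,p} \\[2ex] \dfrac{\partial \mathcal{L}}{\partial \sigma^2} = \displaystyle\sum_{i=1}^{n} \left\{ -\frac{1}{2\sigma^2} + \frac{(y_i - x_i\beta)^2}{2\sigma^4} \right\} \end{bmatrix}_{(p+1)\times 1} = [0]_{(p+1)\times 1} \tag{2.15} $$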
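With the identity link, the $\beta$ block of 2.15 is the familiar set of least-squares normal equations, and setting the $\sigma^2$ component to zero gives $\hat\sigma^2 = \sum_i (y_i - x_i\hat\beta)^2 / n$. The following minimal sketch assumes a single covariate plus an intercept (so the solution has a closed form) and uses made-up data; the function names are illustrative only. It computes the FIML estimates and then plugs them back into $\Psi(\Theta)$ to confirm that every component is numerically zero.

```python
def fit_linear_fiml(x, y):
    """Solve the FIML estimating equation (2.15) for y_i = b0 + b1*x_i + e_i.

    Assumes a single covariate plus an intercept so the solution can be
    written in closed form without a linear-algebra library.
    """
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Setting dL/dsigma^2 = 0 gives RSS / n (the ML divisor, not n - p).
    sigma2 = sum(r * r for r in resid) / n
    return b0, b1, sigma2


def psi_linear(x, y, b0, b1, sigma2):
    """Evaluate the (p+1) x 1 estimating function of (2.15) with p = 2."""
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    d_b0 = sum(resid) / sigma2  # intercept column: x_0i = 1
    d_b1 = sum(r * xi for r, xi in zip(resid, x)) / sigma2
    d_s2 = sum(-1 / (2 * sigma2) + r * r / (2 * sigma2 ** 2) for r in resid)
    return [d_b0, d_b1, d_s2]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
b0, b1, s2 = fit_linear_fiml(x, y)
print(b0, b1, s2)
print(psi_linear(x, y, b0, b1, s2))  # each component is zero up to rounding
```

Note the divisor $n$ rather than $n - p$: the FIML solution for $\sigma^2$ is the maximum likelihood estimate, which differs from the usual unbiased residual variance.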

Note, however, that we write the estimating equation in terms of $\mu$, rather than $x\beta$, to incorporate a general parameterization of the linear predictor. To include the parameterization, we use the chain rule

$$ \frac{\partial \mathcal{L}}{\partial \beta} = \frac{\partial \mathcal{L}}{\partial \mu}\,\frac{\partial \mu}{\partial \eta}\,\frac{\partial \eta}{\partial \beta} \tag{2.16} $$

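In this more general notation, the estimating equation $\Psi(\Theta) = 0$ is given by

$$ \Psi(\Theta) = \begin{bmatrix} \left\{ \dfrac{\partial \mathcal{L}}{\partial \beta_j} = \displaystyle\sum_{i=1}^{n} \frac{y_i - \mu_i}{\sigma^2} \left(\frac{\partial \mu}{\partial \eta}\right)_i x_{ji} \right\}_{j=1,\ldots,p} \\[2ex] \dfrac{\partial \mathcal{L}}{\partial \sigma^2} = \displaystyle\sum_{i=1}^{n} \left\{ -\frac{1}{2\sigma^2} + \frac{(y_i - \mu_i)^2}{2\sigma^4} \right\} \end{bmatrix}_{(p+1)\times 1} = [0]_{(p+1)\times 1} \tag{2.17} $$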

and we must specify the relationship (parameterization) of the expected value $\mu$ to the linear predictor $\eta = X\beta$. In the case of linear regression, $\eta = \mu$. The estimating equation for the LIML model $\Psi[\Theta = (\beta)] = 0$, treating $\sigma^2$ as ancillary, is just the upper $p \times 1$ part of the estimating equation 2.17.
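The chain rule 2.16 is easy to check numerically. This sketch (hypothetical helper names, illustrative data, and a fixed $\sigma^2 = 0.5$) evaluates the $\beta$ block of 2.17 for the log link of equation 2.13, where $(\partial\mu/\partial\eta)_i = \exp(\eta_i) = \mu_i$, and compares it with a central finite-difference gradient of the log-likelihood 2.14.

```python
import math


def loglik_log_link(beta, sigma2, X, y):
    """Log-likelihood (2.14): Gaussian outcome with mu_i = exp(x_i beta)."""
    total = 0.0
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        total += -0.5 * math.log(2 * math.pi * sigma2) - (yi - mu) ** 2 / (2 * sigma2)
    return total


def score_beta(beta, sigma2, X, y):
    """Upper p x 1 block of (2.17) assembled with the chain rule (2.16).

    dL/dmu_i = (y_i - mu_i) / sigma2, dmu/deta = exp(eta_i) = mu_i for the
    log link, and deta/dbeta_j = x_ji.
    """
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        for j, xji in enumerate(xi):
            grad[j] += (yi - mu) / sigma2 * mu * xji
    return grad


def numeric_gradient(f, beta, h=1e-6):
    """Central finite-difference gradient of f at beta."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        down = list(beta)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]]  # intercept + one covariate
y = [1.8, 2.6, 4.4, 7.5]
beta = [0.1, 0.9]
sigma2 = 0.5

analytic = score_beta(beta, sigma2, X, y)
numeric = numeric_gradient(lambda b: loglik_log_link(b, sigma2, X, y), beta)
print(analytic)
print(numeric)  # agrees with the chain-rule result to several decimal places
```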


2.1.2 The FIML estimating equation for Poisson regression

The Poisson distribution is the natural choice to model outcome variables that are nonnegative counts. The Poisson density is given by

$$ f(y \mid \lambda) = \frac{e^{-\lambda}\lambda^{y}}{y!} \tag{2.18} $$

where

$$ E(y) = \lambda > 0 \tag{2.19} $$
$$ V(y) = \lambda > 0 \tag{2.20} $$

The joint density for $n$ independent outcomes subscripted from $1, \ldots, n$ is then given as the product of the densities for the individual outcomes

$$ f(y_1, \ldots, y_n \mid \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{y_i}}{y_i!} \tag{2.21} $$
$$ = \prod_{i=1}^{n} \exp\left\{ -\lambda + y_i\ln(\lambda) - \ln(y_i!) \right\} \tag{2.22} $$
$$ = \prod_{i=1}^{n} \exp\left\{ -\lambda + y_i\ln(\lambda) - \ln\Gamma(y_i + 1) \right\} \tag{2.23} $$
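A quick numerical check that 2.18 and 2.23 describe the same density: the version written with $\ln\Gamma(y+1)$ uses Python's `math.lgamma` and agrees with the factorial form. This is a sketch with an arbitrary $\lambda = 2.5$; the function names are illustrative.

```python
import math


def poisson_direct(y, lam):
    """Equation (2.18): exp(-lambda) * lambda**y / y!."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def poisson_exp_form(y, lam):
    """One factor of (2.23): exp{-lambda + y ln(lambda) - ln Gamma(y + 1)}."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


lam = 2.5
for y in range(6):
    print(y, poisson_direct(y, lam), poisson_exp_form(y, lam))
```

Working on the log scale is also convenient computationally: in Python, `lam ** y / math.factorial(y)` raises an `OverflowError` once $y!$ exceeds the floating-point range (around $y = 171$), whereas the $\ln\Gamma$ form stays finite.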

The likelihood is a restatement of the joint density where we consider the outcomes as given and model the parameter as unknown

$$ L(\lambda \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \exp\left\{ -\lambda + y_i\ln(\lambda) - \ln\Gamma(y_i + 1) \right\} \tag{2.24} $$

Since our goal is to introduce covariates that model the outcome, we add a subscript to the notation allowing the mean to reflect a dependence on a linear combination of the covariates and their associated coefficients. We also depart from the usual presentation of the Poisson distribution by using $\mu$ for the expected value $\lambda$. Replacing $\lambda$ with $\mu$ is merely for notational consistency (with the models to follow), and has no effect on the derivation of the estimating equation.

$$ L(\mu \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \exp\left\{ -\mu_i + y_i\ln(\mu_i) - \ln\Gamma(y_i + 1) \right\} \tag{2.25} $$

As in the previous derivation for linear regression, we introduce covariates into the model through the expected value $\mu$ of the outcome variable, and we assume a collection of independent covariates with associated coefficients to be estimated, whose linear combination is called the linear predictor $\eta_i = x_i\beta \in \Re$; note that $x_i$ is the $i$th row of the design matrix $X$. We introduce the linear predictor into the model in such a way that the range restrictions of the distribution are observed. In this particular case, the variance of the outcome is given by

$$ V(y_i) = \mu_i > 0 \tag{2.26} $$

which depends on the expected value of the outcome. In fact, for the Poisson distribution, the variance is equal to the expected value. Therefore, we should parameterize the linear predictor to enforce a range $(0, \infty)$. The natural, or canonical, choice obtained from our derivation is

$$ g(\mu_i) = \ln(\mu_i) = x_i\beta \tag{2.27} $$

This parameterization of the canonical Poisson link function implies an inverse relationship given by $g^{-1}(x_i\beta) = \exp(x_i\beta) = \mu_i$, which ensures a nonnegative fit from the linear predictor. Under this parameterization for the expected value, the final log-likelihood is given by

$$ \mathcal{L}(\beta \mid X, y_1, \ldots, y_n) = \sum_{i=1}^{n} \left\{ -\exp(x_i\beta) + y_i x_i\beta - \ln\Gamma(y_i + 1) \right\} \tag{2.28} $$

The general FIML estimating equation $\Psi(\Theta) = 0$ for $\Theta = (\beta)$ is then

$$ \Psi(\Theta) = \left[ \left\{ \frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_{i=1}^{n} \left( \frac{y_i}{\mu_i} - 1 \right) \left( \frac{\partial \mu}{\partial \eta} \right)_i x_{ji} \right\}_{j=1,\ldots,p} \right]_{p\times 1} = [0]_{p\times 1} \tag{2.29} $$

where there are no ancillary parameters.
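Estimating equation 2.29 generally has no closed-form solution, so it is solved iteratively. The following is a minimal Newton-Raphson sketch for the canonical log link, where $(\partial\mu/\partial\eta)_i = \mu_i$, so each component reduces to $\sum_i (y_i - \mu_i)x_{ji}$ and the negative Hessian is $\sum_i \mu_i x_{ji} x_{ki}$. It assumes a small dense design matrix given as a list of rows and uses a hand-written Gaussian elimination solver in place of a linear-algebra library; the names and data are illustrative assumptions.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def poisson_newton(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of (2.29) for the canonical log link.

    Score:            sum_i (y_i - mu_i) x_ji   (since dmu/deta = mu_i)
    Negative Hessian: sum_i mu_i x_ji x_ki
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        score = [sum((yi - mi) * xi[j] for xi, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(mi * xi[j] * xi[k] for xi, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Intercept plus one binary covariate (illustrative data).
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
y = [2, 3, 4, 5, 7, 9]
beta_hat = poisson_newton(X, y)
print(beta_hat)
```

With one binary covariate, the fitted rates must equal the group means of $y$ (here 3 and 7), so the estimates should be $\ln 3 \approx 1.0986$ and $\ln(7/3) \approx 0.8473$, which makes the sketch easy to verify. For the canonical link, Newton-Raphson coincides with Fisher scoring (IRLS) because the Hessian does not depend on $y$.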

2.1.3 The FIML estimating equation for Bernoulli regression

Assume that the outcome variable of interest is binary and that our data are coded such that a successful outcome in the experiment is coded as a one and a failure is coded as a zero. The Bernoulli distribution, the special case of the binomial distribution with a single trial, is the appropriate choice for estimation of binary data. Its density function is

$$ f(y \mid p) = p^{y}(1-p)^{1-y} \tag{2.30} $$

where $p \in [0,1]$ is the probability of success, and

$$ E(y) = p \in (0,1) \tag{2.31} $$
$$ V(y) = p(1-p) \in (0,1) \tag{2.32} $$

The joint density for $n$ independent outcomes subscripted from $1, \ldots, n$ is then given as the product of the densities for the individual outcomes

$$ f(y_1, \ldots, y_n \mid p) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} \tag{2.33} $$
$$ = \prod_{i=1}^{n} \exp\left\{ y_i \ln\left( \frac{p}{1-p} \right) + \ln(1-p) \right\} \tag{2.34} $$
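The rewriting from 2.33 to 2.34 can be confirmed for both possible outcomes with a short check; the coefficient of $y_i$ in the exponent, $\ln\{p/(1-p)\}$, is the log odds that reappears as the canonical link in 2.38. A sketch with arbitrary values of $p$ (function names are illustrative):

```python
import math


def bernoulli_direct(y, p):
    """One factor of (2.33): p**y * (1 - p)**(1 - y), for y in {0, 1}."""
    return p ** y * (1 - p) ** (1 - y)


def bernoulli_exp_form(y, p):
    """One factor of (2.34): exp{y ln(p / (1 - p)) + ln(1 - p)}."""
    log_odds = math.log(p / (1 - p))  # the term that becomes the logit link
    return math.exp(y * log_odds + math.log(1 - p))


for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        print(p, y, bernoulli_direct(y, p), bernoulli_exp_form(y, p))
```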

The likelihood is simply a restatement of the joint density where we consider the outcomes as given and the parameters are modeled as unknown.

$$ L(p \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \exp\left\{ y_i \ln\left( \frac{p}{1-p} \right) + \ln(1-p) \right\} \tag{2.35} $$

Since our goal is to introduce covariates that model the outcome, and since we are interested in the individual contributions of each subject to the model, we introduce a subscript to the notation. This notation, changing $p$ to $p_i$, allows the mean response to reflect a dependence on the linear combination of the covariates and their associated coefficients. We also replace the common presentation of the Bernoulli expected value $p$ with $\mu$. In so doing, we have a consistent notation among various distributions.

$$ L(\mu \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \exp\left\{ y_i \ln\left( \frac{\mu_i}{1-\mu_i} \right) + \ln(1-\mu_i) \right\} \tag{2.36} $$

Again, we introduce covariates into the model through the expected value of the outcome variable. As in the previous example, we assume a collection of independent covariates with associated coefficients to be estimated, whose linear combination is called the linear predictor $\eta_i = x_i\beta \in \Re$.

We introduce the linear predictor into the model in such a way that the range restrictions of the distribution and variance are observed. In this particular case, the variance of the outcome is given by

$$ V(y_i) = \mu_i(1-\mu_i) \tag{2.37} $$

where $\mu_i \in (0,1)$, so that the variance depends on the expected value of the outcome. Therefore, we should parameterize the linear predictor to enforce a range $(0,1)$. The binomial admits several interesting and useful parameterizations. If we parameterize using the natural, or canonical, form from the derivation of the estimating equation

$$ g(\mu_i) = \ln\left( \frac{\mu_i}{1-\mu_i} \right) = x_i\beta \tag{2.38} $$

we have a logistic, or logit, regression model. If we parameterize using

$$ g(\mu_i) = \Phi^{-1}(\mu_i) = x_i\beta \tag{2.39} $$

where $\Phi$ is the standard normal cumulative distribution function, we have a probit regression model.
$$ \Psi(\Theta) = \left[ \left\{ \frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_{i=1}^{n} \left( \frac{y_i}{\mu_i} - \frac{1-y_i}{1-\mu_i} \right) \left( \frac{\partial \mu}{\partial \eta} \right)_i x_{ji} \right\}_{j=1,\ldots,p} \right]_{p\times 1} = [0]_{p\times 1} \tag{2.41} $$

There are no ancillary parameters.
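To see how the link enters 2.41 through $(\partial\mu/\partial\eta)_i$, this sketch evaluates the Bernoulli estimating function at a trial value of $\beta$ for the logit link (2.38) and the probit link (2.39). It uses $\partial\mu/\partial\eta = \mu(1-\mu)$ for the logit and the standard normal density for the probit. For the canonical logit link the weight collapses to $y_i - \mu_i$, which the code confirms. The data, $\beta$, and helper names are illustrative assumptions.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# link name -> (inverse link mu = g^{-1}(eta), derivative dmu/deta)
LINKS = {
    "logit": (inv_logit, lambda eta: inv_logit(eta) * (1.0 - inv_logit(eta))),
    "probit": (norm_cdf, norm_pdf),
}


def bernoulli_psi(beta, X, y, link):
    """Estimating function (2.41) evaluated at beta for the chosen link."""
    inv_link, dmu_deta = LINKS[link]
    psi = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        mu = inv_link(eta)
        weight = (yi / mu - (1 - yi) / (1 - mu)) * dmu_deta(eta)
        for j, xji in enumerate(xi):
            psi[j] += weight * xji
    return psi


X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # intercept + covariate
y = [0, 1, 0, 1]
beta = [0.2, 0.5]

# With the canonical logit link the weight reduces to (y_i - mu_i).
mu = [inv_logit(sum(b * v for b, v in zip(beta, xi))) for xi in X]
shortcut = [sum((yi - m) * xi[j] for xi, yi, m in zip(X, y, mu)) for j in range(2)]
print(bernoulli_psi(beta, X, y, "logit"), shortcut)
print(bernoulli_psi(beta, X, y, "probit"))
```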

2.1.4 The LIML estimating equation for GLMs

In the preceding sections we introduced a level of specification that is not normally present in the illustration of likelihood-based models. The extra specification was in terms of parameterizing the linear predictor. The reason we introduced this specification was to motivate its use in the more general setting of deriving models for an entire family of distributions. We looked at three specific distributions. Here, we investigate the exponential family of distributions. The advantage is that the exponential family not only includes the three specific examples already presented, but also includes many other distributions.

As we discussed in Chapter 1, the theory of generalized linear models (GLMs) was introduced in Nelder and Wedderburn (1972). These authors showed an underlying unity to a class of regression models where the response variable was a member of the exponential family of probability distributions. Again, members of this family of distributions include the Gaussian or normal, Bernoulli, binomial, Poisson, gamma, inverse Gaussian, geometric, and negative binomial distributions; see Hilbe (1993b) and Hilbe (1994c) for more information on the negative binomial as a GLM.

We can proceed in a similar manner to our previous examples with the goal of deriving likelihood-based models for this family. The exponential family of distributions has a location parameter θ, a scale parameter a(φ), and a normalizing term c(y, φ), with probability density

$$f(y;\theta,\phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right\} \tag{2.42}$$

where

$$E(y) = b'(\theta) = \mu \tag{2.43}$$

$$V(y) = b''(\theta)\,a(\phi) \tag{2.44}$$
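As a quick check of (2.43) and (2.44), the sketch below recovers the mean and variance of two familiar members (Poisson and Bernoulli, both with a(φ) = 1) by numerically differentiating the cumulant function b(θ). The dictionary `FAMILIES` and the function `implied_moments` are illustrative names, not part of any library; central finite differences are used only so that the check does not rely on hand-derived b′ and b″.

```python
import math

# Canonical-form pieces for two exponential-family members, each with a(phi) = 1:
#   Poisson:   theta = log(mu),            b(theta) = exp(theta),          V(mu) = mu
#   Bernoulli: theta = log(mu / (1 - mu)), b(theta) = log(1 + exp(theta)), V(mu) = mu (1 - mu)
FAMILIES = {
    "poisson": (math.log, math.exp, lambda mu: mu),
    "bernoulli": (lambda mu: math.log(mu / (1.0 - mu)),
                  lambda t: math.log1p(math.exp(t)),
                  lambda mu: mu * (1.0 - mu)),
}


def implied_moments(family, mu, a_phi=1.0, h=1e-4):
    """Return (b'(theta), b''(theta) * a(phi)) at the theta matching mu,
    using central finite differences of the cumulant function b."""
    theta_of_mu, b, _ = FAMILIES[family]
    theta = theta_of_mu(mu)
    b1 = (b(theta + h) - b(theta - h)) / (2.0 * h)
    b2 = (b(theta + h) - 2.0 * b(theta) + b(theta - h)) / h ** 2
    return b1, b2 * a_phi


for fam, mu in [("poisson", 3.0), ("bernoulli", 0.3)]:
    mean, var = implied_moments(fam, mu)
    # (2.43): mean should match mu; (2.44): var should match V(mu) * a(phi)
    print(fam, round(mean, 6), round(var, 4), "expected", mu, FAMILIES[fam][2](mu))
```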

The normalizing term is independent of θ and ensures that the density integrates to one. We have not yet listed any range restrictions; instead, the range restrictions are addressed after the estimating equation has been constructed. The variance is a function of the expected value of the distribution and a function of the (possibly unknown) scale parameter a(φ). The density for a single observation is

$$f(y_i;\theta,\phi) = \exp\left\{\frac{y_i\theta - b(\theta)}{a(\phi)} + c(y_i,\phi)\right\} \tag{2.45}$$

and the joint density for a set of n independent outcomes subscripted from 1, …, n is the product of the densities for the individual outcomes

$$f(y_1,\ldots,y_n;\theta,\phi) = \prod_{i=1}^{n}\exp\left\{\frac{y_i\theta - b(\theta)}{a(\phi)} + c(y_i,\phi)\right\} \tag{2.46}$$

INDEPENDENT DATA


The likelihood is simply a restatement of the joint density where we consider the outcomes as given, and model the parameters as unknown

$$L(\theta,\phi\mid y_1,\ldots,y_n) = \prod_{i=1}^{n}\exp\left\{\frac{y_i\theta - b(\theta)}{a(\phi)} + c(y_i,\phi)\right\} \tag{2.47}$$

Instead of introducing the covariates into the model at this point, it is notationally advantageous to wait and introduce covariates into the estimating equation. We now include a subscript for θ in anticipation of introducing the covariates. The log-likelihood for the exponential family is

$$\mathcal{L}(\theta,\phi\mid y_1,\ldots,y_n) = \sum_{i=1}^{n}\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi)\right\} \tag{2.48}$$
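To see that (2.48) is the familiar log-likelihood arranged differently, the sketch below evaluates it from the pieces θ_i, b(θ), a(φ), and c(y, φ) for the Poisson and Gaussian cases and compares the result with the usual log densities. The helper names are hypothetical; the Gaussian case assumes a(φ) = σ² and c(y, φ) = −y²/(2σ²) − log(2πσ²)/2, which is one standard way of writing that member.

```python
import math


def loglik_expfam(y, theta, b, a_phi, c):
    """Equation (2.48): sum_i {(y_i theta_i - b(theta_i)) / a(phi) + c(y_i, phi)}."""
    return sum((yi * ti - b(ti)) / a_phi + c(yi) for yi, ti in zip(y, theta))


def poisson_loglik(y, mu):
    # theta = log(mu), b(theta) = exp(theta), a(phi) = 1, c(y, phi) = -log(y!)
    return loglik_expfam(y, [math.log(m) for m in mu], math.exp, 1.0,
                         lambda yi: -math.lgamma(yi + 1.0))


def gaussian_loglik(y, mu, sigma2):
    # theta = mu, b(theta) = theta^2 / 2, a(phi) = sigma^2
    def c(yi):
        return -yi * yi / (2.0 * sigma2) - 0.5 * math.log(2.0 * math.pi * sigma2)
    return loglik_expfam(y, list(mu), lambda t: 0.5 * t * t, sigma2, c)


# Compare with the Poisson log pmf written in its usual form.
y, mu = [0, 2, 5], [0.5, 2.0, 4.0]
direct = sum(yi * math.log(m) - m - math.lgamma(yi + 1) for yi, m in zip(y, mu))
print(poisson_loglik(y, mu), direct)  # the two numbers agree
```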

The goal is to obtain a maximum likelihood estimator for θ. Since our focus is only on θ, we derive a LIML estimating equation where we treat the dispersion parameter a(φ) as ancillary. We know from basic principles that

$$E\left(\frac{\partial\mathcal{L}}{\partial\theta}\right) = 0 \tag{2.49}$$

Our LIML estimating equation is then Ψ(θ) = ∂ℒ/∂θ = 0, where we derive

$$\frac{\partial\mathcal{L}}{\partial\theta} = \sum_{i=1}^{n}\frac{y_i - b'(\theta_i)}{a(\phi)} \tag{2.50}$$

Utilizing the GLM result that in canonical form b′(θ) = μ, we may write

$$\frac{\partial\mathcal{L}}{\partial\theta} = \sum_{i=1}^{n}\frac{y_i - \mu_i}{a(\phi)} \tag{2.51}$$

substituting our preferred (consistent) μ notation for the expected value. Since our goal is to introduce covariates that model the outcome, we included a subscript on μ, allowing the mean to reflect a dependence on a linear combination of the covariates and their associated coefficients. We can now use the chain rule to obtain a more useful form of the LIML estimating equation Ψ(β) = 0 for β = β_(p×1). Because μ = b′(θ), the factor ∂θ/∂μ is 1/b″(θ) = 1/V(μ), the reciprocal of the variance function, while ∂η/∂β_j = x_ji for the linear predictor η_i = x_iβ:

$$\Psi(\beta) = \left[\frac{\partial\mathcal{L}}{\partial\beta_j}\right]_{p\times 1} = \left[\sum_{i=1}^{n}\left(\frac{\partial\mathcal{L}}{\partial\theta}\right)_i\left(\frac{\partial\theta}{\partial\mu}\right)_i\left(\frac{\partial\mu}{\partial\eta}\right)_i\left(\frac{\partial\eta}{\partial\beta_j}\right)_i\right]_{p\times 1} \tag{2.52}$$

$$= \left[\sum_{i=1}^{n}\frac{y_i - b'(\theta_i)}{a(\phi)}\,\frac{1}{V(\mu_i)}\left(\frac{\partial\mu}{\partial\eta}\right)_i x_{ji}\right]_{p\times 1} \tag{2.53}$$

$$= \left[\sum_{i=1}^{n}\frac{y_i - \mu_i}{a(\phi)V(\mu_i)}\left(\frac{\partial\mu}{\partial\eta}\right)_i x_{ji}\right]_{p\times 1} \tag{2.54}$$

so that the general LIML estimating equation for the exponential family is given by

$$\Psi(\beta) = \left[\sum_{i=1}^{n}\frac{y_i - \mu_i}{a(\phi)V(\mu_i)}\left(\frac{\partial\mu}{\partial\eta}\right)_i x_{ji}\right]_{j=1,\ldots,p} = [0]_{p\times 1} \tag{2.55}$$
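A direct transcription of (2.55) may help make the roles of the link and variance functions concrete. In this minimal sketch (NumPy assumed; `psi` and its argument names are hypothetical), the caller supplies the inverse link μ(η), the derivative ∂μ/∂η, and the variance function V(μ). For the canonical log link with Poisson variance, ∂μ/∂η = V(μ) = μ, and the equation collapses to X^T (y − μ).

```python
import numpy as np


def psi(beta, X, y, inv_link, dmu_deta, variance, a_phi=1.0):
    """GLM estimating equation (2.55) evaluated at beta; returns a length-p vector.

    Psi_j(beta) = sum_i (y_i - mu_i) / (a(phi) V(mu_i)) * (dmu/deta)_i * x_ji
    """
    eta = X @ beta
    mu = inv_link(eta)
    scale = dmu_deta(eta) / (a_phi * variance(mu))
    return X.T @ ((y - mu) * scale)


# Poisson with the canonical log link: dmu/deta = V(mu) = mu, so (2.55)
# reduces to X^T (y - mu).
X = np.column_stack([np.ones(6), np.linspace(-1.0, 1.0, 6)])
y = np.array([0.0, 1.0, 1.0, 2.0, 4.0, 6.0])
beta = np.array([0.3, 0.8])
print(psi(beta, X, y, np.exp, np.exp, lambda m: m))
print(X.T @ (y - np.exp(X @ beta)))  # same vector
```

Passing the three functions separately mirrors the text's point that the link and the variance function are specified independently; nothing in `psi` checks that they come from the same distribution.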

The result is an estimating equation derived from the exponential family of distributions where the expected value of the outcome variable is parameterized in terms of a linear predictor. The estimating equation is the derivative of the log-likelihood for the exponential family. It is given in terms of the suitably parameterized expected value and variance, where the variance is a function of the expected value. There is an additional parameter a(φ) that is not addressed in the estimating equation; it is an ancillary parameter called the dispersion in the GLM literature.

Our presentation in this section assumes that μ and V(μ) are the resulting forms from the chosen exponential family member distribution. Since the expected value and variance function result from the specific distribution in the exponential family, the estimating equation implies a valid likelihood (in terms of the source distribution of our mean and variance functions). Consequently, under these restrictions we view equation 2.55 as the LIML estimating equation for GLMs.

The ancillary parameter a(φ) is taken to be the scale parameter φ in nearly all GLM software implementations. One of the software implementations of GEE-based extensions to GLMs (used in examples in this text), however, allows a more general setting. As such, and in anticipation of later explanation, our presentation leaves this ancillary parameter specified as a(φ).

We lastly turn to a discussion regarding the restriction of the range of our parameterized linear predictor. GLMs are specified through a parameterization function, called the link function, and a variance that is a function of the mean. The conservative approach is to specify only parameterizations that ensure the implied range restrictions of the mean and variance functions. In so doing, the optimization should have no numeric difficulties (outside of collinearity or poorly chosen starting values) iterating to the global solution. However, if we choose a link function that does not restrict the variance to positive values, optimization may step to a candidate solution for which the variance is negative or undefined. For example, if we choose the log link for a binomial variance model, the fitted mean exp(x_iβ) is always positive but might be larger than one for certain observations, in which case the binomial variance function μ(1 − μ) is negative. On the other hand, the data might support this link in the sense that exp(x_iβ) ∈ (0, 1) for all i at the true β. If the data support a nonrestrictive link, then we are free to fit the model with this (nonrestrictive) link function and inference is clear. An example of the application and interpretation of nonrestrictive links is given in Wacholder (1986). While data may occasionally support a nonrestrictive link function, we are not surprised when unrestricted optimization steps out of the restricted range implied by the variance function. In other words, we can, in fact, use any link function.
Whether estimation proceeds to a solution in a valid region of the parameter space using particular data, however, is not guaranteed.

2.1.5 The LIMQL estimating equation for GLMs

In the first three examples of model construction in this chapter, we introduced covariates into the log-likelihood of a model and then derived the associated estimating equation. Our presentation of likelihood-based generalized linear models, however, instead introduced the covariates directly into the estimating equation. Either way, the result was an estimating equation that included a linear predictor with an associated coefficient vector to be estimated.

A powerful result from Wedderburn (1974) allows us to view the mean and variance functions as part of the LIML estimating equation for GLMs with no formal restriction that they originate from a specific (or even the same) distribution.
If we choose μ and V(μ) from a (single) member distribution of the exponential family, the estimating equation then implies the associated log-likelihood for that distribution. Resulting coefficient estimates in this case are properly labeled maximum likelihood estimates. Wedderburn's work extends the result by assuming that the form of the variance function is a known function of the mean (up to a scalar constant) and by assuming independence of the observations. These are weaker assumptions than a derivation from a specific distribution. This extension of work under a weaker set of assumptions is analogous to Gauss's extension of classical ordinary least squares, where the properties of the estimates for linear regression are justified on assumptions of independence and constant variance rather than upon an assumption of normality. We are therefore free to choose any parameterization of the mean and variance function and apply them in the derived estimating equation.
When we choose functions that are not from an exponential family member, the log-likelihood implied by the estimating equation is called a quasilikelihood, defined as

$$Q(y;\mu) = \int_{y}^{\mu}\frac{y - \mu^{*}}{V(\mu^{*})\,a(\phi)}\,d\mu^{*} \tag{2.56}$$
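The integral in (2.56) can be checked numerically. The sketch below, an illustrative helper `quasi_loglik` that uses only the Python standard library, applies composite Simpson's rule. For V(μ) = μ and a(φ) = 1 the result should match the closed form y log(μ/y) − (μ − y), which is the Poisson log-likelihood kernel up to terms that do not involve μ.

```python
import math


def quasi_loglik(y, mu, variance, a_phi=1.0, n=2000):
    """Quasilikelihood (2.56): integral from y to mu of (y - t) / (V(t) a(phi)) dt,
    computed with composite Simpson's rule on n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")

    def integrand(t):
        return (y - t) / (variance(t) * a_phi)

    h = (mu - y) / n  # negative when mu < y; the orientation of the integral is kept
    total = integrand(y) + integrand(mu)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(y + k * h)
    return total * h / 3.0


# V(mu) = mu: compare with the closed form y log(mu / y) - (mu - y).
print(quasi_loglik(3.0, 5.0, lambda t: t), 3.0 * math.log(5.0 / 3.0) - 2.0)
```

Any positive variance function can be passed in, which is the practical content of Wedderburn's extension: the integral exists whether or not V(μ) belongs to an exponential-family member.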

Resulting coefficient estimates are properly called maximum quasilikelihood estimates. The quasilikelihood is a generalization of the likelihood. Often, one refers to all estimates obtained from a GLM as maximum quasilikelihood estimates, irrespective of the source distribution of the applied mean and variance functions. Strictly speaking, this label fits only models whose mean and variance functions do not come from a single exponential-family member; models that do, such as those employing the canonical link, in fact produce likelihood-based estimates.

We need not alter the LIML estimating equation given in the preceding section. The LIMQL estimating equation for GLMs with no restriction on the choice of the mean and variance functions is the same (equation 2.55) as in the case where we restricted the population of candidate choices for the mean and variance functions.
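Whether it is viewed as a LIML or a LIMQL equation, (2.55) is usually solved by iteratively reweighted least squares (IRLS). The sketch below is a minimal, generic Fisher-scoring version, assuming NumPy, a(φ) = 1, and caller-supplied starting values for η; it is not the derivation given in Hardin and Hilbe (2001), and the function name `irls` is hypothetical. Its final weights also give the expected-Hessian variance estimate.

```python
import numpy as np


def irls(X, y, inv_link, dmu_deta, variance, eta_start, tol=1e-10, max_iter=100):
    """Solve the GLM estimating equation (2.55) by IRLS (Fisher scoring).

    Each iteration is a weighted OLS fit of the working response
        z_i = eta_i + (y_i - mu_i) / (dmu/deta)_i
    with working weights
        w_i = (dmu/deta)_i^2 / V(mu_i).
    Returns beta_hat and the expected-Hessian variance (X^T W X)^{-1},
    taking a(phi) = 1.
    """
    eta = np.asarray(eta_start, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = inv_link(eta)
        d = dmu_deta(eta)
        w = d ** 2 / variance(mu)
        z = eta + (y - mu) / d
        XtW = X.T * w  # equals X^T diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    w = dmu_deta(eta) ** 2 / variance(inv_link(eta))
    return beta, np.linalg.inv((X.T * w) @ X)


# Poisson regression with the canonical log link.
X = np.column_stack([np.ones(8), np.linspace(-1.5, 2.0, 8)])
y = np.array([0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 4.0, 6.0])
beta_hat, V_EH = irls(X, y, np.exp, np.exp, lambda m: m, np.log(y + 0.5))
print(beta_hat, np.sqrt(np.diag(V_EH)))
```

For a canonical link the expected and observed Hessians coincide, so these IRLS steps are also Newton-Raphson steps.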


2.2 Estimating the variance of the estimates

Included in the original presentation of GLMs was a description of an iteratively reweighted least squares (IRLS) algorithm for fitting the models and obtaining estimates. This algorithm is iterative and requires only weighted OLS at each step. The majority of programmable statistical software packages can be programmed to implement the full collection of models. A presentation of the details and derivation of the IRLS algorithm together with estimated variance matrices is covered in Hardin and Hilbe (2001). Here, we discuss the results derived from that reference.

We present formulae and a short discussion on various estimated variance matrices where, notationally, ∂μ/∂η is to be calculated at μ = μ̂ and φ̂ is an estimate of the dispersion parameter a(φ). Full details on GLMs and their associated variance estimates are in the references cited. Additional coverage of GLMs can be found in McCullagh and Nelder (1989), Hilbe (1994a), and Lindsey (1997).

Statistical packages typically calculate the variance estimate, numerically or analytically, as the inverse matrix of (negative) second derivatives. Alternatively, the estimate may be constructed from the Fisher scoring matrix of expected second derivatives. In the case that the GLM is fit with the canonical link, these calculations result in the same estimate. Otherwise, the two estimates are only asymptotically the same.

The variance estimates are given by

$$\hat{V}_H(\hat\beta) = \left\{-\frac{\partial^2\mathcal{L}}{\partial\beta_u\,\partial\beta_v}\right\}^{-1}_{p\times p} \tag{2.57}$$

where u, v = 1, …, p, and p is the column dimension of X. The Hessian matrix uses the second derivatives (of the likelihood) while the Fisher scoring matrix uses the expected second derivatives. If the second derivatives are used, we illustrate this by denoting V_H as V_OH to indicate that the variance estimate is based on the observed Hessian. If the Fisher scoring matrix is used, we denote V_H as V_EH to indicate that the variance estimate is based on the expected Hessian. The approaches are based on two different (asymptotically equivalent) forms of the information matrix.

The sandwich estimate of variance is of the form A B A^T, where A is the usual estimate of the variance based on the information matrix. The middle of the sandwich, B, is a correction term. Huber (1967) introduced the idea in a discussion of the general properties inherent in the solution of an estimating equation given by

$$\Psi(\beta) = \sum_{i=1}^{n}\Psi_i(x_i,\beta) = [0]_{p\times 1} \tag{2.58}$$

where Ψ_i(x_i, β) is the estimating equation for the ith observation. For our likelihood-based models, the estimating equation is the derivative of the log-likelihood for the distribution f, ∂ℒ/∂β. Our desire is to evaluate the properties of the variance estimate for cases when the data really come from distribution g. A is formally given by

$$A = \left\{-\frac{\partial\,E[\Psi(\beta)]}{\partial\beta}\right\}^{-1} \tag{2.59}$$

In most cases we can swap the order of the expectation and derivative operators so that

$$A = \left\{-E\left[\frac{\partial\Psi(\beta)}{\partial\beta}\right]\right\}^{-1} \tag{2.60}$$

may be estimated by V_H, the usual estimate of variance based on the information matrix, which is a naive variance estimate assuming the data are from distribution f. Cases which allow swapping of the order of expectation and differentiation are validated through convergence theorems not covered in this text. Interested readers should look at a text which more formally covers regularity conditions, such as Billingsley (1986), for details. Otherwise, note that the interchange of these operators is allowed in the various models we discuss.

The correction term given by the B matrix is the covariance of the estimating equation Ψ(β) = Σ Ψ_i(x_i, β); it is the covariance matrix of a sum (of vectors). Since the expected value of the score contributions is zero, the variance of the estimating equation is simply E{Ψ(β)Ψ(β)^T}, so that

$$B = \sum_{i=1}^{n} E\left[\Psi_i(x_i,\beta)\,\Psi_i^{T}(x_i,\beta)\right] + \sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n} E\left[\Psi_i(x_i,\beta)\,\Psi_j^{T}(x_j,\beta)\right] \tag{2.61}$$

If we assume that the observations are independent, then the cross terms are zero and the natural estimator of B is

$$\hat{B} = \sum_{i=1}^{n}\Psi_i(x_i,\hat\beta)\,\Psi_i^{T}(x_i,\hat\beta) \tag{2.62}$$
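The pieces A and B are easiest to see on the simplest estimating equation, Ψ_i(β) = y_i − β, whose root is the sample mean. The sketch below (NumPy assumed; `sandwich_for_mean` is a hypothetical name) makes no distributional assumption: A = 1/n comes from −∂Ψ/∂β = n, B from (2.62), and the resulting variance Σ(y_i − ȳ)²/n² is the ddof = 0 sample variance divided by n.

```python
import numpy as np


def sandwich_for_mean(y):
    """Sandwich variance A B A^T for the estimating equation
    Psi(beta) = sum_i (y_i - beta) = 0, whose root is the sample mean.

    With Psi_i(beta) = y_i - beta:
      A = {-dPsi/dbeta}^{-1} = 1/n          (2.59)-(2.60)
      B = sum_i Psi_i(beta_hat)^2           (2.62)
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    beta_hat = y.mean()
    A = 1.0 / n
    B = np.sum((y - beta_hat) ** 2)
    return beta_hat, A * B * A


est, var = sandwich_for_mean([2.0, 3.5, 1.0, 4.0, 2.5])
print(est, var)
```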

Using this information, the middle of the sandwich variance estimate is formed from the independent contributions of the estimating equation. For example, the correction term for the sandwich estimate of variance appropriate for GLMs is derived using

$$\Psi_i(x_i,\beta) = x_i^{T}\left(\frac{\partial\mathcal{L}}{\partial\eta}\right)_i = x_i^{T}\left(\frac{\partial\mathcal{L}}{\partial\mu}\right)_i\left(\frac{\partial\mu}{\partial\eta}\right)_i = x_i^{T}\,\frac{y_i-\mu_i}{a(\phi)V(\mu_i)}\left(\frac{\partial\mu}{\partial\eta}\right)_i \tag{2.63}$$

where the derivatives are evaluated at the estimated parameters, and the correction term is given by

$$\hat{B}_{GLM}(\hat\beta) = \left[\sum_{i=1}^{n} x_i^{T}\left\{\frac{y_i-\mu_i}{a(\phi)V(\mu_i)}\left(\frac{\partial\mu}{\partial\eta}\right)_i\right\}^{2} x_i\right]_{p\times p} \tag{2.64}$$


Included in this correction term, x_i is the ith (1 × p) row vector of the (n × p) matrix of covariates X. The general sandwich estimate of variance is the p × p matrix

$$\hat{V}_S(\hat\beta) = \hat{V}_H(\hat\beta)\,\hat{B}(\hat\beta)\,\hat{V}_H(\hat\beta) \tag{2.65}$$
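The sketch below assembles (2.63) to (2.65) for a fitted GLM, assuming NumPy; `glm_sandwich` is a hypothetical helper. It builds V_H from the expected Hessian, so in the terminology used later in this section it is the semi-robust version (identical to the robust version for canonical links). In the intercept-only Poisson example, the naive variance of the estimated log-mean is 1/(nȳ), whereas the sandwich gives Σ(y_i − ȳ)²/(nȳ)², in effect replacing the assumed variance ȳ by the empirical one.

```python
import numpy as np


def glm_sandwich(X, y, mu, dmu_deta, variance, a_phi=1.0):
    """Sandwich variance (2.65), V_S = V_H B V_H, for a fitted GLM.

    mu and dmu_deta are arrays evaluated at the fitted values. V_H is built
    from the expected Hessian (V_EH); B is B_GLM from (2.64).
    """
    v = variance(mu)
    w = dmu_deta ** 2 / (a_phi * v)               # expected-information weights
    V_H = np.linalg.inv(X.T @ (w[:, None] * X))
    u = (y - mu) / (a_phi * v) * dmu_deta         # scalar scores u_i, Psi_i = x_i^T u_i
    B = X.T @ ((u ** 2)[:, None] * X)             # sum_i x_i^T u_i^2 x_i
    return V_H @ B @ V_H


# Intercept-only Poisson model (log link): mu_hat = ybar for every observation.
y = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0])
X = np.ones((y.size, 1))
mu = np.full(y.size, y.mean())
print(glm_sandwich(X, y, mu, dmu_deta=mu, variance=lambda m: m))
```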

Since the sandwich estimate of variance combines the variance estimate for the specified model with a variance matrix constructed from the data, the variance estimate is sometimes called the empirical variance estimate.

Note that we can write equation 2.63 as the product of x_i and some scalar quantity u_i. By construction, the expected value of u_i is zero. These individual values are called the scores or score residuals. Some software packages allow access to the scores for model assessment.

If observations may be grouped due to some correlation structure (perhaps because the data are really panel data), then the sandwich estimate is modified to consider the sums of the n_i observations for each independent panel i. The individual observation-level contributions to the estimating equation are no longer independent, but the sums of the contributions over each panel are independent. These panel sums of the score contributions are used to form the middle of the modified sandwich variance. Continuing our example for the GLMs, the correction term is given by

$$\hat{B}_{MS}(\hat\beta) = \sum_{i=1}^{k}\left[\sum_{t=1}^{n_i} x_{it}^{T}\,\frac{y_{it}-\mu_{it}}{a(\phi)V(\mu_{it})}\left(\frac{\partial\mu}{\partial\eta}\right)_{it}\right]_{p\times 1}\left[\sum_{t=1}^{n_i}\frac{y_{it}-\mu_{it}}{a(\phi)V(\mu_{it})}\left(\frac{\partial\mu}{\partial\eta}\right)_{it} x_{it}\right]_{1\times p} \tag{2.66}$$

where x_it is the (1 × p) row vector of the (n × p) covariate matrix X corresponding to observation t of panel i, k is the number of independent panels, and n = n_1 + … + n_k. Using either form of the naive pooled variance estimate V_H, which ignores any within-panel correlation, the modified sandwich estimate of variance is the p × p matrix given by

$$\hat{V}_{MS} = \hat{V}_H\,\hat{B}_{MS}\,\hat{V}_H \tag{2.67}$$
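Equations 2.66 and 2.67 amount to summing the score rows x_it u_it within each panel before forming outer products. The sketch below (NumPy assumed; `modified_sandwich` is a hypothetical name) does this with `np.add.at`. The example uses a Gaussian model fit by OLS with p = 3 coefficients and only k = 3 panels: because the panel sums of the scores add to zero at the solution, B_MS has rank at most k − 1 = 2, so the modified sandwich is singular. With every observation placed in its own panel, the function reproduces the ordinary sandwich (2.65).

```python
import numpy as np


def modified_sandwich(X, y, mu, dmu_deta, variance, panel, a_phi=1.0):
    """Modified sandwich V_MS = V_H B_MS V_H, equations (2.66)-(2.67).

    Observation-level score rows x_it u_it are summed within each panel,
    and B_MS is the sum over panels of the outer products of those sums.
    Returns (V_MS, B_MS).
    """
    v = variance(mu)
    w = dmu_deta ** 2 / (a_phi * v)
    V_H = np.linalg.inv(X.T @ (w[:, None] * X))   # naive pooled V_EH
    u = (y - mu) / (a_phi * v) * dmu_deta
    scores = X * u[:, None]                       # row = x_it u_it
    _, idx = np.unique(panel, return_inverse=True)
    S = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(S, idx, scores)                     # panel sums of the score rows
    B_MS = S.T @ S
    return V_H @ B_MS @ V_H, B_MS


# Gaussian (identity link, V = 1) fit by OLS: n = 12 observations in k = 3 panels.
rng = np.random.default_rng(1)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)
mu = X @ np.linalg.lstsq(X, y, rcond=None)[0]
panel = np.repeat([0, 1, 2], 4)
V_MS, B_MS = modified_sandwich(X, y, mu, np.ones(n), np.ones_like, panel)
print(np.linalg.matrix_rank(B_MS))  # at most k - 1 = 2, so B_MS is singular
```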

A sandwich estimate of variance constructed with $V_H = V_{OH}$ (based on the observed Hessian) is called the robust variance estimate. If the construction uses $V_H = V_{EH}$ (based on the expected Hessian), the variance estimate is called the semi-robust variance estimate. The distinction matters because the semi-robust estimate, constructed with the expected Hessian, is not robust to misspecification of the link function. For a canonical link, the observed and expected Hessians are equal, so the two estimates coincide. Again, note that we can write the relevant terms in the innermost sums of equation 2.66 as the product of $x_{it}$ and some scalar quantity $u_{it}$. As in the case of uncorrelated data, the expected value of $u_{it}$ is zero by construction. The $u_{it}$ values are the scores. We emphasize that the middle of the modified sandwich estimate of variance has replaced panels of observations with their respective sums. The rank of
the resulting matrix is less than the number of panels in the construction. Therefore, this variance estimate should not be used for data sets with a small number of panels. The asymptotic justification of its distribution assumes that the number of panels goes to infinity. This dependence on the number of panels can be seen by inspecting equation 2.64, where the result is computed by summing $n$ matrices of size $(p \times p)$, one per observation. Suppose these $n$ observations are organized in a balanced data set of $k$ panels, each with $t$ observations ($n = kt$). Then equation 2.66 is the result of summing only $k$ matrices of size $(p \times p)$. If $k < p$, the modified sandwich estimate of variance is singular.

As a generalization, suppose the observations are not independent but may be pooled into independent panels. Then the formation of the $B$ matrix is a simple extension of the usual approach. The correlation within independent panels is addressed by summing the contributions to the estimating equation.
The $B$ matrix is the sum (over panels) of the outer products of the independent sums (within panels) of the estimating equation, $\sum_i \left(\sum_t \Psi_{it}\right)\left(\sum_t \Psi_{it}\right)^T$. In this case, the estimate of variance is called the modified sandwich variance estimate. Estimation is more difficult if the estimating equation is defined neither by independent observations nor by independent panels. For example, in the Cox proportional hazards model, observations contribute to the estimating equation through membership in the risk pool. Further, the risk pools share observations. Thus, the estimating equation is characterized neither by independent observations nor by independent panels of observations. Moreover, the derivation of the sandwich estimate of variance is complicated by the need to identify a suitable form of the estimating equation. Lin and Wei (1989) demonstrate the derivation of the sandwich estimate of variance for this complicated model.
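
The rank argument can be checked numerically. The following is a minimal sketch, assuming a hypothetical toy data set of $k = 3$ panels of 4 observations and $p = 5$ covariates, fitted by pooled least squares. For a Gaussian model the scalar $u_{it}$ is the residual, up to the constant $1/\sigma^2$. The helper names `modified_meat` and `numerical_rank` are illustrative only.

The usual middle matrix sums $n$ observation-level outer products. The modified middle matrix sums only $k$ panel-level outer products. At the pooled solution the $k$ panel sums add up to the total score, which is zero. Their rank is therefore at most $k - 1$, and the resulting sandwich is singular when $k < p$.

```python
import numpy as np

def numerical_rank(A, rel_tol=1e-10):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def modified_meat(X, u, panel):
    """B_MS: sum over panels of the outer product of the within-panel sums of x_it * u_it."""
    p = X.shape[1]
    B = np.zeros((p, p))
    for g in np.unique(panel):
        s = X[panel == g].T @ u[panel == g]          # within-panel sum of score contributions
        B += np.outer(s, s)
    return B

rng = np.random.default_rng(0)
k, t, p = 3, 4, 5                                    # 3 panels of 4 observations, 5 covariates (k < p)
n = k * t
panel = np.repeat(np.arange(k), t)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Pooled least-squares fit; for a Gaussian model u_it is the residual (up to the factor 1/sigma^2)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ beta

scores = X * u[:, None]                              # each row holds x_it * u_it
B_usual = scores.T @ scores                          # n outer products, one per observation
B_ms = modified_meat(X, u, panel)                    # only k outer products, one per panel
bread = np.linalg.inv(X.T @ X)                       # naive variance, up to sigma^2
V_ms = bread @ B_ms @ bread                          # modified sandwich estimate

print(numerical_rank(B_usual), numerical_rank(B_ms), numerical_rank(V_ms))
```

The explicit relative tolerance in `numerical_rank` keeps round-off from being counted as extra rank.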
Several times throughout the text we construct these modified sandwich estimates of variance for generalized estimating equations. The preceding discussion of sandwich estimates of variance is valid for estimating equations derived from likelihoods as well as for estimating equations that imply quasilikelihoods.

Lee, Scott, and Soo (1993) show that the modified sandwich estimate of variance for the pooled estimator underestimates the true covariance matrix. This is well known; in fact, all maximum likelihood estimation procedures underestimate the true covariance matrix. For small samples this bias is more pronounced, and various ad hoc measures have been proposed for modifying the sandwich estimate of variance.

The most common modification to the sandwich estimate of variance is a scale factor that depends on the sample size.
For the usual sandwich estimate of variance, a commonly used approach multiplies the estimate by $n/(n - p)$, where $n$ is the sample size and $p$ is the number of covariates in the model. For the modified sandwich estimate of variance, the estimate is scaled by $n/(n - 1)$, where $n$ is now the number of panels. This ad hoc attempt to modify the sandwich estimate of variance for use with small samples produces different answers from competing software packages. The
user consult the documentation for for specific specific software software to to learn learn if if any any user should should consult the documentation scale adjustments are are made. made. scale adjustments

2.3 Panel data

There is a substantial literature addressing the subject of clustered data. Clustered data occur when there is a natural classification of the observations, so that the data may be organized according to generation or sampling from units. For example, we may collect data on loans where we have multiple observations from different banks. It would be natural to address the dependence of the data on the bank itself, and there are several methods that we might use to take this dependence into account.

Panel data go by many names. The panels may represent a level of data organization in which the observations within a panel come from different experimental units belonging to the same classification. In that case the data are usually called panel data, clustered data, or repeated measurement data. If the observations within panels come from the same experimental unit measured over time, the data are typically called longitudinal data.
Unless a model or method is specific to a certain type of panel structure, we use the term panel data to refer to all forms of this type of data. Each method for addressing the panel structure of the data has advantages as well as limitations. It benefits the researcher to recognize the assumptions made and the inferences that are available.

In a panel data set, we assume that we have $i = 1, \ldots, n$ panels (clusters), where each panel has $t = 1, \ldots, n_i$ correlated observations. This notation allows either balanced panels, $n_1 = n_2 = \cdots = n_n$, or unbalanced panels, $n_j \neq n_i$ for at least one $j \in \{1, \ldots, n\}$, $j \neq i$. We focus on the exponential family of distributions since it includes the distributions individually illustrated in previous subsections: linear regression in section 2.1.1, Poisson regression in section 2.1.2, and Bernoulli regression in section 2.1.3. We amend the exponential family notation to read

$$f(y_{it}; \theta_{it}, \phi) = \exp\left\{ \frac{y_{it}\theta_{it} - b(\theta_{it})}{a(\phi)} - c(y_{it}, \phi) \right\} \qquad (2.68)$$

where the repeated observations $t = 1, \ldots, n_i$ within a given panel $i$ are assumed to be correlated. GLMs assume that the observations are independent, with no correlation between the outcomes. Marginal models, GEE models, and random-effects models are extensions of the GLM for correlated data. In the next few sections, we illustrate many of the methods for addressing the correlation inherent in panel data.

Throughout these subsections, we include results for analyzing the ship data (see section 5.2.2) and the wheeze data (see section 5.2.1). We model the ship data using panel Poisson estimators; the wheeze data are modeled using panel (logistic) binomial estimators.
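
To make the notation concrete, the Poisson distribution can be written in this exponential family form. The sketch below writes the density with $-c(y, \phi)$, as in equation 2.68. For the Poisson it assumes the canonical choices $\theta = \ln\mu$, $b(\theta) = e^\theta$, $a(\phi) = 1$ and $c(y, \phi) = \ln y!$, and uses illustrative values $\mu = 2.5$ and $y = 3$.

The code checks two things:

- The exponential family form reproduces the usual Poisson probability.
- Finite differences of $b(\theta)$ recover the mean $b'(\theta) = \mu$ and the variance $b''(\theta)a(\phi) = \mu$.

The function name `expfam_density` is hypothetical.

```python
import math

def expfam_density(y, theta, b, a_phi, c):
    """f(y) = exp{(y*theta - b(theta)) / a(phi) - c(y, phi)}, with c entering with a minus sign."""
    return math.exp((y * theta - b(theta)) / a_phi - c(y))

mu, y = 2.5, 3
theta = math.log(mu)                                        # canonical parameter
f_expfam = expfam_density(y, theta, math.exp, 1.0, lambda v: math.lgamma(v + 1))
f_direct = mu**y * math.exp(-mu) / math.factorial(y)        # textbook Poisson probability

h = 1e-5
mean = (math.exp(theta + h) - math.exp(theta - h)) / (2 * h)                     # b'(theta)
var = (math.exp(theta + h) - 2 * math.exp(theta) + math.exp(theta - h)) / h**2   # b''(theta) * a(phi)
print(f_expfam, f_direct, mean, var)
```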


2.3.1 Pooled estimators

A simple approach to modeling panel data is to ignore any panel dependence that might be present in the data. The result of this approach is called a pooled estimator, since the data are pooled without regard to the panel to which each observation naturally belongs. The resulting estimated coefficient vector, though consistent, is not efficient. A direct result of ignoring the within-panel correlation is that the estimated (naive) standard errors are not a reliable measure for testing purposes. To obtain usable standard errors, we should employ a modified sandwich estimate of variance, or another variance estimate that adjusts for the panel nature of the data.

The general LIML exponential family pooled estimating equation is given by

$$\left[ \left\{ \sum_{i=1}^{n} \sum_{t=1}^{n_i} \frac{y_{it} - \mu_{it}}{a(\phi)\, V(\mu_{it})} \left( \frac{\partial \mu}{\partial \eta} \right)_{it} x_{jit} \right\}_{j=1,\ldots,p} \right]_{p \times 1} = [0]_{p \times 1} \qquad (2.69)$$

where $p$ is the column dimension of the matrix of covariates $X$. Apart from a second subscript, the pooled estimating equation is no different from equation 2.55. The implied likelihood of the estimating equation does not address any second-order dependence of the data. In other words, if we believe that there is within-panel dependence, our implied likelihood is wrong.

Since our estimating equation does not imply a likelihood that includes within-panel dependence of the data, we must be very careful in our interpretation of results. The usual variance matrix $V_H$ obtained from fitting the GLM is naive in the sense that it assumes no within-panel dependence of the data. Instead, we can use the modified sandwich estimate of variance for testing and interpretation. However, we should acknowledge that employing a pooled estimator with a modified variance estimate is a declaration that the underlying likelihood is not correct.
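
As a minimal sketch of this workflow, the code below solves the pooled Poisson estimating equation (log link, $a(\phi) = 1$) by iteratively reweighted least squares, ignoring the panels. It then computes two variance estimates:

- the naive variance $V_H = (X^T W X)^{-1}$, with $W = \mathrm{diag}(\mu_{it})$;
- the modified sandwich estimate, which places the outer products of the panel sums of $x_{it}(y_{it} - \mu_{it})$ between two copies of $V_H$.

The data are simulated under an assumed model: a shared random shift within each hypothetical panel induces positive within-panel correlation. Both the simulation and the function names are illustrative, not the book's data or code.

```python
import numpy as np

def fit_pooled_poisson(X, y, tol=1e-10, max_iter=100):
    """Solve the pooled Poisson estimating equation by IRLS; column 0 of X is the constant."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                      # working response
        new = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta

def naive_and_modified(X, y, beta, panel):
    mu = np.exp(X @ beta)
    V_H = np.linalg.inv(X.T @ (mu[:, None] * X))     # naive: assumes independent observations
    scores = X * (y - mu)[:, None]                   # x_it * u_it with u_it = y_it - mu_it
    S = np.vstack([scores[panel == g].sum(axis=0) for g in np.unique(panel)])
    V_MS = V_H @ (S.T @ S) @ V_H                     # modified sandwich: panel sums in the middle
    return V_H, V_MS

rng = np.random.default_rng(1)
k, t = 200, 5
panel = np.repeat(np.arange(k), t)
x = rng.normal(size=k * t)
X = np.column_stack([np.ones(k * t), x])
shift = rng.normal(scale=0.5, size=k)[panel]         # shared within each panel -> correlation
y = rng.poisson(np.exp(0.5 + 0.3 * x + shift))

beta = fit_pooled_poisson(X, y)
V_H, V_MS = naive_and_modified(X, y, beta, panel)
print("coef     ", np.round(beta, 3))
print("naive SE ", np.round(np.sqrt(np.diag(V_H)), 4))
print("modif. SE", np.round(np.sqrt(np.diag(V_MS)), 4))
```

Only the variance changes; the coefficients are those of the independence model.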
The modified sandwich estimate of variance addresses possible within-panel correlation, as in Binder (1983), by summing residuals over the panel identifiers in the estimation of the variance of the estimating equation. However, it does not alter the estimating equation itself. Therefore, the implied likelihood of the estimating equation is unchanged. The estimate does not directly address within-panel correlation, nor does it change the coefficient estimates, which remain those obtained under the hypothesis of within-panel independence.

Rather, the modified sandwich estimate of variance alters the variance estimate. We interpret the coefficients in terms of an underlying best-fit independence model for data that in fact come from a dependence model. In other words, there is a best independence model for the data consisting of the entire population of panels and observations, and our results estimate this best independence model.
We have explicitly addressed possible within-panel correlation without altering the estimating equation from the independence model. In that sense,
we are fitting a model from the wrong (implied) likelihood. For this reason, researchers using this variance estimate do not use likelihood-based criteria and tests to interpret the model.

The modified sandwich estimate of variance is robust to any type of correlation within panels, provided the number of panels is large. Many people believe that the adjective robust means that sandwich estimates of variance are larger than naive estimates of variance. This is not the case. A robust variance estimate may be smaller or larger than the naive estimate, depending on the nature of the within-panel correlation. The calculation of the modified sandwich estimate of variance uses the sum of the residuals from each panel. If the residuals are negatively correlated and the sums are small, the modified sandwich estimate of variance produces smaller standard errors than the naive estimator.
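
This point can be illustrated with a simple case. Consider an intercept-only model, whose coefficient is the sample mean. There the naive variance, with $\sigma^2$ estimated by $\sum r^2 / n$, is $\sum_{it} r_{it}^2 / n^2$. The modified sandwich estimate replaces $\sum_{it} r_{it}^2$ with $\sum_i \left(\sum_t r_{it}\right)^2$.

The sketch below assumes simulated pairs whose second member largely cancels the first, giving strong negative within-panel correlation. The panel sums of the residuals are then small, and the modified standard error comes out smaller than the naive one. The data-generating choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 500                                              # hypothetical panels with two observations each
e = rng.normal(size=k)
# The second deviation mostly cancels the first: strong negative within-panel correlation
pairs = np.column_stack([e, -0.8 * e + 0.2 * rng.normal(size=k)])
y = pairs.ravel()                                    # panel 0: y[0], y[1]; panel 1: y[2], y[3]; ...
panel = np.repeat(np.arange(k), 2)
n = y.size

r = y - y.mean()                                     # residuals from the intercept-only model
var_naive = np.sum(r**2) / n**2                      # treats all n observations as independent
panel_sums = np.bincount(panel, weights=r)           # sum of residuals within each panel
var_modified = np.sum(panel_sums**2) / n**2          # modified sandwich for the mean

print("naive SE:   ", np.sqrt(var_naive))
print("modified SE:", np.sqrt(var_modified))
```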

Declaring, by using the modified sandwich estimate of variance, that the underlying likelihood of the fitted model is not correct requires more than a careful interpretation of model results. One must also be vigilant about not employing model-fit diagnostics and tests based on likelihood calculations or assumptions. There is no free lunch with this variance adjustment. One should not adjust the variance because of a belief that the observations violate independence, and then ignore this fact later, for example by running a likelihood ratio test comparing nested models. That is, we cannot use these post-estimation tests and diagnostics outside of the interpretation of our model as an estimate of some incorrect best-fit independence model of a population of observations.

2.3.2 Fixed-effects and random-effects models

To address the panel structure in our data, we may include an effect for each panel in our estimating equation. We may assume that these effects are fixed effects or random effects. In addition, the fixed effects may be conditional fixed effects or unconditional fixed effects. Unconditional fixed-effects estimators simply include an indicator variable for each panel in the estimation. Conditional fixed-effects estimators are derived from a different likelihood: a conditional likelihood. It removes the fixed effects from the estimation by conditioning on the sufficient statistic for the parameters to be removed.

There is some controversy over the choice between fixed-effects and random-effects models. The choice is clear when the nature of the panels is known, and the inference follows the nature of the chosen model. When there is no compelling choice between the two models, the random-effects model is sometimes preferred if there are covariates that are constant within panels. Coefficients for these covariates cannot be estimated in fixed-effects models, since each such covariate is collinear with the fixed effects.
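
The collinearity can be seen directly in the design matrix. The minimal sketch below assumes 4 hypothetical panels of 3 observations, coded with one indicator column per panel. A covariate that varies within panels keeps the design at full column rank. A covariate that is constant within panels is an exact linear combination of the indicators, so the design loses a rank and its coefficient is not estimable.

```python
import numpy as np

k, t = 4, 3
panel = np.repeat(np.arange(k), t)
D = (panel[:, None] == np.arange(k)).astype(float)   # one indicator column per panel
rng = np.random.default_rng(3)
x_varying = rng.normal(size=k * t)                     # changes within panels
x_constant = np.repeat(rng.normal(size=k), t)          # constant within each panel

X_ok = np.column_stack([x_varying, D])
X_bad = np.column_stack([x_constant, D])
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # full column rank
print(np.linalg.matrix_rank(X_bad), X_bad.shape[1])    # one rank short: collinear with the indicators
```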


2.3.2.1 Unconditional fixed-effects models

If there is a finite number of panels in the population and each panel is represented in our sample, we would use an unconditional fixed-effects model. If the number of panels is infinite (or effectively so), we would use a conditional fixed-effects model, because an unconditional fixed-effects model would result in biased estimates.

The unconditional fixed-effects estimating equation for the exponential family is obtained by admitting the fixed effect $\nu_i$ into the linear predictor, $\eta_{it} = x_{it}\beta + \nu_i$, where $x_{it}$ is the $it$th row of the $X$ matrix. We wish to estimate the $(p + n) \times 1$ parameter vector $\Theta = (\beta, \nu)$. The estimating equation for the unconditional fixed-effects GLM is given by

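$$\begin{bmatrix} \left\{ \dfrac{\partial \mathcal{L}}{\partial \beta_j} = \displaystyle\sum_{i=1}^{n} \sum_{t=1}^{n_i} \dfrac{y_{it} - \mu_{it}}{a(\phi)\, V(\mu_{it})} \left( \dfrac{\partial \mu}{\partial \eta} \right)_{it} x_{jit} \right\} \\[3ex] \left\{ \dfrac{\partial \mathcal{L}}{\partial \nu_k} = \displaystyle\sum_{t=1}^{n_k} \dfrac{y_{kt} - \mu_{kt}}{a(\phi)\, V(\mu_{kt})} \left( \dfrac{\partial \mu}{\partial \eta} \right)_{kt} \right\} \end{bmatrix}_{(p+n) \times 1} = [0]_{(p+n) \times 1} \qquad (2.70)$$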

for $j = 1, \ldots, p$ and $k = 1, \ldots, n$. Unconditional fixed-effects models may be obtained for the full complement of GLMs, including those implying quasilikelihoods.

Using the ship data, we fit an unconditional fixed-effects Poisson model by including indicator variables for the ship. The results are given by

Poisson regression                                Number of obs   =         34
                                                  LR chi2(8)      =     107.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -68.280771                       Pseudo R2       =     0.4408

------------------------------------------------------------------------------
    incident |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
    _Iship_2 |  -.5433443   .1775899    -3.06   0.002    -.8914141   -.1952745
    _Iship_3 |  -.6874016   .3290472    -2.09   0.037    -1.332322    -.042481
    _Iship_4 |  -.0759614   .2905787    -0.26   0.794    -.6454851    .4935623
    _Iship_5 |   .3255795   .2358794     1.38   0.168    -.1367357    .7878946
    op_75_79 |    .384467   .1182722     3.25   0.001     .1526578    .6162761
    co_65_69 |   .6971404   .1496414     4.66   0.000     .4038487    .9904322
    co_70_74 |   .8184266   .1697736     4.82   0.000     .4856763    1.151177
    co_75_79 |   .4534266   .2331705     1.94   0.052    -.0035791    .9104324
       _cons |  -6.405902   .2174441   -29.46   0.000    -6.832084   -5.979719
    exposure |   (offset)
------------------------------------------------------------------------------
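
The indicator-variable approach used for the ship data can be sketched on simulated data. Take the canonical log link with $a(\phi) = 1$. Then $V(\mu) = \mu$ and $\partial\mu/\partial\eta = \mu$, so each $\nu_k$ row of equation 2.70 reduces to $\sum_t (y_{kt} - \mu_{kt}) = 0$: every panel's residuals sum to zero at the solution.

The sketch makes the following assumptions:

- The data are 8 hypothetical panels of 10 observations with one covariate, not the ship data.
- Panel 1 serves as the reference category, absorbed by the constant, mirroring the `_Iship_2` to `_Iship_5` coding in the output above.
- The IRLS helper `poisson_irls` is illustrative.

```python
import numpy as np

def poisson_irls(X, y, tol=1e-10, max_iter=100):
    """IRLS (Newton) for a Poisson GLM with log link; column 0 of X is assumed to be the constant."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                      # working response
        new = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta

rng = np.random.default_rng(4)
k, t = 8, 10                                         # 8 hypothetical panels of 10 observations
panel = np.repeat(np.arange(k), t)
x = rng.normal(size=k * t)
nu = rng.normal(scale=0.4, size=k)                   # true panel effects
y = rng.poisson(np.exp(1.5 + 0.3 * x + nu[panel]))

# Constant, indicators for panels 2..k (panel 1 is the reference category), then the covariate
D = (panel[:, None] == np.arange(1, k)).astype(float)
X = np.column_stack([np.ones(k * t), D, x])
beta = poisson_irls(X, y)
resid = y - np.exp(X @ beta)

print("slope on x:", round(beta[-1], 3))
print("largest |panel residual sum|:", np.abs(np.bincount(panel, weights=resid)).max())
```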

Using the wheeze data, we fit an unconditional fixed-effects logistic regression model by including indicator variables for the child (case). The results are given by


Logit estimates                                   Number of obs   =         40
                                                  LR chi2(11)     =       8.44
                                                  Prob > chi2     =     0.6731
Log likelihood = -23.454028                       Pseudo R2       =     0.1525

------------------------------------------------------------------------------
      wheeze |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
    _Icase_3 |  -.9845011   1.931727    -0.51   0.610    -4.770617    2.801615
    _Icase_4 |  -1.167123   1.571454    -0.74   0.458    -4.247117    1.912871
    _Icase_9 |  -2.462266   2.221064    -1.11   0.268    -6.815471    1.890939
   _Icase_11 |   1.321318   2.109374     0.63   0.531    -2.812979    5.455615
   _Icase_13 |   1.253285   1.794704     0.70   0.485    -2.264269     4.77084
   _Icase_14 |  -2.446925   2.137959    -1.14   0.252    -6.637248    1.743397
   _Icase_15 |  -1.073333   1.621803    -0.66   0.508    -4.252007    2.105342
   _Icase_16 |  -1.284241   1.925523    -0.67   0.505    -5.058197    2.489715
    kingston |  -1.341676   2.210786    -0.61   0.544    -5.674736    2.991385
         age |  -.3607877   .3399243    -1.06   0.289    -1.027027    .3054517
       smoke |   .1154117   .8828089     0.13   0.896    -1.614862    1.845685
       _cons |   4.927258   3.828498     1.29   0.198     -2.57646    12.43098
------------------------------------------------------------------------------

In estimating the model, the software determines that several of the indicator variables for the subject predict the outcome perfectly. In such a case, software may drop these variables, as the output above reflects. Keeping these perfect predictors in the model would require (in maximum likelihood) that their fitted coefficients be infinite. An alternative is to use software (or programming techniques) that fits exact logistic regression models.

2.3.2.2 Conditional fixed-effects models

A conditional fixed-effects model is formed by conditioning the fixed effects out of the estimation. This yields a much more efficient estimator. The cost is a constraint on inference, in the form of the conditioning imposed on the likelihood. Such models are derived from specific distributions with valid likelihoods.

Conditional fixed-effects models are derived from specific distributions, not from the general exponential family distribution.
For illustration of the model construction, we derive the estimating equation for the FIML conditional fixed-effects Poisson regression model. Apart from identifying a sufficient statistic on which to condition, the derivation of the estimating equation is the same as for the previous illustrations for independent data. In general, we have a specific distribution for a single outcome $y_{it}$ that we call $f_1(y_{it})$. We find the joint distribution for all of the observations for a specific panel, $f_1(\mathbf{y}_i) = \prod_{t=1}^{n_i} f_1(y_{it})$, and obtain the sufficient statistic $\xi(\mathbf{y}_i)$ for the fixed effect $\nu_i$. We then find the distribution of the sufficient statistic, $f_2(\xi(\mathbf{y}_i))$. Finally, we obtain the conditional distribution of the outcomes given the sufficient statistic as $f_3(\mathbf{y}_i; \beta \mid \xi(\mathbf{y}_i)) = f_1(\mathbf{y}_i)/f_2(\xi(\mathbf{y}_i))$. This distribution is free of the fixed effect $\nu_i$. Thus, the conditional log-likelihood for all of the panels is given by

$$ \mathcal{L} = \ln \prod_{i=1}^{n} f_3\left(\mathbf{y}_i; \beta \mid \xi(\mathbf{y}_i)\right) \tag{2.71} $$

with estimating equation for $\Theta = (\beta)$ and $j = 1, \ldots, p$

$$ \Psi(\Theta) = \left[ \left\{ \frac{\partial \mathcal{L}}{\partial \beta_j} \right\}_{j=1,\ldots,p} \right]_{(p\times 1)} = [0]_{(p\times 1)} \tag{2.72} $$

The estimating equation can be made FIML or LIML depending on whether there are additional parameters from $f_1$ that we include in $\Theta$ or treat as ancillary. We shall derive the conditional fixed-effects Poisson regression model to highlight the steps outlined above. Our specific model will assume the canonical log link for the relationship of the linear predictor to the expected value. The probability for a specific outcome in the (individual-level) Poisson model is

$$ P(Y_{it} = y_{it}) = \frac{e^{-\mu_{it}} \mu_{it}^{y_{it}}}{y_{it}!} \tag{2.73} $$

where the expected value of the outcome is given by $\mu_{it}$. As done previously, we specify the relationship of the expected value to the linear predictor through a link function; in this case, the log link.

$$ \mu_{it} = \exp(x_{it}\beta + \gamma_i) = \exp(\eta_{it} + \gamma_i) \tag{2.74} $$

Here, we have made the inclusion of the fixed effect $\gamma_i$ explicit in the parameterization. The probability for a specific outcome, introducing covariates and the fixed effect, is

$$ P(Y_{it} = y_{it}) = \frac{e^{-\exp(\eta_{it} + \gamma_i)} \exp(\eta_{it} + \gamma_i)^{y_{it}}}{y_{it}!} \tag{2.75} $$

Since observations within a panel are independent, we write the probability of a vector of outcomes for panel $i$ as

\begin{align}
P(\mathbf{Y}_i = \mathbf{y}_i) &= \prod_{t=1}^{n_i} \frac{e^{-\exp(\eta_{it} + \gamma_i)} \exp(\eta_{it} + \gamma_i)^{y_{it}}}{y_{it}!} \tag{2.76}\\
&= e^{-\exp(\gamma_i)\sum_t \exp(\eta_{it})} \exp(\gamma_i)^{\sum_t y_{it}} \prod_{t=1}^{n_i} \frac{\exp(\eta_{it} y_{it})}{y_{it}!} \tag{2.77}
\end{align}

The sufficient statistic for $\gamma_i$ is then $\xi(\mathbf{y}_i) = \sum_{t=1}^{n_i} y_{it}$. Since the sum of independent Poisson random variables is also Poisson (with mean equal to the sum of the individual means), we write the probability of a given outcome for the sufficient statistic as

\begin{align}
P\left(\sum_{t=1}^{n_i} Y_{it} = \sum_{t=1}^{n_i} y_{it}\right) &= \frac{\exp\left\{-\sum_t \exp(\eta_{it} + \gamma_i)\right\} \left(\sum_t \exp(\eta_{it} + \gamma_i)\right)^{\sum_t y_{it}}}{\left(\sum_t y_{it}\right)!} \tag{2.78}\\
&= \frac{\exp\left\{-\exp(\gamma_i)\sum_t \exp(\eta_{it})\right\} \exp(\gamma_i)^{\sum_t y_{it}} \left(\sum_t \exp(\eta_{it})\right)^{\sum_t y_{it}}}{\left(\sum_t y_{it}\right)!} \tag{2.79}
\end{align}

The conditional probability is then the ratio of the joint probability (equation 2.77) to the probability of the sufficient statistic (equation 2.79), given by

$$ P\left(\mathbf{Y}_i = \mathbf{y}_i \,\middle|\, \sum_{t=1}^{n_i} Y_{it} = \sum_{t=1}^{n_i} y_{it}\right) = \frac{\left(\sum_t y_{it}\right)!}{\left(\sum_t \exp(\eta_{it})\right)^{\sum_t y_{it}}} \prod_{t=1}^{n_i} \frac{\exp(\eta_{it} y_{it})}{y_{it}!} \tag{2.80} $$
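The claim that equation 2.80 is free of $\gamma_i$ can be checked numerically. The Python sketch below (standard library only) evaluates the joint probability (2.77) and the probability of the panel total (2.79) for one panel at several values of $\gamma_i$ and compares their ratio with equation 2.80. The counts and linear predictors are made-up illustrative values, and the function names are hypothetical helpers, not part of any package.

```python
import math

def joint_prob(y, eta, gamma):
    """P(Y_i = y_i) for one panel (equation 2.77): independent Poisson
    outcomes with means exp(eta_it + gamma_i)."""
    p = 1.0
    for y_t, e_t in zip(y, eta):
        mu = math.exp(e_t + gamma)
        p *= math.exp(-mu) * mu ** y_t / math.factorial(y_t)
    return p

def total_prob(y, eta, gamma):
    """P(sum_t Y_it = sum_t y_it) (equation 2.79): the panel total is Poisson
    with mean sum_t exp(eta_it + gamma_i)."""
    m = sum(math.exp(e_t + gamma) for e_t in eta)
    s = sum(y)
    return math.exp(-m) * m ** s / math.factorial(s)

def conditional_prob(y, eta):
    """Equation 2.80: gamma_i does not appear anywhere."""
    s = sum(y)
    num = math.factorial(s)
    for y_t, e_t in zip(y, eta):
        num *= math.exp(e_t * y_t) / math.factorial(y_t)
    return num / sum(math.exp(e_t) for e_t in eta) ** s

y = [2, 0, 3]            # illustrative counts for one panel
eta = [0.1, -0.4, 0.7]   # illustrative linear predictors x_it * beta
for gamma in (-1.0, 0.0, 2.5):
    ratio = joint_prob(y, eta, gamma) / total_prob(y, eta, gamma)
    print(gamma, ratio, conditional_prob(y, eta))
```

The ratio agrees with equation 2.80 (up to floating-point rounding) for every value of gamma. Equation 2.80 also has the form of a multinomial probability for the panel total spread over the $n_i$ observations, with cell probabilities $\exp(\eta_{it})/\sum_k \exp(\eta_{ik})$.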

The conditional probability of the observations for a single panel is free of the panel-level fixed effect since we conditioned on the sufficient statistic for the panel-level effect. The estimating equation is derived using the standard approach illustrated throughout this chapter. The conditional likelihood is thus given by the product of the conditional probabilities for all of the panels

\begin{align}
L(\beta) &= \prod_{i=1}^{n} \frac{\left(\sum_t y_{it}\right)!}{\left(\sum_t \exp(\eta_{it})\right)^{\sum_t y_{it}}} \prod_{t=1}^{n_i} \frac{\exp(\eta_{it} y_{it})}{y_{it}!} \tag{2.81}\\
&= \prod_{i=1}^{n} \exp\left\{ \ln\Gamma\left(\sum_{t=1}^{n_i} y_{it} + 1\right) - \left(\sum_{t=1}^{n_i} y_{it}\right) \ln\left(\sum_{t=1}^{n_i} \exp(\eta_{it})\right) + \sum_{t=1}^{n_i} \left(\eta_{it} y_{it} - \ln\Gamma(y_{it} + 1)\right) \right\} \tag{2.82}
\end{align}

Consequently, the log-likelihood is

$$ \mathcal{L}(\beta) = \sum_{i=1}^{n} \left\{ \ln\Gamma\left(\sum_{t=1}^{n_i} y_{it} + 1\right) - \left(\sum_{t=1}^{n_i} y_{it}\right) \ln\left(\sum_{t=1}^{n_i} \exp(\eta_{it})\right) + \sum_{t=1}^{n_i} \left(\eta_{it} y_{it} - \ln\Gamma(y_{it} + 1)\right) \right\} \tag{2.83} $$

For this particular case, there are no ancillary parameters, and the estimating equation $\Psi(\beta) = 0$ for the conditional fixed-effects log-linked Poisson model is the derivative of the above log-likelihood for $j = 1, \ldots, p$

$$ \Psi(\beta) = \left[ \left\{ \frac{\partial \mathcal{L}}{\partial \eta} \frac{\partial \eta}{\partial \beta_j} \right\} \right]_{(p\times 1)} = \left[ \left\{ \sum_{i=1}^{n} \sum_{t=1}^{n_i} \left( y_{it} - \left(\sum_{k=1}^{n_i} y_{ik}\right) \frac{\exp(\eta_{it})}{\sum_{k=1}^{n_i} \exp(\eta_{ik})} \right) x_{jit} \right\} \right]_{(p\times 1)} = [0]_{(p\times 1)} \tag{2.84} $$
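As a check on equations 2.83 and 2.84, the following Python sketch codes the conditional log-likelihood and its estimating function for the log-linked Poisson model and compares the analytic score with a central finite difference. Panels are represented as hypothetical `(y, X)` pairs of plain lists (counts and covariate rows); the data are invented for illustration and are not the ship data.

```python
import math

def linear_predictor(row, beta):
    """eta_it = x_it * beta for one observation."""
    return sum(x_j * b_j for x_j, b_j in zip(row, beta))

def log_sum_exp(values):
    """ln(sum(exp(v))) computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def cfe_poisson_loglik(panels, beta):
    """Conditional fixed-effects Poisson log-likelihood, equation 2.83."""
    ll = 0.0
    for y, X in panels:
        eta = [linear_predictor(row, beta) for row in X]
        s = sum(y)
        ll += math.lgamma(s + 1) - s * log_sum_exp(eta)
        ll += sum(e * y_t - math.lgamma(y_t + 1) for e, y_t in zip(eta, y))
    return ll

def cfe_poisson_score(panels, beta):
    """Estimating function, equation 2.84 (one entry per coefficient)."""
    score = [0.0] * len(beta)
    for y, X in panels:
        eta = [linear_predictor(row, beta) for row in X]
        s = sum(y)
        lse = log_sum_exp(eta)
        for y_t, e, row in zip(y, eta, X):
            resid = y_t - s * math.exp(e - lse)   # y_it - (sum_k y_ik) * share_it
            for j, x_j in enumerate(row):
                score[j] += resid * x_j
    return score

# Hypothetical panels: (counts, covariate rows) with two covariates each.
panels = [
    ([1, 3, 0], [[0.0, 1.0], [1.0, 0.5], [0.0, -1.0]]),
    ([4, 2], [[1.0, 0.0], [0.0, 2.0]]),
    ([0, 1, 2, 1], [[0.5, 0.5], [1.0, -0.5], [0.0, 0.0], [1.0, 1.0]]),
]
beta = [0.3, -0.2]
h = 1e-6
analytic = cfe_poisson_score(panels, beta)
for j in range(len(beta)):
    up = [b + (h if k == j else 0.0) for k, b in enumerate(beta)]
    dn = [b - (h if k == j else 0.0) for k, b in enumerate(beta)]
    numeric = (cfe_poisson_loglik(panels, up) - cfe_poisson_loglik(panels, dn)) / (2 * h)
    print(j, analytic[j], numeric)
```

A related check: appending a covariate that is constant within each panel, with any coefficient, leaves the conditional log-likelihood unchanged and gives that covariate a zero score, because such terms are conditioned out along with $\gamma_i$.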


Fitting a conditional fixed-effects model using the ship data provides the following results:

Conditional fixed-effects Poisson               Number of obs      =        34
Group variable (i): ship                        Number of groups   =         5

                                                Obs per group: min =         6
                                                               avg =       6.8
                                                               max =         7

                                                Wald chi2(4)       =     48.44
Log likelihood  = -54.641859                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    incident |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    op_75_79 |    .384467   .1182722     3.25   0.001     .1526578    .6162761
    co_65_69 |   .6971405   .1496414     4.66   0.000     .4038487    .9904322
    co_70_74 |   .8184266   .1697737     4.82   0.000     .4856764    1.151177
    co_75_79 |   .4534267   .2331705     1.94   0.052    -.0035791    .9104324
    exposure |   (offset)
------------------------------------------------------------------------------

The same logic can be used to derive the conditional fixed-effects logistic regression model. First, the probability of a given outcome is specified as

$$ P(Y_{it} = y_{it}) = \mu_{it}^{y_{it}} (1 - \mu_{it})^{1 - y_{it}} \tag{2.85} $$

Introducing the covariates and the fixed effect through the canonical logit link function yields

$$ \mu_{it} = \frac{\exp(x_{it}\beta + \gamma_i)}{1 + \exp(x_{it}\beta + \gamma_i)} = \frac{\exp(\eta_{it} + \gamma_i)}{1 + \exp(\eta_{it} + \gamma_i)} \tag{2.86} $$

The probability of a given outcome is

\begin{align}
P(Y_{it} = y_{it}) &= \left(\frac{\exp(\eta_{it} + \gamma_i)}{1 + \exp(\eta_{it} + \gamma_i)}\right)^{y_{it}} \left(\frac{1}{1 + \exp(\eta_{it} + \gamma_i)}\right)^{1 - y_{it}} \tag{2.87}\\
&= \exp\left\{ y_{it}(\eta_{it} + \gamma_i) - \ln\left(1 + \exp(\eta_{it} + \gamma_i)\right) \right\} \tag{2.88}
\end{align}

Since observations within a panel are independent, the probability of a vector of outcomes for panel $i$ is the product

\begin{align}
P(\mathbf{Y}_i = \mathbf{y}_i) &= \prod_{t=1}^{n_i} \exp\left\{ y_{it}(\eta_{it} + \gamma_i) - \ln\left(1 + \exp(\eta_{it} + \gamma_i)\right) \right\} \tag{2.89}\\
&= \exp\left\{ \sum_{t=1}^{n_i} y_{it}\eta_{it} + \gamma_i \sum_{t=1}^{n_i} y_{it} - \sum_{t=1}^{n_i} \ln\left(1 + \exp(\eta_{it} + \gamma_i)\right) \right\} \tag{2.90}
\end{align}

The sufficient statistic for $\gamma_i$ is then $\xi(\mathbf{y}_i) = \sum_{t=1}^{n_i} y_{it}$. We know that the ratio of the joint distribution to the distribution of the sufficient statistic $\xi(\mathbf{y}_i)$ does not involve the fixed effect $\gamma_i$. Unfortunately, the sum of Bernoulli random variables, when the individual observations do not all have the same probability of success, is not characterized by a simple known distribution (as it was in the previous example for the conditional fixed-effects Poisson model). Imagine that we have $X_1 \sim \text{Bernoulli}(p_1)$, $X_2 \sim \text{Bernoulli}(p_2)$, $X_3 \sim \text{Bernoulli}(p_3)$, and we desire the probability distribution of $T = X_1 + X_2 + X_3$. The possible outcomes and associated probabilities for the sum of these Bernoulli random variables are given by

\begin{align}
P(T = 0) &= (1 - p_1)(1 - p_2)(1 - p_3) \tag{2.91}\\
P(T = 1) &= p_1(1 - p_2)(1 - p_3) + (1 - p_1)p_2(1 - p_3) + (1 - p_1)(1 - p_2)p_3 \tag{2.92}\\
P(T = 2) &= p_1 p_2 (1 - p_3) + p_1(1 - p_2)p_3 + (1 - p_1)p_2 p_3 \tag{2.93}\\
P(T = 3) &= p_1 p_2 p_3 \tag{2.94}
\end{align}

Since this probability distribution is not a simple known distribution for which we can easily look up a single formula, we must derive a useful and workable characterization. For this rather elementary example of a sum of three Bernoulli random variables, we imagine a vector of indicator variables $\mathbf{d} = (d_1, d_2, d_3)$ where $d_i \in \{0, 1\}$ specifies whether the random variable $X_i$ is a success ($d_i = 1$) or a failure ($d_i = 0$). For outcome $T = k$, we let $S_k$ denote the set of vectors $\mathbf{d}$ such that $\sum_i d_i = k$. Clearly, the number of terms in $S_k$ is given by $\binom{3}{k}$ for outcome equal to $k$.

  T    S_k
  0    {(0,0,0)}
  1    {(1,0,0), (0,1,0), (0,0,1)}
  2    {(1,1,0), (1,0,1), (0,1,1)}
  3    {(1,1,1)}

Using this notation, we can then construct the conditional probabilities as

$$ P(X_1 = x_1, X_2 = x_2, X_3 = x_3 \mid T = k) = \frac{p_1^{x_1}(1 - p_1)^{1 - x_1}\, p_2^{x_2}(1 - p_2)^{1 - x_2}\, p_3^{x_3}(1 - p_3)^{1 - x_3}}{\sum_{\mathbf{d} \in S_k} \prod_{i=1}^{3} p_i^{d_i}(1 - p_i)^{1 - d_i}} \tag{2.95} $$

For example,

$$ P(X_1 = 1, X_2 = 0, X_3 = 0 \mid T = 1) = \frac{p_1(1 - p_2)(1 - p_3)}{p_1(1 - p_2)(1 - p_3) + (1 - p_1)p_2(1 - p_3) + (1 - p_1)(1 - p_2)p_3} \tag{2.96} $$
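The enumeration of the sets $S_k$ and the conditional probability in equation 2.95 are easy to mechanize. The Python sketch below builds $S_k$ with `itertools.product`, recovers equations 2.91 to 2.94 by summing over each set, and evaluates equation 2.96. The success probabilities are arbitrary illustrative values and the helper names are hypothetical. Brute-force enumeration grows as $2^n$, so it only suits small examples like this one.

```python
import itertools
import math

def vector_prob(d, p):
    """prod_i p_i^{d_i} (1 - p_i)^{1 - d_i} for an indicator vector d."""
    return math.prod(p_i if d_i else 1.0 - p_i for d_i, p_i in zip(d, p))

def support_sets(n):
    """S_k for k = 0, ..., n: all 0/1 vectors of length n grouped by their sum."""
    sets = {k: [] for k in range(n + 1)}
    for d in itertools.product((0, 1), repeat=n):
        sets[sum(d)].append(d)
    return sets

def bernoulli_conditional_prob(x, p):
    """Equation 2.95: P(X = x | T = sum(x)) for independent Bernoulli(p_i)."""
    s_k = support_sets(len(p))[sum(x)]
    return vector_prob(x, p) / sum(vector_prob(d, p) for d in s_k)

p = [0.2, 0.5, 0.7]                      # illustrative success probabilities
for k, s_k in support_sets(3).items():   # P(T = k), equations 2.91-2.94
    print(k, len(s_k), sum(vector_prob(d, p) for d in s_k))
print(bernoulli_conditional_prob((1, 0, 0), p))   # equation 2.96
```

For $T = 0$ and $T = 3$ the set $S_k$ has a single element, so the conditional probability is exactly one. Shifting every log-odds $\ln\{p_i/(1-p_i)\}$ by a common constant leaves these conditional probabilities unchanged, which is the sense in which conditioning on the total removes a panel-level effect.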


In our more general case, we add a subscript $i$ to the above notation to reflect the association to panel $i$. For a given conditional outcome $\sum_t y_{it} = k_i$, there are $\binom{n_i}{k_i}$ possible terms (generalizing from the simple example above). Let $\mathbf{d}_i$ denote a vector of indicators of length $n_i$ indicating whether a particular observation is a success. Let $S_{k_i}$ denote the set of vectors $\mathbf{d}_i$ such that $\sum_t d_{it} = k_i$.

The conditional probability (from our logit-link parameterization) is then given by

$$ P\left(\mathbf{Y}_i = \mathbf{y}_i \,\middle|\, \sum_{t=1}^{n_i} Y_{it} = k_i\right) = \frac{\exp\left(\sum_{t=1}^{n_i} y_{it}\eta_{it}\right)}{\sum_{\mathbf{d}_i \in S_{k_i}} \exp\left(\sum_{t=1}^{n_i} d_{it}\eta_{it}\right)} \tag{2.97} $$

The conditional likelihood is given by the product of the conditional probabilities for all of the panels

\begin{align}
L(\beta) &= \prod_{i=1}^{n} \frac{\exp\left(\sum_t y_{it}\eta_{it}\right)}{\sum_{\mathbf{d}_i \in S_{k_i}} \exp\left(\sum_t d_{it}\eta_{it}\right)} \tag{2.98}\\
&= \prod_{i=1}^{n} \exp\left\{ \sum_t y_{it}\eta_{it} - \ln \sum_{\mathbf{d}_i \in S_{k_i}} \exp\left(\sum_t d_{it}\eta_{it}\right) \right\} \tag{2.99}
\end{align}

Finally, the conditional log-likelihood is

$$ \mathcal{L}(\beta) = \sum_{i=1}^{n} \left\{ \sum_{t=1}^{n_i} y_{it}\eta_{it} - \ln \sum_{\mathbf{d}_i \in S_{k_i}} \exp\left(\sum_{t=1}^{n_i} d_{it}\eta_{it}\right) \right\} \tag{2.100} $$

from which the estimating equation can be derived as the derivative of the log-likelihood for $j = 1, \ldots, p$

$$ \Psi(\beta) = \left[ \left\{ \frac{\partial}{\partial \beta_j} \sum_{i=1}^{n} \left( \sum_{t=1}^{n_i} y_{it}\eta_{it} - \ln \sum_{\mathbf{d}_i \in S_{k_i}} \exp\left(\sum_{t=1}^{n_i} d_{it}\eta_{it}\right) \right) \right\} \right]_{(p\times 1)} = [0]_{(p\times 1)} \tag{2.101} $$
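Evaluating the denominator of equation 2.97 by listing every $\mathbf{d}_i \in S_{k_i}$ requires $\binom{n_i}{k_i}$ terms, which quickly becomes expensive for long panels. The Python sketch below computes the conditional logit log-likelihood (2.100) with a standard recursion for this sum: it is the elementary symmetric polynomial of degree $k_i$ in $\exp(\eta_{i1}), \ldots, \exp(\eta_{in_i})$. The recursion is a swapped-in computational device, not something described in the text; the brute-force version is kept only to confirm that both give the same value. The data and function names are illustrative.

```python
import itertools
import math

def denominator_bruteforce(eta, k):
    """sum over d in S_k of exp(sum_t d_t * eta_t), by listing every d."""
    return sum(
        math.exp(sum(d_t * e_t for d_t, e_t in zip(d, eta)))
        for d in itertools.product((0, 1), repeat=len(eta))
        if sum(d) == k
    )

def denominator_recursive(eta, k):
    """Same sum in O(n*k) operations: after processing t observations,
    B[j] holds the sum over size-j subsets of the product of exp(eta)."""
    B = [1.0] + [0.0] * k
    for e_t in eta:
        w = math.exp(e_t)
        for j in range(k, 0, -1):   # descending j so B[j-1] is still the old value
            B[j] += w * B[j - 1]
    return B[k]

def clogit_loglik(panels, beta):
    """Conditional fixed-effects logit log-likelihood, equation 2.100."""
    ll = 0.0
    for y, X in panels:
        eta = [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]
        k = sum(y)
        ll += sum(y_t * e_t for y_t, e_t in zip(y, eta))
        ll -= math.log(denominator_recursive(eta, k))
    return ll

# Hypothetical binary panels with one covariate.
panels = [
    ([1, 0, 1, 0], [[0.2], [1.1], [-0.5], [0.7]]),
    ([0, 0, 1], [[1.0], [0.3], [-1.2]]),
    ([1, 1, 1, 1], [[0.4], [0.0], [2.0], [-0.3]]),   # all successes
]
print(clogit_loglik(panels, [0.5]))
```

The third panel has $k_i = n_i$, so $S_{k_i}$ contains a single vector and the panel adds $\ln 1 = 0$ to the log-likelihood; a panel of all zeros behaves the same way.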

We emphasize that the conditional probability in equation 2.97 is really the ratio of a fraction to the sum of fractions. Each fractional term has a common denominator that cancels and was suppressed in printing the equation.

Using the binary outcome wheeze data, we estimate a conditional fixed-effects logistic regression model. The results are given by

Conditional fixed-effects logit                 Number of obs      =        40
Group variable (i): case                        Number of groups   =        10

                                                Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4

                                                LR chi2(2)         =      0.91
Log likelihood  = -14.622988                    Prob > chi2        =    0.6336

------------------------------------------------------------------------------
      wheeze |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.2701682   .2938288    -0.92   0.358    -.8460621    .3057256
       smoke |   .0900261   .7720841     0.12   0.907    -1.423231    1.603283
------------------------------------------------------------------------------

The results indicate that only 40 observations are used instead of the full 64 observations. In these data, there are 6 cases (children) for which the outcome does not vary; the sum of successful outcomes is either zero or $n_i$, so that the denominator in equation 2.97 has only one term. These 6 children account for the 64 - 40 = 24 excluded observations. As such, the conditional probability for the observations, given the sum of the outcomes, is equal to one. Since the log of this probability is zero ($\ln 1 = 0$), there is no contribution to the log-likelihood calculation, and these subjects are dropped from the estimation. In reality, they could remain in the estimation; but since they contribute no information to the model estimation, there is no reason to artificially increase the sample size. For illustration, in our simple example of the sum of three Bernoulli random variables, note that the conditional probabilities for $T = 0$ and $T = 3$ are both equal to one. Having panels with a conditional probability equal to one can also occur in Poisson models.
A Poisson model for which all of the outcomes in a panel are zero conditions on the sum of the outcomes being zero. Since there is only one possible set of outcomes for the individual measurements in the panel, the conditional probability of the panel is one. In other words, if the sum of the outcomes in a panel of size $n_i$ is equal to zero, then

$$ P\left(Y_{i1} = Y_{i2} = \cdots = Y_{in_i} = 0 \,\middle|\, \sum_{t=1}^{n_i} Y_{it} = 0\right) = 1 \tag{2.102} $$

In such cases, we recommend dropping those panels from the conditional fixed-effects Poisson model, just as those panels were dropped in the conditional fixed-effects logistic model above. We emphasize that there were no such panels in the ship data illustrated earlier.
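Identifying the panels that carry no information is a simple test on each panel total. The hypothetical Python helper below drops panels whose outcomes sum to zero for the conditional Poisson model, and panels whose binary outcomes sum to zero or to $n_i$ for the conditional logit model, and it reports how many observations were removed. The example panels are invented for illustration.

```python
def informative_panels(panels, model):
    """Split off the panels that contribute to the conditional likelihood.

    panels: list of outcome lists, one per panel.
    model: "poisson" (drop panels with total 0) or "logit" (drop total 0 or n_i).
    Returns (kept_panels, number_of_dropped_observations).
    """
    if model not in ("poisson", "logit"):
        raise ValueError("model must be 'poisson' or 'logit'")
    kept, dropped_obs = [], 0
    for y in panels:
        total = sum(y)
        uninformative = total == 0 or (model == "logit" and total == len(y))
        if uninformative:
            dropped_obs += len(y)
        else:
            kept.append(y)
    return kept, dropped_obs

binary_panels = [[0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0]]
print(informative_panels(binary_panels, "logit"))   # keeps the two mixed panels
counts = [[0, 0, 0], [2, 0, 5], [0, 1, 0]]
print(informative_panels(counts, "poisson"))        # drops only the all-zero panel
```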

2.3.2.3 Random-effects models

A random-effects model parameterizes the random effects according to an assumed distribution for which the parameters of the distribution are estimated. These models are called subject-specific models, since the likelihood models the individual observations instead of the marginal distribution of the panels. As in the case of conditional fixed-effects models, our derivation begins with an assumed distribution and, thus, does not address the quasilikelihoods of GLMs. The log-likelihood for a random-effects model is

$$ \mathcal{L} = \ln \prod_{i=1}^{n} \int_{-\infty}^{\infty} f(\nu_i) \prod_{t=1}^{n_i} f_y(x_{it}\beta + \nu_i)\, d\nu_i \tag{2.103} $$

where $f_y$ is the assumed density for the outcome and $f$ is the density of the iid random effects $\nu_i$. The estimating equation is the derivative of the log-likelihood with respect to $\beta$ and the parameters of the assumed random-effects distribution. By inspection, obtaining the estimating equation might be a formidable task. There are cases for which an analytic solution of the integral is possible and for which the resulting estimating equation may be easily calculated; this depends on both the distribution of the outcome variable and the distribution of the random effect. There are also cases for which numeric integration techniques, e.g., quadrature formulae, may be implemented in order to calculate the estimating equation. In the following, we present an example of each of these approaches.

Revisiting the Poisson setting, a random-effects model may be derived assuming a gamma distribution for the random effect.
This choice of distribution leads to an analytic solution of the integral in the likelihood. In the usual Poisson model we hypothesize that the mean of the outcome variable $y$ is given by $\lambda_{it} = \exp(x_{it}\beta)$. In the panel setting we assume that each panel has a different mean that is given by $\exp(x_{it}\beta + \eta_i) = \lambda_{it}\nu_i$. As such, we refer to the random effect as entering multiplicatively rather than additively (as it does in random-effects linear regression). Since the random effect $\nu_i = \exp(\eta_i)$ is positive, we select a gamma distribution, adding the restriction that the mean of the random effects equals one. We do this so that there is only one additional parameter $\theta$ to estimate.

$$ f(\nu_i) = \frac{\theta^{\theta}}{\Gamma(\theta)} \nu_i^{\theta - 1} \exp(-\theta\nu_i) \tag{2.104} $$
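The mean-one restriction in equation 2.104 comes from using the same value $\theta$ for the shape and the rate of the gamma distribution; with that choice the variance of $\nu_i$ is $1/\theta$. The Python sketch below confirms numerically, with a composite Simpson rule, that the density integrates to one and has mean one and variance $1/\theta$. The value $\theta = 3$ is arbitrary, and the integration range and grid size are choices made for this illustration only.

```python
import math

def gamma_density(v, theta):
    """Equation 2.104: theta^theta / Gamma(theta) * v^(theta-1) * exp(-theta v)."""
    log_const = theta * math.log(theta) - math.lgamma(theta)
    return math.exp(log_const) * v ** (theta - 1) * math.exp(-theta * v)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

theta = 3.0   # illustrative value; theta > 1 keeps the integrand finite at v = 0
upper = 40.0  # the density is negligible beyond this point for theta = 3
mass = simpson(lambda v: gamma_density(v, theta), 0.0, upper)
mean = simpson(lambda v: v * gamma_density(v, theta), 0.0, upper)
var = simpson(lambda v: (v - 1.0) ** 2 * gamma_density(v, theta), 0.0, upper)
print(mass, mean, var, 1 / theta)
```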

The conditional distribution of the outcome given the random effect is Poisson, and the random effect is distributed Gamma($\theta$, $\theta$). Therefore, we take the product to obtain the joint density function for the observations of a single panel, given by

$$ f(\nu_i, \lambda_{i1}, \ldots, \lambda_{in_i}) = \frac{\theta^{\theta}}{\Gamma(\theta)} \nu_i^{\theta - 1} \exp(-\theta\nu_i) \prod_{t=1}^{n_i} \frac{\exp(-\nu_i\lambda_{it}) (\nu_i\lambda_{it})^{y_{it}}}{y_{it}!} \tag{2.105} $$
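The integral over $\nu_i$ that produces the closed-form log-likelihood can be checked numerically for a single panel. In the Python sketch below, `marginal_numeric` integrates the joint density (2.105) over $\nu_i$ with a composite Simpson rule, while `marginal_closed_form` uses the result of recognizing the integrand as an unnormalized gamma density, namely $\frac{\theta^\theta}{\Gamma(\theta)} \prod_t \frac{\lambda_{it}^{y_{it}}}{y_{it}!} \cdot \frac{\Gamma(\theta + \sum_t y_{it})}{(\theta + \sum_t \lambda_{it})^{\theta + \sum_t y_{it}}}$. The counts, means, and $\theta$ are illustrative values, and the function names are hypothetical.

```python
import math

def joint_density(v, y, lam, theta):
    """Equation 2.105: gamma density for nu_i times the Poisson probabilities."""
    f = math.exp(theta * math.log(theta) - math.lgamma(theta))
    f *= v ** (theta - 1) * math.exp(-theta * v)
    for y_t, l_t in zip(y, lam):
        f *= math.exp(-v * l_t) * (v * l_t) ** y_t / math.factorial(y_t)
    return f

def simpson(f, a, b, n=6000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def marginal_numeric(y, lam, theta, upper=30.0):
    """Integrate the joint density over nu_i on [0, upper]."""
    return simpson(lambda v: joint_density(v, y, lam, theta), 0.0, upper)

def marginal_closed_form(y, lam, theta):
    """Closed form obtained by integrating an unnormalized gamma density."""
    s, m = sum(y), sum(lam)
    log_p = theta * math.log(theta) - math.lgamma(theta)
    log_p += sum(y_t * math.log(l_t) - math.lgamma(y_t + 1) for y_t, l_t in zip(y, lam))
    log_p += math.lgamma(theta + s) - (theta + s) * math.log(theta + m)
    return math.exp(log_p)

y = [2, 0, 3]           # illustrative counts for one panel
lam = [0.8, 1.5, 1.1]   # illustrative lambda_it = exp(x_it * beta)
theta = 3.0             # theta > 1 keeps the integrand finite at v = 0
print(marginal_numeric(y, lam, theta), marginal_closed_form(y, lam, theta))
```

Taking logs of the closed form and rewriting it in terms of $u_i$ from equation 2.107 (with $\mu_{it}$ in place of $\lambda_{it}$) gives the panel's contribution to equation 2.106.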

Moreover, since the panels are all independent, the joint density for all of the panels combined is the product of the densities of each of the panels. The log-likelihood for gamma-distributed random effects may then be derived by integrating over $\nu_i$. We note that by rearranging terms in the joint density, the integral term may be simplified to one, since it is the integral of another gamma density. After simplification and collection of terms, we substitute our preferred $\mu_{it}$ notation for the expected value $\lambda_{it}$, both for consistency and to address the goal of introducing covariates. The log-likelihood is then specified as

$$ \mathcal{L} = \sum_{i=1}^{n} \left\{ \ln\Gamma\left(\theta + \sum_{t=1}^{n_i} y_{it}\right) - \ln\Gamma(\theta) - \sum_{t=1}^{n_i} \ln\Gamma(y_{it} + 1) + \theta \ln u_i + \ln(1 - u_i) \sum_{t=1}^{n_i} y_{it} - \ln\left(\sum_{t=1}^{n_i} \mu_{it}\right) \sum_{t=1}^{n_i} y_{it} + \sum_{t=1}^{n_i} y_{it} \ln(\mu_{it}) \right\} \tag{2.106} $$

where

$$ u_i = \frac{\theta}{\theta + \sum_{t=1}^{n_i} \mu_{it}} \tag{2.107} $$

and $\mu_{it} = \exp(x_{it}\beta)$. The estimating equation $\Psi(\Theta) = \Psi(\beta, \theta)$ for a gamma-distributed random-effects Poisson model is then given by setting the derivative of the log-likelihood to zero

$$ \Psi(\Theta) = \begin{bmatrix} \left\{ \dfrac{\partial \mathcal{L}}{\partial \beta_j} \right\}_{j=1,\ldots,p} \\[6pt] \dfrac{\partial \mathcal{L}}{\partial \theta} \end{bmatrix}_{(p+1)\times 1} = [0]_{(p+1)\times 1} \tag{2.108} $$

where

$$ \frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_{i=1}^{n} \sum_{t=1}^{n_i} x_{jit} \left[ y_{it} + \mu_{it} \left( \frac{(u_i - 1)\sum_{k=1}^{n_i} y_{ik}}{\sum_{k=1}^{n_i} \mu_{ik}} - u_i \right) \right] \tag{2.109} $$

$$ \frac{\partial \mathcal{L}}{\partial \theta} = \sum_{i=1}^{n} \left[ \psi\left(\theta + \sum_{t=1}^{n_i} y_{it}\right) - \psi(\theta) + \ln u_i + (1 - u_i) - \frac{\sum_{t=1}^{n_i} y_{it}}{\theta + \sum_{t=1}^{n_i} \mu_{it}} \right] \tag{2.110} $$

and $u_i$ is defined in equation 2.107. In the derivative with respect to $\theta$ (equation 2.110), we use $\psi$ to denote the derivative of the log of the gamma function (the psi-function, also called the digamma function). This is a standard notation for this function and should not be confused with our use of $\Psi$ (capital psi) in other sections to denote the estimating equation.

Using the ship data, we fit a gamma-distributed random-effects Poisson model. In this case, there is no need to approximate the likelihood through quadrature (or any other means). Instead, there is an analytic solution to the likelihood despite the need to integrate over the random effect. This is the real benefit of choosing the gamma distribution for the random effect in a Poisson model. The results of fitting a gamma-distributed random-effects model for the ship data are presented as

Random-effects Poisson                          Number of obs      =        34
Group variable (i): ship                        Number of groups   =         5

Random effects u_i ~ Gamma                      Obs per group: min =         6
                                                               avg =       6.8
                                                               max =         7

                                                Wald chi2(4)       =     50.90
Log likelihood  = -74.811217                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    incident |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    op_75_79 |   .3827453   .1182568     3.24   0.001     .1509662    .6145244
    co_65_69 |   .7092879   .1496072     4.74   0.000     .4160633    1.002513
    co_70_74 |   .8573273   .1696864     5.05   0.000     .5247481    1.189906
    co_75_79 |   .4958618   .2321316     2.14   0.033     .0408922    .9508313
       _cons |  -6.591175   .2179892   -30.24   0.000    -7.018426   -6.163924
    exposure |   (offset)
-------------+----------------------------------------------------------------
    /lnalpha |  -2.368406   .8474597                     -4.029397   -.7074155
-------------+----------------------------------------------------------------
       alpha |   .0936298   .0793475                      .0177851    .4929165
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0: chibar2(01) = 10.61  Prob>=chibar2 = 0.001
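A quick way to check the derivatives in (2.109) and (2.110) is to compare them with finite differences of the closed-form log-likelihood. The Python sketch below does this for a small synthetic data set. It assumes a log link without offset, $\mu_{it} = \exp(x_{it}\beta)$, and assumes the form $u_i = \theta/(\theta + \sum_t \mu_{it})$, which is the form that reproduces (2.109) and (2.110). The function names and the toy panels are hypothetical and are not the ship data. Only the standard library is used, so the digamma function is computed with a recurrence plus an asymptotic series.

```python
import math

def digamma(x):
    """psi(x) via psi(x) = psi(x + 1) - 1/x and an asymptotic series for x >= 6."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series

def panel_terms(beta, theta, X, y):
    """mu_it, sum_t y_it, sum_t mu_it and u_i for one panel (log link, no offset)."""
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    sy, smu = sum(y), sum(mu)
    return mu, sy, smu, theta / (theta + smu)   # assumed u_i = theta / (theta + sum_t mu_it)

def loglik(beta, theta, panels):
    """Closed-form gamma random-effects Poisson log-likelihood (assumed form)."""
    total = 0.0
    for X, y in panels:
        mu, sy, smu, u = panel_terms(beta, theta, X, y)
        total += (math.lgamma(theta + sy) - math.lgamma(theta)
                  - sum(math.lgamma(yt + 1) for yt in y)
                  + theta * math.log(u) + sy * math.log(1 - u) - sy * math.log(smu)
                  + sum(yt * math.log(m) for yt, m in zip(y, mu)))
    return total

def score(beta, theta, panels):
    """Analytic gradient: (2.109) for each beta_j and (2.110) for theta."""
    g_beta, g_theta = [0.0] * len(beta), 0.0
    for X, y in panels:
        mu, sy, smu, u = panel_terms(beta, theta, X, y)
        for row, yt, m in zip(X, y, mu):
            resid = yt + m * ((u - 1) * sy / smu - u)
            for j, xj in enumerate(row):
                g_beta[j] += xj * resid
        g_theta += (digamma(theta + sy) - digamma(theta)
                    + math.log(u) + (1 - u) - u * sy / theta)
    return g_beta, g_theta

# Hypothetical toy panels: (design rows, counts)
panels = [([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [2, 3, 6]),
          ([[1.0, 0.5], [1.0, 1.5]], [0, 4]),
          ([[1.0, 1.0], [1.0, 0.0], [1.0, 2.5]], [1, 1, 9])]
beta, theta, h = [0.3, 0.4], 2.0, 1e-6

g_beta, g_theta = score(beta, theta, panels)
fd_beta = [(loglik([b + h * (k == j) for k, b in enumerate(beta)], theta, panels)
            - loglik([b - h * (k == j) for k, b in enumerate(beta)], theta, panels)) / (2 * h)
           for j in range(len(beta))]
fd_theta = (loglik(beta, theta + h, panels) - loglik(beta, theta - h, panels)) / (2 * h)
print(g_beta, fd_beta)
print(g_theta, fd_theta)
```

The analytic and central-difference values agree to several digits, which supports the algebra of (2.109) and (2.110) under the stated form of $u_i$.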

Applying the well-known Gauss-Hermite quadrature approximation, a common random-effects model can be derived for Gaussian distributed random effects. The likelihood is based on the joint distribution of the outcome and the Gaussian random effect. After completing the square of terms in the model, the resulting likelihood is the product of functions of the form

$$\int_{-\infty}^{\infty} e^{-z^2} f(z)\, dz \tag{2.111}$$

This may be numerically approximated using the Gauss-Hermite quadrature formula. The accuracy of the approximation is affected by the number of points used in the quadrature calculation and by the smoothness of the product of functions $f(z)$, that is, how well this product may be approximated by a polynomial.

Applying this approach to the construction of a Gaussian random-effects Poisson regression model, we obtain a quadrature approximated log-likelihood $\mathcal{L}_a$ formulated as

$$\mathcal{L}_a = \sum_{i=1}^{n} \ln\left\{ \frac{1}{\sqrt{\pi}} \sum_{m=1}^{M} w_m^* \prod_{t=1}^{n_i} F\left( x_{it}\beta + x_m^* \sqrt{\frac{2\rho}{1-\rho}} \right) \right\} \tag{2.112}$$

where $(w_m^*, x_m^*)$ are the quadrature weights and abscissas, $M$ is the number of points used in the quadrature rule, and $\rho = \sigma_\nu^2/(\sigma_\nu^2 + \sigma_\epsilon^2)$ is the proportion of total variance contributed by the random effect variance component. For the Poisson model of interest,

$$F(z) = \exp\{-\exp(z)\}\,\exp(z)^{y_{it}}/y_{it}! \tag{2.113}$$

The estimating equation for this likelihood-based model is specified by setting the derivative of the log-likelihood to zero.

Using the ship data, we fit a Gaussian distributed random-effects Poisson model. The results are given by

Random-effects poisson                          Number of obs      =        34
Group variable (i): ship                        Number of groups   =         5

Random effects u_i ~ Gaussian                   Obs per group: min =         6
                                                               avg =       6.8
                                                               max =         7

                                                LR chi2(4)         =     55.93
Log likelihood  = -74.225924                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    incident |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    op_75_79 |   .3853861   .1182126     3.26   0.001     .1536936    .6170786
    co_65_69 |   .7059975   .1495677     4.72   0.000     .4128502    .9991449
    co_70_74 |   .8486468   .1695192     5.01   0.000     .5163953    1.180898
    co_75_79 |   .4950771   .2302197     2.15   0.032     .0438548    .9462994
       _cons |  -6.732638   .1404479   -47.94   0.000    -7.007911   -6.457365
    exposure |   (offset)
-------------+----------------------------------------------------------------
    /lnsig2u |   -1.42662   .5613872    -2.54   0.011    -2.526919   -.3263217
-------------+----------------------------------------------------------------
     sigma_u |   .4900195   .1375453                      .2826744    .8494545
         rho |   .1936258   .0876521                      .0739925    .4191359
------------------------------------------------------------------------------
Likelihood ratio test of rho=0: chibar2(01) = 11.78  Prob>=chibar2 = 0.000

Using the wheeze data, we fit a Gaussian distributed random-effects logistic regression model. The results are given by

Random-effects logit                            Number of obs      =        64
Group variable (i): case                        Number of groups   =        16

Random effects u_i ~ Gaussian                   Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4

                                                Wald chi2(3)       =      0.93
Log likelihood  = -37.20499                     Prob > chi2        =    0.8170

------------------------------------------------------------------------------
      wheeze |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    kingston |   .1652582   .8476326     0.19   0.845    -1.496071    1.826588
         age |  -.2540051    .282497    -0.90   0.369     -.807689    .2996789
       smoke |  -.0699977   .5360669    -0.13   0.896     -1.12067    .9806742
       _cons |   1.541053   2.931209     0.53   0.599    -4.204012    7.286118
-------------+----------------------------------------------------------------
    /lnsig2u |   .2538943   1.041892                     -1.788176    2.295964
-------------+----------------------------------------------------------------
     sigma_u |   1.135357   .5914594                      .4089805    3.151826
         rho |   .2815162   .0640567                      .0483826    .7512176
------------------------------------------------------------------------------
Likelihood ratio test of rho=0: chibar2(01) = 2.53  Prob >= chibar2 = 0.056
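The variance-component rows in the Gaussian random-effects Poisson and logit outputs are linked by simple arithmetic: $\sigma_u = \exp(\texttt{lnsig2u}/2)$ and $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_\epsilon^2)$. The short Python sketch below reproduces the reported sigma_u and rho from the reported /lnsig2u values, taking $\sigma_\epsilon^2 = 1$ for the Poisson model and $\sigma_\epsilon^2 = \pi^2/3$ (the variance of the standard logistic distribution) for the logit model. These residual variances are assumptions of the sketch, chosen because they match the printed output.

```python
import math

# /lnsig2u values as printed in the two outputs above
lnsig2u_poisson = -1.42662    # Gaussian random-effects Poisson, ship data
lnsig2u_logit = 0.2538943     # Gaussian random-effects logit, wheeze data

def sigma_and_rho(lnsig2u, resid_var):
    """sigma_u = exp(lnsig2u / 2) and rho = sigma_u^2 / (sigma_u^2 + resid_var)."""
    sigma_u = math.exp(lnsig2u / 2.0)
    return sigma_u, sigma_u**2 / (sigma_u**2 + resid_var)

# Assumed residual variances: 1 for the Poisson model, pi^2/3 for the logit model
sig_p, rho_p = sigma_and_rho(lnsig2u_poisson, 1.0)
sig_l, rho_l = sigma_and_rho(lnsig2u_logit, math.pi**2 / 3.0)
print(f"Poisson: sigma_u = {sig_p:.6f}, rho = {rho_p:.6f}")
print(f"logit:   sigma_u = {sig_l:.6f}, rho = {rho_l:.6f}")
```

The printed values agree with the sigma_u and rho rows of the two outputs to the displayed precision.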

It is worth emphasizing that the random effects are not estimated in these models. Rather, the parameters (variance components) of the assumed distribution of the random effects enter the model. While the approach outlined for Gaussian random effects allows a general specification, one should use caution when assessing models fitted by straightforward (nonadaptive) Gauss-Hermite quadrature. The abscissas in this approach are spaced about zero, which may be a poor choice of values for the function to be approximated.

The ease with which one may program Gaussian random-effects models has made these estimators readily available in software. However, we caution that the Gauss-Hermite quadrature approach does not always provide a good approximation. Better approximations come from adaptive quadrature methods that choose abscissas based on the function to be evaluated.
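The concern about abscissas centered at zero can be made concrete. The Python sketch below (assuming NumPy) approximates $\int \phi(\nu; 0, \sigma^2) \prod_t F(\eta_t + \nu)\, d\nu$ for one hypothetical panel whose counts are large, so the integrand is a narrow spike well away from zero. The nonadaptive rule uses nodes $\sqrt{2}\,\sigma x_m^*$, as in (2.112). The adaptive variant is one common construction, assumed here rather than taken from any particular package: it locates the mode $\hat\nu$ of the integrand by Newton's method, sets a scale $\hat\tau$ from the curvature there, and uses nodes $\hat\nu + \sqrt{2}\,\hat\tau x_m^*$ with weights $w_m^* e^{x_m^{*2}}$. Both are compared with a brute-force Riemann sum. The data values and function names are hypothetical.

```python
import math
import numpy as np

# Hypothetical panel: four large counts, fixed part eta_t = 0, random-effect sd sigma = 1.
y = np.array([20.0, 25.0, 18.0, 22.0])
eta = np.zeros_like(y)
sigma = 1.0
log_yfact = sum(math.lgamma(v + 1.0) for v in y)

def log_g(nu):
    """ln prod_t F(eta_t + nu), F the Poisson probability (2.113); nu may be an array."""
    lin = eta + np.asarray(nu, dtype=float)[..., None]
    return (y * lin - np.exp(lin)).sum(axis=-1) - log_yfact

def log_phi(nu):
    """ln of the N(0, sigma^2) density."""
    nu = np.asarray(nu, dtype=float)
    return -0.5 * (nu / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def logsumexp(a):
    m = np.max(a)
    return m + math.log(np.exp(a - m).sum())

def nonadaptive(M):
    """ln of (1/sqrt(pi)) sum_m w_m g(sqrt(2) sigma x_m): nodes centered at zero."""
    x, w = np.polynomial.hermite.hermgauss(M)
    return logsumexp(np.log(w) + log_g(math.sqrt(2.0) * sigma * x)) - 0.5 * math.log(math.pi)

def adaptive(M, iters=100):
    """Recenter and rescale the nodes at the mode of g * phi before applying the rule."""
    nu_hat = 0.0
    for _ in range(iters):                   # Newton's method; ln(g * phi) is concave here
        mu = np.exp(eta + nu_hat)
        grad = (y - mu).sum() - nu_hat / sigma**2
        hess = -mu.sum() - 1.0 / sigma**2
        nu_hat -= grad / hess
    tau = 1.0 / math.sqrt(np.exp(eta + nu_hat).sum() + 1.0 / sigma**2)
    x, w = np.polynomial.hermite.hermgauss(M)
    nu = nu_hat + math.sqrt(2.0) * tau * x
    # integral = sqrt(2) tau * int e^{-z^2} [e^{z^2} g(nu) phi(nu)] dz
    return logsumexp(np.log(w) + x**2 + log_g(nu) + log_phi(nu)) + math.log(math.sqrt(2.0) * tau)

# Reference value: Riemann sum on a fine grid covering the spike.
grid = np.linspace(-6.0, 8.0, 140001)
log_exact = logsumexp(log_g(grid) + log_phi(grid)) + math.log(grid[1] - grid[0])

for M in (5, 7, 12, 20):
    print(M, nonadaptive(M) - log_exact, adaptive(M) - log_exact)
```

With these made-up values the nonadaptive rule misses the spike badly for small $M$, whereas the adaptive rule agrees closely with the brute-force value even with five points.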
At the very least, you should compare results from Gauss-Hermite quadrature approximated models for various numbers of quadrature points to evaluate the stability of the results. Adaptive quadrature approaches can be much better for these types of random-effects models, as investigated in Rabe-Hesketh, Skrondal, and Pickles (2002).

Using an adaptive quadrature optimization routine to fit the Gaussian distributed random-effects logistic regression model for the wheeze data results in

log likelihood = -37.204764

------------------------------------------------------------------------------
      wheeze |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    kingston |   .1655219   .8478793     0.20   0.845    -1.496291    1.827335
         age |  -.2540822   .2825297    -0.90   0.368    -.8078303    .2996659
       smoke |   -.070198   .5360616    -0.13   0.896    -1.120859    .9804634
       _cons |   1.541421   2.931651     0.53   0.599    -4.204509    7.287352
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (case)
    var(1): 1.292345 (1.3502367)
------------------------------------------------------------------------------

where the difference from the straightforward Gauss-Hermite quadrature optimization is apparent. In this particular case, the interpretation of the results does not change, and the differences in the fitted coefficients and variance components are not dramatic. This is not always the case; adaptive quadrature can yield a significant improvement in accuracy. See Rabe-Hesketh et al. (2002) for more information on adaptive quadrature techniques and comparisons with nonadaptive optimization.

The difference in results when using adaptive quadrature is more pronounced if we fit a random-effects Poisson model for the Progabide data.


First, let us fit a Gaussian distributed random-effects Poisson model using straightforward Gauss-Hermite quadrature.

number of level 1 units = 295
number of level 2 units = 59

log likelihood = -1017.954

------------------------------------------------------------------------------
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .1118361   .0468766     2.39   0.017     .0199597    .2037125
   progabide |   .0051622   .0530336     0.10   0.922    -.0987817    .1091062
   timeXprog |   -.104726   .0650299    -1.61   0.107    -.2321823    .0227303
       _cons |   1.069857   .0480689    22.26   0.000     .9756434     1.16407
    lnPeriod |   (offset)
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (id)
    var(1): .2970534 (.01543218)
------------------------------------------------------------------------------

Now, let us fit the same model using an adaptive quadrature routine for the estimation.

number of level 1 units = 295
number of level 2 units = 59

log likelihood = -1011.0208

------------------------------------------------------------------------------
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .0468768     2.39   0.017     .0199591     .203713
   progabide |  -.0214708   .2101376    -0.10   0.919    -.4333329    .3903914
   timeXprog |  -.1047258   .0650304    -1.61   0.107     -.232183    .0227315
       _cons |   1.032649   .1524222     6.77   0.000     .7339074    1.331392
    lnPeriod |   (offset)
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (id)
    var(1): .60702391 (.11621224)
------------------------------------------------------------------------------

Note the increase in the log-likelihood, the change in sign for the progabide coefficient, and the difference in the estimate for the variance of the random effects. In general, the adaptive quadrature results are more accurate than the nonadaptive quadrature results. One can examine the stability of the results for the nonadaptive techniques by fitting the model several times, each time using a different number of quadrature points. If the results are stable, then we can be comfortable with inference for the model.

At the beginning of this section we presented the derivation of random-effects models from an assumed distribution. It is also possible to derive random-effects models for the general GLM. Estimation may be performed using various optimization techniques, including Monte Carlo methods. Zeger and Karim (1991) present a Gibbs sampling approach for constructing GLM random-effects models. Basically, the authors describe an estimating equation given by

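$$\Psi(\beta, \gamma) = \begin{bmatrix} \displaystyle\sum_{i=1}^{n}\sum_{t=1}^{n_i} \frac{y_{it} - \mu_{it}}{a(\phi)V(\mu_{it})} \left(\frac{\partial \mu}{\partial \eta}\right)_{it} x_{jit} \\[3ex] \displaystyle\sum_{t=1}^{n_k} \frac{y_{kt} - \mu_{kt}}{a(\phi)V(\mu_{kt})} \left(\frac{\partial \mu}{\partial \eta}\right)_{kt} \end{bmatrix}_{(p+n)\times 1} = [0]_{(p+n)\times 1} \tag{2.114}$$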

for $j = 1, \ldots, p$ and $k = 1, \ldots, n$. The random effects $\gamma_i$ are assumed to follow some distribution $\mathcal{G}$ characterized by a $(q \times 1)$ parameter vector $\nu$. The authors show that, through the use of conditional distributions, a Monte Carlo approach using Gibbs sampling may be used to estimate the unknown random effects $\gamma_i$, which are then used to estimate the parameters $\nu$ of the random effects distribution $\mathcal{G}$. Monte Carlo methods form another class of techniques for constructing and estimating models for panel data.

2.3.3 Population-averaged and subject-specific models

There are two classifications of models that we discuss for addressing the panel structure of data. A population-averaged model is one that includes the within-panel dependence by averaging effects over all panels. A subject-specific model is one that addresses the within-panel dependence by introducing specific panel-level random components.
A population-averaged model, also known as a marginal model, is obtained through introducing a parameterization for a panel-level covariance. The panel-level covariance (or correlation) is then estimated by averaging across information from all of the panels. A subject-specific model is obtained through the introduction of a panel effect. While this implies a panel-level covariance, each panel effect is estimated using information only from the specific panel. Fixed-effects and random-effects models are subject specific. In the following chapter we further discuss these two classifications and show derivations for subject-specific and population-averaged GEE models.

These are not the only types of panel data models that one might apply to data. Transitional models and response conditional models are used when the analysis of longitudinal studies must address the dependence of the current response on previous responses. This text does not discuss these models. Interested readers should refer to Neuhaus (1992) for a clear exposition and a useful list of references.

2.4 Estimation

The solution of an estimating equation is obtained using optimization techniques. These techniques iterate toward a solution by updating a current estimate to a new estimate. The common approach employs a Taylor series expansion of an estimating equation given by $\Psi(\beta) = 0$, such that

$$0 = \Psi(\beta^{(0)}) + (\beta - \beta^{(0)})\,\Psi'(\beta^{(0)}) + \frac{1}{2}(\beta - \beta^{(0)})^2\,\Psi''(\beta^{(0)}) + \cdots \tag{2.115}$$

Keeping only the first two terms, we have the linear approximation

$$0 \approx \Psi(\beta^{(0)}) + (\beta - \beta^{(0)})\,\Psi'(\beta^{(0)}) \tag{2.116}$$

Solving for $\beta$ gives

$$\beta \approx \beta^{(0)} - \frac{\Psi(\beta^{(0)})}{\Psi'(\beta^{(0)})} \tag{2.117}$$

Writing this relationship in matrix notation, we then iterate to a solution using the relationship

$$\beta^{(k)} = \beta^{(k-1)} - \left\{ \frac{\partial \Psi(\beta^{(k-1)})}{\partial \beta} \right\}^{-1} \Psi(\beta^{(k-1)}) \tag{2.118}$$

Thus, given a starting estimate $\beta^{(0)}$, we update our estimate using the relationship in equation 2.118. Specific optimization techniques can take advantage of properties of specific sources of estimating equations. For example, the IRLS algorithm takes advantage of the form of the updating step by using the expected derivative of the estimating equation, so that the updating step may be obtained using weighted OLS.

When there are ancillary parameters in the estimating equation to be solved, they are estimated separately, and their estimates must also be updated at each step. If we consider a second estimating equation for the ancillary parameters, our overall optimization approach is to update $\beta$, then update the ancillary parameter estimates, and continue alternating between the estimating equations throughout the iterative optimization.
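The update (2.118) and the IRLS shortcut can be illustrated with the independent-data Poisson model under the canonical log link, whose estimating equation is $\Psi(\beta) = \sum_i x_i^{T}(y_i - \mu_i)$ with $\mu_i = \exp(x_i\beta)$. For this link $-\partial\Psi/\partial\beta = X^{T}WX$ with $W = \mathrm{diag}(\mu_i)$, so the observed and expected derivatives coincide and each Newton step equals a weighted OLS regression of a working response on $X$. The Python sketch below assumes NumPy; the function name, the starting values $\mu = y + 0.5$ (a common but here assumed choice), and the toy data are hypothetical.

```python
import numpy as np

def poisson_newton_irls(X, y, tol=1e-10, max_iter=50):
    """Solve sum_i x_i (y_i - exp(x_i beta)) = 0; each step is the update (2.118).

    With the canonical log link, -dPsi/dbeta = X' W X, W = diag(mu), so the Newton
    step is the weighted OLS fit of the working response z = eta + (y - mu)/mu on X.
    """
    mu = y + 0.5                                 # assumed start: fitted values from the data
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu                  # working response
        XtW = X.T * mu                           # X' W, scaling each observation by mu_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    raise RuntimeError("IRLS did not converge")

# Hypothetical toy data (not from the text): intercept and one covariate
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([1.0, 2.0, 2.0, 5.0, 9.0])
beta_hat = poisson_newton_irls(X, y)
print(beta_hat, X.T @ (y - np.exp(X @ beta_hat)))   # the estimating equation is ~0 here
```

After the first step, every iteration is exactly a Newton step from the previous estimate; only the starting point is taken from the data rather than from a guess for $\beta$.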

2.5 Summary

We illustrated three derivations of estimating equations for likelihood-based models with independent data and then showed the relationship of the GLM estimating equation to the previously illustrated models. We discussed the ability of an analyst to build models that extend the GLM to quasilikelihood models. We then introduced the concept of panel data and showed examples of how likelihood-based models are derived to address the correlated nature of the data. We also showed that a naive pooled estimator could be estimated along with a modified sandwich estimate of variance in order to adjust the standard errors of the naive point estimates. Finally, we gave a general overview of how estimation proceeds once an estimating equation is specified.

The middle of the sandwich estimate of variance involves the sums of the contributions in each panel. The use of sums over correlated panels results in a variance estimate called the modified sandwich estimate of variance. See Carroll and Kauermann (to appear) and Hardin and Hilbe (2001) for lucid discussions of the robust variance estimate.

Our illustration of deriving estimating equations for likelihood-based models included models for independent data and panel data. Pooled models, unconditional fixed-effects models, conditional fixed-effects models, and random-effects models all admit estimating equations through the same construction algorithm shown for the independent data models.

The following chapter presents the details and motivation of generalized estimating equations. The motivation and illustrations extend the results shown in this chapter, so this review provides the basis for the various kinds of GEE models addressed. You should have a thorough understanding of the techniques, derivations, and assumptions presented here in order to fully appreciate the extensions covered in the subsequent material.


2.6 Exercises

1. Show that the negative binomial regression model is a member of the exponential family and discuss ways to address the ancillary parameter. The negative binomial density is given by

$$f(y; r, p) = \binom{y + r - 1}{r - 1} p^{r} (1 - p)^{y}$$

where the density provides the probability of observing $y$ failures before the $r$th success in a series of Bernoulli trials, each with probability of success equal to $p$.

2. Derive the FIML estimating equation for a binomial regression model. You should be able to incorporate the repeated (Bernoulli) trial nature of the distribution into the earlier Bernoulli example.

3. Derive Derive the the FIML FIML estimating equation for the gamma gamma regression regression model model 3. estimating equation for the and identify identify the the canonical canonical link link function. function. and 4. Derive Derive the the conditional conditional fixed-effects fixed-effects linear regression estimating estimating equation equation.. 4. linear regression 5. The The FIML FIML Poisson Poisson model model used used aa log-link log-link for for estimating estimating the the parameters. parameters. 5. Show that the interpretation of the exponentiated coefficients do not not dedeShow that the interpretation of the exponentiated coefficients do pend on the value of the covariate and use the delta method to derive the pend on the value of the covariate and use the delta method to derive the variance of of the the natural natural (not (not parameterized parameterized or or untransformed) untransformed) coefficient coefficient.. variance 6. Discuss Discuss possible possible parameterizations parameterizations for for the the LIML LIML estimating estimating equation, equation, 6. treating Q2 (72 as as ancillary, ancillary, of of an an inverse inverse Gaussian(p, Gaussian(p" Q2 (72)) model, model, where where the the treating inverse Gaussian density is given by by inverse Gaussian density is given ; ; f(y N

z) -

1 2~y 3 QZ

exp

~

-

(y -

/_t)2

2(/~or)2y

7. A A Gaussian Gaussian random-effects random-effects linear linear regression regression model model may may be be derived derived such such 7. that there there is is an an analytic analytic solution solution to to the the integral integral in in the the log-likelihood log-likelihood.. that Show this derivation. derivation. Show this the advantages advantages and and disadvantages of the the pooled pooled estimator estimator for for 8. Discuss Discuss the 8. disadvantages of panel data. data. panel


9. Give a detailed argument for how to treat the following (complete) panel of data in a conditional fixed-effects logistic regression:

    id   y   x1   x2   x3
     4   1    0    1    0
     4   1    1    1    0
     4   1    1    0    1
     4   1    0    0    1

10. A conditional fixed-effects model does not, in general, include a parameter for the constant. Show that the conditional fixed-effects negative binomial model does allow a constant in the model and discuss why this is so.


CHAPTER 3

Generalized Estimating Equations

In the previous chapter we illustrated a number of estimating equations that were all derived from a log-likelihood. We showed that the LIMQL estimating equation for GLMs has its genesis in a log-likelihood based upon the exponential family of distributions. In addition, we noted that the utility of this estimating equation is extended outside of the implied log-likelihood due to the work of Wedderburn (1974). The estimating equation methods are related to the quasilikelihood methods in that there are no parametric assumptions.

The term generalized estimating equations indicates that an estimating equation is not the result of a likelihood-based derivation, but that it is obtained by generalizing another estimating equation. The modification we make to obtain a generalized estimating equation (GEE) is an introduction of second order variance components directly into a pooled estimating equation.

As we saw in the latter sections of the previous chapter, the likelihood-based approach would address these additional variance components parametrically. Here, the approach is ad hoc.

3.1 Population-averaged (PA) and subject-specific (SS) models

To highlight two different categories of models, let us consider the generalized linear mixed model as the source of nonindependence. For a given outcome $y_{it}$, we have a $(p \times 1)$ vector of covariates $x_{it}$ associated with our parameter vector $\beta$. We also have a $(q \times 1)$ vector of covariates $z_{it}$ associated with the random effect $\nu_i$. The conditional expectation of the outcome is given by

$$\mu^{SS}_{it} = E(y_{it} \mid \nu_i) \tag{3.1}$$

The responses for a given panel $i$ are characterized by

$$g(\mu^{SS}_{it}) = x_{it}\beta^{SS} + z_{it}\nu_i \tag{3.2}$$

$$V(y_{it} \mid \nu_i) = v(\mu^{SS}_{it}) \tag{3.3}$$

where the random effects $\nu_i$ follow some distribution. We can either focus on the distribution of the random effects as the source of nonindependence, or we can consider the marginal expectation of the outcome (integrated over the distribution)

$$\mu^{PA}_{it} = E\left[E(y_{it} \mid \nu_i)\right] \tag{3.4}$$


so that the responses are characterized by

$$g(\mu^{PA}_{it}) = x_{it}\beta^{PA} \tag{3.5}$$

$$V(y_{it}) = v(\mu^{PA}_{it})\,a(\phi) \tag{3.6}$$

Thus, the marginal expectation is the average response for observations sharing the same covariates (across all panels). The SS and PA superscripts are added above to differentiate the two approaches. SS indicates that we are explicitly modeling the source of the heterogeneity and that the coefficients $\beta^{SS}$ have an interpretation for individuals; SS means subject specific. PA indicates that we are looking at the marginal outcome averaged over the population of individuals and that the coefficients $\beta^{PA}$ have an interpretation in terms of the response averaged over the population; PA means population averaged. One should also note that the marginal model is parameterized in terms of the distribution of the panels. As such, variance-weighted analyses are limited to weights that are at the level of the panel (and not at the level of the observation).

Likelihood-based models that fit in these categories include random-effects probit regression models (subject specific) and beta-binomial marginal regression models (population averaged).

Sribney (1999) devised an illustrative example highlighting the difference between the parameters for SS and PA models. At the heart of the illustration is an emphasis that the population parameters $\beta^{SS}$ and $\beta^{PA}$ are different entities. The SS model fully parameterizes the distribution of the population, while the PA model parameterizes only the marginal distribution of the population.

Suppose that we are entertaining a logistic regression model where the outcome of interest $Y_{it}$ indicates whether a child has a respiratory illness (1/0). A single explanatory covariate $X_{it}$ denotes the smoking status of the child's mother. The SS model with a single random component $\nu_i$ assumes that

$$\operatorname{logit} P(Y_{it} = 1 \mid X_{it}, \nu_i) = \beta^{SS}_0 + X_{it}\beta^{SS}_1 + \nu_i \tag{3.7}$$

such that the subject-specific odds ratio, given by

$$OR^{SS} = \frac{P(Y_{it}=1 \mid X_{it}=1, \nu_i)/P(Y_{it}=0 \mid X_{it}=1, \nu_i)}{P(Y_{it}=1 \mid X_{it}=0, \nu_i)/P(Y_{it}=0 \mid X_{it}=0, \nu_i)} = \exp(\beta^{SS}_1) \tag{3.8}$$

represents the ratio of the odds of a given child having respiratory illness if the mother smokes compared to the odds of the same child having respiratory illness if the mother does not smoke.

The PA model, on the other hand, assumes that

$$\operatorname{logit} P(Y_{it} = 1 \mid X_{it}) = \beta^{PA}_0 + X_{it}\beta^{PA}_1 \tag{3.9}$$

such that the population-averaged odds ratio, given by

$$OR^{PA} = \frac{P(Y_{it}=1 \mid X_{it}=1)/P(Y_{it}=0 \mid X_{it}=1)}{P(Y_{it}=1 \mid X_{it}=0)/P(Y_{it}=0 \mid X_{it}=0)} = \exp(\beta^{PA}_1) \tag{3.10}$$


represents the ratio of the odds of respiratory illness for an average child with a smoking mother to the odds for an average child with a nonsmoking mother.

The lesson here is that we must think carefully about which parameter we are interested in estimating. If we wish to estimate how cessation of smoking might decrease the chances of our children getting respiratory illness, we want to estimate a subject-specific model. If we wish to compare the rates of respiratory illness among children of smokers to those among children of nonsmokers, then we want to estimate a population-averaged model.
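The difference between $\beta^{SS}$ and $\beta^{PA}$ can be made concrete numerically. The following is a minimal sketch, assuming a normally distributed random intercept $\nu_i \sim N(0, \sigma^2)$ and the hypothetical values $\beta^{SS}_0 = -1$, $\beta^{SS}_1 = 1$, and $\sigma = 2$ (none of these come from the text). It averages the SS probabilities of equation 3.7 over $\nu_i$, as in equation 3.4, using a crude trapezoid rule, and then forms the implied population-averaged odds ratio. The helper names `marginal_prob` and `odds` are our own.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def marginal_prob(eta, sigma, n_grid=4001, width=8.0):
    """P(Y=1 | x) averaged over nu ~ N(0, sigma^2), i.e. E[E(y | nu)].

    Trapezoid rule on [-width*sigma, width*sigma]; eta is the SS linear
    predictor without the random effect.
    """
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        nu = lo + k * h
        w = 0.5 if k in (0, n_grid - 1) else 1.0
        dens = math.exp(-0.5 * (nu / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += w * expit(eta + nu) * dens
    return total * h

def odds(p):
    return p / (1.0 - p)

# Hypothetical subject-specific parameters (illustration only)
beta0_ss, beta1_ss, sigma = -1.0, 1.0, 2.0

or_ss = math.exp(beta1_ss)                        # eq. 3.8
p1 = marginal_prob(beta0_ss + beta1_ss, sigma)    # mother smokes
p0 = marginal_prob(beta0_ss, sigma)               # mother does not smoke
or_pa = odds(p1) / odds(p0)                       # eq. 3.10, implied by the SS model

print(f"OR_SS = {or_ss:.3f}, implied OR_PA = {or_pa:.3f}")
```

In this example the implied population-averaged odds ratio falls between 1 and $\exp(\beta^{SS}_1)$: averaging over the random effect pulls the marginal association toward the null, so the two odds ratios are not interchangeable.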

3.2 The PA-GEE for GLMs

The best-known group of GEE-derived models is the collection described in the landmark paper of Liang and Zeger (1986). The authors therein provide the first introduction to generalized estimating equations. They also provide the theoretical justification and asymptotic properties for the resulting estimators. In fact, the majority of researchers who refer to a GEE model are referring to this particular collection of models.

Understanding the PA-GEE is relatively straightforward given our focus on the development of the estimating equation in the preceding chapter. We begin with the LIMQL estimating equation for GLMs

$$\Psi(\beta) = \left[\sum_{i=1}^{n}\sum_{t=1}^{n_i} \frac{y_{it}-\mu_{it}}{a(\phi)\,v(\mu_{it})}\left(\frac{\partial\mu}{\partial\eta}\right)_{it} x_{jit}\right]_{j=1,\ldots,p} = [0]_{p\times 1} \tag{3.11}$$

and rewrite it in matrix terms of the panels

$$\Psi(\beta) = \left[\sum_{i=1}^{n} x_{ji}^{T}\, D\!\left(\frac{\partial\mu}{\partial\eta}\right) \left[V(\mu_i)\right]^{-1} \left(\frac{y_i-\mu_i}{a(\phi)}\right)\right]_{j=1,\ldots,p} = [0]_{p\times 1} \tag{3.12}$$

where $D(\cdot)$ denotes a diagonal matrix. $V(\mu_i)$ is clearly a diagonal matrix which can be decomposed into

$$V(\mu_i) = \left[D(v(\mu_{it}))^{1/2}\, I_{(n_i\times n_i)}\, D(v(\mu_{it}))^{1/2}\right]_{n_i\times n_i} \tag{3.13}$$

This presentation makes it clear that the estimating equation is treating each observation within a panel as independent. A (pooled) model associated with this estimating equation is called the independence model.

If we focus on the marginal distribution of the outcome, for which the expected value and variance functions are averaged over the panels (they are unchanged from the specification given for the LIMQL estimating equation for GLMs), then the identity matrix in equation 3.13 is clearly the within-panel correlation matrix. The GEE proposed by Liang and Zeger is a modification of the LIMQL estimating equation for GLMs that simply replaces the identity matrix with a more general correlation matrix, since the variance matrix for


correlated data does not have a diagonal form:

$$V(\mu_i) = \left[D(v(\mu_{it}))^{1/2}\, R(\alpha)_{(n_i\times n_i)}\, D(v(\mu_{it}))^{1/2}\right]_{n_i\times n_i} \tag{3.14}$$
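Equations 3.12 through 3.14 can be checked numerically. The sketch below assumes a Poisson family with log link, so that $v(\mu) = \mu$ and $\partial\mu/\partial\eta = \mu$, and takes $a(\phi) = \phi$; the function names and the small random dataset are hypothetical. With $R(\alpha)$ set to the identity, the panel-wise matrix form should reduce to the pooled sum of equation 3.11, which for this canonical link is $\sum_i X_i^T(y_i - \mu_i)$.

```python
import numpy as np

def working_covariance(mu, R, v=lambda m: m):
    """Eq. 3.14: D(v(mu))^{1/2} R D(v(mu))^{1/2}; default v(mu) = mu (Poisson)."""
    d = np.sqrt(v(mu))
    return d[:, None] * R * d[None, :]

def panel_score(beta, panels, corr, phi=1.0):
    """Eq. 3.12 for a Poisson model with log link, summed over panels.

    corr(n) returns the (n x n) working correlation matrix R(alpha).
    For the log link dmu/deta = mu, so D(dmu/deta) = diag(mu).
    """
    psi = np.zeros(len(beta))
    for X, y in panels:
        mu = np.exp(X @ beta)
        V = working_covariance(mu, corr(len(y)))
        psi += X.T @ (mu * np.linalg.solve(V, (y - mu) / phi))
    return psi

# Hypothetical data: three panels of four observations each
rng = np.random.default_rng(0)
panels = [(np.column_stack([np.ones(4), rng.normal(size=4)]),
           rng.poisson(3.0, size=4).astype(float)) for _ in range(3)]
beta = np.array([1.0, 0.2])

independence = panel_score(beta, panels, np.eye)                     # R = I (eq. 3.13)
pooled = sum(X.T @ (y - np.exp(X @ beta)) for X, y in panels)          # eq. 3.11
exchangeable = panel_score(beta, panels, lambda n: 0.3 + 0.7 * np.eye(n))  # R with alpha = 0.3
print(independence, pooled, exchangeable)
```

Only the `corr` argument changes between the independence model and a GEE with a non-diagonal working correlation, which mirrors the way Liang and Zeger's modification simply swaps the identity matrix for $R(\alpha)$.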

We write $R(\alpha)$ to emphasize that the correlation matrix is to be estimated through the parameter vector $\alpha$. It turns out that it is relatively easy to describe a large grouping of useful structures on the correlation matrix via the $\alpha$ parameter vector.

The conceptual idea behind these GEEs is simple. However, that is not to say that the proof is simple, or that our appreciation of the results is in any way lessened. Indeed, the Liang and Zeger paper is an impressive presentation of sophisticated asymptotic and distributional statistics. Since our focus in this text is on the concepts and application, we forgo the advanced mathematics required to prove the properties of the estimators for these models.

3.2.1 Parameterizing the working correlation matrix

We gain efficiency in the estimation of the regression parameters by choosing to formally include a hypothesized structure for the within-panel correlation. There are several ways in which we might hypothesize this structure. Only one additional scalar parameter need be estimated if we believe that the observations within a panel follow no specific order and that they are equally correlated. Alternatively, we may hypothesize a more complicated structure under the belief that the observations within a panel do follow a specific order. Here, we may require a vector of up to $(\max_i n_i) - 1$ additional parameters, or an entire matrix of up to $\binom{\max_i n_i}{2}$ additional parameters.

The following subsections present standard approaches to specifying a structure for the estimated within-panel correlation. In each subsection we include the results for analyzing a Poisson model of the repeated observations of seizures for a group of epileptics. The observations are part of the Progabide dataset given in section 5.2.3.

We can formally write the estimator for the ancillary association parameters as the estimating equation

$$\Psi(\alpha) = \sum_{i=1}^{n}\left(\frac{\partial\xi_i}{\partial\alpha}\right)^{T} H_i^{-1}\,(W_i - \xi_i) = [0]_{q\times 1} \tag{3.15}$$

where

$$W_i = (r_{i1}r_{i2},\; r_{i1}r_{i3},\; \ldots,\; r_{i,n_i-1}r_{in_i})^{T}_{q\times 1} \tag{3.16}$$

$$H_i = D\big(V(W_{ij})\big)_{q\times q} \tag{3.17}$$

$$\xi_i = E(W_i)_{q\times 1} \tag{3.18}$$

such that $r_{ij}$ is the $ij$th Pearson residual, $H_i$ is a diagonal matrix, and $q = \binom{n_i}{2}$. From this estimating equation, it is clear that the parameterization of the correlation matrix enters through equation 3.18. In fitting this estimating


equation, we substitute $r_{ij}$ obtained from the current estimate $\hat\beta$ for the Pearson residuals. In the subsections to follow, we include simple formulae for the estimation of the components of $\alpha$. In all cases the formulae may be derived directly from the above estimating equation.

In the following subsections, we print only the lower triangle of symmetric (square) matrices for ease of readability.
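As a small sketch of equation 3.16, the following builds $W_i$ from one panel's Pearson residuals in the stated order and confirms that it has $q = \binom{n_i}{2}$ elements. The residuals are those of panel 1 in the Gaussian worked example later in this section, where Pearson and raw residuals coincide because $v(\mu) = 1$; the function name `pairwise_products` is our own.

```python
from itertools import combinations
from math import comb

def pairwise_products(r):
    """Eq. 3.16: (r_1 r_2, r_1 r_3, ..., r_{n-1} r_n) for one panel's Pearson residuals."""
    return [r[u] * r[v] for u, v in combinations(range(len(r)), 2)]

r_panel1 = [-1.5, -1.5, 0.5, 0.5]
W = pairwise_products(r_panel1)
q = comb(len(r_panel1), 2)   # number of distinct within-panel pairs
print(q, W)
```

`itertools.combinations` yields index pairs in lexicographic order, which matches the ordering $(r_{i1}r_{i2}, r_{i1}r_{i3}, \ldots)$ used in equation 3.16.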

3.2.1.1 Exchangeable correlation

The simplest form of the correlation matrix is the identity matrix assumed by the independence model, which imposes no additional ancillary parameters. In a simple extension to this structure, we might hypothesize that observations within a panel have some common correlation (one additional ancillary parameter). In this case, $\alpha$ is a scalar and the working correlation matrix has the structure

$$R(\alpha) = \begin{bmatrix} 1 & \alpha & \alpha & \cdots & \alpha \\ \alpha & 1 & \alpha & \cdots & \alpha \\ \alpha & \alpha & 1 & \cdots & \alpha \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \alpha & \alpha & \alpha & \cdots & 1 \end{bmatrix} \tag{3.19}$$

which we can write succinctly as

$$R_{uv} = \begin{cases} 1 & \text{if } u = v \\ \alpha & \text{otherwise} \end{cases} \tag{3.20}$$

This hypothesis is valid for datasets in which the repeated measurements have no time dependence and any permutation of the repeated measurements is valid. An example of this type of data is a health study in which the panels represent clinics and the repeated measurements are patients within the clinics.

A GEE with an exchangeable correlation structure uses the estimated Pearson residuals $\hat r_{it} = (y_{it} - \hat\mu_{it})/\sqrt{v(\hat\mu_{it})}$ from the current fit of the model to estimate the common correlation parameter. The estimate of $\alpha$ using these residuals is

$$\hat\alpha = \hat\phi^{-1}\,\frac{\sum_{i=1}^{n}\left\{\sum_{u=1}^{n_i}\sum_{v=1}^{n_i}\hat r_{iu}\hat r_{iv} - \sum_{u=1}^{n_i}\hat r_{iu}^{2}\right\}}{\sum_{i=1}^{n} n_i(n_i-1)} \tag{3.21}$$
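Equation 3.21 can be sketched as follows, assuming the form $\hat\alpha = \hat\phi^{-1}\sum_i\{(\sum_u \hat r_{iu})^2 - \sum_u \hat r_{iu}^2\}/\sum_i n_i(n_i-1)$, which is the form that agrees with the worked example's equation 3.28. Residuals are passed per panel, so unbalanced panels need no special handling: a panel with a single observation contributes nothing to either the numerator or the denominator. The function name is hypothetical.

```python
def exchangeable_alpha(resid_by_panel, phi):
    """Moment estimate of the common (exchangeable) correlation.

    resid_by_panel: list of lists of Pearson residuals, one list per panel.
    phi: current estimate of the dispersion parameter.
    """
    num = 0.0
    den = 0
    for r in resid_by_panel:
        s = sum(r)
        # (sum_u r_u)^2 - sum_u r_u^2 equals the sum of r_u * r_v over u != v
        num += s * s - sum(x * x for x in r)
        den += len(r) * (len(r) - 1)
    return num / (phi * den)

# Residuals and dispersion from the worked example in this subsection
resid = [[-1.5, -1.5, 0.5, 0.5], [-0.5, -0.5, 1.5, 1.5]]
alpha_hat = exchangeable_alpha(resid, phi=1.25)
print(alpha_hat)
```

Applied to the worked example's residuals with $\hat\phi = 1.25$, the numerator is $-2$ and the denominator is $1.25 \times 24 = 30$, giving $-1/15$.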

This type of correlation goes under several names including exchangeable correlation, equal correlation, common correlation, and compound symmetry. Fitting this model to the seizure data (section 5.2.3) with an exchangeable working correlation matrix gives the following results:


GEE population-averaged model                   Number of obs      =       295
Group variable:                         id      Number of groups   =        59
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(3)       =      0.92
Scale parameter:                         1      Prob > chi2        =    0.8203

                                 (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .1169256     0.96   0.339    -.1173339    .3410059
   progabide |   .0275345   .2236916     0.12   0.902     -.410893     .465962
   timeXprog |  -.1047258   .2152769    -0.49   0.627    -.5266608    .3172092
       _cons |   1.347609   .1587079     8.49   0.000     1.036547    1.658671
    lnPeriod |   (offset)
------------------------------------------------------------------------------

The estimated correlation matrix is

          c1       c2       c3       c4       c5
r1    1.0000
r2    0.7767   1.0000
r3    0.7767   0.7767   1.0000
r4    0.7767   0.7767   0.7767   1.0000
r5    0.7767   0.7767   0.7767   0.7767   1.0000

Note that all off-diagonal correlations are identical, which is characteristic of the exchangeable correlation structure.

This type of correlation can come about when we believe that the repeated measures are not in any particular order. For this particular dataset, we would have to believe that there was no time dependence of the observations despite the fact that the observations are, in fact, collected over time. One could argue in this case that the analysis would benefit from hypothesizing a time series model of the correlation.

In this subsection we illustrate the estimation techniques for fitting the PA-GEE model. To clearly demonstrate the algorithm, we use the following data so that the calculations can be understood and verified by the reader.


    id   t   y   x
     1   1   4   0
     1   2   5   1
     1   3   6   0
     1   4   7   1
     2   1   5   0
     2   2   6   1
     2   3   7   0
     2   4   8   1


Our goal is to fit an exchangeable correlation linear regression model

$$y_{it} = \beta_0 + x_{it}\beta_1 \tag{3.22}$$

where the panel-level variance is given by

$$V(\mu_i) = \begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix} \tag{3.23}$$

Since we are fitting a Gaussian model, the variance function is constant, so that

$$D(v(\mu_{it})) = I_{(4\times 4)} \tag{3.24}$$

The variance function in terms of the mean $\mu$ is 1.0, and the scale parameter $\phi$ must still be estimated.

The starting value for $(\beta_0, \beta_1)$ must be specified. From a linear regression (or an independence correlation linear PA-GEE), we obtain $(\beta_0, \beta_1) = (5.5, 1)$.

Now, we must estimate the dispersion parameter and the common correlation parameter. We obtain the fitted values of the model $xb = 5.5 + x$ and the residuals $res = y - xb$:

    id   t   y   x    xb    res
     1   1   4   0   5.5   -1.5
     1   2   5   1   6.5   -1.5
     1   3   6   0   5.5    0.5
     1   4   7   1   6.5    0.5
     2   1   5   0   5.5   -0.5
     2   2   6   1   6.5   -0.5
     2   3   7   0   5.5    1.5
     2   4   8   1   6.5    1.5
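The starting values and residuals in the table above can be reproduced with an ordinary least-squares fit, which is the independence-correlation starting point the text describes. This is a minimal NumPy sketch; the variable names are our own.

```python
import numpy as np

# Worked-example data (id, t, y, x)
data = np.array([
    [1, 1, 4, 0], [1, 2, 5, 1], [1, 3, 6, 0], [1, 4, 7, 1],
    [2, 1, 5, 0], [2, 2, 6, 1], [2, 3, 7, 0], [2, 4, 8, 1],
], dtype=float)
y, x = data[:, 2], data[:, 3]
X = np.column_stack([np.ones_like(x), x])   # intercept and x

# Pooled OLS, equivalent to an independence-correlation Gaussian PA-GEE
beta_start, *_ = np.linalg.lstsq(X, y, rcond=None)
xb = X @ beta_start    # fitted values 5.5 + x
res = y - xb           # residuals used to estimate phi and rho
print(beta_start, res)
```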

An estimate of the dispersion parameter is calculated using equation 3.48 (we could also use equation 3.47):

$$\hat\phi = \frac{1}{8}\sum_{i=1}^{2}\sum_{t=1}^{4} res_{it}^{2} \tag{3.25}$$

$$= \frac{1}{8}(2.25 + 2.25 + 0.25 + 0.25 + 0.25 + 0.25 + 2.25 + 2.25) \tag{3.26}$$

$$= \frac{1}{8}(10) = 1.25 \tag{3.27}$$
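The dispersion arithmetic of equations 3.25 to 3.27 is shown below alongside a degrees-of-freedom variant. The divisor $N - p$ for the variant is an assumption: it is inferred from the value $\hat\phi = 1.6667 = 10/6$ that the text reports for software using equation 3.47, not copied from that equation.

```python
res = [-1.5, -1.5, 0.5, 0.5, -0.5, -0.5, 1.5, 1.5]   # residuals from the table above
N, p = len(res), 2                                   # observations, regression parameters

ss = sum(r * r for r in res)   # sum of squared residuals = 10
phi_n = ss / N                 # eqs. 3.25-3.27: 10 / 8 = 1.25
phi_df = ss / (N - p)          # assumed N - p divisor: 10 / 6, about 1.6667
print(phi_n, phi_df)
```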

The (common) exchangeable correlation coefficient is estimated by using an


equivalent for equation equation 3.21 3.21 equivalent formula formula for 22

1

44

-1 p = ¢;-l 12 Y~ L Y~ L Y~ L resitresit, resitresit' 12

(3.28) (3.28)

2=1 i=l t=1 t=l t' t' >t >t

.5(-1 .5 ++ .5.5 + 11 1{ [-1 [-1.5(-1.5 + .5) .5) - 1.5(0 1.5(0.5 + 00.5) + 00.5(0.5)] + (.8) 12 (.8) .5 + .5) + .5(0.5)] + 2 [-0 .5(-0 .5 + .5 + .5)] [-0.5(-0.5 + 1.5 1.5 + + 1.5) 1.5) - 0.5(1 0.5(1.5 + 1.5) 1.5) + + 1.5(1 1.5(1.5)]} _

1

(0 (0.8)12([-.5] + [-.5]) [-.5]) .8)12 ([- .5] + -1

(3.29) (3.29) (3.30) (3.30)

1

(0.8) 06667 (0.8)12 = -15 = -.06667 12 15 The output for for fitting model is is displayed displayed as as The output fitting this this model GEE population-averaged model model GEE population-averaged Group variable:: id Group variable id Link: identity Link : identity Family:: Gaussian Family Gaussian Correlation: exchangeable Correlation : exchangeable Scale parameter:: Scale parameter

1.25 1 .25

(3.31) (3.31)

Number of obs Number of obs Number of groups Number of groups Obs per per group group:: min min Obs avg avg max max Wald chi2(1) chi2(1) Wald Prob >> chi2 Prob chi2

= = = = = = =

= = =

8 8 2 2 4 4 4.0 4.0 4 4 11.50 .50 0.2207 0 .2207

Coef Std z P>Izl y II Coef.. Std.. Err Err.. Z P> I z I [95% Conf Conf.. Interval] Interval] y [95% -------------+----------------------------------------------------------------

-------------+---------------------------------------------------------------x x II

1 1.8164966 .8164966 5.5 .5400617 5 .5 .5400617

-_cons cons II

1 .22 1. 22 10.18 10 .18

.221 00.221 00.000 .000

--.6003039 .6003039 4 .441498 4.441498

2 2.600304 .600304 6.558502 6.558502

with estimated correlation correlation matrix matrix with estimated

rl ri r2 r2 r3 r3 r4 r4

cl ci 11.0000 .0000 -0.0667 -0 .0667 -0.0667 -0 .0667 -0.0667 -0 .0667

c2 c2

c3 c3

c4 c4

11.0000 .0000 -0.0667 -0 .0667 -0.0667 -0 .0667

11.0000 .0000 -0.0667 -0 .0667

11.0000 .0000

The output from the software matches the manual calculations illustrated above. In a later section we discuss the implications of the two referenced equations for estimating the dispersion parameter. Running this example in software that uses equation 3.47 results in $\hat{\phi} = 1.6667$ and $\hat{\rho} = -0.06$, respectively. The changes to the manual calculations that match such software results are given by

$$\hat{\phi} = \frac{1}{8-2}\,(10) = \frac{1}{6}\,(10) = 1.6667 \qquad (3.32)$$

$$\hat{\rho} = (0.6)\,\frac{1}{12-2}\,(-1) = (0.6)\,\frac{1}{10}\,(-1) = -0.0600 \qquad (3.33)$$
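The hand calculations in equations 3.25 through 3.33 can be reproduced with the short Python sketch below. It is not the software's own code: `exchangeable_moments` is a hypothetical helper. We also assume that the 2 subtracted from both denominators in the equation 3.47 variant is the number of regression coefficients p (here $\beta_0$ and $\beta_1$).

```python
from itertools import combinations

def exchangeable_moments(panels, n_params=0):
    """Moment estimates of the dispersion phi and exchangeable correlation rho.

    panels   : list of Pearson-residual lists, one per panel
    n_params : 0 mimics equation 3.48 (divide by N and by the number of pairs);
               p (assumed: number of coefficients) mimics the 3.47-style
               correction shown in equations 3.32-3.33
    """
    n_obs = sum(len(r) for r in panels)
    n_pairs = sum(len(r) * (len(r) - 1) // 2 for r in panels)
    ss = sum(e * e for r in panels for e in r)                       # sum of squared residuals
    cross = sum(a * b for r in panels for a, b in combinations(r, 2))  # within-panel cross-products
    phi = ss / (n_obs - n_params)
    rho = cross / (phi * (n_pairs - n_params))
    return phi, rho

panels = [[-1.5, -1.5, 0.5, 0.5], [-0.5, -0.5, 1.5, 1.5]]
print(exchangeable_moments(panels))              # approximately (1.25, -0.0667)
print(exchangeable_moments(panels, n_params=2))  # approximately (1.6667, -0.06)
```

The only difference between the two variants is the degrees-of-freedom adjustment in the denominators. With so few observations, that adjustment changes both estimates noticeably.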

For the specific case of Gaussian variance with an identity link, the exchangeable-correlation PA-GEE estimates the same model as random-effects linear regression. This equivalence of the PA model to the SS random-effects linear regression holds because the model specifies the identity link. It turns out that there are a number of equivalent estimators for this model. The model may be estimated by fitting a FIML random-effects regression, the PA-GEE exchangeable linear regression, or a generalized least squares (GLS) model. Interpretation of the coefficients is identical for the three approaches listed, but numeric differences can arise for three different reasons. The first source of numeric differences is the choice of estimator for the dispersion parameter in the PA-GEE model. A second source is whether the dataset is balanced ($t_i = T$ for all $i = 1, \ldots, n$). The third source is whether the dataset is large enough to admit reliable estimates in the FIML model.

Many software packages allow specification of all of these models. While they all estimate the same underlying population parameters, numeric differences will be noted. These arise from differences in the estimation of ancillary parameters and from the sensitivity of the FIML optimization routines in the specific software package.

Using the balanced linear regression data given in section 5.2.5, we first fit an exchangeable PA-GEE model where the dispersion parameter is estimated using equation 3.48.

GEE population-averaged model                   Number of obs      =        80
Group variable:                         id      Number of groups   =        10
Link:                             identity      Obs per group: min =         8
Family:                           Gaussian                     avg =       8.0
Correlation:                  exchangeable                     max =         8
                                                Wald chi2(2)       =     53.74
Scale parameter:                  1.029535      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.182527   .2342497     5.05   0.000     .7234056    1.641648
          x2 |   1.152869   .2685991     4.29   0.000     .6264242    1.679313
       _cons |    .877244   .2477123     3.54   0.000     .3917368    1.362751
-------------+----------------------------------------------------------------
         rho |      .0639
------------------------------------------------------------------------------

Next, we fit an exchangeable PA-GEE model where the dispersion parameter is estimated using equation 3.47.

GEE population-averaged model                   Number of obs      =        80
Group variable:                         id      Number of groups   =        10
Link:                             identity      Obs per group: min =         8
Family:                           Gaussian                     avg =       8.0
Correlation:                  exchangeable                     max =         8
                                                Wald chi2(2)       =     51.63
Scale parameter:                  1.069642      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    1.18206   .2389458     4.95   0.000     .7137348    1.650385
          x2 |   1.152993   .2739036     4.21   0.000     .6161515    1.689834
       _cons |   .8773187   .2522629     3.48   0.001     .3828926    1.371745
-------------+----------------------------------------------------------------
         rho |      .0622
------------------------------------------------------------------------------

Fitting a FIML Gaussian random-effects linear regression model results in:

Random-effects ML regression                    Number of obs      =        80
Group variable (i): id                          Number of groups   =        10

Random effects u_i ~ Gaussian                   Obs per group: min =         8
                                                               avg =       8.0
                                                               max =         8

                                                LR chi2(2)         =     40.09
Log likelihood  = -114.21672                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.182527   .2352395     5.03   0.000     .7214657    1.643587
          x2 |   1.152869   .2686589     4.29   0.000     .6263071     1.67943
       _cons |    .877244   .2477359     3.54   0.000     .3916905    1.362797
-------------+----------------------------------------------------------------
    /sigma_u |   .2564913   .1678537     1.53   0.126    -.0724959    .5854785
    /sigma_e |   .9817064    .083012    11.83   0.000      .819006    1.144407
-------------+----------------------------------------------------------------
         rho |   .0639005   .0813078                      .0025834    .4015966
------------------------------------------------------------------------------
Likelihood ratio test of sigma_u=0: chibar2(01)= 0.92 Prob>=chibar2 = 0.169

These results match the results of the first exchangeable PA-GEE model that we fit to this dataset. In general, the results from these two estimation approaches do not match when the data consist of unbalanced panels. There can also be small numeric differences for balanced panels when the two estimation approaches use different tolerance criteria for declaring convergence.

The following is the output display from fitting a random-effects linear regression model via generalized least squares:

Random-effects GLS regression                   Number of obs      =        80
Group variable (i): id                          Number of groups   =        10

R-sq:  within  = 0.4325                         Obs per group: min =         8
       between = 0.0760                                        avg =       8.0
       overall = 0.3871                                        max =         8

Random effects u_i ~ Gaussian                   Wald chi2(2)       =     52.88
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.190988   .2367737     5.03   0.000     .7269203    1.655056
          x2 |   1.150578   .2729453     4.22   0.000     .6156148    1.685541
       _cons |   .8759236   .2592818     3.38   0.001     .3677406    1.384107
-------------+----------------------------------------------------------------
     sigma_u |  .33430544
     sigma_e |  .99576555
         rho |  .10129539   (fraction of variance due to u_i)
------------------------------------------------------------------------------
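Both random-effects displays report rho as the fraction of the total variance due to $u_i$, that is $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$. The Python check below recomputes rho from the printed standard deviations; the helper name `intraclass_rho` is illustrative. For the FIML fit, $\sigma_u^2 + \sigma_e^2 \approx 1.029535$, which agrees to display precision with the scale parameter of the first PA-GEE fit.

```python
def intraclass_rho(sigma_u, sigma_e):
    """Fraction of the total variance due to the random effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

# Variance-component estimates copied from the FIML and GLS displays above
fiml = intraclass_rho(0.2564913, 0.9817064)
gls = intraclass_rho(0.33430544, 0.99576555)
print(round(fiml, 4), round(gls, 4))  # 0.0639 0.1013

# Total variance implied by the FIML fit (compare the PA-GEE scale 1.029535)
print(round(0.2564913 ** 2 + 0.9817064 ** 2, 6))
```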

Aside from the comparisons that can be made among the estimation methods used to fit this same model, we also point out one difference among them. The FIML model provides point estimates and standard errors for the variance components, since it estimates all parameters simultaneously. The other methods treat the random-effects variance parameters as ancillary.

Hardin and Hilbe (2001) include a sample analysis of insurance claims data.* Observations are collected on the payout y for car insurance claims given the car group (car1, car2, car3) and vehicle age group (value1, value2, value3). Additional covariates were created for the interaction of the car and vehicle age group indicators. The data are collected on panels defined by the policy holder's age group. Each group is a collection of different policy holders rather than repeated observations on the same individual. It is therefore reasonable to prefer the exchangeable correlation structure over a time-related structure.
Since the outcomes are positive, the natural modeling choices are gamma and inverse Gaussian. Here, we use the gamma model.

GEE population-averaged model                   Number of obs      =       871
Group variable:                         pa      Number of groups   =         8
Link:                           reciprocal      Obs per group: min =        14
Family:                              gamma                     avg =     108.9
Correlation:                  exchangeable                     max =       218
                                                Wald chi2(15)      =    580.49
Scale parameter:                  .0315838      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        car1 |   .0040821   .0004547     8.98   0.000     .0031909    .0049733
        car2 |   .0036037   .0004137     8.71   0.000     .0027928    .0044146
        car3 |   .0032887   .0005279     6.23   0.000     .0022539    .0043234
      value1 |  -.0017425   .0002973    -5.86   0.000    -.0023253   -.0011598
      value2 |  -.0013117   .0003021    -4.34   0.000    -.0019038   -.0007196
      value3 |   .0001354   .0003812     0.36   0.723    -.0006117    .0008824
    car1val1 |  -.0028558   .0004559    -6.26   0.000    -.0037493   -.0019623
    car1val2 |  -.0026487   .0004585    -5.78   0.000    -.0035473   -.0017501
    car1val3 |  -.0025891   .0005509    -4.70   0.000    -.0036688   -.0015094
    car2val1 |  -.0020648    .000407    -5.07   0.000    -.0028625   -.0012672
    car2val2 |  -.0022636   .0004134    -5.47   0.000    -.0030739   -.0014533
    car2val3 |  -.0025697   .0004942    -5.20   0.000    -.0035383   -.0016011
    car3val1 |  -.0024019   .0005261    -4.57   0.000    -.0034331   -.0013707
    car3val2 |  -.0025727   .0005327    -4.83   0.000    -.0036168   -.0015286
    car3val3 |  -.0028441   .0006057    -4.70   0.000    -.0040312   -.0016571
       _cons |    .004301   .0003235    13.30   0.000      .003667     .004935
------------------------------------------------------------------------------

* net from http://www.stata.com/users/jhardin for users of Stata. Note that for this example we divided the frequency weight by 10: replace number=int(number/10).

The results are interpreted as the rate at which each Pound Sterling is paid for an average claim over an unspecified unit of time. The fitted coefficient on car1, for example, is the change in the rate at which a claim is paid for a random car from group 1 versus a random car from some other group. Since this is a marginal model, we cannot discuss the effect for an individual observation.
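Because the link is reciprocal, the linear predictor models $1/\mu$, so coefficients act on the inverse of the expected payout. The Python sketch below inverts the link for one covariate pattern using a few coefficients copied from the display. It assumes that setting every indicator to zero corresponds to the omitted reference categories; the text does not say which groups those are. The helper `marginal_mean` is hypothetical, and the result is a population-averaged mean, not a prediction for an individual policy holder.

```python
# Selected coefficients copied from the gamma PA-GEE display above (reciprocal link).
coef = {
    "_cons": 0.004301,
    "car1": 0.0040821,
    "value1": -0.0017425,
    "car1val1": -0.0028558,
}

def marginal_mean(active):
    """Hypothetical helper: invert the reciprocal link, mu = 1 / eta.

    active : names of indicator covariates set to 1 (all others are 0).
    """
    eta = coef["_cons"] + sum(coef[name] for name in active)
    return 1.0 / eta

base = marginal_mean([])                              # every indicator at 0 (assumed reference cell)
cell = marginal_mean(["car1", "value1", "car1val1"])  # car group 1, vehicle age group 1
print(f"{base:.1f} {cell:.1f}")
```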

3.2.1.2 Autoregressive correlation

It may be more reasonable to assume a time dependence for the association if the repeated observations within the panels have a natural order. For example, in a health study we might have panels representing patients with repeated measurements taken over time.

The correlation structure is assumed to be $\mathrm{corr}(y_{it}, y_{it'}) = \alpha^{|t-t'|}$. For normally distributed $y_{it}$, this is analogous to a continuous-time autoregressive (AR) process.

In this case, $\alpha$ is a vector (it was a scalar in the preceding subsection). We estimate the correlations using the Pearson residuals $\hat{r}_{it} = (y_{it} - \hat{\mu}_{it})/\sqrt{V(\hat{\mu}_{it})}$ from the current fit of the model.

$$\hat{\boldsymbol{\alpha}} = \frac{\hat{\phi}^{-1}}{n}\sum_{i=1}^{n}\left[\frac{\sum_{t=1}^{n_i-0}\hat{r}_{i,t}\,\hat{r}_{i,t+0}}{n_i},\ \ldots,\ \frac{\sum_{t=1}^{n_i-k}\hat{r}_{i,t}\,\hat{r}_{i,t+k}}{n_i}\right] \qquad (3.34)$$
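A minimal Python sketch of the lag-0 through lag-k moment estimates of equation 3.34 follows. It is not the software's implementation. For each lag, it assumes time-ordered Pearson residuals within each panel, scales each panel's lag products by $1/n_i$, averages over the n panels, and divides by the current $\hat{\phi}$. The averaging over panels is our assumption. The name `lag_correlations` is hypothetical.

```python
def lag_correlations(panels, k, phi):
    """Sketch of the lag-0..k moment estimates in the spirit of equation 3.34.

    panels : list of per-panel Pearson residual lists, each in time order
    k      : largest lag to estimate
    phi    : current estimate of the dispersion parameter
    """
    n = len(panels)
    alphas = []
    for lag in range(k + 1):
        total = 0.0
        for r in panels:
            n_i = len(r)
            # products r[t] * r[t + lag] for t = 1, ..., n_i - lag (0-based indexing here)
            total += sum(r[t] * r[t + lag] for t in range(n_i - lag)) / n_i
        alphas.append(total / (n * phi))
    return alphas

panels = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
print(lag_correlations(panels, k=1, phi=1.0))  # [1.0, -0.25]
```

In balanced data, when $\hat{\phi}$ is the average squared Pearson residual, the lag-0 entry equals 1, which is a convenient sanity check.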

The correlation matrix is then built from the autoregressive structure implied by the AR correlations. An autoregressive process of order k has nonzero autocorrelations at many more than k lags, and the matrix is constant along each of its diagonals.

With this correlation structure specified, the fit of the model for the seizure data (section 5.2.3) is

GEE population-averaged model                   Number of obs      =       295
Group and time vars:                  id t      Number of groups   =        59
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                         AR(2)                     max =         5
                                                Wald chi2(3)       =      1.76
Scale parameter:                         1      Prob > chi2        =    0.6243

                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .1364146   .1039591     1.31   0.189    -.0673415    .3401707
   progabide |   .0105488   .2187692     0.05   0.962    -.4182309    .4393285
   timeXprog |  -.1080133   .2354003    -0.46   0.646    -.5693895    .3533629
       _cons |   1.326599   .1615541     8.21   0.000     1.009958    1.643239
    lnPeriod |   (offset)
------------------------------------------------------------------------------

The estimated correlation matrix is

             c1        c2        c3        c4        c5
  r1     1.0000
  r2     0.8101    1.0000
  r3     0.7445    0.8101    1.0000
  r4     0.6563    0.7445    0.8101    1.0000
  r5     0.5863    0.6563    0.7445    0.8101    1.0000
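One way to see how the lag-1 and lag-2 estimates determine the rest of this matrix is the Yule-Walker recursion for a stationary AR(2) process. The Python sketch below solves for the two autoregressive coefficients and extends the correlations. Starting from 0.8101 and 0.7445, it reproduces the displayed lag-3 and lag-4 entries (0.6563 and 0.5863) to about four decimals. We have not confirmed that the software builds the matrix this way. The function name `ar2_correlations` is hypothetical.

```python
def ar2_correlations(rho1, rho2, n_lags):
    """Extend AR(2) lag-1 and lag-2 correlations to lags 0..n_lags-1 (Yule-Walker)."""
    # Yule-Walker: rho1 = a1 + a2*rho1 and rho2 = a1*rho1 + a2, solved for a1, a2
    a1 = rho1 * (1 - rho2) / (1 - rho1 ** 2)
    a2 = (rho2 - rho1 ** 2) / (1 - rho1 ** 2)
    rho = [1.0, rho1, rho2]
    while len(rho) < n_lags:
        # For an AR(2) process, rho_j = a1 * rho_{j-1} + a2 * rho_{j-2}
        rho.append(a1 * rho[-1] + a2 * rho[-2])
    return rho[:n_lags]

lags = ar2_correlations(0.8101, 0.7445, 5)
print([round(r, 4) for r in lags])
```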

The Progabide data are a collection of repeated measures over time. Given this, it is natural to assume that the dependence among the observations is of a time-series type. The GEE autoregressive correlation structure provides such a model. The difficulty in choosing this type of correlation lies in determining the correct order of the autoregressive process. The QIC information criterion (see chapter 4) is useful in helping the analyst choose between competing hypothesized correlation models.

Lesaffre and Spiessens (2001) investigate the stability of the quadrature-approximated Gaussian random-effects logistic regression, a topic we discuss in section 2.3.2.3. The data include a patient identifier number, idnr; the treatment group of the patient, trt; the time of the measurement, time; and an indicator outcome variable, y.* Here, we fit a marginal probit PA-GEE model subject to a hypothesized autoregressive correlation structure of order one. The data are unbalanced, and five of the patients are excluded from the analysis because they have only one observation.

GEE population-averaged model                   Number of obs      =      1651
Group and time vars:            idnr visit      Number of groups   =       245
Link:                               probit      Obs per group: min =         2
Family:                           binomial                     avg =       6.7
Correlation:                         AR(1)                     max =         7
                                                Wald chi2(4)       =     79.13
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         trt |    .040119   .1500653     0.27   0.789    -.2540036    .3342416
        time |   -.060425   .0309173    -1.95   0.051    -.1210218    .0001719
    timeXtrt |  -.0163527    .024753    -0.66   0.509    -.0648676    .0321622
       visit |  -.0861492   .0564823    -1.53   0.127    -.1968524     .024554
       _cons |  -.3053814   .1409726    -2.17   0.030    -.5816827   -.0290802
-------------+----------------------------------------------------------------
      alpha1 |     0.7089
------------------------------------------------------------------------------

* http://www.blackwellpublishers.co.uk/rss/Readmefiles/lesaffre.htm is a link to the data in the article.

3.2.1.3 Stationary correlation

As an alternative to the time-series autocorrelation hypothesis, we may instead hypothesize that correlations exist for only some small number of time units. Under this hypothesis, we specify a maximum time difference for which observations might be correlated, so that the correlation matrix is banded. In this case, $\alpha$ is a vector of the correlations for up to a user-specified k lags. We can estimate the vector of correlation parameters $\alpha$ in the same manner as for the autoregressive correlation. The estimate uses the Pearson residuals $\hat{r}_{it} = (y_{it} - \hat{\mu}_{it})/\sqrt{V(\hat{\mu}_{it})}$ from the current fit of the model:

$$\hat{\boldsymbol{\alpha}} = \frac{\hat{\phi}^{-1}}{n}\sum_{i=1}^{n}\left[\frac{\sum_{t=1}^{n_i-0}\hat{r}_{i,t}\,\hat{r}_{i,t+0}}{n_i},\ \ldots,\ \frac{\sum_{t=1}^{n_i-k}\hat{r}_{i,t}\,\hat{r}_{i,t+k}}{n_i}\right] \qquad (3.35)$$

The hypothesized correlation matrix is banded with 1s down the diagonal, $\alpha_1$ down the first band, $\alpha_2$ down the second band, and so forth. The correlation matrix may be succinctly described as

$$R_{uv} = \begin{cases} \alpha_{|u-v|} & \text{if } |u-v| \le k \\ 0 & \text{otherwise} \end{cases} \qquad (3.36)$$

with $\alpha_0 = 1$ on the diagonal.
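Equation 3.36 translates directly into code. The Python sketch below builds the banded working correlation from $\alpha_1, \ldots, \alpha_k$, taking $\alpha_0 = 1$; the helper name `stationary_matrix` is hypothetical. Plugging in the two correlations reported for the stationary(2) seizure fit that follows reproduces the band-and-zero pattern of that output.

```python
def stationary_matrix(alphas, n):
    """Banded working correlation of equation 3.36.

    alphas : [alpha_1, ..., alpha_k], correlations for lags 1..k
    n      : panel length (the matrix is n x n)
    """
    k = len(alphas)
    band = [1.0] + list(alphas)  # alpha_0 = 1 on the main diagonal
    return [[band[abs(u - v)] if abs(u - v) <= k else 0.0 for v in range(n)]
            for u in range(n)]

R = stationary_matrix([0.8152, 0.7494], 5)
for row in R:
    print("  ".join(f"{value:.4f}" for value in row))
```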

Specifying this structure for the working correlation matrix results in the following model for the seizure data (section 5.2.3):

GEE population-averaged model                   Number of obs      =       295
Group and time vars:                  id t      Number of groups   =        59
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                 stationary(2)                     max =         5
                                                Wald chi2(3)       =      0.43
Scale parameter:                         1      Prob > chi2        =    0.9333

                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .0866246   .1739279     0.50   0.618    -.2542677    .4275169
   progabide |   .0275345   .2236916     0.12   0.902     -.410893     .465962
   timeXprog |  -.1486518   .2506858    -0.59   0.553     -.639987    .3426833
       _cons |   1.347609   .1587079     8.49   0.000     1.036547    1.658671
    lnPeriod |   (offset)
------------------------------------------------------------------------------


and the estimated correlation matrix is given by

          c1      c2      c3      c4      c5
r1    1.0000
r2    0.8152  1.0000
r3    0.7494  0.8152  1.0000
r4    0.0000  0.7494  0.8152  1.0000
r5    0.0000  0.0000  0.7494  0.8152  1.0000

Note the partial similarity of this correlation matrix to the one displayed using the autoregressive correlation structure. In particular, note the bands of 0.0 at the lower-left (and hence upper-right) extremes of the matrix. The stationary model differs from the autoregressive model in that the correlations beyond the specified order are assumed to be zero.

Hardin and Hilbe (2001) provide an analysis of length-of-stay data.* The data include the length of hospital stay in days, los; whether the patient is Caucasian, white; an urgent admission indicator, type2; an emergency admission indicator, type3; and an indicator of whether the patient died, died. Hospital admissions that are neither urgent nor emergencies are deemed elective. The goal is to model the length of stay on the covariates, taking into account the correlation of patients with the same insurance provider, provider. The data are from 54 different providers, and the panels are unbalanced; the smallest provider has a single observation and the largest has 92.

Here we wish to use the geometric family to describe the model. This family is the discrete analogue of the negative exponential distribution and may be specified as the negative binomial variance function with the ancillary parameter fixed at 1.0. While the canonical link for the negative binomial variance is the negative binomial link function, most researchers use the log link to facilitate comparisons to the Poisson model. In fact, most applications of negative binomial models are a means to address overdispersion in the Poisson model. As a side note, a negative binomial model employing the canonical link has the unfortunate property of having the ancillary parameter embedded in both the link and variance functions. Estimation, and hence tractability, is more troublesome than with the natural log link, $\ln(\mu)$.

Here, we fit the log link geometric model subject to a stationary correlation structure with lag 2. Our use of this correlation structure is illustrative for this section. In reality, we would prefer the exchangeable or unstructured correlation given that the panels are made up of different patients. Given our desire to fit the stationary correlation structure with up to 2 lags, the estimation first drops those panels with fewer than 3 observations. The resulting analysis is on 49 providers rather than the 54 represented in our data.

* For users of Stata: net from http://www.stata.com/users/jhardin
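The remarks above about the geometric family and the canonical negative binomial link can be checked with a few lines of Python. This is a sketch assuming the common parameterization $V(\mu) = \mu + k\mu^2$ (the "negative binomial(k=1)" family in the output below); the function names are hypothetical. It shows that fixing $k = 1$ gives the geometric variance $\mu + \mu^2$, and that $k$ enters the canonical link $\eta = \ln\{k\mu/(1 + k\mu)\}$ as well as the variance, whereas the log link does not involve $k$.

```python
import math

def nb_variance(mu: float, k: float) -> float:
    """Negative binomial variance function V(mu) = mu + k * mu**2."""
    return mu + k * mu * mu

def nb_canonical_link(mu: float, k: float) -> float:
    """Canonical negative binomial link: eta = ln(k*mu / (1 + k*mu)), always < 0."""
    return math.log(k * mu / (1.0 + k * mu))

def nb_canonical_inverse(eta: float, k: float) -> float:
    """Inverse of the canonical link: mu = exp(eta) / (k * (1 - exp(eta))), eta < 0."""
    e = math.exp(eta)
    return e / (k * (1.0 - e))

mu = 2.0
# Geometric family: negative binomial variance with k fixed at 1.
print(nb_variance(mu, k=1.0))  # 6.0, that is, mu + mu**2
# k enters the canonical link as well as the variance ...
print(nb_canonical_link(mu, k=1.0), nb_canonical_link(mu, k=0.5))
# ... whereas the log link ln(mu) does not involve k at all.
print(math.log(mu))
```

The canonical link also restricts the linear predictor to negative values, one practical reason the log link is the usual choice.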


GEE population-averaged model                   Number of obs      =      1487
Group and time vars:             provider t     Number of groups   =        49
Link:                                   log     Obs per group: min =         3
Family:              negative binomial(k=1)                    avg =      30.3
Correlation:                  stationary(2)                    max =        92
                                                Wald chi2(4)       =     59.29
Scale parameter:                          1     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       white |    .883152    .086767    -1.26   0.206      .728463    1.070689
       type2 |   1.271154   .0933929     3.27   0.001     1.100677    1.468036
       type3 |   2.017643   .2301166     6.15   0.000     1.613477    2.523049
        died |   .7917637   .0457454    -4.04   0.000     .7069945    .8866968
-------------+----------------------------------------------------------------
      alpha1 |     0.0373
      alpha2 |     0.0577
------------------------------------------------------------------------------

The output indicates that both urgent and emergency admissions to the hospital significantly increase the length of stay averaged over the providers. We also see that dying results in a shorter stay averaged over providers.

The output is listed in terms of exponentiated coefficients. For Poisson, negative binomial, and geometric models, the exponentiated coefficient is interpreted as the incidence rate ratio. To illustrate, note that the inverse link $\exp(\eta)$ is nonlinear, and focus on the interpretation of the coefficient $\beta_1$ on white:

$$\Delta\text{los}_{it} = \exp\{\beta_0 + (\text{white}_{it} + 1)\beta_1 + \text{type2}_{it}\beta_2 + \text{type3}_{it}\beta_3\} - \exp\{\beta_0 + \text{white}_{it}\beta_1 + \text{type2}_{it}\beta_2 + \text{type3}_{it}\beta_3\} \qquad (3.37)$$

Clearly, the effect on the length of stay of increasing the value of the white covariate depends on the values of the other covariates. Instead of focusing on the difference in the outcome, we can define a different measure based on the incidence rate ratio in the length of stay:

$$\text{IRR}_{\text{white}} = \frac{\exp\{\beta_0 + (\text{white}_{it} + 1)\beta_1 + \text{type2}_{it}\beta_2 + \text{type3}_{it}\beta_3\}}{\exp\{\beta_0 + \text{white}_{it}\beta_1 + \text{type2}_{it}\beta_2 + \text{type3}_{it}\beta_3\}} \qquad (3.38)$$

$$\text{IRR}_{\text{white}} = \exp(\beta_1) \qquad (3.39)$$
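The contrast between equations 3.37 and 3.38–3.39 can be seen numerically. In this Python sketch, $\beta_1$, $\beta_2$, and $\beta_3$ are taken as the logs of the IRRs reported above for white, type2, and type3, while $\beta_0 = 2.0$ is a made-up intercept (the IRR output does not report one); died is omitted, as in the equations. The helper name `mean_los` is hypothetical.

```python
import math

# beta_1..beta_3: logs of the IRRs reported for white, type2, type3.
# beta_0 is hypothetical, since the IRR output does not report an intercept.
b0 = 2.0
b1, b2, b3 = math.log(0.883152), math.log(1.271154), math.log(2.017643)

def mean_los(white: int, type2: int, type3: int) -> float:
    """Inverse log link: expected length of stay for one covariate pattern."""
    return math.exp(b0 + white * b1 + type2 * b2 + type3 * b3)

for t2, t3 in [(0, 0), (1, 0), (0, 1)]:
    diff = mean_los(1, t2, t3) - mean_los(0, t2, t3)    # equation 3.37
    ratio = mean_los(1, t2, t3) / mean_los(0, t2, t3)   # equation 3.38
    print(f"type2={t2} type3={t3}: difference={diff:.4f}, IRR={ratio:.6f}")
```

The difference changes with the admission type, while the ratio prints as 0.883152, that is, $\exp(\beta_1)$, for every covariate pattern.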

This well-defined ratio has a clear interpretation and does not depend on the values of the covariates. It is a simple transformation of the fitted coefficients. Standard errors for the exponentiated coefficients can be obtained through the delta method; see Feiveson (1999) for a helpful illustration. We use 1 rather than 0 as the null hypothesis value in testing the individual exponentiated coefficients. An incidence rate ratio of 1 indicates no change in the rate, while an incidence rate ratio of 2 indicates twice the incidence for an increase of one in the associated covariate.
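The delta-method relationship $\text{se}\{\exp(\hat\beta)\} = \exp(\hat\beta)\,\text{se}(\hat\beta)$ can be inverted to recover the log-scale quantities from a reported IRR. The Python sketch below (the function name `from_irr` is hypothetical) assumes the reported standard error is the delta-method one and that the confidence limits are computed on the log scale and then exponentiated. Under those assumptions, it reproduces the z statistic and confidence interval of the white row in the length-of-stay output to the displayed precision.

```python
import math

def from_irr(irr: float, se_irr: float, z_crit: float = 1.959964):
    """Recover the log-scale coefficient, its standard error, the z statistic,
    and a 95% CI for the IRR, assuming se(IRR) = IRR * se(beta)."""
    beta = math.log(irr)
    se_beta = se_irr / irr
    z = beta / se_beta                  # tests IRR = 1, that is, beta = 0
    lower = math.exp(beta - z_crit * se_beta)
    upper = math.exp(beta + z_crit * se_beta)
    return beta, se_beta, z, (lower, upper)

# white row of the length-of-stay output: IRR .883152, Std. Err. .086767
beta, se_beta, z, (lo, hi) = from_irr(0.883152, 0.086767)
print(f"beta={beta:.5f} se={se_beta:.5f} z={z:.2f} CI=({lo:.6f}, {hi:.6f})")
```

Note that the interval is not symmetric about the IRR, which is why it is built on the log scale rather than as IRR plus or minus 1.96 standard errors.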


3.2.1.4 Nonstationary correlation

In order to formalize a correlation structure where each entry in the working correlation matrix is estimated from the available information, we specify a matrix of parameters $\boldsymbol{\alpha}$,

$$\boldsymbol{\alpha} = \frac{n}{\sum_{i=1}^{n}\sum_{t=1}^{n_i}\hat{r}_{i,t}^{2}/n_i}\sum_{i=1}^{n}\mathbf{G} \qquad (3.40)$$

$$\mathbf{G} = \begin{pmatrix}
g_{1,1}\hat{r}_{i,1}^{2} & g_{1,2}\hat{r}_{i,1}\hat{r}_{i,2} & \cdots & g_{1,n_i}\hat{r}_{i,1}\hat{r}_{i,n_i} \\
g_{2,1}\hat{r}_{i,2}\hat{r}_{i,1} & g_{2,2}\hat{r}_{i,2}^{2} & \cdots & g_{2,n_i}\hat{r}_{i,2}\hat{r}_{i,n_i} \\
\vdots & \vdots & \ddots & \vdots \\
g_{n_i,1}\hat{r}_{i,n_i}\hat{r}_{i,1} & g_{n_i,2}\hat{r}_{i,n_i}\hat{r}_{i,2} & \cdots & g_{n_i,n_i}\hat{r}_{i,n_i}^{2}
\end{pmatrix} \qquad (3.41)$$

$$g_{u,v} = \left(\sum_{i=1}^{n} I(i,u,v)\right)^{-1} \qquad (3.42)$$

$$I(i,u,v) = \begin{cases} 1 & \text{if panel } i \text{ has observations at indexes } u \text{ and } v \\ 0 & \text{otherwise} \end{cases} \qquad (3.43)$$

Similar to the stationary structure, the nonstationary correlation matrix uses the estimated correlations for a specified number of bands $g$ for the matrix. The working correlation matrix is specified as

$$R_{uv} = \begin{cases} 1 & \text{if } u = v \\ \alpha_{uv} & \text{if } 0 < |u-v| \le g \\ 0 & \text{otherwise} \end{cases} \qquad (3.44)$$
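The moment estimate in equations 3.40–3.44 can be sketched in a few lines of Python. This is an illustration, not a reference implementation: the function name `working_corr` and the dictionary representation of each panel's Pearson residuals (time index to residual) are hypothetical, and the normalizing constant (the number of panels divided by the sum of each panel's mean squared residual) is our reading of equation 3.40 and should be treated as an assumption. Each off-diagonal entry averages the residual cross-products over only those panels observed at both indexes, which is exactly why entries for unbalanced data rest on different amounts of information. Passing `g=None` keeps every lag, which gives the unstructured estimate.

```python
from typing import Dict, List, Optional

def working_corr(panels: List[Dict[int, float]], T: int,
                 g: Optional[int] = None) -> List[List[float]]:
    """Moment estimate of the working correlation following eqs. 3.40-3.44.

    panels: one dict per panel mapping time index (1..T) -> Pearson residual.
    g: number of bands kept (nonstationary); None keeps every lag (unstructured).
    """
    n = len(panels)
    # Assumed normalizing constant: n over the sum of panel mean squared residuals.
    scale = n / sum(sum(r * r for r in p.values()) / len(p) for p in panels)
    R = [[0.0] * T for _ in range(T)]
    for u in range(1, T + 1):
        for v in range(1, T + 1):
            if u == v:
                R[u - 1][v - 1] = 1.0        # diagonal of eq. 3.44
                continue
            if g is not None and abs(u - v) > g:
                continue                      # zero beyond g bands
            # Panels observed at both u and v, that is, I(i, u, v) = 1 (eq. 3.43).
            both = [p for p in panels if u in p and v in p]
            if both:
                g_uv = 1.0 / len(both)        # eq. 3.42
                R[u - 1][v - 1] = scale * g_uv * sum(p[u] * p[v] for p in both)
    return R

# Two unbalanced panels: the first has no observation at time 3.
panels = [{1: 2.0, 2: 1.0}, {1: 1.0, 2: 1.0, 3: 1.0}]
print(working_corr(panels, T=3, g=1))   # nonstationary with one band
print(working_corr(panels, T=3))        # all lags (unstructured)
```

In this toy example, the entry for times 1 and 3 is estimated from a single panel, while the entry for times 1 and 2 uses both.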

The nonstationary hypothesis differs from the stationary hypothesis in that the correlation matrix is not assumed to have constant correlations down the diagonals. This estimated correlation matrix is not guaranteed to be invertible, and numeric problems may be encountered, especially for unbalanced datasets. The source of these problems is readily seen in the different amounts of information going into the individual matrix element estimates.

GEE population-averaged model                   Number of obs      =       295
Group and time vars:                   id t     Number of groups   =        59
Link:                                   log     Obs per group: min =         5
Family:                             Poisson                    avg =       5.0
Correlation:                          nonst                    max =         5
                                                Wald chi2(3)       =      0.43
Scale parameter:                          1     Prob > chi2        =    0.9333

                             (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .0866246   .1739279     0.50   0.618    -.2542677    .4275169
   progabide |   .0275345   .2236916     0.12   0.902     -.410893     .465962
   timeXprog |  -.1486518   .2506858    -0.59   0.553     -.639987    .3426833
       _cons |   1.347609   .1587079     8.49   0.000     1.036547    1.658671
    lnPeriod |   (offset)
------------------------------------------------------------------------------


The estimated correlation matrix is

          c1      c2      c3      c4      c5
r1    1.0000
r2    0.9892  1.0000
r3    0.7077  0.8394  1.0000
r4    0.0000  0.9865  0.7291  1.0000
r5    0.0000  0.0000  0.5538  0.7031  1.0000

Note the relationship of the above correlation matrix to that of the stationary correlation structure. Both have bands of 0.0 at the matrix extremes.

3.2.1.5 Unstructured correlation

The unstructured correlation matrix is the most general of the correlation structures that we discuss. It imposes no structure on the correlation matrix and is equal to the nonstationary matrix for the maximum lag. The working correlation matrix is specified as

$$\mathbf{R} = \boldsymbol{\alpha} \qquad (3.45)$$

where $\boldsymbol{\alpha}$ is defined by equation 3.40. As with the nonstationary correlation structure, the estimated correlation matrix is not guaranteed to be invertible, and numeric problems may be encountered, especially for unbalanced datasets. Again, as for the nonstationary structure, we can see the source of these problems in the different amounts of information going into the calculation of the individual matrix element estimates.

GEE population-averaged model                   Number of obs      =       295
Group and time vars:                   id t     Number of groups   =        59
Link:                                   log     Obs per group: min =         5
Family:                             Poisson                    avg =       5.0
Correlation:                   unstructured                    max =         5
                                                Wald chi2(3)       =      0.37
Scale parameter:                          1     Prob > chi2        =    0.9464

                             (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .0826525   .1386302     0.60   0.551    -.1890576    .3543626
   progabide |   .0266499    .224251     0.12   0.905    -.4128741    .4661738
   timeXprog |  -.1002765   .2137986    -0.47   0.639    -.5193139     .318761
       _cons |   1.335305   .1623308     8.23   0.000     1.017142    1.653467
    lnPeriod |   (offset)
------------------------------------------------------------------------------


The estimated correlation matrix is given by

          c1      c2      c3      c4      c5
r1    1.0000
r2    0.9980  1.0000
r3    0.7149  0.8290  1.0000
r4    0.8034  0.9748  0.7230  1.0000
r5    0.6836  0.7987  0.5483  0.6983  1.0000

This correlation matrix is unlike those that we have thus far considered. Note the asymmetry of the off-diagonal values.

3.2.1.6 Fixed correlation

A fixed correlation matrix can be imposed if we have knowledge of the structure of the correlation matrix from another source. This approach does not estimate the working correlation at each step, but rather takes the supplied correlation matrix as given.

Another use for specifying a fixed correlation matrix is to enable estimation of a structure that is not directly supported by an option of a specific software program. This use is illustrated by an example in the following subsection.

3.2.1.7 Free specification

There are many other structures we might hypothesize for our within-panel correlation matrix that do not follow the constraints of the forms discussed so far. For example, we might have a study in which there are patients within doctors (our panels). As a first step in the analysis, we might hypothesize that the multiple observations on a given patient are correlated while observations between patients (even within the same doctor) are uncorrelated. None of the previously discussed correlation structures match such a description.

For a specific example, consider a balanced panel dataset where the panels identify ophthalmologists. Each ophthalmologist reports on a study of treating each eye of four different patients. If the data are ordered such that within the ophthalmologist id we collect data on the left eye and then the right eye of each patient, we can hypothesize a common correlation of data on eyes for individual patients while data across patients are uncorrelated. Such a hypothesized correlation matrix would take the form

$$\mathbf{R} = \begin{pmatrix}
1 & \rho & 0 & 0 & 0 & 0 & 0 & 0 \\
\rho & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & \rho & 0 & 0 & 0 & 0 \\
0 & 0 & \rho & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & \rho & 0 & 0 \\
0 & 0 & 0 & 0 & \rho & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \rho \\
0 & 0 & 0 & 0 & 0 & 0 & \rho & 1
\end{pmatrix} \qquad (3.46)$$


A final analysis might hypothesize that the zeros in the above hypothesized structure are replaced with $\tau$ to denote a nonzero, but different, correlation between patients within doctors. If we really believed that within-patient correlation was the only source of correlation in the data, we would simply specify the patients as the panels $i$ in the data. We may also consider adding a fixed effect for doctors.

We can impose any structure we wish if we have access to software that allows a fixed correlation matrix specification, a specification for limiting the estimation to a single step, and the ability of the user to supply starting values for the regression coefficients. However, doing so requires some programming, or at least repeated calls to the command for fitting the model.

We can proceed by specifying an independent correlation matrix to get starting values for our regression coefficients. We can then obtain the Pearson residuals and estimate a correlation matrix under any constraints we desire. Next, we iterate by supplying our estimated correlation matrix to the software using the fixed correlation matrix specification, along with the starting values for the regression coefficients. We limit the estimation to a single step, take the resulting regression coefficients as input to the next step, obtain an updated estimate of the correlation matrix, and iterate.

Our example uses the data listed in section 5.2.5, which follow the above structure. Since the data are constructed, we do not focus on the nature of the data or on any particular ophthalmological study. The data exist merely to illustrate the techniques for fitting user-specified correlation structures under the current software options.

If we cannot specify a structure, and the options do not allow sufficient control for estimation through a specified correlation matrix, we could specify an unstructured correlation matrix in order to see if a recognizable structure exists. This specification may not lead to convergence. The results below were obtained after specifying a more liberal convergence criterion. The difficulty in fitting this model is not unexpected, since we are estimating 3 regression parameters and 56 association parameters from only 80 observations.

GEE population-averaged model                   Number of obs      =        80
Group and time vars:                   id t     Number of groups   =        10
Link:                              identity     Obs per group: min =         8
Family:                            Gaussian                    avg =       8.0
Correlation:                   unstructured                    max =         8
                                                Wald chi2(2)       =    212.47
Scale parameter:                   1.137088     Prob > chi2        =    0.0000

                             (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.357834   .1886613     7.20   0.000     .9880643    1.727603
          x2 |   1.857563   .2706295     6.86   0.000     1.327139    2.387987
       _cons |   .2837581   .1964586     1.44   0.149    -.1012936    .6688099
------------------------------------------------------------------------------


The estimated unstructured correlation matrix is

        c1     c2     c3     c4     c5     c6     c7     c8
r1    1.00
r2    1.00   1.00
r3   -0.18  -0.18   1.00
r4   -0.26  -0.23   0.50   1.00
r5   -0.07  -0.18   0.13   0.10   1.00
r6   -0.28  -0.23   0.26   0.24   0.08   1.00
r7    0.38   0.36   0.24  -0.21  -0.25   0.28   1.00
r8   -0.37  -0.35   0.14   0.04   0.03   0.50   0.24   1.00

There is no discernible structure to the estimated correlation matrix. However, what we really want is to be able to fit a model using the structure specified by equation 3.46. To do this, we fit an independent model, obtain the residuals, estimate the common correlation of our structure, construct a working correlation matrix, and then supply that constructed matrix and the fitted coefficients to another model estimation. In this final command we allow only one iterative step of the regression parameter estimation. This process is repeated until the change in estimated coefficients between successive runs satisfies a predetermined convergence criterion. Using this algorithm, we obtain the following results (for the regression parameters and the common association parameter) for each iteration of this analysis:

      β_x1        β_x2        β_cons           ρ
 1.1584795   1.1589283    0.88134072  0.00000000
 1.1387627   1.1181093    0.91961252  0.54273993
 1.1387178   1.1180046    0.91970868  0.54414581
 1.1387177   1.1180043    0.91970888  0.54414863
 1.1387177   1.1180043    0.91970888  0.54414864
 1.1387177   1.1180043    0.91970888  0.54414864

The final estimated model results are

GEE population-averaged model                   Number of obs      =        80
Group and time vars:                   id t     Number of groups   =        10
Link:                              identity     Obs per group: min =         8
Family:                            Gaussian                    avg =       8.0
Correlation:              fixed (specified)                    max =         8
                                                Wald chi2(2)       =    112.62
Scale parameter:                   1.029858     Prob > chi2        =    0.0000

                             (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.138718   .1862108     6.12   0.000     .7737513    1.503684
          x2 |   1.118004   .3121714     3.58   0.000     .5061597    1.729849
       _cons |   .9197089   .2081197     4.42   0.000     .5118018    1.327616
------------------------------------------------------------------------------

© 2003 by Chapman & Hall/CRC


GENERALIZED ESTIMATING EQUATIONS

with estimated correlation matrix

        r1    r2    r3    r4    r5    r6    r7    r8
c1    1.00  0.54  0.00  0.00  0.00  0.00  0.00  0.00
c2          1.00  0.00  0.00  0.00  0.00  0.00  0.00
c3                1.00  0.54  0.00  0.00  0.00  0.00
c4                      1.00  0.00  0.00  0.00  0.00
c5                            1.00  0.54  0.00  0.00
c6                                  1.00  0.00  0.00
c7                                        1.00  0.54
c8                                              1.00

These results closely match the theoretical values of the association parameters implied by the specification of the constructed data. Refer to equation 5.4 to compare the estimated correlation parameter with the constructed values.

3.2.2 Estimating the scale variance (dispersion parameter)

The usual estimate of φ is given by

$$\hat{\phi} = \frac{1}{\sum_{i=1}^{n} n_i - p}\sum_{i=1}^{n}\sum_{t=1}^{n_i}\hat{r}_{it}^{2} \qquad (3.47)$$

where $\sum_{i=1}^{n} n_i$ is the total number of observations, $\hat{r}_{it}$ is the itth Pearson residual, and p is the number of covariates in the model (counting the constant). However, Liang and Zeger point out that any consistent estimate of φ is admissible. Most software implementations use equation 3.47, but some use

$$\hat{\phi} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{n_i}\sum_{t=1}^{n_i}\hat{r}_{it}^{2} \qquad (3.48)$$

Equation 3.47 has the advantage:
• Model results (independent correlation) exactly match GLM results.

Equation 3.48 has the advantage:
• Results are invariant (with any correlation structure) to panel-level replication changes of the dataset.

In other words, if we make an exact copy of our panel dataset (updating the panel identifiers), equation 3.48 produces exactly the same estimates of β; only the standard errors change, by a scale factor. Equation 3.47 does not produce the same results. The reason is that, for either estimator, the numerator changes by a factor related to the number of observations, and only the denominator of equation 3.48 changes by the same factor.

It is interesting to note that two of the major software packages have each switched the default equation used to estimate this parameter. Stata version 5.0 used equation 3.47, but versions 6.0 and higher use equation 3.48. SAS made the opposite switch: version 6.12 used equation 3.48, but versions 8.0 and higher use equation 3.47.
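To make equations 3.47 and 3.48 concrete, here is a minimal Python sketch of both estimators applied to a set of Pearson residuals. The residual values, the three-panel layout, and p = 2 are all hypothetical, invented for illustration. Replicating every panel once leaves equation 3.48 unchanged, but it lowers equation 3.47: the numerator doubles while that estimator's denominator only grows from N - p to 2N - p.

```python
import math


def phi_eq347(panels, p):
    """Equation 3.47: sum of squared Pearson residuals over (sum_i n_i - p)."""
    n_obs = sum(len(r) for r in panels)
    ss = sum(r_it ** 2 for r in panels for r_it in r)
    return ss / (n_obs - p)


def phi_eq348(panels):
    """Equation 3.48: (1/n) * sum_i [(1/n_i) * sum_t r_it^2]."""
    n = len(panels)
    return sum(sum(r_it ** 2 for r_it in r) / len(r) for r in panels) / n


# Hypothetical Pearson residuals for three panels of unequal size.
panels = [[0.8, -1.1, 0.3], [-0.4, 0.9], [1.2, -0.6, 0.1, -0.5]]
doubled = panels + panels   # one exact copy of every panel (new ids)
p = 2                       # hypothetical number of covariates

print(phi_eq347(panels, p), phi_eq347(doubled, p))            # about 0.71 and 0.621
print(math.isclose(phi_eq348(panels), phi_eq348(doubled)))   # True
```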


THE PA-GEE FOR GLMS


Stata users can use either equation. The default calculation in recent versions is equation 3.48, but the PA-GEE modeling command xtgee includes an option nmp for requesting calculation by equation 3.47. SAS users can also use either equation. The default calculation in recent versions is equation 3.47, but the PA-GEE modeling command PROC GENMOD includes an option V6CORR with the REPEATED statement for requesting calculation by equation 3.48.

3.2.2.1 Independence models

In this subsection we investigate whether the two approaches to estimating the dispersion parameter differ for independence models. The data used to investigate the competing estimators are for pedagogical purposes only. We make no attempt to identify the nature of the data or to interpret the results beyond noting the effect on the estimated regression and association parameters. Our first sample dataset, Sample1, consists of the following data:

id   t   y   x1   x2
 1   1   4    0    0
 1   2   5    1    0
 1   3   6    0    1
 1   4   7    1    1
 2   1   5    0    0
 2   2   6    1    0
 2   3   7    0    1
 2   4   8    1    1

Sample2 is constructed from Sample1 by replicating the Sample1 panels. It has exactly the same within-panel information as Sample1, but twice as many panels. Examining the data, you can see that panel 3 (id=3) is the same as panel 1, and panel 4 is the same as panel 2. We have merely added a single copy of each of the original panels in the Sample1 data.


id   t   y   x1   x2
 1   1   4    0    0
 1   2   5    1    0
 1   3   6    0    1
 1   4   7    1    1
 2   1   5    0    0
 2   2   6    1    0
 2   3   7    0    1
 2   4   8    1    1
 3   1   4    0    0
 3   2   5    1    0
 3   3   6    0    1
 3   4   7    1    1
 4   1   5    0    0
 4   2   6    1    0
 4   3   7    0    1
 4   4   8    1    1

Below we present the results of fitting identical independent correlation models to the base data (Sample1) and the expanded data (Sample2). The aim is to illustrate the effect of the moment estimator for the dispersion parameter on the results.

Before reading the output, think about what you expect for the relationship between the fitted coefficients of the two datasets. In addition, consider how the estimated correlation parameters of the two datasets should relate. The way we estimate the dispersion parameter affects our ability to demonstrate these kinds of relationships and the type of information contained in the data.

The purpose of the following output is to illustrate the source of differences in output between software packages. When different packages produce different answers for the same analysis, analysts are often confused and usually conclude that one of the packages is producing incorrect results. That is not necessarily the case. The PA-GEE model is not fully specified, and software vendors may choose any consistent estimate of the dispersion parameter.

This choice affects comparisons of equivalent analyses across software packages, as well as the relationship to other analyses within the same package. Our detailed examples, using the small datasets shown above, highlight the effects of the two most common choices for estimating the dispersion parameter. In fact, we know of no software that does not use one of these two estimators.


Fitting a GLM to the datasets yields the following output:

Generalized linear models                          No. of obs      =         8
Optimization     : ML: Newton-Raphson              Residual df     =         5
                                                   Scale param     =        .4
Deviance         =                 2               (1/df) Deviance =        .4
Pearson          =                 2               (1/df) Pearson  =        .4

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   = -5.806330821                    AIC             =  2.201583
BIC              = -4.238324625

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |          1   .4472136     2.24   0.025     .1234775    1.876523
          x2 |          2   .4472136     4.47   0.000     1.123477    2.876523
       _cons |        4.5   .3872983    11.62   0.000     3.740909    5.259091
------------------------------------------------------------------------------
$\mathrm{Model}_1^{\mathrm{GLM}}$: Linear model for Sample1

Generalized linear models                          No. of obs      =        16
Optimization     : ML: Newton-Raphson              Residual df     =        13
                                                   Scale param     =  .3076923
Deviance         =                 4               (1/df) Deviance =  .3076923
Pearson          =                 4               (1/df) Pearson  =  .3076923

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   = -11.61266164                    AIC             =  1.826583
BIC              = -4.317766167

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |          1   .2773501     3.61   0.000     .4564038    1.543596
          x2 |          2   .2773501     7.21   0.000     1.456404    2.543596
       _cons |        4.5   .2401922    18.73   0.000     4.029232    4.970768
------------------------------------------------------------------------------
$\mathrm{Model}_2^{\mathrm{GLM}}$: Linear model for Sample2

Note that the coefficient estimates for the two datasets are exactly the same, as we anticipate. The standard errors differ by a scale factor because the second dataset has twice as many observations. The relationship between the two sets of standard errors is

$$\mathrm{SE}_{\mathrm{Model}_2} = \mathrm{SE}_{\mathrm{Model}_1}\sqrt{\frac{n_{\mathrm{Model}_1} - p}{n_{\mathrm{Model}_2} - p}} \qquad (3.49)$$
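As a quick arithmetic check of equation 3.49 against the GLM listings, take the x1 row, with 8 and 16 observations and p = 3 (x1, x2, and _cons):

$$0.4472136\sqrt{\frac{8-3}{16-3}} = 0.4472136 \times 0.6201737 \approx 0.2773501$$

This is the standard error reported for x1 in the Sample2 GLM listing.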


Fitting an independent PA-GEE model to the two sample datasets using equation 3.48 to estimate φ yields the following results:

GEE population-averaged model                   Number of obs      =         8
Group variable:                         id      Number of groups   =         2
Link:                             identity      Obs per group: min =         4
Family:                           Gaussian                     avg =       4.0
Correlation:                   independent                     max =         4
                                                Wald chi2(2)       =     40.00
Scale parameter:                       .25      Prob > chi2        =    0.0000

Pearson chi2(8):                      2.00      Deviance           =      2.00
Dispersion (Pearson):                  .25      Dispersion         =       .25

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |          1   .3535534     2.83   0.005     .3070481    1.692952
          x2 |          2   .3535534     5.66   0.000     1.307048    2.692952
       _cons |        4.5   .3061862    14.70   0.000     3.899886    5.100114
------------------------------------------------------------------------------
$\mathrm{Model}_1^{\mathrm{GEE}}$: PA-GEE model using equation 3.48 for φ with Sample1

GEE population-averaged model                   Number of obs      =        16
Group variable:                         id      Number of groups   =         4
Link:                             identity      Obs per group: min =         4
Family:                           Gaussian                     avg =       4.0
Correlation:                   independent                     max =         4
                                                Wald chi2(2)       =     80.00
Scale parameter:                       .25      Prob > chi2        =    0.0000

Pearson chi2(16):                     4.00      Deviance           =      4.00
Dispersion (Pearson):                  .25      Dispersion         =       .25

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |          1        .25     4.00   0.000      .510009    1.489991
          x2 |          2        .25     8.00   0.000     1.510009    2.489991
       _cons |        4.5   .2165064    20.78   0.000     4.075655    4.924345
------------------------------------------------------------------------------
$\mathrm{Model}_2^{\mathrm{GEE}}$: PA-GEE model using equation 3.48 for φ with Sample2

The resulting coefficient estimates match the GLM output, but the standard errors differ because equation 3.48 uses a different denominator. For the same data, the standard errors of the PA-GEE model relate to those of the GLM as

$$\mathrm{SE}_{\mathrm{GEE}} = \mathrm{SE}_{\mathrm{GLM}}\sqrt{\frac{n - p}{n}} \qquad (3.50)$$

where n here denotes the total number of observations.

The relationship between the PA-GEE standard errors of the two datasets is the scale factor

$$\mathrm{SE}_{\mathrm{Model}_2} = \mathrm{SE}_{\mathrm{Model}_1}\sqrt{\frac{n_{\mathrm{Model}_1}}{n_{\mathrm{Model}_2}}} \qquad (3.51)$$
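The same kind of check on the x1 row illustrates equations 3.50 and 3.51. The first line goes from the Sample1 GLM (n = 8, p = 3) to the Sample1 PA-GEE fit using equation 3.48. The second goes from that Sample1 fit to the Sample2 fit:

$$0.4472136\sqrt{\frac{8-3}{8}} = 0.4472136 \times 0.7905694 \approx 0.3535534$$

$$0.3535534\sqrt{\frac{8}{16}} = 0.3535534 \times 0.7071068 \approx 0.25$$

Both values agree with the standard errors reported in the two listings that use equation 3.48.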


Fitting an independent PA-GEE model to the two sample datasets using equation 3.47 to estimate φ yields the following results:

GEE population-averaged model                   Number of obs      =         8
Group variable:                         id      Number of groups   =         2
Link:                             identity      Obs per group: min =         4
Family:                           Gaussian                     avg =       4.0
Correlation:                   independent                     max =         4
                                                Wald chi2(2)       =     25.00
Scale parameter:                        .4      Prob > chi2        =    0.0000

Pearson chi2(5):                      2.00      Deviance           =      2.00
Dispersion (Pearson):                   .4      Dispersion         =        .4

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |          1   .4472136     2.24   0.025     .1234775    1.876523
          x2 |          2   .4472136     4.47   0.000     1.123477    2.876523
       _cons |        4.5   .3872983    11.62   0.000     3.740909    5.259091
------------------------------------------------------------------------------
$\mathrm{Model}_1^{\mathrm{GEE}}$: PA-GEE model using equation 3.47 for φ with Sample1

GEE population-averaged model                   Number of obs      =        16
Group variable:                         id      Number of groups   =         4
Link:                             identity      Obs per group: min =         4
Family:                           Gaussian                     avg =       4.0
Correlation:                   independent                     max =         4
                                                Wald chi2(2)       =     65.00
Scale parameter:                  .3076923      Prob > chi2        =    0.0000

Pearson chi2(13):                     4.00      Deviance           =      4.00
Dispersion (Pearson):             .3076923      Dispersion         =  .3076923

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |          1   .2773501     3.61   0.000     .4564038    1.543596
          x2 |          2   .2773501     7.21   0.000     1.456404    2.543596
       _cons |        4.5   .2401922    18.73   0.000     4.029232    4.970768
------------------------------------------------------------------------------
$\mathrm{Model}_2^{\mathrm{GEE}}$: PA-GEE model using equation 3.47 for φ with Sample2

The resulting coefficient estimates and standard errors exactly match the GLM fit of each dataset. This is because equation 3.47, used in the PA-GEE model to estimate φ, is the same dispersion estimator the GLM uses.
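The independence results above can be reproduced with a short NumPy sketch. It makes the following assumptions, as the listings indicate:

- a Gaussian model with identity link;
- p = 3 parameters;
- model-based (not sandwich) standard errors of the form sqrt(φ̂ · diag[(X'X)⁻¹]).

Because the panels are balanced, equation 3.48 reduces to the Pearson statistic divided by the number of observations. The function name fit_independent is hypothetical.

```python
import numpy as np

# Sample1 as listed above (panels id = 1, 2; times t = 1..4).
y1 = np.array([4, 5, 6, 7, 5, 6, 7, 8], dtype=float)
x1 = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
x2 = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)

# Sample2: one exact copy of each Sample1 panel (ids 3, 4) appended.
y2, x1_2, x2_2 = (np.tile(v, 2) for v in (y1, x1, x2))


def fit_independent(y, x1_col, x2_col):
    """Gaussian/identity fit; model-based SEs under equations 3.47 and 3.48.

    Assumes balanced panels, so equation 3.48 reduces to Pearson / n_obs.
    """
    X = np.column_stack([x1_col, x2_col, np.ones_like(y)])  # x1, x2, _cons
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                    # Pearson residuals, V(mu) = 1
    pearson = float(resid @ resid)
    n_obs, p = X.shape
    diag_inv = np.diag(np.linalg.inv(X.T @ X))

    def se(phi):
        return np.sqrt(phi * diag_inv)

    se_eq347 = se(pearson / (n_obs - p))   # matches the GLM listings
    se_eq348 = se(pearson / n_obs)         # matches the equation 3.48 listings
    return beta, pearson, se_eq347, se_eq348


for name, data in [("Sample1", (y1, x1, x2)), ("Sample2", (y2, x1_2, x2_2))]:
    beta, pearson, se_eq347, se_eq348 = fit_independent(*data)
    print(name, beta.round(7), round(pearson, 7),
          se_eq347.round(7), se_eq348.round(7))
```

Running both datasets through the same function makes the pattern explicit. The coefficients and the Pearson-based dispersion under equation 3.48 are unchanged by replication, while the equation 3.47 dispersion is not.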


3.2.2.2 Exchangeable models

In this subsection, we continue our investigation of how the estimation method for the dispersion parameter changes the output, this time with an exchangeable logistic model. Our base sample dataset, Sample3, consists of the following data:

id   t   y   x1   x2
 1   1   0    0    0
 1   2   1    1    0
 1   3   1    0    1
 1   4   0    1    1
 2   1   0    1    0
 2   2   0    1    0
 2   3   0    0    1
 2   4   1    1    1

Sample4 is constructed from Sample3 in exactly the same manner in which we expanded Sample1 to construct Sample2. We have merely added a single copy of each of the original panels in the Sample3 data.

id   t   y   x1   x2
 1   1   0    0    0
 1   2   1    1    0
 1   3   1    0    1
 1   4   0    1    1
 2   1   0    1    0
 2   2   0    1    0
 2   3   0    0    1
 2   4   1    1    1
 3   1   0    0    0
 3   2   1    1    0
 3   3   1    0    1
 3   4   0    1    1
 4   1   0    1    0
 4   2   0    1    0
 4   3   0    0    1
 4   4   1    1    1

Below we present the results of fitting similar exchangeable correlation models to the base and expanded data. The aim is to illustrate the effect of the moment estimator for the dispersion parameter on the results. Before looking at the results of the experiment, we should think carefully about what we expect when fitting models to these two datasets. Do you expect the regression parameters to be the same for the two datasets? Do you expect the association parameters to be the same for analyses on the two datasets? What kind of information was added in constructing the Sample4 data?


Fitting an exchangeable logistic model to the data yields the following results when we use equation 3.48 to estimate φ:

GEE population-averaged model                   Number of obs      =         8
Group variable:                         id      Number of groups   =         2
Link:                                logit      Obs per group: min =         4
Family:                           binomial                     avg =       4.0
Correlation:                  exchangeable                     max =         4
                                                Wald chi2(2)       =      0.44
Scale parameter:                         1      Prob > chi2        =    0.8035

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.3681158   1.599157    -0.23   0.818    -3.502406    2.766174
          x2 |   .9705387   1.723205     0.56   0.573    -2.406882    4.347959
       _cons |  -.7897303   1.525776    -0.52   0.605    -3.780196    2.200736
-------------+----------------------------------------------------------------
       alpha |    -0.2338
------------------------------------------------------------------------------
$\mathrm{Model}_3^{\mathrm{GEE}}$: PA-GEE exchangeable logistic model using equation 3.48 for φ in Sample3

GEE population-averaged model                   Number of obs      =        16
Group variable:                         id      Number of groups   =         4
Link:                                logit      Obs per group: min =         4
Family:                           binomial                     avg =       4.0
Correlation:                  exchangeable                     max =         4
                                                Wald chi2(2)       =      0.87
Scale parameter:                         1      Prob > chi2        =    0.6457

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.3681158   1.130775    -0.33   0.745    -2.584393    1.848162
          x2 |   .9705387    1.21849     0.80   0.426    -1.417658    3.358736
       _cons |  -.7897303   1.078887    -0.73   0.464    -2.904309    1.324849
-------------+----------------------------------------------------------------
       alpha |    -0.2338
------------------------------------------------------------------------------
$\mathrm{Model}_4^{\mathrm{GEE}}$: PA-GEE exchangeable logistic model using equation 3.48 for φ in Sample4

When equation 3.48 is used to estimate φ, the coefficient and correlation estimates match exactly for the two datasets. The relationship between the PA-GEE standard errors of the two datasets is the scale factor

$$\mathrm{SE}_{\mathrm{Model}_4} = \mathrm{SE}_{\mathrm{Model}_3}\sqrt{\frac{n_{\mathrm{Model}_3}}{n_{\mathrm{Model}_4}}} \qquad (3.52)$$

This is the same relationship as for the independence PA-GEE model seen in equation 3.51.
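Applying equation 3.52 to the x1 row of the exchangeable fits that use equation 3.48 gives

$$1.599157\sqrt{\frac{8}{16}} = 1.599157 \times 0.7071068 \approx 1.130775$$

This is the standard error reported for x1 in the Sample4 listing. The x2 row likewise gives 1.723205 × 0.7071068 ≈ 1.21849.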


Fitting an exchangeable logistic model to the data yields the following results when we use equation 3.47 to estimate φ:

GEE population-averaged model                   Number of obs      =         8
Group variable:                         id      Number of groups   =         2
Link:                                logit      Obs per group: min =         4
Family:                           binomial                     avg =       4.0
Correlation:                  exchangeable                     max =         4
                                                Wald chi2(2)       =      0.45
Scale parameter:                         1      Prob > chi2        =    0.7985

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .1306614   1.675585     0.08   0.938    -3.153424    3.414747
          x2 |   1.141203   1.718915     0.66   0.507    -2.227808    4.510214
       _cons |   -1.20693   1.650726    -0.73   0.465    -4.442293    2.028433
-------------+----------------------------------------------------------------
       alpha |    -0.1823
------------------------------------------------------------------------------
$\mathrm{Model}_3^{\mathrm{GEE}}$: PA-GEE exchangeable logistic model using equation 3.47 for φ in Sample3

GEE population-averaged model                   Number of obs      =        16
Group variable:                         id      Number of groups   =         4
Link:                                logit      Obs per group: min =         4
Family:                           binomial                     avg =       4.0
Correlation:                  exchangeable                     max =         4
                                                Wald chi2(2)       =      0.85
Scale parameter:                         1      Prob > chi2        =    0.6526

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.0734883   1.163012    -0.06   0.950    -2.352949    2.205973
          x2 |    1.07385   1.217312     0.88   0.378    -1.312038    3.459738
       _cons |  -1.037237   1.128697    -0.92   0.358    -3.249443    1.174969
-------------+----------------------------------------------------------------
       alpha |    -0.2083
------------------------------------------------------------------------------
$\mathrm{Model}_4^{\mathrm{GEE}}$: PA-GEE exchangeable logistic model using equation 3.47 for φ in Sample4

The coefficient and correlation parameter estimates do not match when we use the estimator for φ given in equation 3.47. The relationship between the standard errors of the two models is complicated by the fact that the estimated common correlation is now different.


3.2.3 Estimating the PA-GEE model

The first software implementation was given by Karim and Zeger (1989) shortly after the appearance of the initial paper describing the PA-GEE collection of models. The authors provided a macro for use with the SAS software system. In addition to this macro, Vince Carey developed a standalone C-language source code program for estimating these models for balanced panels. Carey later developed code for fitting alternating logistic regression PA-GEE models, and subsequently developed the YAGS software in addition to C++ code classes for programmers.* Once this code was available, support software was developed for use with many other software packages.

Combining the estimating equations for the regression parameters (equation 3.12) and the ancillary parameters (equation 3.15), the complete PA-GEE is given by

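$$\Psi(\beta,\alpha) = \left(\Psi_\beta(\beta,\alpha),\ \Psi_\alpha(\beta,\alpha)\right)^T \qquad (3.53)$$

$$= \left(\sum_{i=1}^{n} \mathbf{X}_i^T D\left(\frac{\partial\mu}{\partial\eta}\right)\left[V(\mu_i)\right]^{-1}\left(\frac{\mathbf{y}_i-\mu_i}{a(\phi)}\right),\ \sum_{i=1}^{n}\left(\frac{\partial\xi_i}{\partial\alpha}\right)^T H_i^{-1}\left(W_i-\xi_i\right)\right)^T \qquad (3.54)$$

$$V(\mu_i) = D\left(V(\mu_{it})\right)^{1/2} R(\alpha)\, D\left(V(\mu_{it})\right)^{1/2} \qquad (3.55)$$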

Estimation assumes that the estimating equation for the correlations is orthogonal to the estimating equation for β. At each step in the usual GLM algorithm, we first estimate R and then use it to estimate β. Convergence is declared when either the change in the parameter estimates is less than a set criterion or the change in the sum of the squared deviance residuals is less than a given criterion. The squared deviance residuals for various distributions from the exponential family are provided in Table 3.1.

While the deviance may be calculated and used as a criterion for declaring convergence in the optimization, it is not usually reported by software. The deviance plays an important part in inference for GLMs, but it does not have the same properties for PA-GEE models unless the PA-GEE model uses the independence correlation structure. For example, when using a correlation structure other than independence, the deviance could either increase or decrease with the addition of a covariate.
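Table 3.1 translates directly into code. The following Python sketch, with function names of our own choosing, evaluates the squared deviance residual for one observation and sums these over the data, giving the quantity whose change can be monitored as a convergence criterion. For the binomial(k) row we take μ_i to be the expected count out of k_i trials; for the Poisson row we use the usual convention that y ln(y/μ) is 0 when y = 0. The tolerance in the stopping rule is an arbitrary illustrative choice.

```python
import math


def squared_deviance_residual(family, y, mu, k=None):
    """Squared deviance residual d_i^2 for one observation (Table 3.1).

    For the binomial(k) family, y is the number of successes out of k
    trials and mu is the expected count (k times the success probability).
    """
    if family == "gaussian":
        return (y - mu) ** 2
    if family == "bernoulli":
        return -2.0 * math.log(1.0 - mu) if y == 0 else -2.0 * math.log(mu)
    if family == "binomial":
        if y == 0:
            return 2.0 * k * math.log(k / (k - mu))
        if y == k:
            return 2.0 * k * math.log(k / mu)
        return (2.0 * y * math.log(y / mu)
                + 2.0 * (k - y) * math.log((k - y) / (k - mu)))
    if family == "poisson":
        # y*ln(y/mu) is taken as 0 when y == 0 (its limiting value)
        term = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (term - (y - mu))
    if family == "gamma":
        return -2.0 * (math.log(y / mu) - (y - mu) / mu)
    if family == "inverse_gaussian":
        return (y - mu) ** 2 / (mu ** 2 * y)
    raise ValueError(f"unknown family: {family}")


def deviance(family, ys, mus, ks=None):
    """Sum of squared deviance residuals over all observations."""
    ks = ks if ks is not None else [None] * len(ys)
    return sum(squared_deviance_residual(family, y, m, k)
               for y, m, k in zip(ys, mus, ks))


def deviance_converged(old_dev, new_dev, tol=1e-6):
    """Illustrative stopping rule: small absolute change in the deviance."""
    return abs(new_dev - old_dev) < tol
```

Note that a binomial(1) observation gives the same value as the Bernoulli row, which is a quick consistency check on the table.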
Some packages include GLM-type summary statistics, including the deviance, in the output; these summary statistics are calculated for the independence model. They are useful in calculating other criterion measures, as we see in Chapter 4.

Zeger and Liang provide evidence in their early work that even if an incorrect structure is used for the correlation matrix, only the efficiency of our estimated β is affected. This robustness to misspecification of the correlation structure is purchased through the assumption that the estimating equation for the regression coefficients is orthogonal to the estimating equation for the correlation coefficients.

* http://www.biostat.harvard.edu/~carey currently has links for these packages.

Table 3.1 Squared deviance residuals

Distribution        Squared deviance residual $d_i^2$
Gaussian            $(y_i-\mu_i)^2$
Bernoulli           $-2\ln(1-\mu_i)$ if $y_i = 0$
                    $-2\ln(\mu_i)$ if $y_i = 1$
Binomial($k$)       $2k_i\ln\{k_i/(k_i-\mu_i)\}$ if $y_i = 0$
                    $2y_i\ln(y_i/\mu_i) + 2(k_i-y_i)\ln\{(k_i-y_i)/(k_i-\mu_i)\}$ if $0 < y_i < k_i$
                    $2k_i\ln(k_i/\mu_i)$ if $y_i = k_i$
Poisson             $2\{y_i\ln(y_i/\mu_i) - (y_i-\mu_i)\}$
Gamma               $-2\{\ln(y_i/\mu_i) - (y_i-\mu_i)/\mu_i\}$
Inverse Gaussian    $(y_i-\mu_i)^2/(\mu_i^2 y_i)$

We can further protect ourselves from misspecification of the within-panel correlation assumption by employing the modified sandwich estimate of variance for the estimated β. Recall that the modified sandwich estimate of variance is robust to any form of within-panel correlation. In this way we gain efficiency in our estimated β if we have the correct form of within-panel correlation, and we are protected from misspecification if we are wrong. Sutradhar and Das (1999) investigate the efficiency of the regression coefficients under misspecification and provide results for some simulation studies.

The modified sandwich estimate of variance for the complete estimating equation is derived from

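$$V_{MS} = A^{-1} B A^{-T} \qquad (3.56)$$

$$A = \begin{pmatrix} -\dfrac{\partial\Psi_\beta}{\partial\beta} & -\dfrac{\partial\Psi_\beta}{\partial\alpha} \\[2ex] -\dfrac{\partial\Psi_\alpha}{\partial\beta} & -\dfrac{\partial\Psi_\alpha}{\partial\alpha} \end{pmatrix} \qquad (3.57)$$

$$B = \begin{pmatrix} \sum_{i=1}^{n}\left(\sum_{t=1}^{n_i}\Psi_{\beta,it}\right)\left(\sum_{t=1}^{n_i}\Psi_{\beta,it}\right)^T & \sum_{i=1}^{n}\left(\sum_{t=1}^{n_i}\Psi_{\beta,it}\right)\left(\sum_{t=1}^{n_i}\Psi_{\alpha,it}\right)^T \\[2ex] \sum_{i=1}^{n}\left(\sum_{t=1}^{n_i}\Psi_{\alpha,it}\right)\left(\sum_{t=1}^{n_i}\Psi_{\beta,it}\right)^T & \sum_{i=1}^{n}\left(\sum_{t=1}^{n_i}\Psi_{\alpha,it}\right)\left(\sum_{t=1}^{n_i}\Psi_{\alpha,it}\right)^T \end{pmatrix} \qquad (3.58)$$

However, since we assume that the two estimating equations are orthogonal,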


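we can write

$$A^{-1} = \begin{pmatrix} \left(-\dfrac{\partial\Psi_\beta}{\partial\beta}\right)^{-1} & 0 \\[2ex] 0 & \left(-\dfrac{\partial\Psi_\alpha}{\partial\alpha}\right)^{-1} \end{pmatrix} \qquad (3.59)$$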

This assumption of orthogonality leads to a block-diagonal modified sandwich estimate of variance, where the upper-left entry is given by

0po) aW ) ( - a; (

n

( n;i

[n~ (n~ 2-i

t-i

`Lpit W/3it

) ( n ;i )

(n~ t-i

T

`LPZt W/3it

)

`Y«Zt WOlit

)

T] (()T aW ) T - a;

(3.60) (3.60)

T] (- aw )T aaOl

(3.61) (3.61)

0,3

and the lower lower right right entry entry is is given given by by and the

) ~ n

( n;i

) [n aaOl ~ (n ~ `Y«Zt wOlit ( - aw 2-i

t-i

(n~ i

)

)

i
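As a concrete illustration of the upper-left block in equation 3.60, the following Python sketch assumes the simplest possible setting: a Gaussian model with the identity link, the independence working correlation, and one covariate plus an intercept. In that case Ψ_β,it = x_it(y_it − x_it^T β)/a(φ) and −∂Ψ_β/∂β = Σ_i Σ_t x_it x_it^T / a(φ), so the dispersion cancels from the sandwich and is omitted. The function names and toy inputs are ours; a real PA-GEE fit would use the working correlation and the derivative terms of equation 3.54.

```python
def fit_ols(panels):
    """Least-squares fit of y = b0 + b1*x, which is the independence PA-GEE
    for a Gaussian identity-link model. panels: list of lists of (x, y)."""
    rows = [(x, y) for panel in panels for (x, y) in panel]
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def modified_sandwich(panels, beta):
    """Upper-left block A^{-1} B A^{-T} of the modified sandwich (eq. 3.60)."""
    b0, b1 = beta
    a = [[0.0, 0.0], [0.0, 0.0]]  # bread: -dPsi_beta/dbeta (phi cancels)
    b = [[0.0, 0.0], [0.0, 0.0]]  # meat: sum of outer products of panel scores
    for panel in panels:
        s = [0.0, 0.0]  # sum over t of Psi_beta,it within this panel
        for x, y in panel:
            xv = (1.0, x)
            r = y - (b0 + b1 * x)
            for j in range(2):
                s[j] += xv[j] * r
                for k in range(2):
                    a[j][k] += xv[j] * xv[k]
        for j in range(2):
            for k in range(2):
                b[j][k] += s[j] * s[k]
    a_inv = inv2(a)  # A is symmetric here, so A^{-T} equals A^{-1}
    return matmul2(matmul2(a_inv, b), a_inv)
```

Because the scores are summed within each panel before the outer product is taken, residuals that are correlated within a panel change B relative to the model-based A, which is what makes the estimate robust to the within-panel correlation.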

Because we are interested only in the regression parameters, there is no need to calculate variances for the ancillary parameters: since the matrix is block diagonal, the variance of the regression parameters does not depend on them. The modified sandwich estimate of variance for the regression parameters is the upper p × p part of V_MS given by equation 3.60, and the modified sandwich estimate of variance for the association parameters is the lower q × q part of V_MS given in equation 3.61. The variance of the association parameters is not calculated in the approach given in section 3.2, though the formula is valid. The variance of the association parameters is calculated in the ALR approach (section 3.2.5) as well as in other GEE models.

We advise that all users specify the modified sandwich estimate of variance with this model. This is called the empirical variance in SAS, the robust variance in S-PLUS, and the semirobust variance in Stata.
Stata calls the variance estimate semirobust due to the use of the expected Hessian in the bread (the A matrix) of the sandwich variance estimate. The expected Hessian is not robust to misspecification of the link function. SUDAAN allows user specification of the expected Hessian with option zeger or the observed Hessian with option binder. The semirobust variance estimate is the same as the robust variance estimate if the canonical link is used for the model, but the output from Stata is still labelled "semirobust." See Hardin and Hilbe (2001) for details and further discussion of robust versus semirobust sandwich estimates of variance.

Recently, Pan (2001b) introduced an alternative estimator for the variance of the outcome. He noted that the usual correction factor for the modified sandwich estimate of variance in PA-GEE models may be written

$$B = \sum_{i=1}^{n}\left(\frac{\partial\mu_i}{\partial\beta}\right)^T V(\mu_i)^{-T}\,\mathrm{Cov}(\mathbf{y}_i)\,V(\mu_i)^{-1}\left(\frac{\partial\mu_i}{\partial\beta}\right) \qquad (3.62)$$


to emphasize that the covariance of the outcome is estimated by

$$\widehat{\mathrm{Cov}}(\mathbf{y}_i) = S_i S_i^T \qquad (3.63)$$

$$S_i = \mathbf{y}_i - \mu_i \qquad (3.64)$$

Pan's alternate formulation changes the covariance of the outcome term to

$$\widehat{\mathrm{Cov}}(\mathbf{y}_i) = A_i^{1/2}\left(\frac{1}{n}\sum_{j=1}^{n} A_j^{-1/2} S_j S_j^T A_j^{-1/2}\right)A_i^{1/2} \qquad (3.65)$$

$$A_i = a(\phi)\, D\left(V(\mu_{it})\right) \qquad (3.66)$$

arguing that the usual estimate is neither consistent nor efficient, since it uses data from only one subject. Early simulation work demonstrated that the modified sandwich estimate of variance resulting from this alternative formulation performs better than the sandwich estimate of variance above, in the sense that its results are closer to nominal levels in simulations. When using this new variance estimate, we emphasize that the formulation assumes that the marginal variance of the outcome is modelled correctly and that there is some common correlation structure for all panels.

Since material on the internet can last far past its useful life, we also discuss another estimation problem, even though it no longer exists in commercial software packages.
In early software implementations of the PA-GEE model, there was a mistake in the calculation of the association parameters from the Pearson residuals for the exchangeable correlation model in the case where some panels had only a single observation (such panels are called singletons). In the original Liang and Zeger (1986) paper, the scale estimator and exchangeable correlation were correctly specified (using equation 3.47 for the dispersion parameter) as

$$\hat\phi = \frac{1}{\left(\sum_{i=1}^{n} n_i\right) - p}\sum_{i=1}^{n}\sum_{t=1}^{n_i}\hat r_{it}^2 \qquad (3.67)$$

$$\hat\alpha = \frac{1}{\hat\phi\left\{\sum_{i=1}^{n} 0.5\,n_i(n_i-1) - p\right\}}\sum_{i=1}^{n}\sum_{t=1}^{n_i}\sum_{t'>t}\hat r_{it}\hat r_{it'} \qquad (3.68)$$

However, different formulas were implemented in the first macro program supporting estimation of these models. Subsequent software implementations then copied these formulas (incorrect in the presence of singletons) from the first macro program.

Current software implementations for all of the packages used in this text handle this issue correctly. To verify this, or to test another software implementation, the following data can be modelled as an exchangeable correlation linear regression model.
To To verify verify this, this, or to test test another another software implehandle this or to software implefollowing data mentation, the following can be be modelled modelled as an exchangeable exchangeable correlation correlation mentation, the data can as an linear regression linear regression model model..

id        y      x      id        y      x      id        y      x
 1  22.5324      0       7  23.2159      0      10  23.5287    273
 2  22.1011      0       8  23.4819      0      10  24.5693    416
 3  21.6930      0       8  23.1031    242      10  24.0201    616
 4  21.3061      0       8  23.6713    382      10  24.6849    806
 6  20.2493      0       8  23.2609    551      12  21.1412      0
 6  20.3324    230       8  23.7659    718      12  21.8088    225
 6  19.6399    406       9  20.4287      0      12  22.8473    400
 6  18.6703    593       9  18.9259    234      12  22.1797    595
 6  20.9972    770      10  24.1646      0      12  21.7346    771

The correct estimate of the correlation parameter is 0.953, while an incorrect result of 0.748 is reported by flawed implementations. Note that these values use equation 3.47 for estimation of the dispersion parameter (as implied above).
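Equations 3.67 and 3.68 can be sketched in Python as follows, assuming the Pearson residuals from the current fit are already available and grouped by panel (the function name and argument layout are ours). A singleton panel adds one squared residual to the dispersion estimate but contributes no pairs to the correlation estimate, since 0.5 n_i(n_i − 1) = 0 when n_i = 1. Reproducing the value 0.953 for the data above would additionally require updating β until convergence, which this sketch does not attempt.

```python
def exchangeable_moments(residual_panels, p):
    """Moment estimates of phi (eq. 3.67) and alpha (eq. 3.68).

    residual_panels: list of panels, each a list of Pearson residuals r_it.
    p: number of regression parameters.
    """
    n_obs = sum(len(r) for r in residual_panels)
    phi = sum(rt * rt for r in residual_panels for rt in r) / (n_obs - p)

    pair_sum = 0.0  # sum over panels of r_it * r_it' for t' > t
    n_pairs = 0.0   # sum over panels of 0.5 * n_i * (n_i - 1)
    for r in residual_panels:
        n_i = len(r)
        n_pairs += 0.5 * n_i * (n_i - 1)  # zero for a singleton panel
        for t in range(n_i):
            for u in range(t + 1, n_i):
                pair_sum += r[t] * r[u]
    alpha = pair_sum / (phi * (n_pairs - p))
    return phi, alpha
```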

3.2.4 Convergence of the estimation routine

For most data, the estimation routine converges in relatively few iterations. However, there are times when the estimation of the model does not converge. Often this is due to an instability in the estimation of the correlation matrix. A common cause of nonconvergence is that the solution for the correlations iterates between two (or more) possible outcomes.

If we take the Sample1 dataset previously used and try to fit an exchangeable regression model where the dispersion parameter is estimated using equation 3.47, we see that the estimation alternates between the following two estimates of θ = (β, φ):

$$\theta_1 = (1,\ 2,\ 4.5,\ 0.5556) \qquad (3.69)$$

$$\theta_2 = (0,\ 0,\ 4.5,\ 1.000) \qquad (3.70)$$

There are two choices that we can take to address this instability in the estimation. One choice is to use the other estimator for the dispersion parameter. The other choice is to specify a different correlation structure. Specifying a different correlation structure explicitly addresses the fact that the data may not support our original specification, while changing the estimator for the dispersion parameter indirectly changes the correlation parameter estimates via the denominator of the correlation estimator. Either solution fixes the nonconvergence problem seen in this example.

Alternatively, if the model we are analyzing is binomial, we can use the estimation techniques of the following section, or we can rely on the one-step estimates (those estimates resulting from a single iteration of the estimation algorithm); see Lipsitz, Fitzmaurice, Orav, and Laird (1994).
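A simple safeguard is to check whether an iteration is alternating rather than converging. The Python sketch below, with names of our own, applies a generic update map and reports a two-cycle when a new estimate is close to the estimate from two steps earlier but not to the previous one. The update passed in stands in for one pass of the PA-GEE algorithm; in the usage lines it is a hypothetical map that simply swaps between the two estimates of equations 3.69 and 3.70.

```python
def iterate_with_cycle_check(update, theta0, tol=1e-8, max_iter=100):
    """Run theta_{k+1} = update(theta_k) and report how the iteration ended.

    Returns (status, history) with status "converged", "cycle", or
    "max_iter". A "cycle" means the newest estimate is close to the one
    two steps back but not to the previous one.
    """
    def close(a, b):
        return max(abs(x - y) for x, y in zip(a, b)) < tol

    history = [tuple(theta0)]
    for _ in range(max_iter):
        new = tuple(update(history[-1]))
        if close(new, history[-1]):
            history.append(new)
            return "converged", history
        if len(history) >= 2 and close(new, history[-2]):
            history.append(new)
            return "cycle", history
        history.append(new)
    return "max_iter", history


# Hypothetical update that alternates between theta_1 and theta_2
theta_1 = (1.0, 2.0, 4.5, 0.5556)
theta_2 = (0.0, 0.0, 4.5, 1.0)
status, history = iterate_with_cycle_check(
    lambda t: theta_2 if t == theta_1 else theta_1, theta_1)
# status is "cycle" after three recorded estimates
```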

3.2.5 ALR: Estimating correlations for binomial models

Carey, Zeger, and Diggle (1993) point out that the Pearson residuals are not a very good choice for estimating the correlations in the special case when we are fitting a binomial model. They offered the alternative approach that is discussed here.


We can write the correlation between a pair of observations in a panel as

$$\mathrm{Corr}(y_{ij}, y_{ik}) = \frac{P(y_{ij}=1, y_{ik}=1) - \mu_{ij}\mu_{ik}}{\sqrt{\mu_{ij}(1-\mu_{ij})\,\mu_{ik}(1-\mu_{ik})}} \qquad (3.71)$$

and note that the probability that both observations have values of 1 satisfies

$$\max(0,\ \mu_{ij} + \mu_{ik} - 1) \le P(y_{ij}=1, y_{ik}=1) \le \min(\mu_{ij}, \mu_{ik}) \qquad (3.72)$$
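The practical consequence of equations 3.71 and 3.72 is easy to compute. This Python sketch (function name ours) maps the two bounds on P(y_ij = 1, y_ik = 1) through equation 3.71; because 3.71 is increasing in the joint probability, the mapped bounds are the smallest and largest attainable correlations for the given means.

```python
import math


def correlation_limits(mu_j, mu_k):
    """Range of Corr(y_j, y_k) attainable by two Bernoulli outcomes with
    means mu_j and mu_k, using the bounds on P(y_j = 1, y_k = 1) in eq. 3.72."""
    sd = math.sqrt(mu_j * (1 - mu_j) * mu_k * (1 - mu_k))
    p11_low = max(0.0, mu_j + mu_k - 1.0)
    p11_high = min(mu_j, mu_k)
    # eq. 3.71 is increasing in P(y_j = 1, y_k = 1), so the bounds map directly
    return ((p11_low - mu_j * mu_k) / sd, (p11_high - mu_j * mu_k) / sd)
```

For example, with means of 0.5 and 0.5 the full range from −1 to 1 is available, but with means of 0.1 and 0.9 the correlation cannot exceed 1/9 (about 0.11).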

That means that the correlation is constrained to lie within limits that depend on the means of the data. On the other hand, the odds ratio does not have this restriction.

The odds is the ratio of the probability of success to the probability of failure. The odds that y_ij = 1 given that y_ik = 1 is then

$$\mathrm{Odds}(y_{ij};\ y_{ik}=1) = \frac{P(y_{ij}=1, y_{ik}=1)}{P(y_{ij}=0, y_{ik}=1)} \qquad (3.73)$$

and the odds that y_ij = 1 given that y_ik = 0 is

$$\mathrm{Odds}(y_{ij};\ y_{ik}=0) = \frac{P(y_{ij}=1, y_{ik}=0)}{P(y_{ij}=0, y_{ik}=0)} \qquad (3.74)$$

The odds ratio is the ratio of these two odds:

$$\mathrm{Odds\ Ratio}(y_{ij}, y_{ik}) = \psi_{ijk} = \frac{P(y_{ij}=1, y_{ik}=1)\,P(y_{ij}=0, y_{ik}=0)}{P(y_{ij}=1, y_{ik}=0)\,P(y_{ij}=0, y_{ik}=1)} \qquad (3.75)$$

Instead of estimating correlations with Pearson residuals, we can take every pairwise comparison of odds ratios and find the correlation of those measures. In doing so, it is apparent that a method may be derived to obtain the estimated correlation by fitting a logistic regression model to the pairwise odds ratios (at each step of the optimization).

Recall the outline of the PA-GEE estimation from the previous subsection. In this approach we change the manner in which α is estimated by specifying an alternate estimating equation for those ancillary parameters. Instead of estimating correlation coefficients from Pearson residuals, we find the odds-ratio estimate for each of the parameters of the specified correlation matrix. In other words, the log odds ratios are used in a logistic regression to estimate the correlation matrix.

The following notation is complicated by the need to address the combinatoric origin of the values that enter the estimating equations. We let γ_ijk denote the log odds ratio between the outcomes y_ij and y_ik (it is the log of ψ_ijk in equation 3.75). We let μ_ij = P(y_ij = 1) and ν_ijk = P(y_ij = 1, y_ik = 1). We then have

$$\mathrm{logit}\, P(y_{ij}=1 \mid y_{ik}) = \gamma_{ijk}\, y_{ik} + \ln\left(\frac{\mu_{ij} - \nu_{ijk}}{1 - \mu_{ij} - \mu_{ik} + \nu_{ijk}}\right) \qquad (3.76)$$

Note that in analyzing the log odds ratio estimates there is a $\binom{n_i}{2}$ vector of values.
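To make equations 3.73 through 3.76 concrete, the following Python sketch (names ours) computes the two conditional odds and their ratio from the four cell probabilities of a pair, and separately evaluates the right-hand side of equation 3.76 from the two means and ν_ijk. For any valid 2 × 2 table the two agree: the conditional logit for y_ik = 1 equals the log of the odds in 3.73, and for y_ik = 0 it equals the log of the odds in 3.74.

```python
import math


def odds_and_odds_ratio(p11, p10, p01, p00):
    """Odds (eqs. 3.73, 3.74) and odds ratio (eq. 3.75) for a pair (y_j, y_k).

    p10 denotes P(y_j = 1, y_k = 0), p01 denotes P(y_j = 0, y_k = 1), etc.
    """
    odds_given_1 = p11 / p01  # Odds(y_j; y_k = 1)
    odds_given_0 = p10 / p00  # Odds(y_j; y_k = 0)
    return odds_given_1, odds_given_0, odds_given_1 / odds_given_0


def conditional_logit(y_k, mu_j, mu_k, nu_jk):
    """Right-hand side of eq. 3.76: logit P(y_j = 1 | y_k)."""
    gamma = math.log(nu_jk * (1 - mu_j - mu_k + nu_jk)
                     / ((mu_j - nu_jk) * (mu_k - nu_jk)))
    return gamma * y_k + math.log((mu_j - nu_jk) / (1 - mu_j - mu_k + nu_jk))
```

For the table with P(1,1) = 0.3, P(1,0) = 0.1, P(0,1) = 0.2, and P(0,0) = 0.4, the means are 0.4 and 0.5 with ν = 0.3; the two odds are 1.5 and 0.25, and the odds ratio is 6.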


Let ζ_i be the $\binom{n_i}{2}$ vector with elements

$$\zeta_{ijk} = \mathrm{logit}^{-1}\left\{\gamma_{ijk}\, y_{ik} + \ln\left(\frac{\mu_{ij} - \nu_{ijk}}{1 - \mu_{ij} - \mu_{ik} + \nu_{ijk}}\right)\right\} \qquad (3.77)$$

where γ_ijk is parameterized as Zα, and Z is a ($\binom{n_i}{2}$ × q) known covariate matrix that defines the relationship between pairs of observations in terms of the appropriate elements of α. Overall, this approach involves a second estimating equation such that Ψ(θ) = [Ψ_β(β,α)^T, Ψ_α(β,α)^T]^T is given by

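$$\Psi(\beta,\alpha)_{((p+q)\times 1)} = \left(\Psi_\beta(\beta,\alpha)_{(p\times 1)},\ \Psi_\alpha(\beta,\alpha)_{(q\times 1)}\right)^T \qquad (3.78)$$

$$= \left(\sum_{i=1}^{n}\mathbf{X}_i^T D\left(\frac{\partial\mu}{\partial\eta}\right)\left[V(\mu_i)\right]^{-1}\left(\frac{\mathbf{y}_i-\mu_i}{a(\phi)}\right),\ \sum_{i=1}^{n}\left(\frac{\partial\zeta_i}{\partial\alpha}\right)^T D\left(\zeta_{ijk}(1-\zeta_{ijk})\right)^{-1}\left(\mathbf{y}_i' - \zeta_i\right)\right)^T \qquad (3.79)$$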

$$V(\mu_i) = D\left(V(\mu_{it})\right)^{1/2} R(\alpha)\, D\left(V(\mu_{it})\right)^{1/2} \qquad (3.80)$$

where q is the total number of parameters needed for α to represent the desired correlation matrix structure, and y_i' is the ($\binom{n_i}{2}$ × 1) vector constructed from y_i; the construction is such that it matches the indices of ζ_i.

It turns out that using pairwise odds ratios (instead of Pearson residuals) results in two (assumed to be orthogonal) estimating equations that can be efficiently calculated by combining a PA-GEE regression-step estimate with a logistic regression. Because the algorithm alternates between a PA-GEE step and a logistic regression, it is called alternating logistic regression, or ALR. SAS users can estimate models with this alternative method for estimating the ancillary parameters via an option on the REPEATED statement in PROC GENMOD. To be clear, the REPEATED statement specifies either CORR= or LOGOR= to request correlations estimated using Pearson residuals or using log odds ratios, respectively.
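One way to picture the logistic-regression step of ALR is to build, for each panel, one row per pair (j, k) with j < k: the response is y_ij, the covariate y_ik carries the log odds ratio as its coefficient, and the second term of equation 3.76 enters as an offset. The Python sketch below does this for a single exchangeable log odds ratio γ. Solving equation 3.75 for ν_ijk given the two means gives a quadratic, and the root used here is the one that lies within the limits of equation 3.72. The helper names are hypothetical, the logistic regression fit itself is not shown, and a full implementation would also carry the design γ_ijk = Zα.

```python
import math
from itertools import combinations


def joint_success_prob(mu_j, mu_k, gamma):
    """nu_jk = P(y_j = 1, y_k = 1) implied by the means mu_j, mu_k and the
    log odds ratio gamma (root of the quadratic from solving eq. 3.75)."""
    psi = math.exp(gamma)
    if abs(psi - 1.0) < 1e-12:
        return mu_j * mu_k  # odds ratio 1 means independence
    a = psi - 1.0
    b = 1.0 + a * (mu_j + mu_k)
    return (b - math.sqrt(b * b - 4.0 * a * psi * mu_j * mu_k)) / (2.0 * a)


def alr_pairs(y, mu, gamma):
    """Hypothetical helper: rows (response, covariate, offset) for one panel.

    For each pair j < k the response is y_j, the covariate is y_k, and the
    offset is ln{(mu_j - nu_jk) / (1 - mu_j - mu_k + nu_jk)} from eq. 3.76.
    """
    rows = []
    for j, k in combinations(range(len(y)), 2):
        nu = joint_success_prob(mu[j], mu[k], gamma)
        offset = math.log((mu[j] - nu) / (1.0 - mu[j] - mu[k] + nu))
        rows.append((y[j], y[k], offset))
    return rows
```

A panel with n_i observations contributes $\binom{n_i}{2}$ rows, matching the length of ζ_i and y_i'. When γ = 0 the offset reduces to logit(μ_ij), as expected under independence.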

To summarize, the ALR approach has the following characteristics:

• The approach leads to estimates and standard errors for the regression coefficients and the log odds ratios.
• β and α are assumed to be orthogonal.
• Even though we gain insight into the association parameters, since β and α are assumed to be orthogonal, the estimating equation is still a GEE1.


We first fit an exchangeable correlation PA-GEE model using the data on prenatal care (see section 3.6.3).

GEE population-averaged model                   Number of obs      =      2449
Group variable:                        mom      Number of groups   =      1558
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       1.6
Correlation:                  exchangeable                     max =         4
                                                Wald chi2(4)       =    237.57
Scale parameter:                         1      Prob > chi2        =    0.0000

                              (standard errors adjusted for clustering on mom)
------------------------------------------------------------------------------
             |             Semi-robust
      prenat | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      indSpa |   .4059843    .053526    -6.84   0.000      .313534     .525695
     husProf |    1.73484   .2837106     3.37   0.001     1.259089    2.390355
      toilet |   3.639483   .5323812     8.83   0.000     2.732288    4.847891
      ssDist |   .9865406   .0024238    -5.52   0.000     .9818014    .9913027
------------------------------------------------------------------------------

The output in terms of the regression coefficients is

                                (standard errors adjusted for clustering on mom)
------------------------------------------------------------------------------
             |             Semi-robust
      prenat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      indSpa |  -.9014407   .1318425    -6.84   0.000    -1.159847   -.6430341
     husProf |   .5509154    .163537     3.37   0.001     .2303888     .871442
      toilet |   1.291842   .1462793     8.83   0.000     1.005139    1.578544
      ssDist |  -.0135508   .0024569    -5.52   0.000    -.0183662   -.0087353
       _cons |   .0862278   .0907927     0.95   0.342    -.0917227    .2641783
------------------------------------------------------------------------------
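The two tables are related by a simple transformation: each odds ratio is $\exp(\hat\beta)$, its standard error is the delta-method value $\exp(\hat\beta)\,\mathrm{se}(\hat\beta)$, and the confidence limits are the exponentiated coefficient limits. As a quick check, the following Python sketch reproduces the odds-ratio table from the coefficient table. It is our illustration only: `odds_ratio_row` is a hypothetical helper, and the inputs are copied from the output above.

```python
import math

# Coefficients and semi-robust standard errors copied from the PA-GEE output above
COEFS = {
    "indSpa": (-0.9014407, 0.1318425),
    "husProf": (0.5509154, 0.163537),
    "toilet": (1.291842, 0.1462793),
    "ssDist": (-0.0135508, 0.0024569),
}
Z_975 = 1.959964  # 0.975 quantile of the standard normal distribution


def odds_ratio_row(b, se):
    """Return (odds ratio, delta-method std. err., lower 95%, upper 95%)."""
    odds = math.exp(b)
    return odds, odds * se, math.exp(b - Z_975 * se), math.exp(b + Z_975 * se)


for name, (b, se) in COEFS.items():
    print("{:8s} {:10.7f} {:10.7f} {:10.6f} {:10.6f}".format(name, *odds_ratio_row(b, se)))
```

Exponentiating the interval endpoints, rather than building a symmetric interval around the odds ratio, is what keeps the reported limits asymmetric on the odds-ratio scale.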

This model estimates $\alpha = .7812$. Fitting the same exchangeable correlation model using ALR instead of the usual moment estimates from the Pearson residuals results in

                  Analysis Of GEE Parameter Estimates
                   Empirical Standard Error Estimates

                          Standard   95% Confidence
Parameter     Estimate       Error        Limits             Z   Pr > |Z|

Intercept       0.0873      0.0908    -0.0906    0.2652    0.96    0.3362
indspa         -0.9014      0.1318    -1.1597   -0.6430   -6.84    <.0001
husprof         0.5537      0.1635     0.2332    0.8742    3.39    0.0007
toilet          1.2934      0.1464     1.0065    1.5803    8.84    <.0001
ssdist         -0.0135      0.0025    -0.0184   -0.0087   -5.51    <.0001
Alpha1          4.5251      0.2646     4.0066    5.0437   17.10    <.0001

where Alpha1 = 4.5251 is the common (exchangeable) log odds ratio. It is important to emphasize that the two approaches estimate the common association on different quantities: the moment estimate ($\alpha = .7812$) is a correlation of the Pearson residuals, whereas Alpha1 is a log odds ratio between pairs of responses. It may be tempting to convert the results of this approach to the interpretation of the moment estimate of the residuals. We do not do this. If we want to calculate a PA-GEE model based on a correlation structure for the residuals, then we do that directly.

The standard errors in the output are in terms of the sandwich estimate of variance. The estimating equations are solved separately since the two estimating equations are assumed to be orthogonal. In this output, the sandwich estimate of variance has been calculated for both of the estimating equations. We do not recommend calculating the naive variance of the estimating equation of the association parameters along with the sandwich estimate of variance for the regression parameters.

Since this model places more emphasis on the estimation of the association parameters, we should consider methods for calculating diagnostics for this part of the estimation problem in addition to our focus on the regression parameters. Methods for calculating leverage and influence are discussed in Chapter 4.

3.2.6 Summary

Liang and Zeger use the name population-averaged GEE to emphasize that the generalization of the original estimating equation focuses on the marginal distribution.

The PA-GEE for GLMs is best understood by focusing on what we did and what we did not do in deriving the algorithm. First, we did not start with a probability-based model, nor even a likelihood. We used the estimating equation for a pooled model and then generalized the estimation of the panel-level variance through ancillary parameters. There is an implied quasilikelihood from the GEE model, which may or may not coincide with a probability-based model. Second, note that the model was extended assuming a correlation structure that was estimated by combining information across panels. We estimate the ancillary parameters to get a working correlation matrix. Using that matrix (which is applied to each panel), we then estimate the regression coefficient vector $\beta$. Thus, we are focusing on the marginal distribution, where the panels are summed together after taking into account the correlation. In effect, we are averaging over the panels. The resulting $\beta$ is called the population-averaged estimator for this reason.

For example, consider a logit model where we model the presence of respiratory illness of a child at each of several doctor's visits on whether the mother smokes (also collected at each visit). The estimated parameter for the smoking status of the mother is a measure of the effect of second-hand (maternal) smoke on respiratory illness averaged over all instances of smoking (1/0). This is without regard to a specific mother or child.


We rewrite the PA-GEE more formally as

$$\Psi(\beta) = \sum_{i=1}^{n} \Psi_i\big[\beta, \alpha(\beta, \phi(\beta)), \phi(\beta)\big] \qquad (3.81)$$

in order to emphasize that

• The GEE that we solve uses moment estimates of both $\phi$ and $\alpha$.
• The moment estimates for the ancillary parameters depend on $\beta$.
• $\beta$ and $\alpha$ are assumed to be orthogonal.
• There is a SAS macro available that allows one to fit heteroskedastic GEEs (in some situations). The extension assumes that $a(\phi) = \phi_i$ such that a different error variance is estimated for each panel.*

We also note that

• The basic structure of the GEE is the independence model, that is, the LIMQL estimating equation for pooled GLMs.
• We do not obtain standard errors for ancillary parameters when using the original models of Liang and Zeger (section 3.2). This is not an issue if the focus of our analysis is only the regression parameters $\beta$. ALR or another GEE model is preferred if the focus of our analysis does include the association parameters.
• Using the modified sandwich estimate of variance means that the standard errors for $\beta$ are robust to misspecification of the assumed correlation structure (parameterized by $\alpha$).
• The misspecification of $\alpha$ affects only the efficiency of our regression estimates, but not their consistency.
• The relative gain in using a PA-GEE model over the independence model is relatively small if the panels are small in size. We recommend the independence model when there are fewer than 30 panels in a dataset.
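The points listed above can be made concrete with a small sketch. The Python code below is our illustration, not the algorithm of any particular package, and it makes several assumptions: a Gaussian family with identity link (so $V(\mu) = 1$ and the Pearson residuals are the raw residuals), an exchangeable working correlation, and simple moment estimators of $\phi$ and $\alpha$ with no degrees-of-freedom correction for $\alpha$. On each pass the ancillary estimates are recomputed from the current $\beta$ and then held fixed while $\beta$ is updated; the modified sandwich estimate of variance is built from the panel-level score contributions. All function names are hypothetical.

```python
import numpy as np


def moment_estimates(r, groups, ids, p):
    """Moment estimates of phi and exchangeable alpha from Pearson residuals r."""
    phi = r @ r / (r.size - p)
    cross, pairs = 0.0, 0
    for g in ids:
        rg = r[groups == g]
        cross += (rg.sum() ** 2 - rg @ rg) / 2.0  # sum of r_is * r_it over s < t
        pairs += rg.size * (rg.size - 1) // 2
    alpha = cross / (pairs * phi) if pairs else 0.0
    return phi, alpha


def exch_inverse(k, phi, alpha):
    """Inverse of the working covariance phi * [(1 - alpha) I + alpha J J^T]."""
    R = (1.0 - alpha) * np.eye(k) + alpha * np.ones((k, k))
    return np.linalg.inv(phi * R)


def pa_gee_exchangeable(y, X, groups, max_iter=100, tol=1e-10):
    """Hypothetical PA-GEE sketch: Gaussian family, identity link, exchangeable R."""
    y, X, groups = np.asarray(y, float), np.asarray(X, float), np.asarray(groups)
    ids = np.unique(groups)
    p = X.shape[1]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the pooled (independence) fit
    for _ in range(max_iter):
        phi, alpha = moment_estimates(y - X @ beta, groups, ids, p)  # depend on beta
        A, b = np.zeros((p, p)), np.zeros(p)
        for g in ids:
            m = groups == g
            W = exch_inverse(int(m.sum()), phi, alpha)
            A += X[m].T @ W @ X[m]
            b += X[m].T @ W @ y[m]
        new_beta = np.linalg.solve(A, b)  # solve the GEE for beta with phi, alpha fixed
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if done:
            break
    # Modified sandwich: A^{-1} (sum of outer products of panel scores) A^{-1}
    phi, alpha = moment_estimates(y - X @ beta, groups, ids, p)
    A, B = np.zeros((p, p)), np.zeros((p, p))
    for g in ids:
        m = groups == g
        W = exch_inverse(int(m.sum()), phi, alpha)
        score = X[m].T @ W @ (y[m] - X[m] @ beta)
        A += X[m].T @ W @ X[m]
        B += np.outer(score, score)
    A_inv = np.linalg.inv(A)
    return beta, phi, alpha, A_inv @ B @ A_inv


# Simulated panels with a shared random intercept (true beta = (1, 2), alpha = 0.5)
rng = np.random.default_rng(0)
n_panels, k = 200, 4
groups = np.repeat(np.arange(n_panels), k)
x = rng.normal(size=n_panels * k)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(size=n_panels)[groups] + rng.normal(size=x.size)
beta, phi, alpha, V_sand = pa_gee_exchangeable(y, X, groups)
print(beta, phi, alpha, np.sqrt(np.diag(V_sand)))
```

With the identity link the update for $\beta$ is a generalized least squares step, which is why no inner Newton loop is needed; other links would require an IRLS step in its place.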

We alluded to various nuances in the construction of the sandwich estimate of variance. Principally, we noted that using the expected Hessian to construct the sandwich estimate of variance could give numerically different results from using the observed Hessian whenever we do not specify the canonical link function. Software implementations may choose either approach. The expected Hessian is typically used (and documented as the method of Fisher scoring) in a straightforward implementation of the IRLS algorithm. The two approaches may also be mixed (as done by default in SAS and optionally in Stata) for the estimation of generalized linear models. SAS users should see the documentation for PROC GENMOD, noting the documentation for the EXPECTED and SCORING options of the MODEL statement. Stata users should consult the documentation for the glm command, noting the documentation for the irls and fisher options.

For completeness, we point out that a generalized least squares (GLS) approach can also lead to valid estimates for this class of models. Lipsitz, Laird, and Harrington (1992) present an approach based on generalized least squares. These estimates are asymptotically equivalent to the PA-GEE results and are similar to the one-step PA-GEE approximation, that is, the approximation resulting from only one iteration of the fitting algorithm.

* http://www.statlab.uni-heidelberg.de/statlib/GEE/GEE1/GEE1202.DOC is the current source of information on the macro at the time of writing this text. The macro was written by Ulrike Grömping.
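The difference between building the sandwich estimate of variance from the expected and from the observed Hessian can be seen numerically. The Python sketch below is ours and rests on simplifying assumptions: a pooled Bernoulli GLM on simulated data (independent observations, so the middle of the sandwich is the sum of outer products of per-observation scores), Fisher scoring for the fit, and a central finite-difference approximation to the observed Hessian instead of analytic second derivatives. For the canonical logit link the two sandwich standard errors agree; for the non-canonical probit link they differ slightly. All names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


# inverse link g^{-1} and its derivative d mu / d eta
LINKS = {
    "logit": (expit, lambda e: expit(e) * (1.0 - expit(e))),
    "probit": (norm_cdf, norm_pdf),
}


def scores(beta, X, y, link):
    """Per-observation Bernoulli score vectors (n x p)."""
    inv, dinv = LINKS[link]
    eta = X @ beta
    mu = inv(eta)
    return X * ((y - mu) * dinv(eta) / (mu * (1.0 - mu)))[:, None]


def expected_info(beta, X, link):
    """Expected information: sum of (dmu/deta)^2 / V(mu) * x x^T."""
    inv, dinv = LINKS[link]
    eta = X @ beta
    mu = inv(eta)
    w = dinv(eta) ** 2 / (mu * (1.0 - mu))
    return X.T @ (X * w[:, None])


def observed_info(beta, X, y, link, h=1e-6):
    """Negative Hessian via central differences of the summed score."""
    p = beta.size
    H = np.zeros((p, p))
    for j in range(p):
        e = np.zeros(p)
        e[j] = h
        up = scores(beta + e, X, y, link).sum(axis=0)
        down = scores(beta - e, X, y, link).sum(axis=0)
        H[:, j] = (up - down) / (2.0 * h)
    return -H


def fit(X, y, link, n_iter=50):
    """Fisher scoring (IRLS using the expected Hessian)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        step = np.linalg.solve(expected_info(beta, X, link), scores(beta, X, y, link).sum(axis=0))
        beta = beta + step
    return beta


def sandwich_se(info, S):
    bread = np.linalg.inv(info)
    return np.sqrt(np.diag(bread @ (S.T @ S) @ bread))


rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
y = (rng.uniform(size=500) < norm_cdf(0.3 + 0.8 * x)).astype(float)

results = {}
for link in ("logit", "probit"):
    b = fit(X, y, link)
    S = scores(b, X, y, link)
    results[link] = (sandwich_se(expected_info(b, X, link), S),
                     sandwich_se(observed_info(b, X, y, link), S))
    print(link, results[link])
```

For the logit link the Hessian does not involve $y$, so the observed and expected versions coincide; for the probit link the observed Hessian carries an extra term weighted by the residuals $y - \mu$, which is the source of the discrepancy.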

3.3 The SS-GEE for GLMs

The subject-specific versions of GEEs for GLMs also extend the LIMQL estimating equation for pooled GLMs. They have the same origin as the population averaged models. However, we hypothesize that there is some underlying distribution for random effects in the model that serves as the genesis of the within-panel correlation.

As such, there are three items we must address to build models for these GEEs:

1. We must choose a distribution for the random effect.
2. We must derive the expected value, which depends on the link function and the distribution of the random effect.
3. We must derive the variance, which is a function of the usual variance and the random effect.

Recall the forms of the expected value (mean) for the parametric random-effects models from the beginning of our presentation. We must follow this same approach when deriving the expected values for these SS-GEEs. Whether the integral defining the expected value has an analytic solution or must be solved numerically depends on the choice of the random-effects distribution and the form of the link function (how the expected value is parameterized). Formally, we have

$$\mu_{it}^{SS} = E(y_{it}) = E\left[E(y_{it} \mid \nu_i)\right] = \int f(\nu_i)\, g^{-1}(x_{it}\beta + \nu_i)\, d\nu_i \qquad (3.82)$$

This formulation assumes a single random effect $\nu_i$. In fact, we may have several random effects such that

$$\mu_{it}^{SS} = E(y_{it}) = E\left[E(y_{it} \mid \nu_i)\right] = \int f(\nu_i)\, g^{-1}(x_{it}\beta + z_{it}\nu_i)\, d\nu_i \qquad (3.83)$$

where $z_{it}$ is a vector of covariates associated with the random effects, and $f$ is the multivariate density of the random effect vector $\nu_i$. Focusing on a single random effect $\nu_i$, we emphasize that for the PA-GEEs we have

$$
\begin{aligned}
g(\mu_{it}^{PA}) &= x_{it}\beta^{PA} && (3.84)\\
\mu_{it}^{PA} = E(y_{it}) &= g^{-1}(x_{it}\beta^{PA}) && (3.85)\\
V^{PA}(y_{it}) &= V(\mu_{it}^{PA})\, a(\phi) && (3.86)
\end{aligned}
$$

and for the SS-GEEs we have

$$
\begin{aligned}
g(\mu_{it}^{SS}) &= x_{it}\beta^{SS} + \nu_i && (3.87)\\
\mu_{it}^{SS} &= \int f(\nu_i)\, g^{-1}(x_{it}\beta^{SS} + \nu_i)\, d\nu_i && (3.88)\\
V^{SS}(y_i)_{st} &= \int \left[g^{-1}(x_{is}\beta^{SS} + \nu_i) - \mu_{is}^{SS}\right]\left[g^{-1}(x_{it}\beta^{SS} + \nu_i) - \mu_{it}^{SS}\right] df(\nu_i)\\
&\qquad + a(\phi)\, I(s = t) \int V(\mu_{it}^{SS})\, df(\nu_i) && (3.89)
\end{aligned}
$$

where $f$ is the distribution of the random effect $\nu$. The variance matrix for the $i$th subject is defined in terms of its $(s, t)$ entry.

It is important to see that $g^{-1}(\mu_{it}^{PA})$ is not equal to $g^{-1}(\mu_{it}^{SS})$ unless $g()$ is the identity function and the expected value of the random effect is zero (as it is for Gaussian random effects).

The SS-GEE for GLMs is estimated using the same LIMQL estimating equation for pooled GLMs, but we substitute equation 3.88 for $\mu_{it}$ and equation 3.89 for $V(\mu_{it})$. The difficulty of this substitution depends on the link function and the random-effects distribution $f$. Some, but not all, choices lead to an analytic solution of the integral equations.
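When the integrals in equations 3.88 and 3.89 have no analytic solution, they can be approximated numerically. The following Python sketch, our illustration with a hypothetical function name, evaluates both for a single panel with a Gaussian random effect using the Gauss-Hermite rule in NumPy. It follows equation 3.89 as written, so the diagonal term is $a(\phi)V(\mu_{it}^{SS})$. For the Poisson/log case used in the example, the results can be checked against the closed forms $\mu_{it}^{SS} = \exp(x_{it}\beta^{SS} + \sigma_\nu^2/2)$ and $\mu_{is}^{SS}\mu_{it}^{SS}(e^{\sigma_\nu^2} - 1)$ for the random-effect covariance.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def ss_panel_moments(eta, inv_link, var_fn, a_phi, sigma, n_nodes=40):
    """Approximate equations 3.88 and 3.89 for one panel.

    eta      : array of fixed linear predictors x_it beta^SS, t = 1..n_i
    inv_link : g^{-1};  var_fn : GLM variance function V(mu);  a_phi : a(phi)
    sigma    : standard deviation of the Gaussian random effect nu_i
    """
    nodes, weights = hermgauss(n_nodes)  # rule for the weight exp(-x^2)
    nu = np.sqrt(2.0) * sigma * nodes    # change of variables to N(0, sigma^2)
    w = weights / np.sqrt(np.pi)         # rescaled weights sum to one
    cond_mean = inv_link(eta[:, None] + nu[None, :])  # g^{-1}(x_it beta + nu)
    mu = cond_mean @ w                                # equation 3.88
    dev = cond_mean - mu[:, None]
    cov = (dev * w) @ dev.T                           # first integral of 3.89
    cov = cov + a_phi * np.diag(var_fn(mu))           # a(phi) I(s=t) V(mu_it^SS)
    return mu, cov


# Poisson model with log link: g^{-1} = exp, V(mu) = mu, a(phi) = 1
eta = np.array([0.2, 0.5, 1.0])
mu, cov = ss_panel_moments(eta, np.exp, lambda m: m, a_phi=1.0, sigma=0.8)
print(mu)
print(cov)
```

Quadrature is used here only because it is simple and accurate for smooth integrands against a Gaussian density; any numerical integration scheme could stand in its place.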

3.3.1 Single random-effects

To admit comparison with the population averaged models already covered, we consider a single random effect $\nu_i$ following the Gaussian distribution with mean zero and variance $\sigma_\nu^2$. The expression for the marginal mean is relatively easy to calculate or approximate for the standard link functions. For the identity link,

$$
\begin{aligned}
\mu_{it}^{SS} = E(y_{it}) &= \int g^{-1}(x_{it}\beta^{SS} + \nu_i)\, dF(\nu_i) && (3.90)\\
&= \int (x_{it}\beta^{SS} + \nu_i)\, \frac{1}{\sqrt{2\pi\sigma_\nu^2}} \exp\left(-\frac{\nu_i^2}{2\sigma_\nu^2}\right) d\nu_i && (3.91)\\
&= x_{it}\beta^{SS} && (3.92)
\end{aligned}
$$

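For the log link,

$$
\begin{aligned}
\mu_{it}^{SS} = E(y_{it}) &= \int g^{-1}(x_{it}\beta^{SS} + \nu_i)\, dF(\nu_i) && (3.93)\\
&= \int \exp(x_{it}\beta^{SS} + \nu_i)\, \frac{1}{\sqrt{2\pi\sigma_\nu^2}} \exp\left(-\frac{\nu_i^2}{2\sigma_\nu^2}\right) d\nu_i && (3.94)\\
&= \exp\left(x_{it}\beta^{SS} + \frac{\sigma_\nu^2}{2}\right) && (3.95)\\
&= g^{-1}\left(x_{it}\beta^{SS} + \frac{\sigma_\nu^2}{2}\right) && (3.96)
\end{aligned}
$$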

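For the probit link,

$$
\begin{aligned}
\mu_{it}^{SS} = E(y_{it}) &= \int_{-\infty}^{\infty} \left[\int_{-\infty}^{x_{it}\beta^{SS} + \nu_i} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz\right] \frac{1}{\sqrt{2\pi\sigma_\nu^2}} \exp\left(-\frac{\nu_i^2}{2\sigma_\nu^2}\right) d\nu_i && (3.97)\\
&= \Phi\left(\frac{x_{it}\beta^{SS}}{\sqrt{1 + \sigma_\nu^2}}\right) && (3.98)
\end{aligned}
$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function.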

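The logit link has no closed-form solution, but may be approximated by

$$
\begin{aligned}
\mu_{it}^{SS} = E(y_{it}) &= \int g^{-1}(x_{it}\beta^{SS} + \nu_i)\, dF(\nu_i) && (3.99)\\
&= \int \frac{\exp(x_{it}\beta^{SS} + \nu_i)}{1 + \exp(x_{it}\beta^{SS} + \nu_i)}\, \frac{1}{\sqrt{2\pi\sigma_\nu^2}} \exp\left(-\frac{\nu_i^2}{2\sigma_\nu^2}\right) d\nu_i && (3.100)\\
&\approx \frac{\exp\left(x_{it}\beta^{SS} / \sqrt{1 + c^2\sigma_\nu^2}\right)}{1 + \exp\left(x_{it}\beta^{SS} / \sqrt{1 + c^2\sigma_\nu^2}\right)} && (3.101)
\end{aligned}
$$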

where $c = 16\sqrt{3}/(15\pi)$.

The most important result is that of equation 3.92, showing the equivalence of the identity link parameterization for the subject-specific and population-averaged approaches. This result means that coefficients for models fit with the identity link have both a subject-specific and a population-averaged interpretation. This result is true for any distribution of the random effect for which the expected value is zero.
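Equations 3.92, 3.95, 3.98, and 3.101 can be checked numerically. The Python sketch below is our illustration, not taken from any package: it computes $E[g^{-1}(x_{it}\beta^{SS} + \nu_i)]$ by Gauss-Hermite quadrature for assumed values of the fixed linear predictor and of $\sigma_\nu$, and compares each result with the corresponding closed form. The identity, log, and probit expressions should agree to quadrature accuracy, while the logit expression is only an approximation, so a small discrepancy is expected.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

_erf = np.vectorize(math.erf)


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + _erf(np.asarray(z, float) / math.sqrt(2.0)))


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def marginal_mean(inv_link, eta, sigma, n_nodes=60):
    """E[g^{-1}(eta + nu)] for nu ~ N(0, sigma^2) via Gauss-Hermite quadrature."""
    x, w = hermgauss(n_nodes)
    return float(np.sum(w * inv_link(eta + math.sqrt(2.0) * sigma * x)) / math.sqrt(math.pi))


C = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
eta, sigma = 0.7, 1.2  # assumed values of x_it * beta^SS and sigma_nu

checks = {
    # link: (numerical integral, closed form from the text)
    "identity": (marginal_mean(lambda e: e, eta, sigma), eta),                             # 3.92
    "log": (marginal_mean(np.exp, eta, sigma), math.exp(eta + sigma**2 / 2)),              # 3.95
    "probit": (marginal_mean(Phi, eta, sigma), float(Phi(eta / math.sqrt(1 + sigma**2)))),  # 3.98
    "logit": (marginal_mean(expit, eta, sigma),
              float(expit(eta / math.sqrt(1 + C**2 * sigma**2)))),                        # 3.101
}
for link, (numeric, closed) in checks.items():
    print(f"{link:8s} numeric={numeric:.6f} closed-form={closed:.6f}")
```

Only the identity link returns $x_{it}\beta^{SS}$ unchanged; the other links show how the random effect moves the marginal mean away from $g^{-1}(x_{it}\beta^{SS})$.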

Ideally, we would be able to derive similar simple expressions for the variance function as well, but this is not possible except when the mean is parameterized with the identity (linear) link. However, we need only derive an approximation of the variance $V(y_i)$, and so we consider a Taylor series expansion of equation 3.89 to obtain

$$
\begin{aligned}
V(y_i) &\approx V\left[g^{-1}(x_{it}\beta^{SS}) + \frac{\partial g^{-1}(x_{it}\beta^{SS})}{\partial\eta}\,\nu_i\right] + \phi\, E\left\{V\left[g^{-1}(x_{it}\beta^{SS}) + \frac{\partial g^{-1}(x_{it}\beta^{SS})}{\partial\eta}\,\nu_i\right]\right\} && (3.102)\\
&\approx D\left(\frac{\partial\mu^{SS}}{\partial\eta}\right) \sigma_\nu^2\, JJ^{\mathrm{T}}\, D\left(\frac{\partial\mu^{SS}}{\partial\eta}\right) + \phi\, V(\mu^{SS}) && (3.103)
\end{aligned}
$$

where $J$ is a vector of indicator variables (ones). The first term is a matrix of variance components for the random effect, and the second term is a matrix of the dispersion parameter and the usual variance for a GLM.

As in the population averaged models, we can use simple moment estimators for the unknown ancillary parameters.
a2 v

n

i=1

En

XS)

[(Yi -

( ~it

1

i=1

ni

i=1

t=1

(

Yi -

'SS)J

l-ti

SS) -

2 Q2v

- hitSS ) ) V (lists

(3.104) (3.104) (3.105) (3.105)
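To gauge the quality of the first-order approximation in equation 3.103, consider the log link, where the random-effect part of the covariance is available exactly: the covariance of $\exp(x_{is}\beta^{SS} + \nu_i)$ and $\exp(x_{it}\beta^{SS} + \nu_i)$ is $\mu_{is}^{SS}\mu_{it}^{SS}(e^{\sigma_\nu^2} - 1)$. The Python sketch below, our illustration with hypothetical function names, compares this with the first term of 3.103, evaluating the derivatives $\partial g^{-1}/\partial\eta$ at $x_{it}\beta^{SS}$. The ratio of exact to approximate is $e^{\sigma_\nu^2}(e^{\sigma_\nu^2} - 1)/\sigma_\nu^2$, which is close to one only when $\sigma_\nu$ is small.

```python
import numpy as np


def taylor_re_cov(eta, sigma):
    """First term of (3.103) for the log link: D(dmu/deta) sigma^2 J J^T D(dmu/deta)."""
    d = np.exp(eta)  # d g^{-1} / d eta evaluated at x_it beta^SS
    return sigma**2 * np.outer(d, d)


def exact_re_cov(eta, sigma):
    """Exact covariance of exp(eta + nu) across occasions, nu ~ N(0, sigma^2)."""
    mu = np.exp(eta + sigma**2 / 2)  # marginal means from equation 3.95
    return np.outer(mu, mu) * np.expm1(sigma**2)


eta = np.array([0.0, 0.5, 1.0])
for sigma in (0.1, 0.5, 1.0):
    ratio = exact_re_cov(eta, sigma) / taylor_re_cov(eta, sigma)
    print(sigma, ratio[0, 1])
```

The ratio does not depend on $x_{it}\beta^{SS}$, so the quality of the approximation is governed entirely by the size of the random-effect variance.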

3.3.2 Multiple random-effects

Considering single random-effects allows us to more closely compare the resulting models for the SS-GEE and PA-GEE approaches. However, the SS-GEE allows a richer collection of models in that we can consider multiple random effects. For completeness, we vectorize the equations of interest for this set of models, assuming a Gaussian (normal) distribution for the random vector with mean 0 and variance matrix $\Sigma_\nu$.

For the identity link and a random vector of length $q$,

$$
\begin{aligned}
\mu_{it}^{SS} = E(y_{it}) &= \int g^{-1}(x_{it}\beta^{SS} + \nu_{it})\, dF(\nu_{it}) && (3.106)\\
&= \int (x_{it}\beta^{SS} + \nu_{it})\,(2\pi)^{-q/2}\,|\Sigma_\nu|^{-1/2} \exp\left(-\tfrac{1}{2}\nu_{it}^{\mathrm{T}}\Sigma_\nu^{-1}\nu_{it}\right) d\nu_{it} && (3.107)\\
&= x_{it}\beta^{SS} && (3.108)
\end{aligned}
$$

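For the log link,

$$
\begin{aligned}
\mu_{it}^{SS} = E(y_{it}) &= \int g^{-1}(x_{it}\beta^{SS} + \nu_{it})\, dF(\nu_{it}) && (3.109)\\
&= \int e^{(x_{it}\beta^{SS} + \nu_{it})}\,(2\pi)^{-q/2}\,|\Sigma_\nu|^{-1/2} \exp\left(-\tfrac{1}{2}\nu_{it}^{\mathrm{T}}\Sigma_\nu^{-1}\nu_{it}\right) d\nu_{it} && (3.110)\\
&= \exp\left(x_{it}\beta^{SS} + \tfrac{1}{2}\nu_{it}^{\mathrm{T}}\Sigma_\nu\nu_{it}\right) && (3.111)
\end{aligned}
$$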

For the probit link,

$$
\begin{aligned}
\mu_{it}^{SS} = E(y_{it}) &= \int g^{-1}(x_{it}\beta^{SS} + \nu_{it})\, dF(\nu_{it}) && (3.112)\\
&= \int \left[\int_{-\infty}^{x_{it}\beta^{SS} + \nu_{it}} (2\pi)^{-q/2}\,|I|^{-1/2} \exp\left(-\tfrac{1}{2}z_{it}^{\mathrm{T}}I^{-1}z_{it}\right) dz_{it}\right] (2\pi)^{-q/2}\,|\Sigma_\nu|^{-1/2} \exp\left(-\tfrac{1}{2}\nu_{it}^{\mathrm{T}}\Sigma_\nu^{-1}\nu_{it}\right) d\nu_{it} && (3.113)\\
&= \Phi\left(x_{it}\beta^{SS}\,|I + \Sigma_\nu|^{-q/2}\right) && (3.114)
\end{aligned}
$$

The logit link has no closed-form solution, but may be approximated by

$$
\begin{aligned}
\mu_{it}^{SS} = E(y_{it}) &= \int g^{-1}(x_{it}\beta^{SS} + \nu_{it})\, dF(\nu_{it}) && (3.115)\\
&= \int \frac{\exp(x_{it}\beta^{SS} + \nu_{it})}{1 + \exp(x_{it}\beta^{SS} + \nu_{it})}\,(2\pi)^{-q/2}\,|\Sigma_\nu|^{-1/2} \exp\left(-\tfrac{1}{2}\nu_{it}^{\mathrm{T}}\Sigma_\nu^{-1}\nu_{it}\right) d\nu_{it} && (3.116)\\
&\approx \frac{\exp\left(x_{it}\beta^{SS}\,|I + c^2\Sigma_\nu|^{-q/2}\right)}{1 + \exp\left(x_{it}\beta^{SS}\,|I + c^2\Sigma_\nu|^{-q/2}\right)} && (3.117)
\end{aligned}
$$

where $c = 16\sqrt{3}/(15\pi)$.

Taking into account the more general matrix notation associated with the random effects, the variance $V(y_i)$ is derived as

$$
\begin{aligned}
V(y_i) &\approx V\left[g^{-1}(x_{it}\beta^{SS}) + \frac{\partial g^{-1}(x_{it}\beta^{SS})}{\partial\eta}\,\nu_i\right] + \phi\, E\left\{V\left[g^{-1}(x_{it}\beta^{SS}) + \frac{\partial g^{-1}(x_{it}\beta^{SS})}{\partial\eta}\,\nu_i\right]\right\} && (3.118)\\
&\approx D\left(\frac{\partial\mu^{SS}}{\partial\eta}\right) \nu_i\Sigma_\nu\nu_i^{\mathrm{T}}\, D\left(\frac{\partial\mu^{SS}}{\partial\eta}\right) + \phi\, V(\mu^{SS}) && (3.119)
\end{aligned}
$$

3.3.3 Applications of the SS-GEE

This section presents a linear regression model to illustrate the SS-GEE. Most software packages offer several competing methods for fitting this model, and we wish to highlight the equivalence of the PA-GEE and SS-GEE for this particular case.

For the case of a single Gaussian distributed random effect, we can derive the exact solution for the variance as

$$
\begin{aligned}
V^{SS}(y_i)_{st} &= \int \left[(\mu_{is} + \nu_i) - \mu_{is}\right]\left[(\mu_{it} + \nu_i) - \mu_{it}\right] df(\nu_i) + \phi\, I(s = t) \int df(\nu_i) && (3.120)\\
&= \sigma_\nu^2\, JJ^{\mathrm{T}} + D(\phi) && (3.121)
\end{aligned}
$$

such that the panel variance is given by

$$
V = \begin{pmatrix}
\phi + \sigma_\nu^2 & \sigma_\nu^2 & \sigma_\nu^2 & \cdots\\
\sigma_\nu^2 & \phi + \sigma_\nu^2 & \sigma_\nu^2 & \cdots\\
\sigma_\nu^2 & \sigma_\nu^2 & \phi + \sigma_\nu^2 & \cdots\\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\quad\text{or}\quad
V = (\phi + \sigma_\nu^2)\begin{pmatrix}
1 & \rho & \rho & \cdots\\
\rho & 1 & \rho & \cdots\\
\rho & \rho & 1 & \cdots\\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix},
\quad \rho = \frac{\sigma_\nu^2}{\phi + \sigma_\nu^2} \qquad (3.122)
$$
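The equivalence behind equation 3.122 is easy to confirm numerically. In the Python sketch below (an illustration with hypothetical names and arbitrary values of $\phi$ and $\sigma_\nu^2$), the SS panel variance $\sigma_\nu^2 JJ^{\mathrm{T}} + \phi I$ is compared with an exchangeable PA working covariance whose scale is $\phi + \sigma_\nu^2$ and whose common correlation is $\rho = \sigma_\nu^2/(\phi + \sigma_\nu^2)$.

```python
import numpy as np


def ss_panel_variance(k, phi, sigma2):
    """Equation 3.121 for a panel of size k: sigma_nu^2 J J^T + phi I."""
    J = np.ones((k, 1))
    return sigma2 * (J @ J.T) + phi * np.eye(k)


def pa_exchangeable_variance(k, scale, rho):
    """Exchangeable working covariance: scale * [(1 - rho) I + rho J J^T]."""
    return scale * ((1.0 - rho) * np.eye(k) + rho * np.ones((k, k)))


phi, sigma2, k = 1.5, 0.5, 4
V_ss = ss_panel_variance(k, phi, sigma2)
V_pa = pa_exchangeable_variance(k, phi + sigma2, sigma2 / (phi + sigma2))
print(np.allclose(V_ss, V_pa))  # True
```

The match holds for any panel size $k$, because both matrices have the same diagonal ($\phi + \sigma_\nu^2$) and the same constant off-diagonal ($\sigma_\nu^2$).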

This is, in fact, the same hypothesized structure as the exchangeable correlation PA-GEE that we observed for population averaged models. The two models are equivalent since both use the identity link, for which the PA and SS parameterizations coincide (equation 3.92). However, the computed estimates may differ depending on the method used to estimate the panel variance components.

Since the two models are the same, there is no compelling reason to calculate this particular SS model; software already exists for the equivalent PA model. In addition, the software packages referred to in this text also include the means to estimate an equivalent FIML model (Gaussian distributed random-effects linear regression). Fitting the PA model allows us an interpretation under either the PA or the SS model assumptions.

We can still derive the appropriate panel-level variance component if we want to fit a linear regression model with more than one random effect. Such a model is equivalent to a mixed model that is supported by other software (see PROC MIXED in SAS or the user-contributed gllamm command in Stata). Under these conditions, it is again not compelling to go through the trouble of programming the SS-GEE since software exists for the equivalent models.

On the other hand, we can consider programming the resulting estimator if we wish to fit a log-linear regression model (using the log link rather than the identity link). We already saw (equation 3.95) that the random effects induce an offset of $\sigma_\nu^2/2$ in the linear predictor, so that the SS model differs from a similar population averaged model.

Another illustration is the SS-GEE Poisson model for the Progabide data. We saw earlier that the maximum likelihood Gaussian random effects model results in

Random-effects poisson                          Number of obs      =       295
Group variable (i): id                          Number of groups   =        59

Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                LR chi2(3)         =     33.40
Log likelihood  = -1017.4249                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .0468768     2.39   0.017     .0199591    .2037129
   progabide |  -.5396357   .0545001    -9.90   0.000    -.6464539   -.4328174
   timeXprog |  -.1047258   .0650304    -1.61   0.107     -.232183    .0227314
       _cons |   1.124231   .0430617    26.11   0.000     1.039831     1.20863
    lnPeriod |   (offset)
-------------+----------------------------------------------------------------
    /lnsig2u |  -.8970602   .0495843   -18.09   0.000    -.9942437   -.7998767
-------------+----------------------------------------------------------------
     sigma_u |   .6385661   .0158314                      .6082788    .6703614
         rho |    .289655   .0102022                      .2700747    .3100519
------------------------------------------------------------------------------
Likelihood ratio test of rho=0: chibar2(01) = 2602.16 Prob>=chibar2 = 0.000
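The sigma_u and rho rows in the output above are not separate estimates. They are transformations of the single ancillary estimate /lnsig2u, the log of the random-effects variance. The Python sketch below recomputes them. We assume the relations sigma_u = exp(lnsig2u/2) and rho = sigma_u^2/(sigma_u^2 + 1) because they reproduce the printed values. They are our reading of the output, not formulas quoted from the Stata documentation.

```python
import math

# Values copied from the Gaussian random-effects Poisson output above.
lnsig2u = -0.8970602                     # estimate of log(sigma_u^2)
lnsig2u_ci = (-0.9942437, -0.7998767)    # its 95% confidence limits


def sigma_from_lnsig2(v):
    """sigma_u = sqrt(exp(lnsig2u)) = exp(lnsig2u / 2)."""
    return math.exp(v / 2.0)


def rho_from_sigma(s):
    """Assumed definition: share s^2 / (s^2 + 1) of latent variance due to u_i."""
    return s * s / (s * s + 1.0)


sigma_u = sigma_from_lnsig2(lnsig2u)
rho = rho_from_sigma(sigma_u)
# A monotone transformation maps confidence limits endpoint by endpoint.
sigma_ci = tuple(sigma_from_lnsig2(v) for v in lnsig2u_ci)

print(f"sigma_u = {sigma_u:.7f}, rho = {rho:.6f}")
print("sigma_u 95% CI:", tuple(round(s, 7) for s in sigma_ci))
```

Transforming the endpoints of the interval for /lnsig2u, rather than using sigma_u plus or minus a multiple of its standard error, is what yields the asymmetric limits shown for sigma_u.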

The Gaussian random effects results above are calculated using a straightforward (non-adaptive) Gauss-Hermite quadrature approximation of the likelihood, gradient, and Hessian. This application offers another opportunity for us to illustrate the sensitivity of this approximation for rough functions. Using an adaptive quadrature approximation, we obtain the following results:

------------------------------------------------------------------------------
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .1118361   .0468766     2.39   0.017     .0199597    .2037125
   progabide |   .0051622   .0530336     0.10   0.922    -.0987817    .1091062
   timeXprog |   -.104726   .0650299    -1.61   0.107    -.2321823    .0227303
       _cons |   1.069857   .0480689    22.26   0.000     .9756434     1.16407
    lnPeriod |   (offset)
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------

***level 2 (id)

    var(1): .2970534 (.01543218)
------------------------------------------------------------------------------
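The sketch below shows why node placement matters. It approximates the marginal log likelihood of a single panel under a Gaussian random-intercept Poisson model in three ways:

- non-adaptive Gauss-Hermite quadrature, with nodes fixed around zero;
- adaptive Gauss-Hermite quadrature, with nodes recentered at the mode of the integrand and rescaled by its curvature;
- a brute-force grid, used as a reference.

The panel counts and the fixed linear predictor are hypothetical, chosen to make the integrand sharply peaked. Only sigma is borrowed from the output above. This illustrates the numerical issue; it is not a reproduction of what xtpoisson or gllamm compute internally.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Hypothetical panel (NOT the Progabide data): five large counts, so the
# conditional likelihood is sharply peaked far from nu = 0.
y = np.array([40.0, 35.0, 52.0, 44.0, 38.0])
eta = np.full(5, math.log(3.0))   # fixed part x_it * beta, treated as known
sigma = 0.6385661                 # random-effect SD, borrowed from the output above
LOG_Y_FACT = sum(math.lgamma(v + 1.0) for v in y)


def cond_loglik(nu):
    """Poisson log likelihood of the panel given random intercept(s) nu."""
    lin = eta + np.asarray(nu, dtype=float)[..., None]   # broadcast over the panel
    return (y * lin - np.exp(lin)).sum(axis=-1) - LOG_Y_FACT


def log_prior(nu):
    """Log density of N(0, sigma^2) at nu."""
    return -0.5 * (nu / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def ghq_plain(n_nodes):
    """Non-adaptive Gauss-Hermite: nodes centered at 0 and scaled by sigma."""
    x, w = hermgauss(n_nodes)                 # rule for the weight exp(-x^2)
    nu = math.sqrt(2.0) * sigma * x           # change of variables nu = sqrt(2)*sigma*x
    return np.logaddexp.reduce(np.log(w) + cond_loglik(nu)) - 0.5 * math.log(math.pi)


def ghq_adaptive(n_nodes, newton_steps=50):
    """Adaptive Gauss-Hermite: nodes centered at the mode of likelihood x prior."""
    s_y, s_mu = y.sum(), np.exp(eta).sum()
    mode = math.log(s_y / s_mu)               # starting value (requires sum(y) > 0)
    for _ in range(newton_steps):             # Newton steps on log(f * prior)
        grad = s_y - s_mu * math.exp(mode) - mode / sigma**2
        hess = -s_mu * math.exp(mode) - 1.0 / sigma**2
        mode -= grad / hess
    scale = 1.0 / math.sqrt(s_mu * math.exp(mode) + 1.0 / sigma**2)
    x, w = hermgauss(n_nodes)
    nu = mode + math.sqrt(2.0) * scale * x
    log_terms = np.log(w) + x**2 + cond_loglik(nu) + log_prior(nu)
    return math.log(math.sqrt(2.0) * scale) + np.logaddexp.reduce(log_terms)


def grid_reference(lo=-10.0, hi=10.0, n=200_001):
    """Brute-force trapezoidal rule on a fine grid, used as the reference value."""
    nu = np.linspace(lo, hi, n)
    f = cond_loglik(nu) + log_prior(nu)
    m = f.max()
    g = np.exp(f - m)
    dx = nu[1] - nu[0]
    return m + math.log(dx * (g.sum() - 0.5 * (g[0] + g[-1])))


ref = grid_reference()
for k in (5, 20):
    print(f"plain GHQ, {k:2d} nodes:    {ghq_plain(k):.4f}")
print(f"adaptive GHQ, 10 nodes: {ghq_adaptive(10):.4f}")
print(f"grid reference:         {ref:.4f}")
```

With five fixed nodes, none lands near the peak of the integrand, so the non-adaptive value badly understates the reference. The adaptive rule matches the reference closely with only a handful of nodes.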

The most striking difference between the two maximum likelihood models is the change in sign for the progabide variable. This difference is a reflection of the sensitivity of the straightforward Gauss-Hermite quadrature approximation used in the first model.

Finally, we can fit a SS-GEE Poisson model to the data for comparison:

-------------------------------------
    seizures |      Coef.   Std. Err.
-------------+-----------------------
        time |  .11843141   .04901529
   progabide |  .02746961   .00899939
   timeXprog | -.10386892   .07324486
       _cons |  1.3499396   .00669060
    lnPeriod |   (offset)
-------------+-----------------------
   sigma^2_v |  .15731536
-------------------------------------

The results differ from both of the approximated maximum likelihood methods given before. In fact, the model is not a good choice for the data regardless of the calculation, since there is evidence of overdispersion that we have not addressed. One of the points we have emphasized in this section is that even though we can program a SS-GEE model, it is often unnecessary. The SS-GEE model includes the assumption that the estimating equation for the variance parameters is orthogonal to (uncorrelated with) the estimating equation for the regression parameters. Most software packages that fit these models by maximum likelihood do not assume that the two are uncorrelated.

We can fit a maximum likelihood gamma distributed random effects model to these data. For this dataset, the estimated covariance matrix of the gamma distributed random effects model is nearly zero in the entries between the random effects parameter and the regression parameters. Zero covariance in those entries is exactly what the orthogonality assumption of the SS-GEE model enforces. The results of this model agree more closely with the SS-GEE results than did those of the Gaussian distributed random effects models. To be clear, the gamma distributed random effects model does not impose an assumption of orthogonality. The small estimated covariances between the estimating equations for the regression and variance parameters are particular to this dataset; they are not a general result.

Random-effects Poisson                          Number of obs      =       295
Group variable (i): id                          Number of groups   =        59

Random effects u_i ~ Gamma                      Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                Wald chi2(3)       =      5.73
Log likelihood  = -1017.3826                    Prob > chi2        =    0.1253

------------------------------------------------------------------------------
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .0468768     2.39   0.017     .0199591    .2037129
   progabide |   .0275345   .2108952     0.13   0.896    -.3858125    .4408815
   timeXprog |  -.1047258   .0650304    -1.61   0.107     -.232183    .0227314
       _cons |   1.347609   .1529187     8.81   0.000     1.047894    1.647324
    lnPeriod |   (offset)
-------------+----------------------------------------------------------------
    /lnalpha |   -.474377   .1731544                     -.8137534   -.1350007
-------------+----------------------------------------------------------------
       alpha |   .6222726   .1077492                      .4431915    .8737153
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0: chibar2(01) = 2602.24 Prob>=chibar2 = 0.000

3.3.4 Estimating the SS-GEE model

The complete SS-GEE is given by

$$\Psi(\beta,\alpha) = \sum_{i=1}^{n} X_i^{T}\, D\!\left(\frac{\partial\mu_i}{\partial\eta}\right)\left(V(\mu_i)\right)^{-1}\left(\frac{y_i-\mu_i}{a(\phi)}\right) \tag{3.123}$$

$$\mu_i = \int f(\nu_i)\, g^{-1}\!\left(x_{it}\beta^{SS}+\nu_i\right) d\nu_i \tag{3.124}$$

$$V(\mu_i) = D\!\left(\frac{\partial\mu}{\partial\eta^{SS}}\right)\nu_i\,\Sigma_\nu(\alpha)\,\nu_i^{T}\,D\!\left(\frac{\partial\mu}{\partial\eta^{SS}}\right) + \phi\, V(\mu^{SS}) \tag{3.125}$$

$$\Sigma_\nu(\alpha) = \text{parameterized variance matrix} \tag{3.126}$$
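A minimal NumPy sketch of (3.125) for a single panel follows. We read ν_i in (3.125) as the random-effects design, which is a column of ones for a random intercept. We read φV(μ^SS) as the diagonal matrix of variance-function values. Both readings, and the function name, are our assumptions. For a log-link Poisson model, ∂μ/∂η = μ and V(μ) = μ.

```python
import numpy as np


def ss_panel_variance(dmu_deta, var_mu, Z, Sigma_nu, phi=1.0):
    """Approximate covariance of one panel's responses, following (3.125).

    dmu_deta : (n_i,) values of d mu / d eta at the current estimates
    var_mu   : (n_i,) values of the GLM variance function V(mu)
    Z        : (n_i, r) random-effects design (our reading of nu_i in 3.125)
    Sigma_nu : (r, r) random-effects covariance Sigma_nu(alpha)
    phi      : dispersion parameter
    """
    D = np.diag(dmu_deta)
    return D @ Z @ Sigma_nu @ Z.T @ D + phi * np.diag(var_mu)


# Hypothetical random-intercept Poisson panel with log link:
# d mu / d eta = mu and V(mu) = mu.
mu = np.array([1.0, 2.0])
V = ss_panel_variance(mu, mu, np.ones((2, 1)), np.array([[0.5]]))
print(V)
```

The off-diagonal entries depend on the means. The within-panel correlation implied by this approximation is therefore not exchangeable under the log link unless the means within a panel are equal.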

Specification of the second estimating equation $\Psi(\alpha,\beta)$ for calculating the components of $\Sigma_\nu(\alpha)$ for specific subject-specific models is left to the reader. At each step in the usual GLM algorithm, we first estimate the variance components $V(\mu)$, and then use that result to estimate $\beta$. At the initial step we can assume a diagonal matrix for the variances. Calculation of the components of the variance matrix involves estimating the dispersion parameter.

As in the case of the PA-GEE models, we can calculate the sandwich estimate instead of the naive variance estimate. Doing so further protects us from misspecification of the within-panel correlation assumption implied by the variance component. Recall that the modified sandwich estimate of variance is robust to any form of within-panel correlation. Thus, we gain efficiency in our estimated $\beta$ if we have the correct form of the within-panel covariances, and we are protected from misspecification if we are wrong.

The modified sandwich estimate of variance for the complete estimating equation is found using the same approach as was used for PA-GEE models:

$$V_{MS} = A^{-1} B A^{-T} \tag{3.127}$$

$$A = \begin{bmatrix} -\dfrac{\partial\Psi_\beta}{\partial\beta} & -\dfrac{\partial\Psi_\beta}{\partial\alpha} \\ -\dfrac{\partial\Psi_\alpha}{\partial\beta} & -\dfrac{\partial\Psi_\alpha}{\partial\alpha} \end{bmatrix} \tag{3.128}$$

$$B = \sum_{i=1}^{n}\left(\sum_{t=1}^{n_i}\begin{pmatrix}\Psi_{\beta it}\\ \Psi_{\alpha it}\end{pmatrix}\right)\left(\sum_{t=1}^{n_i}\begin{pmatrix}\Psi_{\beta it}\\ \Psi_{\alpha it}\end{pmatrix}\right)^{T} \tag{3.129}$$

Again, since we assume that the two estimating equations are orthogonal, we can write

$$A = \begin{bmatrix} -\dfrac{\partial\Psi_\beta}{\partial\beta} & 0 \\ 0 & -\dfrac{\partial\Psi_\alpha}{\partial\alpha} \end{bmatrix} \tag{3.130}$$

This assumption of orthogonality leads to a block diagonal modified sandwich estimate of variance, where the upper left entry is given by

$$\left(-\frac{\partial\Psi_\beta}{\partial\beta}\right)^{-1}\left[\sum_{i=1}^{n}\left(\sum_{t=1}^{n_i}\Psi_{\beta it}\right)\left(\sum_{t=1}^{n_i}\Psi_{\beta it}\right)^{T}\right]\left(-\frac{\partial\Psi_\beta}{\partial\beta}\right)^{-T} \tag{3.131}$$

The lower right entry is given by

$$\left(-\frac{\partial\Psi_\alpha}{\partial\alpha}\right)^{-1}\left[\sum_{i=1}^{n}\left(\sum_{t=1}^{n_i}\Psi_{\alpha it}\right)\left(\sum_{t=1}^{n_i}\Psi_{\alpha it}\right)^{T}\right]\left(-\frac{\partial\Psi_\alpha}{\partial\alpha}\right)^{-T} \tag{3.132}$$

Because we are interested only in the regression parameters, and since the matrix is block diagonal, there is no need to calculate variances for the ancillary parameters. The modified sandwich estimate of variance for the regression parameters is the upper $p\times p$ part of $V_{MS}$ given by equation 3.131. The modified sandwich estimate of variance for the association parameters is the lower $q\times q$ part of $V_{MS}$ given in equation 3.132. The variance of the association parameters is not calculated in the approach given in section 3.2, though the formula is valid. The variance of the association parameters is calculated in the ALR approach (section 3.2.5) as well as in other GEE models.

We advise that all users specify the modified sandwich estimate of variance with this model. We are free to use either the expected Hessian or the observed Hessian, as we did for the PA-GEE model.
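Under the orthogonality assumption, each diagonal block in (3.131) and (3.132) can be computed separately from the per-observation contributions. The NumPy sketch below assumes that the matrix A for the block (minus the derivative of that block's estimating equation) has already been formed from either the expected or the observed Hessian. The function name and argument layout are ours. The toy data are hypothetical.

```python
import numpy as np


def modified_sandwich_block(A, psi, panel):
    """One diagonal block of A^{-1} B A^{-T}, as in (3.131) or (3.132).

    A     : (p, p) minus the derivative of the block's estimating equation
    psi   : (N, p) per-observation contributions Psi_it to that equation
    panel : (N,) panel identifiers i
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    psi = np.asarray(psi, dtype=float).reshape(len(panel), -1)
    ids, idx = np.unique(panel, return_inverse=True)
    # Sum the contributions within each panel: sum over t of Psi_it.
    U = np.zeros((len(ids), psi.shape[1]))
    np.add.at(U, idx, psi)
    B = U.T @ U                     # sum over i of (sum_t Psi_it)(sum_t Psi_it)^T
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T


# Toy check: estimating a common mean beta, Psi_it = y_it - beta, so that
# -dPsi/dbeta = N. Two hypothetical panels of size 2.
y = np.array([1.0, 2.0, 3.0, 5.0])
panel = np.array([1, 1, 2, 2])
beta_hat = y.mean()
V_beta = modified_sandwich_block([[len(y)]], (y - beta_hat)[:, None], panel)
print(V_beta)
```

The same function applied to the $\Psi_{\alpha it}$ contributions and $-\partial\Psi_\alpha/\partial\alpha$ gives the lower right block. Summing within panels before forming the outer products is what makes the estimate robust to within-panel correlation.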

3.3.5 Summary
The SS-GEE is not implemented as often as the PA-GEE model. The main reason is that alternatives such as maximum likelihood are typically available. Maximum likelihood methods, when available, estimate the same population parameter. All of the GEE models that assume orthogonality of the estimating equations for the regression parameters and the association parameters are called "GEE of order 1" or "GEE1."

We should also emphasize that the focus of the PA-GEE model already covered was the introduction of structured correlation. Since the marginal model is introduced directly into the estimation of the PA-GEE model, and we restrict attention to the within-panel correlation, most of the variance structures implied by the correlation cannot be generated, even approximately, from a random-effects model.
If the variance structures are a focus of the analysis and we believe a mixed model explains the data, we should focus attention on a SS-GEE (or an equivalent likelihood-based model) rather than a PA-GEE model.

3.4 The GEE2 for GLMs
Our discussion of the PA-GEE for GLMs included two estimating equations; one is for estimating $\beta$ and the other for estimating $\alpha$. Since we were not interested in the correlation parameters, and since we assumed that the two coefficient vectors were orthogonal, we focused only on the estimating equation for $\beta$, treating $\alpha$ as ancillary.

Formally, our overall GEE2 for GLMs addresses both of the parameter vectors and their associated estimating equations. In this case there is no assumption that the two estimating equations are orthogonal. The GEE2 may be written

$$\sum_{i=1}^{n}\begin{pmatrix}\dfrac{\partial\mu_i}{\partial\beta} & \dfrac{\partial\mu_i}{\partial\alpha}\\ \dfrac{\partial\sigma_i}{\partial\beta} & \dfrac{\partial\sigma_i}{\partial\alpha}\end{pmatrix}^{T}\begin{pmatrix}V(y_i,y_i) & V(y_i,s_i)\\ V(s_i,y_i) & V(s_i,s_i)\end{pmatrix}^{-1}\begin{pmatrix}y_i-\mu_i\\ s_i-\sigma_i\end{pmatrix}=\begin{pmatrix}0\\ 0\end{pmatrix} \tag{3.133}$$

What GEE1 does differently from GEE2 is to assume that the first two terms in the estimating equation are block diagonal (that is, zero matrices in the off-diagonal positions). It is therefore clear that GEE2 is a generalization of GEE1.

For GEE2, we not only have to provide a working correlation matrix for the regression parameters, but we must also provide a working covariance matrix for the correlation parameters. In other words, instead of making assumptions on the first two moments, we make assumptions on the first four moments. It is much more difficult to picture these assumptions and understand the constraints that result. It can be difficult to interpret the mean vector that includes a dependence on the association parameters in the GEE2 specification. In most applications the mean is defined only by the regression parameters, but the assumption that $\partial\mu_i/\partial\alpha \neq 0$ implies that the correlation is a function of the regression parameters $\beta$.

The question is how to define the estimating equations to model the association of the covariance in terms of both the regression parameters and the association parameters. The GEE2 specification is not often used because, to obtain a consistent estimator of the regression parameters, we must correctly specify the link function as well as the covariance function. If we are willing to assume that a block diagonal parameterization is correct, the procedure yields consistent estimation of the regression parameters even if the association is incorrectly specified. Certain models have been proposed, with appropriate proofs, dealing with the consistency and distribution of the results. Results for the asymptotic covariance matrix and consistency of the estimators are given in Zhao and Prentice (1990), Prentice and Zhao (1991), and Gourieroux and Monfort (1993).

A sandwich estimate of variance is constructed in the usual way and illustrates that the $A^{-1}$ matrix in the definition of the sandwich estimate of variance $A^{-1}BA^{-T}$ is not necessarily symmetric; it is block diagonal where each block is symmetric. As pointed out earlier, while the sandwich estimate of variance is robust to misspecification of the correlation structure for the PA-GEE model, it does not have this property for the GEE2 models. The reason is that the GEE2 models do not assume the orthogonality of the two estimating equations. Hence, in this situation the hypothesized correlation structure enters into the calculation of the sandwich estimate of variance.

For ease of presentation, note that the overall estimating equation may be written for $\theta = (\beta, \alpha)$ as

$$\Psi(\theta) = \left(\Psi_\beta(\beta,\alpha),\ \Psi_\alpha(\beta,\alpha)\right)^{T} \tag{3.134}$$

such that the sandwich estimate of variance is given by

$$V_{MS} = A^{-1} B A^{-T} \tag{3.135}$$

where

$$A = -\frac{\partial\Psi(\theta)}{\partial\theta} \tag{3.136}$$

and

$$B = \sum_{i=1}^{n}\left(\sum_{t=1}^{n_i}\Psi_{it}\right)\left(\sum_{t=1}^{n_i}\Psi_{it}\right)^{T} \tag{3.137}$$
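A toy numerical comparison shows how the off-diagonal blocks of A pull the $\alpha$ equation, and with it the hypothesized association model, into the variance of $\hat\beta$. The example has one regression parameter and one association parameter. The matrices are hypothetical, chosen only for illustration. With a block diagonal A, as under GEE1, the $\beta$ block of $A^{-1}BA^{-T}$ depends only on the $\beta$ blocks of A and B. With the full A of GEE2, it does not.

```python
import numpy as np


def sandwich(A, B):
    """Full sandwich A^{-1} B A^{-T}."""
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T


# Hypothetical 2x2 case: index 0 is beta, index 1 is alpha.
B = np.array([[1.0, 0.2],
              [0.2, 1.0]])
A_gee2 = np.array([[2.0, 0.5],     # nonzero off-diagonal blocks: the beta and
                   [0.3, 1.0]])    # alpha equations depend on each other
A_gee1 = np.diag(np.diag(A_gee2))  # same diagonal blocks, orthogonality imposed

v_beta_gee1 = sandwich(A_gee1, B)[0, 0]
v_beta_gee2 = sandwich(A_gee2, B)[0, 0]
# With block diagonal A, the beta block reduces to B_bb / A_bb^2 here.
print(v_beta_gee1, B[0, 0] / A_gee2[0, 0] ** 2)
print(v_beta_gee2)
```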

3.5 GEEs for extensions of GLMs
There have been notable extensions of classes of GEE models to multinomial data. Multinomial data are classified as data where the response variable takes on one of several distinct outcomes; the complete set of outcomes may or may not have a natural order.

3.5.1 Generalized logistic regression
The generalized logistic regression model assumes that the response counts of each covariate pattern have a multinomial distribution, where the multinomial counts for different covariate patterns are independent. Due to these assumptions, this type of model is usually called the multinomial logit model. Upon assuming that one of the outcomes is the reference outcome, the model simultaneously fits a logistic regression model comparing each of the other outcomes to the reference. As such, with $k$ possible outcomes, there are $(k-1)$ logistic regression vectors. For this model, the exponentiated coefficients are not always called odds ratios, since they denote the odds of being in category $j$ instead of the reference category; they are sometimes called relative risk ratios.

The SUDAAN package is the only one of the four packages used in this text that has support for fitting this model. This package may be used in standalone mode or as a callable PROC from the SAS package. Our examples use the SAS callable method. Shah, Barnwell, and Bieler (1997) can be referenced for documentation on using the software as well as statistical documentation on the methods implemented. We should mention that this particular package emphasizes the analysis of complex survey data (not covered in this text). As
discussed previously, SUDAAN software can be used without specification of the sampling frame.

In the following output we use a constructed dataset with 50 panels of size 10. Category 3 of the response variable is used as the reference.

Independence parameters have converged in 5 iterations
Step 1 parameters have converged in 8 iterations.

Number of observations read       :    500    Weighted count:    500
Observations used in the analysis :    500    Weighted count:    500
Denominator degrees of freedom    :     49

Maximum number of estimable parameters for the model is  6

File GEE contains 50 Clusters
50 clusters were used to fit the model
Maximum cluster size is 10 records
Minimum cluster size is 10 records

Sample and Population Counts for Response Variable Y
1: Sample Count    105    Population Count    105
2: Sample Count    260    Population Count    260
3: Sample Count    135    Population Count    135

Variance Estimation Method: Taylor Series (WR)
SE Method: Robust (Binder, 1983)
Working Correlations: Exchangeable
Link Function: Generalized Logit
Response variable Y: Y

----------------------------------------------------------------------------
| Independent Variables and Effects                                        |
----------------------------------------------------------------------------
| Y (log-odds)  |                     | Intercept  |     X1     |     X2     |
----------------------------------------------------------------------------
| 1 vs 3        | Beta Coeff.         |     0.36   |    -1.42   |    -1.27   |
|               | SE Beta             |     0.20   |     0.39   |     0.23   |
|               | T-Test B=0          |     1.77   |    -3.66   |    -5.57   |
|               | P-value T-Test B=0  |   0.0823   |   0.0006   |   0.0000   |
----------------------------------------------------------------------------
| 2 vs 3        | Beta Coeff.         |     1.32   |    -1.11   |    -0.91   |
|               | SE Beta             |     0.18   |     0.26   |     0.14   |
|               | T-Test B=0          |     7.39   |    -4.24   |    -6.69   |
|               | P-value T-Test B=0  |   0.0000   |   0.0001   |   0.0000   |
----------------------------------------------------------------------------

Correlation Matrix
------------------------------------------
Y                 1             2
------------------------------------------
1            0.0190
2           -0.0123        0.0272
------------------------------------------

The output includes the estimated correlation parameters for the exchangeable model. Note that there are three correlation parameters listed instead of the single parameter one might expect. This model is a simultaneous estimation of logistic regression models comparing outcomes to the reference category, such that we allow a different common correlation for the comparisons.
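To see how the two comparisons combine, the sketch below converts the estimated log-odds (each category versus reference category 3) from the output above into category probabilities using the usual baseline-category logit transformation. The covariate values X1 = X2 = 0 are arbitrary choices for illustration, and the name `category_probs` is hypothetical; this is not SUDAAN's own computation.

```python
import math

# Log-odds versus the reference category (Y = 3), from the SUDAAN output:
# (intercept, X1, X2) for each non-reference category.
COEFS = {
    1: (0.36, -1.42, -1.27),  # 1 vs 3
    2: (1.32, -1.11, -0.91),  # 2 vs 3
}


def category_probs(x1, x2):
    """Return {category: probability} implied by the generalized logits."""
    # Linear predictor for each non-reference category; the reference has eta = 0.
    etas = {k: b0 + b1 * x1 + b2 * x2 for k, (b0, b1, b2) in COEFS.items()}
    denom = 1.0 + sum(math.exp(eta) for eta in etas.values())
    probs = {k: math.exp(eta) / denom for k, eta in etas.items()}
    probs[3] = 1.0 / denom
    return probs


p = category_probs(0.0, 0.0)  # X1 = X2 = 0 chosen only for illustration
print({k: round(v, 3) for k, v in sorted(p.items())})
```

The three probabilities always sum to one because each non-reference category is modeled only relative to category 3.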

3.5.2 Cumulative logistic regression

An alternative approach assumes that the outcomes do have a natural ordering. In this approach, the cumulative logits are used to define the model. In addition to a single set of regression parameters, there are also (k - 1) cut points estimated for the k possible outcomes. The exponentiated regression coefficients are interpreted as odds ratios for being in a given category or lower. The response curves all have the same shape; the effects of the covariates are assumed to be the same for each of the cut points.

There is a danger of confusion with this model since different terminology is used in competing software manuals. SAS software labels this the cumulative logistic regression model, while Stata refers to it as the ordered logistic regression model. Stata offers no support for ordered models using the PA-GEE software. SAS will allow users to specify the model, but does not support any correlation structure except independence. Stata offers pooled ordered logistic and probit regression models via the ologit and oprobit commands. These two Stata commands also have the option to allow calculation of the sandwich estimate of variance. In addition to these estimators, SAS also supports the cumulative complementary log-log model, again only with the independent correlation structure specification.

To emphasize, we assume that there are k possible ordered outcomes. In addition to the coefficients of the model, cut points are also estimated to divide the range of the outcome into k categories. Some software packages will directly list the cut points and others will list "intercepts"; these intercepts are equal to the negatives of the cut points.

SAS Institute, Inc. (2000) includes a simple example to illustrate the cumulative logistic regression model. We have data on three different brands of ice cream. Included are the number of people (count) who ranked the ice cream (brand) on a Likert scale (taste) from 1 to 5: 1 = very bad, 2 = bad, 3 = average, 4 = good, 5 = very good. The outcome variable is the qualitative taste of the ice cream for each brand. The count variable is a replication style weight indicating the number of individuals assigning the associated taste category to that ice cream brand. Replication weights are sometimes called data compression weights or frequency weights.


The data are:

count   brand   taste
  70      1       5
  71      1       4
 151      1       3
  30      1       2
  46      1       1
  20      2       5
  36      2       4
 130      2       3
  74      2       2
  70      2       1
  50      3       5
  55      3       4
 140      3       3
  52      3       2
  50      3       1
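Because count is a frequency weight, a weighted fit on these 15 rows is equivalent to an unweighted fit on expanded data with one record per rater. A minimal Python sketch of that expansion (the names `ROWS` and `expand` are ours, for illustration only):

```python
from collections import Counter

# (count, brand, taste) rows of the ice cream data above.
ROWS = [
    (70, 1, 5), (71, 1, 4), (151, 1, 3), (30, 1, 2), (46, 1, 1),
    (20, 2, 5), (36, 2, 4), (130, 2, 3), (74, 2, 2), (70, 2, 1),
    (50, 3, 5), (55, 3, 4), (140, 3, 3), (52, 3, 2), (50, 3, 1),
]


def expand(rows):
    """Replicate each (brand, taste) pair `count` times: one record per rater."""
    return [(brand, taste) for count, brand, taste in rows for _ in range(count)]


records = expand(ROWS)
print(len(records))  # 1045 raters in total
print(sorted(Counter(brand for brand, _ in records).items()))  # [(1, 368), (2, 330), (3, 347)]
```

The total of 1,045 records is the number of observations reported by both software packages below.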

The cumulative logistic regression results obtained from SAS are displayed as:

Criteria For Assessing Goodness Of Fit
Criterion          DF       Value        Value/DF
Log Likelihood            -1564.2269

Analysis Of Parameter Estimates
                            Standard    Wald 95% Confidence     Chi-
Parameter   DF   Estimate     Error           Limits            Square   Pr > ChiSq
Intercept1   1    -1.4936    0.1585     -1.8041    -1.1830       88.85     <.0001
Intercept2   1    -0.5231    0.1485     -0.8142    -0.2320       12.40     0.0004
Intercept3   1     1.1981    0.1533      0.8977     1.4985       61.10     <.0001
Intercept4   1     2.0595    0.1631      1.7399     2.3792      159.47     <.0001
brand        1    -0.1932    0.0681     -0.3266    -0.0597        8.05     0.0045
Scale        0     1.0000    0.0000      1.0000     1.0000

The ordered logistic regression results obtained from Stata are:

Ordered logit estimates                       Number of obs   =       1045
                                              LR chi2(1)      =       8.08
                                              Prob > chi2     =     0.0045
Log likelihood = -1564.2269                   Pseudo R2       =     0.0026

------------------------------------------------------------------------------
       taste |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       brand |  -.1931679   .0680729    -2.84   0.005    -.3265883   -.0597475
-------------+----------------------------------------------------------------
       _cut1 |  -2.059537   .1630906          (Ancillary parameters)
       _cut2 |  -1.198094   .1532729
       _cut3 |   .5230929   .1485218
       _cut4 |   1.493574   .1584554
------------------------------------------------------------------------------
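The two sets of printed estimates can be reconciled numerically. The sketch below assumes Stata's ordered logit form, Pr(taste <= j | brand) = F(cut_j - beta * brand), with F the logistic CDF and brand entered as the numeric codes 1 to 3 (consistent with the single brand coefficient). It plugs in the printed estimates, recomputes the frequency-weighted log likelihood, and compares the negated cut points with the SAS intercepts. The helper names are hypothetical, and the reversed order of the match is our reading of the output (consistent with SAS having ordered the taste levels from 5 down to 1), not something stated in the text.

```python
import math

# (count, brand, taste) rows of the ice cream data.
ROWS = [
    (70, 1, 5), (71, 1, 4), (151, 1, 3), (30, 1, 2), (46, 1, 1),
    (20, 2, 5), (36, 2, 4), (130, 2, 3), (74, 2, 2), (70, 2, 1),
    (50, 3, 5), (55, 3, 4), (140, 3, 3), (52, 3, 2), (50, 3, 1),
]
CUTS = [-2.059537, -1.198094, 0.5230929, 1.493574]   # Stata _cut1 ... _cut4
BETA_BRAND = -0.1931679                              # Stata brand coefficient
SAS_INTERCEPTS = [-1.4936, -0.5231, 1.1981, 2.0595]  # SAS Intercept1 ... Intercept4


def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))


def taste_probs(brand):
    """P(taste = 1..5 | brand) with Pr(taste <= j) = F(cut_j - beta * brand)."""
    xb = BETA_BRAND * brand
    cum = [logistic_cdf(c - xb) for c in CUTS] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, 5)]


def log_likelihood(rows):
    """Frequency-weighted log likelihood: sum of count * log P(observed taste)."""
    return sum(n * math.log(taste_probs(b)[t - 1]) for n, b, t in rows)


ll = log_likelihood(ROWS)
print(round(ll, 4))  # should be close to the reported -1564.2269

# Negated cut points, taken in reverse order, line up with the SAS intercepts.
print([round(-c, 4) for c in reversed(CUTS)], SAS_INTERCEPTS)
```

Agreement of the recomputed log likelihood with the printed value is a useful check that the parameterization has been read correctly before interpreting signs.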


Corresponding likelihood models can be developed for each of the models within the standard GLM framework. For example, a random-effects ordered model can be fit assuming a Gaussian distribution for the random effects. Typically, software implementations of this model will use the Gauss-Hermite quadrature approximation that we have discussed. For the ice cream data analyzed earlier, we now assume that there is a random effect associated with each brand.

The results of fitting a Gaussian distributed random-effects ordered probit regression model to the ice cream data are:

Random-effects ordered probit estimates       Number of obs   =       1045
Log likelihood = -1548.3929

------------------------------------------------------------------------------
       taste |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cut1        |
       _cons |  -.9368743   .0479922   -19.52   0.000    -1.030937   -.8428114
-------------+----------------------------------------------------------------
_cut2        |
       _cons |  -.4278815   .0423327   -10.11   0.000    -.5108521   -.3449109
-------------+----------------------------------------------------------------
_cut3        |
       _cons |   .6603478   .0442729    14.92   0.000     .5735745    .7471211
-------------+----------------------------------------------------------------
_cut4        |
       _cons |   1.226776   .0524694    23.38   0.000     1.123938    1.329615
-------------+----------------------------------------------------------------
rho          |
       _cons |   .2286464   .0518036     4.41   0.000     .1271133    .3301795
------------------------------------------------------------------------------
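Because both fits report a log likelihood for the same 1,045 observations, the information criteria can be computed directly from the printed output. The sketch below assumes five estimated parameters in each model (four cut points plus the brand coefficient, or four cut points plus rho) and uses n = 1045 for BIC; other software may count parameters or the sample size differently (for example, using the number of panels for the random-effects model). Smaller values are preferred.

```python
import math

N_OBS = 1045  # observations in both fits (assumed to be the BIC sample size)
FITS = {
    # model: (log likelihood, number of estimated parameters)
    "ordered logit": (-1564.2269, 5),                  # 4 cut points + brand
    "random-effects ordered probit": (-1548.3929, 5),  # 4 cut points + rho
}


def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)


for name, (loglik, k) in FITS.items():
    print(f"{name:30s} AIC = {aic(loglik, k):8.2f}   BIC = {bic(loglik, k, N_OBS):8.2f}")
```

With equal parameter counts, both criteria simply rank the models by their log likelihoods.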

Both the ordered logit and the random-effects ordered probit models are likelihood-based, so we can choose between them using criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). Each model estimates five parameters (four cut points plus either the brand coefficient or rho), so the comparison reduces to the log likelihoods: AIC = -2 ln L + 2k is about 3138.45 for the ordered logit model and about 3106.79 for the random-effects ordered probit model. Using the AIC, we therefore prefer the random-effects ordered probit model over the ordered logit model.

3.6 Further developments and applications

Research continues in several areas of generalized estimating equations. We present an introduction to some of the recently proposed applications and theory in the following sections.

3.6.1 The PA-GEE for GLMs with measurement error

Here we describe a method for generating a valid variance estimate for the case of a PA-GEE with instrumental variables for measurement error. Obtaining point estimates from the model is relatively straightforward since we simply replace the endogenous regressors with the predicted values from OLS regressions. However, a valid variance estimate of the PA-GEE regression parameters must take into account the error associated with the instrumental variables regressions.

In this section, we introduce a notation varying from the usual notation associated with measurement error models. The usual notation involves naming individual matrices: Z for covariates measured without error, W for covariates measured with error, S for the instruments of W, and R for the augmented matrix of exogenous variables [Z S]. In order to avoid confusion between the measurement error notation and the usual notation associated with GLMs and PA-GEEs (the W weight matrix in the IRLS algorithm and the R working correlation matrix of the Liang and Zeger PA-GEE), we demote the measurement error matrix notational conventions to subscripts of the X matrix in the PA-GEE notation.

We begin with an $n \times p$ matrix of covariates measured without error given by the augmented matrix $X = (X_1\ X_2)$ and consider the case for which $X_1 = X_Z$, where $X_2$ is unobserved, and $X_W = X_2$ plus measurement error. $X_Z$ is an $n \times p_z$ matrix of covariates measured without error (possibly including a constant), and $X_W$ is an $n \times p_x$ ($p_z + p_x = p$) matrix of covariates with classical measurement error that estimates $X_2$. We wish to utilize an $n \times p_s$ (where $p_s \ge p_x$) matrix of instruments $X_S$ for $X_W$.

Greene (2000) discusses instrumental variables and provides a clear presentation to supplement the following concise description. The method of instrumental variables assumes that some subset $X_W$ of the independent variables is correlated with the error term in the model. In addition, we have a matrix $X_S$ of independent variables that are correlated with $X_W$ but uncorrelated with the error term. Using these relationships, we can construct an approximately consistent estimator that may be succinctly described. We estimate a regression for each of the independent variables (each column) of $X_W$ on the instruments and the independent variables not correlated with the error term, $(X_Z\ X_S)$. Predicted values are then obtained from each regression and substituted for the associated column of $X_W$ in the analysis of the PA-GEE of interest. This construction provides an approximately consistent estimator of the coefficients in the PA-GEE (it is consistent in the linear case).
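The two-stage construction can be sketched in a few lines of NumPy. Everything about the data is an assumption made for illustration: a simulated linear model with one covariate measured without error (`x1`), one error-prone covariate (`w`, a noisy version of the unobserved `x2`), and one instrument (`s`). With the identity link and an independence working correlation, the PA-GEE point estimates coincide with OLS, so the second stage is an ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 20_000

# Hypothetical data-generating process, chosen only for illustration.
x1 = rng.normal(size=n)                        # measured without error (X_Z)
s = rng.normal(size=n)                         # instrument (X_S)
x2 = 0.8 * s + 0.5 * x1 + rng.normal(size=n)  # true covariate, never observed
w = x2 + rng.normal(size=n)                    # observed with classical error (X_W)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)


def ols(X, t):
    """Least-squares coefficients of t on the columns of X."""
    return np.linalg.lstsq(X, t, rcond=None)[0]


ones = np.ones(n)
X_R = np.column_stack([ones, x1, s])           # exogenous variables (X_Z X_S)

# First stage: regress the error-prone column on X_R and keep the fitted values.
w_hat = X_R @ ols(X_R, w)

# Second stage: identity link + independence, so the PA-GEE fit is OLS.
beta_naive = ols(np.column_stack([ones, x1, w]), y)   # uses w directly
beta_iv = ols(np.column_stack([ones, x1, w_hat]), y)  # uses the first-stage proxy
print("naive:", beta_naive.round(3))
print("IV:   ", beta_iv.round(3))
```

Under this data-generating process the naive coefficient on w is attenuated toward roughly 2 x 1.64/2.64 (about 1.24), while the substituted fit recovers a value near the true 2. The standard errors from the second-stage fit, however, are not valid, which is the point taken up next.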

If we have access to the complete matrix of covariates measured without error (if we know $X_2$ instead of just $X_W$), we denote the linear predictor $\eta = \sum_{j=1}^{p} [X_1\ X_2]_j \beta_j$ and the associated derivative as $\partial\eta/\partial\beta_j = [X_1\ X_2]_j$. The estimating equation for $\beta_j$ is then

$$\sum_{i=1}^{n} \frac{y_i - \mu_i}{V(\mu_i)} \left(\frac{\partial\mu}{\partial\eta}\right)_i [X_1\ X_2]_{ji}.$$

However, since we do not know $X_2$, we use $X_R = (X_Z\ X_S)$ to denote the augmented matrix of exogenous variables, which combines the covariates measured without error and the instruments. We regress each of the $p_x$ components (each of the $p_x$ columns) of $X_W$ on $X_R$ to obtain an estimated $(p_z + p_s) \times 1$ coefficient vector $\gamma_j$ for $j = 1, \dots, p_x$. The naive variance estimate is not valid when instrumental variables are used. For this case, we must rely on other asymptotic estimates of the variance that take into account the instrumental variables regressions. One such estimate presented here is the sandwich estimate of variance.

While we are ultimately interested in $\beta$, we must consider all of the (nonancillary) parameters from the instrumental variables regressions in forming the associated variance matrix. It is not reasonable to assume that these two coefficient vectors are orthogonal.

To include this approach in a PA-GEE for instrumental variable GLMs, we must first write the full estimating equation for $\Theta = (\beta, \gamma, \alpha)$ as

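$$\Psi(\Theta) = \begin{pmatrix} \Psi_\beta(\beta, \gamma, \alpha) \\ \Psi_\gamma(\beta, \gamma, \alpha) \\ \Psi_\alpha(\beta, \gamma, \alpha) \end{pmatrix} \qquad (3.138)$$

where the individual estimating equations, including the matrix sizes, are given by

$$\Psi_\beta(\beta, \gamma, \alpha) = \left( \sum_{i=1}^{n} [X_Z\ \ X_R\gamma]_{ji}\, D\!\left(\frac{\partial\mu}{\partial\eta}\right)_i V(\mu_i)^{-1} \left(\frac{y_i - \mu_i}{\phi}\right) \right)_{j=1,\dots,p} \quad (p \times 1) \qquad (3.139)$$

$$\Psi_\gamma(\beta, \gamma, \alpha) = \begin{pmatrix} \left( \sum_{i=1}^{n} (X_{W_1} - X_R\gamma_1)_i\, X_{R,ji} \right)_{j=1,\dots,(p_z+p_s)} \\ \left( \sum_{i=1}^{n} (X_{W_2} - X_R\gamma_2)_i\, X_{R,ji} \right)_{j=1,\dots,(p_z+p_s)} \\ \vdots \\ \left( \sum_{i=1}^{n} (X_{W_{p_x}} - X_R\gamma_{p_x})_i\, X_{R,ji} \right)_{j=1,\dots,(p_z+p_s)} \end{pmatrix} \quad (p_x(p_z+p_s) \times 1) \qquad (3.140)$$

$$\Psi_\alpha(\beta, \gamma, \alpha) = \sum_{i=1}^{n} \left(\frac{\partial\xi_i}{\partial\alpha}\right)^{T} H_i^{-1} (W_i - \xi_i) \quad (q \times 1) \qquad (3.141)$$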

Estimation is performed in two stages. First we run the OLS regressions for the instrumental variables. Predicted values are obtained from the OLS regressions and used as proxies for the appropriate variables in the generalized linear model of interest. Subsequent estimation is the same as in PA-GEE. An estimate of $\beta$ is obtained, followed by estimates of the ancillary parameters $\alpha$ and $\phi$. We alternate estimation between the coefficient vector and the ancillary parameters until subsequent estimates of $\beta$ are within some convergence criterion.

The inverse of the upper $p \times p$ submatrix of $A$ is a naive variance estimate that is not valid for the GLM regression parameters $\beta$, since it assumes that the fitted values from the OLS instrumental variables' regressions are true (without error). In addition, that matrix assumes that $\beta$ and $\gamma$ are orthogonal. In most cases, the instruments that we use in the OLS are a subset of the GLM covariates, so that this assumption is untenable. A valid variance estimate may be obtained using the modified sandwich estimate of variance given in this section. Our goal is to calculate the modified sandwich estimate of variance given by $V_{MS} = A^{-1} B A^{-T}$. We form the matrix $A$ for $\Theta$ by obtaining the necessary derivatives, where both $\beta$ and $\gamma$ are each assumed to be orthogonal to $\alpha$, but $\beta$ and $\gamma$ are not assumed to be orthogonal to each other.

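$$A^{-1} = \begin{pmatrix} \left(-\frac{\partial\Psi_\beta}{\partial\beta}\right)_{p \times p} & \left(-\frac{\partial\Psi_\beta}{\partial\gamma}\right)_{p \times p_x(p_z+p_s)} & 0 \\ 0 & \left(-\frac{\partial\Psi_\gamma}{\partial\gamma}\right)_{p_x(p_z+p_s) \times p_x(p_z+p_s)} & 0 \\ 0 & 0 & \left(-\frac{\partial\Psi_\alpha}{\partial\alpha}\right)_{q \times q} \end{pmatrix}^{-1} \qquad (3.142)$$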

The (2,1) submatrix of $A$ is zero since $\beta$ does not enter the $\Psi_\gamma$ estimating equation for the OLS regressions, but the (1,2) submatrix is not zero since $\gamma$ enters the $\Psi_\beta$ estimating equation through the predicted values of the OLS regressions. The other zero submatrices are the result of assumptions of orthogonality, though one could develop a GEE2 without this assumption. The middle of the modified sandwich estimate of variance is given by

$$B = \sum_{i=1}^{n} \left( \sum_{t=1}^{n_i} \begin{pmatrix} \Psi_{\beta,it} \\ \Psi_{\gamma,it} \\ \Psi_{\alpha,it} \end{pmatrix} \right) \left( \sum_{t=1}^{n_i} \begin{pmatrix} \Psi_{\beta,it} \\ \Psi_{\gamma,it} \\ \Psi_{\alpha,it} \end{pmatrix} \right)^{T} \qquad (3.143)$$

and the sandwich estimate of variance for $\beta$ is the upper $p \times p$ matrix of $V_{MS}$.

The derivation of a valid variance estimate for PA-GEE analysis of panel data is an application of the general method of forming sandwich estimates of variance. Obtaining estimates for the regression parameter vector $\beta$ is relatively straightforward using most statistical packages, but obtaining the sandwich estimate of variance requires some work on the part of the user. We cannot simply replace the covariates with the OLS regression predicted values and fit a PA-GEE model. The standard errors (naive or modified sandwich) are not correct, and you must follow the derivation given above to construct a valid estimate.

Sandwich estimates of variance are formed using $V_{MS} = A^{-1} B A^{-T}$. $A$ is symmetric for many applications. However, as Binder (1992) points out, the bread of the sandwich estimate of variance is not, in general, symmetric.
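A stacked computation of $V_{MS}$ for the simplest case makes the preceding point concrete. The sketch below assumes an identity link, an independence working correlation (so $\alpha$ drops out), and a single error-prone covariate ($p_x = 1$); it builds $A$ with the structure of (3.142) and $B$ with the cluster sums of (3.143). The function name `iv_gee_sandwich` and the simulated data are hypothetical, not the authors' code.

```python
import numpy as np


def iv_gee_sandwich(y, X_Z, X_S, w, cluster):
    """Point estimates and modified sandwich variance for beta.

    Assumes an identity link, an independence working correlation (so alpha
    drops out), and a single error-prone covariate w (p_x = 1). The stacked
    parameter vector is theta = (beta, gamma).
    """
    X_R = np.column_stack([X_Z, X_S])
    gamma = np.linalg.lstsq(X_R, w, rcond=None)[0]
    v = w - X_R @ gamma                            # first-stage residuals
    X_hat = np.column_stack([X_Z, X_R @ gamma])    # w replaced by its proxy
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    e = y - X_hat @ beta                           # second-stage residuals
    p, q = X_hat.shape[1], X_R.shape[1]

    # Bread A = -dPsi/dtheta, block upper triangular as in (3.142).
    A_bb = X_hat.T @ X_hat
    # gamma enters Psi_beta only through the last column of X_hat.
    A_bg = beta[-1] * (X_hat.T @ X_R)
    A_bg[-1, :] -= e @ X_R
    A_gg = X_R.T @ X_R
    A = np.block([[A_bb, A_bg], [np.zeros((q, p)), A_gg]])

    # Meat B: stacked scores summed within each cluster, as in (3.143).
    scores = np.column_stack([X_hat * e[:, None], X_R * v[:, None]])
    _, idx = np.unique(cluster, return_inverse=True)
    S = np.zeros((idx.max() + 1, p + q))
    np.add.at(S, idx, scores)
    B = S.T @ S

    A_inv = np.linalg.inv(A)
    V = A_inv @ B @ A_inv.T                        # V_MS = A^{-1} B A^{-T}
    return beta, V[:p, :p]


# Usage with simulated panel data (all values hypothetical).
rng = np.random.default_rng(2003)
n_clusters, m = 100, 4
cluster = np.repeat(np.arange(n_clusters), m)
n = cluster.size
x1 = rng.normal(size=n)
s = rng.normal(size=n)                             # instrument
x2 = 0.8 * s + 0.5 * x1 + rng.normal(size=n)       # unobserved true covariate
w = x2 + rng.normal(size=n)                        # error-prone measurement
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)
X_Z = np.column_stack([np.ones(n), x1])

beta, V_beta = iv_gee_sandwich(y, X_Z, s[:, None], w, cluster)
print("beta:", beta.round(3))
print("sandwich SE:", np.sqrt(np.diag(V_beta)).round(3))
```

In this exactly identified case (one instrument for one error-prone covariate), the upper block works out algebraically to the cluster-robust two-stage least-squares form with residuals computed from the observed w rather than from its proxy, which offers one way to check an implementation against commercial software. Note also that the bread built here is not symmetric: the cross-derivative block has no mirror image below the diagonal.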
The asymmetry in the case of GLMs for longitudinal data with instrumental variables, due to the augmented matrices of cross derivatives, is such an example. In constructing sandwich estimates of variance, the bread of the sandwich (the $A$ matrix) is the negative of the matrix of derivatives of the complete estimating equation (the second derivatives of the underlying objective function, when one exists). The middle of the sandwich estimate of variance (the $B$ matrix) is the variance of the complete estimating equation. In many cases, the estimating equation involves independent observations, and the $B$ matrix may be formed as the sum (over observations) of the outer product of the estimating equation, $\sum_i \Psi_i \Psi_i^T$.

To highlight this application, we present a contrived example with a very small dataset so that the construction of the relevant matrices is clear. We also point out that the sandwich estimate of variance to be calculated for the regression coefficients does not depend on the hypothesized correlation structure. The regression coefficients are certainly affected, but the sandwich estimate of variance is a post-estimation adjustment. To see this, organize the $A$ and $B$ matrices that go into the calculation as:

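$$A^{-1} = \begin{pmatrix} -\frac{\partial\Psi_\beta}{\partial\beta} & -\frac{\partial\Psi_\beta}{\partial\gamma} & 0 \\ 0 & -\frac{\partial\Psi_\gamma}{\partial\gamma} & 0 \\ 0 & 0 & -\frac{\partial\Psi_\alpha}{\partial\alpha} \end{pmatrix}^{-1} = \begin{pmatrix} A_{11} & 0 \\ 0 & -\frac{\partial\Psi_\alpha}{\partial\alpha} \end{pmatrix}^{-1} \qquad (3.144)$$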

B is B is organized organized similarly. similarly. Since Since our our goal goal is is to to obtain obtain the the sandwich sandwich estimate estimate of of variance for ,3, /3, we we need need only only look look at at the the result result of of the the matrix matrix multiplications multiplications variance for for the the upper upper p p x x pp entry entry of of the the sandwich sandwich estimate estimate of of variance variance.. Since Since A A is for is -T block block diagonal, diagonal, we we need need only only look look at at A11-11311A AI/BuAll. Due to to this this simplification, simplification, we we assume assume an an independent independent correlation correlation structure structure Due for aa linear linear regression regression model model since since that that allows allows us us to to certify results with with comcomfor certify results mercial that includes support for for the the sandwich sandwich estimate estimate of of variance variance mercial software software that includes support for instrumental instrumental variables variables regression regression.. Our Our approach approach will will match match the the commercommerfor cial results, except except in in the the situation situation where where scalar scalar adjustments adjustments may may be be cial software software results, made to the the variance variance estimator estimator.. made to Assume that that we we wish wish to to model model aa continuous continuous outcome outcome using using the the identity identity link link Assume function function Y = 00

+ 01X1

+ /32x2 + 03X3

(3.145) (3.145)

an exchangeable exchangeable correlation model.. However, we cannot cannot observe observe with with an correlation PA-GEE PA-GEE model However, we X3. Instead, we observe w which is equal to X3 plus measurement error. In x3 . Instead, we observe w which is equal to x3 plus measurement error . In addition, we have an instrumental variable s. addition, we have an instrumental variable s. Since x3 X3 is is not we can can first first fit fit aa regression regression of of (1, (1, x1 Xl,, x2 X2,, s) s) on on w, w, Since not observed, observed, we and use the the fitted fitted values values from from the the regression regression as as aa proxy proxy for the unobserved unobserved and use for the variable. Using the the data data listed listed in in section section 5.2.6, 5.2.6, the the A11 Au matrix matrix is estimated variable . Using is estimated by by

$$A_{11} = \begin{pmatrix} -\dfrac{\partial \Psi_\beta}{\partial \beta^T} & -\dfrac{\partial \Psi_\beta}{\partial \gamma^T} \\[2ex] -\dfrac{\partial \Psi_\gamma}{\partial \beta^T} & -\dfrac{\partial \Psi_\gamma}{\partial \gamma^T} \end{pmatrix} \qquad (3.146)$$

where the estimated submatrices for the derivative of the estimating equation for the regression of interest are given below. In these matrices, the rows and columns for $\beta$ are ordered $(\hat w, x_1, x_2, \text{constant})$, and those for $\gamma$ are ordered $(x_1, x_2, s, \text{constant})$.

$$-\frac{\partial \Psi_\beta}{\partial \beta^T} = \begin{pmatrix} 43.88 & -23.08 & -84.48 & -8.10 \\ -23.08 & 423.00 & 637.00 & 119.00 \\ -84.48 & 637.00 & 1578.00 & 216.00 \\ -8.10 & 119.00 & 216.00 & 40.00 \end{pmatrix} \qquad (3.147)$$

$$-\frac{\partial \Psi_\beta}{\partial \gamma^T} = \begin{pmatrix} -92.48 & -338.55 & 164.70 & -32.45 \\ 1695.09 & 2552.66 & -131.00 & 476.87 \\ 2552.66 & 6323.53 & -330.96 & 865.58 \\ 476.87 & 865.58 & -39.77 & 160.29 \end{pmatrix} \qquad (3.148)$$

and the estimated submatrices for the derivatives of the estimating equation of the instrumental variables regression are

$$-\frac{\partial \Psi_\gamma}{\partial \beta^T} = [0]_{4\times 4} \qquad (3.149)$$

$$-\frac{\partial \Psi_\gamma}{\partial \gamma^T} = \begin{pmatrix} 423.00 & 637.00 & -32.69 & 119.00 \\ 637.00 & 1578.00 & -82.59 & 216.00 \\ -32.69 & -82.59 & 39.08 & -9.92 \\ 119.00 & 216.00 & -9.92 & 40.00 \end{pmatrix} \qquad (3.150)$$

The $B_{11}$ matrix is

$$B_{11} = \sum_{i=1}^{n} \begin{pmatrix} \Psi_{\beta i}\Psi_{\beta i}^T & \Psi_{\beta i}\Psi_{\gamma i}^T \\ \Psi_{\gamma i}\Psi_{\beta i}^T & \Psi_{\gamma i}\Psi_{\gamma i}^T \end{pmatrix} \qquad (3.151)$$

where, for our data, the submatrices for the first row are estimated by

$$\sum_{i=1}^{n}\Psi_{\beta i}\Psi_{\beta i}^T = \begin{pmatrix} 56.10 & 33.44 & -0.53 & 18.12 \\ 33.44 & 281.41 & 355.03 & 85.24 \\ -0.53 & 355.03 & 944.74 & 139.77 \\ 18.12 & 85.24 & 139.77 & 32.39 \end{pmatrix} \qquad (3.152)$$

$$\sum_{i=1}^{n}\Psi_{\beta i}\Psi_{\gamma i}^T = \begin{pmatrix} -14.59 & -35.63 & 2.38 & -5.75 \\ 5.55 & 55.51 & -12.91 & 0.92 \\ 55.51 & 143.07 & -33.85 & 13.42 \\ 0.92 & 13.42 & -5.20 & -0.74 \end{pmatrix} \qquad (3.153)$$

and the submatrices for the second row are estimated by

$$\sum_{i=1}^{n}\Psi_{\gamma i}\Psi_{\beta i}^T = \begin{pmatrix} -14.59 & 5.55 & 55.51 & 0.92 \\ -35.63 & 55.51 & 143.07 & 13.42 \\ 2.38 & -12.91 & -33.85 & -5.20 \\ -5.75 & 0.92 & 13.42 & -0.74 \end{pmatrix} \qquad (3.154)$$

$$\sum_{i=1}^{n}\Psi_{\gamma i}\Psi_{\gamma i}^T = \begin{pmatrix} 103.00 & 143.22 & -16.13 & 29.20 \\ 143.22 & 321.86 & -37.02 & 45.00 \\ -16.13 & -37.02 & 8.87 & -4.86 \\ 29.20 & 45.00 & -4.86 & 9.73 \end{pmatrix} \qquad (3.155)$$

Using the estimated $A_{11}$ and $B_{11}$ matrices, the calculated sandwich estimate of variance for the instrumental variables regression is

$$V_S(\beta) = \begin{pmatrix} .105844 & -.020986 & .003564 & .067679 \\ -.020986 & .077022 & .005081 & -.273977 \\ .003564 & .005081 & .01454 & -.110762 \\ .067679 & -.273977 & -.110762 & 1.68467 \end{pmatrix} \qquad (3.156)$$

This variance estimator, even though it was calculated for a PA-GEE model with instrumental variables, is the same as a sandwich estimate of variance for instrumental variables regression. If we fit such a model in a commercial package, we obtain the results

IV (2SLS) regression with robust standard errors          Number of obs =      40
                                                          F(  3,    36) =  242.78
                                                          Prob > F      =  0.0000
                                                          R-squared     =  0.9506
                                                          Root MSE      =  2.3249

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |    4.00731   .3429365    11.69   0.000     3.311802    4.702817
          x1 |   1.878759   .2925399     6.42   0.000      1.28546    2.472057
          x2 |   3.141942   .1271036    24.72   0.000     2.884164     3.39972
       _cons |    .228422   1.368156     0.17   0.868    -2.546328    3.003172
------------------------------------------------------------------------------
Instrumented:  w
Instruments:   x1 x2 s
------------------------------------------------------------------------------

The commercial package (in this case, Stata) lists the sandwich estimate of variance as

$$V_{\rm Stata}(\beta) = \begin{pmatrix} .117605 & -.023317 & .003960 & .075198 \\ -.023317 & .085580 & .00565 & -.304419 \\ .003960 & .00565 & .016155 & -.123069 \\ .075198 & -.304419 & -.123069 & 1.87185 \end{pmatrix} \qquad (3.157)$$
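A quick numerical comparison of the printed matrices (3.156) and (3.157) shows how they are related. The sketch below uses Python with NumPy; the values are typed in from the two displays, so agreement holds only up to the printed rounding. It forms the entry-by-entry ratio, which is close to $40/36 \approx 1.111$ everywhere, and takes square roots of the diagonal of (3.157), which reproduce the Robust Std. Err. column of the output above.

```python
import numpy as np

# Sandwich estimate from the stacked estimating equations, (3.156);
# rows/columns ordered (w_hat, x1, x2, constant)
V_s = np.array([
    [ .105844, -.020986,  .003564,  .067679],
    [-.020986,  .077022,  .005081, -.273977],
    [ .003564,  .005081,  .01454,  -.110762],
    [ .067679, -.273977, -.110762, 1.68467 ],
])

# Matrix reported by Stata, (3.157), same ordering (w, x1, x2, _cons)
V_stata = np.array([
    [ .117605, -.023317,  .003960,  .075198],
    [-.023317,  .085580,  .00565,  -.304419],
    [ .003960,  .00565,   .016155, -.123069],
    [ .075198, -.304419, -.123069, 1.87185 ],
])

n, p = 40, 4  # observations and coefficients (including the constant)

# Entry-by-entry ratio of the two matrices
ratio = V_stata / V_s
print(np.round(ratio, 3))

# Standard errors are the square roots of the diagonal of (3.157)
se = np.sqrt(np.diag(V_stata))
print(np.round(se, 6))
```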

As mentioned earlier, Stata applies a documented scalar adjustment to the sandwich estimate of variance. The scalar adjustment is equal to $n/(n-p)$, where $n$ is the number of observations and $p$ is the number of coefficients in the regression model (including the constant). In this case, one can easily verify that

$$V_{\rm Stata}(\beta) = \frac{40}{40-4}\, V_S(\beta) \qquad (3.158)$$

While we did not list output for the PA-GEE model, its coefficients match the output from the commercial package.

To complete the illustration of the techniques, we outline some of the formulas required to calculate the variance estimator. No commercial software allows specification of these types of general models (though we saw an example of a specific member of the class of models). The estimating equation $\Psi_\beta$ for the regression model of concern is complicated by the covariate that is constructed from the fitted values of the instrumental variable regression.

$$\Psi_\beta = \frac{\partial}{\partial \beta} \sum_{i=1}^{n} \left[ -\frac{1}{2} \left\{ y_i - \left( \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 \hat w_i \right) \right\}^2 \right] = [0]_{4\times 1} \qquad (3.159)$$

with $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)^T$. The most complicated part of calculating the sandwich estimate of variance is clearly the calculation of $-\partial \Psi_\beta / \partial \gamma^T$, since the construction of $\hat w$ involves the $\gamma$ parameter vector. The remaining terms in the calculation of the sandwich estimate of variance are easily obtained using results from the separate regressions.
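The claim behind (3.156), that the upper $p \times p$ block of $A_{11}^{-1} B_{11} A_{11}^{-T}$ from the stacked equations reproduces the usual robust variance for instrumental variables regression, can be checked on simulated data. The sketch below is an illustration under stated assumptions, not the authors' code: it uses NumPy, simulated values in place of the data of section 5.2.6, an exactly identified model (one instrument $s$ for one mismeasured covariate), and the orderings $(\hat w, x_1, x_2, 1)$ for $\beta$ and $(x_1, x_2, s, 1)$ for $\gamma$. The derivative $-\partial\Psi_\beta/\partial\gamma^T$ follows from the chain rule, since $\hat w_i = z_i^T\gamma$ enters both the covariate vector and the residual.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical simulated data (not the data of section 5.2.6): x3 is latent,
# w = x3 + measurement error, and s is an instrument related to x3 only.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
s = rng.normal(size=n)
x3 = 0.8 * s + 0.3 * x1 + rng.normal(scale=0.5, size=n)
w = x3 + rng.normal(scale=0.7, size=n)
y = 0.5 + 2.0 * x1 + 3.0 * x2 + 4.0 * x3 + rng.normal(size=n)

# First stage: Psi_gamma = sum_i z_i (w_i - z_i' gamma) = 0, z = (x1, x2, s, 1)
Z = np.column_stack([x1, x2, s, np.ones(n)])
gamma = np.linalg.solve(Z.T @ Z, Z.T @ w)
w_hat = Z @ gamma

# Second stage: Psi_beta = sum_i xh_i (y_i - xh_i' beta) = 0, xh = (w_hat, x1, x2, 1)
Xh = np.column_stack([w_hat, x1, x2, np.ones(n)])
beta = np.linalg.solve(Xh.T @ Xh, Xh.T @ y)
r = y - Xh @ beta          # second-stage residuals (using the proxy w_hat)
v = w - w_hat              # first-stage residuals

# A11: minus the derivative of the stacked equations, beta block first
e1 = np.zeros((4, 1))
e1[0, 0] = 1.0             # w_hat is the first element of xh
A_bb = Xh.T @ Xh                                       # -dPsi_beta/dbeta'
A_bg = beta[0] * (Xh.T @ Z) - e1 @ (r @ Z)[None, :]    # -dPsi_beta/dgamma'
A_gb = np.zeros((4, 4))                                # -dPsi_gamma/dbeta'
A_gg = Z.T @ Z                                         # -dPsi_gamma/dgamma'
A11 = np.block([[A_bb, A_bg], [A_gb, A_gg]])

# B11: sum over observations of outer products of the stacked contributions
psi = np.hstack([Xh * r[:, None], Z * v[:, None]])
B11 = psi.T @ psi

A_inv = np.linalg.inv(A11)
V_stacked = (A_inv @ B11 @ A_inv.T)[:4, :4]   # upper p x p block

# Usual robust (HC0) 2SLS variance, using structural residuals y - X beta
X = np.column_stack([w, x1, x2, np.ones(n)])
e = y - X @ beta
bread = np.linalg.inv(Xh.T @ Xh)
V_hc0 = bread @ (Xh.T @ (Xh * (e ** 2)[:, None])) @ bread

p = 4
V_small_sample = V_hc0 * n / (n - p)          # n/(n - p) scalar adjustment

print(np.sqrt(np.diag(V_stacked)))
print(np.sqrt(np.diag(V_small_sample)))
```

Because the model is exactly identified, the second-stage residuals $r$ are orthogonal to every column of $Z$, so the term involving $e_1 r^T Z$ in $-\partial\Psi_\beta/\partial\gamma^T$ vanishes at the solution. The printed (3.148) is consistent with this: its rows for $x_1$, $x_2$, and the constant equal 4.00731, the coefficient on $w$, times the corresponding rows of (3.150), up to rounding.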

3.6.2 The PA-EGEE for GLMs

The PA-GEE discussed earlier is an application of the quasilikelihood to panel data. The partial derivatives of the quasilikelihood have score-like properties when the derivatives are in terms of $\beta$, but do not have these properties for the partial derivatives in terms of $\alpha$. Nelder and Pregibon (1987) developed an extension to the quasilikelihood for which both partial derivatives have score-like properties, and Hall and Severini (1998) subsequently utilized this approach to extend the GEE1 for GLMs. In the extension, it is assumed that the extended quasilikelihood may be written in the form

$$Q^+(y_i; \mu_i, \alpha) = Q(y_i; \mu_i) + f_1(\alpha) + f_2(y_i) \qquad (3.160)$$

ensuring that the partial derivative of the extended quasilikelihood with respect to $\beta$ is the same as the partial derivative of the quasilikelihood with respect to $\beta$:

$$\frac{\partial Q^+}{\partial \beta} = \frac{\partial Q}{\partial \beta} \qquad (3.161)$$

Recall that the quasilikelihood associated with GLMs is given by

$$Q(y;\mu) = \int_{y}^{\mu} \frac{y-\mu^*}{V(\mu^*)}\, d\mu^* \qquad (3.162)$$

implying that

$$\frac{\partial Q(y;\mu)}{\partial \mu} = \frac{y-\mu}{V(\mu)} \qquad (3.163)$$

The deviance is then calculated as

$$D(y;\mu) = -2\left\{ Q(y;\mu) - Q(y;y) \right\} \qquad (3.164)$$

$$\phantom{D(y;\mu)} = -2 \int_{y}^{\mu} \frac{y-\mu^*}{V(\mu^*)}\, d\mu^* \qquad (3.165)$$

The extended quasilikelihood may then be written in terms of these quantities as

$$Q^+(y;\mu) = -\frac{1}{2}\ln\left\{2\pi\phi V(y)\right\} - \frac{1}{2\phi} D(y;\mu) \qquad (3.166)$$

such that the estimate $\hat\beta_{Q^+}$ obtained by maximizing the extended quasilikelihood estimates the same population parameter as the estimate $\hat\beta_Q$ obtained by maximizing the quasilikelihood. A proper likelihood is implied if there is a distribution in the exponential family with the variance function specified for the extended quasilikelihood.

To illustrate the connection of the extended quasilikelihood to the models already examined, let us derive an estimating equation from the extended quasilikelihood for the exponential family using $V(\mu) = \mu$ and $a(\phi) = 1$ (the appropriate choices for the Poisson model). The extended quasilikelihood for this case is given by

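$$Q^+(y;\mu) = -\frac{1}{2}\ln\{2\pi y\} + \int_{y}^{\mu} \frac{y-\mu^*}{\mu^*}\, d\mu^* \qquad (3.167)$$

$$\phantom{Q^+(y;\mu)} = -\frac{1}{2}\ln\{2\pi y\} + \Big[\, y\ln(\mu^*) - \mu^* \,\Big]_{y}^{\mu} \qquad (3.168)$$

$$\phantom{Q^+(y;\mu)} = y\ln(\mu) - \mu - y\{\ln(y) - 1\} - \frac{1}{2}\ln\{2\pi y\} \qquad (3.169)$$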

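with an estimating equation for $\beta$, obtained by setting $\partial Q^+/\partial\beta$ (computed from $\partial Q^+/\partial\mu = y/\mu - 1$ by the chain rule) to zero, given by

$$\Psi(\beta) = \left[ \sum_{i=1}^{n} \left( \frac{y_i}{\mu_i} - 1 \right) \left( \frac{\partial \mu}{\partial \eta} \right)_i x_{ji} \right]_{j=1,\ldots,p} = [0]_{p\times 1} \qquad (3.170)$$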
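The Poisson case can be checked numerically. The sketch below (plain Python, standard library only) evaluates the closed form (3.169), compares it with a Simpson's-rule evaluation of the integral in (3.167), and compares it with the Poisson log-likelihood $y\ln\mu - \mu - \ln y!$. It assumes $y \ge 1$, since $\ln y$ appears in (3.169). The difference between the two functions does not involve $\mu$, which is why their estimating equations agree; it equals $\ln y!$ minus Stirling's approximation $y\ln y - y + \frac12\ln(2\pi y)$, so the normalizing term of the Poisson extended quasilikelihood is Stirling's approximation to $\ln y!$.

```python
import math

def eql_poisson(y, mu):
    """Closed form (3.169): extended quasilikelihood with V(mu) = mu, a(phi) = 1."""
    return y * math.log(mu) - mu - y * (math.log(y) - 1) - 0.5 * math.log(2 * math.pi * y)

def eql_poisson_quad(y, mu, m=2000):
    """Definition (3.167), with the integral from y to mu done by Simpson's rule."""
    f = lambda t: (y - t) / t
    h = (mu - y) / m                      # m must be even for Simpson's rule
    total = f(y) + f(mu)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(y + k * h)
    return -0.5 * math.log(2 * math.pi * y) + total * h / 3

def poisson_loglik(y, mu):
    """Poisson log-likelihood y ln(mu) - mu - ln(y!)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def derivative(fun, y, mu, h=1e-6):
    """Central finite difference with respect to mu."""
    return (fun(y, mu + h) - fun(y, mu - h)) / (2 * h)

for y in (1, 3, 7):
    # The gap between the EQL and the log-likelihood is the same for every mu ...
    gaps = [eql_poisson(y, mu) - poisson_loglik(y, mu) for mu in (0.5, 2.0, 6.0)]
    # ... and equals ln(y!) minus Stirling's approximation to ln(y!)
    stirling = y * math.log(y) - y + 0.5 * math.log(2 * math.pi * y)
    print(y, [round(g, 6) for g in gaps], round(math.lgamma(y + 1) - stirling, 6))
```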

Equation 3.170 matches the specific derivation of the estimating equation for the likelihood-based Poisson model given in equation 2.29. The estimating equations for the two approaches match even though the extended quasilikelihood in equation 3.169, implied by assuming a variance function from the Poisson distribution, differs (in the normalizing term) from the Poisson log-likelihood given (in terms of $x\beta$) in equation 2.28.

Setting the partial derivative with respect to $\alpha$ to zero and solving would be a straightforward approach to deriving the estimating equations from the extended quasilikelihood. However, this approach has two problems. First, solving the resulting equation is difficult, and second, the resulting estimator is biased. Utilizing the same decomposition of the variance as was used for the PA-GEE model, we require a matrix

$$\int_{0}^{1} s\, D\big(V(\mu_{it})\big)^{-1/2}\{t(s)\}\, \frac{\partial\, \phi^{-1} R(\alpha)}{\partial \alpha_j}\, D\big(V(\mu_{it})\big)^{-1/2}\{t(s)\}\, ds \qquad (3.171)$$

where the elements of the matrix are functions of $\alpha$ that depend on another integral. For our purposes, it is enough to understand that this approach is computationally vexing due to numeric integration of functions with end point singularities. The solution (assuming we can get to one) leads to biased estimates. However, we emphasize that we could, in fact, proceed with solving these integrals out of a desire to fit a true extended quasilikelihood model. Alternatively, the integral (equation 3.171) may be approximated using a first-order Taylor series expansion, providing the PA-EGEE given by

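$$\Psi(\beta,\alpha) = \begin{pmatrix} \Psi_\beta(\beta,\alpha) \\ \Psi_\alpha(\beta,\alpha) \end{pmatrix}$$

$$\Psi_\beta(\beta,\alpha) = \left[ \sum_{i=1}^{n} x_{ji}^T D\!\left(\frac{\partial\mu}{\partial\eta}\right) \left[V(\mu_i)\right]^{-1} (y_i - \mu_i) \right]_{j=1,\ldots,p} \qquad (3.172)$$

$$\Psi_\alpha(\beta,\alpha) = \sum_{i=1}^{n} \left\{ -\frac{1}{2} (y_i-\mu_i)^T \frac{\partial V(\mu_i)^{-1}}{\partial \alpha} (y_i-\mu_i) + \frac{1}{2}\, \mathrm{tr}\!\left( V(\mu_i) \frac{\partial V(\mu_i)^{-1}}{\partial \alpha} \right) \right\} \qquad (3.173)$$

$$V(\mu_i) = D\big(V(\mu_{it})\big)^{1/2}\, R(\alpha)\, D\big(V(\mu_{it})\big)^{1/2} \qquad (3.174)$$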

Hall (2001) points out that we can make use of the fact that

$$\frac{\partial V(\mu_i)^{-1}}{\partial \alpha} = -V(\mu_i)^{-1}\, \frac{\partial V(\mu_i)}{\partial \alpha}\, V(\mu_i)^{-1} \qquad (3.175)$$

in order to avoid the need to differentiate $V(\mu_i)^{-1}$ in calculating the estimating equation.

Using this model in practice requires programming, since there is currently no support for this class of models in existing software packages. Choosing between fitting a PA-EGEE model and a GEE2 model is usually based on the focus of the analysis and the reasonableness of treating the two estimating equations as orthogonal. In general, compared to a similar GEE2 model, PA-EGEE provides smaller standard errors for $\beta$ (because of the orthogonality assumption) and a less accurate estimate of the dispersion $\phi$.
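Identity (3.175) is easy to check numerically, and so is the fact that the per-panel term of (3.173) is the $\alpha$-derivative of the Gaussian-form quantity $-\frac12\ln|V(\mu_i)| - \frac12 (y_i-\mu_i)^T V(\mu_i)^{-1}(y_i-\mu_i)$. The sketch below assumes an exchangeable $R(\alpha)$ in (3.174), a Poisson-type variance function $V(\mu) = \mu$, and hypothetical values of $\mu_i$ and $y_i$ for a single panel of size 4; it uses NumPy and central finite differences for the comparisons.

```python
import numpy as np

def exchangeable(alpha, m):
    """Exchangeable working correlation R(alpha), m x m."""
    return (1 - alpha) * np.eye(m) + alpha * np.ones((m, m))

def working_cov(alpha, mu):
    """V(mu_i) = D(V(mu_it))^{1/2} R(alpha) D(V(mu_it))^{1/2}, assuming V(mu) = mu."""
    d = np.diag(np.sqrt(mu))
    return d @ exchangeable(alpha, len(mu)) @ d

mu = np.array([1.5, 2.0, 0.7, 3.2])   # hypothetical fitted means for one panel
y = np.array([2.0, 1.0, 0.0, 5.0])    # hypothetical outcomes for that panel
r = y - mu
alpha = 0.3

V = working_cov(alpha, mu)
V_inv = np.linalg.inv(V)
d = np.diag(np.sqrt(mu))
dV = d @ (np.ones((4, 4)) - np.eye(4)) @ d   # dV/dalpha, analytic
dV_inv = -V_inv @ dV @ V_inv                 # identity (3.175)

# Finite-difference check of (3.175)
h = 1e-6
dV_inv_fd = (np.linalg.inv(working_cov(alpha + h, mu))
             - np.linalg.inv(working_cov(alpha - h, mu))) / (2 * h)

# Per-panel term of Psi_alpha in (3.173)
psi_alpha = -0.5 * r @ dV_inv @ r + 0.5 * np.trace(V @ dV_inv)

def gaussian_form(a):
    """-1/2 ln|V| - 1/2 r' V^{-1} r as a function of alpha."""
    Va = working_cov(a, mu)
    _, logdet = np.linalg.slogdet(Va)
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(Va, r)

psi_alpha_fd = (gaussian_form(alpha + h) - gaussian_form(alpha - h)) / (2 * h)
print(psi_alpha, psi_alpha_fd)
```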

3.6.3 The PA-REGEE for GLMs

Following ideas introduced for robust regression to allow for models that are resistant to outliers in the data, Preisser and Qaqish (1999) generalized the resistant concepts for PA-GEE models. The resistant PA-GEE for $\beta_{p\times 1}$ is given by*

$$\Psi(\beta) = \left[ \sum_{i=1}^{n} x_{ji}^T D\!\left(\frac{\partial\mu}{\partial\eta}\right) \left[V(\mu_i)\right]^{-1} \left( \frac{W_i(y_i-\mu_i) - c_i}{a(\phi)} \right) \right]_{j=1,\ldots,p} = [0]_{p\times 1} \qquad (3.176)$$

where the usual PA-GEE is a special case wherein $W_i$ is an $n_i \times n_i$ identity matrix (for all $i$) and $c_i$ is an $n_i \times 1$ vector of zeros (for all $i$). The estimating equation for the association parameters $\alpha$, due to Liang and Zeger (1986), is given in equation 3.15 (using moment estimates based on Pearson residuals). In order to use the estimating equation using ALR in equation 3.79, or the estimating equation from the PA-EGEE model, one would first have to work out the changes required to make those estimating equations robust. The changes to the moment estimates for the estimating equation for PA-GEE are given later in this section.

* We denote the resistant PA-GEE as PA-REGEE, whereas the cited authors use REGEE.

This presentation, like that for PA-GEE in equation 3.14, assumes that the variance of the outcomes may be written

$$V(\mu_i) = \left[ D\big(V(\mu_{it})\big)^{1/2}\, R(\alpha)_{(n_i\times n_i)}\, D\big(V(\mu_{it})\big)^{1/2} \right]_{n_i\times n_i} \qquad (3.177)$$

In general, $W_i$ is a diagonal matrix of observation weights and $c_i$ is a vector of constants ensuring that the estimating equation is unbiased. The Mallows class of weights determines observation weights as a function of the values of the covariates only. The Schweppe class of weights determines weights as a function of the outcomes. The basic idea of resistant estimation is to investigate the influence of the observations and then downweight influential data so that each observation or panel contributes more evenly to the estimation. As seen in Chapter 4, influence may be measured per observation or per panel, so we may apply the downweighting based on either approach. The theoretical justification of the approach is discussed in the cited article, as well as in Carroll and Pederson (1993) for the case of models with binomial variance and the logit link function. For the Mallows class of weights, we have $c_i = 0$ for all $i$, and we need not make any further assumptions beyond those for PA-GEE.
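With Mallows weights and $c_i = 0$, (3.176) is a weighted version of the PA-GEE. A minimal sketch, assuming an independence working correlation and observation-level weights (the text's example uses an exchangeable correlation and panel-level weights), binomial variance, and the logit link: in that case $\partial\mu/\partial\eta = \mu(1-\mu) = V(\mu)$, so $D(\partial\mu/\partial\eta)V(\mu_i)^{-1}$ is the identity and the equation reduces to $\sum w_{it} x_{it}(y_{it} - \mu_{it}) = 0$, solved here by Newton-Raphson. The data are simulated, and the Mahalanobis-distance weight rule with cutoff 2.5 is a hypothetical choice, not the one used by Preisser and Qaqish.

```python
import numpy as np

def fit_weighted_logit(X, y, w, tol=1e-10, max_iter=50):
    """Newton-Raphson for sum_i w_i x_i (y_i - mu_i) = 0 with a logit link.

    Sketch of (3.176) with c_i = 0 (Mallows weights), assuming an independence
    working correlation: for binomial variance and logit link,
    D(dmu/deta) V(mu)^{-1} is the identity, leaving a weighted score.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - mu))
        info = X.T @ (X * (w * mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=(n, 2))
x[:5] *= 6.0                                  # a few high-leverage covariate rows
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0, -1.0]))))
y = rng.binomial(1, p_true).astype(float)

# Hypothetical Mallows-type weights: a function of the covariates only.
# Rows far (in Mahalanobis distance) from the median covariate get weight < 1.
dev = x - np.median(x, axis=0)
cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
dist = np.sqrt(np.einsum("ij,jk,ik->i", dev, cov_inv, dev))
w = np.minimum(1.0, 2.5 / dist)

beta_unweighted = fit_weighted_logit(X, y, np.ones(n))   # ordinary equation
beta_resistant = fit_weighted_logit(X, y, w)             # downweighted equation
print(beta_unweighted, beta_resistant)
```

Because the weights depend only on the covariates, $E\{w_{it}(y_{it}-\mu_{it})\} = 0$ whenever the mean model is correct, which is why no correction vector $c_i$ is needed for this class.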
Following the fit of a PA-GEE model, the Mallows weights may be determined through an investigation of the influence, and a new PA-REGEE model is then fit with the weights determined in the previous step. Even if you have access to a statistical package that allows weights, you may not be able to use it. First, some statistical packages require that weights be constant within panel (limiting you to panel-level downweighting), and second, the statistical package may not apply the weights in the desired manner for the calculation of the moment estimates. Check the documentation of your preferred software package to see if this can be done; otherwise, programming is required.

Preisser and Qaqish made prenatal care data available* that we shall analyze. The data include 137 observations (patients) for 42 doctors. The outcome is whether the patient is bothered by urinary incontinence.
Clustered by doctor (doct_id), the covariates included in the model are female, the gender of the patient; age, the age in decades; dayacc, a constructed daily number of leaking accidents based on the reported number of accidents per week; severe, whether the accidents are severe; and toilet, the average number of times the patient uses the toilet per day. The PA-GEE fit for the data is given by

GEE population-averaged model                   Number of obs      =       137
Group variable:                    doct_id      Number of groups   =        42
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       3.3
Correlation:                  exchangeable                     max =         8
                                                Wald chi2(5)       =     30.16
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    bothered |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.7730688   .6012588    -1.29   0.199    -1.951514    .4053767
         age |  -.6556766    .575984    -1.14   0.255    -1.784585    .4732313
      dayacc |   .3972632   .0926325     4.29   0.000     .2157068    .5788196
      severe |   .8027313   .3530613     2.27   0.023     .1107439    1.494719
      toilet |   .1059107   .0841537     1.26   0.208    -.0590274    .2708489
       _cons |  -3.035959   1.111234    -2.73   0.006    -5.213939   -.8579799
------------------------------------------------------------------------------

* http://www.phs.wfubmc.edu/data/uipreiss.html

The estimated exchangeable correlation is 0.1013.

Note that the PA-REGEE model generalizes the estimating equation to downweight influential observations. The moment estimators must be downweighted as well. The dispersion parameter is estimated as

$$\hat\phi = \frac{1}{n^* - p} \sum_{i=1}^{n} \sum_{t=1}^{n_i} r_{it}^{*2} \qquad (3.178)$$

where

$$r_{it}^{*} = \frac{y_{it} - \mu_{it} - c_{it}}{\sqrt{V(\mu_{it})}} \qquad (3.179)$$

$$n^* = \sum_{i=1}^{n} \sum_{t=1}^{n_i} w_{it}^2 \qquad (3.180)$$

Specifying panel-level Mallows-class downweights results in

GEE population-averaged model                   Number of obs      =       137
Group variable:                    doct_id      Number of groups   =        42
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       3.3
Correlation:                  exchangeable                     max =         8
                                                Wald chi2(5)       =     32.15
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    bothered |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.8275123   .6234338    -1.33   0.184     -2.04942    .3943956
         age |  -.2153152   .5897718    -0.37   0.715    -1.371247    .9406162
      dayacc |   .3800309   .0889533     4.27   0.000     .2056856    .5543762
      severe |   .9275332   .3542698     2.62   0.009     .2331773    1.621889
      toilet |   .0677876   .0821533     0.83   0.409    -.0932299    .2288051
       _cons |  -3.141697   1.124495    -2.79   0.005    -5.345666   -.9377271
------------------------------------------------------------------------------

Utilizing the Schweppe class of weights is more complicated, since we must also determine the vector of constants $c_i$ that ensures the estimating equation is unbiased.
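The role of $c_i$ can be seen for a single binary outcome. If the weight depends on the outcome, then $E\{w(y)(y-\mu)\} = \mu(1-\mu)\{w(1) - w(0)\}$, which is generally nonzero; setting $c$ equal to that expectation makes $w(y)(y-\mu) - c$ mean zero. The sketch below uses NumPy and treats observations as independent; the weight rule (capping the absolute Pearson residual at a tuning constant $b = 1.5$) is a hypothetical choice for illustration, not the rule used by Preisser and Qaqish. A Monte Carlo average confirms the correction.

```python
import numpy as np

def pearson_cap_weight(y, mu, b=1.5):
    """Hypothetical Schweppe-type weight: depends on the outcome through the
    Pearson residual, capping its absolute value at b."""
    pearson = (y - mu) / np.sqrt(mu * (1.0 - mu))
    return np.minimum(1.0, b / np.abs(pearson))

def schweppe_constant(mu, b=1.5):
    """c = E[w(y)(y - mu)] for a Bernoulli(mu) outcome.

    Summing over y in {0, 1}: mu * w(1) * (1 - mu) + (1 - mu) * w(0) * (0 - mu),
    which simplifies to mu (1 - mu) {w(1) - w(0)}.
    """
    w1 = pearson_cap_weight(1.0, mu, b)
    w0 = pearson_cap_weight(0.0, mu, b)
    return mu * (1.0 - mu) * (w1 - w0)

mu = np.array([0.05, 0.3, 0.5, 0.9])
c = schweppe_constant(mu)

# Monte Carlo check: w(y)(y - mu) - c should average to about zero
rng = np.random.default_rng(2)
y = rng.binomial(1, mu, size=(200_000, mu.size)).astype(float)
w = pearson_cap_weight(y, mu)
print(c)
print((w * (y - mu) - c).mean(axis=0))
```

At $\mu = 0.5$ both outcomes give the same weight, so $c = 0$; for skewed means the weighted residual is biased without the correction.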


3.7 Missing data

Techniques for dealing with missing data are steadily gaining recognition, and there is currently active research aimed at developing new techniques for specific modeling situations. This subject is far larger in scope than we can detail within the limits of our text. This section introduces the reader to the topic and outlines some of the techniques that have been successfully applied, especially for the case of dropouts in longitudinal data studies. In the subsequent chapter we present techniques for assessing missing data, together with formal tests of the MCAR assumption.

We anticipate that commercial software packages will add sophisticated techniques for modeling panel data with missing observations. However, these additions will not be turnkey solutions, since the analyst will be required to make major modeling decisions as to the nature and assumptions underlying the applied techniques.
This This section section outlines outlines those those assumptions assumptions and explains the and explains the motivations and implications of various types of missing data. the motivations and implications of various types of missing data. Throughout the the text, text, we we have have thus thus far far implicitly implicitly assumed assumed that that the the data data we we Throughout The analyze are complete. However, this is often not true in practice. The figure analyze are complete. However, this is often not true in practice. figure below illustrates illustrates various various patterns patterns of of missing missing data. data. below Missing data data Missing

[Figure: Missing data, plotted by panel identifier (panels 1 to 9)]

Squares mark missing data for the response variable in a dataset with 9 panels and 8 repeated measures per panel.

In the figure above, the missing data patterns are identified as:

• Complete: Panels 1, 2, 4, and 5 are complete panels where there are no missing data. These panels provide complete information for the model.

• Panel nonresponse: Panel 3 has no observations; all replications are missing. This panel provides no information for the model.


• Item nonresponse: Panels 6 and 8 have some missing data. These panels provide incomplete information for the model.

• Dropout: Panels 7 and 9 have a special type of item nonresponse where, once an observation is missing for the panel, the rest of the observations are also missing. These panels provide incomplete information for the model.

In viewing the patterns of the missing data, we are concerned with whether that pattern is random or monotone. To investigate the process that generates the missing observations, we partition the outcomes into

$$
\begin{aligned}
Y   &= \text{complete data} && (3.181)\\
Y_o &= \text{observed data} && (3.182)\\
Y_m &= \text{missing data}  && (3.183)
\end{aligned}
$$

and we construct an indicator matrix $M$ for the missingness of observations, where the elements of the matrix are defined as

$$
M_{it} = \begin{cases} 1 & \text{if } Y_{it} \text{ is missing} \\ 0 & \text{if } Y_{it} \text{ is observed} \end{cases} \qquad (3.184)
$$
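As an illustration only, the sketch below builds $M$ from a small hypothetical panel dataset and labels each panel with the pattern names used in the figure discussion. We assume missing responses are stored as `None` or NaN; the function names `missingness_matrix` and `classify_panel` are our own and do not come from any package.

```python
import math

def missingness_matrix(y):
    """Return M with M[i][t] = 1 if response y[i][t] is missing, else 0 (eq. 3.184)."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))
    return [[1 if is_missing(v) else 0 for v in panel] for panel in y]

def classify_panel(m_row):
    """Label one row of M with the pattern names used in the text."""
    n_missing = sum(m_row)
    if n_missing == 0:
        return "complete"
    if n_missing == len(m_row):
        return "panel nonresponse"
    first_missing = m_row.index(1)
    # Dropout (monotone): once a measure is missing, every later measure is missing too.
    if all(m == 1 for m in m_row[first_missing:]):
        return "dropout"
    return "item nonresponse"

# Hypothetical dataset: 4 panels x 4 repeated measures.
y = [
    [1.2, 0.8, 1.1, 0.9],
    [None, None, None, None],
    [0.5, None, 0.7, float("nan")],
    [1.0, 1.3, None, None],
]
M = missingness_matrix(y)
print(M)  # [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
print([classify_panel(row) for row in M])
# ['complete', 'panel nonresponse', 'item nonresponse', 'dropout']
```

The dropout check is what separates a monotone pattern from general item nonresponse: the third panel has a gap followed by an observed value, so it is not monotone.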

Our goal is to investigate the joint distribution $f_{Y,M}$ insofar as we are interested in knowing whether the distribution of the missing data $f_M$ is independent of the outcomes. Essentially, we want to know whether $f_{M|Y} = f_M$. We define several useful terms based on probabilities for characterizing the missing data.

If $P(M|Y) = P(M)$ for all $Y$, then $M$ is independent of the observed outcomes $Y_o$ and the missing outcomes $Y_m$. In this case, the process for missing data is called missing completely at random, or MCAR. Rotnitzky and Wypij (1994) explain that the MCAR assumption means the process that generates missing data is independent of the observed and unobserved data values. In such a case, the standard techniques we have discussed provide valid inferences.

If $P(M|Y) = P(M|Y_o)$ for all $Y_m$, then $M$ is independent of $Y_m$. For this case, the process for missing data is called missing at random, or MAR.
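A small simulation makes the MCAR/MAR distinction concrete. All settings here are hypothetical: each panel has two occasions, the first outcome `y1` (binary) is always observed, and only the second outcome `y2` can be missing. Under MCAR the chance of a missing `y2` is a constant 0.3; under MAR it depends only on the observed `y1` (0.6 if `y1 = 1`, 0.1 otherwise). The stratified estimator is one simple adjustment chosen for brevity, not a technique from the text.

```python
import random

def simulate(mechanism, n=100_000, seed=1):
    """Simulate two-occasion panels; y1 is always observed, y2 may be missing.

    mechanism "MCAR": y2 is missing with probability 0.3, regardless of the data.
    mechanism "MAR":  the chance that y2 is missing depends only on the observed y1.
    Returns a list of (y1, y2, m2) tuples, where m2 is the indicator M_i2.
    """
    rng = random.Random(seed)
    panels = []
    for _ in range(n):
        y1 = 1 if rng.random() < 0.5 else 0
        y2 = 2.0 * y1 + rng.gauss(0.0, 1.0)   # true E[y2] = 1.0
        if mechanism == "MCAR":
            p_miss = 0.3
        elif mechanism == "MAR":
            p_miss = 0.6 if y1 == 1 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        m2 = 1 if rng.random() < p_miss else 0
        panels.append((y1, y2, m2))
    return panels

def complete_case_mean(panels):
    """Mean of y2 over the panels where y2 was observed."""
    observed = [y2 for _, y2, m2 in panels if m2 == 0]
    return sum(observed) / len(observed)

def stratified_mean(panels):
    """Mean of observed y2 within each level of y1, weighted by that level's full-sample share."""
    estimate = 0.0
    for level in (0, 1):
        stratum = [p for p in panels if p[0] == level]
        observed = [y2 for _, y2, m2 in stratum if m2 == 0]
        estimate += (len(stratum) / len(panels)) * (sum(observed) / len(observed))
    return estimate

for mechanism in ("MCAR", "MAR"):
    data = simulate(mechanism)
    print(mechanism,
          f"complete-case mean = {complete_case_mean(data):.3f}",
          f"stratified mean = {stratified_mean(data):.3f}")
```

Under MCAR the complete-case mean stays close to the true value 1.0. Under MAR it is pulled toward the `y1 = 0` stratum, because panels with `y1 = 1` lose `y2` more often, while averaging within levels of the always-observed `y1` removes that bias.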
Rubin (1976) points out that valid inference is obtained from likelihood-based models that ignore the missing data mechanism when the nonresponse depends on the observed data but is still independent of the unobserved data. If $P(M|Y)$ depends on the missing outcomes $Y_m$, the missing data are called informatively missing data or nonignorable nonresponse.

In a catalog of analysis techniques, we can partition our data into complete and incomplete cases. Imputation is typically the first approach used to handle missing data. In this technique, missing values are replaced with some value imputed from the data. This is a simple technique, but it requires assumptions on how to impute the values. The validity of the results of imputation is directly tied to the assumptions used in imputing the missing data.

We discuss the example data of the classification of asthma among white children from Steubenville, Ohio. This example is also used as the motivating example in Rotnitzky and Wypij (1994). The data consist of 1419 children (706 boys and 713 girls), where the classification of asthma status is recorded for each child at age 9 and age 13. There are 149 missing classifications for boys at age 13 and 123 missing classifications for girls at age 13. The data are summarized as

BOYS
                       Asthma at age 9
Asthma at age 13       No     Yes    Total
No                    514       6      520
Yes                    15      22       37
Missing               145       4      149
Total                 674      32      706

GIRLS
                       Asthma at age 9
Asthma at age 13       No     Yes    Total
No                    561       3      564
Yes                    13      13       26
Missing               115       8      123
Total                 689      24      713

These data have $i = 1, \ldots, 1419$ and $t = 1, 2$, where the repeated observations are for age 9 and age 13. There are 1147 complete panels and 272 dropout panels for which the outcome is unobserved at age 13; thus, there are $1147(2) + 272(1) = 2566$ observations. We assume that the outcomes follow a logistic model where the covariates include a constant, an indicator variable for gender, and an indicator variable for age 13.

If we fit a logistic regression model (a PA-GEE model assuming independence of the repeated observations) to the data, ignoring any mechanism for the missing data, we obtain the following results:

GEE population-averaged model                   Number of obs      =      2566
Group variable:                         id      Number of groups   =      1419
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       1.8
Correlation:                   independent                     max =         2
                                                Wald chi2(2)       =      7.27
Scale parameter:                         1      Prob > chi2        =    0.0264

Pearson chi2(2566):                2568.36      Deviance           =    955.94
Dispersion (Pearson):             1.000921      Dispersion         =  .3725423

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .3750074   .1902226     1.97   0.049     .0021778    .7478369
       age13 |    .351794   .1882782     1.87   0.062    -.0172244    .7208124
       _cons |  -3.394797   .1758445   -19.31   0.000    -3.739446   -3.050148
------------------------------------------------------------------------------
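With an independence working correlation, the PA-GEE estimating equations reduce to the score equations of ordinary logistic regression, so the point estimates above can be checked by fitting the grouped data directly. The sketch below is a plain Newton-Raphson (IRLS) fit written from scratch for illustration, assuming gender = 1 for boys and age13 = 1 for the age-13 record; it reproduces the coefficients only, not the standard errors.

```python
import math

# (gender, age13) -> (number with asthma, number observed).
CELLS = {
    (1, 0): (32, 706), (0, 0): (24, 713),   # age 9: every child observed
    (1, 1): (37, 557), (0, 1): (26, 590),   # age 13: observed panels only
}

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logit(cells, iters=25):
    """Newton-Raphson for logit(p) = b0 + b1*gender + b2*age13 on grouped binomial data."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for (gender, age13), (y, n) in cells.items():
            x = (1.0, float(gender), float(age13))
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for j in range(3):
                score[j] += x[j] * (y - n * p)          # binomial score
                for k in range(3):
                    info[j][k] += x[j] * x[k] * n * p * (1.0 - p)  # Fisher information
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

b0, b_gender, b_age13 = fit_logit(CELLS)
print(f"gender {b_gender:.4f}  age13 {b_age13:.4f}  _cons {b0:.4f}")
# gender 0.3750  age13 0.3518  _cons -3.3948
```

Grouping by the four (gender, age) cells is enough here because the model has only those two binary covariates, so the 2566 individual records carry no extra information for the point estimates.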

The validity of the inferences we draw for fitted models on incomplete data is a function of whether the mechanism generating the missing data is ignorable. Nonignorable missing data result in biased coefficient estimates. We can hypothesize many reasons for the missing data in the example presented. We can assume that the missing data are related to the asthma status, such that those without asthma at age 13 are always observed but those with asthma have some probability of not being observed. Under this assumption, the imputed complete table of responses would be

                          BOYS                    GIRLS
                    Asthma at age 13        Asthma at age 13
Asthma at age 9       No      Yes             No      Yes
No                   514      160            561      128
Yes                    6       26              3       21

Missing data all assigned as asthmatics.

Under this imputation, the coefficient table for the independence model is

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .3588865   .1157747     3.10   0.002     .1319722    .5858008
       age13 |    1.98622   .1505188    13.20   0.000     1.691209    2.281232
       _cons |  -3.379309   .1522291   -22.20   0.000    -3.677673   -3.080946
------------------------------------------------------------------------------

Results from this assumption show how our estimate of the coefficient on age13 is downward biased when we analyze only the observed data. Instead of assuming that the asthmatics might not respond, we can assume that it is the non-asthmatics who might not respond. In this case, the imputed complete data table is


                          BOYS                    GIRLS
                    Asthma at age 13        Asthma at age 13
Asthma at age 9       No      Yes             No      Yes
No                   659       15            676       13
Yes                   10       22             11       13

Missing data all assigned as non-asthmatics.

The coefficient table for the independence model under this imputation is then

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .3464105   .1896561     1.83   0.068    -.0253087    .7181297
       age13 |   .1230778   .1877312     0.66   0.512    -.2448685    .4910241
       _cons |  -3.378212   .1748608   -19.32   0.000    -3.720933   -3.035491
------------------------------------------------------------------------------
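Both imputed tables are mechanical transformations of the observed counts: every missing age-13 status in a cell is moved into the Yes column (all missing assigned as asthmatics) or the No column (all missing assigned as non-asthmatics). A minimal sketch, with cell labels of our own choosing; for example, girls who were non-asthmatic at age 9 contribute 13 observed plus 115 imputed asthmatics, 128 in all.

```python
# Observed counts: (group, asthma at age 9) -> (no at 13, yes at 13, missing at 13).
OBSERVED = {
    ("boys", "no"): (514, 15, 145), ("boys", "yes"): (6, 22, 4),
    ("girls", "no"): (561, 13, 115), ("girls", "yes"): (3, 13, 8),
}

def impute_all(observed, as_asthmatic):
    """Single imputation that assigns every missing age-13 status to one value.

    Returns (group, asthma at 9) -> (no at 13, yes at 13) for the completed table.
    """
    table = {}
    for key, (no13, yes13, missing13) in observed.items():
        if as_asthmatic:
            table[key] = (no13, yes13 + missing13)
        else:
            table[key] = (no13 + missing13, yes13)
    return table

print(impute_all(OBSERVED, as_asthmatic=True))
print(impute_all(OBSERVED, as_asthmatic=False))
```

Fitting the same independence model to the two completed tables brackets the age13 coefficient, which is how the two coefficient tables above expose the direction of bias in the observed-data fit.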

Under this assumption, our estimated coefficient on the age13 variable, using only the observed data, is biased upward instead of downward. Clearly, we can make many other assumptions about the nature of the mechanism driving the missingness of data. Under some assumptions, an analysis using only the observed data will not differ significantly from an analysis of the (unknown) complete data.

A second approach to analyzing data with missing values is another form of complete case analysis. In this approach, we drop the incomplete cases and generate weights for the complete cases to address bias induced by the missing data process. This can be difficult to do with existing software unless the software supports user-defined weights.

If we assume that the data are MAR, we can calculate probabilities of nonresponse to construct a probability weighted estimating equation.
Assuming that the data are missing as a function of gender and the observed outcome at age 9, the probability of nonresponse is 145/674 = .215 for boys who were not classified as asthmatics at age 9, 4/32 = .125 for boys who were classified as asthmatics at age 9, 115/689 = .167 for girls who were not classified as asthmatics at age 9, and 8/24 = .333 for girls who were classified as asthmatics at age 9. Fitting this weighted model results in

GEE population-averaged model                   Number of obs      =      2566
Group variable:                         id      Number of groups   =      1419
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       1.8
Correlation:                  exchangeable                     max =         2
                                                Wald chi2(2)       =     13.24
Scale parameter:                         1      Prob > chi2        =    0.0013

                                  (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .4119338   .2335005     1.76   0.078    -.0457187    .8695863
       age13 |    .365495   .1161503     3.15   0.002     .1378446    .5931454
       _cons |  -3.418043   .1872194   -18.26   0.000    -3.784986     -3.0511
------------------------------------------------------------------------------
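The nonresponse proportions quoted above can be reproduced from the table counts. The text does not spell out the exact form of the weights, so the sketch assumes the usual inverse-probability choice: each observed age-13 record in a group receives weight 1/(1 - p), where p is that group's nonresponse proportion. Exact fractions are used so the weights can be checked without rounding.

```python
from fractions import Fraction

# (group, asthma at age 9) -> (missing at age 13, children in the group)
GROUPS = {
    ("boys", "no"): (145, 674), ("boys", "yes"): (4, 32),
    ("girls", "no"): (115, 689), ("girls", "yes"): (8, 24),
}

def nonresponse_weights(groups):
    """Return (group) -> (P(missing at 13), weight for an observed age-13 record)."""
    out = {}
    for key, (missing, total) in groups.items():
        p_miss = Fraction(missing, total)
        # Assumed weight: inverse of the estimated probability of responding.
        out[key] = (p_miss, 1 / (1 - p_miss))
    return out

for key, (p, w) in nonresponse_weights(GROUPS).items():
    print(key, f"P(missing) = {float(p):.3f}", f"weight = {float(w):.3f}")
```

With these weights, the weighted number of responders in each group equals the full group size (for example, 16 responding girls who were asthmatic at age 9, each weighted 3/2, stand in for all 24), which is how the observed records compensate for the dropouts.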

A third approach to analyzing data with missing values in a PA-GEE model is to assume that the process generating the missing data admits this estimation and to proceed with an incomplete analysis. In this approach, all complete observations (regardless of whether the panel is complete) are included in the analysis. The PA-GEE model actually requires a special case of the MCAR assumption; we assume that $P(M|Y, X) = P(M|X)$ for all $Y$. Conditional on the covariates, $M$ is independent of the observed outcomes $Y_o$ as well as the missing outcomes $Y_m$. Further, PA-GEE modeling is appropriate if a dataset has missing values generated from a dropout process, if the data are MAR, and if the parameters of the dropout process are distinct from the parameters of interest. This assumption is analyzed by Shih (1992), who outlines the necessary conditions subject to distinct parameters.

The most commonly studied pattern of missing data relates to dropouts.
In fact, this is a common outcome in many health-related studies. Imagine a health study in which patients are randomized to a treatment drug or to a placebo. It is reasonable to expect (and common) that patients assigned to the placebo may stop participating after several observations when there is no change in their status. Likewise, it is sensible (and common) that patients assigned to the treatment drug may be susceptible to a side effect that causes their participation to stop at some point in the study. In fact, these types of dropouts are sometimes designed into a health study in order to safeguard the participants.

In modeling dropouts, the basic idea is to include a model for the complete cases and a model for the dropouts. Various interactions are hypothesized for considering the joint distribution of these two models. Typically, such investigations result in likelihood-based techniques not covered in this text. Interested readers can see Little (1995) for an excellent example.
Robins, Rotnitzky, and Zhao (1995) present another approach for modeling dropouts; see also Rotnitzky and Robins (1995). The authors present a weighted estimating equation resulting in valid unbiased estimates under the assumption that the probability that an observation is missing depends only on the past values of the covariates and outcomes.

The approach amounts to a weighting scheme based on the inverse probability of censoring that extends the GEE class of models to MAR-classified data. It is important to note that we have switched the notation from the original paper. The authors' discussion centers on an observation being uncensored (not missing), $R_{it} = 1$; our present discussion expresses the same event through the missingness indicator, $M_{it} = 0$. The authors suppose that the response probabilities are given by

$$
\lambda_{it} = P\left(M_{it} = 0 \mid M_{i,t-1} = 0,\ x_{i1}, \ldots, x_{it},\ y_{i1}, \ldots, y_{i,t-1}\right) \qquad (3.185)
$$

This equation says that the conditional probability that the $it$th observation is not missing, given that the previous observation is not missing and given the covariates up to time $t$ and the outcomes before time $t$, is equal to $\lambda_{it}$. It is assumed that these conditional probabilities are known up to $q$ unknown parameters. The basic idea is then to model this conditional probability by a logistic regression. Fitted values are then used as the weights in the GEE or other model. The PA-GEE is generalized for this inverse probability weighting as:

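$$
\begin{aligned}
\Psi(\beta, \alpha) &= \bigl(\Psi_\beta(\beta, \alpha),\ \Psi_\alpha(\beta, \alpha)\bigr) \\
\Psi_\beta(\beta, \alpha) &= \left\{ \sum_{i=1}^{n} x_{ji}^{T}\, D\!\left(\frac{\partial \mu}{\partial \eta}\right) \bigl(V(\mu_i)\bigr)^{-1} \left(\frac{y_i - \mu_i}{a(\phi)}\right) \right\}_{j=1,\ldots,p} && (3.186)\\
\Psi_\alpha(\beta, \alpha) &= \sum_{i=1}^{n} \left(\frac{\partial \xi_i}{\partial \alpha}\right)^{T} H_i^{-1} \left(W_i - \xi_i\right) && (3.187)\\
V(\mu_i) &= D\!\left(\lambda_{it}^{-1}(1 - M_{it})\right) D\bigl(V(\mu_{it})\bigr)^{1/2} R(\alpha)\, D\bigl(V(\mu_{it})\bigr)^{1/2} && (3.188)
\end{aligned}
$$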

where the diagonal matrix of weights $D(\lambda_{it}^{-1}(1 - M_{it}))$ is formed from the fitted values of the logistic regression. Readers interested in applying these techniques will have to program the necessary components, since most software packages do not support individual-level weights; the documentation for SUDAAN indicates that it supports specified observation-level weights. As we previously alluded, this technique is not limited to PA-GEE models.

There are, of course, additional likelihood-based modeling approaches to missing data. Fitzmaurice, Laird, and Lipsitz (1994) present a study with balanced models where missing data are classified as MAR. In this approach, the focus is on marginal models where associations are based on conditional log-odds ratios. The approach relies on the EM algorithm (see Dempster, Laird, and Rubin 1977) and requires substantial programming on the part of the interested analyst due to the lack of commercial software support.
Other approaches for specific types of missing data are addressed in Diggle and Kenward (1994), Heyting, Tolboom, and Essers (1992), and Little and Rubin (1987).
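To make the inverse-probability-of-censoring idea concrete for panels with more than two occasions, the sketch below computes weights for one subject under monotone dropout. It assumes the first occasion is always observed and, by the chain rule, takes the probability of still being observed at occasion t as the product of the conditional probabilities $\lambda_{i2}, \ldots, \lambda_{it}$ of (3.185). An observed record then gets weight 1/(that product) and a missing record gets weight 0. The $\lambda$ values are hypothetical stand-ins for fitted values from the logistic regression described above; with only two occasions, as in the asthma example, the product is the single $\lambda_{i2}$, matching the $\lambda_{it}^{-1}(1 - M_{it})$ weights in (3.188).

```python
def ipw_weights(lambdas, m_row):
    """Inverse-probability-of-censoring weights for one subject under monotone dropout.

    lambdas[t]: conditional probability of being observed at occasion t given
                observed at t-1 (lambdas[0] is unused; occasion 0 is assumed
                always observed).
    m_row[t]:   missingness indicator M_it (1 = missing).
    """
    weights = []
    pi = 1.0  # cumulative probability of still being observed at occasion t
    for t, m in enumerate(m_row):
        if t > 0:
            pi *= lambdas[t]
        weights.append(0.0 if m == 1 else 1.0 / pi)
    return weights

# Hypothetical subject observed at the first three occasions, then dropped out.
print(ipw_weights([1.0, 0.8, 0.5, 0.9], [0, 0, 0, 1]))
```

The weights grow over time for subjects who remain, because each such subject stands in for a growing number of similar subjects who have already dropped out.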

3.8 Choosing an appropriate model

The previous chapter outlined the derivation of likelihood-based models and illustrated model construction and assumptions. Likewise, this chapter illustrates the techniques and construction of GEE models. Given a panel dataset, which model should an analyst choose to estimate? The answer is driven by a combination of factors: the scientific questions of interest, the size and nature of the panel dataset, and the nature of the covariates.


If the scientific questions of interest center on the individual effects of covariates on the response variable, then a subject-specific likelihood-based model or a subject-specific GEE model is most appropriate. Population-averaged models are not appropriate in this case, and there is no way to alter the interpretation of the fitted coefficients to allow interpretation in a subject-specific manner. Valid likelihood-based models to address subject-specific hypotheses include unconditional fixed-effects models, conditional fixed-effects models, and random-effects models.

On the other hand, if the scientific questions center on the marginal effects of covariates, then a population-averaged model is appropriate; subject-specific models are not appropriate. The beta-binomial model is an example of this type of valid likelihood-based marginal model. Appropriate GEE models include the PA-GEE (using either moment estimators or ALR), PA-EGEE models, PA-REGEE models, or GEE2 models.
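A short numerical sketch shows why population-averaged coefficients cannot simply be reread as subject-specific ones. We assume, hypothetically, a random-intercept logistic model with subject-specific intercept β0 = -1, slope β1 = 1, and random-intercept standard deviation σ = 2. Averaging the subject-specific probabilities over the random intercept (here by the trapezoid rule) gives the population-averaged probabilities, and the log-odds ratio implied by those averages is the population-averaged effect.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(eta, sigma, n_grid=4001, width=8.0):
    """E_u[expit(eta + u)] for u ~ N(0, sigma^2), by the trapezoid rule on +/- width*sigma."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = lo + k * h
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        end_weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += end_weight * expit(eta + u) * density
    return total * h

beta0, beta1, sigma = -1.0, 1.0, 2.0   # hypothetical subject-specific parameters
p0 = marginal_prob(beta0, sigma)          # covariate = 0, averaged over subjects
p1 = marginal_prob(beta0 + beta1, sigma)  # covariate = 1, averaged over subjects
pa_slope = logit(p1) - logit(p0)
print(f"subject-specific log-odds ratio:   {beta1:.3f}")
print(f"population-averaged log-odds ratio: {pa_slope:.3f}")
```

The population-averaged log-odds ratio comes out noticeably smaller than the subject-specific value of 1. This attenuation of marginal effects relative to conditional effects is a well-known property of random-intercept logistic models and grows with σ, so the two model types estimate different quantities even from the same data.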
In a longitudinal dataset, we imagine data where the outcome is whether an individual student attends an optional study session. In these data, there are several study sessions over the semester in which we collect data. One covariate is the student's age. A second covariate is an indicator of whether the student failed the quiz immediately preceding the study session.

If we want to answer the question of whether attendance depends on the age of the student, then a population-averaged model is appropriate. If we want to answer the question of whether the probability of attending the study session changes when an individual learns he or she is failing the course, then the subject-specific model is appropriate. In this example, we would fit both types of models in order to answer the scientific questions of interest.

Now imagine that the data are collected for a single optional study session and the panels are identified by the course in which the student is enrolled.
InInand the student is stead of aa longitudinal longitudinal dataset, dataset, we we have panel dataset dataset;; there there are no repeated repeated stead of have aa panel are no measurements the individual individual students. students. We We fit fit aa population population averaged averaged model model measurements on on the to answer the question question of of whether whether the the probability probability of of attending attending the the study study sessesto answer the sion depends on on the the age age of of the the student student.. This This model model does does not not take take advantage advantage sion depends of repeated measurements measurements even even if if such such information information exists exists in in the the data. data. Thus Thus of repeated there is is no no change change in in the the manner manner in in which which we we interpret interpret the the coefficients coefficients.. there To answer answer the the question question of of whether whether the the probability probability of of attending attending the the session session To depends on whether whether an an individual individual student student has has failed failed the the previous previous quiz, quiz, we we depends on can fit aa subject-specific model. However, However, in in this this case case the the interpretation interpretation of of can fit subject-specific model. the coefficient coefficient is more difficult difficult.. The The coefficient's interpretation is is based based on on aa the is more coefficient's interpretation change in in whether whether the the student has failed failed the the preceding preceding quiz, quiz, and and we we have have no no change student has such observations.. such observations In In general, general, population-averaged population-averaged models models are are most most appropriate appropriate for for assessing assessing in covariates changes that are are constant constant within within the the panel panel identifier. identifier. 
In contrast to the subject-specific interpretation, the population-averaged interpretation addresses the question of whether the probability of attending the study session depends on whether the previous quiz was failed, averaging over all students.

Both subject-specific and population-averaged GEE models depend on the availability of a sufficient number of panels in the dataset to be analyzed. A
fixed-effects model is the most appropriate model if there are only a small number of panels.

We cannot include covariates with values that are constant within panels in a fixed-effects (unconditional or conditional) model. Such covariates are called panel-level covariates. Even if our focus is on interpreting the subject-specific effects in our experimental data, we cannot separate the effects of panel-level covariates from the fixed effect; they are collinear.

Assuming that a population-averaged model is appropriate, there is still a choice between using the moment estimators of the correlation matrix or the ALR approach, which estimates correlations based on log odds ratios. We recommend using ALR when the data are binary, especially if the focus of the analysis includes interpretation of the correlation coefficients. If the data are not binary and the focus of the analysis includes interpretation of the correlations, then a GEE2 model is preferred over a GEE1 model. For example, the PA-EGEE model, compared to a similar GEE2 model, provides smaller standard errors for $\beta$ and a less accurate estimate of the dispersion $\phi$. The smaller standard errors are a result of assuming orthogonality of the estimating equation for the regression coefficients and the estimating equation for the correlation parameters.

Within a class of GLMs for correlated data, the initial choice of the variance function is driven by the range and nature of the outcome variable. The binomial variance $\mu(1-\mu)$ is preferred if the outcome is binary. The Poisson variance $\mu$ is preferred if the outcomes represent counts of events. The Gaussian variance $1$, gamma variance $\mu^2$, or inverse Gaussian variance $\mu^3$ may be used if the outcome is (effectively) continuous. Of course, the gamma and inverse Gaussian are most appropriate when the response consists of positive-valued continuous numbers. However, once the initial variance function is chosen, residual analysis is used to investigate how well the chosen function fits the data. In Chapter 4 we illustrate an analysis of the variance function, which includes these steps (see section 4.2.3).

Likewise, the link function for a particular model is usually chosen based on the range of the outcome variable. In most cases the canonical link is used. Whether we choose the canonical link or some other link usually has no effect on the outcome of the analysis, but it can affect the calculation of the sandwich estimate of variance. This comes down to whether the software uses the expected or the observed information matrix in the construction of the sandwich estimate of variance. In the case of the canonical link, the two calculations are equivalent. Most software implements the expected Hessian specified by Liang and Zeger (1986). Shah et al.
(1997) document the options available in the SUDAAN package, whereby users can specify either approach. Hardin and Hilbe (2001) illustrate the relationship between, and derivation of, both variance estimate constructions.

The above distinction is not as clear in the case of choosing between various random-effects models. For any given model, one can hypothesize any desired distribution for the random effects. Ideally, the choice of the distribution is based on some scientific knowledge of the process, but this need not be the
case. As long as the distribution supports a variety of shapes, depending on the distribution parameters, the model may be appropriate. In most cases the choice of the distribution for the random effects is driven by the integrability of the resulting likelihood for a panel. Residual analysis can help to distinguish a good model from a poor model. Standard model criteria such as Akaike's information criterion or the deviance statistic can be used to choose between a small collection of possible models.

3.9 Summary

In this chapter we have illustrated various approaches to building models from GEE in order to fit panel data. In so doing, we examined both the GEE1 and GEE2 methods. Within the GEE1 framework, the best-known approach is that of Liang and Zeger (1986). There are many software packages that offer support for these models. In general, though, the estimation of the association parameters is secondary to the analysis of interest, and no standard errors are reported for them. For the specific case of binomial models, the ALR technique is a subset of GEE1. This approach generally produces better estimates of the association parameters, and supporting software typically includes estimates of their standard errors. There is excellent support for this technique from commercial software.

The PA-EGEE approach was the third GEE1 technique examined. This technique, like ALR, specifies a more formal estimating equation for the association parameters; however, there is at present no commercial software support.

In a situation where we only have access to software without support for one of the alternative GEE1 approaches, we can still fit the alternate models if the software allows us both to specify fixed correlation matrices and to limit the number of iterations to one. In this way, we can solve the association parameter estimating equation ourselves, build the working correlation matrix, and then use the PA-GEE software with our specified matrix to iterate once for an update to the current parameter estimates.

In addition, we reviewed a resistant GEE1 method for building GEE1 models that are resistant to outliers. Two different approaches to building and specifying the downweights for the model were discussed. There is currently no software support for this approach; we have to engage in programming in order to fit these adapted models.

The GEE2 approach differs from that of the GEE1 in that the estimating equation for the association parameters is not assumed to be orthogonal to the estimating equation for the regression coefficients. The regression coefficients $\beta$ for the PA-EGEE model estimate the same population parameter as is estimated by the PA-GEE model, even though the two approaches are not numerically the same, since their estimates of $\alpha$ differ. The estimating equation for the association parameters for the PA-EGEE model is the same as that for the GEE2 model when the GEE2 model assumes a Gaussian distribution for the random component; see Hall (2001). Note that GEE2 models
are less likely to converge than similar GEE1 models because of the increased complexity of the model.

It should be noted that many papers and researchers make reference to GEE. In nearly all cases, this reference is to the PA-GEE model described by Liang and Zeger (1986), where the association parameters are estimated via the Pearson residuals. There are several reasons for the popularity of these models. The original description of the PA-GEE models included illustrations of how to alter the IRLS algorithm so that these models could be estimated. The ease with which one could do this led to a large number of adoptions by various software packages and, more often, by individual users of software packages. The reader should note that this text attempts to clarify the widening field of GEE models by defining a taxonomy. This taxonomy has not been used prior to this text, and it may not be adopted in future articles dealing with the subject. In many cases, you will read journal articles that make the model of interest clear from context. We have tried to incorporate individual notational conventions for identifying GEE models into our presentation wherever possible and to point out where our notation differs. Regardless of whether researchers in general accept our taxonomy, we believe that our notation allows a clear distinction within this text between the models under discussion.

Ziegler, Kastner, Grömping, and Blettner (1996) recommend that analysts limit analyses to GEE1 models only when the panel sizes are less than or equal to 4 (and there are at least 30 panels). This recommendation follows from the simulation results of Liang and Zeger (1986), where they showed only small gains in efficiency for the PA-GEE models. The advice includes mention of at least 30 panels, which is now a standard rule of thumb for applying asymptotically justified estimators. With small panel sizes, one should also compare results from the PA-GEE models with the independence model.

In choosing an appropriate model, we acknowledge that software makes it possible for analysts to fit any number of models with relative ease. While there is some misuse of software for GLMs by analysts fitting every link and variance function, this is a poor use of software. There is an even greater opportunity for this type of model-hunting expedition with panel data, since there are so many more possible models that might be estimated. Data analysis and model inference start with an analysis of the scientific questions of interest, which ultimately leads to a small collection of models to be estimated. Estimating all possible models (because software allows it) is scientifically irresponsible and rarely leads to sound analysis.

There are two main approaches to handling missing observations in panel data analysis. The first is imputation.
This This technique technique fills fills in in missing missing data data with with data analysis. is imputation values imputed from from the the observed observed data. data. Various Various techniques techniques form form these these proxy proxy values imputed observations based on on parametric parametric or nonparametric assumptions assumptions.. The The second second observations based or nonparametric technique is is to to embed embed in in the the GEE GEE another another model model for for the the mechanism mechanism that that gengentechnique erates the missing missing observations observations.. Either Either technique technique includes assumptions about about erates the includes assumptions carefully considered the nature nature of of the the missing missing observations observations that that must must be be carefully considered by by the the analyst. Our discussion discussion illustrated illustrated the the effects effects on on estimators estimators of of various various asasthe analyst . Our

© © 2003 2003 by by Chapman Chapman & & Hall/CRC Hall/CRC

SUMMARY SUMMARY

133 133

sumptions, presented the the characteristics characteristics of of different different types types of of missing missing data, and sumptions, presented data, and motivated the need need for for sophisticated techniques to to resolve resolve the the bias bias associated associated motivated the sophisticated techniques with models relying relying on the MCAR MCAR assumption assumption.. with models on the We have have made made a a concerted concerted effort effort to to compare and contrast contrast models models by by foWe compare and focusing on their their estimation algorithms and and calculation calculation.. We We believe believe that that ununcusing on estimation algorithms derstanding calculation of of the the models models offers offers insight into the the properties properties of of derstanding the the calculation insight into estimators for different different types types of of data data and and can can illuminate illuminate the the situations situations that that estimators for lead to numeric numeric difficulties. Finally, our our focus focus on on the the algorithms algorithms and and various various lead to difficulties . Finally, choices for ancillary ancillary parameter parameter estimators estimators clears clears up up the the frustration frustration many many of of choices for us feel when when comparing comparing output output across across different different software software packages packages.. us feel

3.10 Exercises

1. This chapter focused a lot of attention on two estimators for the dispersion parameter in the PA-GEE model. Choose one of these estimators and present a case for why you think it is the better choice.

2. Using your preferred software, determine which estimator of $\phi$ your software uses by fitting the small dataset illustrating the two competing estimators. Determine if your preferred software has options for using either approach.

3. Pan (2001b) introduced an alternate calculation of the modified sandwich estimate of variance. Explain how Pan's formulation differs from the usual sandwich estimate of variance. For large samples, do you think there will be a significant difference from the usual calculation?

4. Explain why the sandwich estimate of variance for the PA-GEE model results in standard errors for the regression coefficients that are robust to misspecification of the hypothesized correlation structure.

5. Using the ship accident data, fit a PA-GEE model assuming a stationary(1) correlation structure. Interpret the regression coefficients as incidence rate ratios (IRRs). If your software prints out coefficients, then calculate the IRRs and standard errors using the delta method.
6. Discuss the motivations for the preference of independence models over more complicated models that include parameters for correlated data.

7. For the class of GLMs, show that the observed Hessian is equal to the expected Hessian when the GLM is constructed with the canonical link. Discuss what this implies for the sandwich estimate of variance for PA-GEE models.

8. Data are collected for a longitudinal study of AIDS behavior among men in San Francisco.* Subjects were recruited and surveyed annually. With complete data for 5 annual measurements, the goal of the analysis is to determine the factors influencing an individual's probability of engaging in unsafe sexual behavior. The binary outcome is whether the individual engaged in unsafe behavior. The covariates include the age of the individual in years, an indicator of whether the man was involved in a primary gay relationship, an indicator of whether the man was involved in a monogamous relationship, an indicator of whether the man had been tested for HIV, and the number of AIDS symptoms reported for the year. For the following questions, indicate whether you would fit a population-averaged or a subject-specific model.
(a) Does the probability of engaging in unsafe behavior depend on the age of the individual?
(b) Does the unsafe behavior decrease over time for the population of men?
(c) Does the unsafe behavior decrease over time for an individual?
(d) Does the unsafe behavior decrease for an individual once they learn the results of their HIV test?
(e) Does the unsafe behavior decrease over time adjusted for the previous year?**

* This description is based on a study conducted by McKusick, Coates, Morin, Pollack, and Hoff (1990). Neuhaus (1992) presents results for applying various panel data models.

** This is a trick question!

CHAPTER 4

Residuals, Diagnostics, and Testing

This chapter highlights the techniques and measures used to evaluate GEE models. We also discuss techniques for choosing between competing models and the extensions of familiar diagnostics and graphical methods to GEE.

Each member distribution of the exponential family has an implied quasilikelihood. In Table 4.1 and Table 4.2, we list the log-likelihood functions and the implied quasilikelihood functions, since they prove to be useful for several of the diagnostics presented in this chapter.
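The quasilikelihoods listed in Table 4.2 (with $\phi = 1$) are constructed so that $\partial Q/\partial\mu = (y-\mu)/V(\mu)$, the term that drives the PA-GEE estimating equation. A short Python sketch can confirm this numerically for each family. The dictionary `QUASI` and the helper `score_check` are illustrative names, the test values are arbitrary, and the central-difference step size is an assumption chosen for double precision.

```python
import math

# Variance function V(mu) and quasilikelihood Q(y; mu) for each family in
# Table 4.2 (phi = 1; the binomial entry treats mu as a probability).
QUASI = {
    "gaussian": (lambda mu: 1.0,
                 lambda y, mu: -0.5 * (y - mu) ** 2),
    "binomial": (lambda mu: mu * (1.0 - mu),
                 lambda y, mu: y * math.log(mu / (1.0 - mu)) + math.log(1.0 - mu)),
    "poisson": (lambda mu: mu,
                lambda y, mu: y * math.log(mu) - mu),
    "gamma": (lambda mu: mu ** 2,
              lambda y, mu: -(y / mu + math.log(mu))),
    "inverse_gaussian": (lambda mu: mu ** 3,
                         lambda y, mu: -y / (2.0 * mu ** 2) + 1.0 / mu),
}

def score_check(family, y, mu, h=1e-6):
    """Compare a central-difference dQ/dmu with (y - mu) / V(mu)."""
    variance, quasi = QUASI[family]
    numeric = (quasi(y, mu + h) - quasi(y, mu - h)) / (2.0 * h)
    analytic = (y - mu) / variance(mu)
    return numeric, analytic

CASES = [("gaussian", 1.0, 0.2), ("binomial", 1.0, 0.4),
         ("poisson", 3.0, 2.5), ("gamma", 1.7, 2.0),
         ("inverse_gaussian", 1.2, 0.9)]

for family, y, mu in CASES:
    numeric, analytic = score_check(family, y, mu)
    print(f"{family:17s} numeric={numeric:.6f} analytic={analytic:.6f}")
```

The two columns agree to within the finite-difference error for every family, which is the property that lets these quasilikelihoods stand in for full log-likelihoods in several diagnostics.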

Family              $\mathcal{L}$
Gaussian            $-\frac{1}{2}\sum\left\{\frac{(y-\mu)^2}{\phi}+\ln(2\pi\phi)\right\}$
Binomial(k)         $\sum\left\{\ln\Gamma(k+1)-\ln\Gamma(y+1)-\ln\Gamma(k-y+1)+y\ln\left(\frac{\mu}{k}\right)+(k-y)\ln\left(1-\frac{\mu}{k}\right)\right\}$
Poisson             $\sum\left\{-\mu+y\ln(\mu)-\ln\Gamma(y+1)\right\}$
Gamma               $\sum\left\{\frac{1-\phi}{\phi}\ln(y)-\frac{1}{\phi}\ln(\phi)-\frac{y}{\phi\mu}-\frac{1}{\phi}\ln(\mu)-\ln\Gamma\left(\frac{1}{\phi}\right)\right\}$
Inverse Gaussian    $-\frac{1}{2}\sum\left\{\frac{(y-\mu)^2}{y\mu^2\phi}+\ln(\phi y^3)+\ln(2\pi)\right\}$

Table 4.1 Log-likelihoods for exponential family members
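As a check on the entries in Table 4.1, the following Python sketch codes each per-observation log-likelihood and compares it with the log density written in a standard parameterization. The helper names are hypothetical, and the parameterizations are assumptions chosen so that the variance is $\phi V(\mu)$: the gamma has shape $1/\phi$ and scale $\phi\mu$, the inverse Gaussian has $\lambda = 1/\phi$, and for binomial($k$) $\mu$ is the mean count, so $\mu/k$ is the success probability.

```python
import math

# Per-observation log-likelihoods transcribed from Table 4.1 (hypothetical helper names).
def ll_gaussian(y, mu, phi):
    return -0.5 * ((y - mu) ** 2 / phi + math.log(2 * math.pi * phi))

def ll_binomial_k(y, mu, k):
    # mu is the mean count, so mu / k is the success probability
    return (math.lgamma(k + 1) - math.lgamma(y + 1) - math.lgamma(k - y + 1)
            + y * math.log(mu / k) + (k - y) * math.log(1 - mu / k))

def ll_poisson(y, mu):
    return -mu + y * math.log(mu) - math.lgamma(y + 1)

def ll_gamma(y, mu, phi):
    return ((1 - phi) / phi * math.log(y) - math.log(phi) / phi
            - y / (phi * mu) - math.log(mu) / phi - math.lgamma(1 / phi))

def ll_inverse_gaussian(y, mu, phi):
    return -0.5 * ((y - mu) ** 2 / (y * mu ** 2 * phi)
                   + math.log(phi * y ** 3) + math.log(2 * math.pi))

# Reference log densities in standard parameterizations, for comparison.
def ref_binomial(y, prob, k):
    return math.log(math.comb(k, y)) + y * math.log(prob) + (k - y) * math.log(1 - prob)

def ref_gamma(y, shape, scale):
    return ((shape - 1) * math.log(y) - y / scale
            - math.lgamma(shape) - shape * math.log(scale))

def ref_inverse_gaussian(y, mu, lam):
    return (0.5 * math.log(lam / (2 * math.pi * y ** 3))
            - lam * (y - mu) ** 2 / (2 * mu ** 2 * y))

y, mu, phi = 1.7, 2.3, 0.6
print(ll_gamma(y, mu, phi) - ref_gamma(y, 1 / phi, phi * mu))
print(ll_inverse_gaussian(y, mu, phi) - ref_inverse_gaussian(y, mu, 1 / phi))
print(ll_binomial_k(3, 2.0, 10) - ref_binomial(3, 0.2, 10))
```

Each printed difference should be zero up to floating-point rounding, which supports reading the table entries in these parameterizations.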

Family              $Q$
Gaussian            $-\frac{1}{2}\sum(y-\mu)^2$
Binomial(k)         $\sum\left\{y\log\left(\frac{\mu}{1-\mu}\right)+\log(1-\mu)\right\}$
Poisson             $\sum\left\{y\log\mu-\mu\right\}$
Gamma               $-\sum\left\{\frac{y}{\mu}+\ln\mu\right\}$
Inverse Gaussian    $\sum\left\{-\frac{y}{2\mu^2}+\frac{1}{\mu}\right\}$

Table 4.2 Quasilikelihoods for exponential family members

We first review the estimation procedure for the GEE1 for GLMs in order to establish the quantities needed to define the diagnostics required for model evaluation. Recall the estimating equation for the PA-GEE for GLMs:

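$$\Psi(\beta,\alpha) = \left(\Psi_\beta(\beta,\alpha),\ \Psi_\alpha(\beta,\alpha)\right) \qquad (4.1)$$

$$\Psi_\beta(\beta,\alpha) = \sum_{i=1}^{n} X_i^{T}\, D\!\left(\frac{\partial\mu}{\partial\eta}\right) V^{-1}(\mu_i)\left(\frac{y_i-\mu_i}{a(\phi)}\right)$$

$$\Psi_\alpha(\beta,\alpha) = \sum_{i=1}^{n}\left(\frac{\partial\xi_i}{\partial\alpha}\right)^{T} H_i^{-1}\left(W_i-\xi_i\right) \qquad (4.2)$$

$$V(\mu_i) = D\!\left(V(\mu_{it})\right)^{1/2} R(\alpha)\, D\!\left(V(\mu_{it})\right)^{1/2} \qquad (4.3)$$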

The heart of the procedure for solving this estimating equation is the iteratively reweighted least squares (IRLS) algorithm. This algorithm is a modification of the Newton-Raphson algorithm in which the expected Hessian matrix is substituted for the observed Hessian. The modification is known as the method of Fisher scoring. An updating equation for $\beta$ is available under this approach such that

$$\beta^{(r)} = \beta^{(r-1)} + \left\{\sum_{i=1}^{n} D_i^{T} V^{-1}(\mu_i)\, D_i\right\}^{-1}\left\{\sum_{i=1}^{n} D_i^{T} V^{-1}(\mu_i)\, S_i\right\} \qquad (4.4)$$

where

$$D_i = D\!\left(\frac{\partial\mu}{\partial\eta}\right) X_i \qquad (4.5)$$

$$S_i = y_i - g^{-1}(X_i\beta) \qquad (4.6)$$

In this form, the updating equation clearly has the form of a weighted least squares regression algorithm with a (synthetic) dependent variable given by the $(\sum n_i) \times 1$ column vector

$$Z_{it} = \left(y_{it}-\mu_{it}\right)\left(\frac{\partial\eta}{\partial\mu}\right)_{it} + \eta_{it} \qquad (4.7)$$

The weighted least squares algorithm involves the covariates, $X$, and the weights

$$W = D\left\{\frac{1}{V(\mu_{it})\,a(\phi)}\left(\frac{\partial\mu}{\partial\eta}\right)_{it}^{2}\right\} \qquad (4.8)$$

Written explicitly, we see that the updating equation is

$$\beta^{\mathrm{new}} = \left(X^{T}WX\right)^{-1}X^{T}WZ \qquad (4.9)$$
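To see why (4.9) is a weighted least squares form of the update (4.4), the following Python sketch computes one step both ways and checks that they agree. It assumes Poisson variance $V(\mu)=\mu$, the log link, $a(\phi)=1$, and an independence working correlation so that $V(\mu_i)$ is diagonal; the function names and data values are illustrative only.

```python
import numpy as np

def fisher_step(X, y, beta):
    """One update from (4.4) with an independence working correlation (Poisson, log link)."""
    eta = X @ beta
    mu = np.exp(eta)
    D = mu[:, None] * X                    # D(dmu/deta) X; dmu/deta = mu for the log link
    Vinv = np.diag(1.0 / mu)               # V(mu) = mu, a(phi) = 1, R = I
    S = y - mu
    return beta + np.linalg.solve(D.T @ Vinv @ D, D.T @ Vinv @ S)

def wls_step(X, y, beta):
    """The same update written as (4.9), using the synthetic response (4.7) and weights (4.8)."""
    eta = X @ beta
    mu = np.exp(eta)
    z = (y - mu) / mu + eta                # (y - mu) * deta/dmu + eta
    w = mu ** 2 / mu                       # (dmu/deta)^2 / V(mu), which equals mu here
    XtW = X.T * w                          # X^T W without forming the diagonal matrix
    return np.linalg.solve(XtW @ X, XtW @ z)

X = np.column_stack([np.ones(6), [0.0, 1.0, 2.0, 0.5, 1.5, 2.5]])
y = np.array([1.0, 2.0, 6.0, 1.0, 3.0, 9.0])
beta0 = np.array([0.1, 0.4])
print(fisher_step(X, y, beta0))
print(wls_step(X, y, beta0))
```

Both lines print the same vector: the WLS form absorbs the current $\beta$ into $Z$ through the $\eta_{it}$ term, so it returns the new estimate rather than an increment.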

The solution entails the alternating estimation of $\beta$ and of $\alpha$, with the results of each estimate being used to update values for the calculation of the subsequent estimate. Iterations continue until a predetermined criterion of convergence is reached. This is possible for the GEE1 models since we assume that the two estimating equations are orthogonal. While we have specifically emphasized the estimating equations for the PA-GEE models, the estimation steps are the same for alternating logistic regression (ALR) and the PA-EGEE models. However, the description given does not address estimation of GEE2 models.
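The alternation can be sketched in a few dozen lines. This Python version is a minimal illustration under stated assumptions, not the software used in the text: it fits a Poisson, log-link PA-GEE with an exchangeable working correlation, takes the $\beta$ step from (4.4), and then re-estimates $\phi$ and $\alpha$ with Pearson-residual moment estimators of the kind commonly used for exchangeable GEE (the particular estimators and the function name `gee_poisson_exchangeable` are assumptions).

```python
import numpy as np

def exchangeable(n, alpha):
    R = np.full((n, n), alpha)
    np.fill_diagonal(R, 1.0)
    return R

def gee_poisson_exchangeable(y, X, groups, tol=1e-8, max_iter=100):
    """Alternate the beta update (4.4) with moment estimates of phi and alpha."""
    N, p = X.shape
    panels = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    beta, alpha, phi = np.zeros(p), 0.0, 1.0
    for _ in range(max_iter):
        # beta step: Fisher scoring with the current working correlation
        mu = np.exp(X @ beta)
        H = np.zeros((p, p))
        U = np.zeros(p)
        for idx in panels:
            Xi, yi, mui = X[idx], y[idx], mu[idx]
            Di = mui[:, None] * Xi                      # D(dmu/deta) X_i for the log link
            A = np.diag(np.sqrt(mui))                   # D(V(mu_it))^{1/2} with V(mu) = mu
            Vi = phi * A @ exchangeable(len(idx), alpha) @ A
            ViD = np.linalg.solve(Vi, Di)               # V_i^{-1} D_i
            H += Di.T @ ViD
            U += ViD.T @ (yi - mui)                     # D_i^T V_i^{-1} S_i
        step = np.linalg.solve(H, U)
        beta = beta + step
        # alpha step: moment estimates from Pearson residuals at the new beta
        mu = np.exp(X @ beta)
        r = (y - mu) / np.sqrt(mu)
        phi = np.sum(r ** 2) / (N - p)
        cross, pairs = 0.0, 0
        for idx in panels:
            ri = r[idx]
            cross += (ri.sum() ** 2 - np.sum(ri ** 2)) / 2.0   # sum over t < t' of r_t r_t'
            pairs += len(idx) * (len(idx) - 1) // 2
        alpha = cross / (phi * (pairs - p))
        if np.max(np.abs(step)) < tol:
            break
    return beta, alpha, phi

# Illustrative use on independent Poisson panels (true beta = 0.5, 0.3; true alpha = 0).
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(100), 5)
x = rng.normal(size=groups.size)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x)).astype(float)
beta, alpha, phi = gee_poisson_exchangeable(y, X, groups)
print(beta, alpha, phi)
```

Because the simulated panels are independent, the estimated $\alpha$ should be near zero and $\phi$ near one; the sketch does not compute the sandwich variance.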

4.1 Criterion measures

Several criterion measures have recently been proposed for evaluating GEE-constructed models. In the next few subsections we highlight several useful measures for evaluating the goodness of fit of the model, choosing the best correlation structure for a PA-GEE model, and choosing the best collection of covariates for a given correlation structure.

Akaike's information criterion (AIC) is a well-established goodness-of-fit statistic for likelihood-based model selection. Pan (2001a) introduced two useful extensions of this measure that we illustrate in the following subsections.

4.1.1 Choosing the best correlation structure

The AIC for likelihood-based models is defined as

$$\mathrm{AIC} = -2\mathcal{L} + 2p \qquad (4.10)$$

where $p$ is the number of parameters in the model. The goal is to generalize this measure for quasilikelihood models. Since $\mathcal{L}$ is, by definition, the log-likelihood, it seems obvious that we should be able to replace it with the quasilikelihood $Q$. The penalty term in the AIC should also be generalized. Pan shows how we can derive a new measure called the quasilikelihood under the independence model information criterion (QIC).


Recall the quasilikelihood for PA-GEE models:

$$Q(y;\mu) = \int_{y}^{\mu}\frac{y-\mu^{*}}{V(\mu^{*})}\,d\mu^{*} \qquad (4.11)$$
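Definition (4.11) can be checked numerically against the closed forms in Table 4.2. Because the table entries drop terms that do not involve $\mu$, the integral from $\mu_1$ to $\mu_2$ should equal the difference of the table entries at $\mu_2$ and $\mu_1$. This Python sketch assumes unit dispersion, checks the binomial row in its Bernoulli form with $V(\mu)=\mu(1-\mu)$ (which is how the table entry reads), and uses a hand-written composite Simpson rule as an arbitrary choice of quadrature.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

# Variance functions V(mu) and per-observation closed forms from Table 4.2.
VARIANCE = {
    "gaussian": lambda m: 1.0,
    "bernoulli": lambda m: m * (1 - m),
    "poisson": lambda m: m,
    "gamma": lambda m: m ** 2,
    "inverse_gaussian": lambda m: m ** 3,
}
Q_TABLE = {
    "gaussian": lambda y, m: -0.5 * (y - m) ** 2,
    "bernoulli": lambda y, m: y * math.log(m / (1 - m)) + math.log(1 - m),
    "poisson": lambda y, m: y * math.log(m) - m,
    "gamma": lambda y, m: -(y / m + math.log(m)),
    "inverse_gaussian": lambda y, m: -y / (2 * m ** 2) + 1 / m,
}

def quasi_integral(family, y, mu_a, mu_b):
    """Integral of (y - t) / V(t) dt from mu_a to mu_b, as in (4.11)."""
    V = VARIANCE[family]
    return simpson(lambda t: (y - t) / V(t), mu_a, mu_b)

for fam in VARIANCE:
    y, m1, m2 = (1.0, 0.3, 0.6) if fam == "bernoulli" else (2.0, 1.5, 3.0)
    numeric = quasi_integral(fam, y, m1, m2)
    closed = Q_TABLE[fam](y, m2) - Q_TABLE[fam](y, m1)
    print(f"{fam:17s} {numeric:.10f} {closed:.10f}")
```

The two columns agree to the printed precision, illustrating that only differences in $Q$ (as used in comparing fitted values) are pinned down by (4.11).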

We list quasilikelihoods for various distributions in the exponential family in Table 4.2. Regardless of the correlation structure $R(\alpha)$ used in fitting the PA-GEE model, the quasilikelihood is calculated under the assumption of independence, $R = I$. It uses both the model coefficient estimates and the correlation in the process; however, the quasilikelihood itself does not directly address any type of correlation. The penalty term of the AIC, the $2p$ term, is calculated for the QIC as $2\,\mathrm{trace}(A_I^{-1}V_{MS,R})$, where $A_I$ is the variance matrix for the independence model and $V_{MS,R}$ is the sandwich estimate of variance for the correlated model. $\mathrm{QIC}(R)$ is defined from these terms as

$$\mathrm{QIC}(R) = -2Q\left(y;\,g^{-1}(x\hat\beta_R)\right) + 2\,\mathrm{trace}\left(A_I^{-1}V_{MS,R}\right) \qquad (4.12)$$

The notation emphasizes:
• $Q(y;\,g^{-1}(x\hat\beta_R))$ is the value of the quasilikelihood computed using the coefficients from the model with hypothesized correlation structure $R$. In evaluating the quasilikelihood, we use $\hat\mu = g^{-1}(x\hat\beta_R)$ in place of $\mu$, where $g^{-1}()$ is the inverse link function for the model.
• $A_I$ is the variance matrix obtained by fitting an independence model.
• $V_{MS,R}$ is the modified sandwich estimate of variance from the model with hypothesized correlation structure $R$.
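Once $Q$, $A_I$, and $V_{MS,R}$ are available from fitted models, the criterion is a few lines of linear algebra. In this Python sketch the function names and the matrices are hypothetical illustrations, not output from any fitted model; the companion `qicu` anticipates the $2p$-penalty form used for covariate selection in section 4.1.2.

```python
import numpy as np

def qic(quasi_loglik, A_I, V_MS_R):
    """QIC(R) from (4.12): -2Q + 2 trace(A_I^{-1} V_MS,R)."""
    penalty = np.trace(np.linalg.solve(A_I, V_MS_R))   # avoids forming A_I^{-1} explicitly
    return -2.0 * quasi_loglik + 2.0 * penalty

def qicu(quasi_loglik, p):
    """The 2p-penalty version: -2Q + 2p."""
    return -2.0 * quasi_loglik + 2.0 * p

A_I = np.array([[0.04, 0.01], [0.01, 0.09]])    # illustrative independence-model variance
V_R = np.array([[0.05, 0.012], [0.012, 0.11]])  # illustrative sandwich variance
print(qic(-1600.0, A_I, V_R), qicu(-1600.0, 2))
```

When the sandwich variance equals the independence-model variance, the trace is $p$ and the two penalties coincide, which mirrors the statement that QIC reduces to an AIC-type measure for an independence model with a proper likelihood.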

Since the definition of the QIC is in terms of the hypothesized correlation structure $R$, we can use this measure to choose between several competing correlation structures. As with the AIC, the best model is the one with the smallest measure. The QIC is equal to the AIC when the model implies a likelihood proper and we are fitting an independence model (less a constant normalizing term).

We simulate data (see section 5.2.4) that follow an exchangeable correlation binomial-logit model where the common correlation is 0.4 for a balanced dataset with 50 individuals, each with 8 replicated observations. This simulation is similar to the one performed by Pan in the previously cited reference and is summarized in Table 4.3. Using the QIC measure to choose among these 7 competing correlation structures leads us to select the exchangeable correlation model.

Correlation          QIC
Exchangeable         449.7804
Independent          451.3903
AR(2)                451.7270
AR(1)                452.0540
Unstructured         452.2829
Nonstationary(2)     453.2632
Stationary(2)        453.4091

Table 4.3 Simulation results for the QIC measure. The true correlation structure used in simulating the data is exchangeable.

We also computed the QIC for several correlation structures using the Progabide data analyzed in Chapter 3. The results for various correlation structures are given in Table 4.4. Using the QIC measure to choose among the competing correlation structures illustrated in Chapter 3 again leads to the selection of the exchangeable correlation model.

Correlation          QIC
Exchangeable         3206.677
AR(2)                3212.521
Unstructured         3225.236
Stationary(2)        3233.845
Nonstationary(2)     3233.845

Table 4.4 QIC measures for several correlation structures for the PA-GEE Poisson model of the Progabide data.

In the previous chapter we also looked at simulated data (section 5.2.5) with a complicated correlation structure. We examined various methods for fitting an exchangeable correlation structure and then, in section 3.2.1.6, estimated a model matching the generating correlation structure. Computing the QIC statistic for various correlation structures yields the results in Table 4.5, validating the use of the more complicated correlation structure for fitting the model.

Correlation          QIC
Correct              171.894
AR(2)                172.917
Stationary(3)        173.390
Independent          173.656
Exchangeable         173.933

Table 4.5 QIC measures for several correlation structures for the PA-GEE linear regression model of the data in section 5.2.5.

In choosing the best correlation structure, we offer the following general guidelines.
• If the size of the panels is small and the data are complete, use the unstructured correlation specification.
• If the observations within a panel are collected for the same PSU over time, then use a specification that also has a time dependence.
• If the observations are clustered (not collected over time), then use the exchangeable correlation structure.
• If the number of panels is small, then the independence model may be the best; but calculate the sandwich estimate of variance for use with hypothesis tests and interpretation of coefficients.
• If more than one correlation specification satisfies the above descriptions, use the QIC measure to discern the best choice.

Of course, if there is motivating scientific evidence of a particular correlation structure, then that specification should be used. The QIC measure, like any model selection criterion, should not be blindly followed.

4.1.2 Choosing the best subset of covariates

The $\mathrm{QIC}_u$ is a measure that can be used to determine the best subset of covariates for a particular model. The measure is defined as

$$\mathrm{QIC}_u = -2Q\left(y;\,g^{-1}(x\hat\beta_R)\right) + 2p \qquad (4.13)$$

where the notation emphasizes that the quasilikelihood is calculated for the independence model, but with the regression coefficients fitted for the hypothesized correlation structure.

In choosing between (two or more) models, the model with the smallest $\mathrm{QIC}_u$ criterion measure is preferred.

As a short first example, we look at the Progabide dataset. We fit all 1-factor and 2-factor models as well as the full (3-factor) model for comparison. The sorted results are presented in Table 4.6.

Covariates                      QIC_u
time                            3202.203
time progabide timeXprog        3206.677
time timeXprog                  3207.649
time progabide                  3209.472
progabide timeXprog             3253.112
progabide                       3253.736
timeXprog                       3257.007

Table 4.6 QIC_u measures for models of the Progabide data.
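The size of the penalty's contribution can be read off Table 4.6 directly. The full model has two more covariates than the time-only model, so its $2p$ penalty is larger by 4 whichever way the constant is counted; whatever remains of the gap is the change in $-2Q$. This short Python computation uses only the two table values.

```python
# QIC_u values transcribed from Table 4.6
qicu = {
    ("time",): 3202.203,
    ("time", "progabide", "timeXprog"): 3206.677,
}
extra_covariates = 3 - 1                     # full model vs. time-only model
penalty_gap = 2 * extra_covariates           # difference in the 2p terms
total_gap = qicu[("time", "progabide", "timeXprog")] - qicu[("time",)]
quasi_gap = total_gap - penalty_gap          # difference in -2Q
print(round(total_gap, 3), penalty_gap, round(quasi_gap, 3))
```

Of the 4.474-unit difference, 4 units come from the penalty and only about 0.474 from the quasilikelihood.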

Using only the $\mathrm{QIC}_u$ measure, the best model includes only the time variable. Note that the difference in the criterion measure between the best model and the full model is almost entirely due to the penalty ($2p$) term. This criterion is meant as a guide for choosing between models when no scientific knowledge would guide the researcher to a preference. Despite the results of this investigation, we still prefer the full model.

4.2 Analysis of residuals

An analysis of data includes an important final check that the selected model adequately fits the data. This part of the analysis focuses on uncovering significant departures in the data from the model assumptions. We focus on two types of departure. The first is an observation (isolated) departure; the second is a model (systematic) departure.

4.2.1 A nonparametric test of the randomness of residuals

One cannot apply many well-known techniques, without modification, to the case of PA-GEE for GLMs. Chang (2000) advises the use of the Wald-Wolfowitz run test to assist the analyst in uncovering possible patterns of nonrandomness using scatter plots of residuals. The test codes the residuals with an indicator of whether the residual is positive ('1') or negative ('-1'). The sequence of codes is then examined, and a count of the total number of runs of the two codes is computed without regard to the length of any given run.

Let $n_p$ indicate the total number of positive residuals, $n_n$ indicate the total number of negative residuals, and $T$ indicate the number of observed runs in our sequence. Under the null hypothesis that the signs of the residuals are distributed in a random sequence, the expected value and variance of $T$ are:

$$E(T) = \frac{2n_pn_n}{n_p+n_n} + 1 \qquad (4.14)$$

$$V(T) = \frac{2n_pn_n\left(2n_pn_n - n_p - n_n\right)}{\left(n_p+n_n\right)^2\left(n_p+n_n-1\right)} \qquad (4.15)$$

A test statistic for this hypothesis is then

$$W_z = \frac{T - E(T)}{\sqrt{V(T)}} \qquad (4.16)$$

which has an approximately standard normal distribution. Extreme values of $W_z$ indicate that the model does not adequately reflect the underlying structure of the data.

Clearly, this test relies on a specific ordering of the residuals. As such, the test may be amended in order to assess different hypotheses. An overall test of the panel structure of the model could sort the residuals in the "natural order." That is, the data would be sorted by the panel identifier $i$ and the repeated measures identifier $t$ within $i$. Alternatively, if we wish to assess whether a given (continuous) covariate is specified in the correct functional form, we can sort the residuals on that covariate, or we can test the model adequacy by sorting on the fitted values.

4.2.2 Graphical assessment

The first step in an exploratory data analysis (EDA) should include a graphical illustration of the raw data. To accomplish this, we want to include illustrations of the data that reflect the panel nature. One such approach (for nonbinomial models) shows boxplots of the outcome for each of the repeated measures. Using the Progabide data, we can illustrate boxplots for the baseline and four follow-ups of the seizure counts.

[Figure: Boxplots of seizures by observation time (baseline, 1, 2, 3, 4)]

Since we use the log link with Poisson variance to model the counts, we can also illustrate the log of the seizure counts.

[Figure: Boxplots of log(seizures) by observation time (baseline, 1, 2, 3, 4)]

These boxplots show the raw and log-transformed seizure counts for the entire dataset. Our analysis is primarily focused on the efficacy of the Progabide treatment. As such, we also illustrate boxplots of placebo and Progabide observations for each observation time. The log-transformed data are used for illustration. Boxplots for the placebo are

[Figure: Boxplots of log(seizures) by observation time [Placebo] (baseline, 1, 2, 3, 4)]

and boxplots for the Progabide treatment are displayed as

[Figure: Boxplots of log(seizures) by observation time [Progabide] (baseline, 1, 2, 3, 4)]

Standard approaches used for model building apply equally to GEE models. One should assess model adequacy via the same types of residual plots used in linear models, with the added requirement that the illustration should identify the panel structure of the data. In the previous chapter we used the two simulated datasets from section 5.2.5 (linear data) and section 5.2.4 (logistic data). Here we illustrate the resulting residual plots for fitting a number of different assumed correlation structures in a PA-GEE model. The linear data are generated from a linear model with a correlation structure not (directly) supported by software. The logistic data are generated from a logistic model where the data are characterized by an exchangeable correlation structure.

Suppose we wish to examine the results for fitting an independent correlation, an exchangeable correlation, and the theoretically correct correlation structure to the linear data. The results of fitting the independent correlation model to the linear data are:

GEE population-averaged model                   Number of obs      =        80
Group variable:                         id      Number of groups   =        10
Link:                             identity      Obs per group: min =         8
Family:                           Gaussian                     avg =       8.0
Correlation:                   independent                     max =         8
                                                Wald chi2(2)       =     50.54
Scale parameter:                  1.029406      Prob > chi2        =    0.0000

Pearson chi2(80):                    82.35      Deviance           =     82.35
Dispersion (Pearson):             1.029406      Dispersion         =  1.029406

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.158479   .2397294     4.83   0.000     .6886185     1.62834
          x2 |   1.158928   .2708094     4.28   0.000     .6281517    1.689705
       _cons |   .8813407    .238173     3.70   0.000     .4145301    1.348151
------------------------------------------------------------------------------

The residual plot for this model is illustrated below.

[Figure: Residuals versus Fitted Values (PA-GEE model with independent correlation); residuals are plotted using the panel identifiers 0-9 as symbols; horizontal axis: fitted values]

Since there are only 3 distinct covariate patterns in these data, the residual plot includes only 3 distinct values on the horizontal axis. We moved each identifier slightly (in the horizontal direction) to more clearly observe the panel identifiers of the residuals; this is a standard graphical technique called jitter. Plots of this type are routinely examined to see if residuals in each panel have the same sign.

We calculate the runs test to examine the randomness of the residuals. Below is a graphical illustration of the test.

[Figure: Graphical Illustration of Residual Runs (PA-GEE model with independent correlation)]

The test results provide the following statistics:

$$n_p = 42 \tag{4.17}$$
$$n_n = 38 \tag{4.18}$$
$$T = 44 \tag{4.19}$$
$$E(T) = \frac{2n_pn_n}{n_p + n_n} + 1 = 40.9 \tag{4.20}$$
$$V(T) = \frac{2n_pn_n(2n_pn_n - n_p - n_n)}{(n_p + n_n)^2(n_p + n_n - 1)} = 19.65 \tag{4.21}$$
$$z = \frac{44 - 40.9}{\sqrt{19.65}} = 0.6993 \tag{4.22}$$
$$p = .2422 \tag{4.23}$$
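The statistics in (4.17) through (4.23) can be reproduced from an ordered residual sequence with a short function. The sketch below is in Python (the text itself uses Stata and S-PLUS); it drops exact zeros, a simplifying assumption, and reports the one-sided p-value P(Z > |z|), which is the convention consistent with the p-values printed in this section. The names `runs_test`, `pos_runs`, and `neg_runs` are illustrative only.

```python
import math
from itertools import groupby


def runs_test(residuals):
    """Wald-Wolfowitz runs test on the signs of an ordered residual sequence."""
    # Drop exact zeros (a simplifying assumption) and keep the sign pattern.
    signs = [r > 0 for r in residuals if r != 0]
    n_p = sum(signs)                     # number of positive residuals
    n_n = len(signs) - n_p               # number of negative residuals
    t = sum(1 for _ in groupby(signs))   # T: maximal blocks of a common sign
    n = n_p + n_n
    expected = 2.0 * n_p * n_n / n + 1.0
    variance = 2.0 * n_p * n_n * (2.0 * n_p * n_n - n) / (n ** 2 * (n - 1))
    z = (t - expected) / math.sqrt(variance)
    p = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))  # one-sided P(Z > |z|)
    return {"n_p": n_p, "n_n": n_n, "T": t, "E": expected,
            "V": variance, "z": z, "p": p}


# Build a sign sequence with n_p = 42, n_n = 38 and T = 44 alternating runs.
pos_runs = [2] * 20 + [1] * 2   # 22 positive runs holding 42 residuals
neg_runs = [2] * 16 + [1] * 6   # 22 negative runs holding 38 residuals
resid = []
for a, b in zip(pos_runs, neg_runs):
    resid += [1.0] * a + [-1.0] * b

out = runs_test(resid)
print({k: round(v, 4) for k, v in out.items()})
```

For this constructed sequence the function gives E(T) = 40.9, V(T) of about 19.65, z of about 0.699, and p of about .242, in line with (4.20) through (4.23).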

The test reveals that there is not enough evidence to reject the hypothesis that the residuals from the model are random. In general, the result of the runs test does not change significantly with the hypothesized correlation structure when the model is correct, that is, when it includes the necessary covariates in their proper form. To select the best correlation structure, we instead use the QIC measure.


The graphical illustration of the residual runs test is produced by plotting the sign of each residual versus the observation number, where the data are sorted by panel identifier and by repeated-measure identifier within panel. The vertical lines mark the changes of sign, so their count is the number of runs in the residuals less one. The grid lines indicate the breaks between panels and give us another way to check the number of panels in which the residuals share a common sign. For a somewhat larger dataset, we would still produce this plot but break the presentation into several smaller units. This type of plot is not useful for very large datasets, since the information becomes too dense and the plot is rather indecipherable.

For the logistic data, we can first fit an independence model:

GEE population-averaged model                   Number of obs      =       400
Group variable:                         id      Number of groups   =        50
Link:                                logit      Obs per group: min =         8
Family:                           binomial                     avg =       8.0
Correlation:                  exchangeable                     max =         8
                                                Wald chi2(3)       =     20.48
Scale parameter:                         1      Prob > chi2        =    0.0001

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.3325008   .3533269    -0.94   0.347    -1.025009    .3600073
          x2 |   .2515805   .1081765     2.33   0.020     .0395584    .4636025
          x3 |   .1423381   .0374987     3.80   0.000      .068842    .2158342
       _cons |   1.277268   .2843566     4.49   0.000     .7199388    1.834596
------------------------------------------------------------------------------
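The bookkeeping behind the graphical runs illustration described in this section (sort by panel and then by repeated-measure identifier, count sign runs, and count panels whose residuals share a common sign) can be sketched as follows. The `(panel_id, time_id, residual)` tuple layout and the function name are hypothetical, and exact zeros are counted as negative here for simplicity.

```python
from itertools import groupby


def panel_sign_summary(records):
    """records: (panel_id, time_id, residual) tuples; this layout is hypothetical."""
    # Natural order: by panel identifier, then repeated-measure identifier within panel.
    ordered = sorted(records, key=lambda rec: (rec[0], rec[1]))
    signs = [rec[2] > 0 for rec in ordered]  # zero residuals count as negative here
    runs = sum(1 for _ in groupby(signs))
    sign_changes = runs - 1                  # vertical lines drawn in the plot
    common_sign_panels = 0
    for _, group in groupby(ordered, key=lambda rec: rec[0]):
        if len({rec[2] > 0 for rec in group}) == 1:
            common_sign_panels += 1
    return runs, sign_changes, common_sign_panels


records = [(2, 2, 0.3), (1, 1, 0.5), (3, 2, -0.2),
           (2, 1, -0.1), (1, 2, 0.2), (3, 1, -0.4)]
print(panel_sign_summary(records))
```

In natural order the signs are + + - + - -, so the call returns 4 runs, 3 sign changes, and 2 panels (1 and 3) whose residuals share a sign.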

We plot the Pearson residuals for all panels for each of the 8 repeated values in order to examine whether there is an order effect.

[Figure: eight panels, Residuals versus Fitted Values (t=1) through (t=8). x-axis: Fitted values. PA-GEE model with independent correlation.]

There is no indication in the plots that the residuals depend on either the panel identifier or the repeated measures identifier. In this case, all of the plots are similar. As previously discussed, we can also apply the runs test to the residuals. A graphical illustration of the runs test is displayed as:

[Figure: Graphical Illustration of Residual Runs. PA-GEE model with independent correlation.]

The test results are as follows:

$$n_p = 296 \tag{4.24}$$
$$n_n = 104 \tag{4.25}$$
$$T = 291 \tag{4.26}$$
$$E(T) = \frac{2n_pn_n}{n_p + n_n} + 1 = 154.92 \tag{4.27}$$
$$V(T) = \frac{2n_pn_n(2n_pn_n - n_p - n_n)}{(n_p + n_n)^2(n_p + n_n - 1)} = 58.99 \tag{4.28}$$
$$z = \frac{291 - 154.92}{\sqrt{58.99}} = 17.72 \tag{4.29}$$
$$p < .0001 \tag{4.30}$$

In this case, the test provides a clear indication that the residuals are not random. In fact, we know that these particular data are generated from a correlation structure not directly supported by standard commercial software.

The runs test is also useful for examining the dependence of the residuals on the individual covariates. For this example, we generated data that accord with the model

$$y_{it} = 1 + x_1 + 2x_1^2 + \nu_i + \epsilon_{it} \tag{4.31}$$

where $\nu_i \sim N(0, \sigma_\nu^2)$ and $\epsilon_{it} \sim N(0, 1)$. This is an exchangeable correlation model with $\rho = .3$, the correlation used in generating the data. Suppose that we misspecify the model in the estimation, using only the x1 variable without modeling the square of the covariate (x12). Two techniques exist for discovering this missing covariate. The first method is graphical: we plot the residuals versus the covariate. The second method is to calculate the runs test with the data sorted according to the covariate of interest. This is possible when the covariate under investigation is continuous.
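Both techniques can be tried on simulated data from (4.31). The sketch below assumes 25 panels of 4 observations (matching the outputs for this example) and a hypothetical covariate design, x1 drawn uniformly on (-5, 5), which the text does not specify. With Var(epsilon) = 1, setting sigma_nu^2 = rho / (1 - rho) gives within-panel correlation rho = .3. Because an independence Gaussian GEE with identity link yields the OLS estimates, the misspecified fit is computed with `numpy.linalg.lstsq`; the helper `runs_z` is illustrative.

```python
import numpy as np
from itertools import groupby


def runs_z(values):
    """Number of sign runs and the runs-test z statistic for an ordered sequence."""
    signs = [bool(v > 0) for v in values if v != 0]
    n_p = sum(signs)
    n_n = len(signs) - n_p
    t = sum(1 for _ in groupby(signs))
    n = n_p + n_n
    expected = 2.0 * n_p * n_n / n + 1.0
    variance = 2.0 * n_p * n_n * (2.0 * n_p * n_n - n) / (n ** 2 * (n - 1))
    return t, (t - expected) / np.sqrt(variance)


rng = np.random.default_rng(2003)
n_panels, n_times, rho = 25, 4, 0.3
sigma_nu2 = rho / (1.0 - rho)   # gives corr(y_it, y_is) = rho when Var(eps) = 1
panel = np.repeat(np.arange(n_panels), n_times)
x1 = rng.uniform(-5.0, 5.0, size=panel.size)   # hypothetical covariate design
nu = rng.normal(0.0, np.sqrt(sigma_nu2), size=n_panels)[panel]
y = 1.0 + x1 + 2.0 * x1 ** 2 + nu + rng.normal(0.0, 1.0, size=panel.size)

# Misspecified fit omitting x1^2; independence Gaussian GEE estimates equal OLS.
X = np.column_stack([np.ones_like(x1), x1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

runs_natural, z_natural = runs_z(resid)               # panel/time order
runs_by_x1, z_by_x1 = runs_z(resid[np.argsort(x1)])   # sorted by the covariate
print(runs_natural, round(z_natural, 2), runs_by_x1, round(z_by_x1, 2))
```

Sorting by x1 collapses the residual signs into a handful of runs (positive at both extremes of x1, negative in the middle), giving a large negative z, while the natural order shows no comparable pattern. Exact counts depend on the seed and the assumed design.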

The fit of the independence model using both covariates is presented as:

GEE population-averaged model                   Number of obs      =       100
Group variable:                         id      Number of groups   =        25
Link:                             identity      Obs per group: min =         4
Family:                           Gaussian                     avg =       4.0
Correlation:                   independent                     max =         4
                                                Wald chi2(2)       =  73374.59
Scale parameter:                  1.063566      Prob > chi2        =    0.0000

Pearson chi2(100):                  106.36      Deviance           =    106.36
Dispersion (Pearson):             1.063566      Dispersion         =  1.063566

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .9963109   .0371903    26.79   0.000     .9234193    1.069202
         x12 |   2.006862    .007409   270.87   0.000      1.99234    2.021383
       _cons |   .8291631   .1187399     6.98   0.000     .5964372    1.061889
------------------------------------------------------------------------------
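In the Gaussian outputs for this example, the reported Pearson dispersion is the Pearson chi-square divided by the number of observations, 100 (106.36/100 = 1.0636 here, and 78138.98/100 = 781.39 for the fit without x12), and for the Gaussian family it coincides with the deviance dispersion. A minimal sketch of that computation, with an optional `denom` (a hypothetical parameter) for the degrees-of-freedom alternative n - p:

```python
import numpy as np


def pearson_dispersion(y, mu, variance=np.ones_like, denom=None):
    """Pearson chi-square sum((y - mu)^2 / V(mu)) and the dispersion chi2 / denom."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    chi2 = float(np.sum((y - mu) ** 2 / variance(mu)))  # V(mu) = 1 for Gaussian
    d = y.size if denom is None else denom              # n, as in the outputs above
    return chi2, chi2 / d


# Gaussian identity model: the Pearson statistic is the residual sum of squares.
print(pearson_dispersion([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
# Binomial variance with a degrees-of-freedom denominator n - p (here 3 - 2 = 1).
print(pearson_dispersion([0.2, 0.9, 0.4], [0.3, 0.7, 0.5],
                         variance=lambda m: m * (1.0 - m), denom=1))
```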

The fit of the independence model, excluding the x12 variable, is:

GEE population-averaged model                   Number of obs      =       100
Group variable:                         id      Number of groups   =        25
Link:                             identity      Obs per group: min =         4
Family:                           Gaussian                     avg =       4.0
Correlation:                   independent                     max =         4
                                                Wald chi2(1)       =      0.01
Scale parameter:                  781.3898      Prob > chi2        =    0.9294

Pearson chi2(100):                78138.98      Deviance           =  78138.98
Dispersion (Pearson):             781.3898      Dispersion         =  781.3898

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .0889191   1.003949     0.09   0.929    -1.878785    2.056623
       _cons |    16.6458   2.802395     5.94   0.000     11.15321    22.13839
------------------------------------------------------------------------------


We calculate the runs test for the above model (where the data are in natural order) to obtain the following results:

$$n_p = 22 \tag{4.32}$$
$$n_n = 78 \tag{4.33}$$
$$T = 37 \tag{4.34}$$
$$E(T) = \frac{2n_pn_n}{n_p + n_n} + 1 = 35.32 \tag{4.35}$$
$$V(T) = \frac{2n_pn_n(2n_pn_n - n_p - n_n)}{(n_p + n_n)^2(n_p + n_n - 1)} = 11.55 \tag{4.36}$$
$$z = \frac{37 - 35.32}{\sqrt{11.55}} = 0.494 \tag{4.37}$$
$$p = .3105 \tag{4.38}$$

The test does not indicate nonrandomness in the residuals when the residuals are ordered first by panel identifier and then by the repeated measures identifier within panel. A plot of the residuals sorted in this manner supports the conclusion of the test, while still indicating a misspecification of indeterminate nature.

[Figure: Residuals versus Observation Number (Natural Order). x-axis: Observation number. PA-GEE model with independent correlation.]

The plot clearly shows that the magnitude of the positive residuals is much larger than the magnitude of the negative residuals. However, there is no indication of the causal source. The vertical lines in the plot separate the individual panels; within panels, the residuals do not indicate a time dependence. The plot of the raw residuals versus the fitted values is given below.

[Figure: Residuals versus Fitted Values. x-axis: Fitted values. PA-GEE model with independent correlation.]

A plot of the raw residuals versus the x1 covariate values is displayed as:

[Figure: Residuals versus x1 covariate. PA-GEE model with independent correlation.]

The two previous plots clearly indicate model misspecification. Similarly, the graphical illustration of the runs test shows a deviation from randomness for the residuals when the residuals are sorted by the values of the x1 covariate.

[Figure: Graphical Illustration of Residual Runs, residuals sorted by x1. PA-GEE model with independent correlation.]

Calculating the runs test for this arrangement of the residuals results in

$$n_p = 22 \tag{4.39}$$
$$n_n = 78 \tag{4.40}$$
$$T = 3 \tag{4.41}$$
$$E(T) = \frac{2n_pn_n}{n_p + n_n} + 1 = 35.32 \tag{4.42}$$
$$V(T) = \frac{2n_pn_n(2n_pn_n - n_p - n_n)}{(n_p + n_n)^2(n_p + n_n - 1)} = 11.55 \tag{4.43}$$
$$z = \frac{3 - 35.32}{\sqrt{11.55}} = -9.510 \tag{4.44}$$
$$p < .0001 \tag{4.45}$$

Test results coincide with the graphical assessment: we have strong evidence that the covariate x1 is misspecified in the fitted model. For a continuous outcome model, we can plot the residuals versus each of the sorted covariates. In addition to the graphical assessment and the nonparametric runs test, we can also calculate the $\mathrm{QIC}_u$ criterion measure for the competing models.

$$\mathrm{QIC}_u(x1) = 156354.7721 \tag{4.46}$$
$$\mathrm{QIC}_u(x1, x12) = 220.222 \tag{4.47}$$

The results agree with our other model analyses.
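$\mathrm{QIC}_u$ is commonly defined as $-2Q(\hat\beta; I) + 2p$, where $Q$ is the quasi-likelihood evaluated under the independence working model and $p$ is the number of regression parameters. A minimal sketch for the Gaussian case, assuming $Q = -\sum (y - \mu)^2 / (2\phi)$ with a fixed scale $\phi$ (the `scale` argument, 1 by default). Software differs in how the scale and additive constants enter $Q$, so this sketch is not claimed to reproduce the values in (4.46) and (4.47); only comparisons between models fitted to the same data are meaningful.

```python
import numpy as np


def qicu_gaussian(y, mu, n_params, scale=1.0):
    """QIC_u = -2 Q + 2 p with Gaussian quasi-likelihood Q = -sum((y - mu)^2) / (2 * scale)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    quasi_loglik = -np.sum((y - mu) ** 2) / (2.0 * scale)
    return float(-2.0 * quasi_loglik + 2.0 * n_params)


# Smaller is better when comparing candidate mean models fitted to the same data.
print(qicu_gaussian([1.0, 2.0, 3.0], [1.0, 2.0, 2.0], n_params=2))
```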

4.2.3 Quasivariance functions for PA-GEE models

Wedderburn (1974) includes an analysis of leaf blotch data on barley. The author uses these data to illustrate that the usual binomial variance function does not adequately model the variance of the data. The data used in the analysis are actually at the individual level, but include an identifier for the variety of the barley. We use the variety as the panel identifier and fit a marginal model in which we hypothesize that the observations within a variety share a common correlation.

The data include leaf, the percentage of the leaf area of barley affected by Rhynchosporium secalis (leaf blotch), and binary variables site1, ..., site9 indicating the site at which the data were collected. The response variable leaf was set to .01% for those observations that were zero (as in the original analysis). There are 10 panels in the data and 9 sites, for a total of 90 observations.

This analysis requires specification of a variance function that is not part of the usual collection of functions defined by the members of the exponential family, so the quasivariance functions that we specify must be programmed. For this analysis, the S-PLUS package provides the best support for user-written variance functions. We use standard plots to assess the adequacy of the model. The fit of an exchangeable logistic PA-GEE model provides

Coefficients:
                 Values    Stderr    t-values Pr(>|t|)
(Intercept)   0.4683789 1.2548246   0.3732625   0.7099
      site1  -5.7024161 0.8578567  -6.6472829   0.0000
      site2  -4.1052059 1.3465430  -3.0487002   0.0031
      site3  -2.5554481 1.0323553  -2.4753572   0.0154
      site4  -2.3385667 1.3778087  -1.6973087   0.0935
      site5  -2.3378754 0.8762535  -2.6680355   0.0092
      site6  -2.0826245 1.4303881  -1.4559857   0.1493
      site7  -1.4372780 1.2446647  -1.1547511   0.2516
      site8  -0.8738440 0.4381757  -1.9942776   0.0495

Degrees of Freedom: 90 Total; 81 Residual

> summary(res)
      EFFECTS NDF DDF       F P.value
1 (Intercept)   1  81  0.1393  0.7100
2       site1   1  81 44.1864  0.0000
3       site2   1  81  9.2946  0.0031
4       site3   1  81  6.1274  0.0154
5       site4   1  81  2.8809  0.0935
6       site5   1  81  7.1184  0.0092
7       site6   1  81  2.1199  0.1493
8       site7   1  81  1.3335  0.2516
9       site8   1  81  3.9771  0.0495

A plot of the Pearson residuals versus the linear predictor is given by

[Figure: Pearson residuals versus linear predictor. x-axis: Linear predictor. Exchangeable logistic GEE-PA with binomial variance.]

and a plot of the Pearson residuals versus the log of the variance for the model is

[Figure: Pearson residuals versus log(mu(1-mu)). x-axis: Log(binomial variance). Exchangeable logistic GEE-PA with binomial variance.]
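The two plotted quantities, Pearson residuals $(y - \mu)/\sqrt{V(\mu)}$ and $\log V(\mu)$, are straightforward to compute for either the binomial variance $\mu(1-\mu)$ or the squared binomial quasivariance $\mu^2(1-\mu)^2$ discussed next. In the sketch below, the fitted means are hypothetical values, and the preprocessing (percentages to proportions, zeros recoded to .01%) is an assumed reading of how the response was prepared.

```python
import numpy as np


def binomial_variance(mu):
    return mu * (1.0 - mu)


def squared_binomial_variance(mu):
    # Quasivariance hypothesized in the text: the square of the binomial variance.
    return (mu * (1.0 - mu)) ** 2


def pearson_and_logvar(y, mu, variance):
    """Pearson residuals (y - mu) / sqrt(V(mu)) and log V(mu), the plotted quantities."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    v = variance(mu)
    return (y - mu) / np.sqrt(v), np.log(v)


# Hypothetical preprocessing: percentages to proportions, zeros recoded to .01%.
leaf_pct = np.array([0.0, 5.0, 50.0, 75.0])
y = np.where(leaf_pct == 0.0, 0.01, leaf_pct) / 100.0
mu = np.array([0.01, 0.08, 0.5, 0.5])   # hypothetical fitted means
for v in (binomial_variance, squared_binomial_variance):
    r, logv = pearson_and_logvar(y, mu, v)
    print(v.__name__, np.round(r, 3), np.round(logv, 3))
```

Because $\mu^2(1-\mu)^2$ shrinks much faster than $\mu(1-\mu)$ near 0 and 1, the squared variance inflates Pearson residuals for extreme fitted means less uniformly than the binomial variance does, which is the behavior the residual plots are meant to reveal.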

The plots indicate a lack of fit for very large and very small response outcomes. The plot of the Pearson residuals versus the log of the variance should look more uniform if the model were truly adequate for the data. Following the analysis of Wedderburn, we hypothesize a quasivariance function that is the square of the usual binomial variance. Fitting such a model requires the ability to specify a variance function, and most current software packages do not allow this specification. Among the packages used throughout this text, S-PLUS does support it. Programming the squared binomial variance function into the software entails the additional specification of a deviance function. The fit of an exchangeable logistic PA-GEE model with quasi (square binomial) variance provides the following output:

Coefficients:
                 Values    Stderr    t-values Pr(>|t|)
(Intercept)   0.4683789 1.2548246   0.3732625   0.7099
      site1  -5.7126423 0.8601273  -6.6416246   0.0000
      site2  -4.1056059 1.3466953  -3.0486524   0.0031
      site3  -2.5555499 1.0325136  -2.4750764   0.0154
      site4  -2.3385667 1.3778087  -1.6973087   0.0935
      site5  -2.3378754 0.8762535  -2.6680355   0.0092
      site6  -2.0826245 1.4303881  -1.4559857   0.1493
      site7  -1.4372780 1.2446647  -1.1547511   0.2516
      site8  -0.8738440 0.4381757  -1.9942776   0.0495

Degrees of Freedom: 90 Total; 81 Residual

> summary(res)
      EFFECTS NDF DDF       F P.value
1 (Intercept)   1  81  0.1393  0.7100
2       site1   1  81 44.1112  0.0000
3       site2   1  81  9.2943  0.0031
4       site3   1  81  6.1260  0.0154
5       site4   1  81  2.8809  0.0935
6       site5   1  81  7.1184  0.0092
7       site6   1  81  2.1199  0.1493
8       site7   1  81  1.3335  0.2516
9       site8   1  81  3.9771  0.0495
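The deviance function that must accompany the squared binomial variance follows from the quasi-likelihood definition $Q(\mu; y) = \int^{\mu} (y - t)/V(t)\,dt$. For $V(t) = t^2(1-t)^2$ this integrates to $Q(\mu; y) = (2y - 1)\log\{\mu/(1-\mu)\} - y/\mu - (1-y)/(1-\mu)$, so the unit deviance is $D(y; \mu) = 2\{Q(y; y) - Q(\mu; y)\} = 2[(2y-1)\{\mathrm{logit}(y) - \mathrm{logit}(\mu)\} - 2 + y/\mu + (1-y)/(1-\mu)]$. Because $Q(y; y)$ is undefined at $y = 0$ or $y = 1$, zero responses must be recoded, which is consistent with setting them to .01%. The sketch below is a Python illustration of these formulas, not the S-PLUS code used for the fit above.

```python
import numpy as np


def quasi_loglik_sq_binom(y, mu):
    """Q(mu; y), the integral of (y - t) / V(t) for V(t) = t^2 (1 - t)^2, up to a term in y."""
    return (2.0 * y - 1.0) * np.log(mu / (1.0 - mu)) - y / mu - (1.0 - y) / (1.0 - mu)


def deviance_sq_binom(y, mu):
    """Unit deviances 2 [Q(y; y) - Q(mu; y)]; requires 0 < y < 1 and 0 < mu < 1."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return 2.0 * (quasi_loglik_sq_binom(y, y) - quasi_loglik_sq_binom(y, mu))


y = np.array([0.0001, 0.05, 0.4, 0.9])   # first value: a zero recoded to .01%
mu = np.array([0.01, 0.05, 0.3, 0.8])    # hypothetical fitted means
print(np.round(deviance_sq_binom(y, mu), 4))
```

The unit deviance is zero when $\mu = y$ and positive otherwise, since $\partial Q/\partial \mu = (y - \mu)/V(\mu)$ changes sign only at $\mu = y$.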

A plot of the Pearson residuals versus the linear predictor is displayed as

[Figure: Pearson residuals versus linear predictor. x-axis: Linear predictor. Exchangeable logistic GEE-PA with quasi (square binomial) variance.]

and a plot of the Pearson residuals versus the log of the variance for the model yields

[Figure: Pearson residuals versus log(mu^2(1-mu)^2). x-axis: Log(square binomial variance). Exchangeable logistic GEE-PA with quasi (square binomial) variance.]

As in Wedderburn's original analysis, we see a substantial reduction in the effect of the extreme fitted values, and the model has an overall better fit for the data. The original analysis treated the panels as another collection of covariates; that is, the analysis was a GLM. A true analysis of these data would allow the dispersion parameter to vary freely. There is no a priori reason to assume the standard binomial value of 1 for the dispersion when the outcomes represent percentages rather than binary outcomes. We also emphasize that our focus in this example is on the specification of alternate quasivariance functions. In so doing, we adopted one of the original covariates in the designed experiment to serve as our panel identifier. While the specification of the quasivariance shows improvement over the specification of the binomial variance, the panel analysis is not preferred over Wedderburn's original illustrations.

4.3 Deletion diagnostics

GLM analysis uses the DFBETA and DFFIT residuals described for general OLS regression in Belsley, Kuh, and Welsch (1980), and Cook's distance for identifying isolated departures. To check model departures, one usually relies on scatter plots of both raw and standardized residuals versus fitted values (as we demonstrated), as well as versus other prognostic factors.

In this section we address the methods of case deletion, a well-known diagnostic tool used extensively in OLS regression. In the usual GLM analysis, one may refit the model leaving out each observation in turn to assess the impact of that observation on the fitted model. We note here that Stata's glm command, created by the authors, supports several deletion, jackknife, and bootstrapping techniques as command options. These techniques extend to GEE models as well.

In the case of the GEE1 for GLMs, we must address the panel structure of the data, as Haslett (1999) did for the linear model with correlated errors. Preisser and Qaqish (1996) consider diagnostics for these models that measure the influence of a subset of observations either on the estimated regression parameters or on the linear predicted values. More importantly, the authors provide a simple formula for one-step estimates of the measures of influence. These diagnostics provide the data analyst with the tools to identify those panels, or individual observations, having an undue influence on the fitted model. We could refit the model with the associated subset of observations deleted in order to obtain the exact measure of influence; in fact, we could do this for every subset. However, this becomes time-consuming as our datasets grow, so it becomes desirable, and at times even necessary, to develop one-step approximations to the influence.

The basic idea of one-step approximations is to restart the estimation using the full sample estimates of $\beta$ and $\alpha$. We delete the associated subset of observations, and then reestimate the two parameter vectors with only one iteration of the estimation procedure.

Deletion diagnostics provide a measure of the influence of observations on parameter estimates and fitted values. When only one observation is left out in a deletion diagnostic procedure, it is called an observation-deletion diagnostic. It is called a cluster-deletion diagnostic when the deleted set of observations corresponds to a cluster or panel.

As previously described, there are two approaches to the construction of influence measures. In one approach, we measure the difference in the fitted regression coefficients by deleting a single observation from the estimation. In the second approach, we measure the difference in the fitted regression coefficients by deleting a panel of observations.

Let $k$ index the subset of observations that are to be deleted, and let $[k]$ denote the remaining observations. It follows that $\hat{\beta}_{[k]}$ denotes the estimated regression parameters with the $k$th set of observations deleted. Lastly, we provide the equations $Q = X(X^{T}WX)^{-1}X^{T}$ and $H = QW$. These equations serve as a basis for the discussion in the next two subsections.
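As a small numerical illustration of these definitions, the sketch below builds $Q$ and $H$ for a toy design matrix. The data are hypothetical, and a diagonal $W$ is used only to keep the example short (in a GEE fit $W$ is block diagonal). It checks two standard properties: $H = QW$ is idempotent, and its trace equals $p$, the number of columns of $X$.

```python
import numpy as np

def q_and_h(X, W):
    """Return Q = X (X^T W X)^{-1} X^T and H = Q W."""
    A = X.T @ W @ X                   # X^T W X, a p x p matrix
    Q = X @ np.linalg.solve(A, X.T)   # X (X^T W X)^{-1} X^T, an N x N matrix
    H = Q @ W                         # H = Q W
    return Q, H

# Hypothetical toy data: N = 6 observations, p = 2 columns (intercept + covariate)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6)])
W = np.diag(rng.uniform(0.5, 2.0, size=6))  # diagonal weights, for brevity only

Q, H = q_and_h(X, W)
print(np.trace(H))             # p = 2, up to rounding error
print(np.allclose(H @ H, H))   # H is idempotent
```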

4.3.1 Influence measures

The DFBETA diagnostic is a measure of the difference between the full sample estimator $\hat{\beta}$ and the fitted coefficient vector based on deleting one or more observations. A one-step approximation for the difference between the full sample coefficient vector and the estimated coefficient vector where an entire panel of observations is deleted is

$$\hat{\beta} - \hat{\beta}_{[i]} \approx (X^{T}WX)^{-1} X_i^{T} \left(W_i^{-1} - Q_i\right)^{-1} S_i \tag{4.48}$$
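The one-step formula (4.48) can be checked numerically in the one case where it is exact: an identity link with the working weight matrix held fixed, so that the fit is generalized least squares and $S_i$ is simply the panel's residual vector $y_i - X_i\hat{\beta}$. The sketch below makes those assumptions. The data, the exchangeable working correlation with $\rho = 0.4$, and all function names are hypothetical. It compares the one-step values with an exact refit that drops each panel in turn. For a GEE with a nonlinear link, or with $\alpha$ reestimated, the one-step value is only an approximation.

```python
import numpy as np

def exchangeable(m, rho):
    """m x m exchangeable working correlation matrix."""
    return (1 - rho) * np.eye(m) + rho * np.ones((m, m))

def gls_fit(Xs, ys, Ws):
    """beta = (sum_i X_i^T W_i X_i)^{-1} sum_i X_i^T W_i y_i; also returns X^T W X."""
    A = sum(X.T @ W @ X for X, W in zip(Xs, Ws))
    b = sum(X.T @ W @ y for X, y, W in zip(Xs, ys, Ws))
    return np.linalg.solve(A, b), A

def dfbeta_one_step(Xs, ys, Ws):
    """Equation (4.48) for every panel: (X^T W X)^{-1} X_i^T (W_i^{-1} - Q_i)^{-1} S_i."""
    beta, A = gls_fit(Xs, ys, Ws)
    rows = []
    for X, y, W in zip(Xs, ys, Ws):
        S = y - X @ beta                           # S_i: residual vector (identity link assumed)
        Q_i = X @ np.linalg.solve(A, X.T)          # panel block of Q
        M = np.linalg.inv(np.linalg.inv(W) - Q_i)  # (W_i^{-1} - Q_i)^{-1}
        rows.append(np.linalg.solve(A, X.T @ M @ S))
    return np.array(rows)

def dfbeta_refit(Xs, ys, Ws):
    """Exact beta - beta_[i], refitting without panel i."""
    beta, _ = gls_fit(Xs, ys, Ws)
    rows = []
    for i in range(len(Xs)):
        keep = [j for j in range(len(Xs)) if j != i]
        beta_i, _ = gls_fit([Xs[j] for j in keep], [ys[j] for j in keep], [Ws[j] for j in keep])
        rows.append(beta - beta_i)
    return np.array(rows)

# Hypothetical data: 8 panels of 4 observations each
rng = np.random.default_rng(1)
W_panel = np.linalg.inv(exchangeable(4, 0.4))      # fixed working weight per panel
Xs = [np.column_stack([np.ones(4), rng.normal(size=4)]) for _ in range(8)]
ys = [X @ np.array([1.0, 0.5]) + rng.normal(size=4) for X in Xs]
Ws = [W_panel] * 8

one_step = dfbeta_one_step(Xs, ys, Ws)
refit = dfbeta_refit(Xs, ys, Ws)
print(np.allclose(one_step, refit))   # exact agreement in this linear, fixed-weight case
```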


A one-step approximation for the difference between the full sample coefficient vector and the estimated coefficient vector where a single observation is deleted is

$$\hat{\beta} - \hat{\beta}_{[it]} \approx (X^{T}WX)^{-1} x_{it}^{T} \frac{S_{it}}{W_{it}^{-1} - Q_{it}} \tag{4.49}$$
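Under similar assumptions, but with independence working weights so that $W$ is diagonal, the single-observation formula (4.49) can be compared with refitting after dropping each observation. Again, an identity link is assumed, so $S_{it}$ is the raw residual, and the data and names are hypothetical. With correlated working weights, removing one observation also changes the weights of the remaining observations in its panel, so this check is limited to the diagonal case.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares with diagonal weights w; returns beta and X^T W X."""
    A = X.T @ (w[:, None] * X)
    return np.linalg.solve(A, X.T @ (w * y)), A

# Hypothetical data: 20 observations, intercept + one covariate
rng = np.random.default_rng(2)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.3, -1.0]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)          # diagonal working weights W_it

beta, A = wls_fit(X, y, w)
A_inv = np.linalg.inv(A)
S = y - X @ beta                           # S_it: residuals (identity link assumed)
q = np.sum((X @ A_inv) * X, axis=1)        # Q_it = x_it (X^T W X)^{-1} x_it^T
# Equation (4.49), one row per observation: (X^T W X)^{-1} x_it^T S_it / (W_it^{-1} - Q_it)
one_step = (X @ A_inv) * (S / (1.0 / w - q))[:, None]

# Exact change from refitting without observation k
refit = np.array([beta - wls_fit(np.delete(X, k, axis=0), np.delete(y, k), np.delete(w, k))[0]
                  for k in range(n)])
print(np.allclose(one_step, refit))
```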

One may either use the one-step approximations to the estimated coefficients for leaving out observations, or fully fit the model for each subsample.

We have looked several times at the Progabide data. Here, we examine the calculation of diagnostic measures for the covariates. A Poisson model is used to explain the number of seizures experienced by patients in the panel study. The outcome of the analysis is

GEE population-averaged model                   Number of obs      =       295
Group variable:                         id      Number of groups   =        59
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(3)       =     13.73
Scale parameter:                         1      Prob > chi2        =    0.0033

------------------------------------------------------------------------------
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .0347333     3.22   0.001     .0437601     .179912
   progabide |   .0275345   .0466847     0.59   0.555    -.0639658    .1190348
   timeXprog |  -.1047258   .0494024    -2.12   0.034    -.2015526   -.0078989
       _cons |   1.347609   .0340601    39.57   0.000     1.280853    1.414366
    lnPeriod |   (offset)
------------------------------------------------------------------------------
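The Prob > chi2 entry in this header is the upper-tail probability of the Wald statistic on 3 degrees of freedom, one for each covariate. As a check that needs nothing beyond the Python standard library, the sketch below evaluates that tail probability with a standard recurrence over the degrees of freedom. The function name is our own.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2 with df degrees of freedom > x), integer df >= 1.

    Starts from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2) and applies
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(x / 2.0))
    else:
        k, q = 2, math.exp(-x / 2.0)
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

# Wald chi2(3) = 13.73 from the header above
print(round(chi2_sf(13.73, 3), 4))   # 0.0033
```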

The model uses a baseline measure for the offset and includes covariates for time, an indicator of Progabide treatment, and a time-by-treatment interaction. The particular PA-GEE model fit here is an exchangeable correlation Poisson model. The dataset is relatively small, so the DFBETA statistics are calculated both with the one-step approximations and by refitting the model to the data subsets. In addition, the DFBETA measures are examined for both the individual observation and the panel.

We next present a plot of the panel-level DFBETA statistics for the time covariate, where the DFBETA statistics are calculated by refitting the model:


[Figure: DFBETA versus Panel Identifier, time covariate. x-axis: Panel identifier; y-axis: DFBETA. PA-GEE Poisson model with exchangeable correlation.]

Next is a plot of the panel-level DFBETA statistics for the progabide covariate, where the DFBETA statistics are calculated by refitting the model:

[Figure: DFBETA versus Panel Identifier, progabide covariate. x-axis: Panel identifier; y-axis: DFBETA. PA-GEE Poisson model with exchangeable correlation.]

Here is a plot of the panel-level DFBETA statistics for the time-by-progabide interaction, where the DFBETA statistics are calculated by refitting the model:


[Figure: DFBETA versus Panel Identifier, time-by-progabide interaction. x-axis: Panel identifier; y-axis: DFBETA. PA-GEE Poisson model with exchangeable correlation.]

This collection of plots investigates outliers in the covariates and their effect on the fitted model. There is evidence that the patient identified by panel id = 49 has an unusually large effect on two of the covariates in the analysis. We should point out that the data listed in this text have no other patient identifiers; the same data appear in other sources where this patient is identified as "patient 207"; see Diggle, Liang, and Zeger (1994). Knowing that this patient seems to have a large effect on the outcomes, we could proceed with an exploratory analysis without this particular patient.

GEE population-averaged model                   Number of obs      =       290
Group variable:                         id      Number of groups   =        58
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(3)       =     27.56
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .0378762     2.95   0.003     .0376001     .186072
   progabide |  -.1068224   .0486304    -2.20   0.028    -.2021362   -.0115087
   timeXprog |  -.3023841   .0595185    -5.08   0.000    -.4190382     -.18573
       _cons |   1.347609   .0340601    39.57   0.000     1.280853    1.414366
    lnPeriod |   (offset)
------------------------------------------------------------------------------
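Because this second model was fit without patient 49, the exact (refit) panel-level DFBETA for that patient can be read directly from the two tables as the difference in coefficients. The short calculation below uses only the printed values. Dividing by the full-sample standard errors is our own scaling, added only to put the three changes on a common footing.

```python
# Coefficients printed in the two tables: all 59 patients vs. refit without panel id 49
full   = {"time": 0.111836,  "progabide":  0.0275345, "timeXprog": -0.1047258}
se     = {"time": 0.0347333, "progabide":  0.0466847, "timeXprog":  0.0494024}
drop49 = {"time": 0.111836,  "progabide": -0.1068224, "timeXprog": -0.3023841}

# DFBETA = beta_hat - beta_hat_[49], the exact change from deleting the panel
dfbeta = {name: full[name] - drop49[name] for name in full}
for name, value in dfbeta.items():
    print(f"{name:<10} DFBETA = {value:+.7f}  ({value / se[name]:+.2f} full-sample SEs)")
```

The time coefficient is unchanged, while the progabide and interaction coefficients move by about 2.9 and 4.0 full-sample standard errors, which matches the two covariates flagged in the plots.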

While there seems to be evidence that this patient is different from the rest, without investigating the causes we should not drop the patient from the analysis simply on the basis of the results of the DFBETA investigation. The patient does seem to have an extraordinarily high number of seizures in the baseline, perhaps indicating other medical conditions. In this case, we would contact the collectors of the data for further explanation.

Continuing our investigation, we can also calculate DFBETA measures on an observation basis rather than a panel basis. Here is a plot of the observation-level DFBETA statistics for the time covariate, where the DFBETA statistics are calculated by refitting the model:

[Figure: DFBETA versus Observation Number, time covariate. x-axis: Observation number; y-axis: DFBETA. PA-GEE Poisson model with exchangeable correlation.]

We now present a plot of the observation-level DFBETA statistics for the progabide covariate, where the DFBETA statistics are calculated by refitting the model:


[Figure: DFBETA versus Observation Number, progabide covariate. x-axis: Observation number; y-axis: DFBETA. PA-GEE Poisson model with exchangeable correlation.]

Here is a plot of the observation-level DFBETA statistics for the time-by-progabide interaction, where the DFBETA statistics are calculated by refitting the model:

[Figure: DFBETA versus Observation Number, time-by-progabide interaction. x-axis: Observation number; y-axis: DFBETA. PA-GEE Poisson model with exchangeable correlation.]

We might have anticipated that the analysis of the DFBETA statistics for the observations would coincide with the analysis of the DFBETA statistics calculated for the panels. However, this is not the case. When we delete a single observation, the effect of the panel of which that observation is a member is not removed unless the observation represents a singleton panel.


4.3.2 Leverage measures

In addition to measuring the influence of observations on the estimated coefficient vector, we may also extend Cook's distance for use with panel data. This is a standardized measure of the influence of a set of observations on the linear predicted value. DFBETA residuals are used to investigate the effect of outliers in the covariates; Cook's distance is used to investigate the effect of outliers in the outcome.

Cook's distance, structured as leaving out an entire panel, is defined as

$$\text{Cook}_i = \frac{1}{p\phi}\, S_i^{T} \left(W_i^{-1} - Q_i\right)^{-1} Q_i \left(W_i^{-1} - Q_i\right)^{-1} S_i \tag{4.50}$$

and for the case of leaving out a single observation, Cook's distance is defined as

$$\text{Cook}_{it} = \frac{S_{it}\, Q_{it}\, S_{it}}{p\phi \left(W_{it}^{-1} - Q_{it}\right)^{2}} \tag{4.51}$$
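Equation (4.50) can be checked against its description as a scaled distance between coefficient vectors. In the linear, fixed-weight setting where the one-step DFBETA of (4.48) is exact, $\text{Cook}_i$ equals $\text{DFBETA}_i^{T}(X^{T}WX)\,\text{DFBETA}_i/(p\phi)$. The sketch below assumes $\phi = 1$, an identity link (so $S_i$ is the residual vector), and an exchangeable working correlation with $\rho = 0.3$. The data, including an artificial shift added to one panel's outcomes, and the function name are hypothetical.

```python
import numpy as np

def cooks_distance_panels(Xs, ys, Ws, phi=1.0):
    """Equation (4.50) for each panel, plus the matching one-step DFBETA (4.48)."""
    A = sum(X.T @ W @ X for X, W in zip(Xs, Ws))                  # X^T W X
    beta = np.linalg.solve(A, sum(X.T @ W @ y for X, y, W in zip(Xs, ys, Ws)))
    p = A.shape[0]
    cook, dfbeta = [], []
    for X, y, W in zip(Xs, ys, Ws):
        S = y - X @ beta                                  # S_i (identity link assumed)
        Q_i = X @ np.linalg.solve(A, X.T)                 # panel block of Q
        M = np.linalg.inv(np.linalg.inv(W) - Q_i)         # (W_i^{-1} - Q_i)^{-1}
        cook.append(S @ M @ Q_i @ M @ S / (p * phi))
        dfbeta.append(np.linalg.solve(A, X.T @ M @ S))
    return np.array(cook), np.array(dfbeta), A

# Hypothetical data: 10 panels of 3 observations; panel 4 gets a shifted outcome
rng = np.random.default_rng(3)
R = 0.7 * np.eye(3) + 0.3 * np.ones((3, 3))     # exchangeable, rho = 0.3
Ws = [np.linalg.inv(R)] * 10
Xs = [np.column_stack([np.ones(3), rng.normal(size=3)]) for _ in range(10)]
ys = [X @ np.array([2.0, 1.0]) + rng.normal(size=3) for X in Xs]
ys[4] = ys[4] + 6.0

cook, dfbeta, A = cooks_distance_panels(Xs, ys, Ws)
print(np.round(cook, 3))
print(int(np.argmax(cook)))   # index of the most influential panel
```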

Cook's distance is a scaled measure of the distance between the coefficient vectors when the $k$th group of observations is deleted from the analysis.

We should also consider methods for calculating the leverage of the observations on the estimated association vector when the association parameters are of prime interest. This type of analysis also requires estimating the variance of these parameters.

4.4 Goodness of fit (population-averaged models)

Zheng (2000) discusses measures of goodness of fit for PA-GEE models for GLMs. These measures are generalizations of measures commonly used for assessing the goodness of fit of GLMs.

4.4.1 Proportional reduction in variation

As in a GLM, the marginal model of a PA-GEE specifies a conditional mean and a link function. The variance of the response and the block-diagonal covariance matrix describing the intrapanel correlation among the repeated responses are functions of the mean and possibly of additional parameters $\alpha$. Since the marginal model does not specify a likelihood, we must consider extensions of nonlikelihood-based summary measures in order to analyze the goodness of fit. The extension of the entropy measure $H$ for categorical marginal models is defined as

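$$H_{\text{MARG}} = 1 - \frac{\frac{1}{N}\sum_{i=1}^{n}\sum_{t=1}^{n_i}\sum_{k=1}^{K}\hat{\mu}_{itk}\ln(\hat{\mu}_{itk})}{\sum_{k=1}^{K}\alpha_k\ln(\alpha_k)} \tag{4.52}$$

where $N = \sum_{i=1}^{n} n_i$ is the total number of observations, $\hat{\mu}_{itk}$ is the fitted probability that $y_{it}$ falls in category $k$ of $K$, and $\alpha_k$ is the marginal proportion of responses in category $k$.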

$H_{\text{MARG}}$ can be interpreted as the proportional reduction in entropy due to the model of interest. It is equal to the usual measure for GLMs when there is one observation per panel ($n_i = 1$ for all $i$).
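A direct transcription of (4.52) may help when computing the measure from fitted category probabilities. In this sketch, the fitted probabilities for all $N = \sum_i n_i$ observations are stacked into an $N \times K$ array, $\alpha_k$ is supplied by the caller (assumed here to be the overall observed proportion in category $k$), and $0 \ln 0$ is taken as 0. The function names and example numbers are hypothetical.

```python
import numpy as np

def xlogx(a):
    """Elementwise a * ln(a), with 0 * ln(0) taken as 0."""
    a = np.asarray(a, dtype=float)
    safe = np.where(a > 0, a, 1.0)   # avoid log(0); those entries are zeroed below
    return np.where(a > 0, a * np.log(safe), 0.0)

def h_marg(mu_hat, alpha):
    """Proportional reduction in entropy, equation (4.52).

    mu_hat: (N, K) fitted probabilities, one row per observation (i, t)
    alpha:  (K,) marginal category proportions (assumed observed proportions)
    """
    mu_hat = np.asarray(mu_hat, dtype=float)
    N = mu_hat.shape[0]
    return 1.0 - (xlogx(mu_hat).sum() / N) / xlogx(alpha).sum()

alpha = np.array([0.5, 0.5])
null_fit = np.tile(alpha, (4, 1))     # a model that only reproduces the margins
some_fit = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
print(h_marg(null_fit, alpha))        # essentially 0: no reduction in entropy
print(h_marg(some_fit, alpha))        # between 0 and 1
```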


An extension of the $R^2$ measure is calculated using

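$$R^2_{\text{MARG}} = 1 - \frac{\sum_{i=1}^{n}\sum_{t=1}^{n_i}\left(y_{it} - \hat{y}_{it}\right)^2}{\sum_{i=1}^{n}\sum_{t=1}^{n_i}\left(y_{it} - \bar{y}_{\cdot\cdot}\right)^2} \tag{4.53}$$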

This measure is interpreted as the proportion of variance in the outcome that is explained by the model.
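Equation (4.53) pools the squared deviations over all panels and time points, so it can be computed from the stacked observed and fitted values. A minimal sketch, with a hypothetical function name and numbers:

```python
import numpy as np

def r2_marg(y, y_hat):
    """Equation (4.53): 1 - SSE / SST over all stacked observations y_it."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)           # sum_i sum_t (y_it - yhat_it)^2
    sst = np.sum((y - y.mean()) ** 2)        # sum_i sum_t (y_it - ybar..)^2
    return 1.0 - sse / sst

# Two hypothetical panels of two observations each, stacked into one vector
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]
print(r2_marg(y, y_hat))   # 0.8
```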

4.4.2 Concordance correlation

The concordance correlation is a statistic of the agreement between two persons or measures. The observed and fitted values from a given model are the two measures for which we calculate the correlation.

$$r_c = \frac{2\sum_{i=1}^{n}\sum_{t=1}^{n_i}\left(y_{it}-\bar{y}_{\cdot\cdot}\right)\left(\hat{y}_{it}-\bar{\hat{y}}_{\cdot\cdot}\right)}{\sum_{i=1}^{n}\sum_{t=1}^{n_i}\left(y_{it}-\bar{y}_{\cdot\cdot}\right)^2 + \sum_{i=1}^{n}\sum_{t=1}^{n_i}\left(\hat{y}_{it}-\bar{\hat{y}}_{\cdot\cdot}\right)^2 + \sum_{i=1}^{n}\sum_{t=1}^{n_i}\left(\bar{y}_{\cdot\cdot}-\bar{\hat{y}}_{\cdot\cdot}\right)^2} \tag{4.54}$$
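Equation (4.54) translates directly into code once the observed and fitted values are stacked over panels. The sketch below (hypothetical names and data) also computes Pearson's $r$, which makes it easy to see how the two can differ: a fitted series that is perfectly correlated with the outcome but shifted or rescaled has $r = 1$ and $r_c < 1$.

```python
import numpy as np

def concordance(y, y_hat):
    """Lin's concordance correlation r_c (equation 4.54) and Pearson's r."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(y_hat, dtype=float)
    dy, dfit = y - y.mean(), f - f.mean()
    n = y.size
    cross = np.sum(dy * dfit)
    # The last denominator term is the double sum of (ybar - fbar)^2, i.e. N * (ybar - fbar)^2
    r_c = 2.0 * cross / (np.sum(dy ** 2) + np.sum(dfit ** 2) + n * (y.mean() - f.mean()) ** 2)
    r = cross / np.sqrt(np.sum(dy ** 2) * np.sum(dfit ** 2))
    return r_c, r

y = np.array([2.0, 4.0, 6.0, 8.0])
print(concordance(y, y))         # perfect agreement: both measures equal 1
print(concordance(y, y + 3.0))   # r = 1, but r_c < 1 because of the shift
print(concordance(y, 0.5 * y))   # r = 1, but r_c < 1 because the slope is not 1
```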

This measure is less than or equal to Pearson's correlation coefficient in absolute value. The reason is that the concordance correlation imposes the constraint that the best-fitting line comparing the observed and fitted values goes through the origin with slope 1.

Here we investigate the concordance correlation coefficient using calculations from Bland and Altman (1986) and Lin (1989). The coefficient is calculated for the fit of a marginal model for the Progabide data with the observed values. This investigation produces the following results:

Concordance correlation coefficient (Lin, 1989)

 rho_c   SE(rho_c)   Obs    [   95%  CI   ]     P        CI type
-------------------------------------------------------------------
 0.391     0.037     295    0.318    0.463    0.000    asymptotic
                            0.316    0.461    0.000    z-transform

Pearson's r = 0.493   Pr(r = 0) = 0.000   C_b = rho_c/r = 0.793
Reduced major axis:   Slope = 2.029   Intercept = -13.243

  Difference (shat - seizures)      95% Limits Of Agreement
   Average      Std. Dev.           (Bland & Altman, 1986)
-------------------------------------------------------------------
     0.000        16.262            -31.872        31.872
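Two lines of this output are simple functions of other printed values, which the arithmetic below reproduces from the rounded figures. $C_b$ is the ratio $\rho_c / r$, and the Bland and Altman 95% limits of agreement are the average difference plus or minus 1.96 standard deviations of the differences. Because the inputs are rounded, the results agree with the table only to about the last printed digit.

```python
rho_c, pearson_r = 0.391, 0.493     # printed concordance and Pearson correlations
mean_diff, sd_diff = 0.000, 16.262  # printed average and std. dev. of (shat - seizures)

c_b = rho_c / pearson_r                       # C_b = rho_c / r
lower = mean_diff - 1.96 * sd_diff            # 95% limits of agreement
upper = mean_diff + 1.96 * sd_diff
print(f"C_b = {c_b:.3f}; limits = ({lower:.2f}, {upper:.2f})")
# C_b = 0.793; limits = (-31.87, 31.87)
```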

We can graphically depict the concordance by plotting the observed values against the fitted values. In this plot, we include the lines of perfect fit for the Pearson and concordance correlations.


[Figure: Observed values versus fitted values for the Progabide marginal model, with the lines of perfect fit for the Pearson and concordance correlations. x-axis: Fitted values. Note: Data must overlay dashed line for perfect concordance.]

4.4.3 A $\chi^2$ goodness of fit test for PA-GEE binomial models

Horton, Bebchuk, Jones, Lipsitz, Catalano, Zahner, and Fitzmaurice (1999) present an extension of the Hosmer Jr. and Lemeshow (1980) goodness of fit test applicable to PA-GEE models. The basic idea of the test is to group the ordered fitted values into groups defined by deciles (tenth percentile, twentieth percentile, etc.), and calculate a $\chi^2$ goodness of fit test on the counts. The ordered fitted values $\hat{\mu}_{it}$ are used to define the groups as follows:

1. The first group contains the $\sum_i n_i/10$ observations associated with the smallest $\hat{\mu}_{it}$ fitted values.

2. The second group contains the $\sum_i n_i/10$ observations associated with the next smallest $\hat{\mu}_{it}$ fitted values.

...

10. The tenth (last) group contains the $\sum_i n_i/10$ observations associated with the largest $\hat{\mu}_{it}$ fitted values.

It is a fairly common occurrence that the fitted values $\hat{\mu}_{it}$ will contain tied values. In such cases, the number of members in the decile risk groups will not be equal. The grouping is such that members of a given group have similar predicted risk.

We define indicator variables for the first 9 groups and reestimate the model

$$\text{logit}(p_{it}) = x_{it}\beta + \gamma_1 I_{1it} + \gamma_2 I_{2it} + \cdots + \gamma_9 I_{9it} \tag{4.55}$$


where $I_{kit}$ is a binary variable indicating whether observation $it$ belongs to group $k$. In general, if the original model holds, then $\gamma_1 = \gamma_2 = \cdots = \gamma_9 = 0$. Section 4.5 describes the methods for calculating score and Wald tests of coefficients, and the authors (Horton et al.) recommend the score test over the Wald test. The Wald test is clearly easier to calculate with standard software, though it does require fitting the alternate model in equation 4.55. With large datasets there will be only small differences between the two tests. See section 4.5 for details on when alternate approaches are warranted.

While we have presented the test in its common form, one can use any number of groups $G$, in which case the test statistic is distributed as $\chi^2$ with $G - 1$ degrees of freedom.

Note that it is not always clear when to calculate this test. The model for the Progabide data is a good example of an analysis resulting in tied values: there are only 4 unique fitted values for the 295 observations. It would seem reasonable to define the groups based on these four values, but those indicators are then collinear with the covariates already in the model. The analysis of categorical data, when there are only a limited number of covariate patterns, commonly results in a lack of uniquely fitted values.

Returning to the generated quadratic data defined in equation 4.31, we calculate the goodness of fit test using Wald's approach for the linear model

(4.56) (4.56)

= (30 + xl(31

Not surprisingly, the results indicate strong evidence that the model does not fit the data, $\chi^2 = 293.27$. Testing the better-fitting model

$$y = \beta_0 + x_1\beta_1 + x_1^2\beta_2 \qquad (4.57)$$

results in $\chi^2 = 5.78$, for which the p-value is 0.7615, indicating no evidence that the model does not fit.

In a study to assess the effect of smoke and pollution on the respiratory symptoms of children, responses were measured for each child once a year for 4 years. Covariates included whether a family member smoked, the city in which the child resided, and the age of the child. An exchangeable correlation binomial logit PA-GEE model resulted in

GEE population-averaged model                   Number of obs      =       100
Group variable:                         id      Number of groups   =        25
Link:                                logit      Obs per group: min =         4
Family:                           binomial                     avg =       4.0
Correlation:                  exchangeable                     max =         4
                                                Wald chi2(3)       =      8.26
Scale parameter:                         1      Prob > chi2        =    0.0409

------------------------------------------------------------------------------
     symptom |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        city |  -.0424028   .4908312    -0.09   0.931    -1.004414    .9196087
         age |  -.3200042   .1836975    -1.74   0.082    -.6800447    .0400363
       smoke |   .6519219   .3089066     2.11   0.035     .0464761    1.257368
       _cons |   2.301695   1.780744     1.29   0.196    -1.188499    5.791889
------------------------------------------------------------------------------
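Before turning to the fit of this model, the grouping step behind the Horton et al. statistic can be sketched in Python. This is a minimal illustration under our own assumptions: observations are ranked by fitted value and split into $G$ groups of (nearly) equal size, group 0 serves as the reference category, and the $G-1$ indicator columns returned are the extra covariates of the alternate model whose coefficients are tested jointly with $G-1$ degrees of freedom. The function name `gof_group_indicators` is hypothetical. Tied fitted values (as in the Progabide model discussed above) are assigned to groups by their original order, which is exactly the situation in which the grouping, and hence the test, becomes arbitrary.

```python
def gof_group_indicators(fitted, n_groups=10):
    """Split observations into n_groups groups by ranked fitted value.

    Hypothetical helper (not from the text). Returns (group, indicators):
    group[i] is the 0-based group of observation i, and indicators[i] holds
    the G - 1 dummy variables for groups 1..G-1 (group 0 is the reference),
    whose coefficients are tested jointly with G - 1 degrees of freedom.
    """
    n = len(fitted)
    if not 1 < n_groups <= n:
        raise ValueError("need 1 < n_groups <= number of observations")
    # Stable sort: tied fitted values keep their original order, so the
    # group boundary between tied observations is arbitrary.
    order = sorted(range(n), key=lambda i: fitted[i])
    group = [0] * n
    for rank, i in enumerate(order):
        # Integer arithmetic gives groups of (nearly) equal size.
        group[i] = rank * n_groups // n
    indicators = [[1 if group[i] == k else 0 for k in range(1, n_groups)]
                  for i in range(n)]
    return group, indicators


fitted = [0.10, 0.35, 0.20, 0.90, 0.55, 0.70, 0.05, 0.40]
groups, dummies = gof_group_indicators(fitted, n_groups=4)
print(groups)       # [0, 1, 1, 3, 2, 3, 0, 2]
print(dummies[3])   # [0, 0, 1]
```

In this toy example the two smallest fitted values fall in the reference group, and each remaining pair of observations receives its own indicator.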


We evaluate the goodness of fit of the model using 4 groups and obtain the results

         chi2(3) =     1.02
     Prob > chi2 =   0.7962

indicating no evidence of a lack of fit. The model indicates that symptoms tend to decrease with age and that familial smoking tends to increase the likelihood of respiratory symptoms as the child is exposed.

We emphasize that this test is useful in determining the functional form and specification of covariates for a model. The test is not useful when comparing hypothesized correlation structures; in most cases, the predicted values are rank equal (or nearly so) for different correlation structures in models that are otherwise equivalently specified.

4.5 Testing coefficients in the PA-GEE model

The three standard approaches to constructing test statistics for hypothesis tests are the likelihood ratio test, the Wald test, and the score test. Each of these tests is formed from a quadratic expression in an estimated coefficient vector and an estimate of the variance of the coefficients. Introductory modeling texts typically focus on likelihood-based models, for which a discussion of test construction focuses on only one approach. Here we wish to differentiate 3 possible approaches to building each of the tests.

Throughout, we assume that the $(p \times 1)$ coefficient vector $\beta$ may be written as the augmented vector $(\gamma^T, \delta^T)^T$, where the $(r \times 1)$ vector $\gamma$ contains the first $r$ components of $\beta$ and the $((p-r) \times 1)$ vector $\delta$ contains the remaining components. Further, we let $V_{MS}$ denote the modified sandwich estimate of variance and $V$ denote the naive variance. The hypothesis test of interest is

$$H_0: \gamma = \gamma_0 \qquad (4.58)$$

where $\gamma_0$ is the hypothesized value of $\gamma$.

In presenting the derivations and formulas for testing PA-GEE models, we let $\hat\beta$ denote the estimated coefficient vector, $\hat\beta_I$ denote the estimated coefficient vector for the independence model, and $\hat\beta_{IC}$ denote the estimated coefficient vector for the independence model subject to the constraints of the hypothesis test.

Approach 1 is analogous to a likelihood-based modeling approach: the test statistic is constructed using the naive variance estimate and is assumed to follow a $\chi^2$ distribution with $r$ degrees of freedom. Approach 2 constructs the test statistic using a robust variance estimate; the test statistic is also assumed to follow a $\chi^2$ distribution with $r$ unadjusted degrees of freedom (generalized hypothesis test). Approach 3 constructs the test statistic using the naive variance estimate; the test statistic is assumed to follow the $\chi^2$ distribution with $r^*$ adjusted degrees of freedom (working hypothesis test).

Most software packages support the first two approaches to the construction of hypothesis tests; the third approach is not currently supported. In all of these approaches to the construction of a test of the hypothesis, the PA-GEE model is estimated using the user-specified correlation structure. If a test is constructed using results from the independence model, it is called a naive hypothesis test.

Rotnitzky and Jewell (1990) present derivations and extensions of the working and naive hypothesis tests that are discussed here for the PA-GEE model. In the following subsections we present the derivations of the hypothesis tests and use various examples to illustrate the tests.

4.5.1 Likelihood ratio tests

The usual likelihood ratio test cannot be applied to PA-GEE models, since there is no likelihood underlying the model. However, a naive likelihood ratio test may be calculated under the associated independence model. The naive likelihood ratio test is then a comparison of the log-likelihood $\mathcal{L}(\hat\beta_I)$ for fitting the unconstrained independence model to the log-likelihood $\mathcal{L}(\hat\beta_{IC})$ for fitting the constrained independence model:

$$T_{LR} = 2\left[\mathcal{L}(\hat\beta_I) - \mathcal{L}(\hat\beta_{IC})\right] \qquad (4.59)$$

In the usual likelihood setting, the test statistic is distributed as $\chi^2$ with degrees of freedom equal to $r$. In the case of PA-GEE models, the test statistic is still compared with a $\chi^2$ distribution, but with an adjusted degrees of freedom parameter calculated from the data. The test statistic may be written as a weighted sum

$$T_{LR} = \sum_{j=1}^{r} d_j \chi_j^2 \qquad (4.60)$$

where the $\chi_j^2$ are independent $\chi^2_1$ random variables and $d_1 \ge d_2 \ge \cdots \ge d_r$ are the (ordered) eigenvalues of $P_0^{-1}P_1$, with

$$P_0 = \frac{1}{n}\sum_{i=1}^{n} \tilde{X}_i^T \Delta_i A_i \Delta_i \tilde{X}_i \qquad (4.61)$$

$$P_1 = \frac{1}{n}\sum_{i=1}^{n} \tilde{X}_i^T \Delta_i V(Y_i) \Delta_i \tilde{X}_i \qquad (4.62)$$

where, for $r < p$,

$$\tilde{X}_i = X_i^{(1)} - X_i^{(2)}\left(\sum_{i=1}^{n} X_i^{(2)T}\Delta_i A_i \Delta_i X_i^{(2)}\right)^{-1}\left(\sum_{i=1}^{n} X_i^{(2)T}\Delta_i A_i \Delta_i X_i^{(1)}\right) \qquad (4.63)$$

and $\tilde{X}_i = X_i$ for $r = p$.
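Because equation 4.60 describes a weighted sum of independent $\chi^2_1$ variables rather than a single $\chi^2$, a p-value can be approximated by simulation once the eigenvalues $d_j$ are available. The Python sketch below is our own illustration and not a procedure from the text: the function name `weighted_chi2_pvalue`, the default number of draws, and the seed are assumptions, and each $\chi^2_1$ draw is generated as a squared standard normal.

```python
import random


def weighted_chi2_pvalue(stat, weights, n_sim=200_000, seed=12345):
    """Monte Carlo estimate of P(sum_j w_j * chi2_1,j >= stat).

    Hypothetical helper (not from the text). Each chi2_1 draw is the square
    of an independent standard normal; weights play the role of the ordered
    eigenvalues d_1 >= ... >= d_r in equation 4.60.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights (eigenvalues) must be non-negative")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    exceed = 0
    for _ in range(n_sim):
        total = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if total >= stat:
            exceed += 1
    return exceed / n_sim
```

With a single weight equal to 1 the statistic is an ordinary $\chi^2_1$ variable, so `weighted_chi2_pvalue(3.841, [1.0])` should return a value close to 0.05; the Monte Carlo error shrinks roughly in proportion to one over the square root of `n_sim`.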


When we are testing a single covariate, the value of $d_1$ simplifies to the ratio of the variance of the tested coefficient in the PA-GEE model to the variance of the tested coefficient in the independence model. In the calculation of this ratio, the modified sandwich estimates of variance for each model should be used.

Using the Progabide data, we can test the hypothesis that the coefficient of the timeXprog interaction is zero using the simplification described. Fitting the exchangeable correlation Poisson model to the data without constraints results in

GEE population-averaged model                   Number of obs      =       295
Group variable:                         id      Number of groups   =        59
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(3)       =      0.92
Scale parameter:                         1      Prob > chi2        =    0.8203

                                   (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .1169256     0.96   0.339    -.1173339    .3410059
   progabide |   .0275345   .2236916     0.12   0.902     -.410893     .465962
   timeXprog |  -.1047258   .2152769    -0.49   0.627    -.5266608    .3172092
       _cons |   1.347609   .1587079     8.49   0.000     1.036547    1.658671
    lnPeriod |   (offset)
------------------------------------------------------------------------------

The modified sandwich variance of the timeXprog coefficient is .0463 (the square of its semi-robust standard error, $.2152769^2$). Fitting an independence model to the data results in

Generalized linear models                          No. of obs      =       295
Optimization     : ML: Newton-Raphson              Residual df     =       291
                                                   Scale param     =         1
Deviance         =  3575.423602                    (1/df) Deviance =  12.28668
Pearson          =  5726.792994                    (1/df) Pearson  =   19.6797

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : Sandwich

Log likelihood   = -2318.503321                    AIC             =  15.74579
BIC              =  3552.6757

------------------------------------------------------------------------------
             |               Robust
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .1943749     0.58   0.565    -.2691317    .4928038
   progabide |   .0275345   .2221647     0.12   0.901    -.4079003    .4629693
   timeXprog |  -.1047258   .2946478    -0.36   0.722    -.6822248    .4727732
       _cons |   1.347609   .1576245     8.55   0.000     1.038671    1.656548
    lnPeriod |   (offset)
------------------------------------------------------------------------------


The modified sandwich variance of the timeXprog coefficient is .0868 ($.2946478^2$) for the independence model. Log-likelihood values for the independence models with (shown above) and without (not shown) the timeXprog covariate are

$$\mathcal{L}_{\text{full}} = -2318.503321 \qquad (4.64)$$

$$\mathcal{L}_{\text{subset}} = -2319.800533 \qquad (4.65)$$

The naive likelihood ratio test statistic is therefore $2[-2318.503321 - (-2319.800533)] = 2.59$, and the adjusted degrees of freedom is $.0463/.0868 = .5334$. The results of the naive likelihood ratio test are thus summarized as

$$T_{LR} = 2.59 \qquad (4.66)$$

$$p = .0479 \qquad (4.67)$$

comparing the test statistic to a $\chi^2$ random variable with .5334 degrees of freedom. The test rejects the hypothesis that the coefficient on timeXprog is 0 at the $\alpha = .05$ level of significance.
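The arithmetic of this example can be reproduced with a self-contained Python sketch. Following the text, it compares $T_{LR}$ with a $\chi^2$ distribution whose (fractional) degrees of freedom equal $d_1 = .0463/.0868$. Python's standard library has no $\chi^2$ survival function, so the sketch evaluates the regularized upper incomplete gamma function $Q(\nu/2, T/2)$ with the classical series and continued-fraction expansions; this numerical technique and the helper names are our own choices rather than anything in the text. Where SciPy is installed, `scipy.stats.chi2.sf(t_lr, d1)` should agree.

```python
import math


def _upper_gamma_q(a, x, eps=1e-14, max_iter=10_000):
    """Regularized upper incomplete gamma Q(a, x) for a > 0 (hypothetical helper)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); return Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(stat, df):
    """P(X >= stat) for X ~ chi-square with (possibly fractional) df > 0."""
    return _upper_gamma_q(df / 2.0, stat / 2.0)


# Values reported in the text for the timeXprog example.
ll_full, ll_subset = -2318.503321, -2319.800533
var_pa, var_indep = 0.0463, 0.0868       # modified sandwich variances

t_lr = 2.0 * (ll_full - ll_subset)       # equation 4.59
d1 = var_pa / var_indep                  # single-coefficient adjustment
p_value = chi2_sf(t_lr, d1)
print(f"T_LR = {t_lr:.2f}, df = {d1:.4f}, p = {p_value:.4f}")
# T_LR = 2.59, df = 0.5334, p = 0.0479
```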

4.5.2 Wald tests


Most software packages allow Wald-type testing of coefficients after model estimation using either the naive or the modified sandwich estimate of variance. Test statistics are typically calculated using Wald tests without adjusting the degrees of freedom. When the degrees of freedom are said to be unadjusted, it means that no a priori algorithm calculates an adjustment from the data. However, some software packages will make a different kind of adjustment when the modified sandwich estimate of variance is singular (when there are more covariates than panels). Using the modified sandwich estimate of variance results in the generalized Wald test statistic

$$T_W = n(\hat\gamma - \gamma_0)^T V_{MS}^{-1} (\hat\gamma - \gamma_0) \qquad (4.68)$$

In most cases, software packages that allow post-estimation Wald tests will use degrees of freedom equal to $r$. As always, when using the modified sandwich estimate of variance to construct test statistics, we should ensure that there are fewer covariates than panels; otherwise the modified sandwich estimate of variance is singular.

The generalized Wald test can easily be performed as a post-estimation command using the Progabide data. First, we fit a PA-GEE model, obtaining

GEE population-averaged model                   Number of obs      =       295
Group variable:                         id      Number of groups   =        59
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(3)       =      0.92
Scale parameter:                         1      Prob > chi2        =    0.8203

                                   (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
    seizures |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    .111836   .1169256     0.96   0.339    -.1173339    .3410059
   progabide |   .0275345   .2236916     0.12   0.902     -.410893     .465962
   timeXprog |  -.1047258   .2152769    -0.49   0.627    -.5266608    .3172092
       _cons |   1.347609   .1587079     8.49   0.000     1.036547    1.658671
    lnPeriod |   (offset)
------------------------------------------------------------------------------
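For a single coefficient ($r = 1$), the generalized Wald statistic is the square of the z statistic in the output above, and its $\chi^2_1$ upper-tail probability equals the two-sided normal p-value. A minimal Python sketch using the timeXprog estimate and its semi-robust standard error (the function name `wald_single` is hypothetical):

```python
import math


def wald_single(coef, se, null_value=0.0):
    """Wald chi-square (1 df) for H0: coefficient == null_value.

    Hypothetical helper (not from the text). The statistic is the squared
    z score; for 1 df the chi-square upper tail equals the two-sided
    standard normal tail, computed here with math.erfc.
    """
    z = (coef - null_value) / se
    stat = z * z
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return stat, p_value


stat, p = wald_single(-0.1047258, 0.2152769)   # timeXprog, semi-robust SE
print(f"T_W = {stat:.2f}, p = {p:.4f}")        # T_W = 0.24, p = 0.6266
```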

Test results evaluating whether the timeXprog coefficient is equal to 0 provide the statistics

$$T_W = 0.24 \qquad (4.69)$$

$$p = .6266 \qquad (4.70)$$

comparing the test statistic to a $\chi^2$ random variable with 1 degree of freedom. The test fails to reject the hypothesis that the coefficient on timeXprog is 0 at the $\alpha = .05$ level. It is not uncommon to reach different conclusions using different types of $\chi^2$ tests.

In this particular case, the generalized Wald test is actually already part of the output listed for the model. In the model output, the test that each coefficient is 0 is presented as a test with a normally distributed test statistic. The $\chi^2$ statistic is equal to the square of this statistic ($.24 \approx (-.49)^2$), and the p-values are the same.

An alternative to the generalized Wald test is the working Wald test. In this approach, we construct a test that uses the naive variance estimate to avoid the singularity problems that might arise using the modified sandwich estimate of variance. The working Wald test statistic is defined as

$$T_W^* = n(\hat\gamma - \gamma_0)^T V^{-1} (\hat\gamma - \gamma_0) \qquad (4.71)$$

This approach assumes that the correlation parameters, $\alpha$, describe the true structure of the panels. Regardless of whether this is in fact true, it is still possible to describe the distribution of the test statistic and its degrees of freedom. The working test may be written

$$T_W^* = \sum_{j=1}^{r} c_j \chi_j^2 \qquad (4.72)$$

where the $\chi_j^2$ are independent $\chi^2_1$ random variables and $c_1 \ge c_2 \ge \cdots \ge c_r$ are the (ordered) eigenvalues of $Q_0^{-1}Q_1$, with

$$Q_0 = \frac{1}{n}\sum_{i=1}^{n} \tilde{D}_i^T V_i^{-1} \tilde{D}_i \qquad (4.73)$$

$$Q_1 = \frac{1}{n}\sum_{i=1}^{n} \tilde{D}_i^T V_i^{-1}\,\mathrm{Cov}(Y_i)\,V_i^{-1} \tilde{D}_i \qquad (4.74)$$

where, for $r < p$,

$$\tilde{D}_i = D_i^{(1)} - D_i^{(2)}\left(\sum_{i=1}^{n} D_i^{(2)T} V_i^{-1} D_i^{(2)}\right)^{-1}\left(\sum_{i=1}^{n} D_i^{(2)T} V_i^{-1} D_i^{(1)}\right) \qquad (4.75)$$

and $\tilde{D}_i = D_i$ for $r = p$.
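When $r = 2$, the weights $c_1 \ge c_2$ in equation 4.72 can be computed in closed form from the trace and determinant of $Q_0^{-1}Q_1$. The Python sketch below assumes $Q_0$ and $Q_1$ are already available as 2 × 2 nested lists; the function name and the example matrices are hypothetical. For larger $r$, a general routine such as NumPy's `numpy.linalg.eigvals` applied to `numpy.linalg.solve(Q0, Q1)` could be used instead.

```python
import math


def ordered_weights_2x2(q0, q1):
    """Eigenvalues c1 >= c2 of inv(q0) @ q1 for 2x2 matrices (nested lists).

    Hypothetical helper (not from the text). For symmetric positive definite
    q0 and q1 these eigenvalues are real and positive, so a slightly negative
    discriminant is treated as rounding error.
    """
    (a, b), (c, d) = q0
    det0 = a * d - b * c
    if det0 == 0:
        raise ValueError("Q0 is singular")
    inv0 = [[d / det0, -b / det0], [-c / det0, a / det0]]
    # m = inv(q0) @ q1
    m = [[sum(inv0[i][k] * q1[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(trace * trace / 4.0 - det, 0.0))
    return trace / 2.0 + root, trace / 2.0 - root


print(ordered_weights_2x2([[2.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]]))
# (3.0, 1.0)
```

If the working correlation structure is correct, so that $V_i = \mathrm{Cov}(Y_i)$, then $Q_1 = Q_0$, every $c_j$ equals 1, and the working Wald statistic reduces to an ordinary $\chi^2_r$ statistic; for example, `ordered_weights_2x2(q0, q0)` returns weights of (approximately) 1 and 1.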

4.5.3 Score tests

The development of score hypothesis tests follows the development seen in the previous section. The generalized score test substitutes the modified sandwich estimate of variance for the naive variance to obtain the test statistic

$$T_S = \frac{1}{n}\Psi_\gamma\bigl(\gamma_0, \hat\delta(\gamma_0)\bigr)^T V_{MS,\gamma}\,\Psi_\gamma\bigl(\gamma_0, \hat\delta(\gamma_0)\bigr) \qquad (4.76)$$

where $\Psi_\gamma(\cdot)$ is the constrained estimating equation of the PA-GEE model of interest. In the case that the number of covariates exceeds the number of panels, we can consider the working score test

$$T_S^* = \frac{1}{n}\Psi_\gamma\bigl(\gamma_0, \hat\delta(\gamma_0)\bigr)^T V_\gamma\,\Psi_\gamma\bigl(\gamma_0, \hat\delta(\gamma_0)\bigr) \qquad (4.77)$$

where (as in the case of the working Wald test)

$$T_S^* = \sum_{j=1}^{r} c_j \chi_j^2 \qquad (4.78)$$

and where $c_1 \ge c_2 \ge \cdots \ge c_r$ are the (ordered) eigenvalues of $Q_0^{-1}Q_1$ (defined in equations 4.73 and 4.74).

4.6 Assessing the MCAR assumption of PA-GEE models

It was mentioned in the previous chapter that PA-GEE models depend on an assumption that is a special case of MCAR for missing data. Here, we discuss various techniques for assessing the validity of this assumption.

Section 3.7 included an illustration of the various patterns of missing data that might be seen in data. We shall use this illustration for a first look at the patterns.

Fanurik, Zeltzer, Roberts, and Blount (1993) present an example of a study on pain tolerance in children. Each child participating in the study placed his or her hands in cold water for as long as possible. The response variable, a proxy for pain tolerance, is the log of the time in seconds, lntime, that each child was able to keep his or her hands in the water. Each child participated in the trials by repeating the experiment 4 times. The children were classified as either attenders or distractors according to their coping style, cs. The attenders tended to concentrate on either the experimental apparatus or on their hands, while the distractors tended to concentrate on unrelated things (e.g., home or school). Three baseline measurements were collected for each child before a counseling session, trt, was held. These counseling sessions were randomized to teach the attending coping style, the distractor coping style, or the control (no advice). A fourth measurement was collected after the counseling session. The authors expected that altering a child's natural coping style would impede his or her performance, while the control counseling session was expected to have no effect on the performance.

The study included 64 children, so there would be 256 observations if the study were complete. For various reasons, a total of 11 observations are missing from 6 of the 64 children. Our purpose here is not to propose a full analysis of the data, but rather to investigate the missing data mechanisms related to PA-GEE modeling. The figure below provides a graph of the missing data.

[Figure: "Missing data"; horizontal axis: Panel identifier (0 to 65)]
Squares mark missing data for the response variable in a dataset with 64 panels with 4 repeated measures per panel.

To test the PA-GEE MCAR assumption

$$P(R_i \mid Y_i, X_i, \beta) = P(R_i) \qquad (4.79)$$

a binary variable is created to indicate the groups (panels) with missing observations. A two-sample t-test then compares each of the other covariates between the panels with and without missing observations. This investigation can be repeated for each covariate at each replication number. For the pediatric pain data at the third replication, the coping style results are


Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |       6         1.5    .2236068    .5477226    .9252004      2.0748
       1 |      58         1.5    .0662266    .5043669    1.367383    1.632617
---------+--------------------------------------------------------------------
combined |      64         1.5    .0629941    .5039526    1.374116    1.625884
---------+--------------------------------------------------------------------
    diff |                   0    .2178535               -.4354829    .4354829
------------------------------------------------------------------------------
Degrees of freedom: 62

                 Ho: mean(0) - mean(1) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =   0.0000               t =   0.0000               t =   0.0000
   P < t =   0.5000         P > |t| =   1.0000           P > t =   0.5000

The results for counseling session treatment at the third replication are:

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |       6    2.333333    .3333333    .8164966    1.476473    3.190194
       1 |      58    1.948276    .1055691    .8039904    1.736877    2.159674
---------+--------------------------------------------------------------------
combined |      64    1.984375    .1008205    .8065641    1.782901    2.185849
---------+--------------------------------------------------------------------
    diff |            .3850575    .3452229               -.3050332    1.075148
------------------------------------------------------------------------------
Degrees of freedom: 62

                 Ho: mean(0) - mean(1) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =   1.1154               t =   1.1154               t =   1.1154
   P < t =   0.8655         P > |t| =   0.2690           P > t =   0.1345
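The screening step can be sketched in Python. This is a minimal illustration, not the software that produced the output above: it assumes the responses are held in a dictionary mapping a (hypothetical) panel identifier to a list of outcomes, with None marking a missing value, and a second dictionary holding one covariate value per panel. The helper pooled_t_from_summary recomputes the equal-variance t statistic from the counts, means, and standard deviations printed in a table like the ones above.

```python
import math
from statistics import mean, stdev


def pooled_t_from_summary(n0, mean0, sd0, n1, mean1, sd1):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    df = n0 + n1 - 2
    pooled_var = ((n0 - 1) * sd0 ** 2 + (n1 - 1) * sd1 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    return (mean0 - mean1) / std_err, df


def mcar_screen(responses, covariate):
    """Compare a panel-level covariate between panels with and without missing outcomes.

    responses: panel id -> list of outcomes, None marking a missing value
    covariate: panel id -> covariate value for that panel
    Each group needs at least two panels so that its standard deviation exists.
    """
    has_missing = [covariate[i] for i, ys in responses.items() if any(y is None for y in ys)]
    complete = [covariate[i] for i, ys in responses.items() if all(y is not None for y in ys)]
    return pooled_t_from_summary(
        len(has_missing), mean(has_missing), stdev(has_missing),
        len(complete), mean(complete), stdev(complete),
    )


# Treatment summary statistics from the second table (group 0 = 6 panels, group 1 = 58).
t, df = pooled_t_from_summary(6, 2.333333, 0.8164966, 58, 1.948276, 0.8039904)
print(f"t = {t:.4f}, df = {df}")
```

Feeding in the rounded summary statistics for the treatment covariate gives t of about 1.1154 on 62 degrees of freedom, in line with the reported test. Restricting the missingness indicator to a single replication, as in the text, only changes how has_missing is built.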

In both cases, there is no significant difference between the panels with and without missing observations. Here, we are interested only in the covariates for coping style and counseling session. There are $p(p-1)$ possible tests that can be performed in a model with $p$ covariates and $T$ replications, so care must be taken regarding accumulated Type I errors.

Little (1988) presents a single test for assessing MCAR. In this approach, a vector of means is computed for the construction of a $\chi^2$ statistic. The statistic is defined as

$$d^2 = \sum_{i=1}^{n} n_i\,(y_i - \hat{\mu}_i)\,\hat{\Sigma}_i^{-1}\,(y_i - \hat{\mu}_i)^{T} \tag{4.80}$$

where $y_i$ is a vector of values for the observed variables in panel $i$, and $\hat{\mu}_i$ and $\hat{\Sigma}_i$


are the maximum likelihood estimates, restricted to those observed variables, assuming that the panels are iid normal and the missing data process is ignorable.

The distribution of the test statistic can be complicated for monotone missing data (dropouts). However, a common type of data situation involves a single follow-up repeated measurement ($n_i = 2$), for which the test statistic can be computed as

$$d^2 = \frac{SSB}{MST} = \frac{(n-1)F}{n-2+F} \tag{4.81}$$

where SSB and MST are the between sum of squares and the total mean square from an analysis of variance of $y_{i1}$ on the missing data pattern, and $F$ is the test statistic for the ANOVA model. In this case, there are only two missing data patterns: $y_{i1}$ and $y_{i2}$ are both observed, or $y_{i1}$ is observed while $y_{i2}$ is missing.

Section 3.7 presents a sample dataset of the asthma status of white children. We can apply the above test to these data. An analysis of variance of $y_{i1}$ on the missing data pattern is given by

                  Number of obs =    2566     R-squared     =  0.0012
                  Root MSE      = .210256     Adj R-squared =  0.0008

         Source |  Partial SS    df       MS           F     Prob > F
    ------------+----------------------------------------------------
          Model |  .133234469     1  .133234469       3.01     0.0827
                |
              g |  .133234469     1  .133234469       3.01     0.0827
                |
       Residual |  113.348059  2564  .044207511
    ------------+----------------------------------------------------
          Total |  113.481294  2565   .04424222

The test statistic for the monotone missing data is then

$$d^2 = 3.0088 \tag{4.82}$$

Since the F statistic here is equal to the square of a t statistic (for this particular case of 2 outcomes), this test is the same as the t-test previously described.
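A minimal Python sketch of the two-pattern computation in (4.81), assuming the first-occasion responses $y_{i1}$ are in a list and a parallel list of booleans flags the panels whose second measurement is missing (both names are illustrative). It computes the one-way ANOVA sums of squares directly and evaluates $d^2$ both as SSB/MST and as $(n-1)F/(n-2+F)$; the two agree because, with two patterns, SSB has 1 degree of freedom and the residual sum of squares has $n-2$.

```python
from statistics import mean


def two_pattern_d2(y1, y2_missing):
    """d^2 from a one-way ANOVA of y1 on a two-level missing-data pattern.

    y1:         first-occasion responses, one per panel
    y2_missing: True when the panel's second measurement is missing
    Both patterns must be present, otherwise the ANOVA is undefined.
    """
    n = len(y1)
    grand_mean = mean(y1)
    groups = [[y for y, miss in zip(y1, y2_missing) if miss == flag] for flag in (False, True)]
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)  # between SS, 1 df
    sst = sum((y - grand_mean) ** 2 for y in y1)                     # total SS, n - 1 df
    ssr = sst - ssb                                                  # residual SS, n - 2 df
    mst = sst / (n - 1)                                              # total mean square
    f_stat = ssb / (ssr / (n - 2))                                   # ANOVA F statistic
    d2_ratio = ssb / mst                                             # SSB / MST
    d2_from_f = (n - 1) * f_stat / (n - 2 + f_stat)                  # (n-1)F / (n-2+F)
    return d2_ratio, d2_from_f, f_stat


d2_a, d2_b, f_stat = two_pattern_d2([0, 1, 0, 1, 1, 1], [False, False, False, True, True, True])
print(d2_a, d2_b, f_stat)
```

The toy data are made up purely to exercise the identity; with two groups the F statistic is also the square of the pooled two-sample t statistic.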

4.7 Summary

Standard exploratory data analysis (EDA) techniques should be used with panel datasets. Plots of the raw data can be constructed, with particular attention to illustrations depicting both the panel nature of the data and the repeated measures identifiers. These types of plots assist the analyst in identifying dependence on time as well as on the panels. In addition, standard GLM-type plots of the Pearson residuals versus the linear predictor and of the Pearson residuals versus the variance are used to assess model adequacy.


Model assessment is based on graphical, as well as statistical, points of view. Graphs of influence and leverage uncover outliers in the data that may not be noticed otherwise. These outliers should be investigated for data integrity, and statistical measures should be calculated to quantify the effect of these outliers on the fitted models.

There are two deletion diagnostic approaches to measuring the effects of outliers on fitted models: deleting individual observations and deleting entire panels of observations. We use both techniques since the two deletion diagnostic criteria do not summarize the same information in the data.

Model criterion measures are provided to assess overall model goodness of fit. The QIC measure is a particularly useful tool for choosing the best correlation structure in a PA-GEE model. Similarly, the $QIC_u$ measure is used for model selection.

Standard model criterion measures, such as $R^2$, are available for panel data models.
These ubiquitous measures have a long history in statistical models. They have a clear interpretation for OLS linear regression, and researchers have produced a long list of references for extending the measure to nonlinear models. However, the $R^2$ measure can be difficult to interpret for nonlinear models, and experience may be an analyst's best ally in interpreting the magnitude of $R^2$ in a particular situation.

We recommend using the generalized forms of the various tests for the majority of data analysis situations. These are the easiest tests to perform, and they are supported in many standard software packages. Interpretation is easy, but the analyst must be aware that the modified sandwich estimate of variance has rank no greater than the number of panels. If the number of panels is less than the number of covariates in the model, or is not much larger, the working versions of the tests are preferred over the generalized versions. However, working tests must be programmed by the analyst.

This chapter attempts to illustrate and catalog techniques for assessing model adequacy. We pay particular attention to criteria for PA-GEE models, since those techniques are most notably missing from software documentation and other texts. We refrained from presenting full analyses for the various datasets in order to focus attention on each technique. A complete analysis would use several of the techniques listed.
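The rank limitation of the modified sandwich estimate can be checked numerically. Its middle ("meat") matrix is a sum of outer products of per-panel score vectors, one term per panel, so its rank, and hence the rank of the full sandwich, cannot exceed the number of panels. The Python sketch below assumes the per-panel score vectors are already available (here they are simply made-up numbers) and uses a small elimination routine rather than any particular linear algebra library.

```python
def outer_sum(vectors):
    """Sum of outer products u u^T over the per-panel score vectors u."""
    p = len(vectors[0])
    meat = [[0.0] * p for _ in range(p)]
    for u in vectors:
        for r in range(p):
            for c in range(p):
                meat[r][c] += u[r] * u[c]
    return meat


def matrix_rank(matrix, tol=1e-10):
    """Numerical rank via Gauss-Jordan elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    rows, cols = len(a), len(a[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rows):
            if r != rank:
                factor = a[r][c] / a[rank][c]
                a[r] = [x - factor * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


# Two panels, four covariates: the meat matrix is 4 x 4 but has rank at most 2.
scores = [[1, 2, 0, 1], [0, 1, 1, 3]]
print(matrix_rank(outer_sum(scores)))
```

With fewer panels than covariates the sandwich matrix is singular, so a Wald test that inverts it has no full-rank matrix to work with; this is the situation in which the working versions of the tests are preferred.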


4.8 Exercises

1. In a study of the efficacy of a new drug treatment for depression, patients receiving the new drug are likely to drop out of the study if they experience dramatic improvement. Patients receiving the placebo are also likely to drop out of the study if there is no observed response. Is the characterization of missing data amenable to modeling by PA-GEE?

2. The QIC measure is used to choose the best correlation structure among PA-GEE models with the same covariates. Show that, apart from a normalizing term for the quasilikelihood, the QIC measure is equal to the AIC for an independence PA-GEE model implying a likelihood proper; that is, for an independence PA-GEE model which specifies a variance function from a member of the exponential family of distributions.

3. Deletion diagnostics can be calculated either by deleting individual observations or by deleting entire panels of observations. Discuss which calculation method you would prefer in assessing a PA-GEE model.

4. Verify the output provided in Table 4.4 and Table 4.6 using the Progabide data in section 5.2.3.

5. An example illustrated the fit of the quasivariance $V(\mu) = \mu^2(1-\mu)^2$ (see section 4.2.3). Derive the associated deviance.


CHAPTER 5

Programs and Datasets

This chapter presents some of the datasets used in this text, together with sample code for the software packages mentioned. In addition to providing sample code for fitting the GEE models discussed in this text, we also provide sample code for fitting alternative likelihood-based models where appropriate. This section will not teach the reader how to use a particular software package, but will provide samples from which other analyses can be produced. We pay particular attention to the various options available in each package in order to highlight the relatively minor differences between them.

Samples are presented by means of input files which can be run in "batch" mode. Readers can also enter the commands interactively; however, we suggest first running the programs in their entirety and then reading the results. The PA-GEE model has the best support across packages.
There are some minor differences in the default behavior of the packages, which we highlight in the following programs. We include comments to assist the reader in understanding the choices of both the analysis and the selected options.

The datasets listed in this chapter are available in tab-delimited plain text format from:
http://www.crcpress.com/e_products/downloads/download.asp?cat_no=C3073

5.1 Programs

The following pages include programs for fitting various models and calculating various diagnostic statistics. These programs can be run in batch mode and illustrate each of the software packages used in this text.


5.1.1 Fitting PA-GEE models in Stata

* This program uses the data listed in section 1.3.4
* The program assumes the data are available as 'qicdata.txt'
infile id t x1 x2 x3 ei xx ys y using qicdata.txt

* Fit an exchangeable correlation logistic regression GEE-PA
* using divisor (n) for the dispersion
xtgee y x1 x2 x3, i(id) t(t) fam(bin) corr(exch)

* Fit an exchangeable correlation logistic regression GEE-PA
* using divisor (n-p) for the dispersion
xtgee y x1 x2 x3, i(id) t(t) fam(bin) corr(exch) nmp

* Fit an exchangeable correlation probit regression GEE-PA
* using divisor (n) for the dispersion
xtgee y x1 x2 x3, i(id) t(t) fam(bin) link(probit) corr(exch)
exit

Notes:
  Variance specification    Variance
  ----------------------    --------
  fam(bin)                  mu(1-mu)
  fam(bin k)                mu(1-mu/k)
  fam(bin variable)         mu(1-mu/[variable_it])
  fam(gaussian)             1
  fam(gamma)                mu^2
  fam(igaussian)            mu^3
  fam(nbinomial k)          mu + k*mu^2
  fam(poisson)              mu

  Link specification
  ------------------
  link(identity)            mu
  link(cloglog)             ln(-ln(1-mu))
  link(log)                 ln(mu)
  link(logit)               ln(mu/(1-mu))
  link(nbinomial)           ln(mu/(mu+1/k))
  link(opower a)            [(mu/(1-mu))^a - 1]/a
  link(power a)             mu^a
  link(probit)              InvPhi(mu)
  link(reciprocal)          1/mu

  nmp                       option to allow estimation of the dispersion
                            parameter with denominator (n-p) instead of
                            the default (n)
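Several programs in this chapter differ only in the divisor used for the dispersion parameter: (n) by default in Stata with (n-p) under nmp, and (n-p) by default in SAS with (n) under V6CORR. A minimal Python sketch of the difference, assuming a Pearson-type estimate (an assumption for illustration, not a description of either package's internal code) and that the fitted means and a variance function are supplied; all names are hypothetical.

```python
def pearson_dispersion(y, mu, variance, n_params, use_n_minus_p=False):
    """Pearson-type dispersion estimate sum(r_i^2) / divisor.

    r_i = (y_i - mu_i) / sqrt(V(mu_i)); the divisor is n by default, or n - p
    when use_n_minus_p is True (the choice toggled by nmp / V6CORR).
    """
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    n = len(y)
    divisor = n - n_params if use_n_minus_p else n
    return chi2 / divisor


def binomial_variance(mu):
    return mu * (1 - mu)  # the fam(bin) / dist=bin variance function


y = [1, 0, 1, 1]
mu = [0.5, 0.5, 0.5, 0.5]
print(pearson_dispersion(y, mu, binomial_variance, n_params=2))
print(pearson_dispersion(y, mu, binomial_variance, n_params=2, use_n_minus_p=True))
```

With only four made-up observations and two parameters the two divisors differ by a factor of two; in realistic samples, where n is much larger than p, the difference is usually minor.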


5.1.2 Fitting PA-GEE models in SAS

options ls = 80 ;
data qic ;
   infile 'qicdata.txt' ;
   input id t x1 x2 x3 ei xx ys y ;
/* Fit an exchangeable correlation logistic regression GEE-PA
   using divisor (n-p) for the dispersion */
proc genmod data=qic ;
   class id ;    /* the SUBJECT= variable must appear in the CLASS statement */
   model y = x1 x2 x3 / dist=bin link=logit ;
   repeated subject=id / corr=exch ;
run ;
quit ;

/*
Notes:
   Variance specification    Variance
   ----------------------    --------
   dist=bin                  mu(1-mu)
   dist=gamma                mu^2
   dist=igaussian            mu^3
   dist=multinomial          .
   dist=negbin               mu + k*mu^2
   dist=normal               1
   dist=poisson              mu
   variance specification is an option in the MODEL statement

   Link specification
   ------------------
   link=cumcll               ln(-ln(1-mu1)), ln(-ln(1-(mu1+mu2))), ...
   link=cumlogit             ln(mu1/(1-mu1)), ln((mu1+mu2)/(1-(mu1+mu2))), ...
   link=cumprobit            InvPhi(mu1), InvPhi(mu1+mu2), ...
   link=cloglog              ln(-ln(1-mu))
   link=identity             mu
   link=log                  ln(mu)
   link=logit                ln(mu/(1-mu))
   link=power(a)             mu^a
   link=probit               InvPhi(mu)
   link specification is an option in the MODEL statement

   Correlation structure
   ---------------------
   corr=XXXXX                GEE-PA
   logor=XXXXX               ALR
   correlation specification is an option in the REPEATED statement

   V6CORR -- option to allow estimation of the dispersion parameter with
             (n) denominator instead of the default (n-p). This option
             is specified in the REPEATED statement.
*/


5.1.3 Fitting PA-GEE models in S-PLUS

#
# Uses geex package from http://lib.stat.cmu.edu/S/geex
#
# This program uses the data listed in section 1.3.4
# The program assumes the data are available as 'qicdata.txt'
qicdata <- scan("qicdata.txt", list(id=0, t=0, x1=0, x2=0,
                x3=0, ei=0, xx=0, ys=0, y=0))
# Fit an exchangeable correlation logistic regression GEE-PA
# using divisor (n) for the dispersion
gee.exc <- gee(formula=y~x1+x2+x3,
               family=binomial(link=logit),
               correlation="compoundsymmetric",
               data=qicdata, subject=id, repeated=t)
# Fit an exchangeable correlation probit regression GEE-PA
# using divisor (n) for the dispersion
gee.exc <- gee(formula=y~x1+x2+x3,
               family=binomial(link=probit),
               correlation="compoundsymmetric",
               data=qicdata, subject=id, repeated=t)

#
# Notes:
#   Variance specification             Variance
#   ----------------------             --------
#   family=binomial(link=lnkname)      mu(1-mu)
#   family=gaussian(link=lnkname)      1
#   family=gamma(link=lnkname)         mu^2
#   family=igaussian(link=lnkname)     mu^3
#   family=poisson(link=lnkname)       mu
#
#   Correlation specification
#   -------------------------
#   independent
#   unstructured
#   compoundsymmetric
#   autoregressive1
#   dependent1
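For reference, two of the structures in the list above can be written out explicitly. This Python sketch assumes a panel of size n and a single correlation parameter alpha (illustrative names). "compoundsymmetric" is the exchangeable structure used in the programs, with every off-diagonal element equal to alpha; the autoregressive entry is assumed here to be first order, with element (j, k) equal to alpha raised to |j - k|.

```python
def exchangeable_corr(n, alpha):
    """Compound-symmetric (exchangeable) working correlation: 1 on the diagonal, alpha elsewhere."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def ar1_corr(n, alpha):
    """First-order autoregressive working correlation: alpha ** |j - k|."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


for row in exchangeable_corr(3, 0.5):
    print(row)
for row in ar1_corr(3, 0.5):
    print(row)
```

The exchangeable form treats every pair of repeated measures alike, while the AR(1) form lets the correlation decay with the distance between measurement occasions.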


5.1.4 Fitting ALR models in SAS

options ls = 80 ;
data qic ;
   infile 'qicdata.txt' ;
   input id t x1 x2 x3 ei xx ys y ;
/* Fit an alternating logistic regression (ALR) model
   with an exchangeable log odds ratio structure */
proc genmod data=qic ;
   class id ;    /* the SUBJECT= variable must appear in the CLASS statement */
   model y = x1 x2 x3 / dist = bin ;
   repeated subject=id / logor=exch ;
run ;
quit ;

/*
Notes:
   Variance specification    Variance
   ----------------------    --------
   dist=bin                  mu(1-mu)
   dist=gamma                mu^2
   dist=igaussian            mu^3
   dist=multinomial          .
   dist=negbin               mu + k*mu^2
   dist=normal               1
   dist=poisson              mu

   Link specification
   ------------------
   link=cumcll               ln(-ln(1-mu1)), ln(-ln(1-(mu1+mu2))), ...
   link=cumlogit             ln(mu1/(1-mu1)), ln((mu1+mu2)/(1-(mu1+mu2))), ...
   link=cumprobit            InvPhi(mu1), InvPhi(mu1+mu2), ...
   link=cloglog              ln(-ln(1-mu))
   link=identity             mu
   link=log                  ln(mu)
   link=logit                ln(mu/(1-mu))
   link=power(a)             mu^a
   link=probit               InvPhi(mu)

   Correlation structure
   ---------------------
   corr=XXXXX                GEE-PA
   logor=XXXXX               ALR

   V6CORR -- option to allow estimation of the dispersion parameter with
             (n) denominator instead of the default (n-p). This option
             is specified in the REPEATED statement.
*/
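ALR, requested above with logor=exch, describes the association between pairs of binary responses within a panel through log odds ratios; the exchangeable choice uses one common log odds ratio for every pair. As a rough, covariate-free illustration of that quantity (this is not the ALR estimator, which is fitted jointly with the mean model), the Python sketch below pools the 2 x 2 tables from all within-panel pairs and returns the crude log odds ratio. The data layout, a list of 0/1 panels, is hypothetical.

```python
import math
from itertools import combinations


def pooled_pairwise_log_or(panels):
    """Crude common log odds ratio over all within-panel pairs of binary outcomes."""
    n11 = n10 = n01 = n00 = 0
    for panel in panels:
        for a, b in combinations(panel, 2):  # every pair (j < k) within the panel
            if a == 1 and b == 1:
                n11 += 1
            elif a == 1:
                n10 += 1
            elif b == 1:
                n01 += 1
            else:
                n00 += 1
    return math.log((n11 * n00) / (n10 * n01))  # undefined if any cell is empty


panels = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
print(pooled_pairwise_log_or(panels))
```

A positive value indicates that responses within a panel tend to agree; ALR additionally adjusts for the covariates in the MODEL statement.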


5.1.5 Fitting PA-GEE models in SUDAAN

/* This code is for use where SUDAAN is a callable PROC from SAS */
options ls = 80 ;
data mlog ;
   infile 'mlog.txt' ;
   input id t y x1 x2 ;

/* Fit an exchangeable correlation generalized
   logistic regression GEE-PA */
proc multilog data=mlog R=exchangeable ;
   nest _one_ id ;
   weight _one_ ;
   subgroup y ;
   levels 3 ;
   model y = x1 x2 / genlogit ;
run ;

/* Fit an exchangeable correlation cumulative
   logistic regression GEE-PA */
proc multilog data=mlog R=exchangeable ;
   nest _one_ id ;
   weight _one_ ;
   subgroup y ;
   levels 3 ;
   model y = x1 x2 / cumlogit ;
run ;
quit ;

/*
Notes:
   R=independent     independence model
   R=exchangeable    exchangeable correlation GEE-PA

   Only the two correlation structures above are available.
*/
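The two MULTILOG runs above differ only in the logit used for the three-level response. A Python sketch of the two transforms applied to one vector of category probabilities, assuming the last category is the reference for the generalized logit (an assumption here; the package documentation gives its actual default). The cumulative version matches the cumlogit entries in the SAS link notes: ln(mu1/(1-mu1)), ln((mu1+mu2)/(1-(mu1+mu2))).

```python
import math


def generalized_logits(probs):
    """Baseline-category logits ln(p_j / p_J), taking the last category J as reference."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]


def cumulative_logits(probs):
    """Cumulative logits ln(P(Y <= j) / (1 - P(Y <= j))) for j = 1, ..., J-1."""
    out, cum = [], 0.0
    for p in probs[:-1]:
        cum += p
        out.append(math.log(cum / (1 - cum)))
    return out


probs = [0.2, 0.3, 0.5]  # hypothetical probabilities for a 3-level response
print(generalized_logits(probs))
print(cumulative_logits(probs))
```

Both transforms produce J-1 logits per observation, but the cumulative form respects the ordering of the categories, while the generalized form treats them as nominal.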


5.1.6 Calculating QIC in Stata

** This program uses the data listed in section 1.3.4
** The program assumes the data are available as `qicdata.txt'
infile id t x1 x2 x3 e1 xx ys y using qicdata.txt

** First, we need the naive variance matrix from
** the independence model.
xtgee y x1 x2 x3, i(id) t(t) fam(bin) corr(ind)

** We save the variance matrix, and calculate the inverse.
matrix A = e(V)
matrix Ai = syminv(A)

** We write a program that runs the same model with a
** specified correlation structure. We need the
** robust variance matrix from this model to
** calculate the QIC criterion measure.
capture program drop qicm
program define qicm
    ** Take as an argument the hypothesized correlation structure
    args corr
    quietly {
        capture quietly xtgee y x1 x2 x3, i(id) t(t) /*
            */ fam(bin) corr(`corr') robust
        if (_rc != 0) {
            ** Exit if the model fails to converge
            exit
        }
        ** Calculate trace(Ai*V)
        matrix V = e(V)
        matrix T = Ai*V
        matrix t = trace(T)
        scalar off = t[1,1]
        ** Calculate the remaining term for this
        ** particular model (it is model dependent).
        ** Temporary variables let qicm be called repeatedly
        ** without "variable already defined" errors.
        tempvar mu ql
        predict double `mu', mu
        generate double `ql' = y*log(`mu'/(1-`mu')) + log(1-`mu')
        summarize `ql', meanonly
    }
    display in green "QIC = " in yellow 2*(off-r(sum))
end

** Calculate the QIC statistic for a variety of alternate
** correlation structures. There should be only one argument,
** so if the correlation structure has an optional argument,
** we must enclose the specification in quotes.
qicm ind
qicm exch
qicm "ar 1"
qicm "ar 2"
qicm unst
qicm "sta 2"
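Outside Stata, the same criterion can be assembled from quantities any GEE fit provides. The Python sketch below is illustrative only: it assumes you already have the outcomes `y`, the fitted probabilities `mu` from the candidate model, the inverse `Ai` of the naive variance matrix from the independence model, and the robust variance matrix `V` from the candidate model, all as plain nested lists. The function names are hypothetical. As in `qicm`, it computes QIC = 2*(trace(Ai*V) - Q), where Q is the summed binomial quasi-likelihood.

```python
import math


def trace_of_product(a, b):
    """Return trace(a*b) for square matrices stored as nested lists."""
    n = len(a)
    # trace(AB) = sum over i, k of A[i][k] * B[k][i]
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def binomial_quasilik(y, mu):
    """Sum of y*log(mu/(1-mu)) + log(1-mu), the term summarized in qicm."""
    return sum(yi * math.log(mi / (1 - mi)) + math.log(1 - mi)
               for yi, mi in zip(y, mu))


def qic_binomial(y, mu, ai, v):
    """QIC = -2Q + 2*trace(Ai*V), written as 2*(trace - Q) like qicm."""
    return 2 * (trace_of_product(ai, v) - binomial_quasilik(y, mu))
```

For example, with `y = [1, 0]`, `mu = [0.5, 0.5]` and identity matrices for both `Ai` and `V`, the trace is 2 and Q = 2 ln 0.5, so the function returns 4 + 4 ln 2.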


5.1.7 Calculating QICu in Stata

** This do-file assumes that the Progabide data are available
** as a Stata dataset named progdata.dta.

use progdata, clear
capture program drop qicu
program define qicu
    quietly {
        local p = e(df_m) + 1
        tempvar xb ql
        predict double `xb', xb
        ** Poisson quasi-likelihood. The y*(log(y)-1) term is set
        ** to 0 when seizures = 0; otherwise log(0) is missing and
        ** those observations would drop out of the sum.
        generate double `ql' = seizures*`xb' - exp(`xb') /*
            */ - cond(seizures>0, seizures*(log(seizures)-1), 0)
        summarize `ql'
        local qicu = -2*r(sum) + 2*`p'
    }
    display in green "QIC_u = " in yellow %8.4f `qicu'
end

** Store the common options in a macro to save typing
local opts "offset(lnPeriod) fam(poiss) corr(exch) i(id) nolog"

** Run each model and calculate QICu for the model. This will
** produce a lot of output including the results of fitting
** each model along with the QICu criterion measure.
xtgee seizures time, `opts'
qicu
xtgee seizures progabide, `opts'
qicu
xtgee seizures timeXprog, `opts'
qicu
xtgee seizures time progabide, `opts'
qicu
xtgee seizures time timeXprog, `opts'
qicu
xtgee seizures progabide timeXprog, `opts'
qicu
xtgee seizures time progabide timeXprog, `opts'
qicu
xtgee seizures , `opts'
qicu
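The same QICu computation can be sketched in Python. This assumes the linear predictor `xb` (including the lnPeriod offset) has already been obtained from a fitted model, and that `p` is the number of regression coefficients, which is e(df_m)+1 in the Stata program. The names `poisson_quasilik` and `qicu` are hypothetical. Zero counts use the limiting value y log y = 0, so they stay in the sum.

```python
import math


def poisson_quasilik(y, xb):
    """Sum of y*xb - exp(xb) - y*(log(y) - 1), with the last term 0 when y = 0.

    xb is the linear predictor including any offset, so exp(xb) is the fitted mean.
    """
    total = 0.0
    for yi, eta in zip(y, xb):
        const = yi * (math.log(yi) - 1) if yi > 0 else 0.0
        total += yi * eta - math.exp(eta) - const
    return total


def qicu(y, xb, p):
    """QICu = -2Q + 2p."""
    return -2 * poisson_quasilik(y, xb) + 2 * p
```

When every fitted mean equals its count, each quasi-likelihood contribution is zero and QICu reduces to the 2p penalty.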


5.1.8 Graphing the residual runs test in S-PLUS

residrun <- function(ehat, yhat, subtitle="") {
    ## The graph will reflect the current sort order of the
    ## residuals and fitted values. Calling program is responsible
    ## for ordering the data.
    sgn <- sign(ehat)
    num <- 1:length(ehat)
    plot(num, sgn, type="s",
         main="Graphical Illustration of Residual Runs",
         sub=subtitle,
         xlab="Fitted values",
         ylab="Sign(Pearson residuals)",
         xlim=c(0,length(ehat)), ylim=c(-1,1))
}
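The step plot shows each change of sign as a jump, so the runs it displays can also be listed directly. The Python sketch below uses hypothetical helper names. Like `residrun`, it assumes the caller has already sorted the residuals by fitted value, and it returns each run as a (sign, length) pair.

```python
def sign(x):
    """Return -1, 0, or 1, like the S-PLUS sign() function."""
    return (x > 0) - (x < 0)


def residual_runs(ehat):
    """Split the residual signs, in their current order, into runs.

    Returns a list of (sign, length) pairs; the number of runs is len(result).
    """
    runs = []
    for s in map(sign, ehat):
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((s, 1))  # a sign change starts a new run
    return runs
```

A runs test compares the observed number of runs with the number expected if the signs were in random order; that test statistic is not computed in this sketch.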


5.1.9 Using the fixed correlation structure in Stata

capture program drop calcR
program define calcR
    ** Raw residuals from the current fit and the dispersion estimate
    capture drop res mm
    predict double res
    replace res = y-res
    generate double mm = res*res
    summarize mm, meanonly
    scalar phi = r(mean)
    ** Sum the products of residual pairs (1,2), (3,4), ...
    ** within each of the $G panels of $Ni observations
    scalar alpha = 0
    local i 1
    local n 0
    while `i' <= $G {
        local ioff = (`i'-1) * $Ni
        local j 1
        while `j' < $Ni {
            local joff = `ioff' + `j'
            local koff = `joff' + 1
            scalar rij = res[`joff']
            scalar rik = res[`koff']
            scalar alpha = alpha + rij*rik
            local n = `n'+1
            local j = `j'+2
        }
        local i = `i'+1
    }
    scalar rr = (alpha/`n')/phi
    matrix R = ( 1,rr, 0, 0, 0, 0, 0, 0\ /*
             */ rr, 1, 0, 0, 0, 0, 0, 0\ /*
             */  0, 0, 1,rr, 0, 0, 0, 0\ /*
             */  0, 0,rr, 1, 0, 0, 0, 0\ /*
             */  0, 0, 0, 0, 1,rr, 0, 0\ /*
             */  0, 0, 0, 0,rr, 1, 0, 0\ /*
             */  0, 0, 0, 0, 0, 0, 1,rr\ /*
             */  0, 0, 0, 0, 0, 0,rr, 1)
end

capture program drop Iter
program define Iter
    global Ni = 8
    global G = 10
    xtgee y x1 x2, i(id) t(t) corr(ind) nolog
    ** Initialize R once from the independence fit; inside the
    ** loop, calcR supplies the matrix used by the next fit.
    matrix R = e(R)
    local i 1
    while `i' <= 10 {
        matrix b = e(b)
        capture xtgee y x1 x2, i(id) t(t) corr(fixed R) iter(2) from(b)
        matrix R = e(R)
        calcR
        local i = `i'+1
    }
end

** Use the Iter program to iterate several times starting from an
** independence model. For each iteration, use the currently
** calculated correlation matrix from the calcR program.
** Iterate 10 times.
use optdata, clear
quietly Iter
xtgee
matrix list e(R), format(%4.2f)
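The moment estimator inside calcR can be restated in a few lines of Python, which may make the indexing easier to follow. This sketch assumes the residuals are stored panel by panel in t order (as calcR assumes of the Stata data) and that every panel has the same number of observations. The function names are hypothetical.

```python
def paired_correlation(res, n_groups, n_per_group):
    """Moment estimate mirroring calcR: the average product of residual
    pairs (1,2), (3,4), ... within each panel, divided by phi = mean(res**2)."""
    phi = sum(r * r for r in res) / len(res)
    total, n_pairs = 0.0, 0
    for i in range(n_groups):
        offset = i * n_per_group
        # 0-based j = 0, 2, 4, ... matches calcR's j = 1, 3, 5, ...
        for j in range(0, n_per_group - 1, 2):
            total += res[offset + j] * res[offset + j + 1]
            n_pairs += 1
    return (total / n_pairs) / phi


def block_pair_matrix(rr, size):
    """Block-diagonal working correlation with 2x2 blocks [[1, rr], [rr, 1]]."""
    R = [[0.0] * size for _ in range(size)]
    for k in range(size):
        R[k][k] = 1.0
    for k in range(0, size - 1, 2):
        R[k][k + 1] = R[k + 1][k] = rr
    return R
```

With `size = 8`, `block_pair_matrix` has the same pattern as the matrix R built at the end of calcR.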


5.1.10 Fitting quasivariance PA-GEE models in S-PLUS

##
## Uses geex package from http://lib.stat.cmu.edu/S/geex
## The goal is to use the make.family() function to create
## another family (fully supported variance function) for use
## with either the gee() or glm() commands.
## First, define the necessary label, variance, and
## deviance functions. Symbolic math programs can be useful
## for determining the deviance.
##

binsq.var <- list(

res <- gee(formula=leaf~site1+site2+site3+site4+site5+site6+site7+site8,
           family=binsq.f,
           correlation="compoundsymmetric",
           control=gee.control(tolerance=1e-25,epsilon=1e-14,trace=T),
           data=leafdata, subject=variety, repeated=t)
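The binsq.var definition above is incomplete. Assuming the name refers to the squared binomial variance V(mu) = mu^2 (1 - mu)^2 (an assumption, because the listing does not show the function), the quasi-likelihood that a symbolic math program would return can be checked numerically. The Python sketch below verifies that dQ/dmu = (y - mu)/V(mu), which is the property that makes Q a quasi-likelihood for that variance. All names are hypothetical.

```python
import math


def binsq_variance(mu):
    """Assumed variance function V(mu) = mu^2 * (1 - mu)^2."""
    return mu ** 2 * (1 - mu) ** 2


def binsq_quasilik(y, mu):
    """Q(mu; y) = (2y - 1) log(mu / (1 - mu)) - y/mu - (1 - y)/(1 - mu).

    Differentiating in mu gives (y - mu) / V(mu) for the assumed V above.
    """
    return ((2 * y - 1) * math.log(mu / (1 - mu))
            - y / mu - (1 - y) / (1 - mu))


def check_derivative(y, mu, h=1e-6):
    """Return (central-difference dQ/dmu, (y - mu)/V(mu)) for comparison."""
    numeric = (binsq_quasilik(y, mu + h) - binsq_quasilik(y, mu - h)) / (2 * h)
    analytic = (y - mu) / binsq_variance(mu)
    return numeric, analytic
```

The same check is a quick way to validate whatever expression a symbolic math program produces before it is placed in a family definition.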


5.2 Datasets

The following subsections list the data that are used in the examples of the text, along with an explanation of the data.

5.2.1 Wheeze data

These data come from a study of the health effects of air pollution. They are a subset of the data in Ware, Dockery, Spiro III, Speizer, and Ferris Jr. (1984). The data include the case number, case; a within-subject observation identifier, t; a binary indicator of whether the subject wheezes, wheeze; a binary indicator of whether the observation is in Kingston, kingston; the age of the child in years, age; and an indicator of whether the child's mother smokes, smoke (in the listing below, smoke takes the values 0, 1, and 2). The data are for 16 children whose observations were collected over four years, at ages 9 through 12. The subjects reside in one of two cities, Kingston or Portage.

Indicator variables can be constructed for the case. In the text, we include output where _Icase_1 is an indicator that the observation is characterized by case = 1; _Icase_2 indicates that the observation is characterized by case = 2. Indicator variables for the other case identifiers are constructed similarly.
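The indicator construction described above can be sketched outside Stata as follows. The helper name is hypothetical and the input is assumed to be a plain Python list of case numbers. It creates one column per distinct value and drops none, so one column should be omitted if the model also has an intercept.

```python
def indicator_columns(values, prefix):
    """Build 0/1 indicator columns named prefix_<value>, one per distinct
    value in sorted order (hypothetical helper; no category is dropped)."""
    levels = sorted(set(values))
    return {f"{prefix}_{level}": [1 if v == level else 0 for v in values]
            for level in levels}
```

For example, `indicator_columns([1, 1, 2, 2, 3], "_Icase")` returns columns `_Icase_1`, `_Icase_2`, and `_Icase_3`.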
A description of the panel structure of the data is:

    case:  1, 2, ..., 16                                     n =         16
       t:  1, 2, ..., 4                                      T =          4
           Delta(t) = 1; (4-1)+1 = 4
           (case*t uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         4       4       4         4         4       4       4

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
       16    100.00  100.00 |  1111
 ---------------------------+---------
       16    100.00         |  XXXX
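The pattern column summarizes which time points each panel contributes. A minimal Python sketch of that tabulation follows, assuming integer time values with Delta(t) = 1. The function name is hypothetical, and marking absent time points with "." is a choice made for this sketch.

```python
def participation_patterns(panel_ids, times):
    """Count participation patterns: for each panel, a string with '1' where
    an observation exists at each time point from the overall minimum to the
    overall maximum time, and '.' where it does not."""
    t_min, t_max = min(times), max(times)
    seen = {}
    for pid, t in zip(panel_ids, times):
        seen.setdefault(pid, set()).add(t)
    counts = {}
    for observed in seen.values():
        pattern = "".join("1" if t in observed else "."
                          for t in range(t_min, t_max + 1))
        counts[pattern] = counts.get(pattern, 0) + 1
    return counts
```

For the wheeze data (16 cases, each observed at t = 1, ..., 4), every case has the pattern 1111, matching the single row of the table above.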

A summary of the variables is:

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
wheeze   overall |   .296875    .4604927         0          1 |     N =      64
         between |              .2918154         0        .75 |     n =      16
         within  |              .3618734  -.453125   1.046875 |     T =       4
                 |                                            |
kingston overall |        .5    .5039526         0          1 |     N =      64
         between |              .5163978         0          1 |     n =      16
         within  |                     0        .5         .5 |     T =       4
                 |                                            |
age      overall |      10.5    1.126872         9         12 |     N =      64
         between |                     0      10.5       10.5 |     n =      16
         within  |              1.126872         9         12 |     T =       4
                 |                                            |
smoke    overall |   .796875    .6941865         0          2 |     N =      64
         between |              .5493841         0       1.75 |     n =      16
         within  |              .4409586   .046875   1.546875 |     T =       4

The data are:

case  t  wheeze  kingston  age  smoke
   1  1       1         0    9      0
   1  2       1         0   10      0
   1  3       1         0   11      0
   1  4       0         0   12      0
   2  1       1         1    9      1
   2  2       1         1   10      2
   2  3       0         1   11      2
   2  4       0         1   12      2
   3  1       1         1    9      0
   3  2       0         1   10      0
   3  3       0         1   11      1
   3  4       0         1   12      1
   4  1       0         0    9      0
   4  2       1         0   10      0
   4  3       1         0   11      0
   4  4       0         0   12      1
   5  1       0         1    9      0
   5  2       0         1   10      1
   5  3       0         1   11      1
   5  4       0         1   12      1
   6  1       0         0    9      0
   6  2       0         0   10      1
   6  3       0         0   11      1
   6  4       0         0   12      1
   7  1       0         1    9      1
   7  2       0         1   10      1
   7  3       0         1   11      0
   7  4       0         1   12      0
   8  1       0         0    9      1
   8  2       0         0   10      1
   8  3       0         0   11      1
   8  4       0         0   12      2
   9  1       1         0    9      2
   9  2       0         0   10      2
   9  3       0         0   11      1
   9  4       0         0   12      1
  10  1       0         1    9      0
  10  2       0         1   10      0
  10  3       0         1   11      0
  10  4       0         1   12      1
  11  1       1         1    9      1
  11  2       0         1   10      0
  11  3       1         1   11      0
  11  4       1         1   12      0
  12  1       0         0    9      1
  12  2       0         0   10      0
  12  3       0         0   11      0
  12  4       0         0   12      0
  13  1       0         1    9      1
  13  2       1         1   10      0
  13  3       1         1   11      1
  13  4       1         1   12      1
  14  1       0         0    9      1
  14  2       0         0   10      2
  14  3       0         0   11      1
  14  4       1         0   12      2
  15  1       0         1    9      1
  15  2       0         1   10      1
  15  3       0         1   11      1
  15  4       1         1   12      2
  16  1       1         0    9      1
  16  2       1         0   10      1
  16  3       0         0   11      2
  16  4       0         0   12      1
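The overall, between, and within rows in these summaries follow a standard panel decomposition: "between" describes the panel means, and "within" describes the deviations x_it - xbar_i + xbar, with standard deviations using an n - 1 divisor. A minimal Python sketch, assuming the data for one variable are stored as a list of per-case lists; `panel_summary` is a hypothetical name. Applied to the smoke values listed above, it should reproduce the reported mean, standard deviations, and within minimum and maximum to the displayed precision.

```python
import math


def _sd(xs):
    """Sample standard deviation with an n - 1 divisor."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def panel_summary(panels):
    """Overall, between, and within summaries for a list of panels.

    between: sd of the panel means; within: sd of x_it - xbar_i + xbar.
    """
    values = [x for panel in panels for x in panel]
    grand = sum(values) / len(values)
    means = [sum(p) / len(p) for p in panels]
    within = [x - m + grand for p, m in zip(panels, means) for x in p]
    return {
        "mean": grand,
        "overall_sd": _sd(values),
        "between_sd": _sd(means),
        "within_sd": _sd(within),
        "within_min": min(within),
        "within_max": max(within),
    }
```

Adding the overall mean back to the within deviations is what places the within minimum and maximum on the scale of the original variable.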

5.2.2 Ship accident data

This dataset studies the number of accidents reported for ships. Included in the data are a ship identifier, ship; the number of incidents reported, incident; whether the ship operated between 1975 and 1979, op_75_79; whether the ship was in construction between 1965 and 1969, co_65_69; whether the ship was in construction between 1970 and 1974, co_70_74; whether the ship was in construction between 1975 and 1979, co_75_79; the number of months in service during the data collection, mon; and the exposure of the ship (natural log of months), exposure. We have included the exposure variable in the data listing, but you can generate this variable as ln(mon), where mon is the number of months that the ship was operating.

Indicator variables can be constructed for the ship identifier. In the text, we include output where _Iship_1 is an indicator that the observation is characterized by ship = 1; _Iship_2 indicates that the observation is characterized by ship = 2. Indicator variables for the other ship types are constructed similarly.

A description of the panel structure of the data is:

    ship:  1, 2, ..., 5                                      n =          5
       t:  1, 2, ..., 8                                      T =          8
           Delta(t) = 1; (8-1)+1 = 8
           (ship*t uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         8       8       8         8         8       8       8

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------
        5    100.00  100.00 |  11111111
 ---------------------------+----------
        5    100.00         |  XXXXXXXX

A summary of the variables is:

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
incident overall |       8.9    14.96115         0         58 |     N =      40
         between |               12.7908       1.5     31.625 |     n =       5
         within  |              9.465524   -22.725     35.275 |     T =       8
                 |                                            |
op_75_79 overall |        .5    .5063697         0          1 |     N =      40
         between |                     0        .5         .5 |     n =       5
         within  |              .5063697         0          1 |     T =       8
                 |                                            |
co_65_69 overall |       .25     .438529         0          1 |     N =      40
         between |                     0       .25        .25 |     n =       5
         within  |               .438529         0          1 |     T =       8
                 |                                            |
co_70_74 overall |       .25     .438529         0          1 |     N =      40
         between |                     0       .25        .25 |     n =       5
         within  |               .438529         0          1 |     T =       8
                 |                                            |
co_75_79 overall |       .25     .438529         0          1 |     N =      40
         between |                     0       .25        .25 |     n =       5
         within  |               .438529         0          1 |     T =       8
                 |                                            |
exposure overall |  7.049255    1.721094  3.806662   10.71179 |     N =      34
         between |              1.516132  5.953964   9.694236 |     n =       5
         within  |              1.014858  4.591827   8.721374 | T-bar =     6.8

where exposure is missing when the ship has not been in service. The data are:

ship  t  incident  op_75_79  co_65_69  co_70_74  co_75_79   exposure    mon
   1  1         0         0         1         0         0  4.8441871    127
   1  2         0         1         1         0         0  4.1431347     63
   1  3         3         0         0         1         0  6.9985096   1095
   1  4         4         1         0         1         0  6.9985096   1095
   1  5         6         0         0         0         1  7.3211886   1512
   1  6        18         1         0         0         1  8.1176107   3353
   1  7         0         0         0         0         0          .      0
   1  8        11         1         0         0         0  7.7160153   2244
   2  1        39         0         1         0         0  10.711792  44882
   2  2        29         1         1         0         0  9.7512683  17176
   2  3        58         0         0         1         0  10.261477  28609
   2  4        53         1         0         1         0  9.9218185  20370
   2  5        12         0         0         0         1  8.8627667   7064
   2  6        44         1         0         0         1  9.4802912  13099
   2  7         0         0         0         0         0          .      0
   2  8        18         1         0         0         0  8.8702416   7117
   3  1         1         0         1         0         0  7.0724219   1179
   3  2         1         1         1         0         0   6.313548    552
   3  3         0         0         0         1         0  6.6605751    781
   3  4         1         1         0         1         0  6.5161931    676
   3  5         6         0         0         0         1  6.6631327    783
   3  6         2         1         0         0         1  7.5745585   1948
   3  7         0         0         0         0         0          .      0
   3  8         1         1         0         0         0  5.6131281    274
   4  1         0         0         1         0         0  5.5254529    251
   4  2         0         1         1         0         0  4.6539604    105
   4  3         0         0         0         1         0  5.6629605    288
   4  4         0         1         0         1         0  5.2574954    192
   4  5         2         0         0         0         1  5.8550719    349
   4  6        11         1         0         0         1  7.0967214   1208
   4  7         0         0         0         0         0          .      0
   4  8         4         1         0         0         0  7.6260828   2051
   5  1         0         0         1         0         0  3.8066625     45
   5  2         0         1         1         0         0          .      0
   5  3         7         0         0         1         0  6.6707663    789
   5  4         7         1         0         1         0  6.0799332    437
   5  5         5         0         0         0         1  7.0535857   1157
   5  6        12         1         0         0         1  7.6783264   2161
   5  7         0         0         0         0         0          .      0
   5  8         1         1         0         0         0   6.295266    542
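As noted in the description of these data, exposure can be regenerated from mon. A minimal Python sketch, assuming the months are held in a plain list; the helper name is hypothetical, and a missing exposure is represented as None (Stata displays it as ".").

```python
import math


def exposure_from_months(months):
    """Natural log of months in service; None (missing) when a ship has 0 months."""
    return [math.log(m) if m > 0 else None for m in months]
```

For example, 127 months gives 4.8441871 and 2244 months gives 7.7160153, matching the listed exposures for ship 1.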

5.2.3 Progabide data


The Progabide data have been analyzed in many places; they are available in Thall and Vail (1990). The data are from a panel study in which four successive two-week counts of seizures were recorded for each epileptic patient in the study. The covariates are the Progabide treatment indicator (0=placebo, 1=Progabide), the followup indicator (0=baseline measurement, 1=followup), and an interaction of these covariates. We use only these covariates; other sources that analyze these data include additional covariates, such as the age of the patient.

A description of the panel structure of the data is:

      id:  1, 2, ..., 59                                     n =         59
       t:  0, 1, ..., 4                                      T =          5
           Delta(t) = 1; (4-0)+1 = 5
           (id*t uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         5       5       5         5         5       5       5

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
       59    100.00  100.00 |  11111
 ---------------------------+---------
       59    100.00         |  XXXXX

A summary of the variables is:

Variable          |      Mean   Std. Dev.       Min        Max |    Observations
------------------+--------------------------------------------+----------------
seizures  overall |  12.86441    18.68797         0        151 |     N =     295
          between |              13.90844       2.2       90.6 |     n =      59
          within  |              12.58679 -14.73559   77.06441 |     T =       5
                  |                                            |
time      overall |        .8    .4006797         0          1 |     N =     295
          between |                     0        .8         .8 |     n =      59
          within  |              .4006797         0          1 |     T =       5
                  |                                            |
progabide overall |  .5254237    .5002017         0          1 |     N =     295
          between |              .5036396         0          1 |     n =      59
          within  |                     0  .5254237   .5254237 |     T =       5
                  |                                            |
timeXprog overall |   .420339    .4944521         0          1 |     N =     295
          between |              .4029117         0         .8 |     n =      59
          within  |              .2904372  -.379661    .620339 |     T =       5
                  |                                            |
lnPeriod  overall |  .9704061      .55546  .6931472   2.079442 |     N =     295
          between |                     0  .9704061   .9704061 |     n =      59
          within  |                .55546  .6931472   2.079442 |     T =       5
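The time, timeXprog, and lnPeriod columns are functions of t and progabide. The listing shows lnPeriod = 2.079442 (ln 8) at t = 0 and .6931472 (ln 2) otherwise, consistent with an eight-week baseline period and two-week followup periods; the sketch below assumes exactly that. It is a hypothetical helper, not code from the text.

```python
import math


def progabide_covariates(t, progabide):
    """Derived covariates for one observation of the Progabide data.

    Assumes t = 0 is the 8-week baseline period and t = 1..4 are 2-week followups.
    """
    time = 0 if t == 0 else 1
    weeks = 8 if t == 0 else 2
    return {
        "time": time,
        "timeXprog": time * progabide,  # treatment-by-followup interaction
        "lnPeriod": math.log(weeks),    # offset: log of the counting period
    }
```

Using ln(weeks) as an offset lets a Poisson model compare counts recorded over periods of different lengths.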

The data data are are:: id id 11 11 11 11 11 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10

tt 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2

seizures seizures 11 11 5 5 3 3 3 3 3 3 11 11 3 3 5 5 3 3 3 3 6 6 2 2 4 4

3 3 4 4

0 0 5 5 8 8 4 4 4 4 11 4 4 66 66 7 7 18 18 9 9 21 21 27 27 5 5 2 2 8 8 7 7 12 12 6 6 4 4 0 0 2 2 52 52 40 40 20 20 23 23 12 12

0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4

3 3 4 4

23 23 5 5 6 6 6 6 5 5

0 0 11 2 2

10 10 14 14 13 13

0 0 11 2 2

© © 2003 2003 by by Chapman Chapman & & Hall/CRC Hall/CRC

time time 0 0 11 11 11 11 0 0 11 11 11

11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11

progabide progabide 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

timeXprog timeXprog 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

lnPeriod 1nPeriod 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2 .079442 2.079442

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

.6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2 .079442 2.079442

00 00 00

0 0 0 0 0 0

.6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472

.6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2 .079442 2.079442 .6931472 .6931472 .6931472 .6931472

198 198

PROGRAMS AND DATASETS DATASETS PROGRAMS AND 10 10 10 10 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 20 21 21 21 21 21 21 21 21 21 21 22 22 22 22 22 22 22 22

33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33 44 00 11 22 33

66 00 52 52 26 26 12 12 66 22 22 33 33 12 12 66 88 55 18 18 44 44 66 22 42 42 77 99 12 12 14 14 87 87 16 16 24 24 10 10 99 50 50 11 11 00 00 55 18 18 00 00 33 33 111 111 37 37 29 29 28 28 29 29 18 18 33 55 22 55 20 20 33 00 66 77 12 12 33 44 33 44 99 33 44 33

© © 2003 2003 by by Chapman Chapman & & Hall/CRC Hall/CRC

11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11 11 00 11 11 11

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

.6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 22.079442 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472

DATASETS DATASETS 22 22 23 23 23 23 23 23 23 23 23 23 24 24 24 24 24 24 24 24 24 24 25 25 25 25 25 25 25 25 25 25 26 26 26 26 26 26 26 26 26 26 27 27 27 27 27 27 27 27 27 27 28 28 28 28 28 28 28 28 28 28 29 29 29 29 29 29 29 29 29 29 30 30 30 30 30 30 30 30 30 30 31 31 31 31 31 31 31 31 31 31 32 32 32 32 32 32 32 32 32 32 33 33 33 33 33 33 33 33 33 33 34 34 34 34 34 34 34 34 34 34

199 19 9 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4 0 0 11 2 2 3 3 4 4

© © 2003 2003 by by Chapman Chapman & & Hall/CRC Hall/CRC

44 17 17 22 33 33 55 28 28 88 12 12 22 88 55 55 18 18 24 24 76 76 25 25 99 22 11 22 11 10 10 33 11 44 22 47 47 13 13 15 15 13 13 12 12 76 76 11 11 14 14 99 88 38 38 88 77 99 44 19 19 00 44 33 00 10 10 33 66 11 33 19 19 22 66 77 44 24 24 44 33 11 33

11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11

.6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2 .079442 2.079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2.079442 2 .079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2 .079442 2.079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 2 .079442 2.079442 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472 .6931472

35 35 35 35 35 36 36 36 36 36 37 37 37 37 37 38 38 38 38 38 39 39 39 39 39 40 40 40 40 40 41 41 41 41 41 42 42 42 42 42 43 43 43 43 43 44 44 44 44 44 45 45 45 45 45 46 46 46 46 46 47

0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0


31 31 22 22 17 17 19 19 16 16 14 14 55 44 77 44 11 11 22 44 00 44 67 67 33 77 77 77 41 41 44 18 18 22 55 77 22 11 11 00 22 22 00 22 44 00 13 13 55 44 00 33 46 46 11 11 14 14 25 25 15 15 36 36 10 10 55 33 88 38 38 19 19 77 66 77 77 11 11 22 44 36 36

0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0

11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11

0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0

2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442

47 47 47 47 48 48 48 48 48 49 49 49 49 49 50 50 50 50 50 51 51 51 51 51 52 52 52 52 52 53 53 53 53 53 54 54 54 54 54 55 55 55 55 55 56 56 56 56 56 57 57 57 57 57 58 58 58 58 58 59 59

1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1

66 10 10 88 88 11 11 22 11 00 00 151 151 102 102 65 65 72 72 63 63 22 22 44 33 22 44 42 42 88 66 55 77 32 32 11 33 11 55 56 56 18 18 11 11 28 28 13 13 24 24 66 33 44 00 16 16 33 55 44 33 22 22 11 23 23 19 19 88 25 25 22 33 00 11 13 13 00 00 00 00 12 12 11


1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1

11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11

1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1

.6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472 .6931472 .6931472 .6931472 2.079442 .6931472

59 59 59

2 3 4

4 3 2

1 1 1

1 1 1

1 1 1

.6931472 .6931472 .6931472

5.2.4 Simulated logistic data

This dataset is used in Chapter 4 to illustrate the calculation and interpretation of the QIC diagnostic measure. The panel identifier is given by id. The repeated measures for each panel are given by t. The x1 covariate measures are random U(0,1), the x2 covariate measures are random N(0,1), and the x3 covariate measures are random U(-5,5). The binary outcome y is generated so that .4 is the approximate within-panel correlation and, for a marginal logistic regression,

$$\operatorname{logit}\{\Pr(y_{it}=1)\} = -0.4\,x_{1it} + 0.25\,x_{2it} + 0.15\,x_{3it} + 1.3 \qquad (5.1)$$

The panel structure of the data is described below:

      id:  1, 2, ..., 50                                   n =         50
       t:  1, 2, ..., 8                                    T =          8
           Delta(t) = 1; (8-1)+1 = 8
           (id*t uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         8       8       8         8         8       8       8

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------
       50    100.00  100.00 |  11111111
 ---------------------------+----------
       50    100.00         |  XXXXXXXX
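The design described above can be sketched in Python. This is a minimal sketch and not the authors' generating program. It assumes NumPy, a hypothetical seed, and the hypothetical function names `marginal_prob` and `simulate_panels`. The sketch reproduces three things: the balanced 50 x 8 panel layout, the covariate distributions, and the marginal probabilities implied by (5.1). It draws each y_it independently, however, so it does not induce the within-panel correlation of about .4. The method used to produce that correlation is not described here.

```python
import numpy as np


def marginal_prob(x1, x2, x3):
    """Marginal Pr(y_it = 1) implied by the logistic model (5.1)."""
    eta = (-0.4 * np.asarray(x1, dtype=float)
           + 0.25 * np.asarray(x2, dtype=float)
           + 0.15 * np.asarray(x3, dtype=float)
           + 1.3)
    return 1.0 / (1.0 + np.exp(-eta))


def simulate_panels(n_panels=50, n_times=8, seed=12345):
    """Balanced n_panels x n_times design with covariates as in the text.

    Assumption: each y_it is an independent Bernoulli draw from its
    marginal probability, so no within-panel correlation is induced.
    """
    rng = np.random.default_rng(seed)  # hypothetical seed
    n = n_panels * n_times
    ids = np.repeat(np.arange(1, n_panels + 1), n_times)  # 1,...,1, 2,...,2, ...
    t = np.tile(np.arange(1, n_times + 1), n_panels)       # 1..n_times within each id
    x1 = rng.uniform(0.0, 1.0, size=n)    # U(0,1)
    x2 = rng.normal(0.0, 1.0, size=n)     # N(0,1)
    x3 = rng.uniform(-5.0, 5.0, size=n)   # U(-5,5)
    p = marginal_prob(x1, x2, x3)
    y = (rng.uniform(size=n) < p).astype(int)  # independent 0/1 outcomes
    return ids, t, y, x1, x2, x3


ids, t, y, x1, x2, x3 = simulate_panels()
print("panels:", len(np.unique(ids)), "obs:", y.size, "mean y:", round(y.mean(), 3))
```

As a quick check on (5.1), set x1 = 0.5 (the mean of U(0,1)) and x2 = x3 = 0. The linear predictor is then -0.2 + 1.3 = 1.1, and `marginal_prob(0.5, 0, 0)` is about 0.75. That is in the same range as the sample mean of y (.74) reported in the summary of the variables.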

A summary of of the the variables variables is: A summary is: Variable Mean Std.. Dev Dev.. Min Max II Observations Variable Mean Std Min Max Observations I -------------------------------------------------------------------------------

-----------------+--------------------------------------------+----------------

Y y

xl xi

x2 x2

x3 x3

overall I overall between I between wi thin I within I overall I overall between between I within within I I overall overall I between between I within within I I overall overall I between I between within I within

.74 .74

.4391836 .4391836 .2866503 .2866503 .3348961 .3348961

00 00 135 --..135

11 11 11.615 .615

.4873817 .4873817

.2843164 .2843164 .0966579 .0966579 .2676882 .2676882

.0022786 .0022786 .2572445 .2572445 - .178456 -.178456

.9914105 .9914105 .690023 .690023 11.107685 .107685

--.0616686 .0616686

.9793633 .9793633 .3630255 .3630255 .9108661 .9108661

-2.391706 -2 .391706 --.7829833 .7829833 -2 -2.282303 .282303

3.070278 3 .070278 .6286468 .6286468 2 2.729702 .729702

.0904452 .0904452

22.778632 .778632 .9949208 .9949208 22.597748 .597748

-4 -4.998804 .998804 -1.864617 -1 .864617 -5.857181 -5 .857181

4 .942484 4.942484 11.897457 .897457 6.516726 6 .516726

© © 2003 2003 by by Chapman Chapman & & Hall/CRC Hall/CRC

I I 1 I I I 1 I 1 1 1 I 1 1 1

= = =

400 400 50 50 88

= = =

400 400 50 50 88

= = =

400 400 50 50 88

= = =

400 400 50 50 88

NN = n = n T T = NN = n n = T = T NN = n n = T = T NN = n n = T = T
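The overall, between, and within rows above follow the decomposition in the style of Stata's xtsum command:

- Overall statistics use all N observations.
- Between statistics use the n panel means xbar_i.
- Within statistics use x_it - xbar_i + xbar, the within-panel deviations re-centred at the grand mean.

The NumPy sketch below reproduces that decomposition. The function name `xtsum` is hypothetical, and the sketch assumes the usual N - 1 (or n - 1) divisor for standard deviations.

```python
import numpy as np


def xtsum(x, ids):
    """Overall / between / within summaries of x for panels identified by ids.

    overall: all N observations x_it
    between: the n panel means xbar_i
    within:  x_it - xbar_i + xbar, i.e. within-panel deviations re-centred
             at the grand mean xbar
    Standard deviations use the N - 1 (between: n - 1) divisor.
    """
    x = np.asarray(x, dtype=float)
    ids = np.asarray(ids)
    panels, inverse = np.unique(ids, return_inverse=True)
    inverse = inverse.ravel()  # panel index of each observation
    counts = np.bincount(inverse)
    panel_means = np.bincount(inverse, weights=x) / counts
    grand = x.mean()
    within = x - panel_means[inverse] + grand

    def stats(v):
        return {"sd": float(v.std(ddof=1)), "min": float(v.min()), "max": float(v.max())}

    return {
        "mean": float(grand),
        "overall": stats(x),
        "between": stats(panel_means),
        "within": stats(within),
        "N": int(x.size),
        "n": int(panels.size),
        "T_bar": x.size / panels.size,
    }


# Toy example: two panels of two observations each.
print(xtsum([1, 0, 1, 1], [1, 1, 2, 2]))
```

For y, the within range in the table is consistent with this definition. A panel with 7 of its 8 outcomes equal to 1 has a mean of .875, so its single 0 gives 0 - .875 + .74 = -.135. A panel with a single 1 has a mean of .125, so that 1 gives 1 - .125 + .74 = 1.615.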

The data are:

id  1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 8 8

t  1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2


y Y 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 0 0 0 0 0 0 11 11 0 0 11 11 11 11 11 0 0 11 11 0 0 11 11 0 0 0 0 0 0 0 0 0 0 11 0 0

x1  .88523501 .33954228 .98692055 .43864285 .19520924 .62452864 .58458474 .40398331 .5432351 .45269737 .52367102 .7373026 .84405544 .26029491 .68926677 .33319191 .99141046 .11734212 .17001171 .95515468 .26147315 .29333199 .62018647 .87830916 .6308325 .17430338 .43297697 .44872869 .53018721 .88605676 .58982214 .1641364 .66959327 .93625849 .76221961 .87525675 .40668204 .53499123 .32165161 .44859654 .89396044 .95474025 .44044753 .94513756 .40861741 .61684079 .43013996 .28627501 .75287802 .94853001 .02615721 .42648522 .01091767 .55131528 .39179926 .35669087 .12855085 .2241307

x2  -.26417433 -1.4430173 -.16949629 -.12070738 .51354993 .61329929 -.29184848 1.2441504 -.39363134 -.65273565 -.9039997 -.7703287 .31304621 .1485929 -1.1141923 .64182082 .14224213 1.995702 .37615613 -.36269575 -1.1041402 -.79240046 -.0337262 2.4218153 .03634451 .72096278 .61541613 1.0482384 1.5200186 .42986316 .6308999 -.99606258 .56058246 .55995908 -1.2894494 -1.0410228 -.37620679 .06020311 -.30549954 -1.1548636 -.20606491 .27077224 .20482204 -1.053525 1.2710332 -.25491999 -.32949256 -.33010354 -.47702251 .38926723 -1.2221744 -.86363652 -.87978583 -.19303872 -.87915599 -.40892197 .30357532 -.00238475

x3  4.3258062 3.4140513 .60771281 4.3190559 -.76819588 -1.8277712 4.3657006 -.08235102 1.0439372 -1.547725 4.5063062 .76219818 3.7118576 -.42186429 -4.1421549 1.740416 .17241989 3.0959682 1.4662582 1.8412877 4.4556939 -1.7109089 -4.7023703 -4.5688076 3.4425292 1.4317767 2.3123194 2.1555201 .46990644 .98008882 4.7580965 -1.8691156 .91035143 3.6458639 3.4636962 -1.5899535 -2.0461566 3.1828135 -3.2824211 .11750016 -2.6077053 -4.9657403 -3.2177994 3.1326672 .59599184 1.3331989 1.7022658 -.81824249 -.3125868 3.5482391 1.0403809 -2.2260864 -3.3014733 -4.5931443 -2.3541865 3.6668923 3.0231191 -1.463341

8 8 8 8 8 8 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15

3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7


11 11 0 0 0 0 11 11 11 11 0 0 0 0 11 0 0 11 0 0 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 0 0 0 0 11 11 11 11 0 0 11 0 0 0 0 11 0 0 0 0 0 0 11 0 0 11 11 11 11 11 11 11 11 11 11 11 11 11 11 0 0 11

.06231786 .47453826 .48496416 .77902541 .53028459 .80609256 .88378248 .15312962 .90788475 .834628 .3549188 .61216385 .79174163 .83288793 .34653541 .5918418 .82795863 .20497884 .53749413 .50569005 .50079654 .3367656 .00380824 .89680016 .1360347 .69414398 .38013169 .1684285 .31684049 .04945258 .63517814 .76923942 .76285548 .58721832 .91543622 .0366871 .29761236 .02741682 .53181705 .51958141 .60291944 .02950277 .32117461 .25505851 .30665354 .4827527 .56189396 .5073453 .62189489 .04573518 .83306196 .95508826 .8425308 .22936638 .07859847 .66571564 .34542704 .38745922 .98387702 .86916942 .74294057

.60146353 1.5588749 -.18722218 -.52048368 -.1661389 .46771013 .63588368 -.38212096 -1.286626 -.13891725 -.65583187 .0449998 -1.3466392 1.2581238 -.64545144 -1.1211371 -1.4299645 .50605129 -.51641011 -.63162137 -.36714636 -.58090701 -1.9394894 .88590003 -1.0620764 -.0589215 1.6410543 -1.1057832 .28738969 -.84357201 -.44529748 -1.5517598 .00949782 -1.2622061 .07091063 -1.312804 -.25571016 -1.0394488 -.1171302 -.44810463 .79294672 -.14219833 .22017125 .4592555 1.8549507 .39750888 -.13241135 -.2808839 .15450659 .28300012 -.13693423 1.0661567 .17461357 -1.4735188 -1.098752 -.4837278 -2.1701744 -.48917295 .88562335 -.00397613 .0916879

-4.9988038 .54277238 -3.8340121 -2.0633239 -4.3329232 .96578931 2.226928 1.8735343 -3.236961 -1.3035694 .44230726 -.70483647 3.7028283 -3.6604588 -.72763472 -1.6468374 .9285548 -4.9833893 -.71150266 1.7601965 -1.3482761 -3.9945315 3.5328471 -.47946425 1.1742915 1.8255025 .71103245 2.921645 -4.8851947 -3.0051272 3.4268863 1.7410977 2.1936293 1.1906233 -4.3739526 .08243607 -2.9577733 -2.044473 -3.4598432 .89557035 -1.8818473 1.1538704 -2.3138548 -1.0209359 1.4762511 4.675657 -.12640056 2.1322419 4.654006 -2.4122847 2.9582598 3.911096 -.02330345 3.7814739 -3.6369593 .50674996 -4.1409653 -.3742053 -4.0334277 -.19389177 -3.2145793

15 16 16 16 16 16 16 16 16 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 21 21 21 21 21 21 21 21 22 22 22 22 22 22 22 22 23 23 23 23

8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4


11 11 11 11 0 0 11 11 11 11 0 0 0 0 11 0 0 11 11 0 0 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 11 11 0 0 0 0 11 11 11 0 0 11 11 0 0

.27558712 .58397944 .5126295 .41536157 .48953521 .68019157 .3130354 .12564593 .11187084 .37782425 .00227859 .07346247 .18263109 .36099134 .57799378 .72386528 .98178964 .52735318 .9006656 .24406173 .25782954 .62259568 .98445967 .41295201 .17591862 .29881176 .67180651 .71226042 .67633982 .92177484 .6557825 .77172166 .26164491 .17978976 .00593496 .34522656 .81558473 .22084109 .22060169 .8652202 .51823612 .13201322 .7645131 .69539795 .2077739 .73578102 .19837309 .48287501 .20858556 .53156268 .2012225 .02760816 .16596069 .68825371 .84784981 .86893977 .16075925 .83635474 .04011063 .40751043 .8326156

.34792129 -.91380405 .08168254 -1.8501342 1.2301491 -.28588617 .07511148 .61939931 -.4230035 -.02957512 -.3834609 1.3034031 1.2907545 -.26646035 1.1626149 -.5290824 -.43350923 -.13539128 -.07789571 -.78731442 -.07930608 .66037158 -.37126614 -.74630656 .80575665 .44737735 2.0835332 .99170182 1.3345701 .31909905 -.60688805 -.50578954 -.97731943 -.89168252 .93206089 2.3692898 -1.2878287 1.3172106 .70023263 -.00889777 .53777187 -1.3715155 -.21982861 -1.8811427 -1.1611297 -.12590762 -.98063133 -.04722225 -.47648833 -.4926744 -.64922471 -1.5883851 -2.0535783 -.35866027 .27945389 .68770709 -1.8818703 -.69712701 -.09210309 .34407518 -.12476642

3.430073 -4.425422 1.6236404 -2.1072937 -2.9883699 -1.5320506 -.90140523 .14262375 2.4722555 -3.6807588 -1.0739004 .70664593 -3.3018123 3.957581 -.90230217 -.17979672 -.6303771 .24123824 -1.3711217 -4.9034452 -.5210331 4.5491875 -2.9832802 -3.561139 3.5713269 -.07935133 -4.1586076 4.3754299 -4.6777002 2.2397399 3.9729033 -.63316955 -.74951374 -4.995013 2.192063 4.5985832 -3.5325098 3.6624955 .47788956 -1.0808749 3.1110475 1.9826525 4.1855778 4.3522186 .77274462 -1.5178568 -4.8624581 .67162919 3.0968339 -4.8648589 2.5147342 3.5598681 -4.2191406 -1.9256883 4.9424842 2.6221845 -1.5203959 -1.9354771 2.8352676 -.77299453 -3.6740776

23 23 23 23 24 24 24 24 24 24 24 24 25 25 25 25 25 25 25 25 26 26 26 26 26 26 26 26 27 27 27 27 27 27 27 27 28 28 28 28 28 28 28 28 29 29 29 29 29 29 29 29 30 30 30 30 30 30 30 30 31

5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1


11 0 0 11 11 0 0 0 0 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 0 0 11 11 11 11 0 0 11 11 11 11 0 0 11 0 0 11 11 11 11 0 0 11 11 11 11 11 11 0 0 11 11

.68758874 .27736921 .4968114 .22005071 .23589676 .17549448 .19661888 .2569721 .66256802 .34392516 .24420482 .65872017 .23582646 .77706614 .5493553 .18284525 .33852097 .26992076 .98637592 .30449762 .96272796 .02418522 .92729395 .89722758 .72422163 .69992838 .62149437 .6631046 .74619942 .29762063 .58929351 .37908853 .19380906 .29698769 .22582572 .82906298 .32449638 .62007629 .39488068 .63217978 .33537316 .21462002 .71845174 .49786685 .58656109 .8677033 .85653838 .3687826 .65798499 .17775392 .91381994 .9393249 .64733838 .4630548 .53684018 .37143225 .60971577 .56700373 .95745034 .15581061 .40739452

1.2680465 -1.7895554 .19404024 -1.2299077 -1.3315594 -.54896919 -.74119281 -.73098404 -.33598341 -.02288244 .45729008 -.41720822 .84608032 -.16192432 -1.5005625 .75113585 -.05585918 -.43186582 .13750156 -.19142843 1.4228575 -.32378812 -.64441027 1.0411978 .64399618 .34498084 1.1166724 -.63930528 1.2856775 .2264596 1.3225808 -1.1701139 1.0166835 -.28090275 -.56469194 -1.0804181 -.66505316 1.6773657 .07470998 .48602902 .8967287 .35152923 -.68620416 2.0024899 .39099986 .02217869 3.0142648 -1.6364854 .69043478 1.6158611 -.17701318 -1.1346761 -.51795779 .74890016 -.77432798 1.3865299 -1.0785653 .08187213 .65348742 -.9164123 3.070278

2.212705 3.2003125 3.2960769 4.7701399 -1.7132597 3.9754857 -1.3661813 .58565707 -1.6845021 .66320471 -1.0345737 -1.6707589 4.4993701 -.30763881 2.9079465 -4.1398297 -.99301879 -3.0121238 -3.6957577 2.2392974 -4.6142312 -1.2151424 .19044551 2.4677298 2.3188249 -.99608916 4.7785935 -4.7249802 -3.0055967 -3.331311 -1.6281139 -4.3093914 -2.0458075 -1.2872821 -3.6555851 4.5924508 1.9937702 2.9524903 -1.3430214 4.2420256 .05658468 1.1532962 1.1975402 -1.7361389 3.2369623 -3.3658257 -4.5343759 2.6264919 3.2047573 -.190484 -.35774313 -2.3318768 .43073647 -.97820477 3.3561943 .35617686 .66651807 -4.3671494 3.067481 .51913779 .53683195

31 31 31 31 31 31 31 32 32 32 32 32 32 32 32 33 33 33 33 33 33 33 33 34 34 34 34 34 34 34 34 35 35 35 35 35 35 35 35 36 36 36 36 36 36 36 36 37 37 37 37 37 37 37 37 38 38 38 38 38 38

2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6


11 11 11 11 11 11 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 0 0 11 0 0 11 11 11 11 0 0 11 11 11 0 0 0 0 11 11 11 11 11 11 11 11

.56327384 .29046376 .53421342 .54962087 .95600868 .63389869 .64621045 .10232071 .78922367 .42552667 .41535373 .763188 .72860125 .13680069 .91027842 .54590047 .83071607 .17521875 .61075648 .01699386 .70993419 .06030223 .06520496 .86807731 .21507805 .76262093 .80968973 .25898625 .32008237 .64347544 .03699922 .78041924 .65758922 .34141978 .39119715 .29317256 .3642809 .24336802 .340014 .13895188 .53017192 .1484553 .03120775 .19390937 .93600489 .51341005 .60332375 .80361478 .83905361 .66134728 .14588421 .98844257 .55946653 .77506609 .47633007 .25587877 .31655058 .47828454 .5520271 .70153985 .08902159

-.3849021 .60229091 .14574362 -1.9203826 1.6891462 -.78329977 -.18761821 .45032416 1.2443122 .54744884 -.69373723 -.5946792 -.01476781 .16688668 .55757484 .17411157 -.73925464 -.46675134 -.8507672 -1.2641486 .3947986 1.7805091 -.96107412 -.10076391 1.5387058 .60498603 1.2913217 -1.3883636 -.36760313 .32773953 -1.8895956 .70382262 .88996758 -1.2497238 -.41529182 -.63752937 -.57384934 1.3593231 -.70663325 -.15480263 -2.3917064 -2.2279841 2.4427257 1.892197 -.38750505 -.01496864 -.52653447 .21759436 -.202704 .53206591 1.0827364 -.46330086 -.19740824 -1.332941 -1.1817738 -.81091915 1.0162053 .09232223 .73310513 1.1413866 -1.2584207

-.70206121 -3.7232115 2.0436923 2.2235067 3.6981974 3.0074837 3.798925 -4.937239 1.1670747 2.3350909 -3.1032691 -.5058964 -4.9182028 -4.1165028 -.8379931 -.21226072 -2.1004965 3.02218 1.6696662 4.4708352 -2.4209945 .54445033 -4.7515674 -4.7624085 -1.4352469 -.36165949 3.6508464 -2.6991054 .42779067 -1.753068 -2.1669865 -1.5137079 -.17061118 2.5472233 -2.1858799 .43308025 -1.2935386 4.0431023 2.7504995 .47988757 .32486999 -3.2921182 -3.2128021 -1.2066829 -2.369094 3.5563123 -2.6556237 4.0917264 -.1131667 .05741577 -2.3197276 -2.9275867 1.0410657 4.4338478 -.73263268 3.646419 .18708434 -2.7465362 -.7492126 1.1203905 4.7620969

38 38 39 39 39 39 39 39 39 39 40 40 40 40 40 40 40 40 41 41 41 41 41 41 41 41 42 42 42 42 42 42 42 42 43 43 43 43 43 43 43 43 44 44 44 44 44 44 44 44 45 45 45 45 45 45 45 45 46 46 46

7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3

© © 2003 2003 by by Chapman Chapman & & Hall/CRC Hall/CRC

11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 0 0 11 11 0 0 0 0 11 0 0 0 0 0 0 0 0 11 11 11 11 11 0 0 0 0 11 11 11 11 11 11 0 0 11 0 0 11 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

.94034435 .59914863 .56260686 .23018233 .04914308 .01188321 .38878313 .34094476 .67529112 .69555857 .19676667 .89083925 .02532814 .69018445 .2821059 .85792841 .73548534 .42788794 .08003791 .16347113 .83242888 .21015557 .8732753 .28980839 .71055052 .52896684 .12264861 .68375949 .16295932 .65459204 .93183955 .0145188 .9675157 .44257704 .13219481 .43382877 .80208312 .18421517 .54542541 .03521744 .70278961 .75151 .8386282 .83382061 .36430141 .84947008 .88426841 .33416062 .83010641 .02928144 .38704076 .85878778 .36038781 .88836044 .17073028 .84309061 .31751812 .26018423 .81664001 .18930089 .14386472

.41596148 1.1058064 2.1370153 .30045069 -.08990481 -1.2300802 -.56517104 -.96707895 .67396387 -.16860392 -1.2299737 .51698281 .87571896 -1.019677 -.05965368 -.12587525 -1.0268592 -.52243421 .94269499 -.18662671 .31693476 -.73428691 -2.0275832 .05248421 -.7547069 -1.1094163 .22211133 -1.7287251 -.88180576 .28438627 .74176345 -1.3564762 -1.833475 .90140632 -.0285642 2.3835479 1.5250206 .07678258 -1.2414406

-1.75639 -.3389013 .84971109 2.6490389 -.73016641 .49163107 .90074762 -.85493851 .92205232 .52395781 1.126852 -.02501493 .34433422 -.00653701 -.92589448 .48080668 -2.0786798 -.85146016 .80880252 2.3827927 -.18557107 1.055132

4.1898823 2.8142233 2.0741688 3.0684806 -.97440486 .45396775 1.5995643 -.15632532 -.01713385 .42847959 3.6654478 1.5202862 -.52499704 -.77943381 -3.4306597 -3.124772 -4.3759957 -.77225534 2.2493524 4.2082316 -1.1179924 1.9414704 -.95547237 2.5167067 -4.7631615 4.5933701 1.9060186 2.0987929 -1.7087712 -.98221578 2.3216357 -2.2650536 2.6520978 2.2830742 2.3297012 3.823715 -1.5980395 -.00703386 -3.0611441 -1.041176 2.0592166 -3.3548854 4.1333626 -2.0150525 .54000746 -2.1433894 -1.9446169 -.37553885 4.6679322 3.3548694 -3.9180342 -4.4539489 4.202635 -3.9760112 -4.058332 .61482275 3.1539063 .08828484 1.2865451 -1.107921 1.1477528

46 46 46 46 46 47 47 47 47 47 47 47 47 48 48 48 48 48 48 48 48 49 49 49 49 49 49 49 49 50 50 50 50 50 50 50 50

4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

0 1 0 1 1 1 1 1 1 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 1 1 1 0 1

.05711375 .25490037 .88611686 .93178333 .76665744 .54213497 .31808571 .15070568 .85729343 .59699909 .07067903 .77577306 .09566843 .4154052 .30757364 .14002295 .16449026 .04353584 .03160888 .87754801 .07777111 .55738669 .43946018 .86576495 .49062378 .89837667 .05956974 .1071656 .28174196 .48683077 .16975154 .80913531 .09923071 .51029362 .24087059 .37647759 .03292408

-1.8183438 1.2199273 -1.4254149 1.8158332 -.65296269 2.2005336 -1.1984512 -.42601789 1.1353603 -1.663961 .44522875 .04262007 1.591931 1.2803466 -.53293749 .21588168 .8598506 .72386736 -.66081645 .01414242 -1.2705579 -.80888332 .03326704 -.31548117 -.05854831 -.49643377 -.38045395 -.94569154 -.31755863 .41736385 .70538924 -1.3701565 .36496528 -.02965048 -.6967609 -2.0212218 -.74457299

-2.3616186 1.6594807 -3.1099571 2.2550158 4.6473839 .28687697 1.3526689 4.7049423 1.1759126 -3.2895913 -3.4154149 -4.0131043 -.65167378 3.7479917 3.5696961 2.73561 4.1874599 -.88665317 2.5444921 -2.0073164 1.2883756 -2.8039781 -2.9146663 4.343178 -1.9599561 -4.4098143 3.1175952 3.7714548 -3.0976467 4.0366273 1.677822 .97360708 2.2439137 1.7768015 -1.0320388 -.89256952 .56396776

5.2.5 Simulated user-specified correlated data

This dataset is used in Chapter 3 to illustrate calculation of a PA-GEE model with a user-specified correlation structure. The panel identifier is given by id; the repeated measures for the panel are given by t; the x1 covariate measures are random binary; the x2 covariate measures are random binary; and the continuous outcome y is generated such that

$$ y_{it} = x_{1it} + x_{2it} + 1 + u_{it} \tag{5.2} $$

for a marginal linear regression with theoretical panel correlation given by

$$ R = \begin{pmatrix}
1 & .6 & 0 & 0 & 0 & 0 & 0 & 0 \\
.6 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & .6 & 0 & 0 & 0 & 0 \\
0 & 0 & .6 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & .6 & 0 & 0 \\
0 & 0 & 0 & 0 & .6 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & .6 \\
0 & 0 & 0 & 0 & 0 & 0 & .6 & 1
\end{pmatrix} \tag{5.3} $$
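A simulation along the lines of (5.2) and (5.3) can be sketched in Python with NumPy. This is a minimal sketch and not the authors' program. The helper names, the unit-variance normal errors, the Bernoulli(0.5) covariates and the seed are all assumptions. The text states only that x1 and x2 are random binary and that the panel errors follow R.

```python
import numpy as np


def block_correlation(n_blocks=4, rho=0.6):
    """Block-diagonal correlation matrix as in (5.3): 2 x 2 blocks [[1, rho], [rho, 1]]."""
    block = np.array([[1.0, rho], [rho, 1.0]])
    return np.kron(np.eye(n_blocks), block)


def simulate_panels(n_panels=10, rho=0.6, seed=12345):
    """Simulate long-format data with y_it = x1_it + x2_it + 1 + u_it and u_i ~ N(0, R).

    Assumptions (not stated in the text): unit-variance normal errors and
    Bernoulli(0.5) binary covariates.
    """
    rng = np.random.default_rng(seed)
    R = block_correlation(4, rho)
    n_times = R.shape[0]
    chol = np.linalg.cholesky(R)                            # R = chol @ chol.T
    u = rng.standard_normal((n_panels, n_times)) @ chol.T   # each row has covariance R
    x1 = rng.integers(0, 2, size=(n_panels, n_times))       # random binary covariate
    x2 = rng.integers(0, 2, size=(n_panels, n_times))       # random binary covariate
    y = x1 + x2 + 1 + u                                     # equation (5.2)
    ids = np.repeat(np.arange(1, n_panels + 1), n_times)    # panel identifier id
    t = np.tile(np.arange(1, n_times + 1), n_panels)        # repeated-measure index t
    return ids, t, y.ravel(), x1.ravel(), x2.ravel(), u.ravel()
```

Each row of `u` is a standard normal vector multiplied by the transposed Cholesky factor. Because R = L L^T, each panel's error vector has covariance R. The Kronecker product builds the four 2 x 2 blocks of (5.3).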

The simulated data have panel errors u with actual correlation given by

$$ \begin{pmatrix}
1.00 & 0.57 & 0.07 & 0.00 & 0.02 & 0.00 & 0.01 & 0.02 \\
0.57 & 1.00 & 0.02 & 0.06 & -0.03 & 0.02 & 0.04 & 0.02 \\
0.07 & 0.02 & 1.00 & 0.63 & -0.04 & -0.02 & 0.05 & 0.10 \\
0.00 & 0.06 & 0.63 & 1.00 & 0.00 & 0.10 & -0.20 & 0.01 \\
0.02 & -0.03 & -0.04 & 0.00 & 1.00 & 0.62 & -0.03 & 0.05 \\
0.00 & 0.02 & -0.02 & 0.10 & 0.62 & 1.00 & -0.00 & 0.11 \\
0.01 & 0.04 & 0.05 & -0.20 & -0.03 & -0.00 & 1.00 & 0.58 \\
0.02 & 0.02 & 0.10 & 0.01 & 0.05 & 0.11 & 0.58 & 1.00
\end{pmatrix} \tag{5.4} $$
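The text does not say how (5.4) was computed. One plausible computation works as follows. Form the residuals u_it = y_it - (x_1it + x_2it + 1) implied by (5.2). Arrange them with one row per panel and one column per time point. Then take the sample correlation of the columns. The function name below is hypothetical, and the sketch assumes a balanced panel.

```python
import numpy as np


def panel_error_correlation(ids, t, resid):
    """Correlation across time points of panel residuals (one row per panel, one column per t)."""
    ids = np.asarray(ids)
    t = np.asarray(t)
    resid = np.asarray(resid, dtype=float)
    panels = np.unique(ids)
    times = np.unique(t)
    wide = np.full((panels.size, times.size), np.nan)
    # place each residual in its (panel, time) cell
    wide[np.searchsorted(panels, ids), np.searchsorted(times, t)] = resid
    if np.isnan(wide).any():
        raise ValueError("expected a balanced panel with one observation per id*t")
    return np.corrcoef(wide, rowvar=False)  # T x T sample correlation of the columns
```

For the data of this section, `resid` would be `y - (x1 + x2 + 1)`. With few panels, the individual entries of such an estimate can differ noticeably from R.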

This is a balanced panel dataset with 10 panels of 8 observations each. A description of the panel structure of the data is:

      id:  1, 2, ..., 10                                     n =         10
       t:  1, 2, ..., 8                                      T =          8
           Delta(t) = 1; (8-1)+1 = 8
           (id*t uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         8       8       8         8         8       8       8

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------
       10    100.00  100.00 |  11111111
 ---------------------------+----------
       10    100.00         |  XXXXXXXX
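The quantities in a panel description like the one above can be recomputed from the id and t variables. The sketch below covers n, T, Delta(t), the range of T_i and the participation patterns, with "." marking a missing time point. It is a rough analogue written for illustration and not Stata's implementation. The function name is hypothetical. Taking Delta(t) as the smallest gap between observed times is an assumption.

```python
from collections import Counter

import numpy as np


def describe_panels(ids, t):
    """Summarize panel structure: n, T, Delta(t), the T_i range, and participation patterns."""
    ids = np.asarray(ids)
    t = np.asarray(t)
    cells = set(zip(ids.tolist(), t.tolist()))
    if len(cells) != ids.size:
        raise ValueError("id*t does not uniquely identify each observation")
    times = np.unique(t)
    # assumption: Delta(t) is the smallest gap between distinct observed times
    delta = int(np.min(np.diff(times))) if times.size > 1 else 1
    grid = list(range(int(times.min()), int(times.max()) + 1, delta))
    panels = np.unique(ids).tolist()
    t_i = Counter(ids.tolist())  # number of observations in each panel
    patterns = Counter(
        "".join("1" if (p, g) in cells else "." for g in grid) for p in panels
    )
    return {
        "n": len(panels),
        "T": len(grid),  # (max - min) / Delta + 1
        "delta": delta,
        "T_i_min": min(t_i.values()),
        "T_i_max": max(t_i.values()),
        "patterns": dict(patterns),
    }
```

For the 80 observations of this dataset, the function would report n = 10, T = 8, Delta = 1, and the single pattern 11111111 shared by all 10 panels.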

A summary of the variables is:

Variable         |      Mean   Std. Dev.         Min         Max |    Observations
-----------------+-----------------------------------------------+----------------
y        overall |  2.184972    1.304209   -.9349678    4.802304 |     N =      80
         between |               .464511    1.347499    2.932273 |     n =      10
         within  |              1.226503   -.8024338    4.276666 |     T =       8
                 |                                               |
x1       overall |     .3625    .4837551           0           1 |     N =      80
         between |              .1094494        .125          .5 |     n =      10
         within  |               .472336      -.1375      1.2375 |     T =       8
                 |                                               |
x2       overall |     .7625     .428236           0           1 |     N =      80
         between |              .1608355          .5           1 |     n =      10
         within  |              .3997626      -.1125      1.2625 |     T =       8

The data are:

  id   t           y   x1   x2
   1   1   -.9349678    0    0
   1   2    3.023944    1    1
   1   3    .7980191    0    0
   1   4    1.952786    0    1
   1   5    2.501896    1    1
   1   6    1.749945    0    0
   1   7    3.424954    0    1
   1   8    3.902931    1    1
   2   1    1.056602    1    0
   2   2    1.068845    0    1
   2   3    -.709373    0    0
   2   4   -.1086365    0    0
   2   5    3.485659    0    1
   2   6    3.874307    1    1
   2   7    3.175718    1    1
   2   8    3.673608    1    1
   3   1    .1077769    0    1
   3   2    1.806073    1    1
   3   3    .7593862    0    1
   3   4    .6223395    0    0
   3   5    2.293534    0    1
   3   6     3.23475    1    1
   3   7    1.085158    1    1
   3   8    .8709748    0    0
   4   1    2.686399    0    1
   4   2    1.172582    0    0
   4   3    2.183243    0    1
   4   4     .706732    0    1
   4   5    .6680957    0    0
   4   6    2.941916    1    1
   4   7    3.103627    1    1
   4   8    1.280557    0    1
   5   1    2.400463    0    1
   5   2    2.498784    0    1
   5   3    2.436319    0    1
   5   4    1.758729    1    0
   5   5    3.310097    0    1
   5   6    4.549758    1    1
   5   7    .9309282    1    0
   5   8    2.746981    0    1
   6   1    2.883902    0    1
   6   2    3.710844    0    1
   6   3    3.233667    1    1
   6   4    3.014215    0    1
   6   5    3.004494    1    1
   6   6    2.191473    0    1
   6   7    1.262497    0    1
   6   8    4.157091    1    1
   7   1    2.063806    1    1
   7   2    .5553885    0    1
   7   3    3.634825    1    1
   7   4    3.375006    1    1
   7   5     1.18627    0    1
   7   6     3.41769    1    1
   7   7     .068189    0    1
   7   8    2.234942    0    1
   8   1    2.310676    1    1
   8   2    .7103744    0    1
   8   3    3.370353    0    1
   8   4    2.050668    0    1
   8   5    2.348818    0    1
   8   6     1.36899    0    0
   8   7    4.722698    1    1
   8   8    4.802304    1    0
   9   1    .7837571    0    1
   9   2    2.188972    1    1
   9   3    4.319549    0    1
   9   4    2.956809    0    0
   9   5    3.090703    0    1
   9   6    2.731381    0    1
   9   7    1.131808    0    1
   9   8    1.467309    0    1
  10   1    3.704787    1    1
  10   2    .1794717    0    0
  10   3    1.384011    0    1
  10   4    3.720814    1    1
  10   5     3.17122    1    1
  10   6    1.828142    0    0
  10   7    .9966773    0    0
  10   8     1.37176    0    0
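The overall, between and within rows of the summary tables in this appendix follow a standard decomposition. Between statistics are computed from the panel means x̄_i. Within statistics are computed from x_it - x̄_i + x̄, where the grand mean is added back so that the within values stay on the original scale. A small NumPy sketch is given below. The function name is hypothetical, and it is not Stata's own code.

```python
import numpy as np


def panel_summary(ids, x):
    """Overall, between, and within (sd, min, max) for one variable in long format."""
    ids = np.asarray(ids)
    x = np.asarray(x, dtype=float)
    _, inv = np.unique(ids, return_inverse=True)
    means = np.bincount(inv, weights=x) / np.bincount(inv)  # panel means xbar_i
    grand = x.mean()
    within = x - means[inv] + grand                          # x_it - xbar_i + xbar
    return {
        "mean": grand,
        "overall": (x.std(ddof=1), x.min(), x.max()),
        "between": (means.std(ddof=1), means.min(), means.max()),
        "within": (within.std(ddof=1), within.min(), within.max()),
    }
```

As a check, suppose x1 of this section is rebuilt from its number of ones in each panel (3, 4, 3, 2, 3, 3, 4, 3, 1, 3). This reproduces the reported between standard deviation .1094494, the within standard deviation .472336 and the within range -.1375 to 1.2375. Those statistics do not depend on the order of observations within a panel.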

5.2.6 Simulated measurement error data for the PA-GEE

In order to illustrate techniques for constructing the sandwich estimate of variance for a two-step estimator, we simulated data for a linear regression model given by

$$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} \tag{5.5} $$

However, x3 is unobserved. Instead, we have w, which is equal to the unobserved variable plus error, and an instrumental variable s. A description of the panel structure of the data is:
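The text does not spell out the two-step estimator at this point. Below is a minimal sketch of one common choice, two-stage least squares. It assumes that the first step regresses w on x1, x2 and the instrument s, and that the second step replaces the unobserved x3 by the first-step fitted values. The function names are hypothetical. The second-step coefficients are consistent for β3 when s is correlated with x3 but not with the measurement error. However, the ordinary standard errors from the second regression ignore the estimation in the first step. Accounting for that step is what a sandwich estimate of variance for a two-step estimator is for.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_step_iv(y, x1, x2, w, s):
    """Two-step estimate of (beta0, beta1, beta2, beta3) when x3 is seen only through w.

    Step 1: regress the error-prone w on (1, x1, x2, s) and keep the fitted values.
    Step 2: regress y on (1, x1, x2, fitted w).
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), x1, x2, s])
    w_hat = Z @ ols(Z, w)                        # first-step fitted values
    X = np.column_stack([np.ones(n), x1, x2, w_hat])
    return ols(X, y)                             # second-step coefficients
```

As an illustration, draw a large sample with w = x3 + noise, an instrument s = x3 + independent noise, and y from (5.5) plus error. Regressing y directly on w attenuates the coefficient of w toward zero, while `two_step_iv` recovers β3 approximately.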

      id:  1, 2, ..., 10                                     n =         10
       t:  1, 2, ..., 4                                      T =          4
           Delta(t) = 1; (4-1)+1 = 4
           (id*t uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         4       4       4         4         4       4       4

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+--------
       10    100.00  100.00 |  1111
 ---------------------------+--------
       10    100.00         |  XXXX

A summary of the variables is:

Variable         |      Mean   Std. Dev.         Min         Max |    Observations
-----------------+-----------------------------------------------+----------------
y        overall |  21.97299     10.0514    4.094059     41.2298 |     N =      40
         between |              5.700181    15.44533    31.22504 |     n =      10
         within  |              8.428398    4.086007    39.39337 |     T =       4
                 |                                               |
x1       overall |     2.975    1.329883           1           5 |     N =      40
         between |              .7115125           2        4.25 |     n =      10
         within  |              1.140738        .725       5.725 |     T =       4
                 |                                               |
x2       overall |       5.4    3.248668           1          10 |     N =      40
         between |              2.038518        2.75        7.75 |     n =      10
         within  |              2.591901        -.85         9.9 |     T =       4
                 |                                               |
w        overall | -.2024376    1.154352    -2.57972    2.093424 |     N =      40
         between |              .4802081   -1.085854    .4280724 |     n =      10
         within  |              1.058143   -2.665788    1.706574 |     T =       4
                 |                                               |
s        overall | -.2481142    .9689245   -2.152968    1.875231 |     N =      40
         between |               .505154   -1.220872    .2404079 |     n =      10
         within  |              .8386081    -2.64149    1.386709 |     T =       4

The data are:

  id           y   x1   x2            w            s
   1   31.744459    4    9   -1.2081616   -1.0708159
   1   26.469884    4    4    1.4180836    1.0828664
   1   33.265919    1   10   -.77037395    -.1847518
   1   33.419892    4    8     .4144579    .23857762
   2   29.936009    3    9   -1.1150387   -.75967723
   2   37.912047    4    9    .57584045    .59421399
   2   18.311498    2    3    1.3539961    1.1312023
   2   27.690159    4    7   -.78054902   -.79222115
   3   14.424474    2    2    .75987514    .54007814
   3   30.411879    1    6    1.7559449    1.8752313
   3   4.0940588    3    2   -2.0352778   -2.1529676
   3   18.323714    5    1    1.2317474    .69928968
   4   29.770692    3    9   -1.6566934   -.81543436
   4   34.045546    4    9   -.81998911   -.43137562
   4   41.229802    4   10    1.1209088    .45670796
   4   11.166042    2    1    2.0934245    1.5466827
   5   6.8228099    3    3   -1.6398477   -2.1046986
   5   16.084249    2    4    .62104332    .03954247
   5   31.451515    3   10   -1.1174939   -1.1654088
   5   24.366558    3    7   -.51899332   -1.0290361
   6   12.571442    1    5   -.88550458   -1.2184117
   6   30.272789    1   10   -.53659636   -.44657953
   6   18.958487    2    7   -1.9905849   -1.3242139
   6   29.157948    4    9   -.93072945   -1.8942821
   7   8.1158695    2    1   -1.1448774   -.01908745
   7   26.118726    3    9   -2.5797199   -1.8719163
   7    11.07891    2    2    .55967331    .15424293
   7   38.331683    2   10    .71729918    .67129126
   8   18.609577    5    3   -1.1565446    -.5663126
   8   22.598339    2    6    .14885816   -.23900719
   8   16.840708    5    2   -.50131764    .08000681
   8   16.598596    5    3   -.15275318   -.59266891
   9   9.4027453    4    2   -.71003654   -1.2672068
   9   11.152214    2    2    .81480424    .22284953
   9    13.69361    4    1    1.1922808    .54590719
   9   32.695215    5    7   -.13302718    .05994656
  10   12.419609    1    4   -.05218049    -.3663847
  10    16.27943    1    5   -.82370427   -.28959267
  10    25.63903    5    4    1.3528266    .82080319
  10   7.4432654    2    1   -.96857179   -.08195498

References

Belsley, D. A., Kuh, E., and Welsch, R. E. 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley.
Bieler, G. and Williams, R. 1997. Analyzing Repeated Measures and Cluster-Correlated Data using SUDAAN, Release 7.5. Research Triangle Park, NC: Research Triangle Institute.
Billingsley, P. 1986. Probability and Measure (Second edition). New York: Wiley.
Binder, D. A. 1983. On the variances of asymptotically normal estimators from complex surveys. International Statistical Review 51: 279-292.
___. 1992. Fitting Cox's proportional hazards models from survey data. Biometrika 79(1): 139-147.
Bland, J. M. and Altman, D. G. 1986. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet I: 307-310.
Carey, V. J., Zeger, S. L., and Diggle, P. J. 1993. Modelling multivariate binary data with alternating logistic regressions. Biometrika 80: 517-526.
Carroll, R. J. and Kauermann, G. to appear. The sandwich variance estimator: Efficiency properties and coverage probability of confidence intervals. Journal of the American Statistical Association.
Carroll, R. J. and Pederson, S. 1993. On robustness in the logistic regression model. Journal of the Royal Statistical Society - Series B 55: 693-706.
Chang, Y.-C. 2000. Residuals analysis of the generalized linear models for longitudinal data. Statistics in Medicine 19: 1277-1293.
Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society - Series B 39: 1-38.
Diggle, P. J. and Kenward, M. G. 1994. Informative dropout in longitudinal data analysis (with discussion). Applied Statistics 43: 49-94.
Diggle, P. J., Liang, K.-Y., and Zeger, S. L. 1994. Analysis of Longitudinal Data. Oxford OX2 6DP: Oxford University Press.
Fanurik, D., Zeltzer, L. K., Roberts, M. C., and Blount, R. L. 1993. The relationship between children's coping styles and psychological interventions for cold pressor pain. Pain 53: 213-222.
Feiveson, A. H. 1999. What is the delta method and how is it used to estimate the standard error of a transformed parameter? http://www.stata.com/support/faqs/stat/deltam.html.
Fitzmaurice, G. M., Laird, N. M., and Lipsitz, S. R. 1994. Analysing incomplete longitudinal binary responses: A likelihood-based approach. Biometrics 50: 601-612.
Gill, P. E., Murray, W., and Wright, M. H. 1981. Practical Optimization. New York: Academic Press.
Gould, W. and Sribney, W. 1999. Maximum Likelihood Estimation with Stata. College Station, TX: Stata Press.
Gourieroux, C. and Monfort, A. 1993. Pseudo-likelihood methods. In G. Maddala, C. Rao, and H. Vinod (eds.), Handbook of Statistics, Vol. 11.
Greene, W. 2000. Econometric Analysis (Fourth edition). Upper Saddle River, New Jersey: Prentice Hall.
Hall, D. B. 2001. On the application of extended quasilikelihood to the clustered data case. The Canadian Journal of Statistics 29(2): 1-22.
Hall, D. B. and Severini, T. A. 1998. Extended generalized estimating equations for clustered data. Journal of the American Statistical Association 93: 1365-1375.
Hardin, J. W. and Hilbe, J. M. 2001. Generalized Linear Models and Extensions. College Station: Stata Press.
Haslett, J. 1999. A simple derivation of deletion diagnostic results for the generalized linear model with correlated errors. Journal of the Royal Statistical Society - Series B 61(3): 603-609.
Heyting, A., Tolboom, J. T. B. M., and Essers, J. G. A. 1992. Statistical handling of drop-outs in longitudinal clinical trials. Statistics in Medicine 11: 2043-2062.
Hilbe, J. M. 1993a. Generalized linear models. Stata Technical Bulletin 11: 20-28.
___. 1993b. Log negative binomial regression as a generalized linear model. Graduate College Committee on Statistics.
___. 1994a. Generalized linear models. The American Statistician 48(3): 255-265.
___. 1994b. Log negative binomial regression using the GENMOD procedure in SAS/STAT software. SUGI pp. 1199-1204.
___. 1994c. Negative binomial regression. Stata Technical Bulletin 18: 2-5.
Horton, N. J., Bebchuk, J. D., Jones, C. L., Lipsitz, S. R., Catalano, P. J., Zahner, G. E. P., and Fitzmaurice, G. M. 1999. Goodness-of-fit for GEE: An example with mental health service utilization. Statistics in Medicine 18: 213-222.
Horton, N. J. and Lipsitz, S. R. 1999. Review of software to fit generalized estimating equation regression models. The American Statistician 53: 160-169.
Hosmer Jr., D. W. and Lemeshow, S. 1980. Goodness-of-fit tests for the multiple logistic regression model. Communications in Statistics A9: 1043-1069.
Huber, P. J. 1967. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 221-233, Berkeley, CA. University of California Press.
Karim, M. R. and Zeger, S. L. 1989. A SAS macro for longitudinal data analysis. Department of Biostatistics, The Johns Hopkins University: Technical Report 674.
Lee, A., Scott, A., and Soo, S. 1993. Comparing Liang-Zeger estimates with maximum likelihood in bivariate logistic regression. Statistical Computation and Simulation 44: 133-148.
Lesaffre, E. and Spiessens, B. 2001. On the effect of the number of quadrature points in a logistic random-effects model: An example. Applied Statistics 50: 325-335.
Liang, K.-Y. and Zeger, S. L. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73: 13-22.
Lin, D. Y. and Wei, L. J. 1989. The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association 84(408): 1074-1078.
Lin, L. I.-K. 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45: 255-268.
Lindsey, J. K. 1997. Applying Generalized Linear Models. Berlin: Springer-Verlag.
Lipsitz, S. R., Fitzmaurice, G. M., Orav, E. J., and Laird, N. M. 1994. Performance of generalized estimating equations in practical situations. Biometrics 50: 270-278.
Lipsitz, S. R., Laird, N. M., and Harrington, D. P. 1992. A three-stage estimator for studies with repeated and possibly missing binary outcomes. Applied Statistics 41(1): 203-213.
Little, R. J. A. 1988. A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association 83(404): 1198-1202.
___. 1995. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 90(431): 1112-1121.
Little, R. J. A. and Rubin, D. B. 1987. Statistical Analysis with Missing Data. New York: Wiley.
McCullagh, P. and Nelder, J. A. 1989. Generalized Linear Models (Second edition). London: Chapman & Hall.
McKusick, L., Coates, T. J., Morin, S. F., Pollack, L., and Hoff, C. 1990. Longitudinal predictors of reductions in protected anal intercourse among gay men in San Francisco: The AIDS Behavioral Research Project.
American American Journal Journal of of Public Public Francisco: Research Project. 80:: 978-983. Health 80 Health 978-983. Nelder, D. 1987 Neider, J. J. A. A. and and Pregibon, Pregibon, D. 1987.. An An extended extended quasi-likelihood quasi-likelihood function. function . Biometrika 74 74:: 221-232. 221-232. Biometrika Neider, J. J. A. A. and Wedderburn, R. R. W. W. M. M. 1972 1972.. Generalized linear models models.. Journal Journal Nelder, and Wedderburn, Generalized linear of the the Royal Royal Statistical Statistical Society Society -- Series Series A A 135(3) 135(3):: 370-384. 370-384. of Neuhaus, J. J. M M.. 1992. Statistical methods methods for for longitudinal longitudinal and and clustered clustered designs designs with with Neuhaus, 1992 . Statistical binary responses. responses. Statistical Statistical Methods Methods in in Medical Medical Research Research 11:: 249-273. 249-273. binary Pan, W. 2001a. 2001a. Akaike's Akaike's information information criterion criterion in in generalized generalized estimating estimating equations. Pan, W. equations. Biometrics 57: 57: 120-125. 120-125. Biometrics ___ .. 2001b. 2001b. On On the the robust robust variance variance estimator estimator in in generalised generalised estimating estimating equaequations. Biometrika Biometrika 88(3): 88(3): 901-906. 901-906. tions. Preisser, J. SS.. and and Qaqish, Qaqish, B B.. F. F. 1996 1996.. Deletion Deletion diagnostics diagnostics for for generalized estimating Preisser, J. generalized estimating equations. Biometrika 83 83:: 551-562. 551-562. equations. Biometrika ___ .. 1999 1999.. Robust Robust regression regression for for clustered clustered data data with with application application to to binary binary rereBiometrics 55 55:: 574-579. 574-579. sponses. sponses. Biometrics Prentice, R. L. L. and and Zhao, Zhao, L. L. P. P. 1991 1991.. Estimating Estimating equations equations for for parameters parameters in in means means Prentice, R. 
and covariances of of multivariate multivariate discrete discrete and and continuous continuous responses. responses. Biometrics Biometrics and covariances 47:: 825-839 825-839.. 47 Rabe-Hesketh, Skrondal, A., A., and and Pickles, Pickles, A. A. 2002 2002.. Reliable Reliable estimation estimation of of genergenerRabe-Hesketh, SS.,., Skrondal, alized mixed models models using adaptive quadrature quadrature.. The Stata Journal Journal 1(3) 1(3).. alized linear linear mixed using adaptive The Stata Robins, J. M., M., Rotnitzky, Rotnitzky, A A.. G G.,., and and Zhao, Zhao, L. L. P. P. 1995 1995.. Analysis Analysis of of semiparametric Robins, J. semiparametric regression models models for for repeated outcomes in in the the presence presence of of missing missing data. data. Journal Journal regression repeated outcomes of the the American American Statistical Statistical Association Association 90(429) 90(429):: 106-121. 106-121. of Rotnitzky, A. Rotnitzky, A. and and Jewell, Jewell, N. N. P. P. 1990 1990.. Hypothesis Hypothesis testing testing of of regression regression parameters parameters in in semiparametric generalized linear linear models models for for cluster correlated data data.. Biometrika Biometrika semiparametric generalized cluster correlated 77(3):: 485-497. 485-497. 77(3) Rotnitzky, A. and and Robins, Robins, J. J. M. Semiparametric regression regression estimation estimation in in the the Rotnitzky, A. M. 1995. 1995 . Semiparametric presence of of dependent dependent censoring. Biometrika 82(4): 82(4): 805-820. presence censoring. Biometrika 805-820. Rotnitzky, A. G. G. and and Wypij, Wypij, D. D. 1994 1994.. A A note note on on the the bias bias of of estimators with missing missing Rotnitzky, A. estimators with data.. Biometrics Biometrics 50 50:: 1163-1170. 1163-1170. data Rubin, D. B. B. 1976 1976.. Inferrence Inferrence and and missing missing data. Biometrika 63: 581-592. Rubin, D. data . Biometrika 63 : 581-592. SAS Institute, Institute, Inc. 
Inc... 2000 2000.. SAS/STAT SAS/STAT User's Cary, NC NC:: SAS SAS Institute Institute Inc. Inc. SAS User's Guide. Guide . Cary,

© © 2003 2003 by by Chapman Chapman & & Hall/CRC Hall/CRC

218 21 8

REFERENCES REFERENCES

Shah, B B.. V., V., Barnwell, Barnwell, B. B. G., G., and and Bieler, Bieler, G. G. S. S. 1997 1997.. SUDAAN SUDAAN User's User's Manual, Manual, Shah, 7.5.. Research Research Triangle Triangle Park, Park, NC NC:: Research Research Triangle Triangle Institute. Institute. Release 7.5 Release Shih, W. W. J. J. 1992 1992.. On On informative informative and and random random dropouts dropouts in in longitudinal longitudinal studies. studies. Shih, Biometrics: 970-971 970-971.. Biometrics: Sribney, W. W. M. M. 1999 1999.. What What is is the the difference difference between between random-effects and populationpopulationSribney, random-effects and averaged http://www.stata.com/support/faqs/stat/repa.html. averaged estimators? estimators? http : //www.stata.com/support/fags/stat/repa.htm l. Sutradhar, B. B. C. and Das, Das, K. K. 1999 1999.. On On the the efficiency efficiency of regression estimators estimators in in Sutradhar, C. and of regression generalised linear linear models models for for longitudinal longitudinal data. Biometrika 86(2) 86(2):: 459-465 459-465.. generalised data . Biometrika Thall, P. F F.. and and Vail, Vail, SS.. C. C. 1990 1990.. Some Some covariance covariance models models for for longitudinal longitudinal count count Thall, P. data with overdispersion overdispersion.. Biometrics Biometrics 46 46:: 657-671 657-671.. data with Wacholder, S. S. 1986 1986.. Binomial Binomial regression regression in in GLIM GLIM:: Estimating risk ratios ratios and and risk risk Wacholder, Estimating risk American Journal Journal of of Epidemiology Epidemiology 123(1) 123(1):: 174-184. differences. differences . American 174-184. J. H., H., Docker Docker III, III, S. S. A., A., Speizer, Speizer, F. F. E., E., and and Ferris Ferris Jr Jr.,., B. B. G. 1984.. PasPasWare, J. Ware, G. 1984 sive smoking, gas gas cooking, cooking, and and respiratory respiratory health health of children living living in in six cities.. 
sive smoking, of children six cities American Review Review of of Respiratory Respiratory Diseases Diseases 29 29:: 366-374. 366-374. American Wedderburn, R. W. M. M. 1974. 1974. Quasi-likelihood Quasi-likelihood functions, functions, generalized generalized linear linear models, models, Wedderburn, R. W. and the Gauss-Newton Gauss-Newton method method.. Biometrika Biometrika 61(3): 61(3): 439-447. 439-447. and the S. L. L. and and Karim, Karim, M M.. R. R. 1991 1991.. Generalized Generalized linear linear models models with with random random effects, effects; Zeger, S. Zeger, Gibbs sampling sampling approach approach.. Journal Journal of of the American Statistical Statistical Association Association aa Gibbs the American 86(413): 86(413) : 79-86. 79-86 . P. and and Prentice, Prentice, R. R. L. L. 1990. binary regression regression using using aa quadratic quadratic Zhao, L. L. P. Zhao, 1990 . Correlated Correlated binary exponential model exponential model.. Biometrika Biometrika 77: 642-648.. 77 : 642-648 Zheng, B. B. 2000 2000.. Summarizing Summarizing the the goodness goodness of of fit fit of of generalized generalized linear linear models models for for Zheng, longitudinal data data.. Statistics Statistics in in Medicine Medicine 19 19:: 1265-1275. longitudinal 1265-1275 . Kastner, C., Gromping, V., and Blettner, Blettner, M. M. 1996. The generalized generalized estiestiZiegler, A., A., Kastner, Ziegler, C., Gromping, U., and 1996 . The mating equations equations in in the the past past ten ten years: years: An An overview overview and and aa biomedical biomedical application application.. mating ftp: ftp.stat .uni-muenchen .de/pub/sfb386/paper24 .ps.Z. ftp: // / /ftp.stat.uni-muenchen.de/pub/sfb386/paper24.ps.Z.
