Microeconometrics: Methods and Applications (Solution Manual)


Author: A. Colin Cameron | Pravin K. Trivedi


BOOK PREFACE

This book provides a detailed treatment of microeconometric analysis, the analysis of individual-level data on the economic behavior of individuals or firms. This usually entails regression methods applied to cross-section and panel data. The book aims to provide the practitioner with comprehensive coverage of statistical methods and their application in modern applied microeconometrics research. These methods include nonlinear modelling, inference under minimal distributional assumptions, identifying and measuring causation rather than mere association, and correcting for departures from simple random sampling. Many of these features are relevant to individual-level data analysis throughout the social sciences. This ambitious agenda has determined the characteristics of this book. First, although oriented to the practitioner, the book is relatively advanced in places. A cookbook approach is inadequate: when two or more complications occur simultaneously, a common situation, the practitioner must know enough to adapt available methods. Second, the book provides considerable coverage of practical data problems (see especially the last three chapters). Third, the book includes substantial empirical examples in many chapters to illustrate some of the methods covered. Finally, the book is unusually long. Despite this length we have been space-constrained. We had intended to include even more empirical examples, and abbreviated presentations will at times fail to recognize the accomplishments of researchers who have made substantive contributions. The book assumes a basic understanding of the linear regression model with matrix algebra. It is written at the mathematical level of the first-year economics Ph.D. sequence, comparable to Greene (2003). We have two types of readers in mind.
First, the book can be used as a course text for a microeconometrics course, typically taught in the second year of the Ph.D., or for data-oriented microeconomics field courses such as labor economics, public economics and industrial organization. Second, the book can be used as a reference work for graduate students and applied researchers who, despite training in microeconometrics, will inevitably have gaps that they wish to fill. For instructors using this book as an econometrics course text, it is best to introduce the basic nonlinear cross-section and linear panel data models as early as possible, initially skipping many of the methods chapters. The key methods chapter (chapter 5) covers maximum likelihood and nonlinear least squares estimation. ML and NLS provide adequate background for the most commonly used nonlinear cross-section models (chapters 14-17, 20), basic linear panel data models (chapter 21) and treatment evaluation methods (chapter 25). Generalized method of moments estimation (chapter 6) is needed especially for advanced linear panel data methods (chapter 22). For readers using this book as a reference work, the chapters have been written to be as self-contained as possible. The notable exception is that some command of the general estimation results in chapter 5, and occasionally chapter 6, will be necessary. Most chapters on models are structured to begin with a discussion and example that is accessible to a wide audience. The website www.econ.ucdavis.edu/faculty/cameron/mmabook provides all the data and computer programs used in this book, and related materials useful for instructional purposes. This project has been long and arduous, and at times seemingly without an end. Its completion has been greatly aided by our colleagues, friends, and graduate students.
We would like to thank especially the following for reading and commenting on specific chapters: Bijan Borah, Kurt Brännäs, Pian Chen, Tim Cogley, Partha Deb, David Drukker, Massimiliano De Santis, Jeff Gill, Tue Gorgens, Shiferaw Gurmu, Lu Ji, Oscar Jorda, Roger Koenker, Chenghui Li, Tong Li, Doug Miller, Murat Munkin, Jim Prieger, Ahmed Rahmen, Sunil Sapra, Haruki Seitani, Yacheng Sun, Xiaoyong Zheng, and David Zimmer. We thank Rajeev Dehejia, Bronwyn Hall, Cathy Kling, Jeffrey Kling, Will Manning, Brian McCall and Jim Ziliak for making their data available for empirical illustrations. We thank our respective departments for facilitating our collaboration, and for the production and distribution of the draft manuscript at various stages. We benefited from the comments of two anonymous reviewers. Guidance, advice and encouragement from our CUP editor, Scott Parris, have been invaluable. Our interest in econometrics owes much to the training and environments we encountered as students and in the initial stages of our academic careers. The first author thanks The Australian National University, Stanford University (especially Takeshi Amemiya and Tom MaCurdy), and The Ohio State University. The second author thanks the London School of Economics and The Australian National University. Our interest in writing a book oriented to the practitioner owes much to our exposure to the research of graduate students and colleagues at our respective institutions, UC-Davis and IU-Bloomington. Finally, we would like to thank our families for their patience and understanding, without which completion of this project would not have been possible.

A. Colin Cameron
Davis, California

Pravin K. Trivedi
Bloomington, Indiana


TABLE OF CONTENTS

I: PRELIMINARIES
1. Overview
2. Causal and Noncausal Models
3. Microeconomic Data Structures

II: CORE METHODS
4. Linear Models
5. ML and NLS Estimation
6. GMM and Systems Estimation
7. Hypothesis Tests
8. Specification Tests and Model Selection
9. Semiparametric Methods
10. Numerical Optimization

III: SIMULATION-BASED METHODS
11. Bootstrap Methods
12. Simulation-based Methods
13. Bayesian Methods

IV: CROSS-SECTION DATA MODELS
14. Binary Outcome Models
15. Multinomial Models
16. Tobit and Selection Models
17. Transition Data: Survival Analysis
18. Mixture Models and Unobserved Heterogeneity
19. Models of Multiple Hazards
20. Count Data Models

V: PANEL DATA MODELS
21. Linear Panel Models: Basics
22. Linear Panel Models: Extensions
23. Nonlinear Panel Models

VI: FURTHER TOPICS
24. Stratified and Clustered Samples
25. Treatment Evaluation
26. Measurement Error Models
27. Missing Data and Imputation

APPENDICES
A. Asymptotic Theory
B. Making Pseudo-Random Draws

PART 1 (chapters 1-3)

Part 1 covers the essential components of microeconometric analysis: an economic specification, a statistical model and a data set. Chapter 1 discusses the distinctive aspects of microeconometrics and provides an outline of the book. It emphasizes that discreteness of data and the nonlinearity and heterogeneity of behavioral relationships are key aspects of disaggregated microeconometric models. It concludes by presenting the notation and conventions used throughout the book. Chapters 2 and 3 set the scene for the remainder of the book by introducing the reader to key model and data concepts that shape the analyses of later chapters. A key distinction in econometrics is between, on the one hand, essentially descriptive models and data summaries at various levels of statistical sophistication and, on the other, models that go beyond associations and attempt to estimate causal parameters. The classic definitions of causality in econometrics derive from the Cowles Commission simultaneous equations models, which draw sharp distinctions between exogenous and endogenous variables, and between structural and reduced-form parameters. Although reduced-form models are very useful for prediction, knowledge of structural or causal parameters is essential for policy analyses. Identification of structural parameters within the simultaneous equations framework poses numerous conceptual and practical difficulties. An alternative approach, based on the potential outcome model, also attempts to identify causal parameters, but it does so by posing limited questions within a more manageable framework. Chapter 2 attempts to provide an overview of the fundamental issues that arise in these alternative frameworks. Readers who initially find this material challenging should return to this chapter after gaining greater familiarity with the specific models covered later in the book.
The empirical researcher’s ability to identify causal parameters depends not only on the statistical tools and models but also on the type of data available. An experimental framework provides a standard for establishing causal connections. However, observational, not experimental, data form the basis of much of econometric inference. Chapter 3 surveys the pros and cons of three main types of data available: observational data, data from social experiments, and those from natural experiments. The potential as well as the difficulties of conducting causal inference based on each type of data are reviewed.

PART 2 (chapters 4-10)

Part 2 presents the core methods of estimation and inference (least squares, method of moments, and maximum likelihood) in the nonlinear regression models that are central in microeconometrics. Both traditional topics and more modern topics such as quantile regression, sequential estimation, empirical likelihood, the bootstrap, and semi- and nonparametric regression are covered. In general the discussion is at a level intended to provide enough background and detail to enable the practitioner to read and comprehend articles in the leading econometrics journals. We presume prior familiarity with linear regression analysis. Chapter 4 begins with the linear regression model. It then covers, at an introductory level, quantile regression, which models distributional features other than the conditional mean. It provides a lengthy expository treatment of instrumental variables estimation, a major semiparametric method of causal inference. Chapter 5 presents the most commonly used estimation methods for nonlinear models, beginning with the quite general topic of M-estimation before specializing to maximum likelihood and nonlinear least squares regression. Chapter 6 provides a comprehensive treatment of the generalized method of moments, a quite general estimation framework applicable in linear and nonlinear, and single- and multi-equation, settings. The chapter emphasizes the special case of instrumental variables estimation. Chapter 7 covers both the classical and bootstrap approaches to hypothesis testing, while Chapter 8 presents relatively more modern methods of model selection and specification analysis. Because of their importance, bootstrap methods also get a more detailed stand-alone treatment in Chapter 11. As much as possible, testing methods are presented in a unified manner in these chapters, but specific applications occur throughout the book. Chapter 9 is a stand-alone chapter that presents nonparametric and semiparametric estimation methods that place a flexible structure on the econometric model. Chapter 10 presents the computational methods used to compute the nonlinear estimators presented in chapters 5 and 6. This material becomes especially relevant to the practitioner if an estimator is not automatically computed by an econometrics package.
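As a small illustration of the instrumental variables idea treated in Chapters 4 and 6, here is a minimal Python sketch (not one of the book's own programs, which are written in Stata and Limdep). It assumes one endogenous regressor x and one instrument z, in which case the just-identified IV slope reduces to the ratio of sample covariances cov(z, y)/cov(z, x). The function names and the data are hypothetical, chosen so that x is correlated with the error u while z is not.

```python
def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def iv_slope(y, x, z):
    """Just-identified IV slope in y = alpha + beta*x + u with one instrument z:
    beta_IV = cov(z, y) / cov(z, x)."""
    denom = sample_cov(z, x)
    if denom == 0:
        raise ValueError("instrument z is uncorrelated with regressor x")
    return sample_cov(z, y) / denom


def ols_slope(y, x):
    """OLS slope cov(x, y) / var(x), shown for comparison."""
    return sample_cov(x, y) / sample_cov(x, x)


# Hypothetical data: u is correlated with x (x = z + u) but not with z,
# and the true slope is beta = 2.
z = [1.0, 2.0, 3.0, 4.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

print(iv_slope(y, x, z))  # 2.0, the true slope
print(ols_slope(y, x))    # about 2.44, biased upward because cov(x, u) > 0
```

The contrast between the two printed slopes is the point of the example: OLS is inconsistent when the regressor is correlated with the error, while the instrument, being uncorrelated with u, recovers the true coefficient.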

PART 3 (chapters 11-13)

Part 1 emphasized that (1) microeconometric models are often nonlinear; (2) they are frequently estimated using large and heterogeneous data sets; and (3) the data often come from surveys that are complex and subject to a variety of sampling biases. A realistic depiction of economic phenomena in such settings often requires models that are difficult to estimate and analyze. Advances in computing hardware and software now make it feasible to tackle such tasks. Part 3 presents modern, computer-intensive, simulation-based methods of inference that mitigate some of these difficulties. The background required for this material varies somewhat with the chapter, but the essential base is least squares and maximum likelihood estimation. Chapter 11 presents bootstrap methods for statistical inference. These methods have the attraction of providing a simple way to obtain standard errors when the formulae from asymptotic theory are complex, as is the case for some two-step estimators. Furthermore, if implemented appropriately, a bootstrap can provide an asymptotic refinement that may lead to better statistical inference in small samples. Chapter 12 presents simulation-based estimation methods. These methods permit estimation in situations where standard computational methods may not permit calculation of an estimator because of the presence of an integral over a probability distribution for which there is no closed-form solution. Chapter 13 surveys Bayesian methods, which provide an approach to estimation and inference that is quite different from the classical approach used in other chapters of this book. Despite this different approach, the Bayesian toolkit can also be adapted to permit classical estimation and inference for problems that are otherwise intractable.
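The basic bootstrap standard error described for Chapter 11 can be sketched in a few lines of Python. This is a minimal illustration, not the book's Stata program: it assumes independent observations, resamples them with replacement, recomputes the statistic on each resample, and reports the standard deviation of the replicates. The function name `bootstrap_se`, the default of 999 replications and the data are illustrative choices. The sample median serves as the statistic because its asymptotic standard error depends on the unknown density at the median.

```python
import random
import statistics


def bootstrap_se(data, statistic, num_reps=999, seed=12345):
    """Nonparametric bootstrap standard error of statistic(data).

    Each replication resamples n observations with replacement from data
    (for regression, data can be a list of (y, x) pairs, i.e. a paired
    bootstrap) and recomputes the statistic.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(num_reps)]
    # The bootstrap SE is the standard deviation of the replicates.
    return statistics.stdev(replicates)


# Hypothetical sample; the median's asymptotic SE depends on the unknown
# density at the median, which the bootstrap sidesteps.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
print(statistics.median(sample), bootstrap_se(sample, statistics.median))
```

Fixing the seed makes the reported standard error reproducible. The refinements mentioned above (for example, bootstrapping an asymptotically pivotal t-statistic) require more than this simple version.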


PART 4 (chapters 14-20)

Part 4, consisting of chapters 14 to 20, covers the core nonlinear limited dependent variable models for cross-section data, defined by the range of values taken by the dependent variable. Topics covered include models for binary and multinomial data, duration data and count data. The complications of censoring, truncation and sample selection are also studied. Chapters 14-15 cover models for binary and multinomial data that are standard in the analysis of discrete choices and outcomes. Maximum likelihood methods are dominant. Different parameterizations of the conditional probabilities in these models lead to different models, notably the well-established logit and probit models. Recent literature has focused on less restrictive modeling, with more flexible functional forms for conditional probabilities, and on accommodating individual unobserved heterogeneity. These objectives motivate the use of semiparametric methods and simulation-based estimation methods. Censoring, truncation or sample selection give rise to several empirically important classes of models that are analyzed in Chapter 16. The long-established Tobit model is central to this literature, but its consistent estimation relies on strong distributional assumptions. We also examine newer semiparametric methods that require weaker assumptions. Chapters 17-19 consider duration models in which the focus is either on the determinants of spell lengths, such as the length of an unemployment spell, or on modeling the hazard rate of transitions from one initial state to another. The relative importance of state dependence and unobserved heterogeneity as determinants of the average spell length is a central issue, whose resolution raises fundamental questions about alternative modeling approaches.
The analysis covers both discrete- and continuous-time models, and both parametric and semiparametric formulations, including standard models such as the exponential, the Weibull, and the proportional hazards model. Chapter 18 covers the formulation and interpretation of richer models that incorporate unobserved heterogeneity. Chapter 19 deals with models with several types of events, using the competing risks formulation, and with models of multiple spells. Chapter 20 covers the analysis of event counts of the kind very common in health economics. There are many strong connections and parallels between count data models and duration models because of their common foundation in stochastic processes. We analyze the widely used Poisson and negative binomial regression models, together with important variants such as the two-part or hurdle model, zero-inflated models, latent class models, and endogenous regressor models, all of which accommodate different facets of the event processes.
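To make the Poisson regression model concrete, here is a minimal Python sketch of its maximum likelihood estimator computed by Newton-Raphson, the gradient method of Chapter 10. It is not the book's Stata program. It assumes a conditional mean exp(b0 + b1*x) with a single scalar regressor, writes the 2x2 matrix inverse out by hand to avoid external libraries, and starts from the intercept-only MLE (which requires sum(y) > 0). The function name and the data are hypothetical. A binary regressor is used so that the answer can be checked against the closed form: exp(b0) and exp(b0 + b1) equal the group means of y.

```python
import math


def poisson_newton(y, x, max_iter=50, tol=1e-10):
    """Poisson MLE of (b0, b1) in E[y | x] = exp(b0 + b1 * x) by Newton-Raphson.

    Uses the score g = sum (y_i - mu_i) * (1, x_i) and the negative Hessian
    A = sum mu_i * (1, x_i)'(1, x_i); each step is b <- b + A^{-1} g.
    Assumes sum(y) > 0 and that x is not constant.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at intercept-only MLE
    for _ in range(max_iter):
        g0 = g1 = a00 = a01 = a11 = 0.0
        for yi, xi in zip(y, x):
            mu = math.exp(b0 + b1 * xi)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            a00 += mu
            a01 += mu * xi
            a11 += mu * xi * xi
        det = a00 * a11 - a01 * a01
        d0 = (a11 * g0 - a01 * g1) / det  # 2x2 inverse of A applied to g
        d1 = (a00 * g1 - a01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical counts with a binary regressor: here the MLE has the closed form
# b0 = log(mean y | x=0) = log 2 and b1 = log(mean y | x=1 / mean y | x=0) = log 2.5.
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 5, 6]
print(poisson_newton(y, x))
```

Because the Poisson log-likelihood is globally concave in the parameters, Newton-Raphson from a sensible starting value typically converges in a handful of iterations.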

PART 5 (chapters 21-23)

Cross-section models have certain inherent limitations. They are predominantly equilibrium models that generally do not shed light on the intertemporal dependence of events. They also cannot satisfactorily resolve fundamental issues about the sources of persistence in behavior. Such persistence may be behavioral, i.e. arising from true state dependence, or it may be spurious, an artifact of the inability to control for heterogeneous behavior in the population. Because panel data, also called longitudinal data, contain periodically repeated observations of the same subjects, they have great potential for resolving issues that cross-section models cannot satisfactorily handle. Chapters 21 through 23 present methods for panel data. We progress systematically from linear models for continuous data in Chapter 21 to nonlinear panel data models for limited dependent variables in Chapter 23. Both fixed effects and random effects models are considered. A persistent theme through these three chapters is the importance of using robust methods of inference. Chapter 21, which reviews the key general results for linear panel data regression models, can be read easily by those with a good grasp of linear regression; it does not require the material covered in Parts 2 to 4. We recommend that even readers interested in more advanced material quickly peruse this chapter first to gain familiarity with key concepts and definitions. Chapter 22 covers important extensions of Chapter 21, especially to dynamic panels, which allow for a Markovian dependence structure in current variables. The analysis is in the GMM framework currently favored by many practitioners in this area. It is at times intricate, involving many issues of detail, and a strong grasp of GMM will be helpful in absorbing the main results of this chapter. The results of Chapters 21 and 22 do not extend to the nonlinear panel models of Chapter 23 in a general and unified fashion, and there are fewer general results for limited dependent variable panel models. Despite this, Chapter 23 begins with an analysis of some general issues and approaches. Later sections can be treated as panel data extensions of the counterpart cross-section models in Part 4. These analyze four categories of models, for binary, count, censored, and duration data, respectively, and should be accessible to a suitably prepared reader familiar with the parallel cross-section models.
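The fixed effects idea at the heart of Chapter 21 can be sketched in Python. This is a minimal illustration, not the book's Stata program. It assumes the linear model y_it = alpha_i + beta*x_it + e_it with a single regressor. Subtracting each individual's time average removes alpha_i, and beta is then the OLS slope of demeaned y on demeaned x. The function names and the small panel are hypothetical, constructed so that the individual effects are correlated with x.

```python
from collections import defaultdict


def within_slope(ids, y, x):
    """Fixed effects (within) slope for y_it = alpha_i + beta * x_it + e_it.

    Subtracting individual means removes alpha_i; beta is then the OLS slope
    of demeaned y on demeaned x.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # id -> [sum y, sum x, T_i]
    for i, yi, xi in zip(ids, y, x):
        totals[i][0] += yi
        totals[i][1] += xi
        totals[i][2] += 1
    num = den = 0.0
    for i, yi, xi in zip(ids, y, x):
        sum_y, sum_x, t_i = totals[i]
        y_dm = yi - sum_y / t_i  # deviation from individual mean of y
        x_dm = xi - sum_x / t_i  # deviation from individual mean of x
        num += x_dm * y_dm
        den += x_dm * x_dm
    return num / den


def pooled_ols_slope(y, x):
    """Pooled OLS slope, which ignores the individual effects alpha_i."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


# Hypothetical balanced panel: two individuals, three periods each,
# alpha_1 = 10, alpha_2 = -5, beta = 0.5, no idiosyncratic error.
ids = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
alpha = {1: 10.0, 2: -5.0}
y = [alpha[i] + 0.5 * xi for i, xi in zip(ids, x)]

print(within_slope(ids, y, x))  # 0.5
print(pooled_ols_slope(y, x))   # -2.3125: wrong sign, since alpha_i is correlated with x
```

In this example, pooled OLS even gets the sign wrong. The individual with the larger x values has the smaller fixed effect, and only the within transformation removes that confounding.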

PART 6 (chapters 24-27)

Frequently in empirical work the data present not one but multiple complications that the analysis must deal with simultaneously. Examples of such complications include departures from simple random sampling, clustering of observations, measurement errors, and missing data. When they occur, individually or jointly, in the context of any of the models developed in Parts 4 and 5, identification of the parameters of interest will be compromised. Three chapters in Part 6 (Chapters 24, 26, and 27) analyze the consequences of such complications and then present methods that attempt to overcome them. The methods are illustrated using examples taken from earlier parts of the book, which provides points of connection between Part 6 and the rest of the book. Chapter 24, which deals with features of data from complex surveys, complements various topics covered in Chapters 3, 5, and 16. Chapter 26, which deals with measurement errors, complements topics in Chapters 4, 14, and 20. Chapter 27 is a stand-alone chapter on missing data and multiple imputation, but its use of the EM algorithm and the Gibbs sampler also gives it points of contact with Chapters 10 and 13, respectively. Chapter 25 deals with the important topic of treatment evaluation. Treatment is a broad term that refers to the impact of one variable, e.g. schooling, on some outcome variable, e.g. income. Treatment variables may be exogenously assigned or endogenously chosen. Treatment evaluation concerns the identifiability of the impact of treatment on outcome, as measured by either marginal effects or certain functions of marginal effects. A variety of methods are used, including instrumental variables regression and propensity score matching. The problem of treatment evaluation can arise in the context of any model considered in Parts 4 and 5. This chapter may also be read on its own, but it presumes familiarity with many other topics covered in the book, including instrumental variables and selection models, which is why it is placed in the last part.
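Propensity score matching, one of the treatment evaluation methods listed for Chapter 25, can be illustrated with a minimal Python sketch. This is not the book's Stata program. It assumes the propensity scores are already available (in practice they would first be estimated, for example by a logit or probit of treatment on covariates). Each treated unit is matched, with replacement, to the single control with the closest score, and the average of the treated-minus-matched-control outcome differences estimates the average treatment effect on the treated. The function name and the five-observation data set are hypothetical.

```python
def att_nn_matching(treated, pscore, outcome):
    """Average treatment effect on the treated (ATET) by one-to-one
    nearest-neighbor matching, with replacement, on a propensity score
    that is taken as given (in practice it would be estimated first,
    e.g. by a logit or probit of treatment on covariates)."""
    controls = [j for j, d in enumerate(treated) if d == 0]
    if not controls:
        raise ValueError("need at least one control observation")
    effects = []
    for i, d in enumerate(treated):
        if d == 1:
            # control with the closest propensity score (first one on ties)
            j = min(controls, key=lambda k: abs(pscore[k] - pscore[i]))
            effects.append(outcome[i] - outcome[j])
    if not effects:
        raise ValueError("need at least one treated observation")
    return sum(effects) / len(effects)


# Hypothetical data: two treated and three control units. The naive
# difference in mean outcomes is 8 - 14/3, about 3.33.
treated = [1, 1, 0, 0, 0]
pscore = [0.80, 0.30, 0.75, 0.35, 0.10]
outcome = [10.0, 6.0, 7.0, 5.0, 2.0]
print(att_nn_matching(treated, pscore, outcome))  # 2.0
```

Matching compares each treated unit only with controls that had a similar probability of treatment. That is why its estimate (2.0) differs from the naive difference in means, which also compares units with very different treatment probabilities.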


GUIDE FOR INSTRUCTORS AND OTHER READERS

The book assumes a basic understanding of the linear regression model with matrix algebra. It is written at the mathematical level of the first-year economics Ph.D. sequence, comparable to Greene (2000). While some of the material in this book is covered in a first-year sequence, most of it appears in second-year econometrics Ph.D. courses or in data-oriented microeconomics field courses such as labor economics, public economics or industrial organization. This book is intended to be used both as an econometrics text and as an adjunct for such field courses. More generally, the book is intended to be useful as a reference work for applied researchers in economics, in related social sciences such as sociology and political science, and in epidemiology. The models chapters have been written to be as self-contained as possible, to minimize the amount of background material in the methods chapters that needs to be read. For the specific models presented in parts four and five (chapters 14-23) it will generally be sufficient to read the relevant chapter in isolation, except that some command of the general estimation results in chapter 5, and in some cases chapter 6, will be necessary. Most chapters are structured to begin with a discussion and example that is accessible to a wide audience. For instructors using this book as a course text, it is best to introduce the basic nonlinear cross-section and linear panel data models as early as possible, skipping many of the methods chapters. The most commonly used nonlinear cross-section models are presented in chapters 14-16 and require knowledge of maximum likelihood and least squares estimation, presented in chapter five. Chapter twenty-one on linear panel data models requires even less preparation, essentially just chapter four.
Table 1.2 provides an outline for a one-quarter second-year graduate course taught at the University of California, Davis, immediately following the required first-year statistics and econometrics sequence. A quarter provides sufficient time to cover the basic results given in the first half of the chapters in this outline. With additional time one can go into further detail or cover a subset of chapters eleven to thirteen on computationally intensive estimation methods (the bootstrap, which is also briefly presented in chapter seven; simulation-based estimation; and Bayesian methods); additional cross-section models (durations and counts), presented in chapters seventeen to twenty; and additional panel data models (linear model extensions and nonlinear models), given in chapters twenty-two and twenty-three.

Outline of a twenty-lecture, ten-week course:

Lectures   Chapter   Topic
1-3        4         Review of linear models and asymptotic theory
4-7        5         Estimation: M-estimation, ML and NLS
8          10        Estimation: Numerical Optimization
9-11       14, 15    Models: Binary and multinomial
12-14      16        Models: Censored and Truncated
15         6         Estimation: GMM
16         7         Testing: Hypothesis Tests
17-19      21        Models: Basic Linear Panel
20         9         Estimation: Semiparametric

At Indiana University, Bloomington, a fifteen-week, semester-long field course in microeconometrics is based on material in most of Parts 4 and 5 (chapters 14-23). The prerequisite courses for this course cover material similar to that in Part 2 (chapters 4-10).

Some exercises are provided at the end of each chapter after the first three introductory chapters. These are usually learning-by-doing exercises; some are purely methodological, while others entail analysis of generated or actual data. The level of difficulty of the questions mostly reflects the level of difficulty of the topic. Detailed programs and data for all the data applications (using either actual or generated data) will be made available at the book website.


ADVANCE REVIEWS

"This book presents an elegant and accessible treatment of the broad range of rapidly expanding topics currently being studied by microeconometricians. Thoughtful, intuitive, and careful in laying out central concepts of sophisticated econometric methodologies, it is not only an excellent textbook for students, but also an invaluable reference text for practitioners and researchers." - Cheng Hsiao, University of Southern California

"I wish "Microeconometrics" was available when I was a student! Here, in one place -- and in clear and readable prose -- you can find all of the tools that are necessary to do cutting-edge applied economic analysis, and with many helpful examples." - Alan Krueger, Princeton University

"Cameron and Trivedi have written a remarkably thorough and up-to-date treatment of microeconometric methods. This is not a superficial cookbook; the early chapters carefully lay the theoretical foundations on which the authors build their discussion of methods for discrete and limited dependent variables and for analysis of longitudinal data. A distinctive feature of the book is its attention to cutting-edge topics like semiparametric regression, bootstrap methods, simulation-based estimation, and empirical likelihood estimation. A highly valuable book." - Gary Solon, University of Michigan

"The empirical analysis of micro data is more widespread than ever before. The book by Cameron and Trivedi contains a superb treatment of all the methods that economists like to apply to such data. What is more, it fully integrates a number of exciting new methods that have become applicable due to recent advances in computer technology. The text is in perfect balance between econometric theory and empirical intuition, and it contains many insightful examples." - Gerard J. van den Berg, Free University, Amsterdam, The Netherlands


PROGRAMS: I. INTRODUCTION (chapters 1-3) No programs.

PROGRAMS: II. CORE METHODS (chapters 4-10) Section Pages

Example

Program and Output

4.5.3

84-5

Robust Standard Errors for mma04p1wls.do OLS, WLS and GLS mma04p1wls.txt

* mma04p1wls.asc

4.6.4

88-90

Quantile and Regression

qreg0902.dta qreg0902.asc

4.8.8

102-3

Instrumental Regression

4.9.6

110-2

IV Application with Weak mma04p4ivweak.do mma04p4ivweak.txt Instruments

Median mma04p2qreg.do mma04p2qreg.txt Variables mma04p3iv.do mma04p3iv.txt

Data [* means generated]

or

* mma04p3iv.asc

DATA66.dat DATA66.dct

and

5.9.2-3 159-63 Exponential: MLE using mma05p1mle.do ml command mma05p1mle.txt

* mma05data.asc

5.9.2-3 159-63 Exponential: NLS using nl mma05p2nls.do command mma05p2nls.txt

* mma05data.asc

5.9.2-3 159-63 Exponential: NLS using ml mma05p3nlsbyml.do command mma05p3nlsbyml.txt

* mma05data.asc

5.9.4

159-63 Exponential: Computation mma05p4margeffects.do mma05p4margeffects.txt of marginal effects

* mma05data.asc

6.5.4

198-9

Nonlinear Limdep

* mma06p1nl2sls.asc

6.5.4

198-9

Part of preceding using mma06p2twostage.do Stata mma06p2twostage.txt

* mma06p1nl2sls.asc

7.4

241-3

Likelihood-based Hypothesis Testts

* mma07p1mltests.asc

7.6.3

248-9

Asymptotic Power of Wald mma07p2power.do Test mma07p2power.txt

No data

7.7.1-5 250-4

Monte Carlo Simulation of mma07p3montecarlo.do Wald Test mma07p3montecarlo.txt

Data for many simulations not saved

7.8

254-6

Bootstrap example

* mma07p4boot.asc

8.2.9

269-71 Conditional moment tests mma08p1cmtests.do example mma08p1cmtests.txt

* mma08p1cmtests.asc

8.5.5

283-4

*

Nonnested

2SLS:

models

Using mma06p1nl2sls.lim mma06p1nl2sls.out

mma07p1mltests.do mma07p1mltests.txt

mma07p4boot.do mma07p4boot.txt

test mma08p2nonnested.do

21

example

mma08p2nonnested.txt

8.7.3

290-1

Model example

diagnostics mma08p3diagnostics.do mma08p3diagnostics.txt

9.2

295-7

Nonparametric density mma09p1np.do estimation and regression: mma09p1np.txt appplication

mma08p2nonnested.asc * mma08p3diagnostics.asc

9.4-9.5 307-19 Nonparametric regression: mma09p2npmore.do more mma09p2npmore.txt

* mma09p2npmore.asc

9.3.3

* mma09p3kernels.asc

299300

10.2.5 338-9

Kernel functions plotted

mma09p3kernels.do mma09p3kernels.txt

Gradient method example mma10p1gradient.do (Newton Raphson) mma10p1gradient.txt

PROGRAMS:

No data

III. Computationally-Intensive Methods

(chapters 11-13)

Section

Pages

Example

Program and Output

Data

11.3

366-8

Bootstrap example

mma11p1boot.do mma11p1boot.txt

* mma11p1boot.asc

12.3.3

391-2

Integral Example

12.4.5, 12.5.6

397-7, 403-4

Maximum Simulated mma12p2mslmsm.do Likelihood and Maximum mma12p2mslmsm.txt Simulated Score Example

* mma12p2mslmsm.asc

12.8.2

412-3

Illustration of Methods to mma12p3draws.do Draw Random Variates mma12p3draws.txt

No data

13.2.2

424

Bayes Theorem Illustration mma13p1bayesthm.do for Normal Distribution mma13p1bayesthm.txt and Prior

No data

13.6

452-4

MCMC Example: Gibbs mma13p2bayesgibbs.sas Program generated Sampler for SUR mma13p2bayesgibbs.lst mma13p2bayesgibbs.log

PROGRAMS:

IV.

Computation mma12p1integration.do No data mma12p1integration.txt

Models

for

Cross-Section

Data

Section Pages

Example

14.2

Logit and Probit mma14p1binary.do Application (fishing mode) mma14p1binary.txt

464-5

Program and Output

(chapters

14-20)

Data Nldata.asc

22

14.7.5

486

Maximum score estimator mma14p2maxscore.lim for binary outcome mma14p2maxscore.out

mma14p1binary.asc

15.2.1- 491-5 3

Multinomial Logit and mma15p1mnl.do Conditional Logit mma15p1mnl.txt Application (fishing mode)

Nldata.asc

15.6.3

511

Nested Logit (or GEV) mma15p2gev.do estimation mma15p2gev.txt

Nldata.asc

15.2.2

493-4

Limdep multinomial logit

Nldata.asc

mma15p3mnl.lim mma15p3mnl.out

15.2.1- 491-5 3

Limdep and addon Nlogit mma15p4gev.lim for conditional and nested mma15p4gev.out logit

mma15p4gev.asc

16.2.1

530-1, 565

Classic Tobit MLE and mma16p1tobit.do CLAD mma16p1tobit.txt

mma16p1tobit.asc

16.3.4 | 540 | Inverse Mills ratio plotted | mma16p2mills.do, mma16p2mills.txt | No data
16.6 | 553-5 | Selection Model Application (medical expenditures) | mma16p3selection.do, mma16p3selection.txt | randdata.dta or mma16p3selection.asc
17.2, 17.5.1 | 574-5, 581-3 | Nonparametric estimation (KM and NA) for survival data (strike duration) | mma17p1km.do, mma17p1km.txt | strkdur.dta or strkdur.asc
17.5.1 | 581-2 | Nonparametric estimation (KM and NA) for survival data (artificial) | mma17p2kmextra.do, mma17p2kmextra.txt | Data in program
17.6.1 | 584-6 | Weibull distribution functions plotted | mma17p3weib.do, mma17p3weib.txt | No data
17.11 | 603-8 | Duration regression models (unemployment duration) | mma17p4duration.do, mma17p4duration.txt | ema1996.dta or ema1996.asc
18.8 | 632-6 | Duration regression with unobserved heterogeneity (unemployment duration) | mma18p1heterogeneity.do, mma18p1heterogeneity.txt | ema1996.dta or ema1996.asc
19.5 | 658-62 | Competing risks model (unemployment duration) | mma19p1comprisks.do, mma19p1comprisks.txt | ema1996.dta or ema1996.asc
20.2, 20.7 | 671-4, 690 | Count regression (doctor contacts) | mma20p1count.do, mma20p1count.txt | randdata.dta or mma20p1count.asc

PROGRAMS: V. Models for Panel Data (chapters 21-23)

Section | Pages | Example | Program and Output | Data
21.3.1-3 | 708-13 | Linear Panel Fixed and Random Effects Application (hours and wages) | mma21p1panfeandre.do, mma21p1panfeandre.txt | MOM.dat
21.3.2, 21.3.4 | 710, 719 | Linear Panel Estimators manually obtained by OLS on transformed equation (hours and wages) | mma21p2panmanual.do, mma21p2panmanual.txt | MOM.dat
21.3.4 | 713-5 | Linear Panel Residual Analysis (hours and wages) | mma21p3panresiduals.do, mma21p3panresiduals.txt | MOM.dat
21.5.5 | 725 | Linear Panel pooled OLS and GLS estimation (hours and wages) | mma21p4pangls.do, mma21p4pangls.txt | MOM.dat
22.3 | 754-6 | Linear Panel GMM Application (hours and wages) | mma22p1gmmpanel.do, mma22p1gmmpanel.txt | MOMprecise.dat
23.3 | 792-5 | Nonlinear Panel Application (patents and R&D) | mma23p1pannonlin.do, mma23p1pannonlin.txt | patr7079.asc

PROGRAMS: VI. Further Methods (chapters 24-27)

Section | Pages | Example | Program and Output | Data
24.7 | 848-53 | Clustered Linear Regression (household medical expenditure clustered on commune) | mma24p1olscluster.do, mma24p1olscluster.txt | vietnam_ex1.dta or vietnam_ex1.asc
 |  | Clustered Poisson Regression (individual pharmacy visits clustered on commune) | mma24p2poiscluster.do, mma24p2poiscluster.txt | vietnam_ex2.dta or vietnam_ex2.asc

25.8.1-4 | 889-93 | Treatment Evaluation: Simple calculations (training on earnings) | mma25p1treatment.do, mma25p1treatment.txt | nswpsid.da1 or nswpsid.dta
25.8.5 | 893-6 | Treatment Evaluation: Propensity score matching (training on earnings) | mma25p2matching.do, mma25p2matching.txt | nswpsid.da1 or nswpsid.dta
25.8 | 889-96 | Treatment Evaluation: Additional analysis not in book using additional data sets (NSW experimental controls and CPS controls) | mma25p3extra.do, mma25p3extra.txt | nswre74_treated.dta and nswre74_control.dta or nswre74_all.asc; propensity_cps.dta or propensity_cps.asc
26.5 | 919-20 | Measurement Error Bias Example | To come | Generated data
27.8 | 935-9 | Missing Data MCMC Imputation Example | To come | Generated data

DATA SETS

Data in fixed-format text files have extension .asc or .dat (if a Stata dictionary is used, it has extension .dct). Stata data files have extension .dta. We thank Rajeev Dehejia, Bronwyn Hall, Cathy Kling, Jeffrey Kling, Will Manning, Brian McCall and Jim Ziliak for making their data available for empirical illustrations. The relevant citations are given below. For "Authors' extract" the citation is A. C. Cameron and P. K. Trivedi (2005), "Microeconometrics: Methods and Applications," Cambridge University Press, New York. Many more examples use generated data; see the programs.

Pages | Topic | Data Source | Data
88-90 | Median and quantile regression | Vietnam World Bank Living Standards Survey. Authors' extract. | qreg0902.dta or qreg0902.asc
110-2 | Instrumental variables with weak instruments | National Longitudinal Survey. J. R. Kling (2001), "Interpreting Instrumental Variables Estimates of the Return to Schooling," Journal of Business and Economic Statistics, 19, 358-364. | DATA66.dat and DATA66.dct
295-7, 300 | Nonparametric density estimation and regression | Panel Survey of Income Dynamics. Authors' extract. | psidf3050.dat
463-6, 486, 491-5 | Binary and multinomial outcomes | Fishing-mode choice data. J. A. Herriges and C. L. Kling (1999), "Nonlinear Income Effects in Random Utility Models," Review of Economics and Statistics, 81, 62-72. | Nldata.asc and mma15p4gev.asc
553-6, 565 | Selection models | Rand Health Insurance Experiment. Authors' extract. | randdata.dta or mma16p3selection.asc
574-5, 582 | Duration models | Strike duration data. J. Kennan (1985), "The Duration of Contract Strikes in U.S. Manufacturing," Journal of Econometrics, 28, 5-28. | strkdur.dta or strkdur.asc
603-8, 632-6, 658-62 | Duration models | Current Population Survey Displaced Workers Supplement. B. P. McCall (1996), "Unemployment Insurance Rules, Joblessness, and Part-time Work," Econometrica, 64, 647-682. | ema1996.dta or ema1996.asc
671-4, 692 | Count data models | Rand Health Insurance Experiment. P. Deb and P. K. Trivedi (2002), "The Structure of Demand for Medical Care: Latent Class versus Two-Part Models," Journal of Health Economics, 21, 601-625. | randdata.dta or mma20p1count.asc
708-15 | Linear panel models: basics | Panel Survey of Income Dynamics. J. Ziliak (1997), "Efficient Estimation With Panel Data when Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators," Journal of Business and Economic Statistics, 15, 419-431. | MOM.dat
754-6 | Linear panel models: GMM | Panel Survey of Income Dynamics. J. Ziliak (1997); see previous citation. | MOMprecise.dat
792-5 | Nonlinear panel models | Patents-R&D data. B. H. Hall, Z. Griliches and J. A. Hausman (1986), "Patents and R&D: Is There a Lag?", International Economic Review, 27, 265-283. | patr7079.asc
848-53 | Clustered data | Vietnam World Bank Living Standards Survey. Authors' extract: (1) household data, (2) individual data. | vietnam_ex1.dta or vietnam_ex1.asc; vietnam_ex2.dta or vietnam_ex2.asc
889-95 | Treatment evaluation | National Supported Work demonstration project and controls. R. H. Dehejia and S. Wahba (1999), "Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs," JASA, 1053-1062; and/or R. H. Dehejia and S. Wahba (2002), "Propensity-score Matching Methods for Nonexperimental Causal Studies," ReStat, 151-161. | nswpsid.da1 or nswpsid.dta; nswre74_treated.dta and nswre74_control.dta or nswre74_all.asc; propensity_cps.dta or propensity_cps.asc

[Treatment evaluation: nswpsid (NSW treated vs PSID controls) is used in the text. The other data sets are not used in the text but are used in mma25p3extra.do.]

EXPLANATION OF BOOK PROGRAMS

PROGRAMS USED:
Most programs are in Stata version 8.0, executed on an MS Windows PC with Stata 8.2. Stata 7 will usually be okay. Exceptions where Stata 8 is needed include:
(1) The estimates command (for tabulating regression results) is not available in version 7. Comment out occurrences of "estimates store ..." and "estimates table ....".
(2) Graphics commands (used to obtain the figures in the book) changed substantially from version 7 to 8. This only affects generating figures. If graphs are important, it is best to upgrade to Stata 8, whose graphics are much better.
(3) In some places free Stata add-ons have been included. These are noted in the programs. To download such an add-on, e.g. knnreg, give the Stata command "search knnreg" and follow the directions.
The Stata programs range from very problem-specific code to code that can potentially be adapted to one's own needs. Some programs use Limdep version 7.0 and Nlogit 2.0, executed on an MS Windows PC. Some programs use SAS/IML, with SAS version 8.0 run on a Unix machine.

FILE NAMING CONVENTIONS:
For Stata, as an example for chapter 4.5.3 we provide:
mma04p1wls.do - Stata program
mma04p1wls.txt - Output from this program
mma04p1wls.asc - The generated data as a fixed-width ASCII data set [permits analysis with programs other than Stata]
For Limdep, as an example for chapter 15.2.2 we provide:
mma15p3mnl.lim - Limdep program
mma15p3mnl.out - Output from this program
For SAS, as an example for chapter 13.6 we provide:
mma13p2bayesgibbs.sas - SAS program
mma13p2bayesgibbs.lst - SAS output
mma13p2bayesgibbs.log - SAS log file
For data sets the extensions are:
.dta - Stata data set
.asc - ASCII (text) data set that is usually both space delimited and fixed width
For descriptions of the data sets see the relevant program that uses the data set, and the associated output.

PROGRAM CPU TIME:
Programs generally take little time to run. The exceptions are programs that entail simulation, including bootstrapping. These can be sped up by reducing the number of simulations / replications, though final analysis should use many simulations / replications.
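The naming convention and the .asc format described above are regular enough to handle programmatically, for example when porting an analysis to another package. The Python sketch below is an illustration only, not one of the authors' programs: `parse_program_name` and `read_asc` are hypothetical helpers, the regular expression is inferred from the file names listed in this guide, and `read_asc` assumes purely numeric, whitespace-separated values with a lone `.` standing for a Stata missing value.

```python
import re
from typing import NamedTuple, Optional

# Inferred from names such as mma04p1wls.do or mma13p2bayesgibbs.lst:
# "mma" + two-digit chapter + "p" + program number + topic + extension.
NAME_PATTERN = re.compile(r"^mma(\d{2})p(\d+)([a-z0-9_]+)\.([a-z0-9]+)$")


class ProgramFile(NamedTuple):
    chapter: int
    program: int
    topic: str
    extension: str


def parse_program_name(filename: str) -> ProgramFile:
    """Split a file name like 'mma04p1wls.do' into chapter, program, topic, extension."""
    match = NAME_PATTERN.match(filename.lower())
    if match is None:
        raise ValueError(f"not an mmaCCpN file name: {filename!r}")
    chapter, program, topic, extension = match.groups()
    return ProgramFile(int(chapter), int(program), topic, extension)


def read_asc(text: str, ncols: int) -> list[list[Optional[float]]]:
    """Read whitespace-separated numeric data into rows of ncols values.

    Tokens are regrouped by ncols rather than by line, so a record that
    was wrapped over several lines is still read correctly. A lone '.'
    (assumed to be Stata's missing value) becomes None.
    """
    tokens = text.split()
    if len(tokens) % ncols != 0:
        raise ValueError(f"{len(tokens)} values do not fill rows of {ncols}")
    values = [None if tok == "." else float(tok) for tok in tokens]
    return [values[i:i + ncols] for i in range(0, len(values), ncols)]


if __name__ == "__main__":
    print(parse_program_name("mma04p1wls.do"))
    # Made-up values laid out like mma04p1wls.asc (variables y x e u)
    sample = "2.5 1.0 . 0.3\n-1.2 0.4 0.9 -0.6\n"
    print(read_asc(sample, ncols=4))
```

Regrouping by a known column count, rather than trusting line breaks, is a deliberate choice: the caller must know how many variables were written (for mma04p1wls.asc, four), but wrapped records are then handled without special cases.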


Chapter 4. Linear models
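The first program below, mma04p1wls.do, simulates y = 1 + x + u with u = |x|e, so that V[u|x] = 4x^2, and compares OLS, WLS (weight 1/|x|) and GLS (weight 1/x^2) with usual and robust standard errors. As a rough cross-check of what those Stata commands compute, here is a minimal pure-Python sketch for a single regressor; it is not the authors' code. The names `wls`, `robust_slope_se` and `simulate` are hypothetical, Python's random number generator differs from Stata's (so the draws will not reproduce the log), and the n/(n-2) factor in the robust standard error is an assumption about how Stata's robust option scales the sandwich estimate.

```python
import math
import random


def wls(y, x, w):
    """Weighted least squares of y on an intercept and x; returns (b0, b1).

    Minimizing sum_i w_i*(y_i - b0 - b1*x_i)**2 is the same as running OLS
    after multiplying y_i, 1 and x_i by sqrt(w_i); w_i = 1 gives OLS.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def robust_slope_se(y, x, w):
    """Sandwich (heteroskedasticity-robust) standard error of the WLS slope.

    The slope is a weighted sum of the errors, so its variance is estimated
    by replacing each unknown error variance with the squared residual.
    The n/(n-2) degrees-of-freedom factor is an assumption (see lead-in).
    """
    n = len(y)
    b0, b1 = wls(y, x, w)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    meat = sum((wi * (xi - xbar) * (yi - b0 - b1 * xi)) ** 2
               for wi, xi, yi in zip(w, x, y))
    return math.sqrt(n / (n - 2) * meat / sxx ** 2)


def simulate(n=100, seed=10105):
    """Draw the mma04p1wls.do design: y = 1 + x + |x|*e, x ~ N(0, 5^2), e ~ N(0, 2^2)."""
    rng = random.Random(seed)
    x = [5 * rng.gauss(0, 1) for _ in range(n)]
    e = [2 * rng.gauss(0, 1) for _ in range(n)]
    y = [1 + xi + abs(xi) * ei for xi, ei in zip(x, e)]
    return y, x


if __name__ == "__main__":
    y, x = simulate()
    weights = {
        "OLS": [1.0] * len(x),
        "WLS, w = 1/|x|": [1 / abs(xi) for xi in x],
        "GLS, w = 1/x^2": [1 / xi ** 2 for xi in x],
    }
    for label, w in weights.items():
        b0, b1 = wls(y, x, w)
        se = robust_slope_se(y, x, w)
        print(f"{label:15s} b0 = {b0:6.3f}  b1 = {b1:6.3f}  robust se(b1) = {se:.3f}")
```

Weighting by 1/x^2 gives exactly the same coefficients as OLS of y/x on a constant and 1/x with no further weighting (the constant then estimates the slope and the coefficient on 1/x estimates the intercept), which is the check carried out in step (3C) of the log.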

----------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma04p1wls.txt
  log type:  text
 opened on:  17 May 2005, 13:41:48

. 
. ********** OVERVIEW OF MMA04P1WLS.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 4.5.3 pages 84-5
. * Robust Standard Errors for OLS, WLS and GLS
. * (1) Robust and nonrobust standard errors for OLS, WLS and GLS.
. * (2) Table 4.3
. * using generated data (see below)
. 
. ********** SETUP **********
. 
. set more off
. version 8
. set scheme s1mono   /* Used for graphs */
. 
. ********** GENERATE DATA and SUMMARIZE **********
. 
. * Model is  y = 1 + 1*x + u
. * where     u = abs(x)*e
. *           x ~ N(0, 5^2)
. *           e ~ N(0, 2^2)
. 
. * Errors are conditionally heteroskedastic with V[u|x]=4*x^2
. * OLS, WLS and GLS are consistent
. * but need to use robust standard errors for OLS and WLS.
. 
. set seed 10105
. set obs 100
obs was 0, now 100
. gen x = 5*invnorm(uniform())

. gen e = 2*invnorm(uniform())
. gen u = abs(x)*e
. gen y = 1 + 1*x + u
. 
. * Descriptive Statistics
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |       100   -.1322828     4.64293  -11.05289   10.63336
           e |       100     .350339    2.033639  -3.776468   5.150759
           u |       100    1.215709    8.187081  -19.58098    32.6086
           y |       100    2.083426    9.364465  -27.63657   39.93944

. 
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x e u using mma04p1wls.asc, replace
. 
. ********** ESTIMATE THE MODELS **********
. 
. ** (1) OLS - first column of Table 4.3
. 
. * (1A) OLS with wrong standard errors
. regress y x

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   30.23
       Model |  2046.73901     1  2046.73901           Prob > F      =  0.0000
    Residual |  6634.88855    98  67.7029444           R-squared     =  0.2358
-------------+------------------------------           Adj R-squared =  0.2280
       Total |  8681.62755    99  87.6932076           Root MSE      =  8.2282

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .979313   .1781124     5.50   0.000     .6258548    1.332771
       _cons |   2.212973   .8231553     2.69   0.008     .5794478    3.846497
------------------------------------------------------------------------------

. estimates store olsusual
. 
. * (1B) OLS with correct standard errors (robust sandwich)
. regress y x, robust


Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   12.68
                                                       Prob > F      =  0.0006
                                                       R-squared     =  0.2358
                                                       Root MSE      =  8.2282

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .979313   .2750617     3.56   0.001     .4334621    1.525164
       _cons |   2.212973   .8198253     2.70   0.008      .586056    3.839889
------------------------------------------------------------------------------

. estimates store olsrobust
. 
. ** (2) WLS - second column of Table 4.3
. 
. * (2A) WLS with wrong standard errors
. * Use the aweight option (not clearly explained in Stata manual).
. * The aweight option MULTIPLIES y and x by sqrt(aweight).
. * Here we suppose V[u]=constant*|x|
. * So want to divide by sqrt(|x|), so let aweight=1/|x|
. gen absx = abs(x)
. regress y x [aweight=1/absx]
(sum of wgt is   5.7885e+02)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   25.29
       Model |   56.759883     1   56.759883           Prob > F      =  0.0000
    Residual |  219.985987    98  2.24475497           R-squared     =  0.2051
-------------+------------------------------           Adj R-squared =  0.1970
       Total |   276.74587    99  2.79541283           Root MSE      =  1.4983

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9569768   .1903115     5.03   0.000     .5793097    1.334644
       _cons |   1.060374   .1498265     7.08   0.000     .7630484      1.3577
------------------------------------------------------------------------------

. estimates store wlsusual
. 
. * (2B) WLS with correct standard errors (robust sandwich)
. regress y x [aweight=1/absx], robust
(sum of wgt is   5.7885e+02)

Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   17.07
                                                       Prob > F      =  0.0001
                                                       R-squared     =  0.2051
                                                       Root MSE      =  1.4983

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9569768    .231612     4.13   0.000     .4973503    1.416603
       _cons |   1.060374    .050533    20.98   0.000     .9600931    1.160655
------------------------------------------------------------------------------

. estimates store wlsrobust
. 
. ** (3) GLS - last column of Table 4.3
. 
. * (3A) GLS with usual standard errors (correct)
. * Here we know V[u]=constant*x^2
. * So want to divide by x, so let aweight=1/(x^2)
. gen xsq = x*x
. regress y x [aweight=1/xsq]
(sum of wgt is   1.0314e+05)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   20.70
       Model |  .086075004     1  .086075004           Prob > F      =  0.0000
    Residual |  .407542418    98  .004158596           R-squared     =  0.1744
-------------+------------------------------           Adj R-squared =  0.1660
       Total |  .493617422    99  .004986035           Root MSE      =  .06449

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9516457   .2091752     4.55   0.000     .5365444    1.366747
       _cons |   .9964956   .0065131   153.00   0.000     .9835706    1.009421
------------------------------------------------------------------------------

. estimates store glsusual
. 
. * (3B) GLS with standard errors (robust sandwich - unnecessary here)
. regress y x [aweight=1/xsq], robust
(sum of wgt is   1.0314e+05)

Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   20.89
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1744
                                                       Root MSE      =  .06449

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9516457   .2082145     4.57   0.000     .5384508    1.364841
       _cons |   .9964956   .0078922   126.26   0.000     .9808337    1.012157
------------------------------------------------------------------------------

. estimates store glsrobust
. 
. * (3C) Check that aweight works as expected.
. * Do GLS by OLS on data transformed by dividing by x.
. gen try = y/x
. gen trint = 1/x
. gen trx = x/x
. regress try trx trint, noconstant

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    98) = 11850.15
       Model |  101659.545     2  50829.7726           Prob > F      =  0.0000
    Residual |  420.359033    98  4.28937789           R-squared     =  0.9959
-------------+------------------------------           Adj R-squared =  0.9958
       Total |  102079.904   100  1020.79904           Root MSE      =  2.0711

------------------------------------------------------------------------------
         try |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         trx |   .9516457   .2091752     4.55   0.000     .5365444    1.366747
       trint |   .9964956   .0065131   153.00   0.000     .9835706    1.009421
------------------------------------------------------------------------------

. 
. ********** DISPLAY KEY RESULTS **********
. 
. * Table 4.3
. estimates table olsusual olsrobust wlsusual wlsrobust glsusual glsrobust, /*
>   */ se stats(N r2) b(%7.3f) keep(_cons x)

-------------------------------------------------------------------------
    Variable | olsus~l   olsro~t   wlsus~l   wlsro~t   glsus~l   glsro~t
-------------+-----------------------------------------------------------
       _cons |   2.213     2.213     1.060     1.060     0.996     0.996
             |   0.823     0.820     0.150     0.051     0.007     0.008
           x |   0.979     0.979     0.957     0.957     0.952     0.952
             |   0.178     0.275     0.190     0.232     0.209     0.208
-------------+-----------------------------------------------------------
           N | 100.000   100.000   100.000   100.000   100.000   100.000
          r2 |   0.236     0.236     0.205     0.205     0.174     0.174
-------------------------------------------------------------------------
                                                           legend: b/se

. 
. * Minor typo in Table 4.3:
. * for GLS Constant has robust s.e. of [0.008] not [0.006]
. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma04p1wls.txt
  log type:  text
 closed on:  17 May 2005, 13:41:48
----------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma04p2qreg.txt
  log type:  text
 opened on:  17 May 2005, 13:43:21

. 
. ********** OVERVIEW OF MMA04P2QREG.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 4.6.4 pages 88-90
. * Quantile Regression analysis.
. * (1) Quantile regression estimates for different quantiles
. * (2) Figure 4.1: Quantile Slope Coefficient Estimates as Quantile Varies
. * (3) Figure 4.2: Quantile Regression Lines as Quantile Varies
. 
. * To run this program you need data file
. *    qreg0902.dta
. * or for programs other than Stata use qreg0902.asc
. 
. * Step (3) takes a long time due to bootstrap to get standard errors.
. * To speed up the program reduce the number of repetitions in sqreg
. * But any final results should use a large number of bootstraps
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
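The quantile regressions in this program are computed with qreg, which chooses the line minimizing a sum of asymmetrically weighted absolute residuals (the check function); the "Raw sum of deviations" and "Min sum of deviations" it reports are that objective without and with the regressor. The figures in the log are consistent with Pseudo R2 = 1 - Min/Raw: for the 0.1 quantile, 1 - 2932.443/2936.097 is about 0.0012. A minimal Python sketch for small samples follows; it is not how Stata computes the estimates (qreg iterates, as its output shows). It relies instead on the fact that, with an intercept and one regressor taking at least two distinct values, some optimal line passes through two data points, so it simply tries every pair. `check_loss`, `quantile_fit` and `pseudo_r2` are hypothetical names.

```python
from itertools import combinations


def check_loss(residuals, tau):
    """Sum of tau*r over r >= 0 plus (tau - 1)*r over r < 0."""
    return sum(r * (tau - (r < 0)) for r in residuals)


def quantile_fit(y, x, tau):
    """Exact tau-th quantile regression of y on (1, x); returns (loss, b0, b1).

    Enumerates the lines through every pair of points with distinct x, which
    is O(n^3) and only meant for small illustrative samples.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a candidate solution
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = check_loss([yk - b0 - b1 * xk for xk, yk in zip(x, y)], tau)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best


def pseudo_r2(y, x, tau):
    """1 - (minimized loss with x) / (minimized loss with an intercept only)."""
    # With only an intercept, some optimum sits at one of the observed y values.
    raw = min(check_loss([yk - c for yk in y], tau) for c in y)
    fitted, _, _ = quantile_fit(y, x, tau)
    return 1 - fitted / raw


if __name__ == "__main__":
    # Made-up data with one large outlier in y
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [0.0, 1.0, 2.0, 3.0, 100.0]
    for tau in (0.1, 0.5, 0.9):
        loss, b0, b1 = quantile_fit(y, x, tau)
        print(f"tau={tau}: b0={b0:.3f} b1={b1:.3f} loss={loss:.3f} "
              f"pseudo R2={pseudo_r2(y, x, tau):.3f}")
```

In the made-up data the median line is y = x, untouched by the outlier, which illustrates why quantile slopes can differ from the OLS slope reported in the log.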


. 
. ********** DATA DESCRIPTION **********
. 
. * The data from World Bank 1997 Vietnam Living Standards Survey
. * are described in chapter 4.6.4.
. * A larger sample from this survey is studied in Chapter 24.7
. 
. ********** READ DATA, TRANSFORM and SAMPLE SELECTION **********
. 
. use qreg0902
. describe

Contains data from qreg0902.dta
  obs:         5,999
 vars:             9                          19 Sep 2002 21:45
 size:       191,968 (98.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             byte   %8.0g                  Gender of HH.head (1:M;2:F)
age             int    %8.0g                  Age of household head
educyr98        float  %9.0g                  schooling year of HH.head
farm            float  %9.0g       loaiho     Type of HH (1:farm; 0:nonfarm)
urban98         byte   %8.0g       urban      1:urban 98; 0:rural 98
hhsize          long   %12.0g                 Household size
lhhexp1         float  %9.0g
lhhex12m        float  %9.0g
lnrlfood        float  %9.0g
-------------------------------------------------------------------------------
Sorted by:

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |      5999    1.270712    .4443645          1          2
         age |      5999    48.01284     13.7702         16         95
    educyr98 |      5999    7.094419    4.416092          0         22
        farm |      5999    .5730955    .4946694          0          1
     urban98 |      5999    .2883814    .4530472          0          1
-------------+--------------------------------------------------------
      hhsize |      5999    4.752292    1.954292          1         19
     lhhexp1 |      5999    9.341561    .6877458   6.543108   12.20242
    lhhex12m |      5006    6.310585    1.593083          0   12.36325
    lnrlfood |      5999    8.679536    .5368118   6.356364   11.38385

. 
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile sex age educyr98 farm urban98 hhsize lhhexp1 lhhex12m lnrlfood /*
>    */ using qreg0902.asc, replace

. 
. * drop zero observations for medical expenditures
. drop if lhhex12m == .
(993 observations deleted)
. 
. * lhhexp1 is natural logarithm of household total expenditure
. * lhhex12m is natural logarithm of household medical expenditure
. gen lntotal = lhhexp1
. gen lnmed = lhhex12m
. label variable lntotal "Log household total expenditure"
. label variable lnmed "Log household medical expenditure"
. describe

Contains data from qreg0902.dta
  obs:         5,006
 vars:            11                          19 Sep 2002 21:45
 size:       200,240 (98.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             byte   %8.0g                  Gender of HH.head (1:M;2:F)
age             int    %8.0g                  Age of household head
educyr98        float  %9.0g                  schooling year of HH.head
farm            float  %9.0g       loaiho     Type of HH (1:farm; 0:nonfarm)
urban98         byte   %8.0g       urban      1:urban 98; 0:rural 98
hhsize          long   %12.0g                 Household size
lhhexp1         float  %9.0g
lhhex12m        float  %9.0g
lnrlfood        float  %9.0g
lntotal         float  %9.0g                  Log household total expenditure
lnmed           float  %9.0g                  Log household medical expenditure
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |      5006    1.269676     .443836          1          2
         age |      5006    48.06133    13.79974         18         95
    educyr98 |      5006    7.147956    4.333304          0         21

        farm |      5006    .5679185    .4954151          0          1
     urban98 |      5006    .2920495    .4547504          0          1
-------------+--------------------------------------------------------
      hhsize |      5006    4.832601     1.95257          1         19
     lhhexp1 |      5006    9.370402    .6726841   6.543108   12.20242
    lhhex12m |      5006    6.310585    1.593083          0   12.36325
    lnrlfood |      5006    8.697963    .5309517   6.356364   11.38385
     lntotal |      5006    9.370402    .6726841   6.543108   12.20242
-------------+--------------------------------------------------------
       lnmed |      5006    6.310585    1.593083          0   12.36325

. 
. ********* ANALYSIS: QUANTILE REGRESSION **********
. 
. * (0) OLS
. reg lnmed lntotal

      Source |       SS       df       MS              Number of obs =    5006
-------------+------------------------------           F(  1,  5004) =  311.91
       Model |  745.293239     1  745.293239           Prob > F      =  0.0000
    Residual |  11956.9671  5004  2.38948183           R-squared     =  0.0587
-------------+------------------------------           Adj R-squared =  0.0585
       Total |  12702.2603  5005  2.53791415           Root MSE      =  1.5458

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .5736545   .0324817    17.66   0.000     .5099761    .6373328
       _cons |   .9352117   .3051496     3.06   0.002     .3369847    1.533439
------------------------------------------------------------------------------

. predict pols
(option xb assumed; fitted values)
. reg lnmed lntotal, robust

Regression with robust standard errors                 Number of obs =    5006
                                                       F(  1,  5004) =  318.05
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0587
                                                       Root MSE      =  1.5458

------------------------------------------------------------------------------
             |               Robust
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .5736545   .0321665    17.83   0.000      .510594     .636715
       _cons |   .9352117    .298119     3.14   0.002     .3507677    1.519656
------------------------------------------------------------------------------

. * Bootstrap standard errors for OLS

. set seed 10101
. * bs "reg lnmed lntotal" "_b[lntotal]", reps(100)
. 
. * (1) Quantile and median regression for quantiles 0.1, 0.5 and 0.9
. * Save prediction to construct Figure 4.2.
. qreg lnmed lntotal, quant(.10)
Iteration  1:  WLS sum of weighted deviations =  3554.0793

Iteration  1: sum of abs. weighted deviations =  3555.3279
Iteration  2: sum of abs. weighted deviations =  3344.1924
Iteration  3: sum of abs. weighted deviations =  3051.7353
Iteration  4: sum of abs. weighted deviations =  2942.1274
Iteration  5: sum of abs. weighted deviations =  2939.3979
Iteration  6: sum of abs. weighted deviations =  2935.9969
Iteration  7: sum of abs. weighted deviations =  2933.0493
Iteration  8: sum of abs. weighted deviations =  2932.7763
Iteration  9: sum of abs. weighted deviations =  2932.4432
Iteration 10: sum of abs. weighted deviations =  2932.4429

.1 Quantile regression                                 Number of obs =      5006
  Raw sum of deviations  2936.097 (about 4.1743875)
  Min sum of deviations  2932.443                      Pseudo R2     =    0.0012

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .1512009   .0552584     2.74   0.006     .0428702    .2595317
       _cons |   2.825072   .5194064     5.44   0.000     1.806808    3.843336
------------------------------------------------------------------------------

. predict pqreg10
(option xb assumed; fitted values)
. qreg lnmed lntotal, quant(.5)
Iteration  1:  WLS sum of weighted deviations =  6112.8801

Iteration  1: sum of abs. weighted deviations =  6112.4546
Iteration  2: sum of abs. weighted deviations =  6098.5295
Iteration  3: sum of abs. weighted deviations =  6097.2178
Iteration  4: sum of abs. weighted deviations =  6097.1564

Median regression                                      Number of obs =      5006
  Raw sum of deviations  6324.265 (about 6.3716121)
  Min sum of deviations  6097.156                      Pseudo R2     =    0.0359

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .6210917   .0388194    16.00   0.000     .5449886    .6971948
       _cons |   .5921626   .3646869     1.62   0.104    -.1227836    1.307109
------------------------------------------------------------------------------

. predict pqreg50
(option xb assumed; fitted values)
. qreg lnmed lntotal, quant(.90)
Iteration  1:  WLS sum of weighted deviations =  3275.6073

Iteration  1: sum of abs. weighted deviations =  3279.5575
Iteration  2: sum of abs. weighted deviations =  2691.3839
Iteration  3: sum of abs. weighted deviations =  2521.5214
Iteration  4: sum of abs. weighted deviations =   2506.303
Iteration  5: sum of abs. weighted deviations =  2505.1952
Iteration  6: sum of abs. weighted deviations =  2505.1334
Iteration  7: sum of abs. weighted deviations =  2505.1314
Iteration  8: sum of abs. weighted deviations =  2505.1313

.9 Quantile regression                                 Number of obs =      5006
  Raw sum of deviations  2687.692 (about 8.2789364)
  Min sum of deviations  2505.131                      Pseudo R2     =    0.0679

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .8003569   .0517225    15.47   0.000     .6989581    .9017558
       _cons |   .6750967   .4857563     1.39   0.165    -.2771985    1.627392
------------------------------------------------------------------------------

. predict pqreg90
(option xb assumed; fitted values)
. 
. * (2) Create Figure 4.2 on page 90 first as this is easy
. graph twoway (scatter lnmed lntotal, msize(vsmall)) (lfit pqreg90 lntotal, clstyle(p2)) /*
>   */ (lfit pqreg50 lntotal, clstyle(p1)) (lfit pqreg10 lntotal, clstyle(p3)), /*
>   */ scale (1.2) plotregion(style(none)) /*
>   */ title("Regression Lines as Quantile Varies") /*
>   */ xtitle("Log Household Total Expenditure", size(medlarge)) xscale(titlegap(*5)) /*
>   */ ytitle("Log Household Medical Expenditure", size(medlarge)) yscale(titlegap(*5)) /*
>   */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
>   */ legend( label(1 "Actual Data") label(2 "90th percentile") /*
>   */ label(3 "Median") label(4 "10th percentile"))
. graph export ch4fig2QR.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch4fig2QR.wmf written in Windows Metafile format)
. 
. * (3) Create Figure 4.1 second as this is more difficult
. * Simultaneous quantile regression for quantiles 0.05, 0.10, ..., 0.90, 0.95
. * with standard errors by bootstrap - here 200 replications
. set seed 10101

. sqreg lnmed lntotal, quant(.05,.1,.15,.2,.25,.3,.35,.4,.45,.5,.55,.6,.65,.7,.75,.8,.85,.9,.95) rep > s(200) (fitting base model) (bootstrapping ..................................................................................... > .................................................................................................. > .................) Simultaneous quantile regression bootstrap(200) SEs

Number of obs = 5006 .05 Pseudo R2 = 0.0015 .10 Pseudo R2 = 0.0012 .15 Pseudo R2 = 0.0058 .20 Pseudo R2 = 0.0106 .25 Pseudo R2 = 0.0149 .30 Pseudo R2 = 0.0183 .35 Pseudo R2 = 0.0242 .40 Pseudo R2 = 0.0274 .45 Pseudo R2 = 0.0326 .50 Pseudo R2 = 0.0359 .55 Pseudo R2 = 0.0408 .60 Pseudo R2 = 0.0464 .65 Pseudo R2 = 0.0500 .70 Pseudo R2 = 0.0520 .75 Pseudo R2 = 0.0563 .80 Pseudo R2 = 0.0603 .85 Pseudo R2 = 0.0630 .90 Pseudo R2 = 0.0679 .95 Pseudo R2 = 0.0795

-----------------------------------------------------------------------------| Bootstrap lnmed | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------q5 | lntotal | .1536332 .0791236 1.94 0.052 -.0014838 .3087501 _cons | 2.095395 .7559016 2.77 0.006 .6134964 3.577293 -------------+---------------------------------------------------------------q10 | lntotal | .1512009 .085018 1.78 0.075 -.0154716 .3178734 _cons | 2.825072 .7697613 3.67 0.000 1.316002 4.334141 -------------+---------------------------------------------------------------q15 | lntotal | .2695707 .0580757 4.64 0.000 .1557168 .3834245 _cons | 2.231293 .5429047 4.11 0.000 1.166962 3.295624 -------------+---------------------------------------------------------------q20 | lntotal | .3552251 .0504688 7.04 0.000 .2562841 .4541662 _cons | 1.740233 .4649551 3.74 0.000 .8287172 2.651749 -------------+---------------------------------------------------------------q25 | lntotal | .4034632 .0421514 9.57 0.000 .3208279 .4860984 50

       _cons |   1.567055   .3844967     4.08   0.000     .8132731    2.320837
-------------+----------------------------------------------------------------
q30          |
     lntotal |   .4797723   .0478081    10.04   0.000     .3860474    .5734972
       _cons |   1.097107   .4299363     2.55   0.011     .2542435     1.93997
-------------+----------------------------------------------------------------
q35          |
     lntotal |     .52179   .0440082    11.86   0.000     .4355147    .6080652
       _cons |   .9213684   .4064355     2.27   0.023     .1245768     1.71816
-------------+----------------------------------------------------------------
q40          |
     lntotal |   .5691746   .0412824    13.79   0.000     .4882429    .6501062
       _cons |   .6808693   .3754568     1.81   0.070    -.0551906    1.416929
-------------+----------------------------------------------------------------
q45          |
     lntotal |   .6123663   .0402805    15.20   0.000     .5333989    .6913337
       _cons |   .4890392    .373467     1.31   0.190    -.2431197    1.221198
-------------+----------------------------------------------------------------
q50          |
     lntotal |   .6210917   .0414602    14.98   0.000     .5398117    .7023718
       _cons |   .5921626   .3866997     1.53   0.126    -.1659383    1.350263
-------------+----------------------------------------------------------------
q55          |
     lntotal |   .6523013     .02904    22.46   0.000     .5953701    .7092324
       _cons |   .4913988    .264271     1.86   0.063    -.0266881    1.009486
-------------+----------------------------------------------------------------
q60          |
     lntotal |   .6531127   .0321585    20.31   0.000     .5900679    .7161575
       _cons |   .6631971   .2981433     2.22   0.026     .0787056    1.247689
-------------+----------------------------------------------------------------
q65          |
     lntotal |   .6843844     .03378    20.26   0.000     .6181608    .7506079
       _cons |   .5550968   .3162769     1.76   0.079    -.0649445    1.175138
-------------+----------------------------------------------------------------
q70          |
     lntotal |    .714783   .0330755    21.61   0.000     .6499406    .7796255
       _cons |   .4732288   .3028818     1.56   0.118    -.1205524     1.06701
-------------+----------------------------------------------------------------
q75          |
     lntotal |   .7416898   .0369607    20.07   0.000     .6692306     .814149
       _cons |   .4298887   .3416755     1.26   0.208     -.239945    1.099722
-------------+----------------------------------------------------------------
q80          |
     lntotal |   .7675658   .0443925    17.29   0.000      .680537    .8545946
       _cons |   .3966887   .4132223     0.96   0.337    -.4134081    1.206785
-------------+----------------------------------------------------------------
q85          |
     lntotal |   .8009016    .056703    14.12   0.000     .6897389    .9120642
       _cons |   .3649957   .5369325     0.68   0.497    -.6876273    1.417619
-------------+----------------------------------------------------------------
q90          |
     lntotal |   .8003569   .0473557    16.90   0.000     .7075189    .8931949
       _cons |   .6750967   .4450068     1.52   0.129    -.1973116    1.547505
-------------+----------------------------------------------------------------
q95          |
     lntotal |    .767308   .0507532    15.12   0.000     .6678094    .8668066
       _cons |   1.487137   .4739756     3.14   0.002     .5579371    2.416337
------------------------------------------------------------------------------

. * Test equality of slope coefficients for 25th and 75th quantiles
. test [q25]lntotal = [q75]lntotal

 ( 1)  [q25]lntotal - [q75]lntotal = 0

       F(  1,  5004) =   55.14
            Prob > F =    0.0000

. * Create vectors of slope coefficients and estimated variances
. * Code here specific for this problem
. * with single slope coefficient in 1st, 3rd, 5th, ... entry
. matrix b = e(b)
. matrix bslopevector = b[1,1]\b[1,3]\b[1,5]\b[1,7]\b[1,9]\b[1,11]\b[1,13] /*
> */ \b[1,15]\b[1,17]\b[1,19]\b[1,21]\b[1,23]\b[1,25] /*
> */ \b[1,27]\b[1,29]\b[1,31]\b[1,33]\b[1,35]\b[1,37]
. matrix V = e(V)
. matrix Vslopevector = V[1,1]\V[3,3]\V[5,5]\V[7,7]\V[9,9]\V[11,11]\V[13,13] /*
> */ \V[15,15]\V[17,17]\V[19,19]\V[21,21]\V[23,23]\V[25,25] /*
> */ \V[27,27]\V[29,29]\V[31,31]\V[33,33]\V[35,35]\V[37,37]
. matrix q = e(q1)\e(q2)\e(q3)\e(q4)\e(q5)\e(q6)\e(q7)\e(q8)\e(q9)\e(q10) /*
> */ \e(q11)\e(q12)\e(q13)\e(q14)\e(q15)\e(q16)\e(q17)\e(q18)\e(q19)
. * Convert column vectors to variables as graph handles variables
. svmat bslopevector, name(bslope)
. svmat Vslopevector, name(Vslope)
. svmat q, name(quantiles)
. gen upper = bslope1 + 1.96*sqrt(Vslope1)
(4987 missing values generated)
. gen lower = bslope1 - 1.96*sqrt(Vslope1)
(4987 missing values generated)
. * Also include OLS slope coefficient
. quietly reg lnmed lntotal
. gen bols=_b[lntotal]
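The matrix steps above take every other element of `e(b)` and `e(V)`. As the program's own comment notes, each quantile equation contributes a slope followed by an intercept, so the slopes sit in the odd positions. Below is a minimal Python sketch of the same bookkeeping. It assumes that slope/intercept ordering and uses the log's normal critical value of 1.96. The function name `slope_bands` is hypothetical. The variances are rebuilt as squared standard errors because the log prints only standard errors. The q50 and q55 values are copied from the output above.

```python
import math


def slope_bands(b, v_diag, z=1.96):
    """Pick slopes from a stacked vector ordered (slope_q1, cons_q1, slope_q2, ...)
    and return (lower, slope, upper) tuples using slope +/- z * sqrt(variance)."""
    slopes = b[0::2]          # Stata's 1st, 3rd, 5th, ... entries of e(b)
    variances = v_diag[0::2]  # matching diagonal entries of e(V)
    return [(s - z * math.sqrt(v), s, s + z * math.sqrt(v))
            for s, v in zip(slopes, variances)]


# q50 and q55 coefficients (slope, _cons, slope, _cons) from the output above.
b = [0.6210917, 0.5921626, 0.6523013, 0.4913988]
se = [0.0414602, 0.3866997, 0.02904, 0.264271]
v_diag = [s ** 2 for s in se]  # variances = squared standard errors

for lower, slope, upper in slope_bands(b, v_diag):
    print(f"{lower:.4f}  {slope:.4f}  {upper:.4f}")
```

The printed confidence limits use a t critical value (about 1.9604 here) rather than 1.96. The bands therefore differ from the printed intervals only around the fifth decimal place.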

. sum upper bslope1 lower bols

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       upper |        19    .6564067    .1904354   .3087155   .9120393
     bslope1 |        19    .5641943     .209318   .1512009   .8009015
       lower |        19    .4719818    .2302585  -.0154343   .7075397
        bols |      5006    .5736545           0   .5736545   .5736545

.
. * Following produces Figure 4.1 on page 89
. graph twoway (line upper quantiles1, msize(vtiny) mstyle(p2) clstyle(p1) clcolor(gs12)) /*
> */ (line bslope1 quantiles1, msize(vtiny) mstyle(p1) clstyle(p1)) /*
> */ (line lower quantiles1, msize(vtiny) mstyle(p2) clstyle(p1) clcolor(gs12)) /*
> */ (line bols quantiles1, msize(vtiny) mstyle(p3) clstyle(p2)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Slope Estimates as Quantile Varies") /*
> */ xtitle("Quantile", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Slope and confidence bands", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Quantile slope coefficient") /*
> */ label(3 "Lower 95% confidence band") label(4 "OLS slope coefficient") )

. graph export ch4fig1QR.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch4fig1QR.wmf written in Windows Metafile format)

.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma04p2qreg.txt
  log type:  text
 closed on:  17 May 2005, 13:51:21
----------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma04p3iv.txt
  log type:  text
 opened on:  17 May 2005, 13:44:29

.
. ********** OVERVIEW OF MMA04P3IV.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.8.8 pages 102-3
. * Instrumental variables analysis.
. * (1) IV Regression (with robust s.e.'s though not needed here for iid error).
. * (2) Table 4.4
. * using generated data (see below)
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA and SUMMARIZE **********
.
. * Model is
. * y = b1 + b2*x + u
. * x = c1 + c2*z + v
. * z ~ N[2,1]
. * where b1=0, b2=0.5, c1=0 and c2=1
. * and u and v are joint normal (0,0,1,1,0.8)
.
. * OLS of y on x is inconsistent as x is correlated with u
. * Instead need to do IV with instrument z for x
. * Also try using zcube (z cubed) as an alternative instrument
.
. set seed 10001
. set obs 10000
obs was 0, now 10000
. scalar b1 = 0
. scalar b2 = 0.5
. scalar c1 = 0
. scalar c2 = 1
.
. * Generate errors u and v
. * Use fact that u is N(0,1)
. * and v | u is N(0 + (.8/1)(u - 0), 1 - .8x.8/1 = 0.36)
. gen u = 1*invnorm(uniform())
. gen muvgivnu = 0.8*u
. gen v = 1*(muvgivnu+sqrt(0.36)*invnorm(uniform()))
.
. * Generate instrument z (which is purely random)
. gen z = 2 + 1*invnorm(uniform())
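The design stated in the comments above fixes the population moments. It assumes z is independent of (u, v), Var(u) = Var(v) = Var(z) = 1, Cov(u, v) = 0.8, b2 = 0.5 and c2 = 1. Under these assumptions the moments show what OLS and IV should converge to. The derivation is a reading aid worked out from those stated parameters and is not part of the original program.

```latex
\begin{aligned}
\operatorname{Var}(x) &= c_2^2\operatorname{Var}(z) + \operatorname{Var}(v) = 1 + 1 = 2,\\
\operatorname{Cov}(x,u) &= \operatorname{Cov}(v,u) = 0.8, \qquad \operatorname{Cov}(z,u) = 0,\\
\operatorname{plim}\hat b_{2,\mathrm{OLS}} &= b_2 + \frac{\operatorname{Cov}(x,u)}{\operatorname{Var}(x)} = 0.5 + \frac{0.8}{2} = 0.9,\\
\operatorname{plim}\hat b_{2,\mathrm{IV}} &= \frac{\operatorname{Cov}(z,y)}{\operatorname{Cov}(z,x)}
  = b_2 + \frac{\operatorname{Cov}(z,u)}{c_2\operatorname{Var}(z)} = 0.5.
\end{aligned}
```

The same moments give Corr(x, u) = 0.8/√2 ≈ 0.566 and Corr(z, x) = 1/√2 ≈ 0.707. These are close to the sample values .5716 and .7140 in the correlation table below. They also explain why the OLS slope in step (1) comes out near .902 while IV gives .510.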

.
. * Generate regressor x which is correlated with z, and with u via v
. gen x = c1 + c2*z + v
.
. * Generate dependent variable y
. gen y = b1 + b2*x + u
.
. * Generate z-cubed. Used as an alternative instrument
. gen zcube = z*z*z
.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             7
 size:       320,000 (96.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
u               float  %9.0g
muvgivnu        float  %9.0g
v               float  %9.0g
z               float  %9.0g
x               float  %9.0g
y               float  %9.0g
zcube           float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |     10000     .003772    1.010726  -4.010302   4.267661
    muvgivnu |     10000    .0030176    .8085809  -3.208241   3.414129
           v |     10000    .0097031    1.005874  -3.992237    3.79261
           z |     10000    1.997786    1.013118  -1.895752    5.81496
           x |     10000    2.007489    1.436511  -3.139744   7.366555
-------------+--------------------------------------------------------
           y |     10000    1.007516    1.538611  -5.309155   7.794924
       zcube |     10000    14.14145    17.88016  -6.813095   196.6257

. correlate y x z u v
(obs=10000)
             |        y        x        z        u        v
-------------+---------------------------------------------
           y |   1.0000
           x |   0.8423   1.0000
           z |   0.3403   0.7140   1.0000
           u |   0.9237   0.5716   0.0107   1.0000
           v |   0.8601   0.7090   0.0124   0.8055   1.0000
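The following is a minimal pure-Python sketch of the same simulated design, for readers working outside Stata. It rests on several assumptions:
- Python's `random.Random` replaces Stata's generator, so seed 10001 does not reproduce the Stata draws.
- The sample size is raised to 100,000 so that the estimates sit closer to their limits.
- The helper names `simulate`, `mean` and `cov` are hypothetical.

The sketch computes the OLS slope, the just-identified IV slope as (dy/dz)/(dx/dz), and the two-step 2SLS slope. It then contrasts the residual standard deviation from the second-stage OLS (built from xhat) with the one built from x.

```python
import math
import random


def simulate(n, seed=10001, b1=0.0, b2=0.5, c1=0.0, c2=1.0, rho=0.8):
    """Draw (y, x, z) from the design in the program comments:
    u ~ N(0,1), v | u ~ N(rho*u, 1 - rho**2), z ~ N(2,1) independent of (u, v),
    x = c1 + c2*z + v, y = b1 + b2*x + u."""
    rng = random.Random(seed)  # Python's generator: draws will not match Stata's
    ys, xs, zs = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        z = 2.0 + rng.gauss(0.0, 1.0)
        x = c1 + c2 * z + v
        ys.append(b1 + b2 * x + u)
        xs.append(x)
        zs.append(z)
    return ys, xs, zs


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


y, x, z = simulate(100_000)
my, mx, mz = mean(y), mean(x), mean(z)

b_ols = cov(x, y) / cov(x, x)  # OLS slope of y on x: inconsistent here
b_iv = cov(z, y) / cov(z, x)   # (dy/dz) / (dx/dz): just-identified IV

# Two-step version: fitted values from x on z, then OLS of y on xhat.
g = cov(z, x) / cov(z, z)
xhat = [mx + g * (zi - mz) for zi in z]
b_2sls = cov(xhat, y) / cov(xhat, xhat)
a_2sls = my - b_2sls * mx

# Residual s.d. from the second-stage OLS (uses xhat) versus the one
# IV inference should use (uses x).
n = len(y)
s_naive = math.sqrt(sum((yi - a_2sls - b_2sls * xh) ** 2 for yi, xh in zip(y, xhat)) / (n - 2))
s_iv = math.sqrt(sum((yi - a_2sls - b_2sls * xi) ** 2 for yi, xi in zip(y, x)) / (n - 2))

print(f"OLS {b_ols:.3f}  IV {b_iv:.3f}  2SLS {b_2sls:.3f}")
print(f"residual s.d.: second-stage OLS {s_naive:.3f}, IV {s_iv:.3f}")
```

Expected results:
- OLS should come out near 0.9.
- IV and 2SLS should be near 0.5, and equal to each other up to rounding.
- The second-stage residual s.d. should be near sqrt(2.05) ≈ 1.43, against roughly 1 for the IV residuals.

That gap in residual s.d. is why the second-stage OLS standard errors in step (4) of the program are wrong.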

. correlate y x z u v, cov
(obs=10000)

             |        y        x        z        u        v
-------------+---------------------------------------------
           y |  2.36732
           x |  1.86165  2.06356
           z |  .530456   1.0391  1.02641
           u |   1.4365  .829866  .010909  1.02157
           v |  1.33119  1.02447  .012687  .818958  1.01178
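With a single regressor, both slopes follow directly from second moments. OLS is cov(x, y)/var(x), and IV with instrument z is cov(z, y)/cov(z, x). The small Python check below works only from the covariance matrix printed above. Because those entries are rounded, agreement with the regression output is to about four decimals. The check also rescales the matrix to correlations, to confirm it matches the earlier `correlate` table. The helper names `full_matrix` and `cov_to_corr` are hypothetical.

```python
import math

# Lower triangle of the covariance matrix printed by `correlate y x z u v, cov`.
NAMES = ["y", "x", "z", "u", "v"]
LOWER = [
    [2.36732],
    [1.86165, 2.06356],
    [0.530456, 1.0391, 1.02641],
    [1.4365, 0.829866, 0.010909, 1.02157],
    [1.33119, 1.02447, 0.012687, 0.818958, 1.01178],
]


def full_matrix(tri):
    """Expand a lower-triangular list of rows into a full symmetric matrix."""
    k = len(tri)
    return [[tri[max(i, j)][min(i, j)] for j in range(k)] for i in range(k)]


def cov_to_corr(s):
    """Scale a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(s[i][i]) for i in range(len(s))]
    return [[s[i][j] / (sd[i] * sd[j]) for j in range(len(s))] for i in range(len(s))]


S = full_matrix(LOWER)
R = cov_to_corr(S)
iy, ix, iz = (NAMES.index(name) for name in ("y", "x", "z"))

corr_yx = R[iy][ix]
ols_slope = S[ix][iy] / S[ix][ix]  # cov(x, y) / var(x)
iv_slope = S[iz][iy] / S[iz][ix]   # cov(z, y) / cov(z, x)
print(f"corr(y,x) = {corr_yx:.4f}, OLS = {ols_slope:.4f}, IV = {iv_slope:.4f}")
```

The ratios reproduce the `regress y x` slope (.9021522) and the `ivreg y (x = z)` slope (.5104982) up to the rounding of the printed covariances.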

. graph matrix y x z u v
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x z u v using mma04p3iv.asc, replace
.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 4.4)
. regress y x

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =24412.17
       Model |  16793.2198     1  16793.2198           Prob > F      =  0.0000
    Residual |  6877.65935  9998  .687903516           R-squared     =  0.7094
-------------+------------------------------           Adj R-squared =  0.7094
       Total |  23670.8791  9999  2.36732464           Root MSE      =   .8294

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9021522    .005774   156.24   0.000      .890834    .9134704
       _cons |  -.8035441    .014253   -56.38   0.000    -.8314827   -.7756054
------------------------------------------------------------------------------

. regress y x, robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) =24780.49
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7094
                                                       Root MSE      =   .8294

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9021522   .0057309   157.42   0.000     .8909184    .9133859
       _cons |  -.8035441   .0141056   -56.97   0.000    -.8311939   -.7758942
------------------------------------------------------------------------------

. estimates store olswrong
.
. * (2) IV with instrument z is consistent and efficient (second column of Table 4.4)
. ivreg y (x = z)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 2728.97
       Model |  13628.1781     1  13628.1781           Prob > F      =  0.0000
    Residual |   10042.701  9998    1.004471           R-squared     =  0.5757
-------------+------------------------------           Adj R-squared =  0.5757
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.0022

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5104982   .0097723    52.24   0.000     .4913426    .5296538
       _cons |   -.017303   .0220296    -0.79   0.432    -.0604854    .0258793
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------

. ivreg y (x = z), robust

IV (2SLS) regression with robust standard errors       Number of obs =   10000
                                                       F(  1,  9998) = 2670.19
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5757
                                                       Root MSE      =  1.0022

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5104982   .0098792    51.67   0.000     .4911329    .5298635
       _cons |   -.017303   .0220785    -0.78   0.433    -.0605813    .0259752
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------

. estimates store iv
.
. * (3) IV estimator in (2) can be computed by
. * regress y on z gives dy/dz
. * regress x on z gives dx/dz
. * and divide the two
. regress y z

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 1309.44
       Model |  2741.16635     1  2741.16635           Prob > F      =  0.0000
    Residual |  20929.7128  9998  2.09338995           R-squared     =  0.1158
-------------+------------------------------           Adj R-squared =  0.1157
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.4469

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    .516808   .0142819    36.19   0.000     .4888126    .5448035
       _cons |  -.0249553    .031991    -0.78   0.435    -.0876642    .0377535
------------------------------------------------------------------------------

. matrix byonz = e(b)
. regress x z

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =10396.43
       Model |  10518.3341     1  10518.3341           Prob > F      =  0.0000
    Residual |  10115.2362  9998  1.01172597           R-squared     =  0.5098
-------------+------------------------------           Adj R-squared =  0.5097
       Total |  20633.5703  9999  2.06356339           Root MSE      =  1.0058

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    1.01236   .0099287   101.96   0.000     .9928979    1.031822
       _cons |  -.0149899     .02224    -0.67   0.500    -.0585847     .028605
------------------------------------------------------------------------------

. matrix bxonz = e(b)
. matrix ivfirstprinciples = byonz[1,1]/bxonz[1,1]
. matrix list byonz

byonz[1,2]
             z       _cons
y1   .51680804  -.02495533

. matrix list bxonz

bxonz[1,2]
             z       _cons
y1   1.0123602  -.01498985

. matrix list ivfirstprinciples

symmetric ivfirstprinciples[1,1]
           c1
r1   .5104982

.
. * (4) IV can be computed as 2SLS, but wrong standard errors
. * (third column of Table 4.4)
. * (4A) OLS of x on z gives xhat
. regress x z

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =10396.43
       Model |  10518.3341     1  10518.3341           Prob > F      =  0.0000
    Residual |  10115.2362  9998  1.01172597           R-squared     =  0.5098
-------------+------------------------------           Adj R-squared =  0.5097
       Total |  20633.5703  9999  2.06356339           Root MSE      =  1.0058

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    1.01236   .0099287   101.96   0.000     .9928979    1.031822
       _cons |  -.0149899     .02224    -0.67   0.500    -.0585847     .028605
------------------------------------------------------------------------------

. predict xhat, xb
. * (4B) OLS of y on xhat gives IV but wrong standard errors
. regress y xhat

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 1309.44
       Model |  2741.16636     1  2741.16636           Prob > F      =  0.0000
    Residual |  20929.7127  9998  2.09338995           R-squared     =  0.1158
-------------+------------------------------           Adj R-squared =  0.1157
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.4469

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xhat |   .5104982   .0141075    36.19   0.000     .4828446    .5381518
       _cons |   -.017303   .0318026    -0.54   0.586    -.0796425    .0450364
------------------------------------------------------------------------------

. regress y xhat, robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) = 1271.86
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1158
                                                       Root MSE      =  1.4469

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xhat |   .5104982   .0143144    35.66   0.000      .482439    .5385574
       _cons |   -.017303   .0319207    -0.54   0.588    -.0798741     .045268
------------------------------------------------------------------------------

. estimates store twosls
.
. * (5) IV with instrument zcube is consistent but inefficient
. * (last column of Table 4.4)
. ivreg y (x = zcube)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 2001.31
       Model |  13598.1181     1  13598.1181           Prob > F      =  0.0000
    Residual |   10072.761  9998   1.0074776           R-squared     =  0.5745
-------------+------------------------------           Adj R-squared =  0.5744
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.0037

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5086427   .0113699    44.74   0.000     .4863555    .5309299
       _cons |  -.0135782   .0249344    -0.54   0.586    -.0624546    .0352982
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   zcube
------------------------------------------------------------------------------

. ivreg y (x = zcube), robust

IV (2SLS) regression with robust standard errors       Number of obs =   10000
                                                       F(  1,  9998) = 1894.15
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5745
                                                       Root MSE      =  1.0037

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5086427   .0116871    43.52   0.000     .4857337    .5315517
       _cons |  -.0135782   .0253208    -0.54   0.592     -.063212    .0360556
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   zcube
------------------------------------------------------------------------------

. estimates store ivineff
.
. ********** DISPLAY KEY RESULTS in Table 4.4 p.103 **********
.
. * Table 4.4 page 103
. estimates table olswrong iv twosls ivineff, se stats(N r2) b(%8.3f) keep(_cons x xhat)

----------------------------------------------------------
    Variable | olswrong       iv        twosls    ivineff
-------------+--------------------------------------------
       _cons |   -0.804     -0.017     -0.017     -0.014
             |    0.014      0.022      0.032      0.025
           x |    0.902      0.510                 0.509
             |    0.006      0.010                 0.012
        xhat |                          0.510
             |                          0.014
-------------+--------------------------------------------
           N |  1.0e+04    1.0e+04    1.0e+04    1.0e+04
          r2 |    0.709      0.576      0.116      0.574
----------------------------------------------------------
                                              legend: b/se

.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section2\mma04p3iv.txt
  log type:  text
 closed on:  17 May 2005, 13:44:41
----------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma04p4ivweak.txt
  log type:  text
 opened on:  17 May 2005, 13:45:59
.
. ********** OVERVIEW OF MMA04P4IVWEAK.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.9.5 pages 110-2
. * IV regression with potentially weak instruments
. * (1) Compares OLS and IV estimation of log-wages on schooling regression
. * where schooling, experience and experience-squared are endogenous
. * and proximity to 4-year college, age and age-squared are instruments
. * so model is just-identified.
. * (2) Verifies that errors can be treated as homoskedastic here
. * (3) Looks at weak instruments
. * (A) instrument relevance: whether Shea's partial R-squared is low
. * (B) finite sample bias: whether first-stage partial F is low
. * (4) Provides Table 4.5
. * (5) Does more analysis than reported in the book
.
. * To run this program you need data and dictionary files
. * DATA66.dat ASCII data set
. * DATA66.dct Stata dictionary that labels variables
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set memory 20m
(20480k)
. set linesize 150 /* Permits long inputline commands with delimit */
.
. ********** ORIGINAL DATA SOURCE **********
.
. * Program mma4p4ivweak.do based on Kling Analys66.d0 September 2003
. * written for Jeffrey R. Kling (2001) "Interpreting Instrumental Variables Estimates
. * of the Return to Schooling", Journal of Business and Economic Statistics,
. * July 2001, 19 (3), pp.358-364.
. * This program focuses on Columns (1) and (2) of Kling's Table 1 on p.359
. * in turn based on
. * David Card (1995), "Using Geographic Variation in College Proximity to
. * Estimate the Returns to Schooling", in
. * Aspects of Labor Market Behavior: Essays in Honor of John Vanderkamp,
. * eds. L.N. Christofides et al., Toronto: University of Toronto Press, pp.201-221.
.
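Point (3)(B) above screens for weak instruments with the first-stage partial F. The partial F compares the first-stage residual sum of squares with and without the excluded instruments. With a single excluded instrument it equals the squared t statistic on that instrument. The minimal Python sketch below uses the `regress x z` first stage from the preceding mma04p3iv log, where the restricted model is a constant only. It covers only the one-endogenous-regressor case. This program has three endogenous regressors, for which per-equation Shea partial R-squared and partial F statistics are needed. The cutoff of 10 is the commonly cited Staiger and Stock rule of thumb, supplied here as an assumption rather than taken from the program. The function names are hypothetical.

```python
def partial_f(rss_restricted, rss_unrestricted, q, df_resid):
    """F statistic for dropping q excluded instruments from a first-stage
    regression: ((RSS_r - RSS_u) / q) / (RSS_u / df_resid)."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)


def flag_weak(f_stat, threshold=10.0):
    """Rule-of-thumb screen only (threshold assumed), not a formal test."""
    return f_stat < threshold


# First stage `regress x z` from the mma04p3iv log. The restricted model has
# only a constant, so its RSS equals the total sum of squares.
f = partial_f(rss_restricted=20633.5703, rss_unrestricted=10115.2362, q=1, df_resid=9998)

# With one excluded instrument, F equals the squared t statistic on z.
t = 1.01236 / 0.0099287

print(f"partial F = {f:.2f}, t^2 = {t * t:.1f}, weak? {flag_weak(f)}")
```

The computed F matches the F(1, 9998) = 10396.43 reported for that regression. An instrument as strong as this is far from any weak-instrument concern. The Card data analyzed below are where the check becomes informative.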

. ********** READ IN DATA and SUMMARIZE **********
.
. infile using DATA66.dct, using(DATA66.dat)

dictionary using DATA66.dat {
  _column(1)    id        %8f   "ID CODE (r0000100) n= 5225 mean= 2613.000 min= 1 max= 5225 "
  _column(9)    black     %3f   "Race (r0002300) n= 5225 mean= 1.296 min= 1 max=3 "
  _column(13)   imigrnt   %3f   "Was r's brthpl in the US? (r0038000) n=4965 mean=0.98 mn=0 mx=1 "
  _column(17)   hhead     %8f   "Person R lived w/ @ age 14 (r0039700) n= 5213 mean=1.92 mn=1 mx=9"
  _column(28)   mag_14    %10f  "Were magznes avail at age 14 (r0039900) n=5167 mean=0.69 mn=0 mx=1 "
  _column(40)   news_14   %10f  "Were nwspaprs avail at age 14 (r0040000) n=5195 mean=0.85 mn=0 mx=1"
  _column(52)   lib_14    %10f  "Were lib-card avail at age14 (r0040100) n=5204 mean=0.66 mn=0 mx=1 "
  _column(63)   num_sib   %8f   "Tot # sibs r 66 (r0056900) n=5168 mean=3.408 min=0 max=18"
  _column(72)   fgrade    %8f   "Hgc by father, 66 (r0063100) n=3930 mean=9.937 min=0 max=18"
  _column(81)   mgrade    %8f   "Hgc by mother, 66 (r0063300) n=4573 mean=10.25 min=0 max=18"
  _column(90)   iq        %8f   "Iq_score (r0171100) n= 3369 mean=101.582 min=50 max=158 "
  _column(99)   bdate     %8f   "Birthdate - STATA formatted "
  _column(108)  gfill76   %8f   "'76 Grade level, some values filled from prevs reports"
  _column(117)  wt76      %8f   "'76 Weight "
  _column(126)  grade76   %8f   "'76 Grade level"
  _column(135)  grade66   %8f   "'66 Grade level"
  _column(144)  age66     %8f   "Age reported by screener (r0002200) "
  _column(153)  smsa66    %8f   "If lived in SMSA in 1966 (r0002455=1,2)"
  _column(162)  region    %8f   "Census Region in 1966 (r0002900) "
  _column(171)  smsa76    %8f   "If lived in SMSA in 1976 (r0437515=1,2)"
  _column(180)  col4      %8f   "If any 4-year college nearby (r0004000!=4) "
  _column(189)  mcol4     %8f   "If male 4-year college nearby (r0004100=1,2) "
  _column(198)  col4pub   %8f   "If public 4-year college nearby (r0004000=2,3)"
  _column(207)  south76   %1f   "If lived in South in 1976 (r0437511=1) "
  _column(209)  wage76    %10f  "'76 Wage"
  _column(219)  exp76     %8f   "'76 experience, (10 + age66) - grade76 - 6)"
  _column(230)  expsq76   %10f  "'76 experience, exp76 ^2/100 "
  _column(243)  age76     %8f   "'76 age (age66 +10) "
  _column(252)  agesq76   %8f   "'76 age squared (age76^2) "
  _column(261)  reg1      %8f   "region==NE"
  _column(270)  reg2      %8f   "If lived in Region 2 (region= MidAtl)"
  _column(279)  reg3      %8f   "If lived in Region 3 (region= ENC) "
  _column(288)  reg4      %8f   "If lived in Region 4 (region= WNC) "
  _column(297)  reg5      %8f   "If lived in Region 5 (region= SA ) "
  _column(306)  reg6      %8f   "If lived in Region 6 (region= ESC) "
  _column(315)  reg7      %8f   "If lived in Region 7 (region= WSC) "
  _column(324)  reg8      %8f   "If lived in Region 8 (region= M ) "
  _column(333)  reg9      %8f   "If lived in Region 9 (region= P ) "
  _column(342)  momdad14  %8f   "If lived with both parents at age 14 "
  _column(351)  sinmom14  %8f   "If lived with mother only at age 14 "
  _column(360)  nodaded   %1f   "If father has no formal education "
  _column(362)  nomomed   %1f   "If mother has no formal education "
  _column(365)  daded     %10f  "Mean grade level of father "
  _column(377)  momed     %10f  "Mean grade level of mother "
  _column(396)  famed     %8f   "Father's and mother's education "
  _column(405)  famed1    %8f   "If mgrade> 12 & fgrade> 12 (famed=1) "
  _column(414)  famed2    %8f   "If mgrade>=12 & fgrade>=12 (famed=2) "
  _column(423)  famed3    %8f   "If mgrade==12 & fgrade==12 (famed=3) "
  _column(432)  famed4    %8f   "If mgrade>=12 & fgrade==-1 (famed=4) "
  _column(441)  famed5    %8f   "If fgrade>=12 (famed=5) "
  _column(450)  famed6    %8f   "If mgrade>=12 & fgrade> -1 (famed=6) "
  _column(459)  famed7    %8f   "If mgrade>=9 & fgrade>=9 (famed=7) "
  _column(468)  famed8    %8f   "If mgrade> -1 & fgrade> -1 (famed=8) "
  _column(477)  famed9    %8f   "If famed not in range (1-8)"
  _column(486)  int76     %8f   "If wt76 not missing "
  _column(495)  age1415   %8f   "If in age group =14-15"
  _column(504)  age1617   %8f   "If in age group =16-17"
  _column(513)  age1819   %8f   "If in age group =18-19"
  _column(522)  age2021   %8f   "If in age group =20-21"
  _column(531)  age2224   %8f   "If in age group =20-24"
  _column(540)  cage1415  %8f   "If in age group =14,15 and lived near college"
  _column(549)  cage1617  %8f   "If in age group =16,17 and lived near college"
  _column(558)  cage1819  %8f   "If in age group =18,19 and lived near college"
  _column(567)  cage2021  %8f   "If in age group =20,21 and lived near college"
  _column(576)  cage2224  %8f   "If in age group =20-24 and lived near college"
  _column(585)  cage66    %8f   "Age in 66 and whether lived near college "
  _column(594)  a1        %8f   "If age in 66 = 14 (age66= 14)"
  _column(603)  a2        %8f   "If age in 66 = 15 (age66= 15)"
  _column(612)  a3        %8f   "If age in 66 = 16 (age66= 16)"
  _column(621)  a4        %8f   "If age in 66 = 17 (age66= 17)"
  _column(630)  a5        %8f   "If age in 66 = 18 (age66= 18)"
  _column(639)  a6        %8f   "If age in 66 = 19 (age66= 19)"
  _column(648)  a7        %8f   "If age in 66 = 20 (age66= 20)"
  _column(657)  a8        %8f   "If age in 66 = 21 (age66= 21)"
  _column(666)  a9        %8f   "If age in 66 = 22 (age66= 22)"
  _column(675)  a10       %8f   "If age in 66 = 23 (age66= 23)"
  _column(684)  a11       %8f   "If age in 66 = 24 (age66= 24)"
  _column(693)  ca1       %8f   "Not lived near college in 66"
  _column(702)  ca2       %8f   "If age in 66 = 14 and lived near college"
  _column(711)  ca3       %8f   "If age in 66 = 15 and lived near college"
  _column(720)  ca4       %8f   "If age in 66 = 16 and lived near college"
  _column(729)  ca5       %8f   "If age in 66 = 17 and lived near college"
  _column(738)  ca6       %8f   "If age in 66 = 18 and lived near college"
  _column(747)  ca7       %8f   "If age in 66 = 19 and lived near college"
  _column(756)  ca8       %8f   "If age in 66 = 20 and lived near college"
  _column(765)  ca9       %8f   "If age in 66 = 21 and lived near college"
  _column(774)  ca10      %2f   "If age in 66 = 22 and lived near college"
  _column(777)  ca11      %2f   "If age in 66 = 23 and lived near college"
  _column(780)  ca12      %8f   "If age in 66 = 24 and lived near college"
  _column(782)  g25       %12f  "Grade level when 25 years old "
  _column(795)  g25i      %12f  "If =g25 and intrvwed in year used for determining g25 "
  _column(819)  intmo66   %8f   "Intvw month in 1966, used to identify cases incl by CARD"
  _column(828)  nlsflt    %8f   "Flag to identify if the case was used by CARD"
  _column(837)  nsib      %8f   "Number of siblings "
  _column(846)  ns1       %8f   "If number of siblings = 0 (nsib= 0)"
  _column(855)  ns2       %8f   "If number of siblings = 2 (nsib= 2)"
  _column(864)  ns3       %8f   "If number of siblings = 3 (nsib= 3)"
  _column(873)  ns4       %8f   "If number of siblings = 4 (nsib= 4)"
  _column(882)  ns5       %8f   "If number of siblings = 6 (nsib= 6)"
  _column(891)  ns6       %8f   "If number of siblings = 9 (nsib= 9)"
  _column(900)  ns7       %8f   "If number of siblings =18 (nsib=18)"
}

(5226 observations read)

. * save DATA66, replace
. desc

Contains data
  obs:         5,226
 vars:           101
 size:     2,132,208 (89.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g                  ID CODE (r0000100) n= 5225 mean= 2613.000 min= 1 max= 5225
black           float  %9.0g                  Race (r0002300) n= 5225 mean= 1.296 min= 1 max=3
imigrnt         float  %9.0g                  Was r's brthpl in the US? (r0038000) n=4965 mean=0.98 mn=0 mx=1
hhead           float  %9.0g                  Person R lived w/ @ age 14 (r0039700) n= 5213 mean=1.92 mn=1 mx=9
mag_14          float  %9.0g                  Were magznes avail at age 14 (r0039900) n=5167 mean=0.69 mn=0 mx=1
news_14         float  %9.0g                  Were nwspaprs avail at age 14 (r0040000) n=5195 mean=0.85 mn=0 mx=1
lib_14          float  %9.0g                  Were lib-card avail at age14 (r0040100) n=5204 mean=0.66 mn=0 mx=1
num_sib         float  %9.0g                  Tot # sibs r 66 (r0056900) n=5168 mean=3.408 min=0 max=18
fgrade          float  %9.0g                  Hgc by father, 66 (r0063100) n=3930 mean=9.937 min=0 max=18
mgrade          float  %9.0g                  Hgc by mother, 66 (r0063300) n=4573 mean=10.25 min=0 max=18
iq              float  %9.0g                  Iq_score (r0171100) n= 3369 mean=101.582 min=50 max=158
bdate           float  %9.0g                  Birthdate - STATA formatted
gfill76         float  %9.0g                  '76 Grade level, some values filled from prevs reports
wt76            float  %9.0g                  '76 Weight
grade76         float  %9.0g                  '76 Grade level
grade66         float  %9.0g                  '66 Grade level
age66           float  %9.0g                  Age reported by screener (r0002200)
smsa66          float  %9.0g                  If lived in SMSA in 1966 (r0002455=1,2)
region          float  %9.0g                  Census Region in 1966 (r0002900)
smsa76          float  %9.0g                  If lived in SMSA in 1976 (r0437515=1,2)
col4            float  %9.0g                  If any 4-year college nearby (r0004000!=4)
mcol4           float  %9.0g                  If male 4-year college nearby (r0004100=1,2)
col4pub         float  %9.0g                  If public 4-year college nearby (r0004000=2,3)
south76         float  %9.0g                  If lived in South in 1976 (r0437511=1)
wage76          float  %9.0g                  '76 Wage
exp76           float  %9.0g                  '76 experience, (10 + age66) - grade76 - 6)
expsq76         float  %9.0g                  '76 experience, exp76 ^2/100
age76           float  %9.0g                  '76 age (age66 +10)
agesq76         float  %9.0g                  '76 age squared (age76^2)
reg1            float  %9.0g                  region==NE
reg2            float  %9.0g                  If lived in Region 2 (region= MidAtl)
reg3            float  %9.0g                  If lived in Region 3 (region= ENC)
reg4            float  %9.0g                  If lived in Region 4 (region= WNC)
reg5            float  %9.0g                  If lived in Region 5 (region= SA )
reg6            float  %9.0g                  If lived in Region 6 (region= ESC)
reg7            float  %9.0g                  If lived in Region 7 (region= WSC)
reg8            float  %9.0g                  If lived in Region 8 (region= M )
reg9            float  %9.0g                  If lived in Region 9 (region= P )
momdad14        float  %9.0g                  If lived with both parents at age 14
sinmom14        float  %9.0g                  If lived with mother only at age 14
nodaded         float  %9.0g                  If father has no formal education
nomomed         float  %9.0g                  If mother has no formal education
daded           float  %9.0g                  Mean grade level of father
momed           float  %9.0g                  Mean grade level of mother
famed           float  %9.0g                  Father's and mother's education
famed1          float  %9.0g                  If mgrade> 12 & fgrade> 12 (famed=1)
famed2          float  %9.0g                  If mgrade>=12 & fgrade>=12 (famed=2)
famed3          float  %9.0g                  If mgrade==12 & fgrade==12 (famed=3)
famed4          float  %9.0g                  If mgrade>=12 & fgrade==-1 (famed=4)
famed5          float  %9.0g                  If fgrade>=12 (famed=5)
famed6          float  %9.0g                  If mgrade>=12 & fgrade> -1 (famed=6)
famed7          float  %9.0g                  If mgrade>=9 & fgrade>=9 (famed=7)
famed8          float  %9.0g                  If mgrade> -1 & fgrade> -1 (famed=8)
famed9          float  %9.0g                  If famed not in range (1-8)
int76           float  %9.0g                  If wt76 not missing
age1415         float  %9.0g                  If in age group =14-15
age1617         float  %9.0g                  If in age group =16-17
age1819         float  %9.0g                  If in age group =18-19
age2021         float  %9.0g                  If in age group =20-21
age2224         float  %9.0g                  If in age group =20-24
cage1415        float  %9.0g                  If in age group =14,15 and lived near college
cage1617        float  %9.0g                  If in age group =16,17 and lived near college
cage1819        float  %9.0g                  If in age group =18,19 and lived near college
cage2021        float  %9.0g                  If in age group =20,21 and lived near college
cage2224        float  %9.0g                  If in age group =20-24 and lived near college
cage66          float  %9.0g                  Age in 66 and whether lived near college
a1              float  %9.0g                  If age in 66 = 14 (age66= 14)
a2              float  %9.0g                  If age in 66 = 15 (age66= 15)
a3              float  %9.0g                  If age in 66 = 16 (age66= 16)
a4              float  %9.0g                  If age in 66 = 17 (age66= 17)
a5              float  %9.0g                  If age in 66 = 18 (age66= 18)
a6              float  %9.0g                  If age in 66 = 19 (age66= 19)
a7              float  %9.0g                  If age in 66 = 20 (age66= 20)
a8              float  %9.0g                  If age in 66 = 21 (age66= 21)
a9              float  %9.0g                  If age in 66 = 22 (age66= 22)
a10             float  %9.0g                  If age in 66 = 23 (age66= 23)
a11             float  %9.0g                  If age in 66 = 24 (age66= 24)
ca1             float  %9.0g                  Not lived near college in 66
ca2             float  %9.0g                  If age in 66 = 14 and lived near college
ca3             float  %9.0g                  If age in 66 = 15 and lived near college
ca4             float  %9.0g                  If age in 66 = 16 and lived near college
ca5             float  %9.0g                  If age in 66 = 17 and lived near college
ca6             float  %9.0g                  If age in 66 = 18 and lived near college
ca7             float  %9.0g                  If age in 66 = 19 and lived near college
ca8             float  %9.0g                  If age in 66 = 20 and lived near college
ca9             float  %9.0g                  If age in 66 = 21 and lived near college
ca10            float  %9.0g                  If age in 66 = 22 and lived near college
ca11            float  %9.0g                  If age in 66 = 23 and lived near college
ca12            float  %9.0g                  If age in 66 = 24 and lived near college
g25             float  %9.0g                  Grade level when 25 years old
g25i            float  %9.0g                  If =g25 and intrvwed in year used for determining g25
intmo66         float  %9.0g                  Intvw month in 1966, used to identify cases incl by CARD
nlsflt          float  %9.0g                  Flag to identify if the case was used by CARD
nsib            float  %9.0g                  Number of siblings
ns1             float  %9.0g                  If number of siblings = 0 (nsib= 0)
ns2             float  %9.0g                  If number of siblings = 2 (nsib= 2)
ns3             float  %9.0g                  If number of siblings = 3 (nsib= 3)
ns4             float  %9.0g                  If number of siblings = 4 (nsib= 4)
ns5             float  %9.0g                  If number of siblings = 6 (nsib= 6)
ns6             float  %9.0g                  If number of siblings = 9 (nsib= 9)
ns7             float  %9.0g                  If number of siblings =18 (nsib=18)
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      5225        2613    1508.472          1       5225
       black |      5225    .2752153    .4466655          0          1
     imigrnt |      5225    .0237321    .1522277          0          1
       hhead |      5225   -.3783732    47.95128       -999          9
      mag_14 |      5225    .6861566    .4616275          0          1
-------------+--------------------------------------------------------
     news_14 |      5225    .8483024    .3577176          0          1
      lib_14 |      5225     .658469    .4733619          0          1
     num_sib |      5168    3.407701    2.586307          0         18
      fgrade |      3930     9.93715    3.777654          0         18
      mgrade |      4573    10.25104     3.17986          0         18
-------------+--------------------------------------------------------
          iq |      3369    101.5818    15.93225         50        158
       bdate |      5204    472926.6    31765.04     360823     521224
     gfill76 |      5225    12.78718    2.802705          0         18
        wt76 |      3695    475512.5    265188.5      98617    2582192
     grade76 |      3671    13.23018    2.747627          0         18
-------------+--------------------------------------------------------
     grade66 |      5225    10.58431    2.433696          0         18
       age66 |      5225    18.09129    3.157657         14         24
      smsa66 |      5225    .6599043    .4737864          0          1
      region |      5225    4.721722    2.300767          1          9
      smsa76 |      5225     .491866    .4999817          0          1
-------------+--------------------------------------------------------
        col4 |      5225     .691866    .4617664          0          1
       mcol4 |      5225    .6874641    .4635713          0          1
     col4pub |      5225    .5129187    .4998809          0          1
     south76 |      3695    .3964817    .4892328          0          1
      wage76 |      3078    1.658013    .4430234          0     3.1797
-------------+--------------------------------------------------------
       exp76 |      3671    8.933533    4.212664          0         25
     expsq76 |      3671    .9754971    .8778352          0       6.25
       age76 |      5225    28.09129    3.157657         24         34
     agesq76 |      5225    799.0896    182.0539        576       1156
        reg1 |      5225         .04    .1959779          0          1
-------------+--------------------------------------------------------
        reg2 |      5225    .1617225    .3682313          0          1
        reg3 |      5225    .1900478    .3923763          0          1
        reg4 |      5225    .0639234    .2446399          0          1
        reg5 |      5225    .2126316    .4092083          0          1
        reg6 |      5225    .0895694    .2855912          0          1
-------------+--------------------------------------------------------
        reg7 |      5225    .1083254    .3108206          0          1
        reg8 |      5225    .0304306    .1717855          0          1
        reg9 |      5225    .1033493    .3044437          0          1
    momdad14 |      5225    .7680383    .4221251          0          1
    sinmom14 |      5225    .1182775    .3229673          0          1
-------------+--------------------------------------------------------
     nodaded |      5225    .2478469    .4318038          0          1
     nomomed |      5225    .1247847    .3305062          0          1
       daded |      5225    9.937162    3.276134          0         18
       momed |      5225    10.25103    2.974812          0         18
       famed |      5225     6.05933    2.643855          1          9
-------------+--------------------------------------------------------
      famed1 |      5225    .0610526    .2394497          0          1
      famed2 |      5225    .0742584     .262216          0          1
      famed3 |      5225    .1144498    .3183872          0          1
      famed4 |      5225    .0474641    .2126498          0          1
      famed5 |      5225     .077512    .2674276          0          1
-------------+--------------------------------------------------------
      famed6 |      5225    .1245933    .3302888          0          1
      famed7 |      5225    .0486124     .215077          0          1
      famed8 |      5225    .2273684    .4191726          0          1
      famed9 |      5225     .224689    .4174173          0          1
       int76 |      5225     .707177    .4551014          0          1
-------------+--------------------------------------------------------
     age1415 |      5225    .2595215    .4384141          0          1
     age1617 |      5225    .2482297    .4320271          0          1
     age1819 |      5225    .1751196    .3801058          0          1
     age2021 |      5225      .11311    .3167576          0          1
     age2224 |      5225    .2040191    .4030216          0          1
-------------+--------------------------------------------------------
    cage1415 |      5225    .1755024    .3804327          0          1
    cage1617 |      5225    .1680383    .3739361          0          1
    cage1819 |      5225    .1245933    .3302888          0          1
    cage2021 |      5225    .0796172    .2707256          0          1
    cage2224 |      5225    .1441148    .3512397          0          1
-------------+--------------------------------------------------------
      cage66 |      5225    12.56115    8.785895          0         24
          a1 |      5225    .1314833    .3379605          0          1
          a2 |      5225    .1280383    .3341644          0          1
          a3 |      5225    .1326316    .3392086          0          1
          a4 |      5225    .1155981    .3197729          0          1
-------------+--------------------------------------------------------
          a5 |      5225     .098756    .2983627          0          1
          a6 |      5225    .0763636    .2656045          0          1
          a7 |      5225    .0560766    .2300915          0          1
          a8 |      5225    .0570335    .2319288          0          1
          a9 |      5225    .0666029    .2493568          0          1
-------------+--------------------------------------------------------
         a10 |      5225    .0683254    .2523275          0          1
         a11 |      5225    .0690909    .2536329          0          1
         ca1 |      5225     .308134    .4617664          0          1
         ca2 |      5225    .0876555    .2828203          0          1
         ca3 |      5225    .0878469    .2830992          0          1

-------------+-------------------------------------------------------ca4 | 5225 .0870813 .2819812 0 1 ca5 | 5225 .0809569 .2727951 0 1 ca6 | 5225 .0708134 .2565374 0 1 ca7 | 5225 .0537799 .2256044 0 1 ca8 | 5225 .0390431 .193716 0 1 -------------+-------------------------------------------------------ca9 | 5225 .0405742 .1973204 0 1 ca10 | 5225 .0465072 .2106009 0 1 ca11 | 5225 .0484211 .2146748 0 1 ca12 | 5225 12.52593 2.740455 0 18 g25 | 5225 12.53923 2.749407 0 18 -------------+-------------------------------------------------------g25i | 4148 12.77929 2.740756 0 18 intmo66 | 5225 -5.790239 128.4984 -999 12 nlsflt | 5225 .9835407 .1272459 0 1 nsib | 5225 2.818565 2.473752 0 18 ns1 | 5225 .2547368 .4357549 0 1 -------------+-------------------------------------------------------ns2 | 5225 .3534928 .4780998 0 1 ns3 | 5225 .0109091 .1038853 0 1 ns4 | 5225 .1892823 .3917702 0 1 ns5 | 5225 .135311 .3420882 0 1 ns6 | 5225 .0558852 .2297218 0 1 -------------+-------------------------------------------------------ns7 | 5225 .0003828 .0195628 0 1 . . * Define the exogenous regressors using the global macro exogregressors . global exogregressors black south76 smsa76 reg2-reg9 /* > */ smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1-famed8 . . * Write data to a text (ascii) file so can use with programs other than stata . outfile wage76 grade76 exp76 expsq76 col4 age76 agesq76 black south76 smsa76 reg2-reg9 /* > */ smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1-famed8 /* > */ using mma04p4ivweak.asc, replace . . . ********** (1) OLS AND IV ESTIMATES: COLUMNS 1 AND 2 OF KLING TABLE 1 . . * RETAIN cases for the analysis . * Here drop if missing wages or missing schooling or not at first interview . keep if wage76!=. & grade76!=. & nlsflt==1 (2216 observations deleted) . . * DESCRIBE dependent variable, regressors and instruments . desc wage76 grade76 exp76 expsq76 col4 age76 agesq76 $exogregressors


              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
wage76          float   %9.0g                 '76 Wage
grade76         float   %9.0g                 '76 Grade level
exp76           float   %9.0g                 '76 experience, (10 + age66) grade76 - 6)
expsq76         float   %9.0g                 '76 experience, exp76 ^2/100
col4            float   %9.0g                 If any 4-year college nearby (r0004000!=4)
age76           float   %9.0g                 '76 age (age66 +10)
agesq76         float   %9.0g                 '76 age squared (age76^2)
black           float   %9.0g                 Race (r0002300) n= 5225 mean= 1.296 min= 1 max=3
south76         float   %9.0g                 If lived in South in 1976 (r0437511=1)
smsa76          float   %9.0g                 If lived in SMSA in 1976 (r0437515=1,2)
reg2            float   %9.0g                 If lived in Region 2 (region= MidAtl)
reg3            float   %9.0g                 If lived in Region 3 (region= ENC)
reg4            float   %9.0g                 If lived in Region 4 (region= WNC)
reg5            float   %9.0g                 If lived in Region 5 (region= SA )
reg6            float   %9.0g                 If lived in Region 6 (region= ESC)
reg7            float   %9.0g                 If lived in Region 7 (region= WSC)
reg8            float   %9.0g                 If lived in Region 8 (region= M )
reg9            float   %9.0g                 If lived in Region 9 (region= P )
smsa66          float   %9.0g                 If lived in SMSA in 1966 (r0002455=1,2)
momdad14        float   %9.0g                 If lived with both parents at age 14
sinmom14        float   %9.0g                 If lived with mother only at age 14
nodaded         float   %9.0g                 If father has no formal education
nomomed         float   %9.0g                 If mother has no formal education
daded           float   %9.0g                 Mean grade level of father
momed           float   %9.0g                 Mean grade level of mother
famed1          float   %9.0g                 If mgrade> 12 & fgrade> 12 (famed=1)
famed2          float   %9.0g                 If mgrade>=12 & fgrade>=12 (famed=2)
famed3          float   %9.0g                 If mgrade==12 & fgrade==12 (famed=3)
famed4          float   %9.0g                 If mgrade>=12 & fgrade==-1 (famed=4)
famed5          float   %9.0g                 If fgrade>=12 (famed=5)
famed6          float   %9.0g                 If mgrade>=12 & fgrade> -1 (famed=6)
famed7          float   %9.0g                 If mgrade>=9 & fgrade>=9 (famed=7)
famed8          float   %9.0g                 If mgrade> -1 & fgrade> -1 (famed=8)

.
. * SUMMARIZE dependent variable, regressors and instruments
. sum wage76 grade76 exp76 expsq76 col4 age76 agesq76 $exogregressors

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      wage76 |  3010    1.656664     .443798          0     3.1797
     grade76 |  3010    13.26346    2.676913          1         18
       exp76 |  3010    8.856146    4.141672          0         23
     expsq76 |  3010    .9557907    .8461831          0       5.29
        col4 |  3010    .6820598    .4657535          0          1
-------------+----------------------------------------------------
       age76 |  3010     28.1196    3.137004         24         34
     agesq76 |  3010    800.5495    180.7484        576       1156
       black |  3010    .2335548    .4231624          0          1
     south76 |  3010    .4036545    .4907113          0          1
      smsa76 |  3010    .7129568    .4524571          0          1
-------------+----------------------------------------------------
        reg2 |  3010    .1607973     .367405          0          1
        reg3 |  3010    .1956811      .39679          0          1
        reg4 |  3010    .0641196    .2450066          0          1
        reg5 |  3010    .2083056     .406164          0          1
        reg6 |  3010    .0960133    .2946584          0          1
-------------+----------------------------------------------------
        reg7 |  3010    .1099668    .3129003          0          1
        reg8 |  3010    .0282392     .165683          0          1
        reg9 |  3010    .0903654    .2867522          0          1
      smsa66 |  3010    .6495017    .4772053          0          1
    momdad14 |  3010    .7893688    .4078247          0          1
-------------+----------------------------------------------------
    sinmom14 |  3010    .1006645    .3009339          0          1
     nodaded |  3010    .2292359    .4204111          0          1
     nomomed |  3010    .1172757     .321802          0          1
       daded |  3010    9.988262    3.266511          0         18
       momed |  3010    10.33675    2.987507          0         18
-------------+----------------------------------------------------
      famed1 |  3010    .0614618    .2402153          0          1
      famed2 |  3010    .0787375    .2693734          0          1
      famed3 |  3010    .1249169    .3306796          0          1
      famed4 |  3010    .0475083    .2127588          0          1

      famed5 |  3010    .0790698    .2698925          0          1
-------------+----------------------------------------------------
      famed6 |  3010    .1328904    .3395126          0          1
      famed7 |  3010    .0504983    .2190073          0          1
      famed8 |  3010    .2202658    .4144947          0          1
.
. * OLS estimates of return to schooling.
. * This regression computes schooling coeff, se for Table1 col 1 p.359
. * based on all cases (age grp 14-24) reported highest grd cmpl 76
.
. reg wage76 grade76 exp76 expsq76 $exogregressors

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 29,  2980) =   44.94
       Model |  180.320527    29  6.21794919           Prob > F      =  0.0000
    Residual |   412.32209  2980  .138363117           R-squared     =  0.3043
-------------+------------------------------           Adj R-squared =  0.2975
       Total |  592.642616  3009  .196956669           Root MSE      =  .37197

------------------------------------------------------------------------------
      wage76 |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     grade76 |    .072635   .0036984    19.64   0.000     .0653833    .0798868
       exp76 |   .0845293   .0066819    12.65   0.000     .0714277    .0976308
     expsq76 |  -.2289581   .0319499    -7.17   0.000    -.2916041   -.1663121
       black |  -.1894065   .0194462    -9.74   0.000    -.2275358   -.1512773
     south76 |  -.1464841   .0260345    -5.63   0.000    -.1975314   -.0954368
      smsa76 |   .1377121   .0201334     6.84   0.000     .0982353    .1771889
        reg2 |   .1023805   .0360137     2.84   0.005     .0317662    .1729947
        reg3 |   .1488958   .0352521     4.22   0.000     .0797748    .2180168
        reg4 |   .0601267   .0417556     1.44   0.150     -.021746    .1419994
        reg5 |   .1348504   .0419098     3.22   0.001     .0526752    .2170255
        reg6 |   .1452831   .0453155     3.21   0.001     .0564302    .2341359
        reg7 |   .1301968    .044965     2.90   0.004     .0420312    .2183624
        reg8 |  -.0444289   .0513937    -0.86   0.387    -.1451997    .0563419
        reg9 |   .1285658   .0389959     3.30   0.001     .0521042    .2050274
      smsa66 |   .0233775    .019544     1.20   0.232    -.0149436    .0616987
    momdad14 |   .0693317   .0263402     2.63   0.009      .017685    .1209785
    sinmom14 |   .0335387   .0354168     0.95   0.344    -.0359052    .1029825
     nodaded |  -.0390477   .0531089    -0.74   0.462    -.1431815    .0650862
     nomomed |   .0168143   .0348295     0.48   0.629     -.051478    .0851066
       daded |  -.0017839   .0043977    -0.41   0.685    -.0104068    .0068389
       momed |   .0081443   .0041513     1.96   0.050     4.64e-06    .0162839
      famed1 |  -.1166029   .0788125    -1.48   0.139    -.2711354    .0379296
      famed2 |   -.052544   .0712753    -0.74   0.461    -.1922977    .0872097
      famed3 |  -.0719675   .0654608    -1.10   0.272    -.2003205    .0563856
      famed4 |  -.0197095   .0437058    -0.45   0.652    -.1054062    .0659872
      famed5 |  -.0252185   .0643526    -0.39   0.695    -.1513985    .1009615
      famed6 |  -.0733887   .0621076    -1.18   0.237    -.1951667    .0483894
      famed7 |   -.059927   .0656929    -0.91   0.362     -.188735     .068881

      famed8 |  -.0738951   .0572428    -1.29   0.197    -.1861345    .0383444
       _cons |  -.0278815   .1005974    -0.28   0.782    -.2251288    .1693659
------------------------------------------------------------------------------
. estimates store ols
.
. * IV Instrumental variables estimates of return to schooling.
. * This regression computes schooling coeff and se for Table 1. col 2 p.359
. * Endogenous variables: schooling, experience, experience squared
. * Excl instruments: college in cnty, age age^2
. * based on all cases (age grp 14-24) reported highest grd cmpl 76 ***/
.
. ivreg wage76 $exogregressors /*
> */ (grade76 exp76 expsq76 = col4 age76 agesq76 $exogregressors)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 29,  2980) =   34.56
       Model |  122.395448    29  4.22053269           Prob > F      =  0.0000
    Residual |  470.247169  2980  .157801063           R-squared     =  0.2065
-------------+------------------------------           Adj R-squared =  0.1988
       Total |  592.642616  3009  .196956669           Root MSE      =  .39724

------------------------------------------------------------------------------
      wage76 |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     grade76 |   .1324485   .0493419     2.68   0.007     .0357009    .2291961
       exp76 |   .0632411   .0241061     2.62   0.009     .0159748    .1105074
     expsq76 |  -.1266694   .1184765    -1.07   0.285    -.3589735    .1056347
       black |  -.1643766   .0292248    -5.62   0.000    -.2216795   -.1070737
     south76 |  -.1400178   .0283887    -4.93   0.000    -.1956812   -.0843545
      smsa76 |   .0909867   .0441338     2.06   0.039     .0044509    .1775224
        reg2 |   .0753178   .0444167     1.70   0.090    -.0117726    .1624083
        reg3 |   .1231473   .0431763     2.85   0.004      .038489    .2078057
        reg4 |   .0241968   .0534911     0.45   0.651    -.0806865    .1290801
        reg5 |   .1247819   .0455148     2.74   0.006     .0355383    .2140255
        reg6 |    .135761   .0490304     2.77   0.006      .039624    .2318979
        reg7 |   .1063645   .0519274     2.05   0.041     .0045472    .2081817
        reg8 |  -.0850609    .064327    -1.32   0.186    -.2111907    .0410688
        reg9 |   .0916464   .0515551     1.78   0.076    -.0094409    .1927337
      smsa66 |   .0379821   .0241116     1.58   0.115    -.0092951    .0852592
    momdad14 |    .043168   .0354056     1.22   0.223    -.0262539      .11259
    sinmom14 |    .025849   .0383465     0.67   0.500    -.0493392    .1010373
     nodaded |  -.0462392   .0570684    -0.81   0.418    -.1581366    .0656583
     nomomed |   .0266252   .0383434     0.69   0.487     -.048557    .1018074
       daded |  -.0110565   .0089768    -1.23   0.218    -.0286579    .0065449
       momed |  -.0017539   .0093223    -0.19   0.851    -.0200326    .0165249
      famed1 |   -.213271   .1160049    -1.84   0.066    -.4407287    .0141867
      famed2 |  -.1567074   .1145696    -1.37   0.171    -.3813508    .0679361

      famed3 |  -.1354685   .0872725    -1.55   0.121    -.3065889     .035652
      famed4 |  -.0707323   .0627189    -1.13   0.260     -.193709    .0522444
      famed5 |  -.0699675    .077928    -0.90   0.369    -.2227656    .0828306
      famed6 |  -.1171712   .0754408    -1.55   0.120    -.2650926    .0307502
      famed7 |  -.0921498   .0749801    -1.23   0.219    -.2391679    .0548683
      famed8 |  -.1184618   .0713021    -1.66   0.097    -.2582681    .0213445
       _cons |  -.4311125   .3567904    -1.21   0.227    -1.130693    .2684678
------------------------------------------------------------------------------
Instrumented:  grade76 exp76 expsq76
Instruments:   black south76 smsa76 reg2 reg3 reg4 reg5 reg6 reg7 reg8 reg9
               smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1
               famed2 famed3 famed4 famed5 famed6 famed7 famed8 col4 age76
               agesq76
------------------------------------------------------------------------------
. estimates store iv
.
. ********** (2) NEW ANALYSIS: HETEROSKEDASTIC ROBUST STANDARD ERRORS **********
.
. * Heteroskedastic errors make little difference here.
.
. quietly reg wage76 grade76 exp76 expsq76 $exogregressors
. hettest /* Here the test does not reject homoskedasticity for OLS */

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of wage76

         chi2(1)      =     0.42
         Prob > chi2  =   0.5191
. quietly reg wage76 grade76 exp76 expsq76 $exogregressors, robust
. estimates store olshet
.
. quietly ivreg wage76 $exogregressors /*
> */ (grade76 exp76 expsq76 = col4 age76 agesq76 $exogregressors), robust
. estimates store ivhet
.
. **** DISPLAY RESULTS IN TABLE 4.5 p.111
.
. * Table 4.5 p.111: OLS and IV estimates, s.e.'s and R^2 in Table 4.5
.
. * Table reports only the coefficient and standard errors for grade76
. estimates table ols olshet iv ivhet, /*
> */ se stats(N ll r2 rss mss rmse df_r) b(%10.4f)

--------------------------------------------------------------
    Variable |         ols      olshet          iv       ivhet
-------------+------------------------------------------------
     grade76 |      0.0726      0.0726      0.1324      0.1324
             |      0.0037      0.0039      0.0493      0.0488
       exp76 |      0.0845      0.0845      0.0632      0.0632
             |      0.0067      0.0068      0.0241      0.0241
     expsq76 |     -0.2290     -0.2290     -0.1267     -0.1267
             |      0.0319      0.0322      0.1185      0.1182
       black |     -0.1894     -0.1894     -0.1644     -0.1644
             |      0.0194      0.0198      0.0292      0.0285
     south76 |     -0.1465     -0.1465     -0.1400     -0.1400
             |      0.0260      0.0280      0.0284      0.0292
      smsa76 |      0.1377      0.1377      0.0910      0.0910
             |      0.0201      0.0193      0.0441      0.0440
        reg2 |      0.1024      0.1024      0.0753      0.0753
             |      0.0360      0.0350      0.0444      0.0432
        reg3 |      0.1489      0.1489      0.1231      0.1231
             |      0.0353      0.0338      0.0432      0.0418
        reg4 |      0.0601      0.0601      0.0242      0.0242
             |      0.0418      0.0412      0.0535      0.0531
        reg5 |      0.1349      0.1349      0.1248      0.1248
             |      0.0419      0.0428      0.0455      0.0459
        reg6 |      0.1453      0.1453      0.1358      0.1358
             |      0.0453      0.0452      0.0490      0.0483
        reg7 |      0.1302      0.1302      0.1064      0.1064
             |      0.0450      0.0457      0.0519      0.0516
        reg8 |     -0.0444     -0.0444     -0.0851     -0.0851
             |      0.0514      0.0509      0.0643      0.0619
        reg9 |      0.1286      0.1286      0.0916      0.0916
             |      0.0390      0.0388      0.0516      0.0504
      smsa66 |      0.0234      0.0234      0.0380      0.0380
             |      0.0195      0.0187      0.0241      0.0231
    momdad14 |      0.0693      0.0693      0.0432      0.0432
             |      0.0263      0.0257      0.0354      0.0352
    sinmom14 |      0.0335      0.0335      0.0258      0.0258
             |      0.0354      0.0359      0.0383      0.0384
     nodaded |     -0.0390     -0.0390     -0.0462     -0.0462
             |      0.0531      0.0511      0.0571      0.0550
     nomomed |      0.0168      0.0168      0.0266      0.0266
             |      0.0348      0.0344      0.0383      0.0375
       daded |     -0.0018     -0.0018     -0.0111     -0.0111
             |      0.0044      0.0044      0.0090      0.0089
       momed |      0.0081      0.0081     -0.0018     -0.0018
             |      0.0042      0.0042      0.0093      0.0093
      famed1 |     -0.1166     -0.1166     -0.2133     -0.2133
             |      0.0788      0.0792      0.1160      0.1160
      famed2 |     -0.0525     -0.0525     -0.1567     -0.1567
             |      0.0713      0.0698      0.1146      0.1132
      famed3 |     -0.0720     -0.0720     -0.1355     -0.1355
             |      0.0655      0.0644      0.0873      0.0865
      famed4 |     -0.0197     -0.0197     -0.0707     -0.0707
             |      0.0437      0.0416      0.0627      0.0601
      famed5 |     -0.0252     -0.0252     -0.0700     -0.0700
             |      0.0644      0.0625      0.0779      0.0763
      famed6 |     -0.0734     -0.0734     -0.1172     -0.1172
             |      0.0621      0.0601      0.0754      0.0735
      famed7 |     -0.0599     -0.0599     -0.0921     -0.0921
             |      0.0657      0.0640      0.0750      0.0730
      famed8 |     -0.0739     -0.0739     -0.1185     -0.1185
             |      0.0572      0.0545      0.0713      0.0682
       _cons |     -0.0279     -0.0279     -0.4311     -0.4311
             |      0.1006      0.0997      0.3568      0.3528
-------------+------------------------------------------------
           N |   3010.0000   3010.0000   3010.0000   3010.0000
          ll |  -1279.2297  -1279.2297
          r2 |      0.3043      0.3043      0.2065      0.2065
         rss |    412.3221    412.3221    470.2472    470.2472
         mss |    180.3205    180.3205    122.3954    122.3954
        rmse |      0.3720      0.3720      0.3972      0.3972
        df_r |   2980.0000   2980.0000   2980.0000   2980.0000
--------------------------------------------------------------
                                                  legend: b/se
.
. ********** (3) NEW ANALYSIS: CHECK FOR WEAK INSTRUMENTS **********
.
. * Model is y = b1*x1 + x2'b2 + u
. * where x1 is scalar endogenous (grade76)
. * where x2 is vector of regressors that includes
. * exp76 and expsq76 which are also endogenous
. * and $exogregressors which are exogenous
. * and the instruments Z are col4 age76 agesq76 $exogregressors
.
. * Check for weak instruments
. * Focus on grade76 but can also do this for the other two endogenous regressors.
. * In this example no problems for the other two:
. * as age and age-squared are good instruments for exp and exp-squared.
.
. **** (A) Simple analysis R-squared and F-test [Given in Table 4.5]
.
. * R2 from regress endogenous regressor on instruments
. * This equals the squared correlation between x1 and the projection of x1 on Z
. quietly reg grade76 col4 age76 agesq76 $exogregressors
. di e(r2) " r2 of x1 on Z"
.29677588 r2 of x1 on Z
.
. * Do the partial F-test on the three instruments

. * This is the standard first-stage regression F-test
.
. **** DISPLAY RESULT IN TABLE 4.5 page 111
.
. * First-stage F statistic given in Table 4.5
. test col4 age76 agesq76

 ( 1)  col4 = 0
 ( 2)  age76 = 0
 ( 3)  agesq76 = 0

       F(  3,  2980) =    8.07
            Prob > F =    0.0000
.
. * Compare with the R-squared from regressing x1 on the exogenous regressors only,
. * i.e. with the three excluded instruments dropped from Z
. quietly reg grade76 $exogregressors
. di e(r2) " r2 of x1 on Z with the three additional instruments dropped"
.29106483 r2 of x1 on Z with the three additional instruments dropped
.
. * Obtain first-stage F for the other two endogenous regressors
. quietly reg exp76 col4 age76 agesq76 $exogregressors
. test col4 age76 agesq76

 ( 1)  col4 = 0
 ( 2)  age76 = 0
 ( 3)  agesq76 = 0

       F(  3,  2980) = 1772.03
            Prob > F =    0.0000
. quietly reg expsq76 col4 age76 agesq76 $exogregressors
. test col4 age76 agesq76

 ( 1)  col4 = 0
 ( 2)  age76 = 0
 ( 3)  agesq76 = 0

       F(  3,  2980) = 1542.36
            Prob > F =    0.0000
.
. **** (B) Minimum eigenvalue of matrix analog of the first-stage F statistic
. * proposed by Stock et al (2002) and tables in Stock and Yogo (2003)
. * This test is not done here.
.
. **** (C) Bound et al (1995) partial R-squared

.
. * Not relevant here as more than one endogenous regressor
. * If only one endogenous regressor x1, Bound et al purge the effect of x2
. * by (1) get residual from regress x1 on x2
. * (2) get the residuals from regress z on x2
. * and then get the R-squared from regress (1) on (2).
.
. **** (D) Shea (1997) partial R-squared [Given in Table 4.5]
.
. * Here we have three endogenous regressors.
. * Focus on the endogenous schooling regressor.
. * For the other two just need to replace the first line of (1)
. * e.g. quietly reg exp76 grade76 expsq76 $exogregressors
. * and replace the first line of (2B)
. * e.g. quietly reg exp76hat grade76hat expsq76hat $exogregressors
.
. * (1) Form x1 - x1tilda: residual from regress x1 on other regressors
. quietly reg grade76 exp76 expsq76 $exogregressors
. predict x1minusx1tilda, resid
.
. * (2) Form x1hat - x1hattilda: residual from regress x1hat on fitted values of other regressors
. * (2A) First get the fitted values from regress endogenous on instruments
. quietly reg grade76 col4 age76 agesq76 $exogregressors
. predict grade76hat, xb
. di e(r2) " r2 from regress x1 on Z"
.29677588 r2 from regress x1 on Z
. quietly reg exp76 col4 age76 agesq76 $exogregressors
. predict exp76hat, xb
. di e(r2) " r2 from regress second endog regressor on Z"
.70622765 r2 from regress second endog regressor on Z
. quietly reg expsq76 col4 age76 agesq76 $exogregressors
. predict expsq76hat, xb
. di e(r2) " r2 from regress third endog regressor on Z"
.67573235 r2 from regress third endog regressor on Z
. * The exogenous regressors are part of Z, so their fitted values are themselves
. * (2B) Run the regression of x1hat on fitted values of other regressors
. quietly reg grade76hat exp76hat expsq76hat $exogregressors
. di e(r2) " r2 from regress prediction of x1 on predictions of x2"
.98987117 r2 from regress prediction of x1 on predictions of x2

. predict x1hatminusx1hattilda, resid
.
. * (3) Form the correlation between (1) and (2)
. corr x1minusx1tilda x1hatminusx1hattilda
(obs=3010)

             | x1minu~a x1hatm~a
-------------+------------------
x1minusx1t~a |   1.0000
x1hatminus~a |   0.0800   1.0000
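The residual-based steps (1) to (3) above can also be reproduced outside Stata. What follows is a minimal stdlib-only Python sketch on simulated data, not the authors' code: the helpers `fitted`, `resid`, `shea_partial_r2` and `bound_partial_r2`, and the simulated variables `w`, `z1`, `z2`, `x1`, `x2`, are hypothetical illustrations. Every regression includes a constant, as Stata's `regress` does. As a sanity check, with a single endogenous regressor Shea's measure should coincide with the Bound et al. (1995) partial R-squared described in (C).

```python
import random


def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = s / m[r][r]
    return beta


def fitted(cols, y):
    """OLS fitted values from regressing y on a constant plus the columns in cols."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return [sum(bp * xp for bp, xp in zip(beta, r)) for r in rows]


def resid(cols, y):
    """OLS residuals from regressing y on a constant plus cols."""
    return [yi - fi for yi, fi in zip(y, fitted(cols, y))]


def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def shea_partial_r2(x1, other_endog, exog, excl):
    """Shea (1997) partial R^2 for x1, mirroring steps (1)-(3) of the log."""
    r1 = resid(other_endog + exog, x1)              # (1) x1 - x1tilda
    z = excl + exog                                 # full instrument set Z
    x1hat = fitted(z, x1)                           # (2A) first-stage fitted values
    others_hat = [fitted(z, x) for x in other_endog]
    r2 = resid(others_hat + exog, x1hat)            # (2B) x1hat - x1hattilda
    return corr(r1, r2) ** 2                        # (3) squared correlation


def bound_partial_r2(x1, exog, excl):
    """Bound et al. (1995) partial R^2; meaningful for one endogenous regressor."""
    e1 = resid(exog, x1)                            # x1 purged of the exogenous x2
    ez = [resid(exog, z) for z in excl]             # instruments purged of x2
    return corr(e1, fitted(ez, e1)) ** 2            # R^2 of e1 on ez


# Hypothetical simulated data: w exogenous, z1 and z2 excluded instruments
rng = random.Random(0)
n = 200
w = [rng.gauss(0, 1) for _ in range(n)]
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * a + 0.3 * c + rng.gauss(0, 1) for a, c in zip(z1, w)]
x2 = [0.8 * a + 0.4 * c + rng.gauss(0, 1) for a, c in zip(z1, z2)]

shea_one = shea_partial_r2(x1, [], [w], [z1])
bound_one = bound_partial_r2(x1, [w], [z1])
shea_two = shea_partial_r2(x1, [x2], [w], [z1, z2])
print(shea_one, bound_one, shea_two)
```

With one endogenous regressor the two measures agree to rounding error; with several endogenous regressors only Shea's version is defined, which is why the log uses it for this three-regressor model.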

.
. **** DISPLAY RESULT IN TABLE 4.5 page 111
.
. * Shea's Partial R^2 in Table 4.5
. di r(rho)^2 " Shea's partial R-squared measure"
.00640757 Shea's partial R-squared measure
.
. sum grade76 grade76hat exp76 exp76hat expsq76 expsq76hat grade76 x1minusx1tilda x1hatminusx1hattilda grade76hat

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
     grade76 |  3010    13.26346    2.676913          1         18
  grade76hat |  3010    13.26346    1.458306   8.919074   17.42063
       exp76 |  3010    8.856146    4.141672          0         23
    exp76hat |  3010    8.856146    3.480551   1.329216   17.68953
     expsq76 |  3010    .9557907    .8461831          0       5.29
-------------+----------------------------------------------------
  expsq76hat |  3010    .9557907    .6955874  -.3913698   2.917523
     grade76 |  3010    13.26346    2.676913          1         18
x1minusx1t~a |  3010   -8.71e-10    1.833502  -6.948598   5.661138
x1hatminus~a |  3010   -6.86e-11    .1467669  -.3732457   .3033035
  grade76hat |  3010    13.26346    1.458306   8.919074   17.42063
.
. **** (E) Poskitt-Skeels (2002) partial R-squared
. * Not done here
.
. **** (F) If model was over-identified then do test of over-identifying restrictions
. * Not done here as model is just-identified
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section2\mma04p4ivweak.txt
  log type:  text
 closed on:  17 May 2005, 13:46:03
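As an arithmetic check on section (A), the first-stage F of 8.07 reported by `test col4 age76 agesq76` can be recovered from the two R-squared values displayed in the log (0.29677588 with the three excluded instruments, 0.29106483 without them), using F = [(R2_u - R2_r)/q] / [(1 - R2_u)/(N - K)] with q = 3 restrictions and N - K = 2980. This is a sketch: the function name is hypothetical, and the formula assumes both regressions use the same sample and default (non-robust) errors, as the `regress` and `test` calls above do. It does not apply to a robust Wald test.

```python
def first_stage_partial_f(r2_full, r2_restricted, q, df_resid):
    """F statistic for q exclusion restrictions from unrestricted and restricted R^2.

    Valid for homoskedastic (non-robust) OLS on a common estimation sample.
    """
    return ((r2_full - r2_restricted) / q) / ((1.0 - r2_full) / df_resid)


# R-squared values and residual degrees of freedom as displayed in the log
f = first_stage_partial_f(0.29677588, 0.29106483, q=3, df_resid=2980)
print(f"F(3, 2980) = {f:.2f}")  # prints F(3, 2980) = 8.07
```

The small gap between the two R-squared values (about 0.0057) is what drives the low F, even though the overall first-stage R-squared of about 0.30 looks respectable.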

-----------------------------------------------------------------------------------------------------------------------------------------------------


Chapter 5.9 pp.159-63

-------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma05p1mle.txt
  log type:  text
 opened on:  17 May 2005, 13:48:11

.
. ********** OVERVIEW OF MMA05P1MLE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Maximum likelihood analysis.
.
. * Provides first two columns of Table 5.7
. * (1) OLS using Stata command regress
. * (2) MLE using Stata command streg, dist(exp) for exponential MLE
. * (3) MLE using Stata command ml for user-provided log-likelihood
. * using generated data (see below)
.
. * Related programs:
. * mma05p2nls.do          NLS, WNLS, FGNLS for same data using nl command
. * mma05p3nlsbyml.do      NLS, WNLS, FGNLS for same data using ml command
. * mma05p4margeffects.do  Calculates marginal effects
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA and SUMMARIZE **********
.
. * Model is y ~ exponential(exp(a + bx))
. * x ~ N[mux, sigx^2]
. * f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. * lnf(y) = (a + bx) - y*exp(a + bx)
. * E[y] = exp(-(a + bx))     note sign reversal for the mean
. * V[y] = exp(-2(a + bx)) = E[y]^2
.
. * The dgp sets particular values of a, b, mux and sigx
. * Here a = 2, b = -1 and x ~ N[1, 1]
. scalar a = 2
. scalar b = -1
. scalar mux = 1
. scalar sigx = 1
.
. * Set the sample size. Table 5.7 uses N=10,000
. set obs 10000
obs was 0, now 10000
.
. * Generate x and y
. set seed 2003
. gen x = mux + sigx*invnorm(uniform())
. gen lamda = exp(a + b*x)
. gen Ey = 1/lamda
. * To generate exponential with mean mu=Ey use
. * Integral 0 to a of (1/mu)exp(-x/mu) dx by change of variables
. * = Integral 0 to a/mu of exp(-t)dt
. * = incomplete gamma function P(1,a/mu) in the terminology of Stata
. gen y = Ey*invgammap(1,uniform())
. gen lny = ln(y)
. gen lnfy = ln(lamda) - y*lamda
. * twoway scatter Ey x
.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             6
 size:       280,000 (97.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
x               float   %9.0g
lamda           float   %9.0g
Ey              float   %9.0g
y               float   %9.0g
lny             float   %9.0g
lnfy            float   %9.0g
-------------------------------------------------------------------------------

Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
           x | 10000    1.014313    1.004905  -2.895741   4.994059
       lamda | 10000    4.457478    5.939084   .0500838   133.7191
          Ey | 10000    .6185677    .8294007   .0074784   19.96655
           y | 10000    .6194352    1.291416   .0000445   30.60636
         lny | 10000   -1.554348     1.62358  -10.02114   3.421208
-------------+----------------------------------------------------
        lnfy | 10000   -.0209485    1.419595   -7.52596   4.402257
.
. ********** WRITE DATA TO A TEXT FILE **********
.
. * Write data to a text (ascii) file
. * used for programs mma05p2nlsbyml.do, mma05p3nlsbynl.do
. * and mma05p4margeffects.do
. * and can also use with programs other than Stata
. outfile y x using mma05data.asc, replace
.
. ********** DO THE ANALYSIS: OLS and MLE **********
.
. ** (1) OLS ESTIMATION
.
. * OLS is inconsistent in this example
. regress y x

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 3030.74
       Model |  3879.13606     1  3879.13606           Prob > F      =  0.0000
    Residual |  12796.7438  9998  1.27993037           R-squared     =  0.2326
-------------+------------------------------           Adj R-squared =  0.2325
       Total |  16675.8799  9999  1.66775476           Root MSE      =  1.1313

------------------------------------------------------------------------------
           y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6198182   .0112587    55.05   0.000     .5977488    .6418876
       _cons |  -.0092545    .016075    -0.58   0.565    -.0407648    .0222558
------------------------------------------------------------------------------
. estimates store rols
. regress y x, robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) =  596.30
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2326
                                                       Root MSE      =  1.1313

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6198182   .0253823    24.42   0.000     .5700638    .6695725
       _cons |  -.0092545   .0171978    -0.54   0.591    -.0429655    .0244566
------------------------------------------------------------------------------
. estimates store rolsrobust
.
. ** (2) ML ESTIMATION USING STATA COMMAND FOR EXPONENTIAL MLE
.
. * The following uses Stata duration model commands.
. * First need to define the duration variable (here y)
. stset y

     failure event:  (assumed to fail at time=y)
obs. time interval:  (0, y]
 exit on or before:  failure

------------------------------------------------------------------------------
    10000  total obs.
        0  exclusions
------------------------------------------------------------------------------
    10000  obs. remaining, representing
    10000  failures in single record/single failure data
 6194.352  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  30.60636
. streg x, dist(exp) nohr

         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log likelihood = -20754.005
Iteration 1:   log likelihood = -17232.884
Iteration 2:   log likelihood = -15760.556
Iteration 3:   log likelihood = -15752.193
Iteration 4:   log likelihood =  -15752.19
Iteration 5:   log likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects =        10000                     Number of obs   =     10000
No. of failures =        10000
Time at risk    =  6194.352495
                                                   LR chi2(1)      =  10003.63
Log likelihood  =    -15752.19                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------
. estimates store rexp
. streg x, dist(exp) nohr robust

         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log pseudo-likelihood = -20754.005
Iteration 1:   log pseudo-likelihood = -17232.884
Iteration 2:   log pseudo-likelihood = -15760.556
Iteration 3:   log pseudo-likelihood = -15752.193
Iteration 4:   log pseudo-likelihood =  -15752.19
Iteration 5:   log pseudo-likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects       =        10000               Number of obs   =     10000
No. of failures       =        10000
Time at risk          =  6194.352495
                                                   Wald chi2(1)    =   9914.62
Log pseudo-likelihood =    -15752.19               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0099388   -99.57   0.000    -1.009107   -.9701479
       _cons |   1.982921   .0144307   137.41   0.000     1.954637    2.011205
------------------------------------------------------------------------------
. estimates store rexprobust
.
. ** (3) ML ESTIMATION USING STATA ML COMMAND
.
. * For MLE computation can use the following Stata commands
. * ml model lf   provide the log-density
. * ml model d0   provide the log-likelihood
. * ml model d1   provide the log-likelihood and gradient

. * ml model d2   provide the log-likelihood, gradient and hessian
.
. * At a minimum need to provide
. * (A) program define fcn    where fcn is the function name
. *     defines the log-density (independent observations assumed)
. * (B) ml model lf fcn + some extras
. *     the extras give the dependent variable and regressors
. * (C) ml maximize
. *     obtains the mle
. * (D) ml model lf fcn + some extras, robust
. *     provides robust sandwich standard errors
.
. * Here we provide the log-density (ml model lf) as this is simplest,
. * and the Stata manual says that numerically only d2 is better.
.
. * (A) Define the log-density
. * lnf(y) = (a+bx) - y*exp(a+bx) = theta - y*exp(theta) where theta = x'b
. program define mleexp0
  1.   version 8.0
  2.   args lnf theta /* Must use lnf while could use name other than theta */
  3.   quietly replace `lnf' = `theta' - $ML_y1*exp(`theta')
  4. end
.
. * (B) Say that dependent variable is y and regressors are x plus a constant
. ml model lf mleexp0 (y = x)
.
. * (C) Obtain the MLE
. ml search /* Optional - can provide better starting values */
initial:       log likelihood = -6194.3525
improve:       log likelihood = -6194.3525
alternative:   log likelihood = -5212.7607
rescale:       log likelihood = -5212.7607
. ml maximize

initial:       log likelihood = -5212.7607
rescale:       log likelihood = -5212.7607
Iteration 0:   log likelihood = -5212.7607
Iteration 1:   log likelihood = -1563.9176
Iteration 2:   log likelihood =  -217.6055
Iteration 3:   log likelihood = -208.73633
Iteration 4:   log likelihood = -208.71383
Iteration 5:   log likelihood = -208.71383

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =   10054.85
Log likelihood = -208.71383                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------

y| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------x | -.9896276 .0098692 -100.27 0.000 -1.008971 -.9702842 _cons | 1.982921 .0141496 140.14 0.000 1.955188 2.010654 -----------------------------------------------------------------------------. estimates store rmle . . * (D) Obtain robust standard errors . ml model lf mleexp0 (y = x), robust . ml search initial: log pseudo-likelihood = -6194.3525 improve: log pseudo-likelihood = -6194.3525 alternative: log pseudo-likelihood = -5212.7607 rescale: log pseudo-likelihood = -5212.7607 . ml maximize initial: log pseudo-likelihood = -5212.7607 rescale: log pseudo-likelihood = -5212.7607 Iteration 0: log pseudo-likelihood = -5212.7607 Iteration 1: log pseudo-likelihood = -1563.9176 Iteration 2: log pseudo-likelihood = -217.6055 Iteration 3: log pseudo-likelihood = -208.73633 Iteration 4: log pseudo-likelihood = -208.71383 Iteration 5: log pseudo-likelihood = -208.71383 Number of obs = 10000 Wald chi2(1) = 9914.62 Log pseudo-likelihood = -208.71383 Prob > chi2 =

0.0000

-----------------------------------------------------------------------------| Robust y| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------x | -.9896276 .0099388 -99.57 0.000 -1.009107 -.9701479 _cons | 1.982921 .0144307 137.41 0.000 1.954637 2.011205 -----------------------------------------------------------------------------. estimates store rmlerobust . . * (E) Calculate R-squared and log-likelihood at the ML estimates . * lnL sums lnf(y) = ln(lamda) - y*lamda . gen lamdaml = exp(_b[_cons] + _b[x]*x) . gen lnfml = ln(lamdaml) - y*lamdaml . quietly means lnfml 89

. scalar LLml = r(mean)*r(N)
. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. gen yhatml = 1/lamdaml
. egen ybar = mean(y)
. * quietly means y
. * scalar ybar = r(mean)
. gen y_yhatsqml = (y - yhatml)^2
. gen y_ybarsq = (y - ybar)^2
. quietly means y_yhatsqml
. scalar SSresidml = r(mean)
. quietly means y_ybarsq
. scalar SStotal = r(mean)
. scalar Rsqml = 1 - SSresidml/SStotal
. di LLml " " Rsqml
-208.71383 .39062307
.
. ********** DISPLAY RESULTS: First two columns of Table 5.7 p.161
.
. * (1) OLS - nonrobust and robust standard errors
. * Here OLS is inconsistent for a and b.
. * And expect sign reversal for slope as in true model mean E[y] = exp(-x'b)
. estimates table rols rolsrobust, b(%10.4f) se(%10.4f) t stats(N ll r2) keep(_cons x)

----------------------------------------
    Variable |    rols       rolsrobust
-------------+--------------------------
       _cons |    -0.0093      -0.0093
             |     0.0161       0.0172
             |      -0.58        -0.54
           x |     0.6198       0.6198
             |     0.0113       0.0254
             |      55.05        24.42
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.542e+04   -1.542e+04
          r2 |     0.2326       0.2326
----------------------------------------
                          legend: b/se/t
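The comment above predicts that the OLS slope has the opposite sign to b. The reason is that E[y|x] = exp(-(2 - x)) = exp(x - 2) increases with x. For x ~ N[1,1], Stein's lemma gives a population OLS slope of Cov(x, exp(x - 2))/V[x] = E[exp(x - 2)] = exp(-1/2), about 0.61, which is close to the 0.6198 reported above. The Python sketch below checks this on freshly simulated data with the design stated in the do-files (a = 2, b = -1, N = 10,000). It does not read mma05data.asc, so its numbers will not match the table exactly.

```python
import math
import random


def simulate(n=10000, a=2.0, b=-1.0, seed=12345):
    """Draw x ~ N(1, 1) and y ~ exponential with rate exp(a + b*x).

    random.expovariate takes the rate, so E[y|x] = exp(-(a + b*x)).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(1.0, 1.0) for _ in range(n)]
    ys = [rng.expovariate(math.exp(a + b * x)) for x in xs]
    return xs, ys


def ols(xs, ys):
    """Intercept and slope from OLS of y on a constant and x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


xs, ys = simulate()
intercept, slope = ols(xs, ys)
# Cov(x, exp(x - 2)) / V[x] for x ~ N(1, 1), by Stein's lemma
population_slope = math.exp(-0.5)
print(f"OLS slope {slope:.4f} vs population value {population_slope:.4f}")
```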


.
. * (2) MLE by command ereg - nonrobust and robust standard errors
. estimates table rexp rexprobust, b(%10.4f) se(%10.4f) t stats(N ll) keep(_cons x)

----------------------------------------
    Variable |    rexp       rexprobust
-------------+--------------------------
       _cons |     1.9829       1.9829
             |     0.0141       0.0144
             |     140.14       137.41
           x |    -0.9896      -0.9896
             |     0.0099       0.0099
             |    -100.27       -99.57
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.575e+04   -1.575e+04
----------------------------------------
                          legend: b/se/t
.
. * (3) MLE by command ml - nonrobust and robust standard errors
. estimates table rmle rmlerobust, b(%10.4f) se(%10.4f) t stats(N ll) keep(_cons x)

----------------------------------------
    Variable |    rmle       rmlerobust
-------------+--------------------------
       _cons |     1.9829       1.9829
             |     0.0141       0.0144
             |     140.14       137.41
           x |    -0.9896      -0.9896
             |     0.0099       0.0099
             |    -100.27       -99.57
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll |  -208.7138    -208.7138
----------------------------------------
                          legend: b/se/t
. * And ML log-likelihood (check) and R-squared (needed to be computed)
. di "Log likelihood for ML: " LLml
Log likelihood for ML: -208.71383
. di "R-squared for MLE: " Rsqml
R-squared for MLE: .39062307
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma05p1mle.txt
  log type:  text
 closed on:  17 May 2005, 13:48:18
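The ml lf program above supplies only lnf(y) = theta - y*exp(theta), and ml then maximizes this numerically. The Python sketch below illustrates the underlying calculation. It maximizes the same log-likelihood by Newton-Raphson using analytic derivatives:

- score (1 - y*lamda)*(1, x)
- minus Hessian y*lamda*(1, x)(1, x)'

Step halving is added as a safeguard. The sketch then forms the default standard errors [-H]^-1 and the robust sandwich H^-1 B H^-1, where B is the outer product of the scores.

This is an assumption-laden illustration, not Stata's ml algorithm, which uses numerical derivatives for lf. The data are also re-simulated from the stated design, so the estimates differ slightly from rmle and rmlerobust. Because the exponential model is correctly specified, the two sets of standard errors should be close, as they are in those columns.

```python
import math
import random


def simulate(n=10000, a=2.0, b=-1.0, seed=2005):
    """x ~ N(1, 1); y ~ exponential with rate lamda = exp(a + b*x)."""
    rng = random.Random(seed)
    xs = [rng.gauss(1.0, 1.0) for _ in range(n)]
    ys = [rng.expovariate(math.exp(a + b * x)) for x in xs]
    return xs, ys


def loglik(a, b, xs, ys):
    """lnL = Sum_i [theta_i - y_i*exp(theta_i)] with theta_i = a + b*x_i."""
    return sum((a + b * x) - y * math.exp(a + b * x) for x, y in zip(xs, ys))


def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def mat2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def derivatives(a, b, xs, ys):
    """Score vector, minus the Hessian, and outer product of the scores."""
    grad = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    opg = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        m = y * math.exp(a + b * x)  # y_i * lamda_i
        z = (1.0, x)                 # regressors: constant and x
        for i in range(2):
            grad[i] += (1.0 - m) * z[i]
            for j in range(2):
                info[i][j] += m * z[i] * z[j]
                opg[i][j] += (1.0 - m) ** 2 * z[i] * z[j]
    return grad, info, opg


def exponential_mle(xs, ys, tol=1e-6, max_iter=100):
    """Newton-Raphson with step halving, started at the intercept-only MLE."""
    a, b = -math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        grad, info, _ = derivatives(a, b, xs, ys)
        inv = inv2(info)
        step = [inv[0][0] * grad[0] + inv[0][1] * grad[1],
                inv[1][0] * grad[0] + inv[1][1] * grad[1]]
        if max(abs(s) for s in step) < tol:
            return a + step[0], b + step[1]  # final full Newton step
        old, t = loglik(a, b, xs, ys), 1.0
        while loglik(a + t * step[0], b + t * step[1], xs, ys) < old and t > 1e-10:
            t /= 2.0  # halve the step until lnL no longer falls
        a, b = a + t * step[0], b + t * step[1]
    return a, b


xs, ys = simulate()
a_hat, b_hat = exponential_mle(xs, ys)
grad, info, opg = derivatives(a_hat, b_hat, xs, ys)
v_default = inv2(info)                            # [-H]^-1
v_robust = mat2(mat2(v_default, opg), v_default)  # H^-1 B H^-1
se_default = [math.sqrt(v_default[i][i]) for i in range(2)]
se_robust = [math.sqrt(v_robust[i][i]) for i in range(2)]
print("MLE (a, b):", round(a_hat, 4), round(b_hat, 4))
print("default s.e.:", [round(s, 4) for s in se_default])
print("robust s.e.: ", [round(s, 4) for s in se_robust])
print("lnL:", round(loglik(a_hat, b_hat, xs, ys), 3))
```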

------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma05p2nls.txt
  log type:  text
 opened on:  17 May 2005, 13:53:31
.
. ********** OVERVIEW OF MMA05P2NLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Nonlinear least squares
.
. * Provides last three columns of Table 5.7 results for
. * (1) NLS using Stata command nl (hard to get robust s.e.'s)
. * (2) FGNLS using Stata command nl (hard to get robust s.e.'s)
. * (3) WNLS using Stata command nl (hard to get robust s.e.'s)
. * using generated data set mma05data.asc
.
. * Note: Stata 8 does not give robust s.e.'s for nl
. *       But ml does - see program mma05p3nlsbyml.do
. *       New Stata 9 does have a robust s.e. option (unlike Stata 8)
.
. * Related programs:
. * mma05p1mle.do           OLS and MLE for the same data
. * mma05p3nlsbyml.do       NLS using ml rather than nl
. * mma05p4margeffects.do   Calculates marginal effects
.
. * To run this program you need data and dictionary files
. * mma05data.asc   ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is y ~ exponential(exp(a + bx))
. *          x ~ N[mux, sigx^2]
. *       f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *     lnf(y) = (a + bx) - y*exp(a + bx)
. *       E[y] = exp(-(a + bx))   note sign reversal for the mean
. *       V[y] = exp(-2(a + bx)) = E[y]^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]

. * and Table 5.7 uses N=10,000
.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)
.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416   .0000445   30.60636
           x |     10000    1.014313    1.004905  -2.895741   4.994059
.
. ********** DO THE ANALYSIS: NLS, WNLS and FGNLS **********
.
. *** (1) NLS ESTIMATION USING STATA NL COMMAND (Nonlinear LS)
.
. * To do this in Stata
. * (A) program define nlfcn   where fcn is the function name
. *     defines g(x_i'b) and says what the regressors x are
. * (B) nl fcn y   where fcn is the function name in (A)
. *     and y is the dependent variable
. *     does NLS of y on fcn defined in (A)
. * (C) Heteroskedasticity-consistent standard errors require extra coding
.
. * (1A) Define g(x'b)
. *      Note: Since E[y] = exp(-(a + bx)) there is sign reversal for the mean
. program define nlexpnls
  1.   version 7.0
  2.   if "`1'" == "?" {          /* if query call ... */
  3.     global S_1 "b1int b2x"   /* declare parameters */
  4.     global b1int=1           /* initial values */

  5.     global b2x=0
  6.     exit
       }
  7.   replace `1'=exp(-$b1int-$b2x*x)   /* calculate function */
  8. end
.
. * (1B) Do NLS of y on the function expnls defined in (A)
. nl expnls y
(obs = 10000)
Iteration 0:  residual SS =  17308.68
Iteration 1:  residual SS =  10333.37
Iteration 2:  residual SS =  10150.66
Iteration 3:  residual SS =  10149.86
Iteration 4:  residual SS =  10149.86
Iteration 5:  residual SS =  10149.86

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   5103.98
       Model |  10363.0157     2  5181.50784         Prob > F      =    0.0000
    Residual |  10149.8633  9998  1.01518937         R-squared     =    0.5052
-------------+------------------------------         Adj R-squared =    0.5051
       Total |   20512.879 10000   2.0512879         Root MSE      =  1.007566
                                                     Res. dev.     =  28527.52
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.887563   .0306819    61.52   0.000      1.82742    1.947705
         b2x |  -.9574684   .0097419   -98.28   0.000    -.9765645   -.9383724
------------------------------------------------------------------------------
  (SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bnls
.
. * Complications now begin: getting robust standard errors.
. * It is easier to use ml (see mma05p3nlsbyml.do)!
.
. * (1C) Get sandwich heteroskedastic-robust standard errors for NLS
.
. * Note that the robust option does not work for nl in Stata 8
. * So the reported standard errors are wrong here, as the errors are heteroskedastic
.
. * Getting robust standard errors is not straightforward
.
. * Obtain them by OLS regression of y - g(x,b) on dg/db with the robust option.
. * Explanation: OLS regress y - g(x,b) = (dg/db)'a + v
. * This is the Gauss-Newton algorithm for updating b
. * But a = 0 since iterations have converged, so v = y - g(x,b)
. * So nonrobust standard errors from this OLS regression yield
. * V[a] = s^2 (Sum_i (dg_i/db)(dg_i/db)')^(-1)

. * where s^2 = Sum_i (y_i - g(x_i,b))^2 / (N - K)
. * This gives the nonrobust standard errors for NLS
. * And the robust option gives robust standard errors from this OLS regression.
.
. * Obtain the derivatives dg/db
. * Here g = exp(-x'b) so dg/db = -exp(-x'b)*x = -yhat*x
. * (the sign of the regressors does not affect the standard errors)
. quietly nl expnls y
. predict residnls, residuals
. predict yhatnls, yhat
. scalar snls = e(rmse)   /* Use in earlier code */

. gen d1 = yhatnls
. gen d2 = x*yhatnls
. * This OLS regression gives robust standard errors
. regress residnls d1 d2, noconstant robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00
                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  1.0076

------------------------------------------------------------------------------
             |               Robust
    residnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d1 |   4.46e-07   .1420794     0.00   1.000    -.2785037    .2785046
          d2 |  -1.49e-07   .0611969    -0.00   1.000    -.1199583     .119958
------------------------------------------------------------------------------
. estimates store bnlsrobust
.
. * Check: Do OLS regression that gives nonrobust standard errors
. *        and verify that the results are the same as in (1B)
. regress residnls d1 d2, noconstant

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =      0.00
       Model |  2.6739e-10     2  1.3370e-10         Prob > F      =    1.0000
    Residual |  10149.8633  9998  1.01518937         R-squared     =    0.0000
-------------+------------------------------         Adj R-squared =   -0.0002
       Total |  10149.8633 10000  1.01498633         Root MSE      =    1.0076

------------------------------------------------------------------------------
    residnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
          d1 |   4.46e-07   .0306819     0.00   1.000    -.0601423    .0601432
          d2 |  -1.49e-07   .0097419    -0.00   1.000    -.0190961    .0190958
------------------------------------------------------------------------------
. estimates store bnlscheck
.
. * (1D) Alternative to (1C): robust NLS standard errors that are better.
. * These are of sandwich form but use knowledge that V[u] = [exp(-x'b)]^2 = E[y]^2
. * which can be estimated by Vhat[u] = yhat^2
. * Now use this knowledge in computing S in D'SD.
. * Form DSDknown = D'SD with S = Diag(yhat^2)
. gen ds1known = yhatnls*yhatnls
. gen ds2known = x*yhatnls*yhatnls
. matrix accum DSDknown = ds1known ds2known, noconstant
(obs=10000)
. matrix accum DD2 = d1 d2, noconstant   /* DD commented above */
(obs=10000)

. * Form the robust variance matrix estimate
. matrix vnlsknown = syminv(DD2)*DSDknown*syminv(DD2)
. * Calculate the robust standard errors
. scalar seb1intnlsknown = sqrt(vnlsknown[1,1])
. scalar seb2xnlsknown = sqrt(vnlsknown[2,2])
. di "Robust standard errors of NLS estimates of b1int and b2x: "
Robust standard errors of NLS estimates of b1int and b2x:
. di "Using knowledge that Var[u] = exp(-x'b)^2 estimated by yhat^2"
Using knowledge that Var[u] = exp(-x'b)^2 estimated by yhat^2
. di seb1intnlsknown " " seb2xnlsknown
.21097066 .08798113
.
. * (1E) Calculate R-squared and log-likelihood at the NLS estimates
. * Note that Stata version 8 reports the wrong R-squared
. * as it uses TSS = Sum_i y_i^2 and not Sum_i(y_i - ybar)^2
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdanls = 1 / yhatnls   /* yhatnls saved earlier */
. gen lnfnls = ln(lamdanls) - y*lamdanls
. quietly means lnfnls

. scalar LLnls = r(mean)*r(N)
. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. egen ybar = mean(y)
. * quietly means y
. * scalar ybar = r(mean)
. gen y_ybarsq = (y - ybar)^2
. quietly means y_ybarsq
. scalar SStotal = r(mean)
. gen y_yhatsqnls = (y - yhatnls)^2
. quietly means y_yhatsqnls
. scalar SSresidnls = r(mean)
. scalar Rsqnls = 1 - SSresidnls/SStotal   /* SStotal found earlier */

. di LLnls " " Rsqnls
-232.97524 .39134462
.
. ** (2) FGNLS ESTIMATION USING STATA NL COMMAND
.
. * The following gives FGNLS in Table 5.7
. * To instead get the WNLS estimates in Table 5.7
. * replace gen wfgnls = (1/yhatnls)^2 below by gen wfgnls = 1/yhatnls
.
. * The feasible generalized NLS estimator minimizes
. * SUM_i (y_i - g(x_i'b))^2 / s_i^2 where s_i^2 = estimate of sigma_i^2
. * This is y_i = g(x_i'b) + u_i where u_i ~ (0,s_i^2)
. * Can do NLS with weighting option [aweight = 1/(s_i^2)]
. * Here s_i^2 = [exp(-x_i'b)]^2 = yhatnls^2
.
. * The simplest way to proceed is to use the aweights option.
.
. * (2A) nls program expnls already defined in (1A)
.
. * (2B) For FGNLS do this nls but now with weights
. gen wfgnls = (1/yhatnls)^2
. * gen wfgnls = 1/yhatnls
. nl expnls y [aweight=wfgnls]
(sum of wgt is 405584.32)
Iteration 0:  residual SS =  1127.256
Iteration 1:  residual SS =  363.8331
Iteration 2:  residual SS =  239.3399

Iteration 3:  residual SS =  220.6796
Iteration 4:  residual SS =  220.2856
Iteration 5:  residual SS =  220.2851
Iteration 6:  residual SS =  220.2851

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   4946.06
       Model |   217.95244     2   108.97622         Prob > F      =    0.0000
    Residual |  220.285065  9998  .022032913         R-squared     =    0.4973
-------------+------------------------------         Adj R-squared =    0.4972
       Total |  438.237505 10000  .043823751         Root MSE      =  .1484349
                                                     Res. dev.     =  8924.231
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.984035   .0147737   134.30   0.000     1.955075    2.012994
         b2x |   -.990691     .01001   -98.97   0.000    -1.010313   -.9710694
------------------------------------------------------------------------------
  (SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bfgnls
.
. * (2C) Robust standard errors
. * The standard errors given above are consistent
. * assuming a correct model for heteroskedasticity.
. * To guard against misspecification use a similar approach to the NLS case
. * Obtain the derivatives dg/db
. * Here g = exp(-x'b) so dg/db = -exp(-x'b)*x = -yhat*x
. predict residoptnls, residuals
. predict yhatoptnls, yhat
. gen d1opt = yhatoptnls
. gen d2opt = x*yhatoptnls
. * This OLS regression gives robust standard errors
. regress residoptnls d1opt d2opt [aweight=wfgnls], noconstant robust
(sum of wgt is   4.0558e+05)

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00
                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  .14843

------------------------------------------------------------------------------
             |               Robust
 residoptnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------
       d1opt |  -9.85e-09   .0145803    -0.00   1.000    -.0285803    .0285802
       d2opt |   8.81e-09   .0101319     0.00   1.000    -.0198606    .0198606
------------------------------------------------------------------------------
. estimates store bfgnlsrobust
. * This OLS regression gives nonrobust standard errors
. * It is a check and should equal (2B)
. regress residoptnls d1opt d2opt [aweight=wfgnls], noconstant
(sum of wgt is   4.0558e+05)

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =      0.00
       Model |  2.2737e-13     2  1.1369e-13         Prob > F      =    1.0000
    Residual |  220.285065  9998  .022032913         R-squared     =    0.0000
-------------+------------------------------         Adj R-squared =   -0.0002
       Total |  220.285065 10000  .022028506         Root MSE      =    .14843

------------------------------------------------------------------------------
 residoptnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       d1opt |  -9.85e-09   .0147737    -0.00   1.000    -.0289594    .0289594
       d2opt |   8.81e-09     .01001     0.00   1.000    -.0196216    .0196216
------------------------------------------------------------------------------
. estimates store bfgnlscheck
.
. * (2D) Calculate R-squared and log-likelihood at the FGNLS estimates
. * Note that Stata version 8 reports the wrong R-squared
. * as it uses TSS = Sum_i y_i^2 and not Sum_i(y_i - ybar)^2
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdafgnls = 1 / yhatoptnls   /* yhatoptnls saved earlier */
. gen lnffgnls = ln(lamdafgnls) - y*lamdafgnls
. quietly means lnffgnls
. scalar LLfgnls = r(mean)*r(N)
. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. gen y_yhatsqfgnls = (y - yhatoptnls)^2
. quietly means y_yhatsqfgnls
. scalar SSresidfgnls = r(mean)
. scalar Rsqfgnls = 1 - SSresidfgnls/SStotal   /* SStotal found earlier */
. di LLfgnls " " Rsqfgnls
-208.71965 .39056605
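Step (2B) is a two-stage procedure. First, NLS gives yhat. Second, a weighted NLS uses weights 1/yhat^2 (FGNLS) or 1/yhat (WNLS). The Python sketch below reproduces these stages on re-simulated data from the stated design. It uses a plain Gauss-Newton iteration with step halving in place of Stata's nl routine. Each update regresses the weighted residual y - g on dg/db, which is the same auxiliary regression used in (1C). The function names, the seed and the convergence tolerance are illustrative choices, not the do-file's.

```python
import math
import random


def simulate(n=10000, a=2.0, b=-1.0, seed=42):
    """x ~ N(1, 1); y ~ exponential with rate exp(a + b*x), so E[y|x] = exp(-(a + b*x))."""
    rng = random.Random(seed)
    xs = [rng.gauss(1.0, 1.0) for _ in range(n)]
    ys = [rng.expovariate(math.exp(a + b * x)) for x in xs]
    return xs, ys


def weighted_ssr(b1, b2, xs, ys, ws):
    """Sum_i w_i (y_i - exp(-b1 - b2*x_i))^2, or inf if a trial value overflows."""
    try:
        return sum(w * (y - math.exp(-b1 - b2 * x)) ** 2
                   for x, y, w in zip(xs, ys, ws))
    except OverflowError:
        return math.inf


def gauss_newton(xs, ys, ws, start=(1.0, 0.0), tol=1e-6, max_iter=200):
    """Weighted NLS for g = exp(-b1 - b2*x) by Gauss-Newton with step halving."""
    b1, b2 = start
    for _ in range(max_iter):
        dwd = [[0.0, 0.0], [0.0, 0.0]]  # Sum_i w_i d_i d_i'
        dwu = [0.0, 0.0]                # Sum_i w_i d_i u_i
        for x, y, w in zip(xs, ys, ws):
            g = math.exp(-b1 - b2 * x)
            d = (-g, -g * x)            # dg/db1 and dg/db2
            for i in range(2):
                dwu[i] += w * d[i] * (y - g)
                for j in range(2):
                    dwd[i][j] += w * d[i] * d[j]
        # Solve (D'WD) step = D'Wu for the 2x2 case
        det = dwd[0][0] * dwd[1][1] - dwd[0][1] * dwd[1][0]
        step = [(dwd[1][1] * dwu[0] - dwd[0][1] * dwu[1]) / det,
                (dwd[0][0] * dwu[1] - dwd[1][0] * dwu[0]) / det]
        if max(abs(s) for s in step) < tol:
            return b1 + step[0], b2 + step[1]
        old, t = weighted_ssr(b1, b2, xs, ys, ws), 1.0
        while weighted_ssr(b1 + t * step[0], b2 + t * step[1], xs, ys, ws) > old and t > 1e-10:
            t /= 2.0  # halve until the weighted SSR does not rise
        b1, b2 = b1 + t * step[0], b2 + t * step[1]
    return b1, b2


xs, ys = simulate()
n = len(xs)
nls = gauss_newton(xs, ys, [1.0] * n)                    # start (1, 0) as in nlexpnls
yhat = [math.exp(-nls[0] - nls[1] * x) for x in xs]
fgnls = gauss_newton(xs, ys, [1.0 / g ** 2 for g in yhat], start=nls)  # s_i^2 = yhat_i^2
wnls = gauss_newton(xs, ys, [1.0 / g for g in yhat], start=nls)        # WNLS weights
print("NLS:  ", [round(v, 4) for v in nls])
print("FGNLS:", [round(v, 4) for v in fgnls])
print("WNLS: ", [round(v, 4) for v in wnls])
```

As in the log, FGNLS should land much closer to (2, -1) than unweighted NLS, because its weights match the true variance E[y]^2.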

.
. ** (3) WNLS ESTIMATION USING STATA NL COMMAND
.
. * To get WNLS estimates in Table 5.7
. * replace gen wfgnls = (1/yhatnls)^2 in (2) FGNLS by gen wfgnls = 1/yhatnls
. * Code is shorter as all comments are dropped
.
. gen wwnls = 1/yhatnls
. nl expnls y [aweight=wwnls]
(sum of wgt is 39858.614)
Iteration 0:  residual SS =  2630.417
Iteration 1:  residual SS =  1694.802
Iteration 2:  residual SS =  1500.277
Iteration 3:  residual SS =  1494.658
Iteration 4:  residual SS =  1494.653

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   5073.75
       Model |  1517.00087     2  758.500436         Prob > F      =    0.0000
    Residual |   1494.6525  9998  .149495149         R-squared     =    0.5037
-------------+------------------------------         Adj R-squared =    0.5036
       Total |  3011.65337 10000  .301165337         Root MSE      =   .386646
                                                     Res. dev.     =  14035.49
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.990623   .0224903    88.51   0.000     1.946537    2.034708
         b2x |  -.9960671    .009777  -101.88   0.000    -1.015232   -.9769022
------------------------------------------------------------------------------
  (SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bwnls
. predict residwnls, residuals
. predict yhatwnls, yhat
. gen d1w = yhatwnls
. gen d2w = x*yhatwnls
. regress residwnls d1w d2w [aweight=wwnls], noconstant robust
(sum of wgt is   3.9859e+04)

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00

                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  .38665

------------------------------------------------------------------------------
             |               Robust
   residwnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d1w |  -1.11e-07   .0358551    -0.00   1.000    -.0702833    .0702831
         d2w |   5.35e-08   .0224175     0.00   1.000    -.0439428     .043943
------------------------------------------------------------------------------
. estimates store bwnlsrobust
. regress residwnls d1w d2w [aweight=wwnls], noconstant
(sum of wgt is   3.9859e+04)

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =      0.00
       Model |  1.8190e-12     2  9.0949e-13         Prob > F      =    1.0000
    Residual |   1494.6525  9998  .149495149         R-squared     =    0.0000
-------------+------------------------------         Adj R-squared =   -0.0002
       Total |   1494.6525 10000   .14946525         Root MSE      =    .38665

------------------------------------------------------------------------------
   residwnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d1w |  -1.11e-07   .0224903    -0.00   1.000    -.0440856    .0440853
         d2w |   5.35e-08    .009777     0.00   1.000    -.0191649     .019165
------------------------------------------------------------------------------
. estimates store bwnlscheck
. gen lamdawnls = 1 / yhatwnls   /* yhatwnls saved earlier */

. gen lnfwnls = ln(lamdawnls) - y*lamdawnls
. quietly means lnfwnls
. scalar LLwnls = r(mean)*r(N)
. gen y_yhatsqwnls = (y - yhatwnls)^2
. quietly means y_yhatsqwnls
. scalar SSresidwnls = r(mean)
. scalar Rsqwnls = 1 - SSresidwnls/SStotal   /* SStotal found earlier */
. di LLwnls " " Rsqwnls
-208.93381 .39017996

.
. ***** PRINT RESULTS: Last three columns of Table 5.7 page 161
.
. * (1) NLS using NL - nonrobust and robust standard errors
. * Here nonrobust differs from robust asymptotically
.
. * Table 5.7 NLS nonrobust standard errors
. estimates table bnls, b(%10.4f) se(%10.4f) t stats(N ll)

---------------------------
    Variable |    bnls
-------------+-------------
       b1int |     1.8876
             |     0.0307
             |      61.52
         b2x |    -0.9575
             |     0.0097
             |     -98.28
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
            legend: b/se/t
. * Table 5.7 NLS robust standard errors
. estimates table bnlscheck bnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bnlscheck    bnlsrobust
-------------+--------------------------
          d1 |     0.0000       0.0000
             |     0.0307       0.1421
             |       0.00         0.00
          d2 |    -0.0000      -0.0000
             |     0.0097       0.0612
             |      -0.00        -0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.426e+04   -1.426e+04
----------------------------------------
                          legend: b/se/t
.
. /*
> * Check: Nonrobust standard errors of NLS b1int and b2x:
> di seb1intnlsnr " " seb2xnlsnr
> * Robust standard errors of NLS estimates of b1int and b2x:
> di seb1intnls " " seb2xnls
> */
. * Alternative robust standard errors of NLS estimates of b1int and b2x:

. * These use knowledge that Var[u] = [exp(-x'b)]^2 = E[y]^2
. di seb1intnlsknown " " seb2xnlsknown
.21097066 .08798113
.
. * (3) WNLS - nonrobust and robust standard errors
. * Here nonrobust differs from robust, as the WNLS weights 1/yhat
. * do not match the true variance E[y]^2
. * Table 5.7 WNLS nonrobust standard errors
. estimates table bwnls, b(%10.4f) se(%10.4f) t stats(N ll)

---------------------------
    Variable |    bwnls
-------------+-------------
       b1int |     1.9906
             |     0.0225
             |      88.51
         b2x |    -0.9961
             |     0.0098
             |    -101.88
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
            legend: b/se/t
. * Table 5.7 WNLS robust standard errors
. estimates table bwnlscheck bwnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bwnlscheck   bwnlsrob~t
-------------+--------------------------
         d1w |    -0.0000      -0.0000
             |     0.0225       0.0359
             |      -0.00        -0.00
         d2w |     0.0000       0.0000
             |     0.0098       0.0224
             |       0.00         0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -4685.9286   -4685.9286
----------------------------------------
                          legend: b/se/t
.
. * (2) FGNLS - nonrobust and robust standard errors
. * Here nonrobust = robust asymptotically, as FGNLS uses the correct
. * variance E[y]^2 for this LEF model
. * Also should be same as MLE asymptotically
. * Table 5.7 FGNLS nonrobust standard errors
. estimates table bfgnls, b(%10.4f) se(%10.4f) t stats(N ll)



---------------------------
    Variable |   bfgnls
-------------+-------------
       b1int |     1.9840
             |     0.0148
             |     134.30
         b2x |    -0.9907
             |     0.0100
             |     -98.97
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
            legend: b/se/t
. * Table 5.7 FGNLS robust standard errors
. estimates table bfgnlscheck bfgnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bfgnlsch~k   bfgnlsro~t
-------------+--------------------------
       d1opt |    -0.0000      -0.0000
             |     0.0148       0.0146
             |      -0.00        -0.00
       d2opt |     0.0000       0.0000
             |     0.0100       0.0101
             |       0.00         0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll |  4887.7042    4887.7042
----------------------------------------
                          legend: b/se/t
.
. * (4) Print the various log-likelihoods and R-squared
. * Log-likelihood for NLS, FGNLS and WNLS
. di "LLnls: " LLnls " LLfgnls: " LLfgnls " LLwnls: " LLwnls
LLnls: -232.97524 LLfgnls: -208.71965 LLwnls: -208.93381
. * R-squared for NLS, FGNLS and WNLS
. di "Rsqnls: " Rsqnls " Rsqfgnls: " Rsqfgnls " Rsqwnls: " Rsqwnls
Rsqnls: .39134462 Rsqfgnls: .39056605 Rsqwnls: .39017996
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma05p2nls.txt
  log type:  text
 closed on:  17 May 2005, 13:53:34
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma05p3nlsbyml.txt

  log type:  text
 opened on:  17 May 2005, 13:54:20
.
. ********** OVERVIEW OF MMA05P3NLSBYML.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Nonlinear Least Squares using Stata command ml
.
. * Provides third column of Table 5.7 for
. * (1) NLS using Stata ml command (easy to get robust s.e.'s)
. * using generated data set mma05data.asc
.
. * Note: Use ml rather than nl as then it is much easier to get robust s.e.'s
. *       Can instead use Stata command nl; see program mma05p2nls.do
.
. * Related programs:
. * mma05p1mle.do           OLS and MLE for the same data
. * mma05p2nls.do           NLS (and WNLS and FGNLS) using Stata command nl
. * mma05p4margeffects.do   Calculates marginal effects
.
. * To run this program you need data and dictionary files
. * mma05data.asc   ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is y ~ exponential(exp(a + bx))
. *          x ~ N[mux, sigx^2]
. *       f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *     lnf(y) = (a + bx) - y*exp(a + bx)
. *       E[y] = exp(-(a + bx))   note sign reversal for the mean
. *       V[y] = exp(-2(a + bx)) = E[y]^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]
. * and Table 5.7 uses N=10,000
.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)

.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416   .0000445   30.60636
           x |     10000    1.014313    1.004905  -2.895741   4.994059
.
. ********** DO THE ANALYSIS: NLS using STATA COMMAND ML **********
.
. * (1) NLS ESTIMATION USING STATA ML COMMAND (maximum likelihood)
.
. * Advantage: ml command has robust standard errors as an option
.
. * The NLS estimator minimizes SUM_i (y_i - g(x_i'b))^2.
. * Here let g(x'b) = exp(a + b*x) = exp(b1int + b2x*x) say.
. * In fact for this dgp E[y] = exp(-(a + bx)) so there is sign reversal for the mean.
.
. * To adjust this code to other NLS problems
. * (a) If more regressors, say x1 x2 and x3, replace the ml model line with
. *     ml model lf mlexp (y = x1 x2 x3) / sigma
. * (b) If a different functional form for the mean, say g(x'b), redefine `res' as
. *     `res' = $ML_y1 - g(`theta')
. * (c) If the functional form for the mean is not single-index then the program
. *     will become considerably more complicated with more args.
.
. * (1A) The program "mlexp" defines the objective function
. program define mlexp
  1.   version 8.0
  2.   args lnf theta sigma   /* theta contains b1int and b2x; sigma is st.dev. of error */
  3.   tempvar res            /* create to shorten expression for lnf */
  4.   quietly gen double `res' = $ML_y1 - exp(-`theta')

  5.   quietly replace `lnf' = -0.5*ln(2*_pi) - ln(`sigma') - 0.5*`res'^2/`sigma'^2
  6. end
.
. * (1B) The following command gives the dep variable (y) and regressors (x + intercept)
. ml model lf mlexp (y = x) / sigma
. ml search
initial:       log likelihood =     -  (could not be evaluated)
feasible:      log likelihood = -35613.002
improve:       log likelihood = -19164.648
rescale:       log likelihood = -16938.923
rescale eq:    log likelihood = -16938.923

. ml maximize

initial:       log likelihood = -16938.923
rescale:       log likelihood = -16938.923
rescale eq:    log likelihood = -16938.923
Iteration 0:   log likelihood = -16938.923  (not concave)
Iteration 1:   log likelihood = -15504.033
Iteration 2:   log likelihood = -14673.535
Iteration 3:   log likelihood = -14272.637
Iteration 4:   log likelihood = -14263.775
Iteration 5:   log likelihood = -14263.761
Iteration 6:   log likelihood = -14263.761

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =   10492.88
Log likelihood = -14263.761                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
           x |  -.9574683   .0093471  -102.43   0.000    -.9757883   -.9391483
       _cons |   1.887562   .0295701    63.83   0.000     1.829606    1.945519
-------------+----------------------------------------------------------------
sigma        |
       _cons |   1.007465   .0071239   141.42   0.000     .9935028    1.021428
------------------------------------------------------------------------------
. estimates store bnlsbymle
.
. * (1C) Adding , robust gives heteroskedasticity-robust standard errors
. ml model lf mlexp (y = x) / sigma, robust
. ml search
initial:       log pseudo-likelihood =     -  (could not be evaluated)
feasible:      log pseudo-likelihood = -35613.002

improve:       log pseudo-likelihood = -17310.807
rescale:       log pseudo-likelihood = -17310.807
rescale eq:    log pseudo-likelihood = -16777.282

. ml maximize

initial:       log pseudo-likelihood = -16777.282
rescale:       log pseudo-likelihood = -16777.282
rescale eq:    log pseudo-likelihood = -16777.282
Iteration 0:   log pseudo-likelihood = -16777.282  (not concave)
Iteration 1:   log pseudo-likelihood = -16097.359
Iteration 2:   log pseudo-likelihood = -16013.711
Iteration 3:   log pseudo-likelihood = -14412.885
Iteration 4:   log pseudo-likelihood = -14264.159
Iteration 5:   log pseudo-likelihood = -14263.761
Iteration 6:   log pseudo-likelihood = -14263.761

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =     288.75
Log pseudo-likelihood = -14263.761                Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
           x |  -.9574683   .0563463   -16.99   0.000    -1.067905   -.8470317
       _cons |   1.887562    .127832    14.77   0.000     1.637016    2.138108
-------------+----------------------------------------------------------------
sigma        |
       _cons |   1.007465   .0561714    17.94   0.000     .8973713    1.117559
------------------------------------------------------------------------------
. estimates store bnlsbymlerobust
.
. ***** PRINT RESULTS: Third column of Table 5.7 p.161 **********
.
. * (1) NLS by ML - nonrobust and robust standard errors
. * The coefficient estimates are exactly the same as those using the nl command
. * The estimated standard errors are close - within 10% of those using the nl command
. * Table 5.7 reports the standard errors using the nl command
. estimates table bnlsbymle bnlsbymlerobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bnlsbymle    bnlsbyml~t
-------------+--------------------------
eq1          |
           x |    -0.9575      -0.9575
             |     0.0093       0.0563
             |    -102.43       -16.99

_cons | 1.8876 1.8876 | 0.0296 0.1278 | 63.83 14.77 -------------+-------------------------sigma | _cons | 1.0075 1.0075 | 0.0071 0.0562 | 141.42 17.94 -------------+-------------------------Statistics | N | 10000.0000 10000.0000 ll | -1.426e+04 -1.426e+04 ---------------------------------------legend: b/se/t . . ********** CLOSE OUTPUT ********** . log close log: c:\Imbook\bwebpage\Section2\mma05p3nlsbyml.txt log type: text closed on: 17 May 2005, 13:54:27 -----------------------------------------------------------------------------------------------------log: c:\Imbook\bwebpage\Section2\mma05p4margeffects.txt log type: text opened on: 17 May 2005, 13:57:02 . . ********** OVERVIEW OF MMA05P4MARGINALEFFECTS.DO ********** . . * STATA Program . * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi . * used for "Microeconometrics: Methods and Applications" . * by A. Colin Cameron and Pravin K. Trivedi (2005) . * Cambridge University Press . . * Chapter 5.9.4 pp.162-3 . * Marginal effects analysis for a nonlinear model (here exponential regression). . . * Provides . * (1) Sample average marginal effect using derivative . * (2) Sample average marginal effect using first difference . * (3) Marginal effect evaluated at the sample mean . * (4) Marginal effects (1)-(3) when model estimated by Stata ml command . * using generated data (see below) . . * Related programs: . * mma05p1mle.do OLS and MLE for the same data . * mma05p2nls.do NLS, WNLS, FGNLS for same data using nl command . * mma05p3nlsbyml.do NLS for same data using ml command . 109

. * To run this program you need data and dictionary files
. *   mma05data.asc   ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is y ~ exponential(exp(a + bx))
. *   x ~ N[mux, sigx^2]
. *   f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *   lnf(y) = (a + bx) - y*exp(a + bx)
. *   E[y] = exp(-(a + bx))   note sign reversal for the mean
. *   V[y] = exp(-2(a + bx)) = (E[y])^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]
. * and Table 5.7 uses N=10,000
.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)
.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416    .0000445   30.60636
           x |     10000    1.014313    1.004905   -2.895741   4.994059
.
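The log density above, lnf(y) = (a + bx) - y*exp(a + bx), is the objective that both streg and the ml program mleexp0 maximize. As a hedged illustration (Python rather than Stata, a plain Newton-Raphson loop with step halving rather than Stata's own optimizer, and freshly simulated data rather than mma05data.asc), the MLE can be computed directly from the score and Hessian of that log density. The function names are hypothetical.

```python
import math
import random


def loglik(a, b, ys, xs):
    """Exponential log-likelihood: sum of (a + b*x) - y*exp(a + b*x)."""
    return sum((a + b * x) - y * math.exp(a + b * x) for y, x in zip(ys, xs))


def exponential_mle(ys, xs, tol=1e-10, max_iter=100):
    """Newton-Raphson with step halving for a rate exp(a + b*x), so E[y|x] = exp(-(a + b*x))."""
    a, b = 0.0, 0.0
    ll = loglik(a, b, ys, xs)
    for _ in range(max_iter):
        ga = gb = haa = hab = hbb = 0.0
        for y, x in zip(ys, xs):
            w = y * math.exp(a + b * x)   # y times the fitted rate
            ga += 1.0 - w                 # score for a
            gb += (1.0 - w) * x           # score for b
            haa -= w                      # Hessian entries (negative definite)
            hab -= w * x
            hbb -= w * x * x
        det = haa * hbb - hab * hab
        # Newton direction d = -H^{-1} g, with the 2x2 inverse written out
        da = -(hbb * ga - hab * gb) / det
        db = -(haa * gb - hab * ga) / det
        step = 1.0
        new_ll = loglik(a + da, b + db, ys, xs)
        while new_ll < ll and step > 1e-8:   # back up until the likelihood does not fall
            step /= 2.0
            new_ll = loglik(a + step * da, b + step * db, ys, xs)
        a, b = a + step * da, b + step * db
        done = abs(new_ll - ll) < tol
        ll = new_ll
        if done:
            break
    return a, b, ll


if __name__ == "__main__":
    # Same design as the log (a = 2, b = -1, x ~ N[1, 1]) but a smaller simulated sample.
    rng = random.Random(12345)
    xs = [rng.gauss(1.0, 1.0) for _ in range(2000)]
    ys = [rng.expovariate(math.exp(2.0 - x)) for x in xs]   # expovariate takes the rate
    print(exponential_mle(ys, xs))
```

Because the log-likelihood is concave in (a, b), the step-halving safeguard rarely binds; it is included only because a full Newton step from (0, 0) can overshoot.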

. ********** MARGINAL EFFECTS for CHAPTER 5.9.4 **********
.
. ** (1) DERIVATIVE METHOD FOR SAMPLE AVERAGE MARGINAL EFFECT
.
. * (1A) METHOD A: Use analytical results
. * Since E[y] = exp(-(a + bx))   Note: here sign reversal for the mean !!
. *   dE[y]/dx = -b*exp(-(a + bx)) = -b*E[y]
.
. * Estimate the model
. * The Stata code for exponential regression is unusual, as streg is an st (survival-time) command
. * Need to declare data to be st data with dependent variable y
. stset y

     failure event:  (assumed to fail at time=y)
obs. time interval:  (0, y]
 exit on or before:  failure

------------------------------------------------------------------------------
    10000  total obs.
        0  exclusions
------------------------------------------------------------------------------
    10000  obs. remaining, representing
    10000  failures in single record/single failure data
 6194.352  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  30.60636
. quietly streg x, distribution(exponential) nohr
. gen dEydxanalyticalderivative = -_b[x]*exp(-_b[_cons] - _b[x]*x)
. * Alternative is to (1) predict the mean and (2) multiply by -_b[x]
. quietly sum dEydxanalyticalderivative
. scalar mesaad = r(mean)
. di "Sample average marginal effect by analytical derivative = " mesaad
Sample average marginal effect by analytical derivative = .60976598
.
. * (1B) METHOD B: Use numerical derivative (here one-sided)
. * This is the same as the first-difference code, except with a small change in x
. * Note: precision problems can arise with small changes in x
. * The following code tries to minimize such problems
. * Change in x will be 0.0001 times the standard deviation of x
. egen sdx = sd(x)
. quietly streg x, distribution(exponential) nohr
. * Need to tell streg to predict the mean as this is not the default.
. predict y0, mean time

. gen xoriginal = x
. replace x = x+0.0001*sdx
(10000 real changes made)
. predict y1, mean time
. gen dEydxnumericalderivative = (y1 - y0)/(0.0001*sdx)
. quietly sum dEydxnumericalderivative
. scalar mesand = r(mean)
. di "Sample average marginal effect by numerical derivative = " mesand
Sample average marginal effect by numerical derivative = .60949044
. replace x = xoriginal
(10000 real changes made)
. drop xoriginal sdx y0 y1
.
. ** (2) FINITE DIFFERENCE METHOD FOR SAMPLE AVERAGE MARGINAL EFFECT
.
. streg x, distribution(exponential) nohr /* y is dependent variable */

         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log likelihood = -20754.005
Iteration 1:   log likelihood = -17232.884
Iteration 2:   log likelihood = -15760.556
Iteration 3:   log likelihood = -15752.193
Iteration 4:   log likelihood =  -15752.19
Iteration 5:   log likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects =        10000                     Number of obs   =     10000
No. of failures =        10000
Time at risk    =  6194.352464
                                                   LR chi2(1)      =  10003.63
Log likelihood  =    -15752.19                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------
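The streg estimates just displayed (_cons = 1.982921, x = -.9896276), together with the sample mean of x from summarize (1.014313), are enough to check two of the key numbers below by hand. A minimal Python sketch, assuming only those printed values: it evaluates the marginal effect at the mean, -b*exp(-(a + b*xbar)), and shows why the first-difference average is about 1.71 times the derivative average. In this model a one-unit rise in x multiplies each observation's mean by exp(-b), so observation by observation the first difference equals the derivative times (exp(-b) - 1)/(-b); the same factor therefore links the sample averages. The function names are hypothetical.

```python
import math

# Estimates copied from the streg output above and the summarize output for x.
A_HAT = 1.982921     # _cons
B_HAT = -0.9896276   # coefficient on x
XBAR = 1.014313      # sample mean of x


def cond_mean(x, a=A_HAT, b=B_HAT):
    """E[y|x] = exp(-(a + b*x)); note the sign reversal in this parameterization."""
    return math.exp(-(a + b * x))


def me_derivative(x, a=A_HAT, b=B_HAT):
    """Analytical marginal effect dE[y|x]/dx = -b * E[y|x]."""
    return -b * cond_mean(x, a, b)


def me_first_difference(x, a=A_HAT, b=B_HAT):
    """Change in E[y|x] when x increases by one unit."""
    return cond_mean(x + 1.0, a, b) - cond_mean(x, a, b)


# First difference = derivative * (exp(-b) - 1) / (-b), for every observation.
FD_FACTOR = (math.exp(-B_HAT) - 1.0) / (-B_HAT)

if __name__ == "__main__":
    print("E[y|x] at mean of x:", cond_mean(XBAR))
    print("ME at mean of x:", me_derivative(XBAR))
    # .60976598 is the log's sample-average analytical derivative
    print("Implied average first difference:", FD_FACTOR * 0.60976598)
```

Run as a script, it prints values close to the .37563828 (predicted mean) and .371742 (marginal effect) reported by mfx, and close to the first-difference average 1.0414485.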

.
. * The following method can be used after many Stata estimation commands
. * 1. Predict y using sample data.
. *    Need to say predict the mean as this is not the streg default.
. predict y0, mean time
. * 2. Predict y with regressor x increased by one
. gen xoriginal = x
. replace x = x+1
(10000 real changes made)
. predict y1, mean time
. replace x = xoriginal /* Put x back to initial value for later analysis */
(10000 real changes made)
. * 3. Calculate difference
. gen dEydxfinitedifference = y1 - y0
. quietly sum dEydxfinitedifference
. scalar mesafd = r(mean)
. di "Sample average marginal effect by first differences = " mesafd
Sample average marginal effect by first differences = 1.0414485
. drop xoriginal y0 y1
.
. ** (3) DERIVATIVE METHOD FOR MARGINAL EFFECT AT SAMPLE MEAN
.
. * (3A) Use Stata command mfx
. quietly streg x, distribution(exponential) nohr
. * Need to tell mfx to predict the mean as this is not the streg default.
. mfx compute, dydx predict(mean time)

Marginal effects after ereg
      y  = predicted mean _t (predict, mean time)
         =  .37563828
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
       x |    .371742      .00525   70.81   0.000   .361452  .382032   1.01431
------------------------------------------------------------------------------
. di "Marginal effect by analytical derivative at mean of x using mfx: "
Marginal effect by analytical derivative at mean of x using mfx:
. matrix list e(Xmfx_dydx)

symmetric e(Xmfx_dydx)[1,1]
            x
r1    .371742
.
. * (3B) Write one's own code
. quietly streg x, distribution(exponential) nohr
. quietly sum x
. scalar meanx = r(mean)
. scalar dEydxatmeanx = -_b[x]*exp(-_b[_cons] - _b[x]*meanx)
. di "Marginal effect by analytical derivative at mean of x done manually: "
Marginal effect by analytical derivative at mean of x done manually:
. di dEydxatmeanx
.371742
.
. ** (4) MARGINAL EFFECTS AFTER ML COMMAND
.
. * Preceding (1) - (3) presume there is a built-in command to get MLE.
. * Now consider ML estimation using Stata's ml command.
. * After the ml command we cannot use predict or mfx,
. * so we need to be more manual, as follows.
.
. * Estimate model by ml: for details see mma05p1mle.do
. program define mleexp0
  1.   version 8.0
  2.   args lnf theta  /* Must use lnf, while could use a name other than theta */
  3.   quietly replace `lnf' = `theta' - $ML_y1*exp(`theta')
  4. end
. quietly ml model lf mleexp0 (y = x)
. quietly ml search
. quietly ml maximize
.
. * Note that here the mean is in fact exp(-a-b*x)
.
. * (1A) Sample average marginal effect by calculus methods
. gen mldEydxanalyticalderivative = -_b[x]*exp(-_b[_cons] - _b[x]*x)
. quietly sum mldEydxanalyticalderivative

. scalar mlmesaad = r(mean)
. di "Sample average marginal effect by analytical derivative = " mlmesaad
Sample average marginal effect by analytical derivative = .60976598
.
. * (1B) Sample average marginal effect by numerical derivative
. egen sdx = sd(x)
. gen y0 = exp(-_b[_cons] - _b[x]*x)
. gen xoriginal = x
. replace x = x+0.0001*sdx
(10000 real changes made)
. gen y1 = exp(-_b[_cons] - _b[x]*x)
. gen mldEydxnumericalderivative = (y1 - y0)/(0.0001*sdx)
. quietly sum mldEydxnumericalderivative
. scalar mlmesand = r(mean)
. di "ML sample average marginal effect by numerical derivative = " mlmesand
ML sample average marginal effect by numerical derivative = .60949063
. replace x = xoriginal
(10000 real changes made)
. drop xoriginal sdx y0 y1
.
. * (2) Sample average marginal effect by increasing x by one unit (finite difference)
. gen mldEydxfinitedifference = exp(-_b[_cons]-_b[x]*(x+1)) - exp(-_b[_cons]-_b[x]*x)
. quietly sum mldEydxfinitedifference
. scalar mlmesafd = r(mean)
. di "Sample average marginal effect by first difference = " mlmesafd
Sample average marginal effect by first difference = 1.0414485
.
. * (3) Marginal effect estimated at the sample mean of x
. quietly sum x
. scalar meanx = r(mean)
. scalar mldEydxatmeanx = -_b[x]*exp(-_b[_cons] - _b[x]*meanx)

. di "ML marginal effect at mean of x by analytical derivative: "
ML marginal effect at mean of x by analytical derivative:
. di mldEydxatmeanx
.371742
.
. ********** DISPLAY RESULTS on p.162-3 **********
.
. di "Marginal Effects: (1A) Analytical deriv (1B) Numerical Deriv (2) First diff"
Marginal Effects: (1A) Analytical deriv (1B) Numerical Deriv (2) First diff
. sum dEydxfinitedifference dEydxanalyticalderivative dEydxnumericalderivative

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
dEydxfinit~e |     10000    1.041449    1.373144      .01325   32.59646
dEydxanaly~e |     10000     .609766    .8039727    .0077578   19.08516
dEydxnumer~e |     10000    .6094904    .8035654    .0077479   19.11325
.
. di "KEY RESULTS FOR CHAPTER 5.9.4 pp.162-3 FOLLOW"
KEY RESULTS FOR CHAPTER 5.9.4 pp.162-3 FOLLOW
. di "(1A) Sample average marginal effect by analytical derivative = " mesaad
(1A) Sample average marginal effect by analytical derivative = .60976598
. di "(1B) Sample average marginal effect by numerical derivative = " mesand
(1B) Sample average marginal effect by numerical derivative = .60949044
. di "(2) Sample average marginal effect by first differences = " mesafd
(2) Sample average marginal effect by first differences = 1.0414485
. di "(3) Marginal effect at mean of x by analytical derivative = " dEydxatmeanx
(3) Marginal effect at mean of x by analytical derivative = .371742
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma05p4margeffects.txt
  log type:  text
 closed on:  17 May 2005, 13:57:06
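The numerical-derivative averages (.60949044 and .60949063) sit slightly below the analytical .60976598. For this model a forward difference with step h overstates the derivative by a relative amount of about -b*h/2, which for h = 0.0001*sd(x) is roughly 5e-5 and has the opposite sign to the gap in the log. The log's own warning about precision suggests the gap plausibly comes from Stata storing generated variables as float (single precision) by default; that explanation is an inference, not something the log states. The Python sketch below works in double precision on a few hypothetical x values (not the book's data) and reuses the streg estimates.

```python
import math

A_HAT, B_HAT = 1.982921, -0.9896276   # streg estimates from the log


def cond_mean(x):
    """E[y|x] = exp(-(a + b*x))."""
    return math.exp(-(A_HAT + B_HAT * x))


def forward_difference(f, x, h):
    """One-sided numerical derivative (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


# Hypothetical regressor values; step = 0.0001 times the sd of x reported by summarize.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0]
h = 0.0001 * 1.004905

analytical = [-B_HAT * cond_mean(x) for x in xs]
numerical = [forward_difference(cond_mean, x, h) for x in xs]
avg_a = sum(analytical) / len(xs)
avg_n = sum(numerical) / len(xs)

if __name__ == "__main__":
    print("analytical:", avg_a, "numerical:", avg_n)
    print("relative error:", avg_n / avg_a - 1.0, "approx -b*h/2:", -B_HAT * h / 2.0)
```

In double precision the forward difference lands above the analytical value, by the predicted relative amount, which is why the downward gap in the Stata log points to storage precision rather than to the size of the step.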


------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma06p2Theil.txt
  log type:  text
 opened on:  18 May 2005, 17:45:50
.
. ********** OVERVIEW OF MMA06P2THEIL.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * NOTE: Stata does not have a NL2SLS command
.
. * Chapter 6.5.4 nonlinear 2SLS example.
. * Table 6.4 partial only
. * (1) OLS inconsistent
. * (2) NL2SLS consistent NOT INCLUDED AS STATA DOES NOT DO
. * (3) Wrong 2SLS inconsistent
.
. * To run this program you need data set
. *   mma06p1nl2sls.asc
. * generated by Limdep program MMA06P1NL2SLS.LIM
.
. * Some of the analysis is done in Limdep which (unlike Stata) has
. * an NL2SLS command
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** READ DATA and SUMMARIZE **********
.
. * Model is y = 1*x^2 + u
. *          x = 1*z + v
. * where u and v are joint normal (0,0,1,1,0.8)
.
. infile y x xsq z zsq u v using mma06p1nl2sls.asc
(200 observations read)
.
. * Descriptive Statistics
. describe

Contains data
  obs:           200
 vars:             7
 size:         6,400 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
xsq             float  %9.0g
z               float  %9.0g
zsq             float  %9.0g
u               float  %9.0g
v               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       200    1.632794    2.418096   -2.332656   9.354863
           x |       200    .9970513    .8330302   -1.908285   2.696363
         xsq |       200    1.684581    1.638509    .0000948   7.270374
           z |       200           1           0          1          1
         zsq |       200           1           0          1          1
-------------+--------------------------------------------------------
           u |       200   -.0517871    .9427286   -2.816687   2.202356
           v |       200   -.0029487    .8330302   -2.908285   1.696363
.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 6.4)
. regress y xsq, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) = 2250.83
       Model |  1558.96322     1  1558.96322           Prob > F      =  0.0000
    Residual |   137.83055   199  .692615831           R-squared     =  0.9188
-------------+------------------------------           Adj R-squared =  0.9184
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
. estimates store olswrong

. regress y xsq, noconstant robust

Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   199) = 3850.71
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9188
                                                       Root MSE      =  .83224

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0191687    62.05   0.000     1.151695    1.227295
------------------------------------------------------------------------------
. estimates store olswrongrob
.
. * (2) Stata does not have an NL2SLS command
. * See LIMDEP program MMA06P1NL2SLS.LIM
.
. * (3A) Theil's 2SLS, which first regresses x on z, is inconsistent
. regress x z, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  286.51
       Model |  198.822258     1  198.822258           Prob > F      =  0.0000
    Residual |  138.093918   199  .693939288           R-squared     =  0.5901
-------------+------------------------------           Adj R-squared =  0.5881
       Total |  336.916176   200  1.68458088           Root MSE      =  .83303

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .9970513   .0589041    16.93   0.000     .8808949    1.113208
------------------------------------------------------------------------------
. predict xhat
(option xb assumed; fitted values)
. gen xhatsq = xhat*xhat
. regress y xhatsq, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xhatsq |   1.642466   .1719981     9.55   0.000     1.303293    1.981638
------------------------------------------------------------------------------
. estimates store ivwrong
.
. ********** DISPLAY KEY RESULTS Table 6.4 p.199 **********
.
. * Table 6.4 p.199
. estimates table olswrong olswrongrob ivwrong, b(%8.3f) se stats(N r2) keep(xsq xhatsq)

-----------------------------------------------
    Variable |  olswrong    olswro~b    ivwrong
-------------+---------------------------------
         xsq |     1.189       1.189
             |     0.025       0.019
      xhatsq |                             1.642
             |                             0.172
-------------+---------------------------------
           N |   200.000     200.000     200.000
          r2 |     0.919       0.919       0.314
-----------------------------------------------
                                   legend: b/se
.
. * (3B) IV with instrument xsq for zsq should work but Stata cannot do
. ivreg y (xsq = xsq), noconstant

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =       .
       Model |  1558.96322     1  1558.96322           Prob > F      =       .
    Residual |   137.83055   199  .692615831           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
Instrumented:  xsq
Instruments:   xsq
------------------------------------------------------------------------------
. corr xsq xsq
(obs=200)

             |      xsq      xsq
-------------+------------------
         xsq |   1.0000
         xsq |   1.0000   1.0000

. corr xsq z
(obs=200)

             |      xsq        z
-------------+------------------
         xsq |   1.0000
           z |        .        .

. regress xsq z, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  211.41
       Model |  567.562553     1  567.562553           Prob > F      =  0.0000
    Residual |  534.257348   199  2.68471029           R-squared     =  0.5151
-------------+------------------------------           Adj R-squared =  0.5127
       Total |   1101.8199   200  5.50909951           Root MSE      =  1.6385

------------------------------------------------------------------------------
         xsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   1.684581   .1158601    14.54   0.000      1.45611    1.913052
------------------------------------------------------------------------------
. predict xsqhat
(option xb assumed; fitted values)
. regress y xsqhat, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xsqhat |   .9692582   .1015002     9.55   0.000     .7691043    1.169412
------------------------------------------------------------------------------
. * ivreg y (xsq = z), noconstant
.
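Why squaring the first-stage fitted value fails, while regressing on the fitted value of x^2 itself does not, can be made explicit. The derivation below is a sketch under assumptions we state ourselves to match the simulation design: E[u|z] = E[v|z] = 0 and E[v^2|z] = σ_v^2 (u and v may be correlated, which is the source of the endogeneity). Here \widehat{x_i^2} denotes the fitted value from projecting x_i^2 on the instruments, as in `regress xsq z`.

```latex
\begin{aligned}
& y_i = \beta x_i^2 + u_i, \qquad x_i = \pi z_i + v_i, \qquad
  E[x_i^2 \mid z_i] = \pi^2 z_i^2 + \sigma_v^2 \\
& y_i = \beta(\pi z_i)^2 + \varepsilon_i, \qquad
  \varepsilon_i = \beta\,(2\pi z_i v_i + v_i^2) + u_i, \qquad
  E[\varepsilon_i \mid z_i] = \beta\sigma_v^2 \neq 0 \\
& \operatorname{plim}\hat\beta_{\text{Theil}}
  = \frac{E[(\pi z_i)^2 y_i]}{E[(\pi z_i)^4]}
  = \beta\left(1 + \frac{\sigma_v^2\,E[z_i^2]}{\pi^2\,E[z_i^4]}\right) \\
& \hat\beta_{\text{NL2SLS}}
  = \frac{\sum_i \widehat{x_i^2}\,y_i}{\sum_i \widehat{x_i^2}\,x_i^2}
  = \beta + \frac{\sum_i \widehat{x_i^2}\,u_i}{\sum_i \widehat{x_i^2}\,x_i^2}
  \;\xrightarrow{p}\; \beta
\end{aligned}
```

The last line uses the projection property \sum_i (\widehat{x_i^2})^2 = \sum_i \widehat{x_i^2}\,x_i^2, so regressing y on xsqhat is an IV estimator. With z_i = 1 for all observations and β = π = σ_v^2 = 1 as in the design, the Theil probability limit is 2; the sample estimate 1.642 is likewise well above β = 1, whereas the xsqhat regression gives 0.969.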

. gen one = 1
. regress y one, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         one |   1.632794   .1709852     9.55   0.000     1.295618    1.969969
------------------------------------------------------------------------------
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma06p2Theil.txt
  log type:  text
 closed on:  18 May 2005, 17:45:50
------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma06p2twostage.txt
  log type:  text
 opened on:  18 May 2005, 17:59:06
.
. ********** OVERVIEW OF MMA06P2TWOSTAGE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * NOTE: Stata does not have a NL2SLS command
.
. * Chapter 6.5.4 nonlinear 2SLS example on pages 198-9.
.
. * Table 6.4 partial only
. * (1) OLS inconsistent
. * (2) NL2SLS consistent NOT INCLUDED AS STATA DOES NOT DO
. * (3) Two-stage: here 2SLS using Theil's interpretation of 2SLS is inconsistent
.
. * To run this program you need data set
. *   mma06p1nl2sls.asc
. * generated by Limdep program MMA06P1NL2SLS.LIM
.
. * Some of the analysis is done in Limdep which (unlike Stata) has
. * an NL2SLS command
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** READ DATA and SUMMARIZE **********
.
. * Model is y = 1*x^2 + u
. *          x = 1*z + v
. * where u and v are joint normal (0,0,1,1,0.8)
.
. infile y x xsq z zsq u v using mma06p1nl2sls.asc
(200 observations read)
.
. * Descriptive Statistics
. describe

Contains data
  obs:           200
 vars:             7
 size:         6,400 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
xsq             float  %9.0g
z               float  %9.0g
zsq             float  %9.0g
u               float  %9.0g
v               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       200    1.632794    2.418096   -2.332656   9.354863
           x |       200    .9970513    .8330302   -1.908285   2.696363
         xsq |       200    1.684581    1.638509    .0000948   7.270374
           z |       200           1           0          1          1
         zsq |       200           1           0          1          1
-------------+--------------------------------------------------------
           u |       200   -.0517871    .9427286   -2.816687   2.202356
           v |       200   -.0029487    .8330302   -2.908285   1.696363

.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 6.4)
. regress y xsq, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) = 2250.83
       Model |  1558.96322     1  1558.96322           Prob > F      =  0.0000
    Residual |   137.83055   199  .692615831           R-squared     =  0.9188
-------------+------------------------------           Adj R-squared =  0.9184
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
. estimates store olswrong
. regress y xsq, noconstant robust

Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   199) = 3850.71
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9188
                                                       Root MSE      =  .83224

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0191687    62.05   0.000     1.151695    1.227295
------------------------------------------------------------------------------
. estimates store olswrongrob
.
. * (2) Stata does not have an NL2SLS command
. * See LIMDEP program MMA06P1NL2SLS.LIM
. * See also code further down
.
. * (3A) Theil's 2SLS, which first regresses x on z
. *      and then regresses y on xhat^2 in place of x^2, is inconsistent
.
. regress x z, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  286.51
       Model |  198.822258     1  198.822258           Prob > F      =  0.0000
    Residual |  138.093918   199  .693939288           R-squared     =  0.5901
-------------+------------------------------           Adj R-squared =  0.5881
       Total |  336.916176   200  1.68458088           Root MSE      =  .83303

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .9970513   .0589041    16.93   0.000     .8808949    1.113208
------------------------------------------------------------------------------
. predict xhat
(option xb assumed; fitted values)
. gen xhatsq = xhat*xhat
. regress y xhatsq, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xhatsq |   1.642466   .1719981     9.55   0.000     1.303293    1.981638
------------------------------------------------------------------------------
. estimates store twostage
.
. ********** DISPLAY KEY RESULTS Table 6.4 p.199 **********
.
. * Table 6.4 p.199 first and third columns
. estimates table olswrong twostage, b(%8.3f) se stats(N r2) keep(xsq xhatsq)

------------------------------------
    Variable |  olswrong    twostage
-------------+----------------------
         xsq |     1.189
             |     0.025
      xhatsq |                 1.642
             |                 0.172
-------------+----------------------
           N |   200.000     200.000
          r2 |     0.919       0.314
------------------------------------
                        legend: b/se
.
. ********** FURTHER ANALYSIS **********
.
. * For this particular example there are ways to get linear IV to work
. * as the problem is not very nonlinear
.
. * (2A) regress xsq on z giving xsqhat and then regress y on xsqhat
. *      Gives nl2sls estimator though not correct standard errors
.
. * Note we get estimator 0.969 which is correct - Table 6.4 had typo
. regress xsq z, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  211.41
       Model |  567.562553     1  567.562553           Prob > F      =  0.0000
    Residual |  534.257348   199  2.68471029           R-squared     =  0.5151
-------------+------------------------------           Adj R-squared =  0.5127
       Total |   1101.8199   200  5.50909951           Root MSE      =  1.6385

------------------------------------------------------------------------------
         xsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   1.684581   .1158601    14.54   0.000      1.45611    1.913052
------------------------------------------------------------------------------
. predict xsqhat
(option xb assumed; fitted values)
. regress y xsqhat, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xsqhat |   .9692582   .1015002     9.55   0.000     .7691043    1.169412
------------------------------------------------------------------------------
.
. * (2B) IV with instrument z for xsq should work, but Stata cannot do it,
. *      apparently because here z = 1 for every observation (no variation)
.
. ivreg y (xsq = z), noconstant
note: z dropped due to collinearity
equation not identified; must have at least as many instruments not in
the regression as there are instrumented variables
r(481);

end of do-file

r(481);

. exit, clear
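Because z equals 1 for every observation (summarize reports mean 1 and standard deviation 0), each no-constant regression above collapses to a ratio of sample means: regressing y on a regressor equal to a constant c gives sum(c*y)/sum(c*c) = ybar/c. A hedged Python sketch using only the means printed in the log (not the raw data) reproduces both estimates; the helper name is hypothetical.

```python
def ols_through_origin(ys, ws):
    """Slope from regressing y on w without a constant: sum(w*y) / sum(w*w)."""
    return sum(w * y for w, y in zip(ws, ys)) / sum(w * w for w in ws)


# Values printed in the log (z = 1 for all 200 observations).
YBAR = 1.632794     # mean of y
XHAT = 0.9970513    # first-stage fitted x for every obs (coefficient on z)
XSQHAT = 1.684581   # fitted xsq for every obs (coefficient from regress xsq z)

# With a constant regressor c, the no-constant OLS slope is ybar / c.
theil = YBAR / XHAT ** 2   # regress y xhatsq: squares the fitted x
nl2sls = YBAR / XSQHAT     # regress y xsqhat: fits x^2 itself

if __name__ == "__main__":
    print("Theil two-stage:", theil)
    print("NL2SLS-type:   ", nl2sls)
    # Check the ybar / c shortcut on a small hypothetical sample.
    sample = [0.5, 2.0, 1.5, 3.0]
    print(ols_through_origin(sample, [2.0] * 4), sum(sample) / len(sample) / 2.0)
```

The two estimates differ only in the denominator: the square of the mean of x (about 0.994) versus the mean of x squared (about 1.685). Their gap is the sample variance of x, which is exactly the term Theil's substitution drops.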


------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma07p1mltests.txt
  log type:  text
 opened on:  17 May 2005, 13:59:20
.
. ********** OVERVIEW OF MMA07P1MLTESTS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.4 pp.241-3
. * Likelihood-based hypothesis tests
.
. * Implements the three likelihood-based tests presented in Table 7.1:
. *   Wald test
. *   LR test
. *   LM test direct
. *   LM test via auxiliary regression
. * for a Poisson model with simulated data (see below).
.
. * NOTE: Implementing this program requires:
. *   the free Stata add-on rndpoix
. * To obtain this, in Stata give command: search rndpoix
. * If you don't want to do this, instead use the data set
.
. ********** SETUP ***********
.
. version 8
. set more off
.
. ********** GENERATE DATA ***********
.
. * Model is
. *   y ~ Poisson[exp(b1 + b2*x2 + b3*x3 + b4*x4)]
. * where
. *   x2, x3 and x4 are iid ~ N[0,1]
. *   and b1=0, b2=0.1, b3=0.1 and b4=0.1
.
. set seed 10001
. set obs 200
obs was 0, now 200
. scalar b1 = 0
. scalar b2 = 0.1
. scalar b3 = 0.1
. scalar b4 = 0.1
.
. * Generate regressors
. gen x2 = invnorm(uniform())
. gen x3 = invnorm(uniform())
. gen x4 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3+b4*x4)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ....... )
Variable xp created.
. gen y = xp
.
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       200   -.0091098    1.010072   -2.857666   2.149822
          x3 |       200   -.1459839    1.109521   -3.086754   3.111421
          x4 |       200   -.0325314    .9674748   -2.852186   2.379461
     mupoiss |       200    1.000447    .1993649    .6191922   1.903112
          xp |       200        .845     .951579           0          6
-------------+--------------------------------------------------------
           y |       200        .845     .951579           0          6
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 x4 using mma07p1mltests.asc, replace
.
. ********** ANALYSIS: LIKELIHOOD-BASED HYPOTHESIS TESTS ***********
.
. * Hypotheses to test are
. *   (A) Single exclusion:   b3 = 0
. *   (B) Multiple exclusion: b3 = 0, b4 = 0
. *   (C) Linear:             b3 = b4
. *   (D) Nonlinear:          b3/b4 = 1
.
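The data-generating step above depends on the rndpoix add-on. As a hedged stand-in (not rndpoix's algorithm, and not reproducing Stata's random-number stream even with the same seed), the same design can be simulated in Python using Knuth's multiplication method for Poisson draws, which is adequate for means of the size seen here (mupoiss ranges from about 0.62 to 1.90). The function names are hypothetical.

```python
import math
import random


def poisson_draw(mu, rng):
    """Knuth's method: multiply uniforms until the product falls to exp(-mu) or below."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n, b=(0.0, 0.1, 0.1, 0.1), seed=10001):
    """Rows (y, x2, x3, x4, mu) with y ~ Poisson(exp(b1 + b2*x2 + b3*x3 + b4*x4))."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x2, x3, x4 = (rng.gauss(0.0, 1.0) for _ in range(3))   # iid N[0,1] regressors
        mu = math.exp(b[0] + b[1] * x2 + b[2] * x3 + b[3] * x4)
        rows.append((poisson_draw(mu, rng), x2, x3, x4, mu))
    return rows


if __name__ == "__main__":
    rows = simulate(200)
    print("mean of y:", sum(r[0] for r in rows) / len(rows))
    print("mean of mu:", sum(r[4] for r in rows) / len(rows))
```

With b1 = 0 and the slopes at 0.1, the population mean of mu is exp(0.015), about 1.015, close to the sample mean 1.000447 of mupoiss in the log.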

. * Tests are Wald, LR, LM and LM (auxiliary) . . ****** (A) TEST H0: b3 = 0 . . * First skip to (B) where many comments given. . . ****** (B) TEST H0: b3 = 0, b4 = 0. . . * (1) Wald test requires estimation of unrestricted model only . poisson y x2 x3 x4 Iteration 0: log likelihood = -238.77153 Iteration 1: log likelihood = -238.77153 Poisson regression

Number of obs = 200 LR chi2(3) = 8.30 Prob > chi2 = 0.0401 Log likelihood = -238.77153 Pseudo R2 = 0.0171

-----------------------------------------------------------------------------y| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------x2 | -.0275702 .0767909 -0.36 0.720 -.1780775 .1229371 x3 | .1630037 .0670848 2.43 0.015 .0315199 .2944874 x4 | .1026568 .0802139 1.28 0.201 -.0545595 .2598732 _cons | -.1653238 .0773479 -2.14 0.033 -.316923 -.0137246 -----------------------------------------------------------------------------. . * (1A) Stata Wald test command . test (x3=0) (x4=0) ( 1) [y]x3 = 0 ( 2) [y]x4 = 0 chi2( 2) = 8.57 Prob > chi2 = 0.0138 . . * (1B) Wald test done manually . * Use h'[R*V*R']-inv*h. . * Details below will change for each example. . * In particular, nonlinear restrictions need more work in forming R . * Note that Stata puts the intercept last, not first. . * So here the second and third elements of b are set to zero. . matrix bfull = e(b) /* 1xq row vector */ . matrix vfull = e(V) /* qxq matrix */

. matrix h = (bfull[1,2]\bfull[1,3]) /* hx1 vector */

. matrix R = (0,1,0,0\0,0,1,0) /* h x q matrix */

. matrix Wald = h'*syminv(R*vfull*R')*h /* scalar */ . matrix list h h[2,1] c1 r1 .16300365 r2 .10265681 . matrix list R R[2,4] c1 c2 c3 c4 r1 0 1 0 0 r2 0 0 1 0 . matrix list Wald symmetric Wald[1,1] c1 c1 8.5701855 . scalar WaldB = Wald[1,1] . . * (2) Likelihood ratio test requires estimating both models . . poisson y x2 x3 x4 Iteration 0: log likelihood = -238.77153 Iteration 1: log likelihood = -238.77153 Poisson regression

Number of obs = 200 LR chi2(3) = 8.30 Prob > chi2 = 0.0401 Log likelihood = -238.77153 Pseudo R2 = 0.0171

-----------------------------------------------------------------------------y| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------x2 | -.0275702 .0767909 -0.36 0.720 -.1780775 .1229371 x3 | .1630037 .0670848 2.43 0.015 .0315199 .2944874 x4 | .1026568 .0802139 1.28 0.201 -.0545595 .2598732 _cons | -.1653238 .0773479 -2.14 0.033 -.316923 -.0137246 -----------------------------------------------------------------------------. estimates store unrestricted /* Used for Stata lrtest */ . scalar llunrest = e(ll) /* Used for manual lrtest */
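The manual Wald computation h'[R*V*R']-inv*h can be sketched in Python. This is a minimal sketch using only the standard library; `solve`, `wald_stat` and `chi2tail_df1` are hypothetical helper names, not part of the original Stata programs. For test (A), R = (0,1,0,0) picks out b3, so R*V*R' is just se(b3)^2 and the statistic reduces to the squared z-statistic. With the coefficient and standard error copied from the unrestricted poisson output, it should reproduce WaldA = 5.904 and its p-value of about 0.0151 up to rounding.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def wald_stat(h, rvr):
    """Wald statistic h'[R V R']^-1 h, where rvr = R V R' is an h x h matrix."""
    x = solve(rvr, h)
    return sum(hi * xi for hi, xi in zip(h, x))


def chi2tail_df1(w):
    """Pr(chi2(1) > w) = Pr(|Z| > sqrt(w)) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(w / 2.0))


# Test (A), H0: b3 = 0. With R = (0,1,0,0) in Stata's order (x2, x3, x4, _cons),
# R V R' is Var(b3) = se(b3)^2; both numbers come from the unrestricted poisson output.
b3, se_b3 = 0.1630037, 0.0670848
wald_a = wald_stat([b3], [[se_b3 ** 2]])
print(f"WaldA = {wald_a:.3f}, p-value = {chi2tail_df1(wald_a):.4f}")
```

For the two-restriction test (B), the same `wald_stat` would take h = (b3, b4) and the 2x2 block of the covariance matrix, which this log does not print in full.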

. poisson y x2 Iteration 0: log likelihood = -242.92271 Iteration 1: log likelihood = -242.92271 (backed up) Poisson regression

Number of obs = 200 LR chi2(1) = 0.00 Prob > chi2 = 0.9608 Log likelihood = -242.92271 Pseudo R2 = 0.0000 -----------------------------------------------------------------------------y| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------x2 | -.0037493 .0763386 -0.05 0.961 -.1533701 .1458716 _cons | -.1684599 .0769294 -2.19 0.029 -.3192388 -.0176811 -----------------------------------------------------------------------------. estimates store restrictedB /* Used for Stata lrtest */ . scalar llrestB = e(ll) /* Used for manual lrtest */

. . * (2A) Stata likelihood ratio test . lrtest unrestricted restrictedB likelihood-ratio test LR chi2(2) = 8.30 (Assumption: restrictedB nested in unrestricted) Prob > chi2 = 0.0157
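The LR statistic needs only the two maximized log-likelihoods. Below is a minimal Python sketch that assumes the log-likelihoods printed in this log. The `chi2tail` here is a hypothetical stand-in for Stata's chi2tail() and covers only df = 1 and df = 2, using the closed forms Pr(chi2(1) > x) = erfc(sqrt(x/2)) and Pr(chi2(2) > x) = exp(-x/2).

```python
import math


def chi2tail(df, x):
    """Pr(chi2(df) > x); this sketch covers only df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is handled here")


def lr_stat(ll_restricted, ll_unrestricted):
    """Likelihood ratio statistic -2 * (lnL_restricted - lnL_unrestricted)."""
    return -2.0 * (ll_restricted - ll_unrestricted)


# Maximized log-likelihoods copied from the poisson output in this log.
ll_unrest = -238.77153
restricted = {"A": (-241.64842, 1), "B": (-242.92271, 2), "C": (-238.91785, 1)}
for name, (ll_rest, df) in restricted.items():
    lr = lr_stat(ll_rest, ll_unrest)
    print(f"LR{name} = {lr:.3f}, p-value = {chi2tail(df, lr):.4f}")
```

The values agree with LRA, LRB and LRC reported later in the log (5.754, 8.302 and 0.293) to the precision of the printed log-likelihoods.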

. . * (2B) Likelihood ratio test done manually . scalar LRB = -2*(llrestB-llunrest) . di "LR " LRB LR 8.3023503 . . * (3) LM test via direct computation requires estimating only the restricted model. . . * For exclusion restrictions in the Poisson, from 7.6.2 . * LM = dlnL/db' * [Var(dlnL/db)]-inv * dlnL/db, evaluated at the restricted estimates . * = [Sum_i u_i*x_i]'[Sum_i exp(x_i'b)*x_i*x_i']-inv[Sum_i u_i*x_i] . * First calculate Sum_i u_i*x_i' : a 1x4 row vector . . quietly poisson y x2 . predict yhatrest (option n assumed; predicted number of events) . gen u = y - yhatrest /* yhatrest = exp(x_brest) calculated earlier */

. gen one = 1 . matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant . * Then calculate Sum_i exp(x_i'b)*x_i*x_i' . gen trx1 = sqrt(yhatrest) . gen trx2 = sqrt(yhatrest)*x2 . gen trx3 = sqrt(yhatrest)*x3 . gen trx4 = sqrt(yhatrest)*x4 . matrix accum Vb = trx1 trx2 trx3 trx4, noconstant (obs=200) . matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db' . matrix list dlnL_db dlnL_db[1,4] one x2 x3 x4 u 1.192e-07 -4.632e-08 37.578639 19.933299 . matrix list Vb symmetric Vb[4,4] trx1 trx2 trx3 trx4 trx1 169 trx2 -2.1828434 171.62608 trx3 -24.733563 16.929495 210.68156 trx4 -5.561359 17.0457 23.027167 157.58531 . matrix list LMdirect symmetric LMdirect[1,1] u u 8.5750886 . scalar LMdirectB = LMdirect[1,1] . . * (4) LM test via auxiliary regression . . * LM = N times the uncentered Rsq from regressing 1 on the scores (noconstant) . * Begin by computing the unrestricted scores at the restricted estimates. . * This varies from problem to problem. . * In general one could compute lnf(y) at the current parameters . * and take numerical derivatives by perturbing beta slightly. . * Here use the analytical derivative. . * s_j = dlnf(y)/db_j = (y-exp(x'b))*x_j for the Poisson

. . drop yhatrest . quietly poisson y x2 . predict yhatrest (option n assumed; predicted number of events) . gen s1 = (y-yhatrest)*1 . gen s2 = (y-yhatrest)*x2 . gen s3 = (y-yhatrest)*x3 . gen s4 = (y-yhatrest)*x4 . regress one s1 s2 s3 s4, noconstant Source | SS df MS Number of obs = 200 -------------+-----------------------------F( 4, 196) = 2.36 Model | 9.18577727 4 2.29644432 Prob > F = 0.0549 Residual | 190.814223 196 .973541953 R-squared = 0.0459 -------------+-----------------------------Adj R-squared = 0.0265 Total | 200 200 1 Root MSE = .98668 -----------------------------------------------------------------------------one | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------s1 | -.0265153 .0748092 -0.35 0.723 -.1740497 .121019 s2 | -.0102806 .0809418 -0.13 0.899 -.1699093 .1493481 s3 | .1794153 .0697359 2.57 0.011 .0418862 .3169444 s4 | .1225885 .0821671 1.49 0.137 -.0394566 .2846336 -----------------------------------------------------------------------------. * LM equals N times uncentered Rsq . scalar LMauxB = e(N)*e(r2) . * Check: LM equals explained sum of squares . scalar LMauxB2 = e(mss) . di "LMauxB " LMauxB " LMauxB2 " LMauxB2 LMauxB 9.1857773 LMauxB2 9.1857773 . . * (5) DISPLAY RESULTS . . estimates table unrestricted restrictedB, se stats(N ll r2) b(%8.3f) -----------------------------------Variable | unrest~d restri~B -------------+---------------------

x2 | -0.028 -0.004 | 0.077 0.076 x3 | 0.163 | 0.067 x4 | 0.103 | 0.080 _cons | -0.165 -0.168 | 0.077 0.077 -------------+---------------------N | 200.000 200.000 ll | -238.772 -242.923 r2 | -----------------------------------legend: b/se . * Wald test using Stata default Poisson variance matrix . di "WaldB " WaldB " p-value " chi2tail(2,WaldB) WaldB 8.5701855 p-value .01377234 . * LR test using Poisson log-likelihoods . di " LRB " LRB " p-value " chi2tail(2,LRB) LRB 8.3023503 p-value .0157459 . * LM test direct . di " LMdirectB " LMdirectB " p-value " chi2tail(2,LMdirectB) LMdirectB 8.5750886 p-value .01373862 . * LM test by auxiliary regression . di " LMauxB " LMauxB " p-value " chi2tail(2,LMauxB) LMauxB 9.1857773 p-value .01012357 . . ****** (A) TEST H0: b3 = 0 . . * (1) Wald test . quietly poisson y x2 x3 x4 . test (x3=0) ( 1) [y]x3 = 0 chi2( 1) = 5.90 Prob > chi2 = 0.0151 . scalar WaldA = r(chi2) . . * (2) LR test . poisson y x2 x4 Iteration 0: log likelihood = -241.64842

Iteration 1: log likelihood = -241.64842 Poisson regression

Number of obs = 200 LR chi2(2) = 2.55 Prob > chi2 = 0.2793 Log likelihood = -241.64842 Pseudo R2 = 0.0053

-----------------------------------------------------------------------------y| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------x2 | -.0163179 .0770381 -0.21 0.832 -.1673098 .134674 x4 | .1278017 .0800348 1.60 0.110 -.0290637 .284667 _cons | -.1719505 .0772389 -2.23 0.026 -.3233359 -.0205651 -----------------------------------------------------------------------------. estimates store restrictedA . lrtest unrestricted /* Uses estimates store unrestricted from earlier */ likelihood-ratio test LR chi2(1) = 5.75 (Assumption: restrictedA nested in unrestricted) Prob > chi2 = 0.0165
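The direct LM statistic and its auxiliary (OPG) version can be sketched together for the Poisson. This is a minimal Python sketch that makes several assumptions. To keep the restricted MLE in closed form, it imposes that all three slopes are zero, so the restricted mean is ybar for every observation; that is not one of the restrictions tested in this log. The data are simulated with b1 = 0 (an assumption, since its value is cut off in this excerpt) and b2 = b3 = b4 = 0.1. The Poisson draws use Knuth's method rather than rndpoix. All helper names are hypothetical. The direct form uses the information matrix Sum_i mu_i*x_i*x_i', which the Stata code builds as the cross-product of sqrt(yhatrest)*x. The auxiliary form replaces it with Sum_i s_i*s_i'. The two forms are asymptotically equivalent but differ in finite samples, which is why LMauxB (9.186) differs from LMdirectB (8.575).

```python
import math
import random


def poisson_draw(mu, rng):
    """One Poisson(mu) draw by Knuth's multiplication method (fine for small mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


rng = random.Random(10101)
n, k = 200, 4
# Regressors (1, x2, x3, x4); ASSUMED dgp: b1 = 0 and b2 = b3 = b4 = 0.1.
x = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y = [poisson_draw(math.exp(0.1 * (xi[1] + xi[2] + xi[3])), rng) for xi in x]

# ASSUMED restriction b2 = b3 = b4 = 0, so the restricted MLE of the mean is ybar.
mu_hat = sum(y) / n
u = [yi - mu_hat for yi in y]  # residuals y - exp(x'b_restricted)

# Unrestricted score at the restricted estimates, Sum_i u_i*x_i (like dlnL_db).
score = [sum(u[i] * x[i][j] for i in range(n)) for j in range(k)]

# Direct LM with information Sum_i mu_i*x_i*x_i' (like Vb).
info = [[sum(mu_hat * x[i][a] * x[i][c] for i in range(n)) for c in range(k)] for a in range(k)]
lm_direct = dot(score, solve(info, score))

# Auxiliary LM: regress 1 on s_i = u_i*x_i with no constant; LM = N * uncentered R^2.
s = [[u[i] * x[i][j] for j in range(k)] for i in range(n)]
sts = [[sum(s[i][a] * s[i][c] for i in range(n)) for c in range(k)] for a in range(k)]
coef = solve(sts, score)  # S'1 equals the summed score
ess = sum(dot(si, coef) ** 2 for si in s)  # explained SS of the auxiliary regression
lm_aux = n * (ess / n)  # uncentered total SS of a column of ones is n

print(f"LM direct = {lm_direct:.3f}, LM auxiliary = {lm_aux:.3f}")
```

As in the Stata check LMauxB = LMauxB2, the auxiliary statistic equals the explained sum of squares, because the uncentered total sum of squares of the dependent variable 1 is exactly N.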

. scalar LRA = r(chi2) . . * (3) LM test via direct computation requires estimating only the restricted model. . * See (B) for more explanation . drop one yhatrest u trx1 trx2 trx3 trx4 . matrix drop dlnL_db Vb LMdirect . quietly poisson y x2 x4 . predict yhatrest (option n assumed; predicted number of events) . gen u = y - yhatrest /* yhatrest = exp(x_brest) calculated earlier */

. gen one = 1 . matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant . gen trx1 = sqrt(yhatrest) . gen trx2 = sqrt(yhatrest)*x2 . gen trx3 = sqrt(yhatrest)*x3 . gen trx4 = sqrt(yhatrest)*x4 . matrix accum Vb = trx1 trx2 trx3 trx4, noconstant

(obs=200) . matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db' . matrix list dlnL_db dlnL_db[1,4] one x2 x3 x4 u -1.788e-07 -1.717e-07 34.832631 -3.179e-07 . matrix list Vb symmetric Vb[4,4] trx1 trx2 trx3 trx4 trx1 169 trx2 -2.1828435 170.25918 trx3 -21.987555 15.647287 212.5673 trx4 14.371941 16.35821 22.067372 158.94405 . matrix list LMdirect symmetric LMdirect[1,1] u u 5.9159017 . scalar LMdirectA = LMdirect[1,1] . . * (4) LM test via auxiliary regression . * See (B) for more explanation . drop yhatrest s1 s2 s3 s4 one . quietly poisson y x2 x4 . predict yhatrest (option n assumed; predicted number of events) . gen s1 = (y-yhatrest)*1 . gen s2 = (y-yhatrest)*x2 . gen s3 = (y-yhatrest)*x3 . gen s4 = (y-yhatrest)*x4 . gen one = 1 . regress one s1 s2 s3 s4, noconstant Source | SS df MS Number of obs = 200 -------------+------------------------------F( 4, 196) = 1.57 Model | 6.21794802 4 1.554487 Prob > F = 0.1832 Residual | 193.782052 196 .988683939 R-squared = 0.0311 -------------+-----------------------------Adj R-squared = 0.0113 Total | 200 200 1 Root MSE = .99433 -----------------------------------------------------------------------------one | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------s1 | -.021781 .0760166 -0.29 0.775 -.1716964 .1281344 s2 | .0237921 .082791 0.29 0.774 -.1394834 .1870675 s3 | .1785093 .0711813 2.51 0.013 .0381297 .3188889 s4 | -.0065009 .084884 -0.08 0.939 -.1739042 .1609024 -----------------------------------------------------------------------------. * LM equals N times uncentered Rsq . scalar LMauxA = e(N)*e(r2) . di "LMauxA " LMauxA LMauxA 6.217948 . . * (5) DISPLAY RESULTS in Table 7.1 page 242 . . estimates table unrestricted restrictedA, se stats(N ll r2) b(%8.3f) -----------------------------------Variable | unrest~d restri~A -------------+---------------------x2 | -0.028 -0.016 | 0.077 0.077 x3 | 0.163 | 0.067 x4 | 0.103 0.128 | 0.080 0.080 _cons | -0.165 -0.172 | 0.077 0.077 -------------+---------------------N | 200.000 200.000 ll | -238.772 -241.648 r2 | -----------------------------------legend: b/se . di "WaldA " WaldA " p-value " chi2tail(1,WaldA) WaldA 5.9040087 p-value .01510647 . di " LRA " LRA " p-value " chi2tail(1,LRA) LRA 5.7537678 p-value .01645333 . di " LMdirectA " LMdirectA " p-value " chi2tail(1,LMdirectA) LMdirectA 5.9159017 p-value .01500482

. di " LMauxA " LMauxA " p-value " chi2tail(1,LMauxA) LMauxA 6.217948 p-value .01264616 . . ****** (C) TEST H0: b3 = b4 . . * (1A) Wald test . poisson y x2 x3 x4 Iteration 0: log likelihood = -238.77153 Iteration 1: log likelihood = -238.77153 Poisson regression

Number of obs = 200 LR chi2(3) = 8.30 Prob > chi2 = 0.0401 Log likelihood = -238.77153 Pseudo R2 = 0.0171 -----------------------------------------------------------------------------y| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------x2 | -.0275702 .0767909 -0.36 0.720 -.1780775 .1229371 x3 | .1630037 .0670848 2.43 0.015 .0315199 .2944874 x4 | .1026568 .0802139 1.28 0.201 -.0545595 .2598732 _cons | -.1653238 .0773479 -2.14 0.033 -.316923 -.0137246 -----------------------------------------------------------------------------. test (x3=x4) ( 1) [y]x3 - [y]x4 = 0 chi2( 1) = 0.29 Prob > chi2 = 0.5883 . . * (1B) Wald test done manually . * Note that Stata puts the intercept last, not first. . * So here the second and third elements of b are tested as equal. . matrix drop h R Wald . matrix bfull = e(b) /* 1xq row vector */

. matrix vfull = e(V) /* qxq matrix */

. matrix h = (bfull[1,2]-bfull[1,3]) /* hx1 vector */ . matrix R = (0,1,-1,0) /* h x q matrix */

. matrix Wald = h'*syminv(R*vfull*R')*h /* scalar */ . matrix list h

symmetric h[1,1] c1 r1 .06034684 . matrix list R R[1,4] c1 c2 c3 c4 r1 0 1 -1 0 . matrix list Wald symmetric Wald[1,1] c1 c1 .29301766 . scalar WaldC = Wald[1,1] . di " WaldC " WaldC " p-value " chi2tail(1,WaldC) WaldC .29301766 p-value .5882932 . . * (2) LR Test . * In general, obtaining the restricted MLE requires constrained ML . * Here it is simple: if b3=b4 then the mean is exp(b1+b2*x2+b3*(x3+x4)) . gen x3plusx4 = x3+x4 . poisson y x2 x3plusx4 Iteration 0: log likelihood = -238.91785 Iteration 1: log likelihood = -238.91785 Poisson regression

Number of obs = 200 LR chi2(2) = 8.01 Prob > chi2 = 0.0182 Log likelihood = -238.91785 Pseudo R2 = 0.0165

-----------------------------------------------------------------------------y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------x2 | -.0287235 .0768651 -0.37 0.709 -.1793763 .1219293 x3plusx4 | .1374814 .0479519 2.87 0.004 .0434974 .2314653 _cons | -.1672262 .0773265 -2.16 0.031 -.3187832 -.0156691 -----------------------------------------------------------------------------. estimates store restrictedC . lrtest unrestricted /* Uses estimates store unrestricted from earlier */ likelihood-ratio test LR chi2(1) = 0.29 (Assumption: restrictedC nested in unrestricted) Prob > chi2 = 0.5885

. scalar LRC = r(chi2) . . * (3) LM test direct . * Can use the same code as earlier, with different restricted estimates. . * These now come from poisson y x2 x3plusx4 . drop one yhatrest u trx1 trx2 trx3 trx4 . matrix drop dlnL_db Vb . quietly poisson y x2 x3plusx4 . predict yhatrest (option n assumed; predicted number of events) . gen u = y - yhatrest /* yhatrest = exp(x_brest) calculated earlier */

. gen one = 1 . matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant . gen trx1 = sqrt(yhatrest) . gen trx2 = sqrt(yhatrest)*x2 . gen trx3 = sqrt(yhatrest)*x3 . gen trx4 = sqrt(yhatrest)*x4 . matrix accum Vb = trx1 trx2 trx3 trx4, noconstant (obs=200) . matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db' . matrix list dlnL_db dlnL_db[1,4] one x2 x3 x4 u 8.345e-07 -3.601e-07 4.8459933 -4.8459932 . matrix list Vb symmetric Vb[4,4] trx1 trx2 trx3 trx4 trx1 169 trx2 -2.1828442 171.13986 trx3 7.9990827 13.105974 225.99023 trx4 19.217934 15.11254 28.153892 161.75506

. matrix list LMdirect symmetric LMdirect[1,1] u u .29306257 . scalar LMdirectC = LMdirect[1,1] . . * (4) LM test via auxiliary regression . drop yhatrest s1 s2 s3 s4 one . quietly poisson y x2 x3plusx4 . predict yhatrest (option n assumed; predicted number of events) . gen s1 = (y-yhatrest)*1 . gen s2 = (y-yhatrest)*x2 . gen s3 = (y-yhatrest)*x3 . gen s4 = (y-yhatrest)*x4 . gen one = 1 . regress one s1 s2 s3 s4, noconstant Source | SS df MS Number of obs = 200 -------------+-----------------------------F( 4, 196) = 0.08 Model | .31510777 4 .078776943 Prob > F = 0.9891 Residual | 199.684892 196 1.01880047 R-squared = 0.0016 -------------+-----------------------------Adj R-squared = -0.0188 Total | 200 200 1 Root MSE = 1.0094 -----------------------------------------------------------------------------one | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------s1 | -.000531 .077731 -0.01 0.995 -.1538275 .1527654 s2 | .012802 .0857027 0.15 0.881 -.1562159 .1818199 s3 | .0283145 .0761713 0.37 0.711 -.121906 .1785351 s4 | -.0367099 .0869889 -0.42 0.673 -.2082642 .1348445 -----------------------------------------------------------------------------. * LM equals N times uncentered Rsq . scalar LMauxC = e(N)*e(r2) . di "LMauxC " LMauxC LMauxC .31510777


. . * (5) DISPLAY RESULTS in Table 7.1 page 242 . . estimates table unrestricted restrictedC, se stats(N ll r2) b(%8.3f) -----------------------------------Variable | unrest~d restri~C -------------+---------------------x2 | -0.028 -0.029 | 0.077 0.077 x3 | 0.163 | 0.067 x4 | 0.103 | 0.080 x3plusx4 | 0.137 | 0.048 _cons | -0.165 -0.167 | 0.077 0.077 -------------+---------------------N | 200.000 200.000 ll | -238.772 -238.918 r2 | -----------------------------------legend: b/se . di "WaldC " WaldC " p-value " chi2tail(1,WaldC) WaldC .29301766 p-value .5882932 . di " LRC " LRC " p-value " chi2tail(1,LRC) LRC .29264001 p-value .5885337 . di " LMdirectC " LMdirectC " p-value " chi2tail(1,LMdirectC) LMdirectC .29306257 p-value .58826462 . di " LMauxC " LMauxC " p-value " chi2tail(1,LMauxC) LMauxC .31510777 p-value .57456264 . . ****** (D) TEST H0: b3/b4 - 1 = 0 . . * (1) Wald test of b3/b4 - 1 = 0 . * Stata's test command handles only linear hypotheses. . * Instead do the 7.2.5 algebra by hand. . matrix drop h R Wald . matrix h = (bfull[1,2]/bfull[1,3] - 1) . matrix R = (0, 1/bfull[1,3], -bfull[1,2]/(bfull[1,3]^2), 0) . matrix Wald = h'*syminv(R*vfull*R')*h
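The gradient R used for the nonlinear restriction h(b) = b3/b4 - 1 follows from the delta method: dh/db3 = 1/b4 and dh/db4 = -b3/b4^2. Below is a minimal Python sketch with a hypothetical `delta_method_wald` helper that recomputes h and R from the printed estimates. The full covariance matrix vfull is not printed in the log, so the Wald function is shown with a generic argument `v`. The resulting Wald statistic (0.158) differs from the one for the equivalent linear form b3 - b4 = 0 (0.293), because the Wald test is not invariant to how a restriction is written.

```python
def delta_method_wald(h, grad, v):
    """Wald statistic for one restriction h(b) = 0: h^2 / (grad' V grad)."""
    k = len(grad)
    var_h = sum(grad[a] * v[a][c] * grad[c] for a in range(k) for c in range(k))
    return h * h / var_h


# Unrestricted estimates from the log, in Stata's order (x2, x3, x4, _cons).
b3, b4 = 0.1630037, 0.1026568
h = b3 / b4 - 1.0
# Gradient of h(b) = b3/b4 - 1: dh/db3 = 1/b4, dh/db4 = -b3/b4^2, zero elsewhere.
R = [0.0, 1.0 / b4, -b3 / b4 ** 2, 0.0]
print(f"h = {h:.5f}, R = {[round(r, 4) for r in R]}")
```

These match the listed h (.58785028) and R (9.7411946, -15.467559) up to rounding of the printed coefficients.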


. matrix list h symmetric h[1,1] c1 r1 .58785028 . matrix list R R[1,4] c1 c2 c3 c4 r1 0 9.7411946 -15.467559 0

. matrix list Wald symmetric Wald[1,1] c1 c1 .15768686 . scalar WaldD = Wald[1,1] . di " WaldD " WaldD " p-value " chi2tail(1,WaldD) WaldD .15768686 p-value .69129516 . . * (2) LR Test . * This requires MLE subject to nonlinear constraints. . * This is difficult, so it is not done here. . * But note that the result is the same as for the MLE . * subject to b3 = b4, obtained in (C), since b3/b4 = 1 implies b3 = b4. . . * (3) LM test direct . * Like (2), this requires the restricted MLE. . * This is difficult, so it is not done here. . * But note that the result is the same as for the MLE . * subject to b3 = b4, obtained in (C). . . * (4) LM test via auxiliary regression . * Same as for (3) . . * (5) DISPLAY RESULTS . di "WaldD " WaldD " p-value " chi2tail(1,WaldD) WaldD .15768686 p-value .69129516 . . . *********** DISPLAY RESULTS GIVEN IN TABLE 7.1 on page 242 *********** . . estimates table unrestricted restrictedA restrictedB restrictedC, se stats(N ll r2) b(%8.3f) ---------------------------------------------------------Variable | unrest~d restri~A restri~B restri~C

-------------+-------------------------------------------x2 | -0.028 -0.016 -0.004 -0.029 | 0.077 0.077 0.076 0.077 x3 | 0.163 | 0.067 x4 | 0.103 0.128 | 0.080 0.080 x3plusx4 | 0.137 | 0.048 _cons | -0.165 -0.172 -0.168 -0.167 | 0.077 0.077 0.077 0.077 -------------+-------------------------------------------N | 200.000 200.000 200.000 200.000 ll | -238.772 -241.648 -242.923 -238.918 r2 | ---------------------------------------------------------legend: b/se . di "WaldA " WaldA " p-value " chi2tail(1,WaldA) WaldA 5.9040087 p-value .01510647 . . * Wald test statistics . di "Wald A to D: (A) " %8.3f WaldA " (B) " %8.3f WaldB " (C) " %8.3f WaldC " (D) " %8.3f WaldD Wald A to D: (A) 5.904 (B) 8.570 (C) 0.293 (D) 0.158 . di " p-values : (A) " %8.3f chi2tail(1,WaldA) " (B) " %8.3f chi2tail(2,WaldB) " (C) " %8.3f chi2t > ail(1,WaldC) " (D) " %8.3f chi2tail(1,WaldD) p-values : (A) 0.015 (B) 0.014 (C) 0.588 (D) 0.691 . . * LR test statistics . di "LR A to D: (A) " %8.3f LRA " (B) " %8.3f LRB " (C) " %8.3f LRC " (D) " %8.3f LRC LR A to D: (A) 5.754 (B) 8.302 (C) 0.293 (D) 0.293 . di " p-values : (A) " %8.3f chi2tail(1,LRA) " (B) " %8.3f chi2tail(2,LRB) " (C) " %8.3f chi2tail( > 1,LRC) " (D) " %8.3f chi2tail(1,LRC) p-values : (A) 0.016 (B) 0.016 (C) 0.589 (D) 0.589 . . * Direct LM test statistics . di "LM A to D: (A) " %8.3f LMdirectA " (B) " %8.3f LMdirectB " (C) " %8.3f LMdirectC " (D) " %8. > 3f LMdirectC LM A to D: (A) 5.916 (B) 8.575 (C) 0.293 (D) 0.293 . di " p-values: (A) " %8.3f chi2tail(1,LMdirectA) " (B) " %8.3f chi2tail(2,LMdirectB) " (C) " %8. > 3f chi2tail(1,LMdirectC) " (D) " %8.3f chi2tail(1,LMdirectC) p-values: (A) 0.015 (B) 0.014 (C) 0.588 (D) 0.588


. . * Auxiliary Regression LM test statistics . di "LM* A to D: (A) " %8.3f LMauxA " (B) " %8.3f LMauxB " (C) " %8.3f LMauxC " (D) " %8.3f LMauxC LM* A to D: (A) 6.218 (B) 9.186 (C) 0.315 (D) 0.315 . di " p-values : (A) " %8.3f chi2tail(1,LMauxA) " (B) " %8.3f chi2tail(2,LMauxB) " (C) " %8.3f chi > 2tail(1,LMauxC) " (D) " %8.3f chi2tail(1,LMauxC) p-values : (A) 0.013 (B) 0.010 (C) 0.575 (D) 0.575 . . ********** CLOSE OUTPUT *********** . log close log: c:\Imbook\bwebpage\Section2\mma07p1mltests.txt log type: text closed on: 17 May 2005, 13:59:21 -----------------------------------------------------------------------------------------------------log: c:\Imbook\bwebpage\Section2\mma07p2power.txt log type: text opened on: 17 May 2005, 14:00:49 . . ********** OVERVIEW OF MMA07P2POWER.DO ********** . . * STATA Program . * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi . * used for "Microeconometrics: Methods and Applications" . * by A. Colin Cameron and Pravin K. Trivedi (2005) . * Cambridge University Press . . * Chapter 7.6.3 pages 248-9 . * Asymptotic Power of Wald test . . * (1) Chapter 7.6.3 obtains power using the noncentral chi-square . * (2) Figure 7.1 (ch7power.wmf) plots power against the noncentrality parameter lamda . * No data needed . . ********** SETUP ********** . . set more off . version 8.0 . set scheme s1mono /* Graphics scheme */ . . ********** ANALYSIS ********** . . * Obtain power of chi-square tests

. * with df degrees of freedom . * and noncentrality parameter (ncp) lamda from 0 to 20 . * for size alpha = 0.01, 0.05 and 0.10 . . set obs 201 obs was 0, now 201 . scalar df = 1 /* Degrees of freedom */

. gen lamda = 0.1*(_n-1) /* Lamda = 0, 0.1, 0.2, ..., 19.9, 20.0 */ . . * Obtain power = Pr[W > chi-square critical value at size alpha | W ~ noncentral chi-square(df,lamda)] . * for alpha = 0.01, 0.05 and 0.10 . . * Critical value at size alpha uses central chisquare . * invchi2tail gives cv such that Pr(Chi2 > cv) = alpha . * Power is 1 minus cdf of noncentral chisquare . * nchi2 gives the cdf of noncentral chisquare . . scalar alpha = 0.01 . scalar criticalvalue = invchi2tail(df,alpha) . gen power01 = 1-nchi2(df,lamda,criticalvalue) . . scalar alpha = 0.05 . scalar criticalvalue = invchi2tail(df,alpha) . gen power05 = 1-nchi2(df,lamda,criticalvalue) . . scalar alpha = 0.10 . scalar criticalvalue = invchi2tail(df,alpha) . gen power10 = 1-nchi2(df,lamda,criticalvalue) . . sum Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------lamda | 201 10 5.816786 0 20 power01 | 201 .6230651 .3095508 .01 .9710402 power05 | 201 .7583101 .2717153 .05 .9940005 power10 | 201 .8152767 .2396043 .1 .9976528
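The same power calculation can be reproduced outside Stata. Below is a minimal Python sketch using only the standard library; all function names are hypothetical. The noncentral chi-square upper tail is computed as a Poisson(lamda/2) mixture of central chi-square upper tails. This is a standard representation, used here in place of Stata's nchi2(). The central tails for integer df use the exact recurrence in the docstring, and bisection stands in for invchi2tail(). For df = 1 the result can be cross-checked against Pr(|Z + sqrt(lamda)| > z_{1-alpha/2}).

```python
import math
from statistics import NormalDist


def chi2_sf(df, x):
    """Pr(chi2(df) > x) for integer df >= 1, using the exact recurrence
    Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += math.exp((k / 2.0) * math.log(x / 2.0) - x / 2.0 - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def chi2_critical(df, alpha):
    """c with Pr(chi2(df) > c) = alpha, by bisection (stand-in for invchi2tail)."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(df, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(df, lam, alpha, terms=200):
    """Pr(W > c) for W ~ noncentral chi2(df, lam): a Poisson(lam/2) mixture
    of central chi2(df + 2j) upper tails (stand-in for 1 - nchi2())."""
    c = chi2_critical(df, alpha)
    weight, total = math.exp(-lam / 2.0), 0.0
    for j in range(terms):
        total += weight * chi2_sf(df + 2 * j, c)
        weight *= (lam / 2.0) / (j + 1)
    return total


def power_df1(lam, alpha):
    """Closed form for df = 1: Pr(|Z + sqrt(lam)| > z_{1 - alpha/2})."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    delta = math.sqrt(lam)
    return NormalDist().cdf(delta - z) + NormalDist().cdf(-delta - z)


for lam in (0.0, 5.0, 10.0, 20.0):
    print(lam, [round(power(1, lam, a), 4) for a in (0.01, 0.05, 0.10)])
```

At lamda = 0 the power equals the size, and the other rows should agree with the listed values (for example .6087795 at lamda = 5 and size 0.05) to about four decimals.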


. * For lamda = 0, power equals size, here 0.01, 0.05 and 0.10 . list if lamda==0 | lamda==5 | lamda==10 | lamda==20 +----------------------------------------+ | lamda power01 power05 power10 | |----------------------------------------| 1. | 0 .01 .05 .1 | 51. | 5 .3670189 .6087795 .7228636 | 101. | 10 .7212129 .8853791 .9354209 | 201. | 20 .9710402 .9940005 .9976528 | +----------------------------------------+ . . ********** FIGURE 7.1 (p.249): PLOT THE POWER FUNCTION ********** . . graph twoway (line power10 lamda, clstyle(p1)) /* > */ (line power05 lamda, clstyle(p2)) /* > */ (line power01 lamda, clstyle(p3)), /* > */ scale (1.2) plotregion(style(none)) /* > */ title("Test Power as a function of the ncp") /* > */ xtitle("Noncentrality parameter lamda", size(medlarge)) xscale(titlegap(*5)) /* > */ ytitle("Test Power", size(medlarge)) yscale(titlegap(*5)) /* > */ legend(pos(3) ring(0) col(1)) legend(size(small)) /* > */ legend( label(1 "Test size = 0.10") label(2 "Test size = 0.05") /* > */ label(3 "Test size = 0.01")) . graph export ch7power.wmf, replace (file c:\Imbook\bwebpage\Section2\ch7power.wmf written in Windows Metafile format) . . ********** CLOSE OUTPUT ********** . log close log: c:\Imbook\bwebpage\Section2\mma07p2power.txt log type: text closed on: 17 May 2005, 14:00:52 -----------------------------------------------------------------------------------------------------log: c:\Imbook\bwebpage\Section2\mma07p3montecarlo.txt log type: text opened on: 18 May 2005, 11:28:58 . . ********** OVERVIEW OF MMA07P3MONTECARLO.DO ********** . . * STATA Program . * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi . * used for "Microeconometrics: Methods and Applications" . * by A. Colin Cameron and Pravin K. Trivedi (2005) . * Cambridge University Press . . * Chapter 7.7.1-7.7.5 pp. 250-4

. * Size and power of the Wald test . . * (1) Figure 7.2 Density of Wald test statistic . * (2) Table 7.2 Actual size of Wald test at various nominal sizes . * (3) Table 7.2 Actual power of Wald test at various nominal sizes . * (4) Table 7.2 Nominal power of Wald test at various nominal sizes . * (5) Alternative way to simulate using postfile rather than simulate . . * on the slope coefficient for a Probit model with simulated data (see below). . . * NOTE: Because this is a simulation using many samples (here 10,000) . * the generated data are not saved in a text file. . . * A problem can arise if in one of the simulations the whole sample has y=0 or y=1 . * Then the probit model is not estimable. . * Then one needs to increase the sample size, change the dgp or reduce the number of simulations. . * Here used N=40 with S=10000 for size and for power . * Another possible change is to have the same regressors x across simulations . . ********** SETUP ********** . . set more off . version 8.0 . set scheme s1mono /* Graphics scheme */ . . ********** MONTE CARLO OVERVIEW ********** . . * The data generating process is . * - Probit with Pr[y=1] = Phi(b1 + b2*x2) . * - where b1 = 0 and b2 = 1 . * - and regressor x ~ N[0,1] is redrawn in each simulation . . * The sample size N is set below in the global numobs . * The number of simulations S is set below in the global numsims . * A third option is to switch to the same x in each sample. This needs to be done manually. . . * The simulation is done using the Stata command simulate . * At the end of the program, an alternative using postfile is given . . * The program investigates both size and power . * of the Wald test that b2 = 1. . * For power the dgp instead uses b2 = 2. . . ********** INITIAL SIMULATION SET UP ********** . . set seed 10101 . * Change the following for different sample size N

. global numobs "40" . * Change the following for different number of simulations S . global numsims "10000" . . ****** ANALYSIS: SIMULATION OF PROBIT MODEL SLOPE ESTIMATES AND WALD TEST . . * The program is rclass. . * This means the results returned by the program are put into r( ) . * Here we return ymean, yvar, b2hat, seb2hat, ztestforb2eq1 . . * The probit model is Pr[y=1] = Phi(b1 + b2*x2) where b1=0 and b2=1 . * For size calculations: b2 = 1 . * For power calculations: b2 = 2 (as an example) . * So pass trueb2 as an argument. . . * The following three lines are only needed . * if the regressors are constant across simulations, . * since x must then be generated once and saved in a data file for reuse. . * They are commented out here since both x and y are resampled. . * Also simprobit and simprobit2 need one line changed if x is fixed. . /* > set obs $numobs > gen x = invnorm(uniform()) > save xforsim, replace > */ . * This version of the program instead redraws both x and y in each simulation . . * The program has one argument . * - trueb2 = value of b2 in the dgp . . program simprobit, rclass 1. version 8.0 2. /* define arguments. Here trueb2 = b2 in Phi(b1 + b2*x2) */ . args trueb2 3. /* Generate the data: here x and y */ . drop _all 4. set obs $numobs 5. gen x = invnorm(uniform()) 6. /* If instead want same x in each simulation, > replace above line with: use xforsim */ . gen y = 0 7. replace y = 1 if 0 + `trueb2'*x + invnorm(uniform()) > 0 8. /* Summarize the generated data as a check */ . summarize y 9. return scalar ymean=r(mean) 10. return scalar yvar=r(Var) 11. /* Do probit and store key results */ . probit y x

12. return scalar b2hat=_b[x] 13. return scalar seb2hat = _se[x] 14. return scalar ztestforb2eq1 = (_b[x]-1)/_se[x] 15. end . . ****** (1) DISTRIBUTION OF WALD TEST STATISTIC (Figure 7.2 p.253) . . * Now call the program simprobit where . * - include values for each argument within the quotes " " . * (here the argument is trueb2, set to 1 for size and 2 for power) . * - make sure to ask for each of the returned results . . * For size calculations set trueb2 = 1 . simulate "simprobit 1" ymean=r(ymean) yvar=r(yvar) b2hat=r(b2hat) /* > */ seb2hat=r(seb2hat) ztestforb2eq1=r(ztestforb2eq1), reps($numsims) command: simprobit 1 statistics: ymean = r(ymean) yvar = r(yvar) b2hat = r(b2hat) seb2hat = r(seb2hat) ztestfor~1 = r(ztestforb2eq1) . . * Summary of the results returned by simulate . * For the Wald test the key output is ztestforb2eq1 . describe Contains data obs: 10,000 simulate: simprobit 1 vars: 5 18 May 2005 11:29 size: 240,000 (97.7% of memory free) ------------------------------------------------------------------------------storage display value variable name type format label variable label ------------------------------------------------------------------------------ymean float %9.0g r(ymean) yvar float %9.0g r(yvar) b2hat float %9.0g r(b2hat) seb2hat float %9.0g r(seb2hat) ztestforb2eq1 float %9.0g r(ztestforb2eq1) ------------------------------------------------------------------------------Sorted by: . summarize Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------ymean | 10000 .49946 .0794447 .225 .775 yvar | 10000 .2499373 .0089917 .1788462 .2564103

b2hat | 10000 1.133952 .4516738 -.0306482 9.389184 seb2hat | 10000 .3589645 .1561059 .1902922 4.583915 ztestforb2~1 | 10000 .1141294 .9558451 -4.087344 2.278257 . . * For b2hat there are two ways to estimate the standard deviation. . * One is the average of seb2hat, the standard error of b2hat . * The other is the standard deviation of b2hat. . * These are equal asymptotically, but perhaps not in small samples due to bias. . * Also aveseb2hat is used later in calculating asymptotic power. . sum seb2hat Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------seb2hat | 10000 .3589645 .1561059 .1902922 4.583915 . scalar aveseb2hat = r(mean) . sum b2hat Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------b2hat | 10000 1.133952 .4516738 -.0306482 9.389184 . scalar stdevb2hat = r(sd) . di "Average standard error of b2hat: " aveseb2hat Average standard error of b2hat: .3589645 . di "Standard deviation of b2hat: " stdevb2hat Standard deviation of b2hat: .45167383 . . * The Wald test statistic will be called Wald . gen Wald = ztestforb2eq1 . label var Wald "Wald test statistic" . . * The mean and st.dev. should be 0 and 1 if Wald ~ N[0,1] . sum Wald Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------Wald | 10000 .1141294 .9558451 -4.087344 2.278257 . . * The 2.5 and 97.5 percentiles should be -1.96 and 1.96 if Wald ~ N[0,1] . * They can be used to get size-adjusted Wald test at 5 percent. . _pctile Wald, p(2.5,99.5)


. display "Wald: Lower 2.5 percentile = " r(r1) " Upper 2.5 percentile = " r(r2) Wald: Lower 2.5 percentile = -1.904708 Upper 2.5 percentile = 2.0034728 . . * The density of the simulated values of the Wald test should be . * a standard normal density if Wald ~ N[0,1] . * The following plots kernel estimate of density of Wald and a N[0,1] density . * Could also do Student[N-k] but this looks same as N[0,1] if N>=30. . gen N01density = normden(Wald) . sum Wald Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------Wald | 10000 .1141294 .9558451 -4.087344 2.278257 . . graph twoway (kdensity Wald, range(-3 3) clstyle(p1)) /* > */ (connect N01density Wald if Wald>-3 & Wald<3, clstyle(p2) sort(Wald) s(i)), /* > */ scale (1.2) plotregion(style(none)) /* > */ title("Monte Carlo Simulations of Wald Test") /* > */ xtitle("Wald Test Statistic", size(medlarge)) xscale(titlegap(*5)) /* > */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /* > */ legend(pos(11) ring(0) col(1)) legend(size(small)) /* > */ legend( label(1 "Monte Carlo") label(2 "Standard Normal") /* > */ label(3 "Test size = 0.01")) . graph export ch7montecarlo.wmf, replace (file c:\Imbook\bwebpage\Section2\ch7montecarlo.wmf written in Windows Metafile format) . . ****** (2) ACTUAL SIZE OF THE WALD TEST STATISTIC (Table 7.2, p.253) . . * Obtain the size properties of a two-sided Wald test . * That rejects if |Wald| > z_alpha/2 where alpha = .01, .05, .1, .2 . . * Convert to two-sided test by taking absolute value . gen absWald = abs(Wald) . . * Give key percentiles of |Wald| . * Percentiles must be in ascending order for Stata . _pctile absWald, p(0.80,0.90,0.95,0.99) . display "I[Upper percentiles of |Wald|: " " 1 " r(r4) " 5 " r(r3) " 10 " r(r2) " 20 " r(r1) I[Upper percentiles of |Wald|: 1 .0115847 5 .01074749 10 .00998338 20 .00923005 . . * Program to calculate actual size given nominal size . * Temporary variables and scalars are in quotes ` ' . 
. program size, rclass
  1.   version 8.0
  2.   args nominalsize
  3.   tempvar reject
  4.   tempname normalcriticalvalue
  5.   quietly {
  6.     scalar `normalcriticalvalue' = invnorm(1-(`nominalsize'/2))
  7.     gen `reject' = 0
  8.     replace `reject' = 1 if absWald > `normalcriticalvalue'
  9.     summarize `reject'
 10.     return scalar actualsize = r(mean)
 11.   }
 12. end
.
. * Calculate actual size for nominal sizes 0.01, 0.05, 0.10 and 0.20
. size 0.01
. scalar actualsize01 = r(actualsize)
. size 0.05
. scalar actualsize05 = r(actualsize)
. size 0.10
. scalar actualsize10 = r(actualsize)
. size 0.20
. scalar actualsize20 = r(actualsize)
.
. * Following gives Actual Size column of Table 7.2 (p.253)
. * Nominal Sizes and Actual Sizes of Two-sided Wald Test
. di "0.01: " actualsize01 _new "0.05: " actualsize05 _new /*
> */ "0.10: " actualsize10 _new "0.20: " actualsize20
0.01: .0053
0.05: .0294
0.10: .0805
0.20: .1922
.
. ****** (3) ACTUAL POWER OF THE WALD TEST STATISTIC (Table 7.2, p.253)
.
. * Consider power when b2 = 2 rather than 1
.
. * Obtain the actual power by simulation
. * Use the same program simprobit as for size,
. * except the argument b2true is 2.0 rather than 1.0
.
. drop _all

.
. * For power calculations set trueb2 = 2
. simulate "simprobit 2" ymean=r(ymean) yvar=r(yvar) b2hat=r(b2hat) /*
> */ seb2hat=r(seb2hat) ztestforb2eq1=r(ztestforb2eq1), reps(10000)

      command:  simprobit 2
   statistics:  ymean      = r(ymean)
                yvar       = r(yvar)
                b2hat      = r(b2hat)
                seb2hat    = r(seb2hat)
                ztestfor~1 = r(ztestforb2eq1)

.
. * Calculate |Wald|
. gen Wald = ztestforb2eq1
(71 missing values generated)
. gen absWald = abs(Wald)
(71 missing values generated)
.
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       ymean |      9929    .4998389    .0791531       .225       .825
        yvar |      9929     .249985    .0090933   .1480769   .2564103
       b2hat |      9929    2.581075     2.73046   .8547966   209.9805
     seb2hat |      9929    1.002628    5.799384   .2816004   540.1536
ztestforb2~1 |      9929    1.667773    .3853416  -.4042006    2.59991
-------------+--------------------------------------------------------
        Wald |      9929    1.667773    .3853416  -.4042006    2.59991
     absWald |      9929    1.668285     .383118   .0033462    2.59991

.
. * Calculate actual power for nominal sizes 0.01, 0.05, 0.10 and 0.20
. * This can use the earlier program size
. size 0.01
. scalar actualpower01 = r(actualsize)
. size 0.05
. scalar actualpower05 = r(actualsize)
. size 0.10
. scalar actualpower10 = r(actualsize)
. size 0.20

. scalar actualpower20 = r(actualsize)
.
. * Following gives Actual Power column of Table 7.2 (p.253)
. * Nominal Sizes and Actual Power of Two-sided Wald Test
. di "0.01: " actualpower01 _new "0.05: " actualpower05 _new /*
> */ "0.10: " actualpower10 _new "0.20: " actualpower20
0.01: .0073
0.05: .2257
0.10: .6077
0.20: .8583
.
. ****** (4) ASYMPTOTIC POWER OF THE WALD TEST STATISTIC (Table 7.2, p.253)
.
. * Consider power when b2 = 2 rather than 1
.
. * Calculate asymptotic theoretical power using noncentral chi-square
. * Asymptotic power = Pr[W > chi-square_alpha(df) | W ~ noncentral chi-square(df,ncp)]
. * The noncentrality parameter is 0.5*(delta^2)/(se[b2]^2)
. * Here size has b2 = 1 and power has b2 = 1+delta
. * So delta = b2true - 1.
. * Need to find the standard error of b2.
. * Use the average from earlier simulations.
.
. * Program to calculate asymptotic power given nominal size
. * Temporary variables and scalars and arguments are in quotes ` '
. * invchi2tail gives cv such that Pr(Chi2 > cv) = nominalsize
. * Power is 1 minus cdf of noncentral chisquare
. * nchi2 gives the cdf of noncentral chisquare
.
. drop _all
.
. * Arguments are alpha (size), lamda and df (degrees of freedom)
. program power, rclass
  1.   version 8.0
  2.   args alpha lamda df
  3.   tempname criticalvalue powervianoncentralchi
  4.   quietly {
  5.     scalar `criticalvalue' = invchi2tail(`df',`alpha')
  6.     scalar `powervianoncentralchi' = 1-nchi2(`df',`lamda',`criticalvalue')
  7.     return scalar asymppower = `powervianoncentralchi'
  8.   }
  9. end
.
. * scalar criticalvalue = invchi2tail(df,alpha)
. * replace power = 1-nchi2(df,lamda,criticalvalue)
.
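With one degree of freedom, the noncentral chi-square power computed by the program `power` has a closed form. W ~ noncentral chi-square(1, lamda) has the same distribution as (Z + sqrt(lamda))^2 with Z ~ N[0,1]. So power is Pr(|Z + sqrt(lamda)| > z_{alpha/2}).

The Python sketch below relies on that df = 1 identity because the standard library has no noncentral chi-square CDF. It is therefore not a translation of `nchi2`. It takes lamda exactly as the log defines it, including the log's 0.5 factor. The function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def asymptotic_power_df1(alpha, noncentrality):
    """Asymptotic power of a two-sided Wald test of one restriction.

    Under the alternative W ~ noncentral chi-square(1, lambda), which has the
    same distribution as (Z + sqrt(lambda))^2 with Z ~ N[0,1]. The test rejects
    when W > z_{alpha/2}^2, i.e. when |Z + sqrt(lambda)| > z_{alpha/2}.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = noncentrality ** 0.5
    upper_tail = 1 - STD_NORMAL.cdf(z - shift)  # Pr(Z + shift > z)
    lower_tail = STD_NORMAL.cdf(-z - shift)     # Pr(Z + shift < -z)
    return upper_tail + lower_tail


if __name__ == "__main__":
    delta, ave_se = 1.0, 0.3589645        # delta and aveseb2hat displayed in the log
    lam = 0.5 * delta ** 2 / ave_se ** 2  # same formula as the log's lamda (3.8803151)
    for alpha in (0.01, 0.05, 0.10, 0.20):
        print(f"{alpha:.2f}: {asymptotic_power_df1(alpha, lam):.6f}")
```

With lamda = 0 the function returns alpha, as a correctly sized test should. With the log's lamda it reproduces the Asymptotic Power column shown below to several decimal places.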

. * Calculate df and lamda.
. * This uses an estimate of se[beta] obtained earlier
. scalar delta = 1 /* Here 2 - 1. Changes for different alternatives */
. scalar lamda = 0.5*(delta*delta)/(aveseb2hat*aveseb2hat)
. scalar df = 1
. di "delta: " delta " aveseb2hat: " aveseb2hat " lamda: " lamda " df: " df
delta: 1 aveseb2hat: .3589645 lamda: 3.8803151 df: 1
.
. * Calculate asymptotic power for nominal sizes 0.01, 0.05, 0.10 and 0.20
. power 0.01 lamda df
. scalar asymppower01 = r(asymppower)
. power 0.05 lamda df
. scalar asymppower05 = r(asymppower)
. power 0.10 lamda df
. scalar asymppower10 = r(asymppower)
. power 0.20 lamda df
. scalar asymppower20 = r(asymppower)
.
. * Following gives Asymptotic Power column of Table 7.2 (p.253)
. * Nominal Sizes and Asymptotic Power of Two-sided Wald Test
. di "0.01: " asymppower01 _new "0.05: " asymppower05 _new /*
> */ "0.10: " asymppower10 _new "0.20: " asymppower20
0.01: .2722675
0.05: .50398701
0.10: .62755902
0.20: .75494224
.
. ****** (5) ALTERNATIVE ANALYSIS: SIMULATION METHOD USING POSTFILE
.
. * This is an alternative, given for completeness.
. * It fails if the model cannot be estimated in any one of the simulation samples.
. * By contrast, simulate just drops that simulation sample and continues simulating.
.
. * For each round of the simulation, the variables in `sim' are sent
. * as a new line to the Stata data set probitsimresults.
. * The names of these variables are given in quotes after S_1
. * Need as many names in quotes after S_1 as variables at post
. * Then can analyze these using summarize etcetera
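The postfile pattern described in these comments writes one row per replication and analyzes the collected rows afterwards. The Python sketch below shows that pattern under stated assumptions. It draws from the same probit DGP, y = 1[0 + b2*x + e > 0] with x and e ~ N[0,1]. It records only the sample mean and variance of y.

The probit fit that `simprobit2` performs in each round is omitted here. A full version would also append beta, sterror and ztestforbeta to each row. All names and the seed are hypothetical.

```python
import random
import statistics


def draw_probit_sample(n, b2, rng):
    """One sample from the probit DGP: y = 1[0 + b2*x + e > 0], x and e ~ N[0,1]."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if b2 * xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
    return x, y


def simulate_and_post(num_sims, n, b2, seed=10101):
    """Collect one row of results per replication, as postfile/post does in Stata."""
    rng = random.Random(seed)
    posted = []  # stands in for the data set written by postfile
    for _ in range(num_sims):
        _, y = draw_probit_sample(n, b2, rng)
        posted.append({
            "meany": statistics.mean(y),
            "vary": statistics.variance(y),  # divisor n-1, like r(Var)
        })
    return posted


if __name__ == "__main__":
    rows = simulate_and_post(num_sims=1000, n=40, b2=1.0)
    print("mean of meany:", statistics.mean(r["meany"] for r in rows))
    print("max of vary:  ", max(r["vary"] for r in rows))
```

With N = 40 the sample variance of a 0/1 variable cannot exceed 40(0.25)/39 = .2564103. This matches the maximum of vary in the summary that follows.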

.
. * This program has two arguments
. * - numsims = desired number of simulations
. * - trueb2 = slope coefficient used to generate the data
.
. drop _all
.
. program simprobit2
  1.   version 8.0
  2.   args numsims trueb2
  3.   tempname sim
  4.   postfile `sim' meany vary beta sterror ztestforbeta using probitsimresults, replace
  5.   quietly {
  6.     forvalues i = 1/`numsims' {
  7.       drop _all
  8.       set obs $numobs /* may need to change */
  9.       gen x = invnorm(uniform())
 10.       /* If instead want same x in each simulation
>          replace above line with: use xforsim */
.       gen y = 0
 11.       /* Use b2 = 1.0 for size and 1.5 for power */
.       replace y = 1 if 0+`trueb2'*x+invnorm(uniform()) > 0
 12.       summarize y
 13.       scalar meany=r(mean)
 14.       scalar vary=r(Var)
 15.       probit y x
 16.       scalar beta=_b[x]
 17.       scalar sterror = _se[x]
 18.       scalar ztestforbeta = (beta-1)/sterror
 19.       post `sim' (meany) (vary) (beta) (sterror) (ztestforbeta)
 20.     }
 21.   }
 22.   postclose `sim'
 23. end
.
. simprobit2 $numsims 1
. use probitsimresults, clear
.
. * Here we just summarize results for comparison with earlier
. * But could do the further analysis as above
. sum

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       meany |     10000    .4989575    .0791248       .225       .775
        vary |     10000    .2499885    .0090127   .1788462   .2564103
        beta |     10000    1.135003    .4315248   .0901358   7.205799
     sterror |     10000    .3583266     .133302   .1863547   3.360862
ztestforbeta |     10000    .1218973     .954814  -3.401833   2.299991

.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma07p3montecarlo.txt
  log type:  text
 closed on:  18 May 2005, 11:29:29
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma07p4boot.txt
  log type:  text
 opened on:  18 May 2005, 21:36:29
.
. ********** OVERVIEW OF MMA07P4BOOT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.8 pages 254-256
. * Bootstrap applied to probit model
. * Provides
. * (1) Bootstrap confidence intervals
. * (2) Bootstrap hypothesis test without refinement
. * (3) Bootstrap hypothesis test with refinement: percentile-t method
.
. * Note corrections to book
. * - sample size is N=40 not N=30
. * - use 999 bootstrap replications not 1000
. * - for asymptotic refinement p.256 the critical region
. *   is (-1.89, 1.80) not (-2.62, 1.83)
.
. * For more detail on bootstrap see
. * Chapter 11: Bootstrap Methods pages 355-383
. * and program mma11p1boot.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA **********
.
. * DGP is Probit: Pr[y=1] = PHI(a + bx)

. * where x is N[0,1]
. * and a = 0 and b = 1
.
. * Change the following for different sample size N
. global numobs "40"
.
. * Probit example with slope coefficient equal to 1
. set seed 10105
. set obs $numobs
obs was 0, now 40
. gen x = invnorm(uniform())
. gen y = 0
. replace y = 1 if 0+1.0*x+invnorm(uniform()) > 0
(19 real changes made)
. save xyforsim, replace
file xyforsim.dta saved
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           x |        40   -.0359197    .9203391  -2.210579    1.45199
           y |        40        .475    .5057363          0          1

. probit y x

Iteration 0:   log likelihood = -27.675866
Iteration 1:   log likelihood = -22.927488
Iteration 2:   log likelihood = -22.735204
Iteration 3:   log likelihood = -22.733966
Iteration 4:   log likelihood = -22.733966

Probit estimates                                  Number of obs   =         40
                                                  LR chi2(1)      =       9.88
                                                  Prob > chi2     =     0.0017
Log likelihood = -22.733966                       Pseudo R2       =     0.1786

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .8168831   .2942893     2.78   0.006     .2400867    1.393679
       _cons |  -.0725436   .2162576    -0.34   0.737    -.4964006    .3513135
------------------------------------------------------------------------------
. save mma07p4boot, replace

file mma07p4boot.dta saved
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x using mma07p4boot.asc, replace
.
. ********** (1) BOOTSTRAP CONFIDENCE INTERVALS **********
.
. * Stata produces four bootstrap 100*(1-alpha) confidence intervals
. * (1)-(2) have no asymptotic refinement
. * (3)-(4) have asymptotic refinement
.
. * (1) Regular asymptotic normal: bhat +/- t(S-1)_alpha/2*se(bhat)
. * except instead of using the initial se(bhat)
. * we use the standard deviation of bhat from the bootstrap reps
. * and use t(S-1) rather than z for critical value
. * where S = number of bootstrap reps
.
. * (2) Percentile method: orders the bhat(s) from the bootstrap replications
. * and goes from the lower alpha/2 percentile to the upper alpha/2 percentile
. * where (s) denotes the s-th bootstrap sample
.
. * (3) Bias-corrected (BC). Same as (4) with a=0
.
. * (4) Bias-corrected and accelerated (BCa).
. * This works with the pivotal Wald statistic.
. * See the manual [R] bootstrap or a textbook,
. * e.g. Efron and Tibshirani (1993, pp.184-188), with a=0 for BC.
. * This orders the bhat(s) from the bootstrap replications and
. * goes from the p1 percentile to the p2 percentile
. * where p1 and p2 are bias-correction adjustments to alpha/2 and 1-alpha/2
. * Let p1 = Phi(2z0 - z_alpha/2)
. *     p2 = Phi(2z0 + z_alpha/2)
. * z0 measures the median bias in bhat with
. *     z0 = Phi-inv(fraction of the bhat(s) < bhat)
. * And if z0=0 then p1 = alpha/2 and there is no correction
.
. * Change the following for different number of simulations S
. * From page 399, for testing better to use 999 than 1000
. global breps "999" /* The number of bootstrap reps used below */
.
. * (1A) Simplest bootstrap is of all the estimated coefficients
. set seed 10105
. bootstrap "probit y x" _b, reps($breps) bca

command:      probit y x
statistics:   b_x        = _b[x]
              b_cons     = _b[_cons]


Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         b_x |   999   .8168831   .1017329   .3763803   .0782956   1.555471   (N)
             |                                          .3495505   1.878616   (P)
             |                                          .2808956   1.600026  (BC)
             |                                          .1552112   1.480223 (BCa)
      b_cons |   999  -.0725436  -.0176301   .2448404  -.5530047   .4079175   (N)
             |                                          -.596443   .4247662   (P)
             |                                         -.5528302   .4381396  (BC)
             |                                         -.5205303   .4445401 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
.
. * (1B) This bootstrap is of MLE of b2 and the associated standard error
. * and additionally gives the bias-corrected and accelerated method of Efron
. set seed 10105
. bootstrap "probit y x" _b[x] _se[x], reps($breps) bca

command:      probit y x
statistics:   _bs_1      = _b[x]
              _bs_2      = _se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999   .8168831   .1017329   .3763803   .0782956   1.555471   (N)
             |                                          .3495505   1.878616   (P)
             |                                          .2808956   1.600026  (BC)
             |                                          .1552112   1.480223 (BCa)
       _bs_2 |   999   .2942893   .0422005   .0932673   .1112667   .4773118   (N)
             |                                          .2323841   .5831083   (P)
             |                                          .2214397   .4475662  (BC)
             |                                          .2162534   .4143377 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
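The normal, percentile and bias-corrected (BC) intervals described in the comments of (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Stata's implementation:

- The normal interval uses z rather than Stata's t(S-1) critical value. With S = 999 the two are nearly identical.
- Quantiles use a nearest-rank rule, which may differ slightly from Stata's percentile convention.
- BCa is omitted because it also needs the acceleration constant a.
- Function names are hypothetical.

```python
import math
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()


def empirical_quantile(values, q):
    """Nearest-rank empirical quantile (one convention; Stata's may differ slightly)."""
    ordered = sorted(values)
    rank = min(max(math.ceil(q * len(ordered)), 1), len(ordered))
    return ordered[rank - 1]


def bootstrap_intervals(theta_hat, draws, alpha=0.05):
    """Normal, percentile and bias-corrected (BC) bootstrap confidence intervals.

    theta_hat is the original estimate; draws are the S bootstrap estimates.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se_boot = stdev(draws)  # standard deviation of the bootstrap estimates
    normal = (theta_hat - z * se_boot, theta_hat + z * se_boot)
    percentile = (empirical_quantile(draws, alpha / 2),
                  empirical_quantile(draws, 1 - alpha / 2))
    # z0 = Phi^{-1}(share of bootstrap estimates below theta_hat) measures median bias
    share_below = sum(1 for d in draws if d < theta_hat) / len(draws)
    if not 0 < share_below < 1:
        raise ValueError("theta_hat lies outside the range of the bootstrap draws")
    z0 = STD_NORMAL.inv_cdf(share_below)
    p1 = STD_NORMAL.cdf(2 * z0 - z)
    p2 = STD_NORMAL.cdf(2 * z0 + z)
    bias_corrected = (empirical_quantile(draws, p1), empirical_quantile(draws, p2))
    return {"se": se_boot, "normal": normal,
            "percentile": percentile, "bc": bias_corrected}
```

When exactly half the draws fall below theta_hat, z0 = 0 and the BC interval coincides with the percentile interval, as the comments in (1) state.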

.
. * (1C) This bootstrap repeats (1B)
. * but permits bootstrapping when the Stata commands take more than one line
. use mma07p4boot, clear
. program define commandtobootstrap, rclass
  1.   version 8.0
  2.   quietly probit y x
  3.   return scalar b2hat=_b[x]
  4.   return scalar seb2hat=_se[x]
  5. end
. set seed 10105
. bootstrap "commandtobootstrap" r(b2hat) r(seb2hat), reps($breps)

command:      commandtobootstrap
statistics:   _bs_1      = r(b2hat)
              _bs_2      = r(seb2hat)

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999   .8168831   .1017329   .3763803   .0782956   1.555471   (N)
             |                                          .3495505   1.878616   (P)
             |                                          .2808956   1.600026  (BC)
       _bs_2 |   999   .2942893   .0422005   .0932673   .1112667   .4773118   (N)
             |                                          .2323841   .5831083   (P)
             |                                          .2214397   .4475662  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. ********** (2) BOOTSTRAP HYPOTHESIS TESTS - NO REFINEMENT p.255 **********
.
. * We want to test H0: b2 = 1 against Ha: b2 not equal 1
.
. * For a simple test such as this we can just use
. * the bootstrap confidence intervals from (1)
. * and reject H0 if the hypothesized value 1 is not in the confidence interval
.
. * Here we instead present a common method without refinement,
. * essentially (1) above, performing the usual Wald test,
. * except the standard error is estimated by bootstrap.
. * This is useful when the standard error is hard to obtain by other means.

. * Here W = (b2hat - b2_0) / seb2hat_boot where b2_0 = 1
. * and reject at level .05 if |W| > z_.025 = 1.96
.
. use mma07p4boot, clear
. * Save the estimate
. quietly probit y x
. scalar b2est = _b[x]
. * Obtain the bootstrap standard error
. set seed 10105
. bootstrap "probit y x" _b, reps($breps) bca

command:      probit y x
statistics:   b_x        = _b[x]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         b_x |   999   .8168831   .1017329   .3763803   .0782956   1.555471   (N)
             |                                          .3495505   1.878616   (P)
             |                                          .2808956   1.600026  (BC)
             |                                          .1552112   1.480223 (BCa)
      b_cons |   999  -.0725436  -.0176301   .2448404  -.5530047   .4079175   (N)
             |                                          -.596443   .4247662   (P)
             |                                         -.5528302   .4381396  (BC)
             |                                         -.5205303   .4445401 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
. matrix sebboot = e(se)
. scalar seb2boot = sebboot[1,1] /* x is first then constant */
. * Calculate the test statistic
. scalar Wald = (b2est - 1)/seb2boot
.
. * DISPLAY RESULTS at bottom p.255
. * Note: Text had typo:
. * (1-0.817)/0.376 = -0.487 should be (0.817-1)/0.376 = -0.487
.

. di "Probit slope estimate is: " b2est
Probit slope estimate is: .8168831
. di "Bootstrap standard estimate is: " seb2boot
Bootstrap standard estimate is: .37638029
. di "Wald statistic (no refinement) is: " Wald
Wald statistic (no refinement) is: -.48652096
. di "Reject at level .05 if |Wald| > 1.96"
Reject at level .05 if |Wald| > 1.96
.
. ********** (3) BOOTSTRAP HYPOTHESIS TESTS - PERCENTILE-T p.256 **********
.
. * Stata does not give this. For methods see
. * e.g. Efron and Tibshirani (1993, pp.160-162)
. * e.g. Cameron and Trivedi (2005) Chapter 11.2.6-11.2.7
. * For sample s compute t-test(s) = (bhat(s)-bhat) / se(s)
. * where bhat is the initial estimate
. * and bhat(s) and se(s) are for the s-th round.
. * Order the t-test(s) statistics and choose the alpha/2 percentiles
. * which give the critical values for the t-test
.
. * Implementation requires saving the results from each bootstrap replication
. * in order to obtain critical values from percentiles of the bootstrap distribution
.
. * (3A) Here bootstrap computes (b(s) - bhat) / se(s) s = 1,...,S
.
. use mma07p4boot, clear
. * Save the estimate and the Wald test statistic
. quietly probit y x
. scalar b2est = _b[x]
. scalar Wald = (_b[x] - 1)/_se[x]
. * Then bootstrap calculates (b(s) - bhat) / se(s)
. set seed 10105
. bootstrap "probit y x" ((_b[x]-b2est)/_se[x]), reps($breps) /*
> */ level(95) saving(mma07p4bootreps) replace

command:      probit y x
statistic:    _bs_1      = (_b[x]-b2est)/_se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999          0   .1003619   .9350234  -1.834837   1.834837   (N)
             |                                         -1.890602   1.801358   (P)
             |                                         -2.101316   1.565618  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * Then get data sets with result from each bootstrap
. use mma07p4bootreps, clear
(bootstrap: probit y x)
. sum /* Here just _bs_1 */

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       _bs_1 |       999    .1003619    .9350234  -3.032139   2.572848

. gen b2test = _bs_1 /* _bs_1 is the bootstrap result of interest */
. sum b2test, detail /* Gives percentiles but not 2.5% and 97.5% */

                           b2test
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.188575      -3.032139
 5%    -1.540843      -2.605178
10%    -1.137846      -2.599248       Obs                 999
25%    -.4995352      -2.566578       Sum of Wgt.         999

50%     .1238111                      Mean           .1003619
                        Largest       Std. Dev.      .9350234
75%     .7789762        2.22565
90%     1.338348       2.359132       Variance       .8742688
95%     1.560646       2.377491       Skewness      -.2505319
99%     2.014282       2.572848       Kurtosis       2.853737

. _pctile b2test, p(2.5,97.5)
.
. * DISPLAY RESULTS on p.256
.
. * Note: Error on p.256 Here get (-1.89, 1.80) not (-2.62, 1.83)
. di "Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of coeff b for z: -1.8906019 and 1.8013585
. di "Reject H0 if Wald = " Wald " lies outside " r(r1) " ," r(r2) ")"
Reject H0 if Wald = -.62223436 lies outside -1.8906019 ,1.8013585)
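Steps (3A) and (3B) implement the percentile-t test in Stata. In outline:

1. Form t*(s) = (b(s) - bhat)/se(s) for each replication.
2. Take the alpha/2 and 1-alpha/2 percentiles of the t*(s) as critical values.
3. Compare the usual Wald statistic (bhat - b0)/se(bhat) with those critical values.

A minimal Python sketch of these steps follows. The function names are hypothetical. The nearest-rank quantile rule is an assumption and may differ slightly from `_pctile`.

```python
import math


def empirical_quantile(values, q):
    """Nearest-rank empirical quantile (one convention; Stata's _pctile may differ)."""
    ordered = sorted(values)
    rank = min(max(math.ceil(q * len(ordered)), 1), len(ordered))
    return ordered[rank - 1]


def percentile_t_test(b_hat, se_hat, b0, boot_b, boot_se, alpha=0.05):
    """Bootstrap percentile-t test of H0: b = b0 against a two-sided alternative.

    boot_b and boot_se hold the estimate and its standard error from each
    bootstrap replication s = 1, ..., S.
    """
    # Bootstrap t statistics are centered on the original estimate b_hat, not on b0
    t_star = [(b_s - b_hat) / se_s for b_s, se_s in zip(boot_b, boot_se)]
    lower = empirical_quantile(t_star, alpha / 2)
    upper = empirical_quantile(t_star, 1 - alpha / 2)
    wald = (b_hat - b0) / se_hat  # usual Wald statistic from the original sample
    return wald, (lower, upper), (wald < lower or wald > upper)
```

With the log's bhat = .8168831 and se = .2942893, the Wald statistic is -.6222. It lies inside the bootstrap critical region (-1.89, 1.80), so H0: b2 = 1 is not rejected at level .05.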

.
. * (3B) Equivalently bootstrap calculates b(s) and se(s) s = 1,...,S
. * and then later calculate (b(s) - bhat) / se(s)
.
. use mma07p4boot, clear
. * Save the estimate and the Wald test statistic
. quietly probit y x
. scalar b2est = _b[x]
. scalar Wald = (_b[x] - 1)/_se[x]
. * Then bootstrap calculates b(s) and se(s)
. set seed 10105
. bootstrap "probit y x" _b[x] _se[x], reps($breps) /*
> */ level(95) saving(mma07p4bootreps) replace

command:      probit y x
statistics:   _bs_1      = _b[x]
              _bs_2      = _se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999   .8168831   .1017329   .3763803   .0782956   1.555471   (N)
             |                                          .3495505   1.878616   (P)
             |                                          .2808956   1.600026  (BC)
       _bs_2 |   999   .2942893   .0422005   .0932673   .1112667   .4773118   (N)
             |                                          .2323841   .5831083   (P)
             |                                          .2214397   .4475662  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * Then get data sets with result from each bootstrap
. use mma07p4bootreps, clear
(bootstrap: probit y x)
. sum /* Here _bs_1 and _bs_2 */

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       _bs_1 |       999     .918616    .3763803   .0030288   3.806198
       _bs_2 |       999    .3364898    .0932673   .2162534    1.34312

. gen b2test = (_bs_1 - b2est)/_bs_2
. _pctile b2test, p(2.5,97.5)
.
. * DISPLAY RESULTS on p.256
. * Note: Error on p.256 Here get (-1.89, 1.80) not (-2.62, 1.83)
. di "Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of coeff b for z: -1.8906019 and 1.8013583
. di "Reject H0 if Wald = " Wald " lies outside " r(r1) " ," r(r2) ")"
Reject H0 if Wald = -.62223436 lies outside -1.8906019 ,1.8013583)
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section2\mma07p4boot.txt
  log type:  text
 closed on:  18 May 2005, 21:36:36

------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma08p1cmtests.txt
  log type:  text
 opened on:  17 May 2005, 14:04:20
.
. ********** OVERVIEW OF MMA08P1CMTESTS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.2.6 pages 269-71
. * Conditional moment tests example producing Table 8.1
.
. * (A) TEST OF THE CONDITIONAL MEAN
. * (B) TEST THAT CONDITIONAL VARIANCE = MEAN
. * (C) ALTERNATIVE TEST THAT CONDITIONAL VARIANCE = MEAN
. * (D) INFORMATION MATRIX TEST
. * (E) CHI-SQUARE GOODNESS OF FIT TEST
. * for a Poisson model with generated data (see below).
.
. * The data generation requires free Stata add-on command rndpoix
. * In Stata: search rndpoix
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Model is
. * y ~ Poisson[exp(b1 + b2*x2)]
. * where
. * x2 is iid ~ N[0,1]
. * and b1=0 and b2=1.
.
. set seed 10001
. set obs 200
obs was 0, now 200
. scalar b1 = 0
. scalar b2 = 1
.
. * Generate regressors
. gen x2 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ................ )
Variable xp created.
. gen y = xp
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 using mma08p1cmtests.asc, replace
.
. ********* POISSON REGRESSION **********
.
. poisson y x2

Iteration 0:   log likelihood = -263.53818
Iteration 1:   log likelihood =  -263.5288
Iteration 2:   log likelihood =  -263.5288

Poisson regression                                Number of obs   =        200
                                                  LR chi2(1)      =     321.75
                                                  Prob > chi2     =     0.0000
Log likelihood =  -263.5288                       Pseudo R2       =     0.3791

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    1.12402   .0687868    16.34   0.000     .9892006     1.25884
       _cons |  -.1652935    .089065    -1.86   0.063    -.3398578    .0092707
------------------------------------------------------------------------------
. * Obtain exp(x'b)
.
. * Obtain the scores to be used later
. predict yhat
(option n assumed; predicted number of events)
. * For the Poisson s = dlnf(y)/db = (y - exp(x'b))*x
. gen s1 = (y - yhat)

. gen s2 = (y - yhat)*x2
.
. * Summarize data
. * Should get s1 and s2 summing to zero
. sum

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          x2 |       200   -.0091098    1.010072  -2.857666   2.149822
     mupoiss |       200    1.599601    1.674071   .0574026    8.58333
          xp |       200       1.525    2.363749          0         15
           y |       200       1.525    2.363749          0         15
        yhat |       200       1.525    1.803242   .0341372   9.498652
-------------+--------------------------------------------------------
          s1 |       200    1.36e-09     1.36719  -3.148933   6.245292
          s2 |       200    6.69e-09    1.889198  -6.420406   12.97311

.
. ********** ANALYSIS: CONDITIONAL MOMENTS TESTS **********
.
. * The program is appropriate for MLE with density assumed to be correctly specified.
. * Let H0: E[m(y,x,theta)] = 0
. * Then CM = explained sum of squares or N times uncentered Rsq from
. * auxiliary regression of 1 on m and the components of s = dlnf(y)/dtheta
. * The test is chi-squared with dim(m) degrees of freedom.
.
. * Define the dependent variable one for the auxiliary regressions
. gen one = 1
.
. *** (A) TEST OF THE CONDITIONAL MEAN (Table 8.1 p.270 row 1)
.
. * Test H0: E[(y - exp(x'b))*z] = 0 where z = x2sq
.
. * A similar test is relevant for many nonlinear models
. * Just change the expression for the conditional mean.
. * Here we used E[y|x] = exp(x'b) for the Poisson
. * Also for the Poisson z cannot be x as this sums to zero by Poisson foc
. * For some other models (basically non-LEF models) z can be x
.
. gen z = x2*x2
. gen mA = (y - yhat)*z
. regress one mA s1 s2, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   197) =    1.09
       Model |  3.27177115     3  1.09059038           Prob > F      =  0.3536
    Residual |  196.728229   197  .998620451           R-squared     =  0.0164
-------------+------------------------------           Adj R-squared =  0.0014
       Total |         200   200           1           Root MSE      =  .99931

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          mA |   .1046155   .0577969     1.81   0.072    -.0093646    .2185956
          s1 |  -.0377486   .0822939    -0.46   0.647    -.2000387    .1245415
          s2 |  -.1544278   .1029465    -1.50   0.135    -.3574463    .0485908
------------------------------------------------------------------------------
. scalar CMA = e(N)*e(r2)
. di "CMA: " CMA " p-value: " chi2tail(1,CMA)
CMA: 3.2717711 p-value: .07048149
.
. * Check that three different ways give same answer.
. di "N times Uncentered R-squared: " e(N)*e(r2)
N times Uncentered R-squared: 3.2717711
. di "Explained Sum of Squares: " e(mss)
Explained Sum of Squares: 3.2717711
. di "N minus Residual Sum of Squares: " e(N) - e(rss)
N minus Residual Sum of Squares: 3.2717711
.
. *** (B) TEST THAT CONDITIONAL VARIANCE = MEAN (Table 8.1 p.270 row 2)
.
. * Test H0: E[{(y - exp(x'b))^2 - exp(x'b)}*x] = 0
.
. * This test is peculiar to Poisson which restricts mean = variance
.
. * Here m has 2 terms
. gen mB1 = ((y - yhat)^2 - yhat)
. gen mB2 = ((y - yhat)^2 - yhat)*x2
. regress one mB1 mB2 s1 s2, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    0.60
       Model |  2.43400011     4  .608500026           Prob > F      =  0.6604
    Residual |     197.566   196   1.0079898           R-squared     =  0.0122
-------------+------------------------------           Adj R-squared = -0.0080
       Total |         200   200           1           Root MSE      =   1.004

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mB1 |   .0432045   .0542516     0.80   0.427    -.0637873    .1501963
         mB2 |  -.0052374   .0357193    -0.15   0.884    -.0756808     .065206
          s1 |  -.0399879   .1073712    -0.37   0.710     -.251739    .1717633
          s2 |   -.003196   .0852726    -0.04   0.970    -.1713655    .1649735
------------------------------------------------------------------------------
. scalar CMB = e(N)*e(r2)
. di "CMB: " CMB " p-value: " chi2tail(2,CMB)
CMB: 2.4340001 p-value: .29611717
.
. *** (C) ALTERNATIVE TEST THAT CONDITIONAL VARIANCE = MEAN (Table 8.1 p.270 row 3)
.
. * Test H0: E[{(y - exp(x'b))^2 - y}*x] = 0
.
. * This test is peculiar to Poisson which restricts mean = variance
. * This test is also peculiar as here E[dm/db] = 0
.
. * Here m has 2 terms
. gen mC1 = ((y - yhat)^2 - y)
. gen mC2 = ((y - yhat)^2 - y)*x2
.
. * To be consistent with other tests include s1 and s2.
. regress one mC1 mC2 s1 s2, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    0.60
       Model |  2.43400011     4  .608500027           Prob > F      =  0.6604
    Residual |     197.566   196   1.0079898           R-squared     =  0.0122
-------------+------------------------------           Adj R-squared = -0.0080
       Total |         200   200           1           Root MSE      =   1.004

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mC1 |   .0432045   .0542516     0.80   0.427    -.0637873    .1501963
         mC2 |  -.0052374   .0357192    -0.15   0.884    -.0756808     .065206
          s1 |   .0032166   .0825345     0.04   0.969    -.1595531    .1659863
          s2 |  -.0084334   .0641096    -0.13   0.895    -.1348665    .1179997
------------------------------------------------------------------------------
. scalar CMC = e(N)*e(r2)
. di "CMC: " CMC " p-value: " chi2tail(2,CMC)
CMC: 2.4340001 p-value: .29611717
.
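Each CM statistic above comes from the same outer-product-of-gradient (OPG) auxiliary regression of 1 on m and s, without a constant. Because the dependent variable is 1, the uncentered total sum of squares equals N. So N times the uncentered R-squared equals the explained sum of squares, b'X'1.

The Python sketch below computes that quantity. It is an illustration only:

- Gaussian elimination stands in for a linear-algebra library.
- The p-value helper covers only one degree of freedom, via chi-square(1) = Z^2, as in test (A). Tests with more moment restrictions need a general chi-square tail function, which the standard library lacks.
- All names are hypothetical.

```python
from statistics import NormalDist


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def opg_cm_statistic(columns):
    """N times uncentered R-squared from regressing 1 on `columns` (no constant).

    `columns` lists the moment terms m and the scores s, one list of length N
    per regressor. Since y = 1, uncentered TSS = N, so N*R^2 = ESS = b'X'1.
    """
    k, n = len(columns), len(columns[0])
    xtx = [[sum(columns[i][t] * columns[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xt1 = [sum(col) for col in columns]  # X'1
    beta = solve_linear_system(xtx, xt1)
    return sum(b * v for b, v in zip(beta, xt1))


def chi2_df1_pvalue(stat):
    """Upper-tail chi-square(1) p-value, using chi-square(1) = Z^2."""
    return 2 * (1 - NormalDist().cdf(stat ** 0.5))
```

Applied to the columns mA, s1 and s2 of the generated data, `opg_cm_statistic` computes the same quantity as e(N)*e(r2) above. For the log's CMA = 3.2717711, `chi2_df1_pvalue` returns about .0705, in line with chi2tail(1,CMA).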

. * Since dm/db = 0 could just do the regression without the scores . regress one mC1 mC2, noconstant Source | SS df MS Number of obs = 200 -------------+-----------------------------F( 2, 198) = 1.21 Model | 2.40695177 2 1.20347588 Prob > F = 0.3016 Residual | 197.593048 198 .997944688 R-squared = 0.0120 -------------+-----------------------------Adj R-squared = 0.0021 Total | 200 200 1 Root MSE = .99897 -----------------------------------------------------------------------------one | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------mC1 | .0458705 .0510111 0.90 0.370 -.0547243 .1464652 mC2 | -.0075807 .03212 -0.24 0.814 -.0709218 .0557605 -----------------------------------------------------------------------------. scalar CMCnoscores = e(N)*e(r2) . di "CMCnoscores: " CMC " p-value: " chi2tail(2,CMCnoscores) CMCnoscores: 2.4340001 p-value: .30014911 . . *** (D) INFORMATION MATRIX TEST (Table 8.1 p.270 row 4) . . * Test H0: E[{(y - exp(x'b))^2 - y}*vech(xx')] = 0 . . * A similar test is relevant for other parametric models . * In general m = vech(d2lnf(y)/dbdb') . * and for Poisson this yields above . . * Here m is a 3x1 vector . gen mD1 = ((y - yhat)^2 - y) . gen mD2 = ((y - yhat)^2 - y)*x2 . gen mD3 = ((y - yhat)^2 - y)*x2*x2 . . * To be consistent with other tests include s1 and s2. . regress one mD1 mD2 mD3 s1 s2, noconstant Source | SS df MS Number of obs = 200 -------------+-----------------------------F( 5, 195) = 0.58 Model | 2.9463051 5 .58926102 Prob > F = 0.7129 Residual | 197.053695 195 1.01053177 R-squared = 0.0147 -------------+-----------------------------Adj R-squared = -0.0105 Total | 200 200 1 Root MSE = 1.0053 -----------------------------------------------------------------------------one | Coef. Std. Err. t P>|t| [95% Conf. Interval] 174

-------------+---------------------------------------------------------------mD1 | .0546342 .0566422 0.96 0.336 -.0570759 .1663442 mD2 | -.0712751 .0994042 -0.72 0.474 -.2673205 .1247703 mD3 | .0330527 .0464213 0.71 0.477 -.0584996 .124605 s1 | -.0098554 .0846533 -0.12 0.907 -.176809 .1570982 s2 | -.0146441 .0647803 -0.23 0.821 -.1424041 .1131158 -----------------------------------------------------------------------------. scalar CMD = e(N)*e(r2) . di "CMD: " CMD " p-value: " chi2tail(3,CMD) CMD: 2.9463051 p-value: .39997818 . . * Since dm/db = 0 could just do the regression without the scores . regress one mD1 mD2 mD3, noconstant Source | SS df MS Number of obs = 200 -------------+-----------------------------F( 3, 197) = 0.91 Model | 2.73445751 3 .911485837 Prob > F = 0.4370 Residual | 197.265542 197 1.00134793 R-squared = 0.0137 -------------+-----------------------------Adj R-squared = -0.0013 Total | 200 200 1 Root MSE = 1.0007 -----------------------------------------------------------------------------one | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------mD1 | .056165 .054176 1.04 0.301 -.0506743 .1630043 mD2 | -.056325 .0911035 -0.62 0.537 -.2359884 .1233384 mD3 | .0233527 .0408339 0.57 0.568 -.057175 .1038805 -----------------------------------------------------------------------------. scalar CMDnoscores = e(N)*e(r2) . di "CMDnoscores: " CMDnoscores " p-value: " chi2tail(3,CMDnoscores) CMDnoscores: 2.7344575 p-value: .43440333 . . *** (E) CHI-SQUARE GOODNESS OF FIT TEST (Table 8.1 p.270 row 5) . . * Test H0: E[{d_j - Pr[y = j]] = 0 . * where d_j = 1 if y = j for j = 0, 1, 2, and 3 or more . * and Pr[y = j] = exp(-lamda)*lamda^y/y! for lamda = exp(x'b) . * Cells get too small if have more cells than up to 3 or more. . . * A similar test is relevant for other parametric models, . * though a natural partitioning for y may be less obvious. . . * Here m has 4 terms . gen d0 = 0


. replace d0 = 1 if y==0
(87 real changes made)
. gen d1 = 0
. replace d1 = 1 if y==1
(51 real changes made)
. gen d2 = 0
. replace d2 = 1 if y==2
(22 real changes made)
. gen p0 = exp(-yhat)
. gen p1 = exp(-yhat)*yhat
. gen p2 = exp(-yhat)*(yhat^2)/2
. gen mE1 = d0 - p0
. gen mE2 = d1 - p1
. gen mE3 = d2 - p2
. regress one mE1 mE2 mE3 s1 s2, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  5,   195) =    0.49
       Model |  2.50056717     5  .500113433           Prob > F      =  0.7807
    Residual |  197.499433   195   1.0128176           R-squared     =  0.0125
-------------+------------------------------           Adj R-squared = -0.0128
       Total |         200   200           1           Root MSE      =  1.0064

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mE1 |   1.020078   .7290569     1.40   0.163    -.4177712    2.457927
         mE2 |   .7149016   .5053259     1.41   0.159    -.2817042    1.711507
         mE3 |   .2705081    .383646     0.71   0.482    -.4861201    1.027136
          s1 |   .2916116   .2217763     1.31   0.190    -.1457765    .7289997
          s2 |  -.1341565   .1125046    -1.19   0.235    -.3560384    .0877255
------------------------------------------------------------------------------
. scalar CME = e(N)*e(r2)
. di "CME: " CME " p-value: " chi2tail(3,CME)
CME: 2.5005672 p-value: .47518859
.
. * Wrong alternative is basic chisquare
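Every CM statistic above, including CME just computed, is N times the uncentered R-squared from regressing a column of ones on the moment contributions (optionally together with the scores s1 and s2) without a constant. Because the regressand is all ones, the total sum of squares equals N, so the statistic is N minus the residual sum of squares; this is why it equals the Model SS printed by `regress`. A minimal pure-Python sketch under those assumptions; the helper names are hypothetical and the least-squares step uses plain Gaussian elimination rather than any particular library. The result would then be compared with a chi-square whose degrees of freedom equal the number of moment conditions, not counting the score columns.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def opg_cm_stat(columns):
    """N times the uncentered R^2 from regressing 1 on `columns`, no constant.

    `columns` holds the regressors (moment contributions m_i and,
    optionally, score contributions s_i), each a list of length N.
    """
    n_obs = len(columns[0])
    k = len(columns)
    xtx = [[sum(columns[p][i] * columns[q][i] for i in range(n_obs))
            for q in range(k)] for p in range(k)]
    xt1 = [sum(col) for col in columns]            # X'1
    beta = solve_linear(xtx, xt1)
    fitted = [sum(beta[p] * columns[p][i] for p in range(k))
              for i in range(n_obs)]
    rss = sum((1.0 - f) ** 2 for f in fitted)
    # The uncentered total SS of a column of ones is N, so N*R^2 = N - RSS
    return n_obs - rss
```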

. quietly sum d0
. scalar sumd0 = r(sum)
. quietly sum d1
. scalar sumd1 = r(sum)
. quietly sum d2
. scalar sumd2 = r(sum)
. scalar sumd3 = 1 - sumd0 - sumd1 - sumd2
. quietly sum p0
. scalar sump0 = r(sum)
. quietly sum p1
. scalar sump1 = r(sum)
. quietly sum p2
. scalar sump2 = r(sum)
. scalar sump3 = 1 - sump0 - sump1 - sump2
. scalar chisq = (sumd0-sump0)^2/sump0 + (sumd1-sump1)^2/sump1 /*
> */ + (sumd2-sump2)^2/sump2 + (sumd3-sump3)^2/sump3
. di "Wrong Traditional chi-square: " chisq " p = " chi2tail(3,chisq)
Wrong Traditional chi-square: .47431003 p = .92449803
.
.
. ********** DISPLAY RESULTS (Table 8.1 p.270) **********
.
. sum

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |     200   -.0091098    1.010072  -2.857666   2.149822
     mupoiss |     200    1.599601    1.674071   .0574026    8.58333
          xp |     200       1.525    2.363749          0         15
           y |     200       1.525    2.363749          0         15
        yhat |     200       1.525    1.803242   .0341372   9.498652
-------------+--------------------------------------------------------
          s1 |     200    1.36e-09     1.36719  -3.148933   6.245292
          s2 |     200    6.69e-09    1.889198  -6.420406   12.97311
         one |     200           1           0          1          1

           z |     200    1.015227    1.286795   .0000877   8.166255
          mA |     200    .1563713    3.403966  -13.52498   26.94856
-------------+--------------------------------------------------------
         mB1 |     200     .334863    3.470417  -6.436038   30.24896
         mB2 |     200      .43869    5.749749  -11.74974   62.83503
         mC1 |     200     .334863    3.077815  -6.838236   24.00367
         mC2 |     200      .43869    4.897291    -12.484   49.86192
         mD1 |     200     .334863    3.077815  -6.838236   24.00367
-------------+--------------------------------------------------------
         mD2 |     200      .43869    4.897291    -12.484   49.86192
         mD3 |     200    .8381842    9.190652    -22.791   103.5763
          d0 |     200        .435    .4970011          0          1
          d1 |     200        .255     .436955          0          1
          d2 |     200         .11    .3136749          0          1
-------------+--------------------------------------------------------
          p0 |     200     .429237    .2918348    .000075   .9664389
          p1 |     200    .2406035    .1137756    .000712    .367864
          p2 |     200    .1235594    .0894167   .0005631   .2706694
         mE1 |     200     .005763    .4287003  -.9289918   .9571021
         mE2 |     200    .0143965    .4210301   -.367864   .9315748
-------------+--------------------------------------------------------
         mE3 |     200   -.0135594    .3065698  -.2706694   .9688674
.
. * Gives Rows 1-5 of Table 8.1 (The CMxnoscores are not reported)
. di "CMA: " CMA " p-value: " chi2tail(1,CMA)
CMA: 3.2717711 p-value: .07048149
. di "CMB: " CMB " p-value: " chi2tail(2,CMB)
CMB: 2.4340001 p-value: .29611717
. di "CMC: " CMC " p-value: " chi2tail(2,CMC)
CMC: 2.4340001 p-value: .29611717
. di "CMD: " CMD " p-value: " chi2tail(3,CMD)
CMD: 2.9463051 p-value: .39997818
. di "CME: " CME " p-value: " chi2tail(3,CME)
CME: 2.5005672 p-value: .47518859
. di "CMCnoscores: " CMCnoscores " p-value: " chi2tail(2,CMCnoscores)
CMCnoscores: 2.4069518 p-value: .30014911
. di "CMDnoscores: " CMDnoscores " p-value: " chi2tail(3,CMDnoscores)
CMDnoscores: 2.7344575 p-value: .43440333
.
. ********** FURTHER ANALYSIS gives M** column in Table 8.1 **********
.
. * The following drops the scores from the regression. Provides lower bound.
. * Results are reported in last column in Table 8.1
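The "Wrong Traditional chi-square" computed above is the classic Pearson statistic, the sum over the cells y = 0, 1, 2 and 3 or more of (observed - expected)^2/expected, with expected counts given by summed fitted Poisson probabilities. The log labels it wrong, presumably because with estimated regression parameters it does not have the usual chi-square(3) distribution; the CM version CME is the test reported in Table 8.1. Separately, note that the log forms the last cell as `1 - sumd0 - sumd1 - sumd2` (and likewise for `sump3`) although the `r(sum)` values are counts, not proportions; with counts the last cell is N minus the other three. The sketch below uses counts throughout and also reproduces the logged construction for comparison. Its inputs are reconstructed from the output above (N = 200, counts 87, 51 and 22 from the `replace` messages, expected counts as 200 times the reported means of p0, p1 and p2), so the results are approximate; the function name is hypothetical.

```python
def pearson_chisq(observed, expected):
    """Pearson statistic: sum over cells of (O_j - E_j)^2 / E_j."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n_obs = 200
observed = [87, 51, 22]                  # counts of y = 0, 1, 2 (from the log)
expected = [n_obs * 0.429237,            # 200 * mean of p0
            n_obs * 0.2406035,           # 200 * mean of p1
            n_obs * 0.1235594]           # 200 * mean of p2

# Last cell "3 or more" built from counts: N minus the other cells
stat_counts = pearson_chisq(observed + [n_obs - sum(observed)],
                            expected + [n_obs - sum(expected)])

# The log's construction: last cell = 1 minus the other (count) cells
stat_as_logged = pearson_chisq(observed + [1 - sum(observed)],
                               expected + [1 - sum(expected)])

print(round(stat_counts, 4), round(stat_as_logged, 4))
```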

. quietly regress one mA, noconstant
. di "CMA without scores:" e(N)*e(r2) " with p = " chi2tail(1,e(N)*e(r2))
CMA without scores:.42328231 with p = .51530376
. quietly regress one mB1 mB2, noconstant
. di "CMB without scores:" e(N)*e(r2) " with p = " chi2tail(2,e(N)*e(r2))
CMB without scores:1.8897296 with p = .38873213
. quietly regress one mC1 mC2, noconstant
. di "CMC without scores:" e(N)*e(r2) " with p = " chi2tail(2,e(N)*e(r2))
CMC without scores:2.4069518 with p = .30014911
. quietly regress one mD1 mD2 mD3, noconstant
. di "CMD without scores:" e(N)*e(r2) " with p = " chi2tail(3,e(N)*e(r2))
CMD without scores:2.7344575 with p = .43440333
. quietly regress one mE1 mE2 mE3, noconstant
. di "CME without scores:" e(N)*e(r2) " with p = " chi2tail(3,e(N)*e(r2))
CME without scores:.73842732 with p = .86413036
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section2\mma08p1cmtests.txt
  log type:  text
 closed on:  17 May 2005, 14:04:20
-------------------------------------------------------------------------------

       log:  c:\Imbook\bwebpage\Section2\mma08p2nonnested.txt
  log type:  text
 opened on:  18 May 2005, 21:27:00
.
. ********** OVERVIEW OF MMA08P2NONNESTED.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.5.3 pages 283-4
. * Nonnested model comparison given in Table 8.2:
.
. * (A) AIC AND VARIATIONS
. * (B) VUONG TEST for Overlapping Models

. * for a Poisson model with simulated data (see below).
.
. * This example requires the free Stata add-on command rndpoix.
. * In Stata: search rndpoix
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Dgp is
. * y ~ Poisson[exp(b1 + b2*x2 + b3*x3)]
. * where
. * x2, x3 are iid ~ N[0,1]
. * and b1=0.5 and b2=0.5 and b3=0.5.
.
. * The Models compared are
. *    Poisson of y on x2
. *    Poisson of y on x3 and x3^2
.
. set seed 10001
. set obs 100
obs was 0, now 100
. scalar b1 = 0.5
. scalar b2 = 0.5
. scalar b3 = 0.5
.
. * Generate regressors
. gen x2 = invnorm(uniform())
. gen x3 = invnorm(uniform())
. gen x2sq = x2*x2
. gen x3sq = x3*x3
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3)
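The comments above describe the data-generating process: x2 and x3 iid N[0,1] and y Poisson with mean exp(b1 + b2*x2 + b3*x3), drawn in the log with the add-on `rndpoix`. A stdlib-only Python analogue is sketched below as an illustration under stated assumptions: Poisson draws use Knuth's multiplication method (not necessarily what `rndpoix` does), and Python's random-number stream differs from Stata's, so seed 10001 will not reproduce the values in the log. Function names are hypothetical.

```python
import math
import random


def poisson_draw(mu, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, b1=0.5, b2=0.5, b3=0.5, seed=10001):
    """Draw (y, x2, x3) with x2, x3 iid N(0,1) and y ~ Poisson(exp(b1+b2*x2+b3*x3))."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x2 = rng.gauss(0.0, 1.0)
        x3 = rng.gauss(0.0, 1.0)
        mu = math.exp(b1 + b2 * x2 + b3 * x3)
        rows.append((poisson_draw(mu, rng), x2, x3))
    return rows


data = simulate(100)
print(sum(row[0] for row in data) / len(data))   # sample mean of y
```

With these parameter values the population mean of y is exp(0.75), about 2.12, which a large simulated sample should approach.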


. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ......... )
Variable xp created.
. gen y = xp
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 x2sq x3sq using mma08p2nonnested.asc, replace
.
. ********* SETUP FOR THIS PROGRAM *********
.
. * Change this if want different regressors
. * Here both models differ from the dgp
. * The Vuong test below assumes that the two models are OVERLAPPING
. global XLISTMODEL1 x2
. global XLISTMODEL2 x3 x3sq
.
. ********* (A) AIC AND VARIATIONS *********
.
. * Stata output from Poisson saves much of this.
. * Also calculate manually.
.
. * The following code can be changed to different models than poisson
. * provided
. *    ereturn list yields N = e(N); q = e(k); and LnL = e(ll)
. * We use AIC = -2lnL+2q; BIC = -2lnL+lnN*q; CAIC = -2lnL+(1+lnN)*q
.
. poisson y $XLISTMODEL1

Iteration 0:   log likelihood = -183.43146
Iteration 1:   log likelihood = -183.43146

Poisson regression

                                                  Number of obs   =        100
                                                  LR chi2(1)      =      16.28
                                                  Prob > chi2     =     0.0001
Log likelihood = -183.43146                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    .291164    .072311     4.03   0.000     .1494371    .4328909
       _cons |   .6084331   .0752833     8.08   0.000     .4608806    .7559857
------------------------------------------------------------------------------
. estimates store model1
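The comments define AIC = -2lnL + 2q, BIC = -2lnL + ln(N)q and CAIC = -2lnL + (1 + ln N)q, so only lnL, q and N are needed. The short Python check below plugs in the model 1 log-likelihood printed above and the model 2 value from the regression that follows; smaller values favour a model, and all three criteria prefer model 2 here. The function name is hypothetical.

```python
import math


def info_criteria(lnl, q, n):
    """Return (AIC, BIC, CAIC) using the definitions in the log's comments."""
    minus2lnl = -2.0 * lnl
    aic = minus2lnl + 2.0 * q
    bic = minus2lnl + math.log(n) * q
    caic = minus2lnl + (1.0 + math.log(n)) * q
    return aic, bic, caic


model1 = info_criteria(-183.43146, 2, 100)   # Poisson of y on x2
model2 = info_criteria(-176.09119, 3, 100)   # Poisson of y on x3 and x3sq
for name, crit in [("model1", model1), ("model2", model2)]:
    print(name, [round(c, 5) for c in crit])
```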


. scalar ll1 = e(ll)
. scalar q1 = e(k)
. scalar N1 = e(N)
. scalar aic1 = -2*ll1 + 2*q1
. scalar bic1 = -2*ll1 + ln(N1)*q1
. scalar caic1 = -2*ll1 + (1 + ln(N1))*q1
.
. poisson y $XLISTMODEL2

Iteration 0:   log likelihood = -176.09611
Iteration 1:   log likelihood = -176.09119
Iteration 2:   log likelihood = -176.09119

Poisson regression

                                                  Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
. estimates store model2
. scalar ll2 = e(ll)
. scalar q2 = e(k)
. scalar N2 = e(N)
. scalar aic2 = -2*ll2 + 2*q2
. scalar bic2 = -2*ll2 + ln(N2)*q2
. scalar caic2 = -2*ll2 + (1 + ln(N2))*q2
.
. * Display results given in first three rows of Table 8.2 page 284
.
. estimates table model1 model2, stats(N k ll aic bic)


----------------------------------------
    Variable |   model1       model2
-------------+--------------------------
          x2 |  .29116396
          x3 |                .35884118
        x3sq |                .09129986
       _cons |  .60843314     .49265596
-------------+--------------------------
           N |        100           100
           k |          2             3
          ll | -183.43146    -176.09119
         aic |  370.86292     358.18238
         bic |  376.07326     365.99789
----------------------------------------
.
. di "Model 1: " _n "lnL: " ll1 " q: " q1 _n " N: " N1
Model 1:
lnL: -183.43146 q: 2
 N: 100
. di "-2lnL: " -2*ll1 _n "AIC: " aic1 _n " BIC: " bic1 _n "caic: " caic1
-2lnL: 366.86292
AIC: 370.86292
 BIC: 376.07326
caic: 378.07326
.
. di "Model 2: " _n "lnL: " ll2 " q: " q2 _n " N: " N2
Model 2:
lnL: -176.09119 q: 3
 N: 100
. di "-2lnL: " -2*ll2 _n "AIC: " aic2 _n " BIC: " bic2 _n "caic: " caic2
-2lnL: 352.18238
AIC: 358.18238
 BIC: 365.99789
caic: 368.99789
.
. ********* (B) VUONG TEST FOR OVERLAPPING MODELS *********
.
. * The test has three variants
. * (1) Nested models: G is contained in F
. * (2) Strictly non-nested models: F intersection G equals null set
. * (3) Overlapping models: F intersection G does not equal null set
.
. * Need to compute lnf(y) for models 1 and 2,
. * where density f is model 1 and density g is model 2
.
. * The procedures will vary with model. Here use Poisson.

.
. * (0) COMPUTE THE LR TEST STATISTIC
.
. * This is LR = Sum_i [ ln (fy1_i / gy2_i) ]
. *            = Sum_i lnfy1_i - Sum_i lngy2_i
. *            = difference in log-likelihood for the two models
.
. * Easiest if program output gives logL
. * Otherwise need to generate manually
.
. quietly poisson y $XLISTMODEL1
. scalar llf = e(ll)
. quietly poisson y $XLISTMODEL2
. scalar llg = e(ll)
. scalar LR = llf - llg
. di "LR = " LR " and llf = " llf " llg = " llg
LR = -7.3402698 and llf = -183.43146 llg = -176.09119
.
. * (1) NESTED MODELS
.
. * Not done here as not relevant for the example of this application.
.
. * (1A) Usual LR test if assume densities correctly specified.
.
. * (1B) If instead want robustified version then need to compute W
. * and use the weighted chi-square test.
. * This is not the appropriate test here,
. * but in 3(A) below W is computed and a weighted chi-square test used.
. * This code could be easily adapted to here.
.
. * (2) STRICTLY NON-NESTED MODELS
.
. * Not done here as not relevant for the example of this application.
. * Test uses LR/what ~ normal where what is computed in 3(B) below.
.
. * (3) OVERLAPPING MODELS
.
. * This is the relevant test here
. * First test whether overlapping (even though here know that is)
. * Then do the test
.
. * (3A-1) Compute what^2
.
. * Calculate what^2
. *   = (1/N)*Sum_i[ln(fy1_i/gy2_i)^2] - [(1/N)*Sum_i ln(fy1_i/gy2_i)]^2

. *   = (1/N) * Sum_i [(ln(fy1_i) - ln(gy2_i))^2] - (LR/N)^2
.
. * For the Poisson
. *   f(y) = exp(-mu)*mu^y/y!
. * so lnf(y) = -mu + y*ln(mu) - lny!
. quietly poisson y $XLISTMODEL1
. predict yhatf
(option n assumed; predicted number of events)
. * Poisson default predict gives yhat = exp(x'b)
. gen lnf = -yhatf + y*ln(yhatf) - lnfact(y)
. quietly poisson y $XLISTMODEL2
. predict yhatg
(option n assumed; predicted number of events)
. gen lng = -yhatg + y*ln(yhatg) - lnfact(y)
. gen lnratiosq = (lnf-lng)^2
. sum lnratiosq

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   lnratiosq |     100    .6967792    1.816804   .0000331   13.85592
. scalar whatsq = r(sum)/_N - (LR/_N)^2
. scalar Nwhatsq = _N*whatsq
. di "First-stage test statistic whatsq - still need to find critical value"
First-stage test statistic whatsq - still need to find critical value
. di "N*omegahatsq = " Nwhatsq
N*omegahatsq = 69.139128
.
. * Aside: Check by recomputing LR this long way
. gen lnratio = (lnf-lng)
. sum lnratio

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     lnratio |     100   -.0734027    .8356883  -3.722355   2.571382
. scalar LRcheck = r(sum)
.
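Steps (0) and (3A-1) need only the observation-level Poisson log densities ln f(y_i) = -mu_i + y_i ln(mu_i) - ln(y_i!) under the two fitted models: LR is the sum of their differences, omegahat^2 is the variance (divisor N) of those differences, and the second-stage statistic in (3B) is TLR = LR/(sqrt(N) * omegahat). A Python sketch, assuming the fitted means of both models are available as lists; the function names are hypothetical. The last lines check the formulas against the summary numbers in the log (LR = -7.3402698, mean of lnratiosq = .6967792, N = 100).

```python
import math


def poisson_logdens(y, mu):
    """ln f(y) = -mu + y*ln(mu) - ln(y!) for the Poisson."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)


def vuong_pieces(y, mu_f, mu_g):
    """Return (LR, omegahat^2, TLR) from pointwise log-density differences.

    mu_f, mu_g: fitted Poisson means under models f and g, in the same order as y.
    """
    n = len(y)
    d = [poisson_logdens(yi, mf) - poisson_logdens(yi, mg)
         for yi, mf, mg in zip(y, mu_f, mu_g)]
    lr = sum(d)
    what_sq = sum(di * di for di in d) / n - (lr / n) ** 2
    if what_sq <= 0:
        raise ValueError("omegahat^2 is zero: the fitted models coincide")
    tlr = (lr / math.sqrt(what_sq)) / math.sqrt(n)
    return lr, what_sq, tlr


# Check of the formulas using summary numbers printed in the log
lr_log, mean_sq_log, n_log = -7.3402698, 0.6967792, 100
what_sq_log = mean_sq_log - (lr_log / n_log) ** 2
print("N*omegahatsq =", n_log * what_sq_log)
print("TLR =", (lr_log / math.sqrt(what_sq_log)) / math.sqrt(n_log))
```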

. *** Display results given in second last row of Table 8.2 page 284
.
. di "LR = " LR " and LRcheck = " LRcheck
LR = -7.3402698 and LRcheck = -7.3402702
.
. * (3A-2) Find the critical value by first finding W, then eigenvalues lamda, then simulating
.
. * Calculate estimate of the W matrix on page ?? of Vuong.
. * (a) Can estimate Af = E[d2lnf(y)/dbdb'] as inverse of usual ML variance matrix
. * (b) Since the robust ML variance matrix is V = Ainv*B*Ainv
. *     can estimate Bf = -E[dlnf(y)/db*dlnf(y)/db'] by A*V*A where A is in (a)
. * (c) For Ag same as in part (a) except for model g
. * (d) For Bg same as in part (b) except for model g
. * (e) The only tricky bit is computation of Bfg
.
. gen one = 1
. * (a) Af
. quietly poisson y one $XLISTMODEL1, noconstant
. matrix Af = syminv(e(V))
. * (b) Bf
. quietly poisson y one $XLISTMODEL1, noconstant robust
. * robust gives Ainv*B*Ainv so pre and post multiply by A gives B
. * Also make adjustment as Stata divides by (_N-1). Here use _N.
. matrix Bf = Af*e(V)*Af*(_N-1)/_N
. * (c) Ag
. quietly poisson y one $XLISTMODEL2, noconstant
. matrix Ag = syminv(e(V))
. * (d) Bg
. quietly poisson y one $XLISTMODEL2, noconstant robust
. matrix Bg = Ag*e(V)*Ag*(_N-1)/_N
.
. * (e) Bfg requires more specialized code peculiar to this example
. * For Poisson dlnf(y)/db = Sum_i (y_i - mu_i)*x_i
. * so Bfg = (1/N)*Sum_i [(y_i - muf_i)*xf_i]*[(y_i - mug_i)*xg_i]'
. * For model 1 x is intercept and x2 (global XLISTMODEL1 x2)
. gen bf1 = (y - yhatf)   /* yhatf saved earlier = muf */
. gen bf2 = (y - yhatf)*x2
. * For model 2 x is intercept, x3 and x3sq (global XLISTMODEL2 x3 x3sq)
. gen bg1 = (y - yhatg)   /* yhatg saved earlier = mug */

. gen bg2 = (y - yhatg)*x3
. gen bg3 = (y - yhatg)*x3sq
. * Create Bfg
. matrix accum BfBg = bf1 bf2 bg1 bg2 bg3, noconstant
(obs=100)
. * and Bfg is the (1,2) submatrix: rows 1 to 2 and columns 3 to 5
. matrix Bfg = BfBg[1..2,3..5]
.
. * Form the matrix W
. * Note there is no need for minus sign as A has been defined as -A
. matrix W11 = Bf*syminv(Af)
. matrix W12 = Bfg*syminv(Ag)
. matrix W21 = Bfg'*syminv(Af)
. matrix W22 = Bg*syminv(Ag)
. matrix W = W11,W12\W21,W22
. matrix list W

W[5,5]

                y:          y:          y:          y:          y:
               one          x2         one          x3        x3sq
  y:one  1.5571072   .01745302   1.3738479   .03868485   -.1702893
   y:x2  .05110494   1.4484966   .61074273   .07847014  -.15039712
    bg1  1.1488275    .1064062   1.6030095    .0647251  -.18944561
    bg2  .39558125   .08428705   .20709641   1.0650899  -.05677421
    bg3  1.1180355   -.0564763   .19914593   .07617139   .90718177

.
. * Calculate the eigenvalues of W
. matrix eigenvalues reigvalW ceigvalW = W
. * Real eigenvalues
. matrix list reigvalW

reigvalW[1,5]
              y:          y:          y:          y:          y:
             one          x2         one          x3        x3sq
real   2.7511946   .29082285   1.4750881   1.0021719   1.0616075
. * Complex eigenvalues - hopefully none
. matrix list ceigvalW


ceigvalW[1,5]
              y:          y:          y:          y:          y:
             one          x2         one          x3        x3sq
complex        0           0           0           0           0
.
. * This gives the vector lamda of eigenvalues of W
. matrix lamda = reigvalW
. scalar l1 = lamda[1,1]
. scalar l2 = lamda[1,2]
. scalar l3 = lamda[1,3]
. scalar l4 = lamda[1,4]
. scalar l5 = lamda[1,5]
.
. * Now obtain the p-value and critical value at level 0.05
. preserve
. * Obtain the 5 percent critical value by simulating 10000 draws from
. * M_p+q(lamda) = Sum_j lamda_j*z_j^2 where z_j are N[0,1] so z_j^2 are chi2(1)
. set seed 10101
. set obs 10000
obs was 100, now 10000
. gen randomdraw = l1*invnorm(uniform())^2 + l2*invnorm(uniform())^2 + /*
> */ l3*invnorm(uniform())^2 + l4*invnorm(uniform())^2 + l5*invnorm(uniform())^2
. gen indicator = Nwhatsq >= randomdraw
. quietly sum indicator
. di "p-value for the Omegahatsq test = " 1-r(mean)
p-value for the Omegahatsq test = 0
. sum randomdraw, detail

                         randomdraw
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .6438425       .0756691
 5%     1.286375       .1250253
10%     1.850972       .1326376       Obs               10000
25%     3.137835       .1402145       Sum of Wgt.       10000

50%     5.359223                      Mean           6.614841
                        Largest       Std. Dev.       4.90562
75%     8.751276       38.32291
90%      12.8871       38.75208       Variance       24.06511
95%     16.10237       40.94431       Skewness       1.733549
99%     23.85304       44.08449       Kurtosis       7.514808
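Step (3A-2) refers N*omegahat^2 to the distribution of Sum_j lamda_j*z_j^2, with z_j iid N[0,1] and weights equal to the eigenvalues of W, and approximates its 95th percentile and the p-value by simulation. The same simulation in Python, plugging in the five eigenvalues and N*omegahat^2 = 69.139128 from the log; because the random streams differ, the simulated critical value will be close to, but not equal to, 16.102374. Names are hypothetical.

```python
import random


def weighted_chisq_draws(lams, reps, seed=10101):
    """Draws of sum_j lam_j * z_j^2 with z_j iid N(0,1)."""
    rng = random.Random(seed)
    return [sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lams)
            for _ in range(reps)]


lams = [2.7511946, 0.29082285, 1.4750881, 1.0021719, 1.0616075]  # eigenvalues of W
n_whatsq = 69.139128                                             # N*omegahatsq
draws = sorted(weighted_chisq_draws(lams, 10000))
crit95 = draws[int(0.95 * len(draws)) - 1]      # simple empirical 95th percentile
pvalue = sum(d >= n_whatsq for d in draws) / len(draws)
print("5% critical value:", round(crit95, 3), " p-value:", pvalue)
```

The mean of the simulated draws should be near the sum of the eigenvalues (about 6.58), consistent with the mean of 6.614841 in the Stata output.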

. di "Reject overlapping at level .05 if N*omegahatsq exceeds " r(p95)
Reject overlapping at level .05 if N*omegahatsq exceeds 16.102374
. restore
. di "where N*omegahatsq equals " Nwhatsq
where N*omegahatsq equals 69.139128
. di "If reject then continue to second step."
If reject then continue to second step.
. di "Otherwise stop as cannot determine whether models are overlapping."
Otherwise stop as cannot determine whether models are overlapping.
.
. * (3B) Do the second stage test if reject at (3A)
. gen TLR = (LR/sqrt(whatsq))/sqrt(_N)
.
. *** Display results given in second last row of Table 8.2 page 284
.
. di "TLR is N[0,1]. Here TLR = " TLR
TLR is N[0,1]. Here TLR = -.88277513
. di "Two-tailed test p-value: " chi2tail(1,TLR^2)
Two-tailed test p-value: .37735778
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma08p2nonnested.txt
  log type:  text
 closed on:  18 May 2005, 21:27:00
-------------------------------------------------------------------------------

       log:  c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt
  log type:  text
 opened on:  17 May 2005, 14:10:13
.
. ********** OVERVIEW OF MMA08P3DIAGNOSTICS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"

. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.7.3 pages 290-1
. * Model diagnostics example (Table 8.3)
.
. * (A) DIFFERENT R-SQUAREDS
. * (B) CALCULATION OF RESIDUALS
. * for a Poisson model with simulated data (see below).
.
. * The data generation requires free Stata add-on command rndpoix
. * In Stata: search rndpoix
.
. * This program gives results for model 2
. * For model 1 need to rerun with only x3 as regressor
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Model is
. * y ~ Poisson[exp(b1 + b2*x2 + b3*x3)]
. * where
. * x2 and x3 are iid ~ N[0,1]
. * and b1=0.5 and b2=0.5 and b3=0.5.
.
. * The Diagnostics below are from Poisson regression of y on x3 alone
. * or from Poisson regression of y on x3 and x3sq. [Note: x2 is omitted]
.
. set seed 10001
. set obs 100
obs was 0, now 100
. scalar b1 = 0.5
. scalar b2 = 0.5
. scalar b3 = 0.5
.
. * Generate regressors
. gen x2 = invnorm(uniform())


. gen x3 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ......... )
Variable xp created.
. gen y = xp
. sum

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |     100    .0053689    1.000686  -2.173506   2.106561
          x3 |     100   -.0235884    1.024207  -2.857666   2.149822
     mupoiss |     100    2.020511    1.400564   .3380426   7.029678
          xp |     100        1.92    1.835013          0          8
           y |     100        1.92    1.835013          0          8
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 using mma08p3diagnostics.asc, replace
.
. ********* SETUP FOR THIS PROGRAM **********
.
. * Change this if want different regressors
. gen x3sq = x3*x3
. * global XLIST x3        /* Model 1 */
. global XLIST x3 x3sq     /* Model 2 */
.
. ********* R-SQUARED (reported in Table 8.3 p.291) **********
.
. * The following code can be changed to different models than poisson
. * For RsqRES, RsqEXP and RsqCOR need
. *    y         dependent variable
. *    yhat      predicted value of dependent variable
. * For RsqWRSS additionally need
. *    sigmasq   predicted variance of dependent variable
. * For RsqRG need log density evaluated at values given below
.
. * Obtain exp(x'b). Will vary with the model
. poisson y $XLIST

Iteration 0:   log likelihood = -176.09611

Iteration 1:   log likelihood = -176.09119
Iteration 2:   log likelihood = -176.09119

Poisson regression

                                                  Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
. predict yhat
(option n assumed; predicted number of events)
. scalar dof = e(N)-e(k)
.
. * RsqRES and RsqEXP are R-squared from sums of squares
. * First get TSS, ESS and RSS
. egen ybar = mean(y)
. gen ylessybarsq = (y - ybar)^2
. quietly sum ylessybarsq
. scalar totalss = r(mean)
. gen yhatlessybarsq = (yhat - ybar)^2
. quietly sum yhatlessybarsq
. scalar explainedss = r(mean)
. gen residualsq = (y - yhat)^2
. quietly sum residualsq
. scalar residualss = r(mean)
. * Second compute the R-squareds
. scalar sereg = sqrt(residualss/dof)
. scalar RsqRES = 1 - residualss/totalss
. scalar RsqEXP = explainedss/totalss
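RsqRES, RsqEXP, RsqCOR and RsqWRSS need only y, the fitted mean yhat and, for the weighted version, a fitted variance (yhat itself for the Poisson). The stdlib-only sketch below mirrors the log in using means rather than sums, which cancel in each ratio; names are hypothetical. The usage line illustrates that outside OLS these measures need not agree. Separately, note that `sereg` in the log divides the mean squared residual by N - k; the conventional standard error of the regression, sqrt(RSS/(N - k)) with RSS the residual sum, would here be sqrt(N) = 10 times the printed value.

```python
def mean(v):
    return sum(v) / len(v)


def rsquareds(y, yhat, sigmasq=None):
    """R-squared measures built from (weighted) sums of squares, as in the log.

    Means are used instead of sums, as in the log; the 1/N cancels in each ratio.
    """
    ybar = mean(y)
    tss = mean([(yi - ybar) ** 2 for yi in y])
    ess = mean([(fi - ybar) ** 2 for fi in yhat])
    rss = mean([(yi - fi) ** 2 for yi, fi in zip(y, yhat)])
    fbar = mean(yhat)
    cov = mean([(yi - ybar) * (fi - fbar) for yi, fi in zip(y, yhat)])
    var_f = mean([(fi - fbar) ** 2 for fi in yhat])
    out = {"RsqRES": 1.0 - rss / tss,
           "RsqEXP": ess / tss,
           "RsqCOR": cov ** 2 / (tss * var_f)}   # squared correlation of y and yhat
    if sigmasq is not None:                       # for the Poisson, sigmasq = yhat
        wtss = mean([(yi - ybar) ** 2 / s for yi, s in zip(y, sigmasq)])
        wrss = mean([(yi - fi) ** 2 / s for yi, fi, s in zip(y, yhat, sigmasq)])
        out["RsqWRSS"] = 1.0 - wrss / wtss
    return out


# Outside OLS the measures need not agree: yhat = 2*y correlates perfectly with y
print(rsquareds([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```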


.
. * RsqCOR uses sample correlation
. quietly correlate y yhat
. scalar RsqCOR = r(rho)^2
.
. di "standard error of regression: " sereg
standard error of regression: .16620308
. di "totalss: " totalss _n "explainedss: " explainedss _n "residualss: " residualss
totalss: 3.3336
explainedss: .69556676
residualss: 2.6794761
. di "RsqRES: " RsqRES _n "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR
RsqRES: .19622149
RsqEXP: .20865333
RsqCOR: .19640666
.
. * RsqWRSS uses weighted sums of squares
. * First generate estimated variance of y
. * Here for Poisson use fact that variance = mean
. gen sigmasq = yhat
. gen weightedylessybarsq = ((y - ybar)^2) / sigmasq
. quietly sum weightedylessybarsq
. scalar weightedtotalss = r(mean)
. gen weightedresidualsq = ((y - yhat)^2) / sigmasq
. quietly sum weightedresidualsq
. scalar weightedresidualss = r(mean)
. scalar RsqWRSS = 1 - weightedresidualss/weightedtotalss
. di "RsqWRSS: " RsqWRSS
RsqWRSS: .16945018
.
. * RsqRG is from ML. Difficult to generalize beyond LEF models.
. * Need
. *    lnL_fit   log-likelihood at fitted values (the usual)
. *    lnL_0     log-likelihood at intercept only
. *    lnL_max   log-likelihood at best fit
. quietly poisson y $XLIST
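RsqRG measures the fitted model's log-likelihood gain over the intercept-only model as a share of the largest possible gain, which for the Poisson is attained at mu_i = y_i (with 0*ln0 = 0); RsqQ is 1 - lnL_fit/lnL_0, the pseudo-R2 that Stata's `poisson` prints. A Python sketch with hypothetical names, checked against the model 2 values reported in the log (lnL_0 = -191.57162, lnL_fit = -176.09119, lnL_max = -101.12402).

```python
import math


def poisson_lnl_max(y):
    """Poisson log-likelihood at mu_i = y_i: sum of -y + y*ln(y) - ln(y!), 0*ln(0) = 0."""
    total = 0.0
    for yi in y:
        ylny = yi * math.log(yi) if yi > 0 else 0.0
        total += -yi + ylny - math.lgamma(yi + 1)
    return total


def rsq_rg(lnl_fit, lnl_0, lnl_max):
    """Share of the feasible log-likelihood gain achieved by the fitted model."""
    return (lnl_fit - lnl_0) / (lnl_max - lnl_0)


def rsq_q(lnl_fit, lnl_0):
    """1 - lnL_fit/lnL_0, the pseudo-R2 reported by Stata's poisson."""
    return 1.0 - lnl_fit / lnl_0


# Model 2 values reported in the log
print(rsq_rg(-176.09119, -191.57162, -101.12402), rsq_q(-176.09119, -191.57162))
```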


. scalar lnL_fit = e(ll)
. scalar lnL_0 = e(ll_0)
. * The following applies only for Poisson. Differs for other models.
. *    lnf(y) = -mu + y*ln(mu) - ln(y!)
. * is maximized at mu = y
. * so compute lnL_max = sum of [-y + y*ln(y) - lny!]
. * Following sets 0*ln0 = 0
. gen ylny = 0
. replace ylny = y*ln(y) if y > 0
(51 real changes made)
. gen lnfyatmax = -y + ylny - lnfact(y)
. quietly sum lnfyatmax
. scalar lnL_max = r(sum)
. scalar RsqRG = (lnL_fit - lnL_0) / (lnL_max - lnL_0)
.
. * RsqQ should only be used for binary and other discrete choice models
. * And definitely use only if lnL_fit < 0
. scalar RsqQ = 1 - lnL_fit/lnL_0
.
. di "lnL_0: " lnL_0 _n "lnL_fit: " lnL_fit _n "lnL_max: " lnL_max
lnL_0: -191.57162
lnL_fit: -176.09119
lnL_max: -101.12402
. di "RsqRG: " RsqRG _n "RsqQ: " RsqQ
RsqRG: .17115358
RsqQ: .08080754
.
. * Check
. sum

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |     100    .0053689    1.000686  -2.173506   2.106561
          x3 |     100   -.0235884    1.024207  -2.857666   2.149822
     mupoiss |     100    2.020511    1.400564   .3380426   7.029678
          xp |     100        1.92    1.835013          0          8
           y |     100        1.92    1.835013          0          8
-------------+--------------------------------------------------------
        x3sq |     100    1.039067    1.446146   .0000877   8.166255
        yhat |     100        1.92     .838208   1.150405   5.398193

        ybar |     100        1.92           0       1.92       1.92
 ylessybarsq |     100      3.3336    5.966374      .0064    36.9664
yhatlessyb~q |     100    .6955668    1.572256   4.82e-06   12.09783
-------------+--------------------------------------------------------
  residualsq |     100    2.679476    4.830379   .0000825   36.93972
     sigmasq |     100        1.92     .838208   1.150405   5.398193
weightedyl~q |     100    1.681324    2.560112   .0018502   19.23135
weightedre~q |     100    1.396423    2.424518   .0000276   19.21747
        ylny |     100     2.15694     3.48234          0   16.63553
-------------+--------------------------------------------------------
   lnfyatmax |     100    -1.01124    .6233793  -1.969071          0
. poisson y $XLIST /* Stata Rsq = RsqQ */

Iteration 0:   log likelihood = -176.09611
Iteration 1:   log likelihood = -176.09119
Iteration 2:   log likelihood = -176.09119

Poisson regression

                                                  Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
.
. *** The following results are for Model 2 in Table 8.3 p.291
. *** For model 1 R-squareds need to rerun with only x3 as regressor
. di "standard error of regression: " sereg
standard error of regression: .16620308
. di "RsqRES: " RsqRES _n "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR
RsqRES: .19622149
RsqEXP: .20865333
RsqCOR: .19640666
. di "RsqWRSS: " RsqWRSS _n "RsqRG: " RsqRG _n "RsqQ: " RsqQ
RsqWRSS: .16945018
RsqRG: .17115358
RsqQ: .08080754
.
. ********* RESIDUAL ANALYSIS (text bottom p.290 to top p.291) **********
.
. * Assume that from earlier have yhat

.
. * raw residual
. gen raw = y - yhat
. gen sigma = sqrt(yhat)
. gen Pearson = (y - yhat)/sigma
. * Note that earlier defined ylny = 0 if y=0 and = yln(y) otherwise
. gen deviance = sign(y-yhat)*sqrt(2*(-y+ylny)-2*(-yhat+y*ln(yhat)))
.
. *** The following are results reported in text bottom p.290 to top p.291
. sum raw Pearson deviance

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         raw |     100   -2.38e-09    1.645157  -2.993904   6.077806
     Pearson |     100   -.0014455    1.187656  -1.498094   4.383774
    deviance |     100   -.2103819    1.212345  -2.016939   3.264961
. corr raw Pearson deviance
(obs=100)

             |      raw  Pearson deviance
-------------+---------------------------
         raw |   1.0000
     Pearson |   0.9852   1.0000
    deviance |   0.9625   0.9818   1.0000
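For the Poisson, the three residuals summarized above are raw y - mu, Pearson (y - mu)/sqrt(mu), and deviance sign(y - mu)*sqrt(2[y ln(y/mu) - (y - mu)]), with y ln y set to 0 when y = 0 as in the log. A per-observation Python sketch (function name hypothetical):

```python
import math


def poisson_residuals(y, mu):
    """Raw, Pearson and deviance residuals for one observation of a Poisson fit."""
    raw = y - mu
    pearson = raw / math.sqrt(mu)                  # variance = mean for the Poisson
    ylny = y * math.log(y) if y > 0 else 0.0       # 0*ln(0) = 0, as in the log
    dev_sq = 2.0 * ((-y + ylny) - (-mu + y * math.log(mu)))
    sign = (raw > 0) - (raw < 0)                   # like Stata's sign()
    deviance = sign * math.sqrt(max(dev_sq, 0.0))  # guard against tiny negative rounding
    return raw, pearson, deviance


for y, mu in [(0, 2.0), (3, 3.0), (4, 1.0)]:
    print(y, mu, [round(r, 4) for r in poisson_residuals(y, mu)])
```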

. * Example of use to find whether x3 belongs in the model
. * graph twoway scatter Pearson x3
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt
  log type:  text
 closed on:  17 May 2005, 14:10:13


-------------------------------------------------------------------------------

       log:  c:\Imbook\bwebpage\Section2\mma09p1np.txt
  log type:  text
 opened on:  17 May 2005, 14:16:51
.
. ********** OVERVIEW OF MMA09P1NP.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 9.2 p.295-297
. * Nonparametric density estimation and nonparametric regression using actual data.
.
. * (1) Histogram: Figure 9.1 in chapter 9.2.1 (ch9hist)
. * (2) Kernel density estimate as bandwidth varies: Figure 9.2 in chapter 9.2.1 (ch9kd1)
. * (3) Kernel density estimate as kernel varies: Figure 9.4 in chapter 9.3.4 (ch9kdensu1)
. * (4) Lowess regression: Figure 9.3 in chapter 9.4.3 (ch9ksm1)
. * (5) Extra: Nearest neighbours regression: using Lowess and using add-on knnreg
. * (6) Extra: Kernel regression: using add-on kernreg
.
. * using data on earnings and education (see below)
.
. * NOTE: This particular program uses version 8.2 rather than 8.0
. *   For kernel density Stata uses an alternative formulation of Epanechnikov
. *   To follow book and e.g. Hardle (1990) use epan2 rather than epan
. *   epan = epan2 if epan bandwidth is epan2 bandwidth divided by sqrt(5)
. *   where kernel epan2 is an update to Stata version 8.2
.
. * To run this program you need file
. *    psidf3050.dat
. * in your directory
.
. * To do (5) and (6) you need Stata add-ons knnreg and kernreg
. * In Stata give command search knnreg and search kernreg
.
. * See also mma9p2npmore.do for more on nonparametric regression (Figures 9.5-9.7)
.
. ********** SETUP
.
. di "mma09p1np.do Cameron and Trivedi: Stata nonparametrics with wages and education"
mma09p1np.do Cameron and Trivedi: Stata nonparametrics with wages and education
. set more off
. version 8
. set scheme s1mono /* Graphics scheme */

.
. ********** DATA DESCRIPTION
. *
. * The original data are from the PSID Individual Level Final Release 1993 data
. * From www.isr.umich.edu/src/psid then choose Data Center
. * 4856 observations on 9 variables for Females 30 to 50 years
.
. * Fixed width data
. * intnum    1-4    V30001="1968 INTERVIEW NUMBER"
. * persnum   5-7    V30002="PERSON NUMBER"
. * age       8-9    V30809="AGE OF INDIVIDUAL 93"
. * educatn   10-11  V30820="G90 HIGHEST GRADE COMPLETED 93"
. * earnings  12-17  V30821="TOTAL LABOR INCOME 93"
. * hours     18-21  V30823="1992 ANNUAL WORK HOURS 93"
. * sex       22     V32000="SEX OF INDIVIDUAL"
. * kids      23-24  V32022="# LIVE BIRTHS TO THIS INDIVIDUAL"
. *   [NOTE: DO NOT USE THE kids VARIABLE AS IT IS NUMBER OF BIRTHS
. *    NOT NUMBER OF KIDS CURRENTLY IN HOUSEHOLD]
. * married   25     V32049="LAST KNOWN MARITAL STATUS"
.
. ********** READ DATA **********
.
. * Data are fixed format so use infix
. infix intnum 1-4 persnum 5-7 age 8-9 educatn 10-11 earnings 12-17 /*
> */ hours 18-21 sex 22 kids 23-24 married 25 using psidf3050.dat
(4856 observations read)
. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      intnum |    4856    4598.101    2761.971          4       9306
     persnum |    4856    59.21355    79.74856          1        205
         age |    4856    38.46293    5.595116         30         50
     educatn |    4855    16.37714     18.4495          0         99
    earnings |    4856    14244.51    15985.45          0     240000
-------------+--------------------------------------------------------
       hours |    4856    1235.335    947.1758          0       5160
         sex |    4856           2           0          2          2
        kids |    4856     4.48126    14.88786          0         99
     married |    4856    1.920717    1.504848          1          9
.
. ********** MISSING VALUES, DATA TRANSFORMATIONS and SAMPLE SELECTION
.
. * For Highest grade codes the missing codes are 98 DK and 99 NA and 0 inappropriate
. * Here treat these as missing
. replace educatn = . if (educatn==0 | educatn==98 | educatn==99)
(290 real changes made, 290 to missing)
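The `infix` statement maps 1-based, inclusive column ranges to variables. The Python sketch below parses the same layout and applies the missing-value rules described in the comments (education codes 0, 98 and 99 and marital codes 8 and 9 set to missing, married recoded to a 1/0 dummy). One point worth checking in any Stata version of such a recode: Stata treats a missing value as larger than every number, so a condition such as `married > 1` is also true for observations that were just set to missing; the sketch keeps missing as `None` and recodes only nonmissing values. The record string in the usage example is made up for illustration, and the function names are hypothetical.

```python
# Layout from the infix statement: (name, first column, last column), 1-based and inclusive
LAYOUT = [("intnum", 1, 4), ("persnum", 5, 7), ("age", 8, 9),
          ("educatn", 10, 11), ("earnings", 12, 17), ("hours", 18, 21),
          ("sex", 22, 22), ("kids", 23, 24), ("married", 25, 25)]


def parse_record(line):
    """Slice one fixed-width record into integer fields (blank field -> None)."""
    rec = {}
    for name, first, last in LAYOUT:
        field = line[first - 1:last].strip()
        rec[name] = int(field) if field else None
    return rec


def recode(rec):
    """Missing-value codes and the married dummy, following the log's comments."""
    out = dict(rec)
    if out["educatn"] in (0, 98, 99):        # inappropriate, DK, NA
        out["educatn"] = None
    if out["married"] in (8, 9):             # NA/DK, no histories
        out["married"] = None
    elif out["married"] is not None:         # 1 married; 2-5 not married
        out["married"] = 1 if out["married"] == 1 else 0
    return out


# A made-up record, built field by field to match LAYOUT
sample = "".join(["0014", "001", "36", "12", "020000", "2000", "2", "02", "3"])
print(recode(parse_record(sample)))
```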


.
. * For marital status the codes are
. * 1 married; 2 Never married; 3 Widowed; 4 Divorced, annulment;
. * 5 Separated; 8 NA / DK; 9 No histories 85-93
. * Recode 2-5 as not married and treat 8 and 9 as missing
. replace married = . if (married==8 | married==9)
(52 real changes made, 52 to missing)
. replace married = 0 if married > 1
(1785 real changes made)
.
. * For kids the missing codes are 98 DK/NA and 99 no birth history
. replace kids = . if (kids==98 | kids==99)
(118 real changes made, 118 to missing)
. * But do not use these data as it is number of births
. * not number of kids currently in household
. * So I drop kids
. drop kids
.
. * Work with positive earnings only
. drop if earnings==0
(1204 observations deleted)
. * Topcode women with very high earnings
. replace earnings=100000 if earnings>100000
(11 real changes made)
. * Create log hourly wage
. gen hwage = earnings/hours
. gen lnhwage = ln(hwage)
.
. * Work with age 36 and nonmissing education data
. keep if age == 36
(3468 observations deleted)
. drop if educatn == .
(7 observations deleted)
. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      intnum |     177    4699.853    2765.081         14       9240
     persnum |     177    59.53672    79.73001          1        188
         age |     177          36           0         36         36
     educatn |     177    12.58757    2.841347          3         17

earnings | 177 17470.55 13513.56 87 70000 -------------+-------------------------------------------------------hours | 177 1506.401 698.4145 8 3160 sex | 177 2 0 2 2 married | 177 .7457627 .4366669 0 1 hwage | 177 12.71631 16.58889 .6837607 175 lnhwage | 177 2.198163 .8281614 -.3801473 5.164786 . . * Write data to a text (ascii) file so can use with programs other than Stata . outfile intnum persnum age educatn earnings hours sex married hwage /* > */ lnhwage using mma09p1np.asc, replace . . ********* ANALYSIS: (1)-(3) NONPARAMETRIC DENSITY ESTIMATES . . set scheme s1mono . . * Here give bin width for histogram and kdensity . . * Calculate Silberman's plugin estimate of optimal bandwidth in (9.13) . * with delta given in Table 9.1 for Epanechnikov kernel . quietly sum lnhwage, detail . global sadj = min(r(sd),(r(p75)-r(p25))/1.349) . di "sadj: " $sadj " iqr/1349: " (r(p75)-r(p25))/1.349 " stdev: " r(sd) sadj: .65488184 iqr/1349: .65488184 stdev: .82816143 . global bwepan2 = 1.3643*1.7188*$sadj/(r(N)^0.2) . di "Bandwidth: " $bwepan2 Bandwidth: .54538542 . . * HISTOGRAM ONLY - Figure 9.1 . graph twoway (histogram lnhwage, bin(20) bcolor(*.2)), /* > */ scale (1.2) plotregion(style(none)) /* > */ title("Histogram for Log Wage") /* > */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /* > */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /* > */ legend(pos(10) ring(0) col(1)) legend(size(small)) /* > */ legend( label(1 "Histogram") label(2 "Kernel")) . graph save ch9hist, replace (file ch9hist.gph saved) . graph export ch9hist.wmf, replace (file c:\Imbook\bwebpage\Section2\ch9hist.wmf written in Windows Metafile format)

200
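The log codes the plug-in rule (9.13) as h = 1.3643 × δ × min(s, iqr/1.349) × N^(-1/5), taking δ from Table 9.1. Below is a minimal Python sketch of the same arithmetic. The function names are hypothetical. The quartiles come from `statistics.quantiles`, whose interpolation rule may differ slightly from the percentiles reported by Stata's `summarize, detail`, so only `plugin_bandwidth`, which takes sadj directly, should be expected to reproduce the logged values. The uniform kernel is left out because the log applies an extra factor of 0.5 to it.

```python
import statistics

# delta constants used in the do-file (book Table 9.1); uniform omitted (see text)
DELTA = {"epanechnikov": 1.7188, "gaussian": 0.7764, "quartic": 2.0362}

def plugin_bandwidth(sadj, n, delta):
    """Plug-in bandwidth 1.3643 * delta * sadj / n^(1/5)."""
    return 1.3643 * delta * sadj / n ** 0.2

def silverman_bandwidth(x, kernel="epanechnikov"):
    """Compute sadj = min(sd, iqr/1.349) from the data, then the plug-in bandwidth."""
    n = len(x)
    sd = statistics.stdev(x)                   # sample sd (divisor n-1), like r(sd)
    q1, _, q3 = statistics.quantiles(x, n=4)   # quartiles; interpolation may differ from Stata
    sadj = min(sd, (q3 - q1) / 1.349)
    return plugin_bandwidth(sadj, n, DELTA[kernel])

# Recompute the bandwidths in the log from sadj = .65488184 and N = 177
for name, delta in DELTA.items():
    print(name, round(plugin_bandwidth(0.65488184, 177, delta), 6))
```

Separating `plugin_bandwidth` from the data step makes clear that only the scale estimate sadj depends on the sample. The kernel enters solely through δ.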

.
. * COMBINED HISTOGRAM AND KERNEL DENSITY ESTIMATE
. graph twoway (histogram lnhwage, bin(20) bcolor(*.2)) /*
> */ (kdensity lnhwage, width($bwepan2) epan2 clstyle(p1)), /*
> */ title("Histogram and Kernel Density for Log Wage") /*
> */ caption("Note: Kernel is Epanechnikov with bandwidth 0.55")
.
. * KERNEL DENSITY ESTIMATE FOR 3 BANDWIDTHS - Figure 9.2
. global bwonehalf = 0.5*$bwepan2
. global btwotimes = 2*$bwepan2
. graph twoway (kdensity lnhwage, width($bwonehalf) epan2 clstyle(p2)) /*
> */ (kdensity lnhwage, width($bwepan2) epan2 clstyle(p1)) /*
> */ (kdensity lnhwage, width($btwotimes) epan2 clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Density Estimates as Bandwidth Varies") /*
> */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Kernel density estimates", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "One-half plug-in") label(2 "Plug-in") /*
> */ label(3 "Two times plug-in"))
. graph save ch9kd1, replace
(file ch9kd1.gph saved)
. graph export ch9kd1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kd1.wmf written in Windows Metafile format)
.
. * KERNEL DENSITY ESTIMATE FOR 4 DIFFERENT KERNELS - Figure 9.4
. * Calculate Silverman's plug-in optimal bandwidths using (9.13)
. * with delta given in Table 9.1 for the different kernels
.
. * Use sadj calculated earlier for Epanechnikov
. global bwgauss = 1.3643*0.7764*$sadj/(_N^0.2)
. global bwbiweight = 1.3643*2.0362*$sadj/(_N^0.2)
. global bwrectang = 0.5*1.3643*1.3510*$sadj/(_N^0.2)
. di "Usual Epanechnikov (epan2): " $bwepan2
Usual Epanechnikov (epan2): .54538542
. di "Gaussian: " $bwgauss
Gaussian: .24635632
. di "Quartic or biweight: " $bwbiweight
Quartic or biweight: .64609832

. di "Uniform or rectangular: " $bwrectang
Uniform or rectangular: .21434015
. graph twoway (kdensity lnhwage, width($bwepan2) epan2) /*
> */ (kdensity lnhwage, width($bwgauss) gauss) /*
> */ (kdensity lnhwage, width($bwbiweight) biweight) /*
> */ (kdensity lnhwage, width($bwrectang) rectangle), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Density Estimates as Kernel Varies") /*
> */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Kernel density estimates", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Epanechnikov (h=0.545)") label(2 "Gaussian (h=0.246)") /*
> */ label(3 "Quartic (h=0.646)") label(4 "Uniform (h=0.214)"))
. graph save ch9kdensu1, replace
(file ch9kdensu1.gph saved)
. graph export ch9kdensu1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kdensu1.wmf written in Windows Metafile format)
.
. * SHOW THAT STATA EPANECHNIKOV = USUAL EPANECHNIKOV
. * once the usual Epanechnikov bandwidth is divided by sqrt(5).
. * (Pagan and Ullah (1999, p.28) have formulae.)
. global bwepan = $bwepan2/sqrt(5)
. graph twoway (kdensity lnhwage, width($bwepan2) epan2) /*
> */ (kdensity lnhwage, width($bwepan) epan), /*
> */ title("Epan = Epan2 if bandwidth adjusted") /*
> */ legend( label(1 "Usual Epanechnikov") label(2 "Stata Epanechnikov"))
.
.
. ********* ANALYSIS: (4) LOWESS NONPARAMETRIC REGRESSION ESTIMATES
.
. * LOWESS WITH DEFAULT BANDWIDTH of 0.8
. lowess lnhwage educatn
.
. * LOWESS REGRESSION WITH BANDWIDTHS of 0.1, 0.4 and 0.8 - Figure 9.3
. graph twoway (scatter lnhwage educatn, msize(medsmall) msymbol(o)) /*
> */ (lowess lnhwage educatn, bwidth(0.8) clstyle(p2)) /*
> */ (lowess lnhwage educatn, bwidth(0.4) clstyle(p1)) /*
> */ (lowess lnhwage educatn, bwidth(0.1) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Nonparametric Regression as Bandwidth Varies") /*
> */ xtitle("Years of Schooling", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log Hourly Wage", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(2)) legend(size(small)) /*
> */ legend( label(1 "Actual data") label(2 "Bandwidth h=0.8") /*
> */ label(3 "Bandwidth h=0.4") label(4 "Bandwidth h=0.1"))

. graph save ch9ksm1, replace
(file ch9ksm1.gph saved)
. graph export ch9ksm1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksm1.wmf written in Windows Metafile format)
.
. ********* ANALYSIS: (5) EXTRA: K-NEAREST NEIGHBORS NONPARAMETRIC REGRESSION
.
. * NEAREST NEIGHBOURS REGRESSION USING LOWESS
. * Use lowess with mean and noweight options to give running means = centered kNN
. global knnbwidth = 0.3
. di "knn via Lowess uses following % of sample: " $knnbwidth
knn via Lowess uses following % of sample: .3
. lowess lnhwage educatn, bwidth($knnbwidth) mean noweight
.
. * LOWESS COMPARED TO NEAREST NEIGHBOURS
. graph twoway (lowess lnhwage educatn, bwidth(0.3) mean noweight) /*
> */ (lowess lnhwage educatn, bwidth(0.3)), /*
> */ title("Centered kNN versus Lowess") /*
> */ legend( label(1 "Centered kNN") label(2 "Lowess 0.3"))
.
. * NEAREST NEIGHBOURS REGRESSION USING KNNREG COMPARED TO USING LOWESS
. * knnreg is a Stata add-on (in Stata search knnreg to find and download)
. * Here we verify that same as lowess knn except knnreg drops endpoints
. global k = round($knnbwidth*_N)
. di "knnreg uses following number of neighbours: " $k
knnreg uses following number of neighbours: 53
. knnreg lnhwage educatn, k($k) gen(knnregpred) ylabel nograph
. lowess lnhwage educatn, bwidth($knnbwidth) gen(knnlowesspred) mean noweight nograph
. * Following shows that the same except knnreg drops endpoints and lowess does not
. sum knnlowesspred knnregpred

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
knnlowessp~d |       177    2.180308    .4522163   1.475512   2.954416
  knnregpred |       125    2.184309    .3412013   1.529874   2.802865

. corr knnlowesspred knnregpred
(obs=125)

             | knnlow~d knnreg~d
-------------+------------------
knnlowessp~d |   1.0000
  knnregpred |   1.0000   1.0000
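The `kernreg` step that follows forms, at each grid point, a kernel-weighted average of the dependent variable. Here is a Python sketch of that Nadaraya-Watson estimator. It assumes, from the log's comment on `bwidth(3)`, that the bandwidth is the half-width of the kernel window. It is the textbook estimator, not a transcription of the `kernreg` add-on's code, and the names `kernel_regression`, `epanechnikov` and `uniform` are hypothetical.

```python
def epanechnikov(u):
    """Usual (epan2) Epanechnikov kernel, positive on (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def uniform(u):
    """Uniform kernel; the closed interval keeps points exactly h away."""
    return 0.5 if abs(u) <= 1.0 else 0.0

def kernel_regression(x, y, grid, h, kernel=epanechnikov):
    """Nadaraya-Watson estimate at each grid point x0:
    m(x0) = sum_i K((x_i - x0)/h) y_i / sum_i K((x_i - x0)/h)."""
    fitted = []
    for x0 in grid:
        weights = [kernel((xi - x0) / h) for xi in x]
        total = sum(weights)
        if total == 0.0:
            fitted.append(float("nan"))   # no observations inside the window
        else:
            fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return fitted

# With equally spaced x and a uniform kernel, the fit at x0 is the plain mean of
# the 2h+1 points in [x0-h, x0+h], which is the kNN equivalence the log relies on.
x = list(range(1, 101))
y = [xi ** 2 for xi in x]
print(kernel_regression(x, y, [27], h=12, kernel=uniform))
```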

.
. ********* ANALYSIS: (6) EXTRA: KERNEL NONPARAMETRIC REGRESSION
.
. * KERNEL REGRESSION
. * Kercode 1 = Uniform; 2 = Triangle; 3 = Epanechnikov; 4 = Quartic (Biweight);
. *         5 = Triweight; 6 = Gaussian; 7 = Cosinus
. * bwidth(#) defines width of the weight function window around each grid point.
. * npoint(#) specifies the number of equally spaced grid points over range of x.
. * Here bwidth(3) gives e.g. positive weight from x=4 to x=10 if current x0=7
. kernreg lnhwage educatn, bwidth(3) kercode(3) npoint(100) ylabel gen(kernregpred1 xkernreg)
. graph twoway (lowess lnhwage educatn, bwidth(0.5) clstyle(p2)) /*
> */ (line kernregpred xkernreg, clstyle(p1)), /*
> */ title("Lowess versus kernel regression") /*
> */ legend( label(1 "Lowess") label(2 "Kernreg"))
.
. ********** CLOSE OUTPUT
. log close
      log:  c:\Imbook\bwebpage\Section2\mma09p1np.txt
 log type:  text
closed on:  17 May 2005, 14:17:05
------------------------------------------------------------------------------
      log:  c:\Imbook\bwebpage\Section2\mma09p2npmore.txt
 log type:  text
opened on:  17 May 2005, 14:17:35
.
. ********** OVERVIEW OF MMA09P2NPMORE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 9.4-9.5 (pages 307-19)
. * More on nonparametric regression, including Figures 9.5 - 9.7
.
. * It provides
. * (1) Nonparametric regression
. *     k-nearest neighbors regression: Figure 9.5 in chapter 9.4.2 (ch9ksmma)
. *     Lowess regression: Figure 9.6 in chapter 9.4.3 (ch9ksmlowess)
. *     Kernel regression (using Stata add-on kernreg)
. * (2) Nonparametric derivative estimation
. *     Figure 9.7 in chapter 9.5.5 (ch9kderiv)
. * (3) Cross-validation - still incomplete
. * using generated data (see below)
.
. * See also mma09p1np.do for nonparametric density estimation and regression
.
. * This program uses free Stata add-on command kernreg
. * To obtain in Stata give command search kernreg
.
. ********** SETUP **********
.
. di "mma09p2npmore.do Cameron and Trivedi: Stata nonparametrics with generated data"
mma09p2npmore.do Cameron and Trivedi: Stata nonparametrics with generated data
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
.
. ********** GENERATE DATA **********
.
. * Model is y = 150 + 6.5*x - 0.15*x^2 + 0.001*x^3 + u
. * where u ~ N[0, 25^2]
. *       x = 1, 2, 3, ... , 100
. *       e ~ N[0, 2^2]
.
. set seed 10101
. set obs 100
obs was 0, now 100
. gen u = 25*invnorm(uniform())
. gen x = _n
. gen y = 150 + 6.5*x - 0.15*x^2 + 0.001*x^3 + u
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |       100    2.809606    25.26291  -71.97334   73.59318
           x |       100        50.5    29.01149          1        100
           y |       100    228.5596    35.25377   132.2952   345.5873

.

. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x using mma09p2npmore.asc, replace
.
. ******** PARAMETRIC REGRESSION **********
.
. * OLS regression on cubic polynomial
. gen xsquared = x^2
. gen xcubed = x^3
. reg y x xsquared xcubed

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) =   31.15
       Model |  60691.6801     3    20230.56           Prob > F      =  0.0000
    Residual |  62348.2994    96  649.461452           R-squared     =  0.4933
-------------+------------------------------           Adj R-squared =  0.4774
       Total |   123039.98    99  1242.82808           Root MSE      =  25.485

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   6.055295   .9033915     6.70   0.000     4.262077    7.848513
    xsquared |  -.1402283   .0207284    -6.77   0.000    -.1813738   -.0990828
      xcubed |   .0009492   .0001349     7.03   0.000     .0006814    .0012171
       _cons |   155.1521   10.58835    14.65   0.000     134.1344    176.1698
------------------------------------------------------------------------------

. predict ycubic
(option xb assumed; fitted values)
. summarize y ycubic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       100    228.5596    35.25377   132.2952   345.5873
      ycubic |       100    228.5596    24.75979   161.0681   307.6293

.
. ******** (1) NONPARAMETRIC REGRESSION **********
.
. * K-NEAREST NEIGHBORS REGRESSION - FIGURE 9.5
. * ksm without options gives running mean = moving average = centered kNN
. * Here _N = 100 so bwidth = 0.05 gives 100*0.05 = 5 nearest neighbours
. graph twoway (scatter y x, msize(medsmall) msymbol(o)) /*
> */ (lowess y x, mean noweight bwidth(0.05) clstyle(p1)) /*
> */ (lfit y x, clstyle(p3)) /*
> */ (lowess y x, mean noweight bwidth(0.25) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("k-Nearest Neighbours Regression as k Varies") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data") label(2 "kNN (k=5)") /*
> */ label(3 "Linear OLS") label(4 "kNN (k=25)"))
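`lowess` with the `mean` and `noweight` options gives a centered running mean. The hand-built `yma5` check in this log averages 5 neighbours and shrinks the window at the sample ends: observation 1 averages 3 points and observation 2 averages 4. Below is a Python sketch of that smoother for y already sorted on x. It also includes a variant that leaves incomplete windows missing, which matches how the log describes `knnreg` treating the endpoints. Both function names are hypothetical.

```python
def centered_knn(y, k):
    """Centered k-nearest-neighbour running mean (k odd), for y sorted by x.

    Interior points average k observations centred on i; near the ends the
    window is truncated rather than shifted, as in the hand-built yma5.
    """
    if k % 2 == 0:
        raise ValueError("k should be odd for a symmetric window")
    half = (k - 1) // 2
    n = len(y)
    fitted = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        window = y[lo:hi]
        fitted.append(sum(window) / len(window))
    return fitted

def centered_knn_drop_ends(y, k):
    """Same smoother, but None wherever a full k-point window is unavailable."""
    half = (k - 1) // 2
    full = centered_knn(y, k)
    return [v if half <= i < len(y) - half else None for i, v in enumerate(full)]

print(centered_knn([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 5))
```

With N = 177 and k = 53, dropping the incomplete windows leaves 177 - 52 = 125 fitted values. That is the observation count the log reports for `knnregpred`.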

. graph save ch9ksmma, replace
(file ch9ksmma.gph saved)
. graph export ch9ksmma.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksmma.wmf written in Windows Metafile format)
.
. * VERIFY THAT kNN SAME AS MOVING AVERAGE
. * Do moving average by hand for k = 5
. gen yma5 = (y[_n-2] + y[_n-1] + y + y[_n+1] + y[_n+2])/5
(4 missing values generated)
. replace yma5 = (y[_n]+y[_n+1]+y[_n+2])/3 if _n==1
(1 real change made)
. replace yma5 = (y[_n-1]+y[_n]+y[_n+1]+y[_n+2])/4 if _n==2
(1 real change made)
. replace yma5 = (y[_n+1]+y[_n]+y[_n-1]+y[_n-2])/4 if _n==99
(1 real change made)
. replace yma5 = (y[_n]+y[_n-1]+y[_n-2])/3 if _n==100
(1 real change made)
. lowess y x, mean noweight bwidth(0.05) nogr gen(yknn5)
. sum yma5 yknn5

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        yma5 |       100    228.6037    26.63323   157.1434   297.4832
       yknn5 |       100    228.6037    26.63323   157.1434   297.4832

.
. * LOWESS REGRESSION - FIGURE 9.6
. graph twoway (scatter y x, msize(medsmall) msymbol(o)) /*
> */ (lowess y x, bwidth(0.25) clstyle(p1)) /*
> */ (line ycubic x, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Lowess Nonparametric Regression") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data") label(2 "Lowess (k=25)") /*
> */ label(3 "OLS Cubic Regression") )
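Unlike the running mean, lowess fits a weighted straight line in each neighbourhood, using tricube weights that fall to zero at the edge of the window. Below is a generic Python sketch of Cleveland's local linear smoother without robustness iterations. It is an assumption that this matches Stata's `lowess` in every detail; in particular, the handling of neighbourhoods at the sample ends and of tied x values may differ. The function name `lowess_fit` is hypothetical.

```python
def lowess_fit(x, y, frac):
    """Local linear fit with tricube weights over the nearest frac*N points.

    Sketch of lowess without robustness iterations; returns the fitted
    value at each x_i. x need not be sorted.
    """
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        radius = sorted(dists)[k - 1]          # distance to the k-th nearest point
        if radius == 0.0:                       # all k neighbours tied at x0
            weights = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            weights = [(1 - (d / radius) ** 3) ** 3 if d < radius else 0.0 for d in dists]
        # weighted least squares of y on (1, x - x0); the intercept is the fit at x0
        sw = sum(weights)
        swx = sum(w * (xi - x0) for w, xi in zip(weights, x))
        swy = sum(w * yi for w, yi in zip(weights, y))
        swxx = sum(w * (xi - x0) ** 2 for w, xi in zip(weights, x))
        swxy = sum(w * (xi - x0) * yi for w, xi, yi in zip(weights, x, y))
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12 * max(1.0, sw * swxx):
            fitted.append(swy / sw)             # degenerate window: weighted mean
        else:
            fitted.append((swxx * swy - swx * swxy) / denom)
    return fitted

print(lowess_fit(list(range(1, 21)), [3 * v + 2 for v in range(1, 21)], 0.3)[:3])
```

Because each local fit is a line, this smoother reproduces an exactly linear relationship, including at the ends. A running mean does not, since it flattens toward the interior at the endpoints.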

. graph save ch9ksmlowess, replace
(file ch9ksmlowess.gph saved)
. graph export ch9ksmlowess.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksmlowess.wmf written in Windows Metafile format)
.
. * KERNEL REGRESSION COMPARED TO k NEAREST NEIGHBORS REGRESSION
. * For this artificial example (with equally spaced x)
. * knn = kernel regression using uniform kernel
. * Kercode 1 = Uniform; 2 = Triangle; 3 = Epanechnikov; 4 = Quartic (Biweight);
. *         5 = Triweight; 6 = Gaussian; 7 = Cosinus
. * bwidth(#) defines width of the weight function window around each grid point.
. * npoint(#) specifies the number of equally spaced grid points over range of x.
. * Here bwidth(12) gives e.g. positive weight from x=15 to x=39 if current x=27
. kernreg y x, bwidth(12) kercode(1) npoint(100) ylabel gen(pykernreg xkernreg)
. lowess y x, mean noweight bwidth(0.25) gen(yknn25)
. sum pykernreg yknn25

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   pykernreg |       100    228.6856    18.75275   181.1579   272.5488
      yknn25 |       100    228.6856    18.75275   181.1578   272.5488

.
. ******** (2) DERIVATIVE ESTIMATION **********
.
. * DERIVATIVE ESTIMATION - FIGURE 9.7
. * Here use Lowess regression
. lowess y x, xlab ylab bwidth(0.25) lowess nogr gen(yplowess)
. * Need to first sort data on regressor if data on regressor are not ordered
. sort x
. gen dydxlowess = (yplowess - yplowess[_n-1])/(x - x[_n-1])
(1 missing value generated)
. * And do the same for the earlier fitted cubic
. gen dydxcubic = (ycubic - ycubic[_n-1])/(x - x[_n-1])
(1 missing value generated)
. graph twoway (line dydxlowess x, clstyle(p1)) /*
> */ (line dydxcubic x, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Nonparametric Derivative Estimation") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "From Lowess (k=25)") /*
> */ label(2 "From OLS Cubic Regression") )
. graph save ch9kderiv, replace
(file ch9kderiv.gph saved)
. graph export ch9kderiv.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kderiv.wmf written in Windows Metafile format)
.
. ******** (3) CROSS-VALIDATION [PRELIMINARY] **********
.
. /* The following does not work.
> I need to figure out use of macros */
.
. forvalues i = 5/25 {
  2. scalar bd`i' = 0.01*`i'
  3. global bw`i' = bd`i'
  4. lowess y x, mean noweight bwidth($bw`i') gen(py`i') nogr
  5. gen cv`i' = sum(3/2*(y-py`i')^2)
  6. }
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |       100    2.809606    25.26291  -71.97334   73.59318
           x |       100        50.5    29.01149          1        100
           y |       100    228.5596    35.25377   132.2952   345.5873
    xsquared |       100      3383.5    3024.356          1      10000
      xcubed |       100      255025    289320.7          1    1000000
-------------+--------------------------------------------------------
      ycubic |       100    228.5596    24.75979   161.0681   307.6293
        yma5 |       100    228.6037    26.63323   157.1434   297.4832
       yknn5 |       100    228.6037    26.63323   157.1434   297.4832
   pykernreg |       100    228.6856    18.75275   181.1579   272.5488
    xkernreg |       100        50.5    29.01149          1        100
-------------+--------------------------------------------------------
      yknn25 |       100    228.6856    18.75275   181.1578   272.5488
    yplowess |       100    228.6494    25.46305   156.8217   302.5474
  dydxlowess |        99    1.471977     2.20262  -1.953159   6.964434
   dydxcubic |        99    1.480416    2.100452  -.8495026   6.342957
         py5 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
         cv5 |       100    84655.13     34359.8   10940.13   162417.9
         py6 |       100    228.0408    8.046055   217.6967   243.0812
         cv6 |       100    84655.13     34359.8   10940.13   162417.9
         py7 |       100    228.0408    8.046055   217.6967   243.0812
         cv7 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
         py8 |       100    228.0408    8.046055   217.6967   243.0812
         cv8 |       100    84655.13     34359.8   10940.13   162417.9
         py9 |       100    228.0408    8.046055   217.6967   243.0812
         cv9 |       100    84655.13     34359.8   10940.13   162417.9
        py10 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv10 |       100    84655.13     34359.8   10940.13   162417.9
        py11 |       100    228.0408    8.046055   217.6967   243.0812
        cv11 |       100    84655.13     34359.8   10940.13   162417.9
        py12 |       100    228.0408    8.046055   217.6967   243.0812
        cv12 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py13 |       100    228.0408    8.046055   217.6967   243.0812
        cv13 |       100    84655.13     34359.8   10940.13   162417.9
        py14 |       100    228.0408    8.046055   217.6967   243.0812
        cv14 |       100    84655.13     34359.8   10940.13   162417.9
        py15 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv15 |       100    84655.13     34359.8   10940.13   162417.9
        py16 |       100    228.0408    8.046055   217.6967   243.0812
        cv16 |       100    84655.13     34359.8   10940.13   162417.9
        py17 |       100    228.0408    8.046055   217.6967   243.0812
        cv17 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py18 |       100    228.0408    8.046055   217.6967   243.0812
        cv18 |       100    84655.13     34359.8   10940.13   162417.9
        py19 |       100    228.0408    8.046055   217.6967   243.0812
        cv19 |       100    84655.13     34359.8   10940.13   162417.9
        py20 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv20 |       100    84655.13     34359.8   10940.13   162417.9
        py21 |       100    228.0408    8.046055   217.6967   243.0812
        cv21 |       100    84655.13     34359.8   10940.13   162417.9
        py22 |       100    228.0408    8.046055   217.6967   243.0812
        cv22 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py23 |       100    228.0408    8.046055   217.6967   243.0812
        cv23 |       100    84655.13     34359.8   10940.13   162417.9
        py24 |       100    228.0408    8.046055   217.6967   243.0812
        cv24 |       100    84655.13     34359.8   10940.13   162417.9
        py25 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv25 |       100    84655.13     34359.8   10940.13   162417.9

. * Then need to choose the `i' with minimum cv`i'
. * Problem here is that this gives e.g. $bw5 = 5 not 0.05
.
. ********** CLOSE OUTPUT
. log close
      log:  c:\Imbook\bwebpage\Section2\mma09p2npmore.txt
 log type:  text
closed on:  17 May 2005, 14:17:43
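The cross-validation loop above is flagged as not working. Its intended result is the bandwidth that minimizes a cross-validation criterion. Below is a Python sketch of leave-one-out cross-validation for the centered kNN running mean. It assumes the criterion is the mean squared leave-one-out residual, which is a standard choice but is not confirmed by the do-file. The names `loo_cv_knn` and `choose_k` are hypothetical. Each y_i is removed from its own window because in-sample residuals, like those the loop's `py` variables produce, tend to favour the smallest k: every observation helps fit itself.

```python
def loo_cv_knn(y, k):
    """Leave-one-out CV for the centered kNN running mean (y sorted by x,
    k odd, k >= 3): mean of (y_i - m_{-i}(x_i))^2, where m_{-i} averages the
    (possibly truncated) window around i with y_i itself removed."""
    half = (k - 1) // 2
    n = len(y)
    total = 0.0
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        window = y[lo:hi]
        m_minus_i = (sum(window) - y[i]) / (len(window) - 1)
        total += (y[i] - m_minus_i) ** 2
    return total / n

def choose_k(y, candidates):
    """Candidate k with the smallest leave-one-out CV value."""
    return min(candidates, key=lambda k: loo_cv_knn(y, k))

# With N = 100 the loop above targets bwidth 0.05 to 0.25, roughly k = 5 to 25;
# odd values keep the window symmetric.
candidates = list(range(5, 26, 2))
```

Applied to the generated y (sorted on x), `choose_k(y, candidates)` would return the k to plug back in as `bwidth(k/_N)`.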

------------------------------------------------------------------------------
      log:  c:\Imbook\bwebpage\Section2\mma09p3kernels.txt
 log type:  text
opened on:  18 May 2005, 21:31:55
.
. ********** OVERVIEW OF MMA09P3KERNELS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * This program plots different kernel functions
. * This is not included in the book
. * There is no data
.
. * Results:
. * Epanstata is similar to Gaussian kernel. Less peaked than Epanechnikov.
. * Triangular, Quartic, Triweight and Tricubic are similar,
. * and are more peaked than Epanechnikov
. * The fourth order kernels can take negative values.
.
. * NOTE: For kernel density Stata uses an alternative formulation of Epanechnikov
. *       To follow book and e.g. Hardle (1990) use epan2
. *       (available in Stata version 8.2) rather than epan
.
. ********** SETUP **********
.
. di "mma09p3kernels.do Cameron and Trivedi: Stata Kernel Functions"
mma09p3kernels.do Cameron and Trivedi: Stata Kernel Functions
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
.
. ********** GENERATE DATA **********
.
. * Graphs will be for z = -2.5 to 2.5 in increments of 0.02
. set obs 251
obs was 0, now 251
. gen z = -2.52 + 0.02*_n
.
. ********** CALCULATE THE KERNELS **********
.
. * Indicator for |z| < 1
. gen abszltone = 1
. replace abszltone = 0 if abs(z)>=1
(152 real changes made)
. gen kuniform = 0.5*abszltone
. gen ktriangular = (1 - abs(z))*abszltone
. * Stata calls the usual Epanechnikov kernel epan2
. gen kepanechnikov = (3/4)*(1 - z^2)*abszltone
. * Stata uses alternative epanechnikov
. gen abszltsqrtfive = 1
. replace abszltsqrtfive = 0 if abs(z)>=sqrt(5)
(28 real changes made)
. gen kepanstata = (3/4)*(1 - (z^2)/5)/sqrt(5)*abszltsqrtfive
. gen kquartic = (15/16)*((1 - z^2)^2)*abszltone
. gen ktriweight = (35/32)*((1 - z^2)^3)*abszltone
. gen ktricubic = (70/81)*((1 - (abs(z))^3)^3)*abszltone
. gen kgaussian = normden(z)
. gen k4thordergauss = (1/2)*(3-(z^2))*normden(z)
. * This is the optimal 4th order - Pagan and Ullah p.57
. gen k4thorderquartic = (15/32)*(3 - 10*z^2 + 7*z^4)*abszltone
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           z |       251           0    1.452033       -2.5        2.5
   abszltone |       251    .3944223    .4897027          0          1
    kuniform |       251    .1972112    .2448514          0         .5
 ktriangular |       251    .1992032    .3058094          0          1
kepanechni~v |       251    .1991833    .2831384          0        .75
-------------+--------------------------------------------------------
abszltsqrt~e |       251    .8884462    .3154457          0          1
  kepanstata |       251     .199203    .1175801          0   .3354102
    kquartic |       251    .1992032    .3209618          0      .9375
  ktriweight |       251    .1992032     .351183          0    1.09375
   ktricubic |       251    .1992032    .3191548          0   .8641976
-------------+--------------------------------------------------------
   kgaussian |       251    .1967985    .1323354   .0175283   .3989423
k4thorderg~s |       251    .2053453    .2297148  -.0327459   .5984134
k4thorderq~c |       251     .199253    .4584096  -.2676096    1.40625

.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile z abszltone kuniform ktriangular kepanechnikov abszltsqrtfive /*
> */ kepanstata kquartic ktriweight ktricubic kgaussian /*
> */ k4thordergauss k4thorderquartic using mma09p3kernels.asc, replace
.
. ********** PLOT THE KERNEL FUNCTIONS **********
.
. * Epanstata is similar to Gaussian kernel. Less peaked than Epanechnikov
. graph twoway (line kuniform z) (line kepanechnikov z) (line kepanstata z) /*
> */ (line kgaussian z), title("Four standard kernel functions")
.
. * Triangular, Quartic, Triweight and Tricubic are similar
. * and are more peaked than Epanechnikov
. graph twoway (line ktriangular z) (line kquartic z) (line ktriweight z) /*
> */ (line ktricubic z), title("Four similar kernel functions")
.
. graph twoway (line k4thordergauss z) (line k4thorderquartic z), /*
> */ title("Two fourth order kernel functions")
.
. ********** CLOSE OUTPUT **********
. log close
      log:  c:\Imbook\bwebpage\Section2\mma09p3kernels.txt
 log type:  text
closed on:  18 May 2005, 21:32:00
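The kernel formulas in this log can be checked numerically. Below is a Python sketch that copies each `gen` statement into a function and integrates it with a midpoint rule. It confirms that every kernel has area 1 and that the two fourth-order kernels have a zero second moment, which explains why they must go negative somewhere. It also checks the epan versus epan2 point made earlier in the log: Stata's `epan` kernel with bandwidth h/sqrt(5) gives the same weights as the usual Epanechnikov kernel with bandwidth h. The dictionary keys and helper names are hypothetical.

```python
import math

def _inside(z, c=1.0):
    """Indicator for |z| < c."""
    return 1.0 if abs(z) < c else 0.0

def phi(z):
    """Standard normal density (Stata normden)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Same formulas as the gen statements in the log
KERNELS = {
    "uniform": lambda z: 0.5 * _inside(z),
    "triangular": lambda z: (1 - abs(z)) * _inside(z),
    "epanechnikov": lambda z: 0.75 * (1 - z * z) * _inside(z),
    "epan_stata": lambda z: 0.75 * (1 - z * z / 5) / math.sqrt(5) * _inside(z, math.sqrt(5)),
    "quartic": lambda z: (15 / 16) * (1 - z * z) ** 2 * _inside(z),
    "triweight": lambda z: (35 / 32) * (1 - z * z) ** 3 * _inside(z),
    "tricubic": lambda z: (70 / 81) * (1 - abs(z) ** 3) ** 3 * _inside(z),
    "gaussian": phi,
    "gauss4": lambda z: 0.5 * (3 - z * z) * phi(z),
    "quartic4": lambda z: (15 / 32) * (3 - 10 * z * z + 7 * z ** 4) * _inside(z),
}

def integrate(f, lo=-5.0, hi=5.0, steps=20000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    dz = (hi - lo) / steps
    return dz * sum(f(lo + (j + 0.5) * dz) for j in range(steps))

def rescale(kernel, h):
    """Kernel with bandwidth h, u -> K(u/h)/h, as used in a density estimate."""
    return lambda u: kernel(u / h) / h

for name, k in KERNELS.items():
    area = integrate(k)
    second = integrate(lambda z: z * z * k(z))
    print(f"{name:13s} area={area:.4f} second moment={second:.4f}")
```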

------------------------------------------------------------------------------
      log:  c:\Imbook\bwebpage\Section2\mma10p1gradient.txt
 log type:  text
opened on:  17 May 2005, 14:21:11
.
. ********** OVERVIEW OF MMA10P1GRADIENT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 10.2.4 pages 338-9
. * Gradient Method Example (Newton-Raphson)
. * using artificial data
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
.
. ********** ANALYSIS: FIRST SIX ROUNDS OF NR **********
.
. * General algorithm is
. * b_s+1 = b_s + A_s*g_s
.
. * For the example in section 10.2.4
. * Q(b) = -(1/2N) * Sum_i {(y_i-exp(b))^2}
. *      = -(1/2N) * Sum_i {(y_i)^2 - 2*y_i*exp(b) + exp(b)^2}
. *      = ymean*exp(b) - 0.5*(exp(b))^2 - (1/2N) * Sum_i {(y_i)^2}
.
. * so the gradient vector (here a scalar)
. *   g = dQ_s / db
. *     = (ymean - exp(b))*exp(b)
.
. * and using the method of scoring variation of Newton-Raphson
. * the weighting matrix (here a scalar)
. * A_s = Inv [ - E[d^2 Q_s / db^2 ] ]
. *     = Inv [ - E[(ymean - exp(b))*exp(b) - exp(b)*exp(b)] ]
. *     = Inv [ exp(2b) ]   since E[ymean - exp(b)] = 0
. *     = exp(-2b)
.
. * Data
. scalar ymean = 2.0
.
. * Starting value
. scalar b_1 = 0.0
.
. * First round
. scalar g_1 = (ymean - exp(b_1))*exp(b_1)
. scalar A_1 = exp(-2*b_1)
. scalar b_2 = b_1 + A_1*g_1
.
. * Second round
. scalar g_2 = (ymean - exp(b_2))*exp(b_2)
. scalar A_2 = exp(-2*b_2)
. scalar b_3 = b_2 + A_2*g_2
.
. * Third round
. scalar g_3 = (ymean - exp(b_3))*exp(b_3)
. scalar A_3 = exp(-2*b_3)
. scalar b_4 = b_3 + A_3*g_3
.
. * Fourth round
. scalar g_4 = (ymean - exp(b_4))*exp(b_4)
. scalar A_4 = exp(-2*b_4)
. scalar b_5 = b_4 + A_4*g_4
.
. * Fifth round
. scalar g_5 = (ymean - exp(b_5))*exp(b_5)
. scalar A_5 = exp(-2*b_5)
. scalar b_6 = b_5 + A_5*g_5
.
. * Sixth round
. scalar g_6 = (ymean - exp(b_6))*exp(b_6)
. scalar A_6 = exp(-2*b_6)
.
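Because Q(b) has a single parameter, the six hand-coded rounds above collapse into a loop. Below is a Python sketch of that loop. It uses the gradient and scoring weight derived in the comments and, as the log does, drops the term of Q that does not depend on b. Since A_s*g_s = ymean*exp(-b_s) - 1, the iterates settle at b = ln(ymean) = ln 2 ≈ 0.693147. The function name `scoring_iterations` is hypothetical.

```python
import math

def scoring_iterations(ymean, b_start=0.0, rounds=6):
    """Method-of-scoring iterations b_{s+1} = b_s + A_s*g_s for
    Q(b) = ymean*exp(b) - 0.5*exp(b)^2 (term free of b dropped).
    Returns (round, b_s, g_s, A_s, Q_s) for each round."""
    rows = []
    b = b_start
    for s in range(1, rounds + 1):
        g = (ymean - math.exp(b)) * math.exp(b)   # gradient dQ/db
        a = math.exp(-2.0 * b)                    # inverse of -E[d2Q/db2] = exp(2b)
        q = ymean * math.exp(b) - 0.5 * math.exp(b) ** 2
        rows.append((s, b, g, a, q))
        b = b + a * g
    return rows

for s, b, g, a, q in scoring_iterations(2.0):
    print(f"{s}: b={b:.8f} g={g:.3e} A={a:.6f} Q={q:.7f}")
```

The printed rows can be compared with the Table 10.1 values that this log displays next.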

. * We also calculate the objective function at each round
. * (ignoring the term -(1/2N) * Sum_i {(y_i)^2} which does not depend on b)
. scalar Q_1 = ymean*exp(b_1) - 0.5*(exp(b_1))^2
. scalar Q_2 = ymean*exp(b_2) - 0.5*(exp(b_2))^2
. scalar Q_3 = ymean*exp(b_3) - 0.5*(exp(b_3))^2
. scalar Q_4 = ymean*exp(b_4) - 0.5*(exp(b_4))^2
. scalar Q_5 = ymean*exp(b_5) - 0.5*(exp(b_5))^2
. scalar Q_6 = ymean*exp(b_6) - 0.5*(exp(b_6))^2
.
. * DISPLAY THE RESULTS GIVEN IN TABLE 10.1 page 339
. di "Round Estimate Gradient Weight Function"
Round Estimate Gradient Weight Function
. di " 1: " b_1 %8.6f " " g_1 %8.6f " " A_1 %8.6f " " Q_1 %8.6f
 1: 0 1 1 1.5
. di " 2: " b_2 %8.6f " " g_2 %8.6f " " A_2 %8.6f " " Q_2 %8.6f
 2: 1 -1.9524924 .13533528 1.7420356
. di " 3: " b_3 %8.6f " " g_3 %8.6f " " A_3 %8.6f " " Q_3 %8.6f
 3: .73575888 -.18171081 .22957678 1.9962098
. di " 4: " b_4 %8.6f " " g_4 %8.6f " " A_4 %8.6f " " Q_4 %8.6f
 4: .6940423 -.00358529 .24955284 1.9999984
. di " 5: " b_5 %8.6f " " g_5 %8.6f " " A_5 %8.6f " " Q_5 %8.6f
 5: .69314758 -1.602e-06 .2499998 2
. di " 6: " b_6 %8.6f " " g_6 %8.6f " " A_6 %8.6f " " Q_6 %-8.6f
 6: .69314718 -3.206e-13 .25 2
.
. ********** CLOSE OUTPUT **********
. log close
      log:  c:\Imbook\bwebpage\Section2\mma10p1gradient.txt
 log type:  text
closed on:  17 May 2005, 14:21:11
------------------------------------------------------------------------------
      log:  c:\Imbook\bwebpage\Section3\mma11p1boot.txt
 log type:  text
opened on:  18 May 2005, 15:52:55
.
. ********** OVERVIEW OF MMA11P1BOOT.DO **********

.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 11.3 pages 366-368
. * Bootstrap applied to exponential regression model
. * Provides
. * (1) Bootstrap distribution of beta and t-statistic (Table 11.1)
. * (2) Various statistics from bootstrap (pages 366-8)
. * (3) Bootstrap density of the t-statistic (Figure 11.1)
. * using generated data (see below)
.
. * Note: To speed up program reduce breps - the number of bootstrap replications
. *       But final program should use many replications
.
. * Note: This program uses ereg which is an old Stata command
. *       superseded by streg, dist(exp)
.
. * Note: For bootstrap see also mma07p4boot.do
. *       which has additional commands / ways to bootstrap
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA **********
.
. * Model is y ~ exponential(exp(a + bx + cz))
. * where x and z are joint normal (0.1,0.1,0.1,0.1,0.5)
. * i.e. means 0.1 and 0.1
. *      sd's 0.1 and 0.1 and correln 0.5 (so correln^2 = .25)
. *      variances 0.01 and 0.01 and covariance 0.005
.
. * Generate data from joint normal
. * Use fact that x is N(0.1, 0.1^2)
. *      and z | x is N(0.1 + (.005/.01)*(x - .1), .01x.75 = .0075)
. *      so that st dev = sqrt(0.0075) = 0.0866025
.
. set obs 50
obs was 0, now 50
. set seed 10001
. * Generate x and z bivariate normal
. scalar mu1=0.1
. scalar mu2=0.1
. scalar sig1=0.1
. scalar sig2=0.1
. scalar rho=0.5
. scalar sig12=rho*sig1*sig2
. gen x = mu1 + sig1*invnorm(uniform())
. gen muzgivx = mu2+(sig12/(sig1*sig1))*(x-mu1)
. gen sigzgivx = sqrt(sig2*sig2*(1-rho*rho))
. gen z = muzgivx + sigzgivx*invnorm(uniform())
. * To generate y exponential with mean mu=Ey use
. * Integral 0 to a of (1/mu)exp(-x/mu) dx by change of variables
. * = Integral 0 to a/mu of exp(-t)dt
. * = incomplete gamma function P(1,a/mu) in the terminology of Stata
. gen Ey = exp(-2.0+2*x+2*z)
. gen y = Ey*invgammap(1,uniform())
. gen logy = log(y)
.
. * Descriptive Statistics
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |        50    .0935209    .1031485  -.1173506   .2778609
     muzgivx |        50    .0967604    .0515742  -.0086753   .1889304
    sigzgivx |        50    .0866025           0   .0866025   .0866025
           z |        50    .1033014    .0909297  -.0885447   .3137469
          Ey |        50    .2114837     .071719   .0945722   .4314067
-------------+--------------------------------------------------------
           y |        50    .2024206    .2237202   .0005293   .9601147
        logy |        50   -2.282336     1.45494  -7.543878  -.0407026

. ereg y x z

Iteration 0:   log likelihood = -84.246434
Iteration 1:   log likelihood = -80.068104
Iteration 2:   log likelihood = -79.871694
Iteration 3:   log likelihood = -79.871338
Iteration 4:   log likelihood = -79.871338

Exponential regression -- entry time 0
                          log expected-time form      Number of obs  =        50
                                                      LR chi2(2)     =      8.75
Log likelihood = -79.871338                           Prob > chi2    =    0.0126

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2670543   1.417339     0.19   0.851    -2.510879    3.044988
           z |   4.663384   1.740712     2.68   0.007     1.251652    8.075117
       _cons |  -2.191619   .2328589    -9.41   0.000    -2.648014   -1.735224
------------------------------------------------------------------------------
.
. save mma11p1boot, replace
file mma11p1boot.dta saved
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x z using mma11p1boot.asc, replace
.
. ********** SIMPLE BOOTSTRAP **********
.
. * Stata produces four bootstrap 100*(1-alpha) confidence intervals
. * (N) and (P) have no asymptotic refinement
. * (BC)-(BCA) have asymptotic refinement
. * For details see program mma07p4boot.do
.
. * Change the following for different number of simulations S
. * From page 399, for testing better to use 999 than 1000
. global breps = 999 /* The number of bootstrap reps used below */
.
. set seed 20001
.
. * A simple and adequate bootstrap command for the slope coefficients is
. bs "ereg y x z" "_b[x] _b[z]", reps($breps) level(95)

command:     ereg y x z
statistics:  _bs_1      = _b[x]
             _bs_2      = _b[z]

Bootstrap statistics                              Number of obs    =        50
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .2670543 -.1885509   1.420956   -2.52135   3.055458   (N)
             |                                         -2.9054   2.696445   (P)
             |                                       -2.590993   2.864327  (BC)
       _bs_2 |   999  4.663384  .0524786   1.939086   .8582302   8.468539   (N)
             |                                        .5006047   8.483892   (P)
             |                                         .231034   8.174835  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. ********** MORE DETAILED BOOTSTRAP **********
.
. * The following bootstrap also gives standard error at each replication
. * and saves data from replications for further analysis
.
. * In particular, want to use the percentile-t method,
. * which provides asymptotic refinement
.
. * Stata does not give this. For methods see
. * e.g. Efron and Tibshirani (1993, pp.160-162)
. * e.g. Cameron and Trivedi (2005) Chapter 11.2.6-11.2.7
. * For sample s compute t-test(s) = (bhat(s)-bhat) / se(s)
. * where bhat is initial estimate
. * and bhat(s) and se(s) are for sth round.
. * Order the t-test(s) statistics and choose the alpha/2 percentiles
. * which give the critical values for the t-test
.
. * Implementation requires saving the results from each bootstrap replication
. * in order to obtain critical values from percentiles of bootstrap distribution
.
. use mma11p1boot.dta, clear
.
. * Get and store coefficients (b)
. * for regressors in the original model and data before bootstrap
. quietly ereg y x z
. global bx=_b[x]
. global sex=_se[x]
. global bz=_b[z]
. global sez=_se[z]
. di " Coefficients bx: " $bx " and bz: " $bz
 Coefficients bx: .26705432 and bz: 4.6633845
. di " Standard error sex: " $sex " and sez: " $sez

 Standard error sex: 1.4173391 and sez: 1.7407119
.
. * Bootstrap and save coeff estimates and se's from each replication
. set seed 20001
. bs "ereg y x z" "_b[x] _b[z] _se[x] _se[z]", reps($breps) level(95) saving(mma11p1bootreps) replace

command:     ereg y x z
statistics:  _bs_1      = _b[x]
             _bs_2      = _b[z]
             _bs_3      = _se[x]
             _bs_4      = _se[z]

Bootstrap statistics                              Number of obs    =        50
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .2670543 -.1885509   1.420956   -2.52135   3.055458   (N)
             |                                         -2.9054   2.696445   (P)
             |                                       -2.590993   2.864327  (BC)
       _bs_2 |   999  4.663384  .0524786   1.939086   .8582302   8.468539   (N)
             |                                        .5006047   8.483892   (P)
             |                                         .231034   8.174835  (BC)
       _bs_3 |   999  1.417339  .0644196   .1718393   1.080131   1.754547   (N)
             |                                        1.234399   1.902349   (P)
             |                                        1.196068   1.742845  (BC)
       _bs_4 |   999  1.740712  .0910103    .186631   1.374478   2.106946   (N)
             |                                        1.542322   2.257937   (P)
             |                                        1.453673   2.058318  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
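The (N) and (P) intervals in the tables above come from resampling the 50 observations with replacement, re-estimating, and summarizing the 999 replicated estimates. The following Python sketch illustrates that mechanic under stated assumptions. To stay self-contained, it bootstraps an OLS slope on simulated data instead of the exponential regression fitted by `ereg`. The helper names (`ols_slope`, `order_stat`, `pairs_bootstrap`) are illustrative. The percentile rule is one simple convention, and Stata's may differ slightly. The (N) bounds reported above are a little wider than b ± 1.96·se. They are consistent with a t critical value with 998 degrees of freedom (about 1.9623), so the default `crit=1.96` below is an approximation.

```python
import math
import random
import statistics


def ols_slope(xs, ys):
    """Slope from an OLS regression of ys on xs with an intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def order_stat(sorted_vals, p):
    """Value at rank ceil(p*B) of B sorted draws (one of several percentile conventions)."""
    k = max(1, math.ceil(p * len(sorted_vals)))
    return sorted_vals[k - 1]


def pairs_bootstrap(xs, ys, reps=999, seed=20001, crit=1.96):
    """Resample (x, y) pairs with replacement and re-estimate the slope each time."""
    rng = random.Random(seed)
    n = len(xs)
    b_hat = ols_slope(xs, ys)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    se = statistics.stdev(draws)             # bootstrap standard error
    bias = statistics.fmean(draws) - b_hat   # bootstrap estimate of bias
    draws.sort()
    normal_ci = (b_hat - crit * se, b_hat + crit * se)             # analogue of (N)
    pct_ci = (order_stat(draws, 0.025), order_stat(draws, 0.975))  # analogue of (P)
    return b_hat, se, bias, normal_ci, pct_ci


# Illustrative simulated data (not the mma11p1boot data set)
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]
b_hat, se, bias, normal_ci, pct_ci = pairs_bootstrap(xs, ys)
```

The bootstrap standard error is simply the standard deviation of the replicated estimates. This is why step (2A) in the log can recover it with `sum bzs` and `r(sd)`.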

.
. * Now use the bootstrap estimates
. use mma11p1bootreps, clear
(bootstrap: ereg y x z)

. sum

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       _bs_1 |     999    .0785034    1.420956  -9.431229   4.278278
       _bs_2 |     999    4.715863    1.939086  -1.747643   12.09208
       _bs_3 |     999    1.481759    .1718393   1.145421   2.761842
       _bs_4 |     999    1.831722     .186631   1.387625   2.910449

. * Order comes from "_b[x] _b[z] _se[x] _se[z]" in earlier bs
. gen bxs = _bs_1
. gen bzs = _bs_2
. gen sexs = _bs_3
. gen sezs = _bs_4
. gen ttestxs = (bxs - $bx)/sexs
. gen ttestzs = (bzs - $bz)/sezs
.
. ********** (1) TABLE 11.1 (page 367)
.
. summarize bzs ttestzs, d

                             bzs
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.3361366      -1.747643
 5%     1.544816      -1.716207
10%     2.270323      -1.366866       Obs                 999
25%     3.570291      -1.205571       Sum of Wgt.         999

50%      4.77197                      Mean           4.715863
                        Largest       Std. Dev.      1.939086
75%     5.970802       10.10243
90%     7.100958       10.42623       Variance       3.760056
95%     7.810663       10.76733       Skewness      -.1344324
99%     9.426978       12.09208       Kurtosis       3.545415

                           ttestzs
-------------------------------------------------------------
      Percentiles      Smallest
 1%     -2.66391      -3.921595
 5%    -1.727528      -3.483456
10%     -1.32364      -3.201425       Obs                 999
25%    -.6209012      -2.975815       Sum of Wgt.         999

50%     .0618649                      Mean           .0261125
                        Largest       Std. Dev.      1.046855
75%     .7034938       2.693856
90%     1.323415       3.087892       Variance       1.095904
95%      1.70558        3.11692       Skewness      -.1596043
99%     2.529097       3.738328       Kurtosis       3.337749

.
. * Additionally need the 2.5 and 97.5 percentiles not given in summarize, d

.
. * Coefficient of z
. _pctile bzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
 Lower 2.5 and upper 2.5 percentile of coeff b for z: .50060469 and 8.4838924
.
. * t-statistic for z
. _pctile ttestzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of ttest on z: " r(r1) " and " r(r2)
 Lower 2.5 and upper 2.5 percentile of ttest on z: -2.1827998 and 2.0659592
.
. ********** (2) RESULTS IN TEXT PAGES 366-7 **********
.
. * (2A) Bootstrap standard error estimate (no refinement)
. * These are given earlier in bootstrap table output
. * Equivalently get the standard deviation of bzs
.
. quietly sum bzs
. scalar bzbootse = r(sd)
. di "Bootstrap estimate of standard error: " bzbootse
Bootstrap estimate of standard error: 1.9390864
.
. * (2B) Test b3 = 0 using percentile-t method (asymptotic refinement)
. * Use the 2.5% and 97.5% bootstrap critical values for t-statistic for z
.
. _pctile ttestzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of ttest on z: " r(r1) " and " r(r2)
 Lower 2.5 and upper 2.5 percentile of ttest on z: -2.1827998 and 2.0659592
.
. * (2D) 95% confidence interval with asymptotic refinement
. * Use the preceding critical values
.
. scalar lbz = $bz + r(r1)*$sez /* Note the plus sign here */
. scalar ubz = $bz + r(r2)*$sez
. di " Percentile-t interval lower and upper bounds: (" lbz "," ubz ")"
 Percentile-t interval lower and upper bounds: (.86375888,8.2596243)
.
. * (2B-Var) Variation for symmetric two-sided test on z
.
. gen absttestzs = abs(ttestzs)
. _pctile absttestzs, p(95)
. di " Upper 5 percentile of symmetric two-sided test on z: " r(r1) "
 Upper 5 percentile of symmetric two-sided test on z: 2.0775187
.
. * (2C) Test b3 = 0 without asymptotic refinement
. * Usual Wald test except use bootstrap estimate of standard error
.
. scalar Wald = ($bz - 0) / bzbootse
. di "Wald statistic using bootstrap standard error: " Wald
Wald statistic using bootstrap standard error: 2.404939
.
. * (2E) Bootstrap estimate of bias
. * This is given in the earlier bootstrap results table
. * and is explained in the text
.
. ********** (3) FIGURE 11.1 (p.368) PLOTS ESTIMATED DENSITY OF T-STATISTIC FOR Z
.
. set scheme s1mono
. label var ttestzs "Bootstrap t-statistic"
. kdensity ttestzs, normal /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Bootstrap Density of 't-Statistic'") /*
> */ xtitle("t-statistic from each bootstrap replication", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Bootstrap Estimate") label(2 "Standard Normal"))

. graph save ch11boot, replace
(file ch11boot.gph saved)
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section3\mma11p1boot.txt
  log type:  text
 closed on:  18 May 2005, 15:53:47
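The percentile-t steps (2B), (2B-Var) and (2D) can be sketched in Python as follows. Each replication forms t*(s) = (b(s) - b)/se(s), centred at the original estimate, and the ordered t* values supply the critical values. To stay self-contained, the statistic here is a sample mean with se = s/sqrt(n), computed on simulated exponential data; this is a swapped-in example, not the `ereg` coefficient. The helper names are illustrative. The sketch builds the interval in the Efron and Tibshirani form (b - t*_hi·se, b - t*_lo·se). The log instead adds the two percentiles (b + t*_lo·se, b + t*_hi·se). The two versions coincide when the bootstrap t distribution is symmetric.

```python
import math
import random
import statistics


def mean_and_se(sample):
    """Sample mean and its usual standard error s / sqrt(n)."""
    return statistics.fmean(sample), statistics.stdev(sample) / math.sqrt(len(sample))


def order_stat(sorted_vals, p):
    """Value at rank ceil(p*B) of B sorted draws (one of several percentile conventions)."""
    k = max(1, math.ceil(p * len(sorted_vals)))
    return sorted_vals[k - 1]


def percentile_t(sample, reps=999, alpha=0.05, seed=20001):
    rng = random.Random(seed)
    n = len(sample)
    b_hat, se_hat = mean_and_se(sample)
    t_stars = []
    for _ in range(reps):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        b_s, se_s = mean_and_se(resample)
        t_stars.append((b_s - b_hat) / se_s)   # centred at b_hat, like ttestzs
    t_stars.sort()
    t_lo = order_stat(t_stars, alpha / 2)
    t_hi = order_stat(t_stars, 1 - alpha / 2)
    # Equal-tailed percentile-t interval (Efron and Tibshirani form)
    ci = (b_hat - t_hi * se_hat, b_hat - t_lo * se_hat)
    # Symmetric two-sided critical value from |t*|, as in step (2B-Var)
    t_sym = order_stat(sorted(abs(t) for t in t_stars), 1 - alpha)
    return b_hat, se_hat, (t_lo, t_hi), ci, t_sym


data_rng = random.Random(7)
sample = [data_rng.expovariate(5.0) for _ in range(50)]   # exponential with mean 0.2
b_hat, se_hat, (t_lo, t_hi), ci, t_sym = percentile_t(sample)
```

The t statistic is centred at b_hat rather than at the null value. Centring at the null would build the test's own hypothesis into the bootstrap distribution and distort the critical values.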

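The next log approximates E[exp(-exp(x))] for x ~ N[0,1] in two ways. The first is the midpoint rule on [-5, 5], Integral = Sum (j=1 to n) [(b-a)/n]*f(xbar_j). The second is a Monte Carlo average over pseudo-random draws. A minimal Python sketch of both follows, assuming Python's `random.gauss` as the normal generator. Its draws will not match Stata's `invnorm(uniform())` stream, so the Monte Carlo values will differ from the log's; the function names are illustrative.

```python
import math
import random


def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def g(x):
    """Function whose expectation is wanted: exp(-exp(x))."""
    return math.exp(-math.exp(x))


def midpoint_rule(func, n, a=-5.0, b=5.0):
    """Approximate E[func(x)], x ~ N[0,1], by the midpoint rule on [a, b] with n points."""
    h = (b - a) / n
    total = 0.0
    for j in range(1, n + 1):
        xbar = a - 0.5 * h + h * j            # midpoint of the j-th subinterval
        total += func(xbar) * normal_pdf(xbar)
    return total * h


def monte_carlo(func, s, seed=10101):
    """Average func over s pseudo-random N[0,1] draws."""
    rng = random.Random(seed)
    return sum(func(rng.gauss(0.0, 1.0)) for _ in range(s)) / s


mid_20 = midpoint_rule(g, 20)
mid_2000 = midpoint_rule(g, 2000)
mid_ex = midpoint_rule(lambda x: x, 2000)     # E[x]; the symmetric grid gives about 0
mc_100000 = monte_carlo(g, 100_000)
```

As in the log, the midpoint rule settles near .381756 with as few as 20 points. The Monte Carlo average only approaches that value slowly, with error shrinking roughly like 1/sqrt(S).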

------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section3\mma12p1integration.txt
  log type:  text
 opened on:  18 May 2005, 21:17:14
.
. ********** OVERVIEW OF MMA12P1INTEGRATION.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.3.3 pages 391-2
. * Computes integral numerically and by simulation
. * (1) Illustrate Midpoint Rule (page 392)
. * (2) Illustrate Monte Carlo integral (Table 12.1 page 392)
. *
. * for computing E[x] and E[exp(-exp(x))] for x ~ N[0,1]
.
. * No data need be read in.
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** (1) NUMERICAL INTEGRATION USING MIDPOINT RULE **********
.
. * Midpoint rule for n evaluation points between a and b is
. * Integral = Sum (j=1 to n) [(b-a)/n]*f(xbar_j)
. * where xbar_j is midpoint between x_j-1 and x_j
.
. program midpointrule, rclass
  1.   version 8
  2.   /* define arguments. Here trueb2 = b2 in Phi(b1 + b2*x2) */
.   args neval a b
  3.   drop _all
  4.   scalar increment = (`b'-`a') / `neval'
  5.   set obs `neval'
  6.   /* Compute the function of interest */
.   gen xbar = `a' - 0.5*increment + increment*_n
  7.   gen density = exp(-xbar*xbar/2)/sqrt(2*_pi)
  8.   * Following is contribution to E[x] when x ~ N[0,1]
.   gen f1xbar = xbar*density
  9.   * Following is contribution to E[exp(-exp(x))] when x ~ N[0,1]
.   gen f2xbar = exp(-exp(x))*density
 10.   /* Compute the averages */
.   quietly sum f1xbar
 11.   scalar Ex = r(sum)*increment
 12.   quietly sum f2xbar
 13.   scalar Eexpminexpx = r(sum)*increment
 14.   /* Print results */
.   di "Evaluation points: " `neval' " over range: (" `a' "," `b' ")
 15.   di "Midpoint rule estimate of E[x] is: " Ex
 16.   di "Midpoint rule estimate of E[exp(-exp(x))] is: " Eexpminexpx
 17. end
.
. midpointrule 20 -5 5
obs was 0, now 20
Evaluation points: 20 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175625

. midpointrule 200 -5 5
obs was 0, now 200
Evaluation points: 200 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175618

. midpointrule 2000 -5 5
obs was 0, now 2000
Evaluation points: 2000 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175618
.
. ********** (2) MONTE CARLO INTEGRATION USING DRAWS FROM DENSITY OF X **********
.
. * To get E[g(x)]
. * make draws from N[0,1], compute g(x), and average over draws
.
. program simintegration, rclass
  1.   version 8
  2.   /* define arguments. Here trueb2 = b2 in Phi(b1 + b2*x2) */
.   args nsims
  3.   /* Generate the data: here x */
.   drop _all
  4.   set obs `nsims'
  5.   set seed 10101
  6.   gen x = invnorm(uniform())
  7.   /* Compute the function of interest */
.   gen f1x = x /* For E[x] just need x */
  8.   gen f2x = exp(-exp(x)) /* For E[exp(-exp(x))] */
  9.   /* Compute the averages */
.   quietly sum f1x
 10.   scalar Ex = r(mean)
 11.   quietly sum f2x
 12.   scalar Eexpminexpx = r(mean)
 13.   di "Number of simulations: " `nsims'
 14.   di "Monte Carlo estimate of E[x] is: " Ex
 15.   di "Monte Carlo estimate of E[exp(-exp(x))] is: " Eexpminexpx
 16. end
.
. * Note a different program was used to obtain Table 12.1 on page 392
. * So results will differ somewhat from text, except for very high number of simulations
.
. simintegration 10
obs was 0, now 10
Number of simulations: 10
Monte Carlo estimate of E[x] is: -.10143571
Monte Carlo estimate of E[exp(-exp(x))] is: .42635197

. simintegration 25
obs was 0, now 25
Number of simulations: 25
Monte Carlo estimate of E[x] is: .17496346
Monte Carlo estimate of E[exp(-exp(x))] is: .35703296

. simintegration 50
obs was 0, now 50
Number of simulations: 50
Monte Carlo estimate of E[x] is: .0079132
Monte Carlo estimate of E[exp(-exp(x))] is: .37966293

. simintegration 100
obs was 0, now 100
Number of simulations: 100
Monte Carlo estimate of E[x] is: .11238423
Monte Carlo estimate of E[exp(-exp(x))] is: .3524417

. simintegration 500
obs was 0, now 500
Number of simulations: 500
Monte Carlo estimate of E[x] is: .06990338
Monte Carlo estimate of E[exp(-exp(x))] is: .36137551

. simintegration 1000
obs was 0, now 1000
Number of simulations: 1000
Monte Carlo estimate of E[x] is: .04309113
Monte Carlo estimate of E[exp(-exp(x))] is: .36945581

. simintegration 1000
obs was 0, now 1000
Number of simulations: 1000
Monte Carlo estimate of E[x] is: .04309113
Monte Carlo estimate of E[exp(-exp(x))] is: .36945581

. simintegration 100000
obs was 0, now 100000
Number of simulations: 100000
Monte Carlo estimate of E[x] is: -.00405425
Monte Carlo estimate of E[exp(-exp(x))] is: .38284684

. clear
. set mem 20m
(20480k)

. simintegration 1000000
obs was 0, now 1000000
Number of simulations: 1000000
Monte Carlo estimate of E[x] is: -.00085186
Monte Carlo estimate of E[exp(-exp(x))] is: .38192861
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section3\mma12p1integration.txt
  log type:  text
 closed on:  18 May 2005, 21:17:16
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section3\mma12p2mslmsm.txt
  log type:  text
 opened on:  18 May 2005, 21:46:27
.
. ********** OVERVIEW OF MMA12P2MSLMSM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.4.5 pages 397-8 and 12.5.5 pages 402-4
. * Computes integral numerically and by simulation
. * (1) Maximum Simulated likelihood Table 12.2
. * (2) Method of Simulated Moments Table 12.3
. * with application to generated data
.
. * The application is only illustrative.
. * This is not a template program for MSL or MSM.
.
. * Different numbers of simulations S lead to different estimators.
. * This program gives entries in Tables 12.2 and 12.3 for S = 100
. * For other values of S change the value of simreps

. * from the current global simreps 100
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** DATA DESCRIPTION **********
.
. * Model is y = theta + u + e
. * where theta is a scalar parameter equal to 1
. *       u is extreme value type 1
. *       e is N(0,1)
. * n is set in global numobs
.
. ********** DEFINE GLOBALS **********
.
. global simreps 100 /* change this to change the number of simulations */
. global numobs 100  /* change this to change the number of observations */
.
.
. ********** (1) MAXIMUM SIMULATED LIKELIHOOD (Table 12.2 p.398) **********
.
. * This MSL program is inefficiently written computer code
. * as it requires drawing the same random variates at each iteration
.
. * Generate data
. clear
. set obs $numobs
obs was 0, now 100
. set seed 10101
. gen u = -log(-log(uniform()))
. gen e = invnorm(uniform())
. gen y = 1 + u + e
. summarize u e y

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |     100    .7236045    1.372637  -1.827296   6.423636
           e |     100    .0415449    .9472174  -2.906972   2.302204
           y |     100    1.765149    1.684177  -2.227185   8.143228
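Both estimators in this log can be sketched compactly in Python for the model y = theta + u + e, u extreme value type 1, e ~ N(0,1). MSL maximizes the sum over i of log[(1/S) * sum_s phi(y_i - theta - u_is)]. The u_is are drawn once and reused at every trial theta; the Stata program gets the same effect by resetting the seed at each evaluation. MSM replaces E[u + e] by simulated averages, giving theta_i = y_i - ubar_i - ebar_i, then averages over i. This is a sketch under stated assumptions. A golden-section search stands in for Stata's `ml maximize` optimizer. Python's generator is used, so the numbers will not match Tables 12.2 and 12.3. All names are illustrative.

```python
import math
import random


def draw_ev1(rng):
    """Extreme value type 1 draw, -log(-log(U)), as in the Stata log."""
    u = rng.random()
    while u == 0.0:                     # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_data(n=100, theta=1.0, seed=10101):
    """y = theta + u + e with u ~ EV1 and e ~ N(0,1)."""
    rng = random.Random(seed)
    return [theta + draw_ev1(rng) + rng.gauss(0.0, 1.0) for _ in range(n)]


def simulated_loglik(theta, ys, udraws):
    """Sum over i of log[(1/S) * sum_s phi(y_i - theta - u_is)]."""
    total = 0.0
    for y, draws in zip(ys, udraws):
        dens = sum(math.exp(-0.5 * (y - theta - u) ** 2) for u in draws) / len(draws)
        total += math.log(dens / math.sqrt(2 * math.pi))
    return total


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                     # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                           # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2


def msm_estimate(ys, s, seed):
    """Average over i of y_i - ubar_i - ebar_i, using s simulated draws per observation."""
    rng = random.Random(seed)
    thetas = []
    for y in ys:
        ubar = sum(draw_ev1(rng) for _ in range(s)) / s
        ebar = sum(rng.gauss(0.0, 1.0) for _ in range(s)) / s
        thetas.append(y - ubar - ebar)
    return sum(thetas) / len(thetas)


S = 100
ys = simulate_data()
# Draw the simulation variates once and reuse them for every theta (common random numbers)
draw_rng = random.Random(20202)
udraws = [[draw_ev1(draw_rng) for _ in range(S)] for _ in ys]
theta_msl = golden_max(lambda t: simulated_loglik(t, ys, udraws), -2.0, 4.0)
theta_msm = msm_estimate(ys, S, seed=30303)
```

Holding the draws fixed keeps the simulated log likelihood a smooth, deterministic function of theta. Fresh draws at each evaluation would make the objective jump around and could stall the optimizer.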

.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile u e y using mma12p2mslmsm.asc, replace
.
. * Use the variant ml d0 as this gives the entire likelihood, not just one observation.
. * I want this so that seed is only reset for the entire data.
. * My program is inefficient as variates need to be redrawn at each iteration
. program define msl
  1.   version 6.0
  2.   args todo b lnf /* Need to use the names todo b and lnf
>      todo always contains 1 and may be ignored
>      b is parameters and lnf is log-density */
  3.   tempvar theta1 /* create as needed to calculate lf, g, ... */
  4.   mleval `theta1' = `b', eq(1) /* theta1 is theta1_i = x_i'b */
  5.   local y "$ML_y1" /* create to make program more readable */
  6.   set seed 10101
  7.   tempvar denssim
  8.   global isim=1
  9.   quietly gen `denssim' = exp(-0.5*(`y'-`theta1'+log(-log(uniform())))^2)/sqrt(2*_pi)
 10.   while $isim < $simreps {
 11.     quietly replace `denssim' = `denssim' + exp(-0.5*(`y'-`theta1'+log(-log(uniform())))^2)/sqrt(2*_pi)
 12.     global isim=$isim+1
 13.   }
 14.   mlsum `lnf' = ln(`denssim'/$isim)
 15. end
.
. gen one = 1
. ml model d0 msl (y = one, nocons )
. ml maximize

initial:       log likelihood = -216.68168
alternative:   log likelihood = -199.54479
rescale:       log likelihood = -191.09715
Iteration 0:   log likelihood = -191.09715
Iteration 1:   log likelihood =  -190.4391  (not concave)
Iteration 2:   log likelihood = -190.43885
Iteration 3:   log likelihood =  -190.4385
Iteration 4:   log likelihood =  -190.4385

                                                  Number of obs   =        100
                                                  Wald chi2(1)    =      65.72
Log likelihood =  -190.4385                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         one |   1.177456   .1452451     8.11   0.000     .8927806    1.462131
------------------------------------------------------------------------------
.
. *** Display MSL results in one column of Table 12.2 p.398
.
. di "For number of simulations S = " $simreps
For number of simulations S = 100
. di "MSL estimator: " _b[one]
MSL estimator: 1.1774557
. di "Standard error: " _se[one]
Standard error: .14524511
.
. ********** (2) METHOD OF SIMULATED MOMENTS (Table 12.3 p.404) **********
.
. clear
. set obs $numobs
obs was 0, now 100
. set seed 10101
. gen u = -log(-log(uniform()))
. gen e = invnorm(uniform())
. gen y = 1 + u + e
. summarize u e y

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |     100    .7236045    1.372637  -1.827296   6.423636
           e |     100    .0415449    .9472174  -2.906972   2.302204
           y |     100    1.765149    1.684177  -2.227185   8.143228

.
. global isim=1
. gen usim = -log(-log(uniform()))
. gen esim = invnorm(uniform())
. while $isim < $simreps {
  2.   quietly replace usim = usim-log(-log(uniform()))
  3.   quietly replace esim = esim+invnorm(uniform())
  4.   global isim=$isim+1
  5. }
. gen usimbar = usim/$isim
. gen esimbar = esim/$isim
. gen theta = y - usimbar - esimbar
. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |     100    .7236045    1.372637  -1.827296   6.423636
           e |     100    .0415449    .9472174  -2.906972   2.302204
           y |     100    1.765149    1.684177  -2.227185   8.143228
        usim |     100    57.36345    13.16979   21.96637   90.07499
        esim |     100   -.9702956    11.38655  -26.38858   33.28406
-------------+--------------------------------------------------------
     usimbar |     100    .5736345    .1316979   .2196637   .9007499
     esimbar |     100    -.009703    .1138655  -.2638858   .3328406
       theta |     100    1.201218    1.681435  -2.757669    7.75245

.
. * Results for Table 12.3 on page 404
. * Here the st. error of theta_MSM is approximated by the st. dev. of theta
. * divided by the square root of S (the number of simulations)
. quietly sum theta
. scalar theta_MSM = r(mean)
. scalar approx_sterror = r(sd)/sqrt($simreps)
.
. * Display MSM results in one column of Table 12.3 p.404
. di "For number of simulations S = " $simreps
For number of simulations S = 100
. di "MSM estimator: " theta_MSM
MSM estimator: 1.2012178
. di "Approximate standard error: " approx_sterror
Approximate standard error: .16814348
.
. * As written this will not give the correct standard errors (see p.403).
. * Can get this by also computing the squared rv to get E[y^2]
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section3\mma12p2mslmsm.txt
  log type:  text

 closed on:  18 May 2005, 21:46:28
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section3\mma12p3draws.txt
  log type:  text
 opened on:  18 May 2005, 21:48:36
.
. ********** OVERVIEW OF MMA12P3DRAWS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.8.2 pages 412-5
. * Draws figures that illustrate two common ways to draw random variates
.
. * (1) Illustrate Inverse Transformation method: Figure 12.2
. * (2) Illustrate Envelope method: Figure 12.3
.
. * No data need be read in.
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono
.
. ********** (1) INVERSE TRANSFORMATION - FIGURE 12.2 page 413 **********
.
. * Graph is for x = 0 to 5 in increments of 0.05
. set obs 100
obs was 0, now 100
. gen x = 0.05*_n
. * Unit Exponential cdf
. gen Fx = 1 - exp(-x)
. * Suppose uniform draw is 0.64
. gen uniformdraw = 0.64
.
. graph twoway (line Fx x, yline(0.64) xline(1.02)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Inverse Transformation Method") /*
> */ xtitle("Random variable x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cdf F(x)", size(medlarge)) yscale(titlegap(*5)) /*
> */ caption(" " "Draw of 0.64 (vertical axis) yields x = 1.02 (horizontal axis).")

. graph save ch12fig2invtransform, replace
(file ch12fig2invtransform.gph saved)

. graph export ch12fig2invtransform.wmf, replace
(file c:\Imbook\bwebpage\Section3\ch12fig2invtransform.wmf written in Windows Metafile format)
.
. ********** (2) ENVELOPE METHOD - FIGURE 12.3 **********
.
. * The following is a modification of the figure in the book
. * making clear that the envelope is a scaling up of g(x)
.
. clear
.
. * Graph is for x = 0 to 10 in increments of 0.1
. set obs 101
obs was 0, now 101
. gen x = -0.05 + 0.1*_n
. * Desired density f(x) is the N[4,1] density; gx plots the envelope kg(x)
. gen fx = normden(x-4)
. gen gx = 1.5*normden(x-4)+0.005
.
. graph twoway (line fx x, clstyle(p1)) /*
> */ (line gx x, clstyle(p1) clwidth(*2) clcolor(gs12)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Accept-reject Method") /*
> */ xtitle("Random variable x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("f(x) and kg(x)", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Desired density f(x)") label(2 "Envelope kg(x)") )

. graph save ch12fig3envelope, replace
(file ch12fig3envelope.gph saved)

. graph export ch12fig3envelope.wmf, replace
(file c:\Imbook\bwebpage\Section3\ch12fig3envelope.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section3\mma12p3draws.txt
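The draws log above only plots the two methods. The following Python sketch actually draws variates with them. Inverting the unit exponential cdf F(x) = 1 - exp(-x) gives x = -log(1 - u), so the uniform draw 0.64 maps to about 1.02, as in Figure 12.2. For accept-reject, the sketch targets the same N[4,1] density f(x) used in Figure 12.3. The Laplace(4,1) proposal g(x) is our illustrative choice, not the curve plotted in the log, whose gx = 1.5*f(x) + 0.005 is a display device rather than a density one can sample from. With k = sqrt(2e/pi), f(x) <= k*g(x) holds everywhere. All function names are hypothetical.

```python
import math
import random


def inverse_transform_exponential(u):
    """Invert the unit exponential cdf F(x) = 1 - exp(-x), giving x = -log(1 - u)."""
    return -math.log(1.0 - u)


def normal_pdf(x, loc):
    return math.exp(-0.5 * (x - loc) ** 2) / math.sqrt(2 * math.pi)


def laplace_pdf(x, loc):
    return 0.5 * math.exp(-abs(x - loc))


# Smallest k with f(x) <= k*g(x) for all x when f is N[loc,1] and g is Laplace(loc,1)
K = math.sqrt(2 * math.e / math.pi)


def accept_reject_normal(rng, loc=4.0):
    """Draw from f = N[loc,1] using the envelope k*g(x), g a Laplace(loc,1) density."""
    while True:
        # Draw x from g: exponential magnitude by inverse transform, random sign
        magnitude = inverse_transform_exponential(rng.random())
        x = loc + magnitude if rng.random() < 0.5 else loc - magnitude
        # Accept with probability f(x) / (k*g(x))
        if rng.random() * K * laplace_pdf(x, loc) <= normal_pdf(x, loc):
            return x


x_draw = inverse_transform_exponential(0.64)   # about 1.02, as in Figure 12.2
rng = random.Random(10101)
draws = [accept_reject_normal(rng) for _ in range(20000)]
mean_draw = sum(draws) / len(draws)
var_draw = sum((d - mean_draw) ** 2 for d in draws) / (len(draws) - 1)
```

The acceptance probability is 1/k, about 0.76, so a tighter envelope wastes fewer proposals. This is the trade-off that Figure 12.3 depicts.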

  log type:  text
 closed on:  18 May 2005, 21:48:42
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section3\mma13p1bayesthm.txt
  log type:  text
 opened on:  24 May 2005, 11:04:08
.
. ********** OVERVIEW OF MMA13P1BAYESTHM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 13.2.2 page 424
. * Create Figure 13.1
. * (1) Bayes Analysis illustrated using normal distribution and prior
.
. * No data are needed.
.
. ********** SETUP
.
. set more off
. version
version 8.2
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * Model is y ~ normal(theta, sigmasq) where sigmasq is known.
. * and the prior is theta ~ normal(mu, tausq)
. * which gives a normal posterior
. * n is set below in set obs
.
. ********** CREATE DATA **********
.
. * The likelihood and prior are normal so the posterior is also normal
.
. * Will evaluate the densities at points between 0 and 15
. set obs 150
obs was 0, now 150
. gen xeval = 0.1*_n
.
. * Likelihood with sigmasq known
. scalar nobs = 50
. scalar ybar = 10
. scalar sigmasq = 100
. gen likelihood = normden(xeval,ybar,sqrt(sigmasq/nobs))
.
. * Prior
. scalar mu = 5
. scalar tausq = 3
. gen prior = normden(xeval,mu,sqrt(tausq))
.
. * Posterior given sample mean ybar
. scalar tau1sq=1/((nobs/sigmasq)+(1/tausq))
. scalar mu1 = tau1sq*((ybar*nobs/sigmasq)+(mu/tausq))
. gen posterior = normden(xeval,mu1,sqrt(tau1sq))
.
. scalar list
       mu1 =          8
    tau1sq =        1.2
     tausq =          3
        mu =          5
   sigmasq =        100
      ybar =         10
      nobs =         50
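The two scalar formulas above are the conjugate normal update. Posterior precision is the sum of data and prior precisions, 1/tau1sq = nobs/sigmasq + 1/tausq. The posterior mean is the precision-weighted average of ybar and mu. A minimal Python check that reproduces the `scalar list` values (the function name is illustrative):

```python
def normal_posterior(ybar, nobs, sigmasq, mu, tausq):
    """Posterior N[mu1, tau1sq] for theta when y_i ~ N[theta, sigmasq] with sigmasq known
    and the prior is theta ~ N[mu, tausq]."""
    tau1sq = 1.0 / (nobs / sigmasq + 1.0 / tausq)          # posterior variance
    mu1 = tau1sq * (ybar * nobs / sigmasq + mu / tausq)    # precision-weighted mean
    return mu1, tau1sq


mu1, tau1sq = normal_posterior(ybar=10, nobs=50, sigmasq=100, mu=5, tausq=3)
# Weight on the sample mean: (nobs/sigmasq) / (nobs/sigmasq + 1/tausq) = 0.5/0.8333 = 0.6
weight = tau1sq * 50 / 100
```

Here the data precision 0.5 exceeds the prior precision 1/3, so the posterior mean 0.6 x 10 + 0.4 x 5 = 8 lies closer to ybar than to mu. This matches the Posterior N[8,1.2] curve in Figure 13.1.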

. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       xeval |     150        7.55    4.344537         .1         15
  likelihood |     150    .0666548    .0944174   6.44e-12   .2820948
       prior |     150    .0665247    .0804685   1.33e-08   .2303294
   posterior |     150    .0666667    .1131755   1.85e-12   .3641828

. graph twoway (line likelihood xeval, clstyle(p2)) /*
> */ (line prior xeval, clstyle(p3)) /*
> */ (line posterior xeval, clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Bayes: Likelihood, Prior and Posterior") /*
> */ xtitle("Evaluation point", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(10) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Likelihood N[10,2]") label(2 "Prior N[5,3]") /*
> */ label(3 "Posterior N[8,1.2]") )

. graph save Ch13_Bayes1, replace
(file Ch13_Bayes1.gph saved)

. graph export Ch13_Bayes1.wmf, replace
(file c:\Imbook\bwebpage\Section3\Ch13_Bayes1.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section3\mma13p1bayesthm.txt
  log type:  text
 closed on:  24 May 2005, 11:04:12

The SAS System                                   08:50 Wednesday, May 25, 2005

NOTE: Copyright (c) 2002-2003 by SAS Institute Inc., Cary, NC, USA. NOTE: SAS (r) 9.1 (TS1M2) Licensed to UNIV OF CA/DAVIS, Site 0029107010. NOTE: This session is executing on the SunOS 5.9 platform.

You are running SAS 9. Some SAS 8 files will be automatically converted by the V9 engine; others are incompatible. Please see http://support.sas.com/rnd/migration/planning/platform/64bit.html PROC MIGRATE will preserve current SAS file attributes and is recommended for converting all your SAS libraries from any SAS 8 release to SAS 9. For details and examples, please see http://support.sas.com/rnd/migration/index.html

This message is contained in the SAS news file, and is presented upon initialization. Edit the file "news" in the "misc/base" directory to display site-specific news and information in the program log. The command line option "-nonews" will prevent this display.

NOTE: SAS initialization used: real time 0.11 seconds cpu time 0.10 seconds 1 2

* MMA13P2BAYES.SAS March 2005 for SAS version 8.2

237

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

********** OVERVIEW OF MMA13P2BAYES.SAS ********** * SAS Program * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi * used for "Microeconometrics: Methods and Applications" * by A. Colin Cameron and Pravin K. Trivedi (2005) * Cambridge University Press * Chapter 13.6 p.452-4 * MCMC Example: Gibbs Sampler for 2 equation SUR * Program creates the first column of Table 13.3 * (though differs somewhat due to use of different seed) * For different columns of Table 13.3 change * nobs = Sample size N (1000 or 10000) * replics = Gibbs sample replications (50000 or 100000) * tau = 1, 10 or 0.1 * This program does first column: tau=10, nobs=1000, replics=50000 * Note that the program does not exactly replicate Table 13.3 * Table 13.3 used the computer clock for seed, * with third argument zero in rannor(j( , ,0)) * Here instead the seed is consecutively 10101, 20101, ... , 70101 * so third argument is eg rannor(j( , ,10101)) * to permit reproducability by other users * This programs creates

238

2 25, 2005

The SAS System

08:50 Wednesday, May

30 * MMA13P2BAYES.1ST SAS Output with one column of Table 13.3 31 * MMA13P2BAYES.LOG SAS log file 32 33 * This program uses generated data - so no data set required 34 * This program uses a lot of memory - 1 gigabyte should do 35 * In Unix give command sas -MEMSIZE 1G mma13p2bayesgibbs.sas 36 37 *********************************************************************; 38 ***** BIVARIATE NORMAL-BAYESIAN-ESTIMATION-BY-MCMC **************; 39 *********************************************************************; 40 41 OPTIONS LS=75; 42 options NOTES; 43 44 PROC IML; NOTE: IML Ready 45 start main; 45 ! 46 47 print "A. Colin Cameron and Pravin K. Trivedi (2005)"; 47 ! 48 print "Microeconometrics: Methods and Applications, CUP"; 48 ! 49 print "MCMC Example: Gibbs Sampler for SUR"; 49 ! 50 51 ************* GENERATING DATA: 2 EQUATION SUR 51 ! ****************; 52 53 nobs = 1000; 53 ! 54 replics = 50000; 54 ! 55 burn = 5000; 55 ! 56 replics = replics + burn; 56 ! 57 58 npar1 = 2; 58 ! 59 npar2 = 2; 59 ! 60 61 alpha1 ={1,1}; 61 ! 62 alpha2 ={1,1}; 62 ! 239

63 64 64 65 65 66 66 67 67 68 69

sigma = {1 -0.5,-0.5 1}; ! T = {0.15 2.18 0.725 0.45}; ! EPS = 1e-20; ! IC = (1/2.506628275); ! R1 = j(nobs,1,1)||rannor(j(nobs,1,10101));

240

3 69 70 70 71 72 72 73 73 74 74 75 76 76 77 77 78 79 79 80 81 81 82 82 83 84 84 85 85 86 86 87 87 88 89 89 90 90 91 91 92 93 93 94 95 95 96 97 97 98

The SAS System 08:50 Wednesday, May 25, 2005 ! R2 = j(nobs,1,1)||rannor(j(nobs,1,20101)); ! e = rannor(j(nobs,2,30101))*root(sigma); ! e1 = e[,1]; ! e2 = e[,2]; ! Y1 = R1*alpha1 + e1; ! Y2 = R2*alpha2 + e2; ! ************* SPECIFY PRIOR DISTRIBUTIONS ! ******************; alpha01 = j(npar1,1,0); ! alpha02 = j(npar2,1,0); ! sigma = I(2); ! p = 3; ! df = 5; ! tau = 10; ! MUalpha = alpha01//alpha02; ! OMalpha = tau*I(npar1+npar2); ! OMphi = I(2); ! ************ ANALYSIS: GIBBS SAMLING BEGINS HERE ! ***************; do rep = 1 to replics; ! ************* GENERATE ALPHA1 ALPHA2 RHO ! *******************;

241

99 99 100 101 102 102 103 104

isigma = inv(sigma); ! LL = ((isigma[1,1]*R1`*R1||isigma[1,2]*R1`*R2)// (isigma[2,1]*R2`*R1||isigma[2,2]*R2`*R2)); ! LisigY = ((isigma[1,1]*R1`*Y1+isigma[1,2]*R1`*Y2)// (isigma[2,1]*R2`*Y1+isigma[2,2]*R2`*Y2));

242

4 104 105 106 107 107 108 109 109 110 110 111 112 112 113 113 114 115 115 116 117 118 118 119 119 120 120 121 121 122 122 123 123 124 124 125 126 126 127 128 128 129 130 130 131 131 132 132 133 134

The SAS System 08:50 Wednesday, May 25, 2005 ! alpha = inv(inv(OMalpha)+ LL)*(LisigY + inv(OMalpha)*MUalpha) + root(inv(inv(OMalpha)+ ! LL))`*rannor(j(npar1+npar2,1,40101)); alpha1 = alpha[1:npar1]; ! alpha2 = alpha[npar1+1:npar1+npar2]; ! e1 = Y1 - R1*alpha1; ! e2 = Y2 - R2*alpha2; ! ************* GENERATE SIGMA ! *******************; mt = (sqrt((rannor(j(1,nobs+df,50101))##2)[,+])||0)// (rannor(j(1,1,60101))||sqrt((rannor(j(1,nobs+df-1,70101))## ! 2)[,+])); mv = mt*mt`; ! e=(e1||e2); ! ms = e`*e+inv(OMphi); ! ml = root(inv(ms))`; ! mg = ml*mv*ml`; ! sigma = inv(mg); ! free mt mv e ml; ! ************* WRITE TO OUTPUT FILE IF AFTER BURN-IN ! **************; if rep <= burn then goto point300; ! sigma3 = sigma[1,1]||sigma[1,2]||sigma[2,2]; ! out1 = alpha1`||alpha2`||sigma3; ! output1=output1//out1; 243

134 135 136 136 136 137 138 138

!

! !

point300: end;

************* ! **************;

END OF GIBBS SAMPLING

244

5 139 140 141 141 142 142 143 143 144 145 145 146 147 147 148 148 149 150 150 151 151 152 152 153 153 154 155 155 156 156 157 157 158 158 159 160 160 161 161 162 162 163 164 164 165 165 166 166 167

The SAS System 08:50 Wednesday, May 25, 2005

**************************************************************** ! *****; ***** RESULTS: COMPARE LAST HALF WITH ALL (AFTER BURN-IN) ! *******; **************************************************************** ! *****; replics = replics-burn; ! out1 = output1[replics/2+1:replics,]; ! out = output1[1:replics,]; ! create exp from out1; ! append from out1; ! summary var _num_; ! close exp; ! create exp from out; ! append from out; ! summary var _num_; ! close exp; ! **************************************************************** ! *****; ****** RESULTS: POSTERIOR MEAN AND SD - TABLE 13.3 P.454 ! ********; **************************************************************** ! *****; xnames1 = {"CONSTANT"} || {"R1"}; ! xnames2 = {"CONSTANT"} || {"R2"}; ! parnames = concat({"d1"}," ",xnames1)||concat({"d2"}," ! ",xnames2)||{"SIGMA11"}||{"SIGMA12"}||{"SIGMA22"};

245

168 168 169 169 170 170 171 171

meanout = out[+,]/replics; ! stderr = ! sqrt(((out-j(replics,1,1)*meanout)##2)[+,]/(replics-1)); parm = meanout; ! stderr = stderr`; !

246

6 172 172 173 174 174 175 175 176 176 177 177 178 178 179 179 180 180 181 181 182 182 183 183 184 185 185 186 186 187 187 188 189 189 190 191 191 192 193 193 194 194 195 195 196 196 197 198 198 199

tnpar = npar1 + npar2 + 3;
tstat = parm`/ stderr;
coeff = parm` || stderr || tstat;
info = tau // nobs // replics // burn // tnpar;
rowinfo = {'TAU' '# OBSERVATIONS' '# REPLICATIONS' '# BURN-IN' '# PARAMETERS'};
estcol = {'ESTIMATE' 'STD ERR' 'T-STAT'};
mattrib info rowname=rowinfo label={" "};
mattrib coeff rowname=parnames colname=estcol label={" "};
print / "Results for Table 13.3 p.454";
print info;
print coeff;
*********************************************************************;
********** RESULTS: CONVERGENCE CHECK: SEE P.454 ***************;
*********************************************************************;
print / "Convergence check on p.454";
corr = j(20,7,0);
do i = 1 to 7;
   cov = covlag(out[,i],20)`;
   corr[,i] = cov/cov[1];
end;
covd1 = j(20,2,0);


do k = 1 to 3;
   covd1 = corr[,2*k-1:2*k];
   print covd1;
end;


covd1 = corr[,7];
print covd1;

finish main;
NOTE: Module MAIN defined.

run main;
NOTE: The data set WORK.EXP has 25000 observations and 7 variables.
NOTE: The data set WORK.EXP has 50000 observations and 7 variables.
NOTE: Exiting IML.
NOTE: 65925 workspace compresses.
NOTE: The PROCEDURE IML printed pages 1-6.
NOTE: PROCEDURE IML used (Total process time):
      real time           5:44.35
      cpu time            5:44.04

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           5:45.48
      cpu time            5:45.15
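The convergence check computes, for each of the 7 parameters, the first 20 autocovariances of the retained chain with covlag and divides by the lag-0 value, giving autocorrelations at lags 0 to 19. A Python sketch of that calculation is below. The divisor n for each autocovariance is an assumption about covlag's convention; it affects only the (n - j)/n scaling at longer lags. The AR(1) chain is simulated, hypothetical data used only because its autocorrelations are known to be close to 0.9 to the power of the lag.

```python
import random


def autocorrelations(x, nlags=20):
    """Sample autocorrelations at lags 0..nlags-1, like cov/cov[1] after covlag.

    Assumption: each autocovariance is divided by n (a common convention).
    """
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    gamma = [sum(dev[t] * dev[t - j] for t in range(j, n)) / n
             for j in range(nlags)]
    return [g / gamma[0] for g in gamma]


# Hypothetical AR(1) chain with rho = 0.9, the same length as the retained draws
rng = random.Random(2005)
rho, chain, value = 0.9, [], 0.0
for _ in range(50000):
    value = rho * value + rng.gauss(0.0, 1.0)
    chain.append(value)
acf = autocorrelations(chain)
print([round(a, 3) for a in acf[:5]])   # should decay roughly like 0.9 ** lag
```

Slowly decaying autocorrelations in the Gibbs output would signal poor mixing and the need for a longer run or burn-in.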


------------------------------------------------------------------------------
      log:  c:\Imbook\bwebpage\Section4\mma14p1binary.txt
 log type:  text
opened on:  19 May 2005, 09:01:28

. ********** OVERVIEW OF MMA14P1BINARY.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 14.2 (pages 464-6) Logit and probit models.
. * Provides
. *   (1) Table 14.1: Data summary
. *   (2) Table 14.2: Logit, Probit and OLS slope estimates
. *   (3) Figure 14.1: Plot of Logit, Probit and OLS predicted probabilities
.
. * To run this program you need data file
. *   Nldata.asc
.
. ********** SETUP
.
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
.
. ********** DATA DESCRIPTION
.
. * Data Set comes from :
. *   J. A. Herriges and C. L. Kling,
. *   "Nonlinear Income Effects in Random Utility Models",
. *   Review of Economics and Statistics, 81(1999): 62-72
.
. * The data are given as a combined observation with data on all 4 choices.
. * This will work for multinomial logit program.
. * For conditional logit will need to make a new data set which has
. * four separate entries for each observation as there are four alternatives.
.
. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 4 lines with 4 variables per line
. * so 4 x 1182 = 4728 lines
. * Variable Number and Description
. * 1  Recreation mode choice. = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter

. * 2  Price for chosen alternative
. * 3  Catch rate for chosen alternative
. * 4  = 1 if beach mode chosen; = 0 otherwise
. * 5  = 1 if pier mode chosen; = 0 otherwise
. * 6  = 1 if private boat mode chosen; = 0 otherwise
. * 7  = 1 if charter boat mode chosen; = 0 otherwise
. * 8  = price for beach mode
. * 9  = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
.
. ********** READ IN DATA **********
.
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)

. * Divide income by 1000 so that results are easy to read
. gen ydiv1000 = income/1000
.
. label define modetype 1 "beach" 2 "pier" 3 "private" 4 "charter"
. label values mode modetype
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5

.
. ********** CREATE BINARY DATA: CHARTER vs PIER **********
.
. * Binary logit of charter (mode = 4) versus pier (mode = 2)
. keep if mode == 2 | mode == 4
(552 observations deleted)

. * charter is 1 if fish from charter boat and 0 if fish from pier
. gen charter = 0
. replace charter = 1 if mode == 4
(452 real changes made)

.
. gen pratio = 100*ln(pcharter/ppier)
. gen lnrelp = ln(pchart/ppier)
.
. * Overall summary
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       630    3.434921    .9011843          2          4
       price |       630    62.51669    52.31219       1.29    387.208
       crate |       630    .5533478    .6953035      .0014     2.3101
      dbeach |       630           0           0          0          0
       dpier |       630    .2825397    .4505921          0          1
-------------+--------------------------------------------------------
    dprivate |       630           0           0          0          0
    dcharter |       630    .7174603    .4505921          0          1
      pbeach |       630    95.19802    95.62037       1.29    578.048
       ppier |       630    95.19802    95.62037       1.29    578.048
    pprivate |       630    55.26221    59.99482       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       630    84.89158    60.79327      27.29    529.058
      qbeach |       630    .2546022    .1983357      .0678      .5333
       qpier |       630    .1716835    .1687288      .0014      .4522
    qprivate |       630    .1695303    .2033172      .0014      .7369
    qcharter |       630    .6368509     .688508      .0029     2.3101
-------------+--------------------------------------------------------
      income |       630    3741.402     2145.71   416.6667      12500
    ydiv1000 |       630    3.741402     2.14571   .4166667       12.5
     charter |       630    .7174603    .4505921          0          1
      pratio |       630    27.45581    126.2598  -215.3976   406.2712
      lnrelp |       630    .2745581    1.262598  -2.153976   4.062713

. * Summary by charter or by pier
. sort mode
. by mode: summarize

------------------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       178           2           0          2          2
       price |       178    30.57133    35.58442       1.29    224.296
       crate |       178    .2025348    .1702942      .0014      .4522
      dbeach |       178           0           0          0          0
       dpier |       178           1           0          1          1
-------------+--------------------------------------------------------
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       178    109.7633    72.37726      27.29    529.058
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
-------------+--------------------------------------------------------
      income |       178    3387.172    2340.324   416.6667      12500
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5
     charter |       178           0           0          0          0
      pratio |       178    164.2956    104.3052  -79.13918   406.2712
      lnrelp |       178    1.642956    1.043052  -.7913917   4.062713

------------------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       452           4           0          4          4
       price |       452    75.09694    52.51942      27.29    387.208
       crate |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
       dpier |       452           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
-------------+--------------------------------------------------------
    pcharter |       452    75.09694    52.51942      27.29    387.208
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
-------------+--------------------------------------------------------
      income |       452      3880.9    2050.028   416.6667      12500
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5
     charter |       452           1           0          1          1
      pratio |       452   -26.43243    87.53686  -215.3976   235.8242
      lnrelp |       452   -.2643243    .8753686  -2.153976   2.358242

.
. * Write final data to a text (ascii) file so can use with programs other than Stata
. outfile charter lnrelp using mma14p1binary.asc, replace
.
. ********** TABLE 14.1 - DATA SUMMARY BY OUTCOME AND OVERALL **********
.
. * Following gives Table 14.1 page 464
. summarize charter pcharter ppier lnrelp

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     charter |       630    .7174603    .4505921          0          1
    pcharter |       630    84.89158    60.79327      27.29    529.058
       ppier |       630    95.19802    95.62037       1.29    578.048
      lnrelp |       630    .2745581    1.262598  -2.153976   4.062713

. sort mode
. by mode: summarize charter pcharter ppier lnrelp

------------------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     charter |       178           0           0          0          0
    pcharter |       178    109.7633    72.37726      27.29    529.058
       ppier |       178    30.57133    35.58442       1.29    224.296
      lnrelp |       178    1.642956    1.043052  -.7913917   4.062713

------------------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     charter |       452           1           0          1          1
    pcharter |       452    75.09694    52.51942      27.29    387.208
       ppier |       452    120.6483    99.78664       4.29    578.048
      lnrelp |       452   -.2643243    .8753686  -2.153976   2.358242

.
. ********** TABLE 14.2 - ESTIMATE LOGIT, PROBIT AND OLS MODELS
.
. logit charter lnrelp

Iteration 0:   log likelihood = -375.06167
Iteration 1:   log likelihood = -223.44527
Iteration 2:   log likelihood = -208.29369
Iteration 3:   log likelihood = -206.84942
Iteration 4:   log likelihood = -206.82698
Iteration 5:   log likelihood = -206.82697

Logit estimates                                   Number of obs   =        630
                                                  LR chi2(1)      =     336.47
                                                  Prob > chi2     =     0.0000
Log likelihood = -206.82697                       Pseudo R2       =     0.4486

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |   -1.82253   .1445681   -12.61   0.000    -2.105879   -1.539182
       _cons |   2.053125   .1689307    12.15   0.000     1.722027    2.384223
------------------------------------------------------------------------------

. estimates store blogit
.
. probit charter lnrelp

Iteration 0:   log likelihood = -375.06167
Iteration 1:   log likelihood = -221.55989
Iteration 2:   log likelihood = -205.42312
Iteration 3:   log likelihood = -204.41773
Iteration 4:   log likelihood = -204.41087

Probit estimates                                  Number of obs   =        630
                                                  LR chi2(1)      =     341.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -204.41087                       Pseudo R2       =     0.4550

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -1.055515   .0761117   -13.87   0.000    -1.204691   -.9063383
       _cons |    1.19436    .089504    13.34   0.000     1.018936    1.369785
------------------------------------------------------------------------------

. estimates store bprobit
.
. regress charter lnrelp

      Source |       SS       df       MS              Number of obs =     630
-------------+------------------------------           F(  1,   628) =  542.12
       Model |  59.1676598     1  59.1676598           Prob > F      =  0.0000
    Residual |  68.5402767   628  .109140568           R-squared     =  0.4633
-------------+------------------------------           Adj R-squared =  0.4624
       Total |  127.707937   629  .203033285           Root MSE      =  .33036

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -.2429137   .0104328   -23.28   0.000    -.2634011   -.2224262
       _cons |   .7841542   .0134701    58.21   0.000     .7577023    .8106061
------------------------------------------------------------------------------

. estimates store bOLS
.
. * Heteroskedastic robust standard errors only needed for OLS
. * but given for other models for completeness
.
. logit charter lnrelp, robust

Iteration 0:   log pseudo-likelihood = -375.06167
Iteration 1:   log pseudo-likelihood = -223.44527
Iteration 2:   log pseudo-likelihood = -208.29369
Iteration 3:   log pseudo-likelihood = -206.84942
Iteration 4:   log pseudo-likelihood = -206.82698
Iteration 5:   log pseudo-likelihood = -206.82697

Logit estimates                                   Number of obs   =        630
                                                  Wald chi2(1)    =     194.28
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -206.82697                Pseudo R2       =     0.4486

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |   -1.82253   .1307556   -13.94   0.000    -2.078807   -1.566254
       _cons |   2.053125   .1473477    13.93   0.000     1.764329    2.341921
------------------------------------------------------------------------------

. estimates store bloghet
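The header statistics in the logit and probit output can be reproduced by hand. Iteration 0 is the intercept-only log likelihood, which depends only on the outcome counts (452 charter and 178 pier, from the by-mode summaries); the LR statistic is twice the gain over it, and Stata's Pseudo R2 is McFadden's 1 - lnL/lnL0. With robust standard errors Stata reports a Wald statistic instead, which with a single regressor is the squared z statistic. The Python sketch below is a check written for this listing, not part of the original program; the inputs are copied from the output above.

```python
import math

# Outcome counts from the summaries: 452 charter (charter = 1) and 178 pier (charter = 0)
n1, n0 = 452, 178
p = n1 / (n1 + n0)

# Intercept-only log likelihood: the value reported at "Iteration 0"
ll0 = n1 * math.log(p) + n0 * math.log(1 - p)


def lr_and_pseudo_r2(ll, ll_null):
    """Likelihood-ratio statistic and McFadden pseudo-R2 versus the null model."""
    return 2.0 * (ll - ll_null), 1.0 - ll / ll_null


lr_logit, r2_logit = lr_and_pseudo_r2(-206.82697, ll0)
lr_probit, r2_probit = lr_and_pseudo_r2(-204.41087, ll0)
print(ll0)
print(lr_logit, r2_logit)     # compare LR chi2(1) = 336.47, Pseudo R2 = 0.4486
print(lr_probit, r2_probit)   # compare LR chi2(1) = 341.30, Pseudo R2 = 0.4550

# Robust runs: Wald chi2(1) is (coefficient / robust standard error) squared
wald_logit = (-1.82253 / 0.1307556) ** 2
wald_probit = (-1.055515 / 0.0692881) ** 2
print(wald_logit, wald_probit)   # compare Wald chi2(1) = 194.28 and 232.07
```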

.
. probit charter lnrelp, robust

Iteration 0:   log pseudo-likelihood = -375.06167
Iteration 1:   log pseudo-likelihood = -221.55989
Iteration 2:   log pseudo-likelihood = -205.42312
Iteration 3:   log pseudo-likelihood = -204.41773
Iteration 4:   log pseudo-likelihood = -204.41087

Probit estimates                                  Number of obs   =        630
                                                  Wald chi2(1)    =     232.07
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -204.41087                Pseudo R2       =     0.4550

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -1.055515   .0692881   -15.23   0.000    -1.191317   -.9197122
       _cons |    1.19436   .0794429    15.03   0.000     1.038655    1.350066
------------------------------------------------------------------------------

. estimates store bprobhet
.
. regress charter lnrelp, robust

Regression with robust standard errors                 Number of obs =     630
                                                       F(  1,   628) =  792.44
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4633
                                                       Root MSE      =  .33036

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -.2429137   .0086292   -28.15   0.000    -.2598592   -.2259681
       _cons |   .7841542   .0119566    65.58   0.000     .7606744    .8076341
------------------------------------------------------------------------------

. estimates store bOLShet
.
. * Following gives Table 14.2 page 465
. estimates table blogit bprobit bOLS bloghet bprobhet bOLShet, /*
> */ t stats(N ll r2 r2_p) b(%8.3f) keep(_cons lnrelp)

--------------------------------------------------------------------------------
    Variable |  blogit     bprobit      bOLS      bloghet    bprobhet    bOLShet
-------------+------------------------------------------------------------------
       _cons |    2.053       1.194       0.784       2.053       1.194     0.784
             |    12.15       13.34       58.21       13.93       15.03     65.58
      lnrelp |   -1.823      -1.056      -0.243      -1.823      -1.056    -0.243
             |   -12.61      -13.87      -23.28      -13.94      -15.23    -28.15
-------------+------------------------------------------------------------------
           N |  630.000     630.000     630.000     630.000     630.000   630.000
          ll | -206.827    -204.411    -195.167    -206.827    -204.411  -195.167
          r2 |                          0.463                             0.463
        r2_p |    0.449       0.455                   0.449       0.455
--------------------------------------------------------------------------------
                                                                  legend: b/t
.
. ********** FIGURE 14.1 - PLOT PREDICTED PROBABILITY AGAINST X FOR MODELS
.
. quietly logit charter lnrelp
. predict plogit, p
.
. quietly probit charter lnrelp
. predict pprobit, p
.
. quietly regress charter lnrelp
. predict pOLS
(option xb assumed; fitted values)

.
. sum charter plogit pprobit pOLS

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     charter |       630    .7174603    .4505921          0          1
      plogit |       630    .7174603    .3193077   .0047196   .9974746
     pprobit |       630      .72019    .3196164   .0009877   .9997377
        pOLS |       630    .7174603    .3067022  -.2027341   1.307384

.
. sort lnrelp
.
. * Following gives Figure 14.1 page 466
. graph twoway (scatter charter lnrelp, msize(vsmall) jitter(3)) /*
> */ (line plogit lnrelp, clstyle(p1)) /*
> */ (line pprobit lnrelp, clstyle(p2)) /*
> */ (line pOLS lnrelp, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Predicted Probabilities Across Models") /*
> */ xtitle("Log relative price (lnrelp)", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Predicted probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data (jittered)") label(2 "Logit") /*
> */ label(3 "Probit") label(4 "OLS"))

. graph export ch14binary.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch14binary.wmf written in Windows Metafile format)

.
. ********** CLOSE OUTPUT **********
. log close
      log:  c:\Imbook\bwebpage\Section4\mma14p1binary.txt
 log type:  text
closed on:  19 May 2005, 09:01:31
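Figure 14.1 plots the three fitted probability curves against lnrelp. Because each fitted probability is monotone in lnrelp, the Min and Max of plogit, pprobit and pOLS in the summary occur at the extremes of lnrelp (-2.153976 and 4.062713), and they can be recovered from the point estimates alone. The Python sketch below does that, illustrating that the OLS (linear probability) fit leaves [0, 1] while logit and probit do not; it also evaluates the marginal effect dp/dx at the mean of lnrelp, using the standard formulas p(1-p)b for logit and phi(a+bx)b for probit, for comparison with the OLS slope. The function names are hypothetical helpers for this check.

```python
import math

# Point estimates copied from the logit, probit and OLS output: (constant, slope)
LOGIT = (2.053125, -1.82253)
PROBIT = (1.19436, -1.055515)
OLS = (0.7841542, -0.2429137)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p_logit(x):
    a, b = LOGIT
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def p_probit(x):
    a, b = PROBIT
    return norm_cdf(a + b * x)


def p_ols(x):
    a, b = OLS
    return a + b * x   # linear probability model: not confined to [0, 1]


# Extremes and mean of lnrelp in the estimation sample
for x in (-2.153976, 0.2745581, 4.062713):
    print(x, p_logit(x), p_probit(x), p_ols(x))

# Marginal effects dp/dx evaluated at the sample mean of lnrelp
xbar = 0.2745581
me_logit = p_logit(xbar) * (1 - p_logit(xbar)) * LOGIT[1]
me_probit = norm_pdf(PROBIT[0] + PROBIT[1] * xbar) * PROBIT[1]
print(me_logit, me_probit, OLS[1])
print(LOGIT[1] / PROBIT[1])   # ratio of logit to probit slope coefficients
```

The coefficient ratio is close to 1.7, the usual rough scaling between logit and probit coefficients, which is why the raw coefficients in Table 14.2 differ so much while the fitted curves nearly coincide.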


------------------------------------------------------------------------------
      log:  c:\Imbook\bwebpage\Section4\mma15p1mnl.txt
 log type:  text
opened on:  19 May 2005, 12:16:20

. ********** OVERVIEW OF MMA15P1MNL.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 15.2.1-3 pages 491-5
. * Multinomial and conditional logit models analysis.
. * It provides ....
. *   (0)  Data summary (Table 15.1)
. *   (1A) Multinomial Logit estimates (Table 15.2)
. *   (1B) Multinomial Logit marginal effects (text page 494)
. *   (2A) Conditional Logit estimates (Table 15.2)
. *   (2B) Conditional Logit marginal effects (Table 15.3)
. *   (3)  Multinomial estimates obtained using Conditional Logit
. *   (4)  "Mixed Model" estimates (Table 15.1)
.
. * Related programs are
. *   mma15p2gev.do  estimates a nested logit model using Stata
. *   mma15p3mnl.lim estimates multinomial models using Limdep
. *   mma15p4gev.lim estimates conditional and nested logit models using Limdep
.
. * To run this program you need data file
. *   Nldata.asc
.
. /* Program summary:
>
> (1) Multinomial logit of mode on alternative-invariant regressor (income)
>     mlogit mode income
>
> (2) Conditional logit of mode on alternative-specific regressor (price, catch rate)
>     First reshape data so 4 observations per individual - one for each mode.
>     clogit mode p q
>
> (3) Conditional logit of mode on alternative-invariant regressor (income)
>     First reshape data so 4 observations per individual - one for each mode.
>     Then create dummy variables for each mode d2 d3 d4
>     clogit mode d2 d3 d4 d2y d3y d4y
>     This gives same results as (1)
>
> (4) Conditional logit of mode on alternative-invariant regressor (income)
>     and on alternative-specific regressor (price, catch rate)
>     First reshape data so 4 observations per individual - one for each mode.
>     Then create dummy variables for each mode d2 d3 d4
>     clogit mode d2 d3 d4 d2y d3y d4y p q
> */
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * Data Set comes from :
. *   J. A. Herriges and C. L. Kling,
. *   "Nonlinear Income Effects in Random Utility Models",
. *   Review of Economics and Statistics, 81(1999): 62-72
.
. * The data are given as a combined observation with data on all 4 choices.
. * This will work for multinomial logit program.
. * For conditional logit will need to make a new data set which has
. * four separate entries for each observation as there are four alternatives.
.
. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 4 lines with 4 variables per line
. * so 4 x 1182 = 4728 lines
. * Variable Number and Description
. * 1  Recreation mode choice. = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. * 2  Price for chosen alternative
. * 3  Catch rate for chosen alternative
. * 4  = 1 if beach mode chosen; = 0 otherwise
. * 5  = 1 if pier mode chosen; = 0 otherwise
. * 6  = 1 if private boat mode chosen; = 0 otherwise
. * 7  = 1 if charter boat mode chosen; = 0 otherwise
. * 8  = price for beach mode
. * 9  = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
.
. ********** READ IN DATA and SUMMARIZE (Table 15.1, p.492) **********
.
. * Method to read in depends on model used
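Steps (2) to (4) of the program summary require each wide record to be reshaped into four alternative-level rows before clogit is run, with dummies d2-d4 (beach as the base) and their income interactions d2y-d4y. The Python sketch below shows that reshape for a single record; it is an illustration, not the program's Stata code. Variable names follow the infile statement, while `id`, `alt`, `chosen`, `p`, `q` and the record's values are hypothetical. Using ydiv1000 (income in thousands) for the interactions is an assumption, matching the regressor used in the mlogit command later in the log.

```python
MODES = ("beach", "pier", "private", "charter")   # mode = 1, 2, 3, 4


def to_long(obs):
    """Expand one wide record into four alternative-level rows for conditional logit.

    Field names follow the infile statement; "id" is a hypothetical identifier.
    """
    rows = []
    for j, name in enumerate(MODES, start=1):
        row = {
            "id": obs["id"],
            "alt": j,
            "chosen": int(obs["mode"] == j),   # 0/1 outcome for clogit
            "p": obs["p" + name],              # alternative-specific price
            "q": obs["q" + name],              # alternative-specific catch rate
        }
        # Alternative dummies d2-d4 (beach is the base) and income interactions
        for k in (2, 3, 4):
            row["d%d" % k] = int(j == k)
            row["d%dy" % k] = row["d%d" % k] * obs["ydiv1000"]
        rows.append(row)
    return rows


# Illustrative record (made-up values, not taken from nldata.asc)
obs = {"id": 1, "mode": 4, "ydiv1000": 4.0,
       "pbeach": 100.0, "ppier": 100.0, "pprivate": 50.0, "pcharter": 80.0,
       "qbeach": 0.25, "qpier": 0.15, "qprivate": 0.20, "qcharter": 0.60}
for row in to_long(obs):
    print(row)
```

In the long form an alternative-invariant regressor such as income can enter the conditional logit only through its interactions with the alternative dummies, which is why step (3) reproduces the multinomial logit of step (1).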

.
. /* Data are on fishing mode: 1 beach, 2 pier, 3 private boat, 4 charter
> Data come as one observation having data for all 4 modes.
> Both alternative specific and alternative invariant regressors.
> */
.
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)

.
. gen ydiv1000 = income/1000
.
. * Look at data by alternative
. label define modetype 1 "beach" 2 "pier" 3 "private" 4 "charter"
. label values mode modetype
.
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5

. sort mode
. by mode: summarize

------------------------------------------------------------------------------

-> mode = beach

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       134           1           0          1          1
       price |       134    35.69949    43.09414       1.29     306.82
       crate |       134    .2791948    .1938734      .0678      .5333
      dbeach |       134           1           0          1          1
       dpier |       134           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       134           0           0          0          0
    dcharter |       134           0           0          0          0
      pbeach |       134    35.69949    43.09414       1.29     306.82
       ppier |       134    35.69949    43.09414       1.29     306.82
    pprivate |       134    97.80913    75.43844       2.29    392.946
-------------+--------------------------------------------------------
    pcharter |       134    125.0032    78.37641      27.29    427.946
      qbeach |       134    .2791948    .1938734      .0678      .5333
       qpier |       134    .2190015    .1677117      .0025      .4522
    qprivate |       134    .1593985    .0948855      .0008      .2601
    qcharter |       134    .5176089    .3629096      .0027     1.0266
-------------+--------------------------------------------------------
      income |       134    4051.617     2505.42   416.6667      12500
    ydiv1000 |       134    4.051617     2.50542   .4166667       12.5

------------------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       178           2           0          2          2
       price |       178    30.57133    35.58442       1.29    224.296
       crate |       178    .2025348    .1702942      .0014      .4522
      dbeach |       178           0           0          0          0
       dpier |       178           1           0          1          1
-------------+--------------------------------------------------------
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       178    109.7633    72.37726      27.29    529.058
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
-------------+--------------------------------------------------------
      income |       178    3387.172    2340.324   416.6667      12500
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5


------------------------------------------------------------------------------
-> mode = private

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       418           3           0          3          3
       price |       418    41.60681    55.90806       2.29     666.11
       crate |       418    .1775411    .2435798      .0002      .7369
      dbeach |       418           0           0          0          0
       dpier |       418           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       418           1           0          1          1
    dcharter |       418           0           0          0          0
      pbeach |       418    137.5271    115.3058       2.29    843.186
       ppier |       418    137.5271    115.3058       2.29    843.186
    pprivate |       418    41.60681    55.90806       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |       418    70.58409    56.39575      27.29     691.11
      qbeach |       418    .2082868    .1729351      .0678      .5333
       qpier |       418    .1297646    .1368029      .0025      .4522
    qprivate |       418    .1775411    .2435798      .0002      .7369
    qcharter |       418    .6539167    .8064379      .0021     2.3101
-------------+--------------------------------------------------------
      income |       418    4654.107    2777.898   416.6667      12500
    ydiv1000 |       418    4.654107    2.777898   .4166667       12.5

------------------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       452           4           0          4          4
       price |       452    75.09694    52.51942      27.29    387.208
       crate |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
       dpier |       452           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
-------------+--------------------------------------------------------
    pcharter |       452    75.09694    52.51942      27.29    387.208
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
-------------+--------------------------------------------------------
      income |       452      3880.9    2050.028   416.6667      12500
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5

.
. * Following commands give Table 15.1, p.492
. summarize ydiv100 pbeach ppier pprivate pcharter qbeach qpier /*
> */ qprivate qcharter dbeach dpier dprivate dcharter

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
    pcharter |      1182    84.37924    63.54465      27.29     691.11
-------------+--------------------------------------------------------
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
-------------+--------------------------------------------------------
       dpier |      1182    .1505922    .3578023          0          1
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1

. sort mode
. by mode: summarize ydiv100 pbeach ppier pprivate pcharter qbeach qpier /*
> */ qprivate qcharter dbeach dpier dprivate dcharter

------------------------------------------------------------------------------
-> mode = beach

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       134    4.051617     2.50542   .4166667       12.5
      pbeach |       134    35.69949    43.09414       1.29     306.82
       ppier |       134    35.69949    43.09414       1.29     306.82
    pprivate |       134    97.80913    75.43844       2.29    392.946
    pcharter |       134    125.0032    78.37641      27.29    427.946
-------------+--------------------------------------------------------
      qbeach |       134    .2791948    .1938734      .0678      .5333
       qpier |       134    .2190015    .1677117      .0025      .4522
    qprivate |       134    .1593985    .0948855      .0008      .2601
    qcharter |       134    .5176089    .3629096      .0027     1.0266
      dbeach |       134           1           0          1          1
-------------+--------------------------------------------------------
       dpier |       134           0           0          0          0
    dprivate |       134           0           0          0          0
    dcharter |       134           0           0          0          0


------------------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
    pcharter |       178    109.7633    72.37726      27.29    529.058
-------------+--------------------------------------------------------
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
      dbeach |       178           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       178           1           0          1          1
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0

------------------------------------------------------------------------------
-> mode = private

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       418    4.654107    2.777898   .4166667       12.5
      pbeach |       418    137.5271    115.3058       2.29    843.186
       ppier |       418    137.5271    115.3058       2.29    843.186
    pprivate |       418    41.60681    55.90806       2.29     666.11
    pcharter |       418    70.58409    56.39575      27.29     691.11
-------------+--------------------------------------------------------
      qbeach |       418    .2082868    .1729351      .0678      .5333
       qpier |       418    .1297646    .1368029      .0025      .4522
    qprivate |       418    .1775411    .2435798      .0002      .7369
    qcharter |       418    .6539167    .8064379      .0021     2.3101
      dbeach |       418           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       418           0           0          0          0
    dprivate |       418           1           0          1          1
    dcharter |       418           0           0          0          0

------------------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
    pcharter |       452    75.09694    52.51942      27.29    387.208
-------------+--------------------------------------------------------
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       452           0           0          0          0
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1

. 
. ********** (1) MULTINOMIAL LOGIT: ALTERNATIVE-INVARIANT REGRESSOR *********
. 
. *** (1A) Estimate the model
. 
. * Data are already in form for mlogit
. 
. * The following gives MNL column of Table 15.2, p.493
. mlogit mode ydiv1000, basecategory(1)

Iteration 0:   log likelihood = -1497.7229
Iteration 1:   log likelihood = -1477.5265
Iteration 2:   log likelihood = -1477.1514
Iteration 3:   log likelihood = -1477.1506

Multinomial logistic regression                   Number of obs   =       1182
                                                  LR chi2(3)      =      41.14
                                                  Prob > chi2     =     0.0000
Log likelihood = -1477.1506                       Pseudo R2       =     0.0137

------------------------------------------------------------------------------
        mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
pier         |
    ydiv1000 |  -.1434029   .0532882    -2.69   0.007     -.2478459    -.03896
       _cons |   .8141503   .2286316     3.56   0.000      .3660405    1.26226
-------------+----------------------------------------------------------------
private      |
    ydiv1000 |   .0919064   .0406638     2.26   0.024      .0122069   .1716059
       _cons |   .7389208   .1967309     3.76   0.000      .3533352   1.124506
-------------+----------------------------------------------------------------
charter      |
    ydiv1000 |  -.0316399   .0418463    -0.76   0.450     -.1136571   .0503774
       _cons |   1.341291   .1945167     6.90   0.000      .9600457   1.722537
------------------------------------------------------------------------------
(Outcome mode==beach is the comparison group)
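As a cross-check on the fitted model, the MNL probabilities can be evaluated by hand from the printed coefficients. With beach as the base outcome, p_j = exp(a_j + b_j*y) / sum_k exp(a_k + b_k*y), where a_beach = b_beach = 0 and y is ydiv1000. The sketch below is in Python rather than Stata only so that it is self-contained; the names COEFS and mnl_probs are hypothetical. Evaluated at y = 4.09934, the sample mean that mfx reports later in this log, it should reproduce the Pr(mode==j) values printed by mfx, up to rounding of the displayed coefficients.

```python
import math

# MNL coefficients (constant, ydiv1000) copied from the mlogit output;
# beach is the base outcome, so its coefficients are normalized to zero.
COEFS = {
    "beach":   (0.0, 0.0),
    "pier":    (0.8141503, -0.1434029),
    "private": (0.7389208, 0.0919064),
    "charter": (1.341291, -0.0316399),
}

def mnl_probs(income_k):
    """Multinomial logit probabilities at ydiv1000 = income_k."""
    index = {j: a + b * income_k for j, (a, b) in COEFS.items()}
    denom = sum(math.exp(v) for v in index.values())
    return {j: math.exp(v) / denom for j, v in index.items()}

if __name__ == "__main__":
    for mode, prob in mnl_probs(4.09934).items():  # 4.09934 = mean of ydiv1000
        print(f"{mode:8s} {prob:.6f}")
```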


. 
. *** (1B) Calculate the marginal effects
. 
. quietly mlogit mode ydiv1000, basecategory(1)

. * Predict by default gives the probabilities
. predict p1 p2 p3 p4
(option p assumed; predicted probabilities)

. 
. * As check compare predicted to actual probabilities
. summarize dbeach p1 dpier p2 dprivate p3 dcharter p4

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      dbeach |      1182    .1133672    .3171753          0          1
          p1 |      1182    .1133672    .0036716   .0947395   .1153659
       dpier |      1182    .1505922    .3578023          0          1
          p2 |      1182    .1505922    .0444575   .0356142   .2342903
    dprivate |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    .3536379    .0797714   .2396973    .625706
    dcharter |      1182    .3824027    .4861799          0          1
          p4 |      1182    .3824027    .0346281   .2439403   .4158273

. 
. * A quick way to compute marginal effects (or semi-elasticities dp/dlnx, or elasticities)
. * is Stata's built-in mfx command, which evaluates them at the sample mean;
. * the options are dydx, eyex, dyex or eydx
. mfx compute, dydx predict(outcome(1))

Marginal effects after mlogit
      y  = Pr(mode==1) (predict, outcome(1))
         =  .11541492
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |    .000075      .00393    0.02   0.985  -.007635  .007785   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(2))

Marginal effects after mlogit
      y  = Pr(mode==2) (predict, outcome(2))
         =  .14472379
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |  -.0206598      .00487   -4.24   0.000  -.030212 -.011108   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(3))

Marginal effects after mlogit
      y  = Pr(mode==3) (predict, outcome(3))
         =  .35220366
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |   .0325985      .00569    5.73   0.000   .021442  .043755   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(4))

Marginal effects after mlogit
      y  = Pr(mode==4) (predict, outcome(4))
         =  .38765763
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |  -.0120137      .00608   -1.98   0.048  -.023922 -.000106   4.09934
------------------------------------------------------------------------------

. 
. * A better approach is to evaluate the marginal effect for each observation and then average
. * The following calculates marginal effects using noncalculus methods
. * by comparing the predicted probability before and after a change in x
. * Here consider a small change of 0.0001, then multiply by 10000
. * So results should be similar to those from calculus methods.
. replace ydiv1000 = ydiv1000 + 0.0001
(1182 real changes made)

. predict p1new p2new p3new p4new
(option p assumed; predicted probabilities)

. gen dp1dy = 10000*(p1new - p1)
. gen dp2dy = 10000*(p2new - p2)
. gen dp3dy = 10000*(p3new - p3)
. gen dp4dy = 10000*(p4new - p4)
. 
. * The computed marginal effects follow.
. * These are close to those given in text page 494 (which were calculated using Limdep)
. sum dp1dy dp2dy dp3dy dp4dy

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       dp1dy |      1182    .0001549    .0015919  -.0042468   .0027567
       dp2dy |      1182   -.0207849    .0046004  -.0278652  -.0067055
       dp3dy |      1182    .0318045    .0014852   .0280142   .0336766
       dp4dy |      1182   -.0111929    .0041308  -.0190735  -.0026822
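The mfx results are marginal effects at the mean, whereas dp1dy-dp4dy average observation-level finite differences. For the MNL model the calculus derivative is dp_j/dy = p_j*(b_j - sum_k p_k*b_k), so the four effects sum to zero. The Python sketch below assumes only the coefficients printed above (the function names are hypothetical). It computes that derivative at the mean income and the same forward difference with h = 0.0001 used in the log; the two should agree closely and match the mfx dy/dx column. Reproducing the averaged dp*dy means would require each individual's income, which this log does not list.

```python
import math

# mlogit coefficients, ordered beach (base), pier, private, charter
CONS = [0.0, 0.8141503, 0.7389208, 1.341291]
SLOPE = [0.0, -0.1434029, 0.0919064, -0.0316399]   # coefficients on ydiv1000

def probs(y):
    """MNL probabilities at ydiv1000 = y."""
    e = [math.exp(a + b * y) for a, b in zip(CONS, SLOPE)]
    total = sum(e)
    return [v / total for v in e]

def me_analytic(y):
    """Calculus marginal effects: p_j * (b_j - sum_k p_k * b_k)."""
    p = probs(y)
    beta_bar = sum(pk * bk for pk, bk in zip(p, SLOPE))
    return [pj * (bj - beta_bar) for pj, bj in zip(p, SLOPE)]

def me_finite_diff(y, h=0.0001):
    """Noncalculus version used in the log: (p(y + h) - p(y)) / h."""
    before, after = probs(y), probs(y + h)
    return [(b - a) / h for a, b in zip(before, after)]

if __name__ == "__main__":
    ybar = 4.09934   # sample mean of ydiv1000 reported by mfx
    names = ["beach", "pier", "private", "charter"]
    for name, a, f in zip(names, me_analytic(ybar), me_finite_diff(ybar)):
        print(f"{name:8s} analytic {a: .6f}   finite difference {f: .6f}")
```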

. 
. * Note that here these are similar to the earlier values at means
. * This is because there is little variation in predicted probability across individuals here
. 
. * ASIDE: Binary logit will differ a little from MNL
. keep if mode == 1 | mode == 2
(870 observations deleted)

. mlogit mode ydiv1000

Iteration 0:   log likelihood = -213.14899
Iteration 1:   log likelihood = -210.28877
Iteration 2:   log likelihood = -210.28833

Multinomial logistic regression                   Number of obs   =        312
                                                  LR chi2(1)      =       5.72
                                                  Prob > chi2     =     0.0168
Log likelihood = -210.28833                       Pseudo R2       =     0.0134

------------------------------------------------------------------------------
        mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
beach        |
    ydiv1000 |   .1134757   .0481736     2.36   0.018      .0190571   .2078942
       _cons |  -.7037127   .2125851    -3.31   0.001     -1.120372  -.2870535
------------------------------------------------------------------------------
(Outcome mode==pier is the comparison group)

. 
. ******* (2) CONDITIONAL LOGIT: ALTERNATIVE-SPECIFIC REGRESSOR *********
. 
. *** (2A) Estimate the model
. 
. * This requires reshaping the data
. clear
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)

. 
. gen ydiv1000 = income/1000
. 
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this which also creates variable (see below)

. * alterntv = 1 if beach, = 2 if pier, = 3 if private boat, = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen p1 = pbeach
. gen q1 = qbeach
. gen d2 = dpier
. gen p2 = ppier
. gen q2 = qpier
. gen d3 = dprivate
. gen p3 = pprivate
. gen q3 = qprivate
. gen d4 = dcharter
. gen p4 = pcharter
. gen q4 = qcharter
. describe

Contains data
  obs:         1,182
 vars:            30
 size:       146,568 (98.6% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
id              float  %9.0g
d1              float  %9.0g
p1              float  %9.0g
q1              float  %9.0g
d2              float  %9.0g
p2              float  %9.0g
q2              float  %9.0g
d3              float  %9.0g
p3              float  %9.0g
q3              float  %9.0g
d4              float  %9.0g
p4              float  %9.0g
q4              float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          p1 |      1182     103.422     103.641       1.29    843.186
-------------+--------------------------------------------------------
          q1 |      1182    .2410113    .1907524      .0678      .5333
          d2 |      1182    .1505922    .3578023          0          1
          p2 |      1182     103.422     103.641       1.29    843.186
          q2 |      1182    .1622237    .1603898      .0014      .4522
          d3 |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    55.25657    62.71344       2.29     666.11
          q3 |      1182    .1712146    .2097885      .0002      .7369
          d4 |      1182    .3824027    .4861799          0          1
          p4 |      1182    84.37924    63.54465      27.29     691.11
          q4 |      1182    .6293679    .7061142      .0021     2.3101

. 
. reshape long d p q, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  30   ->      22
j variable (4 values)                     ->   alterntv
xij variables:
                            d1 d2 ... d4   ->   d
                            p1 p2 ... p4   ->   p
                            q1 q2 ... q4   ->   q
-----------------------------------------------------------------------------

. * This automatically creates alterntv = 1 (beach), ... 4 (charter)
. describe

Contains data
  obs:         4,728
 vars:            22
 size:       420,792 (95.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g

qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
p               float  %9.0g
q               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  alterntv
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101

. 
. clogit d q, group(id)

Iteration 0:   log likelihood = -1627.3339
Iteration 1:   log likelihood = -1604.8049
Iteration 2:   log likelihood = -1604.6163
Iteration 3:   log likelihood = -1604.6163

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(1)      =      67.97
                                                  Prob > chi2     =     0.0000
Log likelihood = -1604.6163                       Pseudo R2       =     0.0207

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           q |   .6307908   .0757624     8.33   0.000      .4822993    .7792823
------------------------------------------------------------------------------

. clogit d p, group(id)

Iteration 0:   log likelihood = -1595.7652
Iteration 1:   log likelihood = -1411.4335
Iteration 2:   log likelihood = -1376.0224
Iteration 3:   log likelihood = -1372.9619
Iteration 4:   log likelihood = -1372.9332
Iteration 5:   log likelihood = -1372.9332

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(1)      =     531.33
                                                  Prob > chi2     =     0.0000
Log likelihood = -1372.9332                       Pseudo R2       =     0.1621

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0179501   .0010694   -16.79   0.000     -.0200461   -.0158542
------------------------------------------------------------------------------

. 
. * The following gives CL column of Table 15.2
. clogit d p q, group(id)

Iteration 0:   log likelihood = -1581.9099
Iteration 1:   log likelihood = -1363.5718
Iteration 2:   log likelihood = -1317.8453
Iteration 3:   log likelihood = -1312.1013
Iteration 4:   log likelihood = -1311.9797
Iteration 5:   log likelihood = -1311.9796

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(2)      =     653.24
                                                  Prob > chi2     =     0.0000
Log likelihood = -1311.9796                       Pseudo R2       =     0.1993

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0204765   .0012231   -16.74   0.000     -.0228737   -.0180794
           q |   .9530985   .0894134    10.66   0.000      .7778514    1.128346
------------------------------------------------------------------------------
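In the conditional logit the regressors vary across alternatives while the coefficients do not: Pr_ij = exp(bp*p_ij + bq*q_ij) / sum_k exp(bp*p_ik + bq*q_ik). The Python sketch below is a minimal illustration, assuming the clogit d p q estimates above and plugging in the sample means of p1-p4 and q1-q4 from the earlier summarize output; the names are hypothetical. Probabilities at the means are not the same as the averages of the fitted probabilities, so they need not equal the pinitial means reported later. Because beach and pier have the same mean price, their odds ratio depends only on the catch-rate difference, which illustrates the independence-of-irrelevant-alternatives property of the model.

```python
import math

# Conditional logit estimates from "clogit d p q, group(id)"
BETA_P = -0.0204765   # price
BETA_Q = 0.9530985    # catch rate

# Sample means of p1-p4 and q1-q4 (beach, pier, private, charter)
P_MEAN = [103.422, 103.422, 55.25657, 84.37924]
Q_MEAN = [0.2410113, 0.1622237, 0.1712146, 0.6293679]

def cl_probs(prices, catches):
    """Conditional logit choice probabilities for one decision maker."""
    v = [BETA_P * p + BETA_Q * q for p, q in zip(prices, catches)]
    m = max(v)                      # subtract the max to avoid overflow
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]

if __name__ == "__main__":
    names = ["beach", "pier", "private", "charter"]
    for name, pr in zip(names, cl_probs(P_MEAN, Q_MEAN)):
        print(f"{name:8s} {pr:.4f}")
```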

. 
. *** (2B) Calculate the marginal effects
. 
. quietly clogit d p q, group(id)
. predict pinitial
(option pc1 assumed; conditional probability for single outcome within group)

. 
. * Now compute marginal effects
. * Consider in turn a change in each price and catch rate
. * Change price by 1 unit and then multiply by 100 as in Table 15.2
. * Change catch rate by 0.001 and then multiply by 1000
. 
. * Change p1: price beach
. replace p = p + 1 if alterntv==1
(1182 real changes made)

. predict pnewp1
(option pc1 assumed; conditional probability for single outcome within group)

. gen mep1 = 100*(pnewp1 - pinitial)
. replace p = p - 1 if alterntv==1
(1182 real changes made)

. 
. * Change p2: price pier
. replace p = p + 1 if alterntv==2
(1182 real changes made)

. predict pnewp2
(option pc1 assumed; conditional probability for single outcome within group)

. gen mep2 = 100*(pnewp2 - pinitial)
. replace p = p - 1 if alterntv==2
(1182 real changes made)

. 
. * Change p3: price private boat
. replace p = p + 1 if alterntv==3
(1182 real changes made)

. predict pnewp3
(option pc1 assumed; conditional probability for single outcome within group)

. gen mep3 = 100*(pnewp3 - pinitial)
. replace p = p - 1 if alterntv==3
(1182 real changes made)

. 
. * Change p4: price charter boat
. replace p = p + 1 if alterntv==4
(1182 real changes made)

. predict pnewp4
(option pc1 assumed; conditional probability for single outcome within group)

. gen mep4 = 100*(pnewp4 - pinitial)
. replace p = p - 1 if alterntv==4
(1182 real changes made)

. 
. * Change q1: catch rate beach
. replace q = q + 0.001 if alterntv==1
(1182 real changes made)

. predict pnewq1
(option pc1 assumed; conditional probability for single outcome within group)

. gen meq1 = 1000*(pnewq1 - pinitial)
. replace q = q - 0.001 if alterntv==1
(1182 real changes made)

. 
. * Change q2: catch rate pier
. replace q = q + 0.001 if alterntv==2
(1182 real changes made)

. predict pnewq2
(option pc1 assumed; conditional probability for single outcome within group)

. gen meq2 = 1000*(pnewq2 - pinitial)
. replace q = q - 0.001 if alterntv==2
(1182 real changes made)

. 
. * Change q3: catch rate private boat
. replace q = q + 0.001 if alterntv==3
(1182 real changes made)

. predict pnewq3
(option pc1 assumed; conditional probability for single outcome within group)

. gen meq3 = 1000*(pnewq3 - pinitial)
. replace q = q - 0.001 if alterntv==3
(1182 real changes made)

. 
. * Change q4: catch rate charter boat
. replace q = q + 0.001 if alterntv==4
(1182 real changes made)

. predict pnewq4
(option pc1 assumed; conditional probability for single outcome within group)

. gen meq4 = 1000*(pnewq4 - pinitial)
. * Restore the original catch rate
. replace q = q - 0.001 if alterntv==4
(1182 real changes made)

. 
. * Following gives Table 15.3 on page 493
. sort alterntv
. by alterntv: sum pinitial mep1 mep2 mep3 mep4 meq1 meq2 meq3 meq4

--------------------------------------------------------------------------------
-> alterntv = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .1942074    .1545855   6.19e-08   .6159062
        mep1 |      1182   -.2703818    .1753241  -.5119085  -1.26e-07
        mep2 |      1182    .1183563    .1425011          0   .5107701
        mep3 |      1182    .0846517    .0561764   6.24e-08   .1818448
        mep4 |      1182    .0675326    .0398588   6.44e-08   .1960158
-------------+--------------------------------------------------------
        meq1 |      1182    .1264198    .0817316   5.91e-08   .2382994
        meq2 |      1182   -.0552685    .0664207  -.2378225          0
        meq3 |      1182   -.0395602    .0262581  -.0849366  -2.91e-08
        meq4 |      1182   -.0315872    .0186528  -.0915527  -3.00e-08

--------------------------------------------------------------------------------
-> alterntv = 2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .1832872    .1456892   5.73e-08    .484103
        mep1 |      1182    .1184102    .1425963          0   .5111754
        mep2 |      1182   -.2618934    .1742628  -.5112112  -1.16e-07
        mep3 |      1182    .0801368    .0543153   5.78e-08   .1729459
        mep4 |      1182    .0636229    .0381182   5.96e-08   .1775354
-------------+--------------------------------------------------------
        meq1 |      1182   -.0552672    .0664175  -.2378225          0
        meq2 |      1182    .1224849    .0812789   5.47e-08   .2380311
        meq3 |      1182   -.0374514    .0253908  -.0807345  -2.69e-08
        meq4 |      1182   -.0297604    .0178421  -.0829101  -2.78e-08

--------------------------------------------------------------------------------
-> alterntv = 3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .3298317     .173932   .0000756   .6739099
        mep1 |      1182     .084509    .0561326          0   .1815647
        mep2 |      1182    .0799891    .0542687          0    .172469
        mep3 |      1182   -.3897785    .1364849  -.5119085  -.0001532
        mep4 |      1182    .2248109    .1606873   1.24e-08   .5118489
-------------+--------------------------------------------------------
        meq1 |      1182   -.0395636      .02626  -.0849366          0
        meq2 |      1182   -.0374553    .0253917  -.0807345          0
        meq3 |      1182    .1818861    .0633881   .0000721   .2382994
        meq4 |      1182    -.104879    .0748259  -.2382398  -7.28e-09

--------------------------------------------------------------------------------
-> alterntv = 4

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .2926737    .1807255    .000078   .7322331
        mep1 |      1182    .0674624    .0398696          0   .1958013
        mep2 |      1182    .0635479    .0381287          0   .1772434
        mep3 |      1182      .22499    .1608719   1.24e-08    .511682
        mep4 |      1182   -.3559665    .1370352  -.5119085  -.0001582
-------------+--------------------------------------------------------
        meq1 |      1182   -.0315891     .018653  -.0915825          0
        meq2 |      1182   -.0297618    .0178418  -.0829399          0
        meq3 |      1182   -.1048757    .0748219  -.2382398  -7.28e-09
        meq4 |      1182    .1662257    .0636901   .0000744   .2382994
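Table 15.3 averages observation-level finite-difference effects. For the conditional logit the calculus counterpart is dPr_ij/dx_ik = Pr_ij*(1[j=k] - Pr_ik)*b. Own-price effects are therefore negative and cross-price effects positive, and cross effects are symmetric (dPr_ij/dx_ik = dPr_ik/dx_ij). This symmetry is why, for example, the alterntv = 1 mean of mep2 (.1183563) is close to the alterntv = 2 mean of mep1 (.1184102). The Python sketch below is illustrative only. It evaluates the formula at the sample means (hypothetical helper names; the coefficients and means are copied from the output above) and checks it against the same one-unit price change used in the log, without the log's rescaling by 100. It does not reproduce the table, which averages over individuals.

```python
import math

BETA_P, BETA_Q = -0.0204765, 0.9530985    # clogit d p q estimates
# Sample means of price and catch rate: beach, pier, private, charter
P_MEAN = [103.422, 103.422, 55.25657, 84.37924]
Q_MEAN = [0.2410113, 0.1622237, 0.1712146, 0.6293679]

def cl_probs(prices, catches):
    """Conditional logit probabilities for one decision maker."""
    v = [BETA_P * p + BETA_Q * q for p, q in zip(prices, catches)]
    m = max(v)                      # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]

def cl_me_matrix(probs, beta):
    """me[j][k] = dPr_j/dx_k = Pr_j * (1[j == k] - Pr_k) * beta."""
    n = len(probs)
    return [[probs[j] * ((1.0 if j == k else 0.0) - probs[k]) * beta
             for k in range(n)] for j in range(n)]

def cl_me_finite(prices, catches, k, dp=1.0):
    """Raise the price of alternative k by dp; change in each Pr_j per unit."""
    base = cl_probs(prices, catches)
    bumped = list(prices)
    bumped[k] += dp
    new = cl_probs(bumped, catches)
    return [(b - a) / dp for a, b in zip(base, new)]

if __name__ == "__main__":
    me = cl_me_matrix(cl_probs(P_MEAN, Q_MEAN), BETA_P)
    for row in me:                  # row j: effect on Pr_j of each price
        print(" ".join(f"{x: .5f}" for x in row))
```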

. 
. ******* (3) CONDITIONAL LOGIT: ALTERNATIVE-INVARIANT REGRESSOR *********
. 
. * Here we get clogit to do something that is easier done by mlogit
. 
. clear
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)

. 
. gen ydiv1000 = income/1000
. 
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this but first create variable
. * Alternative = 1 if beach, = 2 if pier, = 3 if private boat, = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen d2 = dpier
. gen d3 = dprivate
. gen d4 = dcharter
. describe

Contains data
  obs:         1,182
 vars:            22
 size:       108,744 (98.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
id              float  %9.0g
d1              float  %9.0g
d2              float  %9.0g
d3              float  %9.0g
d4              float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          d2 |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
          d3 |      1182    .3536379    .4783008          0          1
          d4 |      1182    .3824027    .4861799          0          1

. 
. reshape long d, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  22   ->      20
j variable (4 values)                     ->   alterntv
xij variables:
                            d1 d2 ... d4   ->   d
-----------------------------------------------------------------------------

. describe

Contains data
  obs:         4,728
 vars:            20
 size:       382,968 (96.3% of memory free)
-------------------------------------------------------------------------------

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  alterntv
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1

. 
. gen obsnum=_n
. gen d2 = 0
. replace d2 = 1 if mod(obsnum,4)==2
(1182 real changes made)

. gen d3 = 0
. replace d3 = 1 if mod(obsnum,4)==3
(1182 real changes made)

. gen d4 = 0
. replace d4 = 1 if mod(obsnum,4)==0
(1182 real changes made)

. gen d2y = 0
. replace d2y = d2*ydiv1000
(1182 real changes made)

. gen d3y = 0
. replace d3y = d3*ydiv1000
(1182 real changes made)

. gen d4y = 0
. replace d4y = d4*ydiv1000
(1182 real changes made)

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      obsnum |      4728      2364.5        1365          1       4728
          d2 |      4728         .25    .4330585          0          1
          d3 |      4728         .25    .4330585          0          1
          d4 |      4728         .25    .4330585          0          1
         d2y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d3y |      4728    1.024834    2.160064          0       12.5
         d4y |      4728    1.024834    2.160064          0       12.5

. 
. * The following gives MNL column of Table 15.2, p.493,
. * which was more easily obtained using mlogit earlier
. clogit d d2 d3 d4 d2y d3y d4y, group(id)

Iteration 0:   log likelihood = -1570.1863
Iteration 1:   log likelihood = -1479.3713
Iteration 2:   log likelihood =  -1477.159
Iteration 3:   log likelihood = -1477.1506
Iteration 4:   log likelihood = -1477.1506

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(6)      =     322.90
                                                  Prob > chi2     =     0.0000
Log likelihood = -1477.1506                       Pseudo R2       =     0.0985

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d2 |   .8141503    .228632     3.56   0.000      .3660399    1.262261
          d3 |   .7389208   .1967309     3.76   0.000      .3533352    1.124506
          d4 |   1.341291   .1945167     6.90   0.000      .9600457    1.722537
         d2y |  -.1434029   .0532884    -2.69   0.007     -.2478463   -.0389595
         d3y |   .0919064   .0406637     2.26   0.024      .0122069    .1716058
         d4y |  -.0316399   .0418463    -0.76   0.450     -.1136571    .0503774
------------------------------------------------------------------------------

. 
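Section (3) relies on the MNL model being a special case of the conditional logit. Once the data are stacked to one row per alternative, regressors consisting of alternative dummies (beach omitted) and those dummies interacted with income reproduce the MNL probabilities exactly, which is why the coefficients above match the mlogit output. The Python sketch below uses hypothetical names and copies the coefficients from the two outputs. It builds each long-format row from the alternative index rather than from mod(obsnum,4); the mod() construction in the log works only because reshape leaves the data sorted by id and alterntv, as the describe output shows.

```python
import math

# clogit coefficients on d2 d3 d4 d2y d3y d4y (identical to the mlogit estimates)
GAMMA = [0.8141503, 0.7389208, 1.341291, -0.1434029, 0.0919064, -0.0316399]

def design_row(alt, income_k):
    """Long-format regressors for alternative alt (1..4): dummies d2-d4,
    then the same dummies times income; beach (alt 1) is all zeros."""
    dummies = [1.0 if alt == j else 0.0 for j in (2, 3, 4)]
    return dummies + [d * income_k for d in dummies]

def cl_probs(income_k):
    """Conditional logit probabilities built from the stacked design rows."""
    v = [sum(g * z for g, z in zip(GAMMA, design_row(a, income_k)))
         for a in (1, 2, 3, 4)]
    e = [math.exp(x) for x in v]
    total = sum(e)
    return [x / total for x in e]

def mnl_probs(income_k):
    """The same probabilities written as a multinomial logit."""
    cons = [0.0, 0.8141503, 0.7389208, 1.341291]
    slope = [0.0, -0.1434029, 0.0919064, -0.0316399]
    e = [math.exp(a + b * income_k) for a, b in zip(cons, slope)]
    total = sum(e)
    return [x / total for x in e]

if __name__ == "__main__":
    y = 4.09934                     # mean of ydiv1000
    print(cl_probs(y))
    print(mnl_probs(y))
```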

. ******* (4) "MIXED LOGIT" = CONDITIONAL LOGIT WITH BOTH .* ALTERNATIVE-SPECIFIC REGRESSOR .* AND ALTERNATIVE INVARIANT REGRESSOR ********* . . clear . infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /* > */ pprivate pcharter qbeach qpier qprivate qcharter income /* > */ using nldata.asc (1182 observations read) . . gen ydiv1000 = income/1000 . . * Data are one entry per individual . * Need to reshape to 4 observations per individual - one for each alternative . * Use reshape to do this but first create variable . * Alternative = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter . gen id = _n . gen d1 = dbeach . gen p1 = pbeach . gen q1 = qbeach . gen d2 = dpier . gen p2 = ppier . gen q2 = qpier . gen d3 = dprivate . gen p3 = pprivate . gen q3 = qprivate . gen d4 = dcharter . gen p4 = pcharter . gen q4 = qcharter . . reshape long d p q, i(id) j(alterntv) (note: j = 1 2 3 4) Data wide -> long ----------------------------------------------------------------------------285

Number of obs. 1182 -> 4728 Number of variables 30 -> 22 j variable (4 values) -> alterntv xij variables: d1 d2 ... d4 -> d p1 p2 ... p4 -> p q1 q2 ... q4 -> q ----------------------------------------------------------------------------. summarize Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------id | 4728 591.5 341.25 1 1182 alterntv | 4728 2.5 1.118152 1 4 mode | 4728 3.005076 .9933008 1 4 price | 4728 52.08197 53.81289 1.29 666.11 crate | 4728 .3893684 .5604185 .0002 2.3101 -------------+-------------------------------------------------------dbeach | 4728 .1133672 .3170746 0 1 dpier | 4728 .1505922 .3576888 0 1 dprivate | 4728 .3536379 .478149 0 1 dcharter | 4728 .3824027 .4860256 0 1 pbeach | 4728 103.422 103.6081 1.29 843.186 -------------+-------------------------------------------------------ppier | 4728 103.422 103.6081 1.29 843.186 pprivate | 4728 55.25657 62.69354 2.29 666.11 pcharter | 4728 84.37924 63.52448 27.29 691.11 qbeach | 4728 .2410113 .1906919 .0678 .5333 qpier | 4728 .1622237 .1603389 .0014 .4522 -------------+-------------------------------------------------------qprivate | 4728 .1712146 .2097219 .0002 .7369 qcharter | 4728 .6293679 .7058901 .0021 2.3101 income | 4728 4099.337 2461.183 416.6667 12500 ydiv1000 | 4728 4.099337 2.461183 .4166667 12.5 d| 4728 .25 .4330585 0 1 -------------+-------------------------------------------------------p| 4728 86.61996 88.01813 1.29 843.186 q| 4728 .3009544 .4335593 .0002 2.3101 . . * Bring in alternative specific dummies . * Since d2-d4 already used instead call them dummy2 - dummy4 . gen obsnum=_n . gen dummy1 = 0 . replace dummy1 = 1 if mod(obsnum,4)==1 (1182 real changes made) . gen dummy2 = 0 286

. replace dummy2 = 1 if mod(obsnum,4)==2 (1182 real changes made) . gen dummy3 = 0 . replace dummy3 = 1 if mod(obsnum,4)==3 (1182 real changes made) . gen dummy4 = 0 . replace dummy4 = 1 if mod(obsnum,4)==0 (1182 real changes made) . * And interact with income . gen d1y = 0 . replace d1y = dummy1*ydiv1000 (1182 real changes made) . gen d2y = 0 . replace d2y = dummy2*ydiv1000 (1182 real changes made) . gen d3y = 0 . replace d3y = dummy3*ydiv1000 (1182 real changes made) . gen d4y = 0 . replace d4y = dummy4*ydiv1000 (1182 real changes made) . . summarize Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------id | 4728 591.5 341.25 1 1182 alterntv | 4728 2.5 1.118152 1 4 mode | 4728 3.005076 .9933008 1 4 price | 4728 52.08197 53.81289 1.29 666.11 crate | 4728 .3893684 .5604185 .0002 2.3101 -------------+-------------------------------------------------------dbeach | 4728 .1133672 .3170746 0 1 dpier | 4728 .1505922 .3576888 0 1 dprivate | 4728 .3536379 .478149 0 1 dcharter | 4728 .3824027 .4860256 0 1 pbeach | 4728 103.422 103.6081 1.29 843.186 287

-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
      obsnum |      4728      2364.5        1365          1       4728
      dummy1 |      4728         .25    .4330585          0          1
      dummy2 |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      dummy3 |      4728         .25    .4330585          0          1
      dummy4 |      4728         .25    .4330585          0          1
         d1y |      4728    1.024834    2.160064          0       12.5
         d2y |      4728    1.024834    2.160064          0       12.5
         d3y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d4y |      4728    1.024834    2.160064          0       12.5

. 
. clogit d dummy2 dummy3 dummy4 p q, group(id)

Iteration 0:   log likelihood = -1548.5161
Iteration 1:   log likelihood = -1311.3761
Iteration 2:   log likelihood = -1247.5777
Iteration 3:   log likelihood = -1232.1412
Iteration 4:   log likelihood = -1230.7975
Iteration 5:   log likelihood = -1230.7838
Iteration 6:   log likelihood = -1230.7838

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(5)      =     815.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -1230.7838                       Pseudo R2       =     0.2489

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dummy2 |   .3070552   .1145738     2.68   0.007     .0824947    .5316158
      dummy3 |   .8713749   .1140428     7.64   0.000     .6478551    1.094895
      dummy4 |   1.498888   .1329328    11.28   0.000     1.238345    1.759432
           p |  -.0247896   .0017044   -14.54   0.000    -.0281301    -.021449
           q |   .3771689   .1099707     3.43   0.001     .1616303    .5927074

------------------------------------------------------------------------------

. 
. * The following gives Mixed column of Table 15.2, p.493
. clogit d p q dummy2 dummy3 dummy4 d2y d3y d4y, group(id)

Iteration 0:   log likelihood =  -1538.389
Iteration 1:   log likelihood = -1297.4143
Iteration 2:   log likelihood = -1233.5431
Iteration 3:   log likelihood = -1216.8043
Iteration 4:   log likelihood = -1215.1582
Iteration 5:   log likelihood = -1215.1376
Iteration 6:   log likelihood = -1215.1376

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(8)      =     846.92
                                                  Prob > chi2     =     0.0000
Log likelihood = -1215.1376                       Pseudo R2       =     0.2584

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0251166   .0017317   -14.50   0.000    -.0285106   -.0217225
           q |    .357782   .1097733     3.26   0.001     .1426302    .5729337
      dummy2 |   .7779594   .2204939     3.53   0.000     .3457992     1.21012
      dummy3 |   .5272788   .2227927     2.37   0.018     .0906131    .9639444
      dummy4 |   1.694366   .2240506     7.56   0.000     1.255235    2.133497
         d2y |  -.1275771   .0506395    -2.52   0.012    -.2268288   -.0283255
         d3y |   .0894398   .0500671     1.79   0.074    -.0086898    .1875695
         d4y |  -.0332917   .0503409    -0.66   0.508     -.131958    .0653746
------------------------------------------------------------------------------

. 
. * Output data file to be read into Limdep program mma15p4gev.lim
. outfile id d p q ydiv1000 dummy2 dummy3 dummy4 d2y d3y d4y using mma15p4gev.asc, replace

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section4\mma15p1mnl.txt
  log type:  text
 closed on:  19 May 2005, 12:16:24
-------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma15p2gev.txt
  log type:  text
 opened on:  19 May 2005, 12:16:29

. 
. ********** OVERVIEW OF MMA15P2GEV.DO **********
. 

. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 15.6.3 page 511
. * Nested logit (GEV) model analysis.
. * (1) Set data up and reproduce Mixed estimates in Table 15.2 p.493
. * (2A) Nested logit model estimates (page 511)
. * (2B) Restricted nested logit model estimates (page 511)
. * (2C) Equivalent conditional logit model estimates (same as (2B))
. 
. * Related programs are
. *   mma15p1mnl.do   multinomial and conditional logit using Stata
. *   mma15p3mnl.lim  multinomial logit using Limdep
. *   mma15p4gev.lim  conditional and nested logit using Limdep and Nlogit
. 
. * To run this program you need data file
. *   Nldata.asc
. 
. * NOTE: The example here is deliberately simple and merely illustrative,
. * with nesting structure
. *                 /        \
. *             shore        boat
. *             /   \       /     \
. *         beach   pier  private  charter
. * In this case, with parameter rho_j differing across alternatives,
. * Stata 8 estimates the earlier variant of the nested logit model
. * rather than the preferred variant given in the text.
. * See the discussion at bottom of page 511 and also Train (2003, p.88)
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
. 
. ********** DATA DESCRIPTION **********
. 
. * Data Set comes from :
. * J. A. Herriges and C. L. Kling,
. * "Nonlinear Income Effects in Random Utility Models",
. * Review of Economics and Statistics, 81(1999): 62-72
. 
. * The data are given as one combined observation with data on all 4 choices.
. * This layout works for the multinomial logit program.
. * For conditional logit we need a new data set that has
. * four separate entries for each individual, one per alternative.
. 

. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 4 lines with 4 variables per line
. * so 4 x 1182 = 4728 lines
. * Variable Number and Description
. * 1  Recreation mode choice. = 1 if beach, = 2 if pier, = 3 if private boat, = 4 if charter
. * 2  Price for chosen alternative
. * 3  Catch rate for chosen alternative
. * 4  = 1 if beach mode chosen; = 0 otherwise
. * 5  = 1 if pier mode chosen; = 0 otherwise
. * 6  = 1 if private boat mode chosen; = 0 otherwise
. * 7  = 1 if charter boat mode chosen; = 0 otherwise
. * 8  = price for beach mode
. * 9  = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
. 
. ******* (1) CONDITIONAL LOGIT MODEL (Table 15.2 p.493 Mixed column) *********
. 
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)

. 
. gen ydiv1000 = income/1000
. 
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this, which also creates the variable (see below)
. * alterntv = 1 if beach, = 2 if pier, = 3 if private boat, = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen p1 = pbeach
. gen q1 = qbeach
. gen d2 = dpier
. gen p2 = ppier
. gen q2 = qpier

. gen d3 = dprivate
. gen p3 = pprivate
. gen q3 = qprivate
. gen d4 = dcharter
. gen p4 = pcharter
. gen q4 = qcharter
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          p1 |      1182     103.422     103.641       1.29    843.186
-------------+--------------------------------------------------------
          q1 |      1182    .2410113    .1907524      .0678      .5333
          d2 |      1182    .1505922    .3578023          0          1
          p2 |      1182     103.422     103.641       1.29    843.186
          q2 |      1182    .1622237    .1603898      .0014      .4522
          d3 |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    55.25657    62.71344       2.29     666.11
          q3 |      1182    .1712146    .2097885      .0002      .7369
          d4 |      1182    .3824027    .4861799          0          1
          p4 |      1182    84.37924    63.54465      27.29     691.11
          q4 |      1182    .6293679    .7061142      .0021     2.3101
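The `reshape long` step turns each individual's single wide record into four long records, one per alternative. A minimal Python sketch of that transformation, assuming records held as plain dictionaries; the function name `reshape_long` and the toy values are hypothetical and are not taken from NLDATA.ASC:

```python
from typing import Dict, List

Record = Dict[str, float]


def reshape_long(wide: List[Record], stubs: List[str], n_alt: int,
                 i: str = "id", j: str = "alterntv") -> List[Record]:
    """Sketch of Stata's `reshape long <stubs>, i(<i>) j(<j>)`."""
    # Wide names such as d1..d4, p1..p4, q1..q4 are replaced by d, p, q.
    stubbed = {f"{s}{k}" for s in stubs for k in range(1, n_alt + 1)}
    long_rows: List[Record] = []
    # Output is ordered by i, then j, matching "Sorted by: id alterntv".
    for row in sorted(wide, key=lambda r: r[i]):
        # Variables that do not vary by alternative are copied to every row.
        common = {key: val for key, val in row.items() if key not in stubbed}
        for k in range(1, n_alt + 1):
            new = dict(common)
            new[j] = k
            for s in stubs:
                new[s] = row[f"{s}{k}"]
            long_rows.append(new)
    return long_rows


# Hypothetical wide records for two anglers (deliberately given out of id order).
toy_wide = [
    {"id": 2, "ydiv1000": 2.5,
     "d1": 1, "d2": 0, "d3": 0, "d4": 0,
     "p1": 20.0, "p2": 20.0, "p3": 60.0, "p4": 90.0,
     "q1": 0.07, "q2": 0.01, "q3": 0.10, "q4": 0.50},
    {"id": 1, "ydiv1000": 4.0,
     "d1": 0, "d2": 0, "d3": 1, "d4": 0,
     "p1": 150.0, "p2": 150.0, "p3": 40.0, "p4": 70.0,
     "q1": 0.30, "q2": 0.20, "q3": 0.15, "q4": 0.70},
]
toy_long = reshape_long(toy_wide, ["d", "p", "q"], n_alt=4)
```

Sorting by `id` and then by alternative is what later allows row position alone (via `mod(obsnum,4)`) to identify which alternative a long record belongs to.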

. 
. reshape long d p q, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  30   ->      22
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
                           p1 p2 ... p4   ->   p
                           q1 q2 ... q4   ->   q
-----------------------------------------------------------------------------
. * This automatically creates alterntv = 1 (beach), ... 4 (charter)
. describe

Contains data
  obs:         4,728
 vars:            22
 size:       420,792 (95.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
p               float  %9.0g
q               float  %9.0g
-------------------------------------------------------------------------------

Sorted by:  id  alterntv
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101

. 
. * Bring in alternative specific dummies
. * Since d2-d4 are already used, call them dummy2-dummy4 instead
. gen obsnum=_n
. gen dummy1 = (mod(obsnum,4)==1) * 1
. gen dummy2 = (mod(obsnum,4)==2) * 1
. gen dummy3 = (mod(obsnum,4)==3) * 1
. gen dummy4 = (mod(obsnum,4)==0) * 1
. gen d1y = (mod(obsnum,4)==1) * ydiv1000
. gen d2y = (mod(obsnum,4)==2) * ydiv1000
. gen d3y = (mod(obsnum,4)==3) * ydiv1000
. gen d4y = (mod(obsnum,4)==0) * ydiv1000
. 
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
      obsnum |      4728      2364.5        1365          1       4728
      dummy1 |      4728         .25    .4330585          0          1
      dummy2 |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      dummy3 |      4728         .25    .4330585          0          1
      dummy4 |      4728         .25    .4330585          0          1
         d1y |      4728    1.024834    2.160064          0       12.5
         d2y |      4728    1.024834    2.160064          0       12.5
         d3y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d4y |      4728    1.024834    2.160064          0       12.5

. 
. * The following gives Mixed column of Table 15.2 p.493
. * Note that dummy1 and d1y are omitted to avoid the dummy variable trap
. 
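The `mod(obsnum,4)` construction relies on the long data being sorted by `id` and then `alterntv`, so that row position alone identifies the alternative. Because `clogit` conditions on each individual, the four alternative dummies sum to one within every `id` and cannot all be identified; alternative 1 (beach) therefore serves as the omitted base. A minimal Python sketch of the same logic, assuming that sort order; the helper names and income values are hypothetical:

```python
from typing import Dict, List

N_ALT = 4  # beach, pier, private boat, charter


def alternative_of(obsnum: int, n_alt: int = N_ALT) -> int:
    """Map Stata's 1-based _n to alterntv; mod(obsnum,4)==0 is alternative 4."""
    r = obsnum % n_alt
    return n_alt if r == 0 else r


def alt_specific_regressors(ydiv1000: List[float],
                            n_alt: int = N_ALT) -> List[Dict[str, float]]:
    """Build dummy2..dummy4 and d2y..d4y; alternative 1 is the omitted base."""
    rows = []
    for idx, y in enumerate(ydiv1000):
        alt = alternative_of(idx + 1, n_alt)
        row: Dict[str, float] = {}
        for k in range(2, n_alt + 1):
            row[f"dummy{k}"] = 1.0 if alt == k else 0.0
            # Income enters only through its interaction with each dummy,
            # because income alone is constant within an individual.
            row[f"d{k}y"] = y if alt == k else 0.0
        rows.append(row)
    return rows


# Hypothetical incomes (in $1000s), each repeated once per alternative.
incomes = [4.0] * 4 + [2.5] * 4
regs = alt_specific_regressors(incomes)
```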

. clogit d dummy2 dummy3 dummy4 d2y d3y d4y p q, group(id)

Iteration 0:   log likelihood =  -1538.389
Iteration 1:   log likelihood = -1297.4143
Iteration 2:   log likelihood = -1233.5431
Iteration 3:   log likelihood = -1216.8043
Iteration 4:   log likelihood = -1215.1582
Iteration 5:   log likelihood = -1215.1376
Iteration 6:   log likelihood = -1215.1376

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(8)      =     846.92
                                                  Prob > chi2     =     0.0000
Log likelihood = -1215.1376                       Pseudo R2       =     0.2584

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dummy2 |   .7779594   .2204939     3.53   0.000     .3457992     1.21012
      dummy3 |   .5272788   .2227927     2.37   0.018     .0906131    .9639444
      dummy4 |   1.694366   .2240506     7.56   0.000     1.255235    2.133497
         d2y |  -.1275771   .0506395    -2.52   0.012    -.2268288   -.0283255
         d3y |   .0894398   .0500671     1.79   0.074    -.0086898    .1875695
         d4y |  -.0332917   .0503409    -0.66   0.508     -.131958    .0653746
           p |  -.0251166   .0017317   -14.50   0.000    -.0285106   -.0217225
           q |    .357782   .1097733     3.26   0.001     .1426302    .5729337
------------------------------------------------------------------------------

. 
. ******* (2) NESTED LOGIT MODEL (p.511) *********
. 
. * Define the Tree for Nested logit
. * with nesting structure
. *                 /        \
. *             shore        boat
. *             /   \       /     \
. *         beach   pier  private  charter
. * In this case, with parameter rho_j differing across alternatives,
. * Stata 8 estimates the earlier variant of the nested logit model
. * rather than the preferred variant given in the text.
. * See the discussion at bottom of page 511 and also Train (2003, p.88)
. 
. nlogitgen type = alterntv(shore: 1 | 2 , boat: 3 | 4)
new variable type is generated with 2 groups
label list lb_type
lb_type:
           1 shore
           2 boat

. nlogittree alterntv type

tree structure specified for the nested logit model


 top --> bottom

   type         alterntv
 -------------------------
   shore        1
                2
   boat         3
                4

. 
. *** (2A) Estimate the nested logit model
. *** This is the model on p.511 that has "higher log-likelihood"
. 
. * For the top level we use regressors that do not vary at the lower level
. * So not p or q, but could be income or an alternative dummy
. * Here use income and an alternative dummy
. gen dshore = (type ==1) * 1
. gen dshorey = (type ==1) * ydiv1000
. nlogit d (alterntv = p q) (type = dshore dshorey), group(id)

tree structure specified for the nested logit model

 top --> bottom

   type         alterntv
 -------------------------
   shore        1
                2
   boat         3
                4

initial:       log likelihood = -1256.8179
rescale:       log likelihood = -1256.8179
rescale eq:    log likelihood = -1228.6278
Iteration 0:   log likelihood = -1228.6278
Iteration 1:   log likelihood =  -1227.407  (backed up)
Iteration 2:   log likelihood =  -1225.366  (backed up)
Iteration 3:   log likelihood = -1216.5831  (backed up)
Iteration 4:   log likelihood = -1210.9623
Iteration 5:   log likelihood =  -1210.323  (backed up)
Iteration 6:   log likelihood = -1199.5959
Iteration 7:   log likelihood = -1198.2166
Iteration 8:   log likelihood = -1193.1834
Iteration 9:   log likelihood = -1190.8805
Iteration 10:  log likelihood = -1188.0112
Iteration 11:  log likelihood = -1185.7944
Iteration 12:  log likelihood = -1184.8715
Iteration 13:  log likelihood =  -1183.776
Iteration 14:  log likelihood = -1182.6316

Iteration 15:  log likelihood = -1182.1119
Iteration 16:  log likelihood = -1181.8783
Iteration 17:  log likelihood =  -1181.323
Iteration 18:  log likelihood =  -1181.162
Iteration 19:  log likelihood =  -1180.912
Iteration 20:  log likelihood = -1180.7877
Iteration 21:  log likelihood = -1180.5545
Iteration 22:  log likelihood = -1180.4177
Iteration 23:  log likelihood = -1180.2966
BFGS stepping has contracted, resetting BFGS Hessian (0)
Iteration 24:  log likelihood = -1180.2253
Iteration 25:  log likelihood = -1180.2209  (backed up)
Iteration 26:  log likelihood = -1180.2139  (backed up)
Iteration 27:  log likelihood = -1180.2137  (backed up)
Iteration 28:  log likelihood = -1180.2113
Iteration 29:  log likelihood = -1180.2019
Iteration 30:  log likelihood = -1180.1739
Iteration 31:  log likelihood = -1180.1278
BFGS stepping has contracted, resetting BFGS Hessian (1)
Iteration 32:  log likelihood = -1180.0852
Iteration 33:  log likelihood = -1180.0773  (backed up)
Iteration 34:  log likelihood = -1180.0762  (backed up)
Iteration 35:  log likelihood = -1180.0762  (backed up)
Iteration 36:  log likelihood = -1180.0758
Iteration 37:  log likelihood = -1180.0694
Iteration 38:  log likelihood = -1180.0671
Iteration 39:  log likelihood = -1180.0664
BFGS stepping has contracted, resetting BFGS Hessian (2)
Iteration 40:  log likelihood =  -1180.058
Iteration 41:  log likelihood = -1180.0576  (backed up)
Iteration 42:  log likelihood = -1180.0575  (backed up)
Iteration 43:  log likelihood = -1180.0575  (backed up)
Iteration 44:  log likelihood = -1180.0573
Iteration 45:  log likelihood = -1180.0466
Iteration 46:  log likelihood = -1180.0434
BFGS stepping has contracted, resetting BFGS Hessian (3)
Iteration 47:  log likelihood =  -1180.043
Iteration 48:  log likelihood = -1180.0427  (backed up)
Iteration 49:  log likelihood = -1180.0427  (backed up)
Iteration 50:  log likelihood = -1180.0427  (backed up)
Iteration 51:  log likelihood = -1180.0427
Iteration 52:  log likelihood = -1180.0422
BFGS stepping has contracted, resetting BFGS Hessian (4)
Iteration 53:  log likelihood = -1180.0414
Iteration 54:  log likelihood = -1180.0412  (backed up)
Iteration 55:  log likelihood = -1180.0412  (backed up)
Iteration 56:  log likelihood = -1180.0412  (backed up)
Iteration 57:  log likelihood = -1180.0411
Iteration 58:  log likelihood = -1180.0404
Iteration 59:  log likelihood = -1180.0401
BFGS stepping has contracted, resetting BFGS Hessian (5)

Iteration 60:  log likelihood = -1180.0381
Iteration 61:  log likelihood =  -1180.038  (backed up)
Iteration 62:  log likelihood = -1180.0364  (backed up)
Iteration 63:  log likelihood = -1180.0364  (backed up)
Iteration 64:  log likelihood = -1180.0364
Iteration 65:  log likelihood = -1180.0361
Iteration 66:  log likelihood = -1180.0357
BFGS stepping has contracted, resetting BFGS Hessian (6)
Iteration 67:  log likelihood = -1180.0348
Iteration 68:  log likelihood = -1180.0348  (backed up)
Iteration 69:  log likelihood = -1180.0348  (backed up)
Iteration 70:  log likelihood = -1180.0348  (backed up)
Iteration 71:  log likelihood = -1180.0348
Iteration 72:  log likelihood = -1180.0331
Iteration 73:  log likelihood = -1180.0328
BFGS stepping has contracted, resetting BFGS Hessian (7)
Iteration 74:  log likelihood = -1180.0319
Iteration 75:  log likelihood = -1180.0318  (backed up)
Iteration 76:  log likelihood = -1180.0317  (backed up)
Iteration 77:  log likelihood = -1180.0317  (backed up)
Iteration 78:  log likelihood = -1180.0317  (backed up)
Iteration 79:  log likelihood = -1180.0313
BFGS stepping has contracted, resetting BFGS Hessian (8)
Iteration 80:  log likelihood =  -1180.031
Iteration 81:  log likelihood =  -1180.031  (backed up)
Iteration 82:  log likelihood =  -1180.031  (backed up)
Iteration 83:  log likelihood =  -1180.031  (backed up)
Iteration 84:  log likelihood =  -1180.031  (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (9)
Iteration 85:  log likelihood = -1180.0305
Iteration 86:  log likelihood = -1180.0304  (backed up)
Iteration 87:  log likelihood = -1180.0304  (backed up)
Iteration 88:  log likelihood = -1180.0304  (backed up)
Iteration 89:  log likelihood = -1180.0304
Iteration 90:  log likelihood = -1180.0303
Iteration 91:  log likelihood = -1180.0301
BFGS stepping has contracted, resetting BFGS Hessian (10)
Iteration 92:  log likelihood = -1180.0296
Iteration 93:  log likelihood = -1180.0295  (backed up)
Iteration 94:  log likelihood = -1180.0295  (backed up)
Iteration 95:  log likelihood = -1180.0295  (backed up)
Iteration 96:  log likelihood = -1180.0295
Iteration 97:  log likelihood = -1180.0292
Iteration 98:  log likelihood =  -1180.029
BFGS stepping has contracted, resetting BFGS Hessian (11)
Iteration 99:  log likelihood = -1180.0288
Iteration 100: log likelihood = -1180.0288  (backed up)
Iteration 101: log likelihood = -1180.0288  (backed up)
Iteration 102: log likelihood = -1180.0288  (backed up)
Iteration 103: log likelihood = -1180.0288  (backed up)
Iteration 104: log likelihood = -1180.0285

BFGS stepping has contracted, resetting BFGS Hessian (12)
Iteration 105: log likelihood = -1180.0283
Iteration 106: log likelihood = -1180.0283  (backed up)
Iteration 107: log likelihood = -1180.0283  (backed up)
Iteration 108: log likelihood = -1180.0283  (backed up)
Iteration 109: log likelihood = -1180.0283
Iteration 110: log likelihood = -1180.0282
Iteration 111: log likelihood =  -1180.028
BFGS stepping has contracted, resetting BFGS Hessian (13)
Iteration 112: log likelihood = -1180.0274
Iteration 113: log likelihood = -1180.0274  (backed up)
Iteration 114: log likelihood = -1180.0274  (backed up)
Iteration 115: log likelihood = -1180.0274  (backed up)
Iteration 116: log likelihood = -1180.0274  (backed up)
Iteration 117: log likelihood = -1180.0266
BFGS stepping has contracted, resetting BFGS Hessian (14)
Iteration 118: log likelihood = -1180.0265
Iteration 119: log likelihood = -1180.0265  (backed up)
Iteration 120: log likelihood = -1180.0265  (backed up)
Iteration 121: log likelihood = -1180.0265  (backed up)
Iteration 122: log likelihood = -1180.0265  (backed up)
Iteration 123: log likelihood = -1180.0263
BFGS stepping has contracted, resetting BFGS Hessian (15)
Iteration 124: log likelihood = -1180.0261
Iteration 125: log likelihood = -1180.0261  (backed up)
Iteration 126: log likelihood = -1180.0261  (backed up)
Iteration 127: log likelihood = -1180.0261  (backed up)
Iteration 128: log likelihood = -1180.0261  (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (16)
Iteration 129: log likelihood =  -1180.026
Iteration 130: log likelihood =  -1180.026  (backed up)
Iteration 131: log likelihood =  -1180.026  (backed up)
Iteration 132: log likelihood =  -1180.026  (backed up)
Iteration 133: log likelihood =  -1180.026  (backed up)
Iteration 134: log likelihood = -1180.0259
BFGS stepping has contracted, resetting BFGS Hessian (17)
Iteration 135: log likelihood = -1180.0213
Iteration 136: log likelihood = -1180.0208  (backed up)
Iteration 137: log likelihood = -1180.0207  (backed up)
Iteration 138: log likelihood = -1180.0207  (backed up)
Iteration 139: log likelihood = -1180.0206
Iteration 140: log likelihood = -1180.0191
Iteration 141: log likelihood = -1180.0186
BFGS stepping has contracted, resetting BFGS Hessian (18)
Iteration 142: log likelihood = -1180.0185
Iteration 143: log likelihood = -1180.0185  (backed up)
Iteration 144: log likelihood = -1180.0185  (backed up)
Iteration 145: log likelihood = -1180.0185
Iteration 146: log likelihood = -1180.0185  (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (19)
Iteration 147: log likelihood = -1180.0184

Iteration 148: log likelihood = -1180.0184  (backed up)
Iteration 149: log likelihood = -1180.0184  (backed up)
Iteration 150: log likelihood = -1180.0184  (backed up)
Iteration 151: log likelihood = -1180.0184  (backed up)
Iteration 152: log likelihood = -1180.0184
Iteration 153: log likelihood = -1180.0183
BFGS stepping has contracted, resetting BFGS Hessian (20)
Iteration 154: log likelihood = -1180.0177
Iteration 155: log likelihood = -1180.0176  (backed up)
Iteration 156: log likelihood = -1180.0176  (backed up)
Iteration 157: log likelihood = -1180.0176  (backed up)
Iteration 158: log likelihood = -1180.0176  (backed up)
Iteration 159: log likelihood = -1180.0172
Iteration 160: log likelihood = -1180.0171
BFGS stepping has contracted, resetting BFGS Hessian (21)
Iteration 161: log likelihood =  -1180.017
Iteration 162: log likelihood =  -1180.017  (backed up)
Iteration 163: log likelihood =  -1180.017  (backed up)
Iteration 164: log likelihood =  -1180.017  (backed up)
Iteration 165: log likelihood =  -1180.017
Iteration 166: log likelihood =  -1180.017
BFGS stepping has contracted, resetting BFGS Hessian (22)
Iteration 167: log likelihood = -1180.0169
Iteration 168: log likelihood = -1180.0169  (backed up)
Iteration 169: log likelihood = -1180.0169  (backed up)
Iteration 170: log likelihood = -1180.0169  (backed up)
Iteration 171: log likelihood = -1180.0169  (backed up)
Iteration 172: log likelihood = -1180.0169
Iteration 173: log likelihood = -1180.0169
BFGS stepping has contracted, resetting BFGS Hessian (23)
Iteration 174: log likelihood = -1180.0167
Iteration 175: log likelihood = -1180.0167  (backed up)
Iteration 176: log likelihood = -1180.0167  (backed up)
Iteration 177: log likelihood = -1180.0167  (backed up)
Iteration 178: log likelihood = -1180.0167  (backed up)
Iteration 179: log likelihood = -1180.0166
BFGS stepping has contracted, resetting BFGS Hessian (24)
Iteration 180: log likelihood = -1180.0165
Iteration 181: log likelihood = -1180.0165  (backed up)
Iteration 182: log likelihood = -1180.0165  (backed up)
Iteration 183: log likelihood = -1180.0165  (backed up)
Iteration 184: log likelihood = -1180.0165  (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (25)
Iteration 185: log likelihood = -1180.0165
Iteration 186: log likelihood = -1180.0165  (backed up)
Iteration 187: log likelihood = -1180.0165  (backed up)
Iteration 188: log likelihood = -1180.0164  (backed up)
Iteration 189: log likelihood = -1180.0164  (backed up)
Iteration 190: log likelihood = -1180.0164
BFGS stepping has contracted, resetting BFGS Hessian (26)
Iteration 191: log likelihood = -1180.0164

Iteration 192: log likelihood = -1180.0164  (backed up)
Iteration 193: log likelihood = -1180.0164  (backed up)
Iteration 194: log likelihood = -1180.0164  (backed up)
Iteration 195: log likelihood = -1180.0164  (backed up)
Iteration 196: log likelihood = -1180.0164
BFGS stepping has contracted, resetting BFGS Hessian (27)
Iteration 197: log likelihood = -1180.0163
Iteration 198: log likelihood = -1180.0163  (backed up)
Iteration 199: log likelihood = -1180.0163  (backed up)
Iteration 200: log likelihood = -1180.0163  (backed up)
Iteration 201: log likelihood = -1180.0163  (backed up)
Iteration 202: log likelihood = -1180.0162
BFGS stepping has contracted, resetting BFGS Hessian (28)
Iteration 203: log likelihood = -1180.0162
Iteration 204: log likelihood = -1180.0162  (backed up)
Iteration 205: log likelihood = -1180.0162  (backed up)
Iteration 206: log likelihood = -1180.0162  (backed up)
Iteration 207: log likelihood = -1180.0162  (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (29)
Iteration 208: log likelihood = -1180.0161
Iteration 209: log likelihood = -1180.0161  (backed up)
Iteration 210: log likelihood = -1180.0161  (backed up)
Iteration 211: log likelihood = -1180.0161  (backed up)
Iteration 212: log likelihood = -1180.0161
Iteration 213: log likelihood = -1180.0161
BFGS stepping has contracted, resetting BFGS Hessian (30)
Iteration 214: log likelihood =  -1180.016
Iteration 215: log likelihood =  -1180.016  (backed up)
Iteration 216: log likelihood =  -1180.016  (backed up)
Iteration 217: log likelihood =  -1180.016  (backed up)
Iteration 218: log likelihood =  -1180.016  (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (31)
Iteration 219: log likelihood =  -1180.016
Iteration 220: log likelihood =  -1180.016  (backed up)
Iteration 221: log likelihood =  -1180.016  (backed up)
Iteration 222: log likelihood =  -1180.016  (backed up)
Iteration 223: log likelihood =  -1180.016  (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (32)
Iteration 224: log likelihood = -1180.0159
Iteration 225: log likelihood = -1180.0159  (backed up)
Iteration 226: log likelihood = -1180.0159  (backed up)
Iteration 227: log likelihood = -1180.0159  (backed up)
Iteration 228: log likelihood = -1180.0159
Iteration 229: log likelihood = -1180.0159
Iteration 230: log likelihood = -1180.0159
BFGS stepping has contracted, resetting BFGS Hessian (33)
Iteration 231: log likelihood = -1180.0157
Iteration 232: log likelihood = -1180.0157  (backed up)
Iteration 233: log likelihood = -1180.0157  (backed up)
Iteration 234: log likelihood = -1180.0157  (backed up)
Iteration 235: log likelihood = -1180.0157  (backed up)

Iteration 236: log likelihood = -1180.0156

Nested logit estimates
Levels             =          2                 Number of obs      =      4728
Dependent variable =          d                 LR chi2(6)         =  917.1687
Log likelihood     = -1180.0156                 Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
alterntv     |
           p |  -.0013303    .001081    -1.23   0.218     -.003449    .0007883
           q |   .1284825   .1038986     1.24   0.216     -.075155      .33212
-------------+----------------------------------------------------------------
type         |
      dshore |  -11.40196    9.15307    -1.25   0.213    -29.34164    6.537733
     dshorey |   .1108341   .0531049     2.09   0.037     .0067505    .2149178
-------------+----------------------------------------------------------------
(incl. value |
parameters)  |
type         |
      /shore |   29.98591   24.40089     1.23   0.219    -17.83896    77.81078
       /boat |   14.06438   11.39886     1.23   0.217    -8.276971    36.40572
------------------------------------------------------------------------------
LR test of homoskedasticity (iv = 1):   chi2(2)=   145.39   Prob > chi2 = 0.0000
------------------------------------------------------------------------------

. estimates store nlogitunrest

. 
. *** (2B) Estimate the restricted nested logit model
. *** This is the model on p.511 that has log L = -1252
. 
. * Set the inclusive value parameters to 1
. nlogit d (alterntv = p q) (type = dshore dshorey), group(id) ivc(shore=1, boat=1)

tree structure specified for the nested logit model

 top --> bottom

   type         alterntv
 -------------------------
   shore        1
                2
   boat         3
                4

User-defined constraint(s):
IV constraint(s):
 [shore]_cons = 1
 [boat]_cons = 1

initial:       log likelihood = -1256.8179
rescale:       log likelihood = -1256.8179
rescale eq:    log likelihood = -1228.6278
Iteration 0:   log likelihood = -1264.4012
Iteration 1:   log likelihood = -1264.1213  (backed up)
Iteration 2:   log likelihood = -1256.9241  (backed up)
Iteration 3:   log likelihood = -1255.0984  (backed up)
Iteration 4:   log likelihood = -1254.4838
Iteration 5:   log likelihood = -1252.7216
Iteration 6:   log likelihood = -1252.7111
Iteration 7:   log likelihood =  -1252.711

Nested logit estimates                            Number of obs   =       4728
Levels             =          2                   LR chi2(4)      =   771.7778
Dependent variable =          d                   Prob > chi2     =     0.0000
Log likelihood     =  -1252.711

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
alterntv     |
           p |   -.020246   .0012832   -15.78   0.000     -.022761   -.017731
           q |   .7552644   .0918004     8.23   0.000      .575339    .9351899
-------------+----------------------------------------------------------------
type         |
      dshore |  -.5897435   .1565201    -3.77   0.000    -.8965172   -.2829697
     dshorey |  -.0790869   .0381453    -2.07   0.038    -.1538503   -.0043235
-------------+----------------------------------------------------------------
(incl. value |
 parameters) |
type         |
      /shore |          1          .        .       .            .           .
       /boat |          1          .        .       .            .           .
------------------------------------------------------------------------------
LR test of homoskedasticity (iv = 1):  chi2(0)=     0.00    Prob > chi2 =      .
------------------------------------------------------------------------------
. estimates store nlogitrest
.
. * Perform a likelihood ratio test that inclusive parameters = 1
. lrtest nlogitunrest nlogitrest

likelihood-ratio test                                  LR chi2(2)  =    145.39
(Assumption: nlogitrest nested in nlogitunrest)        Prob > chi2 =    0.0000
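The lrtest line reports LR = 2(ln L_unrestricted - ln L_restricted), referred to a chi2(2) distribution because two inclusive value parameters (shore and boat) are fixed at 1. A minimal Python sketch of that computation follows; it is an illustration, not the authors' code, and for brevity it uses the closed-form chi-squared upper tail that holds only for even degrees of freedom. The unrestricted log likelihood is not printed in this excerpt, so the usage lines back it out from the reported statistic (-1252.711 + 145.39/2); treat that value as derived, not reported.

```python
import math


def chi2_sf_even_df(x, df):
    """P[chi2(df) > x] for positive even df, via the closed-form Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-x / 2.0) * total


def lr_test(ll_unrestricted, ll_restricted, df):
    """Likelihood-ratio statistic and its asymptotic chi2(df) p-value."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return stat, chi2_sf_even_df(stat, df)


ll_restricted = -1252.711                      # restricted nested logit (= clogit)
ll_unrestricted = ll_restricted + 145.39 / 2   # backed out, not shown in this excerpt
stat, pval = lr_test(ll_unrestricted, ll_restricted, df=2)
print(f"LR chi2(2) = {stat:.2f}, p = {pval:.3g}")
```

The p-value is far below any conventional level, matching the "Prob > chi2 = 0.0000" line.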

.
. *** (2C) As a check, verify that this restricted nested logit = conditional logit
.
. clogit d p q dshore dshorey, group(id)

Iteration 0:   log likelihood = -1547.6028
Iteration 1:   log likelihood = -1317.5764
Iteration 2:   log likelihood = -1262.8183
Iteration 3:   log likelihood =  -1253.096
Iteration 4:   log likelihood = -1252.7117
Iteration 5:   log likelihood =  -1252.711

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(4)      =     771.78
                                                  Prob > chi2     =     0.0000
Log likelihood =  -1252.711                       Pseudo R2       =     0.2355

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0202461   .0012832   -15.78   0.000    -.0227611   -.0177311
           q |   .7552646   .0918003     8.23   0.000     .5753392    .9351899
      dshore |  -.5897442     .15652    -3.77   0.000    -.8965178   -.2829706
     dshorey |  -.0790866   .0381453    -2.07   0.038    -.1538499   -.0043232
------------------------------------------------------------------------------
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section4\mma15p2gev.txt
  log type:  text
 closed on:  19 May 2005, 12:19:10
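Why check (2C) reproduces the restricted model: in the usual McFadden form of the two-level nested logit, P(j) = P(j | B) P(B), with P(j | B) = exp(V_j/lambda_B) / sum_k exp(V_k/lambda_B) and P(B) proportional to exp(Z_B + lambda_B IV_B), where IV_B is the log-sum over nest B. Setting every lambda_B = 1 collapses the product to exp(Z_B + V_j) / sum over all alternatives, which is the conditional logit. The Python sketch below checks this numerically. The utilities are made-up numbers, the nest labels only mirror the shore/boat tree above, and Stata 8's nlogit may parameterize the model differently from this textbook form.

```python
import math


def nested_logit_probs(v, z, lam):
    """Two-level nested logit probabilities (McFadden form).

    v:   dict nest -> dict alternative -> lower-level utility V_j
    z:   dict nest -> nest-level utility Z_B
    lam: dict nest -> inclusive value parameter lambda_B
    """
    # Inclusive values IV_B = log sum_k exp(V_k / lambda_B)
    iv = {b: math.log(sum(math.exp(vj / lam[b]) for vj in alts.values()))
          for b, alts in v.items()}
    denom = sum(math.exp(z[b] + lam[b] * iv[b]) for b in v)
    probs = {}
    for b, alts in v.items():
        p_nest = math.exp(z[b] + lam[b] * iv[b]) / denom
        for j, vj in alts.items():
            p_within = math.exp(vj / lam[b] - iv[b])   # P(j | nest b)
            probs[j] = p_nest * p_within
    return probs


def conditional_logit_probs(v, z):
    """Plain conditional logit with utility Z_B + V_j for alternative j in nest B."""
    util = {j: z[b] + vj for b, alts in v.items() for j, vj in alts.items()}
    denom = sum(math.exp(u) for u in util.values())
    return {j: math.exp(u) / denom for j, u in util.items()}


# Hypothetical utilities; alternatives 1-2 in "shore", 3-4 in "boat"
v = {"shore": {1: 0.3, 2: -0.1}, "boat": {3: 0.8, 4: 0.2}}
z = {"shore": 0.0, "boat": -0.5}
nl = nested_logit_probs(v, z, {"shore": 1.0, "boat": 1.0})
cl = conditional_logit_probs(v, z)
print(max(abs(nl[j] - cl[j]) for j in nl))   # floating-point rounding only
```

With any lambda_B different from 1 the two sets of probabilities separate, which is what the LR test above exploits.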

------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma16p1tobit.txt
  log type:  text
 opened on:  19 May 2005, 13:00:31

.
. ********** OVERVIEW OF MMA16P1TOBIT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.2.1 pages 530-1 and 16.9.2 page 565
. * Classic Tobit model with generated data
. * Provides
. *  (1) Graph of various conditional means Figure 16.1 (ch16condmeans.wmf)
. *  (2) Tobit model estimation: various estimators not reported in book
. *  (3) Tobit model estimation: CLAD estimation mentioned on page 565
. * using generated data (see below)
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Data generating process is
. *   Regressor:           lnwage ~ N(2.75, 0.6^2)
. *   Error term:          e ~ N(0, 1000^2)
. *   Latent variable:     ystar = -2500 + 1000*lnwage + e
. *   Truncated variable:  ytrunc = ystar if ystar>0, missing otherwise
. *   Censored variable:   ycens = 1(ystar<=0)*0 + 1(ystar>0)*ystar
. *   Censoring indicator: dy = 1(ycens>0)
.
. set seed 10101
. set obs 200
obs was 0, now 200
. gen e = 1000*invnorm(uniform( ))
. gen lnwage = 2.75 + 0.6*invnorm(uniform( ))
. gen ystar = -2500 + 1000*lnwage + e
. gen ytrunc = ystar
. replace ytrunc = . if (ystar < 0)
(70 real changes made, 70 to missing)
. gen ycens = ystar
. replace ycens = 0 if (ystar < 0)
(70 real changes made)
. gen dy = ycens
. replace dy = 1 if (ycens>0)
(130 real changes made)
.
. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           e |     200    76.96455    977.5598  -2906.972   2943.727
      lnwage |     200    2.792559    .6249093   .9039821   4.373462
       ystar |     200    369.5237    1163.722  -2852.944   3105.383
      ytrunc |     130    1047.602    712.0859   17.88135   3105.383
       ycens |     200    680.9414    761.3346          0   3105.383
-------------+--------------------------------------------------------
          dy |     200         .65    .4781665          0          1
.
. * Save data as text (ascii) so that programs other than Stata can use them
. outfile e lnwage ystar ytrunc ycens dy using mma16p1tobit.asc, replace
.
. ********** (1) PLOT THEORETICAL CONDITIONAL MEANS **********
.
. * Here we use the true parameter values used in the dgp
.
. * Compute the censored and truncated means
. gen xb = -2500 + 1000*lnwage
. gen sigma = 1000
. gen capphixb = normprob(xb/sigma)
. gen phixb = normd(xb/sigma)
. gen lamda = phixb/capphixb
. gen eytrunc = xb + sigma*lamda

. gen eycens = capphixb*eytrunc
.
. * Descriptive Statistics
. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           e |     200    76.96455    977.5598  -2906.972   2943.727
      lnwage |     200    2.792559    .6249093   .9039821   4.373462
       ystar |     200    369.5237    1163.722  -2852.944   3105.383
      ytrunc |     130    1047.602    712.0859   17.88135   3105.383
       ycens |     200    680.9414    761.3346          0   3105.383
-------------+--------------------------------------------------------
          dy |     200         .65    .4781665          0          1
          xb |     200    292.5592    624.9093  -1596.018   1873.462
       sigma |     200        1000           0       1000       1000
    capphixb |     200    .5983181    .2092614   .0552424   .9694977
       phixb |     200    .3271769    .0771531   .0689849   .3989196
-------------+--------------------------------------------------------
       lamda |     200    .6687834    .3533611   .0711553   2.020711
     eytrunc |     200    961.3426    283.2587    424.693   1944.617
      eycens |     200    631.3493    380.6074   23.46106   1885.302
.
. * Plot Figure 16.1 on page 531
. sort lnwage
. graph twoway (scatter ystar lnwage, msize(small)) /*
> */ (scatter eytrunc lnwage, c(l) msize(vtiny) clstyle(p3) clwidth(medthick)) /*
> */ (scatter eycens lnwage, c(l) msize(vtiny) clstyle(p2) clwidth(medthick)) /*
> */ (scatter xb lnwage, c(l) msize(vtiny) clstyle(p1) clwidth(medthick)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Tobit: Censored and Truncated Means") /*
> */ xtitle("Natural Logarithm of Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Different Conditional Means", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(5) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Latent Variable") label(2 "Truncated Mean") /*
> */ label(3 "Censored Mean") label(4 "Uncensored Mean"))
. graph export ch16condmeans.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch16condmeans.wmf written in Windows Metafile format)
.
. ********** (2) TOBIT MODEL ESTIMATION FOR THESE DATA **********
.
. * These are computations not reported in the book.
.
. * With only 200 observations the Heckman 2-step estimates given below
. * are very inefficient. To verify that they are consistent,
. * increase the sample size, e.g. set obs 20000
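Section (1) evaluates the standard tobit means at the true parameters: with z = xb/sigma and lambda(z) = phi(z)/PHI(z), the truncated mean is E[y | y > 0] = xb + sigma*lambda(z) (eytrunc) and the censored mean is E[y] = PHI(z)(xb + sigma*lambda(z)) (eycens). A Python sketch of the same formulas follows, together with a large-sample Monte Carlo check in the spirit of the note above about raising the sample size. Python's random module is an assumption of the sketch and will not reproduce Stata's seed-10101 draws.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_means(xb, sigma):
    """(E[y | y > 0], E[y]) for y = max(0, y*), y* ~ N(xb, sigma^2)."""
    z = xb / sigma
    lam = norm_pdf(z) / norm_cdf(z)      # 'lamda' in the log
    e_trunc = xb + sigma * lam           # 'eytrunc'
    e_cens = norm_cdf(z) * e_trunc       # 'eycens'
    return e_trunc, e_cens


def simulated_means(xb, sigma, n=200_000, seed=10101):
    """Monte Carlo counterparts of tobit_means (Python RNG, not Stata's)."""
    rng = random.Random(seed)
    draws = [xb + rng.gauss(0.0, sigma) for _ in range(n)]
    positive = [d for d in draws if d > 0]
    return sum(positive) / len(positive), sum(max(d, 0.0) for d in draws) / n


# Largest xb in the generated data is -2500 + 1000*4.373462 = 1873.462;
# compare with the maxima of eytrunc and eycens in the summary above.
print(tobit_means(1873.462, 1000.0))
print(tobit_means(0.0, 1000.0), simulated_means(0.0, 1000.0))
```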

.
. * (2A) ESTIMATE THE VARIOUS MODELS
.
. *** UNCENSORED OLS REGRESSION
. * Possible here since for these generated data we actually know ystar
. * Yields consistent estimate. Expect slope = 1000 approximately.
. regress ystar lnwage, robust

Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   198) =   96.32
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2944
                                                       Root MSE      =     980

------------------------------------------------------------------------------
             |               Robust
       ystar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |    1010.39   102.9518     9.81   0.000     807.3673    1213.413
       _cons |   -2452.05   303.2432    -8.09   0.000    -3050.051   -1854.049
------------------------------------------------------------------------------
. estimates store ols
. predict ystarols
(option xb assumed; fitted values)
.
. *** CENSORED OLS REGRESSION
. * Yields inconsistent estimates
. * From subsection 16.3.6, for the slope coefficient OLS converges to p times b
. * where p is the fraction of the sample with positive values. Here 0.65*1000 = 650.
. regress ycens lnwage, robust

Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   198) =   84.20
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2522
                                                       Root MSE      =  660.04

------------------------------------------------------------------------------
             |               Robust
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   611.8108   66.67493     9.18   0.000     480.3267    743.2949
       _cons |  -1027.577   176.0776    -5.84   0.000    -1374.805   -680.3484
------------------------------------------------------------------------------
. estimates store censols
. predict ycensols
(option xb assumed; fitted values)
.
. *** TRUNCATED OLS REGRESSION for POSITIVE WAGE
. * Yields inconsistent estimates
. * See subsection 16.3.6 for discussion.
. regress ytrunc lnwage, robust

Regression with robust standard errors                 Number of obs =     130
                                                       F(  1,   128) =   22.05
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1261
                                                       Root MSE      =  668.28

------------------------------------------------------------------------------
             |               Robust
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   442.6319   94.26938     4.70   0.000     256.1038      629.16
       _cons |  -282.4444   282.9091    -1.00   0.320    -842.2285    277.3396
------------------------------------------------------------------------------
. estimates store truncols
. predict ytrunols
(option xb assumed; fitted values)
.
. *** CENSORED TOBIT MLE REGRESSION for HWAGE
. * Yields consistent estimates
. tobit ycens lnwage, ll(0)

Tobit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =      65.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -1118.3857                       Pseudo R2       =     0.0285

------------------------------------------------------------------------------
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   956.4877   116.8382     8.19   0.000     726.0879    1186.887
       _cons |  -2244.567   346.8778    -6.47   0.000    -2928.595   -1560.539
-------------+----------------------------------------------------------------
         _se |   896.6811   59.14988           (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:         70  left-censored observations at ycens<=0
                       130     uncensored observations

. estimates store censtobit
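The censored OLS slope reported above (611.81) is close to the 0.65 x 1000 = 650 benchmark quoted in the comments. When the regressor and the error are jointly normal, as in this design, the proportionality holds exactly in the population: by Stein's lemma, Cov(x, max(0, y*)) = beta Var(x) Pr[y* > 0], so the OLS slope converges to beta times Pr[y* > 0]. The Python sketch below checks this by simulation with an assumed seed and sample size. Note that the population share of positive y* implied by the design is about 0.585 (y* ~ N(250, 1000^2 + 600^2)), so the population benchmark is roughly 585 rather than the sample-based 650.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares regression of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def censored_ols_check(n=200_000, seed=12345):
    """Simulate the mma16p1tobit design; return (censored OLS slope, share of y* > 0)."""
    rng = random.Random(seed)
    lnwage = [2.75 + 0.6 * rng.gauss(0.0, 1.0) for _ in range(n)]
    ystar = [-2500.0 + 1000.0 * w + 1000.0 * rng.gauss(0.0, 1.0) for w in lnwage]
    ycens = [max(0.0, v) for v in ystar]
    share_pos = sum(v > 0 for v in ystar) / n
    return ols_slope(lnwage, ycens), share_pos


slope, p = censored_ols_check()
print(f"censored OLS slope = {slope:.1f}, p*beta = {1000 * p:.1f}")
```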


. predict ycenstob
(option xb assumed; fitted values)
.
. *** TRUNCATED TOBIT MLE REGRESSION for HWAGE
. * If done properly yields consistent estimates
. * Not sure how to do this in Stata
. * The obvious command is
. *   tobit ytrunc lnwage, ll(0)
. * but this gives the same estimates as truncated OLS
.
. *** PROBIT REGRESSION for HWAGE
. * Yields consistent estimates for slope b/s = 1000/1000 = 1
. * but uses less information so expect less efficient than tobit
. probit dy lnwage

Iteration 0:   log likelihood = -129.48933
Iteration 1:   log likelihood = -106.07902
Iteration 2:   log likelihood = -105.30024
Iteration 3:   log likelihood = -105.29672

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      48.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -105.29672                       Pseudo R2       =     0.1868

------------------------------------------------------------------------------
          dy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
------------------------------------------------------------------------------
. estimates store probit
. predict yprobit
(option p assumed; Pr(dy))
.
. *** HECKMAN 2-STEP ESTIMATOR DONE MANUALLY
. * Yields consistent estimates but less efficient than censored tobit MLE
. * The second stage standard errors will be incorrect
. probit dy lnwage

Iteration 0:   log likelihood = -129.48933
Iteration 1:   log likelihood = -106.07902
Iteration 2:   log likelihood = -105.30024
Iteration 3:   log likelihood = -105.29672

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      48.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -105.29672                       Pseudo R2       =     0.1868

------------------------------------------------------------------------------
          dy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
------------------------------------------------------------------------------
. predict probity, xb
. gen invmills = normd(probity)/normprob(probity)
. summarize dy probity invmills

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          dy |     200         .65    .4781665          0          1
     probity |     200     .482335    .7335506  -1.734574    2.33808
    invmills |     200    .5867037    .3823083   .0261866   2.140342
. regress ytrunc lnwage invmills

      Source |       SS       df       MS              Number of obs =     130
-------------+------------------------------           F(  2,   127) =    9.41
       Model |  8440402.78     2  4220201.39           Prob > F      =  0.0002
    Residual |  56971158.9   127  448591.802           R-squared     =  0.1290
-------------+------------------------------           Adj R-squared =  0.1153
       Total |  65411561.6   129  507066.369           Root MSE      =  669.77

------------------------------------------------------------------------------
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   176.6468   418.2392     0.42   0.673    -650.9731    1004.267
    invmills |  -498.9958   760.3525    -0.66   0.513    -2003.596    1005.604
       _cons |   745.3069   1597.558     0.47   0.642    -2415.972    3906.586
------------------------------------------------------------------------------
. estimates store heck2step
. correlate lnwage invmills
(obs=200)

             |   lnwage invmills
-------------+------------------
      lnwage |   1.0000
    invmills |  -0.9745   1.0000
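The manual two-step forms invmills = phi(x'g)/PHI(x'g) from the probit index and adds it to the truncated regression. Because lnwage enters both the selection and the outcome equation (no exclusion restriction), invmills is nearly a linear function of lnwage over the observed range; hence the correlation of -0.9745 and the very large standard errors on lnwage and invmills. The Python sketch below, an illustration rather than the authors' code, rebuilds the ratio from the reported probit coefficients and shows the same near-collinearity for simulated lnwage draws from the assumed N(2.75, 0.6^2) design.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z):
    """phi(z)/PHI(z), the 'invmills' regressor in the log."""
    return norm_pdf(z) / norm_cdf(z)


def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Probit estimates reported above (dy on lnwage)
GAMMA_CONS, GAMMA_LNWAGE = -2.795715, 1.173851


def simulated_collinearity(n=100_000, seed=2005):
    """Correlation between simulated lnwage and its implied inverse Mills ratio."""
    rng = random.Random(seed)
    lnwage = [2.75 + 0.6 * rng.gauss(0.0, 1.0) for _ in range(n)]
    imr = [inverse_mills(GAMMA_CONS + GAMMA_LNWAGE * w) for w in lnwage]
    return correlation(lnwage, imr)


print(inverse_mills(-1.734574), inverse_mills(2.33808))   # at the extremes of probity
print(simulated_collinearity())
```

The first line should match the Max and Min of invmills in the summary above.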

.
. * And more robust standard errors may be found by
. regress ytrunc lnwage invmills, robust

Regression with robust standard errors                 Number of obs =     130
                                                       F(  2,   127) =   13.96
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1290
                                                       Root MSE      =  669.77

------------------------------------------------------------------------------
             |               Robust
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   176.6468   379.1739     0.47   0.642    -573.6699    926.9636
    invmills |  -498.9958   635.4917    -0.79   0.434    -1756.519    758.5276
       _cons |   745.3069   1431.149     0.52   0.603     -2086.68    3577.293
------------------------------------------------------------------------------
. estimates store heck2srobust
.
. *** HECKMAN 2-STEP ESTIMATOR DONE USING BUILT-IN HECKMAN COMMAND
. * Yields consistent estimates but less efficient than censored tobit MLE
. heckman ytrunc lnwage, select(lnwage) twostep

Heckman selection model -- two-step estimates   Number of obs      =       200
(regression model with sample selection)        Censored obs       =        70
                                                Uncensored obs     =       130

                                                Wald chi2(2)       =     39.57
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ytrunc       |
      lnwage |   176.6469   425.0025     0.42   0.678    -656.3428    1009.636
       _cons |   745.3067   1617.583     0.46   0.645    -2425.098    3915.711
-------------+----------------------------------------------------------------
select       |
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
-------------+----------------------------------------------------------------
mills        |
      lambda |  -498.9957   760.5005    -0.66   0.512    -1989.549    991.5578
-------------+----------------------------------------------------------------
         rho |   -0.67419
       sigma |   740.1433
      lambda | -498.99575   760.5005
------------------------------------------------------------------------------
. estimates store heckman

. predict ystarhec, xb
. predict ytrunhec, ycond
. predict ycenshec, yexpected
. predict yinvmill, mills
. predict yprobsel, psel
. correlate lnwage yinvmill
(obs=200)

             |   lnwage yinvmill
-------------+------------------
      lnwage |   1.0000
    yinvmill |  -0.9745   1.0000

.
. * (2B) DISPLAY COEFFICIENT ESTIMATES
.
. * OLS estimates     True model is -2500 + 1000*lnwage
. estimates table ols censols truncols, b(%10.2f) se(%10.2f) t stats(N ll)

----------------------------------------------------
    Variable |    ols          censols      truncols
-------------+--------------------------------------
      lnwage |    1010.39       611.81       442.63
             |     102.95        66.67        94.27
             |       9.81         9.18         4.70
       _cons |   -2452.05     -1027.58      -282.44
             |     303.24       176.08       282.91
             |      -8.09        -5.84        -1.00
-------------+--------------------------------------
           N |     200.00       200.00       130.00
          ll |   -1660.29     -1581.24     -1029.07
----------------------------------------------------
                                      legend: b/se/t
.
. * Tobit estimates     True model is -2500 + 1000*lnwage
. estimates table censtobit probit, b(%10.2f) se(%10.2f) t stats(N ll)

----------------------------------------
    Variable | censtobit      probit
-------------+--------------------------
      lnwage |     956.49         1.17
             |     116.84         0.19
             |       8.19         6.28
         _se |     896.68
             |      59.15
             |      15.16
       _cons |   -2244.57        -2.80
             |     346.88         0.51
             |      -6.47        -5.50
-------------+--------------------------
           N |     200.00       200.00
          ll |   -1118.39      -105.30
----------------------------------------
                          legend: b/se/t
.
. * Tobit estimates using Heckman manual     True model is -2500 + 1000*lnwage
. estimates table heck2step heck2srobust, b(%10.2f) se(%10.2f) t stats(N ll)

----------------------------------------
    Variable | heck2step    heck2sro~t
-------------+--------------------------
      lnwage |     176.65       176.65
             |     418.24       379.17
             |       0.42         0.47
    invmills |    -499.00      -499.00
             |     760.35       635.49
             |      -0.66        -0.79
       _cons |     745.31       745.31
             |    1597.56      1431.15
             |       0.47         0.52
-------------+--------------------------
           N |     130.00       130.00
          ll |   -1028.85     -1028.85
----------------------------------------
                          legend: b/se/t
.
. * Tobit estimates using Heckman built-in     True model is -2500 + 1000*lnwage
. estimates table heckman, b(%10.2f) se(%10.2f) t stats(N ll)

---------------------------
    Variable |   heckman
-------------+-------------
ytrunc       |
      lnwage |      176.65
             |      425.00
             |        0.42
       _cons |      745.31
             |     1617.58
             |        0.46
-------------+-------------
select       |
      lnwage |        1.17
             |        0.19
             |        6.28
       _cons |       -2.80
             |        0.51
             |       -5.50
-------------+-------------
mills        |
      lambda |     -499.00
             |      760.50
             |       -0.66
-------------+-------------
Statistics   |
           N |      200.00
          ll |
---------------------------
              legend: b/se/t
.
. ********** (3) CLAD ESTIMATION FOR THESE DATA page 565 **********
.
. * Compare tobit MLE with censored least absolute deviations (CLAD) estimator
. * Gives results at end of section 16.9.3 page 565
.
. tobit ycens lnwage, ll(0)

Tobit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =      65.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -1118.3857                       Pseudo R2       =     0.0285

------------------------------------------------------------------------------
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   956.4877   116.8382     8.19   0.000     726.0879    1186.887
       _cons |  -2244.567   346.8778    -6.47   0.000    -2928.595   -1560.539
-------------+----------------------------------------------------------------
         _se |   896.6811   59.14988           (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:         70  left-censored observations at ycens<=0
                       130     uncensored observations
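CLAD (Powell's censored least absolute deviations estimator) fits the median regression med[y | x] = max(0, x'b), which remains consistent without normality or homoskedasticity of the error. The clad output that follows ("Initial sample size = 200", "Final sample size = 159") is consistent with the iterative scheme commonly used for it: fit LAD, keep only observations whose fitted index is positive, refit, and repeat until the retained sample stops changing. The Python sketch below follows that scheme under stated assumptions: LAD is approximated by iteratively reweighted least squares rather than solved by linear programming, there is a single regressor, censoring is at zero, and the bootstrap standard errors are not reproduced. It is an illustration, not the implementation behind the clad command.

```python
import random


def wls_line(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def lad_line(x, y, iters=50, floor=1.0):
    """Approximate LAD fit by IRLS: weights 1/|residual|, floored to avoid 1/0."""
    a, b = wls_line(x, y, [1.0] * len(x))           # start from OLS
    for _ in range(iters):
        w = [1.0 / max(abs(yi - a - b * xi), floor) for xi, yi in zip(x, y)]
        a, b = wls_line(x, y, w)
    return a, b


def clad(x, y, max_rounds=20):
    """Iterative CLAD with left-censoring at 0; returns (a, b, retained sample size)."""
    keep = list(range(len(x)))
    a, b = 0.0, 0.0
    for _ in range(max_rounds):
        a, b = lad_line([x[i] for i in keep], [y[i] for i in keep])
        new_keep = [i for i in range(len(x)) if a + b * x[i] > 0]
        if len(new_keep) < 3:
            raise ValueError("too few observations with a positive fitted index")
        if new_keep == keep:
            break
        keep = new_keep
    return a, b, len(keep)


if __name__ == "__main__":
    rng = random.Random(10101)                      # Python RNG, not Stata's draws
    lnwage = [2.75 + 0.6 * rng.gauss(0.0, 1.0) for _ in range(200)]
    ycens = [max(0.0, -2500.0 + 1000.0 * w + 1000.0 * rng.gauss(0.0, 1.0)) for w in lnwage]
    print(clad(lnwage, ycens))
```

The IRLS floor of 1 is small relative to residuals in the hundreds here; a linear-programming LAD solver would be the more exact choice.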

. clad ycens lnwage, reps(100) ll(0)

Initial sample size = 200
Final sample size = 159
Pseudo R2 = .12380382

Bootstrap statistics

Variable |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
---------+------------------------------------------------------------------
  lnwage |   100   838.2366   59.09127   165.7476    509.3575   1167.116  (N)
         |                                          666.9485   1298.217  (P)
         |                                           664.528   1247.371 (BC)
---------+------------------------------------------------------------------
   const |   100  -1897.847  -184.2656   529.6713    -2948.83  -846.8643  (N)
         |                                         -3406.233  -1435.466  (P)
         |                                         -3406.233  -1435.466 (BC)
----------------------------------------------------------------------------
N = normal, P = percentile, BC = bias-corrected
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section4\mma16p1tobit.txt
  log type:  text
 closed on:  19 May 2005, 13:00:37
------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma16p2mills.txt
  log type:  text
 opened on:  19 May 2005, 13:02:12

.
. ********** OVERVIEW OF MMA16P2MILLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.3.4 page 540
. * Presentation of Mills ratio
. * It provides
. *  (1) Figure 16.2 (ch16millsratio.wmf)
. * This program requires no data
.
. ********** SETUP ***********
.
. set more off
. version 8
. set scheme s1mono   /* Used for graphs */
.
. ********** GENERATE DATA AND FUNCTIONS
.
. * Create density, cdf and Mills ratio for N[0,1]
. set obs 100
obs was 0, now 100
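The Mills-ratio program builds a 100-point grid c = 4(50 - _n)/100, running from 1.96 down to -2, and on it the N[0,1] cdf PHIc, density phic and lamdac = phic/(1 - PHIc). By symmetry, phi(c)/(1 - PHI(c)) = phi(-c)/PHI(-c), so this is the same function that supplies the invmills regressor in the tobit program, evaluated at -c. A Python sketch of the grid follows; it is illustrative, recomputes the columns only, and does not draw Figure 16.2.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mills_grid(n_obs=100):
    """Rows (c, PHIc, phic, lamdac) mirroring the generated variables."""
    rows = []
    for n in range(1, n_obs + 1):             # Stata's _n runs from 1
        c = 4 * (50 - n) / 100
        phic, big_phic = norm_pdf(c), norm_cdf(c)
        rows.append((c, big_phic, phic, phic / (1.0 - big_phic)))
    return rows


rows = mills_grid()
lam = [r[3] for r in rows]
print(rows[0][0], rows[-1][0], min(lam), max(lam), sum(lam) / len(lam))
```

Stata's gen stores float (single precision) by default, so its summary statistics can differ from these double-precision values in the last digit or so.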

. gen c = 4*(50-_n)/100
. gen PHIc = norm(c)
. gen phic = normden(c)
. gen lamdac = phic/(1-PHIc)
.
. * Descriptive statistics
. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           c |     100        -.02     1.16046         -2       1.96
        PHIc |     100    .4952275     .338039   .0227501   .9750021
        phic |     100    .2386177    .1157086    .053991   .3989423
      lamdac |     100    .9284788    .7023349   .0552479   2.337835
.
. *********** FIGURE 16.2 page 540 ***********
.
. * This graph shows Mills ratio and cdf and density
. graph twoway (scatter lamdac c, c(l) msize(vtiny) clstyle(p1) clwidth(medthick)) /*
> */ (scatter PHIc c, c(l) msize(vtiny) clstyle(p3) clwidth(medthick)) /*
> */ (scatter phic c, c(l) msize(vtiny) clstyle(p2) clwidth(medthick)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Inverse Mills Ratio as Cutoff Varies") /*
> */ xtitle("Cutoff point c", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Inverse Mills, pdf and cdf", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Inverse Mills ratio") label(2 "N[0,1] Cdf") label(3 "N[0,1] Density"))
. graph export ch16millsratio.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch16millsratio.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT ***********
. log close
       log:  c:\Imbook\bwebpage\Section4\mma16p2mills.txt
  log type:  text
 closed on:  19 May 2005, 13:02:15
------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma16p3selection.txt
  log type:  text
 opened on:  19 May 2005, 13:04:33

.
. ********** OVERVIEW OF MMA16P3SELECTION.DO **********

.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.6 pages 553-5
. * Selection models example
. * It provides
. *  (1) Two-part model estimation (Table 16.1)
. *  (2) Selection model estimation
. *      (2A) ML estimates (Table 16.1)
. *      (2B) Heckman 2-step estimates (Table 16.1)
. *      (2C) Check for possible collinearity problems in Heckman 2-Step
.
. * To use this program you need health expenditure data in Stata data set
. * randdata.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * Essentially same data as in P. Deb and P.K. Trivedi (2002)
. * "The Structure of Demand for Medical Care: Latent Class versus
. * Two-Part Models", Journal of Health Economics, 21, 601-625
. * except that paper used different outcome (counts rather than $)
.
. * Each observation is for an individual over a year.
. * Individuals may appear in up to five years.
. * The full available sample is used, except that only fee-for-service plans are included.
. * In analysis here only year 2 is used so panel complications are avoided.
. * Clustering of individuals within household is ignored here.
.
. * Dependent variable is
. *  MED       med       Annual medical expenditures in constant dollars
. *                      excluding dental and outpatient mental
. *  LNMED     lnmeddol  Ln(Medical expenditures) given meddol > 0
. *                      Missing otherwise
. *  DMED      binexp    1 if medical expenditures > 0
.
. * Regressors are
. * - Health insurance measures
. *  LC        logc      log(coinsrate+1) where coinsurance rate is 0 to 100

. *  IDP       idp       1 if individual deductible plan
. *  LPI       lpi       log(annual participation incentive payment) or 0 if no payment
. *  FMDE      fmde      log(max(medical deductible expenditure)) if IDP=1 and MDE>1, or 0 otherwise
. * - Health status measures
. *  NDISEASE  disea     number of chronic diseases
. *  PHYSLIM   physlm    1 if physical limitation
. *  HLTHG     hlthg     1 if good health
. *  HLTHF     hlthf     1 if fair health
. *  HLTHP     hlthp     1 if poor health (omitted is excellent)
. * - Socioeconomic characteristics
. *  LINC      linc      log of annual family income (in $)
. *  LFAM      lfam      log of family size
. *  EDUCDEC   educdec   years of schooling of decision maker
. *  AGE       xage      exact age
. *  BLACK     black     1 if black
. *  FEMALE    female    1 if female
. *  CHILD     child     1 if child
. *  FEMCHILD  fchild    1 if female child
.
. * If panel data used then clustering is on
. *            zper      person id
.
. ********** READ DATA **********
.
. use randdata.dta, clear
. sum

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        plan |   20190    11.17553    3.976751          1         19
        site |   20190    3.298811     1.80382          1          6
       coins |   20190     26.3056    36.40386          0        100
    tookphys |   20190    .5974245    .4904288          0          1
        year |   20190    2.420109    1.217141          1          5
-------------+--------------------------------------------------------
        zper |   20190    357965.5    180868.1     125024     632167
       black |   20190    .1814983    .3827071          0          1
      income |   20190    8037.409    4058.371          0   29237.54
        xage |   20190    25.72233    16.76945          0   64.27515
      female |   20190    .5170381     .499722          0          1
-------------+--------------------------------------------------------
     educdec |   20186    11.96681    2.806255          0         25
        time |   20190    .9989561    .0259741   .0767123          1
     outpdol |   20190    51.12649    94.92627          0   2599.902
     drugdol |   20190     13.1687    33.76212          0   706.3979
     suppdol |   20190      6.8024    21.39346          0    1009.47
-------------+--------------------------------------------------------
     mentdol |   20190    6.870347    58.41298          0   1340.834

      inpdol |   20190    100.4694    655.6215          0   38649.81
      meddol |   20190    171.5679    698.2015          0   39182.02
      totadm |   20190    .1127291    .4111857          0          8
      inpmis |   20190    .0039624     .062824          0          1
-------------+--------------------------------------------------------
     mentvis |   20190    .4322437    3.430789          0         62
       mdvis |   20190    2.860426    4.504365          0         77
    notmdvis |   20190    .6855869    3.763543          0        109
         num |   20190    3.954235    1.853034          1         14
         mhi |   20190    76.55584    12.50224       12.2        100
-------------+--------------------------------------------------------
       disea |   20190    11.24449    6.741449          0       58.6
      physlm |   20190    .1235003    .3220164          0          1
      ghindx |   14967    73.09055    15.99371        3.7        100
      mdeoff |   20185    417.8422    384.1199          0       1000
       pioff |   20185     446.677     367.466          0    1291.68
-------------+--------------------------------------------------------
       child |   20190    .4013373    .4901812          0          1
      fchild |   20190    .1937098    .3952139          0          1
        lfam |   20190    1.248156     .539301          0   2.639057
         lpi |   20190    4.707894     2.69784          0   7.163699
         idp |   20190    .2599802    .4386343          0          1
-------------+--------------------------------------------------------
        logc |   20190    2.383342    2.041776          0   4.564348
        fmde |   20190    4.029524    3.471353          0   8.294049
       hlthg |   20190    .3620109    .4805938          0          1
       hlthf |   20190     .077266    .2670196          0          1
       hlthp |   20190    .0149579    .1213874          0          1
-------------+--------------------------------------------------------
     xghindx |   20190     73.2375     14.2332        3.7        100
        linc |   20190    8.708265    1.228309          0   10.28324
        lnum |   20190    1.248156     .539301          0   2.639057
    lnmeddol |   15737    4.109318    1.484654  -.8495329   10.57597
      binexp |   20190    .7794453     .414631          0          1
.
. /* Describe and summarize the original data.
> describe
> summarize
> * The original data are a panel.
> * The following summarizes panel features for completeness
> iis zper
> tis year
> xtdes
> xtsum meddol lnmeddol binexp
> */
.
. ********** DATA SELECTION AND TRANSFORMATIONS **********
.
. * Use only Year 2
. keep if year==2

(14615 observations deleted)
.
. * educdec is missing for one observation
. drop if educdec==.
(1 observation deleted)
.
. * rename variables
. rename meddol MED
. rename binexp DMED
. rename lnmeddol LNMED
. rename linc LINC
. rename lfam LFAM
. rename educdec EDUCDEC
. rename xage AGE
. rename female FEMALE
. rename child CHILD
. rename fchild FEMCHILD
. rename black BLACK
. rename disea NDISEASE
. rename physlm PHYSLIM
. rename hlthg HLTHG
. rename hlthf HLTHF
. rename hlthp HLTHP
. rename idp IDP
. rename logc LC
. rename lpi LPI
. rename fmde FMDE
.
. * Define the regressor list, which commands can refer to as $XLIST

. global XLIST LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
> */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK
.
. * Summarize the dependents and regressors
. sum MED DMED LNMED $XLIST

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MED |    5574    169.7247    802.8303          0   39182.02
        DMED |    5574    .7680301    .4221277          0          1
       LNMED |    4281    4.069462    1.499372  -.5343859   10.57597
          LC |    5574    2.420739    2.043883          0   4.564348
         IDP |    5574     .261751    .4396272          0          1
-------------+--------------------------------------------------------
         LPI |    5574    4.726834    2.681354          0   7.163699
        FMDE |    5574    4.065015    3.450558          0   8.294049
     PHYSLIM |    5574    .1242463    .3233768          0          1
    NDISEASE |    5574    11.20526    6.788959          0       58.6
       HLTHG |    5574    .3649085    .4814477          0          1
-------------+--------------------------------------------------------
       HLTHF |    5574    .0782203     .268542          0          1
       HLTHP |    5574    .0156082     .123965          0          1
        LINC |    5574    8.696929    1.220592          0   10.28324
        LFAM |    5574    1.241407    .5403965          0   2.564949
     EDUCDEC |    5574     11.9466    2.837492          0         25
-------------+--------------------------------------------------------
         AGE |    5574    25.57613    16.73011   .0253251   63.27515
      FEMALE |    5574    .5184787    .4997032          0          1
       CHILD |    5574    .4050951    .4909545          0          1
    FEMCHILD |    5574    .1955508    .3966597          0          1
       BLACK |    5574    .1859852    .3860055          0          1
.
. * Detailed summary shows that MED>0 very skewed whereas LNMED is not
. sum MED LNMED if MED>0, detail

               medical exp excl outpatient men
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.109705       .5860291
 5%     5.752914       .6630728
10%     9.376465       .6770833       Obs                4281
25%     21.31435       .6770833       Sum of Wgt.        4281

50%     52.64357                      Mean            220.987
                        Largest       Std. Dev.      909.9021
75%     136.4518       12044.11
90%     453.8059       17465.98       Variance       827921.9
95%      904.328       18641.98       Skewness       24.00829
99%     2666.309       39182.02       Kurtosis        873.379

                            LNMED
-------------------------------------------------------------
      Percentiles      Smallest
 1%      .746548      -.5343859
 5%     1.749707      -.4108706
10%     2.238203      -.3899609       Obs                4281
25%     3.059381      -.3899609       Sum of Wgt.        4281

50%     3.963544                      Mean           4.069462
                        Largest       Std. Dev.      1.499372
75%     4.915971       9.396331
90%      6.11767        9.76801       Variance       2.248116
95%     6.807192       9.833171       Skewness        .347695
99%     7.888451       10.57597       Kurtosis        3.28909

.
. * Write final data to a text (ascii) file so that programs other than Stata can use them
. outfile DMED MED LNMED LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
> */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK /*
> */ using mma16p3selection.asc, replace
.
. ****************** CHAPTER 16.6 REGRESSION ANALYSIS **************
.
. * The analysis below models log expenditure (lny), not expenditure (y),
. * where here y = MED and lny = LNMED.
.
. * This makes regular tobit difficult as it is not clear
. * what the censoring/truncation point is, since ln(0) = -infinity.
. * Also note that some LNMED<0, as 0<MED<1 is possible.
. * So just do the two-part model and the sample selection model.
.
. * At the end of the day the interest is in MED, not LNMED.
. * So use:
. *   If lny = xb + u, u ~ N[0, s^2] for y > 0,
. *   then E[y|y>0] = exp(xb + (s^2)/2)
. *   and  E[y] = Pr[y>0]*exp(xb + (s^2)/2) unconditionally
.
. * The models estimated are
. *  (1) Two-part model using
. *      (a) probit for whether y is positive
. *      (b) regress with lny as dependent variable
. *  (2) Sample selection model similar to (1)
. *      except that the inverse Mills ratio appears in (b), estimated by
. *      (a) MLE
. *      (b) Heckman 2-step
.
. * Additionally, censored tobit and truncated tobit commands in levels
. * are given below for completeness.
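The retransformation used to compare models on the MED scale, written out: if ln y = x'b + u with u ~ N[0, s^2] among positive spenders, then E[y | y > 0] = exp(x'b + s^2/2), and the two-part prediction multiplies this by the probit probability Pr[y > 0]. Exponentiating the log-scale prediction alone, exp(x'b), understates E[y | y > 0] by the factor exp(s^2/2). The Python sketch below states the formulas and checks the lognormal mean by simulation; the inputs in the usage lines are hypothetical, not estimates from the RAND data.

```python
import math
import random


def lognormal_mean_positive(xb, s):
    """E[y | y > 0] when ln y = xb + u, u ~ N(0, s^2)."""
    return math.exp(xb + 0.5 * s * s)


def two_part_mean(p_positive, xb, s):
    """Unconditional E[y] = Pr[y > 0] * E[y | y > 0]."""
    return p_positive * lognormal_mean_positive(xb, s)


def simulated_positive_mean(xb, s, n=200_000, seed=99):
    """Monte Carlo estimate of E[exp(xb + u)], u ~ N(0, s^2)."""
    rng = random.Random(seed)
    return sum(math.exp(xb + s * rng.gauss(0.0, 1.0)) for _ in range(n)) / n


# Hypothetical inputs for one individual (not estimates from the log)
xb_hat, s_hat, p_hat = 4.0, 1.2, 0.77
print(math.exp(xb_hat),                           # naive retransformation
      lognormal_mean_positive(xb_hat, s_hat),     # E[y | y > 0]
      two_part_mean(p_hat, xb_hat, s_hat))        # E[y]
```

In practice s^2 would be estimated from the residuals of the LNMED regression, and Pr[y > 0] would come from the probit (probsel2part).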

.
. ************ (1) TWO-PART MODEL ************
.
. * Two-part model: binary probit and then lognormal for expenditures
.
. * First part: probit for MED > 0
. probit DMED $XLIST       /* global XLIST defined earlier */

Iteration 0:   log likelihood = -3019.1326
Iteration 1:   log likelihood =  -2698.302
Iteration 2:   log likelihood = -2690.6146
Iteration 3:   log likelihood = -2690.5768
Iteration 4:   log likelihood = -2690.5768

Probit estimates                                  Number of obs   =       5574
                                                  LR chi2(17)     =     657.11
                                                  Prob > chi2     =     0.0000
Log likelihood = -2690.5768                       Pseudo R2       =     0.1088

------------------------------------------------------------------------------
        DMED |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |   -.118708   .0269005    -4.41   0.000    -.1714319    -.065984
         IDP |  -.1279483   .0522351    -2.45   0.014    -.2303272   -.0255693
         LPI |   .0283091   .0088793     3.19   0.001      .010906    .0457121
        FMDE |   .0075319   .0161584     0.47   0.641     -.024138    .0392018
     PHYSLIM |   .2732013   .0743761     3.67   0.000     .1274268    .4189758
    NDISEASE |   .0224861   .0035958     6.25   0.000     .0154384    .0295338
       HLTHG |   .0387516   .0438545     0.88   0.377    -.0472016    .1247049
       HLTHF |   .1920062   .0836688     2.29   0.022     .0280185     .355994
       HLTHP |   .6397294   .2126322     3.01   0.003      .222978    1.056481
        LINC |   .0518413   .0168128     3.08   0.002     .0188889    .0847938
        LFAM |  -.0335599    .041728    -0.80   0.421    -.1153452    .0482253
     EDUCDEC |    .036307   .0076536     4.74   0.000     .0213062    .0513078
         AGE |   .0002631   .0021606     0.12   0.903    -.0039715    .0044978
      FEMALE |   .4451035    .054292     8.20   0.000     .3386932    .5515138
       CHILD |    .111489   .0808338     1.38   0.168    -.0469424    .2699203
    FEMCHILD |  -.4512845   .0799219    -5.65   0.000    -.6079284   -.2946405
       BLACK |  -.6057367   .0523148   -11.58   0.000    -.7082718   -.5032017
       _cons |   -.271605   .1877345    -1.45   0.148    -.6395579    .0963478
------------------------------------------------------------------------------
. estimates store twoparta       /* version 8 command for later table */
. scalar llprobit = e(ll)        /* Log-likelihood */
. predict probsel2part, p        /* Pr[y>0] = PHI(x'b) */
. predict xbprobit, xb           /* x'b */
.
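The LR chi2(17) and Pseudo R2 in the probit header follow directly from two log-likelihoods in the iteration log: the Iteration 0 value, which here is the constant-only log-likelihood (the LR statistic it implies matches the reported 657.11), and the converged value. A minimal Python sketch, with a hypothetical helper name, reproduces both statistics.

```python
def binary_model_fit_stats(ll_full: float, ll_const: float) -> tuple[float, float]:
    """Return (LR chi-squared, McFadden pseudo-R2) for a binary-outcome model.

    ll_const is the log-likelihood of the constant-only model and
    ll_full is the log-likelihood at convergence.
    """
    lr_chi2 = 2.0 * (ll_full - ll_const)
    pseudo_r2 = 1.0 - ll_full / ll_const
    return lr_chi2, pseudo_r2


# Values copied from the probit iteration log above.
lr_chi2, pseudo_r2 = binary_model_fit_stats(ll_full=-2690.5768, ll_const=-3019.1326)
print(f"LR chi2(17) = {lr_chi2:.2f}   Pseudo R2 = {pseudo_r2:.4f}")
```

The 17 degrees of freedom are the 17 slope coefficients in $XLIST that are zero under the constant-only model.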

. * Second part: OLS for log of positive values
. * Here LNMED is missing if MED = 0
. regress LNMED $XLIST

      Source |       SS       df       MS              Number of obs =    4281
-------------+------------------------------           F( 17,  4263) =   39.69
       Model |  1314.70352    17   77.335501           Prob > F      =  0.0000
    Residual |  8307.23358  4263  1.94868252           R-squared     =  0.1366
-------------+------------------------------           Adj R-squared =  0.1332
       Total |   9621.9371  4280  2.24811614           Root MSE      =   1.396

------------------------------------------------------------------------------
       LNMED |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0164006   .0312495    -0.52   0.600    -.0776658    .0448647
         IDP |  -.0789998    .061796    -1.28   0.201    -.2001522    .0421526
         LPI |   .0027057   .0097138     0.28   0.781    -.0163383    .0217498
        FMDE |  -.0306123   .0180695    -1.69   0.090    -.0660379    .0048134
     PHYSLIM |   .2619829   .0687459     3.81   0.000     .1272052    .3967607
    NDISEASE |   .0198922   .0034441     5.78   0.000       .01314    .0266444
       HLTHG |   .1438008   .0483778     2.97   0.003     .0489553    .2386464
       HLTHF |   .3642649   .0881004     4.13   0.000     .1915422    .5369876
       HLTHP |   .7865099   .1700502     4.63   0.000      .453123    1.119897
        LINC |   .0931988   .0217849     4.28   0.000     .0504891    .1359085
        LFAM |  -.1408033    .046203    -3.05   0.002    -.2313852   -.0502214
     EDUCDEC |  -5.66e-06   .0082599    -0.00   0.999    -.0161993     .016188
         AGE |   .0055602    .002251     2.47   0.014     .0011471    .0099733
      FEMALE |   .3442509   .0571573     6.02   0.000     .2321929     .456309
       CHILD |  -.2677921   .0904307    -2.96   0.003    -.4450833   -.0905009
    FEMCHILD |  -.3512207   .0896517    -3.92   0.000    -.5269847   -.1754568
       BLACK |  -.1964412   .0677021    -2.90   0.004    -.3291725   -.0637099
       _cons |   3.077182   .2213448    13.90   0.000      2.64323    3.511133
------------------------------------------------------------------------------
. estimates store twopartb
. scalar lllognormal = e(ll)     /* Log-likelihood */
. scalar sols = e(rmse)          /* Standard error of the regression */
. predict pLNMED, xb             /* Predicted mean from OLS */
. predict rLNMED, residuals
(1293 missing values generated)
.
. * Check for homoskedastic and normal errors
. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of LNMED

         chi2(1)      =    17.11
         Prob > chi2  =   0.0000

. * imtest
. sktest LNMED rLNMED

                   Skewness/Kurtosis tests for Normality
                                                 ------- joint ------
    Variable |  Pr(Skewness)   Pr(Kurtosis)  adj chi2(2)    Prob>chi2
-------------+-------------------------------------------------------
       LNMED |      0.000         0.001            .         0.0000
      rLNMED |      0.000         0.000            .         0.0000
.
. * Create two-part model log-likelihood
. scalar lltwopart = llprobit + lllognormal
. di "lltwopart = " lltwopart
lltwopart = -10184.076
.
. * Create predictions of level of expenditures not logs
. * E[y|y>0] = exp(pLNMED + (s^2)/2)
. * and E[y] = Pr[y>0]*exp(xb + (s^2)/2) for all y
. gen pMEDpos2part = exp(pLNMED + (sols^2)/2)
. gen pMEDall2part = probsel2part*pMEDpos2part
.
. * Compare predictions to actual for MED > 0
. sum LNMED pLNMED MED pMEDpos2part if MED > 0

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       LNMED |    4281    4.069462    1.499372  -.5343859   10.57597
      pLNMED |    4281    4.069462    .5542326   2.298199   6.482164
         MED |    4281     220.987    909.9021   .5860291   39182.02
pMEDpos2part |    4281     183.462    126.0213   26.37827   1731.088

. corr LNMED pLNMED MED pMEDpos2part if MED > 0
(obs=4281)

             |    LNMED   pLNMED      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
      pLNMED |   0.3696   1.0000
         MED |   0.4560   0.1576   1.0000
pMEDpos2part |   0.3387   0.9204   0.1669   1.0000
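Every summary statistic in the regress header, and the log-likelihood that enters lltwopart, can be rebuilt from the analysis-of-variance entries. The minimal Python sketch below (hypothetical helper name; values copied from the output above) assumes the usual OLS formulas and the normal log-likelihood evaluated at the MLE of the error variance, which is what reproduces the reported e(ll) of -7493.499. Note that this is the log-likelihood of LNMED; a log-likelihood for MED itself would subtract the sum of ln(MED) over the positive observations.

```python
from math import log, pi, sqrt


def ols_normal_loglik(rss: float, n: int) -> float:
    """Normal log-likelihood of an OLS fit at the MLE of the error variance."""
    return -0.5 * n * (log(2 * pi) + log(rss / n) + 1.0)


# Analysis-of-variance entries copied from the regress output above.
n_pos, k_slopes = 4281, 17
model_ss, resid_ss = 1314.70352, 8307.23358
df_resid = n_pos - k_slopes - 1                          # 4263

r_squared = model_ss / (model_ss + resid_ss)             # reported 0.1366
f_stat = (model_ss / k_slopes) / (resid_ss / df_resid)   # reported 39.69
root_mse = sqrt(resid_ss / df_resid)                     # reported 1.396 (sols)
ll_lognormal = ols_normal_loglik(resid_ss, n_pos)        # lllognormal

# Two-part log-likelihood: probit part plus the regression part.
ll_probit = -2690.5768
ll_twopart = ll_probit + ll_lognormal                    # reported -10184.076
print(round(r_squared, 4), round(f_stat, 2), round(root_mse, 3), round(ll_twopart, 3))
```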

.
. * Compare predictions to actual including zeroes
. sum MED pMEDall2part DMED probsel2part

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MED |    5574    169.7247    802.8303          0   39182.02
pMEDall2part |    5574     140.966    120.2022   4.880651   1729.783
        DMED |    5574    .7680301    .4221277          0          1
probsel2part |    5574    .7678377    .1457464   .1526731    .999246

. corr MED pMEDall2part DMED probsel2part
(obs=5574)

             |      MED pMEDal~t     DMED probse~t
-------------+------------------------------------
         MED |   1.0000
pMEDall2part |   0.1772   1.0000
        DMED |   0.1162   0.2158   1.0000
probsel2part |   0.1031   0.6380   0.3467   1.0000

.
. ************ (2) SELECTION MODEL ************
.
. * Sample selection model for log expenditures
. * Selection equation:
. *   Observe y = y* if I = z'a + u > 0       u ~ N[0,1]
. * Regression equation:
. *   y* = x'b + v                            v ~ N[0,s^2] and Corr[u,v]=rho
.
. * (2A) MLE for sample selection model
. heckman LNMED $XLIST, select (DMED = $XLIST)

Iteration 0:   log likelihood = -10183.753  (not concave)
Iteration 1:   log likelihood = -10183.676  (not concave)
Iteration 2:   log likelihood = -10183.593  (not concave)
Iteration 3:   log likelihood = -10183.525  (not concave)
Iteration 4:   log likelihood = -10183.467  (not concave)
Iteration 5:   log likelihood = -10183.408  (not concave)
Iteration 6:   log likelihood = -10183.311  (not concave)
Iteration 7:   log likelihood =  -10183.21  (not concave)
Iteration 8:   log likelihood = -10179.155
Iteration 9:   log likelihood = -10176.799
Iteration 10:  log likelihood =  -10170.17
Iteration 11:  log likelihood =  -10170.11
Iteration 12:  log likelihood =  -10170.11

Heckman selection model                         Number of obs      =      5574
(regression model with sample selection)        Censored obs       =      1293
                                                Uncensored obs     =      4281

                                                Wald chi2(17)      =    805.17
Log likelihood =  -10170.11                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
LNMED        |
          LC |  -.0760236   .0337456    -2.25   0.024    -.1421638   -.0098833
         IDP |  -.1497199   .0661379    -2.26   0.024    -.2793478    -.020092
         LPI |     .01493   .0105015     1.42   0.155    -.0056526    .0355127
        FMDE |   -.023522   .0194745    -1.21   0.227    -.0616913    .0146474
     PHYSLIM |   .3548628   .0755425     4.70   0.000     .2068023    .5029233
    NDISEASE |   .0286474   .0037972     7.54   0.000     .0212051    .0360897
       HLTHG |   .1559173   .0521775     2.99   0.003     .0536513    .2581834
       HLTHF |   .4451223   .0955263     4.66   0.000     .2578942    .6323505
       HLTHP |   .9986065   .1878791     5.32   0.000     .6303701    1.366843
        LINC |   .1214009   .0230845     5.26   0.000     .0761562    .1666457
        LFAM |  -.1583018   .0497464    -3.18   0.001     -.255803   -.0608005
     EDUCDEC |   .0175951   .0090183     1.95   0.051    -.0000805    .0352707
         AGE |   .0057376   .0024426     2.35   0.019     .0009501    .0105251
      FEMALE |   .5503441   .0633313     8.69   0.000     .4262171    .6744711
       CHILD |  -.1976875    .097398    -2.03   0.042    -.3885841    -.006791
    FEMCHILD |  -.5653227   .0975292    -5.80   0.000    -.7564765    -.374169
       BLACK |  -.5358684   .0749191    -7.15   0.000    -.6827072   -.3890296
       _cons |   2.107745   .2442285     8.63   0.000     1.629066    2.586424
-------------+----------------------------------------------------------------
DMED         |
          LC |  -.1068027   .0264766    -4.03   0.000    -.1586959   -.0549096
         IDP |   -.108769   .0509938    -2.13   0.033    -.2087149   -.0088231
         LPI |   .0294804   .0086214     3.42   0.001     .0125827    .0463781
        FMDE |   .0007403   .0158738     0.05   0.963    -.0303719    .0318524
     PHYSLIM |   .2848256   .0722656     3.94   0.000     .1431877    .4264635
    NDISEASE |   .0210805   .0034967     6.03   0.000     .0142271     .027934
       HLTHG |   .0576901    .042799     1.35   0.178    -.0261945    .1415747
       HLTHF |   .2237238   .0814547     2.75   0.006     .0640755    .3833721
       HLTHP |   .7984291   .2048087     3.90   0.000     .3970114    1.199847
        LINC |   .0553122   .0166179     3.33   0.001     .0227416    .0878827
        LFAM |   -.031201   .0402985    -0.77   0.439    -.1101846    .0477827
     EDUCDEC |    .031499   .0074987     4.20   0.000     .0168018    .0461961
         AGE |  -.0006072   .0021064    -0.29   0.773    -.0047357    .0035212
      FEMALE |   .4093059   .0532548     7.69   0.000     .3049283    .5136834
       CHILD |   .0530643   .0786326     0.67   0.500    -.1010527    .2071813
    FEMCHILD |  -.3953421   .0783811    -5.04   0.000    -.5489662    -.241718
       BLACK |  -.5831049   .0520534   -11.20   0.000    -.6851277   -.4810822
       _cons |  -.2141574   .1842169    -1.16   0.245    -.5752159     .146901
-------------+----------------------------------------------------------------
     /athrho |   .9408188   .0736303    12.78   0.000      .796506    1.085132
    /lnsigma |   .4511091   .0177227    25.45   0.000     .4163732     .485845
-------------+----------------------------------------------------------------
         rho |   .7355982   .0337886                      .6620789    .7950943
       sigma |   1.570053   .0278256                      1.516452    1.625548
      lambda |   1.154928   .0702985                      1.017145     1.29271
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    27.93   Prob > chi2 = 0.0000
------------------------------------------------------------------------------
. estimates store heckmle
. scalar llhecklogs = e(ll)      /* Log-likelihood */
. scalar shml = e(sigma)         /* s where Var[v]=s^2 */
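The ancillary rows of the heckman output are transformations of the two parameters actually estimated: rho = tanh(athrho) keeps |rho| < 1, sigma = exp(lnsigma) keeps sigma positive, and lambda = rho*sigma is the coefficient on the inverse Mills ratio. Setting rho = 0 splits the likelihood into a probit for DMED and a normal regression for LNMED, that is, the two-part model, so the reported LR test of independence equals twice the gap between llhecklogs and lltwopart. A minimal Python sketch, using values copied from the output above:

```python
from math import exp, tanh

# Estimated ancillary parameters from the MLE output above.
athrho, lnsigma = 0.9408188, 0.4511091

rho = tanh(athrho)      # reported .7355982
sigma = exp(lnsigma)    # reported 1.570053
lam = rho * sigma       # reported 1.154928

# LR test of rho = 0 against the two-part model (the rho = 0 special case).
ll_heckman = -10170.11
ll_twopart = -10184.076
lr_indep = 2.0 * (ll_heckman - ll_twopart)   # reported chi2(1) = 27.93
print(round(rho, 6), round(sigma, 6), round(lam, 6), round(lr_indep, 2))
```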

.
. * Save the Stata predictions:
. * Distinguish between ystar=E[y*], ypos=E[y|I>0] and yall=E[y]
. predict ystarhml, xb        /* E[y*] = x'b */
. predict yposhml, ycond      /* E[y|I>0] = E[y*|I>0] = x'b + rho*s*lambda(z'a) */
. predict invmillhml, mills   /* lambda(z'a) = phi(z'a)/PHI(z'a) */
. predict probselhml, psel    /* PHI(z'a) */

. * The following is not appropriate here, as it sets y=0 if I<0,
. * whereas here data is in logs and y=ln(MED)=-infinity if I<0
. predict yallhml, yexpected  /* E[y] = PHI(z'a)*E[y|I>0] */
. sum ystarhml yposhml invmillhml probselhml yallhml

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ystarhml |    5574    3.543161    .7462608   .9570364    6.92732
     yposhml |    5574    4.000607    .5482433    2.50515    6.92955
  invmillhml |    5574     .396082    .2165116   .0019309   1.476998
  probselhml |    5574    .7674107    .1404707   .1737047   .9994534
     yallhml |    5574    3.124032    .9125439   .4932862   6.925763
.
. * Create predictions of level of expenditures not logs
. * E[y|y>0] = exp(ypos + (s^2)/2)     where Var[v]=s^2
. * and E[y] = Pr[y>0]*exp(ypos + (s^2)/2) for all y
. gen pMEDposhml = exp(yposhml + (shml^2)/2)
. gen pMEDallhml = probselhml*pMEDposhml
.
. * Compare predictions to actual for MED > 0
. sum LNMED yposhml MED pMEDposhml if MED > 0

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       LNMED |    4281    4.069462    1.499372  -.5343859   10.57597
     yposhml |    4281    4.071295    .5573439    2.50515    6.92955
         MED |    4281     220.987    909.9021   .5860291   39182.02
  pMEDposhml |    4281    240.4096    185.0424   42.00053    3505.48

. corr LNMED yposhml MED pMEDpos2part if MED > 0
(obs=4281)

             |    LNMED  yposhml      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
     yposhml |   0.3690   1.0000
         MED |   0.4560   0.1592   1.0000
pMEDpos2part |   0.3387   0.9343   0.1669   1.0000
.
. * Compare predictions to actual including zeroes
. sum MED pMEDallhml DMED probselhml

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MED |    5574    169.7247    802.8303          0   39182.02
  pMEDallhml |    5574    184.5571    174.1649   8.814864   3503.564
        DMED |    5574    .7680301    .4221277          0          1
  probselhml |    5574    .7674107    .1404707   .1737047   .9994534

. corr MED pMEDallhml DMED probselhml
(obs=5574)

             |      MED pMEDal~l     DMED probse~l
-------------+------------------------------------
         MED |   1.0000
  pMEDallhml |   0.1734   1.0000
        DMED |   0.1162   0.2015   1.0000
  probselhml |   0.1074   0.6092   0.3468   1.0000
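The predictions saved in (2A) are simple functions of the selection index z'a. A minimal Python sketch of the inverse Mills ratio and of the ycond mean E[y|I>0] = x'b + rho*s*lambda(z'a) is below; the function names are hypothetical, and only the standard-library `statistics.NormalDist` is used. Because PHI(z'a) rises and lambda(z'a) falls as z'a increases, the smallest fitted selection probability (.1737047) must belong to the same observation as the largest Mills ratio (1.476998), which gives a check against the summary table above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills(za: float) -> float:
    """lambda(z'a) = phi(z'a) / PHI(z'a)."""
    return STD_NORMAL.pdf(za) / STD_NORMAL.cdf(za)


def cond_mean_selected(xb: float, za: float, rho: float, sigma: float) -> float:
    """E[y | I > 0] = x'b + rho * sigma * lambda(z'a) in the selection model."""
    return xb + rho * sigma * inverse_mills(za)


def z_from_psel(p: float) -> float:
    """Recover the index z'a from a fitted selection probability PHI(z'a)."""
    return STD_NORMAL.inv_cdf(p)


# Smallest probselhml in the summary above maps to the largest invmillhml.
print(round(inverse_mills(z_from_psel(0.1737047)), 4))
```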

.
. * (2B) Heckman 2 step for sample selection model
. * Same as MLE except add option twostep in heckman command
. heckman LNMED $XLIST, select (DMED = $XLIST) twostep

Heckman selection model -- two-step estimates   Number of obs      =      5574
(regression model with sample selection)        Censored obs       =      1293
                                                Uncensored obs     =      4281

                                                Wald chi2(34)      =    944.44
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
LNMED        |
          LC |  -.0279209    .039754    -0.70   0.482    -.1058373    .0499955
         IDP |  -.0922898   .0680191    -1.36   0.175    -.2256048    .0410252
         LPI |   .0052225   .0111057     0.47   0.638    -.0165442    .0269893
        FMDE |  -.0295212   .0182427    -1.62   0.106    -.0652762    .0062339
     PHYSLIM |   .2814948   .0804535     3.50   0.000     .1238088    .4391808
    NDISEASE |    .021617   .0050395     4.29   0.000     .0117398    .0314943
       HLTHG |   .1474026   .0490497     3.01   0.003      .051267    .2435381
       HLTHF |   .3821683   .0961284     3.98   0.000       .19376    .5705765
       HLTHP |    .833294   .1974488     4.22   0.000     .4463015    1.220287
        LINC |   .0990973   .0251548     3.94   0.000     .0497948    .1483998
        LFAM |  -.1441358   .0468074    -3.08   0.002    -.2358766    -.052395
     EDUCDEC |   .0033639   .0109501     0.31   0.759    -.0180979    .0248257
         AGE |   .0055556   .0022549     2.46   0.014     .0011361    .0099751
      FEMALE |   .3846323   .1032799     3.72   0.000     .1822074    .5870573
       CHILD |  -.2565136   .0936771    -2.74   0.006    -.4401173   -.0729098
    FEMCHILD |   -.392146    .125089    -3.13   0.002     -.637316    -.146976
       BLACK |  -.2633649   .1577542    -1.67   0.095    -.5725574    .0458276
       _cons |   2.882514   .4698969     6.13   0.000     1.961533    3.803495
-------------+----------------------------------------------------------------
DMED         |
          LC |   -.118708   .0269005    -4.41   0.000    -.1714319    -.065984
         IDP |  -.1279483   .0522351    -2.45   0.014    -.2303272   -.0255693
         LPI |   .0283091   .0088793     3.19   0.001      .010906    .0457121
        FMDE |   .0075319   .0161584     0.47   0.641     -.024138    .0392018
     PHYSLIM |   .2732013   .0743761     3.67   0.000     .1274268    .4189758
    NDISEASE |   .0224861   .0035958     6.25   0.000     .0154384    .0295338
       HLTHG |   .0387516   .0438545     0.88   0.377    -.0472016    .1247049
       HLTHF |   .1920062   .0836688     2.29   0.022     .0280185     .355994
       HLTHP |   .6397294   .2126322     3.01   0.003      .222978    1.056481
        LINC |   .0518413   .0168128     3.08   0.002     .0188889    .0847938
        LFAM |  -.0335599    .041728    -0.80   0.421    -.1153452    .0482253
     EDUCDEC |    .036307   .0076536     4.74   0.000     .0213062    .0513078
         AGE |   .0002631   .0021606     0.12   0.903    -.0039715    .0044978
      FEMALE |   .4451035    .054292     8.20   0.000     .3386932    .5515138
       CHILD |    .111489   .0808338     1.38   0.168    -.0469424    .2699203
    FEMCHILD |  -.4512845   .0799219    -5.65   0.000    -.6079284   -.2946405
       BLACK |  -.6057367   .0523148   -11.58   0.000    -.7082718   -.5032017
       _cons |   -.271605   .1877345    -1.45   0.148    -.6395579    .0963478
-------------+----------------------------------------------------------------
mills        |
      lambda |   .2358048   .5018117     0.47   0.638    -.7477282    1.219338
-------------+----------------------------------------------------------------
         rho |    0.16833
       sigma |  1.4008246
      lambda |  .23580476   .5018117
------------------------------------------------------------------------------
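In the two-step output, rho is not estimated directly: it is backed out of the Mills-ratio coefficient as rho = lambda/sigma, because lambda = rho*sigma. A minimal Python sketch, using the numbers reported above, reproduces rho and the z statistic on lambda.

```python
# Values copied from the two-step heckman output above.
lambda_2step = 0.2358048   # coefficient on the inverse Mills ratio
se_lambda = 0.5018117      # its standard error
sigma_2step = 1.4008246    # two-step estimate of s

rho_2step = lambda_2step / sigma_2step   # reported 0.16833
z_lambda = lambda_2step / se_lambda      # reported 0.47
print(round(rho_2step, 5), round(z_lambda, 2))
```

Taken at face value, the two-step estimates cannot reject lambda = 0 (z of about 0.47), whereas the MLE in (2A) gives rho of about 0.736 and a strongly significant LR test of rho = 0; the collinearity checks in (2C) bear on this difference.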

. estimates store heck2step
. scalar sh2s = e(sigma)       /* s where Var[v]=s^2 */
.
. * Save the Stata predictions:
. * Distinguish between ystar=E[y*], ypos=E[y|I>0] and yall=E[y]
. predict ystarh2s, xb        /* E[y*] = x'b */
. predict yposh2s, ycond      /* E[y|I>0] = E[y*|I>0] = x'b + rho*s*lambda(z'a) */
. predict invmillh2s, mills   /* lambda(z'a) = phi(z'a)/PHI(z'a) */
. predict probselh2s, psel    /* PHI(z'a) */
.
. * The following is not appropriate here, as it sets y=0 if I<0,
. * whereas here data is in logs and y=ln(MED)=-infinity if I<0
. predict yallh2s, yexpected  /* E[y] = PHI(z'a)*E[y|I>0] */
. sum ystarh2s yposh2s invmillh2s probselh2s yallh2s

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ystarh2s |    5574    3.904371     .589474   2.005307   6.573941
     yposh2s |    5574    3.997637    .5516546   2.337985   6.574553
  invmillh2s |    5574    .3955256    .2253329    .002599   1.545223
  probselh2s |    5574    .7678377    .1457464   .1526731    .999246
     yallh2s |    5574    3.124344    .9213697   .4450346   6.569597
.
. * Create predictions of level of expenditures not logs
. * E[y|y>0] = exp(ypos + (s^2)/2)     where Var[v]=s^2
. * and E[y] = Pr[y>0]*exp(ypos + (s^2)/2) for all y
. gen pMEDposh2s = exp(yposh2s + (sh2s^2)/2)
. gen pMEDallh2s = probselh2s*pMEDposh2s
.
. * Compare predictions to actual for MED > 0
. sum LNMED yposh2s MED pMEDposh2s if MED > 0

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       LNMED |    4281    4.069462    1.499372  -.5343859   10.57597
     yposh2s |    4281    4.069462    .5543231   2.337985   6.574553
         MED |    4281     220.987    909.9021   .5860291   39182.02
  pMEDposh2s |    4281    184.9993    129.5432   27.63657   1911.624

. corr LNMED yposh2s MED pMEDpos2part if MED > 0
(obs=4281)

             |    LNMED  yposh2s      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
     yposh2s |   0.3697   1.0000
         MED |   0.4560   0.1584   1.0000
pMEDpos2part |   0.3387   0.9240   0.1669   1.0000
.
. * Compare predictions to actual including zeroes
. sum MED pMEDallh2s DMED probselh2s

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MED |    5574    169.7247    802.8303          0   39182.02
  pMEDallh2s |    5574    142.1438    123.2964   5.272963   1910.182
        DMED |    5574    .7680301    .4221277          0          1
  probselh2s |    5574    .7678377    .1457464   .1526731    .999246

. corr MED pMEDallh2s DMED probselh2s
(obs=5574)

             |      MED pMEDa~2s     DMED probs~2s
-------------+------------------------------------
         MED |   1.0000
  pMEDallh2s |   0.1772   1.0000
        DMED |   0.1162   0.2132   1.0000
  probselh2s |   0.1031   0.6298   0.3467   1.0000

.
. * (2C) Check for possible collinearity problems in Heckman 2-Step
.
. * Check variation in inverse mills ratio and related measures
. gen zprimea = invnorm(probselh2s)
. gen zprimeasq = zprimea*zprimea
. sum invmillh2s probselh2s zprimea ystarh2s

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  invmillh2s |    5574    .3955256    .2253329    .002599   1.545223
  probselh2s |    5574    .7678377    .1457464   .1526731    .999246
     zprimea |    5574    .8217315    .5175712  -1.025036    3.17314
    ystarh2s |    5574    3.904371     .589474   2.005307   6.573941

. sum invmillh2s probselh2s zprimea ystarh2s, detail

                        Mills' ratio
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0443035        .002599
 5%     .1081773       .0065964
10%     .1479522       .0074306       Obs                5574
25%     .2404661       .0111331       Sum of Wgt.        5574

50%     .3522253                      Mean           .3955256
                        Largest       Std. Dev.      .2253329
75%     .5044507        1.42819
90%     .7088638        1.42819       Variance       .0507749
95%      .863094       1.466996       Skewness       1.105156
99%     1.080771       1.545223       Kurtosis       4.403004

                          Pr(DMED)
-------------------------------------------------------------
      Percentiles      Smallest
 1%      .338421       .1526731
 5%     .4598847       .1769602
10%     .5570307       .1900167       Obs                5574
25%     .6946899       .1900167       Sum of Wgt.        5574

50%     .7984734                      Mean           .7678377
                        Largest       Std. Dev.      .1457464
75%     .8717066       .9962835
90%      .927941       .9976236       Variance        .021242
95%     .9502093       .9979156       Skewness      -1.048826
99%     .9823552        .999246       Kurtosis       3.903288

                           zprimea
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.4167765      -1.025036
 5%    -.1007243      -.9270119
10%     .1434453      -.8778346       Obs                5574
25%     .5091883      -.8778346       Sum of Wgt.        5574

50%     .8361809                      Mean           .8217315
                        Largest       Std. Dev.      .5175712
75%     1.134495       2.676793
90%     1.460626        2.82333       Variance       .2678799
95%     1.646887       2.865093       Skewness      -.0298741
99%     2.105021        3.17314       Kurtosis       3.462529

                      Linear prediction
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.770451       2.005307
 5%     3.096997       2.005307
10%     3.248734       2.066777       Obs                5574
25%     3.460358       2.093177       Sum of Wgt.        5574

50%     3.818303                      Mean           3.904371
                        Largest       Std. Dev.        .589474
75%     4.304362       6.054721
90%      4.68132       6.055911       Variance       .3474796
95%     4.946257       6.273092       Skewness       .5047628
99%     5.495563       6.573941       Kurtosis       3.235111
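The detailed summary shows that the index zprimea = z'a lies mostly between -.4167765 and 2.105021 (its 1st and 99th percentiles). Over that range the inverse Mills ratio is close to a straight line in z'a, and since z'a is itself linear in $XLIST, the Mills ratio added in the second step is nearly collinear with the regressors already in the equation. The minimal Python sketch below illustrates this; it assumes (a simplification) an evenly spaced grid over that range rather than the actual fitted index values, so its R-squared will differ somewhat from the regression of invmillh2s on zprimea in the log.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills(z: float) -> float:
    """Inverse Mills ratio lambda(z) = phi(z) / PHI(z)."""
    return STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


def ols_slope_r2(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Slope and R-squared from regressing ys on a constant and xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Evenly spaced grid between the 1st and 99th percentiles of zprimea
# (an illustrative assumption, not the actual distribution of z'a).
lo, hi, steps = -0.4167765, 2.105021, 500
grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
slope, r2 = ols_slope_r2(grid, [inverse_mills(z) for z in grid])
print(f"slope = {slope:.3f}, R-squared = {r2:.3f}")
```

A linear fit with a negative slope and an R-squared well above 0.9 is consistent with the regressions of invmillh2s on zprimea that follow.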

.
. * Check for Mills ratio linear in zprimea
. regress invmillh2s zprimea

      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F(  1,  5572) =84783.34
       Model |  265.518552     1  265.518552           Prob > F      =  0.0000
    Residual |  17.4500012  5572   .00313173           R-squared     =  0.9383
-------------+------------------------------           Adj R-squared =  0.9383
       Total |  282.968553  5573  .050774906           Root MSE      =  .05596

------------------------------------------------------------------------------
  invmillh2s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     zprimea |  -.4217284   .0014484  -291.18   0.000    -.4245677    -.418889
       _cons |   .7420731   .0014065   527.59   0.000     .7393158    .7448305
------------------------------------------------------------------------------
. regress invmillh2s zprimea zprimeasq

      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F(  2,  5571) =       .
       Model |  282.919807     2  141.459904           Prob > F      =  0.0000
    Residual |  .04874607  5571  8.7500e-06           R-squared     =  0.9998
-------------+------------------------------           Adj R-squared =  0.9998
       Total |  282.968553  5573  .050774906           Root MSE      =  .00296

------------------------------------------------------------------------------
  invmillh2s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     zprimea |  -.6381933   .0001715 -3720.60   0.000    -.6385296   -.6378571
   zprimeasq |   .1329635   .0000943  1410.22   0.000     .1327787    .1331484
       _cons |   .7945547   .0000831  9556.73   0.000     .7943917    .7947177
------------------------------------------------------------------------------
. * twoway scatter yinvmill probitxb
.
. * Check R-squared from regressing invmillh2s on the other regressors
. regress invmillh2s $XLIST

      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F( 17,  5556) = 7477.36
       Model |  271.118403    17  15.9481414           Prob > F      =  0.0000
    Residual |    11.85015  5556  .002132856           R-squared     =  0.9581
-------------+------------------------------           Adj R-squared =  0.9580
       Total |  282.968553  5573  .050774906           Root MSE      =  .04618

------------------------------------------------------------------------------
  invmillh2s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |   .0529008    .000877    60.32   0.000     .0511815    .0546202
         IDP |   .0590603   .0017037    34.67   0.000     .0557204    .0624003
         LPI |  -.0113774   .0002792   -40.75   0.000    -.0119247     -.01083
        FMDE |  -.0054681   .0005178   -10.56   0.000    -.0064831    -.004453
     PHYSLIM |  -.0864947   .0021028   -41.13   0.000     -.090617   -.0823724
    NDISEASE |  -.0077731   .0001032   -75.31   0.000    -.0079754   -.0075707
       HLTHG |  -.0155696   .0013947   -11.16   0.000    -.0183037   -.0128355
       HLTHF |  -.0844067   .0025693   -32.85   0.000    -.0894435   -.0793698
       HLTHP |  -.2164141   .0052914   -40.90   0.000    -.2267872    -.206041
        LINC |  -.0293205   .0005678   -51.64   0.000    -.0304337   -.0282074
        LFAM |   .0170455   .0013216    12.90   0.000     .0144545    .0196364
     EDUCDEC |  -.0152414   .0002405   -63.38   0.000    -.0157128     -.01477
         AGE |   .0001145   .0000665     1.72   0.085    -.0000158    .0002448
      FEMALE |  -.1792718   .0016754  -107.00   0.000    -.1825563   -.1759873
       CHILD |  -.0474152   .0025807   -18.37   0.000    -.0524744    -.042356
    FEMCHILD |   .1803783    .002565    70.32   0.000     .1753498    .1854067
       BLACK |   .3020816   .0017915   168.62   0.000     .2985695    .3055937
       _cons |    .875215   .0061051   143.36   0.000     .8632467    .8871833
------------------------------------------------------------------------------
.
. * Find the condition number with inverse mills ratio included
. matrix accum XX = invmillh2s $XLIST
(obs=5574)
. matrix XXScaled = corr(XX)
. matrix symeigen XXSeigvec XXSeigval = XXScaled
. scalar rowsXX = rowsof(XX)
. scalar condnum1 = sqrt(XXSeigval[1,1]/XXSeigval[1,rowsXX])
. scalar condnum2 = sqrt(XXSeigval[1,1]/XXSeigval[1,(rowsXX-1)])
.
. * Find the condition number without inverse mills ratio
. matrix accum ZZ = $XLIST
(obs=5574)
. matrix ZZScaled = corr(ZZ)
. matrix symeigen ZZSeigvec ZZSeigval = ZZScaled
. scalar rowsZZ = rowsof(ZZ)
. scalar condnumnoinvmills1 = sqrt(ZZSeigval[1,1]/ZZSeigval[1,rowsZZ])
. scalar condnumnoinvmills2 = sqrt(ZZSeigval[1,1]/ZZSeigval[1,(rowsZZ-1)])
.
. * Condition numbers between 30 and 100 indicate a strong near dependency
. scalar list condnum1 condnum2
  condnum1 =  82.333696
  condnum2 =  24.558474
. scalar list condnumnoinvmills1 condnumnoinvmills2
condnumnoinvmills1 =  36.660119
condnumnoinvmills2 =  20.990872
.
. * (2D) Do Heckman 2 step manually (this is unnecessary)
. quietly probit DMED $XLIST   /* global XLIST defined earlier */
. predict pselmanual, p        /* Pr[y>0] = PHI(x'b) */
. predict xbmanual, xb         /* x'b */

. gen invmillsmanual = normden(xbmanual)/pselmanual . regress LNMED $XLIST invmillsmanual if MED > 0 Source | SS df MS Number of obs = 4281 -------------+-----------------------------F( 18, 4262) = 37.49 Model | 1315.13292 18 73.06294 Prob > F = 0.0000 Residual | 8306.80418 4262 1.94903899 R-squared = 0.1367 -------------+-----------------------------Adj R-squared = 0.1330 Total | 9621.9371 4280 2.24811614 Root MSE = 1.3961 -----------------------------------------------------------------------------LNMED | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------LC | -.0279209 .0397381 -0.70 0.482 -.1058282 .0499864 IDP | -.0922898 .067979 -1.36 0.175 -.225564 .0409844 LPI | .0052225 .0110962 0.47 0.638 -.0165318 .0269769 FMDE | -.0295212 .01822 -1.62 0.105 -.065242 .0061996 PHYSLIM | .2814948 .0803424 3.50 0.000 .1239819 .4390076 NDISEASE | .0216171 .0050367 4.29 0.000 .0117426 .0314915 HLTHG | .1474026 .0489869 3.01 0.003 .0513627 .2434424 HLTHF | .3821683 .0960103 3.98 0.000 .1939381 .5703985 HLTHP | .833294 .1971219 4.23 0.000 .4468325 1.219756 LINC | .0990973 .0251514 3.94 0.000 .0497875 .1484071 LFAM | -.1441358 .0467495 -3.08 0.002 -.2357891 -.0524825 EDUCDEC | .0033639 .0109441 0.31 0.759 -.0180922 .0248201 AGE | .0055556 .0022512 2.47 0.014 .001142 .0099692 FEMALE | .3846324 .103291 3.72 0.000 .1821281 .5871366 338

CHILD | -.2565135 .0935766 -2.74 0.006 -.4399725 -.0730546 FEMCHILD | -.392146 .1250644 -3.14 0.002 -.6373374 -.1469547 BLACK | -.2633649 .1578399 -1.67 0.095 -.5728134 .0460835 invmillsma~l | .235805 .5023784 0.47 0.639 -.7491182 1.220728 _cons | 2.882514 .470116 6.13 0.000 1.960841 3.804186 -----------------------------------------------------------------------------. predict yposmanual, xb . * Predictions here should equal those from heckman two-step earlier . sum yposh2s yposmanual invmillh2s invmillsmanual probselh2s pselmanual Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------yposh2s | 5574 3.997637 .5516546 2.337985 6.574553 yposmanual | 5574 3.997637 .5516546 2.337985 6.574553 invmillh2s | 5574 .3955256 .2253329 .002599 1.545223 invmillsma~l | 5574 .3955256 .2253329 .002599 1.545223 probselh2s | 5574 .7678377 .1457464 .1526731 .999246 -------------+-------------------------------------------------------pselmanual | 5574 .7678377 .1457464 .1526731 .999246 . * And put in squared invmills ratio . gen invmillssq = invmillsmanual*invmillsmanual . regress LNMED $XLIST invmillsmanual invmillssq if MED > 0 Source | SS df MS Number of obs = 4281 -------------+-----------------------------F( 19, 4261) = 35.64 Model | 1319.30272 19 69.4369854 Prob > F = 0.0000 Residual | 8302.63438 4261 1.94851781 R-squared = 0.1371 -------------+-----------------------------Adj R-squared = 0.1333 Total | 9621.9371 4280 2.24811614 Root MSE = 1.3959 -----------------------------------------------------------------------------LNMED | Coef. Std. Err. t P>|t| [95% Conf. 
Interval] -------------+---------------------------------------------------------------LC | -.0793176 .0530386 -1.50 0.135 -.1833009 .0246658 IDP | -.1419148 .075965 -1.87 0.062 -.2908457 .0070161 LPI | .0174224 .0138796 1.26 0.209 -.0097888 .0446337 FMDE | -.0258495 .0183897 -1.41 0.160 -.0619029 .0102039 PHYSLIM | .3867535 .1078448 3.59 0.000 .1753217 .5981854 NDISEASE | .0305019 .0078898 3.87 0.000 .0150337 .0459701 HLTHG | .1652111 .0504705 3.27 0.001 .0662626 .2641596 HLTHF | .4576241 .1089774 4.20 0.000 .2439716 .6712766 HLTHP | 1.056745 .2493566 4.24 0.000 .5678762 1.545614 LINC | .1169339 .027948 4.18 0.000 .0621414 .1717264 LFAM | -.1550441 .0473343 -3.28 0.001 -.2478439 -.0622443 EDUCDEC | .018452 .0150373 1.23 0.220 -.011029 .047933 AGE | .0057227 .0022538 2.54 0.011 .001304 .0101414 FEMALE | .5748999 .1660813 3.46 0.001 .2492941 .9005056 339

       CHILD |  -.2096856   .0988886    -2.12   0.034    -.4035587   -.0158125
    FEMCHILD |  -.5873068   .1828525    -3.21   0.001    -.9457929   -.2288207
       BLACK |  -.5010232   .2264954    -2.21   0.027    -.9450721   -.0569744
invmillsma~l |   2.159812   1.407886     1.53   0.125    -.6003768    4.920001
  invmillssq |  -1.043357   .7132265    -1.46   0.144    -2.441653    .3549381
       _cons |   1.909849   .8142753     2.35   0.019     .3134454    3.506253
------------------------------------------------------------------------------

. 
. ************ (3) DISPLAY RESULTS FOR TABLE 16.1 (page 554) ************
. 
. * Note for brevity the coefficients for only some of the regressors are reported
. 
. * First two columns of Table 16.1 (page 554)
. * Two part estimates: probit for first part and lognormal for second
. estimates table twoparta twopartb, t stats(N ll rank aic bic) b(%10.3f)

----------------------------------------
    Variable |  twoparta     twopartb
-------------+--------------------------
          LC |    -0.119       -0.016
             |     -4.41        -0.52
         IDP |    -0.128       -0.079
             |     -2.45        -1.28
         LPI |     0.028        0.003
             |      3.19         0.28
        FMDE |     0.008       -0.031
             |      0.47        -1.69
     PHYSLIM |     0.273        0.262
             |      3.67         3.81
    NDISEASE |     0.022        0.020
             |      6.25         5.78
       HLTHG |     0.039        0.144
             |      0.88         2.97
       HLTHF |     0.192        0.364
             |      2.29         4.13
       HLTHP |     0.640        0.787
             |      3.01         4.63
        LINC |     0.052        0.093
             |      3.08         4.28
        LFAM |    -0.034       -0.141
             |     -0.80        -3.05
     EDUCDEC |     0.036       -0.000
             |      4.74        -0.00
         AGE |     0.000        0.006
             |      0.12         2.47
      FEMALE |     0.445        0.344
             |      8.20         6.02
       CHILD |     0.111       -0.268
             |      1.38        -2.96
    FEMCHILD |    -0.451       -0.351
             |     -5.65        -3.92
       BLACK |    -0.606       -0.196
             |    -11.58        -2.90
       _cons |    -0.272        3.077
             |     -1.45        13.90
-------------+--------------------------
           N |  5574.000     4281.000
          ll | -2690.577    -7493.499
        rank |    18.000       18.000
         aic |  5417.154    15022.998
         bic |  5536.419    15137.513
----------------------------------------
                          legend: b/t

. di "lltwopart = " lltwopart
lltwopart = -10184.076
. 
. * Last four columns of Table 16.1 (page 554)
. * Sample selection estimates: 2step and MLE estimates
. set matsize 60
. estimates table heck2step heckmle, t stats(N ll rank aic bic) b(%10.3f)

----------------------------------------
    Variable | heck2step      heckmle
-------------+--------------------------
LNMED        |
          LC |    -0.028       -0.076
             |     -0.70        -2.25
         IDP |    -0.092       -0.150
             |     -1.36        -2.26
         LPI |     0.005        0.015
             |      0.47         1.42
        FMDE |    -0.030       -0.024
             |     -1.62        -1.21
     PHYSLIM |     0.281        0.355
             |      3.50         4.70
    NDISEASE |     0.022        0.029
             |      4.29         7.54
       HLTHG |     0.147        0.156
             |      3.01         2.99
       HLTHF |     0.382        0.445
             |      3.98         4.66
       HLTHP |     0.833        0.999
             |      4.22         5.32
        LINC |     0.099        0.121
             |      3.94         5.26
        LFAM |    -0.144       -0.158
             |     -3.08        -3.18
     EDUCDEC |     0.003        0.018
             |      0.31         1.95
         AGE |     0.006        0.006
             |      2.46         2.35
      FEMALE |     0.385        0.550
             |      3.72         8.69
       CHILD |    -0.257       -0.198
             |     -2.74        -2.03
    FEMCHILD |    -0.392       -0.565
             |     -3.13        -5.80
       BLACK |    -0.263       -0.536
             |     -1.67        -7.15
       _cons |     2.883        2.108
             |      6.13         8.63
-------------+--------------------------
DMED         |
          LC |    -0.119       -0.107
             |     -4.41        -4.03
         IDP |    -0.128       -0.109
             |     -2.45        -2.13
         LPI |     0.028        0.029
             |      3.19         3.42
        FMDE |     0.008        0.001
             |      0.47         0.05
     PHYSLIM |     0.273        0.285
             |      3.67         3.94
    NDISEASE |     0.022        0.021
             |      6.25         6.03
       HLTHG |     0.039        0.058
             |      0.88         1.35
       HLTHF |     0.192        0.224
             |      2.29         2.75
       HLTHP |     0.640        0.798
             |      3.01         3.90
        LINC |     0.052        0.055
             |      3.08         3.33
        LFAM |    -0.034       -0.031
             |     -0.80        -0.77
     EDUCDEC |     0.036        0.031
             |      4.74         4.20
         AGE |     0.000       -0.001
             |      0.12        -0.29
      FEMALE |     0.445        0.409
             |      8.20         7.69
       CHILD |     0.111        0.053
             |      1.38         0.67
    FEMCHILD |    -0.451       -0.395
             |     -5.65        -5.04
       BLACK |    -0.606       -0.583
             |    -11.58       -11.20
       _cons |    -0.272       -0.214
             |     -1.45        -1.16
-------------+--------------------------
mills        |
      lambda |     0.236
             |      0.47
-------------+--------------------------
athrho       |
       _cons |                  0.941
             |                  12.78
-------------+--------------------------
lnsigma      |
       _cons |                  0.451
             |                  25.45
-------------+--------------------------
Statistics   |
           N |  5574.000     5574.000
          ll |             -10170.110
        rank |    37.000       38.000
         aic |         .    20416.221
         bic |         .    20668.004
----------------------------------------
                          legend: b/t

. 
. ************ (4) A LITTLE FURTHER ANALYSIS **********
. 
. * Predictions
. * Compare predictions to actual for MED > 0
. sum MED pMEDpos2part pMEDposhml pMEDposh2s if MED > 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MED |      4281     220.987    909.9021   .5860291   39182.02
pMEDpos2part |      4281     183.462    126.0213   26.37827   1731.088
  pMEDposhml |      4281    240.4096    185.0424   42.00053    3505.48
  pMEDposh2s |      4281    184.9993    129.5432   27.63657   1911.624

. corr MED pMEDpos2part pMEDposhml pMEDposh2s if MED > 0
(obs=4281)

             |      MED pMEDpo~t pMEDpo~l pMEDp~2s
-------------+------------------------------------
         MED |   1.0000
pMEDpos2part |   0.1669   1.0000
  pMEDposhml |   0.1617   0.9830   1.0000
  pMEDposh2s |   0.1669   0.9994   0.9887   1.0000
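The summary statistics in the two estimates tables can be checked by hand. The sketch below makes three assumptions, all consistent with the printed numbers: Stata's AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(N), with k taken from `rank`; `lltwopart` is simply the sum of the two part-specific log-likelihoods, since the probit and the lognormal regression are fitted separately; and the ML `heckman` fit reports its ancillary parameters as athrho = atanh(rho) and lnsigma = ln(sigma), with lambda = rho*sigma. The inputs are the log-likelihoods as printed (3 decimals), so agreement holds only up to that rounding. The helper name `aic_bic` is hypothetical.

```python
import math

def aic_bic(ll, k, n):
    """AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(N), with k = rank."""
    return -2.0 * ll + 2.0 * k, -2.0 * ll + k * math.log(n)

# (log-likelihood, rank, N) as printed, to 3 decimals, in the tables above
twoparta = (-2690.577, 18, 5574)    # probit: Pr(MED > 0)
twopartb = (-7493.499, 18, 4281)    # lognormal: ln MED for MED > 0
heckmle = (-10170.110, 38, 5574)    # sample-selection model by ML

for name, est in [("twoparta", twoparta), ("twopartb", twopartb), ("heckmle", heckmle)]:
    aic, bic = aic_bic(*est)
    print(f"{name:9s} aic = {aic:10.3f}  bic = {bic:10.3f}")

# The two parts are fitted separately, so the joint log-likelihood is a sum
ll_twopart = twoparta[0] + twopartb[0]
print(f"lltwopart = {ll_twopart:.3f}")

# heckman (ML) reports atanh(rho) and ln(sigma); map them back
athrho, lnsigma = 0.941, 0.451
rho, sigma = math.tanh(athrho), math.exp(lnsigma)
print(f"rho = {rho:.3f}  sigma = {sigma:.3f}  lambda = rho*sigma = {rho * sigma:.3f}")
```

With the printed athrho = 0.941 and lnsigma = 0.451, this gives rho of about 0.74 and sigma of about 1.57. The implied ML value of lambda is therefore roughly 1.15, compared with the two-step `lambda` of 0.236.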

. 
. * Compare predictions to actual including zeroes
. sum MED pMEDall2part pMEDallhml pMEDallh2s DMED probsel2part probselhml probselh2s

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
pMEDall2part |      5574     140.966    120.2022   4.880651   1729.783
  pMEDallhml |      5574    184.5571    174.1649   8.814864   3503.564
  pMEDallh2s |      5574    142.1438    123.2964   5.272963   1910.182
        DMED |      5574    .7680301    .4221277          0          1
-------------+--------------------------------------------------------
probsel2part |      5574    .7678377    .1457464   .1526731    .999246
  probselhml |      5574    .7674107    .1404707   .1737047   .9994534
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246

. corr MED pMEDall2part pMEDallhml pMEDallh2s DMED probsel2part probselhml probselh2s
(obs=5574)

             |      MED pMEDal~t pMEDal~l pMEDa~2s     DMED probse~t probse~l probs~2s
-------------+------------------------------------------------------------------------
         MED |   1.0000
pMEDall2part |   0.1772   1.0000
  pMEDallhml |   0.1734   0.9861   1.0000
  pMEDallh2s |   0.1772   0.9995   0.9909   1.0000
        DMED |   0.1162   0.2158   0.2015   0.2132   1.0000
probsel2part |   0.1031   0.6380   0.5939   0.6298   0.3467   1.0000
  probselhml |   0.1074   0.6552   0.6092   0.6468   0.3468   0.9980   1.0000
  probselh2s |   0.1031   0.6380   0.5939   0.6298   0.3467   1.0000   0.9980   1.0000
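The log does not show how the `pMED*` predictions were formed, so the following is only a hedged sketch of the standard formulas for these models. Assume a probit first part, Pr(y > 0 | x) = Phi(z'g), and a lognormal second part, ln y = x'b + u with u ~ N(0, sigma^2). Under these assumptions:

- E[y | x, y > 0] = exp(x'b + sigma^2/2), and E[y | x] = Phi(z'g) exp(x'b + sigma^2/2).
- In the lognormal selection model, where the two errors are bivariate normal with correlation rho, the conditional mean is multiplied by Phi(z'g + rho*sigma)/Phi(z'g).

The function names and index values below are hypothetical. Separately, the output above shows that `probsel2part` and `probselh2s` are identical. This is expected, because the DMED coefficients of `twoparta` and `heck2step` coincide: the first step of the two-step estimator is the same probit.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal c.d.f.

def two_part_mean(z_gamma, x_beta, sigma):
    """Two-part model: probit for Pr(y > 0), lognormal for y given y > 0."""
    p_pos = PHI(z_gamma)
    mean_pos = math.exp(x_beta + 0.5 * sigma ** 2)   # lognormal retransformation
    return p_pos, mean_pos, p_pos * mean_pos

def selection_mean_pos(z_gamma, x_beta, sigma, rho):
    """Lognormal selection model: E[y | x, y > 0] when the errors of ln y and of
    the selection equation are bivariate normal with correlation rho."""
    return math.exp(x_beta + 0.5 * sigma ** 2) * PHI(z_gamma + rho * sigma) / PHI(z_gamma)

# Hypothetical index values for a single individual
zg, xb, s = 0.73, 4.5, 1.5
p, m_pos, m_all = two_part_mean(zg, xb, s)
print(f"Pr(y>0) = {p:.4f}  E[y|y>0] = {m_pos:.2f}  E[y] = {m_all:.2f}")
print(f"selection model, rho = 0.7: E[y|y>0] = {selection_mean_pos(zg, xb, s, 0.7):.2f}")
```

Setting rho = 0 in `selection_mean_pos` recovers the two-part conditional mean exactly.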

. 
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section4\mma16p3selection.txt
  log type:  text
 closed on:  19 May 2005, 13:04:40


------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma17p1km.txt
  log type:  text
 opened on:  19 May 2005, 13:19:55

. 
. ********** OVERVIEW OF MMA17P1KM.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 17.2 (pages 574-5) and 17.5.1 (pages 581-3)
. * Nonparametric Duration Analysis
. * It provides
. * (1) Kaplan-Meier Survival Estimate Graph (Figure 17.1: kennanstrk.wmf)
. * (2) Nelson-Aalen Cumulative Hazard Estimate Graph
. * (3) Kaplan-Meier Survivor Function Estimates (Table 17.3)
. * (4) Shows that Cox regression on intercept gives same results
. 
. * To run this program you need data file
. *    strkdur.dta
. 
. ********** SETUP **********
. 
. set more off
. version 8
. set scheme s1mono   /* Used for graphs */
. 
. ********** DATA DESCRIPTION
. 
. * The data is the same data as given in Table 1 of
. * J. Kennan, "The Duration of Contract strikes in U.S. Manufacturing",
. * Journal of Econometrics, 1985, Vol. 28, pp.5-28.
. 
. * There are 566 observations from 1968-1976 with two variables
. * 1. dur is duration of the strike in days
. * 2. gdp is a measure of stage of business cycle
. *    (deviation of monthly log industrial production in manufacturing
. *    from prediction from OLS on time, time-squared and monthly dummies)
. 
. * All observations are complete for these data. There is no censoring !!
. * For an example with censoring see mma17p2kmextra.do or mma17p4duration.do
. 
. ********** READ DATA **********
. 

. use strkdur.dta
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         dur |       566    43.62367    44.66641          1        235
         gdp |       566    .0060411    .0499072    -.13996     .08554

. 
. * Create ASCII data set so that can use programs other than Stata
. outfile dur gdp using strkdur.asc, replace
. 
. ********* ANALYSIS: NONPARAMETRIC SURVIVAL CURVE AND HAZARD FUNCTION **********
. 
. * Stata st curves require defining the dependent variable
. stset dur

     failure event:  (assumed to fail at time=dur)
obs. time interval:  (0, dur]
 exit on or before:  failure

------------------------------------------------------------------------------
      566  total obs.
        0  exclusions
------------------------------------------------------------------------------
      566  obs. remaining, representing
      566  failures in single record/single failure data
    24691  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       235

. 
. * The data here are complete. If dur is instead right-censored,
. * then also need to define a failure (non-censoring) indicator. For example
. *     stset dur, fail(censor=1)
. * where the variable censor=1 if the spell ends in a failure and =0 if it is right-censored
. * See mma17p4duration.do
. 
. * (1) GRAPH KAPLAN-MEIER SURVIVAL CURVE
. 
. * Minimal command that gives 95% confidence bands
. sts graph, gwood

         failure _d:  1 (meaning all fail)
   analysis time _t:  dur

. 
. * Longer command for Figure 17.1 (page 575)

. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen surv = s
. sts gen lbsurv = lb(s)
. sts gen ubsurv = ub(s)
. sort dur
. graph twoway (line ubsurv dur, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
>   */ (line surv dur, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
>   */ (line lbsurv dur, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
>   */ scale(1.2) plotregion(style(none)) /*
>   */ title("Kaplan-Meier Survival Function Estimate") /*
>   */ xtitle("Strike duration in days", size(medlarge)) xscale(titlegap(*5)) /*
>   */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
>   */ ylabel(0.00(0.25)1.00,grid)/*
>   */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
>   */ legend( label(1 "Upper 95% confidence band") label(2 "Survival Function") /*
>   */ label(3 "Lower 95% confidence band") )
. graph export kennanstrk.wmf, replace
(file c:\Imbook\bwebpage\Section4\kennanstrk.wmf written in Windows Metafile format)
. 
. * (2) GRAPH NELSON-AALEN CUMULATIVE HAZARD FUNCTION
. 
. * Minimal command that gives 95% confidence bands
. sts graph, cna

         failure _d:  1 (meaning all fail)
   analysis time _t:  dur

. 
. * Longer command gives nicer figure
. sts graph, cna /*
>   */ scale (1.2) plotregion(style(none)) /*
>   */ title("Nelson-Aalen Cumulative Hazard") /*
>   */ xtitle("Strike duration in days", size(medlarge)) xscale(titlegap(*5)) /*
>   */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
>   */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
>   */ legend(label(1 "95% confidence bands") label(2 "Cumulative Hazard"))

         failure _d:  1 (meaning all fail)
   analysis time _t:  dur

. 
. * (3) LIST SURVIVOR and NELSON-AALEN CUMULATIVE HAZARD ESTIMATES
. 
. * Gives a lot of output
. 

. * Table 17.3: Kaplan-Meier Survivor Function (page 583)
. sts list

         failure _d:  1 (meaning all fail)
   analysis time _t:  dur

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
     1      566     10      0             0.9823    0.0055     0.9674    0.9905
     2      556     21      0             0.9452    0.0096     0.9230    0.9612
     3      535     16      0             0.9170    0.0116     0.8910    0.9369
     4      519     17      0             0.8869    0.0133     0.8578    0.9104
     5      502     18      0             0.8551    0.0148     0.8234    0.8816
     6      484      9      0             0.8392    0.0154     0.8063    0.8670
     7      475     12      0             0.8180    0.0162     0.7837    0.8474
     8      463     12      0             0.7968    0.0169     0.7613    0.8277
     9      451     13      0             0.7739    0.0176     0.7371    0.8061
    10      438      8      0             0.7597    0.0180     0.7223    0.7928
    11      430      9      0             0.7438    0.0183     0.7058    0.7777
    12      421     10      0             0.7261    0.0187     0.6874    0.7609
    13      411     11      0             0.7067    0.0191     0.6673    0.7424
    14      400     11      0             0.6873    0.0195     0.6473    0.7237
    15      389     12      0             0.6661    0.0198     0.6256    0.7033
    16      377      8      0             0.6519    0.0200     0.6111    0.6896
    17      369      6      0             0.6413    0.0202     0.6003    0.6793
    18      363      8      0             0.6272    0.0203     0.5860    0.6656
    19      355      7      0             0.6148    0.0205     0.5734    0.6535
    20      348      7      0             0.6025    0.0206     0.5609    0.6415
    21      341      5      0             0.5936    0.0206     0.5519    0.6328
    22      336     11      0             0.5742    0.0208     0.5324    0.6137
    23      325     10      0             0.5565    0.0209     0.5146    0.5964
    24      315      8      0             0.5424    0.0209     0.5004    0.5824
    25      307      4      0             0.5353    0.0210     0.4934    0.5754
    26      303      7      0             0.5230    0.0210     0.4810    0.5632
    27      296      6      0             0.5124    0.0210     0.4704    0.5527
    28      290      9      0             0.4965    0.0210     0.4546    0.5369
    29      281      5      0             0.4876    0.0210     0.4458    0.5281
    30      276      5      0             0.4788    0.0210     0.4371    0.5193
    31      271      8      0             0.4647    0.0210     0.4231    0.5051
    32      263      5      0             0.4558    0.0209     0.4144    0.4963
    33      258      6      0             0.4452    0.0209     0.4039    0.4857
    34      252      5      0             0.4364    0.0208     0.3952    0.4768
    35      247      4      0             0.4293    0.0208     0.3883    0.4697
    36      243      6      0             0.4187    0.0207     0.3779    0.4590
    37      237      6      0             0.4081    0.0207     0.3675    0.4483
    38      231      8      0             0.3940    0.0205     0.3537    0.4340
    39      223      3      0             0.3887    0.0205     0.3485    0.4287
    40      220      1      0             0.3869    0.0205     0.3468    0.4269
    41      219      4      0             0.3799    0.0204     0.3399    0.4197
    42      215      8      0             0.3657    0.0202     0.3261    0.4053
    43      207      4      0             0.3587    0.0202     0.3193    0.3981
    44      203      9      0             0.3428    0.0200     0.3039    0.3819
    45      194      3      0             0.3375    0.0199     0.2988    0.3765
    46      191      4      0             0.3304    0.0198     0.2919    0.3693
    47      187      5      0             0.3216    0.0196     0.2834    0.3602
    48      182      3      0             0.3163    0.0195     0.2783    0.3548
    49      179      5      0             0.3074    0.0194     0.2698    0.3457
    50      174      8      0             0.2933    0.0191     0.2563    0.3312
    51      166      1      0             0.2915    0.0191     0.2546    0.3293
    52      165      8      0             0.2774    0.0188     0.2411    0.3147
    53      157      6      0             0.2668    0.0186     0.2310    0.3037
    54      151      1      0             0.2650    0.0186     0.2294    0.3019
    55      150      2      0             0.2615    0.0185     0.2260    0.2982
    56      148      3      0             0.2562    0.0183     0.2210    0.2927
    57      145      3      0             0.2509    0.0182     0.2159    0.2872
    58      142      1      0             0.2491    0.0182     0.2143    0.2854
    59      141      4      0             0.2420    0.0180     0.2076    0.2780
    60      137      6      0             0.2314    0.0177     0.1976    0.2669
    61      131      5      0             0.2226    0.0175     0.1893    0.2577
    62      126      2      0             0.2191    0.0174     0.1860    0.2540
    63      124      2      0             0.2155    0.0173     0.1827    0.2503
    64      122      5      0             0.2067    0.0170     0.1744    0.2410
    65      117      3      0             0.2014    0.0169     0.1695    0.2354
    67      114      1      0             0.1996    0.0168     0.1678    0.2335
    68      113      1      0             0.1979    0.0167     0.1662    0.2317
    70      112      4      0             0.1908    0.0165     0.1596    0.2242
    71      108      1      0             0.1890    0.0165     0.1580    0.2223
    72      107      1      0             0.1873    0.0164     0.1563    0.2205
    74      106      1      0             0.1855    0.0163     0.1547    0.2186
    75      105      1      0             0.1837    0.0163     0.1530    0.2167
    77      104      3      0             0.1784    0.0161     0.1481    0.2111
    82      101      2      0             0.1749    0.0160     0.1449    0.2073
    83       99      1      0             0.1731    0.0159     0.1432    0.2055
    84       98      3      0             0.1678    0.0157     0.1384    0.1998
    85       95      2      0             0.1643    0.0156     0.1351    0.1960
    86       93      1      0             0.1625    0.0155     0.1335    0.1942
    87       92      1      0             0.1608    0.0154     0.1319    0.1923
    88       91      1      0             0.1590    0.0154     0.1302    0.1904
    90       90      1      0             0.1572    0.0153     0.1286    0.1885
    91       89      2      0             0.1537    0.0152     0.1254    0.1847
    92       87      1      0             0.1519    0.0151     0.1238    0.1828
    94       86      1      0             0.1502    0.0150     0.1222    0.1809
    98       85      3      0             0.1449    0.0148     0.1173    0.1752
    99       82      3      0             0.1396    0.0146     0.1125    0.1695
   100       79      2      0             0.1360    0.0144     0.1093    0.1657
   101       77      3      0             0.1307    0.0142     0.1045    0.1600
   102       74      2      0             0.1272    0.0140     0.1013    0.1561
   103       72      1      0             0.1254    0.0139     0.0997    0.1542
   104       71      3      0             0.1201    0.0137     0.0950    0.1485
   105       68      1      0             0.1184    0.0136     0.0934    0.1465
   106       67      2      0             0.1148    0.0134     0.0902    0.1427
   107       65      2      0             0.1113    0.0132     0.0871    0.1388
   108       63      2      0             0.1078    0.0130     0.0839    0.1349
   109       61      2      0             0.1042    0.0128     0.0808    0.1311
   111       59      1      0             0.1025    0.0127     0.0792    0.1291
   112       58      1      0             0.1007    0.0126     0.0777    0.1272
   114       57      1      0             0.0989    0.0126     0.0761    0.1252
   115       56      1      0             0.0972    0.0124     0.0745    0.1233
   116       55      1      0             0.0954    0.0123     0.0730    0.1213
   117       54      2      0             0.0919    0.0121     0.0699    0.1174
   118       52      1      0             0.0901    0.0120     0.0683    0.1155
   119       51      1      0             0.0883    0.0119     0.0668    0.1135
   122       50      3      0             0.0830    0.0116     0.0622    0.1076
   123       47      1      0             0.0813    0.0115     0.0606    0.1056
   124       46      1      0             0.0795    0.0114     0.0591    0.1037
   125       45      2      0             0.0760    0.0111     0.0561    0.0997
   126       43      1      0             0.0742    0.0110     0.0545    0.0977
   127       42      2      0             0.0707    0.0108     0.0515    0.0937
   130       40      2      0             0.0671    0.0105     0.0485    0.0897
   131       38      1      0             0.0654    0.0104     0.0470    0.0877
   133       37      1      0             0.0636    0.0103     0.0455    0.0857
   135       36      1      0             0.0618    0.0101     0.0440    0.0837
   136       35      2      0             0.0583    0.0098     0.0410    0.0797
   139       33      2      0             0.0548    0.0096     0.0381    0.0756
   140       31      1      0             0.0530    0.0094     0.0366    0.0736
   141       30      3      0             0.0477    0.0090     0.0323    0.0675
   142       27      1      0             0.0459    0.0088     0.0308    0.0654
   143       26      1      0             0.0442    0.0086     0.0294    0.0633
   146       25      2      0             0.0406    0.0083     0.0265    0.0592
   147       23      1      0             0.0389    0.0081     0.0251    0.0571
   148       22      2      0             0.0353    0.0078     0.0223    0.0529
   151       20      1      0             0.0336    0.0076     0.0209    0.0508
   152       19      1      0             0.0318    0.0074     0.0196    0.0487
   153       18      2      0             0.0283    0.0070     0.0169    0.0444
   154       16      1      0             0.0265    0.0068     0.0155    0.0423
   160       15      1      0             0.0247    0.0065     0.0142    0.0401
   163       14      2      0             0.0212    0.0061     0.0116    0.0357
   165       12      1      0             0.0194    0.0058     0.0103    0.0335
   168       11      1      0             0.0177    0.0055     0.0091    0.0312
   174       10      1      0             0.0159    0.0053     0.0079    0.0290
   175        9      1      0             0.0141    0.0050     0.0067    0.0267
   179        8      1      0             0.0124    0.0046     0.0055    0.0244
   191        7      1      0             0.0106    0.0043     0.0044    0.0220
   192        6      1      0             0.0088    0.0039     0.0034    0.0196
   205        5      1      0             0.0071    0.0035     0.0024    0.0171
   208        4      1      0             0.0053    0.0031     0.0015    0.0146
   216        3      1      0             0.0035    0.0025     0.0007    0.0121
   226        2      1      0             0.0018    0.0018     0.0002    0.0095
   235        1      1      0             0.0000         .          .         .
-------------------------------------------------------------------------------
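The listing above can be reproduced from its `Beg. Total` and `Fail` columns alone. The product-limit estimate multiplies the conditional survival probabilities (1 - d_j/n_j), and Greenwood's variance is S^2 times the sum of d_j/(n_j(n_j - d_j)). With no censoring, this reduces to the binomial S(1 - S)/N. For the interval, the sketch assumes the confidence limits are built on the log(-log S) scale. That assumption reproduces the printed limits for the rows checked below, but it is ours, not something stated in the log. `kaplan_meier` is a hypothetical helper, not a Stata command.

```python
import math
from statistics import NormalDist

def kaplan_meier(rows, level=0.95):
    """Product-limit estimate from (time, n_at_risk, n_failed) rows.

    Returns (time, S, Greenwood s.e., lower, upper); the interval is formed
    on the log(-log S) scale and mapped back.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    surv, gw_sum, out = 1.0, 0.0, []
    for t, n, d in rows:
        surv *= 1.0 - d / n
        if 0.0 < surv < 1.0:          # n > d and at least one failure so far
            gw_sum += d / (n * (n - d))
            se = surv * math.sqrt(gw_sum)
            sigma = math.sqrt(gw_sum) / abs(math.log(surv))
            lower = surv ** math.exp(z * sigma)
            upper = surv ** math.exp(-z * sigma)
        else:
            se = lower = upper = float("nan")
        out.append((t, surv, se, lower, upper))
    return out

# First five rows of the strike-duration listing: (time, beg. total, fail)
strike_rows = [(1, 566, 10), (2, 556, 21), (3, 535, 16), (4, 519, 17), (5, 502, 18)]

for t, s, se, lo, hi in kaplan_meier(strike_rows):
    print(f"{t:3d}  S = {s:.4f}  se = {se:.4f}  [{lo:.4f}, {hi:.4f}]")
```

Because these data are uncensored, S(t) here is simply the share of the 566 strikes still in progress after t. For example, S(2) = 535/566.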

. * And Nelson-Aalen Integrated Hazard
. * sts list, na
. 
. * (4) STCOX REGRESS ON INTERCEPT GIVES SAME RESULTS AS ABOVE
. 
. * Cox Regression on an intercept
. gen one = 1
. stcox one, basesurv(coxbasesurv) basechazard(coxbasecumhaz) basehc(coxbasehaz)

         failure _d:  1 (meaning all fail)
   analysis time _t:  dur

note: one dropped due to collinearity
Iteration 0:   log likelihood =  -3032.134
Refining estimates:
Iteration 0:   log likelihood =  -3032.134

Cox regression -- Breslow method for ties

No. of subjects =          566                     Number of obs   =       566
No. of failures =          566
Time at risk    =        24691
                                                   LR chi2(0)      =      0.00
Log likelihood  =    -3032.134                     Prob > chi2     =         .

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
------------------------------------------------------------------------------

. 
. * Instead use sts which analyzes dependent in isolation
. * sts gen surv = s
. sts gen cumhaz = na
. sts gen haz = h
. 
. * Compare to verify that same answers
. sum surv coxbasesurv cumhaz coxbasecumhaz haz coxbasehaz

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        surv |       566     .493014    .2848417          0   .9823322
 coxbasesurv |       566     .493014    .2848417          0   .9823322
      cumhaz |       566           1    .9834583   .0176678   6.871446
coxbasecum~z |       566           1    .9834583   .0176678   6.871446
         haz |       566    .0345186    .0515235   .0045455          1
-------------+--------------------------------------------------------
  coxbasehaz |       566    .0345186    .0515235   .0045455          1

. corr surv coxbasesurv
(obs=566)

             |     surv coxbas~v
-------------+------------------
        surv |   1.0000
 coxbasesurv |   1.0000   1.0000

. corr cumhaz coxbasecumhaz
(obs=566)

             |   cumhaz cox~mhaz
-------------+------------------
      cumhaz |   1.0000
coxbasecum~z |   1.0000   1.0000

. corr haz coxbasehaz
(obs=566)

             |      haz cox~ehaz
-------------+------------------
         haz |   1.0000
  coxbasehaz |   1.0000   1.0000
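The `sum` output shows that `cumhaz` and `coxbasecumhaz` both have mean exactly 1, and this is not a coincidence. With complete data, exactly n_j observations have t_i >= t_j. The sample average of the Nelson-Aalen estimate is therefore (1/N) sum_j (d_j/n_j) n_j = (1/N) sum_j d_j = 1.

The sketch below checks this identity with exact fractions on a small hypothetical sample. It assumes that `haz` holds the increments d_j/n_j. This is consistent with its printed minimum, .0045455 = 1/220 (one failure among 220 at risk at t = 40), and its maximum of 1 (the last strike, at t = 235). The comment under (5) in the log notes that these increments are not divided by the time elapsed. `hazard_rate` shows one crude way to make that division. Both helpers are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def nelson_aalen(durations):
    """Nelson-Aalen estimate for complete (uncensored) durations.

    Returns the increments d_j/n_j at each distinct failure time and the
    cumulative hazard H(t_i) evaluated at each observation's own duration.
    """
    fails = Counter(durations)              # d_j: failures at each distinct time
    n_at_risk = len(durations)
    cum = Fraction(0)
    increments, cum_at = {}, {}
    for t in sorted(fails):
        increments[t] = Fraction(fails[t], n_at_risk)
        cum += increments[t]
        cum_at[t] = cum
        n_at_risk -= fails[t]               # no censoring, so only failures exit
    return increments, [cum_at[t] for t in durations]

def hazard_rate(increments):
    """Crude hazard rate: each increment divided by the gap since the previous
    failure time (the first gap is measured from t = 0)."""
    prev, rates = 0, {}
    for t in sorted(increments):
        rates[t] = increments[t] / (t - prev)
        prev = t
    return rates

durations = [1, 2, 2, 3, 5, 5, 5, 8]      # hypothetical complete spells
inc, H = nelson_aalen(durations)
print("mean of H(t_i):", sum(H) / len(H))         # exactly 1 with no censoring
print("largest increment:", max(inc.values()))    # 1: the last spell has n = d = 1
print("rates:", {t: str(r) for t, r in hazard_rate(inc).items()})
```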

. 
. * (5) ESTIMATE HAZARD FUNCTION
. 
. * sts graph does not give the true hazard function - it instead gives the
. * difference in the cumulative hazard (without division by time difference).
. 
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section4\mma17p1km.txt
  log type:  text
 closed on:  19 May 2005, 13:20:01
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma17p2kmextra.txt
  log type:  text
 opened on:  19 May 2005, 13:24:01

. 
. ********** OVERVIEW OF MMA17P2KMEXTRA.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)

. * Cambridge University Press
. 
. * Chapter 17.5.1 pages 581-2
. * Nonparametric Survival Analysis
. * Provides
. * (1) K-M Survivor Function and N-A Cum Hazard Estimates (Table 17.2)
. *     using artificial data
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
. 
. ********** GENERATE DATA **********
. 
. * The time does not matter except for the hazard.
. * Here the failure times are arbitrarily set to 10, 20, 30, 40 and 50
. * 1. At t = 10 (time t1): 6 failures
. * 2. At t = 15: 4 censored (lost) between t1 and t2
. * 3. At t = 20 (time t2): 5 failures
. * 4. At t = 25: 3 censored (lost) between t2 and t3
. * 5. At t = 30 (time t3): 2 failures
. * 6. At t = 35: 1 censored (lost) between t3 and t4
. * 7. At t = 40 (time t4): 1 failure
. * 8. At t = 45: 32 censored (lost) between t4 and t5
. * 9. At t = 50 (time t5): 26 failures
. 
. * Indicator failed = 1 if fail and 0 if censored
. input duration failed

      duration     failed
  1. 10 1
  2. 10 1
  3. 10 1
  4. 10 1
  5. 10 1
  6. 10 1
  7. 15 0
  8. 15 0
  9. 15 0
 10. 15 0
 11. 20 1
 12. 20 1
 13. 20 1
 14. 20 1
 15. 20 1
 16. 25 0
 17. 25 0
 18. 25 0
 19. 30 1
 20. 30 1
 21. 35 0
 22. 40 1
 23. 45 0
 24. 45 0
 25. 45 0
 26. 45 0
 27. 45 0
 28. 45 0
 29. 45 0
 30. 45 0
 31. 45 0
 32. 45 0
 33. 45 0
 34. 45 0
 35. 45 0
 36. 45 0
 37. 45 0
 38. 45 0
 39. 45 0
 40. 45 0
 41. 45 0
 42. 45 0
 43. 45 0
 44. 45 0
 45. 45 0
 46. 45 0
 47. 45 0
 48. 45 0
 49. 45 0
 50. 45 0
 51. 45 0
 52. 45 0
 53. 45 0
 54. 45 0
 55. 50 1
 56. 50 1
 57. 50 1
 58. 50 1
 59. 50 1
 60. 50 1
 61. 50 1
 62. 50 1
 63. 50 1
 64. 50 1
 65. 50 1
 66. 50 1
 67. 50 1
 68. 50 1
 69. 50 1
 70. 50 1
 71. 50 1
 72. 50 1
 73. 50 1
 74. 50 1
 75. 50 1
 76. 50 1
 77. 50 1
 78. 50 1
 79. 50 1
 80. 50 1
 81. end

. 
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    duration |        80      39.625    13.40166         10         50
      failed |        80          .5    .5031546          0          1

. 
. ***** COMPUTATION USING STATA **********
. 
. * Stata st curves require defining the dependent variable
. stset duration, fail(failed=1)

     failure event:  failed == 1
obs. time interval:  (0, duration]
 exit on or before:  failure

------------------------------------------------------------------------------
       80  total obs.
        0  exclusions
------------------------------------------------------------------------------
       80  obs. remaining, representing
       40  failures in single record/single failure data
     3170  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        50

. stsum

         failure _d:  failed == 1
   analysis time _t:  duration

         |               incidence       no. of    |------ Survival time -----|
         | time at risk     rate        subjects        25%       50%       75%
---------+---------------------------------------------------------------------
   total |         3170   .0126183            80         50        50        50
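Both numbers in the `stsum` line follow directly from the data. The incidence rate is failures divided by total time at risk, 40/3170. All three quartiles equal 50 because the Kaplan-Meier survivor function, as `sts list` reports it for these data, stays at 0.8171 or above and then drops to zero at t = 50. The sketch assumes that `stsum` reports the p-th percentile as the smallest time at which the estimated survivor function falls to 1 - p or below. `km_percentile` is a hypothetical helper.

```python
def km_percentile(times, surv, p):
    """Smallest listed time at which the K-M survivor function is <= 1 - p."""
    for t, s in zip(times, surv):
        if s <= 1 - p:
            return t
    return None  # percentile not reached within the observed times

# Failure times and K-M values as listed by `sts list` for the artificial data
times = [10, 20, 30, 40, 50]
surv = [0.9250, 0.8589, 0.8312, 0.8171, 0.0000]

quartiles = [km_percentile(times, surv, p) for p in (0.25, 0.50, 0.75)]
print(quartiles)                               # [50, 50, 50]
print(f"incidence rate = {40 / 3170:.7f}")     # failures / time at risk
```

Heavy censoring at t = 45 is the reason the quartiles bunch up. The 32 censored spells leave only 26 subjects at risk at t = 50, and all of them fail there.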

. stdes

         failure _d:  failed == 1
   analysis time _t:  duration

                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects               80
no. of records                80           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                     39.625          10         45         50

subjects with gap              0
time on gap if gap             0
time at risk                3170      39.625          10         45         50

failures                      40          .5           0         .5          1
------------------------------------------------------------------------------

. 
. * K-M survival graph
. * sts graph, gwood
. 
. * N-A Cumulative Hazard
. * sts graph, cna
. 
. * Kaplan-Meier Survivor Function listed (last column Table 17.2)
. sts list

         failure _d:  failed == 1
   analysis time _t:  duration

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
    10       80      6      0             0.9250    0.0294     0.8407    0.9656
    15       74      0      4             0.9250    0.0294     0.8407    0.9656
    20       70      5      0             0.8589    0.0395     0.7596    0.9193
    25       65      0      3             0.8589    0.0395     0.7596    0.9193
    30       62      2      0             0.8312    0.0428     0.7268    0.8984
    35       60      0      1             0.8312    0.0428     0.7268    0.8984
    40       59      1      0             0.8171    0.0443     0.7104    0.8875
    45       58      0     32             0.8171    0.0443     0.7104    0.8875
    50       26     26      0             0.0000         .          .         .
-------------------------------------------------------------------------------
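The manual computation in this log stops at t3. The sketch below extends the same arithmetic to every row of Table 17.2, using a hypothetical helper rather than anything in the original do-file. With censoring, the key step is the risk-set update n_{j+1} = n_j - d_j - c_j. Censored spells leave the risk set without counting as failures. They change the later denominators but never move the survivor function or cumulative hazard directly.

```python
def km_na_table(groups):
    """K-M and N-A estimates from (time, failures, censored) groups in time order."""
    n = sum(d + c for _, d, c in groups)       # everyone is at risk at the start
    surv, cumhaz, rows = 1.0, 0.0, []
    for t, d, c in groups:
        if d:                                   # estimates move only at failure times
            surv *= 1.0 - d / n
            cumhaz += d / n
        rows.append((t, n, d, c, surv, cumhaz))
        n -= d + c                              # failures and censored spells both exit
    return rows

# (time, failures, censored) for the 80 artificial observations
groups = [(10, 6, 0), (15, 0, 4), (20, 5, 0), (25, 0, 3), (30, 2, 0),
          (35, 0, 1), (40, 1, 0), (45, 0, 32), (50, 26, 0)]

for t, n, d, c, s, h in km_na_table(groups):
    print(f"{t:3d} {n:4d} {d:3d} {c:3d}   S = {s:.4f}   H = {h:.4f}")
```

The last row shows how the two estimators differ at the end of the data. All 26 subjects still at risk fail at t = 50, so S drops to 0 while H rises by a full 26/26 = 1.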

. 
. * Nelson-Aalen Cumulative Hazard Listed (second last column Table 17.2)
. sts list, na

         failure _d:  failed == 1
   analysis time _t:  duration

           Beg.          Net      Nelson-Aalen    Std.
  Time    Total   Fail   Lost      Cum. Haz.     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
    10       80      6      0         0.0750    0.0306     0.0337    0.1669
    15       74      0      4         0.0750    0.0306     0.0337    0.1669
    20       70      5      0         0.1464    0.0442     0.0810    0.2648
    25       65      0      3         0.1464    0.0442     0.0810    0.2648
    30       62      2      0         0.1787    0.0498     0.1035    0.3085
    35       60      0      1         0.1787    0.0498     0.1035    0.3085
    40       59      1      0         0.1956    0.0526     0.1155    0.3313
    45       58      0     32         0.1956    0.0526     0.1155    0.3313
    50       26     26      0         1.1956    0.2030     0.8571    1.6678
-------------------------------------------------------------------------------

. 
. ***** MANUAL COMPUTATION AS IN TABLE 17.2 (page 582) **********
. 
. scalar cumhaz1 = 6/80
. scalar cumhaz2 = 6/80 + 5/70
. scalar cumhaz3 = 6/80 + 5/70 + 2/62
. scalar surv1 = 1-6/80
. scalar surv2 = (1-6/80)*(1-5/70)
. scalar surv3 = (1-6/80)*(1-5/70)*(1-2/62)
. di "Cumulative hazard at t1: " cumhaz1 " at t2: " cumhaz2 " at t3: " cumhaz3
Cumulative hazard at t1: .075 at t2: .14642857 at t3: .17868664
. di "Survivor function at t1: " surv1 " at t2: " surv2 " at t3: " surv3
Survivor function at t1: .925 at t2: .85892857 at t3: .8312212
. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section4\mma17p2kmextra.txt
  log type:  text
 closed on:  19 May 2005, 13:24:01
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma17p3weib.txt
  log type:  text
 opened on:  19 May 2005, 14:22:25

. 
. ********** OVERVIEW OF MMA17P3WEIB.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 17.6.1 (pages 584-6)
. * Plot of Weibull density, survivor, hazard and cumulative hazard functions
. * Provides
. * (1) Figure 17.2 (ch17weibull.wmf)
. 
. * This program requires no data
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
. 
. ********** GENERATE DATA AND FUNCTIONS **********
. 
. set obs 800
obs was 0, now 800
. 
. gen t = 0.1*_n   /* duration time */
. 
. * Generate the survivor, hazard, cumulative hazard and density
. scalar g = 0.01   /* gamma */
. scalar a = 1.5    /* alpha */
. gen surv = exp(-g*(t^(a)))
. gen density = g*a*(t^(a-1))*exp(-g*(t^(a)))
. gen hazard = g*a*(t^(a-1))
. gen cumhaz = -ln(surv)
. 
. ********** DO THE FOUR SEPARATE GRAPHS FOR FIGURE 17.2 **********
. 

. * Weibull density
. graph twoway (scatter density t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
>   */ scale (1.2) plotregion(style(none)) /*
>   */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
>   */ ytitle("Weibull density", size(large)) yscale(titlegap(*5)) /*
>   */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2a, replace
(file ch17fig2a.gph saved)
. 
. * Weibull survivor
. graph twoway (scatter surv t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
>   */ scale (1.2) plotregion(style(none)) /*
>   */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
>   */ ytitle("Weibull survivor", size(large)) yscale(titlegap(*5)) /*
>   */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2b, replace
(file ch17fig2b.gph saved)
. 
. * Weibull hazard
. graph twoway (scatter hazard t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
>   */ scale (1.2) plotregion(style(none)) /*
>   */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
>   */ ytitle("Weibull hazard", size(large)) yscale(titlegap(*5)) /*
>   */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2c, replace
(file ch17fig2c.gph saved)
. 
. * Weibull cumulative hazard
. graph twoway (scatter cumhaz t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
>   */ scale (1.2) plotregion(style(none)) /*
>   */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
>   */ ytitle("Cumulative hazard", size(large)) yscale(titlegap(*5)) /*
>   */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2d, replace
(file ch17fig2d.gph saved)
. 
. ********** COMBINE THE FOUR GRAPHS FOR FIGURE 17.2 (page 585) **********
. 
. graph combine ch17fig2a.gph ch17fig2b.gph ch17fig2c.gph ch17fig2d.gph, /*
>   */ title("Weibull Distribution", margin(b=2) size(vlarge))
. graph export ch17weibull.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch17weibull.wmf written in Windows Metafile format)
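The four panels of Figure 17.2 are tied together by identities that the do-file uses implicitly. For the Weibull with survivor function S(t) = exp(-g t^a):

- the hazard is h(t) = g a t^(a-1);
- the density is f(t) = h(t) S(t);
- the cumulative hazard is H(t) = -ln S(t) = g t^a.

The sketch below evaluates these on the same grid, t = 0.1, 0.2, ..., 80, with g = 0.01 and a = 1.5, and checks the identities numerically. The name `weibull` is ours, not the do-file's.

```python
import math

GAMMA, ALPHA = 0.01, 1.5   # scalar g and scalar a in the do-file

def weibull(t, g=GAMMA, a=ALPHA):
    """Survivor, density, hazard and cumulative hazard of the Weibull at t > 0."""
    surv = math.exp(-g * t ** a)
    hazard = g * a * t ** (a - 1)
    density = hazard * surv        # same as g*a*t^(a-1)*exp(-g*t^a)
    cumhaz = g * t ** a            # same as -ln(surv)
    return surv, density, hazard, cumhaz

grid = [0.1 * i for i in range(1, 801)]   # mirrors: set obs 800; gen t = 0.1*_n

for t in grid:
    s, f, h, H = weibull(t)
    assert math.isclose(h, f / s, rel_tol=1e-12)          # h = f / S
    assert math.isclose(H, -math.log(s), rel_tol=1e-9)    # H = -ln S

for t in (1, 10, 40, 80):
    s, f, h, H = weibull(t)
    print(f"t = {t:2d}: S = {s:.4f}  f = {f:.5f}  h = {h:.5f}  H = {H:.4f}")
```

Because a = 1.5 > 1, the hazard g a t^(a-1) = 0.015 sqrt(t) rises steadily with duration, which is the upward-sloping hazard panel in the figure.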

. 
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section4\mma17p3weib.txt
  log type:  text
 closed on:  19 May 2005, 14:22:39
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma17p4duration.txt
  log type:  text
 opened on:  19 May 2005, 15:25:00

. 
. ********** OVERVIEW OF MMA17P4DURATION.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 17.11 (pages 603-8)
. * Duration regression with censored data example
. * Provides
. * (1) Data summary: Table 17.6
. * (2) List of Survivor Function and Cumulative Hazard Estimates: Table 17.7
. * (3) Various graphs describing the data
. *   (3A) K-M Survival Graph for all data (Figure 17.3: km_pt1.wmf)
. *   (3B) K-M Survival Graph by unemployment insurance (Figure 17.4: km_pt2.wmf)
. *   (3C) N-A Cumulative Hazard Graph for all data (Figure 17.5: na_pt1.wmf)
. *   (3D) N-A Cumulative Hazard Graph by unemployment insurance (Figure 17.6: na_pt2.wmf)
. * (4) Coefficient Estimates of Some Parametric Models (Table 17.8)
. * (4) Hazard Rate Estimates of Some Parametric Models (Table 17.9)
. 
. * To run this program you need data file
. *    ema1996.dta
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
. set matsize 100
. 
. ********** DATA DESCRIPTION **********
. 

. * The data is from . * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness, .* and Part-time Work," Econometrica, 64, 647-682. . . * McCalls data set named ema_1996_pt_lastweek.dta . * has name changed to ema1996.dta . . * There are 3343 observations from the CPS Displaced Worker Surveys . * of 1986, 1988, 1990 and 1992 . * 1. spell is length of spell in number of two-week intervals . * 2. CENSOR1 = 1 if re-employed at full-time job . * 3. CENSOR2 = 1 if re-employed at part-time job . * 4. CENSOR3 = 1 if re-employed but left job: pt-ft status unknown . * 5. CENSOR4 = 1 if still jobless . * 6. ui (UI) = 1 if filed UI claim . * 7. reprate (RR) = eligible replacement rate . * 8. disrate (DR) = eligible disregard rate . * 9. tenure (TENURE) = years tenure in lost job . * 10. logwage (LOGWAGE) = log weekly earnings in lost job (1985$) . * 11.-43. other variables listed in McCall (1986) table 2 p.657 . . ********** READ DATA ********** . . use ema1996.dta (Sample for 1996 EMA paper: part-time= worked part-time last week) . sum Variable | Obs Mean Std. Dev. 
Min Max -------------+-------------------------------------------------------spell | 3343 6.247981 5.611271 1 28 censor1 | 3343 .3209692 .4669188 0 1 censor2 | 3343 .1014059 .3019106 0 1 censor3 | 3343 .1717021 .3771777 0 1 censor4 | 3343 .3754113 .4843014 0 1 -------------+-------------------------------------------------------ui | 3343 .5527969 .4972791 0 1 reprate | 3343 .4544717 .1137918 .066 2.059 logwage | 3343 5.692994 .5356591 2.70805 7.600402 tenure | 3343 4.114867 5.862322 0 40 disrate | 3343 .1094376 .0735274 .002 1.02 -------------+-------------------------------------------------------slack | 3343 .4884834 .4999421 0 1 abolpos | 3343 .1456775 .3528354 0 1 explose | 3343 .5025426 .5000683 0 1 stateur | 3343 6.5516 1.803825 2.5 13 houshead | 3343 .6120251 .4873617 0 1 -------------+-------------------------------------------------------married | 3343 .5860006 .4926221 0 1 female | 3343 .3478911 .4763725 0 1 child | 3343 .4501944 .4975876 0 1 361

ychild | 3343 .1956327 .3967463 0 1 nonwhite | 3343 .1390966 .3460991 0 1 -------------+-------------------------------------------------------age | 3343 35.44331 10.6402 20 61 schlt12 | 3343 .2811846 .4496446 0 1 schgt12 | 3343 .3356267 .4722797 0 1 smsa | 3343 .7241998 .4469835 0 1 bluecoll | 3343 .6036494 .489212 0 1 -------------+-------------------------------------------------------mining | 3343 .029315 .1687132 0 1 constr | 3343 .1480706 .3552231 0 1 transp | 3343 .0646126 .2458778 0 1 trade | 3343 .1848639 .3882452 0 1 fire | 3343 .0514508 .2209484 0 1 -------------+-------------------------------------------------------services | 3343 .1699073 .3756075 0 1 pubadmin | 3343 .0095722 .097383 0 1 year85 | 3343 .2677236 .442839 0 1 year87 | 3343 .2174693 .4125862 0 1 year89 | 3343 .1998205 .3999251 0 1 -------------+-------------------------------------------------------midatl | 3343 .1088842 .3115405 0 1 encen | 3343 .1429853 .3501103 0 1 wncen | 3343 .0643135 .2453472 0 1 southatl | 3343 .2375112 .4256217 0 1 escen | 3343 .0532456 .2245564 0 1 -------------+-------------------------------------------------------wscen | 3343 .1441819 .3513266 0 1 mountain | 3343 .1079868 .3104102 0 1 pacific | 3343 .0260245 .159232 0 1 . . * The following gives variables in same order as Table 2 p.657 of McCall (1996) . * which gives fuller names for the variables . sum spell censor1 censor2 censor3 censor4 age /* > */ ui reprate disrate logwage tenure slack abolpos explose bluecoll /* > */ houshead married child ychild female schlt12 schgt12 nonwhite smsa /* > */ midatl encen wncen southatl escen wscen mountain pacific /* > */ mining constr transp trade fire services pubadmin /* > */ year85 year87 year89 Variable | Obs Mean Std. Dev. 
Min Max
-------------+-------------------------------------------------------
spell | 3343 6.247981 5.611271 1 28
censor1 | 3343 .3209692 .4669188 0 1
censor2 | 3343 .1014059 .3019106 0 1
censor3 | 3343 .1717021 .3771777 0 1
censor4 | 3343 .3754113 .4843014 0 1
-------------+-------------------------------------------------------
age | 3343 35.44331 10.6402 20 61
ui | 3343 .5527969 .4972791 0 1
reprate | 3343 .4544717 .1137918 .066 2.059
disrate | 3343 .1094376 .0735274 .002 1.02
logwage | 3343 5.692994 .5356591 2.70805 7.600402
-------------+-------------------------------------------------------
tenure | 3343 4.114867 5.862322 0 40
slack | 3343 .4884834 .4999421 0 1
abolpos | 3343 .1456775 .3528354 0 1
explose | 3343 .5025426 .5000683 0 1
bluecoll | 3343 .6036494 .489212 0 1
-------------+-------------------------------------------------------
houshead | 3343 .6120251 .4873617 0 1
married | 3343 .5860006 .4926221 0 1
child | 3343 .4501944 .4975876 0 1
ychild | 3343 .1956327 .3967463 0 1
female | 3343 .3478911 .4763725 0 1
-------------+-------------------------------------------------------
schlt12 | 3343 .2811846 .4496446 0 1
schgt12 | 3343 .3356267 .4722797 0 1
nonwhite | 3343 .1390966 .3460991 0 1
smsa | 3343 .7241998 .4469835 0 1
midatl | 3343 .1088842 .3115405 0 1
-------------+-------------------------------------------------------
encen | 3343 .1429853 .3501103 0 1
wncen | 3343 .0643135 .2453472 0 1
southatl | 3343 .2375112 .4256217 0 1
escen | 3343 .0532456 .2245564 0 1
wscen | 3343 .1441819 .3513266 0 1
-------------+-------------------------------------------------------
mountain | 3343 .1079868 .3104102 0 1
pacific | 3343 .0260245 .159232 0 1
mining | 3343 .029315 .1687132 0 1
constr | 3343 .1480706 .3552231 0 1
transp | 3343 .0646126 .2458778 0 1
-------------+-------------------------------------------------------
trade | 3343 .1848639 .3882452 0 1
fire | 3343 .0514508 .2209484 0 1
services | 3343 .1699073 .3756075 0 1
pubadmin | 3343 .0095722 .097383 0 1
year85 | 3343 .2677236 .442839 0 1
-------------+-------------------------------------------------------
year87 | 3343 .2174693 .4125862 0 1
year89 | 3343 .1998205 .3999251 0 1

. 
. * The following creates a space-delimited data set with
. * variables in same order as Table 2 p.657 of McCall (1996)
. * Permits use by programs other than Stata
. * Note that order has been changed a little from the original Stata data set
. 
. outfile spell censor1 censor2 censor3 censor4 age /*
> */ ui reprate disrate logwage tenure slack abolpos explose bluecoll /*
> */ houshead married child ychild female schlt12 schgt12 nonwhite smsa /*
> */ midatl encen wncen southatl escen wscen mountain pacific /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 using ema1996.asc, replace
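The comments above say the space-delimited file is meant for programs other than Stata. The following Python sketch shows one way such a program might read it; it is an illustration under stated assumptions, not part of the original materials, and `COLUMNS` and `parse_outfile` are hypothetical names. The column order is copied from the `outfile` command (42 variables; `stateur`, which enters `$xlist` later in this log, is not in that list). Because the command does not specify outfile's `wide` option, Stata may split one observation across several physical lines, so the sketch reads every whitespace-separated token as a single stream and regroups the tokens in blocks of 42. It assumes purely numeric output (no value labels written as strings) and treats a bare `.` as a Stata missing value.

```python
import math
from pathlib import Path

# Variable order copied from the outfile command (42 variables).
COLUMNS = """spell censor1 censor2 censor3 censor4 age ui reprate disrate logwage
tenure slack abolpos explose bluecoll houshead married child ychild female
schlt12 schgt12 nonwhite smsa midatl encen wncen southatl escen wscen mountain
pacific mining constr transp trade fire services pubadmin year85 year87 year89""".split()


def parse_outfile(text, columns=COLUMNS):
    """Split whitespace-delimited outfile text into one dict per observation.

    Tokens are treated as one stream because, without outfile's `wide`
    option, an observation may be wrapped over several physical lines.
    A bare "." (assumed Stata missing-value code) becomes NaN.
    """
    tokens = text.split()
    k = len(columns)
    if len(tokens) % k:
        raise ValueError(f"{len(tokens)} tokens is not a multiple of {k} columns")
    values = [math.nan if tok == "." else float(tok) for tok in tokens]
    return [dict(zip(columns, values[i:i + k])) for i in range(0, len(values), k)]


# Hypothetical location: the file written by the outfile command above.
path = Path("ema1996.asc")
if path.exists():
    rows = parse_outfile(path.read_text())
    n = len(rows)
    print(n, "observations")
    print("mean spell:", sum(r["spell"] for r in rows) / n)
```

If the file is read correctly, the mean of `spell` should come out close to the 6.247981 reported by `summarize`, a quick check that the columns line up.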

. 
. ********* ANALYSIS: UNEMPLOYMENT DURATION **********
. 
. * Stata st curves require defining the dependent variable
. * and the censoring variable if there is one
. stset spell, fail(censor1=1)

     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. stdes

         failure _d:  censor1 == 1
   analysis time _t:  spell

                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects             3343
no. of records              3343           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                   6.247981           1          5         28

subjects with gap              0
time on gap if gap             0
time at risk               20887    6.247981           1          5         28

failures                    1073    .3209692           0          0          1
------------------------------------------------------------------------------
. 
. * (1) SUMMARIZE KEY VARIABLES (Table 17.6, p.603)
. 
. sum spell censor1 censor2 censor3 censor4 ui reprate disrate tenure logwage

Variable | Obs Mean Std. Dev. Min Max
-------------+-------------------------------------------------------
spell | 3343 6.247981 5.611271 1 28
censor1 | 3343 .3209692 .4669188 0 1
censor2 | 3343 .1014059 .3019106 0 1
censor3 | 3343 .1717021 .3771777 0 1
censor4 | 3343 .3754113 .4843014 0 1
-------------+-------------------------------------------------------
ui | 3343 .5527969 .4972791 0 1
reprate | 3343 .4544717 .1137918 .066 2.059
disrate | 3343 .1094376 .0735274 .002 1.02
tenure | 3343 4.114867 5.862322 0 40
logwage | 3343 5.692994 .5356591 2.70805 7.600402

. 
. * (2) LIST SURVIVAL CURVE AND CUMULATIVE HAZARD ESTIMATES (Table 17.7, p.605)
. 
. * Kaplan-Meier Estimates of Survival Function
. sts list

         failure _d:  censor1 == 1
   analysis time _t:  spell

         Beg.           Net     Survivor       Std.
Time    Total   Fail   Lost     Function      Error      [95% Conf. Int.]
------------------------------------------------------------------------------
   1     3343    294    246       0.9121     0.0049     0.9019     0.9212
   2     2803    178    304       0.8541     0.0062     0.8415     0.8659
   3     2321    119    305       0.8103     0.0071     0.7960     0.8238
   4     1897     56    165       0.7864     0.0076     0.7712     0.8008
   5     1676    104    233       0.7376     0.0085     0.7206     0.7538
   6     1339     32    111       0.7200     0.0088     0.7023     0.7369
   7     1196     85    178       0.6688     0.0098     0.6492     0.6876
   8      933     15     70       0.6581     0.0100     0.6380     0.6773
   9      848     33     98       0.6325     0.0106     0.6113     0.6528
  10      717      3     55       0.6298     0.0106     0.6086     0.6503
  11      659     26     77       0.6050     0.0113     0.5825     0.6267
  12      556      7     40       0.5974     0.0115     0.5744     0.6195
  13      509     25     69       0.5680     0.0123     0.5434     0.5918
  14      415     30     74       0.5270     0.0135     0.5001     0.5531
  15      311     19     40       0.4948     0.0146     0.4658     0.5230
  16      252     10     41       0.4751     0.0153     0.4449     0.5047
  17      201      8     24       0.4562     0.0161     0.4245     0.4874
  18      169      7     13       0.4373     0.0169     0.4040     0.4702
  19      149      4     15       0.4256     0.0174     0.3912     0.4595
  20      130      3     18       0.4158     0.0179     0.3804     0.4507
  21      109      4     23       0.4005     0.0188     0.3635     0.4372
  22       82      4      9       0.3810     0.0203     0.3412     0.4206
  23       69      0      9       0.3810     0.0203     0.3412     0.4206
  24       60      0      2       0.3810     0.0203     0.3412     0.4206
  25       58      0     10       0.3810     0.0203     0.3412     0.4206
  26       48      2     13       0.3651     0.0223     0.3214     0.4088
  27       33      5     24       0.3098     0.0296     0.2528     0.3684
  28        4      0      4       0.3098     0.0296     0.2528     0.3684
------------------------------------------------------------------------------
. 
. * Nelson-Aalen Estimates of Cumulative Hazard
. sts list, na

         failure _d:  censor1 == 1
   analysis time _t:  spell

         Beg.           Net Nelson-Aalen       Std.
Time    Total   Fail   Lost    Cum. Haz.      Error      [95% Conf. Int.]
------------------------------------------------------------------------------
   1     3343    294    246       0.0879     0.0051     0.0784     0.0986
   2     2803    178    304       0.1514     0.0070     0.1383     0.1658
   3     2321    119    305       0.2027     0.0084     0.1869     0.2199
   4     1897     56    165       0.2322     0.0093     0.2147     0.2512
   5     1676    104    233       0.2943     0.0111     0.2733     0.3169
   6     1339     32    111       0.3182     0.0119     0.2957     0.3424
   7     1196     85    178       0.3893     0.0142     0.3624     0.4181
   8      933     15     70       0.4053     0.0148     0.3774     0.4353
   9      848     33     98       0.4443     0.0162     0.4135     0.4773
  10      717      3     55       0.4484     0.0164     0.4174     0.4818
  11      659     26     77       0.4879     0.0182     0.4536     0.5248
  12      556      7     40       0.5005     0.0188     0.4650     0.5387
  13      509     25     69       0.5496     0.0212     0.5096     0.5927
  14      415     30     74       0.6219     0.0250     0.5748     0.6728
  15      311     19     40       0.6830     0.0286     0.6291     0.7415
  16      252     10     41       0.7227     0.0313     0.6639     0.7866
  17      201      8     24       0.7625     0.0343     0.6982     0.8327
  18      169      7     13       0.8039     0.0377     0.7333     0.8812
  19      149      4     15       0.8307     0.0400     0.7559     0.9130
  20      130      3     18       0.8538     0.0422     0.7750     0.9406
  21      109      4     23       0.8905     0.0460     0.8048     0.9853
  22       82      4      9       0.9393     0.0521     0.8426     1.0470
  23       69      0      9       0.9393     0.0521     0.8426     1.0470
  24       60      0      2       0.9393     0.0521     0.8426     1.0470
  25       58      0     10       0.9393     0.0521     0.8426     1.0470
  26       48      2     13       0.9809     0.0598     0.8705     1.1055
  27       33      5     24       1.1325     0.0904     0.9685     1.3242
  28        4      0      4       1.1325     0.0904     0.9685     1.3242
------------------------------------------------------------------------------
. 
. * (3) VARIOUS GRAPHS (Figures 17.3-17.6)
. 
. * (3A) Figure 17.3: Overall Survival Function (page 604)
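Every entry in the two `sts list` tables can be rebuilt from the Beg. Total, Fail and Net Lost columns. As a cross-check, the Python sketch that follows recomputes the first three rows; it is not part of the original program, and `ROWS` and `km_na` are hypothetical names. The Kaplan-Meier estimate multiplies (1 - d_j/n_j) over event times and its standard error uses Greenwood's formula; the Nelson-Aalen estimate sums d_j/n_j, with standard error sqrt(sum d_j/n_j^2). The confidence limits are not reproduced, since Stata forms them on a transformed scale.

```python
import math

# First three rows of the `sts list` output: (time, beg. total, fail, net lost).
ROWS = [
    (1, 3343, 294, 246),
    (2, 2803, 178, 304),
    (3, 2321, 119, 305),
]


def km_na(rows):
    """Return (time, S, se_S, H, se_H) for each row.

    S    : Kaplan-Meier product-limit estimate, prod(1 - d/n)
    se_S : Greenwood standard error, S * sqrt(sum d / (n * (n - d)))
    H    : Nelson-Aalen cumulative hazard, sum d / n
    se_H : sqrt(sum d / n**2)
    Assumes n > d at every listed time (true for this sample).
    """
    out = []
    surv, greenwood, cumhaz, var_h = 1.0, 0.0, 0.0, 0.0
    for time, n_risk, d_fail, _lost in rows:
        surv *= 1.0 - d_fail / n_risk
        greenwood += d_fail / (n_risk * (n_risk - d_fail))
        cumhaz += d_fail / n_risk
        var_h += d_fail / n_risk ** 2
        out.append((time, surv, surv * math.sqrt(greenwood), cumhaz, math.sqrt(var_h)))
    return out


# The risk set shrinks by failures plus censored exits: n[j+1] = n[j] - d[j] - c[j].
for (_, n0, d0, c0), (_, n1, _, _) in zip(ROWS, ROWS[1:]):
    assert n1 == n0 - d0 - c0

for t, s, se_s, h, se_h in km_na(ROWS):
    print(f"t={t}  S={s:.4f} ({se_s:.4f})  H={h:.4f} ({se_h:.4f})")
```

For these three rows the printed estimates and standard errors agree with the Stata tables to the four decimals shown.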

. * sts graph, gwood
. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen surv = s
. sts gen lbsurv = lb(s)
. sts gen ubsurv = ub(s)
. sort spell
. graph twoway (line ubsurv spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
> */ (line surv spell, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
> */ (line lbsurv spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Overall Survival Function Estimate") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ ylabel(0.00(0.25)1.00,grid)/*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Survival Estimate") /*
> */ label(3 "Lower 95% confidence band") )
. graph export km_pt1.wmf, replace
(file c:\Imbook\bwebpage\Section4\km_pt1.wmf written in Windows Metafile format)
. 
. * (3B) Figure 17.4: Survival Function by Treatment (here ui) (p.605)
. * sts graph, by(ui)
. sts graph, by(ui) /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Survival Function Estimates by UI Status") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend(label(1 "No UI (UI = 0)") label(2 "Received UI (UI = 1)") )

         failure _d:  censor1 == 1
   analysis time _t:  spell

. graph export km_pt2.wmf, replace
(file c:\Imbook\bwebpage\Section4\km_pt2.wmf written in Windows Metafile format)
. 
. * (3C) Figure 17.5: Overall Cumulative Hazard Function (p.606)
. * sts graph, cna
. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen cumhaz = na
. sts gen lbcumhaz = lb(na)
. sts gen ubcumhaz = ub(na)
. sort spell
. graph twoway (line ubcumhaz spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
> */ (line cumhaz spell, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
> */ (line lbcumhaz spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Overall Cumulative Hazard Estimate") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ ylabel(0.00(0.50)1.50,grid)/*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Cumulative Hazard Estimate") /*
> */ label(3 "Lower 95% confidence band") )
. graph export na_pt1.wmf, replace
(file c:\Imbook\bwebpage\Section4\na_pt1.wmf written in Windows Metafile format)
. 
. * (3D) Figure 17.6: Cumulative Hazard Function by Treatment (here ui) (p.606)
. * sts graph, na by(ui)
. sts graph, na by(ui) /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Cumulative Hazard Estimates by UI Status") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend(label(1 "No UI (UI = 0)") label(2 "Received UI (UI = 1)") )

         failure _d:  censor1 == 1
   analysis time _t:  spell

. graph export na_pt2.wmf, replace
(file c:\Imbook\bwebpage\Section4\na_pt2.wmf written in Windows Metafile format)
. 
. * (4) VARIOUS PARAMETRIC MODELS: COEFFICIENTS (Table 17.8)
. 
. * streg default is to report hazard ratios rather than coefficients
. * streg with nohr option reports coefficients
. 
. * Create regressors
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI

. gen LOGWAGE = logwage
. 
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific
. 
. * Exponential regression
. streg $xlist, nohr robust dist(exponential)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Iteration 0:   log pseudo-likelihood = -3012.4909
Iteration 1:   log pseudo-likelihood = -2810.3791
Iteration 2:   log pseudo-likelihood = -2701.8024
Iteration 3:   log pseudo-likelihood = -2700.6911
Iteration 4:   log pseudo-likelihood = -2700.6903
Iteration 5:   log pseudo-likelihood = -2700.6903

Exponential regression -- log relative-hazard form

No. of subjects =         3343               Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                             Wald chi2(40)   =    565.24
Log pseudo-likelihood = -2700.6903           Prob > chi2     =    0.0000

-----------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
RR | .4720235 .6005534 0.79 0.432 -.7050396 1.649087
DR | -.5756396 .7624489 -0.75 0.450 -2.070012 .9187327
UI | -1.424561 .2493917 -5.71 0.000 -1.91336 -.9357622
RRUI | .9655904 .6118408 1.58 0.115 -.2335956 2.164776
DRUI | -.1990635 1.019118 -0.20 0.845 -2.196498 1.798371
LOGWAGE | .3508005 .115598 3.03 0.002 .1242327 .5773684
tenure | -.0001462 .0064637 -0.02 0.982 -.0128147 .0125224
slack | -.2593666 .0759363 -3.42 0.001 -.4081991 -.1105342
abolpos | -.1550897 .0953306 -1.63 0.104 -.3419342 .0317549
explose | .198458 .0648354 3.06 0.002 .071383 .3255331
stateur | -.064626 .0229903 -2.81 0.005 -.1096862 -.0195659
houshead | .3812208 .0836602 4.56 0.000 .2172499 .5451918
married | .369552 .0786145 4.70 0.000 .2154705 .5236335
female | .1164067 .0852986 1.36 0.172 -.0507754 .2835888
child | -.0333008 .0794577 -0.42 0.675 -.1890352 .1224335
ychild | -.1449722 .1022781 -1.42 0.156 -.3454336 .0554892
nonwhite | -.6692066 .1188272 -5.63 0.000 -.9021037 -.4363095
age | -.0220821 .0039256 -5.63 0.000 -.0297762 -.0143879
schlt12 | -.1231414 .0966102 -1.27 0.202 -.3124939 .066211
schgt12 | .1114395 .082945 1.34 0.179 -.0511297 .2740087
smsa | .1922291 .0799904 2.40 0.016 .0354508 .3490075
bluecoll | -.2033718 .085129 -2.39 0.017 -.3702215 -.036522
mining | -.1205818 .1973575 -0.61 0.541 -.5073955 .2662319
constr | -.04475 .1081519 -0.41 0.679 -.2567237 .1672238
transp | -.1786694 .156034 -1.15 0.252 -.4844906 .1271517
trade | -.0345159 .1019152 -0.34 0.735 -.234266 .1652341
fire | .1120549 .1386716 0.81 0.419 -.1597365 .3838462
services | .1840002 .0983911 1.87 0.061 -.0088428 .3768432
pubadmin | .1090606 .2954211 0.37 0.712 -.4699541 .6880752
year85 | .2147661 .0888664 2.42 0.016 .0405911 .388941
year87 | .3541162 .0948499 3.73 0.000 .1682139 .5400186
year89 | .467082 .1104355 4.23 0.000 .2506325 .6835316
midatl | .0264112 .1465647 0.18 0.857 -.2608503 .3136727
encen | .0043916 .1502813 0.03 0.977 -.2901544 .2989375
wncen | .1724311 .1607689 1.07 0.283 -.1426703 .4875324
southatl | .2638807 .1183726 2.23 0.026 .0318747 .4958867
escen | .35414 .19317 1.83 0.067 -.0244664 .7327463
wscen | .3385896 .1433308 2.36 0.018 .0576664 .6195128
mountain | .0063693 .1538821 0.04 0.967 -.2952341 .3079727
pacific | .0770202 .2393505 0.32 0.748 -.3920982 .5461385
_cons | -4.079107 .8767097 -4.65 0.000 -5.797426 -2.360788
-----------------------------------------------------------------------------
. estimates store bexponential
. 
. * Weibull regression
. streg $xlist, nohr robust dist(weibull)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Fitting constant-only model:

Iteration 0:   log pseudo-likelihood = -3012.4909
Iteration 1:   log pseudo-likelihood = -3012.3543
Iteration 2:   log pseudo-likelihood = -3012.3543

Fitting full model:

Iteration 0:   log pseudo-likelihood = -3012.3543
Iteration 1:   log pseudo-likelihood = -2799.9064
Iteration 2:   log pseudo-likelihood = -2688.7377
Iteration 3:   log pseudo-likelihood = -2687.6004
Iteration 4:   log pseudo-likelihood = -2687.5995
Iteration 5:   log pseudo-likelihood = -2687.5995

Weibull regression -- log relative-hazard form

No. of subjects =         3343               Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                             Wald chi2(40)   =    501.65
Log pseudo-likelihood = -2687.5995           Prob > chi2     =    0.0000

-----------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
RR | .4481156 .6381895 0.70 0.483 -.8027127 1.698944
DR | -.4269187 .8086983 -0.53 0.598 -2.011938 1.158101
UI | -1.496066 .2639679 -5.67 0.000 -2.013434 -.9786984
RRUI | 1.015226 .6455611 1.57 0.116 -.2500501 2.280503
DRUI | -.2988417 1.065384 -0.28 0.779 -2.386956 1.789272
LOGWAGE | .3655253 .12212 2.99 0.003 .1261745 .6048761
tenure | -.0011127 .0068716 -0.16 0.871 -.0145809 .0123554
slack | -.2652154 .0803214 -3.30 0.001 -.4226424 -.1077883
abolpos | -.1604227 .1012942 -1.58 0.113 -.3589557 .0381103
explose | .2075085 .0684715 3.03 0.002 .0733068 .3417103
stateur | -.0708745 .0242117 -2.93 0.003 -.1183286 -.0234204
houshead | .3976626 .0887192 4.48 0.000 .2237762 .571549
married | .3786057 .0830317 4.56 0.000 .2158665 .541345
female | .1260829 .0896987 1.41 0.160 -.0497233 .301889
child | -.0336778 .0839956 -0.40 0.688 -.1983061 .1309505
ychild | -.1613066 .108947 -1.48 0.139 -.3748389 .0522256
nonwhite | -.7025504 .12426 -5.65 0.000 -.9460956 -.4590052
age | -.0235823 .0041922 -5.63 0.000 -.0317989 -.0153658
schlt12 | -.1226759 .1022762 -1.20 0.230 -.3231335 .0777816
schgt12 | .1162848 .0880692 1.32 0.187 -.0563278 .2888973
smsa | .1999567 .0841129 2.38 0.017 .0350985 .3648149
bluecoll | -.1994925 .0899354 -2.22 0.027 -.3757626 -.0232223
mining | -.1015676 .2036644 -0.50 0.618 -.5007425 .2976073
constr | -.0253737 .1135609 -0.22 0.823 -.247949 .1972016
transp | -.1981522 .1672141 -1.19 0.236 -.5258858 .1295814
trade | -.0311361 .1079502 -0.29 0.773 -.2427146 .1804423
fire | .1262153 .1492527 0.85 0.398 -.1663145 .4187452
services | .2031673 .1038945 1.96 0.051 -.0004622 .4067968
pubadmin | .1117728 .3087374 0.36 0.717 -.4933415 .716887
year85 | .2374972 .093387 2.54 0.011 .054462 .4205325
year87 | .3787397 .1011782 3.74 0.000 .1804341 .5770454
year89 | .4920278 .1180472 4.17 0.000 .2606596 .7233959
midatl | .02465 .1542139 0.16 0.873 -.2776037 .3269036
encen | -.0014111 .1579065 -0.01 0.993 -.3109023 .30808
wncen | .1844363 .1694444 1.09 0.276 -.1476687 .5165413
southatl | .2740974 .1250481 2.19 0.028 .0290076 .5191872
escen | .367742 .2024771 1.82 0.069 -.0291058 .7645899
wscen | .3440005 .1527804 2.25 0.024 .0445563 .6434446
mountain | .0159627 .1620188 0.10 0.922 -.3015883 .3335136
pacific | .0849532 .2504077 0.34 0.734 -.4058368 .5757432
_cons | -4.357886 .9196792 -4.74 0.000 -6.160424 -2.555347
-------------+---------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+---------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
-----------------------------------------------------------------------------
. estimates store bweibull
. 
. * Gompertz regression
. streg $xlist, nohr robust dist(gompertz)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Fitting constant-only model:

Iteration 0:   log pseudo-likelihood = -3012.4909
Iteration 1:   log pseudo-likelihood = -3002.0916
Iteration 2:   log pseudo-likelihood = -3002.026
Iteration 3:   log pseudo-likelihood = -3002.026

Fitting full model:

Iteration 0:   log pseudo-likelihood = -3002.026
Iteration 1:   log pseudo-likelihood = -2796.0001
Iteration 2:   log pseudo-likelihood = -2701.6693
Iteration 3:   log pseudo-likelihood = -2700.6057
Iteration 4:   log pseudo-likelihood = -2700.605
Iteration 5:   log pseudo-likelihood = -2700.605

Gompertz regression -- log relative-hazard form

No. of subjects =         3343               Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                             Wald chi2(40)   =    529.75
Log pseudo-likelihood =  -2700.605           Prob > chi2     =    0.0000

-----------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
RR | .472405 .6033813 0.78 0.434 -.7102005 1.655011
DR | -.5627894 .7646131 -0.74 0.462 -2.061404 .9358247
UI | -1.428355 .2508349 -5.69 0.000 -1.919982 -.9367272
RRUI | .9689413 .6144464 1.58 0.115 -.2353514 2.173234
DRUI | -.2112495 1.021112 -0.21 0.836 -2.212593 1.790094
LOGWAGE | .3524722 .1162698 3.03 0.002 .1245876 .5803567
tenure | -.0002233 .0065002 -0.03 0.973 -.0129635 .0125168
slack | -.2593933 .0762829 -3.40 0.001 -.4089051 -.1098815
abolpos | -.1552595 .0958002 -1.62 0.105 -.3430244 .0325053
explose | .1991286 .0650876 3.06 0.002 .0715592 .326698
stateur | -.065244 .0231645 -2.82 0.005 -.1106456 -.0198424
houshead | .3822818 .0841671 4.54 0.000 .2173173 .5472464
married | .3700141 .0789107 4.69 0.000 .215352 .5246762
female | .1170987 .0856236 1.37 0.171 -.0507206 .2849179
child | -.0331425 .0798246 -0.42 0.678 -.1895958 .1233108
ychild | -.1466596 .102884 -1.43 0.154 -.3483085 .0549893
nonwhite | -.6720521 .1197092 -5.61 0.000 -.9066778 -.4374264
age | -.0222175 .0039787 -5.58 0.000 -.0300157 -.0144193
schlt12 | -.1228615 .097015 -1.27 0.205 -.3130075 .0672845
schgt12 | .1121295 .0831976 1.35 0.178 -.0509348 .2751938
smsa | .1925807 .0803478 2.40 0.017 .0351019 .3500596
bluecoll | -.203405 .0854986 -2.38 0.017 -.3709791 -.0358309
mining | -.1183683 .1976441 -0.60 0.549 -.5057435 .269007
constr | -.0423947 .1082891 -0.39 0.695 -.2546375 .169848
transp | -.1799724 .1570001 -1.15 0.252 -.487687 .1277422
trade | -.0341793 .1023611 -0.33 0.738 -.2348034 .1664447
fire | .1143611 .1398161 0.82 0.413 -.1596734 .3883955
services | .1854033 .0987923 1.88 0.061 -.0082261 .3790327
pubadmin | .1089298 .2965867 0.37 0.713 -.4723694 .690229
year85 | .2172389 .0890506 2.44 0.015 .0427028 .3917749
year87 | .3564181 .095298 3.74 0.000 .1696374 .5431988
year89 | .4690752 .1114266 4.21 0.000 .250683 .6874674
midatl | .026766 .1471298 0.18 0.856 -.2616031 .3151351
encen | .0043808 .15089 0.03 0.977 -.2913581 .3001198
wncen | .1735986 .1614007 1.08 0.282 -.142741 .4899382
southatl | .2647448 .1188746 2.23 0.026 .031755 .4977347
escen | .3560917 .1938142 1.84 0.066 -.0237772 .7359606
wscen | .3393956 .1442438 2.35 0.019 .0566829 .6221082
mountain | .0076507 .1545162 0.05 0.961 -.2951954 .3104969
pacific | .0778885 .2400495 0.32 0.746 -.3925999 .5483769
_cons | -4.09733 .8802997 -4.65 0.000 -5.822686 -2.371975
-------------+---------------------------------------------------------------
gamma | .002658 .0067759 0.39 0.695 -.0106225 .0159386
-----------------------------------------------------------------------------
. estimates store bgompertz
. 
. * Cox regression
. stcox $xlist, nohr robust

         failure _d:  censor1 == 1
   analysis time _t:  spell

Iteration 0:   log pseudo-likelihood = -7981.9304
Iteration 1:   log pseudo-likelihood = -7731.2822
Iteration 2:   log pseudo-likelihood = -7717.3198
Iteration 3:   log pseudo-likelihood = -7717.2334
Iteration 4:   log pseudo-likelihood = -7717.2334
Refining estimates:
Iteration 0:   log pseudo-likelihood = -7717.2334

Cox regression -- Breslow method for ties

No. of subjects =         3343               Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                             Wald chi2(40)   =    540.98
Log pseudo-likelihood = -7717.2334           Prob > chi2     =    0.0000

-----------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
RR | .5222796 .5711698 0.91 0.361 -.5971926 1.641752
DR | -.752507 .72175 -1.04 0.297 -2.167111 .6620971
UI | -1.317719 .2372893 -5.55 0.000 -1.782798 -.8526409
RRUI | .8822462 .582115 1.52 0.130 -.2586783 2.023171
DRUI | -.0951357 .977774 -0.10 0.922 -2.011538 1.821266
LOGWAGE | .3352639 .1106483 3.03 0.002 .1183972 .5521306
tenure | .0008278 .0061286 0.14 0.893 -.0111841 .0128396
slack | -.247863 .0721173 -3.44 0.001 -.3892103 -.1065158
abolpos | -.1511638 .0905035 -1.67 0.095 -.3285475 .0262198
explose | .1865068 .0615742 3.03 0.002 .0658236 .30719
stateur | -.0590475 .022085 -2.67 0.008 -.1023334 -.0157616
houshead | .3601866 .0794827 4.53 0.000 .2044035 .5159698
married | .358819 .0746355 4.81 0.000 .2125362 .5051019
female | .1002758 .0813277 1.23 0.218 -.0591236 .2596753
child | -.0396054 .0755365 -0.52 0.600 -.1876542 .1084435
ychild | -.1276638 .0967856 -1.32 0.187 -.3173602 .0620325
nonwhite | -.6394475 .1151332 -5.55 0.000 -.8651043 -.4137906
age | -.0204623 .0037593 -5.44 0.000 -.0278305 -.0130942
schlt12 | -.1220585 .0920073 -1.33 0.185 -.3023895 .0582726
schgt12 | .1104817 .0783542 1.41 0.159 -.0430897 .2640531
smsa | .1864841 .0766075 2.43 0.015 .0363361 .3366321
bluecoll | -.2108023 .080867 -2.61 0.009 -.3692986 -.052306
mining | -.1238251 .1906352 -0.65 0.516 -.4974632 .249813
constr | -.054455 .1029488 -0.53 0.597 -.256231 .1473209
transp | -.1551657 .1466515 -1.06 0.290 -.4425973 .1322659
trade | -.0383252 .0968106 -0.40 0.692 -.2280706 .1514201
fire | .1097585 .1300779 0.84 0.399 -.1451895 .3647065
services | .1666262 .0939507 1.77 0.076 -.0175138 .3507662
pubadmin | .1022002 .2829817 0.36 0.718 -.4524336 .6568341
year85 | .204162 .084908 2.40 0.016 .0377454 .3705786
year87 | .3384229 .0899115 3.76 0.000 .1621997 .5146462
year89 | .4486559 .104937 4.28 0.000 .2429832 .6543286
midatl | .0342238 .140515 0.24 0.808 -.2411805 .3096282
encen | .0174597 .1438862 0.12 0.903 -.2645521 .2994716
wncen | .1650967 .1532559 1.08 0.281 -.1352795 .4654728
southatl | .2518023 .1127138 2.23 0.025 .0308874 .4727172
escen | .3450422 .1839818 1.88 0.061 -.0155554 .7056398
wscen | .3316752 .1359801 2.44 0.015 .0651591 .5981914
mountain | .009484 .1468626 0.06 0.949 -.2783613 .2973293
pacific | .0720292 .2263339 0.32 0.750 -.3715771 .5156355
-----------------------------------------------------------------------------
. estimates store bcox
. 
. * Display Results for Table 17.8 (page 607)
. estimates table bexponential bweibull bgompertz, t stats(N ll) b(%8.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)

----------------------------------------------
    Variable | bexpon~l   bweibull   bgompe~z
-------------+--------------------------------
          RR |    0.472      0.448      0.472
             |     0.79       0.70       0.78
          DR |   -0.576     -0.427     -0.563
             |    -0.75      -0.53      -0.74
          UI |   -1.425     -1.496     -1.428
             |    -5.71      -5.67      -5.69
        RRUI |    0.966      1.015      0.969
             |     1.58       1.57       1.58
        DRUI |   -0.199     -0.299     -0.211
             |    -0.20      -0.28      -0.21
     LOGWAGE |    0.351      0.366      0.352
             |     3.03       2.99       3.03
       _cons |   -4.079     -4.358     -4.097
             |    -4.65      -4.74      -4.65
-------------+--------------------------------
           N | 3343.000   3343.000   3343.000
          ll | -2.7e+03   -2.7e+03   -2.7e+03
----------------------------------------------
                                  legend: b/t

. estimates table bcox, t stats(N ll) b(%8.3f) keep(RR DR UI RRUI DRUI LOGWAGE)

------------------------
    Variable |     bcox
-------------+----------
          RR |    0.522
             |     0.91
          DR |   -0.753
             |    -1.04
          UI |   -1.318
             |    -5.55
        RRUI |    0.882
             |     1.52
        DRUI |   -0.095
             |    -0.10
     LOGWAGE |    0.335
             |     3.03
-------------+----------
           N | 3343.000
          ll | -7.7e+03
------------------------
             legend: b/t

. 
. * (5) VARIOUS PARAMETRIC MODELS: HAZARD RATIOS (Table 17.9, page 608)
. 
. * streg default is to report hazard ratios rather than coefficients
. * streg with nohr option reports coefficients
. 
. * Exponential regression
. streg $xlist, robust dist(exponential)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Iteration 0:   log pseudo-likelihood = -3012.4909
Iteration 1:   log pseudo-likelihood = -2810.3791
Iteration 2:   log pseudo-likelihood = -2701.8024
Iteration 3:   log pseudo-likelihood = -2700.6911
Iteration 4:   log pseudo-likelihood = -2700.6903
Iteration 5:   log pseudo-likelihood = -2700.6903

Exponential regression -- log relative-hazard form

No. of subjects =         3343               Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                             Wald chi2(40)   =    565.24
Log pseudo-likelihood = -2700.6903           Prob > chi2     =    0.0000

-----------------------------------------------------------------------------
| Robust
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
RR | 1.603235 .9628283 0.79 0.432 .494089 5.202226
DR | .5623451 .4287594 -0.75 0.450 .1261843 2.506112
UI | .2406141 .0600072 -5.71 0.000 .1475837 .3922867
RRUI | 2.626338 1.606901 1.58 0.115 .7916819 8.712654
DRUI | .8194978 .8351649 -0.20 0.845 .1111919 6.039799
LOGWAGE | 1.420204 .1641727 3.03 0.002 1.132279 1.781344
tenure | .9998539 .0064627 -0.02 0.982 .9872671 1.012601
slack | .7715401 .0585879 -3.42 0.001 .6648465 .8953557
abolpos | .8563384 .0816353 -1.63 0.104 .7103949 1.032264
explose | 1.219521 .0790681 3.06 0.002 1.073992 1.384769
stateur | .937418 .0215515 -2.81 0.005 .8961153 .9806243
houshead | 1.464071 .1224844 4.56 0.000 1.242655 1.724939
married | 1.447086 .1137619 4.70 0.000 1.240445 1.68815
female | 1.123453 .0958289 1.36 0.172 .9504921 1.327887
child | .9672475 .0768553 -0.42 0.675 .8277574 1.130244
ychild | .8650463 .0884753 -1.42 0.156 .7079133 1.057058
nonwhite | .5121147 .0608532 -5.63 0.000 .4057153 .6464176
age | .9781599 .0038399 -5.63 0.000 .9706627 .9857151
schlt12 | .8841386 .0854168 -1.27 0.202 .7316201 1.068452
schgt12 | 1.117886 .0927231 1.34 0.179 .9501554 1.315226
smsa | 1.211948 .0969443 2.40 0.016 1.036087 1.41766
bluecoll | .8159748 .0694631 -2.39 0.017 .6905813 .9641369
mining | .8864046 .1749386 -0.61 0.541 .6020616 1.305038
constr | .9562365 .1034188 -0.41 0.679 .7735819 1.182019
transp | .8363823 .1305041 -1.15 0.252 .6160109 1.135589
trade | .966073 .0984575 -0.34 0.735 .7911514 1.179669
fire | 1.118574 .1551145 0.81 0.419 .8523684 1.46792
services | 1.202016 .1182677 1.87 0.061 .9911962 1.457676
pubadmin | 1.11523 .3294624 0.37 0.712 .625031 1.989882
year85 | 1.239572 .1101563 2.42 0.016 1.041426 1.475418
year87 | 1.424921 .1351536 3.73 0.000 1.18319 1.716039
year89 | 1.595332 .1761812 4.23 0.000 1.284838 1.980861
midatl | 1.026763 .1504872 0.18 0.857 .7703962 1.368442
encen | 1.004401 .1509427 0.03 0.977 .7481481 1.348425
wncen | 1.18819 .191024 1.07 0.283 .8670399 1.628293
southatl | 1.301973 .1541179 2.23 0.026 1.032388 1.641953
escen | 1.424955 .2752586 1.83 0.067 .9758305 2.080787
wscen | 1.402967 .2010884 2.36 0.018 1.059362 1.858023
mountain | 1.00639 .1548654 0.04 0.967 .7443573 1.360664
pacific | 1.080064 .2585138 0.32 0.748 .6756378 1.726573
-----------------------------------------------------------------------------
.
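Because `streg` without `nohr` reports exponentiated coefficients, the hazard-ratio table above can be recovered from the coefficient table printed earlier for the same exponential model: the hazard ratio is exp(b), its standard error is the delta-method value exp(b)*se(b), the confidence limits are the exponentiated coefficient limits, and z and P>|z| are unchanged (UI has z = -5.71 in both). The Weibull output applies the same mapping to its ancillary parameter, turning `/ln_p` into `p`. The Python sketch that follows checks this on three rows; it is not part of the original program, `exp_transform` and `ROWS` are hypothetical names, and the inputs are copied from the `nohr` output.

```python
import math


def exp_transform(b, se, lo, hi):
    """Map a log-scale estimate to the exponentiated scale Stata reports.

    Returns (exp(b), delta-method SE exp(b)*se, exp(lo), exp(hi)).
    The interval comes from exponentiating the log-scale limits,
    not from adding +/- 1.96 times the transformed SE.
    """
    est = math.exp(b)
    return est, est * se, math.exp(lo), math.exp(hi)


# (estimate, robust SE, lower 95%, upper 95%) copied from the nohr output.
ROWS = {
    "UI (exponential)": (-1.424561, 0.2493917, -1.91336, -0.9357622),
    "RR (exponential)": (0.4720235, 0.6005534, -0.7050396, 1.649087),
    "/ln_p (Weibull)": (0.1215314, 0.0194374, 0.0834348, 0.1596281),
}

for name, row in ROWS.items():
    est, se, lo, hi = exp_transform(*row)
    print(f"{name:18s} {est:.6f} {se:.6f} [{lo:.6f}, {hi:.6f}]")

# For the Weibull shape parameter Stata also reports 1/p.
p = exp_transform(*ROWS["/ln_p (Weibull)"])[0]
print(f"1/p = {1 / p:.6f}")
```

The printed values match the `Haz. Ratio`, `Std. Err.` and interval columns for UI and RR in the table above, and the `p` and `1/p` rows of the Weibull output, up to rounding in the final digits.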
. * Weibull regression
. streg $xlist, robust dist(weibull)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Fitting constant-only model:

Iteration 0:   log pseudo-likelihood = -3012.4909
Iteration 1:   log pseudo-likelihood = -3012.3543
Iteration 2:   log pseudo-likelihood = -3012.3543

Fitting full model:

Iteration 0:   log pseudo-likelihood = -3012.3543
Iteration 1:   log pseudo-likelihood = -2799.9064
Iteration 2:   log pseudo-likelihood = -2688.7377
Iteration 3:   log pseudo-likelihood = -2687.6004
Iteration 4:   log pseudo-likelihood = -2687.5995
Iteration 5:   log pseudo-likelihood = -2687.5995

Weibull regression -- log relative-hazard form

No. of subjects =         3343               Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                             Wald chi2(40)   =    501.65
Log pseudo-likelihood = -2687.5995           Prob > chi2     =    0.0000

-----------------------------------------------------------------------------| Robust _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------RR | 1.56536 .998996 0.70 0.483 .4481117 5.46817 DR | .6525166 .527689 -0.53 0.598 .1337292 3.183881 UI | .2240097 .0591314 -5.67 0.000 .1335294 .3757999 RRUI | 2.759988 1.781741 1.57 0.116 .7787618 9.781599 DRUI | .7416768 .7901705 -0.28 0.779 .0919091 5.985096 LOGWAGE | 1.441271 .176008 2.99 0.003 1.13448 1.831025 tenure | .9988879 .006864 -0.16 0.871 .9855249 1.012432 slack | .7670407 .0616098 -3.30 0.001 .6553129 .8978176 abolpos | .8517837 .0862808 -1.58 0.113 .6984053 1.038846 explose | 1.230608 .0842616 3.03 0.002 1.076061 1.407352 stateur | .9315788 .0225551 -2.93 0.003 .8884041 .9768517 houshead | 1.488342 .1320445 4.48 0.000 1.250791 1.771008 married | 1.460247 .1212469 4.56 0.000 1.240937 1.718316 female | 1.134376 .101752 1.41 0.160 .9514927 1.352411 child | .966883 .0812139 -0.40 0.688 .8201188 1.139911 ychild | .8510311 .0927173 -1.48 0.139 .6874 1.053613 nonwhite | .4953204 .0615485 -5.65 0.000 .388254 .6319119 age | .9766936 .0040945 -5.63 0.000 .9687014 .9847517 schlt12 | .8845503 .0904684 -1.20 0.230 .7238772 1.080887 schgt12 | 1.123316 .0989295 1.32 0.187 .9452293 1.334955 smsa | 1.22135 .1027313 2.38 0.017 1.035722 1.440247 bluecoll | .8191464 .0736702 -2.22 0.027 .6867654 .9770452 mining | .9034201 .1839945 -0.50 0.618 .6060805 1.346633 constr | .9749455 .1107157 -0.22 0.823 .7803997 1.21799 transp | .820245 .1371565 -1.19 0.236 .5910316 1.138352 trade | .9693436 .1046408 -0.29 0.773 .7844954 1.197747 fire | 1.134526 .1693311 0.85 0.398 .8467799 1.520053 services | 1.225277 .1272996 1.96 0.051 .9995379 1.501999 pubadmin | 1.118259 .3452483 0.36 0.717 .6105827 2.048048 year85 | 1.268072 .1184214 2.54 0.011 1.055972 1.522772 year87 | 1.460443 .147765 3.74 0.000 1.197737 1.780769 year89 | 1.63563 .1930814 4.17 0.000 1.297786 2.061422 
378

      midatl |   1.024956   .1580625     0.16   0.873      .757597    1.386668
       encen |   .9985899   .1576839    -0.01   0.993     .7327855     1.36081
       wncen |    1.20254   .2037638     1.09   0.276     .8627169     1.67622
    southatl |   1.315343   .1644812     2.19   0.028     1.029432    1.680661
       escen |   1.444469    .292472     1.82   0.069     .9713137    2.148113
       wscen |   1.410579   .2155089     2.25   0.024     1.045564    1.903025
    mountain |   1.016091   .1646258     0.10   0.922     .7396425    1.395864
     pacific |   1.088666   .2726104     0.34   0.734     .6664189    1.778452
-------------+----------------------------------------------------------------
       /ln_p |   .1215314   .0194374     6.25   0.000     .0834348    .1596281
-------------+----------------------------------------------------------------
           p |   1.129225   .0219492                      1.087014    1.173075
         1/p |   .8855632   .0172131                      .8524608    .9199511
------------------------------------------------------------------------------

.
. * Gompertz regression
. streg $xlist, robust dist(gompertz)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Fitting constant-only model:

Iteration 0:   log pseudo-likelihood = -3012.4909
Iteration 1:   log pseudo-likelihood = -3002.0916
Iteration 2:   log pseudo-likelihood =  -3002.026
Iteration 3:   log pseudo-likelihood =  -3002.026

Fitting full model:

Iteration 0:   log pseudo-likelihood =  -3002.026
Iteration 1:   log pseudo-likelihood = -2796.0001
Iteration 2:   log pseudo-likelihood = -2701.6693
Iteration 3:   log pseudo-likelihood = -2700.6057
Iteration 4:   log pseudo-likelihood =  -2700.605
Iteration 5:   log pseudo-likelihood =  -2700.605

Gompertz regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    529.75
Log pseudo-likelihood =   -2700.605                Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   1.603847   .9677311     0.78   0.434     .4915456    5.233135
          DR |   .5696179   .4355373    -0.74   0.462     .1272752    2.549315
          UI |    .239703   .0601259    -5.69   0.000     .1466096    .3919084
        RRUI |   2.635153    1.61916     1.58   0.115     .7902931    8.786655
        DRUI |    .809572   .8266639    -0.21   0.836     .1094166    5.990014
     LOGWAGE |    1.42258    .165403     3.03   0.002     1.132681    1.786676
      tenure |   .9997767   .0064987    -0.03   0.973     .9871202    1.012595
       slack |   .7715195   .0588538    -3.40   0.001     .6643773    .8959403
     abolpos |    .856193   .0820234    -1.62   0.105     .7096209    1.033039
     explose |   1.220339    .079429     3.06   0.002     1.074182    1.386383
     stateur |   .9368388   .0217014    -2.82   0.005      .895256    .9803531
    houshead |   1.465625   .1233575     4.54   0.000     1.242738    1.728487
     married |   1.447755   .1142433     4.69   0.000     1.240298    1.689912
      female |    1.12423   .0962607     1.37   0.171     .9505442    1.329653
       child |   .9674007   .0772224    -0.42   0.678     .8272934    1.131236
      ychild |   .8635879   .0888493    -1.43   0.154     .7058811    1.056529
    nonwhite |   .5106596   .0611307    -5.61   0.000     .4038637    .6456961
         age |   .9780275   .0038913    -5.58   0.000     .9704303    .9856841
     schlt12 |   .8843861   .0857988    -1.27   0.205     .7312444      1.0696
     schgt12 |   1.118658   .0930697     1.35   0.178     .9503406    1.316786
        smsa |   1.212374   .0974117     2.40   0.017     1.035725    1.419152
    bluecoll |   .8159478   .0697624    -2.38   0.017     .6900584    .9648035
      mining |   .8883688   .1755808    -0.60   0.549      .603057    1.308664
      constr |   .9584913   .1037942    -0.39   0.695     .7751974    1.185125
      transp |   .8352933   .1311411    -1.15   0.252      .614045     1.13626
       trade |   .9663982   .0989216    -0.33   0.738     .7907263    1.181098
        fire |   1.121157   .1567557     0.82   0.413     .8524222    1.474613
    services |   1.203704   .1189167     1.88   0.061     .9918076    1.460871
    pubadmin |   1.115084   .3307191     0.37   0.713     .6235232    1.994172
      year85 |   1.242641    .110658     2.44   0.015     1.043628    1.479605
      year87 |   1.428205   .1361051     3.74   0.000     1.184875    1.721505
      year89 |   1.598515   .1781172     4.21   0.000     1.284903    1.988673
      midatl |   1.027127   .1511211     0.18   0.856     .7698165    1.370444
       encen |    1.00439   .1515525     0.03   0.977      .747248     1.35002
       wncen |   1.189578   .1919987     1.08   0.282     .8669786    1.632215
    southatl |   1.303098   .1549053     2.23   0.026     1.032265    1.644991
       escen |   1.427739    .276716     1.84   0.066     .9765033    2.087486
       wscen |   1.404099   .2025325     2.35   0.019      1.05832    1.862851
    mountain |    1.00768   .1557029     0.05   0.961     .7443861    1.364103
     pacific |   1.081002   .2594941     0.32   0.746     .6752989    1.730442
-------------+----------------------------------------------------------------
       gamma |    .002658   .0067759     0.39   0.695    -.0106225    .0159386
------------------------------------------------------------------------------

.
. * Cox regression
. stcox $xlist, robust

         failure _d:  censor1 == 1
   analysis time _t:  spell

Iteration 0:   log pseudo-likelihood = -7981.9304
Iteration 1:   log pseudo-likelihood = -7731.2822
Iteration 2:   log pseudo-likelihood = -7717.3198
Iteration 3:   log pseudo-likelihood = -7717.2334
Iteration 4:   log pseudo-likelihood = -7717.2334
Refining estimates:
Iteration 0:   log pseudo-likelihood = -7717.2334

Cox regression -- Breslow method for ties

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    540.98
Log pseudo-likelihood =  -7717.2334                Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   1.685866    .962916     0.91   0.361     .5503545    5.164209
          DR |   .4711838   .3400769    -1.04   0.297     .1145079    1.938854
          UI |   .2677452   .0635331    -5.55   0.000      .168167    .4262877
        RRUI |   2.416321   1.406577     1.52   0.130     .7720714    7.562264
        DRUI |   .9092495   .8890406    -0.10   0.922     .1337828    6.179678
     LOGWAGE |   1.398309   .1547206     3.03   0.002     1.125691     1.73695
      tenure |   1.000828   .0061337     0.14   0.893     .9888782    1.012922
       slack |   .7804668   .0562851    -3.44   0.001     .6775918    .8989608
     abolpos |   .8597068   .0778065    -1.67   0.095     .7199688    1.026567
     explose |   1.205033   .0741989     3.03   0.002     1.068038    1.359599
     stateur |    .942662   .0208187    -2.67   0.008     .9027285    .9843619
    houshead |   1.433597   .1139461     4.53   0.000     1.226793    1.675262
     married |   1.431638    .106851     4.81   0.000     1.236811    1.657154
      female |   1.105476   .0899059     1.23   0.218     .9425903    1.296509
       child |   .9611687   .0726033    -0.52   0.600     .8289013    1.114542
      ychild |   .8801492   .0851858    -1.32   0.187     .7280685    1.063997
    nonwhite |   .5275839   .0607424    -5.55   0.000     .4210076    .6611394
         age |   .9797456   .0036832    -5.44   0.000     .9725532    .9869912
     schlt12 |   .8850966   .0814354    -1.33   0.185     .7390501    1.060004
     schgt12 |   1.116816   .0875072     1.41   0.159     .9578255    1.302197
        smsa |   1.205005   .0923125     2.43   0.015     1.037004    1.400224
    bluecoll |   .8099341   .0654969    -2.61   0.009     .6912189    .9490384
      mining |   .8835344   .1684327    -0.65   0.516     .6080713    1.283785
      constr |   .9470011   .0974926    -0.53   0.597     .7739632    1.158726
      transp |   .8562733   .1255737    -1.06   0.290     .6423659    1.141412
       trade |   .9623999   .0931706    -0.40   0.692      .796068    1.163485
        fire |   1.116009   .1451681     0.84   0.399     .8648584    1.440091
    services |   1.181313   .1109851     1.77   0.076     .9826387    1.420155
    pubadmin |   1.107605    .313432     0.36   0.718     .6360783    1.928677
      year85 |   1.226497   .1041394     2.40   0.016     1.038467    1.448572
      year87 |   1.402734   .1261218     3.76   0.000     1.176095    1.673046
      year89 |   1.566206   .1643529     4.28   0.000     1.275047     1.92385
      midatl |   1.034816   .1454072     0.24   0.808     .7856998    1.362918
       encen |   1.017613   .1464205     0.12   0.903     .7675496    1.349146
       wncen |   1.179507   .1807665     1.08   0.281     .8734718    1.592767
    southatl |   1.286342   .1449884     2.23   0.025     1.031369    1.604348
       escen |    1.41205   .2597913     1.88   0.061      .984565    2.025142
       wscen |     1.3933   .1894611     2.44   0.015     1.067329    1.818826
    mountain |   1.009529    .148262     0.06   0.949     .7570232    1.346259
     pacific |   1.074687    .243238     0.32   0.750     .6896459    1.674702
------------------------------------------------------------------------------

.
. * Display results for Table 17.9 page 608
. * Not possible here as estimates table gives coefficients not hazard ratios
. * Instead need to use output for each model
. * Not sure why t-statistics differ somewhat from those in Table 17.9
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section4\mma17p4duration.txt
  log type:  text
 closed on:  19 May 2005, 15:25:17
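As the comment above notes, `estimates table` shows coefficients rather than hazard ratios. In the log relative-hazard form the two are linked by HR = exp(b), confidence limits are exponentiated endpoint by endpoint, and the Weibull shape is reported as /ln_p with p = exp(/ln_p). The Python sketch below (an illustration added here, not part of the authors' Stata programs) checks this using coefficients copied from the `nohr` Weibull fit in the mma18p1 log that follows. Exponentiating them reproduces the hazard ratios in the Weibull table above, which suggests the two logs estimate the same Weibull specification.

```python
import math

# Weibull coefficients (log relative-hazard metric) copied from the
# "streg $xlist, nolog nohr dist(weibull) robust" output in this document.
weibull_coef = {"UI": -1.496066, "LOGWAGE": 0.3655253, "age": -0.0235823}
ui_ci = (-2.013434, -0.9786984)   # 95% confidence interval for the UI coefficient
ln_p = 0.1215314                  # Weibull ancillary parameter, reported as /ln_p


def hazard_ratio(b):
    """Hazard ratio implied by a coefficient in log relative-hazard form."""
    return math.exp(b)


for name, b in weibull_coef.items():
    print(f"{name:8s} coef = {b: .7f}   haz. ratio = {hazard_ratio(b):.7f}")

# Confidence limits are transformed endpoint by endpoint (not via the std. err.).
lo, hi = (hazard_ratio(x) for x in ui_ci)
print(f"UI hazard ratio 95% CI: [{lo:.7f}, {hi:.7f}]")

# Weibull shape parameter and its reciprocal.
p = math.exp(ln_p)
print(f"p = {p:.6f}   1/p = {1 / p:.7f}")
```

Only the displayed precision is expected to match, since both logs round to about seven significant digits.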


------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma18p1heterogeneity.txt
  log type:  text
 opened on:  19 May 2005, 17:58:22

.
. ********** OVERVIEW OF MMA18P1HETEROGENEITY.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 18.8 Pages 632-6
. * Unobserved Heterogeneity with Duration data Example
. * (1) Exponential with and without heterogeneity
. *     Residuals Plots: Figures 18.2 (exp.wmf) and 18.3 (exp_gamma.wmf)
. *     Tabulate Model Estimates: Table 18.1
. * (2) Weibull with and without heterogeneity: Generalized Residuals Plots
. *     Residuals Plots: Figures 18.4 (Weibul16.wmf) and 18.5 (Weibul16_IG.wmf)
. *     Tabulate model Estimates: Table 18.2
.
. * To run this program you need data file
. *    ema1996.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
. set matsize 100
.
. ********** DATA DESCRIPTION **********
.
. * The data is from
. * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness,
. *                      and Part-time Work," Econometrica, 64, 647-682.
.
. * There are 3343 observations from the CPS Displaced Worker Surveys
. * of 1986, 1988, 1990 and 1992 on 33 variables including
. *   spell = length of spell in number of two-week intervals
. *   CENSOR1 = 1 if re-employed at full-time job
.
. * See program mma17p4duration.do for further description of the data set
.
. ********** READ DATA **********

.
. use ema1996.dta
(Sample for 1996 EMA paper: part-time= worked part-time last week)
.
. ********** CREATE ADDITIONAL VARIABLES **********
.
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI
. gen LOGWAGE = logwage
. sum

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+---------------------------------------------------------
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
      tenure |      3343    4.114867    5.862322          0         40
     disrate |      3343    .1094376    .0735274       .002       1.02
-------------+---------------------------------------------------------
       slack |      3343    .4884834    .4999421          0          1
     abolpos |      3343    .1456775    .3528354          0          1
     explose |      3343    .5025426    .5000683          0          1
     stateur |      3343      6.5516    1.803825        2.5         13
    houshead |      3343    .6120251    .4873617          0          1
-------------+---------------------------------------------------------
     married |      3343    .5860006    .4926221          0          1
      female |      3343    .3478911    .4763725          0          1
       child |      3343    .4501944    .4975876          0          1
      ychild |      3343    .1956327    .3967463          0          1
    nonwhite |      3343    .1390966    .3460991          0          1
-------------+---------------------------------------------------------
         age |      3343    35.44331     10.6402         20         61
     schlt12 |      3343    .2811846    .4496446          0          1
     schgt12 |      3343    .3356267    .4722797          0          1
        smsa |      3343    .7241998    .4469835          0          1
    bluecoll |      3343    .6036494     .489212          0          1
-------------+---------------------------------------------------------
      mining |      3343     .029315    .1687132          0          1
      constr |      3343    .1480706    .3552231          0          1
      transp |      3343    .0646126    .2458778          0          1
       trade |      3343    .1848639    .3882452          0          1
        fire |      3343    .0514508    .2209484          0          1
-------------+---------------------------------------------------------
    services |      3343    .1699073    .3756075          0          1
    pubadmin |      3343    .0095722     .097383          0          1
      year85 |      3343    .2677236     .442839          0          1
      year87 |      3343    .2174693    .4125862          0          1
      year89 |      3343    .1998205    .3999251          0          1
-------------+---------------------------------------------------------
      midatl |      3343    .1088842    .3115405          0          1
       encen |      3343    .1429853    .3501103          0          1
       wncen |      3343    .0643135    .2453472          0          1
    southatl |      3343    .2375112    .4256217          0          1
       escen |      3343    .0532456    .2245564          0          1
-------------+---------------------------------------------------------
       wscen |      3343    .1441819    .3513266          0          1
    mountain |      3343    .1079868    .3104102          0          1
     pacific |      3343    .0260245     .159232          0          1
          RR |      3343    .4544717    .1137918       .066      2.059
          DR |      3343    .1094376    .0735274       .002       1.02
-------------+---------------------------------------------------------
          UI |      3343    .5527969    .4972791          0          1
        RRUI |      3343    .2478687    .2380667          0      2.059
        DRUI |      3343    .0602776    .0754261          0       .824
     LOGWAGE |      3343    5.692994    .5356591    2.70805   7.600402

.
. ********* ANALYSIS: UNEMPLOYMENT DURATION **********
.
. * Stata st curves require defining the dependent variable
. * and the censoring variable if there is one
. stset spell, fail(censor1=1)

     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
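The totals printed by `stset` (and by `stdes` just below) fit together arithmetically, and they also show why censoring matters. The Python sketch below uses only numbers printed in this log: mean time at risk is 20887/3343, the failure share is 1073/3343 (the mean of censor1), and, with no regressors, the exponential maximum likelihood estimate of a constant hazard under right censoring is failures divided by total time at risk. The regressor-free calculation is an illustration added here; the programs themselves do not run it.

```python
# Totals copied from the stset/stdes output in this log.
n_subjects = 3343
n_failures = 1073      # spells ending in re-employment at a full-time job (censor1 == 1)
time_at_risk = 20887   # sum of spell lengths, in two-week intervals

mean_spell = time_at_risk / n_subjects    # stdes "mean" of time at risk
share_failed = n_failures / n_subjects    # summarize mean of censor1
crude_hazard = n_failures / time_at_risk  # constant-hazard (exponential) MLE, no regressors

print(f"mean spell   = {mean_spell:.6f}")
print(f"share failed = {share_failed:.7f}")
print(f"crude hazard = {crude_hazard:.5f} per two-week interval")
print(f"1 / hazard   = {1 / crude_hazard:.2f} two-week intervals")
```

Because roughly two-thirds of the spells are censored, the mean completed spell implied by a constant hazard, 1/hazard (about 19.5 two-week intervals), is far longer than the raw mean of spell (6.25), which treats censored spells as if they had ended.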

. stdes

         failure _d:  censor1 == 1
   analysis time _t:  spell

                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects             3343
no. of records              3343           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                   6.247981           1          5         28

subjects with gap              0
time on gap if gap             0
time at risk               20887    6.247981           1          5         28

failures                    1073    .3209692           0          0          1
------------------------------------------------------------------------------

.
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific
.
. * (1) EXPONENTIAL REGRESSION
.
. * Estimate exponential without heterogeneity
. streg $xlist, nolog nohr dist(exponential) robust

         failure _d:  censor1 == 1
   analysis time _t:  spell

Exponential regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =  -2700.6903                Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   .4720235   .6005534     0.79   0.432    -.7050396    1.649087
          DR |  -.5756396   .7624489    -0.75   0.450    -2.070012    .9187327
          UI |  -1.424561   .2493917    -5.71   0.000     -1.91336   -.9357622
        RRUI |   .9655904   .6118408     1.58   0.115    -.2335956    2.164776
        DRUI |  -.1990635   1.019118    -0.20   0.845    -2.196498    1.798371
     LOGWAGE |   .3508005    .115598     3.03   0.002     .1242327    .5773684
      tenure |  -.0001462   .0064637    -0.02   0.982    -.0128147    .0125224
       slack |  -.2593666   .0759363    -3.42   0.001    -.4081991   -.1105342
     abolpos |  -.1550897   .0953306    -1.63   0.104    -.3419342    .0317549
     explose |    .198458   .0648354     3.06   0.002      .071383    .3255331
     stateur |   -.064626   .0229903    -2.81   0.005    -.1096862   -.0195659
    houshead |   .3812208   .0836602     4.56   0.000     .2172499    .5451918
     married |    .369552   .0786145     4.70   0.000     .2154705    .5236335
      female |   .1164067   .0852986     1.36   0.172    -.0507754    .2835888
       child |  -.0333008   .0794577    -0.42   0.675    -.1890352    .1224335
      ychild |  -.1449722   .1022781    -1.42   0.156    -.3454336    .0554892
    nonwhite |  -.6692066   .1188272    -5.63   0.000    -.9021037   -.4363095
         age |  -.0220821   .0039256    -5.63   0.000    -.0297762   -.0143879
     schlt12 |  -.1231414   .0966102    -1.27   0.202    -.3124939     .066211
     schgt12 |   .1114395    .082945     1.34   0.179    -.0511297    .2740087
        smsa |   .1922291   .0799904     2.40   0.016     .0354508    .3490075
    bluecoll |  -.2033718    .085129    -2.39   0.017    -.3702215    -.036522
      mining |  -.1205818   .1973575    -0.61   0.541    -.5073955    .2662319
      constr |    -.04475   .1081519    -0.41   0.679    -.2567237    .1672238
      transp |  -.1786694    .156034    -1.15   0.252    -.4844906    .1271517
       trade |  -.0345159   .1019152    -0.34   0.735     -.234266    .1652341
        fire |   .1120549   .1386716     0.81   0.419    -.1597365    .3838462
    services |   .1840002   .0983911     1.87   0.061    -.0088428    .3768432
    pubadmin |   .1090606   .2954211     0.37   0.712    -.4699541    .6880752
      year85 |   .2147661   .0888664     2.42   0.016     .0405911     .388941
      year87 |   .3541162   .0948499     3.73   0.000     .1682139    .5400186
      year89 |    .467082   .1104355     4.23   0.000     .2506325    .6835316
      midatl |   .0264112   .1465647     0.18   0.857    -.2608503    .3136727
       encen |   .0043916   .1502813     0.03   0.977    -.2901544    .2989375
       wncen |   .1724311   .1607689     1.07   0.283    -.1426703    .4875324
    southatl |   .2638807   .1183726     2.23   0.026     .0318747    .4958867
       escen |     .35414     .19317     1.83   0.067    -.0244664    .7327463
       wscen |   .3385896   .1433308     2.36   0.018     .0576664    .6195128
    mountain |   .0063693   .1538821     0.04   0.967    -.2952341    .3079727
     pacific |   .0770202   .2393505     0.32   0.748    -.3920982    .5461385
       _cons |  -4.079107   .8767097    -4.65   0.000    -5.797426   -2.360788
------------------------------------------------------------------------------
. estimates store bexp
.
. * Figure 18.2 (p.633) - Generalized (Cox-Snell) Residuals for Exponential
. predict resid, csnell
. stset resid, fail(censor1)

     failure event:  censor1 != 0 & censor1 < .
obs. time interval:  (0, resid]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  5.218098

. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Exponential Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export exp.wmf, replace
(file c:\Imbook\bwebpage\Section4\exp.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
.
. * Estimate exponential with gamma heterogeneity
. stset spell, fail(censor1)

     failure event:  censor1 != 0 & censor1 < .
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr dist(exponential) frailty(gamma) robust

         failure _d:  censor1
   analysis time _t:  spell

Exponential regression -- log relative-hazard form
                          Gamma frailty

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    576.86
Log pseudo-likelihood =  -2695.3518                Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   .5005828   .6187508     0.81   0.419    -.7121465    1.713312
          DR |  -.8824469   .7894395    -1.12   0.264     -2.42972     .664826
          UI |  -1.584537   .2622252    -6.04   0.000    -2.098489   -1.070586
        RRUI |   1.091168   .6327026     1.72   0.085    -.1489067    2.331242
        DRUI |   .0574048   1.047123     0.05   0.956    -1.994919    2.109729
     LOGWAGE |   .3792805   .1191278     3.18   0.001     .1457944    .6127666
      tenure |   .0007938   .0065903     0.12   0.904     -.012123    .0137106
       slack |  -.2862928   .0770348    -3.72   0.000    -.4372782   -.1353074
     abolpos |  -.1842749   .0977213    -1.89   0.059    -.3758051    .0072552
     explose |   .2151452   .0663117     3.24   0.001     .0851767    .3451137
     stateur |  -.0650451    .023552    -2.76   0.006    -.1112061   -.0188841
    houshead |   .3960399   .0847153     4.67   0.000     .2300009    .5620789
     married |   .3961194   .0806744     4.91   0.000     .2380005    .5542384
      female |   .1102564   .0869256     1.27   0.205    -.0601147    .2806275
       child |  -.0464355   .0815869    -0.57   0.569     -.206343     .113472
      ychild |  -.1213622    .103309    -1.17   0.240    -.3238441    .0811196
    nonwhite |  -.6909793   .1217489    -5.68   0.000    -.9296027   -.4523559
         age |  -.0225342   .0040184    -5.61   0.000    -.0304101   -.0146582
     schlt12 |  -.1513782   .0968026    -1.56   0.118    -.3411079    .0383515
     schgt12 |   .1011742   .0834622     1.21   0.225    -.0624088    .2647572
        smsa |    .212363    .081774     2.60   0.009      .052089     .372637
    bluecoll |   -.220439   .0862751    -2.56   0.011    -.3895351   -.0513429
      mining |  -.1721823   .2051663    -0.84   0.401    -.5743008    .2299362
      constr |  -.0897602     .11034    -0.81   0.416    -.3060225    .1265022
      transp |  -.1572488   .1563607    -1.01   0.315    -.4637102    .1492126
       trade |  -.0451107   .1034986    -0.44   0.663    -.2479642    .1577428
        fire |   .0881685   .1386688     0.64   0.525    -.1836175    .3599544
    services |   .1682835   .1005405     1.67   0.094    -.0287723    .3653393
    pubadmin |   .0961407   .3092103     0.31   0.756    -.5099004    .7021817
      year85 |   .1940199   .0906564     2.14   0.032     .0163366    .3717031
      year87 |   .3564373   .0959014     3.72   0.000     .1684741    .5444005
      year89 |   .4924007   .1101907     4.47   0.000     .2764308    .7083705
      midatl |   .0156736   .1488094     0.11   0.916    -.2759874    .3073347
       encen |   .0089345   .1538505     0.06   0.954    -.2926069    .3104759
       wncen |   .1742124   .1634726     1.07   0.287    -.1461881    .4946129
    southatl |   .2676635   .1192515     2.24   0.025     .0339348    .5013922
       escen |   .3741169    .199389     1.88   0.061    -.0166783    .7649121
       wscen |    .361461   .1423856     2.54   0.011     .0823903    .6405316
    mountain |    -.00019   .1557385    -0.00   0.999    -.3054318    .3050519
     pacific |   .0800478   .2463547     0.32   0.745    -.4027986    .5628941
       _cons |  -4.095067   .9086039    -4.51   0.000    -5.875898   -2.314236
-------------+----------------------------------------------------------------
     /ln_the |  -1.462995     .31608    -4.63   0.000      -2.0825   -.8434894
-------------+----------------------------------------------------------------
       theta |   .2315418   .0731857                      .1246183    .4302067
------------------------------------------------------------------------------
. estimates store bexpgamma
.
. * Figure 18.3 (p.633) - Generalized (Cox-Snell) Residuals for Exponential-Gamma
. predict resid, csnell
(option unconditional assumed)
. stset resid, fail(censor1)

     failure event:  censor1 != 0 & censor1 < .
obs. time interval:  (0, resid]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  3.971096

. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Exponential-Gamma Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export exp_gamma.wmf, replace
(file c:\Imbook\bwebpage\Section4\exp_gamma.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
.
. /*
> * Following did not work, even with starting values provided
> * Results in book obtained on different computer with different Stata version
> * Estimate exponential with IG heterogeneity
> stset spell, fail(censor1=1)
> quietly streg $xlist, nolog nohr dist(exponential) robust
> matrix theta = 1.6
> matrix bstart = e(b),theta
> streg $xlist, nohr dist(exponential) frailty(invgauss) robust from(bstart)
> * estimates store bexpIG
> */
.
. * Table 18.1 (p.634) - Display Parameter Estimates
. * Note that exponential-IG missing
. estimates table bexp bexpgamma, t(%9.3f) stats(N ll) b(%9.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)

-----------------------------------------
    Variable |       bexp    bexpgamma
-------------+---------------------------
          RR |      0.472        0.501
             |      0.786        0.809
          DR |     -0.576       -0.882
             |     -0.755       -1.118
          UI |     -1.425       -1.585
             |     -5.712       -6.043
        RRUI |      0.966        1.091
             |      1.578        1.725
        DRUI |     -0.199        0.057
             |     -0.195        0.055
     LOGWAGE |      0.351        0.379
             |      3.035        3.184
       _cons |     -4.079       -4.095
             |     -4.653       -4.507
-------------+---------------------------
           N |   3343.000     3343.000
          ll |  -2700.690    -2695.352
-----------------------------------------
                              legend: b/t
.
. * (2) WEIBULL REGRESSION

.
. * Estimate Weibull without heterogeneity
. stset spell, fail(censor1=1)

     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr dist(weibull) robust

         failure _d:  censor1 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =  -2687.5995                Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   .4481156   .6381895     0.70   0.483    -.8027127    1.698944
          DR |  -.4269187   .8086983    -0.53   0.598    -2.011938    1.158101
          UI |  -1.496066   .2639679    -5.67   0.000    -2.013434   -.9786984
        RRUI |   1.015226   .6455611     1.57   0.116    -.2500501    2.280503
        DRUI |  -.2988417   1.065384    -0.28   0.779    -2.386956    1.789272
     LOGWAGE |   .3655253     .12212     2.99   0.003     .1261745    .6048761
      tenure |  -.0011127   .0068716    -0.16   0.871    -.0145809    .0123554
       slack |  -.2652154   .0803214    -3.30   0.001    -.4226424   -.1077883
     abolpos |  -.1604227   .1012942    -1.58   0.113    -.3589557    .0381103
     explose |   .2075085   .0684715     3.03   0.002     .0733068    .3417103
     stateur |  -.0708745   .0242117    -2.93   0.003    -.1183286   -.0234204
    houshead |   .3976626   .0887192     4.48   0.000     .2237762     .571549
     married |   .3786057   .0830317     4.56   0.000     .2158665     .541345
      female |   .1260829   .0896987     1.41   0.160    -.0497233     .301889
       child |  -.0336778   .0839956    -0.40   0.688    -.1983061    .1309505
      ychild |  -.1613066    .108947    -1.48   0.139    -.3748389    .0522256
    nonwhite |  -.7025504     .12426    -5.65   0.000    -.9460956   -.4590052
         age |  -.0235823   .0041922    -5.63   0.000    -.0317989   -.0153658
     schlt12 |  -.1226759   .1022762    -1.20   0.230    -.3231335    .0777816
     schgt12 |   .1162848   .0880692     1.32   0.187    -.0563278    .2888973
        smsa |   .1999567   .0841129     2.38   0.017     .0350985    .3648149
    bluecoll |  -.1994925   .0899354    -2.22   0.027    -.3757626   -.0232223
      mining |  -.1015676   .2036644    -0.50   0.618    -.5007425    .2976073
      constr |  -.0253737   .1135609    -0.22   0.823     -.247949    .1972016
      transp |  -.1981522   .1672141    -1.19   0.236    -.5258858    .1295814
       trade |  -.0311361   .1079502    -0.29   0.773    -.2427146    .1804423
        fire |   .1262153   .1492527     0.85   0.398    -.1663145    .4187452
    services |   .2031673   .1038945     1.96   0.051    -.0004622    .4067968
    pubadmin |   .1117728   .3087374     0.36   0.717    -.4933415     .716887
      year85 |   .2374972    .093387     2.54   0.011      .054462    .4205325
      year87 |   .3787397   .1011782     3.74   0.000     .1804341    .5770454
      year89 |   .4920278   .1180472     4.17   0.000     .2606596    .7233959
      midatl |     .02465   .1542139     0.16   0.873    -.2776037    .3269036
       encen |  -.0014111   .1579065    -0.01   0.993    -.3109023      .30808
       wncen |   .1844363   .1694444     1.09   0.276    -.1476687    .5165413
    southatl |   .2740974   .1250481     2.19   0.028     .0290076    .5191872
       escen |    .367742   .2024771     1.82   0.069    -.0291058    .7645899
       wscen |   .3440005   .1527804     2.25   0.024     .0445563    .6434446
    mountain |   .0159627   .1620188     0.10   0.922    -.3015883    .3335136
     pacific |   .0849532   .2504077     0.34   0.734    -.4058368    .5757432
       _cons |  -4.357886   .9196792    -4.74   0.000    -6.160424   -2.555347
-------------+----------------------------------------------------------------
       /ln_p |   .1215314   .0194374     6.25   0.000     .0834348    .1596281
-------------+----------------------------------------------------------------
           p |   1.129225   .0219492                      1.087014    1.173075
         1/p |   .8855632   .0172131                      .8524608    .9199511
------------------------------------------------------------------------------
. estimates store bweib
.
. * Figure 18.4 (p.635) - Generalized (Cox-Snell) Residuals for Weibull
. predict resid, csnell
. stset resid, fail(censor1)

     failure event:  censor1 != 0 & censor1 < .
obs. time interval:  (0, resid]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  6.283261

. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Weibull Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export Weibul16.wmf, replace
(file c:\Imbook\bwebpage\Section4\Weibul16.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
.
. * Estimate Weibull with inverse-Gaussian heterogeneity
. stset spell, fail(censor1=1)

     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr dist(weibull) frailty(invgauss) robust

         failure _d:  censor1 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form
                      Inverse-Gaussian frailty

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    643.00
Log pseudo-likelihood =  -2616.3216                Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   .7356277   .9058181     0.81   0.417    -1.039743    2.510998
          DR |  -1.072566   1.149098    -0.93   0.351    -3.324758    1.179625
          UI |  -2.574752   .3843798    -6.70   0.000    -3.328123   -1.821381
        RRUI |   1.733571   .9333928     1.86   0.063    -.0958458    3.562987
        DRUI |   -.060621   1.537813    -0.04   0.969     -3.07468    2.953438
     LOGWAGE |    .575656   .1766599     3.26   0.001     .2294089    .9219031
      tenure |  -.0009848   .0097472    -0.10   0.920    -.0200889    .0181194
       slack |  -.4416007   .1142976    -3.86   0.000    -.6656199   -.2175814
     abolpos |  -.2873066   .1465357    -1.96   0.050    -.5745113   -.0001019
     explose |   .3641943   .0976897     3.73   0.000     .1727259    .5556627
     stateur |  -.0981133   .0346763    -2.83   0.005    -.1660775    -.030149
    houshead |   .5924383   .1256739     4.71   0.000     .3461219    .8387546
     married |   .6083214   .1183487     5.14   0.000     .3763624    .8402805
      female |   .1788439   .1285074     1.39   0.164    -.0730259    .4307137
       child |  -.0914227    .121778    -0.75   0.453    -.3301031    .1472578
      ychild |  -.1805373   .1527477    -1.18   0.237    -.4799173    .1188426
    nonwhite |  -1.008517   .1725174    -5.85   0.000    -1.346645   -.6703894
         age |  -.0333776   .0059183    -5.64   0.000    -.0449772   -.0217779
     schlt12 |  -.2258621   .1439543    -1.57   0.117    -.5080075    .0562832
     schgt12 |   .1505129    .124469     1.21   0.227    -.0934418    .3944677
        smsa |   .3009952    .119907     2.51   0.012     .0659819    .5360086
    bluecoll |  -.3211857   .1253163    -2.56   0.010    -.5668012   -.0755702
      mining |  -.2319827   .3008491    -0.77   0.441    -.8216361    .3576708
      constr |  -.1260324   .1633669    -0.77   0.440    -.4462257    .1941609
      transp |  -.2763858    .225893    -1.22   0.221    -.7191279    .1663562
       trade |  -.0687616   .1518284    -0.45   0.651    -.3663399    .2288166
        fire |   .0668973   .2131814     0.31   0.754    -.3509306    .4847252
    services |    .231914   .1494712     1.55   0.121    -.0610441    .5248721
    pubadmin |   .0901949   .4579252     0.20   0.844     -.807322    .9877117
      year85 |   .2780139   .1339053     2.08   0.038     .0155644    .5404634
      year87 |   .5208783   .1415375     3.68   0.000     .2434699    .7982867
      year89 |   .7209598   .1655487     4.35   0.000     .3964903    1.045429
      midatl |  -.0192077   .2222646    -0.09   0.931    -.4548382    .4164228
       encen |  -.0297055   .2284931    -0.13   0.897    -.4775438    .4181328
       wncen |   .2460338     .24216     1.02   0.310    -.2285911    .7206586
    southatl |   .3563643   .1793284     1.99   0.047     .0048872    .7078415
       escen |   .5461543   .2910193     1.88   0.061     -.024233    1.116542
       wscen |   .4606814   .2140966     2.15   0.031     .0410598     .880303
    mountain |    .017581   .2293804     0.08   0.939    -.4319963    .4671584
     pacific |   .1379886   .3636985     0.38   0.704    -.5748475    .8508247
       _cons |  -5.303059    1.34133    -3.95   0.000    -7.932017     -2.6741
-------------+----------------------------------------------------------------
       /ln_p |   .5611667   .0225898    24.84   0.000     .5168915    .6054418
     /ln_the |   1.852696   .0896755    20.66   0.000     1.676935    2.028457
-------------+----------------------------------------------------------------
           p |   1.752716   .0395935                      1.676807    1.832062
         1/p |    .570543   .0128884                      .5458332    .5963715
       theta |   6.376987   .5718595                      5.349136    7.602343
------------------------------------------------------------------------------

. estimates store bweibIG

. 
. * Figure 18.5 (p.636) - Generalized (Cox-Snell) Residuals for Weibull-IG
. predict resid, csnell
(option unconditional assumed)

. stset resid, fail(censor1)

     failure event:  censor1 != 0 & censor1 < .
obs. time interval:  (0, resid]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  5.044588

. sts generate survivor=s

. generate cumhaz = -ln(survivor)

. sort resid

. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Weibull-IG Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))

. graph export Weibul16_IG.wmf, replace
(file c:\Imbook\bwebpage\Section4\Weibul16_IG.wmf written in Windows Metafile format)

. drop resid survivor cumhaz

. 
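The residual check in Figure 18.5 rests on a standard property of the integrated hazard. The sketch below uses our own notation, not anything Stata prints. Here $\hat S$ is the fitted survivor function; as we read Stata's `predict` for frailty models, the note `(option unconditional assumed)` means $\hat S$ is averaged over the inverse-Gaussian frailty rather than conditioned on a particular frailty value.

```latex
\[
\hat r_i = \hat H(t_i \mid \mathbf{x}_i) = -\ln \hat S(t_i \mid \mathbf{x}_i), \qquad i = 1,\dots,N
\]
\[
\Pr\{H(T_i \mid \mathbf{x}_i) > r\} = \Pr\{S(T_i \mid \mathbf{x}_i) < e^{-r}\} = e^{-r}, \qquad r \ge 0
\]
\[
\Longrightarrow\quad -\ln \Pr(r_i > r) = r \qquad \text{(the 45 degree line)}
\]
```

If the model is correctly specified, the residuals therefore behave like a sample from the unit exponential, right-censored wherever the original spell was censored (`stset resid, fail(censor1)`). The commands `sts generate survivor=s` and `generate cumhaz = -ln(survivor)` estimate the cumulative hazard of the residuals from their Kaplan-Meier survivor function. Systematic departures of the resulting step plot from the 45 degree line point to misspecification. The middle equality relies on $S(T_i \mid \mathbf{x}_i)$ being uniform on $(0,1)$ for a continuous duration. Because spells here are measured in two-week intervals, the result holds only approximately.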

. * Table 18.2 (p.635) - Display Parameter Estimates
. estimates table bweibIG bweib, t(%9.3f) stats(N ll) b(%9.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)

-------------------------------------
    Variable |   bweibIG      bweib
-------------+-----------------------
          RR |     0.736      0.448
             |     0.812      0.702
          DR |    -1.073     -0.427
             |    -0.933     -0.528
          UI |    -2.575     -1.496
             |    -6.698     -5.668
        RRUI |     1.734      1.015
             |     1.857      1.573
        DRUI |    -0.061     -0.299
             |    -0.039     -0.281
     LOGWAGE |     0.576      0.366
             |     3.259      2.993
       _cons |    -5.303     -4.358
             |    -3.954     -4.738
-------------+-----------------------
           N |  3343.000   3343.000
          ll | -2616.322  -2687.600
-------------------------------------
                          legend: b/t

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section4\mma18p1heterogeneity.txt
  log type:  text
 closed on:  19 May 2005, 17:58:38
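The Weibull-IG column of Table 18.2 can be read through the parameterization sketched below. This is our summary. It assumes Stata's proportional-hazards Weibull form and an inverse-Gaussian frailty $\alpha_i$ with mean 1 and variance $\theta$. The last line checks the back-transformation of the `/ln_p` and `/ln_the` estimates against the reported `p` and `theta`.

```latex
\[
\lambda(t \mid \mathbf{x}_i, \alpha_i) = \alpha_i \, p \, t^{p-1} \exp(\mathbf{x}_i'\boldsymbol{\beta}),
\qquad E(\alpha_i) = 1, \quad \operatorname{Var}(\alpha_i) = \theta
\]
\[
S(t \mid \mathbf{x}_i) = E_{\alpha}\!\left[ e^{-\alpha_i t^{p} \exp(\mathbf{x}_i'\boldsymbol{\beta})} \right]
= \exp\!\left\{ \frac{1}{\theta}\left[ 1 - \left( 1 + 2\theta\, t^{p} \exp(\mathbf{x}_i'\boldsymbol{\beta}) \right)^{1/2} \right] \right\}
\]
\[
\hat p = e^{0.5611667} \approx 1.7527, \qquad \hat\theta = e^{1.852696} \approx 6.3770
\]
```

The no-frailty column `bweib` has the same coefficients and log pseudo-likelihood (-2687.600) as `bweibr1` in the competing-risks log that follows, where p = 1.129. Allowing for frailty therefore raises the estimated Weibull shape from about 1.13 to 1.75. It also increases the magnitude of the slope coefficients; for example, UI moves from -1.496 to -2.575. This pattern is consistent with the familiar result that neglected heterogeneity attenuates covariate effects and biases estimated duration dependence downward.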

------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma19p1comprisks.txt
  log type:  text
 opened on:  19 May 2005, 17:52:44

. 
. ********** OVERVIEW OF MMA18P1COMPRISKS.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 19.5 pages 658-62
. * Competing Risks Example with censoring mechanism each of the three risks
. * (1A) Table 19.2 p.659 Exponential
. * (1B) Table 19.2 p.659 Exponential with IG frailty
. * (2A) Table 19.3 p.659 Weibull
. * (2B) Table 19.3 p.659 Weibull with IG frailty
. * (2C) Table 19.3 p.660 Cox model
. * (2D) Graph the resulting Cox baseline survival and cumulative hazards
.* Figure 19.1: (combined_bsf.wmf) baseline survival functions
.* Figure 19.2: (combined_cbh.wmf) baseline cumulative hazards
. 
. * To run this program you need data file
. * ema1996.dta
. 
. * NOTE: The IG Heterogeneity estimation was unsuccessful for exponential
.* but successful for Weibull
. 
. ********** SETUP **********
. 
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
. set matsize 80 /* Needed for this program */
. 
. ********** DATA DESCRIPTION **********
. 
. * The data is from
. * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness,
.* and Part-time Work," Econometrica, 64, 647-682.
. 
. * There are 3343 observations from the CPS Displaced Worker Surveys
. * of 1986, 1988, 1990 and 1992 on 33 variables including
. * spell = length of spell in number of two-week intervals
. * CENSOR1 = 1 if re-employed at full-time job
. * CENSOR2 = 1 if re-employed at part-time job
. * CENSOR3 = 1 if re-employed but left job: pt-ft status unknown
. * CENSOR4 = 1 if still jobless
. 
. * See program mma17p4duration.do for further description of the data set
. 
. ********** READ DATA and CREATE ADDITIONAL VARIABLES **********
. 
. use ema1996.dta
(Sample for 1996 EMA paper: part-time= worked part-time last week)

. 
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI
. gen LOGWAGE = logwage
. sum

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
       spell |    3343    6.247981    5.611271          1         28
     censor1 |    3343    .3209692    .4669188          0          1
     censor2 |    3343    .1014059    .3019106          0          1
     censor3 |    3343    .1717021    .3771777          0          1
     censor4 |    3343    .3754113    .4843014          0          1
-------------+------------------------------------------------------
          ui |    3343    .5527969    .4972791          0          1
     reprate |    3343    .4544717    .1137918       .066      2.059
     logwage |    3343    5.692994    .5356591    2.70805   7.600402
      tenure |    3343    4.114867    5.862322          0         40
     disrate |    3343    .1094376    .0735274       .002       1.02
-------------+------------------------------------------------------
       slack |    3343    .4884834    .4999421          0          1
     abolpos |    3343    .1456775    .3528354          0          1
     explose |    3343    .5025426    .5000683          0          1
     stateur |    3343      6.5516    1.803825        2.5         13
    houshead |    3343    .6120251    .4873617          0          1
-------------+------------------------------------------------------
     married |    3343    .5860006    .4926221          0          1
      female |    3343    .3478911    .4763725          0          1
       child |    3343    .4501944    .4975876          0          1
      ychild |    3343    .1956327    .3967463          0          1
    nonwhite |    3343    .1390966    .3460991          0          1
-------------+------------------------------------------------------
         age |    3343    35.44331     10.6402         20         61
     schlt12 |    3343    .2811846    .4496446          0          1
     schgt12 |    3343    .3356267    .4722797          0          1
        smsa |    3343    .7241998    .4469835          0          1
    bluecoll |    3343    .6036494     .489212          0          1
-------------+------------------------------------------------------
      mining |    3343     .029315    .1687132          0          1
      constr |    3343    .1480706    .3552231          0          1
      transp |    3343    .0646126    .2458778          0          1
       trade |    3343    .1848639    .3882452          0          1
        fire |    3343    .0514508    .2209484          0          1
-------------+------------------------------------------------------
    services |    3343    .1699073    .3756075          0          1
    pubadmin |    3343    .0095722     .097383          0          1
      year85 |    3343    .2677236     .442839          0          1
      year87 |    3343    .2174693    .4125862          0          1
      year89 |    3343    .1998205    .3999251          0          1
-------------+------------------------------------------------------
      midatl |    3343    .1088842    .3115405          0          1
       encen |    3343    .1429853    .3501103          0          1
       wncen |    3343    .0643135    .2453472          0          1
    southatl |    3343    .2375112    .4256217          0          1
       escen |    3343    .0532456    .2245564          0          1
-------------+------------------------------------------------------
       wscen |    3343    .1441819    .3513266          0          1
    mountain |    3343    .1079868    .3104102          0          1
     pacific |    3343    .0260245     .159232          0          1
          RR |    3343    .4544717    .1137918       .066      2.059
          DR |    3343    .1094376    .0735274       .002       1.02
-------------+------------------------------------------------------
          UI |    3343    .5527969    .4972791          0          1
        RRUI |    3343    .2478687    .2380667          0      2.059
        DRUI |    3343    .0602776    .0754261          0       .824
     LOGWAGE |    3343    5.692994    .5356591    2.70805   7.600402

. 
. ********* COMPETING RISKS FOR UNEMPLOYMENT DURATION **********
. 
. * Stata analysis requires using stset to define the dependent variable
. * and the censoring variable if there is one
. 
. * For the competing risks model there are three censoring variables
. * CENSOR1 = 1 if re-employed at full-time job
. * CENSOR2 = 1 if re-employed at part-time job
. * CENSOR3 = 1 if re-employed but left job: pt-ft status unknown
. 
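The program fits each exit route separately and treats the other routes as censoring. This is justified when the latent exit times are independent given the regressors. The factorization below sketches why, in our notation: $d_{ij}=1$ if spell $i$ ends in exit $j$ ($j=1,2,3$, as defined by CENSOR1 to CENSOR3), and $d_{ij}=0$ for every $j$ if the spell is right-censored.

```latex
\[
L_i = \prod_{j=1}^{3} \lambda_j(t_i \mid \mathbf{x}_i)^{d_{ij}} \, S(t_i \mid \mathbf{x}_i),
\qquad S(t \mid \mathbf{x}) = \prod_{j=1}^{3} S_j(t \mid \mathbf{x})
\]
\[
\ln L = \sum_{j=1}^{3} \underbrace{\sum_{i=1}^{N} \Big[ d_{ij} \ln \lambda_j(t_i \mid \mathbf{x}_i) + \ln S_j(t_i \mid \mathbf{x}_i) \Big]}_{\text{single-risk log-likelihood for exit } j}
\]
```

When the parameters of the three risk-specific hazards are functionally unrelated, maximizing each bracketed term separately yields the joint maximum likelihood estimates. This is what the sequence `stset spell, fail(censorj=1)` followed by `streg` implements for j = 1, 2, 3.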
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific
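Both model families fitted below use the log relative-hazard (proportional hazards) metric. Because `nohr` is specified, the tables report the coefficients $\boldsymbol\beta_j$ rather than the hazard ratios $e^{\beta}$. The two risk-specific hazards are sketched below in our notation; the exponential is the special case $p_j = 1$.

```latex
\[
\text{Exponential: } \lambda_j(t \mid \mathbf{x}) = \exp(\mathbf{x}'\boldsymbol{\beta}_j)
\qquad
\text{Weibull: } \lambda_j(t \mid \mathbf{x}) = p_j \, t^{p_j - 1} \exp(\mathbf{x}'\boldsymbol{\beta}_j),
\quad p_j = e^{\texttt{/ln\_p}}
\]
\[
p_j = 1 \;\Longleftrightarrow\; \texttt{/ln\_p} = 0 \;\Longleftrightarrow\; \text{constant hazard}
\]
```

Two worked readings of the output below illustrate this:

- In the Weibull fit for exit to full-time work, /ln_p = .1215314 gives p = exp(.1215314), approximately 1.1292 (reported as 1.129225). With z = 6.25, the constant-hazard restriction is rejected in favour of a hazard that rises mildly with spell length.
- In the exponential fit for the same exit, the married coefficient .369552 corresponds to a hazard ratio of exp(.369552), approximately 1.447.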

. 
. *** (1A) EXPONENTIAL WITH NO HETEROGENEITY Table 19.2
. 
. stset spell, fail(censor1=1)

     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr robust dist(exponential)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Exponential regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood = -2700.6903                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   .4720235   .6005534     0.79   0.432    -.7050396    1.649087
          DR |  -.5756396   .7624489    -0.75   0.450    -2.070012    .9187327
          UI |  -1.424561   .2493917    -5.71   0.000     -1.91336   -.9357622
        RRUI |   .9655904   .6118408     1.58   0.115    -.2335956    2.164776
        DRUI |  -.1990635   1.019118    -0.20   0.845    -2.196498    1.798371
     LOGWAGE |   .3508005    .115598     3.03   0.002     .1242327    .5773684
      tenure |  -.0001462   .0064637    -0.02   0.982    -.0128147    .0125224
       slack |  -.2593666   .0759363    -3.42   0.001    -.4081991   -.1105342
     abolpos |  -.1550897   .0953306    -1.63   0.104    -.3419342    .0317549
     explose |    .198458   .0648354     3.06   0.002      .071383    .3255331
     stateur |   -.064626   .0229903    -2.81   0.005    -.1096862   -.0195659
    houshead |   .3812208   .0836602     4.56   0.000     .2172499    .5451918
     married |    .369552   .0786145     4.70   0.000     .2154705    .5236335
      female |   .1164067   .0852986     1.36   0.172    -.0507754    .2835888
       child |  -.0333008   .0794577    -0.42   0.675    -.1890352    .1224335
      ychild |  -.1449722   .1022781    -1.42   0.156    -.3454336    .0554892
    nonwhite |  -.6692066   .1188272    -5.63   0.000    -.9021037   -.4363095
         age |  -.0220821   .0039256    -5.63   0.000    -.0297762   -.0143879
     schlt12 |  -.1231414   .0966102    -1.27   0.202    -.3124939     .066211
     schgt12 |   .1114395    .082945     1.34   0.179    -.0511297    .2740087
        smsa |   .1922291   .0799904     2.40   0.016     .0354508    .3490075
    bluecoll |  -.2033718    .085129    -2.39   0.017    -.3702215    -.036522
      mining |  -.1205818   .1973575    -0.61   0.541    -.5073955    .2662319
      constr |    -.04475   .1081519    -0.41   0.679    -.2567237    .1672238
      transp |  -.1786694    .156034    -1.15   0.252    -.4844906    .1271517
       trade |  -.0345159   .1019152    -0.34   0.735     -.234266    .1652341
        fire |   .1120549   .1386716     0.81   0.419    -.1597365    .3838462
    services |   .1840002   .0983911     1.87   0.061    -.0088428    .3768432
    pubadmin |   .1090606   .2954211     0.37   0.712    -.4699541    .6880752
      year85 |   .2147661   .0888664     2.42   0.016     .0405911     .388941
      year87 |   .3541162   .0948499     3.73   0.000     .1682139    .5400186
      year89 |    .467082   .1104355     4.23   0.000     .2506325    .6835316
      midatl |   .0264112   .1465647     0.18   0.857    -.2608503    .3136727
       encen |   .0043916   .1502813     0.03   0.977    -.2901544    .2989375
       wncen |   .1724311   .1607689     1.07   0.283    -.1426703    .4875324
    southatl |   .2638807   .1183726     2.23   0.026     .0318747    .4958867
       escen |     .35414     .19317     1.83   0.067    -.0244664    .7327463
       wscen |   .3385896   .1433308     2.36   0.018     .0576664    .6195128
    mountain |   .0063693   .1538821     0.04   0.967    -.2952341    .3079727
     pacific |   .0770202   .2393505     0.32   0.748    -.3920982    .5461385
       _cons |  -4.079107   .8767097    -4.65   0.000    -5.797426   -2.360788
------------------------------------------------------------------------------

. estimates store bexpr1

. 
. stset spell, fail(censor2=1)

     failure event:  censor2 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr robust dist(exponential)

         failure _d:  censor2 == 1
   analysis time _t:  spell

Exponential regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    227.08
Log pseudo-likelihood = -1250.5446                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |  -.0928628   .9761428    -0.10   0.924    -2.006068    1.820342
          DR |  -.9600127   1.246692    -0.77   0.441    -3.403483    1.483458
          UI |  -1.047747   .5236826    -2.00   0.045    -2.074146    -.021348
        RRUI |  -.6698307   1.191869    -0.56   0.574    -3.005851    1.666189
        DRUI |   1.987208   1.726509     1.15   0.250    -1.396688    5.371105
     LOGWAGE |  -.2577715   .1793075    -1.44   0.151    -.6092077    .0936646
      tenure |   .0053684   .0125538     0.43   0.669    -.0192366    .0299734
       slack |  -.2636908   .1311029    -2.01   0.044    -.5206477   -.0067339
     abolpos |  -.5626836    .202701    -2.78   0.006    -.9599703   -.1653969
     explose |   .0490271   .1130116     0.43   0.664    -.1724715    .2705258
     stateur |  -.1032439   .0406788    -2.54   0.011     -.182973   -.0235148
    houshead |   -.073544   .1343412    -0.55   0.584    -.3368479      .18976
     married |  -.0618813   .1339552    -0.46   0.644    -.3244287    .2006661
      female |   .4531912   .1384047     3.27   0.001      .181923    .7244594
       child |  -.2164986   .1452571    -1.49   0.136    -.5011973    .0682002
      ychild |    .149031   .1815684     0.82   0.412    -.2068365    .5048986
    nonwhite |  -.4563527   .1820135    -2.51   0.012    -.8130927   -.0996127
         age |   -.001781   .0064207    -0.28   0.781    -.0143653    .0108033
     schlt12 |  -.1803101   .1661528    -1.09   0.278    -.5059636    .1453433
     schgt12 |  -.0534463   .1462829    -0.37   0.715    -.3401555    .2332629
        smsa |   .1295376   .1384588     0.94   0.349    -.1418367     .400912
    bluecoll |   .0088207   .1510547     0.06   0.953    -.2872411    .3048825
      mining |  -.0141252   .4078632    -0.03   0.972    -.8135225     .785272
      constr |   .1867498   .1896106     0.98   0.325    -.1848802    .5583799
      transp |   -.402533   .2898061    -1.39   0.165    -.9705426    .1654766
       trade |   .1106678   .1735195     0.64   0.524    -.2294241    .4507598
        fire |  -.3396026   .3006096    -1.13   0.259    -.9287865    .2495813
    services |   .1619867   .1705571     0.95   0.342     -.172299    .4962724
    pubadmin |   .7445446   .5413463     1.38   0.169    -.3164746    1.805564
      year85 |  -.0548375    .149323    -0.37   0.713    -.3475052    .2378301
      year87 |    -.12113   .1616797    -0.75   0.454    -.4380164    .1957563
      year89 |   .1244437   .1950397     0.64   0.523     -.257827    .5067144
      midatl |  -.3969537   .2577568    -1.54   0.124    -.9021477    .1082403
       encen |  -.5115788   .2576815    -1.99   0.047    -1.016625   -.0065323
       wncen |  -.0674875    .257402    -0.26   0.793    -.5719862    .4370113
    southatl |  -.2719375   .1944647    -1.40   0.162    -.6530813    .1092062
       escen |    .065407   .3099463     0.21   0.833    -.5420766    .6728905
       wscen |  -.0941963   .2338712    -0.40   0.687    -.5525754    .3641827
    mountain |   .2287682   .2264905     1.01   0.312     -.215145    .6726814
     pacific |  -.2060074   .3970221    -0.52   0.604    -.9841563    .5721415
       _cons |  -.8636363   1.325425    -0.65   0.515    -3.461421    1.734148
------------------------------------------------------------------------------

. estimates store bexpr2

. 
. stset spell, fail(censor3=1)

     failure event:  censor3 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr robust dist(exponential)

         failure _d:  censor3 == 1
   analysis time _t:  spell

Exponential regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          574
Time at risk    =        20887
                                                   Wald chi2(40)   =    372.34
Log pseudo-likelihood = -1742.3964                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |  -.6011551    .724665    -0.83   0.407    -2.021472    .8191621
          DR |   1.121525   .9012528     1.24   0.213    -.6448975    2.887948
          UI |  -.9672682   .4486302    -2.16   0.031    -1.846567   -.0879691
        RRUI |  -.4326869   1.014413    -0.43   0.670      -2.4209    1.555526
        DRUI |   2.102012   1.302564     1.61   0.107     -.450967    4.654991
     LOGWAGE |   .0029166   .1448149     0.02   0.984    -.2809153    .2867485
      tenure |  -.0479889   .0121403    -3.95   0.000    -.0717835   -.0241942
       slack |  -.4583215    .097709    -4.69   0.000    -.6498277   -.2668154
     abolpos |  -.2736409   .1396283    -1.96   0.050    -.5473073    .0000255
     explose |   .0246749   .0862551     0.29   0.775     -.144382    .1937319
     stateur |  -.1086692   .0319298    -3.40   0.001    -.1712504    -.046088
    houshead |   .5298135   .1054798     5.02   0.000     .3230769    .7365501
     married |   .0268657   .1062998     0.25   0.800    -.1814781    .2352095
      female |   .2590041    .109547     2.36   0.018     .0442959    .4737122
       child |   -.141802   .1114763    -1.27   0.203    -.3602915    .0766876
      ychild |  -.0885931    .136915    -0.65   0.518    -.3569416    .1797553
    nonwhite |  -.4668153    .143211    -3.26   0.001    -.7475036    -.186127
         age |  -.0247346   .0054431    -4.54   0.000    -.0354029   -.0140662
     schlt12 |  -.1034495   .1224893    -0.84   0.398    -.3435241    .1366251
     schgt12 |   .0952043   .1081669     0.88   0.379    -.1167988    .3072075
        smsa |   .0128711   .1021476     0.13   0.900    -.1873344    .2130767
    bluecoll |   .3098248   .1110841     2.79   0.005     .0921038    .5275457
      mining |   .2388579   .2604652     0.92   0.359    -.2716445    .7493603
      constr |   .0983356   .1419787     0.69   0.489    -.1799376    .3766088
      transp |  -.0783446   .1897853    -0.41   0.680    -.4503169    .2936278
       trade |   .1033278   .1292151     0.80   0.424    -.1499291    .3565847
        fire |  -.3607287   .2689374    -1.34   0.180    -.8878363     .166379
    services |   .0248212   .1323061     0.19   0.851     -.234494    .2841363
    pubadmin |  -1.770536   1.040329    -1.70   0.089    -3.809544    .2684714
      year85 |    .295673   .1143137     2.59   0.010     .0716222    .5197237
      year87 |   .4303606   .1198341     3.59   0.000     .1954901    .6652311
      year89 |  -.1373874   .1627204    -0.84   0.398    -.4563135    .1815386
      midatl |  -.5339921   .2188609    -2.44   0.015    -.9629516   -.1050326
       encen |   -.075022   .1998626    -0.38   0.707    -.4667454    .3167014
       wncen |   .1239805   .2095321     0.59   0.554    -.2866948    .5346559
    southatl |   .1522514   .1635982     0.93   0.352    -.1683951     .472898
       escen |  -.5123015   .3170723    -1.62   0.106    -1.133752    .1091488
       wscen |   .0198459   .1898764     0.10   0.917    -.3523051    .3919968
    mountain |   .1999108   .1869463     1.07   0.285    -.1664972    .5663188
     pacific |   .4481059   .2705097     1.66   0.098    -.0820833    .9782951
       _cons |  -1.620926   1.072666    -1.51   0.131    -3.723312    .4814595
------------------------------------------------------------------------------

. estimates store bexpr3

. 
. * Table 19.2 (page 658) first three columns
. estimates table bexpr1 bexpr2 bexpr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)

---------------------------------------------------
    Variable |      bexpr1      bexpr2      bexpr3
-------------+-------------------------------------
          RR |       0.472      -0.093      -0.601
             |       0.601       0.976       0.725
          DR |      -0.576      -0.960       1.122
             |       0.762       1.247       0.901
          UI |      -1.425      -1.048      -0.967
             |       0.249       0.524       0.449
        RRUI |       0.966      -0.670      -0.433
             |       0.612       1.192       1.014
        DRUI |      -0.199       1.987       2.102
             |       1.019       1.727       1.303
     LOGWAGE |       0.351      -0.258       0.003
             |       0.116       0.179       0.145
      tenure |      -0.000       0.005      -0.048
             |       0.006       0.013       0.012
-------------+-------------------------------------
           N |    3343.000    3343.000    3343.000
          ll |   -2700.690   -1250.545   -1742.396
---------------------------------------------------
                                       legend: b/se

. 
. *** (1B) EXPONENTIAL WITH IG HETEROGENEITY Table 19.2
. 
. /* Did not work even though Weibull with IG heterogeneity did
> 
> stset spell, fail(censor1=1)
> streg $xlist, nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr1
> 
> stset spell, fail(censor2=1)
> streg $xlist, nolog nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr2
> 
> stset spell, fail(censor3=1)
> streg $xlist, nolog nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr3
> 
> * Table 19.2 (page 658) first three columns
> estimates table bexpigr1 bexpigr2 bexpigr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
> 
> */
. 
. *** (2A) WEIBULL WITH NO HETEROGENEITY Table 19.3
. 
. stset spell, fail(censor1=1)

     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr robust dist(weibull)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood = -2687.5995                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   .4481156   .6381895     0.70   0.483    -.8027127    1.698944
          DR |  -.4269187   .8086983    -0.53   0.598    -2.011938    1.158101
          UI |  -1.496066   .2639679    -5.67   0.000    -2.013434   -.9786984
        RRUI |   1.015226   .6455611     1.57   0.116    -.2500501    2.280503
        DRUI |  -.2988417   1.065384    -0.28   0.779    -2.386956    1.789272
     LOGWAGE |   .3655253     .12212     2.99   0.003     .1261745    .6048761
      tenure |  -.0011127   .0068716    -0.16   0.871    -.0145809    .0123554
       slack |  -.2652154   .0803214    -3.30   0.001    -.4226424   -.1077883
     abolpos |  -.1604227   .1012942    -1.58   0.113    -.3589557    .0381103
     explose |   .2075085   .0684715     3.03   0.002     .0733068    .3417103
     stateur |  -.0708745   .0242117    -2.93   0.003    -.1183286   -.0234204
    houshead |   .3976626   .0887192     4.48   0.000     .2237762     .571549
     married |   .3786057   .0830317     4.56   0.000     .2158665     .541345
      female |   .1260829   .0896987     1.41   0.160    -.0497233     .301889
       child |  -.0336778   .0839956    -0.40   0.688    -.1983061    .1309505
      ychild |  -.1613066    .108947    -1.48   0.139    -.3748389    .0522256
    nonwhite |  -.7025504     .12426    -5.65   0.000    -.9460956   -.4590052
         age |  -.0235823   .0041922    -5.63   0.000    -.0317989   -.0153658
     schlt12 |  -.1226759   .1022762    -1.20   0.230    -.3231335    .0777816
     schgt12 |   .1162848   .0880692     1.32   0.187    -.0563278    .2888973
        smsa |   .1999567   .0841129     2.38   0.017     .0350985    .3648149
    bluecoll |  -.1994925   .0899354    -2.22   0.027    -.3757626   -.0232223
      mining |  -.1015676   .2036644    -0.50   0.618    -.5007425    .2976073
      constr |  -.0253737   .1135609    -0.22   0.823     -.247949    .1972016
      transp |  -.1981522   .1672141    -1.19   0.236    -.5258858    .1295814
       trade |  -.0311361   .1079502    -0.29   0.773    -.2427146    .1804423
        fire |   .1262153   .1492527     0.85   0.398    -.1663145    .4187452
    services |   .2031673   .1038945     1.96   0.051    -.0004622    .4067968
    pubadmin |   .1117728   .3087374     0.36   0.717    -.4933415     .716887
      year85 |   .2374972    .093387     2.54   0.011      .054462    .4205325
      year87 |   .3787397   .1011782     3.74   0.000     .1804341    .5770454
      year89 |   .4920278   .1180472     4.17   0.000     .2606596    .7233959
      midatl |     .02465   .1542139     0.16   0.873    -.2776037    .3269036
       encen |  -.0014111   .1579065    -0.01   0.993    -.3109023      .30808
       wncen |   .1844363   .1694444     1.09   0.276    -.1476687    .5165413
    southatl |   .2740974   .1250481     2.19   0.028     .0290076    .5191872
       escen |    .367742   .2024771     1.82   0.069    -.0291058    .7645899
       wscen |   .3440005   .1527804     2.25   0.024     .0445563    .6434446
    mountain |   .0159627   .1620188     0.10   0.922    -.3015883    .3335136
     pacific |   .0849532   .2504077     0.34   0.734    -.4058368    .5757432
       _cons |  -4.357886   .9196792    -4.74   0.000    -6.160424   -2.555347
-------------+----------------------------------------------------------------
       /ln_p |   .1215314   .0194374     6.25   0.000     .0834348    .1596281
-------------+----------------------------------------------------------------
           p |   1.129225   .0219492                      1.087014    1.173075
         1/p |   .8855632   .0172131                      .8524608    .9199511
------------------------------------------------------------------------------

. estimates store bweibr1

. 
. stset spell, fail(censor2=1)

     failure event:  censor2 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr robust dist(weibull)

         failure _d:  censor2 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    222.95
Log pseudo-likelihood = -1248.6859                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |  -.0855974   .9920715    -0.09   0.931    -2.030022    1.858827
          DR |  -.9387836   1.279111    -0.73   0.463    -3.445794    1.568227
          UI |  -1.110175   .5267037    -2.11   0.035    -2.142496   -.0778551
        RRUI |  -.6171912   1.203735    -0.51   0.608    -2.976469    1.742086
        DRUI |   1.973269   1.756599     1.12   0.261    -1.469601     5.41614
     LOGWAGE |  -.2437885   .1833224    -1.33   0.184    -.6030938    .1155168
      tenure |   .0050643   .0127387     0.40   0.691    -.0199031    .0300317
       slack |  -.2689689    .133176    -2.02   0.043     -.529989   -.0079487
     abolpos |  -.5721689   .2059292    -2.78   0.005    -.9757826   -.1685551
     explose |   .0555267   .1147555     0.48   0.628      -.16939    .2804433
     stateur |  -.1087083   .0413647    -2.63   0.009    -.1897816    -.027635
    houshead |  -.0679894     .13661    -0.50   0.619    -.3357401    .1997613
     married |   -.060856   .1362403    -0.45   0.655     -.327882      .20617
      female |   .4583892   .1408831     3.25   0.001     .1822634     .734515
       child |  -.2228982    .147376    -1.51   0.130    -.5117499    .0659535
      ychild |   .1463598   .1844362     0.79   0.427    -.2151284     .507848
    nonwhite |   -.485664    .186033    -2.61   0.009    -.8502819    -.121046
         age |  -.0027009   .0065569    -0.41   0.680    -.0155521    .0101503
     schlt12 |  -.1837633   .1684487    -1.09   0.275    -.5139167    .1463901
     schgt12 |  -.0488958   .1485385    -0.33   0.742     -.340026    .2422343
        smsa |   .1380042   .1410747     0.98   0.328    -.1384971    .4145055
    bluecoll |   .0132584   .1537386     0.09   0.931    -.2880637    .3145805
      mining |  -.0138734   .4110202    -0.03   0.973    -.8194583    .7917115
      constr |   .1973771   .1920481     1.03   0.304    -.1790303    .5737845
      transp |  -.4116241   .2927848    -1.41   0.160    -.9854717    .1622234
       trade |   .1125741   .1765277     0.64   0.524    -.2334139    .4585621
        fire |  -.3378747   .3046641    -1.11   0.267    -.9350054    .2592561
    services |   .1700335   .1729565     0.98   0.326    -.1689551    .5090221
    pubadmin |   .7553679   .5487635     1.38   0.169    -.3201889    1.830925
      year85 |  -.0501695   .1515048    -0.33   0.741    -.3471135    .2467745
      year87 |  -.1116858   .1645254    -0.68   0.497    -.4341497    .2107781
      year89 |   .1344555   .1987084     0.68   0.499    -.2550059    .5239168
      midatl |  -.4039691   .2606153    -1.55   0.121    -.9147658    .1068276
       encen |  -.5105877   .2608364    -1.96   0.050    -1.021818    .0006423
       wncen |  -.0579723   .2607792    -0.22   0.824    -.5690902    .4531456
    southatl |  -.2682241   .1972983    -1.36   0.174    -.6549216    .1184733
       escen |    .079807   .3146812     0.25   0.800    -.5369568    .6965709
       wscen |  -.0854421   .2368638    -0.36   0.718    -.5496865    .3788024
    mountain |   .2441762   .2300886     1.06   0.289    -.2067892    .6951416
     pacific |  -.1999107   .4003467    -0.50   0.618    -.9845758    .5847544
       _cons |  -1.055211   1.353275    -0.78   0.436    -3.707582    1.597159
-------------+----------------------------------------------------------------
       /ln_p |   .0815649   .0308379     2.64   0.008     .0211236    .1420061
-------------+----------------------------------------------------------------
           p |   1.084984   .0334587                      1.021348    1.152584
         1/p |   .9216729   .0284225                      .8676159    .9790979
------------------------------------------------------------------------------

. estimates store bweibr2

. 
. stset spell, fail(censor3=1)

     failure event:  censor3 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr robust dist(weibull)

         failure _d:  censor3 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          574
Time at risk    =        20887
                                                   Wald chi2(40)   =    350.72
Log pseudo-likelihood = -1729.8356                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |  -.6946399    .762754    -0.91   0.362     -2.18961    .8003305
          DR |   1.361414   .9691375     1.40   0.160    -.5380611    3.260888
          UI |  -1.098453   .4595297    -2.39   0.017    -1.999115   -.1977918
        RRUI |  -.3055217   1.046769    -0.29   0.770    -2.357151    1.746107
        DRUI |   1.990913    1.37004     1.45   0.146    -.6943156    4.676141
     LOGWAGE |   .0401096   .1526549     0.26   0.793    -.2590886    .3393078
      tenure |  -.0495153   .0126559    -3.91   0.000    -.0743204   -.0247103
       slack |   -.473113   .1025776    -4.61   0.000    -.6741614   -.2720647
     abolpos |  -.2910168   .1465355    -1.99   0.047    -.5782212   -.0038124
     explose |   .0315602   .0906338     0.35   0.728    -.1460787    .2091991
     stateur |  -.1199252   .0337488    -3.55   0.000    -.1860717   -.0537787
    houshead |   .5592843   .1107798     5.05   0.000     .3421598    .7764087
     married |    .032312   .1115613     0.29   0.772    -.1863442    .2509681
      female |   .2764899   .1147909     2.41   0.016     .0515039    .5014759
       child |   -.149619   .1167679    -1.28   0.200    -.3784799     .079242
      ychild |  -.1018703   .1436607    -0.71   0.478    -.3834401    .1796996
    nonwhite |  -.5164388   .1517355    -3.40   0.001    -.8138349   -.2190427
         age |  -.0275549   .0057648    -4.78   0.000    -.0388536   -.0162561
     schlt12 |  -.1115642   .1291366    -0.86   0.388    -.3646673    .1415389
     schgt12 |   .1015553   .1135108     0.89   0.371    -.1209217    .3240324
        smsa |   .0270168   .1078739     0.25   0.802    -.1844122    .2384459
    bluecoll |   .3229431   .1167884     2.77   0.006      .094042    .5518443
      mining |   .2437267   .2731206     0.89   0.372    -.2915799    .7790332
      constr |   .1307943   .1484399     0.88   0.378    -.1601425    .4217311
      transp |  -.1004424   .2004105    -0.50   0.616    -.4932397    .2923549
       trade |   .1181562    .136055     0.87   0.385    -.1485068    .3848192
        fire |   -.344603   .2792784    -1.23   0.217    -.8919787    .2027726
    services |   .0519644   .1386656     0.37   0.708    -.2198151    .3237438
    pubadmin |  -1.780582   1.049217    -1.70   0.090    -3.837009    .2758459
      year85 |    .311726   .1192592     2.61   0.009     .0779822    .5454698
      year87 |   .4514345    .126241     3.58   0.000     .2040067    .6988623
      year89 |  -.1180122   .1713414    -0.69   0.491    -.4538352    .2178108
      midatl |  -.5476552    .224463    -2.44   0.015    -.9875945   -.1077158
       encen |   -.084084     .20745    -0.41   0.685    -.4906786    .3225106
       wncen |   .1288938   .2191536     0.59   0.556    -.3006393    .5584268
    southatl |     .16223   .1702456     0.95   0.341    -.1714454    .4959053
       escen |  -.5110545   .3270884    -1.56   0.118    -1.152136     .130027
       wscen |   .0218047   .1978693     0.11   0.912    -.3660121    .4096214
    mountain |   .2045852   .1949939     1.05   0.294    -.1775957    .5867662
     pacific |   .4535074   .2840292     1.60   0.110    -.1031795    1.010194
       _cons |  -2.017592   1.123888    -1.80   0.073    -4.220372    .1851884
-------------+----------------------------------------------------------------
       /ln_p |    .163312   .0235045     6.95   0.000      .117244    .2093801
-------------+----------------------------------------------------------------
           p |   1.177404   .0276744                      1.124394    1.232914
         1/p |   .8493261    .019963                      .8110869    .8893682
------------------------------------------------------------------------------

. estimates store bweibr3

. 
. * Table 19.3 (page 659) first three columns
. estimates table bweibr1 bweibr2 bweibr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)

---------------------------------------------------
    Variable |     bweibr1     bweibr2     bweibr3
-------------+-------------------------------------
          RR |       0.448      -0.086      -0.695
             |       0.638       0.992       0.763
          DR |      -0.427      -0.939       1.361
             |       0.809       1.279       0.969
          UI |      -1.496      -1.110      -1.098
| 0.264 0.527 0.460 RRUI | 1.015 -0.617 -0.306 | 0.646 1.204 1.047 DRUI | -0.299 1.973 1.991 | 1.065 1.757 1.370 LOGWAGE | 0.366 -0.244 0.040 | 0.122 0.183 0.153 tenure | -0.001 0.005 -0.050 | 0.007 0.013 0.013 -------------+--------------------------------------N | 3343.000 3343.000 3343.000 ll | -2687.600 -1248.686 -1729.836 ----------------------------------------------------legend: b/se . . *** (2B) WEIBULL WITH IG HETEROGENEITY Table 19.3 . . stset spell, fail(censor1=1) failure event: censor1 == 1 obs. time interval: (0, spell] exit on or before: failure -----------------------------------------------------------------------------3343 total obs. 0 exclusions -----------------------------------------------------------------------------3343 obs. remaining, representing 1073 failures in single record/single failure data 20887 total analysis time at risk, at risk from t = 0 earliest observed entry t = 0 last observed exit t = 28 . streg $xlist, nohr robust dist(weibull) frailty(invgauss) failure _d: censor1 == 1 analysis time _t: spell Fitting weibull model: Fitting constant-only model: Iteration 0: Iteration 1: Iteration 2: Iteration 3: Iteration 4: Iteration 5: Iteration 6:

log pseudo-likelihood = -3134.2376 (not concave) log pseudo-likelihood = -2998.472 log pseudo-likelihood = -2984.8299 log pseudo-likelihood = -2960.0446 log pseudo-likelihood = -2954.9102 log pseudo-likelihood = -2954.8838 log pseudo-likelihood = -2954.8838

Fitting full model:

Iteration 0:   log pseudo-likelihood = -2656.6306
Iteration 1:   log pseudo-likelihood =  -2632.196
Iteration 2:   log pseudo-likelihood = -2616.9139
Iteration 3:   log pseudo-likelihood = -2616.3231
Iteration 4:   log pseudo-likelihood = -2616.3216
Iteration 5:   log pseudo-likelihood = -2616.3216

Weibull regression -- log relative-hazard form
                      Inverse-Gaussian frailty

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    643.00
Log pseudo-likelihood = -2616.3216                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   .7356277   .9058181     0.81   0.417    -1.039743    2.510998
          DR |  -1.072566   1.149098    -0.93   0.351    -3.324758    1.179625
          UI |  -2.574752   .3843798    -6.70   0.000    -3.328123   -1.821381
        RRUI |   1.733571   .9333928     1.86   0.063    -.0958458    3.562987
        DRUI |   -.060621   1.537813    -0.04   0.969     -3.07468    2.953438
     LOGWAGE |    .575656   .1766599     3.26   0.001     .2294089    .9219031
      tenure |  -.0009848   .0097472    -0.10   0.920    -.0200889    .0181194
       slack |  -.4416007   .1142976    -3.86   0.000    -.6656199   -.2175814
     abolpos |  -.2873066   .1465357    -1.96   0.050    -.5745113   -.0001019
     explose |   .3641943   .0976897     3.73   0.000     .1727259    .5556627
     stateur |  -.0981133   .0346763    -2.83   0.005    -.1660775    -.030149
    houshead |   .5924383   .1256739     4.71   0.000     .3461219    .8387546
     married |   .6083214   .1183487     5.14   0.000     .3763624    .8402805
      female |   .1788439   .1285074     1.39   0.164    -.0730259    .4307137
       child |  -.0914227    .121778    -0.75   0.453    -.3301031    .1472578
      ychild |  -.1805373   .1527477    -1.18   0.237    -.4799173    .1188426
    nonwhite |  -1.008517   .1725174    -5.85   0.000    -1.346645   -.6703894
         age |  -.0333776   .0059183    -5.64   0.000    -.0449772   -.0217779
     schlt12 |  -.2258621   .1439543    -1.57   0.117    -.5080075    .0562832
     schgt12 |   .1505129    .124469     1.21   0.227    -.0934418    .3944677
        smsa |   .3009952    .119907     2.51   0.012     .0659819    .5360086
    bluecoll |  -.3211857   .1253163    -2.56   0.010    -.5668012   -.0755702
      mining |  -.2319827   .3008491    -0.77   0.441    -.8216361    .3576708
      constr |  -.1260324   .1633669    -0.77   0.440    -.4462257    .1941609
      transp |  -.2763858    .225893    -1.22   0.221    -.7191279    .1663562
       trade |  -.0687616   .1518284    -0.45   0.651    -.3663399    .2288166
        fire |   .0668973   .2131814     0.31   0.754    -.3509306    .4847252
    services |    .231914   .1494712     1.55   0.121    -.0610441    .5248721
    pubadmin |   .0901949   .4579252     0.20   0.844     -.807322    .9877117
      year85 |   .2780139   .1339053     2.08   0.038     .0155644    .5404634
      year87 |   .5208783   .1415375     3.68   0.000     .2434699    .7982867
      year89 |   .7209598   .1655487     4.35   0.000     .3964903    1.045429
      midatl |  -.0192077   .2222646    -0.09   0.931    -.4548382    .4164228
       encen |  -.0297055   .2284931    -0.13   0.897    -.4775438    .4181328
       wncen |   .2460338     .24216     1.02   0.310    -.2285911    .7206586
    southatl |   .3563643   .1793284     1.99   0.047     .0048872    .7078415
       escen |   .5461543   .2910193     1.88   0.061     -.024233    1.116542
       wscen |   .4606814   .2140966     2.15   0.031     .0410598     .880303
    mountain |    .017581   .2293804     0.08   0.939    -.4319963    .4671584
     pacific |   .1379886   .3636985     0.38   0.704    -.5748475    .8508247
       _cons |  -5.303059    1.34133    -3.95   0.000    -7.932017     -2.6741
-------------+----------------------------------------------------------------
       /ln_p |   .5611667   .0225898    24.84   0.000     .5168915    .6054418
     /ln_the |   1.852696   .0896755    20.66   0.000     1.676935    2.028457
-------------+----------------------------------------------------------------
           p |   1.752716   .0395935                      1.676807    1.832062
         1/p |    .570543   .0128884                      .5458332    .5963715
       theta |   6.376987   .5718595                      5.349136    7.602343
------------------------------------------------------------------------------

. estimates store bweibigr1

. 
. stset spell, fail(censor2=1)

     failure event:  censor2 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr robust dist(weibull) frailty(invgauss)

         failure _d:  censor2 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form
                      Inverse-Gaussian frailty

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    253.77
Log pseudo-likelihood = -1230.1643                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |  -.3802006   1.452095    -0.26   0.793    -3.226255    2.465854
          DR |  -1.689504   1.779553    -0.95   0.342    -5.177363    1.798355
          UI |  -2.063963   .7469659    -2.76   0.006    -3.527989   -.5999369
        RRUI |  -.3019038   1.702153    -0.18   0.859    -3.638063    3.034255
        DRUI |   3.263067   2.469908     1.32   0.186    -1.577863    8.103998
     LOGWAGE |  -.4954862   .2614747    -1.89   0.058    -1.007967    .0169948
      tenure |   .0174014   .0192239     0.91   0.365    -.0202768    .0550795
       slack |  -.3889861   .1911789    -2.03   0.042    -.7636898   -.0142824
     abolpos |  -.8027208   .2877528    -2.79   0.005    -1.366706   -.2387356
     explose |   .1187808   .1663987     0.71   0.475    -.2073546    .4449162
     stateur |  -.1753726    .059272    -2.96   0.003    -.2915437   -.0592015
    houshead |  -.0832153   .1944376    -0.43   0.669     -.464306    .2978754
     married |  -.0092249   .1945187    -0.05   0.962    -.3904747    .3720248
      female |   .6284921   .2064768     3.04   0.002      .223805    1.033179
       child |   -.389325   .2127697    -1.83   0.067     -.806346    .0276959
      ychild |   .3144939   .2663886     1.18   0.238    -.2076182     .836606
    nonwhite |  -.6691885   .2633831    -2.54   0.011     -1.18541   -.1529671
         age |  -.0034533   .0093696    -0.37   0.712    -.0218174    .0149108
     schlt12 |  -.3242365   .2380109    -1.36   0.173    -.7907293    .1422562
     schgt12 |  -.0745655   .2138285    -0.35   0.727    -.4936618    .3445307
        smsa |   .2107394   .2012744     1.05   0.295    -.1837512      .60523
    bluecoll |  -.0065426   .2175612    -0.03   0.976    -.4329548    .4198696
      mining |   .1293103   .6093175     0.21   0.832     -1.06493    1.323551
      constr |   .2870954   .2728176     1.05   0.293    -.2476172    .8218081
      transp |  -.6470251   .4118414    -1.57   0.116    -1.454219    .1601692
       trade |   .1901489   .2529975     0.75   0.452    -.3057172    .6860149
        fire |  -.4680763   .4488502    -1.04   0.297    -1.347807     .411654
    services |   .2462185   .2531429     0.97   0.331    -.2499325    .7423696
    pubadmin |   1.351206   .7621665     1.77   0.076    -.1426127    2.845025
      year85 |  -.1501166   .2195046    -0.68   0.494    -.5803377    .2801044
      year87 |  -.2400145    .236954    -1.01   0.311    -.7044358    .2244069
      year89 |   .1828811   .2831188     0.65   0.518    -.3720216    .7377838
      midatl |  -.4074373   .3806192    -1.07   0.284    -1.153437    .3385627
       encen |  -.6525035    .381508    -1.71   0.087    -1.400245    .0952385
       wncen |  -.1300751   .3835973    -0.34   0.735    -.8819119    .6217617
    southatl |  -.3491396   .2954776    -1.18   0.237     -.928265    .2299859
       escen |   .2960895   .4558667     0.65   0.516    -.5973927    1.189572
       wscen |  -.0903554   .3527441    -0.26   0.798    -.7817212    .6010104
    mountain |   .3721587   .3457717     1.08   0.282    -.3055413    1.049859
     pacific |  -.1996218   .6042626    -0.33   0.741    -1.383955    .9847112
       _cons |   1.157635   1.957298     0.59   0.554    -2.678599    4.993869
-------------+----------------------------------------------------------------
       /ln_p |   .5004283   .0361284    13.85   0.000      .429618    .5712386
     /ln_the |   2.896807   .1749249    16.56   0.000      2.55396    3.239653
-------------+----------------------------------------------------------------
           p |   1.649428   .0595911                       1.53667    1.770459
         1/p |   .6062709   .0219036                      .5648254    .6507577
       theta |   18.11621   3.168976                      12.85793    25.52487
------------------------------------------------------------------------------

. estimates store bweibigr2

. 
. stset spell, fail(censor3=1)

     failure event:  censor3 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr robust dist(weibull) frailty(invgauss)

         failure _d:  censor3 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form
                      Inverse-Gaussian frailty

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          574
Time at risk    =        20887
                                                   Wald chi2(40)   =    416.91
Log pseudo-likelihood = -1696.8456                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |  -.4326716   1.111223    -0.39   0.697    -2.610628    1.745285
          DR |   1.166629   1.377826     0.85   0.397    -1.533861    3.867119
          UI |  -1.761667    .623017    -2.83   0.005    -2.982758   -.5405758
        RRUI |  -.5160276   1.418361    -0.36   0.716    -3.295964    2.263909
        DRUI |   3.668779    1.93489     1.90   0.058    -.1235355    7.461093
     LOGWAGE |  -.0069584   .2162461    -0.03   0.974    -.4307929    .4168762
      tenure |  -.0677151   .0174959    -3.87   0.000    -.1020065   -.0334237
       slack |  -.7093182    .145145    -4.89   0.000    -.9937971   -.4248392
     abolpos |  -.4327781   .2106818    -2.05   0.040    -.8457069   -.0198494
     explose |   .0930879   .1284587     0.72   0.469    -.1586864    .3448623
     stateur |  -.1684826   .0472936    -3.56   0.000    -.2611764   -.0757887
    houshead |   .7760519   .1555864     4.99   0.000     .4711081    1.080996
     married |   .0849334   .1585652     0.54   0.592    -.2258487    .3957154
      female |    .329107   .1637254     2.01   0.044     .0082111    .6500028
       child |  -.2734744   .1667453    -1.64   0.101    -.6002892    .0533403
      ychild |   -.101407   .2021952    -0.50   0.616    -.4977024    .2948883
    nonwhite |  -.7325977    .211777    -3.46   0.001    -1.147673   -.3175223
         age |  -.0354358    .007992    -4.43   0.000    -.0510998   -.0197719
     schlt12 |  -.1729163   .1803828    -0.96   0.338    -.5264602    .1806275
     schgt12 |   .0955174   .1615133     0.59   0.554    -.2210429    .4120777
        smsa |   .0225321   .1500451     0.15   0.881    -.2715509    .3166151
    bluecoll |   .4311626   .1651405     2.61   0.009     .1074931    .7548321
      mining |   .4464055   .3724328     1.20   0.231    -.2835495     1.17636
      constr |   .1875875   .2104018     0.89   0.373    -.2247926    .5999675
      transp |  -.0190191   .2877627    -0.07   0.947    -.5830237    .5449855
       trade |   .1708654   .1960546     0.87   0.383    -.2133945    .5551253
        fire |  -.3548846   .3851005    -0.92   0.357    -1.109668    .3998985
    services |   .0199891   .1978478     0.10   0.920    -.3677854    .4077636
    pubadmin |  -2.249289   1.450209    -1.55   0.121    -5.091646    .5930688
      year85 |   .3978277   .1726143     2.30   0.021     .0595099    .7361456
      year87 |   .6809662   .1807412     3.77   0.000       .32672    1.035212
      year89 |  -.1380237   .2307311    -0.60   0.550    -.5902485     .314201
      midatl |  -.7908245   .3280754    -2.41   0.016     -1.43384   -.1478085
       encen |  -.1035781   .2984816    -0.35   0.729    -.6885913    .4814351
       wncen |   .2578004   .3150731     0.82   0.413    -.3597316    .8753324
    southatl |   .2314723   .2430344     0.95   0.341    -.2448663    .7078109
       escen |  -.6777305   .4486486    -1.51   0.131    -1.557065    .2016045
       wscen |   .0308173   .2842933     0.11   0.914    -.5263874    .5880219
    mountain |   .2849032   .2816226     1.01   0.312     -.267067    .8368734
     pacific |   .7162217   .4103619     1.75   0.081    -.0880727    1.520516
       _cons |   -1.42279   1.617429    -0.88   0.379    -4.592894    1.747313
-------------+----------------------------------------------------------------
       /ln_p |   .5795747    .026888    21.56   0.000     .5268752    .6322742
     /ln_the |   2.262575   .1322516    17.11   0.000     2.003367    2.521783
-------------+----------------------------------------------------------------
           p |   1.785279   .0480026                      1.693632    1.881886
         1/p |   .5601365   .0150609                      .5313819    .5904471
       theta |   9.607798   1.270647                      7.413974    12.45078
------------------------------------------------------------------------------

. estimates store bweibigr3

. 
. * Table 19.3 (page 659) first three columns
. estimates table bweibigr1 bweibigr2 bweibigr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)

----------------------------------------------------
    Variable |   bweibigr1   bweibigr2   bweibigr3
-------------+--------------------------------------
          RR |       0.736      -0.380      -0.433
             |       0.906       1.452       1.111
          DR |      -1.073      -1.690       1.167
             |       1.149       1.780       1.378
          UI |      -2.575      -2.064      -1.762
             |       0.384       0.747       0.623
        RRUI |       1.734      -0.302      -0.516
             |       0.933       1.702       1.418
        DRUI |      -0.061       3.263       3.669
             |       1.538       2.470       1.935
     LOGWAGE |       0.576      -0.495      -0.007
             |       0.177       0.261       0.216
      tenure |      -0.001       0.017      -0.068
             |       0.010       0.019       0.017
-------------+--------------------------------------
           N |    3343.000    3343.000    3343.000
          ll |   -2616.322   -1230.164   -1696.846
----------------------------------------------------
                                        legend: b/se

. 
. *** (2C) ESTIMATE COX MODEL SPECIFICATION OF COMPETING RISKS
. 
. stset spell, fail(censor1=1)

     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. stcox $xlist, nolog nohr robust basesurv(survrisk1) basechazard(chrisk1)

         failure _d:  censor1 == 1
   analysis time _t:  spell

Cox regression -- Breslow method for ties

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    540.98
Log pseudo-likelihood = -7717.2334                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |   .5222796   .5711698     0.91   0.361    -.5971926    1.641752
          DR |   -.752507     .72175    -1.04   0.297    -2.167111    .6620971
          UI |  -1.317719   .2372893    -5.55   0.000    -1.782798   -.8526409
        RRUI |   .8822462    .582115     1.52   0.130    -.2586783    2.023171
        DRUI |  -.0951357    .977774    -0.10   0.922    -2.011538    1.821266
     LOGWAGE |   .3352639   .1106483     3.03   0.002     .1183972    .5521306
      tenure |   .0008278   .0061286     0.14   0.893    -.0111841    .0128396
       slack |   -.247863   .0721173    -3.44   0.001    -.3892103   -.1065158
     abolpos |  -.1511638   .0905035    -1.67   0.095    -.3285475    .0262198
     explose |   .1865068   .0615742     3.03   0.002     .0658236      .30719
     stateur |  -.0590475    .022085    -2.67   0.008    -.1023334   -.0157616
    houshead |   .3601866   .0794827     4.53   0.000     .2044035    .5159698
     married |    .358819   .0746355     4.81   0.000     .2125362    .5051019
      female |   .1002758   .0813277     1.23   0.218    -.0591236    .2596753
       child |  -.0396054   .0755365    -0.52   0.600    -.1876542    .1084435
      ychild |  -.1276638   .0967856    -1.32   0.187    -.3173602    .0620325
    nonwhite |  -.6394475   .1151332    -5.55   0.000    -.8651043   -.4137906
         age |  -.0204623   .0037593    -5.44   0.000    -.0278305   -.0130942
     schlt12 |  -.1220585   .0920073    -1.33   0.185    -.3023895    .0582726
     schgt12 |   .1104817   .0783542     1.41   0.159    -.0430897    .2640531
        smsa |   .1864841   .0766075     2.43   0.015     .0363361    .3366321
    bluecoll |  -.2108023    .080867    -2.61   0.009    -.3692986    -.052306
      mining |  -.1238251   .1906352    -0.65   0.516    -.4974632     .249813
      constr |   -.054455   .1029488    -0.53   0.597     -.256231    .1473209
      transp |  -.1551657   .1466515    -1.06   0.290    -.4425973    .1322659
       trade |  -.0383252   .0968106    -0.40   0.692    -.2280706    .1514201
        fire |   .1097585   .1300779     0.84   0.399    -.1451895    .3647065
    services |   .1666262   .0939507     1.77   0.076    -.0175138    .3507662
    pubadmin |   .1022002   .2829817     0.36   0.718    -.4524336    .6568341
      year85 |    .204162    .084908     2.40   0.016     .0377454    .3705786
      year87 |   .3384229   .0899115     3.76   0.000     .1621997    .5146462
      year89 |   .4486559    .104937     4.28   0.000     .2429832    .6543286
      midatl |   .0342238    .140515     0.24   0.808    -.2411805    .3096282
       encen |   .0174597   .1438862     0.12   0.903    -.2645521    .2994716
       wncen |   .1650967   .1532559     1.08   0.281    -.1352795    .4654728
    southatl |   .2518023   .1127138     2.23   0.025     .0308874    .4727172
       escen |   .3450422   .1839818     1.88   0.061    -.0155554    .7056398
       wscen |   .3316752   .1359801     2.44   0.015     .0651591    .5981914
    mountain |    .009484   .1468626     0.06   0.949    -.2783613    .2973293
     pacific |   .0720292   .2263339     0.32   0.750    -.3715771    .5156355
------------------------------------------------------------------------------

. estimates store bcoxrisk1

. 
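Each exit route is estimated here as a separate single-risk model: `stset` declares `censorK == 1` as the failure event, so a spell that ends in one of the other routes, or that is right-censored, enters that model as censored. The Python sketch below illustrates that bookkeeping under stated assumptions: it supposes a single exit-route code per spell (0 = still unemployed, 1 = full-time job, 2 = part-time job, 3 = unknown job), and the toy spells and the names `exit_route` and `risk_indicator` are hypothetical. How `censor1`, `censor2` and `censor3` were actually built is not shown in this log.

```python
# Hypothetical toy data: spell lengths (in 2-week intervals) and exit route.
# exit_route: 0 = right-censored (still unemployed), 1 = full-time job,
#             2 = part-time job, 3 = unknown job.
spells = [3, 7, 1, 12, 28, 5]
exit_route = [1, 0, 3, 2, 0, 1]


def risk_indicator(routes, k):
    """Failure indicator for risk k: 1 if the spell ends by route k, else 0 (censored)."""
    return [1 if r == k else 0 for r in routes]


indicators = {k: risk_indicator(exit_route, k) for k in (1, 2, 3)}

# A spell can be a failure for at most one risk.
assert all(sum(ind[i] for ind in indicators.values()) <= 1 for i in range(len(spells)))

for k, ind in indicators.items():
    print(f"risk {k}: {sum(ind)} failures, {len(ind) - sum(ind)} censored")

# Failure counts reported by stset for censor1, censor2 and censor3 in this log.
n_spells = 3343
failures = {1: 1073, 2: 339, 3: 574}
n_any_exit = sum(failures.values())  # 1986, assuming the three events are exclusive
print(f"spells censored for all three risks: {n_spells - n_any_exit}")  # 1357
```

If the three failure events are mutually exclusive, the `stset` counts imply that 1986 of the 3343 spells end in some exit and 1357 are censored in all three models.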

. stset spell, fail(censor2=1)

     failure event:  censor2 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. stcox $xlist, nolog nohr robust basesurv(survrisk2) basechazard(chrisk2)

         failure _d:  censor2 == 1
   analysis time _t:  spell

Cox regression -- Breslow method for ties

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    211.82
Log pseudo-likelihood =  -2444.342                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |  -.0719673   .9513101    -0.08   0.940    -1.936501    1.792566
          DR |    -1.0236   1.193087    -0.86   0.391    -3.362007    1.314807
          UI |   -.906022   .5109396    -1.77   0.076    -1.907445    .0954013
        RRUI |  -.7818457   1.166182    -0.67   0.503     -3.06752    1.503829
        DRUI |   2.031968   1.671862     1.22   0.224    -1.244821    5.308756
     LOGWAGE |  -.2800345   .1736454    -1.61   0.107    -.6203732    .0603043
      tenure |   .0059934   .0122664     0.49   0.625    -.0180483    .0300352
       slack |  -.2476685     .12775    -1.94   0.053     -.498054    .0027169
     abolpos |  -.5434923   .1976775    -2.75   0.006    -.9309331   -.1560516
     explose |   .0334802   .1101886     0.30   0.761    -.1824856    .2494459
     stateur |  -.0923228   .0393339    -2.35   0.019    -.1694157   -.0152299
    houshead |  -.0864111   .1303336    -0.66   0.507    -.3418602    .1690379
     married |   -.065464   .1298376    -0.50   0.614    -.3199409     .189013
      female |   .4386603   .1340263     3.27   0.001     .1759735    .7013471
       child |  -.2049337   .1413612    -1.45   0.147    -.4819966    .0721293
      ychild |   .1556684   .1766059     0.88   0.378    -.1904727    .5018095
    nonwhite |  -.3956483   .1761206    -2.25   0.025    -.7408382   -.0504583
         age |   .0001207   .0062519     0.02   0.985    -.0121327    .0123741
     schlt12 |  -.1723734   .1618354    -1.07   0.287     -.489565    .1448182
     schgt12 |  -.0583556    .142103    -0.41   0.681    -.3368724    .2201611
        smsa |   .1120279   .1334106     0.84   0.401    -.1494521    .3735079
    bluecoll |  -.0021333   .1460376    -0.01   0.988    -.2883617    .2840951
      mining |  -.0132972    .401138    -0.03   0.974    -.7995132    .7729188
      constr |   .1654229   .1852256     0.89   0.372    -.1976127    .5284584
      transp |  -.3818733   .2831048    -1.35   0.177    -.9367485    .1730019
       trade |   .1065755   .1677346     0.64   0.525    -.2221782    .4353293
        fire |   -.345295   .2945472    -1.17   0.241    -.9225969    .2320068
    services |   .1443583   .1664345     0.87   0.386    -.1818474     .470564
    pubadmin |   .7203208   .5238954     1.37   0.169    -.3064953    1.747137
      year85 |  -.0647735   .1460286    -0.44   0.657    -.3509844    .2214373
      year87 |   -.138436   .1574958    -0.88   0.379    -.4471221    .1702502
      year89 |    .100033   .1887671     0.53   0.596    -.2699437    .4700097
      midatl |  -.3838124   .2529706    -1.52   0.129    -.8796257    .1120009
       encen |  -.5058645   .2521219    -2.01   0.045    -1.000014   -.0117146
       wncen |   -.081463   .2512893    -0.32   0.746    -.5739811     .411055
    southatl |  -.2799968   .1891246    -1.48   0.139    -.6506742    .0906805
       escen |   .0372908   .2993588     0.12   0.901    -.5494417    .6240233
       wscen |  -.1157119   .2286912    -0.51   0.613    -.5639385    .3325146
    mountain |    .204597   .2206239     0.93   0.354    -.2278179    .6370119
     pacific |  -.2138749   .3899895    -0.55   0.583    -.9782404    .5504905
------------------------------------------------------------------------------

. estimates store bcoxrisk2

. 
. stset spell, fail(censor3=1)

     failure event:  censor3 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. stcox $xlist, nolog nohr robust basesurv(survrisk3) basechazard(chrisk3)

         failure _d:  censor3 == 1
   analysis time _t:  spell

Cox regression -- Breslow method for ties

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          574
Time at risk    =        20887
                                                   Wald chi2(40)   =    357.81
Log pseudo-likelihood = -4094.2361                 Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          RR |  -.4692082   .7157644    -0.66   0.512    -1.872081    .9336643
          DR |   .8759221   .8786992     1.00   0.319    -.8462967    2.598141
          UI |  -.9051384   .4449384    -2.03   0.042    -1.777202   -.0330753
        RRUI |  -.5392752   1.002388    -0.54   0.591    -2.503919    1.425369
        DRUI |   2.293752   1.274021     1.80   0.072    -.2032836    4.790787
     LOGWAGE |  -.0140883   .1415912    -0.10   0.921     -.291602    .2634253
      tenure |  -.0465013   .0118142    -3.94   0.000    -.0696567   -.0233458
       slack |  -.4587556   .0952092    -4.82   0.000    -.6453621   -.2721491
     abolpos |  -.2743895    .136703    -2.01   0.045    -.5423223   -.0064566
     explose |   .0199625   .0843281     0.24   0.813    -.1453176    .1852426
     stateur |  -.1013309   .0311307    -3.26   0.001    -.1623459   -.0403159
    houshead |   .5154239   .1031203     5.00   0.000     .3133117     .717536
     married |   .0280002   .1037338     0.27   0.787    -.1753143    .2313148
      female |   .2477194   .1071841     2.31   0.021     .0376425    .4577962
       child |  -.1477253   .1086376    -1.36   0.174    -.3606511    .0652005
      ychild |  -.0702224   .1341067    -0.52   0.601    -.3330667    .1926219
    nonwhite |  -.4472066   .1401892    -3.19   0.001    -.7219723   -.1724409
         age |  -.0227849   .0053188    -4.28   0.000    -.0332096   -.0123602
     schlt12 |  -.1050265   .1191449    -0.88   0.378    -.3385462    .1284931
     schgt12 |   .0912594   .1057371     0.86   0.388    -.1159815    .2985004
        smsa |   .0078536   .0994133     0.08   0.937    -.1869928       .2027
    bluecoll |   .2916892   .1085873     2.69   0.007     .0788619    .5045165
      mining |   .2392902   .2514416     0.95   0.341    -.2535263    .7321067
      constr |   .0659352   .1393882     0.47   0.636    -.2072606     .339131
      transp |  -.0724276   .1845329    -0.39   0.695    -.4341054    .2892502
       trade |   .0824395   .1260009     0.65   0.513    -.1645178    .3293967
        fire |  -.3901171   .2648329    -1.47   0.141      -.90918    .1289458
    services |   .0007351   .1296195     0.01   0.995    -.2533144    .2547847
    pubadmin |  -1.749927   1.038715    -1.68   0.092    -3.785771    .2859182
      year85 |   .2810465   .1124259     2.50   0.012     .0606957    .5013973
      year87 |   .4139684    .117016     3.54   0.000     .1846212    .6433155
      year89 |  -.1485614   .1590621    -0.93   0.350    -.4603173    .1631946
      midatl |  -.5271828   .2165005    -2.44   0.015    -.9515159   -.1028497
       encen |   -.063171   .1962513    -0.32   0.748    -.4478166    .3214745
       wncen |    .134275   .2051501     0.65   0.513    -.2678118    .5363617
    southatl |   .1522905   .1610446     0.95   0.344    -.1633512    .4679321
       escen |  -.5030762   .3118938    -1.61   0.107    -1.114377    .1082245
       wscen |   .0116807   .1858946     0.06   0.950     -.352666    .3760273
    mountain |   .2043736   .1827277     1.12   0.263    -.1537662    .5625134
     pacific |   .4327009   .2661013     1.63   0.104     -.088848    .9542498
------------------------------------------------------------------------------


. estimates store bcoxrisk3

. 
. * Table 19.3 (page 659) last three columns
. * NOTE: The results from this program differ a little from those
.* given in text. Need to resolve this.
. estimates table bcoxrisk1 bcoxrisk2 bcoxrisk3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)

----------------------------------------------------
    Variable |   bcoxrisk1   bcoxrisk2   bcoxrisk3
-------------+--------------------------------------
          RR |       0.522      -0.072      -0.469
             |       0.571       0.951       0.716
          DR |      -0.753      -1.024       0.876
             |       0.722       1.193       0.879
          UI |      -1.318      -0.906      -0.905
             |       0.237       0.511       0.445
        RRUI |       0.882      -0.782      -0.539
             |       0.582       1.166       1.002
        DRUI |      -0.095       2.032       2.294
             |       0.978       1.672       1.274
     LOGWAGE |       0.335      -0.280      -0.014
             |       0.111       0.174       0.142
      tenure |       0.001       0.006      -0.047
             |       0.006       0.012       0.012
-------------+--------------------------------------
           N |    3343.000    3343.000    3343.000
          ll |   -7717.233   -2444.342   -4094.236
----------------------------------------------------
                                        legend: b/se

. 
. *** (2D) GRAPHS FOR COX COMPETING RISKS MODEL
. 
. * Figure 19.1 (page 661) - Plot the three baseline survival functions
. sort _t
. graph twoway (scatter survrisk1 _t, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter survrisk2 _t, c(J) msymbol(i) msize(small) clstyle(p2)) /*
> */ (scatter survrisk3 _t, c(J) msymbol(i) msize(small) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Baseline Survival Functions") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Baseline Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Risk 1 (full-time job)") label(2 "Risk 2 (part-time job)") label(3 "Risk 3 (
> unknown job)"))

. graph export combined_bsf.wmf, replace
(file c:\Imbook\bwebpage\Section4\combined_bsf.wmf written in Windows Metafile format)
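The curves plotted here come from the `basesurv()` and `basechazard()` options of `stcox`. As a rough guide to what a baseline cumulative hazard is, the Python sketch below computes the Breslow estimator, H0(t) = sum over failure times t_j <= t of d_j / sum over the risk set R(t_j) of exp(x_i'b), with the coefficients b taken as given, and converts it to a survivor curve through exp(-H0(t)). It is a minimal illustration on hypothetical toy data, not a reproduction of Stata's computations; in particular, the survivor function Stata saves need not equal exp(-H0(t)) exactly when there are tied failure times.

```python
import math


def breslow_cum_hazard(times, events, xb):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    times  : observed spell lengths
    events : 1 if the spell ends in the risk being modelled, 0 if censored
    xb     : linear index x'b for each spell, with b taken as given
    Returns a list of (t_j, H0(t_j)) at each distinct failure time t_j.
    """
    risk_score = [math.exp(v) for v in xb]
    fail_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum_haz, path = 0.0, []
    for tj in fail_times:
        # number of failures at t_j (tied failures share one risk set)
        d_j = sum(1 for t, d in zip(times, events) if t == tj and d == 1)
        # risk set R(t_j): spells whose observed length is at least t_j
        denom = sum(s for t, s in zip(times, risk_score) if t >= tj)
        cum_haz += d_j / denom
        path.append((tj, cum_haz))
    return path


# Hypothetical toy data; with xb = 0 for everyone this is the Nelson-Aalen estimator.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
xb = [0.0] * len(times)

for t, h in breslow_cum_hazard(times, events, xb):
    print(f"t = {t}:  H0 = {h:.4f},  exp(-H0) = {math.exp(-h):.4f}")
```

With all linear indices set to zero the estimator collapses to the Nelson-Aalen cumulative hazard, which is a convenient check of the risk-set bookkeeping.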

. 
. * Figure 19.2 (page 659) - Plot the three baseline cumulative hazards
. sort _t
. graph twoway (scatter chrisk1 _t, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter chrisk2 _t, c(J) msymbol(i) msize(small) clstyle(p2)) /*
> */ (scatter chrisk3 _t, c(J) msymbol(i) msize(small) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Baseline Cumulative Hazard Functions") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Baseline Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Risk 1 (full-time job)") label(2 "Risk 2 (part-time job)") label(3 "Risk 3 (
> unknown job)"))

. graph export combined_cbh.wmf, replace
(file c:\Imbook\bwebpage\Section4\combined_cbh.wmf written in Windows Metafile format)

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section4\mma19p1comprisks.txt
  log type:  text
 closed on:  19 May 2005, 17:53:08
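In the inverse-Gaussian frailty output, `streg` estimates the ancillary parameters on the log scale (`/ln_p` and `/ln_the`) and then reports p = exp(ln_p), 1/p = exp(-ln_p) and theta = exp(ln_the). Because d exp(b)/db = exp(b), the delta-method standard error of exp(b) is exp(b) times se(b), and the reported confidence limits are the exponentiated limits for b. The Python sketch below checks this against the risk-1 (`bweibigr1`) estimates printed above; the helper name `exp_transform` is illustrative only.

```python
import math
from statistics import NormalDist


def exp_transform(b, se_b, level=0.95):
    """Return exp(b), its delta-method s.e., and the CI from exponentiating b's CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.959964 for a 95% interval
    value = math.exp(b)
    se_value = value * se_b                    # d exp(b)/db = exp(b)
    ci = (math.exp(b - z * se_b), math.exp(b + z * se_b))
    return value, se_value, ci


# /ln_p and /ln_the for risk 1 (bweibigr1), copied from the output above
ln_p, se_ln_p = 0.5611667, 0.0225898
ln_the, se_ln_the = 1.852696, 0.0896755

p, se_p, ci_p = exp_transform(ln_p, se_ln_p)
inv_p, se_inv_p, ci_inv_p = exp_transform(-ln_p, se_ln_p)  # 1/p = exp(-ln_p)
theta, se_theta, ci_theta = exp_transform(ln_the, se_ln_the)

for name, (v, s, (lo, hi)) in {
    "p": (p, se_p, ci_p),
    "1/p": (inv_p, se_inv_p, ci_inv_p),
    "theta": (theta, se_theta, ci_theta),
}.items():
    print(f"{name:>5}: {v:.6f}  se {s:.7f}  95% CI [{lo:.6f}, {hi:.6f}]")
```

The same calculation applied to the `/ln_p` and `/ln_the` rows of `bweibigr2` and `bweibigr3` reproduces their p, 1/p and theta rows up to rounding.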


------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma20p1count.txt
  log type:  text
 opened on:  20 May 2005, 08:41:33

. 
. ********* OVERVIEW OF MMA20P1COUNT.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 20.3 pages 671-4 and 20.7 page 690
. * Count data regression example
. * It provides
. * (1) Frequency distribution for count (Table 20.3)
. * (2) Data summary (Table 20.4)
. * (3) Poisson regression with various standard errors (Table 20.5)
. * (4) Negative binomial regression with various standard errors (Table 20.5)
. 
. * To use this program you need health expenditure data in Stata data set
. * randdata.dta
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
. 
. ********** DATA DESCRIPTION **********
. 
. * Essentially same data as in P. Deb and P.K. Trivedi (2002)
. * "The Structure of Demand for Medical Care: Latent Class versus
. * Two-Part Models", Journal of Health Economics, 21, 601-625
. * except that paper used different outcome (counts rather than $)
. 
. * Each observation is for an individual over a year.
. * Individuals may appear in up to five years.
. * All available sample is used except only fee for service plans included.
. * In analysis here only year 2 is used so panel complications are avoided.
. * Clustering of individuals within household is ignored here.
. 
. * Dependent variable is
.* MED        med        Annual medical expenditures in constant dollars
.*                        excluding dental and outpatient mental
.* LNMED      lnmeddol   Ln(Medical expenditures) given meddol > 0

.*                        Missing otherwise
.* DMED       binexp     1 if medical expenditures > 0
. 
. * Regressors are
. * - Health insurance measures
.* LC         logc       log(coinsrate+1) where coinsurance rate is 0 to 100
.* IDP        idp        1 if individual deductible plan
.* LPI        lpi        log(annual participation incentive payment) or 0 if no payment
.* FMDE       fmde       log(max(medical deductible expenditure)) if IDP=1 and MDE>1 or 0 otherwise.
. * - Health status measures
.* NDISEASE   disea      number of chronic diseases
.* PHYSLIM    physlm     1 if physical limitation
.* HLTHG      hlthg      1 if good health
.* HLTHF      hlthf      1 if fair health
.* HLTHP      hlthp      1 if poor health (omitted is excellent)
. * - Socioeconomic characteristics
.* LINC       linc       log of annual family income (in $)
.* LFAM       lfam       log of family size
.* EDUCDEC    educdec    years of schooling of decision maker
.* AGE        xage       exact age
.* BLACK      black      1 if black
.* FEMALE     female     1 if female
.* CHILD      child      1 if child
.* FEMCHILD   fchild     1 if female child
. 
. * If panel data used then clustering is on
.* zper       person id
. 
. ********** READ DATA, SELECT AND TRANSFORM **********
. 
. use randdata.dta, clear

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        plan |     20190    11.17553    3.976751          1         19
        site |     20190    3.298811     1.80382          1          6
       coins |     20190     26.3056    36.40386          0        100
    tookphys |     20190    .5974245    .4904288          0          1
        year |     20190    2.420109    1.217141          1          5
-------------+--------------------------------------------------------
        zper |     20190    357965.5    180868.1     125024     632167
       black |     20190    .1814983    .3827071          0          1
      income |     20190    8037.409    4058.371          0   29237.54
        xage |     20190    25.72233    16.76945          0   64.27515
      female |     20190    .5170381     .499722          0          1
-------------+--------------------------------------------------------
     educdec |     20186    11.96681    2.806255          0         25
        time |     20190    .9989561    .0259741   .0767123          1
     outpdol |     20190    51.12649    94.92627          0   2599.902
     drugdol |     20190     13.1687    33.76212          0   706.3979
     suppdol |     20190      6.8024    21.39346          0    1009.47
-------------+--------------------------------------------------------
     mentdol |     20190    6.870347    58.41298          0   1340.834
      inpdol |     20190    100.4694    655.6215          0   38649.81
      meddol |     20190    171.5679    698.2015          0   39182.02
      totadm |     20190    .1127291    .4111857          0          8
      inpmis |     20190    .0039624     .062824          0          1
-------------+--------------------------------------------------------
     mentvis |     20190    .4322437    3.430789          0         62
       mdvis |     20190    2.860426    4.504365          0         77
    notmdvis |     20190    .6855869    3.763543          0        109
         num |     20190    3.954235    1.853034          1         14
         mhi |     20190    76.55584    12.50224       12.2        100
-------------+--------------------------------------------------------
       disea |     20190    11.24449    6.741449          0       58.6
      physlm |     20190    .1235003    .3220164          0          1
      ghindx |     14967    73.09055    15.99371        3.7        100
      mdeoff |     20185    417.8422    384.1199          0       1000
       pioff |     20185     446.677     367.466          0    1291.68
-------------+--------------------------------------------------------
       child |     20190    .4013373    .4901812          0          1
      fchild |     20190    .1937098    .3952139          0          1
        lfam |     20190    1.248156     .539301          0   2.639057
         lpi |     20190    4.707894     2.69784          0   7.163699
         idp |     20190    .2599802    .4386343          0          1
-------------+--------------------------------------------------------
        logc |     20190    2.383342    2.041776          0   4.564348
        fmde |     20190    4.029524    3.471353          0   8.294049
       hlthg |     20190    .3620109    .4805938          0          1
       hlthf |     20190     .077266    .2670196          0          1
       hlthp |     20190    .0149579    .1213874          0          1
-------------+--------------------------------------------------------
     xghindx |     20190     73.2375     14.2332        3.7        100
        linc |     20190    8.708265    1.228309          0   10.28324
        lnum |     20190    1.248156     .539301          0   2.639057
    lnmeddol |     15737    4.109318    1.484654  -.8495329   10.57597
      binexp |     20190    .7794453     .414631          0          1

. 
. /* Describe and summarize the original data.
> describe
> summarize
> * The original data are a panel.
> * The following summarizes panel features for completeness > iis zper > tis year > xtdes > xtsum meddol lnmeddol binexp 427

> */
.
. * Note that unlike chapter 16 we use all years, not just year 2
.
. * educdec is missing for some observations
. drop if educdec==.
(4 observations deleted)
.
. * rename variables
. rename mdvis MDU
. rename meddol MED
. rename binexp DMED
. rename lnmeddol LNMED
. rename linc LINC
. rename lfam LFAM
. rename educdec EDUCDEC
. rename xage AGE
. rename female FEMALE
. rename child CHILD
. rename fchild FEMCHILD
. rename black BLACK
. rename disea NDISEASE
. rename physlm PHYSLIM
. rename hlthg HLTHG
. rename hlthf HLTHF
. rename hlthp HLTHP
. rename idp IDP
. rename logc LC
. rename lpi LPI
. rename fmde FMDE
.
. * Define the regressor list, which commands can refer to as $XLIST
. global XLIST LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
> */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK
.
. sum MDU $XLIST

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MDU |     20186    2.860696    4.504765          0         77
          LC |     20186    2.383588    2.041713          0   4.564348
         IDP |     20186    .2599822    .4386354          0          1
         LPI |     20186    4.708827    2.697293          0   7.163699
        FMDE |     20186    4.030322    3.471234          0   8.294049
-------------+--------------------------------------------------------
     PHYSLIM |     20186    .1235247    .3220437          0          1
    NDISEASE |     20186     11.2445    6.741647          0       58.6
       HLTHG |     20186    .3620826    .4806144          0          1
       HLTHF |     20186    .0772813    .2670439          0          1
       HLTHP |     20186    .0149609    .1213992          0          1
-------------+--------------------------------------------------------
        LINC |     20186    8.708167     1.22841          0   10.28324
        LFAM |     20186    1.248404    .5390681          0   2.639057
     EDUCDEC |     20186    11.96681    2.806255          0         25
         AGE |     20186    25.71844    16.76759          0   64.27515
      FEMALE |     20186    .5169424    .4997252          0          1
-------------+--------------------------------------------------------
       CHILD |     20186    .4014168    .4901972          0          1
    FEMCHILD |     20186    .1937481    .3952436          0          1
       BLACK |     20186    .1815343    .3827365          0          1

.
. * Write final data to a text (ascii) file so that it can be used with programs other than Stata
. outfile MDU LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
> */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK /*
> */ using mma20p1count.asc, replace
.
. ********** (1) FREQUENCIES OF COUNT (Table 20.3, page 672) **********
.
. * The following gives the Table 20.3 (page 672) frequencies
. tabulate MDU

  number of |
face-to-fac |
t md visits |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      6,308       31.25       31.25
          1 |      3,815       18.90       50.15
          2 |      2,795       13.85       63.99
          3 |      1,884        9.33       73.33
          4 |      1,345        6.66       79.99
          5 |        968        4.80       84.79
          6 |        689        3.41       88.20
          7 |        531        2.63       90.83
          8 |        408        2.02       92.85
          9 |        287        1.42       94.27
         10 |        206        1.02       95.29
         11 |        190        0.94       96.24
         12 |        118        0.58       96.82
         13 |        109        0.54       97.36
         14 |         82        0.41       97.77
         15 |         59        0.29       98.06
         16 |         56        0.28       98.34
         17 |         33        0.16       98.50
         18 |         37        0.18       98.68
         19 |         35        0.17       98.86
         20 |         26        0.13       98.98
         21 |         22        0.11       99.09
         22 |         19        0.09       99.19
         23 |         19        0.09       99.28
         24 |         13        0.06       99.35
         25 |          8        0.04       99.39
         26 |         10        0.05       99.44
         27 |          6        0.03       99.46
         28 |         12        0.06       99.52
         29 |          6        0.03       99.55
         30 |          8        0.04       99.59
         31 |          8        0.04       99.63
         32 |          4        0.02       99.65
         33 |          5        0.02       99.68
         34 |          9        0.04       99.72
         35 |          5        0.02       99.75
         37 |          5        0.02       99.77
         38 |          9        0.04       99.82
         39 |          1        0.00       99.82
         40 |          3        0.01       99.84
         41 |          5        0.02       99.86
         44 |          6        0.03       99.89
         45 |          2        0.01       99.90
         46 |          2        0.01       99.91
         48 |          2        0.01       99.92
         51 |          1        0.00       99.93
         52 |          3        0.01       99.94
         55 |          1        0.00       99.95
         56 |          1        0.00       99.95
         57 |          1        0.00       99.96
         58 |          1        0.00       99.96
         62 |          1        0.00       99.97
         63 |          1        0.00       99.97
         65 |          1        0.00       99.98
         69 |          1        0.00       99.98
         72 |          1        0.00       99.99
         74 |          1        0.00       99.99
         76 |          1        0.00      100.00
         77 |          1        0.00      100.00
------------+-----------------------------------
      Total |     20,186      100.00
.
. * Histogram with kernel density estimate
. hist MDU, discrete kdensity
(start=0, width=1)
.
. ********** (2) DATA SUMMARY (Table 20.4, page 672) **********
.
. * The following gives variables in the same order as Table 20.4 (page 672)
. sum MDU LC IDP LPI FMDE LINC LFAM AGE FEMALE CHILD FEMCHILD BLACK /*
> */ EDUCDEC PHYSLIM NDISEASE HLTHG HLTHF HLTHP

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MDU |     20186    2.860696    4.504765          0         77
          LC |     20186    2.383588    2.041713          0   4.564348
         IDP |     20186    .2599822    .4386354          0          1
         LPI |     20186    4.708827    2.697293          0   7.163699
        FMDE |     20186    4.030322    3.471234          0   8.294049
-------------+--------------------------------------------------------
        LINC |     20186    8.708167     1.22841          0   10.28324
        LFAM |     20186    1.248404    .5390681          0   2.639057
         AGE |     20186    25.71844    16.76759          0   64.27515
      FEMALE |     20186    .5169424    .4997252          0          1
       CHILD |     20186    .4014168    .4901972          0          1
-------------+--------------------------------------------------------
    FEMCHILD |     20186    .1937481    .3952436          0          1
       BLACK |     20186    .1815343    .3827365          0          1
     EDUCDEC |     20186    11.96681    2.806255          0         25
     PHYSLIM |     20186    .1235247    .3220437          0          1
    NDISEASE |     20186     11.2445    6.741647          0       58.6
-------------+--------------------------------------------------------
       HLTHG |     20186    .3620826    .4806144          0          1
       HLTHF |     20186    .0772813    .2670439          0          1
       HLTHP |     20186    .0149609    .1213992          0          1

.
.
. *********** (3, 4) REGRESSION ANALYSIS **************
.
. * Here just two estimators, Poisson and negative binomial,
. * but three ways to calculate standard errors
. * (A) default ML
. * (B) robust (to misspecification of heteroskedasticity)
. * (C) cluster-robust, needed here as the data are actually a panel (see chapters 21 and 24)
.
. *** Table 20.5 Poisson regression estimates
.
. * Default standard errors assume variance = mean (ignoring overdispersion)
. * This is the first t-ratio in Table 20.5
. poisson MDU $XLIST

Iteration 0:   log likelihood = -60097.599
Iteration 1:   log likelihood = -60087.636
Iteration 2:   log likelihood = -60087.622
Iteration 3:   log likelihood = -60087.622

Poisson regression                                Number of obs   =      20186
                                                  LR chi2(17)     =   13106.07
                                                  Prob > chi2     =     0.0000
Log likelihood = -60087.622                       Pseudo R2       =     0.0983

------------------------------------------------------------------------------
         MDU |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0427332   .0060785    -7.03   0.000    -.0546469   -.0308195
         IDP |  -.1613169   .0116218   -13.88   0.000    -.1840952   -.1385385
         LPI |   .0128511   .0018362     7.00   0.000     .0092523    .0164499
        FMDE |   -.020613   .0035521    -5.80   0.000     -.027575   -.0136511
     PHYSLIM |   .2684048   .0123624    21.71   0.000     .2441749    .2926347
    NDISEASE |    .023183   .0006081    38.12   0.000     .0219912    .0243749
       HLTHG |   .0394004   .0095884     4.11   0.000     .0206074    .0581934
       HLTHF |   .2531119    .016212    15.61   0.000     .2213369    .2848869
       HLTHP |   .5216034   .0272382    19.15   0.000     .4682176    .5749892
        LINC |   .0834099   .0051656    16.15   0.000     .0732854    .0935343
        LFAM |  -.1296626   .0089603   -14.47   0.000    -.1472245   -.1121008
     EDUCDEC |   .0176149   .0016387    10.75   0.000     .0144031    .0208268
         AGE |   .0023756   .0004311     5.51   0.000     .0015306    .0032206
      FEMALE |   .3487667   .0113504    30.73   0.000     .3265203     .371013
       CHILD |   .3361904   .0178194    18.87   0.000     .3012649    .3711158
    FEMCHILD |  -.3625218   .0179396   -20.21   0.000    -.3976827   -.3273608
       BLACK |  -.6800518   .0155484   -43.74   0.000    -.7105262   -.6495775
       _cons |  -.1898766   .0491731    -3.86   0.000    -.2862541    -.093499
------------------------------------------------------------------------------
. estimates store poisml
.
. * Should always control for possible overdispersion
. * This is the second t-ratio in Table 20.5
. poisson MDU $XLIST, robust

Iteration 0:   log pseudo-likelihood = -60097.599
Iteration 1:   log pseudo-likelihood = -60087.636
Iteration 2:   log pseudo-likelihood = -60087.622
Iteration 3:   log pseudo-likelihood = -60087.622

Poisson regression                                Number of obs   =      20186
                                                  Wald chi2(17)   =    1924.78
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -60087.622                Pseudo R2       =     0.0983

------------------------------------------------------------------------------
             |               Robust
         MDU |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0427332   .0150712    -2.84   0.005    -.0722723   -.0131942
         IDP |  -.1613169   .0279441    -5.77   0.000    -.2160863   -.1065474
         LPI |   .0128511   .0044136     2.91   0.004     .0042007    .0215015
        FMDE |   -.020613   .0088874    -2.32   0.020    -.0380319   -.0031941
     PHYSLIM |   .2684048   .0325743     8.24   0.000     .2045604    .3322493
    NDISEASE |    .023183   .0017189    13.49   0.000      .019814    .0265521
       HLTHG |   .0394004    .023194     1.70   0.089     -.006059    .0848598
       HLTHF |   .2531119   .0429454     5.89   0.000     .1689405    .3372833
       HLTHP |   .5216034   .0748808     6.97   0.000     .3748398     .668367
        LINC |   .0834099   .0139182     5.99   0.000     .0561306    .1106891
        LFAM |  -.1296626   .0226793    -5.72   0.000    -.1741132    -.085212
     EDUCDEC |   .0176149    .004042     4.36   0.000     .0096927    .0255371
         AGE |   .0023756   .0011184     2.12   0.034     .0001837    .0045675
      FEMALE |   .3487667   .0283549    12.30   0.000      .293192    .4043413
       CHILD |   .3361904    .040411     8.32   0.000     .2569863    .4153945
    FEMCHILD |  -.3625218     .04415    -8.21   0.000    -.4490542   -.2759893
       BLACK |  -.6800518   .0368748   -18.44   0.000    -.7523252   -.6077785
       _cons |  -.1898766    .127516    -1.49   0.136    -.4398033    .0600502
------------------------------------------------------------------------------
. estimates store poisrobust
.
. * Should also control here for clustering (see chapter 24)
. * as up to four years of data for each person.
. * Table 20.5 did not report these results
. poisson MDU $XLIST, cluster(zper)

Iteration 0:   log pseudo-likelihood = -60097.599
Iteration 1:   log pseudo-likelihood = -60087.636
Iteration 2:   log pseudo-likelihood = -60087.622
Iteration 3:   log pseudo-likelihood = -60087.622

Poisson regression                                Number of obs   =      20186
                                                  Wald chi2(17)   =     827.07
Log pseudo-likelihood = -60087.622                Prob > chi2     =     0.0000

                          (standard errors adjusted for clustering on zper)
------------------------------------------------------------------------------
             |               Robust
         MDU |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0427332   .0226824    -1.88   0.060    -.0871899    .0017235
         IDP |  -.1613169   .0424591    -3.80   0.000    -.2445352   -.0780986
         LPI |   .0128511   .0067697     1.90   0.058    -.0004173    .0261195
        FMDE |   -.020613   .0134449    -1.53   0.125    -.0469646    .0057386
     PHYSLIM |   .2684048   .0491061     5.47   0.000     .1721586     .364651
    NDISEASE |    .023183   .0027457     8.44   0.000     .0178015    .0285645
       HLTHG |   .0394004   .0354001     1.11   0.266    -.0299825    .1087833
       HLTHF |   .2531119   .0675164     3.75   0.000     .1207822    .3854416
       HLTHP |   .5216034   .1163731     4.48   0.000     .2935163    .7496905
        LINC |   .0834099   .0200881     4.15   0.000     .0440379    .1227818
        LFAM |  -.1296626   .0340038    -3.81   0.000    -.1963089   -.0630164
     EDUCDEC |   .0176149   .0062678     2.81   0.005     .0053302    .0298996
         AGE |   .0023756   .0016549     1.44   0.151    -.0008681    .0056192
      FEMALE |   .3487667   .0432567     8.06   0.000      .263985    .4335483
       CHILD |   .3361904   .0586109     5.74   0.000     .2213151    .4510656
    FEMCHILD |  -.3625218   .0660639    -5.49   0.000    -.4920045    -.233039
       BLACK |  -.6800518   .0544268   -12.49   0.000    -.7867263   -.5733774
       _cons |  -.1898766   .1860343    -1.02   0.307    -.5544971     .174744
------------------------------------------------------------------------------
. estimates store poiscluster
.
. *** Table 20.5 Negative binomial regression estimates
.
. * Default ML standard errors assume the negative binomial (NB2) variance function is correct
. * This is the first t-ratio in Table 20.5
. nbreg MDU $XLIST

Fitting Poisson model:

Iteration 0:   log likelihood = -60097.599
Iteration 1:   log likelihood = -60087.636
Iteration 2:   log likelihood = -60087.622
Iteration 3:   log likelihood = -60087.622

Fitting constant-only model:

Iteration 0:   log likelihood = -44579.449
Iteration 1:   log likelihood = -44192.261
Iteration 2:   log likelihood = -44191.615
Iteration 3:   log likelihood = -44191.615

Fitting full model:

Iteration 0:   log likelihood = -42968.574
Iteration 1:   log likelihood = -42783.342
Iteration 2:   log likelihood = -42777.614
Iteration 3:   log likelihood = -42777.611

Negative binomial regression                      Number of obs   =      20186
                                                  LR chi2(17)     =    2828.01
                                                  Prob > chi2     =     0.0000
Log likelihood = -42777.611                       Pseudo R2       =     0.0320

------------------------------------------------------------------------------
         MDU |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0504405   .0128694    -3.92   0.000    -.0756641   -.0252169
         IDP |  -.1475976   .0254099    -5.81   0.000    -.1974001   -.0977951
         LPI |   .0158351   .0040586     3.90   0.000     .0078805    .0237898
        FMDE |   -.021335   .0075119    -2.84   0.005     -.036058   -.0066119
     PHYSLIM |   .2751715   .0295572     9.31   0.000     .2172404    .3331026
    NDISEASE |   .0259352   .0014827    17.49   0.000     .0230292    .0288412
       HLTHG |   .0065371   .0202235     0.32   0.747    -.0331002    .0461744
       HLTHF |   .2368643   .0374086     6.33   0.000     .1635448    .3101837
       HLTHP |   .4256563   .0741812     5.74   0.000     .2802638    .5710488
        LINC |   .0845165   .0085659     9.87   0.000     .0677277    .1013053
        LFAM |  -.1226764    .019308    -6.35   0.000    -.1605195   -.0848333
     EDUCDEC |   .0162582   .0034846     4.67   0.000     .0094285    .0230879
         AGE |   .0025943   .0009433     2.75   0.006     .0007455    .0044432
      FEMALE |   .3672884    .024005    15.30   0.000     .3202395    .4143373
       CHILD |   .3060317   .0385618     7.94   0.000      .230452    .3816115
    FEMCHILD |  -.3755503   .0371392   -10.11   0.000    -.4483418   -.3027587
       BLACK |  -.7104372   .0274929   -25.84   0.000    -.7643223   -.6565521
       _cons |  -.2069298   .0899431    -2.30   0.021    -.3832151   -.0306445
-------------+----------------------------------------------------------------
    /lnalpha |   .1674206   .0147901                      .1384326    .1964087
-------------+----------------------------------------------------------------
       alpha |   1.182251   .0174856                      1.148472    1.217024
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 3.5e+04 Prob>=chibar2 = 0.000
. estimates store nbml
.
. * Robust standard errors guard against misspecification of the NB2 variance function
. * This is the second t-ratio in Table 20.5
. nbreg MDU $XLIST, robust

Fitting Poisson model:

Iteration 0:   log pseudo-likelihood = -60097.599
Iteration 1:   log pseudo-likelihood = -60087.636
Iteration 2:   log pseudo-likelihood = -60087.622
Iteration 3:   log pseudo-likelihood = -60087.622

Fitting constant-only model:

Iteration 0:   log pseudo-likelihood = -44579.449
Iteration 1:   log pseudo-likelihood = -44192.261
Iteration 2:   log pseudo-likelihood = -44191.615
Iteration 3:   log pseudo-likelihood = -44191.615

Fitting full model:

Iteration 0:   log pseudo-likelihood = -42968.574
Iteration 1:   log pseudo-likelihood = -42783.342
Iteration 2:   log pseudo-likelihood = -42777.614
Iteration 3:   log pseudo-likelihood = -42777.611

Negative binomial regression                      Number of obs   =      20186
                                                  Wald chi2(17)   =    2203.12
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -42777.611                Pseudo R2       =     0.0320

------------------------------------------------------------------------------
             |               Robust
         MDU |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0504405   .0156238    -3.23   0.001    -.0810625   -.0198184
         IDP |  -.1475976   .0303777    -4.86   0.000    -.2071367   -.0880585
         LPI |   .0158351    .004431     3.57   0.000     .0071505    .0245197
        FMDE |   -.021335   .0090748    -2.35   0.019    -.0391211   -.0035488
     PHYSLIM |   .2751715   .0341067     8.07   0.000     .2083235    .3420195
    NDISEASE |   .0259352   .0016925    15.32   0.000      .022618    .0292524
       HLTHG |   .0065371    .023814     0.27   0.784    -.0401375    .0532118
       HLTHF |   .2368643   .0436579     5.43   0.000     .1512963    .3224322
       HLTHP |   .4256563   .0686042     6.20   0.000     .2911945     .560118
        LINC |   .0845165   .0113918     7.42   0.000     .0621891     .106844
        LFAM |  -.1226764   .0231639    -5.30   0.000    -.1680769   -.0772759
     EDUCDEC |   .0162582   .0040332     4.03   0.000     .0083533     .024163
         AGE |   .0025943   .0011128     2.33   0.020     .0004133    .0047753
      FEMALE |   .3672884   .0285724    12.85   0.000     .3112876    .4232892
       CHILD |   .3060317   .0428976     7.13   0.000      .221954    .3901095
    FEMCHILD |  -.3755503   .0447039    -8.40   0.000    -.4631682   -.2879323
       BLACK |  -.7104372   .0359462   -19.76   0.000    -.7808903    -.639984
       _cons |  -.2069298   .1130753    -1.83   0.067    -.4285533    .0146938
-------------+----------------------------------------------------------------
    /lnalpha |   .1674206   .0187562                      .1306591    .2041821
-------------+----------------------------------------------------------------
       alpha |   1.182251   .0221746                      1.139579    1.226522
------------------------------------------------------------------------------
. estimates store nbrobust
.
. * Should also control here for clustering (see chapter 24)
. * as up to four years of data for each person.
. * Table 20.5 did not report these results
. nbreg MDU $XLIST, cluster(zper)

Fitting Poisson model:

Iteration 0:   log pseudo-likelihood = -60097.599
Iteration 1:   log pseudo-likelihood = -60087.636
Iteration 2:   log pseudo-likelihood = -60087.622
Iteration 3:   log pseudo-likelihood = -60087.622

Fitting constant-only model:

Iteration 0:   log pseudo-likelihood = -44579.449
Iteration 1:   log pseudo-likelihood = -44192.261
Iteration 2:   log pseudo-likelihood = -44191.615
Iteration 3:   log pseudo-likelihood = -44191.615

Fitting full model:

Iteration 0:   log pseudo-likelihood = -42968.574
Iteration 1:   log pseudo-likelihood = -42783.342
Iteration 2:   log pseudo-likelihood = -42777.614
Iteration 3:   log pseudo-likelihood = -42777.611

Negative binomial regression                      Number of obs   =      20186
                                                  Wald chi2(17)   =    1034.43
Log pseudo-likelihood = -42777.611                Prob > chi2     =     0.0000

                          (standard errors adjusted for clustering on zper)
------------------------------------------------------------------------------
             |               Robust
         MDU |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0504405   .0236804    -2.13   0.033    -.0968533   -.0040277
         IDP |  -.1475976   .0457769    -3.22   0.001    -.2373186   -.0578766
         LPI |   .0158351   .0066968     2.36   0.018     .0027096    .0289607
        FMDE |   -.021335   .0137245    -1.55   0.120    -.0482344    .0055645
     PHYSLIM |   .2751715   .0489905     5.62   0.000     .1791519     .371191
    NDISEASE |   .0259352   .0025814    10.05   0.000     .0208758    .0309946
       HLTHG |   .0065371   .0359676     0.18   0.856    -.0639581    .0770323
       HLTHF |   .2368643   .0653989     3.62   0.000     .1086848    .3650437
       HLTHP |   .4256563   .1000813     4.25   0.000     .2295005     .621812
        LINC |   .0845165   .0152197     5.55   0.000     .0546864    .1143467
        LFAM |  -.1226764   .0340453    -3.60   0.000     -.189404   -.0559488
     EDUCDEC |   .0162582   .0059501     2.73   0.006     .0045962    .0279202
         AGE |   .0025943    .001581     1.64   0.101    -.0005045    .0056931
      FEMALE |   .3672884   .0420327     8.74   0.000     .2849059    .4496709
       CHILD |   .3060317   .0598167     5.12   0.000     .1887932    .4232702
    FEMCHILD |  -.3755503   .0649845    -5.78   0.000    -.5029175   -.2481831
       BLACK |  -.7104372   .0531155   -13.38   0.000    -.8145417   -.6063326
       _cons |  -.2069298   .1576721    -1.31   0.189    -.5159613    .1021018
-------------+----------------------------------------------------------------
    /lnalpha |   .1674206   .0252599                      .1179121    .2169291
-------------+----------------------------------------------------------------
       alpha |   1.182251   .0298635                      1.125145    1.242256
------------------------------------------------------------------------------
. estimates store nbcluster
.
. ************ DISPLAY RESULTS FOR TABLE 20.5 (page 673) ************
.
. * Note that, for brevity, Table 20.5 gives the coefficients
. * of only some of the regressors
.
. * First columns of Table 20.5 (page 673) plus cluster-robust
. estimates table poisml poisrobust poiscluster, t stats(N ll rank aic bic) b(%10.4f) t(%10.3f)

----------------------------------------------------
    Variable |     poisml   poisrobust   poisclus~r
-------------+--------------------------------------
          LC |    -0.0427      -0.0427      -0.0427
             |     -7.030       -2.835       -1.884
         IDP |    -0.1613      -0.1613      -0.1613
             |    -13.881       -5.773       -3.799
         LPI |     0.0129       0.0129       0.0129
             |      6.999        2.912        1.898
        FMDE |    -0.0206      -0.0206      -0.0206
             |     -5.803       -2.319       -1.533
     PHYSLIM |     0.2684       0.2684       0.2684
             |     21.711        8.240        5.466
    NDISEASE |     0.0232       0.0232       0.0232
             |     38.124       13.487        8.443
       HLTHG |     0.0394       0.0394       0.0394
             |      4.109        1.699        1.113
       HLTHF |     0.2531       0.2531       0.2531
             |     15.613        5.894        3.749
       HLTHP |     0.5216       0.5216       0.5216
             |     19.150        6.966        4.482
        LINC |     0.0834       0.0834       0.0834
             |     16.147        5.993        4.152
        LFAM |    -0.1297      -0.1297      -0.1297
             |    -14.471       -5.717       -3.813
     EDUCDEC |     0.0176       0.0176       0.0176
             |     10.749        4.358        2.810
         AGE |     0.0024       0.0024       0.0024
             |      5.510        2.124        1.435
      FEMALE |     0.3488       0.3488       0.3488
             |     30.727       12.300        8.063
       CHILD |     0.3362       0.3362       0.3362
             |     18.866        8.319        5.736
    FEMCHILD |    -0.3625      -0.3625      -0.3625
             |    -20.208       -8.211       -5.487
       BLACK |    -0.6801      -0.6801      -0.6801
             |    -43.738      -18.442      -12.495
       _cons |    -0.1899      -0.1899      -0.1899
             |     -3.861       -1.489       -1.021
-------------+--------------------------------------
           N | 20186.0000   20186.0000   20186.0000
          ll | -6.009e+04   -6.009e+04   -6.009e+04
        rank |    18.0000      18.0000      18.0000
         aic |  1.202e+05    1.202e+05    1.202e+05
         bic |  1.204e+05    1.204e+05    1.204e+05
----------------------------------------------------
                                         legend: b/t
.
. * Last columns of Table 20.5 (page 673) give nbml. Also give others.
. estimates table nbml nbrobust nbcluster, t stats(N ll rank aic bic) b(%10.4f) t(%10.3f)

----------------------------------------------------
    Variable |       nbml     nbrobust    nbcluster
-------------+--------------------------------------
MDU          |
          LC |    -0.0504      -0.0504      -0.0504
             |     -3.919       -3.228       -2.130
         IDP |    -0.1476      -0.1476      -0.1476
             |     -5.809       -4.859       -3.224
         LPI |     0.0158       0.0158       0.0158
             |      3.902        3.574        2.365
        FMDE |    -0.0213      -0.0213      -0.0213
             |     -2.840       -2.351       -1.555
     PHYSLIM |     0.2752       0.2752       0.2752
             |      9.310        8.068        5.617
    NDISEASE |     0.0259       0.0259       0.0259
             |     17.492       15.324       10.047
       HLTHG |     0.0065       0.0065       0.0065
             |      0.323        0.275        0.182
       HLTHF |     0.2369       0.2369       0.2369
             |      6.332        5.425        3.622
       HLTHP |     0.4257       0.4257       0.4257
             |      5.738        6.205        4.253
        LINC |     0.0845       0.0845       0.0845
             |      9.867        7.419        5.553
        LFAM |    -0.1227      -0.1227      -0.1227
             |     -6.354       -5.296       -3.603
     EDUCDEC |     0.0163       0.0163       0.0163
             |      4.666        4.031        2.732
         AGE |     0.0026       0.0026       0.0026
             |      2.750        2.331        1.641
      FEMALE |     0.3673       0.3673       0.3673
             |     15.300       12.855        8.738
       CHILD |     0.3060       0.3060       0.3060
             |      7.936        7.134        5.116
    FEMCHILD |    -0.3756      -0.3756      -0.3756
             |    -10.112       -8.401       -5.779
       BLACK |    -0.7104      -0.7104      -0.7104
             |    -25.841      -19.764      -13.375
       _cons |    -0.2069      -0.2069      -0.2069
             |     -2.301       -1.830       -1.312
-------------+--------------------------------------
lnalpha      |
       _cons |     0.1674       0.1674       0.1674
             |     11.320        8.926        6.628
-------------+--------------------------------------
Statistics   |
           N | 20186.0000   20186.0000   20186.0000
          ll | -4.278e+04   -4.278e+04   -4.278e+04
        rank |    19.0000      19.0000      19.0000
         aic | 85593.2220   85593.2220   85593.2220
         bic | 85743.5642   85743.5642   85743.5642
----------------------------------------------------
                                         legend: b/t
.
. * For Poisson, correcting for overdispersion is most important.
. * For negative binomial, overdispersion is already incorporated.
. * For both, controlling for clustering (in this example with panel data)
. * is also needed.
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section4\mma20p1count.txt
  log type:  text
 closed on:  20 May 2005, 08:41:56
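The closing comments say that for Poisson, allowing for overdispersion matters most. A rough unconditional check can be made using a few numbers copied from the output above. The sketch below is in Python only because the log writes the data to mma20p1count.asc for use outside Stata. It computes three things:

1. The variance-to-mean ratio of MDU, which is 1 under a Poisson.
2. The Poisson probability of a zero at the sample mean, compared with the observed share of zeros.
3. The NB2 variance function mu + alpha*mu^2, with alpha = exp(lnalpha) taken from nbreg.

These comparisons ignore the regressors, so treat them as a back-of-the-envelope illustration rather than a formal test.

```python
import math

# Figures copied from the Stata output above (sum MDU, tabulate MDU, nbreg).
MEAN_MDU = 2.860696       # sample mean of MDU
SD_MDU = 4.504765         # sample standard deviation of MDU
ZEROS, N = 6308, 20186    # observations with MDU == 0, and sample size
LNALPHA = 0.1674206       # nbreg estimate of ln(alpha)


def poisson_pmf(k, mu):
    """Poisson probability Pr[Y = k] for mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def nb2_variance(mu, alpha):
    """NB2 variance function mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


alpha = math.exp(LNALPHA)                  # back-transform, as nbreg reports it
dispersion_ratio = SD_MDU ** 2 / MEAN_MDU  # equals 1 for Poisson data with no regressors
p0_poisson = poisson_pmf(0, MEAN_MDU)      # Poisson share of zeros at the sample mean
p0_observed = ZEROS / N                    # observed share of zeros

print(f"alpha = exp(lnalpha)         = {alpha:.4f}")
print(f"variance / mean              = {dispersion_ratio:.2f}")
print(f"Poisson Pr[0] at sample mean = {p0_poisson:.4f}")
print(f"observed share of zeros      = {p0_observed:.4f}")
print(f"NB2 variance at sample mean  = {nb2_variance(MEAN_MDU, alpha):.2f}")
```

With these inputs the variance is roughly seven times the mean. A Poisson with the sample mean implies under 6 percent zeros, against about 31 percent observed. This is in line with the large gap between the default and robust Poisson t-ratios in Table 20.5.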


------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section5\mma21p1panfeandre.txt
  log type:  text
 opened on:  23 May 2005, 11:27:25
.
. ********** OVERVIEW OF MMA21P1PANFEANDRE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 21.3.1-3 pages 709-14
. * Program performs basic panel analysis, mainly using XTREG:
. * It derives most of Table 21.1 and Figures 21.1-21.4
. * (1) pooled OLS
. * (2) between
. * (3) within (or fixed effects)
. * (4) first differences
. * (5) random effects - GLS
. * (6) random effects - MLE
. * (7) Hausman test of FE versus RE
. * Standard errors are default plus panel bootstrap
.
. * The individual effects model is
. * y_it = x_it'b + a_i + e_it
. * Default panel output assumes e_it is i.i.d.
. * This is usually too strong an assumption.
. * Instead we should get panel-robust or cluster-robust standard errors after xtreg
. * See Section 21.2.3 pages 709-12
. * Stata version 8 does not do this but Stata version 9 does.
.
. * Three ways to obtain panel-robust se's for fixed and random effects models:
. * (1) Use Stata version 9 and the cluster option in xtreg
. * (2) Use Stata version 8 xtreg and then panel bootstrap (this program)
. * (3) Use Stata version 8 regress with the cluster option on the transformed model (next program)
.
. * The four basic linear panel programs are
. * mma21p1panfeandre.do    Linear fixed and random effects using xtreg
. * mma21p2panfeandre.do    Linear fe and re using transformation and regress
. *                         plus also has valid Hausman test
. * mma21p3panresiduals.do  Residual analysis after linear fe and re
. * mma21p4panpangls.do     Pooled panel OLS and GLS
.
. * To run this program you need data file
. * MOM.dat
.
. * To speed up this program reduce nreps, the number of bootstraps
. * used in the panel bootstrap to get panel-robust standard errors
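The overview lists several estimators of the individual effects model y_it = x_it'b + a_i + e_it. For a single regressor in a balanced panel, the pooled, between, within and first-difference slope estimators can be sketched in plain Python as below. This is an illustration on a hypothetical toy panel, not a description of how xtreg is implemented. Random effects (GLS and MLE) and the Hausman test are left out. First differences are fit without an intercept, matching the model as written.

```python
def ols_slope(x, y, intercept=True):
    """Least-squares slope of y on x, with or without an intercept."""
    if intercept:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        x = [v - mx for v in x]
        y = [v - my for v in y]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def stack(z):
    """Flatten per-individual lists into one list of N*T observations."""
    return [v for zi in z for v in zi]


def panel_slopes(x, y):
    """Slope estimators for a balanced panel given as per-individual lists."""
    xbar = [sum(xi) / len(xi) for xi in x]      # individual means of x
    ybar = [sum(yi) / len(yi) for yi in y]      # individual means of y
    return {
        # (1) pooled OLS on all N*T observations
        "pooled": ols_slope(stack(x), stack(y)),
        # (2) between: regress individual means of y on individual means of x
        "between": ols_slope(xbar, ybar),
        # (3) within: deviations from individual means, so a_i drops out
        "within": ols_slope(
            [v - m for xi, m in zip(x, xbar) for v in xi],
            [v - m for yi, m in zip(y, ybar) for v in yi],
            intercept=False,
        ),
        # (4) first differences: a_i also drops out; no intercept, as in the model
        "first_diff": ols_slope(
            [b - a for xi in x for a, b in zip(xi, xi[1:])],
            [b - a for yi in y for a, b in zip(yi, yi[1:])],
            intercept=False,
        ),
    }


# Hypothetical toy panel: 3 individuals x 3 periods, true slope 0.5, and an
# individual effect a_i = 5*i that is larger for individuals with larger x.
x = [[2 * i + t for t in range(3)] for i in range(3)]
y = [[5 * i + 0.5 * v for v in xi] for i, xi in enumerate(x)]
print(panel_slopes(x, y))
```

In this toy panel a_i rises with the level of x. As a result, the within and first-difference estimates recover the slope of 0.5, while pooled OLS (2.5) and between (3.0) do not. This is the situation in which fixed effects estimation is preferred.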

.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * The original data are from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited, ordered by person, with a separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
.
. ********** READ DATA **********
.
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)
.
. ********** DATA TRANSFORMATIONS AND CHECK **********
.
. * Create year dummies
. tabulate year, generate(dyear)

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1979 |        532       10.00       10.00
       1980 |        532       10.00       20.00
       1981 |        532       10.00       30.00
       1982 |        532       10.00       40.00
       1983 |        532       10.00       50.00
       1984 |        532       10.00       60.00
       1985 |        532       10.00       70.00
       1986 |        532       10.00       80.00
       1987 |        532       10.00       90.00
       1988 |        532       10.00      100.00
------------+-----------------------------------
      Total |      5,320      100.00
.
. * The following lists the variables in the data set and summarizes the data
. describe

Contains data
  obs:         5,320
 vars:            18
 size:       244,720 (97.6% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
lnhr            float  %9.0g
lnwg            float  %9.0g
kids            float  %9.0g
ageh            float  %9.0g
agesq           float  %9.0g
disab           float  %9.0g
id              float  %9.0g
year            float  %9.0g
dyear1          byte   %8.0g                  year== 1979.0000
dyear2          byte   %8.0g                  year== 1980.0000
dyear3          byte   %8.0g                  year== 1981.0000
dyear4          byte   %8.0g                  year== 1982.0000
dyear5          byte   %8.0g                  year== 1983.0000
dyear6          byte   %8.0g                  year== 1984.0000
dyear7          byte   %8.0g                  year== 1985.0000
dyear8          byte   %8.0g                  year== 1986.0000
dyear9          byte   %8.0g                  year== 1987.0000
dyear10         byte   %8.0g                  year== 1988.0000
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
      dyear1 |      5320          .1    .3000282          0          1
      dyear2 |      5320          .1    .3000282          0          1
-------------+--------------------------------------------------------
      dyear3 |      5320          .1    .3000282          0          1
      dyear4 |      5320          .1    .3000282          0          1
      dyear5 |      5320          .1    .3000282          0          1
      dyear6 |      5320          .1    .3000282          0          1
      dyear7 |      5320          .1    .3000282          0          1
-------------+--------------------------------------------------------
      dyear8 |      5320          .1    .3000282          0          1
      dyear9 |      5320          .1    .3000282          0          1
     dyear10 |      5320          .1    .3000282          0          1
.
. save mom, replace
file mom.dta saved
.
. * The following summarizes panel features for completeness
. iis id
. tis year
. xtdes

      id:  1, 2, ..., 532                                    n =        532
    year:  1979, 1980, ..., 1988                             T =         10
           Delta(year) = 1; (1988-1979)+1 = 10
           (id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        10      10      10        10        10      10      10

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+------------
      532    100.00  100.00 |  1111111111
 ---------------------------+------------
      532    100.00         |  XXXXXXXXXX
.
. xtsum lnhr lnwg kids ageh agesq disab

Variable         |      Mean  Std. Dev.        Min        Max |    Observations
-----------------+--------------------------------------------+----------------
lnhr     overall |   7.65743   .2855914       2.77       8.56 |     N =    5320
         between |             .1790083      6.416      8.242 |     n =     532
         within  |             .2226492    3.66943   9.001431 |     T =      10
                 |                                            |
lnwg     overall |  2.609436   .4258924       -.26       4.69 |     N =    5320
         between |             .3911937      1.346      4.543 |     n =     532
         within  |             .1691472   .0694361   4.487436 |     T =      10
                 |                                            |
kids     overall |  1.555827   1.195924          0          6 |     N =    5320
         between |             1.032205          0        5.4 |     n =     532
         within  |              .605468  -2.444173   5.055827 |     T =      10
                 |                                            |
ageh     overall |  38.91823   8.450351         22         60 |     N =    5320
         between |             7.945371       26.5       55.5 |     n =     532
         within  |             2.895916   32.71823   52.21823 |     T =      10
                 |                                            |
agesq    overall |  1586.024   689.7759        484       3600 |     N =    5320
         between |             650.9138      710.5     3088.5 |     n =     532
         within  |             229.8235   963.3239   2581.724 |     T =      10
                 |                                            |
disab    overall |  .0609023   .2391734          0          1 |     N =    5320
         between |             .1657419          0          1 |     n =     532
         within  |             .1725689  -.8390977   .9609023 |     T =      10
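xtsum splits each variable's variation into overall, between and within parts. As we read Stata's xtsum documentation:

- The overall statistics use x_it.
- The between statistics use the individual means xbar_i.
- The within statistics use x_it - xbar_i + xbar, adding back the global mean. This is why a within minimum, such as -2.444173 for kids, can fall below the variable's own minimum.

The plain-Python sketch below applies those definitions to a hypothetical toy panel. It assumes a balanced panel and the n-1 divisor of statistics.stdev. It is meant to illustrate the definitions, not to reproduce xtsum exactly.

```python
from statistics import mean, stdev


def xtsum_sds(panel):
    """Overall, between and within standard deviations of a balanced panel.

    panel: list of per-individual lists of observations x_it.
    """
    overall = [v for xi in panel for v in xi]
    grand_mean = mean(overall)                    # global mean xbar
    means = [mean(xi) for xi in panel]            # individual means xbar_i
    within = [v - m + grand_mean                  # x_it - xbar_i + xbar
              for xi, m in zip(panel, means) for v in xi]
    return {
        "overall": stdev(overall),
        "between": stdev(means),
        "within": stdev(within),
        "within_min": min(within),
        "within_max": max(within),
    }


# Hypothetical toy panel: 3 individuals observed for 4 periods each.
toy_kids = [[0, 0, 1, 1], [2, 2, 2, 2], [1, 3, 3, 3]]
print(xtsum_sds(toy_kids))
```

In the lnhr output above, the squared between and within standard deviations (.1790083 squared plus .2226492 squared, about .0816) approximately add up to the squared overall standard deviation (.2855914 squared, also about .0816).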

.
. ********** DEFINE GLOBALS INCLUDING REGRESSOR LIST **********
.
. * Number of reps for the bootstrap
. * Table 21.2 page 710 used 500
. global nreps 500
.
. * The regressions below are of lnhr on lnwg
. * Additional regressors to be included below are defined in xextra
. * Choose one of the following
.
. * No additional regressors
. global xextra
. global xextrashort
.
. * Include year dummies with one omitted (or two omitted for first differences)
. * global xextra dyear1 dyear2 dyear3 dyear4 dyear5 dyear6 dyear7 dyear8 dyear9
. * global xextrashort dyear2 dyear3 dyear4 dyear5 dyear6 dyear7 dyear8 dyear9
.
. * Include socioeconomic characteristics
. * global xextra kids ageh agesq disab
. * global xextrashort kids ageh agesq disab
.
. ********* DIFFERENT PANEL ESTIMATES pages 709-14 **********
.
. * Note that the first xt command needs the option , i(id)
. * to indicate that the ith observation is for the ith id
.
. * XTDATA permits plots of between, within and overall
. * Useful for looking at the data. See the Stata manual under xtdata for an example.
. * XTREG gives between, within and RE estimates, though not correct standard errors
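The bootstrap settings above feed the panel bootstrap used for pooled OLS (bs with cluster(id)). That bootstrap resamples whole individuals, so each person's ten years stay together and any correlation of e_it over t is preserved. The Python sketch below illustrates the idea for a bivariate OLS slope, alongside the cluster-robust sandwich standard error that the bootstrap approximates. It rests on three assumptions:

- The panel is hypothetical and simulated.
- The sandwich has no small-sample scaling. Stata's regress, cluster() applies a finite-sample adjustment, so its numbers differ slightly.
- The seed 10001 only echoes the log's set seed; Python's generator cannot reproduce Stata's draws.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from a bivariate OLS fit."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def cluster_robust_se(x, y, ids):
    """Cluster-robust (sandwich) SE of the OLS slope, without a small-sample factor."""
    intercept, slope = ols(x, y)
    mx = sum(x) / len(x)
    sxx = sum((a - mx) ** 2 for a in x)
    score = {}                                   # sum of (x - mean x) * residual per cluster
    for a, b, g in zip(x, y, ids):
        resid = b - intercept - slope * a
        score[g] = score.get(g, 0.0) + (a - mx) * resid
    return (sum(s * s for s in score.values()) / sxx ** 2) ** 0.5


def panel_bootstrap_se(x, y, ids, reps=500, seed=10001):
    """Bootstrap SE of the OLS slope, resampling whole individuals (clusters)."""
    rng = random.Random(seed)
    groups = {}
    for a, b, g in zip(x, y, ids):
        groups.setdefault(g, []).append((a, b))  # keep all years of person g together
    keys = list(groups)
    slopes = []
    for _ in range(reps):
        draw = [pair for g in rng.choices(keys, k=len(keys)) for pair in groups[g]]
        slopes.append(ols([a for a, b in draw], [b for a, b in draw])[1])
    m = sum(slopes) / reps
    return (sum((s - m) ** 2 for s in slopes) / (reps - 1)) ** 0.5


# Hypothetical panel: 50 individuals x 10 years; c_i makes errors persist within i.
sim = random.Random(1)
ids, x, y = [], [], []
for i in range(50):
    c_i = sim.gauss(0, 1)
    x_i = sim.gauss(0, 1)
    for t in range(10):
        xv = x_i + 0.3 * sim.gauss(0, 1)
        ids.append(i)
        x.append(xv)
        y.append(1.0 + 0.1 * xv + c_i + sim.gauss(0, 1))

print("slope:", round(ols(x, y)[1], 4))
print("cluster-robust SE:", round(cluster_robust_se(x, y, ids), 4))
print("panel bootstrap SE:", round(panel_bootstrap_se(x, y, ids, reps=200), 4))
```

For lnwg in this log, the panel-robust standard error (.0292711) is about three times the default (.0091251). The panel bootstrap value (.0298395) is close to the panel-robust one. The simulated panel above has the same kind of within-person persistence, so this pattern can be explored.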

.
. * The graphs below use new Stata 8 graphics
. * Change graphics scheme from default s2color to s1mono for printing
. set scheme s1mono
. * The following graphs include
. * legend(pos(4) ring(0) col(1))
. *    changes position of legend to four o'clock
. * legend( label(1 "Data used") label(2 "Smoothed fit") label(3 "Linear fit"))
. *    changes labels for the legends
.
. *** (1) POOLED OLS (OVERALL) REGRESSION (Table 21.2 POLS column and Figure 21.1)
.
. use mom, clear
.
. * Default OLS standard errors are wrong here: they require e_it to be i.i.d.
. regress lnhr lnwg $xextra

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   82.22
       Model |  6.60538417     1  6.60538417           Prob > F      =  0.0000
    Residual |  427.225206  5318  .080335691           R-squared     =  0.0152
-------------+------------------------------           Adj R-squared =  0.0150
       Total |  433.830591  5319  .081562435           Root MSE      =  .28344

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0091251     9.07   0.000     .0648545    .1006326
       _cons |   7.441516   .0241265   308.44   0.000     7.394219    7.488814
------------------------------------------------------------------------------
. estimates store polsiid
.
. * White heteroskedastic-consistent standard errors are also wrong here:
. * they require e_it to be independent over both i and t
. regress lnhr lnwg $xextra, robust

Regression with robust standard errors               Number of obs =    5320
                                                     F(  1,  5318) =   16.61
                                                     Prob > F      =  0.0000
                                                     R-squared     =  0.0152
                                                     Root MSE      =  .28344

-----------------------------------------------------------------------------| Robust lnhr | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------lnwg | .0827436 .0203042 4.08 0.000 .0429391 .122548 446

_cons | 7.441516 .0548992 135.55 0.000 7.333891 7.549141 -----------------------------------------------------------------------------. estimates store polshet . . * Correct panel robust standard errors . regress lnhr lnwg $xextra, cluster(id) Regression with robust standard errors Number of obs = 5320 F( 1, 531) = 7.99 Prob > F = 0.0049 R-squared = 0.0152 Number of clusters (id) = 532 Root MSE = .28344 -----------------------------------------------------------------------------| Robust lnhr | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------lnwg | .0827436 .0292711 2.83 0.005 .0252421 .140245 _cons | 7.441516 .079587 93.50 0.000 7.285172 7.59786 -----------------------------------------------------------------------------. estimates store polspanel . . * Correct panel bootstrap standard errors . * Note that use cluster option so that bootstrap is over just i and not both i and t . set seed 10001 . bs "regress lnhr lnwg $xextra" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95) command: regress lnhr lnwg statistics: _bs_1 = _b[lnwg] _bs_2 = _b[_cons] Bootstrap statistics

Number of obs = N of clusters = 532 Replications = 500

5320

-----------------------------------------------------------------------------Variable | Reps Observed Bias Std. Err. [95% Conf. Interval] -------------+---------------------------------------------------------------_bs_1 | 500 .0827435 -.0005317 .0298395 .024117 .1413701 (N) | .027782 .1408137 (P) | .0284079 .1434854 (BC) _bs_2 | 500 7.441516 .001375 .0805676 7.283223 7.59981 (N) | 7.281352 7.593587 (P) | 7.269371 7.585756 (BC) -----------------------------------------------------------------------------Note: N = normal 447

P = percentile BC = bias-corrected . matrix polsbootse = e(se) . . * Overall plot of data with lowess local regression line - Figure 21.1 page 712 . graph twoway (scatter lnhr lnwg, msize(vsmall)) (lowess lnhr lnwg) (lfit lnhr lnwg), /* > */ scale (1.2) plotregion(style(none)) /* > */ title("Pooled (Overall) Regression") /* > */ xtitle("Log hourly wage", size(medlarge)) xscale(titlegap(*5)) /* > */ ytitle("Log annual hours", size(medlarge)) yscale(titlegap(*5)) /* > */ legend(pos(4) ring(0) col(1)) legend(size(small)) /* > */ legend( label(1 "Original data") label(2 "Nonparametric fit") label(3 "Linear fit")) . graph export ch21pantot.wmf, replace (file c:\Imbook\bwebpage\Section5\ch21pantot.wmf written in Windows Metafile format) . . *** (2) BETWEEN REGRESSION (Table 21.2 Between column and Figure 21.2) . . use mom, clear . . * Usual standard errors assume iid error . xtreg lnhr lnwg, be i(id) Between regression (regression on group means) Number of obs = Group variable (i): id Number of groups = 532 R-sq: within = 0.0162 between = 0.0213 overall = 0.0152 F(1,530) sd(u_i + avg(e_i.))= .1772555

Obs per group: min = avg = 10.0 max = 10 = 11.55 Prob > F =

5320

10

0.0007

-----------------------------------------------------------------------------lnhr | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------lnwg | .0668379 .0196635 3.40 0.001 .0282099 .1054658 _cons | 7.483021 .0518829 144.23 0.000 7.3811 7.584943 -----------------------------------------------------------------------------. estimates store beiid . . * Heteroskedasticity robust standard errors . * Stata has no option for this. See ch21panel2.do . . * Correct panel bootstrap standard errors 448

. set seed 10001
. bootstrap "xtreg lnhr lnwg, be i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtreg lnhr lnwg , be i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .0668379  -.0005547  .0192363   .0290438   .1046319   (N)
             |                                        .0240799   .1059889   (P)
             |                                        .0274993   .1066802  (BC)
       _bs_2 |   500  7.483021   .0016537  .0519151   7.381022    7.58502   (N)
             |                                        7.383433   7.595335   (P)
             |                                        7.382822   7.592656  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix bebootse = e(se)
. 
. * Between plot of data with lowess local regression line - Figure 21.2 page 712
. iis id
. xtdata, be
. graph twoway (scatter lnhr lnwg, msize(vsmall)) (lowess lnhr lnwg) (lfit lnhr lnwg), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Between Regression") /*
> */ xtitle("Log hourly wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log annual hours", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Averages") label(2 "Nonparametric fit") label(3 "Linear fit"))
. graph export ch21panbe.wmf, replace
(file c:\Imbook\bwebpage\Section5\ch21panbe.wmf written in Windows Metafile format)
. 
. *** (3) WITHIN (FIXED EFFECTS) REGRESSION (Table 21.2 Within column and Figure 21.3)
. 
. use mom, clear
. 
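This Python sketch shows what xtreg, fe computes for the slope. The data and helper names are hypothetical. It demeans y and x within each individual and runs OLS through the origin on the deviations. The individual effects alpha_i drop out, so the slope is recovered even though alpha_i is correlated with x.

The helper fe_constant encodes an assumption about how the reported _cons is formed: the grand mean of y minus the slope times the grand mean of x. Take the sample means printed by summarize in the second program below (7.65743 for lnhr, 2.609436 for lnwg) and the slope .1676755. These give about 7.21989, which agrees with the _cons of 7.219892 in the FE output that follows. We have not confirmed this definition in Stata's documentation.

```python
# Sketch of the within (fixed effects) estimator with one regressor:
# demean y and x by individual, then OLS through the origin on the deviations.
from collections import defaultdict


def demean_by_group(ids, values):
    """Return each value minus the mean of its own group (id)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [v - sums[i] / counts[i] for i, v in zip(ids, values)]


def within_slope(ids, y, x):
    """Slope from regressing (y - ybar_i) on (x - xbar_i); alpha_i drops out."""
    yd = demean_by_group(ids, y)
    xd = demean_by_group(ids, x)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


def fe_constant(ybar, xbar, slope):
    """Assumed form of xtreg, fe's _cons: grand mean of y minus slope * grand mean of x."""
    return ybar - slope * xbar


if __name__ == "__main__":
    ids = [1, 1, 1, 2, 2, 2]
    x = [1.0, 2.0, 3.0, 2.0, 4.0, 9.0]
    alpha = {1: 10.0, 2: -3.0}                   # effects correlated with x
    y = [alpha[i] + 0.5 * xv for i, xv in zip(ids, x)]
    print(within_slope(ids, y, x))               # the true slope is 0.5
    print(fe_constant(7.65743, 2.609436, .1676755))
```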

. * Usual standard errors assume iid error
. xtreg lnhr lnwg $xextra, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,4787)          =     78.96
corr(u_i, Xb)  = -0.1995                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1676755     .01887     8.89   0.000     .1306816    .2046694
       _cons |   7.219892   .0493434   146.32   0.000     7.123156    7.316628
-------------+----------------------------------------------------------------
     sigma_u |  .18142881
     sigma_e |  .23278339
         rho |  .37789558   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(531, 4787) =     5.83           Prob > F = 0.0000
. estimates store feiid
. 
. * Correct panel robust standard errors
. * Stata has no option for this. See ch21panel2.do
. 
. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtreg lnhr lnwg $xextra, fe i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtreg lnhr lnwg , fe i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1676755  -.0055543  .0844631   .0017284   .3336226   (N)
             |                                        .0213276   .3318829   (P)
             |                                        .0300515   .3605573  (BC)
       _bs_2 |   500  7.219892     .01461   .223047   6.781665   7.658119   (N)
             |                                        6.782279   7.604026   (P)
             |                                        6.683465   7.574718  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix febootse = e(se)
. 
. * Within plot of data with lowess local regression line - Figure 21.3 page 712
. iis id
. xtdata, fe
. graph twoway (scatter lnhr lnwg, msize(vsmall)) (lowess lnhr lnwg) (lfit lnhr lnwg), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Within (Fixed Effects) Regression") /*
> */ xtitle("Log hourly wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log annual hours", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Deviations from average") label(2 "Nonparametric fit") label(3 "Linear fit"))
. graph export ch21panfe.wmf, replace
(file c:\Imbook\bwebpage\Section5\ch21panfe.wmf written in Windows Metafile format)
. 
. *** (4) FIRST DIFFERENCES REGRESSION (Table 21.2 First diff column and Figure 21.4)
. 
. * Stata has no command for first differences regression,
. * though this may be possible with xtabond.
. * Instead we create differenced data
. 
. use mom, clear
. * The following only works if each observation is (i,t)
. * and within i the data are ordered by t
. gen dlnhr = lnhr - lnhr[_n-1]
(1 missing value generated)
. gen dlnwg = lnwg - lnwg[_n-1]
(1 missing value generated)
. gen dkids = kids - kids[_n-1]
(1 missing value generated)
. gen dageh = ageh - ageh[_n-1]
(1 missing value generated)
. gen dagesq = agesq - agesq[_n-1]
(1 missing value generated)
. gen ddisab = disab - disab[_n-1]
(1 missing value generated)
. * The following drops the first year (1979), whose differences are
. * either missing or span two different ids
. drop if year == 1979
(532 observations deleted)
. 
. * Usual standard errors assume iid error
. regress dlnhr dlnwg $xextrashort

      Source |       SS       df       MS              Number of obs =    4788
-------------+------------------------------           F(  1,  4786) =   26.09
       Model |  2.27870825     1  2.27870825           Prob > F      =  0.0000
    Residual |  417.943979  4786  .087326364           R-squared     =  0.0054
-------------+------------------------------           Adj R-squared =  0.0052
       Total |  420.222687  4787  .087784142           Root MSE      =  .29551

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0213351     5.11   0.000     .0671584    .1508118
       _cons |   .0008283   .0042712     0.19   0.846    -.0075452    .0092018
------------------------------------------------------------------------------
. estimates store fdiffiid
. 
. * Correct panel robust standard errors
. regress dlnhr dlnwg $xextrashort, cluster(id)

Regression with robust standard errors                 Number of obs =    4788
                                                       F(  1,   531) =    1.69
                                                       Prob > F      =  0.1936
                                                       R-squared     =  0.0054
Number of clusters (id) = 532                          Root MSE      =  .29551

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0837266     1.30   0.194    -.0554909    .2734612
       _cons |   .0008283   .0016148     0.51   0.608    -.0023439    .0040005
------------------------------------------------------------------------------
. estimates store fdiffpanel
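The differencing above relies on the data being sorted by id and then year. It also relies on dropping 1979 to discard the differences that straddle two individuals. The Python sketch below avoids both requirements; its helper names and toy data are hypothetical. It sorts explicitly and differences only consecutive years of the same id. It then regresses the differences on a constant, as regress dlnhr dlnwg does. With N individuals and T years it yields N(T-1) differences, matching 5320 - 532 = 4788 here.

```python
# Sketch of the first-differences estimator for one regressor.
# Sorting by (id, year) and requiring the same id and consecutive years
# means no difference is ever taken across two different people.


def first_differences(ids, years, values):
    """Map (id, year) -> value(id, year) - value(id, year - 1)."""
    rows = sorted(zip(ids, years, values))
    diffs = {}
    for (i0, t0, v0), (i1, t1, v1) in zip(rows, rows[1:]):
        if i0 == i1 and t1 == t0 + 1:
            diffs[(i1, t1)] = v1 - v0
    return diffs


def ols_with_intercept(x, y):
    """Return (intercept, slope) from OLS of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def fd_regression(ids, years, y, x):
    """Regress Delta y on a constant and Delta x (like regress dlnhr dlnwg)."""
    dy = first_differences(ids, years, y)
    dx = first_differences(ids, years, x)
    keys = sorted(dy)
    return ols_with_intercept([dx[k] for k in keys], [dy[k] for k in keys])
```

The intercept of the differenced regression picks up any common linear time trend in the levels equation. The slope is the first-differences estimate of the coefficient on x.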

. 
. * "Robust" standard errors only control for heteroskedasticity
. regress dlnhr dlnwg $xextrashort, robust

Regression with robust standard errors                 Number of obs =    4788
                                                       F(  1,  4786) =    2.51
                                                       Prob > F      =  0.1135
                                                       R-squared     =  0.0054
                                                       Root MSE      =  .29551

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0688514     1.58   0.114    -.0259952    .2439654
       _cons |   .0008283   .0042856     0.19   0.847    -.0075735    .0092301
------------------------------------------------------------------------------
. estimates store fdiffhet
. 
. * Correct panel bootstrap standard errors
. set seed 10001
. bs "regress dlnhr dlnwg $xextrashort" "_b[dlnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      regress dlnhr dlnwg
statistics:   _bs_1      = _b[dlnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      4788
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1089851  -.0092694  .0832844  -.0546462   .2726165   (N)
             |                                       -.0486034   .2608319   (P)
             |                                       -.0329857   .2929305  (BC)
       _bs_2 |   500  .0008283  -8.39e-06  .0015843  -.0022843    .003941   (N)
             |                                       -.0023564   .0038644   (P)
             |                                       -.0023692    .003842  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix fdiffbootse = e(se)
. 
. * First differences plot with lowess local regression line - Figure 21.4 page 713
. graph twoway (scatter dlnhr dlnwg, msize(vsmall)) (lowess dlnhr dlnwg) (lfit dlnhr dlnwg), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("First Differences Regression") /*
> */ xtitle("Log hourly wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log annual hours", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "First differences") label(2 "Nonparametric fit") label(3 "Linear fit"))
. graph export ch21panfd.wmf, replace
(file c:\Imbook\bwebpage\Section5\ch21panfd.wmf written in Windows Metafile format)
. 
. *** (5) RANDOM EFFECTS GLS REGRESSION (Table 21.2 RE-GLS column)
. 
. use mom, clear
. 
. * Usual standard errors assume iid error
. xtreg lnhr lnwg, re i(id)

Random-effects GLS regression                   Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     76.64
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1193322   .0136312     8.75   0.000     .0926155     .146049
       _cons |   7.346041   .0363925   201.86   0.000     7.274713    7.417368
-------------+----------------------------------------------------------------
     sigma_u |  .16124733
     sigma_e |  .23278339
         rho |  .32424354   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. estimates store reglsiid
. 
. * Correct panel robust standard errors
. * Stata has no option for this. See ch21panel2.do
. * or use xtgee corr(exchangeable), robust see ch21panel4.do
. 
. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtreg lnhr lnwg, re i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtreg lnhr lnwg , re i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1193322   .0084025  .0563763    .008568   .2300965   (N)
             |                                        .0332454   .2379648   (P)
             |                                        .0203328   .2199058  (BC)
       _bs_2 |   500  7.346041  -.0217114  .1492226   7.052859   7.639223   (N)
             |                                        7.029869   7.577236   (P)
             |                                        7.082208   7.614716  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix reglsbootse = e(se)
. 
. *** (6) RANDOM EFFECTS MLE REGRESSION (Table 21.2 RE-MLE column)
. 
. use mom, clear
. 
. * Usual standard errors assume iid error
. xtreg lnhr lnwg, mle i(id)

Fitting constant-only model:
Iteration 0:   log likelihood = -305.19469
Iteration 1:   log likelihood = -304.97993
Iteration 2:   log likelihood = -304.97987

Fitting full model:
Iteration 0:   log likelihood = -270.51687
Iteration 1:   log likelihood = -266.91794
Iteration 2:   log likelihood = -266.91155

Random-effects ML regression                    Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

Random effects u_i ~ Gaussian                   Obs per group: min =        10
                                                               avg =      10.0
                                                               max =        10

                                                LR chi2(1)         =     76.14
Log likelihood  = -266.91155                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0137484     8.70   0.000      .092601    .1464938
       _cons |   7.345479   .0366973   200.16   0.000     7.273554    7.417404
-------------+----------------------------------------------------------------
    /sigma_u |    .162175   .0060469    26.82   0.000     .1503233    .1740266
    /sigma_e |   .2329172   .0023819    97.79   0.000     .2282488    .2375856
-------------+----------------------------------------------------------------
         rho |   .3265097    .017266                      .2934209    .3610233
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 1147.08 Prob>=chibar2 = 0.000
. estimates store remleiid
. 
. * Correct panel robust standard errors
. * Stata has no option for this. See ch21panel2.do
. 
. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtreg lnhr lnwg, mle i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtreg lnhr lnwg , mle i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1195474   .0094957  .0582585   .0050852   .2340096   (N)
             |                                        .0333037   .2445228   (P)
             |                                        .0209889   .2249033  (BC)
       _bs_2 |   500  7.345479  -.0245541  .1540811   7.042751   7.648207   (N)
             |                                        7.013718   7.577084   (P)
             |                                        7.070499   7.613971  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix remlebootse = e(se)
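Each bootstrap above uses cluster(id), so whole individuals are resampled rather than single person-years. This preserves the within-person correlation of the errors. The Python sketch below applies that idea to the slope of a regression on a constant; the toy setup and the names ols_slope, cluster_bootstrap and percentile_ci are hypothetical.

Python's random number generator differs from Stata's, so seed 10001 will not reproduce the draws in the log. The simple nearest-rank percentile interval here need not match Stata's (P) interval exactly, and bias-corrected (BC) intervals are not attempted.

```python
# Sketch of a panel (cluster) bootstrap: resample whole individuals with
# replacement, keep all of their years, re-estimate, and use the standard
# deviation of the replicate estimates as the standard error.
import random
import statistics
from collections import defaultdict


def ols_slope(x, y):
    """OLS slope of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def cluster_bootstrap(ids, x, y, estimator, reps=500, seed=10001):
    """Return (bootstrap standard error, list of replicate estimates)."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for i, xv, yv in zip(ids, x, y):
        clusters[i].append((xv, yv))
    keys = list(clusters)
    draws = []
    for _ in range(reps):
        xs, ys = [], []
        for _ in keys:                      # draw N individuals with replacement
            for xv, yv in clusters[rng.choice(keys)]:
                xs.append(xv)
                ys.append(yv)
        draws.append(estimator(xs, ys))
    return statistics.stdev(draws), draws


def percentile_ci(draws, level=0.95):
    """Nearest-rank percentile interval from the replicate estimates."""
    s = sorted(draws)
    k = len(s) - 1
    lo = s[int(round((1 - level) / 2 * k))]
    hi = s[int(round((1 + level) / 2 * k))]
    return lo, hi
```

For pooled OLS, resampling individuals gives a standard error that plays the role of the panel-robust one. In the log the two are close (.0298395 from the bootstrap versus .0292711 from cluster(id)).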

. 
. * Population-averaged estimates are similar to RE (close to the MLE version of RE)
. * Exactly the same as xtgee, i(id)
. xtreg lnhr lnwg, pa i(id)

Iteration 1: tolerance = .03364039
Iteration 2: tolerance = .00033468
Iteration 3: tolerance = 4.733e-06
Iteration 4: tolerance = 6.715e-08

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  exchangeable                     max =        10
                                                Wald chi2(1)       =     76.70
Scale parameter:                  .0805511      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0136507     8.76   0.000     .0927925    .1463023
       _cons |   7.345479   .0364481   201.53   0.000     7.274042    7.416916
------------------------------------------------------------------------------
. estimates store paiid
. 
. *** (7) HAUSMAN TEST (NOT ROBUST)
. 
. * Hausman test of fixed versus random effects
. * The FE estimates are saved in feiid
. * The RE estimates are saved in reglsiid
. 
. * From Section 21.4.3 pages 717-9 this usual implementation of the Hausman test
. * is invalid if there is any intracluster correlation left in the RE model,
. * as then the RE estimator is no longer fully efficient,
. * so Var[b_RE - b_FE] does not equal Var[b_FE] - Var[b_RE]
. 
. * The following is not valid - see MMA21P2PANMANUAL.DO for a robust version
. hausman feiid reglsiid

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     feiid       reglsiid      Difference          S.E.
-------------+----------------------------------------------------------------
        lnwg |    .1676755     .1193322        .0483432        .0130486
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       13.73
                Prob>chi2 =      0.0002
. 
. ********* DISPLAY RESULTS - Table 21.2 on page 710 *********
. 
. * Standard errors using iid errors and, in some cases, panel robust
. estimates table polsiid polshet polspanel beiid feiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

-------------------------------------------------------------------------------
    Variable |   polsiid     polshet   polspanel       beiid       feiid
-------------+-----------------------------------------------------------------
        lnwg |      0.083       0.083       0.083       0.067       0.168
             |      0.009       0.020       0.029       0.020       0.019
       _cons |      7.442       7.442       7.442       7.483       7.220
             |      0.024       0.055       0.080       0.052       0.049
-------------+-----------------------------------------------------------------
           N |   5320.000    5320.000    5320.000    5320.000    5320.000
          ll |   -840.453    -840.453    -840.453     166.573     486.743
          r2 |      0.015       0.015       0.015       0.021       0.016
         tss |                                                    433.831
         rss |    427.225     427.225     427.225      16.652     259.398
         mss |      6.605       6.605       6.605       0.363       4.279
        rmse |      0.283       0.283       0.283       0.177       0.233
        df_r |   5318.000    5318.000     531.000     530.000    4787.000
-------------------------------------------------------------------------------
                                                                  legend: b/se
. estimates table fdiffiid fdiffhet fdiffpanel reglsiid remleiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

-------------------------------------------------------------------------------
    Variable |  fdiffiid    fdiffhet  fdiffpanel    reglsiid    remleiid
-------------+-----------------------------------------------------------------
_            |
       dlnwg |      0.109       0.109       0.109
             |      0.021       0.069       0.084
        lnwg |                                          0.119
             |                                          0.014
       _cons |      0.001       0.001       0.001       7.346
             |      0.004       0.004       0.002       0.036
-------------+-----------------------------------------------------------------
lnhr         |
        lnwg |                                                      0.120
             |                                                      0.014
       _cons |                                                      7.345
             |                                                      0.037
-------------+-----------------------------------------------------------------
sigma_u      |
       _cons |                                                      0.162
             |                                                      0.006
-------------+-----------------------------------------------------------------
sigma_e      |
       _cons |                                                      0.233
             |                                                      0.002
-------------+-----------------------------------------------------------------
Statistics   |
           N |   4788.000    4788.000    4788.000    5320.000    5320.000
          ll |   -956.059    -956.059    -956.059                -266.912
          r2 |      0.005       0.005       0.005
         tss |
         rss |    417.944     417.944     417.944
         mss |      2.279       2.279       2.279
        rmse |      0.296       0.296       0.296
        df_r |   4786.000    4786.000     531.000
-------------------------------------------------------------------------------
                                                                  legend: b/se
. estimates table paiid, se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

--------------------------
    Variable |    paiid
-------------+------------
        lnwg |      0.120
             |      0.014
       _cons |      7.345
             |      0.036
-------------+------------
           N |   5320.000
          ll |
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
--------------------------
              legend: b/se
. 
. * Standard errors using panel bootstrap (regular bootstrap for between)
. matrix list polsbootse

polsbootse[1,2]
        _bs_1      _bs_2
se  .02983953   .0805676
. matrix list bebootse

bebootse[1,2]
        _bs_1      _bs_2
se  .01923625  .05191507
. matrix list febootse

febootse[1,2]
        _bs_1      _bs_2
se  .08446309  .22304703
. matrix list fdiffbootse

fdiffbootse[1,2]
        _bs_1      _bs_2
se  .08328443  .00158427
. matrix list reglsbootse

reglsbootse[1,2]
        _bs_1      _bs_2
se  .05637633  .14922264
. matrix list remlebootse

remlebootse[1,2]
        _bs_1      _bs_2
se  .05825849  .15408111
. 
. ********** CLOSE OUTPUT *********
. log close
       log:  c:\Imbook\bwebpage\Section5\mma21p1panfeandre.txt
  log type:  text
 closed on:  23 May 2005, 11:34:06
-------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section5\mma21p2panmanual.txt
  log type:  text
 opened on:  23 May 2005, 11:34:50
. 
. ********** OVERVIEW OF MMA21P2PANMANUAL.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 

. * Chapter 21.3.1-3 pages 709-14
. * Program performs basic panel analysis and gets panel robust se's
. * by first transforming the model and then using REGRESS
. * It also presents a valid Hausman test of FE versus RE model
. 
. * This program estimates
. * (2) between estimator by regress y_bar on x_bar
. * (4) within estimator by regress (y - y_bar) on (x - x_bar)
. * (5) random effects gls by regress (y - lambda*y_bar) on (x - lambda*x_bar)
. * (6) random effects mle by regress (y - lambda*y_bar) on (x - lambda*x_bar)
. * (7) robust variant of the Hausman test
. * and calculates
. * - usual standard errors
. *   (which may differ from xtreg due to different degrees of freedom)
. * - panel robust standard errors
. *   (which for RE are simplified by treating lambda_hat as known rather than estimated)
. * - panel bootstrap standard errors
. *   (which should be close to the panel robust ones from ch21panel.do, given enough reps)
. * - heteroskedasticity robust standard errors
. *   (which are wrong but included for comparison with others)
. 
. * The code is very limited:
. * - it considers only one regressor
. * - it assumes a balanced data set with exactly 10 years of data per individual
. * - it does not use loops for transformations, which would generalize the code
. 
. * NOTE: If you have Stata version 9 (rather than version 8) a simpler way to proceed is
. * to directly use XTREG (see program mma21p1panfeandre.do) with option cluster(id)
. 
. * The four basic linear panel programs are
. * mma21p1panfeandre.do    Linear fixed and random effects using xtreg
. * mma21p2panmanual.do     Linear fe and re using transformation and regress
. *                         plus also has valid Hausman test
. * mma21p3panresiduals.do  Residual analysis after linear fe and re
. * mma21p4panpangls.do     Pooled panel OLS and GLS
. 
. * To run this program you need data file
. * MOM.dat
. * in your directory
. 
. * To speed up this program reduce nreps, the number of bootstraps
. * used in the panel bootstrap.
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
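Items (5) and (6) of the overview estimate the random effects model by OLS on quasi-demeaned data. The Python sketch below has hypothetical helper names. It computes the weight that the program later stores in the scalars lamdaregls and lamdaremle: 1 - sqrt(sigma_e^2/(T*sigma_u^2 + sigma_e^2)), with T = 10 here. It then fits y - lambda*ybar_i on (1 - lambda) and x - lambda*xbar_i.

Plugging in the sigma_u and sigma_e printed by xtreg, re (.16124733, .23278339) and xtreg, mle (.162175, .2329172) reproduces .58470925 and .58648101 up to rounding. The sketch assumes a balanced panel and does not compute standard errors.

```python
# Sketch of random effects GLS as OLS on quasi-demeaned data:
# regress (y - lam*ybar_i) on c = (1 - lam) and (x - lam*xbar_i),
# with lam = 1 - sqrt(sigma_e^2 / (T*sigma_u^2 + sigma_e^2)), balanced panel.
import math
from collections import defaultdict


def re_lambda(sigma_u, sigma_e, T):
    """Quasi-demeaning weight: 0 gives pooled OLS, values near 1 approach FE."""
    return 1.0 - math.sqrt(sigma_e ** 2 / (T * sigma_u ** 2 + sigma_e ** 2))


def quasi_demean(ids, values, lam):
    """Return value - lam * (mean of the value's own id)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [v - lam * sums[i] / counts[i] for i, v in zip(ids, values)]


def re_gls(ids, y, x, lam):
    """Return (intercept, slope) from OLS of y* on c = (1 - lam) and x*.

    Requires lam < 1: at lam = 1 the transformed constant column vanishes.
    """
    ys = quasi_demean(ids, y, lam)
    xs = quasi_demean(ids, x, lam)
    c = 1.0 - lam
    n = len(ys)
    # 2x2 normal equations for ys = a*c + b*xs (no separate intercept)
    s_cc = c * c * n
    s_cx = c * sum(xs)
    s_xx = sum(v * v for v in xs)
    s_cy = c * sum(ys)
    s_xy = sum(a * b for a, b in zip(xs, ys))
    det = s_cc * s_xx - s_cx ** 2
    intercept = (s_cy * s_xx - s_cx * s_xy) / det
    slope = (s_cc * s_xy - s_cx * s_cy) / det
    return intercept, slope
```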

. 
. ********** DATA DESCRIPTION **********
. 
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
. 
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
. 
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
. 
. ********** READ DATA **********
. 
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)
. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |    5320     7.65743    .2855914       2.77       8.56
        lnwg |    5320    2.609436    .4258924       -.26       4.69
        kids |    5320    1.555827    1.195924          0          6
        ageh |    5320    38.91823    8.450351         22         60
       agesq |    5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |    5320    .0609023    .2391734          0          1
          id |    5320       266.5    153.5893          1        532
        year |    5320      1983.5    2.872551       1979       1988
. 
. ********** DEFINE GLOBALS **********
. 
. * Number of reps for the bootstrap
. * Table 21.1 used 500
. global nreps 500
. 
. ******** RUN REGRESSIONS USING XTREG **********
. 
. * This is to verify alternative estimates later on
. * And for random effects it saves lambda
. * used later on to construct the transformed regression
. * of (y - lambda*y_bar) on (x - lambda*x_bar)
. 
. xtreg lnhr lnwg, be i(id)

Between regression (regression on group means)  Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,530)           =     11.55
sd(u_i + avg(e_i.))=  .1772555                  Prob > F           =    0.0007

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0668379   .0196635     3.40   0.001     .0282099    .1054658
       _cons |   7.483021   .0518829   144.23   0.000       7.3811    7.584943
------------------------------------------------------------------------------
. estimates store bextreg
. 
. xtreg lnhr lnwg, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,4787)          =     78.96
corr(u_i, Xb)  = -0.1995                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1676755     .01887     8.89   0.000     .1306816    .2046694
       _cons |   7.219892   .0493434   146.32   0.000     7.123156    7.316628
-------------+----------------------------------------------------------------
     sigma_u |  .18142881
     sigma_e |  .23278339
         rho |  .37789558   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(531, 4787) =     5.83           Prob > F = 0.0000
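The rho line in the xtreg output is the share of the composite error variance u_i + e_it that is due to u_i, sigma_u^2/(sigma_u^2 + sigma_e^2). Under the random effects model with independent u_i and e_it, it is also the correlation between the errors of the same individual in two different years. That within-individual correlation is why the i.i.d. standard errors earlier in this log are not valid. The one-function Python check below uses a hypothetical name and compares against the three rho values reported in this log.

```python
# Sketch: rho = sigma_u^2 / (sigma_u^2 + sigma_e^2), the fraction of the
# composite error variance due to the individual effect u_i.


def fraction_due_to_u(sigma_u, sigma_e):
    """Share of Var(u_i + e_it) due to u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


if __name__ == "__main__":
    print(fraction_due_to_u(.18142881, .23278339))   # compare rho in FE output
    print(fraction_due_to_u(.16124733, .23278339))   # compare rho in RE-GLS output
    print(fraction_due_to_u(.162175, .2329172))      # compare rho in RE-MLE output
```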

. estimates store fextreg . . xtreg lnhr lnwg, re i(id) Random-effects GLS regression Group variable (i): id R-sq: within = 0.0162 between = 0.0213 overall = 0.0152 Random effects u_i ~ Gaussian corr(u_i, X) = 0 (assumed)

Number of obs Number of groups =

= 5320 532

Obs per group: min = avg = 10.0 max = 10

10

Wald chi2(1) = 76.64 Prob > chi2 = 0.0000

-----------------------------------------------------------------------------lnhr | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------lnwg | .1193322 .0136312 8.75 0.000 .0926155 .146049 _cons | 7.346041 .0363925 201.86 0.000 7.274713 7.417368 -------------+---------------------------------------------------------------sigma_u | .16124733 sigma_e | .23278339 rho | .32424354 (fraction of variance due to u_i) -----------------------------------------------------------------------------. estimates store reglsxtreg . scalar sesq = e(sigma_e)^2 . scalar susq = e(sigma_u)^2 . scalar lamdaregls = 1 - sqrt( sesq / (e(Tbar)*susq + sesq) ) . di lamdaregls .58470925 . . xtreg lnhr lnwg, mle i(id) Fitting constant-only model: Iteration 0: log likelihood = -305.19469 Iteration 1: log likelihood = -304.97993 Iteration 2: log likelihood = -304.97987 Fitting full model: Iteration 0: log likelihood = -270.51687 Iteration 1: log likelihood = -266.91794 Iteration 2: log likelihood = -266.91155 Random-effects ML regression

Number of obs

=

5320 464

Group variable (i): id

Number of groups =

Random effects u_i ~ Gaussian

Obs per group: min = avg = 10.0 max = 10

LR chi2(1) Log likelihood = -266.91155

532

= 76.14 Prob > chi2 =

10

0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0137484     8.70   0.000      .092601    .1464938
       _cons |   7.345479   .0366973   200.16   0.000     7.273554    7.417404
-------------+----------------------------------------------------------------
    /sigma_u |    .162175   .0060469    26.82   0.000     .1503233    .1740266
    /sigma_e |   .2329172   .0023819    97.79   0.000     .2282488    .2375856
-------------+----------------------------------------------------------------
         rho |   .3265097    .017266                      .2934209    .3610233
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 1147.08 Prob>=chibar2 = 0.000
. estimates store remlextreg
. scalar sesq2 = e(sigma_e)^2
. scalar susq2 = e(sigma_u)^2
. scalar lamdaremle = 1 - sqrt( sesq2 / (e(g_avg)*susq2 + sesq2) )
. di lamdaremle
.58648101
.
. ******** ANALYSIS: FE, RE and FD ESTIMATORS CALCULATED MANUALLY **********
.
. *** FIRST TRANSFORM DATA FROM LONG FORM TO WIDE FORM
.
. * Here just do this for lnhr and lnwg
. keep lnhr lnwg id year
. reshape wide lnhr lnwg, i(id) j(year)
(note: j = 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->   532
Number of variables                   4   ->   21
j variable (10 values)             year   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1979 lnhr1980 ... lnhr1988
                                   lnwg   ->   lnwg1979 lnwg1980 ... lnwg1988
-----------------------------------------------------------------------------
.
. * Since year is 1979 to 1988 this will create
. * lnhr1979 to lnhr1988 and lnwg1979 to lnwg1988
.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
    lnhr1979 |       532    7.669342     .249361       5.89       8.54
    lnwg1979 |       532    2.597763    .4188951        .52       4.62
    lnhr1980 |       532    7.660094    .2691995       5.22       8.34
    lnwg1980 |       532    2.602368    .3945963         .8       4.61
-------------+--------------------------------------------------------
    lnhr1981 |       532     7.66765    .2105797       6.36        8.4
    lnwg1981 |       532    2.610959    .3870011       1.53       4.53
    lnhr1982 |       532     7.64609    .2427195       5.38       8.31
    lnwg1982 |       532     2.61468    .4014363       1.21       4.61
    lnhr1983 |       532    7.613064     .382703       2.77       8.37
-------------+--------------------------------------------------------
    lnwg1983 |       532    2.610526    .4111869       1.08       4.62
    lnhr1984 |       532    7.636523    .3316735       3.18       8.44
    lnwg1984 |       532    2.600188    .4621549       -.26       4.65
    lnhr1985 |       532    7.668365    .2597423       5.08       8.54
    lnwg1985 |       532    2.614944    .4347554       1.33       4.69
-------------+--------------------------------------------------------
    lnhr1986 |       532    7.659286    .3330862       2.77       8.38
    lnwg1986 |       532    2.602632    .4432807        .07       4.59
    lnhr1987 |       532     7.67406    .2745015       4.38       8.56
    lnwg1987 |       532    2.614699    .4300122       1.28       4.03
    lnhr1988 |       532    7.679831    .2552894       4.79       8.53
-------------+--------------------------------------------------------
    lnwg1988 |       532    2.625602    .4701759       -.22        4.6
.
. *** (1) POOLED OLS (OVERALL) REGRESSION
.
. * Not relevant
.
. *** (2) CREATE INDIVIDUAL AVERAGES AND DO BETWEEN REGRESSION
.
. gen avelnhr = (lnhr1979+lnhr1980+lnhr1981+lnhr1982+lnhr1983+lnhr1984+ /*
> */ lnhr1985+lnhr1986+lnhr1987+lnhr1988) / 10
. gen avelnwg = (lnwg1979+lnwg1980+lnwg1981+lnwg1982+lnwg1983+lnwg1984+ /*
> */ lnwg1985+lnwg1986+lnwg1987+lnwg1988) / 10
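The quasi-demeaning weights stored above in `lamdaregls` and `lamdaremle` follow the balanced-panel formula lambda = 1 - sqrt(sigma_e^2 / (T*sigma_u^2 + sigma_e^2)), with T = e(Tbar) or e(g_avg), both equal to 10 here. As an illustrative cross-check (a Python sketch, not part of the authors' Stata programs), the code below recomputes both values from the variance components printed by `xtreg`; because those inputs are rounded, agreement is only to about four decimal places. The function name `re_lambda` is hypothetical.

```python
import math


def re_lambda(sigma_u: float, sigma_e: float, T: int) -> float:
    """Quasi-demeaning weight for a balanced random-effects panel.

    lambda = 1 - sqrt(sigma_e^2 / (T * sigma_u^2 + sigma_e^2));
    lambda = 0 gives pooled OLS, lambda -> 1 gives the within (FE) transform.
    """
    se2 = sigma_e ** 2
    su2 = sigma_u ** 2
    return 1.0 - math.sqrt(se2 / (T * su2 + se2))


# Variance components as displayed by xtreg, re and xtreg, mle (T = 10 years)
lam_gls = re_lambda(0.16124733, 0.23278339, 10)
lam_mle = re_lambda(0.162175, 0.2329172, 10)
print(round(lam_gls, 4), round(lam_mle, 4))  # 0.5847 0.5865
```

Setting sigma_u = 0 gives lambda = 0 (pooled OLS), and lambda approaches 1 as T grows, which is why RE and FE estimates get closer in long panels.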

.
. * Should replicate xtreg, be
. regress avelnhr avelnwg

      Source |       SS       df       MS              Number of obs =     532
-------------+------------------------------           F(  1,   530) =   11.55
       Model |  .363013807     1  .363013807           Prob > F      =  0.0007
    Residual |  16.6523404   530   .03141951           R-squared     =  0.0213
-------------+------------------------------           Adj R-squared =  0.0195
       Total |  17.0153542   531  .032043982           Root MSE      =  .17726

------------------------------------------------------------------------------
     avelnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     avelnwg |   .0668379   .0196635     3.40   0.001     .0282099    .1054658
       _cons |   7.483021   .0518829   144.23   0.000       7.3811    7.584943
------------------------------------------------------------------------------
. estimates store bebyols
.
. * Better is the following, as it gives heteroskedasticity-robust standard errors
. regress avelnhr avelnwg, robust

Regression with robust standard errors                 Number of obs =     532
                                                       F(  1,   530) =    7.55
                                                       Prob > F      =  0.0062
                                                       R-squared     =  0.0213
                                                       Root MSE      =  .17726

------------------------------------------------------------------------------
             |               Robust
     avelnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     avelnwg |   .0668379   .0243185     2.75   0.006     .0190654    .1146103
       _cons |   7.483021   .0657699   113.78   0.000      7.35382    7.612223
------------------------------------------------------------------------------
. estimates store behet
.
. * Or could bootstrap
. bootstrap "regress avelnhr avelnwg" "_b[avelnwg] _b[_cons]", reps(200) level(95)

command:      regress avelnhr avelnwg
statistics:   _bs_1      = _b[avelnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =       532
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   200  .0668379  -.0010221   .0239486   .0196123   .1140634   (N)
             |                                         .0233175   .1143305   (P)
             |                                         .0266221   .1175503  (BC)
       _bs_2 |   200  7.483021   .0029632   .0648396    7.35516   7.610882   (N)
             |                                         7.362745   7.600107   (P)
             |                                         7.358079   7.591704  (BC)
------------------------------------------------------------------------------
                         Note:  N   = normal
                                P   = percentile
                                BC  = bias-corrected
. matrix bebootse = e(se)
.
. *** (3) CREATE DIFFERENCED DATA FOR FE AND RE
.
. * Continue with the data already in wide form and then reshape back to long form
. * Mean difference for FE and quasi-difference for RE-GLS and RE-MLE
.
. * Mean difference for FE
. gen mdlnhr1979 = lnhr1979 - avelnhr
. gen mdlnhr1980 = lnhr1980 - avelnhr
. gen mdlnhr1981 = lnhr1981 - avelnhr
. gen mdlnhr1982 = lnhr1982 - avelnhr
. gen mdlnhr1983 = lnhr1983 - avelnhr
. gen mdlnhr1984 = lnhr1984 - avelnhr
. gen mdlnhr1985 = lnhr1985 - avelnhr
. gen mdlnhr1986 = lnhr1986 - avelnhr
. gen mdlnhr1987 = lnhr1987 - avelnhr
. gen mdlnhr1988 = lnhr1988 - avelnhr
. gen mdlnwg1979 = lnwg1979 - avelnwg
. gen mdlnwg1980 = lnwg1980 - avelnwg
. gen mdlnwg1981 = lnwg1981 - avelnwg
. gen mdlnwg1982 = lnwg1982 - avelnwg
. gen mdlnwg1983 = lnwg1983 - avelnwg
. gen mdlnwg1984 = lnwg1984 - avelnwg
. gen mdlnwg1985 = lnwg1985 - avelnwg
. gen mdlnwg1986 = lnwg1986 - avelnwg
. gen mdlnwg1987 = lnwg1987 - avelnwg
. gen mdlnwg1988 = lnwg1988 - avelnwg
.
. * Quasi difference for RE - GLS
. gen reglsdlnhr1979 = lnhr1979 - lamdaregls*avelnhr
. gen reglsdlnhr1980 = lnhr1980 - lamdaregls*avelnhr
. gen reglsdlnhr1981 = lnhr1981 - lamdaregls*avelnhr
. gen reglsdlnhr1982 = lnhr1982 - lamdaregls*avelnhr
. gen reglsdlnhr1983 = lnhr1983 - lamdaregls*avelnhr
. gen reglsdlnhr1984 = lnhr1984 - lamdaregls*avelnhr
. gen reglsdlnhr1985 = lnhr1985 - lamdaregls*avelnhr
. gen reglsdlnhr1986 = lnhr1986 - lamdaregls*avelnhr
. gen reglsdlnhr1987 = lnhr1987 - lamdaregls*avelnhr
. gen reglsdlnhr1988 = lnhr1988 - lamdaregls*avelnhr
. gen reglsdlnwg1979 = lnwg1979 - lamdaregls*avelnwg
. gen reglsdlnwg1980 = lnwg1980 - lamdaregls*avelnwg
. gen reglsdlnwg1981 = lnwg1981 - lamdaregls*avelnwg
. gen reglsdlnwg1982 = lnwg1982 - lamdaregls*avelnwg
. gen reglsdlnwg1983 = lnwg1983 - lamdaregls*avelnwg
. gen reglsdlnwg1984 = lnwg1984 - lamdaregls*avelnwg
. gen reglsdlnwg1985 = lnwg1985 - lamdaregls*avelnwg
. gen reglsdlnwg1986 = lnwg1986 - lamdaregls*avelnwg
. gen reglsdlnwg1987 = lnwg1987 - lamdaregls*avelnwg
. gen reglsdlnwg1988 = lnwg1988 - lamdaregls*avelnwg
.
. * Quasi difference for RE - MLE
. gen remledlnhr1979 = lnhr1979 - lamdaremle*avelnhr
. gen remledlnhr1980 = lnhr1980 - lamdaremle*avelnhr
. gen remledlnhr1981 = lnhr1981 - lamdaremle*avelnhr
. gen remledlnhr1982 = lnhr1982 - lamdaremle*avelnhr
. gen remledlnhr1983 = lnhr1983 - lamdaremle*avelnhr
. gen remledlnhr1984 = lnhr1984 - lamdaremle*avelnhr
. gen remledlnhr1985 = lnhr1985 - lamdaremle*avelnhr
. gen remledlnhr1986 = lnhr1986 - lamdaremle*avelnhr
. gen remledlnhr1987 = lnhr1987 - lamdaremle*avelnhr
. gen remledlnhr1988 = lnhr1988 - lamdaremle*avelnhr
. gen remledlnwg1979 = lnwg1979 - lamdaremle*avelnwg
. gen remledlnwg1980 = lnwg1980 - lamdaremle*avelnwg
. gen remledlnwg1981 = lnwg1981 - lamdaremle*avelnwg
. gen remledlnwg1982 = lnwg1982 - lamdaremle*avelnwg
. gen remledlnwg1983 = lnwg1983 - lamdaremle*avelnwg
. gen remledlnwg1984 = lnwg1984 - lamdaremle*avelnwg
. gen remledlnwg1985 = lnwg1985 - lamdaremle*avelnwg
. gen remledlnwg1986 = lnwg1986 - lamdaremle*avelnwg
. gen remledlnwg1987 = lnwg1987 - lamdaremle*avelnwg
. gen remledlnwg1988 = lnwg1988 - lamdaremle*avelnwg
.
. *** NOW BACK TO LONG FORM
.
. * Then back to long form
. reshape long lnhr lnwg mdlnhr mdlnwg reglsdlnhr reglsdlnwg remledlnhr remledlnwg, i(id) j(year)
(note: j = 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                      532   ->   5320
Number of variables                  85   ->   14
j variable (10 values)                    ->   year
xij variables:
             lnhr1979 lnhr1980 ... lnhr1988   ->   lnhr
             lnwg1979 lnwg1980 ... lnwg1988   ->   lnwg
       mdlnhr1979 mdlnhr1980 ... mdlnhr1988   ->   mdlnhr
       mdlnwg1979 mdlnwg1980 ... mdlnwg1988   ->   mdlnwg
reglsdlnhr1979 reglsdlnhr1980 ... reglsdlnhr1988 -> reglsdlnhr
reglsdlnwg1979 reglsdlnwg1980 ... reglsdlnwg1988 -> reglsdlnwg
remledlnhr1979 remledlnhr1980 ... remledlnhr1988 -> remledlnhr
remledlnwg1979 remledlnwg1980 ... remledlnwg1988 -> remledlnwg
-----------------------------------------------------------------------------
.
. describe

Contains data
  obs:         5,320
 vars:            14
 size:       276,640 (97.2% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
year            int    %9.0g
lnhr            float  %9.0g
lnwg            float  %9.0g
avelnhr         float  %9.0g
avelnwg         float  %9.0g
_est_bebyols    byte   %8.0g                  esample() from estimates store
_est_behet      byte   %8.0g                  esample() from estimates store
mdlnhr          float  %9.0g
mdlnwg          float  %9.0g
reglsdlnhr      float  %9.0g
reglsdlnwg      float  %9.0g
remledlnhr      float  %9.0g
remledlnwg      float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  year
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
     avelnhr |      5320     7.65743    .1788568      6.416      8.242
-------------+--------------------------------------------------------
     avelnwg |      5320    2.609436    .3908626      1.346      4.543
_est_bebyols |      5320           1           0          1          1
  _est_behet |      5320           1           0          1          1
      mdlnhr |      5320   -1.21e-09    .2226492     -3.988      1.344
      mdlnwg |      5320   -9.86e-10    .1691472      -2.54      1.878
-------------+--------------------------------------------------------
  reglsdlnhr |      5320     3.18006    .2347122  -1.181465   4.008506
  reglsdlnwg |      5320    1.083675    .2344336  -1.593137   2.966892
  remledlnhr |      5320    3.166493    .2346121  -1.193439   3.997138
  remledlnwg |      5320    1.079051    .2339546  -1.597177   2.962247
. save MOM2, replace
file MOM2.dta saved
.
. *** (4) FIXED EFFECTS ESTIMATOR USING DIFFERENCED DATA
.
. * This should replicate xtreg, fe
. regress mdlnhr mdlnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   87.72
       Model |  4.27857391     1  4.27857391           Prob > F      =  0.0000
    Residual |   259.39846  5318  .048777446           R-squared     =  0.0162
-------------+------------------------------           Adj R-squared =  0.0160
       Total |  263.677034  5319   .04957267           Root MSE      =  .22086

------------------------------------------------------------------------------
      mdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mdlnwg |   .1676755   .0179032     9.37   0.000      .132578     .202773
       _cons |  -1.04e-09    .003028    -0.00   1.000    -.0059361    .0059361
------------------------------------------------------------------------------
. estimates store febyols
.
. * This gives panel corrected standard errors
. regress mdlnhr mdlnwg, cluster(id)

Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,   531) =    3.89
                                                       Prob > F      =  0.0490
                                                       R-squared     =  0.0162
Number of clusters (id) = 532                          Root MSE      =  .22086

------------------------------------------------------------------------------
             |               Robust
      mdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mdlnwg |   .1676755   .0849706     1.97   0.049     .0007557    .3345953
       _cons |  -1.04e-09   6.39e-09    -0.16   0.870    -1.36e-08    1.15e-08
------------------------------------------------------------------------------
. estimates store fepanel
.
. * This gives panel bootstrap standard errors
. * Similar to bootstrap applied to xtreg, fe
. set seed 10001
. bs "regress mdlnhr mdlnwg" "_b[mdlnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      regress mdlnhr mdlnwg
statistics:   _bs_1      = _b[mdlnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1676755  -.0055543   .0844631   .0017284   .3336226   (N)
             |                                         .0213276   .3318829   (P)
             |                                         .0300515   .3605573  (BC)
       _bs_2 |   500 -1.04e-09   2.79e-10   6.50e-09  -1.38e-08   1.17e-08   (N)
             |                                        -1.39e-08   1.28e-08   (P)
             |                                        -1.41e-08   1.17e-08  (BC)
------------------------------------------------------------------------------
                         Note:  N   = normal
                                P   = percentile
                                BC  = bias-corrected
. matrix febootse = e(se)
.
. * This gives heteroskedasticity corrected standard errors that are not panel robust
. regress mdlnhr mdlnwg, robust

Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,  5318) =    7.79
                                                       Prob > F      =  0.0053
                                                       R-squared     =  0.0162
                                                       Root MSE      =  .22086

------------------------------------------------------------------------------
             |               Robust
      mdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mdlnwg |   .1676755   .0600942     2.79   0.005     .0498662    .2854848
       _cons |  -1.04e-09    .003028    -0.00   1.000    -.0059361    .0059361
------------------------------------------------------------------------------
. estimates store fehet
.
. *** (5) RANDOM EFFECTS - GLS ESTIMATOR USING DIFFERENCED DATA
.
. * Should give same coefficient estimates as xtreg
. * May give different standard errors as it treats lambda as known
. * but in practice the difference is small as lambda is precisely estimated
.
. * This should replicate xtreg, re
. regress reglsdlnhr reglsdlnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   76.64
       Model |  4.16279701     1  4.16279701           Prob > F      =  0.0000
    Residual |  288.860014  5318  .054317415           R-squared     =  0.0142
-------------+------------------------------           Adj R-squared =  0.0140
       Total |  293.022811  5319  .055089831           Root MSE      =  .23306

------------------------------------------------------------------------------
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .1193323   .0136312     8.75   0.000     .0926095     .146055
       _cons |   3.050743   .0151135   201.86   0.000     3.021114    3.080371
------------------------------------------------------------------------------
. estimates store reglsbyols
.
. * This gives panel corrected standard errors
. regress reglsdlnhr reglsdlnwg, cluster(id)

Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,   531) =    5.39
                                                       Prob > F      =  0.0206
                                                       R-squared     =  0.0142
Number of clusters (id) = 532                          Root MSE      =  .23306

------------------------------------------------------------------------------
             |               Robust
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .1193323   .0514016     2.32   0.021     .0183568    .2203077
       _cons |   3.050743   .0571367    53.39   0.000     2.938501    3.162984
------------------------------------------------------------------------------
. estimates store reglspanel
.
. * This gives panel bootstrap standard errors
. * Similar to bootstrap applied to xtreg, re
. set seed 10001
. bs "regress reglsdlnhr reglsdlnwg" "_b[reglsdlnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      regress reglsdlnhr reglsdlnwg
statistics:   _bs_1      = _b[reglsdlnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1193323  -.0020689   .0516757   .0178035    .220861   (N)
             |                                         .0300938   .2277364   (P)
             |                                         .0339291    .236732  (BC)
       _bs_2 |   500  3.050743   .0022622   .0571941   2.938372   3.163114   (N)
             |                                          2.93212   3.148191   (P)
             |                                         2.920954   3.143819  (BC)
------------------------------------------------------------------------------
                         Note:  N   = normal
                                P   = percentile
                                BC  = bias-corrected
. matrix reglsbootse = e(se)
.
. * This gives heteroskedasticity corrected standard errors that are not panel robust
. regress reglsdlnhr reglsdlnwg, robust

Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,  5318) =    7.81
                                                       Prob > F      =  0.0052
                                                       R-squared     =  0.0142
                                                       Root MSE      =  .23306

------------------------------------------------------------------------------
             |               Robust
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .1193323   .0426897     2.80   0.005      .035643    .2030215
       _cons |   3.050743    .047821    63.80   0.000     2.956994    3.144491
------------------------------------------------------------------------------
. estimates store reglshet
.
. *** (6) RANDOM EFFECTS - MLE ESTIMATOR USING DIFFERENCED DATA
.
. * Should give same coefficient estimates as xtreg
. * May give different standard errors as it treats lambda as known
. * but in practice the difference is small as lambda is precisely estimated
.
. * This should replicate xtreg, mle
. regress remledlnhr remledlnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   76.67
       Model |  4.16076808     1  4.16076808           Prob > F      =  0.0000
    Residual |  288.612179  5318  .054270812           R-squared     =  0.0142
-------------+------------------------------           Adj R-squared =  0.0140
       Total |  292.772947  5319  .055042855           Root MSE      =  .23296

------------------------------------------------------------------------------
  remledlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  remledlnwg |   .1195474   .0136533     8.76   0.000     .0927814    .1463134
       _cons |   3.037495   .0150748   201.49   0.000     3.007942    3.067048
------------------------------------------------------------------------------
. estimates store remlebyols
.
. * This gives panel corrected standard errors
. regress remledlnhr remledlnwg, cluster(id)

Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,   531) =    5.38
                                                       Prob > F      =  0.0208
                                                       R-squared     =  0.0142
Number of clusters (id) = 532                          Root MSE      =  .23296

------------------------------------------------------------------------------
             |               Robust
  remledlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  remledlnwg |   .1195474   .0515474     2.32   0.021     .0182855    .2208093
       _cons |   3.037495   .0570501    53.24   0.000     2.925424    3.149567
------------------------------------------------------------------------------
. estimates store remlepanel
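The manual approach in steps (2) to (6) rests on one idea: OLS of y_it - lambda*ybar_i on x_it - lambda*xbar_i gives the within (FE) estimator when lambda = 1, the RE estimator when 0 < lambda < 1 and pooled OLS when lambda = 0, while OLS on the individual averages gives the between estimator. The pure-Python sketch below illustrates this on a small made-up balanced panel (not the MOM data) in which y_it = alpha_i + 0.2*x_it and the individual effects alpha_i rise with xbar_i. The helper names `ols_slope`, `quasi_demean` and `between_slope` are hypothetical.

```python
def ols_slope(x, y):
    """Slope coefficient from OLS of y on x with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def quasi_demean(panel, lam):
    """Stack x_it - lam*xbar_i and y_it - lam*ybar_i over all individuals.

    panel maps an individual id to a list of (x_it, y_it) pairs.
    """
    xs, ys = [], []
    for obs in panel.values():
        T = len(obs)
        xbar = sum(x for x, _ in obs) / T
        ybar = sum(y for _, y in obs) / T
        for x, y in obs:
            xs.append(x - lam * xbar)
            ys.append(y - lam * ybar)
    return xs, ys


def between_slope(panel):
    """OLS of individual means ybar_i on xbar_i (one observation per individual)."""
    xbars = [sum(x for x, _ in obs) / len(obs) for obs in panel.values()]
    ybars = [sum(y for _, y in obs) / len(obs) for obs in panel.values()]
    return ols_slope(xbars, ybars)


# Made-up panel: y_it = alpha_i + 0.2 * x_it with alpha_i = 5, 6, 7
panel = {
    1: [(1.0, 5.2), (2.0, 5.4), (3.0, 5.6)],
    2: [(2.0, 6.4), (4.0, 6.8), (3.0, 6.6)],
    3: [(5.0, 8.0), (6.0, 8.2), (4.0, 7.8)],
}
b_within = ols_slope(*quasi_demean(panel, 1.0))
b_re = ols_slope(*quasi_demean(panel, 0.5))
b_between = between_slope(panel)
print(round(b_within, 3), round(b_re, 3), round(b_between, 3))  # 0.2 0.437 0.843
```

Because alpha_i is correlated with xbar_i in this toy panel, the between slope is far above the true 0.2, and the RE slope lies between the within and between slopes: with a single regressor the RE estimator is a weighted average of the two. As in the Stata regressions above, the intercept from the quasi-demeaned regression is rescaled by (1 - lambda), but the slope is unaffected.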

.
. * This gives panel bootstrap standard errors
. * Similar to bootstrap applied to xtreg, mle
. set seed 10001
. bs "regress remledlnhr remledlnwg" "_b[remledlnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      regress remledlnhr remledlnwg
statistics:   _bs_1      = _b[remledlnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1195474  -.0020813   .0518188   .0177375   .2213573   (N)
             |                                         .0300552   .2282355   (P)
             |                                         .0339668   .2372786  (BC)
       _bs_2 |   500  3.037495   .0022658   .0571042   2.925301   3.149689   (N)
             |                                         2.919076   3.134685   (P)
             |                                         2.907989    3.13043  (BC)
------------------------------------------------------------------------------
                         Note:  N   = normal
                                P   = percentile
                                BC  = bias-corrected
. matrix remlebootse = e(se)
.
. * This gives heteroskedasticity corrected standard errors that are not panel robust
. regress reglsdlnhr reglsdlnwg, robust

Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,  5318) =    7.81
                                                       Prob > F      =  0.0052
                                                       R-squared     =  0.0142
                                                       Root MSE      =  .23306

------------------------------------------------------------------------------
             |               Robust
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .1193323   .0426897     2.80   0.005      .035643    .2030215
       _cons |   3.050743    .047821    63.80   0.000     2.956994    3.144491
------------------------------------------------------------------------------
. estimates store remlehet
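The `cluster(id)` standard errors reported above are much larger than the default ones (for example .085 versus .018 for the FE slope), because they allow the errors of the same man to be correlated across the ten years. For a regression with one regressor and an intercept, the cluster-robust variance of the slope reduces to a sum over clusters of the squared within-cluster sums of w_i*u_i, where w_i = (x_i - xbar)/Sxx is the OLS weight on observation i and u_i is the residual. The Python sketch below implements that formula; the function name is hypothetical, and the optional finite-sample factor G/(G-1) * (N-1)/(N-K) is our understanding of what Stata's `regress` applies with `cluster()`, so treat it as an assumption.

```python
import math
from collections import defaultdict


def cluster_se_slope(x, y, cluster, small_sample=True):
    """OLS slope of y on x (with intercept) and its cluster-robust standard error.

    Var(b) = sum over clusters g of (sum_{i in g} w_i * u_i)^2,
    with w_i = (x_i - xbar) / Sxx and u_i the OLS residual.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    score = defaultdict(float)  # within-cluster sum of w_i * u_i
    for xi, yi, g in zip(x, y, cluster):
        u = yi - a - b * xi
        score[g] += (xi - xbar) / sxx * u
    var = sum(s * s for s in score.values())
    if small_sample:
        G, k = len(score), 2  # number of clusters; intercept and slope
        var *= G / (G - 1) * (n - 1) / (n - k)
    return b, math.sqrt(var)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.5, 2.0, 4.5, 4.0]
b, se_hc0 = cluster_se_slope(x, y, cluster=[0, 1, 2, 3, 4], small_sample=False)
_, se_cl = cluster_se_slope(x, y, cluster=[0, 0, 0, 1, 1], small_sample=False)
print(round(b, 3), round(se_hc0 ** 2, 4), round(se_cl ** 2, 4))  # 0.8 0.0186 0.0002
```

Putting every observation in its own cluster reproduces the heteroskedasticity-robust (HC0) variance. Grouping observations by individual, as `cluster(id)` does, lets within-person correlation of the scores enter the variance; when those scores are positively correlated, as the larger panel-robust standard errors above suggest for these data, the clustered standard errors grow.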

.
. *** (7) ROBUST VARIANT OF HAUSMAN TEST
.
. * From Section 21.4.3 pages 717-9 the usual implementation of the Hausman test
. * is invalid if there is any intracluster correlation left in the RE model
. * as then the RE estimator is no longer fully efficient
. * so Var[b_RE - b_FE] does not equal Var[b_FE] - Var[b_RE]
.
. * (7A) Nonrobust version of Hausman test by auxiliary regression
. * [will be similar to nonrobust version in mma21p1panfeandre.do]
. regress reglsdlnhr reglsdlnwg mdlnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  2,  5317) =   45.26
       Model |  4.90465081     2  2.45232541           Prob > F      =  0.0000
    Residual |   288.11816  5317  .054188106           R-squared     =  0.0167
-------------+------------------------------           Adj R-squared =  0.0164
       Total |  293.022811  5319  .055089831           Root MSE      =  .23278

------------------------------------------------------------------------------
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .0668379   .0196635     3.40   0.001     .0282893    .1053864
      mdlnwg |   .1008376   .0272531     3.70   0.000     .0474104    .1542648
       _cons |    3.10763   .0215465   144.23   0.000      3.06539     3.14987
------------------------------------------------------------------------------
. scalar Hnonrobust = (_b[mdlnwg]/_se[mdlnwg])^2
. di Hnonrobust
13.690344
.
. * Perform preferred valid robust version of Hausman test
. * This gives the results presented on p.719
. regress reglsdlnhr reglsdlnwg mdlnwg, cluster(id)

Regression with robust standard errors                 Number of obs =    5320
                                                       F(  2,   531) =    4.24
                                                       Prob > F      =  0.0149
                                                       R-squared     =  0.0167
Number of clusters (id) = 532                          Root MSE      =  .23278

------------------------------------------------------------------------------
             |               Robust
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .0668379   .0243001     2.75   0.006     .0191016    .1145741
      mdlnwg |   .1008376   .0785137     1.28   0.200     -.053398    .2550732
       _cons |    3.10763    .027293   113.86   0.000     3.054014    3.161245
------------------------------------------------------------------------------
. scalar Hrobust = (_b[mdlnwg]/_se[mdlnwg])^2
. di Hrobust
1.6495074
.
. ********* DISPLAY RESULTS - Table 21.2 on page 710 *********
.
. * All estimates should be equal for a given estimator.
. * The standard errors will vary.
. * The first and second assume iid errors and generally will be the same.
. * The third allows for heteroskedastic errors but is not panel robust.
. * The fourth is panel robust and also allows for heteroskedasticity.
. estimates table bextreg bebyols behet, b(%10.3f) se /*
> */ stats(N ll r2 tss rss mss rmse df_r)

----------------------------------------------------
    Variable |     bextreg     bebyols       behet
-------------+--------------------------------------
        lnwg |       0.067
             |       0.020
     avelnwg |                   0.067       0.067
             |                   0.020       0.024
       _cons |       7.483       7.483       7.483
             |       0.052       0.052       0.066
-------------+--------------------------------------
           N |    5320.000     532.000     532.000
          ll |     166.573     166.573     166.573
          r2 |       0.021       0.021       0.021
         tss |
         rss |      16.652      16.652      16.652
         mss |       0.363       0.363       0.363
        rmse |       0.177       0.177       0.177
        df_r |     530.000     530.000     530.000
----------------------------------------------------
                                        legend: b/se
. estimates table fextreg febyols fehet fepanel, b(%10.3f) se /*
> */ stats(N ll r2 tss rss mss rmse df_r)

----------------------------------------------------------------
    Variable |     fextreg     febyols       fehet     fepanel
-------------+--------------------------------------------------
        lnwg |       0.168
             |       0.019
      mdlnwg |                   0.168       0.168       0.168
             |                   0.018       0.060       0.085
       _cons |       7.220      -0.000      -0.000      -0.000
             |       0.049       0.003       0.003       0.000
-------------+--------------------------------------------------
           N |    5320.000    5320.000    5320.000    5320.000
          ll |     486.743     486.743     486.743     486.743
          r2 |       0.016       0.016       0.016       0.016
         tss |     433.831
         rss |     259.398     259.398     259.398     259.398
         mss |       4.279       4.279       4.279       4.279
        rmse |       0.233       0.221       0.221       0.221
        df_r |    4787.000    5318.000    5318.000     531.000
----------------------------------------------------------------
                                                    legend: b/se
. estimates table reglsxtreg reglsbyols reglshet reglspanel, b(%10.3f) se /*
> */ stats(N ll r2 tss rss mss rmse df_r)

----------------------------------------------------------------
    Variable |  reglsxtreg  reglsbyols    reglshet  reglspanel
-------------+--------------------------------------------------
        lnwg |       0.119
             |       0.014
  reglsdlnwg |                   0.119       0.119       0.119
             |                   0.014       0.043       0.051
       _cons |       7.346       3.051       3.051       3.051
             |       0.036       0.015       0.048       0.057
-------------+--------------------------------------------------
           N |    5320.000    5320.000    5320.000    5320.000
          ll |                 200.589     200.589     200.589
          r2 |                   0.014       0.014       0.014
         tss |
         rss |                 288.860     288.860     288.860
         mss |                   4.163       4.163       4.163
        rmse |                   0.233       0.233       0.233
        df_r |                5318.000    5318.000     531.000
----------------------------------------------------------------
                                                    legend: b/se
. estimates table remlextreg remlebyols remlehet remlepanel, b(%10.3f) se /*
> */ stats(N ll r2 tss rss mss rmse df_r)

----------------------------------------------------------------
    Variable |  remlextreg  remlebyols    remlehet  remlepanel
-------------+--------------------------------------------------
lnhr         |
        lnwg |       0.120
             |       0.014
       _cons |       7.345
             |       0.037
-------------+--------------------------------------------------
sigma_u      |
       _cons |       0.162
             |       0.006
-------------+--------------------------------------------------
sigma_e      |
       _cons |       0.233
             |       0.002
-------------+--------------------------------------------------
_            |
  remledlnwg |                   0.120                   0.120
             |                   0.014                   0.052
  reglsdlnwg |                               0.119
             |                               0.043
       _cons |                   3.037       3.051       3.037
             |                   0.015       0.048       0.057
-------------+--------------------------------------------------
Statistics   |
           N |    5320.000    5320.000    5320.000    5320.000
          ll |    -266.912     202.872     200.589     202.872
          r2 |                   0.014       0.014       0.014
         tss |
         rss |                 288.612     288.860     288.612
         mss |                   4.161       4.163       4.161
        rmse |                   0.233       0.233       0.233
        df_r |                5318.000    5318.000     531.000
----------------------------------------------------------------
                                                    legend: b/se
.
. * The following are (panel) bootstrap standard errors
. matrix list bebootse

bebootse[1,2]
         _bs_1      _bs_2
se   .02394857  .06483965

. matrix list febootse

febootse[1,2]
         _bs_1      _bs_2
se   .08446309  6.497e-09

. * Note that the following two differ from mma21p1panfeandre.do
. * as here the same value of lambda is used throughout the bootstraps
. matrix list remlebootse

remlebootse[1,2]
         _bs_1      _bs_2
se   .05181879  .05710419

. matrix list reglsbootse

reglsbootse[1,2]
         _bs_1      _bs_2
se   .05167569  .05719414
.
. * For completeness give lambda
. di lamdaregls
.58470925
. di lamdaremle
.58648101
.
. * Robust and nonrobust versions of Hausman test given on p.719
. di Hnonrobust   /* Not valid if intracluster correlation */
13.690344
. di Hrobust      /* Valid if intracluster correlation */
1.6495074
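Because only one regressor is tested, the auxiliary-regression Hausman statistic computed above is the squared t-ratio on `mdlnwg`, compared with a chi-squared(1) distribution. The Python sketch below recomputes `Hnonrobust` and `Hrobust` from the rounded coefficient and standard errors printed in the two regressions, and adds chi-squared(1) p-values using the identity P(chi2_1 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h/2)). The function name `hausman_aux` is hypothetical.

```python
import math


def hausman_aux(coef, se):
    """Squared t-ratio on the mean-differenced regressor and its chi2(1) p-value."""
    h = (coef / se) ** 2
    p = math.erfc(math.sqrt(h / 2.0))  # P(chi2_1 > h) = P(|Z| > sqrt(h))
    return h, p


h_nonrobust, p_nonrobust = hausman_aux(0.1008376, 0.0272531)  # default OLS s.e.
h_robust, p_robust = hausman_aux(0.1008376, 0.0785137)        # cluster(id) s.e.
print(round(h_nonrobust, 2), round(h_robust, 2))  # 13.69 1.65
```

At the 5% level (critical value 3.84) the nonrobust statistic rejects the RE model while the panel-robust statistic, with a p-value near 0.20, does not.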

.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section5\mma21p2panmanual.txt
  log type:  text
 closed on:  23 May 2005, 11:35:55
-------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section5\mma21p2panresiduals.txt
  log type:  text
 opened on:  23 May 2005, 11:37:22
.
. ********** OVERVIEW OF MMA21P3PANRESIDUALS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 21.3.4 pages 713-15 Residual analysis
. * This program
. * (1) estimates correlations for
. *     - dependent variable
. *     - regressor variable
. *     - residuals from pooled ols [Table 21.3]
. *     - residuals from within estimation [Table 21.4]
. *     - residuals from random effects estimation
. * (2) separately estimates correlations for
. *     - residuals from first differences estimation
. * (3) gets correlations for each individual observation
.
. * The code is very limited:
. * - it considers only one regressor
. * - it assumes a balanced data set with exactly 10 years of data per individual
. * - it does not use loops for transformations, which would generalize the code
.
. * The four basic linear panel programs are
. * mma21p1panfeandre.do    Linear fixed and random effects using xtreg
. * mma21p2panfeandre.do    Linear fe and re using transformation and regress
. *                         plus also has valid Hausman test
. * mma21p3panresiduals.do  Residual analysis after linear fe and re
. * mma21p4panpangls.do     Pooled panel OLS and GLS
.
. * To run you need file
. *   MOM.dat
. * in your directory
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
.
. ********** READ DATA **********
.
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)
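`infile` in free format reads whitespace-separated values in order and fills the eight variables one record at a time, so line breaks carry no information. A minimal Python analogue is sketched below, together with a conversion to the wide layout (one entry per id, holding a value for each year) that `reshape wide` produces later in the program. The two records in the example are made-up values, not lines from MOM.dat, and the names `read_free_format` and `to_wide` are hypothetical.

```python
NAMES = ["lnhr", "lnwg", "kids", "ageh", "agesq", "disab", "id", "year"]


def read_free_format(text, names=NAMES):
    """Split text on any whitespace and assign tokens to variables in order."""
    tokens = text.split()
    k = len(names)
    if len(tokens) % k != 0:
        raise ValueError(f"{len(tokens)} tokens is not a multiple of {k}")
    return [
        dict(zip(names, (float(t) for t in tokens[i:i + k])))
        for i in range(0, len(tokens), k)
    ]


def to_wide(rows, var):
    """Map each id to {year: value of var}, as reshape wide does for one variable."""
    wide = {}
    for r in rows:
        wide.setdefault(int(r["id"]), {})[int(r["year"])] = r[var]
    return wide


# Two made-up records for person 1 (illustrative values only)
sample = "7.6 2.5 1 30 900 0 1 1979\n7.7 2.6 1 31 961 0 1 1980\n"
rows = read_free_format(sample)
print(len(rows), to_wide(rows, "lnhr"))  # 2 {1: {1979: 7.6, 1980: 7.7}}
```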

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
.
. ************ (1) ANALYSIS: OBTAIN KEY AUTOCORRELATIONS Tables 21.3, 21.4 **********
.
. ** RUN REGRESSIONS AND GET RESIDUALS OF INTEREST
.
. * pooled ols
. regress lnhr lnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   82.22
       Model |  6.60538417     1  6.60538417           Prob > F      =  0.0000
    Residual |  427.225206  5318  .080335691           R-squared     =  0.0152
-------------+------------------------------           Adj R-squared =  0.0150
       Total |  433.830591  5319  .081562435           Root MSE      =  .28344

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0091251     9.07   0.000     .0648545    .1006326
       _cons |   7.441516   .0241265   308.44   0.000     7.394219    7.488814
------------------------------------------------------------------------------
. predict upols, residuals
.
. * fixed effects (within)
. xtreg lnhr lnwg, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,4787)          =     78.96
corr(u_i, Xb)  = -0.1995                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1676755     .01887     8.89   0.000     .1306816    .2046694
       _cons |   7.219892   .0493434   146.32   0.000     7.123156    7.316628
-------------+----------------------------------------------------------------
     sigma_u |  .18142881
     sigma_e |  .23278339
         rho |  .37789558   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(531, 4787) =     5.83           Prob > F = 0.0000

. predict ufe, e

.
. * random effects
. xtreg lnhr lnwg, re i(id)

Random-effects GLS regression                   Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     76.64
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1193322   .0136312     8.75   0.000     .0926155     .146049
       _cons |   7.346041   .0363925   201.86   0.000     7.274713    7.417368
-------------+----------------------------------------------------------------
     sigma_u |  .16124733
     sigma_e |  .23278339
         rho |  .32424354   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict ure, e

.
. summarize upols ufe ure

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       upols |      5320   -1.27e-10    .2834089  -4.826247    .964581
         ufe |      5320   -5.52e-11    .2208354  -4.003929     1.2719
         ure |      5320   -9.00e-11    .2231118  -4.131111   1.085362
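As a consistency check on the two xtreg outputs, the reported rho can be recomputed from sigma_u and sigma_e, since rho = sigma_u^2 / (sigma_u^2 + sigma_e^2). The sketch below also computes the random-effects quasi-demeaning weight theta = 1 - sqrt(sigma_e^2 / (T sigma_u^2 + sigma_e^2)) for a balanced panel with T = 10. theta is not displayed in this log, so treat it as a derived illustration, not Stata output. The function names rho and re_theta are hypothetical.

```python
import math


def rho(sigma_u: float, sigma_e: float) -> float:
    """Fraction of the composite error variance due to the individual effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


def re_theta(sigma_u: float, sigma_e: float, T: int) -> float:
    """Random-effects quasi-demeaning weight for a balanced panel with T periods."""
    return 1.0 - math.sqrt(sigma_e ** 2 / (T * sigma_u ** 2 + sigma_e ** 2))


# sigma_u and sigma_e copied from the fe and re output above
rho_fe = rho(0.18142881, 0.23278339)   # log reports .37789558
rho_re = rho(0.16124733, 0.23278339)   # log reports .32424354
theta_re = re_theta(0.16124733, 0.23278339, T=10)

print(f"rho (fe) = {rho_fe:.6f}")
print(f"rho (re) = {rho_re:.6f}")
print(f"theta (re, T=10) = {theta_re:.4f}")
```

A theta of 0 would reproduce pooled OLS and a theta of 1 the within estimator. A value of roughly 0.58 is in line with the RE coefficient on lnwg (.1193) falling between the pooled OLS (.0827) and within (.1677) estimates.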

. save mom3, replace
file mom3.dta saved

.
. ** TRANSFORM DATA FROM LONG FORM TO WIDE FORM
.
. * Here just do this for lnhr and lnwg and the residuals
. keep lnhr lnwg id year upols ufe ure

. reshape wide lnhr lnwg upols ufe ure, i(id) j(year)
(note: j = 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->     532
Number of variables                   7   ->      51
j variable (10 values)             year   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1979 lnhr1980 ... lnhr1988
                                   lnwg   ->   lnwg1979 lnwg1980 ... lnwg1988
                                  upols   ->   upols1979 upols1980 ... upols1988
                                    ufe   ->   ufe1979 ufe1980 ... ufe1988
                                    ure   ->   ure1979 ure1980 ... ure1988
-----------------------------------------------------------------------------

.
. * Since year is 1979 to 1988 this will create
. * lnhr1979 to lnhr1988 and lnwg1979 to lnwg1988
.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
    lnhr1979 |       532    7.669342     .249361       5.89       8.54
    lnwg1979 |       532    2.597763    .4188951        .52       4.62
   upols1979 |       532    .0128775    .2517228  -1.764168   .8312218
     ufe1979 |       532    .0138689    .2249175  -1.578105     1.2719
-------------+--------------------------------------------------------
     ure1979 |       532    .0133046    .2200196  -1.618987   1.085362
    lnhr1980 |       532    7.660094    .2691995       5.22       8.34
    lnwg1980 |       532    2.602368    .3945963         .8       4.61
   upols1980 |       532    .0032483    .2679463  -2.354734   .6659743
     ufe1980 |       532    .0038486    .2253673  -2.085636   1.128546
-------------+--------------------------------------------------------
     ure1980 |       532    .0035069    .2238723  -2.089847   .9429754
    lnhr1981 |       532     7.66765    .2105797       6.36        8.4
    lnwg1981 |       532    2.610959    .3870011       1.53       4.53
   upols1981 |       532    .0100939    .2133106  -1.342159   .7582438
     ufe1981 |       532    .0099646     .163407  -1.001722    1.03687
-------------+--------------------------------------------------------
     ure1981 |       532    .0100382    .1596593   -1.02491   .8517824
    lnhr1982 |       532     7.64609    .2427195       5.38       8.31
    lnwg1982 |       532     2.61468    .4014363       1.21       4.61
   upols1982 |       532   -.0117742    .2422735  -2.264238   .6897579
     ufe1982 |       532   -.0122196    .1890237  -1.623214   .7918997
-------------+--------------------------------------------------------
     ure1982 |       532   -.0119661    .1875585  -1.737484   .6666697
    lnhr1983 |       532    7.613064     .382703       2.77       8.37
    lnwg1983 |       532    2.610526    .4111869       1.08       4.62
   upols1983 |       532   -.0444568    .3778255  -4.826247   .7307264
     ufe1983 |       532   -.0445494    .2836351  -3.577253   .5196197
-------------+--------------------------------------------------------
     ure1983 |       532   -.0444967     .294545  -3.804399   .5078294
    lnhr1984 |       532    7.636523    .3316735       3.18       8.44
    lnwg1984 |       532    2.600188    .4621549       -.26       4.65
   upols1984 |       532   -.0201427    .3208512  -4.240003   .8263766
     ufe1984 |       532   -.0193572     .225836  -2.810104   .8327778
-------------+--------------------------------------------------------
     ure1984 |       532   -.0198043    .2378605  -3.140221   .7036628
    lnhr1985 |       532    7.668365    .2597423       5.08       8.54
    lnwg1985 |       532    2.614944    .4347554       1.33       4.69
   upols1985 |       532    .0104785     .259051  -2.503835   .8624523
     ufe1985 |       532    .0100107    .1856724  -1.581894   .7944546
-------------+--------------------------------------------------------
     ure1985 |       532     .010277    .1886509  -1.752727   .7370209
    lnhr1986 |       532    7.659286    .3330862       2.77       8.38
    lnwg1986 |       532    2.602632    .4432807        .07       4.59
   upols1986 |       532    .0024183    .3312105  -4.801424   .7439653
     ufe1986 |       532    .0029962    .2595405  -4.003929   .6384854
-------------+--------------------------------------------------------
     ure1986 |       532    .0026673     .264328  -4.131111   .5111209
    lnhr1987 |       532     7.67406    .2745015       4.38       8.56
    lnwg1987 |       532    2.614699    .4300122       1.28       4.03
   upols1987 |       532    .0161942    .2749153  -3.283269    .964581
     ufe1987 |       532    .0157472    .2141618  -2.817174   1.009662
-------------+--------------------------------------------------------
     ure1987 |       532    .0160016    .2148092  -2.897725   .8441463
    lnhr1988 |       532    7.679831    .2552894       4.79       8.53
    lnwg1988 |       532    2.625602    .4701759       -.22        4.6
   upols1988 |       532    .0210628    .2519891  -2.633313   .9072749
     ufe1988 |       532    .0196898    .2048927   -1.68379   1.123516
-------------+--------------------------------------------------------
     ure1988 |       532    .0204713    .2022375  -1.897506   .9393954

.
. ** OBTAIN THE VARIOUS CORRELATIONS
.
. corr lnhr1979 lnhr1980 lnhr1981 lnhr1982 lnhr1983 lnhr1984 lnhr1985 lnhr1986 lnhr1987 lnhr1988
(obs=532)

             | lnhr1979 lnhr1980 lnhr1981 lnhr1982 lnhr1983 lnhr1984 lnhr1985 lnhr1986 lnhr1987
-------------+----------------------------------------------------------------------------------
    lnhr1979 |   1.0000
    lnhr1980 |   0.3220   1.0000
    lnhr1981 |   0.4321   0.4022   1.0000
    lnhr1982 |   0.2947   0.3142   0.5670   1.0000
    lnhr1983 |   0.2070   0.2324   0.3788   0.4781   1.0000
    lnhr1984 |   0.1908   0.2235   0.3141   0.3318   0.6476   1.0000
    lnhr1985 |   0.2284   0.3184   0.3999   0.3453   0.3930   0.5839   1.0000
    lnhr1986 |   0.1934   0.1931   0.2813   0.2524   0.3162   0.3595   0.4128   1.0000
    lnhr1987 |   0.1986   0.3160   0.3322   0.2951   0.3261   0.3464   0.3987   0.3603   1.0000
    lnhr1988 |   0.1640   0.2551   0.3081   0.2674   0.2267   0.2537   0.3509   0.5741   0.5248

             | lnhr1988
-------------+---------
    lnhr1988 |   1.0000
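The reshape wide step in this program turns the 5320 person-year rows into 532 person rows with one column per variable and year (lnhr1979, ..., ure1988), which is what lets corr report year-by-year correlations. A minimal Python sketch of the same long-to-wide idea, assuming each record is a dict and each (id, year) pair appears at most once; reshape_wide and the toy data are hypothetical and not part of the original program.

```python
from collections import defaultdict


def reshape_wide(rows, i, j, stubs):
    """Long -> wide: one record per value of i, one column stub+str(j) per stub."""
    wide = defaultdict(dict)
    for r in rows:
        rec = wide[r[i]]
        rec[i] = r[i]
        for s in stubs:
            key = f"{s}{r[j]}"
            if key in rec:  # a repeated (i, j) pair cannot be placed in one column
                raise ValueError(f"duplicate {i}={r[i]}, {j}={r[j]}")
            rec[key] = r[s]
    return [wide[k] for k in sorted(wide)]


# Hypothetical toy data: 2 people observed in 1979 and 1980
long_rows = [
    {"id": 1, "year": 1979, "lnhr": 7.6, "lnwg": 2.5},
    {"id": 1, "year": 1980, "lnhr": 7.7, "lnwg": 2.6},
    {"id": 2, "year": 1979, "lnhr": 7.5, "lnwg": 2.4},
    {"id": 2, "year": 1980, "lnhr": 7.4, "lnwg": 2.3},
]
wide_rows = reshape_wide(long_rows, i="id", j="year", stubs=["lnhr", "lnwg"])
for rec in wide_rows:
    print(rec)
```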

. corr lnwg1979 lnwg1980 lnwg1981 lnwg1982 lnwg1983 lnwg1984 lnwg1985 lnwg1986 lnwg1987 lnwg1988
(obs=532)

             | lnwg1979 lnwg1980 lnwg1981 lnwg1982 lnwg1983 lnwg1984 lnwg1985 lnwg1986 lnwg1987
-------------+----------------------------------------------------------------------------------
    lnwg1979 |   1.0000
    lnwg1980 |   0.8415   1.0000
    lnwg1981 |   0.8283   0.8920   1.0000
    lnwg1982 |   0.7984   0.8559   0.9015   1.0000
    lnwg1983 |   0.7795   0.8408   0.8787   0.9155   1.0000
    lnwg1984 |   0.7208   0.7737   0.8102   0.8267   0.8625   1.0000
    lnwg1985 |   0.7424   0.7929   0.8290   0.8511   0.8636   0.8620   1.0000
    lnwg1986 |   0.7250   0.7714   0.8122   0.8286   0.8530   0.8399   0.9157   1.0000
    lnwg1987 |   0.7188   0.7639   0.8029   0.8282   0.8525   0.8681   0.9117   0.9111   1.0000
    lnwg1988 |   0.7220   0.7604   0.7900   0.8139   0.8326   0.8373   0.8787   0.8743   0.9101

             | lnwg1988
-------------+---------
    lnwg1988 |   1.0000

. * The following gives Table 21.3 p.714
. corr upols1979 upols1980 upols1981 upols1982 upols1983 upols1984 upols1985 upols1986 upols1987 upols1988
(obs=532)

             | upo~1979 upo~1980 upo~1981 upo~1982 upo~1983 upo~1984 upo~1985 upo~1986 upo~1987
-------------+----------------------------------------------------------------------------------
   upols1979 |   1.0000
   upols1980 |   0.3283   1.0000
   upols1981 |   0.4442   0.4035   1.0000
   upols1982 |   0.3008   0.3140   0.5678   1.0000
   upols1983 |   0.2089   0.2298   0.3739   0.4684   1.0000
   upols1984 |   0.2025   0.2289   0.3194   0.3360   0.6398   1.0000
   upols1985 |   0.2395   0.3246   0.4087   0.3484   0.3898   0.5800   1.0000
   upols1986 |   0.1987   0.1903   0.2797   0.2470   0.3109   0.3535   0.3991   1.0000
   upols1987 |   0.2091   0.3167   0.3340   0.2877   0.3097   0.3361   0.3941   0.3496   1.0000
   upols1988 |   0.1619   0.2456   0.3016   0.2582   0.2083   0.2470   0.3436   0.5545   0.5242

             | upo~1988
-------------+---------
   upols1988 |   1.0000
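Table 21.3 is the Pearson correlation between the pooled OLS residual in year s and the residual in year t, computed across the 532 individuals of the wide data set. The sketch below builds the same lower-triangular layout that corr prints, assuming complete data (no missing values, so no casewise deletion is needed); pearson, corr_lower and the toy residuals are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_lower(columns):
    """Lower triangle {(row, col): r}, rows and columns in the dict's order."""
    names = list(columns)
    return {
        (row, col): pearson(columns[row], columns[col])
        for k, row in enumerate(names)
        for col in names[: k + 1]
    }


# Hypothetical residuals: keys are years, entries are 4 individuals
u = {
    "u1979": [0.10, -0.20, 0.05, 0.05],
    "u1980": [0.20, -0.40, 0.10, 0.10],
    "u1981": [-0.10, 0.20, -0.05, -0.05],
}
table = corr_lower(u)
for (row, col), r in table.items():
    print(f"{row} {col} {r: .4f}")
```

In the real data each column would hold 532 residuals, one per person, for a given year.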

. corr ure1979 ure1980 ure1981 ure1982 ure1983 ure1984 ure1985 ure1986 ure1987 ure1988
(obs=532)

             |  ure1979  ure1980  ure1981  ure1982  ure1983  ure1984  ure1985  ure1986  ure1987
-------------+----------------------------------------------------------------------------------
     ure1979 |   1.0000
     ure1980 |   0.0778   1.0000
     ure1981 |   0.1777   0.0604   1.0000
     ure1982 |  -0.0250  -0.0519   0.2492   1.0000
     ure1983 |  -0.2339  -0.2277  -0.1609   0.0587   1.0000
     ure1984 |  -0.2482  -0.2431  -0.2691  -0.1709   0.3795   1.0000
     ure1985 |  -0.1842  -0.0919  -0.1054  -0.1581  -0.0939   0.2197   1.0000
     ure1986 |  -0.1860  -0.2333  -0.2434  -0.2405  -0.1110  -0.0763  -0.0361   1.0000
     ure1987 |  -0.1665  -0.0481  -0.1580  -0.1904  -0.1710  -0.1506  -0.0646  -0.0553   1.0000
     ure1988 |  -0.1960  -0.1251  -0.1646  -0.1949  -0.3265  -0.2786  -0.1221   0.2708   0.2379

             |  ure1988
-------------+---------
     ure1988 |   1.0000

. * The following gives Table 21.4 p.715
. corr ufe1979 ufe1980 ufe1981 ufe1982 ufe1983 ufe1984 ufe1985 ufe1986 ufe1987 ufe1988
(obs=532)

             |  ufe1979  ufe1980  ufe1981  ufe1982  ufe1983  ufe1984  ufe1985  ufe1986  ufe1987
-------------+----------------------------------------------------------------------------------
     ufe1979 |   1.0000
     ufe1980 |   0.1017   1.0000
     ufe1981 |   0.2082   0.0802   1.0000
     ufe1982 |   0.0003  -0.0380   0.2631   1.0000
     ufe1983 |  -0.2632  -0.2691  -0.2113   0.0089   1.0000
     ufe1984 |  -0.2594  -0.2698  -0.3004  -0.2037   0.3249   1.0000
     ufe1985 |  -0.1757  -0.0958  -0.1069  -0.1685  -0.1617   0.1713   1.0000
     ufe1986 |  -0.1915  -0.2534  -0.2644  -0.2676  -0.1723  -0.1364  -0.0865   1.0000
     ufe1987 |  -0.1519  -0.0497  -0.1561  -0.2008  -0.2399  -0.2066  -0.0918  -0.0908   1.0000
     ufe1988 |  -0.1650  -0.1109  -0.1385  -0.1772  -0.3816  -0.3096  -0.1268   0.2420   0.2439

             |  ufe1988
-------------+---------
     ufe1988 |   1.0000
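The ufe residuals behind Table 21.4 are within residuals: e_it = (y_it - ybar_i) - b (x_it - xbar_i), where ybar_i and xbar_i are person-specific means over the 10 years. Because each person's within residuals sum to zero, they are mechanically negatively correlated across years. If the underlying errors were independent with constant variance, the correlation would be about -1/(T - 1) = -0.11 for T = 10 (ignoring estimation of b), which is worth bearing in mind when reading the negative entries above. A minimal sketch of the within estimator for one regressor, assuming list inputs; within_fit and the toy panels are hypothetical.

```python
from collections import defaultdict


def within_fit(ids, y, x):
    """Within (fixed-effects) slope for a single regressor and its residuals.

    Residual e_it = (y_it - ybar_i) - b * (x_it - xbar_i).
    """
    members = defaultdict(list)
    for k, g in enumerate(ids):
        members[g].append(k)
    ybar = {g: sum(y[k] for k in ks) / len(ks) for g, ks in members.items()}
    xbar = {g: sum(x[k] for k in ks) / len(ks) for g, ks in members.items()}
    yd = [y[k] - ybar[g] for k, g in enumerate(ids)]  # time-demeaned y
    xd = [x[k] - xbar[g] for k, g in enumerate(ids)]  # time-demeaned x
    b = sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)
    resid = [p - b * q for p, q in zip(yd, xd)]
    return b, resid


# Hypothetical panel with y = alpha_i + 0.5 * x exactly (alpha_1 = 1, alpha_2 = 3)
ids = [1, 1, 2, 2]
x = [1.0, 2.0, 1.0, 3.0]
y = [1.5, 2.0, 3.5, 4.5]
b, e = within_fit(ids, y, x)
print(b, e)
```

The individual intercepts never need to be estimated: demeaning removes them, which is why the slope recovers 0.5 even though the two people have different levels.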

.
. * The following does estimation for just one year
. regress lnhr1979 lnwg1979

      Source |       SS       df       MS              Number of obs =     532
-------------+------------------------------           F(  1,   530) =    0.00
       Model |  .000035507     1  .000035507           Prob > F      =  0.9810
    Residual |  33.0180361   530  .062298181           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0019
       Total |  33.0180716   531  .062180926           Root MSE      =   .2496

------------------------------------------------------------------------------
    lnhr1979 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnwg1979 |   .0006173   .0258574     0.02   0.981    -.0501783    .0514129
       _cons |   7.667738   .0680375   112.70   0.000     7.534082    7.801395
------------------------------------------------------------------------------

.
. ************ (2) ANALYSIS: OBTAIN AUTOCORRELATIONS FOR FIRST DIFFERENCES
.
. ** SET UP THE DATA
. use mom, clear

. gen dlnhr = lnhr - lnhr[_n-1]
(1 missing value generated)

. gen dlnwg = lnwg - lnwg[_n-1]
(1 missing value generated)

. * The following drops the first year which here is 1979
. drop if year == 1979
(532 observations deleted)

. regress dlnhr dlnwg

      Source |       SS       df       MS              Number of obs =    4788
-------------+------------------------------           F(  1,  4786) =   26.09
       Model |  2.27870825     1  2.27870825           Prob > F      =  0.0000
    Residual |  417.943979  4786  .087326364           R-squared     =  0.0054
-------------+------------------------------           Adj R-squared =  0.0052
       Total |  420.222687  4787  .087784142           Root MSE      =  .29551


------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0213351     5.11   0.000     .0671584    .1508118
       _cons |   .0008283   .0042712     0.19   0.846    -.0075452    .0092018
------------------------------------------------------------------------------

. predict ufdiff, residuals

. * Here just do this for lnhr and lnwg and the residuals
. keep dlnhr dlnwg ufdiff id year

. reshape wide dlnhr dlnwg ufdiff, i(id) j(year)
(note: j = 1980 1981 1982 1983 1984 1985 1986 1987 1988)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     4788   ->     532
Number of variables                   5   ->      28
j variable (9 values)              year   ->   (dropped)
xij variables:
                                  dlnhr   ->   dlnhr1980 dlnhr1981 ... dlnhr1988
                                  dlnwg   ->   dlnwg1980 dlnwg1981 ... dlnwg1988
                                 ufdiff   ->   ufdiff1980 ufdiff1981 ... ufdiff1988
-----------------------------------------------------------------------------

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
   dlnhr1980 |       532   -.0092481    .3023508       -2.5       1.71
   dlnwg1980 |       532    .0046053    .2301879      -2.12       1.05
  ufdiff1980 |       532   -.0105783    .3014161  -2.499738   1.690644
   dlnhr1981 |       532    .0075564    .2668644       -1.2       2.32
-------------+--------------------------------------------------------
   dlnwg1981 |       532    .0085902    .1818033       -.79       1.62
  ufdiff1981 |       532    .0057919    .2669213  -1.145188   2.343149
   dlnhr1982 |       532   -.0215602     .212834      -2.06       1.14
   dlnwg1982 |       532    .0037218    .1755574      -1.17        .74
  ufdiff1982 |       532   -.0227941     .213709  -2.036851   1.135902
-------------+--------------------------------------------------------
   dlnhr1983 |       532   -.0330263    .3413969      -4.51   .9899998
   dlnwg1983 |       532   -.0041541    .1673057       -.88   .6399999
  ufdiff1983 |       532   -.0334019    .3398726  -4.419281   .9780819
   dlnhr1984 |       532    .0234586    .3034213      -2.31       2.57
   dlnwg1984 |       532   -.0103383    .2342514      -2.13        .77
-------------+--------------------------------------------------------
  ufdiff1984 |       532    .0237571    .3004287  -2.168058   2.502691
   dlnhr1985 |       532    .0318421    .2772558      -1.46       3.52
   dlnwg1985 |       532    .0147556    .2371054      -1.33       3.06
  ufdiff1985 |       532    .0294057    .2697542  -1.315878   3.185677
   dlnhr1986 |       532   -.0090789    .3270724      -4.79        1.8
-------------+--------------------------------------------------------
   dlnwg1986 |       532    -.012312    .1804162      -1.83       1.04
  ufdiff1986 |       532   -.0085654    .3299129  -4.796278   1.789363
   dlnhr1987 |       532    .0147744    .3470122      -3.24       4.52
   dlnwg1987 |       532    .0120677    .1845692  -.9400001       1.95
  ufdiff1987 |       532    .0126309    .3494111  -3.243008   4.550777
-------------+--------------------------------------------------------
   dlnhr1988 |       532    .0057707    .2587991       -2.5       2.74
   dlnwg1988 |       532    .0109023     .194813       -1.5       1.22
  ufdiff1988 |       532    .0037542    .2576554  -2.337351   2.739172

.
. ** GET THE CORRELATIONS
. corr dlnhr1980 dlnhr1981 dlnhr1982 dlnhr1983 dlnhr1984 dlnhr1985 dlnhr1986 dlnhr1987 dlnhr1988
(obs=532)

             | dlnhr1~0 dlnhr1~1 dlnhr1~2 dlnhr1~3 dlnhr1~4 dlnhr1~5 dlnhr1~6 dlnhr1~7 dlnhr1~8
-------------+----------------------------------------------------------------------------------
   dlnhr1980 |   1.0000
   dlnhr1981 |  -0.6289   1.0000
   dlnhr1982 |   0.0402  -0.2306   1.0000
   dlnhr1983 |   0.0144  -0.0204  -0.2209   1.0000
   dlnhr1984 |  -0.0001  -0.0570  -0.1410  -0.4495   1.0000
   dlnhr1985 |   0.0393  -0.0320  -0.0827  -0.4035  -0.1969   1.0000
   dlnhr1986 |  -0.0629   0.0322   0.0112   0.0233  -0.1192  -0.2334   1.0000
   dlnhr1987 |   0.0811  -0.0709  -0.0029  -0.0448  -0.0202   0.0093  -0.6231   1.0000
   dlnhr1988 |  -0.0341   0.0461  -0.0082  -0.1020   0.0261   0.0682   0.2486  -0.6064   1.0000
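The set-up for section (2) differences with lnhr[_n-1], which refers to the previous row of the whole data set. As a result, the 1979 row of every person other than person 1 is differenced against the 1988 row of the previous person, and dropping year 1979 removes exactly those invalid differences (plus the single missing value). A sketch of a within-person first difference that makes this explicit, assuming records sorted by id and then year; first_differences and the toy data are hypothetical.

```python
def first_differences(ids, years, values):
    """v_it - v_i,t-1 within each individual; None when the previous year is absent.

    Assumes the records are sorted by id and then by year.
    """
    prev = {}  # id -> (year, value) of the most recent record seen for that id
    out = []
    for g, t, v in zip(ids, years, values):
        last = prev.get(g)
        if last is not None and last[0] == t - 1:
            out.append(v - last[1])
        else:
            out.append(None)  # first year of the person, or a gap in the panel
        prev[g] = (t, v)
    return out


# Hypothetical data for two people
ids = [1, 1, 1, 2, 2]
years = [1979, 1980, 1981, 1979, 1980]
lnhr = [7.50, 8.00, 7.75, 7.00, 6.50]
d = first_differences(ids, years, lnhr)
print(d)  # [None, 0.5, -0.25, None, -0.5]
```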

. corr dlnwg1980 dlnwg1981 dlnwg1982 dlnwg1983 dlnwg1984 dlnwg1985 dlnwg1986 dlnwg1987 dlnwg1988
(obs=532)

             | dlnwg1~0 dlnwg1~1 dlnwg1~2 dlnwg1~3 dlnwg1~4 dlnwg1~5 dlnwg1~6 dlnwg1~7 dlnwg1~8
-------------+----------------------------------------------------------------------------------
   dlnwg1980 |   1.0000
   dlnwg1981 |  -0.3507   1.0000
   dlnwg1982 |  -0.0149  -0.2849   1.0000
   dlnwg1983 |   0.0215  -0.0351  -0.3338   1.0000
   dlnwg1984 |  -0.0112   0.0098  -0.0686  -0.1899   1.0000
   dlnwg1985 |  -0.0135  -0.0085   0.0141  -0.1179  -0.5560   1.0000
   dlnwg1986 |  -0.0121   0.0289  -0.0303   0.0725  -0.0526  -0.2665   1.0000
   dlnwg1987 |  -0.0042  -0.0119   0.0382  -0.0083   0.1200  -0.1482  -0.5043   1.0000
   dlnwg1988 |  -0.0281  -0.0377   0.0157  -0.0133  -0.0174  -0.0058  -0.0174  -0.2627   1.0000


. corr ufdiff1980 ufdiff1981 ufdiff1982 ufdiff1983 ufdiff1984 ufdiff1985 ufdiff1986 ufdiff1987 ufdiff1988
(obs=532)

             | ufd~1980 ufd~1981 ufd~1982 ufd~1983 ufd~1984 ufd~1985 ufd~1986 ufd~1987 ufd~1988
-------------+----------------------------------------------------------------------------------
  ufdiff1980 |   1.0000
  ufdiff1981 |  -0.6263   1.0000
  ufdiff1982 |   0.0451  -0.2389   1.0000
  ufdiff1983 |   0.0128  -0.0239  -0.2316   1.0000
  ufdiff1984 |  -0.0010  -0.0588  -0.1291  -0.4804   1.0000
  ufdiff1985 |   0.0453  -0.0285  -0.0868  -0.3731  -0.1853   1.0000
  ufdiff1986 |  -0.0674   0.0321   0.0110   0.0256  -0.1138  -0.2538   1.0000
  ufdiff1987 |   0.0811  -0.0711  -0.0077  -0.0533  -0.0081   0.0211  -0.6250   1.0000
  ufdiff1988 |  -0.0323   0.0499   0.0022  -0.1019   0.0368   0.0543   0.2326  -0.5943   1.0000
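A useful benchmark for the ufdiff table: if the idiosyncratic errors e_t were serially uncorrelated with constant variance sigma^2, then the differenced error de_t = e_t - e_{t-1} has Var(de_t) = 2 sigma^2 and Cov(de_t, de_{t-1}) = -sigma^2. First differencing alone would therefore produce a first-order correlation of -0.5 and zero correlation at longer lags. The first-order entries in the table range from about -0.19 (1985 with 1984) to -0.63 (1981 with 1980). The short simulation below, with an arbitrary seed and sample size chosen only for illustration, merely confirms the -0.5 benchmark; it says nothing about the MOM data themselves.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2005)                            # arbitrary seed
e = [rng.gauss(0.0, 1.0) for _ in range(200_001)]    # serially independent errors
de = [b - a for a, b in zip(e, e[1:])]               # first differences
r1 = pearson(de[1:], de[:-1])                        # lag-1 correlation
r2 = pearson(de[2:], de[:-2])                        # lag-2 correlation
print(f"lag 1: {r1:.3f}  (theory -0.5)")
print(f"lag 2: {r2:.3f}  (theory 0)")
```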

.
. ************ (3) ANALYSIS: CORRELATIONS FOR AN INDIVIDUAL OBSERVATION
.
. * Look at correlations for each individual
.
. ** TRANSFORM DATA FROM LONG FORM TO WIDE FORM FOR INDIVIDUALS
.
. use mom3, replace

. * Here just do this for lnhr and lnwg
. keep lnhr lnwg id year

. reshape wide lnhr lnwg, i(year) j(id)
(note: j = 1 2 3 ... 531 532)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->      10
Number of variables                   4   ->    1065
j variable (532 values)              id   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1 lnhr2 ... lnhr532
                                   lnwg   ->   lnwg1 lnwg2 ... lnwg532
-----------------------------------------------------------------------------

. * Note that i and j are reversed
.
. * Since id is 1 to 532 this will create
. * lnhr1 to lnhr532 and lnwg1 to lnwg532
.
. tsset year
        time variable:  year, 1979 to 1988

.
. * First-order Correlation over T years for the first observation
. corr lnhr1 L.lnhr1
(obs=9)

             |              L.
             |    lnhr1    lnhr1
-------------+------------------
       lnhr1 |
          -- |   1.0000
          L1 |   0.6378   1.0000

. * First-order Correlation over T years for the second observation
. corr lnhr2 L.lnhr2
(obs=9)

             |              L.
             |    lnhr2    lnhr2
-------------+------------------
       lnhr2 |
          -- |   1.0000
          L1 |   0.5553   1.0000
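After the reshape, corr lnhr1 L.lnhr1 is the ordinary Pearson correlation of the 9 pairs (lnhr in year t, lnhr in year t-1) for t = 1980, ..., 1988. Each person-specific estimate therefore rests on very few pairs and should be read as imprecise. A sketch of that calculation for one person's series, assuming no missing years; lag1_corr and the toy series are hypothetical. Like a Pearson correlation of the pairs, it uses separate means for the current and lagged subsamples, not the single full-sample mean of the textbook autocorrelation estimator.

```python
import math


def lag1_corr(series):
    """Pearson correlation of the pairs (y_t, y_{t-1}) for one individual.

    Undefined (division by zero) if either subsample is constant.
    """
    cur, lag = series[1:], series[:-1]
    n = len(cur)
    mc, ml = sum(cur) / n, sum(lag) / n
    scl = sum((a - mc) * (b - ml) for a, b in zip(cur, lag))
    scc = sum((a - mc) ** 2 for a in cur)
    sll = sum((b - ml) ** 2 for b in lag)
    return scl / math.sqrt(scc * sll)


# Hypothetical 10-year series of log hours for one person (1979-1988)
lnhr_person = [7.6, 7.7, 7.7, 7.5, 7.4, 7.6, 7.7, 7.8, 7.7, 7.9]
print(f"{lag1_corr(lnhr_person):.4f}")
```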

. * And so on
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section5\mma21p2panresiduals.txt
  log type:  text
 closed on:  23 May 2005, 11:37:30
-----------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section5\mma21p3panresiduals.txt
  log type:  text
 opened on:  23 May 2005, 13:01:06

.
. ********** OVERVIEW OF MMA21P3PANRESIDUALS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 21.3.4 pages 713-15 Residual analysis
. * This program
. * (1) estimates correlations for
. *     - dependent variable
. *     - regressor variable
. *     - residuals from pooled ols [Table 21.3]
. *     - residuals from within estimation [Table 21.4]
. *     - residuals from random effects estimation
. * (2) separately estimates correlations for
. *     - residuals from first differences estimation
. * (3) gets correlations for each individual observation
.
. * The code is very limited:
. * - it considers only one regressor
. * - it assumes a balanced data set with exactly 10 years of data per observation
. * - it does not use loops for transformations which would generalize code
.
. * The four basic linear panel programs are
. * mma21p1panfeandre.do     Linear fixed and random effects using xtreg
. * mma21p2panfeandre.do     Linear fe and re using transformation and regress
.*                          plus also has valid Hausman test
. * mma21p3panresiduals.do   Residual analysis after linear fe and re
. * mma21p4panpangls.do      Pooled panel OLS and GLS
.
. * To run you need file
. *   MOM.dat
. * in your directory
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
.
. ********** READ DATA **********
.*
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988

.
. ************ (1) ANALYSIS: OBTAIN KEY AUTOCORRELATIONS Tables 21.3, 21.4 **********
.
. ** RUN REGRESSIONS AND GET RESIDUALS OF INTEREST
.
. * pooled ols
. regress lnhr lnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   82.22
       Model |  6.60538417     1  6.60538417           Prob > F      =  0.0000
    Residual |  427.225206  5318  .080335691           R-squared     =  0.0152
-------------+------------------------------           Adj R-squared =  0.0150
       Total |  433.830591  5319  .081562435           Root MSE      =  .28344

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0091251     9.07   0.000     .0648545    .1006326
       _cons |   7.441516   .0241265   308.44   0.000     7.394219    7.488814
------------------------------------------------------------------------------

. predict upols, residuals

.
. * fixed effects (within)
. xtreg lnhr lnwg, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,4787)          =     78.96
corr(u_i, Xb)  = -0.1995                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1676755     .01887     8.89   0.000     .1306816    .2046694
       _cons |   7.219892   .0493434   146.32   0.000     7.123156    7.316628
-------------+----------------------------------------------------------------
     sigma_u |  .18142881
     sigma_e |  .23278339
         rho |  .37789558   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(531, 4787) =     5.83           Prob > F = 0.0000

. predict ufe, e

.
. * random effects
. xtreg lnhr lnwg, re i(id)

Random-effects GLS regression                   Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     76.64
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1193322   .0136312     8.75   0.000     .0926155     .146049
       _cons |   7.346041   .0363925   201.86   0.000     7.274713    7.417368
-------------+----------------------------------------------------------------
     sigma_u |  .16124733
     sigma_e |  .23278339
         rho |  .32424354   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict ure, e

.
. summarize upols ufe ure

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       upols |      5320   -1.27e-10    .2834089  -4.826247    .964581
         ufe |      5320   -5.52e-11    .2208354  -4.003929     1.2719
         ure |      5320   -9.00e-11    .2231118  -4.131111   1.085362


. save mom3, replace
file mom3.dta saved

.
. ** TRANSFORM DATA FROM LONG FORM TO WIDE FORM
.
. * Here just do this for lnhr and lnwg and the residuals
. keep lnhr lnwg id year upols ufe ure

. reshape wide lnhr lnwg upols ufe ure, i(id) j(year)
(note: j = 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->     532
Number of variables                   7   ->      51
j variable (10 values)             year   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1979 lnhr1980 ... lnhr1988
                                   lnwg   ->   lnwg1979 lnwg1980 ... lnwg1988
                                  upols   ->   upols1979 upols1980 ... upols1988
                                    ufe   ->   ufe1979 ufe1980 ... ufe1988
                                    ure   ->   ure1979 ure1980 ... ure1988
-----------------------------------------------------------------------------

.
. * Since year is 1979 to 1988 this will create
. * lnhr1979 to lnhr1988 and lnwg1979 to lnwg1988
.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
    lnhr1979 |       532    7.669342     .249361       5.89       8.54
    lnwg1979 |       532    2.597763    .4188951        .52       4.62
   upols1979 |       532    .0128775    .2517228  -1.764168   .8312218
     ufe1979 |       532    .0138689    .2249175  -1.578105     1.2719
-------------+--------------------------------------------------------
     ure1979 |       532    .0133046    .2200196  -1.618987   1.085362
    lnhr1980 |       532    7.660094    .2691995       5.22       8.34
    lnwg1980 |       532    2.602368    .3945963         .8       4.61
   upols1980 |       532    .0032483    .2679463  -2.354734   .6659743
     ufe1980 |       532    .0038486    .2253673  -2.085636   1.128546
-------------+--------------------------------------------------------
     ure1980 |       532    .0035069    .2238723  -2.089847   .9429754
    lnhr1981 |       532     7.66765    .2105797       6.36        8.4
    lnwg1981 |       532    2.610959    .3870011       1.53       4.53
   upols1981 |       532    .0100939    .2133106  -1.342159   .7582438
     ufe1981 |       532    .0099646     .163407  -1.001722    1.03687
-------------+--------------------------------------------------------
     ure1981 |       532    .0100382    .1596593   -1.02491   .8517824
    lnhr1982 |       532     7.64609    .2427195       5.38       8.31
    lnwg1982 |       532     2.61468    .4014363       1.21       4.61
   upols1982 |       532   -.0117742    .2422735  -2.264238   .6897579
     ufe1982 |       532   -.0122196    .1890237  -1.623214   .7918997
-------------+--------------------------------------------------------
     ure1982 |       532   -.0119661    .1875585  -1.737484   .6666697
    lnhr1983 |       532    7.613064     .382703       2.77       8.37
    lnwg1983 |       532    2.610526    .4111869       1.08       4.62
   upols1983 |       532   -.0444568    .3778255  -4.826247   .7307264
     ufe1983 |       532   -.0445494    .2836351  -3.577253   .5196197
-------------+--------------------------------------------------------
     ure1983 |       532   -.0444967     .294545  -3.804399   .5078294
    lnhr1984 |       532    7.636523    .3316735       3.18       8.44
    lnwg1984 |       532    2.600188    .4621549       -.26       4.65
   upols1984 |       532   -.0201427    .3208512  -4.240003   .8263766
     ufe1984 |       532   -.0193572     .225836  -2.810104   .8327778
-------------+--------------------------------------------------------
     ure1984 |       532   -.0198043    .2378605  -3.140221   .7036628
    lnhr1985 |       532    7.668365    .2597423       5.08       8.54
    lnwg1985 |       532    2.614944    .4347554       1.33       4.69
   upols1985 |       532    .0104785     .259051  -2.503835   .8624523
     ufe1985 |       532    .0100107    .1856724  -1.581894   .7944546
-------------+--------------------------------------------------------
     ure1985 |       532     .010277    .1886509  -1.752727   .7370209
    lnhr1986 |       532    7.659286    .3330862       2.77       8.38
    lnwg1986 |       532    2.602632    .4432807        .07       4.59
   upols1986 |       532    .0024183    .3312105  -4.801424   .7439653
     ufe1986 |       532    .0029962    .2595405  -4.003929   .6384854
-------------+--------------------------------------------------------
     ure1986 |       532    .0026673     .264328  -4.131111   .5111209
    lnhr1987 |       532     7.67406    .2745015       4.38       8.56
    lnwg1987 |       532    2.614699    .4300122       1.28       4.03
   upols1987 |       532    .0161942    .2749153  -3.283269    .964581
     ufe1987 |       532    .0157472    .2141618  -2.817174   1.009662
-------------+--------------------------------------------------------
     ure1987 |       532    .0160016    .2148092  -2.897725   .8441463
    lnhr1988 |       532    7.679831    .2552894       4.79       8.53
    lnwg1988 |       532    2.625602    .4701759       -.22        4.6
   upols1988 |       532    .0210628    .2519891  -2.633313   .9072749
     ufe1988 |       532    .0196898    .2048927   -1.68379   1.123516
-------------+--------------------------------------------------------
     ure1988 |       532    .0204713    .2022375  -1.897506   .9393954

.
. ** OBTAIN THE VARIOUS CORRELATIONS
.
. corr lnhr1979 lnhr1980 lnhr1981 lnhr1982 lnhr1983 lnhr1984 lnhr1985 lnhr1986 lnhr1987 lnhr1988
(obs=532)


             | lnhr1979 lnhr1980 lnhr1981 lnhr1982 lnhr1983 lnhr1984 lnhr1985 lnhr1986 lnhr1987
-------------+----------------------------------------------------------------------------------
    lnhr1979 |   1.0000
    lnhr1980 |   0.3220   1.0000
    lnhr1981 |   0.4321   0.4022   1.0000
    lnhr1982 |   0.2947   0.3142   0.5670   1.0000
    lnhr1983 |   0.2070   0.2324   0.3788   0.4781   1.0000
    lnhr1984 |   0.1908   0.2235   0.3141   0.3318   0.6476   1.0000
    lnhr1985 |   0.2284   0.3184   0.3999   0.3453   0.3930   0.5839   1.0000
    lnhr1986 |   0.1934   0.1931   0.2813   0.2524   0.3162   0.3595   0.4128   1.0000
    lnhr1987 |   0.1986   0.3160   0.3322   0.2951   0.3261   0.3464   0.3987   0.3603   1.0000
    lnhr1988 |   0.1640   0.2551   0.3081   0.2674   0.2267   0.2537   0.3509   0.5741   0.5248

             | lnhr1988
-------------+---------
    lnhr1988 |   1.0000

. corr lnwg1979 lnwg1980 lnwg1981 lnwg1982 lnwg1983 lnwg1984 lnwg1985 lnwg1986 lnwg1987 lnwg1988
(obs=532)

             | lnwg1979 lnwg1980 lnwg1981 lnwg1982 lnwg1983 lnwg1984 lnwg1985 lnwg1986 lnwg1987
-------------+----------------------------------------------------------------------------------
    lnwg1979 |   1.0000
    lnwg1980 |   0.8415   1.0000
    lnwg1981 |   0.8283   0.8920   1.0000
    lnwg1982 |   0.7984   0.8559   0.9015   1.0000
    lnwg1983 |   0.7795   0.8408   0.8787   0.9155   1.0000
    lnwg1984 |   0.7208   0.7737   0.8102   0.8267   0.8625   1.0000
    lnwg1985 |   0.7424   0.7929   0.8290   0.8511   0.8636   0.8620   1.0000
    lnwg1986 |   0.7250   0.7714   0.8122   0.8286   0.8530   0.8399   0.9157   1.0000
    lnwg1987 |   0.7188   0.7639   0.8029   0.8282   0.8525   0.8681   0.9117   0.9111   1.0000
    lnwg1988 |   0.7220   0.7604   0.7900   0.8139   0.8326   0.8373   0.8787   0.8743   0.9101

             | lnwg1988
-------------+---------
    lnwg1988 |   1.0000

. * The following gives Table 21.3 p.714
. corr upols1979 upols1980 upols1981 upols1982 upols1983 upols1984 upols1985 upols1986 upols1987 upols1988
(obs=532)

             | upo~1979 upo~1980 upo~1981 upo~1982 upo~1983 upo~1984 upo~1985 upo~1986 upo~1987
-------------+----------------------------------------------------------------------------------
   upols1979 |   1.0000
   upols1980 |   0.3283   1.0000
   upols1981 |   0.4442   0.4035   1.0000
   upols1982 |   0.3008   0.3140   0.5678   1.0000
   upols1983 |   0.2089   0.2298   0.3739   0.4684   1.0000
   upols1984 |   0.2025   0.2289   0.3194   0.3360   0.6398   1.0000
   upols1985 |   0.2395   0.3246   0.4087   0.3484   0.3898   0.5800   1.0000
   upols1986 |   0.1987   0.1903   0.2797   0.2470   0.3109   0.3535   0.3991   1.0000
   upols1987 |   0.2091   0.3167   0.3340   0.2877   0.3097   0.3361   0.3941   0.3496   1.0000
   upols1988 |   0.1619   0.2456   0.3016   0.2582   0.2083   0.2470   0.3436   0.5545   0.5242

             | upo~1988
-------------+---------
   upols1988 |   1.0000

. corr ure1979 ure1980 ure1981 ure1982 ure1983 ure1984 ure1985 ure1986 ure1987 ure1988
(obs=532)

             |  ure1979  ure1980  ure1981  ure1982  ure1983  ure1984  ure1985  ure1986  ure1987
-------------+---------------------------------------------------------------------------------
     ure1979 |   1.0000
     ure1980 |   0.0778   1.0000
     ure1981 |   0.1777   0.0604   1.0000
     ure1982 |  -0.0250  -0.0519   0.2492   1.0000
     ure1983 |  -0.2339  -0.2277  -0.1609   0.0587   1.0000
     ure1984 |  -0.2482  -0.2431  -0.2691  -0.1709   0.3795   1.0000
     ure1985 |  -0.1842  -0.0919  -0.1054  -0.1581  -0.0939   0.2197   1.0000
     ure1986 |  -0.1860  -0.2333  -0.2434  -0.2405  -0.1110  -0.0763  -0.0361   1.0000
     ure1987 |  -0.1665  -0.0481  -0.1580  -0.1904  -0.1710  -0.1506  -0.0646  -0.0553   1.0000
     ure1988 |  -0.1960  -0.1251  -0.1646  -0.1949  -0.3265  -0.2786  -0.1221   0.2708   0.2379

             |  ure1988
-------------+---------
     ure1988 |   1.0000

. * The following gives Table 21.4 p.715
. corr ufe1979 ufe1980 ufe1981 ufe1982 ufe1983 ufe1984 ufe1985 ufe1986 ufe1987 ufe1988
(obs=532)

             |  ufe1979  ufe1980  ufe1981  ufe1982  ufe1983  ufe1984  ufe1985  ufe1986  ufe1987
-------------+---------------------------------------------------------------------------------
     ufe1979 |   1.0000
     ufe1980 |   0.1017   1.0000
     ufe1981 |   0.2082   0.0802   1.0000
     ufe1982 |   0.0003  -0.0380   0.2631   1.0000
     ufe1983 |  -0.2632  -0.2691  -0.2113   0.0089   1.0000
     ufe1984 |  -0.2594  -0.2698  -0.3004  -0.2037   0.3249   1.0000
     ufe1985 |  -0.1757  -0.0958  -0.1069  -0.1685  -0.1617   0.1713   1.0000
     ufe1986 |  -0.1915  -0.2534  -0.2644  -0.2676  -0.1723  -0.1364  -0.0865   1.0000
     ufe1987 |  -0.1519  -0.0497  -0.1561  -0.2008  -0.2399  -0.2066  -0.0918  -0.0908   1.0000
     ufe1988 |  -0.1650  -0.1109  -0.1385  -0.1772  -0.3816  -0.3096  -0.1268   0.2420   0.2439

             |  ufe1988
-------------+---------
     ufe1988 |   1.0000
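Each entry of Tables 21.3 and 21.4 is an ordinary Pearson correlation between two year-specific residual columns of the wide data set, computed across the N = 532 men. A minimal stdlib Python sketch of that computation, assuming the wide data are held as a dict that maps a column name (hypothetical names such as `u1979`) to a list with one value per individual; `pearson` and `corr_lower` are illustrative helpers, not part of the original do-file.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_lower(wide: Dict[str, List[float]]) -> List[List[float]]:
    """Lower triangle of the correlation matrix of the columns of `wide`.

    Row r holds r + 1 entries (columns 0..r), the layout Stata's `corr`
    prints; each column is one year and each list position one person.
    """
    names = list(wide)
    return [
        [pearson(wide[names[r]], wide[names[c]]) for c in range(r + 1)]
        for r in range(len(names))
    ]


if __name__ == "__main__":
    # Toy residuals for four hypothetical individuals in three years.
    toy = {
        "u1979": [0.1, -0.2, 0.3, -0.2],
        "u1980": [0.2, -0.4, 0.6, -0.4],
        "u1981": [-0.1, 0.2, -0.3, 0.2],
    }
    for name, row in zip(toy, corr_lower(toy)):
        print(f"{name:>8} |", " ".join(f"{v:8.4f}" for v in row))
```

Because every column uses the same 532 people, the matrix can be read as the sample analogue of the within-person error correlation across years.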

. 
. * The following does estimation for just one year
. regress lnhr1979 lnwg1979

      Source |       SS       df       MS              Number of obs =     532
-------------+------------------------------           F(  1,   530) =    0.00
       Model |  .000035507     1  .000035507           Prob > F      =  0.9810
    Residual |  33.0180361   530  .062298181           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0019
       Total |  33.0180716   531  .062180926           Root MSE      =   .2496

------------------------------------------------------------------------------
    lnhr1979 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnwg1979 |   .0006173   .0258574     0.02   0.981    -.0501783    .0514129
       _cons |   7.667738   .0680375   112.70   0.000     7.534082    7.801395
------------------------------------------------------------------------------

. 
. ************ (2) ANALYSIS: OBTAIN AUTOCORRELATIONS FOR FIRST DIFFERENCES
. 
. ** SET UP THE DATA
. use mom, clear

. gen dlnhr = lnhr - lnhr[_n-1]
(1 missing value generated)

. gen dlnwg = lnwg - lnwg[_n-1]
(1 missing value generated)

. * The following drops the first year which here is 1979
. drop if year == 1979
(532 observations deleted)

. regress dlnhr dlnwg

      Source |       SS       df       MS              Number of obs =    4788
-------------+------------------------------           F(  1,  4786) =   26.09
       Model |  2.27870825     1  2.27870825           Prob > F      =  0.0000
    Residual |  417.943979  4786  .087326364           R-squared     =  0.0054
-------------+------------------------------           Adj R-squared =  0.0052
       Total |  420.222687  4787  .087784142           Root MSE      =  .29551

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0213351     5.11   0.000     .0671584    .1508118
       _cons |   .0008283   .0042712     0.19   0.846    -.0075452    .0092018
------------------------------------------------------------------------------

. predict ufdiff, residuals

. * Here just do this for lnhr and lnwg and the residuals
. keep dlnhr dlnwg ufdiff id year

. reshape wide dlnhr dlnwg ufdiff, i(id) j(year)
(note: j = 1980 1981 1982 1983 1984 1985 1986 1987 1988)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     4788   ->     532
Number of variables                   5   ->      28
j variable (9 values)              year   ->   (dropped)
xij variables:
                                  dlnhr   ->   dlnhr1980 dlnhr1981 ... dlnhr1988
                                  dlnwg   ->   dlnwg1980 dlnwg1981 ... dlnwg1988
                                 ufdiff   ->   ufdiff1980 ufdiff1981 ... ufdiff1988
-----------------------------------------------------------------------------

. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |     532       266.5    153.7194          1        532
   dlnhr1980 |     532   -.0092481    .3023508       -2.5       1.71
   dlnwg1980 |     532    .0046053    .2301879      -2.12       1.05
  ufdiff1980 |     532   -.0105783    .3014161  -2.499738   1.690644
   dlnhr1981 |     532    .0075564    .2668644       -1.2       2.32
-------------+--------------------------------------------------------
   dlnwg1981 |     532    .0085902    .1818033       -.79       1.62
  ufdiff1981 |     532    .0057919    .2669213  -1.145188   2.343149
   dlnhr1982 |     532   -.0215602     .212834      -2.06       1.14
   dlnwg1982 |     532    .0037218    .1755574      -1.17        .74
  ufdiff1982 |     532   -.0227941     .213709  -2.036851   1.135902
-------------+--------------------------------------------------------
   dlnhr1983 |     532   -.0330263    .3413969      -4.51   .9899998
   dlnwg1983 |     532   -.0041541    .1673057       -.88   .6399999
  ufdiff1983 |     532   -.0334019    .3398726  -4.419281   .9780819
   dlnhr1984 |     532    .0234586    .3034213      -2.31       2.57
   dlnwg1984 |     532   -.0103383    .2342514      -2.13        .77
-------------+--------------------------------------------------------
  ufdiff1984 |     532    .0237571    .3004287  -2.168058   2.502691
   dlnhr1985 |     532    .0318421    .2772558      -1.46       3.52
   dlnwg1985 |     532    .0147556    .2371054      -1.33       3.06
  ufdiff1985 |     532    .0294057    .2697542  -1.315878   3.185677
   dlnhr1986 |     532   -.0090789    .3270724      -4.79        1.8
-------------+--------------------------------------------------------
   dlnwg1986 |     532    -.012312    .1804162      -1.83       1.04
  ufdiff1986 |     532   -.0085654    .3299129  -4.796278   1.789363
   dlnhr1987 |     532    .0147744    .3470122      -3.24       4.52
   dlnwg1987 |     532    .0120677    .1845692  -.9400001       1.95
  ufdiff1987 |     532    .0126309    .3494111  -3.243008   4.550777
-------------+--------------------------------------------------------
   dlnhr1988 |     532    .0057707    .2587991       -2.5       2.74
   dlnwg1988 |     532    .0109023     .194813       -1.5       1.22
  ufdiff1988 |     532    .0037542    .2576554  -2.337351   2.739172

. 
. ** GET THE CORRELATIONS
. corr dlnhr1980 dlnhr1981 dlnhr1982 dlnhr1983 dlnhr1984 dlnhr1985 dlnhr1986 dlnhr1987 dlnhr1988
(obs=532)

             | dlnhr1~0 dlnhr1~1 dlnhr1~2 dlnhr1~3 dlnhr1~4 dlnhr1~5 dlnhr1~6 dlnhr1~7 dlnhr1~8
-------------+---------------------------------------------------------------------------------
   dlnhr1980 |   1.0000
   dlnhr1981 |  -0.6289   1.0000
   dlnhr1982 |   0.0402  -0.2306   1.0000
   dlnhr1983 |   0.0144  -0.0204  -0.2209   1.0000
   dlnhr1984 |  -0.0001  -0.0570  -0.1410  -0.4495   1.0000
   dlnhr1985 |   0.0393  -0.0320  -0.0827  -0.4035  -0.1969   1.0000
   dlnhr1986 |  -0.0629   0.0322   0.0112   0.0233  -0.1192  -0.2334   1.0000
   dlnhr1987 |   0.0811  -0.0709  -0.0029  -0.0448  -0.0202   0.0093  -0.6231   1.0000
   dlnhr1988 |  -0.0341   0.0461  -0.0082  -0.1020   0.0261   0.0682   0.2486  -0.6064   1.0000
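The first differences above are built with `lnhr - lnhr[_n-1]`, where `_n-1` is simply the previous row of the stacked data. With the data sorted by id and then year, each person's 1979 row is therefore differenced against the previous person's 1988 row (and the very first row is missing), which is why the do-file then drops year 1979. A minimal stdlib Python sketch contrasting the two rules, assuming a balanced panel held as a sorted list of hypothetical `(id, year, value)` tuples; `naive_diff` and `within_diff` are illustrative names only.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int, float]  # (id, year, value), sorted by id and then year


def naive_diff(rows: List[Row]) -> List[Optional[float]]:
    """Mimics Stata's `gen d = x - x[_n-1]`: the 'lag' is simply the previous
    row, even when that row belongs to a different individual."""
    out: List[Optional[float]] = [None]  # first row has no previous row
    for prev, cur in zip(rows, rows[1:]):
        out.append(cur[2] - prev[2])
    return out


def within_diff(rows: List[Row]) -> List[Optional[float]]:
    """First difference that is missing unless the previous row is the same
    individual in the immediately preceding year."""
    out: List[Optional[float]] = [None]
    for prev, cur in zip(rows, rows[1:]):
        same_person_next_year = prev[0] == cur[0] and prev[1] == cur[1] - 1
        out.append(cur[2] - prev[2] if same_person_next_year else None)
    return out


if __name__ == "__main__":
    # Two hypothetical individuals observed 1979-1981.
    rows = [(1, 1979, 7.6), (1, 1980, 7.7), (1, 1981, 7.5),
            (2, 1979, 7.9), (2, 1980, 7.8), (2, 1981, 8.0)]
    naive = naive_diff(rows)
    # Dropping the first year removes exactly the cross-person differences.
    kept = [d for r, d in zip(rows, naive) if r[1] != 1979]
    print(kept == [d for d in within_diff(rows) if d is not None])
```

The equivalence relies on every person having all years in a contiguous block, as here, where each of the 532 men contributes 10 rows.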

. corr dlnwg1980 dlnwg1981 dlnwg1982 dlnwg1983 dlnwg1984 dlnwg1985 dlnwg1986 dlnwg1987 dlnwg1988
(obs=532)

             | dlnwg1~0 dlnwg1~1 dlnwg1~2 dlnwg1~3 dlnwg1~4 dlnwg1~5 dlnwg1~6 dlnwg1~7 dlnwg1~8
-------------+---------------------------------------------------------------------------------
   dlnwg1980 |   1.0000
   dlnwg1981 |  -0.3507   1.0000
   dlnwg1982 |  -0.0149  -0.2849   1.0000
   dlnwg1983 |   0.0215  -0.0351  -0.3338   1.0000
   dlnwg1984 |  -0.0112   0.0098  -0.0686  -0.1899   1.0000
   dlnwg1985 |  -0.0135  -0.0085   0.0141  -0.1179  -0.5560   1.0000
   dlnwg1986 |  -0.0121   0.0289  -0.0303   0.0725  -0.0526  -0.2665   1.0000
   dlnwg1987 |  -0.0042  -0.0119   0.0382  -0.0083   0.1200  -0.1482  -0.5043   1.0000
   dlnwg1988 |  -0.0281  -0.0377   0.0157  -0.0133  -0.0174  -0.0058  -0.0174  -0.2627   1.0000

. corr ufdiff1980 ufdiff1981 ufdiff1982 ufdiff1983 ufdiff1984 ufdiff1985 ufdiff1986 ufdiff1987 ufdiff1988
(obs=532)

             | ufd~1980 ufd~1981 ufd~1982 ufd~1983 ufd~1984 ufd~1985 ufd~1986 ufd~1987 ufd~1988
-------------+---------------------------------------------------------------------------------
  ufdiff1980 |   1.0000
  ufdiff1981 |  -0.6263   1.0000
  ufdiff1982 |   0.0451  -0.2389   1.0000
  ufdiff1983 |   0.0128  -0.0239  -0.2316   1.0000
  ufdiff1984 |  -0.0010  -0.0588  -0.1291  -0.4804   1.0000
  ufdiff1985 |   0.0453  -0.0285  -0.0868  -0.3731  -0.1853   1.0000
  ufdiff1986 |  -0.0674   0.0321   0.0110   0.0256  -0.1138  -0.2538   1.0000
  ufdiff1987 |   0.0811  -0.0711  -0.0077  -0.0533  -0.0081   0.0211  -0.6250   1.0000
  ufdiff1988 |  -0.0323   0.0499   0.0022  -0.1019   0.0368   0.0543   0.2326  -0.5943   1.0000
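A benchmark for reading the three first-difference correlation matrices above: suppose the idiosyncratic level error u_it (after any individual effect has been removed by differencing) were serially uncorrelated with constant variance sigma^2. Then the differenced error has a fixed, simple correlation pattern:

```latex
\begin{aligned}
\Delta u_{it} &= u_{it} - u_{i,t-1}, \qquad u_{it} \sim \text{iid}\,(0,\sigma^2),\\
\operatorname{Var}(\Delta u_{it}) &= \sigma^2 + \sigma^2 = 2\sigma^2,\\
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-1})
  &= \operatorname{Cov}(u_{it} - u_{i,t-1},\; u_{i,t-1} - u_{i,t-2})
   = -\operatorname{Var}(u_{i,t-1}) = -\sigma^2,\\
\operatorname{Cor}(\Delta u_{it}, \Delta u_{i,t-1}) &= \frac{-\sigma^2}{2\sigma^2} = -0.5,\\
\operatorname{Cor}(\Delta u_{it}, \Delta u_{i,t-s}) &= 0 \qquad \text{for } s \ge 2,
\end{aligned}
```

since for s of 2 or more the two differences share no u terms. Under this benchmark, entries just below the diagonal would be near -0.5 and all other off-diagonal entries near 0, so the matrices above can be compared against that pattern.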

. 
. ************ (3) ANALYSIS: CORRELATIONS FOR AN INDIVIDUAL OBSERVATION
. 
. * Look at correlations for each individual
. 
. ** TRANSFORM DATA FROM LONG FORM TO WIDE FORM FOR INDIVIDUALS
. 
. use mom3, replace

. * Here just do this for lnhr and lnwg
. keep lnhr lnwg id year

. reshape wide lnhr lnwg, i(year) j(id)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
> 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 6
> 6 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
> 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123
> 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 1
> 48 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172
> 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 1
> 97 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
> 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 2
> 46 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
> 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 2
> 95 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319
> 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 3
> 44 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368
> 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 3
> 93 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417
> 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 4
> 42 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466
> 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 4
> 91 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515
> 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->      10
Number of variables                   4   ->    1065
j variable (532 values)              id   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1 lnhr2 ... lnhr532
                                   lnwg   ->   lnwg1 lnwg2 ... lnwg532
-----------------------------------------------------------------------------

. * Note that i and j are reversed
. 
. * Since id runs from 1 to 532 this will create
. * lnhr1 to lnhr532 and lnwg1 to lnwg532
. 
. tsset year
        time variable:  year, 1979 to 1988

. 
. * First-order Correlation over T years for the first observation
. corr lnhr1 L.lnhr1
(obs=9)

             |             L.
             |    lnhr1    lnhr1
-------------+------------------
lnhr1        |
          -- |   1.0000
          L1 |   0.6378   1.0000
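Reshaping with i(year) j(id) turns the data into 10 records (years) with one column per person, so `corr lnhr1 L.lnhr1` correlates one man's hours with their own one-year lag over the T - 1 = 9 available pairs. A minimal stdlib Python sketch of both steps, assuming hypothetical `(id, year, value)` tuples; `reshape_wide_by_id` and `lag1_corr` are illustrative names, not Stata functions.

```python
import math
from typing import Dict, List, Tuple


def reshape_wide_by_id(rows: List[Tuple[int, int, float]]) -> Dict[int, Dict[int, float]]:
    """Long (id, year, value) records -> one record per year holding one
    column per id, i.e. the analogue of `reshape wide x, i(year) j(id)`."""
    wide: Dict[int, Dict[int, float]] = {}
    for pid, year, value in rows:
        wide.setdefault(year, {})[pid] = value
    return wide


def lag1_corr(series: List[float]) -> float:
    """Correlation of x_t with x_{t-1} over the T - 1 available pairs,
    as in `corr x L.x` after `tsset year` (T = 10 gives obs = 9)."""
    cur, lag = series[1:], series[:-1]
    n = len(cur)
    mc, ml = sum(cur) / n, sum(lag) / n
    num = sum((a - mc) * (b - ml) for a, b in zip(cur, lag))
    den = math.sqrt(sum((a - mc) ** 2 for a in cur) * sum((b - ml) ** 2 for b in lag))
    return num / den


if __name__ == "__main__":
    # Hypothetical long data: 2 individuals x 10 years.
    rows = [(pid, 1979 + t, float(pid * t)) for pid in (1, 2) for t in range(10)]
    wide = reshape_wide_by_id(rows)
    person1 = [wide[year][1] for year in sorted(wide)]  # the "lnhr1" column
    print(len(wide), len(person1) - 1, lag1_corr(person1))
```

With only nine pairs per person, each individual correlation is noisy, which is one reason the pooled cross-person correlations earlier in the log are more informative.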

. * First-order Correlation over T years for the second observation
. corr lnhr2 L.lnhr2
(obs=9)

             |             L.
             |    lnhr2    lnhr2
-------------+------------------
lnhr2        |
          -- |   1.0000
          L1 |   0.5553   1.0000

. * And so on
. 
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section5\mma21p3panresiduals.txt
  log type:  text
 closed on:  23 May 2005, 13:01:15
-------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section5\mma21p4pangls.txt
  log type:  text
 opened on:  23 May 2005, 11:38:01

. 
. ********** OVERVIEW OF MMA21P4PANGLS.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 21.5.5 page 725 Table 21.6 Pooled panel OLS and GLS
. * Demonstrate pooled GLS estimation using XTGEE
. * (1) No correlation (i.e. pooled OLS)
. * (2) Equicorrelated
. * (3) AR1
. * (4) Unrestricted
. * Standard errors are default plus panel bootstrap
. 
. * To run you need file
. * MOM.dat
. * in your directory
. 
. * The four basic linear panel programs are
. * mma21p1panfeandre.do   Linear fixed and random effects using xtreg
. * mma21p2panfeandre.do   Linear fe and re using transformation and regress

. *                        plus also has valid Hausman test
. * mma21p3panresiduals.do Residual analysis after linear fe and re
. * mma21p4panpangls.do    Pooled panel OLS and GLS
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
. 
. ********** DATA DESCRIPTION **********
. 
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
. 
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
. 
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
. 
. ********** READ DATA AND SUMMARIZE **********
. 
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)

. 
. describe

Contains data
  obs:         5,320
 vars:             8
 size:       191,520 (98.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
lnhr            float  %9.0g
lnwg            float  %9.0g
kids            float  %9.0g
ageh            float  %9.0g
agesq           float  %9.0g
disab           float  %9.0g
id              float  %9.0g
year            float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |    5320     7.65743    .2855914       2.77       8.56
        lnwg |    5320    2.609436    .4258924       -.26       4.69
        kids |    5320    1.555827    1.195924          0          6
        ageh |    5320    38.91823    8.450351         22         60
       agesq |    5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |    5320    .0609023    .2391734          0          1
          id |    5320       266.5    153.5893          1        532
        year |    5320      1983.5    2.872551       1979       1988

. 
. ********** DEFINE GLOBALS INCLUDING REGRESSOR LIST *********
. 
. * Number of reps for the bootstrap
. * Table 21.6 used 500
. global nreps 500

. 
. ********* ANALYSIS: DIFFERENT POOLED GLS ESTIMATES USING XTGEE *********
. 
. *** (1) NO ERROR CORRELATION - SAME AS POOLED OLS Table 21.7 first column
. 
. * Default standard error
. xtgee lnhr lnwg, corr(independent) i(id)

Iteration 1: tolerance = 3.405e-13

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                   independent                     max =        10
                                                Wald chi2(1)       =     82.25
Scale parameter:                  .0803055      Prob > chi2        =    0.0000

Pearson chi2(5320):                 427.23      Deviance           =    427.23
Dispersion (Pearson):             .0803055      Dispersion         =  .0803055

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0091234     9.07   0.000      .064862    .1006251
       _cons |   7.441516   .0241219   308.50   0.000     7.394238    7.488795
------------------------------------------------------------------------------

. estimates store ind

. * "Robust" standard error
. xtgee lnhr lnwg, corr(independent) i(id) robust

Iteration 1: tolerance = 3.405e-13

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                   independent                     max =        10
                                                Wald chi2(1)       =      7.99
Scale parameter:                  .0803055      Prob > chi2        =    0.0047

Pearson chi2(5320):                 427.23      Deviance           =    427.23
Dispersion (Pearson):             .0803055      Dispersion         =  .0803055

                                   (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0292684     2.83   0.005     .0253785    .1401086
       _cons |   7.441516   .0795795    93.51   0.000     7.285543    7.597489
------------------------------------------------------------------------------

. estimates store indrob

. * Correct panel bootstrap standard errors
. set seed 10001

. bootstrap "xtgee lnhr lnwg, corr(independent) i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtgee lnhr lnwg , corr(independent) i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |    72  .0827435  -.0007854  .0317837   .0193687   .1461184   (N)
             |                                        .0090096   .1413525   (P)
             |                                        .0154833   .1413525  (BC)
       _bs_2 |    72  7.441516   .0024828  .0861859   7.269667   7.613366   (N)
             |                                         7.27043   7.635125   (P)
             |                                         7.27043   7.631187  (BC)
------------------------------------------------------------------------------
                         Note:  N   = normal
                                P   = percentile
                                BC  = bias-corrected

. matrix indbootse = e(se)

. 
. *** (2) EQUICORRELATED - SAME AS RE-GLS Table 21.7 second column
. 
. * Default standard error
. xtgee lnhr lnwg, corr(exchangeable) i(id)

Iteration 1: tolerance = .03364039
Iteration 2: tolerance = .00033468
Iteration 3: tolerance = 4.733e-06
Iteration 4: tolerance = 6.715e-08

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  exchangeable                     max =        10
                                                Wald chi2(1)       =     76.70
Scale parameter:                  .0805511      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0136507     8.76   0.000     .0927925    .1463023
       _cons |   7.345479   .0364481   201.53   0.000     7.274042    7.416916
------------------------------------------------------------------------------

. estimates store exch

. * "Robust" standard error
. xtgee lnhr lnwg, corr(exchangeable) i(id) robust

Iteration 1: tolerance = .03364039
Iteration 2: tolerance = .00033468
Iteration 3: tolerance = 4.733e-06
Iteration 4: tolerance = 6.715e-08

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  exchangeable                     max =        10
                                                Wald chi2(1)       =      5.38
Scale parameter:                  .0805511      Prob > chi2        =    0.0204

                                   (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0515426     2.32   0.020     .0185258     .220569
       _cons |   7.345479   .1379494    53.25   0.000     7.075103    7.615855
------------------------------------------------------------------------------

. estimates store exchrob

. * Correct panel bootstrap standard errors
. set seed 10001

. bootstrap "xtgee lnhr lnwg, corr(exchangeable) i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtgee lnhr lnwg , corr(exchangeable) i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |    72  .1195474   .0068755   .059895   .0001201   .2389747   (N)
             |                                        .0256504   .2573869   (P)
             |                                        .0256504   .2286118  (BC)
       _bs_2 |    72  7.345479  -.0179736  .1585556   7.029328    7.66163   (N)
             |                                        6.990765   7.605015   (P)
             |                                        7.066358   7.605015  (BC)
------------------------------------------------------------------------------
                         Note:  N   = normal
                                P   = percentile
                                BC  = bias-corrected

. matrix exchbootse = e(se)
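The `bootstrap ..., cluster(id)` commands resample whole individuals rather than single person-years, so each replicate keeps every drawn man's full 10-year history and hence any within-person error correlation. A minimal stdlib Python sketch of that panel bootstrap, assuming the data are held as a dict from a person id to that person's list of (x, y) pairs and using the pooled OLS slope (the corr(independent) estimate) as the statistic; `Panel`, `pooled_ols_slope` and `panel_bootstrap_se` are hypothetical names. The standard error is the standard deviation of the replicate estimates.

```python
import random
import statistics
from typing import Dict, List, Tuple

Panel = Dict[int, List[Tuple[float, float]]]  # id -> [(x_it, y_it), ...]


def pooled_ols_slope(obs: List[Tuple[float, float]]) -> float:
    """Slope from pooled OLS of y on an intercept and x."""
    n = len(obs)
    mx = sum(x for x, _ in obs) / n
    my = sum(y for _, y in obs) / n
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    return sxy / sxx


def panel_bootstrap_se(panel: Panel, reps: int, seed: int) -> float:
    """Panel (cluster) bootstrap standard error of the pooled OLS slope.

    Each replication draws N individuals with replacement and keeps every
    drawn individual's whole time series. Requires reps >= 2.
    """
    rng = random.Random(seed)
    ids = list(panel)
    estimates = []
    for _ in range(reps):
        draw = [rng.choice(ids) for _ in ids]          # N draws of whole people
        obs = [point for i in draw for point in panel[i]]
        estimates.append(pooled_ols_slope(obs))
    return statistics.stdev(estimates)                 # SD of replicate slopes


if __name__ == "__main__":
    # Hypothetical panel: 20 people x 5 periods with person-specific noise.
    toy = {i: [(float(t), t + ((7 * i + 3 * t) % 5 - 2) + 0.3 * i) for t in range(5)]
           for i in range(20)}
    print(panel_bootstrap_se(toy, reps=200, seed=10001))
```

Resampling person-years one at a time would break the within-person dependence and would generally understate the standard error when errors are serially correlated.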


. 
. *** (3) AR(1) Table 21.7 third column
. 
. * Default standard error
. xtgee lnhr lnwg, corr(ar 1) i(id) t(year)

Iteration 1: tolerance = .001507
Iteration 2: tolerance = 2.246e-06
Iteration 3: tolerance = 1.547e-09

GEE population-averaged model                   Number of obs      =      5320
Group and time vars:               id year      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                         AR(1)                     max =        10
                                                Wald chi2(1)       =     46.73
Scale parameter:                  .0803129      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0843777   .0123428     6.84   0.000     .0601862    .1085691
       _cons |   7.439893   .0327698   227.04   0.000     7.375665     7.50412
------------------------------------------------------------------------------

. estimates store ar1

. * "Robust" standard error
. xtgee lnhr lnwg, corr(ar 1) i(id) t(year) robust

Iteration 1: tolerance = .001507
Iteration 2: tolerance = 2.246e-06
Iteration 3: tolerance = 1.547e-09

GEE population-averaged model                   Number of obs      =      5320
Group and time vars:               id year      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                         AR(1)                     max =        10
                                                Wald chi2(1)       =      5.15
Scale parameter:                  .0803129      Prob > chi2        =    0.0232

                                   (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0843777   .0371764     2.27   0.023     .0115133    .1572421
       _cons |   7.439893    .100308    74.17   0.000     7.243293    7.636493
------------------------------------------------------------------------------
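With a Gaussian family and identity link, xtgee models each person's 10 x 10 error covariance as the scale parameter times a working correlation matrix R, and the four fits in this log differ only in R. A minimal stdlib Python sketch of the three parametric forms, with hypothetical function names; corr(unstructured) instead estimates every off-diagonal element of R freely, so it has no one-parameter analogue here.

```python
from typing import List

Matrix = List[List[float]]


def independent(T: int) -> Matrix:
    """corr(independent): identity matrix, so pooled GLS collapses to pooled OLS."""
    return [[1.0 if t == s else 0.0 for s in range(T)] for t in range(T)]


def exchangeable(T: int, rho: float) -> Matrix:
    """corr(exchangeable): every pair of distinct periods has correlation rho."""
    return [[1.0 if t == s else rho for s in range(T)] for t in range(T)]


def ar1(T: int, rho: float) -> Matrix:
    """corr(ar 1): correlation rho**|t - s| decays geometrically with the lag."""
    return [[rho ** abs(t - s) for s in range(T)] for t in range(T)]


if __name__ == "__main__":
    # Illustrative rho values only; xtgee estimates rho from the residuals.
    for name, R in (("exchangeable", exchangeable(4, 0.4)), ("ar1", ar1(4, 0.5))):
        print(name)
        for row in R:
            print("  " + " ".join(f"{v:6.3f}" for v in row))
```

Setting rho = 0 in either parametric form gives the identity matrix, consistent with the log's note that corr(independent) is the same as pooled OLS.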


. estimates store ar1rob

. * Correct panel bootstrap standard errors
. set seed 10001

. bootstrap "xtgee lnhr lnwg, corr(ar 1) i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtgee lnhr lnwg , corr(ar 1) i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .0843777  -.0025819   .050393   -.014631   .1833863   (N)
             |                                       -.0060264    .184696   (P)
             |                                       -.0031327   .1860251  (BC)
       _bs_2 |   500  7.439893   .0077122   .136732   7.171251   7.708534   (N)
             |                                        7.165532   7.686645   (P)
             |                                        7.157923   7.676162  (BC)
------------------------------------------------------------------------------
                         Note:  N   = normal
                                P   = percentile
                                BC  = bias-corrected

. matrix ar1bootse = e(se)

. 
. *** (4) HOMOSKEDASTIC UNSTRUCTURED Table 21.7 fourth column
. 
. * Default standard error
. xtgee lnhr lnwg, corr(unstructured) i(id) t(year)

Iteration 1: tolerance = .00721446
Iteration 2: tolerance = .0003951
Iteration 3: tolerance = .00001469
Iteration 4: tolerance = 4.230e-07

GEE population-averaged model                   Number of obs      =      5320
Group and time vars:               id year      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  unstructured                     max =        10
                                                Wald chi2(1)       =     43.67
Scale parameter:                  .0803575      Prob > chi2        =    0.0000


------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0910023   .0137712     6.61   0.000     .0640113    .1179933
       _cons |   7.426262   .0366836   202.44   0.000     7.354363     7.49816
------------------------------------------------------------------------------

. estimates store unstr

. * "Robust" standard error
. xtgee lnhr lnwg, corr(unstructured) i(id) t(year) robust

Iteration 1: tolerance = .00721446
Iteration 2: tolerance = .0003951
Iteration 3: tolerance = .00001469
Iteration 4: tolerance = 4.230e-07

GEE population-averaged model                   Number of obs      =      5320
Group and time vars:               id year      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  unstructured                     max =        10
                                                Wald chi2(1)       =      3.29
Scale parameter:                  .0803575      Prob > chi2        =    0.0695

                                   (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0910023   .0501344     1.82   0.069    -.0072594     .189264
       _cons |   7.426262   .1328255    55.91   0.000     7.165929    7.686595
------------------------------------------------------------------------------

. estimates store unstrrob

. * Correct panel bootstrap standard errors
. set seed 10001

. /* For some reason the following did not work
> bootstrap "xtgee lnhr lnwg, corr(unstructured) i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
> matrix unstrbootse = e(se)
> */
. 
. ********** DISPLAY RESULTS IN TABLE 21.7 page 725 **********
. 
. * Standard error using iid errors and in some cases panel
. estimates table ind indrob exch exchrob, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)


------------------------------------------------------------------
    Variable |          ind       indrob         exch      exchrob
-------------+----------------------------------------------------
        lnwg |        0.083        0.083        0.120        0.120
             |        0.009        0.029        0.014        0.052
       _cons |        7.442        7.442        7.345        7.345
             |        0.024        0.080        0.036        0.138
-------------+----------------------------------------------------
           N |     5320.000     5320.000     5320.000     5320.000
          ll |
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
------------------------------------------------------------------
                                                      legend: b/se

. estimates table ar1 ar1rob unstr unstrrob, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

------------------------------------------------------------------
    Variable |          ar1       ar1rob        unstr     unstrrob
-------------+----------------------------------------------------
        lnwg |        0.084        0.084        0.091        0.091
             |        0.012        0.037        0.014        0.050
       _cons |        7.440        7.440        7.426        7.426
             |        0.033        0.100        0.037        0.133
-------------+----------------------------------------------------
           N |     5320.000     5320.000     5320.000     5320.000
          ll |
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
------------------------------------------------------------------
                                                      legend: b/se

. 
. * Standard errors using panel bootstrap (regular bootstrap for between)
. matrix list indbootse

indbootse[1,2]
         _bs_1      _bs_2
se   .03178369   .0861859

. matrix list exchbootse


exchbootse[1,2]
         _bs_1      _bs_2
se   .05989501  .15855561

. matrix list ar1bootse

ar1bootse[1,2]
         _bs_1      _bs_2
se   .05039303  .13673201

. matrix list unstrbootse
matrix unstrbootse not found
r(111);

end of do-file

r(111);

. exit, clear
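In the two estimates tables above, the `robust` (cluster-robust, "semi-robust") standard errors are roughly three to four times the default ones for every working correlation. A minimal stdlib Python sketch of the corr(independent) case, that is pooled OLS of y on an intercept and one regressor, assuming a dict of per-person (x, y) lists; `pooled_ols_cluster` is a hypothetical name. The default variance uses SSR/N, mirroring the scale parameter xtgee reports (427.23/5320, about .0803, in the log); the robust variance sums outer products of per-person scores. No finite-sample correction is applied, so it need not match Stata's output digit for digit.

```python
import math
from typing import Dict, List, Tuple

Panel = Dict[int, List[Tuple[float, float]]]  # id -> [(x_it, y_it), ...]


def pooled_ols_cluster(panel: Panel) -> Tuple[float, float, float, float]:
    """Return (b0, b1, default SE of b1, cluster-robust SE of b1)."""
    obs = [pt for rows in panel.values() for pt in rows]
    n = len(obs)
    # Elements of X'X and X'y for regressors (1, x).
    s0, s1 = float(n), sum(x for x, _ in obs)
    s2 = sum(x * x for x, _ in obs)
    sy, sxy = sum(y for _, y in obs), sum(x * y for x, y in obs)
    det = s0 * s2 - s1 * s1
    b0 = (s2 * sy - s1 * sxy) / det
    b1 = (s0 * sxy - s1 * sy) / det
    # Row of (X'X)^{-1} belonging to the slope coefficient.
    a0, a1 = -s1 / det, s0 / det
    # Default variance: (SSR / n) * [(X'X)^{-1}]_{22}.
    ssr = sum((y - b0 - b1 * x) ** 2 for x, y in obs)
    var_default = (ssr / n) * a1
    # Sandwich: sum over individuals of (slope row of (X'X)^{-1} times X_i'u_i)^2.
    var_robust = 0.0
    for rows in panel.values():
        g0 = sum(y - b0 - b1 * x for x, y in rows)        # sum of residuals
        g1 = sum(x * (y - b0 - b1 * x) for x, y in rows)  # sum of x * residual
        var_robust += (a0 * g0 + a1 * g1) ** 2
    return b0, b1, math.sqrt(var_default), math.sqrt(var_robust)


if __name__ == "__main__":
    # Hypothetical panel: 4 people, 3 periods, a shock shared within each person.
    shocks = [1.0, -1.0, -1.0, 1.0]
    toy = {c: [(float(c), float(c) + shocks[c])] * 3 for c in range(4)}
    print(pooled_ols_cluster(toy))
```

When each person's errors move together, as in the toy panel, the per-person scores add up instead of cancelling, which is what pushes the robust standard error above the default one.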


-------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section5\mma22p1pangmm.txt
  log type:  text
 opened on:  23 May 2005, 11:52:35

. 
. ********** OVERVIEW OF MMA22P1PANGMM.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 22.3 pages 754-6
. * Panel 2SLS and GMM for a linear model with endogenous regressors
. * Fixed effects are first differenced.
. * Then 2SLS and GMM applied to first differenced model.
. 
. * Program derives Table 22.2 and does other analysis in section
. * (1) pooled OLS
. * (2) 2SLS in base instruments case
. * (3) 2SLS in stacked instruments case
. * (4) 2SGMM in base instruments case
. * (5) 2SGMM in stacked instruments case
. * (6) F-statistics for weak instruments
. * (7) Partial R-squared for weak instruments
. 
. * The pooled OLS and 2SLS replicate Ziliak (1997) Table 1 top left-hand corner
. * for Base Case (9 instruments) and first Stacked Case (72 instruments)
. * 2SLS in first differences where both 1979 and 1980 are dropped
. 
. * To run you need file
. * MOMprecise.dat
. * in your directory
. 
. * NOTE: This data set is different from MOM.dat used in chapter 21.
. * The data here have more significant digits,
. * leading to some differences in the resulting coefficient estimates.
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
. 
. ********** DATA DESCRIPTION **********
. 

. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
. * NOTE: Data originally posted on JBES website was to only 2 dec places
. * Here more accurate data is used (the same as the data used by Ziliak)
. * Ziliak used Gauss. Here Stata is used.
. 
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
. 
. * File MOMprecise.dat has more significant digits than file MOM.dat
. * (the version of the data posted at the JBES website, used in chapter 21)
. 
. ********** READ DATA **********
. 
. * The data are in ascii file MOMprecise.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOMprecise.dat
(5320 observations read)

. describe

Contains data
  obs:         5,320
 vars:             8
 size:       191,520 (98.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
lnhr            float  %9.0g
lnwg            float  %9.0g
kids            float  %9.0g
ageh            float  %9.0g
agesq           float  %9.0g
disab           float  %9.0g
id              float  %9.0g
year            float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |    5320    7.657458      .28564   2.772589   8.556414
        lnwg |    5320    2.609477    .4260333  -.2613648   4.686474
        kids |    5320    1.555827    1.195924          0          6
        ageh |    5320    38.91823    8.450351         22         60
       agesq |    5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |    5320    .0609023    .2391734          0          1
          id |    5320       266.5    153.5893          1        532
        year |    5320      1983.5    2.872551       1979       1988

. 
. ********** FIRST DIFFERENCES REGRESSION **********
. 
. * Stata has no command for first differences regression
. * Though may be possible with xtivreg
. 
. * The following only works if each observation is (i,t)
. * and within i the data are ordered by t
. gen dlnhr = lnhr - lnhr[_n-1]
(1 missing value generated)

. gen dlnwg = lnwg - lnwg[_n-1]
(1 missing value generated)

. gen dkids = kids - kids[_n-1]
(1 missing value generated)

. gen dageh = ageh - ageh[_n-1]
(1 missing value generated)

. gen dagesq = agesq - agesq[_n-1]
(1 missing value generated)

. gen ddisab = disab - disab[_n-1]
(1 missing value generated)

. 
. * The regression is of
. * dlnhr on constant dlnwg dkids dageh dagesq ddisab
. 
. ********** GENERATE THE INSTRUMENTS **********
. 
. * The endogenous variable is dlnwg. The others are exogenous.
. * It is not clear whether current values of the exogenous variables are used as instruments.
. * I would think so but there is no mention in the paper of this.
. * In addition Table 1 considers various instrument sets
. * We consider the first (first rows) and second (second rows)
. 
. * (1) Use the levels of the exogenous regressors lagged one and two periods
. * and the level of the endogenous regressor lagged two periods

. * This gives nine instruments
. gen kidsl1 = kids[_n-1]
(1 missing value generated)
. gen kidsl2 = kids[_n-2]
(2 missing values generated)
. gen agehl1 = ageh[_n-1]
(1 missing value generated)
. gen agehl2 = ageh[_n-2]
(2 missing values generated)
. gen agesql1 = agesq[_n-1]
(1 missing value generated)
. gen agesql2 = agesq[_n-2]
(2 missing values generated)
. gen disabl1 = disab[_n-1]
(1 missing value generated)
. gen disabl2 = disab[_n-2]
(2 missing values generated)
. gen lnwgl2 = lnwg[_n-2]
(2 missing values generated)
.
. * (2) Use the same instruments as in (1) except now stacked so that
. * the instrument matrix is block-diagonal.
. * This gives nine instruments times the number of time periods.
. * The original data are 1979 to 1988.
. * We will eventually drop the first two years, as two years are lost due to the lags.
. * For shorthand call the instruments z1 to z9 and the years 1981 to 1988 y1 to y8.
. * Pad out to 8 x 9 = 72 instruments for 8 years
.
. program define makeZ
  forvalues i=1(1)8 {
      gen z1y`i'=0
      replace z1y`i' = ageh[_n-1] if year==1980+`i'
      gen z2y`i'=0
      replace z2y`i' = agesq[_n-1] if year==1980+`i'
      gen z3y`i'=0
      replace z3y`i' = kids[_n-1] if year==1980+`i'
      gen z4y`i'=0
      replace z4y`i' = disab[_n-1] if year==1980+`i'
      gen z5y`i'=0
      replace z5y`i' = ageh[_n-2] if year==1980+`i'
      gen z6y`i'=0
      replace z6y`i' = agesq[_n-2] if year==1980+`i'
      gen z7y`i'=0
      replace z7y`i' = kids[_n-2] if year==1980+`i'
      gen z8y`i'=0
      replace z8y`i' = disab[_n-2] if year==1980+`i'
      gen z9y`i'=0
      replace z9y`i' = lnwg[_n-2] if year==1980+`i'
  }
  end
. quietly makeZ
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320    7.657458      .28564   2.772589   8.556414
        lnwg |      5320    2.609477    .4260333  -.2613648   4.686474
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
       dlnhr |      5319    .0000192    .3016322  -4.787492   4.521109
       dlnwg |      5319    .0001115    .2718437   -2.32463   3.062298
-------------+--------------------------------------------------------
       dkids |      5319    -.000188    .6629109         -5          6
       dageh |      5319    .0030081    4.611209        -36         19
      dagesq |      5319    .2105659    371.0841      -3024       1577
      ddisab |      5319           0    .2429913         -1          1
      kidsl1 |      5319    1.555932    1.196012          0          6
-------------+--------------------------------------------------------
      kidsl2 |      5318    1.556036    1.196101          0          6
      agehl1 |      5319    38.91747     8.45096         22         60
      agehl2 |      5318    38.91707    8.451706         22         60
     agesql1 |      5319    1585.974    689.8313        484       3600
     agesql2 |      5318    1585.957    689.8949        484       3600
-------------+--------------------------------------------------------
     disabl1 |      5319    .0609137    .2391944          0          1
     disabl2 |      5318    .0609252    .2392155          0          1
      lnwgl2 |      5318    2.609513    .4261095  -.2613648   4.686474
        z1y1 |      5320    3.544549    10.92972          0         52
        z2y1 |      5320    132.0002    438.9997          0       2704
-------------+--------------------------------------------------------
        z3y1 |      5320    .1567669    .5978681          0          6
        z4y1 |      5320    .0048872    .0697442          0          1
        z5y1 |      5320    3.445489    10.64043          0         51
        z6y1 |      5320    125.0688    418.0247          0       2601
        z7y1 |      5320    .1520677    .5938801          0          6
-------------+--------------------------------------------------------
        z8y1 |      5320    .0054511    .0736372          0          1
        z9y1 |      5320    .2597756    .7905791          0    4.61522
        z1y2 |      5320     3.63891    11.20265          0         53
        z2y2 |      5320    138.7175    458.8032          0       2809
        z3y2 |      5320    .1590226    .6057112          0          6
-------------+--------------------------------------------------------
        z4y2 |      5320    .0039474    .0627099          0          1
        z5y2 |      5320    3.544549    10.92972          0         52
        z6y2 |      5320    132.0002    438.9997          0       2704
        z7y2 |      5320    .1567669    .5978681          0          6
        z8y2 |      5320    .0048872    .0697442          0          1
-------------+--------------------------------------------------------
        z9y2 |      5320    .2602349    .7906729          0    4.60976
        z1y3 |      5320    3.737218    11.49054          0         54
        z2y3 |      5320    145.9744    480.6547          0       2916
        z3y3 |      5320    .1637218    .6172305          0          6
        z4y3 |      5320    .0052632    .0723633          0          1
-------------+--------------------------------------------------------
        z5y3 |      5320     3.63891    11.20265          0         53
        z6y3 |      5320    138.7175    458.8032          0       2809
        z7y3 |      5320    .1590226    .6057112          0          6
        z8y3 |      5320    .0039474    .0627099          0          1
        z9y3 |      5320    .2610997    .7928738          0    4.52656
-------------+--------------------------------------------------------
        z1y4 |      5320     3.83985    11.79093          0         55
        z2y4 |      5320    153.7444    503.9576          0       3025
        z3y4 |      5320    .1620301    .6132476          0          6
        z4y4 |      5320    .0037594    .0612043          0          1
        z5y4 |      5320    3.737218    11.49054          0         54
-------------+--------------------------------------------------------
        z6y4 |      5320    145.9744    480.6547          0       2916
        z7y4 |      5320    .1637218    .6172305          0          6
        z8y4 |      5320    .0052632    .0723633          0          1
        z9y4 |      5320    .2614749    .7946793          0   4.607767
        z1y5 |      5320    3.940414    12.08767          0         56
-------------+--------------------------------------------------------
        z2y5 |      5320    161.6111    527.9522          0       3136
        z3y5 |      5320    .1595865     .608814          0          6
        z4y5 |      5320     .006015    .0773303          0          1
        z5y5 |      5320     3.83985    11.79093          0         55
        z6y5 |      5320    153.7444    503.9576          0       3025
-------------+--------------------------------------------------------
        z7y5 |      5320    .1620301    .6132476          0          6
        z8y5 |      5320    .0037594    .0612043          0          1
        z9y5 |      5320    .2610663    .7939903          0   4.618777
        z1y6 |      5320    4.047368    12.40128          0         57
        z2y6 |      5320     170.144    553.5552          0       3249
-------------+--------------------------------------------------------
        z3y6 |      5320    .1575188    .6042401          0          5
        z4y6 |      5320    .0065789    .0808511          0          1
        z5y6 |      5320    3.940414    12.08767          0         56
        z6y6 |      5320    161.6111    527.9522          0       3136
        z7y6 |      5320    .1595865     .608814          0          6
-------------+--------------------------------------------------------
        z8y6 |      5320     .006015    .0773303          0          1
        z9y6 |      5320    .2600271    .7937085  -.2613648   4.648325
        z1y7 |      5320    4.140602    12.67474          0         58
        z2y7 |      5320    177.7635    576.2959          0       3364
        z3y7 |      5320    .1537594    .5983346          0          5
-------------+--------------------------------------------------------
        z4y7 |      5320     .006203    .0785219          0          1
        z5y7 |      5320    4.047368    12.40128          0         57
        z6y7 |      5320     170.144    553.5552          0       3249
        z7y7 |      5320    .1575188    .6042401          0          5
        z8y7 |      5320    .0065789    .0808511          0          1
-------------+--------------------------------------------------------
        z9y7 |      5320     .261494    .7964894          0   4.686474
        z1y8 |      5320    4.240414    12.96638          0         59
        z2y8 |      5320    186.0765    600.9297          0       3481
        z3y8 |      5320    .1494361    .5901043          0          5
        z4y8 |      5320    .0090226    .0945665          0          1
-------------+--------------------------------------------------------
        z5y8 |      5320    4.140602    12.67474          0         58
        z6y8 |      5320    177.7635    576.2959          0       3364
        z7y8 |      5320    .1537594    .5983346          0          5
        z8y8 |      5320     .006203    .0785219          0          1
        z9y8 |      5320    .2602616    .7933278          0     4.5933

.
. * Define variable lists for regressors X and instruments Z
.
. global XREG dlnwg dkids dageh dagesq ddisab
.
. global ZBASECASE kidsl1 agehl1 agesql1 disabl1 agehl2 kidsl2 agesql2 disabl2 lnwgl2
.
. global ZSTACKED z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1 /*
> */ z1y2 z2y2 z3y2 z4y2 z5y2 z6y2 z7y2 z8y2 z9y2 /*
> */ z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3 z9y3 /*
> */ z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4 /*
> */ z1y5 z2y5 z3y5 z4y5 z5y5 z6y5 z7y5 z8y5 z9y5 /*
> */ z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6 z8y6 z9y6 /*
> */ z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7 /*
> */ z1y8 z2y8 z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
.
. * Define variable lists for weak instruments test which drops
.
. save momfdiffgmm, replace
file momfdiffgmm.dta saved
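The `[_n-1]` and `[_n-2]` subscripts above refer to the previous row of the data set, whichever person it belongs to. That is why `dlnhr` has 5319 nonmissing values rather than 4788 (= 532 x 9): each person's 1979 difference is taken against the previous person's 1988 record, and the one- and two-period lags in each person's 1979 and 1980 rows likewise reach into the previous person's data. Those contaminated rows are exactly the 1979 and 1980 observations dropped before estimation, so the reported estimates are unaffected, but the construction would break if the data were re-sorted or different years were dropped. Within Stata, `by id (year): gen dlnhr = lnhr - lnhr[_n-1]`, or the `D.` and `L.` operators after `tsset id year`, respect person boundaries. The Python sketch below shows the same bookkeeping outside Stata, together with year-stacked (block-diagonal) instruments of the kind built by `makeZ`. It is illustrative only: rows are assumed to be dicts with `id`, `year` and variable keys, sorted by id and then year, and the helper names `lag`, `first_diff` and `stacked_instruments` are hypothetical.

```python
def lag(rows, var, k=1):
    """var lagged k periods within the same id; None when unavailable.

    Unlike Stata's var[_n-k], this never reaches into the previous
    person's rows: it returns None unless the row k positions back is
    the same id in year - k (so gaps in the years also give None).
    Assumes rows are sorted by id, then year.
    """
    out = []
    for n, row in enumerate(rows):
        m = n - k
        if m >= 0 and rows[m]["id"] == row["id"] and rows[m]["year"] == row["year"] - k:
            out.append(rows[m][var])
        else:
            out.append(None)
    return out


def first_diff(rows, var):
    """Within-id first difference var_t - var_{t-1} (None when unavailable)."""
    prev = lag(rows, var, 1)
    return [None if p is None else row[var] - p for row, p in zip(rows, prev)]


def stacked_instruments(years, base, first_year, n_years):
    """Block-diagonal instruments in column order z1y1..zKy1, z1y2, ...

    base[n] holds the K instrument values for row n (for example the nine
    lagged variables). Block t of the output row equals base[n] when the
    row's year is first_year + t, and every other entry is 0, as makeZ
    does for 1981-1988 with first_year=1981 and n_years=8.
    """
    out = []
    for year, z in zip(years, base):
        K = len(z)
        row = [0.0] * (K * n_years)
        t = year - first_year
        if 0 <= t < n_years:
            row[t * K:(t + 1) * K] = z
        out.append(row)
    return out
```

For example, with two persons observed in 1979 and 1980, `first_diff` leaves each person's 1979 row missing instead of differencing it against the other person's data.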


.
. ********** (1)-(3) 2SLS USING IVREG IS STRAIGHTFORWARD (Table 22.2, p.755) **********
.
. * Note that ivreg automatically includes the exogenous variables as instruments
. * It is not clear that Ziliak does this
.
. * The following drops the first two years, which here are 1979 and 1980
. drop if year == 1979 | year == 1980
(1064 observations deleted)

.
. * (1) OLS results, at the bottom of Ziliak's Table 1
. * Table 22.2 (page 755) OLS column with various standard error estimates
. regress dlnhr $XREG, noconstant

      Source |       SS       df       MS              Number of obs =    4256
-------------+------------------------------           F(  5,  4251) =    5.38
       Model |   2.3389287     5  .467785741           Prob > F      =  0.0001
    Residual |  369.369193  4251  .086889954           R-squared     =  0.0063
-------------+------------------------------           Adj R-squared =  0.0051
       Total |  371.708121  4256  .087337435           Root MSE      =  .29477

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1115114   .0230566     4.84   0.000     .0663084    .1567144
       dkids |  -.0062887   .0116719    -0.54   0.590    -.0291717    .0165943
       dageh |   .0066935   .0212744     0.31   0.753    -.0350154    .0484025
      dagesq |  -.0000797   .0002644    -0.30   0.763     -.000598    .0004387
      ddisab |  -.0352603   .0199796    -1.76   0.078    -.0744306    .0039101
------------------------------------------------------------------------------

. estimates store olsiid
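The three OLS runs (`olsiid`, `olshet`, `olspanel`) have identical coefficients and differ only in the estimated variance matrix: s^2 (X'X)^-1 for iid errors, the heteroskedasticity-robust sandwich (X'X)^-1 [Sum_i u_i^2 x_i x_i'] (X'X)^-1, and the cluster-robust sandwich whose middle term is Sum_g (X_g'u_g)(X_g'u_g)' summed over the 532 persons. Allowing for correlation over years within a person is what raises the `dlnwg` standard error from .023 to .096. A minimal standard-library Python sketch of the three formulas follows. The function `ols_three_ses` is hypothetical, and it omits the small-sample scaling factors that Stata's `robust` and `cluster()` options also apply, so its standard errors would differ slightly from `regress` output.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular matrix")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols_three_ses(X, y, groups):
    """OLS without a constant; returns b and the iid, White and cluster SEs.

    X: N rows of K regressors, y: N outcomes, groups: cluster id per row.
    No finite-sample scaling is applied to the two sandwich estimators.
    """
    N, K = len(X), len(X[0])
    XtX_inv = inverse(matmul(transpose(X), X))
    b = [r[0] for r in matmul(XtX_inv, matmul(transpose(X), [[v] for v in y]))]
    u = [yi - sum(x * bk for x, bk in zip(xi, b)) for xi, yi in zip(X, y)]

    def sandwich(meat):
        return matmul(matmul(XtX_inv, meat), XtX_inv)

    def ses(V):
        return [V[k][k] ** 0.5 for k in range(K)]

    # iid errors: s^2 (X'X)^-1
    s2 = sum(ui * ui for ui in u) / (N - K)
    V_iid = [[s2 * v for v in row] for row in XtX_inv]
    # heteroskedastic: middle term sum_i u_i^2 x_i x_i'
    meat_het = [[sum(u[i] ** 2 * X[i][a] * X[i][c] for i in range(N))
                 for c in range(K)] for a in range(K)]
    # cluster: add up the score X_g'u_g within each cluster, then sum outer products
    scores = {}
    for xi, ui, g in zip(X, u, groups):
        s = scores.setdefault(g, [0.0] * K)
        for a in range(K):
            s[a] += xi[a] * ui
    meat_cl = [[sum(s[a] * s[c] for s in scores.values()) for c in range(K)]
               for a in range(K)]
    return b, ses(V_iid), ses(sandwich(meat_het)), ses(sandwich(meat_cl))
```

When every observation is its own cluster, the cluster and heteroskedastic versions coincide, which is a quick sanity check on the sketch.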


. regress dlnhr $XREG, noconstant robust

Regression with robust standard errors                 Number of obs =    4256
                                                       F(  5,  4251) =    0.70
                                                       Prob > F      =  0.6246
                                                       R-squared     =  0.0063
                                                       Root MSE      =  .29477

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1115114   .0791674     1.41   0.159     -.043698    .2667207
       dkids |  -.0062887    .011057    -0.57   0.570    -.0279662    .0153888
       dageh |   .0066935   .0243788     0.27   0.784    -.0411016    .0544887
      dagesq |  -.0000797   .0003147    -0.25   0.800    -.0006965    .0005372
      ddisab |  -.0352603   .0364021    -0.97   0.333    -.1066273    .0361067
------------------------------------------------------------------------------

. estimates store olshet

. regress dlnhr $XREG, noconstant cluster(id)

Regression with robust standard errors                 Number of obs =    4256
                                                       F(  5,   531) =    0.52
                                                       Prob > F      =  0.7617
                                                       R-squared     =  0.0063
Number of clusters (id) = 532                          Root MSE      =  .29477

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1115114   .0960926     1.16   0.246    -.0772569    .3002797
       dkids |  -.0062887   .0109558    -0.57   0.566    -.0278107    .0152333
       dageh |   .0066935    .012339     0.54   0.588    -.0175458    .0309328
      dagesq |  -.0000797   .0001551    -0.51   0.608    -.0003843     .000225
      ddisab |  -.0352603   .0452557    -0.78   0.436    -.1241625     .053642
------------------------------------------------------------------------------

. estimates store olspanel
.
. * (2) 2SLS using the base case instrument set
. * Table 22.2 (page 755) 2SLS column base case with various se estimates
. ivreg dlnhr ($XREG = $ZBASECASE), noconstant

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    4256
-------------+------------------------------           F(  5,  4251) =       .
       Model |  .164904559     5  .032980912           Prob > F      =       .
    Residual |  371.543217  4251  .087401368           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  371.708121  4256  .087337435           Root MSE      =  .29564

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .2091087   .3886332     0.54   0.591    -.5528154    .9710328
       dkids |  -.0296864   .0437001    -0.68   0.497    -.1153615    .0559886
       dageh |    .026388   .0289908     0.91   0.363     -.030449    .0832251
      dagesq |  -.0003411   .0003688    -0.92   0.355    -.0010641     .000382
      ddisab |    .000402   .0429076     0.01   0.993    -.0837194    .0845233
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   kidsl1 agehl1 agesql1 disabl1 agehl2 kidsl2 agesql2 disabl2 lnwgl2
------------------------------------------------------------------------------

. estimates store baseiid

. ivreg dlnhr ($XREG = $ZBASECASE), noconstant robust

IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,  4251) =    0.23
                                                       Prob > F      =  0.9510
                                                       R-squared     =       .
                                                       Root MSE      =  .29564

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .2091087    .423312     0.49   0.621    -.6208038    1.039021
       dkids |  -.0296864   .0400461    -0.74   0.459    -.1081977    .0488249
       dageh |    .026388   .0361631     0.73   0.466    -.0445106    .0972866
      dagesq |  -.0003411   .0004555    -0.75   0.454    -.0012342     .000552
      ddisab |    .000402   .0731433     0.01   0.996     -.142997     .143801
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   kidsl1 agehl1 agesql1 disabl1 agehl2 kidsl2 agesql2 disabl2 lnwgl2
------------------------------------------------------------------------------

. estimates store basehet

. ivreg dlnhr ($XREG = $ZBASECASE), noconstant cluster(id)

IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,   531) =    1.44
                                                       Prob > F      =  0.2087
                                                       R-squared     =       .
Number of clusters (id) = 532                          Root MSE      =  .29564

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .2091087   .3741705     0.56   0.576    -.5259273    .9441447
       dkids |  -.0296864   .0293678    -1.01   0.313    -.0873777    .0280048
       dageh |    .026388   .0153921     1.71   0.087    -.0038488    .0566249
      dagesq |  -.0003411   .0001837    -1.86   0.064    -.0007019    .0000198
      ddisab |    .000402   .0667719     0.01   0.995    -.1307674    .1315714
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   kidsl1 agehl1 agesql1 disabl1 agehl2 kidsl2 agesql2 disabl2 lnwgl2
------------------------------------------------------------------------------

. estimates store basepanel
.
. * (3) 2SLS using the stacked instrument set
. * Table 22.2 (page 755) 2SLS column stacked case with various se estimates
. set matsize 100

. ivreg dlnhr ($XREG = $ZSTACKED), noconstant

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    4256
-------------+------------------------------           F(  5,  4251) =       .
       Model | -29.3711267     5 -5.87422533           Prob > F      =       .
    Residual |  401.079248  4251  .094349388           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  371.708121  4256  .087337435           Root MSE      =  .30716

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |    .542827   .1691348     3.21   0.001     .2112345    .8744195
       dkids |  -.0482932   .0393723    -1.23   0.220    -.1254834     .028897
       dageh |   .0268935   .0288808     0.93   0.352     -.029728    .0835151
      dagesq |  -.0003511   .0003671    -0.96   0.339    -.0010709    .0003687
      ddisab |   .0079759   .0397995     0.20   0.841    -.0700519    .0860037
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1
               z1y2 z2y2 z3y2 z4y2 z5y2 z6y2 z7y2 z8y2 z9y2
               z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3 z9y3
               z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4
               z1y5 z2y5 z3y5 z4y5 z5y5 z6y5 z7y5 z8y5 z9y5
               z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6 z8y6 z9y6
               z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7
               z1y8 z2y8 z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
------------------------------------------------------------------------------

. estimates store stackiid

. ivreg dlnhr ($XREG = $ZSTACKED), noconstant robust

IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,  4251) =    1.59
                                                       Prob > F      =  0.1596
                                                       R-squared     =       .
                                                       Root MSE      =  .30716

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |    .542827   .2260738     2.40   0.016     .0996043    .9860497
       dkids |  -.0482932   .0350149    -1.38   0.168    -.1169408    .0203544
       dageh |   .0268935   .0339561     0.79   0.428    -.0396781    .0934652
      dagesq |  -.0003511   .0004324    -0.81   0.417    -.0011989    .0004966
      ddisab |   .0079759    .064012     0.12   0.901    -.1175211    .1334729
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1
               z1y2 z2y2 z3y2 z4y2 z5y2 z6y2 z7y2 z8y2 z9y2
               z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3 z9y3
               z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4
               z1y5 z2y5 z3y5 z4y5 z5y5 z6y5 z7y5 z8y5 z9y5
               z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6 z8y6 z9y6
               z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7
               z1y8 z2y8 z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
------------------------------------------------------------------------------

. estimates store stackhet

. ivreg dlnhr ($XREG = $ZSTACKED), noconstant cluster(id)

IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,   531) =    2.41
                                                       Prob > F      =  0.0357
                                                       R-squared     =       .
Number of clusters (id) = 532                          Root MSE      =  .30716

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |    .542827   .2085225     2.60   0.009     .1331968    .9524572
       dkids |  -.0482932   .0245011    -1.97   0.049    -.0964242   -.0001622
       dageh |   .0268935   .0149934     1.79   0.073    -.0025602    .0563473
      dagesq |  -.0003511   .0001866    -1.88   0.060    -.0007176    .0000154
      ddisab |   .0079759   .0624423     0.13   0.898    -.1146884    .1306402
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1
               z1y2 z2y2 z3y2 z4y2 z5y2 z6y2 z7y2 z8y2 z9y2
               z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3 z9y3
               z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4
               z1y5 z2y5 z3y5 z4y5 z5y5 z6y5 z7y5 z8y5 z9y5
               z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6 z8y6 z9y6
               z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7
               z1y8 z2y8 z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
------------------------------------------------------------------------------

. estimates store stackpanel

. ivreg dlnhr ($XREG = $ZSTACKED), noconstant robust cluster(id)

IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,   531) =    2.41
                                                       Prob > F      =  0.0357
                                                       R-squared     =       .
Number of clusters (id) = 532                          Root MSE      =  .30716

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |    .542827   .2085225     2.60   0.009     .1331968    .9524572
       dkids |  -.0482932   .0245011    -1.97   0.049    -.0964242   -.0001622
       dageh |   .0268935   .0149934     1.79   0.073    -.0025602    .0563473
      dagesq |  -.0003511   .0001866    -1.88   0.060    -.0007176    .0000154
      ddisab |   .0079759   .0624423     0.13   0.898    -.1146884    .1306402
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1
               z1y2 z2y2 z3y2 z4y2 z5y2 z6y2 z7y2 z8y2 z9y2
               z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3 z9y3
               z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4
               z1y5 z2y5 z3y5 z4y5 z5y5 z6y5 z7y5 z8y5 z9y5
               z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6 z8y6 z9y6
               z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7
               z1y8 z2y8 z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
------------------------------------------------------------------------------
.
. * DISPLAY THE OLS AND 2SLS RESULTS
.
. * The following are used in Table 22.2 (page 755)
.
. * OLS column with various standard error estimates
. estimates table olspanel olshet olsiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

----------------------------------------------------
    Variable |    olspanel      olshet      olsiid
-------------+--------------------------------------
       dlnwg |       0.112       0.112       0.112
             |       0.096       0.079       0.023
       dkids |      -0.006      -0.006      -0.006
             |       0.011       0.011       0.012
       dageh |       0.007       0.007       0.007
             |       0.012       0.024       0.021
      dagesq |      -0.000      -0.000      -0.000
             |       0.000       0.000       0.000
      ddisab |      -0.035      -0.035      -0.035
             |       0.045       0.036       0.020
-------------+--------------------------------------
           N |    4256.000    4256.000    4256.000
          ll |    -837.557    -837.557    -837.557
          r2 |       0.006       0.006       0.006
         tss |
         rss |     369.369     369.369     369.369
         mss |       2.339       2.339       2.339
        rmse |       0.295       0.295       0.295
        df_r |     531.000    4251.000    4251.000
----------------------------------------------------
                                      legend: b/se
.
. * 2SLS column base case with various standard error estimates
. estimates table basepanel basehet baseiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

----------------------------------------------------
    Variable |   basepanel     basehet     baseiid
-------------+--------------------------------------
       dlnwg |       0.209       0.209       0.209
             |       0.374       0.423       0.389
       dkids |      -0.030      -0.030      -0.030
             |       0.029       0.040       0.044
       dageh |       0.026       0.026       0.026
             |       0.015       0.036       0.029
      dagesq |      -0.000      -0.000      -0.000
             |       0.000       0.000       0.000
      ddisab |       0.000       0.000       0.000
             |       0.067       0.073       0.043
-------------+--------------------------------------
           N |    4256.000    4256.000    4256.000
          ll |
          r2 |           .           .           .
         tss |
         rss |     371.543     371.543     371.543
         mss |       0.165       0.165       0.165
        rmse |       0.296       0.296       0.296
        df_r |     531.000    4251.000    4251.000
----------------------------------------------------
                                      legend: b/se

.
. * 2SLS column stacked case with various standard error estimates
. estimates table stackpanel stackhet stackiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

----------------------------------------------------
    Variable |  stackpanel    stackhet    stackiid
-------------+--------------------------------------
       dlnwg |       0.543       0.543       0.543
             |       0.209       0.226       0.169
       dkids |      -0.048      -0.048      -0.048
             |       0.025       0.035       0.039
       dageh |       0.027       0.027       0.027
             |       0.015       0.034       0.029
      dagesq |      -0.000      -0.000      -0.000
             |       0.000       0.000       0.000
      ddisab |       0.008       0.008       0.008
             |       0.062       0.064       0.040
-------------+--------------------------------------
           N |    4256.000    4256.000    4256.000
          ll |
          r2 |           .           .           .
         tss |
         rss |     401.079     401.079     401.079
         mss |     -29.371     -29.371     -29.371
        rmse |       0.307       0.307       0.307
        df_r |     531.000    4251.000    4251.000
----------------------------------------------------
                                      legend: b/se
.
. ********** (4)-(5) 2SGMM REQUIRES SPECIAL MATRIX CODING **********
.
. *** PROGRAM PANELGMM DOES 2SLS (as check) and 2SGMM USING MATRIX COMMANDS
.
. * This program:
. * - requires as inputs the global macros
. *     y gives the dependent variable name
. *     X gives the list of regressor names
. *     Z gives the list of instrument names
. * - assumes the appropriate data are in memory
. * - assumes the cluster identifier is called id
.
. * If the regressors and instruments include an intercept, include
. * this as a separate regressor, say called ONE, in X and Z.
. * Then continue to use the following code with the noconstant option for accum and opaccum.
. * (accum and opaccum automatically include a constant AT THE END,
. * which is not where we want the constant.)
.

. * This program computes the 2SLS and two-step GMM estimators
. *     [(X'Z)(Z'Z)_inv Z'X]_inv (X'Z)(Z'Z)_inv Z'y
. * and [(X'Z)S_inv Z'X]_inv (X'Z)S_inv Z'y
. * and appropriate panel robust standard errors,
. * assuming a short panel with errors that are heteroskedastic and correlated over t for given i.
.
. program define panelgmm
    * (1) Create Z'Z and check that it has full rank
    matrix accum ZZ = $Z, noconstant
    scalar dimz = rowsof(ZZ)
    scalar detzz = det(ZZ)
    di "Redundant instruments if det(Z'Z) zero. Here det(Z'Z) = " detzz

    * (2) Create Z'X, which is trickier
    * Create ZX'ZX = [Z X]' [Z X] using accum (noconstant, since accum otherwise adds a constant)
    matrix accum ZXZX = $Z $X, noconstant
    * Then Z'X is the (1,2) submatrix: rows 1 to dimz and columns dimz+1 to dimzx
    scalar dimzx = rowsof(ZXZX)
    * Also need dimension of X
    matrix accum XX = $X, noconstant
    scalar dimx = rowsof(XX)
    matrix ZX = ZXZX[1..dimz,dimz+1...]

    * (3) Create Z'y
    * Create Zy'Zy = [Z y]' [Z y] using accum, again with noconstant
    matrix accum ZyZy = $Z $y, noconstant
    * Then Z'y is the (1,2) submatrix: rows 1 to dimz and the last column
    matrix Zy = ZyZy[1..dimz,dimz+1]

    * (4) Compute 2SLS estimator
    di " "
    di "2SLS results: "
    matrix b2SLS = syminv(ZX'*syminv(ZZ)*ZX)*ZX'*syminv(ZZ)*Zy
    matrix list b2SLS

    * (5) Compute S = Sum_i Zi'u_i*u_i'Z_i using opaccum
    * Key is use of opaccum.
    * Need to compute the residuals.
    gen yhat = 0
    foreach var of varlist $X {
        matrix a`var' = b2SLS["`var'",1]
        scalar b`var' = trace(a`var') /* converts matrix to scalar */
        quietly replace yhat = yhat + (b`var')*(`var')
    }
    gen uhat = $y - yhat
    gen uhatsq = uhat*uhat
    quietly sum(uhatsq)
    scalar rmse = sqrt(r(sum)/(_N-dimx))
    di "rmse = " rmse
    * As an alternative and a check, obtain the residuals from ivreg
    quietly ivreg $y ($X = $Z), noconstant cluster(id)
    predict uhat2, residuals
    quietly sum uhat uhat2
    * Sort data for opaccum to work
    preserve
    sort id
    matrix opaccum S = $Z, group(id) opvar(uhat) noconstant
    /*
    * Ziliak allows heteroskedastic but not serially correlated errors.
    * Then instead use the following, which assumes the time identifier is year.
    * Make a unique identifier obsid so that group(obsid) puts each observation in its own group
    gen obsid = 10000*id + year
    sort obsid
    matrix opaccum S = $Z, group(obsid) opvar(uhat) noconstant
    */
    restore

    * (6) Compute variance of 2SLS
    matrix v2SLS = syminv(ZX'*syminv(ZZ)*ZX)*ZX'*syminv(ZZ)*S*syminv(ZZ)*ZX*syminv(ZX'*syminv(ZZ)*ZX)
    * matrix list v2SLS
    * Now need to get standard errors
    matrix se2SLS = J(dimx,1,0) /* Initially column vector of zeroes */
    scalar icol = 1
    * Need loop here as Stata does not do square root on a vector
    while icol <= dimx {
        matrix se2SLS[icol,1] = sqrt(v2SLS[icol,icol])
        scalar icol = icol+1
    }
    matrix list se2SLS

    * (7) Compute two-step GMM
    di " "
    di "2SGMM results: "
    matrix b2SGMM = syminv(ZX'*syminv(S)*ZX)*ZX'*syminv(S)*Zy
    matrix list b2SGMM

    * (8) Compute variance of two-step GMM
    * Compute the residuals to recompute S at the new estimates.
    * Note that one could instead just use the old S
    drop yhat uhat uhatsq
    gen yhat = 0
    foreach var of varlist $X {
        matrix a`var' = b2SGMM["`var'",1]
        scalar b`var' = trace(a`var') /* converts matrix to scalar */
        quietly replace yhat = yhat + (b`var')*(`var')
    }
    gen uhat = $y - yhat
    gen uhatsq = uhat*uhat
    quietly sum(uhatsq)
    scalar rmse = sqrt(r(sum)/(_N-dimx))
    di "rmse = " rmse
    * Sort data for opaccum to work
    preserve
    sort id
    matrix opaccum S = $Z, group(id) opvar(uhat) noconstant
    matrix v2SGMM = syminv(ZX'*syminv(S)*ZX)
    * matrix list v2SGMM
    matrix se2SGMM = J(dimx,1,0) /* Initially column vector of zeroes */
    scalar icol = 1
    * Need loop here as Stata does not do square root on a vector
    while icol <= dimx {
        matrix se2SGMM[icol,1] = sqrt(v2SGMM[icol,icol])
        scalar icol = icol+1
    }
    matrix list se2SGMM

    * (9) Compute the overidentifying restrictions test
    * Create row vector u'Z using vecaccum (noconstant, since vecaccum otherwise adds a constant)
    matrix vecaccum uZ = uhat $Z, noconstant
    matrix maxobjfunction = uZ*syminv(S)*uZ'
    scalar ortest = maxobjfunction[1,1]
    scalar dof = dimz - dimx
    di " Over-identifying restrictions test " ortest " dof " dof " p-value " chi2tail(dof,ortest)
  end
.
. *** EXECUTE THE PROGRAM PANELGMM FOR THESE DATA
.
. * Note that Ziliak does not use an intercept.
. * If there is an intercept, the constant must be added explicitly:
. * generate ONE = 1
. * and then add this to X and Z
.
. * Define the dependent variable
. global y dlnhr
.
. * Define the regressors.
. global X $XREG
.
. * (4) 2SGMM (and 2SLS as check) using the base case instrument set
. * Gives 2SGMM Base Case column of Table 22.2 (page 755)
.
. global Z $ZBASECASE
. panelgmm
(obs=4256)
Redundant instruments if det(Z'Z) zero. Here det(Z'Z) = 6.375e+37

(obs=4256) (obs=4256) (obs=4256) 2SLS results: b2SLS[5,1] dlnhr dlnwg .20910869 dkids -.02968643 dageh .02638804 dagesq -.00034108 ddisab .00040197 rmse = .29563723 se2SLS[5,1] c1 r1 .3736429 r2 .02932634 r3 .01537039 r4 .00018343 r5 .06667771 2SGMM results: b2SGMM[5,1] dlnhr dlnwg .54679602 dkids -.04490416 dageh .02747594 dagesq -.00035912 ddisab -.0468348 rmse = .30719932 se2SGMM[5,1] c1 r1 .32762396 r2 .02714405 r3 .01295984 r4 .00015941 r5 .06236006 Over-identifying restrictions test 5.4503878 dof 4 p-value .24412497 . . * (5) 2SGMM (and 2SLS as check) using the stacked instrument set . * Gives 2SGMM Stacked Case column of Table 22.2 (page 755) . . drop uhat yhat uhatsq uhat2 /* Obtained in panelgmm */ . global Z $ZSTACKED
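The panelgmm program does three things in sequence. First it computes 2SLS with a panel-robust sandwich variance. Next it computes two-step GMM with weight matrix S^-1, where S = sum_i Z_i'u_i u_i'Z_i is built by `matrix opaccum ..., group(id)`. Finally it forms the overidentifying restrictions statistic u'Z S^-1 Z'u with dimz - dimx degrees of freedom. The sketch below uses NumPy to apply the same matrix formulas, as an illustration only; it is not a translation of the Stata program. Several choices are assumptions: the names `cluster_meat` and `panel_gmm` are hypothetical, and no constant is added, mirroring the `noconstant` options. The chi-squared p-value (`chi2tail` in the log) is omitted to avoid a SciPy dependency.

```python
import numpy as np


def cluster_meat(Z, resid, ids):
    """S = sum over firms i of (Z_i' u_i)(Z_i' u_i)', the analogue of
    matrix opaccum S = $Z, group(id) opvar(uhat) noconstant."""
    _, inv = np.unique(ids, return_inverse=True)
    firm_sums = np.zeros((inv.max() + 1, Z.shape[1]))
    np.add.at(firm_sums, inv, Z * resid[:, None])  # Z_i' u_i for each firm
    return firm_sums.T @ firm_sums


def panel_gmm(y, X, Z, ids):
    """Panel-robust 2SLS, two-step GMM and the overidentifying restrictions
    statistic. No constant is added (as with the noconstant options)."""
    ZX, Zy = Z.T @ X, Z.T @ y
    ZZ_inv = np.linalg.inv(Z.T @ Z)
    A_inv = np.linalg.inv(ZX.T @ ZZ_inv @ ZX)
    b_2sls = A_inv @ ZX.T @ ZZ_inv @ Zy
    # Sandwich variance of 2SLS with S built from the 2SLS residuals
    S = cluster_meat(Z, y - X @ b_2sls, ids)
    v_2sls = A_inv @ ZX.T @ ZZ_inv @ S @ ZZ_inv @ ZX @ A_inv
    # Two-step GMM with weight matrix S^-1
    S_inv = np.linalg.inv(S)
    b_gmm = np.linalg.solve(ZX.T @ S_inv @ ZX, ZX.T @ S_inv @ Zy)
    # Recompute S at the GMM residuals for the variance and the test
    u = y - X @ b_gmm
    S2_inv = np.linalg.inv(cluster_meat(Z, u, ids))
    v_gmm = np.linalg.inv(ZX.T @ S2_inv @ ZX)
    uZ = Z.T @ u
    return {
        "b_2sls": b_2sls, "se_2sls": np.sqrt(np.diag(v_2sls)),
        "b_gmm": b_gmm, "se_gmm": np.sqrt(np.diag(v_gmm)),
        "J": float(uZ @ S2_inv @ uZ), "dof": Z.shape[1] - X.shape[1],
    }
```

A useful check under these assumptions is to use exactly as many instruments as regressors. In that case the two estimators coincide and the statistic is zero. A positive value therefore arises only from overidentification; in this log that means 4 degrees of freedom for the base-case instruments and 67 for the stacked set.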

. * dlnwg dkids dageh dagesq ddisab . panelgmm (obs=4256) Redundant instruments if det(Z'Z) zero. Here det(Z'Z) = 7.52e+234 (obs=4256) (obs=4256) (obs=4256) 2SLS results: b2SLS[5,1] dlnhr dlnwg .54282703 dkids -.0482932 dageh .02689353 dagesq -.00035113 ddisab .0079759 rmse = .30716345 se2SLS[5,1] c1 r1 .20822845 r2 .02446659 r3 .01497229 r4 .0001863 r5 .0623543 2SGMM results: b2SGMM[5,1] dlnhr dlnwg .32999732 dkids -.01681724 dageh .01637783 dagesq -.00019221 ddisab -.02010632 rmse = .29791501 se2SGMM[5,1] c1 r1 .10965082 r2 .01356737 r3 .00834178 r4 .0001037 r5 .02357317 Over-identifying restrictions test 69.506226 dof 67 p-value .39307324 . . ********** (6) F-STATISTICS FOR WEAK INSTRUMENTS (page 756) ********** . . * (1) Weak Instruments using base case instrument set

. . * Test weak instruments for dlnwg using panel robust inference . quietly regress dlnwg $ZBASECASE, cluster(id) . quietly test $ZBASECASE . * This value should have been reported in the text on page 756 . * [Instead, by mistake, the F assuming iid errors below was reported] . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df) r2 = .00590049 F = 2.3790046 p = .01209278 dof = 9 . . * Same except use wrong inference assuming iid errors . quietly regress dlnwg $ZBASECASE . quietly test $ZBASECASE . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df) r2 = .00590049 F = 2.800243 p = .00281135 dof = 9 . . * (2) Weak Instruments using stacked instrument set . . * Test weak instruments for dlnwg using panel robust inference . quietly regress dlnwg $ZSTACKED, cluster(id) . quietly test $ZSTACKED . * This value was reported in the text on page 756 . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df) r2 = .02256803 F = 1.9000813 p = .00003808 dof = 72 . . * Same except use wrong inference assuming iid errors . quietly regress dlnwg $ZSTACKED . quietly test $ZSTACKED . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df) r2 = .02256803 F = 1.341413 p = .02961833 dof = 72 . . * (3) Weak Instruments for other regressors . * Here all regressors are instrumented, so all should be tested as above. . * These find no problems. . * For example, for dkids and the stacked instrument set . quietly regress dkids $ZSTACKED, cluster(id) . quietly test $ZSTACKED . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)

r2 = .16281613 F = 8.4145744 p = 3.349e-52 dof = 72 . quietly regress dageh $ZSTACKED, cluster(id) . quietly test $ZSTACKED . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df) r2 = .22076423 F = 24.002499 p = 6.30e-126 dof = 72 . quietly regress dagesq $ZSTACKED, cluster(id) . quietly test $ZSTACKED . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df) r2 = .36856999 F = 150.79951 p = 4.10e-309 dof = 72 . quietly regress ddisab $ZSTACKED, cluster(id) . quietly test $ZSTACKED . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df) r2 = .28591864 F = 25.786283 p = 4.70e-132 dof = 72 . . ********** PARTIAL R-SQUARED FOR WEAK INSTRUMENTS (page 756) ********** . . * (1) Weak Instruments using base case instrument set . . * Test weak instruments for dlnwg using panel robust inference . quietly regress dlnwg $ZBASECASE, cluster(id) . quietly test $ZBASECASE . di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df) r2 = .00590049 F = 2.3790046 p = .01209278 dof = 9 . . **** (D) Shea (1997) partial R-squared . . * Here we have five endogenous regressors and no exogenous regressors. . * Need to change code below if there are exogenous regressors. See ch4ivkling.do . * Focus on the endogenous wage regressor. . * For the other four just need to replace dlnwg in the first line of (1) . * and replace the first line of (2B) . . * (1) Form x1 - x1tilda: residual from regress x1 on other regressors . quietly reg dlnwg dkids dageh dagesq ddisab . * quietly reg dkids dlnwg dageh dagesq ddisab . predict x1minusx1tilda, resid

. . * (2) Form x1hat - x1hattilda: residual from regress x1hat on fitted values of other regressors . * (2A) First get the fitted values from regress endogenous on instruments . quietly reg dlnwg $ZBASECASE . predict dlnwghat, xb . di e(r2) " r2 from regress x1 on Z" .00590049 r2 from regress x1 on Z . quietly reg dkids $ZBASECASE . predict dkidshat, xb . di e(r2) " r2 from regress second endog regressor on Z" .1473738 r2 from regress second endog regressor on Z . quietly reg dageh $ZBASECASE . predict dagehhat, xb . di e(r2) " r2 from regress third endog regressor on Z" .13903221 r2 from regress third endog regressor on Z . quietly reg dagesq $ZBASECASE . predict dagesqhat, xb . di e(r2) " r2 from regress fourth endog regressor on Z" .3049799 r2 from regress fourth endog regressor on Z . quietly reg ddisab $ZBASECASE . predict ddisabhat, xb . di e(r2) " r2 from regress fifth endog regressor on Z" .26087493 r2 from regress fifth endog regressor on Z . * (2B) Run the regression of x1hat on fitted values of other regressors . quietly reg dlnwghat dkidshat dagehhat dagesqhat ddisabhat . * quietly reg dkidshat dlnwghat dagehhat dagesqhat ddisabhat . di e(r2) " r2 from regress prediction of x1 on predictions of x2 .38268288 r2 from regress prediction of x1 on predictions of x2 . predict x1hatminusx1hattilda, resid . . * (3) Form the correlation between (1) and (2) . * This value is reported in the text on page 756 . corr x1minusx1tilda x1hatminusx1hattilda

(obs=4256)

             | x1minu~a x1hatm~a
-------------+------------------
x1minusx1t~a |   1.0000
x1hatminus~a |   0.0604   1.0000

. di r(rho)^2 " Shea's partial R-squared measure" .00364741 Shea's partial R-squared measure . . ********** CLOSE OUTPUT . . log close log: c:\Imbook\bwebpage\Section5\mma22p1pangmm.txt log type: text closed on: 23 May 2005, 11:52:42
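The Shea (1997) steps (1) to (3) above can also be reproduced outside Stata. The NumPy sketch below follows those steps under two assumptions taken from the log: all regressors are endogenous, and each auxiliary regression includes an intercept, which is Stata's `regress` default. The helper names are hypothetical. With a single endogenous regressor the measure reduces to the ordinary first-stage R-squared. With several it can be much smaller, as here: the first-stage R-squared for dlnwg is .0059, but Shea's partial R-squared is .0036.

```python
import numpy as np


def ols_fit(y, X):
    """Fitted values and residuals from OLS of y on a constant and the columns of X."""
    D = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    fit = D @ beta
    return fit, y - fit


def shea_partial_r2(X, Z, j=0):
    """Shea partial R-squared for column j of X when every column of X is endogenous."""
    others = np.delete(X, j, axis=1)
    # (1) x_j minus its projection on the other regressors
    _, r1 = ols_fit(X[:, j], others)
    # (2A) first-stage fitted values of every endogenous regressor on Z
    Xhat = np.column_stack([ols_fit(X[:, k], Z)[0] for k in range(X.shape[1])])
    # (2B) xhat_j minus its projection on the other fitted values
    _, r2 = ols_fit(Xhat[:, j], np.delete(Xhat, j, axis=1))
    # (3) squared correlation between the two residual vectors
    return np.corrcoef(r1, r2)[0, 1] ** 2
```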

-----------------------------------------------------------------------------------------------------log: c:\Imbook\bwebpage\Section5\mma23p1pannonlin.txt log type: text opened on: 23 May 2005, 12:46:16 . . ********** OVERVIEW OF MMA23P1PANNONLIN.DO ********** . . * STATA Program . * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi . * used for "Microeconometrics: Methods and Applications" . * by A. Colin Cameron and Pravin K. Trivedi (2005) . * Cambridge University Press . . * Chapter 23.3 pages 792-5 . * Example of nonlinear model (multiplicative effects) . . * This program derives Table 23.1 and Figure 23.1. . * It performs nonlinear panel analysis for multiplicative effects model . * y_it = a_i*exp(x_it'b) = exp(c_i+x_it'b) . * and parametric count data models . . * (1) Linear (xtreg) for log(PAT) with adjustment for PAT=0 . * Output includes Figure 23.1 . * (2) Poisson (xtpoisson) fixed and random effects . * (3) GEE (xtgee) which includes pooled NLS . . * The Poisson individual effects model is . * y_it ~ Poisson[a_i*exp(x_it'b)] . * The standard errors assume this model is correctly specified . * i.e. Variance = mean given x_it and a_i . . * For panel robust se's see section 23.2.6 pages 788-791 . * To obtain more panel robust standard errors this program panel bootstraps . * Note that the panel se entries of 0.033 under GEE, Poisson-RE and Poisson-FE . * are not panel robust to the extent that the bootstrap se's are panel robust . * and in fact are the usual se's in the case of Poisson-RE and Poisson-FE . * Unlike ch.21 here "panel se" means "default panel se" and not "panel-robust se". . . * To speed up program reduce nreps, the number of bootstrap replications . . * To run this program you need data file . * patr7079.asc . . ********** SETUP ********** . . set more off . version 8.0 . set scheme s1mono /* Graphics scheme */
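The overview says that panel-robust standard errors are obtained by a panel bootstrap with `nreps` replications. The essential idea is to resample whole firms with replacement. Each pseudo-sample then keeps every firm's time series intact, and with it any serial correlation. The NumPy sketch below illustrates this for a pooled OLS slope only. The statistic, the function names and the seed are assumptions made for illustration; they are not the estimators or Stata commands the program actually bootstraps.

```python
import numpy as np


def panel_bootstrap_se(y, x, ids, estimator, nreps=500, seed=12345):
    """Bootstrap standard error that resamples whole firms (clusters) with replacement."""
    rng = np.random.default_rng(seed)
    groups = np.unique(ids)
    rows = [np.flatnonzero(ids == g) for g in groups]  # row indices of each firm
    draws = []
    for _ in range(nreps):
        picked = rng.integers(0, len(groups), size=len(groups))
        idx = np.concatenate([rows[k] for k in picked])  # stack the drawn firms
        draws.append(estimator(y[idx], x[idx]))
    return np.std(np.asarray(draws), axis=0, ddof=1)


def pooled_ols_slope(y, x):
    """Slope from pooled OLS of y on a constant and x."""
    D = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(D, y, rcond=None)[0][1]
```

For estimators with firm effects, a firm drawn twice should be given two distinct identifiers in the pseudo-sample. The pooled slope used here does not need that step.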

. . ********** DATA DESCRIPTION ********** . . * There are ten years of data but only five years 1975-79 are used in estimation . . * The original data are from . * Bronwyn Hall, Zvi Griliches, and Jerry Hausman (1986), . * "Patents and R&D: Is There a Lag?", . * International Economic Review, 27, 265-283. . . * File patr7079.dat has data on 346 firms . * There are 4 lines per firm, with 25 variables . * Time-invariant: CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT, . * Time-varying X: LOGR70,LOGR71,LOGR72, ....., LOGR77,LOGR78,LOGR79 . * Time-varying Y: PAT70,PAT71,PAT72, ....., PAT77,PAT78,PAT79 . * in the format: . * I7,I3,I2,5F12.6/6F12.6/6F12.6/5F12.6/ . * where . * CUSIP Compustat's identifying number for the firm (Committee on . * Uniform Security Identification Procedures number). . * ARDSSIC A two-digit code for the applied R&D industrial classification . * (roughly that in Bound, Cummins, Griliches, Hall, and Jaffe, in . * the Griliches R&D, Patents, and Productivity volume). . * SCISECT Dummy equal to one for firms in the scientific sector. . * LOGK The logarithm of the book value of capital in 1972. . * SUMPAT The sum of patents applied for between 1972 and 1979. . * LOGR70- The logarithm of R&D spending during the year (in 1972 dollars). . * LOGR79 . * PAT70- The number of patents applied for during the year that were . * PAT79 eventually granted. . . ********** READ DATA ********** . . * The data are in ascii file patr7079.asc . * There are 346 observations on 25 variables with four lines per obs . * The data are fixed format with . * line 1 variables 1-8 I7,I3,I2,5F12.6 . * line 2 variables 9-14 6F12.6 . * line 3 variables 15-20 6F12.6 . * line 4 variables 21-25 5F12.6 . . * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY . * As the values are separated by spaces, the data are also space-delimited . * free format, so there is no need for a dictionary file . * The following command spans more than one line so use /* and */ .
infile CUSIP ARDSSIC SCISECT LOGK SUMPAT LOGR70 LOGR71 LOGR72 LOGR73 /* > */ LOGR74 LOGR75 LOGR76 LOGR77 LOGR78 LOGR79 PAT70 PAT71 PAT72 /* > */ PAT73 PAT74 PAT75 PAT76 PAT77 PAT78 PAT79 using patr7079.asc (346 observations read)
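Because `infile` without a dictionary reads whitespace-separated values in order and ignores line breaks, the four physical lines per firm need no special handling: every 25 values form one observation. Below is a Python sketch of that logic, taking the variable order from the `infile` statement above. Two details are assumptions. Treating a lone `.` as missing is a guess: ARDSSIC has only 336 non-missing values in the summary statistics that follow, but the log does not show how missing values are coded in patr7079.asc. The name `read_free_format` is hypothetical.

```python
import math

# Variable order taken from the infile statement in the log
NAMES = (["CUSIP", "ARDSSIC", "SCISECT", "LOGK", "SUMPAT"]
         + [f"LOGR{yy}" for yy in range(70, 80)]
         + [f"PAT{yy}" for yy in range(70, 80)])


def read_free_format(text, names=NAMES):
    """Read whitespace-delimited records that may span several lines."""
    tokens = text.split()  # line breaks are treated like any other whitespace
    k = len(names)
    if len(tokens) % k != 0:
        raise ValueError(f"{len(tokens)} values is not a multiple of {k}")
    # Assumption: a lone "." marks a missing value
    values = [math.nan if t == "." else float(t) for t in tokens]
    return [dict(zip(names, values[i:i + k])) for i in range(0, len(values), k)]
```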


. . ********** DATA TRANSFORMATIONS ********** . . * Use observation number as an identifier, not just CUSIP . gen id = _n . label variable id "id" . * The following lists the variables in data set and summarizes data . describe Contains data obs: 346 vars: 26 size: 37,368 (99.6% of memory free) ------------------------------------------------------------------------------storage display value variable name type format label variable label ------------------------------------------------------------------------------CUSIP float %9.0g ARDSSIC float %9.0g SCISECT float %9.0g LOGK float %9.0g SUMPAT float %9.0g LOGR70 float %9.0g LOGR71 float %9.0g LOGR72 float %9.0g LOGR73 float %9.0g LOGR74 float %9.0g LOGR75 float %9.0g LOGR76 float %9.0g LOGR77 float %9.0g LOGR78 float %9.0g LOGR79 float %9.0g PAT70 float %9.0g PAT71 float %9.0g PAT72 float %9.0g PAT73 float %9.0g PAT74 float %9.0g PAT75 float %9.0g PAT76 float %9.0g PAT77 float %9.0g PAT78 float %9.0g PAT79 float %9.0g id float %9.0g id ------------------------------------------------------------------------------Sorted by: Note: dataset has changed since last saved . summarize


Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------CUSIP | 346 531201.2 282074.9 800 989399 ARDSSIC | 336 9.97619 5.459706 1 21 SCISECT | 346 .4248555 .4950369 0 1 LOGK | 346 3.921063 2.095542 -1.76965 9.66626 SUMPAT | 346 284.7312 571.1136 0 3806 -------------+-------------------------------------------------------LOGR70 | 346 1.198348 1.941968 -3.67354 6.56641 LOGR71 | 346 1.169182 1.929444 -3.53055 6.95687 LOGR72 | 346 1.185953 1.929078 -3.35241 6.97009 LOGR73 | 346 1.231135 1.934896 -3.67395 7.06211 LOGR74 | 346 1.232636 1.946417 -3.15274 7.06524 -------------+-------------------------------------------------------LOGR75 | 346 1.165802 1.98001 -3.5476 6.76486 LOGR76 | 346 1.212888 1.979273 -3.84868 6.8285 LOGR77 | 346 1.250034 2.003002 -3.47884 6.90253 LOGR78 | 346 1.306511 2.019792 -3.2832 6.96345 LOGR79 | 346 1.345581 2.054982 -3.57742 7.03432 -------------+-------------------------------------------------------PAT70 | 346 40.00289 82.50335 0 608 PAT71 | 346 38.10983 78.40308 0 553 PAT72 | 346 36.30925 74.81591 0 557 PAT73 | 346 36.95376 77.91971 0 595 PAT74 | 346 37.60983 75.94388 0 528 -------------+-------------------------------------------------------PAT75 | 346 36.87283 75.98788 0 508 PAT76 | 346 35.84682 73.31613 0 487 PAT77 | 346 36.23121 72.75146 0 456 PAT78 | 346 32.80636 65.6505 0 434 PAT79 | 346 32.10116 66.36197 0 515 -------------+-------------------------------------------------------id | 346 173.5 100.0258 1 346 . . ******** CHANGE ORGANIZATION OF DATA USING RESHAPE AND MORE TRANSFORMATIONS . . reshape long PAT LOGR, i(id) j(year) (note: j = 70 71 72 73 74 75 76 77 78 79) Data wide -> long ----------------------------------------------------------------------------Number of obs. 346 -> 3460 Number of variables 26 -> 9 j variable (10 values) -> year xij variables: PAT70 PAT71 ... PAT79 -> PAT LOGR70 LOGR71 ... 
LOGR79 -> LOGR -----------------------------------------------------------------------------


. describe Contains data obs: 3,460 vars: 9 size: 128,020 (98.7% of memory free) ------------------------------------------------------------------------------storage display value variable name type format label variable label ------------------------------------------------------------------------------id float %9.0g id year byte %9.0g CUSIP float %9.0g ARDSSIC float %9.0g SCISECT float %9.0g LOGK float %9.0g SUMPAT float %9.0g LOGR float %9.0g PAT float %9.0g ------------------------------------------------------------------------------Sorted by: id year Note: dataset has changed since last saved . summarize Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------id | 3460 173.5 99.89562 1 346 year | 3460 74.5 2.872696 70 79 CUSIP | 3460 531201.2 281707.7 800 989399 ARDSSIC | 3360 9.97619 5.452387 1 21 SCISECT | 3460 .4248555 .4943925 0 1 -------------+-------------------------------------------------------LOGK | 3460 3.921063 2.092814 -1.76965 9.66626 SUMPAT | 3460 284.7312 570.3701 0 3806 LOGR | 3460 1.229807 1.970524 -3.84868 7.06524 PAT | 3460 36.28439 74.46563 0 608 . . * Create new variable log(patents) with adjustment for patents = 0 . gen NEWPAT = PAT . replace NEWPAT = 0.5 if NEWPAT==0. (605 real changes made) . gen LPAT = ln(NEWPAT) . label variable LPAT "Ln(Patents)" . label variable PAT "Patents"
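Two transformations happen here. The `reshape long PAT LOGR, i(id) j(year)` step turns each firm's ten PATyy and LOGRyy columns into ten rows, repeating the time-invariant variables on each row. LPAT then replaces zero patent counts by 0.5 before taking logs. The plain-Python sketch below performs both steps on a list of dictionaries keyed by the variable names in the log. The function names are hypothetical, and DPAT mirrors the patent indicator defined in the log.

```python
import math


def reshape_long(records, stubs=("PAT", "LOGR"), years=range(70, 80)):
    """Wide -> long, like: reshape long PAT LOGR, i(id) j(year)."""
    stub_vars = {f"{s}{yy}" for s in stubs for yy in years}
    rows = []
    for i, rec in enumerate(records, start=1):  # id = observation number
        fixed = {k: v for k, v in rec.items() if k not in stub_vars}
        for yy in years:
            row = {"id": i, "year": yy, **fixed}
            for s in stubs:
                row[s] = rec[f"{s}{yy}"]
            rows.append(row)
    return rows


def add_patent_vars(rows):
    """LPAT = ln(PAT) with PAT = 0 replaced by 0.5 first; DPAT = 1 if PAT > 0."""
    for row in rows:
        row["LPAT"] = math.log(row["PAT"] if row["PAT"] > 0 else 0.5)
        row["DPAT"] = 1 if row["PAT"] > 0 else 0
    return rows
```

Zero counts map to ln(0.5) = -.6931472, which is the minimum of LPAT in the summary statistics.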


. * Dummy variable for logit analysis . gen DPAT = 0 . replace DPAT = 1 if PAT>0 (2855 real changes made) . label variable DPAT "Patent Indicator" . * R and D . gen RANDD = exp(LOGR) . label variable LOGR "Ln(R&D)" . label variable RANDD "R&D" . * Lagged log R and D . tsset id year panel variable: id, 1 to 346 time variable: year, 70 to 79 . gen LOGRL1 = L1.LOGR (346 missing values generated) . gen LOGRL2 = L2.LOGR (692 missing values generated) . gen LOGRL3 = L3.LOGR (1038 missing values generated) . gen LOGRL4 = L4.LOGR (1384 missing values generated) . gen LOGRL5 = L5.LOGR (1730 missing values generated) . label variable LOGRL1 "Ln(R&D) lagged once" . label variable LOGRL2 "Ln(R&D) lagged twice" . label variable LOGRL3 "Ln(R&D) lagged three times" . label variable LOGRL4 "Ln(R&D) lagged four times" . label variable LOGRL5 "Ln(R&D) lagged five times" . * Year dummies . gen dyear2 = 0 . replace dyear2 = 1 if year==76 (346 real changes made)

. gen dyear3 = 0 . replace dyear3 = 1 if year==77 (346 real changes made) . gen dyear4 = 0 . replace dyear4 = 1 if year==78 (346 real changes made) . gen dyear5 = 0 . replace dyear5 = 1 if year==79 (346 real changes made) . . * Check data and Save data as Stata data set . describe Contains data obs: 3,460 vars: 22 size: 307,940 (97.0% of memory free) ------------------------------------------------------------------------------storage display value variable name type format label variable label ------------------------------------------------------------------------------id float %9.0g id year byte %9.0g CUSIP float %9.0g ARDSSIC float %9.0g SCISECT float %9.0g LOGK float %9.0g SUMPAT float %9.0g LOGR float %9.0g Ln(R&D) PAT float %9.0g Patents NEWPAT float %9.0g LPAT float %9.0g Ln(Patents) DPAT float %9.0g Patent Indicator RANDD float %9.0g R&D LOGRL1 float %9.0g Ln(R&D) lagged once LOGRL2 float %9.0g Ln(R&D) lagged twice LOGRL3 float %9.0g Ln(R&D) lagged three times LOGRL4 float %9.0g Ln(R&D) lagged four times LOGRL5 float %9.0g Ln(R&D) lagged five times dyear2 float %9.0g dyear3 float %9.0g dyear4 float %9.0g dyear5 float %9.0g ------------------------------------------------------------------------------Sorted by: id year

Note: dataset has changed since last saved . summarize Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------id | 3460 173.5 99.89562 1 346 year | 3460 74.5 2.872696 70 79 CUSIP | 3460 531201.2 281707.7 800 989399 ARDSSIC | 3360 9.97619 5.452387 1 21 SCISECT | 3460 .4248555 .4943925 0 1 -------------+-------------------------------------------------------LOGK | 3460 3.921063 2.092814 -1.76965 9.66626 SUMPAT | 3460 284.7312 570.3701 0 3806 LOGR | 3460 1.229807 1.970524 -3.84868 7.06524 PAT | 3460 36.28439 74.46563 0 608 NEWPAT | 3460 36.37182 74.42325 .5 608 -------------+-------------------------------------------------------LPAT | 3460 1.935464 1.949421 -.6931472 6.410175 DPAT | 3460 .8251445 .3798984 0 1 RANDD | 3460 23.02263 82.90186 .0213078 1170.563 LOGRL1 | 3114 1.216943 1.960836 -3.84868 7.06524 LOGRL2 | 2768 1.205747 1.953427 -3.84868 7.06524 -------------+-------------------------------------------------------LOGRL3 | 2422 1.19942 1.946583 -3.84868 7.06524 LOGRL4 | 2076 1.197176 1.941555 -3.67395 7.06524 LOGRL5 | 1730 1.203451 1.934293 -3.67395 7.06524 dyear2 | 3460 .1 .3000434 0 1 dyear3 | 3460 .1 .3000434 0 1 -------------+-------------------------------------------------------dyear4 | 3460 .1 .3000434 0 1 dyear5 | 3460 .1 .3000434 0 1 . drop NEWPAT . save patr7079, replace file patr7079.dta saved . summarize Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------id | 3460 173.5 99.89562 1 346 year | 3460 74.5 2.872696 70 79 CUSIP | 3460 531201.2 281707.7 800 989399 ARDSSIC | 3360 9.97619 5.452387 1 21 SCISECT | 3460 .4248555 .4943925 0 1 -------------+-------------------------------------------------------LOGK | 3460 3.921063 2.092814 -1.76965 9.66626 SUMPAT | 3460 284.7312 570.3701 0 3806 LOGR | 3460 1.229807 1.970524 -3.84868 7.06524

PAT | 3460 36.28439 74.46563 0 608 LPAT | 3460 1.935464 1.949421 -.6931472 6.410175 -------------+-------------------------------------------------------DPAT | 3460 .8251445 .3798984 0 1 RANDD | 3460 23.02263 82.90186 .0213078 1170.563 LOGRL1 | 3114 1.216943 1.960836 -3.84868 7.06524 LOGRL2 | 2768 1.205747 1.953427 -3.84868 7.06524 LOGRL3 | 2422 1.19942 1.946583 -3.84868 7.06524 -------------+-------------------------------------------------------LOGRL4 | 2076 1.197176 1.941555 -3.67395 7.06524 LOGRL5 | 1730 1.203451 1.934293 -3.67395 7.06524 dyear2 | 3460 .1 .3000434 0 1 dyear3 | 3460 .1 .3000434 0 1 dyear4 | 3460 .1 .3000434 0 1 -------------+-------------------------------------------------------dyear5 | 3460 .1 .3000434 0 1 . xtsum, i(id) Variable | Mean Std. Dev. Min Max | Observations -----------------+--------------------------------------------+---------------id overall | 173.5 99.89562 1 346 | N = 3460 between | 100.0258 1 346 | n = 346 within | 0 173.5 173.5 | T = 10 | | year overall | 74.5 2.872696 70 79 | N = 3460 between | 0 74.5 74.5 | n = 346 within | 2.872696 70 79 | T = 10 | | CUSIP overall | 531201.2 281707.7 800 989399 | N = 3460 between | 282074.9 800 989399 | n = 346 within | 0 531201.2 531201.2 | T = 10 | | ARDSSIC overall | 9.97619 5.452387 1 21 | N = 3360 between | 5.459706 1 21 | n = 336 within | 0 9.97619 9.97619 | T = 10 | | SCISECT overall | .4248555 .4943925 0 1 | N = 3460 between | .4950369 0 1 | n = 346 within | 0 .4248555 .4248555 | T = 10 | | LOGK overall | 3.921063 2.092814 -1.76965 9.66626 | N = 3460 between | 2.095542 -1.76965 9.66626 | n = 346 within | 0 3.921063 3.921063 | T = 10 | | SUMPAT overall | 284.7312 570.3701 0 3806 | N = 3460 between | 571.1136 0 3806 | n = 346 within | 0 284.7312 284.7312 | T = 10 | | LOGR overall | 1.229807 1.970524 -3.84868 7.06524 | N = 3460 between | 1.944421 -3.120133 6.911438 | n = 346

within | .3347099 -1.19673 4.218814 | T = 10 | | PAT overall | 36.28439 74.46563 0 608 | N = 3460 between | 72.5989 0 484.8 | n = 346 within | 16.97772 -177.7156 224.3844 | T = 10 | | LPAT overall | 1.935464 1.949421 -.6931472 6.410175 | N = 3460 between | 1.873181 -.6931472 6.180623 | n = 346 within | .5482375 -.2643028 4.368045 | T = 10 | | DPAT overall | .8251445 .3798984 0 1 | N = 3460 between | .2831052 0 1 | n = 346 within | .2537376 -.0748555 1.725145 | T = 10 | | RANDD overall | 23.02263 82.90186 .0213078 1170.563 | N = 3460 between | 81.69163 .0582575 1014.058 | n = 346 within | 14.71596 -280.2214 311.47 | T = 10 | | LOGRL1 overall | 1.216943 1.960836 -3.84868 7.06524 | N = 3114 between | 1.937733 -3.123236 6.897784 | n = 346 within | .3157841 -.6151992 4.203909 | T = 9 | | LOGRL2 overall | 1.205747 1.953427 -3.84868 7.06524 | N = 2768 between | 1.932143 -3.12461 6.889576 | n = 346 within | .3035537 -.486563 4.187752 | T = 8 | | LOGRL3 overall | 1.19942 1.946583 -3.84868 7.06524 | N = 2422 between | 1.926813 -3.074006 6.887726 | n = 346 within | .2928787 -.2381882 4.153968 | T = 7 | | LOGRL4 overall | 1.197176 1.941555 -3.67395 7.06524 | N = 2076 between | 1.923302 -2.989647 6.897597 | n = 346 within | .2818841 -.2335892 4.095286 | T = 6 | | LOGRL5 overall | 1.203451 1.934293 -3.67395 7.06524 | N = 1730 between | 1.917687 -2.99075 6.924144 | n = 346 within | .2692134 -.1899074 4.062701 | T = 5 | | dyear2 overall | .1 .3000434 0 1 | N = 3460 between | 0 .1 .1 | n = 346 within | .3000434 0 1 | T = 10 | | dyear3 overall | .1 .3000434 0 1 | N = 3460 between | 0 .1 .1 | n = 346 within | .3000434 0 1 | T = 10 | | dyear4 overall | .1 .3000434 0 1 | N = 3460 between | 0 .1 .1 | n = 346 within | .3000434 0 1 | T = 10 | | dyear5 overall | .1 .3000434 0 1 | N = 3460 between | 0 .1 .1 | n = 346 within | .3000434 0 1 | T = 10
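`xtsum` splits each variable's variation into three standard deviations:

- overall: computed over all N observations;
- between: computed from the firm means xbar_i over the n firms;
- within: computed from x_it - xbar_i + xbar, where adding back the global mean xbar keeps the within min and max on the original scale.

The NumPy sketch below assumes that each standard deviation uses the usual n - 1 divisor, which is consistent with the id and year rows above. `xtsum` here is a hypothetical Python function, not Stata's command.

```python
import numpy as np


def xtsum(x, ids):
    """Overall, between and within standard deviations, in the spirit of Stata's xtsum."""
    x = np.asarray(x, float)
    _, inv = np.unique(ids, return_inverse=True)
    means = np.bincount(inv, weights=x) / np.bincount(inv)  # firm means xbar_i
    within = x - means[inv] + x.mean()  # x_it - xbar_i + xbar
    return {
        "overall_sd": x.std(ddof=1),
        "between_sd": means.std(ddof=1),
        "within_sd": within.std(ddof=1),
        "within_min": within.min(),
        "within_max": within.max(),
    }
```

Time-invariant variables such as LOGK have zero within variation, so a within (fixed-effects) estimator cannot identify their coefficients.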

. . ********** DEFINE GLOBALS INCLUDING REGRESSOR LIST ********** . . * Number of reps for the bootstrap . * Table 23.1 used 500 . global nreps 500 . . * The regressions below are of patents on LOGR ??? on ??? . * Additional regressors to be included below are defined in xextra . * Here no additional regressors . global xextra . . ********** (1) LINEAR PANEL RANDOM AND FIXED EFFECTS FOR LOG(PAT) ********** . . * This ad hoc method uses as dependent variable . * LPAT = ln(PAT) if PAT > 0 . * = ln(0.5) if PAT = 0 . * which is analyzed using chapter 21 methods . . * Note that the first xt command needs the option i(id) . * to indicate that the ith observation is for the ith id . * Time invariant regressors LOGK SCISECT are not included . . use patr7079, clear . drop if year<75 (1730 observations deleted) . . * Overall plot of data . * The graphs below use new Stata 8 graphics . * Change graphics scheme from default s2color to s1mono for printing . set scheme s1mono . . * Figure 23.1 page 792 [with axis labels corrected - book is wrong] . graph twoway (scatter LPAT LOGR, msize(vsmall)) (lowess LPAT LOGR) (lfit LPAT LOGR), /* > */ scale (1.2) plotregion(style(none)) /* > */ title("Pooled (Overall) Regression") /* > */ xtitle("Log R&D Spending", size(medlarge)) xscale(titlegap(*5)) /* > */ ytitle("Log Patents", size(medlarge)) yscale(titlegap(*5)) /* > */ legend(pos(4) ring(0) col(1)) legend(size(small)) /* > */ legend( label(1 "Original data") label(2 "Nonparametric fit") label(3 "Linear fit")) . graph export ch23fig1.wmf, replace

(file c:\Imbook\bwebpage\Section5\ch23fig1.wmf written in Windows Metafile format) . . * OLS . regress LPAT LOGR $xextra, cluster(id) Regression with robust standard errors Number of obs = 1730 F( 1, 345) = 1330.60 Prob > F = 0.0000 R-squared = 0.7192 Number of clusters (id) = 346 Root MSE = 1.0461 -----------------------------------------------------------------------------| Robust LPAT | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------LOGR | .8340745 .0228655 36.48 0.000 .7891012 .8790478 _cons | .7954785 .0579246 13.73 0.000 .6815487 .9094083 -----------------------------------------------------------------------------. estimates store linolspan . . * Fixed effects . xtreg LPAT LOGR $xextra, fe i(id) Fixed-effects (within) regression Group variable (i): id R-sq: within = 0.0026 between = 0.7669 overall = 0.7192

corr(u_i, Xb) = 0.8405 Number of obs = 1730 Number of groups = 346 Obs per group: min = 5 avg = 5.0 max = 5 F(1,1383) = 3.63 Prob > F = 0.0570

-----------------------------------------------------------------------------LPAT | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------LOGR | .1067505 .0560364 1.91 0.057 -.0031749 .216676 _cons | 1.709116 .0714557 23.92 0.000 1.568943 1.849289 -------------+---------------------------------------------------------------sigma_u | 1.7380872 sigma_e | .51119065 rho | .92038546 (fraction of variance due to u_i) -----------------------------------------------------------------------------F test that all u_i=0: F(345, 1383) = 16.96 Prob > F = 0.0000 . estimates store linfe .
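The fixed-effects slope of .107 is far below the pooled OLS slope of .834 because `xtreg, fe` uses only within-firm variation. It is equivalent to OLS after subtracting each firm's mean from LPAT and LOGR. This removes any firm effect correlated with R&D, and the log reports corr(u_i, Xb) = 0.8405 here. Below is a minimal NumPy sketch for a single regressor. The function names are hypothetical, and the sketch reports no standard errors.

```python
import numpy as np


def demean(v, inv):
    """Subtract each firm's mean; inv maps rows to firms 0..G-1."""
    means = np.bincount(inv, weights=v) / np.bincount(inv)
    return v - means[inv]


def within_slope(y, x, ids):
    """Fixed-effects (within) slope for one regressor: OLS on firm-demeaned data."""
    _, inv = np.unique(ids, return_inverse=True)
    yd, xd = demean(y, inv), demean(x, inv)
    return (xd @ yd) / (xd @ xd)


def pooled_slope(y, x):
    """Pooled OLS slope with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)
```

When the firm effects are built to be correlated with x, the pooled slope is badly biased while the within slope recovers the true coefficient.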

. * Random effects . xtreg LPAT LOGR $xextra, re i(id) Random-effects GLS regression Group variable (i): id R-sq: within = 0.0026 between = 0.7669 overall = 0.7192 Random effects u_i ~ Gaussian corr(u_i, X) = 0 (assumed)

Number of obs = 1730 Number of groups = 346 Obs per group: min = 5 avg = 5.0 max = 5 Wald chi2(1) = 915.90 Prob > chi2 = 0.0000

-----------------------------------------------------------------------------LPAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------LOGR | .7202377 .0237986 30.26 0.000 .6735932 .7668821 _cons | .9384761 .0599584 15.65 0.000 .8209598 1.055992 -------------+---------------------------------------------------------------sigma_u | .90057544 sigma_e | .51119065 rho | .7563152 (fraction of variance due to u_i) -----------------------------------------------------------------------------. estimates store linre . . . ********** (2) POISSON RANDOM AND FIXED EFFECTS (Table 23.1 p.794) ********** . . use patr7079, clear . drop if year<75 (1730 observations deleted) . . * Poisson Cross-section with Poisson standard errors . * Table 23.1 Poisson column . . poisson PAT LOGR $xextra Iteration 0: log likelihood = -21030.607 Iteration 1: log likelihood = -21030.583 Iteration 2: log likelihood = -21030.583 Poisson regression

Number of obs = 1730 LR chi2(1) = 108479.76 Prob > chi2 = 0.0000 Log likelihood = -21030.583 Pseudo R2 = 0.7206 -----------------------------------------------------------------------------

PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------LOGR | .6929337 .0022454 308.61 0.000 .6885329 .6973346 _cons | 1.711528 .009767 175.24 0.000 1.692385 1.730671 -----------------------------------------------------------------------------. estimates store poisiid . . * Poisson Cross-section with heteroskedastic robust standard errors . poisson PAT LOGR $xextra, robust Iteration 0: log pseudo-likelihood = -21030.607 Iteration 1: log pseudo-likelihood = -21030.583 Iteration 2: log pseudo-likelihood = -21030.583 Poisson regression

Number of obs = 1730 Wald chi2(1) = 1223.63 Prob > chi2 = 0.0000 Log pseudo-likelihood = -21030.583 Pseudo R2 = 0.7206 -----------------------------------------------------------------------------| Robust PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------LOGR | .6929337 .0198092 34.98 0.000 .6541084 .731759 _cons | 1.711528 .0620025 27.60 0.000 1.590006 1.833051 -----------------------------------------------------------------------------. estimates store poishet . . * Poisson Cross-section with panel robust standard errors . poisson PAT LOGR $xextra, cluster(id) Iteration 0: log pseudo-likelihood = -21030.607 Iteration 1: log pseudo-likelihood = -21030.583 Iteration 2: log pseudo-likelihood = -21030.583 Poisson regression

Number of obs = 1730 Wald chi2(1) = 259.15 Prob > chi2 = 0.0000 Log pseudo-likelihood = -21030.583 (standard errors adjusted for clustering on id) -----------------------------------------------------------------------------| Robust PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------LOGR | .6929337 .0430441 16.10 0.000 .6085688 .7772987 _cons | 1.711528 .1340309 12.77 0.000 1.448832 1.974224 -----------------------------------------------------------------------------
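The three Poisson fits share the same coefficients and differ only in how the variance is estimated:

- default: the inverse of A = sum mu_it x_it x_it';
- `robust`: the sandwich A^-1 B A^-1 with B = sum (y_it - mu_it)^2 x_it x_it';
- `cluster(id)`: the same sandwich, but B is built after first summing the scores over each firm's years.

This is why the LOGR standard error grows from .0022 to .0198 to .0430. Below is a NumPy sketch of the three formulas, assuming the first column of X is a constant; the function names are hypothetical. Stata also applies finite-sample scaling to the robust and cluster versions, which this sketch omits, so its numbers may differ slightly.

```python
import numpy as np


def poisson_mle(y, X, tol=1e-10, maxit=50):
    """Poisson MLE by Newton-Raphson; column 0 of X is assumed to be a constant."""
    b = np.zeros(X.shape[1])
    b[0] = np.log(y.mean())
    for _ in range(maxit):
        mu = np.exp(X @ b)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


def poisson_ses(y, X, ids):
    """Default (ML), heteroskedasticity-robust and panel (cluster) robust standard errors."""
    b = poisson_mle(y, X)
    mu = np.exp(X @ b)
    A_inv = np.linalg.inv(X.T @ (X * mu[:, None]))  # inverse of sum mu_it x_it x_it'
    scores = X * (y - mu)[:, None]  # observation-level scores
    _, inv = np.unique(ids, return_inverse=True)
    firm_scores = np.zeros((inv.max() + 1, X.shape[1]))
    np.add.at(firm_scores, inv, scores)  # sum the scores over each firm's years

    def sandwich(B):
        return np.sqrt(np.diag(A_inv @ B @ A_inv))

    return b, {
        "ml": np.sqrt(np.diag(A_inv)),
        "robust": sandwich(scores.T @ scores),
        "cluster": sandwich(firm_scores.T @ firm_scores),
    }
```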

. estimates store poispan . . * Poisson panel fixed effects . * Table 23.1 p.794 Poisson-FE column . . * Poisson fixed effects . xtpoisson PAT LOGR $xextra, fe i(id) note: 22 groups (110 obs) dropped due to all zero outcomes Iteration 0: log likelihood = -3660.2656 Iteration 1: log likelihood = -3659.5926 Iteration 2: log likelihood = -3659.5926 Conditional fixed-effects Poisson regression Number of obs = 1620 Group variable (i): id Number of groups = 324 Obs per group: min = 5 avg = 5.0 max = 5 Wald chi2(1) = 1.35 Log likelihood = -3659.5926 Prob > chi2 = 0.2460

-----------------------------------------------------------------------------PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------LOGR | -.0377642 .0325518 -1.16 0.246 -.1015645 .026036 -----------------------------------------------------------------------------. estimates store poisfe . . /* > * Alternative way is to put in dummy variables > set matsize 400 > xi: poisson PAT LOGR $xextra i.id > */ . . * Poisson panel random effects . * Table 23.1 p.794 Poisson-RE column . . * Poisson random effects . xtpoisson PAT LOGR $xextra, re i(id) Fitting Poisson model: Iteration 0: log likelihood = -21030.607 Iteration 1: log likelihood = -21030.583 Iteration 2: log likelihood = -21030.583

Fitting full model: Iteration 0: log likelihood = -5633.1283 Iteration 1: log likelihood = -5560.1171 Iteration 2: log likelihood = -5553.2991 Iteration 3: log likelihood = -5553.1788 Iteration 4: log likelihood = -5553.1787

Random-effects Poisson regression Number of obs = 1730 Group variable (i): id Number of groups = 346 Random effects u_i ~ Gamma Obs per group: min = 5 avg = 5.0 max = 5 Wald chi2(1) = 110.20 Log likelihood = -5553.1787 Prob > chi2 = 0.0000

-----------------------------------------------------------------------------PAT | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------LOGR | .3487832 .0332254 10.50 0.000 .2836625 .4139039 _cons | 2.312705 .124758 18.54 0.000 2.068184 2.557226 -------------+---------------------------------------------------------------/lnalpha | .5454692 .0899144 .3692402 .7216983 -------------+---------------------------------------------------------------alpha | 1.725418 .1551399 1.446635 2.057925 -----------------------------------------------------------------------------Likelihood-ratio test of alpha=0: chibar2(01) = 3.1e+04 Prob>=chibar2 = 0.000 . estimates store poisre . . * Poisson random effects with normal error . xtpoisson PAT LOGR $xextra, re i(id) normal Fitting comparison Poisson model: Iteration 0: log likelihood = -21030.607 Iteration 1: log likelihood = -21030.583 Iteration 2: log likelihood = -21030.583 Fitting constant-only model: tau = 0.0 log likelihood = -55439.205 tau = 0.1 log likelihood = -12594.935 tau = 0.2 log likelihood = -8669.2146 tau = 0.3 log likelihood = -8107.7532 tau = 0.4 log likelihood = -7634.0488 tau = 0.5 log likelihood = -8046.3947 Iteration 0: log likelihood = -7634.0488 Iteration 1: log likelihood = -7586.9889 Iteration 2: log likelihood = -7586.5899 Iteration 3: log likelihood = -7586.5898 Fitting full model: tau = 0.0 log likelihood = -19363.106 tau = 0.1 log likelihood = -6602.7685 tau = 0.2 log likelihood = -6335.5261 tau = 0.3 log likelihood = -6556.0614 Iteration 0: log likelihood = -6335.5261 Iteration 1: log likelihood = -6310.8821 Iteration 2: log likelihood = -6261.9825 Random-effects Poisson regression Number of obs = 1730 Group variable (i): id Number of groups = 346 Random effects u_i ~ Gaussian Obs per group: min = 5 avg = 5.0 max = 5 LR chi2(0) = 2649.21 Log likelihood = -6261.9825 Prob > chi2 = .

------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |    .815977          .        .       .            .           .
       _cons |   1.156293          .        .       .            .           .
-------------+----------------------------------------------------------------
    /lnsig2u |  -1.310299          .        .       .            .           .
-------------+----------------------------------------------------------------
     sigma_u |   .5193643          .                               .           .
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01) = 3.0e+04 Pr>=chibar2 = 0.000
. estimates store poisrenormal
.
. * Poisson random effects population averaged
. xtpoisson PAT LOGR $xextra, pa i(id)

Iteration 1: tolerance = .09172122
Iteration 2: tolerance = .02686915
Iteration 3: tolerance = .00712438
Iteration 4: tolerance = .00159015
Iteration 5: tolerance = .00032104
Iteration 6: tolerance = .00006195
Iteration 7: tolerance = .00001174
Iteration 8: tolerance = 2.209e-06
Iteration 9: tolerance = 4.146e-07

GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(1)       =  16317.27
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5595302   .0043803   127.74   0.000      .550945    .5681153
       _cons |   2.067515   .0185166   111.66   0.000     2.031223    2.103807
------------------------------------------------------------------------------
. estimates store poispa
.
. * Poisson random effects population averaged with robust se
. xtpoisson PAT LOGR $xextra, robust pa i(id)

Iteration 1: tolerance = .09172122
Iteration 2: tolerance = .02686915
Iteration 3: tolerance = .00712438
Iteration 4: tolerance = .00159015
Iteration 5: tolerance = .00032104
Iteration 6: tolerance = .00006195
Iteration 7: tolerance = .00001174
Iteration 8: tolerance = 2.209e-06
Iteration 9: tolerance = 4.146e-07

GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(1)       =    293.80
Scale parameter:                         1      Prob > chi2        =    0.0000

                                 (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5595302   .0326436    17.14   0.000     .4955499    .6235104
       _cons |   2.067515   .1113256    18.57   0.000     1.849321    2.285709
------------------------------------------------------------------------------
. estimates store poispapan
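The random-effects Poisson output above reports its ancillary parameters on a log scale. As a quick consistency check, here is a minimal sketch in Python (not Stata) that uses only numbers copied from the log. It shows that alpha = exp(lnalpha), and that alpha's std. err. and interval match the delta-method value exp(lnalpha) x se(lnalpha) and the exponentiated /lnalpha interval. It also shows that sigma_u = exp(lnsig2u/2), and that with a single regressor the Wald chi2(1) statistic is just z squared. The normal-RE fit reports no standard errors, so only its point estimate can be checked. The function names are illustrative and are not Stata commands.

```python
import math


def exp_of_log_param(est_ln):
    """Stata reports /lnalpha = ln(alpha); back-transform with exp()."""
    return math.exp(est_ln)


def delta_se_exp(est_ln, se_ln):
    """Delta-method s.e. of exp(theta): the derivative of exp is exp itself."""
    return math.exp(est_ln) * se_ln


def sigma_u_from_lnsig2u(lnsig2u):
    """/lnsig2u = ln(sigma_u^2), so sigma_u = exp(lnsig2u / 2)."""
    return math.exp(lnsig2u / 2.0)


def wald_one_coef(coef, se):
    """z = coef / se; with one restriction the Wald chi2(1) statistic is z squared."""
    z = coef / se
    return z, z * z


if __name__ == "__main__":
    # Inputs copied from the xtpoisson output above
    print(exp_of_log_param(0.5454692))                 # log reports alpha = 1.725418
    print(delta_se_exp(0.5454692, 0.0899144))          # log reports Std. Err. = .1551399
    print(exp_of_log_param(0.3692402), exp_of_log_param(0.7216983))  # log: 1.446635, 2.057925
    print(sigma_u_from_lnsig2u(-1.310299))             # log reports sigma_u = .5193643
    print(wald_one_coef(0.3487832, 0.0332254))         # RE: z = 10.50, Wald chi2(1) = 110.20
    print(wald_one_coef(0.5595302, 0.0326436))         # PA robust: z = 17.14, chi2 = 293.80
```

The same transformations apply to any Stata output that reports a parameter on the log scale and then displays it exponentiated.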

.
. ********** (3) POISSON GEE (GENERALIZED ESTIMATING EQUATIONS) **********
.
. * Xtgee should reproduce Poisson random effects population averaged
. xtgee PAT LOGR $xextra, corr(exchangeable) family(poisson) link(log) i(id)

Iteration 1: tolerance = .09172122
Iteration 2: tolerance = .02686915
Iteration 3: tolerance = .00712438
Iteration 4: tolerance = .00159015
Iteration 5: tolerance = .00032104
Iteration 6: tolerance = .00006195
Iteration 7: tolerance = .00001174
Iteration 8: tolerance = 2.209e-06
Iteration 9: tolerance = 4.146e-07

GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(1)       =  16317.27
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5595302   .0043803   127.74   0.000      .550945    .5681153
       _cons |   2.067515   .0185166   111.66   0.000     2.031223    2.103807
------------------------------------------------------------------------------
. estimates store poisgee
.
. * Xtgee should reproduce Poisson random effects population averaged with robust se
. xtgee PAT LOGR $xextra, corr(exchangeable) family(poisson) link(log) i(id) robust

Iteration 1: tolerance = .09172122
Iteration 2: tolerance = .02686915
Iteration 3: tolerance = .00712438
Iteration 4: tolerance = .00159015
Iteration 5: tolerance = .00032104
Iteration 6: tolerance = .00006195
Iteration 7: tolerance = .00001174
Iteration 8: tolerance = 2.209e-06
Iteration 9: tolerance = 4.146e-07

GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(1)       =    293.80
Scale parameter:                         1      Prob > chi2        =    0.0000

                                 (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5595302   .0326436    17.14   0.000     .4955499    .6235104
       _cons |   2.067515   .1113256    18.57   0.000     1.849321    2.285709
------------------------------------------------------------------------------
. estimates store poisgeepan
.
. * Xtgee should give NLS of exponential mean with iid standard errors
. xtgee PAT LOGR $xextra, corr(independent) family(gaussian) link(log) i(id)

Iteration 1: tolerance = 8.014e-08

GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                           Gaussian                     avg =       5.0
Correlation:                   independent                     max =         5
                                                Wald chi2(1)       =   2316.87
Scale parameter:                  2060.724      Prob > chi2        =    0.0000

Pearson chi2(1730):              3565052.8      Deviance           = 3565052.8
Dispersion (Pearson):             2060.724      Dispersion         =  2060.724

------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5084673   .0105636    48.13   0.000      .487763    .5291716
       _cons |   2.528729   .0544558    46.44   0.000     2.421997     2.63546
------------------------------------------------------------------------------
. estimates store nls
.
. * Xtgee should give NLS of exponential mean with robust standard errors
. xtgee PAT LOGR $xextra, corr(independent) family(gaussian) link(log) i(id) robust

Iteration 1: tolerance = 8.014e-08

GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                           Gaussian                     avg =       5.0
Correlation:                   independent                     max =         5
                                                Wald chi2(1)       =     85.32
Scale parameter:                  2060.724      Prob > chi2        =    0.0000

Pearson chi2(1730):              3565052.8      Deviance           = 3565052.8
Dispersion (Pearson):             2060.724      Dispersion         =  2060.724

                                 (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5084673    .055046     9.24   0.000     .4005791    .6163554
       _cons |   2.528729   .2176674    11.62   0.000     2.102109    2.955349
------------------------------------------------------------------------------
. estimates store nlspan
.
. ********** (4) PANEL ROBUST STANDARD ERRORS BY BOOTSTRAP **********
.
. * For discussion of panel robust standard errors
. * see text Section 23.2.6 page 788-9 (nonlinear panel)
. * and text Section 21.2.3 page 705-8 (linear panel)
.
. * Pooled Poisson panel robust bootstrap standard errors
. set seed 10001
. bootstrap "poisson PAT LOGR $xextra" "_b[LOGR] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      poisson PAT LOGR
statistics:   _bs_1      = _b[LOGR]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      1730
                                                  N of clusters    =       346
                                                  Replications     =       500

------------------------------------------------------------------------------
    Variable |  Reps  Observed      Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .6929337  .0081667  .0473006   .6000008   .7858666   (N)
             |                                       .6250867   .8100113   (P)
             |                                       .6209522   .8025689  (BC)
       _bs_2 |   500  1.711528 -.0267995   .141745   1.433038   1.990019   (N)
             |                                       1.336657   1.924925   (P)
             |                                       1.355381   1.935691  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix poisbootse = e(se)
.
. * Poisson fixed effects panel bootstrap standard errors
. set seed 10001
. bootstrap "xtpoisson PAT LOGR $xextra, fe i(id)" "_b[LOGR]", cluster(id) reps($nreps) level(95)

command:      xtpoisson PAT LOGR , fe i(id)
statistic:    _bs_1      = _b[LOGR]

Bootstrap statistics                              Number of obs    =      1620
                                                  N of clusters    =       324
                                                  Replications     =       500

------------------------------------------------------------------------------
    Variable |  Reps  Observed      Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500 -.0377642  .0057448  .1067039  -.2474085     .17188   (N)
             |                                      -.2458792   .1454112   (P)
             |                                      -.3182177   .1310303  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix poisfebootse = e(se)
.
. * Poisson random effects panel bootstrap standard errors
. set seed 10001
. bootstrap "xtpoisson PAT LOGR $xextra, re i(id)" "_b[LOGR] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtpoisson PAT LOGR , re i(id)
statistics:   _bs_1      = _b[LOGR]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      1730
                                                  N of clusters    =       346
                                                  Replications     =       500

------------------------------------------------------------------------------
    Variable |  Reps  Observed      Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .3487832 -.1581585  .1194127   .1141695   .5833969   (N)
             |                                      -.0414326   .4028537   (P)
             |                                       .2775298   .5040658  (BC)
       _bs_2 |   500  2.312705  .5382745  .4384781   1.451214   3.174196   (N)
             |                                       2.104445   3.743506   (P)
             |                                       1.804036   2.552794  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix poisrebootse = e(se)
.
. * Poisson population averaged panel bootstrap standard errors
. set seed 10001
. bootstrap "xtpoisson PAT LOGR $xextra, pa i(id)" "_b[LOGR] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtpoisson PAT LOGR , pa i(id)
statistics:   _bs_1      = _b[LOGR]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      1730
                                                  N of clusters    =       346
                                                  Replications     =       500

------------------------------------------------------------------------------
    Variable |  Reps  Observed      Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   338  .5595301 -.0013448  .1072904   .3484868   .7705734   (N)
             |                                       .1938364   .6946551   (P)
             |                                       .0630385   .6535396  (BC)
       _bs_2 |   338  2.067515 -.0016997  .2940233   1.489163   2.645867   (N)
             |                                       1.675453   3.034075   (P)
             |                                        1.80883   3.352539  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix poispabootse = e(se)
. set seed 10001
.
. * Xtgee should give exponential mean (NLS) with iid errors with bootstrap se's
. bootstrap "xtgee PAT LOGR $xextra, corr(independent) family(gaussian) link(log) i(id)" "_b[LOGR] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtgee PAT LOGR , corr(independent) family(gaussian) link(log) i(id)
statistics:   _bs_1      = _b[LOGR]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      1730
                                                  N of clusters    =       346
                                                  Replications     =       500

------------------------------------------------------------------------------
    Variable |  Reps  Observed      Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .5084673  .0122215  .0541264   .4021235    .614811   (N)
             |                                       .4453159   .6547906   (P)
             |                                       .4372376   .6397901  (BC)
       _bs_2 |   500  2.528729 -.0502655   .198022   2.139669   2.917789   (N)
             |                                       1.953206   2.763821   (P)
             |                                       2.084754   2.820513  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. * Results given in the same order as in Table 23.1 page 794
. matrix nlsbootse = e(se)
. matrix list poisbootse

poisbootse[1,2]
          _bs_1      _bs_2
se    .04730061  .14174498

. matrix list poisfebootse

symmetric poisfebootse[1,1]
          _bs_1
se    .10670389

. matrix list poisrebootse

poisrebootse[1,2]
          _bs_1      _bs_2
se    .11941272  .43847813

. matrix list poispabootse

poispabootse[1,2]
          _bs_1      _bs_2
se    .10729042  .29402327

.
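The (N) intervals in the bootstrap tables above are consistent with observed +/- t x (bootstrap std. err.), where t is the 97.5% Student-t quantile with (Reps - 1) degrees of freedom: 499 for most rows, but 337 for the population-averaged rows, where the Reps column shows 338. The sketch below is a minimal Python check of that rule, plus an illustration of the resampling behind cluster(id): whole clusters, not single observations, are drawn with replacement and the estimator is recomputed. Several pieces are assumptions and are not taken from Stata: the Cornish-Fisher approximation to the t quantile, the OLS-slope estimator, the toy data, and all function names. Stata's random-number stream is not reproduced.

```python
import random
import statistics
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile: Cornish-Fisher expansion in 1/df around the normal quantile."""
    z = NormalDist().inv_cdf(p)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def normal_boot_ci(observed, boot_se, reps, level=0.95):
    """Observed value +/- t(reps - 1) * bootstrap s.e. (the (N) rows in the log)."""
    t = t_quantile(0.5 + level / 2, reps - 1)
    return observed - t * boot_se, observed + t * boot_se


def ols_slope(pairs):
    """Slope from OLS of y on an intercept and x, for a list of (x, y) pairs."""
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


def cluster_bootstrap_se(clusters, estimator, reps, seed):
    """Pairs cluster bootstrap: draw whole clusters with replacement, re-estimate,
    and return the standard deviation of the replicate estimates."""
    rng = random.Random(seed)
    n_clusters = len(clusters)
    replicates = []
    for _rep in range(reps):
        sample = []
        for _draw in range(n_clusters):
            sample.extend(clusters[rng.randrange(n_clusters)])
        replicates.append(estimator(sample))
    return statistics.stdev(replicates)


if __name__ == "__main__":
    # Pooled Poisson row: log reports (N) interval .6000008 to .7858666
    print(normal_boot_ci(0.6929337, 0.0473006, reps=500))
    # Poisson-PA row with Reps = 338: log reports .3484868 to .7705734
    print(normal_boot_ci(0.5595301, 0.1072904, reps=338))
    # Hypothetical toy panel: four clusters of (x, y) pairs
    toy = [[(0.0, 1.0), (1.0, 2.5)], [(0.0, 0.5), (2.0, 4.0)],
           [(1.0, 1.0), (3.0, 5.5)], [(0.5, 1.5), (2.5, 3.0)]]
    print(cluster_bootstrap_se(toy, ols_slope, reps=200, seed=1))
```

Resampling whole clusters keeps any within-firm correlation intact in each bootstrap sample. That is why the resulting standard errors are panel robust.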

. ********** DISPLAY RESULTS FOR (1)-(3) GIVEN IN TABLE 23.1 page 794 **********
.
. * Standard error using iid errors and in some cases panel
.
. estimates table linolspan linfe linre, t se /*
> */ stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

--------------------------------------------------
    Variable |   linolspan       linfe       linre
-------------+------------------------------------
        LOGR |       0.834       0.107       0.720
             |       0.023       0.056       0.024
             |       36.48        1.91       30.26
       _cons |       0.795       1.709       0.938
             |       0.058       0.071       0.060
             |       13.73       23.92       15.65
-------------+------------------------------------
           N |    1730.000    1730.000    1730.000
          ll |   -2531.658   -1100.267
          r2 |       0.719       0.003
         tss |                6732.584
         rss |    1890.831     361.400
         mss |    4841.753       0.948
        rmse |       1.046       0.511
        df_r |     345.000    1383.000
--------------------------------------------------
                                    legend: b/se/t
. estimates table poisiid poishet poispan, t se /*
> */ stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

--------------------------------------------------
    Variable |     poisiid     poishet     poispan
-------------+------------------------------------
        LOGR |       0.693       0.693       0.693
             |       0.002       0.020       0.043
             |      308.61       34.98       16.10
       _cons |       1.712       1.712       1.712
             |       0.010       0.062       0.134
             |      175.24       27.60       12.77
-------------+------------------------------------
           N |    1730.000    1730.000    1730.000
          ll |  -21030.583  -21030.583  -21030.583
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
--------------------------------------------------
                                    legend: b/se/t
. estimates table poisfe poisre poisrenormal poispa poispapan, t se /*
> */ stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

--------------------------------------------------------------------------
    Variable |      poisfe      poisre  poisreno~l      poispa   poispapan
-------------+------------------------------------------------------------
PAT          |
        LOGR |      -0.038       0.349       0.816
             |       0.033       0.033       0.000
             |       -1.16       10.50           .
       _cons |                   2.313       1.156
             |                   0.125       0.000
             |                   18.54           .
-------------+------------------------------------------------------------
lnalpha      |
       _cons |                   0.545
             |                   0.090
             |                    6.07
-------------+------------------------------------------------------------
lnsig2u      |
       _cons |                              -1.310
             |                               0.000
             |                                   .
-------------+------------------------------------------------------------
_            |
        LOGR |                                           0.560       0.560
             |                                           0.004       0.033
             |                                          127.74       17.14
       _cons |                                           2.068       2.068
             |                                           0.019       0.111
             |                                          111.66       18.57
-------------+------------------------------------------------------------
Statistics   |
           N |    1620.000    1730.000    1730.000    1730.000    1730.000
          ll |   -3659.593   -5553.179   -6261.982
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
--------------------------------------------------------------------------
                                                            legend: b/se/t
. estimates table poisgee poisgeepan nls nlspan, t se /*
> */ stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

--------------------------------------------------------------
    Variable |     poisgee  poisgeepan         nls      nlspan
-------------+------------------------------------------------
        LOGR |       0.560       0.560       0.508       0.508
             |       0.004       0.033       0.011       0.055
             |      127.74       17.14       48.13        9.24
       _cons |       2.068       2.068       2.529       2.529
             |       0.019       0.111       0.054       0.218
             |      111.66       18.57       46.44       11.62
-------------+------------------------------------------------
           N |    1730.000    1730.000    1730.000    1730.000
          ll |
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
--------------------------------------------------------------
                                                legend: b/se/t
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section5\mma23p1pannonlin.txt
  log type:  text
 closed on:  23 May 2005, 12:53:45
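In the estimates tables above, the Poisson fixed-effects column (poisfe) uses 1620 observations, and its cluster bootstrap used 324 clusters. Every other estimator uses 1730 observations on 346 firms. The 110 missing observations (22 firms x 5 years) are consistent with dropping firms whose PAT is zero in every year, because such firms contribute nothing to the conditional likelihood. The sketch below is a minimal Python illustration of the Poisson fixed-effects estimator for one regressor. It solves the first-order condition sum_i sum_t x_it (y_it - ybar_i lambda_it / lambdabar_i) = 0, with lambda_it = exp(x_it b), by bisection. This is not Stata's algorithm, and the function names and toy data are hypothetical.

```python
import math


def poisson_fe_score(beta, groups):
    """Score of the Poisson fixed-effects (conditional) log-likelihood, scalar regressor.

    groups: list of (xs, ys) per panel unit. For each unit the contribution is
    sum_t y_t * x_t - (sum_t y_t) * (mean of x weighted by exp(beta * x_t)).
    """
    score = 0.0
    for xs, ys in groups:
        total_y = sum(ys)
        if total_y == 0:
            continue  # an all-zero unit adds nothing, so it is effectively dropped
        shift = max(beta * x for x in xs)  # guards exp() against overflow
        weights = [math.exp(beta * x - shift) for x in xs]
        xbar_w = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        score += sum(x * y for x, y in zip(xs, ys)) - total_y * xbar_w
    return score


def poisson_fe_beta(groups, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve score(beta) = 0 by bisection; the conditional log-likelihood is concave,
    so the score is non-increasing in beta."""
    if poisson_fe_score(lo, groups) < 0 or poisson_fe_score(hi, groups) > 0:
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_fe_score(mid, groups) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical toy panel built so that y = exp(alpha_i + 0.4 * x) exactly
    xs1, xs2 = [0.0, 1.0, 2.0], [1.0, 0.5, 3.0]
    groups = [
        (xs1, [math.exp(0.2 + 0.4 * x) for x in xs1]),
        (xs2, [math.exp(-1.0 + 0.4 * x) for x in xs2]),
        ([0.0, 0.0, 1.0], [0, 0, 0]),  # all-zero unit, like a firm with PAT = 0 every year
    ]
    print(poisson_fe_beta(groups))  # close to 0.4
```

Bisection is used only because it is robust and easy to verify with one parameter. With several regressors, Newton-Raphson on the same concave objective would be the usual choice.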

------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section6\mma24p1olscluster.txt
  log type:  text
 opened on:  24 May 2005, 14:33:58
.
. ********** OVERVIEW OF MMA24P1OLSCLUSTER.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 24.7 pages 848-53 Table 24.4
. * Cluster robust inference for OLS cross-section application using
. * Vietnam Living Standard Survey data
.
. * (0) Descriptive Statistics (Table 24.3 first half)
. * (1) Linear regression (in logs) with household data (Table 24.4)
.
. * For Tables 24.5-6 for clustered count data see MMA24P2POISCLUSTER.DO
.
. * The cluster effects model is
. * y_it = x_it'b + a_i + e_it
. * Default xtreg output assumes e_it is iid.
. * This is usually too strong an assumption.
. * Instead one should obtain cluster-robust standard errors after xtreg
. * See Section 21.2.3 pages 709-12
. * Stata Version 8 does not do this but Stata version 9 does.
. * Here we do a panel bootstrap - results not reported in the text
.
. * To speed up programs reduce breps - the number of bootstrap reps
.
. * To run this program you need data set
. * vietnam_ex1.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono   /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * The data come from the World Bank 1997 Vietnam Living Standards Survey
. * A subset was used in chapter 4.6.4.
. * The larger sample here is described on pages 848-9
.
. * The data are HOUSEHOLD data
. * There are N=5006 households in 194 clusters
.
. * The separate data set vietnam_ex2.dta has individual-level data
.
. ********** READ IN HOUSEHOLD DATA and SUMMARIZE (Table 24.3) **********
.
. use vietnam_ex1.dta
. desc

Contains data from vietnam_ex1.dta
  obs:         5,999
 vars:             8                          11 Apr 2005 12:39
 size:       185,969 (98.2% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             byte   %8.0g                  Gender of HH.head (1:M;2:F)
age             int    %8.0g                  Age of household head
comped98        float  %9.0g       diploma    completed diploma HH.head
farm            float  %9.0g       loaiho     Type of HH (1:farm; 0:nonfarm)
hhsize          long   %12.0g                 Household size
commune         float  %9.0g                  commune code PSU-SVY commands
lhhexp1         float  %9.0g
lhhex12m        float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |      5999    1.270712    .4443645          1          2
         age |      5999    48.01284     13.7702         16         95
    comped98 |      5999    3.385564    2.037543          0          9
        farm |      5999    .5730955    .4946694          0          1
      hhsize |      5999    4.752292    1.954292          1         19
-------------+--------------------------------------------------------
     commune |      5999    98.26588    56.00461          1        194
     lhhexp1 |      5999    9.341561    .6877458   6.543108   12.20242
    lhhex12m |      5006    6.310585    1.593083          0   12.36325
.
. rename sex SEX
. rename age AGE
. rename comped98 EDUC
. rename farm FARM
. rename hhsize HHSIZE
. rename commune COMMUNE
. rename lhhexp1 LNHHEXP
. rename lhhex12m LNEXP12M
. gen HHEXP = exp(LNHHEXP)
.
. * The following should give the same descriptive statistics
. * as in the top half (Household) of Table 24.3 p.850
. * There are some differences, however, and here FARM is used rather than URBAN
. sum LNEXP12M AGE SEX HHSIZE FARM EDUC HHEXP LNHHEXP COMMUNE

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    LNEXP12M |      5006    6.310585    1.593083          0   12.36325
         AGE |      5999    48.01284     13.7702         16         95
         SEX |      5999    1.270712    .4443645          1          2
      HHSIZE |      5999    4.752292    1.954292          1         19
        FARM |      5999    .5730955    .4946694          0          1
-------------+--------------------------------------------------------
        EDUC |      5999    3.385564    2.037543          0          9
       HHEXP |      5999    14599.23    12582.31   694.4419     199271
     LNHHEXP |      5999    9.341561    .6877458   6.543108   12.20242
     COMMUNE |      5999    98.26588    56.00461          1        194
.
. * Write data to a text (ascii) file so that it can be used with programs other than Stata
. * Note that LNEXP12M has some missing values coded as .
. outfile LNEXP12M AGE SEX HHSIZE FARM EDUC LNHHEXP COMMUNE /*
> */ using vietnam_ex1.asc, replace
.
. ********** ANALYSIS: CLUSTER ANALYSIS FOR LINEAR MODEL [Table 24.4 p.851] **********
.
. * Regressor list for the linear regressions
. global XLISTLINEAR LNHHEXP AGE SEX HHSIZE FARM EDUC
.
. * OLS with usual standard errors (Table 24.4 columns 1-2)
. regress LNEXP12M $XLISTLINEAR

      Source |       SS       df       MS              Number of obs =    5006
-------------+------------------------------           F(  6,  4999) =   82.02
       Model |  1138.38332     6  189.730553           Prob > F      =  0.0000
    Residual |   11563.877  4999  2.31323805           R-squared     =  0.0896
-------------+------------------------------           Adj R-squared =  0.0885
       Total |  12702.2603  5005  2.53791415           Root MSE      =  1.5209

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6702328   .0418711    16.01   0.000     .5881472    .7523185
         AGE |   .0105766   .0016554     6.39   0.000     .0073312     .013822
         SEX |    .097444   .0518961     1.88   0.060    -.0042952    .1991832
      HHSIZE |   .0289812   .0132524     2.19   0.029     .0030007    .0549617
        FARM |   .1346891   .0493325     2.73   0.006     .0379757    .2314025
        EDUC |  -.0903599   .0122803    -7.36   0.000    -.1144346   -.0662852
       _cons |  -.5107135   .3799642    -1.34   0.179     -1.25561     .234183
------------------------------------------------------------------------------
. estimates store olsiid
.
. * OLS with heteroskedastic-robust standard errors (Table 24.4 column 3)
. regress LNEXP12M $XLISTLINEAR, robust

Regression with robust standard errors                 Number of obs =    5006
                                                       F(  6,  4999) =   80.80
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0896
                                                       Root MSE      =  1.5209

------------------------------------------------------------------------------
             |               Robust
    LNEXP12M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6702328   .0425223    15.76   0.000     .5868705    .7535952
         AGE |   .0105766   .0016634     6.36   0.000     .0073157    .0138376
         SEX |    .097444   .0519606     1.88   0.061    -.0044217    .1993096
      HHSIZE |   .0289812   .0134698     2.15   0.031     .0025744     .055388
        FARM |   .1346891   .0494286     2.72   0.006     .0377873    .2315908
        EDUC |  -.0903599   .0127869    -7.07   0.000    -.1154278   -.0652919
       _cons |  -.5107135   .3812665    -1.34   0.180    -1.258163    .2367362
------------------------------------------------------------------------------
. estimates store olshet
.
. * OLS with cluster-robust standard errors (Table 24.4 column 4)
. regress LNEXP12M $XLISTLINEAR, cluster(COMMUNE)

Regression with robust standard errors                 Number of obs =    5006
                                                       F(  6,   193) =   54.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0896
Number of clusters (COMMUNE) = 194                     Root MSE      =  1.5209

------------------------------------------------------------------------------
             |               Robust
    LNEXP12M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6702328   .0528536    12.68   0.000      .565988    .7744777
         AGE |   .0105766   .0019371     5.46   0.000     .0067561    .0143972
         SEX |    .097444   .0595084     1.64   0.103    -.0199263    .2148142
      HHSIZE |   .0289812   .0153602     1.89   0.061    -.0013142    .0592766
        FARM |   .1346891   .0608046     2.22   0.028     .0147622    .2546159
        EDUC |  -.0903599   .0149743    -6.03   0.000    -.1198942   -.0608255
       _cons |  -.5107135   .4706163    -1.09   0.279    -1.438925    .4174979
------------------------------------------------------------------------------
. estimates store olsclust
.
. * Random effects estimation (FGLS) (Table 24.4 columns 5-6)
. * This uses the xtreg command which first requires identifying the cluster
. iis COMMUNE
. xtreg LNEXP12M $XLISTLINEAR, re

Random-effects GLS regression                   Number of obs      =      5006
Group variable (i): COMMUNE                     Number of groups   =       194

R-sq:  within  = 0.0518                         Obs per group: min =         1
       between = 0.2884                                        avg =      25.8
       overall = 0.0883                                        max =        39

Random effects u_i ~ Gaussian                   Wald chi2(6)       =    335.12
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6268899   .0468004    13.39   0.000     .5351627     .718617
         AGE |   .0112334   .0016411     6.85   0.000      .008017    .0144499
         SEX |   .1069915   .0511849     2.09   0.037     .0066709    .2073121
      HHSIZE |   .0158302   .0135166     1.17   0.242    -.0106618    .0423222
        FARM |   .0928509   .0549544     1.69   0.091    -.0148578    .2005595
        EDUC |  -.0638447   .0129744    -4.92   0.000    -.0892741   -.0384153
       _cons |  -.1660698   .4202027    -0.40   0.693     -.989652    .6575123
-------------+----------------------------------------------------------------
     sigma_u |  .46739871
     sigma_e |  1.4526468
         rho |  .09381491   (fraction of variance due to u_i)
------------------------------------------------------------------------------
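The cluster(COMMUNE) standard errors in the OLS output above come from a sandwich formula. Scores are summed within each commune before the middle matrix is formed, so arbitrary correlation within a commune is allowed. The sketch below implements that formula in plain Python for a regression on an intercept and one regressor. It uses the finite-sample factor G/(G-1) x (N-1)/(N-K), commonly described as the one used by regress, cluster(); treat that scaling as an assumption rather than a statement of Stata internals. When every observation is its own cluster, the formula collapses to the heteroskedasticity-robust (HC1) one, and the test below uses that fact. The data and function names are hypothetical.

```python
import math
from collections import defaultdict


def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def ols_cluster_se(x, y, cluster):
    """OLS of y on (1, x). Returns (coefficients, cluster-robust standard errors).

    V = c * (X'X)^-1 [sum_g s_g s_g'] (X'X)^-1, where s_g = sum_{i in g} (1, x_i)' u_i
    and c = G/(G-1) * (N-1)/(N-K) is the assumed finite-sample factor.
    """
    n, k = len(x), 2
    xtx = [[float(n), sum(x)], [sum(x), sum(v * v for v in x)]]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    xtx_inv = [[xtx[1][1] / det, -xtx[0][1] / det],
               [-xtx[1][0] / det, xtx[0][0] / det]]
    xty = [[sum(y)], [sum(xi * yi for xi, yi in zip(x, y))]]
    b = [row[0] for row in matmul(xtx_inv, xty)]
    resid = [yi - b[0] - b[1] * xi for xi, yi in zip(x, y)]

    # Sum the scores (1, x_i)' * u_i within each cluster
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, ui, gi in zip(x, resid, cluster):
        scores[gi][0] += ui
        scores[gi][1] += xi * ui
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(k)] for i in range(k)]

    g = len(scores)
    c = g / (g - 1) * (n - 1) / (n - k)
    v = matmul(matmul(xtx_inv, meat), xtx_inv)
    return b, [math.sqrt(c * v[i][i]) for i in range(k)]


if __name__ == "__main__":
    # Hypothetical data: six observations in three clusters
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]
    print(ols_cluster_se(x, y, [0, 0, 1, 1, 2, 2]))
```

The coefficients are the same as plain OLS. Only the variance estimate changes, which is exactly the pattern in the olsiid, olshet and olsclust columns of Table 24.4.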

. estimates store refgls
.
. * Note that one can cluster bootstrap, if desired, to get more robust standard errors
. * This is done at the end of the program
.
. * Fixed effects (within) estimation (Table 24.4 columns 7-8)
. xtreg LNEXP12M $XLISTLINEAR, fe

Fixed-effects (within) regression               Number of obs      =      5006
Group variable (i): COMMUNE                     Number of groups   =       194

R-sq:  within  = 0.0520                         Obs per group: min =         1
       between = 0.2787                                        avg =      25.8
       overall = 0.0865                                        max =        39

                                                F(6,4806)          =     43.92
corr(u_i, Xb)  = 0.0797                         Prob > F           =    0.0000

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6037139   .0520178    11.61   0.000     .5017352    .7056926
         AGE |   .0115845   .0016706     6.93   0.000     .0083092    .0148597
         SEX |    .112821   .0520014     2.17   0.030     .0108745    .2147675
      HHSIZE |   .0107124   .0141127     0.76   0.448     -.016955    .0383797
        FARM |   .0693037   .0609002     1.14   0.255    -.0500885    .1886959
        EDUC |  -.0510325   .0135817    -3.76   0.000    -.0776588   -.0244062
       _cons |   .0361552    .461482     0.08   0.938    -.8685606    .9408711
-------------+----------------------------------------------------------------
     sigma_u |  .57732514
     sigma_e |  1.4526468
         rho |  .13640519   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(193, 4806) =     3.49             Prob > F = 0.0000
. estimates store fe
.
. * Note that one can cluster bootstrap, if desired, to get more robust standard errors
. * This is done at the end of the program
.
. * Random effects estimation by MLE assuming normality (Table 24.4 columns 5-6)
. * This uses the xtreg command which first requires identifying the cluster
. iis COMMUNE
. xtreg LNEXP12M $XLISTLINEAR, mle

Fitting constant-only model:
Iteration 0:   log likelihood = -9262.6182
Iteration 1:   log likelihood = -9252.6974
Iteration 2:   log likelihood = -9252.1542
Iteration 3:   log likelihood = -9252.1493

Fitting full model:
Iteration 0:   log likelihood = -9096.5264
Iteration 1:   log likelihood = -9092.5585
Iteration 2:   log likelihood = -9092.5546

Random-effects ML regression                    Number of obs      =      5006
Group variable (i): COMMUNE                     Number of groups   =       194

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =      25.8
                                                               max =        39

                                                LR chi2(6)         =    319.19
Log likelihood  = -9092.5546                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6276456   .0467072    13.44   0.000      .536101    .7191901
         AGE |     .01122   .0016406     6.84   0.000     .0080045    .0144354
         SEX |   .1067788   .0511618     2.09   0.037     .0065035     .207054
      HHSIZE |     .01603   .0135121     1.19   0.235    -.0104533    .0425133
        FARM |   .0936529   .0548379     1.71   0.088    -.0138274    .2011332
        EDUC |  -.0643046   .0130222    -4.94   0.000    -.0898277   -.0387816
       _cons |  -.1718111   .4192856    -0.41   0.682    -.9935959    .6499737
-------------+----------------------------------------------------------------
    /sigma_u |    .455472   .0329742    13.81   0.000     .3908438    .5201002
    /sigma_e |   1.452303   .0148092    98.07   0.000     1.423278    1.481329
-------------+----------------------------------------------------------------
         rho |   .0895499   .0120221                      .0682208    .1154799
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=   212.57 Prob>=chibar2 = 0.000
. estimates store remle
.
. * Test of the RE specification using Breusch-Pagan test
. * This is the statistic in the third-from-bottom row of Table 24.4
. quietly xtreg LNEXP12M $XLISTLINEAR, re
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects:

        LNEXP12M[COMMUNE,t] = Xb + u[COMMUNE] + e[COMMUNE,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                LNEXP12M |   2.537914       1.593083
                       e |   2.110183       1.452647
                       u |   .2184615       .4673987

        Test:   Var(u) = 0
                              chi2(1) =   432.75
                          Prob > chi2 =     0.0000
.
. * Hausman test of FE vs. RE specification
. * This test is not a robust version.
. * Its validity assumes that errors are iid after including the COMMUNE-specific effect
. * For this example this may be reasonable, as cluster bootstrap se's are close to the usual se's
. xthausman
(Warning: xthausman is no longer a supported command; use -hausman-.  For instructions, see help hausman.)

                 Hausman specification test

                 ---- Coefficients ----
             |      Fixed       Random
    LNEXP12M |    Effects      Effects       Difference
-------------+-----------------------------------------
     LNHHEXP |   .6037139     .6268899        -.0231759
         AGE |   .0115845     .0112334          .000351
         SEX |    .112821     .1069915         .0058295
      HHSIZE |   .0107124     .0158302        -.0051179
        FARM |   .0693037     .0928509        -.0235472
        EDUC |  -.0510325    -.0638447         .0128122

    Test:  Ho:  difference in coefficients not systematic

                 chi2(  6) = (b-B)'[S^(-1)](b-B), S = (S_fe - S_re)
                           =    17.89
               Prob>chi2 =      0.0065
.
. * Alternative GLS estimation using the GEE approach
. * Same as xtgee with family(gaussian) link(id) corr(exchangeable)
. * So GLS with equicorrelated errors
. xtreg LNEXP12M $XLISTLINEAR, pa

Iteration 1: tolerance = .21691897
Iteration 2: tolerance = .00610852
Iteration 3: tolerance = .00014606
Iteration 4: tolerance = 3.479e-06
Iteration 5: tolerance = 8.285e-08

GEE population-averaged model                   Number of obs      =      5006
Group variable:                    COMMUNE      Number of groups   =       194
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =      25.8
Correlation:                  exchangeable                     max =        39
                                                Wald chi2(6)       =    338.97
Scale parameter:                  2.314413      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6281447   .0466076    13.48   0.000     .5367955     .719494
         AGE |   .0112111   .0016411     6.83   0.000     .0079946    .0144275
         SEX |   .1066389   .0511914     2.08   0.037     .0063056    .2069722
      HHSIZE |   .0161625    .013502     1.20   0.231    -.0103009    .0426259
        FARM |   .0941811   .0547349     1.72   0.085    -.0130973    .2014594
        EDUC |  -.0646085   .0129528    -4.99   0.000    -.0899956   -.0392215
       _cons |  -.1756087   .4185566    -0.42   0.675    -.9959645    .6447472
------------------------------------------------------------------------------
. estimates store pa
.
. ********** DISPLAY TABLE 24.4 RESULTS page 851 **********
.
. estimates table olsiid olshet olsclust, /*
> */ b(%10.3f) t(%10.2f) stats(r2 N)

----------------------------------------------------
    Variable |   olsiid       olshet      olsclust
-------------+--------------------------------------
     LNHHEXP |      0.670        0.670        0.670
             |      16.01        15.76        12.68
         AGE |      0.011        0.011        0.011
             |       6.39         6.36         5.46
         SEX |      0.097        0.097        0.097
             |       1.88         1.88         1.64
      HHSIZE |      0.029        0.029        0.029
             |       2.19         2.15         1.89
        FARM |      0.135        0.135        0.135
             |       2.73         2.72         2.22
        EDUC |     -0.090       -0.090       -0.090
             |      -7.36        -7.07        -6.03
       _cons |     -0.511       -0.511       -0.511
             |      -1.34        -1.34        -1.09
-------------+--------------------------------------
          r2 |      0.090        0.090        0.090
           N |   5006.000     5006.000     5006.000
----------------------------------------------------
                                        legend: b/t
. estimates table pa fe refgls remle, /*
> */ b(%10.3f) t(%10.2f) stats(r2 N)

-----------------------------------------------------------------
    Variable |      pa           fe         refgls       remle
-------------+---------------------------------------------------
_            |
     LNHHEXP |      0.628        0.604        0.627
             |      13.48        11.61        13.39
         AGE |      0.011        0.012        0.011
             |       6.83         6.93         6.85
         SEX |      0.107        0.113        0.107
             |       2.08         2.17         2.09
      HHSIZE |      0.016        0.011        0.016
             |       1.20         0.76         1.17
        FARM |      0.094        0.069        0.093
             |       1.72         1.14         1.69
        EDUC |     -0.065       -0.051       -0.064
             |      -4.99        -3.76        -4.92
       _cons |     -0.176        0.036       -0.166
             |      -0.42         0.08        -0.40
-------------+---------------------------------------------------
LNEXP12M     |
     LNHHEXP |                                            0.628
             |                                            13.44
         AGE |                                            0.011
             |                                             6.84
         SEX |                                            0.107
             |                                             2.09
      HHSIZE |                                            0.016
             |                                             1.19
        FARM |                                            0.094
             |                                             1.71
        EDUC |                                           -0.064
             |                                            -4.94
       _cons |                                           -0.172
             |                                            -0.41
-------------+---------------------------------------------------
sigma_u      |
       _cons |                                            0.455
             |                                            13.81
-------------+---------------------------------------------------
sigma_e      |
       _cons |                                            1.452
             |                                            98.07
-------------+---------------------------------------------------
Statistics   |
          r2 |                   0.052
           N |   5006.000     5006.000     5006.000     5006.000
-----------------------------------------------------------------
                                                     legend: b/t
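The Hausman statistic reported at the top of this output compares the fixed- and random-effects coefficient vectors using S = S_fe - S_re. The log shows only the coefficients, not the covariance matrices, so the sketch below (assuming NumPy) checks the formula on a small hypothetical two-coefficient example and then recomputes only the p-value for the reported chi2(6) = 17.89. Using a pseudo-inverse and taking df as the rank of S are choices made for this sketch, not a description of Stata's internals.

```python
import math

import numpy as np


def hausman(b_fe, b_re, V_fe, V_re):
    """Return (H, df) for H = (b_fe - b_re)' (V_fe - V_re)^+ (b_fe - b_re).

    A Moore-Penrose pseudo-inverse is used because V_fe - V_re need not be
    positive definite in finite samples; df is the rank of that difference.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    S = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    H = float(d @ np.linalg.pinv(S) @ d)
    return H, int(np.linalg.matrix_rank(S))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x), closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total


# Hypothetical two-coefficient example (not the Vietnam data).
H, df = hausman([1.0, 2.0], [0.5, 2.5],
                np.diag([0.50, 0.30]), np.diag([0.25, 0.05]))
print(round(H, 6), df)                        # 2.0 2
# p-value for the statistic reported in the log: chi2(6) = 17.89
print(round(chi2_sf_even_df(17.89, 6), 4))    # 0.0065
```

The closed form applies only to an even number of degrees of freedom, which covers the six coefficients compared here.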

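Each t entry in the `estimates table` output (legend: b/t) is the coefficient divided by its standard error. The sketch below assumes NumPy. It recomputes three of the `pa` t-statistics from the GEE coefficient table and checks that a diagonal-matrix construction mirroring vecdiag(diag(b)*syminv(diag(se))), which this log uses later because Stata 8 lacks element-by-element division, gives the same ratios.

```python
import numpy as np

# GEE (pa) estimates copied from the output above: LNHHEXP, AGE, EDUC
b = np.array([0.6281447, 0.0112111, -0.0646085])
se = np.array([0.0466076, 0.0016411, 0.0129528])

t_direct = b / se  # element-by-element division

# Matrix route: diag(b) times the inverse of diag(se) is diagonal with
# entries b_j / se_j, so its diagonal recovers the t-ratios.
t_matrix = np.diag(np.diag(b) @ np.linalg.inv(np.diag(se)))

# Both round to 13.48, 6.83 and -4.99, matching the pa column of the table.
print(np.round(t_direct, 2), np.round(t_matrix, 2))
```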

.
. ********** ADDITIONALLY DO CLUSTER BOOTSTRAPS **********
.
. * These results not given in the text
.
. global breps = 500
.
. * Note that can bootstrap if desired to get more robust standard errors
. * The first reproduces reg , cluster(COMMUNE)
. bootstrap "reg LNEXP12M $XLISTLINEAR" _b, cluster(COMMUNE) reps($breps) level(95)

command:      reg LNEXP12M LNHHEXP AGE SEX HHSIZE FARM EDUC
statistics:   b_LNHHEXP  = _b[LNHHEXP]
              b_AGE      = _b[AGE]
              b_SEX      = _b[SEX]
              b_HHSIZE   = _b[HHSIZE]
              b_FARM     = _b[FARM]
              b_EDUC     = _b[EDUC]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =      5006
                                                  N of clusters    =       194
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |   500  .6702328   .0000939   .0546562   .5628482   .7776175   (N)
             |                                         .5575338   .7715588   (P)
             |                                         .5502583   .7638555  (BC)
       b_AGE |   500  .0105766   .0000108   .0019538   .0067379   .0144154   (N)
             |                                         .0067395   .0143774   (P)
             |                                          .006576   .0141968  (BC)
       b_SEX |   500   .097444  -.0023301   .0602315  -.0208945   .2157825   (N)
             |                                        -.0210348   .2196117   (P)
             |                                        -.0261246   .2083439  (BC)
    b_HHSIZE |   500  .0289812  -.0008009   .0160043  -.0024629   .0604252   (N)
             |                                        -.0004838   .0628019   (P)
             |                                         .0028144   .0662394  (BC)
      b_FARM |   500  .1346891   .0026611   .0560327   .0245999   .2447782   (N)
             |                                         .0293473   .2510255   (P)
             |                                         .0202142   .2483591  (BC)
      b_EDUC |   500 -.0903599    -.00006    .014992   -.119815  -.0609047   (N)
             |                                        -.1205786  -.0618314   (P)
             |                                        -.1204532  -.0615499  (BC)
      b_cons |   500 -.5107135   .0044955   .4893788   -1.47221   .4507834   (N)
             |                                        -1.435498   .4444398   (P)
             |                                        -1.388972   .4859312  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * The t-statistic vector is e(b)./e(se) where ./ is elt. by elt. division
. * But Stata Version 8 does not do ./ so instead need the following
. matrix tols = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tols, format(%10.2f)

tols[7,1]
                   r1
 b_LNHHEXP      12.26
     b_AGE       5.41
     b_SEX       1.62
  b_HHSIZE       1.81
    b_FARM       2.40
    b_EDUC      -6.03
    b_cons      -1.04
.
. * The next two reproduce xtreg , cluster(COMMUNE)
. * but the cluster option for xtreg is not available for Stata version 8
.
. * For this example the cluster bootstrap se's are within 10 percent
. * of the usual xtreg se's, so usual se's may be okay here
.
. * Fixed effects estimator
. bootstrap "xtreg LNEXP12M $XLISTLINEAR, fe" _b, cluster(COMMUNE) reps($breps) level(95)

command:      xtreg LNEXP12M LNHHEXP AGE SEX HHSIZE FARM EDUC , fe
statistics:   b_LNHHEXP  = _b[LNHHEXP]
              b_AGE      = _b[AGE]
              b_SEX      = _b[SEX]
              b_HHSIZE   = _b[HHSIZE]
              b_FARM     = _b[FARM]
              b_EDUC     = _b[EDUC]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =      5006
                                                  N of clusters    =       194
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |   500  .6037139  -.0006143   .0583525   .4890671   .7183608   (N)
             |                                         .4852716   .7172067   (P)
             |                                         .4841806   .7148217  (BC)
       b_AGE |   500  .0115845   5.02e-06   .0017464   .0081532   .0150157   (N)
             |                                         .0082637   .0151613   (P)
             |                                         .0084701   .0152766  (BC)
       b_SEX |   500   .112821  -.0017372   .0546362   .0054756   .2201664   (N)
             |                                         .0129603   .2214846   (P)
             |                                          .017047    .235448  (BC)
    b_HHSIZE |   500  .0107124  -.0004379   .0150286  -.0188148   .0402395   (N)
             |                                        -.0195233   .0415316   (P)
             |                                        -.0184428    .044119  (BC)
      b_FARM |   500  .0693037  -.0010067   .0497627  -.0284666    .167074   (N)
             |                                        -.0291446   .1679352   (P)
             |                                        -.0259051   .1705921  (BC)
      b_EDUC |   500 -.0510325   .0003307   .0153224   -.081137   -.020928   (N)
             |                                        -.0818133  -.0219096   (P)
             |                                        -.0844261  -.0230367  (BC)
      b_cons |   500  .0361552   .0087515   .5186644  -.9828799    1.05519   (N)
             |                                         -.934128   1.087458   (P)
             |                                         -.934128   1.087458  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix tfe = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tfe, format(%10.2f)

tfe[7,1]
                   r1
 b_LNHHEXP      10.35
     b_AGE       6.63
     b_SEX       2.06
  b_HHSIZE       0.71
    b_FARM       1.39
    b_EDUC      -3.33
    b_cons       0.07
.
. * Random effects estimator
. bootstrap "xtreg LNEXP12M $XLISTLINEAR, re" _b, cluster(COMMUNE) reps($breps) level(95)

command:      xtreg LNEXP12M LNHHEXP AGE SEX HHSIZE FARM EDUC , re
statistics:   b_LNHHEXP  = _b[LNHHEXP]
              b_AGE      = _b[AGE]
              b_SEX      = _b[SEX]
              b_HHSIZE   = _b[HHSIZE]
              b_FARM     = _b[FARM]
              b_EDUC     = _b[EDUC]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =      5006
                                                  N of clusters    =       194
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |   500  .6268899  -.0079169   .0486878   .5312314   .7225483   (N)
             |                                         .5261016   .7155449   (P)
             |                                          .540477   .7254891  (BC)
       b_AGE |   500  .0112334   .0001211   .0017668   .0077622   .0147047   (N)
             |                                         .0080698   .0152565   (P)
             |                                         .0077655   .0147142  (BC)
       b_SEX |   500  .1069915   .0058127   .0561182  -.0032656   .2172486   (N)
             |                                         .0046711   .2187323   (P)
             |                                        -.0109273   .2045939  (BC)
    b_HHSIZE |   500  .0158302  -.0014562   .0146506  -.0129543   .0446147   (N)
             |                                         -.017179   .0459636   (P)
             |                                        -.0108163   .0482198  (BC)
      b_FARM |   500  .0928509  -.0071707   .0442312   .0059485   .1797532   (N)
             |                                        -.0014455   .1728321   (P)
             |                                         .0053411   .1906732  (BC)
      b_EDUC |   500 -.0638447   .0049481    .014058  -.0914648  -.0362246   (N)
             |                                        -.0871102   -.029496   (P)
             |                                         -.094956  -.0407984  (BC)
      b_cons |   500 -.1660698   .0535286   .4305953  -1.012073   .6799335   (N)
             |                                        -.8970464   .6892154   (P)
             |                                        -.9512222   .6032417  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix tre = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tre, format(%10.2f)

tre[7,1]
                   r1
 b_LNHHEXP      12.88
     b_AGE       6.36
     b_SEX       1.91
  b_HHSIZE       1.08
    b_FARM       2.10
    b_EDUC      -4.54
    b_cons      -0.39
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section6\mma24p1olscluster.txt
  log type:  text
 closed on:  24 May 2005, 14:44:12
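The normal-based (N) bootstrap intervals above are centred on the observed estimate, so the critical value actually used can be backed out as (observed - lower)/se. Applied to two rows of the `reg` cluster bootstrap, this standard-library Python check gives about 1.9647 rather than 1.96. That value appears consistent with a Student-t quantile with reps - 1 = 499 degrees of freedom. This is our reading of the numbers, not something the log states. The t quantile is approximated with the first two correction terms of the Abramowitz and Stegun 26.7.5 expansion, which is an assumption of this sketch.

```python
from statistics import NormalDist


def implied_critical_value(observed, se, lower):
    """Critical value c such that lower = observed - c * se."""
    return (observed - lower) / se


def t_quantile_approx(p, df):
    """Approximate Student-t quantile from the normal quantile z using the
    first two correction terms of the A&S 26.7.5 expansion."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


# (observed, bootstrap se, lower (N) limit) from the reg cluster bootstrap, 500 reps
rows = {
    "b_LNHHEXP": (0.6702328, 0.0546562, 0.5628482),
    "b_EDUC": (-0.0903599, 0.014992, -0.119815),
}
z_975 = NormalDist().inv_cdf(0.975)
t_975 = t_quantile_approx(0.975, 500 - 1)
implied = {k: implied_critical_value(*v) for k, v in rows.items()}
for name, c in implied.items():
    print(f"{name}: implied {c:.4f}, normal {z_975:.4f}, t(499) {t_975:.4f}")
```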

------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section6\mma24p2poiscluster.txt
  log type:  text
 opened on:  24 May 2005, 16:35:22
.
. ********** OVERVIEW OF MMA24P2POISCLUSTER.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 24.7 pages 848-53 Table 24.6
. * Cluster robust inference for Poisson cross-section application using
. * Vietnam Living Standard Survey data
.
. * (0) Descriptive Statistics (Table 24.3 second half)
. * (1) Frequencies of data (Table 24.5)
. * (2) Poisson regression with individual-level data (Table 24.6)
.
. * The results differ in second significant digit from those in text
. * despite same sample size. Not sure why.
.
. * For Table 24.4 for clustered household data see MMA24P1OLSCLUSTER.DO
.
. * The Poisson cluster effects model is
. * y_it ~ Poisson(exp(x_it'b + a_i))
. * Default xtpois output assumes Poisson distribution - var = mean.
. * This is usually too strong an assumption.
. * Instead should get cluster-robust errors after xtpois
. * See Section 21.2.3 pages 709-12 and section 23.26 pages 788-9
. * Stata Version 8 does not do this.
. * Here we do a panel bootstrap - results not reported in the text
.
. * To speed up programs reduce breps - the number of bootstrap reps
. * This program takes a long time when bootstrapping
.
. * To run this program you need data set
. * vietnam_ex2.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
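The header notes that cluster-robust standard errors are preferable to the default xtpois ones. For the pooled Poisson estimator, a cluster-robust variance has the sandwich form A^{-1} B A^{-1}. Here A = sum_i mu_i x_i x_i' and B sums outer products of the within-cluster score totals sum_{i in g} (y_i - mu_i) x_i. The sketch below illustrates that formula on simulated data and is not Stata's implementation. It assumes NumPy and a constant in the first column of X. It applies no small-sample scaling by default. The optional G/(G-1) factor is offered only as a commonly used adjustment, and we have not confirmed that it is the one `poisson, cluster()` applies.

```python
import numpy as np


def poisson_fit(X, y, max_iter=100, tol=1e-10):
    """Poisson MLE with log link by Newton-Raphson.

    Assumes column 0 of X is a constant; its coefficient starts at log(mean y).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                  # score vector
        hess = X.T @ (X * mu[:, None])         # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def poisson_cluster_vcov(X, y, beta, groups, small_sample=False):
    """Cluster-robust sandwich A^{-1} B A^{-1} for the Poisson MLE."""
    mu = np.exp(X @ beta)
    A = X.T @ (X * mu[:, None])
    scores = X * (y - mu)[:, None]             # per-observation score contributions
    labels = np.unique(groups)
    B = np.zeros_like(A)
    for g in labels:
        s_g = scores[groups == g].sum(axis=0)  # cluster score total
        B += np.outer(s_g, s_g)
    if small_sample:
        G = len(labels)
        B *= G / (G - 1)
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv


# Simulated example: 40 clusters of 10 observations each.
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(0.2 + 0.5 * X[:, 1]))
groups = np.repeat(np.arange(40), 10)
beta = poisson_fit(X, y)
V = poisson_cluster_vcov(X, y, beta, groups)
print(beta, np.sqrt(np.diag(V)))
```

When every observation is its own cluster, B reduces to the sum of individual score outer products, which is the heteroskedasticity-robust form used by `poisson, robust`.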

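Because Stata 8's xtpois has no cluster option, this program instead resamples whole communes. The sketch below illustrates a cluster (pairs) bootstrap in Python, assuming NumPy. The `cluster_bootstrap` function and its `estimator` argument are hypothetical helpers. Pooled OLS serves as the estimator only to keep the example short. The returned standard error, bias and percentile interval correspond in spirit to the Std. Err., Bias and (P) columns of the bootstrap tables in this log.

```python
import numpy as np


def cluster_bootstrap(X, y, groups, estimator, reps=500, seed=0):
    """Pairs cluster bootstrap: resample whole clusters with replacement.

    Returns (observed estimate, bootstrap se, bias, 95% percentile interval).
    """
    rng = np.random.default_rng(seed)
    labels = np.unique(groups)
    rows_of = [np.flatnonzero(groups == g) for g in labels]
    b_obs = estimator(X, y)
    draws = np.empty((reps, b_obs.size))
    for r in range(reps):
        picked = rng.integers(0, len(labels), size=len(labels))
        rows = np.concatenate([rows_of[k] for k in picked])
        draws[r] = estimator(X[rows], y[rows])
    se = draws.std(axis=0, ddof=1)
    bias = draws.mean(axis=0) - b_obs
    pct = np.percentile(draws, [2.5, 97.5], axis=0)
    return b_obs, se, bias, pct


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Simulated data: 50 clusters of 20 observations with a common cluster shock.
rng = np.random.default_rng(1)
G, m = 50, 20
groups = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m) + rng.normal(size=G)[groups]
u = rng.normal(size=G * m) + rng.normal(size=G)[groups]
X = np.column_stack([np.ones(G * m), x])
y = 1.0 + 0.5 * x + u
b, se, bias, pct = cluster_bootstrap(X, y, groups, ols, reps=200)
print(b, se)
```

For fixed-effects estimators such as `xtpois, fe`, a commune drawn twice should arguably be treated as two distinct groups. This pooled-OLS sketch sidesteps that issue.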

.
. ********** DATA DESCRIPTION **********
.
. * The data comes from World Bank 1997 Vietnam Living Standards Survey
. * A subset was used in chapter 4.6.4.
. * The larger sample here is described on pages 848-9
.
. * The data are INDIVIDUAL-level data
. * There are N=27765 individuals in 194 clusters (communes)
.
. * The separate data set vietnam_ex1.dta has household-level data
.
. ********** READ IN INDIVIDUAL-LEVEL DATA and SUMMARIZE (Table 24.3) **********
.
. use vietnam_ex2.dta, clear
. desc

Contains data from vietnam_ex2.dta
  obs:        27,766
 vars:            12                          11 Apr 2005 12:33
 size:     1,443,832 (85.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
COMPED98        float  %9.0g
SEX             float  %9.0g
AGE             float  %9.0g
MARRIED         float  %9.0g
ILLDUM          float  %9.0g
INJDUM          float  %9.0g
ILLDAYS         float  %9.0g
ACTDAYS         float  %9.0g
PHARVIS         float  %9.0g
HLTHINS         float  %9.0g
lnhhinc         float  %9.0g
commune         float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    COMPED98 |     27765    3.390672     1.93115          0         11
         SEX |     27765    .5111471    .4998847          0          1
         AGE |     27765    2.977504    .9671446          0    4.59512
     MARRIED |     27765    .3988835    .4896775          0          1
      ILLDUM |     27765    .6219701    .8995068          0          9
-------------+--------------------------------------------------------
      INJDUM |     27765    .0096885    .0979537          0          1
     ILLDAYS |     27765    2.804034     5.45823          0         60
     ACTDAYS |     27765    .0657302    1.115939          0         30
     PHARVIS |     27765    .5117594    1.313427          0         30
     HLTHINS |     27765    .1625788    .3689876          0          1
-------------+--------------------------------------------------------
     lnhhinc |     27765     2.60261    .6244145   .0467014   5.405502
     commune |     27765    101.5266    56.28334          1        194
.
. rename COMPED98 EDUC
. rename ILLDUM ILLNESS
. rename INJDUM INJURY
. rename HLTHINS INSURANCE
. rename lnhhinc LNHHEXP
. rename commune COMMUNE
.
. * Following should give same descriptive statistics
. * as in bottom half (Household) in Table 24.3 p.850
. * But there is a difference for LNHHEXP plus here no data on MEDEXP
. sum PHARVIS LNHHEXP AGE SEX MARRIED EDUC ILLNESS INJURY ILLDAYS ACTDAYS INSURANCE COMMUNE

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     PHARVIS |     27765    .5117594    1.313427          0         30
     LNHHEXP |     27765     2.60261    .6244145   .0467014   5.405502
         AGE |     27765    2.977504    .9671446          0    4.59512
         SEX |     27765    .5111471    .4998847          0          1
     MARRIED |     27765    .3988835    .4896775          0          1
-------------+--------------------------------------------------------
        EDUC |     27765    3.390672     1.93115          0         11
     ILLNESS |     27765    .6219701    .8995068          0          9
      INJURY |     27765    .0096885    .0979537          0          1
     ILLDAYS |     27765    2.804034     5.45823          0         60
     ACTDAYS |     27765    .0657302    1.115939          0         30
-------------+--------------------------------------------------------
   INSURANCE |     27765    .1625788    .3689876          0          1
     COMMUNE |     27765    101.5266    56.28334          1        194
. sum LNHHEXP, detail

                           LNHHEXP
-------------------------------------------------------------
      Percentiles      Smallest
 1%     1.302267       .0467014
 5%     1.658267       .1111674
10%     1.875315       .3755146       Obs               27765
25%     2.188848       .4177101       Sum of Wgt.       27765

50%     2.534935                      Mean            2.60261
                        Largest       Std. Dev.      .6244145
75%     2.962732       5.405502
90%     3.458658       5.405502       Variance       .3898934
95%     3.737957       5.405502       Skewness       .4925002
99%     4.295394       5.405502       Kurtosis       3.583693
.
. * Following gives Table 24.5 (page 852) frequencies
. * These differ in some places from Table 24.5 - especially for number = 0
. tabulate PHARVIS

    PHARVIS |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     20,668       74.44       74.44
          1 |      3,829       13.79       88.23
          2 |      1,716        6.18       94.41
          3 |        777        2.80       97.21
          4 |        359        1.29       98.50
          5 |        174        0.63       99.13
          6 |         64        0.23       99.36
          7 |         43        0.15       99.51
          8 |         16        0.06       99.57
          9 |          4        0.01       99.59
         10 |         78        0.28       99.87
         11 |          1        0.00       99.87
         12 |          5        0.02       99.89
         13 |          1        0.00       99.89
         14 |          3        0.01       99.90
         15 |          9        0.03       99.94
         16 |          1        0.00       99.94
         20 |          8        0.03       99.97
         22 |          2        0.01       99.97
         27 |          1        0.00       99.98
         28 |          3        0.01       99.99
         30 |          3        0.01      100.00
------------+-----------------------------------
      Total |     27,765      100.00
.
. * Histogram with kernel density estimate
. hist PHARVIS, discrete kdensity
(start=0, width=1)
.
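The Percent and Cum. columns of `tabulate` follow directly from the frequencies. The standard-library sketch below recomputes them from the counts in the table above. Cumulative percentages are computed from cumulative counts, not by adding the rounded percentages. That is why the row for 9 shows 99.59 even though 99.57 + 0.01 would suggest 99.58. The same counts also reproduce the PHARVIS mean of .5117594 reported by `sum`.

```python
# PHARVIS frequencies copied from the tabulate output above
counts = {0: 20668, 1: 3829, 2: 1716, 3: 777, 4: 359, 5: 174, 6: 64, 7: 43,
          8: 16, 9: 4, 10: 78, 11: 1, 12: 5, 13: 1, 14: 3, 15: 9, 16: 1,
          20: 8, 22: 2, 27: 1, 28: 3, 30: 3}


def tabulate(counts):
    """Return total and rows of (value, freq, percent, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / total, 100 * running / total))
    return total, rows


total, rows = tabulate(counts)
mean = sum(v * f for v, f in counts.items()) / total
for value, freq, pct, cum in rows[:3]:
    print(f"{value:>3} | {freq:>7,} {pct:6.2f} {cum:6.2f}")
print(f"mean = {mean:.7f}")
```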

. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile PHARVIS LNHHEXP AGE SEX MARRIED EDUC ILLNESS INJURY ILLDAYS /*
> */ ACTDAYS INSURANCE COMMUNE using vietnam_ex2.asc, replace
.
. ********** ANALYSIS: CLUSTER ANALYSIS FOR POISSON MODEL [Table 24.6 p.851] *********
.
. * Regressor list for the Poisson regressions
. global XLISTPOISSON LNHHEXP INSURANCE SEX AGE MARRIED ILLDAYS ACTDAYS INJURY ILLNESS EDUC
.
. * Poisson with usual standard errors (Table 24.6 columns 1-2)
. poisson PHARVIS $XLISTPOISSON

Iteration 0:   log likelihood = -26309.924
Iteration 1:   log likelihood = -25300.337
Iteration 2:   log likelihood = -25281.839
Iteration 3:   log likelihood = -25281.786
Iteration 4:   log likelihood = -25281.786

Poisson regression                                Number of obs   =      27765
                                                  LR chi2(10)     =   13226.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -25281.786                       Pseudo R2       =     0.2073

------------------------------------------------------------------------------
     PHARVIS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |    .078686   .0138419     5.68   0.000     .0515564    .1058156
   INSURANCE |  -.2485716   .0259704    -9.57   0.000    -.2994727   -.1976706
         SEX |   .0851733   .0171697     4.96   0.000     .0515213    .1188253
         AGE |   .0252426   .0106126     2.38   0.017     .0044423    .0460429
     MARRIED |   .1239639   .0209267     5.92   0.000     .0829483    .1649795
     ILLDAYS |   .0429083   .0010728    40.00   0.000     .0408057    .0450109
     ACTDAYS |   .0089793   .0052409     1.71   0.087    -.0012927    .0192514
      INJURY |   .1717029   .0747292     2.30   0.022     .0252364    .3181694
     ILLNESS |   .5623976   .0064536    87.15   0.000     .5497488    .5750464
        EDUC |  -.0524459   .0048173   -10.89   0.000    -.0618878   -.0430041
       _cons |  -1.640821   .0458542   -35.78   0.000    -1.730694   -1.550949
------------------------------------------------------------------------------
. estimates store poisiid
.
. * Poisson with heteroskedastic-robust standard errors (Table 24.6 column 3)
. poisson PHARVIS $XLISTPOISSON, robust

Iteration 0:   log pseudo-likelihood = -26309.924
Iteration 1:   log pseudo-likelihood = -25300.337
Iteration 2:   log pseudo-likelihood = -25281.839
Iteration 3:   log pseudo-likelihood = -25281.786
Iteration 4:   log pseudo-likelihood = -25281.786

Poisson regression                                Number of obs   =      27765
                                                  Wald chi2(10)   =    2423.07
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -25281.786                Pseudo R2       =     0.2073

------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |    .078686   .0255091     3.08   0.002     .0286891    .1286829
   INSURANCE |  -.2485716   .0437892    -5.68   0.000    -.3343969   -.1627464
         SEX |   .0851733    .030907     2.76   0.006     .0245967    .1457499
         AGE |   .0252426   .0198448     1.27   0.203    -.0136526    .0641377
     MARRIED |   .1239639   .0419107     2.96   0.003     .0418205    .2061073
     ILLDAYS |   .0429083   .0028779    14.91   0.000     .0372678    .0485488
     ACTDAYS |   .0089793   .0207444     0.43   0.665     -.031679    .0496377
      INJURY |   .1717029   .2043534     0.84   0.401    -.2288224    .5722282
     ILLNESS |   .5623976   .0228635    24.60   0.000      .517586    .6072092
        EDUC |  -.0524459   .0081043    -6.47   0.000    -.0683301   -.0365618
       _cons |  -1.640821   .0872497   -18.81   0.000    -1.811828   -1.469815
------------------------------------------------------------------------------
. estimates store poishet
.
. * Poisson with cluster-robust standard errors (Table 24.6 column 4)
. poisson PHARVIS $XLISTPOISSON, cluster(COMMUNE)

Iteration 0:   log pseudo-likelihood = -26309.924
Iteration 1:   log pseudo-likelihood = -25300.337
Iteration 2:   log pseudo-likelihood = -25281.839
Iteration 3:   log pseudo-likelihood = -25281.786
Iteration 4:   log pseudo-likelihood = -25281.786

Poisson regression                                Number of obs   =      27765
                                                  Wald chi2(10)   =    1295.38
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -25281.786

                          (standard errors adjusted for clustering on COMMUNE)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |    .078686   .0472052     1.67   0.096    -.0138344    .1712065
   INSURANCE |  -.2485716   .0617873    -4.02   0.000    -.3696725   -.1274708
         SEX |   .0851733   .0327427     2.60   0.009     .0209988    .1493478
         AGE |   .0252426   .0262626     0.96   0.336    -.0262311    .0767163

     MARRIED |   .1239639    .048607     2.55   0.011      .028696    .2192318
     ILLDAYS |   .0429083   .0037384    11.48   0.000     .0355811    .0502355
     ACTDAYS |   .0089793   .0190493     0.47   0.637    -.0283567    .0463154
      INJURY |   .1717029   .2214258     0.78   0.438    -.2622836    .6056894
     ILLNESS |   .5623976    .028512    19.72   0.000      .506515    .6182802
        EDUC |  -.0524459   .0153841    -3.41   0.001    -.0825982   -.0222937
       _cons |  -1.640821   .1541108   -10.65   0.000    -1.942873    -1.33877
------------------------------------------------------------------------------
. estimates store poisclust
.
. * Random effects estimation (Table 24.6 columns 5-6)
. * This uses the xtpois command which first requires identifying the cluster
. iis COMMUNE
. xtpois PHARVIS $XLISTPOISSON, re

Fitting Poisson model:

Iteration 0:   log likelihood = -26309.924
Iteration 1:   log likelihood = -25300.337
Iteration 2:   log likelihood = -25281.839
Iteration 3:   log likelihood = -25281.786
Iteration 4:   log likelihood = -25281.786

Fitting full model:

Iteration 0:   log likelihood = -23538.342
Iteration 1:   log likelihood = -23430.615
Iteration 2:   log likelihood = -23419.142
Iteration 3:   log likelihood = -23419.132
Iteration 4:   log likelihood = -23419.132

Random-effects Poisson regression               Number of obs      =     27765
Group variable (i): COMMUNE                     Number of groups   =       194

Random effects u_i ~ Gamma                      Obs per group: min =        51
                                                               avg =     143.1
                                                               max =       206

                                                Wald chi2(10)      =  13723.01
Log likelihood  = -23419.132                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     PHARVIS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |  -.1013746   .0187549    -5.41   0.000    -.1381336   -.0646157
   INSURANCE |  -.1675953   .0273642    -6.12   0.000    -.2212283   -.1139624
         SEX |    .099303   .0172541     5.76   0.000     .0654855    .1331206
         AGE |   .0047406   .0107899     0.44   0.660    -.0164073    .0258884

     MARRIED |   .1579958   .0212825     7.42   0.000     .1162828    .1997088
     ILLDAYS |    .046055   .0011422    40.32   0.000     .0438164    .0482937
     ACTDAYS |   .0186084   .0054546     3.41   0.001     .0079176    .0292991
      INJURY |   .1479464   .0780863     1.89   0.058       -.0051    .3009928
     ILLNESS |   .5801872   .0076855    75.49   0.000      .565124    .5952505
        EDUC |  -.0284493   .0055827    -5.10   0.000    -.0393911   -.0175075
       _cons |  -1.276974   .0723199   -17.66   0.000    -1.418718   -1.135229
-------------+----------------------------------------------------------------
    /lnalpha |  -1.039839   .1035295                     -1.242753    -.836925
-------------+----------------------------------------------------------------
       alpha |   .3535115   .0365989                      .2885885    .4330401
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 3725.31 Prob>=chibar2 = 0.000
. estimates store poisre
.
. * Following shows that cluster option for xtpois in Stata version 8 does nothing
. xtpois PHARVIS $XLISTPOISSON, i(COMMUNE) re

Fitting Poisson model:

Iteration 0:   log likelihood = -26309.924
Iteration 1:   log likelihood = -25300.337
Iteration 2:   log likelihood = -25281.839
Iteration 3:   log likelihood = -25281.786
Iteration 4:   log likelihood = -25281.786

Fitting full model:

Iteration 0:   log likelihood = -23538.342
Iteration 1:   log likelihood = -23430.615
Iteration 2:   log likelihood = -23419.142
Iteration 3:   log likelihood = -23419.132
Iteration 4:   log likelihood = -23419.132

Random-effects Poisson regression               Number of obs      =     27765
Group variable (i): COMMUNE                     Number of groups   =       194

Random effects u_i ~ Gamma                      Obs per group: min =        51
                                                               avg =     143.1
                                                               max =       206

                                                Wald chi2(10)      =  13723.01
Log likelihood  = -23419.132                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     PHARVIS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |  -.1013746   .0187549    -5.41   0.000    -.1381336   -.0646157
   INSURANCE |  -.1675953   .0273642    -6.12   0.000    -.2212283   -.1139624
         SEX |    .099303   .0172541     5.76   0.000     .0654855    .1331206
         AGE |   .0047406   .0107899     0.44   0.660    -.0164073    .0258884
     MARRIED |   .1579958   .0212825     7.42   0.000     .1162828    .1997088
     ILLDAYS |    .046055   .0011422    40.32   0.000     .0438164    .0482937
     ACTDAYS |   .0186084   .0054546     3.41   0.001     .0079176    .0292991
      INJURY |   .1479464   .0780863     1.89   0.058       -.0051    .3009928
     ILLNESS |   .5801872   .0076855    75.49   0.000      .565124    .5952505
        EDUC |  -.0284493   .0055827    -5.10   0.000    -.0393911   -.0175075
       _cons |  -1.276974   .0723199   -17.66   0.000    -1.418718   -1.135229
-------------+----------------------------------------------------------------
    /lnalpha |  -1.039839   .1035295                     -1.242753    -.836925
-------------+----------------------------------------------------------------
       alpha |   .3535115   .0365989                      .2885885    .4330401
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 3725.31 Prob>=chibar2 = 0.000
.
. * Note that can cluster bootstrap if desired to get more robust standard errors
. * This is done at end of program
.
. * Fixed effects estimation (conditional MLE) (Table 24.6 columns 7-8)
. xtpois PHARVIS $XLISTPOISSON, fe
note: 1 group (94 obs) dropped due to all zero outcomes

Iteration 0:   log likelihood =  -28435.61
Iteration 1:   log likelihood = -24231.502
Iteration 2:   log likelihood = -22468.078
Iteration 3:   log likelihood = -22446.225
Iteration 4:   log likelihood = -22446.002
Iteration 5:   log likelihood = -22446.002

Conditional fixed-effects Poisson regression    Number of obs      =     27671
Group variable (i): COMMUNE                     Number of groups   =       193

                                                Obs per group: min =        51
                                                               avg =     143.4
                                                               max =       206

                                                Wald chi2(10)      =  13621.76
Log likelihood  = -22446.002                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     PHARVIS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |  -.1146402    .019025    -6.03   0.000    -.1519285   -.0773519
   INSURANCE |   -.163603   .0274193    -5.97   0.000    -.2173438   -.1098622
         SEX |   .0997415   .0172564     5.78   0.000     .0659195    .1335635
         AGE |   .0033591   .0107945     0.31   0.756    -.0177977     .024516
     MARRIED |   .1606792   .0212958     7.55   0.000     .1189403    .2024182
     ILLDAYS |    .046148   .0011453    40.29   0.000     .0439032    .0483929
     ACTDAYS |   .0189184   .0054666     3.46   0.001      .008204    .0296328
      INJURY |   .1479319    .078183     1.89   0.058    -.0053039    .3011677
     ILLNESS |   .5803719   .0077289    75.09   0.000     .5652235    .5955203
        EDUC |  -.0272099   .0056191    -4.84   0.000    -.0382232   -.0161966
------------------------------------------------------------------------------
. estimates store poisfe
.
. * Note that can cluster bootstrap if desired to get more robust standard errors
. * This is done at end of program
.
. ********** DISPLAY TABLE 24.6 RESULTS page 852 **********
.
. * The results here differ in the second significant digit from those in text
. * despite same sample size. Not sure why.
.
. estimates table poisiid poishet poisclust, /*
> */ b(%10.3f) t(%10.2f) stats(r2 N)

----------------------------------------------------
    Variable |  poisiid      poishet     poisclust
-------------+--------------------------------------
     LNHHEXP |      0.079        0.079        0.079
             |       5.68         3.08         1.67
   INSURANCE |     -0.249       -0.249       -0.249
             |      -9.57        -5.68        -4.02
         SEX |      0.085        0.085        0.085
             |       4.96         2.76         2.60
         AGE |      0.025        0.025        0.025
             |       2.38         1.27         0.96
     MARRIED |      0.124        0.124        0.124
             |       5.92         2.96         2.55
     ILLDAYS |      0.043        0.043        0.043
             |      40.00        14.91        11.48
     ACTDAYS |      0.009        0.009        0.009
             |       1.71         0.43         0.47
      INJURY |      0.172        0.172        0.172
             |       2.30         0.84         0.78
     ILLNESS |      0.562        0.562        0.562
             |      87.15        24.60        19.72
        EDUC |     -0.052       -0.052       -0.052
             |     -10.89        -6.47        -3.41
       _cons |     -1.641       -1.641       -1.641
             |     -35.78       -18.81       -10.65
-------------+--------------------------------------
          r2 |
           N |  27765.000    27765.000    27765.000
----------------------------------------------------
                                        legend: b/t
. estimates table poisre poisfe, /*
> */ b(%10.3f) t(%10.2f) stats(r2 N)

---------------------------------------
    Variable |   poisre       poisfe
-------------+-------------------------
PHARVIS      |
     LNHHEXP |     -0.101       -0.115
             |      -5.41        -6.03
   INSURANCE |     -0.168       -0.164
             |      -6.12        -5.97
         SEX |      0.099        0.100
             |       5.76         5.78
         AGE |      0.005        0.003
             |       0.44         0.31
     MARRIED |      0.158        0.161
             |       7.42         7.55
     ILLDAYS |      0.046        0.046
             |      40.32        40.29
     ACTDAYS |      0.019        0.019
             |       3.41         3.46
      INJURY |      0.148        0.148
             |       1.89         1.89
     ILLNESS |      0.580        0.580
             |      75.49        75.09
        EDUC |     -0.028       -0.027
             |      -5.10        -4.84
       _cons |     -1.277
             |     -17.66
-------------+-------------------------
lnalpha      |
       _cons |     -1.040
             |     -10.04
-------------+-------------------------
Statistics   |
          r2 |
           N |  27765.000    27671.000
---------------------------------------
                           legend: b/t
.
. ********** ADDITIONALLY DO CLUSTER BOOTSTRAPS **********
.
. * These results not given in the text
.
. * Output at website uses breps 500
. global breps 50
.
. * Note that can bootstrap if desired to get more robust standard errors
. * The first reproduces pois , cluster(COMMUNE)
. bootstrap "poisson PHARVIS $XLISTPOISSON" _b, cluster(COMMUNE) reps($breps) level(95)

command:      poisson PHARVIS LNHHEXP INSURANCE SEX AGE MARRIED ILLDAYS ACTDAYS INJURY ILLNESS EDUC
statistics:   b_LNHHEXP  = [PHARVIS]_b[LNHHEXP]
              b_INSURA~E = [PHARVIS]_b[INSURANCE]
              b_SEX      = [PHARVIS]_b[SEX]
              b_AGE      = [PHARVIS]_b[AGE]
              b_MARRIED  = [PHARVIS]_b[MARRIED]
              b_ILLDAYS  = [PHARVIS]_b[ILLDAYS]
              b_ACTDAYS  = [PHARVIS]_b[ACTDAYS]
              b_INJURY   = [PHARVIS]_b[INJURY]
              b_ILLNESS  = [PHARVIS]_b[ILLNESS]
              b_EDUC     = [PHARVIS]_b[EDUC]
              b_cons     = [PHARVIS]_b[_cons]

Bootstrap statistics                              Number of obs    =     27765
                                                  N of clusters    =       194
                                                  Replications     =        50

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |    50   .078686   .0072233   .0475425  -.0168542   .1742262   (N)
             |                                        -.0050689   .1878158   (P)
             |                                        -.0204097   .1710779  (BC)
 b_INSURANCE |    50 -.2485716   .0013929   .0770506  -.4034107  -.0937326   (N)
             |                                        -.3640907  -.1004183   (P)
             |                                        -.4677969  -.1004183  (BC)
       b_SEX |    50  .0851733  -.0039062   .0345537   .0157351   .1546115   (N)
             |                                          .022333   .1494552   (P)
             |                                          .022333   .1494552  (BC)
       b_AGE |    50  .0252426   .0012812   .0270715  -.0291596   .0796447   (N)
             |                                         -.025843   .0726057   (P)
             |                                        -.0479862   .0726057  (BC)
   b_MARRIED |    50  .1239639  -.0017894   .0406114   .0423522   .2055756   (N)
             |                                         .0132484   .2024732   (P)
             |                                         .0132484   .2101617  (BC)
   b_ILLDAYS |    50  .0429083  -.0005122      .0034   .0360757   .0497409   (N)
             |                                         .0358535   .0481521   (P)
             |                                         .0363203   .0500312  (BC)
   b_ACTDAYS |    50  .0089793  -.0021093   .0249974  -.0412549   .0592135   (N)
             |                                        -.0343906   .0573651   (P)
             |                                        -.0352626   .0573651  (BC)
    b_INJURY |    50  .1717029  -.0321969   .2090263  -.2483512    .591757   (N)
             |                                        -.3271621   .4807015   (P)
             |                                        -.1896703    .648314  (BC)
   b_ILLNESS |    50  .5623976   .0061368   .0294736   .5031682    .621627   (N)
             |                                         .5206931   .6271017   (P)
             |                                         .5192547   .6206369  (BC)
      b_EDUC |    50 -.0524459   .0027244     .01598  -.0845589  -.0203329   (N)
             |                                        -.0825952   -.017323   (P)
             |                                        -.0850821  -.0256777  (BC)
      b_cons |    50 -1.640821  -.0414073   .1460702   -1.93436  -1.347282   (N)
             |                                        -1.984352  -1.399226   (P)
             |                                        -1.867373  -1.310915  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * The t-statistic vector is e(b)./e(se) where ./ is elt. by elt. division
. * But Stata Version 8 does not do ./ so instead need the following
. matrix tpois = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tpois, format(%10.2f)

tpois[11,1]
                    r1
  b_LNHHEXP       1.66
b_INSURANCE      -3.23
      b_SEX       2.46
      b_AGE       0.93
  b_MARRIED       3.05
  b_ILLDAYS      12.62
  b_ACTDAYS       0.36
   b_INJURY       0.82
  b_ILLNESS      19.08
     b_EDUC      -3.28
     b_cons     -11.23
.
. * The next two reproduce xtpois , cluster(COMMUNE)
. * but xtpois has no cluster option so instead cluster bootstrap
.
. * Fixed effects estimator
. bootstrap "xtpois PHARVIS $XLISTPOISSON, fe" _b, cluster(COMMUNE) reps($breps) level(95)

command:      xtpois PHARVIS LNHHEXP INSURANCE SEX AGE MARRIED ILLDAYS ACTDAYS INJURY ILLNESS EDUC ,
>             fe
statistics:   b_LNHHEXP  = [PHARVIS]_b[LNHHEXP]
              b_INSURA~E = [PHARVIS]_b[INSURANCE]
              b_SEX      = [PHARVIS]_b[SEX]
              b_AGE      = [PHARVIS]_b[AGE]
              b_MARRIED  = [PHARVIS]_b[MARRIED]
              b_ILLDAYS  = [PHARVIS]_b[ILLDAYS]
              b_ACTDAYS  = [PHARVIS]_b[ACTDAYS]
              b_INJURY   = [PHARVIS]_b[INJURY]
              b_ILLNESS  = [PHARVIS]_b[ILLNESS]
              b_EDUC     = [PHARVIS]_b[EDUC]


Bootstrap statistics                              Number of obs    =     27671
                                                  N of clusters    =       193
                                                  Replications     =        50

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |    50 -.1146402   .0046925    .042981  -.2010138  -.0282666   (N)
             |                                        -.1801919  -.0258064   (P)
             |                                        -.1841975   -.043704  (BC)
 b_INSURANCE |    50  -.163603   .0145077   .0513299  -.2667543  -.0604516   (N)
             |                                        -.2391983  -.0581847   (P)
             |                                         -.269962  -.0993868  (BC)
       b_SEX |    50  .0997415   .0030381   .0298361   .0397836   .1596994   (N)
             |                                         .0581716   .1630876   (P)
             |                                          .055771   .1562326  (BC)
       b_AGE |    50  .0033591  -.0017336   .0228288   -.042517   .0492353   (N)
             |                                        -.0508069    .040935   (P)
             |                                        -.0508069   .0541492  (BC)
   b_MARRIED |    50  .1606793    .009603   .0435503   .0731616   .2481969   (N)
             |                                         .1091381    .260388   (P)
             |                                         .0877519   .2407327  (BC)
   b_ILLDAYS |    50   .046148  -.0004107   .0027904   .0405406   .0517555   (N)
             |                                         .0397139   .0504146   (P)
             |                                         .0397139    .050898  (BC)
   b_ACTDAYS |    50  .0189184  -.0049228   .0176306  -.0165115   .0543484   (N)
             |                                        -.0169987   .0490534   (P)
             |                                        -.0158923   .0497731  (BC)
    b_INJURY |    50  .1479319   .0204617   .2194316  -.2930323   .5888962   (N)
             |                                        -.2735089   .5520838   (P)
             |                                        -.3044733   .5520838  (BC)
   b_ILLNESS |    50  .5803719   .0003675   .0199171    .540347   .6203969   (N)
             |                                         .5370637   .6163648   (P)
             |                                         .5370637   .6163648  (BC)
      b_EDUC |    50 -.0272099  -.0003993   .0112987  -.0499155  -.0045043   (N)
             |                                        -.0521668  -.0068456   (P)
             |                                        -.0531845  -.0068456  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix tpoisfe = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tpoisfe, format(%10.2f)

tpoisfe[10,1]
                    r1
  b_LNHHEXP      -2.67
b_INSURANCE      -3.19
      b_SEX       3.34
      b_AGE       0.15
  b_MARRIED       3.69
  b_ILLDAYS      16.54
  b_ACTDAYS       1.07
   b_INJURY       0.67
  b_ILLNESS      29.14
     b_EDUC      -2.41
.
. * Random effects estimator
. bootstrap "xtpois PHARVIS $XLISTPOISSON, re" _b, cluster(COMMUNE) reps($breps) level(95)

command:      xtpois PHARVIS LNHHEXP INSURANCE SEX AGE MARRIED ILLDAYS ACTDAYS INJURY ILLNESS EDUC ,
>             re
statistics:   b_LNHHEXP  = [PHARVIS]_b[LNHHEXP]
              b_INSURA~E = [PHARVIS]_b[INSURANCE]
              b_SEX      = [PHARVIS]_b[SEX]
              b_AGE      = [PHARVIS]_b[AGE]
              b_MARRIED  = [PHARVIS]_b[MARRIED]
              b_ILLDAYS  = [PHARVIS]_b[ILLDAYS]
              b_ACTDAYS  = [PHARVIS]_b[ACTDAYS]
              b_INJURY   = [PHARVIS]_b[INJURY]
              b_ILLNESS  = [PHARVIS]_b[ILLNESS]
              b_EDUC     = [PHARVIS]_b[EDUC]
              b_cons     = [PHARVIS]_b[_cons]
              b_1cons    = [lnalpha]_b[_cons]

Bootstrap statistics                              Number of obs    =     27765
                                                  N of clusters    =       194
                                                  Replications     =        50

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |    50 -.1013746   .0038095   .0406385  -.1830407  -.0197086   (N)
             |                                        -.1794194  -.0319058   (P)
             |                                        -.1977448  -.0319058  (BC)
 b_INSURANCE |    50 -.1675954  -.0053195     .04945  -.2669688  -.0682219   (N)
             |                                        -.2912881  -.0900193   (P)
             |                                        -.2677689   -.088337  (BC)
       b_SEX |    50   .099303  -.0008622    .032962   .0330634   .1655427   (N)
             |                                         .0463968   .1569125   (P)
             |                                         .0463968   .1569125  (BC)
       b_AGE |    50  .0047406   -.002087   .0196285  -.0347045   .0441856   (N)
             |                                        -.0319554   .0398893   (P)
             |                                        -.0212454   .0454795  (BC)
   b_MARRIED |    50  .1579958   .0045701   .0386327   .0803604   .2356311   (N)
             |                                         .1002202   .2446688   (P)
             |                                         .0595091   .2383231  (BC)
   b_ILLDAYS |    50   .046055  -.0000891   .0033445    .039334   .0527761   (N)
             |                                         .0400018   .0525925   (P)
             |                                         .0400018   .0528012  (BC)
   b_ACTDAYS |    50  .0186084  -.0013996   .0204209   -.022429   .0596457   (N)
             |                                        -.0251694   .0533912   (P)
             |                                        -.0251694   .0624974  (BC)
    b_INJURY |    50  .1479464  -.0122248   .2130704  -.2802346   .5761274   (N)
             |                                        -.2971589   .4662884   (P)
             |                                        -.3564237   .4662884  (BC)
   b_ILLNESS |    50  .5801873    .002013    .019375   .5412517   .6191228   (N)
             |                                         .5488635    .621733   (P)
             |                                         .5488635   .6328769  (BC)
      b_EDUC |    50 -.0284493  -.0017922   .0117021  -.0519655  -.0049331   (N)
             |                                         -.050308  -.0116823   (P)
             |                                         -.050308  -.0065941  (BC)
      b_cons |    50 -1.276974  -.0036143   .1309168  -1.540061  -1.013887   (N)
             |                                        -1.523902  -.9686469   (P)
             |                                        -1.523902  -.9686469  (BC)
     b_1cons |    50 -1.039839   .0148765   .0966908  -1.234147  -.8455317   (N)
             |                                        -1.170977  -.8494586   (P)
             |                                        -1.183111  -.8494586  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix tpoisre = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tpoisre, format(%10.2f)

tpoisre[12,1]
                    r1
  b_LNHHEXP      -2.49
b_INSURANCE      -3.39
      b_SEX       3.01
      b_AGE       0.24
  b_MARRIED       4.09
  b_ILLDAYS      13.77
  b_ACTDAYS       0.91
   b_INJURY       0.69
  b_ILLNESS      29.95
     b_EDUC      -2.43
     b_cons      -9.75
    b_1cons     -10.75
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section6\mma24p2poiscluster.txt
  log type:  text
 closed on:  24 May 2005, 16:50:38

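The `matrix tpoisfe` and `matrix tpoisre` lines form each bootstrap t-ratio as the coefficient divided by its bootstrap standard error: post-multiplying diag(e(b)) by the inverse of the diagonal matrix diag(e(se)) is just elementwise division. A minimal Python sketch of that arithmetic, using a few (coefficient, standard error) pairs copied from the random-effects bootstrap table above; the dictionary and function names are illustrative, not part of the original program.

```python
# Bootstrap t-ratios t_k = b_k / se_k, the same quantity as
# vecdiag(diag(e(b)) * syminv(diag(e(se)))) in the Stata log.
# (coefficient, bootstrap std. err.) pairs copied from the RE bootstrap table.
re_estimates = {
    "b_LNHHEXP": (-0.1013746, 0.0406385),
    "b_INSURANCE": (-0.1675954, 0.04945),
    "b_ILLNESS": (0.5801873, 0.019375),
    "b_1cons": (-1.039839, 0.0966908),
}


def t_ratios(estimates):
    """Return {name: b / se}; inverting a diagonal matrix is elementwise division."""
    return {name: b / se for name, (b, se) in estimates.items()}


for name, t in t_ratios(re_estimates).items():
    print(f"{name:12s} {t:10.2f}")
```

Rounded to two decimals these reproduce the corresponding rows of `tpoisre`.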
------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section6\mma25p1treatment.txt
  log type:  text
 opened on:  26 May 2005, 10:26:17

. 
. ********** OVERVIEW OF MMA25P1TREATMENT.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 25.8.1-25.8.4 pages 889-893 Tables 25.3-25.4 and Fig. 25.3
. * Evaluating treatment effect of training on Earnings
. * using Dehejia-Wahba data (originally Lalonde data)
. 
. * (0) Summarize data for treatments and controls (Table 25.3)
. * (1) Calculate the treatment effect by simple methods (Table 25.4)
. *     To replicate some results in DW 1999
. *     (1A) treatment-control
. *     (1B) control function
. *     (1C) before-after comparison
. *     (1D) differences-in-differences
. * (2) Calculate treatment effect by propensity score (matching by strata)
. *     Last entry in Table 25.4 and Figure 25.3.
. 
. * The program MMA25P2MATCHING.DO uses propensity scores with matching
. * methods more sophisticated than those used in MMA25P1TREATMENT.DO
. 
. * To run this program you need file
. *   nswpsid.da1
. 
. ********** STATA SETUP **********
. 
. set more off
. version 8
. set scheme s1mono   /* Used for graphs */
. 
. ********** DATA DESCRIPTION **********
. 
. * Data set nswpsid.da1 is from Guido Imbens
. * http://emlab.berkeley.edu/users/imbens/index.shtml
. 
. * Data originally from DW99
. *   R.H. Dehejia and S. Wahba (1999)
. *   "Causal Effects in Nonexperimental Studies: Reevaluating the
. *   Evaluation of Training Programs", JASA, 1053-1062
. * or DW02
. *   R.H. Dehejia and S. Wahba (2002)
. *   "Propensity-score Matching Methods for Nonexperimental Causal
. *   Studies", ReStat, 151-161
. * which in turn are from
. *   Lalonde, R. (1986), "Evaluating the Econometric Evaluations of
. *   Training Programs with Experimental Data," AER, 604-620.
. 
. * Each observation is for an individual.
. * There are 2,675 observations: 185 in treated group and 2,490 in control
. 
. * Variables are
. *   TREAT   1 if treated (NSW treated) and 0 if not (PSID-1 control)
. *   AGE     in years
. *   EDUC    in years
. *   BLACK   1 if black
. *   HISP    1 if hispanic
. *   MARR    1 if married
. *   RE74    Real annual earnings in 1974 (pre-treatment)
. *   RE75    Real annual earnings in 1975 (pre-treatment)
. *   RE78    Real annual earnings in 1978 (post-treatment)
. *   U74     1 if unemployed in 1974
. *   U75     1 if unemployed in 1975
. 
. * NOTE: U74 and U75 are miscoded in these data and also in the
. *       summary statistics table of DW02
. *       See below for correction to data
. 
. ********** READ DATA AND TRANSFORMATIONS **********
. 
. infile TREAT AGE EDUC BLACK HISP MARR RE74 RE75 RE78 U74 U75 /*
> */ using nswpsid.da1
(2675 observations read)

. 
. * The original data reversed U74 and U75
. * Should be U74=1 if RE74=0 and U74=0 if RE74>0, and similarly for U75
. * This affects results with the propensity score though not earlier results
. 
. * Wrong U74 and U75
. sum U74 U75

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         U74 |    2675    .1345794    .3413376          0          1
         U75 |    2675    .1293458     .335645          0          1

. 
. * Correct the original data
. drop U74 U75
. gen U74 = cond(RE74 == 0, 1, 0)
. gen U75 = cond(RE75 == 0, 1, 0)
. 
. * Correct U74 and U75
. sum U74 U75

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         U74 |    2675    .1293458     .335645          0          1
         U75 |    2675    .1345794    .3413376          0          1

. 
. * Create regressors used as additional controls in regressions below
. gen AGESQ = AGE*AGE
. gen EDUCSQ = EDUC*EDUC
. * DW99 do not define NODEGREE but following gives Table 1 means
. gen NODEGREE = 0
. replace NODEGREE = 1 if EDUC < 12
(891 real changes made)
. gen RE74SQ = RE74*RE74
. gen RE75SQ = RE75*RE75
. gen U74BLACK = U74*BLACK
. gen U74HISP = U74*HISP
. 
. sum AGE EDUC NODEGREE BLACK HISP MARR U74 U75 RE74 RE75 RE78 TREAT /*
> */ AGESQ EDUCSQ RE74SQ RE75SQ U74BLACK U74HISP

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         AGE |    2675    34.22579    10.49984         17         55
        EDUC |    2675    11.99439    3.053556          0         17
    NODEGREE |    2675    .3330841    .4714045          0          1
       BLACK |    2675    .2915888    .4545789          0          1
        HISP |    2675    .0343925    .1822693          0          1
-------------+--------------------------------------------------------
        MARR |    2675    .8194393    .3847257          0          1
         U74 |    2675    .1293458     .335645          0          1
         U75 |    2675    .1345794    .3413376          0          1
        RE74 |    2675       18230    13722.25          0     137149
        RE75 |    2675    17850.89    13877.78          0     156653
-------------+--------------------------------------------------------
        RE78 |    2675    20502.38    15632.52          0     121174
       TREAT |    2675    .0691589    .2537716          0          1
       AGESQ |    2675     1281.61    766.8415        289       3025
      EDUCSQ |    2675    153.1862    70.62231          0        289
      RE74SQ |    2675    5.21e+08    8.47e+08          0   1.88e+10
-------------+--------------------------------------------------------
      RE75SQ |    2675    5.11e+08    8.91e+08          0   2.45e+10
    U74BLACK |    2675    .0549533    .2279316          0          1
     U74HISP |    2675    .0056075    .0746868          0          1

. 
. * Reproduce DW99 Table 1: RE74subset Treated and PSID-1 rows
. * Same as CT Table 25.3 page 890
. * except for changes to U74, U75 and U74BLACK
. bysort TREAT: sum AGE EDUC NODEGREE BLACK HISP MARR U74 U75 RE74 RE75 RE78 TREAT /*
> */ AGESQ EDUCSQ RE74SQ RE75SQ U74BLACK

------------------------------------------------------------------------------------
-> TREAT = 0

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         AGE |    2490     34.8506    10.44076         18         55
        EDUC |    2490    12.11687    3.082435          0         17
    NODEGREE |    2490    .3052209    .4605934          0          1
       BLACK |    2490    .2506024     .433447          0          1
        HISP |    2490    .0325301    .1774389          0          1
-------------+--------------------------------------------------------
        MARR |    2490    .8662651    .3404357          0          1
         U74 |    2490    .0863454    .2809298          0          1
         U75 |    2490          .1    .3000603          0          1
        RE74 |    2490    19428.75    13406.88          0     137149
        RE75 |    2490    19063.34    13596.95          0     156653
-------------+--------------------------------------------------------
        RE78 |    2490    21553.92    15555.35          0     121174
       TREAT |    2490           0           0          0          0
       AGESQ |    2490     1323.53     769.796        324       3025
      EDUCSQ |    2490    156.3161    71.43048          0        289
      RE74SQ |    2490    5.57e+08    8.66e+08          0   1.88e+10
-------------+--------------------------------------------------------
      RE75SQ |    2490    5.48e+08    9.12e+08          0   2.45e+10
    U74BLACK |    2490    .0144578    .1193923          0          1

------------------------------------------------------------------------------------
-> TREAT = 1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         AGE |     185    25.81622    7.155019         17         48
        EDUC |     185    10.34595     2.01065          4         16
    NODEGREE |     185    .7081081    .4558666          0          1
       BLACK |     185    .8432432    .3645579          0          1
        HISP |     185    .0594595    .2371244          0          1
-------------+--------------------------------------------------------
        MARR |     185    .1891892    .3927217          0          1
         U74 |     185    .7081081    .4558666          0          1
         U75 |     185          .6    .4912274          0          1
        RE74 |     185    2095.574    4886.623          0    35040.1
        RE75 |     185    1532.056    3219.251          0    25142.2
-------------+--------------------------------------------------------
        RE78 |     185    6349.145    7867.405          0    60307.9
       TREAT |     185           1           0          1          1
       AGESQ |     185    717.3946    431.2517        289       2304
      EDUCSQ |     185    111.0595    39.30388         16        256
      RE74SQ |     185    2.81e+07    1.14e+08          0   1.23e+09
-------------+--------------------------------------------------------
      RE75SQ |     185    1.27e+07    5.60e+07          0   6.32e+08
    U74BLACK |     185          .6    .4912274          0          1

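The corrected coding defines each unemployment dummy directly from earnings: a year with zero real earnings counts as unemployed. The summary tables above are consistent with the two variables simply having been swapped, since the miscoded U74 mean (.1345794) equals the corrected U75 mean and vice versa. A minimal Python sketch of the `cond(RE74 == 0, 1, 0)` rule, applied to a few hypothetical earnings values; the function name `unemployed_dummy` is illustrative only.

```python
def unemployed_dummy(earnings):
    """Return 1 where real earnings are exactly zero, else 0.

    Same rule as the Stata line: gen U74 = cond(RE74 == 0, 1, 0)
    """
    return [1 if e == 0 else 0 for e in earnings]


# Hypothetical 1974 earnings for four individuals (not taken from the data set).
re74 = [0.0, 2095.6, 0.0, 35040.1]
print(unemployed_dummy(re74))  # [1, 0, 1, 0]
```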
. 
. save nswpsid, replace
file nswpsid.dta saved

. 
. ********** ANALYSIS: (1) CALCULATE EFFECT OF TRAINING (Table 25.4, p.891) **********
. 
. ***** (1A) TREATMENT-CONTROL COMPARISON USING POST-TREATMENT EARNINGS
. *****      [Difference in means]
. 
. * DW99 Table 5 column 1 and Table 3 column 1
. regress RE78 T

      Source |       SS       df       MS              Number of obs =    2675
-------------+------------------------------           F(  1,  2673) =  173.41
       Model |  3.9811e+10     1  3.9811e+10           Prob > F      =  0.0000
    Residual |  6.1365e+11  2673   229573201           R-squared     =  0.0609
-------------+------------------------------           Adj R-squared =  0.0606
       Total |  6.5346e+11  2674   244375675           Root MSE      =   15152

------------------------------------------------------------------------------
        RE78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       TREAT |  -15204.78   1154.614   -13.17   0.000     -17468.8   -12940.75
       _cons |   21553.92   303.6414    70.98   0.000     20958.53    22149.32
------------------------------------------------------------------------------

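Because TREAT is a 0/1 dummy, this regression is a difference in means: the intercept is the control-group mean of RE78 and the TREAT coefficient is the treated mean minus the control mean. The heteroskedastic-robust standard error requested next with `, robust` (Stata's HC1 form for `regress`) can likewise be recovered from group summary statistics, because each residual is a deviation from its own group mean. A minimal Python sketch under those assumptions, using the RE78 means and standard deviations from the `bysort TREAT` tables above; the function name is illustrative.

```python
import math

# RE78 summary statistics copied from the bysort TREAT tables above.
n1, mean1, sd1 = 185, 6349.145, 7867.405     # treated (TREAT = 1)
n0, mean0, sd0 = 2490, 21553.92, 15555.35    # controls (TREAT = 0)


def dummy_regression(n1, mean1, sd1, n0, mean0, sd0, k=2):
    """OLS of y on a constant and a 0/1 dummy, computed from group statistics.

    Returns (intercept, slope, HC1 robust s.e. of slope). The slope is a
    weighted sum of y with weights 1/n1 (treated) and -1/n0 (controls), so its
    HC0 variance is sum(e^2)/n1^2 + sum(e^2)/n0^2 over the two groups, where
    within-group sum(e^2) = (n_g - 1) * s_g^2. HC1 scales this by n/(n - k).
    """
    n = n1 + n0
    intercept = mean0
    slope = mean1 - mean0
    hc0 = (n1 - 1) * sd1**2 / n1**2 + (n0 - 1) * sd0**2 / n0**2
    se_hc1 = math.sqrt(n / (n - k) * hc0)
    return intercept, slope, se_hc1


cons, slope, se = dummy_regression(n1, mean1, sd1, n0, mean0, sd0)
print(f"_cons {cons:.2f}  TREAT {slope:.1f}  robust s.e. {se:.1f}")
```

Up to rounding of the summary statistics, this matches the nonrobust coefficients above and the robust standard error of 655.9143 reported in the next regression.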
. * CT Table 25.4 p.891 first row uses heteroskedastic-robust standard errors
. regress RE78 TREAT, robust

Regression with robust standard errors                 Number of obs =    2675
                                                       F(  1,  2673) =  537.36
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0609
                                                       Root MSE      =   15152

------------------------------------------------------------------------------
             |               Robust
        RE78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       TREAT |  -15204.78   655.9143   -23.18   0.000    -16490.93   -13918.63
       _cons |   21553.92    311.785    69.13   0.000     20942.56    22165.29
------------------------------------------------------------------------------

. estimates store treatcontrol
. 
. ***** (1B) CONTROL FUNCTION ESTIMATOR Additionally Include pre-treatment controls
. 
. * DW99 Table 5 column 2 using regressors in footnote a
. * Same as DW99 Table 2 column 14
. regress RE78 TREAT AGE AGESQ EDUC NODEGREE BLACK HISP RE74 RE75

      Source |       SS       df       MS              Number of obs =    2675
-------------+------------------------------           F(  9,  2665) =  419.22
       Model |  3.8296e+11     9  4.2551e+10           Prob > F      =  0.0000
    Residual |  2.7050e+11  2665   101500967           R-squared     =  0.5860
-------------+------------------------------           Adj R-squared =  0.5847
       Total |  6.5346e+11  2674   244375675           Root MSE      =   10075

------------------------------------------------------------------------------
        RE78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       TREAT |   217.9438   866.1968     0.25   0.801    -1480.542     1916.43
         AGE |   158.5058   155.4065     1.02   0.308    -146.2239    463.2354
       AGESQ |  -3.232885    2.11617    -1.53   0.127    -7.382386    .9166173
        EDUC |   564.6237     103.56     5.45   0.000     361.5577    767.6898
    NODEGREE |   502.0912   647.0243     0.78   0.438    -766.6292    1770.812
       BLACK |  -699.3353   493.1811    -1.42   0.156    -1666.392    267.7211
        HISP |   2226.535    1092.71     2.04   0.042     83.88965    4369.181
        RE74 |   .2791682   .0279297    10.00   0.000     .2244021    .3339343
        RE75 |   .5680874   .0275763    20.60   0.000     .5140143    .6221605
       _cons |  -2836.703   2901.443    -0.98   0.328     -8526.01    2852.604
------------------------------------------------------------------------------

. 
. * CT Table 25.4 p.891 second row uses heteroskedastic-robust standard errors
. regress RE78 TREAT AGE AGESQ EDUC NODEGREE BLACK HISP RE74 RE75, robust

Regression with robust standard errors                 Number of obs =    2675
                                                       F(  9,  2665) =  232.85
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5860
                                                       Root MSE      =   10075

------------------------------------------------------------------------------
             |               Robust
        RE78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       TREAT |   217.9438   767.8811     0.28   0.777    -1287.759    1723.647
         AGE |   158.5058   151.0305     1.05   0.294    -137.6431    454.6546
       AGESQ |  -3.232885   2.103324    -1.54   0.124    -7.357197     .891428
        EDUC |   564.6237   121.6483     4.64   0.000     326.0891    803.1583
    NODEGREE |   502.0912   632.3685     0.79   0.427    -737.8914    1742.074
       BLACK |  -699.3353   432.4582    -1.62   0.106    -1547.323    148.6523
        HISP |   2226.535    1219.08     1.83   0.068    -163.9034    4616.974
        RE74 |   .2791682   .0618802     4.51   0.000     .1578301    .4005063
        RE75 |   .5680874   .0663995     8.56   0.000     .4378876    .6982872
       _cons |  -2836.703   2937.385    -0.97   0.334    -8596.487    2923.081
------------------------------------------------------------------------------

. estimates store controlfunction
. 
. * Variation that lets OLS coefficients differ across treatment and controls
. * Interaction of regressors with T
. gen TAGE = TREAT*AGE
. gen TAGESQ = TREAT*AGESQ
. gen TEDUC = TREAT*EDUC
. gen TNODEGREE = TREAT*NODEGREE
. gen TBLACK = TREAT*BLACK
. gen THISP = TREAT*HISP
. gen TRE74 = TREAT*RE74
. gen TRE75 = TREAT*RE75
. regress RE78 TREAT AGE AGESQ EDUC NODEGREE BLACK HISP RE74 RE75 /*
> */ TAGE TAGESQ TEDUC TNODEGREE TBLACK THISP TRE74 TRE75

      Source |       SS       df       MS              Number of obs =    2675
-------------+------------------------------           F( 17,  2657) =  223.17
       Model |  3.8431e+11    17  2.2607e+10           Prob > F      =  0.0000
    Residual |  2.6915e+11  2657   101297131           R-squared     =  0.5881
-------------+------------------------------           Adj R-squared =  0.5855
       Total |  6.5346e+11  2674   244375675           Root MSE      =   10065

------------------------------------------------------------------------------
        RE78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       TREAT |  -8202.823   11960.39    -0.69   0.493    -31655.45     15249.8
         AGE |   79.46291   165.6177     0.48   0.631    -245.2897    404.2155
       AGESQ |  -2.260967   2.239074    -1.01   0.313    -6.651471    2.129537
        EDUC |   567.4906   106.2026     5.34   0.000     359.2424    775.7388
    NODEGREE |   655.3534   679.5015     0.96   0.335     -677.052    1987.759
       BLACK |  -707.0551   505.0048    -1.40   0.162    -1697.297    283.1872
        HISP |   2553.662   1154.726     2.21   0.027     289.4107    4817.914
        RE74 |   .2869368   .0282197    10.17   0.000      .231602    .3422715
        RE75 |   .5677759   .0277689    20.45   0.000     .5133251    .6222267
        TAGE |   668.0022   745.1401     0.90   0.370    -793.1112    2129.116
      TAGESQ |  -8.651515   12.26876    -0.71   0.481     -32.7088    15.40577
       TEDUC |  -27.54033   529.1855    -0.05   0.958    -1065.197    1010.117
   TNODEGREE |  -963.4163   2410.973    -0.40   0.689    -5690.989    3764.157
      TBLACK |  -384.5853   2593.349    -0.15   0.882    -5469.772    4700.601
       THISP |  -2126.096   4086.539    -0.52   0.603    -10139.22    5887.023
       TRE74 |  -.2540934   .2070566    -1.23   0.220    -.6601018    .1519151
       TRE75 |   -.472797   .3097211    -1.53   0.127    -1.080116    .1345218
       _cons |  -1603.593   3069.895    -0.52   0.601    -7623.219    4416.032
------------------------------------------------------------------------------

. 
. ***** (1D) DIFFERENCE-IN-DIFFERENCES
. 
. * Need to stack two separate years of data RE75 and RE78
. * into a panel of two years on RE
. gen id = _n
. label variable id "id"
. gen EARNS1 = RE75
. gen EARNS2 = RE78
. reshape long EARNS, i(id) j(year)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     2675   ->    5350
Number of variables                  31   ->      31
j variable (2 values)                     ->   year
xij variables:
                          EARNS1 EARNS2   ->   EARNS
-----------------------------------------------------------------------------

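The `reshape long` step turns each individual's two earnings columns into two rows, one per period, so that the difference-in-differences regression can include a period dummy (`dyear2`) and its interaction with treatment (`Tdyear2`). A minimal Python sketch of that stacking on two hypothetical individuals, assuming plain dictionaries stand in for Stata observations; the function and field layout are illustrative, and the dummies are added here in the same loop rather than by separate `gen` commands.

```python
def reshape_long(rows):
    """Stack RE75 (year 1) and RE78 (year 2) into a two-period panel.

    Mirrors: gen id = _n; gen EARNS1 = RE75; gen EARNS2 = RE78;
             reshape long EARNS, i(id) j(year)
    then adds dyear2 = (year == 2) and Tdyear2 = TREAT * dyear2.
    """
    long_rows = []
    for i, row in enumerate(rows, start=1):          # id = _n
        for year, col in ((1, "RE75"), (2, "RE78")):
            dyear2 = 1 if year == 2 else 0
            long_rows.append({
                "id": i,
                "year": year,
                "TREAT": row["TREAT"],
                "EARNS": row[col],
                "dyear2": dyear2,
                "Tdyear2": row["TREAT"] * dyear2,
            })
    return long_rows


# Two hypothetical individuals, for illustration only.
wide = [{"TREAT": 1, "RE75": 0.0, "RE78": 5000.0},
        {"TREAT": 0, "RE75": 20000.0, "RE78": 22000.0}]
for r in reshape_long(wide):
    print(r)
```

As in the log, the number of rows doubles (2675 -> 5350 there, 2 -> 4 here) while time-invariant variables such as TREAT are repeated in both periods.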
. gen dyear2 = 0
. replace dyear2 = 1 if year==2
(2675 real changes made)
. gen Tdyear2 = TREAT*dyear2
. regress EARNS Tdyear2 TREAT dyear2

      Source |       SS       df       MS              Number of obs =    5350
-------------+------------------------------           F(  3,  5346) =  169.20
       Model |  1.0214e+11     3  3.4047e+10           Prob > F      =  0.0000
    Residual |  1.0757e+12  5346   201218724           R-squared     =  0.0867
-------------+------------------------------           Adj R-squared =  0.0862
       Total |  1.1779e+12  5349   220201247           Root MSE      =   14185

------------------------------------------------------------------------------
       EARNS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Tdyear2 |   2326.505   1528.712     1.52   0.128    -670.3928    5323.403
       TREAT |  -17531.28   1080.962   -16.22   0.000    -19650.41   -15412.15
      dyear2 |   2490.585   402.0217     6.20   0.000     1702.458    3278.711
       _cons |   19063.34   284.2723    67.06   0.000     18506.05    19620.63
------------------------------------------------------------------------------

. 
. * CT Table 25.4 p.891 fourth row uses heteroskedastic-robust standard errors
. regress EARNS Tdyear2 TREAT dyear2, robust

Regression with robust standard errors                 Number of obs =    5350
                                                       F(  3,  5346) = 1222.98
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0867
                                                       Root MSE      =   14185

------------------------------------------------------------------------------
             |               Robust
       EARNS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Tdyear2 |   2326.505   748.5021     3.11   0.002     859.1359    3793.875
       TREAT |  -17531.28   360.5992   -48.62   0.000     -18238.2   -16824.36
      dyear2 |   2490.585   414.1056     6.01   0.000     1678.769      3302.4
       _cons |   19063.34   272.5318    69.95   0.000     18529.06    19597.61
------------------------------------------------------------------------------

. estimates store diffindiff
. 
. * Adding pretreatment controls makes no difference as they are time-invariant
. regress EARNS Tdyear2 TREAT dyear2 AGE AGESQ EDUC NODEGREE BLACK HISP

      Source |       SS       df       MS              Number of obs =    5350
-------------+------------------------------           F(  9,  5340) =  184.54
       Model |  2.7943e+11     9  3.1048e+10           Prob > F      =  0.0000
    Residual |  8.9843e+11  5340   168245017           R-squared     =  0.2372
-------------+------------------------------           Adj R-squared =  0.2359
       Total |  1.1779e+12  5349   220201247           Root MSE      =   12971

------------------------------------------------------------------------------
       EARNS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Tdyear2 |   2326.505   1397.856     1.66   0.096    -413.8634    5066.874
       TREAT |  -9766.469   1043.296    -9.36   0.000    -11811.76   -7721.183
      dyear2 |   2490.585   367.6092     6.78   0.000      1769.92    3211.249
         AGE |   1357.093   139.6885     9.72   0.000     1083.246    1630.939
       AGESQ |  -15.23373   1.911801    -7.97   0.000    -18.98164   -11.48582
        EDUC |   1504.728   91.99622    16.36   0.000     1324.377    1685.078
    NODEGREE |  -447.8275   588.8841    -0.76   0.447    -1602.281    706.6257
       BLACK |  -3177.524   446.5098    -7.12   0.000    -4052.865   -2302.182
        HISP |  -360.5058   993.7164    -0.36   0.717    -2308.596    1587.584
       _cons |  -25357.74   2618.207    -9.69   0.000    -30490.49   -20224.98
------------------------------------------------------------------------------

. 
. ***** (1C) BEFORE-AFTER COMPARISON
. 
. * Regression for treated only
. regress EARNS Tdyear2 if TREAT==1

      Source |       SS       df       MS              Number of obs =     370
-------------+------------------------------           F(  1,   368) =   59.41
       Model |  2.1464e+09     1  2.1464e+09           Prob > F      =  0.0000
    Residual |  1.3296e+10   368  36129816.6           R-squared     =  0.1390
-------------+------------------------------           Adj R-squared =  0.1367
       Total |  1.5442e+10   369  41848713.4           Root MSE      =  6010.8

------------------------------------------------------------------------------
       EARNS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Tdyear2 |    4817.09   624.9741     7.71   0.000     3588.121    6046.058
       _cons |   1532.056   441.9234     3.47   0.001     663.0436    2401.068
------------------------------------------------------------------------------

. 
. * CT Table 25.4 p.891 third row uses heteroskedastic-robust standard errors
. regress EARNS Tdyear2 if TREAT==1, robust

Regression with robust standard errors                 Number of obs =     370
                                                       F(  1,   368) =   59.41
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1390
                                                       Root MSE      =  6010.8

------------------------------------------------------------------------------
             |               Robust
       EARNS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Tdyear2 |    4817.09   624.9741     7.71   0.000     3588.121    6046.058
       _cons |   1532.056    236.684     6.47   0.000     1066.633    1997.478
------------------------------------------------------------------------------

. estimates store beforeafter
. 
. ***** DISPLAY RESULTS FOR FIRST FOUR ROWS OF Table 25.4, p.891
. 
. estimates table treatcontrol controlfunction beforeafter diffindiff, /*
> */ b(%10.0f) se(%10.0f) stats(N)

------------------------------------------------------------------
    Variable | treatcon~l   controlf~n   beforeaf~r   diffindiff
-------------+----------------------------------------------------
       TREAT |     -15205          218                    -17531
             |        656          768                       361
         AGE |                     159
             |                     151
       AGESQ |                      -3
             |                       2
        EDUC |                     565
             |                     122
    NODEGREE |                     502
             |                     632
       BLACK |                    -699
             |                     432
        HISP |                    2227
             |                    1219
        RE74 |                       0
             |                       0
        RE75 |                       1
             |                       0
     Tdyear2 |                                 4817         2327
             |                                  625          749
      dyear2 |                                              2491
             |                                               414
       _cons |      21554        -2837         1532        19063
             |        312         2937          237          273
-------------+----------------------------------------------------
           N |       2675         2675          370         5350
------------------------------------------------------------------
                                                      legend: b/se

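The before-after and difference-in-differences rows of this table can be checked by hand. The regression of EARNS on TREAT, dyear2 and Tdyear2 is saturated in the four treatment-by-period cells, so each coefficient is a difference of cell means; the before-after estimate is the treated group's change alone. A minimal Python sketch using the RE75 (before) and RE78 (after) means from the `bysort TREAT` tables earlier in this log; the variable names are illustrative.

```python
# Cell means of earnings from the bysort TREAT tables (RE75 = before, RE78 = after).
treated = {"before": 1532.056, "after": 6349.145}
control = {"before": 19063.34, "after": 21553.92}

# Saturated 2x2 regression EARNS = a + b*TREAT + c*dyear2 + d*Tdyear2:
# each coefficient is a difference of the cell means above.
a = control["before"]                                # _cons
b = treated["before"] - control["before"]            # TREAT
c = control["after"] - control["before"]             # dyear2
d = (treated["after"] - treated["before"]) - c       # Tdyear2, the DiD estimate
before_after = treated["after"] - treated["before"]  # (1C) Tdyear2, treated only

print(f"_cons {a:.2f}  TREAT {b:.2f}  dyear2 {c:.2f}  Tdyear2 {d:.2f}")
print(f"before-after {before_after:.2f}")
```

Up to rounding of the reported means, these match the diffindiff and beforeafter coefficients. The same logic explains why adding time-invariant controls left the Tdyear2 coefficient unchanged at 2326.505.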
. ********** ANALYSIS: (2) PROPENSITY SCORE USING STRATA (Table 25.4, p.891) **********
. 
. use nswpsid, clear
. 
. ***** (2A) COMPUTE PROPENSITY SCORE
. 
. * Calculate propensity score using regressors in DW99 Table 3 footnote e
. logit TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ U74BLACK

Iteration 0:   log likelihood = -672.64954
Iteration 1:   log likelihood = -499.56574
Iteration 2:   log likelihood = -318.55053
Iteration 3:   log likelihood = -248.28844
Iteration 4:   log likelihood = -225.08984
Iteration 5:   log likelihood = -219.00396
Iteration 6:   log likelihood = -209.30653
Iteration 7:   log likelihood = -208.38887
Iteration 8:   log likelihood = -205.17689
Iteration 9:   log likelihood = -204.93156
Iteration 10:  log likelihood = -204.92951
Iteration 11:  log likelihood =  -204.9295

Logit estimates                                   Number of obs   =       2675
                                                  LR chi2(13)     =     935.44
                                                  Prob > chi2     =     0.0000
Log likelihood =  -204.9295                       Pseudo R2       =     0.6953

------------------------------------------------------------------------------
       TREAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   .3305734   .1203353     2.75   0.006     .0947206    .5664262
       AGESQ |  -.0063429   .0018561    -3.42   0.001    -.0099808   -.0027049
        EDUC |   .8247711   .3534216     2.33   0.020     .1320775    1.517465
      EDUCSQ |  -.0483153   .0186057    -2.60   0.009    -.0847819   -.0118488
        MARR |  -1.884062   .2994614    -6.29   0.000    -2.470996   -1.297129
    NODEGREE |   .1299868   .4284278     0.30   0.762    -.7097163      .96969
       BLACK |   1.132961    .352088     3.22   0.001     .4428814    1.823041
        HISP |   1.962762   .5673735     3.46   0.001     .8507302    3.074793
        RE74 |  -.0001047   .0000355    -2.95   0.003    -.0001743   -.0000351
        RE75 |  -.0002172   .0000415    -5.23   0.000    -.0002986   -.0001357
      RE74SQ |   2.36e-09   6.57e-10     3.59   0.000     1.07e-09    3.65e-09
      RE75SQ |   1.58e-10   6.68e-10     0.24   0.813    -1.15e-09    1.47e-09
    U74BLACK |   2.137042   .4273667     5.00   0.000     1.299419    2.974665
       _cons |  -7.552458   2.451721    -3.08   0.002    -12.35774   -2.747173
------------------------------------------------------------------------------
note: 19 failures and 0 successes completely determined.

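The logit fitted above gives each observation's propensity score as the logistic function of its linear index x'b, which `predict PSCORE` returns as Pr(TREAT). The next step keeps only observations whose score lies inside both the treated and the control score ranges; because PTMIN, PTMAX, PCMIN and PCMAX are all computed before any `drop`, the four `drop if` lines amount to one interval [max of the two minima, min of the two maxima]. A minimal Python sketch of both pieces under those assumptions; the function names and the toy scores are hypothetical.

```python
import math


def logistic(xb):
    """Logit probability exp(xb) / (1 + exp(xb)), written to avoid overflow."""
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    z = math.exp(xb)
    return z / (1.0 + z)


def common_support(pscores, treat):
    """Indices of observations inside the overlap of treated and control score ranges.

    Equivalent to dropping PSCORE < PTMIN, PSCORE < PCMIN, PSCORE > PTMAX and
    PSCORE > PCMAX, with all four bounds computed on the untrimmed sample.
    """
    p_t = [p for p, d in zip(pscores, treat) if d == 1]
    p_c = [p for p, d in zip(pscores, treat) if d == 0]
    lo = max(min(p_t), min(p_c))
    hi = min(max(p_t), max(p_c))
    return [i for i, p in enumerate(pscores) if lo <= p <= hi]


# Hypothetical scores: four controls followed by two treated units.
scores = [0.01, 0.2, 0.5, 0.99, 0.3, 0.95]
status = [0, 0, 0, 0, 1, 1]
print(common_support(scores, status))  # [2, 4, 5]
```

In the log the binding bounds are the treated minimum (.0006526, which removes 1344 controls) and the control maximum (.9735255, which removes 6 treated units).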
. * Note that Table 25.6 footnote b is wrong in stating RE74*RE75 is regressor
. predict PSCORE
(option p assumed; Pr(TREAT))

. 
. ***** (2B) PLOT PROPENSITY SCORE BY TREATMENT STATUS TO SEE OVERLAP
. 
. * Observations with no overlap in propensity score across treatment status are dropped
. 
. sum PSCORE if TREAT==1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      PSCORE |     185    .6876511    .3095136   .0006526   .9748755

. scalar PTMIN = r(min)
. scalar PTMAX = r(max)
. sum PSCORE if TREAT==0

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      PSCORE |    2490    .0232066    .0901373   4.49e-11   .9735255

. scalar PCMIN = r(min)
. scalar PCMAX = r(max)
. drop if PSCORE < PTMIN
(1344 observations deleted)
. drop if PSCORE < PCMIN
(0 observations deleted)
. drop if PSCORE > PTMAX
(0 observations deleted)
. drop if PSCORE > PCMAX
(6 observations deleted)
. * Following gives number of observations left
. sum PSCORE

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      PSCORE |    1325    .1350934    .2703797   .0006526   .9735255

. 
. * This differs from CT text page 893 as now U74 and U75 are corrected
. * Instead of losing 1423 controls and 8 treated leaving 1244
. * now lose 1344 controls and 6 treated leaving 1325
. * versus DW Figure 1, where 1333 controls are dropped leaving 1342,
. * and DW Table 3 column 6, which says that there are 1255 left
. 
. ***** (2C) CREATE FIGURE 25.3 ON PAGE 892
. 
. * This will differ a little from figure in text due to U74 and U75 corrected
. 
. label define tstatus 0 Comparison_sample 1 Treated_sample
. label values TREAT tstatus
. label variable TREAT "Treatment Status"
. graph twoway (scatter RE78 PSCORE if RE78 < 20000, msize(small)) /*
> */ (lowess RE78 PSCORE, bwidth(0.5) clpattern(solid)), /*
> */ by(TREAT, title("Post-treatment Earnings against Propensity Score", margin(b=3) size(vlarge))
> ) /*
> */ subtitle(, bfcolor(none)) /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle(" Propensity Score Propensity Score", size(medlarge))
> xscale(titlegap(*5)) /*
> */ ytitle("Real Earnings 1978", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(2)) /*
> */ legend( label(1 "Original data") label(2 "Nonparametric regression"))
. graph export ch25treatment.wmf, replace
(file c:\Imbook\bwebpage\Section6\ch25treatment.wmf written in Windows Metafile format)

. 
. ***** (2D) ADJUSTED DIFFERENCE: use PSCORE to summarize pre-treatment controls
. 
. * A simple method regresses RE78 on a quadratic in PSCORE and on TREAT
. * and measures the treatment effect as the coefficient of TREAT
. 
. gen PSCORESQ = PSCORE*PSCORE
. regress RE78 TREAT PSCORE PSCORESQ

      Source |       SS       df       MS              Number of obs =    1325
-------------+------------------------------           F(  3,  1321) =   46.14
       Model |  1.5152e+10     3  5.0505e+09           Prob > F      =  0.0000
    Residual |  1.4458e+11  1321   109450232           R-squared     =  0.0949
-------------+------------------------------           Adj R-squared =  0.0928
       Total |  1.5974e+11  1324   120645977           Root MSE      =   10462

------------------------------------------------------------------------------
        RE78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       TREAT |   301.5344   1388.756     0.22   0.828    -2422.874    3025.943
      PSCORE |  -39475.21   4836.678    -8.16   0.000    -48963.62    -29986.8
    PSCORESQ |   33122.86   5037.943     6.57   0.000     23239.61     43006.1
       _cons |   14560.51   347.3596    41.92   0.000     13879.07    15241.95
------------------------------------------------------------------------------

. 
. * This yields coefficient of 301 with nonrobust se of 1388
. * which is close to DW99 Table 3 column 3
. * coefficient of 294 with nonrobust se of 1389
. 
. ***** (2E) CREATE STRATA
. 
. * DW are not clear on how the strata are formed.
. * NBER Working Paper W6829 appendix suggests forming five cells
. * according to range of PSCORE (where nonoverlapping PSCOREs already dropped)
. 
. * Here we instead create ten strata
. * for PSCORE <= 0.1, 0.1-0.2, ..., 0.8-0.9 and > 0.9
. global cut1 = 0.1
. global cut2 = 0.2
. global cut3 = 0.3
. global cut4 = 0.4
. global cut5 = 0.5
. global cut6 = 0.6
. global cut7 = 0.7
. global cut8 = 0.8
. global cut9 = 0.9
. gen STRATA = 1
. replace STRATA = 2 if PSCORE > $cut1 & PSCORE <= $cut2
(60 real changes made)
. replace STRATA = 3 if PSCORE > $cut2 & PSCORE <= $cut3
(35 real changes made)
. replace STRATA = 4 if PSCORE > $cut3 & PSCORE <= $cut4
(33 real changes made)
. replace STRATA = 5 if PSCORE > $cut4 & PSCORE <= $cut5
(13 real changes made)
. replace STRATA = 6 if PSCORE > $cut5 & PSCORE <= $cut6
(21 real changes made)
. replace STRATA = 7 if PSCORE > $cut6 & PSCORE <= $cut7
(22 real changes made)
. replace STRATA = 8 if PSCORE > $cut7 & PSCORE <= $cut8
(13 real changes made)
. replace STRATA = 9 if PSCORE > $cut8 & PSCORE <= $cut9
(13 real changes made)
. replace STRATA = 10 if PSCORE > $cut9
(86 real changes made)
. 
. tab STRATA T

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 |     1,018         11 |     1,029
         2 |        53          7 |        60
         3 |        24         11 |        35
         4 |        17         16 |        33
         5 |         8          5 |        13
         6 |         6         15 |        21
         7 |         8         14 |        22
         8 |         5          8 |        13
         9 |         0         13 |        13
        10 |         7         79 |        86
-----------+----------------------+----------
     Total |     1,146        179 |     1,325

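The strata are right-closed intervals of the propensity score: stratum 1 holds scores up to 0.1, stratum j holds scores in (cut(j-1), cut(j)], and stratum 10 holds scores above 0.9. Step (2G) below then averages the within-stratum treatment estimates, weighting each by its number of treated observations and skipping strata that yield no estimate (stratum 9 has no comparison observations). A minimal Python sketch of both steps, assuming `None` marks a stratum without an estimate (the Stata loop instead tests whether the coefficient is 0); the function names are hypothetical.

```python
from bisect import bisect_left

CUTS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]   # $cut1 ... $cut9


def stratum(pscore, cuts=CUTS):
    """1 for p <= 0.1, j for cut(j-1) < p <= cut(j), 10 for p > 0.9.

    bisect_left counts the cuts strictly below p, which gives the
    right-closed intervals used by the 'replace STRATA' lines.
    """
    return bisect_left(cuts, pscore) + 1


def weighted_effect(estimates, treated_counts):
    """Average within-stratum estimates, weighted by treated observations.

    Strata whose estimate is None are skipped, so they contribute neither
    to the weighted sum nor to the total weight.
    """
    pairs = [(b, n) for b, n in zip(estimates, treated_counts) if b is not None]
    total = sum(n for _, n in pairs)
    return sum(b * n for b, n in pairs) / total


# Toy numbers, for illustration only.
print([stratum(p) for p in (0.05, 0.1, 0.15, 0.9, 0.95)])   # [1, 1, 2, 9, 10]
print(weighted_effect([100.0, None, 300.0], [1, 5, 3]))      # 250.0
```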
. 
. ***** (2F) Test for similar regressor means for treated and nontreated within each Strata
. 
. * Compare means within Strata across treatment status
. tab STRATA TREAT, sum(AGE) nostand nofreq

           Means of AGE

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | 31.427308  30.363636 | 31.415938
         2 | 28.037736  28.714286 | 28.116667
         3 | 27.833333  27.909091 | 27.857143
         4 | 27.529412      28.25 | 27.878788
         5 |    28.875       27.8 | 28.461538
         6 |        25       23.4 | 23.857143
         7 |    24.875       24.5 | 24.636364
         8 |      24.8         32 | 29.230769
         9 |         .  29.461538 | 29.461538
        10 | 23.285714  23.367089 | 23.360465
-----------+----------------------+----------
     Total | 30.961606  25.765363 | 30.259623

. tab STRATA TREAT, sum(EDUC) nostand nofreq

           Means of EDUC

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | 11.229862  11.545455 | 11.233236
         2 | 10.433962  10.714286 | 10.466667
         3 | 10.583333  10.181818 | 10.457143
         4 | 10.647059    10.0625 | 10.363636
         5 |    10.625        9.4 | 10.153846
         6 | 9.3333333  10.066667 | 9.8571429
         7 |     9.875  11.071429 | 10.636364
         8 |      10.8      11.25 | 11.076923
         9 |         .         11 |        11
        10 | 10.571429  10.164557 | 10.197674
-----------+----------------------+----------
     Total | 11.141361  10.413408 | 11.043019

. tab STRATA TREAT, sum(MARR) nostand nofreq

           Means of MARR

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 |  .8280943  .81818182 | .82798834
         2 | .56603774  .85714286 |        .6
         3 | .29166667  .18181818 | .25714286
         4 | .23529412        .25 | .24242424
         5 |       .25          0 | .15384615
         6 | .16666667  .06666667 |  .0952381
         7 |      .125  .07142857 | .09090909
         8 |        .2       .625 | .46153846
         9 |         .  .53846154 | .53846154
        10 |         0          0 |         0
-----------+----------------------+----------
     Total | .77574171  .19553073 | .69735849

. tab STRATA TREAT, sum(NODEGREE) nostand nofreq

           Means of NODEGREE

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | .38408644  .36363636 | .38386783
         2 | .62264151  .57142857 | .61666667
         3 |      .625  .54545455 |        .6
         4 | .52941176       .625 | .57575758
         5 |      .625         .8 | .69230769
         6 | .83333333         .8 | .80952381
         7 |      .625  .64285714 | .63636364
         8 |        .8        .75 | .76923077
         9 |         .  .76923077 | .76923077
        10 | .71428571  .75949367 | .75581395
-----------+----------------------+----------
     Total | .41186736  .69832402 | .45056604

. tab STRATA TREAT, sum(BLACK) nostand nofreq

           Means of BLACK

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | .36247544  .63636364 |  .3654033
         2 | .60377358  .57142857 |        .6
         3 | .66666667  .54545455 | .62857143
         4 | .88235294       .875 | .87878788
         5 |         1         .4 | .76923077
         6 | .83333333         .6 | .66666667
         7 |      .875  .92857143 | .90909091
         8 |        .8          1 | .92307692
         9 |         .  .92307692 | .92307692
        10 |         1  .94936709 | .95348837
-----------+----------------------+----------
     Total | .40401396  .83798883 | .46264151

. tab STRATA TREAT, sum(HISP) nostand nofreq

           Means of HISP

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | .04911591          0 | .04859086
         2 |  .0754717  .28571429 |        .1
         3 | .08333333          0 | .05714286
         4 |         0          0 |         0
         5 |         0         .2 | .07692308
         6 | .16666667  .13333333 | .14285714
         7 |      .125  .07142857 | .09090909
         8 |        .2          0 | .07692308
         9 |         .  .07692308 | .07692308
        10 |         0  .05063291 | .04651163
-----------+----------------------+----------
     Total | .05148342  .06145251 | .05283019

. tab STRATA TREAT, sum(RE74) nostand nofreq

           Means of RE74

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | 12216.528   12142.62 | 12215.738
         2 | 5989.8844  2031.6573 | 5528.0912
         3 | 6476.1906  5884.7335 | 6290.3041
         4 |  4790.868    4895.09 | 4841.3999
         5 | 2375.3662  5715.8799 | 3660.1792
         6 | 3173.6867  2402.9567 | 2623.1653
         7 | 1533.1259  2269.1672 | 2001.5158
         8 |  1567.414          0 | 602.85154
         9 |         .  34.243847 | 34.243847
        10 |         0          0 |         0
-----------+----------------------+----------
     Total | 11386.483  2165.8167 | 10140.823

. tab STRATA TREAT, sum(RE75) nostand nofreq

           Means of RE75

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | 10352.924  8964.4728 | 10338.081
         2 |  3916.448  3250.0113 |  3838.697
         3 | 2417.8314  2694.2624 | 2504.7097
         4 |   3134.96   2905.615 | 3023.7624
         5 | 3204.6788   1917.262 | 2709.5185
         6 |   2878.54  1731.1554 | 2058.9796
         7 | 643.84411  1230.5051 | 1017.1739
         8 | 2539.0337  1501.9275 | 1900.8145
         9 |         .  201.91542 | 201.91542
        10 | 127.88014  234.47151 | 225.79547
-----------+----------------------+----------
     Total | 9528.6389  1583.4094 | 8455.2834

. tab STRATA TREAT, sum(U74BLACK) nostand nofreq

           Means of U74BLACK

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | .01473477          0 | .01457726
         2 | .05660377  .14285714 | .06666667
         3 | .08333333  .09090909 | .08571429
         4 | .17647059      .1875 | .18181818
         5 |       .25         .2 | .23076923
         6 | .16666667  .06666667 |  .0952381
         7 |      .125  .21428571 | .18181818
         8 |        .4          1 | .76923077
         9 |         .  .92307692 | .92307692
        10 |         1  .94936709 | .95348837
-----------+----------------------+----------
     Total | .03141361  .58659218 | .10641509

. 
. * Formal test of difference in means within strata across treatment status
. * Example is for education
. * bysort STRATA: oneway EDUC T
. 
. ***** (2G) Calculate weighted average of within strata mean difference in outcome
. 
. #delimit ;
delimiter now ;
. global sum = 0 ;
. * Sums the estimate of interest over strata ;
. global sumwgt = 0 ;
. /* Sums the number of treated obs over strata */
> global count = 0 ;
. /* This gives the number of Strata used */
> global numcut = 10;
. * Possibly include extra regressors.
> * Not clear which ones, so same as DW99 Table 3 footnote a for column 2
> global XLIST AGE AGESQ EDUC NODEGREE BLACK HISP RE74 RE75;
. forvalues i = 1/$numcut { ;
  2.   global addon = 0 ;
  3.   /* Within strata estimate of interest */
>   global tobs = 0 ;
  4.   /* Within strata number of treated obs */
>   capture { ;
  5.     quiet regress RE78 TREAT $XLIST if STRATA == `i' ;
  6.     global addon = _b[TREAT] ;
  7.     quiet summarize TREAT if TREAT==1 & STRATA==`i' ;
  8.     global tobs = _result(1) ;
  9.     * # of treatment observations ;
. };
 10.   di "`i' estimate = $addon Top cut = ${cut`i'} #treat obs = $tobs" ;
 11.   if $addon ~= 0 { ;
 12.     global sum = $sum + $addon * $tobs ;
 13.     global sumwgt = $sumwgt + $tobs ;
 14.     global count = $count + 1 ;
 15.   } ;
 16. } ;
1 estimate = -4410.946812653378 Top cut = .1 #treat obs = 11
2 estimate = -2113.275144674707 Top cut = .2 #treat obs = 7
3 estimate = 1486.684503266305 Top cut = .3 #treat obs = 11
4 estimate = -6085.742371951832 Top cut = .4 #treat obs = 16
5 estimate = 1899.984014892578 Top cut = .5 #treat obs = 5
6 estimate = -411.1481648763024 Top cut = .6 #treat obs = 15
7 estimate = 133.9267490931921 Top cut = .7 #treat obs = 14
8 estimate = 1848.656362915039 Top cut = .8 #treat obs = 8
9 estimate = 0 Top cut = .9 #treat obs = 13
10 estimate = 4857.563579676591 Top cut = #treat obs = 79
. 
. #delimit cr ;
delimiter now cr
. 
. ***** DISPLAY RESULT: "Propensity Score" estimate in last row Table 25.4
. 
. * Weighted estimate
. di $sum / $sumwgt " Count = " $count
1562.7274 Count = 9
. 
. * This differs from value 995 given in text due to
. * previously mentioned correction of U74 and U75.
. * Now get 1562 with se not estimated
. * compared to DW99 estimates Table 3 column 4 1608 and column 5 1494
. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section6\mma25p1treatment.txt
  log type:  text
 closed on:  26 May 2005, 10:26:22
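As a check on the arithmetic of the `forvalues` loop, the Python sketch below recomputes the weighted average from the per-stratum estimates and treated counts printed above. It is an illustrative re-implementation, not part of the original do-file, and the function name `stratified_att` is hypothetical. As in the Stata loop, a stratum is skipped when its estimate is exactly 0 (stratum 9 here), so its 13 treated observations do not enter the weights.

```python
def stratified_att(estimates, treated_counts):
    """Weighted average of within-stratum estimates, weights = # treated obs.

    Mirrors the Stata loop: strata whose estimate is exactly 0 are skipped
    (the `if $addon ~= 0` test), so they add nothing to the sum or weights.
    Returns (weighted average, number of strata used).
    """
    total = 0.0
    weight = 0
    used = 0
    for est, n_treat in zip(estimates, treated_counts):
        if est != 0:
            total += est * n_treat
            weight += n_treat
            used += 1
    return total / weight, used


# Per-stratum TREAT coefficients and treated counts copied from the log above
estimates = [-4410.946812653378, -2113.275144674707, 1486.684503266305,
             -6085.742371951832, 1899.984014892578, -411.1481648763024,
             133.9267490931921, 1848.656362915039, 0.0, 4857.563579676591]
treated = [11, 7, 11, 16, 5, 15, 14, 8, 13, 79]

att, count = stratified_att(estimates, treated)
print(f"{att:.4f}  Count = {count}")   # 1562.7274  Count = 9
```

The weights sum to 166 treated observations, of which stratum 10 alone contributes 79, so the top stratum carries almost half of the weight in this average.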

------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section6\mma25p2matching.txt
  log type:  text
 opened on:  26 May 2005, 10:26:31
. 
. ********** OVERVIEW OF MMA25P2MATCHING.DO **********
. 
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
. 
. * Chapter 25.8.5 pages 893-6 Tables 25.5-25.7
. * Evaluating treatment effect of training on Earnings
. * using Dehejia-Wahba data (originally Lalonde data)
. 
. * (1) For DW 2002 specification of the logit model for propensity score
. *     calculate treatment effect by matching methods (Tables 25.5-6)
. *     ( ) give distribution of propensity score (Table 25.5)
. *     (1A) nearest neighbor matching
. *     (1B) radius matching r = 0.001
. *     (1C) radius matching r = 0.0001
. *     (1D) radius matching r = 0.00001
. *     (1E) stratification
. *     (1F) kernel matching
. * (2) For DW 1999 specification of the logit model for propensity score
. *     calculate treatment effect by matching methods (Table 25.6)
. 
. * The program MMA25P1TREATMENT.DO provides simpler nonmatching methods
. * for the same data.
. 
. * To run this program you need data file
. *     nswpsid.da1
. 
. * To run this program you need the Stata add-ons
. *     pscore.ado, atts.ado, attr.ado, attnd.ado, attnw.ado
. * due to Sascha O. Becker and Andrea Ichino (2002)
. * "Estimation of average treatment effects based on propensity scores",
. * The Stata Journal, Vol.2, No.4, pp. 358-377.
. 
. * This program uses version 2.02 May 13 2005 for Stata version 8
. * downloadable from http://www.iue.it/Personal/Ichino/#pscore
. * We earlier used version 1.29 October 8 2002 for Stata version 7
. * downloadable from http://www.iue.it/Personal/Ichino/#pscore
. * and obtained the same results
. 
. * To speed up the program reduce breps: the number of bootstrap
. * replications used to obtain bootstrap standard errors
. * Bootstrap se's will differ from text as here seed is set to 10101
. 
. ********** STATA SETUP **********
. 
. set more off
. version 8
. set scheme s1mono   /* Used for graphs */
. 
. ********** DATA DESCRIPTION **********
. 

. * Data set nswpsid.da1 is the data set nswpsid.da1 from Guido Imbens
. * http://emlab.berkeley.edu/users/imbens/index.shtml
. 
. * Data originally from DW99
. * R.H. Dehejia and S. Wahba (1999)
. * "Causal Effects in Nonexperimental Studies: reevaluating the
. * Evaluation of Training Programs", JASA, 1053-1062
. * or DW02
. * R.H. Dehejia and S. Wahba (2002)
. * "Propensity-score Matching Methods for Nonexperimental Causal
. * Studies", ReStat, 151-161
. * which in turn are from
. * Lalonde, R. (1986), "Evaluating the Econometric Evaluations of
. * Training Programs with Experimental Data," AER, 604-620.
. 
. * Each observation is for an individual.
. * There are 2,675 observations: 185 in treated group and 2490 in control
. 
. * Variables are
. *   TREAT  1 if treated (NSW treated) and 0 if not (PSID-1 control)
. *   AGE    in years
. *   EDUC   in years
. *   BLACK  1 if black
. *   HISP   1 if hispanic
. *   MARR   1 if married
. *   RE74   Real annual earnings in 1974 (pre-treatment)
. *   RE75   Real annual earnings in 1975 (pre-treatment)
. *   RE78   Real annual earnings in 1978 (post-treatment)
. *   U74    1 if unemployed in 1974
. *   U75    1 if unemployed in 1975
. 
. * NOTE: U74 and U75 are miscoded in these data and also in the
. * summary statistics table of DW02
. * See below for correction to data
. 
. ********** READ DATA AND TRANSFORMATIONS **********
. 
. ****** propensity score for nsw-psid composite sample *************
. ****** output for MMA Tables 25.6 & 25.7 ***********************
. 
. infile TREAT AGE EDUC BLACK HISP MARR RE74 RE75 RE78 U74 U75 /*
> */ using nswpsid.da1
(2675 observations read)
. 
. * The original data reversed U74 and U75
. * Should be U74=1 if RE74=0 and U74=0 if RE74>0, and similarly for U75
. * This affects results with propensity score though not earlier results
. 
. * Wrong U74 and U75
. sum U74 U75

    Variable |   Obs        Mean   Std. Dev.    Min    Max
-------------+---------------------------------------------
         U74 |  2675    .1345794    .3413376      0      1
         U75 |  2675    .1293458     .335645      0      1
. 
. * Correct the original data
. drop U74 U75
. gen U74 = cond(RE74 == 0, 1, 0)
. gen U75 = cond(RE75 == 0, 1, 0)
. 
. * Correct U74 and U75
. sum U74 U75

    Variable |   Obs        Mean   Std. Dev.    Min    Max
-------------+---------------------------------------------
         U74 |  2675    .1293458     .335645      0      1
         U75 |  2675    .1345794    .3413376      0      1
. 
. * Create regressors used as additional controls in regressions below
. gen AGESQ = AGE*AGE
. gen EDUCSQ = EDUC*EDUC
. * DW99 do not define NODEGREE but following gives Table 1 means
. gen NODEGREE = 0
. replace NODEGREE = 1 if EDUC < 12
(891 real changes made)
. gen RE74SQ = RE74*RE74
. gen RE75SQ = RE75*RE75
. gen U74BLACK = U74*BLACK
. gen U74HISP = U74*HISP
. 
. sum AGE EDUC NODEGREE BLACK HISP MARR U74 U75 RE74 RE75 RE78 TREAT /*
> */ AGESQ EDUCSQ RE74SQ RE75SQ U74BLACK U74HISP

    Variable |   Obs        Mean   Std. Dev.    Min        Max
-------------+-------------------------------------------------
         AGE |  2675    34.22579    10.49984     17         55
        EDUC |  2675    11.99439    3.053556      0         17
    NODEGREE |  2675    .3330841    .4714045      0          1
       BLACK |  2675    .2915888    .4545789      0          1
        HISP |  2675    .0343925    .1822693      0          1
-------------+-------------------------------------------------
        MARR |  2675    .8194393    .3847257      0          1
         U74 |  2675    .1293458     .335645      0          1
         U75 |  2675    .1345794    .3413376      0          1
        RE74 |  2675       18230    13722.25      0     137149
        RE75 |  2675    17850.89    13877.78      0     156653
-------------+-------------------------------------------------
        RE78 |  2675    20502.38    15632.52      0     121174
       TREAT |  2675    .0691589    .2537716      0          1
       AGESQ |  2675     1281.61    766.8415    289       3025
      EDUCSQ |  2675    153.1862    70.62231      0        289
      RE74SQ |  2675    5.21e+08    8.47e+08      0   1.88e+10
-------------+-------------------------------------------------
      RE75SQ |  2675    5.11e+08    8.91e+08      0   2.45e+10
    U74BLACK |  2675    .0549533    .2279316      0          1
     U74HISP |  2675    .0056075    .0746868      0          1
. 
. bysort TREAT: sum AGE EDUC NODEGREE BLACK HISP MARR U74 U75 RE74 RE75 RE78 TREAT /*
> */ AGESQ EDUCSQ RE74SQ RE75SQ U74BLACK U74HISP

---------------------------------------------------------------
-> TREAT = 0

    Variable |   Obs        Mean   Std. Dev.    Min        Max
-------------+-------------------------------------------------
         AGE |  2490     34.8506    10.44076     18         55
        EDUC |  2490    12.11687    3.082435      0         17
    NODEGREE |  2490    .3052209    .4605934      0          1
       BLACK |  2490    .2506024     .433447      0          1
        HISP |  2490    .0325301    .1774389      0          1
-------------+-------------------------------------------------
        MARR |  2490    .8662651    .3404357      0          1
         U74 |  2490    .0863454    .2809298      0          1
         U75 |  2490          .1    .3000603      0          1
        RE74 |  2490    19428.75    13406.88      0     137149
        RE75 |  2490    19063.34    13596.95      0     156653
-------------+-------------------------------------------------
        RE78 |  2490    21553.92    15555.35      0     121174
       TREAT |  2490           0           0      0          0
       AGESQ |  2490     1323.53     769.796    324       3025
      EDUCSQ |  2490    156.3161    71.43048      0        289
      RE74SQ |  2490    5.57e+08    8.66e+08      0   1.88e+10
-------------+-------------------------------------------------
      RE75SQ |  2490    5.48e+08    9.12e+08      0   2.45e+10
    U74BLACK |  2490    .0144578    .1193923      0          1
     U74HISP |  2490    .0036145    .0600237      0          1

---------------------------------------------------------------
-> TREAT = 1

    Variable |   Obs        Mean   Std. Dev.    Min        Max
-------------+-------------------------------------------------
         AGE |   185    25.81622    7.155019     17         48
        EDUC |   185    10.34595     2.01065      4         16
    NODEGREE |   185    .7081081    .4558666      0          1
       BLACK |   185    .8432432    .3645579      0          1
        HISP |   185    .0594595    .2371244      0          1
-------------+-------------------------------------------------
        MARR |   185    .1891892    .3927217      0          1
         U74 |   185    .7081081    .4558666      0          1
         U75 |   185          .6    .4912274      0          1
        RE74 |   185    2095.574    4886.623      0    35040.1
        RE75 |   185    1532.056    3219.251      0    25142.2
-------------+-------------------------------------------------
        RE78 |   185    6349.145    7867.405      0    60307.9
       TREAT |   185           1           0      1          1
       AGESQ |   185    717.3946    431.2517    289       2304
      EDUCSQ |   185    111.0595    39.30388     16        256
      RE74SQ |   185    2.81e+07    1.14e+08      0   1.23e+09
-------------+-------------------------------------------------
      RE75SQ |   185    1.27e+07    5.60e+07      0   6.32e+08
    U74BLACK |   185          .6    .4912274      0          1
     U74HISP |   185    .0324324    .1776263      0          1
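The U74/U75 correction applied earlier in this log (`gen U74 = cond(RE74 == 0, 1, 0)`) simply redefines each unemployment dummy as an indicator of zero real earnings in that year, so its mean is the share with zero earnings (for example .7081081 for U74 among the treated). A minimal Python equivalent, with a hypothetical helper name and made-up earnings values rather than the actual data:

```python
def unemployed_dummy(earnings):
    """Return 1 where real annual earnings are exactly zero, else 0.

    Same rule as Stata's cond(RE74 == 0, 1, 0).
    """
    return [1 if e == 0 else 0 for e in earnings]


def mean(values):
    return sum(values) / len(values)


re74 = [0.0, 2095.6, 0.0, 35040.1]   # hypothetical earnings, not from nswpsid.da1
u74 = unemployed_dummy(re74)
print(u74, mean(u74))                 # [1, 0, 1, 0] 0.5
```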

. 
. *** NOTE: The benchmark estimate obtained from NSW experiment is
. ***       $1,794 = Average(RE_78 for NSW treated) - Average(RE_78 for NSW controls)
. ***       See MMA25P3EXTRA.DO
. 
. ********** (1) ANALYSIS for DW02 SPECIFICATION OF THE PROPENSITY SCORE **********
. 
. * Following defines number of bootstrap replications
. * Table 25.6 used 200 (or 100 in some places)
. global breps 200
. 
. * From DW02 Table 3 footnote a the propensity score uses the following regressors
. global XDW02 AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP
. 
. **** Table 25.5 p.894 summarizes propensity score
. **** using just those observations with common support
. 

. pscore TREAT $XDW02, pscore(myscore) comsup blockid(myblock) numblo(5) level(0.005) logit

**************************************************** Algorithm to estimate the propensity score ****************************************************

The treatment is TREAT

      TREAT |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      2,490       93.08       93.08
          1 |        185        6.92      100.00
------------+-----------------------------------
      Total |      2,675      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -672.64954
Iteration 1:   log likelihood = -551.87026
Iteration 2:   log likelihood = -355.56578
Iteration 3:   log likelihood = -234.78051
Iteration 4:   log likelihood = -208.2965
Iteration 5:   log likelihood = -199.26423
Iteration 6:   log likelihood = -197.26114
Iteration 7:   log likelihood = -197.1054
Iteration 8:   log likelihood = -197.10179
Iteration 9:   log likelihood = -197.10175

Logit estimates

                                                  Number of obs   =       2675
                                                  LR chi2(14)     =     951.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -197.10175                       Pseudo R2       =     0.7070

------------------------------------------------------------------------------
       TREAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   .2628422    .120206     2.19   0.029     .0272428    .4984416
       AGESQ |  -.0053794   .0018341    -2.93   0.003    -.0089742   -.0017846
        EDUC |   .7149774   .3418173     2.09   0.036     .0450278    1.384927
      EDUCSQ |  -.0426178   .0179039    -2.38   0.017    -.0777088   -.0075269
        MARR |  -1.780857    .301802    -5.90   0.000    -2.372378   -1.189336
    NODEGREE |   .1891046   .4257533     0.44   0.657    -.6453564    1.023566
       BLACK |   2.519383    .370358     6.80   0.000     1.793495    3.245272
        HISP |   3.087327   .7340486     4.21   0.000     1.648618    4.526036
        RE74 |  -.0000448   .0000425    -1.05   0.292     -.000128    .0000385
        RE75 |  -.0002678   .0000485    -5.52   0.000    -.0003628   -.0001727
      RE74SQ |   1.99e-09   7.75e-10     2.57   0.010     4.72e-10    3.51e-09
         U74 |   3.100056   .5187391     5.98   0.000     2.083346    4.116766
         U75 |  -1.273525   .4644557    -2.74   0.006    -2.183842   -.3632088
     U74HISP |  -1.925803    1.07186    -1.80   0.072     -4.02661    .1750032
       _cons |  -7.407524   2.445692    -3.03   0.002    -12.20099   -2.614056
------------------------------------------------------------------------------
note: 65 failures and 0 successes completely determined.
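The summary statistics in the logit header follow from the iteration log: Iteration 0 is the constant-only model, so the LR statistic is twice the gain in log likelihood (14 degrees of freedom, one per regressor in `$XDW02`) and McFadden's pseudo-R2 is one minus the ratio of the two log likelihoods. The fitted propensity score is the logistic transform of each observation's index x'b. A short Python check, using only numbers printed above; `logit_pscore` is a hypothetical helper name:

```python
import math

ll_null = -672.64954   # Iteration 0: constant-only model
ll_full = -197.10175   # final log likelihood

lr_chi2 = 2 * (ll_full - ll_null)     # LR chi2(14)
pseudo_r2 = 1 - ll_full / ll_null     # McFadden pseudo-R2


def logit_pscore(xb):
    """Propensity score implied by a logit index x'b."""
    return 1.0 / (1.0 + math.exp(-xb))


print(f"LR chi2(14) = {lr_chi2:.2f}  Pseudo R2 = {pseudo_r2:.4f}")
# LR chi2(14) = 951.10  Pseudo R2 = 0.7070
```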

Note: the common support option has been selected
The region of common support is [.00036433, .98576756]
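The reported region of common support matches the rule "from the smallest to the largest estimated score among the treated": the bounds equal the smallest and largest scores in the summary below, and the 1,271 observations kept are the 185 treated plus 1,086 controls. The sketch below applies that rule; treating it as exactly what `pscore, comsup` does is an assumption, and the function name and toy scores are hypothetical.

```python
def common_support(pscores, treat):
    """Keep observations whose score lies in [min, max] of the treated scores.

    Returns ((lower, upper), indices_kept). Treated units always lie inside
    the region by construction, so only controls can be dropped.
    """
    treated_scores = [p for p, d in zip(pscores, treat) if d == 1]
    lower, upper = min(treated_scores), max(treated_scores)
    kept = [i for i, p in enumerate(pscores) if lower <= p <= upper]
    return (lower, upper), kept


# Hypothetical scores: controls at 0.001 and 0.99 fall outside the treated range
pscores = [0.001, 0.02, 0.30, 0.45, 0.90, 0.99]
treat   = [0,     1,    0,    1,    1,    0]
region, kept = common_support(pscores, treat)
print(region, kept)   # (0.02, 0.9) [1, 2, 3, 4]
```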

Description of the estimated propensity score
in region of common support

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0003871       .0003643
 5%     .0004805       .0003669
10%     .0006343       .0003702       Obs                1271
25%     .0016393       .0003714       Sum of Wgt.        1271

50%     .0090427                      Mean           .1447205
                        Largest       Std. Dev.      .2809511
75%     .0897599       .9803043
90%      .656286       .9830988       Variance       .0789335
95%     .9392306       .9855413       Skewness       2.049999
99%     .9640553       .9857676       Kurtosis       5.748631

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 6

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         TREAT
 of pscore |         0          1 |     Total
-----------+----------------------+----------
  .0003643 |       960          9 |       969
        .1 |        56         10 |        66
        .2 |        33         14 |        47
        .4 |        22         24 |        46
        .6 |         7         33 |        40
        .8 |         8         95 |       103
-----------+----------------------+----------
     Total |     1,086        185 |     1,271

Note: the common support option has been selected
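Each block is identified by its inferior (lower) bound, and an observation belongs to the block whose lower bound is the largest one not exceeding its score. A hypothetical Python sketch of that assignment and of the treated/control tabulation, using the six lower bounds printed above (function names and toy scores are illustrative only; `pscore` also splits blocks until the mean score is balanced within each one, which is not reproduced here):

```python
from bisect import bisect_right

# Lower bounds of the 6 blocks reported above (common support imposed)
LOWER_BOUNDS = [0.0003643, 0.1, 0.2, 0.4, 0.6, 0.8]


def assign_blocks(pscores, lower_bounds=LOWER_BOUNDS):
    """Block index k such that lower_bounds[k] <= p < lower_bounds[k + 1]."""
    return [bisect_right(lower_bounds, p) - 1 for p in pscores]


def tabulate(blocks, treat, n_blocks):
    """Rows = blocks, columns = [controls, treated], as in the table above."""
    table = [[0, 0] for _ in range(n_blocks)]
    for b, d in zip(blocks, treat):
        table[b][d] += 1
    return table


scores = [0.0005, 0.15, 0.1, 0.85, 0.95, 0.55]   # hypothetical
treat  = [0,      0,    1,   1,    1,    0]
blocks = assign_blocks(scores)
print(blocks)                                     # [0, 1, 1, 5, 5, 3]
print(tabulate(blocks, treat, len(LOWER_BOUNDS)))
```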

*******************************************
End of the algorithm to estimate the pscore
*******************************************
. 
. **** For completeness do same with common support option NOT selected
. 
. drop myscore myblock
. pscore TREAT $XDW02, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit

**************************************************** Algorithm to estimate the propensity score ****************************************************

The treatment is TREAT

      TREAT |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      2,490       93.08       93.08
          1 |        185        6.92      100.00
------------+-----------------------------------
      Total |      2,675      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -672.64954
Iteration 1:   log likelihood = -551.87026
Iteration 2:   log likelihood = -355.56578
Iteration 3:   log likelihood = -234.78051
Iteration 4:   log likelihood = -208.2965
Iteration 5:   log likelihood = -199.26423
Iteration 6:   log likelihood = -197.26114
Iteration 7:   log likelihood = -197.1054
Iteration 8:   log likelihood = -197.10179
Iteration 9:   log likelihood = -197.10175

Logit estimates

                                                  Number of obs   =       2675
                                                  LR chi2(14)     =     951.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -197.10175                       Pseudo R2       =     0.7070

------------------------------------------------------------------------------
       TREAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   .2628422    .120206     2.19   0.029     .0272428    .4984416
       AGESQ |  -.0053794   .0018341    -2.93   0.003    -.0089742   -.0017846
        EDUC |   .7149774   .3418173     2.09   0.036     .0450278    1.384927
      EDUCSQ |  -.0426178   .0179039    -2.38   0.017    -.0777088   -.0075269
        MARR |  -1.780857    .301802    -5.90   0.000    -2.372378   -1.189336
    NODEGREE |   .1891046   .4257533     0.44   0.657    -.6453564    1.023566
       BLACK |   2.519383    .370358     6.80   0.000     1.793495    3.245272
        HISP |   3.087327   .7340486     4.21   0.000     1.648618    4.526036
        RE74 |  -.0000448   .0000425    -1.05   0.292     -.000128    .0000385
        RE75 |  -.0002678   .0000485    -5.52   0.000    -.0003628   -.0001727
      RE74SQ |   1.99e-09   7.75e-10     2.57   0.010     4.72e-10    3.51e-09
         U74 |   3.100056   .5187391     5.98   0.000     2.083346    4.116766
         U75 |  -1.273525   .4644557    -2.74   0.006    -2.183842   -.3632088
     U74HISP |  -1.925803    1.07186    -1.80   0.072     -4.02661    .1750032
       _cons |  -7.407524   2.445692    -3.03   0.002    -12.20099   -2.614056
------------------------------------------------------------------------------
note: 65 failures and 0 successes completely determined.

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.36e-09       1.76e-12
 5%     8.39e-08       5.07e-12
10%     4.47e-07       1.14e-11       Obs                2675
25%     .0000107       1.14e-11       Sum of Wgt.        2675

50%     .0002558                      Mean           .0691589
                        Largest       Std. Dev.      .2074207
75%     .0071195       .9830988
90%      .129801       .9855413       Variance       .0430234
95%     .6394923       .9857676       Skewness       3.407447
99%     .9572224        .986626       Kurtosis       13.56404

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 7

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

Variable BLACK is not balanced in block 1

The balancing property is not satisfied

Try a different specification of the propensity score

  Inferior |
  of block |         TREAT
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |     2,265          7 |     2,272
       .05 |        98          2 |       100
        .1 |        56         10 |        66
        .2 |        33         14 |        47
        .4 |        22         24 |        46
        .6 |         7         33 |        40
        .8 |         9         95 |       104
-----------+----------------------+----------
     Total |     2,490        185 |     2,675
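Without common support, block 1 holds 2,265 controls but only 7 treated, and BLACK fails the within-block test. That test compares treated and control means of each covariate within a block at the significance level set by `level(0.005)`. A minimal Python sketch of one such comparison, assuming a pooled-variance two-sample t statistic judged against a large-sample normal critical value; the exact test inside `pscore` may differ, and the function names and data below are hypothetical:

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t(x_treated, x_control):
    """Two-sample t statistic for equal means, using a pooled variance."""
    n1, n0 = len(x_treated), len(x_control)
    sp2 = ((n1 - 1) * variance(x_treated)
           + (n0 - 1) * variance(x_control)) / (n1 + n0 - 2)
    return (mean(x_treated) - mean(x_control)) / math.sqrt(sp2 * (1 / n1 + 1 / n0))


def is_balanced(x_treated, x_control, level=0.005):
    """True if the difference in means is not significant at `level`.

    Uses a normal critical value (about 2.81 for level=0.005), a
    large-sample approximation to the t distribution.
    """
    crit = NormalDist().inv_cdf(1 - level / 2)
    return abs(pooled_t(x_treated, x_control)) < crit


# Hypothetical BLACK indicators for treated and controls within one block
treated_black = [1] * 10 + [0]
control_black = [0] * 10 + [1]
print(round(pooled_t(treated_black, control_black), 2),
      is_balanced(treated_black, control_black))
```

With strongly different shares, as in this toy block, the statistic is far above the critical value and the covariate is flagged as unbalanced, which is the situation `pscore` reports for BLACK in block 1.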

*******************************************
End of the algorithm to estimate the pscore
*******************************************
. 
. **** All of the following use common support
. 
. ****************************************************************************
. **** Note: The results in the first half of Table 25.6
. **** erroneously added RE75SQ as a regressor.
. **** This does not affect Table 25.5 (done correctly) or
. **** stratification estimates (which used myscore from correct model).
. **** But it does affect NN, radius and kernel estimates.
. **** To enable comparison with the text we do analysis here
. **** both with and without RE75SQ.
. **** Even dropping RE75SQ the results continue to differ from DW02.
. ****                       Text         Corrected
. ****                       Table 25.6   Table 25.6   DW 2002
. **** NN                     2385         1286         1202
. **** Radius = 0.001        -7815        -7808         1187
. **** Radius = 0.0001       -9333        -6401         1191
. **** Radius = 0.00001      -2200        -1135         1198
. **** Stratification         1497         1497
. **** Kernel                 1309         1342
. ****************************************************************************
. 
. **** Row 1 Table 25.6: Nearest neighbor matching (random version)
. set seed 10101
. attnd RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit. This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          53    2385.430    1792.028       1.331
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
nearest neighbour matches
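Nearest-neighbour matching pairs each treated unit with the control whose propensity score is closest, with replacement, so one control can serve several treated units (here 185 treated use only 53 distinct controls). The ATT is the average treated-minus-matched-control outcome difference. In the random-draw version, as we understand the Becker-Ichino documentation, ties in distance are broken by a random draw. A simplified Python sketch with a hypothetical function name and toy data; unlike `attnd`, it takes the scores as given rather than estimating the logit:

```python
import random


def nn_att(pscore, treat, y, rng):
    """ATT from 1-nearest-neighbour matching on the propensity score.

    Matching is with replacement; ties in distance are broken by a random
    draw. Returns (att, number of treated, number of distinct controls used).
    """
    controls = [j for j, d in enumerate(treat) if d == 0]
    diffs = []
    used = set()
    for i, d in enumerate(treat):
        if d != 1:
            continue
        dist = {j: abs(pscore[i] - pscore[j]) for j in controls}
        best = min(dist.values())
        match = rng.choice([j for j in controls if dist[j] == best])
        used.add(match)
        diffs.append(y[i] - y[match])
    return sum(diffs) / len(diffs), len(diffs), len(used)


pscore = [0.30, 0.70, 0.25, 0.60, 0.90]   # hypothetical scores
treat  = [1,    1,    0,    0,    0]
y      = [10.0, 20.0, 4.0,  8.0,  1.0]    # hypothetical outcomes
print(nn_att(pscore, treat, y, random.Random(10101)))   # (9.0, 2, 2)
```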

Bootstrapping of standard errors

command:      attnd RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup
statistic:    attnd = r(attnd)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200   2385.43  -859.5093   1094.969   226.1985   4544.661   (N)
             |                                        -937.0529   3515.425   (P)
             |                                         1202.547   4697.713  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
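In the bootstrap table, the standard error is the standard deviation of the 200 replicate estimates, Bias is the mean of the replicates minus the observed estimate, and the (P) interval uses the 2.5th and 97.5th percentiles of the replicates. The t reported with bootstrapped standard errors is the ATT divided by that standard error (2385.430 / 1094.969 = 2.179). A generic Python sketch of these quantities, with a hypothetical `bootstrap` helper and the sample mean standing in for the matching estimator (the echoed command shows that the actual replications re-run `attnd`, logit included); Python's quantile rule need not match Stata's exactly:

```python
import random
from statistics import mean, quantiles, stdev


def bootstrap(data, estimator, reps=200, seed=10101):
    """Nonparametric bootstrap: resample observations with replacement.

    Returns (observed, bias, se, (lo, hi)) where bias = mean(replicates) -
    observed, se = std. dev. of replicates, and (lo, hi) is a 95%
    percentile interval.
    """
    rng = random.Random(seed)
    n = len(data)
    observed = estimator(data)
    replicates = [estimator([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(reps)]
    cuts = quantiles(replicates, n=40)   # 39 cut points: 2.5%, 5%, ..., 97.5%
    return observed, mean(replicates) - observed, stdev(replicates), (cuts[0], cuts[-1])


sample = [3.1, 4.7, 0.0, 8.2, 5.5, 2.9, 6.4, 1.8]   # hypothetical outcomes
obs, bias, se, ci = bootstrap(sample, mean)
print(obs, round(bias, 3), round(se, 3), ci)

# t statistic with the bootstrapped standard error, from the log
print(round(2385.430 / 1094.969, 3))   # 2.179
```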

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          53    2385.430    1094.969       2.179
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
nearest neighbour matches
. set seed 10101

. attnd RE78 TREAT $XDW02, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit. This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          60    1285.782    3895.044       0.330
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
nearest neighbour matches

Bootstrapping of standard errors

command:      attnd RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup
statistic:    attnd = r(attnd)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200  1285.782    319.006   1275.405  -1229.261   3800.825   (N)
             |                                        -1128.466   3835.567   (P)
             |                                        -2181.243   3294.797  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          60    1285.782    1275.405       1.008
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
nearest neighbour matches
. 
. **** Row 2 Table 25.6: Radius matching for Radius=0.001
. set seed 10101
. attr RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit radius(0.001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       54         517   -7815.382    1118.181      -6.989
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
matches within radius
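Radius matching compares each treated unit with the average outcome of every control whose score lies within distance r of its own. Treated units with no control inside the radius are dropped, which is why only 54 of the 185 treated are used at r = 0.001. A hypothetical Python sketch on toy data; the inclusive `<=` boundary and the equal weighting of matched controls are assumptions about `attr`, not taken from its documentation:

```python
def radius_att(pscore, treat, y, radius):
    """ATT from radius matching on the propensity score.

    Each treated unit is compared with the mean outcome of all controls whose
    score is within `radius`; treated units without such controls are
    dropped. Returns (att, treated used, distinct controls used), or
    (None, 0, 0) if no treated unit has a match.
    """
    controls = [j for j, d in enumerate(treat) if d == 0]
    diffs = []
    used = set()
    for i, d in enumerate(treat):
        if d != 1:
            continue
        matches = [j for j in controls if abs(pscore[i] - pscore[j]) <= radius]
        if not matches:
            continue
        used.update(matches)
        diffs.append(y[i] - sum(y[j] for j in matches) / len(matches))
    if not diffs:
        return None, 0, 0
    return sum(diffs) / len(diffs), len(diffs), len(used)


pscore = [0.3000, 0.7000, 0.3005, 0.2998, 0.9000]   # hypothetical
treat  = [1,      1,      0,      0,      0]
y      = [10.0,   20.0,   4.0,    6.0,    1.0]
print(radius_att(pscore, treat, y, radius=0.001))   # (5.0, 1, 2)
```

Shrinking the radius trades bias for variance: fewer, closer matches survive, so the estimate rests on fewer treated units, consistent with the falling treated counts and rising bootstrap standard errors as r decreases in the rows that follow.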

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup radius(.001)
statistic:    attr = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200 -7815.381   1345.983   3794.466   -15297.9  -332.8595   (N)
             |                                        -18163.96   936.3913   (P)
             |                                        -21184.98  -2839.753  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       54         517   -7815.381    3794.466      -2.060
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
matches within radius
. set seed 10101
. attr RE78 TREAT $XDW02, comsup boot reps($breps) dots logit radius(0.001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       51         541   -7808.241    1146.418      -6.811
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup radius(.001)
statistic:    attr = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200 -7808.242   1022.016   3770.093   -15242.7  -373.7819   (N)
             |                                        -16697.45   1438.308   (P)
             |                                        -18942.21  -1204.325  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       51         541   -7808.242    3770.093      -2.071
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
matches within radius
. 
. **** Row 3 Table 25.6: Radius matching for Radius=0.0001
. set seed 10101
. attr RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       24          92   -9333.120    2285.624      -4.083
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup radius(.0001)
statistic:    attr = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -9333.12   4076.044    5211.11   -19609.2   942.9621   (N)
             |                                        -19094.04   4604.865   (P)
             |                                        -22414.52  -4341.134  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       24          92   -9333.120    5211.110      -1.791
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
matches within radius
. set seed 10101
. attr RE78 TREAT $XDW02, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       27          91   -6401.345    2054.218      -3.116
---------------------------------------------------------

Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup radius(.0001)
statistic:    attr = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200 -6401.345   310.4673    5618.88  -17481.53   4678.842   (N)
             |                                        -18778.71   4636.073   (P)
             |                                        -21404.97   3740.767  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       27          91   -6401.345    5618.880      -1.139

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

.
. **** Row 4 Table 25.6: Radius matching for Radius=0.00001
. set seed 10101
. attr RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit radius(0.00001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       15          19   -2200.022    2986.211      -0.737

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup radius(.00001)
statistic:    attr = r(attr)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    attr |    200   -2200.022    626.9762     7009.51   -16022.47    11622.43   (N)
         |                                                -24355.12    8831.196   (P)
         |                                                 -31741.1    4217.228  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       15          19   -2200.022    7009.510      -0.314

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius
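The t column is the ATT divided by whichever standard error is reported, so the same point estimate gives two different t statistics. For this row the bootstrap standard error is more than twice the analytical one. As a result, t shrinks from about -0.74 to about -0.31. A quick check of the printed values follows; `att_t` is a hypothetical helper.

```python
def att_t(att, se):
    """t statistic as printed in the ATT tables: ATT / standard error."""
    return att / se


att = -2200.022  # radius 0.00001, XDW02 plus RE75SQ
print(round(att_t(att, 2986.211), 3))  # analytical s.e.: -0.737
print(round(att_t(att, 7009.510), 3))  # bootstrap s.e.:  -0.314
```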


. set seed 10101
. attr RE78 TREAT $XDW02, comsup boot reps($breps) dots logit radius(0.00001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       16          17   -1135.184    3189.367      -0.356

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup radius(.00001)
statistic:    attr = r(attr)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    attr |    199   -1135.184    -2079.93    7030.204   -14998.87     12728.5   (N)
         |                                                 -23808.6      8048.6   (P)
         |                                                -16939.85    9102.585  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       16          17   -1135.184    7030.204      -0.161

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

.
. **** Row 5 Table 25.6: Stratification Matching
. set seed 10101
. atts RE78 TREAT, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        1086    1497.484     920.688       1.626

---------------------------------------------------------
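The stratification estimator works within each propensity-score block. It takes the difference between the mean outcomes of treated and control units there, then averages these differences with the number of treated units in each block as weights. The sketch assumes that blocks containing no controls are discarded. This is consistent with the DW99 stratification run later in this log, where 98 of the 185 treated units lie in blocks that also contain controls. The input layout and the function name are hypothetical.

```python
from collections import defaultdict


def stratification_att(units):
    """Stratification (blocking) ATT, illustrative sketch.

    units: iterable of (block_id, treated_flag, outcome) triples.
    Blocks lacking treated or control units are skipped; each remaining
    block is weighted by its number of treated units.
    """
    blocks = defaultdict(lambda: {1: [], 0: []})
    for block, d, y in units:
        blocks[block][1 if d else 0].append(y)

    weighted_sum = 0.0
    n_treated = 0
    for groups in blocks.values():
        y1, y0 = groups[1], groups[0]
        if not y1 or not y0:
            continue  # no within-block comparison possible
        diff = sum(y1) / len(y1) - sum(y0) / len(y0)
        weighted_sum += diff * len(y1)
        n_treated += len(y1)
    if n_treated == 0:
        raise ValueError("no block contains both treated and control units")
    return weighted_sum / n_treated


units = [("a", 1, 10.0), ("a", 0, 4.0), ("a", 0, 6.0),
         ("b", 1, 20.0), ("b", 1, 30.0), ("b", 0, 15.0),
         ("c", 1, 99.0)]  # block "c" has no control and is dropped
print(stratification_att(units))  # (5*1 + 10*2) / 3
```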

Bootstrapping of standard errors

command:      atts RE78 TREAT , pscore(myscore) blockid(myblock) comsup
statistic:    atts = r(atts)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    atts |    200    1497.484    91.22797     913.129   -303.1669    3298.134   (N)
         |                                                -16.69353     3509.36   (P)
         |                                                -64.37524    3306.115  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected
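Each Bootstrap statistics table summarises the ATT re-estimated on 200 samples drawn with replacement from the 2,675 observations. The generic sketch below assumes the usual definitions:

- bias is the mean of the replicates minus the observed value;
- the standard error is the standard deviation of the replicates;
- the percentile (P) interval is read off the sorted replicates.

The estimator, the toy data and the order-statistic rule are illustrative. Python's random-number stream will not reproduce Stata's draws even with the same seed, and the bias-corrected (BC) interval is not shown.

```python
import random
import statistics


def bootstrap_summary(data, estimator, reps=200, seed=10101):
    """Nonparametric bootstrap: observed value, bias, s.e. and percentile CI."""
    rng = random.Random(seed)
    n = len(data)
    observed = estimator(data)
    replicates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)
    )
    bias = statistics.fmean(replicates) - observed
    se = statistics.stdev(replicates)
    # Simple order-statistic rule for the 2.5% and 97.5% points (an assumption)
    lo = replicates[int(0.025 * (reps - 1))]
    hi = replicates[int(0.975 * (reps - 1))]
    return observed, bias, se, (lo, hi)


toy_data = [1.0, 2.0, 3.0, 4.0, 10.0]
print(bootstrap_summary(toy_data, statistics.fmean))
```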

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        1086    1497.484     913.129       1.640

---------------------------------------------------------

.
. **** Row 6 Table 25.6: Kernel Matching
. set seed 10101
. attk RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit. This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        1058    1309.217           .           .

---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use the bootstrap option to get bootstrapped standard errors.
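Kernel matching compares each treated unit with a weighted average of the outcomes of all controls. The weights decline with the distance between propensity scores. The bootstrap command echoed below shows `bwidth(.06)`. The sketch assumes a Gaussian kernel with that bandwidth, taken here to be the `attk` default without verification. Function names and data are hypothetical.

```python
import math


def kernel_att(treated, controls, bandwidth=0.06):
    """Kernel-matching ATT with a Gaussian kernel (assumed kernel choice).

    treated, controls: lists of (pscore, outcome) pairs.
    """
    if not controls:
        raise ValueError("need at least one control")
    diffs = []
    for p_i, y_i in treated:
        # Gaussian weights; the normalising constant cancels in the ratio below
        w = [math.exp(-0.5 * ((p_j - p_i) / bandwidth) ** 2) for p_j, _ in controls]
        total = sum(w)
        if total == 0.0:
            continue  # every weight underflowed: treated as unmatched in this sketch
        y_hat = sum(wj * y_j for wj, (_, y_j) in zip(w, controls)) / total
        diffs.append(y_i - y_hat)
    if not diffs:
        raise ValueError("no treated unit received positive kernel weight")
    return sum(diffs) / len(diffs)


# Two controls equidistant from the treated unit get equal weight,
# so the counterfactual is their simple average, 3.
print(kernel_att([(0.5, 10.0)], [(0.4, 2.0), (0.6, 4.0)]))
```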


Bootstrapping of standard errors

command:      attk RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup bwidth(.06)
statistic:    attk = r(attk)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    attk |    200    1309.217    45.93746    958.1801   -580.2722    3198.707   (N)
         |                                                -412.7856    3416.999   (P)
         |                                                -374.4567    3450.043  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        1058    1309.217     958.180       1.366

---------------------------------------------------------

. set seed 10101
. attk RE78 TREAT $XDW02, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit. This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        1086    1342.016           .           .

---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:      attk RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup bwidth(.06)
statistic:    attk = r(attk)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    attk |    200    1342.016    61.94744    933.8668   -499.5284    3183.561   (N)
         |                                                -378.5027    3354.131   (P)
         |                                                -405.7551    3349.118  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        1086    1342.016     933.867       1.437

---------------------------------------------------------

.
. ********** (2) ANALYSIS for DW99 SPECIFICATION OF THE PROPENSITY SCORE **********
.
. * From DW99 Table 3 footnote e the propensity score uses the following regressors
. global XDW99 AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ U74BLACK
.
. * Note that CT Table 25.6 footnote b erroneously lists RE74*RE75 as regressor
. * but this program (correctly) did not include RE74*RE75
.
. **** Propensity score with just those observations with common support
.
. drop myscore myblock
. pscore TREAT $XDW99, pscore(myscore) comsup blockid(myblock) numblo($breps) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is TREAT

      TREAT |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      2,490       93.08       93.08
          1 |        185        6.92      100.00
------------+-----------------------------------
      Total |      2,675      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -672.64954
Iteration 1:   log likelihood = -499.56574
Iteration 2:   log likelihood = -318.55053
Iteration 3:   log likelihood = -248.28844
Iteration 4:   log likelihood = -225.08984
Iteration 5:   log likelihood = -219.00396
Iteration 6:   log likelihood = -209.30653
Iteration 7:   log likelihood = -208.38887
Iteration 8:   log likelihood = -205.17689
Iteration 9:   log likelihood = -204.93156
Iteration 10:  log likelihood = -204.92951
Iteration 11:  log likelihood =  -204.9295

Logit estimates                                   Number of obs   =       2675
                                                  LR chi2(13)     =     935.44
                                                  Prob > chi2     =     0.0000
Log likelihood =  -204.9295                       Pseudo R2       =     0.6953

------------------------------------------------------------------------------
       TREAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   .3305734   .1203353     2.75   0.006     .0947206    .5664262
       AGESQ |  -.0063429   .0018561    -3.42   0.001    -.0099808   -.0027049
        EDUC |   .8247711   .3534216     2.33   0.020     .1320775    1.517465
      EDUCSQ |  -.0483153   .0186057    -2.60   0.009    -.0847819   -.0118488
        MARR |  -1.884062   .2994614    -6.29   0.000    -2.470996   -1.297129
    NODEGREE |   .1299868   .4284278     0.30   0.762    -.7097163      .96969
       BLACK |   1.132961    .352088     3.22   0.001     .4428814    1.823041
        HISP |   1.962762   .5673735     3.46   0.001     .8507302    3.074793
        RE74 |  -.0001047   .0000355    -2.95   0.003    -.0001743   -.0000351
        RE75 |  -.0002172   .0000415    -5.23   0.000    -.0002986   -.0001357
      RE74SQ |   2.36e-09   6.57e-10     3.59   0.000     1.07e-09    3.65e-09
      RE75SQ |   1.58e-10   6.68e-10     0.24   0.813    -1.15e-09    1.47e-09
    U74BLACK |   2.137042   .4273667     5.00   0.000     1.299419    2.974665
       _cons |  -7.552458   2.451721    -3.08   0.002    -12.35774   -2.747173
------------------------------------------------------------------------------
note: 19 failures and 0 successes completely determined.
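The header statistics follow from the iteration log:

- Iteration 0 is the intercept-only model. Its log likelihood depends only on the treated share, 185/2,675.
- LR chi2(13) is twice the gain in log likelihood from adding the 13 regressors.
- The pseudo-R2 is McFadden's: one minus the ratio of the two log likelihoods.

The check below uses only numbers printed above.

```python
import math

n1, n0 = 185, 2490   # treated and untreated counts (frequency table)
ll_full = -204.9295  # final log likelihood

# Intercept-only logit: every fitted probability equals the treated share
p = n1 / (n1 + n0)
ll_null = n1 * math.log(p) + n0 * math.log(1 - p)

lr_chi2 = 2 * (ll_full - ll_null)
pseudo_r2 = 1 - ll_full / ll_null

print(round(ll_null, 5), round(lr_chi2, 2), round(pseudo_r2, 4))
```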

Note: the common support option has been selected
The region of common support is [.00065257, .97487544]
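With `comsup`, every treated unit is kept, but a control is kept only if its estimated score lies inside the range of the treated scores, here [.00065257, .97487544]. This is consistent with the 1,331 observations described below (185 treated plus 1,146 controls) instead of 2,675. The sketch assumes that the bounds are the minimum and maximum treated scores. `common_support` is a hypothetical helper.

```python
def common_support(pscores, treated):
    """Return the [min, max] range of treated scores and a keep flag per unit.

    Treated units are always kept; controls only if inside the range.
    """
    treated_scores = [p for p, d in zip(pscores, treated) if d]
    if not treated_scores:
        raise ValueError("no treated units")
    lo, hi = min(treated_scores), max(treated_scores)
    keep = [bool(d) or lo <= p <= hi for p, d in zip(pscores, treated)]
    return (lo, hi), keep


scores = [0.01, 0.20, 0.50, 0.90, 0.95, 0.30]
treat = [0, 1, 0, 1, 0, 0]
print(common_support(scores, treat))
# ((0.2, 0.9), [False, True, True, True, False, True])
```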

Description of the estimated propensity score in region of common support

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%      .0006813       .0006526
 5%      .0008363       .0006581
10%      .0011416       .0006593       Obs                1331
25%      .0024351       .0006598       Sum of Wgt.        1331

50%      .0111854                      Mean           .1388772
                        Largest        Std. Dev.       .275571
75%      .0779976       .9744237
90%      .6200607       .9747552       Variance       .0759394
95%      .9494181       .9747918       Skewness        2.17177
99%       .970738       .9748754       Kurtosis       6.296349

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 195

This number of blocks ensures that the mean propensity score is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied
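Step 2 checks, block by block, that each regressor has the same mean among treated and controls. The sketch below shows one such within-block check under three assumptions: a pooled-variance two-sample t statistic, a normal approximation for its p-value, and the 0.005 significance level set by `level(0.005)` in the `pscore` call above. The exact test inside `pscore` may differ, and the data are toy values.

```python
import math
from statistics import NormalDist, fmean, variance


def balanced(x_treated, x_control, alpha=0.005):
    """True if equal means are NOT rejected at level alpha (sketch).

    Pooled-variance two-sample t statistic with a normal approximation
    to its two-sided p-value.
    """
    n1, n0 = len(x_treated), len(x_control)
    if n1 < 2 or n0 < 2:
        return True  # too few units to test; a choice made for this sketch
    m1, m0 = fmean(x_treated), fmean(x_control)
    sp2 = ((n1 - 1) * variance(x_treated) + (n0 - 1) * variance(x_control)) / (n1 + n0 - 2)
    if sp2 == 0:
        return m1 == m0
    t = (m1 - m0) / math.sqrt(sp2 * (1 / n1 + 1 / n0))
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return p_value >= alpha


# Toy dummy variable (for example BLACK) within a single block
print(balanced([0, 1, 0, 1], [1, 0, 1, 0]))      # equal means
print(balanced([1] * 49 + [0], [0] * 49 + [1]))  # very different means
```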

This table shows the inferior bound, the number of treated and the number of controls for each block

  Inferior |
  of block |         TREAT
 of pscore |         0          1 |     Total
-----------+----------------------+----------
  .0006526 |       501          2 |       503
      .005 |       143          3 |       146
       .01 |        78          0 |        78
      .015 |        42          0 |        42
       .02 |        38          0 |        38
      .025 |        29          1 |        30
       .03 |        22          0 |        22
      .035 |        23          0 |        23
       .04 |        22          0 |        22
      .045 |        17          1 |        18
       .05 |        23          0 |        23
      .055 |        13          1 |        14
       .06 |        12          0 |        12
      .065 |         9          0 |         9
       .07 |        11          1 |        12
      .075 |         9          1 |        10
       .08 |         6          0 |         6
      .085 |         6          0 |         6
       .09 |         8          1 |         9
      .095 |         6          0 |         6
        .1 |         9          0 |         9
      .105 |         4          0 |         4
       .11 |         8          0 |         8
      .115 |         3          0 |         3
       .12 |         1          0 |         1
      .125 |         2          3 |         5
       .13 |         6          1 |         7
      .135 |         1          0 |         1
       .14 |         1          1 |         2
      .145 |         1          0 |         1
       .15 |         2          0 |         2
      .155 |         4          0 |         4
       .16 |         3          0 |         3
      .165 |         2          0 |         2
      .175 |         1          0 |         1
       .18 |         0          1 |         1
      .185 |         1          0 |         1
       .19 |         2          0 |         2
      .195 |         2          1 |         3
        .2 |         1          0 |         1
      .205 |         1          0 |         1
      .215 |         5          0 |         5
      .225 |         2          1 |         3
       .23 |         2          1 |         3
      .235 |         2          3 |         5
       .24 |         2          0 |         2
      .245 |         0          1 |         1
       .25 |         0          2 |         2
       .26 |         1          1 |         2
      .265 |         1          0 |         1
       .27 |         1          0 |         1
       .28 |         1          0 |         1
      .285 |         1          0 |         1
       .29 |         2          1 |         3
      .295 |         2          1 |         3
        .3 |         2          0 |         2
      .305 |         0          1 |         1
      .315 |         1          0 |         1
       .32 |         0          1 |         1
      .325 |         2          1 |         3
       .33 |         1          0 |         1
      .335 |         0          1 |         1
       .34 |         1          1 |         2
      .345 |         1          2 |         3
       .35 |         2          0 |         2
      .355 |         0          1 |         1
      .365 |         1          0 |         1
       .37 |         2          0 |         2
      .375 |         2          2 |         4
       .38 |         1          2 |         3
      .385 |         1          4 |         5
        .4 |         0          1 |         1
      .405 |         0          2 |         2
       .42 |         0          1 |         1
      .425 |         1          0 |         1
       .45 |         2          0 |         2
       .47 |         1          0 |         1
       .48 |         1          1 |         2
      .485 |         2          0 |         2
      .495 |         1          0 |         1
        .5 |         0          2 |         2
       .51 |         0          2 |         2
      .515 |         2          1 |         3
      .525 |         0          1 |         1
       .53 |         0          2 |         2
      .535 |         0          1 |         1
       .54 |         1          0 |         1
      .555 |         0          1 |         1
       .56 |         1          1 |         2
      .565 |         1          0 |         1
       .57 |         0          1 |         1
      .575 |         1          1 |         2
       .59 |         0          1 |         1
      .595 |         0          1 |         1
        .6 |         0          1 |         1
      .605 |         0          1 |         1
       .61 |         1          2 |         3
      .615 |         0          1 |         1
       .62 |         0          1 |         1
      .625 |         0          1 |         1
      .635 |         1          2 |         3
       .64 |         1          1 |         2
      .645 |         2          0 |         2
      .665 |         0          1 |         1
       .67 |         1          0 |         1
      .675 |         0          3 |         3
       .68 |         1          0 |         1
       .69 |         1          0 |         1
       .71 |         1          1 |         2
      .735 |         0          1 |         1
       .74 |         1          0 |         1
      .745 |         2          0 |         2
      .765 |         1          1 |         2
       .79 |         0          4 |         4
      .795 |         0          1 |         1
        .8 |         0          1 |         1
      .805 |         0          2 |         2
      .815 |         0          3 |         3
      .825 |         0          1 |         1
       .84 |         0          1 |         1
      .845 |         0          1 |         1
       .85 |         0          1 |         1
       .86 |         0          1 |         1
      .865 |         0          1 |         1
      .895 |         0          1 |         1
        .9 |         0          2 |         2
      .905 |         0          2 |         2
      .915 |         0          1 |         1
       .92 |         0          1 |         1
      .925 |         0          7 |         7
       .93 |         0          2 |         2
      .935 |         0          1 |         1
       .94 |         0          3 |         3
      .945 |         1          6 |         7
       .95 |         1         14 |        15
      .955 |         0         16 |        16
       .96 |         1          5 |         6
      .965 |         3         12 |        15
       .97 |         1         13 |        14
-----------+----------------------+----------
     Total |     1,146        185 |     1,331

Note: the common support option has been selected

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. **** For completeness do same with common support option NOT selected
.
. drop myscore myblock
. pscore TREAT $XDW99, pscore(myscore) blockid(myblock) numblo($breps) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is TREAT

      TREAT |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      2,490       93.08       93.08
          1 |        185        6.92      100.00
------------+-----------------------------------
      Total |      2,675      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -672.64954
Iteration 1:   log likelihood = -499.56574
Iteration 2:   log likelihood = -318.55053
Iteration 3:   log likelihood = -248.28844
Iteration 4:   log likelihood = -225.08984
Iteration 5:   log likelihood = -219.00396
Iteration 6:   log likelihood = -209.30653
Iteration 7:   log likelihood = -208.38887
Iteration 8:   log likelihood = -205.17689
Iteration 9:   log likelihood = -204.93156
Iteration 10:  log likelihood = -204.92951
Iteration 11:  log likelihood =  -204.9295

Logit estimates                                   Number of obs   =       2675
                                                  LR chi2(13)     =     935.44
                                                  Prob > chi2     =     0.0000
Log likelihood =  -204.9295                       Pseudo R2       =     0.6953

------------------------------------------------------------------------------
       TREAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   .3305734   .1203353     2.75   0.006     .0947206    .5664262
       AGESQ |  -.0063429   .0018561    -3.42   0.001    -.0099808   -.0027049
        EDUC |   .8247711   .3534216     2.33   0.020     .1320775    1.517465
      EDUCSQ |  -.0483153   .0186057    -2.60   0.009    -.0847819   -.0118488
        MARR |  -1.884062   .2994614    -6.29   0.000    -2.470996   -1.297129
    NODEGREE |   .1299868   .4284278     0.30   0.762    -.7097163      .96969
       BLACK |   1.132961    .352088     3.22   0.001     .4428814    1.823041
        HISP |   1.962762   .5673735     3.46   0.001     .8507302    3.074793
        RE74 |  -.0001047   .0000355    -2.95   0.003    -.0001743   -.0000351
        RE75 |  -.0002172   .0000415    -5.23   0.000    -.0002986   -.0001357
      RE74SQ |   2.36e-09   6.57e-10     3.59   0.000     1.07e-09    3.65e-09
      RE75SQ |   1.58e-10   6.68e-10     0.24   0.813    -1.15e-09    1.47e-09
    U74BLACK |   2.137042   .4273667     5.00   0.000     1.299419    2.974665
       _cons |  -7.552458   2.451721    -3.08   0.002    -12.35774   -2.747173
------------------------------------------------------------------------------
note: 19 failures and 0 successes completely determined.

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%      2.84e-08       4.49e-11
 5%      4.47e-07       4.88e-10
10%      2.07e-06       4.88e-10       Obs                2675
25%       .000034       4.95e-10       Sum of Wgt.        2675

50%      .0006388                      Mean           .0691589
                        Largest        Std. Dev.      .2063646
75%       .010941       .9744237
90%      .1336877       .9747552       Variance       .0425863
95%      .6200607       .9747918       Skewness       3.471137
99%      .9651648       .9748754       Kurtosis       14.05057
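Without the common-support restriction, the mean estimated score, .0691589, equals the treated share of the full sample. This holds for any logit with an intercept. The likelihood equation for the constant sets the sum over observations of (TREAT minus the fitted probability) to zero, so the fitted probabilities average to the sample proportion of treated units. A one-line check:

```python
n_treated, n_obs = 185, 2675  # frequency table of TREAT
share = n_treated / n_obs
print(round(share, 7))  # 0.0691589, the Mean reported in the summary above
```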

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 195

This number of blocks ensures that the mean propensity score is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

Variable BLACK is not balanced in block 1
The balancing property is not satisfied
Try a different specification of the propensity score

  Inferior |
  of block |         TREAT
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |     1,845          2 |     1,847
      .005 |       143          3 |       146
       .01 |        78          0 |        78
      .015 |        42          0 |        42
       .02 |        38          0 |        38
      .025 |        29          1 |        30
       .03 |        22          0 |        22
      .035 |        23          0 |        23
       .04 |        22          0 |        22
      .045 |        17          1 |        18
       .05 |        23          0 |        23
      .055 |        13          1 |        14
       .06 |        12          0 |        12
      .065 |         9          0 |         9
       .07 |        11          1 |        12
      .075 |         9          1 |        10
       .08 |         6          0 |         6
      .085 |         6          0 |         6
       .09 |         8          1 |         9
      .095 |         6          0 |         6
        .1 |         9          0 |         9
      .105 |         4          0 |         4
       .11 |         8          0 |         8
      .115 |         3          0 |         3
       .12 |         1          0 |         1
      .125 |         2          3 |         5
       .13 |         6          1 |         7
      .135 |         1          0 |         1
       .14 |         1          1 |         2
      .145 |         1          0 |         1
       .15 |         2          0 |         2
      .155 |         4          0 |         4
       .16 |         3          0 |         3
      .165 |         2          0 |         2
      .175 |         1          0 |         1
       .18 |         0          1 |         1
      .185 |         1          0 |         1
       .19 |         2          0 |         2
      .195 |         2          1 |         3
        .2 |         1          0 |         1
      .205 |         1          0 |         1
      .215 |         5          0 |         5
      .225 |         2          1 |         3
       .23 |         2          1 |         3
      .235 |         2          3 |         5
       .24 |         2          0 |         2
      .245 |         0          1 |         1
       .25 |         0          2 |         2
       .26 |         1          1 |         2
      .265 |         1          0 |         1
       .27 |         1          0 |         1
       .28 |         1          0 |         1
      .285 |         1          0 |         1
       .29 |         2          1 |         3
      .295 |         2          1 |         3
        .3 |         2          0 |         2
      .305 |         0          1 |         1
      .315 |         1          0 |         1
       .32 |         0          1 |         1
      .325 |         2          1 |         3
       .33 |         1          0 |         1
      .335 |         0          1 |         1
       .34 |         1          1 |         2
      .345 |         1          2 |         3
       .35 |         2          0 |         2
      .355 |         0          1 |         1
      .365 |         1          0 |         1
       .37 |         2          0 |         2
      .375 |         2          2 |         4
       .38 |         1          2 |         3
      .385 |         1          4 |         5
        .4 |         0          1 |         1
      .405 |         0          2 |         2
       .42 |         0          1 |         1
      .425 |         1          0 |         1
       .45 |         2          0 |         2
       .47 |         1          0 |         1
       .48 |         1          1 |         2
      .485 |         2          0 |         2
      .495 |         1          0 |         1
        .5 |         0          2 |         2
       .51 |         0          2 |         2
      .515 |         2          1 |         3
      .525 |         0          1 |         1
       .53 |         0          2 |         2
      .535 |         0          1 |         1
       .54 |         1          0 |         1
      .555 |         0          1 |         1
       .56 |         1          1 |         2
      .565 |         1          0 |         1
       .57 |         0          1 |         1
      .575 |         1          1 |         2
       .59 |         0          1 |         1
      .595 |         0          1 |         1
        .6 |         0          1 |         1
      .605 |         0          1 |         1
       .61 |         1          2 |         3
      .615 |         0          1 |         1
       .62 |         0          1 |         1
      .625 |         0          1 |         1
      .635 |         1          2 |         3
       .64 |         1          1 |         2
      .645 |         2          0 |         2
      .665 |         0          1 |         1
       .67 |         1          0 |         1
      .675 |         0          3 |         3
       .68 |         1          0 |         1
       .69 |         1          0 |         1
       .71 |         1          1 |         2
      .735 |         0          1 |         1
       .74 |         1          0 |         1
      .745 |         2          0 |         2
      .765 |         1          1 |         2
       .79 |         0          4 |         4
      .795 |         0          1 |         1
        .8 |         0          1 |         1
      .805 |         0          2 |         2
      .815 |         0          3 |         3
      .825 |         0          1 |         1
       .84 |         0          1 |         1
      .845 |         0          1 |         1
       .85 |         0          1 |         1
       .86 |         0          1 |         1
      .865 |         0          1 |         1
      .895 |         0          1 |         1
        .9 |         0          2 |         2
      .905 |         0          2 |         2
      .915 |         0          1 |         1
       .92 |         0          1 |         1
      .925 |         0          7 |         7
       .93 |         0          2 |         2
      .935 |         0          1 |         1
       .94 |         0          3 |         3
      .945 |         1          6 |         7
       .95 |         1         14 |        15
      .955 |         0         16 |        16
       .96 |         1          5 |         6
      .965 |         3         12 |        15
       .97 |         1         13 |        14
-----------+----------------------+----------
     Total |     2,490        185 |     2,675

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. **** All of the following use common support
.
. **** Row 7 Table 25.6: Nearest neighbor matching (random version)
. set seed 10101
. attnd RE78 TREAT $XDW99, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit. This operation may take a while.


ATT estimation with Nearest Neighbor Matching method (random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185          57     560.287    2205.663       0.254

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual nearest neighbour matches
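In nearest-neighbour matching each treated unit is paired with the control whose estimated score is closest. Matching is done with replacement, so one control can serve several treated units. That is consistent with all 185 treated units here being matched to only 57 distinct controls. The sketch assumes that ties in the score are broken by a random draw, which is how the label "random draw version" is read here and why a seed is set before `attnd`. Names and data are hypothetical.

```python
import random


def nn_att_random(treated, controls, seed=10101):
    """Nearest-neighbour ATT with replacement and random tie-breaking (sketch).

    treated, controls: lists of (pscore, outcome) pairs.
    Returns (n_treated, n_distinct_controls, att).
    """
    if not controls:
        raise ValueError("need at least one control")
    rng = random.Random(seed)
    diffs, used = [], set()
    for p_i, y_i in treated:
        best = min(abs(p_j - p_i) for p_j, _ in controls)
        ties = [j for j, (p_j, _) in enumerate(controls) if abs(p_j - p_i) == best]
        j = rng.choice(ties)  # random draw among equally close controls
        used.add(j)
        diffs.append(y_i - controls[j][1])
    return len(diffs), len(used), sum(diffs) / len(diffs)


treated = [(0.20, 10.0), (0.22, 12.0), (0.90, 30.0)]
controls = [(0.21, 5.0), (0.50, 7.0), (0.85, 20.0)]
print(nn_att_random(treated, controls))  # control 0 is reused twice
```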

Bootstrapping of standard errors

command:      attnd RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ U74BLACK , pscore() logit comsup
statistic:    attnd = r(attnd)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   attnd |    200    560.2872     1104.87    1331.294   -2064.967    3185.542   (N)
         |                                                -785.5272    4190.844   (P)
         |                                                -2615.809    2016.239  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with Nearest Neighbor Matching method (random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185          57     560.287    1331.294       0.421

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual nearest neighbour matches

.
. **** Row 8 Table 25.6: Radius matching for Radius=0.001
. set seed 10101
. attr RE78 TREAT $XDW99, comsup boot reps($breps) dots logit radius(0.001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       57         583   -9358.228     997.561      -9.381

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ U74BLACK , pscore() logit comsup radius(.001)
statistic:    attr = r(attr)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    attr |    200   -9358.228    2589.204    3079.824   -15431.51   -3284.949   (N)
         |                                                -11328.39    901.8873   (P)
         |                                                -13053.95   -6956.288  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       57         583   -9358.228    3079.824      -3.039

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

.
. **** Row 9 Table 25.6: Radius matching for Radius=0.0001
. set seed 10101
. attr RE78 TREAT $XDW99, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       27          76   -7847.460    2066.697      -3.797

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius


Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ U74BLACK , pscore() logit comsup radius(.0001)
statistic:    attr = r(attr)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    attr |    200    -7847.46    2920.804    4850.874   -17413.17    1718.251   (N)
         |                                                -13423.91    5223.634   (P)
         |                                                -15432.32    632.0693  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       27          76   -7847.460    4850.874      -1.618

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

.
. **** Row 10 Table 25.6: Radius matching for Radius=0.00001
. set seed 10101
. attr RE78 TREAT $XDW99, comsup boot reps($breps) dots logit radius(0.00001)


The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       16          13     223.468    4551.850       0.049

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ U74BLACK , pscore() logit comsup radius(.00001)
statistic:    attr = r(attr)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    attr |    199    223.4685   -1272.487    5608.927   -10837.43    11284.37   (N)
         |                                                -14600.21    8548.427   (P)
         |                                                -10778.17    11039.05  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       16          13     223.468    5608.927       0.040

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual matches within radius

.
. **** Row 11 Table 25.6: Stratification Matching
. set seed 10101
. atts RE78 TREAT, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       98        1233    1322.160           .           .

---------------------------------------------------------

Bootstrapping of standard errors

command:      atts RE78 TREAT , pscore(myscore) blockid(myblock) comsup
statistic:    atts = r(atts)

Bootstrap statistics

Number of obs    =      2675
Replications     =       200

------------------------------------------------------------------------------
Variable |   Reps    Observed        Bias   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    atts |    200     1322.16    -51.6285    1276.237   -1194.524    3838.844   (N)
         |                                                -1515.399    3960.787   (P)
         |                                                -1383.034    4034.298  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       98        1233    1322.160    1276.237       1.036

---------------------------------------------------------

.
. **** Row 12 Table 25.6: Kernel Matching
. * pscore TREAT $XDW99, pscore(myscore) comsup blockid(myblock) numblo($breps) level(0.005) logit
. set seed 10101
. attk RE78 TREAT $XDW99, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit. This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        1146    1518.694           .           .

---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:      attk RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ U74BLACK , pscore() logit comsup bwidth(.06)
statistic:    attk = r(attk)

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200  1518.694   130.8493   808.3386  -75.31444   3112.703   (N)
             |                                        212.6286   3165.292   (P)
             |                                        96.05106   2991.407  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
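The Bias and Std. Err. columns summarize the 200 bootstrap replicates of the estimator. The bias is the mean of the replicates minus the observed estimate, the standard error is the standard deviation of the replicates, and the (P) interval takes percentiles of the replicates.

The Python sketch below walks through these steps for a hypothetical difference-in-means estimator on synthetic data. It is not the code inside atts or attk. Its percentile rule (`statistics.quantiles` with `n=40`) is our choice and may not match Stata's exact quantile convention.

```python
import random
import statistics


def bootstrap_summary(data, estimator, reps=200, seed=10101):
    """Nonparametric bootstrap of a scalar estimator (illustrative sketch).

    Returns (observed, bias, se, (lo, hi)), where bias is the mean of the
    replicates minus the observed estimate, se is the standard deviation of
    the replicates, and (lo, hi) is a 95% percentile interval.
    """
    rng = random.Random(seed)
    observed = estimator(data)
    n = len(data)
    replicates = []
    for _ in range(reps):
        # resample n observations with replacement
        sample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(sample))
    bias = statistics.mean(replicates) - observed
    se = statistics.stdev(replicates)
    cuts = statistics.quantiles(replicates, n=40)  # 39 cut points: 2.5%, ..., 97.5%
    return observed, bias, se, (cuts[0], cuts[-1])


def diff_in_means(sample):
    """Hypothetical estimator: treated mean outcome minus control mean outcome."""
    y1 = [y for y, d in sample if d == 1]
    y0 = [y for y, d in sample if d == 0]
    return statistics.mean(y1) - statistics.mean(y0)


# Synthetic (made-up) data: 40 treated, 60 controls, true effect 1000
gen = random.Random(1)
data = [(gen.gauss(1000.0 * d, 500.0), d) for d in [1] * 40 + [0] * 60]
observed, bias, se, (lo, hi) = bootstrap_summary(data, diff_in_means)
print(round(observed, 1), round(bias, 1), round(se, 1), (round(lo, 1), round(hi, 1)))
```

Fixing the seed makes the replicates reproducible, in the same spirit as `set seed 10101` in the do-file.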

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

      185        1146   1518.694    808.339      1.879

---------------------------------------------------------

. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section6\mma25p2matching.txt
  log type:  text
 closed on:  26 May 2005, 11:15:53
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section6\mma25p3extra.txt
  log type:  text
 opened on:  26 May 2005, 11:33:04

. ********** OVERVIEW OF MMA25P3EXTRA.DO **********

. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press

. * Chapter 25.8 pages 889-893
. * Evaluating treatment effect of training on Earnings
. * This program provides additional analysis and data not in the book
. * (1) Compare NSW experiment treated to NSW experiment controls
. * (2) Compare NSW experiment treated to CPS "controls"
. * [Same as text except "controls" are from CPS not PSID]

. * The program is based on
. * MMA25P2MATCHING.DO propensity score matching

. * To run this program you need STATA data files
. * nswre74_treated.dta NSW Treated sample
. * nswre74_control.dta NSW Control sample (not analyzed earlier)
. * propensity_cps.dta CPS Control sample (rather than PSID)

. * To run this program you need the Stata add-ons
. * pscore.ado, atts.ado, attr.ado, attnd.ado, attnw.ado
. * due to Sascha O. Becker and Andrea Ichino (2002)
. * "Estimation of average treatment effects based on propensity scores",
. * The Stata Journal, Vol.2, No.4, pp. 358-377.

. * This program uses version 2.02 May 13 2005 for Stata version 8
. * downloadable from http://www.iue.it/Personal/Ichino/#pscore
. * We earlier used version 1.29 October 8 2002 for Stata version 7
. * downloadable from http://www.iue.it/Personal/Ichino/#pscore
. * and obtained the same results

. * To speed up the program reduce breps: the number of bootstrap
. * replications used to obtain bootstrap standard errors
. * Bootstrap se's will differ from text as here seed is set to 10101

. ********** STATA SETUP **********

. set more off
. version 8
. set scheme s1mono /* Used for graphs */

. ********** DATA DESCRIPTION **********

. * Data originally from DW99
. * R.H. Dehejia and S. Wahba (1999)
. * "Causal Effects in Nonexperimental Studies: reevaluating the

. * Evaluation of Training Programs", JASA, 1053-1062
. * or DW02
. * R.H. Dehejia and S. Wahba (2002)
. * "Propensity-score Matching Methods for Nonexperimental Causal
. * Studies", ReStat, 151-161
. * which in turn are from
. * Lalonde, R. (1986), "Evaluating the Econometric Evaluations of
. * Training Programs with Experimental Data," AER, 604-620.

. * nswre74_treated.dta N=185 NSW Treated sample only
. * nswre74_control.dta N=260 NSW Control sample only
. * propensity_cps.dta N=16177 NSW Treated + CPS Control sample (Full CPS or CPS-1)

. ********** (1) ANALYSIS: NSW TREATED VERSUS NSW CONTROLS **********

. * Read in NSW treated and control and combine
. use nswre74_treated.dta, clear
. append using nswre74_control.dta

. ** Summarize these data
. sum

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       treat |     445    .4157303    .4934022          0          1
         age |     445    25.37079    7.100282         17         55
         edu |     445    10.19551    1.792119          3         16
       black |     445    .8337079    .3727617          0          1
        hisp |     445    .0876404    .2830895          0          1
-------------+--------------------------------------------------------
     married |     445    .1685393    .3747658          0          1
    nodegree |     445    .7820225    .4133367          0          1
        re74 |     445    2102.265    5363.582          0   39570.68
        re75 |     445    1377.138    3150.961          0   25142.24
        re78 |     445    5300.764    6631.492          0   60307.93
-------------+--------------------------------------------------------
         u74 |     445    .2674157    .4431092          0          1
         u75 |     445    .3505618    .4776829          0          1

. bysort treat: sum

------------------------------------------------------------------------------
-> treat = 0

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       treat |     260           0           0          0          0
         age |     260    25.05385    7.057745         17         55
         edu |     260    10.08846    1.614325          3         14
       black |     260    .8269231    .3790434          0          1
        hisp |     260    .1076923    .3105893          0          1
-------------+--------------------------------------------------------
     married |     260    .1538462    .3614971          0          1
    nodegree |     260    .8346154    .3722439          0          1
        re74 |     260    2107.027    5687.906          0   39570.68
        re75 |     260    1266.909    3102.982          0   23031.98
        re78 |     260    4554.801    5483.836          0   39483.53
-------------+--------------------------------------------------------
         u74 |     260         .25    .4338478          0          1
         u75 |     260    .3153846    .4655651          0          1

------------------------------------------------------------------------------
-> treat = 1

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       treat |     185           1           0          1          1
         age |     185    25.81622    7.155019         17         48
         edu |     185    10.34595     2.01065          4         16
       black |     185    .8432432    .3645579          0          1
        hisp |     185    .0594595    .2371244          0          1
-------------+--------------------------------------------------------
     married |     185    .1891892    .3927217          0          1
    nodegree |     185    .7081081    .4558666          0          1
        re74 |     185    2095.574     4886.62          0   35040.07
        re75 |     185    1532.055    3219.251          0   25142.24
        re78 |     185    6349.144    7867.402          0   60307.93
-------------+--------------------------------------------------------
         u74 |     185    .2918919    .4558666          0          1
         u75 |     185          .4    .4912274          0          1
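Because treat is a 0/1 dummy, the regression of re78 on treat that follows has a simple structure. Its slope equals the treated-minus-control difference in mean earnings (6349.144 - 4554.801), and its intercept equals the control mean.

The Python sketch below recomputes the coefficient and its conventional (homoskedastic) standard error from the counts, means and standard deviations printed by `bysort treat: sum`. It is an arithmetic check added here, not part of the original do-file. The helper name is hypothetical.

```python
import math


def diff_in_means_ols(n1, mean1, sd1, n0, mean0, sd0):
    """OLS of y on a constant and a 0/1 dummy, computed from group summaries.

    The slope is mean1 - mean0. The conventional standard error uses the
    pooled residual variance with n1 + n0 - 2 degrees of freedom.
    Returns (slope, standard error, root MSE).
    """
    diff = mean1 - mean0
    s2 = ((n1 - 1) * sd1 ** 2 + (n0 - 1) * sd0 ** 2) / (n1 + n0 - 2)
    se = math.sqrt(s2 * (1.0 / n1 + 1.0 / n0))
    return diff, se, math.sqrt(s2)


# re78 by treat in the NSW experimental sample (treated first, then controls)
diff, se, root_mse = diff_in_means_ols(185, 6349.144, 7867.402, 260, 4554.801, 5483.836)
print(round(diff, 3), round(se, 2), round(root_mse, 1), round(diff / se, 2))
```

The results agree with the treat coefficient (1794.342), its standard error (632.8534), Root MSE (6579.5) and t (2.84) up to rounding of the printed summaries.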

. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile treat age edu black hisp married nodegree re74 re75 re78 u74 u75 /*
> */using nswre74_all.asc, replace

. ** Calculate the benchmark Treatment Effect
. ** Same as DW02 Tables 2 and 3 NSW row second last column
. ** and is the number given in CT page 894 second last line

. regress re78 treat

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  1,   443) =    8.04
       Model |   348013183     1   348013183           Prob > F      =  0.0048
    Residual |  1.9178e+10   443  43290369.3           R-squared     =  0.0178
-------------+------------------------------           Adj R-squared =  0.0156
       Total |  1.9526e+10   444  43976681.9           Root MSE      =  6579.5

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   1794.342   632.8534     2.84   0.005     550.5745     3038.11
       _cons |   4554.801   408.0459    11.16   0.000     3752.855    5356.747
------------------------------------------------------------------------------

. ********** (2) ANALYSIS: NSW TREATED VERSUS CPS CONTROLS **********

. * This data set has NSW treated and full CPS controls
. use propensity_cps.dta, clear

. * Variables u74, u75 were evaluated wrongly in the original file
. * So make the following correction
. drop u74 u75
. gen u74=0
. replace u74=1 if re74==0
(2044 real changes made)
. gen u75=0
. replace u75=1 if re75==0
(1859 real changes made)
. gen age2=age*age
. gen age3=age2*age
. gen edu2=edu*edu
. gen edure74=edu*re74
. * Not sure whether this is needed
. * Does DW99 use edu*re74*age3 or separately edu*re74 and age3 ?
. gen edre74age3=edu*re74*age3

. ** Summarize these data
. sum

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       treat |   16177     .011436    .1063292          0          1
         age |   16177    33.14051    11.03651         16         55
         edu |   16177    12.00828    2.868005          0         18
       black |   16177    .0823391    .2748892          0          1

        hisp |   16177    .0718922    .2583173          0          1
-------------+--------------------------------------------------------
     married |   16177    .7057551    .4557167          0          1
    nodegree |   16177    .3005502    .4585115          0          1
        re74 |   16177    13880.47    9613.115          0   35040.07
        re75 |   16177    13512.21    9313.207          0   25243.55
        re78 |   16177    14749.48    9670.996          0   60307.93
-------------+--------------------------------------------------------
         u74 |   16177    .1263522    .3322562          0          1
         u75 |   16177    .1149162    .3189307          0          1
        age2 |   16177     1220.09    783.4604        256       3025
        age3 |   16177    48988.49    45032.59       4096     166375
        edu2 |   16177    152.4238    67.06033          0        324
-------------+--------------------------------------------------------
     edure74 |   16177    169452.3    129585.8          0     490561
  edre74age3 |   16177    9.53e+09    1.21e+10          0   7.75e+10

. bysort treat: sum

------------------------------------------------------------------------------
-> treat = 0

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       treat |   15992           0           0          0          0
         age |   15992    33.22524    11.04522         16         55
         edu |   15992    12.02751    2.870846          0         18
       black |   15992    .0735368    .2610237          0          1
        hisp |   15992     .072036    .2585556          0          1
-------------+--------------------------------------------------------
     married |   15992    .7117309    .4529712          0          1
    nodegree |   15992    .2958354    .4564316          0          1
        re74 |   15992     14016.8    9569.796          0   25862.32
        re75 |   15992     13650.8    9270.403          0   25243.55
        re78 |   15992    14846.66    9647.392          0   25564.67
-------------+--------------------------------------------------------
         u74 |   15992    .1196223    .3245295          0          1
         u75 |   15992    .1093047    .3120308          0          1
        age2 |   15992    1225.906    784.7382        256       3025
        age3 |   15992    49305.85    45139.01       4096     166375
        edu2 |   15992    152.9023    67.16633          0        324
-------------+--------------------------------------------------------
     edure74 |   15992    171147.6    129218.8          0   465521.8
  edre74age3 |   15992    9.64e+09    1.21e+10          0   7.75e+10

------------------------------------------------------------------------------
-> treat = 1

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       treat |     185           1           0          1          1
         age |     185    25.81622    7.155019         17         48
         edu |     185    10.34595     2.01065          4         16
       black |     185    .8432432    .3645579          0          1
        hisp |     185    .0594595    .2371244          0          1
-------------+--------------------------------------------------------
     married |     185    .1891892    .3927217          0          1
    nodegree |     185    .7081081    .4558666          0          1
        re74 |     185    2095.574     4886.62          0   35040.07
        re75 |     185    1532.055    3219.251          0   25142.24
        re78 |     185    6349.144    7867.402          0   60307.93
-------------+--------------------------------------------------------
         u74 |     185    .7081081    .4558666          0          1
         u75 |     185          .6    .4912274          0          1
        age2 |     185    717.3946    431.2517        289       2304
        age3 |     185    21554.66    20964.71       4913     110592
        edu2 |     185    111.0595    39.30388         16        256
-------------+--------------------------------------------------------
     edure74 |     185    22898.73    57393.97          0     490561
  edre74age3 |     185    4.28e+08    1.24e+09          0   8.75e+09

. * Write data to a text (ascii) file so can use with programs other than Stata
. * This has data as original except for recode of u74 and u75
. outfile treat age edu black hisp married nodegree re74 re75 re78 u74 u75 /*
> */ using propensity_cps.asc, replace

. ** Number of replications to use in the bootstrap
. ** Ideally at least 400
. global breps 200

. *** (2A) CPS propensity score model from DW02 Table 2 footnote A

. global CPSDW02 age age2 age3 edu edu2 married nodegree black hisp re74 re75 u74 u75 edure74

. * With common support option
. pscore treat $CPSDW02, pscore(myscore) blockid(myblock) comsup numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -612.55814
Iteration 2:   log likelihood = -481.71035
Iteration 3:   log likelihood = -428.3351
Iteration 4:   log likelihood = -409.00437
Iteration 5:   log likelihood = -404.57736
Iteration 6:   log likelihood = -404.16676
Iteration 7:   log likelihood = -404.15991
Iteration 8:   log likelihood = -404.15991

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(14)     =    1213.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -404.15991                       Pseudo R2       =     0.6003

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.425229   .3500652     6.93   0.000     1.739114    3.111344
        age2 |  -.0672395   .0111308    -6.04   0.000    -.0890555   -.0454234
        age3 |   .0005685   .0001113     5.11   0.000     .0003505    .0007866
         edu |   .9247848   .2500694     3.70   0.000     .4346577    1.414912
        edu2 |  -.0572021   .0136202    -4.20   0.000    -.0838972   -.0305071
     married |  -1.556471   .2517687    -6.18   0.000    -2.049929   -1.063014
    nodegree |   .9270591   .3254621     2.85   0.004     .2891651    1.564953
       black |   3.850668   .2662868    14.46   0.000     3.328755     4.37258
        hisp |   1.673885    .409913     4.08   0.000     .8704705      2.4773
        re74 |  -.0002203   .0001086    -2.03   0.043    -.0004332   -7.40e-06
        re75 |  -.0001969   .0000378    -5.21   0.000     -.000271   -.0001228
         u74 |   1.749522   .2897311     6.04   0.000      1.18166    2.317385
         u75 |     .00944    .257531     0.04   0.971    -.4953115    .5141915
     edure74 |   .0000222   9.08e-06     2.45   0.014     4.43e-06      .00004
       _cons |  -35.22098   3.797922    -9.27   0.000    -42.66477   -27.77719
------------------------------------------------------------------------------
note: 3 failures and 0 successes completely determined.
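The header statistics of the logit can be recovered from two log likelihoods. Iteration 0 is the constant-only model (ln L0 = -1011.0713), and the final iteration is the fitted model (ln L = -404.15991). The likelihood-ratio statistic is 2(ln L - ln L0), with 14 degrees of freedom (one per regressor), and McFadden's pseudo R2 is 1 - ln L / ln L0. A small Python check of that arithmetic follows; the function name is hypothetical.

```python
def logit_fit_stats(ll_model, ll_null, k):
    """LR statistic for the k slope coefficients and McFadden pseudo R-squared."""
    lr = 2.0 * (ll_model - ll_null)
    pseudo_r2 = 1.0 - ll_model / ll_null
    return lr, k, pseudo_r2


# Iteration 0 (constant only) and the converged log likelihood from the log
lr, df, r2 = logit_fit_stats(-404.15991, -1011.0713, 14)
print(round(lr, 2), df, round(r2, 4))
```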

Note: the common support option has been selected
The region of common support is [.00106139, .93845543]

Description of the estimated propensity score
in region of common support

                  Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0010892       .0010614
 5%      .001221       .0010615
10%     .0013925       .0010625       Obs                4041
25%     .0021398       .0010632       Sum of Wgt.        4041

50%     .0053823                      Mean           .0452964
                        Largest       Std. Dev.      .1326324
75%     .0156111       .9356451
90%     .0856723         .93718       Variance       .0175914
95%      .282253       .9374608       Skewness       4.475994
99%      .822637       .9384554       Kurtosis       24.36564
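Several features of this output fit a simple rule. The common-support region [.00106139, .93845543] coincides with the smallest and largest retained scores. All 185 treated units survive, while the controls fall from 15,992 to 3,856 (4,041 observations in total, as in the block table that follows). This pattern is consistent with keeping every treated unit plus the controls whose score lies between the minimum and maximum score among the treated.

The Python sketch below implements that assumed rule on made-up scores. It is not taken from pscore.ado, and the function name is hypothetical.

```python
def common_support(pscores, treat):
    """Assumed rule: keep all treated units and the controls whose score lies
    in [min treated score, max treated score]. Returns (region, keep_flags)."""
    treated_scores = [p for p, d in zip(pscores, treat) if d == 1]
    lo, hi = min(treated_scores), max(treated_scores)
    keep = [d == 1 or lo <= p <= hi for p, d in zip(pscores, treat)]
    return (lo, hi), keep


# Made-up scores: the controls at 0.001, 0.95 and 1e-6 fall outside the region
scores = [0.001, 0.02, 0.5, 0.9, 0.95, 1e-6, 0.3]
treat = [0, 1, 0, 1, 0, 0, 1]
region, keep = common_support(scores, treat)
print(region, sum(keep))  # (0.02, 0.9) 4
```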

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 8

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
  .0010614 |     3,214         18 |     3,232
      .025 |       240          8 |       248
       .05 |       172         14 |       186
        .1 |        96         19 |       115
        .2 |        86         32 |       118
        .4 |        31         38 |        69
        .6 |         9         20 |        29
        .8 |         8         36 |        44
-----------+----------------------+----------
     Total |     3,856        185 |     4,041

Note: the common support option has been selected
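With stratification matching, the ATT is a weighted average of within-block differences in mean outcomes. Each block is weighted by its share of the treated units (Becker and Ichino 2002). The block counts in the table fix those weights, for example 18/185 for the lowest block and 36/185 for the highest.

The block mean outcomes are not printed in this log, so the differences passed in the Python sketch below are hypothetical. The data structure and function names are also ours.

```python
# (inferior bound, controls, treated) per block, from the common-support table
BLOCKS = [
    (0.0010614, 3214, 18), (0.025, 240, 8), (0.05, 172, 14), (0.1, 96, 19),
    (0.2, 86, 32), (0.4, 31, 38), (0.6, 9, 20), (0.8, 8, 36),
]


def treated_weights(blocks):
    """Share of all treated units that falls in each block."""
    n_treated = sum(n1 for _, _, n1 in blocks)
    return [n1 / n_treated for _, _, n1 in blocks]


def stratification_att(block_diffs, weights):
    """ATT = sum over blocks of weight * (treated mean - control mean) in the block."""
    return sum(w * d for w, d in zip(weights, block_diffs))


w = treated_weights(BLOCKS)
print([round(x, 3) for x in w])
# Hypothetical within-block differences: a constant effect of 1000 in every block
print(stratification_att([1000.0] * len(BLOCKS), w))
```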

*******************************************
End of the algorithm to estimate the pscore
*******************************************

. * Without common support option
. drop myscore myblock
. pscore treat $CPSDW02, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -612.55814
Iteration 2:   log likelihood = -481.71035
Iteration 3:   log likelihood = -428.3351
Iteration 4:   log likelihood = -409.00437
Iteration 5:   log likelihood = -404.57736
Iteration 6:   log likelihood = -404.16676
Iteration 7:   log likelihood = -404.15991
Iteration 8:   log likelihood = -404.15991

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(14)     =    1213.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -404.15991                       Pseudo R2       =     0.6003

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.425229   .3500652     6.93   0.000     1.739114    3.111344
        age2 |  -.0672395   .0111308    -6.04   0.000    -.0890555   -.0454234
        age3 |   .0005685   .0001113     5.11   0.000     .0003505    .0007866
         edu |   .9247848   .2500694     3.70   0.000     .4346577    1.414912
        edu2 |  -.0572021   .0136202    -4.20   0.000    -.0838972   -.0305071
     married |  -1.556471   .2517687    -6.18   0.000    -2.049929   -1.063014
    nodegree |   .9270591   .3254621     2.85   0.004     .2891651    1.564953
       black |   3.850668   .2662868    14.46   0.000     3.328755     4.37258
        hisp |   1.673885    .409913     4.08   0.000     .8704705      2.4773
        re74 |  -.0002203   .0001086    -2.03   0.043    -.0004332   -7.40e-06
        re75 |  -.0001969   .0000378    -5.21   0.000     -.000271   -.0001228
         u74 |   1.749522   .2897311     6.04   0.000      1.18166    2.317385
         u75 |     .00944    .257531     0.04   0.971    -.4953115    .5141915
     edure74 |   .0000222   9.08e-06     2.45   0.014     4.43e-06      .00004
       _cons |  -35.22098   3.797922    -9.27   0.000    -42.66477   -27.77719
------------------------------------------------------------------------------
note: 3 failures and 0 successes completely determined.

Description of the estimated propensity score

                  Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     5.92e-07       1.18e-09
 5%     1.72e-06       4.07e-09
10%     3.63e-06       4.24e-09       Obs               16177
25%     .0000196       1.55e-08       Sum of Wgt.       16177

50%     .0001247                      Mean            .011436
                        Largest       Std. Dev.      .0691037
75%     .0010579       .9356451
90%     .0073933         .93718       Variance       .0047753
95%     .0250635       .9374608       Skewness       9.281842
99%     .3620009       .9384554       Kurtosis       99.39697
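Over the full sample, the mean estimated score (.011436) equals the treated share 185/16,177 in the tabulation. This is a property of logit maximum likelihood with an intercept: the first-order condition for the constant sets the sum of y minus the fitted probability to zero.

The Python sketch below checks that property on made-up data. It fits a one-regressor logit by Newton-Raphson as a toy stand-in for Stata's `logit` command; the data and function name are hypothetical.

```python
import math


def fit_logit(x, y, iters=25):
    """Logit of y on (1, x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step b += H^{-1} g with H = X'WX (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up data with overlapping outcomes, so the MLE exists
x = list(range(20))
y = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]
b0, b1 = fit_logit(x, y)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
print(sum(fitted) / len(fitted), sum(y) / len(y))
```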


******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 13

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,635          0 |    11,635
  .0007813 |     1,056          2 |     1,058
  .0015625 |       932          5 |       937
   .003125 |       712          2 |       714
    .00625 |       709          2 |       711
     .0125 |       306          7 |       313
      .025 |       240          8 |       248
       .05 |       172         14 |       186
        .1 |        96         19 |       115
        .2 |        86         32 |       118
        .4 |        31         38 |        69
        .6 |         9         20 |        29
        .8 |         8         36 |        44
-----------+----------------------+----------
     Total |    15,992        185 |    16,177
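The 13 inferior bounds in this table are consistent with a two-stage construction:

- start from five equal-width blocks (numblo(5): 0, .2, .4, .6, .8);
- halve the lowest block eight times (.1, .05, ..., .00078125, printed as .0007813).

Becker and Ichino describe splitting a block in half when the mean scores of treated and controls differ within it. The Python sketch below only regenerates these bounds and assigns a score to its block. It does not run the balancing tests, and the number of halvings is read off the table rather than derived. The names are hypothetical.

```python
import bisect


def halving_bounds(first_upper=0.2, n_splits=8):
    """Lower bounds from repeatedly halving the lowest block [0, first_upper)."""
    return sorted(first_upper / 2 ** k for k in range(1, n_splits + 1))


# 0, the eight halving points, then the upper equal-width bounds
LOWER = [0.0] + halving_bounds() + [0.2, 0.4, 0.6, 0.8]


def block_of(score, lower=LOWER):
    """Index of the block whose lower bound is the largest one <= score."""
    return bisect.bisect_right(lower, score) - 1


print(len(LOWER), LOWER[:3])
print(block_of(0.03), LOWER[block_of(0.03)])  # a score of .03 falls in the .025 block
```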

*******************************************
End of the algorithm to estimate the pscore
*******************************************

. * Nearest neighbor matching (random version)
. attnd re78 treat $CPSDW02, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit. This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

      185         155    730.380   1049.321      0.696

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches
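Nearest-neighbour matching pairs each treated unit with the control whose estimated score is closest, then averages the treated-minus-matched-control outcome differences. Controls can be reused, which is consistent with the log reporting 155 distinct controls for 185 treated units.

The Python sketch below assumes that "random draw version" means ties in the score distance are broken by a random draw. It works on made-up inputs, is not the attnd code, and the function name is hypothetical.

```python
import random


def nn_att(y, d, p, seed=10101):
    """ATT by nearest-neighbour matching on the score, with replacement.

    Returns (att, n_treated, n_distinct_controls). Ties in distance are
    broken by a random draw (assumed meaning of "random draw version").
    """
    rng = random.Random(seed)
    controls = [(pi, yi) for yi, di, pi in zip(y, d, p) if di == 0]
    diffs, used = [], set()
    for yi, di, pi in zip(y, d, p):
        if di != 1:
            continue
        best = min(abs(pi - pc) for pc, _ in controls)
        ties = [j for j, (pc, _) in enumerate(controls) if abs(pi - pc) == best]
        j = rng.choice(ties)        # random draw among equally close controls
        used.add(j)
        diffs.append(yi - controls[j][1])
    return sum(diffs) / len(diffs), len(diffs), len(used)


# Made-up example: two treated units matched to two different controls
print(nn_att([10, 12, 5, 7, 20], [1, 1, 0, 0, 0], [0.3, 0.6, 0.29, 0.61, 0.9]))  # (5.0, 2, 2)
```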

Bootstrapping of standard errors

command:      attnd re78 treat age age2 age3 edu edu2 married nodegree black hisp re74 re75 u74 u75 edure74 , pscore() logit comsup
statistic:    attnd = r(attnd)

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200  730.3805   1280.829   941.0756   -1125.38   2586.141   (N)
             |                                        151.7753   3865.059   (P)
             |                                       -601.5495   1317.795  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

      185         155    730.380    941.076      0.776

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

. * Radius matching: Radius=0.0001
. attr re78 treat $CPSDW02, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

       67        1027  -2935.932    888.041     -3.306

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
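Radius matching compares each treated unit with the average outcome of all controls whose score lies within the radius, and drops treated units that have no such control. With radius(0.0001), many treated units find no control that close, which is consistent with only 67 of the 185 treated being used here.

The Python sketch below uses made-up inputs. The inclusive boundary and the simple averaging over matched controls are our assumptions, not attr's documented behaviour, and the function name is hypothetical.

```python
def radius_att(y, d, p, radius):
    """ATT by radius matching on the score (illustrative sketch).

    Each treated unit is compared with the mean outcome of the controls whose
    score is within `radius` (inclusive, assumed); treated units with no such
    control are dropped. Returns (att, n_treated_used, n_distinct_controls).
    """
    controls = [(pi, yi) for yi, di, pi in zip(y, d, p) if di == 0]
    diffs, used = [], set()
    for yi, di, pi in zip(y, d, p):
        if di != 1:
            continue
        matches = [j for j, (pc, _) in enumerate(controls) if abs(pi - pc) <= radius]
        if not matches:
            continue  # no control close enough: unit leaves the estimate
        used.update(matches)
        y0 = sum(controls[j][1] for j in matches) / len(matches)
        diffs.append(yi - y0)
    if not diffs:
        raise ValueError("no treated unit has a control within the radius")
    return sum(diffs) / len(diffs), len(diffs), len(used)


# Made-up example: the treated unit at 0.9 has no control within 0.125
print(radius_att([10, 20, 4, 6, 8, 100], [1, 1, 0, 0, 0, 0],
                 [0.5, 0.9, 0.5, 0.5, 0.75, 0.1], 0.125))  # (5.0, 1, 2)
```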

Bootstrapping of standard errors

command:      attr re78 treat age age2 age3 edu edu2 married nodegree black hisp re74 re75 u74 u75 edure74 , pscore() logit comsup radius(.0001)
statistic:    attr = r(attr)

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200 -2935.932   472.0703   1332.096  -5562.767  -309.0973   (N)
             |                                       -5186.873   438.6902   (P)
             |                                       -5999.987  -950.2962  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

       67        1027  -2935.932   1332.096     -2.204

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

. * Kernel Matching
. attk re78 treat $CPSDW02, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit. This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

      185        3856   1267.716          .          .

---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use the
bootstrap option to get bootstrapped standard errors.
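Kernel matching uses every control for every treated unit, weighting control outcomes by a kernel in the score distance. This is consistent with n. contr. being all 3,856 controls in the common support. The bootstrap command line that follows shows bwidth(.06).

The Python sketch below assumes a Gaussian kernel with that bandwidth and applies it to made-up inputs. It illustrates the weighting scheme only and does not reimplement attk; the function name is hypothetical.

```python
import math


def kernel_att(y, d, p, bwidth=0.06):
    """Kernel-matching ATT on the score with a Gaussian kernel (assumed).

    For each treated unit the counterfactual is a weighted mean of all
    control outcomes, with weights exp(-0.5 * ((p_j - p_i) / bwidth) ** 2)
    normalised to sum to one.
    """
    controls = [(pi, yi) for yi, di, pi in zip(y, d, p) if di == 0]
    diffs = []
    for yi, di, pi in zip(y, d, p):
        if di != 1:
            continue
        w = [math.exp(-0.5 * ((pc - pi) / bwidth) ** 2) for pc, _ in controls]
        y0 = sum(wj * yc for wj, (_, yc) in zip(w, controls)) / sum(w)
        diffs.append(yi - y0)
    return sum(diffs) / len(diffs)


# Made-up example: two controls equidistant from the treated unit get equal weight
print(kernel_att([10, 4, 6], [1, 0, 0], [0.5, 0.25, 0.75]))
```

A control far from a treated unit's score gets a weight close to zero rather than being dropped, which is the main difference from radius matching.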

Bootstrapping of standard errors

command:      attk re78 treat age age2 age3 edu edu2 married nodegree black hisp re74 re75 u74 u75 edure74 , pscore() logit comsup bwidth(.06)
statistic:    attk = r(attk)

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200  1267.716  -64.23519   720.5805  -153.2374   2688.669   (N)
             |                                       -211.0497   2559.206   (P)
             |                                       -136.5283   2594.417  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

      185        3856   1267.716    720.580      1.759

---------------------------------------------------------

. * Stratification Matching
. atts re78 treat, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

      185        3856   1505.512    734.270      2.050

---------------------------------------------------------

Bootstrapping of standard errors

command:      atts re78 treat , pscore(myscore) blockid(myblock) comsup
statistic:    atts = r(atts)

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |   200  1505.512  -9.343635   665.1843   193.7979   2817.227   (N)
             |                                        251.7493   2958.461   (P)
             |                                        252.6815   2985.052  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.       ATT   Std. Err.         t
---------------------------------------------------------

      185        3856   1505.512    665.184      2.263

---------------------------------------------------------

. *** (2B) CPS propensity score model from DW99 Table 2 footnote A

. global CPSDW99 age age2 edu edu2 nodegree married black hisp re74 re75 u74 u75 edure74 age3

. * With common support option
. drop myscore myblock
. pscore treat $CPSDW99, pscore(myscore) blockid(myblock) comsup numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -612.55814
Iteration 2:   log likelihood = -481.71035
Iteration 3:   log likelihood = -428.3351
Iteration 4:   log likelihood = -409.00437
Iteration 5:   log likelihood = -404.57736
Iteration 6:   log likelihood = -404.16676
Iteration 7:   log likelihood = -404.15991
Iteration 8:   log likelihood = -404.15991

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(14)     =    1213.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -404.15991                       Pseudo R2       =     0.6003

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.425229   .3500652     6.93   0.000     1.739114    3.111344
        age2 |  -.0672395   .0111308    -6.04   0.000    -.0890555   -.0454234
         edu |   .9247848   .2500694     3.70   0.000     .4346577    1.414912
        edu2 |  -.0572021   .0136202    -4.20   0.000    -.0838972   -.0305071
    nodegree |   .9270591   .3254621     2.85   0.004     .2891651    1.564953
     married |  -1.556471   .2517687    -6.18   0.000    -2.049929   -1.063014
       black |   3.850668   .2662868    14.46   0.000     3.328755     4.37258
        hisp |   1.673885    .409913     4.08   0.000     .8704705      2.4773
        re74 |  -.0002203   .0001086    -2.03   0.043    -.0004332   -7.40e-06
        re75 |  -.0001969   .0000378    -5.21   0.000     -.000271   -.0001228
         u74 |   1.749522   .2897311     6.04   0.000      1.18166    2.317385
         u75 |     .00944    .257531     0.04   0.971    -.4953115    .5141915
     edure74 |   .0000222   9.08e-06     2.45   0.014     4.43e-06      .00004
        age3 |   .0005685   .0001113     5.11   0.000     .0003505    .0007866
       _cons |  -35.22098   3.797922    -9.27   0.000    -42.66477   -27.77719
------------------------------------------------------------------------------
note: 3 failures and 0 successes completely determined.

Note: the common support option has been selected
The region of common support is [.00106139, .93845543]

Description of the estimated propensity score
in region of common support

                  Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0010892       .0010614
 5%      .001221       .0010615
10%     .0013925       .0010625       Obs                4041
25%     .0021398       .0010632       Sum of Wgt.        4041

50%     .0053823                      Mean           .0452964
                        Largest       Std. Dev.      .1326324
75%     .0156111       .9356451
90%     .0856723         .93718       Variance       .0175914
95%      .282253       .9374608       Skewness       4.475994
99%      .822637       .9384554       Kurtosis       24.36564

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 8

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
  .0010614 |     3,214         18 |     3,232
      .025 |       240          8 |       248
       .05 |       172         14 |       186
        .1 |        96         19 |       115
        .2 |        86         32 |       118
        .4 |        31         38 |        69
        .6 |         9         20 |        29
        .8 |         8         36 |        44
-----------+----------------------+----------
     Total |     3,856        185 |     4,041

Note: the common support option has been selected

*******************************************
End of the algorithm to estimate the pscore
*******************************************

. * Without common support option
. drop myscore myblock
. pscore treat $CPSDW99, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit

**************************************************** Algorithm to estimate the propensity score ****************************************************

The treatment is treat treat | Freq. Percent Cum. ------------+----------------------------------0 | 15,992 98.86 98.86 1| 185 1.14 100.00 ------------+----------------------------------Total | 16,177 100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -612.55814
Iteration 2:   log likelihood = -481.71035
Iteration 3:   log likelihood =  -428.3351
Iteration 4:   log likelihood = -409.00437
Iteration 5:   log likelihood = -404.57736
Iteration 6:   log likelihood = -404.16676
Iteration 7:   log likelihood = -404.15991
Iteration 8:   log likelihood = -404.15991

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(14)     =    1213.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -404.15991                       Pseudo R2       =     0.6003

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.425229   .3500652     6.93   0.000     1.739114    3.111344
        age2 |  -.0672395   .0111308    -6.04   0.000    -.0890555   -.0454234
         edu |   .9247848   .2500694     3.70   0.000     .4346577    1.414912
        edu2 |  -.0572021   .0136202    -4.20   0.000    -.0838972   -.0305071
    nodegree |   .9270591   .3254621     2.85   0.004     .2891651    1.564953
     married |  -1.556471   .2517687    -6.18   0.000    -2.049929   -1.063014
       black |   3.850668   .2662868    14.46   0.000     3.328755     4.37258
        hisp |   1.673885    .409913     4.08   0.000     .8704705      2.4773
        re74 |  -.0002203   .0001086    -2.03   0.043    -.0004332   -7.40e-06
        re75 |  -.0001969   .0000378    -5.21   0.000     -.000271   -.0001228
         u74 |   1.749522   .2897311     6.04   0.000      1.18166    2.317385
         u75 |     .00944    .257531     0.04   0.971    -.4953115    .5141915
     edure74 |   .0000222   9.08e-06     2.45   0.014     4.43e-06      .00004
        age3 |   .0005685   .0001113     5.11   0.000     .0003505    .0007866
       _cons |  -35.22098   3.797922    -9.27   0.000    -42.66477   -27.77719
------------------------------------------------------------------------------
note: 3 failures and 0 successes completely determined.

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     5.92e-07       1.18e-09
 5%     1.72e-06       4.07e-09
10%     3.63e-06       4.24e-09       Obs               16177
25%     .0000196       1.55e-08       Sum of Wgt.       16177

50%     .0001247                      Mean            .011436
                        Largest       Std. Dev.      .0691037
75%     .0010579       .9356451
90%     .0073933         .93718       Variance       .0047753
95%     .0250635       .9374608       Skewness       9.281842
99%     .3620009       .9384554       Kurtosis       99.39697

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 13

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,635          0 |    11,635
  .0007813 |     1,056          2 |     1,058
  .0015625 |       932          5 |       937
   .003125 |       712          2 |       714
    .00625 |       709          2 |       711
     .0125 |       306          7 |       313
      .025 |       240          8 |       248
       .05 |       172         14 |       186
        .1 |        96         19 |       115
        .2 |        86         32 |       118
        .4 |        31         38 |        69
        .6 |         9         20 |        29
        .8 |         8         36 |        44
-----------+----------------------+----------
     Total |    15,992        185 |    16,177

*******************************************
End of the algorithm to estimate the pscore
*******************************************

.
. * Nearest neighbor matching (random version)
. attnd re78 treat $CPSDW99, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit. This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185         155     730.380    1049.321       0.696

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches
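Nearest neighbour matching pairs each treated unit with the control whose estimated propensity score is closest and averages the outcome differences over the treated. The sketch below assumes matching with replacement and random tie-breaking (our reading of the "random draw version"); it ignores the common-support restriction, and `att_nearest_neighbor` is a hypothetical name. Its third return value counts distinct controls used, which is why n. contr. (155) can be smaller than n. treat. (185).

```python
import random
from statistics import mean


def att_nearest_neighbor(y, d, p, seed=0):
    """ATT from one-to-one nearest-neighbour matching on the propensity score.

    Matching is with replacement; ties in distance are broken by a random draw.
    Returns (ATT, number of treated, number of distinct controls used).
    """
    rng = random.Random(seed)
    controls = [(pc, yc) for yc, dc, pc in zip(y, d, p) if dc == 0]
    diffs, used = [], set()
    for yi, di, pi in zip(y, d, p):
        if di != 1:
            continue
        best = min(abs(pi - pc) for pc, _ in controls)
        ties = [j for j, (pc, _) in enumerate(controls) if abs(pi - pc) == best]
        j = rng.choice(ties)  # random draw among equally close controls
        used.add(j)
        diffs.append(yi - controls[j][1])
    return mean(diffs), len(diffs), len(used)
```

For example, treated units with scores .30 and .60 matched to controls at .29 and .62 give differences 10 - 4 and 20 - 7, so the ATT is 9.5.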


Bootstrapping of standard errors

command:      attnd re78 treat age age2 edu edu2 nodegree married black hisp re74 re75 u74 u75 edure74 age3 , pscore() logit comsup
statistic:    attnd = r(attnd)
.................................................................................................... > .................................................................................................. > ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200   730.3805   1179.371    964.5437   -1171.658   2632.419   (N)
             |                                           -9.143144   3738.959   (P)
             |                                           -638.1188   1625.387  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected
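The normal-based interval (N) has the form Observed ± c × Std. Err. Taking c as the 97.5th percentile of a t distribution with Replications - 1 = 199 degrees of freedom reproduces the printed bounds. That choice of c is our inference from the numbers (it is hard-coded below as about 1.971957), not something the log states. The bootstrapped t in the ATT tables is simply Observed / Std. Err.

```python
# Assumed critical value: 97.5th percentile of t with 199 (= Replications - 1) df.
T_CRIT_199 = 1.971957


def normal_ci(observed, se, crit=T_CRIT_199):
    """Normal-based bootstrap interval: observed +/- crit * bootstrap std. err."""
    return observed - crit * se, observed + crit * se


lo, hi = normal_ci(730.3805, 964.5437)  # attnd row of the table above
t_ratio = 730.3805 / 964.5437           # t reported with bootstrapped std. errors
print(round(lo, 3), round(hi, 3), round(t_ratio, 3))
```

The same function applied to the radius-matching row (Observed -2935.932, Std. Err. 1276.508) gives approximately (-5453.15, -418.71), matching that table as well.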

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185         155     730.380     964.544       0.757

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

.
. * Radius matching: Radius=0.0001
. attr re78 treat $CPSDW99, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius. This operation may take a while.


ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       67        1027   -2935.932     888.041      -3.306

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
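Radius matching compares each treated unit with the mean outcome of all controls whose propensity score lies within the radius, and drops treated units that have no control in range. With radius(0.0001) only 67 of the 185 treated units find a match, so this estimate rests on a much smaller and possibly unrepresentative subset of the treated. A minimal sketch, ignoring the common-support restriction; `att_radius` is a hypothetical name.

```python
from statistics import mean


def att_radius(y, d, p, radius):
    """ATT from radius matching on the propensity score.

    Returns (ATT, number of matched treated, number of distinct controls used).
    Treated units with no control within `radius` are dropped.
    """
    controls = [(pc, yc) for yc, dc, pc in zip(y, d, p) if dc == 0]
    diffs, used = [], set()
    for yi, di, pi in zip(y, d, p):
        if di != 1:
            continue
        idx = [j for j, (pc, _) in enumerate(controls) if abs(pi - pc) <= radius]
        if not idx:
            continue  # no control within the radius: this treated unit is dropped
        used.update(idx)
        diffs.append(yi - mean(controls[j][1] for j in idx))
    return mean(diffs), len(diffs), len(used)
```

Here a treated unit at .30 is matched to the two controls at .299 and .301, while a treated unit at .80 finds no control within .005 and is dropped.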

Bootstrapping of standard errors

command:      attr re78 treat age age2 edu edu2 nodegree married black hisp re74 re75 u74 u75 edure74 age3 , pscore() logit comsup radius(.0001)
statistic:    attr = r(attr)
.................................................................................................... > .................................................................................................. > ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -2935.932   522.4813    1276.508    -5453.15  -418.7147   (N)
             |                                           -5239.598   302.9884   (P)
             |                                           -6023.029  -1232.031  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       67        1027   -2935.932    1276.508      -2.300

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

.
. * Kernel Matching
. attk re78 treat $CPSDW99, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit. This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1267.716           .           .

---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use
the bootstrap option to get bootstrapped standard errors.
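Kernel matching uses every control, weighting its outcome by a kernel in the propensity-score distance to the treated unit, which is why n. contr. equals all 3,856 controls in the region of common support. The sketch assumes a Gaussian kernel and a bandwidth of 0.06 (the value `bwidth(.06)` that appears in the bootstrap command line); which kernel `attk` uses by default is an assumption here, and `att_kernel` is a hypothetical name.

```python
import math


def att_kernel(y, d, p, bwidth=0.06):
    """Kernel-matching ATT with a Gaussian kernel in the propensity score.

    Each treated outcome is compared with a kernel-weighted average of all
    control outcomes; the ATT is the mean of these differences.
    """
    controls = [(pc, yc) for yc, dc, pc in zip(y, d, p) if dc == 0]
    diffs = []
    for yi, di, pi in zip(y, d, p):
        if di != 1:
            continue
        w = [math.exp(-0.5 * ((pc - pi) / bwidth) ** 2) for pc, _ in controls]
        counterfactual = sum(wj * yc for wj, (_, yc) in zip(w, controls)) / sum(w)
        diffs.append(yi - counterfactual)
    return sum(diffs) / len(diffs)
```

Because every control enters the weighted average, there is no simple analytical variance formula, which is consistent with the note above that only bootstrapped standard errors are available.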

Bootstrapping of standard errors

command:      attk re78 treat age age2 edu edu2 nodegree married black hisp re74 re75 u74 u75 edure74 age3 , pscore() logit comsup bwidth(.06)
statistic:    attk = r(attk)
.................................................................................................... > .................................................................................................. > ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200   1267.716  -57.76407    751.2898   -213.7948   2749.227   (N)
             |                                             -304.83   2488.355   (P)
             |                                           -314.1009   2459.423  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected
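The (P) and (BC) rows come from the empirical distribution of the 200 bootstrap replicates, and Bias is the mean of the replicates minus Observed. The replicate values themselves are not in the log, so the sketch below only illustrates the three quantities on made-up data. It assumes a linear-interpolation percentile rule (Stata's exact rule may differ) and the standard bias-corrected percentile method, in which z0 is the normal quantile of the share of replicates at or below the observed estimate.

```python
from statistics import NormalDist


def percentile(sorted_x, q):
    """Linear-interpolation percentile of a sorted list (an assumed rule)."""
    pos = q * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def bootstrap_summary(observed, replicates, alpha=0.05):
    """Return (bias, percentile CI, bias-corrected CI) from bootstrap replicates."""
    reps = sorted(replicates)
    b = len(reps)
    bias = sum(reps) / b - observed
    pct = (percentile(reps, alpha / 2), percentile(reps, 1 - alpha / 2))
    nd = NormalDist()
    # Share of replicates at or below the observed estimate, kept away from 0 and 1
    share = min(max(sum(r <= observed for r in reps) / b, 0.5 / b), 1 - 0.5 / b)
    z0 = nd.inv_cdf(share)  # bias-correction constant
    zc = nd.inv_cdf(1 - alpha / 2)
    bc = (percentile(reps, nd.cdf(2 * z0 - zc)), percentile(reps, nd.cdf(2 * z0 + zc)))
    return bias, pct, bc
```

When the observed estimate sits at the median of the replicates, z0 = 0 and the bias-corrected interval coincides with the percentile interval; the further the observed value is from the median, the more the (BC) bounds shift, as in several tables of this log.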

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1267.716     751.290       1.687

---------------------------------------------------------

.
. * Stratification Matching
. atts re78 treat, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1505.512     734.270       2.050

---------------------------------------------------------
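The stratification estimator takes the treated-minus-control difference in mean outcomes within each propensity-score block (the `myblock` variable created by `pscore`) and averages these differences with weights equal to the number of treated units in each block. A minimal sketch; skipping blocks that lack either treated or control units is an assumption, and `att_stratification` is a hypothetical name.

```python
from collections import defaultdict


def att_stratification(y, d, block):
    """Stratification ATT: within-block mean differences weighted by treated counts."""
    groups = defaultdict(lambda: ([], []))  # block -> (control outcomes, treated outcomes)
    for yi, di, bi in zip(y, d, block):
        groups[bi][di].append(yi)
    num, n_treat = 0.0, 0
    for controls, treated in groups.values():
        if treated and controls:  # blocks missing either group are skipped
            diff = sum(treated) / len(treated) - sum(controls) / len(controls)
            num += len(treated) * diff
            n_treat += len(treated)
    return num / n_treat
```

For two blocks with within-block differences 6 (two treated) and 5 (one treated), the ATT is (2 × 6 + 1 × 5) / 3 ≈ 5.67.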

Bootstrapping of standard errors

command:      atts re78 treat , pscore(myscore) blockid(myblock) comsup
statistic:    atts = r(atts)
.................................................................................................... > .................................................................................................. > ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |   200   1505.512   61.77066    741.7862     42.7422   2968.282   (N)
             |                                            245.6284   2880.622   (P)
             |                                             348.125   2849.896  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1505.512     741.786       2.030

---------------------------------------------------------

.
. *** (2C) CPS propensity score model from Becker-Ichino, 2002 (BI02)
.
. gen re742 = re74*re74
. gen re752 = re75*re75
. gen blacku74 = black*u74
. global CPSBI02 age age2 edu edu2 married black hisp re74 re75 re742 re752 blacku74
.
. * With common support option
. drop myscore myblock
. pscore treat $CPSBI02, pscore(myscore) blockid(myblock) comsup numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -660.17479
Iteration 2:   log likelihood = -533.64831
Iteration 3:   log likelihood = -462.67008
Iteration 4:   log likelihood = -435.22392
Iteration 5:   log likelihood = -427.14921
Iteration 6:   log likelihood = -425.78297
Iteration 7:   log likelihood = -425.64689
Iteration 8:   log likelihood = -425.64309

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(12)     =    1170.86
                                                  Prob > chi2     =     0.0000
Log likelihood = -425.64309                       Pseudo R2       =     0.5790

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .7902073   .0940972     8.40   0.000     .6057803    .9746344
        age2 |  -.0128161   .0015894    -8.06   0.000    -.0159313   -.0097009
         edu |   .9953909   .2558663     3.89   0.000     .4939022     1.49688
        edu2 |  -.0636036   .0131378    -4.84   0.000    -.0893532   -.0378541
     married |  -1.534639   .2516679    -6.10   0.000    -2.027899   -1.041379
       black |   3.340175   .3032312    11.02   0.000     2.745853    3.934497
        hisp |   1.636367   .3971529     4.12   0.000     .8579614    2.414772
        re74 |  -.0001744   .0000626    -2.79   0.005    -.0002971   -.0000517
        re75 |   -.000168   .0000693    -2.42   0.015    -.0003039   -.0000322
       re742 |   8.06e-09   2.61e-09     3.09   0.002     2.95e-09    1.32e-08
       re752 |  -2.05e-09   3.97e-09    -0.52   0.605    -9.83e-09    5.73e-09
    blacku74 |   1.033264    .288037     3.59   0.000     .4687217    1.597806
       _cons |  -18.16269   1.865757    -9.73   0.000    -21.81951   -14.50588
------------------------------------------------------------------------------
note: 112 failures and 0 successes completely determined.
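To see how the estimated propensity scores are formed, the sketch below plugs the printed BI02 logit coefficients into Λ(x'b) = 1/(1 + exp(-x'b)) for hypothetical individuals. re742, re752 and blacku74 follow the `gen` commands in this log; treating age2 and edu2 as the squares of age and edu, and u74 as a dummy for zero 1974 earnings, are assumptions, since their definitions are not shown here.

```python
import math

# Logit coefficients copied from the BI02 output above.
COEF = {"age": .7902073, "age2": -.0128161, "edu": .9953909, "edu2": -.0636036,
        "married": -1.534639, "black": 3.340175, "hisp": 1.636367,
        "re74": -.0001744, "re75": -.000168, "re742": 8.06e-09,
        "re752": -2.05e-09, "blacku74": 1.033264, "_cons": -18.16269}


def pscore_bi02(age, edu, married, black, hisp, re74, re75):
    """Predicted probability of treatment from the BI02 logit coefficients."""
    u74 = 1 if re74 == 0 else 0  # assumption: u74 flags zero earnings in 1974
    x = {"age": age, "age2": age ** 2, "edu": edu, "edu2": edu ** 2,
         "married": married, "black": black, "hisp": hisp,
         "re74": re74, "re75": re75, "re742": re74 ** 2, "re752": re75 ** 2,
         "blacku74": black * u74, "_cons": 1}
    xb = sum(COEF[k] * x[k] for k in COEF)
    return 1 / (1 + math.exp(-xb))


# Hypothetical 25-year-old unmarried black man, 10 years of schooling, no earnings
print(round(pscore_bi02(25, 10, 0, 1, 0, 0, 0), 3))
```

The first profile yields a score of about 0.82, while a married 40-year-old with 12 years of schooling and earnings of 20,000 in both years yields a score below 0.001. This illustrates how skewed the score distribution summarized in this log is.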

Note: the common support option has been selected

The region of common support is [.00065577, .90386519]
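With the common support option, only observations whose scores fall in the overlap of the treated and control score distributions are kept. In this run, 5,354 of the 16,177 observations are retained, including all 185 treated units. A minimal sketch, assuming the region is the intersection [max of the two minima, min of the two maxima]; the function names are hypothetical.

```python
def common_support(p, d):
    """Return (lower, upper) bounds of the overlap of treated and control scores."""
    pt = [pi for pi, di in zip(p, d) if di == 1]
    pc = [pi for pi, di in zip(p, d) if di == 0]
    return max(min(pt), min(pc)), min(max(pt), max(pc))


def on_support(p, d):
    """Flag the observations whose score lies inside the region of common support."""
    lo, hi = common_support(p, d)
    return [lo <= pi <= hi for pi in p]
```

Controls with scores far below the smallest treated score, which are the bulk of the CPS comparison sample here, are the ones removed.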

Description of the estimated propensity score in region of common support

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0006768       .0006558
 5%     .0007912        .000656
10%     .0009583       .0006562       Obs                5354
25%     .0016749       .0006566       Sum of Wgt.        5354

50%     .0040446                      Mean           .0343457
                        Largest       Std. Dev.      .1120884
75%     .0089357       .8905055
90%     .0495031        .898552       Variance       .0125638
95%     .1913766       .9023286       Skewness       4.931471
99%     .6773557       .9038652       Kurtosis       29.27201

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 10

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

Variable blacku74 is not balanced in block 3

The balancing property is not satisfied

Try a different specification of the propensity score

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |     4,230         13 |     4,243
     .0125 |       330          7 |       337
      .025 |       231          9 |       240
       .05 |       126         14 |       140
        .1 |       108         23 |       131
        .2 |        87         30 |       117
        .4 |        29         20 |        49
        .5 |        10         24 |        34
        .6 |        12         25 |        37
        .8 |         6         20 |        26
-----------+----------------------+----------
     Total |     5,169        185 |     5,354

Note: the common support option has been selected
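Step 2 checks, block by block, whether each covariate has the same mean for treated and controls; here blacku74 fails in block 3. The sketch below performs that check for a single covariate. It assumes a two-sample t test with unequal variances and a normal critical value at level(0.005); the exact statistic used by `pscore` may differ, and `unbalanced` is a hypothetical name.

```python
import math
from statistics import NormalDist, mean, variance


def unbalanced(x, d, block, level=0.005):
    """Return the blocks in which covariate x fails an equality-of-means test."""
    crit = NormalDist().inv_cdf(1 - level / 2)  # two-sided normal critical value
    bad = []
    for b in sorted(set(block)):
        xt = [xi for xi, di, bi in zip(x, d, block) if bi == b and di == 1]
        xc = [xi for xi, di, bi in zip(x, d, block) if bi == b and di == 0]
        if len(xt) < 2 or len(xc) < 2:
            continue  # too few units in one group to test
        se = math.sqrt(variance(xt) / len(xt) + variance(xc) / len(xc))
        diff = mean(xt) - mean(xc)
        if (se == 0 and diff != 0) or (se > 0 and abs(diff) / se > crit):
            bad.append(b)
    return bad
```

A failed check like this one is the signal, printed in the log, to try a different specification of the propensity score.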

*******************************************
End of the algorithm to estimate the pscore
*******************************************

.
. * Without common support option
. drop myscore myblock
. pscore treat $CPSBI02, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -660.17479
Iteration 2:   log likelihood = -533.64831
Iteration 3:   log likelihood = -462.67008
Iteration 4:   log likelihood = -435.22392
Iteration 5:   log likelihood = -427.14921
Iteration 6:   log likelihood = -425.78297
Iteration 7:   log likelihood = -425.64689
Iteration 8:   log likelihood = -425.64309

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(12)     =    1170.86
                                                  Prob > chi2     =     0.0000
Log likelihood = -425.64309                       Pseudo R2       =     0.5790

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .7902073   .0940972     8.40   0.000     .6057803    .9746344
        age2 |  -.0128161   .0015894    -8.06   0.000    -.0159313   -.0097009
         edu |   .9953909   .2558663     3.89   0.000     .4939022     1.49688
        edu2 |  -.0636036   .0131378    -4.84   0.000    -.0893532   -.0378541
     married |  -1.534639   .2516679    -6.10   0.000    -2.027899   -1.041379
       black |   3.340175   .3032312    11.02   0.000     2.745853    3.934497
        hisp |   1.636367   .3971529     4.12   0.000     .8579614    2.414772
        re74 |  -.0001744   .0000626    -2.79   0.005    -.0002971   -.0000517
        re75 |   -.000168   .0000693    -2.42   0.015    -.0003039   -.0000322
       re742 |   8.06e-09   2.61e-09     3.09   0.002     2.95e-09    1.32e-08
       re752 |  -2.05e-09   3.97e-09    -0.52   0.605    -9.83e-09    5.73e-09
    blacku74 |   1.033264    .288037     3.59   0.000     .4687217    1.597806
       _cons |  -18.16269   1.865757    -9.73   0.000    -21.81951   -14.50588
------------------------------------------------------------------------------
note: 112 failures and 0 successes completely determined.

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.89e-08       1.94e-10
 5%     3.05e-07       1.94e-10
10%     1.20e-06       1.94e-10       Obs               16177
25%     .0000148       1.94e-10       Sum of Wgt.       16177

50%     .0001313                      Mean            .011436
                        Largest       Std. Dev.      .0664629
75%     .0016513       .8905055
90%     .0074369        .898552       Variance       .0044173
95%     .0234798       .9023286       Skewness       8.811019
99%     .3855562       .9038652       Kurtosis       89.82108

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 14

This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

Variable blacku74 is not balanced in block 7

The balancing property is not satisfied

Try a different specification of the propensity score

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,076          1 |    11,077
  .0007813 |       968          2 |       970
  .0015625 |     1,020          2 |     1,022
   .003125 |     1,185          3 |     1,188
    .00625 |       804          5 |       809
     .0125 |       330          7 |       337
      .025 |       231          9 |       240
       .05 |       126         14 |       140
        .1 |       108         23 |       131
        .2 |        87         30 |       117
        .4 |        29         20 |        49
        .5 |        10         24 |        34
        .6 |        12         25 |        37
        .8 |         6         20 |        26
-----------+----------------------+----------
     Total |    15,992        185 |    16,177

*******************************************
End of the algorithm to estimate the pscore
*******************************************

.
. * Nearest neighbor matching (random version)
. attnd re78 treat $CPSBI02, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit. This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185         147    1214.888     988.298       1.229

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

Bootstrapping of standard errors

command:      attnd re78 treat age age2 edu edu2 married black hisp re74 re75 re742 re752 blacku74 , pscore() logit comsup
statistic:    attnd = r(attnd)
.................................................................................................... > .................................................................................................. > ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200   1214.888   379.5276    924.3417   -607.8733    3037.65   (N)
             |                                            -199.325   3378.257   (P)
             |                                           -1646.026   2654.964  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185         147    1214.888     924.342       1.314

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

.
. * Radius matching: Radius=0.0001
. attr re78 treat $CPSBI02, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius. This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       65        1089   -3094.104     857.247      -3.609

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr re78 treat age age2 edu edu2 married black hisp re74 re75 re742 re752 blacku74 , pscore() logit comsup radius(.0001)
statistic:    attr = r(attr)
.................................................................................................... > .................................................................................................. > ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -3094.104   603.6858    1724.927   -6495.585   307.3775   (N)
             |                                           -5865.623   247.5659   (P)
             |                                           -8184.668  -474.5812  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       65        1089   -3094.104    1724.927      -1.794

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

.
. * Kernel Matching
. attk re78 treat $CPSBI02, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit. This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        5169     881.520           .           .

---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use
the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:      attk re78 treat age age2 edu edu2 married black hisp re74 re75 re742 re752 blacku74 , pscore() logit comsup bwidth(.06)
statistic:    attk = r(attk)
.................................................................................................... > .................................................................................................. > ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200   881.5195   193.3904    741.3048   -580.3012    2343.34   (N)
             |                                           -375.8089   2373.732   (P)
             |                                           -776.3726   2117.355  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        5169     881.520     741.305       1.189

---------------------------------------------------------

.
. * Stratification Matching
. atts re78 treat, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        5169    1538.713           .           .

---------------------------------------------------------

Bootstrapping of standard errors

command:      atts re78 treat , pscore(myscore) blockid(myblock) comsup
statistic:    atts = r(atts)
.................................................................................................... > .................................................................................................. > ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |   200   1538.713   18.76738    748.4438    62.81438   3014.612   (N)
             |                                            249.6562   3263.537   (P)
             |                                            225.0108   3230.658  (BC)
------------------------------------------------------------------------------
Note:  N = normal
       P = percentile
      BC = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        5169    1538.713     748.444       2.056

---------------------------------------------------------

.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section6\mma25p3extra.txt
  log type:  text
 closed on:  26 May 2005, 13:26:49
----------------------------------------------------------------------------------------------------


FIGURES

Most of these figures are produced by Stata programs given at this website.

Page  Figure  Brief caption (File)
  50  3.1     Social experiment with random assignment (ch3-fig1.wmf)
  89  4.1     Quantile regression estimates of slope coefficient (ch4fig1qr.wmf)
  90  4.2     Quantile regression estimated lines (ch4fig2qr.wmf)
 249  7.1     Power of Wald chi-square test (ch7power.wmf)
 253  7.2     Density of Wald test statistic of zero slope coefficient (ch7montecarlo.wmf)
 296  9.1     Histogram for log wage (ch9hist.wmf)
 296  9.2     Kernel density estimates for log wage (ch9kd1.wmf)
 297  9.3     Nonparametric regression of log wage on education (ch9ksm1.wmf)
 300  9.4     Kernel density estimates using different kernels (ch9kdensu1.wmf)
 309  9.5     k-NN regression (ch9ksmma.wmf)
 310  9.6     Nonparametric regression using Lowess (ch9ksmlowess.wmf)
 317  9.7     Nonparametric estimate of derivative of y with respect to x (ch9kderiv.wmf)
 368  11.1    Bootstrap estimate of the density of t-test statistic (ch11boot.wmf)
 411  12.1    Halton sequence draws compared to pseudo-random draws
 413  12.2    Inverse transformation method for unit exponential draws (ch12fig2invtransform.wmf)
 414  12.3    Accept-reject method for random draws (ch12fig3envelope.wmf)
 424  13.1    Bayesian analysis for mean parameter of normal density (ch13_bayes1.wmf)
 466  14.1    Charter boat fishing: probit and logit predictions (ch14binary.wmf)
 516  15.1    Generalized random utility model (ch15-Gen-RUM2.wmf)
 531  16.1    Tobit regression example (ch16condmeans.wmf)
 540  16.2    Inverse Mills ratio as censoring point c increases (ch16millsratio.wmf)
 575  17.1    Strike duration: Kaplan-Meier survival function (kennanstrk.wmf)
 585  17.2    Weibull distribution: density, survivor, hazard and cumulative hazard functions (ch17weibull.wmf)
 604  17.3    Unemployment duration: Kaplan-Meier survival function (km_pt1.wmf)
 605  17.4    Unemployment duration: survival functions by unemployment insurance (km_pt2.wmf)
 606  17.5    Unemployment duration: Nelson-Aalen cumulative hazard function (na_pt1.wmf)
 606  17.6    Unemployment duration: cumulative hazard functions by unemployment insurance (na_pt2.wmf)
 627  18.1    Length-biased sampling under stock sampling (ch18lbias.wmf)
 633  18.2    Unemployment duration: exponential model generalized residuals (exp.wmf)
 633  18.3    Unemployment duration: exponential-gamma model generalized residuals (exp_gamma.wmf)
 635  18.4    Unemployment duration: Weibull model generalized residuals (weibul16.wmf)
 636  18.5    Unemployment duration: Weibull-Inverse Gaussian model generalized residuals (weibul16_ig.wmf)
 661  19.1    Unemployment duration: Cox Competing Risks baseline survival functions (combined_bsf.wmf)
 662  19.2    Unemployment duration: Cox Competing Risks baseline cumulative hazards (combined_cbh.wmf)
 712  21.1    Hours and wages: pooled (overall) regression (ch21pantot.wmf)
 713  21.2    Hours and wages: between regression (ch21panbe.wmf)
 713  21.3    Hours and wages: within (fixed effects) regression (ch21panfe.wmf)
 714  21.4    Hours and wages: first differences regression (ch21panfd.wmf)
 793  23.1    Patents and R&D spending: pooled (overall) regression [with corrected labelling of axes] (ch23fig1.wmf)
 880  25.1    Regression-discontinuity design: example (ch25-fig1-rd.wmf)
 883  25.2    Treatment assignment in sharp and fuzzy RD designs (ch25-fig2-rd.wmf)
 892  25.3    Training impact: earnings against propensity score by treatment status (ch25treatment.wmf)
 924  27.1    Missing data: examples of missing regressors (ch27fig1.wmf)

[Figure 3.1: Social experiment with random assignment. Flowchart: eligible subject invited to participate; agrees to participate? If no, drop from study; if yes, randomize, then assign to treatment or assign to control.]

[Figure 4.1: Slope Estimates as Quantile Varies. Quantile slope coefficient with upper and lower 95% confidence bands, compared with the OLS slope coefficient. Vertical axis: slope and confidence bands; horizontal axis: quantile.]

[Figure 4.2: Regression Lines as Quantile Varies. Actual data with the 90th percentile, median and 10th percentile regression lines. Axes: log household total expenditure and log household medical expenditure.]

[Figure 7.1: Test Power as a function of the ncp. Power curves for test sizes 0.10, 0.05 and 0.01. Vertical axis: test power; horizontal axis: noncentrality parameter lambda.]

[Figure 7.2: Monte Carlo Simulations of Wald Test. Monte Carlo density of the Wald test statistic compared with the standard normal. Vertical axis: density; horizontal axis: Wald test statistic.]

[Figure 9.1: Histogram for Log Wage. Vertical axis: density; horizontal axis: log hourly wage.]

[Figure 9.2: Density Estimates as Bandwidth Varies. One-half plug-in, plug-in and two times plug-in bandwidths. Vertical axis: kernel density estimates; horizontal axis: log hourly wage.]

[Figure 9.3: Nonparametric Regression as Bandwidth Varies. Actual data with bandwidths h=0.8, h=0.4 and h=0.1. Vertical axis: log hourly wage; horizontal axis: years of schooling.]

[Figure 9.4: Density Estimates as Kernel Varies. Epanechnikov (h=0.545), Gaussian (h=0.246), Quartic (h=0.646) and Uniform (h=0.214) kernels. Vertical axis: kernel density estimates; horizontal axis: log hourly wage.]
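Figure 9.4 compares kernel density estimates f(x0) = (1/(nh)) Σ K((x_i - x0)/h) for several kernels. A minimal sketch with the Epanechnikov kernel written as K(z) = 0.75(1 - z²) for |z| < 1. Stata's default `epanechnikov` kernel in `kdensity` is a rescaled variant with support |z| < √5 (the form used here corresponds to its `epan2` option), so the bandwidths in the figure are not directly comparable with this sketch.

```python
def epanechnikov_kde(x0, data, h):
    """Kernel density estimate at x0 with K(z) = 0.75 * (1 - z**2) for |z| < 1."""
    total = 0.0
    for xi in data:
        z = (xi - x0) / h
        if abs(z) < 1:
            total += 0.75 * (1 - z * z)
    return total / (len(data) * h)


# Density of three hypothetical log-wage values, evaluated at 2.5 with h = 0.545
print(epanechnikov_kde(2.5, [2.0, 2.5, 3.1], 0.545))
```

A larger h spreads each observation's contribution over a wider range, which smooths the estimate; this is the trade-off that Figure 9.2 displays for a single kernel.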


[Figure: k-Nearest Neighbours Regression as k Varies. Vertical axis: dependent variable y; horizontal axis: regressor x. Legend: actual data, kNN (k = 5), kNN (k = 25), linear OLS.]
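A k-nearest-neighbours regression estimate at a point x0 is the average of y over the k observations with x closest to x0. The sketch below uses plain nearest neighbours with ties broken by data order; whether the figure uses this or a symmetric-neighbourhood variant is an assumption. The data are hypothetical.

```python
def knn_regression(x0, xs, ys, k):
    """Average of y over the k observations whose x is nearest to x0.

    Ties in distance are broken by data order (Python's sort is stable).
    """
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of observations")
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


# Hypothetical data
xs = [10, 20, 30, 40, 50, 60, 70, 80, 90]
ys = [150, 170, 160, 200, 210, 230, 220, 260, 300]
for k in (1, 3, 9):
    print(k, knn_regression(50, xs, ys, k))
```

Larger k averages over more observations and gives a smoother curve; at k = N every point receives the sample mean of y.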

[Figure: Lowess Nonparametric Regression. Vertical axis: dependent variable y; horizontal axis: regressor x. Legend: actual data, lowess (k = 25), OLS cubic regression.]

[Figure: Nonparametric Derivative Estimation. Vertical axis: dependent variable y; horizontal axis: regressor x. Legend: from OLS cubic regression, from lowess (k = 25).]

[Figure: Bootstrap Density of 't-Statistic'. Vertical axis: density; horizontal axis: t-statistic from each bootstrap replication. Legend: bootstrap estimate, standard normal.]

[Figure: Inverse Transformation Method. Vertical axis: cdf F(x); horizontal axis: random variable x. A draw of 0.64 (vertical axis) yields x = 1.02 (horizontal axis).]
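The inverse transformation method draws u uniform on (0, 1) and returns x = F^(-1)(u). The numbers in the figure are consistent with a unit exponential, F(x) = 1 - exp(-x), since -ln(1 - 0.64) = -ln 0.36 ≈ 1.02; that choice of distribution is our inference, not something the figure states. A sketch:

```python
import math
import random


def exponential_inverse_cdf(u, rate=1.0):
    """Invert F(x) = 1 - exp(-rate * x): x = -ln(1 - u) / rate, for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate


print(round(exponential_inverse_cdf(0.64), 2))  # 1.02, matching the figure

# Inverse transformation method: uniform draws mapped through F^{-1}
rng = random.Random(2005)
draws = [exponential_inverse_cdf(rng.random()) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to the exponential mean 1 / rate = 1
```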

[Figure 12.3: Accept-reject Method. Vertical axis: f(x) and kg(x); horizontal axis: random variable x. Legend: desired density f(x), envelope kg(x).]
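In the accept-reject method a candidate x is drawn from an envelope density g and accepted with probability f(x)/(k g(x)), where k is chosen so that k g(x) >= f(x) everywhere. The figure's densities are not recoverable, so this sketch uses a hypothetical target f(x) = x exp(-x) (Gamma(2, 1)) and a hypothetical exponential envelope with mean 2, for which k = 4/e.

```python
import math
import random

K = 4 / math.e  # max over x of f(x) / g(x) = 2x exp(-x/2), attained at x = 2


def target_density(x):
    """Hypothetical desired density f(x) = x exp(-x), i.e. Gamma(2, 1)."""
    return x * math.exp(-x)


def envelope_density(x):
    """Hypothetical envelope g(x): exponential density with mean 2."""
    return 0.5 * math.exp(-0.5 * x)


def accept_reject(n, rng):
    draws = []
    while len(draws) < n:
        x = rng.expovariate(0.5)  # candidate drawn from g (rate 0.5, mean 2)
        if rng.random() <= target_density(x) / (K * envelope_density(x)):
            draws.append(x)  # accepted with probability f(x) / (k g(x))
    return draws


sample = accept_reject(20000, random.Random(414))
print(sum(sample) / len(sample))  # close to the Gamma(2, 1) mean of 2
```

Because both f and g integrate to one, the overall acceptance rate is 1/k = e/4, about 0.68; a tighter envelope (smaller k) wastes fewer candidate draws.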

[Figure: Bayes: Likelihood, Prior and Posterior. Vertical axis: density; horizontal axis: evaluation point. Legend: likelihood N[10,2], prior N[5,3], posterior N[8,1.2].]
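Reading N[a,b] as mean a and variance b (our assumption about the notation), the posterior in the figure follows from the normal-normal updating rule: precisions add, so the posterior variance is (1/2 + 1/3)^(-1) = 1.2, and the posterior mean is the precision-weighted average 1.2 × (10/2 + 5/3) = 8. A sketch of that arithmetic:

```python
def normal_posterior(like_mean, like_var, prior_mean, prior_var):
    """Normal likelihood times normal prior for a mean: precisions add, and the
    posterior mean is the precision-weighted average of the two means."""
    post_var = 1.0 / (1.0 / like_var + 1.0 / prior_var)
    post_mean = post_var * (like_mean / like_var + prior_mean / prior_var)
    return post_mean, post_var


mean, var = normal_posterior(10, 2, 5, 3)
print(round(mean, 6), round(var, 6))  # 8.0 1.2
```

The posterior mean lies between the likelihood and prior means, closer to the more precise of the two, and the posterior variance is smaller than either.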

[Figure: Predicted Probabilities Across Models. Vertical axis: predicted probability; horizontal axis: log relative price (lnrelp). Legend: actual data (jittered), OLS, logit, probit.]

[Figure: path diagram of a latent variable model, showing explanatory variables, latent variables, latent classes, utilities, stated preference indicators, a revealed preference indicator y, and disturbances, linked by structural relationships and measurement relationships; observable and unobservable variables are distinguished.]

[Figure: Tobit: Censored and Truncated Means. Vertical axis: different conditional means; horizontal axis: natural logarithm of wage. Legend: actual latent variable, uncensored mean, truncated mean, censored mean.]

[Figure: Inverse Mills Ratio as Cutoff Varies. Vertical axis: inverse Mills, pdf and cdf; horizontal axis: cutoff point c (-2 to 2). Legend: inverse Mills ratio, N[0,1] density, N[0,1] cdf.]
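The Tobit means and inverse Mills ratio figures rest on standard normal truncation results. For z ~ N[0,1], E[z | z > c] = φ(c)/[1 - Φ(c)]; the figure may instead plot the mirror-image form φ(c)/Φ(c), which we have not confirmed. For y* = x'β + u with u ~ N[0, σ²] and a threshold of zero, the truncated mean is x'β + σφ(a)/Φ(a) and the censored mean is Φ(a)x'β + σφ(a), where a = x'β/σ. A sketch with illustrative names:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def inverse_mills(c):
    """lambda(c) = phi(c) / (1 - Phi(c)) = E[z | z > c] for z ~ N(0, 1)."""
    return Z.pdf(c) / (1.0 - Z.cdf(c))


def tobit_means(xb, sigma):
    """Means for y* = xb + u, u ~ N(0, sigma^2), with censoring or truncation at zero."""
    a = xb / sigma
    latent = xb                                  # E[y*]
    truncated = xb + sigma * inverse_mills(-a)   # E[y* | y* > 0] = xb + sigma phi(a)/Phi(a)
    censored = Z.cdf(a) * truncated              # E[max(y*, 0)] = Phi(a) xb + sigma phi(a)
    return latent, truncated, censored


for xb in (-1.0, 0.0, 1.0):
    print(xb, [round(m, 3) for m in tobit_means(xb, sigma=2.0)])
```

The ordering latent < censored < truncated holds at every x'β, which is why the three conditional-mean curves in the Tobit figure never cross.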

[Figure: Kaplan-Meier Survival Function Estimate. Vertical axis: survival probability; horizontal axis: strike duration in days. Legend: survival function, upper and lower 95% confidence bands.]
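The Kaplan-Meier estimate multiplies, over distinct failure times t_j up to t, the conditional survival probabilities 1 - d_j/r_j, where d_j is the number of spells ending at t_j and r_j the number still at risk. A minimal sketch, with hypothetical durations and a censoring indicator:

```python
def kaplan_meier(durations, failed):
    """Kaplan-Meier survivor estimate S(t) = prod_{t_j <= t} (1 - d_j / r_j).

    durations: observed spell lengths; failed: 1 if the spell ended, 0 if right-censored.
    Returns (t_j, S(t_j)) at each distinct failure time.
    """
    times = sorted(set(t for t, d in zip(durations, failed) if d))
    surv, out = 1.0, []
    for tj in times:
        at_risk = sum(1 for t in durations if t >= tj)  # censored spells stay at risk until they end
        deaths = sum(1 for t, d in zip(durations, failed) if t == tj and d)
        surv *= 1.0 - deaths / at_risk
        out.append((tj, surv))
    return out


# Hypothetical spell lengths (e.g. strike days); 0 marks a censored spell
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

Censored spells never cause a drop in the estimate, but they do shrink the risk set at later failure times.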

[Figure: Weibull Distribution. Four panels plotted against duration time: Weibull survivor, Weibull density, Weibull hazard, and cumulative hazard.]
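The four Weibull panels are linked by S(t) = exp(-H(t)), h(t) = dH(t)/dt and f(t) = h(t)S(t). The sketch uses the parameterization H(t) = γt^α, which is one common form; the figure's parameter values are not recoverable, so the ones below are hypothetical.

```python
import math


def weibull_functions(t, gamma, alpha):
    """Weibull with cumulative hazard H(t) = gamma * t**alpha, for t > 0 (one common parameterization)."""
    cum_hazard = gamma * t ** alpha
    hazard = gamma * alpha * t ** (alpha - 1)  # h(t) = dH(t)/dt
    survivor = math.exp(-cum_hazard)           # S(t) = exp(-H(t))
    density = hazard * survivor                # f(t) = h(t) S(t)
    return survivor, density, hazard, cum_hazard


# Hypothetical parameters: alpha > 1 gives a rising hazard, alpha < 1 a falling one
for t in (10, 20, 40, 80):
    print(t, [round(v, 4) for v in weibull_functions(t, gamma=0.001, alpha=1.5)])
```

With α = 1 the hazard is constant and the Weibull reduces to the exponential distribution.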

[Figure: Overall Survival Function Estimate. Vertical axis: survival probability; horizontal axis: unemployment duration in 2-week intervals. Legend: survival estimate, upper and lower 95% confidence bands.]

[Figure: Survival Function Estimates by UI Status. Vertical axis: survival probability; horizontal axis: unemployment duration in 2-week intervals. Legend: no UI (UI = 0), received UI (UI = 1).]

[Figure: Overall Cumulative Hazard Estimate. Vertical axis: cumulative hazard; horizontal axis: unemployment duration in 2-week intervals. Legend: cumulative hazard estimate, upper and lower 95% confidence bands.]
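A standard nonparametric estimator of the cumulative hazard is the Nelson-Aalen estimator, the running sum of d_j/r_j over distinct failure times; we assume, without confirmation, that it underlies the cumulative hazard figures. A sketch using the same hypothetical data layout as a Kaplan-Meier calculation:

```python
def nelson_aalen(durations, failed):
    """Nelson-Aalen cumulative hazard H(t) = sum_{t_j <= t} d_j / r_j at distinct failure times.

    durations: observed spell lengths; failed: 1 if the spell ended, 0 if right-censored.
    """
    times = sorted(set(t for t, d in zip(durations, failed) if d))
    total, out = 0.0, []
    for tj in times:
        at_risk = sum(1 for t in durations if t >= tj)
        deaths = sum(1 for t, d in zip(durations, failed) if t == tj and d)
        total += deaths / at_risk  # hazard increment at t_j
        out.append((tj, total))
    return out


# Hypothetical spell lengths in 2-week intervals; 0 marks a censored spell
print(nelson_aalen([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```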

[Figure: Cumulative Hazard Estimates by UI Status. Vertical axis: cumulative hazard; horizontal axis: unemployment duration in 2-week intervals. Legend: no UI (UI = 0), received UI (UI = 1).]

[Figure: spells S1 to S9 plotted relative to a 12-month survey period and the survey date.]

[Figure: Exponential Model Residuals. Vertical axis: cumulative hazard; horizontal axis: generalized (Cox-Snell) residual. Legend: cumulative hazard, 45 degree line.]

[Figure: Exponential-Gamma Model Residuals. Vertical axis: cumulative hazard; horizontal axis: generalized (Cox-Snell) residual. Legend: cumulative hazard, 45 degree line.]

[Figure: Weibull Model Residuals. Vertical axis: cumulative hazard; horizontal axis: generalized (Cox-Snell) residual. Legend: cumulative hazard, 45 degree line.]

[Figure: Weibull-IG Model Residuals. Vertical axis: cumulative hazard; horizontal axis: generalized (Cox-Snell) residual. Legend: cumulative hazard, 45 degree line.]

[Figure: Baseline Survival Functions. Vertical axis: baseline survival probability; horizontal axis: unemployment duration in 2-week intervals. Legend: risk 1 (full-time job), risk 2 (part-time job), risk 3 (unknown job).]

[Figure: Baseline Cumulative Hazard Functions. Vertical axis: baseline cumulative hazard; horizontal axis: unemployment duration in 2-week intervals. Legend: risk 1 (full-time job), risk 2 (part-time job), risk 3 (unknown job).]

[Figure: Pooled (Overall) Regression. Vertical axis: log annual hours; horizontal axis: log hourly wage. Legend: original data, nonparametric fit, linear fit.]

[Figure: Between Regression. Vertical axis: log annual hours; horizontal axis: log hourly wage. Legend: averages, nonparametric fit, linear fit.]

[Figure: Within (Fixed Effects) Regression. Vertical axis: log annual hours; horizontal axis: log hourly wage. Legend: deviations from average, nonparametric fit, linear fit.]

[Figure: First Differences Regression. Vertical axis: log annual hours; horizontal axis: log hourly wage. Legend: first differences, nonparametric fit, linear fit.]
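The pooled, between, within and first-differences panels fit lines to different transformations of the same panel data. The sketch computes the four slopes on a tiny hypothetical panel built as y = α_i + 2x, with the individual effect α_i correlated with x. Each slope here comes from OLS with an intercept, as for a fitted line in a scatter plot; that choice, like the data and names, is an illustrative assumption.

```python
def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def panel_slopes(panel):
    """panel maps an individual id to a time-ordered list of (x, y) pairs."""
    xs = [x for obs in panel.values() for x, _ in obs]
    ys = [y for obs in panel.values() for _, y in obs]
    means = {i: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
             for i, obs in panel.items()}
    # Within: deviations from each individual's own averages
    wx = [x - means[i][0] for i, obs in panel.items() for x, _ in obs]
    wy = [y - means[i][1] for i, obs in panel.items() for _, y in obs]
    # First differences: changes between consecutive periods for the same individual
    dx = [obs[t][0] - obs[t - 1][0] for obs in panel.values() for t in range(1, len(obs))]
    dy = [obs[t][1] - obs[t - 1][1] for obs in panel.values() for t in range(1, len(obs))]
    return {
        "pooled": ols_slope(xs, ys),
        "between": ols_slope([m[0] for m in means.values()], [m[1] for m in means.values()]),
        "within": ols_slope(wx, wy),
        "first_difference": ols_slope(dx, dy),
    }


# Hypothetical panel: y = alpha_i + 2 x, with alpha_i larger for the individual with larger x
panel = {
    1: [(1, 2), (2, 4), (4, 8)],     # alpha_1 = 0
    2: [(4, 18), (5, 20), (7, 24)],  # alpha_2 = 10
}
print(panel_slopes(panel))
```

The within and first-differences slopes remove α_i and recover 2, while the pooled and between slopes absorb the correlation between α_i and x and are biased upward.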

[Figure 23.1: Pooled (Overall) Regression. Vertical axis: log patents; horizontal axis: log R&D spending. Legend: original data, nonparametric fit, linear fit.]

[Figure 25.1: Regression Discontinuity Example. Vertical axis: outcome y; horizontal axis: selection variable S. Legend: actual data, no treat (low), treat (high).]

[Figure 25.3: Post-treatment Earnings against Propensity Score, graphs by treatment status (panels: Treated_sample, Comparison_sample). Vertical axis: real earnings 1978; horizontal axis: propensity score. Legend: original data, nonparametric regression.]

[Figure 25.2: Sharp and Fuzzy RD Designs. Vertical axis: propensity score Pr[D=1|S]; horizontal axis: selection variable S. Legend: sharp design, fuzzy design.]


BOOK CORRECTIONS - June 9, 2005 (plus some, but not all, corrections added since then)

Page | Date Posted | Correction or Addition

p. 85 | 2/18/2006 | Bottom line should be "censored models (see Section 16.9.2)." [Jeff Smith, Michigan]

p. 68, 147 | 11/22/2005 | Liebler should be spelt Leibler [Joerg Stoye, NYU]

p. 89 | 3/30/2006 | Third last line should be "q = 0.1, 0.5, and 0.9" and not "q = 0.1, 0.2, ..., 0.9" [James MacKinnon, Queen's]

p. 113 | 5/27/2005 | Exercise 4-2 part (b) should be "Hence directly obtain a consistent estimate of the variance of µ_hat" (and not "Hence directly obtain the variance of y_bar")

p. 114 | 6/9/2005 | Exercise 4-7 parts (d)-(f) need to be replaced. See mmaex04_7.pdf.

p. 164 | 6/9/2005 | Exercise 5-1 is correct but the function is close to linear. A better example uses E[y|x] = exp(0+0.04x)/[1+exp(0+0.04x)].

p. 165 | 6/9/2005 | Exercise 5-7 part (c) is ML estimation (delete the word NLS).

p. 168 | 3/3/2006 | Second line after first displayed equation should be E[h(x)(y-g(x,β))] = 0 (and not E[h(x)(y-x'β)]) [Doug Miller, UC-Davis]

p. 178 | 3/3/2006 | Last displayed equation. The first and third matrices are wrong and should be similar to G_hat in (6.21). For these matrices the two terms being summed over i should be x_i*x_i' and 3*utilde_i^2*x_i*x_i'. [Doug Miller, UC-Davis]

p. 189 | 3/6/2006 | Theil's interpretation. Change "suppose that in the reduced form model" to "Suppose that we specify a first-stage model where" [Doug Miller, UC-Davis]

p. 190 | 3/6/2006 | Basmann's interpretation. Change "OLS reduced form prediction" to "OLS first-stage predictions". [Doug Miller, UC-Davis]

p. 193 | 3/6/2006 | Top line: change "because to regressors" to "because the regressors" [Doug Miller, UC-Davis]

p. 199 | 5/18/2005 | In Table 6.4 the NL2SLS column is 0.969, 0.041, 0.84 (and not 0.960, 0.046, 0.85)

p. 214 | 3/28/2006 | In the displayed equation for the 3SLS estimator the matrix OMEGA_hat should be SIGMA_hat. Same change two lines down and four lines down. SIGMA_hat = definition given for OMEGA_hat.

p. 220 | 5/27/2005 | Exercise 6-1 part (a) should be (y - exp(x'β))^2 (and not (y - (x'β))^2). Exercise 6-1 part (d) should be E[x(y - exp(x'β))] = 0 (so add "= 0").

p. 255 | 5/18/2005 | Sample size was N=40 (and not N=30)

p. 255 | 5/18/2005 | Five lines from bottom should be z = (0.817 - 1) / 0.376 = -0.487

p. 256 | 5/18/2005 | In Section 7.8.3 the percentiles should be -1.89 and 1.80 (and not -2.62 and 1.83)

p. 278, 280 | 11/22/2005 | Liebler should be spelt Leibler [Joerg Stoye, NYU]

p. 414 | 5/18/2005 | Figure 12.3 vertical axis label should be f(x) and kg(x) and legend should be kg(x) (and not g(x))

p. 493 | 2/18/2006 | First two lines should be "in the probability of fishing from a beach, and an increase of 0.119, 0.080, and 0.068, respectively, in the probability of fishing from a pier, a private boat, and a charter boat." [Jeff Smith, Michigan]

p. 501 | 3/22/2006 | (15.17) and the line before should have a minus sign before the expected Hessian. [Frank Windmeijer, Bristol]

p. 505 | 3/22/2006 | Fifth line should be "computer-intensive" not "computer-intesive". [Frank Windmeijer, Bristol]

p. 508 | 3/22/2006 | Possible error in (15.31) needs to be checked

p. 569 | 5/19/2005 | Bibliographic note 16.3 should refer to Tobin (1958) (and not Tobit (1958)) [Kevin Hoover, UCD]

p. 793 | 4/7/2005 | Figure 23.1 axes labels are reversed. Vertical axis is log(patents) and horizontal axis is log(R&D)

p. 839 | 4/10/2006 | Second equality for SIGMA_c^-1 should not have the inverse at the end.

p. 839 | 4/10/2006 | Formula for [I + aee']^(1/2) should finish with ee' and not Mee'.

p. 895 | 5/26/2005 | Table 25.6 footnote b: drop RE74*RE75 from the list of regressors
