Editorial policy: The Journal of Econometrics is designed to serve as an outlet for important new research in both theoretical and applied econometrics. Papers dealing with estimation and other methodological aspects of the application of statistical inference to economic data as well as papers dealing with the application of econometric techniques to substantive areas of economics fall within the scope of the Journal. Econometric research in the traditional divisions of the discipline or in the newly developing areas of social experimentation are decidedly within the range of the Journal’s interests. The Annals of Econometrics form an integral part of the Journal of Econometrics. Each issue of the Annals includes a collection of refereed papers on an important topic in econometrics. Editors: T. AMEMIYA, Department of Economics, Encina Hall, Stanford University, Stanford, CA 94035-6072, USA. A.R. GALLANT, Duke University, Fuqua School of Business, Durham, NC 27708-0120, USA. J.F. GEWEKE, Department of Economics, University of Iowa, Iowa City, IA 52240-1000, USA. C. HSIAO, Department of Economics, University of Southern California, Los Angeles, CA 90089, USA. P. ROBINSON, Department of Economics, London School of Economics, London WC2 2AE, UK. A. ZELLNER, Graduate School of Business, University of Chicago, Chicago, IL 60637, USA. Executive Council: D.J. AIGNER, Paul Merage School of Business, University of California, Irvine CA 92697; T. AMEMIYA, Stanford University; R. BLUNDELL, University College, London; P. DHRYMES, Columbia University; D. JORGENSON, Harvard University; A. ZELLNER, University of Chicago. Associate Editors: Y. AÏT-SAHALIA, Princeton University, Princeton, USA; B.H. BALTAGI, Syracuse University, Syracuse, USA; R. BANSAL, Duke University, Durham, NC, USA; M.J. 
CHAMBERS, University of Essex, Colchester, UK; SONGNIAN CHEN, Hong Kong University of Science and Technology, Kowloon, Hong Kong; XIAOHONG CHEN, Department of Economics, Yale University, 30 Hillhouse Avenue, P.O. Box 208281, New Haven, CT 06520-8281, USA; MIKHAIL CHERNOV (LSE), London Business School, Sussex Place, Regents Park, London, NW1 4SA, UK; V. CHERNOZHUKOV, MIT, Massachusetts, USA; M. DEISTLER, Technical University of Vienna, Vienna, Austria; M.A. DELGADO, Universidad Carlos III de Madrid, Madrid, Spain; YANQIN FAN, Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA; S. FRUHWIRTH-SCHNATTER, Johannes Kepler University, Linz, Austria; E. GHYSELS, University of North Carolina at Chapel Hill, NC, USA; J.C. HAM, University of Southern California, Los Angeles, CA, USA; J. HIDALGO, London School of Economics, London, UK; H. HONG, Stanford University, Stanford, USA; MICHAEL KEANE, University of Technology Sydney, P.O. Box 123 Broadway, NSW 2007, Australia; Y. KITAMURA, Yale University, New Haven, USA; G.M. KOOP, University of Strathclyde, Glasgow, UK; N. KUNITOMO, University of Tokyo, Tokyo, Japan; K. LAHIRI, State University of New York, Albany, NY, USA; Q. LI, Texas A&M University, College Station, USA; T. LI, Vanderbilt University, Nashville, TN, USA; R.L. MATZKIN, Northwestern University, Evanston, IL, USA; FRANCESCA MOLINARI (CORNELL), Department of Economics, 492 Uris Hall, Ithaca, New York 14853-7601, USA; F.C. PALM, Rijksuniversiteit Limburg, Maastricht, The Netherlands; D.J. POIRIER, University of California, Irvine, USA; B.M. PÖTSCHER, University of Vienna, Vienna, Austria; I. PRUCHA, University of Maryland, College Park, USA; P.C. REISS, Stanford Business School, Stanford, USA; E. RENAULT, University of North Carolina, Chapel Hill, NC; F. SCHORFHEIDE, University of Pennsylvania, USA; R. SICKLES, Rice University, Houston, USA; F. 
SOWELL, Carnegie Mellon University, Pittsburgh, PA, USA; MARK STEEL (WARWICK), Department of Statistics, University of Warwick, Coventry CV4 7AL, UK; DAG BJARNE TJOESTHEIM, Department of Mathematics, University of Bergen, Bergen, Norway; HERMAN VAN DIJK, Erasmus University, Rotterdam, The Netherlands; Q.H. VUONG, Pennsylvania State University, University Park, PA, USA; E. VYTLACIL, Columbia University, New York, USA; T. WANSBEEK, Rijksuniversiteit Groningen, Groningen, Netherlands; T. ZHA, Federal Reserve Bank of Atlanta, Atlanta, USA and Emory University, Atlanta, USA. Submission fee: Unsolicited manuscripts must be accompanied by a submission fee of US$50 for authors who currently do not subscribe to the Journal of Econometrics; subscribers are exempt. Personal cheques or money orders accompanying the manuscripts should be made payable to the Journal of Econometrics. Publication information: Journal of Econometrics (ISSN 0304-4076). For 2011, Volumes 160–165 (12 issues) are scheduled for publication. Subscription prices are available upon request from the Publisher, from the Elsevier Customer Service Department nearest you, or from this journal’s website (http://www.elsevier.com/locate/jeconom). Further information is available on this journal and other Elsevier products through Elsevier’s website (http://www.elsevier.com). Subscriptions are accepted on a prepaid basis only and are entered on a calendar year basis. Issues are sent by standard mail (surface within Europe, air delivery outside Europe). Priority rates are available upon request. Claims for missing issues should be made within six months of the date of dispatch. USA mailing notice: Journal of Econometrics (ISSN 0304-4076) is published monthly by Elsevier B.V. (Radarweg 29, 1043 NX Amsterdam, The Netherlands). Periodicals postage paid at Rahway, NJ 07065-9998, USA, and at additional mailing offices. 
USA POSTMASTER: Send change of address to Journal of Econometrics, Elsevier Customer Service Department, 3251 Riverport Lane, Maryland Heights, MO 63043, USA. AIRFREIGHT AND MAILING in the USA by Mercury International Limited, 365 Blair Road, Avenel, NJ 07001-2231, USA. Orders, claims, and journal inquiries: Please contact the Elsevier Customer Service Department nearest you. St. Louis: Elsevier Customer Service Department, 3251 Riverport Lane, Maryland Heights, MO 63043, USA; phone: (877) 8397126 [toll free within the USA]; (+1) (314) 4478878 [outside the USA]; fax: (+1) (314) 4478077; e-mail:
[email protected]. Oxford: Elsevier Customer Service Department, The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK; phone: (+44) (1865) 843434; fax: (+44) (1865) 843970; e-mail:
[email protected]. Tokyo: Elsevier Customer Service Department, 4F Higashi-Azabu, 1-Chome Bldg., 1-9-15 Higashi-Azabu, Minato-ku, Tokyo 106-0044, Japan; phone: (+81) (3) 5561 5037; fax: (+81) (3) 5561 5047; e-mail:
[email protected]. Singapore: Elsevier Customer Service Department, 3 Killiney Road, #08-01 Winsland House I, Singapore 239519; phone: (+65) 63490222; fax: (+65) 67331510; e-mail:
[email protected]. Printed by Henry Ling Ltd., Dorchester, United Kingdom The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper)
In Memoriam
Arnold Zellner died at his home in Chicago on August 11, 2010. He was one of the most prolific and influential econometricians who ever lived; colleague, mentor or friend to many in the profession; and, with Dennis Aigner, co-founder of the Journal of Econometrics in 1973. His memory will long live in the minds of those who knew him, his enthusiasm for the profession and for life will continue to infuse his colleagues, and his ideas will influence cohorts of scholars as yet unborn. In a prescient sign of his developing interests and future productivity, Arnold published his first paper at the age of 24 in Econometrica, two years after earning his BS in physics at Harvard. Following two years of military service and graduate school in economics at the University of California – Berkeley, he took up his first faculty position in the Department of Economics at the University of Washington. While at Washington he developed the seemingly unrelated regressions model, arguably the most widely applied model in econometrics after linear regression, with the published paper appearing in the Journal of the American Statistical Association in 1962. He moved to the University of Wisconsin in 1961. His contributions to Bayesian econometrics, which would be the theme of the balance of his academic career, began there in a paper published in 1964 with his University of Wisconsin Ph.D. colleague George Tiao. In 1966 Arnold Zellner took up the position of HGB Alexander Professor of Economics in the University of Chicago Business School, where he remained until his retirement. 1971 marked the publication of An Introduction to Bayesian Inference in Econometrics and Statistics. This volume, his most widely cited scholarly work after the seemingly unrelated regressions paper, is today on the bookshelf of every Bayesian econometrician in the world. 
Published at a time when very few econometricians utilized Bayesian statistics, it marked the beginning of growth in Bayesian methods in econometrics that continues unabated beyond Arnold’s passing. Much of this growth can be ascribed to Arnold’s contributions, including well over 100 academic publications that followed the 1971 textbook. It is due, too, to his hallmark energy, enthusiasm and organizational skills, reflected most significantly in two organizations that he founded. The Seminar on Bayesian Inference in Econometrics and Statistics (SBIES), inaugurated at the time he completed the textbook, was the center of intellectual development in Bayesian econometrics for several decades. Following up on his commitment to Bayesian methods in all scientific endeavors, Arnold founded the International Society for Bayesian Analysis in 1991, an organization that today exercises leadership in Bayesian theory and application in fields ranging from genetics to economics to mathematical statistics. Arnold Zellner was instrumental in the founding of Journal of Econometrics in the early 1970s, and exercised continuous leadership as co-editor until his death. Acting on the advice of Dale Jorgenson, North-Holland Publishing Company (today operating as Elsevier) approached Arnold and Dennis Aigner about launching a new journal in econometrics. Phoebus Dhrymes joined Arnold and Dennis as third co-editor, and the first issue appeared in March 1973. The journal’s commitment to important new research in theoretical and applied econometrics, enunciated in the inaugural issue, has not changed since. Due in no small part to the vigorous leadership of Arnold and his colleagues, the volume of contributions increased greatly, with over 2,000 pages now published each year. The honor of Fellow of the Journal of Econometrics, conceived by Arnold in the late 1980s, is conferred on any contributor with at least four (co-author-adjusted) publications. 
This model, today adopted by several econometrics journals, was at the time an innovation in the objective recognition of scholarship. Together with the Annals series of issues, initiated by Dennis Aigner in 1979, it has been a distinguishing feature of Journal of Econometrics.1 The Journal recognizes the contributions of these pioneers in the Zellner and Aigner awards given in alternate years. Arnold Zellner was similarly influential in the community of statisticians. He made outstanding contributions to the American Statistical Association, where he inaugurated the Journal of Business and Economic Statistics in 1983. He was Chair of the Business and Economics Section in 1982 and went on to be elected Association President in 1991. The annual Zellner Thesis Award in Business and Economic Statistics recognizes both these contributions and his influence on scholarship in econometrics. He was an elected fellow of the International Statistical Institute as well as the American Statistical Association. Arnold leaves a legacy of warmth, creativity and enthusiasm that will continue to inspire those who knew him and those who follow in his footsteps. His sense of humor and zeal for life and work carried to the end. On July 15, 2010, he sent the following message to one of the associate editors:
1 The history of Arnold’s involvement with the Journal has been documented in some detail by two of his colleagues: see remarks by Dennis Aigner in Journal of Econometrics 75: 397–398 (1996); and Takeshi Amemiya, “Thirty-five Years of Journal of Econometrics,” Journal of Econometrics 148: 179–185 (2009).
“Dear ___, many thanks for your constructive message below. I am in agreement with you and the referees’ suggestions for revision and regret the delay in responding due to some current personal health problems. Tomorrow I am to see the heart surgeon who will ‘stitch me up’ and we can all hope for the best. In the meantime, please send Barbara and me what you wish sent on to the authors of this paper and we shall enter the material in the EE system and tell the author what remains to be done to get the paper published. With very best wishes to all, Arnold”
His constructive and encouraging comments are also dearly missed by authors of the Journal of Econometrics. In one of the many condolence messages the editorial board office received, an author wrote: “He was a titan of the discipline, a great and lovely man, and a bedrock of the Journal of Econometrics. He dealt with a number of my submissions to the Journal and was always scrupulously fair, positive, encouraging and always corresponded with a lovely jolly air.” Arnold’s unflagging enthusiasm and commitment as a scholar will be missed at the Journal and throughout the profession.
Journal of Econometrics 160 (2011) 289–299
The Hausman test and weak instruments✩,✩✩
Jinyong Hahn a, John C. Ham b, Hyungsik Roger Moon c,∗
a Department of Economics, University of California, Los Angeles, United States
b Department of Economics, University of Maryland, IZA, IRP (UW-Madison), United States
c Department of Economics, University of Southern California, United States
Article info
Article history: Received 19 November 2008; received in revised form 29 June 2010; accepted 5 September 2010; available online 17 September 2010
JEL classification: C12
Keywords: Hausman test; Weak instruments
Abstract
We consider the following problem. There is a structural equation of interest that contains an explanatory variable that theory predicts is endogenous. There are one or more instrumental variables that credibly are exogenous with regard to this structural equation, but which have limited explanatory power for the endogenous variable. Further, there are one or more potentially ‘strong’ instruments, which have much more explanatory power but which may not be exogenous. Hausman (1978) provided a test for the exogeneity of the second instrument when none of the instruments are weak. Here, we focus on how the standard Hausman test performs in the presence of weak instruments using the Staiger–Stock asymptotics. It is natural to conjecture that the standard version of the Hausman test would be invalid in the weak instrument case, which we confirm. However, we provide a version of the Hausman test that is valid even in the presence of weak IV and illustrate how to implement the test in the presence of heteroskedasticity. We show that the situation we analyze occurs in several important economic examples. Our Monte Carlo experiments show that our procedure works relatively well in finite samples. We should note that our test is not consistent, although we believe that it is impossible to construct a consistent test with weak instruments. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
The weak instrument problem has led to the development of two strands of research, each of which is characterized by a different asymptotic approximation. The first of these, which we will call the many-instrument asymptotics, emphasizes the finite sample distortion which can be explained by the approximation where the number of instruments grows to infinity as a function of the sample size. This literature often concludes that the IV estimators are still approximately normal, but that the asymptotic variance estimators need to address the finite sample issue.1 Because the many-instrument asymptotics still produces a normal approximation for the estimators, the implication for practitioners is more
✩ We thank Takeshi Amemiya, an associate editor, two referees, and Joris Pinkse for helpful comments and suggestions, and Martin Weidner for proofreading. Hahn, Ham, and Moon acknowledge support from the National Science Foundation. Any opinions, findings, conclusions, or recommendations in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
✩✩ All the omitted proofs and derivations are available in our Online Appendix (Hahn et al., 2010), which is available at www-rcf.usc.edu/~moonr or upon request by email to the authors.
∗ Corresponding author. Tel.: +1 213 740 2108. E-mail address: [email protected] (H.R. Moon).
1 See Bekker (1994), Donald and Newey (2001), and Hahn and Hausman (2002), among others.
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.09.009
or less a simple message that the standard error calculations need to be refined. On the other hand, there is the concern that the many-instrument asymptotics may not be relevant for situations where the degree of overidentification is mild. When the model is just identified or only mildly overidentified, and the explanatory power of the instruments is small, the alternative approximation due to Staiger and Stock (1997) is intuitively appealing.2 This approximation is characterized by alternative asymptotics where the first-stage coefficients shrink to zero as a function of the square root of the sample size. We will refer to this approximation as the weak-instrument asymptotics or Staiger–Stock asymptotics. Under Staiger and Stock’s asymptotic approximation, many usual statistics have nonstandard asymptotic distributions. For example, it is well known that IV estimators and t-statistics have nonstandard distributions. Staiger and Stock (1997) also considered tests of overidentification under their asymptotics, and established that standard tests of overidentification do not have chi-square ($\chi^2$ hereafter) distributions either. On the other hand, they showed that in the context of comparing the weak IV against OLS, a version of the Hausman test statistic as usually employed has a correct asymptotic size, although they did observe that the test is not consistent. This finding is important because there is no standard test in the literature to determine whether conventional
2 See also Kleibergen (2000), Moreira (2003), and Andrews et al. (2006).
J. Hahn et al. / Journal of Econometrics 160 (2011) 289–299
asymptotics or Staiger and Stock’s alternative asymptotics is more appropriate for a given finite sample. This version of the Hausman test has the identical asymptotic distribution under both asymptotics and is thus an exception to the rule of thumb that test statistics tend to have nonstandard distributions under weak instrument asymptotics. It is a useful exception in that practitioners do not need to worry about the weakness of the IV and its potentially complicated consequences. In this paper, we extend Staiger and Stock’s (1997) analysis and document further exceptional cases. We next consider the standard Hausman test that examines the difference of two IV estimators based on two different sets of instruments and show that it possesses a certain robustness property in that its asymptotic distribution is invariant to whether conventional or weak instrument asymptotics are adopted. We consider a Hausman test that compares the weak IV and the strong IV. It is well known that the test statistic has a $\chi^2$ distribution under conventional asymptotics. We establish that a version of the Hausman test continues to have the $\chi^2$ distribution even under the weak instrument asymptotics. We then show that a version of the overidentification test, which we interpret to be a natural generalization of the Hausman test, has such robustness. Finally, we also provide empirical researchers with a version of the Hausman test that can be used with heteroscedasticity under both conventional and Staiger–Stock asymptotics; although quite straightforward theoretically, neither case is currently available in the literature. Besides being of theoretical interest, our result has substantial practical implications because empirical researchers often face the following problem.3 They have a structural equation of interest that contains an explanatory variable that theory predicts is endogenous. 
They want to obtain a confidence interval for the estimated coefficient on the structural parameter, or for a set of coefficients from the structural equation. On the one hand, they have one or more instrumental variables that credibly are exogenous with regard to this structural equation, but which have limited explanatory power for the endogenous variable. On the other hand, they have one or more ‘strong instruments’, which have much more explanatory power but which may not be exogenous. Researchers currently can take one of two tacks. First, if the researcher only uses the weak instruments, the standard errors on the structural equation calculated by standard methods may be very large. Moreover, it may be the case that the standard asymptotic distribution for IV estimators is invalid because of the weak instrument problem. Second, in the vast majority of cases, empirical researchers use the strong instruments since they are simple to use and likely to produce statistically significant results. Thus, it has obvious appeal to the researcher, but also has the obvious disadvantage that the researcher may obtain inconsistent results if the strong instrument is not a valid instrument. We would propose that researchers take a third approach in their work: use the strong instrument but provide a diagnostic via a Hausman test comparing the results using the strong and weak instrument. However, this approach raises the concern of whether the Hausman test is valid when some of the instruments are weak, which our paper naturally addresses. The outline of the paper is as follows. We outline our model and assumptions in Section 2. In Section 3, we motivate the paper by showing that the situation we analyze arises in several important economic examples: (i) estimating models of life cycle labor supply behavior; (ii) estimating dynamic models such as a health production function for individuals in a developing country and (iii) estimating the return to schooling. 
In Section 4 we consider the conventional Hausman test under weak IV asymptotics and
3 See Section 3 below.
show that, in general, it will not have the standard $\chi^2$ distribution, but if the model is exactly identified given the weak instruments, one of the standard tests can be used without modification. In Section 5, we provide a modification of the Hausman test when the model is overidentified given the weak instruments that has a standard $\chi^2$ distribution. In Section 6, we first consider the case where it is the weak instrument, and not the strong instrument, that may not be valid. We then extend our test to the case where the errors are heteroskedastic.4 Finally, we extend the test statistic from Section 5 to improve its power while maintaining its good size properties. The results of our Monte Carlo experiments are presented in Section 7. They show that there is indeed a problem with the standard tests when there are weak instruments which overidentify the model, and that our general procedure works relatively well in this case in finite samples.

2. Model and assumptions
We consider a simultaneous equation linear regression model

$$y_1 = Y_2 \beta + \varepsilon$$

and

$$Y_2 = Z \Pi + V,$$

where $\varepsilon$ and $V$ are mean zero unobserved error matrices, $y_1$ is an $n$-vector of dependent variables, $Y_2$ is an $n \times K$ matrix of regressors that are correlated with $\varepsilon$, $Z$ is an $n \times L$ matrix of IV’s with $L \ge K$ that are independent of $V$.5 The sample size is denoted by $n$ and all the asymptotic results of the paper are based on $n \to \infty$. We assume that the IV’s consist of two components, $Z = [W, S]$, where $W$ is an $n \times L_w$ matrix that contains weak IV’s and $S$ is an $n \times L_s$ matrix that contains strong, but potentially invalid, IV’s. Further, $\tilde S$ is the “residual” when $S$ is projected on $W$ in the population,

$$S = W \Gamma_w + \tilde S. \tag{1}$$

We also denote $y_{1i}$, $Y_{2i}'$, $w_i'$, $s_i'$, $\tilde s_i'$, $\varepsilon_i$, and $v_i'$ to be the $i$th row of $y_1$, $Y_2$, $W$, $S$, $\tilde S$, $\varepsilon$, and $V$, respectively. We assume that $L_w \ge K$. Throughout this section, we will assume that $W$ is orthogonal to the regression error $\varepsilon$, that is, $E[w_i \varepsilon_i] = 0$. The main object of interest in this paper is a test for the validity of the IV’s in $S$. In this case, the hypotheses that we are testing are

$$H_0: E[s_i \varepsilon_i] = 0 \quad (\text{or } E[\tilde s_i \varepsilon_i] = 0), \tag{2}$$

$$H_1: E[s_i \varepsilon_i] \neq 0 \quad (\text{or } E[\tilde s_i \varepsilon_i] \neq 0). \tag{3}$$

Thus, if the exclusion restriction is violated, then it is only through the possible correlation between $s_i$ and $\varepsilon_i$.6 Let $\rho_s = [E(\tilde s_i \tilde s_i')]^{-1} E(\tilde s_i \varepsilon_i)$ denote the coefficient of the projection of $\varepsilon$ on $\tilde S$. We write that

$$\varepsilon = \tilde S \rho_s + V \rho_v + e, \tag{4}$$

where $\rho_v = [E(v_i v_i')]^{-1} E(v_i \varepsilon_i)$ denotes the coefficient of projection of $\varepsilon_i$ on $v_i$. We will assume that $e_i$ is uncorrelated with $\tilde s_i$ and
4 To carry out this extension, we must first derive the standard Hausman test for the case where both the exogenous and potentially endogenous instruments are strong and the errors are heteroskedastic. Although this derivation is quite straightforward from the perspective of econometric theory, it should be helpful to applied researchers since this test is not currently available in the literature.
5 We follow the standard approach, and assume that included exogenous variables are ‘partialled out’; see the online Appendix for more details.
6 The main focus of the paper is not to obtain a post-specification test inference, but rather to investigate the validity of various versions of the Hausman tests themselves with weak IV’s. See Guggenberger (2008), e.g., for issues on post-specification test inference.
vi and has mean zero and variance σe2 . Our null and alternative hypotheses can then be rewritten as H0 : ρs = 0, H1 : ρs ̸= 0. The basic idea of the Hausman test statistics for the null hypothesis (2) is based on the difference of the following two estimators7 :
βw = (Y2′ PW Y2 )−1 Y2′ PW y1 , βz = (Y2′ PZ Y2 )−1 Y2′ PZ y1 . When the conventional asymptotic approximation is valid, βz is an efficient but non-robust estimator, while βw is a less efficient, but robust, estimator. Then, the conventional Hausman test statistics measure the difference βw − βz using various weight matrices. We first consider three versions of the Hausman test that are used widely in the literature: 2 H1 = ( βw − βz )′ [ σε,w (Y2′ PW Y2 )−1 − σε,2 z (Y2′ PZ Y2 )−1 ]−1 ( βw − βz ),
−2 H2 = σε,w (βw − βz )′ [(Y2′ PW Y2 )−1 − (Y2′ PZ Y2 )−1 ]−1 ( βw − βz ),
H3 =
σε,−z2 ( βw 1 n
(y1 − Y2 βw )′ (y1 − Y2 βw )
substantial measurement error. MaCurdy (1981) used polynomials in age as IV for ∆ ln(wit ), but Altonji (1986) argued that MaCurdy’s instruments were weak in the sense of not being jointly significant in the first-stage equation. Instead Altonji considered a direct measure w mit of the wage which is obtained from a question put to individuals in the sample ‘what is your hourly wage rate?’ He assumes that the measurement error in wit and w mit are independent, and thus only considering the error term ∆eit , the variable ∆ ln(w mit ) is a valid IV for ∆ ln(wit ). However, as Altonji noted, this potential instrument will not be independent of ηit unless w mit is known in period t − 1. He next considered ∆ ln(w mit −1 ) as an IV for ∆ ln(wit ) since it will be orthogonal to ηit , but finds that the correlation between ∆ ln(wit ) and ∆ ln(w mit −1 ) is too weak to be empirically useful. Instead he assumes that the wage is known one period in advance so that ∆ ln(w mit ) is indeed an appropriate IV for ∆ ln(wit ). Thus, our procedure could be used to offer readers a diagnostic test whether ∆ ln(w mit ) is indeed a valid IV, using either (or both) MaCurdy’s polynomial in age or ∆ ln(w mit −1 ) as the weak instruments. 3.2. Dynamic models
− βz )′ [(Y2′ PW Y2 )−1 − (Y2′ PZ Y2 )−1 ]−1 ( βw − βz ),
Researchers often consider dynamic panel data regressions of the form
where 2 σε,w =
291
(5)
yit = γ yit −1 + β Xit + uit .
(8)
The problem we analyze arises in many empirical studies; here we show this for three important cases.
In (8), yit is a scaler-dependent variable for individual i in year t , Xit is a vector of exogenous explanatory variables, and uit is an error term. Since it is unreasonable to assume that the error term uit is independent over time for the same person, yit −1 must be treated as endogenous. Natural instruments are lagged values of Xit , but researchers often find that these lagged values of Xit do a poor job of explaining yit −1 . Instead they often assume a MA(k) structure for uit , which implies that yit −k−1 is a valid IV for yit −1 . However, the choice of k is usually arbitrary since economic theory does not provide any guidance on this issue. Again our test can be used here, where yit −k−1 is the strong instrument and the lags of Xit are the weak instruments. An example of such an equation is given in Strauss and Thomas (1995), where (8) is a health production function for individuals in a developing country, and the Xit represents variables such as distance to the village health clinic. They used the strong instruments (lagged yit ), but could have used our procedure below to obtain a diagnostic for their approach.
3.1. Life cycle labor supply models
3.3. Estimating the return to schooling
and
σε,2 z =
1 n
(y1 − Y2 βz )′ (y1 − Y2 βz ).
(6)
Under conventional asymptotics these test statistics all converge to χK2 , a (central) chi-square distribution with d.f. K under the null. Therefore, the conventional asymptotics suggest that we compare these test statistics with the critical value from χK2 . Our contribution is to consider the properties of the test statistics under the assumption that W is weak, and S is strong but potentially invalid, under the asymptotics developed by Staiger and Stock (1997). 3. Economic examples
Researchers often consider the following model to describe the (annual) intertemporal labor supply function for prime-aged males8
∆ ln(hit ) = δ ∆ ln(wit ) + α + β ∆Xit + ∆eit + δηit ,
(7)
where ∆ denotes the first difference. In (7), hit are hours of work in year t for individual i, wit is his real hourly wage rate in that year, Xit are time changing demographic variables, and eit is an idiosyncratic error term. Further, ηit is a ‘rational expectations’ error term which is orthogonal to all variables known in period t − 1; thus, ∆ ln(wit ) is correlated with ηit . We also expect that ∆ ln(wit ) is correlated with ∆eit since the variable wit is formed by dividing annual earnings by hit , and the latter is thought to contain
7 Given a matrix A, we use notation P = A(A′ A)−1 A and M = I − P throughout. A A A 8 See MaCurdy (1981), Altonji (1986), Ham (1986) and Ham and Reilly (2002). Corner solutions at zero hours are not important for this group and thus a regression framework is appropriate.
Consider the wage equation ln(wi ) = α Si + γ Ai + β Xi + ei .
(9)
In (9) the variables wi , Si and Ai represent the hourly wage, years of schooling, and ability (as measured by a test such as the AFQT in the case of the NLS data), while Xi represents variables such as race, experience, and experienced squared. The problem here is that even conditional on ability Ai , Si and ei may be correlated. For example, an increase in ambition may increase both Si and ei , leading to a positive correlation between these variables. One possible instrument for Si is the father’s education FEi , which Willis and Rosen (1979) use to identify a more complicated version of (9). They argue that children from wealthier families have a lower discount rate than poorer children since wealthy parents are more likely to help finance their children’s education. In practice, FEi will be an important determinant of Si conditional on Ai and Xi . However, it may be an invalid IV since it can also reflect the father’s ambition, which he may pass on to his children; if so, FEi will be correlated with ei and thus will be an invalid IV. An alternative IV
J. Hahn et al. / Journal of Econometrics 160 (2011) 289–299
is the father's age, $FA_i$, in the year that the individual turned 18, since this will also affect the family's ability to help its children pay for college but is unlikely to be correlated with $e_i$. Unfortunately, $FA_i$ may have little predictive power for $S_i$ conditional on $A_i$ and $X_i$, and again our test provides a diagnostic for Willis and Rosen's assumption that $FE_i$ is a valid IV.

4. Hausman tests under weak IV asymptotics

We now investigate the theoretical properties of the conventional Hausman test for the hypothesis (2) under the assumption that $W$ is weak. More specifically, we assume that the coefficient of the population projection of $Y_2$ on $W$ shrinks to zero at the rate $1/\sqrt{n}$, while the coefficient of the population projection of $Y_2$ on $S$ does not. For this purpose, we adopt the following parameterization:

$$Y_2 = W\frac{C}{\sqrt{n}} + S\Pi_s + V. \tag{10}$$
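To make the local-to-zero parameterization in (10) concrete, the following minimal sketch (a scalar illustration under assumed unit variances, not part of the paper's procedure) evaluates the first-stage coefficient $C/\sqrt{n}$ and the implied population concentration parameter for growing $n$. The function names `first_stage_coef` and `concentration` and the default variances are hypothetical choices made for this example.

```python
import math

def first_stage_coef(c, n):
    """Coefficient on the weak instrument under (10): C / sqrt(n)."""
    return c / math.sqrt(n)

def concentration(c, n, sigma_ww=1.0, sigma_vv=1.0):
    """Scalar population concentration parameter n * pi_n^2 * sigma_ww / sigma_vv."""
    pi_n = first_stage_coef(c, n)
    return n * pi_n ** 2 * sigma_ww / sigma_vv

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, first_stage_coef(2.0, n), concentration(2.0, n))
```

The coefficient shrinks as $n$ grows while the concentration parameter stays fixed at $C^2$ (times the variance ratio), which is the sense in which the information in $W$ does not accumulate with the sample size.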
We assume that $L_s \ge K$. Suppose that we consider $H_1$, $H_2$, or $H_3$, but adopt Staiger and Stock's (1997) alternative asymptotic approximation. Because their asymptotics imply that the asymptotic distribution of $\hat\beta_w$ is not normal, it is natural to conjecture that the asymptotic size of the conventional procedure would be distorted. Not surprisingly, $H_1$, $H_2$, and $H_3$ are usually not distributed as $\chi^2_K$ under the weak-instrument asymptotic approximation; see Theorem 4 in Appendix C.1. Our first main contribution is to recognize that there is an important exception. We show that $H_3$ is asymptotically $\chi^2_K$ under the null despite the presence of weak instruments if the model is exactly identified with only the weak IV.

Theorem 1. Assume Conditions 1 and 2 in Appendix A.⁹ Suppose that $L_w = K$ and $Y_2'W$ has full rank $K$. Then, (a) under the null hypothesis (2),

$$H_3 \Rightarrow \mathcal{Z}'\mathcal{Z} \equiv \chi^2_K,$$
(b) under the alternative hypothesis (3),

$$H_3 \Rightarrow (\mathcal{Z}+\kappa)'(\mathcal{Z}+\kappa) \equiv \chi^2_K(\kappa),$$

where $\mathcal{Z} \sim N(0, I_K)$ and the noncentrality parameter $\kappa$ is defined in (16) in Appendix C.2.

Proof. In Appendix C.2.

Theorem 1 indicates that, as long as the weak instrument $W$ exactly identifies $\beta$ ($L_w = K$), the standard practice of using $H_3$ along with a critical value from $\chi^2_K$ is asymptotically valid even under the weak-instrument asymptotics. The weak-instrument asymptotic distribution under the null is identical to the standard asymptotic distribution. The weak-instrument asymptotic distribution under the alternative is $\chi^2_K(\kappa)$, a noncentral chi-square distribution with $K$ degrees of freedom, which dominates the asymptotic distribution under the null, $\chi^2_K$, and therefore the test is unbiased under the weak-instrument asymptotics. The invalidity of $H_1$ and $H_2$ (even under exact identification) can be attributed to the failure to estimate $\sigma_\varepsilon^2$ consistently under the null. On the other hand, even $H_3$, which is based on a consistent estimator of $\sigma_\varepsilon^2$ under the null, is invalid without exact identification. This is one of the main differences between our results and the results in Staiger and Stock (1997), who test for exogeneity of the regressors (there the strong IV's are the regressors). In the next section, we develop a modification of the Hausman test that does not require exact identification. This modification requires $\sigma_\varepsilon^2$ to be estimated consistently under the null.

⁹ We impose standard regularity conditions, which are discussed in Appendix A.

5. Generalized Hausman test

The results in Section 4 imply that a version of the Hausman test, i.e., $H_3$ (but not $H_1$ and $H_2$), combined with a critical value from $\chi^2_K$, is asymptotically valid even under the weak-instrument asymptotics as long as $\beta$ is exactly identified by the weak instruments ($L_w = K$). On the other hand, Theorem 4 in Appendix C.1 shows that neither $H_1$, $H_2$, nor $H_3$, along with a critical value from $\chi^2_K$, is valid under the weak-instrument asymptotics if the weak IV's overidentify $\beta$, that is, when $L_w > K$. One might argue that the overidentified case is not of practical concern because a practitioner can always choose a subset of weak instruments from $W$ that exactly identifies $\beta$ and thus $H_3$ can be used. Although one can resolve the situation in this fashion, it is not clear which of the $K$ weak instruments should be chosen. We show in this section that there is a version of the specification test which can be used with a critical value from a chi-square distribution even under the weak-instrument asymptotics when the model is overidentified with the weak IV's.

Suppose that the model is in fact overidentified with the weak instruments ($L_w \ge K$). Under the null that the strong IV's are valid, we have the moment condition $E[w_i(y_{1i} - Y_{2i}'\,\mathrm{plim}\,\hat\beta_z)] = 0$. However, under the alternative that the strong IV's are not valid, we have $E[w_i(y_{1i} - Y_{2i}'\,\mathrm{plim}\,\hat\beta_z)] \ne 0$ because the probability limit of $\hat\beta_z$ will be different from $\beta$ under the alternative. From these observations, we might consider a test statistic based on

$$\frac{1}{\sqrt{n}}W'(y_1 - Y_2\hat\beta_z).$$

With some algebra, the following lemma can be shown to hold.

Lemma 1. Assume Conditions 1 and 2 in Appendix A. Under the null (2) and conventional asymptotics,

$$\frac{1}{\sqrt{n}}W'(y_1 - Y_2\hat\beta_z) \Rightarrow N(0, \sigma_\varepsilon^2\Psi),$$

where $\Psi$ is the probability limit of $\frac{1}{n}\hat\Psi$ and

$$\hat\Psi = W'W - (W'Y_2)(Y_2'P_ZY_2)^{-1}(Y_2'W).$$

Proof. In the online Appendix.

Therefore, the test statistic is equal to

$$H(\hat\sigma_\varepsilon^2) = \frac{1}{\hat\sigma_\varepsilon^2}(y_1 - Y_2\hat\beta_z)'W\hat\Psi^{-1}W'(y_1 - Y_2\hat\beta_z), \tag{11}$$

where $\hat\Psi = W'W - (W'Y_2)(Y_2'P_ZY_2)^{-1}(Y_2'W)$ and $\hat\sigma_\varepsilon^2$ denotes some consistent estimator for $\sigma_\varepsilon^2$. In light of Lemma 1, it is straightforward to conclude that the (conventional) asymptotic distribution of $H(\hat\sigma_\varepsilon^2)$ is $\chi^2_{L_w}$. In other words, researchers can use $H(\hat\sigma_\varepsilon^2)$ to obtain a $\chi^2$-test with standard critical values even when the model is overidentified with the weak IV's. Proposition 1 gives an interpretation of the new statistic $H(\hat\sigma_\varepsilon^2)$.

Proposition 1. When $L_w = K$,

$$H(\hat\sigma_\varepsilon^2) = \frac{1}{\hat\sigma_\varepsilon^2}(\hat\beta_w - \hat\beta_z)'\left[(Y_2'P_WY_2)^{-1} - (Y_2'P_ZY_2)^{-1}\right]^{-1}(\hat\beta_w - \hat\beta_z).$$

Proof. In the online Appendix.
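The equality in Proposition 1 can be checked numerically. The sketch below is a hypothetical scalar illustration (one endogenous regressor, one weak instrument `w`, one strong instrument `s`, simulated data with made-up coefficients); it computes the statistic once from the quadratic form in (11) and once from the Hausman-type form of Proposition 1, using the residual variance from the IV fit with all instruments. The helper names `dot`, `proj_quad`, `simulate` and `hausman_both_forms` are assumptions of this example, not the authors' code.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def proj_quad(a, b, s, w):
    """Return a' P_Z b for Z = [s, w], via the explicit 2x2 inverse of Z'Z."""
    ss, sw, ww = dot(s, s), dot(s, w), dot(w, w)
    det = ss * ww - sw * sw
    a_s, a_w = dot(a, s), dot(a, w)
    b_s, b_w = dot(b, s), dot(b, w)
    return (a_s * (ww * b_s - sw * b_w) + a_w * (ss * b_w - sw * b_s)) / det

def simulate(n, seed=0):
    """Illustrative data under the null: s is excluded from the structural equation."""
    rng = random.Random(seed)
    y1, y2, w, s = [], [], [], []
    for _ in range(n):
        wi, si, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        eps = 0.5 * v + rng.gauss(0, 1)      # endogeneity through v
        y2i = 0.3 * wi + 0.5 * si + v        # w is the weaker instrument
        w.append(wi)
        s.append(si)
        y2.append(y2i)
        y1.append(1.0 * y2i + eps)
    return y1, y2, w, s

def hausman_both_forms(y1, y2, w, s):
    n = len(y1)
    y_pz_y = proj_quad(y2, y2, s, w)              # Y2' P_Z Y2
    beta_z = proj_quad(y2, y1, s, w) / y_pz_y     # IV estimator using all instruments
    beta_w = dot(w, y1) / dot(w, y2)              # IV estimator using the weak IV only
    resid = [a - beta_z * b for a, b in zip(y1, y2)]
    sigma2 = dot(resid, resid) / n
    # Form (11): quadratic form in W'(y1 - Y2 beta_z)
    psi = dot(w, w) - dot(w, y2) ** 2 / y_pz_y
    h_11 = dot(w, resid) ** 2 / (psi * sigma2)
    # Proposition 1: Hausman-type contrast of the two estimators
    y_pw_y = dot(w, y2) ** 2 / dot(w, w)          # Y2' P_W Y2
    h_prop = (beta_w - beta_z) ** 2 / ((1.0 / y_pw_y - 1.0 / y_pz_y) * sigma2)
    return h_11, h_prop

if __name__ == "__main__":
    print(hausman_both_forms(*simulate(500, seed=1)))
```

The two numbers agree up to floating-point error because, when $L_w = K$, $W'(y_1 - Y_2\hat\beta_z) = W'Y_2(\hat\beta_w - \hat\beta_z)$.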
From Proposition 1, we can conclude that $H(\hat\sigma_\varepsilon^2)$ can be understood as a version of the Hausman test in a special case where $L_w = K$. Depending on the estimator $\hat\sigma_\varepsilon^2$ used, the statistic $H(\hat\sigma_\varepsilon^2)$ can be understood to be an extension of $H_2$ or $H_3$.¹⁰ Recall $\hat\sigma^2_{\varepsilon,z}$ in (6). It turns out that $H(\hat\sigma^2_{\varepsilon,z})$, which is comparable to $H_3$, has desirable asymptotic properties.

Theorem 2. Assume Conditions 1 and 2 in Appendix A. Assume that $L_w \ge K$. (a) Under the null hypothesis,

$$H(\hat\sigma^2_{\varepsilon,z}) \Rightarrow \chi^2_{L_w},$$

(b) under the alternative hypothesis,

$$H(\hat\sigma^2_{\varepsilon,z}) \Rightarrow (\kappa + \mathcal{Z})'(\kappa + \mathcal{Z}),$$

where $\mathcal{Z} \sim N(0, I_{L_w})$ and $\kappa$ is the same noncentrality parameter as in Theorem 1.

Proof. In Appendix D.1.
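For intuition about these limits, the scalar case (one degree of freedom) allows the limiting rejection probability $P((\mathcal{Z}+\kappa)^2 > c)$ to be computed in closed form from the normal distribution function. The sketch below is an illustration only; the function names and the use of the 5% critical value for one degree of freedom are assumptions of this example.

```python
import math

# 5% critical value of the chi-square distribution with 1 degree of freedom
CRIT_5PCT_CHI2_1 = 1.959963984540054 ** 2

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def limiting_power(kappa, crit=CRIT_5PCT_CHI2_1):
    """P((Z + kappa)^2 > crit) for Z ~ N(0, 1)."""
    r = math.sqrt(crit)
    return 1.0 - (norm_cdf(r - kappa) - norm_cdf(-r - kappa))

if __name__ == "__main__":
    for kappa in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(kappa, limiting_power(kappa))
```

At $\kappa = 0$ the rejection probability equals the nominal 5%; it increases with $|\kappa|$ but remains strictly below one for any finite $\kappa$, which is the scalar version of a test that is unbiased but not consistent.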
Theorem 2 indicates that using $H(\hat\sigma^2_{\varepsilon,z})$ along with a critical value from $\chi^2_{L_w}$ is asymptotically valid even under the weak-instrument asymptotics, and the weak-instrument asymptotic distribution under the null is identical to the standard asymptotic distribution. The weak-instrument asymptotic distribution under the alternative dominates the asymptotic distribution under the null, and therefore the test is unbiased under the weak-instrument asymptotics. On the other hand, the test statistic does not diverge to infinity under the alternative, as is the case with standard asymptotics, and therefore the test is not consistent under weak-instrument asymptotics regardless of whether the model is exactly identified or overidentified by the weak IV's.

6. Discussion

We first consider two deviations from our assumptions. First, we consider the case where the strong IV is valid under both null and alternative hypotheses, while the weak IV is valid only under the null.¹¹ Second, we examine the consequences of heteroscedasticity. After this, we consider the issue of improving power.

6.1. When the weak IV are valid only under the null hypothesis

We may want to consider an alternative scenario, where the strong IV are valid both under the null and the alternative, and the weak IV are valid only under the null. Although this scenario is unlikely to be common in practice, we address this situation for its theoretical interest. In this case, the model could be modified as $y_1 = Y_2\beta + \varepsilon$ and $Y_2 = \tilde W\frac{C}{\sqrt{n}} + S\Pi_s + V$, where the alternative hypothesis is now written $\varepsilon = \tilde W\rho_w + V\rho_v + e$. Here $\tilde W$ is the population projection "residual" of $W$ on $S$: $W = S\Gamma_s + \tilde W$. Here we consider the properties of the (generalized) Hausman test statistic $H(\hat\sigma^2_{\varepsilon,z})$.¹² It can be shown that $H(\hat\sigma^2_{\varepsilon,z})$ is distributed as $\chi^2_{L_s}$ under the null, but diverges to $\infty$ under the alternative.¹³ In other words, $H(\hat\sigma^2_{\varepsilon,z})$ has the same properties as under the conventional asymptotics!

6.2. Heteroscedasticity

It is well known that 2SLS is not efficient under heteroscedasticity, and the usual form of the Hausman test would no longer be valid even under the null. This implies that, even with conventional asymptotics, the Hausman test has to be modified. We note that there does not exist a standard modification of the Hausman test to accommodate heteroscedasticity. We consider one possible modification here. Since the size of the Hausman test is valid only when $L_w = K$ even under homoscedasticity, we assume that the dimension of the weak IV's is the same as the dimension of $\beta$, that is, $L_w = K$. Here for expositional simplicity we assume that $\beta$ is a scalar (that is, $K = 1$). The extension to a general $K$ is straightforward and has been placed in the online Appendix. Given that the Hausman test has an interpretation of a comparison between $\hat\beta_w$ and $\hat\beta_z$, a natural modification of the Hausman test statistic would take the form
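$$\frac{n(\hat\beta_w - \hat\beta_z)^2}{\widehat{\mathrm{Var}}\left(\sqrt{n}(\hat\beta_w - \hat\beta_z)\right)}, \tag{12}$$

where $\widehat{\mathrm{Var}}(\sqrt{n}(\hat\beta_w - \hat\beta_z))$ denotes a consistent estimator of the asymptotic variance of $\sqrt{n}(\hat\beta_w - \hat\beta_z)$ under conventional asymptotics. To see this in more detail, by definition, under the null we have

$$\sqrt{n}(\hat\beta_w - \hat\beta_z) = \sqrt{n}(\hat\beta_w - \beta) - \sqrt{n}(\hat\beta_z - \beta) = \left(\frac{W'Y_2}{n}\right)^{-1}\frac{W'\varepsilon}{\sqrt{n}} - \left(\frac{Y_2'P_ZY_2}{n}\right)^{-1}\frac{Y_2'P_Z\varepsilon}{\sqrt{n}},$$

and its asymptotic variance under conventional asymptotics is

$$\frac{E[w_i^2\varepsilon_i^2]}{(E[w_iY_{2i}])^2} - 2\,\frac{E(Y_{2i}z_i')(E(z_iz_i'))^{-1}E(z_iw_i\varepsilon_i^2)}{E[w_iY_{2i}]\,[E(Y_{2i}z_i')(E(z_iz_i'))^{-1}E(z_iY_{2i})]} + \frac{E(Y_{2i}z_i')(E(z_iz_i'))^{-1}E(z_iz_i'\varepsilon_i^2)(E(z_iz_i'))^{-1}E(z_iY_{2i})}{[E(Y_{2i}z_i')(E(z_iz_i'))^{-1}E(z_iY_{2i})]^2}.$$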
One can use the idea behind White's heteroscedasticity-corrected standard errors, using the standard IV estimator's residuals $\hat\varepsilon_z = y_1 - Y_2\hat\beta_z$. A natural choice for a consistent estimator of $\mathrm{Var}(\sqrt{n}(\hat\beta_w - \hat\beta_z))$ is given in Box I. Although the above derivation is quite straightforward from the perspective of econometric theory, it should be helpful to applied researchers, since the standard Hausman test for the case where both the exogenous and potentially endogenous instruments are strong is not currently available in the literature when there is heteroscedasticity. In the online Appendix we show that under the Staiger–Stock asymptotics

$$H_{hetero}(\hat\varepsilon_z) = \frac{n(\hat\beta_w - \hat\beta_z)^2}{\widehat{\mathrm{Var}}\left(\sqrt{n}(\hat\beta_w - \hat\beta_z)\right)} \Rightarrow (\mathcal{Z} + \kappa_{hetero})^2, \tag{13}$$

where $\mathcal{Z} \sim N(0, 1)$ and

$$\kappa_{hetero} = \frac{-\Sigma_{ww}C\tau}{\left(\lim_n \frac{1}{n}\sum_{i=1}^{n}E\left[w_i^2\left(\tilde s_i(\rho_s - \Pi_s\tau) + v_i(\rho_v - \tau) + e_i\right)^2\right]\right)^{1/2}},$$

where $\tau = (\Pi_s'\Sigma_{ss}\Pi_s)^{-1}\Pi_s'\Sigma_{ss}\rho_s$. Since $\kappa_{hetero} = 0$ (because $\tau = 0$) under the null,

$$H_{hetero}(\hat\varepsilon_z) \Rightarrow \chi^2_1.$$

Therefore, the test is valid under the null. On the other hand, under the alternative,

$$H_{hetero}(\hat\varepsilon_z) \Rightarrow \chi^2_1(\kappa_{hetero}),$$

and thus the test $H_{hetero}(\hat\varepsilon_z)$ becomes asymptotically unbiased.

¹⁰ When $L_w = K$, we have $H(\hat\sigma^2_{\varepsilon,w}) = H_2$ and $H(\hat\sigma^2_{\varepsilon,z}) = H_3$.
¹¹ We thank an anonymous referee for suggesting this question.
¹² Given that the role of $w$ and $s$ is switched, we note that our test would be based on $\frac{1}{\sqrt{n}}S'(y_1 - Y_2\hat\beta_z)$.
¹³ The proof is available in our online Appendix.
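As a rough illustration of how the statistic in (12)–(13) could be computed for scalar $\beta$, the sketch below uses the sample analogue of the three-term asymptotic variance with squared IV residuals, for one weak instrument `w` and one strong instrument `s` (so that $z_i = (s_i, w_i)'$). The simulated data, coefficient values and function names are assumptions made for this example; it is not the authors' implementation.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def simulate(n, seed=0):
    """Illustrative heteroscedastic data under the null (all instruments valid)."""
    rng = random.Random(seed)
    y1, y2, w, s = [], [], [], []
    for _ in range(n):
        wi, si, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        e = (0.5 + abs(si)) * rng.gauss(0, 1)   # error variance depends on s_i
        y2i = 0.3 * wi + 0.5 * si + v
        w.append(wi)
        s.append(si)
        y2.append(y2i)
        y1.append(1.0 * y2i + 0.5 * v + e)
    return y1, y2, w, s

def hetero_hausman(y1, y2, w, s):
    """Return (statistic, variance estimate) for scalar beta and Z = [s, w]."""
    n = len(y1)
    ss, sw, ww = dot(s, s) / n, dot(s, w) / n, dot(w, w) / n
    ys, yw = dot(y2, s) / n, dot(y2, w) / n          # (1/n) Z'Y2 and (1/n) W'Y2
    det = ss * ww - sw * sw
    a_s = (ww * ys - sw * yw) / det                  # first-stage coefficients
    a_w = (ss * yw - sw * ys) / det
    fitted = [a_s * si + a_w * wi for si, wi in zip(s, w)]   # P_Z Y2
    d = dot(fitted, y2) / n                          # (1/n) Y2' P_Z Y2
    beta_z = dot(fitted, y1) / dot(fitted, y2)
    beta_w = dot(w, y1) / dot(w, y2)
    e2 = [(a - beta_z * b) ** 2 for a, b in zip(y1, y2)]     # squared IV residuals
    t1 = sum(wi * wi * e for wi, e in zip(w, e2)) / n / yw ** 2
    t2 = sum(fi * wi * e for fi, wi, e in zip(fitted, w, e2)) / n / (yw * d)
    t3 = sum(fi * fi * e for fi, e in zip(fitted, e2)) / n / d ** 2
    var_hat = t1 - 2.0 * t2 + t3
    return n * (beta_w - beta_z) ** 2 / var_hat, var_hat

if __name__ == "__main__":
    print(hetero_hausman(*simulate(500, seed=1)))
```

The three terms combine into the sample mean of $\left(w_i/\overline{wY_2} - \hat Y_{2i}/d\right)^2\hat\varepsilon_i^2$, where $\hat Y_{2i}$ is the first-stage fitted value, so the variance estimate is nonnegative by construction. The resulting statistic would be compared with a $\chi^2_1$ critical value.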
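$$\widehat{\mathrm{Var}}\left(\sqrt{n}(\hat\beta_w - \hat\beta_z)\right) = \frac{\frac{1}{n}\sum_{i=1}^{n}w_i^2\hat\varepsilon_i^2}{\left(\frac{1}{n}\sum_{i=1}^{n}w_iY_{2i}\right)^2} - 2\,\frac{\left(\frac{1}{n}\sum_{i=1}^{n}Y_{2i}z_i'\right)\left(\frac{1}{n}\sum_{i=1}^{n}z_iz_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}z_iw_i\hat\varepsilon_i^2\right)}{\left(\frac{1}{n}\sum_{i=1}^{n}w_iY_{2i}\right)\left[\left(\frac{1}{n}\sum_{i=1}^{n}Y_{2i}z_i'\right)\left(\frac{1}{n}\sum_{i=1}^{n}z_iz_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}z_iY_{2i}\right)\right]}$$

$$\qquad + \frac{\left(\frac{1}{n}\sum_{i=1}^{n}Y_{2i}z_i'\right)\left(\frac{1}{n}\sum_{i=1}^{n}z_iz_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}z_iz_i'\hat\varepsilon_i^2\right)\left(\frac{1}{n}\sum_{i=1}^{n}z_iz_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}z_iY_{2i}\right)}{\left[\left(\frac{1}{n}\sum_{i=1}^{n}Y_{2i}z_i'\right)\left(\frac{1}{n}\sum_{i=1}^{n}z_iz_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}z_iY_{2i}\right)\right]^2}.$$

Box I.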
6.3. Improving the power of H3

In Section 4, we noted that $H_3$ is asymptotically unbiased for the case where $L_w = K$. On the other hand, the test statistic does not diverge to infinity under the alternative, as is the case with standard asymptotics, and therefore the test is not consistent under weak-instrument asymptotics. Given that the consistency of a test is usually understood to be a necessity, a researcher may conclude that a test using $H_3$ is deficient. We should note, though, that a consistent test is probably impossible to construct given the nature of weak instruments. Many other tests are inconsistent with weak IV's, and a lack of consistency is not limited to the weak IV literature. (For example, a recent test by Andrews (2003) exhibits similar properties.) The lack of consistency suggests that it would be a useful endeavor to try to improve the power of the test while maintaining its good size properties. For this purpose, we propose the following version of the Hausman test:

$$H_4 = \tilde\sigma_{\varepsilon,z}^{-2}(\hat\beta_w - \hat\beta_z)'\left[(Y_2'P_WY_2)^{-1} - (Y_2'P_ZY_2)^{-1}\right]^{-1}(\hat\beta_w - \hat\beta_z),$$

where $\tilde\sigma^2_{\varepsilon,z}$ is obtained by the following algorithm:

1. Using the IV estimator $\hat\beta_z$, we get the IV residual $\hat\varepsilon_z = y_1 - Y_2\hat\beta_z$.
2. Regress the IV residual $\hat\varepsilon_z$ on $Z = [S, W]$ to get the residual $M_Z\hat\varepsilon_z$ and define

$$\tilde\sigma^2_{\varepsilon,z} = \frac{1}{n}\hat\varepsilon_z'M_Z\hat\varepsilon_z = \hat\sigma^2_{\varepsilon,z} - \frac{1}{n}\hat\varepsilon_z'P_Z\hat\varepsilon_z. \tag{14}$$

From (14), one can see that the proposed estimator $\tilde\sigma^2_{\varepsilon,z}$ modifies $\hat\sigma^2_{\varepsilon,z}$ by subtracting $\frac{1}{n}\hat\varepsilon_z'P_Z\hat\varepsilon_z$. Although it is generally preferable to use the modified estimator $\tilde\sigma^2_{\varepsilon,z}$, there are two special cases where such modification is unnecessary and $\hat\sigma^2_{\varepsilon,z}$ does not need to be modified. The first case is when $\Sigma_{ss}^{1/2}\rho_s$ belongs to the space spanned by the columns of $\Sigma_{ss}^{1/2}\Pi_s$ (equivalently, $\rho_s$ belongs to the column space of $\Pi_s$). Recall the definition $\tau = (\Pi_s'\Sigma_{ss}\Pi_s)^{-1}\Pi_s'\Sigma_{ss}\rho_s$. We then have $\Sigma_{ss}^{1/2}\rho_s = \Sigma_{ss}^{1/2}\Pi_s(\Pi_s'\Sigma_{ss}\Pi_s)^{-1}\Pi_s'\Sigma_{ss}\rho_s$, so that $\rho_s = \Pi_s\tau$. This coincidence depends on the alternative, which is not known to the practitioner, so it probably has little practical importance. The second case is of more practical significance. Suppose that the model is exactly identified by the strong instrument $s_i$, that is, $L_s = K$, and $\Pi_s$ is invertible. We then have $\rho_s = \Pi_s\tau$ and $\sigma_*^2 = \sigma_{**}^2$, and there is no need for the second step modification above; we can use $\hat\sigma^2_{\varepsilon,z}$ in place of $\tilde\sigma^2_{\varepsilon,z}$. We believe that the second case is empirically more relevant than the first case, because in many applications the endogenous regressor $Y_{2i}$ and the strong IV $s_i$ are scalars. Theorem 3 below shows that $H_4$ thus defined has the usual $\chi^2_K$ distribution under the null, and its asymptotic distribution under the alternative stochastically dominates $\chi^2_K$.
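Theorem 3. Assume Conditions 1 and 2 in Appendix A. Suppose that $L_w = K$. Then $H_4 \Rightarrow \chi^2_K$ under the null. Under the alternative hypothesis (3),

$$H_4 \Rightarrow \frac{\sigma_{**}^2}{\sigma_*^2}(\mathcal{Z} + \kappa)'(\mathcal{Z} + \kappa),$$

where $\mathcal{Z} \sim N(0, I_K)$ and $\kappa$ is the same noncentrality parameter as in Theorem 1. Also, $\sigma_{**}^2 = \mathrm{plim}\,\hat\sigma^2_{\varepsilon,z}$ under the alternative and $\sigma_*^2 = \mathrm{plim}\,\tilde\sigma^2_{\varepsilon,z}$ under the alternative.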
Proof. Omitted because Theorem 3 is an immediate consequence of Theorem 2 in Section 5.

In Appendix D we show that under the alternative, $\tilde\sigma^2_{\varepsilon,z} \to_p \sigma_*^2 = (\rho_v - \tau)'\Sigma_{vv}(\rho_v - \tau) + \sigma_e^2$ and $\hat\sigma^2_{\varepsilon,z} \to_p \sigma_{**}^2 = (\rho_s - \Pi_s\tau)'\Sigma_{ss}(\rho_s - \Pi_s\tau) + \sigma_*^2$, where $\tau = \mathrm{plim}(\hat\beta_z - \beta) = (\Pi_s'\Sigma_{ss}\Pi_s)^{-1}\Pi_s'\Sigma_{ss}\rho_s$. It follows immediately that $\sigma_*^2/\sigma_{**}^2 \le 1$, which implies that the asymptotic power of the modified test $H_4$ is larger than that of $H_3$.

In Section 5, we proposed a generalized Hausman test statistic $H(\cdot)$ for the case where $L_w > K$. A natural question is whether the power of $H(\hat\sigma^2_{\varepsilon,z})$ is dominated by that of $H(\tilde\sigma^2_{\varepsilon,z})$. Using Theorem 2, it is easy to see that $H(\tilde\sigma^2_{\varepsilon,z})$ is asymptotically unbiased and its power dominates that of $H(\hat\sigma^2_{\varepsilon,z})$. As such, $H(\tilde\sigma^2_{\varepsilon,z})$ is a desirable test. We note that if $L_w = K$, $H(\tilde\sigma^2_{\varepsilon,z})$ simplifies to

$$H(\tilde\sigma^2_{\varepsilon,z}) = \tilde\sigma_{\varepsilon,z}^{-2}(\hat\beta_w - \hat\beta_z)'\left[(Y_2'P_WY_2)^{-1} - (Y_2'P_ZY_2)^{-1}\right]^{-1}(\hat\beta_w - \hat\beta_z) = H_4 \tag{15}$$

by Proposition 1. Based on this equality, we will, without too much loss of generality, define $H_4 = H(\tilde\sigma^2_{\varepsilon,z})$ even for the overidentified case.¹⁴
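The two-step construction of the modified variance estimator in (14) and the resulting pair of statistics can be sketched as follows for a scalar endogenous regressor, one weak instrument and a list of strong instruments. This is a minimal illustration on simulated data with hypothetical coefficient values; the helpers `solve`, `project`, `simulate` and `h3_and_h4` are names introduced for this example only.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x

def project(y, cols):
    """Fitted values P_Z y, where the columns of Z are the lists in `cols`."""
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]
    coef = solve(gram, [dot(ci, y) for ci in cols])
    return [sum(b * c[i] for b, c in zip(coef, cols)) for i in range(len(y))]

def simulate(n, gamma_s, seed=0):
    """Two strong IVs; the first enters the structural equation when gamma_s != 0."""
    rng = random.Random(seed)
    y1, y2, w, s1, s2 = [], [], [], [], []
    for _ in range(n):
        wi, a, b, v = (rng.gauss(0, 1) for _ in range(4))
        eps = 0.5 * v + rng.gauss(0, 1)
        y2i = 0.3 * wi + 0.5 * a + 0.5 * b + v
        w.append(wi)
        s1.append(a)
        s2.append(b)
        y2.append(y2i)
        y1.append(y2i + gamma_s * a + eps)
    return y1, y2, w, [s1, s2]

def h3_and_h4(y1, y2, w, strong):
    """Return (H3, H4) for scalar Y2 and a single weak instrument w."""
    n = len(y1)
    z_cols = strong + [w]
    fitted = project(y2, z_cols)                           # P_Z Y2
    beta_z = dot(fitted, y1) / dot(fitted, y2)             # step 1: IV estimator
    beta_w = dot(w, y1) / dot(w, y2)
    resid = [a - beta_z * b for a, b in zip(y1, y2)]       # IV residual
    sigma2_hat = dot(resid, resid) / n
    pz_resid = project(resid, z_cols)                      # step 2: regress residual on Z
    sigma2_tilde = sigma2_hat - dot(pz_resid, resid) / n   # equation (14)
    y_pw_y = dot(w, y2) ** 2 / dot(w, w)
    core = (beta_w - beta_z) ** 2 / (1.0 / y_pw_y - 1.0 / dot(fitted, y2))
    return core / sigma2_hat, core / sigma2_tilde

if __name__ == "__main__":
    print("null:", h3_and_h4(*simulate(500, 0.0, seed=1)))
    print("alternative:", h3_and_h4(*simulate(500, 1.0, seed=1)))
```

Because $\tilde\sigma^2_{\varepsilon,z} \le \hat\sigma^2_{\varepsilon,z}$ holds algebraically, the second statistic is never smaller than the first in any sample; the example uses two strong instruments so that the strong instruments overidentify $\beta$, which is the case in which the two variance estimators have different limits under the alternative.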
¹⁴ From our online Appendix one sees that a test based on $H_4$ is quite easy to construct in Stata or similar programs.

7. Monte Carlo simulations

The data generating process used in the Monte Carlo simulations is

$$y_{1i} = Y_{2i}\beta + s_i'\rho_s + \varepsilon_i, \qquad Y_{2i} = w_i'\Pi_w + s_i'\Pi_s + v_i, \qquad i = 1, \ldots, n,$$

where $(w_i', s_i') \overset{iid}{\sim} N(0, I)$, $(\varepsilon_i, v_i) \overset{iid}{\sim} N\left(0, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right)$, and $y_{1i}$ and $Y_{2i}$ are scalars. Further, $\beta = 1$ and $\Pi_w$ and $\Pi_s$ are proportional to vectors consisting of ones. They are related to the (partial) first stage $R^2$ by

$$R^2_w = \frac{\Pi_w'\Pi_w}{\Pi_w'\Pi_w + 1}, \qquad R^2_s = \frac{\Pi_s'\Pi_s}{\Pi_s'\Pi_s + 1}.$$

We fixed $R^2_s = 0.2$ throughout the simulation, and we considered $R^2_w = 0.01, 0.02$. We set $\rho_s = 0$ under the null, and $\rho_s = (\gamma_s, 0, \ldots, 0)$ under the alternative. We consider $n = 100, 500$, $\rho = \frac{1}{4}, \frac{1}{2}, \frac{3}{4}$, $R^2_w = 0.01, 0.02, 0.03, 0.05, 0.1, 0.2$, and $\gamma_s = 1$. The dimensions of $w_i$ and $s_i$ are $L_w = 1, 5$ and $L_s = 1, 2, 5$, respectively. The nominal size of the test is 5%. (Additional cases are considered in our online Appendix.) All of the simulation results are based on 5000 runs.

Table 1 looks at the size of the test for $H_1$–$H_4$ when there is one weak IV and $\rho = \frac{1}{4}$ for different sample sizes and numbers of strong instruments. Thus, we first consider the case where the model is exactly identified using the weak instruments and the degree of endogeneity is relatively small. The first section of the table considers this specification for the two different sample sizes and the six values of $R^2_w$. Ideally each entry should be 0.05; we see that in each case the size with low $R^2_w$ is much too small for $H_1$ and $H_2$ but is very close to the nominal level for $H_3$ and $H_4$. As $R^2_w$ and the sample size increase, the size distortion of both $H_1$ and $H_2$ decreases. The bottom two sections of Table 1 consider the case of two and five strong instruments, respectively. Now the sizes of $H_3$ and $H_4$ are still equal to 0.05 or 0.06 in the rest of the cases, while $H_1$ and $H_2$ continue to be under-sized.

Table 1
Size of test, # of weak IV = 1, ρ = 0.25.

| n | # strong IV | R² | H1 | H2 | H3 | H4 |
|---|---|---|---|---|---|---|
| 100 | 1 | 0.01 | 0.00 | 0.00 | 0.05 | 0.05 |
| 100 | 1 | 0.02 | 0.01 | 0.01 | 0.05 | 0.06 |
| 100 | 1 | 0.03 | 0.01 | 0.01 | 0.05 | 0.06 |
| 100 | 1 | 0.05 | 0.01 | 0.01 | 0.05 | 0.06 |
| 100 | 1 | 0.1 | 0.02 | 0.02 | 0.05 | 0.06 |
| 100 | 1 | 0.2 | 0.03 | 0.04 | 0.05 | 0.05 |
| 500 | 1 | 0.01 | 0.01 | 0.01 | 0.05 | 0.05 |
| 500 | 1 | 0.02 | 0.02 | 0.02 | 0.05 | 0.05 |
| 500 | 1 | 0.03 | 0.03 | 0.03 | 0.05 | 0.05 |
| 500 | 1 | 0.05 | 0.03 | 0.03 | 0.05 | 0.06 |
| 500 | 1 | 0.1 | 0.04 | 0.04 | 0.05 | 0.05 |
| 500 | 1 | 0.2 | 0.05 | 0.05 | 0.06 | 0.06 |
| 100 | 2 | 0.01 | 0.00 | 0.00 | 0.05 | 0.06 |
| 100 | 2 | 0.02 | 0.01 | 0.01 | 0.05 | 0.06 |
| 100 | 2 | 0.03 | 0.01 | 0.01 | 0.05 | 0.05 |
| 100 | 2 | 0.05 | 0.01 | 0.01 | 0.05 | 0.06 |
| 100 | 2 | 0.1 | 0.02 | 0.02 | 0.05 | 0.06 |
| 100 | 2 | 0.2 | 0.03 | 0.03 | 0.06 | 0.06 |
| 500 | 2 | 0.01 | 0.01 | 0.01 | 0.06 | 0.06 |
| 500 | 2 | 0.02 | 0.02 | 0.02 | 0.06 | 0.06 |
| 500 | 2 | 0.03 | 0.03 | 0.03 | 0.06 | 0.06 |
| 500 | 2 | 0.05 | 0.04 | 0.04 | 0.06 | 0.06 |
| 500 | 2 | 0.1 | 0.04 | 0.05 | 0.05 | 0.06 |
| 500 | 2 | 0.2 | 0.05 | 0.05 | 0.06 | 0.06 |
| 100 | 5 | 0.01 | 0.00 | 0.00 | 0.05 | 0.05 |
| 100 | 5 | 0.02 | 0.00 | 0.00 | 0.05 | 0.06 |
| 100 | 5 | 0.03 | 0.01 | 0.01 | 0.05 | 0.06 |
| 100 | 5 | 0.05 | 0.01 | 0.01 | 0.05 | 0.06 |
| 100 | 5 | 0.1 | 0.02 | 0.02 | 0.05 | 0.06 |
| 100 | 5 | 0.2 | 0.03 | 0.03 | 0.05 | 0.06 |
| 500 | 5 | 0.01 | 0.01 | 0.01 | 0.05 | 0.05 |
| 500 | 5 | 0.02 | 0.02 | 0.02 | 0.05 | 0.05 |
| 500 | 5 | 0.03 | 0.02 | 0.02 | 0.05 | 0.05 |
| 500 | 5 | 0.05 | 0.03 | 0.03 | 0.05 | 0.05 |
| 500 | 5 | 0.1 | 0.04 | 0.04 | 0.05 | 0.05 |
| 500 | 5 | 0.2 | 0.04 | 0.04 | 0.05 | 0.05 |

Note: R² is the partial R² of the weak IV.

Table 2 looks at the size of the test for $H_1$–$H_4$ when there are five weak IV and $\rho = \frac{3}{4}$; i.e. a case where the model is overidentified under the weak instruments and the degree of endogeneity is considerably higher. In this case, for tests $H_3$ and $H_4$ we consider $H_3 = H(\hat\sigma^2_{\varepsilon,z})$ and $H_4 = H(\tilde\sigma^2_{\varepsilon,z})$, where the test statistic $H(\cdot)$ is defined in (11). Now the size of each test is biased upwards; this is especially true for $H_1$ and $H_2$ with low $R^2_w$. However, it is interesting to note that $H_4$ substantially outperforms $H_3$ for most of the cases, which is intuitively plausible since $H_4$ was developed for the case where the model is overidentified using the weak instruments.

Table 2
Size of test, # of weak IV = 5, ρ = 0.75.

| n | # strong IV | R² | H1 | H2 | H3 | H4 |
|---|---|---|---|---|---|---|
| 100 | 1 | 0.01 | 0.37 | 0.32 | 0.23 | 0.11 |
| 100 | 1 | 0.02 | 0.34 | 0.29 | 0.20 | 0.11 |
| 100 | 1 | 0.03 | 0.31 | 0.27 | 0.18 | 0.11 |
| 100 | 1 | 0.05 | 0.27 | 0.22 | 0.14 | 0.11 |
| 100 | 1 | 0.1 | 0.20 | 0.15 | 0.10 | 0.11 |
| 100 | 1 | 0.2 | 0.14 | 0.09 | 0.07 | 0.10 |
| 500 | 1 | 0.01 | 0.30 | 0.28 | 0.16 | 0.07 |
| 500 | 1 | 0.02 | 0.22 | 0.21 | 0.12 | 0.07 |
| 500 | 1 | 0.03 | 0.18 | 0.16 | 0.10 | 0.07 |
| 500 | 1 | 0.05 | 0.14 | 0.12 | 0.07 | 0.07 |
| 500 | 1 | 0.1 | 0.10 | 0.08 | 0.06 | 0.07 |
| 500 | 1 | 0.2 | 0.07 | 0.06 | 0.05 | 0.07 |
| 100 | 2 | 0.01 | 0.35 | 0.31 | 0.23 | 0.12 |
| 100 | 2 | 0.02 | 0.33 | 0.28 | 0.20 | 0.12 |
| 100 | 2 | 0.03 | 0.30 | 0.25 | 0.17 | 0.11 |
| 100 | 2 | 0.05 | 0.25 | 0.20 | 0.14 | 0.11 |
| 100 | 2 | 0.1 | 0.18 | 0.13 | 0.09 | 0.11 |
| 100 | 2 | 0.2 | 0.12 | 0.07 | 0.06 | 0.10 |
| 500 | 2 | 0.01 | 0.30 | 0.28 | 0.16 | 0.06 |
| 500 | 2 | 0.02 | 0.22 | 0.20 | 0.11 | 0.06 |
| 500 | 2 | 0.03 | 0.18 | 0.16 | 0.09 | 0.06 |
| 500 | 2 | 0.05 | 0.13 | 0.11 | 0.07 | 0.06 |
| 500 | 2 | 0.1 | 0.09 | 0.07 | 0.06 | 0.06 |
| 500 | 2 | 0.2 | 0.07 | 0.06 | 0.05 | 0.06 |
| 100 | 5 | 0.01 | 0.26 | 0.23 | 0.18 | 0.12 |
| 100 | 5 | 0.02 | 0.23 | 0.20 | 0.15 | 0.12 |
| 100 | 5 | 0.03 | 0.21 | 0.17 | 0.13 | 0.12 |
| 100 | 5 | 0.05 | 0.17 | 0.13 | 0.09 | 0.12 |
| 100 | 5 | 0.1 | 0.11 | 0.07 | 0.06 | 0.12 |
| 100 | 5 | 0.2 | 0.06 | 0.04 | 0.04 | 0.12 |
| 500 | 5 | 0.01 | 0.27 | 0.26 | 0.16 | 0.07 |
| 500 | 5 | 0.02 | 0.20 | 0.19 | 0.11 | 0.07 |
| 500 | 5 | 0.03 | 0.16 | 0.14 | 0.09 | 0.07 |
| 500 | 5 | 0.05 | 0.11 | 0.10 | 0.07 | 0.07 |
| 500 | 5 | 0.1 | 0.08 | 0.06 | 0.06 | 0.07 |
| 500 | 5 | 0.2 | 0.06 | 0.05 | 0.05 | 0.07 |

Note: R² is the partial R² of the weak IV.

Comparing the results in Tables 1 and 2 does raise an interesting dilemma. On the one hand, researchers can improve the size of the test by using only one of the weak IV's when the model is overidentified under the weak IV. On the other hand, since different researchers are likely to make different choices in terms of which weak IV to use, they will obtain different test results for identical models. Further, there is also the potential problem of researchers running all five regressions when there are five weak instruments and one endogenous variable, and choosing the results they like the best.

In Table 3, we consider the power properties of $H_3$ and $H_4$ when there is one weak IV, five strong IV and $\rho_s = (1, 0, \ldots, 0)$, i.e. only one of the strong instruments is invalid. Note that this is a conservative example in that it will be harder to reject the null when it is false than if all the strong IV were invalid. Given that we have weak instruments, it is unrealistic to expect the entries in Table 3 to be close to one. Not surprisingly, the power of each test rises with the sample size and the explanatory power of the weak instruments. It is also interesting to note that when $R^2_w$ is low, the power of $H_4$ is often more than double that of $H_3$ when $n = 100$, and about 50% greater than that of $H_3$ when $n = 500$. Thus in terms of power with low $R^2_w$, $H_4$ substantially outperforms $H_3$ for all sample sizes in our example. This is to be expected as the model is overidentified under the strong IV, and $H_4$ was developed with power considerations in mind.¹⁵ (Recall that there is no need to use the modified version of $\hat\sigma^2_{\varepsilon,z}$ when the model is exactly identified under the strong instruments.) When $R^2_w$ is high, the power gain decreases. However, in this case, the power itself is much higher than that for the case with low $R^2_w$.

Table 3
Power of test, # of weak IV = 1, # strong IV = 5, γs = 1.

| n | R² | ρ | H3 | H4 |
|---|---|---|---|---|
| 100 | 0.01 | 0.25 | 0.10 | 0.22 |
| 100 | 0.02 | 0.25 | 0.14 | 0.27 |
| 100 | 0.03 | 0.25 | 0.17 | 0.32 |
| 100 | 0.05 | 0.25 | 0.24 | 0.40 |
| 100 | 0.1 | 0.25 | 0.37 | 0.56 |
| 100 | 0.2 | 0.25 | 0.55 | 0.71 |
| 500 | 0.01 | 0.25 | 0.27 | 0.42 |
| 500 | 0.02 | 0.25 | 0.46 | 0.62 |
| 500 | 0.03 | 0.25 | 0.61 | 0.76 |
| 500 | 0.05 | 0.25 | 0.79 | 0.89 |
| 500 | 0.1 | 0.25 | 0.95 | 0.98 |
| 500 | 0.2 | 0.25 | 0.99 | 1.00 |
| 100 | 0.01 | 0.5 | 0.11 | 0.26 |
| 100 | 0.02 | 0.5 | 0.16 | 0.33 |
| 100 | 0.03 | 0.5 | 0.21 | 0.39 |
| 100 | 0.05 | 0.5 | 0.29 | 0.50 |
| 100 | 0.1 | 0.5 | 0.45 | 0.65 |
| 100 | 0.2 | 0.5 | 0.62 | 0.78 |
| 500 | 0.01 | 0.5 | 0.32 | 0.53 |
| 500 | 0.02 | 0.5 | 0.54 | 0.74 |
| 500 | 0.03 | 0.5 | 0.70 | 0.85 |
| 500 | 0.05 | 0.5 | 0.86 | 0.95 |
| 500 | 0.1 | 0.5 | 0.97 | 0.99 |
| 500 | 0.2 | 0.5 | 0.99 | 1.00 |
| 100 | 0.01 | 0.75 | 0.13 | 0.36 |
| 100 | 0.02 | 0.75 | 0.20 | 0.45 |
| 100 | 0.03 | 0.75 | 0.27 | 0.54 |
| 100 | 0.05 | 0.75 | 0.37 | 0.63 |
| 100 | 0.1 | 0.75 | 0.55 | 0.77 |
| 100 | 0.2 | 0.75 | 0.70 | 0.84 |
| 500 | 0.01 | 0.75 | 0.41 | 0.70 |
| 500 | 0.02 | 0.75 | 0.67 | 0.88 |
| 500 | 0.03 | 0.75 | 0.81 | 0.94 |
| 500 | 0.05 | 0.75 | 0.93 | 0.98 |
| 500 | 0.1 | 0.75 | 0.98 | 0.99 |
| 500 | 0.2 | 0.75 | 1.00 | 1.00 |

Note: R² is the partial R² of the weak IV.

¹⁵ The corresponding power statistics for $H_4$ when the model is overidentified under the weak instruments are somewhat higher than those in Table 3.

8. Conclusion

Hausman (1978) provides a test for whether one or more instruments are valid given that the model is identified by other instruments which can be treated as exogenous. However, as we show in a series of examples, researchers often face the problem that the most acceptable instruments are also quite weak, while most strong instruments are potentially endogenous. Using Staiger–Stock asymptotics, we show that the standard Hausman test for this case may have a size distortion under the null in the presence of weak instruments unless the model is exactly identified using the weak instruments. We then provide a form of the Hausman test that eliminates the problem for the overidentified case. Finally, we show in our online Appendix that this test is easy for empirical researchers to implement using a program like Stata. Our Monte Carlo results suggest that there is indeed a problem with the standard tests, and that our general procedure works relatively well in finite samples.

Appendix A. Regularity conditions

Condition 1. We assume the following. (i) $\frac{1}{n}Z'Z \to_p \Sigma_{zz} > 0$; $\frac{1}{n}S'S \to_p \Sigma_{ss} > 0$; $\frac{1}{n}V'V \to_p \Sigma_{vv} > 0$; $\frac{1}{n}Y_2'Y_2 \to_p \Sigma_{22}$, (ii) $\frac{1}{n}Z'V \to_p 0$; $\frac{1}{n}Z'e \to_p 0$; $\frac{1}{n}S'V \to_p 0$; $\frac{1}{n}S'e \to_p 0$, and (iii) $\frac{1}{n}\varepsilon'\varepsilon \to_p \sigma_\varepsilon^2 > 0$; $\frac{1}{n}e'e \to_p \sigma_e^2 > 0$, where

$$\Sigma_{zz} = \begin{bmatrix} \Sigma_{ww} & \Sigma_{ws} \\ \Sigma_{sw} & \Sigma_{ss} \end{bmatrix}$$

and the notation "$> 0$" in (i) signifies positive definiteness of the matrices.

Remark 1. Condition 1 assumes the weak law of large numbers of the variables in $Z$, $S$, $V$, and $Y_2$. The asymptotic orthogonalities in Condition 1(ii) reflect the definitions of the parameterizations in (10), (1) and (4). In Condition 1(iii) we assume homoscedasticity of the "errors", as is common in the literature. We discuss heteroscedasticity in Section 6.

Condition 2. We also assume that

$$\begin{bmatrix} \frac{1}{\sqrt{n}}\mathrm{vec}(W'S) \\ \frac{1}{\sqrt{n}}\mathrm{vec}(W'V) \\ \frac{1}{\sqrt{n}}W'e \end{bmatrix} \Rightarrow \begin{bmatrix} \mathrm{vec}(Z_{ws}) \\ \mathrm{vec}(Z_{wv}) \\ Z_{we} \end{bmatrix} \sim N\left(0, \mathrm{diag}(\Sigma_{ss} \otimes \Sigma_{ww}, \Sigma_{vv} \otimes \Sigma_{ww}, \sigma_e^2\Sigma_{ww})\right),$$

where $\mathrm{diag}(\Sigma_{ss} \otimes \Sigma_{ww}, \Sigma_{vv} \otimes \Sigma_{ww}, \sigma_e^2\Sigma_{ww})$ is a block diagonal matrix consisting of $\Sigma_{ss} \otimes \Sigma_{ww}$, $\Sigma_{vv} \otimes \Sigma_{ww}$, and $\sigma_e^2\Sigma_{ww}$ as blocks.

Appendix B. Preliminaries

Proofs of the lemmas below are available in our online Appendix.

Lemma 2. Assume Condition 1 holds. The following hold both under the null and under the alternative:

(a) $\frac{1}{n}Z'Z \to_p \begin{bmatrix} \Sigma_{ww} & \Sigma_{ww}\Gamma_w \\ \Gamma_w'\Sigma_{ww} & \Gamma_w'\Sigma_{ww}\Gamma_w + \Sigma_{ss} \end{bmatrix}$.
(b) $\frac{1}{n}Z'Y_2 \to_p \begin{pmatrix} 0 \\ \Sigma_{ss}\Pi_s \end{pmatrix}$.
(c) $\frac{1}{n}Y_2'Z\left(\frac{1}{n}Z'Z\right)^{-1}\frac{1}{n}Z'Y_2 \to_p \Pi_s'\Sigma_{ss}\Pi_s$.

Lemma 3. Assume Conditions 1 and 2 hold. The following hold under the null hypothesis in Section 2:

(a) $\frac{1}{\sqrt{n}}W'\varepsilon \Rightarrow Z_{wv}\rho_v + Z_{we}$.
(b) $\begin{bmatrix} Y_2'P_WY_2 \\ Y_2'P_W\varepsilon \end{bmatrix} \Rightarrow \begin{bmatrix} (\Sigma_{ww}C + Z_{ws}\Pi_s + Z_{wv})'\Sigma_{ww}^{-1}(\Sigma_{ww}C + Z_{ws}\Pi_s + Z_{wv}) \\ (\Sigma_{ww}C + Z_{ws}\Pi_s + Z_{wv})'\Sigma_{ww}^{-1}(Z_{wv}\rho_v + Z_{we}) \end{bmatrix}$.
(c) $\frac{1}{n}Z'\varepsilon \to_p 0$.
(d) $\frac{1}{n}Y_2'\varepsilon \to_p \Sigma_{vv}\rho_v$.
(e) $\hat\beta_z \to_p \beta$.
(f) $\hat\sigma^2_{\varepsilon,z} \to_p \sigma_\varepsilon^2$.
(g) $\tilde\sigma^2_{\varepsilon,z} \to_p \sigma_\varepsilon^2$.

Lemma 4. Assume Conditions 1 and 2 hold. The following hold under the alternative hypothesis in Section 2:

(a) $\frac{1}{n}Z'\varepsilon \to_p \begin{pmatrix} 0 \\ \Sigma_{ss}\rho_s \end{pmatrix}$.
(b) $\hat\beta_z \to_p \beta + \tau$, where $\tau = (\Pi_s'\Sigma_{ss}\Pi_s)^{-1}\Pi_s'\Sigma_{ss}\rho_s$.
(c) $\frac{1}{\sqrt{n}}W'(\varepsilon - Y_2\tau) \Rightarrow Z_{ws}(\rho_s - \Pi_s\tau) - \Sigma_{ww}C\tau + Z_{wv}(\rho_v - \tau) + Z_{we}$.
(d) $\tilde\sigma^2_{\varepsilon,z} \to_p (\rho_v - \tau)'\Sigma_{vv}(\rho_v - \tau) + \sigma_e^2$.
(e) $\hat\sigma^2_{\varepsilon,z} \to_p (\rho_s - \Pi_s\tau)'\Sigma_{ss}(\rho_s - \Pi_s\tau) + (\rho_v - \tau)'\Sigma_{vv}(\rho_v - \tau) + \sigma_e^2$.

Appendix C. Proofs of the results in Section 4

We begin by presenting a rather natural result on the properties of the Hausman test with weak IV. In Theorem 4 below, we show that the Hausman test does not have the usual $\chi^2$ distribution under the null.
J. Hahn et al. / Journal of Econometrics 160 (2011) 289–299
C.1. The asymptotic distribution of the Hausman test
−2 − σε,w
Before presenting Theorem 4, we introduce the following definitions: define [ ] [ ] −1 Dw (Σww C + Zws Πs + Zwv )′ Σww (Σww C + Zws Πs + Zwv ) = , ′ − 1 Nw (Σww C + Zws Πs + Zwv ) Σww (Zwv ρv + Zwe )
ξ (Zws , Zwv , Zwe ) = 1 + 1/2
1
1
ζ (Zws , Zwv ) =
−1/2
σε
−1/2
× (Σ22 Dw−1 Nw − Σ22
1/2
(Σ22 Dw−1 Nw − Σ22 2
Σvv ρv ) −
1
σε2
Σvv ρv )′
σe × (Σww C + Zws Πs + Zwv )]−1/2 −1 × (Σww C + Zws Πs + Zwv )′ Σww Zwv ρv , ′ 2 ′ −1 2 ′ −1 −1 H1 = (βw − βz ) [ σε,w (Y2 PW Y2 ) − σε,z (Y2 PZ Y2 ) ] (βw − βz ), −2 ′ ′ −1 ′ −1 −1 H2 = σε,w (βw − βz ) [(Y2 PW Y2 ) − (Y2 PZ Y2 ) ] (βw − βz ), and H3 =
σε,−z2 ( βw
− βz )′ [(Y2′ PW Y2 )−1 − (Y2′ PZ Y2 )−1 ]−1 ( βw − βz ).
n
−2 σε,w (Y2′ PW Y2 )( βw − β + op (1))
where the last equality follows from obtain 1/2
−1/2
− Σ22
Y2′ PW Y2
−1/2
H1 ⇒ (σε2 + (Σ22 Dw−1 Nw − Σ22
n
= op (1). We therefore 1/2
Σvv ρv )′ (Σ22 Dw−1 Nw ′
−1 Σvv ρv ) − ρv′ Σvv Σ22 Σvv ρv )−1 Nw Dw−1 Nw .
Let
Z=
−1 [(Σww C + Zws Πs + Zwv )′ Σww
Y2′ PW Y2
297
]−1
−2 ′ = σε,w (ε PW Y2 )(Y2′ PW Y2 )−1 (Y2′ PW ε) + op (1),
−1 ρv′ Σvv Σ22 Σvv ρv ,
1
−1 [(Σww C + Zws Πs + Zwv )′ Σww (Σww C + Zws Πs σe −1 + Zwv )]−1/2 (Σww C + Zws Πs + Zwv )′ Σww Zwe .
Then, Z ≡ N (0, IK ) and Z is independent of ζ (Zws , Zwv ). Recalling the definitions of ξ (Zws , Zwv , Zwe ) and ζ (Zws , Zwv ), the limit of H1 is
σe2 ξ (Zws , Zwv , Zwe )−1 (ζ (Zws , Zwv ) + Z)′ (ζ (Zws , Zwv ) + Z). σε2 Part (b): Notice that under the null hypothesis, by Lemma 3(f),
Theorem 4. Assume Conditions 1 and 2 hold, and suppose that Z denotes a random vector of N (0, IK ) that is independent of ζ (Zws , Zwv ). Under the null hypothesis (2), (a) H1 , H2 ⇒
Zwv ) + Z). (b) H3 ⇒
σe2 σ2
σe2 σ2 ε
ξ (Zws , Zwv , Zwe )−1 (ζ (Zws , Zwv ) + Z)′ (ζ (Zws ,
(c) H1 , H2 ⇒ ξ (Zws , Zwv , Zwe ) (d) H3 ⇒ Z′ Z ≡ χK2 .
−1
Z Z. ′
Proof. Part (a): Here we show only the limit of H1 . The limit of H2 can be derived by similar fashion and we omit the proof. By Lemma 3(b), we have Y2′ PW Y2 Y2′ PW ε
]
[ ⇒
σε2 + op (1), we obtain 2 σε,w =
ε = Σvv ρv + op (1), and
Y2′ PZ Y2 n
= Op (1). Therefore,
1 ′ Y2 Y2 ( βw − β) n
−1 Part (c): When W exactly identifies β , Nw′ Dw Nw = (Zwv ρv + −1 Zwe )Σww (Zwv ρv + Zwe ). In this case, define
Z=
1
σε
−1/2 Σww (Zwv ρv + Zwe ) ∼ N (0, IK ).
Then, the limit distribution of H1 (and H2 ) is now ξ (Zws , Zwv , Zwe )−1 Z′ Z as required.
1/2
−1/2
= σε2 + (Σ22 Dw−1 Nw − Σ22
Part (a): We proceed as in Part (c) in the proof of Theorem 4. Define
Z=
1
σε
−1/2 Σww (Zwv ρv + Zwe ) ∼ N (0, IK ).
Then, the limit distribution of H3 is then Z′ Z ≡ χK2 .
1/2
Σvv ρv )′ (Σ22 Dw−1 Nw
−1 Σvv ρv )′ − ρv′ Σvv Σ22 Σvv ρv .
Part (b): Using similar argument in the proof of Theorem 4 Part (a), we have
− (Y2′ PW Y2 )]−1 (Y2′ PW Y2 )( βw − βz ) ′ [ ′ Y Y2 PZ Y2 2 PZ Y2 = σε,−z2 ( βw − β − τ + op (1))′ n
Now note that
−
H1 = ( βw − βz )′ σε,−z2 (Y2′ PZ Y2 )[ σε,−z2 (Y2′ PZ Y2 ) −2 −2 − σε,w (Y2′ PW Y2 )]−1 σε,w (Y2′ PW Y2 )( βw − βz ) ′ − 2 ′ − 2 = ( βw − β + op (1)) σε,z (Y2 PZ Y2 )[ σε,z (Y2′ PZ Y2 ) −2 −2 − σε,w (Y2′ PW Y2 )]−1 σε,w (Y2′ PW Y2 )( βw − β + op (1)) ′ [ ′ Y2 PZ Y2 Y 2 PZ Y 2 = ( βw − β + op (1))′ σε,−z2 σε,−z2
n
H3 = σε,−z2 ( βw − βz )′ (Y2′ PZ Y2 )[(Y2′ PZ Y2 )
⇒ σε2 − 2Nw′ Dw−1 Σvv ρv + Nw′ Dw−1 Σ22 Dw−1 Nw −1/2
σe2 (ζ (Zws , Zwv ) + Z)′ (ζ (Zws , Zwv ) + Z). σε2
C.2. Proof of Theorem 1
1 ′ 1 ε ε − 2( βw − β)′ Y2′ ε + ( βw − β)′ n n
− Σ22
= σε−2 (ε ′ PW Y2 )(Y2′ PW Y2 )−1 (Y2′ PW ε) + op (1)
]
Dw . Nw
From this, we can deduce that βw − β ⇒ Dw−1 Nw . Also, by Lemmas 3 (d)–(f), and 2(c), we have βz = β + op (1), σε,2 z = 1 ′ Y n 2
H3 = σε−2 ( βw − β)′ Y2′ PW Y2 ( βw − β) + op (1)
=
Suppose that Lw = K , that is, W exactly identifies β . Then, under the null hypothesis (2),
[
Part (a), we can show that
⇒ σε−2 Nw′ Dw−1 Nw
(ζ (Zws , Zwv ) + Z)′ (ζ (Zws , Zwv ) + Z).
ε
σε,2 z →p σε2 . Using similar arguments to those used in the proof of
n
′
Y2 PW Y2 n
]−1
n
(Y2′ PW Y2 )( βw − β − τ + op (1))
= σε,−z2 ( βw − β − τ + op (1))′ (Y2′ PW Y2 + op (1)) × ( βw − β − τ + op (1)) = σε,−z2 ((ε − Y2 τ )′ PW Y2 )(Y2′ PW Y2 )−1 × (Y2′ PW (ε − Y2 τ )) + op (1) = σε,−z2 (ε − Y2 τ )′ PW (ε − Y2 τ ) + op (1),
298
J. Hahn et al. / Journal of Econometrics 160 (2011) 289–299
where the second line holds since βz
= β + τ + op (1) by
Y ′ PW Y2
Lemma 4(b), the third line holds since 2 n = op (1), and the last line follows since the dimension of W and dimension of Y2 are the same and Y2′ W has full rank. By Lemma 4 (c) and (e) and Condition 1, we can write 1
√ W ′ (ε − Y2 τ ) ⇒ Zws (ρs − Πs τ ) + Zwv (ρv − τ )
H ( σε2 ) ⇒ (κ + Z)′ (κ + Z), where Z ∼ N (0, ILw ) and κ is the same noncentrality parameter as in Theorem 1(b). Proof. Recall the definition
n
+ Zwe − Σww C τ
H ( σε2 ) =
and
N −1 W ′ (y1 − Y2 (y1 − Y2 βz )′ W Ψ βz ) = 2 , say. σε σε 2
−1 W ′ (y1 − Y2 N = (y1 − Y2 βz )′ W Ψ βz ).
+ (ρv − τ )′ Σvv (ρv − τ ) + σe2 2 σ∗∗ , say.
Note that 1
1
n
−1 −1/2 Z = σ∗∗ Σww (Zws (ρs − Πs τ ) + Zwv (ρv − τ ) + Zwe )
n
and
× 1
σ∗∗
1/2 Σww Cτ .
(16)
1 ′ Y2 Z · n
1 n
−1 ZZ
n
1
−1
· Z ′ Y2 n
1 ′ Y2 Z · n
1 ′ ZZ n
−1
1
· Z ′ ε. n
Using Lemmas 2–4, we can write 1 1 βz ) = √ W ′ (ε − Y2 τ ) + op (1) √ W ′ (y1 − Y2
Then,
n
H3 ⇒ (Z+κ) (Z+κ), ′
where Z ∼ N (0, IK ).
1
βz ) = √ W ′ ε − √ W ′ Y2 √ W ′ (y1 − Y2
Define
κ=−
1
We start with the analysis of
σε,2 z →p (ρs − Πs τ )′ Σss (ρs − Πs τ ) =
Lemma 6. Assume Conditions 1 and 2 hold. Under the alternative hypothesis (3),
n
⇒ Zws (ρs − Πs τ ) + Zwv (ρv − τ ) + Zwe − Σww C τ .
Appendix D. Proofs of the results in Section 5
Because
D.1. Some useful lemmas
1 ′ Y2 PZ Y2 = Op (1), n
1 n
′
W Y2 = Op
1
√
n
,
We introduce a few lemmas below that are helpful in proving Theorem 2. Lemmas 5 and 6 assume that the estimator σε2 is consistent under the null even when we adopt the weak IV asymptotics, and find the limit of the test statistic H ( σε2 ) under the null and under the alternative, respectively. In Lemma 7 we provide an estimator σε2 that is consistent under the null even when we adopt the weak IV asymptotics.
we have
Lemma 5. Assume Conditions 1 and 2 hold. Suppose that σε2 is consistent for σε2 under the null using weak instrument asymptotics. Then H ( σε2 ) ⇒ χL2w .
−1 N = (Zws (ρs − Πs τ ) + Zwv (ρv − τ ) + Zwe − Σww C τ )′ Σww
Proof. The result easily follows from the proof of Lemma 6 below by noting that $\rho_s = 0$, $\tau = 0$, and $\varepsilon = e$ under the null.

In Lemma 5, we make the additional assumption that $\hat\sigma_\varepsilon^2$ is consistent for $\sigma_\varepsilon^2$ under the weak instrument asymptotics in order to isolate the properties of the ‘‘numerator’’. This is inspired by the discussion in the previous section, where we saw that $H_2$ failed to converge to a central chi-square distribution (see Theorem 4) because the estimator $\hat\sigma_{\varepsilon,w}^2$ in (5) is inconsistent for $\sigma_\varepsilon^2$. It turns out that the properties of $\hat\sigma_\varepsilon^2$ have implications for the power properties of $H(\hat\sigma_\varepsilon^2)$ under the weak instrument asymptotics. Define
1 n
= Ψ
1 n
W ′W −
1 n
W ′ Y2
1 ′ Y2 PZ Y2 n
−1
1 ′ Y2 W n
= Σww + op (1) under both the null and the alternative. We may therefore write that
× (Zws (ρs − Πs τ ) + Zwv (ρv − τ ) + Zwe − Σww C τ ) + op (1).
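Also, under the alternative, by Lemma 4(e), $\hat\sigma_\varepsilon^2 \to_p \sigma_{**}^2$,
where $\sigma_{**}^2 = (\rho_s - \Pi_s\tau)'\Sigma_{\tilde s\tilde s}(\rho_s - \Pi_s\tau) + (\rho_v - \tau)'\Sigma_{vv}(\rho_v - \tau) + \sigma_e^2$.
Now let $Z = \sigma_{**}^{-1}\Sigma_{ww}^{-1/2}\,(Z_{ws}(\rho_s - \Pi_s\tau) + Z_{wv}(\rho_v - \tau) + Z_{we})$,
and $\kappa = -\sigma_{**}^{-1}\Sigma_{ww}^{1/2} C\tau$.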
σ∗2 = (ρv − τ )′ Σvv (ρv − τ ) + σe2
Then, it is easy to see that Z ∼ N (0, IK ). By writing
and
H ( σε2 ) =
κ(Zws ) =
−1/2 σ∗−1 Σww (Zws (ρs
− Πs τ ) − Σww C τ ),
where
$\tau = \operatorname{plim}(\hat\beta_z - \beta) = (\Pi_s'\Sigma_{ss}\Pi_s)^{-1}\Pi_s'\Sigma_{ss}\rho_s$ denotes the asymptotic bias of $\hat\beta_z$ under the alternative hypothesis.
N
σε2
= (κ + Z)′ (κ + Z) + op (1),
we obtain the desired conclusion.
Lemmas 5 and 6 imply that it is important to choose $\hat\sigma_\varepsilon^2$ such that it is consistent for $\sigma_\varepsilon^2$ under the null, and consistent for $\sigma_*^2$ under the alternative. To see this, suppose that we use $\hat\sigma_{\varepsilon,z}^2$ in (6) for $\hat\sigma_\varepsilon^2$. It can be shown¹⁶ that $\hat\sigma_{\varepsilon,z}^2 \to_p \sigma_\varepsilon^2$ under the null, but $\hat\sigma_{\varepsilon,z}^2 \to_p (\rho_s - \Pi_s\tau)'\Sigma_{ss}(\rho_s - \Pi_s\tau) + \sigma_*^2$ under the alternative. In other words, we have $\sigma_*^2/\sigma_{**}^2 \le 1$ if we use $\hat\sigma_{\varepsilon,z}^2$. This implies that the asymptotic distribution of $H(\hat\sigma_{\varepsilon,z}^2)$ under the alternative is a mixture of $\chi^2$ distributions $(\kappa(Z_{ws}) + Z)'(\kappa(Z_{ws}) + Z)$ multiplied by a constant less than or equal to 1. Therefore, the test $H(\hat\sigma_{\varepsilon,z}^2)$
may be asymptotically biased.¹⁷ Below, we present asymptotic properties of $\tilde\sigma_{\varepsilon,z}^2$ developed in (14). It turns out that if we use $\tilde\sigma_{\varepsilon,z}^2$ as an estimate of $\sigma_\varepsilon^2$, then the ratio $\sigma_*^2/\sigma_{**}^2$ converges to 1 under the alternative:

Lemma 7. Assume Conditions 1 and 2. Under the null, $\tilde\sigma_{\varepsilon,z}^2 \to_p \sigma_\varepsilon^2$, and under the alternative, $\tilde\sigma_{\varepsilon,z}^2 \to_p \sigma_*^2 = (\rho_v - \tau)'\Sigma_{vv}(\rho_v - \tau) + \sigma_e^2$.

Proof. The required results follow by Lemmas 3(g) and 4(d) in Appendix B.

D.2. Proof of Theorem 2

Proof. Part (a) follows by Lemmas 5 and 7 above in Appendix D.1. Part (b) follows by Lemmas 6 and 7 from Appendix D.1.

Appendix E. Computational issue

For convenience to practitioners, we provide below an alternative algorithm to compute $H_4$ in the special case when the weak instruments $W$ exactly identify the coefficient. The algorithm is based on the characterization (15) in Section 5. For a general overidentified case, see our online Appendix.

• Compute the 2SLS estimate $\hat\beta_z$ by using the instruments $Z = [W, S]$. Let $\hat V_z = \hat\sigma_{\varepsilon,z}^2 (Y_2' P_Z Y_2)^{-1}$ denote the standard variance estimator of $\hat\beta_z$, where $\hat\sigma_{\varepsilon,z}^2 = \frac{1}{n}\hat\varepsilon_z'\hat\varepsilon_z$ denotes the standard estimator of $\sigma_\varepsilon^2$.
• Computation of $\tilde\sigma_{\varepsilon,z}^2$:
  1. Using the IV estimator $\hat\beta_z$, get the IV residual $\hat\varepsilon_z = y_1 - Y_2\hat\beta_z$.
  2. Regress the IV residual $\hat\varepsilon_z$ on $Z = [S, W]$, and get the residual $\tilde\varepsilon_z = M_Z\hat\varepsilon_z$.
  3. Calculate $\tilde\sigma_{\varepsilon,z}^2 = \frac{1}{n}\tilde\varepsilon_z'\tilde\varepsilon_z$.
• Let $\tilde V_z = \dfrac{\tilde\sigma_{\varepsilon,z}^2}{\hat\sigma_{\varepsilon,z}^2}\,\hat V_z$.
• Compute the 2SLS estimate $\hat\beta_w$ by using the instruments $W$. Let $\hat V_w = \hat\sigma_{\varepsilon,w}^2 (Y_2' P_W Y_2)^{-1}$ denote the standard variance estimator of $\hat\beta_w$, where $\hat\sigma_{\varepsilon,w}^2 = \frac{1}{n}\hat\varepsilon_w'\hat\varepsilon_w$ denotes the standard estimator of $\sigma_\varepsilon^2$.
• Let $\tilde V_w = \dfrac{\tilde\sigma_{\varepsilon,z}^2}{\hat\sigma_{\varepsilon,w}^2}\,\hat V_w$.
• $H_4$ can now be calculated as $H_4 = (\hat\beta_w - \hat\beta_z)'[\tilde V_w - \tilde V_z]^{-1}(\hat\beta_w - \hat\beta_z)$.

16 See Lemmas 3(f) and 4(e) in Appendix B.
17 Recall that $H(\hat\sigma_{\varepsilon,z}^2) = H_3$ when $L_w = K$. The upshot is that unless $L_w = L_s = K$, $H_4$ will be more powerful than $H_3$; hence, our focus on calculating $H_4$ in the online Appendix.
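The steps of the algorithm in Appendix E can be sketched in Python as follows. This is only an illustrative sketch, not the authors' code: it assumes a single endogenous regressor ($K = 1$), a single weak instrument $W$ (so that $W$ exactly identifies the coefficient) and a list of additional instrument columns $S$. The function names (`solve`, `project`, `h4_statistic`) and the simulated data are hypothetical and serve only to make the example self-contained.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(cols, y):
    """Return P y: fitted values from regressing y on the instrument columns."""
    xtx = [[dot(ci, cj) for cj in cols] for ci in cols]
    xty = [dot(ci, y) for ci in cols]
    coef = solve(xtx, xty)
    return [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(y))]

def h4_statistic(y1, y2, w, s_cols):
    """H4 for scalar Y2 and a single weak instrument w (assumed special case)."""
    n = len(y1)
    z_cols = [w] + s_cols
    pz_y2 = project(z_cols, y2)            # P_Z Y2
    pw_y2 = project([w], y2)               # P_W Y2
    beta_z = dot(pz_y2, y1) / dot(pz_y2, y2)   # 2SLS with Z = [W, S]
    beta_w = dot(pw_y2, y1) / dot(pw_y2, y2)   # 2SLS with W only
    eps_z = [a - beta_z * b for a, b in zip(y1, y2)]      # IV residual
    fitted = project(z_cols, eps_z)
    eps_tilde = [e - f for e, f in zip(eps_z, fitted)]    # M_Z times IV residual
    sigma2_tilde = dot(eps_tilde, eps_tilde) / n
    # Rescaling the standard variance estimators by sigma2_tilde / sigma2_hat
    # leaves sigma2_tilde times the inverse of Y2' P Y2.
    v_z = sigma2_tilde / dot(pz_y2, y2)
    v_w = sigma2_tilde / dot(pw_y2, y2)
    return (beta_w - beta_z) ** 2 / (v_w - v_z)

# Hypothetical simulated data: w is a weak instrument, s a strong one.
rng = random.Random(0)
n = 200
w = [rng.gauss(0, 1) for _ in range(n)]
s = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
y2 = [0.1 * wi + 1.0 * si + vi for wi, si, vi in zip(w, s, v)]
y1 = [0.5 * y2i + 0.5 * vi + rng.gauss(0, 1) for y2i, vi in zip(y2, v)]
h4 = h4_statistic(y1, y2, w, [s])
```

Because $W$ is a subset of $Z$, $Y_2' P_Z Y_2 \ge Y_2' P_W Y_2$, so the rescaled variance difference in the denominator is nonnegative in this scalar case and the sketch returns a nonnegative statistic.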
References Altonji, Joseph G., 1986. Intertemporal substitution in labor supply: evidence from micro data. Journal of Political Economy 94, S176–S215. Andrews, Donald W.K., 2003. End-of-Sample instability tests. Econometrica 71, 1661–1694. Andrews, Donald W.K., Moreira, Marcelo, Stock, James H., 2006. Optimal two-sided invariant similar tests for instrumental variables regression. Econometrica 74, 715–752. Bekker, Paul A., 1994. Alternative approximations to the distributions of instrumental variable estimators. Econometrica 62, 657–681. Donald, Stephen D., Newey, Whitney K., 2001. Choosing the number of instruments. Econometrica 69, 1161–1191. Guggenberger, Patrik, 2008. The impact of a Hausman pretest on the asymptotic size of a hypothesis test, Working Paper, UCLA. Hahn, Jinyong, Ham, John C., Moon, Hyungsik R., 2010. The Hausman test and weak instruments: Online Appendix, available at: www-rcf.usc.edu/~moonr. Hahn, Jinyong, Hausman, Jerry A., 2002. A new specification test for the validity of instrumental variables. Econometrica 70, 163–189. Ham, John C., 1986. Testing whether unemployment represents life-cycle labor supply. Review of Economic Studies 53, 559–578. Ham, John C., Reilly, Kevin T., 2002. Testing intertemporal substitution, implicit contracts, and hours restriction models of the labor market using micro data. American Economic Review 92, 905–927. Hausman, Jerry A., 1978. Specification tests in econometrics. Econometrica 46, 1251–1271. Kleibergen, Frank, 2000. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70, 1781–1803. MaCurdy, Thomas E., 1981. An empirical model of labour supply in a life-cycle setting. Journal of Political Economy 89, 1059–1085. Moreira, Marcelo, 2003. A conditional likelihood ratio test for structural models. Econometrica 71, 1027–1048. Staiger, Douglas, Stock, James H., 1997. IV regression with weak instruments. Econometrica 65, 557–586. 
Strauss, John, Thomas, Duncan, 1995. Human resources: empirical modeling of household and family decisions. In: Srinivasan, T.N., Behrman, Jere (Eds.), Handbook of Development Economics, vol. 3A. North Holland Press, pp. 1883–2024. Willis, Robert, Rosen, Sherwin, 1979. Education and self-selection. Journal of Political Economy 87, S7–S36.
Journal of Econometrics 160 (2011) 300–310
Robust tests for heteroskedasticity in the one-way error components model✩
Gabriel Montes-Rojas a,∗, Walter Sosa-Escudero b
a Department of Economics, City University London, Northampton Square, London EC1V 0HB, UK
b Department of Economics, Universidad de San Andrés, Argentina
Article info
Article history: Received 11 November 2008; Received in revised form 2 June 2010; Accepted 8 September 2010; Available online 17 September 2010
JEL classification: C12; C23
Keywords: Error components; Heteroskedasticity; Testing

Abstract
This paper constructs tests for heteroskedasticity in one-way error components models, in line with Baltagi et al. [Baltagi, B.H., Bresson, G., Pirotte, A., 2006. Joint LM test for homoskedasticity in a one-way error component model. Journal of Econometrics 134, 401–417]. Our tests have two additional robustness properties. First, standard tests for heteroskedasticity in the individual component are shown to be negatively affected by heteroskedasticity in the remainder component. We derive modified tests that are insensitive to heteroskedasticity in the component not being checked, and hence help identify the source of heteroskedasticity. Second, Gaussian-based LM tests are shown to reject too often in the presence of heavy-tailed (e.g. t-Student) distributions. By using a conditional moment framework, we derive distribution-free tests that are robust to non-normalities. Our tests are computationally convenient since they are based on simple artificial regressions after pooled OLS estimation. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

Typical panels in econometrics are largely asymmetric, in the sense that their cross-sectional dimension is much larger than their temporal one. Consequently, most of the concerns that affect cross-sectional models harm panel data models similarly. This is surely the case of heteroskedasticity, a subject that has played a substantial role in the history of econometric research and practice, and still occupies a relevant place in its pedagogical side: all basic texts include a chapter on the subject. As is well known, heteroskedasticity invalidates standard inferential procedures, and usually calls for alternative strategies that either accommodate heterogeneous conditional variances, or are insensitive to them.

The one-way error components model is the most basic extension of simple linear models to handle panel data, and it is widely used in the applied literature. In this model, heteroskedasticity may now be present in either the ‘individual’ error component, in the observation-specific ‘remainder’ error component, or in both simultaneously.
✩ We thank Federico Zincenko for excellent research assistance, Roger Koenker and Anil Bera for useful discussions, Bernard Lejeune for important clarifications and for graciously making his computer routines available, and four anonymous referees, Cheng Hsiao and the associate editor for comments that helped improve this paper considerably. Nevertheless, all errors are our responsibility. ∗ Corresponding author. Tel.: +44 0 20 7040 8919. E-mail address:
[email protected] (G. Montes-Rojas).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.09.010
Consider the case of testing for heteroskedasticity. In the cross-sectional domain, the landmark paper by Breusch and Pagan (1979) derives a widely used, asymptotically valid test in the Lagrange multiplier (LM) maximum-likelihood (ML) framework under normality. Further work by Koenker (1981) proposed a simple ‘studentization’ that avoids the restrictive Gaussian assumption. This is an important result since non-normalities severely affect the performance of the standard LM based test, as clearly documented by Evans (1992) in a comprehensive Monte Carlo study. Wooldridge (1990, 1991) and Dastoor (1997) consider a more general framework allowing for heterokurtosis.

The literature on panel data has only recently produced results analogous to those available for the cross-sectional case.¹ For the one-way error component, Holly and Gardiol (2000) study the case where heteroskedasticity is only present in the individual-specific component, and derive a test statistic that is a direct analogy of the classic Breusch–Pagan test in an LM framework under normality.² Baltagi et al. (2006) allow for heteroskedasticity in both components and derive a test for the joint null of homoskedasticity, again, in the Gaussian LM framework. They also derive ‘marginal’ tests for homoskedasticity in either component, that is, tests that assume that heteroskedasticity is absent in the component not
1 An early contribution on this topic is the seminal paper by Mazodier and Trognon (1978). 2 Recently, Baltagi et al. (2010) extended this test to incorporate serial correlation as well.
G. Montes-Rojas, W. Sosa-Escudero / Journal of Econometrics 160 (2011) 300–310
being checked, of which, naturally, the test by Holly and Gardiol (2000) is a particular case. Both articles propose LM-type tests and, consequently, are based on estimating a null homoskedastic model, which makes them computationally attractive.³ Closer to our work is Lejeune (2006), who proposes a pseudo-maximum-likelihood framework for estimation and inference of a full heteroskedastic model.

This paper derives new tests for homoskedasticity in the error components model that possess two robustness properties. Though the term robust has a long tradition in statistics (Huber, 1981), in this paper it is used to mean being resistant to (1) misspecification of the conditional variance of the remainder term, and (2) departures away from the strict Gaussian framework used in the ML-LM context.

The first robustness property is related to resistance to misspecification of the a priori admissible hypotheses, that is, to ‘type-III errors’ in the terminology of Kimball (1957) (see Welsh, 1996, pp. 119–120, for a discussion of these concepts). The negative effects of this type of misspecification on the performance of LM tests have been studied by Davidson and MacKinnon (1987), Saikkonen (1989) and Bera and Yoon (1993), and are found to occur when the score of the parameter of interest is correlated with that of the nuisance parameter. This type of misspecification affects the Holly and Gardiol (2000) test in the case where the temporal dimension of the panel is fixed. That test assumes that heteroskedasticity is absent in the remainder term and may therefore reject its null spuriously, not because heteroskedasticity is present in the individual component being tested, but because it is present in the other one. This problem can be observed directly in the corresponding non-zero element of the Fisher information matrix presented in Baltagi et al. (2006). As discussed in Section 4, Lejeune’s (2006) tests are similarly affected.
In such cases, it is difficult to identify the presence of heteroskedasticity in the individual component since it is ‘masked’ by the other source. We propose a modified test for heteroskedasticity in the individual component that is immune to the presence of heteroskedasticity in the remainder term, and hence can identify the source of heteroskedasticity. The second robustness property is related to the idea of robustness of validity of Box (1953), that is, tests that achieve an intended asymptotic level for a rather large family of distributions (see Welsh, 1996, ch. 5, for a discussion). In this paper, through an extensive Monte Carlo experiment, non-normalities are shown to severely affect the performance of the tests by Holly and Gardiol (2000) and Baltagi et al. (2006), consistent with the results of Evans (1992) for the cross-sectional case. We derive new tests using a conditional moment framework, and thus, they are distribution free by construction, subject to mild regularity assumptions. In this context, the LM-type tests proposed by Lejeune (2006) are also resistant to non-normalities. We also consider the case of possible heterokurtosis as a simple extension of our framework, along the line of the work by Wooldridge (1990; 1991) and Dastoor (1997). An additional advantage of all our proposed statistics is that of simplicity, since they are based on simple transformations of pooled OLS residuals of a fully homoskedastic model, unlike the case of the tests by Holly and Gardiol (2000) and Baltagi et al. (2006) that require ML estimation. Furthermore, all tests proposed in this paper can be computed based on the R2 coefficients from simple artificial regressions. The paper is organized as follows. Section 2 presents the heteroskedastic error components model and the set of moment conditions used to derive test statistics in Section 3. 
Section 4 presents the results of a detailed Monte Carlo experiment that compares all our statistics and those obtained by Holly and Gardiol
3 Other related contributions include Roy (2002) and Phillips (2003).
(2000), Baltagi et al. (2006) and Lejeune (2006). Section 5 considers an extension of the proposed statistics to handle heterokurtosis. Section 6 concludes and presents suggestions for practitioners and future research.

2. Moment conditions for the one-way heteroskedastic error components model

Baltagi et al. (2006) use a parametric error components model under normality and an ML estimator. In order to highlight differences and similarities, our search for distribution-free tests for heteroskedasticity will be based on a set of appropriate moment conditions. Consider the following regression model with general heteroskedasticity in a one-way error components model:

$$y_{it} = x_{it}'\beta + u_{it}, \qquad u_{it} = \mu_i + \nu_{it}, \qquad i = 1, \dots, N;\ t = 1, \dots, T, \tag{1}$$
where $y_{it}$, $u_{it}$, $\mu_i$ and $\nu_{it}$ are scalars, $x_{it}$ is a $k_\beta$-vector of regressors, and $\beta$ is a $k_\beta$-vector of parameters. As usual, the subscript $i$ refers to individual, and $t$ to temporal observations. We follow the conditional moment framework introduced by Newey (1985), Tauchen (1985) and White (1987), and consider a set of conditioning variables $w_{it}$, containing the not necessarily disjoint elements $x_{it}$, $z_{\mu i}$ and $z_{\nu it}$. Here $z_{\mu i}$ and $z_{\nu it}$ are vectors of regressors of dimensions $k_{\theta_\mu}$ and $k_{\theta_\nu}$, respectively. For notational convenience we also define $w_i = \{w_{i1}, \dots, w_{it}, \dots, w_{iT}\}$ and $x_i = \{x_{i1}, \dots, x_{it}, \dots, x_{iT}\}$.

Throughout the paper we assume that the conditional mean of model (1) is well specified, that is, $E[u_{it}|w_i] = E[u_{it}|x_i] = 0$. In the context of the general framework specified by Wooldridge (1990, p. 18) this implies that the validity of the derived tests actually imposes more than just the hypothesis of interest, by ruling out misspecification in the conditional mean.⁴ Further, we assume that the conditional processes $\mu_i|w_i$ and $\nu_{it}|w_i$ are conditionally uncorrelated, independent across $i$, with $\nu_{it}|w_i$ also uncorrelated across $t$, and with zero conditional mean, conditional variances given by

$$\sigma_{\mu_i}^2 \equiv V[\mu_i|w_i] = \sigma_\mu^2 h_\mu(z_{\mu i}'\theta_\mu) > 0, \qquad i = 1, \dots, N, \tag{2}$$

$$\sigma_{\nu_{it}}^2 \equiv V[\nu_{it}|w_i] = V[\nu_{it}|w_{it}] = \sigma_\nu^2 h_\nu(z_{\nu it}'\theta_\nu) > 0, \qquad i = 1, \dots, N;\ t = 1, \dots, T, \tag{3}$$

and finite fourth moments. $h_\mu(.)$ and $h_\nu(.)$ are twice continuously differentiable functions satisfying $h_\mu(.) > 0$, $h_\nu(.) > 0$, $h_\mu(0) = 1$, $h_\nu(0) = 1$, $h_\mu^{(1)}(0) \neq 0$ and $h_\nu^{(1)}(0) \neq 0$, where $h^{(j)}$ denotes their $j$th derivatives.

In this setup, $\theta_\mu$ and $\theta_\nu$ will be the parameters of interest. A test for heteroskedasticity in the individual-specific component is based on the null hypothesis $H_0^{\sigma_\mu^2}: \theta_\mu = 0$; and a test for heteroskedasticity in the remainder error term is based on $H_0^{\sigma_\nu^2}: \theta_\nu = 0$. Testing for the validity of the full homoskedastic model implies a joint test with null hypothesis $H_0^{\sigma_\mu^2,\sigma_\nu^2}: \theta_\nu = \theta_\mu = 0$. Because, in general, the nature of the heteroskedasticity is unknown, $z_\mu$ and $z_\nu$ may be similar, when not identical, hence we cannot rely on them to distinguish among different types of heteroskedasticity.

Let $\bar u_i \equiv T^{-1}\sum_{t=1}^T u_{it}$ be the between residuals and $\tilde u_{it} \equiv u_{it} - \bar u_i$ the within residuals. Different moment conditions on these errors provide alternative ways of testing for both sources of heteroskedasticity.
4 Before testing for heteroskedasticity, it would be necessary first to check that the conditional mean is correctly specified. Lejeune (2006) provides robust tests for that purpose.
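As a small illustration of the between and within residuals just defined, the following Python sketch computes $\bar u_i$ and $\tilde u_{it}$ for a balanced panel stored as one list per individual. The function name `between_within` and the data layout are assumptions made for this example only.

```python
def between_within(u):
    """Split a balanced panel u[i][t] into between and within residuals."""
    # Between residual: time average for each individual i.
    u_bar = [sum(row) / len(row) for row in u]
    # Within residual: deviation of each observation from its individual mean.
    u_tilde = [[x - m for x in row] for row, m in zip(u, u_bar)]
    return u_bar, u_tilde

# Toy panel with N = 2 individuals and T = 2 periods (hypothetical numbers).
u = [[1.0, 3.0], [2.0, 2.0]]
u_bar, u_tilde = between_within(u)
```

By construction the within residuals sum to zero for each individual, which is why the individual effect $\mu_i$ drops out of any moment condition built on $\tilde u_{it}$.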
The squared between residual provides moment conditions for testing $H_0^{\sigma_\mu^2}$:

$$E[\bar u_i^2|w_i] = \sigma_\mu^2 h_\mu(z_{\mu i}'\theta_\mu) + T^{-2}\sigma_\nu^2 \sum_{t=1}^{T} h_\nu(z_{\nu it}'\theta_\nu). \tag{4}$$

If $H_0^{\sigma_\nu^2}$ is true, that is, if there is no heteroskedasticity in the remainder component, it simplifies to

$$E[\bar u_i^2|w_i] = \sigma_\mu^2 h_\mu(z_{\mu i}'\theta_\mu) + T^{-1}\sigma_\nu^2. \tag{5}$$

Moreover, if $H_0^{\sigma_\nu^2}$ does not hold, but $N \to \infty$ and $T \to \infty$, the presence of heteroskedasticity in the remainder component has no effect on a test for homoskedasticity in the individual component based on (5). In this case a test for $H_0^{\sigma_\mu^2}$ is said to be robust to the validity of $H_0^{\sigma_\nu^2}$. Second, if $N \to \infty$ and $T$ is fixed, but $H_0^{\sigma_\nu^2}$ is true, the moment condition in (5) holds. A test for these cases can be based on $N$ times the centered $R^2$ of an auxiliary regression of $\bar u^2$ on $z_\mu$ and a constant, as shown in the next section.
However, if $N \to \infty$, $T$ is fixed and $H_0^{\sigma_\nu^2}$ does not hold, tests based on (4) may lead to spurious rejections because of the presence of heteroskedasticity in the remainder component. For this case, define

$$\tilde{\bar u}_i^2 = \bar u_i^2 - T^{-2}\sum_{t=1}^{T}\tilde u_{it}^2 - T^{-3}\sum_{t=1}^{T}\tilde u_{it}^2 - T^{-4}\sum_{t=1}^{T}\tilde u_{it}^2 - \cdots = \bar u_i^2 - \frac{T^{-2}}{1 - T^{-1}}\sum_{t=1}^{T}\tilde u_{it}^2,$$

and note that

$$E[\tilde{\bar u}_i^2|w_i] = E\left[\bar u_i^2 - \frac{T^{-2}}{1 - T^{-1}}\sum_{t=1}^{T}\tilde u_{it}^2 \,\Big|\, w_i\right] = \sigma_\mu^2 h_\mu(z_{\mu i}'\theta_\mu). \tag{6}$$

Unlike (4), this moment condition does not involve parameters related to heteroskedasticity in the remainder component, and, hence, it will be used in Section 3.2 to construct tests for heteroskedasticity in the individual component in short panels that are robust to the presence of heteroskedasticity in the remainder component.

Consider now the moment condition based on the squared within residual:

$$E[\tilde u_{it}^2|w_i] = \sigma_\nu^2(1 - 2T^{-1} + T^{-2})h_\nu(z_{\nu it}'\theta_\nu) + T^{-2}\sigma_\nu^2\sum_{j \neq t} h_\nu(z_{\nu ij}'\theta_\nu). \tag{7}$$

This condition can be used to construct tests for $H_0^{\sigma_\nu^2}$. Note that $\sigma_\mu^2$ and $\theta_\mu$ do not appear anywhere in (7), which means that a test based on this moment condition will be robust to the presence of heteroskedasticity in the individual error component, i.e. when $\theta_\mu \neq 0$. A test for heteroskedasticity in the remainder component will be based on $NT \times R^2$, where $R^2$ is the centered coefficient of determination of an auxiliary regression of $\tilde u^2$ on $z_\nu$ and a constant (see Section 3.3). Note, there may be differences between short and long panels because $E[\tilde u_{it}^2|w_i] = \sigma_\nu^2(h_\nu(z_{\nu it}'\theta_\nu) + O(T^{-1}))$. This is explored in Section 3.4.

3. Robust tests for heteroskedasticity

Our tests will be based on the moment conditions considered in the previous section, following Koenker’s (1981) studentization procedure. We use the asymptotic framework of Dastoor (1997) adapted to the one-way error components model structure described above.

Assumption 1. For each $i = 1, \dots, N$ and $t = 1, \dots, T$, $E[w_{j,it}w_{j,it}']$ is a finite positive definite matrix, where $w_{j,it}$ is a column vector containing the distinct elements of $w$ and 1. Moreover, $E[|w_{j,it}|^{2+\epsilon}]$, $E[|w_{j,it}\mu_i^2|^{2+\epsilon}]$ and $E[|w_{j,it}\nu_{it}^2|^{2+\epsilon}]$ are uniformly bounded for some $\epsilon > 0$.

Dastoor’s framework includes Wooldridge’s (1990; 1991) setup for heterokurtosis, that is, the case where the error term is allowed to have different conditional fourth moments. In our case, this would involve allowing that both $E[(\mu_i^2 - \sigma_\mu^2 h_\mu(z_{\mu i}'\theta_\mu))^2|w_i]$ and $E[(\nu_{it}^2 - \sigma_\nu^2 h_\nu(z_{\nu it}'\theta_\nu))^2|w_i]$ are not constants. In this section we derive tests assuming homokurtosis, since it provides an intuitive framework to motivate the statistics. The heterokurtic case and a related Monte Carlo exploration are treated as an extension in Section 5.

Assumption 2. For each $i = 1, \dots, N$, and $t = 1, \dots, T$, $E[(\mu_i^2 - \sigma_\mu^2 h_\mu(z_{\mu i}'\theta_\mu))^2|w_i] = G_\mu < \infty$ and $E[(\nu_{it}^2 - \sigma_\nu^2 h_\nu(z_{\nu it}'\theta_\nu))^2|w_i] = G_\nu < \infty$.

The test statistics will be based on transformations of the OLS residuals $\hat u_{it} \equiv y_{it} - x_{it}'\hat\beta$, where $\hat\beta$ is the OLS estimator of regression model (1).

3.1. Test for $H_0^{\sigma_\mu^2}$. Cases $N, T \to \infty$ and $N \to \infty$, $T$ finite and $\theta_\nu = 0$

For these two cases, a test for $H_0^{\sigma_\mu^2}$ will be based on $\hat{\bar\eta}_i = \hat{\bar u}_i^2$, where $\hat{\bar u}_i \equiv T^{-1}\sum_{t=1}^{T}\hat u_{it}$. Define $\hat{\bar\eta}$, a $N$-vector containing the sample squared between residuals, $Z_\mu$, a $N \times k_{\theta_\mu}$ matrix with the sample matrix of covariates for testing this hypothesis, and $M_N \equiv I_N - \bar J_N$, where $\bar J_N = \iota_N\iota_N'/N$ and $\iota_N$ is a $(N \times 1)$ vector of ones. Consider a sequence of alternatives à la Pitman such that $\theta_\mu = \delta_\mu/\sqrt{N}$ and $0 \le \|\delta_\mu\| < \infty$, where $\|.\|$ is the Euclidean norm. The following Theorem derives a valid test statistic for $H_0^{\sigma_\mu^2}$ for the two cases being considered.

Theorem 1. Let $\phi_\mu = Var[\bar u_i^2|w_i]$, $D_\mu = \lim_{N \to \infty} E[\frac{1}{N}Z_\mu' M_N Z_\mu]$ and $\lambda_\mu = \dfrac{\sigma_\mu^4 h_\mu^{(1)}(0)^2}{\phi_\mu}\,\delta_\mu' D_\mu \delta_\mu$. Then, under Assumptions 1 and 2, as $N, T \to \infty$ or $N \to \infty$, $T$ fixed and $H_0^{\sigma_\nu^2}$, and under $H_A^{\sigma_\mu^2}: \theta_\mu = \delta_\mu/\sqrt{N}$,

$$m_\mu \equiv N \times (\hat{\bar\eta}' M_N \hat{\bar\eta})^{-1}\,\hat{\bar\eta}' M_N Z_\mu (Z_\mu' M_N Z_\mu)^{-1} Z_\mu' M_N \hat{\bar\eta} \xrightarrow{d} \chi^2_{k_{\theta_\mu}}(\lambda_\mu). \tag{8}$$

Proof. Note that the sequence of random variables $\{\bar u_i^2\}$ is independent. Moreover, by taking a Taylor series expansion of the function $h_\mu(.)$ and Assumption 1, $\frac{1}{\sqrt{N}}Z_\mu' M_N \bar\eta = \sigma_\mu^2 h_\mu^{(1)}(0)\,\delta_\mu' D_\mu + o_p(1)$ and $\lim_{N \to \infty} Var[\frac{1}{\sqrt{N}}Z_\mu' M_N \bar\eta] = \phi_\mu D_\mu$, where $\bar\eta = \{\bar u_1^2, \dots, \bar u_N^2\}$. Also note that $\phi_\mu = \frac{1}{N}\bar\eta' M_N \bar\eta + o_p(1)$. Now we apply Theorem 1 in Dastoor (1997) for our sequence of squared OLS between residuals on $i = 1, \dots, N$, which under Assumption 2 (homokurtosis) gives the desired result.

Note that if $\mu$ is Gaussian, $\phi_\mu = 2 \times (\sigma_\mu^2 + T^{-1}\sigma_\nu^2)^2$, and then the Koenker-type test reduces to the Holly and Gardiol (2000) marginal test, which is similar to the Breusch and Pagan (1979) test where the between OLS residuals are used instead of the untransformed OLS residuals.
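To make the statistic in (8) concrete, the sketch below evaluates it for the special case of a single covariate $z_\mu$ (so $k_{\theta_\mu} = 1$), where the quadratic form reduces to sums of centered cross-products, and compares it with $N$ times the centered $R^2$ of a regression of the squared between residuals on $z_\mu$ and a constant. This is an illustrative sketch only: the function names are hypothetical, and the simulated errors stand in for the pooled OLS residuals $\hat u_{it}$.

```python
import random

def center(v):
    """Apply M_N to a vector: subtract the sample mean."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_between(u_hat):
    """eta_i = (T^{-1} sum_t u_hat[i][t])^2 for each individual i."""
    return [(sum(row) / len(row)) ** 2 for row in u_hat]

def m_mu_quadratic_form(u_hat, z_mu):
    """Statistic (8) when Z_mu has a single column."""
    n = len(u_hat)
    eta_c = center(squared_between(u_hat))
    z_c = center(z_mu)
    return n * dot(eta_c, z_c) ** 2 / (dot(eta_c, eta_c) * dot(z_c, z_c))

def m_mu_aux_regression(u_hat, z_mu):
    """N times the centered R^2 from regressing eta on z_mu and a constant."""
    n = len(u_hat)
    eta_c = center(squared_between(u_hat))
    z_c = center(z_mu)
    slope = dot(eta_c, z_c) / dot(z_c, z_c)
    ssr = sum((e - slope * z) ** 2 for e, z in zip(eta_c, z_c))
    return n * (1.0 - ssr / dot(eta_c, eta_c))

# Hypothetical data: the individual effect has a standard deviation rising in z.
rng = random.Random(1)
N, T = 200, 5
z_mu = [rng.uniform(0, 2) for _ in range(N)]
u_hat = []
for zi in z_mu:
    mu_i = rng.gauss(0, 1) * (1 + zi)
    u_hat.append([mu_i + rng.gauss(0, 1) for _ in range(T)])

stat_qf = m_mu_quadratic_form(u_hat, z_mu)
stat_r2 = m_mu_aux_regression(u_hat, z_mu)
```

The two functions return the same number up to rounding, which is the practical appeal of the approach: the statistic can be obtained from any routine that reports a centered $R^2$.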
Consider now the auxiliary regression model (see Davidson and MacKinnon, 1990, on the use of artificial regressions)

$$\hat{\bar u}_i^2 = \alpha + z_{\mu i}'\gamma + \text{residual}. \tag{9}$$

Note that $m_\mu$ is $N \times R_\mu^2$ where $R_\mu^2$ is the centered coefficient of determination of this regression model, i.e. an auxiliary regression of $\hat{\bar\eta}$ on $z_\mu$ and a constant (see Koenker, 1981, p. 111).

3.2. Test for $H_0^{\sigma_\mu^2}$. Case $N \to \infty$, $T$ finite and $\theta_\nu \neq 0$

A test for the individual component in short panels with potential heteroskedasticity in the remainder component requires the use of condition (6). A test for $H_0^{\sigma_\mu^2}$ will be based on $\hat{\tilde{\bar\eta}}_i = \hat{\tilde{\bar u}}_i^2$, where $\hat{\tilde{\bar u}}_i^2 = \hat{\bar u}_i^2 - \dfrac{T^{-2}}{1 - T^{-1}}\sum_{t=1}^{T}\hat{\tilde u}_{it}^2$ and $\hat{\tilde u}_{it} \equiv \hat u_{it} - \hat{\bar u}_i$. Define $\hat{\tilde{\bar\eta}}$, a $N$-vector containing the transformed sample residuals.

Theorem 2. Let $\phi_\mu^* = \lim_{N \to \infty} Var[\tilde{\bar u}_i^2|w_i]$ and $\lambda_\mu^* = \dfrac{\sigma_\mu^4 h_\mu^{(1)}(0)^2}{\phi_\mu^*}\,\delta_\mu' D_\mu \delta_\mu$. Then, under Assumptions 1 and 2, as $N \to \infty$ and under $H_A^{\sigma_\mu^2}: \theta_\mu = \delta_\mu/\sqrt{N}$,

$$m_\mu^* \equiv N \times (\hat{\tilde{\bar\eta}}' M_N \hat{\tilde{\bar\eta}})^{-1}\,\hat{\tilde{\bar\eta}}' M_N Z_\mu (Z_\mu' M_N Z_\mu)^{-1} Z_\mu' M_N \hat{\tilde{\bar\eta}} \xrightarrow{d} \chi^2_{k_{\theta_\mu}}(\lambda_\mu^*). \tag{10}$$

Proof. Similar to that in Theorem 1.

Consider the auxiliary regression model

$$\hat{\tilde{\bar u}}_i^2 = \alpha + z_{\mu i}'\gamma + \text{residual}. \tag{11}$$

Using a similar argument as before, $m_\mu^* = N \times R_\mu^{2*}$ where $R_\mu^{2*}$ is the centered coefficient of determination of the regression model. Note that the auxiliary regression model (11) covers that in model (9), and therefore, the case analyzed here is a generalization of the former.

3.3. Test for $H_0^{\sigma_\nu^2}$. $N, T \to \infty$

Consider a test for homoskedasticity in the remainder component in long panels with $N, T \to \infty$. Define $\hat{\tilde\eta}_{it} = \hat{\tilde u}_{it}^2$, where $\hat{\tilde u}_{it} \equiv \hat u_{it} - \hat{\bar u}_i$, $\hat{\tilde\eta}$, a $NT$-vector containing the sample within residuals squared, $Z_\nu$, a $NT \times k_{\theta_\nu}$ matrix with the sample matrix of covariates for testing this hypothesis, and $M_{NT} = I_{NT} - (\bar J_N \otimes \bar J_T)$, where $\bar J_T = \iota_T\iota_T'/T$, $\otimes$ is the Kronecker product, and $\iota_T$ is a $(T \times 1)$ vector of ones. Consider a sequence of local alternatives (Pitman drift) such that $\theta_\nu = \delta_\nu/\sqrt{NT}$ and $0 \le \|\delta_\nu\| < \infty$. The following Theorem derives an asymptotically valid test for this hypothesis.

Theorem 3. Let $\phi_\nu = \lim_{N,T \to \infty} Var[\tilde u_{it}^2|w_i] = G_\nu$, $D_\nu = \lim_{N,T \to \infty} E[\frac{1}{NT}Z_\nu' M_{NT} Z_\nu]$ and $\lambda_\nu = \dfrac{\sigma_\nu^4 h_\nu^{(1)}(0)^2}{\phi_\nu}\,\delta_\nu' D_\nu \delta_\nu$. Then, under Assumptions 1 and 2, as $N, T \to \infty$ and under $H_A^{\sigma_\nu^2}: \theta_\nu = \delta_\nu/\sqrt{NT}$,

$$m_\nu \equiv NT \times (\hat{\tilde\eta}' M_{NT} \hat{\tilde\eta})^{-1}\,\hat{\tilde\eta}' M_{NT} Z_\nu (Z_\nu' M_{NT} Z_\nu)^{-1} Z_\nu' M_{NT} \hat{\tilde\eta} \xrightarrow{d} \chi^2_{k_{\theta_\nu}}(\lambda_\nu). \tag{12}$$

Proof. Note that the sequence of random variables $\{\tilde u_{it}^2\}$ is asymptotically independent as $T \to \infty$, because $Cov[\tilde u_{it}^2, \tilde u_{kh}^2|w_i, w_k] = 0$, $i \neq k$ and $Cov[\tilde u_{it}^2, \tilde u_{ih}^2|w_i] = O(T^{-2})$, $t \neq h$. Then follow the proof of Theorem 1 for our sequence on $i = 1, \dots, N$ and $t = 1, \dots, T$, which under Assumption 2 (homokurtosis) gives the desired result.

Note that if $\nu_{it}$ is Gaussian, $\phi_\nu = 2 \times \sigma_\nu^4$, so this Koenker-type test is the same as the Breusch–Pagan style test where the within OLS residual is used instead of the untransformed OLS residual. Consider now the auxiliary regression model

$$\hat{\tilde u}_{it}^2 = \alpha + z_{\nu it}'\gamma + \text{residual}. \tag{13}$$

Again, $m_\nu = NT \times R_\nu^2$, where $R_\nu^2$ is the centered coefficient of determination of the regression model.⁵

3.4. Test for $H_0^{\sigma_\nu^2}$. $N \to \infty$ and $T$ finite

Consider now the case where $N \to \infty$ and $T$ is finite. For this case, consider a Taylor expansion of Eq. (7) where $\theta_\nu$ is expanded about 0,

$$E[\tilde u_{it}^2|w_i] = \sigma_\nu^2 + \sigma_\nu^2\Big((1 - 2T^{-1})h_\nu^{(1)}(0)\,z_{\nu it}'\theta_\nu + T^{-2}\sum_{j=1}^{T} h_\nu^{(1)}(0)\,z_{\nu ij}'\theta_\nu\Big) + o(\|\theta_\nu^*\|)$$
$$= \sigma_\nu^2 + \sigma_\nu^2\big((1 - 2T^{-1})h_\nu^{(1)}(0)\,z_{\nu it}'\theta_\nu + T^{-1}h_\nu^{(1)}(0)\,\bar z_{\nu i}'\theta_\nu\big) + o(\|\theta_\nu^*\|)$$

where $\bar z_{\nu i} = T^{-1}\sum_{t=1}^{T} z_{\nu it}$, $i = 1, \dots, N$ and $\theta_\nu^*$ is between $\theta_\nu$ and 0. Moreover, note that $Cov[\tilde u_{it}^2, \tilde u_{ih}^2|w_i] = c = O(T^{-2})$, then, for $T$ finite, additional covariance terms need to be taken into consideration. Define $\lim_{N \to \infty} Var[\frac{1}{\sqrt{NT}}\tilde Z_\nu' M_{NT}\tilde\eta] = \Omega_\nu$, where $\tilde Z_\nu$ is a $NT \times k_{\theta_\nu}$ matrix with the sample matrix of covariates with typical element $\{(1 - 2T^{-1})z_{\nu it} + T^{-1}\bar z_{\nu i}\}$, $\tilde\eta$ is the vector of within residuals $\{\tilde u_{it}\}$, and let $\hat\Phi_\nu$ be a consistent estimate of the variance–covariance matrix of $\tilde\eta$.

Theorem 4. Let $\lambda_\nu = \sigma_\nu^4 h_\nu^{(1)}(0)^2\,\delta_\nu'\tilde D_\nu\Omega_\nu^{-1}\tilde D_\nu\delta_\nu$ where $\tilde D_\nu = \lim_{N \to \infty} E[\frac{1}{NT}\tilde Z_\nu' M_{NT}\tilde Z_\nu]$. Then, under Assumptions 1 and 2, as $N \to \infty$, $T$ fixed and under $H_A^{\sigma_\nu^2}: \theta_\nu = \delta_\nu/\sqrt{NT}$,

$$m_\nu^* \equiv NT \times \hat{\tilde\eta}' M_{NT}\tilde Z_\nu(\tilde Z_\nu' M_{NT}\tilde Z_\nu)(\tilde Z_\nu' M_{NT}\hat\Phi_\nu M_{NT}\tilde Z_\nu)^{-1} \times (\tilde Z_\nu' M_{NT}\tilde Z_\nu)\tilde Z_\nu' M_{NT}\hat{\tilde\eta} \xrightarrow{d} \chi^2_{k_{\theta_\nu}}(\lambda_\nu).$$

Proof. The proof follows from Theorem 3 and Dastoor’s (1997) Theorem 1.

A convenient way to implement this test is based on the auxiliary regression model

$$\hat{\tilde u}_{it}^2 = \alpha + \tilde z_{\nu it}'\gamma + \text{residual}, \tag{14}$$

and note that $NT \times R_\nu^{2*} = m_\nu^* + o(T^{-(2+\epsilon)})$ for any $\epsilon > 0$, where $R_\nu^{2*}$ is the centered coefficient of determination of this regression model.⁶

5 As noted by an anonymous referee, a significant limitation of this test is the assumption that $\nu_{it}|w_i$ is not serially correlated, and it should not be very difficult to construct a modified test that does not rely on this assumption (see for instance the next subsection, where additional covariance terms are considered).
6 The Monte Carlo experiments of the next section are carried out with $T \ge 5$, and we find no significant discrepancies between the results obtained from model (14) and those carried out based on the statistic in Theorem 4, where the within individuals covariance terms $c$ in $\hat\Phi_\nu$ are estimated as $\frac{1}{NT(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{h \neq t}\hat{\tilde u}_{it}^2\hat{\tilde u}_{ih}^2$.
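The two finite-$T$ corrections of Sections 3.2 and 3.4 are simple per-individual transformations, which the following Python sketch illustrates: the transformed squared between residual used as the dependent variable in (11), and the transformed covariate $(1 - 2T^{-1})z_{\nu it} + T^{-1}\bar z_{\nu i}$ used as the regressor in (14). The function names are hypothetical, a scalar covariate is assumed, and $T \ge 2$ is required.

```python
def robust_between_square(u_hat_i):
    """Transformed squared between residual for one individual (T >= 2).

    Computes u_bar^2 - T^{-2} / (1 - T^{-1}) * sum_t (u_hat[t] - u_bar)^2.
    """
    t = len(u_hat_i)
    u_bar = sum(u_hat_i) / t
    within_ss = sum((x - u_bar) ** 2 for x in u_hat_i)
    return u_bar ** 2 - (t ** -2 / (1 - t ** -1)) * within_ss

def finite_t_regressor(z_i):
    """Transformed covariate (1 - 2/T) z[t] + z_bar / T for one individual."""
    t = len(z_i)
    z_bar = sum(z_i) / t
    return [(1 - 2 / t) * z + z_bar / t for z in z_i]

# Toy example with T = 2 (hypothetical numbers).
value = robust_between_square([1.0, 3.0])   # 2^2 - 0.5 * 2 = 3
z_tilde = finite_t_regressor([1.0, 3.0])    # (1 - 1) * z + 2 / 2 = 1 for both periods
```

With these two helpers, the statistics $m_\mu^*$ and $m_\nu^*$ can be approximated by $N \times R^2$ and $NT \times R^2$ from ordinary regressions on the transformed quantities, in the same way as the uncorrected versions.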
304
G. Montes-Rojas, W. Sosa-Escudero / Journal of Econometrics 160 (2011) 300–310
3.5. Test for $H_0^{\sigma_\mu^2, \sigma_\nu^2}$

Following Baltagi et al. (2006) we construct a joint test based on the sum of the individual tests,

$$m_{\mu,\nu} = m_\mu + m_\nu. \qquad (15)$$

With N and T tending to infinity, the joint test is trivially derived by exploiting the two orthogonal moment conditions (5) and (7) and hence a valid test is based on the sum of the marginal tests for each source of heteroskedasticity, which involve the sum of independent chi-squared random variables, and therefore, we have that $m_{\mu,\nu} \xrightarrow{d} \chi^2_{k_{\theta_\mu} + k_{\theta_\nu}}$. Note that the joint test by Baltagi et al. (2006) also reduces to the sum of two marginal tests when $T \to \infty$. A preliminary analysis of the Monte Carlo experiments showed that with T small, $m_{\mu,\nu}$ behaves similarly to the large T case, and therefore, we find that it is not necessary to make a small panel correction.

4. Monte Carlo experiments

In order to explore the robustness properties of the proposed tests in small samples, the design of our Monte Carlo experiment will initially follow very closely that of Baltagi et al. (2006), to which we refer for further details on the experimental design, and will be modified accordingly to highlight some specific features of our tests. The baseline model is:

$$y_{it} = \beta_0 + \beta_1 x_{it} + \mu_i + \nu_{it}, \quad i = 1, \ldots, N;\ t = 1, \ldots, T, \qquad (16)$$

where $x_{it} = w_{i,t} + 0.5\, w_{i,t-1}$ and $w_{i,t} \sim$ iid $U(0, 2)$. The parameters $\beta_0$ and $\beta_1$ are assigned values 5 and 0.5, respectively. For each $x_i$, we generate T + 10 observations and drop the first 10 observations in order to reduce the dependency on initial values. The experiment considers three cases, corresponding to different sources of heteroskedasticity. In all of them, the total variance is set to $\bar{\sigma}_\mu^2 + \bar{\sigma}_\nu^2 = 8$, where $\bar{\sigma}_\mu^2 = E(\sigma_{\mu_i}^2)$ and $\bar{\sigma}_\nu^2 = E(\sigma_{\nu_{it}}^2)$. For all DGPs, $\nu_{it}$ has zero mean and variance $\sigma_{\nu_{it}}^2$, while $\mu_i$ has zero mean and variance $\sigma_{\mu_i}^2$. For each case we consider exponential heteroskedasticity, $h(z'\theta) = \exp(z'\theta)$.7 The following heteroskedastic models are considered:

Heteroskedasticity in the remainder component (case a): $\sigma_{\nu_{it}}^2 = \sigma_\nu^2 h_\nu(\theta_\nu x_{it})$, $\sigma_{\mu_i}^2 = \sigma_\mu^2$, $\theta_\nu \in \{0, 1, 2, 3\}$, and $\theta_\mu = 0$.

Heteroskedasticity in the remainder component (case b): $\sigma_{\nu_{it}}^2 = \sigma_\nu^2 h_\nu(\theta_\nu \bar{x}_i)$, $\bar{x}_i = T^{-1} \sum_{t=1}^{T} x_{it}$, $\sigma_{\mu_i}^2 = \sigma_\mu^2$, $\theta_\nu \in \{0, 1, 2, 3\}$, and $\theta_\mu = 0$.

Heteroskedasticity in the individual component: $\sigma_{\mu_i}^2 = \sigma_\mu^2 h_\mu(\theta_\mu \bar{x}_i)$, $\bar{x}_i = T^{-1} \sum_{t=1}^{T} x_{it}$, $\sigma_{\nu_{it}}^2 = \sigma_\nu^2$, $\theta_\mu \in \{0, 1, 2, 3\}$, and $\theta_\nu = 0$.

For each replication we have computed the test statistics proposed in this paper, those based on Lejeune's (2006) framework (based on pooled OLS residuals), and those of Baltagi et al. (2006) and Holly and Gardiol (2000), using residuals after ML estimation. In particular, the statistics considered and their corresponding null hypotheses are:

• $m_\mu$. $H_0^{\sigma_\mu^2}: \theta_\mu = 0$. The statistic is N-times the $R^2$ from the pooled OLS regression of $\hat{\bar{u}}_i^{2}$ on $\bar{x}_i$ and a constant (see Section 3.1, Eq. (8)).
• $m_\mu^{*}$. $H_0^{\sigma_\mu^2}: \theta_\mu = 0$. This test statistic is robust to the validity of $H_0^{\sigma_\nu^2}$ in short panels, and is N-times the $R^2$ from the pooled OLS regression of $\tilde{\bar{\hat{u}}}_{it}^{2}$ on $x_{it}$ and a constant (see Section 3.2, Eq. (10)).
• $HG_\mu$. $H_0^{\sigma_\mu^2}: \theta_\mu = 0$. Holly and Gardiol's (2000) 'marginal' test for no heteroskedasticity in the individual component.
• $L_\mu$. $H_0: \theta_\mu = \theta_\nu = 0$. Lejeune's (2006) 'marginal' test for no heteroskedasticity in the individual component.
• $m_\nu$. $H_0^{\sigma_\nu^2}: \theta_\nu = 0$. The statistic is NT-times the $R^2$ from the pooled OLS regression of $\hat{\tilde{u}}_{it}^{2}$ on $x_{it}$ and a constant (see Section 3.3, Eq. (12)).
• $m_\nu^{*}$. $H_0^{\sigma_\nu^2}: \theta_\nu = 0$. This is a finite T corrected version of the previous statistic, and is NT-times the $R^2$ from the pooled OLS regression of $\hat{\tilde{u}}_{it}^{2}$ on $\tilde{x}_{it}^{*}$ and a constant, with $\tilde{x}_{it}^{*} = (1 - 2T^{-1}) x_{it} + T^{-1} \bar{x}_i$ (see Section 3.4, Eq. (14)).
• $BBP_\nu$. $H_0^{\sigma_\nu^2}: \theta_\nu = 0$. This is the marginal test for the null of no heteroskedasticity in the remainder component in Baltagi et al. (2006), for the case where heteroskedasticity varies with i and t; see their Section 3.2, Eq. (10).
• $BBP_{\nu'}$. $H_0^{\sigma_\nu^2}: \theta_\nu = 0$. In this case, it is assumed that the variance of $\nu_{it}$ varies only with $i = 1, \ldots, N$. See Baltagi et al. (2006), Section 3.2, Eq. (11).
• $L_\nu$. $H_0: \theta_\mu = \theta_\nu = 0$. Lejeune's (2006) 'marginal' test for no heteroskedasticity in the remainder component.
• $m_{\mu,\nu}$. $H_0: \theta_\mu = \theta_\nu = 0$. This is the proposed statistic for the joint null of homoskedasticity in both components, and is the sum of $m_\mu$ and $m_\nu$ (see Section 3.5, Eq. (15)).
• $BBP_{\mu,\nu}$. $H_0: \theta_\mu = \theta_\nu = 0$. This is Baltagi et al.'s (2006) test for the joint null; see their Section 3.2, Eq. (13).
• $L_{\mu,\nu}$. $H_0: \theta_\mu = \theta_\nu = 0$. This is Lejeune's (2006) test for the joint null.

We have performed 5000 replications for each case, and the proportion of rejections was obtained based on a 5% nominal level. The main goals of the experiment are to quantify (1) the effects of misspecified heteroskedasticity on new and existing tests, (2) the effects of departures away from Gaussianity, (3) the 'cost of robustification', that is, the potential power losses due to using robust tests when the 'ideal' conditions (normality and correct specification) used to derive the ML-LM based tests hold, and hence a robustification is not necessary. In order to isolate each problem, in the first subsection we will focus on robustness to misspecification, and in the second one on robustness of validity, measuring robustification costs for each case.
7 Simulations were also run for quadratic heteroskedasticity, h(z ′ θ) = (1 + z ′ θ)2 , and the results are similar for size and power to those of exponential heteroskedasticity. Following the referees’ suggestions we omit these results but they are available from the authors upon request.
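To make the experimental design concrete, the sketch below draws one panel from the baseline model (16) with exponential heteroskedasticity in the remainder component (case a) and Gaussian errors. It is only an illustration: the function name `simulate_panel` and its arguments are hypothetical, and the rescaling that makes the sample average of $\sigma_{\nu_{it}}^2$ equal to the target $\bar{\sigma}_\nu^2$ is an assumption of this sketch, since the exact normalization is deferred to Baltagi et al. (2006).

```python
import math
import random


def simulate_panel(n, t, theta_nu, sigma2_mu=2.0, sigma2_nu_bar=6.0,
                   beta0=5.0, beta1=0.5, burn=10, rng=None):
    """One draw of (y, x) from model (16), case a, with Gaussian errors."""
    rng = rng or random.Random(0)
    x = []
    for _ in range(n):
        # w ~ iid U(0, 2); x_it = w_it + 0.5 * w_i,t-1
        w = [rng.uniform(0.0, 2.0) for _ in range(t + burn + 1)]
        xi = [w[s] + 0.5 * w[s - 1] for s in range(1, t + burn + 1)]
        x.append(xi[burn:])                      # drop the first `burn` observations
    h = [[math.exp(theta_nu * v) for v in xi] for xi in x]
    # Assumption: rescale so that the average remainder variance hits the target.
    scale = sigma2_nu_bar / (sum(map(sum, h)) / (n * t))
    y = []
    for i in range(n):
        mu_i = rng.gauss(0.0, math.sqrt(sigma2_mu))   # homoskedastic individual effect
        y.append([beta0 + beta1 * x[i][s] + mu_i
                  + rng.gauss(0.0, math.sqrt(scale * h[i][s]))
                  for s in range(t)])
    return y, x
```

Setting `theta_nu=0.0` gives the homoskedastic null, which is the configuration under which empirical size would be measured.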
4.1. Robustness to misspecified heteroskedasticity

Tables 1–3 present simulation results for the Gaussian DGP, for $(N, T) = (50, 5)$ and $(N, T) = (25, 10)$ panel sizes, with $\mu_i \sim N(0, \sigma_{\mu_i}^2)$, $\nu_{it} \sim N(0, \sigma_{\nu_{it}}^2)$. Each table is split into four horizontal panels, corresponding to different variance values and panel sizes. It is important to note that all tests are constructed using parameters estimated under the joint null hypothesis of full homoskedasticity. The Holly and Gardiol (2000), Baltagi et al. (2006) and Lejeune (2006) statistics may be affected by the presence of heteroskedasticity in the other component not being tested and which is ignored. For instance, as discussed in Section 3, misspecified heteroskedasticity is expected to affect the performance of the Holly and Gardiol (2000) statistic, that is, a test for heteroskedasticity in the individual component assuming no heteroskedasticity in the remainder component. Similarly, it
Table 1 Empirical rejection probabilities. DGP: Normal. Heteroskedasticity in the remainder component (case a).
θµ
θν
Exponential heteroskedasticity mµ
m∗µ
HGµ
Lµ
mν
m∗ν
BBPν
BBPν′
Lν
mµ,ν
BBPµ,ν
Lµ,ν
0.047 0.053 0.055 0.063
0.047 0.049 0.054 0.061
0.045 0.048 0.056 0.061
0.032 0.054 0.080 0.097
0.052 1.000 1.000 1.000
0.093 0.463 0.808 0.889
0.045 0.999 1.000 1.000
0.045 0.900 0.998 0.999
0.050 0.998 1.000 1.000
0.041 0.364 0.634 0.698
0.044 0.998 1.000 1.000
0.025 0.361 0.654 0.661
0.053 0.056 0.054 0.053
0.054 0.055 0.054 0.054
0.040 0.046 0.042 0.045
0.041 0.083 0.164 0.209
0.062 1.000 1.000 1.000
0.092 0.428 0.788 0.878
0.057 1.000 1.000 1.000
0.043 0.695 0.949 0.975
0.047 0.324 0.619 0.695
0.057 1.000 1.000 1.000
0.045 1.000 1.000 1.000
0.034 0.343 0.658 0.693
0.055 0.099 0.181 0.276
0.054 0.069 0.088 0.119
0.049 0.092 0.183 0.300
0.039 0.288 0.485 0.512
0.056 0.999 1.000 1.000
0.052 0.996 1.000 1.000
0.053 1.000 1.000 1.000
0.047 0.903 0.998 0.999
0.047 0.999 1.000 1.000
0.050 0.985 0.996 0.980
0.049 1.000 1.000 1.000
0.035 0.944 0.981 0.937
0.049 0.053 0.066 0.076
0.049 0.053 0.055 0.069
0.042 0.046 0.052 0.069
0.045 0.610 0.877 0.865
0.050 1.000 1.000 1.000
0.053 0.997 1.000 1.000
0.049 1.000 1.000 1.000
0.048 0.698 0.956 0.970
0.047 0.990 0.998 0.987
0.050 1.000 1.000 1.000
0.044 1.000 1.000 1.000
0.041 0.968 0.993 0.966
σµ = 6, σ¯ ν = 2 N = 25, T = 10 2
0 0 0 0
2
0 1 2 3
N = 50, T = 5 0 0 0 0
0 1 2 3
σµ2 = 2, σ¯ ν2 = 6 N = 25, T = 10 0 0 0 0
0 1 2 3
N = 50, T = 5 0 0 0 0
0 1 2 3
Notes: Monte Carlo simulations based on 5000 replications. Theoretical size 5%. Heteroskedasticity in the remainder component, case a: σν2it = σν2 hν (θν xit ), σµ2i = σµ2 ,
θν ∈ {0, 1, 2, 3}, and θµ = 0.
Table 2 Empirical rejection probabilities. DGP: Normal. Heteroskedasticity in the remainder component (case b).
θµ
θν
Exponential heteroskedasticity mµ
m∗µ
HGµ
Lµ
mν
m∗ν
BBPν
BBPν′
Lν
mµ,ν
BBPµ,ν
Lµ,ν
0.050 0.053 0.053 0.054
0.050 0.053 0.053 0.053
0.047 0.046 0.045 0.041
0.036 0.053 0.090 0.143
0.048 0.205 0.493 0.680
0.049 0.238 0.567 0.755
0.047 0.194 0.554 0.787
0.050 0.745 0.995 1.000
0.039 0.054 0.073 0.102
0.045 0.165 0.396 0.582
0.040 0.146 0.479 0.730
0.023 0.034 0.053 0.074
0.044 0.052 0.058 0.070
0.045 0.048 0.056 0.064
0.043 0.046 0.056 0.065
0.044 0.088 0.197 0.281
0.057 0.547 0.917 0.979
0.063 0.694 0.978 0.998
0.052 0.531 0.943 0.993
0.051 0.924 1.000 1.000
0.047 0.074 0.153 0.214
0.056 0.454 0.874 0.956
0.047 0.446 0.912 0.990
0.037 0.062 0.121 0.170
0.046 0.052 0.072 0.096
0.048 0.053 0.071 0.093
0.038 0.043 0.064 0.093
0.040 0.336 0.697 0.764
0.048 0.212 0.509 0.687
0.051 0.245 0.586 0.757
0.045 0.191 0.553 0.777
0.051 0.735 0.996 1.000
0.047 0.117 0.261 0.374
0.047 0.167 0.436 0.618
0.044 0.447 0.917 0.990
0.032 0.199 0.481 0.554
0.046 0.103 0.232 0.377
0.048 0.065 0.117 0.182
0.043 0.095 0.242 0.396
0.050 0.666 0.932 0.897
0.053 0.556 0.922 0.979
0.055 0.696 0.979 0.997
0.051 0.514 0.934 0.992
0.040 0.934 1.000 1.000
0.051 0.337 0.679 0.774
0.055 0.492 0.906 0.970
0.044 0.447 0.917 0.990
0.043 0.498 0.823 0.775
σµ = 6, σ¯ ν = 2 N = 25, T = 10 2
0 0 0 0
2
0 1 2 3
N = 50, T = 5 0 0 0 0
0 1 2 3
σµ2 = 2, σ¯ ν2 = 6 N = 25, T = 10 0 0 0 0
0 1 2 3
N = 50, T = 5 0 0 0 0
0 1 2 3
Notes: Monte Carlo simulations based on 5000 replications. Theoretical size 5%. Heteroskedasticity in the remainder component, case b: σν2it = σν2 hν (θν x¯ i ), σµ2i = σµ2 ,
θν ∈ {0, 1, 2, 3}, and θµ = 0.
should affect the performance of mµ , our test robustified to nonnormalities only. We expect our fully robust test m∗µ to be more resistant to this type of misspecification. Consider first Tables 1 and 2, that is, when there is heteroskedasticity in the remainder component only, cases a and b, respectively. As predicted by the results in Section 3, in terms of size distortion, mµ and HGµ become negatively affected by the presence of heteroskedasticity in the remainder component, that is, they
tend to reject their nulls not due to the presence of heteroskedasticity in the individual component but in the other one. For example, in Table 1, with small T , the rejection rates reach 0.3 for a nominal size of 0.05. The Monte Carlo results show that this problem affects the corresponding test by Lejeune (Lµ ) as well. Monte Carlo results on Lejeune’s (2006) procedures are new, so it is relevant to observe that the test designed specifically to detect heteroskedasticity in the remainder component, Lν , has correct
Table 3
Empirical rejection probabilities. DGP: Normal. Heteroskedasticity in the individual component. Exponential heteroskedasticity.

θµ  θν   mµ     m∗µ    HGµ    Lµ     mν     m∗ν    BBPν   BBPν′  Lν     mµ,ν   BBPµ,ν Lµ,ν

σ̄µ² = 6, σν² = 2
N = 25, T = 10
0   0    0.048  0.047  0.045  0.035  0.054  0.095  0.049  0.050  0.043  0.049  0.044  0.027
1   0    0.326  0.327  0.344  0.067  0.055  0.172  0.049  0.049  0.072  0.255  0.276  0.042
2   0    0.776  0.773  0.815  0.151  0.054  0.330  0.049  0.051  0.131  0.662  0.737  0.077
3   0    0.952  0.953  0.974  0.232  0.051  0.494  0.046  0.055  0.205  0.881  0.950  0.126
N = 50, T = 5
0   0    0.053  0.053  0.039  0.039  0.050  0.092  0.049  0.044  0.042  0.048  0.043  0.033
1   0    0.122  0.121  0.121  0.235  0.049  0.368  0.047  0.048  0.193  0.095  0.098  0.137
2   0    0.298  0.298  0.315  0.547  0.049  0.743  0.049  0.044  0.459  0.217  0.253  0.334
3   0    0.511  0.511  0.562  0.642  0.050  0.911  0.047  0.050  0.575  0.373  0.467  0.430

σ̄µ² = 2, σν² = 6
N = 25, T = 10
0   0    0.047  0.050  0.047  0.040  0.054  0.053  0.049  0.047  0.046  0.053  0.050  0.034
1   0    0.175  0.169  0.175  0.050  0.052  0.067  0.047  0.057  0.055  0.141  0.139  0.040
2   0    0.476  0.462  0.504  0.088  0.053  0.094  0.050  0.071  0.055  0.377  0.413  0.052
3   0    0.721  0.694  0.747  0.119  0.055  0.145  0.053  0.084  0.076  0.598  0.654  0.073
N = 50, T = 5
0   0    0.052  0.054  0.042  0.049  0.056  0.056  0.053  0.040  0.046  0.052  0.051  0.042
1   0    0.093  0.096  0.088  0.095  0.050  0.094  0.050  0.046  0.078  0.076  0.079  0.067
2   0    0.218  0.219  0.220  0.202  0.051  0.193  0.045  0.051  0.119  0.159  0.173  0.123
3   0    0.380  0.378  0.412  0.265  0.051  0.313  0.050  0.051  0.157  0.279  0.333  0.162
Notes: Monte Carlo simulations based on 5000 replications. Theoretical size 5%. Heteroskedasticity in the individual component: $\sigma_{\mu_i}^2 = \sigma_\mu^2 h_\mu(\theta_\mu \bar{x}_i)$, $\bar{x}_i = T^{-1} \sum_{t=1}^{T} x_{it}$, $\sigma_{\nu_{it}}^2 = \sigma_\nu^2$, $\theta_\mu \in \{0, 1, 2, 3\}$, and $\theta_\nu = 0$.

size and power increasing with the strength of heteroskedasticity, as can be seen in Table 1. Interestingly, the robustified test $m_\mu^{*}$ presents much lower rejection rates (almost a third of those of its competitors), hence being more resistant to misspecifications in the alternative hypothesis. It is important to observe that, as predicted by the results of Section 2, the effects of misspecification are stronger the smaller T is and the more important the between variation in the remainder component is. The first effect can be appreciated by comparing results for different panel sizes, and the second by comparing the cases $\sigma_\mu^2 = 6$, $\bar{\sigma}_\nu^2 = 2$ and $\sigma_\mu^2 = 2$, $\bar{\sigma}_\nu^2 = 6$ in Tables 1 and 2.

In order to highlight these points, consider the following experiments, which are a variation of the exponential heteroskedasticity in the remainder component, case a, where $\sigma_\mu^2 = 2$ for all i, $\lambda_\nu = 3$, and $\bar{\sigma}_\nu^2 = 6$. First, to assess the sensitivity of the proposed statistics to the panel size, we fix N = 50 and consider 1000 simulations for each $T \in \{2, 3, \ldots, 30\}$. Simulation results are presented graphically in Fig. 1, and show that the main problem arises because of short panels. Moreover, it shows that the main gain of using $m_\mu^{*}$ is in the small T case, the most likely situation in practice. All tests achieve correct size for large T, but $m_\mu^{*}$ achieves the correct size in shorter panels. Second, we have also computed rejection rates depending on the size of the cross-sectional dimension of the panel, N, keeping fixed the temporal dimension; see Figs. 2 and 3. In particular, we fix T = 2, 5 and consider 1000 simulations for each $N \in \{10, 20, \ldots, 200\}$. Results show that $m_\mu$, $HG_\mu$ and $L_\mu$ increasingly (and wrongly) reject as N increases. Nevertheless, $m_\mu^{*}$ remains insensitive to changes in N, although rejection rates are above 0.05. Finally, we explored the effects of the relative importance of between vs. within heteroskedasticity in the remainder component. Consider now the following form of functional heteroskedasticity:

$$\sigma_{\nu_{it}}^2 = \sigma_\nu^2 \exp\left(\lambda_\nu \left(\alpha (x_{it} - \bar{x}_i) + (1 - \alpha) x_{it}\right)\right),$$

with $\alpha \in [0, 1]$. If $\alpha = 0$, this corresponds to case a in Table 1. If $\alpha = 1$, by construction, there is only within heteroskedasticity, and therefore no differences in the variance across individuals. For different values of $\alpha$, we have generated 1000 replications for $(N, T) = (50, 5)$, and calculate the empirical size at a theoretical level of 5% of $HG_\mu$, $L_\mu$, $m_\mu$ and $m_\mu^{*}$. Results are shown graphically in Fig. 4. $HG_\mu$, $L_\mu$ and $m_\mu$ reject too often for small $\alpha$, while $m_\mu^{*}$ has better size properties. Moreover, for the four statistics, the simulated empirical size approaches the theoretical level as $\alpha$ goes to 1.

Fig. 1. Heteroskedasticity in the remainder component with T varying. (Horizontal axis: T; vertical axis: rejection rates at 0.05.)
Fig. 2. Heteroskedasticity in the remainder component with N varying, T = 2. (Horizontal axis: N; vertical axis: rejection rates at 0.05.)
Fig. 3. Heteroskedasticity in the remainder component with N varying, T = 5. (Horizontal axis: N; vertical axis: rejection rates at 0.05.)
Fig. 4. Within–between heteroskedasticity in the remainder component. (Horizontal axis: alpha; vertical axis: rejection rates at 0.05.)

Regarding robustification costs, tests specifically designed to detect heteroskedasticity in the remainder ($m_\nu$, $BBP_\nu$, $BBP_{\nu'}$, $L_\nu$) increase their empirical power with the strength of this type of heteroskedasticity and, as expected under normality, the power of $BBP_\nu$ is the largest. Interestingly, our robust test $m_\nu$ performs relatively close to the Baltagi et al. (2006) LM statistics, implying that robustification costs for these particular experiments are low, that is, the loss in power for unnecessarily using a robust test is minor. Finally, note that the performance of $m_\nu^{*}$, our proposed statistic designed to increase its power in small samples, is not as good as expected. First, it shows over-rejection for the ($\sigma_\mu^2 = 6$, $\sigma_\nu^2 = 2$) case. Second, its power outperforms that of $m_\nu$ only in Table 2.

Consider now Table 3, where we have heteroskedasticity in the individual component only under Gaussianity. The Holly and Gardiol (2000) test is locally optimal and should have correct asymptotic size, so robustification is not necessary. Our robust statistics $m_\mu$ and $m_\mu^{*}$ have very similar rejection rates for all values of $\theta_\mu$, suggesting that robustification costs are small in this case too. Interestingly, the test by Lejeune (2006) has increasing power, and for the (50, 5) case it outperforms the optimal test by Holly and Gardiol (2000). As heteroskedasticity in the individual component increases, ($m_\nu$, $BBP_\nu$, $BBP_{\nu'}$) present rejection rates similar to their nominal levels, consistent with the fact that tests that check heteroskedasticity in the remainder component are immune to the presence of heteroskedasticity in the individual one. Interestingly, $L_\nu$ and $m_\nu^{*}$ present unwanted power, that is, they reject their nulls due to heteroskedasticity in the other component, and hence are not robust to this misspecification. Finally, joint tests present increasing power, though, as expected, they are outperformed by the marginal tests specifically designed to detect departures in a single component. The distribution-free joint statistic $m_{\mu,\nu}$ has less power than $BBP_{\mu,\nu}$ (which assumes Gaussianity) but the power loss is very small, suggesting again that robustification costs are negligible. Results are similar when the relative importance of each component is altered (that is, by comparing the two horizontal panels). Again, for the N = 50, T = 5 case and when the individual variance is relatively larger than the remainder one (second panel of Table 3), the joint test by Lejeune (2006) presents the highest power.

Although not reported (results are available from the authors upon request), for completeness, we have also considered the case of heteroskedasticity in both components.8 Our proposed moment-based marginal tests do not diminish their power as we add misspecification of the type not being tested. That is, in general, their power performance increases for greater heteroskedasticity in the other component, and in fact, they have a similar performance to the Baltagi et al. (2006) LM tests. To summarize, the robustification costs incurred by all our new statistics are small, as measured by the loss in power by unnecessarily using resistant tests in the Gaussian case.

4.2. Robustness of validity

In order to explore the effect of departures away from Gaussianity, we evaluate the performance of all the test statistics under $H_0: \theta_\mu = \theta_\nu = 0$, N = 50 and T = 5, for non-normal DGPs using 5000 replications. First, we generate t-Student DGPs with 3 and 5 degrees of freedom. Second, we consider skewed-normal distributions constructed as in Azzalini and Capitanio (2003).9 Finally, we have also considered log-normal, exponential, $\chi_1^2$ and uniform distributions. In all cases, the random variables are standardized to have the required variances. Results appear in Table 4. The effects of departures away from Gaussianity are dramatic. For the t-Student cases, the empirical sizes of the LM Gaussian-based statistics are considerably large. Moreover, the simulations
8 Parameters were set as follows: $\sigma_{\mu_i}^2 = \sigma_\mu^2 h_\mu(\theta_\mu \bar{x}_i)$, $\sigma_{\nu_{it}}^2 = \sigma_\nu^2 h_\nu(\theta_\nu x_{it})$, $\theta_\mu \in \{0, 1, 2, 3\}$, and $\theta_\nu \in \{0, 1, 2, 3\}$.
9 We are grateful to an anonymous referee for pointing out this distribution. We have used the SN package in R and the rsn command, with a shape parameter α = 20. This random variable has a kurtosis of 1 and considerable skewness.
show that rejection rates decrease as degrees of freedom increase, and thus the DGP becomes closer to normal. Even higher rejection rates are observed for the log-normal, exponential, $\chi_1^2$ and uniform DGPs. For instance, the log-normal has rejection rates above 0.24 for $HG_\mu$, and close to 0.50 for $BBP_\nu$. However, rejection rates are close to the nominal level for the skewed-normal distribution (with considerable skewness but limited kurtosis). These results are in line with Evans' (1992) simulations for the Breusch–Pagan cross-sectional test, which was found to be highly sensitive to excess kurtosis but less so to skewness.

Interestingly our new test statistics and those of Lejeune's (2006) are robust to departures away from Gaussianity, presenting empirical sizes very close to their nominal values. Surprisingly, we also find a good empirical size for the t-Student case with 3 degrees of freedom, which has infinite fourth moment, and therefore, it does not satisfy the assumptions used in the theorems of Section 3. Finally, all tests derived under Lejeune's (2006) framework present good empirical size and are, hence, robust to distributional misspecifications. Although not reported, in all cases, the proposed tests have monotonically increasing empirical power as heteroskedasticity in the tested component augments.

To summarize, the analysis confirms that, although optimal in the Gaussian case, LM tests derived under this assumption are severely affected by non-normalities, and that, on the contrary, our new statistics and those based on Lejeune's (2006) remain unaltered by changes in the underlying distribution of the error terms.

5. An extension: the heterokurtic case

We consider an extension of the tests proposed above to the case of finite but non-identical fourth moments, i.e. heterokurtosis. This is, thus, a generalization of the procedures of Wooldridge (1990, 1991) and Dastoor (1997) in the cross-sectional case, to the error components model in panel data.
In this case, Assumption 2 should be dropped and the asymptotic results should be modified to allow for different variances of the conditional squared residuals. We illustrate this procedure by modifying Theorem 1 (for the tests for heteroskedasticity in the individual component), which provides guidance for straightforward extensions of Theorems 2, 3 and 4.

Recall from Section 3.1 that $\hat{\bar{\eta}}_i = \hat{\bar{u}}_i^{2}$. Define

$$\hat{\Phi}_\mu = \mathrm{diag}\left\{ \left(\hat{\bar{\eta}}_1 - \frac{1}{N}\sum_{i=1}^{N} \hat{\bar{\eta}}_i\right)^{2}, \ldots, \left(\hat{\bar{\eta}}_N - \frac{1}{N}\sum_{i=1}^{N} \hat{\bar{\eta}}_i\right)^{2} \right\}.$$

Consider the following assumption, which ensures the existence of the fourth moments:

Assumption 2′. Let $\bar{\eta} = \{\bar{u}_1^2, \ldots, \bar{u}_N^2\}$; then $\lim_{N\to\infty} \mathrm{Var}\left[\frac{1}{\sqrt{N}} Z_\mu' M_N \bar{\eta}\right] = \Omega_\mu$ is a finite positive definite matrix.

The following theorem provides the asymptotic distribution of a Wooldridge (1990) type statistic for testing heteroskedasticity in the individual component with heterokurtosis. The intuition is that, as argued in Wooldridge (1990, p. 23), the White (1980) covariance matrix (in our case based on $\hat{\Phi}_\mu$) can be used to compute heteroskedasticity tests that are not affected by heterokurtosis. A similar procedure can be used to construct tests that are robust to heterokurtosis for all the test statistics considered in this paper.

Theorem 5. Let $\lambda_\mu^{h} = \sigma_\mu^{4} h_\mu^{(1)}(0)^{2}\, \delta_\mu' D_\mu \Omega_\mu^{-1} D_\mu \delta_\mu$. Then, under Assumptions 1 and 2, as $N, T \to \infty$ or $N \to \infty$, T fixed and $H_0^{\sigma_\nu^2}$, and under $H_A^{\sigma_\mu^2}: \theta_\mu = \delta_\mu/\sqrt{N}$,

$$m_{\theta_\mu}^{h} \equiv N \times \hat{\bar{\eta}}' M_N Z_\mu (Z_\mu' M_N Z_\mu)(Z_\mu' M_N \hat{\Phi}_\mu M_N Z_\mu)^{-1} (Z_\mu' M_N Z_\mu) Z_\mu' M_N \hat{\bar{\eta}} \xrightarrow{d} \chi^2_{k_{\theta_\mu}}(\lambda_\mu^{h}).$$

Proof. The proof follows from Theorem 1 and Dastoor's (1997) Theorem 1.

Interestingly, following Wooldridge (1990, Example 3.2, pp. 32–34), this test can also be implemented in an artificial regression setup, as $N \times R_\mu^{2h}$ of the regression of a vector of ones on $\left(\hat{\bar{\eta}} - \frac{1}{N}\sum_{i=1}^{N} \hat{\bar{\eta}}_i\right)\left(z_\mu - \frac{1}{N}\sum_{i=1}^{N} z_{\mu i}\right)$, where $R_\mu^{2h}$ is the uncentered coefficient of determination of the regression.

Note that this procedure can be extended for a general variance–covariance matrix of the transformed residual $\eta$. In this case, we could define a general matrix $\Phi = (M \eta \eta' M) \odot A$, where M is a square matrix with the dimension of $\eta$ (either $M_N$ or $M_{NT}$), A is a selector matrix of the same dimension, with 0s and 1s that indicate which elements are non-zero, and '$\odot$' denotes the element-by-element matrix multiplication operator. By imposing adequate restrictions on the type of dependence, test statistics that are robust to heterokurtosis and several types of panel dependences can be constructed.

We conduct a small Monte Carlo experiment to evaluate the effect of heterokurtosis on our proposed statistics, and the corresponding heterokurtic-robust modifications based on the artificial regression setup explained above. We generate 1000 replications under $H_0: \theta_\mu = \theta_\nu = 0$, N = 50 and T = 5, for non-normal DGPs with varying kurtosis. We consider 3 different cases. First, we generate half of the observations with a t-Student with 5 degrees of freedom and half with 10 degrees of freedom. Second, half with t-Student (df = 5) and half with a log-normal. Finally, one fifth with t-Student (df = 5), one fifth with t-Student (df = 7), one fifth with t-Student (df = 10), one fifth normal and one fifth log-normal. In all cases we use the adjustment explained in the Monte Carlo section to get the required variances. Results appear in Table 5. The tests based on homokurtosis have good empirical size. In general, the Wooldridge-type statistics show rejection rates below the nominal size of 5%. Overall this suggests that heterokurtosis may not produce great size distortions.
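The artificial regression described in this section can be sketched for the single-covariate case as follows. This is a minimal illustration rather than the authors' implementation: the function name `heterokurtic_robust_stat` and its inputs are hypothetical, `eta` stands for the N squared individual-mean residuals and `z` for one covariate per individual. With a single regressor $g_i$ and no constant, N times the uncentered $R^2$ from regressing a vector of ones on $g$ simplifies to $(\sum_i g_i)^2 / \sum_i g_i^2$.

```python
def heterokurtic_robust_stat(eta, z):
    """N times the uncentered R^2 from regressing ones on g_i (no constant)."""
    n = len(eta)
    eta_bar, z_bar = sum(eta) / n, sum(z) / n
    # g_i = (eta_i - mean(eta)) * (z_i - mean(z))
    g = [(e - eta_bar) * (v - z_bar) for e, v in zip(eta, z)]
    sum_g = sum(g)
    sum_g2 = sum(v * v for v in g)
    # OLS slope is sum_g / sum_g2, explained sum of squares is sum_g^2 / sum_g2,
    # total uncentered sum of squares is n, so N * R^2 = sum_g^2 / sum_g2.
    return sum_g ** 2 / sum_g2
```

By the Cauchy–Schwarz inequality the returned value always lies between 0 and N.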
6. Concluding remarks and suggestions for practitioners

As in the cross-sectional case, heteroskedasticity is likely to affect panel models as well. A further complication in the standard error components model is to correctly identify in which of the two components, if not in both, it is present. Available LM based tests are shown to have difficulties solving this problem. First, by relying strictly on distributional assumptions, they are prone to be negatively affected by departures away from the Gaussian framework in which they are derived. This paper shows that this is clearly the case, since alternative distributions (in particular, heavy-tailed ones) lead to spurious rejections of the null of homoskedasticity. Second, joint tests of the null of homoskedasticity in both components, though helpful in serving as a starting diagnostic check, are by construction unable to identify the source of heteroskedasticity. More importantly, the marginal LM test for the individual component rejects its null in the presence of heteroskedasticity in either component, and hence, cannot help in identifying which error is causing it. Our new tests are robust in these two senses, that is, they have correct asymptotic size for a wide family of distributions and they have power only in the direction intended for. An extensive Monte Carlo experiment confirms the severity of these problems and the adequacy of our new tests in small samples.

Our new tests are computationally convenient, since they are based on simple algebraic transformations of pooled OLS residuals, unlike the tests by Baltagi et al. (2006) or Holly and Gardiol (2000) that require ML or pseudo-ML estimation. Also, the extension to the case of unbalanced panels is immediate in the case of our tests, due to the use of simple moment conditions, in contrast with
Table 4
Empirical rejection probabilities. Size distortions with different DGPs. N = 50, T = 5. Exponential heteroskedasticity.

DGP          mµ     m∗µ    HGµ    Lµ     mν     m∗ν    BBPν   BBPν′  Lν     mµ,ν   BBPµ,ν Lµ,ν

σµ² = 6, σ̄ν² = 2
Gaussian     0.053  0.053  0.039  0.039  0.050  0.092  0.049  0.044  0.042  0.048  0.043  0.033
t3           0.049  0.049  0.207  0.042  0.055  0.083  0.320  0.324  0.049  0.055  0.384  0.042
t5           0.055  0.055  0.105  0.050  0.050  0.086  0.176  0.189  0.047  0.051  0.192  0.052
Skewed-N     0.049  0.051  0.065  0.047  0.056  0.074  0.092  0.088  0.055  0.049  0.091  0.044
Log-normal   0.051  0.050  0.314  0.046  0.054  0.065  0.485  0.500  0.051  0.061  0.590  0.041
Exponential  0.048  0.048  0.177  0.032  0.059  0.072  0.238  0.242  0.043  0.055  0.297  0.031
χ²₁          0.057  0.056  0.275  0.051  0.064  0.080  0.333  0.353  0.048  0.064  0.439  0.039
Uniform      0.055  0.055  0.193  0.049  0.053  0.091  0.013  0.006  0.053  0.051  0.141  0.041

σµ² = 2, σν² = 6
Gaussian     0.052  0.054  0.042  0.049  0.056  0.056  0.053  0.040  0.046  0.052  0.051  0.042
t3           0.048  0.050  0.153  0.043  0.054  0.052  0.341  0.344  0.046  0.054  0.359  0.049
t5           0.050  0.051  0.077  0.047  0.050  0.052  0.182  0.187  0.046  0.049  0.170  0.045
Skewed-N     0.056  0.057  0.057  0.054  0.054  0.049  0.092  0.087  0.065  0.051  0.088  0.055
Log-normal   0.054  0.054  0.243  0.051  0.054  0.055  0.494  0.496  0.046  0.063  0.543  0.049
Exponential  0.050  0.049  0.115  0.039  0.059  0.052  0.240  0.251  0.049  0.053  0.248  0.033
χ²₁          0.056  0.055  0.166  0.045  0.056  0.048  0.359  0.364  0.046  0.056  0.386  0.051
Uniform      0.057  0.057  0.202  0.050  0.049  0.056  0.011  0.007  0.046  0.050  0.140  0.045

Notes: Monte Carlo simulations based on 5000 replications. Theoretical size 5%.

Table 5 Empirical rejection probabilities. Heterokurtosis. N = 50, T = 5. m∗µ
Test statistic
mµ
DGP
σµ2 = 6, σν2 = 2
t5 & t10 t5 & log −N t5 & t7 & t10 & Normal & log −N
0.045 0.058 0.054
DGP
σµ2 = 2, σν2 = 6
t5 & t10 t5 & log −N t5 & t7 & t10 & Normal & log −N
0.049 0.051 0.048
0.043 0.054 0.059
0.048 0.052 0.052
mhµ
m∗µh
mν
mhν
m∗ν
m∗ν h
mµ,ν
mhµ,ν
0.033 0.026 0.038
0.031 0.021 0.043
0.045 0.058 0.070
0.044 0.038 0.055
0.055 0.077 0.072
0.050 0.054 0.068
0.044 0.056 0.064
0.036 0.032 0.048
0.043 0.022 0.030
0.042 0.021 0.031
0.050 0.057 0.054
0.038 0.038 0.045
0.057 0.071 0.060
0.049 0.053 0.056
0.046 0.062 0.051
0.044 0.034 0.039
Notes: Monte Carlo simulations based on 1000 replications. Theoretical size 5%.
many other error component procedures whose derivation for the unbalanced case requires complicated algebraic manipulations (see Sosa-Escudero and Bera, 2008, for a recent case). Note that Lejeune’s (2006) tests allow for unbalanced panels too. In practice, the use of our new tests will depend on the hypothesis of interest. Obviously, joint tests are a useful starting point as a general diagnostic test, since they have correct size and power to detect departures away from the general null of homoskedasticity. Marginal tests can be used when the interest lies in one particular direction, our tests being particularly helpful in small samples. Additionally, marginal tests can be combined in a Bonferroni approach, to produce a joint test that is compatible with the marginal ones (see Savin, 1984, for further details). That is, compute both marginal tests, and reject the joint null if at least one of them lies in its rejection region, where the significance level for the marginal tests is halved, in order to guarantee that the resulting joint test has the desired asymptotic size. This is the essence of the ‘multiple comparison procedure’ in Bera and Jarque (1982). Regarding further research, this paper focuses mostly on preserving consistency and correct asymptotic size, with minimal power losses with respect of existing ML based tests. Power improvements can be expected from using a quantile regression framework, as in Koenker and Bassett (1982), which finds power gains by basing a test for heteroskedasticity on the difference in slopes in a quantile regression framework, for the cross-sectional case. The literature on quantile models for panels is still incipient, though promising (see Koenker, 2004; Canay, 2008; Galvao, 2009), so further developments along the results of this research line seem promising.
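The Bonferroni combination of the two marginal tests described above can be written as a short decision rule. This is a sketch under stated assumptions: the function name `bonferroni_joint_reject` is hypothetical, and the marginal p-values are taken as given inputs.

```python
def bonferroni_joint_reject(p_mu, p_nu, alpha=0.05):
    """Reject the joint null if either marginal p-value is below alpha / 2."""
    return min(p_mu, p_nu) < alpha / 2.0
```

For example, marginal p-values of 0.03 and 0.04 would each reject at the 5% level individually, but neither falls below the halved level of 0.025, so the joint null is not rejected.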
References
Azzalini, A., Capitanio, A., 2003. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society Series B 65, 367–389.
Baltagi, B.H., Bresson, G., Pirotte, A., 2006. Joint LM test for homoskedasticity in a one-way error component model. Journal of Econometrics 134, 401–417.
Baltagi, B.H., Jung, B.C., Song, S.H., 2010. Testing for heteroskedasticity and serial correlation in a random effects panel data model. Journal of Econometrics 154, 122–124.
Bera, A., Jarque, C., 1982. Model specification tests: a simultaneous approach. Journal of Econometrics 20, 59–82.
Bera, A., Yoon, M., 1993. Specification testing with locally misspecified alternatives. Econometric Theory 9, 649–658.
Box, G., 1953. Non-normalities and tests on variances. Biometrika 40, 318–335.
Breusch, T.S., Pagan, A.R., 1979. A simple test for heteroskedasticity and random coefficient variation. Econometrica 47, 1287–1294.
Canay, I., 2008. A Note on Quantile Regression for Panel Data Models. Mimeo, Northwestern University.
Dastoor, N.K., 1997. Testing for conditional heteroskedasticity with misspecified alternative hypothesis. Journal of Econometrics 82, 63–80.
Davidson, J., MacKinnon, J., 1987. Implicit alternatives and the local power of test statistics. Econometrica 55, 1305–1329.
Davidson, J., MacKinnon, J., 1990. Specification tests based on artificial regression. Journal of the American Statistical Association 85, 220–227.
Evans, M., 1992. Robustness of size of tests of autocorrelation and heteroskedasticity to nonnormality. Journal of Econometrics 51, 7–24.
Galvao, A., 2009. Quantile Regression for Dynamic Panel Data with Fixed Effects. Mimeo, University of Wisconsin.
Holly, A., Gardiol, L., 2000. A score test for individual heteroskedasticity in a one-way error components model. In: Krishnakumar, J., Ronchetti, E. (Eds.), Panel Data Econometrics: Future Directions. Elsevier, Amsterdam, pp. 199–211 (Chapter 10).
Huber, P., 1981. Robust Statistics. Wiley, New York.
Koenker, R., 1981. A note on studentizing a test for heteroskedasticity. Journal of Econometrics 17, 107–112.
Koenker, R., 2004. Quantile regression for longitudinal data. Journal of Multivariate Analysis 91, 74–89.
Koenker, R., Bassett, G., 1982. Robust tests for heteroscedasticity based on regression quantiles. Econometrica 50, 43–61.
G. Montes-Rojas, W. Sosa-Escudero / Journal of Econometrics 160 (2011) 300–310
Kimball, A., 1957. Errors of the third kind in statistical consulting. Journal of the American Statistical Association 57, 133–142.
Lejeune, B., 2006. A full heteroscedastic one-way error components model allowing for unbalanced panel: pseudo-maximum likelihood estimation and specification testing. In: Baltagi, B.H. (Ed.), Panel Data Econometrics: Theoretical Contributions and Empirical Applications. Elsevier, Amsterdam, pp. 31–66 (Chapter 2).
Mazodier, P., Trognon, A., 1978. Heteroscedasticity and stratification in error component models. Annales de l’INSEE 30/31, 451–482.
Newey, W., 1985. Maximum likelihood specification testing and conditional moment tests. Econometrica 53, 1047–1070.
Phillips, R.L., 2003. Estimation of stratified error components model. International Economic Review 44, 501–521.
Roy, N., 2002. Is adaptative estimation useful for panel data models with heteroskedasticity in the individual specific error component? Some Monte Carlo evidence. Econometric Reviews 21, 189–203.
Saikkonen, P., 1989. Asymptotic relative efficiency of the classical tests under misspecification. Journal of Econometrics 42, 351–369.
Savin, N.E., 1984. Multiple hypothesis testing. In: Griliches, Z., Intriligator, M.D. (Eds.), Handbook of Econometrics, Vol. II. Elsevier, Amsterdam, pp. 827–879 (Chapter 14).
Sosa-Escudero, W., Bera, A., 2008. Tests for unbalanced error component models under local misspecification. Stata Journal 8, 68–78.
Tauchen, G., 1985. Diagnostic testing and evaluation of maximum likelihood models. Journal of Econometrics 30, 415–443.
Welsh, A., 1996. Aspects of Statistical Inference. Wiley, New York.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
White, H., 1987. Specification testing in dynamic models. In: Bewley, T.F. (Ed.), Advances in Econometrics-Fifth World Congress, Vol. 1. Cambridge University Press, Cambridge, pp. 1–58 (Chapter 1).
Wooldridge, J.M., 1990. A unified approach to robust, regression-based specification tests. Econometric Theory 6, 17–43.
Wooldridge, J.M., 1991. On the application of robust, regression-based diagnostics to models of conditional means and conditional variances. Journal of Econometrics 47, 5–46.
Journal of Econometrics 160 (2011) 311–325
Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom
Multivariate contemporaneous-threshold autoregressive models✩
Michael J. Dueker a,∗, Zacharias Psaradakis b, Martin Sola b,c, Fabio Spagnolo d
a Russell Investments, USA
b Department of Economics, Mathematics & Statistics, Birkbeck, University of London, UK
c Department of Economics, Universidad Torcuato di Tella, Argentina
d Department of Economics and Finance, Brunel University, UK
Article info
Article history: Received 13 March 2009; Received in revised form 30 August 2010; Accepted 7 September 2010; Available online 17 September 2010
JEL classification: C32; G12
Keywords: Nonlinear autoregressive model; Smooth transition; Stability; Threshold

Abstract
This paper proposes a contemporaneous-threshold multivariate smooth transition autoregressive (C-MSTAR) model in which the regime weights depend on the ex-ante probabilities that latent regime-specific variables exceed certain threshold values. A key feature of the model is that the transition function depends on all the parameters of the model as well as on the data. Since the mixing weights are also a function of the regime-specific noise covariance matrix, the model can account for contemporaneous regime-specific co-movements of the variables. The stability and distributional properties of the proposed model are discussed, as well as issues of estimation, testing and forecasting. The practical usefulness of the C-MSTAR model is illustrated by examining the relationship between US stock prices and interest rates.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction

It has been long recognized that economic variables may behave very differently in different states of the economy such as, for example, high/low inflation, high/low growth, or high/low stock prices (relative to dividends). This behavior may be attributable not only to state-dependent response of economic variables to policy shocks but also to state-dependent response on the part of the authorities responsible for fiscal and monetary policies. In an attempt to capture state-dependent or regime-switching behavior, a variety of nonlinear models has been proposed for describing the dynamics of economic time series subject to changes in regime (see, e.g., Tong (1983, 1990), Hamilton (1993), van Dijk et al. (2002) and Dueker et al. (2007)).

Researchers are often interested in studying the interrelationships between several economic/financial variables. To this end, several multivariate models have been considered in the literature, including Markov-switching autoregressive models (e.g., Sola
✩ Helpful comments by an associate editor and two anonymous referees are gratefully acknowledged. We especially would like to thank Demian Pouzo for numerous conversations about the paper. Finally we thank Alejandro Francetich and Juan Passadore for excellent research assistance. ∗ Corresponding author. E-mail address:
[email protected] (M.J. Dueker).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.09.011
and Driffill (1994)), threshold autoregressive models (Tsay, 1998), smooth transition autoregressive (STAR) models (van Dijk et al., 2002), functional-coefficient autoregressive models (Harvill and Ray, 2006) and mixture autoregressive models (Fong et al., 2007; Bec et al., 2008). In spite of some obvious difficulties associated with the practical use of many of these models (e.g., choice of an appropriate threshold variable, number of regimes, transition function, functional forms), they are potentially very useful for analyzing state-dependent multivariate relationships. Well-known examples of such relationships, which have been the focus of recent research, are nonlinear money-output Granger causality patterns (e.g., Rothman et al. (2001) and Psaradakis et al. (2005)), nonlinearities in the term structure of interest rates (e.g., Sola and Driffill (1994), Tsay (1998) and De Gooijer and Vidiella-i-Anguera (2004)) and nonlinearities in business-cycle relationships (e.g., Altissimo and Violante (2001) and Koop and Potter (2006)), inter alia. One of the major challenges faced in a multivariate framework is how best to capture the state-dependent behavior that the components of a multiple time series may exhibit, as well as the potentially changing interrelationships between the variables, in a way which is both statistically sound and economically meaningful. In many instances, different states of the economy can be characterized in terms of high and low values of certain economic/financial variables (e.g., high/low inflation or high/low growth). The economy typically behaves differently in these regimes and it is
M.J. Dueker et al. / Journal of Econometrics 160 (2011) 311–325
reasonable to expect that the contemporaneous and feedback relationships between variables will also be regime specific. An econometric model will be useful in such cases if it is capable of both identifying the periods associated with different states of nature and capturing the state-specific interrelationships among variables. A Markov-switching autoregressive model, for example, which allows for shifts in the mean or the intercept can capture extreme events associated with the level of the series but cannot account for state-dependent interrelationships among the variables. The latter may be accounted for by allowing all the parameters of the model to switch, but this usually results in identifying as separate regimes periods which do not necessarily correspond to economically meaningful states of nature (e.g., high/low growth rates). Multivariate threshold autoregressive and STAR models typically associate different regimes with small and large values of the transition variables and are capable of characterizing state-dependent interactions among the variables. This paper contributes to the literature on multivariate nonlinear models by proposing a contemporaneous-threshold multivariate STAR, or C-MSTAR, model. A key characteristic of the model is that the mixing (or regime) weights depend on the ex-ante probabilities that latent regime-specific variables exceed certain (unknown) threshold values (cf. Dueker et al. (2007)). What is more, the mixing (or transition) function of the C-MSTAR model depends on all the parameters of the model as well as on the data. This implies that, in contrast to conventional STAR models, there is no need to choose an appropriate transition variable using a model selection criterion since, by construction, all the variables that enter the model’s information set are also present in the transition function. 
Furthermore, the dependence of the mixing weights on the regime-specific noise covariance matrices allows the model to capture contemporaneous regime-specific co-movements of the variables and to exploit the information in these covariance matrices in order to predict regimes. These important characteristics make the C-MSTAR model capable of describing successfully multiple time series with a wide variety of conditional distributions and of capturing state-dependent interrelationships among the variables of interest.

To convey the flavor of contemporaneous-threshold smooth transition autoregressive (C-STAR) models, the definition and main characteristics of the univariate C-STAR model are recalled in Section 2. The C-MSTAR model is introduced and discussed in Section 3. We examine the stability properties of the model and use artificial data to analyze the various types of conditional distributions that can be generated by a C-MSTAR model. Section 4 discusses estimation and testing, and reports the results of simulation experiments that assess the finite-sample performance of the maximum likelihood (ML) estimator and of the related statistics. In Section 5, we investigate the relationship between US stock prices and interest rates using a C-MSTAR model, and evaluate its out-of-sample forecast performance. Our empirical results suggest that monetary policy has different effects on stock prices in different states of the economy and that Granger causality between stock prices and interest rates is regime dependent. A summary is given in Section 6.

2. Univariate contemporaneous-threshold models

The C-STAR model of Dueker et al. (2007) is a member of the STAR family. A STAR process may be thought of as a mixture of two (or more) autoregressive processes which are averaged, at any given point in time, according to some continuous function G(·) taking values in [0, 1].
More specifically, a two-regime (conditionally heteroskedastic) STAR model for the univariate time series $\{x_t\}$ may be formulated as
$$x_t = G(z_{t-1})\,x_{1t} + [1 - G(z_{t-1})]\,x_{2t}, \qquad t = 1, 2, \ldots, \tag{1}$$
where $z_{t-1}$ is a vector of exogenous and/or pre-determined variables and
$$x_{it} = \mu_i + \sum_{j=1}^{p} \alpha_j^{(i)} x_{t-j} + \sigma_i u_t, \qquad i = 1, 2. \tag{2}$$
In (2), $p$ is a positive integer, $\{u_t\}$ are independent and identically distributed (i.i.d.) random variables such that $u_t$ is independent of $(x_{1-p}, \ldots, x_0)$ and $E(u_t) = E(u_t^2 - 1) = 0$, $\sigma_1$ and $\sigma_2$ are positive constants, and $\mu_i$ and $\alpha_j^{(i)}$ ($i = 1, 2$; $j = 1, \ldots, p$) are real constants. The feature that differentiates alternative STAR models is the choice of the mixing function $G(\cdot)$ and transition variables $z_{t-1}$ (cf. Teräsvirta (1998) and van Dijk et al. (2002)).

Letting $z_{t-1} = (x_{t-1}, \ldots, x_{t-p})'$ and $\alpha_i = (\alpha_1^{(i)}, \ldots, \alpha_p^{(i)})'$ ($i = 1, 2$), the (conditionally) Gaussian, two-regime C-STAR model of order $p$ is obtained by defining the mixing function $G(\cdot)$ in (1) as
$$G(z_{t-1}) = \frac{\Phi(\{x^* - \mu_1 - \alpha_1' z_{t-1}\}/\sigma_1)}{\Phi(\{x^* - \mu_1 - \alpha_1' z_{t-1}\}/\sigma_1) + 1 - \Phi(\{x^* - \mu_2 - \alpha_2' z_{t-1}\}/\sigma_2)},$$
where $\Phi(\cdot)$ denotes the standard normal distribution function and $x^*$ is a threshold parameter.1 Notice that
$$G(z_{t-1}) = \frac{P(x_{1t} < x^* \mid z_{t-1}; \vartheta_1)}{P(x_{1t} < x^* \mid z_{t-1}; \vartheta_1) + P(x_{2t} \geq x^* \mid z_{t-1}; \vartheta_2)}$$
and
$$1 - G(z_{t-1}) = \frac{P(x_{2t} \geq x^* \mid z_{t-1}; \vartheta_2)}{P(x_{1t} < x^* \mid z_{t-1}; \vartheta_1) + P(x_{2t} \geq x^* \mid z_{t-1}; \vartheta_2)},$$
where $\vartheta_i = (\mu_i, \alpha_1^{(i)}, \ldots, \alpha_p^{(i)}, \sigma_i^2)'$ is the vector of parameters associated with regime $i$. Hence, (1) may be rewritten as
$$x_t = \frac{P(x_{1t} < x^* \mid z_{t-1}; \vartheta_1)\,x_{1t} + P(x_{2t} \geq x^* \mid z_{t-1}; \vartheta_2)\,x_{2t}}{P(x_{1t} < x^* \mid z_{t-1}; \vartheta_1) + P(x_{2t} \geq x^* \mid z_{t-1}; \vartheta_2)}.$$
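As a minimal sketch of how the Gaussian C-STAR mixing weight can be evaluated, the snippet below computes $G(z_{t-1})$ from the two regime-specific probabilities. The function names `norm_cdf` and `cstar_weight` and all parameter values in the usage lines are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from math import erf, sqrt


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def cstar_weight(z_prev, x_star, mu, alpha, sigma):
    """Mixing weight G(z_{t-1}) of a two-regime Gaussian C-STAR model.

    z_prev : lagged values (x_{t-1}, ..., x_{t-p})
    x_star : threshold parameter
    mu     : pair of intercepts (mu_1, mu_2)
    alpha  : pair of coefficient lists (alpha_1, alpha_2), each of length p
    sigma  : pair of positive scale parameters (sigma_1, sigma_2)
    """
    # Regime-specific conditional means mu_i + alpha_i' z_{t-1}.
    m1 = mu[0] + sum(a * z for a, z in zip(alpha[0], z_prev))
    m2 = mu[1] + sum(a * z for a, z in zip(alpha[1], z_prev))
    p_below = norm_cdf((x_star - m1) / sigma[0])        # P(x_1t < x*)
    p_above = 1.0 - norm_cdf((x_star - m2) / sigma[1])  # P(x_2t >= x*)
    return p_below / (p_below + p_above)


# Hypothetical first-order example (p = 1).
g = cstar_weight(z_prev=[1.0], x_star=2.0,
                 mu=(0.0, 1.0), alpha=([0.5], [0.8]), sigma=(1.0, 1.0))
print(g, 1.0 - g)
```

Because the two probabilities refer to different latent variables, they need not sum to one; dividing by their sum is what turns them into weights on [0, 1].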
Since the values of the mixing function depend on the probability that the contemporaneous value of $x_{1t}$ ($x_{2t}$) is smaller (greater) than the threshold level $x^*$, the model is called a contemporaneous-threshold STAR model. As with conventional STAR models, a C-STAR model may be thought of as a regime-switching model that allows for two regimes associated with the two latent variables $x_{1t}$ and $x_{2t}$. Alternatively, a C-STAR model may be thought of as allowing for a continuum of regimes, each of which is associated with a different value of $G(z_{t-1})$.2

One of the main purposes of the C-STAR model is to address two somewhat arbitrary features of conventional STAR models. First, STAR models specify a delay such that the mixing function for period $t$ consists of a function of $x_{t-j}$ for some $j \geq 1$. Second, STAR models specify which of the model parameters enter the mixing function and in what way. C-STAR models address
1 Although conditional Gaussianity is used as a convenient assumption in much of what follows, $\Phi(\cdot)$ can be replaced with another continuous distribution function.
2 It is perhaps worth noting here that the C-STAR model allows for realizations of $x_{1t}$ and $x_{2t}$ such that $x_{1t} \geq x^*$ and $x_{2t} < x^*$. To illustrate the point numerically, suppose that $x_{1t} = -0.5 + 0.6x_{t-1} + 3u_t$ and $x_{2t} = -0.5 + 0.9x_{t-1} + 3u_t$, with $u_t \sim N(0, 1)$; assume further that $x_{t-1} = 5$ and $x^* = 10$. Then, the mixing weights are $P(x_{1t} < x^* \mid z_{t-1}) = P(3u_t < x^* + 0.5 - 0.6x_{t-1} \mid z_{t-1}) = \Phi(2.5) = 0.994$ and $P(x_{2t} \geq x^* \mid z_{t-1}) = P(3u_t \geq x^* + 0.5 - 0.9x_{t-1} \mid z_{t-1}) = 1 - \Phi(1.6666667) = 0.0478$, so that $G(z_{t-1}) = 0.9541$. Hence, conditionally on $x_{t-1} = 5$, the C-STAR model assigns a large weight to the regime associated with $x_{1t}$, so that most of the area of the regime-specific conditional distribution is below the threshold and very little of the area associated with the other regime is above the threshold. It is not, therefore, against the logic of the model to obtain a realization such as $x_{2t} < x^*$ (which is very likely to happen); the identifying conditions of the model imply that the weight given to the regime associated with $x_{2t}$ is going to be small whenever the realizations of $x_{2t}$ such that $x_{2t} < x^*$ are likely to occur.
these twin issues in an intuitive way: they use a forecasting function such that the mixing function depends on the ex-ante regime-dependent probabilities that $x_t$ will exceed the threshold value(s). Furthermore, the mixing function makes use of all of the model parameters in a coherent way.

3. Multivariate contemporaneous-threshold models

In this section, we present a C-MSTAR model which is capable of both separating different regimes in terms of the probability of regime-specific latent variables being greater (or smaller) than relevant thresholds as well as allowing the interaction and feedback relationships between variables to differ between regimes. We begin by defining the model and then proceed to investigate some of its properties.

3.1. Definition

The C-MSTAR model belongs to the class of multivariate STAR models. An $n$-variate (conditionally heteroskedastic) STAR process $\{y_t\}$ with $m$ regimes may be defined as
$$y_t = \sum_{i=1}^{m} G_i(z_{t-1})\,y_{it}, \qquad t = 1, 2, \ldots, \tag{3}$$
where $G_i(\cdot)$ ($i = 1, \ldots, m$) are continuous functions taking values in [0, 1], $z_{t-1}$ is a vector of exogenous and/or pre-determined variables, and
$$y_{it} = \mu_i + \sum_{j=1}^{p} A_j^{(i)} y_{t-j} + \Sigma_i^{1/2} u_t, \qquad i = 1, \ldots, m. \tag{4}$$
In (4), $p$ is a positive integer, $\{u_t\}$ is a sequence of i.i.d. $n$-dimensional random vectors with $E(u_t) = 0$, $E(u_t u_t') = I_n$ ($I_n$ being the identity matrix of order $n$) and $u_t$ independent of $(y_{1-p}, \ldots, y_0)$, $\mu_i$ ($i = 1, \ldots, m$) are $n$-dimensional vectors of intercepts, $A_j^{(i)}$ ($i = 1, \ldots, m$; $j = 1, \ldots, p$) are $n \times n$ coefficient matrices, and $\Sigma_i$ ($i = 1, \ldots, m$) are symmetric, positive definite $n \times n$ matrices.3

For simplicity and clarity of exposition, we shall focus hereafter on the bivariate, first-order C-MSTAR model, i.e., the model with $n = 2$, $m = 4$, and $p = 1$. To define this model, let
$$y_t = (x_t, w_t)', \qquad y_{it} = (x_{it}, w_{it})', \quad i = 1, \ldots, 4,$$
$$y_1^* = (x^*, w^*)', \quad y_2^* = (x^*, -w^*)', \quad y_3^* = (-x^*, w^*)', \quad y_4^* = (-x^*, -w^*)',$$
where $x^*$ and $w^*$ are threshold parameters, and $x_{it}$ and $w_{it}$ ($i = 1, \ldots, 4$) are latent regime-specific random variables. Then, $\{y_t\}$ is said to follow a (conditionally) Gaussian, first-order C-MSTAR model if it satisfies (3)–(4) with $u_t \sim N(0, I_2)$, $z_{t-1} = y_{t-1}$, and
$$G_i(z_{t-1}) = (1/\delta_t)\,\Phi_2(\Sigma_i^{-1/2}\{y_i^* - \mu_i - A_1^{(i)} y_{t-1}\}), \qquad i = 1, \ldots, 4, \tag{5}$$
where $\Phi_2(\cdot)$ denotes the $N(0, I_2)$ distribution function and
$$\delta_t = \sum_{i=1}^{4} \Phi_2(\Sigma_i^{-1/2}\{y_i^* - \mu_i - A_1^{(i)} y_{t-1}\}). \tag{6}$$

3 For a symmetric, positive definite matrix $C$, $C^{1/2}$ denotes its symmetric, positive definite square root.

It can be readily seen that
$$\begin{aligned}
G_1(z_{t-1}) &= (1/\delta_t)\,P(x_{1t} < x^*, w_{1t} < w^* \mid y_{t-1}; \theta_1), \\
G_2(z_{t-1}) &= (1/\delta_t)\,P(x_{2t} < x^*, w_{2t} \geq w^* \mid y_{t-1}; \theta_2), \\
G_3(z_{t-1}) &= (1/\delta_t)\,P(x_{3t} \geq x^*, w_{3t} < w^* \mid y_{t-1}; \theta_3), \\
G_4(z_{t-1}) &= (1/\delta_t)\,P(x_{4t} \geq x^*, w_{4t} \geq w^* \mid y_{t-1}; \theta_4),
\end{aligned}$$
where $\theta_i = (\mu_i', \mathrm{vec}(A_1^{(i)})', \mathrm{vech}(\Sigma_i)')'$ is the vector of parameters associated with regime $i$. Hence the mixing functions $G_i(\cdot)$ reflect the weighted probabilities that the regime-specific latent variables $x_{it}$ and $w_{it}$ are above or below the respective thresholds $x^*$ and $w^*$.

The first-order model above can be generalized straightforwardly to the case of $p \geq 2$ lags. Furthermore, although we do not pursue this modelling strategy here, the number of lags in (4) may be allowed to differ over $i$ and thus be regime-specific.4 Regarding the number of regimes $m$, it should be remembered that $m$ is always determined by the dimension $n$ of the C-MSTAR model. When $n = 2$, we have $m = 4$ by construction since there are four possible states of nature defined by the regime-specific latent variables and the thresholds, namely $\{x_{1t} < x^*, w_{1t} < w^*\}$, $\{x_{2t} < x^*, w_{2t} \geq w^*\}$, $\{x_{3t} \geq x^*, w_{3t} < w^*\}$, and $\{x_{4t} \geq x^*, w_{4t} \geq w^*\}$. Since each of the $n$ latent variables can lie either below or above its threshold, a model with $n = 3$ has $m = 2^3 = 8$ regimes, and so on.5 Finally, as in the univariate case, a (conditionally) non-Gaussian C-MSTAR model can be obtained by replacing $\Phi_2(\cdot)$ in (5)–(6) by the distribution function $\Psi(\cdot)$, say, of another continuous distribution on $\mathbb{R}^2$ (having mean vector 0 and covariance matrix $I_2$). The interpretation of the model remains the same as long as $u_t$ is assumed to be distributed according to $\Psi(\cdot)$.

3.2. Distributional characteristics

To gain an understanding of the behavior of C-MSTAR time series, we illustrate some properties of the C-MSTAR model by using artificial data obtained from the data-generating processes (DGPs) given in Table 1. These DGPs have been chosen to highlight some important features of the model related to: (i) the response of the mixing function to changes in the parameters of the model; and (ii) the empirical distribution of C-MSTAR data. The errors $u_t$ are contemporaneously uncorrelated under DGP-1, while DGP-2 and DGP-3 allow for positive and negative contemporaneous correlation, respectively. Fig.
1 shows the conditional density functions of the latent regime-specific random vectors yit (i = 1, . . . , 4) for DGP-1, given yt −1 = (0.4, 0.6)′ , along with the threshold y∗1 = (0.4, 0.6)′ and the values of the mixing functions Gi (yt −1 ). Each plot shows the relevant area of the density (suitably rotated) for which each regime is defined. The regime-specific conditional means are E(y1t |yt −1 ) = (0.35, 0.57)′ , E(y2t |yt −1 ) = (0.29, 0.6)′ , E(y3t |yt −1 ) = (0.59, 0.39)′ , and E(y4t |yt −1 ) = (0.43, 0.66)′ .
4 In either case, the number of lags may be selected adaptively by using complexity-penalized likelihood criteria (see Kapetanios (2001) and Psaradakis and Spagnolo (2006) for related results concerning univariate nonlinear autoregressive models).
5 Needless to say, the number of parameters in a C-MSTAR model increases considerably with the dimension of the model, and hence with the number of regimes (a problem which is, of course, common to many of the multiple-regime multivariate models mentioned in Section 1). One way of dealing with this difficulty may be to allow only some of the components of $y_{it}$ in (4) to have regime-specific dynamics. To give an example, suppose that $y_t = (x_t, w_t, r_t)'$, where $x_t$ is output growth, $w_t$ is inflation and $r_t$ is the change in the exchange rate; since periods of high inflation are likely to coincide with periods of devaluation, one might allow the dynamics of output and of only one of the other two variables to be regime-specific. An alternative approach may be to consider a two-regime model in which the regimes are defined in terms of a linear combination of the latent variables being greater (or smaller) than a linear combination of the thresholds. The former approach has the advantage that the regimes have a clear economic interpretation.
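Table 1
Data-generating processes.

DGP-1
$$\mu_1 = \begin{bmatrix} -0.05 \\ -0.05 \end{bmatrix}, \quad A_1^{(1)} = \begin{bmatrix} 0.80 & 0.05 \\ 0.10 & 0.90 \end{bmatrix}, \quad \Sigma_1 = I_2$$
$$\mu_2 = \begin{bmatrix} -0.05 \\ 0.05 \end{bmatrix}, \quad A_1^{(2)} = \begin{bmatrix} 0.75 & -0.05 \\ 0.05 & 0.85 \end{bmatrix}, \quad \Sigma_2 = I_2$$
$$\mu_3 = \begin{bmatrix} 0.15 \\ -0.05 \end{bmatrix}, \quad A_1^{(3)} = \begin{bmatrix} 0.75 & -0.30 \\ 0.20 & 0.85 \end{bmatrix}, \quad \Sigma_3 = I_2$$
$$\mu_4 = \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix}, \quad A_1^{(4)} = \begin{bmatrix} 0.90 & -0.10 \\ 0.01 & 0.90 \end{bmatrix}, \quad \Sigma_4 = I_2$$
$$(x^*, w^*) = (0.6, -0.4)$$

DGP-2
Intercepts, autoregressive coefficients and threshold parameters are the same as for DGP-1.
$$\Sigma_1 = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}, \quad \Sigma_3 = \begin{bmatrix} 1 & 0.3 \\ 0.3 & 1 \end{bmatrix}, \quad \Sigma_4 = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$$

DGP-3
Intercepts, autoregressive coefficients and threshold parameters are the same as for DGP-1.
$$\Sigma_1 = \begin{bmatrix} 1 & -0.9 \\ -0.9 & 1 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}, \quad \Sigma_3 = \begin{bmatrix} 1 & -0.3 \\ -0.3 & 1 \end{bmatrix}, \quad \Sigma_4 = \begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}$$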
It can be seen that the values of the mixing weights Gi (yt −1 ) depend on the values of the regime-specific conditional means relative to the threshold. More specifically, the larger the area of the conditional distribution which lies above the threshold is, the larger Gi (yt −1 ) is. In our example, we have G1 (yt −1 ) = 0.09, G2 (yt −1 ) = 0.48, G3 (yt −1 ) = 0.09, and G4 (yt −1 ) = 0.34. Conditioning on yt −1 = (−1.5, −2)′ yields the density functions shown in Fig. 2. The regime-specific conditional means now are E(y1t |yt −1 ) = (−1.44, −1.97)′ , E(y2t |yt −1 ) = (−1.26, −1.97)′ , E(y3t |yt −1 ) = (−1.37, −1.35)′ , and E(y4t |yt −1 ) = (−1.31, −1.59)′ . The mixing functions take the values G1 (yt −1 ) = 0.88, G2 (yt −1 ) = 0.1, G3 (yt −1 ) = 0.02, and G4 (yt −1 ) = 0. It is not surprising that the regime associated with G1 (·) is now the most prominent regime since the distance of E(y1t |yt −1 ) from each of the thresholds is about one standard deviation. The results for DGP-2 and DGP-3 can be summarized as follows. When we condition on yt −1 = (0.4, 0.6)′ , the values of the mixing functions do not change substantially as a result of the change in the shape of the conditional distributions (for brevity, the relevant plots are not included here). We have G1 (yt −1 ) = 0, G2 (yt −1 ) = 0.52, G3 (yt −1 ) = 0.11, and G4 (yt −1 ) = 0.36 under DGP-2 (positive contemporaneous correlation), while G1 (yt −1 ) = 0, G2 (yt −1 ) = 0.54, G3 (yt −1 ) = 0.07, and G4 (yt −1 ) = 0.38 under DGP-3 (negative contemporaneous correlation). Interestingly, the change in the sign of the correlation coefficient results in marginal changes in the values of the mixing functions; it is the location of the conditional means relative to the thresholds and the dispersion of the conditional densities that are of primary importance as far as the mixing weights are concerned. Similar results are obtained when we condition on yt −1 = (−1.5, −2)′ .
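The dependence of the mixing weights on the position of the regime-specific conditional means relative to the thresholds can be illustrated with a short sketch. It is written under simplifying assumptions that should be kept in mind: every $\Sigma_i$ is taken to be the identity matrix (as under DGP-1), so that each regime probability factors into a product of univariate normal probabilities, and the weights are computed directly from the probability representation of $G_1, \ldots, G_4$ rather than from (5). The function names and the parameter values in the usage lines are hypothetical and are not meant to reproduce the figures reported in the text.

```python
from math import erf, sqrt


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def mixing_weights(y_prev, mus, As, x_star, w_star):
    """Mixing weights (G_1, ..., G_4) of a bivariate first-order C-MSTAR
    model, assuming Sigma_i = I_2 in every regime (illustrative only).

    Regimes are ordered as in the text:
    1: x < x*, w < w*;  2: x < x*, w >= w*;
    3: x >= x*, w < w*; 4: x >= x*, w >= w*.
    """
    below = [(True, True), (True, False), (False, True), (False, False)]
    probs = []
    for mu, A, (x_below, w_below) in zip(mus, As, below):
        # Regime-specific conditional mean mu_i + A_1^(i) y_{t-1}.
        mean_x = mu[0] + A[0][0] * y_prev[0] + A[0][1] * y_prev[1]
        mean_w = mu[1] + A[1][0] * y_prev[0] + A[1][1] * y_prev[1]
        px = norm_cdf(x_star - mean_x)  # P(x_it < x*)
        pw = norm_cdf(w_star - mean_w)  # P(w_it < w*)
        probs.append((px if x_below else 1.0 - px) *
                     (pw if w_below else 1.0 - pw))
    delta = sum(probs)  # normalizing constant, as in (6)
    return [p / delta for p in probs]


# Hypothetical parameters: identical dynamics in all four regimes.
mus = [(0.0, 0.0)] * 4
As = [[[0.9, 0.0], [0.0, 0.9]]] * 4
print(mixing_weights((-5.0, -5.0), mus, As, x_star=0.0, w_star=0.0))
print(mixing_weights((0.0, 0.0), mus, As, x_star=0.0, w_star=0.0))
```

In this hypothetical setting, conditioning on a value of $y_{t-1}$ far below both thresholds puts almost all the weight on the first regime, whereas conditioning on a value at the thresholds spreads the weight evenly across the four regimes.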
3.3. Stability

3.3.1. Probabilistic properties

In this sub-section, we examine some probabilistic properties of the C-MSTAR model. Specifically, we give conditions under which the C-MSTAR model is stable in the sense of having a Markovian representation which is geometrically ergodic.6 For simplicity and clarity of exposition, the discussion is once again focused on the Gaussian, bivariate, first-order C-MSTAR model.

The stability concept considered here is that of $Q$-geometric ergodicity of a Markov chain introduced by Liebscher (2005). To recall the definition of this concept, suppose that $\{\xi_t\}_{t \geq 0}$ is a Markov chain on a general state space $S$ with $k$-step transition probability kernel $P^{(k)}(\cdot, \cdot)$ and an invariant distribution $\pi(\cdot)$, so that $P^{(k)}(v, B) = P(\xi_k \in B \mid \xi_0 = v)$ and $\pi(B) = \int_S P^{(1)}(v, B)\,\pi(dv)$ for any Borel set $B$ in $S$ and $v \in S$. If there exists a non-negative function $Q(\cdot)$ on $S$ satisfying $\int_S Q(v)\,\pi(dv) < \infty$ and positive constants $a_1$, $a_2$ and $\gamma < 1$ such that, for all $v \in S$,
$$\| P^{(k)}(v, \cdot) - \pi(\cdot) \|_\tau \leq \{a_1 + a_2 Q(v)\}\gamma^k, \qquad k = 1, 2, \ldots,$$
where $\|\cdot\|_\tau$ denotes the total variation norm,7 then $\{\xi_t\}$ is said to be $Q$-geometrically ergodic. Geometric ergodicity entails that the total variation distance between the probability measures $P^{(k)}(v, \cdot)$ and $\pi(\cdot)$ converges to zero geometrically fast (as $k \to \infty$) for all $v \in S$. It is well known that, if the initial value $\xi_0$ of the Markov chain has a distribution $\pi(\cdot)$, then geometric ergodicity implies strict stationarity of $\{\xi_t\}$. Furthermore, provided that $\xi_0$ is such that $Q(\xi_0)$ is integrable with respect to $\pi(\cdot)$, $Q$-geometric ergodicity implies that $\{\xi_t\}$ is Harris ergodic (i.e., aperiodic, irreducible and positive Harris recurrent), as well as absolutely regular (or $\beta$-mixing) with a geometric mixing rate (see Liebscher (2005, Proposition 4)). Such ergodicity and mixing properties are of great importance for the purposes of statistical inference in dynamic models since they ensure the validity of many conventional limit theorems (see, e.g., Doukhan (1994)).

6 For a comprehensive account of the stability and convergence theory of Markov chains the reader is referred to Meyn and Tweedie (2009).
7 Note that $\| P^{(k)}(v, \cdot) - \pi(\cdot) \|_\tau = 2 \sup_B | P^{(k)}(v, B) - \pi(B) |$.

To give sufficient conditions for $Q$-geometric ergodicity of a C-MSTAR process, the concept of the joint spectral radius of a set of matrices is needed. Suppose that $\mathcal{C}$ is a set of real, square matrices
[Figure 1: four surface plots of the conditional densities of the latent regime-specific vectors, with conditional means given information at time t−1 shown in the panel titles. Regime 1: E(x1t) = 0.35, E(w1t) = 0.57. Regime 2: E(x2t) = 0.29, E(w2t) = 0.6. Regime 3: E(x3t) = 0.59, E(w3t) = 0.39. Regime 4: E(x4t) = 0.43, E(w4t) = 0.66.]
Fig. 1. DGP-1: distributions conditional on $x_{t-1} = 0.4$ and $w_{t-1} = 0.6$, $G_1(y_{t-1}) = 0.09$, $G_2(y_{t-1}) = 0.48$, $G_3(y_{t-1}) = 0.09$, $G_4(y_{t-1}) = 0.34$, $x^* = 0.6$, $w^* = -0.4$.
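and let $\mathcal{C}_h$ be the set of all products of length $h \geq 1$ of the elements of $\mathcal{C}$. The joint spectral radius of $\mathcal{C}$ is then defined as
$$\rho(\mathcal{C}) = \limsup_{h \to \infty} \Big( \sup_{C \in \mathcal{C}_h} \|C\| \Big)^{1/h}, \tag{7}$$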
where $\|\cdot\|$ is an arbitrary matrix norm. We note that the value of $\rho(\mathcal{C})$ is independent of the choice of matrix norm and that, if the set $\mathcal{C}$ trivially consists of a single matrix, then $\rho(\mathcal{C})$ coincides with the ordinary spectral radius (i.e., the maximal modulus of the eigenvalues of the matrix).8

The first-order C-MSTAR model defined by (3)–(6) belongs to the family of models studied by Liebscher (2005). By appealing to Theorem 2 and Proposition 5 in that paper, the following proposition is readily established.9 Here and in the sequel, $\|\cdot\|$ is used to denote the Euclidean vector norm and its subordinate matrix norm (i.e., $\|v\| = (v'v)^{1/2}$ and $\|C\| = \sup_{\|v\|=1} \|Cv\|$, for an $n$-dimensional vector $v$ and an $n \times n$ matrix $C$).

8 Also note that the norm of $C$ in the definition of $\rho(\mathcal{C})$ in (7) may be replaced by the spectral radius of $C$ as long as $\mathcal{C}$ is a finite or bounded set.
9 It can be easily seen that, under the conditions of Proposition 1, the nonlinear functions that specify the conditional mean and conditional variance of $y_t$, given $y_{t-1}$, satisfy the assumptions in Section 4 of Liebscher (2005).

Proposition 1. Suppose that, for every compact subset $B$ of $\mathbb{R}^2$, there exist positive constants $b_1$ and $b_2$ such that $\|\Sigma(v)^{-1}\| \leq b_1$ and $|\det\{\Sigma(v)\}| \leq b_2$ for all $v \in B$, where $\Sigma(v) = \sum_{i=1}^{4} G_i(v)\,\Sigma_i^{1/2}$. If, in addition, the set $\mathcal{A} = \{A_1^{(1)}, A_1^{(2)}, A_1^{(3)}, A_1^{(4)}\}$ is such that $\rho(\mathcal{A}) < 1$, then the first-order C-MSTAR process $\{y_t\}$ is a $Q$-geometrically ergodic Markov chain with $Q(v) = \|v\|$.

It follows from our earlier discussion that $\rho(\mathcal{A}) < 1$ guarantees the existence of a unique invariant distribution for $\{y_t\}$ with respect to which $E(\|y_t\|) < \infty$; furthermore, if $\{y_t\}$ is initialized from this invariant distribution, then it is strictly stationary, as well as absolutely regular and hence ergodic (in the sense of ergodic theory). We also note that the conclusion of Proposition 1 remains true for a non-Gaussian C-MSTAR model in which the distribution of the noise $u_t$ admits a positive Lebesgue density on $\mathbb{R}^2$. Finally, it is worth pointing out that Liebscher’s (2005) approach, on which we have relied here, delivers conditions for geometric ergodicity which can sometimes be weaker than alternative sufficient conditions (cf. Liebscher (2005, p. 682)). A practical difficulty, however, is that exact or approximate computation of the joint spectral radius of a set of matrices is not an easy task, not
M.J. Dueker et al. / Journal of Econometrics 160 (2011) 311–325
[Fig. 2 panels (axis tick values omitted): Regime 1: E(x1t||t–1) = –1.44, E(w1t||t–1) = –1.97. Regime 2: E(x2t||t–1) = –1.26, E(w2t||t–1) = –1.61. Regime 3: E(x3t||t–1) = –1.37, E(w3t||t–1) = –1.35. Regime 4: E(x4t||t–1) = –1.31, E(w4t||t–1) = –1.59.]
Fig. 2. DGP-1: Distributions conditional on xt−1 = −1.5 and wt−1 = −2, G1(yt−1) = 0.88, G2(yt−1) = 0.1, G3(yt−1) = 0.02, G4(yt−1) = 0.0, x∗ = 0.6, w∗ = −0.4.
even in the simplest non-trivial case of a two-element set (see, e.g., Tsitsiklis and Blondel (1997)).10 One possibility is to use the algorithm presented in Gripenberg (1996) to obtain an arbitrarily small interval within which the joint spectral radius of A lies. An alternative approach, which may also provide useful information about the model in cases where the condition of Proposition 1 is not fulfilled, is to use simulation methods to investigate the properties of the skeleton of the C-MSTAR model. We turn our attention to this topic next.

3.3.2. Skeleton of the model

As shown by Chan and Tong (1985), the stability properties of a nonlinear dynamic model may be analyzed by considering the noiseless part, or skeleton, of the model alone (see also Tong (1990)). The skeleton of the bivariate first-order C-MSTAR model
10 The problem of determining whether ρ(A) < 1 is, in fact, known to be NP-hard; that is, it cannot be solved in a number of steps which is a polynomial function of the size of A unless P = NP. It should also be remembered that the condition that each of the matrices in A has a spectral radius less than unity is necessary but not sufficient for ρ(A) < 1. A useful summary of some of the methods available for computing or approximating the joint spectral radius of a set of matrices can be found in Jungers (2009).
is the dynamic system

$$y_t = f(y_{t-1}, \theta), \tag{8}$$

where

$$f(y_{t-1}, \theta) = \sum_{i=1}^{4} G_i(y_{t-1})\,(\mu_i + A_1^{(i)} y_{t-1}) \tag{9}$$

and θ denotes the vector of all the parameters of the model. A fixed point of the skeleton is any two-dimensional vector $y_e$ satisfying the equation

$$f(y_e, \theta) = y_e, \tag{10}$$

and $y_e$ is said to be an equilibrium point of the C-MSTAR model. Since the model is nonlinear, there may, of course, exist one, several or no equilibrium points satisfying (10). By a first-order Taylor expansion of $f(y_{t-1}, \theta)$ about the point $y_e$, we have

$$y_t - y_e = f(y_{t-1}, \theta) - f(y_e, \theta) \approx D(y_e)'(y_{t-1} - y_e), \tag{11}$$

where

$$D(y_e) = \left.\frac{\partial f(y_{t-1}, \theta)}{\partial y_{t-1}}\right|_{y_{t-1} = y_e}. \tag{12}$$
[Fig. 3 panels (horizontal axis: observations 0 to 200; axis tick values omitted): (a) Simulated data using DGP-1: C-MSTAR for x (skeleton for x, simulated x data, threshold value x∗); (b) Simulated data using DGP-1: C-MSTAR for w (skeleton for w, simulated w data, threshold value w∗); (c) Mixing functions: weights of regimes 1 and 2, G1(yt−1) and G2(yt−1); (d) Mixing functions: weights of regimes 3 and 4, G3(yt−1) and G4(yt−1).]
Fig. 3. Generated data using DGP-1. Simulated skeleton, data and mixing functions using the C-MSTAR model.
Thus, the local stability of each equilibrium point ye may be assessed by considering the spectrum of D(ye ). More specifically, if the spectral radius of D(ye ) is less than unity, then the equilibrium is locally stable and yt is a contraction in a neighborhood of ye . It can be readily verified that
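$$\frac{\partial f(y_{t-1}, \theta)}{\partial y_{t-1}} = \sum_{i=1}^{4} \left\{ \frac{\partial G_i(y_{t-1})}{\partial y_{t-1}} (\mu_i + A_1^{(i)} y_{t-1})' + G_i(y_{t-1}) (A_1^{(i)})' \right\} \tag{13}$$

and

$$\frac{\partial G_i(y_{t-1})}{\partial y_{t-1}} = \frac{1}{\delta_t^2} \left\{ -\delta_t (\Sigma_i^{-1/2} A_1^{(i)})' \nabla\Phi_2(v_i) + \Phi_2(v_i) \sum_{j=1}^{4} (\Sigma_j^{-1/2} A_1^{(j)})' \nabla\Phi_2(v_j) \right\}, \tag{14}$$

where $v_i = \Sigma_i^{-1/2}(y_i^* - \mu_i - A_1^{(i)} y_{t-1})$ and $\nabla\Phi_2(v_i)$ is the gradient vector of $\Phi_2(\cdot)$ at $v_i$.11

3.3.3. Numerical examples

A wide variety of empirical distributions and time series can be generated by a C-MSTAR model. In Fig. 3 we show, using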
11 Notice that, since $\Phi_2(v_i) = \Phi(v_{1i})\Phi(v_{2i})$ for any $v_i = (v_{1i}, v_{2i})' \in \mathbb{R}^2$, $\nabla\Phi_2(v_i)$ may be computed as $\nabla\Phi_2(v_i) = (\phi(v_{1i})\Phi(v_{2i}), \Phi(v_{1i})\phi(v_{2i}))'$, where φ(·) is the standard normal density function.
DGP-1 presented in Table 1, typical data generated according to a first-order C-MSTAR model, the corresponding mixing functions Gi(yt−1), and the skeleton yt. The corresponding plots for DGP-2 and DGP-3 (computed using the same realizations of ut as for DGP-1) are omitted in order to conserve space. When the covariance matrix of the noise is diagonal (DGP-1), the data appear to take values which correspond to all the regimes. When, on the other hand, there is a positive contemporaneous correlation (DGP-2), the generated data assume values which are mostly associated with regimes 1 and 4 (corresponding to G1(·) and G4(·)), while regimes 2 and 3 (associated with G2(·) and G3(·)) appear to dominate in the presence of negative contemporaneous correlation (DGP-3). In all three cases, the skeleton converges to its fixed point very quickly. Using numerical simulations, we found the fixed point ye to be unique for each DGP, taking the value (0.0251, 0.2309)′, (0.0539, 0.3828)′ and (−0.1052, −0.0451)′ for DGP-1, DGP-2 and DGP-3, respectively. To assess the stability of these fixed points, we compute the spectral radius of the matrix of partial derivatives given in (12) using the expansion in (13)–(14). The spectral radius of D(ye) is 0.8357, 0.8320 and 0.8296 under DGP-1, DGP-2 and DGP-3, respectively, suggesting that the equilibrium points are locally stable. Furthermore, the Q-geometric ergodicity condition of Proposition 1 is also satisfied for these DGPs: an application of the algorithm in Gripenberg (1996) yields 0.9366025 < ρ(A) < 0.9366125.12
12 The algorithm is implemented using Gustaf Gripenberg’s MATLAB code, which is available at http://math.tkk.fi/~ggripenb/ggsoftwa.htm.
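The fixed-point search and local stability check used in the numerical examples can be sketched as follows. This is only a minimal illustration in Python with NumPy, not the code used for the reported results: the map `toy_skeleton`, its parameters `MU` and `A`, and its logistic mixing weight are hypothetical stand-ins for the skeleton in (9), and the Jacobian is approximated by central differences instead of the analytical expressions (13)–(14).

```python
import numpy as np

# Hypothetical parameters of a toy two-regime skeleton (NOT taken from Table 1).
MU = [np.array([0.2, 0.1]), np.array([-0.3, 0.4])]
A = [np.array([[0.5, 0.1], [0.0, 0.4]]),
     np.array([[0.3, -0.2], [0.1, 0.6]])]


def toy_skeleton(y):
    """Noiseless map y_t = sum_i G_i(y_{t-1}) * (mu_i + A_i y_{t-1})."""
    g1 = 1.0 / (1.0 + np.exp(-y[0]))  # assumed smooth mixing weight in (0, 1)
    weights = (g1, 1.0 - g1)
    return sum(g * (m + a @ y) for g, m, a in zip(weights, MU, A))


def fixed_point(f, y0, tol=1e-12, max_iter=10_000):
    """Iterate the skeleton from y0 until successive iterates coincide."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        y_next = f(y)
        if np.linalg.norm(y_next - y) < tol:
            return y_next
        y = y_next
    raise RuntimeError("skeleton did not converge")


def numerical_jacobian(f, y, h=1e-6):
    """Central-difference approximation to the Jacobian of f at y."""
    n = y.size
    jac = np.empty((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        jac[:, j] = (f(y + step) - f(y - step)) / (2.0 * h)
    return jac


y_e = fixed_point(toy_skeleton, [0.0, 0.0])
# The Jacobian and its transpose share eigenvalues, so the spectral radius
# is the same whichever of the two conventions is used for D(y_e).
rho = max(abs(np.linalg.eigvals(numerical_jacobian(toy_skeleton, y_e))))
print("fixed point:", y_e, "spectral radius:", rho)
```

Starting the iteration from a grid of initial values, rather than from a single point, is one simple way to look for multiple equilibria.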
4. Estimation and testing
4.1. ML parameter estimation

As in the univariate case, once the distribution of the noise ut is specified, the parameters of the C-MSTAR model can be estimated by the ML method. Letting Ψ(·) denote the distribution function of ut, we assume that Ψ(·) admits a positive Lebesgue density ψ(·) on R2. Then, for a sample (y0, y1, . . . , yT) of consecutive observations from the bivariate first-order C-MSTAR model characterized by the parameter vector $\theta = (\theta_1', \theta_2', \theta_3', \theta_4', x^*, w^*)' \in \Theta \subset \mathbb{R}^{\dim(\theta)}$, we define the log-likelihood function (conditional on y0) as

$$L_T(\theta) = \sum_{t=1}^{T} \ln \ell_t(\theta),$$

where

$$\ell_t(\theta) = \sum_{i=1}^{4} G_i(y_{t-1}) \det(\Sigma_i^{-1/2})\, \psi(\Sigma_i^{-1/2}\{y_t - \mu_i - A_1^{(i)} y_{t-1}\}),$$

and the mixing weights Gi(yt−1) are given by (5)–(6) with Ψ(·) used in the place of Φ2(·). If $\ell_t(\theta)$ is sufficiently smooth with respect to θ and satisfies suitable heterogeneity, dependence and moment conditions, then standard asymptotic results hold for the ML estimator $\hat{\theta}$ of θ, obtained as the maximizer of $(1/T)L_T(\theta)$ over Θ. More specifically, $\hat{\theta}$ is strongly consistent for the (unknown) true value $\theta_0$ of the parameter θ and $\{-\nabla^2 L_T(\hat{\theta})\}^{1/2}(\hat{\theta} - \theta_0)$ is asymptotically normal with mean vector 0 and identity covariance matrix, where $\nabla^2 L_T(\hat{\theta})$ is the Hessian matrix of $L_T(\theta)$ evaluated at $\theta = \hat{\theta}$. Sufficient conditions which ensure the validity of these asymptotic results are given in the Appendix, together with a proof.

4.2. Finite-sample properties of ML

To throw some light on the finite-sample properties of the ML estimator of the parameters of a C-MSTAR model, we now conduct an extensive simulation study. The DGP used in the experiments is the bivariate first-order C-MSTAR model with Gaussian noise and several parameter configurations. To conserve space, we only report results for the three parameter configurations listed in Table 1 and sample sizes T = 200 and T = 800.13 Experiments proceed by first generating 50 + T data points for yt with initial values set to zero; the first 50 data points are then discarded in order to eliminate start-up effects, while the remaining T points are used to estimate the parameters of the model. The ML estimate $\hat{\theta}$ is obtained by means of a quasi-Newton algorithm that approximates the Hessian according to the Broyden–Fletcher–Goldfarb–Shanno update computed from numerical derivatives. Approximate standard errors for the elements of $\hat{\theta}$ are obtained from the inverted negative Hessian matrix of the log-likelihood function evaluated at the ML estimates. Since the computation of ML estimates is time-consuming (given the large number of parameters), the number of Monte Carlo replications per experiment is 2000.

In Tables 2–4, we report some of the characteristics of the finite-sample distributions of each of the elements of $\hat{\theta}$. These include the bias of the ML estimator, a measure of the accuracy of estimated standard errors as approximations to the sampling standard deviation of the ML estimator, and a test for the normality of the sampling distribution of the ML estimator.

For most parameters the bias is significantly different from zero only when T = 200. The size of the bias depends somewhat on the DGP. For example, while relatively large samples are needed to reduce the bias of µ3 (DGP-1 and DGP-2) and µ4 (DGP-3), we find that the bias of the elements of $A_1^{(1)}$ (DGP-1) and $A_1^{(2)}$ (DGP-2 and DGP-3) approaches zero even for relatively small sample sizes. Overall, the results show that the ML estimator is slightly biased only for the smallest sample size under consideration, and the bias clearly decreases as the sample size increases, becoming negligible in most cases when T = 800.

As a measure of the accuracy of estimated asymptotic standard errors, the ratio of the exact standard deviation of the ML estimates to the estimated standard errors averaged across replications for each design point is shown (in parentheses) in Tables 2–4. For most parameters, the estimated asymptotic standard errors are downward biased. These biases are not, however, substantial (even when T = 200) and should not have significant adverse effects on inference.

Finally, the Gaussianity of the finite-sample distributions of the ML estimates is assessed by means of a Kolmogorov–Smirnov goodness-of-fit test based on the difference between the empirical distribution function of the ML estimates (relocated and scaled so that the linearly transformed estimates have zero mean and unit variance) and the standard normal distribution function (see Lilliefors (1967)). As can be seen in Tables 2–4, the normality hypothesis for estimators other than µ3 and µ4 (DGP-1 and DGP-3) and x∗ (DGP-2) cannot be rejected (at the 5% level) for sample sizes larger than 200. Furthermore, we find that the values of the Kolmogorov–Smirnov statistic decrease as T increases, suggesting that the quality of the normal approximation is likely to improve with increasing sample sizes. In fact, while normality is rejected a few times when T = 200, it is never rejected when T = 800.

13 The full set of results is available upon request.

4.3. Testing for nonlinearity

Although a linear specification is nested within the C-MSTAR model, testing the former against the latter by means of conventional Wald, likelihood ratio or score tests is not straightforward because the threshold parameters (x∗ and w∗ in the bivariate case) are not identified under linearity. It is well known that in problems of this type the asymptotic distributions of conventional test statistics typically depend on unknown parameters and are nonstandard. As in Dueker et al. (2007), one may, in principle, adapt Hansen’s (1992) procedure to obtain asymptotic P-values for a suitably modified likelihood ratio statistic. However, the computational demands of this procedure are rather prohibitive in our multivariate setting because ML parameter estimation for each point of a grid involving a large number of parameters is required (dim(θ) = 38 when n = 2). As an alternative, we will investigate here an approach based on a general portmanteau-type test that is designed to detect nonlinearity of an unspecified type in a multivariate time series. The test in question was proposed by Harvill and Ray (1999) and is a multivariate extension of Tsay’s (1986) nonlinearity test.

To describe the test procedure, let $\{e_t\}$ be the least-squares residuals of a pth-order vector autoregressive (VAR) model for $\{y_t\}$ and $\{e_t^*\}$ be the least-squares residuals of the regression of the {np(np + 1)/2}-dimensional vector $q_t^* = \mathrm{vech}(q_t \otimes q_t')$ on the (np)-dimensional vector $q_t = (y_{t-1}', \ldots, y_{t-p}')'$, where ⊗ is the Kronecker product operator. Further, let S1 and S2 be the n × n matrices of residual sum of squares and regression sum of squares, respectively, in the least-squares regression of $e_t$ on $e_t^*$. Then, for a sample of size T, the Harvill–Ray test statistic is given by
$$\Re = \left(\frac{bd - nc + 1}{nc}\right)\left(\frac{1 - \omega^{1/2}}{\omega^{1/2}}\right),$$
Table 2 Finite-sample performance of ML: DGP-1. T = 200
0.008 0.085 0.006 −0.057 0.024 (1.055) (0.989) (1.054) (1) (1.019) (1.013) : 0.071 , A1 : 0.005 0.009 , 61 : −0.066 (1.042) (1.032) (0.995) (1.015) 0.077 −0.004 −0.023 −0.046 0.033 (1.061) (2) (1.022) (0.994) (1.049) (0.992) : 0.053 , A1 : 0.013 −0.006 , 62 : −0.041 (1.026) (0.996) (1.028) (1.076) 0.180 0.093 −0.072 0.083 0.030 (0.969) (0.955) (1.130)Ď (3) (1.043) (1.075) : 0.156 , A1 : 0.084 0.012 , 63 : −0.093 Ď (0.982) (1.068) (1.022) (1.106) 0.171 0 . 078 0.052 −0.066 0.065 (1.149)Ď (4) (1.029) (0.948) (0.950) (1.033) : 0.109 , A1 : 0.084 0.075 , 64 : −0.069 (0.982) (0.930) (1.041) (1.165)Ď
µ1
µ2
µ3
µ4
x∗ :
−0.042 −0.054 w ∗ : (1.061), (1.093)
T = 800
−0.007 (1.003) (1) : 0.010 , A1 (1.009) 0.011 (1.005) (2) : 0.004 , A1 (1.006) 0.077 (1.053) (3) : 0.111 , A1 (1.041) −0.088 (1.058) (4) : 0.061 , A1 (1.044)
µ1
µ2
µ3
µ4
x∗ :
−0.001 (1.011) : 0.006 (0.998) −0.002 (1.003) : 0.009 (0.999) 0.009 (1.018) : 0.022 (1.012) 0.019 (1.006) : 0.058 (0.992)
−0.005 −0.021 (1.019) (0.992) , 6 : 1 0.002 (0.996) 0.003 0.016 (1.008) (1.005) , 6 : 2 −0.002 (1.007) 0.044 −0.020 (1.006) (0.951) , 6 : 3 0.010 (1.040) −0.038 −0.006 (1.008) (1.056) , 6 : 4 −0.049 (1.031)
0.011 (0.997)
−0.020 (1.010) 0.010 (1.002) −0.005 (1.015) 0.060 (0.994) −0.052 (1.011) 0.008 (0.992) 0.029 (0.910)
−0.012 0.026 w ∗ : (1.009), (0.988)
For each ML estimator, entries are the finite-sample bias of the estimator and the ratio of the sampling standard deviation to the estimated standard error (in parentheses). Ď indicates that the Kolmogorov–Smirnov statistic for normality is significant at the 5% level.
where c = np(np + 1)/2, b = T − p − c − np − (n − c + 1)/2, $d = \{(n^2c^2 - 4)/(n^2 + c^2 - 5)\}^{1/2}$, and ω = det(S1)/det(S1 + S2). Under the null hypothesis that {yt} follows a (zero-mean) pth-order VAR model, ℜ has asymptotically a central F-distribution with nc and bd − (nc/2) + 1 degrees of freedom.

To assess whether a test based on ℜ has power to detect nonlinearity of the C-MSTAR type, we carry out some Monte Carlo experiments. Table 5 shows the empirical rejection frequencies of the test for C-MSTAR time series generated according to the three DGPs in Table 1. It is clear that, even for time series of a relatively short length, the test based on ℜ has considerable power to reject a first-order, linear VAR specification when the data come from a C-MSTAR model. It should be emphasized, however, that the results of a test based on ℜ should be interpreted with caution in an empirical setting since the test is not designed to be optimal against a C-MSTAR, or any other specific nonlinear alternative model, and can be expected to have non-trivial power against a wide range of nonlinear mechanisms. That being said, since the test appears to be powerful enough to detect nonlinearity of the C-MSTAR type, it should be useful as part of a modelling strategy which seeks to establish the usefulness of a C-MSTAR model by first checking a simpler linear VAR model for signs of misspecification. Of course, once the linear and C-MSTAR models are estimated, they can be compared by using a complexity-penalized likelihood criterion such as the well-known Akaike information criterion (AIC) or one of its many variants. Psaradakis et al. (2009) found such criteria to be useful when selecting among competing (univariate) nonlinear autoregressive models.

5. Empirical application

As an illustration, we analyze the low-frequency relationship between stock prices and interest rates. The interaction between asset prices and monetary policy is a topic which has attracted considerable interest in the literature (see, e.g., Bernanke and Gertler (1999, 2001) and Cecchetti et al. (2000)). Using a C-MSTAR model, we examine the possibly different effects that monetary policy may have on stock prices in different states of the economy. An interest rate shock may, for example, have very different effects on stock markets depending on whether the price-earnings ratio is (perceived to be) high or low. Our approach explicitly allows for four different regimes, which are associated with: (i) low price-earnings ratio, low interest rates; (ii) low price-earnings ratio, high
Table 3 Finite-sample performance of ML: DGP-2. T = 200
−0.038 −0.010 −0.005 0.050 −0.072 (1.053) (1.012) (1.047) (1) (1.020) (1.008) : 0.099 , A1 : 0.022 0.010 , 61 : −0.033 (1.040) (1.007) (1.033) (1.018) −0.051 0.010 −0.009 −0.031 0.005 (1.062) (2) (0.990) (0.998) (1.022) (0.999) : 0.072 , A1 : 0.004 −0.002 , 62 : −0.032 (1.009) (1.000) (1.004) (1.040) 0.201 0.106 0.041 −0.090 0.079 (0.920) (0.943) (1.301)Ď (3) (1.078) (0.929) : 0.210 , A1 : 0.079 0.062 , 63 : −0.101 Ď (1.061) (1.012) (1.078) (1.227) 0.010 0.016 0.002 0.032 0.011 (1.022) (1.000) (1.031) (4) (0.991) (0.997) : 0.080 , A1 : 0.008 0.002 , 64 : −0.015 (1.051) (0.998) (1.007) (1.029)
µ1
µ2
µ3
µ4
x∗ :
−0.055 −0.061 w ∗ : (1.072) (1.091)Ď ,
T = 800 0.020 (1.008)
µ1 : 0.007 , (1.001) 0.029 (0.996) µ2 : , 0.030 (1.002) 0.098 (1.033) µ3 : 0.056 , (1.058) 0.007 (1.009) µ4 : 0.023 , (1.007) x∗ :
(1) A1
(2 ) A 1
(3 ) A 1
(4 ) A 1
−0.001 (1.000) : 0.007 (1.002) −0.003 (0.996) : 0.002 (0.999) 0.026 (1.009) : 0.044 (1.006) −0.001 (1.003) : −0.003 (1.001)
0.007 (1.002)
−0.022 0.005 ( 1 . 010 ) ( 1 . 006 ) , 61 : −0.011 −0.005 (1.004) (1.014) 0.005 −0.009 −0.002 (1.005) (0.998) (0.999) , 62 : 0.002 −0.005 (1.002) (1.009) 0.031 −0.007 0.044 (1.082) ( 1 . 012 ) ( 1 . 013 ) , 63 : −0.008 0.049 (1.010) (1.051) 0.008 −0.001 −0.004 (1.004) (0.999) (1.002) , 64 : −0.002 −0.004 (0.997) (0.989)
0.012 −0.010 w ∗ : (0.993), (1.004)
For each ML estimator, entries are the finite-sample bias of the estimator and the ratio of the sampling standard deviation to the estimated standard error (in parentheses). Ď indicates that the Kolmogorov–Smirnov statistic for normality is significant at the 5% level.
interest rates; (iii) high price-earnings ratio, low interest rates; and (iv) high price-earnings ratio, high interest rates.

5.1. A C-MSTAR model for stock prices and interest rates

Our analysis is based on Robert Shiller’s well-known data set of annual observations, from 1900 to 2000, on the ratio of the Standard and Poor’s 500 composite stock price index to earnings per share (St) and the three-month Treasury Bill rate (Rt).14 We let st = St − µs and rt = Rt − µr denote the deviations of the two variables from their respective means. It is evident from Fig. 4 that, for long periods of time, both St and Rt take values well above their sample means (which are µs = 13.731 and µr = 4.809, respectively). It is also clear that both time series tend to remain above or below the respective sample mean for relatively long periods.15 It is reasonable to expect that the economy behaved differently
14 The data are available at http://www.econ.yale.edu/~shiller/data/chapt26.xls.
15 The hypothesis that St and Rt are random walks (with drift) is rejected in favor of a stationary STAR alternative using Eklund’s (2003) test statistic, which takes the values 6.38 and 2.68 for St and Rt, respectively.
in the 1970’s and 1980’s, when interest rates were relatively high and the price-earnings ratio was relatively low, and in periods such as the 1930’s and late 1990’s, when the price-earnings ratio was relatively high. When considering linear VAR models for (st , rt ), the AIC selects a first-order model. However, such a model is firmly rejected by the nonlinearity test discussed in Section 4.3: the value of ℜ is 7.44689, which has a zero asymptotic P-value. Since we use annual data, we expect the nonlinear dynamics of stock prices and interest rate to be adequately captured by a first-order model. Our analysis is based, therefore, on the C-MSTAR model defined by (3)–(6), with yt = (st , rt )′ , yit = (sit , rit )′ , x∗ = s∗ , w ∗ = r ∗ , m = 4, p = 1, and ut ∼ N (0, I2 ). ML estimates of the parameters of this model and their asymptotic standard errors are reported in Table 6.16 The standardized residuals of the model exhibit no signs of serial correlation on the basis of conventional Ljung–Box portmanteau tests. The estimated threshold parameters reported in the last row of Table 6 are s∗ = 3.40317 and r ∗ = −0.07214. Adding to these
16 The GAUSS code used to obtain these results is available from the authors upon request.
Table 4 Finite-sample performance of ML: DGP-3. T = 200 0.071 (1.033)
0.020 0.013 (0.967) (1.005) (1) , , µ1 : A : 6 : 1 0.060 1 0.002 0.001 −0.004 (1.006) (1.011) (0.998) (1.007)
0.004 0.007 (0.972) (1.015)
0.031 0.003 0.002 −0.048 0.037 (1.015) (2) (1.005) (1.001) (1.048) (0.990) µ2 : 0.032 , A1 : 0.003 0.002 , 62 : −0.047 (1.008) (0.999) (0.999) (1.033) 0.099 0.074 0.043 −0.081 0.045 (1.152)Ď (3) (1.019) (1.050) (1.042) (1.029) µ3 : 0.163 , A1 : 0.073 0.046 , 63 : −0.054 ( 1 . 069 ) ( 1 . 012 ) ( 1 . 033 ) (1.076) 0.165 0.081 0.056 −0.088 0.067 (1.262)Ď (4) (1.082) (1.079) (1.076) (1.029) µ4 : 0.111 , A1 : 0.085 0.050 , 64 : −0.078 Ď (1.076) (1.062) (1.050) (1.186)
x∗ :
0.022 −0.040 w ∗ : (1.031), (0.955)
T = 800
0.030 (1.002) : 0.043 , (1.007) 0.019 (1.002) : 0.001 , (1.003) 0.067 (1.047) : 0.101 , (1.032) 0.087 (1.052) : 0.020 , (1.019)
µ1
µ2
µ3
µ4
x∗ :
0.001 (1.002) : 0.000 (1.001) 0.001 (1.000) : 0.000 (1.001) 0.015 (1.006) : 0.022 (0.998) 0.012 (1.020) : −0.009 (0.967)
(1) A 1
(2) A 1
(3 ) A 1
(4 ) A 1
0.004 (1.009) , −0.006 (1.019) 0.001 (0.999) , 0.001 (0.999) 0.021 (1.036) , 0.017 (0.989) 0.030 (1.022) , 0.020 (1.012)
−0.010 0.002 (1.002) (1.002) 61 : −0.005 (1.006) −0.009 0.001 (1.001) (1.000) 62 : −0.004 (0.997) −0.031 0.010 (1.004) (1.007) 63 : −0.018 (1.009) 0.045 0.035 (1.009) (0.995) 64 : −0.021 (0.988)
0.006 −0.011 w ∗ : (1.009), (1.004)
For each ML estimator, entries are the finite-sample bias of the estimator and the ratio of the sampling standard deviation to the estimated standard error (in parentheses). Ď indicates that the Kolmogorov–Smirnov statistic for normality is significant at the 5% level.
Table 5 Power of nonlinearity test.

                  Nominal level
DGP      T        1%       5%       10%
DGP-1    100      82.20    88.84    91.28
DGP-1    200      86.36    91.68    93.48
DGP-1    400      93.80    96.20    97.60
DGP-1    800      99.04    99.48    99.60
DGP-2    100      78.08    86.76    90.20
DGP-2    200      85.16    92.36    95.04
DGP-2    400      95.16    97.84    98.68
DGP-2    800      99.64    99.88    99.96
DGP-3    100      93.28    96.08    97.24
DGP-3    200      98.96    99.56    99.68
DGP-3    400      99.96    100.0    100.0
DGP-3    800      100.0    100.0    100.0
Entries are percentage rejection frequencies of the Harvill–Ray test.
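The steps of the Harvill–Ray procedure described in Section 4.3 can be sketched in Python with NumPy as below. This is a hedged illustration rather than the implementation behind Table 5: the function names are hypothetical, T is taken to be the number of rows of the data matrix, the statistic is scaled by the ratio of the two degrees of freedom quoted in the text, and the exponent is written as 1/d (which equals 1/2 in the bivariate case n = 2); the last two choices are assumptions about the general form of the statistic.

```python
import numpy as np


def ols_residuals(Y, X):
    """Residuals from the least-squares regression of Y on X (no intercept)."""
    coef = np.linalg.lstsq(X, Y, rcond=None)[0]
    return Y - X @ coef


def harvill_ray(y, p=1):
    """Return (statistic, df1, df2) for a T x n data matrix y."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean(axis=0)                      # zero-mean VAR under the null
    T, n = y.shape
    m = n * p
    # Rows are q_t' = (y_{t-1}', ..., y_{t-p}') for the usable observations.
    Q = np.hstack([y[p - k - 1:T - k - 1] for k in range(p)])
    e = ols_residuals(y[p:], Q)                 # VAR(p) residuals
    rows, cols = np.tril_indices(m)
    Q_star = Q[:, rows] * Q[:, cols]            # vech of q_t q_t'
    e_star = ols_residuals(Q_star, Q)
    resid = ols_residuals(e, e_star)
    fitted = e - resid
    S1 = resid.T @ resid                        # residual sum of squares
    S2 = fitted.T @ fitted                      # regression sum of squares
    omega = np.linalg.det(S1) / np.linalg.det(S1 + S2)
    c = m * (m + 1) / 2
    b = T - p - c - m - (n - c + 1) / 2
    d = np.sqrt((n**2 * c**2 - 4) / (n**2 + c**2 - 5))
    df1 = n * c
    df2 = b * d - n * c / 2 + 1
    root = omega ** (1.0 / d)                   # assumed exponent 1/d
    stat = (df2 / df1) * (1.0 - root) / root    # assumed df2/df1 scaling
    return stat, df1, df2


# Illustrative data: a linear Gaussian VAR(1) with hypothetical coefficients.
rng = np.random.default_rng(0)
T = 200
y = np.zeros((T, 2))
A = np.array([[0.5, 0.1], [0.0, 0.4]])
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)
stat, df1, df2 = harvill_ray(y, p=1)
print(stat, df1, df2)
```

A P-value would then be obtained from the upper tail of an F-distribution with `df1` and `df2` degrees of freedom, using any statistical library that provides it.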
values the corresponding sample means µs and µr , we see that the estimated thresholds for the price-earnings ratio and the interest rate are 17.1343 and 4.73695, respectively. The bottom four panels of Fig. 4 plot the estimated mixing functions, for each point in sample, which specify the weight of regime 1 (associated with G1 (·)), regime 2 (associated with G2 (·)), regime 3 (associated with G3 (·)), and regime 4 (associated with G4 (·)). It is seen that the most prominent regime is the one characterized by a low price-earnings ratio and low interest rates (regime 1). This regime lasts from the mid 1930’s to the end of the 1960’s. Much of the 1970’s and 1980’s appears to be associated with a regime with low price-earnings ratio and high interest rates (regime 2), a regime which also seems to characterize a few years in the beginning of the 1900’s through 1930. The regime associated with high price-earnings ratio and low interest rates (regime 3) never lasts more than six years and is prevalent in only a few years during the 1930’s, 1960’s and 1990’s. Finally, the regime associated with high price-earnings ratio and high interest rates (regime 4) seems to dominate for only short periods of time towards the end of the 1960’s and the early 1990’s. Regarding the stability properties of the empirical model, we note that the ML estimates reported in Table 6 do not satisfy
[Fig. 4 panels (horizontal axis: 1900 to 2000; axis tick values omitted): (a) Price–Earnings Ratio, demeaned values (skeleton for x, threshold for the price–earnings ratio, price–earnings ratio); (b) Interest Rates, demeaned values (skeleton for w, threshold for the interest rates, interest rates); (c) Mixing function: weight of regime 1, G1(zt–1); (d) weight of regime 2, G2(zt–1); (e) weight of regime 3, G3(zt–1); (f) weight of regime 4, G4(zt–1).]
Fig. 4. Data and mixing functions using the C-MSTAR(1) model.
Table 6 ML estimates for a C-MSTAR model. Regime 1: low price-earnings ratio, low interest rate
−0.54000 0.07130 1.03181 −0.32729 2.72805 (0.72558) (1) (0.11118) (0.25067) (0.78069) (0.00967) µ1 = 0.52756 , A1 = 0.00836 1.11187 , 61 = 0.07130 0.02642 (0.07339) (0.01090) (0.02438) (0.00967) (0.04115)
Regime 2: low price-earnings ratio, high interest rate 0.52803 (0.34762)
0.90321 −0.03038 (0.08047) (0.10508)
3.61557 −0.29982 (0.78726) (0.74264)
(2) µ2 = 0.59252 , A1 = −0.15747 0.79461 , 62 = −0.29982 3.00545 (0.27681) (0.07017) (0.08365) (0.74264) (0.33928) Regime 3: high price-earnings ratio, low interest rate
0.19662 0.95186 1.04940 15.2194 −0.43588 (1.3002) (3) (0.20444) (0.44486) (4.90575) (0.00489) µ3 = −1.08184 , A1 = 0.10149 0.90033 , 63 = −0.43588 0.63892 (0.34393) (0.04864) (0.09666) (0.00489) (0.55733)
Regime 4: high price-earnings ratio, high interest rate
−3.73793 −0.46101 0.18350 22.9615 3.13327 (1.82851) (4) (0.25623) (0.29623) (10.4492) (0.00489) µ4 = −0.82893 , A1 = −0.13939 0.53549 , 64 = 3.13327 0.42766 (0.24954) (0.03497) (0.04044) (0.00489) (0.71372) 3.40317 −0.07214 s∗ = r∗ = (0.71359), (0.10159),
max L = −351.160, AIC = 778.320, BIC = 877.317, HQ = 818.386 Figures in parentheses are asymptotic standard errors and max L is the maximized log-likelihood.
the condition of Proposition 1; specifically, we have $1.25346 < \rho(\hat{\mathcal{A}}) < 1.27997$, where $\hat{\mathcal{A}} = \{\hat{A}_1^{(1)}, \hat{A}_1^{(2)}, \hat{A}_1^{(3)}, \hat{A}_1^{(4)}\}$. It should be remembered, however, that a joint spectral radius less than unity is not necessary for Q-geometric ergodicity and is clearly a rather stringent condition.
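Intervals of the kind quoted above bracket the joint spectral radius between a lower and an upper bound. A crude way of obtaining such a bracket, sketched below in Python with NumPy, uses the elementary inequalities max ρ(P)^{1/k} ≤ ρ(A) ≤ max ‖P‖^{1/k} over all products P of k matrices from the set. This brute-force enumeration is an illustrative assumption, not Gripenberg’s (1996) algorithm, and the function name `jsr_bounds` and the example matrices are hypothetical.

```python
import numpy as np


def jsr_bounds(matrices, max_len=6):
    """Elementary lower/upper bounds on the joint spectral radius.

    Enumerates every product of length 1, ..., max_len, so the cost grows
    as len(matrices) ** max_len; suitable only for small sets.
    """
    mats = [np.asarray(m, dtype=float) for m in matrices]
    products = [np.eye(mats[0].shape[0])]
    lower, upper = 0.0, np.inf
    for length in range(1, max_len + 1):
        products = [p @ m for p in products for m in mats]
        rho = max(max(abs(np.linalg.eigvals(p))) for p in products)
        nrm = max(np.linalg.norm(p, 2) for p in products)
        lower = max(lower, rho ** (1.0 / length))
        upper = min(upper, nrm ** (1.0 / length))
    return lower, upper


# Each matrix below has spectral radius 0, yet the joint spectral radius is 2:
# individual spectral radii below unity do not imply rho(A) < 1.
B1 = np.array([[0.0, 2.0], [0.0, 0.0]])
B2 = np.array([[0.0, 0.0], [2.0, 0.0]])
lower, upper = jsr_bounds([B1, B2], max_len=4)
print(lower, upper)
```

For sets like the four estimated autoregressive matrices, the bracket produced in this way is typically much wider than the one reported in the text, which is why a dedicated algorithm is preferable in practice.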
To investigate further the stability characteristics of the empirical model, we examine the properties of its skeleton. Using numerical simulation and a grid of starting values, it is found that the skeleton of the empirical model in Table 6 has a unique fixed point ye = (0.478, −0.059)′ and that the matrix of partial
Table 7 ML estimates for a VAR model. yt = µ + Ayt −1 + 61/2 ut 0.1301 (0.3200)
µ=
0.7938 −0.0590 (0.0706) (0.0332)
10.2291 0.0577 , , A = 6 = 0.0988 0.8661 0.0111 0.0577 2.2561 (0.1503) (0.1047) (0.0492)
max L = −437.679, AIC = 893.358, BIC = 916.805, HQ = 902.847 Figures in parentheses are asymptotic standard errors and max L is the maximized loglikelihood.
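The information criteria reported in Tables 6 and 7 can be checked with a few lines of Python. The sketch below uses the standard definitions AIC = −2 max L + 2k, BIC = −2 max L + k ln(N) and HQ = −2 max L + 2k ln(ln(N)); the parameter counts k = 38 (C-MSTAR, as stated in Section 4.3) and k = 9 (VAR: two intercepts, four autoregressive coefficients and three distinct covariance elements) and the effective sample size N = 100 are assumptions, chosen because they reproduce the reported values to the printed precision.

```python
import math


def information_criteria(max_loglik, n_params, n_obs):
    """Return (AIC, BIC, HQ) from a maximized log-likelihood."""
    base = -2.0 * max_loglik
    aic = base + 2.0 * n_params
    bic = base + n_params * math.log(n_obs)
    hq = base + 2.0 * n_params * math.log(math.log(n_obs))
    return aic, bic, hq


# Assumed inputs: k = 38 and k = 9 parameters, N = 100 effective observations.
cmstar = information_criteria(-351.160, 38, 100)
var = information_criteria(-437.679, 9, 100)
print("C-MSTAR:", [round(v, 3) for v in cmstar])
print("VAR:    ", [round(v, 3) for v in var])
```

All three criteria favor the C-MSTAR specification here, since each takes a smaller value for the C-MSTAR model than for the VAR.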
derivatives D(ye) in (12) has spectral radius 0.801. This suggests that the model is locally stable. Furthermore, plots of the skeleton (not shown here) reveal that, for both the price-earnings ratio and the interest rate, the skeleton converges very quickly to the respective long-run value, which provides further evidence of stability.

5.2. Regime-specific Granger causality

In the majority of applications, Granger causality has been analyzed in the context of linear VAR models for a set of variables of interest. A standard auxiliary assumption typically made is that the parameters of the VAR are constant over the sample period under consideration. This corresponds to an assumption that the causal links are stable over time, an assumption which is far from innocuous and may not hold in practice (see, e.g., Psaradakis et al., 2005). To illustrate this point, we begin our analysis using a first-order VAR model, the estimated parameters of which are reported in Table 7. Clearly, neither of the two variables appears to be Granger causal for the other. This result is very surprising since, not only do the two variables reflect alternative investment opportunities, but the interest rate is usually thought of as a policy variable that might be used to correct misalignments in stock prices. The lack of Granger causality in our system may well be a consequence of the issues mentioned above. Another potential difficulty is that causality tests based on VAR models may have low power in the presence of nonlinearities in the data. For this reason, we also carry out the nonparametric test for Granger non-causality proposed by Diks and Panchenko (2006).
The test is implemented with one lag and bandwidth set equal to $\max\{8.62/T^{2/7}, 1.5\} = 2.3059$.17 The test statistic for the null hypothesis that the price-earnings ratio is Granger non-causal for the interest rate takes the value 0.195, which has an asymptotic P-value of 0.4226; the statistic for testing the null hypothesis that the interest rate is Granger non-causal for the price-earnings ratio takes the value 1.1095, which has an asymptotic P-value of 0.0866. Of course, neither the causality test based on the VAR nor the nonparametric test can provide information about the potential regime-specific nature of Granger causality in our bivariate system. To investigate this issue we adopt a slightly different approach to that of Psaradakis et al. (2005) and, instead of inquiring how causality patterns change over time, we examine whether the two variables are useful for predicting each other in different economic regimes. Using the C-MSTAR model in Table 6, it can be seen that the off-diagonal elements of $A_1^{(i)}$ vary significantly across regimes. Specifically, the interest rate Granger causes the price-earnings ratio in regime 3. One may speculate that in regime 3 the stock price boom of the 1960’s is associated with a long period of relatively low interest rates; the causality in regime 1 reflects the fact that stocks and bonds are substitute assets and that low interest rates may help to forecast high future stock prices. The
17 For details on the definition of the test statistic and the choice of bandwidth the reader is referred to Diks and Panchenko (2006).
price-earnings ratio Granger causes the interest rate in regimes 2–4. This result may reflect the fact that the central bank reacts to the price-earnings ratio by changing the interest rate when it is thought that a misalignment correction is needed. In regime 2, a low price-earnings ratio leads to a reduction in interest rates (from a high interest rate regime). In regime 3, a high price-earnings ratio leads to an increase in interest rates (from a low interest rate regime). Finally, in regime 4 a high price-earnings ratio leads to a reduction of the interest rate (from a high interest rate regime). Notice that regime 4 is followed by regime 2; for example, the period of high price-earnings ratio and interest rates of the 1920’s is followed by a crash in the stock markets.18

5.3. Forecast accuracy

In this sub-section, we evaluate the accuracy of out-of-sample forecasts from the C-MSTAR model and the linear VAR model. Comparisons are based on a series of recursive forecasts computed in the following way. Each of the models is fitted to the bivariate time series $\{y_t = (s_t, r_t)'\}_{t=1}^{T-N}$, where T = 101 is the number of observations in the full sample and N = 25 is the number of forecasts (the forecast period is 1976–2000). Using T − N as the forecast origin, a sequence of one-step-ahead forecasts is generated from each of the fitted models. The forecast origin is then rolled forward one period to T − N + 1, the parameters of the forecast models are re-estimated, and another sequence of one-step-ahead forecasts is generated. The procedure is repeated until N forecasts are obtained, which are then used to compute measures of forecast accuracy. Note that one-step-ahead forecasts from the C-MSTAR are relatively straightforward to compute as the model involves a weighted average of the two linear relationships.
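The recursive scheme just described can be sketched in Python with NumPy for the linear benchmark. The example is a hedged illustration under stated assumptions: it re-estimates a first-order VAR with intercept by ordinary least squares at each forecast origin (the models in the text are estimated by ML), and the helper names `fit_var1` and `recursive_forecasts` are hypothetical.

```python
import numpy as np


def fit_var1(y):
    """OLS coefficients of y_t on (1, y_{t-1}); shape (1 + n, n)."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    return np.linalg.lstsq(X, y[1:], rcond=None)[0]


def recursive_forecasts(y, n_forecasts):
    """One-step-ahead forecasts of the last n_forecasts observations.

    At each origin the model is re-estimated on all data up to and
    including the origin, and the next observation is then forecast.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    forecasts = []
    for origin in range(T - n_forecasts, T):
        coef = fit_var1(y[:origin])              # data available at the origin
        x_last = np.concatenate(([1.0], y[origin - 1]))
        forecasts.append(x_last @ coef)          # forecast of y[origin]
    return np.array(forecasts)
```

With T = 101 and `n_forecasts=25`, the first model is fitted to the first 76 observations and the last to the first 100, matching the expanding-window design in the text.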
Forecast performance is evaluated using traditional accuracy measures such as mean square percentage error (MSPE), mean absolute percentage error (MAPE), and root mean square percentage error (RMSPE). In addition, the ability of the models to correctly identify turning points (i.e., the direction of change in the variable of interest regardless of the accuracy with which the magnitude of the change is predicted) is evaluated using the so-called confusion rate (CR), which is computed as the percentage of times the direction of change is wrongly predicted. From the results reported in Table 8, it is clear that the C-MSTAR model yields the smallest MSPE, MAPE and RMSPE for the price-earnings ratio, while the VAR is more successful than the C-MSTAR in forecasting the interest rate. Turning to the outcomes for the bivariate system (sum of the individual results), the C-MSTAR outperforms the VAR, with a gain of 2% in terms of both MSPE and MAPE, and 1% in terms of the RMSPE. A comparison between the two models on the basis of confusion rates shows that the C-MSTAR produces better results for both series. The C-MSTAR wrongly predicts the direction of the change in the
18 Even though there is no reason, in general, for regime 4 to be short-lived (as this is not an intrinsic property of the model), we expect this to be the case for our data set because a high enough interest rate tends to cool down the stock market.
M.J. Dueker et al. / Journal of Econometrics 160 (2011) 311–325
Table 8. Out-of-sample performance.

         VAR (PE)   VAR (IR)   Overall   C-MSTAR (PE)   C-MSTAR (IR)   Overall
MSPE     0.0562     0.0658     0.1220    0.0487         0.0710         0.1197
MAPE     0.1946     0.2132     0.4078    0.1734         0.2266         0.4000
RMSPE    0.4412     0.4617     0.9029    0.4164         0.4760         0.8924
CR       0.2917     0.4167     –         0.2500         0.3333         –

PE refers to the price-earnings ratio and IR to the interest rate. MSPE is the mean square percentage error, MAPE is the mean absolute percentage error, and RMSPE is the root mean square percentage error of the difference between the forecast data and the actual data. CR are confusion rates.
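As a rough illustration of the accuracy measures named in the text, the following Python sketch computes them for a single series. The formulas are the textbook definitions and are an assumption here; they are not claimed to reproduce the exact computations behind Table 8. The function name `accuracy_measures` and the `previous` argument (the last observed value before each forecast, used to define the direction of change) are hypothetical.

```python
import numpy as np

def accuracy_measures(actual, forecast, previous):
    """Percentage-error measures and confusion rate for one series.

    actual, forecast: realised values and their one-step-ahead forecasts.
    previous: value observed one period before each forecast target.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    previous = np.asarray(previous, dtype=float)

    pe = (forecast - actual) / actual      # percentage errors, expressed as fractions
    mspe = np.mean(pe ** 2)
    mape = np.mean(np.abs(pe))
    rmspe = np.sqrt(mspe)

    # Confusion rate: share of periods in which the predicted direction of
    # change differs from the realised direction of change.
    wrong = np.sign(forecast - previous) != np.sign(actual - previous)
    cr = np.mean(wrong)
    return {"MSPE": mspe, "MAPE": mape, "RMSPE": rmspe, "CR": cr}

# Small made-up example (not data from the paper).
measures = accuracy_measures(actual=[100, 110, 120],
                             forecast=[110, 99, 132],
                             previous=[90, 100, 110])
```

The "Overall" columns of Table 8 are described in the text as sums of the individual results, which would correspond to adding the per-series values returned by such a function.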
price-earnings ratio 25% of the time, while the corresponding figure for the linear model is 29%. In the case of the interest rate, the confusion rates are 34% and 42% for the C-MSTAR and the VAR, respectively. To assess which model is more successful over time (that is, which model outperforms the alternative most of the time as opposed to being more successful on average), we compute the number of times each model achieves the smallest MAPE over the 25 forecast points. On the basis of the individual series, we find that C-MSTAR outperforms the VAR 76% of the time when forecasting the price-earnings ratio and 60% of the time when forecasting the interest rates. To summarize, the empirical results illustrate the importance of capturing the regime-specific properties of the data in order to understand the complex interrelationships between economic variables. Models which do not account for such regime-specific characteristics may yield results which, like those obtained from a linear VAR, may appear to be counterintuitive. The C-MSTAR model characterizes adequately the dynamics of interest rates and stock prices, yields economically meaningful results, and has good out-of-sample forecast performance.$^{19,20}$

6. Summary

In this paper, we have introduced a new class of contemporaneous-threshold multivariate STAR models in which the mixing weights are determined by the probability that contemporaneous latent variables exceed certain threshold values. We have discussed issues related to the stability of the model, estimation and testing. We have also illustrated the practical use of the proposed model by analyzing the bivariate relationship between US stock prices and interest rates.
Our findings indicate that the proposed model performs well in terms of in-sample goodness of fit and out-of-sample forecast accuracy, and that the regime-specific Granger causality patterns between the two variables that are implied by the model typically differ from those obtained from a linear model in a way which is economically meaningful.

Appendix. Asymptotic properties of the ML estimator
Sufficient conditions which ensure the strong consistency and asymptotic normality of the ML estimator of $\theta$ mentioned in Section 4.1 are given below. Definitions and notation used here are as in the main text. For any real-valued function $\theta \mapsto f(\theta)$, we write $\nabla f(\theta^*)$ and $\nabla^2 f(\theta^*)$ for the gradient vector and Hessian matrix, respectively, of $f(\cdot)$ at $\theta^*$, and use $\|\cdot\|_2$ to denote the Frobenius matrix norm (i.e., $\|C\|_2 = \{\mathrm{tr}(C'C)\}^{1/2}$).

(C.1) For each $\theta \in \Theta$, $\{y_t\}$ is strictly stationary and ergodic.
(C.2) $\Psi(\cdot)$ and $\psi(\cdot)$ are twice continuously differentiable.
(C.3) $\theta_0$ is an interior point of the compact and convex parameter space $\Theta$.
(C.4) $P[\ell_t(\theta) - \ell_t(\theta_0) \neq 0] > 0$ for all $\theta \in \Theta \setminus \{\theta_0\}$.
(C.5) $E \sup_{\theta \in \Theta} |\ln \ell_t(\theta)| < \infty$.
(C.6) $E \sup_{\theta \in B(\theta_0)} \|\nabla^2 \ln \ell_t(\theta)\|_2 < \infty$ for some open neighborhood $B(\theta_0)$ of $\theta_0$.
(C.7) $E \sup_{\theta \in B(\theta_0)} \|\nabla^2 \ell_t(\theta)\|_2 < \infty$.
(C.8) $E \|\nabla \ln \ell_t(\theta_0)\|_2 < \infty$.
(C.9) $\mathcal{I}(\theta_0) = -E[\nabla^2 \ln \ell_t(\theta_0)]$ is nonsingular.

These are fairly standard regularity conditions for ML estimation. We note that for (C.1) to hold it is sufficient that the conditions of Proposition 1 are satisfied and $\{y_t\}$ is initialized from its invariant distribution. We have the following result for the ML estimator $\hat\theta = \arg\max_{\theta \in \Theta} (1/T) L_T(\theta)$.

Proposition 2. If conditions (C.1)–(C.5) are satisfied, then $\hat\theta$ is strongly consistent for $\theta_0$. If, in addition, conditions (C.6)–(C.9) are satisfied, then $\sqrt{T}(\hat\theta - \theta_0)$ is asymptotically normal with mean vector 0 and covariance matrix $\mathcal{I}(\theta_0)^{-1}$.

Proof. It is easy to see that $L_T(\theta)$ is a measurable function of the data for each fixed $\theta \in \Theta$ and almost surely continuous in $\theta$. Moreover, since the sequence $\{\ln \ell_t(\theta)\}$ is strictly stationary and ergodic under (C.1)–(C.2) (e.g., Straumann and Mikosch (2006, Proposition 2.5)), it follows from (C.5) and the uniform strong law of large numbers in Theorem 2.7 of Straumann and Mikosch (2006) that

$$\lim_{T \to \infty} \sup_{\theta \in \Theta} \left| \frac{1}{T} \sum_{t=1}^{T} \ln \ell_t(\theta) - E[\ln \ell_t(\theta)] \right| = 0 \quad \text{almost surely.}$$

Thus, using the compactness of $\Theta$, together with the fact that $E[\ln \ell_t(\theta)]$ attains a unique maximum at $\theta = \theta_0$ under (C.3)–(C.5), we conclude by a standard argument (cf. Amemiya (1973, Lemma 3)) that $\lim_{T \to \infty} \hat\theta = \theta_0$ almost surely.

Turning to the root-T asymptotic normality of $\hat\theta$, we note that $L_T(\theta)$ is almost surely twice continuously differentiable in $\theta$ and $\sum_{t=1}^{T} \nabla \ln \ell_t(\hat\theta) = 0$ for all T sufficiently large because $\hat\theta$ is strongly consistent for $\theta_0$ and $\theta_0$ is interior to $\Theta$. Thus, by a mean-value expansion of $\sum_{t=1}^{T} \nabla \ln \ell_t(\hat\theta)$ about $\theta_0$, we have

$$0 = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla \ln \ell_t(\theta_0) + \left[ \frac{1}{T} \sum_{t=1}^{T} \nabla^2 \ln \ell_t(\bar\theta) \right] \sqrt{T}(\hat\theta - \theta_0), \tag{15}$$
19 The forecast results are particularly noteworthy because one of the major weaknesses of many nonlinear models is their relatively poor out-of-sample performance (see also Dueker et al., 2007).
20 In an earlier version of the paper, we discussed the relationship between the C-MSTAR and the autoregressive conditional root model of Bec et al. (2008), and reported the empirical results obtained by fitting a logistic multivariate STAR model to our data. The latter model was found to be outperformed by the C-MSTAR both in terms of in-sample goodness of fit and out-of-sample forecast accuracy. For reasons of space conservation, we do not include these findings here; instead, we refer the interested reader to the working paper version available at http://pareto.uab.es/wp/2010/81710.pdf.
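for some $\bar\theta \in \Theta$ satisfying $\|\bar\theta - \theta_0\|_2 \le \|\hat\theta - \theta_0\|_2$ and all T sufficiently large. Since $\{\nabla^2 \ln \ell_t(\theta)\}$ is a strictly stationary and ergodic sequence, and $\lim_{T \to \infty} \bar\theta = \theta_0$ almost surely by virtue of the strong consistency of $\hat\theta$ for $\theta_0$, it follows from (C.6), Theorem 2.7 of Straumann and Mikosch (2006), and Lemma 4 of Amemiya (1973) that

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \nabla^2 \ln \ell_t(\bar\theta) = -\mathcal{I}(\theta_0) \quad \text{almost surely.} \tag{16}$$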
Furthermore, since the model is correctly specified, $\{\nabla \ln \ell_t(\theta_0)\}$ forms a strictly stationary and ergodic vector-valued martingale-difference sequence relative to the σ-field generated by $\{y_t, y_{t-1}, \ldots, y_0\}$, and $E[\{\nabla \ln \ell_t(\theta_0)\}\{\nabla \ln \ell_t(\theta_0)\}']$ exists and is equal to $\mathcal{I}(\theta_0)$ under (C.4)–(C.8). Thus, we may use the Billingsley–Ibragimov martingale central limit theorem (Taniguchi and Kakizawa (2000, Theorem A.2.14)) and the Cramér–Wold device to conclude that $(1/\sqrt{T}) \sum_{t=1}^{T} \nabla \ln \ell_t(\theta_0)$ is asymptotically normal with mean vector 0 and covariance matrix $\mathcal{I}(\theta_0)$. This result, together with (15), (16) and (C.9), delivers the claimed asymptotic distribution of $\sqrt{T}(\hat\theta - \theta_0)$ by an application of Slutsky's lemma.

The asymptotic normality of $\{-\nabla^2 L_T(\hat\theta)\}^{1/2}(\hat\theta - \theta_0)$ mentioned in Section 4.1 is an immediate consequence of Proposition 2 and of the fact that $\lim_{T \to \infty} (1/T) \sum_{t=1}^{T} \nabla^2 \ln \ell_t(\hat\theta) = -\mathcal{I}(\theta_0)$ almost surely (the latter result also guarantees the existence of a large enough T such that $\nabla^2 L_T(\hat\theta)$ is negative definite almost surely).

References

Altissimo, F., Violante, G., 2001. The non-linear dynamics of output and unemployment in the US. Journal of Applied Econometrics 16, 461–486.
Amemiya, T., 1973. Regression analysis when the dependent variable is truncated normal. Econometrica 41, 997–1016.
Bec, F., Rahbek, A., Shephard, N., 2008. The ACR model: a multivariate dynamic mixture autoregression. Oxford Bulletin of Economics and Statistics 70, 583–618.
Bernanke, B., Gertler, M., 1999. Monetary policy and asset price volatility. In: New Challenges for Monetary Policy. Federal Reserve Bank of Kansas City, Kansas City, pp. 77–128.
Bernanke, B., Gertler, M., 2001. Should central banks respond to movements in asset prices? American Economic Review 91, 253–257.
Cecchetti, S.G., Genberg, H., Lipsky, J., Wadhwani, S.B., 2000. Asset Prices and Central Bank Policy, Geneva Reports on the World Economy. No. 2. International Center for Monetary and Banking Studies and Centre for Economic Policy Research.
Chan, K.S., Tong, H., 1985. On the use of the deterministic Lyapunov function for the ergodicity of stochastic difference equations. Advances in Applied Probability 17, 666–678.
De Gooijer, J.G., Vidiella-i-Anguera, A., 2004. Forecasting threshold cointegrated systems. International Journal of Forecasting 20, 237–253.
Diks, C., Panchenko, V., 2006. A new statistic and practical guidelines for nonparametric Granger causality testing. Journal of Economic Dynamics and Control 30, 1647–1669.
Doukhan, P., 1994. Mixing: Properties and Examples. In: Lecture Notes in Statistics, vol. 85. Springer-Verlag, Berlin.
Dueker, M.J., Sola, M., Spagnolo, F., 2007. Contemporaneous threshold autoregressive models: estimation, testing and forecasting. Journal of Econometrics 141, 517–547.
Eklund, B., 2003. A nonlinear alternative to the unit root hypothesis. SSE/EFI Working Paper No. 547. Stockholm School of Economics.
Fong, P.W., Li, W.K., Yau, C.W., Wong, C.S., 2007. On a mixture vector autoregressive model. Canadian Journal of Statistics 35, 135–150.
Gripenberg, G., 1996. Computing the joint spectral radius. Linear Algebra and its Applications 234, 43–60.
Hamilton, J.D., 1993. Estimation, inference and forecasting of time series subject to changes in regime. In: Maddala, G.S., Rao, C.R., Vinod, H.D. (Eds.), Handbook of Statistics, vol. 11. Elsevier Science Publishers, Amsterdam, pp. 231–260.
Hansen, B.E., 1992. The likelihood ratio test under nonstandard conditions: testing the Markov switching model of GNP. Journal of Applied Econometrics 7, S61–S82; Journal of Applied Econometrics 11, 195–198 (Erratum).
Harvill, J.L., Ray, B.K., 1999. A note on tests for nonlinearity in a vector time series. Biometrika 86, 728–734.
Harvill, J.L., Ray, B.K., 2006. Functional coefficient autoregressive models for vector time series. Computational Statistics and Data Analysis 50, 3547–3566.
Jungers, R., 2009. The Joint Spectral Radius: Theory and Applications. In: Lecture Notes in Control and Information Sciences, vol. 385. Springer-Verlag, Berlin.
Kapetanios, G., 2001. Model selection in threshold models. Journal of Time Series Analysis 22, 733–754.
Koop, G., Potter, S., 2006. The vector floor and ceiling model. In: Milas, C., Rothman, P., van Dijk, D. (Eds.), Nonlinear Time Series Analysis of Business Cycles. In: Contributions to Economic Analysis, vol. 276. Elsevier, Amsterdam, pp. 97–131.
Liebscher, E., 2005. Towards a unified approach for proving geometric ergodicity and mixing properties of nonlinear autoregressive processes. Journal of Time Series Analysis 26, 669–689.
Lilliefors, W.H., 1967. On the Kolmogorov–Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association 62, 399–402.
Meyn, S.P., Tweedie, R.L., 2009. Markov Chains and Stochastic Stability, 2nd ed. Cambridge University Press, Cambridge.
Psaradakis, Z., Ravn, M.O., Sola, M., 2005. Markov switching causality and the money–output relationship. Journal of Applied Econometrics 20, 665–683.
Psaradakis, Z., Sola, M., Spagnolo, F., Spagnolo, N., 2009. Selecting nonlinear time series models using information criteria. Journal of Time Series Analysis 30, 369–394.
Psaradakis, Z., Spagnolo, N., 2006. Joint determination of the state dimension and autoregressive order for models with Markov regime switching. Journal of Time Series Analysis 27, 753–766.
Rothman, P., van Dijk, D., Franses, P.H., 2001. A multivariate STAR analysis of the relationship between money and output. Macroeconomic Dynamics 5, 506–532.
Sola, M., Driffill, J., 1994. Testing the term structure of interest rates using a stationary vector autoregression with regime switching. Journal of Economic Dynamics and Control 18, 601–628.
Straumann, D., Mikosch, T., 2006. Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: a stochastic recurrence equations approach. Annals of Statistics 34, 2449–2495.
Taniguchi, M., Kakizawa, Y., 2000. Asymptotic Theory of Statistical Inference for Time Series. Springer-Verlag, New York.
Teräsvirta, T., 1998. Modelling economic relationships with smooth transition regressions. In: Ullah, A., Giles, D.E.A. (Eds.), Handbook of Applied Economic Statistics. Marcel Dekker, New York, pp. 507–552.
Tong, H., 1983. Threshold Models in Non-Linear Time Series Analysis. Springer-Verlag, New York.
Tong, H., 1990. Non-linear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.
Tsay, R.S., 1986. Nonlinearity tests for time series. Biometrika 73, 461–466.
Tsay, R.S., 1998. Testing and modeling multivariate threshold models. Journal of the American Statistical Association 93, 1188–1202.
Tsitsiklis, J.N., Blondel, V.D., 1997. The Lyapunov exponent and joint spectral radius of pairs of matrices are hard – when not impossible – to compute and to approximate. Mathematics of Control, Signals, and Systems 10, 31–40.
van Dijk, D., Teräsvirta, T., Franses, P.H., 2002. Smooth transition autoregressive models – a survey of recent developments. Econometric Reviews 21, 1–47.
Journal of Econometrics 160 (2011) 326–348
Panels with non-stationary multifactor error structures ✩
G. Kapetanios a, M. Hashem Pesaran b,c, T. Yamagata d,∗
a Queen Mary, University of London, United Kingdom
b Cambridge University, United Kingdom
c University of Southern California, CA, United States
d University of York, United Kingdom
Article info
Article history: Received 15 June 2009; received in revised form 14 April 2010; accepted 5 October 2010; available online 16 October 2010.
JEL classification: C12; C13; C33
Keywords: Cross-section dependence; Large panels; Unit roots; Principal components; Common correlated effects
Abstract

The presence of cross-sectionally correlated error terms invalidates much inferential theory of panel data models. Recently, work by Pesaran (2006) has suggested a method which makes use of cross-sectional averages to provide valid inference in the case of stationary panel regressions with a multifactor error structure. This paper extends this work and examines the important case where the unobservable common factors follow unit root processes. The extension to I(1) processes is remarkable on two counts. First, it is of great interest to note that while intermediate results needed for deriving the asymptotic distribution of the panel estimators differ between the I(1) and I(0) cases, the final results are surprisingly similar. This is in direct contrast to the standard distributional results for I(1) processes that radically differ from those for I(0) processes. Second, it is worth noting the significant extra technical demands required to prove the new results. The theoretical findings are further supported for small samples via an extensive Monte Carlo study. In particular, the results of the Monte Carlo study suggest that the cross-sectional-average-based method is robust to a wide variety of data generation processes and has lower biases than the alternative estimation methods considered in the paper. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

Panel data sets have been increasingly used in economics to analyze complex economic phenomena. One of their attractions is the ability to use an extended data set to obtain information about parameters of interest which are assumed to have common values across panel units. Most of the work carried out on panel data has usually assumed some form of cross-sectional independence to derive the theoretical properties of various inferential procedures. However, such assumptions are often suspect, and as a result recent advances in the literature have focused on estimation of panel data models subject to error cross-sectional dependence. A number of different approaches have been advanced for this purpose. In the case of spatial data sets where a natural immutable
✩ The authors thank Vanessa Smith and Elisa Tosetti, and seminar participants at Kyoto University, University of Amsterdam, University of Nottingham and the annual conference at the Granger Centre for most helpful comments on a previous version of this paper. This version has also benefited greatly from constructive comments and suggestions by the Editor (Cheng Hsiao), an associate editor and three anonymous referees. Hashem Pesaran and Takashi Yamagata acknowledge financial support from the ESRC (Grant No. RES-000-23-0135). ∗ Corresponding author (T. Yamagata). Tel.: +44 0 1904 433758.
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.10.001
distance measure is available, the dependence is often captured through ‘‘spatial lags’’ using techniques familiar from the time series literature. In economic applications, spatial techniques are often adapted using alternative measures of ‘‘economic distance’’. This approach is exemplified in work by Lee and Pesaran (1993), Conley and Dupor (2003), Conley and Topa (2002) and Pesaran et al. (2004), as well as the literature on spatial econometrics recently surveyed by Anselin (2001). In the case of panel data models where the cross-section dimension (N) is small (typically N < 10) and the time series dimension (T ) is large, the standard approach is to treat the equations from the different cross-section units as a system of seemingly unrelated regression equations (SURE) and then estimate the system by generalized least squares (GLS) techniques. The SURE approach is not applicable if the errors are correlated with the regressors and/or if the panels under consideration have a large cross-sectional dimension. This has led a number of investigators to consider unobserved factor models, where the cross-section error correlations are defined in terms of the factor loadings. The use of unobserved factors also allows for a certain degree of correlation between the idiosyncratic errors and the unobserved factors. Use of factor models is not new in economics, and dates back to the pioneering work of Stone (1947), who applied the principal component (PC) analysis of Hotelling
G. Kapetanios et al. / Journal of Econometrics 160 (2011) 326–348
to US macroeconomic time series over the period 1922–1938 and was able to demonstrate that three factors (namely total income, its rate of change and a time trend) explained over 97% of the total variations of all the 17 macro variables that he had considered. Until recently, subsequent applications of the PC approach to economic time series have been primarily in finance. See, for example, Chamberlain and Rothschild (1983), Connor and Korajczyk (1986) and Connor (1988). But more recently the unobserved factor models have gained popularity for forecasting with a large number of variables, as advocated by Stock and Watson (2002). The factor model is used very much in the spirit of the original work by Stone, in order to summarize the empirical content of a large number of macroeconomic variables by a small set of factors which, when estimated using principal components, is then used for further modelling and/or forecasting. Related literature on dynamic factor models has also been put forward by Forni and Reichlin (1998) and Forni et al. (2000). Recent uses of factor models in forecasting focus on the consistent estimation of unobserved factors and their loadings. Related theoretical advances by Bai and Ng (2002) and Bai (2003) are also concerned with the estimation and selection of unobserved factors and do not consider the estimation and inference problems in standard panel data models, where the objects of interest are slope coefficients of the conditioning variables (regressors). In such panels, the unobserved factors are viewed as nuisance variables, introduced primarily to model the cross-section dependences of the error terms in a parsimonious manner relative to the SURE formulation. Despite these differences, knowledge of factor models could still be useful for the analysis of panel data models if it is believed that the errors might be cross-sectionally correlated.
Disregarding the possible factor structure of the errors in panel data models can lead to inconsistent parameter estimates and incorrect inference. Coakley et al. (2002) suggest a possible solution to the problem using the method of Stock and Watson (2002). But, as Pesaran (2006) shows, the PC approach proposed by Coakley et al. (2002) can still yield inconsistent estimates. Pesaran (2006) suggests a new approach by noting that linear combinations of the unobserved factors can be well approximated by cross-section averages of the dependent variable and the observed regressors. This leads to a new set of estimators, referred to as the Common Correlated Effects (CCE) estimators, that can be computed by running standard panel regressions augmented with the cross-section averages of the dependent and independent variables. The CCE procedure is applicable to panels with a single factor or multiple unobserved factors, and it does not necessarily require the number of unobserved factors to be smaller than the number of observed cross-section averages. In this paper, we extend the analysis of Pesaran (2006) to the case where the unobserved common factors are integrated of order 1, or I (1). Our analysis does not require an a priori knowledge of the number of unobserved factors. It is only required that the number of unobserved factors remains fixed as the sample size is increased. The extension of the results of Pesaran (2006) to the I (1) case is far from straightforward, and it involves the development of new intermediate results that could be of relevance to the analysis of panels with unit roots. It is also remarkable in the sense that, whilst the intermediate results needed for deriving the asymptotic distribution of the panel estimators differ between the I (1) and I (0) cases, the final results are surprisingly similar. 
This is in direct contrast to the usual phenomenon whereby distributional results for I (1) processes are radically different to those for I (0) processes and involve functionals of Brownian motion whose use requires separate tabulations of critical values. It is very important to appreciate that our primary focus is on estimating the coefficients of the panel regression model.
We do not wish to investigate the (co-)integration properties of the unobserved factors. Rather, our focus is robustness to the properties of the unobserved factors, for the estimation of the coefficients of the observed regressors that vary over time as well as over the cross-section units. In this sense, the extension provided by our work is of great importance in empirical applications where the integration properties of the unobserved common factors are typically unknown. In the CCE approach, the nature of the factors does not matter for inferential analysis of the coefficients of the observed variables. The theoretical findings of the paper are further supported for small samples via an extensive Monte Carlo study. In particular, the results of the Monte Carlo study clearly show that the CCE estimator is robust to a wide variety of data generation processes and has lower biases than all of the alternative estimation methods considered in the paper. The structure of the paper is as follows. Section 2 provides an overview of the method suggested by Pesaran (2006) in the case of stationary factor processes. Section 3 provides the theoretical framework of the analysis of non-stationarity. In this section, the theoretical properties of the various estimators are presented. Section 4 presents an extensive Monte Carlo study, and Section 5 concludes. The Appendices contain proofs of the theoretical results.

Notation: K stands for a finite positive constant, $\|A\| = [\mathrm{Tr}(AA')]^{1/2}$ is the Frobenius norm of the m × n matrix A, and $A^+$ denotes the Moore–Penrose inverse of A. rk(A) denotes the rank of A. $\sup_i W_i$ is the supremum of $W_i$ over i. $a_n = O(b_n)$ states that the deterministic sequence $\{a_n\}$ is at most of order $b_n$, $x_n = O_p(y_n)$ states that the vector of random variables, $x_n$, is at most of order $y_n$ in probability, and $x_n = o_p(y_n)$ is of smaller order in probability than $y_n$; $\xrightarrow{q.m.}$ denotes convergence in quadratic mean (or mean square error), $\xrightarrow{p}$ convergence in probability, $\xrightarrow{d}$ convergence in distribution, and $\overset{d}{\sim}$ asymptotic equivalence of probability distributions. All asymptotics are carried out under N → ∞, either with a fixed T, or jointly with T → ∞. Joint convergence of N and T will be denoted by $(N, T) \xrightarrow{j} \infty$. Restrictions (if any) on the relative rates of convergence of N and T will be specified separately.

2. Panel data models with observed and unobserved common effects

In this section, we review the methodology introduced in Pesaran (2006). Let $y_{it}$ be the observation on the ith cross-section unit at time t for i = 1, 2, ..., N; t = 1, 2, ..., T, and suppose that it is generated according to the following linear heterogeneous panel data model:

$$y_{it} = \alpha_i' d_t + \beta_i' x_{it} + \gamma_i' f_t + \varepsilon_{it}, \tag{1}$$
where $d_t$ is an n × 1 vector of observed common effects, which is partitioned as $d_t = (d_{1t}', d_{2t}')'$, where $d_{1t}$ is an $n_1 \times 1$ vector of deterministic components such as intercepts or seasonal dummies and $d_{2t}$ is an $n_2 \times 1$ vector of unit root stochastic observed common effects, with $n = n_1 + n_2$, $x_{it}$ is a k × 1 vector of observed individual-specific regressors on the ith cross-section unit at time t, $f_t$ is the m × 1 vector of unobserved common effects, and $\varepsilon_{it}$ are the individual-specific (idiosyncratic) errors assumed to be independently distributed of $(d_t, x_{it})$. The unobserved factors, $f_t$, could be correlated with $(d_t, x_{it})$, and to allow for such a possibility the following specification for the individual specific regressors will be considered:

$$x_{it} = A_i' d_t + \Gamma_i' f_t + v_{it}, \tag{2}$$
where Ai and Γ i are n × k and m × k factor loading matrices with fixed and bounded components, and vit = (vi1t , . . . , vikt )′ are the
specific components of xit distributed independently of the common effects and across i, but assumed to follow general covariance stationary processes. In our set-up, εit is assumed to be stationary, which implies that, in the case where ft and/or dt contain unit root processes, then yit , xit , dt and ft must be cointegrated.1 Some of the implications of this property are explored further in Remark 6. Combining (1) and (2), we now have
zit
yit xit
=
(k+1)×1
B′i
=
dt +
(k+1)×n n×1
Ci′
ft +
(k+1)×m m×1
uit
,
(k+1)×1
(3)
εit + β′i vit vit
Bi = αi
Ci = γ i
Ai
Γi
= 0 Ik
1
βi
0 Ik
1
βi
1 0
εit β′i vit
Ik
q.m.
,
(4)
Γ˜ i = γ i
¯ ′ dt + C¯ ′ ft + u¯ t , z¯t = B
(7)
where
N i=1
N 1 −
N i =1
uit ,
and N 1 −
N i =1
Bi ,
C¯ =
N 1 −
N i=1
Ci .
1
β
0 Ik
,
as N → ∞,
(12)
for all N , and as N → ∞,
(9)
(13)
it follows, assuming that Rank (Γ˜ ) = m, that q.m.
¯ ′ dt ) → 0, ft − (CC ′ )−1 C (¯zt − B
as N → ∞.
This suggests that, for sufficiently large N, it is valid to use h¯ t = (dt′ , z¯t′ )′ as observable proxies for ft . This result holds irrespective of whether the unobserved factor loadings, γ i and Γ i , are fixed or random. When the rank condition is not satisfied, the use of cross-section averages alone does not allow consistent estimation of all of the unobserved factors, and as a result the estimation of the individual coefficients βi by means of the cross-section averages alone will not be possible. But, interestingly enough, consistent estimates of the mean of the slope coefficients, β, and their asymptotic distribution can be obtained if it is further assumed that the factor loadings are distributed independently of the factors and the individual-specific error processes. 2.1. The CCE estimators We now discuss the two estimators for the means of the individual specific slope coefficients proposed by Pesaran (2006). One is the Mean Group (MG) estimator proposed in Pesaran and Smith (1995) and the other is a generalization of the fixed effects estimator that allows for the possibility of cross-section dependence. The former is referred to as the ‘‘Common Correlated Effects Mean Group’’ (CCEMG) estimator, and the latter as the ‘‘Common Correlated Effects Pooled’’ (CCEP) estimator. The CCEMG estimator is a simple average of the individual CCE estimators, bˆ i of βi ,
(8)
We distinguish between two important cases: when the rank condition rk(C¯ ) = m ≤ k + 1,
Γ˜ = (E (γ i ), E (Γ i )) = (γ, Γ ),
(6)
¯t = u
(11)
where
,
Γi .
zit ,
as N → ∞, for each t ,
(5)
As discussed in Pesaran (2006), the above set-up is sufficiently general and renders a variety of panel data models as special cases. In the panel literature with T small and N large, the primary parameters of interest are the means of the individual specific slope coefficients, βi , i = 1, 2, . . . , N. The common factor loadings, αi and γ i , are generally treated as nuisance parameters. In cases where both N and T are large, it is also possible to consider consistent estimation of the factor loadings, but this topic will not be pursued here. The presence of unobserved factors in (1) implies that estimation of βi and its cross-sectional mean cannot be undertaken using standard methods. Pesaran (2006) has suggested using cross-section averages of yit and xit to deal with the effects of proxies for the unobserved factors in (1). To see why such an approach could work, consider simple cross-section averages of the equations in (3)2 :
¯= B
(10)
and C¯ → C = Γ˜
N 1 −
¯ t → 0, u p
,
Ik is an identity matrix of order k, and the rank of Ci is determined by the rank of the m × (k + 1) matrix of the unobserved factor loadings
z¯t =
′ ¯ ′ dt − u¯ t ). ft = (C¯C¯ )−1 C¯ (¯zt − B
But since
where uit =
holds, and when it does not. Under the former, the analysis simplifies considerably, since it is possible to proxy the unobserved factors by linear combinations of cross-section averages, z¯t , and the observed common components, dt . But if the rank condition is not satisfied, this is not possible, although as we shall see it is still possible to consistently estimate the mean of the regression coefficients, β, by the CCE procedure. In the case where the rank condition is met, we have
$$\hat{b}_{MG} = N^{-1}\sum_{i=1}^{N}\hat{b}_i, \tag{14}$$
where
$$\hat{b}_i = (X_i'\bar{M}X_i)^{-1}X_i'\bar{M}y_i, \tag{15}$$
¹ However, as will be shown later, our results on the estimators of β hold even if the factor loadings γi and/or Γi are zero (or weak in the sense of Chudik et al. (forthcoming)), and it is not necessary that xit and ft are cointegrated. What is required for our results is that, conditional on dt and ft, the idiosyncratic errors εit and vit are stationary.
² Pesaran (2006) considers cross-section weighted averages that are more general. But to simplify the exposition we confine our discussion to simple averages throughout.

$X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$, $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, and $\bar{M}$ is defined by
$$\bar{M} = I_T - \bar{H}(\bar{H}'\bar{H})^{-1}\bar{H}', \tag{16}$$
$\bar{H} = (D, \bar{Z})$, D and $\bar{Z}$ being, respectively, the T × n and T × (k + 1) matrices of observations on dt and $\bar{z}_t$. We also define for later use
$$M_g = I_T - G(G'G)^{-1}G', \tag{17}$$
and
G. Kapetanios et al. / Journal of Econometrics 160 (2011) 326–348
$$M_q = I_T - Q(Q'Q)^{+}Q', \quad \text{with } Q = G\bar{P}, \tag{18}$$
where G = (D, F ), D = (d1 , d2 , . . . , dT )′ , F = (f1 , f2 , . . . , fT )′ are T × n and T × m data matrices on observed and unobserved common factors, respectively, (A)+ denotes the Moore–Penrose inverse of A, and
$$\underset{(n+m)\times(n+k+1)}{\bar{P}} = \begin{pmatrix} I_n & \bar{B} \\ 0 & \bar{C} \end{pmatrix}, \qquad \bar{U}^{*} = (0, \bar{U}), \tag{19}$$
where $\bar{U}^{*}$ has the same dimension as $\bar{H}$ and $\bar{U} = (\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_T)'$ is a T × (k + 1) matrix of observations on $\bar{u}_t$. Efficiency gains from pooling of observations over the cross-section units can be achieved when the individual slope coefficients, βi, are the same. Such a pooled estimator of β, denoted by CCEP, is given by
$$\hat{b}_P = \left(\sum_{i=1}^{N} X_i'\bar{M}X_i\right)^{-1}\sum_{i=1}^{N} X_i'\bar{M}y_i, \tag{20}$$
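The estimators in (14), (15) and (20) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names, the array layout (`y` as N × T, `X` as N × T × k, `D` as T × n) and the use of a pseudo-inverse in place of the ordinary inverse in (16) are assumptions made here for convenience. The pseudo-inverse coincides with the inverse whenever $\bar{H}'\bar{H}$ is non-singular.

```python
import numpy as np

def annihilator(H):
    """Return I_T - H (H'H)^+ H', the projection off the column space of H (cf. eq. (16))."""
    T = H.shape[0]
    return np.eye(T) - H @ np.linalg.pinv(H.T @ H) @ H.T

def cce_estimators(y, X, D):
    """Compute individual CCE, CCEMG and CCEP estimates.

    y: (N, T) dependent variable, X: (N, T, k) regressors, D: (T, n) observed common effects.
    Returns (b_i, b_mg, b_p) following eqs. (15), (14) and (20) with simple averages.
    """
    N, T, k = X.shape
    # Simple cross-section averages z_bar_t = (y_bar_t, x_bar_t')', stacked as T x (k + 1)
    Z_bar = np.column_stack([y.mean(axis=0), X.mean(axis=0)])
    H_bar = np.column_stack([D, Z_bar])            # T x (n + k + 1)
    M_bar = annihilator(H_bar)
    b_i = np.empty((N, k))
    sum_XMX = np.zeros((k, k))
    sum_XMy = np.zeros(k)
    for i in range(N):
        XMX = X[i].T @ M_bar @ X[i]
        XMy = X[i].T @ M_bar @ y[i]
        b_i[i] = np.linalg.solve(XMX, XMy)         # eq. (15)
        sum_XMX += XMX
        sum_XMy += XMy
    b_mg = b_i.mean(axis=0)                        # eq. (14)
    b_p = np.linalg.solve(sum_XMX, sum_XMy)        # eq. (20)
    return b_i, b_mg, b_p
```

In a noise-free panel with one factor and homogeneous slopes, the factor lies exactly in the span of the cross-section averages, so both estimators recover β up to rounding error; this is a convenient sanity check for the sketch.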
Assumption 4 (Random Slope Coefficients). The slope coefficients, βi , follow the random coefficient model
$$\beta_i = \beta + \varpi_i, \qquad \varpi_i \sim \text{IID}(0, \Omega_{\varpi}),$$
3. Theoretical properties of CCE estimators in non-stationary panel data models

The following assumptions will be used in the derivation of the asymptotic properties of the CCE estimators.

Assumption 1 (Non-Stationary Common Effects). The (n2 + m) × 1 vector of stochastic common effects, $g_t = (d_{2t}', f_t')'$, follows the multivariate unit root process
gt = gt −1 + ζ gt , where ζ gt is an (n2 + m) × 1 vector of L2+δ , δ > 0, stationary near epoque dependent (NED) processes of size 1/2, on some α mixing process of size −(2 + δ)/δ , distributed independently of the individual-specific errors, εit ′ and vit ′ for all i, t and t ′ . Assumption 2 (Individual-Specific Errors). (i) The individual-specific errors εit and vjt are distributed independently of each other, for all i, j and t. εit have uniformly bounded positive variance, supi σi2 < K , for some constant K , and uniformly bounded fourthorder cumulants. vit have covariance matrices, Σ vi , which are non-singular and satisfy supi ‖Σ vi ‖ < K < ∞, autocovariance ∑∞ matrices, Γ iv (s), such that supi s=−∞ ‖Γ iv (s)‖ < K < ∞, and have uniformly bounded fourth-order cumulants. (ii) For each i, (εit , vit′ )′ is an (k + 1) × 1 vector of L2+δ , δ > 0, stationary near 2δ epoque dependent (NED) processes of size 2δ− on some α -mixing 4 process ψ it of size −(2 + δ)/δ which is partitioned conformably to (εit , vit′ )′ as (ψεit , ψ′vit )′ , where ψεit and ψvjt are independent for all i and j. Assumption 3. The coefficient matrices, Bi and Ci , are independently and identically distributed across i, and independent of the individual specific errors, εjt and vjt , the common factors, ζ gt , for all i, j and t with fixed means B and C , and uniformly bounded secondorder moments. In particular, vec(Bi ) = vec(B) + ηB,i ,
(23)
where ‖β‖ < K, $\|\Omega_{\varpi}\|$ < K, for some constant K, $\Omega_{\varpi}$ is a k × k symmetric non-negative definite matrix, and the random deviations, $\varpi_i$, are distributed independently of γj, Γj, εjt, vjt, and ζgt for all i, j and t. $\varpi_i$ has finite fourth moments uniformly over i.

Assumption 5 (Identification of βi and β). i and T , and
∑N
limN →∞ N1
Assumption 6.
i=1
Xi′ Mg Xi
−1
T
¯ Xi Xi′ M
−1
T
exists for all
Σ vi is non-singular.
exists for all i and T , and
′ ¯ 2 X M Xi supi E i T < K < ∞.
for i = 1, 2, . . . , N ,
(21)
Assumption 7. When rank condition (9) is not satisfied, (i) Xi′ Mq Xi
and Θ
T2
E (T
−2
vec(Ci ) = vec(C ) + ηC ,i , for i = 1, 2, . . . , N ,
where ΩBη and ΩC η are (k + 1)n ×(k + 1)n and (k + 1)m ×(k + 1)m symmetric non-negative definite matrices, ‖B‖ < K , ‖C ‖ < K , ‖ΩBη ‖ < K and ‖ΩC η ‖ < K , for some constant K .
limN ,T →∞
1 N
∑N
i=1 ΘiT , where ΘiT
N
i=1
=
Xi Mq Xi ), are non-singular; (ii) if m
≥ 2k + 1, then X ′ Mq Xi −1 X ′ Mq F 2 i i 2 exists for all i and T and sup E i 2 2 T T T ′ 2 F F < ∞; and (iii) if m < 2k + 1, then E T 2 < ∞ and 2 F ′ F −1 < ∞. E T2
Xi′ Mq Xi
−1
Remark 1. Assumption 1 departs from the standard practice in the analysis of large panels with common factors and specifies that the factors are non-stationary. Assumption 2 concerns the individual-specific errors and relaxes the assumption that εit are serially uncorrelated, often adopted in the literature (see, e.g., Pesaran (2006)). Assumptions 2–6 are standard in large panels with random coefficients. But some comments on Assumption 7 seem to be in order. This assumption is only used when the rank condition (9) is not satisfied. It is made up of three regularity conditions.³ The last two are of greater significance and only relate to the Mean Group estimator presented in the next section. In effect, these assumptions ensure that the individual slope coefficient estimators possess second-order moments asymptotically, which seems plausible in most economic applications.

Remark 2. Note that Assumption 3 implies that γi are independently and identically distributed across i, and
$$\gamma_i = \gamma + \eta_i, \qquad \eta_i \sim \text{IID}(0, \Omega_{\eta}), \qquad \text{for } i = 1, 2, \ldots, N, \tag{24}$$
where Ωη is an m × m symmetric non-negative definite matrix, with ‖γ‖ < K and ‖Ωη‖ < K, for some constant K.

For each i and t = 1, 2, . . . , T, writing the model in matrix notation, we have
$$y_i = D\alpha_i + X_i\beta_i + F\gamma_i + \varepsilon_i, \tag{25}$$
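where $\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT})'$. Using (25) in (15), we have
$$\hat{b}_i - \beta_i = \left(\frac{X_i'\bar{M}X_i}{T}\right)^{-1}\left(\frac{X_i'\bar{M}F}{T}\right)\gamma_i + \left(\frac{X_i'\bar{M}X_i}{T}\right)^{-1}\left(\frac{X_i'\bar{M}\varepsilon_i}{T}\right), \tag{26}$$
$\eta_{B,i} \sim \text{IID}(0, \Omega_{B\eta})$,
and
$\eta_{C,i} \sim \text{IID}(0, \Omega_{C\eta})$, for i = 1, 2, . . . , N, (22)
which can also be viewed as a generalized fixed effects (GFE) estimator, and reduces to the standard FE estimator if $\bar{H} = \tau_T$, with $\tau_T$ being a T × 1 vector of ones.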
3 E ‖T −2 F ′ F ‖2 < ∞, which is part of Assumption 7(iii), can be established under mild regularity conditions (see Lemma 4 of Phillips and Moon (1999)).
which shows the direct dependence of $\hat{b}_i$ on the unobserved factors through $T^{-1}X_i'\bar{M}F$. To examine the properties of this component, we first note that (2) and (7) can be written in matrix notation as
$$X_i = G\Pi_i + V_i, \tag{27}$$
and
$$\bar{H} = (D, \bar{Z}) = (D, D\bar{B} + F\bar{C} + \bar{U}) = G\bar{P} + \bar{U}^{*}, \tag{28}$$
where $\Pi_i = (A_i', \Gamma_i')'$, $V_i = (v_{i1}, v_{i2}, \ldots, v_{iT})'$, G = (D, F), and $\bar{P}$ and $\bar{U}^{*}$ are defined by (19). Using Lemmas 3 and 4 in Appendix A, and assuming that rank condition (9) is satisfied, it follows that
¯F Xi′ M T
¯ Xi Xi′ M T
= Op −
′ ′
1
+ Op
√
NT
T
1
Xi′ Mg Xi
′
1
= Op
,
N
√
N
uniformly over i,
,
(29)
uniformly over i,
(30)
The variance estimator for Σ MG suggested by Pesaran (2006) is given by
ˆ MG = Σ
T
−
Xi′ Mg εi
= Op
T
1
1
+ Op
√
NT
N
(38)
N − 1 i=1
which can be used here as well. The following theorem summarizes the results for the Mean Group estimator. The result is proved in Appendix C.
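As a minimal sketch of the non-parametric variance estimator in (38), namely $\hat{\Sigma}_{MG} = (N-1)^{-1}\sum_{i=1}^{N}(\hat{b}_i - \hat{b}_{MG})(\hat{b}_i - \hat{b}_{MG})'$, the following assumes that the individual estimates are stacked in an N × k array. The function names are illustrative and not part of the paper.

```python
import numpy as np

def mg_variance(b_i):
    """Variance estimator of eq. (38) from an (N, k) array of individual estimates."""
    b_i = np.asarray(b_i, dtype=float)
    N = b_i.shape[0]
    dev = b_i - b_i.mean(axis=0)          # b_i - b_MG, row by row
    return dev.T @ dev / (N - 1)

def mg_standard_errors(b_i):
    """Standard errors of b_MG implied by sqrt(N)(b_MG - beta) being asymptotically N(0, Sigma_MG)."""
    N = np.asarray(b_i).shape[0]
    return np.sqrt(np.diag(mg_variance(b_i)) / N)
```

The estimator only uses the cross-sectional dispersion of the $\hat{b}_i$, which is why it does not require knowledge of the number of factors or of the error dynamics.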
Theorem 1. Consider the panel data models (1) and (2). Let Assumptions 1–6 and 7(ii), (iii) hold. Then, for the Common Correlated Effects Mean Group estimator, $\hat{b}_{MG}$, defined by (14), we have, as $(N, T) \overset{j}{\to} \infty$, that
√
and
¯ εi Xi′ M
N − (bˆ i − bˆ MG )(bˆ i − bˆ MG )′ ,
1
d
N (bˆ MG − β) → N (0, Σ MG ),
where
,
uniformly over i.
(31)
Σ MG = Ω~ + Λ,
(39)
N
1 −
If the rank condition does not hold, then by Lemma 6 in Appendix A it follows that
Λ = lim
¯F Xi′ M
and Σ iqT is defined in (C.67). Σ MG can be consistently estimated by (38).
T
¯ Xi Xi′ M T
− −
Xi′ Mq F T
= Op
Xi′ Mq Xi
,
N
= Op
T
1
√
1
√
N
uniformly over i,
,
(32)
uniformly over i,
(33)
and
¯ εi Xi′ M T
−
Xi′ Mq εi
= Op
T
1
1
+ Op
√
NT
N
,
uniformly over i.
(34)
In the following subsections we discuss our main theoretical results.
We now examine the asymptotic properties of the pooled estimators. Focusing first on the MG estimator, and using (26), we have N −
1 N (bˆ MG − β) = √
N i=1
+
N 1 −
N i=1
~i + −1
N 1 −
−1 Ψˆ iT
N i=1
√
√
NXi M¯ εi
Ψˆ iT
′
T
¯F NXi′ M
¯ F) N (Xi′ M T
= Op
1
√
T
T
+ Op
N 1 −
1
√
N
N (bˆ MG − β) = √ ~i + Op N i =1
,
(35)
Note that this theorem does not require that the rank condition, (9), holds for any number, m, of unobserved factors so long as m is fixed. Also, it does not impose any restrictions on the relative rates of expansion of N and T . The following theorem summarizes the results for the second pooled estimator, bˆ P . The proof is provided in Appendix C.
Theorem 2. Consider the panel data models (1) and (2), and suppose that Assumptions 1–6 and 7(i) hold. Then, for the Common Correlated
1
√
T
Effects Pooled estimator, bˆ P , defined by (20), as (N , T ) → ∞, we have that
√
d
N (bˆ P − β) → N (0, Σ ∗P ),
.
(36)
+ Op
1
√
N
Φ = lim
N ,T →∞
N ,T →∞
N (bˆ MG − β) → N (0, Σ MG ),
as (N , T ) → ∞.
N 1 −
N i=1
ΦTi ,
Ξ = lim
N ,T →∞
N 1 −
N i=1
Ξ Ti ,
ΘTi
Ξ Ti = Var[T −2 Xi′ Mq Xi ~i ], and ΦTi and ΘTi are given by (C.87) and (C.84) , respectively. Σ ∗P can be estimated consistently by
.
∗
j
N 1 −
N i=1
Θ = lim
ˆ P = Ψˆ Σ d
(41)
where
Hence
√
(40)
Σ ∗P = Θ−1 (Ξ + Φ)Θ−1 ,
Using this, we can formally show that
√
Σ iqT
where Σ ∗P is given by
γi
where $\hat{\Psi}_{iT} = T^{-1}X_i'\bar{M}X_i$. In the case where rank condition (9) is satisfied, by (29), we have
N i =1
j
3.1. Results for pooled estimators
√
N ,T →∞
(37)
where
∗−1 ∗ ˆ
ˆ R Ψ
∗−1
,
(42)
where
$$\hat{\Psi}^{*} = N^{-1}\sum_{i=1}^{N}\frac{X_i'\bar{M}X_i}{T}, \tag{43}$$
$$\hat{R}^{*} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{X_i'\bar{M}X_i}{T}\right)(\hat{b}_i - \hat{b}_{MG})(\hat{b}_i - \hat{b}_{MG})'\left(\frac{X_i'\bar{M}X_i}{T}\right). \tag{44}$$

Overall we see that, despite a number of differences in the above analysis, especially in terms of the results given in (29)–(34), compared to the results in Pesaran (2006), the conclusions are remarkably similar when the factors are assumed to follow unit root processes.

Remark 3. The formal analysis in the Appendices focuses on the case where the factor is an I(1) process and no cointegration is present among the factors. But, as shown by Johansen (1995, p. 40), when the factor process is cointegrated and there are l < m cointegrating vectors, we have that Fγi = F1δ1i + F2δ2i, where F1 is an (m − l)-dimensional I(1) process with no cointegration, whereas F2 is an l-dimensional I(0) process. This implies that the cointegration case is equivalent to a case where the model contains a mix of non-cointegrated I(1) and I(0) factor processes. Since we know that the results of the paper hold for both non-cointegrated I(1) and, by Pesaran (2006), I(0) factor processes, we conjecture that they hold for the cointegrated case, as well. However, we feel that a formal proof of this statement is beyond the scope of the present paper. We consider a case of cointegrated factors in the Monte Carlo study. The results clearly support the above claim.

Remark 4. In the case of standard linear panel data models with strictly exogenous regressors and homogeneous slopes, and without unobserved common factors, Pesaran et al. (1996) show that in general the fixed effect estimator is asymptotically at least as efficient as the Mean Group estimator. It is reasonable to expect that this result also applies to the CCE type estimators, namely that, under βi = β for all i, the CCEP estimator would be at least as efficient as the CCEMG estimator. Although a formal proof is beyond the scope of the present paper, the Monte Carlo results reported below provide some evidence in favour of this conjecture.

As we noted above, the whole analysis does not depend on whether the rank condition holds or not. But in the case where the rank condition is satisfied, a number of simplifications arise. In particular, the technical Assumption 7 is not needed, and Assumption 3 can be relaxed. Namely, the factor loadings, γi, need not follow the random coefficient model. It would be sufficient that they are bounded. Also, the expressions for the theoretical covariance matrices of the estimators change, although crucially the estimators of these covariance matrices do not. For completeness, we present corollaries on the theoretical properties of the pooled estimators when the rank condition holds, below. Proofs are provided in Appendix D.

Corollary 1. Consider the panel data models (1) and (2). Assume that the rank condition, (9), is met and suppose that Assumptions 1–6 hold. Then, for the Common Correlated Effects Mean Group estimator, $\hat{b}_{MG}$, defined by (14), we have, as $(N, T) \overset{j}{\to} \infty$, that
$$\sqrt{N}(\hat{b}_{MG} - \beta) \overset{d}{\to} N(0, \Sigma_{MG}),$$
where $\Sigma_{MG}$ is given by $\Omega_{\varpi}$. $\Sigma_{MG}$ can be consistently estimated by (38).

Corollary 2. Consider the panel data models (1) and (2), and suppose that the rank condition, (9), is met and that Assumptions 1–6 hold. Then, for the Common Correlated Effects Pooled estimator, $\hat{b}_P$, defined by (20), as $(N, T) \overset{j}{\to} \infty$, we have that
$$\sqrt{N}(\hat{b}_P - \beta) \overset{d}{\to} N(0, \Sigma_P^{*}),$$
where
$$\Sigma_P^{*} = \Psi^{*-1}R^{*}\Psi^{*-1}, \tag{45}$$
$$R^{*} = \lim_{N,T\to\infty} N^{-1}\sum_{i=1}^{N}\Sigma_{v\Omega iT}, \tag{46}$$
$$\Psi^{*} = \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N}\Sigma_{vi}, \tag{47}$$
and $\Sigma_{v\Omega iT}$ denotes the variance of $(X_i'M_gX_i/T)\,\varpi_i$. $\Sigma_P^{*}$ can be estimated consistently by (42).
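The sandwich form $\hat{\Psi}^{*-1}\hat{R}^{*}\hat{\Psi}^{*-1}$ used to estimate $\Sigma_P^{*}$, with $\hat{\Psi}^{*}$ the cross-section average of $X_i'\bar{M}X_i/T$ and $\hat{R}^{*}$ the weighted outer products of $\hat{b}_i - \hat{b}_{MG}$, can be sketched as below. The function name and argument layout are assumptions made for illustration; `M_bar` is taken to be the T × T matrix of (16) computed elsewhere.

```python
import numpy as np

def ccep_variance(X, M_bar, b_i):
    """Sandwich variance estimate for the CCEP estimator.

    X: (N, T, k) regressors, M_bar: (T, T) annihilator of eq. (16),
    b_i: (N, k) individual CCE estimates.
    """
    N, T, k = X.shape
    b_mg = b_i.mean(axis=0)
    Psi = np.zeros((k, k))
    R = np.zeros((k, k))
    for i in range(N):
        S_i = X[i].T @ M_bar @ X[i] / T            # X_i' M_bar X_i / T
        d = (b_i[i] - b_mg)[:, None]               # k x 1 deviation from the MG estimate
        Psi += S_i / N
        R += S_i @ d @ d.T @ S_i / (N - 1)
    Psi_inv = np.linalg.inv(Psi)
    return Psi_inv @ R @ Psi_inv

def ccep_standard_errors(X, M_bar, b_i):
    """Standard errors of b_P implied by the sqrt(N) normalisation."""
    N = X.shape[0]
    return np.sqrt(np.diag(ccep_variance(X, M_bar, b_i)) / N)
```

When every $X_i'\bar{M}X_i/T$ equals the identity matrix, the weights cancel and the expression collapses to the Mean Group variance estimator in (38), which gives a simple check on the algebra.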
3.2. Estimation of individual slope coefficients

In panel data models where N is large, the estimation of the individual slope coefficients is likely to be of secondary importance as compared to establishing the properties of pooled estimators. However, it might still be of interest to consider conditions under which they can be consistently estimated. In the case of our set-up, the following further assumption is needed.

Assumption 8. For each i, εit is a martingale difference sequence. For each i, vit is a k × 1 vector of L2+δ, δ > 0, stationary near epoque dependent (NED) processes of size 1/2, on some α-mixing process of size −(2 + δ)/δ.

Then, we have the following result. The proof is provided in Appendix E.

Theorem 3. Consider the panel data models (1) and (2) and suppose that Assumptions 1, 2(i) and 3–8 hold. Let $\sqrt{T}/N \to 0$, as $(N, T) \overset{j}{\to} \infty$, and assume that the rank condition (9) is satisfied. As $(N, T) \overset{j}{\to} \infty$, $\hat{b}_i$, defined by (15), is a consistent estimator of βi. Further,
$$\sqrt{T}(\hat{b}_i - \beta_i) \overset{d}{\to} N(0, \Sigma_{b_i}). \tag{48}$$
A consistent estimator of $\Sigma_{b_i}$ is given by
$$\hat{\Sigma}_{b_i} = \mathring{\sigma}_i^{2}\left(\frac{X_i'\bar{M}X_i}{T}\right)^{-1}, \tag{49}$$
where
$$\mathring{\sigma}_i^{2} = \frac{(y_i - X_i\hat{b}_i)'\bar{M}(y_i - X_i\hat{b}_i)}{T - (n + 2k + 1)}. \tag{50}$$
Remark 5. Parts of the above result hold under weaker versions of Assumption 8. In particular, we note that the central limit theorem in (E.110) holds if Assumption 2(ii) holds. However, in this case, the asymptotic variance has a different form, as autocovariances of εit vit enter the asymptotic variance expression. If, then, a consistent estimate of the asymptotic variance is required, a Newey and West type correction (Newey and West, 1987) needs to be used. Consistency of this variance estimator requires more stringent assumptions than the NED Assumption 2(ii). It is sufficient to assume that $(\varepsilon_{it}, v_{it}')'$ is a strongly mixing process for this consistency to hold.

Remark 6. It is worth noting that despite the fact that, under our assumptions, ft, yit and xit are I(1) and cointegrated, implying that εit is an I(0) process, in the results of Theorem 3, the rate of convergence of $\hat{b}_i$ to βi as $(N, T) \overset{j}{\to} \infty$ is $\sqrt{T}$ and not T. It is helpful to develop some intuition behind this result. Since for N sufficiently
large ft can be well approximated by the cross-section averages, for pedagogic purposes we might as well consider the case where ft is observed. Without loss of generality, we also abstract from dt, and substitute (2) in (1) to obtain
$$y_{it} = \beta_i'(\Gamma_i'f_t + v_{it}) + \gamma_i'f_t + \varepsilon_{it} = \vartheta_i'f_t + \zeta_{it}, \tag{51}$$
where $\vartheta_i = \Gamma_i\beta_i + \gamma_i$ and $\zeta_{it} = \varepsilon_{it} + \beta_i'v_{it}$. First, it is clear that, under our assumptions, and for all values of βi, ζit is I(0) irrespective of whether ft is I(0) or I(1). But, if ft is I(1), since ζit ∼ I(0), then yit will also be I(1) and cointegrated with ft. Hence, it follows that ϑi can be estimated superconsistently. However, the ordinary least squares (OLS) estimator of βi need not be superconsistent. To see this, note that βi can be estimated equivalently by regressing the residuals from the regressions of yit on ft on the residuals from the regressions of xit on ft. Both these sets of residuals are stationary processes, and the resulting estimator of βi will be at most $\sqrt{T}$-consistent.

Remark 7. An issue related to the above remark concerns the probability limit of the OLS estimator of the coefficients of xit in a regression of yit on xit alone. In general, such a regression will be subject to the omitted variable problem and hence misspecified. Also, the asymptotic properties of such OLS estimators cannot be derived without further assumptions. However, there is a special case which illustrates the utility of our method. Abstracting from dt, assuming that k = m and that Γi is invertible, then, similarly to (51), write the model for yit as
$$y_{it} = \beta_i'x_{it} + \gamma_i'\Gamma_i'^{-1}(x_{it} - v_{it}) + \varepsilon_{it} = \varrho_i'x_{it} + \varsigma_{it}, \tag{52}$$
where $\varrho_i' = \beta_i' + \gamma_i'\Gamma_i'^{-1}$ and $\varsigma_{it} = \varepsilon_{it} - \gamma_i'\Gamma_i'^{-1}v_{it}$. Note that ςit is, by construction, correlated with vit. The question is whether estimating a regression of the form (52) provides a consistent estimate of ϱi. For stationary processes this would not be the case, due to the correlation between ςit and vit. However, in the case of nonstationary data this is not clear, and consistency would depend on the exact specification of the model. Under the assumptions we have made in this remark, the estimator of ϱi would be consistent. However, even in this case it is clear that the application of the least squares method to (52) can only lead to a consistent estimator of ϱi and not of βi. To consistently estimate the latter we need to augment the regressions of yit on xit with their cross-section averages.
4. Monte Carlo design and evidence

In this section, we provide Monte Carlo evidence on the small-sample properties of the CCEMG and the CCEP estimators, which are defined by (14) and (20), respectively. We consider nine alternative estimators. The first one is the CupBC estimator proposed by Bai et al. (2009), which is a bias-corrected version of a continuously updated estimator that estimates both the slope parameters and the unobserved factors iteratively. The CupBC estimator, as analyzed by Bai et al. (2009), assumes that the number of unobserved factors is known and only considers the case where the slopes are homogeneous.⁴ In addition, we consider two alternative principal component (PC) augmentation approaches discussed in Kapetanios and Pesaran (2007). The first PC approach applies the Bai and Ng (2002) procedure to $z_{it} = (y_{it}, x_{it}')'$ to obtain consistent estimates of the unobserved factors, and then uses the estimated factors to augment the regression (1), and thus produces a consistent estimator of β. We consider both pooled and mean group versions of this estimator, which we refer to as PC1POOL and PC1MG. The second PC approach begins with extracting the principal component estimates of the unobserved factors from yit and xit separately. In the second step, yit and xit are regressed on their respective factor estimates, and in the third step the residuals from these regressions are used to compute the standard pooled and mean group estimators, with no cross-sectional dependence adjustments. We refer to the estimators based on this approach as PC2POOL and PC2MG, respectively.

On top of these principal component estimators, we consider two sets of benchmark estimators. The first set consists of infeasible mean group and pooled estimators, which are obtained assuming that the factors are observable (i.e., $\bar{z}_t$ for the CCE estimators is replaced by the true factor ft). The other set consists of naive mean group and pooled estimators, which ignore the factor structure. The naive estimators are expected to illustrate the extent of bias and size distortions that can occur if the error cross-section dependence that is induced by the factor structure is ignored.

We report summaries of the performance of the estimators in the Monte Carlo experiments in terms of average biases, root mean square errors and rejection probabilities of the t-test for slope parameters under both the null hypothesis and an alternative hypothesis. For computing the t-statistics, the standard errors of mean group and pooled CCE estimators are estimated using (38) and (42), respectively. The standard errors of PC1, PC2, infeasible and naive estimators are estimated similarly to those of the CCE estimators. The standard errors of the CupBC estimator are computed following Bai et al. (2009).

⁴ See Bai et al. (2009), for more details.

4.1. Baseline design

The experimental design of the Monte Carlo study closely follows the one used in Pesaran (2006). Consider the following data generating process (DGP):
$$y_{it} = \alpha_{i1}d_{1t} + \beta_{i1}x_{1it} + \beta_{i2}x_{2it} + \gamma_{i1}f_{1t} + \gamma_{i2}f_{2t} + \varepsilon_{it}, \tag{53}$$
and
$$x_{ijt} = a_{ij1}d_{1t} + a_{ij2}d_{2t} + \gamma_{ij1}f_{1t} + \gamma_{ij3}f_{3t} + v_{ijt}, \qquad j = 1, 2, \tag{54}$$
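for i = 1, 2, . . . , N, and t = 1, 2, . . . , T. This DGP is a restricted version of the general linear model considered in Pesaran (2006), and sets n = k = 2, and m = 3, with $\alpha_i' = (\alpha_{i1}, 0)$, $\beta_i' = (\beta_{i1}, \beta_{i2})$, and $\gamma_i' = (\gamma_{i1}, \gamma_{i2}, 0)$, and
$$A_i' = \begin{pmatrix} a_{i11} & a_{i12} \\ a_{i21} & a_{i22} \end{pmatrix}, \qquad \Gamma_i' = \begin{pmatrix} \gamma_{i11} & 0 & \gamma_{i13} \\ \gamma_{i21} & 0 & \gamma_{i23} \end{pmatrix}.$$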
The observed common factors and the individual-specific errors of xit are generated as independent stationary AR(1) processes with zero means and unit variances:
$$d_{1t} = 1, \qquad d_{2t} = \rho_d d_{2,t-1} + v_{dt}, \qquad t = -49, \ldots, 1, \ldots, T,$$
$$v_{dt} \sim \text{IIDN}(0, 1 - \rho_d^2), \qquad \rho_d = 0.5, \qquad d_{2,-50} = 0,$$
$$v_{ijt} = \rho_{v_{ij}} v_{ij,t-1} + \varpi_{ijt}, \qquad t = -49, \ldots, 1, \ldots, T,$$
$$\varpi_{ijt} \sim \text{IIDN}(0, 1 - \rho_{v_{ij}}^2), \qquad v_{ij,-50} = 0,$$
and
$$\rho_{v_{ij}} \sim \text{IIDU}[0.05, 0.95], \qquad \text{for } j = 1, 2.$$
But the unobserved common factors are generated as non-stationary processes:
$$f_{jt} = f_{j,t-1} + v_{fj,t}, \qquad \text{for } j = 1, 2, 3, \quad t = -49, \ldots, 0, \ldots, T, \tag{55}$$
$$v_{fj,t} \sim \text{IIDN}(0, 1), \qquad f_{j,-50} = 0, \qquad \text{for } j = 1, 2, 3.$$
The first 50 observations are discarded.
To illustrate the robustness of the CCE estimators and others to the dynamics of the individual-specific errors of yit, these are generated as the (cross-sectional) mixture of stationary heterogeneous AR(1) and MA(1) errors. Namely,
$$\varepsilon_{it} = \rho_{i\varepsilon}\varepsilon_{i,t-1} + \sigma_i\sqrt{1 - \rho_{i\varepsilon}^2}\,\omega_{it}, \qquad i = 1, 2, \ldots, N_1, \quad t = -49, \ldots, 0, \ldots, T,$$
and
$$\varepsilon_{it} = \frac{\sigma_i}{\sqrt{1 + \theta_{i\varepsilon}^2}}\,(\omega_{it} + \theta_{i\varepsilon}\omega_{i,t-1}), \qquad i = N_1 + 1, \ldots, N, \quad t = -49, \ldots, 0, \ldots, T,$$
where N1 is the nearest integer to N/2,
$$\omega_{it} \sim \text{IIDN}(0, 1), \qquad \sigma_i^2 \sim \text{IIDU}[0.5, 1.5], \qquad \rho_{i\varepsilon} \sim \text{IIDU}[0.05, 0.95], \qquad \theta_{i\varepsilon} \sim \text{IIDU}[0, 1].$$
$\rho_{v_{ij}}$, $\rho_{i\varepsilon}$, $\theta_{i\varepsilon}$ and σi are not changed across replications. The first 49 observations are discarded.

The factor loadings of the observed common effects, αi1 and $\text{vec}(A_i) = (a_{i11}, a_{i21}, a_{i12}, a_{i22})'$, are generated as IIDN(1, 1) and IIDN(0.5τ4, 0.5I4) with $\tau_4 = (1, 1, 1, 1)'$, respectively, which are not changed across replications. The parameters of the unobserved common effects in the xit equation are generated independently across replications as
$$\Gamma_i' = \begin{pmatrix} \gamma_{i11} & 0 & \gamma_{i13} \\ \gamma_{i21} & 0 & \gamma_{i23} \end{pmatrix} \sim \text{IID}\begin{pmatrix} N(0.5, 0.50) & 0 & N(0, 0.50) \\ N(0, 0.50) & 0 & N(0.5, 0.50) \end{pmatrix}.$$
For the parameters of the unobserved common effects in the yit equation, γi, we considered two different sets, which we denote by A and B. Under set A, the γi are drawn such that the rank condition is satisfied, namely
$$\gamma_{i1} \sim \text{IIDN}(1, 0.2), \qquad \gamma_{i2A} \sim \text{IIDN}(1, 0.2), \qquad \gamma_{i3} = 0,$$
and
$$E(\tilde{\Gamma}_{iA}) = (E(\gamma_{iA}), E(\Gamma_i)) = \begin{pmatrix} 1 & 0.5 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0.5 \end{pmatrix}.$$
Under set B,
$$\gamma_{i1} \sim \text{IIDN}(1, 0.2), \qquad \gamma_{i2B} \sim \text{IIDN}(0, 1), \qquad \gamma_{i3} = 0,$$
so
$$E(\tilde{\Gamma}_{iB}) = (E(\gamma_{iB}), E(\Gamma_i)) = \begin{pmatrix} 1 & 0.5 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0.5 \end{pmatrix},$$
and the rank condition is not satisfied. For each set, we conducted two different experiments.

• Experiment 1 examines the case of heterogeneous slopes with βij = 1 + ηij, j = 1, 2, and ηij ∼ IIDN(0, 0.04), across replications.
• Experiment 2 considers the case of homogeneous slopes with $\beta_i = \beta = (1, 1)'$.

The two versions of experiment 1 will be denoted by 1A and 1B, and those of experiment 2 by 2A and 2B.

Concerning the infeasible pooled estimator, it is important to note that, although this estimator is unbiased under all four sets of experiments, it need not be efficient, since in these experiments the slope coefficients, βi, and/or error variances, $\sigma_i^2$, differ across i. As a result, the CCE or PC augmented estimators may in fact dominate the infeasible estimator in terms of root mean square error (RMSE), particularly in the case of experiments 1A and 1B, where the slopes as well as the error variances are allowed to vary across i.

Another important consideration worth bearing in mind when comparing the CCE and the principal component type estimators is the fact that the computation of the CupBC, PC1 and PC2 estimators assumes that m = 3, namely that the number of unobserved factors is known. In practice, m might be difficult to estimate accurately, particularly when N or T happen to be smaller than 50. By contrast, the CCE type estimators are valid for any fixed m and do not require an a priori estimate for m.

Each experiment was replicated 2000 times for the (N, T) pairs with N, T = 20, 30, 50, 100, 200. In what follows, we shall focus on β1 (the cross-section mean of βi1), and the results for β2, which are very similar to those for β1, will not be reported. The results for all the estimators considered are reported in Table 1. Since the performance of CCE and CupBC estimators dominates other feasible estimators in most of the designs considered, to save space we do not report the results of these other estimators for the remaining experiments.

4.2. Designs for robustness checks

In this subsection, we consider a number of Monte Carlo experiment designs that aim to check the robustness of the estimators to a variety of empirical settings.

4.2.1. The number of factors exceeds k + 1

In order to show the effect of a different type of violation of the rank condition from experiment B, we consider the DGP 1A, but an extra factor term γi4 f4t is added to the right-hand side (RHS) of the y equation (53), where γi4 ∼ IIDN(0.5, 0.2), $f_{4t} = f_{4,t-1} + v_{f4,t}$, $v_{f4,t}$ ∼ IIDN(0, 1), $f_{4,-50} = 0$. In this case, observe that
$$E(\gamma_i, \Gamma_i)' = \begin{pmatrix} 1 & 1 & 0 & 0.5 \\ 0.5 & 0 & 0 & 0 \\ 0 & 0 & 0.5 & 0 \end{pmatrix},$$
whose rank is k + 1 = 3, which is less than the number of unobserved factors, m = 4. Under this experiment, the number of factors is treated as unknown and is estimated, using the information criterion ‘$PC_{p2}$’ which is proposed by Bai and Ng (2002, p. 201).⁵ The information criterion is applied to the first differenced variables with the maximum number of factors set to six. The results are reported in Table 5. However, recall that the CCE type estimators do not make use of the number of the factors and are valid irrespective of whether k + 1 is more or less than m.
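The rank statements made for sets A and B and for the four-factor design can be checked numerically. The sketch below simply applies NumPy's `matrix_rank` to the expected loading matrices quoted in the text, arranged with one row per unobserved factor (m × (k + 1)). The helper name `rank_condition_holds` is hypothetical.

```python
import numpy as np

def rank_condition_holds(mean_loadings):
    """True when the m x (k + 1) matrix of mean loadings has full row rank m."""
    mean_loadings = np.atleast_2d(np.asarray(mean_loadings, dtype=float))
    m = mean_loadings.shape[0]
    return bool(np.linalg.matrix_rank(mean_loadings) == m)

# Expected loadings (E(gamma_i), E(Gamma_i)), one row per factor, as quoted in the text
SET_A = np.array([[1.0, 0.5, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.5]])
SET_B = np.array([[1.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.5]])
# Design with a fourth factor entering only the y equation with mean loading 0.5
FOUR_FACTORS = np.vstack([SET_A, [0.5, 0.0, 0.0]])

for name, mat in [("A", SET_A), ("B", SET_B), ("four factors", FOUR_FACTORS)]:
    print(name, np.linalg.matrix_rank(mat), rank_condition_holds(mat))
```

Set A has rank 3 = m, so the condition holds; set B has rank 2 < 3, and the four-factor design has rank 3 < 4, so in both of those cases it fails.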
4.2.2. Cointegrating factors

In this design, the unobserved common factors are generated as cointegrated non-stationary processes. There are two underlying stochastic trends, given by
$$f_{jt}^{t} = f_{j,t-1}^{t} + v_{fj,t}^{t}, \qquad \text{for } j = 1, 2, \quad t = -49, \ldots, 0, \ldots, T, \tag{56}$$
$$v_{fj,t}^{t} \sim \text{IIDN}(0, 1), \qquad f_{j,-50}^{t} = 0, \qquad \text{for } j = 1, 2.$$
Then, this experiment uses the same design as 1A, but the I(1) factors in (53) and (54) are replaced by
$$f_{1t} = f_{1t}^{t} + 0.5f_{2t}^{t} + v_{f1,t}, \qquad t = -49, \ldots, 0, \ldots, T,$$
$$f_{2t} = 0.5f_{1t}^{t} + f_{2t}^{t} + v_{f2,t}, \qquad t = -49, \ldots, 0, \ldots, T,$$
$$f_{3t} = 0.75f_{1t}^{t} + 0.25f_{2t}^{t} + v_{f3,t}, \qquad t = -49, \ldots, 0, \ldots, T,$$
$$v_{fj,t} \sim \text{IIDN}(0, 1), \qquad f_{j,-50} = 0, \qquad \text{for } j = 1, 2, 3.$$
The first 50 observations are discarded. The results are reported in Table 6.
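The baseline design of (53)–(55) and the cointegrated-factor variant can be sketched as a single simulation routine. This is an illustrative reading of the design, not the authors' code: the function names are invented here, the second argument of each normal distribution is interpreted as a variance, and all parameters are redrawn on every call, whereas the text keeps several of them fixed across replications.

```python
import numpy as np

def ar1(innov, rho):
    """x_t = rho * x_{t-1} + innov_t along the last axis, started from zero."""
    x = np.zeros_like(innov)
    prev = np.zeros(innov.shape[:-1])
    for t in range(innov.shape[-1]):
        prev = rho * prev + innov[..., t]
        x[..., t] = prev
    return x

def simulate_panel(N, T, rank_ok=True, heterogeneous=True,
                   cointegrated=False, burn=50, seed=0):
    """Return (y, X, D, beta) with shapes (N, T), (N, T, 2), (T, 2), (N, 2)."""
    rng = np.random.default_rng(seed)
    S = T + burn
    # Observed common effects: d_1t = 1, d_2t stationary AR(1) with rho_d = 0.5
    d1 = np.ones(S)
    d2 = ar1(rng.normal(0.0, np.sqrt(1 - 0.5 ** 2), S), 0.5)
    D = np.stack([d1, d2])                                     # 2 x S
    # Unobserved factors: three random walks, or three factors driven by two trends
    if cointegrated:
        trends = np.cumsum(rng.standard_normal((2, S)), axis=1)
        W = np.array([[1.0, 0.5], [0.5, 1.0], [0.75, 0.25]])
        f = W @ trends + rng.standard_normal((3, S))
    else:
        f = np.cumsum(rng.standard_normal((3, S)), axis=1)
    # Regressor errors: heterogeneous AR(1) with unit variance
    rho_v = rng.uniform(0.05, 0.95, (N, 2))
    v = ar1(rng.standard_normal((N, 2, S)) * np.sqrt(1 - rho_v ** 2)[..., None], rho_v)
    # y-equation errors: AR(1) for the first N1 units, MA(1) for the rest
    N1 = int(np.floor(N / 2 + 0.5))
    sigma = np.sqrt(rng.uniform(0.5, 1.5, N))
    omega = rng.standard_normal((N, S + 1))
    rho_e = rng.uniform(0.05, 0.95, N1)
    theta = rng.uniform(0.0, 1.0, N - N1)
    eps = np.empty((N, S))
    eps[:N1] = ar1((sigma[:N1] * np.sqrt(1 - rho_e ** 2))[:, None] * omega[:N1, 1:], rho_e)
    eps[N1:] = (sigma[N1:] / np.sqrt(1 + theta ** 2))[:, None] * (
        omega[N1:, 1:] + theta[:, None] * omega[N1:, :-1])
    # Loadings on observed and unobserved common effects
    alpha1 = rng.normal(1.0, 1.0, N)
    A = rng.normal(0.5, np.sqrt(0.5), (N, 2, 2))               # A[i, j] = (a_ij1, a_ij2)
    G = np.zeros((N, 2, 3))                                    # G[i, j] = (g_ij1, 0, g_ij3)
    G[:, 0, 0] = rng.normal(0.5, np.sqrt(0.5), N)
    G[:, 0, 2] = rng.normal(0.0, np.sqrt(0.5), N)
    G[:, 1, 0] = rng.normal(0.0, np.sqrt(0.5), N)
    G[:, 1, 2] = rng.normal(0.5, np.sqrt(0.5), N)
    gam = np.zeros((N, 3))
    gam[:, 0] = rng.normal(1.0, np.sqrt(0.2), N)
    gam[:, 1] = rng.normal(1.0, np.sqrt(0.2), N) if rank_ok else rng.normal(0.0, 1.0, N)
    eta = rng.normal(0.0, 0.2, (N, 2)) if heterogeneous else np.zeros((N, 2))
    beta = 1.0 + eta
    # Assemble eqs. (54) and (53), then drop the burn-in observations
    X = A @ D + G @ f + v                                      # N x 2 x S
    y = alpha1[:, None] * d1 + np.einsum('ij,ijt->it', beta, X) + gam @ f + eps
    return y[:, burn:], X[:, :, burn:].transpose(0, 2, 1), D[:, burn:].T, beta
```

With `rank_ok=True` and `heterogeneous=True` the routine corresponds to experiment 1A, `rank_ok=False` gives the set B loadings, and `heterogeneous=False` gives the homogeneous-slope experiments. Passing `cointegrated=True` replaces the three independent random walks by three factors driven by two stochastic trends, as in the cointegrating design.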
⁵ $PC_{p2}$ is one of the information criteria which performed well in the finite sample investigations reported in Bai and Ng (2002).
Table 1
Small-sample properties of common correlated effects type estimators in the case of experiment 1A (heterogeneous slopes + full rank).
Column blocks, each reported for T = 20, 30, 50, 100, 200: Bias (×100); Root mean square error (×100); Size (5% level, H0: β1 = 1.00); Power (5% level, H1: β1 = 0.95). Rows within each panel: N = 20, 30, 50, 100, 200.
CCE type estimators 20 30 50 100 200
CCEMG 0.05 0.09 −0.19 0.00 −0.05
20 30 50 100 200
CCEP 0.18 −0.17 0.00 0.00 −0.07
−0.10 −0.01
−0.03 −0.01 −0.11
0.22 0.04 −0.02
0.04 −0.03
0.00
−0.05
−0.12
0.09 −0.07 0.03 −0.05
0.18 0.09 −0.04
0.06
−0.07
−0.13
0.10 −0.04 0.04 0.00
0.14 0.03 0.05
−0.01 −0.15 0.12 0.00 0.05
9.67 7.69 5.88 4.25 3.07
7.89 6.09 4.61 3.46 2.49
6.74 5.11 4.01 2.89 2.01
5.87 4.54 3.44 2.33 1.72
5.54 4.22 3.13 2.27 1.51
7.20 6.95 5.70 5.75 4.40
6.90 5.30 5.05 5.85 5.15
7.15 5.90 6.65 5.25 4.90
7.90 6.25 6.20 4.90 5.60
7.55 6.35 5.95 6.20 5.10
11.65 11.40 15.10 23.35 35.55
13.00 14.25 20.40 34.30 52.65
16.10 18.05 25.60 44.40 68.70
17.50 22.05 34.10 56.00 83.65
20.10 26.85 36.65 63.25 90.50
8.75 7.10 5.33 3.78 2.71
7.67 5.99 4.51 3.25 2.29
6.85 5.32 3.97 2.85 1.95
6.32 4.78 3.47 2.34 1.70
6.21 4.46 3.22 2.28 1.53
7.70 7.55 6.80 5.70 5.10
8.10 6.25 6.20 5.65 4.35
7.30 6.75 5.90 5.60 5.05
8.05 6.65 6.35 5.15 4.70
7.15 6.45 6.45 6.25 4.75
12.75 12.40 17.45 28.15 44.75
13.50 15.00 22.15 37.40 56.80
16.05 19.30 26.40 44.80 70.30
16.80 20.65 32.90 55.20 83.55
18.30 26.90 36.25 61.75 89.75
0.87 11.16 0.83 8.91 0.54 6.77 0.33 4.83 0.17 3.55
9.86 7.70 6.01 4.15 2.94
8.35 6.51 5.05 3.39 2.45
7.46 5.66 4.20 2.76 2.00
6.95 5.28 3.83 2.55 1.69
67.25 66.80 64.45 64.60 62.85
64.40 61.05 58.85 56.40 52.85
57.90 55.95 51.95 47.85 45.00
60.95 55.40 51.35 43.35 44.50
65.75 63.30 56.70 52.65 48.00
72.05 71.85 77.20 80.70 86.65
68.75 69.95 72.65 80.35 88.70
65.00 66.35 69.55 82.50 90.75
68.50 70.75 76.90 87.90 96.60
74.60 77.60 83.45 92.50 99.10
7.21 5.91 4.48 3.16 2.22
6.33 4.95 3.75 2.78 1.93
5.62 4.43 3.39 2.49 1.69
4.98 3.97 3.09 2.15 1.57
4.76 3.87 2.94 2.14 1.44
6.40 6.50 6.45 5.50 4.85
6.20 5.80 5.25 5.15 5.00
6.80 6.05 5.90 5.45 5.00
5.95 5.30 5.25 4.70 5.60
6.50 5.90 5.20 5.45 4.70
12.75 16.15 21.70 36.85 59.15
15.35 18.05 27.35 46.15 72.85
16.85 23.35 31.45 55.10 82.25
19.70 25.20 38.45 62.50 90.40
20.40 28.80 40.25 66.65 92.75
0.27 0.02 0.00 −0.02
7.30 6.23 4.61 3.30 2.35
6.96 5.78 4.40 3.26 2.22
6.92 5.79 4.31 3.12 2.20
7.11 5.89 4.71 3.30 2.45
7.40 6.61 5.02 3.52 2.49
6.40 7.05 5.70 5.25 4.95
6.80 5.90 5.80 5.60 4.70
6.60 7.00 5.50 5.20 4.50
7.00 5.25 6.25 5.20 5.85
5.10 5.70 5.00 5.30 4.70
13.70 15.70 22.20 33.45 56.15
13.75 15.35 22.55 38.20 62.10
14.55 18.95 23.65 38.85 59.50
14.10 16.70 25.50 36.75 59.05
12.65 16.60 21.00 32.30 52.20
−0.13 0.13 −0.01 0.02 0.00
Bai, Kao and Ng principal component estimator 20 30 50 100 200
CupBC 0.62 0.35 0.53 0.21 0.10
0.70 0.42 0.67 0.34 0.10
0.81 0.73 0.33 0.35 0.08
0.77 0.59 0.63 0.28 0.23
Infeasible estimators (including f1t and f2t ) 20 30 50 100 200
Infeasible MG 0.01 −0.19 0.02 −0.14 −0.10 0.07 0.01 0.07 −0.07 0.04
20 30 50 100 200
Infeasible pooled 0.15 −0.13 −0.20 −0.15 0.12 0.07 −0.05 0.07 −0.08 0.06
−0.08
0.15
−0.08
0.01 −0.06 0.02 −0.07
−0.02
0.12 −0.04 0.04 0.01
−0.15
−0.26 −0.07
0.22 −0.08 0.09 −0.12
0.14 0.00 0.06
0.21 0.06 0.07
−0.21
Naive estimators (excluding f1t and f2t ) 20 30 50 100 200
Naive MG 22.18 22.23 22.21 21.97 22.15
23.13 25.06 23.91 23.92 24.09
26.82 28.36 25.65 26.76 27.49
29.96 31.33 29.61 30.04 30.09
32.62 34.01 33.64 32.88 33.23
31.76 30.51 29.75 28.40 27.87
32.97 33.31 31.12 30.02 29.44
37.37 37.87 32.75 32.97 32.80
41.49 41.46 37.73 36.39 35.71
47.04 45.32 42.66 40.06 39.34
32.05 40.45 55.80 71.20 81.85
32.95 44.10 59.30 75.25 86.00
34.85 46.65 58.00 77.90 87.85
35.45 43.85 59.25 78.60 88.05
31.50 39.45 54.75 75.25 87.95
41.00 51.00 68.30 81.05 88.75
42.65 53.95 70.85 84.35 91.95
43.50 57.45 70.30 85.95 92.30
41.95 52.20 69.20 85.85 92.90
38.05 47.15 65.05 83.20 92.05
20 30 50 100 200
Naive pooled 25.25 26.60 25.76 29.39 26.54 28.75 25.81 28.47 25.95 28.32
31.27 32.45 30.39 31.30 31.89
33.59 35.37 34.01 33.15 33.65
34.84 35.46 35.88 34.91 34.11
35.30 35.48 35.61 34.39 34.20
37.01 39.13 37.39 36.76 36.21
42.66 42.70 39.05 39.90 39.63
45.42 45.97 44.04 41.79 42.39
47.67 46.81 45.93 44.27 42.68
42.15 51.55 64.75 75.85 83.45
43.65 56.70 67.15 78.90 86.25
47.75 57.65 69.25 81.35 87.70
45.20 59.55 70.35 79.30 87.40
44.50 56.20 69.35 80.15 87.20
52.50 61.05 73.55 85.10 89.95
52.65 66.60 76.25 86.55 91.90
55.95 66.55 78.25 88.05 93.55
53.40 67.75 78.65 86.65 92.20
51.95 64.55 77.45 86.40 92.20
Principal component estimators, augmented 20 30 50 100 200
PC1MG −12.27 −11.15 −10.30 −9.25 −7.86 −6.46 −6.84 −5.05 −3.89 −4.78 −3.21 −2.03 −4.31 −2.54 −1.39
−8.87 −5.72 −3.01 −1.57 −0.81
−8.90 17.09 14.81 13.24 11.51 11.55 22.55 25.35 30.05 33.40 37.40 12.15 12.95 13.30 12.70 −5.25 13.55 10.84 8.98 7.80 7.15 20.60 20.90 21.65 24.75 24.70 10.75 8.25 7.35 7.40 −3.12 10.10 7.79 5.86 4.67 4.47 19.95 17.65 16.25 14.95 17.90 8.70 8.20 7.65 11.40 −1.45 7.44 5.34 3.68 2.87 2.72 20.10 16.80 11.45 9.75 11.10 9.55 12.15 20.25 28.85 −0.78 6.39 4.19 2.60 1.93 1.71 25.20 17.95 10.95 8.15 7.65 13.85 21.95 42.85 67.65
13.75 6.75 9.75 36.75 77.15
20 30 50 100 200
PC1POOL −11.97 −11.04 −10.35 −8.86 −7.66 −6.34 −6.20 −4.86 −3.81 −4.36 −3.00 −2.01 −3.62 −2.32 −1.36
−9.09 −5.73 −3.07 −1.60 −0.81
−9.23 15.88 14.38 13.07 11.59 12.07 25.50 28.35 32.05 34.45 38.95 12.05 14.10 14.90 14.55 −5.37 12.48 10.45 8.89 7.80 7.34 21.45 23.75 22.05 24.70 25.50 11.00 8.80 7.55 7.95 −3.19 9.06 7.52 5.72 4.73 4.54 21.40 18.75 16.00 16.05 18.90 8.55 9.55 8.10 10.90 −1.49 6.61 5.01 3.61 2.88 2.74 21.05 16.85 11.25 9.35 10.80 11.25 14.55 20.85 27.90 −0.79 5.39 3.81 2.51 1.91 1.73 25.15 17.60 10.50 7.80 7.80 16.35 26.75 45.45 68.00
14.90 6.35 9.65 36.30 76.15
(continued on next page)
4.2.3. Semi-strong factor structure
Chudik et al. (forthcoming) introduce the notions of weak, semi-strong and strong factor structures and prove that these different factor structures do not affect the consistency of the CCE type estimators with I(0) factors. Here we consider the effect of having a semi-strong factor structure when the factors are I(1). For this purpose, the same DGP as in experiment 1A is used, but all factor loadings in (53) and (54) are multiplied by N^{−1/2}. The results are reported in Table 7. It is easily seen that when the factors are weak or semi-strong they cannot be consistently estimated by
G. Kapetanios et al. / Journal of Econometrics 160 (2011) 326–348
Table 1 (continued)
Panels: Bias (×100); Root mean square error (×100); Size (5% level, H0 : β1 = 1.00); Power (5% level, H1 : β1 = 0.95). Each panel is indexed by (N , T ), with N and T taking the values 20, 30, 50, 100, 200.
Principal component estimators, orthogonalized 20 30 50 100 200
PC2MG −31.26 −25.50 −20.65 −16.17 −14.61
−27.06 −24.01 −22.67 −23.11 32.83 −21.21 −18.27 −16.69 −16.33 26.82 −16.23 −13.32 −11.41 −10.89 21.68 −12.44 −9.69 −7.61 −6.60 16.87 −10.78 −8.12 −5.79 −4.59 15.11
28.34 22.25 17.06 12.97 11.19
25.00 23.44 23.83 86.50 88.45 91.25 19.13 17.35 16.92 86.85 87.10 89.10 13.98 11.95 11.37 90.15 88.35 88.80 10.18 7.99 7.02 93.65 93.30 89.75 8.45 6.08 4.85 98.95 97.85 95.45
95.20 93.35 89.05 87.50 90.75
97.40 95.95 91.70 83.30 83.75
74.10 70.15 70.80 72.35 79.65
73.95 67.80 60.25 56.20 60.20
75.80 66.10 52.20 37.60 33.30
82.05 69.25 45.80 19.30 10.00
88.20 74.70 46.10 13.60 6.75
20 30 50 100 200
PC2POOL −31.97 −27.47 −24.27 −23.18 −24.19 33.39 −26.32 −21.51 −18.24 −16.83 −16.75 27.53 −21.22 −16.35 −13.17 −11.35 −10.99 22.10 −16.77 −12.52 −9.62 −7.55 −6.60 17.43 −15.16 −10.91 −8.00 −5.66 −4.53 15.67
28.69 22.48 17.15 13.06 11.33
25.23 23.99 24.99 91.00 90.70 93.20 19.13 17.51 17.37 91.35 90.40 89.70 13.82 11.91 11.48 95.05 90.90 88.95 10.11 7.95 7.03 97.95 95.05 90.50 8.34 5.96 4.79 99.75 98.45 95.95
95.55 93.35 88.20 86.45 89.35
98.50 96.15 91.70 82.30 82.50
80.65 78.50 79.65 80.90 88.65
78.60 71.80 63.80 60.80 65.85
78.80 66.65 52.95 38.10 33.35
83.35 70.65 46.20 18.30 8.40
90.45 76.90 48.25 14.25 6.30
Notes: The DGP is yit = αi1 d1t + βi1 x1it + βi2 x2it + γi1 f1t + γi2 f2t + εit, with εit = ρiε εi,t−1 + σi (1 − ρiε^2)^{1/2} ωit, i = 1, 2, . . . , [N/2], and εit = σi (1 + θiε^2)^{−1/2} (ωit + θiε ωi,t−1), i = [N/2] + 1, . . . , N, ωit ∼ IIDN(0, 1), σi^2 ∼ IIDU[0.5, 1.5], ρiε ∼ IIDU[0.05, 0.95], θiε ∼ IIDU[0, 1]. Regressors are generated by xijt = aij1 d1t + aij2 d2t + γij1 f1t + γij3 f3t + vijt, j = 1, 2, for i = 1, 2, . . . , N. d1t = 1, d2t = 0.5 d2,t−1 + vdt, vdt ∼ IIDN(0, 1 − 0.5^2), d2,−50 = 0; fjt = fj,t−1 + vfj,t, vfj,t ∼ IIDN(0, 1), fj,−50 = 0, for j = 1, 2, 3; vijt = ρvij vij,t−1 + υijt, υijt ∼ IIDN(0, 1 − ρvij^2), vij,−50 = 0 and ρvij ∼ IIDU[0.05, 0.95] for j = 1, 2, for t = −49, . . . , T, with the first 50 observations discarded; αi1 ∼ IIDN(1, 1); aijℓ ∼ IIDN(0.5, 0.5) for j = 1, 2, ℓ = 1, 2; γi11 and γi23 ∼ IIDN(0.5, 0.50), γi13 and γi21 ∼ IIDN(0, 0.50); γi1 and γi2 ∼ IIDN(1, 0.2); βij = 1 + ηij, with ηij ∼ IIDN(0, 0.04) for j = 1, 2. ρvij, ρiε, θiε, σi^2, αi1, aijℓ for j = 1, 2, ℓ = 1, 2 are fixed across replications. CCEMG and CCEP are defined by (14) and (20). CupBC is the bias-corrected iterated principal component estimator of Bai et al. (2009). The PC1 and PC2 estimators are from Kapetanios and Pesaran (2007). The variance estimators of all mean group and pooled estimators (except that of CupBC) are defined by (38) and (42), respectively. The PC type estimators are computed assuming that the number of unobserved factors, m = 3, is known. All experiments are based on 2000 replications.
Table 2 Small-sample properties of common correlated effects type estimators in the case of experiment 2A (homogeneous slopes + full rank).
Bias (×100)
(N , T ) 20 20 30 50 100 200
CCEMG 0.05 −0.14 0.08 −0.04 0.06
20 30 50 100 200 20 30 50 100 200
30
−0.15
Size (5% level, H0 : β1 = 1.00)
Root mean square error (×100) 50
100
200
20
30
50
100
200
0.09 0.00 0.03 −0.01 0.00
8.45 6.44 5.08 3.59 2.83
6.29 5.11 3.79 2.76 2.05
5.10 3.80 2.80 2.02 1.52
3.78 2.67 1.94 1.35 1.00
3.14 2.07 1.39 0.98 0.68
7.15 6.05 6.10 4.55 5.60
6.40 6.75 5.90 5.50 4.45
6.80 7.25 4.85 6.05 6.35
6.75 6.40 5.40 5.10 5.20
0.08 0.01 0.03 −0.01 0.00
6.95 5.20 4.08 2.87 2.17
5.56 4.50 3.29 2.37 1.63
4.94 3.55 2.56 1.78 1.32
3.98 2.67 1.84 1.24 0.92
3.74 2.26 1.39 0.93 0.65
6.60 5.10 5.40 5.60 5.60
6.75 5.90 5.40 6.20 3.95
7.30 7.25 5.45 6.40 5.70
0.01
8.25
6.13 4.73 3.56 2.43 1.73
4.14 3.08 2.31 1.66 1.16
2.32 1.72 1.27 0.86 0.63
1.29 0.96 0.70 0.48 0.33
64.00 61.85 59.90 60.30 59.95
52.40 50.00 49.25 48.40 46.60
38.20 35.40 34.45 34.40 32.60
0.02 0.04 0.02 0.06 0.03
−0.15
0.12 −0.06 −0.08 −0.02
CCEP 0.18 −0.14 0.05 −0.02 0.07
0.00 0.14 0.07 −0.04 −0.03
0.03 0.07 −0.02 0.06 0.01
−0.14
CupBC 0.12 0.04 −0.04 0.03 0.07
0.10 0.08 0.22 0.01 0.01
0.08 0.07 −0.06 0.02 0.03
−0.01
0.03 0.05 −0.04 0.01
0.01 0.04 −0.04 0.02
0.02 0.04 −0.05 0.03
−0.01 6.40 0.03 0.01 0.00
4.89 3.27 2.43
20
30
50
100
Power (5% level, H1 : β1 = 0.95) 200
20
30
50
100
200
6.85 6.45 5.35 6.10 5.70
11.70 12.70 18.00 28.30 44.20
13.80 20.45 26.90 43.00 67.95
21.75 30.70 44.45 72.35 91.90
31.25 50.90 75.65 95.20 99.90
47.90 71.60 95.00 99.90 100.00
6.75 6.25 6.20 5.25 5.60
6.80 6.40 5.30 5.95 5.35
14.25 15.25 24.60 41.65 65.25
16.25 24.55 34.35 58.35 84.40
25.25 34.90 51.70 81.85 96.95
33.70 52.95 78.65 97.80 100.00
46.25 70.70 95.00 100.00 100.00
25.15 23.25 21.90 20.25 20.70
18.85 19.15 15.40 17.15 14.90
70.40 71.30 77.20 87.15 94.70
66.65 71.35 81.60 91.65 97.70
65.90 79.30 88.35 97.40 99.80
84.75 95.00 98.85 100.00 100.00
98.35 99.90 100.00 100.00 100.00
Notes: The DGP is the same as that of Table 1, except that βij = 1 for all i and j, i = 1, 2, . . . , N , j = 1, 2. See notes to Table 1.
the principal components, and this could adversely impact the estimators of β that rely on the PCs as estimators of the unobserved factors.
4.2.4. A structural break in the means of the unobserved factors
Finally, the results of recent research by Stock and Watson (2008) suggest that possible structural breaks in the means of the unobserved factors will not affect the consistency of the CCE type estimators or of the principal component type estimators. In view of this, we considered another set of experiments, corresponding to the DGPs specified as 1A, but with the unobserved factors generated subject to mean shifts. Specifically, under these experiments the unobserved factors are generated as fjt = ϕjt for t < [2T/3] and fjt = 1 + ϕjt for t ≥ [2T/3], with [A] being the greatest integer less than or equal to A, where ϕjt = ϕj,t−1 + ζjt, and ζjt ∼ IIDN(0, 1), for j = 1, 2, 3. Results are reported in Table 8.
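The mean-shift design can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name `random_walk_factors_with_mean_shift`, the 50-period burn-in (borrowed from the notes to Table 1) and the seed handling are illustrative choices, not part of the experimental design as documented here.

```python
import numpy as np

def random_walk_factors_with_mean_shift(T, n_factors=3, burn_in=50, shift=1.0, seed=0):
    """Generate f_jt = phi_jt for t < [2T/3] and f_jt = shift + phi_jt otherwise.

    phi_jt is a driftless random walk with N(0, 1) increments.
    The burn-in length is an assumption carried over from the Table 1 notes.
    """
    rng = np.random.default_rng(seed)
    # Random walk started at zero; the first `burn_in` observations are discarded.
    zeta = rng.standard_normal((burn_in + T, n_factors))
    phi = np.cumsum(zeta, axis=0)[burn_in:]
    # [2T/3]: greatest integer less than or equal to 2T/3.
    break_point = (2 * T) // 3
    t = np.arange(1, T + 1)
    # Add the mean shift from the break point onwards (t >= [2T/3]).
    f = phi + shift * (t >= break_point)[:, None]
    return f, phi, break_point
```

For T = 30 the break occurs at t = 20, so the first 19 observations of each factor coincide with the underlying random walk and the remaining 11 are shifted up by one.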
4.3. Results
The results of experiments 1A, 2A, 1B and 2B are summarized in Tables 1–4, respectively. We also provide results for the naive estimator (which excludes the unobserved factors or their estimates) and the infeasible estimator (which includes the unobserved factors as additional regressors) for comparison purposes. For the sake of brevity, however, we include the simulation results for these estimators only for experiment 1A. As can be seen from Table 1, the naive estimator is substantially biased, performs very poorly, and is subject to large size distortions; this outcome continues to apply in the other experiments (not reported here). In contrast, the feasible CCE estimators perform well, have biases that are close to the bias of the infeasible estimators, show little size distortion even for relatively small values of N and T, and their RMSE falls steadily with increases in N and/or T. These results are quite similar to the results
Table 3 Small-sample properties of common correlated effects type estimators in the case of experiment 1B (heterogeneous slopes + rank deficient). Bias (×100)
(N , T ) 20
30
Root mean square error (×100) 50
100
200
20
20 30 50 100 200
CCEMG 0.33 −0.19 0.20 0.14 0.23 15.02 0.30 0.14 0.09 −0.17 0.35 12.91 −0.15 0.63 −0.20 −0.17 0.02 9.82 0.25 0.13 0.27 0.00 0.06 7.01 0.05 −0.11 −0.17 −0.07 −0.05 5.35
20 30 50 100 200
CCEP 0.48 0.06 −0.04 −0.23 −0.06 0.18 0.00 0.48 −0.18 0.11 0.18 0.24 0.04 −0.10 −0.16
20 30 50 100 200
CupBC 1.34 0.51 0.57 0.30 0.14
0.83 0.85 0.70 0.44 0.14
1.07 1.14 0.62 0.45 0.13
0.16
0.10 13.13
1.35 11.24 1.23 8.97 0.81 6.77 0.46 4.86 0.26 3.53
50
100
200
13.90 12.03 8.46 6.55 4.65
12.61 10.70 7.87 5.85 4.15
13.35 10.07 7.42 5.25 3.61
13.78 10.59 7.34 5.01 3.31
6.80 5.50 5.80 5.75 4.80
6.90 6.80 5.10 5.95 5.05
6.75 5.25 6.10 5.45 4.75
6.60 6.15 5.75 5.45 5.15
12.81
12.21
13.57 9.95 7.22 4.87 3.30
15.30 11.04 7.22 4.98 3.15
6.75 6.10 5.25 5.10 5.40
7.40 6.90 5.90 6.00 4.70
7.00 5.70 6.25 5.40 5.25
7.59 5.78 4.32 2.76 2.00
7.24 5.60 4.05 2.61 1.69
67.35 67.40 64.65 66.40 64.80
60.20 59.80 57.35 56.55 53.35
56.70 55.35 52.40 48.20 45.15
9.52 7.52 5.85 4.20 2.99
8.24 6.47 4.98 3.44 2.45
20
30
Power (5% level, H1 : β1 = 0.95)
30
−0.25 0.43 11.48 10.70 10.39 −0.17 −0.02 8.42 7.57 7.23 −0.06 0.05 5.87 5.72 5.27 −0.04 −0.03 4.35 3.99 3.75 1.12 0.86 0.91 0.42 0.27
Size (5% level, H0 : β1 = 1.00) 50
100
200
20
30
50
100
200
7.20 4.80 5.90 6.10 4.55
9.40 8.40 9.75 14.50 19.45
8.95 10.05 12.90 17.75 23.70
10.15 9.45 13.40 21.65 29.75
10.15 10.35 14.00 22.65 37.25
10.15 11.65 15.20 27.30 43.45
6.65 6.00 5.30 4.95 4.10
6.75 5.50 5.50 6.00 3.95
9.90 9.05 11.40 17.25 25.75
10.20 9.95 14.05 19.60 28.50
10.40 10.55 14.15 23.50 34.50
10.35 10.25 14.35 23.55 41.10
10.25 10.60 15.20 27.00 46.05
60.85 56.95 52.00 44.00 43.95
66.80 65.35 59.70 53.00 46.95
70.45 72.35 74.90 79.35 86.90
66.05 68.95 72.25 80.40 88.45
66.05 69.15 70.40 83.10 91.20
71.25 72.95 78.65 89.40 96.90
76.85 80.20 84.50 93.60 99.35
Notes: The DGP is the same as that of Table 1, except that γi2 ∼ IIDN(0, 1), so the rank condition is not satisfied. See notes to Table 1.
Table 4 Small-sample properties of common correlated effects type estimators in the case of experiment 2B (homogeneous slopes + rank deficient).
Panels: Bias (×100); Root mean square error (×100); Size (5% level, H0 : β1 = 1.00); Power (5% level, H1 : β1 = 0.95). Each panel is indexed by (N , T ), with N and T taking the values 20, 30, 50, 100, 200.
20 30 50 100 200
CCEMG −0.28 −0.26 0.41 −0.11 0.07 0.09 0.00 0.23 −0.07 0.14 −0.08 −0.12 0.14 0.11 0.01
0.73 14.45 0.45 −0.05 11.99 −0.02 0.00 9.01 −0.03 0.06 6.66 −0.17 −0.07 5.13
12.85 10.78 7.97 5.92 4.45
12.02 9.82 7.62 5.16 3.88
12.07 9.52 6.79 4.78 3.27
13.47 10.33 6.72 4.56 3.34
7.35 5.20 5.05 4.65 5.45
5.45 5.90 4.80 5.40 5.10
6.40 5.95 5.00 5.60 5.45
6.70 6.50 5.45 4.60 4.65
6.00 6.55 4.95 6.35 5.15
9.35 7.85 9.40 15.10 22.35
9.15 10.50 12.20 18.15 28.80
10.95 12.40 15.75 23.95 36.60
11.55 14.35 17.60 28.50 44.75
10.90 14.90 21.15 34.85 56.70
20 30 50 100 200
CCEP −0.12 −0.19 0.35 −0.26 0.66 12.66 −0.09 0.05 0.06 0.39 0.03 10.00 −0.14 0.39 −0.08 0.01 0.03 7.29 0.20 −0.13 −0.11 −0.05 0.04 5.44 0.19 0.11 −0.08 −0.13 −0.07 3.97
11.53 9.57 6.92 4.97 3.71
11.56 9.26 6.84 4.55 3.35
12.12 9.36 6.58 4.45 2.96
15.07 11.05 6.79 4.39 3.09
7.45 5.55 4.95 4.80 5.25
7.00 5.75 5.25 5.35 5.15
7.55 6.80 5.45 5.40 5.05
6.35 6.70 5.60 4.95 5.00
6.50 6.75 4.85 6.05 5.60
9.85 9.90 11.25 20.60 31.95
10.00 11.70 15.60 22.65 38.45
12.60 13.30 16.65 28.35 44.30
12.65 15.20 19.95 31.40 50.70
11.50 14.50 20.40 36.80 60.40
20 30 50 100 200
CupBC 0.44 0.18 0.12 0.18 0.10
6.02 4.64 3.62 2.48 1.72
4.11 3.04 2.32 1.65 1.17
2.41 1.72 1.29 0.86 0.63
1.33 1.00 0.70 0.48 0.33
59.65 60.05 60.90 59.65 59.50
48.25 48.75 47.10 48.70 45.65
34.40 33.85 32.85 33.20 32.15
26.15 21.60 20.00 19.80 21.50
19.00 20.00 14.75 16.90 15.80
69.45 71.45 77.00 87.85 95.05
65.90 72.15 82.25 91.10 98.50
68.15 79.20 88.35 97.85 99.65
85.85 95.10 98.95 100.00 100.00
99.30 100.00 100.00 100.00 100.00
0.33 0.22 0.36 0.02 0.03
−0.31
0.29 0.26 0.23 0.14 0.03 0.13 0.09 −0.01 0.06 0.05
0.20 0.09 0.07 0.04 0.02
8.11 6.33 4.90 3.23 2.39
Notes: The DGP is the same as that of Table 1, except that γi2 ∼ IIDN(0, 1), so the rank condition is not satisfied, and βij = 1 for all i and j, i = 1, 2, . . . , N , j = 1, 2. See notes to Table 1.
presented in Pesaran (2006), and illustrate the robustness of the CCE estimators to the presence of unit roots in the unobserved common factors. This is important since it obviates the need for pretesting the unobserved common factors for possible non-stationary components. The CCE estimators perform well in both the heterogeneous and homogeneous slope cases, irrespective of whether the rank condition is satisfied, although the CCE estimators with rank deficiency have slightly higher RMSEs than those under the full rank condition. The RMSEs of the CCE estimators in Tables 1 and 3 (heterogeneous case) are higher than those reported in Tables 2 and 4 for the homogeneous case. The sizes of the t-test based on the CCE estimators are very close to the nominal 5% level. In the case of full rank, the powers of the tests for the CCE estimators are much higher than in the rank-deficient case. Finally, not surprisingly, the power of the tests for the CCE estimators in the homogeneous case is higher than that in the heterogeneous case. It is also important to note that the small-sample properties of the CCE estimator do not seem to be much affected by the residual
serial correlation of the idiosyncratic errors, εit. The robustness of the CCE estimator to the short-run dynamics is particularly helpful in practice, where typically little is known about such dynamics. In fact, a comparison of the results for the CCEP estimator with its infeasible counterpart given in Table 1 shows that the former can even be more efficient (in the RMSE sense). For example, the RMSE of the CCEP for N = T = 50 is 3.97 whilst the RMSE of the infeasible pooled estimator is 4.31. This might seem counterintuitive at first, but, as indicated above, the infeasible estimator does not take account of the residual serial correlation of the idiosyncratic errors, whereas the CCE estimator allows for it indirectly through the use of the cross-section averages, which partly embody the serial correlation properties of ft and εit. Consider now the PC augmented estimators and recall that they are computed assuming that the true number of common factors is known. The results in Table 1 bear some resemblance to those presented in Kapetanios and Pesaran (2007). The biases and RMSEs of the PC1POOL and PC1MG estimators improve as both N and T increase, but the t-tests based on these estimators
Table 5 Small-sample properties of common correlated effects type estimators, in the case of heterogeneous slopes; the number of factors m = 4 exceeds k + 1 = 3. Bias (×100)
Root mean square error (×100)
(N , T )
20
30
20 30 50 100 200
CCEMG 0.23 0.20 −0.04 0.12 0.01
0.29 0.08 0.00 −0.06 −0.04
20 30 50 100 200
CCEP 0.09 0.03 −0.04 0.06 −0.04
−0.05 −0.05 −0.07 −0.05
20 30 50 100 200
CupBC 0.49 0.01 −0.11 0.06 0.05
0.32 0.12 0.25 0.04 −0.11
0.50
50
100
200
20
0.06
−0.23
−0.07 −0.16
0.14 −0.19 −0.01 −0.04
−0.16 −0.03
10.97 8.98 6.81 4.81 3.78
0.01 0.03
−0.02 −0.08 −0.13 −0.01
−0.22
0.00
0.04 −0.14 0.01 −0.03
0.06 0.12 −0.08 0.10 0.00
0.11 0.21 −0.02 0.04 0.00
50
100
200
9.59 7.65 6.03 4.25 3.08
8.29 6.84 5.12 3.69 2.84
7.61 6.42 4.71 3.53 2.61
7.70 6.29 4.67 3.46 2.53
0.13 0.11 −0.10
9.57 7.96 6.06 4.21 3.13
8.94 7.21 5.59 3.85 2.74
8.07 6.60 4.85 3.51 2.62
7.70 6.36 4.54 3.37 2.42
7.83 6.25 4.49 3.38 2.37
0.11 0.07 0.21 −0.04 −0.03
11.56 9.38 7.07 4.81 3.55
10.26 7.98 6.29 4.32 3.14
8.94 6.68 5.03 3.58 2.61
7.09 5.58 4.04 2.82 2.00
6.30 4.62 3.54 2.54 1.67
0.14 0.12 −0.10
−0.11 −0.09
30
Notes: The DGP is the same as that of Table 1, except that an extra term γi4 f4t is added to the y equation, where γi4 ∼ IIDN(0.5, 0.2), f4t = f4t −1 + vf 4,t , vf 4,t ∼ IIDN(0, 1), f4,−50 = 0. For the CupBC estimator, the number of unobserved factors is treated as an unknown but is estimated by the information criterion PCP2 , which is proposed by Bai and Ng (2002). We set the maximum number of factors to six. See also the notes to Table 1.
substantially over-reject the null hypothesis. The PC2POOL and PC2MG estimators perform even worse. The biases of the PC estimators are always larger in absolute value than the respective biases of the CCE estimators. The size distortion of the PC augmented estimators is particularly pronounced. Finally, it is worth noting that the performance of the PC estimators actually gets worse when N is small and kept small but T rises. This may be related to the fact that the accuracy of the factor estimates depends on the minimum of N and T.
Now consider the CupBC estimator, and again recall that it is computed assuming that the true number of common factors is known. We begin with the case in which the rank condition is satisfied, the results for which are reported in Tables 1 and 2. As is evident, the average bias and RMSEs of the CupBC estimator are comparable to those of the CCE estimators. Because of this, only the results of the CCEMG, CCEP and CupBC estimators are reported from Table 2 onwards. In the case of heterogeneous slopes with the rank condition satisfied, the RMSEs of the CCE estimator are uniformly smaller than those of the CupBC estimator (as can be seen from Table 1). This might be expected, since the CupBC estimator is designed for the model with homogeneous slopes. In the case of homogeneous slopes with the rank condition satisfied, as can be seen from Table 2, the RMSEs of the CCEP estimator are smaller than those of the CupBC estimator when T is relatively small (T = 20 and 30). Turning to the performance of the t-test, it is apparent that the size of the test based on the CupBC estimator is far from the nominal level across all experiments. This is especially so for experiments where the slopes are heterogeneous. In these cases, increases in N and T do not seem to improve the test performance. Even for homogeneous slope cases, the best rejection probability result is 14.90% for T = N = 200 in Table 2.
In contrast, the size of the t-test based on the CCE estimators is close to the 5% nominal level across all experiments. Tables 3 and 4 summarize the experimental results in the rank-deficient case. For this design, even though the t-test based on the CupBC estimator is grossly oversized, the RMSEs of the estimator are smaller than those of the CCE estimators. However, note that in these experiments the number of factors is treated as known, which is rarely the case in practice. We return to this issue below.
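As a rough illustration of how a CCE mean group estimate and its standard error might be computed, the following Python sketch augments each unit's regression with an intercept and the cross-section averages of yit and xit, and then averages the unit-specific slopes. It is a minimal sketch, assuming equal weights and a nonparametric mean-group variance; it is not a transcription of (14) or (38), and the function name `ccemg` and the array layout are illustrative assumptions.

```python
import numpy as np

def ccemg(y, X):
    """Sketch of a CCE mean group estimator with equal weights.

    y : array of shape (N, T); X : array of shape (N, T, k).
    Returns (b_mg, se_mg, b_i): the mean group slopes, their standard
    errors and the N unit-specific slope estimates.
    """
    N, T, k = X.shape
    # Cross-section averages act as proxies for the unobserved common factors.
    y_bar = y.mean(axis=0)                              # shape (T,)
    X_bar = X.mean(axis=0)                              # shape (T, k)
    H = np.column_stack([np.ones(T), y_bar, X_bar])     # shape (T, k + 2)

    def residualize(Z):
        # Residuals from regressing Z on H (least squares projection).
        coef = np.linalg.lstsq(H, Z, rcond=None)[0]
        return Z - H @ coef

    b_i = np.empty((N, k))
    for i in range(N):
        Xi = residualize(X[i])
        yi = residualize(y[i])
        b_i[i] = np.linalg.lstsq(Xi, yi, rcond=None)[0]

    b_mg = b_i.mean(axis=0)
    # Nonparametric variance based on the dispersion of the unit estimates.
    dev = b_i - b_mg
    cov = dev.T @ dev / (N * (N - 1))
    return b_mg, np.sqrt(np.diag(cov)), b_i
```

A t-ratio for H0 : β1 = 1 would then be `(b_mg[0] - 1.0) / se_mg[0]`, compared with a standard normal critical value. Note that the number of factors never enters the computation, which is the practical point made in the text.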
Tables 5–8 report the results of the experiments carried out as robustness checks.⁶ Table 5 reports the results of the experiments where the number of unobserved factors is four (m = 4), which exceeds k + 1 = 3, in the case of heterogeneous slopes. In this experiment, the CupBC estimates are obtained supposing that m is unknown but estimated using the information criterion PCP2, proposed by Bai and Ng (2002), applied to the first differences of (yit, x1it, x2it). We set the maximum number of factors to six.⁷ First, despite the number of unobserved factors, m = 4, exceeding the number of regressors plus the regressand (k + 1 = 3), the RMSEs of the CCE estimators decrease as N and T are increased, which confirms the consistency of the estimators in the rank-deficient case. Furthermore, the RMSEs of the CCE estimators dominate those of the CupBC estimator, except when T is large (T ≥ 100). We note that, although not reported for brevity, the size of the t-test based on the CCE estimators is very close to the nominal 5% level, whilst the size distortion of the CupBC estimator is acute in all cases considered. Tables 6–8 report the results of experiments with the same DGP as in Table 1 but where the unobserved factors are cointegrated, the factor structures are semi-strong, and the unobserved factors are subject to mean shifts, respectively. In all of these designs the CCE estimators uniformly dominate the CupBC estimator in terms of both RMSEs and the size of the t-test (which is not reported in the tables). These results are consistent with the findings of Chudik et al. (forthcoming) and Stock and Watson (2008).
5. Conclusions
Recently, there has been increased interest in the analysis of panel data models where the standard assumption that the errors of the panel regressions are cross-sectionally uncorrelated
6 For brevity, the size and power of t-tests are not reported in Tables 5–8, since they are qualitatively similar to those in Tables 1–4. For similar reasons, the results for homogeneous slopes and/or rank-deficient cases (for Tables 6–8) are not reported. A full set of results is available upon request from the authors.
7 For small N and T, the information criterion tends to overestimate the number of factors in the first-differenced data (yit, x1it, x2it), and the estimates tend to 4 as N and T get larger.
Table 6 Small-sample properties of common correlated effects type estimators, heterogeneous slopes and full rank, cointegrated factors, in the case of experiment 1A (heterogeneous slopes + full rank). Bias (×100) (N , T )
20
20 30 50 100 200
CCEMG 0.05 −0.14 −0.03 −0.05 −0.05
20 30 50 100 200
CCEP −0.06 −0.06 −0.03 −0.02 −0.04
20 30 50 100 200
CupBC 0.54 0.69 0.49 0.33 0.13
Root mean square error (×100) 30
50
−0.05
−0.22
0.08
0.09 0.14 −0.01 0.14
0.03 −0.05 0.03 0.03
−0.02
−0.01 −0.07
−0.23 −0.07 −0.09
0.14 0.03 0.10 0.85 0.52 0.54 0.37 0.31
100
200
20
30
50
100
200
0.04
0.00 0.02 0.11 0.00 −0.04
9.26 7.35 5.85 4.15 3.08
7.87 6.02 4.70 3.40 2.46
6.58 5.18 4.06 2.87 2.02
5.69 4.54 3.49 2.49 1.72
5.29 4.16 3.14 2.19 1.59
0.06
−0.01
−0.02
8.52 6.78 5.35 3.77 2.70
7.54 5.90 4.54 3.18 2.33
6.65 5.25 4.05 2.84 1.99
5.95 4.70 3.55 2.50 1.72
5.68 4.29 3.19 2.22 1.60
11.01 8.65 6.82 4.61 3.39
9.58 7.48 5.69 3.86 2.88
8.01 6.26 4.99 3.43 2.41
6.94 5.39 4.24 2.84 2.03
6.32 4.91 3.70 2.52 1.82
0.11
−0.05
0.06 −0.01
−0.03 0.05
0.01 0.13 −0.02 −0.04
0.61 0.54 0.50 0.38 0.13
0.68 0.50 0.53 0.22 0.25
0.78 0.68 0.58 0.26 0.09
0.12
Notes: The DGP is the same as that of Table 1, except that the factors are generated as cointegrated non-stationary processes: f1t = f1tt + 0.5f2tt + vf 1,t , f2t = 0.5f1tt + f2tt + vf 2,t , f3t = 0.75f1tt + 0.25f2tt + vf 3,t , with vfj,t ∼ IIDN(0, 1), fj,−50 = 0, for j = 1, 2, 3, where fℓtt = fℓtt −1 + vft ℓ,t , with vft ℓ,t ∼ IIDN(0, 1), for ℓ = 1, 2, t = −49, . . . , 0, . . . , T . See also the notes to Table 1.
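The cointegrated-factor design in the notes above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the three observed factors load on two latent random-walk trends with the coefficients given in the notes, while the names `LOADINGS` and `cointegrated_factors`, the seed and the way the burn-in is discarded are illustrative choices.

```python
import numpy as np

# Coefficients of the three factors on the two latent random-walk trends,
# taken from the notes to Table 6.
LOADINGS = np.array([[1.00, 0.50],
                     [0.50, 1.00],
                     [0.75, 0.25]])

def cointegrated_factors(T, burn_in=50, seed=0):
    """Generate three I(1) factors driven by two common stochastic trends."""
    rng = np.random.default_rng(seed)
    # Two independent random walks with N(0, 1) increments.
    trends = np.cumsum(rng.standard_normal((burn_in + T, 2)), axis=0)
    # Stationary N(0, 1) disturbances added to each factor.
    noise = rng.standard_normal((burn_in + T, 3))
    f = trends @ LOADINGS.T + noise
    # Discard the burn-in observations.
    return f[burn_in:], trends[burn_in:]
```

Because three factors share two trends, one linear combination of them is stationary: with these coefficients, −5 f1t + f2t + 6 f3t eliminates both trends and leaves only the N(0, 1) disturbances.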
Table 7 Small-sample properties of common correlated effects type estimators, semi-strong factors, in the case of experiment 1 A (heterogeneous slopes + full rank). Bias (×100) (N , T )
20
20 30 50 100 200 20 30 50 100 200
CCEP 0.09 −0.19 0.01 0.04 −0.07
20 30 50 100 200
CupBC 0.23 −0.20 0.39 0.18 0.00
Root mean square error (×100) 30
50
100
200
−0.09
−0.22
−0.07
0.09
−0.09
0.02 −0.12 0.01 −0.06
0.01 0.16 0.03 0.01
0.01 −0.11 0.05 −0.01
−0.11 0.14 0.02 0.05
0.10 −0.04 0.04 0.00
−0.07 −0.10
−0.06
0.04
−0.12
−0.08
0.13 0.08 −0.03
0.09 −0.05 0.02 −0.04
0.46 0.09 0.37 0.18 0.03
0.17 0.38 0.06 0.06 0.03
20
30
50
100
200
CCEMG 9.92 7.74 5.96 4.23 3.06
8.01 6.21 4.57 3.51 2.46
6.57 5.14 3.99 2.87 2.00
5.63 4.43 3.42 2.33 1.72
5.17 4.10 3.10 2.26 1.51
0.13 0.00 0.05
0.13 −0.02 0.03 0.00
8.64 7.12 5.27 3.77 2.68
7.49 5.90 4.46 3.28 2.30
6.34 5.12 3.93 2.84 1.96
5.65 4.49 3.43 2.35 1.70
5.34 4.21 3.16 2.28 1.53
0.43 0.20 0.20 0.05 0.09
0.45 0.49 0.15 0.09 0.01
12.29 9.53 7.34 4.99 3.77
10.55 8.03 6.08 4.40 3.03
8.09 6.39 5.07 3.61 2.55
6.75 5.14 3.99 2.69 1.98
5.80 4.58 3.40 2.45 1.64
Notes: The DGP is the same as that of Table 1, except that the factor loadings matrix Γ′i is multiplied by N^{−1/2} for all i. See also the notes to Table 1.
is violated. When the errors of a panel regression are cross-sectionally correlated, standard estimation methods do not necessarily produce consistent estimates of the parameters of interest. An influential strand of the relevant literature provides a convenient parameterization of the problem in terms of a factor model for the error terms. Pesaran (2006) adopts an error multifactor structure and suggests new estimators that take into account cross-sectional dependence, making use of cross-sectional averages of the dependent and explanatory variables. However, he focuses on the case of weakly stationary factors, which could be restrictive in some applications. This paper provides a formal extension of the results of Pesaran (2006) to the case where the unobserved factors are allowed to follow unit root processes. It is shown that the main results of Pesaran continue to hold in this more general case. This is certainly of interest, given that there are usually major differences between results obtained for unit root and stationary processes. When we consider the small-sample properties of the new estimators, we observe that again the results accord with the conclusions reached in the stationary case, lending further support to the use of the CCE estimators irrespective of the order of integration of the data observed. The Monte Carlo experiments also show that the CCE type estimators are robust to a number of important departures from the theory developed in this paper, and in general have better small-sample properties than alternatives that are available in the literature. Most importantly, the tests based on CCE estimators have the correct size, whilst the factor-based estimators (including the one recently proposed by Bai et al. (2009)) show substantial size distortions even in the case of relatively large samples.
Appendix A. Lemmas
Proofs of the lemmas are provided in Appendix B.
Table 8 Small-sample properties of common correlated effects type estimators, one break in the means of unobserved factors, in the case of experiment 1A (heterogeneous slopes + full rank). Bias (×100)
(N , T )
20
20 30 50 100 200
CCEMG 0.01 0.14 −0.21 0.02 −0.08
20 30 50 100 200
CCEP 0.17 −0.15 −0.03 0.05 −0.06
20 30 50 100 200
CupBC 0.52 0.32 0.58 0.28 0.10
Root mean square error (×100) 30
50
100
200
20
−0.10 −0.03
−0.02 −0.02 −0.11
0.06
−0.07
−0.13 0.14 0.03 0.06
0.10 −0.04 0.04 0.00
30
50
100
200
9.66 7.68 5.91 4.26 3.08
7.82 6.08 4.64 3.48 2.49
6.74 5.11 4.01 2.88 2.01
5.87 4.54 3.43 2.33 1.72
5.54 4.22 3.13 2.26 1.51
0.20 0.03 −0.02
0.05 −0.02
0.00
−0.05
0.00
−0.13
−0.13
−0.14
0.18 0.09 −0.04
0.07 −0.06 0.04 −0.05
0.11 0.01 0.05
0.14 −0.01 0.02 0.00
8.73 7.10 5.30 3.80 2.72
7.61 5.98 4.53 3.26 2.29
6.86 5.31 3.97 2.85 1.95
6.30 4.78 3.47 2.34 1.71
6.21 4.46 3.21 2.28 1.53
0.77 0.58 0.75 0.35 0.08
0.79 0.77 0.38 0.38 0.08
0.80 0.58 0.61 0.29 0.23
0.89 0.84 0.54 0.32 0.17
11.18 8.91 6.78 4.85 3.57
9.87 7.80 6.01 4.22 2.93
8.39 6.55 5.03 3.41 2.44
7.52 5.68 4.18 2.75 2.01
6.97 5.27 3.82 2.55 1.69
Notes: The DGP is the same as that of Table 1, except that fjt = ϕjt for t < ⌊2T/3⌋ and fjt = 1 + ϕjt for t ≥ ⌊2T/3⌋, with ⌊A⌋ being the greatest integer less than or equal to A, where ϕjt = ϕj,t−1 + ζjt, ζjt ∼ IIDN(0, 1), j = 1, 2, 3. See also the notes to Table 1.
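The bias and RMSE entries (×100) reported in the tables, and the size and power entries of Tables 1–4, are summaries over Monte Carlo replications. The Python sketch below shows one way such summaries might be computed from replication-level estimates and standard errors. It rests on assumptions that are not confirmed by the text: two-sided t-tests with a standard normal critical value of 1.96, and power measured as the rejection frequency of the hypothesis β1 = 0.95 when the data are generated with β1 = 1. The function name `mc_summary` is hypothetical.

```python
import numpy as np

def mc_summary(estimates, std_errors, beta_true=1.0, beta_alt=0.95, crit=1.96):
    """Summarize Monte Carlo replications; all entries are scaled by 100.

    estimates, std_errors : one entry per replication.
    The testing convention (two-sided, normal critical value) is an assumption.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = 100 * (estimates.mean() - beta_true)
    rmse = 100 * np.sqrt(np.mean((estimates - beta_true) ** 2))
    # Size: rejection frequency of the true value.
    size = 100 * np.mean(np.abs((estimates - beta_true) / std_errors) > crit)
    # Power: rejection frequency of the false value beta_alt.
    power = 100 * np.mean(np.abs((estimates - beta_alt) / std_errors) > crit)
    return {"bias": bias, "rmse": rmse, "size": size, "power": power}
```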
Lemma 1. Under Assumptions 1–4,
Lemma 2. Under Assumptions 1–4,
U¯ ′ U¯
Vi′ U¯
T Vi′ U¯ T
ε′i U¯ T F ′ U¯ T Xi′ U¯ T Q U¯
= Op
(A.1)
N
1
= Op
+ Op
N
1
= Op = Op
1
√
N
= Op
1
,
√
N
= Op
1
,
1
√
NT
+ Op
N
′
T
1
1
√
NT D′ U¯ T
, (A.2)
,
= Op
1
√
uniformly over i
N
(A.3)
(A.5)
√
N
T2 Q ′ Xi T2 Q ′G T2
¯ ′ H¯ H T2
¯ ′G H T2
¯ ′ εi H T
¯′
H Vi T
¯′
H Xi T2
¯ ′ U¯ H T
(A.6) uniformly over i
(A.7)
− −
Xi′ Mg εi
(A.8)
= Op (1)
(A.9)
= Op (1)
(A.10)
T
uniformly over i
(A.11)
T
uniformly over i
(A.12)
¯ Xi Xi′ M ¯F Xi′ M ¯ εi Xi′ M T
= Op (1), = Op
uniformly over i
1
√
N
.
(A.13) (A.14)
= Op
T
√
N
,
uniformly over i,
1
= Op
T
1
+ Op
√
NT
1
N
(A.16)
, (A.17)
1
= Op
+ Op
N
1
√
NT
,
uniformly over i.
(A.18)
− Σ vi = Op
1
√
T
.
Lemma 6. Under Assumptions 1–4, and assuming that the rank condition (9) does not hold,
T
= Op (1),
Lemma 5. Under Assumptions 1–4,
T
= Op (1),
(A.15)
Lemma 4. Assume that the rank condition (9) holds. Then, under Assumptions 1–4,
Xi′ Mg Xi
= Op (1)
uniformly over i.
N
uniformly over i.
¯F Xi′ M
= Op (1) = Op (1) ,
¯ εi Xi′ M T
(A.4)
1
+ Op
TN
Xi′ Mg Xi
T
1
√
Lemma 3. Under Assumptions 1–4, and assuming that rank condition (9) holds,
¯ Xi Xi′ M
uniformly over i
′
Q Q
T
= Op
− − −
Xi′ Mq Xi T Xi′ Mq F T
= Op
N
= Op
Xi′ Mq εi T
uniformly over i.
1
√
N
= Op
1
√
1
√
NT
,
,
uniformly over i, uniformly over i,
+ Op
1
N
(A.19)
(A.20)
, (A.21)
Lemma 7. Under Assumptions 1–4, and assuming that the rank condition (9) does not hold,
¯F Xi M ′
T
C¯ = Op
1
+ Op
N
1
√
NT
to be bounded. (A.6) is established by
,
uniformly over i.
(A.22)
since G ′ G /T 2 = Op (1). To establish (A.7), first note that Q ′ Xi T2
Appendix B. Proofs of lemmas Proof of Lemma 1. To prove (A.1), we first show that
¯ t ‖2 = O E ‖u
1
N
,
= P¯ ′
¯t‖ = O and E ‖u
1
√
N
.
(B.23)
N 1 −
ε¯ + ¯t = t N u
i =1
G′G
T2
Πi + P¯ ′
G ′ Vi
(B.24)
.
(B.26)
T ∑ T ∑ ′ E g g ( ) ℓt ℓt
T ∑
′
gℓt vit t =1 t ′ =1 t =1 = O (1) T T2
sup Var
β′i vit ,
T2
′ = P¯ ′ GT 2G P¯ = Op (1),
The first term is Op (1) uniformly over i, since the elements of P¯ and Πi are assumed to be bounded in probability uniformly over i. For the second term, under Assumptions 1–2, denoting gℓt as the ℓth element of gt and noting that supi E vit vit′ ′ = O (1), we have that
We recall that
Q ′Q T2
i
.
v¯ t ′ i=1 βi vit . Then, by the cross-sectional indepen′ dence of vit and βi specified in Assumptions 2 and 4, E ‖¯vt ‖2 = ′ 2 ∑N 1 , and again by Assumptions 2 and 4, we have i=1 E βi vit N2 2 K E ‖¯vt ‖ ≤ N = O N1 . Similarly, E ε¯ t2 = O N1 . Next, note that ∑ T ¯ ¯′ T −1 U¯ ′ U¯ = T −1 t =1 ut ut , where the cross-product terms in ¯ t u¯ ′t , being functions of covariance stationary processes with fiu
where v¯ t =
1 N
∑N
nite fourth-order cumulants, are themselves stationary with finite ∑T ¯ t ‖2 , and means and variances. Also, E T −1 U¯ ′ U¯ ≤ T −1 t =1 E ‖u
by (B.23), $E\|T^{-1}\bar{U}'\bar{U}\| = O(N^{-1})$, which establishes (A.1). The result for $V_i'\bar{U}/T$ in (A.2) is established in Lemma 2 below. The result for $\varepsilon_i'\bar{U}/T$ in (A.2) is established similarly to that for $V_i'\bar{U}/T$. To establish (A.3), first we examine $T^{-1}F'\bar{U}$. Consider the $\ell$th row of $T^{-1}F'\bar{U}$ and note that it can be written as $T^{-1}\sum_{t=1}^{T} f_{\ell t}\bar{u}_t'$. Since by assumption $f_{\ell t}$ and $\bar{u}_t$ are independently distributed processes, it easily follows that
T ∑
t =1 T
Var
T ∑ T ∑ ′ E f f ( ) ℓt ℓt ′ 1
¯t fℓt u
t =1 t =1
=O N
T2
(B.25)
which establishes that $T^{-1}\sum_{t=1}^{T} f_{\ell t}\bar{u}_t'$ converges to its limit at the desired rate of $O_p(1/\sqrt{N})$. The result for $T^{-1}D'\bar{U}$ is obtained using the same line of arguments. To establish (A.4), first note that $X_i'\bar{U}/T$
′ ¯ ′ (D, F ) U
= Πi
T
+
Vi′ U¯ T
= Op
¯ ′ (D,F )′ U¯
=P
T
= Op
√1
N
∑T
t ′ =1
E (gℓt gℓt ′ ) = O T 2 , and therefore supi Var
T ′ t =1 gℓt vit
T
= O(1). Hence, $G'V_i/T = O_p(1)$ uniformly over $i$ for sufficiently large $T$. Therefore, as the elements of $\bar{P}$ are assumed to be bounded in probability, the second term is $O_p(1)$ uniformly over $i$, which establishes (A.7). (A.8) is straightforwardly proven, using (A.6). To prove (A.9), recalling $\bar{H} = Q + \bar{U}^{*}$, where $\bar{U}^{*} = (0, \bar{U})$,
\[ \frac{\bar{H}'\bar{H}}{T^2} = \frac{Q'Q}{T^2} + \frac{\bar{U}^{*\prime}\bar{U}^{*}}{T^2} + \frac{Q'\bar{U}^{*}}{T^2} + \frac{\bar{U}^{*\prime}Q}{T^2} = O_p(1) \]
by (A.1), (A.5) and (A.6). To establish (A.10),
\[ \frac{\bar{H}'F}{T^2} = \bar{P}'\frac{G'F}{T^2} + \frac{\bar{U}^{*\prime}F}{T^2} = O_p(1), \tag{B.27} \]
since $G'F/T^2$ is $O_p(1)$. (A.11) is established because
\[ \frac{\bar{H}'\varepsilon_i}{T} = \bar{P}'\frac{G'\varepsilon_i}{T} + \frac{\bar{U}^{*\prime}\varepsilon_i}{T} = O_p(1) \quad \text{uniformly over } i, \tag{B.28} \]
since $G'\varepsilon_i/T = O_p(1)$ uniformly over $i$, using the same line of the argument as in the proof of (A.7). (A.12) can be proven similarly to (A.11). Next,
\[ \frac{\bar{H}'X_i}{T^2} = \frac{Q'X_i}{T^2} + \frac{\bar{U}^{*\prime}X_i}{T^2} = O_p(1) \quad \text{uniformly over } i \]
by (A.4) and (A.7), which establishes (A.13). Finally, (A.14) follows by the boundedness in probability of $\bar{P}$ and (A.3).
Proof of Lemma 2. In order to prove (A.15), we need to examine more closely Lemma A.2 of Pesaran (2006). So, we have
\[ \frac{V_i'\bar{U}}{T} = \left( T^{-1}V_i'\bar{\varepsilon} + (NT)^{-1}V_i'\sum_{j=1}^{N} V_j\beta_j,\;\; T^{-1}V_i'\bar{V} \right), \tag{B.29} \]
where $\bar{\varepsilon} = N^{-1}\sum_{j=1}^{N}\varepsilon_j$ and $\bar{V} = N^{-1}\sum_{j=1}^{N}V_j$. Denote the $t$th element of $\bar{\varepsilon}$ by $\bar{\varepsilon}_t = N^{-1}\sum_{j=1}^{N}\varepsilon_{jt}$, and consider the first term on the right-hand side (RHS) of (B.29). Since, by assumption, $v_{it}$ and $\bar{\varepsilon}_t$ are independently distributed covariance stationary processes with zero means, it follows that
1
√
N
,
uniformly over i
using (A.2) and (A.3), since the elements of $\Pi_i$ are assumed to be bounded uniformly over $i$. To establish (A.5), recalling that $Q = G\bar{P}$, and using (A.3), $Q'\bar{U}/T$
t =1
T2
T ∑ fℓt u¯ t 1 t =1 Var , =O T N
∑T
¯ ′ Xi H
.
But, by standard unit root asymptotic analysis, we know that $\sum_{t=1}^{T}\sum_{t'=1}^{T} E(f_{\ell t}f_{\ell t'}) = O(T^2)$, and therefore
But, by standard unit root asymptotic analysis, we know that ∑
, since the elements of P¯ are assumed
T ∑
viℓt ε¯ t 1 t =1 t ′ =1 t =1 sup =O T N T2 i
sup Var i
T ∑ T ∑ ′ Γivℓ t − t
,
where Γivℓ t − t ′ is the autocovariance function of the station∑∞ ary process, viℓt . But, by Assumption 2, supi s=1 Γivℓ (|s|) < ∞.
Therefore,
equal to
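\[ \sup_i \mathrm{Var}\!\left( \frac{\sum_{t=1}^{T} v_{i\ell t}\bar{\varepsilon}_t}{T} \right) = O\!\left(\frac{1}{NT}\right), \tag{B.30} \]
which establishes that
\[ T^{-1}V_i'\bar{\varepsilon} = O_p\!\left(\frac{1}{\sqrt{TN}}\right). \tag{B.31} \]
By the Markov inequality,
\[ \Pr\!\left( \left\| T^{-1}V_i'\bar{\varepsilon} \right\| \geq \epsilon \right) \leq \frac{\mathrm{Var}\!\left( \sum_{t=1}^{T} v_{i\ell t}\bar{\varepsilon}_t / T \right)}{\epsilon^2}, \quad \text{for all } i. \]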
But, since for any two continuous functions f , g, if f (x) ≤ g (x) for all x, supx f (x) ≤ supx g (x), it follows that
∑ T sup Var t =1 i
sup Pr T −1 Vi′ ε¯ ≥ ϵ ≤
(NT )
Vi
N −
Vj βj = N
−1
Vi′ Vi T
j =1
where V¯ −∗ i = N −1
∑N
j=1,j̸=i
T
1 ′ ′ −1 ′ −1 ′ X Q H¯ H¯ ¯ − Q Q H X i T i ¯ ′ ¯ −1 ∗ U¯ ∗′ U¯ ∗ Q ′ U¯ U¯ ∗′ Q Xi′ Q H H + + ≤ 2 T T T T T2 Q ′ Q −1 H¯ ′ X i × T2 T2 1 = Op √ , uniformly over i,
Assumption 2, plimT →∞ T −1 Vi′ Vi follows that N −1
Vi′ Vi
T
βi = Op
1
N
βi +
,
Vi′ V¯ −∗ i
,
T
(B.32)
= Σ vi uniformly over i, it
uniformly over i.
(B.33)
Also, since the elements of $V_i$ and $\bar{V}^{*}_{-i}$ are independently distributed and covariance stationary, following the same line of analysis leading to (B.31), we have
\[ \frac{V_i'\bar{V}^{*}_{-i}}{T} = O_p\!\left(\frac{1}{\sqrt{NT}}\right), \quad \text{uniformly over } i. \tag{B.34} \]
\[ (NT)^{-1}V_i'\sum_{j=1}^{N} V_j\beta_j = O_p\!\left(\frac{1}{\sqrt{NT}}\right) + O_p\!\left(\frac{1}{N}\right), \quad \text{uniformly over } i. \tag{B.35} \]
Finally, since the last term of (B.29) can be written as
\[ T^{-1}V_i'\bar{V} = N^{-1}\frac{V_i'V_i}{T} + \frac{V_i'\bar{V}_{-i}}{T}, \]
where $\bar{V}_{-i} = N^{-1}\sum_{j=1, j\neq i}^{N} V_j$, it also follows
that T −1 Vi′ V¯ = Op
N
(B.39)
by (A.1), (A.5), (A.7), (A.9), (A.6) and (A.13). Finally,
′ ′ −1 1 ′ ′ −1 ′ Xi′ U¯ ∗ Xi Q Q Q ′ XQ QQ ¯ H X − Q X ≤ i i T i T2 T T2 = Op
1
√
N
uniformly over i,
(B.40)
by (A.7), (A.6) and (A.4). Noting that Mg = Mq when the rank condition is satisfied, substituting (B.38)–(B.40) into (B.37), we have
′ ¯ Xi Xi M Xi′ Mg Xi = Op √1 , − T T N
Using (B.33) and (B.34) in (B.32) now yields
N
by (A.4), (A.9) and (A.13). Next, we have
Vj βj . Since βi is bounded, and, by
′ −1 ′ ¯ Xi 1 ′ ′ −1 ′ Xi′ U¯ ∗ H H¯ H¯ ′ X H¯ − X Q H¯ H¯ ¯ H X ≤ i i T i T T2 T2 1 = Op √ , uniformly over i, (B.38)
,
(B.37)
¯ ¯∗ We examine each of the above terms. So, noting that H = Q + U , with U¯ ∗ = 0, U¯ , we have
viℓt ε¯ t
proving that (B.31) follows from (B.30). Consider the second term in (B.29), and note that ′
1 ′ ′ −1 ′ ′ . ¯ + X Q Q Q H X − Q X i i T i
ϵ2
i
−1 ′ X ′ H¯ H¯ ′ H¯ −1 H¯ ′ X Q Xi Xi′ Q Q ′ Q i i − T T 1 ′ ′ −1 ′ ′ ¯ ¯ ¯ ¯ Xi H ≤ T Xi H − Xi Q H H 1 ′ ′ −1 ′ −1 ′ ¯ ¯ ¯ + Xi Q H H − QQ H Xi T
uniformly over i.
To see how (B.31) follows from (B.30), we note that, by the Markov inequality,
−1
uniformly over i,
as required. Next, we consider (A.17). In particular, by a similar analysis to that for (A.16), we have
′ ¯ εi Xi M Xi′ Mq εi ′ ′ ¯ −1 ¯ ′ ≤ 1 X ′H − ¯ ¯ − X Q H H H ε i i i T T T 1 ′ ′ −1 ′ −1 ′ ¯ ¯ ¯ εi + − QQ H Xi Q H H T
1
√
NT
+ Op
1
N
,
uniformly over i,
which completes the proof of Lemma 2.
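The rate in (B.30) and (B.31) can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original proof: it assumes, purely for illustration, that $v_t$ and $\varepsilon_{jt}$ are i.i.d. standard normal (a special case of the covariance stationary processes in the lemma), in which case the variance of $T^{-1}\sum_t v_t\bar{\varepsilon}_t$ equals $1/(NT)$ exactly. The function name `var_scaled_cross_moment` is hypothetical.

```python
import numpy as np


def var_scaled_cross_moment(n: int, t: int, reps: int = 4000, seed: int = 0) -> float:
    """Monte Carlo variance of T^{-1} * sum_t v_t * eps_bar_t.

    Illustrative assumption (not from the paper): v_t and eps_jt are
    i.i.d. N(0, 1), so the exact variance is 1 / (N * T).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((reps, t))
    # Cross-sectional average of N independent errors for each period t.
    eps_bar = rng.standard_normal((reps, t, n)).mean(axis=2)
    stat = (v * eps_bar).mean(axis=1)  # T^{-1} sum_t v_t * eps_bar_t
    return float(stat.var())


if __name__ == "__main__":
    for n, t in [(10, 20), (20, 40)]:
        ratio = var_scaled_cross_moment(n, t) * n * t
        print(f"N={n:3d}, T={t:3d}: Var * N * T = {ratio:.3f}")
```

If the $O(1/(NT))$ rate holds, the rescaled variance printed for each $(N, T)$ pair should stay close to one as both dimensions grow.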
(B.36)
We examine each of the above terms. So, we have
Proof of Lemma 3. We start by proving (A.16). We need to
Xi′ M¯X i
determine the order of probability of
T
1 ′ ′ −1 ′ ′ . ¯ + X Q Q Q H ε − Q ε i i T i
−
Xi′ Mq Xi T
. But this is
1 ′ X H¯ − X ′ Q H¯ ′ H¯ −1 H¯ ′ εi i T i
(B.41)
¯ ′ H¯ −1 H¯ ′ ε 1 Xi′ U¯ ∗ H i ≤ T T T2 T 1 = Op √ , uniformly over i,
Also, from (B.45),
(B.42)
NT
NT
−1
¯ U¯ . C¯ U¯ ′ M
(B.49)
Then, using this result in (B.48), we have
′ ′ ¯ F ¯ ¯ ¯ Xi M ′ −1 2 ≤ Γ ′ U MU ¯ ¯ ¯ C C C i T T ′ ¯ U¯ −1 Vi M C¯ ′ C¯ C¯ ′ + .
by (A.4), (A.9) and (A.11). Next, we have
1 ′ ′ −1 ′ −1 ′ X Q H¯ H¯ ¯ H ε − Q Q i T i ¯ ′ ¯ −1 ∗ U¯ ∗′ Q Q ′ U¯ ∗ Xi′ Q H H 1 U¯ ∗′ U¯ − − ≤ − 2 T T T T T T2 Q ′ Q −1 H¯ ′ ε i × T2 T 1 = Op √ , uniformly over i,
¯ U¯ = − C¯ C¯ ′ F ′M
(B.50)
T
(B.43)
Since the norms of $(\bar{C}\bar{C}')^{-1}\bar{C}$ and $\Gamma_i'$ are bounded, we need to establish the probability orders of $\bar{U}'\bar{M}\bar{U}/T$ and $V_i'\bar{M}\bar{U}/T$. For $\bar{U}'\bar{M}\bar{U}/T$, using (A.1), (A.9) and (A.14), we have
\[ \frac{\bar{U}'\bar{M}\bar{U}}{T} = O_p\!\left(\frac{1}{N}\right). \tag{B.51} \]
Similarly, by (A.2) and (A.12),
by (A.1), (A.5), (A.7), (A.9), (A.6) and (A.11). Finally,
′ ′ −1 1 ′ ′ −1 ′ U¯ ∗′ εi Xi Q Q Q ′ XQ QQ ¯ ≤ H ε − Q ε i i T i T2 T T2 = Op
1
1
+ Op
√
NT
N
,
uniformly over i,
(B.44)
which establishes (A.17).
uniformly over i,
¯ B C¯
0
¯ U¯ , + 0, M
¯ U¯ . U¯ ′ M¯F C¯ = −U¯ ′ M
(B.45)
Also, from above, ′
¯ F C =¯ −X i M ¯ U¯ . Xi′ M
(B.46)
Note, however, that Xi = G Π i + Vi , and hence
and substituting (B.51) and (B.52) into (B.50) establishes the result.
Xi = G Π i + Vi ,
¯ F = −Γ ′i F ′ M ¯ U¯ C¯ Xi′ M
C¯ C¯
(B.53)
where $G = (D, F)$ is the $T \times (m + n)$ matrix of I(1) factors, and $V_i$ is a stationary error matrix. Denote the OLS residuals of the multiple regression (B.53) as $\hat{V}_i = X_i - G\hat{\Pi}_i$, where $\hat{\Pi}_i = (G'G)^{-1}G'X_i$.
′
$O_p(T^{-1})$, it follows that $\hat{V}_i'\hat{V}_i/T - V_i'V_i/T = O_p(T^{-1})$. The required result now follows since, under Assumption 2, $V_i'V_i/T - \Sigma_{vi} = O_p(T^{-1/2})$, where $\Sigma_{vi}$ is a non-singular matrix.
Proof of Lemma 6. The procedure in Lemma 3 can be used to prove (A.19) and (A.21), but replacing all inverses with generalized inverses. This is required since $Q'Q$ has reduced rank when the rank condition does not hold. We need to show that
¯ U¯ C¯ C¯ C¯ − Vi′ M ′
′ −1
.
(B.48)
Q ′Q
+ −
T2
(B.47)
(B.54)
¯ ′ ¯ + HH T2
= Op
1
√ T
N
.
(B.55)
However, because the Moore–Penrose inverse is not a continuous function, it is not sufficient that
′ −1
(B.52)
where + denotes the Moore–Penrose inverse. To establish (B.54), we need to show that
By the full rank assumption for C¯ and substituting (B.47) in (B.46), we obtain
′
uniformly over i,
uniformly over i,
¯ U¯ + Vi′ M ¯ U¯ . = Γ ′i F ′ M
,
1 ′ ′ + ′ + ′ 1 X Q H¯ H¯ − Q Q ¯ H X = O √ i p T i N
¯ U¯ = Π′i G ′ + Vi′ M ¯ U¯ Xi M ′ ′ ¯ U¯ + Vi′ M ¯ U¯ = Πi G M ′ D ¯ U¯ + Vi′ M¯U¯ = A′i , Γ ′i M F′ 0 ′ ¯ ¯ = A′i , Γ ′i ¯ U¯ + Vi M U F ′M ′
NT
Vi /T ′ ˆ i − Πi /T − Π ˆ i − Πi G ′ Vi /T = −Xi′ Mg G Π ′ ˆ i − Πi G ′ Vi /T , =− Π ˆ i − Πi = because Mg G = 0. But, since G ′ Vi /T = Op (1) and Π
¯ F C¯ = −M ¯ U¯ . Hence, or M
1
√
¯ H¯ = 0 and M ¯ D = 0, since H¯ = D, Z¯ . Then But M In
N
+ Op
Vˆ i′ Vˆ i /T − Vi′ Vi /T = Vˆ i′ Vˆ i − Vi /T + Vˆ i − Vi
¯ H¯ = M ¯ G P¯ + U¯ ∗ . M
1
Observe that Vˆ i = Mg Xi . Then, we can write
Proof of Lemma 4. We start by noting that
¯F 0 = 0, M
T
= Op
Proof of Lemma 5. Recall that
by (A.7), (A.6) and (A.2). Noting that Mg = Mq when the rank condition is satisfied, substituting (B.42)–(B.44) into (B.41) yields
′ ¯ εi Xi M 1 Xi′ Mg εi = Op √1 + O − p T T N NT
¯ U¯ Vi′ M
Q ′Q T2
−
¯′¯ HH T2
= Op
1
√ T
N
,
(B.56)
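for (B.55) to hold. But, by Theorem 2 of Andrews (1987), (B.56) is sufficient for (B.55), if additionally, as $(N, T) \xrightarrow{j} \infty$,
\[ \lim_{(N,T) \xrightarrow{j} \infty} \Pr\!\left[ \mathrm{rk}\!\left(\frac{\bar{H}'\bar{H}}{T^2}\right) = \mathrm{rk}\!\left(\frac{Q'Q}{T^2}\right) \right] = 1, \]
¯ = where Γ¯ = N1 i=1 Γ i and γ in (A.22) now yields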
T2
=
¯F Xi′ M
(B.57)
T
Q ′Q T2
+
U¯ ∗′ U¯
∗
T2
+
Q ′ U¯ ∗ T2
+
U¯ ∗′ Q T2
1
= Op
,
N
¯F Xi′ M
T
with
Γ¯ = Op
1
√
NT
uniformly over i,
+ Op
N
i =1
1
√
NT
,
uniformly over i,
1
which in turn yields
U¯ ∗′ U¯ ∗ Q ′ U¯ ∗ U¯ ∗′ Q + + 2 >ϵ =0 lim Pr T2 j T2 T N ,T → ∞
√
¯F NXi′ M
N 1 −
γ¯ +
T
N i=1 uniformly over i.
for all ϵ > 0. Also, rk(T −2 Q ′ Q ) = n + rk(C¯ ), for all N and T , with
′ Xi M¯F Xi′ Mq F ≤ 1 X ′ H¯ − X ′ Q H¯ ′ H¯ + H¯ ′ F − i i T T T 1 ′ ′ + ′ + ′ ¯ ¯ ¯ + X Q H H − Q Q H F T i 1 ′ ′ + ′ ¯ F − Q ′ F . (B.58) + H T Xi Q Q Q 1 ′ X H¯ − X ′ Q H¯ ′ H¯ + H¯ ′ F = Op √1 , i T i N uniformly over i,
¯ F )γ¯ N (Xi′ M
χNT ≡
(B.59)
(B.60)
N 1 −
¯ Xi Xi′ M
N i=1 1 N
∑N
i =1
N −
1
+ √
¯F NXi′ M
T2
,
γ¯ + ηi − η¯ ,
(C.63)
ηi . By (A.19) and (A.20), it follows that √
Xi′ Mq Xi
+
NXi′ Mq F
T2
N i =1
T2
T2
Op
1 √ T
N
+ Op
1
√
N
.
(C.64)
1 T 3/2
√
1 d N (bˆ MG − β) ∼ √
¯ F )γ¯ N (Xi′ M T2
=
, uniformly over i, it is the case that, for N
(B.61)
Proof of Lemma 7. The result immediately follows from (B.48), (B.49), (B.51) and (B.52).
Appendix C. Proofs of theorems for pooled estimators
Proof of Theorem 1. We know that
,
N −
N i=1
by (A.7), (A.6) and (A.3). Substituting (B.59)–(B.61) into (B.58) yields the required result.
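The need for the rank-equality condition (B.57) in the proof of Lemma 6 can be illustrated with a small numerical sketch. It is not taken from the paper: the matrices and the helper names `pinv_gap_rank_change` and `pinv_gap_same_rank` are hypothetical, chosen only to show that the Moore–Penrose inverse is discontinuous when a vanishing perturbation changes the rank, and continuous when the rank is preserved.

```python
import numpy as np


def pinv_gap_rank_change(eps: float) -> float:
    """Spectral-norm distance between pinv(A + E) and pinv(A) when the
    perturbation E raises the rank of A from 1 to 2."""
    a = np.diag([1.0, 0.0])   # rank-1 matrix
    e = np.diag([0.0, eps])   # small perturbation that adds a second nonzero singular value
    return float(np.linalg.norm(np.linalg.pinv(a + e) - np.linalg.pinv(a), 2))


def pinv_gap_same_rank(eps: float) -> float:
    """Same distance when the perturbation leaves the rank unchanged."""
    a = np.diag([1.0, 0.0])
    e = np.diag([eps, 0.0])   # perturbs only the existing nonzero singular value
    return float(np.linalg.norm(np.linalg.pinv(a + e) - np.linalg.pinv(a), 2))


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-6):
        print(eps, pinv_gap_rank_change(eps), pinv_gap_same_rank(eps))
```

In the rank-changing case the gap equals $1/\epsilon$ and diverges as the perturbation shrinks, whereas in the rank-preserving case it equals $\epsilon/(1+\epsilon)$ and vanishes. This is the behaviour that makes (B.56) alone insufficient for (B.55).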
N i=1
T
and T large,
uniformly over i,
Γ i ~i , Γ¯
1
√
√
1 ′ ′ + ′ 1 ′ XQ QQ ¯ H F − Q F = O , √ p T i N
N 1 −
,
(C.62)
by (A.1), (A.5), (A.7), (A.9), (A.6) and (A.10). Finally,
γ¯ + Γ¯ β +
T
Note that, for the above two expressions, we have changed the normalization from $T$ to $T^2$. This is because, in the case where the rank condition does not hold, the use of cross-sectional averages is not sufficient to remove the effect of the I(1) unobserved factors, and so $X_i'\bar{M}X_i$, $X_i'\bar{M}F$, $X_i'M_qX_i$ and $X_i'M_qF$ would involve non-stationary components. Then, since, by (C.62), uniformly over $i$
= Op √1N . We have
× γ¯ + ηi − η¯ + Op
uniformly over i,
′ ¯ ′ H¯ Q Q 1 H T − T = Op √N
i=1
+ Op
N
N
1
√
Γ i ~i = Op (N −1/2 ), and therefore
∑N
1
√
+ Op
√
We next reconsider the second term on the RHS of (35), which is the only term affected by the fact that the rank condition does not hold. The second term on the RHS in (35) can be written as
χNT ≡
1 ′ ′ + ′ + ′ X Q H¯ H¯ − Q Q = Op √1 , ¯ H F T i N ¯ ′ H¯ H T
= Op
uniformly over i.
by (A.4), (A.9) and (A.10). Second, by (B.55) and (B.56),
′ Q Q if T −
= Op
T
where η¯ =
Consider each of the above terms in turn. First,
1 N
But, under Assumption 4,
√
Γ i ~i
j
rk(T −2 Q ′ Q ) → n + rk(C ) < n + m as (N , T ) → ∞. Using these results, it is now easily seen that condition (B.57) in fact holds. Hence, the desired result follows. Consider now (A.20). Following a similar line of analysis used to establish (A.19), we have
C¯ =
,
1
1 N
Γ i ~i
N i=1
+ Op
γ i . Substituting this result
∑N
N 1 −
γ¯ + Γ¯ β +
where rk(A) denotes the rank of A. But,
¯ ′ H¯ H
∑N
×
N 1 −
~i + √
Xi′ Mq F
T2
N i=1
Xi′ Mq Xi
+
T2
ηi − η¯ .
(C.65)
We next focus on analysing the RHS of (C.65). The first term on the RHS of (C.65) tends to a Normal density with mean zero and finite variance. The second term needs further analysis. Letting
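\[ Q_{1iT} = \left(\frac{X_i'M_qX_i}{T^2}\right)^{+}\frac{X_i'M_qF}{T^2} \]
and $\bar{Q}_{1T} = \frac{1}{N}\sum_{i=1}^{N} Q_{1iT}$, we have that
\[ \frac{1}{\sqrt{N}}\sum_{i=1}^{N} Q_{1iT}\left(\eta_i - \bar{\eta}\right) = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(Q_{1iT} - \bar{Q}_{1T}\right)\eta_i. \tag{C.66} \]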
We note that $\eta_i$ is i.i.d. with zero mean and finite variance and independent of all other stochastic quantities in the second term on the RHS of (C.66). We define
Q1iT ,−i =
Xi′ Mq,−i Xi
+
T2
and Q¯ 1T ,−i =
∑N
1 N
i=1
¯ −i B C¯ −i
In 0
′
= Bi2
S ′S T2
1
K2 + Op
T
¯ −i = B
1 N
Q−i (Q−′ i Q−i )+
Xi′ Mq Xi
+
Xi′ Mq F
T2
∑N
¯ j=1,j̸=i Bj and C−i =
=
T2
Q1iT
1
N
,
uniformly over i,
N N 1 − 1 − Q1iT − Q¯ 1T ηi − √ Q1iT ,−i − Q¯ 1T ,−i ηi N i=1 N i=1
√
= Op
1 N 1/2
Then, it is easy to show that, if $z_{Ti} = x_i y_{Ti}$, $x_i$ is an i.i.d. sequence with zero mean and finite variance and $y_{Ti}$ is a triangular array of random variables with finite variance, then $z_{Ti}$ is a martingale difference triangular array for which a central limit theorem holds (see, e.g., Theorem 24.3 of Davidson (1994)). But this is the case here, for any ordering over $i$, setting $y_{Ti} = Q_{1iT,-i} - \bar{Q}_{1T,-i}$ and $x_i = \eta_i$. Using this result, it follows that the second term on the RHS of (C.65) tends to a Normal density if $(Q_{1iT} - \bar{Q}_{1T})\eta_i$ has variance with finite norm, uniformly over $i$, denoted by $\Sigma_{iqT}$; i.e., (C.67)
In order to establish the existence of second moments, it is sufficient to prove that $\|Q_{1iT} - \bar{Q}_{1T}\|$, or equivalently $\|Q_{1iT,-i} - \bar{Q}_{1T,-i}\|$, has finite second moments. We carry out the analysis for $Q_{1iT} - \bar{Q}_{1T}$. For this, we need to provide further analysis of
T2
and
Xi′ Mq F T2
. First, note that $X_i$ can be written as
\[ X_i = QB_{i1} + SB_{i2} + V_i, \tag{C.68} \]
where $S$ is the $T \times (m - k - 1)$-dimensional complement of $Q$, i.e., $Q$ and $S$ are orthogonal, and
\[ F = QK_1 + SK_2, \tag{C.69} \]
where $K_1$ and $K_2$ are full row rank matrices of constants with bounded norm. Note that, if $m < 2k + 1$, we assume, without loss of generality, that $B_{i2}$ has full row rank, whereas, if $m \geq 2k + 1$, $B_{i2}$ has full column rank. Then,
1
= Op
T
,
T2
= Op
1
T
T2
′
= Bi2
S ′S T2
T2
K2
uniformly over i.
Xi′ Mq Xi
and B′
S′S
B
have
˜ ′i2 B˜ i2 . Bi2 = B +
˜ ′+ = B˜ + i2 Bi2 , and since in this case Bi2 has
and we obtain
′
Bi2
S ′S T2
+ Bi2
′
′
= Bi2 Bi2 Bi2
−1
−1
S ′S
T2
Bi2 B′i2
−1
Bi2 .
(C.71)
Hence,
Xi′ Mq Xi
+
Xi′ Mq F
T2
′
′
= Bi2 Bi2 Bi2
T2
−1
K2 + Op
1
T
,
uniformly over i, and the required result now follows by the boundedness assumption for Bi2 and K2 . The assumption that Bi2 has full row rank if m < 2k + 1 implies that the whole of S enters the equations for Xi . If that is not the case, then the argument above has to be modified as follows. We have that Xi = QBi1 + S1 Bi2 + Vi , where S1 is a subset of S. Then, Xi′ Mq Xi
= B′i2
T2
S1′ S1 T2
Bi2 + Op
1
T
,
uniformly over i,
and the analysis proceeds as above until
′ −1 ′ ′ −1 S1 S1 = B B B i2 i2 i2 T2 T2 T2 ′ S1 S 1 × K2 + Op , uniformly over i. 2
Xi′ Mq Xi
+
Xi′ Mq F
T
Then, the required result follows by the boundedness assumption for Bi2 and K2 and by Assumption 7(iii), which implies that
uniformly over i,
′ S ′ S1 −1 S S 1 < ∞ and E T12 < ∞. 2 T
E
,
Thus, in general, we have that
uniformly over i.
√
Then, Xi′ Mq Xi
T2
T
and B′i2 S ′ Mq Vi
,
S ′S
′ ′ −1 −1/2 ˜+ B ∆ , i2 = Bi2 Bi2 Bi2
But it easily follows that
T
B′i2
full row rank,
= B′i2 S ′ Mq SBi2 + Vi′ Mq Vi + B′i2 S ′ Mq Vi + Vi′ Mq SBi2 .
T2
S ′S
˜ ′i2 B˜ i2 Then, noting that B
Xi′ Mq Xi = Xi′ Mq (QBi1 + SBi2 + Vi ) = Xi′ Mq SBi2 + Xi′ Mq Vi
Vi′ Mq Vi
1
2k + 1. Then, it is easy to see that
Xi′ Mq Xi
T2
+ Bi2
We need to distinguish between two cases. In the first case, m ≥
B′i2
.
Σ iqT = Var[(Q1iT − Q¯ 1T )ηi ].
S ′S
i2 T 2 i2 T 2 an inverse. Then, by Assumption 7(ii), Q1iT − Q¯ 1T has finite second moments. The case where m < 2k + 1 is more complicated. ˜ i2 = ∆1/2 Bi2 , we have Denoting ∆ = T −2 S ′ S and B
and
B′i2
+ Op
j=1,j̸=i Cj . Then, it is straightforward that
− Q¯ 1T − Q1iT ,−i − Q¯ 1T ,−i = Op
uniformly over i.
∑N
,
Thus,
Q1iT ,−i , where Mq,−i = IT −
Xi′ Mq F T2
T2
Q−′ i , Q−i = G P¯ −i , P¯ −i = 1 N
Xi′ Mq,−i F
Similarly, using (C.69),
Bi2 + Op
1
T
\[ \sqrt{N}\left(\hat{b}_{MG} - \beta\right) \xrightarrow{d} N(0, \Sigma_{MG}), \quad \text{as } (N, T) \xrightarrow{j} \infty, \]
where
,
uniformly over i.
(C.70)
\[ \Sigma_{MG} = \Omega_{\upsilon} + \Lambda, \tag{C.72} \]
and
√
Λ = lim
N ,T →∞
N 1 −
N i=1
N (bˆ P − β) =
Σ iqT .
Xi′ Mq Xi
+
Xi′ Mq εi
T2
T N 1 −
1
T
+
Xi′ Mq εi
T2
1
= Op
T
T
∑N
Xi′ Mq Xi
i =1
N
(C.74)
+
Xi′ Mq εi
T2
conditions: (i) for any ordering of the cross-sectional units, is a martingale difference; (ii)
+
Xi′ Mq Xi
Xi′ Mq εi
T2
T
Xi′ Mq εi
moments. (ii) follows easily from the above argument about the
existence of moments of
+
Xi′ Mq Xi
Xi′ Mq F
T2
T
. Then, one has to
simply prove (i). We need to show that, for any ordering,
Xi′ Mq Xi
+
Xi′ Mq εi
T2
Then Qi∗ = Qi∗∗
. Denote Qi∗∗ =
T
Xi′ Mq εi T
Xi′ Mq εi
. Now
1 T
=
T
Xi′ Mq Xi T2
+
(C.76)
1
+ Op
√
N
1
√
T
(C.77)
hiT =
¯ F ηi − η¯ + εi Xi′ M
T2
T2
,
N (bˆ P − β) =
(C.78)
uniformly over i,
1
√
N
+ Op
1
√
T
,
(C.79)
where h¯ T = N1 i=1 hiT . Since, by assumption, ~ i and hiT are independently distributed across i,
∑N
1
N −
N − 1 i =1
bˆ i − bˆ MG
= Σ MG + Op
1
√
N
bˆ i − bˆ MG
+ Op
and the desired result follows.
′
1
√
T
T2
1 √ T
ηi − η¯ .
N
+ Op
1 T 3/2
.
−1
T2
N i=1
×
N 1 − Xi′ Mq (Xi ~i + εi + F ηi − η¯ )
√
T2
N i =1
+ Op
1
√ T
N
+ Op
1 T 3/2
.
(C.82)
Also, by Assumption 7, when the rank condition is not satisfied, Xi′ Mq Xi
∑N
i=1
is non-singular. Further, by (C.70),
T2
N 1 − Xi′ Mq Xi
T2
N 1 − ′ S ′S = Bi2 2 Bi2 + Op N i=1 T
1
T
.
2
′ E ST 2S < ∞. Hence, T −2 B′i2 S ′ SBi2 forms asymptotically a martingale difference triangular array with finite mean and variance and, as a result, T −2 B′i2 S ′ SBi2 obeys the martingale difference triangular array law of large numbers across i (see, e.g., Theorem 19.7 of Davidson (1994)) and, therefore, its mean tends to a non-stochastic limit which we denote by Θ; i.e.,
¯ + hiT − h¯ T + Op bˆ i − bˆ MG = (~i − ~)
¯F Xi′ M
N i =1
N 1 − Xi′ Mq Xi
and so
Substituting this result in (C.80), and making use of (33) and (34), we have
where
+
N 1 −
γ¯ + √
We note that, by Assumption 3, Bi2 is an i.i.d. sequence with finite second moments. Further, by Assumption 7, it follows that
,
uniformly over i,
¯ Xi Xi′ M
(C.81)
T2
N i =1
N i=1
.
But, by (C.62), the first component of qNT is Op
1 N
is consistent. To see this, first note that
¯F NXi′ M
t =1 st εit , where
N − (bˆ i − bˆ MG )(bˆ i − bˆ MG )′ ,
bˆ i − β = ~i + hiT + Op
√
∑T
N − 1 i =1
qNT =
N 1 −
.
st is a unit root process (see the definition of S in (C.68) above). Then, for (C.75) to hold it is sufficient to note that, for all t , l, E (Qi∗∗ st εit |Qi∗∗ sl εi−1l ) = 0. This completes the proof of (C.74). Finally, we need to show that the variance estimator given by
(C.80)
Assuming random coefficients, we note that γ i = γ¯ +ηi − η¯ , where ∑ η¯ = N1 Ni=1 ηi . Hence,
(C.75)
+ qNT ,
T2
T2
√
E (Qi∗ |Qi∗−1 ) = 0,
ˆ MG = Σ
√
qNT = √ N i =1
T
has finite second
N ¯ (Xi ~i + εi ) 1 − Xi′ M
where
T
follows a central limit theorem. This holds under the following
T2
N ¯ F γi 1 − Xi′ M
.
−1
N i =1
. In particular, we have to prove that
For this, it is enough to show that √1
1
×
Xi′ Mq Xi
N i=1
where Qi∗ =
N ¯ Xi 1 − Xi′ M
√
N i=1
(C.73)
To complete the proof, we have to consider two further issues. First, we note that, in (C.65), we disregard a term involving
Θ = lim
N ,T →∞
N 1 −
N i =1
Proof of Theorem 2. As before, the pooled estimator, bˆ P , defined by (20), can be written as
ΘiT
,
(C.83)
where
ΘiT = E T −2 B′i2 S ′ SBi2 .
(C.84)
But, by similar arguments to those used for the mean group estimator in the case when the rank condition does not hold, we can show that N 1 − Xi′ Mq Xi
,
√
N i=1
T2
d
~i → N (0, Ξ ) ,
where
Ξ = lim
N ,T →∞
N 1 −
N i=1
Ξ Ti ,
(C.85)
and Ξ Ti = Var T −2 Xi′ Mq Xi ~i . Further, by independence of εi across i,
N 1 − Xi′ Mq εi
√
1
= Op
T2
N i=1
T
N 1 −
Xi′ Mq F
T2
N i=1
√
¯ εi NXi′ M
Ψˆ iT
= Op
T
1
T
N
which implies that 1 N
∑N
i=1
Q2iT , we have
N 1 −
N i=1
N 1 − Q2iT − Q¯ 2T ηi . ηi − η¯ = √
−1 Ψˆ iT
√
¯ εi NXi′ M
N 1 −
=
T
Xi′ Mg Xi
N i =1
N i=1
T
ΦTi
(C.86)
√
− Q¯ 2T )ηi ].
T
(C.87)
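\[ \sqrt{N}\left(\hat{b}_{P} - \beta\right) \xrightarrow{d} N(0, \Sigma^{*}_{P}), \quad \text{as } (N, T) \xrightarrow{j} \infty, \tag{C.88} \]
where
\[ \Sigma^{*}_{P} = \Theta^{-1}\left(\Xi + \Phi\right)\Theta^{-1}, \tag{C.89} \]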
. We have
V ′ +(θˆ i
as
N 1 −
= √
N i=1
+
1,i −θ 1,i
√
T
has mean
Xi′ Mg εi
Xi′ Mg Mg εi
=
√ . This can then T T εi +(θˆ 2,i −θ 2,i )G √ , where θˆ 1,i is the √
)G ′
be written estimated
(θˆ 1,i − θ 1,i )G ′ εi V ′ (θˆ 2,i − θ 2,i )G + + i √ √
= √
T
T
T
N 1 −
N i=1
~i + −1 Ψˆ iT
N 1 −
N i=1
−1
NXi′ M¯F
Ψˆ iT
T
√
¯ εi NXi′ M
γi
Vi′ εi
√
T
= Op
1
+ Op
√
T
1
√
N
,
(D.90)
(D.91)
and so, by the uniform boundedness assumption on γ i , and by (A.16), we have that
¯F NXi′ M
γ i = Op
T
1
√
T
+ Op
1
√
N
. So it suffices to show that
N 1 − 1 1 ˆ N bMG − β = √ ~i + Op √ + Op √ .
√
N i=1
T
as (N , T ) → ∞.
$\Omega_{\upsilon}$ can be consistently estimated by
\[ \hat{\Sigma}_{MG} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\hat{b}_i - \hat{b}_{MG}\right)\left(\hat{b}_i - \hat{b}_{MG}\right)'. \]
bˆ i − bˆ MG
= (βi − β) + Op
1
¯F NXi′ M T
γ i = Op
By Lemma 3, we have that
1
√
T
+ Op
′
√
T
.
+ Op
uniformly over i,
and so
(D.94)
(D.95)
To show this, from the proof of Theorem 3, we first note that
,
uniformly over i,
√
N
j
d
N bˆ MG − β → N (0, Ω~ ),
,
uniformly over i,
√
T
has mean zero with bounded variance uniformly over i, which
√
√1
follows from our assumptions. Thus, from (D.92) and (D.93), we have
T
But the last three terms are Op
√
Hence,
−1
Xi′ Mg εi
T Vi′ εi
√
Ψˆ iT
−1
T
regression coefficient of Xi on G and θˆ 2,i is the estimated regression coefficient of εi on G. But (θˆ 1,i − θ 1,i ) = Op (T −1 ) and (θˆ 2,i − θ 2,i ) = Op (T −1 ). So Vi′ + (θˆ 1,i − θ 1,i )G ′ εi + (θˆ 2,i − θ 2,i )G √
where $\hat{\Psi}_{iT} = T^{-1}X_i'\bar{M}X_i$. As we assume that the rank condition (9) is satisfied, we have, by Lemma 4, that
N i=1
Xi′ Mg Xi
T
N bˆ MG − β
N 1 −
(D.93)
T
(θˆ 1,i − θ 1,i )G ′ G (θˆ 2,i − θ 2,i ) + . √
Proof of Corollary 1. Using (E.106), we have
T
. (D.92)
′
Appendix D
¯F N Xi′ M
N
T
proving the result for the pooled estimator. The result for the consistency of the variance estimator follows along similar lines to that for the mean group estimator.
√
T
For this, we have to show that Xi′ Mg εi
√
1
√
zero with bounded variance uniformly over i. We analyze
Thus, overall, by the independence of ~i and ηi , it follows that
−1 Ψˆ iT
T
and
ΦTi = Var[(Q2iT
+ Op
√
−1 ′ N ′ − Xi Mg εi 1 Xi Mg Xi = Op √ . √
1
NT i=1 N i=1
T
We examine the behaviour of the first term on the RHS of (D.92). We wish to show that
√
N ,T →∞
NXi′ Mg εi
1
+ Op
where N 1 −
−1 √
T
N d 1 − Q2iT − Q¯ 2T ηi → N (0, Φ) , √ N i=1
Φ = lim
,
√
Then, similarly to the analysis used above for T −2 Xi′ Mq Xi , we have
1
+ Op
√
uniformly over i,
.
Further, letting Q2iT = T −2 Xi′ Mq F and Q¯ 2T =
√
−1
1
√
N
.
which yields (noting that βi − β = ~i ) 1
N −
N − 1 i=1
bˆ i − bˆ MG
bˆ i − bˆ MG
′
1
√
N
,
=
N −
1
N − 1 i=1
~i ~′i + Op
1
+ Op
√
T
1
.
√
N
But by the assumption that $\upsilon_i$ has finite fourth moments, and using the law of large numbers for i.i.d. processes, it readily follows that $\hat{\Sigma}_{MG} \to \Omega_{\upsilon}$, as $(N, T) \xrightarrow{j} \infty$.
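The nonparametric variance estimator discussed in Corollary 1 is simple to compute once the individual estimates are available. The sketch below is illustrative only: it assumes the individual estimates $\hat{b}_i$ are stacked row-wise in an array, and the function name `mean_group` and its return layout are hypothetical rather than part of the paper.

```python
import numpy as np


def mean_group(b_hat):
    """Mean group estimate and its nonparametric variance estimator.

    b_hat : (N, k) array whose i-th row is the individual estimate b_i.
    Returns (b_mg, sigma_mg, se), where sigma_mg is
    (N - 1)^{-1} * sum_i (b_i - b_mg)(b_i - b_mg)' and se holds the
    standard errors of b_mg, i.e. sqrt(diag(sigma_mg) / N).
    """
    b_hat = np.asarray(b_hat, dtype=float)
    n = b_hat.shape[0]
    b_mg = b_hat.mean(axis=0)
    dev = b_hat - b_mg                      # deviations from the mean group estimate
    sigma_mg = dev.T @ dev / (n - 1)
    se = np.sqrt(np.diag(sigma_mg) / n)
    return b_mg, sigma_mg, se


if __name__ == "__main__":
    b_mg, sigma_mg, se = mean_group([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
    print(b_mg)
    print(sigma_mg)
    print(se)
```

Dividing by $N$ inside the standard error reflects the $\sqrt{N}$ normalization of $\hat{b}_{MG} - \beta$ in the limit result.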
Proof of Corollary 2. Assuming that the rank condition is satisfied, bˆ P , defined by (20), can be written as
√
N bˆ P − β
=
N ¯ Xi 1 − Xi′ M
N i =1
×
−1
where the uniformity follows by the assumption that $\upsilon_i$ has uniformly finite fourth moments. Since $\upsilon_i$ is i.i.d. and independent of all other stochastic quantities in the model, it follows that $\tilde{\upsilon}_{Ti} = T^{-1}X_i'\bar{M}_{-i}X_i\upsilon_i$ is a martingale difference triangular array, since, for any ordering of the cross-sectional units, $E\left(T^{-1}X_i'\bar{M}_{-i}X_i\upsilon_i \mid i-1, \ldots, 1\right) = 0$. Then, as long as $E\|T^{-1}X_i'\bar{M}_{-i}X_i\|^2 < \infty$, which is satisfied by Assumption 6, a central limit theorem holds for $\tilde{\upsilon}_{Ti}$, by Theorem 24.3 of Davidson (1994). Also, by Assumption 2(ii) of this paper, Theorem 1 of De Jong (1997) and Example 17.17 of Davidson (1994), it follows that
T 1−
T
¯ (Xi ~i + εi ) 1 − Xi′ M
√
+ qNT ,
T
N i =1
T t =1
N
(D.96)
j
qNT =
√
¯ F γi N Xi′ M
N 1 −
T
√
N bˆ P − β =
√1
.
(D.97)
+ Op
T
×
√1
+ Op
T
N 1 − Xi′ M¯X i
N i =1
×
√
N
.
N →∞
√
T
T
+ Op
1
.
(D.99)
∑N −1
T −1 Xi′ M¯X i is non-
√
N
i=1
N
N −
.
Σ vi
∑N
N
¯ −i Xi Xi′ M T
j=1,j̸=i
= Op
N
i =1
Xi′ M¯X i T
¯ −i ~i . We define M
1 − Xi′ M¯X i
√
N i =1
T
zjt . Then, it is straightforward to see that
Xi′ M¯X i T
−
1 N
, uniformly over i, and so N ¯ −i X i 1 − Xi′ M
~i − √
∗−1
N i=1
T
. Hence, as (N , T )
lim
N
N ,T →∞
−1
N −
Σ v ΩiT , (D.103)
i =1
,
(D.104)
T
,
N ′ ¯ − X M Xi i
bˆ i − bˆ MG
(D.105)
Appendix E
Proof of Theorem 3. Using (25) in (15), we have
¯ Xi Xi′ M
T
~i = Op
−1
T
¯F Xi′ M
T
¯ Xi Xi′ M
−1
γi
¯ εi Xi′ M
T
(D.100)
¯ −i = IT − H¯ −i H¯ −′ i H¯ −i −1 H¯ −′ i , where H¯ −i = (D, Z¯ −i ), as M Z¯ −i is a T × (k + 1) matrix of observations on dt and z¯t ,−i and 1 N
R =
i
+
i=1
√1
(N − 1) i=1 T ′ X ′ M ¯ Xi i ˆ ˆ × bi − bMG .
RHS on (D.98). We first consider √1
∑N
∗
N ′ ¯ − X M Xi 1
bˆ i − βi =
p
−1
By a similar argument to that used to show the consistency of the variance estimator in the MG estimator case, it is easy to show that this variance estimator is consistent.
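The sandwich form of the pooled variance estimator can be sketched as follows. This is a hedged illustration, assuming the form reported in Pesaran (2006): $\hat{\Sigma}^{*}_{P} = \hat{\Psi}^{*-1}\hat{R}^{*}\hat{\Psi}^{*-1}$, with $\hat{\Psi}^{*} = N^{-1}\sum_i T^{-1}X_i'\bar{M}X_i$ and $\hat{R}^{*} = (N-1)^{-1}\sum_i (T^{-1}X_i'\bar{M}X_i)(\hat{b}_i - \hat{b}_{MG})(\hat{b}_i - \hat{b}_{MG})'(T^{-1}X_i'\bar{M}X_i)$. The function name `pooled_sandwich` and the input layout are hypothetical.

```python
import numpy as np


def pooled_sandwich(s_list, b_hat):
    """Sandwich variance estimator Psi^{-1} R Psi^{-1} for the pooled estimator.

    s_list : (N, k, k) array of symmetric matrices S_i = X_i' M_bar X_i / T.
    b_hat  : (N, k) array of individual estimates b_i.
    """
    s = np.asarray(s_list, dtype=float)
    b_hat = np.asarray(b_hat, dtype=float)
    n = s.shape[0]
    psi = s.mean(axis=0)                        # Psi* = N^{-1} sum_i S_i
    dev = b_hat - b_hat.mean(axis=0)            # b_i - b_MG
    sd = np.einsum("ijk,ik->ij", s, dev)        # rows hold S_i (b_i - b_MG)
    r = sd.T @ sd / (n - 1)                     # R* uses symmetry of S_i
    psi_inv = np.linalg.inv(psi)
    return psi_inv @ r @ psi_inv


if __name__ == "__main__":
    s_demo = np.array([2.0 * np.eye(2)] * 3)
    b_demo = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
    print(pooled_sandwich(s_demo, b_demo))
```

When every $S_i$ is the same multiple of the identity, the weights cancel and the estimator reduces to the mean group variance estimator, which gives a convenient sanity check.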
Next, we examine the second component of the first term of the
z¯t ,−i =
ˆ R Ψ
where Ψ = lim
= Op
T
1
∗
,
i=1
Rˆ ∗ =
→ Ψ ∗−1 ,
T
∗−1
T
By (A.16) and, since by Assumption 6 N singular, we have
N i =1
∗−1 ∗ ˆ
(D.98)
−1
N i =1
−1
R Ψ
∗
1
√
+ Op
ΣP = Ψ
∗−1 ∗
Ψˆ = N −1
N 1 − Xi′ M¯X i ~i + Xi′ Mg εi
1 − Xi M¯X i
T
(D.102)
where
T
1
√
′
i=1
∗
∗
Further, by (A.17),
N
Xi′ Mg εi
∑N
N d
uniformly over i,
ˆ β → N (0, Σ ∗P ), where N b−
ˆ P = Ψˆ Σ
T
N ¯ (Xi ~i + εi ) 1 − Xi′ M
+ Op
√
,
where Σ v ΩiT denotes the variance of i T ~i . The variance estimator for Σ ∗P suggested by Pesaran (2006) is given by
−1
√
T
X ′ Mg Xi
. Thus,
N i =1
N bˆ P − β =
1
√
N
N ¯ Xi 1 − Xi′ M
N i =1
√
N i=1
By (D.91), qNT = Op
vit εit = Op
which implies that √1
→ ∞,
where
T
.
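Using (A.17) and (A.18), and assuming that the rank condition (9) is satisfied, we have
\[ \hat{b}_i - \beta_i = \left(\frac{X_i'\bar{M}X_i}{T}\right)^{-1}\frac{X_i'M_g\varepsilon_i}{T} + O_p\!\left(\frac{1}{\sqrt{NT}}\right) + O_p\!\left(\frac{1}{N}\right). \]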
For N and T sufficiently large, the distribution of
1 N 1/2
, (D.101)
(E.106)
(E.107)
$\sqrt{T}\left(\hat{b}_i - \beta_i\right)$ will be asymptotically normal if the rank condition (9) is satisfied and if $\sqrt{T}/N \to 0$ as $N$ and $T \to \infty$. To see why this additional condition is needed, using (E.107), note that
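\[ \sqrt{T}\left(\hat{b}_i - \beta_i\right) = \left(\frac{X_i'\bar{M}X_i}{T}\right)^{-1}\frac{X_i'M_g\varepsilon_i}{\sqrt{T}} + O_p\!\left(\frac{\sqrt{T}}{N}\right) + O_p\!\left(\frac{1}{\sqrt{N}}\right), \tag{E.108} \]
and the asymptotic distribution of $\sqrt{T}\left(\hat{b}_i - \beta_i\right)$ will be free of nuisance parameters only if $\sqrt{T}/N \to 0$, as $(N, T) \xrightarrow{j} \infty$. We now give the necessary arguments for showing that the first term on the RHS of (E.108) is asymptotically normally distributed. We note that
\[ \frac{X_i'M_g\varepsilon_i}{\sqrt{T}} = -\left(\hat{\Pi}_i - \Pi_i\right)'\frac{1}{\sqrt{T}}\sum_{t=1}^{T} g_t\varepsilon_{it} + \frac{1}{\sqrt{T}}\sum_{t=1}^{T} v_{it}\varepsilon_{it}. \tag{E.109} \]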
But, it is straightforward to show that the first term of (E.109) is $O_p(T^{-1/2})$ when $g_t$ is I(1). Then, we need to obtain a central limit theorem for the second term of (E.109). But, by the martingale difference assumption on $\varepsilon_{it}$, it follows that $v_{it}\varepsilon_{it}$ is also a martingale difference sequence with finite variance given by $\sigma_i^2\Sigma_{vi}$. Then, by Theorem 24.3 of Davidson (1994), it follows that
\[ \frac{1}{\sqrt{T}}\sum_{t=1}^{T} v_{it}\varepsilon_{it} \xrightarrow{d} N(0, \sigma_i^2\Sigma_{vi}). \tag{E.110} \]
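Further, by (A.16), and noting that, by Assumptions 5 and 6, $X_i'\bar{M}X_i/T$ and $X_i'M_gX_i/T$ are non-singular, we also have
\[ \left(\frac{X_i'\bar{M}X_i}{T}\right)^{-1} - \left(\frac{X_i'M_gX_i}{T}\right)^{-1} = O_p\!\left(\frac{1}{\sqrt{N}}\right), \]
and, by Lemma 5, it follows that
\[ \left(\frac{X_i'M_gX_i}{T}\right)^{-1} - \Sigma_{vi}^{-1} = O_p\!\left(\frac{1}{\sqrt{T}}\right), \]
finally implying that
\[ \sqrt{T}\left(\hat{b}_i - \beta_i\right) \xrightarrow{d} N(0, \sigma_i^2\Sigma_{vi}^{-1}), \tag{E.111} \]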
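and that a consistent estimator of the asymptotic variance can be obtained by
\[ \mathring{\sigma}_i^2\left(\frac{X_i'\bar{M}X_i}{T}\right)^{-1}, \]
where
\[ \mathring{\sigma}_i^2 = \frac{\left(y_i - X_i\hat{b}_i\right)'\bar{M}\left(y_i - X_i\hat{b}_i\right)}{T - (n + 2k + 1)}. \tag{E.112} \]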
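The individual estimator and the variance estimator in (E.112) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: it assumes a balanced panel stored as arrays, takes $\bar{H} = (D, \bar{Z})$ with $\bar{Z}$ the cross-section averages of $(y_{it}, x_{it}')$, and builds $\bar{M}$ through a pseudo-inverse. The function name `cce_individual` and the array layout are hypothetical.

```python
import numpy as np


def cce_individual(y, x, d):
    """Unit-by-unit CCE estimates with the variance estimator of (E.112).

    y : (N, T) array, x : (N, T, k) array, d : (T, n) observed common effects.
    Returns (b_hat, sigma2, avar) with shapes (N, k), (N,), (N, k, k), where
    avar[i] = sigma2[i] * (X_i' M_bar X_i / T)^{-1}.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    n_units, t, k = x.shape
    # H_bar = (D, Z_bar): observed effects plus cross-section averages of (y, x).
    z_bar = np.column_stack([y.mean(axis=0), x.mean(axis=0)])
    h_bar = np.column_stack([d, z_bar])
    # M_bar = I - H (H'H)^- H', obtained as I - H pinv(H).
    m_bar = np.eye(t) - h_bar @ np.linalg.pinv(h_bar)
    dof = t - (d.shape[1] + 2 * k + 1)          # T - (n + 2k + 1)
    b_hat = np.empty((n_units, k))
    sigma2 = np.empty(n_units)
    avar = np.empty((n_units, k, k))
    for i in range(n_units):
        xmx = x[i].T @ m_bar @ x[i]
        b_hat[i] = np.linalg.solve(xmx, x[i].T @ m_bar @ y[i])
        resid = y[i] - x[i] @ b_hat[i]
        sigma2[i] = resid @ m_bar @ resid / dof
        avar[i] = sigma2[i] * np.linalg.inv(xmx / t)
    return b_hat, sigma2, avar
```

Standard errors for $\hat{b}_i$ would then follow as the square roots of the diagonal of `avar[i] / T`, given the $\sqrt{T}$ normalization in (E.111).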
References
Andrews, D.W., 1987. Asymptotic results for generalized Wald tests. Econometric Theory 3, 348–358.
Anselin, L., 2001. Spatial econometrics. In: Baltagi, B. (Ed.), A Companion to Theoretical Econometrics. Blackwell, Oxford.
Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica 71, 135–173.
Bai, J., Kao, C., Ng, S., 2009. Panel cointegration with global stochastic trends. Journal of Econometrics 149, 82–99.
Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
Chamberlain, G., Rothschild, M., 1983. Arbitrage, factor structure and mean-variance analysis in large asset markets. Econometrica 51, 1305–1324.
Chudik, A., Pesaran, M.H., Tosetti, E., 2010. Weak and strong cross section dependence and estimation of large panels. Working Paper Series 1100. European Central Bank. The Econometrics Journal (forthcoming).
Coakley, J., Fuertes, A., Smith, R.P., 2002. A principal components approach to cross-section dependence in panels. Mimeo. Birkbeck College, University of London.
Conley, T.G., Dupor, B., 2003. A spatial analysis of sectoral complementarity. Journal of Political Economy 111, 311–352.
Conley, T.G., Topa, G., 2002. Socio-economic distance and spatial patterns in unemployment. Journal of Applied Econometrics 17, 303–327.
Connor, G., Korajczyk, R., 1986. Performance measurement with the arbitrage pricing theory: a new framework for analysis. Journal of Financial Economics 15, 373–394.
Connor, G., Korajczyk, R., 1988. Risk and return in an equilibrium APT: application to a new test methodology. Journal of Financial Economics 21, 255–289.
Davidson, J., 1994. Stochastic Limit Theory. Oxford University Press.
De Jong, R.M., 1997. Central limit theorems for dependent heterogeneous random variables. Econometric Theory 13, 353–367.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalised factor model: identification and estimation. Review of Economics and Statistics 82, 540–554.
Forni, M., Reichlin, L., 1998. Let’s get real: a factor analytical approach to disaggregated business cycle dynamics. Review of Economic Studies 65, 453–473.
Johansen, S., 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press, Oxford.
Kapetanios, G., Pesaran, M.H., 2007. Alternative approaches to estimation and inference in large multifactor panels: small sample results with an application to modelling of asset returns. In: Phillips, G.D.A., Tzavalis, E. (Eds.), The Refinement of Econometric Estimation and Test Procedures: Finite Sample and Asymptotic Analysis. Cambridge University Press, Cambridge.
Lee, K.C., Pesaran, M.H., 1993. The role of sectoral interactions in wage determination in the UK economy. The Economic Journal 103, 21–55.
Newey, W.K., West, K.D., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Pesaran, M.H., 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74, 967–1012.
Pesaran, M.H., Schuermann, T., Weiner, S.M., 2004. Modeling regional interdependencies using a global error-correcting macroeconomic model. Journal of Business & Economic Statistics 22, 129–181 (with Discussions and a Rejoinder).
Pesaran, M.H., Smith, R.P., 1995. Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68, 79–113.
Pesaran, M.H., Smith, R., Im, K.S., 1996. Dynamic linear models for heterogeneous panels. In: Matyas, L., Sevestre, P. (Eds.), The Econometrics of Panel Data. Kluwer, pp. 145–195.
Phillips, P., Moon, H.R., 1999. Linear regression limit theory for nonstationary panel data. Econometrica 67, 1057–1111.
Stock, J.H., Watson, M.W., 2002. Macroeconomic forecasting using diffusion indices. Journal of Business and Economic Statistics 20, 147–162.
Stock, J.H., Watson, M.W., 2008. Forecasting in dynamic factor models subject to structural instability. In: Castle, J., Shephard, N. (Eds.), The Methodology and Practice of Econometrics, A Festschrift in Honour of Professor David F. Hendry. Oxford University Press, Oxford.
Stone, R., 1947. On the interdependence of blocks of transactions. Journal of the Royal Statistical Society 9, 1–45. Supplement.
Journal of Econometrics 160 (2011) 349–371
Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom
Spatial heteroskedasticity and autocorrelation consistent estimation of covariance matrix Min Seong Kim, Yixiao Sun ∗ Department of Economics, UC San Diego, United States
Article history: Received 30 April 2009 Received in revised form 6 March 2010 Accepted 7 October 2010 Available online 18 November 2010 JEL classification: C13 C14 C21
abstract This paper considers spatial heteroskedasticity and autocorrelation consistent (spatial HAC) estimation of covariance matrices of parameter estimators. We generalize the spatial HAC estimator introduced by Kelejian and Prucha (2007) to apply to linear and nonlinear spatial models with moment conditions. We establish its consistency, rate of convergence and asymptotic truncated mean squared error (MSE). Based on the asymptotic truncated MSE criterion, we derive the optimal bandwidth parameter and suggest its data dependent estimation procedure using a parametric plug-in method. The finite sample performances of the spatial HAC estimator are evaluated via Monte Carlo simulation. © 2010 Elsevier B.V. All rights reserved.
Keywords: Asymptotic mean squared error Heteroskedasticity and autocorrelation Covariance matrix estimator Optimal bandwidth choice Robust standard error Spatial dependence
1. Introduction

This paper studies spatial heteroskedasticity and autocorrelation consistent (HAC) estimation of covariance matrices of parameter estimators. Just as heteroskedasticity is a well-known feature of cross-sectional data (e.g. White, 1980), spatial dependence is a common property due to interactions among economic agents. Therefore, robust inference in the presence of heteroskedasticity and spatial dependence is an important problem in spatial data analysis. The first discussion of spatial HAC estimation is Conley (1996, 1999). He proposes a spatial HAC estimator based on the assumption that each observation is a realization of a random process, which is stationary and mixing, at a point in a two-dimensional Euclidean space. Conley and Molinari (2007) examine the performance of this estimator using Monte Carlo simulation. Their results show that inference is robust to measurement error in locations. Robinson (2007) considers nonparametric kernel
∗ Corresponding address: Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0508, United States. Tel.: +1 858 534 4692. E-mail addresses:
[email protected] (M.S. Kim),
[email protected] (Y. Sun). 0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.10.002
spectral density estimation for weakly stationary processes on a d-dimensional lattice. Kelejian and Prucha (2007, hereafter KP) also develop a spatial HAC estimator. As in many empirical studies, they model spatial dependence in terms of a spatial weighting matrix. The difference is that the weighting matrix is not assumed to be known and is not parametrized. Typical examples of this type of process include the spatial autoregressive and spatial moving average processes. Local nonstationarity and heteroskedasticity are built-in features of these types of processes. This is in sharp contrast with Conley (1996, 1999) and Robinson (2007), in which the process is assumed to be stationary.

In the spatial HAC estimation literature, an economic distance is commonly employed to characterize the decaying pattern of the spatial dependence. The covariance of random variables at locations i and j is a function of $d_{ij,n}$, the economic distance between them. As the economic distance increases, the covariance decreases in absolute value, and vice versa. A variety of distance measures can be considered depending on the application. For example, Pinkse et al. (2002) use geographic distance and Conley (1999) uses transportation cost. The existence of such an economic distance enables us to use the kernel method for standard error estimation. The estimator is a weighted sum of sample covariances with weights depending on the relative distances, that is, $d_{ij,n}/d_n$ for some bandwidth parameter $d_n$.
M.S. Kim, Y. Sun / Journal of Econometrics 160 (2011) 349–371
We generalize the spatial HAC estimator proposed by KP to be applicable to general linear and nonlinear spatial models and establish its asymptotic properties. We provide the conditions for consistency and the rate of convergence. Let E ℓn denote the mean of the average number of pseudo-neighbors. By definition, two units are pseudo-neighbors if their distance is less than dn . We show that the spatial HAC estimator is consistent if E ℓn = o(n) and dn → ∞ as n → ∞. This result implies that the rate of convergence of the estimator is E ℓn /n. Comparing our results with Andrews (1991), we find that the properties of the spatial HAC estimator we consider are interestingly parallel to those of the time series HAC estimator, even though they assume different DGPs and have different dependence structures. We decompose the difference of the spatial HAC estimator from the true covariance matrix into three parts. The first part is due to the estimation error of model parameters and the second and third parts are bias and variance terms even if the model parameters are known. We derive the asymptotic bias and variance and show that the estimation error vanishes faster than the other two terms under some regularity conditions. As a result, the truncated Mean Squared Error (MSE) of the spatial HAC estimator is dominated by the bias and variance terms. This key result provides us the opportunity to select the bandwidth parameter to balance the asymptotic squared bias with variance. We find that the optimal bandwidth choice depends on the weighting matrix Sn used in the MSE criterion. Depending on which model parameter is the focus of interest, we suggest different choices of the weighting matrix. This scheme coincides with that suggested by Politis (2007). We provide a data-driven implementation of the optimal bandwidth parameter and examine the finite sample properties of our spatial HAC estimator and the associated test via Monte Carlo simulation. 
We compare the performance of competing estimators using different choices of $d_n$ and $S_n$. In addition, the effects of location errors and the performance of the plug-in procedure with a mis-specified parametric model are examined. We also consider the case when the observations are located irregularly and compare the performance of the standard normal approximation with two naive bootstrap approximations for hypothesis testing.

In addition to KP, the paper that is most closely related to ours is Andrews (1991), who employs the asymptotic truncated MSE criterion to select the bandwidth parameter for time series HAC estimation. His paper in turn can be traced back to the literature on spectral density estimation. We extend Andrews (1991) to the spatial setting. The extension is nontrivial as spatial processes are more difficult to deal with, especially when they are not weakly stationary.

The remainder of the paper is as follows. Section 2 describes the estimation problem and the underlying spatial process we consider and introduces our spatial HAC estimator. Section 3 establishes the consistency, the rate of convergence, and the asymptotic truncated MSE of the spatial HAC estimator. Section 4 derives asymptotically optimal sequences of fixed bandwidth parameters and proposes a data-dependent implementation. Section 5 studies the consistency, the rate of convergence, and the asymptotic truncated MSE of the spatial HAC estimator with the estimated optimal bandwidth parameter. Section 6 presents Monte Carlo simulation results. Section 7 concludes.

2. Spatial processes and HAC estimators

In a general spatial model with moment restrictions, the asymptotic distribution of a parameter estimator often satisfies

$$(B_n J_n B_n')^{-1/2}\,\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0, I_r), \quad \text{as } n \to \infty,$$

where n is the sample size, $B_n$ is a nonstochastic r × p matrix and

$$J_n = \mathrm{var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} V_{i,n}(\theta_0)\right) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} E\left[V_{i,n}(\theta_0)V_{j,n}(\theta_0)'\right]. \tag{1}$$
$V_{i,n}(\theta)$ is a random p-vector for each $\theta \in \Theta \subset \mathbb{R}^r$. For IV estimation of a linear regression model, $V_{i,n}(\theta) = Z_{i,n}(Y_{i,n} - X_{i,n}'\theta)$ where $Z_{i,n}$ is the vector of instruments. For pseudo-ML estimation, $V_{i,n}(\theta)$ is the score function of the ith observation. For GMM estimation, $V_{i,n}(\theta)$ is the moment vector.

A prime example of this setting is the spatial linear regression $Y_{i,n} = X_{i,n}'\theta_0 + u_{i,n}$, where $E(u_{i,n}|X_{i,n}) = 0$. The OLS estimator of $\theta_0$ is

$$\hat\theta = \left(\frac{1}{n}\sum_{i=1}^{n} X_{i,n}X_{i,n}'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} X_{i,n}Y_{i,n}\right).$$

Under some regularity conditions, $(B_n J_n B_n')^{-1/2}\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, I_r)$, where

$$J_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} E\,(X_{i,n}u_{i,n})(X_{j,n}u_{j,n})' \quad\text{and}\quad B_n = \left(\frac{1}{n}\sum_{i=1}^{n} X_{i,n}X_{i,n}'\right)^{-1}.$$
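The linear regression example can be sketched numerically as follows. This is a minimal illustration rather than code from the paper: the function name `ols_scores`, the simulated design and the parameter values are assumptions made for the example. It computes the OLS estimator, the sample analogue of $B_n$, and the estimated moment vectors $X_{i,n}\hat u_{i,n}$ that a covariance estimator of $J_n$ would take as input.

```python
import numpy as np

def ols_scores(X, y):
    """Return (theta_hat, B_hat, V_hat) for the linear model y = X theta + u.

    Hypothetical helper for illustration only.
    """
    n = X.shape[0]
    Sxx = X.T @ X / n                      # (1/n) sum_i X_i X_i'
    Sxy = X.T @ y / n                      # (1/n) sum_i X_i Y_i
    theta_hat = np.linalg.solve(Sxx, Sxy)  # OLS estimator
    B_hat = np.linalg.inv(Sxx)             # sample analogue of B_n
    u_hat = y - X @ theta_hat              # OLS residuals
    V_hat = X * u_hat[:, None]             # row i holds (X_i * u_hat_i)'
    return theta_hat, B_hat, V_hat

# Simulated data (assumed design: intercept plus one regressor).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta0 = np.array([1.0, 2.0])
y = X @ theta0 + rng.normal(size=n)

theta_hat, B_hat, V_hat = ols_scores(X, y)
```

By the OLS normal equations the rows of `V_hat` sum to zero, which is a convenient sanity check before passing them to a covariance estimator.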
We are interested in estimating the asymptotic variance of $\sqrt{n}(\hat\theta - \theta_0)$. As $B_n$ is often easy to estimate by replacing $\theta_0$ with $\hat\theta$, our focus is on consistent estimation of $J_n$. By extending the spatial HAC estimator proposed in KP, we can construct a spatial HAC estimator of $J_n$ as follows

$$\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \hat V_{i,n}\hat V_{j,n}'\,K\!\left(\frac{d_{ij,n}}{d_n}\right), \tag{2}$$

where $\hat V_{i,n} = V_{i,n}(\hat\theta)$ and K(·) is a real-valued kernel function. $d_{ij,n}$ is the economic distance between units i and j and $d_n$ is a bandwidth or truncation parameter. We assume that the degree of spatial dependence is a function of $d_{ij,n}$: if $d_{ij,n}$ is small, $V_{i,n}$ and $V_{j,n}$ are highly dependent, whereas if it is large, the two units are close to being independent.

We assume that $V_{i,n}$ ($= V_{i,n}(\theta_0)$) for i = 1, ..., n are generated from the linear transformation of np common innovations:

$$V_{i,n} = \tilde R_{in}\,\tilde\varepsilon_n \tag{3}$$

where

$$\tilde R_{in} = \begin{pmatrix} \tilde r^{(1)}_{i1,n} & \cdots & \tilde r^{(1)}_{in,n} & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & \tilde r^{(p)}_{i1,n} & \cdots & \tilde r^{(p)}_{in,n} \end{pmatrix}$$

is a p × np block diagonal matrix with unknown elements, $\tilde\varepsilon_n^{(c)} = (\tilde\varepsilon^{(c)}_{1,n}, \ldots, \tilde\varepsilon^{(c)}_{n,n})'$ and $\tilde\varepsilon_n = ((\tilde\varepsilon_n^{(1)})', \ldots, (\tilde\varepsilon_n^{(p)})')'$ is an np × 1 vector of innovations. We assume that

$$\mathrm{var}(\tilde\varepsilon_n^{(c)}) = \sigma_{cc} I_n, \qquad \mathrm{cov}(\tilde\varepsilon_n^{(c)}, \tilde\varepsilon_n^{(d)}) = \sigma_{cd} I_n$$

so that the variance matrix of $\tilde\varepsilon_n$ is of the form

$$\mathrm{var}(\tilde\varepsilon_n) = \Sigma \otimes I_n \quad \text{with } \Sigma = (\sigma_{ij}),$$
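where ⊗ denotes the Kronecker product. The process $V_{i,n}(\theta_0)$ may be nonlinear in the parameter $\theta_0$ but we assume that $V_{i,n}(\theta_0)$ follows a linear array process as in (3). This type of process allows for nonstationarity and unconditional heteroskedasticity. It also includes typical spatial parametric models such as spatial autoregressive processes and spatial moving average processes.

Let $R_{in} \equiv \tilde R_{in}(\Sigma^{1/2} \otimes I_n)$ and $\varepsilon_n \equiv (\varepsilon_{1,n}, \ldots, \varepsilon_{np,n})' = (\Sigma^{-1/2} \otimes I_n)\tilde\varepsilon_n$, then

$$V_{i,n} = R_{in}\varepsilon_n \quad\text{and}\quad \mathrm{var}(\varepsilon_n) = I_{np}.$$

The matrix $R_{in}$ can be written more explicitly as

$$R_{in} \equiv \begin{pmatrix} r^{(1)}_{i1,n} & \cdots & r^{(1)}_{i,np,n} \\ \vdots & \ddots & \vdots \\ r^{(p)}_{i1,n} & \cdots & r^{(p)}_{i,np,n} \end{pmatrix} = \begin{pmatrix} \sigma^{11}\tilde r^{(1)}_{i1,n} & \cdots & \sigma^{11}\tilde r^{(1)}_{in,n} & \cdots & \sigma^{1p}\tilde r^{(1)}_{i1,n} & \cdots & \sigma^{1p}\tilde r^{(1)}_{in,n} \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ \sigma^{p1}\tilde r^{(p)}_{i1,n} & \cdots & \sigma^{p1}\tilde r^{(p)}_{in,n} & \cdots & \sigma^{pp}\tilde r^{(p)}_{i1,n} & \cdots & \sigma^{pp}\tilde r^{(p)}_{in,n} \end{pmatrix}$$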
where $\sigma^{ij}$ is the (i, j)-th element of $\Sigma^{1/2}$. We make the following assumption on $\varepsilon_n$.

Assumption 1. For each n ≥ 1, $\{\varepsilon_{\ell,n}\}$ are i.i.d. (0, 1) with $E\varepsilon_{\ell,n}^4 \le c_E$ for a constant $c_E < \infty$.

For simplicity, we assume that $\varepsilon_{i,n}$ is independent of $\varepsilon_{j,n}$ for i ≠ j. Our results can be generalized but with more tedious calculations. Under Assumption 1, the covariance matrix between $V_{i,n}$ and $V_{j,n}$ is given by

$$\Gamma_{ij,n} \equiv (\gamma^{(cd)}_{ij,n}) = E[V_{i,n}V_{j,n}'] = R_{in}R_{jn}' \tag{4}$$

where the (c, d)-th element of $\Gamma_{ij,n}$ is denoted by $\gamma^{(cd)}_{ij,n}$. Accordingly, Eq. (1) can be restated as

$$J_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \Gamma_{ij,n} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} R_{in}R_{jn}' \tag{5}$$

and the (c, d)-th element of $J_n$ is

$$J_n(c,d) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} E\left[\sum_{m=1}^{np}\sum_{\ell=1}^{np} r^{(c)}_{im,n}\, r^{(d)}_{j\ell,n}\, \varepsilon_{m,n}\varepsilon_{\ell,n}\right] = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{m=1}^{np} r^{(c)}_{im,n}\, r^{(d)}_{jm,n}. \tag{6}$$

Assumption 2. For all j = 1, 2, ..., np, and s = 1, 2, ..., p, $\sum_{k=1}^{n} |r^{(s)}_{kj,n}| < c_R$ for some constant $c_R$, $0 < c_R < \infty$.

Assumption 3. There exists $q_d > 0$ such that $n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \|\Gamma_{ij,n}\|\, d_{ij,n}^{q_d} < \infty$ for all n, where $\|A\|$ denotes the Euclidean norm of matrix A.

Assumptions 2 and 3 impose conditions on the persistence of the spatial process. If $|\sigma^{ij}| \le C$ for some constant C > 0, then Assumption 2 holds if $\sum_{k=1}^{n} |\tilde r^{(s)}_{kj,n}| < c_R/C$. Since $|\tilde r^{(s)}_{kj,n}|$ can be regarded as the (absolute) change of $V^{(s)}_{k,n}$ in response to one unit change in $\tilde\varepsilon^{(s)}_{jn}$, the summability condition requires that the aggregate response be finite. The condition holds trivially if the set $\{\tilde r^{(s)}_{kj,n}, k = 1, 2, \ldots, n\}$ has only a finite number of nonzero elements. In this case, the dependence induced by the innovation $\tilde\varepsilon^{(s)}_{jn}$ is limited to a finite number of units. Assumption 3 states that $\Gamma_{ij,n}$ decays to zero fast enough such that $n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \|\Gamma_{ij,n}\|\, d_{ij,n}^{q_d}$ is finite for all n. This excludes the case in which the sample size increases because of more intensive sampling within a given distance. This condition enables us to truncate the sum $\sum_{j=1}^{n} \|\Gamma_{ij,n}\|$ and downweigh the summand without incurring a large error. As in the time series literature, this assumption helps us control the asymptotic bias of the spatial HAC estimator. By (6), Assumption 3 holds if $n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \big|\sum_{m=1}^{np} r^{(c)}_{im,n} r^{(d)}_{jm,n}\big|\, d_{ij,n}^{q_d} < \infty$ for all c, d, and n. This implies that if units i and j are far away from each other, then one of $r^{(c)}_{im,n}$ and $r^{(c)}_{jm,n}$ must be small for any given m. In other words, given an innovation at any location m, the responses at locations i and j cannot be both large if i and j are far away from each other.

The spatial HAC estimator we consider is based on (2) but also allows for measurement errors in the economic distances as follows

$$\hat J_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \hat V_{i,n}\hat V_{j,n}'\,K\!\left(\frac{d^*_{ij,n}}{d_n}\right), \tag{7}$$

where $d^*_{ij,n} = d_{ij,n} + \nu_{ij,n}$ and $\nu_{ij,n}$ denotes the measurement error. Data on economic distances available to econometricians usually contain measurement errors. For example, the economic distance between two countries may be measured by transportation cost in international trade and this inevitably involves some measurement error. Sometimes the economic distance may be estimated from another related model. The underlying estimation error is a special case of measurement error.

Assumption 4. (i) $\{\nu_{ij,n}\}$ are independent of $\{\varepsilon_{\ell,n}\}$. (ii) $\nu_{ij,n} = o(d_n)$ as $d_n \to \infty$. (iii) $n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \|\Gamma_{ij,n}\|\, E|\nu_{ij,n}|^{q_d} < \infty$ for all n.

We allow a measurement error to increase as the distance between two units becomes larger, as long as Assumption 4(ii) and (iii) hold. Under this assumption, it is straightforward that $n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \|\Gamma_{ij,n}\|\, E(d^*_{ij,n})^{q_d} < \infty$ for all n. Essentially, measurement errors in location cannot be so large as to change the summability of $n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \|\Gamma_{ij,n}\|\, E(d^*_{ij,n})^{q_d}$.

Let $\ell_{i,n} = \sum_{j=1}^{n} 1\{d^*_{ij,n} \le d_n\}$ and $\ell_n = n^{-1}\sum_{i=1}^{n} \ell_{i,n}$. If we call unit j a pseudo-neighbor of unit i if $d^*_{ij,n} \le d_n$, then $\ell_{i,n}$ is the number of pseudo-neighbors that unit i has and $\ell_n$ is the average number of pseudo-neighbors. Here we use the terminology "pseudo-neighbor" in order to differentiate it from the common usage of "neighbor" in spatial modeling.

In order to obtain the properties of the estimator in Theorem 1 below, it is important to control the boundary effects. That is, the effects of the units on the boundary should become negligible as the sample size increases so that the asymptotic properties of the estimator depend only on the behavior of the interior units. We define the boundary in terms of the number of pseudo-neighbors. If $E|\ell_{i,n} - E\ell_n| = o(E\ell_n)$, then we say that i is not on the boundary, otherwise it is on the boundary. Let

$$E_n \equiv \{i : E|\ell_{i,n} - E\ell_n| = o(E\ell_n)\}, \qquad n_1 = \sum_{i=1}^{n} 1\{i \in E_n\} \quad\text{and}\quad n_2 = n - n_1.$$
Then $E_n$ represents the set of nonboundary locations and $n_1$ and $n_2$ are the sizes of the nonboundary set and boundary set respectively. A unit i is in the nonboundary set $E_n$ as long as the difference between $\ell_{i,n}$ and $E\ell_n$ does not grow too fast as n increases. The size of the boundary depends on the choice of the bandwidth. We can mitigate the boundary effects by raising $d_n$ slowly as n increases. If $n_2/n$ is o(1), the boundary effect is asymptotically negligible. When the units are regularly spaced on a lattice in $\mathbb{R}^2$, this condition is satisfied if $E\ell_n/n = o(1)$. We maintain the following assumption on the number of pseudo-neighbors.
Assumption 5. For $i \in E_n$, $\ell_{i,n} \le C\,E\ell_n$ for some constant C.

Assumption 5 is very weak as C can be a large constant. This assumption rules out the case in which the units are concentrated in some limited area while other areas are sparsely populated.

Assumption 6. For $i \in E_n$,

$$\lim_{n\to\infty} \mathrm{var}\left(\frac{1}{\sqrt{E\ell_n}} \sum_{j:\, d^*_{ij,n} \le d_n} V_{j,n}\right) = g,$$

where $g \equiv \lim_{n\to\infty} J_n = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \Gamma_{ij,n}$.

In this assumption, $(1/\sqrt{E\ell_n})\sum_{j:\, d^*_{ij,n} \le d_n} V_{j,n}$ can be regarded as a local version of the (scaled) global average $(1/\sqrt{n})\sum_{j=1}^{n} V_{j,n}$. Assumption 6 states that the asymptotic variance of each local average (around an interior point) is the same as that of the global average. There are many copies of local averages. Assumption 6 amounts to stating that there are multiple (noisy) observations of the quantity that we want to estimate. From this perspective, Assumption 6 is likely to be the weakest possible assumption we have to maintain. It is similar to the homogeneity assumption in Bester et al. (2009), which assumes that the covariance matrix of each group converges to the same limit.

Assumption 6 is related to but weaker than covariance stationarity. It is implied by covariance stationarity but it can hold even though covariance stationarity is violated. As an example, consider a nonstationary spatial process on a regular lattice in $\mathbb{R}^2$ represented by

$$S(x_1, x_2) = \int_0^{2\pi}\!\!\int_0^{2\pi} \exp(i\omega_1 x_1 + i\omega_2 x_2)\, f_{(x_1,x_2)}(\omega_1, \omega_2)\, dZ(\omega_1, \omega_2)$$

where $f_{(x_1,x_2)}(\omega_1, \omega_2)$ is the local spectral density function at location $(x_1, x_2)$, and $Z(\omega_1, \omega_2)$ is a stochastic process with independent increments. The dependence of $f_{(x_1,x_2)}(\omega_1, \omega_2)$ on $(x_1, x_2)$ induces nonstationarity. For this type of nonstationary process, Assumption 6 amounts to assuming that $f_{(x_1,x_2)}(0, 0)$ does not depend on $(x_1, x_2)$, which is much weaker than assuming that $f_{(x_1,x_2)}(\omega_1, \omega_2)$ does not depend on $(x_1, x_2)$ for all $(\omega_1, \omega_2)$.

In Assumptions 3–6, we consider the case of a single distance measure. Our framework can be easily extended to allow for multiple distance measures. As in KP, suppose we have M distance measures $d^*_{ij,m,n}$ for m = 1, 2, ..., M, each of which is possibly error-ridden. If one of the distance measures, say $d_{ij,1,n}$, and its associated measurement error satisfy Assumptions 3–6, then

$$n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \|\Gamma_{ij,n}\|\, E(d^*_{ij,1,n})^{q_d} < \infty.$$

Let $d^*_{ij,\min} = \min_m d^*_{ij,m,n}$, then

$$n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \|\Gamma_{ij,n}\|\, E(d^*_{ij,\min})^{q_d} < \infty.$$

Hence we can use $d^*_{ij,\min}$ as the "aggregate distance measure" and construct the HAC estimator based on $d^*_{ij,\min}$. More specifically, our HAC estimator is

$$\hat J_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \hat V_{i,n}\hat V_{j,n}'\,K\!\left(\frac{d^*_{ij,\min}}{d_n}\right). \tag{8}$$

All our results remain valid as $d^*_{ij,\min}$ is a single distance measure and it is not hard to show that Assumptions 3–6 hold with $d^*_{ij,\min}$.

3. Asymptotic properties of spatial HAC estimators

This section presents the consistency conditions, the rate of convergence, and the asymptotic truncated MSE of the fixed bandwidth kernel spatial HAC estimator. We begin by introducing the assumption on the kernel used in the spatial HAC estimator.
Assumption 7. (i) The kernel $K : \mathbb{R} \to [-1, 1]$ satisfies K(0) = 1, K(x) = K(−x), K(x) = 0 for |x| ≥ 1. (ii) For all $x_1, x_2 \in \mathbb{R}$ there is a constant $c_L < \infty$ such that $|K(x_1) - K(x_2)| \le c_L |x_1 - x_2|$. (iii) $(E\ell_n)^{-1} E \sum_{j=1}^{n} K^2\!\left(d^*_{ij,n}/d_n\right) \to \bar K$ for all i.

Examples of kernels which satisfy Assumption 7(i) and (ii) are the Bartlett, Tukey-Hanning and Parzen kernels. The quadratic spectral (QS) kernel does not satisfy Assumption 7(i) because it does not truncate. We may generalize our results to include the QS kernel but this requires a considerable amount of work. Assumption 7(iii) is more of an assumption on the distribution of the units. When the units lie on a 2-dimensional lattice and $d^*_{ij,n}$ is the Euclidean distance, we have

$$\bar K = \frac{1}{\pi}\int_{-1}^{1}\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} K^2\!\left(\sqrt{x^2 + y^2}\right) dy\, dx = 2\int_0^1 r K^2(r)\, dr.$$

In finite samples, we may use

$$\bar K_n = (n\ell_n)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} K^2\!\left(\frac{d^*_{ij,n}}{d_n}\right)$$

for $\bar K$.

As positive semi-definiteness is a desirable property of $\hat J_n$, it is important to examine the class of kernel functions that yields a psd covariance estimator. If we map a set of observations into a lattice and assign zero at a grid point where an observation is missing, then $\hat J_n$ is numerically equivalent to the weighted average of the periodogram. Therefore, even though the spatial process of $V_{i,n}$ is not covariance stationary, a kernel with a positive Fourier transform guarantees the positive semi-definiteness of $\hat J_n$, as in the case of spectral density estimation.

The asymptotic variance of $\hat J_n$ depends on g, the limit value of $J_n$, and the asymptotic bias of $\hat J_n$ is determined by the smoothness of the kernel at zero and the rate of decay of the spatial dependence as a function of the distance. Define

$$K_{q_0} = \lim_{x\to 0} \frac{1 - K(x)}{|x|^{q_0}}, \quad \text{for } q_0 \in [0, \infty)$$

and let $q = \max\{q_0 : K_{q_0} < \infty\}$ be the Parzen characteristic exponent of K(x). The magnitude of q reflects the smoothness of K(x) at x = 0. We assume $q \le q_d$ throughout the paper. Let

$$g_n^{(q)} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \Gamma_{ij,n}\, E(d^*_{ij,n})^q, \qquad g^{(q)} = \lim_{n\to\infty} g_n^{(q)} \equiv \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \Gamma_{ij,n}\, E(d^*_{ij,n})^q.$$
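As an illustration of the estimator in (7) and of the finite-sample quantities $\ell_n$ and $\bar K_n$, the following sketch uses the Bartlett kernel, which satisfies Assumption 7(i) and (ii). The function names, the 10 × 10 grid, the bandwidth value and the simulated moment vectors are assumptions made for the example; they are not taken from the paper.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel: K(x) = 1 - |x| for |x| <= 1 and 0 otherwise."""
    return np.clip(1.0 - np.abs(x), 0.0, None)

def pairwise_distance(coords):
    """Euclidean distances between all pairs of locations (n x n)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def spatial_hac(V, D, d_n, kernel=bartlett):
    """J_hat = (1/n) sum_i sum_j V_i V_j' K(d_ij / d_n); V has rows V_i'."""
    n = V.shape[0]
    Kmat = kernel(D / d_n)          # n x n matrix of kernel weights
    return V.T @ Kmat @ V / n

def neighbor_summary(D, d_n, kernel=bartlett):
    """Average pseudo-neighbor count l_n and the finite-sample K_bar_n."""
    n = D.shape[0]
    ell_i = (D <= d_n).sum(axis=1)  # counts j with d_ij <= d_n (j = i included)
    ell_n = ell_i.mean()
    K_bar_n = (kernel(D / d_n) ** 2).sum() / (n * ell_n)
    return ell_n, K_bar_n

# Illustrative data: units on a 10 x 10 integer grid, two moment conditions.
m = 10
a, b = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
coords = np.column_stack([a.ravel(), b.ravel()]).astype(float)
D = pairwise_distance(coords)
rng = np.random.default_rng(0)
V = rng.normal(size=(m * m, 2))

J_hat = spatial_hac(V, D, d_n=3.0)
ell_n, K_bar_n = neighbor_summary(D, d_n=3.0)
```

With a bandwidth below the smallest inter-unit distance, every off-diagonal weight is zero and the estimator collapses to the heteroskedasticity-robust form $n^{-1}\sum_i \hat V_{i,n}\hat V_{i,n}'$, which gives a quick check of an implementation.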
Next we introduce additional assumptions required to obtain the asymptotic properties of Jˆn .
Assumption 8. (i) $\sqrt{n}(\hat\theta - \theta_0) = O_p(1)$. (ii) $\sup_i E \sup_{\theta\in\Theta_n} \|V_{i,n}(\theta)\|^2 < \infty$ where $\Theta_n$ is a small neighborhood around $\theta_0$. (iii) $\sup_i E \sup_{\theta\in\Theta_n} \|\frac{\partial}{\partial\theta'} V_{i,n}(\theta)\|^2 < \infty$. (iv) For r = 1, ..., p, $\sup_i E \sup_{\theta\in\Theta_n} \|\frac{\partial^2}{\partial\theta\,\partial\theta'} V^{(r)}_{i,n}(\theta)\|^2 < \infty$. (v) $\sup_{i,a,b} \sum_{j=1}^{\infty} \|E\, V^{(r)}_{j,n} V^{(r)}_{b,n}\, \frac{\partial}{\partial\theta'} V^{(s)}_{i,n}\, \frac{\partial}{\partial\theta} V^{(s)}_{a,n}\| < \infty$ for r, s = 1, ..., p.

Assumption 8(i) usually holds by the asymptotic normality of parameter estimators. Assumption 8(ii) is implied by Assumptions 1 and 2. Assumption 8(iii)–(v) are trivial in a linear regression case.

We define the MSE criterion as

$$\mathrm{MSE}\!\left(\frac{n}{E\ell_n}, \hat J_n, S\right) = \frac{n}{E\ell_n}\, E\left[\mathrm{vec}(\hat J_n - J_n)'\, S\, \mathrm{vec}(\hat J_n - J_n)\right],$$

where S is some $p^2 \times p^2$ weighting matrix and vec(·) is the column by column vectorization function. We also define $\tilde J_n$ as the pseudo-estimator that is identical to $\hat J_n$ but is based on the true parameter, $\theta_0$, instead of $\hat\theta$. That is,

$$\tilde J_n = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} V_{i,n}V_{j,n}'\,K\!\left(\frac{d^*_{ij,n}}{d_n}\right).$$

Under the assumptions above, the effect of using $\hat\theta$ instead of $\theta_0$ on the asymptotic property is $o_p(1)$ as Theorem 1(c) states below. Therefore, we use $\tilde J_n$ to analyze the asymptotic properties of $\hat J_n$. Notwithstanding, if $\hat\theta$ has an infinite second moment, the underlying estimation error can dominate the MSE criterion. To circumvent the undue influence of $\hat\theta$ on the criterion of performance, we follow Andrews (1991) and replace the MSE criterion with a truncated MSE criterion. We define

$$\mathrm{MSE}_h\!\left(\frac{n}{E\ell_n}, \hat J_n, S_n\right) = E \min\left\{\frac{n}{E\ell_n}\, \mathrm{vec}(\hat J_n - J_n)'\, S_n\, \mathrm{vec}(\hat J_n - J_n),\; h\right\},$$

where $S_n$ is a $p^2 \times p^2$ weighting matrix that may be random. The criterion on which we base the optimality result is the asymptotic truncated MSE, which is defined as

$$\lim_{h\to\infty}\lim_{n\to\infty} \mathrm{MSE}_h\!\left(\frac{n}{E\ell_n}, \hat J_n, S_n\right).$$

This criterion yields the same value as the asymptotic MSE when $\hat\theta$ has well defined moments, but does not diverge to infinity when $\hat\theta$ has infinite second moments.

Assumption 9. (i) $E\varepsilon_{\ell,n}^8 < \infty$. (ii) $S_n \xrightarrow{p} S$ for a positive definite matrix S.

Let tr denote the trace function and $K_{pp}$ the $p^2 \times p^2$ commutation matrix. Under the assumptions above, we have the following theorem.

Theorem 1. Suppose that Assumptions 1–7 hold, $n_2/n \to 0$, $E\ell_n, d_n \to \infty$ and $E\ell_n/n \to 0$.

(a) $\lim_{n\to\infty} \frac{n}{E\ell_n} \mathrm{var}(\mathrm{vec}\, \tilde J_n) = \bar K (I + K_{pp})(g \otimes g)$.

(b) $\lim_{n\to\infty} d_n^q\, (E\tilde J_n - J_n) = -K_q\, g^{(q)}$.

(c) If Assumption 8 holds and $d_n^{2q} E\ell_n / n \to \tau \in (0, \infty)$, then $\sqrt{n/E\ell_n}\,(\hat J_n - J_n) = O_p(1)$ and $\sqrt{n/E\ell_n}\,(\hat J_n - \tilde J_n) = o_p(1)$.

(d) Under the conditions of part (c) and Assumption 9,

$$\lim_{h\to\infty}\lim_{n\to\infty} \mathrm{MSE}_h\!\left(\frac{n}{E\ell_n}, \hat J_n, S_n\right) = \lim_{h\to\infty}\lim_{n\to\infty} \mathrm{MSE}_h\!\left(\frac{n}{E\ell_n}, \tilde J_n, S_n\right) = \lim_{n\to\infty} \mathrm{MSE}\!\left(\frac{n}{E\ell_n}, \tilde J_n, S\right) = \frac{1}{\tau} K_q^2\, (\mathrm{vec}\, g^{(q)})'\, S\, (\mathrm{vec}\, g^{(q)}) + \bar K\, \mathrm{tr}\big(S (I + K_{pp})(g \otimes g)\big).$$

Proofs are given in the Appendix. For each element of $\tilde J_n$, $\lim_{n\to\infty} \frac{n}{E\ell_n} \mathrm{cov}(\tilde J_{rs,n}, \tilde J_{cd,n}) = \bar K (g_{rc} g_{sd} + g_{rd} g_{sc})$ and $\lim_{n\to\infty} d_n^q (E\tilde J_{rs,n} - J_{rs,n}) = -K_q\, g_{rs}^{(q)}$.

Theorem 1(a) and (b) show that the asymptotic variance and bias of $\tilde J_n$ depend on the choice of the bandwidth. When we increase the bandwidth, the bias decreases and the variance increases because $E\ell_n$ increases with $d_n$. The second part of Theorem 1(c) shows that, compared with the variance term in part (a), the effect of using $V_{i,n}(\hat\theta)$ instead of $V_{i,n}(\theta_0)$ in the construction of the spatial HAC estimator is of a smaller order. Therefore, the rate of convergence is obtained by balancing the variance and the squared bias. Accordingly, $E\ell_n = o(n)$ is the condition for the consistency of $\hat J_n$ and its rate of convergence is $\sqrt{E\ell_n/n}$ $(= O(d_n^{-q}))$. If we assume that $E\ell_n = O(d_n^{\eta})$ for some η > 0, then the rate of convergence can be rewritten as $n^{q/(\eta+2q)}$.

The results here are different from those provided by KP. In their paper, the condition for consistency is $E\ell_n = o(n^{\tau})$ where $\tau \le \frac{1}{2}$ and the rate of convergence is $n^{q/(\eta+4q)}$. They obtain this slower rate of convergence by balancing the terms from the estimation error in $\hat\theta$ and the asymptotic bias. Their rate is not the best obtainable because their bound for the estimation error term is too loose.

It is also interesting that the asymptotic properties of the spatial HAC estimator are very similar to those of the time series HAC estimator even though their DGPs and dependence structures are different from each other. Instead of using $d_n$ as the bandwidth parameter, we can also use $E\ell_n$ as the bandwidth parameter. In the time series case, $d_n = 0.5\,E\ell_n$ for interior points. Substituting this relationship into Theorem 1, we obtain the same results as given in Parzen (1957), Hannan (1970) and Andrews (1991).

4. Optimal bandwidth parameter and data dependent bandwidth selection

This section presents a sequence of optimal bandwidth parameters which minimize the asymptotic truncated MSE of $\hat J_n$ and gives a data-driven implementation. We also consider the choice of the weighting matrix $S_n$.
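We obtain the optimal bandwidth parameter directly as a corollary to Theorem 1(d). Let $d_n^{\star}$ be the optimal bandwidth parameter. Then

$$d_n^{\star} = \arg\min_{d_n}\; \frac{1}{d_n^{2q}} K_q^2\, (\mathrm{vec}\, g^{(q)})'\, S_n\, (\mathrm{vec}\, g^{(q)}) + \frac{E\ell_n}{n}\, \bar K\, \mathrm{tr}\big(S_n (I + K_{pp})(g \otimes g)\big). \tag{9}$$

If the relation between $E\ell_n$ and $d_n$ is specified, (9) can be restated in an explicit form. For example, we may assume that $E\ell_n = \alpha_n d_n^{\eta}$ and $\alpha_n = O(1)$ for some η > 0. Then (9) is reduced to:

$$d_n^{\star} = \arg\min_{d_n}\; \frac{1}{d_n^{2q}} K_q^2\, (\mathrm{vec}\, g^{(q)})'\, S_n\, (\mathrm{vec}\, g^{(q)}) + \frac{\alpha_n d_n^{\eta}}{n}\, \bar K\, \mathrm{tr}\big(S_n (I + K_{pp})(g \otimes g)\big) = \left(\frac{q K_q^2\, \kappa(q)\, n}{\alpha_n\, \eta\, \bar K}\right)^{\frac{1}{2q+\eta}} \tag{10}$$

where

$$\kappa(q) = \frac{2\, (\mathrm{vec}\, g^{(q)})'\, S_n\, (\mathrm{vec}\, g^{(q)})}{\mathrm{tr}\big(S_n (I + K_{pp})(g \otimes g)\big)}.$$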
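The closed form in (10) follows from the first-order condition of the criterion in (9) under $E\ell_n = \alpha_n d_n^{\eta}$. The sketch below evaluates (10) and checks it against a brute-force grid minimization of the criterion. All numerical inputs (the scalar values of g and $g^{(q)}$, n = 400, q = 1 with $K_1 = 1$ as for the Bartlett kernel, $\bar K = 1/6$, $\alpha_n = \pi$ and $\eta = 2$) are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def vec(A):
    """Column-by-column vectorization."""
    return A.reshape(-1, order="F")

def commutation_matrix(p):
    """p^2 x p^2 matrix K_pp satisfying K_pp vec(A) = vec(A')."""
    K = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            K[i * p + j, j * p + i] = 1.0
    return K

def variance_trace(g, S):
    """tr(S (I + K_pp)(g kron g))."""
    p = g.shape[0]
    return np.trace(S @ (np.eye(p * p) + commutation_matrix(p)) @ np.kron(g, g))

def kappa(g, g_q, S):
    """kappa(q) as defined below eq. (10)."""
    return 2.0 * (vec(g_q) @ S @ vec(g_q)) / variance_trace(g, S)

def optimal_bandwidth(g, g_q, S, n, q, K_q, K_bar, alpha, eta):
    """Closed-form minimizer in eq. (10)."""
    base = q * K_q ** 2 * kappa(g, g_q, S) * n / (alpha * eta * K_bar)
    return base ** (1.0 / (2 * q + eta))

def criterion(d, g, g_q, S, n, q, K_q, K_bar, alpha, eta):
    """Objective in eq. (9) with E l_n = alpha * d**eta."""
    bias2 = K_q ** 2 * (vec(g_q) @ S @ vec(g_q)) / d ** (2 * q)
    var = (alpha * d ** eta / n) * K_bar * variance_trace(g, S)
    return bias2 + var

# Illustrative scalar case (p = 1); every number here is an assumption.
params = dict(g=np.array([[2.0]]), g_q=np.array([[3.0]]), S=np.array([[1.0]]),
              n=400, q=1, K_q=1.0, K_bar=1.0 / 6.0, alpha=np.pi, eta=2)
d_star = optimal_bandwidth(**params)

# Brute-force check on a fine grid of candidate bandwidths.
grid = np.linspace(0.5, 20.0, 20001)
d_grid = grid[np.argmin(criterion(grid, **params))]
```

The grid minimizer `d_grid` should agree with `d_star` up to the grid spacing, which offers a simple way to validate an implementation of (10) before replacing g and $g^{(q)}$ with estimates.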
Corollary 1. Suppose Assumptions 1–9 hold. Assume that $E\ell_n = \alpha_n d_n^{\eta}$ for some η > 0, $\alpha_n = \alpha + o(1)$. Then, for any sequence of bandwidth parameters $\{d_n\}$ such that $d_n^{2q} E\ell_n / n \to \tau \in (0, \infty)$ as $n \to \infty$, $\{d_n^{\star}\}$ is preferred in the sense that

$$\lim_{h\to\infty}\lim_{n\to\infty} \Big(\mathrm{MSE}_h\big(n^{2q/(2q+\eta)}, \hat J_n(d_n), S_n\big) - \mathrm{MSE}_h\big(n^{2q/(2q+\eta)}, \hat J_n(d_n^{\star}), S_n\big)\Big) \ge 0.$$
The inequality is strict unless $d_n = d_n^{\star} + o(n^{1/(2q+\eta)})$.

In general, η is equal to the dimension of the space. In the time series case, η = 1 while in the two dimensional regular lattice case, η = 2. As a result, the optimal bandwidth $d_n^{\star}$ depends on the dimension of space. Given the nonparametric nature of our estimator, this is not surprising. In contrast, KP suggest using $d_n = c[n^{1/4}]$, which is rate optimal only if q = 1 and η = 2. In general, both the rate and constant are suboptimal.

$d_n^{\star}$ is a function of g and $g^{(q)}$ which are unknown in finite samples. Therefore, the optimal bandwidth $d_n^{\star}$ is not feasible in practice. For this reason, a data dependent estimation procedure is needed for implementation. Among several data dependent bandwidth selection methods, plug-in methods are appropriate in this case because we consider the estimation of $J_n$ at given data. In the plug-in methods, unknown parameters are estimated using a parametric or nonparametric method (e.g. Andrews, 1991; Newey and West, 1987, 1994). The former yields a less variable bandwidth parameter but may introduce an asymptotic bias due to the mis-specification of the parametric model. In contrast, the latter does not require knowledge of the DGP, but it converges more slowly than the former, which causes bandwidth selection to be less reliable. Since the optimal bandwidth involves $g^{(q)}$, a quantity that is very hard to estimate, we focus on the parametric plug-in method in this paper. In fact, the rate of convergence for a nonparametric estimator of $g^{(q)}$ is generally slower than that for g itself.

Fig. 1 presents the percentage increase in MSE relative to the minimum MSE as a function of the bandwidth. The graph is based on the spatial AR(1) process $V_n = \rho W_n V_n + \varepsilon_n$ on a square grid of integers, where $W_n$ is a contiguity matrix whose threshold is $\sqrt{2}$ and $\varepsilon_{i,n} \overset{i.i.d.}{\sim} N(0, 1)$. The sample size is n = 400. As a standard practice, $W_n$ is row-standardized and its diagonal elements are zero. The curve is U-shaped for each ρ and therefore our goal is to choose the bandwidth which is reasonably close to $d_n^{\star}$. As argued by Andrews (1991), good performance of a HAC estimator only requires the automatic bandwidth parameter to be near the optimal bandwidth value and not precisely equal to it.

Fig. 1. Spatial AR(1) process: $V_n = \rho W_n V_n + \varepsilon_n$, n = 400.

The simplest and most popular approximating parametric model is the spatial AR(1) model for $V_n^{(c)}$, c = 1, ..., p. Depending on the correlation structure, spatial MA(q) or spatial ARMA(p, q) models can also be used. As an example, consider the case that $V_n^{(c)}$ follows a spatial AR(1) process of the form:

$$V_n^{(c)} = \rho_c W_n^{(c)} V_n^{(c)} + \tilde\varepsilon_n^{(c)} = (I_n - \rho_c W_n^{(c)})^{-1}\, \tilde\varepsilon_n^{(c)},$$

where $\tilde\varepsilon^{(c)}_{i,n} \overset{i.i.d.}{\sim} (0, \sigma_\varepsilon^2)$ and $W_n^{(c)}$ is a spatial weighting matrix. $W_n^{(c)}$ is determined a priori and by convention it is row-standardized and its diagonal elements are zero. See Anselin (1988). We can estimate $\rho_c$ by quasi-maximum likelihood (QML) or spatial two stage least squares (2SLS) estimators (e.g. Kelejian and Prucha, 1998). In fact, a simple OLS estimator can be used. If the spatial AR(1) model is the true data generating process, then the OLS estimator is inconsistent while the QML and 2SLS estimators are consistent. Since the spatial AR(1) model is likely to be mis-specified, the QML and 2SLS estimators are not necessarily preferred.

Let $\check\varepsilon_n^{(c)} = (I_n - \hat\rho_c W_n^{(c)}) V_n^{(c)}$, $\check\varepsilon_n = (\check\varepsilon_n^{(1)}, \ldots, \check\varepsilon_n^{(p)})$ and $\hat\Sigma = n^{-1}\check\varepsilon_n'\check\varepsilon_n$. Define

$$\hat A_{cd} = \left[\frac{1}{n} \hat V_n^{(c)\prime} (I_n - \hat\rho_c W_n^{(c)})' (I_n - \hat\rho_d W_n^{(d)}) \hat V_n^{(d)}\right] \times \left[(I_n - \hat\rho_c W_n^{(c)})^{-1}\right]\left[(I_n - \hat\rho_d W_n^{(d)})^{-1}\right]' \tag{11}$$

where its (i, j)-th element is denoted by $\hat a_{ij}^{(cd)}$ for i, j = 1, ..., n. Then, we estimate $g_{cd}$ and $g_{cd}^{(q)}$ by

$$\hat g_{cd} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \hat a_{ij}^{(cd)}, \qquad \hat g_{cd}^{(q)} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \hat a_{ij}^{(cd)} (d^*_{ij,n})^q. \tag{12}$$

Consequently, the data dependent bandwidth parameter estimator, $\hat d_n$, based on the spatial AR(1) model is

$$\hat d_n = \arg\min_{d_n}\; \frac{1}{d_n^{2q}} K_q^2\, (\mathrm{vec}\, \hat g^{(q)})'\, S_n\, (\mathrm{vec}\, \hat g^{(q)}) + \bar K\, \frac{\ell_n}{n}\, \mathrm{tr}\big(S_n (I + K_{pp})(\hat g \otimes \hat g)\big). \tag{13}$$
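The plug-in steps in (11)–(13) can be sketched for the scalar case p = 1 with $S_n = 1$, so that $\mathrm{tr}(S_n(I + K_{pp})(\hat g \otimes \hat g)) = 2\hat g^2$. The sketch makes several simplifying assumptions that are choices of this illustration rather than prescriptions of the paper: $\rho$ is estimated by the simple OLS regression of $V_n$ on $W_n V_n$, the Bartlett kernel is used (q = 1, $K_1 = 1$), $\bar K$ is replaced by the finite-sample $\bar K_n$, so that $\bar K_n \ell_n / n = n^{-2}\sum_i\sum_j K^2(d_{ij,n}/d_n)$, and the minimization is done over a grid. The simulated design mirrors the one described for Fig. 1 (a 20 × 20 integer grid with contiguity threshold $\sqrt{2}$), with $\rho = 0.3$ chosen arbitrarily. All function names are hypothetical.

```python
import numpy as np

def grid_coords(m):
    """Locations on an m x m integer grid."""
    a, b = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    return np.column_stack([a.ravel(), b.ravel()]).astype(float)

def pairwise_distance(coords):
    """Euclidean distances between all pairs of locations (n x n)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def contiguity_matrix(D, threshold):
    """Row-standardized contiguity matrix with zero diagonal."""
    W = ((D > 0) & (D <= threshold + 1e-12)).astype(float)
    return W / W.sum(axis=1, keepdims=True)

def plug_in_bandwidth(v, W, D, grid=None):
    """Grid-search version of eq. (13) for p = 1, S_n = 1, Bartlett kernel."""
    n = v.shape[0]
    Wv = W @ v
    rho_hat = (Wv @ v) / (Wv @ Wv)                  # simple OLS estimate of rho
    resid = v - rho_hat * Wv                        # (I - rho_hat W) v
    A_inv = np.linalg.inv(np.eye(n) - rho_hat * W)  # (I - rho_hat W)^{-1}
    A_hat = (resid @ resid / n) * (A_inv @ A_inv.T) # eq. (11) with c = d
    g_hat = A_hat.sum() / n                         # eq. (12)
    g_q_hat = (A_hat * D).sum() / n                 # eq. (12) with q = 1
    if grid is None:
        grid = np.linspace(1.5, 10.0, 86)

    def crit(d):
        K2 = np.clip(1.0 - D / d, 0.0, None) ** 2   # squared Bartlett weights
        bias2 = g_q_hat ** 2 / d ** 2               # K_1 = 1, q = 1
        var = 2.0 * g_hat ** 2 * K2.sum() / n ** 2  # K_bar_n * l_n / n * 2 g^2
        return bias2 + var

    values = np.array([crit(d) for d in grid])
    return grid[values.argmin()], rho_hat, g_hat, g_q_hat

# Simulated spatial AR(1) data on a 20 x 20 grid (n = 400).
m = 20
D = pairwise_distance(grid_coords(m))
W = contiguity_matrix(D, np.sqrt(2.0))
n = m * m
rng = np.random.default_rng(0)
v = np.linalg.solve(np.eye(n) - 0.3 * W, rng.normal(size=n))

d_hat, rho_hat, g_hat, g_q_hat = plug_in_bandwidth(v, W, D)
```

A grid search is used here instead of the closed form in (10) because the realized $\ell_n$ enters (13) directly and no functional relation between $\ell_n$ and $d_n$ is imposed. A QML or 2SLS estimate of $\rho$ could be substituted for the OLS step without changing the rest of the sketch.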
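For spatial MA(1) and spatial ARMA(1, 1) models, (11) is restated as

\[ \hat A_{cd} = \left[ \frac{1}{n} \hat V_n^{(c)\prime} (I_n + \hat\lambda_c M_n^{(c)\prime})^{-1} (I_n + \hat\lambda_d M_n^{(d)})^{-1} \hat V_n^{(d)} \right] \times (I_n + \hat\lambda_c M_n^{(c)}) (I_n + \hat\lambda_d M_n^{(d)\prime}) \]

and

\[ \hat A_{cd} = \left[ \frac{1}{n} \hat V_n^{(c)\prime} (I_n - \hat\rho_c W_n^{(c)\prime}) (I_n + \hat\lambda_c M_n^{(c)\prime})^{-1} (I_n + \hat\lambda_d M_n^{(d)})^{-1} (I_n - \hat\rho_d W_n^{(d)}) \hat V_n^{(d)} \right] \times (I_n - \hat\rho_c W_n^{(c)})^{-1} (I_n + \hat\lambda_c M_n^{(c)}) \times (I_n + \hat\lambda_d M_n^{(d)\prime}) (I_n - \hat\rho_d W_n^{(d)\prime})^{-1} \]

respectively. $\lambda_c$ and $M_n^{(c)}$ are the coefficient and the n × n weighting matrix for the spatial MA component. Extension to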
M.S. Kim, Y. Sun / Journal of Econometrics 160 (2011) 349–371
spatial AR(p), spatial MA(q), spatial ARMA(p, q) models for p, q ≥ 2 is straightforward.

The choice of the weighting matrix $S_n$ is another important problem. A traditional choice suggested by Andrews (1991) is

\[ \hat S_n = (\hat B_n \otimes \hat B_n)' \tilde S (\hat B_n \otimes \hat B_n)', \]

where $\tilde S$ is an $r^2 \times r^2$ diagonal weighting matrix. For this choice of $\hat S_n$, the asymptotic truncated MSE criterion reduces to the asymptotic truncated MSE of $\hat B_n \hat J_n \hat B_n'$ with weighting matrix $\tilde S$ provided that $\hat B_n - B_n = o_p(E\ell_n/n)$. When $\tilde S$ is an identity matrix, we obtain the MSE of the sum of the elements in $\hat B_n \hat J_n \hat B_n'$.

While $\hat S_n$ is consistent with the objective we are interested in, as Politis (2007) points out, it yields a single optimal bandwidth for estimating all elements of a covariance matrix but each element has its own individual optimal bandwidth. In particular, the cost of using a single optimal bandwidth increases when the process $V_n^{(s)}$ is significantly different for different s. This is typical in a spatial context. Considering this, we propose using different weighting matrices for different elements of the covariance matrix when $V_n$ has a heterogeneous dependence structure. Let $S_{rs,n}$ denote the weighting matrix for estimating $\hat J_{rs,n}$. Then, a natural choice of $S_{rs,n}$ is the diagonal matrix in which the element corresponding to $\hat J_{rs,n}$ is 1 and others are zero. We can also choose the weighting matrix such that the asymptotic truncated MSE criterion reduces to the asymptotic truncated MSE of a subvector of the parameter estimator $\hat\theta$. One concern with this method is that it does not guarantee $\hat J_n$ to be psd, which is often regarded as a desirable property of $\hat J_n$. However, we can attain positive semi-definiteness with a simple modification suggested by Politis (2007). As $\hat J_n$ is symmetric, $\hat J_n(\hat d_n) = \hat U \hat\Lambda \hat U'$, where $\hat U$ is an orthogonal matrix and $\hat\Lambda = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_p)$ is a diagonal matrix whose diagonal elements are the eigenvalues of $\hat J_n$. Let $\hat\Lambda^+ = \mathrm{diag}(\hat\lambda_1^+, \ldots, \hat\lambda_p^+)$ where $\hat\lambda_s^+ = \max(\hat\lambda_s, 0)$. Then, we define our modified estimator as

\[ \hat J_n(\hat d_n)^+ = \hat U \hat\Lambda^+ \hat U'. \]

As each eigenvalue of $\hat J_n(\hat d_n)^+$ is nonnegative, it is psd. Theorem 4.1 in Politis (2007) shows that $\hat J_n(\hat d_n)^+$ converges to $J_n$ at the same rate as $\hat J_n(\hat d_n)$. In fact, it is not hard to show that the truncated AMSE of $\hat J_n(\hat d_n)^+$ is smaller than that of $\hat J_n$.

5. Properties of data dependent bandwidth parameter estimators

In this section, we consider the consistency condition, rate of convergence, and asymptotic truncated MSE of spatial HAC estimators with the data dependent bandwidth parameter estimator. Let

\[ \ddot g_{cd} = \operatorname*{plim}_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \hat a_{ij}^{(cd)}, \qquad \ddot g_{cd}^{(q)} = \operatorname*{plim}_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \hat a_{ij}^{(cd)} E (d_{ij,n}^{*})^{q} \]

be the probability limits of $\hat g_{cd}$ and $\hat g_{cd}^{(q)}$ respectively. Define

\[ \ddot d_n = \arg\min_{d_n} \left\{ \frac{1}{d_n^{2q}} K_q^2 (\mathrm{vec}\,\ddot g^{(q)})' S_n (\mathrm{vec}\,\ddot g^{(q)}) + \bar K \frac{E\ell_n}{n} \mathrm{tr}\big(S_n (I + K_{pp})(\ddot g \otimes \ddot g)\big) \right\}. \tag{14} \]

We study the properties of $\hat J_n(\hat d_n)$ by investigating $\hat J_n(\ddot d_n)$ because the asymptotic properties of $\hat J_n(\hat d_n)$ are equivalent to those of $\hat J_n(\ddot d_n)$ as stated in Theorem 2 below. For Theorem 2, we introduce the following assumption.

Assumption 10. $\sqrt{n}\,\big(\hat d_n / \ddot d_n - 1\big) = O_p(1)$.

Since $\ddot d_n$ is defined by replacing $\ell_n$, $\hat g_{cd}$ and $\hat g_{cd}^{(q)}$ with $E\ell_n$, the probability limits of $\hat g_{cd}$ and $\hat g_{cd}^{(q)}$ in the definition of $\hat d_n$, the assumption holds if $\ell_n$, $\hat g_{cd}$ and $\hat g_{cd}^{(q)}$ converge to $E\ell_n$, $\ddot g_{cd}$ and $\ddot g_{cd}^{(q)}$ respectively at the parametric rate. This is a rather weak assumption.

Let $\hat\ell_{i,n} = \sum_{j=1}^{n} 1\{d_{ij,n}^{*} \le \hat d_n\}$, $\ddot\ell_{i,n} = \sum_{j=1}^{n} 1\{d_{ij,n}^{*} \le \ddot d_n\}$, $\hat\ell_n = n^{-1} \sum_{i=1}^{n} \hat\ell_{i,n}$ and $\ddot\ell_n = n^{-1} \sum_{i=1}^{n} \ddot\ell_{i,n}$. The next theorem summarizes the properties of the spatial HAC estimator with $\hat d_n$.

Theorem 2. Suppose Assumptions 1–10 hold.

(a) $\sqrt{n / E\ddot\ell_n}\,\big(\hat J_n(\hat d_n) - J_n\big) = O_p(1)$ and $\sqrt{n / E\ddot\ell_n}\,\big(\hat J_n(\hat d_n) - \hat J_n(\ddot d_n)\big) = o_p(1)$.

(b) Let $\ddot\tau = \lim_{n\to\infty} \ddot d_n^{2q} E\ddot\ell_n / n$. Then,

\[ \lim_{h\to\infty} \lim_{n\to\infty} \mathrm{MSE}_h\!\left( \frac{n}{E\ddot\ell_n}, \hat J_n(\hat d_n), S_n \right) = \lim_{h\to\infty} \lim_{n\to\infty} \mathrm{MSE}_h\!\left( \frac{n}{E\ddot\ell_n}, \hat J_n(\ddot d_n), S_n \right) = \frac{1}{\ddot\tau} K_q^2 (\mathrm{vec}\, g^{(q)})' S (\mathrm{vec}\, g^{(q)}) + \bar K\, \mathrm{tr}\big(S (I + K_{pp})(g \otimes g)\big). \]

Proofs are given in the Appendix. Theorem 2(a) implies that $\hat J_n(\hat d_n) - J_n \to_p 0$ as long as $E\ddot\ell_n = o(n)$ and $\hat J_n(\hat d_n)$ and $\hat J_n(\ddot d_n)$ have the same asymptotic properties. If the approximating parametric model is correct, that is, $\hat g \to_p g$ and $\hat g^{(q)} \to_p g^{(q)}$, $\{\hat d_n\}$ has some optimality properties as a result of Theorem 1(d) and Corollary 1.

Corollary 2. Suppose Assumptions 1–10 hold. Assume that $E\ell_n = \alpha_n d_n^{\eta}$ for some η > 0 and $\alpha_n = \alpha + o(1)$. Then for any sequence of data dependent bandwidth estimators $\{\dot d_n\}$ such that for some fixed sequence, $\{d_n\}$, which satisfies $d_n^{2q} E\ell_n / n \to \tau \in (0, \infty)$ we have

\[ \lim_{h\to\infty} \lim_{n\to\infty} \Big( \mathrm{MSE}_h\big(n^{2q/(2q+\eta)}, \hat J_n(\dot d_n), S_n\big) - \mathrm{MSE}_h\big(n^{2q/(2q+\eta)}, \hat J_n(d_n), S_n\big) \Big) = 0, \]

$\hat d_n$ is preferred in the sense that

\[ \lim_{h\to\infty} \lim_{n\to\infty} \Big( \mathrm{MSE}_h\big(n^{2q/(2q+\eta)}, \hat J_n(\dot d_n), S_n\big) - \mathrm{MSE}_h\big(n^{2q/(2q+\eta)}, \hat J_n(\hat d_n), S_n\big) \Big) \ge 0. \]

The inequality is strict unless $d_n = d_n^{\star} + o(n^{1/(2q+\eta)})$.

6. Monte Carlo simulation
In this section, we study the properties of the spatial HAC estimator with Monte Carlo simulation. First, we compare the performance of the spatial HAC estimator based on dˆ n with other bandwidth selection procedures and the heteroskedasticity robust covariance estimator of White (1980). We evaluate them using the MSE criterion and the coverage accuracy of the associated CIs. Second, we examine the robustness of our bandwidth choice procedure to the mis-specification in the spatial weighting matrices and the approximating parametric model. We also examine its robustness to the presence of measurement errors in distance. Third, for studentized tests, we compare the normal
approximation with some naive bootstrap approximations. Fourth, we evaluate the performance of the spatial HAC estimator with bandwidth parameter $\hat d_n$ when the units are distributed irregularly on the lattice. Finally, we use different weighting matrices in the MSE criterion and evaluate the effect of the resulting bandwidth choice on the MSE of a standard error estimator.

The data generating process we consider here is

\[ y_n = X_n \theta_0 + u_n, \tag{15} \]

\[ u_n = \rho_0 W_{0n} u_n + \varepsilon_n, \qquad |\rho_0| < 1, \tag{16} \]

with $\varepsilon_{i,n} \overset{i.i.d.}{\sim} N(0, 1)$. We assume a lattice structure, in which each unit is located on a square grid of integers. $W_{0n}$ is a contiguity matrix and units i and j are neighbors if $d_{ij,n} \le \sqrt{2}$. Following convention, it is row-standardized and its diagonal elements are zero. We consider three different sizes of lattices, 20 × 20 (n = 300, 400), 25 × 25 (n = 400) and 32 × 32 (n = 1024). The ranges of $d_n$ we consider are from 1 to 27 for the 20 × 20 lattice, from 1 to 34 for the 25 × 25 lattice and from 1 to 44 for the 32 × 32 lattice. We use a location model in the first part and a univariate regression model in the second part. The estimand of interest is the covariance matrix of $\sqrt{n}(\hat\theta - \theta_0)$. We use the Parzen kernel, which is defined as follows:

\[ K(x) = \begin{cases} 1 - 6x^2 + 6|x|^3, & \text{for } 0 \le |x| \le 1/2, \\ 2(1 - |x|)^3, & \text{for } 1/2 \le |x| \le 1, \\ 0, & \text{otherwise.} \end{cases} \]
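The simulation design above can be sketched in a few lines of plain Python. The block below builds the row-standardized contiguity weights on a small lattice, draws spatial AR(1) errors as in (16), and evaluates a kernel-weighted variance estimate for the location model with the Parzen kernel. Several points are assumptions made for illustration only: the estimator is written in the usual kernel form (1/n) Σ_i Σ_j K(d_ij/d_n) û_i û_j, the linear system u = ρWu + ε is solved by fixed-point iteration rather than matrix inversion, and the 10 x 10 lattice, the seed, the bandwidth and all function names are hypothetical.

```python
import math
import random


def parzen(x):
    """Parzen kernel as defined above."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def lattice_distances(side):
    """Euclidean distances between all points of a side x side integer grid."""
    pts = [(r, c) for r in range(side) for c in range(side)]
    return [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in pts] for a in pts]


def contiguity_neighbors(dist, threshold):
    """Row-standardized contiguity weights with zero diagonal, as lists of (j, w_ij)."""
    rows = []
    for i, row in enumerate(dist):
        nbrs = [j for j, x in enumerate(row) if j != i and x <= threshold + 1e-9]
        rows.append([(j, 1.0 / len(nbrs)) for j in nbrs])
    return rows


def simulate_sar1(weights, rho, rng, iters=200):
    """Draw eps ~ N(0, 1) and solve u = rho * W u + eps by fixed-point iteration.

    The iteration converges for |rho| < 1 because W is row-standardized.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in weights]
    u = eps[:]
    for _ in range(iters):
        u = [rho * sum(w * u[j] for j, w in row) + e for row, e in zip(weights, eps)]
    return u, eps


def shac_location(u_hat, dist, d_n):
    """Kernel-weighted sum (1/n) * sum_ij K(d_ij / d_n) * u_i * u_j."""
    n = len(u_hat)
    total = 0.0
    for i in range(n):
        for j in range(n):
            k = parzen(dist[i][j] / d_n)
            if k:
                total += k * u_hat[i] * u_hat[j]
    return total / n


rng = random.Random(0)
dist = lattice_distances(10)
W = contiguity_neighbors(dist, math.sqrt(2.0))   # neighbors if d_ij <= sqrt(2)
u, eps = simulate_sar1(W, rho=0.5, rng=rng)
y = [1.0 + ui for ui in u]                       # location model with theta_0 = 1
theta_hat = sum(y) / len(y)
u_hat = [yi - theta_hat for yi in y]
print("kernel-weighted variance estimate with d_n = 4:", shac_location(u_hat, dist, 4.0))
```

With a bandwidth below the smallest inter-unit distance only the i = j terms survive, so the estimate collapses to the average squared residual.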
6.1. Location model

For the location model, model (15) reduces to $y_{i,n} = \theta_0 + u_{i,n}$. Without loss of generality, we set $\theta_0 = 1$. A natural estimator of $\theta_0$ is $\hat\theta = n^{-1} \sum_{i=1}^{n} y_{i,n}$ and $\hat u_n = y_n - \hat\theta$. We use the spatial AR(1) as the approximating parametric model. The concentrated log-likelihood function for the spatial AR(1) process is

\[ \log L(\hat u_n \mid \rho) = -\frac{n}{2} \log\big[ (\hat u_n - \rho W_n \hat u_n)' (\hat u_n - \rho W_n \hat u_n) \big] + \log |I_n - \rho W_n| + \text{const}. \]

See Lee (2004). For a given spatial weighting matrix $W_n$, we estimate ρ by the QML method, that is

\[ \hat\rho = \hat\rho(W_n) = \arg\max_{\rho} \log L(\hat u_n \mid \rho). \]
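The QML step can be illustrated with a small, self-contained sketch. It evaluates the concentrated log-likelihood above (dropping the constant) and maximizes it over a grid of ρ values. The dense-matrix representation, the Gaussian-elimination log-determinant, the grid search, the ring-shaped weighting matrix and the residual values are all assumptions chosen to keep the example short; they are not the paper's implementation.

```python
import math


def log_abs_det(A):
    """log|det A| by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] for row in A]
    total = 0.0
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[piv][k] == 0.0:
            return float("-inf")          # singular matrix
        M[k], M[piv] = M[piv], M[k]
        total += math.log(abs(M[k][k]))
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            if f:
                for c in range(k, n):
                    M[r][c] -= f * M[k][c]
    return total


def concentrated_loglik(rho, u_hat, W):
    """-(n/2) log[(u - rho W u)'(u - rho W u)] + log|I - rho W|, constant omitted."""
    n = len(u_hat)
    resid = [u_hat[i] - rho * sum(W[i][j] * u_hat[j] for j in range(n)) for i in range(n)]
    ssr = sum(e * e for e in resid)
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]
    return -0.5 * n * math.log(ssr) + log_abs_det(A)


def qml_rho(u_hat, W, grid):
    """Grid point in (-1, 1) that maximizes the concentrated log-likelihood."""
    return max(grid, key=lambda r: concentrated_loglik(r, u_hat, W))


# Hypothetical example: 8 units on a ring, each with two neighbors (row-standardized).
n = 8
W = [[0.5 if (i - j) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
u_hat = [0.9, 1.1, 0.4, -0.2, -1.0, -0.8, -0.3, -0.1]
grid = [k / 20.0 for k in range(-19, 20)]     # -0.95, -0.90, ..., 0.95
rho_hat = qml_rho(u_hat, W, grid)
print("rho_hat:", rho_hat)
```

For the sample sizes used in the simulations a sparse or eigenvalue-based log-determinant would be far cheaper than dense elimination; the dense version is used here only for transparency.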
Depending on the choice of $W_n$, we obtain a different $\hat\rho$ and hence a different bandwidth parameter $\hat d_n(W_n)$ from Eq. (13). To find $\hat d_n$, we search the minimizer numerically instead of using the plug-in version of (10). In our simulation experiment, we take $W_n$ to be the contiguity matrix in which units i and j are neighbors if $d_{ij,n} \le D$, a threshold parameter. We consider three values for the threshold: $D = 1, \sqrt{2}, 2$, leading to three bandwidth choices $\hat d_n^{(\ell)}$, $\hat d_n$ and $\hat d_n^{(h)}$. Note that when $D = \sqrt{2}$, the spatial weighting matrix is equal to the true spatial weighting matrix $W_{0n}$. We also consider the case with measurement errors in distance. When $d_{ij,n} > 1$, we take $P(\nu_{ij,n} = -1) = P(\nu_{ij,n} = 0) = P(\nu_{ij,n} = 1) = 1/3$. We use the contiguity matrix as the weighting matrix and take the threshold parameter to be $\sqrt{2}$. This gives us the data driven bandwidth estimator $\hat d_n^{(e)}$. KP suggest taking $d_n = c[n^{1/4}]$, where c is a constant and [z] denotes the largest integer that is less than or equal to z and they use $[n^{1/4}]$ in their simulation. We compare the performances of $\hat d_n$, $\hat d_n^{(\ell)}$, $\hat d_n^{(h)}$ and $\hat d_n^{(e)}$ with that of $d_n^F = [n^{1/4}]$.¹

Table 1
Ratio of the MSE of spatial HAC estimators with different bandwidths to the MSE of the spatial HAC estimator with finite sample optimal bandwidth, $\tilde d_n$.

| Bandwidth | ρ = −0.5 | ρ = −0.3 | ρ = 0 | ρ = 0.3 | ρ = 0.5 | ρ = 0.7 | ρ = 0.9 | ρ = 0.95 |
|---|---|---|---|---|---|---|---|---|
| $\hat d_n$ | 1.08 (6.2) | 1.16 (5.4) | 5.02 (3.5) | 1.18 (6.5) | 1.14 (8.8) | 1.12 (11.7) | 1.07 (17.9) | 1.06 (22.2) |
| $\hat d_n^{(l)}$ | 2.22 (4.3) | 2.08 (3.8) | 2.93 (2.8) | 1.17 (4.6) | 1.09 (6.2) | 1.05 (8.2) | 1.02 (12.5) | 1.01 (15.5) |
| $\hat d_n^{(h)}$ | 1.14 (7.1) | 1.27 (6.2) | 7.46 (4.4) | 1.37 (7.6) | 1.34 (10.5) | 1.30 (14.4) | 1.21 (23.1) | 1.13 (26.5) |
| $\hat d_n^{(e)}$ | 4.92 (4.1) | 1.49 (4.5) | 7.40 (4.0) | 1.47 (7.8) | 1.42 (10.8) | 1.34 (14.7) | 1.21 (23.1) | 1.13 (26.4) |
| $d_n^F$ | 2.41 | 1.40 | 4.43 | 1.20 | 1.68 | 2.07 | 1.99 | 1.73 |
| $d_n^{\star}$ | 1.00 (6.2) | 1.00 (5.5) | 1.00 (1.0) | 1.10 (6.7) | 1.10 (8.9) | 1.09 (11.8) | 1.07 (18.4) | 1.07 (23.2) |
| $\tilde d_n$ | (6.3) | (5.3) | (1.3) | (5.4) | (7.1) | (9.2) | (13.7) | (16.6) |

Notes: (1) Sample size n = 400. (2) $d_n^F = 4$ when n = 400. (3) The number in parentheses represents the average value of bandwidth choice.

We also include the heteroskedasticity consistent estimator of White (1980, 1984) in our comparison. The estimator is defined as

\[ \mathrm{INID} = \frac{1}{n-1} \sum_{i=1}^{n} \hat u_{i,n}^2, \]
which can be regarded as the SHAC estimator with bandwidth set at 0.

Table 1 presents the ratio of the MSE of the spatial HAC estimator with different bandwidth choices to the MSE of the spatial HAC estimator with the infeasible finite sample optimal bandwidth $\tilde d_n$. It also reports the average bandwidth choice in each scenario. When ρ is high, the ratio is usually less than 1.20 even with incorrect $W_n$ and measurement errors. When ρ is negative, the ratio with $\hat d_n^{(\ell)}$ is larger than 2.0. Table 1 also illustrates how mis-specification in the spatial weighting matrix affects our choice of bandwidth. If $W_n$ includes fewer units as neighbors, the bandwidth estimator tends to be smaller than the one with correct $W_n$. In contrast, if $W_n$ includes more units as neighbors, the bandwidth estimator tends to be larger. This coincides with our intuition: if we think we have a larger neighborhood, we need to choose a larger bandwidth to reflect the dependence structure.

Table 1 also presents the performance of the spatial HAC estimator with measurement errors. The effects of measurement errors are related to the mis-specification of $W_n$. For a given bandwidth parameter, positive measurement errors lead to a smaller number of neighbors and vice versa. In contrast to the mis-specification of $W_n$, however, measurement errors differ across individuals. Table 1 shows that the estimator contaminated by measurement errors performs very poorly compared to other estimators when ρ = −0.5, while it performs reasonably well when ρ is positive.

Table 1 also compares $\hat d_n$ with $d_n^F$. As $d_n^F$ depends only on the sample size, it is invariant to the spatial dependence. Thus, it performs relatively well when it is close to $\tilde d_n$ (e.g. ρ = −0.3 and 0.3) but it is inferior to $\hat d_n$ in most scenarios.

Table 2 provides the bias, variance and MSE of the spatial HAC estimators with different bandwidth selections and those of INID. We use SHAC0, SHACl, SHACh, SHACe and SHACF to denote the spatial HAC estimators with $\hat d_n$, $\hat d_n^{(l)}$, $\hat d_n^{(h)}$, $\hat d_n^{(e)}$ and $d_n^F$ respectively. We can see that SHAC0 is reasonably accurate in general but that it suffers from severe underestimation when ρ is extremely high.
¹ F denotes "fixed."
Table 2
Bias, variance, and MSE of spatial HAC estimators with different bandwidth choices in a location model with spatial AR(1) error. Numbers in parentheses in the MSE columns are RMSE/true value.

| ρ | Estimator | Bias (n = 400) | Bias (n = 1024) | Variance (n = 400) | Variance (n = 1024) | MSE (n = 400) | MSE (n = 1024) |
|---|---|---|---|---|---|---|---|
| 0 | SHAC0 | −0.014 | −0.007 | 0.023 | 0.011 | 0.023 (0.15) | 0.011 (0.10) |
| 0 | SHACl | −0.010 | −0.004 | 0.013 | 0.006 | 0.013 (0.12) | 0.006 (0.08) |
| 0 | SHACh | −0.025 | −0.012 | 0.033 | 0.016 | 0.034 (0.18) | 0.016 (0.13) |
| 0 | SHACe | −0.021 | −0.009 | 0.034 | 0.016 | 0.034 (0.19) | 0.016 (0.13) |
| 0 | SHACF | −0.018 | −0.020 | 0.020 | 0.019 | 0.020 (0.14) | 0.020 (0.14) |
| 0 | INID | −0.001 | 0.001 | 0.005 | 0.002 | 0.005 (0.07) | 0.002 (0.04) |
| 0.3 | SHAC0 | −0.337 | −0.255 | 0.211 | 0.123 | 0.324 (0.28) | 0.188 (0.21) |
| 0.3 | SHACl | −0.444 | −0.354 | 0.124 | 0.071 | 0.321 (0.28) | 0.196 (0.22) |
| 0.3 | SHACh | −0.319 | −0.235 | 0.275 | 0.164 | 0.377 (0.30) | 0.219 (0.23) |
| 0.3 | SHACe | −0.340 | −0.251 | 0.287 | 0.172 | 0.402 (0.31) | 0.235 (0.24) |
| 0.3 | SHACF | −0.519 | −0.322 | 0.060 | 0.068 | 0.329 (0.28) | 0.172 (0.20) |
| 0.3 | INID | −1.001 | −1.000 | 0.005 | 0.002 | 1.006 (0.49) | 1.000 (0.49) |
| 0.5 | SHAC0 | −0.949 | −0.703 | 1.192 | 0.714 | 2.093 (0.36) | 1.209 (0.27) |
| 0.5 | SHACl | −1.156 | −0.927 | 0.664 | 0.389 | 2.000 (0.35) | 1.248 (0.28) |
| 0.5 | SHACh | −0.947 | −0.670 | 1.553 | 0.968 | 2.450 (0.39) | 1.417 (0.30) |
| 0.5 | SHACe | −0.994 | −0.698 | 1.598 | 1.007 | 2.586 (0.40) | 1.495 (0.30) |
| 0.5 | SHACF | −1.703 | −1.128 | 0.174 | 0.217 | 3.074 (0.44) | 1.490 (0.30) |
| 0.5 | INID | −2.859 | −2.855 | 0.008 | 0.004 | 8.182 (0.71) | 8.153 (0.71) |
| 0.7 | SHAC0 | −3.701 | −2.690 | 12.325 | 7.937 | 26.02 (0.45) | 15.18 (0.35) |
| 0.7 | SHACl | −4.166 | −3.337 | 7.166 | 4.313 | 24.52 (0.44) | 15.45 (0.35) |
| 0.7 | SHACh | −3.860 | −2.673 | 15.234 | 10.715 | 30.13 (0.49) | 17.86 (0.38) |
| 0.7 | SHACe | −3.985 | −2.743 | 15.325 | 10.983 | 31.21 (0.50) | 18.51 (0.38) |
| 0.7 | SHACF | −6.876 | −5.054 | 0.871 | 1.219 | 48.15 (0.62) | 26.76 (0.46) |
| 0.7 | INID | −9.731 | −9.705 | 0.024 | 0.010 | 94.71 (0.87) | 94.20 (0.87) |
| 0.9 | SHAC0 | −54.32 | −39.60 | 1010.1 | 899.0 | 3960.9 (0.62) | 2467.1 (0.49) |
| 0.9 | SHACl | −54.86 | −43.78 | 781.5 | 541.4 | 3791.1 (0.61) | 2457.7 (0.49) |
| 0.9 | SHACh | −59.31 | −42.76 | 956.6 | 1077.9 | 4474.0 (0.66) | 2906.4 (0.53) |
| 0.9 | SHACe | −59.80 | −42.90 | 938.4 | 1071.9 | 4514.9 (0.66) | 2912.1 (0.53) |
| 0.9 | SHACF | −85.73 | −73.70 | 26.7 | 42.7 | 7376.6 (0.84) | 5474.0 (0.73) |
| 0.9 | INID | −98.49 | −97.92 | 0.5 | 0.2 | 9700.0 (0.97) | 9587.7 (0.97) |
| 0.95 | SHAC0 | −273.6 | −204.5 | 10 557 | 13 871 | 85 420 (0.72) | 55 680 (0.58) |
| 0.95 | SHACl | −266.3 | −212.6 | 10 465 | 9 553 | 81 382 (0.70) | 54 750 (0.58) |
| 0.95 | SHACh | −284.4 | −223.6 | 10 532 | 14 605 | 91 390 (0.74) | 64 620 (0.63) |
| 0.95 | SHACe | −285.3 | −224.3 | 10 392 | 14 285 | 91 762 (0.74) | 64 580 (0.63) |
| 0.95 | SHACF | −373.4 | −339.8 | 215 | 365 | 139 610 (0.92) | 115 810 (0.84) |
| 0.95 | INID | −402.3 | −399.4 | 4 | 1 | 161 880 (0.99) | 159 490 (0.99) |
Spatial HAC estimators do not capture high dependence well even if we choose a large bandwidth, since spatial HAC estimators are constructed with the estimated residuals rather than the true disturbances. Our asymptotic theory does not capture the effect of demeaning on the SHAC estimator. This is analogous to the time series case; see, for example, Sun et al. (2008) and Sun and Phillips (2008).

When there is no spatial dependence (ρ = 0), SHAC0 is quite reliable in that the RMSE is only 15% of the true value even though INID is slightly more accurate. When there exists some spatial dependence, SHAC0 is much more accurate than INID. Furthermore, INID is rarely improved with an increasing sample size, which is in sharp contrast to SHAC0. For example, when ρ = 0.3 and n = 400, the MSE of SHAC0 is less than a third of that of INID. When n = 1024, the difference increases, with the former less than a fifth of the latter. Therefore, when there is no spatial dependence, the loss of efficiency from using a spatial HAC estimator with data dependent bandwidth is small, whereas there is a remarkable reduction in RMSE from using a spatial HAC estimator when there exists spatial dependence.

Table 2 also shows how mis-specification in $W_n$ and measurement errors affect the performance of the spatial HAC estimator using the bandwidth choice we suggest. Comparing SHACe with SHAC0, we find that measurement errors lead to higher MSE. However, the difference in MSE is not very large, reflecting the robustness of the SHAC to the presence of measurement errors. Similarly, mis-specification in $W_n$ is not critical in our simulation design. Among the three bandwidth choices $\hat d_n^{(l)}$, $\hat d_n^{(h)}$, $\hat d_n^{(e)}$, none performs consistently better than the others and the difference gets smaller when n = 1024. Compared to SHACF, all of them tend to yield smaller MSEs, especially when n = 400 and ρ is high.

Table 3 reports the empirical coverage probabilities of CIs associated with different spatial HAC estimators. The results in this table are similar to the ones in Table 2. All of the estimators yield very accurate CIs when there is no spatial dependence. In contrast, when there is spatial dependence, INID is clearly inferior to spatial HAC estimators. As the sample size increases, the coverage accuracy improves for all of the estimators except INID. Compared to SHACF, spatial HAC estimators using our data dependent bandwidth choice are more reliable as the dependence increases, even in the presence of measurement errors or mis-specification in the spatial weighting matrix. Table 3 shows that when ρ = 0.9 or 0.95, the error in coverage probability (ECP) is substantial. For example, when ρ = 0.95, the ECP for the 95% CI with SHAC0 is 14.6% even when n = 1024.
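Table 3
Empirical coverage probabilities of nominal 99%, 95% and 90% CIs constructed using spatial HAC estimators in a location model with spatial AR(1) error.

| ρ | Estimator | 99% (n = 400) | 99% (n = 1024) | 95% (n = 400) | 95% (n = 1024) | 90% (n = 400) | 90% (n = 1024) |
|---|---|---|---|---|---|---|---|
| 0 | SHAC0 | 98.8 | 99.0 | 95.0 | 95.0 | 90.4 | 90.3 |
| 0 | SHACl | 98.7 | 99.1 | 95.1 | 95.1 | 90.9 | 90.7 |
| 0 | SHACh | 98.8 | 99.0 | 94.8 | 95.2 | 90.1 | 90.1 |
| 0 | SHACe | 98.7 | 98.9 | 95.0 | 95.0 | 89.9 | 90.0 |
| 0 | SHACF | 98.9 | 98.9 | 95.4 | 94.9 | 90.5 | 90.0 |
| 0 | INID | 98.9 | 99.0 | 95.8 | 95.7 | 91.3 | 91.3 |
| 0.3 | SHAC0 | 97.8 | 98.4 | 92.2 | 93.4 | 87.1 | 87.4 |
| 0.3 | SHACl | 97.5 | 98.2 | 92.0 | 92.9 | 86.2 | 86.5 |
| 0.3 | SHACh | 97.7 | 98.4 | 92.1 | 93.5 | 86.8 | 87.5 |
| 0.3 | SHACe | 97.4 | 98.1 | 91.9 | 93.0 | 87.0 | 87.0 |
| 0.3 | SHACF | 97.2 | 98.2 | 91.4 | 93.2 | 85.7 | 86.8 |
| 0.3 | INID | 94.0 | 94.0 | 84.6 | 84.0 | 76.4 | 77.6 |
| 0.5 | SHAC0 | 96.7 | 97.6 | 90.2 | 91.9 | 83.3 | 86.1 |
| 0.5 | SHACl | 96.7 | 97.5 | 90.4 | 91.6 | 82.9 | 85.2 |
| 0.5 | SHACh | 96.5 | 97.7 | 90.3 | 91.6 | 82.8 | 85.8 |
| 0.5 | SHACe | 96.3 | 97.7 | 89.6 | 91.5 | 82.2 | 85.7 |
| 0.5 | SHACF | 95.0 | 96.9 | 87.4 | 90.7 | 79.1 | 83.6 |
| 0.5 | INID | 84.3 | 83.8 | 70.8 | 72.3 | 63.9 | 64.6 |
| 0.7 | SHAC0 | 95.5 | 96.9 | 86.6 | 89.7 | 80.0 | 83.3 |
| 0.7 | SHACl | 95.0 | 96.5 | 87.0 | 89.1 | 79.6 | 82.9 |
| 0.7 | SHACh | 94.2 | 96.7 | 85.3 | 89.1 | 78.7 | 83.6 |
| 0.7 | SHACe | 93.4 | 96.6 | 84.7 | 88.9 | 78.1 | 83.2 |
| 0.7 | SHACF | 89.6 | 94.2 | 77.8 | 85.3 | 69.2 | 78.1 |
| 0.7 | INID | 66.5 | 66.7 | 53.3 | 54.2 | 46.2 | 47.7 |
| 0.9 | SHAC0 | 86.7 | 93.5 | 77.3 | 84.9 | 69.1 | 78.5 |
| 0.9 | SHACl | 87.6 | 93.4 | 78.0 | 84.4 | 70.4 | 78.1 |
| 0.9 | SHACh | 84.2 | 91.4 | 73.9 | 83.8 | 66.7 | 76.9 |
| 0.9 | SHACe | 83.9 | 91.4 | 73.7 | 83.6 | 66.4 | 76.5 |
| 0.9 | SHACF | 68.8 | 82.1 | 56.8 | 70.4 | 48.9 | 62.5 |
| 0.9 | INID | 35.2 | 37.4 | 27.2 | 28.8 | 23.2 | 24.8 |
| 0.95 | SHAC0 | 79.5 | 89.3 | 69.2 | 80.4 | 62.1 | 73.1 |
| 0.95 | SHACl | 81.1 | 88.9 | 70.2 | 80.3 | 63.4 | 73.3 |
| 0.95 | SHACh | 77.3 | 87.1 | 66.2 | 77.1 | 59.8 | 70.2 |
| 0.95 | SHACe | 77.2 | 87.1 | 66.2 | 77.0 | 59.8 | 70.5 |
| 0.95 | SHACF | 55.1 | 70.3 | 43.6 | 59.5 | 36.3 | 52.2 |
| 0.95 | INID | 24.5 | 25.8 | 18.6 | 17.8 | 15.0 | 14.8 |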
As seen in Table 2, the downward bias of spatial HAC estimators becomes very large when spatial dependence is very high. For this reason, the CIs tend to be very tight. The ECP comes from two sources. First, the spatial HAC estimator is biased downward. Second, the CIs are based on the asymptotic normal approximation. In order to alleviate this problem, we investigate the performance of some bootstrap procedures in Table 5.

Table 4 shows the performance of $\hat d_n$ with mis-specified parametric models. As the parametric plug-in method is likely to be biased, robustness of the spatial HAC estimator to the mis-specification of the approximating parametric model is a highly desirable property. Consider the case that $u_n$ follows a spatial AR(p) process:

\[ u_n = \rho W_{1n} u_n + \rho^2 W_{2n} u_n + \cdots + \rho^p W_{pn} u_n + \varepsilon_n. \]

The thresholds for $W_{1n}$, $W_{2n}$, $W_{3n}$ and $W_{4n}$ are $d_{ij,n} \le \sqrt{2}$, $\sqrt{2} < d_{ij,n} \le 2$, $2 < d_{ij,n} < \sqrt{5}$ and $\sqrt{5} < d_{ij,n} \le 2\sqrt{2}$ respectively. Regardless of the number of lags the true process has, we use spatial AR(1) as the approximating parametric model. Table 4 illustrates that as the number of lags increases, the accuracy of the spatial HAC estimator using the spatial AR(1) model becomes lower. However, comparison with $d_n^F$ clearly shows that the plug-in
method using the spatial AR(1) model performs reasonably well. For example, when ρ = 0.4 and the DGP is spatial AR(4), the empirical coverage probability of the 99% CI with SHAC0 is 93.5% and that with SHACF is 86.5%.

Table 5 examines bootstrap approximation as an alternative to the normal approximation. Both the i.i.d. naive bootstrap and wild bootstrap are considered. The procedure for the i.i.d. naive bootstrap we use here is as follows:

(S.1) At each location i, draw $y_{i,n}^{*}$ randomly from $\{y_{i,n}, i = 1, \ldots, n\}$ with replacement.
(S.2) Estimate the model parameter θ by $\hat\theta^{*} = n^{-1} \sum_{i=1}^{n} y_{i,n}^{*}$.
(S.3) Construct the spatial HAC estimator based on the bootstrap sample but use the bandwidth parameter $\hat d_n$.
(S.4) Compute the t-stat in the bootstrap world.
(S.5) Repeat S.1–S.4 to obtain the empirical distribution of the bootstrapped t-stat.
(S.6) Use critical values from the empirical distribution in (S.5) to construct CIs.

We also implement the wild bootstrap, which is proposed by Liu (1988) to account for unknown form of heteroskedasticity. The procedure is the same as that for the i.i.d. bootstrap except that (S.1) is replaced by (W.1).
(W.1) At each location, compute the residual $\hat u_{i,n} = y_{i,n} - \hat\theta$ and generate the bootstrap observation $y_{i,n}^{*}$:

\[ y_{i,n}^{*} = \begin{cases} \hat\theta + \hat u_{i,n} & \text{with probability } 0.5, \\ \hat\theta - \hat u_{i,n} & \text{with probability } 0.5. \end{cases} \]

See Davidson and Flachaire (2001) for more details.

Table 4
Performance of the spatial HAC estimator with bandwidth $\hat d_n$ under mis-specified approximating parametric model.

| ρ | Statistic | SAR(2) SHAC0 | SAR(2) SHACF | SAR(3) SHAC0 | SAR(3) SHACF | SAR(4) SHAC0 | SAR(4) SHACF |
|---|---|---|---|---|---|---|---|
| 0.2 | Bias | −0.293 | −0.400 | −0.316 | −0.431 | −0.322 | −0.437 |
| 0.2 | MSE | 0.206 (0.26) | 0.204 (0.26) | 0.226 (0.27) | 0.231 (0.27) | 0.230 (0.27) | 0.237 (0.27) |
| 0.2 | 99% | 97.7 | 97.5 | 97.7 | 97.3 | 97.7 | 97.3 |
| 0.2 | 95% | 92.4 | 91.6 | 92.3 | 91.5 | 91.9 | 91.5 |
| 0.2 | 90% | 87.5 | 86.3 | 87.1 | 85.9 | 87.1 | 85.8 |
| 0.3 | Bias | −0.656 | −1.020 | −0.796 | −1.231 | −0.852 | −1.307 |
| 0.3 | MSE | 0.809 (0.33) | 1.123 (0.39) | 1.091 (0.35) | 1.607 (0.43) | 1.207 (0.36) | 1.802 (0.44) |
| 0.3 | 99% | 97.0 | 95.9 | 96.5 | 95.1 | 96.4 | 94.7 |
| 0.3 | 95% | 90.7 | 88.0 | 90.5 | 87.5 | 90.3 | 87.1 |
| 0.3 | 90% | 84.2 | 80.6 | 83.0 | 79.3 | 82.8 | 78.6 |
| 0.4 | Bias | −1.707 | −2.850 | −2.760 | −4.495 | −3.448 | −5.480 |
| 0.4 | MSE | 4.634 (0.41) | 8.335 (0.56) | 10.88 (0.46) | 20.52 (0.64) | 16.16 (0.49) | 30.41 (0.67) |
| 0.4 | 99% | 95.7 | 91.9 | 94.3 | 88.6 | 93.5 | 86.5 |
| 0.4 | 95% | 88.6 | 81.4 | 86.2 | 77.0 | 84.2 | 73.6 |
| 0.4 | 90% | 81.2 | 73.4 | 78.9 | 68.6 | 78.0 | 66.9 |
| 0.5 | Bias | −7.253 | −11.95 | −39.50 | −57.33 | −197.7 | −249.9 |
| 0.5 | MSE | 72.33 (0.53) | 144.0 (0.74) | 1845.9 (0.66) | 3294.3 (0.88) | 41887 (0.77) | 62499 (0.95) |
| 0.5 | 99% | 91.7 | 81.1 | 84.7 | 63.6 | 74.3 | 43.8 |
| 0.5 | 95% | 82.5 | 68.3 | 74.5 | 50.2 | 64.3 | 34.5 |
| 0.5 | 90% | 76.0 | 60.7 | 67.4 | 42.8 | 55.7 | 28.1 |

Note: (1) Number in parentheses represents the ratio of the RMSE to the true value. (2) $d_n^F = 4$.

Table 5
Empirical coverage probability of nominal 99%, 95%, 90% confidence intervals constructed using the bootstrap and standard normal approximations.

| ρ | Nominal level | Normal | i.i.d. bootstrap | Wild bootstrap |
|---|---|---|---|---|
| 0.0 | 99% | 98.8 | 99.2 | 99.0 |
| 0.0 | 95% | 95.0 | 95.7 | 95.2 |
| 0.0 | 90% | 90.4 | 90.9 | 90.7 |
| 0.3 | 99% | 97.8 | 98.5 | 98.5 |
| 0.3 | 95% | 92.2 | 93.4 | 93.7 |
| 0.3 | 90% | 87.1 | 88.9 | 89.2 |
| 0.5 | 99% | 96.7 | 98.5 | 98.4 |
| 0.5 | 95% | 90.2 | 92.8 | 93.0 |
| 0.5 | 90% | 83.3 | 87.3 | 87.1 |
| 0.7 | 99% | 95.5 | 97.6 | 98.0 |
| 0.7 | 95% | 86.6 | 92.2 | 91.9 |
| 0.7 | 90% | 80.0 | 84.8 | 85.4 |
| 0.9 | 99% | 86.7 | 97.0 | 96.7 |
| 0.9 | 95% | 77.3 | 88.3 | 87.4 |
| 0.9 | 90% | 69.1 | 81.6 | 81.0 |
| 0.95 | 99% | 79.5 | 94.0 | 93.5 |
| 0.95 | 95% | 69.2 | 85.8 | 83.2 |
| 0.95 | 90% | 62.1 | 78.8 | 75.9 |
(S.1) and (W.1) eliminate spatial dependence of the bootstrap sample. Goncalves and Vogelsang (2008) show that the i.i.d. naive bootstrap provides a valid approximation to the "fixed-b" asymptotic distribution in time series regressions. Under the "fixed-b" specification, the bandwidth is set proportional to the sample size and the associated test statistic converges to a non-standard limiting distribution (e.g. Kiefer and Vogelsang, 2002, 2005). Goncalves and Vogelsang (2008) introduce a naive bootstrap procedure to obtain the critical values from the non-standard distribution. Bester et al. (2009) have extended the "fixed-b" asymptotics and the naive bootstrap procedure to spatial HAC estimation. Their results are not applicable to our setting for two reasons. First, we adopt the traditional asymptotics framework in which the bandwidth or the number of pseudo-neighbors grows at a slower rate than the sample size. Second, the spatial processes we consider allow for nonstationarity and heteroskedasticity, which are ruled out in Bester et al. (2009). However, the idea of using the bootstrap to capture the randomness of the HAC estimator is still applicable. When the bandwidth is large, the bias of the HAC estimator is small and the main task is to capture the finite sample variation of the HAC estimator. By ignoring the spatial dependence, and hence the bias of the HAC estimator, the i.i.d. bootstrap and wild bootstrap do exactly this.

The bootstrap method can be justified in the traditional framework. Under some regularity assumptions and $E\ell_n = o(n)$, the t-statistic or Wald statistic converges in distribution to the standard normal distribution or a chi-square distribution. In the bootstrap world, the corresponding test statistic obviously converges to the same distribution. Therefore, the i.i.d. bootstrap and wild bootstrap can be viewed as a valid method to obtain critical values from the standard normal or chi-square distribution.
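The resampling steps (S.1)–(S.6) and the wild-bootstrap variant (W.1) can be sketched as follows for the location model. This is a minimal illustration under stated assumptions: the variance estimator is passed in as a callable `var_fn` (in the setting above it would be the spatial HAC estimator evaluated at the fixed bandwidth, as in (S.3)), the bootstrap t-statistic is centered at the original sample mean, and the interval is built from the quantile of |t*| (a symmetric percentile-t interval). These choices, the function names and the toy data are illustrative and are not taken from the paper.

```python
import math
import random


def t_statistic(y, center, var_fn):
    """sqrt(n) * (mean(y) - center) / sqrt(J), or None when J is not positive."""
    n = len(y)
    theta = sum(y) / n
    J = var_fn([v - theta for v in y])
    if J <= 0.0:
        return None
    return math.sqrt(n) * (theta - center) / math.sqrt(J)


def bootstrap_abs_t(y, var_fn, scheme, B, rng):
    """Sorted |t*| from B bootstrap samples; scheme is 'iid' (S.1) or 'wild' (W.1)."""
    n = len(y)
    theta = sum(y) / n
    resid = [v - theta for v in y]
    stats = []
    for _ in range(B):
        if scheme == "iid":
            y_star = [rng.choice(y) for _ in range(n)]
        elif scheme == "wild":
            y_star = [theta + (e if rng.random() < 0.5 else -e) for e in resid]
        else:
            raise ValueError("scheme must be 'iid' or 'wild'")
        t_star = t_statistic(y_star, theta, var_fn)  # centered at theta-hat (assumption)
        if t_star is not None:
            stats.append(abs(t_star))
    return sorted(stats)


def bootstrap_ci(y, var_fn, scheme="wild", level=0.95, B=999, seed=0):
    """Symmetric percentile-t interval for the location parameter."""
    n = len(y)
    theta = sum(y) / n
    J = var_fn([v - theta for v in y])
    stats = bootstrap_abs_t(y, var_fn, scheme, B, random.Random(seed))
    if J <= 0.0 or not stats:
        raise ValueError("variance estimate must be positive")
    crit = stats[min(len(stats), math.ceil(level * len(stats))) - 1]
    half = crit * math.sqrt(J / n)
    return theta - half, theta + half


def mean_square(resid):
    """Stand-in variance estimator: average squared residual (bandwidth-zero case)."""
    return sum(e * e for e in resid) / len(resid)


data_rng = random.Random(1)
y = [1.0 + data_rng.gauss(0.0, 1.0) for _ in range(50)]   # toy data, not from the paper
print("i.i.d. bootstrap CI:", bootstrap_ci(y, mean_square, scheme="iid"))
print("wild bootstrap CI:  ", bootstrap_ci(y, mean_square, scheme="wild"))
```

Passing the variance estimator as a callable keeps the resampling logic separate from the choice of HAC estimator, so the same sketch can be reused with any estimator that maps residuals to a scalar variance.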
Whether the critical values are second order correct, however, is beyond the scope of this paper.

Table 5 shows that the bootstrap methods implemented here improve the accuracy of the CIs compared to the standard normal approximation, especially when the dependence is extremely high. As we have seen in previous tables, the standard normal approximation yields a large size distortion when spatial dependence is very high. However, we do not find this problem with the bootstrap procedures. There is no significant difference between the i.i.d. naive bootstrap and the wild bootstrap. For example, when ρ = 0.95, the empirical coverage probabilities of the 95% CI by the i.i.d. naive bootstrap and the wild bootstrap are 85.8% and 83.2% respectively, while that based on the normal approximation (CLT) is 69.2%.

Table 6 illustrates the performance of the spatial HAC estimator with $\hat d_n$ when the units are located irregularly on the lattice. We generate $u_n$ using the spatial AR(1) process on 20 × 20 and 25 × 25 lattices and randomly sample 300 and 400 locations from the lattices respectively without replacement. We estimate the location model with the observations on those 300 and 400 locations. We condition on the same set of locations we sample in each simulation. Table 6 shows that irregularity in location does not adversely affect the performance of the spatial HAC estimators with $\hat d_n$. The result is confirmed by comparing Table 6 with Tables 2 and 3, in which the observations are regularly spaced. This corroborates our asymptotic results as they do not require a regular lattice structure.
6.2. Univariate model In the second part, the regression model we consider is yi,n = α + β xi,n + ui,n
360
M.S. Kim, Y. Sun / Journal of Econometrics 160 (2011) 349–371
Table 6 Performance of the spatial HAC estimator with bandwidth dˆ n in the presence of irregulairty and sparsity in spatial locations. n = 300, (20 × 20)
n = 400, (25 × 25)
SHAC0
SHACF
SHAC0
SHACF
−0.016
−0.021
0.023 (0.15) 99.1 95.1 89.7
−0.012
−0.027
0.021 (0.15) 98.8 95.0 89.5
0.015 (0.12) 99.1 96.2 92.0
0.023 (0.15) 99.1 96.0 92.0
−0.207
−0.229
0.119 (0.21) 98.9 94.8 89.6
0.108 (0.20) 98.8 94.5 89.2
ρ=0 Bias MSE 99% 95% 90%
ρ = 0.3 −0.364
−0.256
Bias MSE
0.198 (0.25) 97.7 92.6 87.1
99% 95% 90%
0.186 (0.24) 97.5 91.7 86.2
ρ = 0.5 Bias MSE
−0.611
−1.107
−0.446
−0.667
1.008 (0.32) 96.9 90.7 86.0
1.364 (0.37) 95.7 88.5 80.8
0.531 (0.27) 98.2 93.3 86.9
0.588 (0.28) 97.9 92.3 85.0
−1.251
−2.564
4.387 (0.33) 96.9 90.2 82.6
7.222 (0.42) 94.9 84.6 77.1
−14.55
−32.66
457.9 (0.45) 92.1 83.1 74.2
1084.5 (0.70) 78.3 63.8 55.4
−75.8
−145.4
9349 (0.54) 87.3 75.5 66.3
21 268 (0.82) 62.3 49.6 42.4
99% 95% 90%
ρ = 0.7 −4.204
−1.989
Bias MSE
9.794 (0.40) 94.6 87.9 81.5
99% 95% 90%
18.293 (0.55) 89.9 80.4 70.8
ρ = 0.9 −49.63
−25.81
Bias MSE
1180.9 (0.55) 88.0 78.8 70.8
99% 95% 90%
2480.2 (0.80) 69.6 57.1 50.4
ρ = 0.95 −213.2
−132.0
Bias MSE
23 799 (0.64) 82.0 70.5 63.4
99% 95% 90%
45 588 (0.89) 55.2 44.6 38.5
where $\alpha = 1$, $\beta = 5$, $x_n = (x_{i,n})$ is the standardized version of $\tilde{x}_n$, which follows a spatial process of the form:

\[
\tilde{x}_n = \psi W_{0n} \tilde{x}_n + \zeta_n,
\]

with $\zeta_{in} \sim$ i.i.d. $U[0, 1]$. Here we assume that the spatial processes of $\tilde{x}_n$ and $u_n$ have the same weighting matrix $W_{0n}$. Let $X_n$ be the design matrix with $i$-th row $X_{i,n} = [1, x_{i,n}]$. In view of the standardization, $n^{-1} X_n' X_n$ is the $2 \times 2$ identity matrix.

Note: Number in parentheses represents the ratio of RMSE to the true value.

We consider two different weighting matrices: $S_n = \check{S}_n$ or $\hat{S}_n$ where

\[
\check{S}_n =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\quad \text{and} \quad
\hat{S}_n =
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.
\]

The first choice $\check{S}_n$ is suggested by Andrews (1991) in time series HAC estimation. For this choice, the MSE criterion reduces to the MSE of $\hat{J}_{11,n} + \hat{J}_{22,n} + 2\hat{J}_{12,n}$. The second choice is designed to select the variance of $\hat{\beta}$ and the corresponding MSE is the MSE of $\hat{J}_{22,n}$.

Table 7 reports the bias and MSE of the SHAC estimator $\hat{J}_{22,n}$ for the above two weighting matrices. The coverage probability of the associated 95% CI is also reported. When $\psi = 0.3$, $\hat{S}_n$ always yields more accurate $\hat{J}_{22,n}$ if $\rho > 0$. When $\rho$ is very high, the reduction in MSE and the improvement in coverage accuracy from using $\hat{S}_n$ instead of $\check{S}_n$ are remarkable. For example, when $\rho = 0.95$ and $n = 400$, the MSE of $\hat{J}_{22,n}$ with weighting matrix $\hat{S}_n$ is 27.43 while that with $\check{S}_n$ is 42.93. The empirical coverage probability of the CI with $\hat{S}_n$ is 91.4% and that with $\check{S}_n$ is 82.6%. When $n = 1024$, the difference is still very large but becomes less dramatic. When $\psi = 0.9$, $\hat{S}_n$ performs better than $\check{S}_n$ in most cases although the margin of improvement is small.

7. Conclusion

In this paper, we study the asymptotic properties of the spatial HAC estimator. We establish the consistency conditions, the rate of convergence and the asymptotic truncated MSE of the estimator. We also determine the optimal bandwidth parameter which minimizes the asymptotic truncated MSE. As this optimal bandwidth parameter is not feasible in practice, we suggest a data dependent bandwidth parameter estimator using a parametric plug-in method.

Monte Carlo simulation results show that the data dependent bandwidth choice we suggest performs reasonably well compared to other bandwidth selection procedures in terms of both the MSE criterion and the coverage accuracy of CIs. They also confirm the robustness of our bandwidth choice procedure to mis-specification of the spatial weighting matrix and of the approximating parametric model, to irregularity and sparsity in spatial locations, and to the presence of measurement errors.

In this paper, we focus on the asymptotic truncated MSE criterion, which may not be the most suitable for hypothesis testing or CI construction. It would be interesting to extend the methods of Sun et al. (2008) and Sun and Phillips (2008) on time series HAC estimation to the spatial setting.

Acknowledgements

We thank Tim Conley, Graham Elliott, Xun Lu, Dimitris Politis and Hal White for helpful comments. We appreciate the comments of Cheng Hsiao, the coeditor, an associate editor and two referees, which led to considerable improvements of the paper. Sun gratefully acknowledges partial research support from NSF under Grant No. SES-0752443.

Appendix

Proof of Theorem 1. For notational simplicity, we re-order the individuals and make new indices. For $i_{(j)} = 1, \ldots, \ell_{j,n}$, $d^*_{i_{(j)} j,n} \le d_n$, and for $i_{(j)} = \ell_{j,n} + 1, \ldots, n$, $d^*_{i_{(j)} j,n} > d_n$.

(a) Asymptotic variance: $\lim_{n\to\infty} \frac{n}{E\ell_n} \mathrm{var}(\mathrm{vec}(\tilde{J}_n)) = \bar{K} (I + K_{pp})(g \otimes g)$

Let $\phi_{lkrs,n} = \sum_{i=1}^{n} \sum_{j=1}^{n} r^{(r)}_{il,n} r^{(s)}_{jk,n} K\left( d^*_{ij,n} / d_n \right)$ and $N_n = \{\nu_{ij,n} \mid i, j = 1, \ldots, n\}$ be the set of measurement errors in distance. By Assumption 4(i), we have

\[
\frac{n}{E\ell_n} \mathrm{cov}(\tilde{J}_{rs,n}, \tilde{J}_{cd,n})
= \frac{1}{nE\ell_n} E \, E\left[ \sum_{l=1}^{np} \sum_{k=1}^{np} \phi_{lkrs,n} (\varepsilon_{l,n}\varepsilon_{k,n} - E\varepsilon_{l,n}\varepsilon_{k,n})
\times \sum_{e=1}^{np} \sum_{f=1}^{np} \phi_{efcd,n} (\varepsilon_{e,n}\varepsilon_{f,n} - E\varepsilon_{e,n}\varepsilon_{f,n}) \,\Big|\, N_n \right]
\]
M.S. Kim, Y. Sun / Journal of Econometrics 160 (2011) 349–371
Table 7
Bias and MSE of Ĵ22,n and empirical coverage of the associated CIs with bandwidth selected using different weighting matrices in the truncated AMSE criterion.

ψ     n      S_n   Statistic   ρ = 0     0.3       0.5       0.7       0.9       0.95
0.3   400    Ŝ_n   Bias        −0.023    −0.054    −0.096    −0.200    −0.813    −1.845
                   MSE          0.029     0.041     0.075     0.234     4.076     27.43
                   95%          93.8      93.4      92.8      92.7      91.7      91.4
             Š_n   Bias        −0.019    −0.075    −0.147    −0.336    −1.706    −4.451
                   MSE          0.024     0.062     0.151     0.535     7.850     42.93
                   95%          93.9      92.6      91.5      89.8      85.9      82.6
0.9   400    Ŝ_n   Bias        −0.049    −0.286    −0.679    −1.968    −14.00    −40.78
                   MSE          0.047     0.245     1.051     7.440     333.8     2893.6
                   95%          92.7      90.9      89.5      86.8      81.9      79.3
             Š_n   Bias        −0.039    −0.295    −0.686    −1.991    −14.62    −44.04
                   MSE          0.039     0.245     1.049     7.584     357.3     3173.0
                   95%          92.9      90.8      89.6      86.6      80.6      76.4
0.3   1024   Ŝ_n   Bias        −0.009    −0.037    −0.071    −0.152    −0.620    −1.444
                   MSE          0.012     0.018     0.035     0.111     1.771     11.90
                   95%          95.1      95.1      94.9      94.8      94.4      93.6
             Š_n   Bias        −0.007    −0.043    −0.087    −0.204    −1.068    −2.891
                   MSE          0.010     0.034     0.086     0.300     4.361     24.89
                   95%          95.2      94.6      94.0      93.8      90.9      89.5
0.9   1024   Ŝ_n   Bias        −0.025    −0.210    −0.481    −1.363    −9.706    −29.49
                   MSE          0.019     0.133     0.580     4.195     197.1     1829.4
                   95%          95.8      94.4      93.3      91.6      89.1      87.0
             Š_n   Bias        −0.020    −0.211    −0.477    −1.346    −9.850    −31.41
                   MSE          0.016     0.132     0.580     4.289     215.6     2101.9
                   95%          96.0      94.3      93.2      91.8      88.7      84.5
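To make the role of the two weighting matrices in Table 7 concrete, the following is a minimal sketch of the quadratic loss vec(Ĵ − J)′ S vec(Ĵ − J) under the identity weighting and under the weighting that keeps only the (2,2) element. The function names and the numerical matrices are hypothetical and chosen only for illustration; they are not taken from the paper's simulations.

```python
import numpy as np

def vec(mat):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return np.asarray(mat, dtype=float).reshape(-1, order="F")

def weighted_loss(j_hat, j_true, weight):
    """Quadratic loss vec(J_hat - J)' S vec(J_hat - J) for a single draw."""
    diff = vec(j_hat) - vec(j_true)
    return float(diff @ weight @ diff)

# Identity weighting: every element of the 2 x 2 matrix enters the loss.
S_check = np.eye(4)
# Selective weighting: only the (2,2) element, the last entry of vec(J), counts.
S_hat = np.diag([0.0, 0.0, 0.0, 1.0])

# Hypothetical "true" and "estimated" matrices, for illustration only.
J_true = np.array([[1.0, 0.2], [0.2, 2.0]])
J_est = np.array([[0.9, 0.1], [0.1, 1.7]])

loss_check = weighted_loss(J_est, J_true, S_check)  # sum of squared errors of all four entries
loss_hat = weighted_loss(J_est, J_true, S_hat)      # squared error of the (2,2) entry only
print(loss_check, loss_hat)
```

Averaging such losses over Monte Carlo replications would give the MSE-type criteria that the two weighting matrices target; the truncation at h used in the theory is omitted here.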
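\[
\begin{aligned}
&= \frac{1}{nE\ell_n} E \sum_{l=1}^{np} \sum_{k=1}^{np} \sum_{e=1}^{np} \sum_{f=1}^{np}
E\Big[ \phi_{lkrs,n} \phi_{efcd,n} \big( \varepsilon_{l,n}\varepsilon_{k,n}\varepsilon_{e,n}\varepsilon_{f,n}
- \varepsilon_{l,n}\varepsilon_{k,n} E\varepsilon_{e,n}\varepsilon_{f,n}
- \varepsilon_{e,n}\varepsilon_{f,n} E\varepsilon_{l,n}\varepsilon_{k,n}
+ E\varepsilon_{l,n}\varepsilon_{k,n} E\varepsilon_{e,n}\varepsilon_{f,n} \big) \,\Big|\, N_n \Big] \\
&= \frac{1}{nE\ell_n} E \sum_{l=1}^{np} \phi_{llrs,n} \phi_{llcd,n} \left( E\varepsilon^4_{l,n} - 3 \right)
+ \frac{1}{nE\ell_n} E \sum_{l=1}^{np} \sum_{k=1}^{np} \phi_{lkrs,n} \phi_{lkcd,n}
+ \frac{1}{nE\ell_n} E \sum_{l=1}^{np} \sum_{k=1}^{np} \phi_{lkrs,n} \phi_{klcd,n} \\
&\equiv C_{1,n} + C_{2,n} + C_{3,n},
\end{aligned}
\]

where

\[
C_{1,n} = \frac{1}{nE\ell_n} \sum_{l=1}^{np} \left( E\varepsilon^4_{l,n} - 3 \right)
E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} r^{(r)}_{il,n} r^{(s)}_{jl,n} K\left( \frac{d^*_{ij,n}}{d_n} \right)
\times \sum_{a=1}^{n} \sum_{b=1}^{n} r^{(c)}_{al,n} r^{(d)}_{bl,n} K\left( \frac{d^*_{ab,n}}{d_n} \right) \right],
\]

\[
C_{2,n} = \frac{1}{nE\ell_n} \sum_{l=1}^{np} \sum_{k=1}^{np}
E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} r^{(r)}_{il,n} r^{(s)}_{jk,n} K\left( \frac{d^*_{ij,n}}{d_n} \right)
\times \sum_{a=1}^{n} \sum_{b=1}^{n} r^{(c)}_{al,n} r^{(d)}_{bk,n} K\left( \frac{d^*_{ab,n}}{d_n} \right) \right],
\]

\[
C_{3,n} = \frac{1}{nE\ell_n} \sum_{l=1}^{np} \sum_{k=1}^{np}
E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} r^{(r)}_{il,n} r^{(s)}_{jk,n} K\left( \frac{d^*_{ij,n}}{d_n} \right)
\times \sum_{a=1}^{n} \sum_{b=1}^{n} r^{(c)}_{ak,n} r^{(d)}_{bl,n} K\left( \frac{d^*_{ab,n}}{d_n} \right) \right].
\]

$C_{1,n}$ can be restated as

\[
C_{1,n} = \frac{1}{E\ell_n} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{a=1}^{n} \sum_{b=1}^{n}
E\left[ K\left( \frac{d^*_{ij,n}}{d_n} \right) K\left( \frac{d^*_{ab,n}}{d_n} \right) \right]
\times \frac{1}{n} \sum_{l=1}^{np} \left( E\varepsilon^4_{l,n} - 3 \right) r^{(r)}_{il,n} r^{(s)}_{jl,n} r^{(c)}_{al,n} r^{(d)}_{bl,n}.
\]

Therefore,

\[
\begin{aligned}
C_{1,n} &\le \frac{1}{nE\ell_n} \sum_{l=1}^{np} \left| E\varepsilon^4_{l,n} - 3 \right|
\sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{a=1}^{n} \sum_{b=1}^{n}
E\left[ K\left( \frac{d^*_{ij,n}}{d_n} \right) K\left( \frac{d^*_{ab,n}}{d_n} \right) \right]
\left| r^{(r)}_{il,n} r^{(s)}_{jl,n} r^{(c)}_{al,n} r^{(d)}_{bl,n} \right| \\
&\le \frac{1}{nE\ell_n} \sum_{l=1}^{np} \left| E\varepsilon^4_{l,n} - 3 \right|
\sum_{i=1}^{n} \left| r^{(r)}_{il,n} \right| \sum_{j=1}^{n} \left| r^{(s)}_{jl,n} \right|
\times \sum_{a=1}^{n} \left| r^{(c)}_{al,n} \right| \sum_{b=1}^{n} \left| r^{(d)}_{bl,n} \right| \\
&\le \frac{c_R^4}{E\ell_n} \frac{1}{n} \sum_{l=1}^{np} \left| E\varepsilon^4_{l,n} - 3 \right|
\le \frac{c_R^4 c_E \, p}{E\ell_n} = o(1)
\end{aligned}
\]

using Assumptions 1 and 2. $C_{2,n}$ can be restated as given in Box I. In order to consider the boundary effects, we decompose $C_{2,n}$ as given in Box II. In the following, we show that $D_{1n}$ converges to $\bar{K} g_{rc} g_{sd}$ and the other terms become negligible as $n$ increases. In order to prove that $D_{1n}$ converges to $\bar{K} g_{rc} g_{sd}$, it suffices to first show that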
0.95
0.029 93.8
0.3
n
0.9
−0.023
0.9
nE ℓn
0.7
Bias MSE 95%
400
=
0.5
Sˆn 0.3
1
0.3
lim
n→∞
1 nE ℓn
E
ℓ i ,n − − ℓ a ,n −− i∈En j(i) =1 a∈En b(a) =1
K2
d∗ij ,n (i ) dn
γia(rc,n) γj((isd) b)(a) ,n
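\[
\begin{aligned}
C_{2,n} &= \frac{1}{nE\ell_n} E \sum_{l=1}^{np} \sum_{k=1}^{np} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{a=1}^{n} \sum_{b=1}^{n}
r^{(r)}_{il,n} r^{(s)}_{jk,n} r^{(c)}_{al,n} r^{(d)}_{bk,n}
K\left( \frac{d^*_{ij,n}}{d_n} \right) K\left( \frac{d^*_{ab,n}}{d_n} \right) \\
&= \frac{1}{nE\ell_n} E \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{a=1}^{n} \sum_{b=1}^{n}
K\left( \frac{d^*_{ij,n}}{d_n} \right) K\left( \frac{d^*_{ab,n}}{d_n} \right)
\left( \sum_{l=1}^{np} r^{(r)}_{il,n} r^{(c)}_{al,n} \right)
\left( \sum_{k=1}^{np} r^{(s)}_{jk,n} r^{(d)}_{bk,n} \right) \\
&= \frac{1}{nE\ell_n} E \sum_{i=1}^{n} \sum_{j_{(i)}=1}^{\ell_{i,n}} \sum_{a=1}^{n} \sum_{b_{(a)}=1}^{\ell_{a,n}}
K\left( \frac{d^*_{ij_{(i)},n}}{d_n} \right) K\left( \frac{d^*_{ab_{(a)},n}}{d_n} \right)
\gamma^{(rc)}_{ia,n} \gamma^{(sd)}_{j_{(i)} b_{(a)},n}. \qquad \text{(A.1)}
\end{aligned}
\]

Box I.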
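The step in Box I that collapses the sums over l and k into the covariance terms γ can be checked numerically. The sketch below is only an illustration under stated assumptions: the matrices `R_r`, `R_s`, `R_c`, `R_d`, the toy locations and the Bartlett-type kernel are hypothetical stand-ins, and the normalization 1/(nEℓ_n) and the outer expectation are left out because they do not affect the algebraic identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 8  # toy sizes: n locations, m innovations

# Hypothetical coefficient arrays playing the role of r^(r), r^(s), r^(c), r^(d).
R_r, R_s, R_c, R_d = (rng.normal(size=(n, m)) for _ in range(4))

# Toy kernel weights K(d_ij / d_n): points on a line, Bartlett kernel, bandwidth 2.
pts = rng.uniform(0.0, 5.0, size=n)
dist = np.abs(pts[:, None] - pts[None, :])
Kmat = np.clip(1.0 - dist / 2.0, 0.0, None)

# First line of Box I: six-fold sum over i, j, a, b, l, k.
lhs = np.einsum("il,jk,al,bk,ij,ab->", R_r, R_s, R_c, R_d, Kmat, Kmat)

# Second and third lines: sum over l and k first, giving gamma^(rc) and gamma^(sd).
gamma_rc = R_r @ R_c.T  # entry (i, a) is sum_l r_il^(r) r_al^(c)
gamma_sd = R_s @ R_d.T  # entry (j, b) is sum_k r_jk^(s) r_bk^(d)
rhs = np.einsum("ij,ab,ia,jb->", Kmat, Kmat, gamma_rc, gamma_sd)

print(np.isclose(lhs, rhs))
```

Because the kernel weight is zero whenever the distance exceeds the bandwidth, restricting the inner sums to the ℓ_{i,n} and ℓ_{a,n} neighbours, as in the last line of Box I, leaves the value unchanged.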
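\[
\begin{aligned}
C_{2,n} &= \frac{1}{nE\ell_n} E \sum_{i \in E_n} \sum_{j_{(i)}=1}^{\ell_{i,n}} \sum_{a \in E_n} \sum_{b_{(a)}=1}^{\ell_{a,n}}
K\left( \frac{d^*_{ij_{(i)},n}}{d_n} \right) K\left( \frac{d^*_{ab_{(a)},n}}{d_n} \right)
\gamma^{(rc)}_{ia,n} \gamma^{(sd)}_{j_{(i)} b_{(a)},n} \\
&\quad + \frac{1}{nE\ell_n} E \sum_{i \notin E_n} \sum_{j_{(i)}=1}^{\ell_{i,n}} \sum_{a \in E_n} \sum_{b_{(a)}=1}^{\ell_{a,n}}
K\left( \frac{d^*_{ij_{(i)},n}}{d_n} \right) K\left( \frac{d^*_{ab_{(a)},n}}{d_n} \right)
\gamma^{(rc)}_{ia,n} \gamma^{(sd)}_{j_{(i)} b_{(a)},n} \\
&\quad + \frac{1}{nE\ell_n} E \sum_{i \in E_n} \sum_{j_{(i)}=1}^{\ell_{i,n}} \sum_{a \notin E_n} \sum_{b_{(a)}=1}^{\ell_{a,n}}
K\left( \frac{d^*_{ij_{(i)},n}}{d_n} \right) K\left( \frac{d^*_{ab_{(a)},n}}{d_n} \right)
\gamma^{(rc)}_{ia,n} \gamma^{(sd)}_{j_{(i)} b_{(a)},n} \\
&\quad + \frac{1}{nE\ell_n} E \sum_{i \notin E_n} \sum_{j_{(i)}=1}^{\ell_{i,n}} \sum_{a \notin E_n} \sum_{b_{(a)}=1}^{\ell_{a,n}}
K\left( \frac{d^*_{ij_{(i)},n}}{d_n} \right) K\left( \frac{d^*_{ab_{(a)},n}}{d_n} \right)
\gamma^{(rc)}_{ia,n} \gamma^{(sd)}_{j_{(i)} b_{(a)},n} \\
&= D_{1n} + D_{2n} + D_{3n} + D_{4n}.
\end{aligned}
\]

Box II.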
1
= lim
n→∞
nE ℓn
E
ℓ i ,n − − ℓa,n −−
K
2
d∗ab ,n (a)
dn
i∈En j(i) =1 a∈En b(a) =1
= K¯ grc gsd ,
lim
n→∞
nE ℓn
dn
i∈En j(i) =1 a∈En b(a) =1
∗ ℓi,n − − ℓa,n dij ,n −− 1 (i) = lim E K n→∞ nE ℓn d n i∈En j =1 a∈En b =1 (i)
× K
(sd)
d∗ab ,n (a) dn
(a)
γia(rc,n) γj((isd) b)(a) ,n . ∑ℓi,n
(A.3)
∗ ℓi,n − − ℓa,n dij ,n −− 1 (i) lim E K2 γia(rc,n) γ¯ı(,bsd(a)) ,n n→∞ nE ℓn d n i∈En j(i) =1 a∈En b(a) =1 ∗ ℓi,n ℓa,n 1 − − (rc ) − (sd) 1 − 2 dij(i) ,n = lim E γ γ K n→∞ n i∈E a∈E ia,n b =1 ¯ı,b(a) ,n E ℓn j =1 dn n n ( a) (i) ℓ ℓ a,n i,n 1 − − (rc ) 1 − − (sd) = lim γ E γ n→∞ n i∈E a∈E ia,n E ℓn b =1 j =1 j(i) b(a) ,n n n (i)
E ℓn j=1
K
2
d∗ij,n dn
.
(A.5)
Note that
We proceed to show that the expected value of each term is o(1). We consider the first term only as the proofs for the other two terms are similar. As a ∈ En , by the Markov inequality
Let $\bar{\gamma}^{(sd)}_{\bar{\imath}, b_{(a)},n} = \frac{1}{E\ell_n} \sum_{j_{(i)}=1}^{\ell_{i,n}} \gamma^{(sd)}_{j_{(i)} b_{(a)},n}$. Then we get the equation given in Box III. For the first term in (A.4), under Assumption 7(i),
n 1 −
ℓ i ,n ℓ a ,n − E ℓn E ℓn − − 1 − 1 (sd) (sd) γj(i) b(a) ,n − γj(i) b(a) ,n Eℓ E ℓ n b =1 j =1 n b(a) =1 j(i) =1 (a ) (i ) E ℓn E ℓn E ℓn E ℓn − − − − 1 1 ≤ γj((isd) b)(a) ,n γj((isd) b)(a) ,n + E ℓn b(a) =ℓa,n +1 j(i) =1 E ℓn b(a) =1 j(i) =ℓi,n +1 E ℓn E ℓn − − 1 + γj((isd) b)(a) ,n . (A.6) E ℓn b(a) =ℓa,n +1 j(i) =ℓi,n +1
∗ ℓi,n − − ℓa,n dij ,n −− (i) E K2 γia(rc,n) γj((isd) b)(a) ,n
×
(A.2)
and then show that 1
γia(rc,n) γj((isd) b)(a) ,n
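\[
P\left( \frac{1}{E\ell_n} \left| \ell_{a,n} - E\ell_n \right| \ge \varepsilon \right)
\le \frac{1}{\varepsilon} E\left| \frac{\ell_{a,n}}{E\ell_n} - 1 \right| \to 0.
\]

That is, for any $\varepsilon > 0$, there exists an $N_0 > 0$ such that for $n \ge N_0$, $P(\ell_{a,n} \notin B(E\ell_n, \varepsilon)) \le \varepsilon$, where $B(E\ell_n, \varepsilon) = (\lfloor (1 - \varepsilon) E\ell_n \rfloor, \lceil (1 + \varepsilon) E\ell_n \rceil)$. Now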
1 E E ℓn
(sd) γj(i) b(a) ,n b(a) =ℓa,n +1 j(i) =1 E ℓn E ℓn − − 1 ( sd ) ≤E γ E ℓn b =ℓ +1 j =1 j(i) b(a) ,n a,n E ℓn −
(a)
E ℓn −
(i)
lim
n→∞
1 nE ℓn
∗ ℓi,n − − ℓa,n dij ,n −− (i) E γia(rc,n) γj((isd) b)(a) ,n K2 dn
i∈En j(i) =1 a∈En b(a) =1
∗ ℓi,n − − ℓa,n dij ,n −− 1 ( i ) = lim E γia(rc,n) γ¯ı(,bsd(a)) ,n K2 n→∞ nE ℓn dn i∈En j =1 a∈En b =1 (i)
(a)
∗ ℓi,n − − ℓa,n dij ,n −− 1 (i) + lim E γia(rc,n) (γj((isd) b),n − γ¯ı(,bsd(a)) ,n ) . K2 n→∞ nE ℓn dn i∈En j =1 a∈En b =1 (i)
(A.4)
( a)
Box III.
E ℓn E ℓn − − 1 (sd) =E γ ℓa,n ∈ B(E ℓn , ε) E ℓn b =ℓ +1 j =1 j(i) b(a) ,n a,n (i) (a) × P ℓa,n ∈ B(E ℓn , ε) E ℓn E ℓn − − 1 ( sd ) +E γ ℓa,n ̸∈ B(E ℓn , ε) E ℓn b =ℓ +1 j =1 j(i) b(a) ,n a , n ( i ) (a) × P ℓa,n ̸∈ B(E ℓn , ε) ⌈(1+ε) E ℓn −E ℓn ⌉ − 1 (sd) ≤ 2ε γ 2ε E ℓn b =⌊(1−ε)E ℓ ⌋ j =1 j(i) b(a) ,n n (i) (a) + O(1)P ℓa,n ̸∈ B(E ℓn , ε) ,
For the second term in (A.4), the first step is to show 1 nE ℓn
is bounded, we also have
ℓi,n ℓa,n − E ℓn − E ℓn − 1 − 1 (sd) (sd) E γj(i) b(a) ,n − γj(i) b(a) ,n E ℓ E ℓ n b =1 j =1 n b(a) =1 j(i) =1 ( a) (i) ∗ n 1 − 2 dij,n × K = o(1). E ℓn j = 1 dn As a result
∗ ℓi,n − − ℓa,n dij ,n −− 1 ( i ) ( rc ) ( sd ) lim E K2 γia,n γ¯ı,b(a) ,n n→∞ nE ℓn dn i∈En j(i) =1 a∈En b(a) =1 E ℓn − E ℓn − − − 1 1 = lim γia(rc,n) E γj((isd) b)(a) ,n n→∞ n E ℓ n i∈En a∈En b(a) =1 j(i) =1 ∗ n 1 − 2 dij,n × K E ℓn j = 1 dn E ℓn − E ℓn 1 − − (rc ) 1 − ( sd ) = lim γ lim γ n→∞ n i∈E a∈E ia,n n→∞ E ℓn b =1 j =1 j(i) b(a) ,n n n (a) (i) ∗ n − d 1 ij,n × lim E K2 = K¯ grc gsd , n→∞ E ℓn j = 1 dn using Assumption 7(ii).
dn
i∈En j(i) =1 a∈En b(a) =1
dn
× γia(rc,n) γj((isd) b)(a) ,n − γ¯ı(,bsd(a)) ,n = o(1),
(A.8)
and the second step is to prove 1 nE ℓn
∗ ℓi,n − − ℓa,n −− dia,n 2 K E i∈En j(i) =1 a∈En b(a) =1
dn
× γia(rc,n) γj((isd) b)(a) ,n − γ¯ı(,bsd(a)) ,n = o(1).
which can be made arbitrarily small when n → ∞. So the first term in (A.6) is indeed op (1). Hence
ℓi,n ℓa,n − E ℓn − E ℓn − 1 − 1 (sd) (sd) E γj(i) b(a) ,n − γj(i) b(a) ,n = o(1). (A.7) E ℓn b = 1 j = 1 E ℓn b(a) =1 j(i) =1 (a) (i) ∑n ∑n 1 2 ∗ Since (E ℓn )−1 j=1 K 2 (d∗ij,n /dn ) = (ℓi,n /E ℓn )ℓ− i,n j=1 K (dij,n /dn )
∗ ∗ ℓi,n − − ℓa,n dij ,n −− dia,n (i) 2 2 E −K K
(A.9)
For (A.8) 1 nE ℓn
∗ ∗ ℓi,n − − ℓa,n dij ,n −− dia,n ( i ) K2 E − K2 i∈En j(i) =1 a∈En b(a) =1
×
≤
γia(rc,n) 1
nE ℓn
dn
dn
(sd) (sd) γj(i) b(a) ,n − γ¯ı,b(a) ,n
ℓi,n − − ℓa,n −− (rc ) (sd) (sd) E γia,n γj(i) b(a) ,n − γ¯ı,b(a) ,n i∈En j(i) =1 a∈En b(a) =1
ℓi,n ℓa,n 1 − (rc ) 1 − − (sd) (sd) = γia,n E γj(i) b(a) ,n − γ¯ı,b(a) ,n n (i,a)∈F E ℓn j = 1 b = 1 1 (i) ( a)
ℓi,n ℓa,n 1 − (rc ) 1 − − (sd) (sd) + γia,n E γj(i) b(a) ,n − γ¯ı,b(a) ,n n (i,a)∈F E ℓn j = 1 b = 1 2 (i) (a)
= M1n + M2n , where
F1 = {(i, a) : dia,n ≤ dn & i, a ∈ En }, and
F2 = {(i, a) : dia,n > dn & i, a ∈ En }. For M1n , we obtain
M1n ≤
E ℓn n
ℓi,n ℓa,n − (rc ) 1 − − (sd) γia,n E γj(i) b(a) ,n E ℓn (i,a)∈F E ℓn j = 1 b = 1 1
1
(i)
( a)
1
+
( E ℓn ) 2
ℓ
ℓi,n − ℓi,n − ℓa,n − (sd) γh(i) b(a) ,n
j(i) =1 h(i) =1 b(a) =1
E ℓn
∑
i∈En
lim E
×
Therefore, we obtain
1 nE ℓn
E
ℓi,n − − ℓa,n −−
K2
=E
1 −− n i∈E a∈E n n
K
2
γia(rc,n) γj((isd) b)(a)
d∗ia,n
dn
ℓi,n ℓa,n 1 − − (sd) × γ E ℓn j =1 b =1 j(i) b(a)
γia(rc,n)
E ℓn
j(i) =1 b(a) =1
E ℓn
d∗ia,n
dn
γia(rc,n)
ℓi,n −
1 E ℓn
ℓa,n −
j(i) =1 b(a) =1
1 E ℓn
γj((isd) b)(a) ,n −
ℓi,n − ℓi,n − ℓa,n −
1
(E ℓn )2
h(i) =1 j(i) =1 b(a) =1
γh((sdi) ,)b(a) ,n
γj((isd) b)(a) ,n
− ℓi,n − ℓa,n + op (1) γh((sdi) ,)b(a) ,n h(i) =1 b(a) =1
→ 0.
(sd) ,n − γ¯ı,b ,n (a)
Hence the second term in (A.4) is op (1). By a symmetric argument, we obtain the result that
ℓi,n − ℓi,n n n − − 1 − (sd) (sd) sup γj(i) b,n + γhb,n E ℓn j =1 b=1 E ℓn j = 1 b h = 1 (i)
n 1 − ∗ × 1 dab,n < dn E ℓn b = 1
ℓi,n − n n − − (sd) (sd) ℓi,n ℓa,n ≤ γ + sup γhb,n E ℓn j =1 b=1 j(i) b,n (E ℓn )2 b h=1 (i) ℓi , n ℓa , n . (E ℓn )2
p
1
+C
2
γj((isd) b)(a) ,n − γ¯ı(,bsd(a)) ,n
j(i) =1 b(a) =1
ℓi,n − ℓa,n 1 − γj((isd) b)(a) ,n − γ¯ı(,bsd(a)) ,n Eℓ n j(i) =1 b(a) =1 ℓi,n − n 1 − ∗ ( sd ) ≤ γj(i) b,n 1 dab,n < dn E ℓn j(i) =1 b=1 ℓi,n − n − n 1 − (sd) ∗ ∗ + γ 1 d < d , d < d n n ab,n ih,n hb,n 2 (E ℓn ) j(i) =1 b=1 h=1
ℓi,n
( a)
ℓi,n − ℓa,n −
1
=
lim
E ℓn
K
n i∈E a∈E n n
ℓi,n − ℓa,n −
1
For some generic constant C
≤C
γia(rc,n)
The last equality holds because
1
n→∞
1
dn
1 −−
(i)
(a)
(i)
d∗ia,n
ℓi,n − ℓa,n − 1 × γ (sd) − γ¯ı(,bsd(a)) ,n = 0. E ℓn j =1 b =1 j(i) b(a) ,n
(a)
= o(1).
≤
−
(i )
dia,n dn
i∈En j(i) =1 a∈En b(a) =1
∗
(sd) ,n − γ¯ı,b ,n
The next step is to show (A.9).
K
2
j(i) =1 b(a) =1
n→∞
=
E ℓn
= E P lim
M1n = o(1) and M2n = o(1).
(rc )
ℓi,n − ℓa,n − γj((isd) b)(a) ,n − γ¯ı(,bsd(a)) ,n
1
M2n
1 −− n i∈E a∈E n n
n→∞
n
E ℓn − E ℓn 1 − 1 1 − (rc ) q (sd) ≤ q γia,n dia γj(i) b(a) ,n E ℓn j = 1 b = 1 dn n (i,a)∈F 2 (i) (a) E ℓ E ℓ E ℓ n n n − − − 1 (sd) + γh b ,n + o(1) (E ℓn )2 j(i) =1 h(i) =1 b(a) =1 (i) (a) q = O d− . n
≤ C . As |n−1
a∈En
1
where the first equality holds with the same procedure as (A.7). For M2n ,
d∗ ia,n
K 2 ( d )γia,n | < ∞ for all n, invoking the dominated n convergence theorem and Assumption 6 yields
∑
E ℓn − E ℓn − (rc ) 1 − (sd) = γia,n γj(i) b(a) ,n n E ℓn (i,a)∈F E ℓn j = 1 b = 1 1 (i) (a) E ℓn − E ℓn E ℓn − − 1 (sd) + γh b ,n + o(1) (E ℓn )2 j(i) =1 h(i) =1 b(a) =1 (i) (a) E ℓn =O ,
E ℓi,n E ℓn
By Assumption 5, E limn→∞ Eiℓ,n = limn→∞ n
nE ℓn
E
ℓi,n − − ℓa,n −−
K2
d∗ab ,n (a)
dn
i∈En j(i) =1 a∈En b(a) =1
) γia(rc,n) γjb(sd ,n
= K¯ grc gsd . The next step is to prove (A.3). In view of previous derivations, it suffices to show that
∗ ∗ 2 ℓi,n ℓa,n dab ,n dij ,n 1 −−− − (a) (i ) −K lim E K n→∞ nE ℓn i∈E j =1 a∈E b =1 dn dn n n (i )
(a )
) × γia(rc,n) γjb(sd = 0. ,n
(A.10)
But
E
1
ℓi,n − − ℓa,n −−
nE ℓn i∈E n
K
j(i) =1 a∈En b(a) =1
d∗ij ,n (i)
dn
−K
d∗ab ,n (a)
2
dn
) × γia(rc,n) γjb(sd ,n
=E
1
−
nE ℓn (i,j(i),a,b(a))∈I 1
K
d∗ij ,n (i) dn
−K
d∗ab ,n (a) dn
2
ℓi,n ℓa,n 1 − − (sd) × γ cn−q E ℓn j =1 b =1 j(i) b(a) ,n
1
) × γia(rc,n) γjb(sd + ,n
nE ℓn ∗
−
×
K
(i)
dij ,n (i)
dn
(i,j(i),a,b(a))∈I2
−K
∗
dab ,n ( a) dn
2
) γia(rc,n) γjb(sd ,n
= o(cn ). By choosing cn such that cn → ∞ but cn /dn → 0, we have F1n = o(1)
where
I1 = {(i, j(i), a, b(a)) : |dij(i) ,n − dab(a) ,n | ≤ 2cn & i, a ∈ En }, ∗
∗
and
I2 = {(i, j(i), a, b(a)) : |d∗ij(i) ,n − d∗ab(a) ,n | > 2cn & i, a ∈ En }.
nE ℓn
F1n
K
(i,j(i),a,b(a))∈I1
d∗ij ,n (i)
dn
2 d∗ab ,n (a) ) −K γia(rc,n) γjb(sd ,n dn 2 ∗ dij ,n d∗ab ,n − cL2 (i) (a) (rc ) (sd) − ≤ γia,n γjb,n nE ℓn (i,j(i),a,b(a))∈I dn dn 1 ℓi,n ℓa,n 1 − − (rc ) 4cL2 cn2 1 − − (rc ) ≤ γ γia,n d2n n i∈E a∈E E ℓn j =1 b =1 jb,n n n (i) (a) 2 c = O n2 dn
(i) (a)
d∗j
(i) b(a) ,n
≤ cn , then
d∗ij(i) ,n − d∗ab(a) ,n ≤ d∗ia,n + d∗ab(a) ,n + d∗b(a) j(i) ,n − d∗ab(a) ,n ≤ 2cn , and dij(i) ,n − dab(a) ,n ≥ dij(i) ,n − dia,n − dij(i) ,n − dj(i) b(a) ,n ≥ −2cn . ∗
∗
∗
∗
∗
These two inequalities imply that |d∗ij ,n − d∗ab ,n | ≤ 2cn , a (i) (a) contradiction. Without loss of generality, we assume that d∗ia,n > cn for (i, j(i) , a, b(a) ) ∈ I2 . In this case F2n
( a)
1 ≤ E nE ℓn −K ≤E =E
−
K
(i,j(i),a,b(a))∈I2 ∗
dab ,n (a)
2
dn
4
−
d∗ij ,n (i)
lim
n→∞
n
E ℓn
cov J˜rs,n , J˜cd,n
γia(rc,n) γj((isd) b)(a) ,n ∗ q (rc ) (sd) ∗ −q dia,n γia,n γj(i) b(a) ,n dia,n
nE ℓn (i,j(i),a,b(a))∈I 2 4
−
−
nE ℓn i∈E ∗ n a:dia,n ∈[cn ,dn ]
×
∗ q (rc ) dia,n γia,n
ℓi,n − ℓa,n − (sd) ∗ −q γj(i) b(a) ,n dia,n
j(i) =1 b(a) =1
≤ 4E
1 − − ∗ q (rc ) dia,n γia,n n i∈E a∈E n
n
(i)
= K¯ (grc gsd + grd gsc ).
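In terms of matrix form,

\[
\lim_{n\to\infty} \frac{n}{E\ell_n} \mathrm{var}\left( \mathrm{vec}\, \tilde{J}_n \right) = \bar{K} (I + K_{pp})(g \otimes g),
\]

where $g = [g_{rs}]$, $r, s = 1, \ldots, p$.

(b) Asymptotic bias: $\lim_{n\to\infty} d_n^q (E\tilde{J}_n - J_n)$ when $d_n \to \infty$

By Assumption 4(ii) and the dominated convergence theorem, we have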
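\[
\begin{aligned}
d_n^q \left( E\tilde{J}_n - J_n \right)
&= -E\, \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n}
\frac{1 - K\left( d^*_{ij,n} / d_n \right)}{\left( d^*_{ij,n} / d_n \right)^q}\,
\Gamma_{ij,n} \left( d^*_{ij,n} \right)^q \\
&= -K_q \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma_{ij,n} E\left( d^*_{ij,n} \right)^q + o(1).
\end{aligned}
\]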
Therefore,
dn
dn
by choosing the sequence of dn in a way that n2 /n → 0 as n → ∞. We can show D3n and D4n are o(1) in a similar way. With the symmetric procedure, it is straightforward that limn→∞ C3,n = K¯ grd gsc . Therefore,
∗
dn
i̸∈En j(i) =1 a∈En b(a) =1
= o(1),
n→∞
using Eq. (A.7). For F2n we note that if |d∗ij ,n − d∗ab ,n | > 2cn , then (i) (a) either d∗ia,n > cn or d∗j b ,n > cn . Otherwise, if both d∗ia,n ≤ cn and
∗ ∗ ℓi,n − − ℓa,n dab ,n dij ,n −− (a) (i) E K γia(rc,n) γj((isd) b)(a) ,n K
ℓi,n ℓa,n − − − − 1 (sd) (rc ) 1 ≤E γ γia,n n i̸∈E a∈E E ℓn b =1 j =1 j(i) b(a) ,n n n (a) (i) E ℓn − E ℓn 1 − − (rc ) 1 − (sd) = γia,n γ + o(1) n i̸∈E a∈E E ℓn b =1 j =1 j(i) b(a) ,n n n
For F1n , we have
and F2n = o(1)
and (A.3) is proved. Next, we show that D2n is o(1). For D2n , 1
−
(a)
−q
≡ F1n + F2n ,
1 ≤ E nE ℓn
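\[
\lim_{n\to\infty} d_n^q \left( E\tilde{J}_n - J_n \right)
= -K_q \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma_{ij,n} E\left( d^*_{ij,n} \right)^q
= -K_q\, g^{(q)},
\]

where $g^{(q)}_{rs}$ is the $(r, s)$-th element of $g^{(q)}$.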
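The constant K_q in the bias expression depends only on how the kernel behaves near zero. The sketch below assumes the usual definition K_q = lim_{x→0} (1 − K(x))/|x|^q, which is not restated in this part of the text, and evaluates the ratio at small x for two standard kernels. The function names are hypothetical and the kernels are the textbook Bartlett and Parzen forms, not necessarily the ones used in the paper's simulations.

```python
def bartlett(x):
    """Bartlett (triangular) kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0

def parzen(x):
    """Parzen kernel in its usual piecewise-cubic form."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax**2 + 6.0 * ax**3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0

def kq_ratio(kernel, q, x):
    """Finite-x approximation to K_q: (1 - K(x)) / |x|^q."""
    return (1.0 - kernel(x)) / abs(x) ** q

for x in (1e-1, 1e-2, 1e-3):
    # Bartlett has characteristic exponent q = 1, Parzen has q = 2.
    print(x, kq_ratio(bartlett, 1, x), kq_ratio(parzen, 2, x))
```

For the Bartlett kernel the ratio equals 1 for every x in (0, 1], and for the Parzen kernel it equals 6 − 6|x| near zero, so the two limits are 1 and 6 respectively.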
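(c) $\sqrt{\frac{n}{E\ell_n}} \left( \hat{J}_n - J_n \right) = O_p(1)$ and $\sqrt{\frac{n}{E\ell_n}} \left( \hat{J}_n - \tilde{J}_n \right) = o_p(1)$

By (a) and (b), the first part of (c) is implied by the second part. Therefore, it suffices to show that $\sqrt{\frac{n}{E\ell_n}} \left( \hat{J}_n - \tilde{J}_n \right) = o_p(1)$. This holds if and only if $\sqrt{\frac{n}{E\ell_n}} \left( b' \hat{J}_n b - b' \tilde{J}_n b \right) = o_p(1)$ for any $b \in \mathbb{R}^p$. As a consequence, we can consider the case that $J_n$ is a scalar random variable without loss of generality. Using a Taylor expansion, we have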
∗ n − n n 1− dij,n ˆJn − J˜n = K E ℓn E ℓn n i=1 j=1 dn × Vˆ i,n Vˆ j′,n − Vi,n Vj′,n n
√
′ √ √ ≡ 2L1,n n θˆ − θ0 + n θˆ − θ0 L2,n n θˆ − θ0 ′ √ √ + n θˆ − θ0 L3,n n θˆ − θ0
n − min vec(J˜n − Jn )′ Sn vec(J˜n − Jn ) , h E ℓn n = min vec(J˜n − Jn )′ Sn vec(J˜n − Jn ) + op (1), h E ℓn p n − min vec(J˜n − Jn )′ Sn vec(J˜n − Jn ) , h → 0, E ℓn
(A.11)
where
∂ V j ,n K = Vi,n , √ n E ℓn i = 1 dn ∂θ ′ n j =1 ∗ 2 n n dij,n 1 −− ∂ 1 = √ K V (θ ) Vj,n (θ ), i ,n dn ∂θ ∂θ ′ nE ℓn n i=1 j=1 ∗ ′ n n dij,n 1 −− ∂ 1 ∂ = √ K Vi,n (θ ) Vj,n (θ ) . dn ∂θ ∂θ nE ℓn n i=1 j=1
L1,n
L2,n
L3,n
n E ℓn 1 −
n 1 −
d∗ij,n
Therefore, under Assumption 8(i) it suffices to show that L1,n = op (1), L2,n = op (1) and L3,n = op (1). For L2,n , we have, using the Cauchy inequality, the expression given in Box IV. The last equality in Box IV follows from Assumption 8(iv). By Assumption 8(ii), we have
P
n 1 − ∗ 1 dij,n ≤ dn Vj,n (θ ) > ∆ E ℓn j = 1
≤
≤
≤
1
n −
∆ E ℓn
j =1
1
∆
E
1
∆
E1 d∗ij,n
E ℓn
ℓi,n −
E ℓn j = 1
(A.14)
lim lim MSEh
2 21 (θ ) (i) ,n 2
sup E sup Vj,n (θ )
j
21
θ
, J˜n , S
n
= lim MSE
E ℓn
n→∞
, J˜n , S .
(A.15)
Under Assumption 8(ii), (A.14) holds by applying Lemma 3. Eq. (A.15) holds by applying Lemma 4 with
(J˜ − Jrs,n ) E ℓn rs,n n
[ E
n E ℓn
≤ dn ) Vj,n (θ ) = Op (1)
E ℓn n i=1 j=1
I2 =
uniformly over i. Hence L2,n = op (1). Using the same procedure, we can show that L3,n = op (1) under Assumption 8(iii). The next step is to show L1,n = op (1). By the Markov inequality and Assumption 8(v), we obtain the expression in Box V. The last inequality in Box V follows from Assumption 5. Combining this with the definition of L1,n , we obtain L1,n = Op
(J˜rs,n − Jrs,n )
n n n 1 −−
I1 =
4
= O(1) ∀r , s ≤ p. Note that
]4
= E (I1 + I2 )4
where
→ 0,
n n n 1 −−
1(d∗ij,n
n E ℓn
h→∞ n→∞
It is easy to see that supn≥1 EXn2 < ∞, as required by Lemma 4, if
as ∆ → ∞. This implies that n 1 −
h→∞ n→∞
n , J˜n , Sn − MSEh , J˜n , S =0 E ℓn E ℓn n
n
j(i) =1
MSEh
n ′ ˜ ˜ Xn = vec(Jn − Jn ) S (Jn − Jn ) . Eℓ
≤ dn E Vj,n (θ ) E Vj
lim lim
E
ℓi , n 1 E ℓn ℓi,n
E ℓi,n
as n → ∞. Here the op (1) term follows from Theorem 1(c). Also |ξn | ≤ h. By Lemma 3, E ξn → 0. Since this holds for all h, the first equality of Theorem 1(d) holds. The second equality of Theorem 1(d) is obtained by showing that
E ℓn n
= op (1).
(d) Asymptotic truncated MSE To establish the first and second equalities of Theorem 1(d), we introduce two lemmas from Andrews (1991). For proofs, see Lemma A1 and Lemma A2 in Andrews (1991).
d∗ij,n
dn
K
d∗ij,n
(Vi(,rn) Vj(,sn) − γij(,rsn) )
dn
− 1 γij(,rsn) .
Therefore, it suffices to show that the following terms are all O(1): D1n = E I14 ,
D2n = E I13 I2 ,
D4n = E I1 I23 ,
D3n = E I12 I22 ,
D5n = E I24 .
By Theorem 1(a) and (b), it is straightforward to show that D3n , D4n and D5n are O(1) under Assumption 9(ii). The proofs for D1n and D2n are similar and we focus only on D1n here. As denoted before, let
ϕlkrs,n =
∑n ∑n i=1
D1n = E
p
j =1
n E ℓn
≤ 8E
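Lemma 4. Let $\{X_n\}$ be a sequence of nonnegative rv's for which $\sup_{n \ge 1} E X_n^{1+\delta} < \infty$ for some $\delta > 0$. Then, $\lim_{h\to\infty} \lim_{n\to\infty} (E \min\{X_n, h\} - E X_n) = 0$.

In our setting,

\[
\begin{aligned}
\xi_n &= \min\left\{ \frac{n}{E\ell_n} \mathrm{vec}(\hat{J}_n - J_n)' S_n \mathrm{vec}(\hat{J}_n - J_n),\, h \right\}
- \min\left\{ \frac{n}{E\ell_n} \mathrm{vec}(\tilde{J}_n - J_n)' S_n \mathrm{vec}(\tilde{J}_n - J_n),\, h \right\} \\
&= \min\left\{ \frac{n}{E\ell_n} \mathrm{vec}(\hat{J}_n - \tilde{J}_n + \tilde{J}_n - J_n)' S_n \mathrm{vec}(\hat{J}_n - \tilde{J}_n + \tilde{J}_n - J_n),\, h \right\}
- \min\left\{ \frac{n}{E\ell_n} \mathrm{vec}(\tilde{J}_n - J_n)' S_n \mathrm{vec}(\tilde{J}_n - J_n),\, h \right\} \\
&= \min\left\{ \frac{n}{E\ell_n} \mathrm{vec}(\tilde{J}_n - J_n)' S_n \mathrm{vec}(\tilde{J}_n - J_n) + o_p(1),\, h \right\}
- \min\left\{ \frac{n}{E\ell_n} \mathrm{vec}(\tilde{J}_n - J_n)' S_n \mathrm{vec}(\tilde{J}_n - J_n),\, h \right\}
\overset{p}{\to} 0.
\end{aligned}
\]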
K
E ℓn n i=1 j=1
Lemma 3. If $\{\xi_n\}$ is a bounded sequence of random variables such that $\xi_n \overset{p}{\to} 0$, then $E\xi_n \to 0$.
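The truncation device behind the two lemmas can be illustrated with a small simulation. This is only a sketch: a chi-square(1) sample is used as a hypothetical stand-in for the nonnegative random variable X_n in Lemma 4, and the thresholds are arbitrary. It shows the truncated mean E min{X, h} rising toward the untruncated mean as h grows, which is the behaviour Lemma 4 formalizes.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for a nonnegative X_n: chi-square with one degree of freedom (mean 1).
x = rng.chisquare(df=1, size=100_000)

def truncated_mean(sample, h):
    """Sample analogue of E min{X, h}."""
    return float(np.minimum(sample, h).mean())

thresholds = [0.5, 1.0, 2.0, 5.0, 20.0, 100.0]
trunc = [truncated_mean(x, h) for h in thresholds]
full = float(x.mean())

for h, t in zip(thresholds, trunc):
    # The gap full - t shrinks toward zero as h increases.
    print(h, t, full - t)
```

Truncating at h keeps the criterion bounded, which is what allows Lemma 3 to turn convergence in probability into convergence of expectations.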
np np 1 −−
n l=1 k=1
n E ℓn
+ 8E
d∗ ij,n dn
K
np 1−
n l=1 n
E ℓn
(r ) (s)
ril,n rjk,n . Then,
4 ϕlkrs,n (εl,n εk,n − E εl,n εk,n ) 4
ϕllrs,n (ε
np np 1 −−
n l=1 k= ̸ l
2 l ,n
− Eε ) 2 l ,n
4 ϕlkrs,n (εl,n εk,n − E εl,n εk,n )
≡ 8G1n + 8G2n . Let ε˜ l21 ,n = εl2,n − E εl2,n . For G1n , we have: G1n =
1 n2 (E ℓn )
2
E
−
ϕl1 l1 rs,n ϕl2 l2 rs,n ϕl3 l3 rs,n ϕl4 l4 rs,n
l1 ,l2 ,l3 ,l4
× E ε˜ l21 ,n ε˜ l22 ,n ε˜ l23 ,n ε˜ l24 ,n
2 ∗ 2 n − n 1 1− 2 dij,n ∂ L2,n = K V (θ ) V (θ ) √ i , n j , n nE ℓn n i=1 j=1 dn ∂θ ∂θ ′ 2 2 n n ∂ 1 −− 1 ∗ 1(dij,n ≤ dn ) Vi,n (θ ) Vj,n (θ ) ≤ √ ∂θ ∂θ ′ nE ℓn n i=1 j=1 2 n n 1 − E ℓn 1 − ∂ 2 ≤ Vi,n (θ ) 1(d∗ij,n ≤ dn ) Vj,n (θ ) Eℓ n n i=1 ∂θ ∂θ ′ n j =1 2 2 − 2 n n 1 n ∂ 1 − E ℓn 1 − ∗ Vi,n (θ ) 1(dij,n ≤ dn ) Vj,n (θ ) sup ≤ n n n i=1 θ ∂θ ∂θ ′ E ℓn j = 1 i=1 2 n n 1 − E ℓn 1 − ∗ 1(dij,n ≤ dn ) Vj,n (θ ) = Op (1) n n i = 1 E ℓn j = 1
(A.12)
Box IV.
∗ n n 1 − dij,n 1 − ∂ P √ V Vi,n > δ K n j=1 j,n E ℓn i=1 dn ∂θ ′ ∗ ∗ n n n n dij,n dab,n 1 1 −−−− ∂ ∂ ≤ 2E K K Vj,n Vb,n ′ Vi,n Va,n δ nE ℓ2n i=1 j=1 a=1 b=1 dn dn ∂θ ∂θ ∗ n n ∗ n n dij,n dab,n 1 1 −−−− E K E Vj,n Vb,n ∂ Vi,n ∂ Va,n K ≤ 2 2 ′ δ nE ℓn i=1 j=1 a=1 b=1 dn dn ∂θ ∂θ n n n n 1 1 −−−− ∗ ∗ E Vj,n Vb,n ∂ Vi,n ∂ Va,n E 1 ( d ≤ d ) 1 ( d ≤ d ) ≤ 2 n n ij,n ab,n 2 ′ δ nE ℓn i=1 j=1 a=1 b=1 ∂θ ∂θ ℓj,n ℓb,n n n − − − − 1 ℓj,n ℓb,n 1 E Vj,n Vb,n ∂ Vi ,n ∂ Va ,n ≤ 2E (j) (b) 2 ′ δ Eℓ nℓ ℓ ∂θ ∂θ j,n b,n j=1 i =1 b=1 a =1 (b) (j)
n
≤
≤
1
E ℓj , n ℓb , n
δ2 C
E ℓ2n
E ℓ2j,n
n − ∂ ∂ sup E Vj,n Vb,n ′ Vi,n V a ,n ∂θ ∂θ i,a,b j=1
1/2
E ℓ2b,n
δ2
1/2
C′
≤
E ℓ2n
(A.13)
δ2 Box V.
=
≤
1 n2
−
E
(E ℓn )
2
4 ϕllrs ,n + 3E
E 2
2 −
n2 (E ℓn )
−
× (εl1 ,n εl2 ,n − E εl1 ,n εl2 ,n )(εl3 ,n εl4 ,n − E εl3 ,n εl4 ,n ) ] × (εl5 ,n εl6 ,n − E εl5 ,n εl6 ,n )(εl7 ,n εl8 ,n − E εl7 ,n εl8 ,n )
ϕl21 l1 rs,n ϕl22 l2 rs,n
l1 ̸=l2
l
C
2 ϕllrs ,n
[ =O
l
]
1
(E ℓn )2
= o(1)
=
ϕ
2 llrs,n
=
l
∗ n − n − − dij,n K
i=1 j=1
l
dn
(r ) (s) ril,n rjl,n
2
n n 2 − 2 − − (r ) (r ) ≤ ril,n rjl,n = O(n) i =1
l
j =1
G2n =
1 n2 (E ℓn )2
E
l1 ̸=l2 l3 ̸=l4 l5 ̸=l6 l7 ̸=l8
1
1
ϕl1 l2 rs,n ϕl3 l4 rs,n ϕl5 l6 rs,n ϕl7 l8 rs,n
−−−− E ϕl1 l2 rs,n ϕl3 l4 rs,n ϕl5 l6 rs,n ϕl7 l8 rs,n l1 ̸=l2 l3 ̸=l4 l5 ̸=l6 l7 ̸=l8
where the last equality holds by the independence of {εℓ,n } from {νij,n }. Since {εl,n } is independent and E εl8,n < ∞, it suffices to show that E 2
n2 (E ℓn )
by Assumption 2. For G2n , we have:
[− − − −
( E ℓn )
2
× E [(εl1 ,n εl2 ,n − E εl1 ,n εl2 ,n )(εl3 ,n εl4 ,n − E εl3 ,n εl4 ,n ) × (εl5 ,n εl6 ,n − E εl5 ,n εl6 ,n )(εl7 ,n εl8 ,n − E εl7 ,n εl8 ,n )]
using the fact that
−
1 n2
E 2
n2 (E ℓn )
np − np −
ϕl41 l2 rs,n < ∞,
(A.16)
l1 =1 l2 ̸=l1 np − np − np − np −
ϕl1 l2 rs,n ϕl1 l2 rs,n ϕl3 l4 rs,n ϕl3 l4 rs,n < ∞,
l1 =1 l2 =1 l3 =1 l4 =1
(A.17)
np
1 n2
( E ℓn )
2
np
np
np
−−−−
E
and
ϕl1 l2 rs,n ϕl2 l4 rs,n ϕl4 l3 rs,n ϕl3 l1 rs,n < ∞.
ℓi,n −
l 1 =1 l 2 =1 l 3 =1 l 4 =1
(A.18)
K
d∗ij ,n (i)
dn
j(i) =1
γj((isr) a),n
Eq. (A.16) is true because 1 n2 (E ℓn )2
np − np −
E
ϕ
4 l1 l2 rs,n
np
np
− − − − d∗ij,n K
n2 (E ℓn )2 l =1 l ̸=l 1 2 1
n2 (E ℓn )2 l =1 l ̸=l 1 2 1
[
j:d∗
i =1
=
n 2 E ℓn
nE ℓn
np − np −
E
ϕl1 l2 rs,n ϕl1 l2 rs,n ϕl3 l4 rs,n ϕl3 l4 rs,n
j(a) =1
2
E
ϕl1 l2 rs,n ϕl1 l2 rs,n
1
nE ℓn
×
E
ϕl1 l2 rs,n ϕl1 l2 rs,n K
(r )
(r )
ril1 ,n ral1 ,n
1 nE ℓn
E
np −
K
dn (s)
(s)
dn
K
dn
= γia(rr,n) γjb(ss,n)
n 2 E ℓn
=
np − np − np − np −
1 n2 E ℓ2n
×K
=
n E ℓn
ϕl1 l2 rs,n ϕl2 l4 rs,n ϕl4 l3 rs,n ϕl3 l1 rs,n
E
K
dn
i,j,a,b o,p,q,m
d∗qm,n
d∗ij,n
K
d∗ab,n
1
ℓo,n − p(o) =1
K
d∗op(o) ,n dn
(sr )
γp(o) q,n
γp((srq))q,n
ℓo,n −
b(o) =1
j(a) =1
γb((sro))o,n
ℓi,n − m(i) =1
j(a) =1 m(i) =1
2q
=
γm(sr(i))i (1 + o(1)) 2
γj((asr) a) ,n γm(sr(i))i,n (1 + o(1)) = O(1)
2q
dn 2q
dn E ℓn /n
lim MSE
=
dn
τ + o(1)
,
n
, J˜n , Sn
E ℓn ′ n = lim vec E J˜n − Jn Sn vec E J˜n − Jn n→∞ E ℓn n + lim K¯ tr Sn var(vecJ˜n ) n→∞ E ℓn ′ 1 = Kq2 vecg (q) S vecg (q) + K¯ tr S (I + Kpp )(g ⊗ g ) ,
(i)
τ
where the last equality holds by Theorem 1(a) and (b).
Proof of Corollary 1. The proof is very close to the proof of Corollary 1 in Andrews (1991). As
dn
γj((asr) a) ,n
nE ℓn i,a
n→∞
dn
(sr ) (sr ) γja(sr,n) γbo(sr,n) γpq ,n γmi,n ∗ ℓi,n dij ,n −− − (i) E K γj(sra),n
b(a) =1
o ,q
ℓi,n ℓa,n − −−
1
ℓ dn i,a o,q j(i) =1 ∗ ℓa,n dab ,n − (a) × K γb((sra))o,n
×
K
d∗op,n
dn
n2 E 2n
dn
q = O d− . So n
we have
l1 =1 l2 =1 l3 =1 l4 =1
− −
−q
using Eqs. (A.2) and (A.3). Combining the above proof, we obtain G2n = O(1). Hence D1n = O(1). For the last equality of Theorem 1(d), since
using Eqs. (A.2) and (A.3). Hence (A.17) holds. Finally, for Eq. (A.18), we note that E 2
ℓq,n − p(q) =1
= O(1)
1
ℓa,n K (d∗ − ij(a) ,n /dn ) − 1 (sr ) + q γj(a) a,n j(a) =1 d∗aj ,n /dn ( a)
ϕl1 l2 rs,n ϕl2 l4 rs,n ϕl4 l3 rs,n ϕl3 l1 rs,n
ℓa,n −
−−
×
∗ ∗ n − n − n − n − dij,n dab,n i=1 j=1 a=1 b=1
q γj((asr) a) ,n + Op d− n
n2 E ℓ2n i,a
rjl2 ,n rbl2 ,n
dn
−q
np − np − np − np −
l2 =1
K
γja(sr,n)
l1 =1 l2 =1 l3 =1 l4 =1
∗ ∗ n − n − n − n − dij,n dab,n
np −
γj(a) a,n + Op dn
1
=
i=1 j=1 a=1 b=1
γja(s,,nr )
q γj((asr) a) ,n + Op d− n
l 1 =1
=
E 2
l 1 =1 l 2 =1
1
l1 =1 l2 =1
np − np −
dn
where the O(·) term also satisfies EOp dn n 2 E ℓn
nE ℓn
=
ℓa,n −
=
and 1
dn
γja(sr,n)
dn
d∗ij,n
q q × d∗aj(a) ,n d− n
l1 =1 l2 =1 l3 =1 l4 =1
1
(sr )
j(a) =1
np − np − np − np −
=
ℓa,n −
=
(E ℓn )2
E 2
K
d∗ij ,n (a)
j(a) =1
j =1
where the last equality follows from Assumption 2. For Eq. (A.17), we have 1
d∗ij,n
K
ℓa,n −
dn
−
+
]
1
=O
dn
i=1 j=1
(r ) (s) ril1 ,n rjl2 ,n
4
np − np n n 4 − 4 − − (r ) (s) ril1 ,n rjl2 ,n
1
≤
n
K
j:d∗
d∗ij,n
j:d∗
n
K
l1 =1 l2 ̸=l1
1
=
−
=
=
−
ℓq,n − m(q) =1
K
d∗qm(q) ,n dn
(sr )
γm(q) i,n
n
2q 2q+η
2q 2q+η
= αn
2q
d n E ℓn n
2qη+η
n 2q η = αn2q+η τ 2q+η + o(1) , E ℓn E ℓn n
by Theorem 1(d), we obtain
lim lim MSEh n2q/(2q+η) , Jˆn (dn ), Sn
ℓ¨ i,n n − ∂ ℓ¨ i,n 1 − Vi,n (θ¯ )Vj ,n (θ¯ ) = op (1) ( i ) ∂θ E ℓ¨ n ℓ¨ i,n n i=1 j =1 (i) ∂ ¯ ¯ + V ( θ ) V ( θ ) i,n ∂θ j(i) ,n ,
h→∞ n→∞
η
2q
= α 2q+η τ 2q+η
1
τ
′
Kq2 vecg (q) S vecg (q)
+ K¯ tr S (I + Kpp )(g ⊗ g )
.
It is straightforward to show that this is uniquely minimized over $\tau \in (0, \infty)$ by $\tau^\star = q K_q^2 \kappa(q)/\eta$ (provided $0 < \kappa < \infty$ and $S$ is psd) and that a sequence $\{d_n\}$ satisfies $d_n^{2q} E\ell_n / n \to \tau^\star$ if and only if $d_n = d_n^\star + o(n^{1/(2q+\eta)})$.
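The minimization over τ can be sketched numerically. The block below assumes the limiting criterion is proportional to τ^{η/(2q+η)} (A/τ + B), where A and B are shorthand introduced here for the squared-bias term K_q² vec(g^{(q)})′ S vec(g^{(q)}) and the variance term K̄ tr[S(I + K_pp)(g ⊗ g)]; the function names and the numerical values are hypothetical. Setting the derivative to zero gives τ = 2qA/(ηB), which agrees with τ⋆ = qK_q²κ(q)/η if κ(q) denotes twice the ratio of the bias quadratic form to the variance term, an assumption since κ(q) is defined elsewhere in the paper.

```python
import math

def objective(tau, A, B, q, eta):
    """Limiting criterion up to a positive constant: tau^c * (A / tau + B)."""
    c = eta / (2.0 * q + eta)
    return tau**c * (A / tau + B)

def tau_closed_form(A, B, q, eta):
    """Stationary point of the objective, from the first-order condition."""
    return 2.0 * q * A / (eta * B)

def minimize_on_log_scale(f, lo=1e-6, hi=1e6, iters=200):
    """Ternary search in log(tau); valid because f is convex in log(tau)."""
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if f(math.exp(m1)) < f(math.exp(m2)):
            b = m2
        else:
            a = m1
    return math.exp(0.5 * (a + b))

# Hypothetical values: q = 1 (Bartlett-type kernel), eta = 2 (two-dimensional space).
A, B, q, eta = 2.0, 3.0, 1, 2
tau_numeric = minimize_on_log_scale(lambda t: objective(t, A, B, q, eta))
tau_formula = tau_closed_form(A, B, q, eta)
print(tau_numeric, tau_formula)
```

The two values coincide up to numerical tolerance, reflecting the usual trade-off in which a larger τ reduces the bias term and inflates the variance term.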
Proof of Theorem 2. (a)
E ℓ¨ n
(Jˆn (dˆ n ) − Jˆn (d¨ n )) = op (1).
n E ℓ¨ n
By Theorem 1(c),
n
(Jˆn (dˆ n ) − Jn ) = Op (1) and
n E ℓ¨ n
(Jˆn (d¨ n ) − Jn ) = Op (1). Therefore, it
n Jˆrs,n (dˆ n ) − Jˆrs,n (d¨ n ) E ℓ¨ n =
n E ℓ¨ n
n n 1 −−
K
n i=1 j=1
d∗ij,n
−K
dˆ n
d∗ij,n
d¨ n
op (1) terms hold as
Vˆ i,n Vˆ j,n
∗ ∗ ℓ¨ i,n n dij ,n dij ,n 1−− (i) (i) K −K ¨ ˆn d¨ n E ℓn n i=1 j =1 d (i) n
× Vˆ i,n Vˆ j′(i) ,n − Vi,n Vj′(i) ,n ,
M2n =
E ℓ¨ n
M3n =
n
1
n i=1
n E ℓ¨ n
ℓ¨ i,n n − −
1
n −
n i=1
K
j(i) =ℓ¨ i,n +1
K
d∗ij ,n (i) dˆ n
−K
d∗ij ,n (i)
d¨ n
Vi,n Vj′(i) ,n ,
Vˆ i,n Vˆ j′(i) .
The third term M3n is zero when dˆ n ≤ d¨ n . We assume that dˆ n > d¨ n below. Therefore, it suffices to show M1n = op (1), M2n = op (1) and M3n = op (1). We consider the case that Vi,n is a scalar here as the proof for the vector case is similar. Note that ∗ ℓ¨ i,n ∗ n dij,n dij,n E ℓ¨ n 1 − − ‖M1n ‖ = K − K ¨ n E ℓn i=1 j =1 d¨ n dˆ n
×
∂ Vj ,n (θ¯ ) ∂ Vi,n (θ¯ ) ¯ ) + Vi,n (θ¯ ) (i) V ( θ j , n (i) ∂θ ′ ∂θ ′
ˆθ − θ
ℓ¨ i,n ∗ n − − dij,n d∗ij,n 1 ≤ Op (1) − √ n d¨ n E ℓ¨ n n i=1 j =1 dˆ n (i) ∂ ∂ ¯ ¯ ¯ ¯ × V ( θ) V ( θ ) + V ( θ ) V ( θ ) j(i) ,n ∂θ i,n i,n ∂θ j(i) ,n
(i)
=
Using Assumption 8(ii) and (iii), we have
ℓ¨ i,n n ∂ 1 −− ∂ ¯ ¯ ¯ ¯ Vi,n (θ)Vj(i) ,n (θ) + Vi,n (θ) Vj ,n (θ) > ∆ P ∂θ (i) ℓ¨ i,n n i=1 j(i) =1 ∂θ 1
1
E
2
sup E sup
∆
θ
i
1 2 12 2 2 ∂ sup E sup Vj,n (θ) Vi,n (θ) →0 ∂θ θ j
as n and ∆ grow. Thus, M1n = op (1).
We now consider $M_{2n}$. Since $\sqrt{n}(\ddot{d}_n/\hat{d}_n - 1) = O_p(1)$, we have $P(\sqrt{n}\,|\ddot{d}_n/\hat{d}_n - 1| > C) \to 0$ as $C \to \infty$. That is, for any $\varepsilon > 0$, there exists a constant $C > 0$ such that $P(\sqrt{n}\,|\ddot{d}_n/\hat{d}_n - 1| > C) < \varepsilon$ for sufficiently large $n$. Hence we can focus on the event $E = \{\sqrt{n}\,|\ddot{d}_n/\hat{d}_n - 1| < C\}$ and we do so in the following derivation:
∗ ∗ ℓ¨ in n − dij ,n dij ,n 1 − (i) (i) P K −K Vi,n Vj(i) ,n > δ d¨ n dˆ n nE ℓ¨ n i=1 j(i) =1 ≤
≤
E ℓ¨ n
ℓ¨ i,n n − 1 − ∂ √ d¨ n Vi,n (θ¯ )Vj ,n (θ¯ ) ≤ op (1) n − 1 ∂θ (i) ˆdn E ℓ¨ n n i=1 j =1
n d¨ n /dˆ n − 1
(i)
(i)
√
= Op (1).
≤
dˆ n
j(i) =1 n −
d∗ij ,n (i)
= Op (1),
(i)
n θˆ − θ
ℓ¨ i,n n − − ∂ Vi,n (θ¯ )Vj ,n (θ¯ ) + Vi,n (θ¯ ) ∂ Vj ,n (θ¯ ) (i) (i) ∂θ ¨ℓi,n n ∂θ i =1 j =1
√
Op (1). Since ℓ¨ i,n /E ℓ¨ n = Op (1), it now suffices to show that
≤
where M1n =
where the first inequality uses Assumption 7(ii) and the Op (1) and
ℓ¨ i,n n − − ∂ ∂ ¯ ¯ ¯ ¯ E V ( θ) V ( θ) + E V ( θ) V ( θ) j(i) ,n ∂θ i,n i,n ∂θ j(i) ,n ∆ ℓ¨ i,n n i=1 j =1 (i) 2 12 ℓ¨ i,n n 2 12 1 1 −− ∂ ¯ ¯ ≤ E E Vj(i) ,n (θ) Vi,n (θ) E ∆ ℓ¨ i,n n i=1 j =1 ∂θ (i) 2 21 ℓ¨ i,n n 2 21 1 −− 1 ∂ E Vi,n (θ) ¯ ¯ Vj ,n (θ) + E E ∆ ℓ¨ i,n n i=1 j =1 ∂θ (i)
≡ M1n + M2n + M3n
(A.19)
1
suffices to show the second part of Theorem 2(a). Without loss of generality, we assume Jn is a scalar random variable. Note that
∂ ¯ ¯ V ( θ ) + V ( θ ) i,n ∂θ j(i) ,n
1
ℓ¨ i,n ℓ¨ a,n n − n − − − E E Vi,n Vj(i) ,n Va,n Vb(a) ,n
δ 2 nE ℓ¨ n i=1 j =1 a=1 b =1 (i) (a) ∗ ∗ ∗ ∗ dij ,n dij ,n dab ,n dab ,n (i) (a) (i) (a) × K −K K −K d¨ n d¨ n dˆ n dˆ n C
1
δ 2 nE ℓ¨ n ×
≤
1
ℓ¨ i,n ℓ¨ a,n n − n − − − E E Vi,n Vj(i) ,n Va,n Vb(a) ,n i=1 j(i) =1 a=1 b(a) =1
d∗ij ,n d∗ab ,n d¨ n (i) (a) d¨ n
d¨ n
dˆ n
n n 1 −−−− E Vi,n Vj,n Va,n Vb,n . 2 δ E ℓ¨ n n i=1 j=1 a=1 b=1
C
2
1
n
2 −1
n
(A.20)
To compute the order of the above upper bound, we note that
n
n
n
n
d¨ n × − 1 E Vi,n Vj(i) ,n Va,n Vb(a) ,n E dˆ n
1 −−−− E Vi,n Vj,n Va,n Vb,n 2 n i=1 j=1 a=1 b=1
n n n n 1 −−−− = 2 E n i =1 j =1 a =1 b =1
×
np −
rjℓ2 ,n εℓ2
ℓ2 =1 n
np −
riℓ1 ,n εℓ1
np −
raℓ3 ,n εℓ3
ℓ 3 =1 n
n
≤
ℓ1 =1
np − rbℓ2 ,n εℓ4 ℓ =1 4
np
n
1 −−−−−
=
n2 i=1 j=1 a=1 b=1 ℓ=1
+
+
+
≤
ℓ
i=1 j=1 a=1 b=1
4 2 n n n − p 1 −− riℓ,n γi,j = O(1), ≤C +3
n i=1 j=1
(A.21)
so the upper bound in (A.20) is o(1), which implies that M2n = op (1). The next step is to show M3n = op (1). As before, we can focus
√
on the event E = { n|d¨ n /dˆ n − 1| < C }. For any given δ > 0, ∗ n n dij ,n n 1− − ( i) P (‖M3n ‖ ≥ δ) = P Vˆ i,n Vˆ j(i) ,n K E ℓ¨ ≥δ n i=1 dˆ n n j(i) =ℓ¨ i,n +1 ∗ n n dij ,n n 1− − (i) =P K Vi,n Vj(i) ,n 1 + op (1) ≥ δ dˆ n E ℓ¨ n n i=1 j(i) =ℓ¨ i,n +1 ∗ n n dij ,n − − n (i) c 1 K ≤ P Vi,n Vj(i) ,n ≥ δ, E + P (E ) E ℓ¨ ˆ n i=1 d n n ¨ j(i) =ℓi,n +1 2 ∗ ℓˆ i,n n dij ,n 1− − n (i) −2 ≤δ E K Vi,n Vj(i) ,n E + P E c . ¨ ˆ n E ℓn dn i=1 ¨ j(i) =ℓi,n +1
But
E
≤
n E ℓ¨ n 1
nE ℓ¨ n
n 1−
ℓi,n − ˆ
n i=1 j(i) =ℓ¨ i,n +1
n − E
ℓi,n − ˆ
K
n −
d∗ij ,n (i) dˆ n
2
Vi,n Vj(i) ,n E
ℓˆ a,n −
i=1 j =ℓ¨ +1 a=1 b =ℓ¨ a,n +1 (i) i,n (a)
∗ dij ,n (i) − K (1) K dˆ n
∗ dab ,n (a) × K − K (1) E Vi,n Vj(i) ,n Va,n Vb(a) ,n E dˆ n ≤
cL2 nE ℓ¨ n
n − E
ℓi,n − ˆ
n −
ℓˆ a,n −
i=1 j =ℓ¨ +1 a=1 b =ℓ¨ a,n +1 (a) (i) i,n
∗ dij ,n (i) − 1 dˆ n
∗ dab ,n (a) × − 1 E Vi,n Vj(i) ,n Va,n Vb(a) ,n E dˆ n ≤
cL2 nE ℓ¨ n
n − E
ℓi,n − ˆ
n −
ℓˆ a,n −
i=1 j =ℓ¨ +1 a=1 b =ℓ¨ a,n +1 (i) i,n (a)
ℓˆ a,n −
n2 i=1 a=1 j(i) =ℓ¨ i,n +1 b(a) =ℓ¨ a,n +1
h→∞ n→∞
γi,b γj,a
n
ˆ
E Vi,n Vj(i) ,n Va,n Vb(a) ,n
n n n n cL2 1 − − − − E Vi,n Vj,n Va,n Vb,n = o(1), 2 E ℓ¨ n n i=1 a=1 j=1 b=1
lim lim
n n n n 1 −−−−
i =1
ℓi,n −
n E ℓ¨ n
(Jˆn (dˆ n ) −
Proof of Corollary 2. By Corollary 2 and Theorem 2(b),
γi,j γa,b
n2 i=1 j=1 a=1 b=1
E
n n 1 −−
Jˆn (d¨ n )) = op (1). The first equality of Theorem 2(b) holds by applying Lemma 3 in the same way as in the proof of the first equality of Theorem 1(d). Then the second equality of Theorem 2(b) holds by Theorem 1(d).
γi,a γj,b
n n n n 1 −−−−
n2
E ℓ¨ n
using Eq. (A.21). Hence M3n = op (1). Consequently,
riℓ,n rjℓ,n raℓ,n rbℓ,n E ε 4
n n n n 1 −−−−
n2 i=1 j=1 a=1 b=1
cL2
d¨ n − 1 ˆ dn
MSEh n2q/(2q+η) , Jˆn (d˙ n ), Sn
− MSEh n2q/(2q+η) , Jˆn (dˆ n ), Sn = lim lim MSEh n2q/(2q+η) , Jˆn (dn ), Sn h→∞ n→∞ − MSEh n2q/(2q+η) , Jˆn (d¨ n ), Sn .
(A.22)
Since $\ddot{g} = g$ and $\ddot{g}^{(q)} = g^{(q)}$, $\ddot{d}_n = d_n^\star$. Corollary 1 implies that the expression in (A.22) is $\ge 0$ with the inequality being strict unless $d_n = d_n^\star + o(n^{1/(2q+\eta)})$.

References

Andrews, D., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers.
Bester, A., Conley, T., Hansen, C., Vogelsang, T., 2009. Fixed-b asymptotics for spatially dependent robust nonparametric covariance matrix estimators. Working Paper. Department of Economics, Michigan State University.
Conley, T., 1996. Econometric modelling of cross sectional dependence. Ph.D. Thesis. University of Chicago, Dept. of Economics.
Conley, T., 1999. GMM estimation with cross sectional dependence. Journal of Econometrics 92 (1), 1–45.
Conley, T., Molinari, F., 2007. Spatial correlation robust inference with errors in location or distance. Journal of Econometrics 140 (1), 76–96.
Davidson, R., Flachaire, E., 2001. The wild bootstrap, tamed at last. Working Papers. Department of Economics, Queen’s University.
Goncalves, S., Vogelsang, T., 2008. Block bootstrap HAC robust tests: the sophistication of the naive bootstrap. Working Paper. Universite de Montreal and Michigan State University.
Hannan, E., 1970. Multiple Time Series. Wiley, New York.
Kelejian, H., Prucha, I., 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics 17 (1), 99–121.
Kelejian, H., Prucha, I., 2007. HAC estimation in a spatial framework. Journal of Econometrics 140 (1), 131–154.
Kiefer, N., Vogelsang, T., 2002. Heteroskedasticity-autocorrelation robust testing using bandwidth equal to sample size. Econometric Theory 18 (06), 1350–1366.
Kiefer, N., Vogelsang, T., 2005. A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory 21 (06), 1130–1164.
Lee, L., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models. Econometrica 72 (6), 1899–1925.
Liu, R.Y., 1988. Bootstrap procedure under some non-I.I.D. models. Annals of Statistics 16, 1696–1708.
Newey, W., West, K., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Newey, W., West, K., 1994. Automatic lag selection in covariance matrix estimation. Review of Economic Studies 61, 631–653.
Parzen, E., 1957. On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics 28, 329–348.
Pinkse, J., Slade, M., Brett, C., 2002. Spatial price competition: a semiparametric approach. Econometrica 70 (3), 1111–1153.
Politis, D., 2007. Higher-order accurate, positive semi-definite estimation of large-sample covariance and spectral density matrices. Working Paper. Department of Mathematics, UC San Diego.
Robinson, P., 2007. Nonparametric spectrum estimation for spatial data. Journal of Statistical Planning and Inference 137 (3), 1024–1034.
Sun, Y., Phillips, P., 2008. Bandwidth Choice for Interval Estimation in GMM Regression. Working Paper. Department of Economics, UC San Diego.
Sun, Y., Phillips, P., Jin, S., 2008. Optimal bandwidth selection in heteroskedasticity-autocorrelation robust testing. Econometrica 76 (1), 175–194.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48 (4), 817–838.
White, H., 1984. Asymptotic Theory for Econometricians. Academic Press, Orlando, FL.