SHORT-MEMORY LINEAR PROCESSES AND ECONOMETRIC APPLICATIONS
KAIRAT T. MYNBAEV
International School of Economics
Kazakh-British Technical University
Almaty, Kazakhstan
Copyright # 2011 by John Wiley & Sons, Inc. All rights reserved Published by John Wiley & Sons, Inc., Hoboken, New Jersey Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in variety of electronic formats. Some content that appears in print may not be available in electronic format. For more information about Wiley products, visit our web site at www.wiley.com. Library of Congress Cataloging-in-Publication Data: Mynbaev, K. T. (Kairat Turysbekovich) Short-memory linear processes and econometric applications / Kairat T. Mynbaev. p. cm. Includes bibliographical references and index. ISBN 978-0-470-92419-8 1. Linear programming. 2. Econometric models. 3. Regression analysis. 4. Probabilities. I. Title. T57.74.M98 2011 519.70 2—dc22 2010040947
Printed in the United States of America 10 9 8 7 6 5 4 3 2 1
To my teacher Mukhtarbai Otelbaev, from whom I learnt the best I know.
CONTENTS

List of Tables
Preface
Acknowledgments

1 INTRODUCTION TO OPERATORS, PROBABILITIES AND THE LINEAR MODEL
1.1 Linear Spaces
1.2 Normed Spaces
1.3 Linear Operators
1.4 Hilbert Spaces
1.5 Lp Spaces
1.6 Conditioning on σ-Fields
1.7 Matrix Algebra
1.8 Convergence of Random Variables
1.9 The Linear Model
1.10 Normalization of Regressors
1.11 General Framework in the Case of K Regressors
1.12 Introduction to L2-Approximability

2 Lp-APPROXIMABLE SEQUENCES OF VECTORS
2.1 Discretization, Interpolation and Haar Projector in Lp
2.2 Convergence of Bilinear Forms
2.3 The Trinity and Its Boundedness in lp
2.4 Convergence of the Trinity on Lp-Generated Sequences
2.5 Properties of Lp-Approximable Sequences
2.6 Criterion of Lp-Approximability
2.7 Examples and Counterexamples

3 CONVERGENCE OF LINEAR AND QUADRATIC FORMS
3.1 General Information
3.2 Weak Laws of Large Numbers
3.3 Central Limit Theorems for Martingale Differences
3.4 Central Limit Theorems for Weighted Sums of Martingale Differences
3.5 Central Limit Theorems for Weighted Sums of Linear Processes
3.6 Lp-Approximable Sequences of Matrices
3.7 Integral Operators
3.8 Classes σp
3.9 Convergence of Quadratic Forms of Random Variables

4 REGRESSIONS WITH SLOWLY VARYING REGRESSORS
4.1 Slowly Varying Functions
4.2 Phillips Gallery 1
4.3 Slowly Varying Functions with Remainder
4.4 Results Based on Lp-Approximability
4.5 Phillips Gallery 2
4.6 Regression with Two Slowly Varying Regressors

5 SPATIAL MODELS
5.1 A Math Introduction to Purely Spatial Models
5.2 Continuity of Nonlinear Matrix Functions
5.3 Assumption on the Error Term and Implications
5.4 Assumption on the Spatial Matrices and Implications
5.5 Assumption on the Kernel and Implications
5.6 Linear and Quadratic Forms Involving Segments of K
5.7 The Roundabout Road
5.8 Asymptotics of the OLS Estimator for Purely Spatial Model
5.9 Method of Moments and Maximum Likelihood
5.10 Two-Step Procedure
5.11 Examples and Computer Simulation
5.12 Mixed Spatial Model
5.13 The Roundabout Road (Mixed Model)
5.14 Asymptotics of the OLS Estimator for Mixed Spatial Model

6 CONVERGENCE ALMOST EVERYWHERE
6.1 Theoretical Background
6.2 Various Bounds on Martingale Transforms
6.3 Marcinkiewicz–Zygmund Theorems and Related Results
6.4 Strong Consistency for Multiple Regression
6.5 Some Algebra Related to Vector Autoregression
6.6 Preliminary Analysis
6.7 Strong Consistency for Vector Autoregression and Related Results

7 NONLINEAR MODELS
7.1 Asymptotic Normality of an Abstract Estimator
7.2 Convergence of Some Deterministic and Stochastic Expressions
7.3 Nonlinear Least Squares
7.4 Binary Logit Models with Unbounded Explanatory Variables

8 TOOLS FOR VECTOR AUTOREGRESSIONS
8.1 Lp-Approximable Sequences of Matrix-Valued Functions
8.2 T-Operator and Trinity
8.3 Matrix Operations and Lp-Approximability
8.4 Resolvents
8.5 Convergence and Bounds for Deterministic Trends

REFERENCES
Author Index
Subject Index
LIST OF TABLES

4.1 Basic SV Functions
4.2 Transition Matrix Summary
4.3 Type-Wise OLS Asymptotics
4.4 Transition Matrix Summary in Case II: $|\lambda_d| < 1$, $\beta_1 \lambda_d + \beta_2 = 0$, $\beta_2 \ne 0$
5.1 Simulations for Theorem 5.8.1
5.2 Simulations for Two-Step Estimator
5.3 Comparison of Percentage Errors
5.4 Asymptotic Distribution with a Constant Term and Case Matrix
5.5 Simulation Results for Pseudo-Case Matrices
5.6 Simulation Results for Case Matrices
6.1 Contributions to the Consistency Theory of Autoregressions
7.1 Properties of Bernoulli Variables
PREFACE
1 RED LIGHT

There are no new econometric models in this book. You will not find real-life applications or tests of economic theories either.
2 GREEN LIGHT

The book concentrates on the methodology of asymptotic theory in econometrics. Specifically, central limit theorems (CLTs) for weighted sums of short-memory processes are obtained. They are applied to several well-known econometric models to demonstrate how their asymptotic behavior can be studied, what kind of assumptions are (in)appropriate and how probabilistic convergence statements are applied.
Currently, no monographs or textbooks are devoted specifically to econometric models with deterministic regressors. The field is considered rather narrow by some specialists because the first thing they think about is polynomial trends. Indeed, polynomial trends are not widely used in econometrics. However, some other types of regressors fall into the classes of deterministic regressors considered in the literature; for example, some spatial matrices and seasonal dummies. This makes deterministic explanatory variables more important than commonly thought. Besides, on the level of CLTs deterministic weights are of interest in themselves. There is a monograph by Taylor (1978) devoted exclusively to such theorems.
3 THE ESSENCE

By and large, CLTs here are based on only one global idea: how sequences of discrete objects (vectors and matrices) can be approximated with functions of a continuous argument (defined on the segment [0, 1]). Stated in this general way, the idea is as old as calculus. The novelty here consists in application of the idea to weighted sums of linear processes
$$\sum_{t=1}^{n} w_{nt} u_t, \qquad (0.1)$$
where $w_n = (w_{n1}, \ldots, w_{nn})$, $n = 1, 2, \ldots$ is a sequence of deterministic vector weights and
$$u_t = \sum_{j=-\infty}^{\infty} c_{t-j} e_j, \quad t = 0, \pm 1, \pm 2, \ldots \qquad (0.2)$$
is a short-memory linear process. Anybody with a little experience in probabilities, statistics, and econometrics can confirm that statements on convergence in distribution of such weighted sums have many applications. As it turned out, the main difficulties in proving precise CLTs lay in the theory of functions. Hence, attempts to obtain general CLTs for sums of type Eq. (0.1) by researchers with backgrounds other than the theory of functions yielded results less satisfactory than those published in my paper (Mynbaev, 2001) on Lp -approximable sequences. My interest in CLTs for Eq. (0.1) arose from the necessities of regression analysis. In the asymptotic theory of regressions with deterministic regressors, sequences of regressors can be approximated by functions of a continuous argument. The structure of the corresponding estimators allows for application of CLTs for Eq. (0.1). As I was developing applications, I needed various additional properties of Lp -approximable sequences. They are distributed throughout the book and, taken together, constitute a complete toolkit accompanying the main CLTs. In the econometrics context, two other definitions of deterministic regressors are suggested in the literature. A purely algebraic definition (based on recursion) was proposed by Johansen (2000) and developed further by Nielsen (2005) to study strong consistency of ordinary least squares (OLS) estimators. Nielsen’s result, given in Chapter 8, shows that such regressors are asymptotically polynomial functions multiplied by oscillating (trigonometric) functions. The Johansen – Nielsen approach and Lp -approximability complement each other. Phillips (2007) has defined regressors in terms of slowly varying (SV) functions (which is a functional– theoretical construction). Slow variation is a limit property at infinity and, in general, has nothing to do with Lp -approximability, which is a limit property distributed over the segment [0, 1]. However, special sequences arising from SV functions in the regression context are all Lp -approximable.
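As a purely illustrative sketch, not part of the book, the weighted sum (0.1) of a short-memory linear process (0.2) can be simulated numerically. The kernel $c_j = 0.5^{|j|}$, the weights $w_{nt} = n^{-1/2}$, the truncation of the kernel and the helper `linear_process` are all arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_process(n, c, burn=200):
    """Simulate u_t = sum_{|j|<=J} c_j e_{t-j}, with the kernel truncated to |j| <= J."""
    J = len(c) // 2                      # kernel runs over j = -J..J
    e = rng.standard_normal(n + 2 * J + burn)
    u = np.array([np.dot(c, e[k:k + 2 * J + 1]) for k in range(burn, burn + n)])
    return u

n = 10_000
j = np.arange(-20, 21)
c = 0.5 ** np.abs(j)                     # absolutely summable kernel => short memory
w = np.full(n, 1 / np.sqrt(n))           # deterministic weights w_{nt}

u = linear_process(n, c)
weighted_sum = np.dot(w, u)              # the statistic in Eq. (0.1)
print(weighted_sum)
```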
4 STANDING PROBLEMS

About half of the results contained in the book were obtained after I started writing it. The theory has grown to the extent that no single person can embrace all the ramifications.

1. Linear processes (0.2), depending on the rate at which the numbers $c_j$ vanish at infinity, are classified as follows. Processes for which
$$\sum_{j=-\infty}^{\infty} |c_j| < \infty \qquad (0.3)$$
are called short-memory processes. Processes for which the series in Eq. (0.3) diverges, but
$$\sum_{j=-\infty}^{\infty} c_j^2 < \infty,$$
are called long-memory processes. My CLT for weighted sums Eq. (0.1) holds in the case of short-memory processes. The existing CLTs for long-memory ones, as deep as they are, leave some questions open.

2. The main advantage of representing sequences of vectors with the help of functions of a continuous argument is that the limit expressions in asymptotic distributions involve integrals of those functions. Thus, they are amenable to further analysis, which I call analysis at infinity. For this reason alone, when my definition of Lp-approximability does not fit practical situations (and there is at least one, in spatial econometrics), developing a more suitable definition may be better than relinquishing the concept altogether.

3. The name of the book reflects its coverage rather than its potential. There are two important directions in which it can be extended. One is nonparametric and nonlinear estimation, where even my CLT will suffice for the beginning. Another is the case of stochastic regressors. In this case Anderson and Kunitomo (1992) impose conditions on separate parts of the OLS estimator that allow them to prove its convergence. As an alternative, I would embed enough structure in the stochastic regressors to be able to derive convergence of separate parts of the OLS estimator. The structure entailed by Lp-approximability in the deterministic case may guide the choice for the stochastic case.
5 REVIEW BY CHAPTERS

Chapter 1 is a collection of general ideas and preliminaries from probability theory and functional analysis. It also contains a discussion of Lp-approximability and its advantages. The first nontrivial application is to the convergence in distribution of the fitted value for the linear regression. This convergence looks to some econometricians so incredible that an anonymous referee of Econometric Theory said that my paper was "full of mistakes." Naturally, the paper was rejected and, not so naturally, the result was not published in journals. Thus this book was written. The discussion of issues related to normalization of regressors draws from folklore and should be in the core of any course on asymptotic theory in econometrics.
Chapter 2 covers the nonstochastic part of my paper (Mynbaev, 2001). Readers with taste for mathematical precision will find it illuminating that Lp-approximability (which relates sequences of vectors to functions defined on [0, 1]) can be characterized intrinsically (in terms of sequences of vectors themselves). This is evidence of a well-balanced definition. On a more practical note, such results and their by-products make sure that the ensuing CLTs are the most precise and general.
The main CLTs are proved in Chapter 3, which is based on Mynbaev (2001), but I would like to acknowledge the influence of Nabeya and Tanaka (1990) who paved the way to treating convergence of quadratic forms. This is where the theory of integral operators is needed and introduced first. There are many CLTs and weak laws of large numbers (WLLN) out there. The reader will notice that when the innovations $e_j$ in linear processes [Eq. (0.2)] are martingale differences (m.d.'s), the McLeish CLT (McLeish, 1974) and the Chow–Davidson WLLN (Davidson, 1994) are absolutely sufficient for the purposes of Chapter 3. I dare to suggest trying these tools first in all other problems with linear processes involving m.d.'s.
Serious applications (to static models) start in Chapter 4. Phillips (2007) developed a nice scheme of investigating asymptotic properties of regressions with regressors such as log s, log(log s), their reciprocals and so on. Chapter 4 follows this scheme, while the underlying central limit results are derived from my CLT. This is possible because Phillips' specification of the weights in Eq. (0.1) is a special case of Lp-approximable sequences. One of the methodological conclusions of this chapter is that direct derivation of a CLT given in Chapter 3 is better than recourse to Brownian motion used by Phillips.
Chapter 5 demonstrates what can be done with the help of Lp-approximability in the theory of spatial models. This research started with a joint paper (Mynbaev and Ullah, 2008) in which we showed that the OLS estimator for a purely spatial model is not asymptotically normal. In Mynbaev (2010), this result is extended to a mixed spatial model. Spatial models are peculiar in many respects, a full discussion of which would be too technical for a preface. It is worth stating here only the most general methodological conclusion. When studying the asymptotic behavior of a new model, never presume it is of a certain class. Otherwise, you will be bound to use specific techniques that will take you to a particular result so you will not see the general picture.
In the 1980s, Lai and Wei in a series of papers (Lai and Wei, 1982, 1983a, 1983b, 1985) obtained outstanding results on strong consistency of the OLS estimator for the linear model, with and without autoregressive terms. Reading those papers is a thankless task because the solution to a large problem is divided into publishable articles and the times of publication of the articles are not the best reflection of the logic of the solution. Chapter 6 is an attempt to expound Lai and Wei's theory coherently.
Chapter 7 contains a treatment of two nonlinear estimators: nonlinear least squares (NLS) and maximum likelihood (ML). The choice of the models is explained by the fact that in both cases the explanatory variables are deterministic. The first part of the chapter covers the Phillips (2007) result for the model $y_s = \beta s^{\gamma} + u_s$. The second part is my extension to unbounded explanatory variables of the approach to binary logit models suggested by Gouriéroux and Monfort (1981).
Finally, Chapter 8 contains a study of algebraic properties of Lp-approximable sequences of matrix-valued functions and a study of a different type of deterministic trends from Nielsen (2005). The applications to vector autoregressions (VARs) with deterministic trends are left out.
6 EXPOSITION

The book is analytical in nature, meaning that there is a lot of formula manipulation. Most calculations are detailed so they can be followed without a pen and paper. To simplify the reader's job, all meaningful parts of proofs are given in separate statements. Because of this, some proofs look longer than they are. Commuters who need to do their reading in buses and trains will benefit from such exposition.
Only the core theoretical results are collected in Chapters 2 and 3. All others are given immediately before they are applied (including some CLTs). Thus, application-specific properties of Lp-approximable sequences, as well as parts of the theory of integral operators, are scattered throughout the book.
If someone were to lecture using this book, I have imagined how clumsy it would be to say, "Let us recall the function defined by Eq. (9) in Lecture 3." For this reason I have tried to give names not only to final statements, but also to auxiliary objects, such as lemmas, functions and operators. In most cases the names reflect the roles performed by such objects. Thus, you will see bad and good coefficients, a chain product, annihilation lemma, balancer, cutter and the like. However, in a couple of cases descriptive names would be too long, and the names I give reflect the look, not the role. There is a projy and proXy, an awkward aggregate, genie (because of $G_n$) and so on.
No subsection contains more than one statement. Therefore statements are referred to by the section they are in. Thus, Lemma 3.1.2 means the statement from subsection 3.1.2, even though the name 'Lemma' may not be there. Equation numbering follows the Wiley standard: Eq. (7.1) means equation 1 from Chapter 7. To make the book self-contained, most preliminaries are given in the book. All calculations are detailed with extensive cross-referencing.
7 SUGGESTIONS FOR READING

The variety and depth of mathematical theories used by econometricians can be a serious obstacle for novices. Davidson (1994) has done an excellent job in gathering in one place the required minimum, from measure theory to stochastic processes. For me, this is the most important book I have read in the past 10 years, and I recommend it for preliminary or concurrent reading.
A partial excuse for the limited coverage of the existing literature is that during the four years that I was working on the book I did not receive any support, financial or otherwise, and did not have access to a good library, except when I traveled to international conferences. At the final stage, when the book was in production, I received useful references from some colleagues. Regarding weighted sums and their applications in econometrics, Jonathan B. Hill suggests reading Čížek (2008), Goldie and Smith (1987), Hahn et al. (1987), Hill (2010, 2011) and references therein. Jan Mielniczuk, who contributed a lot to the theory of long-memory processes not covered here, proposes reading Wu (2005) and Wu and Min (2005) for the most recent developments in the area. M. H. Pesaran was kind enough to provide references Chudik et al. (2010), Holly et al. (2008) and Pesaran and Chudik (2010) for spatial models, vector autoregressions and panel data models.
Personally, I find nothing more gratifying than reading applied econometric papers because they abound in new ideas. Sometimes they also show how things should not be done. See more about this in Chapter 1.

Kairat T. Mynbaev
Almaty, Kazakhstan
ACKNOWLEDGMENTS
I am grateful to Carlos Martins Filho for his encouragement for this book and many other projects. The folks at John Wiley & Sons have been highly efficient in preparing the book for publication. The production process surely involved many people of whom I would like to especially thank Susanne Steitz-Filler, Christine Punzo, Jacqueline Palmieri, and Nick Barber (Books Manager, Techset Composition Ltd).
CHAPTER 1

INTRODUCTION TO OPERATORS, PROBABILITIES AND THE LINEAR MODEL

THIS CHAPTER has a little bit of everything: normed and Hilbert spaces, linear operators, probabilities, including conditional expectations and different modes of convergence, and matrix algebra. Introduction to the OLS method is given along with a discussion of methodological issues, such as the choice of the format of the convergence statement, choice of the conditions sufficient for convergence and the use of $L_2$-approximability. The exposition presumes that the reader is versed more in the theory of probabilities than in functional analysis.
1.1 LINEAR SPACES

In this book basic notions of functional analysis are used more frequently than in most other econometric books. Here I explain these notions the way I understand them, omitting some formalities and emphasizing the intuition.
1.1.1 Linear Spaces

The Euclidean space $R^n$ is a good point of departure when introducing linear spaces. An element $x = (x_1, \ldots, x_n) \in R^n$ is called a vector. Two vectors $x, y$ can be added coordinate by coordinate to obtain a new vector
$$x + y = (x_1 + y_1, \ldots, x_n + y_n). \qquad (1.1)$$
A vector $x$ can be multiplied by a number $a \in R$, giving $ax = (ax_1, \ldots, ax_n)$. By combining these two operations we can form expressions like $ax + by$ or, more generally,
$$a_1 x^{(1)} + \cdots + a_m x^{(m)}, \qquad (1.2)$$
where $a_1, \ldots, a_m$ are numbers and $x^{(1)}, \ldots, x^{(m)}$ are vectors. Expression (1.2) is called a linear combination of vectors $x^{(1)}, \ldots, x^{(m)}$ with coefficients $a_1, \ldots, a_m$. Generally, multiplication of vectors is not defined.
Here we observe the major difference between R and Rn . In R both summation a þ b and multiplication ab can be performed. In Rn we can add two vectors, but to multiply them we use elements of another set – the set of real numbers (or scalars) R. Generalizing upon this situation we obtain abstract linear (or vector) spaces. The elements x, y of a linear space L are called vectors. They can be added to give another vector x þ y. Summation is defined axiomatically and, in general, there is no coordinate representation of type (1.1) for summation. A vector x can be multiplied by a scalar a [ R. As in Rn , we can form linear combinations [Eq. (1.2)]. The generalization is pretty straightforward, so what’s the big deal? You see, in functional analysis complex objects, such as functions and operators, are considered vectors or points in some space. Here is an example. Denote C[0, 1] the set of continuous functions on the segment [0, 1]. The sum of two functions F, G [ C[0, 1] is defined as the function F þ G with values (F þ G)(t) ¼ F(t) þ G(t), t [ [0, 1] [this is an analog of Eq. (1.1)]. Continuity of F, G implies continuity of their sum and of the product aF, for a a scalar, so C[0, 1] is a linear space.
1.1.2 Subspaces of Linear Spaces A subset L1 of a linear space L is called its linear subspace (or just a subspace, for simplicity) if all linear combinations ax þ by of any elements x, y [ L1 belong to L1 . Obviously, the set f0g and L itself are subspaces of L, called trivial subspaces. For example, in Rn the set L1 ¼ {x : c1 x1 þ þ cn xn ¼ 0} is a subspace because if x, y [ L1 , then c1 (ax1 þ by1 ) þ þ cn (axn þ byn ) ¼ 0. Thus, in R3 the usual straight lines and two-dimensional (2-D) planes containing the origin are subspaces. All intuition we get from our day-to-day experience with the space we live in applies to subspaces. Geometrically, summation x þ y is performed by the parallelogram rule. Multiplying x by a number a = 0 we obtain a vector ax of either the same (a . 0) or opposite (a , 0) direction. Multiplying x by all real numbers, we obtain a straight line {ax : a [ R} passing through the origin and parallel to x. This is a particular situation in which it may be convenient to call x a point rather than a vector. Then the previous sentence sounds like this: multiplying x by all real numbers we get a straight line passing through the origin and the given point x. For a given x1 , . . . , xn its linear span M is, by definition, the least linear space of L containing those points. In the case n ¼ 2 it can be constructed as follows. Draw a straight line L1 ¼ {ax1 : a [ R} through the origin and x1 and another straight line L2 ¼ {ax2 : a [ R} through the origin and x2 . Then form M by adding elements of L1 and L2 using the parallelogram rule: M ¼ {x þ y : x [ L1 , y [ L2 }.
1.1.3 Linear Independence Vectors x1 , . . . , xn are linearly independent if the linear combination c1 x1 þ þ cn xn can be null only when all coefficients are null. EXAMPLE 1.1. Denote by ej ¼ (0, . . . , 0, 1, 0, . . . , 0) (unity in the jth place) the jth unit vector in Rn . From the definition of vector operations in Rn we see that
c1 e1 þ þ cn en ¼ (c1 , . . . , cn ). Hence, the equation c1 e1 þ þ cn en ¼ 0 implies equality of all coefficients to zero and the unit vectors are linearly independent. If in a linear space L there exist vectors x1 , . . . , xn such that 1. x1 , . . . , xn are linearly independent and 2. any other vector x [ L is a linear combination of x1 , . . . , xn , then L is called n-dimensional and the system {x1 , . . . , xn } is called its basis. If, on the other hand, for any natural n, L contains n linearly independent vectors, then L is called infinite-dimensional. EXAMPLE 1.2. The unit vectors in Rn form a basis because they are linearly independent and for any x [ Rn we can write x ¼ (x1 , . . . , xn ) ¼ x1 e1 þ þ xn en . EXAMPLE 1.3. C[0, 1] is infinite-dimensional. Consider monomials xj (t) ¼ t j , j ¼ 0, . . . , n. By the main theorem of algebra, the equation c0 x0 (t) þ þ cn xn (t) ¼ 0 with nonzero coefficients can have at most n roots. Hence, if c0 x0 (t) þ þ cn xn (t) is identically zero on [0, 1], the coefficients must be zero, so these monomials are linearly independent. Functional analysis deals mainly with infinite-dimensional spaces. Together with the desire to do without coordinate representations of vectors this fact has led to the development of very powerful methods.
1.2 NORMED SPACES

1.2.1 Normed Spaces

The Pythagorean theorem gives rise to the Euclidean distance
$$\mathrm{dist}(x, y) = \sqrt{\sum_i (x_i - y_i)^2} \qquad (1.3)$$
between points $x, y \in R^n$. In an abstract situation, we can first axiomatically define the distance $\mathrm{dist}(x, 0)$ from $x$ to the origin and then the distance between any two points will be $\mathrm{dist}(x, y) = \mathrm{dist}(x - y, 0)$ (this looks like tautology, but programmers use such definitions all the time). $\mathrm{dist}(x, 0)$ is denoted $\|x\|$ and is called a norm.
Let $X$ be a linear space. A real-valued function $\|\cdot\|$ defined on $X$ is called a norm if
1. $\|x\| \ge 0$ (nonnegativity),
2. $\|ax\| = |a|\,\|x\|$ for all numbers $a$ and vectors $x$ (homogeneity),
3. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality) and
4. $\|x\| = 0$ implies $x = 0$ (nondegeneracy).
By homogeneity the norm of the null vector is zero:
$$\|0_{\text{(vector)}}\| = \|0_{\text{(number)}} \cdot 0_{\text{(vector)}}\| = |0|\,\|0\| = 0.$$
Nondegeneracy makes sure that the null vector is the only vector whose norm is zero. If we omit the nondegeneracy requirement, the result is the definition of a seminorm. Distance measurement is another context in which points and vectors can be used interchangeably. $\|x\|$ is a length of the vector $x$ and a distance from point $x$ to the origin.
In this book, the way norms are used for bounding various quantities is clear from the next two definitions. Let $\{X_i\}$ be a nested sequence of normed spaces, $X_1 \subseteq X_2 \subseteq \ldots$ Take one element from each of these spaces, $x_i \in X_i$. We say that $\{x_i\}$ is a bounded sequence if $\sup_i \|x_i\|_{X_i} < \infty$ and vanishing if $\|x_i\|_{X_i} \to 0$.
1.2.2 Convergence in Normed Spaces A linear space X provided with a norm k k is denoted (X, k k). This is often simplified to X. We say that a sequence {xn } converges to x if kxn xk ! 0. In this case we write lim xn ¼ x. Lemma (i) Vector operations are continuous: if lim xn ¼ x, lim yn ¼ y and lim an ¼ a, then lim an xn ¼ ax, lim(xn þ yn ) ¼ lim xn þ lim yn . (ii) If lim xn ¼ x, then limkxn k ¼ kxk (a norm is continuous in the topology it induces). Proof. (i) Applying the triangle inequality and homogeneity, kan xn axk k(an a)xk þ kan (xn x)k ¼ jan ajkxk þ kan kkxn xk ! 0: Here we remember that convergence of the sequence {an } implies its boundedness: supjan j , 1. (ii) Let us prove that kxk kyk kx yk:
(1:4)
The proof is modeled on a similar result for absolute values. By the triangle inequality, kxk kx yk þ kyk and kxk kyk kx yk: Changing the places of x and y and using homogeneity we get kyk kxk ky xk ¼ kx yk: The latter two inequalities imply Eq. (1.4). Equation (1.4) yields continuity of the norm: jkxn k kxkj kxn xk ! 0: B
We say that {xn } is a Cauchy sequence if limn,m!1 (xn xm ) ¼ 0. If {xn } converges to x, then it is a Cauchy sequence: kxn xm k kxn xk þ kx xm k ! 0. If the converse is true (that is, every Cauchy sequence converges), then the space is called complete. All normed spaces considered in this book are complete, which ensures the existence of limits of Cauchy sequences.
1.2.3 Spaces lp

A norm more general than (1.3) is obtained by replacing the index 2 by an arbitrary number $p \in [1, \infty)$. In other words, in $R^n$ the function
$$\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p} \qquad (1.5)$$
satisfies all axioms of a norm. For $p = \infty$, definition (1.5) is completed with
$$\|x\|_\infty = \sup_i |x_i| \qquad (1.6)$$
because $\lim_{p\to\infty} \|x\|_p = \|x\|_\infty$. $R^n$ provided with the norm $\|\cdot\|_p$ is denoted $R^n_p$ ($1 \le p \le \infty$). The most immediate generalization of $R^n_p$ is the space $l_p$ of infinite sequences of numbers $x = (x_1, x_2, \ldots)$ that have a finite norm $\|x\|_p$ [defined by Eqs. (1.5) or (1.6), where $i$ runs over the set of naturals $N$]. More generally, the set of indices $I = \{i\}$ in Eq. (1.5) or Eq. (1.6) may depend on the context. In addition to $R^n_p$ we use $M_p$ (the set of matrices of all sizes).
The $j$th unit vector in $l_p$ is an infinite sequence $e_j = (0, \ldots, 0, 1, 0, \ldots)$ with unity in the $j$th place and 0 in all others. It is immediate that the unit vectors are linearly independent and $l_p$ is infinite-dimensional.
1.2.4 Inequalities in lp

The triangle inequality in $l_p$,
$$\|x + y\|_p \le \|x\|_p + \|y\|_p,$$
is called the Minkowski inequality. Its proof can be found in many texts, which is not true with respect to another, less known, property that is natural to call monotonicity of $l_p$ norms:
$$\|x\|_p \le \|x\|_q \quad \text{for all } 1 \le q \le p \le \infty. \qquad (1.7)$$
If $x = 0$, there is nothing to prove. If $x \ne 0$, the general case can be reduced to the case $\|x\|_q = 1$ by considering the normalized vector $x/\|x\|_q$. $\|x\|_q = 1$ implies $|x_i| \le 1$ for all $i$. Hence, if $p < \infty$, we have
$$\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p} \le \left(\sum_i |x_i|^q\right)^{1/p} = \left(\sum_i |x_i|^q\right)^{1/q} = \|x\|_q.$$
If $p = \infty$, the inequality $\sup_i |x_i| \le \|x\|_q$ is obvious.
In $l_p$ there is no general inequality opposite to Eq. (1.7). In $R^n_p$ there is one. For example, in the case $n = 2$ we can write
$$\max\{|x_1|, |x_2|\} \le (|x_1|^p + |x_2|^p)^{1/p} \le 2^{1/p} \max\{|x_1|, |x_2|\}.$$
All such inequalities are easy to remember under the general heading of equivalent norms. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ defined on the same linear space $X$ are called equivalent if there exist constants $0 < c_1 \le c_2 < \infty$ such that $c_1 \|x\|_1 \le \|x\|_2 \le c_2 \|x\|_1$ for all $x$.

Theorem. (Trenogin, 1980, Section 3.3) In a finite-dimensional space any two norms are equivalent.
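The following sketch, which is not part of the book, checks the monotonicity relation (1.7) and an equivalence bound numerically; the particular vector, exponents and the helper `p_norm` are arbitrary choices for illustration.

```python
import numpy as np

x = np.array([0.3, -1.2, 0.7, 2.5])

def p_norm(v, p):
    """l_p norm, with p = np.inf giving the sup-norm (1.6)."""
    return np.max(np.abs(v)) if np.isinf(p) else (np.abs(v) ** p).sum() ** (1 / p)

# Monotonicity (1.7): ||x||_p <= ||x||_q whenever q <= p.
for q, p in [(1, 2), (2, 4), (4, np.inf)]:
    assert p_norm(x, p) <= p_norm(x, q) + 1e-12

# Equivalence of norms in R^n: max_i|x_i| <= ||x||_p <= n^(1/p) * max_i|x_i|.
n, p = len(x), 3
assert p_norm(x, np.inf) <= p_norm(x, p) <= n ** (1 / p) * p_norm(x, np.inf)
print("monotonicity and equivalence bounds verified")
```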
1.3 LINEAR OPERATORS

1.3.1 Linear Operators

A linear operator is a generalization of the mapping $A: R^m \to R^n$ induced by an $n \times m$ matrix $A$ according to $y = Ax$. Let $L_1, L_2$ be linear spaces. A mapping $A: L_1 \to L_2$ is called a linear operator if
$$A(ax + by) = aAx + bAy \qquad (1.8)$$
for all vectors x, y [ L1 and numbers a, b. A linear operator is a function in the first place, and the general definition of an image applies to it: Im(A) ¼ {Ax : x [ L1 } # L2 : However, because of the linearity of A the image Im(A) is a linear subspace of L2 : Indeed, if we take two elements y1 , y2 of the image, then there exist x1 , x2 [ L1 such that Axi ¼ yi : Hence, a linear combination a1 y1 þ a2 y2 ¼ a1 Ax1 þ a2 Ax2 ¼ A(ax1 þ bx2 ) belongs to the image. With a linear operator A we can associate another linear subspace N(A) ¼ {x [ L1 : Ax ¼ 0} # L1 , called a null space of A. Its linearity easily follows from that of A: if x, y belong to the null space of A, then their linear combination belongs to it too: A(ax þ by) ¼ aAx þ bAy ¼ 0. The set of linear operators acting from L1 to L2 can be considered a linear space. A linear combination of operators aA þ bB of operators A, B is an operator defined by (aA þ bB)x ¼ aAx þ bBx. It is easy to check linearity of aA þ bB.
If A is a linear operator from L1 to L2 and B is a linear operator from L2 to L3 , then we can also define a product of operators BA by (BA)x ¼ B(Ax). Applying Eq. (1.8) twice we see that BA is linear: (BA)(ax þ by) ¼ B(aAx þ bAy) ¼ a(BA)x þ b(BA)y:
1.3.2 Bounded Linear Operators

Let $X_1, X_2$ be normed spaces and let $A: X_1 \to X_2$ be a linear operator. We can relate $\|Ax\|_2$ to $\|x\|_1$ by composing the ratio $\|Ax\|_2 / \|x\|_1$ if $x \ne 0$. $A$ is called a bounded operator if all such ratios are uniformly bounded, and the norm of an operator $A$ is defined as the supremum of those ratios:
$$\|A\| = \sup_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_1}. \qquad (1.9)$$
An immediate consequence of this definition is the bound $\|Ax\|_2 \le \|A\|\,\|x\|_1$ for all $x \in X_1$, from which we see that the images $Ax$ of elements of the unit ball $b_1 = \{x \in X_1 : \|x\|_1 \le 1\}$ are uniformly bounded:
$$\|Ax\|_2 \le \|A\| \quad \text{for all } x \in b_1. \qquad (1.10)$$
To save a word, a bounded linear operator is called simply a bounded operator. Let $B(X_1, X_2)$ denote the set of bounded operators acting from $X_1$ to $X_2$.

Lemma. $B(X_1, X_2)$ with the norm (1.9) is a normed space.

Proof. We check the axioms from Section 1.2.1 one by one.
1. Nonnegativity is obvious.
2. Homogeneity of Eq. (1.9) follows from that of $\|\cdot\|_2$.
3. The inequality $\|(A + B)x\|_2 \le \|Ax\|_2 + \|Bx\|_2$ implies
$$\|A + B\| = \sup_{x \ne 0} \frac{\|(A + B)x\|_2}{\|x\|_1} \le \sup_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_1} + \sup_{x \ne 0} \frac{\|Bx\|_2}{\|x\|_1} = \|A\| + \|B\|.$$
4. If $\|A\| = 0$, then $\|Ax\|_2 = 0$ for all $x$ and, consequently, $A = 0$. ∎
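To make the operator norm concrete, here is a small sketch of mine, not the book's: for a matrix acting between Euclidean spaces, the norm (1.9) with the 2-norm on both sides equals the largest singular value, and the bound (1.10) can be checked on random vectors. The matrix and the Monte Carlo approximation are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 4))          # a linear operator from R^4 to R^3

# Induced norm (1.9) with Euclidean norms: sup of ||Ax|| / ||x|| over x != 0.
ratios = []
for _ in range(10_000):
    x = rng.standard_normal(4)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))
estimate = max(ratios)

exact = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value
assert estimate <= exact + 1e-12                # Eq. (1.10): ||Ax|| <= ||A|| ||x||
print(round(estimate, 4), round(exact, 4))
```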
1.3.3 Isomorphism Let X1 , X2 be normed spaces. A linear operator I : X1 ! X2 is called an isomorphism if 1. kIxk2 ¼ kxk1 for all x [ X1 (preservation of norms) and 2. IX1 ¼ X2 (I is a surjection).
Item 1 implies that $\|I\| = 1$ and $I$ is one-to-one (if $Ix_1 = Ix_2$, then $\|x_1 - x_2\|_1 = \|I(x_1 - x_2)\|_2 = 0$ and $x_1 = x_2$). Hence, the inverse of $I$ exists and is an isomorphism from $X_2$ to $X_1$. Normed spaces $X_1$ and $X_2$ are called isomorphic spaces if there exists an isomorphism $I: X_1 \to X_2$. Vector operations in $X_1$ are mirrored by those in $X_2$ and the norms are the same, so as normed spaces $X_1$ and $X_2$ are indistinguishable. However, a given operator in one of them may be easier to analyze than its isomorphic image in the other, because of special features.
Let $A$ be a bounded operator in $X_1$. It is easy to see that $\tilde{A} = IAI^{-1}$ is a linear operator in $X_2$. Moreover, the norms are preserved under this mapping:
$$\|\tilde{A}\| = \sup_{x \ne 0} \frac{\|IAI^{-1}x\|_2}{\|x\|_2} = \sup_{y \ne 0} \frac{\|IAy\|_2}{\|Iy\|_2} = \sup_{y \ne 0} \frac{\|Ay\|_1}{\|y\|_1} = \|A\|.$$
1.3.4 Convergence of Operators Let A, A1 , A2 , . . . be bounded operators from a normed space X1 to a normed space X2 . The sequence {An } converges to A uniformly if kAn Ak ! 0, where the norm is as defined in Eq. (1.9). This is convergence in a normed space B(X1 , X2 ): The word ‘uniform’ is pertinent because, as we can see from Eq. (1.10), when kAn Ak ! 0, we also have the convergence kAn x Axk2 ! 0 uniformly in the unit ball b1 . The sequence {An } is said to converge to A strongly, or pointwise, if for each x [ X1 we have kAn x Axk2 ! 0. Of course, uniform convergence implies strong convergence.
1.3.5 Projectors

Projectors are used (or implicitly present) in econometrics so often that it would be a sin to bypass them. Let $X$ be a normed space and let $P: X \to X$ be a bounded operator. $P$ is called a projector if
$$P^2 = P. \qquad (1.11)$$
Suppose $y$ is a projection of $x$, $y = Px$. Then $P$ doesn't change $y$: $Py = P^2 x = Px = y$. This property is the key to the intuition behind projectors. Consider on the plane two coordinate axes, $X$ and $Y$, intersecting at a positive, not necessarily straight, angle. Projection of points on the plane onto the axis $X$ parallel to the axis $Y$ has the following geometrical properties:
1. The projection of the whole plane is $X$.
2. Points on $X$ stay the same.
3. Points on $Y$ are projected to the origin.
4. Any vector on the plane is uniquely represented as a sum of two vectors, one from $X$ and another from $Y$.
All these properties can be deduced from linearity of $P$ and Eq. (1.11).
Lemma. Let $P$ be a projector and denote $Q = I - P$, where $I$ is the identity operator in $X$. Then
(i) $Q$ is also a projector.
(ii) $\mathrm{Im}(P)$ coincides with the set of fixed points of $P$: $\mathrm{Im}(P) = \{x : x = Px\}$.
(iii) $\mathrm{Im}(Q) = N(P)$, $\mathrm{Im}(P) = N(Q)$.
(iv) Any $x \in X$ can be uniquely represented as $x = y + z$ with $y \in \mathrm{Im}(P)$, $z \in \mathrm{Im}(Q)$.

Proof.
(i) $Q^2 = (I - P)^2 = I - 2P + P^2 = I - P = Q$.
(ii) If $x \in \mathrm{Im}(P)$, then $x = Py$ for some $y \in X$ and $Px = P^2 y = Py = x$, so that $x$ is a fixed point of $P$. Conversely, if $x$ is a fixed point of $P$, then $x = Px \in \mathrm{Im}(P)$.
(iii) The equation $Px = 0$ is equivalent to $Qx = (I - P)x = x$, and the equation $\mathrm{Im}(Q) = N(P)$ follows. $\mathrm{Im}(P) = N(Q)$ is obtained similarly.
(iv) The desired representation is obtained by writing $x = Px + (I - P)x = y + z$, where $y = Px \in \mathrm{Im}(P)$ and $z = (I - P)x = Qx \in \mathrm{Im}(Q)$. If $x = y_1 + z_1$ is another representation, then, subtracting one from another, we get $y - y_1 = -(z - z_1)$. Hence, $P(y - y_1) = -P(z - z_1)$. Here the right-hand side is null because $z, z_1 \in \mathrm{Im}(Q) = N(P)$. The left-hand side is $y - y_1$ because both $y$ and $y_1$ are fixed points of $P$. Thus, $y = y_1$ and $z = z_1$. ∎
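As an illustration outside the book's text, here is a small numerical sketch of an (oblique) projector on the plane; the particular matrix is an arbitrary choice satisfying $P^2 = P$.

```python
import numpy as np

# Oblique projector onto the x-axis parallel to the line y = x:
# P maps (x1, x2) to (x1 - x2, 0); one checks directly that P @ P == P.
P = np.array([[1.0, -1.0],
              [0.0,  0.0]])
Q = np.eye(2) - P

assert np.allclose(P @ P, P)          # Eq. (1.11)
assert np.allclose(Q @ Q, Q)          # part (i) of the lemma

v = np.array([3.0, 2.0])
y, z = P @ v, Q @ v                   # part (iv): unique decomposition
assert np.allclose(y + z, v)
assert np.allclose(P @ y, y)          # y is a fixed point of P
assert np.allclose(P @ z, 0)          # z lies in the null space of P
print(y, z)
```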
1.4 HILBERT SPACES

1.4.1 Scalar Products

A Hilbert space is another infinite-dimensional generalization of $R^n$. Everything starts with noticing how useful a scalar product
$$\langle x, y\rangle = \sum_{i=1}^{n} x_i y_i \qquad (1.12)$$
of two vectors $x, y \in R^n$ is. In terms of it we can define the Euclidean norm in $R^n$:
$$\|x\|_2 = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2} = \langle x, x\rangle^{1/2}. \qquad (1.13)$$
Most importantly, we can find the cosine of the angle between $x, y$ by the formula
$$\cos(\widehat{x, y}) = \frac{\langle x, y\rangle}{\|x\|_2 \|y\|_2}. \qquad (1.14)$$
To do without the coordinate representation, we observe algebraic properties of this scalar product. First of all, it is a bilinear form: it is linear with respect to one argument when the other is fixed:
$$\langle ax + by, z\rangle = a\langle x, z\rangle + b\langle y, z\rangle, \quad \langle z, ax + by\rangle = a\langle z, x\rangle + b\langle z, y\rangle$$
for all vectors $x, y, z$ and numbers $a, b$. Further, we notice that $\langle x, x\rangle$ is always nonnegative and $\langle x, x\rangle = 0$ is true only when $x = 0$. Thus, on the abstract level, we start with the assumption that $H$ is a linear space and $\langle x, y\rangle$ is a real function of arguments $x, y \in H$ having properties:
1. $\langle x, y\rangle$ is a bilinear form,
2. $\langle x, x\rangle \ge 0$ for all $x \in H$,
3. $\langle x, x\rangle = 0$ implies $x = 0$ and
4. $\langle x, y\rangle = \langle y, x\rangle$ for all $x, y$.
Such a function is called a scalar product. Put
$$\|x\| = \langle x, x\rangle^{1/2}. \qquad (1.15)$$

Lemma. (Cauchy–Schwarz inequality) $|\langle x, y\rangle| \le \|x\|\,\|y\|$.

Proof. The function $f(t) = \langle x + ty, x + ty\rangle$ of a real argument $t$ is nonnegative by item 2. Using items 1 and 4 we see that it is a quadratic function:
$$f(t) = \langle x, x + ty\rangle + t\langle y, x + ty\rangle = \langle x, x\rangle + 2t\langle x, y\rangle + t^2\langle y, y\rangle.$$
Its nonnegativity implies that its discriminant $\langle x, y\rangle^2 - \langle x, x\rangle\langle y, y\rangle$ is nonpositive. ∎
1.4.2 Continuity of Scalar Products Notation (1.15) is justified by the following lemma. Lemma (i) Eq. (1.15) defines a norm on H and the associated convergence concept: xn ! x in H if kxn xk ! 0: (ii) The scalar product is continuous: if xn ! x, yn ! y, then kxn , yn l ! kx, yl: Proof. (i) By the Cauchy – Schwarz inequality kx þ yk2 ¼ kx þ y, x þ yl ¼ kxk2 þ2kx, yl þ kyk2 kxk2 þ2kxkkyk þ kyk2 ¼ (kxk þ kyk)2 ,
which proves the triangle inequality in Section 1.2.1 (3). The other properties of a norm (nonnegativity, homogeneity and nondegeneracy) easily follow from the scalar product axioms. (ii) Convergence xn ! x implies boundedness of the norms kxn k: Therefore, by the Cauchy –Schwarz inequality, kkxn , yn l kx, ylk kkxn , yn ylk þ kkxn x, ylk kxn kkyn yk þ kxn xkkyk:
B
A linear space H that is endowed with a scalar product and is complete in the norm generated by that scalar product is called a Hilbert space.
1.4.3 Discrete Hölder's Inequality

An interesting generalization of the Cauchy–Schwarz inequality is in terms of the spaces $l_p$ from Section 1.2.3. Let $p$ be a number from $[1, \infty)$ or the symbol $\infty$. Its conjugate $q$ is defined from $1/p + 1/q = 1$. Explicitly,
$$q = \begin{cases} p/(p-1) \in (1, \infty), & 1 < p < \infty; \\ \infty, & p = 1; \\ 1, & p = \infty. \end{cases}$$
Hölder's inequality states that
$$\sum_{i=1}^{\infty} |x_i y_i| \le \|x\|_p \|y\|_q. \qquad (1.16)$$
A way to understand it is by considering the bilinear form $\langle x, y\rangle = \sum_{i=1}^{\infty} x_i y_i$. It is defined on the Cartesian product $l_2 \times l_2$ and is continuous on it by Lemma 1.4.2. Hölder's inequality allows us to take arguments from different spaces: $\langle x, y\rangle$ is defined on $l_p \times l_q$ and is continuous on this product.
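A throwaway numerical check, not from the book, of the discrete Hölder inequality (1.16) for a few conjugate pairs; the random vectors are truncated to finite length purely for illustration, and the helper `p_norm` repeats the one used earlier in this chapter's sketches.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50)
y = rng.standard_normal(50)

def p_norm(v, p):
    return np.max(np.abs(v)) if np.isinf(p) else (np.abs(v) ** p).sum() ** (1 / p)

for p in [1.5, 2.0, 3.0, np.inf]:
    q = 1.0 if np.isinf(p) else p / (p - 1)   # conjugate exponent for p > 1
    lhs = np.sum(np.abs(x * y))
    rhs = p_norm(x, p) * p_norm(y, q)
    assert lhs <= rhs + 1e-12                 # Hölder's inequality (1.16)
print("Hölder verified for p in {1.5, 2, 3, inf}")
```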
1.4.4 Symmetric Operators

Let $A$ be a bounded operator in a Hilbert space $H$. Its adjoint is defined as the operator $A^*$ that satisfies
$$\langle Ax, y\rangle = \langle x, A^* y\rangle \quad \text{for all } x, y \in H.$$
This definition arises from the property of the transpose matrix $A'$,
$$\sum_{i=1}^{n} (Ax)_i y_i = \sum_{i=1}^{n} x_i (A'y)_i.$$
Existence of $A^*$ is proved using the so-called Riesz theorem. We do not need the general proof of existence because in all the cases we need, the adjoint will be constructed explicitly. Boundedness of $A^*$ will also be proved directly. $A$ is called symmetric if $A = A^*$. Symmetric operators stand out by having properties closest to those of real numbers.
1.4.5 Orthoprojectors

Cosines of angles between vectors from $H$ can be defined using Eq. (1.14). We don't need this definition, but we do need its special case: vectors $x, y \in H$ are called orthogonal if $\langle x, y\rangle = 0$. For orthogonal vectors we have the Pythagorean theorem:
$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2 = \|x\|^2 + \|y\|^2.$$
Two subspaces $X, Y \subseteq H$ are called orthogonal if every element of $X$ is orthogonal to every element of $Y$. If a projector $P$ in $H$ ($P^2 = P$) is symmetric, $P = P^*$, then it is called an orthoprojector. In the situation described in Section 1.3.5, when points on the plane are projected onto one axis parallel to another, orthoprojectors correspond to the case when the axes are orthogonal.

Lemma. Let $P$ be an orthoprojector and let $Q = I - P$. Then
(i) $\mathrm{Im}(P)$ is orthogonal to $\mathrm{Im}(Q)$.
(ii) For any $x \in H$, $\|Px\|$ is the distance from $x$ to $\mathrm{Im}(Q)$.

Proof.
(i) Let $x \in \mathrm{Im}(P)$ and $y \in \mathrm{Im}(Q)$. By Lemma 1.3.5(ii), $x = Px$, $y = Qy$. Hence, $x$ and $y$ are orthogonal:
$$\langle x, y\rangle = \langle Px, Qy\rangle = \langle x, P(I - P)y\rangle = \langle x, (P - P^2)y\rangle = 0.$$
(ii) For an arbitrary element $x \in H$ and a set $A \subseteq H$ the distance from $x$ to $A$ is defined by
$$\mathrm{dist}(x, A) = \inf_{y \in A} \|x - y\|.$$
Take any $y \in \mathrm{Im}(Q)$. In the equation
$$x - y = Px + Qx - Qy = Px + Q(x - y)$$
the two terms at the right are orthogonal, so by the Pythagorean theorem
$$\|x - y\|^2 = \|Px\|^2 + \|Q(x - y)\|^2 \ge \|Px\|^2,$$
which implies the lower bound for the distance $\mathrm{dist}(x, \mathrm{Im}(Q)) \ge \|Px\|$. This lower bound is attained on $y = Qx \in \mathrm{Im}(Q)$: $\|x - y\| = \|Px + Qx - Qx\| = \|Px\|$. Hence, $\mathrm{dist}(x, \mathrm{Im}(Q)) = \|Px\|$. ∎
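For an econometric flavor, here is an illustrative sketch that is not in the book: the OLS "hat" matrix $X(X'X)^{-1}X'$ is a symmetric idempotent matrix, i.e., an orthoprojector onto the column span of $X$, and the residual norm is the distance from $y$ to that span. The data are randomly generated for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = rng.standard_normal(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T     # orthoprojector onto span(X)
Q = np.eye(n) - P

assert np.allclose(P @ P, P)             # idempotent: a projector
assert np.allclose(P, P.T)               # symmetric: an orthoprojector
assert np.allclose(P @ Q, np.zeros((n, n)), atol=1e-10)   # Im(P) orthogonal to Im(Q)

# ||Qy|| equals the distance from y to span(X), attained at the fitted value Py.
fitted, resid = P @ y, Q @ y
assert np.isclose(np.linalg.norm(y - fitted), np.linalg.norm(resid))
print(np.linalg.norm(resid))
```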
1.5 Lp SPACES 1.5.1 s -Fields Let V be some set and let F be a nonempty family of its subsets. F is called a s-field if 1. unions, intersections, differences and complements of any two elements of F belong to F , 2. the union of any sequence {An : n ¼ 1, 2, . . .} of elements of F belongs to F and 3. V belongs to F . This definition contains sufficiently many requirements to serve most purposes of analysis. In probabilities, s-fields play the role of information sets. The precise meaning of this sentence at times can be pretty complex. The following existence statement is used very often. Lemma. For any system S of subsets of V there exists a s-field F that contains S and is contained in any other s-field containing S. Proof. The set of s-fields containing S is not empty. For example, the set of all subsets of V is a s-field and contains S. Let s be the intersection of all s-fields containing B S: It obviously satisfies 1 – 3 and hence is the s-field we are looking for. The s-field whose existence is affirmed in this lemma is called the least s-field generated by S and denoted s(S):
1.5.2 Borel s-field in Rn A ball in Rn centered at x [ Rn of radius 1 . 0, b1 (x) ¼ {y [ Rn : kx yk2 , 1}, is called an 1-neighborhood of x. We say that the set A # Rn is an open set if each point x belongs to A with its neighborhood b1 (x) (where 1 depends on x). The Borel s-field Bn in Rn is defined as the smallest s-field that contains all open subsets of Rn : It exists by Lemma 1.5.1. In more general situations, when open subsets of V are not defined, s-fields of V are introduced axiomatically.
1.5.3 σ-Additive Measures

A pair $(\Omega, \mathcal{F})$, where $\Omega$ is some set and $\mathcal{F}$ is a σ-field of its subsets, is called a measurable space. A set function $\mu$ defined on elements of $\mathcal{F}$ with values in the extended half-line $[0, \infty]$ is called a σ-additive measure if for any disjoint sets $A_1, A_2, \ldots \in \mathcal{F}$ one has
$$\mu\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} \mu(A_j).$$
EXAMPLE 1.4. On a plane, for any rectangle $A$ define $\mu(A)$ to be its area. The extension procedure from the measure theory then leads to the Lebesgue measure $\mu$ with $\Omega = R^2$ and $\mathcal{F} = B^2$ ($\mu$ is defined on all Borel subsets of $R^2$).
A probabilistic measure is a σ-additive measure that satisfies an additional requirement $\mu(\Omega) = 1$. In this case, following common practice, we write $P$ instead of $\mu$. Thus, a probability space (sometimes also called a sample space) is a triple $(\Omega, \mathcal{F}, P)$ where $\Omega$ is a set, $\mathcal{F}$ is a σ-field of its subsets and $P$ is a σ-additive measure on $\mathcal{F}$ such that $P(\Omega) = 1$.
EXAMPLE 1.5. On a plane, take the square $[0, 1]^2$ as $\Omega$ and let $P$ be the Lebesgue measure. Then $\mathcal{F}$ will be the set of Borel subsets of the square.
1.5.4 Measurable Functions Let (V1 , F 1 ) and (V2 , F 2 ) be two measurable spaces. A function f : V1 ! V2 is called measurable if f 1 (A) [ F 1 for any A [ F 2 : More precisely, it is said to be (F 1 , F 2 )-measurable. In particular, when (V1 , F 1 ) ¼ (Rn , Bn ) and (V2 , F 2 ) ¼ (Rm , Bm ), this definition gives the definition of Borel-measurability. Most of the time we deal with real-valued functions, when V2 ¼ R and F 2 ¼ B1 is the Borel s-field. In this case we simply say that f is F 1 -measurable. All analysis operations in the finite-dimensional case preserve measurability. The next theorem is often used implicitly. Theorem.
(Kolmogorov and Fomin, 1989, Chapter 5, Section 4)
1. Let X, Y and Z be arbitrary sets with systems of subsets sX , sY and sZ , respectively. Suppose the function f : X ! Y is (sX , sY )-measurable and g : Y ! Z is (sY , sZ )-measurable. Then the composition z(x) ¼ g( f (x)) is (sX , sZ )-measurable. 2. Let f and g be defined on the same measurable space (V, F ): Then a linear combination af þ bg and product fg are measurable. If g does not vanish, then the ratio f =g is also measurable.
1.5.5 Lp Spaces

Let $(\Omega, \mathcal{F}, \mu)$ be any space with a σ-additive measure $\mu$ and let $1 \le p < \infty$. The set of measurable functions $f: \Omega \to R$ provided with the norm
$$\|f\|_p = \left(\int_\Omega |f(x)|^p \, d\mu\right)^{1/p}, \quad 1 \le p < \infty,$$
is denoted $L_p = L_p(\Omega)$. In the case $p = \infty$ this definition is completed with
$$\|f\|_\infty = \mathrm{ess\,sup}_{x \in \Omega} |f(x)| = \inf_{\mu(A) = 0} \ \sup_{x \in \Omega \setminus A} |f(x)|.$$
The term in the middle is, by definition, the quantity at the right and is called essential supremum. These definitions mean that values taken by functions on sets of measure zero don't matter. An equality $f(t) = 0$ is accompanied by the caveat "almost everywhere" (a.e.) or "almost surely" (a.s.) in the probabilistic setup, meaning that there is a set of measure zero outside which $f(t) = 0$.
1.5.6 Inequalities in Lp

Apparently, $L_p$ spaces should have a lot in common with $l_p$ spaces. The triangle inequality in $L_p$,
$$\|F + G\|_p \le \|F\|_p + \|G\|_p,$$
is called a Minkowski inequality. Hölder's inequality looks like this:
$$\int_\Omega |f(x) g(x)| \, d\mu \le \|f\|_p \|g\|_q,$$
where $q$ is the conjugate of $p$. When $\mu(\Omega) < \infty$, we can use this inequality to show that for $1 \le p_1 < p_2 \le \infty$, $L_{p_2}$ is a subset of $L_{p_1}$:
$$\int_\Omega |f(x)|^{p_1} \, d\mu \le \left(\int_\Omega |f(x)|^{p_1 \cdot p_2/p_1} \, d\mu\right)^{p_1/p_2} \left(\int_\Omega 1 \, d\mu\right)^{1 - p_1/p_2} = \|f\|_{p_2}^{p_1}\, [\mu(\Omega)]^{1 - p_1/p_2}.$$
In particular, when $(\Omega, \mathcal{F}, P)$ is a probability space, we get
$$\|f\|_{p_1} \le \|f\|_{p_2} \quad \text{if } 1 \le p_1 < p_2 \le \infty.$$
This is the opposite of the monotonicity relation (1.7).
1.5.7 Covariance as a Scalar Product

Real-valued measurable functions on a probability space $(\Omega, \mathcal{F}, P)$ are called random variables. Let $X, Y$ be integrable random variables (integrability is necessary for their means to exist). Denote $x = X - EX$, $y = Y - EY$. Then the covariance of $X, Y$ is defined by
$$\mathrm{cov}(X, Y) = E(X - EX)(Y - EY) = Exy, \qquad (1.17)$$
the standard deviation of $X$ is, by definition,
$$\sigma(X) = \sqrt{\mathrm{cov}(X, X)} = \sqrt{Ex^2} = \sigma(x) \qquad (1.18)$$
and the definition of correlation of $X, Y$ is
$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma(X)\sigma(Y)} = \frac{Exy}{\sigma(x)\sigma(y)}. \qquad (1.19)$$
Comparison of Eqs. (1.17), (1.18) and (1.19) with Eqs. (1.12), (1.13) and (1.14) from Section 1.4.1 makes clear that definitions (1.17), (1.18) and (1.19) originate in Euclidean geometry. In particular, $\sigma(X)$ is the distance from $X$ to $EX$ and from $x$ to 0. While this idea has been very fruitful, I often find it more useful to estimate $(EX^2)^{1/2}$, which is the distance from $X$ to 0.
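A small illustration of mine, not the book's: treating centered random variables as vectors, the sample analog of (1.19) is a cosine, so its absolute value cannot exceed 1 by the Cauchy–Schwarz inequality. The simulated data are an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(1_000)
Y = 0.6 * X + rng.standard_normal(1_000)

x, y = X - X.mean(), Y - Y.mean()            # centered versions
cov = np.mean(x * y)                         # sample analog of Exy, Eq. (1.17)
sigma_x, sigma_y = np.sqrt(np.mean(x**2)), np.sqrt(np.mean(y**2))   # Eq. (1.18)
rho = cov / (sigma_x * sigma_y)              # Eq. (1.19): cosine of the angle

assert abs(cov) <= sigma_x * sigma_y + 1e-12   # Cauchy-Schwarz, hence |rho| <= 1
print(round(rho, 3))
```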
1.5.8 Dense Sets in Lp, p < ∞

Let us fix some space with measure $(\Omega, \mathcal{F}, \mu)$. A set $M \subseteq L_p$ is said to be dense in $L_p$ if any function $f \in L_p$ can be approximated by some sequence $\{f_n\} \subseteq M$: $\|f_n - f\|_p \to 0$. By $1_A$ we denote the indicator of a set $A$:
$$1_A(x) = \begin{cases} 1, & x \in A; \\ 0, & x \notin A. \end{cases}$$
A finite linear combination $\sum_i c_i 1_{A_i}$ of indicators of measurable sets $A_i \in \mathcal{F}$ is called a step function. We say that the measure $\mu$ is a σ-finite measure if $\Omega$ can be represented as a union of disjoint sets $\Omega_i$,
$$\Omega = \bigcup_i \Omega_i, \qquad (1.20)$$
of finite measure $\mu(\Omega_i) < \infty$. For example, $R^n$ is a union of rectangles of finite Lebesgue measure.

Lemma. If $p < \infty$ and the measure $\mu$ is σ-finite, then the set $M$ of step functions is dense in $L_p$.

Proof. Step 1. Let $f \in L_p$. First we show that the general case of $\Omega$ of infinite measure can be reduced to the case $\mu(\Omega) < \infty$. Since for the sets from Eq. (1.20) we have
$$\int_\Omega |f(x)|^p \, d\mu = \sum_l \int_{\Omega_l} |f(x)|^p \, d\mu < \infty,$$
for any $\varepsilon > 0$ there exists $L > 0$ such that $\sum_{l > L} \int_{\Omega_l} |f(x)|^p \, d\mu < \varepsilon$. Denote $\tilde{\Omega} = \bigcup_{l=1}^{L} \Omega_l$. Whatever step function $\tilde{f}_\varepsilon$ we find to approximate $f$ in $L_p(\tilde{\Omega})$ in the sense that
$$\int_{\tilde{\Omega}} |f(x) - \tilde{f}_\varepsilon(x)|^p \, d\mu < \varepsilon,$$
we can extend it by zero,
$$f_\varepsilon(x) = \begin{cases} \tilde{f}_\varepsilon(x), & x \in \tilde{\Omega}; \\ 0, & x \in \Omega \setminus \tilde{\Omega}, \end{cases}$$
to obtain an approximation to $f$ in $L_p(\Omega)$:
$$\int_\Omega |f - f_\varepsilon|^p \, d\mu = \int_{\tilde{\Omega}} |f - \tilde{f}_\varepsilon|^p \, d\mu + \int_{\Omega \setminus \tilde{\Omega}} |f|^p \, d\mu < 2\varepsilon.$$
$f_\varepsilon$ will be a step function and $\mu(\tilde{\Omega}) < \infty$.
Step 2. Now we show that $f$ can be considered bounded. From
$$\int_\Omega |f|^p \, d\mu = \sum_{l=1}^{\infty} \int_{\{l-1 \le |f(x)| < l\}} |f(x)|^p \, d\mu < \infty$$
we see that for any $\varepsilon > 0$, $L$ can be chosen so that $\int_{\{L \le |f(x)|\}} |f(x)|^p \, d\mu < \varepsilon$. Then $f$ is bounded on $\tilde{\Omega} = \{|f(x)| < L\}$ and, as above, we see that approximating $f$ by a simple function on $\tilde{\Omega}$ is enough.
Step 3. Now we can assume that $\mu(\Omega) < \infty$ and $|f(x)| \le L$. Take a large $k$ and partition $[-L, L]$ into $k$ nonoverlapping (closed, semiclosed or open, it does not matter) intervals $\Delta_1, \ldots, \Delta_k$ of length $2L/k$. Let $l_1, \ldots, l_k$ denote the left ends of those intervals and put $A_m = f^{-1}(\Delta_m)$, $m = 1, \ldots, k$. Then the sets $A_m$ are disjoint,
$$|l_m - f(x)| \le \frac{2L}{k} \quad \text{for } x \in A_m \quad \text{and} \quad \Omega = \bigcup_{m=1}^{k} A_m.$$
This implies
$$\int_\Omega \Bigl|\sum_m l_m 1_{A_m}(x) - f(x)\Bigr|^p d\mu = \sum_m \int_{A_m} |l_m - f(x)|^p \, d\mu \le \left(\frac{2L}{k}\right)^p \mu(\Omega) \to 0, \quad k \to \infty. \qquad \blacksquare$$
1.6 CONDITIONING ON σ-FIELDS

1.6.1 Absolute Continuity of Measures

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $f$ be an integrable function on $\Omega$. Then the σ-additivity of Lebesgue integrals (Kolmogorov and Fomin, 1989, Chapter 5, Section 5.4)
$$\int_{\bigcup_{m=1}^{\infty} A_m} f(x)\, dP = \sum_{m=1}^{\infty} \int_{A_m} f(x)\, dP \quad \text{for disjoint measurable } A_m$$
means that
$$\nu(A) = \int_A f(x)\, dP \qquad (1.21)$$
is a σ-additive set function with the same domain $\mathcal{F}$ as that of $P$. Another property of Lebesgue integrals (see the same source) states that $\nu$ is absolutely continuous with respect to $P$: $\nu(A) = 0$ for each measurable set $A$ for which $P(A) = 0$. The Radon–Nikodym theorem affirms that the opposite is true: σ-additivity and absolute continuity are sufficient for a set function to be of form (1.21).

Theorem. (Radon–Nikodym) (Kolmogorov and Fomin, 1989, Chapter 6, Section 5.3) If $(\Omega, \mathcal{F}, P)$ is a probability space and $\nu$ is a set function defined on $\mathcal{F}$ that is σ-additive and absolutely continuous with respect to $P$, then there exists an integrable function $f$ on $\Omega$ such that Eq. (1.21) is true. If $g$ is another such function, then $f = g$ a.s.
1.6.2 Conditional Expectation

Let $(\Omega, \mathcal{F}, P)$ be a probability space, $X$ an integrable random variable and $\mathcal{G}$ a σ-field contained in $\mathcal{F}$. The conditional expectation $E(X|\mathcal{G})$ is defined as a $\mathcal{G}$-measurable function $Y$ such that
$$\int_A Y\, dP = \int_A X\, dP \quad \text{for all } A \in \mathcal{G}. \qquad (1.22)$$
EXAMPLE 1.6. Let $\mathcal{G} = \{\varnothing, \Omega\}$ be the smallest σ-field. In the case $A = \varnothing$ (or, more generally, $P(A) = 0$) Eq. (1.22) turns into an equality of two zeros. In the case $A = \Omega$ we see that the means of $Y$ and $X$ should be the same. Since a constant is the only $\mathcal{G}$-measurable random variable, it follows that $E(X|\mathcal{G}) = EX$.
EXAMPLE 1.7. Let $\mathcal{G} = \mathcal{F}$ be the largest σ-field contained in $\mathcal{F}$. Since $X$ is $\mathcal{G}$-measurable, $Y = X$ satisfies Eq. (1.22). Hence, $E(X|\mathcal{G}) = X$ by a.s. uniqueness.
$Y = X$ is an incorrect answer for Example 1.6 because inverse images $X^{-1}(B)$ of some Borel sets would not belong to $\{\varnothing, \Omega\}$ unless $\mathcal{F} = \{\varnothing, \Omega\}$. $Y = E(X|\mathcal{G})$ contains precisely as much information about $X$ as is necessary to calculate the integrals in (1.22).
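As an illustration of the definition, not taken from the book, take a finite Ω with equal weights and let the conditioning σ-field be generated by a partition; then E(X|𝒢) is obtained by averaging X over each partition block, and the defining property (1.22) can be checked directly. The partition and the variable below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12
X = rng.standard_normal(n)                 # a random variable on Omega = {0,...,11}, P uniform
blocks = [np.arange(0, 4), np.arange(4, 9), np.arange(9, 12)]   # partition generating G

# E(X|G): on each block, replace X by its block average (a G-measurable function).
Y = np.empty(n)
for b in blocks:
    Y[b] = X[b].mean()

# Defining property (1.22): integrals of Y and X agree on every generating block
# (an integral here is a sum times 1/n).
for b in blocks:
    assert np.isclose(Y[b].sum(), X[b].sum())

# With H = {empty set, Omega} this gives E[E(X|G)] = EX.
assert np.isclose(Y.mean(), X.mean())
print(Y)
```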
1.6.3 Conditioning as a Projector

Lemma. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathcal{G}$ be a σ-field contained in $\mathcal{F}$.

(i) For any integrable $X$, $E(X|\mathcal{G})$ exists. Denote $P_{\mathcal{G}} X = E(X|\mathcal{G})$ for $X \in L_1(\Omega)$.
(ii) $P_{\mathcal{G}}$ is linear, $P_{\mathcal{G}}(aX + bY) = aP_{\mathcal{G}}X + bP_{\mathcal{G}}Y$, and bounded, $\|P_{\mathcal{G}} X\|_1 \le \|X\|_1$.
(iii) $P_{\mathcal{G}}$ is a projector.

Proof. (i) $\nu(A) = \int_A X\,dP$ defines a σ-additive set function on $\mathcal{G}$ that is absolutely continuous with respect to $P$. By the Radon–Nikodym theorem there exists a $\mathcal{G}$-measurable function $Y$ such that Eq. (1.22) is true. This proves the existence of $Y = E(X|\mathcal{G})$.

(ii) We can use Eq. (1.22) repeatedly to obtain
$$\int_A P_{\mathcal{G}}(aX + bY)\,dP = \int_A (aX + bY)\,dP = a\int_A X\,dP + b\int_A Y\,dP = a\int_A P_{\mathcal{G}}X\,dP + b\int_A P_{\mathcal{G}}Y\,dP = \int_A (aP_{\mathcal{G}}X + bP_{\mathcal{G}}Y)\,dP, \quad A \in \mathcal{G}.$$
Since $aP_{\mathcal{G}}X + bP_{\mathcal{G}}Y$ is $\mathcal{G}$-measurable, it must coincide with $P_{\mathcal{G}}(aX + bY)$.

For any real-valued function $f$ define its positive part by $f_+ = \max\{f, 0\}$ and negative part by $f_- = -\min\{f, 0\}$. Then it is geometrically obvious that $f = f_+ - f_-$ and $|f| = f_+ + f_-$. Decomposing $P_{\mathcal{G}}X$ into its positive and negative parts, $P_{\mathcal{G}}X = (P_{\mathcal{G}}X)_+ - (P_{\mathcal{G}}X)_-$, and remembering that both sets $\{P_{\mathcal{G}}X > 0\}$ and $\{P_{\mathcal{G}}X < 0\}$ are $\mathcal{G}$-measurable, we have
$$\int_\Omega |P_{\mathcal{G}}X|\,dP = \int_\Omega [(P_{\mathcal{G}}X)_+ + (P_{\mathcal{G}}X)_-]\,dP = \int_{P_{\mathcal{G}}X > 0} P_{\mathcal{G}}X\,dP - \int_{P_{\mathcal{G}}X < 0} P_{\mathcal{G}}X\,dP = \int_{P_{\mathcal{G}}X > 0} X\,dP - \int_{P_{\mathcal{G}}X < 0} X\,dP \le \int_\Omega |X|\,dP.$$
This proves that $\|P_{\mathcal{G}}\| \le 1$.

(iii) $P_{\mathcal{G}}^2 X$ is defined as a $\mathcal{G}$-measurable function $Y$ such that $\int_A Y\,dP = \int_A P_{\mathcal{G}}X\,dP$ for all $A \in \mathcal{G}$. Since $P_{\mathcal{G}}X$ itself is $\mathcal{G}$-measurable, we have $Y = P_{\mathcal{G}}X$ a.s. $\blacksquare$
1.6.4 The Law of Iterated Expectations

In a 3-D space, projecting first to a plane and then to a straight line in that plane gives the same result as projecting directly to the straight line. This is also true of conditioning (and of projectors in general).

Lemma. Let $\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}$ be nested σ-fields and denote $P_{\mathcal{H}}$ and $P_{\mathcal{G}}$ the conditioning projectors on $\mathcal{H}$ and $\mathcal{G}$, respectively. Then $P_{\mathcal{H}} P_{\mathcal{G}} = P_{\mathcal{G}} P_{\mathcal{H}} = P_{\mathcal{H}}$. Using the conditional expectation notation, this is the same as
$$E[E(X|\mathcal{G})|\mathcal{H}] = E[E(X|\mathcal{H})|\mathcal{G}] = E(X|\mathcal{H}). \qquad (1.23)$$
In particular, when $\mathcal{H} = \{\emptyset, \Omega\}$ is the least σ-field, we get $E[E(X|\mathcal{G})] = EX$ for all integrable $X$.

Proof. $\mathcal{H}$-measurability of $P_{\mathcal{H}}X$ implies its $\mathcal{G}$-measurability. Hence, by Lemma 1.6.3(iii) $P_{\mathcal{G}}$ does not change it. This proves that $P_{\mathcal{G}} P_{\mathcal{H}} = P_{\mathcal{H}}$.

$P_{\mathcal{G}}X$ is $\mathcal{G}$-measurable and satisfies $\int_A P_{\mathcal{G}}X\,dP = \int_A X\,dP$ for all $A \in \mathcal{G}$. In particular, this is true for $A \in \mathcal{H}$: $\int_A P_{\mathcal{G}}X\,dP = \int_A X\,dP$, $A \in \mathcal{H}$. Confronting this with the definition of $P_{\mathcal{H}} P_{\mathcal{G}} X$,
$$\int_A P_{\mathcal{H}} P_{\mathcal{G}} X\,dP = \int_A P_{\mathcal{G}} X\,dP, \quad A \in \mathcal{H},$$
we see that $\int_A P_{\mathcal{H}} P_{\mathcal{G}} X\,dP = \int_A X\,dP$, $A \in \mathcal{H}$. But $P_{\mathcal{H}}$ satisfies the same equation with $P_{\mathcal{H}}X$ in place of $P_{\mathcal{H}} P_{\mathcal{G}} X$ and both are $\mathcal{H}$-measurable. Hence, $P_{\mathcal{H}} P_{\mathcal{G}} X = P_{\mathcal{H}} X$ a.s. $\blacksquare$
1.6.5 Extended Homogeneity

In the usual homogeneity $P_{\mathcal{G}}(aX) = aP_{\mathcal{G}}X$, $a$ is a number. In the conditioning context, $a$ can be any $\mathcal{G}$-measurable function, according to the next theorem. I call this property extended homogeneity.

Theorem. If the variables $X$ and $XY$ are integrable and $Y$ is $\mathcal{G}$-measurable, then $P_{\mathcal{G}}(XY) = YP_{\mathcal{G}}X$. The proof can be found, for example, in (Davidson 1994, Section 10.4).

1.6.6 Independence

σ-fields $\mathcal{H}$ and $\mathcal{G}$ are called independent σ-fields if any event $A \in \mathcal{H}$ is independent of any event $B \in \mathcal{G}$: $P(A \cap B) = P(A)P(B)$. Random variables $X$ and $Y$ are said to be independent if the σ-fields $\sigma(X)$ and $\sigma(Y)$ are independent. Moreover, a family $\{X_i : i \in I\}$ of random variables is called independent if, for any two disjoint sets of indices $J$, $K$, the σ-fields $\sigma(X_i : i \in J)$ and $\sigma(X_i : i \in K)$ are independent.
Theorem. (Davidson 1994, Section 10.5) Suppose $X$ is integrable and $\mathcal{H}$-measurable. If $\mathcal{G}$ is independent of $\mathcal{H}$, then conditioning $X$ on $\mathcal{G}$ provides minimum information: $E(X|\mathcal{G}) = EX$.

1.7 MATRIX ALGEBRA

Everywhere we follow the matrix algebra convention: all matrices and vectors in the same formula are compatible. All matrices in this section are assumed to be of size $n \times n$. The determinant of $A$ is denoted $\det A$ or $|A|$.
1.7.1 Orthogonal Matrices

A matrix $T$ is called orthogonal if
$$T'T = I. \qquad (1.24)$$
Equation (1.24) means, by definition of the inverse, that $T^{-1} = T'$; since a left inverse of a square matrix is also its right inverse, Eq. (1.24) is equivalent to $TT' = I$. Geometrically, the mapping $y = Tx$ is a rotation in $\mathbb{R}^n$. This is proved by noting that $T$ preserves scalar products: $\langle Tx, Ty\rangle = \langle x, T'Ty\rangle = \langle x, y\rangle$. Hence, it preserves vector lengths and angles between vectors; see Equations (1.13) and (1.14) in Section 1.4.1. Rotation around the origin is the only mapping that has these properties.
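As a quick numerical illustration of the preceding paragraph (not part of the original text), the sketch below builds a 2-D rotation matrix and checks that it satisfies $T'T = I$ and preserves scalar products and lengths; the angle and the test vectors are arbitrary choices.

```python
import numpy as np

theta = 0.7  # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal (rotation) matrix

x = np.array([1.0, 2.0])
y = np.array([-0.5, 3.0])

print(np.allclose(T.T @ T, np.eye(2)))                        # T'T = I
print(np.allclose((T @ x) @ (T @ y), x @ y))                  # scalar products preserved
print(np.allclose(np.linalg.norm(T @ x), np.linalg.norm(x)))  # lengths preserved
```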
1.7.2 Diagonalization of Symmetric Matrices

A number $\lambda \in \mathbb{R}$ is called an eigenvalue of a matrix $A$ if there exists a nonzero vector $x$ that satisfies $Ax = \lambda x$. Such a vector $x$ is named an eigenvector corresponding to $\lambda$. From this definition it follows that $A$ reduces to multiplication by $\lambda$ along the straight line $\{ax : a \in \mathbb{R}\}$. The set $L$ of all eigenvectors corresponding to $\lambda$, completed with the null vector, is a subspace of $\mathbb{R}^n$, because $Ax = \lambda x$ and $Ay = \lambda y$ imply $A(ax + by) = \lambda(ax + by)$. This subspace is called a characteristic subspace of $A$ corresponding to $\lambda$. The dimension of the characteristic subspace (see Section 1.1.3) is called the multiplicity of $\lambda$. $A$ reduces to multiplication by $\lambda$ in $L$.

We say that a system of vectors $x_1, \ldots, x_k$ is orthonormal if
$$\langle x_i, x_j \rangle = \begin{cases} 1, & i = j; \\ 0, & i \ne j. \end{cases}$$
The system of unit vectors in $\mathbb{R}^n$ is an example of an orthonormal system. An orthonormal system is necessarily linearly independent because scalar multiplication of the equation $a_1 x_1 + \cdots + a_k x_k = 0$ by the vectors $x_1, \ldots, x_k$ yields $a_1 = \cdots = a_k = 0$.
Theorem. (Diagonalization theorem) (Bellman 1995, Chapter 4, Section 7) If $A$ is symmetric of size $n \times n$, then it has $n$ real eigenvalues $\lambda_1, \ldots, \lambda_n$, repeated with their multiplicities. Further, there is an orthogonal matrix $T$ such that
$$A = T'\Lambda T, \qquad (1.25)$$
where $\Lambda$ is a diagonal matrix, $\Lambda = \mathrm{diag}[\lambda_1, \ldots, \lambda_n]$. Finally, the eigenvectors $x_1, \ldots, x_n$ that correspond to $\lambda_1, \ldots, \lambda_n$ can be chosen orthonormal.

Equation (1.25) embodies the following geometry. In the original coordinate system with the unit vectors $e_j$ (see Section 1.1.3) the matrix $A$ has generic elements $a_{ij}$. The first transformation $T$ in Eq. (1.25) rotates the coordinate system to a new position in which $A$ is of simple diagonal form, the new axes being eigenvectors along which applying $A$ amounts to multiplication by numbers. The final transformation by $T' = T^{-1}$ rotates the picture to the original position.
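A small numerical check of Eq. (1.25), offered as an illustration only. Note that numpy's `eigh` returns the factorization $A = V\Lambda V'$ with orthonormal eigenvectors in the columns of $V$, so the matrix $T$ of the theorem corresponds to $V'$.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # an arbitrary symmetric matrix

lam, V = np.linalg.eigh(A)             # A = V diag(lam) V'
T = V.T                                # so that A = T' diag(lam) T, as in Eq. (1.25)

print(np.allclose(A, T.T @ np.diag(lam) @ T))   # reconstruction
print(np.allclose(T @ T.T, np.eye(4)))          # T is orthogonal
```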
1.7.3 Finding and Applying Eigenvalues

Eigenvalues are the roots of the equation $\det(A - \lambda I) = 0$. Application of this matrix algebra rule is complicated, as the left side of the equation is a polynomial of order $n$. Often it is possible to exploit the analytical structure of $A$ to find its eigenvalues using the next lemma. A subspace $L$ of $\mathbb{R}^n$ is called an invariant subspace of a matrix $A$ if $AL \subseteq L$.

Lemma
(i) $\lambda$ is an eigenvalue of $A$ if and only if $\lambda - c$ is an eigenvalue of $A - cI$.
(ii) Let $L$ be an invariant subspace of a symmetric matrix $A$. Denote $P$ an orthoprojector onto $L$, $Q = I - P$ and $M = \mathrm{Im}(Q)$. Then $M$ is an invariant subspace of $A$ and the analysis of $A$ reduces to the analysis of its restrictions $A|_L$ and $A|_M$.

Proof. Statement (i) is obvious because the equation $Ax = \lambda x$ is equivalent to $(A - cI)x = (\lambda - c)x$.
(ii) For any $x, y \in \mathbb{R}^n$, by symmetry of $A$ and $P$,
$$\langle PAQx, y\rangle = \langle AQx, Py\rangle = \langle Qx, APy\rangle = 0.$$
The last equality follows from the facts that $Py \in L = \mathrm{Im}(P)$, $APy \in L$ and $\mathrm{Im}(P)$ is orthogonal to $\mathrm{Im}(Q)$ [see Lemma 1.4.5(i)]. Plugging in $y = PAQx$ we get $\|PAQx\| = 0$ and $PAQx = 0$. Since $Qx$ runs over $M$ when $x$ runs over $\mathbb{R}^n$, we obtain $PAM = \{0\}$ or, by Lemma 1.4.5(ii), $AM \subseteq M$, and $M$ is invariant with respect to $A$.
Now premultiply the identity $I = P + Q$ by $A$ to get
$$A = AP + AQ = A|_L P + A|_M Q. \qquad \blacksquare$$

The second part of this lemma leads to the following practical rule. If you have managed to find the first eigenvalue $\lambda$ and the corresponding characteristic subspace $L$ of $A$, then consider the restriction $A|_M$ to find the rest of the eigenvalues. This process of "chipping off" characteristic subspaces can be repeated. While you do that, construct the orthonormal systems of eigenvectors until their total number reaches $n$.

Denoting $y = Tx$, from Theorem 1.7.2 we have
$$\langle Ax, x\rangle = \langle T'\Lambda Tx, x\rangle = \langle \Lambda Tx, Tx\rangle = \langle \Lambda y, y\rangle = \sum_{i=1}^n \lambda_i y_i^2.$$
Hence, $A$ is nonnegative ($\langle Ax, x\rangle \ge 0$ for all $x$) if and only if all eigenvalues of $A$ are nonnegative. Therefore we can define the square root of a nonnegative symmetric matrix by
$$A^{1/2} = T'\,\mathrm{diag}[\lambda_1^{1/2}, \ldots, \lambda_n^{1/2}]\,T.$$
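The square-root construction can be mirrored numerically; a minimal sketch (illustrative only), using a Gram-type matrix $A = B'B$, which is nonnegative definite by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
A = B.T @ B                            # nonnegative definite symmetric matrix

lam, V = np.linalg.eigh(A)
lam = np.clip(lam, 0.0, None)          # guard against tiny negative round-off
A_half = V @ np.diag(np.sqrt(lam)) @ V.T

print(np.allclose(A_half @ A_half, A))  # (A^{1/2})^2 = A
```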
1.7.4 Gram Matrices

In a Hilbert space $H$ consider vectors $x_1, \ldots, x_k$. Their Gram matrix is defined by
$$G = \begin{pmatrix} \langle x_1, x_1\rangle & \cdots & \langle x_1, x_k\rangle \\ \vdots & \ddots & \vdots \\ \langle x_k, x_1\rangle & \cdots & \langle x_k, x_k\rangle \end{pmatrix}.$$

Theorem. (Gantmacher 1959, Chapter IX, Section 5) Vectors $x_1, \ldots, x_k$ are linearly independent if and only if $\det G > 0$.
1.7.5 Positive Definiteness of Gram Matrices

Lemma. If vectors $x_1, \ldots, x_k \in \mathbb{R}^n$ are linearly independent, then $G$ is positive definite: $\langle Gx, x\rangle > 0$ for all $x \ne 0$.

Proof. According to the Sylvester criterion (Bellman 1995, Chapter 5, Section 3), $G$ is positive definite if and only if all determinants
$$\langle x_1, x_1\rangle, \quad \det\begin{pmatrix} \langle x_1, x_1\rangle & \langle x_1, x_2\rangle \\ \langle x_2, x_1\rangle & \langle x_2, x_2\rangle \end{pmatrix}, \quad \ldots, \quad \det G \qquad (1.26)$$
are positive. Linear independence of the system $\{x_1, \ldots, x_k\}$ implies that of all its subsystems $\{x_1\}$, $\{x_1, x_2\}$, \ldots. Thus all determinants in (1.26) are positive by Theorem 1.7.4. $\blacksquare$
1.7.6 Partitioned Matrices: Determinant and Inverse

Lemma. (Lütkepohl 1991, Section A.10) Let the matrix $A$ be partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where $A_{11}$ and $A_{22}$ are square. Then
(i) If $A_{11}$ is nonsingular, $|A| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|$.
(ii) If $A_{11}$ and $A_{22}$ are nonsingular,
$$A^{-1} = \begin{pmatrix} D & -DA_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}D & A_{22}^{-1} + A_{22}^{-1}A_{21}DA_{12}A_{22}^{-1} \end{pmatrix} = \begin{pmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}GA_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}G \\ -GA_{21}A_{11}^{-1} & G \end{pmatrix},$$
where $D = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}$ and $G = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$.
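The two expressions for $A^{-1}$ in part (ii) and the determinant formula of part (i) can be verified numerically; the sketch below (not from the original text) checks them on a random, well-conditioned test matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)   # well-conditioned test matrix
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

A22_inv = np.linalg.inv(A22)
D = np.linalg.inv(A11 - A12 @ A22_inv @ A21)
inv_blocked = np.block([
    [D,                   -D @ A12 @ A22_inv],
    [-A22_inv @ A21 @ D,   A22_inv + A22_inv @ A21 @ D @ A12 @ A22_inv],
])

print(np.allclose(inv_blocked, np.linalg.inv(A)))          # part (ii)
print(np.allclose(np.linalg.det(A),
                  np.linalg.det(A11) *
                  np.linalg.det(A22 - A21 @ np.linalg.inv(A11) @ A12)))  # part (i)
```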
1.8 CONVERGENCE OF RANDOM VARIABLES

A random variable is nothing but an $(\mathcal{F}, \mathcal{B})$-measurable function $X : \Omega \to \mathbb{R}$, where $(\Omega, \mathcal{F}, P)$ is a probability space and $\mathcal{B}$ is the Borel σ-field of $\mathbb{R}$. In the case of a random vector it suffices to replace $\mathbb{R}$ by $\mathbb{R}^n$ and $\mathcal{B}$ by $\mathcal{B}^n$, the Borel σ-field of $\mathbb{R}^n$.

1.8.1 Convergence in Probability

Let $X, X_1, X_2, \ldots$ be random vectors defined on the same probability space and with values in the same space $\mathbb{R}^n$. If
$$\lim_{n\to\infty} P(\|X_n - X\|_2 > \varepsilon) = 0 \quad \text{for any } \varepsilon > 0,$$
then $\{X_n\}$ is said to converge in probability to $X$. Convergence in probability is commonly denoted $X_n \xrightarrow{p} X$ or $\mathrm{plim}\,X_n = X$. From the equivalent definition
$$\lim_{n\to\infty} P(\|X_n - X\|_2 \le \varepsilon) = 1 \quad \text{for any } \varepsilon > 0$$
it may be easier to see that this notion is a natural generalization of convergence of numbers. A nice feature of convergence in probability is that it is preserved under arithmetic operations.
Lemma. Let $\{X_i\}$ and $\{Y_i\}$ be sequences of $n \times 1$ random vectors and let $\{A_i\}$ be a sequence of random matrices such that $\mathrm{plim}\,X_i$, $\mathrm{plim}\,Y_i$ and $\mathrm{plim}\,A_i$ exist. Then
(i) $\mathrm{plim}(X_i \pm Y_i) = \mathrm{plim}\,X_i \pm \mathrm{plim}\,Y_i$.
(ii) $\mathrm{plim}\,A_i X_i = \mathrm{plim}\,A_i\,\mathrm{plim}\,X_i$.
(iii) Let $g : \mathbb{R}^n \to \mathbb{R}$ be a Borel-measurable function such that $X = \mathrm{plim}\,X_i$ takes values in the continuity set $C_g$ of $g$ with probability 1, $P(X \in C_g) = 1$. Then $\mathrm{plim}\,g(X_i) = g(X)$.
(iv) If $\mathrm{plim}\,A_i = A$ and $P(\det A \ne 0) = 1$, then $\mathrm{plim}\,A_n^{-1} = A^{-1}$.

Proof. Statements (i) and (ii) are from (Lütkepohl 1991, Section C.1). (iii) is proved in (Davidson 1994, Theorem 18.8).
(iv) The real-valued function $1/\det A$ of a square matrix $A$ of order $n$ is continuous everywhere in the space $\mathbb{R}^{n^2}$ of its elements except for the set $\det A = 0$. Elements of $A^{-1}$ are cofactors of elements of $A$ divided by $\det A$. Hence, they are also continuous where $\det A \ne 0$. The statement follows on applying (iii) element by element. $\blacksquare$

Part (iv) of this lemma does not imply invertibility of $A_n$ a.e. It merely implies that the set on which $A_n$ is not invertible has probability approaching zero.
1.8.2 Distribution Function of a Random Vector

Let $X$ be a random vector with values in $\mathbb{R}^k$. Its distribution function is defined by
$$F_X(x) = P(X_1 \le x_1, \ldots, X_k \le x_k) = P\Big(X^{-1}\Big(\prod_{n=1}^k (-\infty, x_n]\Big)\Big), \quad x \in \mathbb{R}^k.$$
It is proved that $F_X$ induces a probability measure on $\mathbb{R}^k$, also denoted by $F_X$. We say that $X$ has density $p_X$ if $F_X$ is absolutely continuous with respect to the Lebesgue measure in $\mathbb{R}^k$, that is, if
$$F_X(A) = \int_A p_X(t)\,dt$$
for any Borel set $A$. Random vectors $X$, $Y$ are said to be identically distributed if their distribution functions are identical: $F_X(x) = F_Y(x)$ for all $x \in \mathbb{R}^k$. The original pair consisting of the vector $X$ and probability space $(\Omega, \mathcal{F}, P)$ is distributed identically with the pair consisting of the identity mapping $X(t) = t$ on $\mathbb{R}^k$ and probability space $(\mathbb{R}^k, \mathcal{B}^k, F_X)$, where $\mathcal{B}^k$ is the Borel field of subsets of $\mathbb{R}^k$. Identically distributed vectors have equal moments. In particular, there are two different formulas for the mean:
$$EX = \int_\Omega X(\omega)\,dP(\omega) = \int_{\mathbb{R}^k} t\,dF_X(t)$$
(see Davidson 1994, Section 9.1).
1.8.3 Convergence in Distribution

We say that a sequence of random vectors $\{X_i\}$ converges in distribution to $X$ if $F_{X_i}(t) \to F_X(t)$ at all continuity points $t$ of the limit distribution $F_X$. For convergence in distribution we use the notation $X_i \xrightarrow{d} X$ or $\mathrm{dlim}\,X_i = X$.

In econometrics, we are interested in convergence in distribution because confidence intervals for $X$ in the one-dimensional (1-D) case can be expressed in terms of $F_X$: $P(a < X \le b) = F_X(b) - F_X(a)$. Here the right-hand side can be approximated by $F_{X_i}(b) - F_{X_i}(a)$ if $\mathrm{dlim}\,X_i = X$ and $a$ and $b$ are continuity points of $F_X$ (which is always the case if $X$ is normal).

Convergence in distribution is so weak that it is not preserved under arithmetic operations. In expressions like $X_i + Y_i$ or $A_i X_i$ we can pass to the limit in distribution if one sequence converges in distribution and the other in probability to a constant.

Lemma. Let $\{X_i\}$ and $\{Y_i\}$ be sequences of $n \times 1$ random vectors and let $\{A_i\}$ be a sequence of random matrices such that $\mathrm{dlim}\,X_i$, $\mathrm{plim}\,Y_i$ and $\mathrm{plim}\,A_i$ exist.
(i) If $c = \mathrm{plim}\,Y_i$ is a constant, then $\mathrm{dlim}(X_i + Y_i) = \mathrm{dlim}\,X_i + c$.
(ii) If $A = \mathrm{plim}\,A_i$ is constant, then $\mathrm{dlim}\,A_i X_i = A\,\mathrm{dlim}\,X_i$.
(iii) $\mathrm{plim}\,X_i = X$ implies $\mathrm{dlim}\,X_i = X$. If $X$ is a constant, then the converse is true: $\mathrm{dlim}\,X_i = c$ implies $\mathrm{plim}\,X_i = c$.
(iv) (Dominance of convergence in probability to zero) If $\mathrm{plim}\,A_i = 0$, then the same is true for the product: $\mathrm{plim}\,A_i X_i = 0$.
(v) Suppose $X_n \xrightarrow{d} X$, where all random vectors take values in $\mathbb{R}^k$. Let $h : \mathbb{R}^k \to \mathbb{R}^m$ be measurable and denote $D_h$ the set of discontinuities of $h$. If $F_X(D_h) = 0$, then $h(X_n) \xrightarrow{d} h(X)$.

Proof. For (i) and (ii) see (Davidson 1994, Theorem 22.14) (1-D case). The proof of (iii) can be found in (Davidson 1994, Theorems 22.4 and 22.5). Statement (iv) is proved like this: if $\mathrm{plim}\,A_i = 0$, then $\mathrm{dlim}\,A_i X_i = 0$ by (ii), which implies $\mathrm{plim}\,A_i X_i = 0$ by (iii). The proof of (v) is contained in (Billingsley 1968, Chapter 1, Section 5). $\blacksquare$

The case $c = 0$ of statement (i) is a perturbation result: adding to $\{X_i\}$ a sequence $\{Y_i\}$ such that $\mathrm{plim}\,Y_i = 0$ does not change $\mathrm{dlim}\,X_i$. A continuous $h$ (for which $D_h$ is empty) is a very special case of (v). This case is called a continuous mapping theorem (CMT). If in (ii) $\mathrm{plim}\,A_i$ is not constant, the way around is to prove convergence in distribution of the pair $(A_i, X_i)$; then the CMT applied to $h(A_i, X_i) = A_i X_i$ does the job.
1.8.4 Boundedness in Probability

Let $\{X_n\}$ be a sequence of random variables. We know that a (proper) random variable $X$ satisfies $P(|X| > M) \to 0$ as $M \to \infty$. Requiring this property to hold uniformly in $n$ gives us the definition of boundedness in probability: $\sup_n P(|X_n| > M) \to 0$
as $M \to \infty$. We write $X_n = O_p(1)$ when $\{X_n\}$ is bounded in probability. This notation is justified by item (i) of the next lemma.

Lemma
(i) If $X_n = x_n = \text{constant}$, then $x_n = O(1)$ is equivalent to $X_n = O_p(1)$.
(ii) If $X_n = O_p(1)$ and $Y_n = O_p(1)$, then $X_n + Y_n = O_p(1)$ and $X_n Y_n = O_p(1)$.

Proof. (i) It is easy to see that
$$\sup_n P(|x_n| > M) = \sup_n 1_{\{|x_n| > M\}} = 1_{\{\sup_n |x_n| > M\}}. \qquad (1.27)$$
This implies that $\sup_n P(|X_n| > M) \to 0$ as $M \to \infty$ if and only if $\sup_n |x_n| < \infty$.

(ii) Let us show that
$$\{|X_n + Y_n| > M\} \subseteq \{|X_n| > M/2\} \cup \{|Y_n| > M/2\}. \qquad (1.28)$$
Suppose the opposite is true. Then there exists $\omega \in \Omega$ such that $M < |X_n(\omega) + Y_n(\omega)| \le |X_n(\omega)| + |Y_n(\omega)| \le M$, which is nonsense. Equation (1.28) implies
$$\sup_n P(|X_n + Y_n| > M) \le \sup_n P(|X_n| > M/2) + \sup_n P(|Y_n| > M/2) \to 0, \quad M \to \infty,$$
that is, $X_n + Y_n = O_p(1)$. Further, along with Eq. (1.28), we can prove
$$\{|X_n Y_n| > M\} \subseteq \{|X_n| > \sqrt{M}\} \cup \{|Y_n| > \sqrt{M}\}$$
and therefore
$$\sup_n P(|X_n Y_n| > M) \le \sup_n P(|X_n| > \sqrt{M}) + \sup_n P(|Y_n| > \sqrt{M}) \to 0, \quad M \to \infty, \qquad (1.29)$$
which proves that $X_n Y_n = O_p(1)$. $\blacksquare$
1.8.5 Convergence in Probability to Zero

The definition of Section 1.8.1, in the special case when $\{X_n\}$ is a sequence of random variables, gives the definition of convergence in probability to zero: $\lim_{n\to\infty} P(|X_n| > \varepsilon) = 0$ for any $\varepsilon$. In this case, instead of $X_n \xrightarrow{p} 0$ people often write $X_n = o_p(1)$.

Lemma
(i) If $X_n = x_n = \text{constant}$, then $x_n = o(1)$ is equivalent to $X_n = o_p(1)$.
(ii) $X_n = o_p(1)$ implies $X_n = O_p(1)$.
(iii) If $X_n = o_p(1)$ and $Y_n = o_p(1)$, then $X_n \pm Y_n = o_p(1)$.
(iv) Suppose $X_n = o_p(1)$ or $X_n = O_p(1)$, and $Y_n = o_p(1)$. Then $X_n Y_n = o_p(1)$.
(v) If $X_n \xrightarrow{d} X$ and $Y_n = o_p(1)$, then $X_n Y_n = o_p(1)$.

Proof. (i) From an equation similar to Eq. (1.27),
$$\limsup_{n\to\infty} P(|x_n| > \varepsilon) = \limsup_{n\to\infty} 1_{\{|x_n| > \varepsilon\}} = 1_{\{\limsup_{n\to\infty} |x_n| > \varepsilon\}},$$
we see that $\lim_{n\to\infty} P(|X_n| > \varepsilon) = 0$ is equivalent to $\limsup_{n\to\infty} |x_n| \le \varepsilon$, and $X_n = o_p(1)$ is equivalent to $x_n = o(1)$.
(ii) If $X_n = o_p(1)$, then, for any given $\delta > 0$, there exists $n_0$ such that $P(|X_n| > M) \le \delta$ for $n \ge n_0$. Increasing $M$, if necessary, we can make sure that $P(|X_n| > M) \le \delta$ for $n < n_0$ as well. Thus, $\sup_n P(|X_n| > M) \le \delta$. Since $\delta > 0$ is arbitrary, this proves $X_n = O_p(1)$.
(iii) This statement follows from Lemma 1.8.1(i).
(iv) By (ii), $X_n = O_p(1)$; modify Eq. (1.29) to get
$$\sup_{n \ge n_0} P(|X_n Y_n| > \varepsilon M) \le \sup_n P(|X_n| > M) + \sup_{n \ge n_0} P(|Y_n| > \varepsilon).$$
Taking an arbitrary $\delta > 0$, choose a sufficiently large $M$, define $\varepsilon = \delta/M$ and then select a sufficiently large $n_0$. The right-hand side will be small, which proves $X_n Y_n = o_p(1)$.
(v) This is just a different way of stating Lemma 1.8.3(iv). $\blacksquare$
1.8.6 Criterion of Convergence in Distribution of Normal Vectors

A normal vector is defined using its density. We don't need the formula for the density here. It suffices to know that the density of a normal vector $e$ is completely determined by its first moments $Ee = \int_{\mathbb{R}^n} t\,dF_e(t)$ and second moments $Ee_ie_j = \int_{\mathbb{R}^n} t_it_j\,dF_e(t)$.
Lemma. Convergence in distribution of a sequence $\{X_k\}$ of normal vectors takes place if and only if the limits $\lim EX_k$ and $\lim V(X_k)$ exist, where $V(X) = E(X - EX)(X - EX)'$.

Proof. This statement is obtained by combining two facts. The characteristic function $f_X$ of a random vector $X$ is defined by
$$f_X(t) = Ee^{i\langle t, X\rangle}, \quad t \in \mathbb{R}^n.$$
Here $i = \sqrt{-1}$. The first fact is that convergence in distribution $\mathrm{dlim}\,X_k = X$ is equivalent to the pointwise convergence $\lim f_{X_k}(t) = f_X(t)$ for all $t \in \mathbb{R}^n$ (see Billingsley 1995, Theorem 26.3). The second fact is that the characteristic function of a normal vector $X$ depends only on two parameters: its mean $EX$ and variance $V(X)$; see (Rao 1965, Section 8a.2). $\blacksquare$
1.9 THE LINEAR MODEL

1.9.1 The Classical Linear Model

The usual assumptions about the linear regression
$$y = X\beta + e \qquad (1.30)$$
are the following:
1. $y$ is an observed $n$-dimensional random vector,
2. the matrix of regressors (or independent variables) $X$ of size $n \times k$ is assumed known,
3. $\beta \in \mathbb{R}^k$ is the parameter vector to be estimated from the data ($y$ and $X$),
4. $e$ is an unobserved $n$-dimensional error vector with mean zero and
5. $n > k$ and $\det X'X \ne 0$.

The matrix $X$ is assumed constant (deterministic). In dynamic models, with lags of the dependent variable at the right side, those lags are listed separately. I am in favor of separating deterministic regressors from stochastic ones from the very beginning, rather than piling them up together and later trying to specify the assumptions by sorting out the exogenous regressors.
1.9.2 Ordinary Least Squares Estimator

The least squares procedure first gives rise to the normal equation $X'X\hat\beta = X'y$ for the OLS estimator $\hat\beta$ of $\beta$ and then, subject to the condition $\det X'X \ne 0$, to the formula of the estimator
$$\hat\beta = (X'X)^{-1}X'y.$$
This formula and model (1.30) itself lead to the representation
$$\hat\beta - \beta = (X'X)^{-1}X'e \qquad (1.31)$$
used to study the properties of $\hat\beta$. In particular, the assumption $Ee = 0$ implies that $\hat\beta$ is unbiased, $E\hat\beta = \beta$, and that its distribution is centered on $\beta$.
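A minimal numerical sketch (not part of the original text) of the OLS formulas: simulate one sample, compute $\hat\beta$ from the normal equation and confirm the representation (1.31). All numbers are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 3
X = rng.standard_normal((n, k))        # deterministic in the theory; simulated here
beta = np.array([1.0, -2.0, 0.5])
e = rng.standard_normal(n)             # errors with mean zero
y = X @ beta + e

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equation X'X b = X'y
print(np.allclose(beta_hat - beta,
                  np.linalg.solve(X.T @ X, X.T @ e)))   # representation (1.31)
```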
1.9.3 Normal Errors

$N(\mu, \Sigma)$ denotes the class of normal vectors with mean $\mu$ and variance $\Sigma$ (which in general may be singular). Errors distributed as $N(0, \sigma^2 I)$ are assumed as the first approximation to reality. Components $e_1, \ldots, e_n$ of such errors satisfy
$$\mathrm{cov}(e_i, e_j) = 0, \ i \ne j, \qquad Ee_i = 0, \qquad Ee_i^2 = \sigma^2. \qquad (1.32)$$
The first equation here says that $e_1, \ldots, e_n$ are uncorrelated.

Lemma. If $e \sim N(0, \sigma^2 I)$, then the components of $e$ are independent identically distributed.

Proof. By the theorem from (Rao 1965, Section 8a.2) uncorrelatedness of the components of $e$ plus normality of $e$ imply independence of the components. By Eq. (1.32) the first and second moments of the components coincide, therefore their densities and distribution functions coincide. $\blacksquare$
1.9.4 Independent Identically Distributed Errors

We write $e \sim IID(0, \sigma^2 I)$ to mean that the components of $e$ are independent identically distributed (i.i.d.), have mean zero and covariance matrix $\sigma^2 I$. Lemma 1.9.3 means that $N(0, \sigma^2 I) \subseteq IID(0, \sigma^2 I)$.

Lemma. Suppose $e \sim IID(0, \sigma^2 I)$ and put $\mathcal{F}_0 = \{\emptyset, \Omega\}$, $\mathcal{F}_t = \sigma(e_j : j \le t)$, $t = 1, 2, \ldots$ Then $e_t$ is $\mathcal{F}_t$-measurable, $E(e_t|\mathcal{F}_{t-1}) = 0$, $E(e_t^2|\mathcal{F}_{t-1}) = \sigma^2$, $t = 1, \ldots, n$.

Proof. For $t = 1$, $E(e_1|\mathcal{F}_0) = Ee_1 = 0$ (see Example 1.6 in Section 1.6.2). Let $t > 1$. By definition, $\mathcal{F}_{t-1} = \sigma(e_j : j \le t-1)$ and $\sigma(e_t)$ are independent.
By Theorem 1.6.6, $E(e_t|\mathcal{F}_{t-1}) = Ee_t = 0$. Similarly, $E(e_t^2|\mathcal{F}_{t-1}) = Ee_t^2 = \sigma^2$ (see Theorem 1.5.4(i) about nonlinear transformations of measurable functions). $\blacksquare$
1.9.5 Martingale Differences

Let $\{\mathcal{F}_t : t = 1, 2, \ldots\}$ be an increasing sequence of σ-fields contained in $\mathcal{F}$: $\mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n \subseteq \cdots \subseteq \mathcal{F}$. A sequence of random variables $\{e_t : t = 1, 2, \ldots\}$ is called adapted to $\{\mathcal{F}_t\}$ if $e_t$ is $\mathcal{F}_t$-measurable for $t = 1, 2, \ldots$ If a sequence of integrable variables $\{e_t\}$ satisfies
1. $\{e_t\}$ is adapted to $\{\mathcal{F}_t\}$ and
2. $E(e_t|\mathcal{F}_{t-1}) = 0$ for $t = 1, 2, \ldots$, where $\mathcal{F}_0 = \{\emptyset, \Omega\}$,
then we say that $\{e_t, \mathcal{F}_t\}$ or, shorter, $\{e_t\}$ is a martingale difference (m.d.) sequence.

Lemma. Square-integrable m.d. sequences are uncorrelated and have mean zero.

Proof. By the law of iterated expectations (LIE) [Eq. (1.23)] and the m.d. property (item 2) the means are zero: $Ee_t = E[E(e_t|\mathcal{F}_{t-1})|\mathcal{F}_0] = 0$, $t = 1, 2, \ldots$ Let $s < t$. Since $e_s$ is $\mathcal{F}_s$-measurable, it is $\mathcal{F}_{t-1}$-measurable. By extended homogeneity (Section 1.6.5) and the LIE
$$Ee_se_t = E[E(e_se_t|\mathcal{F}_{t-1})] = E[e_sE(e_t|\mathcal{F}_{t-1})] = 0. \qquad \blacksquare$$

The generality of the m.d. assumption is often reduced by the necessity to restrict the behavior of the second-order conditional moments by the condition
$$E(e_t^2|\mathcal{F}_{t-1}) = \sigma^2, \quad t = 1, 2, \ldots \qquad (1.33)$$
Owing to the LIE this condition implies $Ee_t^2 = \sigma^2$, $t = 1, 2, \ldots$ We denote by $MD(0, \sigma^2)$ the class of square-integrable m.d. sequences that satisfy Eq. (1.33). By Lemma 1.9.4, $IID(0, \sigma^2 I) \subseteq MD(0, \sigma^2)$ if we put $\mathcal{F}_t = \sigma(e_j : j \le t)$.
1.9.6 The Hierarchy of Errors

We have proved that
$$N(0, \sigma^2 I) \subseteq IID(0, \sigma^2 I) \subseteq MD(0, \sigma^2). \qquad (1.34)$$
Members of any of these three classes have a mean of zero and are uncorrelated. Normal errors are at the core of all error classes considered in this book. This means that any asymptotic results should hold for normal errors and the class of normal errors can be used as litmus paper for tentative assumptions and proofs. The
criterion of convergence in distribution of normal vectors (Section 1.8.6) facilitates verifying convergence in this class.

Some results will be proved for linear processes as errors. Let $\{c_j : j \in \mathbb{Z}\}$ be a double-infinite summable sequence of numbers, $\sum_{j\in\mathbb{Z}} |c_j| < \infty$, and let $\{e_j : j \in \mathbb{Z}\}$ be a sequence of integrable zero-mean random variables, called innovations. A linear process is a sequence $\{v_j : j \in \mathbb{Z}\}$ defined by the convolution
$$v_t = \sum_{j\in\mathbb{Z}} c_je_{t-j}, \quad t \in \mathbb{Z}. \qquad (1.35)$$
Members of any of the above three classes may serve as the innovations. If $c_0 = 1$ and $c_j = 0$ for any $j \ne 0$, we get $v_t = e_t$, which shows that the class of linear processes includes any of the three classes of Eq. (1.34).

Linear processes with summable $\{c_j\}$ are called short-memory processes. If $\sup_j E|e_j| < \infty$ and $\sum_j |c_j| < \infty$, then the $v_t$ have uniformly bounded $L_1$-norms, $E|v_t| \le \sup_j E|e_j| \sum_j |c_j| < \infty$, and zero means. More general processes with square-summable $\{c_j\}$, $\sum_{j\in\mathbb{Z}} c_j^2 < \infty$, are called long-memory processes. In this case, if the innovations are uncorrelated and have uniformly bounded $L_2$-norms, then the $v_t$ exist in the sense of $L_2$: $Ev_t^2 \le \sup_j Ee_j^2 \sum_j c_j^2 < \infty$. There are also mixing processes, see (Davidson, 1994), which are more useful in nonlinear problems. Long-memory and mixing processes are not considered here. Long-memory processes do not fit Theorem 3.5.2, as discussed in Section 3. Conditions in terms of mixing processes do not look nice, perhaps because they are inherently complex or the theory is underdeveloped.
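Equation (1.35) is easy to simulate once the two-sided sum is truncated; the sketch below is illustrative only, and the truncation point and the geometrically decaying weights are my choices, not the book's.

```python
import numpy as np

rng = np.random.default_rng(4)
J = 50                                   # truncation point of the two-sided sum
j = np.arange(-J, J + 1)
c = 0.5 ** np.abs(j)                     # summable weights: sum_j |c_j| < infinity
T = 500
e = rng.standard_normal(T + 2 * J)       # i.i.d. N(0, 1) innovations

# v_t = sum_j c_j e_{t-j}; the "valid" convolution keeps the T fully covered dates
v = np.convolve(e, c, mode="valid")
print(v.shape, v.mean(), v.var())        # a short-memory linear process of length T
```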
1.10 NORMALIZATION OF REGRESSORS

1.10.1 Normal Errors as the Touchstone of the Asymptotic Theory

Suppose we have a series of regressions $y = X\beta + e$ with the same $\beta$ and $n$ going to infinity (dependence of $y$, $X$ and $e$ on $n$ is not reflected in the notation). We would like to know if the sequence of corresponding OLS estimators $\hat\beta$ converges in distribution to a normal vector. We shall see that, as a preliminary step, $\hat\beta$ should be centered on $\beta$ and properly scaled, so that convergence takes place for $D_n(\hat\beta - \beta)$, where $D_n$ is some matrix function of the regressors. The factor $D_n$ is called a normalizer (it normalizes the variances of the components of the transformed errors in the OLS estimator formula to a constant). The choice of the normalizer is of crucial importance as it affects the conditions imposed later on $X$ and $e$. The classes of regressors and errors should be as wide as possible. The search for these classes is complicated if both regressors and errors are allowed to vary. However, under the hierarchy of errors described above the normal errors are the core of the theory. The implication is that, whatever the conditions imposed on $X$, they should work for the class of normal errors. The OLS estimator, being a linear transformation
of $e$, is normal when $e$ is normal. Therefore from the criterion of convergence in distribution of normal vectors (Section 1.8.6) we conclude that the choice of the normalizer and the class of regressors should satisfy the conditions
1. $\lim ED_n(\hat\beta - \beta)$ exists and
2. $\lim V(D_n(\hat\beta - \beta))$ exists when $e \sim N(0, \sigma^2 I)$.
For deterministic $X$, it is natural to stick to a deterministic $D_n$, so condition 1 trivially holds because of unbiasedness of $\hat\beta$. The second condition can be called a variance stabilization condition.
1.10.2 Where Does the Square Root Come From?

Consider $n$ independent observations on a normal variable with mean $\beta$ and standard deviation $\sigma$. In terms of regression, we are dealing with $X = (1, \ldots, 1)'$ ($n$ unities) and $e \sim N(0, \sigma^2 I)$. From the representation of the OLS estimator (1.31), $\hat\beta - \beta = (e_1 + \cdots + e_n)/n$. By independence of the components of $e$ this implies
$$V(\hat\beta - \beta) = \frac{1}{n^2}[V(e_1) + \cdots + V(e_n)] = \frac{\sigma^2}{n}.$$
Now it is easy to see that with $D_n = \sqrt{n}$ the variance stabilization condition is satisfied and the criterion of convergence of normal variables gives $\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2)$. The square root also works for stable autoregressive models (Hamilton, 1994).
1.10.3 One Nontrivial Regressor and Normal Errors

Consider a slightly more general case $y = x\beta + e$ with $x \in \mathbb{R}^n$ and a scalar $\beta$. The representation of the OLS estimator reduces to $\hat\beta - \beta = x'e/\|x\|_2^2$ and we easily find that
$$V\big(\|x\|_2(\hat\beta - \beta)\big) = \frac{1}{\|x\|_2^2}\sum_{i=1}^n x_i^2\sigma^2 = \sigma^2$$
under the same assumption $e \sim N(0, \sigma^2 I)$. It follows that
$$\|x\|_2(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2) \qquad (1.36)$$
and $D_n = \|x\|_2$ is the right normalizer.

What if instead of $D_n$ we use $\sqrt{n}$? Then $\sqrt{n}(\hat\beta - \beta) = \sqrt{n}\,x'e/\|x\|_2^2$ and the variance stabilization condition leads to
$$\frac{n}{\|x\|_2^2} \to \text{constant}.$$
This means that the $\sqrt{n}$-rule separates a narrow class of regressors for which $\|x\|_2$ is of order $\sqrt{n}$ for large $n$. In general, any function of $n$ tending to $\infty$ as $n \to \infty$ can be used as a normalizer for some class of regressors, and there are as many classes as there are functions with different behavior at infinity. The normalizer $D_n = \|x\|_2$ is better because it adapts to the regressor instead of separating some class. For example, for $x = (1, \ldots, 1)'$ ($n$ unities) it gives the classical square root and for a linear trend $x_1 = 1$, $x_2 = 2, \ldots, x_n = n$ it grows as $n^{3/2}$. As $D_n$ is self-adjusting, you don't need to know the rate of growth of $\|x\|_2$. This is especially important in applications where regressors don't have any particular analytical pattern. The decisive argument is that $D_n$ is in some sense unique (see Section 1.11.3).
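A small numerical check of the two growth rates just mentioned (constant regressor versus linear trend); purely illustrative.

```python
import numpy as np

for n in (10**2, 10**4, 10**6):
    ones = np.ones(n)
    trend = np.arange(1, n + 1, dtype=float)
    print(n,
          np.linalg.norm(ones) / np.sqrt(n),   # -> 1: D_n = sqrt(n) for a constant regressor
          np.linalg.norm(trend) / n**1.5)      # -> 1/sqrt(3): D_n grows as n^{3/2} for a trend
```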
1.10.4 The Errors Contribution Negligibility Condition

Let us look again at $y = x\beta + e$, where $e_1, \ldots, e_n$ are now $IID(0, \sigma^2 I)$ and not necessarily normal. Having made up our mind regarding the normalizer, we need to prove convergence in distribution of
$$\|x\|_2(\hat\beta - \beta) = \frac{x_1}{\|x\|_2}e_1 + \cdots + \frac{x_n}{\|x\|_2}e_n.$$
Here is where CLTs step in. The CLTs we need affirm the asymptotic normality of weighted sums
$$\sum_{t=1}^n w_{nt}e_t$$
of random variables $e_1, \ldots, e_n$, which are not necessarily normal. Convergence in distribution of such sums is possible under two types of restrictions. The first type limits dependence among the random variables and is satisfied in the case under consideration because we assume independence. The second type requires the contribution of each term in the sum to vanish asymptotically, where
$$\text{contribution} = \frac{\text{variance of a term}}{\text{variance of the sum}}.$$
Under our assumptions this type boils down to the condition
$$\lim_{n\to\infty}\max_{1\le t\le n}\frac{|x_t|}{\|x\|_2} = 0, \qquad (1.37)$$
often called an errors contribution negligibility condition. This condition in combination with $e \sim IID(0, \sigma^2 I)$ is sufficient to prove Eq. (1.36).
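Condition (1.37) is easy to verify numerically for the linear trend; the sketch below (an illustration, not from the text) shows the maximal weight $\max_t |x_t|/\|x\|_2$ decaying roughly like $\sqrt{3/n}$.

```python
import numpy as np

for n in (10, 100, 1000, 10000):
    x = np.arange(1, n + 1, dtype=float)              # linear trend regressor
    print(n, np.max(np.abs(x)) / np.linalg.norm(x))   # -> 0 as n grows, Eq. (1.37)
```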
1.11 GENERAL FRAMEWORK IN THE CASE OF K REGRESSORS

1.11.1 The Conventional Scheme

Now in the model $y = X\beta + e$ we allow $X$ to have more than one column and assume $\det X'X \ne 0$, $e \sim IID(0, \sigma^2 I)$. The rough approach consists in generalizing upon Section 1.10.2 (with a constant regressor) by relying on the identity
$$\sqrt{n}(\hat\beta - \beta) = \Big(\frac{X'X}{n}\Big)^{-1}\frac{X'e}{\sqrt{n}}. \qquad (1.38)$$
Suppose that here the limit
$$A = \lim_{n\to\infty}\frac{X'X}{n} \ \text{ exists and is nonsingular} \qquad (1.39)$$
and that
$$\frac{X'e}{\sqrt{n}} \xrightarrow{d} N(0, B). \qquad (1.40)$$
Then, by continuity of matrix inversion, $(X'X/n)^{-1} \to A^{-1}$, and the rule for convergence in distribution [Lemma 1.8.3(ii)] implies
$$\Big(\frac{X'X}{n}\Big)^{-1}\frac{X'e}{\sqrt{n}} \xrightarrow{d} A^{-1}u, \quad u \sim N(0, B).$$
As a result,
$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, A^{-1}BA^{-1}). \qquad (1.41)$$
As in the case $k = 1$, the rough approach separates a narrow class of regressor matrices by virtue of conditions (1.39) and (1.40).

The refined approach is based on the variance stabilization idea. Partitioning $X$ into columns, $X = (X_1, \ldots, X_k)$, we see that the vector $u = X'e$ has components $u_j = X_j'e$ with variances $V(u_j) = \sigma^2\|X_j\|_2^2$. Since $X'X$ is the Gram matrix of the system $\{X_1, \ldots, X_k\}$, the condition $\det X'X \ne 0$ is equivalent to linear independence of the columns (Section 1.7.4) and implies $\|X_j\|_2 \ne 0$ for all $j$ and large $n$. If we define the normalizer by
$$D_n = \mathrm{diag}\big[\|X_1\|_2, \ldots, \|X_k\|_2\big], \qquad (1.42)$$
then the matrix
$$H = XD_n^{-1} = \Big(\frac{X_1}{\|X_1\|_2}, \ldots, \frac{X_k}{\|X_k\|_2}\Big) = (H_1, \ldots, H_k)$$
has normalized columns, $\|H_j\|_2 = 1$. This construction is simple yet so important that I would love to name it after the discoverer. Unfortunately, the historical evidence is not clear-cut, as is shown in Section 1.11.2. For this reason I call $D_n$ a variance-stabilizing (VS) normalizer. The analog of Eq. (1.38) is [see Eq. (1.31)]
$$D_n(\hat\beta - \beta) = D_n(X'X)^{-1}X'e = (D_n^{-1}X'XD_n^{-1})^{-1}D_n^{-1}X'e = (H'H)^{-1}H'e. \qquad (1.43)$$
Naturally, the place of Eqs. (1.39) and (1.40) is taken by
$$A = \lim_{n\to\infty} H'H \ \text{ exists and is nonsingular} \qquad (1.44)$$
and
$$H'e \xrightarrow{d} N(0, B). \qquad (1.45)$$
We call both the combination of Eqs. (1.38) + (1.39) + (1.40) and that of Eqs. (1.43) + (1.44) + (1.45) a conventional scheme of derivation of the OLS asymptotics. The result in Section 1.11.3 implies that, if we want to use Eq. (1.43), condition (1.44) is unavoidable. If Eq. (1.44) is not satisfied with any normalization, the conventional scheme itself should be modified (see in Chapter 4 how P.C.B. Phillips handles this issue).
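A Monte Carlo sketch of the refined scheme (purely illustrative; the regressors, sample size and error variance are arbitrary choices): for each replication we compute $D_n(\hat\beta - \beta) = (H'H)^{-1}H'e$ as in Eq. (1.43) and compare its sample covariance with $\sigma^2(H'H)^{-1}$, which is what the scheme predicts for i.i.d. errors.

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 1000, 2.0
t = np.arange(1, n + 1, dtype=float)
X = np.column_stack([np.ones(n), t])              # constant and linear trend regressors
D = np.diag(np.linalg.norm(X, axis=0))            # VS normalizer, Eq. (1.42)
H = X @ np.linalg.inv(D)                          # columns of unit Euclidean length

reps = 5000
draws = np.empty((reps, 2))
for r in range(reps):
    e = sigma * rng.standard_normal(n)
    draws[r] = np.linalg.solve(H.T @ H, H.T @ e)  # D_n (beta_hat - beta), Eq. (1.43)

print(np.cov(draws.T))                            # close to sigma^2 (H'H)^{-1}
print(sigma**2 * np.linalg.inv(H.T @ H))
```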
1.11.2 History

The probabilists became aware of the variance stabilization principle a long time ago. It is realized in one form or another in all CLTs. It took some time for the idea to penetrate econometrics. Eicker (1963) introduced the normalizer $D_n$, but considered convergence of components of the OLS estimator instead of convergence of the estimator in joint distribution. Anderson (1971) proved convergence in joint distribution using $D_n$ and mentioned that the result "in a slightly different form was given by Eicker". Schmidt (1976), without reference to either Eicker or Anderson, established a result similar to Anderson's. None of these three authors compare $D_n$ to the classical normalizer. Moreover, Schmidt's comments imply that he thinks of $D_n$ as complementary to the square root. Amemiya (1985) proved Anderson's result, without referring to the three authors just cited. Evidently, he was the first to show that $D_n$ is superior to $\sqrt{n}$ in
the sense that Eq. (1.44) is more general than Eq. (1.39). He also noticed that Dn -type normalization is applicable to maximum likelihood estimators. Finally, Mynbaev and Castelar (2001) established that Dn is more general than any other normalizer, as long as the conventional scheme is employed. This result is the subject of Section 1.11.3.
1.11.3 Universality of Dn

Definition. A diagonal matrix (actually, a sequence of matrices) $\tilde D_n$ is called a conventional-scheme-compliant (CSC) normalizer if $\tilde H = X\tilde D_n^{-1}$ satisfies Eqs. (1.44) and (1.45) for all errors $e \sim IID(0, \sigma^2 I)$.

If $\{M_n\}$ is any sequence of nonstochastic diagonal matrices satisfying the condition
$$M = \lim M_n \ \text{ exists and is nonsingular} \qquad (1.46)$$
and $D_n$ is a CSC normalizer, then it is easily checked that $\tilde D_n = M_nD_n$ is also a CSC normalizer with $\tilde H = HM_n^{-1}$, $\tilde A = \lim \tilde H'\tilde H = M^{-1}AM^{-1}$, $\tilde B = M^{-1}BM^{-1}$.

Theorem. (Mynbaev and Castelar 2001) The VS normalizer (1.42) is unique in the class of CSC normalizers up to a factor satisfying Eq. (1.46). It follows that if the conventional scheme works with some normalizer, then $D_n$ can also be used, while the converse may not be true.

Proof. Let $\tilde D_n = \mathrm{diag}[\tilde d_{n1}, \ldots, \tilde d_{nk}]$ be some CSC normalizer, $\tilde H = X\tilde D_n^{-1}$, and let $\tilde A$ and $\tilde B$ be the corresponding elements of the conventional scheme. The diagonal of the limit relation $\tilde H'\tilde H \to \tilde A$ gives
$$\tilde H_j'\tilde H_j = \|X_j\|_2^2/\tilde d_{nj}^2 \to \tilde a_{jj}, \quad j = 1, \ldots, k, \qquad (1.47)$$
where $\tilde H_j$ denote the columns of $\tilde H$, $X_j$ the columns of $X$ and $\tilde a_{ij}$ the elements of $\tilde A$. Recalling that $D_n$ has $d_{nj} = \|X_j\|_2$ on its diagonal, we deduce from Eq. (1.47) that
$$d_{nj}/\tilde d_{nj} \to \tilde a_{jj}^{1/2}, \quad j = 1, \ldots, k. \qquad (1.48)$$
By the Cauchy–Schwarz inequality the elements of $\tilde H'\tilde H$ satisfy $|\tilde H_i'\tilde H_j| \le \|\tilde H_i\|_2\|\tilde H_j\|_2$. Letting $n \to \infty$ here and using Eq. (1.47) we get $|\tilde a_{ij}| \le (\tilde a_{ii}\tilde a_{jj})^{1/2}$. This tells us that none of the diagonal elements can be zero, because otherwise a whole cross in $\tilde A$ would consist of zeros and $\tilde A$ would be singular. Now from Eq. (1.48) we see that $M_n = \tilde D_nD_n^{-1}$ satisfies Eq. (1.46) and $\tilde D_n = M_nD_n$ differs from $D_n$ by an asymptotically constant diagonal factor. It follows that $D_n$ is CSC with $A = M\tilde AM$ and $B = M\tilde BM$.
The square root is an example of a normalizer that has a narrower area of applicability than $D_n$. $\blacksquare$
1.11.4 The Moore–Penrose Inverse

Suppose $A$ is a singular square matrix. According to (Rao 1965, Section 1b.5) the Moore–Penrose inverse $A^+$ of a matrix $A$ is uniquely defined by the properties
$$AA^+A = A, \qquad (1.49)$$
$$A^+AA^+ = A^+, \qquad (1.50)$$
$$AA^+ \ \text{and} \ A^+A \ \text{are symmetric.} \qquad (1.51)$$
When $A$ is symmetric, $A^+$ can be constructed explicitly using its diagonal representation. Let $A$ be of order $n$ and diagonalized as $A = P\Lambda P'$, where $P$ is orthogonal, $P'P = I$, and $\Lambda$ is a diagonal matrix of eigenvalues of $A$ (see Theorem 1.7.2). Denote
$$\Big(\frac{1}{\lambda}\Big)^+ = \begin{cases} \dfrac{1}{\lambda}, & \lambda \ne 0; \\ 0, & \lambda = 0, \end{cases} \qquad (\Lambda^{-1})^+ = \mathrm{diag}\Big[\Big(\frac{1}{\lambda_1}\Big)^+, \ldots, \Big(\frac{1}{\lambda_n}\Big)^+\Big], \qquad A^+ = P(\Lambda^{-1})^+P'.$$
Lemma. $A^+$ is the Moore–Penrose inverse of $A$. It is symmetric and the matrix $Q = A^+A$ is an orthoprojector: $Q' = Q$, $Q^2 = Q$.

Proof. $A^+$ is symmetric by construction. It is easy to see that the product $D = (\Lambda^{-1})^+\Lambda$ has zeros where $\Lambda$ has zeros and unities where $\Lambda$ has nonzero eigenvalues. Therefore $\Lambda D = \Lambda$ and $D(\Lambda^{-1})^+ = (\Lambda^{-1})^+$, so that Eqs. (1.49) and (1.50) are true: $AA^+A = P\Lambda DP' = A$, $A^+AA^+ = PD(\Lambda^{-1})^+P' = A^+$. Besides, the matrices $AA^+ = P\Lambda(\Lambda^{-1})^+P'$ and $A^+A = P(\Lambda^{-1})^+\Lambda P' = PDP'$ are symmetric. By the uniqueness of the Moore–Penrose inverse, $A^+$ is that inverse. The symmetry of $Q = A^+A$ has just been shown. $Q$ is idempotent: $Q^2 = (A^+A)^2 = PD^2P' = Q$. $\blacksquare$

Note that $A^+$ is not a continuous function of $A$. For example,
$$A_n = \begin{pmatrix} 1 & 0 \\ 0 & 1/n \end{pmatrix} \quad \text{converges to} \quad A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = A^+,$$
but
$$A_n^+ = \begin{pmatrix} 1 & 0 \\ 0 & n \end{pmatrix}$$
does not converge to $A^+$.
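The discontinuity is easy to reproduce with `numpy.linalg.pinv`; a sketch, for illustration only.

```python
import numpy as np

A = np.diag([1.0, 0.0])
print(np.linalg.pinv(A))                  # equals A itself: the Moore-Penrose inverse of A

for n in (10, 100, 1000):
    A_n = np.diag([1.0, 1.0 / n])         # A_n -> A as n grows
    print(n, np.linalg.pinv(A_n))         # but A_n^+ = diag(1, n) blows up
```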
1.11.5 What if the Limit of the Denominator Matrix is Singular?

Can the Moore–Penrose inverse save the situation? It is important to realize that convergence in distribution of $D_n(\hat\beta - \beta)$ in the conventional scheme is obtained as a consequence of Equations (1.43)–(1.45) from Section 1.11.1. Since Moore–Penrose inversion is not continuous, the scheme does not work when the limit of the denominator matrix is singular. The next proposition shows that the Moore–Penrose inverse can be applied if outside (independent of the conventional scheme) information is available in the form
$$v = \mathrm{dlim}\,D_n(\hat\beta - \beta) \ \text{ exists.} \qquad (1.52)$$

Lemma. If instead of Eq. (1.44) we assume that
$$A = \lim_{n\to\infty} H'H \ \text{ exists and is singular} \qquad (1.53)$$
and if two pieces of information about convergence in distribution are available in the form of Eqs. (1.45) and (1.52), then $Qv \sim N(0, A^+BA^+)$, where $Q = A^+A$ is an orthoprojector.

Proof. The normal equation $X'X(\hat\beta - \beta) = X'e$ can be rewritten as $H'HD_n(\hat\beta - \beta) = H'e$. Denoting $u$ the limit of the numerator and using Eqs. (1.53), (1.45) and (1.52) we get $Av = u$. Premultiplying this by $A^+$ we obtain $Qv = A^+u$. Now the statement follows from Eq. (1.45). $\blacksquare$

Thus, under the additional condition (1.52) some projection of $v$ is normally distributed, with a degenerate variance $A^+BA^+$.
CHAPTER 1
INTRODUCTION TO OPERATORS, PROBABILITIES AND THE LINEAR MODEL
1.12 INTRODUCTION TO L2-APPROXIMABILITY 1.12.1 Asymptotic Linear Independence By Theorem 1.7.4 the Gram matrix 0
H10 H1 0 @ ... G¼HH¼ Hk0 H1
1 . . . H10 Hk ... ... A . . . Hk0 Hk
is nonsingular if and only if the columns H1 , . . . , Hk of H are linearly independent. Therefore condition (1.44) is termed the asymptotic linear independence condition. The question is: can the word “asymptotic” be removed from this name, that is, are there any vectors for which nonsingularity of the limit A ¼ limn!1 H 0 H would mean simply linear independence? Imagine that for each j we have convergence of columns Hj ! Mj , as n ! 1, in such a way that Hk0 Hl ! Mk0 Mk : Then existence of the limit A ¼ limn!1 H 0 H would be guaranteed and detA = 0 would mean linear independence of M1 , . . . , Mk . Unfortunately, the sequences {Hj : n . k} do not converge. Their elements belong to Rn2 , which can be embedded naturally into l2 (N). A necessary condition for convergence x(n) ! x in l2 (N) is the coordinate-wise convergence x(n) i ! xi , n ! 1, for all i ¼ 1, 2, . . . . But for Eq. (1.45) to be true we have to require the errors contribution negligibility condition (1.37) which in terms of the elements of H looks like this: lim max jhij j ¼ 0:
n!1 i, j
Thus, convergence Hj ! Mj , as n ! 1, implies Mj ¼ 0, but this is impossible because kHj k2 ¼ 1 for all n because of normalization.
1.12.2 Discretization The general idea is to approximate sequences of vectors (functions of a discrete argument) with functions of a continuous argument. For any natural n a function f [ C[0, 1] generates a vector with coordinates f (i=n), i ¼ 1, . . . , n: A sequence of vectors {x(n) }, with x(n) [ Rn for all n, can be considered close to f if i ! 0, n ! 1: f max x(n) i 1in n This kind of approximation was used by Nabeya and Tanaka (1988), see also, (Tanaka, 1996). A better idea is to use the class L2 (0, 1), which is wider than C[0, 1]: However, the members of L2 (0, 1) are defined only up to sets of Lebesgue measure 0, and it doesn’t make sense to talk about values f (i=n) for f [ L2 (0, 1): Ð i=n Instead of values we can use integrals (i1)=n f (t) dt, i ¼ 1, . . . , n: For convenience,
1.12 INTRODUCTION TO L2-APPROXIMABILITY
the vector of integrals is multiplied by zation operator dn ð pffiffiffi i=n (dn f )i ¼ n
41
pffiffiffi n, which gives the definition of the discreti-
i ¼ 1, . . . , n:
f (t) dt,
(1:54)
(i1)=n
The sequence {dn f : n [ N} is called L2 -generated by f. L2-generated sequences were introduced by Moussatat (1976). With the volatility of economic data, in econometrics it is unacceptable to require regressors to be L2 -generated or, in other words, to be exact images of some f [ L2 (0, 1) under the mapping dn : To allow some deviation from exact images, in a conference presentation (Mynbaev 1997) I defined an L2 -approximable sequence as a sequence {x(n) } for which there is a function f [ L2 (0, 1) satisfying kx(n) dn f k2 ! 0: If this is true, we also say that {x(n) } is L2 -close to f [ L2 (0, 1): It is worth emphasizing that the OLS estimator asymptotics can be proved without this condition. When the errors are independent, the asymptotic linear independence and errors contribution negligibility condition are sufficient for this purpose, see (Anderson 1971; Amemiya 1985). In 1997 I needed this notion to find the asymptotic behavior of the fitted value, which is a more advanced problem. Note also that (Po¨tscher and Prucha 1997) and Davidson (1994) used the term Lp -approximability in a different context. L2 -approximable sequences and, more generally, Lp -approximable sequences defined in (Mynbaev 2000) possess some continuity properties when p , 1: This is their main advantage over general sequences.
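A sketch of Eq. (1.54) and of $L_2$-approximability: the normalized linear trend $H = x/\|x\|_2$ with $x_t = t$ turns out to be close to $d_nM$ for $M(s) = \sqrt{3}\,s$. The particular limit function, the numerical integration by a fine midpoint rule, and the helper name `d_n` are all my illustration choices, not statements from the text.

```python
import numpy as np

def d_n(f, n, fine=2000):
    """Discretization operator (1.54): sqrt(n) times the integral of f over ((i-1)/n, i/n),
    approximated by a midpoint rule with `fine` subintervals per cell."""
    mid = (np.arange(n * fine) + 0.5) / (n * fine)
    cell_means = f(mid).reshape(n, fine).mean(axis=1)
    return np.sqrt(n) * cell_means / n

M = lambda s: np.sqrt(3.0) * s                     # candidate limit function
for n in (10, 100, 1000):
    x = np.arange(1, n + 1, dtype=float)           # linear trend
    H = x / np.linalg.norm(x)                      # normalized column
    print(n, np.linalg.norm(H - d_n(M, n)))        # -> 0: the sequence is L2-approximable
```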
1.12.3 Ordinary Least Squares Asymptotics

Theorem. Consider a linear model $y = X\beta + u$ where
(i) the errors $u_1, \ldots, u_n$ are defined by Eq. (1.35), the innovations $\{e_j : j \in \mathbb{Z}\}$ are $IID(0, \sigma^2 I)$, $\sum_{j\in\mathbb{Z}} |c_j| < \infty$ and the $e_j^2$ are uniformly integrable;
(ii) for each $j = 1, \ldots, k$, the sequence of columns $\{H_j : n > k\}$ of the normalized regressor matrix $H = XD_n^{-1}$ is $L_2$-close to $M_j \in L_2(0, 1)$;
(iii) the functions $M_1, \ldots, M_k$ are linearly independent.
Then the denominator matrix $H'H$ converges to the Gram matrix $G$ of the system $M_1, \ldots, M_k$ and
$$D_n(\hat\beta - \beta) \xrightarrow{d} N\big(0, (\sigma b_c)^2G^{-1}\big). \qquad (1.55)$$

Proof. By Theorem 2.5.3, $\lim_{n\to\infty} H_i'H_j = \int_0^1 M_iM_j\,dt$ and, in consequence, $\lim H'H = G$. By Theorem 3.5.2, $H'u \xrightarrow{d} N(0, (\sigma b_c)^2G)$ (this includes the case when $H'u$ converges in distribution and in probability to a null vector). Equation (1.55) then follows from the conventional scheme. $\blacksquare$
In similar results with VS normalization (Anderson, 1971, Theorem 2.6.1; Schmidt, 1976, Section 2.7; Amemiya, 1985, Theorem 3.5.4) the errors are assumed independent. Assumptions on $H$ vary from source to source. In Theorems 2.5.3 and 3.5.2 the necessary properties of $H$ are derived from the $L_2$-approximability assumption. Instead, we could require them directly. When the errors are independent, these properties are: existence of the limit $A = \lim_{n\to\infty} H'H$, asymptotic linear independence $\det A \ne 0$ and the errors contribution negligibility condition $\lim_{n\to\infty}\max_{i,j}|h_{ij}| = 0$. Thus, as far as the OLS asymptotics for the classical model is concerned, the $L_2$-approximability condition is stronger than the minimum required. It becomes indispensable when deeper properties are needed, like convergence of the fitted value considered next.
1.12.4 Convergence of the Fitted Value

The fitted value is defined by $\hat y = X\hat\beta$. The need for its asymptotics may arise in the following way. Suppose we have to estimate a stock $q(t)$ based on its known initial value $q(t_0)$ and flow (rate of change) $q'(t)$. By the Newton–Leibniz formula, $q(t) - q(t_0) = \int_{t_0}^t q'(s)\,ds$. If $q'(t)$ is measured at discrete points and regressed on, say, a polynomial of time, the interpolated fitted value approximates $q'$ on the whole interval $[t_0, t]$ and integrating it gives an estimate of $q(t) - q(t_0)$.

As is the case with the OLS estimator, the fitted value has to be transformed to achieve convergence in distribution. Centering on $X\beta$ results in
$$\hat y - X\beta = X(\hat\beta - \beta) = XD_n^{-1}D_n(\hat\beta - \beta) = HD_n(\hat\beta - \beta). \qquad (1.56)$$
Convergence of $D_n(\hat\beta - \beta)$ is available from Theorem 1.12.3, but $H$ does not converge, as explained in Section 1.12.1. It happens, though, that interpolating $H$ leads to a convergent sequence in $L_2(0, 1)$. A vector $x$ with $n$ values is interpolated by constants to obtain a step function $\Delta_nx = \sum_{t=1}^n x_t1_{i_t}$. The interpolation operator $\Delta_n$ is applied to the columns of $H$. From Eq. (1.56) we get
$$\Delta_n(\hat y - X\beta) = \Delta_n\sum_{l=1}^k H_l[D_n(\hat\beta - \beta)]_l = \sum_{l=1}^k(\Delta_nH_l)[D_n(\hat\beta - \beta)]_l.$$

Theorem. Under the assumptions of Theorem 1.12.3 the fitted value converges in distribution to a linear combination of the functions $M_1, \ldots, M_k$:
$$\Delta_n(\hat y - X\beta) \xrightarrow{d} \sum_{l=1}^k M_lc_l,$$
where the random vector $c = (c_1, \ldots, c_k)'$ is distributed as $N(0, (\sigma b_c)^2G^{-1})$.

Proof. By Lemma 2.5.1 the $L_2$-approximability condition $\|H_l - d_nM_l\|_2 \to 0$ is equivalent to $\|\Delta_nH_l - M_l\|_{L_2} \to 0$. Convergence of $\{\Delta_nH_l\}$ to $M_l$ in $L_2$ implies convergence in distribution of $\{\Delta_nH\}$ to the vector $M = (M_1, \ldots, M_k)'$. In the expression $\Delta_n(\hat y - X\beta) = [\Delta_nH]'[D_n(\hat\beta - \beta)]$ both factors in brackets at the right
converge in distribution. Since their limits $M$ and $u = \mathrm{dlim}\,D_n(\hat\beta - \beta)$ are independent and, for each $n$, $\Delta_nH$ and $D_n(\hat\beta - \beta)$ are independent, the relations $\Delta_nH \xrightarrow{d} M$ and $D_n(\hat\beta - \beta) \xrightarrow{d} u$ imply convergence of the pair $(\Delta_nH, D_n(\hat\beta - \beta)) \xrightarrow{d} (M, u)$; see (Billingsley 1968, pp. 26–27). By the continuous mapping theorem then $\Delta_n(\hat y - X\beta) \xrightarrow{d} M'u$. $\blacksquare$
1.12.5 Convictions and Preconceptions In econometrics too much depends on the views of the researcher. Apparently, a set of real-world data can be looked at from different angles. Unfortunately, theoretical studies also suffer from the subjectivity of their authors. Two different sets of assumptions for the same model may lead to quite different conclusions. The choice of the assumptions depends on the previous experience of the researcher, the method employed and the desired result. Assumptions made for and views drawn from a simple model are often taken to a higher level where they can be called convictions if justified or preconceptions if questionable. A practitioner usually worries only about the qualitative side of the result. A highly technical paper about estimator asymptotics in his/her interpretation boils down to “under some regularity conditions the estimator is asymptotically normal”. Hypotheses tests are conducted accordingly, the result is cited without proofs in expository monographs for applied specialists and, with time, becomes a part of folklore. The probability of a critical revision of the original paper declines exponentially. Imagine that you are a security agent entrusted with the task of capturing an alien that is killing humans. If you presuppose that the beast is disguised like a human your course of actions will be quite different from what it would be if you were looking for a giant cockroach. When you see a new estimator, its asymptotics is that alien. The best of all is not to presume that it is of a particular type. Make simplified assumptions and look at the finite-sample distributions in the case of normal disturbances. If they are normal, perhaps the asymptotics is also normal. If they are not, a suitable transformation of the estimator, such as centering and scaling, may result in normal asymptotics. Alternatively, you may have to apply a CLT in conjunction with the CMT to obtain nonnormal asymptotics. All these possibilities are illustrated in the book. By choosing the format of the result you make a commitment. Normal asymptotics is usually proved using a CLT. Let us say it comes with conditions (A), (B) and (C). To satisfy them, you impose in terms of your model conditions (A0 ), (B0 ) and (C0 ), respectively. These conditions determine the class of processes your result is applicable to. By selecting a different format you are bound to use different techniques and obtain a different class. In the case of the conventional scheme an easy way to go is simply assume that X and e are such that either Eqs. (1.39) þ (1.40) or Eqs. (1.44) þ (1.45) are satisfied. I call such a “theorem” a pig-in-a-poke result. While this approach serves illustrative purposes in a university course well, its value in a research paper or monograph is doubtful. Eicker (1963) mentions that conditions should be imposed separately on the errors and regressors.
In this relation it is useful to distinguish between low-level conditions, stated directly in terms of the primary elements of the model, such as Eq. (1.44), and high-level conditions, expressed in terms of some complex combinations of the basic elements, such as Eq. (1.45). Of course, this distinction is relative. For instance, the L2-approximability assumption about deterministic regressors made in the most part of this book is of a lower level than Eq. (1.44). The parsimony principle in econometric modeling states that a model should contain as few parameters as possible or be simple otherwise and still describe the process in question well. A similar principle applies to the choice of conditions. If you have imposed several of them and are about to require a new one, make sure that it is not implied or contradicted by the previous conditions. My major professor, M. Otelbaev, used to say, “If I am allowed to impose many conditions, I can prove anything”. Transparency, simplicity and beauty are other subjective measures of the assumptions quality. A good taste is acquired by reading and comparing many sources. It is not a good idea to have a prospective user of your result prove a whole theorem to check whether your assumptions are satisfied. Nontransparent conditions appealing to existence of objects with certain properties are especially dangerous. It is quite possible to use the right theorems and comply with all the rules of formal logic and get a bad statement because the set of objects it applies to will be empty if the conditions are contradictory or existence requirements are infeasible. Contradictions are easy to avoid by using conditions with nonoverlapping responsibilities. In other words, beware of two different conditions governing the behavior of the same object. Generalizations do not always work, as we have seen when going from constant to variable regressors. However, when studying a dynamic model, such as the mixed spatial model Y ¼ Xb þ rWY þ e in Chapter 5, I choose the conditions and methods that work for its two submodels, Y ¼ Xb þ e and Y ¼ rWY þ e. In this sense, this book is not free from subjectivity. Generalizations based on the conventional scheme can be as harmful as any others. The study of the purely spatial model in Chapter 5 shows that the said model violates the habitual notions in several ways: 1. the OLS asymptotics is not normal, 2. the limit of the numerator vector is not normal, 3. the limit of the denominator matrix is not constant, 4. the normalizer is identically 1 (that is, no scaling is necessary) and 5. there is no consistency. These days requirements to econometric papers are very high. If you suggest a new model, you have to defend it by showing its theoretical advantages and testing its practical performance, preferably in the same paper. The author of a new model can be excused if he/she studies the model under simplified assumptions and leaves the generalizations and refinements to the followers. The way of modeling deterministic regressors advocated here allows us to combine simple assumptions with rigorous proofs.
CHAPTER 2

Lp-APPROXIMABLE SEQUENCES OF VECTORS

IN THIS chapter we use some classical tools, the Haar projector and continuity modulus in the first place. With their help we can study the properties of discretization, interpolation and convolution operators. From there we go to $L_p$-approximable sequences and their properties, among which the criterion of $L_p$-approximability is the most advanced. The chapter ends with examples. Everywhere in this chapter $L_p$ denotes the space $L_p(0, 1)$. Where necessary, integration over subsets of $(0, 1)$ is indicated as in $\|F\|_{p,(a,b)} = \big(\int_a^b |F(x)|^p\,dx\big)^{1/p}$.
2.1 DISCRETIZATION, INTERPOLATION AND HAAR PROJECTOR IN Lp

2.1.1 Partitions and Coverings

For each natural $n$ the set $\{t/n : t = 0, \ldots, n\}$ is called a uniform partition. The intervals $i_t = [(t-1)/n, t/n)$ form a disjoint covering of $[0, 1)$ of equal length $1/n$. Denoting $[a]$ the integer part of a real number $a$, we can see that the condition $x \in i_t$ is equivalent to $t - 1 \le nx < t$, which, in turn, is equivalent to $t = [nx] + 1$. The function $[nx] + 1$ can be called a locator because $x \in i_{[nx]+1}$ for all $x \in [0, 1)$.
2.1.2 Discretization Operator Definition

For each natural $n$, we can define a discretization operator $d_{np} : L_p \to \mathbb{R}_p^n$ by
$$(d_{np}F)_t = n^{1/q}\int_{i_t}F(x)\,dx, \quad t = 1, \ldots, n, \ F \in L_p,$$
where $q$ is the conjugate of $p$, $1/p + 1/q = 1$. Up to a scaling factor, the $t$th component of $d_{np}F$ is the average of $F$ over the interval $i_t$. For a given $F \in L_p$, the sequence $\{d_{np}F : n \in \mathbb{N}\}$ is called $L_p$-generated by $F$.
2.1.3 Discretization Operator Properties

Lemma. If $F \in L_p$, $1 \le p \le \infty$, then
(i) $|(d_{np}F)_t| \le \|F\|_{p,i_t}$,
(ii) $\|d_{np}F\|_p \le \|F\|_p$ and
(iii) $\lim_{n\to\infty}\max_{1\le t\le n}|(d_{np}F)_t| = 0$ in case $p < \infty$.

Proof. Everywhere we assume $p < \infty$, the modification for $p = \infty$ being obvious.
(i) By Hölder's inequality
$$|(d_{np}F)_t| \le n^{1/q}\Big(\int_{i_t}|F(x)|^p\,dx\Big)^{1/p}\Big(\int_{i_t}dx\Big)^{1/q} = \|F\|_{p,i_t}.$$
(ii) If $p < \infty$, use part (i) to get, by additivity of integrals,
$$\Big(\sum_{t=1}^n|(d_{np}F)_t|^p\Big)^{1/p} \le \Big(\sum_{t=1}^n\int_{i_t}|F(x)|^p\,dx\Big)^{1/p} = \|F\|_p.$$
(iii) Since $|F(\cdot)|^p \in L_1$, we can use absolute continuity of the Lebesgue integral to find, for any $\varepsilon > 0$, a number $n(\varepsilon) > 0$ such that $n > n(\varepsilon)$ implies $\int_{i_t}|F(x)|^p\,dx < \varepsilon$ for $t = 1, \ldots, n$. Now the statement follows from (i). $\blacksquare$

Statement (ii) means that the norms of $d_{np}$ from $L_p$ to $l_p$ do not exceed 1.
2.1.4 Continuity Modulus in Lp, p < ∞

Most properties of continuity moduli we need can be found in Zhuk and Natanson (2001). For $y \in \mathbb{R}$, let $\tau_y$ be the translation operator, $(\tau_yF)(x) = F(x + y)$. If $F$ is defined on $(a, b)$, $\tau_yF$ is defined on $(a, b) - y = (a - y, b - y)$. As the domain of the difference $F - \tau_yF$ people take the intersection of these intervals, denoted
$$(a, b)_y = (a, b)\cap[(a, b) - y] = (\max\{a, a - y\}, \min\{b, b - y\}).$$
In particular, with $\Omega = (0, 1)$ the difference $F - \tau_yF$ is defined on $\Omega_y$. The continuity modulus of $F \in L_p$ equals, by definition,
$$\omega_p(F, \delta) = \sup_{|y|\le\delta}\|F - \tau_yF\|_{p,\Omega_y}, \quad \delta > 0.$$
The continuity modulus is designed to measure how close $\tau_yF$ is to $F$ for small $y$. This can be demonstrated using the indicator $1_A$ of a set $A$. Think of the function
$F = 1_A$ as an example, where $A$ is some measurable subset of $(0, 1)$. Then $\tau_y1_A = 1$ on $A - y$ and $\tau_y1_A = 0$ outside $A - y$. The Venn diagram shows that $1_A - \tau_y1_A$ is zero on the intersection $A\cap(A - y)$ and outside the union $A\cup(A - y)$, and its absolute value is unity on the symmetric difference $s_y = [A\setminus(A - y)]\cup[(A - y)\setminus A]$ of $A$ and $A - y$. Therefore, when $p < \infty$,
$$\|1_A - \tau_y1_A\|_{p,\Omega_y}^p = \int_{\Omega_y}|1_A - \tau_y1_A|^p\,dx \le \int_{s_y}dx = \mathrm{mes}(s_y),$$
where mes denotes the Lebesgue measure. In the theory of the Lebesgue measure it is proved that $\mathrm{mes}(s_y) \to 0$ when $y \to 0$, which implies
$$\|1_A - \tau_y1_A\|_{p,\Omega_y} \to 0, \quad y \to 0 \quad (p < \infty). \qquad (2.1)$$
However, if $p = \infty$, then
$$\|1_A - \tau_y1_A\|_{\infty,\Omega_y} \ge \operatorname*{ess\,sup}_{x\in s_y}1_{s_y}(x). \qquad (2.2)$$
Here the quantity at the right is 1 as long as $\mathrm{mes}(s_y) > 0$.
2.1.5 Continuity of Elements of Lp

Lemma
(i) The continuity modulus is nondecreasing: $\delta \le \gamma$ implies that $\omega_p(F, \delta) \le \omega_p(F, \gamma)$.
(ii) If $F \in L_p$, $p < \infty$, then $\lim_{\delta\to0}\omega_p(F, \delta) = 0$.

Proof. Part (i) follows directly from the definition: a supremum over a larger set is larger.
(ii) Since linear combinations of indicators of measurable sets are dense in $L_p$ when $p < \infty$, Eq. (2.1) extends to all of $L_p$:
$$\|F - \tau_yF\|_{p,\Omega_y} \to 0, \quad y \to 0 \quad (F \in L_p,\ p < \infty). \qquad (2.3)$$
This is described by saying that functions from $L_p$ with finite $p$ are continuous in mean (or translation continuous). To prove (ii), it remains to apply the sup over $|y| \le \delta$ to Eq. (2.3). $\blacksquare$

Equation (2.2) shows that elements of $L_\infty$ are not translation continuous. When it is important to have this property, $L_\infty$ is replaced by the set of continuous functions $C[0, 1]$. The continuity modulus of a continuous function on $[0, 1]$ is defined by
$$\omega_C(F, \delta) = \sup_{|x-y|\le\delta,\ x,y\in[0,1]}|F(x) - F(y)|.$$
This tends to 0 when $\delta$ tends to zero because a continuous function on $[0, 1]$ is uniformly continuous.
2.1.6 Interpolation Operator and Haar Projector

Definitions. The interpolation operator $\Delta_{np}\colon \mathbb{R}^n_p \to L_p$ takes a vector $f \in \mathbb{R}^n_p$ to a step function
$$\Delta_{np} f = n^{1/p}\sum_{t=1}^n f_t 1_{i_t}.$$
Here, the factor $n^{1/p}$ is introduced just for convenience. Interpolation by a piecewise constant function is sufficient for our purposes. The Haar projector $P_n$ is defined on integrable functions on $[0, 1]$ by
$$P_n F = n\sum_{t=1}^n \Big(\int_{i_t} F(x)\,dx\Big) 1_{i_t}$$
(the value of $P_n F$ on the interval $i_t$ is the average of $F$ on that interval). $P_n$ is really a projector because on $i_s$ only one term in the above sum can be different from zero, so
$$P_n(P_n F) = n\sum_{s=1}^n \Big(\int_{i_s}(P_n F)(y)\,dy\Big)1_{i_s} = n\sum_{s=1}^n \Big(\int_{i_s}\Big(n\int_{i_s}F(x)\,dx\Big)1_{i_s}(y)\,dy\Big)1_{i_s} = P_n F.$$
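To make the averaging definition of $P_n$ concrete, here is a small numerical sketch (Python/NumPy; the helper names are mine). It represents functions by their values on a fine grid, applies the Haar projector, and checks both the projector property $P_n(P_nF) = P_nF$ and the convergence $P_nF \to F$ that Lemma 2.2.1 below quantifies.

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 10_000, endpoint=False) + 0.5e-4  # midpoints of a fine grid

def haar_project(values, n):
    """P_n F: replace F on each interval i_t = ((t-1)/n, t/n) by its average."""
    out = np.empty_like(values)
    idx = np.minimum((GRID * n).astype(int), n - 1)   # which i_t each grid point falls in
    for t in range(n):
        mask = idx == t
        out[mask] = values[mask].mean()               # average of F over i_t
    return out

if __name__ == "__main__":
    F = np.sin(2 * np.pi * GRID) + GRID ** 2
    n = 16
    PnF = haar_project(F, n)
    print(np.allclose(haar_project(PnF, n), PnF))     # P_n is a projector
    for m in (4, 16, 64, 256):                        # L_2 error of P_m F shrinks with m
        err = np.sqrt(np.mean((haar_project(F, m) - F) ** 2))
        print(m, round(err, 5))
```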
2.1.7 Interpolation and Haar Projector Properties

Lemma. Let $1 \le p \le \infty$.

(i) $\Delta_{np}$ preserves norms, $\|\Delta_{np}f\|_p = \|f\|_p$ for all $f \in \mathbb{R}^n_p$, and bilinear forms,
$$\int_0^1 (\Delta_{np}f)\,\Delta_{nq}g\,dx = f'g \quad \text{for all } f, g \in \mathbb{R}^n. \tag{2.4}$$
(ii) Discretizing and then interpolating is the same as projecting: $\Delta_{np}\delta_{np} = P_n$. Interpolation and subsequent discretization amount to applying the identity operator: $\delta_{np}\Delta_{np} = I$ on $\mathbb{R}^n_p$.
(iii) Norms of the projectors $P_n$ do not exceed 1: $\|P_nF\|_p \le \|F\|_p$.
Proof. (i) If $p < \infty$, then
$$\int_0^1 |(\Delta_{np}f)(x)|^p\,dx = \sum_{t=1}^n \int_{i_t}|(\Delta_{np}f)(x)|^p\,dx = n\sum_{t=1}^n \int_{i_t}|f_t|^p 1_{i_t}(x)\,dx = \sum_{t=1}^n |f_t|^p.$$
If $p = \infty$, then $\|\Delta_{n\infty}f\|_\infty = \max_{t=1,\dots,n}|f_t|$. Further,
$$\int_0^1 (\Delta_{np}f)\,\Delta_{nq}g\,dx = \sum_{t=1}^n \int_{i_t} n^{1/p}f_t\,n^{1/q}g_t\,dx = \sum_{t=1}^n n f_t g_t \int_{i_t}dx = f'g.$$

(ii) For $F \in L_p$,
$$\Delta_{np}(\delta_{np}F) = n^{1/p}\sum_{t=1}^n (\delta_{np}F)_t 1_{i_t} = n^{1/p+1/q}\sum_{t=1}^n \Big(\int_{i_t}F(x)\,dx\Big)1_{i_t} = P_nF.$$
To see that $\delta_{np}\Delta_{np}f = f$ for all $f \in \mathbb{R}^n$ it suffices to check that $(\delta_{np}\Delta_{np}f)_t = f_t$ for all $t$. This is true because interpolating $f$ generates on $i_t$ a constant equal to $F = n^{1/p}f_t$. With this $F$, $(\delta_{np}F)_t$ gives $f_t$.

(iii) From boundedness of the discretization operator [Lemma 2.1.3(ii)] and items (i) and (ii) of this lemma,
$$\|P_nF\|_p = \|\Delta_{np}\delta_{np}F\|_p = \|\delta_{np}F\|_p \le \|F\|_p. \qquad ∎$$
2.2 CONVERGENCE OF BILINEAR FORMS

An expression of type $\int_0^1 F(x)G(x)\,dx$ is a bilinear form: it is linear in $F$ when $G$ is fixed,
$$\int_0^1 (aF + bH)G\,dx = a\int_0^1 FG\,dx + b\int_0^1 HG\,dx,$$
and similarly it is linear in $G$ when $F$ is fixed.
2.2.1 Convergence of Haar Projectors to the Identity Operator

Lemma. If $p < \infty$, then the sequence $\{P_n\}$ converges strongly to the identity operator with the following bound on the rate of convergence: $\|P_nF - F\|_p \le 2^{1/p}\omega_p(F, 1/n)$.

Proof. Step 1. Using additivity of integrals and the fact that the restriction of $P_nF$ to $i_t$ equals $n\int_{i_t}F\,dx$, we have $\|P_nF - F\|_p^p =$
n ð X t¼1
j(Pn F)( y) F( y)jp dy
it
p ð ð ¼ n F(x) dx n F(y) dx dy t¼1 it it it p n ð ð X p ¼n (F(x) F( y)) dx dy: t¼1 it it n ð X
Now we apply Ho¨lder’s inequality to X(x) ¼ F(x) F( y) and Y(x) ; 1 and use the identity p p=q ¼ 1: kPn F
Fkpp
n
p
n ðð X t¼1
¼n
it it
n ðð X t¼1
jF(x) F( y)jp dx dy np=q
jF(x) F( y)jp dx dy:
it it
As we can see, we need to estimate the integrals over the squares it it . Step 2. Let F be defined on D ¼ (a, a þ b). We want to reveal the translation operator in the integral ðð jF(x) F( y)jp dx dy I¼ DD
(change x ¼ y þ z in the inner integral) aþb ð ð aþby
¼ a
ay
jF( y þ z) F( y)jp dz dy:
2.2 CONVERGENCE OF BILINEAR FORMS
51
Figure 2.1 Change of variables.
The inner integral should be over y. To change the order of integration, we have to split {z: b z b} into {z: b z 0} and {z: 0 z b} (Figure 2.1). Reading off the limits of integration from the diagram we write aþb ð
ð0 dz
I¼
jF( y) (tz F)( y)jp dy
az
b
ðb
aþbz ð
þ dz
jF( y) (tz F)( y)jp dy:
a
0
Step 3. Combining Steps 1 and 2 we get 2
kPn F
Fkpp
n
ð0
n X
6 4
t¼1
1=n
ð
1=n ð
dz
þ 0
ð dz
jF tz Fjp dy
(it )z
3 7 jF tz Fjp dy5:
(it )z
The intervals (it )z conveniently satisfy
Vz
ð
1=n ð
dz
þ 0
Vz
3 7 jF tz Fjp dy5 2vpp (F, 1=n): B
2.2.2 Bilinear Forms Involving Haar Projectors and Discretizers

Here we prove a preliminary result that helps understand the main result from Section 2.2.3.

Lemma. (Convergence of bilinear forms) Let $1 < p < \infty$, $F \in L_p$, $G \in L_q$. Then
$$\lim_{n\to\infty} (\delta_{np}F)'\delta_{nq}G = \lim_{n\to\infty}\int_0^1 (P_nF)P_nG\,dx = \int_0^1 FG\,dx. \tag{2.5}$$

Proof. First we establish a variant of the bilinear form preservation property. Plugging $f = \delta_{np}F$, $g = \delta_{nq}G$ into identity (2.4) and remembering that $\Delta_{np}\delta_{np} = P_n$ by Lemma 2.1.7(ii), we see that
$$\int_0^1 (P_nF)P_nG\,dx = (\delta_{np}F)'\delta_{nq}G, \tag{2.6}$$
which implies the first equation in Eq. (2.5). Now we use the age-old trick of adding and subtracting members and Hölder's inequality to estimate
$$\Big|\int_0^1 (P_nF)P_nG\,dx - \int_0^1 FG\,dx\Big| = \Big|\int_0^1 (P_nF)(P_nG - G)\,dx + \int_0^1 (P_nF - F)G\,dx\Big| \le \|P_nF\|_p\|P_nG - G\|_q + \|P_nF - F\|_p\|G\|_q.$$
By Lemmas 2.1.7 and 2.2.1
$$\Big|\int_0^1 (P_nF)P_nG\,dx - \int_0^1 FG\,dx\Big| \le \max\{2^{3/(2p)}, 2^{3/(2q)}\}\,[\|F\|_p\,\omega_q(G, 1/n) + \omega_p(F, 1/n)\|G\|_q]. \tag{2.7}$$
This bound and Lemma 2.1.5 establish the second equation in Eq. (2.5). ∎
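The convergence (2.5) is easy to watch numerically: the inner product of the discretized vectors approaches the integral of the product. A minimal sketch (Python/NumPy; `discretize` is my own helper, repeated inline for self-containedness):

```python
import numpy as np

def discretize(F, n, p):
    """(delta_np F)_t = n**(1/q) * integral of F over i_t, with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    x = np.linspace(0.0, 1.0, 20_001)
    v = F(x)
    return np.array([n ** (1 / q) * np.trapz(v[(x >= t / n) & (x <= (t + 1) / n)],
                                             x[(x >= t / n) & (x <= (t + 1) / n)])
                     for t in range(n)])

if __name__ == "__main__":
    p, q = 2.0, 2.0
    F = lambda x: np.sqrt(x)          # F in L_p
    G = lambda x: np.cos(3 * x)       # G in L_q
    x = np.linspace(0, 1, 20_001)
    target = np.trapz(F(x) * G(x), x)                       # integral of F*G over (0, 1)
    for n in (4, 16, 64, 256):
        approx = discretize(F, n, p) @ discretize(G, n, q)  # (delta_np F)' delta_nq G
        print(n, round(approx, 6), "->", round(target, 6))
```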
2.2.3 Bilinear Forms of $L_p$-Generated Sequences

Theorem. (Mynbaev, 2000) Let $1 < p < \infty$, $X \in L_p$, $Y \in L_q$ and let $[a, b] \subseteq [0, 1]$. Then
$$\lim_{n\to\infty} [\delta_{np}(1_{[a,b]}X)]'\delta_{nq}(1_{[a,b]}Y) = \int_a^b X(x)Y(x)\,dx$$
uniformly with respect to the intervals $[a, b]$.
Proof. After substituting F ¼ 1[a,b] X and G ¼ 1[a,b] Y in Eq. (2.7) and on account of Eq. (2.6) we conclude that the problem reduces to bounding k1[a,b] Xkp and vp (1[a,b] X, d) for arbitrary X [ Lp , 1 , p , 1 and [a, b] # [0, 1]. It is easy to see that multiplication by 1[a,b] is bounded in Lp uniformly in [a, b]: k1[a,b] Xkp kXkp :
(2:8)
Bounding the continuity modulus is more involved. The beginning is suggested by the definition of vp ( f X, d) where we put f ¼ 1[a,b] : k f X ty ( f X)kp,Vy k f X f ty Xkp,Vy þ k f ty X (ty f )ty Xkp,Vy ¼ k f (X ty X)kp,Vy þ k( f ty f )ty Xkp,Vy :
(2.9)
To the first term at the right apply Eq. (2.8): k f (X ty X)kp,Vy kX ty Xkp,Vy :
(2:10)
Note that the translation operator can be thrown over to the adjacent factor,
k f ty gkpp,Vy
min{1,1y} ð
¼
j f (x)g(x þ y)jp dx
max{0,y}
(replace x þ y ¼ t) min {1þy,1} ð
¼
j f (t y)g(t)jp dt ¼ k(ty f )gkpp,Vy :
max {y,0}
Hence the second term at the right of Eq. (2.9) equals k( f ty f )ty Xkp,Vy ¼ k(ty f f )Xkp,Vy Consider the measure m(A) ¼ outside (0, 1). Let
Ð A
(2:11)
jX(x)jp dx where X has been extended by zero
sy ¼ {[a, b]n[a þ y, b þ y]} < {[a þ y, b þ y]n[a, b]} denote the set on which f ty f ¼ 1. With this notation, from Eq. (2.11) we see that k( f ty f )ty Xkpp,Vy ¼
ð
j f (x) (ty f )(x)jp dm(x)
Vy
¼ m(Vy > sy ) m(sy ):
(2.12)
Suppose [a, b] and [a, b] þ y do not intersect. Then either a þ y . b or b þ y , a. In both cases b a , y. sy is a union of two nonoverlapping segments of length b a, so mes(sy ) ¼ 2(b a) , 2jyj: Suppose [a, b] and [a, b] þ y do intersect. Then their symmetric difference sy is a union of two disjoint intervals of length jyj each, so mes(sy ) ¼ 2jyj: In consequence, mes(sy ) 2jyj independently of [a, b]. Since m is absolutely continuous with respect to the Lebesgue measure, for any 1 . 0 there exists d . 0 such that mes(A) , d implies m(A) , 1. By choosing jyj , d=2 we satisfy m(sy ) , 1. Now Eqs. (2.9), (2.10) and (2.12) show that vp ( fX, d=2) vp (X, d=2) þ 1. Since 1 and d are arbitrarily small, we have proved that lim vp ( f X, d) ¼ 0
d !0
uniformly in [a, b].
B
I claimed (Mynbaev, 2000) that this theorem holds for nonuniform partitions, but I failed to prove this generalization six years later when I was writing this chapter.
2.3 THE TRINITY AND ITS BOUNDEDNESS IN $l_p$

2.3.1 Motivation

Consider a linear process
$$v_t = \sum_{j\in\mathbb{Z}} c_{t-j}e_j, \quad t \in \mathbb{Z},$$
where $\{c_j : j \in \mathbb{Z}\}$ is a summable sequence of real numbers and $\{e_j : j \in \mathbb{Z}\}$ is a sequence of centered ($Ee_j = 0$) integrable random variables (innovations). Suppose that for each $n \in \mathbb{N}$ we are given a vector of weights $w_n \in \mathbb{R}^n$ and the question is about convergence in distribution of the weighted sums
$$S_n = \sum_{t=1}^n w_{nt}v_t.$$
Changing the order of summation,
$$S_n = \sum_{t=1}^n w_{nt}\sum_{j\in\mathbb{Z}} c_{t-j}e_j = \sum_{j\in\mathbb{Z}}\Big(\sum_{t=1}^n w_{nt}c_{t-j}\Big)e_j,$$
we see that it makes sense to consider the convolution operator $T_n\colon \mathbb{R}^n_p \to l_p(\mathbb{Z})$ defined by
$$(T_nw)_j = \sum_{t=1}^n w_t c_{t-j}, \quad j \in \mathbb{Z}.$$
Sometimes it is convenient to represent $T_nw$ as
$$T_nw = \begin{pmatrix} T_n^-w \\ T_n^0w \\ T_n^+w \end{pmatrix},$$
where $T_n^+\colon \mathbb{R}^n_p \to l_p(j > n)$, $T_n^0\colon \mathbb{R}^n_p \to \mathbb{R}^n_p$ and $T_n^-\colon \mathbb{R}^n_p \to l_p(j < 1)$ are defined by
$$(T_n^+w)_j = (T_nw)_j,\ j > n; \quad (T_n^0w)_j = (T_nw)_j,\ 1 \le j \le n; \quad (T_n^-w)_j = (T_nw)_j,\ j < 1.$$
As a consequence of the exceptional role of $T_n^+$, $T_n^0$ and $T_n^-$ in this theory, I call these three operators a trinity. Naturally, $T_n$ is called a T-operator.
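As a quick illustration, the following sketch (Python/NumPy; all names are mine) computes $(T_nw)_j$ for a finite window of indices $j$ and splits the result into the three blocks of the trinity. With summable, geometrically decaying coefficients and weights generated by $F \equiv 1$, the middle block carries most of the mass.

```python
import numpy as np

def T_operator(w, c, j_range):
    """(T_n w)_j = sum_{t=1..n} w_t * c_{t-j} for j in j_range; c is a dict {k: c_k}."""
    n = len(w)
    return np.array([sum(w[t - 1] * c.get(t - j, 0.0) for t in range(1, n + 1))
                     for j in j_range])

if __name__ == "__main__":
    c = {k: 0.5 ** abs(k) for k in range(-30, 31)}   # summable two-sided coefficients
    n = 20
    w = np.full(n, n ** (-0.5))                      # L_2-generated weights for F = 1
    js = np.arange(-10, n + 11)
    Tw = T_operator(w, c, js)
    T_minus = Tw[js < 1]                             # block j < 1
    T_zero  = Tw[(js >= 1) & (js <= n)]              # block 1 <= j <= n
    T_plus  = Tw[js > n]                             # block j > n
    print(np.linalg.norm(T_minus), np.linalg.norm(T_zero), np.linalg.norm(T_plus))
```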
2.3.2 Boundedness of the Trinity in $l_p$

Lemma. If $\alpha_c < \infty$ and $1 \le p \le \infty$, then $\sup_n \|T_n\| \le \alpha_c$ and
$$\sup_n \max\{\|T_n^+\|, \|T_n^0\|, \|T_n^-\|\} \le \alpha_c. \tag{2.13}$$

Proof. Let $E_n$ be the embedding operator of $\mathbb{R}^n_p$ into $l_p(\mathbb{Z})$:
$$(E_nw)_t = \begin{cases} 0, & \text{if } t < 1 \text{ or } t > n; \\ w_t, & \text{if } 1 \le t \le n. \end{cases}$$
Denote by $\tau_j$ the translation operator in $l_p(\mathbb{Z})$: $(\tau_jw)_t = w_{t+j}$, $j, t \in \mathbb{Z}$. Obviously, both operators preserve norms, $\|E_nw\|_p = \|w\|_p$, $\|\tau_jw\|_p = \|w\|_p$ and, as a result, $\|\tau_jE_n\| \le 1$. From the definition of $T_n$ we have
$$(T_nw)_j = \sum_{t\in\mathbb{Z}} (E_nw)_t c_{t-j} = \sum_{s\in\mathbb{Z}} (E_nw)_{j+s}c_s = \sum_{s\in\mathbb{Z}} (\tau_sE_nw)_j c_s = \Big(\sum_{s\in\mathbb{Z}} c_s\tau_sE_nw\Big)_j.$$
Since this is true for all $j \in \mathbb{Z}$, we have proved the representation $T_n = \sum_{s\in\mathbb{Z}} c_s\tau_sE_n$, which implies
$$\|T_n\| \le \sum_{s\in\mathbb{Z}} |c_s|\,\|\tau_sE_n\| \le \alpha_c.$$
Now Eq. (2.13) follows from
$$\|T_nw\|_p = \big(\|T_n^+w\|_p^p + \|T_n^0w\|_p^p + \|T_n^-w\|_p^p\big)^{1/p}. \qquad ∎$$
2.3.3 Matrix Representation of the Trinity

Correct me if I am wrong, but using matrix representations seems to be inevitable when studying the further properties of the trinity members contained in Theorem 2.4.9. These members can be identified with matrices
$$T_n^- = \begin{pmatrix} \vdots & \vdots & & \vdots \\ c_3 & c_4 & \cdots & c_{n+2} \\ c_2 & c_3 & \cdots & c_{n+1} \\ c_1 & c_2 & \cdots & c_n \end{pmatrix}, \quad T_n^0 = \begin{pmatrix} c_0 & c_1 & \cdots & c_{n-1} \\ c_{-1} & c_0 & \cdots & c_{n-2} \\ \vdots & & \ddots & \vdots \\ c_{1-n} & c_{2-n} & \cdots & c_0 \end{pmatrix}, \quad T_n^+ = \begin{pmatrix} c_{-n} & c_{1-n} & \cdots & c_{-1} \\ c_{-1-n} & c_{-n} & \cdots & c_{-2} \\ c_{-2-n} & c_{-1-n} & \cdots & c_{-3} \\ \vdots & \vdots & & \vdots \end{pmatrix}.$$
All of these matrices have $n$ columns; $T_n^-$ has an infinite number of rows stretching upward, and $T_n^+$ has an infinite number of rows stretching downward. Their structure suggests using diagonal matrices for analysis. Let $I_n$ and $0_n$ denote the $n\times n$ identity and null matrices. Then the matrices
$$A_0^0 = I_n; \quad A_k^0 = \begin{pmatrix} 0_{(n-k)\times k} & I_{n-k} \\ 0_k & 0_{k\times(n-k)} \end{pmatrix}, \ k = 1, \dots, n-1; \quad A_k^0 = \begin{pmatrix} 0_{|k|} & 0_{|k|\times(n-|k|)} \\ I_{n-|k|} & 0_{(n-|k|)\times|k|} \end{pmatrix}, \ k = -1, \dots, -n+1, \tag{2.14}$$
have the elements of their respective diagonals (main, sub or super) equal to 1 and all others equal to 0. The norms of all these operators do not exceed 1:
$$\|A_k^0z\|_p = \|(z_{k+1}, \dots, z_n, 0, \dots, 0)'\|_p \le \|z\|_p, \quad k = 0, \dots, n-1, \tag{2.15}$$
$$\|A_k^0z\|_p = \|(0, \dots, 0, z_1, \dots, z_{n-|k|})'\|_p \le \|z\|_p, \quad k = -1, \dots, -n+1. \tag{2.16}$$
From the matrix expression for $T_n^0$ it is directly seen that
$$T_n^0 = \sum_{k=-n+1}^{n-1} c_kA_k^0. \tag{2.17}$$
To obtain a similar representation for $T_n^-$ put
$$A_k^- = \begin{pmatrix} 0_{\infty\times k} & 0_{\infty\times(n-k)} \\ I_k & 0_{k\times(n-k)} \end{pmatrix}, \quad 1 \le k \le n; \qquad A_k^- = \begin{pmatrix} 0_{\infty\times n} \\ I_n \\ 0_{(k-n)\times n} \end{pmatrix}, \quad k > n.$$
Then
$$\|A_k^-z\|_p = \|(\dots, 0, z_1, \dots, z_k)'\|_p \le \|z\|_p, \quad k \le n, \tag{2.18}$$
$$\|A_k^-z\|_p = \|(\dots, 0, z_1, \dots, z_n, 0, \dots, 0)'\|_p \le \|z\|_p, \quad k > n, \tag{2.19}$$
and $T_n^- = \sum_{k=1}^\infty c_kA_k^-$.

Following the same track, let
$$A_k^+ = \begin{pmatrix} 0_{k\times(n-k)} & I_k \\ 0_{\infty\times(n-k)} & 0_{\infty\times k} \end{pmatrix}, \quad 1 \le k \le n; \qquad A_k^+ = \begin{pmatrix} 0_{(k-n)\times n} \\ I_n \\ 0_{\infty\times n} \end{pmatrix}, \quad k > n.$$
Quite similarly to what we have for $T_n^-$, now we have
$$T_n^+ = \sum_{k=1}^\infty c_{-k}A_k^+, \quad \|A_k^+z\|_p \le \|z\|_p, \quad k \ge 1. \tag{2.20}$$
By the way, the representations and bounds we have obtained allow us to improve upon Eq. (2.13):
$$\|T_n^0\| \le \sum_{k=-n+1}^{n-1}|c_k|, \quad \|T_n^-\| \le \sum_{k=1}^\infty |c_k|, \quad \|T_n^+\| \le \sum_{k=1}^\infty |c_{-k}|.$$
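Representation (2.17) is easy to verify numerically. The sketch below (Python/NumPy; the coefficient sequence and helper names are my own, chosen for illustration) builds $T_n^0$ directly from $(T_n^0)_{jt} = c_{t-j}$ and compares it with the sum of shifted diagonal matrices $A_k^0$, also checking the improved norm bound.

```python
import numpy as np

def c(k):
    """Hypothetical summable coefficient sequence c_k (geometric decay)."""
    return 0.7 ** abs(k)

def T0(n):
    """(T_n^0)_{jt} = c_{t-j}, 1 <= j, t <= n."""
    return np.array([[c(t - j) for t in range(1, n + 1)] for j in range(1, n + 1)])

def A0(k, n):
    """A_k^0: ones on the k-th super/sub diagonal, zeros elsewhere."""
    return np.eye(n, k=k)

if __name__ == "__main__":
    n = 6
    direct = T0(n)
    summed = sum(c(k) * A0(k, n) for k in range(-n + 1, n))
    print(np.allclose(direct, summed))                        # representation (2.17)
    band_sum = sum(abs(c(k)) for k in range(-n + 1, n))
    print(np.linalg.norm(direct, 2) <= band_sum + 1e-9)       # ||T_n^0|| <= sum of |c_k|
```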
2.4 CONVERGENCE OF THE TRINITY ON $L_p$-GENERATED SEQUENCES

2.4.1 Some Estimates for Images of Functions from $L_p$

Functions from $L_p$ have two types of properties: magnitude properties, characterized by integrals $\int_A |F(x)|^p\,dx$ over measurable sets, and continuity properties, embodied mainly in the continuity modulus $\omega_p(F, \delta)$. In this terminology, functions from $l_p$ have only magnitude properties and no continuity ones. We should expect
discretizations dnp F of elements of Lp to be better in some way than general elements of lp . Here we obtain several estimates that confirm this surmise. Lemma.
If F [ Lp , p , 1, then fn ¼ dnp F satisfies
(i) kA0k fn fn kp (vpp (F, k=n) þ kFkpp,(0,k=n) ) (ii) (iii) (iv) (v) (vi) (vii)
1=p
,
1=p kA0k fn fn kp (vpp (F, k=n) þ kFkpp,(1k=n,1) ) , kA0k fn kp kFkp , jkj n 1, kA k fn kp kFk p,(0,k=n) , k n, for all k, kA k fn kp kFkp þ kAk fn kp kFk p,(1k=n,1) , k n, for all k. kAþ k fn kp kFkp
k ¼ 1, . . . , n þ 1, k ¼ 0, 1, . . . , n 1,
Proof. (i) By Eq. (2.16) A0k fn ¼ (0, . . . , 0, fn,1 , . . . , fn,njkj )0 (k zeros), so !1=p jkj n X X j fnt jp þ j fn,tjkj fnt jp : kA0k fn fn kp ¼ t¼1
(2:21)
t¼jkjþ1
For t jkj we use the bound on j(dnp F)t j from Lemma 2.1.3: jkj X t¼1
p
j fnt j
jkj ð X t¼1
p
jkj=n ð
jF(x)j dx ¼
it
jF(x)jp dx:
(2:22)
0
For t . jkj ð ð 1=q 1=q j fn,tjkj fnt j ¼ n F(x) dx n F(x) dx itjkj it ð ð 1=q ¼n F(x) dx F( y þ jkj=n) dy: itjkj it Apply the change x ¼ y þ jkj=n to map it to itjkj : ð 1=q j fn,tjkj fnt j ¼ n [F(x) (tjkj=n F)(x)] dx itjkj 0 B @
ð itjkj
11=p C jF(x) (tjkj=n F)(x)jp dxA :
(2.23)
Summarizing, 0 B kA0k fn fn kp @
jkj=n ð
n X
jFjp dx þ
t¼jkjþ1
0
11=p
ð
C jF tjkj=n Fjp dxA
itjkj
¼ (kFkpp,(0,jkj=n) þ kF tjkj=n Fkpp,(0,1jkj=n) ) (kFkpp,(0,jkj=n) þ vpp (F, jkj=n))
1=p
1=p
:
(2.24)
The final inequality is by the definition of the continuity modulus. (ii) For k ¼ 0 both sides of the inequality are null. Let k 1. By Eq. (2.15) A0k fn ¼ ( fn,kþ1 , . . . , fn,n , 0, . . . , 0)0 (k zeros), so instead of Eq. (2.21) we have
kA0k fn
n X
fn kp ¼
j fn,t fn,tk j þ
t¼kþ1
!1=p
n X
p
j fnt j
p
:
t¼nkþ1
The place of Eq. (2.22) is taken by
n X
p
j fnt j
t¼nkþ1
ð n X t¼nkþ1
it
ð1
p
jF(x)j dx ¼
jF(x)jp dx:
1k=n
Bound (2.23) is still applicable in the present situation. Equation (2.24) follows with kFk p,(1k=n,1) in place of kFk p,(0,k=n) . Item (iii) follows from Eqs. (2.15) – (2.16) and Lemma 2.1.3. Item (iv) is a consequence of Eq. (2.22):
kA k fn k p ¼
k X
!1=p j fnt jp
kFk p,(0,k=n) ,
k n:
t¼1
Item (v) obtains from Eq. (2.18) and Lemma 2.1.3. The proofs of (vi) and (vii) mimic those of (iv) and (v). B
2.4.2 The Doubling Property of the Continuity Modulus

Lemma. $\omega_p(F, 2\delta) \le 2\omega_p(F, \delta)$.
Proof. Let $|y| \le 2\delta$. By the triangle inequality
$$\|F - \tau_yF\|_{p,\Omega_y} \le \|F - \tau_{y/2}F\|_{p,\Omega_y} + \|\tau_{y/2}F - \tau_yF\|_{p,\Omega_y} = \|F - \tau_{y/2}F\|_{p,\Omega_y} + \|F - \tau_{y/2}F\|_{p,(\Omega_y + y/2)}.$$
Here the end term results from the change $x + y/2 = t$. As it happens,
$$\Omega_y = (\max\{0, -y\}, \min\{1, 1 - y\}) \subseteq \Omega_{y/2}, \qquad \Omega_y + y/2 = (\max\{y/2, -y/2\}, \min\{1 + y/2, 1 - y/2\}) \subseteq \Omega_{y/2}.$$
Hence, increasing the domains in the above inequality and applying $\sup$ gives
$$\omega_p(F, 2\delta) = \sup_{|y|\le 2\delta}\|F - \tau_yF\|_{p,\Omega_y} \le 2\sup_{|y/2|\le\delta}\|F - \tau_{y/2}F\|_{p,\Omega_{y/2}} = 2\omega_p(F, \delta). \qquad ∎$$
2.4.3 The Continuity Modulus is Uniformly Continuous

Lemma. For $F \in L_p$, $p < \infty$, the continuity modulus $\omega_p(F, \delta)$ is a uniformly continuous function of $\delta > 0$.

Proof. Adapted from Zhuk and Natanson (2001). Step 1. Let us prove that $\delta_1 < \delta_2$ implies
vp (F, d2 ) vp (F, d1 ) vp (F, d2 d1 ):
(2:25)
By definition, for any 1 . 0 there exists y, jyj d2 , such that
vp (F, d2 ) 1 kF ty Fk p,Vy :
(2:26)
jyj=d2 1 implies jyjd1 =d2 d1 , so for h ¼ yd1 =d2 by definition kF th Fk p,Vh vp (F, d1 ):
(2:27)
Since d1 =d2 , 1 by assumption, there is an inclusion Vh ¼ Vyd1 =d2 ¼ (max{0, yd1 =d2 }, min{1, 1 yd1 =d2 }) $ Vy : This, together with Eq. (2.27), implies vp (F, d1 ) kF th Fk p,Vy :
(2:28)
Adding Eqs. (2.26) and (2.28) yields
vp (F, d2 ) vp (F, d1 ) 1 kF ty Fk p,Vy kF th Fk p,Vy kth F ty Fk p,Vy ,
(2.29)
where the final step is by the triangle inequality. Now consider I ¼ k th F
ty Fkpp,Vy
p ð yd1 F(x þ y) dx: ¼ F x þ d 2
Vy
e¼ Changing x þ yd1 =d2 ¼ z and denoting u ¼ y(1 d1 =d2 ) and V Vy þ yd1 =d2 , we have ð I¼
jF(z) F(z þ y(1 d1 =d2 ))jp dz ¼ kF tu Fk p,e : V
V e
e is a subset of Vu : Note that V e ¼ max y d1 , y 1 d1 , min 1 þ y d1 , 1 y 1 d1 V d2 d2 d2 d2 d1 d1 ¼ max y , u , min 1 þ y , 1 u d2 d2 # (max{0, u}, min{1, 1 u}) # Vu : Now Eq. (2.29), the definition and the monotonicity of the continuity modulus give
vp (F, d2 ) vp (F, d1 ) 1 kF tu Fk p,Vu vp (F, juj) vp (F, d2 (1 d1 =d2 )) ¼ vp (F, d2 d1 ): Since 1 is arbitrarily close to zero, Eq. (2.25) follows. Step 2. By Lemma 2.1.5 the continuity modulus of F [ Lp , p , 1, vanishes at zero. Hence, the right-hand side of Eq. (2.25) can be made arbitrarily small by choosing d2 d1 small, regardless of where d1 is. The left side is nonnegative by monotonicity. Thus, the continuity modulus is, indeed, uniformly continuous. B
2.4.4 Major

For $F \in L_p$, $p < \infty$, put
$$m(\delta) = m(F, p, \delta) = \max\{\omega_p(F, \delta), \|F\|_{p,(0,\delta)}, \|F\|_{p,(1-\delta,1)}\}, \quad \delta \in (0, 1].$$
Since $m$ appears as a majorant in certain estimates, I call it just a major (luckily, the majors from matrix algebra do not play any role in this book).

Lemma. Let $F \in L_p$, $p < \infty$. $m$ is continuous on $(0, 1]$ and vanishes at zero:
$$\lim_{\delta\to 0} m(\delta) = 0. \tag{2.30}$$
If $\|F\|_p \ne 0$, then $m(\delta)$ is positive for positive $\delta$.

Proof. Continuity of the major follows from continuity of $\omega_p(F, \delta)$ (Lemma 2.4.3) and absolute continuity of the Lebesgue integral (both norms $\|F\|_{p,(0,\delta)}$ and $\|F\|_{p,(1-\delta,1)}$ are continuous in $\delta$). Equation (2.30) is a consequence of Lemma 2.1.5 and absolute continuity of the Lebesgue integral.

Suppose that $m(\delta) = 0$ for some $\delta \in (0, 1]$. If $\delta \ge 1/2$, then the intervals $(0, \delta)$ and $(1 - \delta, 1)$ cover $(0, 1)$ and $F = 0$ a.e. on $(0, 1)$, which contradicts the assumption $\|F\|_p \ne 0$. Let us assume $\delta < 1/2$. Then
$$F = 0 \text{ a.e. on } (0, \delta). \tag{2.31}$$
$\omega_p(F, \delta) = 0$ implies, in particular, $\int_0^{1-\delta}|F(x) - F(x + \delta)|^p\,dx = 0$, which, because of Eq. (2.31), reduces to $\int_0^{\delta}|F(x + \delta)|^p\,dx = 0$. That is, the vanishing behavior of $F$ on $(0, \delta)$ extends to $(\delta, 2\delta)$. By the doubling property (Lemma 2.4.2) we have $\omega_p(F, 2\delta) = 0$. Hence, the above procedure of propagating the equality of $F$ to zero can be repeated a finite number of times to cover the whole interval $(0, 1)$ (actually, covering $(0, 1 - \delta)$ is enough). The conclusion again contradicts $\|F\|_p \ne 0$. Thus, the assumption $m(\delta) = 0$ is wrong and $m$ is positive on $(0, 1]$. ∎
2.4.5 Inverting the Major

Denote
$$z(\varepsilon) = z(F, p, \varepsilon) = \sup\{\delta \in (0, 1]: m(\delta) \le \varepsilon\}, \quad \varepsilon \in (0, 1],$$
the inverse of the major. This is a usual way to obtain a generalized inverse of a function. If, for example, $m(\delta) = \sqrt{\delta}$, then $z(\varepsilon) = \varepsilon^2$. The definition works when the theorem on inverses of continuous strictly monotone functions does not. If the graph of the major has a flat section at $\varepsilon_0$, $m(\delta) = \varepsilon_0$ for $\delta_1 \le \delta \le \delta_2$, the definition supplies the right end of that section: $z(\varepsilon_0) = \delta_2$.
Lemma. Let $F \in L_p$, $p < \infty$. Then $z$ is positive on $(0, 1]$. For sufficiently small $\varepsilon$, $z(\varepsilon)$ inverts $m$:
$$m(z(\varepsilon)) = \varepsilon. \tag{2.32}$$
It also vanishes at 0 if $\|F\|_p \ne 0$:
$$\lim_{\varepsilon\to 0} z(\varepsilon) = 0. \tag{2.33}$$

Proof. By Eq. (2.30), for any $\varepsilon \in (0, 1]$ there is $\delta \in (0, 1]$ such that $m(\delta) \le \varepsilon$. Then $z(\varepsilon) \ge \delta$ and $z$ is positive on $(0, 1]$. Let $\{\delta_n\}$ be a sequence such that $\delta_n \to z(\varepsilon)$ and $m(\delta_n) \le \varepsilon$. By continuity of $m$, then $m(z(\varepsilon)) \le \varepsilon$. For sufficiently small $\varepsilon$ a strict inequality here is impossible because if $m(z(\varepsilon)) < \varepsilon$, then by continuity of $m$ we would have $m(\delta) \le \varepsilon$ for some $\delta \in (z(\varepsilon), 1]$, which is at variance with the definition of $z(\varepsilon)$. Equation (2.33) follows from Eq. (2.30) because, in the case $\|F\|_p \ne 0$, by Lemma 2.4.4 $\delta = 0$ is the only point where $m$ vanishes. ∎
2.4.6 Zero-Tail and Nonzero-Tail Sequences of Weights

In principle, the bounds from Lemma 2.4.1 are sufficient to prove convergence of the trinity if we are willing to abuse the $\varepsilon$-$\delta$ language. The definitions in Sections 2.4.4 and 2.4.5, this section and Section 2.4.7 are aimed at making things more beautiful by using just one $\varepsilon$ in the main statement. Everywhere it is assumed that
$$\alpha_c \equiv \sum_{j\in\mathbb{Z}} |c_j| < \infty, \quad F \in L_p, \quad \varepsilon \in (0, 1].$$
The objective is to study convergence of the trinity on images $\delta_{np}F$. This convergence is trivial if $\alpha_c = 0$ and/or $\|F\|_p = 0$. In Sections 2.4.4 and 2.4.5 we see the implications of $\|F\|_p \ne 0$. Here we combine them with those of the restriction $\alpha_c \ne 0$. The sequence $c = \{c_j : j \in \mathbb{Z}\}$ with $\alpha_c \ne 0$ is called zero-tail if there exists a natural $n$ such that $\sum_{|k|\ge n}|c_k| = 0$. By rejecting this condition we obtain nonzero-tail sequences, for which $\sum_{|k|\ge n}|c_k| > 0$ for all $n > 0$. For a zero-tail nontrivial sequence the number
$$k_c = \min\Big\{n \in \mathbb{N}: \sum_{|k|\ge n}|c_k| = 0\Big\}$$
is defined.
2.4.7 Regulator

Definition. The regulator $r\colon (0, 1] \to \mathbb{N}$ is used in a statement of type: for any $\varepsilon \in (0, 1]$ there exists a natural number $r(\varepsilon)$ such that a certain quantity (depending on $n$) does not exceed $\varepsilon$ for $n \ge r(\varepsilon)$. In the trivial case $\alpha_c\|F\|_p = 0$ we put formally $r(\varepsilon) = 1$ for all $\varepsilon$.
Consider the nontrivial case $\alpha_c\|F\|_p \ne 0$.

1. If $c$ is zero-tail, then $k_c$ is a natural number and the set $\{n \in \mathbb{N}: z(\varepsilon)n \ge k_c\}$ is nonempty because $z(\varepsilon) > 0$ by Lemma 2.4.5. By definition
$$r(\varepsilon) = r(c, F, p, \varepsilon) = \min\{n \in \mathbb{N}: z(\varepsilon)n \ge k_c\}.$$
2. If $c$ is not zero-tail, then by summability of $c$ the inequality $\sum_{|k| > z(\varepsilon)n}|c_k| \le \varepsilon$ is true for all sufficiently large $n$. In this case the regulator is defined by
$$r(\varepsilon) = r(c, F, p, \varepsilon) = \min\Big\{n \in \mathbb{N}: \sum_{|k| > z(\varepsilon)n}|c_k| \le \varepsilon\Big\}.$$
In both cases directly from the definition we see that
$$\sum_{|k| > z(\varepsilon)r(\varepsilon)}|c_k| \le \varepsilon. \tag{2.34}$$
From this property and Eq. (2.33) we also see that
$$\lim_{\varepsilon\to 0} r(\varepsilon) = \infty. \tag{2.35}$$
2.4.8 Cutter

Let $\alpha_c\|F\|_p \ne 0$. Put
$$c(n, \varepsilon) = [z(\varepsilon)n], \quad n \in \mathbb{N},\ n \ge r(\varepsilon).$$
I call $c(n, \varepsilon)$ a cutter because it is used to cut sums and integrals.

Lemma. Suppose $\alpha_c\|F\|_p \ne 0$, $\varepsilon \in (0, 1]$ is sufficiently small and $n \ge r(\varepsilon)$.

(i) If $|k| \le c(n, \varepsilon)$, then $|k|/n \le z(\varepsilon)$.
(ii) $\sum_{|k| > c(n,\varepsilon)}|c_k| \le \varepsilon$.
(iii) $c(n, \varepsilon) \le n - 2$.

Proof. From the definition of an integer part
$$c(n, \varepsilon) \le z(\varepsilon)n < c(n, \varepsilon) + 1. \tag{2.36}$$
By the condition of the lemma $z(\varepsilon)n \ge z(\varepsilon)r(\varepsilon)$, which, together with the right inequality in Eq. (2.36), leads to
$$c(n, \varepsilon) + 1 > z(\varepsilon)r(\varepsilon). \tag{2.37}$$
Part (i) follows from the left side of Eq. (2.36). Part (ii) results from Eqs. (2.34) and (2.37):
$$\sum_{|k| > c(n,\varepsilon)}|c_k| = \sum_{|k|\ge c(n,\varepsilon)+1}|c_k| \le \sum_{|k| > z(\varepsilon)r(\varepsilon)}|c_k| \le \varepsilon.$$
(iii) By Eqs. (2.33) and (2.35), for small $\varepsilon$ the expression $2/(1 - z(\varepsilon))$ is close to 2 and $r(\varepsilon)$ is large, so for such $\varepsilon$ we have $2/(1 - z(\varepsilon)) \le r(\varepsilon) \le n$. Hence, $2 \le n - nz(\varepsilon)$ and $nz(\varepsilon) \le n - 2$. Combining this with the left inequality in Eq. (2.36) proves the statement. ∎
2.4.9 Convergence of the Trinity on $L_p$-Generated Sequences

In addition to the previous notation $\alpha_c = \sum_{j\in\mathbb{Z}}|c_j|$ we need
$$\beta_c = \sum_{j\in\mathbb{Z}} c_j.$$

Theorem. (Mynbaev, 2001) If $\alpha_c < \infty$, $F \in L_p$, $1 \le p < \infty$, then for all sufficiently small $\varepsilon$
$$\max\{\|(T_n^0 - \beta_c)\delta_{np}F\|_p,\ \|T_n^-\delta_{np}F\|_p,\ \|T_n^+\delta_{np}F\|_p\} \le (2^{1/p}\alpha_c + 2\|F\|_p)\,\varepsilon \quad \text{for all } n \ge r(\varepsilon). \tag{2.38}$$
Proof. In the trivial case ac kFkp ¼ 0 the left side of Eq. (2.38) is zero and the inequality is true for all n 1 ¼ r(1), so we can assume ac kFkp = 0. Denote fn ¼ dnp F. The cutter determines what kind of estimate to use. For 0 . k c(n, 1) ( (n 2)) we use Lemma 2.4.1(i): kA0k fn fn kp (v pp (F, jkj=n) þ kFkpp,(0,jkj=n) )1=p [by Lemma 2.4.8(i) and monotonicity] (v pp (F, z(1)) þ kFkpp,(0,z(1)) )1=p [applying the major and Eq. (2.32)] 21=p m(z(1)) ¼ 21=p 1:
(2:39)
Similarly, using item (ii) of Lemma 2.4.1, for 0 k c(n, 1) we have p )1=p 21=p 1: kA0k fn fn kp (vpp (F, k=n) þ kFkp,(1k=n,1)
(2:40)
After subtracting bc fn from Eq. (2.17) we can sort the terms as in (Tn0 bc )fn ¼
n1 X k¼0
¼
1 X
ck A0k fn þ
ck A0k fn
k¼nþ1
c(n,1) X
k¼0 |fflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflffl} n1 X
ck [A0k fn fn ]
k¼c(n,1)
|fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl}
S1
þ
ck fn
k[Z 1 X
ck [A0k fn fn ] þ
X
S2
X
ck A0k fn
ck fn :
jkj¼c(n,1)þ1
jkj.c(n,1)
S3
S4
|fflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflffl}
|fflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflffl}
Now estimate S1 using Eq. (2.40) and S2 using Eq. (2.39); for S3 apply Lemma 2.4.1(iii) [the sum is not empty by Lemma 2.4.8(iii)], and for S4 apply Lemmas 2.1.3 and 2.4.8(ii). The resulting bound is c(n,1) X
k(Tn0 bc )fn kp
k¼0
þ
1 X
jck j21=p 1 þ
jck j21=p 1
k¼c(n,1) n1 X
c(n,1) X
jck jkFkp
jkj.c(n,1)
jkj.c(n,1)
21=p 1
X
jck jkFkp þ
X
jck j þ 2kFkp
jck j
jkj.c(n,1)
jkj¼0
(21=p ac þ 2kFkp )1:
(2.41)
Applying the cutter in representation (2.19) we get kTn fn kp
c(n,1) X
jck j kA k fn k p þ
k¼1 |fflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflffl} S1
X
jck j kA k fn k p :
k.c(n,1)
|fflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} S2
For S1 use Lemma 2.4.1(iv) and for S2 Lemma 2.4.1(v). Next, apply parts (i) and (ii) of Lemma 2.4.8: kTn fn kp
c(n,1) X k¼1
X
jck jkFk p,(0,k=n) þ
jck jkFkp
k.c(n,1)
ac kFk p,(0,z(1)) þ 1kFkp ac m(z(1)) þ 1kFkp 1(ac þ kFkp ): The final line follows from the property (2.32) of the major.
(2.42)
Similarly, for Tnþ we use representation (2.30), Lemmas 2.4.1(vi) and 2.4.8(i) for k c(n, 1) and Lemmas 2.4.1(vii) and 2.4.8(ii) for k . c(n, 1). The result is kTnþ fn kp 1(ac þ kFkp ):
(2:43)
Equations (2.41), (2.42) and (2.43) prove the theorem.
B
2.4.10 Discussion Components of dnp F vanish in the limit [Lemma 2.1.3(iii)], so the terms Tn0 dnp F and bc dnp F in Eq. (2.38) do not converge separately in lp . To understand why their difference converges to zero, it is useful to consider the operator 2 Mn F ¼ Dnp Tn0 dnp F ¼ Dnp 4
n X
!n 3 5 (dnp F )t ctj
t¼1
¼ n1=p
n X n X
j¼1
(dnp F )t ctj 1ij
j¼1 t¼1 1=pþ1=q
¼n
n X n ð X j¼1 t¼1
F(x) dx ctj 1ij ,
F [ Lp :
it
Thus, the restriction of Mn F on ij equals Mn Fjij ¼
ð n X n F(x) dx ctj : t¼1
it
The averages of F over the covering intervals are multiplied by ctj and the results are summed to obtain Mn Fjij . As n ! 1, the partition becomes finer and more and more of the numbers ctj are involved in the sum. The averages tend to values of F at points of (0, 1). In the limit every value F(x) is multiplied by the sum bc of all numbers cj . This intuitive explanation is substantiated by kMn F bc Fkp kDnp (Tn0 bc )dnp Fkp þ jbc j kDnp dnp F Fkp ¼ k(Tn0 bc )dnp Fkp þ jbc j kPn F Fkp ! 0, n ! 1, where we use Lemma 2.1.7, Eq. (2.38) and Lemma 2.2.1. The limit limn!1 Mn is similar to the multiplier operator M in the Fourier analysis defined by (MF)(x) ¼
X k[Z
mk ck eikx
if the function F on the unit circumference is decomposed as X F(x) ¼ ck eikx k[Z
and {mk } is a given sequence of numbers. M is a composite of three mappings: first F is discretized to obtain its Fourier coefficients {ck }, second the Fourier coefficients are multiplied by mk to get {mk ck } and, finally, the latter numbers are used as Fourier coefficients in the new series MF.
2.5 PROPERTIES OF $L_p$-APPROXIMABLE SEQUENCES

2.5.1 Definitions

Let $1 \le p \le \infty$ and let the sequence of vectors $\{f_n\}$ be such that $f_n \in \mathbb{R}^n$ for all $n \in \mathbb{N}$. $\{f_n\}$ is called $L_p$-approximable if there exists a function $F \in L_p$ such that
$$\|f_n - \delta_{np}F\|_p \to 0, \quad n \to \infty. \tag{2.44}$$
If such is the case, the sequence $\{f_n\}$ is said to be $L_p$-close to $F$. To make this definition work in the case $p = \infty$, it is necessary to assume additionally that $F$ is continuous on $[0, 1]$, and I prefer to mention this condition each time rather than include it in the definition.

Lemma. If $p < \infty$, then Eq. (2.44) is equivalent to
$$\|\Delta_{np}f_n - F\|_p \to 0, \quad n \to \infty. \tag{2.45}$$

Proof. If Eq. (2.44) is true, then by Lemmas 2.1.7 and 2.2.1
$$\|\Delta_{np}f_n - F\|_p \le \|\Delta_{np}f_n - \Delta_{np}\delta_{np}F\|_p + \|\Delta_{np}\delta_{np}F - F\|_p = \|\Delta_{np}(f_n - \delta_{np}F)\|_p + \|P_nF - F\|_p \to 0, \quad n \to \infty.$$
Conversely, the same lemmas allow us to derive from Eq. (2.45) that
$$\|f_n - \delta_{np}F\|_p = \|\Delta_{np}(f_n - \delta_{np}F)\|_p \le \|\Delta_{np}f_n - F\|_p + \|F - P_nF\|_p \to 0, \quad n \to \infty. \qquad ∎$$
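Definition (2.44) can be tested numerically for a concrete sequence. A minimal sketch (Python/NumPy; `discretize` repeats the earlier helper and the names are mine): for the normalized linear trend $f_n = x_n/\|x_n\|_2$ with $x_n = (1, \dots, n)'$, the distance $\|f_n - \delta_{n2}F\|_2$ with $F(x) = \sqrt{3}\,x$ shrinks as $n$ grows, in line with Theorem 2.7.1(i) below.

```python
import numpy as np

def discretize(F, n, p=2.0):
    """(delta_np F)_t = n**(1/q) * integral of F over i_t, 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    x = np.linspace(0.0, 1.0, 50_001)
    v = F(x)
    out = np.empty(n)
    for t in range(n):
        m = (x >= t / n) & (x <= (t + 1) / n)
        out[t] = n ** (1 / q) * np.trapz(v[m], x[m])
    return out

if __name__ == "__main__":
    F = lambda x: np.sqrt(3.0) * x          # candidate limit for the linear trend, p = 2
    for n in (10, 40, 160, 640):
        xn = np.arange(1, n + 1, dtype=float)
        fn = xn / np.linalg.norm(xn)        # normalized trend f_n = x_n / ||x_n||_2
        err = np.linalg.norm(fn - discretize(F, n))
        print(n, round(err, 5))             # should decrease toward 0, cf. Eq. (2.44)
```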
2.5.2 M-Properties

The name m-properties is used for those properties of $L_p$-approximable sequences that stem mainly from magnitude properties of functions from $L_p$. Accordingly, c-properties reflect mainly the continuity characteristics of elements of $L_p$. They are more difficult to establish.

Lemma. Let $\{f_n\}$ be $L_p$-approximable. Then

(i) $\sup_n \|f_n\|_p < \infty$.
(ii) If $p < \infty$, then $\lim_{n\to\infty}\max_{1\le t\le n}|f_{nt}| = 0$.

Proof. (i) By Lemma 2.1.3(ii), $L_p$-approximability implies
$$\|f_n\|_p \le \|f_n - \delta_{np}F\|_p + \|\delta_{np}F\|_p \le \|f_n - \delta_{np}F\|_p + \|F\|_p \le c.$$
Part (ii) follows from Lemma 2.1.3(iii) and
$$\max_{1\le t\le n}|f_{nt}| \le \|f_n - \delta_{np}F\|_p + \max_{1\le t\le n}|(\delta_{np}F)_t|. \qquad ∎$$
2.5.3 Bilinear Forms of $L_p$-Approximable Sequences

This section starts a series of c-properties.

Theorem. (Mynbaev, 2001) If $1 < p < \infty$, $\{x_n\}$ is $L_p$-close to $X \in L_p$ and $\{y_n\}$ is $L_q$-close to $Y \in L_q$, then
$$\lim_{n\to\infty}\sum_{t=[na]}^{[nb]} x_{nt}y_{nt} = \int_a^b X(s)Y(s)\,ds \quad \text{for all } [a, b] \subseteq [0, 1]$$
uniformly with respect to the segments $[a, b]$. Here we put $x_{n0} = y_{n0} = 0$ for the sum at the left to have meaning when $a = 0$.

Proof. Denote $\{a, b\} = \{t \in \mathbb{N}: [na] \le t \le [nb]\}$ and apply Theorem 2.2.3. The locator (Section 2.1.1) makes sure that $a \in i_{[na]+1}$, $b \in i_{[nb]+1}$. In consequence,
$$1_{[a,b]} = 1 \text{ on } \bigcup_{t=[na]+2}^{[nb]} i_t, \qquad 1_{[a,b]} = 0 \text{ on } \Big(\bigcup_{t=1}^{[na]} i_t\Big) \cup \Big(\bigcup_{t=[nb]+2}^{n} i_t\Big).$$
Therefore [dnp (1[a,b] X)]0 dnq (1[a,b] Y) ¼
n X
[dnp (1[a,b] X)]t [dnq (1[a,b] Y)]t
t¼1
¼
X
(dnp X)t (dnq Y)t
t[{a,b}
þ
X
[dnp (1[a,b] X)]t [dnq (1[a,b] Y)]t
t¼[na]þ1,[nb]þ1
X
t¼[na],[na]þ1
(dnp X)t (dnq Y)t :
(2.46)
By Lp (Lq )-approximability and Ho¨lder’s inequality X X (dnp X)t (dnq Y)t xnt ynt t[{a,b} t[{a,b} X X [(dnp X)t xnt ](dnq Y)t þ xnt [(dnq Y)t ynt ] t[{a,b} t[{a,b} kdnp X xn kp kdnq Ykq þ kxn kp kdnq Y yn kq ! 0:
(2.47)
Here we used Lemmas 2.1.3(ii) and 2.5.2(i). By Lemma 2.1.3(i) max {j[dnp (1[a,b] X)]t j, j(dnp X)t j} max{k1[a,b] Xk p,it , kXk p,it } ¼ kXk p,it : A similar bound holds for Y. Hence, of the three sums at the right of Eq. (2.46), the last two tend to zero uniformly with respect to a, b. By Theorem 2.2.3, Eq. (2.46) and Eq. (2.47) we have uniformly in a, b X X xnt ynt ¼ lim (dnp X)t (dnq Y)t lim n!1
t[{a,b}
n!1
t[{a,b}
0
ðb
¼ lim [dnp (1[a,b] X)] dnq (1[a,b] Y) ¼
X(s)Y(s) ds:
n!1
a
B
2.5.4 The Trinity and $L_p$-Approximable Sequences

In essence, statements for $L_p$-approximable sequences are obtained from those for $L_p$-generated ones by perturbation. Since the rate of convergence of $f_n - \delta_{np}F$ in Eq. (2.44) is not quantified, in the next theorem it is not possible to specify the rate of convergence of the trinity.

Theorem. (Mynbaev, 2001) If $p < \infty$, $\alpha_c < \infty$ and $\{f_n\}$ is $L_p$-approximable, then
$$\lim_{n\to\infty}\max\{\|(T_n^0 - \beta_c)f_n\|_p,\ \|T_n^-f_n\|_p,\ \|T_n^+f_n\|_p\} = 0.$$

Proof. By boundedness of $T_n^0$ in $l_p$ [see Eq. (2.41)], convergence of the trinity on $L_p$-generated sequences (Theorem 2.4.9) and the $L_p$-approximability definition,
$$\|(T_n^0 - \beta_c)f_n\|_p \le \|(T_n^0 - \beta_c)(f_n - \delta_{np}F)\|_p + \|(T_n^0 - \beta_c)\delta_{np}F\|_p \le 2\alpha_c\|f_n - \delta_{np}F\|_p + \|(T_n^0 - \beta_c)\delta_{np}F\|_p \to 0.$$
The rest of the proof utilizes Eqs. (2.42) and (2.43) and is equally simple. ∎
2.5.5 Bilinear Forms and T-Operator Combined

Theorem. If $1 < p < \infty$, $\alpha_c < \infty$ and $\{f_n\}$ is $L_p$-close to $F$, $\{g_n\}$ is $L_q$-close to $G$, then
$$\lim_{n\to\infty}\sum_{t\in\mathbb{Z}} (T_nf_n)_t(T_ng_n)_t = \beta_c^2\int_0^1 F(x)G(x)\,dx.$$
Proof. By Ho¨lder’s inequality n n X X 0 0 2 (Tn fn )t (Tn gn )t bc fnt gnt t¼1 t¼1 n n X X 0 0 0 (Tn fn bc fn )t (Tn gn )t þ jbc j fnt (Tn gn bc gn )t t¼1 t¼1 k(Tn0 bc )fn kp kTn0 gn kq þ jbc jk fn kp k(Tn0 bc )gn kq ! 0:
(2.48)
The final line obtains by applying uniform boundedness of kTn0 k, k fn kp and kgn kq and Theorem 2.5.4. Ho¨lder’s inequality and Theorem 2.5.4 yield X (Tn fn )t (Tn gn )t kTn fn kp kTn gn kq ! 0: t,1 Here Tn can be replaced with Tnþ . Therefore Eq. (2.48) implies X
(Tn fn )t (Tn gn )t b2c
fnt gnt ! 0:
t¼1
t[Z
It remains to recall that by Theorem 2.5.3
n X
Pn
t¼1 fnt gnt
!
Ð1 0
FGds.
B
2.6 CRITERION OF $L_p$-APPROXIMABILITY

2.6.1 Statement of Problem

The definition of $L_p$-approximability appeals to the existence of a function $F \in L_p$ for which Eq. (2.44) would be true. Along with the question about what this entails, it is natural to ask when such a function exists. Different answers are possible. Sufficient conditions and counterexamples are considered in Section 2.7. All of them rely on some external information about the sequence. Here we concentrate on what is called an intrinsic characterization, which should satisfy two conditions:

1. it should be equivalent to (that is, necessary and sufficient for) $L_p$-approximability and
2. it should be expressed in terms of just the sequence itself, without appealing to any other objects.
2.6.2 Continuity Modulus of a Step Function

Here is one of those technical calculations that look arbitrary, and therefore ugly, yet lead to a precise result. Notation (2.14) conceals the fact that the matrices $A_k^0$ depend not only on $k$ but also on $n$. Here it is more convenient to use the well-known fact that if $I_n^- = A_{-1}^0$ denotes the $n\times n$ matrix with the first subdiagonal filled with unities and all other cells with zeros, then all other matrices $A_k^0$ with negative $k$ are its powers:
$$A_k^0 = (I_n^-)^{|k|}, \quad k = -1, \dots, -n+1.$$

Lemma. For a natural $n$ consider a step function
$$F = \sum_{t=1}^n c_t 1_{i_t}$$
with some real coefficients and denote $c = (c_1, \dots, c_n)'$. If $p < \infty$ and $\delta < 1$, then
$$\omega_p(F, \delta) \le (2/n)^{1/p}\Big(2\sup_{0<y\le\delta}\|(I_n^-)^{[yn]}c - c\|_p + \|I_n^-c - c\|_p\Big).$$
Proof. Because of the symmetry
kF
ty Fkpp,Vy
min{1,1y} ð
¼
jF(x) F(x þ y)jp dx
max {0,y} min{1þy,1} ð
¼
jF(z y) F(z)jp dz ¼ kF ty Fkpp,Vy
max {y,0}
the sup in the definition of the continuity modulus can be taken over only positive y:
vp (F, d) ¼ sup kF ty Fk p,Vy : 0,yd
The condition d , 1 is not really a restriction because for y 1 the set Vy is empty.
Fix 0 , y d , 1 and denote k ¼ [yn] yn , n. Because of the form of F we have to start with kF
ty Fkpp,Vy
¼
ð n X t¼1
jF ty Fjp dx:
it >Vy
Let’s look at one term in this sum. Let x [ it > Vy , that is, (t 1)=n x , t=n, 0 , x , 1 y: From the definition of k k=n y , (k þ 1)=n:
(2:49)
These inequalities imply (t þ k 1)=n x þ y , (t þ k þ 1)=n or x þ y [ itþk < itþkþ1 . Hence F(x þ y) may take only values ctþk (on itþk y) and ctþkþ1 (on itþkþ1 y). It follows that ð it >Vy
ð
jF ty Fjp dx ¼
jct ctþk jp dx
it >Vy >(itþk y)
ð þ
jct ctþkþ1 jp dx
it >Vy >(itþkþ1 y)
¼ jct ctþk jp mes (it > Vy > (itþk y)) þ jct ctþkþ1 jp mes(it > Vy > (itþkþ1 y)):
(2.50)
Obviously, with Vy,t,k ¼ it > Vy > (itþk y) we have mes(Vy,t,k )
mes(itþk y) ¼ 1=n, 0,
if Vy,t,k = ;; if Vy,t,k ¼ ;:
To avoid the headache of trying to figure out when Vy,t,k is nonempty we replace it by a weaker condition that Vy > (itþk y) is nonempty (the upper bound may only increase). Since itþk y ¼ [(t þ k 1)=n, (t þ k)=n y), Vy ¼ (0, 1 y) and by Eq. (2.49) (t þ k)=n y . (k þ 1)=n y . 0, the intersection Vy > (itþk y) is nonempty if (t þ k 1)=n y < 1 y or t þ k n:
Similarly, mes(it > Vy > (itþkþ1 y)) 1=n and we can count only those t that satisfy t þ k þ 1 n: Summing Eq. (2.50) over the indicated t we get
kF
ty Fkpp,Vy
! nk nk1 X 1 X p p jct ctþk j þ jct ctþkþ1 j : n t¼1 t¼1
(2:51)
Consider, for example, the first sum at the right of Eq. (2.51), nk X
jct ctþk jp ¼
t¼1
n X
jc jk cj jp þ
k X
jcj jp :
j¼1
j¼kþ1
From Eq. (2.16) we know that (In )k c ¼ (0, . . . , 0, c1 , . . . , cnk )0 so the above sum equals nk X
p
jct ctþk jp ¼ k(In )k c ckp
t¼1
and similarly nk1 X
p
jct ctþkþ1 jp ¼ k(In )kþ1 c ckp :
t¼1
Thus, using also an elementary inequality (ap þ bp )1=p 21=p (a þ b), p
kF ty Fk p,Vy n1=p (k(In )k c ckp þ k(In )kþ1 c ckpp )
1=p
(2=n)1=p (k(In )k c ckp þ k(In )kþ1 c ckp ): Here, by boundedness (2.16) k(In )kþ1 c ckp k(In )kþ1 c (In )k ckp þ k(In )k c ckp ¼ k(In )k (In c c)kp þ k(In )k c ckp kIn c ckp þ k(In )k c ckp :
(2.52)
Thus, kF ty Fk p,Vy (2=n)1=p (2k(In )k c ckp þ kIn c ckp ), which proves the lemma.
B
2.6.3 Condition X: Discretizing the Continuity Modulus

The discretization operator $\delta_{np}$ takes us from $L_p$ to $l_p$ and the interpolation operator $\Delta_{np}$ takes us back. How does this two-way relationship extend to continuity moduli? In other words, what is, in terms of $l_p$, the equivalent of the property $\lim_{\delta\to 0}\omega_p(F, \delta) = 0$, $p < \infty$? This equivalent, let us call it condition X, is established in the next lemma taken from Mynbaev (2001). Denote
$$X(\{f_n\}) = \lim_{\delta\to 0,\ m\to\infty}\ \sup_{n\ge m,\ 0<y\le\delta}\|(I_n^-)^{[yn]}f_n - f_n\|_p \tag{2.53}$$
for any sequence $\{f_n\} \subset l_p$. We say that $\{f_n\}$ satisfies condition X if $X(\{f_n\}) = 0$.

Lemma. Let $p < \infty$.

(i) If $\{f_n\}$ is $L_p$-generated by $F \in L_p$, $f_n = \delta_{np}F$, then $\{f_n\}$ satisfies condition X.
(ii) Conversely, suppose that a sequence $\{f_n\}$, such that $f_n \in \mathbb{R}^n$ for all $n$, satisfies condition X. Then the step functions $F_n = \Delta_{np}f_n$ possess the property
$$\lim_{\delta\to 0}\ \sup_{n\ge 1}\ \omega_p(F_n, \delta) = 0.$$
Proof. (i) Let 0 , d , 1 and n [ N. Since [ yn] yn, Lemma 2.4.1(i) implies sup n1, 0,yd
k(In )[ yn] fn fn kp (vpp (F, d) þ kFkpp,(0,d) )
1=p
:
Therefore lim
sup
d !0 n 1, 0,yd
k(In )[ yn] fn fn kp ¼ 0,
which is stronger than X({ fn }) ¼ 0. (ii) As a preliminary step, let’s prove that X({fn }) ¼ 0 implies lim k(In )k fn fn kp ¼ 0 for any k [ N:
n !1
(2:54)
From Eq. (2.53) we see that if X({ fn }) ¼ 0, then for any 1 . 0 there exist d . 0 and m 1 such that k(In )[ yn] fn fn kp , 1
for all n m and y [ (0, d]:
(2:55)
For a natural k consider n m0 ; max{m, k=d} and put y ¼ k=n d. Then [ yn] ¼ k and the preceding bound gives Eq. (2.54): k(In )k fn fn kp , 1 for all n m0 :
(2:56)
Put c ¼ n1=p fn in Lemma 2.6.2. Then the function F from that lemma becomes Fn ¼ Dnp fn and ! 1=p
vp (Fn , d) 2
2 sup 0,yd
k(In )[ yn] fn
fn kp þ
kIn fn
fn kp :
Applying Eqs. (2.55) and (2.56) we get vp (Fn , d) 21=p 31 for all n m0 . The proof is complete. B
2.6.4 Precompactness in $L_p$

A set $K$ in a normed space $L$ is called precompact if every sequence $\{x_n\} \subseteq K$ contains a convergent subsequence $\{x_{n_m}\}$. Properties of precompact sets in infinite-dimensional spaces parallel those of bounded sets in finite-dimensional spaces. I give an example of how this notion works in Section 2.6.6.

Theorem (Fréchet–Kolmogorov). (Iosida, 1965, Section X.1) A set $K \subset L_p$ is precompact if and only if
$$\sup_{F\in K}\|F\|_p < \infty \ \text{(uniform boundedness)} \quad \text{and} \quad \lim_{\delta\to 0}\sup_{F\in K}\omega_p(F, \delta) = 0 \ \text{(uniform equicontinuity in mean)}.$$
2.6.5 Orthogonality

Lemma. Let $1 < p < \infty$. If a function $F \in L_p$ is orthogonal to indicators of all intervals,
$$\int_0^1 F(x)1_{(a,b)}(x)\,dx = 0 \quad \text{for all } (a, b) \subseteq (0, 1), \tag{2.57}$$
then $F = 0$ a.e.
Proof. By linearity, Eq. (2.57) extends to ð1 F(x)G(x) dx ¼ 0
for all step functions G:
(2:58)
0
If G [ Lq is an arbitrary function, then the projections Pn G are step functions. By Eq. (2.58), Ho¨lder’s inequality and Lemma 2.2.1 1 1 ð ð ð1 FG dx ¼ FG dx FPn G dx 0
0
0
kFkp kG Pn Gkq ! 0: This generalizes Eq. (2.58) to ð1 F(x)G(x) dx ¼ 0
for all G [ Lq :
(2:59)
0
It is easy to check that H ¼ jFj p1 sgnF belongs to Lq : ð1
q
ð1
jH(x)j dx ¼ 0
jF(x)j
(p1)q
ð1 dx ¼
0
jF(x)jp dx , 1:
0
Then, by Eq. (2.59) ð1 0¼
ð1 F(x)G(x) dx ¼
0
jF(x)kF(x)jp1 dx ¼ kFkpp
0
and F ¼ 0 a.e.
B
2.6.6 Criterion of $L_p$-Approximability

Theorem. (Mynbaev, 2001) Let $1 < p < \infty$ and suppose $\{f_n\}$ is a sequence of vectors satisfying $f_n \in \mathbb{R}^n$ for all $n \in \mathbb{N}$. Then $\{f_n\}$ is $L_p$-approximable if and only if the following three conditions hold:

(i) $\sup_n \|f_n\|_p < \infty$ (uniform boundedness),
(ii) the limit $\lim_{n\to\infty} n^{-1/q}\sum_{t=[na]}^{[nb]} f_{nt}$ exists for any $0 \le a < b \le 1$ (here by definition $f_{n0} = 0$ for all $n$) and
(iii) $X(\{f_n\}) = 0$ (condition X).
Proof. Necessity. Let { fn } be Lp -close to F [ Lp . The necessity of uniform boundedness is proved in Lemma 2.5.2. In the refined convergence theorem (Theorem 2.5.3) let {xn } ¼ { fn }, X ¼ F and let {yn } be Lq -generated by Y ; 1. Then, by the definition from Section 2.1.2, ynt ¼ n1=p1 ¼ n1=q , t ¼ 1, . . . , n, and Theorem 2.5.3 gives 1=q
lim n
n !1
[nb] X
ðb F(s) ds uniformly in [a, b] # [0, 1]:
fnt ¼
t¼[na]
(2:60)
a
This condition implies (ii). Later on in the proof we need a generalization of this property for subsequences: if some subsequence { fnm } of { fn } is Lp -close to F [ Lp , meaning that k fnm dnm ,p Fkp ! 0, m ! 1, then
lim n1=q m
m !1
[n m b] X
ðb F(s) ds uniformly in [a, b] # [0, 1]:
fnm ,t ¼
t¼[nm a]
(2:61)
a
This is obtained from Eq. (2.60) simply by taking { fnm } as the original sequence. In Lemma 2.6.3(i) the necessity of condition X is proved for Lp -generated sequences, so for any 1 . 0 there exist d . 0 and m 1 such that sup nm, 0,yd
k(In )[yn] dnp F dnp Fkp , 1:
Due to Lp -approximability, the choice of m can also be subjected to sup k fn dnp Fkp , 1:
nm
By boundedness of (In )k [see Eq. (2.16)] for n m and 0 , y d k(In )[ yn] fn fn kp k(In )[ yn] ( fn dnp F)kp þ k(In )[ yn] dnp F dnp Fkp þ kdnp F fn kp 2k fn dnp Fkp þ k(In )[ yn] dnp F dnp Fkp 31: This proves necessity of condition X for Lp -approximable sequences. Sufficiency. Put Fn ¼ Dnp fn . Since Dnp is an isomorphism (Lemma 2.1.7), condition (i) implies uniform boundedness of Fn : supn kFn kp , 1. By Lemma 2.6.3(ii) condition X ensures uniform equicontinuity in the mean of the functions Fn : limd!0 supn1 vp (Fn , d) ¼ 0. In virtue of the Frechet – Kolmogorov theorem, the set K ¼ {Fn } is precompact in Lp . Hence, there exist a subsequence {Fnm } and a function F [ Lp such that kFnm Fkp ! 0. Then { fnm } is Lp -close to F and Eq. (2.61) is true.
We need to show that the whole sequence {Fn } converges to F. Suppose it does not. Then there exists another subsequence {Fnk } that is at a positive distance from F: kFnk Fkp 1 . 0:
(2:62)
By precompactness, {Fnk } has a convergent subsequence. Changing the notation, if necessary, we can think of {Fnk } itself as convergent to some G [ Lp : kFnk Gkp ! 0:
(2:63)
Note that { fnk } is Lp -close to G because by Lemmas 2.1.7 and 2.2.1 k fnk dnk , p Gkp ¼ kDnk , p ( fnk dnk , p G)kp ¼ kFnk Pnk Gkp kFnk Gkp þ kG Pnk Gkp ! 0: This allows us to employ Eq. (2.61):
lim
k !1
n1=q k
[n k b] X
ðb fnk ,t ¼
t¼[nk a]
G(s) ds
for all [a, b] # [0, 1]:
(2:64)
a
By condition (ii) the limits in Eqs. (2.61) and (2.64) should be the same. We write this conclusion as ð1 (F G)1(a,b) dx ¼ 0 for all (a, b) # (0, 1): 0
By the orthogonality Lemma 2.6.5, F ¼ G a.e., which contradicts Eqs. (2.62) and (2.63). Hence, the whole sequence {Fn } converges to F and { fn } is Lp -close to F: k fn dnp Fkp ¼ kDnp ( fn dnp F)kp ¼ kFn Pn Fkp kFn Fkp þ kF Pn Fkp ! 0:
2.6.7 Explicit Construction

Corollary. If $1 < p < \infty$ and $\{f_n\}$ is $L_p$-approximable, then
$$F(x) = \frac{d}{dx}\lim_{n\to\infty} n^{-1/q}\sum_{t=1}^{[nx]} f_{nt}, \quad x \in [0, 1],$$
is that function to which $\{f_n\}$ is $L_p$-close.
Proof. This follows from Eq. (2.60) where we can take $a = 0$, $b = x$ and use the Lebesgue differentiation theorem: if $F$ is integrable, then $\frac{d}{dx}\int_0^x F(s)\,ds = F(x)$ a.e. (Kolmogorov and Fomin, 1989, Chapter VI, Section 3). ∎
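The corollary suggests a practical recipe: form the scaled partial sums $S_n(x) = n^{-1/q}\sum_{t\le[nx]} f_{nt}$ and differentiate numerically. A rough sketch (Python/NumPy; purely illustrative, with the same assumptions as the earlier sketches) applied to the normalized linear trend recovers $F(x) = \sqrt{3}x$ approximately.

```python
import numpy as np

if __name__ == "__main__":
    n, p = 2000, 2.0
    q = p / (p - 1.0)
    xn = np.arange(1, n + 1, dtype=float)
    fn = xn / np.linalg.norm(xn)                  # L_2-approximable, close to sqrt(3)*x
    grid = np.linspace(0.0, 1.0, 201)
    S = np.array([n ** (-1 / q) * fn[: int(n * x)].sum() for x in grid])
    F_est = np.gradient(S, grid)                  # d/dx of the scaled partial sums
    for x in (0.25, 0.5, 0.75):
        i = int(x * (len(grid) - 1))
        print(x, round(F_est[i], 3), "vs", round(np.sqrt(3) * x, 3))
```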
2.7 EXAMPLES AND COUNTEREXAMPLES

2.7.1 Definition of Trends

1. A polynomial trend equals, by definition, $x_n = (1^{k-1}, 2^{k-1}, \dots, n^{k-1})'$ where $k$ is natural.
2. A logarithmic trend is defined by $x_n = (\ln^k 1, \dots, \ln^k n)'$ for a natural $k$.
3. A geometric progression is taken to be $x_n = (a^1, a^2, \dots, a^n)'$ with a real $a \ne 0$.
4. Finally, an exponential trend is a vector $x_n = (e^a, \dots, e^{na})'$.

Obviously, denoting $b = e^a$ we turn the exponential trend into a geometric progression $x_n = (b, \dots, b^n)'$. A constant is a polynomial trend ($k = 1$), a geometric progression ($a = 1$) and an exponential trend ($a = 0$). In the conventional scheme the regressors are normalized. This is why we are interested in $L_p$-approximability of the normalized trends $f_n = x_n/\|x_n\|_p$. The next theorem in the most important case $p = 2$ is proved in Mynbaev and Castelar (2001).

Theorem. Let $p < \infty$.

(i) If $\{x_n\}$ is a polynomial trend, then the normalized sequence $\{f_n\}$ is $L_p$-close to $F(x) = ((k-1)p + 1)^{1/p}x^{k-1}$, $k \in \mathbb{N}$. When $p = \infty$, this statement is true with $F(x) = x^{k-1}$.
(ii) If $\{x_n\}$ is a logarithmic trend, then $\{f_n\}$ is $L_p$-close to $F \equiv 1$ for all $k \in \mathbb{N}$.
(iii) For a geometric progression, $\{f_n\}$ is not $L_p$-approximable, unless $a = 1$.
(iv) For an exponential trend, $\{f_n\}$ is not $L_p$-approximable, unless $a = 0$.

Because exponential trends are a special case of geometric progressions, part (iv) follows from item (iii). The rest of the proof is split into sections. See Theorems 4.4.1, 4.4.8 and Lemma 7.2.3 for other examples of $L_p$-approximable sequences.
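Parts (i) and (iii) are easy to see numerically. In the sketch below (Python/NumPy; `discretize` as in the earlier sketches, all names mine), the distance in (2.44) decays for the quadratic trend ($k = 3$, $p = 2$, $F(x) = \sqrt{5}x^2$), while for the geometric progression with $a = 0.5$ the largest component of $f_n$ stays bounded away from zero, which already rules out $L_p$-approximability by Lemma 2.5.2(ii).

```python
import numpy as np

def discretize(F, n, p=2.0):
    q = p / (p - 1.0)
    x = np.linspace(0.0, 1.0, 50_001)
    v = F(x)
    return np.array([n ** (1 / q) * np.trapz(v[(x >= t / n) & (x <= (t + 1) / n)],
                                             x[(x >= t / n) & (x <= (t + 1) / n)])
                     for t in range(n)])

def dist_to(F, xn):
    fn = xn / np.linalg.norm(xn)
    return np.linalg.norm(fn - discretize(F, len(xn)))

if __name__ == "__main__":
    F_poly = lambda x: np.sqrt(5.0) * x ** 2          # limit for the quadratic trend (k = 3)
    for n in (20, 80, 320):
        poly = np.arange(1, n + 1, dtype=float) ** 2
        geom = 0.5 ** np.arange(1, n + 1)
        fn_geom = geom / np.linalg.norm(geom)
        # part (i): distance to delta_n2 F_poly shrinks; part (iii): max|f_nt| stays large
        print(n, round(dist_to(F_poly, poly), 4), round(np.abs(fn_geom).max(), 4))
```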
2.7.2 Simple Sufficient Conditions

Lemma. Let $p \ge 1$.

(i) Suppose that for a given $\{f_n\}$, with $f_n \in \mathbb{R}^n$ for all $n$, there exists $F \in L_\infty$ such that $\|\Delta_{np}f_n - F\|_\infty \to 0$. Then $\{f_n\}$ is $L_p$-close to $F$.
(ii) Let $F$ be continuous on $[0, 1]$ and suppose that a sequence $\{p_n\}$, with $p_n \in \mathbb{R}^n$ for all $n$, satisfies
$$\max_{1\le t\le n}|p_{nt} - F(t/n)| \to 0, \quad n \to \infty.$$
Denote $f_n = n^{-1/p}p_n$. Then $\{f_n\}$ is $L_p$-close to $F$.

Proof. (i) By Hölder's inequality the equivalent definition of $L_p$-approximability [Eq. (2.45)] is satisfied:
$$\|\Delta_{np}f_n - F\|_p \le \|\Delta_{np}f_n - F\|_\infty \to 0.$$
(ii) By uniform continuity of $F$,
$$\max_{1\le t\le n}\ \max_{x\in i_t}|F(t/n) - F(x)| \to 0, \quad n \to \infty.$$
Since $\Delta_{np}f_n = \sum_{t=1}^n p_{nt}1_{i_t}$, we see that
$$\|\Delta_{np}f_n - F\|_\infty = \max_{1\le t\le n}\ \max_{x\in i_t}|p_{nt} - F(x)| \le \max_t |p_{nt} - F(t/n)| + \max_t\ \max_{x\in i_t}|F(t/n) - F(x)| \to 0.$$
It remains to apply part (i). ∎
2.7.3 Proof of Theorem 2.7.1(i): Polynomial Trends For a continuous function h on [0, 1] its integral n 1X h(t=n) n t¼1
Ð1 0
h(t) dt is a limit of Riemann sums:
ð1 h(t) dt ¼ o(1): 0
o(1), as usual, denotes a sequence {1n } satisfying lim 1n ¼ 0. This notation is impersonal in the sense that the sequences {1n } that appear in different places of the proof are not the same. From kxn kpp ¼
n X t¼1
t (k1)p ¼ n(k1)pþ1
n (k1)p 1X t n t¼1 n
we see that h(t) ¼ t (k1)p is the right choice to approximate kxn kpp . Since ð1
t (k1)p dt ¼
1 , (k 1)p þ 1
(2:65)
0
we get 391=p n = t ð1 X 1 1 þ h(t) dt 5 kxn kp ¼ n(k1)pþ1 4 h ; : (k 1)p þ 1 n t¼1 n 8 <
2
0
1=p
n(k1)pþ1 (1 þ o(1)) (k 1)p þ 1 (k1)pþ1 1=p n ¼ (1 þ o(1)): (k 1)p þ 1 ¼
(2.66)
The normalized trend is ! n k1 0 (k 1)p þ 1 1=p 1 k1 ,..., (1 þ o(1)): fn ¼ n n n
(2:67)
Put pn ¼ n1=p fn , F(x) ¼ ((k 1)p þ 1)1=p x p1 . From Eq. (2.67) we derive jpnt F(t=n)j ¼ ((k 1)p þ 1)1=p (t=n)k1 o(1): Theorem 2.7.1(i) follows from this equation and Lemma 2.7.2(ii) because here o(1) is a sequence that does not depend on t.
2.7.4 Proof of Theorem 2.7.1(ii): Logarithmic Trends Denote, for any real m, ðn
Im (n) ¼ lnm x dx: 2
Since no closed-form formula of type (2.65) exists, establishing an analog of (2.66) will be more difficult. Step 1. Let us prove that Im (n) ¼ n lnm n(1 þ o(1)):
(2:68)
Integration by parts yields
Im (n) ¼ x ln
m
xjn2
ðn m
x
lnm1 x dx x
2 m
m
¼ n ln n 2 ln 2 mIm1 (n): This recurrent relation can be used to prove by induction Im (n) ¼ n lnm n þ c1 n lnm1 n þ þ ck1 n lnmiþ1 n þ ck þ ckþ1 Imi (n)
(2.69)
for any natural i. Here c1 , . . . , ckþ1 depend on m. Now let m . 0 and consider two cases. 1. If m is integer, put i ¼ m. The end term in Eq. (2.69) contains the integral Ðn I0 (n) ¼ 2 dx ¼ n 2 and Eq. (2.68) follows. 2. If m is not integer, put i ¼ [m] þ 1. From [m] , m , [m] þ 1 ¼ i it follows that 1 , m i , 0 and 0 , lnmi x lnmi 2. Therefore the final integral in Eq. (2.69) is bounded as ðn
Imi (n) ¼ lnmi x dx (n 2) lnmi 2: 2
Again, Eq. (2.69) implies Eq. (2.68). Step 2. Now we show that kxn kp ¼ (1 þ o(1))n1=p lnk n:
(2:70)
By monotonicity, for t ¼ 2, . . . , n ðt
kp
tþ1 ð
kp
ln s ds ln t
lnkp s ds:
t
t1
Summing these inequalities and using the notation Im (n) we get ð2 1
lnkp s ds þ Ikp (n) kxn kpp ¼
n X t¼2
lnkp t Ikp (n þ 1):
(2:71)
In view of Eq. (2.68) the integrals at the left and right have the same asymptotics: Im (n þ 1) ¼ (1 þ o(1))(n þ 1) lnm (n þ 1) 1 ln n þ ln (1 þ 1=n) m ¼ (1 þ o(1))(n lnm n) 1 þ n ln n ¼ (1 þ o(1))n lnm n: The conclusion is that Eq. (2.70) follows from Eqs. (2.68) and (2.71). Step 3. Fix 1 [ (0, 1] and denote
s1,n ¼ {t [ N: 1 t [1n]}, t1,n ¼ {t [ N: [1n] þ 1 t n}: If t [ s1,n , then by monotonicity ln t k ln n k 1 þ 1 2: ln n ln n
(2:72)
If t [ t1,n , then 1n , [1n] þ 1 t, the ratio t=n is bounded away from zero, 1 , t=n 1, and there exists c1 ¼ c1 (1) . 0 such that t ln c1 for all t [ t1,n : n Hence, there is n1 (1) that satisfies, for all t [ t1,n , ln n þ ln (t=n)k ln t k 1 ¼ 1 ln n ln n ln (t=n) k ¼ 1þ 1 1, n n1 (1): ln n
(2.73)
Step 4. Eq. (2.70) implies fn ¼ xn =kxn kp ¼
1 þ o(1) k (ln 1, . . . , lnk n)0 : n1=p lnk n
If we put pn ¼ n1=p fn , F ; 1, Lemma 2.7.2 is not applicable. The values pnt for t close to n are close to 1 and those for t close to 1 are close to 0, so there is no uniform convergence max1tn jpnt F(t=n)j ! 0. Instead, we show directly that kDnp fn Fkp ! 0. The interpolated function can be represented as Dnp fn ¼
n 1 þ o(1) X 1it lnk t ¼ gn þ hn lnk n t¼1
2.7 EXAMPLES AND COUNTEREXAMPLES
85
where gn ¼
n n 1 X o(1) X k 1 ln t, h ¼ 1it lnk t: it n lnk n t¼1 lnk n t¼1
hn ! 0 in C[0, 1] and therefore in Lp . Decompose gn F ¼ S1,n þ T1,n where S1,n ¼
"
X
# " # X ln t k ln t k 1 1it , T1,n ¼ 1 1it : ln n ln n t[t1,n
t[s1,n
Using the inclusion
2.7.5 Proof of Theorem 2.7.1(iii): Geometric Progressions Case jaj , 1. From
kxn kp ¼
n X
!1=p a
tp
¼ jaj
t¼1
1 jajnp 1 jajp
1=p ¼
jaj (1 þ o(1)) (1 jajp )1=p
it follows that fn ¼ (1 þ o(1))
(1 jajp )1=p 1 (a , . . . , an ) jaj
(recall that a = 0 by definition). We need to analyze Dnp fn ¼ (1 þ o(1))
n (n(1 jajp ))1=p X at 1it : jaj t¼1
B
86
CHAPTER 2
Lp-APPROXIMABLE SEQUENCES OF VECTORS
For a fixed 1 [ (0, 1] denote t1,n ¼ {t [ N: [1n] þ 1 t n}. Since [1n] 1n and therefore (0, 1) #
p
jDnp fn j dx
X ð t[t1,n
1
jDnp fn jp dx
it
¼ (1 þ o(1))
c1
n(1 jajp ) X 1 jajtp jaj n t[t1,n
1 1 jajp X jajtp jaj t¼[1n]þ1
¼ c2 jaj[1n]p ! 0, n ! 1:
(2.74)
Suppose that { fn } is Lp -close to some F [ Lp . Then, by the triangle inequality, Eq. (2.74) implies kFk p,(1,1) kF Dnp fn kp þ kDnp fn kp,(1,1) ! 0,
n ! 1:
Since 1 is arbitrarily close to zero, F ¼ 0 a.e. However, the normalization of fn implies normalization of F, kFkp ¼ 1. The contradiction finishes the proof in the case jaj , 1. Case jaj . 1. Let x n ¼ (an , . . . , a1 )0 , f n ¼ x n =kxn kp , b ¼ 1=a. Then xn ¼ a1þn (an , . . . , a1 )0 ¼ a1þn x n , fn ¼
xn a1þn x n ¼ 1þn ¼ (sgna)1þn f n , kxn kp jaj kxn kp
n (n(1 jbjp ))1=p X Dnp f n ¼ (1 þ o(1)) bntþ1 1it , jbj t¼1
Dnp fn ¼ (sgna)1þn Dnp f n : Since jbj , 1, the proof for the case jaj , 1 applies to f n , with the roles of the left and right endpoints of the interval (0, 1) changed. Specifically, for 1 [ (0, 1) let s1,n ¼ {t [ N: 1 t [(1 1)n] þ 1}. By the definition of the integer part, (1 1)n , [(1 1)n] þ 1, which implies (1 1) , {[(1 1)n] þ 1}=n and (0, 1 1) #
0
jDnp f n jp dx
X ð t[s1,n
jDnp f n jp dx
it
¼ (1 þ o(1))
c1
n(1 jbjp ) X 1 jbj p(ntþ1) p jbj n t[s1,n
[(11)n]þ1 X t¼1
jbj p(ntþ1) ¼ c2 jbj p{n[(11)n]} :
2.7 EXAMPLES AND COUNTEREXAMPLES
87
Because [(1 1)n] (1 1)n , we have n [(1 1)n] 1n and, as a result, 11 ð
jDnp f n jp dx c2 jbj p1n ! 0,
n ! 1:
0
As in the previous case, the implication is that if { fn } is Lp -close to some F [ Lp , then by the triangle inequality kFk p,(0,11) kF Dnp f n kp þ kDnp f n kp,(0,11) ! 0,
n ! 1:
The conclusion that F ¼ 0 a.e. contradicts the normalization kFkp ¼ 1 and the proof of Theorem 2.7.1 is complete.
2.7.6 One Abstract Example

This example was suggested to me by participants of the Probabilities and Statistics Seminar of the Steklov Mathematical Institute in 2000. Is the sequence that results from normalization of
$$x_n = (\underbrace{1, \dots, 1}_{[\ln n]}, 0, \dots, 0)' \quad (n \text{ elements})$$
($[\ln n]$ unities in a sequence of $n$ elements) $L_p$-approximable? Here
$$\|x_n\|_p = \Big(\sum_{t=1}^{[\ln n]} 1\Big)^{1/p} = ([\ln n])^{1/p}, \quad f_n = x_n([\ln n])^{-1/p}, \quad \|f_n\|_p = 1.$$
Suppose that $\{f_n\}$ is $L_p$-approximable. Then by the criterion of $L_p$-approximability [see, in particular, Eq. (2.60)], the limit of
$$n^{-1/q}\sum_{t=[na]}^{[nb]} f_{nt}$$
should exist for all $0 \le a < b \le 1$. However, for all sufficiently large $n$ we have $[\ln n] \le \ln n \le \tfrac{1}{2}na < [na]$ and in the above sum there are no nonzero terms. Therefore the function $F$ from Eq. (2.60) should vanish, which is impossible by its normalization. Thus, $\{f_n\}$ is not $L_p$-approximable.
CHAPTER 3

CONVERGENCE OF LINEAR AND QUADRATIC FORMS
CONTINUING FROM Chapter 1, general tools from the theory of $L_p$ spaces and probabilities are reviewed, up to martingale CLTs. This, together with the material of Chapter 2, provides us with a launch pad for CLTs for weighted sums of random variables, where those variables are initially m.d.'s and then short-memory linear processes. Next, the desire to obtain convergence statements for quadratic forms forces us to delve into the theory of integral operators. Certain classes of compact operators are studied, including Hilbert–Schmidt and nuclear ones. Both the final statements and some auxiliary results are important for later applications. For example, in Chapter 5 the gauge inequality is applied seven times. In this chapter we deal with two types of $L_p$ spaces: on the segment $[0, 1]$ or the square $[0, 1]^2$, for approximation purposes, and on a probability space $(\Omega, \mathcal{F}, P)$, for probabilistic results. To distinguish between these, the first two are denoted $L_p$, as in the previous chapter, and the latter $\mathcal{L}_p$. If $X$ is a random vector, the norm in $\mathcal{L}_p$, $p < \infty$, is defined by $\|X\|_p = (E\|X(\cdot)\|_2^p)^{1/p}$ [the norm of $X$ at the left is in the space $\mathcal{L}_p(\Omega)$ and at the right is in the finite-dimensional space $\mathbb{R}^{\dim X}$, with apologies for the confusion].
3.1 GENERAL INFORMATION

In this section, some well-known facts from probability theory are reviewed.

3.1.1 Chebyshov Inequality

Lemma. If $X$ is a random vector and $\|X\|_p < \infty$, $p < \infty$, then
$$P(\|X\|_2 \ge \varepsilon) \le \varepsilon^{-p}\|X\|_p^p \quad \text{for all } \varepsilon > 0.$$
Proof. There are several versions of this inequality, but all of them are based on the same idea. Using the obvious fact that $1 \le \|X\|_2/\varepsilon$ on the set $\{\|X\|_2 \ge \varepsilon\}$, we prove the inequality as
$$P(\|X\|_2 \ge \varepsilon) = \int_{\{\|X\|_2 \ge \varepsilon\}} dP \le \varepsilon^{-p}\int_{\{\|X\|_2 \ge \varepsilon\}} \|X\|_2^p\,dP \le \varepsilon^{-p}\|X\|_p^p. \qquad ∎$$
An obvious implication is that $\lim_{M\to\infty} P(\|X\|_2 \ge M) = 0$.
3.1.2 Convergence in $\mathcal{L}_p$ and in Probability

Lemma. If $p < \infty$ and $X_n \to X$ in $\mathcal{L}_p$, then $X_n \xrightarrow{p} X$.

Proof. This statement immediately follows from Chebyshov's inequality because for any fixed $\varepsilon > 0$
$$P(\|X - X_n\|_2 \ge \varepsilon) \le \varepsilon^{-p}\|X - X_n\|_p^p \to 0. \qquad ∎$$
In our applications usually $p = 1$ or $p = 2$.
3.1.3 Convergence in Different $\mathcal{L}_p$ Spaces

Lemma. If $X_n \to X$ in $\mathcal{L}_p$, then $X_n \to X$ in all $\mathcal{L}_q$ with $q < p$.

Proof. Just apply Hölder's inequality:
$$\int_\Omega \|X_n - X\|_2^q\,dP \le \Big(\int_\Omega \|X_n - X\|_2^p\,dP\Big)^{q/p}\Big(\int_\Omega dP\Big)^{1-q/p} = \|X_n - X\|_p^q \to 0. \qquad ∎$$
One of the standard applications is when the difference $X_n - X$ is represented as a sum of terms each of which tends to zero in its own $\mathcal{L}_p$. In this situation $X_n \to X$ in the $\mathcal{L}_p$ with the least $p$.
3.1.4 Uniform Integrability

A random variable is called proper if it is finite with probability 1. All random variables in this book are assumed proper, but this is not mentioned explicitly. For any integrable random variable we can write
$$E|X| = \sum_{t=0}^\infty E|X|1_{\{t\le|X|<t+1\}} < \infty.$$
The convergence of this series means its remainder should tend to zero:
$$\sum_{t=m}^\infty E|X|1_{\{t\le|X|<t+1\}} = E|X|1_{\{|X|\ge m\}} \to 0, \quad m \to \infty.$$
Requiring this property uniformly for a family of random variables gives rise to the uniform integrability notion. A family $\{X_t : t \in T\}$ of random variables is called uniformly integrable if
$$\lim_{m\to\infty}\sup_{t\in T} E|X_t|1_{\{|X_t|\ge m\}} = 0. \tag{3.1}$$
The next theorem (from Davidson, 1994) is extremely useful in establishing uniform integrability.

Theorem. (Criterion of uniform integrability) A collection $\{X_t : t \in T\}$ of random variables is uniformly integrable if and only if the following two conditions are satisfied:

(i) $\sup_{t\in T} E|X_t| < \infty$ (uniform $L_1$-boundedness) and
(ii) $\lim_{\delta\to 0}\sup_{P(A)\le\delta}\sup_{t\in T} E|X_t|1_A = 0$ (uniform absolute continuity),

where $A$ are measurable subsets of $\Omega$. The advantage of condition (ii) over Eq. (3.1) is that in (ii), for a given $\delta$, the set $A$ can be chosen independently of $X_t$.
3.1.5 Cramér–Wold Theorem

Theorem. A sequence $\{S_n : n \in \mathbb{N}\}$ of $k$-dimensional random vectors converges in joint distribution to a random vector $S$ if and only if the sequence of scalar variables $\{a'S_n : n \in \mathbb{N}\}$ converges in distribution to $a'S$ for every $a \in \mathbb{R}^k$.

This theorem, also called a Cramér–Wold device, reduces the vector case to the scalar one. It is the main reason why many convergence statements in probability theory are given for random variables. In particular, when convergence in joint distribution to a normal vector $S$ is sought and $S$ is to be distributed as $N(0, V)$, the desired conclusion will follow if for all $a \in \mathbb{R}^k$
$$a'S_n \xrightarrow{d} N(0, a'Va). \tag{3.2}$$
This is because, taking $S \sim N(0, V)$, for its linear transformation we get $a'S \sim N(0, a'Va)$. Equation (3.2) is a shortcut for $a'S_n \xrightarrow{d} a'S$ with $S \sim N(0, V)$, so by the Cramér–Wold theorem $S_n \xrightarrow{d} S$.
3.1.6 Linear and Quadratic Forms in Independent Standard Normals Let {un : n [ N} be a sequence of independent standard normal variables. The sequence {u2n : n [ N} consists of independent x2 -variables with one degree of freedom. If a sequence of deterministic vectors {cn : n [ N} is square-summable, P 2 n kcn k2 , 1, then both series X X L¼ cn un and Q ¼ cn u2n n
n
92
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
converge in L2 and, hence, in probability and distribution. We refer to L as a linear form and to Q as a quadratic form in independent standard normals. Such variables are convenient for characterizing limit distributions. In the case of convergence to a normal law they are just an equivalent way of expressing the same result, while in the more general cases (when the limit distribution contains both linear and quadratic parts) they become indispensable. Lemma. For a k-dimensional random vector S the condition S N(0, V) is equivalent to the representation
S¼
X
cn un
with
n
X
cn c0n ¼ V,
n
X
kcn k22 , 1:
(3:3)
n
Proof. Suppose S N(0, V). Let C be the square root of V, partition C into columns, C ¼ (c1 , . . . , ck ), denote U ¼ (u1 , . . . , uk )0 and consider a normal vector P T ¼ kn¼1 cn un ¼ CU. Then ET ¼ 0, V(T) ¼ CEUU 0 C 0 ¼ CC 0 ¼ C 2 ¼
X
cn c0n ¼ V:
n
Since a normal vector is completely determined by its mean and variance, we have S ¼ T (more precisely, T is distributed as S). Conversely, if Eq. (3.3) is true, then S is a limit in distribution of normal vectors P Sn ¼ ln cl ul . Here ES ¼ lim ESn ¼ 0, n!1
V(S) ¼ lim V(Sn ) ¼ lim E n!1
¼ lim
n!1
n!1
X
X
cl ul
ln
cm c0l Eum ul ¼
l,mn
!
X
X
!0 cl ul
ln
cn c0n :
n
B
3.1.7 Lindeberg –Le´vy Theorem From Davidson (1994, p. 336), if {Xt }1 1 is an i.i.d. sequence having zero mean and 2 variance s , then n 1 X d pffiffiffi Xt ! N(0, s 2 ): n t¼1
If Xt are not centered, the theorem can be applied to Xt EXt .
3.2 WEAK LAWS OF LARGE NUMBERS
93
3.2 WEAK LAWS OF LARGE NUMBERS 3.2.1 Martingale Difference Arrays Consider a family {{Xnt , F nt : t ¼ 1, . . . , kn }: n ¼ 1, 2, . . . }, where {kn } is an increasing sequence of integers, Xnt are random variables and F nt are nested sub-sfields of V, F n,t1 # F nt for all n, t. Such a family is called a martingale difference array if 1. Xnt are F nt -measurable, 2. Xnt are integrable and 3. E(Xnt j F n,t1 ) ¼ 0 for all t, n. An array can be visualized as a sequence of rows X11 X21
X1,k1 X2,k1
X2,k2
Pn Convergence statements are usually given for row-wise sums Sn ¼ kt¼1 Xnt . Let {wn } be a sequence of scalar weights and let {en } be a sequence of zero-mean fixedvariance random variables. Suppose we have to consider convergence of P weighted sums sn ¼ nt¼1 wt et . Upon normalization these sums become Sn ¼ Pn Pn Pn 2 1=2 2 1=2 wt et the task is cast in terms t¼1 wt t¼1 wt et . With Xnt ¼ t¼1 wt of arrays, which explains their necessity. In applications kn ¼ n most of the time.
3.2.2 Martingale Weak Laws of Large Numbers Statements about convergence in probability of weighted sums of random variables are called weak laws of large numbers (WLLN). The definition of strong laws of large numbers is obtained by replacing convergence in probability by almost sure convergence. In econometrics convergence in probability is used more often because it is easier to prove. The next theorem was proved by Chow (1971) in the homogeneous case and generalized by Davidson (1994) to the heterogeneous case. J. Davidson calls it a martingale WLLN, even though it is actually about convergence in Lp (of which convergence in probability is a consequence). Theorem. Let {Xnt , Fnt } be a m.d. array, {cnt } a positive constant array and {kn } an increasing integer sequence with kn ! 1. Suppose 1 p 2. If (i) {jXnt =cnt j p } is uniformly integrable, Pn cnt , 1, and (ii) lim supn !1 kt¼1 P kn 2 (iii) limn !1 t¼1 cnt ¼ 0, Pn then Sn ¼ kt¼1 Xnt ! 0 in L p .
94
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
The cases p ¼ 1 and p ¼ 2 are the main cases when verification of the conditions of this theorem is possible.
3.2.3 Weak Laws of Large Numbers for Uncorrelated Arrays Theorem. (Davidson, 1994, Corollary 19.10) If {Xnt } is a zero-mean stochastic array with E(Xnt Xns ) ¼ 0 for s = t, and (i) {Xnt =cnt } is L2 -bounded, supn,t kXnt =cnt k2 , 1, Pn 2 cnt ¼ 0, (ii) limn!1 kt¼1 P kn then Sn ¼ t¼1 Xnt ! 0 in L2 . As compared to the martingale WLLN, here condition 3.2.2(ii) is not required, the m.d. assumption is weakened to uncorrelatedness and the uniform integrability to simple L2 -boundedness. This result is stated just for comparison purposes. It is not applied in this book.
3.3 CENTRAL LIMIT THEOREMS FOR MARTINGALE DIFFERENCES CLTs are about convergence of sums of random variables in distribution. When I first started to study this theory, I was puzzled by the number of different CLTs. The variety is perhaps explained by the diversity of applications. Many papers on asymptotic theory in econometrics contain their own CLTs derived from more general CLTs established by probabilists. Here we look at two CLTs that are especially useful when working with m.d.’s.
3.3.1 Central Limit Theorems with Unconditional Normalization Brown (1971) and McLeish (1974) established similar results (I leave it to historians to decide who did what). The next theorem is taken from Davidson (1994), who took it from McLeish: Theorem. Let {Xnt , Fnt } be a m.d. array with finite unconditional variances Pn 2 snt2 ¼ EXnt2 and let kt¼1 snt ¼ 1. If P kn 2 p (i) t¼1 Xnt ! 1 (stabilization of the sum of squares condition) and p
(ii) max1tkn jXnt j ! 0 (terms asymptotic negligibility condition), Pn d Xnt ! N(0, 1). then Sn ¼ kt¼1 This is a good opportunity to P comment on the way the theorem is applied. n Most of the time the normalization kt¼1 snt2 ¼ 1 is not satisfied. We have to find P kn 2 the limit of t¼1 snt and then use it for normalization. If the limit turns out to be 0, we have to investigate the degenerate case. Verification of condition (i) calls for
3.4 CENTRAL LIMIT THEOREMS FOR WEIGHTED SUMS OF MARTINGALE DIFFERENCES
95
application of some WLLN. The surest way to prove (ii) is to try to derive it from E max1tkn jXnt j ! 0.
3.3.2 Central Limit Theorems with Conditional Normalization The following result from Hall and Heyde (1980, Corollary 3.1) corrects Dvoretzky (1972) who missed the requirement that s-fields should be nested. Theorem. Let {Xnt , Fnt } be a m.d. array with finite unconditional variances EXnt2 . Denote snt2 ¼ E(Xnt2 j F n,t1 ) conditional variances. Suppose that P kn 2 p 2 (i) t¼1 snt ! s (a constant), Pn p E(Xnt2 1{jXnt j.1} j Fn,t1 ) ! 0 and (ii) for all 1 . 0, kt¼1 (iii) the s-fields are nested over n: F n,t # F nþ1,t for t ¼ 1, . . . , kn , n ¼ 1, 2, . . . Pn d Xnt ! N(0, s 2 ). Then Sn ¼ kt¼1 Formally, in this theorem s 2 ¼ 0 is allowed. However, in applications the degenerate case still requires separate investigation. In most practical situations both this and the previous theorem can be applied. The small differences in the resulting assumptions on the error terms are indistinguishable from the practitioner’s point of view.
3.4 CENTRAL LIMIT THEOREMS FOR WEIGHTED SUMS OF MARTINGALE DIFFERENCES In econometric papers and books there are quite a few CLTs for weighted sums of random variables. Sometimes they are not even named CLTs. Anderson (1971), Anderson and Kunitomo (1992), and Hannan (1979) are the examples most relevant to the subject of this chapter. The CLTs obtained in this section are better for asymptotic theory of regressions because, owing to the L p -approximability notion, they allow us to better trace the link between regressors and asymptotic properties of the estimator. The conditions on the linear processes are also weaker, except for Hannan’s result (Hannan, 1979), which is about long-memory processes. Theorems on weighted sums of random elements in linear spaces collected in Taylor (1978) are not very useful in the econometric context. Theorems for mixing processes (Davidson, 1994) are a completely different area. See also Wu (2005) and Wu and Min (2005) regarding long-memory processes.
3.4.1 Weighted Sums of Martingale Differences The format of the sums is dictated by applications to linear models. Let {ent , Fnt } be a m.d. array. Denote en ¼ (en1 , . . . , enn )0 . Suppose we have L sequences of deterministic weights {wnl : n [ N}, where wln [ Rn for n [ N and l ¼ 1, . . . , L. Writing all
96
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
vectors as column vectors and putting w1n , . . . , wLn side by side we obtain a matrix of weights Wn ¼ (w1n , . . . , wLn ) of size n L. The L-dimensional random vector Wn0 en is the main attraction in this theme park. Assuming that the m.d.’s are square-integrable and variances of en1 , . . . , enn are the same, Ee2n1 ¼ ¼ Ee2nn ¼ s 2
for all n,
(3:4)
we have EWn0 en ¼ 0, V(Wn0 en ) ¼ EWn0 en e0n Wn ¼ s 2 Wn0 Wn :
(3:5)
The simplifying assumption (3.4) makes comparison of the result for the linear model with those of Anderson (1971) and Amemiya (1985) easier. It can be relaxed by assuming Ee2nt ¼ snt2 where {sn1 , . . . , snn } is L1 -close to a continuous function on [0, 1].
3.4.2 Assumption on Weights and its Implications For each l, it is assumed that {wln : n [ N} is L2 -close to some Fl [ L2 . By the refined convergence theorem (see Section 2.5.3) 0 lim (wkn ) n!1
wln
ð1 Fk (x)Fl (x) dx
¼
for 1 k, l L:
(3:6)
0
Denote 0
0
(w1n ) w1n @ Gn ¼ 0 (wnL ) w1n
1 0 (w1n ) wnL A ¼ Wn0 Wn L 0 L (wn ) wn
and 0
Ð1
F12 dx
B B 0 G¼B B 1 @Ð FL F1 dx
0
1 F1 FL dx C C 0 C C Ð1 2 A FL dx
Ð1
0
the Gram matrices of systems {w1n , . . . , wnL } and {F1 , . . . , FL }, respectively. Then Eq. (3.6) is equivalent to lim Gn ¼ G:
(3:7)
3.4 CENTRAL LIMIT THEOREMS FOR WEIGHTED SUMS OF MARTINGALE DIFFERENCES
97
By Lemma 2.5.2(ii) all elements of Wn asymptotically vanish: max
lim
n !1 t¼1,...,n; l¼1,..., L
j(wln )t j ¼ 0:
(3:8)
3.4.3 Central Limit Theorems for Weighted Sums of Martingale Differences Theorem. (Mynbaev, 2001) Let {ent , Fnt } be a m.d. array and let {Wn } be a sequence of n L deterministic matrices with columns w1n , . . . , wnL . Suppose that (i) e2nt are uniformly integrable, E(e2nt jF n,t1 ) ¼ s2 for all t and n, and (ii) the sequence {wln : n [ N} is L2 -close to Fl [ L2 , l ¼ 1, . . . , L. Then lim V(Wn0 en ) ¼ s2 G
n!1
(3:9)
and d
Wn0 en ! N(0, s2 G),
(3:10)
where G is the Gram matrix of F1 , . . . , FL . By LIE, Ee2nt ¼ E[E(e2nt j F n,t1 )] ¼ s 2 , so Eqs. (3.4) and (3.5) are true and Eq. (3.9) follows from Eq. (3.7). The rest of the proof is split into several sections. Recall that the singularity of the Gram matrix of a system of vectors means that those vectors are linearly dependent. Sections 3.4.4 – 3.4.6 are about the nonsingular case (det G = 0) and Section 3.4.7 completes the proof by treating the singular case.
3.4.4 Reduction to One-Dimensional Case and Normalization (Nonsingular G) Let det G = 0. By the Crame´r– Wold theorem 3.1.5, Eq. (3.10) follows if we establish d
a0 Wn0 en ! N(0, s 2 a0 Ga)
(3:11)
for any a [ RL , a = 0. By Eq. (3.5) V(a0 Wn0 en ) ¼ s 2 a0 Gn a: Equation (3.7), a = 0 and linear independence of F1 , . . . , FL imply lim a0 Gn a ¼ a0 Ga = 0:
n !1
(3:12)
98
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
Hence, for all sufficiently large n we may define Sn ¼ a0 Wn0 en (s 2 a0 Gn a)1=2 : This variable is centered, ESn ¼ 0, and normalized: ES2n ¼ a0 Wn0 Een e0n Wn a(s 2 a0 Gn a)1 ¼ 1:
(3:13)
It is convenient to work with coefficients cnt ¼
L X
al (wln )t (s 2 a0 Gn a)1 :
l¼1
Then Sn becomes Sn ¼
n X
cnt ent
t¼1
and by uncorrelatedness of m.d.’s, Eq. (3.13) can be written ES2n ¼
n X
cnt cns Eent ens ¼
s,t¼1
n X
E(cnt ent )2 ¼ 1:
(3:14)
t¼1
Here Xnt ¼ cnt ent are m.d.’s. Equation (3.14) shows that the normalization condition Pkn 2 t¼1 snt ¼ 1 from Theorem 3.3.1 is satisfied.
3.4.5 Proving that the Sum of Squares Stabilizes Condition (i) from Theorem 3.3.1 takes the form plim
n X
c2nt e2nt ¼ 1:
(3:15)
t¼1
To prove it, we apply the martingale WLLN 3.2.2 with p ¼ 1. Denote c nt ¼ c2nt and X nt ¼ (e2nt s 2 )cnt : X nt are m.d.’s by the conditional second moment condition (i) of the theorem we are proving: E(X nt j F n,t1 ) ¼ [E(e2nt j F n,t1 ) s 2 ]cnt ¼ 0: From the bound je2nt s 2 j e2nt þ s 2 we have {je2nt s 2 j m} # {e2nt m s2 },
1{je2nt s 2 jm} 1{e2nt ms 2 } :
3.4 CENTRAL LIMIT THEOREMS FOR WEIGHTED SUMS OF MARTINGALE DIFFERENCES
99
Therefore uniform integrability of X nt =cnt ¼ e2nt s 2 follows from that of e2nt : Eje2nt s 2 j1{je2nt s 2 jm} E(e2nt þ s 2 )1{e2nt ms 2 } ¼ Ee2nt 1{e2nt ms 2 } þ s 2 P(e2nt m s 2 ) ! 0,
m ! 1,
uniformly in n and t. Here we have used the Chebyshov inequality P(e2nt m s 2 ) 1=(m s 2 )Ee2nt : By Eq. (3.14) n X
c nt ¼
t¼1
n X
c2nt ¼ s2 ,
(3:16)
t¼1
so condition (ii) of Theorem 3.2.2 is satisfied. By Eqs. (3.8) and (3.12) with some c . 0 max jcnt j c
1tn
max
t¼1,..., n; l¼1,..., L
j(wln )t j ! 0, n ! 1:
(3:17)
This fact and Eq. (3.16) show that condition (iii) of Theorem 3.2.2 also holds: n X
c 2nt max c2nt 1tn
t¼1
n X
c2nt ! 0:
t¼1
P n X nt ! 0, which implies Theorem 3.2.2 allows us to conclude that kt¼1 1
Eq. (3.15).
3.4.6 Verifying the Terms Asymptotic Negligibility Condition We need to check that plim max jcnt ent j ¼ 0: 1tn
(3:18)
Here the conditions on ent are weaker and the weights cnt are more general than in (Tanaka, 1996, Chapter 3.5, Section 3.1, Problem 4.6). By the assumed uniform integrability, for any 1 . 0, we can choose m . 0 such that sup Ee2nt 1{jent j.m} 1s 2 n,t
which, together with Eq. (3.16), yields n X t¼1
c2nt Ee2nt 1{jent j.m} 1:
(3:19)
100
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
From Eq. (3.17) we can see that there is a number n0 such that m2 max c2nt 1,
n n0 :
1tn
(3:20)
Denote A0 ¼ ;, At ¼ {jcnt ent j ¼ max jcnt ent j}n 1tn
Bt ¼ {jent j . m},
t1 [
Aj ,
j¼0
t ¼ 1, . . . , n:
Since A1 , . . . , An form a disjoint covering of V, we have E max jcnt ent j2 ¼ 1tn
n X
E max jcnt ent j2 1As
s¼1
1tn
(jcns ens j is the largest of jcn1 en1 j, . . . , jcnn enn j on As ) ¼
n X
Ejcns ens j2 1As :
s¼1
Remembering that jens j m on As nBs , E max jcnt ent j2 ¼ 1tn
n X
c2ns Ee2ns 1As >Bs þ
s¼1
n X
n X
c2ns Ee2ns 1As nBs
s¼1
c2ns Ee2ns 1Bs þ max c2nt m2 1tn
s¼1
n X
E1As 21,
n n0 :
s¼1
The end inequality follows from Eqs. (3.19) and (3.20). As we know, convergence in L1 implies that in probability 3.1.2. We have proved Eq. (3.18). This section completes verification of conditions of Theorem 3.3.1 whose application proves Eqs. (3.11) and (3.10).
3.4.7 The Degenerate Case Step 1. When F1 , . . . , FL are linearly dependent, we can renumber them so that F1 , . . . , FK are linearly independent and FKþ1 , . . . , FL are their linear combinations, Fj ¼
K X i¼1
c ji Fi ,
j ¼ K þ 1, . . . , L:
(3:21)
3.4 CENTRAL LIMIT THEOREMS FOR WEIGHTED SUMS OF MARTINGALE DIFFERENCES
101
Denote F ¼ (F1 , . . . , FL )0 , P ¼ (F1 , . . . , FK )0 , Q ¼ (FKþ1 , . . . , FL )0 ,
P . With the matrix Q
with the view to partition F as F ¼ 0
1 . . . cKþ1,K ... ... A ... cL,K
cKþ1,1 C ¼ @ ... cL,1
Eq. (3.21) is just Q ¼ CP. The next relationship is between the Gram matrix GP of the system P and the Gram matrix G of F: ð1 0 ð1 PP G ¼ FF 0 dx ¼ QP0 0
¼
0
GP
GP C 0
CGP
CGP C 0
PQ0
QQ0
dx
:
The proof is complete if we show that Wn0 en converges in distribution to a normal vector with variance of this structure. Step 2. To prove convergence of an auxiliary vector, with pn ¼ (w1n , . . . , wKn ),
qn ¼ (wKþ1 , . . . , wLn ) n
the matrix Wn is partitioned as Wn ¼ (pn , qn ). To parallel Eq. (3.21) define
w nj ¼
K X
c ji win ,
j ¼ K þ 1, . . . , L:
(3:22)
i¼1
Note that Eq. (3.21) does not imply a similar dependence between qn and pn . It is easy to check that with q n ¼ (w nKþ1 , . . . , w nL ), Eq. (3.22) in matrix form looks like q n ¼ pn C 0 . Thus using W n ; (pn , q n ) we can define an auxiliary vector: 0 W n en
¼
p0n en q 0n en
¼
p0n en Cp0n en
¼
I p0 e : C n n
102
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS d
From what is already proved in the nonsingular case, p0n en ! S where S N(0, GP ): Therefore d GP GP C 0 0 , W n en ! N 0, CGP CGP C 0 0
that is, W n en converges to the required vector. Step 3. To prove convergence of the main vector,
dn2 Fj w nj ¼
K X
c ji (dn2 Fi win )
i¼1
is implied by Eqs. (3.21) and (3.22). Using this equation, orthogonality of m.d.’s and the definition of L2 -approximability, we get 2 !2 31=2 n X k(wnj )0 en (w nj )0 en k2,V ¼ 4E (wnj w nj )t ent 5 t¼1
¼s
n X
!1=2 j(wnj )t
2 (w nj )t j
¼ skwnj w nj k2
t¼1
skwnj dn2 Fj k2 þ skdn2 Fj w nj k2 skwnj dn2 Fj k2 þ s
K X
jc ji jkdn2 Fi win k2
i¼1
! 0: 0
We see that W n en and Wn0 en have the same limit in distribution because they differ by a vector whose probability limit is zero. This completes the proof of Theorem 3.4.3.
3.5 CENTRAL LIMIT THEOREMS FOR WEIGHTED SUMS OF LINEAR PROCESSES 3.5.1 Linear Processes with Short-Range Dependence Let {{ent , F nt : t [ Z}: n [ Z} be a double-infinite m.d. array. Except that the set of indices is wider, this satisfies the same requirements as a one-sided array {{Xnt , F nt : t ¼ 1, . . . , kn }: n ¼ 1, 2, . . .} from Section 3.2.1. Fixing a summable sequence of numbers {cj : j [ Z}, denote X en,tj cj , t [ Z: vnt ¼ j[Z
103
3.5 CENTRAL LIMIT THEOREMS FOR WEIGHTED SUMS OF LINEAR PROCESSES
The array {vnt : t, n [ Z} is called a linear process (with short-range dependence). This definition takes account of the array structure of {ent } and is only marginally more general than the definition in Section 1.9.6.
3.5.2 Central Limit Theorems for Weighted Sums of Linear Processes For each n, denote vn ¼ (vn1 , . . . , vnn )0 and X X ac ¼ jcj j, bc ¼ cj : j[Z
j[Z
Theorem. (Mynbaev, 2001) Let {ent , F nt } be a double-infinite m.d. array and let {Wn } be a sequence of n L matrices with columns w1n , . . . , wnL . Suppose that (i) e2nt are uniformly integrable and E(e2nt jF n,t1 ) ¼ s 2 for all t and n, (ii) the sequence {wnl : n [ N} is L2 -close to Fl [ L2 , l ¼ 1, . . . , L, and (iii) ac , 1: With the same Wn and G as in Section 3.4.3, the following statements are true: (a) If bc = 0, then d
Wn0 vn ! N(0, (sbc )2 G):
(3:23)
(b) If bc ¼ 0, then plimWn0 vn ¼ 0: In both cases lim V(Wn0 vn ) ¼ (sbc )2 G:
n!1
(3:24)
3.5.2.1 A Word on Precision Many parts of the proof rely on condition (iii), and long-memory processes are not covered by the present approach. In the case of equal weights wnt ¼ n1=2 , t ¼ 1, . . . , n, there is a stronger result (Hall and Heyde, 1980, Corollary 5.2, in which the series bc converges conditionally). Further, since G contains L2 -norms of Fl , the functions F1 , . . . , FL cannot be taken from a wider class than L2 . See also Wu (2005) and Wu and Min (2005) regarding long-memory processes.
3.5.3 T-Decomposition for Linear Forms Consider the convolution operator Tn : Rn2 ! l2 (Z) defined by (Tn w)j ¼
n X t¼1
wt ctj ,
j [ Z:
104
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
I call Tn a T-operator, alluding to the fact that it crowns the trinity of operators from Section 2.3.1. Changing the order of summation, it is easy to see that for any vector of weights wn [ Rn2 the linear form w0n vn is w0n vn ¼
n X
wnt
X
t¼1
¼
X i[Z
en,tj cj
j[Z
eni
n X
wnt cti ¼
t¼1
X
(3:25) eni (Tn wn )i :
i[Z
Verbally, convolution of {cj } with {ent } induces convolution of {cj } with the weights. When I was giving a seminar at Steklov Mathematical Institute in Moscow in December 2000, those present immediately guessed that I had thrown over the convolution from {ent } to wn , as done in Eq. (3.25). I call Eq. (3.25) a T-decomposition to mean two things: it involves the T-operator and, for some people, it is trivial. If cj ¼ 0 for j , 0 and wn1 ¼ ¼ wnn ¼ 1, then vnt ¼
t X
en, j ctj , (Tn wn )j ¼
j¼1
n X
ctj
t ¼ max{1, j}
and Eq. (3.25) becomes n X t X
en, j ctj ¼
t¼1 j¼1
X
enj
j[Z
n X
ctj :
t ¼ max{1, j}
This equation is known as the Beveridge– Nelson decomposition (Beveridge and Nelson, 1981). Ever since it was introduced it has been used to prove limit results by perturbation argument: the asymptotic statement is established first for uncorrelated innovations and then extended to linear processes by showing that the asymptotics is the same. See plenty of examples in (Tanaka 1996). Here Eq. (3.25) serves the same purpose.
3.5.4 Proving Convergence of Wn0 v n Let us show that plim(Wn0 vn (bc Wn )0 en ) ¼ 0:
(3:26)
By Eq. (3.25) and the definition of the trinity (see Section 2.3.1), the lth component of the vector un ; Wn0 vn (bc Wn )0 en equals unl ¼
X i[Z
¼
n X t¼1
eni (Tn wln )i bc
n X
(wln )t ent
t¼1
ent ((Tn0 bc )wln )t þ
X t,1
ent (Tn wln )t þ
X t.n
ent (Tnþ wln )t :
105
3.5 CENTRAL LIMIT THEOREMS FOR WEIGHTED SUMS OF LINEAR PROCESSES
Hence, by orthogonality of m.d.’s and Theorem 2.5.4 (convergence of the trinity on Lp -approximable sequences) 2
2
2
Eu2nl ¼ s 2 [k(Tn0 bc )wln k2 þ kTn wln k2 þ kTnþ wln k2 ] ! 0, n ! 1: Convergence in Lp implies that in probability. Thus Eq. (3.26) is true. Now we can proceed with proving statements (a) and (b) in Theorem 3.5.2. (a) Let bc = 0: Since bc Wn and {ent , Fnt } satisfy the assumptions of Theorem 3.4.3 ({bc wln : n [ N} is L2 -close to bc Fl , l ¼ 1, . . . , L), Eq. (3.10) gives d
(bc Wn )0 en ! N(0, (sbc )2 G): Owing to Eq. (3.26), this equation proves (a). Statement (b) follows directly from Eq. (3.26) if bc ¼ 0:
3.5.5 T-Decomposition for Means of Quadratic Forms Applying Eq. (3.25) to wkn and wln and multiplying the resulting equations we get (wkn )0 vn (wln )0 vn ¼ (wkn )0 vn v0n wln X X ¼ eni (Tn wkn )i enj (Tn wln )j , k, l ¼ 1, . . . , L: i[Z
j[Z
Now by the definition of the trinity and orthogonality of m.d.’s X (Tn wkn )j (Tn wln )j E(wkn )0 vn v0n wln ¼ s 2 j[Z 2
¼ s [(Tn0 wkn , Tn0 wln ) þ (Tn wkn , Tn wln ) þ(Tnþ wkn , Tnþ wln )],
(3:27)
where ( , ) denotes the scalar product in l2 .
3.5.6 Proving Convergence of Variances To the elements of L
V(Wn0 vn ) ¼ EWn0 vn v0n Wn ¼ (E(wkn )0 vn v0n wln )k,l¼1 we apply Eq. (3.27). By the Cauchy – Schwarz inequality, boundedness of the trinity 2.3.2 and Theorem 2.5.4 j(Tn0 wkn , Tn0 wln ) b2c (wkn , wln )j j((Tn0 bc )wkn , Tn0 wln )j þ j(bc wkn , (Tn0 bc )wln )j k(Tn0 bc )wkn k2 kTn0 wln k2 þ kbc wkn k2 k(Tn0 bc )wln k2 ac k(Tn0 bc )wkn k2 sup kwln k2 þ jbc j k(Tn0 bc )wln k2 sup kwkn k2 ! 0: n
n
106
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
The other two terms in Eq. (3.27) are easier to handle: j(Tn+ wkn , Tn+ wln )j kTn+ wkn k2 kTn+ wln k2 ! 0: The above estimates show that lim E(wkn )0 vn v0n wln ¼ lim E(bc wkn )0 en e0n bc wln :
n!1
n !1
Thus, V(Wn0 vn ) is a perturbation of V((bc Wn )0 en ), which tends to (sbc )2 G by Eq. (3.9). The proof of Theorem 3.5.2 is complete.
3.6 Lp-APPROXIMABLE SEQUENCES OF MATRICES The theory in the preceding sections admits generalizations in different directions. This section develops some results necessary for spatial models. A different track is followed in Chapter 4.
3.6.1 Discretization and Interpolation in the Two-Dimensional Case Corresponding to uniform coverings of (0, 1) we can define uniform coverings of the square Q ¼ (0, 1)2 consisting of small squares qst ¼ is it ,
1 s, t n,
of area n2 . For a given F [ Lp ((0, 1)2 ), dnp F is defined by ð (dnp F)st ¼ n2=q F(x) dx, 1 s, t n, qst
[here x ¼ (x1 , x2 ) and dx is the Lebesgue measure on the plane]. If f is a matrix of size n n, the step function Dnp f is, by definition, n X fst 1qst : Dnp f ¼ n2=p s,t¼1
dnp and Dnp are called discretization and interpolation operators, respectively. Lemma (i) For all F [ Lp ((0, 1)2 ) and f [ lp kdnp Fkp kFkp , kDnp f kp ¼ k f kp :
(3:28)
3.6 Lp-APPROXIMABLE SEQUENCES OF MATRICES
107
(ii) If F is symmetric, F(x, y) ¼ F( y, x) for all (x, y) [ (0, 1)2 , then dnp F is a symmetric matrix. (iii) To distinguish the 2-D and 1-D cases, denote d2np as the operator defined here and d1np its 1-D cousin from Section 2.1.2. If F(x, y) ¼ G(x)H( y), then (d2np F)st ¼ (d1np G)s (d1np H)t for all s, t. Proof. (i) Equation (3.28) is proved as the corresponding properties in Sections 2.1.3 and 2.1.7. (ii) Observe that (x, y) [ qst if and only if ( y, x) [ qts and therefore by the symmetry of F ð ð (dnp F)st ¼ n2=q F(x, y) dx dy ¼ n2=q F(x, y) dx dy ¼ (dnp F)ts : qst
qts
(x and y are real and dx and dy are linear Lebesgue measures). (iii) Obviously, ð ð (d2np F)st ¼ n1=q G(x) dx n1=q H( y) dy ¼ (d1np G)s (d1np H)t : is
B
it
3.6.2 Continuity Modulus Let ty be the translation operator by a vector y [ R2 . Denote Q ¼ (0, 1)2 ,
Q y ¼ {x y: x [ Q},
Qy ¼ Q > (Q y):
If F is defined on Q, then F ty F is defined on Qy . With this notation, the definition of the continuity modulus is quite similar to that in the 1-D case:
vp (F, d) ¼ sup kF ty Fk p,Qy , d . 0: kyk2 d
Lemma. For all F [ Lp (Q) and 1 p , 1 we have limd !0 vp (F, d) ¼ 0: The proof is similar to the proof of Lemma 2.1.5.
3.6.3 The Haar Projector The Haar projector Pn is defined on the integrable on Q functions F by Pn F ¼ n2
ð n X s,t¼1
qst
F(x) dx1qst :
108
CHAPTER 3
Lemma.
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
For all F [ Lp (Q) and 1 p 1
(i) Pn ¼ Dnp dnp : (ii) kPn Fkp kFkp : Proof. (i) Directly from the definitions Dnp (dnp F) ¼ n2=p
n X
(dnp F)st 1qst
s,t¼1
¼ n2=pþ2=q
n ð X s,t¼1
F(x) dx1qst ¼ Pn F:
qst
(ii) Follows from (i) and Eq. (3.28).
B
3.6.4 Convergence of Haar Projectors to the Identity Operator (Two-Dimensional Case) Lemma. If p , 1, then the sequence {Pn } converges strongly to the identity operator with the next bound on the rate of convergence: pffiffiffi kPn F Fkp (2p)1=p vp (F, 2=n): Proof. Step 1. Splitting the integral over Q into integrals over qst we have n ð X Pn F F p ¼ jPn F Fjp dy p s,t¼1
qst
Ð (Pn F on qst is just n2 qst Fdx) p ð ð ð 2 2 F(x) dx n F( y) dx dy: ¼ n s,t¼1 qst qst qst n X
Next, apply Ho¨lder’s inequality to the inner integral and use the identity p p=q ¼ 1 p ð ð n X Pn F F p ¼ n2p (F(x) F( y)) dx dy p s,t¼1 qst qst n ð ð X n2 jF(x) F( y)jp dx dy: s,t¼1
qst qst
3.6 Lp-APPROXIMABLE SEQUENCES OF MATRICES
109
Step 2. To evaluate the small integrals that appear here, consider an integral over a square D ¼ (a, a þ b)2 ðð I¼ jF(x) F( y)jp dx dy DD
(change x ¼ y þ z in the inner integral) ð ð
jF( y þ z) F( y)jp dz dy:
¼ D Dy
Following exactly Step 2 of the proof of Lemma 2.2.1 would be difficult. Instead, we use the inclusion A ; {( y, z): y [ D, z [ D y} # B ; {( y, z): kzk2 y [ Dz }:
pffiffiffi 2b, (3:29)
This inclusion is proved by showing that each element of A belongs to B: Let ( y, z) [ A: Then there is x [ D such that z ¼ x y: Since x and y belong to pffiffiffi pffiffiffi the same square with diagonal of length 2, we have kzk2 2b: Further, y [ D and y ¼ x z [ D z imply y [ Dz : We have shown that ( y, z) [ B: Equation (3.29) allows us to estimate I as ð
ð
p
I ¼ jF( y) F( y þ z)j dz dy jF( y) F( y þ z)jp dz dy A
ð ¼ pffiffi kzk2 2b
0
B
1
ð
B C @ jF( y) (tz F)( y)jp dyAdz: Dz
Step 3. Summarizing Steps 1 and 2
kPn F Fk pp n2
n X s,t¼1
ð pffiffi kzk2 2=n
0 B @
ð
1 C jF tz Fj p dyAdz
(qst )z
and noting that [ s,t
(qst )z ¼
[ s,t
[qst > (qst z)] # Q > (Q z) ¼ Qz
110
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
we finally arrive at
kPn F Fk pp n2
ð pffiffi kzk2 2=n
v pp (F,
0
1
ð
B C @ jF tz Fj p dyAdz Qz
pffiffiffi 2=n)n2
ð pffiffi kzk2 2=n
dz ¼ 2pv pp (F,
pffiffiffi 2=n): B
3.6.5 L p -Approximable Sequences of Matrices A sequence of matrices { fn : n [ N}, where fn is of size n n for all n, is called L p -approximable if there exists a function F [ L p ((0, 1)2 ) satisfying the condition k fn dnp Fk p ! 0:
(3:30)
If this is true, we also say that {fn } is Lp -close to F. Lemma.
(Equivalent definition) If p , 1, then Eq. (3.30) is equivalent to kDnp fn Fkp ! 0:
The proof repeats that of Lemma 2.5.1, except that the references to Lemmas 2.1.7 and 2.2.1 should be replaced by references to the corresponding properties from Sections 3.6.1, 3.6.3 and 3.6.4.
3.6.6 Simple sufficient condition Lemma. Let {Bn } be a sequence of matrices such that Bn is of size n n and there exists a continuous function F [ C([0, 1]2 ) satisfying max jBnst F(s=n, t=n)j ! 0:
1s,tn
(3:31)
Put fn ¼ n2=p Bn . If p , 1, then {fn } is L p -close to F. The proof is obtained by making obvious changes in the proof of Lemma 2.7.2. Condition (3.31) was used in (Nabeya and Tanaka, 1988), see also (Tanaka, 1996, Section 5.6).
3.7 INTEGRAL OPERATORS
111
3.7 INTEGRAL OPERATORS If in the definition of a matrix product P 10 1 0 a1i xi 1 a1n x1 B i C A@ A ¼ @ P A ani xi ann xn
0
a11 @ an1
i
we replace summation by integration, the definition of an integral operator emerges. Tracing the analogies between properties of integral operators and matrices makes one’s life much easier, although infinite dimensionality certainly introduces a bit of adventure.
3.7.1 Integral Operators with Square-Integrable Kernels Let K be a symmetric function from L2 (Q), Q ¼ (0, 1)2 . We can associate with it an integral operator ð1 K(x, y)F( y)dy, F [ L2 (0, 1):
(KF)(x) ¼ 0
The function K is called a kernel of the operator K. For the first pack of properties of K we need the notation ð1 (F, G) ¼
ð F(x)G(x) dx, (F, G) ¼
0
F(x)G(x) dx (0,1)2
of scalar products in L2 (0, 1) and L2 (Q). The norms these generate are denoted by the same symbol kFk2 . Lemma.
If K is symmetric and belongs to L2 (Q), then
(i) K is linear, K(aF þ bG) ¼ aKF þ bKG for all numbers a, b and elements F, G [ L2 (0, 1), (ii) K is bounded, kKk2 kKk2 , (iii) K is symmetric, (KF, G) ¼ (F, KG). Proof. (i) Linearity of K is obvious.
112
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
(ii) By the Ho¨lder inequality 01 11=2 ð j(KF)(x)j @ K 2 (x, y) dyA kFk2 0
which implies kKFk2 kKk2 kFk2 :
(3:32)
This means that the domain of K is the whole L2 (0, 1) and the norm of K is not larger than kKk2 . (iii) Changing the order of integration, 01 1 ð (KF, G) ¼ @ K(x, y)F( y) dyAG(x) dx ð1 0
ð1 ¼
0
01 1 ð F( y)@ K(x, y)G(x) dxAdy
0
ð1 ¼
0
01 1 ð F( y)@ K( y, x)G(x) dxAdy ¼ (F, KG):
0
0
B
3.7.2 Compactness of K A bounded linear operator A from one normed space, X, to another normed space, Y, is called compact if the image AB of the unit ball B ¼ {y [ Y: kyk 1} is precompact (see the definition in Section 2.6.4). Lemma.
If K [ L2 (Q), then K is compact.
Proof. To prove that the image KB of the ball B ¼ {F: kFk2 1} is precompact we can use the Frechet – Kolmogorov theorem (Section 2.6.4). Eq. (3.32) shows that the elements of KB are uniformly bounded: sup kKFk2 kKk2 : kFk2 1
For an arbitrary h [ R, estimate by Ho¨lder’s inequality that 01 11=2 ð j(KF)(x) (th KF)(x)j @ jK(x, y) K(x þ h, y)j2 dyA kFk2 : 0
3.7 INTEGRAL OPERATORS
113
If jhj d, then by the definition of the 2-D continuity modulus the end bound implies ð
j(KF)(x) (th KF)(x)j2 dx
Qh
ð ð1
jK(x, y) K(x þ h, y)j2 dy kFk22 v22 (K, d) kFk22 :
Qh 0
Apply sup over jhj d to the left side:
v2 (KF, d) v2 (K, d) kFk2 v2 (K, d) for F [ B: Hence, by Lemma 3.6.2 the elements of KB are uniformly equicontinuous. Thus, by the Frechet – Kolmogorov theorem KB is precompact and K is compact. B
3.7.3 Orthonormal Systems and Fourier Series Let H be a Hilbert space. A system of vectors {xn : n [ N} , H is called orthonormal if 1. (xi , x j ) ¼ 0, i = j (orthogonality) and 2. kxi k2 ¼ 1 (normalization). With an orthonormal system Pat hand, we can consider Fourier coefficients ci ¼ ( y, xi ) and Fourier series [ H. Here the infinite sum i ci xi for any yP P1 n i¼1 ci xi is defined as the limit in H of partial sums i¼1 ci xi . Lemma (i) y is representable by its Fourier series, y ¼ P kyk22 ¼ i c2i .
P
i ci xi ,
(ii) If y P and z are representable by their Fourier series, y ¼ z ¼ i (z, xi )xi , then X ( y, z) ¼ ( y, xi )(z, xi ):
if and only if P
i
Proof. (i) For any natural n we can write 2 ! n n n X X X ci xi ¼ x ci xi , x ci xi x i¼1 i¼1 i¼1 2
i
( y, xi )xi and (3:33)
114
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
(use linearity of scalar products) ¼ kxk22
n X
ci (xi , x)
i¼1
n X
ci (x, xi ) þ
i¼1
n X
ci c j (xi , x j )
i, j¼1
(apply orthonormality and plug in the coefficients) ¼ kxk22
m X
c2i :
i¼1
2 P P1 2 2 Letting n ! 1 yields x 1 i¼1 ci xi 2 ¼ kxk2 i¼1 ci , which proves statement (i). (ii) Equation (3.33) is obtained by scalar multiplication of y and z with subsequent application of the linearity of scalar products and orthonormality. B Equation (3.33) is called a Parseval identity.
3.7.4 Symmetric and Nonnegative Operators Let A be a bounded linear operator in a Hilbert space H. Its adjoint A is defined as the operator that satisfies (Ax, y) ¼ (x, A y) for all x, y [ H. Further, A is called symmetric or self-adjoint if A ¼ A . The thrust of Lemma 3.7.1 is that K is symmetric in this sense. A symmetric operator A is called nonnegative if (Ax, x) 0 for any x [ H. Lemma (i) For any bounded operator A, the operator B ¼ A A is symmetric and nonnegative. (ii) For any bounded operator A and compact operator B, the products AB and BA are compact. Proof. (i) By the definition of the adjoint (Bx, y) ¼ (A Ax, y) ¼ (Ax, Ay) ¼ (x, A Ay) ¼ (x, By) for all x, y [ H: This proves the symmetry of B. The proof of its nonnegativity is even simpler: (Bx, x) ¼ (A Ax, x) ¼ (Ax, Ax) 0: (ii) Consider, for example, the first product. We need to prove that if the vectors xn satisfy kxn k 1, then the sequence {ABxn } contains a convergent subsequence. Indeed, since B is compact, there is a subsequence {xnm } # {xn } such that {Bxnm } converges. But then by boundedness of A, the sequence {ABxnm } also converges: kABxnm ABxnk k kAkkBxnm Bxnk k ! 0: B
3.7 INTEGRAL OPERATORS
115
3.7.5 Hilbert –Schmidt Theorem Let A be a bounded linear operator in a Hilbert space H: People say that x is an eigenvector of A that corresponds to its eigenvalue l if Ax ¼ lx and x = 0: Nonzero x is excluded here as an uninteresting case (with zero x, Ax ¼ lx is true for any l). However, a zero eigenvalue is a possibility. If x is an eigenvector of A that corresponds to eigenvalue l, then Aax ¼ lax for any number a. In other words, along the straight line {ax: a [ R} the action of A is simple: it multiplies vectors by l. The dimension of the least linear subspace that contains all eigenvectors corresponding to the given eigenvalue l is called multiplicity of that l. Theorem. (Hilbert – Schmidt). (Kolmogorov and Fomin, 1989, Chapter 4, Section 6) If A is a symmetric, compact operator in H, then it possesses systems of eigenvalues {ln } and corresponding eigenvectors {xn } such that (i) the eigenvalues are real, repeated according to their multiplicities and ln converge to 0, (ii) the system of eigenvectors is orthonormal and such that any y [ H can be decomposed as X ( y, xi )xi : (3:34) y¼ i
Since Axi ¼ li xi , Eq. (3.34) obviously implies Ay ¼
X
li ( y, xi )xi , y [ H:
(3:35)
i
This representation, called a spectral decomposition of A, is the main purpose of the Hilbert – Schmidt theorem.
3.7.6 Functions of a Symmetric Compact Operator The spectral decomposition (3.35) enables us to define functions of A. The idea is simple: define f (A) by f (A)y ¼
X
f (li )( y, xi )xi
(3:36)
i
for any f for which f (li ) exist and the series at the right converges. A simple sufficient condition is that f be defined and bounded on the real line, supt[R j f (t)j , 1, because then convergence of Eq. (3.36) trivially follows from convergence of Eq. (3.35). Lemma. Let A be a symmetric compact operator. It is nonnegative if and only if all its eigenvalues are nonnegative. Further, if it is nonnegative, then all its nonnegative powers At , t 0, exist.
116
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
Proof. If A is nonnegative, then, selecting y ¼ xi in Eq. (3.35), we get 0 (Axi , xi ) ¼ li (xi , xi ) ¼ li : Conversely, if all eigenvalues are nonnegative, Eqs. (3.34) and (3.35) P give (Ay, y) ¼ i li ( y, xi )2 0: Suppose A is nonnegative. Then its eigenvalues are nonnegative and satisfy li ! 0 by the Hilbert – Schmidt theorem. Hence, in the definition of powers At y ¼
X
lti ( y, xi )xi
(3:37)
i
the numbers lti are defined and bounded. Thus, At is correctly defined and bounded in H. B
3.7.7 Fourier Decomposition of the Kernel Lemma. as
If K is symmetric and square-integrable on Q, then it can be decomposed
K(x, y) ¼
X
li Fi (x)Fi ( y):
(3:38)
i
Besides, kKk22 ¼
X
l2i :
(3:39)
i
Here eigenvalues {ln } and eigenvectors {Fn } are the result of the application of the Hilbert – Schmidt theorem to K. Proof. By Fubini’s theorem (Kolmogorov and Fomin, 1989, Chapter 5, Section 6), for almost all x [ (0, 1) the kernel K(x, ) belongs to L2 (0, 1) as a function of y. Hence, we can substitute it for y in Eq. (3.34) and thereby obtain K(x, y) ¼
X
(K(x, ), Fi )Fi ( y):
i
Since Fi are eigenvectors, the Fourier coefficients here actually equal ð1 (K(x, ), Fi ) ¼ K(x, y)Fi ( y) dy ¼ li Fi (x): 0
This proves Eq. (3.38).
3.8 CLASSES sp
117
Equation (3.39) is a consequence of Eq. (3.38):
kKk22
¼
ð1 ð1 X 0 0
¼
X i, j
!2
li Fi (x)Fi ( y)
dxdy
i
ð1
ð1
li lj Fi (x)Fj (x)dx Fi ( y)Fj ( y)dy ¼ 0
X
l2i :
i
0
This shows, in particular, that the series in Eq. (3.38) converges in L2 (Q).
B
3.8 CLASSES sp 3.8.1 s-Numbers and Classes sp Let A be a compact operator in H. By Lemma 3.7.4 the operator A A is compact, symmetric and nonnegative. By the Hilbert – Schmidt theorem its eigenvalues li (A A) tend to zero, while by Lemma 3.7.6 they are nonnegative. The numbers pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi si (A) ¼ li (A A) are called s-numbers of A. A class sp, 1 p 1, is defined as the set of compact operators in H satisfying kAksp ¼ k{si (A)}klp , 1: Note that the smaller p, the stronger is the requirement A [ sp : This is so because, by monotonicity of lp norms, kAksq kAksp
if 1 p q 1:
(3:40)
The elements of s2 are called Hilbert – Schmidt operators. Equation (3.39) means that if K is symmetric and square-integrable on (0, 1)2 , then K is a Hilbert – Schmidt operator. The elements of s1 are called nuclear operators. In the finite-dimensional case the trace of a symmetric matrix is, by definition, the sum of its diagonal elements. The trace is shown to be equal to the sum of the eigenvalues. In the infinite-dimensional case, when matrix representations, as a rule, don’t work, it is natural to define the trace of a symmetric compact operator to be the sum of its Peigenvalues. The nuclearity assumption ensures absolute convergence of the series i li . Lemma.
If A is symmetric and compact, then si (A) ¼ jli (A)j and X i
!1=2
l2i
X i
jli j:
(3:41)
118
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
Proof. For a symmetric and compact A we have
si (A) ¼
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffi li (A A) ¼ li (A2 ) ¼ l2i (A) ¼ jli (A)j:
Now Eq. (3.41) follows from Eq. (3.40).
B
3.8.2 Discretizing an Integral Operator Let Pn , D1n , d1n be the 1-D Haar projector, interpolation operator and discretization operator, respectively, obtained from Sections 2.1.2 and 2.1.6 by putting p ¼ 2. Let d2n be the 2-D discretization operator from Section 3.6.1, also in case p ¼ 2. Denote Kn ¼ Pn KPn . Lemma.
Kn ¼ D1n (d2n K)d1n :
Proof. Since D1n d1n ¼ Pn (Lemma 2.1.7), we have Kn ¼ D1n (d1n KD1n )d1n and the statement will follow if we establish
d1n KD1n ¼ d2n K:
(3:42)
Let us see what the application of KD1n to f [ Rn gives:
(KD1n
ð1 f )(x) ¼ K(x,
y)(D1n f )( y) dy
n ð pffiffiffi X ¼ n K(x, y) dy ft : t¼1
0
it
Multiply this by d1n : 0 1 n ð ð X @ K(x, y) dyAdx ft [d1n (KD1n f )]s ¼ n t¼1
¼n
is
ð n X t¼1
it
K(x, y) dx dy ft ¼ [(d2n K)f ]s ,
s ¼ 1, . . . , n:
is it
Since f is arbitrary, this proves Eq. (3.42).
B
3.8.3 Preservation of Spectra under Discretization Lemma. If K is symmetric and square-integrable on Q, then nonzero eigenvalues of Kn , repeated according to their multiplicities, are the same as those of d2n K.
3.8 CLASSES sp
119
Proof. We know from Lemma 2.1.7(ii) that interpolating f [ Rn and discretizing the result D1n f gives the original vector
d1n D1n f ¼ f , f [ Rn :
(3:43)
Let l be a nonzero eigenvalue of Kn . Kn F ¼ lF by Lemma 3.8.2 and Eq. (3.43) implies
ld1n F ¼ d1n Kn F ¼ d1n D1n (d2n K)d1n F ¼ (d2n K)d1n F: We need to satisfy ourselves that f ¼ d1n F is not null. The image Im(Kn ) is contained in the image Im(Pn ), which consists of functions constant on the elements of the covering {it }: Since F is not null and belongs to Im(Kn ), at least one of these constants is not zero and, hence, f is not null. We see that f ¼ d1n F is an eigenvector of d2n K corresponding to the eigenvalue l: Now we prove that multiplicities are preserved. Multiplicity of l equals the maximum number of orthogonal eigenvectors that correspond to l: If G is another eigenvector that corresponds to the same eigenvalue l and is orthogonal to F, then, denoting g ¼ d1n G, by Eq. (2.6) we have
0
ð1
0
ð1
f g ¼ (dn F) dn G ¼ (Pn F)Pn G dx ¼ FG dx ¼ 0: 0
0
Here we have used Pn F ¼ F (projectors don’t change elements of their images). Thus, multiplicities are preserved. Conversely, if (d2n K)f ¼ lf , f = 0, then we multiply both sides by D1n and use Eq. (3.43) to substitute f : D1n (d2n K)d1n D1n f ¼ lD1n f or, by Lemma 3.8.2, Kn F ¼ lF where F ¼ D1n f is not null. Thus, nonzero eigenvalues of Kn and d2n K are the same. If g is another eigenvector that corresponds to the same eigenvalue l of d2n K and is orthogonal to f , then with G ¼ D1n f Eq. (2.4) gives ð1 0
ð1
FG dx ¼ (Dn f )Dn g dx ¼ f 0 g ¼ 0: 0
Again, multiplicities are preserved. In the case l ¼ 0 the above argument is not applicable because Kn F ¼ lF ¼ 0 does not imply F ¼ 0: The subspace of eigenvectors of Kn that correspond to l ¼ 0 is infinite-dimensional. B
120
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
3.8.4 Integral Operators and Interpolation Applying the 2-D interpolation operator D2n to a symmetric matrix A of size n n we get a function on Q D2n A ¼ n
n X
aij 1qij :
i, j¼1
This function generates an integral operator ð1
(AF)(x) ¼ (D2n A)(x, y)F( y) dy: 0
Lemma. There is a one-to-one correspondence between the set of nonzero eigenvalues of A and a similar set of A. Proof. The image Im(Pn ) of the 1-D Haar projector consists of functions constant on the elements of the covering {it }: Since 1qst ¼ 1is 1it , the image of A is contained in Im(Pn ):
(AF)(x) ¼ n
n X s,t¼1
ð1 ast 1is (x) 1it ( y)F( y) dy [ Im(Pn ): 0
Pn, being a projector, doesn’t change elements of its image, so Pn A ¼ A: Suppose AF ¼ lF, l = 0: Since the left side belongs to Im(Pn ), the right side also belongs to it. Therefore Pn APn F ¼ lF: By Lemma 3.8.2 and d2n D2n ¼ I this rewrites as D1n (Ad1n F) ¼ D1n (d2n D2n A)d1n F ¼ D1n (d2n K)d1n F ¼ lF: Multiply both sides by d1n and use the property d1n D1n ¼ I and notation f ¼ d1n F to get Af ¼ d1n D1n (Ad1n F) ¼ lf : Assumption F = 0 implies f = 0 because F is piece-wise constant. Thus, l is an eigenvalue of F. Conversely, let Af ¼ lf , f [ Rn : Multiply both sides by D1n and substitute A ¼ d2n D2n A, f ¼ d1n D1n f : D1n (d2n D2n A)d1n D1n f ¼ lD1n f : By Lemma 3.8.2 and denoting F ¼ D1n f we have Pn APn F ¼ lF, which is actually the same as AF ¼ lF: In both parts of the proof multiplicities are preserved because of preservation of scalar products. B
3.8 CLASSES sp
121
3.8.5 Uniform Boundedness of s1-Norms of Wn Here we need some facts from Gohberg and Krei˘n (1969): 1. The class sp is a normed space with the norm k ksp . In particular, the triangle inequality kA þ Bksp kAksp þ kBksp and its consequence jkAksp kBksp j kA Bksp
(3:44)
are true (Gohberg and Kreı˘n, 1969, p. 92). 2. For any bounded operators B and C kBACksp kBkkAksp kCk
(3:45)
(Gohberg and Krei˘n, 1969, Section 2.1). P 3. If for someP orthonormal basis {f j } we have j kAfj k , 1, then kAks1 j kAfj k (Gohberg and Krei˘n, 1969, Section 7.8). Let A be a square matrix of order n. Choosing fj ¼ (0, . . . , 0, 1, 0, . . . , 0) (unity in the jth place) we have Afj ¼ (a1j , . . . , anj )0 (jth column) and property 3 yields kAks1
n X
k(a1j , . . . , anj )k2
pffiffiffi nkAk2 :
(3:46)
j¼1
Theorem. (Mynbaev, 2010) Let {Wn } be a sequence of symmetric matrices, Wn being of size n n: If K is a symmetric square-integrable function on Q, the operator K is nuclear and pffiffiffi kWn d2n Kk2 ¼ o(1= n),
(3:47)
then the eigenvalues ln1 , . . . , lnn of Wn satisfy sup n
n X
jlnj j , 1:
j¼1
Proof. The operator Kn from Section 3.8.2 is symmetric because K and Pn are. By Lemma 3.8.3, kKn ks1 ¼ kd2n Kks1 (zero eigenvalues do not affect the norms), while P Lemma 3.8.4 implies nj¼1 jlnj j ¼ kWn ks1 . Hence, from Eqs. (3.44), (3.47) and (3.46) n X jlnj j kKn ks1 ¼ jkWn ks1 kd2n Kks1 j j¼1 kWn d2n Kks1
pffiffiffi nkWn d2n Kk2 ! 0:
122
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
Using kPn k 1, Eq. (3.45), Lemma 3.8.1 and the nuclearity assumption we see that kKn ks1 ¼
1 X j¼1
sj (Pn KPn )
1 X j¼1
sj (K) ¼
1 X
jlj (K)j , 1:
j¼1
The two displayed equations immediately above prove the theorem.
B
3.9 CONVERGENCE OF QUADRATIC FORMS OF RANDOM VARIABLES Here we see how CMTs in combination with a CLT work to obtain convergence to a nonnormal distribution.
3.9.1 CLT for Quadratic Forms of Linear Processes: Version 1 Let the vector {vn } be the same as in Theorem 3.5.2 (a linear process, with m.d.’s ent as innovations, with coefficients cj ) and let {kn : n [ N} be a sequence of nonstochastic matrices such that kn is of size n n: Theorem.
(Mynbaev, 2001) Suppose that
(i) e2nt are uniformly integrable and E(e2nt jF n,t1 ) ¼ s 2 for all t and n, (ii) the sequence {cj : j [ Z} is summable, ac , 1, (iii) the sequence {kn : n [ N} is L2 -close to some symmetric function K [ L2 ((0, 1)2 ) with the next rate of approximation kkn dn2 Kk2 ¼ o(1=n),
(3:48)
(iv) the integral operator K with the kernel K is nuclear. Then we can assert that 1. If bc = 0, then the quadratic form Qn (kn ) ¼ v0n kn vn P converges in distribution to (sbc )2 i li u2i where ui are independent standard normal and li are the eigenvalues of K. 2. If bc ¼ 0, then plimQn (kn ) ¼ 0. This kind of a CLT appeared in Nabeya and Tanaka (1990). The important features of their approach include: 1. the link between the limit distribution of Qn (kn ) and the integral operator K through the eigenvalues of the latter; 2. approximation of Qn (kn ) by Qn (dn2 K) and of Qn (dn2 K) by Qn (dn2 KL ),
3.9 CONVERGENCE OF QUADRATIC FORMS OF RANDOM VARIABLES
123
where
KL (x, y) ¼
L X
li Fi (x)Fi ( y)
(3:49)
i¼1
is the initial segment of representation (3.35). Adding to these features the properties of L2 -approximable sequences allowed me (Mynbaev, 2001) to replace their continuous kernels with square-integrable ones and consider double-infinite linear processes instead of one-sided processes (cj ¼ 0 for j , 0). [See in Tanaka (1996) the details regarding the so-called Fredholm approach.] Prove the above theorem in Sections 3.9.2 – 3.9.6.
3.9.2 Approximation of Qn (kn ) by Qn (dn2 K) By the Cauchy – Schwarz inequality n X vns vnt (kn dn2 K)st jQn (kn ) Qn (dn2 K)j ¼ s,t¼1 n X
kkn dn2 Kk2
!1=2 v2ns v2nt
s,t¼1
¼ kkn dn2 Kk2
n X
v2nt :
t¼1
Here, by orthogonality of m.d.’s,
E
n X t¼1
v2nt ¼
n X X
Een,ti ci en,tj cj
t¼1 i, j[Z
¼ s2
n X X t¼1 j[Z
c2j ¼ s 2 n
X
c2j :
j[Z
As condition (ii) of Theorem 3.9.1 implies square-summability of {cj : j [ Z}, we can use Eq. (3.48) to conclude that EjQn (kn ) Qn (dn2 K)j cnkkn dn2 Kk2 ! 0: This proves that plim(Qn (kn ) Qn (dn2 K)) ¼ 0:
(3:50)
124
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
3.9.3 Approximation of Qn (dn2 K) by Qn (dn2 KL ) Subtracting from the representation of K [Eq. (3.35)] its initial segment [Eq. (3.49)] and applying Lemma 3.6.1(iii) we get (d2n2 K d2n2 KL )st ¼
X
li (d1n2 Fi )s (d1n2 Fi )t :
i.L
Therefore Qn (d2n2 K) Qn (d2n2 KL ) ¼
X
li
i.L
¼
(d1n2 Fi )s vns (d1n2 Fi )t vnt
s,t¼1
X
li
i.L
¼
n X
X
n X
!2 (d1n2 Fi )t vnt
t¼1
li [(d1n2 Fi )0 vn ]2 :
(3:51)
i.L
By the T-decomposition for means of quadratic forms (3.27) 2
2
E[(d1n2 Fi )0 vn ]2 ¼ s 2 [kTn0 d1n2 Fi k2 þ kTn d1n2 Fi k2 2
þ kTnþ d1n2 Fi k2 ] 3s 2 a2c kFi k22 ¼ 3s 2 a2c : The passage is by boundedness of the trinity 2.3.2, of the discretization operator 2.1.3 and normalization of Fi . From the two displayed equations immediately above by nuclearity of K EjQn (dn2 K) Qn (dn2 KL )j 3s 2 a2c
X
jli j ! 0,
L ! 1:
(3:52)
i .L
It is extremely important that the majorant here does not depend on n and, as a result, plimL !1 [Qn (dn2 K) Qn (dn2 KL )] ¼ 0 uniformly in n:
(3:53)
3.9.4 Convergence of the Cut-Off Quadratic Form Case 1. Let bc = 0. By selecting wln ¼ dn2 Fl , l ¼ 1, . . . , L, we satisfy condition (ii) of Theorem 3.5.2. All other conditions of that theorem also hold. By part (a) of Theorem 3.5.2 0
1 (dn2 F1 )0 vn d @ A ! ... N(0, (sbc )2 G), (dn2 FL )0 vn
(3:54)
3.9 CONVERGENCE OF QUADRATIC FORMS OF RANDOM VARIABLES
125
where by orthonormality of the system Fl, l ¼ 1, . . . , L, G ¼ ((Fi , Fj ))Li, j¼1 ¼ I: It follows that Eq. (3.54) is equivalent to (see Section 3.1.6) 0 1 0 1 u1 (dn2 F1 )0 vn d @ A ! jsbc j@ . . . A, ... uL (dn2 FL )0 vn
(3:55)
where u1 , . . . , uL are independent standard normal. Similarly to Eq. (3.51) Qn (d2n2 KL ) ¼
L X
li [(d1n2 Fi )0 vn ]2 :
i¼1
Here at the right we have a continuous function of the vector at the left of Eq. (3.55). By CMT (Section 1.8.3) then d
Qn (d2n2 KL ) ! (sbc )2
L X
li u2i , n ! 1:
(3:56)
i¼1
Case 2. Let bc ¼ 0: By Theorem 3.5.2b 0 1 (dn2 F1 )0 vn p @ A ! ... 0, n ! 1, (dn2 FL )0 vn and by Lemma 1.8.1(iii) p
Qn (d2n2 KL ) ! 0, n ! 1:
(3:57)
3.9.5 Theorem on a Double Limit We are interested in the convergence in distribution of a sequence of random vectors {Xn : n [ N}. Suppose that each of these vectors is approximated by a sequence {XnL : L [ N}, with the degree of approximation improving as L ! 1. It is useful to visualize the situation as in the matrix: X11 X12 ... X1L # X1
X21 X22 ... X2L # X2
... ... ... ... ...
Xn1 Xn2 ... XnL # Xn
! Y1 ! Y2 ... ... ! YL # ! Y
Theorem. (Billingsley, 1968) Suppose there is convergence in distribution along d the rows XnL ! YL , n ! 1, and that the vectors YL in the right margin converge
126
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS d
d
in distribution themselves, YL ! Y, L ! 1. Then we can conclude that Xn ! Y, n ! 1, if lim lim sup P(kXnL Xn k2 1) ¼ 0:
L !1
(3:58)
n !1
3.9.6 Culmination Case 1. Let bc = 0: Put Xn ¼ Qn (dn2 K) and XnL ¼ Qn (dn2 KL ). By Eq. (3.56) we have row-wise convergence d
XnL ! YL ; (sbc )2
L X
li u2i , n ! 1:
i¼1
P 2 Summability of {li } implies convergence of YL to Y ; (sbc )2 1 i¼1 li ui in L1 , probability and distribution. By Chebyshov’s inequality and Eq. (3.52) for any 1 . 0 X 1 jli j, lim sup P(kXnL Xn k2 1) 3s 2 a2c 1 n !1 i.L which implies Eq. (3.58). This allows us to realize the double limit passage: d
Qn (dn2 K) ! (sbc )2
1 X
li u2i :
i¼1
It remains to apply Eq. (3.50) to complete the proof in the case under consideration. Case 2. Let bc ¼ 0. Then we can use Eq. (3.57) instead of Eq. (3.56) in the above argument. With this change the conclusion is that Qn (kn ) converges in distribution to zero. But then we recall that convergence in distribution to a constant implies convergence in probability to that constant, see Lemma 1.8.3(iii).
3.9.7 CLT for Quadratic Forms of Linear Processes: Version 2 Here we show that by imposing a stronger condition on the m.d.’s (up to fourth moments) it is possible to relax the L2 -proximity requirement in Eq. (3.48) from pffiffiffi o(1=n) to o(1= n): The next theorem derives from Mynbaev (2010). Theorem.
Suppose that
(i) the m.d. array {ent , F nt } satisfies conditions: E(e2nt jF n,t1 ) ¼ s 2 for all t and n; the third conditional moments are constant, but may depend on n and t, E(e3nt jF n,t1 ) ¼ ant , and the fourth moments are uniformly bounded, m4 ¼ supn,t Ee4nt , 1;
3.9 CONVERGENCE OF QUADRATIC FORMS OF RANDOM VARIABLES
127
(ii) the sequence {cj : j [ Z} is summable, ac , 1; (iii) the sequence {kn : n [ N} is L2 -close to some symmetric function K [ L2 ((0, 1)2 ) with the next rate of approximation pffiffiffi kkn dn2 Kk2 ¼ o(1= n);
(3:59)
(iv) the integral operator K with the kernel K is nuclear. Then we can assert that 1. If bc = 0, then the quadratic form Qn (kn ) ¼ v0n kn vn P converges in distribution to (sbc )2 i li u2i , where ui are independent standard normal and li are the eigenvalues of K. 2. If bc ¼ 0, then plimQn (kn ) ¼ 0. The proof is given in Sections 3.9.8 – 3.9.10.
3.9.8 Lemma on Mixed Fourth-Order Moments The next lemma is close to that of (Lee, 2004b, Lemma A.11). Lemma. Denote m pqrs ¼ Eenp enq enr ens for integer p, q, r, s [ Z. If the m.d. array {ent , F nt } satisfies condition (i) of Theorem 3.9.7, then
m pqrs ¼
8 4 <s ,
if [(p ¼ q) = (r ¼ s)] or [(p ¼ r) = (q ¼ s)] or [(p ¼ s) = (q ¼ r)], : 4 Eenp , if p ¼ q ¼ r ¼ s:
In all other cases m pqrs ¼ 0: Proof. Without loss of generality we can order the indices: p q r s: Consider four situations. 1. s . r: By definition of m.d.’s
m pqrs ¼ E[enp enq enr E(ens jF n,s1 )] ¼ 0: 2. s ¼ r . q: By condition (i) of Theorem 3.9.7 and orthogonality of m.d.’s
m pqrs ¼ E[enp enq E(e2nr jF n,r1 )] 0, if p , q, 2 ¼ s Eenp enq ¼ 4 s , if p ¼ q:
128
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
3. s ¼ r ¼ q . p: By condition (i) of Theorem 3.9.7
m pqrs ¼ E[enp E(e3nq jF n,q1 )] ¼ anq Eenp ¼ 0: 4. s ¼ r ¼ q ¼ p: By definition m pqrs ¼ Ee4np : As a result of the ordering the cases [(p ¼ r) = (q ¼ s)] and [(p ¼ s) = (q ¼ r)] are impossible. The case [(p ¼ q) = (r ¼ s)] is covered in item 2, while p ¼ q ¼ r ¼ s is contained in item 4. In “all other cases” s . r q p or s r q . p should be true. The equality of m pqrs to zero then follows from items 1 – 3. B
3.9.9 The Gauge Inequality For an n n matrix A denote g(A) ¼ [E(v0n Avn )2 ]
1=2
:
(3:60)
This possesses the properties of a seminorm (homogeneity and triangle inequality), but I call it a gauge just to avoid the trite “norm” or “seminorm”. After some experimenting with variance I understood that using the gauge for the problem at hand is better. Lemma. Under conditions (i) and (ii) of Theorem 3.9.7 for any matrices A, B such that the product AB is of size n n g(AB) (3s4 þ m4 )
1=2 2 ac kAk2 kBk2 :
Proof. Denoting a1 , . . . , ak the columns of A and b1 , . . . , bk the rows of B, by the T-decomposition (3.25) we have a0l vn ¼
X
eni (Tn al )i , bl vn ¼
i
X
enj (Tn bl )j ,
l ¼ 1, . . . , k:
j
Hence, 0
a01 vn
1
X
0
(Tn a1 )i
1
B B C C A0 vn ¼ @ . . . A ¼ eni @ . . . A, i a0k vn (Tn ak )i 0 1 1 0 1 (Tn b1 )j b vn X B C B C enj @ . . . A: Bvn ¼ @ . . . A ¼ j (Tn bk )j bk vn
3.9 CONVERGENCE OF QUADRATIC FORMS OF RANDOM VARIABLES
129
This implies v0n ABvn ¼ (A0 vn )Bvn ¼
X
eni enj
i,j
k X
(Tn al )i (Tn bl )j
l¼1
and k X
E(v0n ABvn )2 ¼
X
Eeni1 eni2 enj1 enj2
l,m¼1 ii ,i2 ,j1 ,j2 [Z
(Tn al )i1 (Tn bl ) j1 (Tn am )i2 (Tn bm ) j2 : By Lemma 3.9.8 many fourth-order moments vanish and those that don’t are equal to either s4 or Ee4np : E(v0n ABvn )2 ¼
k X l,m¼1
s4
X
[(Tn al )i (Tn am )i (Tn bl )j (Tn bm )j
i,j[Z
þ (Tn al )i (Tn bl )i (Tn am )j (Tn bm )j þ (Tn al )i (Tn bm )i (Tn am )j (Tn bl )j ] þ
k X X
Ee4ni (Tn al )i (Tn bl )i (Tn am )i (Tn bm )i :
l,m¼1 i[Z
The structure of this expression becomes much clearer if we use the notation ( , )l2 for the scalar product in l2 (Z):
E(v0n ABvn )2 ¼
k X
s4 [(Tn al , Tn am )l2 (Tn bl , Tn bm )l2
l,m¼1
þ (Tn al , Tn bl )l2 (Tn am , Tn bm )l2 þ (Tn al , Tn bm )l2 (Tn am , Tn bl )l2 ] þ
k X X
Ee4ni (Tn al )i (Tn bl )i (Tn am )i (Tn bm )i :
l,m¼1 i[Z
By boundedness of the T-operator (2.13) and conditions (i) and (ii) of Theorem 3.9.7 j(Tn x, Tn y)l2 j kTn xk2 kTn yk2 a2c kxk2 kyk2 , kTn xk1 kTn xk2 ac kxk2 , Ee4ni m4 :
130
CHAPTER 3
CONVERGENCE OF LINEAR AND QUADRATIC FORMS
Therefore E(v0n ABvn )2 3s4 a4c
k X
kal k2 kam k2 kbl k2 kbm k2
l,m¼1
þ m4
k X
kTn al k1 kTn bl k1
l,m¼1
3s4 a4c
X
j(Tn am )i (Tn bm )i j
i[Z
k X
!2 kal k2 kbl k2
l¼1
þ m4 a2c
k X
kal k2 kbl k2 kTn am k2 kTn bm k2 :
l,m¼1
Applying Ho¨lder’s inequality and, again, boundedness of Tn, E(v0n ABvn )2 (3s4 þ m4 )a4c
k X
! kal k22
l¼1
k X
! 2 kbl k2
l¼1
¼ (3s4 þ m4 )a4c kAk22 kBk22 :
B
3.9.10 Proof of Theorem 3.9.7 By the gauge inequality 3.9.9 (EjQn (kn ) Qn (dn2 K)j2 )
1=2
¼ (E[v0n (kn dn2 K)vn ]2 )
1=2
¼ g(kn dn2 K) ckIk2 kkn dn2 Kk2 pffiffiffi ¼ c n kkn dn2 Kk2 ¼ o(1): Instead of bound 3.9.2 use this bound. All other steps of the proof of Theorem 3.9.1 do not change.
CHAPTER
4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
R
asymptotically collinear regressors arise in a number of applications, both in linear and nonlinear settings. Examples are the log – periodogram analysis of long memory (see Robinson, 1995; Hurvich et al., 1998; Phillips, 1999 and references therein), the study of growth convergence (Barro and Sala-i-Martin, 2003), and NLS estimation (Wu, 1981). This chapter is based on Phillips’ 2007 paper. His contribution to the theory of regressions with asymptotically collinear regressors can be described as follows: EGRESSIONS WITH
1. He used the properties of SV functions to develop asymptotic expansions of some nonstochastic expressions that arise in regression analysis.
2. Based on those asymptotic expansions, he employed Brownian motion to derive central limit results for weighted sums of linear processes in which the weights are standardized SV regressors.
3. For the cases when the conventional scheme does not work (because of asymptotic collinearity of the regressors), he modified it so as to obtain convergence of the OLS estimator in a variety of practical situations.

This chapter is structured accordingly. The main results are contained in Sections 4.2, 4.4, 4.5, and 4.6. Section 4.2, called Phillips Gallery 1, covers his asymptotic expansions of nonstochastic expressions. Not all of them are applied later in the book, but I include them for two reasons: they may be useful in other applications, and they have helped me to guess some facts related to Lp-approximability. Section 4.4 is devoted to generalizations of the central limit results established by Phillips. The main point is that, for this sort of result, using Lp-approximability and my CLT 3.5.2 is preferable to recourse to Brownian motion. Here the reader will see that some expansions of nonstochastic expressions are also implied by Lp-approximability. In Section 4.5, named Phillips Gallery 2, we return to Phillips' exposition by going over applications. Section 4.6, which is about regressions with two SV
regressors, is a rigorous replacement for a heuristic argument given by Phillips in 2007. The analysis here reveals two important features of the model: 1. the asymptotic variance-covariance matrix jumps along certain rays in the parameter space and 2. there are infinitely many models with different asymptotic behavior of the functions of the sample size.
4.1 SLOWLY VARYING FUNCTIONS

4.1.1 Definition and Examples of Slowly Varying Functions

A positive measurable function $L$ on $[A,\infty)$, $A > 0$, is slowly varying (SV) if
$$ \lim_{x\to\infty} \frac{L(rx)}{L(x)} = 1 \quad \text{for any } r > 0. \qquad (4.1) $$
Examples of such functions are $L_1(x) = \log x$ and $L_2(x) = \log(\log x)$ because
$$ \frac{L_1(rx)}{L_1(x)} = \frac{\log r + \log x}{\log x} \to 1, \qquad \frac{L_2(rx)}{L_2(x)} = \frac{\log(\log r + \log x)}{\log(\log x)} = \frac{\log(\log x) + \log(1 + \log r/\log x)}{\log(\log x)} \to 1. $$
Similarly, $L_3(x) = 1/\log x$ and $L_4(x) = 1/\log(\log x)$ are SV. The function $L_5(x) = x^a$, $a \neq 0$, is not SV because $(rx)^a/x^a = r^a$ does not tend to 1 unless $r = 1$.

It turns out that condition (4.1) is very restrictive:

Theorem. (Seneta, 1985, p. 25) If $L$ satisfies Eq. (4.1), then $L^a$ for any real $a$ also satisfies that condition. Further, if $L$ and $M$ satisfy it, then so do their product and sum.

The statements regarding the power $L^a$ and the product $LM$ are trivial.
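The definition is easy to probe numerically. The following sketch (an illustration only, not part of the text) evaluates the ratio $L(rx)/L(x)$ for $L_1$, $L_2$, and the non-example $L_5$ at increasing $x$.

```python
import math

L1 = lambda x: math.log(x)                # SV
L2 = lambda x: math.log(math.log(x))      # SV
L5 = lambda x: x ** 0.3                   # not SV

r = 2.0
for x in (1e2, 1e4, 1e8, 1e16):
    print(f"x = {x:.0e}   L1: {L1(r*x)/L1(x):.4f}   "
          f"L2: {L2(r*x)/L2(x):.4f}   L5: {L5(r*x)/L5(x):.4f}")
# the first two ratios drift toward 1; the last one stays at 2**0.3
```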
4.1.2 Uniform Convergence Theorem

Theorem. (Seneta, 1985, p. 10) If $L$ is SV, then, for any fixed interval $[a,b]$ with $0 < a < b < \infty$, Eq. (4.1) holds uniformly in $r \in [a,b]$.

For the functions $L_1$ through $L_4$ this statement can be verified directly.

4.1.3 Representation Theorem

Theorem. (Seneta, 1985, p. 10) Let $L$ be defined on $[A,\infty)$, $A > 0$. Then $L$ is SV if and only if there exist a number $B \ge A$ and functions $m$, $\varepsilon$ on $[B,\infty)$ with the properties:
(i) $L(x) = \exp\big( m(x) + \int_B^x \varepsilon(t)\,dt/t \big)$,
(ii) $m$ is bounded, measurable, and the limit $c = \lim_{x\to\infty} m(x)$ exists ($c \in R$),
(iii) $\varepsilon$ is continuous on $[B,\infty)$ and $\lim_{x\to\infty} \varepsilon(x) = 0$.
The equation in (i) is called the Karamata representation. The dependence of $L$ on $B$ is not important and is therefore suppressed. However, $B$ may surface in some bounds later, and I try to avoid using $B$ for other purposes.

4.1.4 Growth Rates of Slowly Varying Functions

Theorem. (Seneta, 1985, p. 24) An SV function $L$ grows or declines very slowly in comparison with the power scale: for any $\gamma > 0$, $x^{\gamma} L(x) \to \infty$ and $x^{-\gamma} L(x) \to 0$ as $x \to \infty$.
4.1.5 Integrals of Slowly Varying Functions over Intervals Adjacent to the Origin

Theorem. (Seneta, 1985, p. 65) Let a number $h \ge 0$ and a function $f$ on $(0,\infty)$ be such that the integral $\int_0^b t^{-h} f(t)\,dt$ exists, where $0 < b < \infty$. Let $L$ be SV and bounded on each finite subinterval of $[0,\infty)$. If $h > 0$, then
$$ \frac{1}{L(x)} \int_0^b f(t) L(xt)\,dt \to \int_0^b f(t)\,dt, \qquad x \to \infty. $$
In the case $h = 0$, the same is true if $L$ is nondecreasing.
4.2 PHILLIPS GALLERY 1

Of the exhibits of this gallery, only Lemmas 4.2.6 and 4.2.7 are applied in later sections.
4.2.1 Phillips' Simplifying Assumption and the Notation L = K(ε)

Phillips (2007) argues that, in the development of an asymptotic theory of regression, little seems to be lost if one takes $m \equiv$ constant, because the asymptotic behavior of representation 4.1.3(i) is equivalent to that with $m \equiv$ constant. Formally, this is justified as follows. If $m$ is continuously differentiable, then
$$ L(x) = \exp\left( m(x) + \int_B^x \varepsilon(t)\,\frac{dt}{t} \right) = \exp\left( \int_B^x m'(t)\,dt + m(B) + \int_B^x \varepsilon(t)\,\frac{dt}{t} \right) = \exp\left( m(B) + \int_B^x \big( t\,m'(t) + \varepsilon(t) \big)\,\frac{dt}{t} \right). $$
So, if additionally $\lim_{t\to\infty} t\,m'(t) = 0$, we have a new representation of the same function $L$ with a constant $m$. Thus, the suggestion is to use only SV functions with the representation $L(x) = c_L \exp\big( \int_B^x \varepsilon(t)\,dt/t \big)$ and to write $L = K(\varepsilon)$ in this case, omitting the constant $c_L$ from the notation. The function $\varepsilon$ in this representation is called an $\varepsilon$-function of $L$.

Further, $\varepsilon$ can be extended to the segment $[0,B]$ in such a way that the integral $\int_0^B \varepsilon(t)/t\,dt$ exists [e.g., one can set $\varepsilon$ equal to 0 in a neighborhood of 0 and interpolate continuously between that neighborhood and $[B,\infty)$]. This amounts to redefining $L$ on $[0,B]$, which does not matter asymptotically because only a finite number of regression equations is affected. In any case, $L$ can be considered continuous and positive on $[0,\infty)$. From now on we assume that such adjustments have been made.

For some expansions we need to assume that $|\varepsilon(x)|$ is also SV and that $\varepsilon$ has the Karamata representation
$$ \varepsilon(x) = c_\varepsilon \exp\left( \int_B^x \eta(t)\,\frac{dt}{t} \right) \quad \text{for } x \ge B, $$
for some (possibly negative) constant $c_\varepsilon$, where $\eta$ is continuous and $\eta(x) \to 0$ as $x \to \infty$. In such cases we also write $\varepsilon = K(\eta)$, remembering that $\varepsilon$ can be negative. The number of different conditions in this theory may be daunting. To reduce it, in some cases we assume a little more than a puristic approach would require. For $L = K(\varepsilon)$,
$$ \varepsilon(x) = \frac{x L'(x)}{L(x)} \to 0 \quad \text{as } x \to \infty. \qquad (4.2) $$
Using this formula we calculate and collect in Table 4.1 the expressions for $\varepsilon$ and $\eta$ in the sequence $L = K(\varepsilon)$, $\varepsilon = K(\eta)$ (the role of the function $m(x) = \frac{1}{2}(\varepsilon(x) + \eta(x))$ is disclosed in Section 4.2.7). In Table 4.1 we denote $l_1(x) = \log x$, $l_2(x) = \log(\log x)$ and assume $\gamma > 0$. The table contains the functions of most practical interest, against which the plausibility of new assumptions should be checked.

TABLE 4.1 Basic SV Functions

$L_1 = l_1^{\gamma}$:  $\varepsilon = \gamma/l_1$,  $\eta = -1/l_1$,  $m = (\gamma-1)/(2 l_1)$,  $L\varepsilon m = \gamma(\gamma-1)\, l_1^{\gamma-2}/2$

$L_2 = l_2$:  $\varepsilon = 1/(l_1 l_2)$,  $\eta = -(1+l_2)/(l_1 l_2)$,  $m = -1/(2 l_1)$,  $L\varepsilon m = -1/(2 l_1^2)$

$L_3 = 1/l_1$:  $\varepsilon = -1/l_1$,  $\eta = -1/l_1$,  $m = -1/l_1$,  $L\varepsilon m = 1/l_1^3$

$L_4 = 1/l_2$:  $\varepsilon = -1/(l_1 l_2)$,  $\eta = -(1+l_2)/(l_1 l_2)$,  $m = -(2+l_2)/(2 l_1 l_2)$,  $L\varepsilon m = (2+l_2)/(2 l_1^2 l_2^3)$
4.2.2 Approximation of Sums by Integrals

To study sums that involve SV functions, Phillips' idea is to approximate the sums by integrals and then apply calculus to the integrals. The first step is realized here.

Lemma. (Phillips, 2007, Lemma 7.1) If $L = K(\varepsilon)$, then for integer $B \ge 1$
$$ \sum_{t=B}^{n} L(t) = \int_B^n L(t)\,dt + O(n^{\gamma}), \qquad n \to \infty, $$
where $\gamma > 0$ is arbitrarily small.

Proof. For a natural $k$, integration by parts gives ($[t]$ denotes the integer part of $t$)
$$ \int_k^{k+1} \Big( t - [t] - \tfrac12 \Big) L'(t)\,dt = \int_k^{k+1} t L'(t)\,dt - \Big( k + \tfrac12 \Big) \int_k^{k+1} L'(t)\,dt $$
$$ = (k+1)L(k+1) - kL(k) - \int_k^{k+1} L(t)\,dt - \Big( k + \tfrac12 \Big) \big[ L(k+1) - L(k) \big] = \tfrac12 \big( L(k+1) + L(k) \big) - \int_k^{k+1} L(t)\,dt. $$
Summation of equations like this yields
$$ \sum_{k=B}^{n} L(k) = \sum_{k=B}^{n-1} \tfrac12 \big( L(k+1) + L(k) \big) + \tfrac12 \big( L(B) + L(n) \big) = \tfrac12 \big( L(B) + L(n) \big) + \int_B^n L(t)\,dt + \int_B^n \Big( t - [t] - \tfrac12 \Big) L'(t)\,dt. \qquad (4.3) $$
Using $L'(t) = \varepsilon(t)L(t)/t$ [see Eq. (4.2)] and $|t - [t] - 1/2| \le 1/2$ (this is the idea of the proof) we bound the last integral as
$$ \left| \int_B^n \Big( t - [t] - \tfrac12 \Big) L'(t)\,dt \right| \le \frac12 \int_B^n \frac{L(t)|\varepsilon(t)|}{t}\,dt = \frac12 \int_B^c \frac{L(t)|\varepsilon(t)|}{t}\,dt + \frac12 \int_c^n \frac{L(t)}{t^{\gamma}}\,|\varepsilon(t)|\,t^{\gamma-1}\,dt. $$
Here, by Theorems 4.1.3 and 4.1.4 the constant $c > B$ can be chosen so that $|t^{-\gamma} L(t)| \le 1$ and $|\varepsilon(t)| \le 1$ for $t \ge c$. On $[B,c]$ the function $|L(t)\varepsilon(t)/t|$ is bounded. Hence,
$$ \left| \int_B^n \Big( t - [t] - \tfrac12 \Big) L'(t)\,dt \right| \le c_1 + \frac12 \int_c^n t^{\gamma-1}\,dt = O(n^{\gamma}). $$
Since $\frac12 (L(B) + L(n)) = O(n^{\gamma})$ by Theorem 4.1.4, the lemma follows from Eq. (4.3). $\blacksquare$
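For a quick numerical illustration of the lemma (not part of the proof), take $L(t) = \log t$, for which the integral has a closed form; the difference between the sum and the integral grows like $(\log n)/2$, that is, slower than any positive power of $n$.

```python
import numpy as np

for n in (10**2, 10**4, 10**6):
    s = np.log(np.arange(1, n + 1)).sum()      # sum_{t=1}^{n} log t
    integral = n * np.log(n) - n + 1           # exact value of int_1^n log t dt
    print(f"n = {n:>7}   sum - integral = {s - integral:8.4f}   (log n)/2 = {np.log(n)/2:8.4f}")
```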
4.2.3 Simple Integral Asymptotics

Lemma. If $L = K(\varepsilon)$, then $\int_1^n L(t)\,dt = nL(n)(1+o(1))$ and $\sum_{t=1}^{n} L(t) = nL(n)(1+o(1))$.

Proof. Choosing $b = 1$, $h \in (0,1)$, $f \equiv 1$ in Theorem 4.1.5 we get
$$ \frac{1}{nL(n)} \int_0^n L(t)\,dt = \frac{1}{L(n)} \int_0^1 L(ns)\,ds \to 1. $$
By continuity of $L$ and Theorem 4.1.4,
$$ \frac{1}{nL(n)} \int_0^1 L(t)\,dt \to 0. $$
From the above two equations
$$ \frac{1}{nL(n)} \int_1^n L(t)\,dt \to 1, $$
and the first statement follows upon multiplication of this equation by $nL(n)$. By Lemma 4.2.2,
$$ \sum_{t=1}^{n} L(t) = \int_1^n L(t)\,dt + O(n^{\gamma}) = nL(n) + nL(n)o(1) + nL(n)\,O\!\left( \frac{n^{\gamma-1}}{L(n)} \right). $$
Here $1/L(n)$ is SV and, with $\gamma \in (0,1)$, Theorem 4.1.4 implies $O\!\left( \frac{n^{\gamma-1}}{L(n)} \right) = o(1)$; the second statement follows. $\blacksquare$
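The lemma can be illustrated numerically; the sketch below (with $L$ shifted near the origin in the spirit of the adjustments of Section 4.2.1, and purely for illustration) shows that $\sum_{t\le n} L(t)/(nL(n))$ approaches 1, though only at the slow rate of order $\varepsilon(n)$.

```python
import numpy as np

def check(name, L):
    for n in (10**3, 10**5, 10**7):
        t = np.arange(1, n + 1, dtype=float)
        print(f"{name:>9}  n = {n:>8}   sum/(n*L(n)) = {L(t).sum() / (n * L(np.float64(n))):.4f}")

check("log",      lambda t: np.log(t))
check("1/log",    lambda t: 1.0 / np.log(t + 1.0))       # shifted so L is finite at t = 1
check("log(log)", lambda t: np.log(np.log(t + 2.0)))     # shifted so L is defined for small t
```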
4.2.4 Things Get Complicated

Lemma. Denote $M(n) = \dfrac{1}{n L^k(n)} \displaystyle\int_1^n L^k(t)\,dt$.
(i) If $L = K(\varepsilon)$, then $M(n) = 1 + o(1)$.
(ii) If, in addition, $\varepsilon = K(\eta)$, then $M(n) = 1 - k\varepsilon(n) + o(\varepsilon(n))$.
k
L (t) dt ¼ [tL
k
(t)]jn1
ðn k
1
¼ nLk (n) Lk (1) k
tLk1 (t)L0 (t) dt
1
ðn
Lk (t)1(t) dt:
(4:4)
1
Since Lk 1 is SV, to the end integral we can apply Lemma 4.2.3: I1 ¼ nLk (n) þ O(1) knLk (n)1(n)[1 þ o(1)] 1 k k knLk (n)1(n)[1 þ o(1)] ¼ nL (n) þ nL (n)1(n)O nLk (n)1(n) ¼ nLk (n) knLk (n)1(n)[1 þ o(1)] because nLk (n)1(n) ! 1: This proves the statement. Three elements of this proof are used below as a standard procedure: 1. A term that results from substitution of the lower limit of integration, like Lk (1), is O(1): 2. Since products and powers of SV functions are SV, Lemma 4.2.3 and Theorem 4.1.4 are applicable. 3. O(1) is subsumed by the remainder that results from application of Lemma 4.2.3.
138
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
(iii) For the integral at the right of Eq. (4.4) we have ðn I2 ;
k
ðn
k
tLk1 (t)L0 (t)1(t) dt
L (t)1(t) dt ¼ nL (n)1(n) k 1
1
ðn
tLk (t)10 (t) dt þ O(1)
1
ðn
k
¼ nL (n)1(n) k
k
ðn
2
Lk (t)1(t)h(t) dt þ O(1)
L (t)1 (t) dt 1
1
k
¼ nL (n)1(n) kI3 I4 þ O(1),
(4:5)
where ðn
k
ðn
2
L (t)1 (t) dt, I4 ¼
I3 ¼ 1
Lk (t)1(t)h(t) dt:
1
By the standard procedure we can combine Eqs. (4.4) and (4.5) and jump directly to k
k
I1 ¼ nL (n) þ O(1) knL (n)1(n) þ k
2
ðn
k
ðn
2
L (t)1 (t) dt þ k Lk (t)1(t)h(t) dt
1 k
k
2
k
1
2
k
¼ nL (n) knL (n)1(n) þ k nL (n)1 (n) þ knL (n)1(n)h(n) þ nLk (n)o[12 (n) þ 1(n)h(n)], which proves the proposition. (iv) For I3 and I4 the representations are I3 ¼ nLk (n)12 (n) þ O(1) k
ðn
tLk1 (t)L0 (t)12 (t) dt 2
1
¼ nLk (n)12 (n) k
ðn
Lk (t)13 (t) dt 2
ðn
Lk (t)12 (t)h(t) dt þ O(1),
1
I4 ¼ nLk (n)1(n)h(n) þ O(1) k
ðn
tLk1 (t)L0 (t)1(t)h(t) dt
1
1
tLk (t)10 (t)h(t) dt
ðn 1
tLk (t)1(t)10 (t) dt
1
1
ðn
ðn
tLk (t)1(t)h0 (t) dt
(4:6)
4.2 PHILLIPS GALLERY 1
ðn
k
¼ nL (n)1(n)h(n) k
139
Lk (t)12 (t)h(t) dt
1
ðn
k
ðn
2
L (t)1(t)h (t) dt
1
Lk (t)1(t)h(t)m(t) dt þ O(1):
(4:7)
1
Equations (4.4) through (4.7) imply I1 ¼ nLk (n) þ O(1) knLk (n)1(n) þ k2 I3 þ kI4 k
k
2
k
2
¼ nL (n) þ O(1) knL (n)1(n) þ k nL (n)1 (n) k
3
ðn
Lk (t)13 (t) dt
1
2k 2
ðn
Lk (t)12 (t)h(t) dt þ knLk (n)1(n)h(n) k 2
1
ðn k
ðn
Lk (t)12 (t)h(t) dt
1
Lk (t)1(t)h2 (t) dt k
1
ðn
Lk (t)1(t)h(t)m(t) dt
1
or, applying the standard procedure, I1 ¼ nLk (n) knLk (n)1(n) þ k 2 nLk (n)12 (n) þ knLk (n)1(n)h(n) k3 nLk (n)13 (n) 3k2 nLk (n)12 (n)h(n) knLk (n)1(n)h2 (n) knLk (n)1(n)h(n)m(n) þ nLk (n)o[13 (n) þ 12 (n)h(n) þ 1(n)h2 (n) þ 1(n)h(n)m(n)]:
B
EXAMPLE 4.1. In the logarithmic case $L(t) = \log t$ we have
$$ \varepsilon(t) = \frac{t L'(t)}{L(t)} = \frac{1}{\log t}, \qquad \eta(t) = \frac{t \varepsilon'(t)}{\varepsilon(t)} = -\frac{1}{\log t}, \qquad m(t) = \frac{t \eta'(t)}{\eta(t)} = \eta(t). $$
Part (iv) of the lemma then gives the expansion
$$ \int_1^n \log^k t\,dt = n\log^k n - kn\log^{k-1} n + k^2 n\log^{k-2} n - kn\log^{k-2} n - k^3 n\log^{k-3} n + 3k^2 n\log^{k-3} n - kn\log^{k-3} n - kn\log^{k-3} n + o(n\log^{k-3} n) $$
$$ = n\log^k n - kn\log^{k-1} n + k(k-1)\, n\log^{k-2} n - k(k-1)(k-2)\, n\log^{k-3} n + o(n\log^{k-3} n). \qquad (4.8) $$
However, successive integration by parts gives the exact result
$$ \int_1^n \log^k t\,dt = n\log^k n - k\int_1^n \log^{k-1} t\,dt = n\log^k n - kn\log^{k-1} n + k(k-1)\, n\log^{k-2} n - \cdots + (-1)^k k!\,(n-1). $$
The expansion in Eq. (4.8) is accurate to the fourth order.
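The following sketch (an illustration only) compares the four-term expansion (4.8) with the exact value of the integral, computed from the closed-form antiderivative, for $k = 4$; the relative error shrinks as $n$ grows, as the $o(n\log^{k-3} n)$ remainder dictates.

```python
import math

def exact_integral(n, k):
    # antiderivative: int log^k t dt = t * sum_{j=0}^{k} (-1)^j k!/(k-j)! * (log t)^{k-j}
    def F(t):
        return t * sum((-1) ** j * math.factorial(k) // math.factorial(k - j)
                       * math.log(t) ** (k - j) for j in range(k + 1))
    return F(float(n)) - F(1.0)

def expansion(n, k):                       # the four terms of Eq. (4.8)
    ln = math.log(n)
    return n * (ln ** k - k * ln ** (k - 1) + k * (k - 1) * ln ** (k - 2)
                - k * (k - 1) * (k - 2) * ln ** (k - 3))

k = 4
for n in (10**3, 10**6, 10**9):
    ex, ap = exact_integral(n, k), expansion(n, k)
    print(f"n = {n:.0e}   exact = {ex:.6e}   expansion = {ap:.6e}   rel. err = {abs(ex - ap)/ex:.2e}")
```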
4.2.5 Expansion Related to Simple Regression For the simple regression yt ¼ a þ bL(t) þ ut ,
t ¼ 1, . . . , n,
one version of the formulas for the OLS estimates a^ and b^ is (Theil, 1971, Section 3.1) " #1 n n X X t 2 b^ b ¼ (L(t) L)u (L(t) L) (4:9) t¼1
t¼1
a^ a ¼
1 n
n X
b^ b) ut L(
(4:10)
t¼1
where 1 L ¼ n
n X
L(t)
t¼1
is the average. This explains the interest in the next proposition. Lemma. (Phillips, 2007, Lemma 7.3) If L ¼ K(1), 1 ¼ K(h), h ¼ K(m), and h(n) ¼ o(1(n)), then n 1X 2 ¼ L2 (n)12 (n)(1 þ o(1)): (L(t) L) n t¼1 Proof. We use Lemma 4.2.4(iii) with k ¼ 2 and k ¼ 1. The argument n is omitted everywhere. " #2 n n n X X 1X 1 1 2 2 ¼ (L(t) L) L (t) L(t) n t¼1 n t¼1 n t¼1 ¼ L2 {1 21 þ 412 þ 21h þ o(12 þ 1h) [1 1 þ 12 þ 1h þ o(12 þ 1h)]2 } ¼ L2 [1 21 þ 412 þ 21h 1 12 14 12 h2 þ 21 212 21h þ 213 þ 212 h 213 h þ o(12 þ 1h)] ¼ L2 [12 þ o(12 )], where we remember that h(n) ¼ o(1(n)).
(4:11) B
4.2 PHILLIPS GALLERY 1
141
4.2.6 Second-Order Regular Variation (Point-wise Version) Denote G(t, n) ¼
L(t) L(n) L(n)1(n)
(4:12)
and call this function a G-function of L. Lemma. [Phillips, 2007, Eq. (60)] If L ¼ K(1) and 1 is SV in the general sense of Section 4.1.1, then G(rn, n) ¼ log r[1 þ o(1)] uniformly in r [ [a, b], for any 0 , a , b , 1. Proof. From Karamata representation
L(rn) ¼ log L(n)
ðn
dt 1(t) ¼ 1(n) t
rn
ð1
1(ns) ds : 1(n) s
r
The conditions r [ [a, b] and
s[
[r, 1] [1, r]
if r , 1; if r . 1,
imply s [ [min{1, a}, max{1, b}]. By the uniform-convergence Theorem 4.1.2 1(ns)=1(n) ! 1 uniformly in s, so
L(rn) ¼ 1(n)[1 þ o(1)] log L(n)
ð1
ds ¼ 1(n) log r[1 þ o(1)]: s
r
This implies L(rn) 1 ¼ exp{1(n) log r[1 þ o(1)]} 1 L(n) ¼ 1(n) log r[1 þ o(1)] uniformly in r [ [a, b]:
B
Note, in passing, that L is second-order regularly varying [see de Haan and Resnick (1996)] in the sense that limn !1 G(rn, n) ¼ log r, r . 0. Equation (4.1) is a firstorder regular variation (RV) in this terminology.
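A numerical illustration of the lemma (not part of the text): for $L_2 = \log(\log x)$ with its $\varepsilon$-function from Table 4.1, the G-function of Eq. (4.12) is already close to $\log r$ for moderate $n$.

```python
import math

L   = lambda x: math.log(math.log(x))                        # L2 from Table 4.1
eps = lambda x: 1.0 / (math.log(x) * math.log(math.log(x)))  # its epsilon-function

def G(t, n):                                                 # Eq. (4.12)
    return (L(t) - L(n)) / (L(n) * eps(n))

for n in (1e4, 1e8, 1e16):
    row = "  ".join(f"r={r}: G={G(r*n, n):+.4f} (log r={math.log(r):+.4f})"
                    for r in (0.25, 0.5, 2.0))
    print(f"n = {n:.0e}   {row}")
```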
142
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
4.2.7 Third-Order Regular Variation Lemma.
(Phillips, 2007, Lemma 7.5) If L ¼ K(1), 1 ¼ K(h), and h is SV, then
G(rn, n) log r ¼ m(n) log2 r þ o(1(n) þ h(n)) uniformly in r [ [a, b], for any 0 , a , b , 1. Here m(n) ¼ 12[1(n) þ h(n)]: Proof. By Lemma 4.2.6, applied to 1 instead of L, 1(rn) ¼ 1 þ h(n) log r þ o(h(n) log r) uniformly in r [ [min{a, 1}, max{b, 1}]: 1(n) The cases r , 1 and r . 1 are similar, so we consider just r , 1: If r t=n 1 and r [ [a, b], then t=n [ [min{a, 1}, max{b, 1}] and the above equation can be applied in
L(rn) ¼ log L(n)
ðn
dt 1(t) ¼ 1(n) t
rn
¼ 1(n)
ðn
1(n(t=n)) dt 1(n) t
rn
ðn h
1 þ h(n) log
t t i dt þ o h(n) log n n t
rn
ð1 ¼ 1(n) log r 1(n)h(n)[1 þ o(1)]
log s
ds s
r
1 ¼ 1(n) log r þ 1(n)h(n) log2 r [1 þ o(1)]: 2 This equation and the approximation e x 1 ¼ x þ
x2 þ o(x 2 ), x ! 0, give 2
L(rn) 1 2 1 ¼ exp 1(n) log r þ 1(n)h(n) log r[1 þ o(1)] 1 L(n) 2 1 1(n)h(n) log2 r þ o(1(n)h(n) log2 r) 2 1 1 þ 12 (n) log2 r þ 12 (n)h(n)log3 r 2 2 1 ¼ 1(n) log r þ 1(n)[1(n) þ h(n)] log2 r þ o(12 (n) þ 1(n)h(n)): 2
¼ 1(n) log r þ
B
EXAMPLE 4.2. For $L(n) = 1/\log n$ we have $\varepsilon(n) = -1/\log n$, $\eta(n) = -1/\log n$ and, by direct expansion,
$$ \frac{L(rn)}{L(n)} - 1 = \frac{-\log r}{\log r + \log n} = \frac{-\log r}{\log n}\,\frac{1}{1 + \log r/\log n} = \frac{-\log r}{\log n} \sum_{j=0}^{\infty} (-1)^j \left( \frac{\log r}{\log n} \right)^j \quad \text{if } |\log r| < \log n, $$
which agrees with the third-order expansion given in the lemma.
4.3 SLOWLY VARYING FUNCTIONS WITH REMAINDER

4.3.1 Definition and Notation L = K(ε, φ)

Phillips (2007, Lemma 7.4) establishes the important property that
$$ \int_0^1 \left[ \frac{L(rn)}{L(n)} - 1 \right]^k dr = (-1)^k k!\,\varepsilon^k(n)[1 + o(1)], \qquad n \to \infty, \qquad (4.13) $$
for any natural $k$, but its proof is incomplete. The full proof given in Section 4.3.7 relies on the theory of SV functions with remainder due to Aljančić et al. (1955). We shall use the interpretation of that theory contained in the appendix by Shiganov in Seneta (1985).

Let us call a remainder a positive function $\varphi$ on $[0,\infty)$ with the properties:
1. $\varphi$ is nondecreasing and $\varphi(x) \to \infty$ as $x \to \infty$;
2. there exist positive numbers $u = u_\varphi$ and $X$ such that $x^{-u}\varphi(x)$ is nonincreasing on $[X,\infty)$.

A positive measurable function $L$ defined on $[0,\infty)$ is called SV with remainder $\varphi$ if for any $\lambda > 0$
$$ L(\lambda x)/L(x) = 1 + O(1/\varphi(x)), \qquad x \to \infty. $$
It is the same definition as before, augmented by a bound on the rate of convergence. For the examples considered in Section 4.2.1 the functions $L_i = K(\varepsilon_i)$ are SV with remainder $\varphi_i(x) = 1/|\varepsilon_i(x)|$, and the number $u > 0$ can be taken arbitrarily close to 0. If $L = K(\varepsilon)$ and $L$ is SV with remainder $\varphi$, we write $L = K(\varepsilon, \varphi)$.
4.3.2 Basic Properties of Slowly Varying Functions with Remainder

Theorem.
(i) (Seneta, 1985, p. 100) For an SV function with remainder, the Karamata representation (item 1 of Section 4.3.1) holds with the corresponding bounds on the rates:
$$ m(x) - c = O(1/\varphi(x)), \qquad \varepsilon(x) = O(1/\varphi(x)). \qquad (4.14) $$
(ii) (Seneta, 1985, p. 101) The uniform convergence theorem holds with the following bound on the rate of convergence: if $L$ is SV with remainder $\varphi$, then
$$ \sup\left\{ \left| \frac{L(\lambda x)}{L(x)} - 1 \right| : \lambda \in [a,b] \right\} = O(1/\varphi(x)), \qquad x \to \infty, \qquad (4.15) $$
for any fixed $0 < a < b < \infty$.
4.3.3 Bounding the Log by the Power Function

We need bounds of the type
$$ 2 + \log\lambda \le c_1 \lambda^{a} \quad \text{for } \lambda \ge 1 \qquad (4.16) $$
and
$$ 2 - \log\lambda \le c_2 \lambda^{-a} \quad \text{for } 0 < \lambda \le 1, \qquad (4.17) $$
where $a > 0$. Equation (4.17) is reduced to Eq. (4.16) by inverting $\lambda$. The precise values of $c_1$, $c_2$ do not matter much, and therefore in the next lemma bound (4.16) is obtained for all $\lambda > 0$.

Lemma. Equations (4.16) and (4.17) are true with $c_1 = c_2 = \frac{1}{a} e^{2a-1}$.

Proof. Consider $f(\lambda) = c_1 \lambda^{a} - 2 - \log\lambda$. The first-order condition is
$$ f'(\lambda) = c_1 a \lambda^{a-1} - \frac{1}{\lambda} = 0, \qquad \text{that is,} \qquad c_1 a \lambda^{a} = 1, $$
so the minimum point is $\lambda_0 = [1/(c_1 a)]^{1/a}$ (at 0 and $\infty$ the function tends to $\infty$). $c_1$ is determined from the tangency condition $f(\lambda_0) = 0$:
$$ c_1 \frac{1}{c_1 a} - 2 - \frac{1}{a}\log\frac{1}{c_1 a} = 0; \qquad \log\frac{1}{c_1 a} = 1 - 2a; \qquad c_1 = \frac{1}{a} e^{2a-1}. $$
Substituting $1/\lambda$ for $\lambda$ in Eq. (4.17) we get $2 + \log\lambda \le c_2 \lambda^{a}$. Hence, $c_2 = c_1$. $\blacksquare$
4.3.4 The Case of Large l The purpose of this and Section 4.3.5 is to complement the uniform convergence theorem (4.15) with statements that cover large and small l. Denote r(l, x) ¼ L(lx)=L(x), U(l, x) ¼ log r(l, x):
Using an elementary bound jex 1j jxjejxj we see that it suffices to estimate the right-hand side of jr(l, x) 1j ¼ jeU(l,x) 1j jU(l, x)jejU(l,x)j :
(4:18)
Lemma. (Seneta, 1985, p. 102) If L is SV with remainder f, then for any a . 0 there exist constants Ma . 0 and Ba B such that jr(l, x) 1j Ma la =f(x) for all x Ba
and
l 1:
Proof. Let x B and l 1. The Karamata representation 4.1.3(i) implies lðx
U(l, x) ¼ log L(lx) log L(x) ¼ m(lx) m(x) þ
1(t)
dt , t
(4:19)
x
where lx x: By Eq. (4.14) there exists a constant K . 0 such that jm(x) cj K=f(x),
j1(x)j K=f(x):
(4:20)
Since f is nondecreasing, we can use Eqs. (4.19) and (4.20) to get lðx
jU(l, x)j jm(lx) cj þ jc m(x)j þ
j1(t)j
dt t
x
K K þ þK f(lx) f(x)
lðx
1 dt f(t) t
x
2K K þ f(x) f(x)
lðx
dt ¼ K(2 þ log l)=f(x): t
(4:21)
x
Combining this with Eq. (4.18) we have jr(l, x) 1j
K(2 þ log l) 2K=f(x) K=f(x) e l : f(x)
By Lemma 4.3.3 we can dominate the factor 2 þ log l with la=2 : jr(l, x) 1j
Kc1 2K=f(x) K=f(x)þa=2 e l , f(x)
(4:22)
where c1 ¼ c1 (a=2). Since f(x) ! 1, there is Ba B (B is the lower limit of integration in the Karamata representation) such that K=f(x) a=2 for x Ba :
146
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
Therefore from Eq. (4.22) we finally obtain jr(l, x) 1j
Kc1 a a e l ¼ Ma la =f(x) for f(x)
x Ba , l 1,
where Ma ¼ Kc1 ea :
B
4.3.5 The Case of Small l Lemma 4.3.4 is given just for completeness and illustrative purposes. Only Lemma 4.3.5 is used subsequently. It means that the larger x, the closer l is allowed to be to 0. Lemma. (Seneta, 1985, p. 102) If L is SV with remainder f, then for any b . u [where u is the number from item 2 of Section 4.3.1(ii)] there exist constants Mb . 0 and Bb B such that jr(l, x) 1j Mb lb =f(x) for all x Bb
and
Bb =x l 1:
Proof. In (Seneta, 1985, p. 102) there is a typo: this lemma is stated with b . 0 instead of b . u. Let x B and B=x l 1. Since lx x, instead of Eq. (4.19) we have ðx U(l, x) ¼ m(lx) m(x)
1(t)
dt t
lx
and instead of Eq. (4.21) 2K K þ jU(l, x)j f(lx) f(lx)
ðx
dt ¼ K(2 log l)=f(lx): t
(4:23)
lx
Fix some b . u. Using monotonicity of f and that it increases to 1 at 1, from Bb xl we have K=f(lx) K=f(Bb ) , (b u)=2 for a sufficiently large Bb B. Then by Eq. (4.23) jU(l, x)j
bu bu (2 log l) ¼ b u log l: 2 2
(4:24)
However, by property 4.3.1(ii) of f the inequality lx x implies (lx)u f(lx) xu f(x) and
1=f(lx) lu =f(x):
Hence, from Eq. (4.23) jU(l, x)j K lu (2 log l)=f(x):
(4:25)
Using Eq. (4.25) for the first factor at the right of Eq. (4.18) and Eq. (4.24) for the second factor, we see that jr(l, x) 1j
K lu (2 log l)ebu l((bu )=2) : f(x)
To dominate 2 log l, in Eq. (4.17) put a ¼ (b u)=2 . 0. Then jr(l, x) 1j
K bu u((bu )=2)=((bu )=2) e c2 l ¼ Mb lb =f(x), f(x)
where Mb ¼ Kc2 ebu .
B
Remark 4.1. Since in practical cases the number u can be arbitrarily close to 0, the number b . u can also be as close to 0 as desired.
4.3.6 Assumption 4.1 (for Second-Order Regular Variation) The function L is SV with remainder and of form L ¼ K(1) where 1 is SV in the general sense of Section 4.1.1 and the remainder f1 (with properties 1 and 2 from Section 4.3.1) with some positive c satisfies 1 c j1(x)j f1 (x) cf1 (x)
for all x c:
(4:26)
As a result of the importance of this assumption, we write it out completely: 1. L has Karamata representation 0x 1 ð dt L(x) ¼ cL exp@ 1(t) A t
for x B
B
for some B . 0. Here cL . 0 is a constant, 1 is a continuous function and 1(x) ! 0 as x ! 1. Further, L is continuous on [0, 1). [This part of the assumption is written as L ¼ K(1) for short.] 2. 1 is SV in the sense of the general definition in Section 4.1.1. 3. There exists a function f1 on [0, 1) with properties: (3a) f1 is positive, nondecreasing on [0, 1), f1 (x) ! 1 as x ! 1, and there exist positive numbers u, X such that xu f1 (x) is nonincreasing on [X, 1); (3b) 1(x) is quasi-monotone for large x in the sense that with some positive constant c satisfies Eq. (4.26).
We write L ¼ K(1, f1 ) for short to mean that L satisfies Assumption 4.1. In all practical examples from Section 4.2.1, Assumption 4.1 holds with f1 (x) ¼ 1=j1(x)j.
4.3.7 Second-Order Regular Variation (Integral Version) Lemma. (Phillips, 2007, Lemma 7.4) If Assumption 4.1 holds and u ¼ uf [ (0, 1=k), then Eq. (4.13) is true. Proof. Using the G-function (4.12) and the identity ð1
logk r dr ¼ (1)k k!
(4:27)
[Gk (rn, n) logk r] dr ! 0:
(4:28)
0
we rewrite Eq. (4.13) as ð1 0
Let 0 , d , 1. Step 1. Lemma 4.2.6 implies Gk (rn, n) ¼ logk r[1 þ o(1)] uniformly in r [ [a, b]:
(4:29)
Hence, ð1
[Gk (rn, n) logk r] dr ! 0, n ! 1:
(4:30)
d
Step 2. The right inequality in Eq. (4.26) means that L is SV with remainder f1 and allows us to apply Lemma 4.3.5. For n . Bb =d the interval (Bb =n, d) is not empty and jG(rn, n)j
Mb r b f1 (n)j1(n)j
for n Bb =d and
Bb =n r d:
Hence, using also the left inequality in Eq. (4.26), we have ðd
k
ðd
jG (rn, n)j dr Bb =n
0
k Mb rb dr f1 (n)j1(n)j
Mb c1
k ðd 0
rbk dr ¼ c4 d1bk :
(4:31)
This tends to zero if b [ (0, 1=k). This condition can be satisfied because b is arbitrarily close to u. Step 3. If 0 , r Bb =n, then rn Bb and L(rn) c by continuity of L. As a result, k 1 c þ 1 , jGk (rn, n)j j1(n)jk L(n) where the function on the right is SV. From Theorem 4.1.4 it follows that Bð b =n
k 1 c Bb ! 0, þ1 jG (rn, n)j dr k L(n) n j1(n)j k
n ! 1:
(4:32)
0
Equations (4.30), (4.31), and (4.32), together with d ð logk r dr ! 0, d ! 0, 0
show that we can choose first a small d and then a large n to make the left side of Eq. (4.28) as small as desired. B
4.4 RESULTS BASED ON LP -APPROXIMABILITY This section demonstrates that Assumption 4.1 (Section 4.3.6), coupled with an assumption on linear processes, is sufficient for all asymptotic results.
4.4.1 Theorem on Lp -Approximability of G Theorem. (Mynbaev, 2009) For p [ [1, 1) and integral k 0 define a vector wn [ Rn by wnt ¼ n1=p Gk (t, n),
t ¼ 1, . . . , n:
If L ¼ K(1, f1 ) and puk , 1, then {wn } is Lp -close to fk (x) ¼ logk x: Proof. The case k ¼ 0 is trivial. The definitions of wn and interpolation operator Dnp (Section 2.1.6) give Dnp wn ¼
n X
Gk (t, n)1it :
t¼1
This is equivalent to n equations (Dnp wn )(u) ¼ Gk (t, n) for u [ it ,
t ¼ 1, . . . , n:
(4:33)
The condition u [ it is equivalent to the condition that t is an integer satisfying t nu þ 1 , t þ 1 which, in turn, is equivalent to t ¼ [nu þ 1]. Hence, the above n equations take a compact form (Dnp wn )(u) ¼ Gk ([nu þ 1], n),
0 u , 1:
(4:34)
Let 0 , d 1=2 and apply Lemma 4.3.5. With the number Bb from that lemma for n . n1 ; Bb =d the interval (Bb =n, d) is not empty, and by the triangle inequality kDnp wn fk k p,(0,1) kDnp wn fk k p,(d,1) þ k fk k p,(0,d) þ kDnp wn k p,(0,Bb =n) þ kDnp wn k p,(Bb =n,d) :
(4:35)
Obviously, k fk k p,(0,d) ! 0 as d ! 0. For the other three terms we consider three cases. Case 1. d u , 1. According to Eq. (4.29)
1 G (rn, n) ¼ log r[1 þ o(1)] uniformly in r [ d, 1 þ 2Bb k
k
(4:36)
Defining r ¼ [nu þ 1]=n, from the inequality nu , [nu þ 1] nu þ 1 we have 1 1 1 du,r uþ ,1þ 1þ , n n1 2Bb
(4:37)
1 : r [ d, 1 þ 2Bb
(4:38)
so that r ¼ u þ o(1) and Equations (4.36) and (4.38) lead to Gk ([nu þ 1], n) logk u ¼ o(1) uniformly in u [ (d, 1): This proves that kDnp wn fk k p,(d,1) ! 0, n ! 1:
(4:39)
Case 2. Bb =n u , d. Let n . n2 ; max{n1 , 2}: Then Eq. (4.37) and the conditions u [ [Bb =n, d), n . n2 imply Bb 1 1 u , r u þ , d þ 1: n n n2
This means we can apply Lemma 4.3.5, the left inequality of Eq. (4.26) and Eq. (4.37) to get
Mb jG ([nu þ 1], n)j b r f1 (n)j1(n)j k
k
Mb c1
k u
bk
Bb ,d : for u [ n
Fixing b [ (u, 1=( pk)) we have, with a new constant c2 (independent of n and d), ðd
ðd
p
upbk du ¼
jDnp wn j du c2 Bb =n
c2 d1pbk : 1 pbk
(4:40)
0
Case 3. 0 , u , Bb =n: In this case [nu þ 1] nu þ 1 , Bb þ 1 and L([nu þ 1]) c by the assumed continuity of L: Hence, jG([nu þ 1], n)j (c=L(n) þ 1)=j1(n)j
(4:41)
and kDnp wn k p,(0,Bb =n)
c 1 k Bb 1=p þ1 : n L(n) j1(n)j
(4:42)
Here the expression on the right tends to zero because the expression in the square brackets is a SV function by Theorem 4.1.1, and Theorem 4.1.4 applies to the whole expression. From Equations (4.39), (4.40), and (4.42) we see that we can choose first a small d and then a large n to make the left side of Eq. (4.35) as small as desired. B
4.4.2 Second-Order Regular Variation (Discrete Version) Corollary. If L ¼ K(1, f1 ) and uk , 1, then n 1X Gk (t, n) ¼ (1)k k! n !1 n t¼1
lim
Proof. Put p ¼ 1 in Theorem 4.4.1. Equations (4.27) and (4.33) imply ð1 ð1 n 1 X k k k G (t, n) (1) k! ¼ Dn1 wn du log udu n t¼1 0
0
kDn1 wn fk k1,(0,1) ! 0:
B
4.4.3 Averages of Slowly Varying Functions The next proposition is a discrete analog of Lemma 4.2.4(ii). Corollary. If L ¼ K(1, f1 ) and u , 1, then n 1 X Lk (t) ¼ 1 k1(n)[1 þ o(1)]: nLk (n) t¼1
Proof. Letting k ¼ 1 in the corollary in Section 4.4.2 yields n 1X G(t, n) ¼ [1 þ o(1)], n t¼1
which can be rearranged to n 1 X L(t) ¼ 1 1(n)[1 þ o(1)]: nL(n) t¼1
(4:43)
The 1-function of Lk is x(Lk (x))0 =Lk (x) ¼ k1(x) and L(lx)=L(x) ¼ 1 þ O(1=f1 (x)) implies Lk (lx)=Lk (x) ¼ 1 þ O(1=f1 (x)). Hence, Lk ¼ K(k1, f1 ), so Eq. (4.43), B applied to Lk , proves the corollary.
4.4.4 Averages of Squared Deviations from the Mean The following proposition is an analog of Lemma 4.2.5. Corollary. If L ¼ K(1, f1 ) and 2u , 1, then n 1X 2 ¼ L2 (n)12 (n)[1 þ o(1)]: (L(t) L) n t¼1
Proof. In the identity E(Y EY)2 ¼ EY 2 (EY)2 take the random variable Y to be a discrete variable with values Yt ¼ L(t)=L(n) 1, t ¼ 1, . . . , n, and uniform probability distribution p1 ¼ ¼ pn ¼ 1=n. Then EY ¼ Y ¼ L=L(n) 1 and The above identity gives Yt EY ¼ (L(t) L)=L(n). 2 2 !2 n n n 1X L(t) L 1X L(t) 1X L(t) 1 1 ¼ : n t¼1 L(n) n t¼1 L(n) n t¼1 L(n) This implies " #2 n n n X X X 1 1 1 2 2 ¼ (L(t) L) G (t, n) G(t, n) : nL2 (n)12 (n) t¼1 n t¼1 n t¼1 To finish the proof, it remains to apply Corollary 4.4.2 with k ¼ 1 and k ¼ 2.
B
4.4.5 Summary on Normalizing Factors The main sequences of weights arising in the Phillips approach are (see Section 4.5) xn ¼ (L(1), . . . , L(n)), . . . , L(n) L), yn ¼ (L(1) L, zn ¼ ((L(1) L(n))k , . . . , (L(n) L(n))k ): From Corollaries 4.4.3, 4.4.4, and 4.4.2, respectively, we see that pffiffiffi nL(n)[1 þ o(1)], pffiffiffi kyn k2 ¼ nL(n)j1(n)j[1 þ o(1)], pffiffiffiffiffiffiffiffiffiffi pffiffiffi kzn k2 ¼ n(L(n)j1(n)j)k (2k)![1 þ o(1)]:
kxn k2 ¼
pffiffiffi pffiffiffi Hence, instead of k k2 -norms we can use nL(n) for xn , nL(n)1(n) for yn (the sign pffiffiffi does not matter because of the symmetry of normal distributions) and n(L(n)1(n))k for zn . This explains the choice of weights in Section 4.4.6.
4.4.6 Central Limit Results for Linear Regression Here we need a new assumption. 4.4.6.1 Assumption 4.2 (On Linear Process) Let {cj : j [ Z} be a sequence P 2 of numbers satisfying j[Z jcj j , 1, and let {ej : j [ Z} be a m.d. such that et 2 2 are uniformly integrable and E(et jF t1 ) ¼ se for all t. Here {F t } is an increasing sequence of s-fields. A linear process {uj : j [ Z} is defined by ut ¼
X
cj etj ,
t [ Z:
j[Z
(Phillips 2007, Lemma 2.1) makes a stronger assumption on the linear process than in the next lemma. P
Lemma. Denote $\sigma^2 = \sigma_e^2 \big( \sum_{j\in Z} c_j \big)^2$. Under Assumptions 4.1 (Section 4.3.6) and 4.2 (Section 4.4.6.1) the following statements are true:
(i) If $2u < 1$, then $\dfrac{1}{\sqrt{n}\,L(n)} \sum_{t=1}^{n} L(t)u_t \xrightarrow{d} N(0,\sigma^2)$.
(ii) If $2u < 1$, then $\dfrac{1}{\sqrt{n}\,L(n)\varepsilon(n)} \sum_{t=1}^{n} (L(t)-\bar L)u_t \xrightarrow{d} N(0,\sigma^2)$.
(iii) If $2uk < 1$, then $\dfrac{1}{\sqrt{n}} \sum_{t=1}^{n} G^k(t,n)u_t \xrightarrow{d} N(0,\sigma^2 (2k)!)$.
P In the case j[Z cj ¼ 0, everywhere convergence in distribution can be replaced by convergence in probability to zero. Proof. By Theorem 3.5.2 it is enough to establish L2 -approximability of the sequence of weights in question. (i) Setting p ¼ 2, k ¼ 1 in Theorem 4.4.1 gives 2 ð1 L([nu þ 1]) L(n) du ! 0: log u L(n)1(n) 0
Multiply this relation by 12 (n) ! 0 to obtain 2 ð1 L([nu þ 1]) du ! 0: 1 L(n) 0
In a similar way to the transition from Eq. (4.33) to Eq. (4.34) we have n 1 1 X L([nu þ 1]) (L(1), . . . , L(n)) ¼ : L(t)1it ¼ Dn2 pffiffiffi L(n) t¼1 L(n) nL(n) 1 (L(1), . . . , L(n)) is L2 -close to g ; 1. This shows that the sequence pffiffiffi nL(n) (ii) From Eq. (4.43) we conclude that L ¼ L(n) L(n)1(n)[1 þ o(1)] and that the sequence of weights in statement (ii) is 1 . . . , L(n) L) (L(1) L, wn ¼ pffiffiffi nL(n)1(n) 1 1 þ o(1) ¼ pffiffiffi (G(1, n), . . . , G(n, n)) þ pffiffiffi (1, . . . , 1): n n It is easy to see that the second sequence on the right is L2 -close to g ; 1. The first sequence is L2 -close to f1 by Theorem 4.4.1. Hence, wn is L2 -close to g1 (x) ; 1 þ log x. The statement follows from the fact that Ð1 2 0 g1 (x) dx ¼ 1. Statement (iii) follows directly from Theorem 4.4.1 and Eq. (4.27). B
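A simulation sketch of statement (i) follows (illustrative only, with the m.d. errors of Assumption 4.2 replaced by i.i.d. standard normal innovations and a short MA filter, both arbitrary choices): the sample variance of the weighted sums should be close to $\sigma^2 = \sigma_e^2 (\sum_j c_j)^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5000, 4000
c = np.array([1.0, 0.5, 0.25])                 # MA coefficients, sum(c) != 0
sigma2 = c.sum() ** 2                          # sigma_e^2 = 1, so sigma^2 = (sum c_j)^2
w = np.log(np.arange(1, n + 1)) / (np.sqrt(n) * np.log(n))   # weights of statement (i)

stats = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n + c.size - 1)
    u = np.convolve(e, c, mode="valid")        # linear process u_t = sum_j c_j e_{t-j}, length n
    stats[r] = w @ u

print(f"sample variance = {stats.var():.3f}   theoretical sigma^2 = {sigma2:.3f}")
```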
4.4.7 Controlling Small l for Third-Order Regular Variation To study higher-order expansions we need a condition that is stronger than Assumption 4.1 (Section 4.3.6).
4.4.7.1
Assumption 4.3
1. L ¼ K(1, f1 ), 1 ¼ K(h, fh ) where h is a general SV function (in particular, 1 and h are quasi-monotone). 2. The function m(x) ¼ (1(x) þ h(x))=2 is different from zero for all large x and satisfies the condition 1 max {j1(x)j, jh(x)j} jm(x)j max {j1(x)j, jh(x)j}: c
(4:44)
The constant c in Eqs. (4.26) and (4.44) is the same because, if these conditions hold with different constants, the constants can be replaced by the largest of them. From Table 4.1 we see that m(x) can be either identically zero or different from zero for all large x. When m is not identically zero, jm(x)j happens to be of order of the largest of j1(x)j and jh(x)j, and Eq. (4.44) is satisfied. Assumption 4.3 is designed to analyze the effects of expansion terms up to log2 x. Condition 2 excludes vanishing m. If m vanishes, log2 x is not in the asymptotic expansion of L(x), and higher-order approximations need to be considered. There are different ways to characterize proximity of the G-function of L to log r. Lemma 4.2.6 establishes the pointwise version G(rn, n) ¼ log r[1 þ o(1)] uniformly in r [ [a, b]: Lemma 4.3.7 supplies the integral version: ð1
Gk (rn, n) dr ¼ (1)k k![1 þ o(1)],
n ! 1:
0
There is also an Lp -approximability relation (Theorem 4.4.1) n X k k G (t, n)1it () log () t¼1 p,(0,1)
and its discrete counterpart (Corollary 4.4.2) n 1X Gk (t, n) ¼ (1)k k![1 þ o(1)], n t¼1
n ! 1:
As we have seen, the Lp -approximability relation is the most useful. The purpose of this and the next few sections is to prove an Lp -approximability statement for H(t, n) ¼
G(t, n) log (t=n) , m(n)
which we call an H-function. By Lemma 4.2.7 H(rn, n) ¼ log2 r[1 þ o(1)],
n ! 1,
Pn 2 so t¼1 H(t, n)1it (x) should be Lp -close to log x. The analysis of the proof of Theorem 4.4.1 shows that to prove this fact we need to bound H(rn, n) for small r. This is done with the help of the next lemma. Lemma. If L satisfies Assumption 4.3 (Section 4.4.7.1), then for any b . max{2u1 , uh} there exist constants Mb . 0 and Bb B such that jG(lx, x) log lj Mb lb
1 1 þ f1 (x) fh (x)
! for x Bb
and
Bb l 1: x
Proof. Denote U(l, x) ¼ log L(lx)=L(x) and consider L(lx)=L(x) 1 1(x) log l ¼ eU(l,x) 1 U(l, x) þ U(l, x) 1(x) log l:
(4:45)
By Lemma 4.3.5 applied to 1 ¼ K(h, fh ) j1(lx)=1(x) 1j c1 lbh =fh (x) for all x Bb
Bb =x l 1
and
(4:46)
on bh . where bh . uh and c1 depends Ðx Since U(l, x) ¼ lx 1(t) dt=t, we have x ð ðx dt dt jU(l, x) 1(x) log lj ¼ 1(t) þ 1(x) t t lx lx ðx 1(t) dt 1 ¼ 1(x) 1(x) t lx
ð1 1(sx) ds j1(x)j 1 : 1(x) s l
The conditions Bb lx x and l s 1 imply Bb sx x, so we can use Eq. (4.46) to get c1 jU(l, x) 1(x) log lj j1(x)j fh (x)
ð1
sbh 1 ds ¼ j1(x)j
l
c2 j1(x)j bh l fh (x)
for x Bb
c2 (lbh 1) fh (x)
and
Bb l 1: x
(4.47)
4.4 RESULTS BASED ON LP -APPROXIMABILITY
x2 ejxj
157
From bounds (4.24) and (4.25) and an elementary inequality jex 1 xj we get
jeU(l,x) 1 U(l, x)j U 2 (l, x)ejU(l,x)j c3
l2u1 (2 log l)2 12(b1 u1 ) l f21 (x)
where b1 . u1 . Now we combine Eqs. (4.45), (4.47), and (4.48) to obtain " # 2 L(lx) j1(x)j (2 log l ) b 1=2b 3=2 u h 1 1 þ l L(x) 1 1(x) log l c4 f (x) l f21 (x) h
(4.48)
(4:49)
By Lemma 4.3.3, (2 log l)2 can be dominated by c5 ld with an arbitrary d . 0. Since the number b1 . u1 is arbitrarily close to u1 , the number a1 ¼ 12b1 32u1 d is less than, and arbitrarily close to, 2u1 . Hence, bounds (4.44) and (4.49) imply ! lbh la1 þ jG(lx, x) log lj c6 fh (x) f21 (x)j1(x)j ! 1 1 þ lb : c7 fh (x) f1 (x) Here we take an arbitrary b . max{2u1 , uh } and put a1 ¼ bh ¼ b. c7 depends on b. B
4.4.8 Lp -Approximability of H Theorem. (Mynbaev, 2011) Let Assumption 4.3 (Section 4.4.7.1) hold. For p [ [1, 1) define a vector wn [ Rn by wnt ¼ n1=p H(t, n),
t ¼ 1, . . . , n:
If max{2u1 , uh } , 1=p, then {wn } is Lp -close to f (x) ¼ log2 x. Proof. The proof follows that of Theorem 4.4.1. Similarly to the transition from Eq. (4.33) to Eq. (4.34) now we have (Dnp wn )(u) ¼ H([nu þ 1], n),
0 u , 1:
Let 0 , d 1=2. With the number Bb from Lemma 4.4.7 put n1 ; Bb =d. For n . n1 the interval (Bb =n, d) is not empty and, by the triangle inequality, kDnp wn f k p,(0,1) kDnp wn f k p,(d,1) þ k f k p,(0,d) þ kDnp wn k p,(0,Bb =n) þ kDnp wn k p,(Bb =n,d) : Obviously, k f k p,(0,d) ! 0 as d ! 0. Now we consider three cases.
(4.50)
Case 1. d u , 1. Lemma 4.2.7 guarantees that 1 : H(rn, n) ¼ log2 r[1 þ o(1)] uniformly in r [ d, 1 þ 2Bb
(4:51)
Defining r ¼ [nu þ 1]=n, from the inequality nu , [nu þ 1] nu þ 1 we have 1 1 1 du,r uþ ,1þ 1þ : n n1 2Bb
(4:52)
This leads to r ¼ u þ o(1) and
1 r [ d, 1 þ : 2Bb
(4:53)
From Eqs. (4.51) and (4.53) we see that H([nu þ 1], n) log2 u ¼ o(1) uniformly in u [ (d, 1), which allows us to conclude that kDnp wn f k p,(d,1) ! 0,
n ! 1:
(4:54)
Case 2. Bb =n u , d . Let n . n2 ; max{n1 , 2}. Then Eq. (4.52) and the conditions u [ [Bb =n, d), n . n2 imply Bb 1 1 u , r u þ , d þ 1: n n n2 This means we can successively apply Lemma 4.4.7, Eqs. (4.44) and (4.52) to get ! G(rn, n) log r Mb rb 1 1 jH([nu þ 1], n)j ¼ jm(n)j f (n) þ f (n) m(n) h 1 ( ) c1 rb 1 1 max , jm(n)j fh (n) f1 (n) c2 rb max{j1(n)j, jh(n)j} jm(n)j Bb b b ,d : c3 r c3 u for u [ n
Hence, ðd
jDnp wn jp du cd1pb :
(4:55)
Bb =n
Here the right-hand side tends to zero if b , 1=p. This is possible because of the inequality max{2u1 , uh } , 1=p. Case 3. 0 , u , Bb =n. By monotonicity the inequality [nu þ 1]=n . u implies j log ([nu þ 1]=n)j j log uj. For G([nu þ 1], n) we can use Eq. (4.41). Then G([nu þ 1], n) log([nu þ 1]=n) jH([nu þ 1], n)j þ m(n) m(n) c þ L(n) logu þ : L(n)1(n)m(n) m(n) Since all functions of n here are SV and jlog uj can be dominated by cua with 0 , a , 1=p, we see that 1pa c þ L(n) Bb 1=p c Bb þ ! 0: kDnp wn kp,(0,Bb =n) L(n)1(n)m(n) n jm(n)j n (4:56) Equations (4.50), (4.54), (4.55), and (4.56) prove the theorem.
B
Intuitively, Lp -approximability of H should be a stronger fact than Lp -approximability of G. Indeed, if we multiply G([nu þ 1], n) log ([nu þ 1]=n) 2 log u ! 0 m(n) p by m(n) ! 0, we get G([nu þ 1], n) log [nu þ 1] ! 0: n p
Here [nuþ1] ! u uniformly on (0, 1) by Eq. (4.37) and log [nuþ1] jloguj. As a result n n of Eq. (4.34) the sequence {wn } from Theorem 4.4.1 with k ¼ 1 is Lp -close to log x.
4.5 PHILLIPS GALLERY 2

This whole section is based on Phillips (2007), except that the asymptotic results are derived from the Lp-approximability results of Section 4.4 rather than from the expansions of Section 4.2.
These are the common features of the models considered in Phillips (2007):
1. the regressors are asymptotically collinear,
2. the (joint) limit distribution of the regression coefficients is one-dimensional, and
3. the usual regression formulas for asymptotic standard errors are valid, but the rates of convergence are affected.
The reader is advised to review the properties of $o_p(1)$ and $O_p(1)$ from Sections 1.8.4 and 1.8.5; these are widely used in the rest of the book.
4.5.1 Simple Regression In this and the next three sections we deal with the model yt ¼ a þ bL(t) þ ut ,
t ¼ 1, . . . , n,
(4:57)
where the errors ut satisfy Assumption 4.2 (Section 4.4.6). In cases such as L(s) ¼ 1= log s where L(1) is undefined or, more generally, L(1), . . . , L(a) are undefined, L(s) may be redefined as L(s) ¼ L(a þ 1) for 1 s a, with no effect on asymptotic results. Henceforth, it is assumed that such adjustments are made.
4.5.2 Problem with Estimation The sequence (L(1), . . . , L(n)) upon normalization becomes L2 -close to 1, as we have established in the proof of Lemma 4.4.6. This means that the regressors in Eq. (4.57) are asymptotically collinear and the normalized matrix of second moments 0 1 D1 n X XDn is asymptotically singular. With the expansions from Section 4.4 the above fact can be proved directly. Denote 0 1 1 L(1) X ¼ @ A 1 L(n) the matrix of regressors in Eq. (4.57). By Corollary 4.4.3 the normalizer for P n L(s) 0 P P (4:58) XX¼ L(s) L2 (s) is pffiffiffi n pffiffiffi0 ; Dn ¼ 0 nL(n) so 0 1 D1 n X XDn
¼
1 nL(n)
1 P
P
L(s)
1 L(s) nL(n) P 2 1 L (s) nL2 (n)
Thus, the conventional scheme is not applicable.
!
!
1 1
1 : 1
4.5.3 Way Around the Problem Phillips’ idea is to use the alternative formulas for OLS estimates from Section 4.2.5. Pn 2 can be normalized by (nL2 (n)12 (n))1, according to Corollary 4.4.4, (L(t) L) t¼1 Pn and from Lemma 4.4.6 we know that t¼1 (L(t) L)ut should be divided by pffiffiffi nL(n)1(n) to achieve convergence in distribution. Therefore Eq. (4.9) should be rearranged to pffiffiffi nL(n)1(n)(b^ b) " #1 n n X X 1 1 2 t: p ffiffi ffi (L(t) L) (L(t) L)u ¼ nL2 (n)12 (n) t¼1 nL(n)1(n) t¼1
(4.59)
Since L ¼ L(n)(1 þ o(1)) [see Section (4.4.3)], to achieve convergence of the term pffiffiffi involving b^ b in Eq. (4.10), we have to multiply both sides by n1(n) to get n pffiffiffi L pffiffiffi 1(n) X ut n1(n)(a^ a) ¼ pffiffiffi nL(n)1(n)(b^ b): L(n) n t¼1
P Luckily, here p1ffiffin nt¼1 ut converges in distribution (correctly posed problems always have solutions), because the sequence of weights is L2 -close to 1. Therefore the first term is op (1). The whole thing asymptotically is pffiffiffi pffiffiffi n1(n)(a^ a) ¼ nL(n)1(n)(b^ b) þ op (1):
(4:60)
Equations (4.59) and (4.60) and Lemma 4.4.6(ii) lead to the following conclusion.

Theorem. (Phillips, 2007, Theorem 3.1) If $L$ satisfies Assumption 4.1 (Section 4.3.6), $2u < 1$, and $u_t$ satisfies Assumption 4.2 (Section 4.4.6.1), then
$$ \begin{pmatrix} \sqrt{n}\,\varepsilon(n)(\hat\alpha - \alpha) \\ \sqrt{n}\,L(n)\varepsilon(n)(\hat\beta - \beta) \end{pmatrix} \xrightarrow{d} N\!\left( 0,\ \sigma^2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \right). \qquad (4.61) $$
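The theorem can be illustrated by simulation (a sketch under the simplifying assumption of i.i.d. N(0,1) errors, for $L(s) = \log s$, so that $\varepsilon(n) = 1/\log n$ and $L(n)\varepsilon(n) = 1$): the normalized estimation errors have variance close to $\sigma^2$ and correlation close to $-1$, although convergence is slow, of order $1/\log n$.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, reps = 1.0, 2.0, 2000

for n in (10**3, 10**5):
    t = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), np.log(t)])
    P = np.linalg.pinv(X)                      # OLS map y -> (ahat, bhat), computed once
    eps_n = 1.0 / np.log(n)                    # epsilon(n); L(n)*eps(n) = 1 for L = log
    za = np.empty(reps); zb = np.empty(reps)
    for r in range(reps):
        y = alpha + beta * np.log(t) + rng.standard_normal(n)
        ahat, bhat = P @ y
        za[r] = np.sqrt(n) * eps_n * (ahat - alpha)
        zb[r] = np.sqrt(n) * (bhat - beta)     # sqrt(n)*L(n)*eps(n) = sqrt(n) here
    print(f"n = {n:>7}   var(za) = {za.var():.3f}   var(zb) = {zb.var():.3f}   "
          f"corr = {np.corrcoef(za, zb)[0, 1]:+.3f}")
```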
4.5.4 Examples EXAMPLE 4.3. L(s) ¼ logg s, g . 0. This gives the semilogarithmic model. Here 1(n) ¼ g=log n, L(n)1(n) ¼ g logg1 n and the previous theorem gives pffiffiffi d 1 1 (gpffiffiffin=log n) (a^ a) 2 : ! N 0, s 1 1 g n logg1 n(b^ b) EXAMPLE 4.4.
L(s) ¼ log( log s). Here 1(n) ¼
1 , ( log n) log ( log n)
L(n)1(n) ¼
1 log n
and convergence is described by
$$ \begin{pmatrix} \sqrt{n}/((\log n)\log(\log n))\,(\hat\alpha - \alpha) \\ \sqrt{n}/\log n\,(\hat\beta - \beta) \end{pmatrix} \xrightarrow{d} N\!\left( 0,\ \sigma^2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \right). $$

EXAMPLE 4.5. $L(s) = 1/\log s$. This example arises when the regressor decays slowly. Here $\varepsilon(n) = -1/\log n$, $L(n)\varepsilon(n) = -1/\log^2 n$, and the result is
$$ \begin{pmatrix} \sqrt{n}/\log n\,(\hat\alpha - \alpha) \\ \sqrt{n}/\log^2 n\,(\hat\beta - \beta) \end{pmatrix} \xrightarrow{d} N\!\left( 0,\ \sigma^2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \right). $$

EXAMPLE 4.6. $L(s) = 1/\log(\log s)$. Here
$$ \varepsilon(n) = \frac{-1}{(\log n)\log(\log n)}, \qquad L(n)\varepsilon(n) = \frac{-1}{(\log n)\log^2(\log n)}, $$
so that
$$ \begin{pmatrix} \sqrt{n}/((\log n)\log(\log n))\,(\hat\alpha - \alpha) \\ \sqrt{n}/((\log n)\log^2(\log n))\,(\hat\beta - \beta) \end{pmatrix} \xrightarrow{d} N\!\left( 0,\ \sigma^2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \right). $$
Some intuition that explains these results is as follows. When $L(n) \to \infty$ as $n \to \infty$, the convergence rate of the slope coefficient $\hat\beta$ exceeds that of the intercept $\hat\alpha$, because the signal from the regressor $L(s)$ is stronger than that of a constant regressor. When $L(n) \to 0$ as $n \to \infty$, the convergence rate of $\hat\beta$ is less than that of $\hat\alpha$, because the signal from the regressor $L(s)$ is weaker than that of a constant regressor.
4.5.5 Standard Errors From the general equation V
a^ b^
¼ s 2 (X 0 X)1
we know how to find estimates of standard errors: they are computed by scaling square roots of diagonal elements of (X 0 X)1 with s 2 (or estimate of s 2 ). However, because of asymptotic singularity of X 0 X, its inverse behaves badly, as we see shortly. Using the rule
a c
b d
1 ¼ ad bc
d b c a
(4:62)
we have for the inverse of Eq. (4.58)
ad bc ¼
X
! 1
t
¼n
X
X
! 2
L (t)
t
2
L (t)
(X X)
1
!2 L(t)
t
X
t 0
X
1 ¼ P 2 n t (L(t) L)
!2 L(t)
¼n
t
X
2, (L(t) L)
t
P 2 t L (t) P t L(t)
P
t
L(t)
n
! :
Now apply expansions from Sections 4.4.3 and 4.4.4: 0
(X X)
1
2 nL (n) 1 ¼ 2 2 n L (n)12 (n)[1 þ o(1)] nL(n) 0 1 1 1 B L(n) C C [1 þ o(1)] : ¼B @ 1 1 A n12 (n) L(n) L2 (n)
nL(n) n
[1 þ o(1)]
Square roots of the elements on the main diagonal should be put on the diagonal of the normalizing matrix, denoted Fn , to obtain unities on the main diagonal of Fn1 (X 0 X)1 Fn1 . Thus, 1 1 Fn ¼ diag pffiffiffi , pffiffiffi , n1(n) nL(n)1(n) 1 1 1 0 1 1 Fn (X X) Fn ¼ [1 þ o(1)]: 1 1 It follows from these formulas that, in spite of the singularity in the limit matrix, the covariance matrix of the regression coefficients is consistently estimated as in conventional regression when an appropriate estimate s2 of s 2 is employed.
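Numerically (an illustration only, for $L = \log$), the normalized inverse moment matrix $F_n^{-1}(X'X)^{-1}F_n^{-1}$ indeed approaches the singular limit, with off-diagonal entries tending to $-1$; convergence is again logarithmically slow.

```python
import numpy as np

for n in (10**4, 10**6, 10**7):
    lt = np.log(np.arange(1, n + 1, dtype=float))
    XtX = np.array([[n, lt.sum()], [lt.sum(), (lt * lt).sum()]])
    XtX_inv = np.linalg.inv(XtX)
    eps_n = 1.0 / np.log(n)                                  # epsilon(n) for L = log
    Fn = np.diag([1.0 / (np.sqrt(n) * eps_n),
                  1.0 / (np.sqrt(n) * np.log(n) * eps_n)])
    Fn_inv = np.linalg.inv(Fn)
    print(f"n = {n:.0e}")
    print(np.round(Fn_inv @ XtX_inv @ Fn_inv, 3))            # slowly approaches [[1,-1],[-1,1]]
```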
4.5.6 Lemma on a Linear Transformation of the Parameter Lemma. Let A be a nonsingular matrix. If the parameter vector in the linear model has been transformed as in y ¼ X b þ u ¼ XA1 Ab þ u ¼ XA1 a þ u, then the linear relation between a and b, a ¼ Ab, translates to a similar relation between the OLS estimators: a^ ¼ Ab^ . Consequently, a^ a ¼ A(b^ b).
Proof. 1
1
a^ ¼ ((XA1 )0 XA1 ) (XA1 )0 y ¼ ((A1 )0 X 0 XA1 ) (A1 )0 X 0 y ¼ A(X 0 X)1 A0 (A1 )0 X 0 y ¼ Ab^ :
B
It is important that in asymptotic theory, where there is a sequence of linear models depending on n, the matrix A may also depend on n.
4.5.7 Polynomial Regression in L(s) In this model the regressors are polynomials in the SV function L, and the data are generated by yt ¼
p X
bj L j (t) þ ut ¼ b0 Lt þ ut
(4:63)
j¼0
where Lt ¼ (1, L(t), . . . , L p (t))0 and the error ut satisfies Assumption 4.2 (Section 4.4.6). This model may be analyzed using the approach applied to simple regression. But, as the degree p increases in Eq. (4.63), the analysis becomes complicated because high-order expansions of the sample moments of L are needed to develop a complete asymptotic theory. An alternate approach is to rewrite the model Eq. (4.63) in a form wherein the moment matrix of the regressors has a full-rank limit. The degeneracy in the new model, which has now an array format, then passes from the data matrix to the coefficients and is simpler to analyze. The process is first illustrated with simple regression yt ¼ a þ bL(t) þ ut , which we can write in the form yt ¼ a þ bL(n) þ b(L(t) L(n)) þ ut or, denoting an ¼ a þ bL(n), as yt ¼ an þ b(L(t) L(n)) þ ut :
(4:64)
The regressors {1, L(t) L(n)} in Eq. (4.64) are not collinear, and the OLS asymptotics is obtained by application of Theorem 1.12.3. The column of unities in the regrespffiffiffi pffiffiffi sor matrix X is normalized by n, giving 1= n(1, . . . , 1)0 , which is L2 -close to 1. The second column, in agreement with Corollary 4.4.4, is normalized by n X pffiffiffi 2 (L(t) L) nL(n)1(n) ¼
!1=2 [1 þ o(1)]:
t¼1
pffiffiffi The normalized second column equals 1= n(G(1, n), . . . , G(n, n)), which is L2 -close to log x. Thus, by Theorem 1.12.3
pffiffiffi d a a ) n ( ^ n n pffiffiffi ! N(0, s 2 G1 ) ^ nL(n)1(n)(b b)
(4:65)
where G is the Gram matrix of the system {1, log x}: ð1 G¼
1 log x log x log2 x
dx ¼
1 1
1 : 2
0
By Lemma 4.5.6
a^ n an ¼ (a þd bL(n)) (a þ bL(n)) ¼ a^ a þ (b^ b)L(n): Since our purpose is to deduce convergence of a^ a, and we know the rates of convergence of b^ b and a^ n an from Eq. (4.65), we have pffiffiffi pffiffiffi pffiffiffi n1(n)(a^ a) ¼ nL(n)1(n)(b^ b) þ n1(n)(a^ n an ) pffiffiffi ¼ nL(n)1(n)(b^ b) þ op (1): This implies
pffiffiffi d 1 a) 2 pffiffiffi n1(n)(a^ ! N 0, s 1 nL(n)1(n)(b^ b)
1 1
,
which is the same as Theorem 4.5.3.
4.5.8 General Case (Convergence for the Transformed Regression) Extending the procedure devised for Eq. (4.64) to Eq. (4.63) gives the representation j p X L(t) 1 þ L(n) þ ut yt ¼ bj L(n) L(n) j¼0 ¼
j p j X X j L(t) 1 þ ut bj Lj (n) L(n) j¼0 i¼0 i
(change the summation order) j p p X X j L(t) L(n) i j 1 (n) bj L (n) þ ut : ¼ i L(n)1(n) j¼i i¼0 Here
j ¼ j!=( i!( j i)!) for 0 i j. Denote i p X j ani ¼ bj L j (n) , i ¼ 0, . . . , p: i j¼i
(4:66)
(4:67)
With the help of the G-function from Section 4.2.6 we rewrite Eq. (4.66) as yt ¼
p X 1i (n)ani Gi (t, n) þ ut
(4:68)
i¼0
Further, introducing the vector
an ¼ (an0 , . . . , anp )0 and matrices 0 B G¼@
G0 (1, n) Gp (1, n)
0
1 C A,
(4.69)
p
G (n, n) G (n, n) Dn1 ¼ diag[1, 1(n), . . . , 1p (n)] we write Eq. (4.68) as y ¼ GDn1 an þ u:
(4:70)
By Corollary 4.4.2 the l2 -norm of the ith column of G is !1=2 n X pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2(i1) G (t, n) ¼ (2i 2)!n[1 þ o(1)], i ¼ 1, . . . , p þ 1: t¼1
For compatibility with Phillips (2007, Lemma 7.8) this is better normalized by The normalized column
pffiffiffi n.
1 pffiffiffi (Gi1 (1, n), . . . , Gi1 (n, n))0 n is L2 -close to logi1 x, i ¼ 1, . . . , p þ 1, by Theorem 4.4.1. Now we can apply pffiffiffi Theorem 1.12.3 with Dn ¼ nI, lp (x) ¼ (1, log x, . . . , log p x)0 to obtain the following statement. Theorem. (Phillips, 2007, Theorem 4.1) If Assumptions 4.1 (Section 4.3.6) and 4.2 (Section 4.4.6.1) hold and 2up , 1, then 1 0 G G ! G ; n
ð1
lp (x)l0p (x) dx,
det G = 0,
0
0 1 !2 X pffiffiffi d 1 cj G A: nDn1 (a^ n an ) ! N @0, se j[Z
(4:71)
G has elements ð1
gij ¼ logiþj2 dx ¼ (1)iþj2 (i þ j 2)!,
i, j ¼ 1, . . . , p þ 1:
0
4.5.9 Analysis of G I think it is useful to know how one can guess a result like that reported here. Triangular decompositions of matrices are convenient for calculating determinants and matrices because 1. the determinant of a triangular matrix equals the product of its diagonal elements and 2. the inverse of a lower (or upper) triangular matrix is again a lower (upper, respectively) triangular matrix. The procedure for finding a lower triangular matrix B in the decomposition S ¼ BB0 of a symmetric matrix S tells us that the elements on the main diagonal of B are completely determined by the values of major minors of S, (see Gantmacher, 1959, Chapter II, Section 4, Corollary 3). Once these has been found, other minors can be used to calculate the elements below the main diagonal. Both steps can be implemented in MathCAD. Then one can try to generalize from numbers to analytical expressions. (Phillips, 2007, Lemma 7.8 corrected) Put Fp ¼ diag[0!, 1!, . . . , p!] and
Lemma.
0
1 B 1 B B 1 Hp ¼ B B B @ p (1) p 0
0 1 2 p (1) pþ1 1
0 0 1 p (1) pþ2 2
where the ith row consists of the coefficients in the binomial (1 1)
i1
¼
i1 X m¼0
Then (i) G ¼ Fp Hp Hp0 Fp , p Q (ii) det G ¼ ( j!)2 , j¼1
(iii) (G1 ) pþ1,pþ1 ¼ 1=( p!)2 .
(1)
i1þm
i1 : m
1 0 0 C C 0 C C C C A 1
Proof. (i) In the product Fp Hp the ith row of Hp gets multiplied by (i 1)! The jth row of Fp Hp becomes the jth column of (Fp Hp )0 ¼ Hp0 Fp . Therefore
(Fp Hp Hp0 Fp )ij
¼
min{i1, Xj1}
(i 1)!( j 1)!(1)
iþj2
m¼0
i1 m
j1 : m
Because of symmetry, we can assume, without loss of generality, that i j and then
(Fp Hp Hp0 Fp )ij
¼ (1)
iþj2
i1 X i1 j1 : (i 1)!( j 1)! m m m¼0
According to [Vilenkin, 1969, Chapter II, Equation (25)], for k l there is an identity k X l k kþl ¼ : m m k m¼0 Application of this formula completes the proof: (Fp Hp Hp0 Fp )ij ¼ (1)iþj2 (i1)!( j 1)!
(ii) Obviously, det G ¼ (det Fp det Hp )2 ¼
p Q
(i þ j 2)! ¼ Gij : (i 1)!( j 1)!
( j!)2 .
j¼1
(iii) Part (i) implies
0
G1 ¼ [(Fp Hp )1 ] (Fp Hp )1 : Direct calculation shows that
(G1 )ii ¼
pþ1 X
2
[(Fp Hp )1 ] ji ,
i ¼ 1, . . . , p þ 1:
j¼i
In particular, 2
(G1 ) pþ1,pþ1 ¼ [(Fp Hp )1 ] pþ1,pþ1 ¼ 1=( p!)2 :
B
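The three statements of the lemma are easy to verify numerically for a small $p$; the sketch below (illustration only) builds $\Gamma$ from the elements given after Theorem 4.5.8 and checks the factorization, the determinant, and the corner element of the inverse.

```python
import numpy as np
from math import comb, factorial

p = 4
# Gamma_{ij} = (-1)^{i+j-2} (i+j-2)! with indices starting at 1 (elements given after Theorem 4.5.8)
Gamma = np.array([[(-1) ** (i + j) * factorial(i + j - 2)
                   for j in range(1, p + 2)] for i in range(1, p + 2)], dtype=float)
F = np.diag([float(factorial(i)) for i in range(p + 1)])                 # F_p = diag[0!, 1!, ..., p!]
H = np.array([[(-1) ** (i + m) * comb(i, m) if m <= i else 0.0          # rows: coefficients of (1-1)^i
               for m in range(p + 1)] for i in range(p + 1)])

print(np.allclose(Gamma, F @ H @ H.T @ F))                               # part (i)
print(np.isclose(np.linalg.det(Gamma),
                 np.prod([factorial(j) ** 2 for j in range(1, p + 1)]))) # part (ii)
print(np.isclose(np.linalg.inv(Gamma)[p, p], 1.0 / factorial(p) ** 2))   # part (iii)
```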
4.5.10 An Equation with an Upper Triangular Matrix Lemma. (Phillips, 2007, p. 601) Denote e pþ1 ¼ (0, . . . , 0, 1)0 (( p þ 1)th unit vector) and
Apþ1
0 0 B 0 B B B B 0 B B B B 0 ¼B B B B B B 0 B B B @ 0
1 0 1
2 0 2
1
1 2
0
2
0
0
0
0
p1 0 p1
1 p1
2 p1 p1
0
p 0 p
1
C C C C C C 1 C C p C C, C 2 C C C p C C p1 C C A p p
1 p p (1) B C 0 B C B p C B C B (1) p1 C 1 C: ¼B B C B C B C p B (1) C @ p1 A 1 0
m pþ1
The solution x [ R pþ1 of the equation A pþ1 x ¼ ce pþ1 ,
(4:72)
x ¼ cm pþ1 :
(4:73)
where c is a constant, is given by
Proof. The first p equations of Eq. (4.72) can be written as pþ1 X j1 x ¼ 0, k1 j j¼k
k ¼ 1, . . . , p:
(4:74)
Equation (4.73) in a coordinate form is xk ¼ (1) pkþ1
p c, k1
k ¼ 1, . . . , p þ 1:
(4:75)
The statement is proved by induction in decreasing k. Equation (4.75) for k ¼ p þ 1 is just the right equation in Eq. (4.72). Equation (4.75) should be verified also for k ¼ p because the structure in Eq. (4.72) changes from p þ 1 to p. The final two equations in Eq. (4.72) imply xp ¼
p p x pþ1 ¼ c, p1 p1
which agrees with Eq. (4.75). Suppose Eq. (4.75) is true for k þ 1, . . . , p þ 1 and let us prove it for k. From Eqs. (4.74) and (4.75) we have pþ1 pþ1 X X j1 j1 p pjþ1 xk ¼ (1) x ¼ c: k1 j k1 j1 j¼kþ1 j¼kþ1
(4:76)
Here the coefficient in front of c equals
(1) pkþ1
¼ (1)
pþ1 X (1)kjþ1 ( j 1)! p! ( p k þ 1)! (k 1)!( j k)! ( j 1)!( p j þ 1)! ( p k þ 1)! j¼kþ1
pkþ1
X pþ1
p k1
(1)
jkþ1
pkþ1
jk
j¼kþ1
(replace j k by i)
¼ (1)
pkþ1
p
" pkþ1 X
k1
¼ (1)
pkþ1
¼ (1)
pkþ1
p k1
(1)
iþ1
pkþ1
i¼1
i
#
1þ1
[(1 1) pkþ1 þ 1]
p : k1
Equations (4.76) and (4.77) prove Eq. (4.75) and the lemma.
(4.77)
B
4.5.11 From Convergence of aˆ n to Convergence of bˆ It transpires that only the final component, a^ np , in a^ n (which translates to the component b^ p in the original coordinates) determines the nondegenerate part of the limit theory for the full set of coefficients.
Theorem. (Phillips, 2007, Theorem 4.2) If Assumptions 4.1 (Section 4.3.6) and 4.2 (Section 4.4.6.1) hold and 2up , 1, then pffiffiffi pffiffiffi p n1 (n)DnL (b^ b) ¼ m pþ1 nLp (n)1p (n)(b^ p bp ) þ op (1) s2 d 0 m m ! N 0, pþ1 pþ1 , (p!)2
(4.78)
where m pþ1 is from Lemma 4.5.10 and DnL ¼ diag[1, L(n), . . . , Lp (n)]: Proof. By Lemma 4.5.6 the equation [see Eq. (4.67)]
an ¼ A pþ1 DnL b
(4:79)
a^ n an ¼ A pþ1 DnL (b^ b):
(4:80)
implies
First, we derive convergence of b^ p . The final equation in system (4.80) is a^ np anp ¼ Lp (n)(b^ p bp ). Utilizing what we know about convergence of a^ np from Theorem 4.5.8 and Lemma 4.5.9(iii) we have pffiffiffi pffiffiffi p d nL (n)1p (n)(b^ p bp ) ¼ n1p (n)(a^ np anp ) ! N(0, s 2 =( p!)2 ):
(4:81)
Now we are ready to derive convergence of the whole vector b^ . On the righthand side of Eq. (4.80) each equation contains the term Lp (n)(b^ p b p ) with a nonzero coefficient. From Eq. (4.81) we know that, to make this term convergent, we have to pffiffiffi multiply each equation by n1p (n). In the resulting system pffiffiffi pffiffiffi p n1 (n)(a^ n an ) ¼ Apþ1 n1p (n)DnL (b^ b),
(4:82)
the final equation is the same as Eq. (4.81). In the first p equations the left side is op (1), pffiffiffi as follows from Theorem 4.5.8 and that 1(n) ! 0. Letting c ¼ n1p (n)(a^ np anp ) we rewrite Eq. (4.82) as pffiffiffi A pþ1 n1p (n)DnL (b^ b) ¼ cepþ1 þ op (1),
which by Lemma 4.5.10 implies pffiffiffi p 1 n1 (n)DnL (b^ b) ¼ A1 pþ1 ce pþ1 þ A pþ1 op (1) pffiffiffi ¼ m pþ1 n1p (n)(a^ np anp ) þ op (1): This equation and Eq. (4.81) prove Eq. (4.78).
B
4.5.12 Variance Matrix Estimation pffiffiffi The limit distribution of the expression n1p (n)DnL (b^ b) has support given by the range of the vector m pþ1 and is therefore of dimension 1. The variance matrix of b^ is asymptotically
s2 D1 m pþ1 m0pþ1 D1 nL , ( p!) n12p (n) nL 2
(4:83)
which, as we show now, is consistently estimated by the usual regression formula. The matrix of regressors X in the original regression (4.63) in terms of the vectors Lt ; (1, L(t), . . . , Lp (t))0 satisfies X 0 ¼ (L1 , . . . , Ln ), so X0X ¼
n X
Ls L0s :
(4:84)
s¼1
This expression for X 0 X has more of statistical flavor. The following result gives asymptotic expressions for both X 0 X and its inverse. Those expressions show that, indeed, Eq. (4.83) is the asymptotic form of (X 0 X)1 . Theorem. (Phillips, 2007, Theorem 4.3) If L satisfies Assumption 4.1 (Section 4.3.6) and 2up , 1, then (i) X 0 X ¼ nDnL i pþ1 i0pþ1 DnL [1 þ o(1)], where i pþ1 ¼ (1, . . . , 1)0 is a ( p þ 1)vector with unity in each element. 0 1 (ii) (X 0 X)1 ¼ (1=(( p!)2 n12p (n))) D1 nL m pþ1 m pþ1 DnL [1 þ o(1)], where m pþ1 is from Lemma 4.5.10.
Proof. (i) Step 1. We need a working expression for Lt that would link X 0 X to the matrix whose convergence is affirmed in Eq. (4.71). Let us take a closer look at the transformation that has led to Eq. (4.68):
b0 Lt ¼
p X i¼0
1i (n)ani Gi (t, n):
Using representation Eq. (4.79) and denoting by G(t, n) ¼ (G0 (t, n), . . . , Gp (t, n))
0
the tth row of Eq. (4.69) we can continue as follows:
b0 Lt ¼ a0n Dn1 G(t, n) ¼ b0 DnL A0pþ1 Dn1 G(t, n): Since this is true for any b [ R pþ1, we obtain the desired representation Lt ¼ DnL A0pþ1 Dn1 G(t, n): Step 2. Equation (4.84), a similar identity G0 G ¼
n X
G(t, n)G0 (t, n),
t¼1
the above representation and Eq. (4.71) lead to X0X ¼
X
DnL A0pþ1 Dn1 G(t, n)G0 (t, n)Dn1 Apþ1 DnL
t
¼ nDnL A0pþ1 Dn1 GDn1 Apþ1 DnL [1 þ o(1)]:
(4.85)
In Dn1 all elements, except for the unity in the upper left corner, tend to zero. In G the element in the upper left corner is unity. Therefore Dn1 GDn1 ¼ E þ o(1), where E has 1 in the upper left corner and 0 elsewhere. Hence, X 0 X ¼ nDnL A0pþ1 EApþ1 DnL [1 þ o(1)] ¼ nDnL ipþ1 i0pþ1 DnL [1 þ o(1)]: (ii) From Eq. (4.84) the inverse sample matrix is 1 0 1 1 A1 D1 G1 D1 (X 0 X)1 ¼ D1 n1 (Apþ1 ) DnL [1 þ o(1)]: n nL pþ1 n1
(4:86)
Now observe that D1 n1 is dominated by its final diagonal element, and so we can write D1 n1 ¼
1 epþ1 e0pþ1 [1 p 1 (n)
þ o(1)],
174
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
where epþ1 ¼ (0, . . . , 0, 1)0 . Hence, 1 1 D1 n1 G Dn1 ¼
1 1 epþ1 (e0pþ1 G1 epþ1 )e0pþ1 ¼ epþ1 e0pþ1 2 2p 12p (n) ( p!) 1 (n)
and also, by Lemma 4.5.10, A1 pþ1 epþ1 ¼ mpþ1 . Thus, continuing Eq. (4.85), (X 0 X)1 ¼ ¼
1 1 1 1 D (A epþ1 )[e0pþ1 (A0pþ1 )1 ]D1 nL [1 þ o(1)] ( p!)2 12p (n) n nL pþ1 1 D1 mpþ1 m0pþ1 D1 nL [1 þ o(1)]: ( p!) n12p (n) nL 2
B
It follows from (ii) that, in spite of the singularity in the limit matrix, the covariance matrix of the regression coefficients is consistently estimated as in conventional regression by s2 (X 0 X)1 whenever s2 is a consistent estimate of s 2 .
4.6 REGRESSION WITH TWO SLOWLY VARYING REGRESSORS Multiple regression with different SV regressors also appears in some applications, the case of two such regressors being of principal interest. One such formulation is given in Section 4.6.9.
4.6.1 Statement of Problem It is convenient to call the product d(x) ¼ L(x)1(x)m(x) a d-function of L. Our subject is the two-variable regression model ys ¼ b0 þ b1 L1 (s) þ b2 L2 (s) þ us :
(4:87)
The purpose is to develop its asymptotic theory general enough to include all pairs (Li , Lj ), i , j, of functions from Table 4.1, where in the case L1 (x) ¼ logg x we assume g ¼ 1. This L1 is special in that its m- and d-functions are identically zero. In all other cases 1, m, and d are nonzero for all large x. The following minimal assumptions are used implicitly: 1. To avoid multicollinearity, in the pair (L1 , L2 ) only one function is allowed to be log x (and to have a vanishing d-function). By changing the notation, if necessary, we can assume that if one of L1 , L2 is log x, then it is always L1 . 2. To exclude constant regressors, we also assume that none of 11 and 12 vanishes for all large n. When L1 (x) ¼ log x, model (4.87) is called semireduced. When both d1 and d2 are nonzero, model (4.87) is called nonreduced.
4.6 REGRESSION WITH TWO SLOWLY VARYING REGRESSORS
175
The analysis in the subsequent sections shows that the asymptotic theory of model (4.87) depends on the asymptotic behavior of the ratios d1 =d 2 and 11 =12 . In the next assumption L1 and L2 denote SV functions of form K(1i , f1i ), where 1i ¼ K(hi , fhi ), not necessarily those from Table 4.1. 4.6.1.1
Assumption 4.4
1. The limit l1 ¼ limn!1 11 =12 (finite or infinite) exists and 2. in the nonreduced case we assume that the limit ld ¼ limn !1 d1 =d2 (finite or infinite) exists.
4.6.2 Phillips’ Transformation of the Regressor Space Theorem 5.1 of Phillips (2007) is based on the following heuristic argument. Rewrite Eq. (4.87) as ys ¼ b0 þ b1 L1 (n) þ b2 L2 (n) L1 ((s=n)n) L2 ((s=n)n) 1 þ b2 L2 (n) 1 þ us : þ b1 L1 (n) L1 (n) L2 (n)
(4.88)
We note from Lemma 4.2.7 that L2 has a higher-order representation Lj (rn) 1 ¼ 1j (n) log r þ 1j (n)mj (n) log2 r[1 þ o(1)], Lj (n)
r . 0:
(4:89)
Letting s ¼ rn here we write (the argument n in Lj , 1j , mj is suppressed): ys ¼ b0 þ b1 L1 þ b2 L2 s s þ b1 L1 11 log þ b1 L1 11 m1 log2 [1 þ o(1)] n n s s þ b2 L2 12 log þ b2 L2 12 m2 log2 [1 þ o(1)] þ us : n n
(4.90)
Dropping here o(1) produces an approximation ys ¼ b0 þ b1 L1 þ b2 L2 þ (b1 L1 11 þ b2 L2 12 ) log þ (b1 L1 11 m1 þ b2 L2 12 m2 ) log2
s þ us n
to Eq. (4.90). Denoting 0 1 0 1 0 1 b0 gn0 b ¼ @ b1 A, gn ¼ @ gn1 A, An ¼ @ 0 b2 gn2 0
L1 L1 11 d1
s n (4.91)
1 L2 L2 12 A d2
(4:92)
176
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
we obtain s s ys ¼ gn0 þ gn1 log þ gn2 log2 þ us , n n
gn ¼ An b:
(4:93)
We call the gi ’s good coefficients and bi ’s bad coefficients. The matrix An is called a transition matrix. Because of asymptotic collinearity of the regressors in Eq. (4.87), the asymptotic distribution of the bad coefficients is degenerate (onedimensional) and is not possible to find directly by normalizing the OLS estimator. In Eq. (4.93) the regressors are not asymptotically collinear and therefore the asymptotic distribution of the OLS estimator g^ n is good (normal with a positive definite variance –covariance matrix). Phillips’ idea is to extract the asymptotic distribution 1 is not well-behaved as of the b’s from that of the g’s using b ¼ A1 n gn . An n ! 1. Thus, the study of the transition matrix is at the heart of the method. The problem with this transformation is that it is impossible to prove that Eq. (4.91) approximates Eq. (4.90). Relationship Eq. (4.89) does not cover the segment 1 s , na whose length is proportional to the sample size. For such s the approximation cannot be good. For example, at s ¼ 2 for functions L1 , L3 from Table 4.1 we have L1 (2)=L1 (n) ¼ log 2= log n ! 0, L3 (2)=L3 (n) ¼ log n= log 2 ! 1. Therefore Theorem 5.1 of Phillips (2007) is true for Eq. (4.91) and not for the original regression. Our main result shows that, indeed, replacing Eq. (4.90) by Eq. (4.91) results in loss of information in terms of the variety of different asymptotic types. We are able to show that the values of s for which there is no approximation are negligible because 1. we impose the condition of slow variation with remainder and 2. instead of the sup-norm used by Phillips we use the integral L2 -norm contained in the definition of L2 -approximability.
4.6.3 Precise Transformation Now we describe a modification of the Phillips transformation that allows us to avoid dropping any terms. To explain the idea, we consider only a nonreduced model with jld j , 1 and
b1 ld þ b2 = 0:
(4:94)
For the nonreduced model both m1 and m2 are nonzero. Therefore we can write Lj (s) ¼ Lj (n) þ (Lj (s) Lj (n)) Lj (s) Lj (n) s s log þ Lj (n)1j (n) log ¼ Lj (n) þ Lj (n)1j (n) Lj (n)1j (n) n n s ¼ Lj (n) þ Lj (n)1j (n) log þ dj (n)Hj (s, n) n where Hj (s, n) ¼
1 s
Gj (s, n) log mj (n) n
(4.95)
4.6 REGRESSION WITH TWO SLOWLY VARYING REGRESSORS
177
is the H-function of Lj . Hj is not equal to log2 (s=n), but by Theorem 4.4.8 under appropriate conditions the sequence {n1=2 Hj (s, n) : s ¼ 1, . . . , n}, n ¼ 1, 2, . . . , is L2 -close to log2 x. Substitution of Eq. (4.95) in Eq. (4.87) gives ys ¼ gn0 þ gn1 log
s þ Dn þ us n
(4:96)
where
gn0 ¼ b0 þ b1 L1 þ b2 L2 , gn1 ¼ b1 L1 11 þ b2 L2 12
(4:97)
Dn ¼ b1 d1 (n)H1 (s, n) þ b2 d2 (n)H2 (s, n):
(4:98)
and
~ n) in such a way that The crucial step is to define gn2 ¼ a32 b1 þ a33 b2 and H(s, ~ n) and Dn ¼ gn2 H(s,
~ n)} is L2 -close to log2 x {n1=2 H(s,
(4:99)
where H~ is a new function. Then Eq. (4.96) becomes ys ¼ gn0 þ gn1 log
s ~ n) þ us þ gn2 H(s, n
(4:100)
and the transition matrix is 0
1 An ¼ @ 0 0
L1 L1 11 a32
1 L2 L2 12 A: a33
We show how this is done in the case of Eq. (4.94). Continuing Eq. (4.98) we get b1 d1 H1 (s, n) b2 d2 H2 (s, n) þ Dn ¼ (b1 d1 þ b2 d2 ) b1 d1 þ b2 d2 b1 d1 þ b2 d2 b d1 =d2 H1 (s, n) b H2 (s, n) : þ 2 ¼ (b1 d1 þ b2 d2 ) 1 b1 d1 =d2 þ b2 b1 d1 =d2 þ b2
(4.101)
Letting ~ n) ¼ b1 d1 =d2 H1 (s, n) þ b2 H2 (s, n) gn2 ¼ b1 d1 þ b2 d2 , H(s, b1 d1 =d2 þ b2 b1 d1 =d2 þ b2
(4:102)
178
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
TABLE 4.2 Transition Matrix Summary
Case Nonreduced model, jld j , 1
Nonreduced model, jld j ¼ 1
Subcase
Coefficients
b1 ld þ b2 = 0 b1 ld þ b2 ¼ 0, b2 = 0 b1 ld þ b2 ¼ 0, b2 ¼ 0 b1 = 0 b1 ¼ 0
I. a32 ¼ d1 , a33 ¼ d2 II. Indefinite III. a32 ¼ d1 , a33 ¼ 0 IV. a32 ¼ d1 , a33 ¼ d2 V. a32 ¼ 0, a33 ¼ d2 VI. a32 ¼ 0, a33 ¼ d2
Semireduced model, (L1 (x) ¼ log x; d2 = 0)
we satisfy the first part of Eq. (4.99). By Assumption 4.4 (Section 4.6.1.1) and Eq. (4.94)
b1 d1 =d2 b1 ld ! , b1 d1 =d2 þ b2 b1 ld þ b2
b2 b2 ! : b1 d1 =d2 þ b2 b1 ld þ b2
(4:103)
It is shown in Section 4.6.5 that this implies the second part of Eq. (4.99). Note that ~ n) defined in Eq. (4.102) depends on b1 , b2 in a nonlinear fashion, but in the H(s, limit that dependence disappears. The analysis in Section 4.6.5 shows that the elements a32 , a33 are as described in Table 4.2. The case in which the transition matrix is not defined and higher-order RV is necessary to determine it is marked as indefinite. The dependence of the transition matrix on the true b is not continuous.
4.6.4
Linearity of Lp-Approximable Sequences
Lemma. If {vn } is Lp -close to V, {wn } is Lp -close to W and numerical sequences {an } and {bn } converge to a and b, respectively, then {an vn þ bn wn } is Lp -close to aV þ bW. Proof. This property follows from kDnp (an vn þ bn wn ) (aV þ bW)kp jan ajkDnp vn kp þ jbn bjkDnp wn kp þ jajkDnp vn Vkp þ jbjkDnp wn Wkp ! 0, where, by Lp -approximability, kDnp vn kp and kDnp wn kp are bounded.
B
4.6.5 Derivation of the Transition Matrix We consider one by one the six cases listed in the last column of Table 4.2. 4.6.5.1 Nonreduced Model In the first five cases we assume that d1 = 0, d2 = 0 and L1 , L2 satisfy Assumption 4.3 (Section 4.4.7.1). By Theorem 4.4.8, where we take p ¼ 2, the sequences w1n , w2n with components wint ¼ n1=2 Hi (t, n), t ¼ 1, . . . , n, i ¼ 1, 2, are L2 -close to log2 x. By linearity we conclude that {an w1n þ bn w2n } is L2 -close to (a þ b) log2 x whenever an ! a, bn ! b.
4.6 REGRESSION WITH TWO SLOWLY VARYING REGRESSORS
179
Case I Suppose jld j , 1, b1 ld þ b2 = 0. This is exactly the model situation of Section 4.6.3. By Eq. (4.103) the second part of Eq. (4.99) is true and we can put a32 ¼ d1 , a33 ¼ d2 . Case II jld j , 1, b1 ld þ b2 ¼ 0, b2 = 0. This is the indefinite case discussed in Section 4.6.10. Case III jld j , 1, b1 ld þ b2 ¼ 0, b2 ¼ 0. Obviously, from Dn ¼ b1 d1 H1 (s, n) one has a32 ¼ d1 , a33 ¼ 0. Case IV Assume that jld j ¼ 1, b1 = 0. From the first line of Eq. (4.101)
b1 b2 d2 =d1 H1 (s, n) þ H2 (s, n) : Dn ¼ (b1 d1 þ b2 d2 ) b1 þ b2 d2 =d1 b1 þ b2 d2 =d1 Here b1 =(b1 þ b2 d2 =d1 ) ! 1, b2 (d2 =d1 )=(b1 þ b2 d2 =d1 ) ! 0. As above, by linearity Eq. (4.99) holds and the definition from Case I can be used again.
gn2
Case V Let jld j ¼ 1, b1 ¼ 0. Obviously, Dn ¼ b2 d2 H2 (s, n) and the choice ~ n) ¼ H2 (s, n) satisfies Eq. (4.99) and gives a32 ¼ 0, a33 ¼ d2 . ¼ b2 d2 , H(s,
4.6.5.2 Semireduced Model (Case VI) In this case by definition L1 (s) ¼ log s, d1 ¼ 0, d2 = 0. We can still apply Eq. (4.95) to L2 . For L1 we use simply L1 (s) ¼ L1 (n) þ (L1 (s) L1 (n)) ¼ L1 (n) þ log (s=n). Since L1 11 ;1, Eq. (4.97) is true and Eq. (4.98) formally holds with d1 ¼ 0. Therefore the choice is the same as e n) ¼ H2 (s, n). in Case V: gn2 ¼ b2 d2 , H(s,
4.6.6 Convergence of good Coefficients Denote G the Gram matrix of the system fj (x) ¼ log j1 x, j ¼ 1, 2, 3, that is, the element gij of G equals ð1 fi (x)fj (x) dx:
gij ¼ 0
Theorem.
(Phillips, 2007, Theorem 4.1) Let Assumption 4.2 (Section 4.4.6.1) hold.
(i) In the nonreduced case suppose that both L1 , L2 satisfy conditions of Theorem 4.4.8 (on Lp -approximability of H ) with p ¼ 2. In particular, assume that max{2u11 , uh1 , 2u12 , uh2 } , 1=2. Then pffiffiffi d n(g^ n gn ) ! N(0, s 2 G1 ):
(4:104)
180
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
(ii) In the semi-reduced case suppose that L1 (x) ¼ log x and that L2 satisfies conditions of Theorem 4.4.8 with p ¼ 2. In particular, assume that max{2u12 , uh2 } , 1=2. Then Eq. (4.104) is true. Proof. (i) Nonreduced model. Denote 0
1 Xn ¼ @ 1
1 ~ log (1=n) H(1, n) A ~ log (n=n) H(n, n)
the matrix of regressors in Eq. (4.100). The definition of H~ is clear from 4.6.5. Let us prove that the first, second and third columns of Wn ¼ pffiffiffi (1= n)Xn are L2 -close to f1 , f2 , and f3 , respectively. For the first column this is obvious because if we denote it wn , then Dn2 wn is identically 1 on (0, 1). By Theorem 4.4.1 in which we put p ¼ 2, k ¼ 1, 1 1 n 0 is L2 -close to f2 : wn ¼ pffiffiffi log , . . . , log n n n For the third column the statement follows from Theorem 4.4.8 and linearity ~ of Lp -approximability by construction of H. pffiffiffi Now Eq. (4.104) follows from Theorem 1.12.3 where Dn ¼ nI. ~ n) ¼ (ii) For the semi-reduced model the situation is simpler, since H(s, B H2 (s, n), while the first two columns of Wn are the same.
4.6.7 Convergence of bad Coefficients It turns out that only g^ n2 affects the limit distribution of b^ . Lemma 4.5.9(iii) and convergence Eq. (4.104) imply pffiffiffi d n(g^ n2 gn2 ) ! N(0, s 2 =4):
(4:105)
To describe the behavior of the bad coefficients denote 0 1 1i (b^ 0 b0 ) pffiffiffiB C n@ L1 11 (b^ 1 b1 ) A, i ¼ 1, 2; B(i) n ¼ L2 12 (b^ 2 b2 ) 0 1 0 1 l1 1 1 B C B C f (l1 ) ¼ @ 1 A, g ¼ @ 1 A: 1
1
Let G be a normal variable distributed as N(0, s 2 =4) [it arises from Eq. (4.105)].
4.6 REGRESSION WITH TWO SLOWLY VARYING REGRESSORS
181
Theorem. (Classification theorem). (Mynbaev, 2011) Suppose conditions for convergence of the good coefficients from Theorem 4.6.6 and Assumption 4.4 hold. Then the relation between the bad coefficients [contained in B(i) n ] and good coefficients (represented by G) is as presented in Table 4.3. In the cases marked “indefinite” RV of orders higher than 3 is required to determine the type. This theorem reveals the distinction between the definite case, when the third-order RV is enough to determine the asymptotics, and the indefinite case, when higherorder RV is necessary. The situation is similar to the sufficient condition for optima in terms of the first- and second-order derivatives: if the second-order condition is not satisfied, we have to check higher-order derivatives. The most unexpected outcome is that the asymptotic variances jump along certain rays. The case of more than two different SV regressors should present an even larger number of different asymptotic types and Phillips (2007, Theorem 5.2) does not cover all possibilities.
4.6.8 Proof of the Classification Theorem Repeating the main idea from Section 4.6.2, we use the information about convergence of the good coefficients Eq. (4.104) in combination with the relation between the good and bad coefficients to see what terms determine the limit distribution. By Lemma 4.5.6
g^ n gn ¼ An (b^ b):
(4:106)
We now discuss the eight cases itemized in Table 4.3. 4.6.8.1 Case A Let us restrict our attention to the nonreduced model and in the first two cases assume that either (jld j , 1, b1 ld þ b2 = 0) or (jld j ¼ 1, b1 = 0). In terms of Table 4.2, we are looking at Cases I and IV. In both cases the matrix An is the same as in the Phillips analysis. From Eq. (4.92) we have det An ¼ L1 11 L2 12 (m2 m1 ). For An to be invertible, we have to require m1 =m2 for all large n: One can check that 0
A1 n
1 ¼ @0 0
(1=(m1 m2 ))(m2 =11 m1 =12 ) m2 =(L1 11 (m1 m2 )) m1 =(L2 12 (m1 m2 ))
1 (1=(m1 m2 ))(1=12 1=11 ) A 1=(L1 11 (m1 m2 )) 1=(L2 12 (m1 m2 )) (4:107)
satisfies A1 n An ¼ I and that 0
(m 1
1 m2 )D(1) n An
m1 m2 ¼@ 0 0
1 m2 m1 (11 =12 ) (11 =12 ) 1 A 1 m2 m1 1
(4:108)
182
Cases A–H discussed in Section 4.6.8.
Semireduced model, (L1 (x) ¼ log x; d2 = 0)
Nonreduced model (d1 = 0, d2 = 0)
Case
Subcase
H. m2 B(2) n ! gG
G. m2 B(1) n ! f (l1 )G
d
Indefinite
Indefinite
d
F. m2 B(2) n ! gG
d
D. m1 B(2) n ! gG
(jld j , 1, b2 = 0, b1 ld þ b2 ¼ 0) d
d
B. (m1 m2 )B(2) n ! gG if m1 = m2
jl1 j ¼ 1
E. m2 B(1) n ! f (l1 )G
d
C. m1 B(1) n ! f (l1 )G
d
A. (m1 m2 )B(1) n ! f (l1 )G if m1 = m2
d
jl1 j , 1
(jld j ¼ 1, b1 ¼ 0)
(jld j , 1, b2 ¼ 0, b1 ld þ b2 ¼ 0)
(jld j , 1, b1 ld þ b2 = 0) or (jld j ¼ 1, b1 = 0)
TABLE 4.3 Type-Wise OLS Asymptotics
Indefinite if m1 ¼ m2
jl1 j 1
4.6 REGRESSION WITH TWO SLOWLY VARYING REGRESSORS
183
where D(i) n ¼ diag[1i , L1 11 , L2 12 ], i ¼ 1, 2. In the case under consideration jl1 j , 1. From Eqs. (4.106), (4.107), and (4.108) we have pffiffiffi ^ n(m1 m2 )D(1) n (b b) 0 m1 m2 m2 m1 (11 =12 ) B ¼@ 0 m2
(m1 m2 )B(1) n ¼
m1
0
1 (11 =12 ) 1 Cpffiffiffi 1 A n( g^ n gn ): 1
Now take into account that 1i and mi vanish at infinity by the Karamata theorem, that pffiffiffi 11 =12 !l1 by assumption and that n( g^ n gn ) converges in distribution by Theorem 4.6.6. Then the preceding equation and Eq. (4.105) imply pffiffiffi d ^ n2 gn2 ) þ op (1) ! f (l1 )G: (m1 m2 )B(1) n ¼ f (l1 ) n( g
(4:109)
In the other cases the argument is similar, and we indicate only the analogs of Eqs. (4.107), (4.108), and (4.109). 4.6.8.2 Case B In this case jl1 j ¼ 1 and the other assumptions do not change, so we are still in Cases I and IV of Table 4.2. To obtain the fraction 12 =11 !0, we change the diagonal matrix in Eq. (4.108) to obtain 0
1 (m1 m2 )D(2) n An
m1 m2 ¼@ 0 0
m2 (12 =11 ) m1 m 2 m1
1 1 (12 =11 ) A: 1 1
Then (m1 m2 )B(2) n ¼
pffiffiffi 1 ^ n g n ) n(m1 m2 )D(2) n An ( g
pffiffiffi d ¼ g n( g^ n2 gn2 ) þ op (1) ! gG:
4.6.8.3 Case C zero element, 0
1 An ¼ @ 0 0
By Table 4.2, Case III, in the last row of An there is only one non-
L1 L1 11 d1
1 L2 L2 12 A, 0
0
A1 n
1 ¼ @0 0
1 1=12 1=m1 (1=12 1=11 ) A: 0 (1=d1 ) 1=(L2 12 ) 1=(L2 12 m1 )
Noting that 0
1 m1 D(1) n An
m1 11 ¼@ 0 0
m1 (11 =12 ) 0 m1
1 (11 =12 ) 1 A 1 1
184
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
we get
m1 B(1) n ¼
pffiffiffi 1 ^ n g n ) nm1 D(1) n An ( g
pffiffiffi d ¼ f (l1 ) n( g^ n2 gn2 ) þ op (1) ! f (l1 )G:
4.6.8.4 Case D Here jl1 j ¼ 1 and all other assumptions are like in Case C (Table 4.2, Case III). So 0 1 m1 12 m1 1 12 =11 1 @ 0 A m1 D(2) 0 1 n An ¼ m1 1 0 and, as a result,
m1 B(2) n ¼
pffiffiffi 1 ^ n g n ) nm1 D(2) n An ( g
pffiffiffi d ¼ g n( g^ n2 gn2 ) þ op (1) ! gG:
4.6.8.5 Case E We continue looking at the nonreduced model and assume that jld j ¼ 1, b1 ¼ 0: From Table 4.2, Case V, we see that An is triangular, 0 1 0 1 1 L1 1 1=11 1=m2 (1=11 1=12 ) L2 @ 0 1=(L1 11 ) 1=(L1 11 m2 ) A: An ¼ @ 0 L1 11 L2 12 A, A1 n ¼ 0 0 d2 0 0 1=d2 (4:110) Suppose that jl1 j , 1: To make use of this condition, consider 0 1 m2 11 m2 1 11 =12 1 @ 0 A: m2 D(1) m2 1 n An ¼ 0 0 1 It follows that pffiffiffi d ^ n2 gn2 ) þ op (1) ! f (l1 )G: m2 B(1) n ¼ f (l1 ) n( g
(4:111)
4.6.8.6 Case F We are again in Table 4.2, Case V. Unlike Case E, now we have jl1 j ¼ 1 and use 0 1 m2 12 m2 12 =11 12 =11 1 (2) 1 A: m2 Dn An ¼ @ 0 m2 1 0 0 1
4.6 REGRESSION WITH TWO SLOWLY VARYING REGRESSORS
185
Hence, pffiffiffi d ^ n2 gn2 ) þ op (1) ! gG: m2 B(2) n ¼ g n( g
(4:112)
4.6.8.7 Cases G and H As is clear from Section 4.6.5.2 (Case VI), the transition matrix and its inverse are the same as in Eq. (4.110). Therefore the conclusions in Eqs. (4.111) and (4.112) apply. The case m1 ¼ m2 is marked as indefinite because, as mentioned in Case A, the matrix An is not invertible. Case II (Section 4.6.5.1) (jld j , 1, b1 ld þ b2 ¼ 0, b2 = 0) is also indefinite because the transition matrix is not defined.
4.6.9 Example Since Phillips (2007, Theorem 5.1) is actually about regression with a quadratic form in log(s=n), no wonder its predictions are different from those of the classification theorem. In particular, the latter theorem captures a new effect that, within the same model, the rate of convergence and the asymptotic standard error depend on the true b1 and b2 . The following example from Phillips (2007) has iterated logarithmic growth, a trend decay component and a constant regressor: ys ¼ b0 þ b1 =log s þ b2 log(log s) þ us : Such a model is relevant in empirical research where we want to capture simultaneously two different opposing trends in the data. Here L1 (s) ¼ 1=log s, L2 (s) ¼ log(log s). From Table 4.1 11 (n) ¼ 12 (n) ¼
1 , log n
m1 (n) ¼
1 , (log n)log(log n)
1 , log n
d1 (n) ¼
m2 (n) ¼
1 , log3 n
1 1 : , d2 (n) ¼ 2 log n 2 log2 n
Since d1 =d2 ! 0 and 12 =11 ! 0, we have from Table 4.3, Cases B and D, 0 1 ( (1=log(log n))( b^ 0 b0 ) pffiffiffi 2gG if b2 = 0; n B d C ^ (1=log n)( b 1 b1 ) A ! 2 @ log n gG if b2 ¼ 0: b^ 2 b2 The formula from Phillips (2007, pp. 575 – 576), after correction of two typos, gives 0 1 (1=log(log n))( b^ 0 b0 ) pffiffiffi n B d (1=log n)( b^ 1 b1 ) C A ! gG, 2 @ log n b^ 2 b2 regardless of b2 .
186
CHAPTER 4
REGRESSIONS WITH SLOWLY VARYING REGRESSORS
The comments by Phillips apply. The coefficient of the growth term converges pffiffiffi fastest, but at less than an n rate. The intercept converges next fastest, and finally the coefficient of the evaporating trend. All of these outcomes relate to the strength of the signal from the respective regressor.
4.6.10 What Can be Done in the Indefinite Cases? From the derivation of Table 4.2 and proof of Theorem 4.6.7 we can see that indefiniteness can occur when the rate of approximation is not good enough to define the transition matrix or when it is defined, but is not invertible. We use the first possibility as an example. To this end, basic notation related to approximation of SV functions is necessary. Let (1) (2) Li ¼ K(1(1) i , f1(1) ), 1i ¼ K(1i , f1(2) ), i
i
(3) 1(2) i ¼ K(1i , f1(3) ); i
and suppose 1(3) i is SV for i ¼ 1, 2. One can then prove Li (rn) (1) (1) (1) (1) (2) 2 3 1 ¼ 1(1) i log r þ 1i mi log r þ 1i mi mi log r Li (n) (1) (2) þ o(1(1) i mi mi ),
(4:113)
where 2
m(1) i ¼
(2) (2) (3) (1) (2) (1) 1(1) 1(2) i þ 1i i (1i þ 1i ) þ 31i 1i þ [1i ] , m(2) ¼ : i (2) 2 1(1) i þ 1i
(4:114)
Denoting H (2) ¼ G, H (3) (t, n) ¼ [H (2) (t, n) log2 (t=n)](1=m(2) (n)), from Eq. (4.113) we have the fourth-order RV Hi(3) (rn, n) ¼ log3 r þ o(1): With this information, an extension of Eq. (4.95) is s þ dj (n)Hj(2) (s, n) n s s þ dj (n)log2 ¼ Lj (n) þ Lj (n)1(1) j (n)log n n
Lj (s) ¼ Lj (n) þ Lj (n)1(1) j (n)log
(4:115)
(3) þ dj (n)m(2) j Hj (s, n):
Equation (4.115) allows us to rewrite Eq. (4.98) as (3) (2) (3) 2 Dn ¼ b1 d1 m(2) 1 H1 (s, n) þ b2 d2 m2 H2 (s, n) þ (b1 d1 þ b2 d2 ) log (2) Denote k ¼ (b1 d1 þ b2 d2 )=(b1 d1 m(2) 1 þ b2 d2 m2 ).
s : n
(4:116)
4.6 REGRESSION WITH TWO SLOWLY VARYING REGRESSORS
187
TABLE 4.4 Transition Matrix Summary in Case II: jld j < 1, b1 ld þ b2 ¼ 0, b2 =0
Case
Subcase
jldm j , 1
jldm j ¼ 1
b1 ldm þ b2 ¼ 0 b1 ldm þ b2 = 0 b1 = 0 b1 ¼ 0
4.6.10.1
Transition matrix elements jlk j , 1 jlk j ¼ 1
VII. Indefinite (2) VIII. a32 ¼ d1 m(2) 1 , a33 ¼ d2 m2 IX. a32 ¼ d1 , a33 ¼ d2
jlk j , 1 jlk j ¼ 1 jlm j , 1 jlm j ¼ 1
(2) X. a32 ¼ d1 m(2) 1 , a33 ¼ d2 m2 XI. a32 ¼ d1 , a33 ¼ d2 XII. a32 ¼ 0, a33 ¼ d2 XIII. a32 ¼ 0, a33 ¼ d2 m(2) 2
Assumption 4.5
(2) 1. Let b1 d1 m(2) 1 þ b2 d2 m2 = 0 for all large n. (2) (2) 2. Suppose the limits ldm ¼ limn ! 1 d1 m(2) 1 =(d2 m2 ), lk ¼ lim k, lm ¼ lim m2 (finite or infinite) exist. 3. {n1=2 Hj(3) } is L2 -close to log3 x, j ¼ 1, 2:
Under this assumption the last row of the transition matrix is described by Table 4.4, where the numbering continues that of Table 4.2. Case VII starts a new indefinite branch. Equation (4.113) can be called a pointwise RV. The method requires establishing an integral version of the fourth-order RV in the form of Assumption 4.5(3) (Section 4.6.10.1). The proofs of second-order and third-order RV given in this chapter are pretty complex. The proof of the fourth-order RV must be even more complex given that the function m(2) in Eq. (4.114) depends nonlinearly on the 1-functions. Therefore, trying to obtain a general result is not recommended. If indefiniteness arises in an applied problem with specific L1 , L2 , it would be easier to prove Assumption 4.5(3) (Section 4.6.10.1) for those specific functions.
CHAPTER
5
SPATIAL MODELS
U
NLIKE AUTOREGRESSIVE models, which may contain only lags of the dependent variable, spatial models may contain all kinds of shifts: backward, forward and, in some problems, without any specific spatial or temporal direction. The reader can consult (Anselin, 1988), (Cliff and Ord, 1981), (Cressie, 1993) and (Anselin and Bera, 1998) about applications of spatial models. We concentrate on the mathematical side. Both the techniques and the asymptotic results in spatial models are quite different from those in Chapter 4. This chapter is based on (Mynbaev and Ullah, 2008) and (Mynbaev, 2010). One of the major differences between the previous research and ours is that we do not impose any conditions on nonlinear matrix functions of the spatial matrix and derive their properties from low-level assumptions. Another difference is that the gauge inequality allows us to deal with autocorrelated errors using only low-level conditions. For the purely spatial model we prove convergence in distribution to a ratio of two quadratic forms in standard normal variables. This format of the asymptotic statement is justified by the finite-sample properties. As a by-product of the method, we show that the identification conditions for ML and method of moments (MM) developed by other authors fail under our conditions. Interestingly, the two-step procedure we suggest for correcting bias is a combination of least squares and ML estimators. For the mixed spatial model we prove convergence in distribution to a nonstandard vector whose components contain both linear and quadratic forms in standard normal variables. For this we need to prove that the pair (denominator, numerator) converges in distribution because the denominator converges in distribution to a random matrix. This is the place where a parsimonious choice of assumptions and the normalizer is especially important. It is because of them that the asymptotic distribution automatically adjusts to the rates of growth of the exogenous regressors. Finally, the problem caused by randomness of the denominator matrix is addressed with the device called a multicollinearity detector.
Short-Memory Linear Processes and Econometric Applications. Kairat T. Mynbaev # 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.
189
190
CHAPTER 5
SPATIAL MODELS
5.1 A MATH INTRODUCTION TO PURELY SPATIAL MODELS 5.1.1 The Model To understand what a spatial model is, it is useful to start with a simple autoregression yt ¼ ryt1 þ et ,
t ¼ 1, . . . , n:
Denoting Yn ¼ (y1 , . . . , yn )0 , En ¼ (e1 , . . . , en )0 , Y0 ¼ ( y0 , 0, . . . , 0) 0 (n coordinates), we can write the model as 0
1 0 0 ... 0 0 B 1 0 ... 0 0 C B C Yn ¼ rB 0 1 ... 0 0 C B CYn þ rY0 þ En : @... ... ... ... ...A 0 0 ... 1 0 It is easy to see that in the case of a more general autoregression yt ¼ r1 yt1 þ þ rp ytp þ et ,
t ¼ 1, . . . , n,
there will be more nonzero elements in the big matrix in front of Yn , but all of them will be below the main diagonal. In the purely spatial model Yn ¼ rWn Yn þ En ,
(5:1)
Yn and En have the same meaning as above. The spatial matrix Wn , however, can have nonzero elements anywhere and they don’t have to be unities. If t is time, this means regressing yt on scaled backward and forward shifts. If everything is happening in the 2-D space, one can add upward and downward shifts. However, in many applications there is no temporal, spatial or geographic connotation. The spatial matrix is assumed predetermined and its elements don’t need to be estimated. It may or may not have a specific analytical structure. Finally, in the spatial model there is no reference to the initial vector Y0 . Members of Yn form a closed community: they refer only to each other. In most papers on spatial models the components of the error En in Eq. (5.1) are assumed to be independent or m.d.s, (see Lee, 2004a; Kelejian et al., 2004) and references therein. The method developed in (Mynbaev, 2010) works in case of linear processes with m.d.s as innovations. Therefore here we consider Yn ¼ rWn Yn þ vn , where the components of vn ¼ (vn1 , . . . , vnn )0 form a linear process.
(5:2)
5.1 A MATH INTRODUCTION TO PURELY SPATIAL MODELS
191
5.1.2 OLS Estimator and Related Matrices r is the parameter to be estimated. Since the regressor is Xn ¼ Wn Yn , the usual OLS estimator formula gives r^ ¼ (Xn0 Xn )1 Xn0 Yn : Its consequence
r^ r ¼ (Xn0 Xn )1 Xn0 vn ¼
(Wn Yn )0 vn , (Wn Yn )0 Wn Yn
(5:3)
used in the conventional scheme here, is insufficient for analysis because of the presence of the dependent vector Yn at the right-hand side. Denote Sn ¼ Sn (r) ¼ I rWn : The matrix S1 n , when it exists, can be called a solver because it solves Eq. (5.2) for Yn : Yn ¼ S1 n vn :
(5:4)
Equation (5.4) is the reduced form of Eq. (5.2) (the form in which there is no dependent variable at the right). It is convenient to put Gn ¼ Wn S1 n : Substitution of Eq. (5.4) into Eq. (5.3) yields
r^ r ¼
v0n G0n vn : v0n G0n Gn vn
(5:5)
The analysis of the fraction at the right here is possible after we study Sn and Gn .
5.1.3 History Kelejian and Prucha (1999) considered a generalized MM estimator for Eq. (5.1). Lee (2001) developed the theory of quasi-maximum likelihood (QML) estimation. Lee (2002) provided an example of inconsistency of the OLS estimator r^ . These authors were looking for normal asymptotics pffiffiffi d n(~r r) ! N(0, V) and imposed conditions accordingly. Here r~ is any of MM, QML or OLS estimators and V is the variance of the limit distribution. The research in the above references moved towards relaxing the assumptions that underlie the asymptotic results. Along the way the conditions imposed and the results obtained became complex, to the point that it is hard to see whether a given condition can be satisfied or whether two different conditions imposed on the same sequence of matrices are compatible.
192
CHAPTER 5
SPATIAL MODELS
In (Mynbaev and Ullah, 2008) our main objective was to simplify and reduce the number of conditions, avoid assumptions with overlapping responsibilities and derive the characteristics of the limit distribution from the primary low-level conditions. We found that, under some conditions, the OLS estimator can be asymptotically a ratio of two infinite linear combinations of x2 variables.
5.1.4 Finite-Sample Expression The intuition behind our result is explained as follows. Lemma. Suppose that Wn is symmetric with eigenvalues l1 , . . . , ln and vn is distributed as N(0, s 2 I): Then Pn h(li )u2i P r^ r ¼ ni¼1 2 , (5:6) 2 i¼1 h (li )ui where u1 , . . . , un are independent standard normal and h(t) ¼ t=(1 rt): Proof. By the diagonalization theorem (Theorem 1.7.2) Wn can be represented as Wn ¼ P0n Ln Pn , where Pn is an orthogonal matrix, P0n Pn ¼ Pn P0n ¼ I and Ln is a diagonal matrix with l1 , . . . , ln on the main diagonal. Then Sn ¼ I rWn ¼ P0n Pn rP0n Ln Pn ¼ P0n (I rLn )Pn , 0 0 1 0 1 Gn ¼ Wn S1 n ¼ Pn Ln Pn Pn (I rLn ) Pn ¼ Pn Ln (I rLn ) Pn ,
v0n G0n vn ¼ v0n P0n Ln (I rLn )1 Pn vn ¼ (Pn vn )0 Ln (I rLn )1 Pn vn , v0n G0n Gn vn ¼ v0n P0n Ln (I rLn )1 Pn P0n Ln (I rLn )1 Pn vn ¼ (Pn vn )0 [Ln (I rLn )1 ]Pn vn : Here we remember that Ln (I rLn )1 is symmetric. Moreover, Ln (I rLn )1 ¼ diag[h(l1 ), . . . , h(ln )]: Since EPn vn ¼ 0, V(Pn vn ) ¼ Pn V(vn )P0n ¼ s 2 Pn P0n ¼ s 2 I, we see that Vn ¼ Pn vn is distributed as N(0, s 2 I). Hence, it can be represented as Vn ¼ su, where u ¼ (u1 , . . . , un )0 . Substitution of this Vn in the above expressions yields v0n G0n vn ¼ su0 diag[h(l1 ), . . . , h(ln )]su ¼ s 2
n X
h(li )u2i ,
i¼1
v0n G0n Gn vn ¼ su0 {diag[h(l1 ), . . . , h(ln )]}2 su ¼ s 2
n X
h2 (li )u2i :
i¼1
Now the statement follows from Eq. (5.5).
B
5.2 CONTINUITY OF NONLINEAR MATRIX FUNCTIONS
193
As a result of this lemma we think that the asymptotics of the OLS estimator should be a ratio of two linear combinations of x2 variables rather than normal.
5.2 CONTINUITY OF NONLINEAR MATRIX FUNCTIONS 5.2.1 Elementary Algebraic Identities Lemma.
Let A, B be square matrices of the same size. Then kþ1
(i) A Bkþ1 ¼ Ak (A B) þ Ak1 (A B)B þ þ (A B)Bk , k ¼ 1, 2, . . . , P1 P k 2 m (ii) If kAk2 , 1, then ¼ 1 k¼0 A m¼0 (m þ 1)A . Proof. (i) The purpose here is to reveal the difference A B in Akþ1 Bkþ1 ¼ Akþ1 Ak B þ Ak B Ak1 B2 þ Ak1 B2 þ ABk Bkþ1 ¼ Ak (A B) þ Ak1 (A B)B þ þ (A B)Bk : (ii) It is easy to check that the k k2 -norm of matrices is submultiplicative: kABk2 kAk2 kBk2 : Therefore kAk k2 kAkk2 : kAk2 , 1 implies that in the next equation all the series converge and rearranging the members is legitimate: 1 X k¼0
!2 k
A
¼
1 X
Akþl ¼
1 X X m¼0 kþl¼m
k,l¼0
Akþl ¼
1 X
(m þ 1)Am :
m¼0
The end equality obtains if we observe that there are m þ 1 pairs (k, l) of nonnegative integers such that k þ l ¼ m. B
5.2.2 h-Function and h-Series Functional calculus deals with applying functions of a real or complex argument to matrices and operators. We are interested in the function h(t) ¼
t 1 rt
194
CHAPTER 5
SPATIAL MODELS
because Gn ¼ h(Wn ): We call it an h-function because its graph is a hyperbola except for the case r ¼ 0. This name gives rise to a series of related notions: h-series, h-continuity, etc. Lemma.
If A is a square matrix satisfying jrjkAk2 , 1, then h(A) ¼
1 X
rk Akþ1
(5:7)
k¼0
and k(I rA)1 k2
1 kAk2 , kh(A)k2 : 1 jrjkAk2 1 jrjkAk2
(5:8)
Proof. If kAk2 , 1, we can let m ! 1 in the identity (I A)
m X
Ak ¼ A Amþ1 :
k¼0
The resulting equation (I A)
P1
k¼0
Ak ¼ I means that
(I A)1 ¼
1 X
Ak if kAk2 , 1:
k¼0
Applying this well-known fact to rA instead of A we see that S1 n (r) ¼
1 X
rk Ak ,
(5:9)
k¼0
which implies Eq. (5.7). From Eqs. (5.9) and (5.7) we deduce kS1 n (r)k2
1 X k¼0
kh(A)k2
1 X k¼0
jrjk kAkk2 ¼
1 , 1 jrjkAk2
jrjk kAk2kþ1 ¼ kAk2
1 X
(jrjkAk2 )k ¼
k¼0
kAk2 : 1 jrjkAk2 B
We use the name h-series for the decomposition (5.7).
5.2.3 h-Continuity Lemma.
For square matrices A, B such that
m ¼ jrjmax{kAk2 , kBk2 } , 1
(5:10)
5.3 ASSUMPTION ON THE ERROR TERM AND IMPLICATIONS
195
we have kh(A) h(B)k2 [1 þ f(r, A, B)]kA Bk2 , where
f(r, A, B) ¼
1 X
(k þ 1)mk , 1:
k¼1
Proof. We need the bound kAkþ1 Bkþ1 k2 kA Bk2 (k þ 1)(max{kAk2 , kBk2 })k ,
k ¼ 0, 1, . . .
(5:11)
For k ¼ 0 it is trivial. For k . 0 it follows from submultiplicativity and Lemma 5.2.1(i), where there are k þ 1 terms at the right. This bound and the h-series lead to kh(A) h(B)k2
1 X
jrjk kAkþ1 Bkþ1 k2
k¼0
kA Bk2
1 X
jrjk (k þ 1)(max{kAk2 , kBk2 })k
k¼0
¼ [1 þ f(r, A, B)]kA Bk2 :
B
This lemma gives rise to the following property which can be termed h-continuity: if kAn Bn k2 ! 0 and the numbers f(r, An , Bn ) are uniformly bounded, sup f(r, An , Bn ) , 1,
(5:12)
n
then limkh(Bn ) h(An )k2 ¼ 0: I call condition (5.12) an escort, meaning that many convergence statements depend on its validity.
5.3 ASSUMPTION ON THE ERROR TERM AND IMPLICATIONS 5.3.1 Assumption on the Error Term 5.3.1.1
Assumption 5.1
1. {{ent , F nt : n [ Z} : t [ Z} is a double-infinite m.d. array [ent is F nt measurable, F n,t1 , F nt and E(ent jF n,t1 ) ¼ 0]. 2. E(e2nt jF n,t1 ) ¼ s 2 for all t and n. 3. The third conditional moments are constant but may depend on n and t, that is, E(e3nt jF n,t1 ) ¼ ant , and the fourth moments are uniformly bounded, m4 ¼ supn,t Ee4nt , 1.
196
CHAPTER 5
SPATIAL MODELS
4. {cj : j [ Z} is a summable sequence of numbers, ac ¼
P
j[Z
jcj j , 1.
0
5. Components of vn ¼ (vn1 , . . . , vnn ) are defined by X en,tj cj , t ¼ 1, . . . , n: vnt ¼ j[Z
Conditions 1, 2, 4 and 5 are from Section 3.5.2. Condition 3 is from Section 3.9.7. Uniform integrability of e2nt follows from Condition 3 and therefore is not included in Condition 2.
5.3.2 Gauge Version of h-Continuity For an n n matrix A g(A) ¼ [E(v0n Avn )2 ]
1=2
denotes the gauge (Section 3.9.9). In the lemma below, Lemma.
pffiffiffi n spoils the picture.
If vn satisfies Assumption 5.1 and A, B satisfy
m ¼ jrj max{kAk2 , kBk2 } , 1, then g(h(A) h(B)) ckA Bk2
pffiffiffi n þ f(r, A, B) ,
(5:13)
where f is the same as in Lemma 5.2.3. Proof. By the Minkowski inequality g(h(A) h(B)) ¼ g
1 X
! k
r (A
kþ1
B
kþ1
)
k¼0
g(A B) þ
1 X
jrjk g(Akþ1 Bkþ1 ):
(5.14)
k¼1
By the gauge inequality (Section 3.9.9) pffiffiffi g(A B) ckIk2 kA Bk2 ¼ c nkA Bk2 :
(5:15)
For k . 0 we employ Lemma 5.2.1(i) and the gauge inequality to get g(Akþ1 Bkþ1 ) g(Ak (A B)) þ g(Ak1 (A B)B) þ þ g((A B)Bk ) c[kAkk2 kA Bk2 þ kAk2k1 kA Bk2 kBk2 þ þ kA Bk2 kBkk2 ] c(k þ 1)( max {kAk2 , kBk2 })k kA Bk2 :
(5.16)
5.3 ASSUMPTION ON THE ERROR TERM AND IMPLICATIONS
197
Summarizing, " g(h(A) h(B)) ckA Bk2
# 1 pffiffiffi X k (k þ 1)m : nþ B
k¼1
5.3.3 Trace Version of h-Continuity Lemma. then
If vn satisfies Assumption 5.1 (Section 5.3.1) and A, B satisfy Eq. (5.10),
jtrh(A) trh(B)j kA Bk2
pffiffiffi n þ f ( r, A, B)
(5:17)
where f is from Lemma 5.2.3. Proof. Since a trace is a linear function of a matrix, tr(aA þ bB) ¼ atrA þ btrB, we have a trace analog of Eq. (5.14) ! 1 X k kþ1 kþ1 r (A B ) jtrh(A) trh(B)j ¼ tr k¼0 jtr(A B)j þ
1 X
jrjk jtr(Akþ1 Bkþ1 )j:
k¼1
The trace of a single matrix gives rise to the nasty
pffiffiffi n, as in
n pffiffiffi X jtrAj ¼ aii nkAk2 , i¼1 while the trace of a product behaves better as long as the Euclidean norm is used: n X n X aij b ji kAk2 kBk2 : jtrABj ¼ i¼1 j¼1
(5:18)
Bound (5.18) opens the door to the analog of Eq. (5.16): jtr(Akþ1 Bkþ1 )j jtr(Ak (A B))j þ þ jtr((A B)Bk )j (k þ 1)( max {kAk2 , kBk2 })k kA Bk2 : The conclusion follows in the same way as for the gauge version.
B
198
CHAPTER 5
SPATIAL MODELS
5.4 ASSUMPTION ON THE SPATIAL MATRICES AND IMPLICATIONS 5.4.1 Assumption on the Spatial Matrices 5.4.1.1 Assumption 5.2 The sequence of matrices {Wn : n [ N} is such that Wn is of size n n and there exists a function K [ L2 ((0, 1)2 ) that satisfies 1 kWn dn Kk2 ¼ o pffiffiffi : n
(5:19)
Here dn ¼ dn2 is the 2-D discretization operator from Section 3.6.1. There are at least as many such classes of matrices as there are functions in L2 ((0, 1)2 ): One can take any function K [ L2 ((0, 1)2 ) and put Wn ¼ d2 K, in which case the left side of Eq. (5.19) pffiffiffi is identically zero. Unfortunately, because of the presence of n in Eqs. (5.13) and (5.17) just L2 -approximability kWn dn Kk2 ! 0 is insufficient for our purposes. In Section 5.4.2 we show that L2 -approximability of Wn implies lim max jwnij j ¼ 0,
n !1 i, j
lim
n !1
X
jwnij j ¼ 1:
(5:20)
i, j
The first equation P means that the influence of a given economic unit on other units is weak. If the sum i, j jwnij j is adopted as a measure of total interaction among the units, the second equation shows that this interaction increases to infinity.
5.4.2 Simple Implications of Assumption 5.2 Lemma.
Let {Wn } satisfy Assumption 5.2. Then
(i) kWn k2 and kdn Kk2 are asymptotically the same, lim kWn k2 ¼ lim kdn Kk2 ¼ kKk2 :
n !1
n !1
(5:21)
(ii) If K is symmetric, then Wn0 approaches dn K with the same rate as Wn : 1 0 kWn dn Kk2 ¼ o pffiffiffi : n (iii) Eq. (5.20) is true. Proof. (i) Because of the convergence of Pn K to K (see Section 2.2.1) and continuity of norms we have lim kPn Kk2 ¼ kKk2 :
n !1
5.4 ASSUMPTION ON THE SPATIAL MATRICES AND IMPLICATIONS
199
Also take into account that Pn ¼ Dn dn and that Dn preserves norms (see Section 2.1.7) so that kPn Kk2 ¼ kDn dn Kk2 ¼ kdn Kk2 : We have proved the second Eq. (5.21). The first equation follows directly from Eq. (5.19) and continuity of norms. (ii) Note that (x, y) [ qij if and only (y, x) [ q ji (see the definition of dn K in Section 3.6.1) and, hence, for a symmetric K, dn K is also symmetric. Now the statement follows from kWn0 dn Kk2 ¼ k(Wn dn K)0 k2 ¼ kWn dn Kk2 :
(5:22)
(iii) The first equation in Eq. (5.20) is established like Lemma 2.5.2 (ii): initially it is proved for Lp -generated sequences and then extended to Lp-approximable sequences. From the first equation in Eq. (5.20) and part (i) of this lemma 0 , c kWn k22 kWn k1 kWn k1 , which implies the second equation of Eq. (5.20).
B
5.4.3 Existence of the OLS Estimator As is clear from Eq. (5.5), the OLS estimator exists whenever Sn (r) is invertible. A good condition for the invertibility of Sn (r) should be expressed in terms of the basic assumptions. Such a condition is provided in the lemma below. Fortunately, it is weaker than the condition on r imposed later for convergence of the estimator in distribution. Lemma.
If {Wn } satisfies Assumption 5.2 and r satisfies jrj ,
1 , kKk2
(5:23)
then there exists a natural n0 such that sup kGn k2 , 1
nn0
and the OLS estimator Eq. (5.5) exists for all large n:
(5:24)
200
CHAPTER 5
SPATIAL MODELS
Proof. By condition Eq. (5.23) there exists 1 . 0 satisfying jrjkKk2 1 21. By Lemma 5.4.2 there is a natural n0 such that sup jrjkWn k2 1 1,
nn0
sup jrjkdn Kk2 1 1:
(5:25)
nn0
Therefore Eq. (5.8) implies Eq. (5.24): kGn k2 ¼ kh(Wn )k2
kWn k2 1 sup kWn k2 , 1: 1 jrjkWn k2 1 nn0
(5:26) B
5.4.4 Gauging G0n h(dn K) Lemma.
Under the conditions of Lemma 5.4.3
g(G0n h(dn K)) ! 0,
kGn h(dn K)k2 ! 0,
kG0n h(dn K)k2 ! 0:
Proof. Equation (5.25) implies
mn ¼ jrj max {kWn0 k2 , kdn Kk2 } 1 1 and therefore the escort is satisfied:
f(r, Wn0 , dn K)
1 X
(k þ 1)(1 1)k c,
for n n0 :
k¼1
By the gauge version of h-continuity and Lemma 5.4.2 g(G0n h(dn K)) ¼ g(h(Wn0 ) h(dn K)) pffiffiffi ckWn0 dn Kk2 n þ f(r, Wn0 , dn K) ! 0:
(5.27)
The other two statements of the lemma rely on Lemma 5.2.3 but the escort is the same. B
5.4.5 Gauging G0n Gn h2 (dn K) Lemma.
Under conditions of Lemma 5.4.3, g(G0n Gn h2 (dn K)) ! 0:
Proof. Replacing Wn by dn K in Eq. (5.26) we obtain sup kh(dn K)k2 , 1:
nn0
(5:28)
5.5 ASSUMPTION ON THE KERNEL AND IMPLICATIONS
201
By the Minkowski and gauge inequalities (Sections 1.2.4 and 3.9.9, respectively) g(G0n Gn h2 (dn K)) g((G0n h(dn K))Gn ) þ g(h(dn K)(Gn h(dn K))) c[kG0n h(dn K)k2 kGn k2 þ kh(dn K)k2 kGn h(dn K)k2 ]: (5.29) The escort for G0n h(dn K) and Gn h(dn K) is the same [this is proved like for Eq. (5.22)], and we have checked its validity in Lemma 5.4.4. Therefore both tend to zero in L2 . kGn k2 and kh(dn K)k2 are uniformly bounded by Eqs. (5.29) and (5.28). From Eq. (5.29) we see that the lemma is true. B
5.5 ASSUMPTION ON THE KERNEL AND IMPLICATIONS 5.5.1 Assumption on the Kernel 5.5.1.1 Assumption 5.3 The function K from Section 5.4.1 [Eq. (5.19)] is symmetric and the eigenvalues li , i ¼ 1, 2, . . . , of the integral operator ð1 (KF)(x) ¼ K(x, y)F(y)dy,
F [ L2 (0, 1),
0
are summable: n(K) ;
1 X
jlj j , 1,
(5:30)
j¼1
Necessary and sufficient conditions (in terms of K ) for summability of eigenvalues can be found in [Gohberg and Krei˘n (1969), Theorem 10.1]. Recall from Section 3.7.1 that K is called a kernel of K and the eigenvalues summability condition means that K is nuclear. The quantity n(K) appears in many estimates. I call it a nuke. As proved in Section 3.7.2, K is compact and the Hilbert – Schmidt theorem (Theorem 3.7.5) is applicable. The eigenvalues {ln } are counted with their multiplicities. From now on we denote by {Fn } the orthonormal system of eigenvectors of K.
5.5.2 Bounds for Segments of K For any 1 L , M 1 consider a segment KL,M (x, y) ¼
M X
lj Fj (x)Fj (x)
j¼L
of decomposition (3.38). Note that in this notation the initial segment KL becomes K1,L .
202
CHAPTER 5
Lemma.
SPATIAL MODELS
If K satisfies Assumption 5.3, then kdn KL,M k2
M X
jlj j
for any L, M:
(5:31)
j¼L
Proof. We need the identity (d2n KL,M )s,t ¼
M X
lj (d1n Fj )s (d1n Fj )t , s, t ¼ 1, . . . , n,
(5:32)
j¼L
which follows from Lemma 3.6.1 (iii). Here d2n and d1n are 2-D and 1-D discretization operators, respectively. For any n, i, j by the Cauchy– Schwarz inequality and boundedness of dn j(dn Fi , dn Fj )l2 j kdn Fi k2 kdn Fj k2 kFi k2 kFj k2 ¼ 1:
(5:33)
Hence, Eq. (5.32) gives kdn KL,M k22 ¼
n X M X
li lj (dn Fi )s (dn Fi )t (dn Fj )s (dn Fj )t
s,t¼1 i, j¼L
¼
M X
li lj (dn Fi , dn Fj )2l2
i, j¼L
M X
!2 jlj j :
j¼L
When M ¼ 1, this bound requires Eq. (5.30).
B
5.5.3 h-Continuity for Segments of K Lemma.
Under Assumptions 5.1 and 5.3 and jrj , 1=n(K),
(5:34)
the bound sup kh(dn K) h(dn KL )k2 ckdn K dn KL k2 ¼ c n1
X
jlj j
j.L
is true, where c does not depend on L. Proof. From Eqs. (5.34) and (5.31) we obtain a uniform bound jrj sup kdn KL,M k2 jrjn(K) , 1:
(5:35)
n,L,M
The quantity m from Eq. (5.10) for the pair (dn K, dn KL ) is also uniformly bounded away from 1:
mn,L ¼ jrj max {kdn Kk2 , kdn KL k2 } jrjn(K) , 1:
(5:36)
5.5 ASSUMPTION ON THE KERNEL AND IMPLICATIONS
203
Hence, the escort condition holds,
f(r, dn K, dn KL ) ¼
1 X
(k þ 1)mkn,L
k¼1
1 X
(k þ 1)(jrjn(K))k ¼ c , 1:
k¼1
Now, by Lemma 5.2.3 and Eq. (5.31) kh(dn K) h(dn KL )k2 (1 þ c)kdn K dn KL k2 ¼ c1 kdn KLþ1,1 k2 c1
1 X
jlj j:
(5.37)
j¼Lþ1
B
5.5.4 Gauging h(dn K) h(dn KL ) This Lemma improves upon Lemma 5.3.2 thanks to the additional structure embedded in K: It is useful to follow the similarities in the proofs. Lemma.
Under Assumptions 5.1 and 5.3 and Eq. (5.34) we have X sup g(h(dn K) h(dn KL )) c jlj j, n1
j.L
where c does not depend on L: Proof. We start with g(h(dn K) h(dn KL )) g(dn K dn KL ) þ
1 X
jrjk g((dn K)kþ1 (dn KL )kþ1 ):
(5.38)
k¼1
By the gauge inequality (Section 3.9.9) with A ¼ dn Fj , B ¼ (dn Fj )0 and Eq. (5.33) write g(dn Fi (dn Fj )0 ) ckdn Fi k2 kdn Fj k2 c: The first term at the right of Eq. (5.38) is bounded like this: 0 !n 1 X A lj (dn Fj )s (dn Fj )t g(dn K dn KL ) ¼ g@ j.L
¼g
X
lj dn Fj (dn Fj )0
j.L
c
X j.L
s,t¼1
!
jlj j:
X j.L
jlj jg(dn Fj (dn Fj )0 )
(5:39)
204
CHAPTER 5
SPATIAL MODELS
For the rest of the terms at the right of Eq. (5.38) successively apply Eq. (5.16), Eq. (5.36) and the last part of Eq. (5.37): jrjk g((dn K)kþ1 (dn KL )kþ1 ) c(k þ 1)(jrj max {kdn Kk2 , kdn KL k2 })k kdn K dn KL k2 X c1 (k þ 1)(jrjn(K))k jlj j: j.L
Hence, collecting the terms gives g(h(dn K) h(dn KL )) c
X
jlj j þ c1
j.L
¼ c2
X
1 X
(k þ 1)(jrjn(K))k
X
jlj j
j.L
k¼1
jlj j,
j.L
where c2 does not depend on n, L.
B
5.5.5 Gauging h2 (dn K) h2 (dn KL ) Lemma.
Under Assumptions 5.1 and 5.3 and Eq. (5.34) one has sup g(h2 (dn K) h2 (dn KL )) c n1
X
jlj j,
j.L
where c does not depend on L. Proof. Equation (5.29) rewrites for the current situation as g(h2 (dn K) h2 (dn KL )) g([h(dn K) h(dn KL )]h(dn K)) þ g(h(dn K)[h(dn K) h(dn KL )]) c[kh(dn KL )k2 þ kh(dn K)k2 ]kh(dn K) h(dn KL )k2 : Here, by Eqs. (5.8) and (5.35) kh(dn KL )k2
kdn KL k2 kdn KL k2 c1 , 1 L 1: 1 jrjkdn KL k2 1 jrjn(K)
(5:40)
Besides, by Lemma 5.5.3 kh(dn K) h(dn KL )k2 c
X j.L
jlj j: B
5.6 LINEAR AND QUADRATIC FORMS INVOLVING SEGMENTS OF K
205
5.6 LINEAR AND QUADRATIC FORMS INVOLVING SEGMENTS OF K In the previous sections we have traced applications of the nuclearity assumption to nonlinear estimates of deterministic expressions. Here we show this assumption allows us to obtain closed-form expressions for a series of random variables.
5.6.1 Convergence of Chain Products Consider any system {Fi : n [ N} in L2 (0, 1) (in applications it is the system of eigenvectors of an integral operator). For a collection of indices i ¼ (i1 , . . . , ikþ1 ), where all i1 , . . . , ikþ1 are natural, by a chain product we mean
(dn Fi1 , dn Fi2 )l2 (dn Fi2 , dn Fi3 )l2 (dn Fik , dn Fikþ1 )l2 , if k . 0; cni ¼ 1, if k ¼ 0: Here dn ¼ dn2 is the discretization operator from Section 2.1.2. One scalar product in the chain is called its link. Put
1, if (k . 0 and i1 ¼ ¼ ikþ1 ) or (k ¼ 0), c1i ¼ 0, otherwise: Lemma. If the system {Fi : n [ N} is orthonormal, then for all i with natural i1 , . . . , ikþ1 lim cni ¼ c1i :
n!1
Proof. By Eq. (2.6) ð1
(Pn Fi )Pn Fj dx ¼ (dn Fi )0 dn Fj :
0
From Lemma 2.2.1 we know that Pn Fi ! Fi in L2 . Therefore continuity of scalar products ensures that ð1
(Pn Fi )Pn Fj dx ¼ (Pn Fi , Pn Fj )L2 ! (Fi , Fj )L2 , n ! 1:
0
From the above two equations we conclude that
lim (dn Fi , dn Fj )l2 ¼ (Fi , Fj )L2 ¼
n!1
1, 0,
i ¼ j, i = j:
(5:41)
In the case k ¼ 0 there is nothing to prove. If k . 0 and among i1 , . . . , ikþ1 there are at least two different indices, then at least two adjacent ones il , ilþ1 must be unequal. In this case the corresponding link (dn Fil , dn Filþ1 )l2 in the chain product
206
CHAPTER 5
SPATIAL MODELS
tends to zero and the product itself tends to zero. If k . 0 and i1 ¼ ¼ ikþ1 , then all links tend to 1. B
5.6.2 Powers of dnKL Lemma.
If K satisfies Assumption 5.3, then the powers of dn KL have elements X
(dn KL )kþ1 ¼ st
k þ1 Y
lij cni (dn Fi1 )s (dn Fikþ1 )t :
(5:42)
i1 ,...,ikþ1 L j¼1
Proof. The proof is by induction. By Eq. (5.32) and the definition of the chain products cni Eq. (5.42) is certainly true for k ¼ 0: L X
(dn KL )st ¼
li (dn Fi )s (dn Fi )t :
(5:43)
i¼1
Suppose Eq. (5.42) is true for some k . 0 and multiply it by Eq. (5.43): ((dn KL )kþ1 dn KL )st ¼
n X
(dn KL )kþ1 sp (dn KL ) pt
p¼1
¼
n X
X
k þ1 Y
lij cni (dn Fi1 )s (dn Fikþ1 )p
p¼1 i1 ,...,ikþ1 L j¼1
L X
likþ2 (dn Fikþ2 )p (dn Fikþ2 )t :
ikþ2 ¼1
Rearranging, ((dn KL )kþ1 dn KL )st ¼
X
kþ2 Y
lij cni
i1 ,...,ikþ2 L j¼1
n X
(dn Fikþ1 )p (dn Fikþ2 )p (dn Fi1 )s (dn Fikþ2 )t :
p¼1
This coincides with Eq. (5.43) incremented by 1 because cn(i1 ,...,ikþ1 )
n X p¼1
(dn Fikþ1 )p (dn Fikþ2 )p ¼ cn(i1 ,...,ikþ2 ) : B
5.6.3 Double A Lemma I call expressions at the right of Eqs. (5.44) and (5.45) below awkward aggregates. This explains the name of the lemma.
5.7 THE ROUNDABOUT ROAD
Lemma.
207
Let a, b be two n-dimensional vectors (one or both may be random). Then
a0 h(dn KL )b ¼
1 X
X
rk
k þ1 Y
lij cni (a0 dn Fi1 )(b0 dn Fikþ1 ) (1st AA)
(5:44)
i1 ,...,ikþ1 L j¼1
k¼0
and a0 h2 (dn KL )b ¼
1 X
rm (m þ 1)
X
m þ2 Y
lij cni (a0 dn Fi1 )(b0 dn Fimþ2 ) (2nd AA)
i1 ,...,imþ2 L j¼1
m¼0
(5:45) Proof. To prove Eq. (5.44), plug Eq. (5.42) into Eq. (5.7) h(dn KL ) ¼
1 X
rk (dn KL )kþ1 st
k¼0
¼
1 X
rk
k¼0
X
kþ1 Y
lij cni (dn Fi1 )s (dn Fikþ1 )t :
(5.46)
i1 ,...,ikþ1 L j¼1
Multiplying this by as bt and summing over s, t ¼ 1, . . . , n we get Eq. (5.44). From Eq. (5.7) and Lemma 5.2.1(ii) we get !2 1 1 X X 2 k k r A A2 ¼ (m þ 1)rm Amþ2 : (5:47) h (A) ¼ m¼0
k¼0
Use this identity together with Eq. (5.42) to prove an analog of Eq. (5.46): h2 (dn KL ) ¼
1 X
(m þ 1)rm (dn KL )mþ2 st
m¼0
¼
1 X m¼0
rm (m þ 1)
X
m þ2 Y
lij cni (dn Fi1 )s (dn Fimþ2 )t :
(5.48)
i1 ,...,imþ2 L j¼1
This implies Eq. (5.45).
B
5.7 THE ROUNDABOUT ROAD We need to study the behavior of the numerator and denominator of Eq. (5.5). The distinctive feature of spatial models is that, at least under our conditions, both converge to nontrivial random variables. To be able to pass from their convergence to convergence of their ratio, we need to prove their convergence in joint distribution.
208
CHAPTER 5
SPATIAL MODELS
5.7.1 Basic Definitions The numerator and denominator are considered coordinates of a new vector P n (for pair), 0 0 v0n h(Wn )0 vn vn Gn vn ¼ : Pn ¼ v0n G0n Gn vn v0n h(Wn )0 h(Wn )vn P n is approximated by another vector with h(dn K) in place of Gn ¼ h(Wn ) 0 vn h(dn K)vn : v0n h2 (dn K)vn That second vector, in turn, is approximated by yet another vector with h(dn KL ) instead of h(dn K),
v0n h(dn KL )vn , v0n h2 (dn KL )vn
where KL is the initial segment of K: (The primes are omitted because dn K and dn KL are symmetric.) The third approximation is a continuous function of the weighted sum of linear processes from Chapter 3. To this last vector we are able to apply Billingsley’s theorem on a double limit (Section 3.9.5). A special notation for the intermediate vectors is not introduced because what we actually need is the initial vector P n , final vector XnL and the differences between successive approximations. This scheme is realized in the representation P n ¼ an þ bnL þ gnL þ XnL , where
an ¼
bnL ¼
v0n [G0n h(dn K)]vn , v0n [G0n Gn h2 (dn K)]vn v0n [h(dn K) h(dn KL )]vn v0n [h2 (dn K) h2 (dn KL )]vn
(5.49) (5.50)
and
gnL ¼
v0n h(dn KL )vn v0n h2 (dn KL )vn
XnL :
(5:51)
The vector XnL carries all the essential information about P n to the extent that its double limit dlimL !1 dlimn !1 XnL describes the limit of P n . Therefore XnL is called a proXy. Its first limit
jL ¼ dlim XnL n !1
(5:52)
5.7 THE ROUNDABOUT ROAD
209
is also representative of P n and is called a projy. The proXy is defined as follows. With the system of eigenvalues {li } and eigenfunctions {Fi } of the operator K put for any n, L [ N 0
UnL
1 v0n dn F1 L X h(lj ) 2 : ¼ @ . . . A, XnL ¼ UnLj h2 (lj ) j¼1 v0n dn FL
(5:53)
The goal is to show that an , bnL and gnL are negligible in some sense.
5.7.2 Convergence of the Proxies Lemma. Denote
Let Assumption 5.1 hold and let the sequence {h(lj )} be summable.
jL ¼ (sbc )
2
L X j¼1
u2j
1 X h(lj ) h(lj ) 2 2 , J ¼ (sbc ) uj h2 (lj ) h 2 (l j ) j¼1
where u1 , u2 , . . . are independent standard normal. Then (i) Eq. (5.52) is true, (ii) jL converges to J in L1 . Proof. (i) Assumption 5.1 on the error term is stronger than in Theorem 3.5.2. The Gram matrix G of the system {F1 , . . . , FL } equals IL . Disposing dn F1 , . . . , dn FL into rows of Wn , by Theorem 3.5.2 we have d
UnL ¼ Wn0 vn ! N(0, (sbc )2 IL ) as n ! 1, including the case bc ¼ 0: By Lemma 3.1.6 we can equivalently write 0
UnL
1 u1 ! jsbc j@ . . . A: uL d
(5:54)
The proXy is a continuous function of UnL . Equation (5.52) follows by CMT from Eq. (5.54). (ii) Convergence of jL to J in L1 follows from the summability of {h(lj )} and integrability of u2i [recall that summability of {h(lj )} implies its squaresummability]. One can prove that the means and variances of XnL tend to those of jL , see (Mynbaev and Ullah, 2008). B
210
CHAPTER 5
SPATIAL MODELS
5.7.3 Bounding Alphas and Betas Lemma. If Assumptions 5.1 and 5.3 hold and jrjn(K) , 1, then the coordinates of vectors an and bnL satisfy kan1 k2,V þ kan2 k2,V ¼ o(1)
(5:55)
and kbnL1 k2,V þ kbnL2 k2,V c
X
jlj j,
(5:56)
j.L
where c does not depend on n and L: Proof. Equation (5.34) implies jrjkKk2 , 1 because kKk2 ¼ [see Eq. (3.39)]. By definitions (3.60) and (5.49) kan1 k2,V ¼ [E(v0n [G0n h(dn K)]vn )2 ]
1=2
P 1
2 j¼1 lj
1=2
n(K)
¼ g(G0n h(dn K)):
By Lemma 5.4.4 kan1 k2,V ! 0: Since kan2 k2,V ¼ g(G0n Gn h2 (dn K)), Lemma 5.4.5 gives kan2 k2,V ! 0. Similarly, because kbnL1 k2,V ¼ [E(v0n [h(dn K) h(dn KL )]vn )2 ]
1=2
¼ g(h(dn K) h(dn KL ))
we can use Lemma 5.5.4 to conclude that kbnL1 k2,V c depend on n and L: For
P
j.L
jlj j where c does not
kbnL2 k2,V ¼ g(h2 (dn K) h2 (dn KL )) Lemma 5.5.5 leads to the same result.
B
5.7.4 Representation of Gammas Lemma.
If Assumption 5.3 holds and jrj max jlj j , 1, j
(5:57)
5.7 THE ROUNDABOUT ROAD
211
then
gnL1 ¼
1 X
1 X
kþ1 Y
lij (cni c1i )UnLi1 UnLikþ1 ,
(5:58)
i1 ,...,ikþ1 L j¼1
k¼0
gnL2 ¼
X
rk
X
rm (m þ 1)
mþ2 Y
lij (cni c1i )UnLi1 UnLimþ2 :
(5:59)
i1 ,...,imþ2 L j¼1
m¼0
See Section 5.6.1 and Eq. (5.53) for the definitions of cni , c1i and UnL . Proof. Letting a ¼ b ¼ vn in the Double A Lemma (Section 5.6.3) we get 1 X
v0n h(dn KL )vn ¼
¼
k þ1 Y
lij cni (v0n dn Fi1 )(v0n dn Fikþ1 )
i1 ,...,ikþ1 L j¼1
k¼0 1 X
X
rk
X
rk
kþ1 Y
lij cni UnLi1 UnLikþ1 :
(5.60)
i1 ,...,ikþ1 L j¼1
k¼0
We need to express the first component of the proXy in similar terms. Condition (5.57) allows us to replace h(lj ) in its definition [Eq. (5.53)] by h(lj ) ¼
1 X
rk ljkþ1 ,
(5:61)
k¼0
thus obtaining
XnL1 ¼
L X
2 UnLj
j¼1
1 X
rk lkþ1 ¼ j
k¼0
1 X
rk
k¼0
L X
2 lkþ1 UnLj : j
j¼1
Since the chain product c1i vanishes for i with different components, this is the same as
XnL1 ¼
1 X k¼0
rk
X
kþ1 Y
lij c1i UnLi1 UnLikþ1 :
i1 ,...,ikþ1 L j¼1
Equation (5.58) is a consequence of this formula and Eq. (5.60). Letting a ¼ b ¼ vn in Eq. (5.45) we get v0n h2 (dn KL )vn ¼
1 X m¼0
rm (m þ 1)
X
m þ2 Y
i1 ,...,imþ2 L j¼1
lij cni UnLi1 UnLimþ2 :
(5:62)
212
CHAPTER 5
SPATIAL MODELS
By Lemma 5.2.1(ii) 1 X
2
h (l j ) ¼
!2
rk lkj
l2j ¼
1 X
(m þ 1)rm lmþ2 : j
m¼0
k¼0
Hence, the proXy’s second component is XnL2 ¼
L X
2 UnLj h2 (lj ) ¼
j¼1
1 X
rm (m þ 1)
m¼0
L X
2 lmþ2 UnLj : j
j¼1
As above, we can insert here c1i because these chain products vanish outside the diagonal i1 ¼ ¼ imþ2 : XnL2 ¼
1 X
X
rm (m þ 1)
m þ2 Y
lij c1i UnLi1 UnLimþ2 :
i1 ,...,imþ2 L j¼1
m¼0
Combining this equation with Eq. (5.62) we obtain the representation for gnL2 .
B
5.7.5 Bounding Gammas Lemma. For any positive (small) 1 and (large) L there exists n0 ¼ n0 (1, L) such that E(jgnL1 j þ jgnL2 j) c1 for all n n0 ; where c does not depend on n and L. Proof. By Ho¨lder’s inequality and Eq. (5.39) 2 1=2
EjUnLi UnLj j {E[v0n dn Fi (dn Fj )0 vn ] }
¼ g(dn Fi (dn Fj )0 ) c:
(5:63)
By Lemma 5.6.1, for any 1, L from the statement of this lemma we can choose n0 ¼ n0 (1, L) so large that jcni c1i j 1 for all n n0 and i1 , . . . , ikþ1 L:
(5:64)
Now Eqs. (5.58), (5.63) and (5.64) give EjgnL1 j
1 X k¼0
c1
X
jrjk
1 X k¼0
k þ1 Y
jlij jjcni c1i jEjUnLi1 UnLikþ1 j
i1 ,...,ikþ1 L j¼1
jrjk
X
k þ1 Y
jlij j c1
i1 ,...,ikþ1 L j¼1
where c1 does not depend on n and L, as claimed. The proof for gnL2 is identical.
1 X
jrjk (n(K))kþ1 ¼ c1 1,
k¼0
B
5.8 ASYMPTOTICS OF THE OLS ESTIMATOR FOR PURELY SPATIAL MODEL
213
5.8 ASYMPTOTICS OF THE OLS ESTIMATOR FOR PURELY SPATIAL MODEL

For the reader's convenience all major assumptions made so far are repeated in this section.

5.8.1 Main Assumptions and Statement

Denote $\alpha_c = \sum_{j\in\mathbb{Z}}|c_j|$, $\beta_c = \sum_{j\in\mathbb{Z}}c_j$.

5.8.1.1 Assumption 5.1 on the Error Term

1. $\{\{e_{nt}, \mathcal{F}_{nt} : t \in \mathbb{Z}\} : n \in \mathbb{Z}\}$ is a double-infinite m.d. array.
2. $E(e_{nt}^2 \mid \mathcal{F}_{n,t-1}) = \sigma^2$ for all $t$ and $n$.
3. The third conditional moments are constant, $E(e_{nt}^3 \mid \mathcal{F}_{n,t-1}) = a_{nt}$, and the fourth moments are uniformly bounded, $\mu_4 = \sup_{n,t}Ee_{nt}^4 < \infty$.
4. $\{c_j : j \in \mathbb{Z}\}$ is a summable sequence of numbers, $\alpha_c < \infty$.
5. Components of $v_n = (v_{n1}, \dots, v_{nn})'$ are defined by
$$ v_{nt} = \sum_{j\in\mathbb{Z}} e_{n,t-j}c_j, \quad t = 1, \dots, n. $$

For simplicity, one can think of a double-infinite sequence $\{e_j : j \in \mathbb{Z}\}$ of i.i.d. random variables with finite fourth moments and put $v_t = \sum_{j\in\mathbb{Z}} e_{t-j}c_j$.

5.8.1.2 Assumption 5.2 on the Spatial Matrices

The sequence of matrices $\{W_n : n \in \mathbb{N}\}$ is such that $W_n$ is of size $n\times n$ and there exists a function $K \in L_2((0,1)^2)$ that satisfies
$$ \|W_n - \delta_n K\|_2 = o\!\left(\frac{1}{\sqrt{n}}\right). $$

5.8.1.3 Assumption 5.3 on the Function K

The function $K$ is symmetric and the eigenvalues $\lambda_i$, $i = 1, 2, \dots$, of the integral operator
$$ (\mathcal{K}F)(x) = \int_0^1 K(x,y)F(y)\,dy, \quad F \in L_2(0,1), $$
are summable:
$$ \nu(K) \equiv \sum_{j=1}^{\infty}|\lambda_j| < \infty. \qquad (5.65) $$
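The error structure in item 5 of Assumption 5.1 is easy to simulate. The sketch below is mine, not part of the original text: it generates $v_t = \sum_j e_{t-j}c_j$ in the simple i.i.d.-innovation case mentioned above, truncating the two-sided sum to a finite, hypothetical set of weights $c_j$; all names are illustrative.

```python
import numpy as np

def linear_process_errors(n, c, rng):
    """v_t = sum_j c_j e_{t-j}, t = 1,...,n, with a finite set of weights c_0,...,c_{J-1}
    (a one-sided truncation of the two-sided sum in Assumption 5.1, i.i.d. N(0,1) innovations)."""
    J = len(c)
    e = rng.standard_normal(n + J)            # innovations e_{t-j} for all needed indices
    return np.array([sum(c[j] * e[t + J - 1 - j] for j in range(J)) for t in range(n)])

rng = np.random.default_rng(0)
v = linear_process_errors(200, c=[1.0, 0.5, 0.25], rng=rng)   # alpha_c = 1.75 < infinity
print(v[:5])
```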
Assumption 5.3 implies $\sum_{j=1}^{\infty}\lambda_j^2 < \infty$. The inequality
$$ |\rho| < \left(\sum_{j=1}^{\infty}\lambda_j^2\right)^{\!-1/2} \qquad (5.66) $$
is sufficient for existence of the OLS estimator and representation (5.5) for all large $n$. To analyze $\hat\rho$, the pair $\mathcal{P}_n = (v_n'G_nv_n,\ v_n'G_n'G_nv_n)'$ was composed and represented as
$$ \mathcal{P}_n = \alpha_n + \beta_{nL} + \gamma_{nL} + X_{nL}. \qquad (5.67) $$
To be able to estimate $\beta_{nL}$ and $\gamma_{nL}$, we have imposed the condition
$$ |\rho| < 1/\nu(K), \qquad (5.68) $$
which is stronger than Eq. (5.66).

Theorem. Let Assumptions 5.1–5.3 hold and let $\rho$ satisfy Eq. (5.68). If $\beta_c \ne 0$, then
$$ \operatorname{dlim}(\hat\rho - \rho) = \frac{\sum_{j=1}^{\infty}h(\lambda_j)u_j^2}{\sum_{j=1}^{\infty}h^2(\lambda_j)u_j^2}, \qquad (5.69) $$
where $u_j$ are independent standard normal, $h(\lambda_j) = \lambda_j/(1 - \rho\lambda_j)$ and
$$ \sum_{j=1}^{\infty}|h(\lambda_j)| < \infty. \qquad (5.70) $$
5.8.2 Discussion

Theorem 5.8.1 was proved in Mynbaev and Ullah (2008) in the case of i.i.d. errors and in Mynbaev (2010) when the errors are linear processes. The gauge inequality (Section 3.9.9) plays the key role in the generalization from the i.i.d. case to linear processes. It replaces Lemma A.11 [used in Mynbaev and Ullah (2008)] of Lee's supplement (Lee, 2004a). Lemma 5.1.4 about the finite-sample distribution of the OLS estimator explains why Eq. (5.69) is a better reflection of reality than convergence to a normal vector. Kelejian and Prucha (2001) and Lee (2004a) noticed that the finite-sample distribution of the OLS estimator contains quadratic forms. They developed CLTs for linear-quadratic forms. However, under their assumptions the quadratic part disappears in the limit. In addition to being nonnormal, the asymptotics (5.69) is special in other respects. Both the numerator and denominator of the fraction at the right are nontrivial random variables, unlike many other econometric problems in which the numerator is nontrivial and the denominator is constant. The mean of the fraction in general is not zero. We don't know if $\hat\rho - \rho$ converges in probability but if it does, the mean of
$\hat\rho - \rho$ may not be zero. This is the true reason for the inconsistency of $\hat\rho$ previously mentioned in several other sources. No scaling by $\sqrt{n}$ or any other nontrivial normalizer is necessary to achieve convergence in distribution. The limit distribution does not depend on the second moments of the innovations. By Lemma 5.7.2 the top and bottom of the fraction in Eq. (5.69) converge in $L_1$ and, hence, in probability. This can be used for approximate calculations by truncating the sums.
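The truncation just mentioned is straightforward to carry out numerically. The sketch below is mine, not from the original text: it samples the right-hand side of Eq. (5.69) with both series cut at a finite number of terms; the eigenvalue sequence and $\rho$ are hypothetical choices satisfying $|\rho|\nu(K) < 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def limit_ratio_draws(lam, rho, n_draws=100_000):
    """Draws from the truncated limit in Eq. (5.69):
    sum_j h(lambda_j) u_j^2 / sum_j h(lambda_j)^2 u_j^2 with u_j i.i.d. N(0,1)."""
    lam = np.asarray(lam, dtype=float)
    h = lam / (1.0 - rho * lam)               # h(lambda_j) = lambda_j / (1 - rho*lambda_j)
    u2 = rng.standard_normal((n_draws, lam.size)) ** 2
    return (u2 @ h) / (u2 @ h**2)

lam = 0.5 ** np.arange(1, 21)                  # hypothetical spectrum lambda_j = 2^{-j}, nu(K) = 1
draws = limit_ratio_draws(lam, rho=0.3)
print("mean and sd of the (nonnormal) limit:", draws.mean(), draws.std())
```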
5.8.3 Proof of Theorem 5.8.1

Let us check that under condition (5.68) the nuclearity condition (5.65) is equivalent to the summability of $h(\lambda_j)$ [Eq. (5.70)]. Using Eq. (5.68),
$$ 0 < c_1 = 1 - |\rho|\sum_{j=1}^{\infty}|\lambda_j| \le 1 - |\rho\lambda_j| \le |1 - \rho\lambda_j| \le 1 + |\rho\lambda_j| \le 1 + |\rho|\sum_{j=1}^{\infty}|\lambda_j| = c_2 < \infty \quad \text{for all } j. $$
Hence,
$$ \frac{|\lambda_j|}{c_2} \le |h(\lambda_j)| \le \frac{|\lambda_j|}{c_1} \quad \text{for all } j, \qquad (5.71) $$
which proves the equivalence.

By Eq. (5.55) $\operatorname{plim}\alpha_n = 0$. Combining Eq. (5.56) with the Chebyshov and Hölder inequalities gives
$$ P(|\beta_{nL1}| + |\beta_{nL2}| > \varepsilon) \le \frac{1}{\varepsilon}\bigl\||\beta_{nL1}| + |\beta_{nL2}|\bigr\|_{2,\Omega} \le \frac{c}{\varepsilon}\sum_{j>L}|\lambda_j|, $$
where $c$ does not depend on $n$ and $L$. From Lemma 5.7.5 we conclude that for any fixed $L$, $\operatorname{plim}_{n\to\infty}\gamma_{nL} = 0$. Equation (5.67) then implies, for any fixed $L$,
$$ P(\|\mathcal{P}_n - X_{nL}\|_1 > \varepsilon) \le P(\|\alpha_n\|_1 + \|\beta_{nL}\|_1 + \|\gamma_{nL}\|_1 > \varepsilon) \le P\Bigl(\|\alpha_n\|_1 > \tfrac{\varepsilon}{3}\Bigr) + P\Bigl(\|\beta_{nL}\|_1 > \tfrac{\varepsilon}{3}\Bigr) + P\Bigl(\|\gamma_{nL}\|_1 > \tfrac{\varepsilon}{3}\Bigr), $$
so that
$$ \limsup_{n\to\infty} P(\|\mathcal{P}_n - X_{nL}\|_1 > \varepsilon) \le \frac{3c}{\varepsilon}\sum_{j>L}|\lambda_j|. $$
Since $c$ does not depend on $L$, all conditions of Billingsley's theorem on a double limit (Section 3.9.5) are satisfied with $Y = J$ (see Lemma 5.7.2). Consequently,
$$ \operatorname{dlim}_{n\to\infty}\mathcal{P}_n = J. \qquad (5.72) $$
Remembering that $\beta_c \ne 0$ and $J_2 > 0$ a.s., by CMT
$$ \operatorname{dlim}_{n\to\infty}(\hat\rho - \rho) = \operatorname{dlim}_{n\to\infty}\frac{\mathcal{P}_{n1}}{\mathcal{P}_{n2}} = \frac{J_1}{J_2}. $$
5.8.4 Convergence in Distribution of the Estimator of σ²

Let
$$ \hat\sigma^2 = \frac{1}{n-1}(Y_n - \hat\rho W_nY_n)'(Y_n - \hat\rho W_nY_n) $$
be the OLS estimator of $\sigma^2$.

Corollary. (Mynbaev and Ullah, 2008) In addition to the conditions of Theorem 5.8.1 assume that the errors $v_{nt} = e_t$, $t = 1, \dots, n$, are i.i.d. Then
$$ \operatorname{dlim}_{n\to\infty}\sqrt{n}(\hat\sigma^2 - \sigma^2) \in N(0, \mu_4 - \sigma^4), $$
where $\mu_4 = Ee_t^4$. In particular, $\hat\sigma^2$ is consistent.

Proof. For convenience, in the definition of $\hat\sigma^2$ we put $n$ instead of $n-1$ because the result is asymptotically the same. Substituting
$$ S_n(\hat\rho)S_n^{-1}(\rho) = [(I - \rho W_n) + (\rho - \hat\rho)W_n]S_n^{-1}(\rho) = I + (\rho - \hat\rho)G_n $$
we have
$$ (Y_n - \hat\rho W_nY_n)'(Y_n - \hat\rho W_nY_n) = \bigl(S_n(\hat\rho)S_n^{-1}(\rho)v_n\bigr)'S_n(\hat\rho)S_n^{-1}(\rho)v_n = v_n'\bigl(I + (\rho - \hat\rho)G_n\bigr)'\bigl(I + (\rho - \hat\rho)G_n\bigr)v_n $$
$$ = v_n'v_n + (\rho - \hat\rho)v_n'G_n'v_n + (\rho - \hat\rho)v_n'G_nv_n + (\rho - \hat\rho)^2v_n'G_n'G_nv_n = v_n'v_n + 2(\rho - \hat\rho)\mathcal{P}_{n1} + (\rho - \hat\rho)^2\mathcal{P}_{n2}. \qquad (5.73) $$
Taking an arbitrary $\varepsilon \in (0, 1/2)$ we can write
$$ \sqrt{n}(\hat\sigma^2 - \sigma^2) = \frac{(Y_n - \hat\rho W_nY_n)'(Y_n - \hat\rho W_nY_n) - n\sigma^2}{n^{1/2}} = \frac{v_n'v_n - n\sigma^2}{n^{1/2}} + \frac{2(\rho - \hat\rho)}{n^{\varepsilon}}\,\frac{\mathcal{P}_{n1}}{n^{1/2-\varepsilon}} + \frac{(\rho - \hat\rho)^2}{n^{\varepsilon}}\,\frac{\mathcal{P}_{n2}}{n^{1/2-\varepsilon}}. $$
From the proof of Theorem 5.8.1 we know that $\mathcal{P}_{n1}$, $\mathcal{P}_{n2}$, $(\rho - \hat\rho)$ and $(\rho - \hat\rho)^2$ converge in distribution. Therefore
$$ \sqrt{n}(\hat\sigma^2 - \sigma^2) = \frac{v_n'v_n - n\sigma^2}{\sqrt{n}} + o_p(1). $$
The term
$$ \frac{v_n'v_n - n\sigma^2}{\sqrt{n}} = \frac{\sum_{t=1}^{n}(e_t^2 - \sigma^2)}{\sqrt{n}} $$
is known to converge in distribution to $N(0, \mu_4 - \sigma^4)$. This follows from the Lindeberg–Lévy Theorem 3.1.7: $e_t^2 - \sigma^2$ are i.i.d., have mean zero and variance
$$ V(e_t^2 - \sigma^2) = V(e_t^2) = Ee_t^4 - (Ee_t^2)^2 = \mu_4 - \sigma^4. $$
∎
5.9 METHOD OF MOMENTS AND MAXIMUM LIKELIHOOD

The method used in the previous sections to study the OLS estimator allows us to calculate some limits that play an important role in the theory of the other two methods: MM and ML. Everywhere the assumptions of Theorem 5.8.1 are maintained.
5.9.1 Identification Conditions for ML

In ML estimation random variables are assumed to be normal and the ML estimator is obtained by maximizing the log-likelihood function. If the same estimator is used in the case of nonnormal variables, it is a QML estimator. For a purely autoregressive spatial model the QML estimator was studied by Lee (2001). One of the key elements in his proof consists in applying the identification uniqueness condition (White, 1994, Chapter 3). Lee worked out conditions sufficient for local and global identification. They involve a special parameter $h_n$ designed to accommodate different asymptotics of $W_n$ at infinity. Under our conditions, the only meaningful choice is $h_n = 1$ for all $n$. This is why this parameter is omitted, and the conditions look as follows.
5.9.1.1 Condition 1 – Local Identification (ID) Condition

The limit
$$ \lim_{n\to\infty}\left[\frac{1}{n}\bigl(\operatorname{tr}(G_n'G_n) + \operatorname{tr}(G_n^2)\bigr) - \frac{2}{n^2}\operatorname{tr}^2(G_n)\right] \qquad (5.74) $$
exists and is positive. In ML estimation it is customary to denote the true parameters by $\rho_0$ and $\sigma_0^2$ and use $\rho$ and $\sigma^2$ for any other parameter values.

5.9.1.2 Condition 2 – Global ID Condition

For any $\rho$ different from the true value $\rho_0$ the limit
$$ \lim_{n\to\infty}\frac{1}{n}\bigl(\ln|\sigma_0^2S_n^{-1}S_n^{-1\prime}| - \ln|\sigma_n^2(\rho)S_n^{-1}(\rho)S_n^{-1\prime}(\rho)|\bigr) \qquad (5.75) $$
exists and is not zero. Here $S_n(\rho) = I - \rho W_n$ and $S_n = I - \rho_0 W_n$. $|A|$ denotes the determinant of $A$. An expression like $\ln|A|$ presumes that $|A| > 0$. By definition
$$ \sigma_n^2(\rho) = \frac{\sigma_0^2}{n}\operatorname{tr}\bigl(S_n^{-1\prime}S_n'(\rho)S_n(\rho)S_n^{-1}\bigr). \qquad (5.76) $$
5.9.2 ML Identification Conditions Failure

Theorem. (Mynbaev and Ullah, 2008) Under the assumptions of Theorem 5.8.1 identification conditions 1 and 2 fail.

The proof is divided into several sections. We need the notation that reflects dependence of $h(\lambda_j)$ and $G_n$ on $\rho$:
$$ h(\rho, \lambda_j) = \frac{\lambda_j}{1 - \rho\lambda_j}, \quad G_n(\rho) = W_nS_n^{-1}(\rho) = h(\rho, W_n). $$
5.9.3 Calculating the Limit of tr(G_n(ρ))

Lemma. Uniformly on any compact subset $s$ of $\{\rho : |\rho| < 1/\nu(K)\}$
$$ \lim_{n\to\infty}\operatorname{tr}(G_n(\rho)) = \sum_{j=1}^{\infty}h(\rho, \lambda_j). $$
Here the series at the right converges uniformly on the set $s$.

Proof. We approximate $\operatorname{tr}(G_n(\rho))$ by $\operatorname{tr}(h(\delta_nK))$ and then find the limit of $\operatorname{tr}(h(\delta_nK))$. By the trace version of h-continuity (Section 5.3.3)
$$ |\operatorname{tr}(G_n(\rho)) - \operatorname{tr}(h(\delta_nK))| = |\operatorname{tr}(h(W_n)) - \operatorname{tr}(h(\delta_nK))| \le \|W_n - \delta_nK\|_2\bigl[\sqrt{n} + f(\rho, W_n, \delta_nK)\bigr] \to 0, \qquad (5.77) $$
where the escort condition of type Eq. (5.27) is applied. Moreover, from Eq. (5.25) we see that the numbers $f(\rho, W_n, \delta_nK)$ are uniformly bounded with respect to $\rho \in s$. The nuke $\nu(K)$ is finite [see Eq. (5.65)], so we can use Eq. (5.46) with $L = \infty$, $s = t$ to calculate
$$ \operatorname{tr}(h(\delta_nK)) = \sum_{k=0}^{\infty}\rho^k\operatorname{tr}(\delta_nK)^{k+1} = \sum_{k=0}^{\infty}\rho^k\sum_{i_1,\dots,i_{k+1}=1}^{\infty}\ \prod_{j=1}^{k+1}\lambda_{i_j}c_{ni}\,(\delta_nF_{i_1}, \delta_nF_{i_{k+1}})_{l_2}. $$
Sending $n\to\infty$ and taking into account Lemma 5.6.1 and Eq. (5.41) we have
$$ \operatorname{tr}(h(\delta_nK)) \to \sum_{k=0}^{\infty}\rho^k\sum_{i_1,\dots,i_{k+1}=1}^{\infty}\ \prod_{j=1}^{k+1}\lambda_{i_j}c_{\infty i}\,(F_{i_1}, F_{i_{k+1}})_{L_2} = \sum_{k=0}^{\infty}\rho^k\sum_{i=1}^{\infty}\lambda_i^{k+1} = \sum_{i=1}^{\infty}h(\rho, \lambda_i). \qquad (5.78) $$
We may pass to the limit because all series converge uniformly in $n$ and $\rho$. Equations (5.77) and (5.78) prove the lemma. ∎
5.9.4 Calculating Traces of Products

Lemma. Uniformly on any compact subset $s$ of $\{\rho : |\rho| < 1/\nu(K)\}$
$$ \lim_{n\to\infty}\operatorname{tr}(G_n'G_n) = \sum_{j=1}^{\infty}h^2(\rho, \lambda_j) = \lim_{n\to\infty}\operatorname{tr}(G_n^2). $$

Proof. Ideally, $\operatorname{tr}(G_n'G_n) = \operatorname{tr}(h(W_n')h(W_n))$ should be close to $\operatorname{tr}(h^2(\delta_nK))$. By Eq. (5.18)
$$ |\operatorname{tr}(G_n'G_n) - \operatorname{tr}(h^2(\delta_nK))| \le |\operatorname{tr}[(G_n' - h(\delta_nK))G_n]| + |\operatorname{tr}[h(\delta_nK)(G_n - h(\delta_nK))]| \le \|G_n' - h(\delta_nK)\|_2\|G_n\|_2 + \|h(\delta_nK)\|_2\|G_n - h(\delta_nK)\|_2 = o(1). $$
The concluding piece here is based on Eqs. (5.24), (5.27) and (5.28).
Now we repeat Eq. (5.78) using Eq. (5.48) where appropriate:
$$ \operatorname{tr}(h^2(\delta_nK)) = \sum_{m=0}^{\infty}\rho^m(m+1)\sum_{i_1,\dots,i_{m+2}=1}^{\infty}\ \prod_{j=1}^{m+2}\lambda_{i_j}c_{ni}\,(\delta_nF_{i_1}, \delta_nF_{i_{m+2}})_{l_2} $$
$$ \to \sum_{m=0}^{\infty}\rho^m(m+1)\sum_{i_1,\dots,i_{m+2}=1}^{\infty}\ \prod_{j=1}^{m+2}\lambda_{i_j}c_{\infty i}\,(F_{i_1}, F_{i_{m+2}})_{L_2} = \sum_{m=0}^{\infty}\rho^m(m+1)\sum_{i=1}^{\infty}\lambda_i^{m+2} = \sum_{i=1}^{\infty}h^2(\rho, \lambda_i). $$
We have proved the left equation. Replacing $G_n'$ by $G_n$ everywhere we get the right one. ∎
5.9.5 Essential Formula of the ML Theory of Spatial Models

Lemma. If $\rho$ is such that $|S_n(\rho)|$ is positive, then
$$ \frac{\partial\ln|S_n(\rho)|}{\partial\rho} = -\operatorname{tr}[W_nS_n^{-1}(\rho)]. $$

Proof. By definition, if $f(A)$ is a scalar function of a matrix $A$, then
$$ \left(\frac{\partial f}{\partial A}\right)_{ij} = \frac{\partial f}{\partial a_{ij}}. $$
Lütkepohl (1991, p. 473) has the equation
$$ \frac{\partial\ln|A|}{\partial A} = (A')^{-1} $$
for a nonsingular matrix with $|A| > 0$. Combining it with $(A')^{-1} = (A^{-1})'$ we have
$$ \frac{\partial\ln|S_n(\rho)|}{\partial S_n(\rho)} = (S_n^{-1}(\rho))'. $$
Obviously,
$$ \frac{\partial(S_n(\rho))_{ij}}{\partial\rho} = \frac{\partial}{\partial\rho}(\delta_{ij} - \rho w_{nij}) = -w_{nij}, $$
where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$. These equations imply
$$ \frac{\partial\ln|S_n(\rho)|}{\partial\rho} = \sum_{i,j=1}^{n}\frac{\partial\ln|S_n(\rho)|}{\partial(S_n(\rho))_{ij}}\,\frac{\partial(S_n(\rho))_{ij}}{\partial\rho} = -\sum_{i,j=1}^{n}(S_n^{-1}(\rho))_{ji}(W_n)_{ij} = -\operatorname{tr}[W_nS_n^{-1}(\rho)]. $$
∎
5.9.6 Limit of the Newton–Leibniz Equation

Lemma. Recall that the true parameter $\rho_0$ satisfies Eq. (5.68). If $\rho$ is in a small neighborhood of $\rho_0$, then
$$ \lim_{n\to\infty}\bigl(\ln|S_n(\rho)| - \ln|S_n|\bigr) = -\int_{\rho_0}^{\rho}\sum_{i=1}^{\infty}h(t, \lambda_i)\,dt. $$

Proof. By the Newton–Leibniz formula and Lemma 5.9.5
$$ \ln|S_n(\rho)| - \ln|S_n| = \int_{\rho_0}^{\rho}\frac{\partial\ln|S_n(t)|}{\partial t}\,dt = -\int_{\rho_0}^{\rho}\operatorname{tr}[W_nS_n^{-1}(t)]\,dt. $$
By Lemma 5.9.3
$$ \lim_{n\to\infty}\operatorname{tr}[W_nS_n^{-1}(t)] = \lim_{n\to\infty}\operatorname{tr}(G_n(t)) = \sum_{j=1}^{\infty}h(t, \lambda_j). $$
Since convergence here is uniform in a small neighborhood of $\rho_0$, term-wise integration of this equation is possible. ∎
5.9.7 Proof of Theorem 5.9.2

For the local ID condition [Eq. (5.74)] we apply Lemmas 5.9.3 and 5.9.4:
$$ \lim_{n\to\infty}\left[\frac{1}{n}\bigl(\operatorname{tr}(G_n'G_n) + \operatorname{tr}(G_n^2)\bigr) - \frac{2}{n^2}\operatorname{tr}^2(G_n)\right] = \lim_{n\to\infty}\frac{1}{n}\left[2\sum_{j=1}^{\infty}h^2(\lambda_j) - \frac{2}{n}\left(\sum_{j=1}^{\infty}h(\lambda_j)\right)^{\!2}\right] = 0. $$
The expression (5.75) used in the global ID condition is rearranged into a more transparent form using properties of logs, determinants and the fact that $S_n(\rho)$, $S_n$ and their inverses commute:
$$ \ln|\sigma_0^2S_n^{-1}S_n^{-1\prime}| - \ln|\sigma_n^2(\rho)S_n^{-1}(\rho)S_n^{-1\prime}(\rho)| = \ln|\sigma_0^2/\sigma_n^2(\rho)| + \ln|S_n^{-1}| + \ln|S_n^{-1\prime}| - \ln|S_n^{-1}(\rho)| - \ln|S_n^{-1\prime}(\rho)| $$
$$ = \ln|\sigma_0^2/\sigma_n^2(\rho)| + 2\bigl(\ln|S_n(\rho)| - \ln|S_n|\bigr). \qquad (5.79) $$
Lemma 5.9.6 gives for the second part of this aggregate
$$ \lim_{n\to\infty}\frac{1}{n}\bigl(\ln|S_n(\rho)| - \ln|S_n|\bigr) = 0. \qquad (5.80) $$
For the first part Eqs. (5.76) and (5.73) imply
$$ \sigma_n^2(\rho) = \frac{\sigma_0^2}{n}\operatorname{tr}\bigl[(I + (\rho_0 - \rho)G_n)'(I + (\rho_0 - \rho)G_n)\bigr] = \frac{\sigma_0^2}{n}\operatorname{tr}\bigl[I + 2(\rho_0 - \rho)G_n + (\rho_0 - \rho)^2G_n'G_n\bigr] = \sigma_0^2\left[1 + 2(\rho_0 - \rho)\frac{\operatorname{tr}G_n}{n} + (\rho_0 - \rho)^2\frac{\operatorname{tr}G_n'G_n}{n}\right]. $$
Now it is clear from Lemmas 5.9.3 and 5.9.4 that
$$ \lim_{n\to\infty}\sigma_0^2/\sigma_n^2(\rho) = 1 \quad \text{for any } \rho \text{ that satisfies (5.68)}. \qquad (5.81) $$
Equations (5.79), (5.80) and (5.81) prove that the global ID condition fails.
5.9.8 Identification Condition for Method of Moments

For the purely spatial model Kelejian and Prucha (1999) studied a generalized moments estimator. Lee (2001) simplified their approach and worked out an ID condition in terms of a $2\times 2$ matrix $A_n$ with elements
$$ a_{n11} = 2\Bigl[Y_n'W_n'^2W_nY_n - \frac{1}{n}\operatorname{tr}(W_n'W_n)\,Y_n'W_nY_n\Bigr], \quad a_{n12} = -Y_n'W_n'^2W_n^2Y_n + \frac{1}{n}\operatorname{tr}(W_n'W_n)\,Y_n'W_n'W_nY_n, $$
$$ a_{n21} = Y_n'W_n^2Y_n + Y_n'W_n'W_nY_n, \quad a_{n22} = -Y_n'W_n'^2W_nY_n. $$
5.9.8.1 Condition 3 – Identification Condition for Method of Moments

The limit
$$ \operatorname*{plim}_{n\to\infty}\frac{1}{n}A_n \qquad (5.82) $$
exists and is nonsingular.
5.9.9 Method of Moments Identification Condition Failure

Theorem. (Mynbaev and Ullah, 2008) Under the assumptions of Theorem 5.8.1 the limit (5.82) is zero.

Proof. The desired result follows if we show that $L_2$-norms of all elements of $A_n$ are uniformly bounded. Those elements have some common parts which we bound first. Since $Y_n = S_n^{-1}v_n$, we have, by Lemma 3.9.9,
$$ \bigl[E(Y_n'W_n'^2W_nY_n)^2\bigr]^{1/2} = g\bigl(S_n^{-1\prime}W_n'^2W_nS_n^{-1}\bigr) = g(G_n'W_n'G_n) \le c_1\|G_n'\|_2\|W_n'\|_2\|G_n\|_2 = c_1\|W_n\|_2\|G_n\|_2^2 \le c_2, $$
where Lemma 5.4.2(i) and Eq. (5.24) have been applied. Similarly, by Lemma 5.2.2
$$ \bigl[E(Y_n'W_nY_n)^2\bigr]^{1/2} = g(S_n^{-1\prime}G_n) \le c_1\|S_n^{-1}\|_2\|G_n\|_2 \le c_2. $$
By Eq. (5.18) and Lemma 5.4.2(i)
$$ |\operatorname{tr}(W_n'W_n)| \le \|W_n\|_2^2 \le c. $$
The three estimates above imply $\operatorname{plim}a_{n11}/n = 0$ and $\operatorname{plim}a_{n22}/n = 0$. A similar conclusion for $a_{n12}$ is reached if we note additionally that
$$ \bigl[E(Y_n'W_n'^2W_n^2Y_n)^2\bigr]^{1/2} = g(G_n'W_n'W_nG_n) \le c_1\|G_n\|_2^2\|W_n\|_2^2 \le c_2, \quad \bigl[E(Y_n'W_n'W_nY_n)^2\bigr]^{1/2} = g(G_n'G_n) \le c_1\|G_n\|_2^2 \le c_2. $$
$a_{n21}$ contains one new term for which
$$ \bigl[E(Y_n'W_n^2Y_n)^2\bigr]^{1/2} = g(S_n^{-1\prime}W_nG_n) \le c_1\|S_n^{-1}\|_2\|W_n\|_2\|G_n\|_2 \le c_2. $$
∎
5.10 TWO-STEP PROCEDURE

5.10.1 Maximum Likelihood Estimators

I am not a big specialist in ML and MM estimation and can't say whether in the situation of Theorem 5.8.1 these methods work without the conditions whose failure is reported in Theorems 5.9.2 and 5.9.9. In Mynbaev and Ullah (2008) we decided
to revert to the analysis of the OLS estimator, instead of trying to revive the ML or MM procedures. The OLS estimator seemed to be more amenable to analysis, but the outcome turned out to be a hybrid of OLS and ML estimators. It also incorporates precise (finite-sample) results on ratios of quadratic forms of normal variables. Unlike most two-stage least squares (2SLS) estimators, where the least squares is used in both stages, in our procedure the first step is the OLS, whereas the second step is a correction to it based on a construct that mimics the ML estimator. In view of Theorem 5.8.1, correcting the OLS estimator means trying to obtain a zero-mean variable from a nonzero-mean variable with an unknown mean. The expression of the ML estimator will help the reader understand the idea behind our construction. The ML estimator was derived in a more general situation by Ord (1975), among others. In our case the log-likelihood function is
$$ \ln L_n(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\rho)| - \frac{1}{2\sigma^2}\|Y_n - \rho W_nY_n\|_2^2, \qquad (5.83) $$
where $\theta = (\rho, \sigma^2)$. By the essential formula of the ML theory (Section 5.9.5)
$$ \frac{\partial\ln L_n(\theta)}{\partial\rho} = -\operatorname{tr}(W_nS_n^{-1}(\rho)) - \frac{1}{2\sigma^2}\bigl(-Y_n'W_nY_n - Y_n'W_n'Y_n + 2\rho\|W_nY_n\|_2^2\bigr) = -\operatorname{tr}(W_nS_n^{-1}(\rho)) + \frac{1}{\sigma^2}\bigl(Y_n'W_nY_n - \rho\|W_nY_n\|_2^2\bigr), $$
$$ \frac{\partial\ln L_n(\theta)}{\partial\sigma^2} = -\frac{n}{2}\,\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\|Y_n - \rho W_nY_n\|_2^2. $$
The first-order conditions for maximization of $\ln L_n(\theta)$ give the estimators
$$ \hat\rho_{ML} = \frac{Y_n'W_nY_n - \hat\sigma_{ML}^2\operatorname{tr}(W_nS_n^{-1}(\rho))}{\|W_nY_n\|_2^2}, \quad \hat\sigma_{ML}^2 = \frac{1}{n}\|Y_n - \rho W_nY_n\|_2^2. $$
Of course, these estimators are not feasible as they contain an unknown $\rho$.
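For orientation, here is a minimal sketch (mine, not the author's code) that evaluates the log-likelihood (5.83) and concentrates $\sigma^2$ out in order to scan $\rho$ over a grid; `Y` and `W` are assumed to be a data vector and a spatial matrix supplied by the user.

```python
import numpy as np

def log_lik(rho, sigma2, Y, W):
    """Log-likelihood (5.83) of the purely spatial model at (rho, sigma^2)."""
    n = Y.shape[0]
    S = np.eye(n) - rho * W
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:
        return -np.inf                      # ln|S_n(rho)| requires |S_n(rho)| > 0
    resid = Y - rho * (W @ Y)
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2)
            + logdet - resid @ resid / (2 * sigma2))

def profile_rho(Y, W, grid):
    """Concentrate sigma^2 out via sigma^2(rho) = ||Y - rho W Y||^2 / n and scan rho."""
    n = Y.shape[0]
    vals = []
    for rho in grid:
        resid = Y - rho * (W @ Y)
        vals.append(log_lik(rho, resid @ resid / n, Y, W))
    return grid[int(np.argmax(vals))]
```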
5.10.2 Annihilation of Means of Quadratic Forms of Normal Variables

The material of this section is another piece of information that is necessary to understand the two-step procedure definition. In the theory of normal variables there are many nice finite-sample results. Mathai and Provost (1992) serves as a good introduction and guide to the literature. The annihilation lemma (the name is mine) below answers the question: how can a ratio of quadratic forms
$$ \sum_{i=1}^{n}h_iu_i^2\left(\sum_{i=1}^{n}h_i^2u_i^2\right)^{\!-1} $$
be combined with an inverse of a quadratic form
$$ \left(\sum_{i=1}^{n}h_i^2u_i^2\right)^{\!-1} $$
in such a way that the resulting expression has mean zero? Here $h_1, \dots, h_n$ are some nonzero real numbers. Denote
$$ p_n(t) = \left[\prod_{i=1}^{n}(1 + 2th_i^2)\right]^{1/2}, \quad I_n = \int_0^{\infty}\frac{dt}{p_n(t)}, \quad I_{ni} = \int_0^{\infty}\frac{dt}{p_n(t)(1 + 2th_i^2)}, \quad i = 1, \dots, n. \qquad (5.84) $$
Since $p_n(t)$ is of order $t^{n/2}$ at infinity, the integral $I_n$ converges when $n > 2$, and for such $n$
$$ 0 < I_{ni} < I_n < \infty. \qquad (5.85) $$

Lemma. If $n > 2$, $h_1^2 > 0, \dots, h_n^2 > 0$ and $u \sim N(0, \sigma^2I)$, then
$$ E\left(\frac{\sum_{i=1}^{n}h_iu_i^2}{\sum_{i=1}^{n}h_i^2u_i^2} - \frac{\sigma^2}{I_n}\sum_{i=1}^{n}I_{ni}h_i\,\frac{1}{\sum_{i=1}^{n}h_i^2u_i^2}\right) = 0. $$

Proof. Hoque (1985) proved that if $S$ and $B$ are symmetric matrices, $B$ is positive definite and $u \sim N(0, V)$, then
$$ E\frac{u'Su}{u'Bu} = \int_0^{\infty}|I + 2tVB|^{-1/2}\operatorname{tr}\bigl[(I + 2tVB)^{-1}VS\bigr]\,dt. $$
We apply this result to
$$ S = \operatorname{diag}[h_1, \dots, h_n], \quad B = \operatorname{diag}[h_1^2, \dots, h_n^2], \quad V = \sigma^2I, $$
$$ I + 2tVB = \operatorname{diag}[1 + 2t\sigma^2h_1^2, \dots, 1 + 2t\sigma^2h_n^2], \quad (I + 2tVB)^{-1}VS = \operatorname{diag}\left[\frac{\sigma^2h_1}{1 + 2t\sigma^2h_1^2}, \dots, \frac{\sigma^2h_n}{1 + 2t\sigma^2h_n^2}\right]. $$
Then
$$ E\frac{\sum_{i=1}^{n}u_i^2h_i}{\sum_{i=1}^{n}u_i^2h_i^2} = \int_0^{\infty}\sum_{i=1}^{n}\frac{\sigma^2h_i}{1 + 2t\sigma^2h_i^2}\,\frac{dt}{p_n(\sigma^2t)} = \sum_{i=1}^{n}I_{ni}h_i. \qquad (5.86) $$
However, formula (10) from Jones (1986) yields
$$ E\frac{1}{\sum_{i=1}^{n}u_i^2h_i^2} = \frac{1}{\sigma^2}\int_0^{\infty}\frac{dt}{p_n(t)} = \frac{I_n}{\sigma^2}. \qquad (5.87) $$
To annihilate Eq. (5.86) we have to multiply Eq. (5.87) by $(\sigma^2/I_n)\sum_{i=1}^{n}I_{ni}h_i$. ∎
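The integrals (5.84) and the annihilation lemma are easy to check numerically. The following sketch is mine, with hypothetical weights $h_i$: it computes $I_n$ and $I_{ni}$ by quadrature and verifies by Monte Carlo that the combination in Lemma 5.10.2 has mean approximately zero.

```python
import numpy as np
from scipy.integrate import quad

def In_Ini(h):
    """Integrals (5.84) for given nonzero weights h_1,...,h_n (requires n > 2)."""
    h2 = np.asarray(h, dtype=float) ** 2
    pn = lambda t: np.sqrt(np.prod(1.0 + 2.0 * t * h2))
    In = quad(lambda t: 1.0 / pn(t), 0, np.inf)[0]
    Ini = np.array([quad(lambda t: 1.0 / (pn(t) * (1.0 + 2.0 * t * hi2)), 0, np.inf)[0]
                    for hi2 in h2])
    return In, Ini

rng = np.random.default_rng(1)
h = np.array([1.0, -0.5, 0.25, 0.8])          # hypothetical nonzero weights
sigma2 = 1.3
In, Ini = In_Ini(h)
u2 = sigma2 * rng.standard_normal((200_000, h.size)) ** 2
num, den = u2 @ h, u2 @ h**2
expr = num / den - (sigma2 / In) * (Ini @ h) / den
print("sample mean of the annihilated combination (should be near 0):", expr.mean())
```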
5.10.3 Modification Factor Definition

Since the OLS estimator and the two-step estimator from Section 5.10.4 do not change if $W_n$ is replaced by its symmetric derivative $(W_n + W_n')/2$, from now on we assume without loss of generality that $W_n$ is symmetric. Then $W_n$ can be represented as
$$ W_n = P_n\operatorname{diag}[\lambda_{n1}, \dots, \lambda_{nn}]P_n', \qquad (5.88) $$
where $\lambda_{n1}, \dots, \lambda_{nn}$ are eigenvalues of $W_n$ and $P_n$ is an orthogonal matrix: $P_nP_n' = I$. In the definition of integrals (5.84) put $h_i = h(\lambda_{ni})$, $i = 1, \dots, n$. The modification factor is defined by
$$ M_n = P_n\operatorname{diag}[I_{n1}/I_n, \dots, I_{nn}/I_n]P_n'. \qquad (5.89) $$
5.10.4 Correction Term and Two-Step Estimator Definition

Step 1. Estimate $\rho$ and $\sigma^2$ by OLS.

Step 2. Calculate the correction term and two-step estimator using the formulas
$$ \rho_{corr} = \frac{Y_n'W_nY_n - \hat\sigma^2\operatorname{tr}\bigl(M_nW_nS_n^{-1}(\hat\rho)\bigr)}{\|W_nY_n\|_2^2}, \quad \rho_{2S} = \frac{\hat\rho + \rho_{corr}}{2}. $$
Notice that Step 2 does not require additional estimation. For analytical purposes we rewrite the correction term as
$$ \rho_{corr} = \frac{e_n'S_n^{-1\prime}G_ne_n - \hat\sigma^2\operatorname{tr}\bigl(M_nW_nS_n^{-1}(\hat\rho)\bigr)}{\|G_ne_n\|_2^2}. \qquad (5.90) $$
Here, $e_n$ appears in place of $v_n$ because the errors will be assumed independent normal.
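Putting the pieces together, a minimal sketch of the two-step procedure might look as follows (this is my illustration, not the original GAUSS/MathCAD code); it assumes a symmetric $W$, plugs the OLS estimate $\hat\rho$ into $h$, and computes $\operatorname{tr}(M_nW_nS_n^{-1}(\hat\rho))$ in the eigenbasis of $W$ as derived in Section 5.10.7 below.

```python
import numpy as np
from scipy.integrate import quad

def two_step(Y, W):
    """Sketch of Steps 1-2 of Section 5.10.4 for a symmetric spatial matrix W."""
    n = Y.shape[0]
    WY = W @ Y
    # Step 1: OLS estimates of rho and sigma^2
    rho_ols = (Y @ WY) / (WY @ WY)
    sig2 = np.sum((Y - rho_ols * WY) ** 2) / (n - 1)
    # Spectrum of W and h(rho_ols, lambda_i); Eqs. (5.84), (5.88), (5.89)
    lam = np.linalg.eigvalsh(W)
    hvals = lam / (1.0 - rho_ols * lam)
    pn = lambda t: np.sqrt(np.prod(1.0 + 2.0 * t * hvals**2))
    In = quad(lambda t: 1.0 / pn(t), 0, np.inf)[0]
    Ini = np.array([quad(lambda t: 1.0 / (pn(t) * (1.0 + 2.0 * t * hi**2)), 0, np.inf)[0]
                    for hi in hvals])
    # tr(M_n W_n S_n^{-1}(rho_ols)) = (1/I_n) * sum_i I_ni * h(rho_ols, lambda_i)
    trace_term = np.sum((Ini / In) * hvals)
    # Step 2: correction term and two-step estimator
    rho_corr = (Y @ WY - sig2 * trace_term) / (WY @ WY)
    return 0.5 * (rho_ols + rho_corr), rho_ols
```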
5.10.5 Properties of the Correction Term

Instead of Assumption 5.1 here we make a stronger assumption about the error term: $e_n \sim N(0, \sigma^2I)$. The next theorem, under the additional condition that with some $p < 2$
$$ \sup_n\sum_{i=1}^{n}|\lambda_{ni}|^p < \infty, $$
has been established in Mynbaev and Ullah (2008). Theorem 3.8.5 shows that this condition with $p = 1$ is implied by Assumptions 5.2 and 5.3.

Theorem. (Mynbaev, 2010) Suppose Assumptions 5.2 and 5.3 hold and $e_n \sim N(0, \sigma^2I)$. If the true $\rho$ satisfies $|\rho|\nu(K) < 1$, then there exist random variables $\kappa_{n1}$, $\kappa_{n2}$, $\kappa_{n3}$ and a deterministic function $c_n$ such that
$$ \rho_{corr} = \rho + \kappa_{n1} + \kappa_{n2} + \kappa_{n3}\int_{\hat\rho}^{\rho}c_n(t)\,dt, \qquad (5.91) $$
$$ E\kappa_{n1} = 0 \text{ for all } n, \quad \operatorname{plim}\kappa_{n2} = 0, \qquad (5.92) $$
$$ \operatorname{dlim}\kappa_{n3} = \frac{1}{\sum_{i=1}^{\infty}u_i^2h^2(\lambda_i)}, \qquad (5.93) $$
where $u_i$ are independent standard normal and $\kappa_{n3}$ and $c_n$ are positive a.e.
5.10.6 Discussion

Theorem 5.8.1 suggests replacing the usual consistency definition $\operatorname{plim}\hat\rho = \rho$ by $\operatorname{plim}\hat\rho = \rho + \kappa$ where $E\kappa = 0$. Equation (5.92) shows to what extent we have been able to satisfy this definition. Intuitively, the definition of $\rho_{2S}$ can be explained as follows. By the mean value theorem Eqs. (5.91) and (5.92) imply $\rho_{corr} \approx \rho + \kappa_{n3}c_n(t^*)(\rho - \hat\rho)$, so that the true parameter is approximately a weighted sum of $\rho_{corr}$ and $\hat\rho$:
$$ \rho \approx \frac{\rho_{corr} + \kappa_{n3}c_n(t^*)\hat\rho}{1 + \kappa_{n3}c_n(t^*)}. \qquad (5.94) $$
Here $t^*$ is some point between the true value and the OLS estimate. Since the weights are unknown we choose one half for each, which seems to work pretty well. As a result of the positivity of $\kappa_{n3}$ and $c_n$, overshooting by $\hat\rho$ ($\hat\rho > \rho$) results in negativity of the last term in Eq. (5.91) and undershooting by the correction term ($\rho_{corr} < \rho$). This is why their average is closer to $\rho$ than $\hat\rho$ is. Our attempt to use more correction terms in Monte Carlo simulations did not improve upon $\rho_{2S}$.
5.10.7 Deriving the Correction Term Representation [Equation (5.91)]

Using the diagonalization of $W_n$ [Eq. (5.88)] and the definition of $M_n$ [Eq. (5.89)] we have the expressions
$$ S_n(\hat\rho) = I - \hat\rho W_n = P_n\operatorname{diag}[1 - \hat\rho\lambda_{n1}, \dots, 1 - \hat\rho\lambda_{nn}]P_n', \quad G_n = W_nS_n^{-1} = P_n\operatorname{diag}[h(\lambda_{n1}), \dots, h(\lambda_{nn})]P_n', $$
$$ \operatorname{tr}\bigl(M_nW_nS_n^{-1}(\hat\rho)\bigr) = \frac{1}{I_n}\sum_{i=1}^{n}I_{ni}\,h(\hat\rho, \lambda_{ni}). $$
It is easy to see that the vector $\tilde e_n = P_n'e_n$ is distributed as $N(0, \sigma^2I)$. By properties of trace
$$ e_n'S_n^{-1\prime}G_ne_n = \tilde e_n'P_n'P_n\operatorname{diag}\!\left[\frac{1}{1 - \rho\lambda_{n1}}, \dots, \frac{1}{1 - \rho\lambda_{nn}}\right]P_n'P_n\operatorname{diag}[h(\lambda_{n1}), \dots, h(\lambda_{nn})]P_n'P_n\tilde e_n = \sum_{i=1}^{n}\tilde e_i^2\,\frac{h(\lambda_{ni})}{1 - \rho\lambda_{ni}} $$
and similarly
$$ \|G_ne_n\|_2^2 = e_n'G_n'G_ne_n = \sum_{i=1}^{n}\tilde e_i^2h^2(\lambda_{ni}). $$
Equation (5.90) becomes
$$ \rho_{corr} = \frac{\sum_{i=1}^{n}\tilde e_i^2h(\lambda_{ni})/(1 - \rho\lambda_{ni}) - (\hat\sigma^2/I_n)\sum_{i=1}^{n}I_{ni}h(\hat\rho, \lambda_{ni})}{\sum_{i=1}^{n}\tilde e_i^2h^2(\lambda_{ni})}. $$
Rearrange the numerator to reveal $\rho\sum_{i=1}^{n}\tilde e_i^2h^2(\lambda_{ni})$ and $\sigma^2 - \hat\sigma^2$:
$$ \sum_{i=1}^{n}\tilde e_i^2\frac{h(\lambda_{ni})}{1 - \rho\lambda_{ni}} - \frac{\hat\sigma^2}{I_n}\sum_{i=1}^{n}I_{ni}h(\hat\rho, \lambda_{ni}) = \sum_{i=1}^{n}\tilde e_i^2\left[\frac{h(\lambda_{ni})}{1 - \rho\lambda_{ni}} - h(\lambda_{ni})\right] + \sum_{i=1}^{n}\left[\tilde e_i^2 - \frac{\sigma^2I_{ni}}{I_n}\right]h(\lambda_{ni}) $$
$$ + \frac{\sigma^2 - \hat\sigma^2}{I_n}\sum_{i=1}^{n}I_{ni}h(\lambda_{ni}) + \frac{\hat\sigma^2}{I_n}\sum_{i=1}^{n}I_{ni}\bigl[h(\lambda_{ni}) - h(\hat\rho, \lambda_{ni})\bigr]. \qquad (5.95) $$
Here the first term actually is $\rho\sum_{i=1}^{n}\tilde e_i^2h^2(\lambda_{ni})$. Hence, if we denote
$$ \kappa_{n0} = \sum_{i=1}^{n}\tilde e_i^2h^2(\lambda_{ni}), \quad \kappa_{n1} = \frac{1}{\kappa_{n0}}\sum_{i=1}^{n}\left(\tilde e_i^2 - \frac{\sigma^2I_{ni}}{I_n}\right)h(\lambda_{ni}), \quad \kappa_{n2} = \frac{\sigma^2 - \hat\sigma^2}{\kappa_{n0}I_n}\sum_{i=1}^{n}I_{ni}h(\lambda_{ni}), \quad \kappa_{n3} = \frac{\hat\sigma^2}{\kappa_{n0}}, \qquad (5.96) $$
then Eq. (5.95) becomes
$$ \rho_{corr} = \rho + \kappa_{n1} + \kappa_{n2} + \kappa_{n3}\sum_{i=1}^{n}\frac{I_{ni}}{I_n}\bigl(h(\lambda_{ni}) - h(\hat\rho, \lambda_{ni})\bigr). \qquad (5.97) $$
If we also take into account that
$$ h(\lambda_{ni}) - h(\hat\rho, \lambda_{ni}) = h(\rho, \lambda_{ni}) - h(\hat\rho, \lambda_{ni}) = \int_{\hat\rho}^{\rho}\frac{\partial h(t, \lambda_{ni})}{\partial t}\,dt = \int_{\hat\rho}^{\rho}h^2(t, \lambda_{ni})\,dt $$
and denote
$$ c_n(t) = \sum_{i=1}^{n}\frac{I_{ni}}{I_n}\,h^2(t, \lambda_{ni}), $$
then Eq. (5.97) gives Eq. (5.91).
5.10.8 Establishing the Properties of Components of $\rho_{corr}$

By Eq. (5.96) and the annihilation Lemma 5.10.2
$$ E\kappa_{n1} = E\,\frac{\sum_{i=1}^{n}\tilde e_i^2h(\lambda_{ni}) - (\sigma^2/I_n)\sum_{i=1}^{n}I_{ni}h(\lambda_{ni})}{\sum_{i=1}^{n}\tilde e_i^2h^2(\lambda_{ni})} = 0. $$
Lemma 5.10.2 is applicable because zero $h(\lambda_{ni})$ can be left out of all sums. We have proved the first part of Eq. (5.92). Since $\kappa_{n0}$ is the second component of the vector $\mathcal{P}_n$ from Section 5.7.1, $\kappa_{n0} = \mathcal{P}_{n2} = e_n'G_n'G_ne_n$, and $\mathcal{P}_n$ converges to the projy [see Eq. (5.72) and Section 5.7.2], we have
$$ \operatorname{dlim}_{n\to\infty}\kappa_{n0} = J_2 = (\sigma\beta_c)^2\sum_{i=1}^{\infty}u_i^2h^2(\lambda_i). \qquad (5.98) $$
Assumption 5.3, Eqs. (5.71) and (5.85), and Theorem 3.8.5 imply
$$ \sum_{i=1}^{n}\left|\frac{I_{ni}}{I_n}h(\lambda_{ni})\right| \le \sum_{i=1}^{n}|h(\lambda_{ni})| \le c \quad \text{for all } n. \qquad (5.99) $$
Hence, factorizing $\kappa_{n2}$ as
$$ \kappa_{n2} = \frac{1}{n^{1/2}}\,\frac{1}{\kappa_{n0}}\Bigl[\sqrt{n}(\sigma^2 - \hat\sigma^2)\Bigr]\left[\sum_{i=1}^{n}\frac{I_{ni}}{I_n}h(\lambda_{ni})\right], $$
we see that by Corollary 5.8.4 and Eqs. (5.98) and (5.99) the factors in all brackets are $O_p(1)$, so that $\kappa_{n2} = o_p(1)$. We have proved the second relation in Eq. (5.92). Equation (5.93) follows from Eq. (5.98) and consistency of $\hat\sigma^2$ by CMT:
$$ \operatorname{dlim}_{n\to\infty}\kappa_{n3} = \frac{\operatorname{plim}\hat\sigma^2}{\operatorname{dlim}_{n\to\infty}\kappa_{n0}} = \frac{\sigma^2}{(\sigma\beta_c)^2\sum_{i=1}^{\infty}u_i^2h^2(\lambda_i)}. $$
Nonnegativity of $\kappa_{n3}$ and $c_n$ is obvious. ∎
5.11 EXAMPLES AND COMPUTER SIMULATION

The exposition here follows Mynbaev and Ullah (2008).
5.11.1 Definition of the Case Spatial Matrices

Conditions in the asymptotic theory usually involve infinite sequences of matrices. In our case we need a link between the function $K$ and the sequence of spatial matrices. Other authors in the area have advanced more complex conditions. Those that involve $S_n^{-1}$, $G_n$ and random vectors (involving the error directly or through $Y_n$) are especially difficult to verify (see Kelejian and Prucha, 1999, Assumption 5; Kelejian and Prucha, 2001, Assumption 7; Kelejian and Prucha, 2002, Assumption 7; Lee, 2002, Assumptions 5, 7, 9; Lee, 2003, Assumption 5; Lee, 2004a, Assumptions 9, 10 and Theorem 3.2). Going from a function $K \in L_2((0,1)^2)$ to spatial matrices is easy: just discretize $K$, and the approximation condition trivially holds. Going back, that is, finding a function $K$ that approximates a given sequence $\{W_n\}$ of practical interest, is more difficult. Here we take a look at one practical example of spatial matrices considered by Case (1991). We prove $L_2$-approximability, but the rate of approximation is slower than $o(1/\sqrt{n})$. Despite this slower rate, we think that Monte Carlo simulations can still be useful if they support the corollary of Theorem 5.8.1 that the asymptotics of $\hat\rho$ is not normal and provide evidence that the two-step procedure from Section 5.10.4 improves the OLS estimator. Besides, the model can be reformulated as one with an exogenous regressor and a pseudo-Case matrix (see the definition and lemma in Section 5.11.3) and then the result for the mixed spatial model will apply.
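To illustrate the easy direction (discretizing a given $K$), here is a sketch written by me under the assumption that the discretization operator acts cellwise as $(\delta_nK)_{ij} = n\iint_{q_{ij}^{(n)}}K(x,y)\,dx\,dy$, which is how it is used in the proof of the lemma in Section 5.11.3 below; the step kernel in the example is the $K = r\,1_q$ defined there.

```python
import numpy as np

def discretize(K, n, grid=20):
    """(delta_n K)_{ij} = n * integral of K over the cell q_ij^{(n)},
    approximated by a midpoint rule with grid*grid points per cell (cell area 1/n^2)."""
    D = np.empty((n, n))
    for i in range(n):
        x = (i + (np.arange(grid) + 0.5) / grid) / n
        for j in range(n):
            y = (j + (np.arange(grid) + 0.5) / grid) / n
            X, Y = np.meshgrid(x, y, indexing="ij")
            D[i, j] = K(X, Y).mean() / n     # n * (1/n^2) * average of K over the cell
    return D

# Example: the step kernel K = r * 1_q of Section 5.11.3 with r = 2 districts
r = 2
K = lambda x, y: r * (np.floor(r * x) == np.floor(r * y))
print(discretize(K, n=4))   # entries 1/m = 1/2 inside the diagonal blocks, 0 outside
```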
In the Case (1991) framework there are $r$ districts and $m$ farmers in each district. Denote $l_m = (1, \dots, 1)'$ ($m$ unities), $B_m = (l_ml_m' - I_m)/(m-1)$ and $n = rm$. The Case spatial matrix equals
$$ W_n = I_r \otimes B_m. $$
Here the blocks $B_m$ are put along the diagonal. All elements of the blocks except those on the diagonal are equal to $1/(m-1)$. The diagonal elements are null.
5.11.2 Eigenvalues of the Case Matrices

Lemma. $W_n$ has $r$ eigenvalues equal to 1 and $r(m-1)$ eigenvalues equal to $1/(1-m)$.

Proof. From $(l_ml_m')l_m = l_m(l_m'l_m) = ml_m$ we see that $\lambda_1 = m$ is an eigenvalue and $e_1 = l_m$ is the corresponding eigenvector of the matrix $l_ml_m'$. Denote $X_m$ the $(m-1)$-dimensional subspace of $R^m$ of vectors orthogonal to $e_1$: $X_m = \{x \in R^m : l_m'x = x_1 + \dots + x_m = 0\}$. For any $x \in X_m$, $l_ml_m'x = 0$. Selecting in $X_m$ a set $e_2, \dots, e_m$ of pairwise orthogonal vectors we see that they are eigenvectors that correspond to eigenvalues $\lambda_2 = \dots = \lambda_m = 0$. Since the system $e_1, \dots, e_m$ is complete in $R^m$, we have found all eigenvalues of $l_ml_m'$. $\det(l_ml_m' - \lambda I) = 0$ is equivalent to $\det\bigl(B_m - \frac{\lambda - 1}{m-1}I\bigr) = 0$. Therefore each eigenvalue $\lambda$ of $l_ml_m'$ generates an eigenvalue $(\lambda - 1)/(m-1)$ of $B_m$. The eigenvalues of $B_m$ then are $\lambda_1 = 1$ and $\lambda_2 = \dots = \lambda_m = 1/(1-m)$. Since $I_r$ has $r$ eigenvalues equal to 1, the statement follows from the following property of Kronecker products (Lütkepohl, 1991, p. 464): the eigenvalues $\lambda_{ij}(A\otimes B)$ of $A\otimes B$ are obtained by multiplying all possible eigenvalues $\lambda_i(A)$ of $A$ by all possible eigenvalues $\lambda_j(B)$ of $B$.
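A quick numerical check of the lemma (my sketch, not from the original text): build the Case matrix as a Kronecker product and inspect its spectrum.

```python
import numpy as np

def case_matrix(r, m):
    """Case (1991) spatial matrix W_n = I_r (x) B_m with B_m = (l_m l_m' - I_m)/(m-1)."""
    l = np.ones((m, 1))
    B = (l @ l.T - np.eye(m)) / (m - 1)
    return np.kron(np.eye(r), B)

W = case_matrix(r=3, m=5)
eig = np.round(np.linalg.eigvalsh(W), 10)
print(np.unique(eig, return_counts=True))
# expected: eigenvalue 1 with multiplicity r and 1/(1-m) with multiplicity r*(m-1)
```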
5.11.3 L2-Approximability of the Case Matrices

The Case matrix $W_n$ has $r$ blocks on the main diagonal equal to $B_m$ and all other blocks null. If $\{W_n\}$ is to be $L_2$-close to some function $K$, the blocks on the main diagonal must be modeled by the behavior of $K$ along the 45° line. $L_2$-approximability requires some stabilization of these blocks. If $m$ is fixed and $r$ tends to infinity, then $K$ would have to be zero outside an arbitrary neighborhood of the 45° line and in that case $K$ would have to simply vanish. Since $B_m$ has zeros on the main diagonal and all other elements equal to $1/(m-1)$, we have
$$ \|B_m\|_2 = \left(\frac{m^2 - m}{(m-1)^2}\right)^{\!1/2} = \left(\frac{m}{m-1}\right)^{\!1/2}, \quad \|W_n\|_2 = \left(\sum_{j=1}^{r}\|B_m\|_2^2\right)^{\!1/2} = \left(\frac{rm}{m-1}\right)^{\!1/2} \to \infty, $$
so that $\{W_n\}$ is not $L_2$-close to $K \equiv 0$.
In the lemma below we see that {Wn } is L2 -approximable in the other extreme case, when r is fixed and m tends to infinity. Define pseudo-Case matrices by e n ¼ Ir lm l0m =(m 1): W The rate of approximation for the Case matrices is slower than required in Theorem 5.8.1, so we mention in passing that the pseudo-Case matrices satisfy Assumption 5.2 of Theorem 5.8.1. However, for the computer simulations only the proper Case matrices were used. Denote by
u1 u v1 v (n) ,s, , , t , , 1 u, v n, quv ¼ (s, t) : n n n n squares with sides of length 1=n. They cover (0, 1)2 . We define K to be r on the union q of diagonal squares of side 1=r and 0 outside that union: q¼
r [
q(r) uu , K ¼ r1q :
u¼1
Lemma . . . } of the Case matrices is (i) For any fixed r, the sequence {Wn : m ¼ 1, 2, pffiffiffi L2 -close to K and kWn dn Kk2 ¼ O 1= n : e n : m ¼ 1, 2, . . . } of the pseudo-Case (ii) For any fixed r, the sequence {W e n dn Kk2 ¼ o 1=pffiffinffi . matrices satisfies kW Proof. (i) Consider the terms in kWn dn Kk22 ¼
n X
[wnij (dn K)ij ]2 :
i, j¼1
Let buv ¼ {(i, j) : (u 1)m þ 1 i um, (v 1)m þ 1 j vm}, 1 u, v r, be the batches of indices that correspond to blocks of Wn of size m m. The diagonal blocks are all Bm and the others are null matrices. 1. Let (i, j) [ buu . From inequalities 1 i (u 1)m m, 1 j (u 1)m m we see that
wnij ¼
1=(m 1), 0,
if i = j, if i ¼ j:
5.11 EXAMPLES AND COMPUTER SIMULATION
233
However, q(n) ij
,
q(r) uu
ð , q, (dn K)ij ¼ n
rdxdy ¼
1 : m
q(n) ij
2. Let (i, j) [ buv with u = v. Then wnij ¼ 0: 2 Since q(n) ij , ((0, 1) nq), we have
(dn K)ij ¼ 0: The equations we have derived imply n X
[wnij (dn K)ij ]2
i, j¼1
¼
r X
X
u,v¼1,u=v (i, j)[buv n r X X 1 ¼ þ m2 u¼1 i¼1
þ
r X X
! [wnij (dn K)ij ]2
u¼1 (i, j)[buu
X (i, j)[buu ,i=j
1 1 2 m1 m
r X r 1 r r 1 2 ¼O : ¼ þ 2 (m m) ¼ þ m m (m 1)2 u¼1 m m(m 1) n (ii) From the above proof one can see that it is the diagonal elements In =(m 1) in Bm that prevent the norm kW n dn Kk2 from being of pffiffiffi order better than O 1= n . By removing them we get statement (ii). B
5.11.4 Monte Carlo Simulations for Theorem 5.8.1 In a finite-sample framework, there is no sequence of spatial matrices and we cannot know the function K that approximates that sequence. Applying the interpolation operator to Wn , we can define K and regard it as the function that approximates the given and all subsequent (unknown) spatial matrices. With this definition, Wn for the given sample becomes an exact image of K under discretization. As Lemma 3.8.4 shows, nonzero eigenvalues of Wn and K coincide. Simulation of the asymptotic result of Theorem 5.8.1 becomes, effectively, a comparison of simulation results for the finite-sample deviation from the true value [Eq. (5.5)] and its eigenvalue representation [Eq. (5.6)] in the case of a symmetric Wn . In this sense the simulation of Theorem 5.8.1 is trivial. However, it can be useful if evidence is sought against the null hypothesis of normal asymptotics.
234
CHAPTER 5
SPATIAL MODELS
Lee (2004a) studied the performance of the QML estimator for r ranging from 30 to 120 and m from 3 to 100. Our values for r, m are roughly the same. Unlike Lee, who investigated only convergence properties, we also checked the empirical distribution and found evidence that it is not normal. We find the empirical distribution function of the OLS estimator with 1000 repetitions. As Lemma 5.11.2 shows, Wn has a large number of equal negative eigenvalues, denoted lmin , and a small number of equal positive eigenvalues, denoted lmax . Recall that Theorem 5.8.1 guarantees convergence of r^ for r in a small neighborhood of 0, called here a convergence neighborhood. The combinations of r and m considered are: (a) m ¼ 10, r ¼ 100 (jrj , 0:0047); (b) m ¼ 100, r ¼ 10 (jrj , 0:0524); (c) m ¼ 50, r ¼ 50 (jrj , 0:01). (The intervals in parentheses are convergence neighborhoods). For each of the cases (a), (b) and (c) we take three different values of r: one in a small neighborhood of 0, another close to lmin and the third close to lmax . Thus, we do nine simulations and for each of them: (i) test for normality the distributions of the OLS estimator and its “eigenvalue”counterpart (5.6), (ii) find sample means and standard deviations of the OLS estimator r^ and its expression in terms of eigenvalues. Table 5.1 shows that, in many cases, bias is large and comparable in absolute value with the parameter being estimated. This should not come as a surprise because a ratio of quadratic forms in general does not have mean zero. The main calculations were made in GAUSS and the empirical distributions were fed to Minitab to test for
TABLE 5.1 Simulations for Theorem 5.8.1
Values of m, r (a) m ¼ 10, r ¼ 100
(b) m ¼ 100, r ¼ 10
(c) m ¼ 50, r ¼ 50
True r
OLS estimator mean (s.d.)
“Eigenvalue” formula mean (s.d.)
(a1)20.105 (a2) 0.1 (a3) 0.95 (b1)20.095 (b2) 0.1 (b3) 0.95 (c1)20.015 (c2) 0.1 (c3) 0.95
20.2445 (0.1648) 0.1789 (0.1085) 0.9976 (0.0003) 20.4972 (0.8410) 0.0165 (0.5512) 0.9969 (0.0016) 20.0707 (0.2175) 0.1728 (0.1662) 1.0129 (0.1694)
20.2350 (0.1712) 0.1781 (0.1189) 0.9976 (0.0003) 20.4503 (0.7753) 0.0212 (0.5461) 0.9969 (0.0016) 20.0685 (0.2193) 0.1605 (0.1669) 1.0015 (0.0005)
Mynbaev and Ullah (2008). (s.d.) standard deviation.
5.11 EXAMPLES AND COMPUTER SIMULATION
235
normality. In all cases the null hypothesis that the distribution is normal is rejected (the p-value of the Anderson – Darling statistic is less than 0.005 in all cases). When m is small relative to r (m ¼ 10, r ¼ 200) and r is close to zero, the sample distribution of r^ for the purely spatial model is closer to the normal. When, on the contrary, m is large relative to r (m ¼ 200, r ¼ 10) and r is the same, the sample distribution of r^ is positively skewed. Thus, if nonzero entries of Wn are concentrated around the main diagonal, we should expect asymptotic normality. To a lesser extent, this effect is observed in case of the mixed model. For the purely spatial model, as r approaches the right end of the theoretical interval of convergence, the sample distribution collapses to a spike.
5.11.5 Monte Carlo Simulations for the Two-Step Procedure The second part of the computer simulations is related to Theorem 5.10.5. The twostep procedure from Section 5.10.4 is computationally intensive. GAUSS’ internal code for calculating integrals is unreliable and we had to use MathCAD to find the coefficients In and Ini . For moderate values of n (10 and 100) we have to take values from a ¼ 100 to a ¼ 1000 to approximate improper integrals over the halfline by integrals over [0, a]. For n ¼ 1000 the function 1=pn declines very quickly and it is sufficient to take a ¼ 10. With In and Ini at hand we used GAUSS to realize the two-step procedure. In cases (a) and (b) from Section 5.11.4 it took about half an hour on a computer with a processor speed 2.4 MHZ to simulate 100 procedures and the total time for each of the six subcases was about 50 minutes. Therefore we did not attempt to simulate 1000 times and in case (c) the combination m ¼ 50, r ¼ 50 was replaced with m ¼ 40, r ¼ 40 (the convergence neighborhood being jrj , 0:0125). The results are presented in Table 5.2. Some of them are not particularly illuminating [see, for example, cases (a3), where the approximation is not very good, and (b1), where the standard error is comparable with the bias]. This is why it is better to compare the errors of the OLS and our procedure.
TABLE 5.2 Simulations for Two-Step Estimator
Values of m, r (a) m ¼ 10, r ¼ 100
(b) m ¼ 100, r ¼ 10
(c) m ¼ 40, r ¼ 40
Mynbaev and Ullah (2008).
True r
Two-step estimator mean (s.d.)
(a1)20.105 (a2) 0.1 (a3) 0.95 (b1)20.095 (b2) 0.1 (b3) 0.95 (c1)20.015 (c2) 0.1 (c3) 0.95
20.0950 (0.0051) 0.1017 (0.0027) 0.5070 (3.2e-008) 20.1884 (0.1918) 0.0888 (0.0586) 0.9467 (1.3e-006) 20.0721 (0.0525) 0.1556 (0.0483) 0.9974 (3.7e-007)
236
CHAPTER 5
SPATIAL MODELS
TABLE 5.3 Comparison of Percentage Errors
Values of m, r (a) m ¼ 10, r ¼ 100
(b) m ¼ 100, r ¼ 10
(c) m ¼ 40, r ¼ 40
True r
OLS error %
“Eigenvalue” formula error%
Two-step estimator error
(a1) 20.105 (a2) 0.1 (a3) 0.95 (b1) 20.095 (b2) 0.1 (b3) 0.95 (c1) 20.015 (c2) 0.1 (c3) 0.95
132.86 78.90 5.01 423.37 83.50 4.94 371.33 72.80 6.62
123.81 78.10 5.01 374.00 78.80 4.94 356.67 60.50 5.42
9.52 1.70 46.63 98.32 11.20 0.35 380.67 55.60 4.99
Mynbaev and Ullah (2008).
Comparison of errors is given in Table 5.3. As we can see, for r close to zero the two-step procedure improves the OLS estimator in all cases. For r close to one of the eigenvalues of Wn the evidence is mixed: in two cases (shown in bold) the error has increased.
5.12 MIXED SPATIAL MODEL

5.12.1 Statement of Problem

Here we study the model
$$ Y_n = X_n\beta + \rho W_nY_n + v_n, \qquad (5.100) $$
where r, Wn and vn are the same as in the purely spatial model, Xn is an n k matrix of deterministic exogenous regressors and b is a k-dimensional parameter vector. A range of estimation techniques for this model has been investigated in the literature: ML and QML, MM and generalized MM, the least squares and 2SLS and the instrumental variables estimator (see Ord, 1975; Kelejian and Prucha, 1998,1999; Smirnov and Anselin, 2001; Lee, 2001, 2002, 2003, 2004a). Despite the conceptual and technical differences in approaches, all these authors have been looking for a normal asymptotics. The goal in the rest of this chapter is to extend to the mixed spatial model the method developed in the previous sections for the purely spatial model. Corresponding to two particular cases b ¼ 0 and r ¼ 0 we have two submodels Submodel 1: Yn ¼ rWn Yn þ vn , Submodel 2: Yn ¼ Xn b þ vn : Theorem 5.8.1 supplies a set of conditions sufficient for convergence in distribution of the OLS estimator r^ for Submodel 1. Let’s call this set Set 1. The corresponding set for
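For later reference, data from the mixed model (5.100) can be generated through the reduced form $Y_n = S_n^{-1}(X_n\beta + v_n)$. The sketch below is mine and uses i.i.d. standard normal errors as the simplest special case of Assumption 5.1.

```python
import numpy as np

def simulate_mixed_model(X, beta, rho, W, rng):
    """Generate Y_n from Eq. (5.100) via the reduced form Y_n = S_n^{-1}(X_n beta + v_n),
    where S_n = I - rho W_n and the errors are i.i.d. N(0,1)."""
    n = W.shape[0]
    v = rng.standard_normal(n)
    S = np.eye(n) - rho * W
    return np.linalg.solve(S, X @ beta + v)
```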
5.12 MIXED SPATIAL MODEL
237
Submodel 2 is provided in Chapter 1. When considering the combined model (5.100) it is natural to join Sets 1 and 2 into one Set C (comprehensive or combined) and try to impose on top of Set C as few conditions as possible to obtain convergence in distribution of the OLS estimator u^ of u ¼ (b0 , r)0 . Not only should the conditions be combined, but also the knowledge of the elements of the conventional scheme, for the submodels should contribute to the construction of the conventional scheme for the combined model. Inspection of the conditions of Theorems 1.12.3 and 5.8.1 shows that Set C consists of Assumptions 5.1 –5.3 plus just one condition on the columns Hn1 , . . . , Hnk of the normalized regressor matrix Hn ¼ Xn D1 n where Dn ¼ diag[kXn1 k2 , . . . , kXnk k2 ] is the VS normalizer (see Section 1.11.1 for the definition) and Xn1 through Xnk are columns of Xn . The condition looks like this: 5.12.1.1
Assumption 5.4 on the normalized regressors
1. The sequence of columns {Hnl : n [ N} is L2 -close to Ml [ L2 (0, 1), l ¼ 1, . . . , k. 2. M1 , . . . , Mk are linearly independent or, equivalently, the Gram matrix 0
1 (M1 , Mk )L2 A (Mk , Mk )L2
(M1 , M1 )L2 @ G0 ¼ (Mk , M1 )L2 is positive definite (see Section 1.7.5).
Item 2 of this assumption can be called asymptotic linear independence of normalized regressors. We need to analyze the elements of the conventional scheme for the mixed model and get the most out of Assumptions 5.1– 5.4.
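As a small illustration of the normalization behind Assumption 5.4 (my sketch, with hypothetical regressors), the matrix $H_n = X_nD_n^{-1}$ and its Gram matrix can be formed as follows; the columns in the example are $L_2$-close to constant and linear limit functions.

```python
import numpy as np

def normalize_regressors(X):
    """H_n = X_n D_n^{-1} with D_n = diag[||X_{n1}||_2, ..., ||X_{nk}||_2] (Section 5.12.1)."""
    norms = np.linalg.norm(X, axis=0)
    return X / norms, np.diag(norms)

n = 100
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])   # hypothetical exogenous regressors
H, D = normalize_regressors(X)
G0 = H.T @ H    # its (l,m) entry approximates (M_l, M_m)_{L2} for the normalized limit functions
print(G0)
```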
5.12.2 The Conventional Scheme for the Mixed Model Denoting u ¼ (b0 , r)0 the parameter vector and Zn ¼ (Xn , Wn Yn ) the regressor matrix we rewrite Eq. (5.100) as Yn ¼ Zn u þ vn . Until we work out the condition for nonsingularity of Zn0 Zn it is safer to work with the normal equation Zn0 Zn (u^ u) ¼ Zn0 vn . Following the discussion in Section 1.11.1 the suggestion is to put Dn ¼
Dn 0
0 dn
with dn . 0 to be defined later. The normal equation is easily rearranged to 0 1 1 0 ^ D1 n Zn Zn Dn Dn (u u) ¼ Dn Zn vn :
238
CHAPTER 5
SPATIAL MODELS
Here 0 zn ¼ D1 n Zn vn
and
0 1 Vn ¼ D1 n Z n Z n Dn
are called numerator and denominator, respectively. Recalling the notation Sn ¼ I rWn , Gn ¼ Wn S1 n , 1 exists we get the reduced form Yn ¼ S1 and assuming that S1 n n Xn b þ Sn vn of Eq. (5.100) so that
Wn Yn ¼ Gn Xn b þ Gn vn ¼ Gn Hn Dn b þ Gn vn : Therefore the normalized regressor matrix is 1 1 1 Dn 0 1 ¼ Hn , Gn Hn Dn b þ Gn vn Zn Dn ¼ (Xn , Wn Yn ) 0 dn1 dn dn
(5:101)
and the numerator is
zn ¼
0 D1 n Zn vn
¼
Hn0 vn : 0 0 0 1 1 0 0 dn (Dn b) Hn Gn vn þ dn vn Gn vn
(5:102)
From our experience with Submodels 1 and 2 (Section 5.12.1) we can surmise that perhaps the behavior of the terms involving Hn0 , G0n and vn can be deduced from Assumptions 5.1– 5.4. However, the vector bn ¼
1 1 Dn b ¼ (kXn1 k2 b1 , . . . , kXnk k2 bk ) dn dn
involves both dn and Dn and its behavior may present problems; bn is called a balancer. With the help of Eq. (5.101) it is easy to establish the working expressions for the blocks of the denominator Vn11 ¼ Hn0 Hn , Vn12 ¼ Hn0 Gn Hn bn þ
1 0 H Gn vn , dn n
Vn21 ¼ V0n12 , Vn22 ¼ b0n Hn0 G0n Gn Hn bn þ
2 0 0 0 1 b H G Gn vn þ 2 v0n G0n Gn vn : dn n n n dn
(5.103)
As we see in the next few sections, all parts of zn and Vn not involving dn and bn converge under Assumptions 5.1 –5.4. Everywhere these assumptions are assumed to hold and only additional conditions are listed.
5.12 MIXED SPATIAL MODEL
239
5.12.3 Infinite-Dimensional Matrices We use infinite-dimensional matrices A of size l m, where one or both dimensions can be infinite. Matrices can extend downward or rightward, but not upward or leftward. We consider only matrices with finite l2 -norms. Summation, transposition and multiplication are performed as usual and preserve this property because kA þ Bk2 kAk2 þ kBk2 , kA0 k2 ¼ kAk2 , kABk2 kAk2 kBk2 : The above inequality ensures the validity of the associativity law for multiplication and the transposition law for products (AB)C ¼ A(BC), (AB)0 ¼ B0 A0 :
(5:104)
This may be the only fact that is not evident, so here is the proof for multiplication. Formally, for the ijth element of the products (AB)C and A(BC) we have X X ((AB)C)ij ¼ (AB)il clj ¼ aik bkl clj l
¼
X
k,l
aik
X
k
! bkl clj
¼
X
l
aik (BC)kj ¼ (A(BC))ij :
k
These formal manipulations are justified by absolute convergence of all the series. For example, for one of the terms in the middle X k
jaik j
X
! jbkl clj j
l
X
jaik j
X
k
X
!1=2 b2kl
X
l
l
!1=2
!1=2
a2ik
k
X k,l
b2kl
!1=2 c2lj X
!1=2 c2lj
, 1:
l
Elements of l2 are written as column-vectors. A matrix A with kAk2 , 1 induces a bounded linear operator A in l2 by the formula 0P 1 a1j xj j B C B C Ax ¼ B P C @ alj xj A j
because kAxk2 kAk2 kxk2 . Henceforth for the operator we use the same notation A as for the matrix. Denote B the set of bounded operators in l2 and BM the set of bounded operators in l2 having a matrix representation A with kAk2 , 1. BM is a proper subset of B (for example, for the diagonal matrix representing the identity operator one has kIk2 ¼ 1).
240
CHAPTER 5
SPATIAL MODELS
Therefore some care is necessary when in one formula there are operators from both B and BM:
5.12.4 EXtender and eXtended M Let {Fi } be the orthonormal system of eigenfunctions of the operator K from Assumption 5.3 (Section 5.3.1). With a function F [ L2 (0, 1) decomposed as X F¼ (F, Fj )L2 Fj j1
we associate a vector X F [ l2 of its Fourier coefficients X F ¼ ((F, F1 )L2 , (F, F2 )L2 , . . . )0 : By Parseval’s identity (3.33) (X F)0 X G ¼
X
(F, Fi )(G, Fi ) ¼ (F, G)L2
i
for any functions F, G [ L2 (0, 1). In particular, X preserves norms and kX Fk2 ¼ kFk2 . Hence, X is an isomorphism from L2 (0, 1) to l2 . Denote M ¼ (M1 , . . . , Mk )0 : I call X an eXtender because its application to M 0 gives a matrix M ¼ (X M1 , . . . , X Mk ) ¼ X M 0 of size 1 k. M is named an eXtended M. The linear span of M1 , . . . , Mk is denoted M. Lemma. If Assumption 5.4 (Section 5.12.1.1) holds, then the columns of M are linearly independent and kMk2 , 1: Proof. By Assumption 5.4 the functions Mi are linearlyPindependent. The columns of PM are linearly independent because, by isomorphism, i ci Mi ¼ 0 is equivalent to i ci X Mi ¼ 0. Assumption 5.4 also guarantees square-summability of columns of M. Since B the number of the columns is finite, we have kMk2 , 1.
5.12.5 Double P Lemma Denote P ¼ M(M0 M)1 M0 , Q ¼ I P,
(1st PP), (2nd PP):
(5:105)
P and Q are referred to as principal projectors of the theory of mixed spatial models.
5.12 MIXED SPATIAL MODEL
Lemma.
241
If Assumption 5.4 (Section 5.12.1) holds, then
(i) P and Q are symmetric and idempotent. (ii) P projects l2 onto the image X M of M under X : L2 (0, 1) ! l2 . Proof. (i) Like in linear algebra, we use Eq. (5.104) to check that 0
1
1
P0 ¼ M00 [(M0 M)1 ] M0 ¼ M[(M0 M)0 ] M0 ¼ M[M0 M00 ] M0 ¼ P, P2 ¼ M(M0 M)1 M0 M(M0 M)1 M0 ¼ M(M0 M)1 M0 ¼ P: Since M0 M is finite-dimensional, we can change the order of inversion and transposition without worrying about generalization of linear algebra rules. To prove that Q is an orthoprojector we need to treat P as an operator, not a matrix, because with the identity operator, matrix calculus doesn’t work: Q0 ¼ I P0 ¼ I P, Q2 ¼ (I P)2 ¼ I P P þ P2 ¼ I P ¼ Q: (ii) The image of P coincides with X M because for any x [ l2 the vector Px is a linear combination of X M1 , . . . , X Mk with coefficients (G1 M0 x)l , l ¼ 1, . . . , k: Px ¼ M(M0 M)1 M0 x ¼ (X M1 , . . . , X Mk )G1 M0 x ¼
k X
(G1 M0 x)l X Ml :
l¼1
B
5.12.6 The Genie The eXtender generates an isomorphism between two sets of operators: any bounded operator A in L2 (0, 1) induces a bounded operator A~ ¼ X AX 1 in l2 : Let hX ¼ diag[h(l1 ), h(l2 ), . . . ] be an infinite-dimensional diagonal matrix. It is shown in Section 5.8.3 that Eq. (5.68) P implies i1 jh(li )j , 1, so khX k2 , 1. I call hX a genie because, ultimately, it is a reflection of the limit of Gn ¼ h(Wn ). The significance of hX is explained by the following fact. Lemma.
The genie is the operator induced in l2 by h(K): X h(K) ¼ hX X :
(5:106)
242
CHAPTER 5
SPATIAL MODELS
Proof. According to the general definition of functions of operators [Eq. (3.36)] X h(li )(F, Fi )Fi : h(K)F ¼ i
By definition of the eXtender, X h(K)F ¼ (h(l1 )(F, F1 ), h(l2 )(F, F2 ), . . . )0 ¼ hX ((F, F1 ), (F, F2 ), . . . )0 ¼ hX X F: Since F is arbitrary, this proves the lemma.
B
5.12.7 Convergence of Quadratic Forms in Hn Denote Gi ¼ M0 hiX M, i ¼ 0, 1, 2:
(5:107)
By Parseval’s identity, the G0 defined here is the same as the Gram matrix from Assumption 5.4 (Section 5.12.1): G0 ¼ X M X M ¼ 0
X
!k
Lemma.
¼ M(x)M 0 (x)dx:
(Ml , Fi )L2 (Mm , Fi )L2
i
ð1
l,m¼1
0
If jrjn(K) , 1, then
(i) limn !1 Hn0 Hn ¼ G0 , (ii) limn !1 Hn0 Gn Hn ¼ G1 ¼ limn !1 Hn0 G0n Hn , (iii) limn !1 Hn0 G0n Gn Hn ¼ G2 . Proof. Statement (i) follows directly from Assumption 5.4, Theorem 2.5.3 and the definition of M: lim (H 0 Hn )lm n !1 n
¼
lim H 0 Hnm n !1 nl
ð1 ¼ Ml (x)Mm (x)dx 0
¼
1 X
(Ml , F j )L2 (Mm , Fj )L2 ¼ (M0 M)lm :
j¼1
The proofs of the other statements are given in separate sections.
5.12.8 Proving Convergence of Hn0 Gn Hn and Hn0 G0n Hn The elements of the matrix Hn0 Gn Hn are Hnl0 Gn Hnm , 1 l, m k. For any l, m Hnl0 Gn Hnm ¼ Hnl0 [h(Wn ) h(dn K)]Hnm þ Hnl0 h(dn K)Hnm :
B
5.12 MIXED SPATIAL MODEL
243
Here the first term tends to zero by Eq. (5.19) and boundedness of kHnl k2 : jHnl0 [h(Wn ) h(dn K)]Hnm j ckHnl k2 kWn dn Kk2 kHnm k2 ! 0: For the second term, Lemma 5.6.3 with L ¼ 1 gives Hnl0 h(dn K)Hnm ¼
1 X
1 X
rp
pþ1 Y
lij cni (dn Fi1 , Hnl )l2 (dn Fi pþ1 , Hnm )l2 :
i1 ,...,i pþ1 ¼1 j¼1
p¼0
The series converge uniformly in l, m, n because the scalar and chain products cni are uniformly bounded and jHnl0 h(dn K)Hnm j c
1 X
1 X
jli1 li pþ1 j
i1 ,...,i pþ1 ¼1
p¼0
¼c
1 X
jrjp
(jrjn(K))p n(K) , 1:
p¼0
Besides, there is convergence of the chain products (Lemma 5.6.1) and scalar products [Eq. (5.41)], with an obvious modification for Hnm , so Hnl0 h(dn K)Hnm !
1 X
1 X
rp
p¼0
pþ1 Y
lij c1i (Fi1 , Ml )L2 (Fi pþ1 , Mm )L2
i1 ,...,i pþ1 ¼1 j¼1
(c1i vanishes outside the diagonal i1 ¼ ¼ i pþ1 ) ¼
1 X p¼0
rp
1 X
lipþ1 (Fi , Ml )L2 (Fi , Mm )L2
i¼1
[using Eq. (5.61) and the definition of M] ¼
1 X
h(li )(Fi , Ml )L2 (Fi , Mm )L2 ¼ (M0 hX M)lm :
i¼1
We have proved that Hnl0 Gn Hnm ! (M0 hX M)lm . The second equation in Lemma 5.12.7(ii) follows from the first and Lemma 5.4.4: jHnl0 (Gn G0n )Hnm j kHnl k2 (kGn h(dn K)k2 þ kG0n h(dn K)k2 )kHnm k2 ! 0:
5.12.9 Proving Convergence of Hn0 G0n Gn Hn As in Section 5.12.8, note that Hn0 G0n Gn Hn has Hnl0 G0n Gn Hnm as its elements and Hnl0 G0n Gn Hnm ¼ Hnl0 [G0n Gn h2 (dn K)]Hnm þ Hnl0 h2 (dn K)Hnm :
244
CHAPTER 5
SPATIAL MODELS
The first term on the right is estimated using Lemma 5.4.5 and Eqs. (5.24) and (5.28): jHnl0 [G0n Gn h2 (dn K)]Hnm j kHnl k2 (kG0n h(dn K)k2 kGn k2 þ kh(dn K)k2 kGn h(dn K)k2 )kHnm k2 ckWn dn Kk2 ! 0: By Lemma 5.6.3 the second term can be re-written as Hnl0 h2 (dn K)Hnm ¼
1 X
1 X
r p ( p þ 1)
pþ2 Y
lij cni (dn Fi1 , Hnl )l2 (dn Fi pþ2 , Hnm )l2
i1 ,...,i pþ2 ¼1 j¼1
p¼0
with the series converging uniformly, as in Section 5.12.8. After letting n ! 1 and applying Lemma 5.6.1 and Eqs. (5.41) and (5.47) we obtain Hnl0 h2 (dn K)Hnm !
1 X
1 1 X X i¼1
¼
1 X
pþ2 Y
lij c1i (Fi1 , Ml )L2 (Fi pþ2 , Mm )L2
i1 ,...,i pþ2 ¼1 j¼1
p¼0
¼
1 X
r p ( p þ 1)
! p
r (pþ
1)lipþ2
(Fi , Ml )L2 (Fi , Mm )L2
p¼0
h2 (li )(Fi , Ml )L2 (Fi , Mm )L2 ¼ (M0 h2X M)lm :
i¼1
Thus, for any l, m Hnl0 G0n Gn Hnm ! (M0 h2X M)lm :
5.13 THE ROUNDABOUT ROAD (MIXED MODEL) The idea here generalizes Section 5.7.1. Before explaining it I report an unsuccessful attempt to realize a customary way of proving convergence of the OLS estimator. By writing 1 V1 n zn ¼ [EVn þ (Vn EVn )] zn
we see that convergence of V1 n zn will take place if 1. zn converges in distribution, 2. EVn converges to a nonsingular deterministic matrix, 3. Vn EVn converges in probability to a null matrix. I failed to implement steps (2) and (3) because Vn converges in distribution to a stochastic matrix.
5.13 THE ROUNDABOUT ROAD (MIXED MODEL)
245
5.13.1 Definition of the Proxies (Mixed Model) For any natural n, L denote 0
UnL
1 Hn0 vn B (d n F 1 )0 v n C C ¼B @ A (dn FL )0 vn
(5:108)
a random vector with k þ L components. The previous UnL [from Eq. (5.53)] represents the final two components of this one. Now we define the proXy by 0 1 Hn0 vn P B L h(l )(M, F ) U C i i L2 nL,kþi C B i¼1 B PL 2 C h (li )(M, Fi )L2 UnL,kþi C XnL ¼ B (5:109) B i¼1P C: L B C 2 @ A i¼1 h(li )UnL,kþi PL 2 2 i¼1 h (li )UnL,kþi Its last two components give the old XnL from Section 5.7.1. The limiting behavior of XnL , as n ! 1, is described in terms of the projy 0 P1 1 i¼1 (M, Fi )L2 ui B PL h(l )(M, F ) u C i i L2 i C B i¼1 B PL 2 C B jL ¼ jsbc jB i¼1 h (li )(M, Fi )L2 ui C (5:110) C, 1 L , 1, PL B 2 C @ jsbc j i¼1 h(li )ui A P jsbc j Li¼1 h2 (li )u2i where u1 , u2 , . . . are independent standard normal. Once again, the final two components of this projy give the earlier projy. Both proxies carry the essential information about the auxiliary vector from the next section.
5.13.2 Auxiliary Vector, Alphas, Betas and Gammas All random components contained in the numerator zn [Eq. (5.102)] and denominator Vn [Eq. (5.103)] are dumped into one auxiliary vector (now it is not a pair) 0 1 0 1 Hn0 vn An1 B An2 C B Hn0 Gn vn C B C B 0 0 C C B C An ¼ B n Gn vn C: B An3 C ¼ B Hn G @ An4 A @ v0n G0n vn A v0n G0n Gn vn An5 Hn0 G0n Vn (which is a part of zn ) is not included because plim (Hn0 Gn Vn Hn0 G0n Vn ) ¼ 0 (see Lemma 5.13.6). The first three components of An are k 1 and linear in vn , whereas the final two are (scalar) quadratic forms of vn . The ordering of components
246
CHAPTER 5
SPATIAL MODELS
of An does not matter. The final two components of An coincide with P n from Section 5.7.1. Therefore in the representation An ¼ an þ bnL þ gnL þ XnL
(5:111)
the terms at the right will have the final two components equal to the corresponding vectors from Section 5.7.1. Naturally, the vectors at the right-hand side of Eq. (5.111) have blocks conformable with those of An . XnL is defined in Eq. (5.109) and represents the main part of An . The other three vectors are defined by 0
0
1
B H 0 [G h(d K)]v C B n n n C B 0 n0 C 2 C H [G G h ( d K)]v an ¼ B n n C, B n n n B C 0 0 @ vn [Gn h(dn K)]vn A v0n [G0n Gn h2 (dn K)]vn 0 1 0 B H 0 [h(d K) h(d K )]v C B n n n L n C B 0 2 C 2 B bnL ¼ B Hn [h (dn K) h (dn KL )]vn C C B 0 C @ vn [h(dn K) h(dn KL )]vn A v0n [h2 (dn K) h2 (dn KL )]vn and 0
gnL
1 0 1 0 0 B Hn0 h(dn KL )vn C B XnL2 C B 0 2 C B C C B C ¼B B Hn0 h (dn KL )vn C B XnL3 C: @ vn h(dn KL )vn A @ XnL4 A v0n h2 (dn KL )vn XnL5
(5:112)
In an , bnL and gnL the first blocks are null because the CLT for weighted sums of linear processes from Chapter 3 is directly applicable to the first block of An . anj , j ¼ 1, . . . , 5, mean the five blocks in an , not its scalar coordinates. A similar convention applies to all other vectors defined here. The proof of convergence of An consists of three steps. Step 1. an , bnL , gnL are negligible in some sense. For their final two components this is shown in Sections 5.7.3 and 5.7.5. For the first three we show it in Lemmas 5.13.6 – 5.13.8. Step 2. The limit of XnL , as n ! 1, is jL , which, in turn, converges to J defined in Eq. (5.115). Step 3. Apply Billingsley’s theorem on a double limit and the CLT.
5.13 THE ROUNDABOUT ROAD (MIXED MODEL)
247
5.13.3 Convergence of the proXy Lemma.
If X
jh(li )j , 1,
(5:113)
i1
then dlim XnL ¼ jL for all natural L. n !1
Proof. In Theorem 3.5.2 put Wn ¼ (Hn , dn F1 , . . . , dn FL ): By that theorem UnL ¼ Wn0 vn converges in distribution to a normal vector with zero mean and variance equal to (sbc )2 multiplied by the Gram matrix G of the system fM1 , . . . , Mk , F1 , . . . , FL g: d
UnL ! N(0, (sbc )2 G):
(5:114)
Denoting F (L) ¼ (F1 , . . . , FL )0 and applying the usual vector operations we write G in the form
G¼
(M, M 0 )L2
(M, F (L)0 )L2
!
(F (L) , M 0 )L2 (F (L) , F (L)0 )L2 0 P1 0 (M, F1 )L2 i¼1 (M, Fi )L2 (M , Fi )L2 B 0 (M , F1 )L2 1 B ¼B @ ... ... 0 (M 0 , FL )L2
...
(M, FL )L2
...
0
... ...
... 1
1 C C C: A
Here the upper left block is the Parseval identity. The lower right block is the result of orthonormality of F1 , . . . , FL : If we take a sequence of independent standard normals and define UL by 0 P1 B UL ¼ jsbc jB @
i¼1
1 (M, Fi )L2 ui C u1 C, A ... uL
then it will be normal, have zero mean and variance EUL UL0 ¼ (sbc )2
V11 V21
0 V21 , V22
248
CHAPTER 5
SPATIAL MODELS
where V11 ¼
X
(M, Fi )L2 (M 0 , Fj )L2 Eui uj ¼ (M, M 0 )L2 ,
i,j
0P B V21 ¼ @
V22
(M 0 , Fi )L2 Eu1 ui
1
0
(M 0 , F1 )L2
1
C B C ... ... A¼@ A, 0 0 (M , Fi )L2 EuL ui (M , FL )L2 0 1 . . . Eu1 uL Eu21 B C ¼ @ ... ... . . . A ¼ I: EuL u1 . . . Eu2L P
d
Hence, EUL UL0 ¼ (sbc )2 G and, in consequence, UnL ! UL as n ! 1: XnL , being a continuous function of UnL , converges in distribution to the same function of UL . When comparing the expressions of XnL and its limit in distribution jL , keep in mind two corollaries of Eq. (5.114). First, the relationship d
Hn0 vn ! N(0, (sbc )2 (M, M 0 )L2 ) is equivalent to d
Hn0 vn ! jsbc j
1 X
(M, Fi )L2 ui
i¼1
and, second, d
d
2 UnL,kþi ! jsbc jui , UnL,kþi ! (sbc )2 u2i :
B
5.13.4 Convergence of the Projy Denote 0
0 1 1 M0 u J1 0 1 B J2 C B M0 hx u C u1 B B C C 0 2 @ M h u B C C J¼B sb j u2 A ¼ j , u ¼ J x c B B 3C C @ J4 A @ jsbc ju0 hx u A ... J5 jsbc ju0 h2x u
(5:115)
(see Sections 5.12.4 and 5.12.6 for the definitions of the eXtended M and genie). Note that J1 , J2 , J3 are linear in standard normal variables and J4 , J5 are quadratic. Lemma (i) L1 -limjL ¼ J, (ii) components of J converge in L2 :
5.13 THE ROUNDABOUT ROAD (MIXED MODEL)
249
Proof. (i) This part follows from the fact that Eq. (5.113) implies that Eu2i ¼ 1:
P
i1
h2 (li ) , 1 and
(ii) This proposition is equivalently expressed by saying that the vectors 0 1 (M, Fi )L2 ui h(l )(M, Fi )L2 ui C L B X B 2 i C B h (li )(M, Fi )L2 ui C JL ¼ jsbc j (5:116) B 2 C A i¼1 @ jsbc jh(li )ui jsbc jh2 (li )u2i converge to J in L2 . We can’t be sure that M0 u belongs to l2 for every point in the sample space V. However, components of M0 u converge in L2 : X
E
(X Ml )2i u2i ¼ kX Ml k22 ¼ kMl k22 , 1:
i
For the fourth component, by independence of u2i E
X
!2 jh(li )u2i j
¼
i
X
jh(li )h(lj )jEu2i u2j
¼
i,j
X
!2 jh(li )jEu2i
, 1:
i
The second, third and fifth components of J converge even faster because of B multiplication by h(li ) and h2 (li ):
5.13.5 Bounding Linear Forms in v n Lemma.
For any n n matrix A (EkHn0 Avn k2 )
1=2
ckHn k2 kAk2 :
Proof. Use the partition of Hn into its columns: EkHn0 Avn k22 ¼ E
k X
(Hnl0 Avn )2 ¼
l¼1
k X
Ev0n A0 Hnl Hnl0 Avn
l¼1
(apply Ho¨lder’s inequality and Lemma 3.9.9)
k X
[E(v0n A0 Hnl Hnl0 Avn )2 ]1=2 ¼
k X
l¼1
c
k X
l¼1
kA0 Hnl k2 kHnl0 Ak2 c
l¼1
¼
g(A0 Hnl Hnl0 A)
ckHn k22 kAn k22 :
k X
kAk22 kHnl kk22
l¼1
B
250
CHAPTER 5
SPATIAL MODELS
5.13.6 Estimating Alphas

Lemma. If $|\rho|\bigl(\sum_{j=1}^{\infty}\lambda_j^2\bigr)^{1/2} < 1$, then
$$
\lim E\|\alpha_{nj}\|_2^2 = 0, \quad j = 1, \ldots, 5, \qquad (5.117)
$$
and
$$
\operatorname*{plim}_{n\to\infty}\bigl(H_n'G_n v_n - H_n'G_n' v_n\bigr) = 0. \qquad (5.118)
$$

Proof. For $j = 4, 5$, Eq. (5.117) is proved in Section 5.7.3. The case $j = 1$ is trivial, since $\alpha_{n1} = 0$. By Assumption 5.4 (Section 5.12.1.1) and Theorem 2.5.3,
$$
\lim_{n\to\infty} \|H_n\|_2^2 = \sum_{l=1}^{k} \|M_l\|_2^2. \qquad (5.119)
$$
The assumptions of this lemma allow us to use Lemma 5.4.4. By Lemma 5.13.5,
$$
\bigl(E\|\alpha_{n2}\|_2^2\bigr)^{1/2} = \bigl(E\|H_n'[G_n - h(d_nK)]v_n\|_2^2\bigr)^{1/2} \le c\|H_n\|_2 \|G_n - h(d_nK)\|_2 \to 0. \qquad (5.120)
$$
Similarly, by Lemmas 5.4.5 and 5.13.5 and bounds (5.24) and (5.28),
$$
\bigl(E\|\alpha_{n3}\|_2^2\bigr)^{1/2} = \bigl(E\|H_n'[G_n'G_n - h^2(d_nK)]v_n\|_2^2\bigr)^{1/2}
\le c\|H_n\|_2\bigl[\|G_n - h(d_nK)\|_2\|G_n\|_2 + \|h(d_nK)\|_2\|G_n - h(d_nK)\|_2\bigr] \to 0.
$$
Replacing $G_n$ by $G_n'$ in Eq. (5.120),
$$
\bigl(E\|H_n'[G_n' - h(d_nK)]v_n\|_2^2\bigr)^{1/2} \le c\|H_n\|_2\|G_n - h(d_nK)\|_2 \to 0.
$$
This equation and Eq. (5.120) imply Eq. (5.118):
$$
\bigl(E\|H_n'(G_n - G_n')v_n\|_2^2\bigr)^{1/2} \le \bigl(E\|H_n'[G_n - h(d_nK)]v_n\|_2^2\bigr)^{1/2} + \bigl(E\|H_n'[h(d_nK) - G_n']v_n\|_2^2\bigr)^{1/2} \to 0.
$$
∎
5.13.7 Estimating Betas

Lemma. If $|\rho|\nu(K) < 1$, then
$$
\bigl(E\|\beta_{nLj}\|_2^2\bigr)^{1/2} \le c \sum_{i>L} |\lambda_i|, \quad j = 1, \ldots, 5,
$$
where $c$ does not depend on $n, L$.
Proof. As in Lemma 5.13.6, we need to consider only $\beta_{nL2}$ and $\beta_{nL3}$. By Lemmas 5.13.5 and 5.5.3 and Eq. (5.119),
$$
\bigl(E\|\beta_{nL2}\|_2^2\bigr)^{1/2} = \bigl\{E\|H_n'[h(d_nK) - h(d_nK_L)]v_n\|_2^2\bigr\}^{1/2} \le c\|H_n\|_2\|h(d_nK) - h(d_nK_L)\|_2 \le c_1 \sum_{i>L} |\lambda_i|.
$$
For $\beta_{nL3}$ we also use Eq. (5.40):
$$
\bigl(E\|\beta_{nL3}\|_2^2\bigr)^{1/2} = \bigl(E\|H_n'[h^2(d_nK) - h^2(d_nK_L)]v_n\|_2^2\bigr)^{1/2} \le c\|h^2(d_nK) - h^2(d_nK_L)\|_2
\le c\bigl(\|h(d_nK)\|_2 + \|h(d_nK_L)\|_2\bigr)\|h(d_nK) - h(d_nK_L)\|_2 \le c_1 \sum_{i>L} |\lambda_i|.
$$
∎
5.13.8 Estimating Gammas

Lemma. If $|\rho|\nu(K) < 1$, then for any positive (small) $\varepsilon$ and (large) $L$ there exists $n_0 = n_0(\varepsilon, L)$ such that $E\|\gamma_{nLj}\|_1 \le c\varepsilon$ for all $n \ge n_0$, $j = 1, \ldots, 5$, where $c$ does not depend on $n$ and $L$.

Proof. Recall definitions (5.109) and (5.112). We first need to consider
$$
\gamma_{nL2} = H_n'h(d_nK_L)v_n - \sum_{i=1}^{L} h(\lambda_i)(M, F_i)_{L_2} U_{nL,k+i}. \qquad (5.121)
$$
By the Double A Lemma 5.6.3 we write the $l$th component of $H_n'h(d_nK_L)v_n$ as
$$
H_{nl}'h(d_nK_L)v_n = \sum_{p=0}^{\infty} \rho^p \sum_{i_1,\ldots,i_{p+1}\le L} \prod_{j=1}^{p+1}\lambda_{i_j}\, c_{ni} \sum_{s,t=1}^{n} (d_nF_{i_1})_s (H_{nl})_s (d_nF_{i_{p+1}})_t v_{nt}
= \sum_{p=0}^{\infty} \rho^p \sum_{i_1,\ldots,i_{p+1}\le L} \prod_{j=1}^{p+1}\lambda_{i_j}\, c_{ni}\, (H_{nl}, d_nF_{i_1})_{l_2} U_{nL,k+i_{p+1}}. \qquad (5.122)
$$
However, the $l$th component of the sum in Eq. (5.121) is
$$
\sum_{i=1}^{L} h(\lambda_i)(M_l, F_i)_{L_2} U_{nL,k+i}
= \sum_{p=0}^{\infty} \rho^p \sum_{i=1}^{L} \lambda_i^{p+1}(M_l, F_i)_{L_2} U_{nL,k+i}
= \sum_{p=0}^{\infty} \rho^p \sum_{i_1,\ldots,i_{p+1}\le L} \prod_{j=1}^{p+1}\lambda_{i_j}\, c_{\infty i}\,(M_l, F_{i_1})_{L_2} U_{nL,k+i_{p+1}} \qquad (5.123)
$$
(in the final sum all those terms with unequal $i_1, \ldots, i_{p+1}$ actually vanish). The last two equations give the following expression for the $l$th component of $\gamma_{nL2}$:
$$
(\gamma_{nL2})_l = \sum_{p=0}^{\infty} \rho^p \sum_{i_1,\ldots,i_{p+1}\le L} \prod_{j=1}^{p+1}\lambda_{i_j}\,\bigl[c_{ni}(H_{nl}, d_nF_{i_1})_{l_2} - c_{\infty i}(M_l, F_{i_1})_{L_2}\bigr] U_{nL,k+i_{p+1}}. \qquad (5.124)
$$
By Theorem 2.5.3, $(H_{nl}, d_nF_{i_1})_{l_2} \to (M_l, F_{i_1})_{L_2}$ and, by Lemma 5.6.1, $c_{ni} \to c_{\infty i}$ as $n \to \infty$. Hence, for any $\varepsilon, L > 0$ there exists $n_0 = n_0(\varepsilon, L)$ such that
$$
\bigl|c_{ni}(H_{nl}, d_nF_{i_1})_{l_2} - c_{\infty i}(M_l, F_{i_1})_{L_2}\bigr| < \varepsilon, \quad n \ge n_0, \qquad (5.125)
$$
for all $i$ that appear in $(\gamma_{nL2})_l$. Besides, similarly to Eq. (5.63),
$$
E|U_{nL,k+i_{p+1}}| \le \bigl(E|U_{nL,k+i_{p+1}}|^2\bigr)^{1/2} = \bigl[E\bigl(v_n'd_nF_{i_{p+1}}(d_nF_{i_{p+1}})'v_n\bigr)\bigr]^{1/2}
\le \bigl\{E\bigl[v_n'd_nF_{i_{p+1}}(d_nF_{i_{p+1}})'v_n\bigr]^2\bigr\}^{1/4} = \gamma\bigl(d_nF_{i_{p+1}}(d_nF_{i_{p+1}})'\bigr)^{1/2} \le c\|d_nF_{i_{p+1}}\|_2 \le c. \qquad (5.126)
$$
The result of Eqs. (5.124), (5.125) and (5.126) is the desired estimate for $(\gamma_{nL2})_l$:
$$
E\|\gamma_{nL2}\|_1 \le \sum_{l=1}^{k} E|(\gamma_{nL2})_l| \le c_1\varepsilon \sum_{p=0}^{\infty} |\rho|^p \sum_{i_1,\ldots,i_{p+1}\le L} \prod_{j=1}^{p+1}|\lambda_{i_j}|
\le c_1\varepsilon \sum_{p=0}^{\infty} \bigl(|\rho|\nu(K)\bigr)^p \nu(K) = c_1\varepsilon.
$$
The definitions from Sections 5.13.1 and 5.13.2 imply
$$
\gamma_{nL3} = H_n'h^2(d_nK_L)v_n - \sum_{i=1}^{L} h^2(\lambda_i)(M, F_i)_{L_2} U_{nL,k+i}.
$$
By Lemma 5.6.3,
$$
H_{nl}'h^2(d_nK_L)v_n = \sum_{m=0}^{\infty} \rho^m (m+1) \sum_{i_1,\ldots,i_{m+2}\le L} \prod_{j=1}^{m+2}\lambda_{i_j}\, c_{ni}\,(H_{nl}, d_nF_{i_1})_{l_2} U_{nL,k+i_{m+2}}
$$
because $v_n'd_nF_{i_{m+2}} = U_{nL,k+i_{m+2}}$. The next calculation is analogous to that for (5.123):
$$
\sum_{i=1}^{L} h^2(\lambda_i)(M_l, F_i)_{L_2} U_{nL,k+i}
= \sum_{m=0}^{\infty} \rho^m (m+1) \sum_{i=1}^{L} \lambda_i^{m+2}(M_l, F_i)_{L_2} U_{nL,k+i}
= \sum_{m=0}^{\infty} \rho^m (m+1) \sum_{i_1,\ldots,i_{m+2}\le L} \prod_{j=1}^{m+2}\lambda_{i_j}\, c_{\infty i}\,(M_l, F_{i_1})_{L_2} U_{nL,k+i_{m+2}}.
$$
For the $l$th component of $\gamma_{nL3}$ we get
$$
(\gamma_{nL3})_l = \sum_{m=0}^{\infty} \rho^m (m+1) \sum_{i_1,\ldots,i_{m+2}\le L} \prod_{j=1}^{m+2}\lambda_{i_j}\,\bigl[c_{ni}(H_{nl}, d_nF_{i_1})_{l_2} - c_{\infty i}(M_l, F_{i_1})_{L_2}\bigr] U_{nL,k+i_{m+2}}.
$$
The rest of the proof is exactly the same as that for $\gamma_{nL2}$. ∎
5.13.9 Convergence of the Auxiliary Vector

Lemma. If $|\rho|\nu(K) < 1$, then $\operatorname{dlim} A_n = J$.

Proof. The condition of the lemma implies $|\rho|\bigl(\sum_{j=1}^{\infty}\lambda_j^2\bigr)^{1/2} < 1$. By the Chebyshov inequality and Lemmas 5.7.3 and 5.13.6, $\operatorname{plim} \alpha_n = 0$. From Lemmas 5.7.3 and 5.13.7 we infer that
$$
P\bigl(\|\beta_{nL}\|_1 > \varepsilon\bigr) \le \frac{1}{\varepsilon} E\|\beta_{nL}\|_1 \le \frac{c}{\varepsilon} E\|\beta_{nL}\|_2 \le \frac{c}{\varepsilon}\bigl(E\|\beta_{nL}\|_2^2\bigr)^{1/2} \le \frac{c}{\varepsilon}\sum_{i>L}|\lambda_i|,
$$
where $c$ does not depend on $\varepsilon, n, L$. Lemmas 5.7.5 and 5.13.8 show that $\operatorname{plim}_{n\to\infty}\gamma_{nL} = 0$ for any fixed $L$.
The facts we have just listed and Eq. (5.111) ensure that for any fixed $L$
$$
\limsup_{n\to\infty} P\bigl(\|A_n - X_{nL}\|_1 > \varepsilon\bigr) \le \frac{c}{\varepsilon}\sum_{i>L}|\lambda_i|.
$$
Equivalence (5.71) allows us to use Lemmas 5.13.3 and 5.13.4. Application of Billingsley's theorem (Theorem 3.9.5) completes the proof. ∎
5.14 ASYMPTOTICS OF THE OLS ESTIMATOR FOR MIXED SPATIAL MODEL

5.14.1 Crucial Choice and Assumption

From Lemmas 5.12.7 and 5.13.9 we know that all elements in Eqs. (5.102) and (5.103) converge, except for $b_n$ and $1/d_n$. It is tempting to leave the choice of $d_n$ to the user and make an assumption such as that below.

5.14.1.1 Tentative Assumption  The sequence $\{d_n\}$ of positive constants is such that the limits $\lim 1/d_n$ and $B = \lim b_n$ exist.

I don't like such a "thing in itself" and prefer $d_n$ governed by the elements of the model. However, the issue is not just a matter of taste. With this tentative assumption it is easy to run into problems. The simplest choice is to select $d_n$ growing quickly, so as to make both $\lim 1/d_n$ and $B$ zero. In that case the inverse of
$$
\operatorname{dlim} V_n = \begin{pmatrix} \lim H_n'H_n & 0 \\ 0 & 0 \end{pmatrix} = V
$$
does not exist. Three requirements on the choice of $d_n$ make sense:

1. It should be such that the existence of the limits $\lim 1/d_n$ and $B$ is a plausible but not very restrictive condition.
2. The cases $\lim 1/d_n = 0$ and $B = 0$ should be mutually exclusive; otherwise $|V| = 0$.
3. The components of the numerator and denominator due to Submodels 1 and 2 (Section 5.12.1) should be retained in the mixed model.

5.14.1.2 Definition
$$
d_n = \max\{\|X_{n1}\|_2 |\beta_1|, \ldots, \|X_{nk}\|_2 |\beta_k|, 1\}.
$$
The idea here is the same as that with the time series autoregressive model of Mynbaev (2006a). The normalizer for Submodel 1 is identically 1, as we know from Theorem 5.8.1. The autoregressive term in the mixed model is subordinate to the exogenous term in the following sense. If the exogenous regressors are bounded, normalizing the autoregressive term in the estimator by unity is sufficient. If the
exogenous regressors grow quickly, they pull the dependent variable to the extent that the autoregressive term should be normalized more heavily. $d_n$ is equivalent to $\tilde d_n = \sum_{j=1}^{k} |\beta_j|\|X_{nj}\|_2 + 1$: indeed, $d_n \le \tilde d_n \le (k+1)d_n$. Therefore $\lim d_n = \infty$ is equivalent to
$$
\lim \sum_{j=1}^{k} |\beta_j| \|X_{nj}\|_2 = \infty. \qquad (5.127)
$$
Obviously, $d_n \ge 1$ and $|\beta_j|\|X_{nj}\|_2/d_n \le 1$. The vector $b_n$ with coordinates $b_{nj} = \beta_j \|X_{nj}\|_2 / d_n \in [-1, 1]$, $j = 1, \ldots, k$, is called a balancer (Section 5.12.2).

5.14.1.3 Assumption 5.5  The limits
$$
d = \lim d_n \in [1, \infty] \quad\text{and}\quad B_j = \lim b_{nj} \in [-1, 1], \quad j = 1, \ldots, k,
$$
exist.

Lemma 5.14.2 partially answers the question of what this assumption means in terms of the regressors and $\beta$.
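Before turning to the balancer's properties, here is a minimal numerical sketch (Python; the regressors, coefficients and helper name are hypothetical illustrations, not part of the original text) of the quantities just introduced: the normalizer $d_n$, its equivalent $\tilde d_n$, and the balancer $b_n$.

```python
import numpy as np

def normalizer_and_balancer(X, beta):
    """Compute d_n = max(||X_j||_2 |beta_j|, 1) and b_nj = beta_j ||X_j||_2 / d_n.

    X is n x k (exogenous regressors); beta is the k-vector of coefficients.
    """
    norms = np.linalg.norm(X, axis=0)                  # ||X_{nj}||_2, j = 1..k
    d_n = max(np.max(norms * np.abs(beta)), 1.0)
    b_n = beta * norms / d_n                           # balancer, each component in [-1, 1]
    d_tilde = np.sum(np.abs(beta) * norms) + 1.0
    assert d_n <= d_tilde <= (len(beta) + 1) * d_n     # the equivalence d_n <= d~_n <= (k+1) d_n
    return d_n, b_n

# Hypothetical example: a bounded regressor (constant) and a growing one (scaled trend)
n = 1000
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / np.sqrt(n)])
beta = np.array([1.0, 0.5])
d_n, b_n = normalizer_and_balancer(X, beta)
print(d_n, b_n)   # the trend dominates: d_n grows and the balancer loads on that coordinate
```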
5.14.2 Balancer Properties

Lemma. If Assumption 5.5 holds, then the following is true.

(i) If $\beta_j = 0$, then $X_{nj}$ is arbitrary and $B_j = 0$.

(ii) Let $\beta_j \ne 0$. Then (iia) $B_j = 0$ is equivalent to $\|X_{nj}\|_2 = o(d_n)$; (iib) $B_j \ne 0$ is equivalent to $\|X_{nj}\|_2/d_n \to c_j > 0$.

(iii) The conditions
$$
\max_j |B_j| < 1 \quad\text{and}\quad d > 1 \qquad (5.128)
$$
are mutually exclusive. In particular, the conditions $B = 0$ and $d = \infty$ are mutually exclusive.

(iv) $B = 0$ if and only if either (iva) $\beta = 0$ or (ivb) $\beta \ne 0$ and $\lim_{n\to\infty}\|X_{nj}\|_2 = 0$ for any $j$ such that $\beta_j \ne 0$. In either case $d_n = 1$ for all large $n$ and $d = 1$.
Proof. (i) is obvious.

(ii) If $\beta_j \ne 0$, then $\|X_{nj}\|_2 = b_{nj} d_n / \beta_j$. This equation implies (iia) and (iib).

(iii) Suppose that Eq. (5.128) is true and denote $\varepsilon = 1 - \max_i |B_i|$. $d > 1$ implies
$$
d_n = \max\{\|X_{n1}\|_2|\beta_1|, \ldots, \|X_{nk}\|_2|\beta_k|\} > 1 \quad\text{for all large } n. \qquad (5.129)
$$
$\max_j |B_j| < 1$ implies
$$
|b_{nj}| = \|X_{nj}\|_2|\beta_j|/d_n \le 1 - \varepsilon/2 \quad\text{for all large } n. \qquad (5.130)
$$
Equations (5.129) and (5.130) lead to a contradiction: $d_n \le (1 - \varepsilon/2)d_n$.

(iv) Let $B = 0$. If $\beta = 0$, there is nothing to prove. If $\beta \ne 0$, then consider any $j$ such that $\beta_j \ne 0$. By (iia), for any such $j$ we have $\|X_{nj}\|_2 = o(d_n)$. By (iii), the assumption $B = 0$ excludes the possibility $d = \infty$. Hence, $d < \infty$ and $\|X_{nj}\|_2 = o(d_n)$ is equivalent to $\|X_{nj}\|_2 = o(1)$. Since this is true for any $j$ with $\beta_j \ne 0$, we have $d_n = 1$ for all large $n$ and, consequently, $d = 1$. We have proved (ivb). Conversely, if (iva) is true, then trivially $B = 0$. If (ivb) is true, then $d_n = 1$ for all large $n$ and $b_{nj} = \|X_{nj}\|_2\beta_j \to 0$ for any $j$ such that $\beta_j \ne 0$. Hence, $B = 0$. ∎
5.14.3 Convergence of the Pair $(V_n, z_n)$

It is useful to summarize what we can say about convergence of the elements of the conventional scheme, to see the problem with convergence of the OLS estimator $\hat\theta$.

Lemma. Under Assumptions 5.1–5.5 and $|\rho|\nu(K) < 1$ we have $\operatorname{dlim}(V_n, z_n) = (V, z)$, where
$$
z = \begin{pmatrix} J_1 \\ B'J_2 + \frac{1}{d}J_4 \end{pmatrix}, \qquad (5.131)
$$
$$
V = \begin{pmatrix} G_0 & G_1B + \frac{1}{d}J_2 \\ B'G_1 + \frac{1}{d}J_2' & B'G_2B + \frac{2}{d}B'J_3 + \frac{1}{d^2}J_5 \end{pmatrix}; \qquad (5.132)
$$
the matrices $G_j$ are defined in Eq. (5.107) and the vector $J$ in Eq. (5.115).

Proof. The expressions for the $G_j$ are established in Lemma 5.12.7. Convergence in distribution of the auxiliary vector of random components of $V_n$ and $z_n$ is supplied by Lemma 5.13.9. The expressions for $V$ and $z$ are obtained by replacing various parts in formulas (5.103) and (5.102) by their respective limits from Lemma 5.12.7,
Eq. (5.115) and Assumption 5.5 (Section 5.14.1.3). The correspondence between the components of $A_n$ and $J$ is seen by comparison of Eqs. (5.111) and (5.115). ∎

The problem with Eq. (5.132) is that it is stochastic and contains both linear ($J_2$ and $J_3$) and quadratic ($J_5$) forms in standard normals.
5.14.4 Multicollinearity Detector

It is curious that infinite-dimensional matrices are helpful in the study of finite-dimensional matrices. Denote
$$
F = \Bigl\| Q h_{\mathcal X}\Bigl(MB + \frac{|\sigma b_c|}{d}\, u\Bigr)\Bigr\|_{l_2}^2, \qquad\text{where } u = (u_1, u_2, \ldots)'. \qquad (5.133)
$$
Due to the lemma below, $|V| > 0$ a.e. $\Longleftrightarrow$ $F > 0$ a.e. This is why we call $F$ a multicollinearity detector (the notation comes from Finesse).

Lemma

(i) $F = L_1\text{-}\lim F_L$, where
$$
F_L = \Bigl\| Q h_{\mathcal X}\Bigl(MB + \frac{|\sigma b_c|}{d}\, u^{(L)}\Bigr)\Bigr\|_{l_2}^2, \qquad u^{(L)} = (u_1, u_2, \ldots, u_L, 0, \ldots)'.
$$

(ii) $|V| = |G_0|\, F$.

Proof. We start with part (ii), which will help us prove (i). In the rule for the determinant of a partitioned matrix [Lemma 1.7.6(i)] put $V_{11} = G_0$. Then the second determinant on the right of Lemma 1.7.6(i) is just a number [see Eq. (5.132)]:
$$
\text{number} = V_{22} - V_{21}V_{11}^{-1}V_{12}
= B'G_2B + \frac{2}{d}B'J_3 + \frac{1}{d^2}J_5 - \Bigl(B'G_1 + \frac{1}{d}J_2'\Bigr)G_0^{-1}\Bigl(G_1B + \frac{1}{d}J_2\Bigr)
$$
$$
= B'G_2B + \frac{2}{d}B'J_3 + \frac{1}{d^2}J_5 - B'G_1G_0^{-1}G_1B - \frac{1}{d}B'G_1G_0^{-1}J_2 - \frac{1}{d}J_2'G_0^{-1}G_1B - \frac{1}{d^2}J_2'G_0^{-1}J_2.
$$
Collecting similar terms,
$$
\text{number} = B'(G_2 - G_1G_0^{-1}G_1)B + \frac{1}{d^2}(J_5 - J_2'G_0^{-1}J_2) + \frac{2}{d}B'(J_3 - G_1G_0^{-1}J_2).
$$
Now replace the matrices $G_j$ by their expressions from Eq. (5.107) and recall the definitions of the $J_i$ given in Eq. (5.115):
$$
\text{number} = B'\bigl[M'h_{\mathcal X}^2M - M'h_{\mathcal X}M(M'M)^{-1}M'h_{\mathcal X}M\bigr]B
+ \frac{(\sigma b_c)^2}{d^2}\bigl[u'h_{\mathcal X}^2u - (M'h_{\mathcal X}u)'(M'M)^{-1}M'h_{\mathcal X}u\bigr]
+ \frac{2|\sigma b_c|}{d}B'\bigl[M'h_{\mathcal X}^2u - M'h_{\mathcal X}M(M'M)^{-1}M'h_{\mathcal X}u\bigr]. \qquad (5.134)
$$
Next, employ the principal projectors of Lemma 5.12.5:
$$
\text{number} = (h_{\mathcal X}MB)'(I - P)h_{\mathcal X}MB + \frac{(\sigma b_c)^2}{d^2}(h_{\mathcal X}u)'(I - P)h_{\mathcal X}u + \frac{2|\sigma b_c|}{d}(h_{\mathcal X}MB)'(I - P)h_{\mathcal X}u
$$
$$
= (h_{\mathcal X}MB)'Q^2h_{\mathcal X}MB + \frac{(\sigma b_c)^2}{d^2}(h_{\mathcal X}u)'Q^2h_{\mathcal X}u + \frac{|\sigma b_c|}{d}(h_{\mathcal X}MB)'Q^2h_{\mathcal X}u + \frac{|\sigma b_c|}{d}(h_{\mathcal X}u)'Q^2h_{\mathcal X}MB
= \Bigl\| Q h_{\mathcal X}\Bigl(MB + \frac{|\sigma b_c|}{d}u\Bigr)\Bigr\|_{l_2}^2.
$$
Thus,
$$
V_{22} - V_{21}V_{11}^{-1}V_{12} = F. \qquad (5.135)
$$
Equation (5.134) is particularly convenient for seeing that statement (i) is true. It can also be deduced from Lemma 5.13.4(ii). ∎
5.14.5 Multicollinearity Detector and Exogenous Regressors Domination

From Eq. (5.127) we know that $d = \infty$ only when at least one of the exogenous regressors grows to infinity. In this case we say that the exogenous regressors dominate and, by Lemma 5.14.2(iii), $B \ne 0$. If, however, $B = 0$, then by Lemma 5.14.2(iv) all exogenous regressors are negligible in some sense and $d = 1$. In this situation it is natural to say that the autoregressive term dominates.
Lemma. If the exogenous regressors dominate, then $F$ is the squared distance from the image of $B'M$ under the mapping $h(K)$ to the linear span $\mathcal M$ of the functions $M_1, \ldots, M_k$:
$$
F = \operatorname{dist}^2\bigl(h(K)B'M, \mathcal M\bigr).
$$

Proof. The image of $P$ is characterized in Lemma 5.12.5. It equals $\mathcal X\mathcal M$. $Q$ projects onto the subspace of $l_2$ orthogonal to $\mathcal X\mathcal M$, and $\|Qx\|_2^2$ is the squared distance from $x$ to $\mathcal X\mathcal M$. Thus, putting $d = \infty$ in Eq. (5.133) we have
$$
F = \|Qh_{\mathcal X}MB\|_{l_2}^2 = \operatorname{dist}^2(h_{\mathcal X}MB, \mathcal X\mathcal M). \qquad (5.136)
$$
Using the main property of the genie, Eq. (5.106), we have
$$
h_{\mathcal X}MB = h_{\mathcal X}\sum_{l=1}^{k} B_l\,\mathcal X M_l = h_{\mathcal X}\mathcal X B'M = \mathcal X h(K)B'M. \qquad (5.137)
$$
Since $\mathcal X$ is an isomorphism, Eqs. (5.136) and (5.137) imply
$$
F = \operatorname{dist}^2\bigl(\mathcal X h(K)B'M, \mathcal X\mathcal M\bigr) = \operatorname{dist}^2\bigl(h(K)B'M, \mathcal M\bigr). 
$$
∎
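Since the detector in this case is an ordinary $L_2$ projection residual, it can be approximated numerically once the functions are discretized on a grid. The sketch below (Python; the grid, quadrature weights and test functions are hypothetical choices, not taken from the book) computes $\operatorname{dist}^2(g, \mathcal M)$ for a target function $g$ and the span of $M_1, \ldots, M_k$ by weighted least squares.

```python
import numpy as np

def squared_distance_to_span(g, M, w):
    """dist^2(g, span{M_1,...,M_k}) in L_2(0,1), approximated on a grid.

    g: grid values of the target function; M: columns hold grid values of M_1..M_k;
    w: quadrature weights (1/N for a uniform grid).
    """
    A = M.T @ (M * w[:, None])        # Gram matrix of (M_i, M_j)
    b = M.T @ (w * g)                 # inner products (g, M_i)
    coef = np.linalg.solve(A, b)      # L_2 projection coefficients
    resid = g - M @ coef
    return float(np.sum(w * resid ** 2))

# Hypothetical example: M_1 = 1, M_2 = t and target g = t^2 on [0, 1]
N = 10_000
t = (np.arange(N) + 0.5) / N
w = np.full(N, 1.0 / N)
M = np.column_stack([np.ones(N), t])
g = t ** 2
print(squared_distance_to_span(g, M, w))   # about 1/180: positive, so no asymptotic multicollinearity
```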
5.14.6 Main Theorem

Theorem. Let Assumptions 5.1–5.5 and $|\rho|\nu(K) < 1$ hold. Then the following statements are true.

(i) $|V| > 0$ a.e. if and only if
$$
F > 0 \text{ a.e.} \qquad (5.138)
$$
If condition (5.138) holds, then the OLS estimator $\hat\theta$ for the mixed spatial model converges in distribution,
$$
D_n(\hat\theta - \theta) \xrightarrow{d} V^{-1}z, \qquad (5.139)
$$
where
$$
V^{-1} = \frac{1}{F}\,\Lambda C\Lambda, \qquad
\Lambda = \begin{pmatrix} G_0^{-1} & 0 \\ 0 & 1 \end{pmatrix}, \qquad
C = \begin{pmatrix} FG_0 + V_{12}V_{21} & -V_{12} \\ -V_{21} & 1 \end{pmatrix}. \qquad (5.140)
$$
(ii) In particular, when the autoregressive term dominates, $z$, $C$ and $F$ simplify to
$$
z = \begin{pmatrix} J_1 \\ J_4 \end{pmatrix}, \qquad
C = \begin{pmatrix} FG_0 + J_2J_2' & -J_2 \\ -J_2' & 1 \end{pmatrix}, \qquad
F = (\sigma b_c)^2 \|Qh_{\mathcal X}u\|_2^2. \qquad (5.141)
$$
As we see, $b_c \ne 0$ is necessary for Eq. (5.138) in this case.

(iii) In the other special case, when the exogenous regressors dominate, $z$, $C$ and $F$ become
$$
z = \begin{pmatrix} J_1 \\ B'J_2 \end{pmatrix}, \qquad
C = \begin{pmatrix} FG_0 + G_1BB'G_1 & -G_1B \\ -B'G_1 & 1 \end{pmatrix}, \qquad B \ne 0, \qquad (5.142)
$$
$$
F = \operatorname{dist}^2\bigl(h(K)B'M, \mathcal M\bigr). \qquad (5.143)
$$
This means that linear independence of $h(K)B'M$ and $M_1, \ldots, M_k$ is necessary and sufficient for Eq. (5.138). Moreover, if the constant $F$ is positive, then the convergence relation has a more familiar format,
$$
D_n(\hat\theta - \theta) \xrightarrow{d} N\bigl(0, (\sigma b_c)^2 V^{-1}\bigr), \qquad (5.144)
$$
where
$$
V = \begin{pmatrix} G_0 & G_1B \\ B'G_1 & B'G_2B \end{pmatrix}.
$$
Recall that $J_1$, $J_2$ and $J_3$ are linear in standard normals, while $J_4$ and $J_5$ are quadratic forms of standard normals. In the case of domination by the exogenous regressors, the quadratic parts disappear from $z$ and $V$, and $V$ is nonstochastic. If the autoregressive term dominates, the linear part vanishes in $z_2$ and $V_{22}$. These are the traces of the features of Submodels 1 and 2 (Section 5.12.1). Neither of these extreme cases involves $J_3$, which reflects interaction between the exogenous regressors and the spatial matrix. Condition (5.138) is called an invertibility criterion.

Proof. (i) Equivalence of the invertibility criterion to the absence of multicollinearity follows from Lemma 5.14.4. Equation (5.139) is a consequence of Lemma 5.14.3 and CTM. It remains to prove expression (5.140) for $V^{-1}$. In terms of the partition of $V$ we use the formula for the inverse from Lemma 1.7.6(ii), which looks like this:
$$
V^{-1} = \begin{pmatrix}
V_{11}^{-1} + V_{11}^{-1}V_{12}\,\Gamma\, V_{21}V_{11}^{-1} & -V_{11}^{-1}V_{12}\,\Gamma \\
-\Gamma\, V_{21}V_{11}^{-1} & \Gamma
\end{pmatrix},
$$
where $\Gamma = (V_{22} - V_{21}V_{11}^{-1}V_{12})^{-1}$. From Eq. (5.135) we know that $\Gamma = 1/F$. Hence,
$$
V^{-1} = \frac{1}{F}
\begin{pmatrix}
FG_0^{-1} + G_0^{-1}V_{12}V_{21}G_0^{-1} & -G_0^{-1}V_{12} \\
-V_{21}G_0^{-1} & 1
\end{pmatrix}
= \frac{1}{F}
\begin{pmatrix} G_0^{-1} & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} FG_0 + V_{12}V_{21} & -V_{12} \\ -V_{21} & 1 \end{pmatrix}
\begin{pmatrix} G_0^{-1} & 0 \\ 0 & 1 \end{pmatrix},
$$
which proves Eq. (5.140).

(ii) Eq. (5.141) obtains from Eqs. (5.140), (5.131), (5.133) and (5.132) on putting $B = 0$, $d = 1$.

(iii) Similarly, Eq. (5.142) follows from Eqs. (5.140), (5.131) and (5.132) with $d = \infty$; $B \ne 0$ by Lemma 5.14.2(iii). Equation (5.143) is proved in Lemma 5.14.5. Now we prove Eq. (5.144). In the case under consideration, directly from Eq. (5.132),
$$
V = \begin{pmatrix} G_0 & G_1B \\ B'G_1 & B'G_2B \end{pmatrix}.
$$
However, by definitions (5.107) and (5.115),
$$
V(z) = E\begin{pmatrix} J_1 \\ B'J_2 \end{pmatrix}\begin{pmatrix} J_1' & J_2'B \end{pmatrix}
= (\sigma b_c)^2 \begin{pmatrix} M'M & M'h_{\mathcal X}MB \\ B'M'h_{\mathcal X}M & B'M'h_{\mathcal X}^2MB \end{pmatrix}
= (\sigma b_c)^2 V.
$$
These equations imply Eq. (5.144). ∎
5.14.7 Example

In the model $Y_n = \beta l_n + \rho W_n Y_n + v_n$ with a constant term and Case matrix $W_n$ the regressors are collinear because $W_n l_n = l_n$ and $Z_n = W_n(l_n, Y_n)$ is of rank at most 2. Therefore we consider $Y_n = \beta l_n + \rho \widetilde W_n Y_n + v_n$. The pseudo-Case matrix $\widetilde W_n$ satisfies the assumptions of Theorem 5.14.6 with $\sum_{i\ge 1}|\lambda_i| \le 2r - 1$, if $r$ is fixed and $m \to \infty$. For simplicity, the components of the error vector $v_n$ are assumed i.i.d. with mean 0 and variance $\sigma^2$. Application of Theorem 5.14.6 leads to the following conclusions. The conditions $d = \infty$ (exogenous regressors domination) and $B = 0$ (autoregressive term domination) are mutually exclusive and together cover all possible $\beta$. The theoretical statements combined with long and dull calculations result in Table 5.4.
TABLE 5.4 Asymptotic Distribution with a Constant Term and Case Matrix

β = 0: $d = 1$, $B = 0$ (autoregressive term domination). If $r = 1$, there is asymptotic multicollinearity. If $r \ge 2$, then $(\sqrt{n}(\hat\beta - \beta),\ \hat\rho - \rho) \xrightarrow{p} (0,\ 1 - \rho)$.

β ≠ 0: $d = \infty$, $\max_j |B_j| = 1$ (exogenous regressor domination). For any natural $r$, there is asymptotic multicollinearity.

Source: Mynbaev (2010).
In this example $z$ and $F$ contain quadratic forms of standard normal variables, but those forms cancel out in $F^{-1}z$. Still, the asymptotic distribution, when it exists, is not normal. In particular, $\hat\beta$ is consistent and $\hat\rho$ is not when $\beta = 0$, $r \ge 2$.

Computer simulations confirm the theoretical results. For pseudo-Case matrices the values of $m, r$ were fixed at $m = 200$, $r = 10$, giving $n = 1000$. Each of the values $\beta = -1, 0, 1$ was combined with 20 values of $\rho$ from the segment $[-0.2, 0.2]$, to see if there is deterioration of convergence at the boundary of the theoretical interval of convergence $\rho \in (-1/19, 1/19)$. For each combination $(\beta, \rho)$ 100 simulations were run. The ranges of sample means and sample standard deviations for the samples of size 100 are reported in Table 5.5 (for small values we indicate just the order of magnitude). As we see, the estimate of $\beta = 0$ is good, as predicted, and the estimates of $\beta = \pm 1$ are not. As a result of inconsistency, the estimate of $\rho$ is always bad (closer to 1 than to the true $\rho$). To see the dynamics of $\hat\rho$ as $m$ increases, we combined $\beta = 0$, $\rho = 0.2$ with $m = 200, 300, \ldots, 1000$. The corresponding values of $\hat\rho$ approach 1, starting from 0.9966 and monotonically increasing to 0.999. These simulations did not reveal any deterioration of convergence outside the theoretical interval, which suggests that the convergence may hold in a wider interval. For the combinations $\beta = -1, 0, 1$ with $\rho = 0.2$ the null hypothesis of a normal distribution for $\hat\beta$ and $\hat\rho$ is rejected (the p-value of the Anderson–Darling statistic is less than 0.0001). Recall that similar worrisome evidence was found in the case of a purely autoregressive spatial model. The results for the Case matrices are reported in Table 5.6. Owing to multicollinearity, there is no definite pattern in these numbers, and for the combination $\beta = 0$, $\rho = 0.2$ an increase in $m$ from 200 to 1000 did not improve the estimates.

TABLE 5.5 Simulation Results for Pseudo-Case Matrices
Statistic   | β = −1     | β = 0      | β = 1
mean β̂     | 10^{−12}   | 10^{−12}   | 10^{−16}
st.d. β̂    | 10^{−11}   | 10^{−11}   | 10^{−15}
mean ρ̂     | 0.995      | 0.995      | 0.995
st.d. ρ̂    | 10^{−11}   | 10^{−11}   | 10^{−14}

Source: Mynbaev (2010).
TABLE 5.6 Simulation Results for Case Matrices

Statistic      | β = −1          | β = 0           | β = 1
mean β̂        | [−1.7, 1.02]    | [−0.92, 0.17]   | [−0.008, 0.014]
s.d. β̂        | [0.54, 1.15]    | [0.57, 1.63]    | [0.02, 0.05]
mean (ρ̂ − ρ)  | [−0.82, 0.02]   | [−0.95, 0.06]   | [−0.79, 0.03]
s.d. ρ̂        | [0.43, 1.34]    | [0.49, 1.95]    | [0.46, 2.35]

Source: Mynbaev (2010).
Finally, in the cases $\beta = \pm 1$, for both the Case and pseudo-Case matrices the sample correlation between the estimates of $\beta$ and $\rho$ was at least 0.99 in absolute value. For the GAUSS code see Mynbaev (2006b). We also verified that the distinction made in Theorem 5.14.6 between the autoregressive term and exogenous term domination reflects the reality. The first column of Table 5.4 shows that Theorem 5.14.6 works correctly in the case of autoregressive term domination. Column 2 of that table is not satisfactory because of the asymptotic multicollinearity. To make the exogenous regressor dominant and stay within the same example, we set $m = 10$, $r = 200$ and choose $\beta$ large relative to $\rho$ ($\rho = 10^{-5}$, $\beta = 10^{5}$). Then, despite the asymptotic multicollinearity, the distributions of both $\hat\rho$ and $\hat\beta$ are approximately normal (the p-value of the Anderson–Darling statistic is higher than 0.9). Both estimates are still biased.
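For readers who want to replicate the experiment without the original GAUSS code, the following Python sketch shows the general shape of one Monte Carlo run for the mixed model $Y_n = \beta l_n + \rho W_n Y_n + v_n$. The weight matrix used here is a simple block "Case-type" matrix built from hypothetical sizes $m$ and $r$; it is meant only to illustrate the mechanics of generating the data and computing the OLS estimates of $\beta$ and $\rho$, not to reproduce the exact pseudo-Case construction of the book.

```python
import numpy as np

def case_type_matrix(m, r):
    """Block-diagonal weight matrix: m districts of r members,
    each member influenced equally by the other r - 1 members."""
    block = (np.ones((r, r)) - np.eye(r)) / (r - 1)
    return np.kron(np.eye(m), block)

def simulate_ols(beta, rho, m, r, sigma=1.0, seed=None):
    """One draw of Y = beta*l + rho*W*Y + v and the OLS estimates of (beta, rho)."""
    rng = np.random.default_rng(seed)
    n = m * r
    W = case_type_matrix(m, r)
    v = sigma * rng.standard_normal(n)
    l = np.ones(n)
    Y = np.linalg.solve(np.eye(n) - rho * W, beta * l + v)   # reduced form
    Z = np.column_stack([l, W @ Y])                          # regressors (l_n, W_n Y_n)
    return np.linalg.lstsq(Z, Y, rcond=None)[0]              # (beta_hat, rho_hat)

# a small experiment in the spirit of the text: beta = 0, rho = 0.02, 100 replications
estimates = np.array([simulate_ols(0.0, 0.02, m=50, r=10, seed=k) for k in range(100)])
print(estimates.mean(axis=0), estimates.std(axis=0))
```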
5.14.8 Conclusions

Estimation for the mixed spatial model is the subject of quite a few papers and books. Among the estimation techniques are ML, MM, instrumental variables, least squares and two-stage least squares (see Ord, 1975; Anselin, 1988; Kelejian and Prucha, 1998, 1999; Lee, 2002, 2003, 2004a). In the context of ML estimation, Lee (2004a) lists several problems that arise in spatial models:

1. The estimators may have rates of convergence lower than the customary $\sqrt{n}$.
2. Different components of the estimator may converge at different rates.
3. Under some circumstances, the estimator may be inconsistent.
4. The asymptotic behavior of the estimator depends on the degree of collinearity of the exogenous regressors with the spatial term.

The impression I get from the cited literature is that the same problems arise with any other estimation procedure. I am in favor of intelligent formulas which think for you and provide the necessary rates of convergence or address the multicollinearity issue without you having to sort things out by selecting the appropriate conditions. Theorem 5.14.6 does just that. The normalizer $D_n$ adjusts to the rate of growth of the regressors, including the spatially lagged term. It does not matter whether the exogenous regressors grow, like polynomial trends, or are bounded. The invertibility criterion covers all possible cases, from $B = 0$ to $d = \infty$. Most importantly, the asymptotic distribution includes linear combinations of $\chi^2$ variables
discovered, in the case of Submodel 1 (Section 5.12.1), in Mynbaev and Ullah (2008) (Theorem 5.8.1).

The present theory is far from complete. Three important issues have not been considered here because of their complexity.

1. For the purposes of statistical inference, we need to estimate the variance–covariance matrix of the vector $F^{-1}z$. The situation is relatively simple in the case of exogenous regressors domination, when $F$ is constant, the estimator $F_n = \bar H_n'\bar H_n$ converges to $F$ in probability and, hence, $F_n^{-1} \to F^{-1}$; $(\sigma b_c)^2$ can be estimated by $F_n^{-1}V(z_n)$ (see the end of the proof of Theorem 5.14.6). Even in this case there is a problem, because $F_n = \bar H_n'\bar H_n$ depends, through $d_n$, on the unknown $\beta$. This problem is partially alleviated by the fact that $z_n$ depends on $\beta$ in the same way. Therefore, if some of $\|x_n^{(1)}\|_2, \ldots, \|x_n^{(k)}\|_2$ tend to infinity and, for example, $\|x_n^{(1)}\|_2$ is the largest of these quantities and $\beta_1 \ne 0$, then $d_n = \|x_n^{(1)}\|_2|\beta_1|$ for all large $n$ and the quantities that depend on $\beta_1$ in $F_n$ and $V(z_n)$ cancel out. If, however, all of $\|x_n^{(1)}\|_2, \ldots, \|x_n^{(k)}\|_2$ are bounded, then $1 \le d_n \le$ constant, so that the dependence on $\beta$ is weak. In the general case, when $F$ is stochastic, there is no simple link between estimates of $F$, $V(z)$ and $V(F^{-1}z)$. At the moment I can suggest no constructive ideas on the matter and invite the profession to think about it.

2. The second issue is consistency of the OLS estimator. Again, the problem deserves a separate study, and only general considerations are offered here. First, for a purely spatial model we have shown that, because of the presence of quadratic forms in standard normals in the asymptotic distribution, the consistency notion itself should be modified, from $\operatorname{plim}\hat\rho = \rho$ to $\operatorname{plim}\hat\rho = \rho + X$ where $EX = 0$. Since the mixed spatial model inherits those quadratic forms, the situation for the problem at hand must be even more complex. Second, what is known for Submodel 2 (Section 5.12.1) about consistency (Amemiya, 1985, Theorem 3.5.1) and asymptotic normality (Amemiya, 1985, Theorem 3.5.4) of the OLS estimator indicates that consistency and convergence in distribution are two essentially different problems that require different approaches and conditions (also see Chapter 6). That is to say, trying to extract from Theorem 5.14.6 conditions sufficient for consistency may not be the best idea. Still, if we wish to do it, this is how. The components of $\hat\theta - \theta$ converge at different rates. This can be written as $m_{ni}(\hat\theta_i - \theta_i) \xrightarrow{d} f_i$, $i = 1, \ldots, k+1$, where the $m_{ni}$ are normalizing multipliers and the $f_i$ are random variables. $m_{ni} \to 0$ means a swelling distribution, so in such cases $\hat\theta_i - \theta_i$ does not converge in probability. If $m_{ni} \to \infty$, then $\hat\theta_i - \theta_i$ behaves as $\frac{1}{m_{ni}}f_i$, which goes to 0 in probability. Finally, for $i$ with $m_{ni} \equiv 1$ it suffices to impose conditions providing $f_i = 0$.

3. Finally, because of nontransparent conditions in the papers listed in Section 5.11.1 it is desirable to reconsider asymptotic results for other estimation methods.
CHAPTER 6

CONVERGENCE ALMOST EVERYWHERE

THIS CHAPTER
is based on a series of papers by Tze Leung Lai and Ching Zong Wei published in the 1980s. Let {F n : n 1} be a sequence of increasing (F n # F nþ1 ) s-fields. It is convenient to call a sequence of random variables {wn } premeasurable (relative to {F n }) if wn is F n1 -measurable for every n 1. Lai and P Wei first investigated the convergence of the series 1 1 wi ei with a m.d. sequence {en , F n } and premeasurable coefficients {wn } and then applied the statements obtained to the study of stochastic regression models. Even though their models contain no deterministic regressors, there are two reasons to include the results by Lai and Wei in this book. First, their approach was extended later by Nielsen (2005) to include deterministic regressors, in addition to the lags of the dependent variable, and that extension is impossible to understand without going through the results by Lai and Wei. The second reason is that one of their results is applied in Chapter 7. Unlike in Chapters 4 and 5, here only strong convergence is used. Therefore we have to go through a range of related technical tools, from conditional versions of the Borel – Cantelli lemma, Jensen’s inequality, and so on, to the martingale convergence theorem. HIS CHAPTER
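As a running illustration for this chapter, here is a small Python sketch (the Gaussian innovations and the particular choice of weights are hypothetical assumptions of ours) of a martingale difference sequence $\{e_n\}$ with respect to its natural filtration and a premeasurable weight sequence $\{w_n\}$, that is, one where $w_n$ is computed only from information available at time $n-1$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

e = np.empty(N)   # martingale differences: E(e_n | F_{n-1}) = 0
w = np.empty(N)   # premeasurable weights: w_n depends only on e_1, ..., e_{n-1}

for n in range(N):
    w[n] = 1.0 if n == 0 else 1.0 / (1.0 + np.abs(e[:n]).mean())
    e[n] = rng.standard_normal()   # independent N(0,1) innovations form a m.d. sequence

S = np.cumsum(w * e)   # partial sums of the series  sum_i w_i e_i  studied by Lai and Wei
print(S[-1])
```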
6.1 THEORETICAL BACKGROUND

6.1.1 Borel–Cantelli Lemma

Let $\{A_n\}$ be an arbitrary sequence of measurable sets. We say that the event $\omega \in A_n$ occurs infinitely often (notation: $A_n$ i.o.) if there is a sequence $n_1 < n_2 < \cdots$, $n_k \to \infty$, such that $\omega \in A_{n_i}$ for all $i$ (the sequence $\{n_i\}$ depends on $\omega$).

Lemma. For any measurable events $A_i$ the implication
$$
\sum_{i=1}^{\infty} P(A_i) < \infty \;\Longrightarrow\; P(A_n \text{ i.o.}) = 0 \qquad (6.1)
$$
is true.
Proof. The verbal definition given above immediately translates to $\{A_n \text{ i.o.}\} = \bigcap_{n=1}^{\infty}\bigcup_{m=n}^{\infty} A_m$. The traditional proof of the lemma is based on this equation. A more transparent proof employs the counting function of $\{A_n\}$ defined by $C = \sum_{i=1}^{\infty} 1(A_i)$. For each $\omega$, the value $C(\omega)$ shows the number of those sets $A_i$ that contain $\omega$, and $\{C = \infty\} = \{A_n \text{ i.o.}\}$. If $\sum_{i=1}^{\infty} P(A_i) < \infty$, then by the monotone convergence theorem (Davidson, 1994, p. 60)
$$
EC \le \limsup_{n\to\infty} E\sum_{i=1}^{n} 1(A_i) = \limsup_{n\to\infty} \sum_{i=1}^{n} P(A_i) < \infty.
$$
This implies $C < \infty$ a.s. and $P(A_n \text{ i.o.}) = 0$. ∎
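A quick numerical sanity check of the lemma (a sketch with one arbitrary choice of events): take independent events with $P(A_n) = n^{-2}$, so that $\sum_n P(A_n) < \infty$, and observe that across many simulated sample paths the analogue of $C(\omega)$, the number of events that occur, is finite and small.

```python
import numpy as np

rng = np.random.default_rng(1)
n_events, n_paths = 20_000, 200
p = 1.0 / np.arange(1, n_events + 1) ** 2      # P(A_n) = n^{-2}, summable

# C(omega) = number of the A_n that occur along each simulated path
C = (rng.random((n_paths, n_events)) < p).sum(axis=1)
print(C.max(), C.mean())    # C is finite and small on every path: no path has A_n i.o.
```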
6.1.2 Conditional Borel–Cantelli Lemma

Lemma. (Hall and Heyde, 1980, p. 32) Let $\{Z_n\}$ be a sequence of random variables adapted to an increasing sequence of $\sigma$-fields $\{\mathcal F_n\}$ and such that $0 \le Z_n \le 1$. Then $\sum_{i=1}^{\infty} Z_i < \infty$ a.s. if and only if $\sum_{i=1}^{\infty} E(Z_i|\mathcal F_{i-1}) < \infty$ a.s.

Just to digest this statement, consider a sequence of events $\{A_n\}$ adapted to $\{\mathcal F_n\}$. By the definition of conditional probability of events,
$$
\sum_{i=1}^{\infty} P(A_i|\mathcal F_{i-1}) = \sum_{i=1}^{\infty} E\bigl(1(A_i)|\mathcal F_{i-1}\bigr).
$$
If the premise of the Borel–Cantelli lemma holds and, hence, $C < \infty$ a.s., then from
$$
\infty > \sum_{i=1}^{\infty} P(A_i) = \sum_{i=1}^{\infty} E\,1(A_i) = \sum_{i=1}^{\infty} E\bigl[E\bigl(1(A_i)|\mathcal F_{i-1}\bigr)\bigr] \ge E\Bigl[\sum_{i=1}^{\infty} P(A_i|\mathcal F_{i-1})\Bigr]
$$
we see that $\sum_{i=1}^{\infty} P(A_i) < \infty \Rightarrow \sum_{i=1}^{\infty} P(A_i|\mathcal F_{i-1}) < \infty$ a.s. Since the variables $Z_n = 1(A_n)$ satisfy the assumptions of the conditional Borel–Cantelli lemma, we have the related proposition $P(A_n \text{ i.o.}) = 0 \Longleftrightarrow \sum_{i=1}^{\infty} P(A_i|\mathcal F_{i-1}) < \infty$ a.s.
6.1.3 Martingale Convergence Theorem

Theorem. (Chow, 1965; Hall and Heyde, 1980, Theorem 2.16) Let $\{X_n, \mathcal F_n : n \ge 1\}$ be a m.d. sequence with $E|X_n|^p < \infty$ for each $n$ and some $p \in [1, 2]$. Then $S_n = \sum_{i=1}^{n} X_i$ converges a.s. on the set $\{\sum_{i=1}^{\infty} E(|X_i|^p|\mathcal F_{i-1}) < \infty\}$.

This theorem is used herein as often as Theorem 6.1.4, which is its consequence. Chow's contribution was to extend the known result from $p = 2$ to $p < 2$. The importance of this extension is seen in later applications.
6.1.4 Martingale Strong Law (Convergence of Normed Sums)

Instead of convergence of $\{S_n\}$ itself we may be interested in convergence of the normed sums $S_n/U_n$, where the norming sequence $\{U_n\}$ is a nondecreasing sequence of positive random variables. To this end, Theorem 6.1.3 can be applied to obtain the following proposition.

Theorem. (Hall and Heyde, 1980) Let $\{X_n, \mathcal F_n : n \ge 1\}$ be a m.d. sequence and let $\{U_n\}$ be a premeasurable nondecreasing sequence of positive random variables. Suppose $1 \le p \le 2$ and $E|X_n|^p < \infty$ for each $n$. Then $\lim_{n\to\infty} S_n/U_n = 0$ a.s. on the set
$$
\Bigl\{\lim_{n\to\infty} U_n = \infty,\ \sum_{i=1}^{\infty} U_i^{-p} E\bigl(|X_i|^p|\mathcal F_{i-1}\bigr) < \infty\Bigr\},
$$
where $S_n = \sum_{i=1}^{n} X_i$.
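A simulation sketch of this strong law under simple hypothetical choices: $X_n$ i.i.d. standard normal (a m.d. sequence), $U_n = n$, $p = 2$; then $\sum_i U_i^{-2}E(X_i^2|\mathcal F_{i-1}) = \sum_i i^{-2} < \infty$ and $S_n/U_n \to 0$ a.s., which the path below illustrates.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000

X = rng.standard_normal(N)      # m.d. sequence (independent, mean zero)
U = np.arange(1, N + 1)         # U_n = n: nondecreasing, positive, premeasurable
ratio = np.cumsum(X) / U        # S_n / U_n

# sum_i U_i^{-2} E(X_i^2 | F_{i-1}) = sum_i 1/i^2 < infinity, so S_n/U_n -> 0 a.s.
print(ratio[[999, 9_999, N - 1]])
```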
6.1.5 Conditional Hölder Inequality

(Long, 1993, p. 3) Let $p, q \in [1, \infty]$ be Hölder conjugates, as in the usual Hölder inequality, $1/p + 1/q = 1$, and let $\mathcal G$ be a sub-$\sigma$-field of $\mathcal F$. Then
$$
|E(fg|\mathcal G)| \le \bigl[E(|f|^p|\mathcal G)\bigr]^{1/p}\bigl[E(|g|^q|\mathcal G)\bigr]^{1/q}.
$$
6.1.6 Conditional Jensen Inequality

(Long, 1993, p. 5) Let $\phi$ be a convex function defined on $(a, b)$. Suppose that $f \in L_1(\Omega)$ is real with values in $(a, b)$ for almost all $\omega$ and such that $\phi(f) \in L_1(\Omega)$ or $\phi(f)$ is nonnegative [Definition 1.1.2 from Long (1993) extends $E(f|\mathcal G)$ from $f \in L_1$ to nonnegative $f$]. Then $\phi\bigl(E(f|\mathcal G)\bigr) \le E\bigl(\phi(f)|\mathcal G\bigr)$ a.s.
6.1.7 Conditional Chebyshov Inequality

Lemma. For $p \ge 1$ and $\mathcal G$-measurable $g$ such that $g > 0$ a.s. one has $P(f > g|\mathcal G) \le g^{-p} E(|f|^p|\mathcal G)$.

Proof. $P(f > g|\mathcal G) = E\bigl(1(f/g > 1)|\mathcal G\bigr) \le E\bigl(|f/g|^p 1(f/g > 1)|\mathcal G\bigr) \le g^{-p}E(|f|^p|\mathcal G)$. ∎
6.1.8 Paley–Zygmund Inequality

This statement and its proof are taken from Burkholder (1968, Lemma 1).

Lemma. Let $a > b \ge 0$. If a real random variable $X$ satisfies $EX \ge a$ and $EX^2 = 1$, then $P(X \ge b) \ge (a - b)^2$.
Proof. Let $Y = 1(X \ge b)$. Then, by Hölder's inequality and $EX^2 = 1$,
$$
a \le EX = EXY + EX(1 - Y) \le (EY^2)^{1/2} + b = [P(X \ge b)]^{1/2} + b,
$$
and the desired inequality follows. ∎
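A numerical check of the inequality (a sketch with one particular, hypothetical distribution): rescale a random variable so that $EX^2 = 1$, take $a = EX$, and compare $P(X \ge b)$ with the lower bound $(a - b)^2$ over a grid of $b < a$.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(size=1_000_000)
X /= np.sqrt(np.mean(X ** 2))           # enforce E X^2 = 1

a = X.mean()                            # take EX >= a with equality
for b in np.linspace(0.0, 0.9 * a, 4):
    lhs = np.mean(X >= b)               # P(X >= b)
    rhs = (a - b) ** 2                  # Paley-Zygmund lower bound
    print(f"b={b:.3f}  P(X>=b)={lhs:.4f}  (a-b)^2={rhs:.4f}")
```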
6.1.9 Conditional Paley–Zygmund Inequality

For a random variable $d$ let us call the variable $v = [E(d^2|\mathcal G)]^{1/2}$ its variation. The following lemma, taken from Chow (1969, Lemmas 1 and 1′), roughly means that if the conditional mean $m = E(d|\mathcal G)$ of a random variable $d$ is not very small relative to its variation $v$, then $d$ itself is not very small relative to $v$ either.

Lemma. Let $d$ be a nonnegative random variable and let $\mathcal G \subseteq \mathcal F$ be a $\sigma$-field. If $P(3\lambda v \le m < \infty) = 1$ for some constant $\lambda > 0$, then
$$
P(d > \lambda v|\mathcal G) \ge \lambda^2. \qquad (6.2)
$$

Proof. First we prove an auxiliary statement: if $\lambda \ge 0$ is a $\mathcal G$-measurable random variable and $P(m < \infty) = 1$, then
$$
vP(d > \lambda v|\mathcal G) \ge \lambda(m - 2\lambda v). \qquad (6.3)
$$
We can assume that $\lambda > 0$ a.s. because if $\lambda \ge 0$, we can prove Eq. (6.3) for $\lambda + \varepsilon$ with a positive constant $\varepsilon$ and then let $\varepsilon \to 0$. Let us split $\Omega$ into three sets: $A = \{d \le \lambda v\}$, $B = \{\lambda v < d \le v/\lambda\}$ and $C = \{d > v/\lambda\}$. On $A$, $d$ is bounded by $\lambda v$, which is $\mathcal G$-measurable; on $B$ the lower bound we need for $d$ is available; and on $C$ the variation $v$ helps to dominate $d$. The estimation on $A$ is the easiest:
$$
m = E(d|\mathcal G) = E\bigl[d1(A) + d1(B) + d1(C)|\mathcal G\bigr] \le \lambda v + E\bigl(d1(B)|\mathcal G\bigr) + E\bigl(d1(C)|\mathcal G\bigr). \qquad (6.4)
$$
By the definition of conditional probability,
$$
E\bigl(d1(B)|\mathcal G\bigr) \le (v/\lambda)E\bigl(1(\lambda v < d)|\mathcal G\bigr) = (v/\lambda)P(d > \lambda v|\mathcal G). \qquad (6.5)
$$
By the conditional Hölder and Chebyshov inequalities with $p = 2$,
$$
E\bigl(d1(C)|\mathcal G\bigr) \le \bigl[E(d^2|\mathcal G)\bigr]^{1/2}\bigl[P(d > v/\lambda|\mathcal G)\bigr]^{1/2} \le v\bigl[(\lambda/v)^2 E(d^2|\mathcal G)\bigr]^{1/2} = \lambda v. \qquad (6.6)
$$
From Eqs. (6.4), (6.5) and (6.6) we get $m \le 2\lambda v + (v/\lambda)P(d > \lambda v|\mathcal G)$, which proves Eq. (6.3). Now we treat $\lambda$ as a positive constant. From the assumption $3\lambda v \le m$ a.s. we have $m - 2\lambda v \ge \lambda v$, so Eq. (6.3) gives Eq. (6.2).¹ ∎
¹ Here and in a couple more places I am going to be honest. When $v$ is positive in $vP(d > \lambda v|\mathcal G) \ge \lambda^2 v$, we can divide both sides by $v$ to get Eq. (6.2). What Chow does when $v = 0$ I have no idea.
6.1.10 Lemma on Convex Combinations of Nonnegative Random Variables

Lemma. (Burkholder, 1968, Lemma 2) Let $Y_1, \ldots, Y_n$ be nonnegative random variables that are bounded away from zero on sets of positive probability: $P(Y_i \ge \delta) \ge \varepsilon$, $1 \le i \le n$, with some constants $\varepsilon, \delta > 0$. Denote by $S$ the simplex
$$
S = \Bigl\{a \in \mathbb{R}^n : a_i > 0,\ \sum_{i=1}^{n} a_i = 1\Bigr\}.
$$
Then any convex combination $\sum_{i=1}^{n} a_iY_i$ with coefficients $a \in S$ is bounded away from zero on a set of positive probability:
$$
P\Bigl(\sum_{i=1}^{n} a_iY_i \ge \gamma\delta\varepsilon\Bigr) \ge (1 - \gamma)^2\varepsilon \quad\text{for any } \gamma \in (0, 1) \text{ and } a \in S.
$$

Proof. It may and will be assumed that the probability space is nonatomic²: for any $A$ with $P(A) > 0$ there is $B \subseteq A$ such that $0 < P(B) < P(A)$. Then there exist measurable sets $A_i \subseteq \{Y_i \ge \delta\}$ such that $P(A_i) = \varepsilon$. Let $X_i = \delta 1(A_i)$ and $X = \sum_{i=1}^{n} a_iX_i$, $a \in S$. Then $EX = \varepsilon\delta$ and, by Hölder's inequality,
$$
EX^2 = E\Bigl(\sum_{i=1}^{n} a_i^{1/2}\, a_i^{1/2}X_i\Bigr)^2 \le E\sum_{i=1}^{n} a_iX_i^2 = \delta^2\sum_{i=1}^{n} a_iP(A_i) = \varepsilon\delta^2.
$$
Therefore $a$ defined by $a = EX(EX^2)^{-1/2}$ satisfies $a^2 \ge \varepsilon$. Letting $b = a\gamma$ we have $a > b$. Now, by Lemma 6.1.8 applied to $X(EX^2)^{-1/2}$, we get from the inequality $X = \sum_{i=1}^{n} a_iX_i \le \sum_{i=1}^{n} a_iY_i$ that
$$
P\Bigl(\sum_{i=1}^{n} a_iY_i \ge \gamma\delta\varepsilon\Bigr) \ge P(X \ge \gamma EX) = P\bigl(X \ge a\gamma(EX^2)^{1/2}\bigr)
= P\bigl(X(EX^2)^{-1/2} \ge b\bigr) \ge (a - b)^2 = (1 - \gamma)^2a^2 \ge (1 - \gamma)^2\varepsilon. 
$$
∎
2
I do not understand this, I am just following the principle that great guys do not err, and when they do it can always be fixed.
270
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
6.1.12 Kronecker’s Lemma Lemma. (Davidson, 1994, Lemma 2.35), Let {xi : i 1} be a sequence of real numbers and let {aP i : i 1} be a nondecreasing sequence of positive numbers such n that ai ! 1. If i¼1 xi =ai converges to a real number, as n ! 1, then Pn 1=an i¼1 xi ! 0: The essence of the lemma is that if in the convergent sum x1 =a1 þ þ xn =an the weights are replaced by the smallest, 1=an , then the resulting sum tends to zero. Davidson assumes positivity of xi , but this condition is not used in the proof.
6.2 VARIOUS BOUNDS ON MARTINGALE TRANSFORMS Sections 6.2– 6.4 are based mainly on Lai and Wei (1982, 1983b).
6.2.1 Truncation Argument When {en , F n } is a m.d. sequence and {wn } is premeasurable, we may need to consider expressions such as E(wn en jF n1 ) or E((wn en )2 jF n1 ): The product wn en may not satisfy the necessary integrability requirements. To avoid imposing conditions of type E j wn en j , 1 or E(wn en )2 , 1, the following truncation argument is used. Let an be so large that P(jwn j . an ) n2 and put wn ¼ wn 1(jwn j an ): Then E j wn en j an E j en j , 1, E(wn en j F n1 ) ¼ wn E(en j F n1 ): Thus, wn are bounded, wn en are integrable and {wn en } is a m.d. sequence. The sets P1 An ¼ {jwn j . an } ¼ {wn ¼ wn } satisfy i¼1 P(Ai ) , 1. By the Borel– Cantelli lemma (Section 6.1.1) P(An i.o.) ¼ 0 or, equivalently, P(wn ¼ wn for all large n) ¼ 1 a:s: The conclusion is that when we need to prove a.s. convergence of the series or the like, we can consider wn bounded, without loss of generality.
(6:7) P1
i¼1
wn e n
6.2.2 Convergence of Normed Martingale Transforms When {en , F n } is a m.d. P sequence and the sequence of weights {wn } is premeasurable, the sum Sn ¼ ni¼1 wi ei is called a martingale transform. The next lemma P shows what happens if Sn is normed by Un ¼ n1 w2i : Lemma.
Let dn be square-integrable m.d.’s such that sup E(dn2 jF n1 ) , 1 a:s: n
(6:8)
6.2 VARIOUS BOUNDS ON MARTINGALE TRANSFORMS
and let {wn} be premeasurable. Then ( ) !1 n n 1 X X X 2 2 wi di wi ! 0 a:s: on U ¼ wi ¼ 1 : i¼1
1
271
(6:9)
1
P Proof. In Theorem 6.1.4 put Xi ¼ wi di , p ¼ 2 and Un ¼ n1 w2i . (To make Un positive, we can replace it by max {Sn1 w2i , 1}, which does not affect asymptotic statements.) Condition (6.8) implies 1 X E(X 2 jF i1 ) i
i¼1
Ui2
¼
1 X w2 E(d2 jF i1 ) i
i
Ui2
i¼1
sup E(dn2 j F n1 ) n
1 X w2 i
i¼1
Ui2
:
(6:10)
By monotonicity of the function f (x) ¼ x2 we have n X w2
i U2 i¼m i
U ði n n X Ui Ui1 X 1 ¼ ¼ dx Ui2 U2 i¼m i¼m i Ui1
U ði
n X i¼m
U ðn
dx ¼ x2
Ui1
(6:11)
dx ¼ O(1) a:s: on U x2
Um1
if Um1 . 0: From Eqs. (6.10) and (6.11) we see that ( lim Un ¼ 1,
n!1
1 X
) Ui2 E(Xi2 jF i1 )
,1 ¼U
i¼1
and Eq. (6.9) follows from Theorem 6.1.4.
B
6.2.3 The Local Marcinkiewicz– Zygmund condition Marcinkiewicz and Zygmund (1937) showed that if en are independent random variables with zero mean and satisfy sup Ee2n , 1, lim inf E j en j . 0, n
n!1
(6:12)
P then for every sequence of constants {an } an a.s. convergence of the series 1 i¼1 ai ei is P1 2 equivalent to convergence of the numerical series i¼1 ai : Gundy (1967) proved an alternative result that sounds as follows: if a m.d. sequence {en , F n } satisfies E(e2n j F n1 ) ¼ 1, inf E(jen jjF n1 ) d n
(6:13)
272
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
with some constant d . 0 and if {wn } is premeasurable, then, except on a null set, the P1 P1 P 2 2 conditions 1 i¼1 wi , 1, i¼1 (wi ei ) , 1 and the convergence of i¼1 wi ei are equivalent. Alternative proofs of Gundy’s theorem are given by Chow (1969) and Burkholder and Gundy (1970). Following Lai and Wei (1983b) we say that the m.d. sequence {en , F n } satisfies a local Marcinkiewicz – Zygmund condition (M – Z condition) if sup E(e2n jF n1 ) , 1 and lim inf E(jen jjF n1 ) . 0 a:s:
(6:14)
n!1
n
Clearly, Eq. (6.14) is weaker than both Eq. (6.12) and Eq. (6.13). The purpose of the next several sections is to investigate the convergence properties of the series P1 i¼1 wi ei . For convenience we introduce Assumption L-W. 6.2.3.1 Assumption L-W The m.d. sequence {en , F n } satisfies the local M – Z condition and the sequence of random weights {wn } is premeasurable.
6.2.4 Convergence of Centered Modules of Martingale P Differences (Norming by ni¼1 wi2 ) Under Assumption L-W
Lemma. n X
wi [jei j E(jei kF i1 )] ¼ o
i¼1
n X
! w2i
a:s: on U ¼
i¼1
( 1 X
) w2i
¼1
(6:15)
i¼1
and n X
wi [jei j E(jei kF i1 )] converges a:s: on VnU ¼
i¼1
( 1 X
) w2i
,1 :
(6:16)
i¼1
Proof. The centered modules di ¼ jei j E(jei kF i1 ) are, obviously, m.d.’s. Since E(jei jjF i1 ) is F i1 -measurable, we have E(di2 jF i1 ) ¼ E{e2i 2jei jE(jei kF i1 ) þ [E(jei kF i1 )]2 jF i1 } ¼ E(e2i jF i1 ) 2[E(jei kF i1 )]2 þ [E(jei kF i1 )]2 E(e2i jF i1 ):
(6:17)
Now we see that by the local M – Z condition, Lemma 6.2.2 is applicable and Eq. (6.9) rewrites as Eq. (6.15). On VnU for Xn ¼ wn dn we have, by Eq. (6.17) 1 X
E(Xi2 j F i1 ) ¼
i¼1
Hence, VnU , 6.1.3.
1 X
w2i E(di2 jF i1 ) sup E(e2n j F i1 ) n
i¼1
P1
i¼1
1 X
w2i , 1:
i¼1
E(Xi2 j F i1 ) , 1 and Eq. (6.16) follows from Theorem B
273
6.2 VARIOUS BOUNDS ON MARTINGALE TRANSFORMS
Pn
6.2.5 Norming by
i¼1
wi
P The difference between thisP and Lemma 6.2.4 is that the norming by ni¼1 w2i is replaced by the norming by ni¼1 wi. Assuming that wi are nonnegative, denote ( ) 1 X wi ¼ 1, sup wi , 1 (6:18) V¼ i
i¼1
the set where the series Lemma.
P1
i¼1
wi diverges and wi are uniformly (in i) bounded.
If wi are nonnegative and Assumption L-W holds, then n X
wi [jei j E(jei kF i1 )] ¼ o
i¼1
n X
! wi
a:s on V:
(6:19)
i¼1
Proof. From the definition of V n X
wi ¼ sup w j j
1
n X i¼1
n X wi sup w j sup j w j j i¼1
wi sup j w j
!2 on V
and with the notation di ¼ jei j E(jei jjF i1 ) Eq. (6.15) implies !1 !1 X X n n X X n n sup wj ! 0 a:s: on V > U: wi d i wi wi di w2i j 1 1 1 1 (6:20) P1
However, if v [ V > (VnU), then i¼1 wi (v)di (v) converges by Eq. (6.16), while Pn 1 wi (v) ! 1 by the definition of V, so n X
n X
wi di
1
!1 wi
! 0 a:s: on V > (VnU):
(6:21)
1
Equations (6.20) and (6.21) prove Eq. (6.19).
B
6.2.6 A Lower Bound for Weighted Sums of Modules of Martingale Differences Lemma. If wi are nonnegative and Assumption L-W holds, then with the same V as in Eq. (6.18) one has
lim inf n!1
n X 1
wi jei j
n X 1
!1 wi
lim inf E(jen kF n1 ) a:s: on V: n!1
(6:22)
274
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Proof. Obviously, with di ¼ jei j E(jei jjF i1 ) Pn Pn Pn wi jei j wi E(jei kF i1 ) wi d i 1 1 Pn Pn ¼ þ P1n : 1 wi 1 wi 1 wi By Eq. (6.19) on V, the last term is asymptotically negligible. Denoting a ¼ lim inf n!1 E(jen jjF n1 ), for any 1 [ (0, a=2) we can find N . 0 such that E(jen jjF n1 ) a 1 for n . N: In Pn Pn PN wi E(jei kF i1 ) E(jei kF i1 ) i E(jei kF i1 ) 1 wiP 1 wP ¼ þ Nþ1 Pn (6:23) n n w w i 1 i 1 1 wi the first term on the right tends to 0 as n ! 1, on V: The second term is not less than Pn a1 i¼Nþ1 wi a 21, (a 1) ¼ PN (6:24) PN Pn Pn w þ w w = i i 1 Nþ1 1 i Nþ1 wi þ 1 say, for all n sufficiently large. Equations (6.23) and (6.24) prove Eq. (6.22).
B
6.2.7 A Lower Bound for Weighted Sums of Powers of Martingale Differences Lemma. [Lai and Wei, 1983b Lemma 1(i)] If wi are nonnegative and Assumption L-W holds, then for every r 1 ( ) !1 n n 1 X X X r lim inf wi jei j wi . 0 a:s: on V ¼ wi ¼ 1, sup wi , 1 : n!1
1
1
i
i¼1
Pn 1 : i ¼ 1, . . . , ng as a probability density function on Proof. Viewing fwi 1 wi {1, . . . , n}, by Ho¨lder’s inequality we have 2 !1 31=r !1 n n n n X X X X r 4 5 wi jei j wi wi jei j wi : 1
1
1
1
Hence, the statement follows from the local M – Z condition Eq. (6.14) and Eq. (6.22). B
6.2.8 Uniform Boundedness of Weights Lemma. [Lai and Wei, 1983b, Lemma 1(ii)] If wi are nonnegative and Assumption L-W holds, then wn are uniformly (in n) bounded on the set A ¼ { supn wn jen j , 1}: P(sup wn ¼ 1, sup wn jen j , 1) ¼ 0: n
n
Proof. Let dn ¼ wn jen j,
mn ¼ E(dn jF n1 ), vn ¼ [E(dn2 jF n1 )]1=2
(6:25)
6.2 VARIOUS BOUNDS ON MARTINGALE TRANSFORMS
275
and, for a natural K 1, define AK ¼ {sup dn , K} > {mn 3K 1 vn for all large n}: n
In the definition of AK , “for all large n” means “for n N(v), where N(v) is large enough”. Step 1. To prove boundedness of wn , it suffices to prove boundedness of the variations vn . Indeed, by the conditional Ho¨lder inequality and the local M – Z condition 0 , lim inf E(jen jjF n1 ) lim inf [E(e2n jF n1 )]1=2 : n!1
n!1
Therefore from wn ¼ vn =[E(e2n jF n1 )]1=2 it follows that lim sup wn n!1
lim supn!1 vn lim inf n!1 [E(e2n jF n1 )]1=2
and we have the implication supn vn , 1 ) supn wn , 1. Step 2. Now we reduce the problem further by showing that sup vn , 1 on AK for all K ) sup vn , 1 on A: n
(6:26)
n
The local M – Z condition implies E(jen kF n1 ) ¼
E(jen kF n1 ) [E(e2n j F n1 )]1=2 [E(e2n j F n1 )]1=2 1=2 lim inf n!1 E(jen kF n1 ) [E(e2n jF n1 )]1=2 , all large n: supn [E(e2n jF n1 )]1=2
Multiplying this by wn (which is F n1 -measurable), we get mn cvn for all large n, where c depends on v . Hence, for almost any v there exists S K such that mn 3K 1 vn for all large n. As a result, V ¼ 1 K¼1 {mn 3K 1 vn for all large n}, up to a set of probability zero. Besides, A ¼ S1 <1 K¼1 {supn dn , K}. Hence, A ¼ K¼1 AK , which proves Eq. (6.26). Step 3. The variable Zn ¼ 1(dn K) is F n -measurable and satisfies 0 Zn 1, so by the conditional Borel-Cantelli lemma (Lemma 6.1.2) the conditions P1 P1 E(Zi jF i1 ) , 1 and i¼1 Zi , 1 are a.s. equivalent. The sum Pi¼1 1 Z equals the number of times the inequality dn K is true. On AK i¼1 i this number is zero and therefore 1 X i¼1
E(Zi j F i1 ) ¼
1 X i¼1
P(di K jF i1 ) , 1 on AK :
(6:27)
276
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Step 4. Put
xn ¼ 1(mn 3K 1 vn ),
d n ¼ dn xn , 2
mn ¼ E( d n j F n1 ),
vn ¼ [E( d n j F n1 )]
1=2
:
As xn is F n1 -measurable, we have mn ¼ mn xn and vn ¼ vn xn . Obviously, mn 3K 1 vn a.s. Hence, by the conditional Paley– Zygmund inequality (Lemma 6.1.9) P( d n . K 1 vn jF n1 ) K 2 :
(6:28)
Noting that P( dn . K 1 vn j F n1 ) ¼ E(1( d n . K 1 vn ) jF n1 ) ¼ E(1(dn . K 1 vn )xn j F n1 ) ¼ P(dn . K 1 vn jF n1 )xn we get from Eq. (6.28) P(dn . K 1 vn jF n1 ) K 2 on AK :
(6:29)
Step 5. The resulting Eqs. (6.27) and (6.29) can be combined as follows. The inequalities dn . K 1 vn , K 1 vn K imply dn K and therefore 1(dn . K 1 vn )1(K 1 vn K) 1(dn K):
(6:30)
Equations (6.29) and (6.30) lead to K 2
X
1(K 1 vn K)
n
X
1(K 1 vn K)P(dn . K 1 vn j F n1 )
n
¼
X
E[1(K 1 vn K)1(dn . K 1 vn ) jF n1 ]
n
X
P(dn K jF n1 ) on AK:
n
Recalling that by Eq. (6.27) the right-hand side is a.s. finite, we see that the inequality K 1 vn K may be true on AK only for a finite number of indices n. Hence, supn vn , 1 on AK for all K: As we know from Steps 1 and 2, this completes the proof. B
6.2 VARIOUS BOUNDS ON MARTINGALE TRANSFORMS
277
6.2.9 Implication of Uniform Boundedness of Partial Sums Lemma.
If Assumption L-W holds, then ( sup w2n n
where Sn ¼
Pn 1
¼ 1 a:s: on B ¼
sup S2n n
, 1,
1 X
) w2i
¼1 ,
i¼1
wi ei :
Proof. Putting S0 ¼ 0 and summing the identity S2i ¼ (Si1 þ wi ei )2 ¼ S2i1 þ 2Si1 wi ei þ w2i e2i
we get
S2n ¼
n X
S2i
n X
1
S2i1 ¼ 2
n X
1
Si1 wi ei þ
1
i
¼
Ui2
i¼1
w2i e2i :
(6:31)
1
In Theorem 6.1.4 put Xn ¼ Sn1 wn en , Un ¼ the local M – Z condition 1 X E(X 2 jF i1 )
n X
Pn
i¼1
w2i , p ¼ 2: On the set B by
1 X S2i1 w2i E(e2i jF i1 ) Ui2 i¼1
1 X w2i sup S2n sup E(e2n jF n1 ) ,1 U2 n n i¼1 i
[see Eq. (6.11) for the proof of convergence of the above series]. Hence, by Theorem 6.1.4 ! n n X X 2 Si1 wi ei ¼ o wi a:s: on B: (6:32) 1
Since, obviously, S2n ¼ o
i¼1
Pn
2
wi on B, Eqs. (6.31) and (6.32) show that ! n n X X 2 2 2 wi ei ¼ o wi a:s: on B: i¼1
1
i¼1
However, by Lemma 6.2.7 lim inf n!1
n X 1
w2i e2i
n X
!1 w2i
. 0 a:s: on V
1
¼
( 1 X i¼1
) w2i
¼ 1,
sup w2i i
,1 :
Thus, the assumption that supi w2i , 1 on B leads to a contradiction.
(6:33) B
278
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
6.3 MARCINKIEWICZ– ZYGMUND THEOREMS AND RELATED RESULTS 6.3.1 Generalized Marcinkiewicz – Zygmund Theorem I Theorem. (Lai and Wei, 1983b, Corollary 1) Suppose Assumption L-W holds (Section 6.2.3). Then, except on a null set, the following statements are equivalent: P1 2 wi , 1, (i) Pi¼1 1 (ii) 1 wi ei converges, P (iii) supn n1 wi ei , 1, P1 2 2 (iv) 1 wi ei , 1. P Proof. Denote Sn ¼ n1 wi ei . The implication symbols below denote implications outside some null set. (i) ) (ii). From (i) and the local M – Z condition (Section 6.2.3) we see that with Xn ¼ wn en 1 X
E(Xi2 jF i1 ) sup E(e2n j F n1 ) n
i¼1
1 X
w2i , 1 a:s:
i¼1
Therefore, by the martingale strong law (Section 6.1.4) Sn converges a.s. on V and (ii) holds. For future reference, notice that this argument can be summarized as follows: if the m.d. sequence {en } satisfies supn E(e2n jF n1 ) , 1 and {wn } is premeasurable, P1 2 P then n1 wi ei converges a.s. on W ¼ i¼1 wi , 1 : The implication (ii) ) (iii) is trivial. (iii) ) (i). (iii) implies sup jwn en j ¼ sup jSn Sn1 j sup (jSn j þ jSn1 j) , 1: n
n
n
As jwn j is nonnegative and F n1 -measurable, it follows by Lemma 6.2.8 that P 2 supn w2n , 1. This would not be true by Lemma 6.2.9 if we had 1 i¼1 wi ¼ 1. (i) ) (iv). Since we have established that (i) implies (iii), it suffices to show that ( ) 1 1 X X w2i e2i , 1 a:s: on A ¼ sup S2n , 1, w2i , 1 : (6:34) n
1
i¼1
As a result of the bound 1 1 X X E (Si1 wi ei )2 jF i1 sup S2n sup E(e2n jF i1 ) w2i , 1, i¼1
n
n
i¼1
the martingale strong law (Section 6.1.4) provides convergence of the series P1 2 1 Si1 wi ei on A: It remains to apply identity (6.31) and that supn Sn , 1 on A to see that Eq. (6.34) holds.
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
279
(iv) ) (i). From (iv) supn jwn en j , 1, which implies supn jwn j , 1 by Lemma 6.2.8. Therefore to prove (i) it suffices to show that P
1 X
w2i e2i
, 1,
1
sup w2n n
1 X
, 1,
! w2i
¼ 1 ¼ 0:
(6:35)
i¼1
P1
By Lemma 6.2.7, Eq. (6.33) holds and is true.
1
w2i e2i ¼ 1 on V: Thus, Eq. (6.35) B
6.3.2 Approximation Lemma Lemma. Let {en , F n } be a m.d. sequence satisfying the local M – Z condition from Section 6.2.3. Then for every given h [ (0, 1), there exist positive integers m and K and a m.d. sequence {~en , F~ n} satisfying E( e~ 2n j F~ n1 ) K 2 and E(j e~ n k F~ n1 ) K 1 a:s: for all n m
(6:36)
and such that F n # F~ n for all n and P(en ¼ e~ n for all n m) 1 h:
(6:37)
Proof. Enlarging the probability space, if necessary, let {dn } be a sequence of i.i.d. symmetric Bernoulli variables such that P(dn ¼ 1) ¼ P(dn ¼ 1) ¼ 1=2 and {dn } S ~ is independent of s ( 1 1 F n ): Denote Bn ¼ s(d1 , . . . , dn ), F n ¼ s(F n < Bn ): By independence of dn we have E(dn jBn1 ) ¼ Edn ¼ 0, E(dn2 jBn1 ) ¼ Edn2 ¼ 1, E(jdn kBn1 ) ¼ Ejdn j ¼ 1:
(6:38)
Denote AK,n ¼ {E(e2n jF n1 ) K 2 , E(jen jjF n1 ) K 1 }. By the local M– Z S T condition, V ¼ 1 nm AK,n , up to a set of probability zero. Since the event m,K¼1 T nm AK,n increases with K and m, this implies lim P
m,K !1
\
! AK,n
¼ 1:
nm
This means that for any h [ (0, 1) there exist positive integers m and K such that P
\
! AK,n
1 h:
nm
Define e~ n ¼ en 1(AK,n ) þ dn 1(VnAK,n ) for n 1:
(6:39)
280
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
The functions 1(AK,n ) and 1(VnAK,n ) are F n1 - and F~ n1 -measurable. Hence, by Eq. (6.38) and LIE E( e~ n j F~ n1 ) ¼ 1(AK,n )E(en j F~ n1 ) þ 1(VnAK,n )E(dn j F~ n1 ) ¼ 1(AK,n )E[E(en jF n1 )j F~ n1 ] þ 1(VnAK,n )E[E(dn j Bn1 ) j F~ n1 ] ¼ 0: Thus, {~en , F~ n } is a m.d. sequence. Now we show that the first of the conditions (6.36) is met: E( e~ 2n j F~ n1 ) ¼ 1(AK,n )E(e2n j F~ n1 ) þ 1(VnAK,n )E(dn2 j F~ n1 ) ¼ 1(AK,n )E[E(e2n jF n1 )j F~ n1 ] þ 1(VnAK,n )E[E(dn2 j Bn1 )j F~ n1 ] K 2 1(AK,n ) þ 1(VnAK,n ) K 2 : Similarly, E(j e~ n k F~ n1 ) ¼ 1(AK,n )E[E(jen kF n1 )j F~ n1 ] þ 1(VnAK,n )E[E(jdn kF n1 )j F~ n1 ] K 1 1(AK,n ) þ 1(VnAK,n ) K 1 : Finally, Eq. (6.37) follows from Eq. (6.39): P(en ¼ e~ n for all n m) ¼ P
\
! {en ¼ e~ n } 1 h:
nm
B
6.3.3 Generalized Marcinkiewicz – Zygmund Theorem II Lemma. (Burkholder, 1968, Lemma 4) To each d [ (0, 1) corresponds an a [ (0, 1) with the following property: if di , i 1, are m.d.’s satisfying Ejdi j P d(Edi2 )1=2 , i 1, then the martingale fn ¼ ni¼1 di satisfies Ejfn j a(Efn2 )1=2 for all n. For the original Marcinkiewicz – Zygmund result see Marcinkiewicz and Zygmund (1937). The restrictions a, d 1 are necessary by Ho¨lder’s inequality. Proof. The numbers ci ¼ (Edi2 )1=2 can be assumed positive because those di that vanish a.s. can be omitted from the sequence. Geometrically, it is obvious that we can choose b to satisfy the 0 , b , d and b ¼ (d b)2 . Application of the Paley – Zygmund inequality (Section 6.1.8) to di (Edi2 )1=2 yields P(di bci ) P (d b)2 ¼ b. The numbers ai ¼ c2i = n1 c2j and variables Yi ¼ (di =ci )2 satisfy the
281
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
assumptions of Lemma 6.1.10 with d ¼ b2 , 1 ¼ b and g ¼ b [ (0, 1): Hence, P
n X
! ai Yi b
4
(1 b)2 b (d b)2 b ¼ b2 :
(6:40)
i¼1
Let Sn ( f ) ¼
Pn
i¼1
di2
1=2
. As Efn2 ¼
2
P(Sn ( f ) b
(Efn2 )1=2 )
Pn
i¼1
¼P
Edi2 ¼
n X
Pn
i¼1
c2i (di =ci )2
c2i , we have
b
4
n X
i¼1
! c2i
:
(6:41)
i¼1
Equations (6.40) and (6.41) lead to the bound 0
b2 [b2 (Efn2 )1=2 ]3=2 P@
n X
c2i (di =ci )2
i¼1
n X
1
!1 c2j
b4 A[b2 (Efn2 )1=2 ]3=2
j¼1
¼ P(Sn ( f ) b2 (Efn2 )1=2 )[b2 (Efn2 )1=2 ]3=2 ð ¼ [b2 (Efn2 )1=2 ]3=2 dP {Sn ( f )b2 (Efn2 )1=2 }
E(Sn ( f ))3=2 :
(6:42)
By Burkholder’s inequality (Long, 1993, Theorem 5.5.7) for 1 , p , 1 the norms (Ej fn jp )1=p and [E(Sn ( f )p )]1=p are equivalent. In particular, with some constant c . 0, and also applying Ho¨lder’s inequality, we have E(Sn ( f ))3=2 cEj fn j3=2 c(Ej fn j)1=2 (Efn2 )1=2 :
(6:43)
Combining Eqs. (6.42) and (6.43) yields b5 (Efn2 )1=2 (Efn2 )1=4 c(Ej fn j)1=2 (Efn2 )1=2 or Ej fn j a(Efn2 )1=2 with a ¼ (b5 =c)2 . B
6.3.4 Lemma on Probing Martingale Differences e n } be a m.d. sequence Lemma. [Lai and Wei, 1983b, Lemma 2 (ii)] Let {e en , F satisfying E( e~ 2n j F~ n1 ) K 2 and E(j e~ n k F~ n1 ) K 1 a:s: for all n m
(6:44)
with m, K sufficiently large. Let {an } be a sequence of constants and m ¼ n0 , n1 , . . . a sequence of nonrandom positive integers such that X n[bi
a2n . 0 for all i,
(6:45)
282
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
where the batch bi is defined by bi ¼ {ni1 , n ni }. Further, define ui ¼
X
! an e~ n
n[bi
X
!1=2 , Gi ¼ F~ ni :
a2n
(6:46)
n[bi
Then {ui , Gi : i 1} is a m.d. sequence and sup E(u2n jGn1 ) K 2 , inf E(jun kGn1 ) cK a:s: n
n
(6:47)
where cK is a positive constant that depends only on K. Proof. ui form a m.d. sequence because, up to a constant factor, E(un jGn1 ) is P ~ n j F~ ni1 ) ¼ 0: n[bi an E( e From the first inequality in Eq. (6.44) we see that the first condition in Eq. (6.47) is met: 2 E4
X
!2 an e~ n
n[bi
3 j F~ ni1 5 ¼
X
a2n E[E( e~ 2n j F~ n1 )j F~ ni1 ]
n[bi
K2
X
a2n
n[bi
Let us prove the lower bound in Eq. (6.47). Lai and Wei (1983b) modeled this argument on (Barlow, 1975, p. 845). Take A [ F~ ni1 with P(A) . 0 and consider the stochastic sequence {an e~ n 1(A), F~ n : n . ni1 } on the probability space with the probability measure PA defined by PA (B) ¼ P(B > A)=P(A): This sequence is a m.d. sequence under PA : E(an e~ n 1(A)j F~ n1 ) ¼ an 1(A)E( e~ n j F~ n1 ) ¼ 0: Denoting EA X ¼ E1(A)X=P(A), from condition (6.44) we have EA j e~ n 1(A)j ¼ EA [1(A)E(j e~ n k F~ n1 )] EA K 1 ¼ K 1 , EA ( e~ n 1(A))2 ¼ EA [1(A)E( e~ 2n j F~ n1 )] K 2 and the consequence is that EA jan e~ n 1(A)j jan jK 1 K 2 [EA (an e~ n 1(A))2 ]1=2 , n . ni1 : By M– Z Theorem II (Section 6.3.3) it follows that there exists a constant dK [ (0,1) depending only on K and such that for n . ni1 : 2 X a e~ 1(A) dK 4EA EA n ,jn j j i1
X ni1 ,jn
!2 31=2 aj e~ j 1(A) 5 :
(6:48)
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
283
We note that for n . ni1 EA
X
2
!2 aj e~ j 1(A)
¼ E4
ni1 ,jn
3
!2
X
aj e~ j
1(A)5
P(A)
ni1 ,jn
X
¼
a2j E[E( e~ 2j j F~ j1 )1(A)] P(A)
ni1 ,jn
[applying the conditional Jensen inequality (Section 6.1.6)] X 2 a2j E [E(j e~ j k F~ j1 )] 1(A)=P(A) ni1 ,jn
[using the second inequality in Eq. (6.44)] X a2j E1(A)=P(A) K 2 ni1 ,jn
X
¼ K 2
(6:49)
a2j :
ni1 ,jn
The conclusion from Eqs. (6.48) and (6.49) is that !1=2 X X a e~ 1(A)=P(A) dK K 1 a2j : E j[b j j j[b i
i
Since this inequality holds for all A [ F~ ni1 with P(A) . 0, it follows from the definition of conditional expectation that E(jui jjGi1 ) dK K 1 a.s. So, the second condition in Eq. (6.47) is satisfied. B
6.3.5 Lemma on Leading Terms Lemma. Let { e~ n , F~ n } be a m.d. sequence and let {an } be a sequencePof constants satisfying the condition of nontriviality of batches (6.45). Define Sn ¼ ni¼1 ai e~ i , X
Ai ¼
!1=2 a2n
=0
(6:50)
n[bi
and let ui , Gi be as in Eq. (6.46). Then the leading terms in the expression 8 > > > <
k P
Pk
i¼1
1 2 P Sni1
u2i
S2ni =A2i are given by
þ O(1) on ,1 k X A2 S2ni i¼1 i¼1 i ¼ k S2 k 1 S2 > P P P A2i ni1 ni1 > 2 i¼1 > þ ui on ¼1 : (1 þ o(1)) A2 A2 i¼1
i
i¼1
i¼1
i
(6:51)
284
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Proof. Start with the identity !2 X S2ni ¼ Sni1 þ an e~ n A2 i A2i n[bi 2 !2 3 X X ¼ 4S2ni1 þ 2Sni1 an e~ n þ an e~ n 5A2 i n[bi
n[bi
1 2 ¼ S2ni1 A2 i þ 2(Sni1 Ai )ui þ ui :
Summation over i gives k X
S2ni =A2i ¼
i¼1
k X
S2ni1 A2 i þ2
i¼1
k X
(Sni1 A1 i )ui þ
i¼1
k X
u2i :
(6:52)
i¼1
is Gi1 -measurable, we can put p ¼ 2, Xi ¼ (Sni1 A1 Since Sni1 A1 i i )ui , F i ¼ Gi in the martingale convergence theorem (Section 6.1.3) and use Eq. (6.47) to conclude that ( ) k 1 X X 1 2 2 (Sni1 Ai )ui converges on Sni1 Ai , 1 : (6:53) i¼1
i¼1
In the martingale strong law (Section 6.1.4), in addition to the above choice, let us take P Un ¼ ni¼1 S2ni1 A2 i . On the set U ¼ {limn!1 Un ¼ 1} we have 1 X
Ui2 E(Xi2 jF i1 ) sup E(u2n jF n1 ) n
i¼1
1 X
2 2 (Sni1 A1 ,1 i ) Ui
i¼1
[see Eq. (6.11) for the proof of convergence of the last series]. By Theorem 6.1.4 limn!1 Sn =Un ¼ 0 on U, that is, ( ) ! k k 1 X X X 1 2 2 2 2 (Sni1 Ai )ui ¼ o Sni1 Ai Sni1 Ai ¼ 1 : (6:54) on i¼1
i¼1
i¼1
Equations (6.52), (6.53) and (6.54) prove Eq. (6.51).
B
6.3.6 Theorem on Almost Sure Convergence of a Series with Square-Summable Coefficients (One-Dimensional Case) Theorem. (Lai and Wei, 1983b, Corollary 2) Let {en , F n } be a m.d. sequence satisfying the local M – Z condition (6.14). Let {an } be a sequence of constants such that 1 X i¼1
a2i , 1 and ai = 0 for infinitely many i:
(6:55)
285
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
Then the series
P1
i¼1
ai ei converges a.s. and P
1 X
! ai ei ¼ Y
¼0
(6:56)
i¼1
for every random variable Y that is F p -measurable for some p 1: Hence, in partiP1 P cular, 1 i¼1 ai ei has a nonatomic distribution and P i¼1 ai ei ¼ c ¼ 0 for any constant c: When the m.d. sequence {en , F n } satisfies the stronger condition (6.13), that a i¼1 i ei has a nonatomic distribution for constants an satisfying Eq. (6.55) was established by Barlow (1975). P1
P1 Proof. The a.s. convergence of i¼1 ai ei follows from the generalized M – Z theorem I (Section 6.3.1). To prove Eq. (6.56), assume the contrary that ! 1 X ai ei Y ¼ 0 ¼ 3h . 0 (6:57) P i¼1
for some F p -measurable Y: By Egorov’s theorem there exists an event V0 such that P(V0 ) 1 h and
1 X
ai ei (v) converges uniformly for v [ V0 :
(6:58)
i¼1
By the approximation lemma (Lemma 6.3.2), for the given h there exist positive integers m and K and a m.d. sequence { e~ n , F~ n } satisfying conditions (6.36) and (6.37) and such that F n # F~ n for all n. Moreover, m can be taken larger than p: Let ! m1 n X X ai ei Y þ ai e~ i , n m: (6:59) Sn ¼ i¼m
i¼1
As the term in the parentheses is F m1 -measurable and F n # F~ n , {Sn , F~ n : n m} is a martingale: E(Sn j F~ n1 ) ¼
m1 X
! ai ei Y þ
V1 ¼
1 X i¼1
ai e~ i þan E( e~ n j F~ n1 ) ¼ Sn1 :
i¼m
i¼1
Denote (
n1 X
) ai ei Y ¼ 0 , V2 ¼
\ nm
{en ¼ e~ n }, V3 ¼ V0 > V1 > V2 :
286
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
From Eqs. (6.57), (6.58) and (6.59) we see that Sn ¼
n X
ai ei Y converges uniformly to 0 on V3 :
(6:60)
i¼1
By Eq. (6.37), (6.57) and (6.58) we have P( V2 ) h, P( V1 ) ¼ 1 3h and P( V0 ) h (the bars stand for complements). Hence, P(V3 ) ¼ 1 P( V3 ) ¼ 1 P( V0 < V1 < V2 ) ¼ 1 P( V0 ) P( V1 ) P( V2 ) þ P( V0 > V1 ) þ P( V0 > V2 ) þ P( V1 > V2 ) P( V0 > V1 > V2 ) 1 P( V0 ) P( V1 ) P( V2 ) h: We now define nonrandom positive integers m ¼ n0 , n1 , inductively as follows.Having defined ni1 , we can choose by Eq. (6.55) an index n . ni1 such that an = 0: By Eq. (6.60), we can then choose ni . n in such a way that supv[V3 jSni (v)j jan j=i: With the numbers Ai defined in Eq. (6.50) this choice ensures that S2ni (v) A2i =i2 for all v [ V3 and that Ai . 0 for all i, so that 1 X
S2ni A2 i
i¼1
1 X
i2 , 1 on V3 with P(V3 ) . 0:
(6:61)
1
Having defined the integers ni , we then define ui and Gi as in Eq. (6.46) and obtain by Lemma 6.3.4 that {ui , Gi } is a m.d. sequence satisfying the local M – Z condition Eq. (6.47). Taking in Lemma 6.2.7 wi ¼ 1 for all i and r ¼ 2 we see that the set V from that lemma equals V and lim inf n!1
n 1X u2 . 0 a:s: n 1 i
(6:62)
The Sn defined in Eq. (6.59) differs from the Sn defined in Lemma 6.3.5 by the term Y which is F p -measurable and does not affect the proof of Lemma 6.3.5. Hence, Eq. (6.51) is applicable in the current situation. Equations (6.51) and (6.62) lead to P B the conclusion ki¼1 S2ni =A2i ! 1 a.s., which contradicts Eq. (6.61).
6.3.7 Multivariate Local Marcinkiewicz –Zygmund Condition The assumption to be made in the multivariate case is stronger than the local M –Z condition [Eq. (6.14)] in that in the first part of Eq. (6.14) higher powers of the m.d.’s are used: sup E(jen ja jF n1 ) , 1 a:s: with some a . 2: n
(6:63)
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
287
Besides, under this condition the second part of Eq. (6.14) becomes equivalent to lim inf E(e2n jF n1 ) . 0 a:s: n!1
(6:64)
Indeed, if lim inf n!1 E(jen jjF n1 ) . 0 a.s., then Eq. (6.64) holds because of the inequality E(jen jjF n1 ) [E(e2n jF n1 )]1=2. Conversely, if Eq. (6.64) is true, then by the conditional Ho¨lder inequality with p ¼ (a 1)=(a 2), q ¼ a 1 we have 1=p þ 1=q ¼ 1, 1=p þ a=q ¼ 2 and a=2 jF n1 ) E(e2n jF n1 ) ¼ E(e1=pþ n
[E(jen jjF n1 )]1=p [sup E(jen ja jF n1 )]1=q : n
Hence, lim inf n!1 E(jen jjF n1 ) . 0 follows. If en ¼ (en1 , . . . , end )0 is a column vector in Rd , we replace the positivity of 2 E(en jF n1 ) in Eq. (6.64) by the positive definiteness of the covariance matrix E(en e0n jF n1 ) or, equivalently, by the positivity of its least eigenvalue lmin [E(en e0n jF n1 )]: Thus, by a multivariate local M – Z condition we mean 9 sup E( ken ka jF n1 ) , 1 a:s: with some a . 2 > = n and > ; lim inf lmin [E(en e0n jF n1 )] . 0:
(6:65)
n!1
6.3.8 Generalized Marcinkiewicz – Zygmund Theorem, Multivariate Case Theorem. (Lai and Wei, 1983b, Corollary 3) Let {en , F n } be a vector m.d. sequence satisfying Eq. (6.65). Let {wn } be a premeasurable sequence of vectors wn ¼ (wn1 , . . . , wnd )0 of the same dimension as en . Then, except for a null set, the following statements are equivalent: P1 (i) kwi k2 , 1, P11 0 (ii) 1 wi ei converges, P (iii) supn n1 w0i ei , 1, P1 0 2 (iv) 1 jwi ei j , 1. Proof. The idea is to reduce the multivariate case to the scalar case considered in Section 6.3.1. Put ui ¼
w0i ei 1(wi = 0) þ ei1 1(wi ¼ 0): kwi k
(6:66)
Then sup E(jun ja jF n1 ) sup E( ken ka jF n1 ) , 1 a:s: n
n
(6:67)
288
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Moreover, if wn = 0, then E(u2n jF n1 ) ¼ w0n E(en e0n jF n1 )wn = kwn k2 lmin [E(en e0n jF n1 )]: When wn ¼ 0, a similar inequality is true with wn replaced by (1, 0, . . . , 0)0 [ Rd : Therefore lim inf E(u2n jF n1 ) lim inf lmin [E(en e0n jF n1 )] . 0 a:s: n!1
n!1
(6:68)
By the argument in Section 6.3.7 it follows from Eqs. (6.67) and (6.68) that {un } satisfies the local M –Z condition. That it is a m.d. sequence is easily seen from Eq. (6.66). Since w0i ei ¼ kwi kui and kwi k is F i1 -measurable, items (i) – (iv) from Section 6.3.1 rewrite as items (i) – (iv) from this section. B
6.3.9 Convergence of a Series with Constant Vector Coefficients Theorem. (Lai and Wei, 1983b, Corollary 4) Let {en , F n } be a m.d. sequence satisfying the multivariate local M – Z condition Eq. (6.65). Suppose {An } is a sequence of nonrandom vectors from Rd such that 1 X
kAi k2 , 1 and Ai = 0 for infinitely many i:
(6:69)
i¼1
P1 0 P 0 Then the series 1 i¼1 Ai ei converges a.s. and P i¼1 Ai ei ¼ Y ¼ 0 for every random variable Y that is F p -measurable for some p 1: Proof. In the definitions of Section 6.3.8 replace wi by Ai to see that {ui } is a m.d. sequence satisfying the univariate local M – Z condition. As a result of the equality A0i ei ¼ kAi kui we can put ai ¼ kAi k. Then condition of Eq. (6.69) translates into Eq. (6.55) and the statement follows from Theorem 6.3.6. B
6.3.10 Wei’s Bound on Martingale Transforms Theorem. (Wei, 1985, Lemma 2) Let {en , F n } be a m.d. sequence such that, with some a . 2, supn E(jen ja jF n1 ) , 1 a.s. and let {wn } be a premeasurable sequence Pn 2 1=2 of random variables. Define sn ¼ : Then for any d . 1= min {a, 4} 1 wi n X
wi ei ¼ O(sn ( log sn )d ) a:s:
(6:70)
1
Furthermore, if jwn j ¼ o(scn ) for some 0 , c , 1,
(6:71)
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
289
then n X
wi ei ¼ O(sn ( log log sn )1=2 ) a:s:
(6:72)
1
The proof is omitted because it uses stochastic processes in continuous time. Instead, in Section 6.3.11 we give a simpler statement Lai and (1982) witha proof. P1from Wei P 1 2 2 w ¼ 1 and Note that this theorem covers both cases 1P i 1 wi , 1 : In the n latter case Eqs. (6.70) and (6.72) become w e ¼ O(1): Eq. (6.71) becomes i i 1 P 2 wn ¼ o(1) and trivially follows from 1 w , 1: i 1
6.3.11 A Simple Bound on Martingale Transforms The following statement is stronger than Lemma 6.2.2. Lemma.
If the m.d. sequence {en , F n } satisfies sup E(e2n jF n1 ) , 1
(6:73)
n
and {wn } is premeasurable, then P1 2 P1 (i) 1 wi ei converges a.s. on U ¼ 1 wi , 1 ; Pn 2 1=2 (ii) For every h . 1=2 with sn ¼ we have 1 wi ( ) n 1 X X h 2 wi ei ¼ o(sn ( log sn ) ) a:s: on VnU ¼ wi ¼ 1 : 1
1
Proof. By the truncation argument from Section 6.2.1 wn en can be considered P1 2 square-integrable, and Eq. (6.7) ensures that U ¼ U ¼ 1 (wi ) , 1 . In the proof we omit the stars. Statement (i) was actually obtained in the course of the proof of Theorem 6.3.1, see implication (i) ) (ii) in the proof of that theorem. (ii) {wn en } is a m.d. sequence. Un ¼ sn ( log sn )1=2 is nondecreasing, positive and F n1 -measurable. Obviously, 1 X
E((wi ei )2 jF i1 )Ui2
i¼1
1 X
w2i Ui2 sup E(e2n jF n1 ):
i¼1
n
Replace the function f (x) ¼ x2 in the proof of Lemma 6.2.2 by f (x) ¼ [x( log x)2h ]1 , to get with sufficiently large m 1 X w2
i U2 i¼m i
U 1 ði ð 1 X 1 dx ¼ dx , 1 on U 2 U x( log x)2h i¼m i Ui1
Um1
because 2h . 1: Now the statement follows from Theorem 6.1.4.
B
290
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
6.3.12 Bounds for Weighted Sums of Squared Martingale Differences Assume the conditions of Lemma 6.3.11. Then P1 P1 2 (i) 1 jwi jei , 1 a.s. on T ¼ 1 jwi j , 1 ; (ii) For every r . 1
Lemma.
n X
jwi je2i
n X
¼o
1
!r ! a:s: on VnT:
jwi j
(6:74)
1
Proof (i) The centered squares di ¼ e2i E(e2i jF i1 ) are m.d.’s and satisfy E(jdi jjF i1 ) 2 supn E(e2n jF n1 ): In the identity 1 X
jwi je2i ¼
1
1 X
jwi j di þ
1 X
1
jwi jE(e2i jF i1 )
1
the second series on the right clearly converges on T: The first series converges by the martingale convergence theorem (Theorem 6.1.3) because {jwi jdi } is a m.d. sequence and 1 X
E(jwi di jjF i1 ) sup E(jdn jjF n1 ) n
1
1 X
jwi j , 1
1
on T: (ii) As a first step we prove 1 X
r jwi je2i c i , 1 a.s. on VnT,
(6:75)
1
where ci ¼ implies
Pi
1
jwj j and, by definition, 0=0 ¼ 0. The assumption r . 1
1 X jwi j i¼1
cri
¼
1 X ci ci1 i¼1
cri
1 ð
dx , 1 a.s. on VnT: xr
c1
r Applying statement (i) with jwi jc i in place of jwi j we obtain Eq. (6.75). To apply Kronecker’s lemma (Lemma 6.1.12), denote xi ¼ jwi je2i , Pn ai ¼ cri : i¼1 xi =ai converges by Eq. (6.75) and {an } monotonically increases to 1 on VnT: Hence, n X 1
r jwi je2i c n ¼ 1=an
n X i¼1
xi ! 0: B
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
291
6.3.13 Precise Order of Weighted Squares of Martingale Differences Lemma. If the m.d. sequence {en , F n} satisfies the M-Z condition Eq. (6.65) (in the 1-D case) and {wn } is premeasurable, then Pn Pn jwi je2i jwi je2i 1 lim sup P1 n ,1 0 , lim inf Pn n!1 n!1 1 jwi j 1 jwi j ( ) 1 X on V ¼ jwi j ¼ 1, sup jwi j , 1 : (6.76) i
i¼1
Proof. The left inequality in Eq. (6.76) is proved in Lemma 6.2.7. Note that the right inequality strengthens Eq. (6.74) by allowing r ¼ 1: Let us prove it. Take some r [ (1, min {2, a=2}) and denote di ¼ e2i E(e2i jF i1 ): From the elementary inequality ja bjr cr (ar þ br ), conditional Jensen’s inequality (Section 6.1.6) and M-Z condition [Eq. (6.65)] we get E(jdi jr jF i1 ) cr E{jei j2r þ [E(e2i jF i1 )]r jF i1 } 2cr sup E(jen j2r jF n1 ) , 1 a:s:
(6.77)
n
The products jwi jdi are m.d.’s. Now consider two cases: 1. Suppose 1 X
jwi jr ¼ 1:
(6:78)
1
Denoting Ui ¼
Pi
1
jwj jr , by Eqs. (6.77) and (6.78) we have
1 X E(jwi di jr jF i1 ) i¼m
Uir
c
1 X jwi jr
Uir i¼m
¼c
1 X Ui Ui1 i¼m
Uir
U 1 ði ð 1 X dx dx c ¼ c , 1 a:s:, r x xr i¼m Ui1
Um1
where c depends on v and Um1 . 0. By the martingale strong law (Section 6.1.4) ! n n X X r (6:79) Eq: (6:78) ) jwi j di ¼ o jwi j a:s: i¼1
i¼1
2. Next assume the opposite of Eq. (6.78): 1 X 1
jwi jr , 1:
(6:80)
292
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
P P1 r r Since by Eq. (6.77) 1 i¼1 E(jwi di j jF i1 ) c 1 jwi j , 1 in this case, the martingale convergence theorem (Theorem 6.1.3) shows that n X
Eq: (6:80) )
jwi j di ¼ O(1) a:s:
(6:81)
i¼1
Now we are able to prove n X
jwi j di ¼ o
i¼1
n X
! a:s:
jwi j
(6:82)
i¼1
From the bound n X
jwi j ¼ sup jwj j
n X
j
1
i¼1
n X jwi j sup jwj j1r jwi jr supj jwj j j i¼1
we see that n X
r
jwi j ¼ O
i¼1
n X
! jwi j
on V:
(6:83)
i¼1
Take v [ V: If Eq. (6.78) is true, then Eq. (6.82) follows from Eqs. (6.79) and (6.83). If Eq. (6.80) is true, then Eq. (6.82) is a consequence of P Eq. (6.81) and 1 1 jwi j ¼ 1: Finally, Eq. (6.82) and the M-Z condition (Section 6.2.3) imply n X
jwi je2i
1
¼
n X
jwi j di þ
i¼1
n X
jwi jE(e2i jF i1 )
¼O
i¼1
n X
! jwi j
i¼1
on V: B
6.4 STRONG CONSISTENCY FOR MULTIPLE REGRESSION 6.4.1 Notation In the multiple regression model yn ¼ Xn b þ 1n , n ¼ 1, 2, . . . , where 1n ¼ (e1 , . . . , en )0 we assume that {en , F n } is a m.d. sequence, the parameter vector is p 1 and the matrix Xn is of size n p: It is assumed that, as n grows, new equations are appended to the system and the previous equations are not changed. Further, for each n the nth row (xn1 , . . . , xnp ) is F n1 -measurable. The nth row is written as the transpose of xn ¼ (xn1 , . . . , xnp )0 and, therefore, if Xn0 Xn is nonsingular, the least squares estimate
6.4 STRONG CONSISTENCY FOR MULTIPLE REGRESSION
293
bn of b becomes bn ¼
(Xn0 Xn )1 Xn0 yn
¼bþ
n X
!1 xi x0i
1
n X
xi ei :
1
It seen that the statistical properties of bn are related to the martingale transform Pis n x 1 i ei and the random matrix An ¼
n X
xi x0i :
1
That this matrix is a sum of nonnegative definite matrices xi x0i is one of the leading ideas in Section 6.4. In particular, some important properties of the sequence {An } depend on quadratic forms x0k A1 k xk , which we call increments. All statements in Section 6.4 are taken from Lai and Wei (1982). For a p p symmetric matrix A we denote by
lmin (A) ¼ l1 (A) lp (A) ¼ lmax (A)
(6:84)
its eigenvalues. jAj stands for the determinant of A:
6.4.2 Lemma on Increments Lemma. Let B be a p p matrix and let w be a p 1 vector. If A ¼ B þ ww 0 is nonsingular, then w 0 A1 w ¼ 1 jBj=jAj. Proof. By the partitioned matrix rule [Lemma 1.7.6(i)] 1 w0 0 ¼ jAj(1 w0 A1 w), jBj ¼ jA ww j ¼ w A which gives the desired result.
B
6.4.3 The Link between Increments and lmax (An) Lemma. Let x1 , x2 , . . . be p 1 vectors and let An ¼ nonsingular for some N. Then
Pn
0 1 xi xi .
Suppose that AN is
(i) lmax (An ) is nondecreasing and An is nonsingular for all n N. P1 0 1 x Ak xk , 1. Pnk¼N 0k 1 (iii) If limn ! 1 lmax (An ) ¼ 1, then k¼N xk Ak xk ¼ O( log lmax (An )): (ii) If limn ! 1 lmax (An ) , 1, then
Proof. (i) By (Gohberg and Kreı˘n (1969), Lemma 1.1) the inequality Ak Ak1 implies
lj (Ak ) lj (Ak1 ), j ¼ 1, . . . , p:
(6:85)
294
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
In particular, lmax (An ) is nondecreasing. If AN is nonsingular, then lmin (AN ) is positive and all An with n N are nonsingular. (ii) The equation Ak ¼ Ak1 þ xk x0 k and Lemma 6.4.2 imply n X
n X
x0k A1 k xk ¼
k¼Nþ1
(1 jAk1 j=jAk j), n N þ 1:
(6:86)
lj (An ) [lmin (An )]p , n N þ 1:
(6:87)
k¼Nþ1
Obviously, Eq. (6.85) implies [lmax (An )]p jAn j ¼
p Y j¼1
If limn!1 lmax (An ) , 1, then Eqs. (6.86) and (6.87) give n X
n X
x0k A1 k xk ¼
k¼Nþ1
(jAk j jAk1 j)=jAk j
k¼Nþ1
lp min (ANþ1 )
n X
(jAk j jAk1 j)
k¼Nþ1
¼ lp min (ANþ1 )(jAn j jAN j) [lmax (An )=lmin (ANþ1 )]p c, n N þ 1: (iii) As a result of Eq. (6.85) we have jAk j jAk1 j and jA ðk j
jAk j jAk1 j ¼ jAk j
dx jAk j
jAk1 j
jA ðk j
dx ¼ ln jAk j ln jAk1 j: x
jAk1 j
Summing these inequalities and applying Eqs. (6.86) and (6.87) we get n X
x0k A1 k xk
k¼Nþ1
n X
( ln jAk j ln jAk1 j)
k¼Nþ1
¼ ln jAn j ln jAN j ¼ O( ln jAn j) ¼ O{log [lmax (An )]}: B
6.4.4 Lemma on Recursions Define N ¼ inf {n : Xn0 Xn is nonsingular}, inf f ¼ 1, and for n N consider the quadratic form 0 Qn ¼ 10n Xn (Xn0 Xn )1 Xn0 1n ¼ 10n Xn A1 n Xn 1n :
6.4 STRONG CONSISTENCY FOR MULTIPLE REGRESSION
295
Lemma. Let us partition Xn into rows, Xn ¼ (x1 , . . . , xn )0 , and denote qk ¼ x0k A1 k1 xk . Then 1 1 0 1 A1 k ¼ Ak1 Ak1 xk xk Ak1 =(1 þ qk ), 2 Pk1 Pk1 x0k A1 xi ei x0k A1 xi ei ek k1 k1 1 1 Qk ¼ Qk1 þ2 1 þ qk 1 þ qk 2 þ x0k A1 k xk ek ,
and for n . N n X
Qn QN þ
(6.88)
(6.89) 2 Pk1 x0k A1 x e i i k1 1
1 þ qk Pk1 n x0k A1 x e X i i ek k1 1 k¼Nþ1
¼2
1 þ qk
k¼Nþ1
þ
n X
2 x0k A1 k xk ek :
(6.90)
k¼Nþ1
Proof. To prove Eq. (6.88), it is enough to check that premultiplication of the matrix 0 Xk1 þ xk x0k gives the identity matrix. at the right of Eq. (6.88) by Ak ¼ Xk1 0 1 Remembering that Xk Xk Ak ¼ I we have
0 1 A1 0 0 1 k1 xk xk Ak1 (Xk1 Xk1 þ xk xk ) Ak1 1 þ qk 0 1 xk x0k A1 xk (x0k A1 k1 k1 xk )xk Ak1 þ xk x0k A1 k1 1 þ qk 1 þ qk
1 qk xk x0k A1 k1 xk x0k A1 ¼I þ 1 k1 1 þ qk 1 þ qk
qk qk ¼Iþ xk x0k A1 k1 ¼ I: 1 þ qk 1 þ qk
¼I
Now we prove Eq. (6.89). By Eq. (6.88) for k . N 0 1 x0k A1 k ¼ xk Ak1
0 1 (x0k A1 x0k A1 k1 xk )xk Ak1 k1 ¼ : 1 þ qk 1 þ x0k A1 k1 xk
For k . N the definition of Qn , and Eqs. (6.88) and (6.91) give
Qk ¼
k1 X
! x0i ei
þ
x0k ek
A1 k
1
¼
k 1 X 1
k 1 X
! xi ei þ xk ek
1
x0i ei A1 k
k 1 X 1
xi ei þ 2
x0k A1 k
k 1 X 1
! 2 xi ei ek þ x0k A1 k xk ek
(6:91)
296
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
X k1 0 1 A1 k1 xk xk Ak1 ¼ xi ei 1 þ qk 1 1 ! k1 X x0k A1 2 k1 þ2 xi ei ek þ x0k A1 k xk ek 1 þ qk 1 2 Pk1 Pk1 0 1 x0k A1 x e x A x e ek i i i i k1 k k1 1 1 ¼ Qk1 þ2 1 þ qk 1 þ qk k1 X
x0i ei
A1 k1
2 þ x0k A1 k xk ek ,
which proves Eq. (6.89). Sending the first two terms at the right of Eq. (6.89) to the left side and summing over k ¼ N þ 1, . . . , n we get Eq. (6.90). B
6.4.5 Bounding Qn Lemma.
Let the m.d. sequence {en , F n } satisfy sup E(e2n jF n1 ) , 1 a:s:
(6:92)
n
and let {xn } be premeasurable. Put L,1 ¼ {N , 1, lim lmax (An ) , 1}, n!1
L¼1 ¼ {N , 1, lim lmax (An ) ¼ 1}: n!1
The following statements are true: (i) Qn ¼ O(1) a.s. on L,1 . (ii) For every d . 0 Qn ¼ o([log lmax (An )]1þd ) a:s: on L¼1 :
(6:93)
(iii) If Eq. (6.92) is replaced by sup E(jen ja jF n1 ) , 1 a:s: for some a . 2,
(6:94)
n
then Eq. (6.93) can be strengthened into Qn ¼ O(log lmax (An )) a:s: on L¼1 :
(6:95)
6.4 STRONG CONSISTENCY FOR MULTIPLE REGRESSION
297
Proof. Pk1 (i) Let wk ¼ x0k A1 k1 1 xi ei =(1 þ qk ) if k . N and wk ¼ 0 if k N. Since {wn } is premeasurable, it follows from Lemma 6.3.11(ii) that 2
n X
wk ek ¼ o4
k¼Nþ1
n X
!1=2 log
!
( 1 X
k¼Nþ1 n X
¼o
n X
w2k
w2k
!h 3 5
k¼Nþ1
w2k
on
) w2k
¼1 :
Nþ1
k¼Nþ1
However, by Lemma 6.3.11(i) n X
wk ek ¼ O(1) on
( 1 X
) w2k
,1 :
Nþ1
k¼Nþ1
The two bounds above imply ! Pk1 n n X X x0k A1 xi ei 2 k1 1 ek ¼ o wk þ O(1) 1 þ qk k¼Nþ1 k¼Nþ1 0 !2 1 Pk1 n 0 1 X xk Ak1 1 xi ei A þ O(1) ¼ o@ 1 þ qk k¼Nþ1 0
n B X ¼ o@
x0k A1 k1
k¼Nþ1
Pk1 1
1 þ qk
xi ei
2 1 C A þ O(1):
(6.96)
P 0 1 0 1 By Lemma 6.4.3(ii) 1 k¼Nþ1 xk Ak xk , 1 on L,1 . Because xk Ak xk is F k1 -measurable, by Lemma 6.3.12(i) 1 X
2 x0k A1 k xk ek , 1 on L,1 :
(6:97)
k¼Nþ1
Equations (6.96) and (6.97) allow us to use Eq. (6.90) to conclude that
Qn þ
n X
2 Pk1 x0k A1 xi ei k1 1
k¼Nþ1
1 þ qk
This proves statement (i).
(1 þ o(1)) ¼ O(1) a:s: on L,1 : (6:98)
298
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
(ii) In the event L¼1 , by Lemmas 6.4.3(iii) and 6.3.12(ii) for every d . 0 2 !1þd 3 n n X X 2 4 5 x0k A1 x0k A1 k xk ek ¼ o k xk k¼Nþ1
k¼Nþ1 1 X
¼ o[( log lmax (An ))1þd ] if
x0k A1 k xk ¼ 1
k¼Nþ1
and by Lemma 6.3.12(i) n X
1 X
2 x0k A1 k xk ek ¼ O(1) ¼ o(log lmax (An )) if
k¼Nþ1
x0k A1 k xk , 1:
k¼Nþ1
Consequently, for every d . 0 n X
2 1þd x0k A1 ] on L¼1 : k xk ek ¼ o[( log lmax (An ))
(6:99)
k¼Nþ1
As above, we apply Eqs. (6.96), (6.99) and (6.90) to derive
Qn þ
n X
x0k A1 k1
Pk1 1
xi ei
2 (1 þ o(1))
1 þ qk
k¼Nþ1
¼ o[( log lmax (An ))1þd ] on L¼1
(6:100)
for every d . 0. The proof of Eq. (6.93) is complete. (iii) Suppose Eq. (6.94) holds. By Lemma 6.4.2 x0k A1 k xk ¼ 1 jAk1 j=jAk j 1 and therefore ( ) ( ) 1 1 X X 0 1 0 1 0 1 sup xk Ak xk , 1, xk Ak xk ¼ 1 ¼ xk Ak xk ¼ 1 : k
k¼Nþ1
k¼Nþ1
By Lemmas 6.3.13 and 6.4.3(iii) n X
2 x0k A1 k xk ek ¼ O
k¼Nþ1
n X
! x0k A1 k xk
¼ O( log lmax (An )):
k¼Nþ1
Using this relationship instead of Eq. (6.99) in the proof of Eq. (6.100) we finish the proof of Eq. (6.95). B
6.4.6 Case of One Regressor The beauty of Lemma 6.4.5 is that the convergence properties established in Lemma 6.3.11 are extended to the multiple regression case without deterioration of the rates of convergence. More importantly, Lemma 6.4.5 in the case of one regressor ( p ¼ 1)
6.4 STRONG CONSISTENCY FOR MULTIPLE REGRESSION
299
provides an improvement of Lemma 6.3.11(ii) under the assumption (6.94). This is the content of the next corollary. Corollary. Let the m.d. sequence {en , F n } satisfy Eq. (6.94) and suppose P1 2that the sequence of random weights {wn } is premeasurable. Then in the event 1 wi ¼ 1 n X
0" !#1=2 1 n n X X A a:s: w i e i ¼ O@ w2i log w2i
1
1
1
Proof. Set Xn ¼ (w1 , . . . , wn )0 in Lemma 6.4.5 to obtain An ¼ Xn0 Xn ¼
n X
w2i , lmax (An ) ¼
n X
1
Qn ¼
w2i , Xn0 1n ¼
n X
1
10n Xn (Xn0 Xn )1 Xn0 1n
¼
n X
wi ei ,
1
!2
n X
wi ei
1
!1 w2i
:
1
Then by Eq. (6.95) n X 1
!2 wi e i
n X 1
!1 w2i
¼ O log
n X 1
! w2i
on
( 1 X
) w2i
¼1 :
1
B
6.4.7 Strong Consistency of the Ordinary Least Squares Estimator in Stochastic Regression Models Consider the regression model yn ¼ Xn b þ 1n , where 1n ¼ (e1 , . . . , en )0 , and denote An ¼ Xn0 Xn . For nonrandom xij , Lai et al. (1978, 1979) proved that the condition A1 n ! 0 is necessary and sufficient for the strong consistency of the OLS estimator bn .3 This condition is equivalent to lmin (An ) ! 1. Now suppose that xij are random and lmin (An ) ! 1 a.s. Anderson and Taylor (1979) established the strong consistency of bn under the condition lmax (An ) ¼ O(lmin (An )), while Christopeit and Helmes (1980) weakened that condition to [lmax (An )]r ¼ O(lmin (An )) a.s. for some r . 1=2. The following theorem by Lai and Wei (1982) is a substantial improvement of those results. Theorem. Suppose that in the above regression model, {en , F n } is a m.d. sequence such that Eq. (6.94) holds. Further, assume that the nth row xn of Xn is F n1 -measurable, for all n, and that
lmin (An ) ! 1 a:s: and log lmax (An ) ¼ o(lmin (An )) a:s: 3
It would be good to include this result in the book but I could not find the complete proof.
(6:101)
300
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Then the least-squares estimator bn converges a.s. to b; in fact, ! log lmax (An ) 1=2 kbn bk ¼ O a:s: lmin (An )
(6:102)
Proof. Since lmin (An ) ! 1 a.s., we have N , 1 a.s. Note the identity 2 n 0 1=2 X 0 xi ei ¼ (A1=2 Xn0 1n ) A1=2 Xn0 1n ¼ 10n Xn A1 An n n n Xn 1n ¼ Qn : 1 It allows us to apply Lemma 6.4.5(iii) and condition (6.101) in writing 2 2 n n X 1 X 1=2 2 1=2 xi ei kAn k An xi ei kbn bk ¼ An 1 1 2
¼ [lmin (An )]1 Qn ¼ O([ log lmax (An )]=lmin (An )) ¼ o(1):
B
6.4.8 Another Version of the Strong Consistency Statement In Chapter 7 on nonlinear models there is a situation where condition (6.94) is too tough. To this end, the theorem below is useful. Theorem. If in Theorem 6.4.7 condition (6.94) is replaced by a weaker condition Eq. (6.92) and condition (6.101) is replaced by a stronger one
lmin (An ) ! 1 a:s: and [log lmax (An )]1þd ¼ o(lmin (An )) for some d . 0,
(6:103)
then the conclusion [Eq. (6.102)] is true. This follows from Lemma 6.4.5(ii).
6.5 SOME ALGEBRA RELATED TO VECTOR AUTOREGRESSION 6.5.1 The Model and History Consider the vector autoregressive model Yn ¼ BYn1 þ en , n ¼ 1, 2, . . .
(6:104)
where B is a p p nonrandom matrix with real elements and {en , F n } is a vector m.d. sequence satisfying sup E(ken ka jF n1 ) , 1 a.s. with some a . 2: n
(6:105)
6.5 SOME ALGEBRA RELATED TO VECTOR AUTOREGRESSION
301
Equation (6.104) is equivalent to Yn ¼ Bn Y0 þ
n X
Bni ei ,
(6:106)
i¼1
where Y0 is assumed to be F 0 -measurable. We denote Yn ¼ (Yn1 , . . . , Ynp )0 , en ¼ (en1 , . . . , enp )0 , B ¼ (bij )1i, jp , bi ¼ (bi1 , . . . , bip )0
(6.107)
(B is partitioned into rows). The least-squares estimate of bi based on observed Y1 , . . . , Ynþ1 is b^ k (n þ 1) ¼
n X i¼1
!1 Yi Yi0
n X
Yi Yiþ1,k ¼ bk þ
i¼1
n X
!1 Yi Yi0
i¼1
n X
Yi eiþ1,k ,
(6:108)
i¼1
where the inverse sign denotes the Moore – Penrose generalized inverse. In view of Eq. (6.108) the least-squares estimate is strongly consistent if and only if n X
!1 Yi Yi0
n X
Yi e0iþ1 ! 0 a:s:
(6:109)
yn ¼ b1 yn1 þ þ bp ynp þ 1n
(6:110)
i¼1
i¼1
The autoregressive AR( p) model
can be written in the vector form [Eq. (6.104)] with Yn ¼ ( yn , yn1 , . . . , ynpþ1 )0 , en ¼ (1n , 0, . . . , 0)0 ,
b1 b p1 bp : B¼ I p1 0
(6.111)
This B is called a companion matrix for Eq. (6.110). Everywhere we denote z1 , . . . , zp the eigenvalues of B (the roots of its characteristic polynomial) and put M ¼ max jzj j, m ¼ min jzj j. If m . 1, the process {Yn } is called purely explosive. If M 1, we say that it is nonexplosive. Section 6.5 is based on the paper Lai and Wei (1985). Table 6.1 describes the contributions of various authors prior to that paper. All these authors considered weak consistency. Lai and Wei established strong consistency for Eq. (6.110) in (Lai and Wei, 1983a) and for Eq. (6.104) in (Lai and Wei, 1985). Nielsen (2005) added a deterministic term (see the definition in Chapter 8) in Eq. (6.104). In Nielsen, (2008) he discovered that the least-squares estimator for a vector autoregression (VAR) of dimension larger than 1 may be inconsistent. To achieve consistency, we have to decompose Yn into stationary, unit root and explosive processes and require the unit root component to be of dimension 1, [see (Nielsen, 2008, pp. 3-4) for the decomposition
302
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
TABLE 6.1 Contributions to the Consistency Theory of Autoregressions
Author(s)
Case(s) considered
Mann and Wald (1943) Rubin (1950), Anderson (1959) Rao (1961) White (1958) Muench(1974), Stigum(1974)
M,1 m.1 p ¼ 2, jz1 j , 1, jz2 j , 1 p ¼ 1, jz1 j ¼ 1 Roots anywhere
and Assumption E that ensures consistency]. He mentions that Lai and Wei (1985) and Nielsen (2005) overlooked the possibility of singularity. Giving complete proofs is not an option here, and we just require the unit root component to be of dimension 1, remembering that this makes the argument by Lai and Wei correct.
6.5.2 Growth Rate of Natural Powers of B For x [ Rp denote kxk ¼ (x0 x)1=2 and for a p p matrix A denote kAk ¼ supkxk¼1 kAxk. Let l1 , . . . , lq , q p, denote all the distinct eigenvalues of B, and express it in its Jordan form (6:112) B ¼ C 1 DC, where D ¼ diag[D1 , . . . , Dq ], 0 1 lj 1 0 0 B 0 lj 1 0 C B C B C Dj ¼ B 0 0 lj C is an mj mj matrix, B . C . . . . . . .. A .. .. @ .. 0 0 0 lj P q mj is the multiplicity of lj j¼1 mj ¼ p , and C is a nonsingular p p matrix (over the complex field). Denote M ¼ maxjlj j, m ¼ max{mj : jlj j ¼ M}:
(6:113)
m is the multiplicity of the largest (in absolute value) eigenvalue of B. Lemma (i) There exists a constant c ¼ c(m, M) . 0 such that kBn k cnm1 M n for all natural n: (ii) (iii)
1 n n log kB k ¼ (1 þ o(1)) log M. 1 n n log kB k ¼ (1 þ o(1)) log m.
(6:114)
6.5 SOME ALGEBRA RELATED TO VECTOR AUTOREGRESSION
303
Proof. (i) [Varga, 1962, p. 65] proved that kBn k ¼ (1 þ o(1))cCmn 1 [lmax (B)]n(m1) ,
(6:115)
where c does not depend on B. The quantity Cmn 1 M n(m1) ¼ ¼
n(n 1) (n m þ 2) M n M m1 (m 1)! (1 1n) (1 m2 n ) m1 n n M m 1 (m 1)!M
(6.116)
is of order nm1 M n , which proves (i). (ii) From Eqs. (6.115) and (6.116) it follows that with 1 1n 1 m2 n c(n) ¼ [(m 1)!M m1 ] we have 1 1 m1 log kBn k ¼ log [(1 þ o(1))cc(n)] þ log n þ log M: n n n This proves (ii). (iii) is a consequence of (ii) because lmax (B1 ) ¼ 1=lmin (B) ¼ m1 .
B
6.5.3 The Meddling Middle Factor This lemma is used to show that in some situations for the estimation of lmax (An Cn A0n ) the factor in the middle, Cn , is asymptotically negligible. Lemma. Let A, C be p p matrices such that C is symmetric and nonnegative definite. Then
lmax (C)lmax (AA0 ) lmax (ACA0 ) lmin (C)lmax (AA0 ):
(6:117)
Proof. In the proof we use the fact that for any symmetric nonnegative definite matrix C
lmax (C) ¼ sup x0 Cx ¼ sup kCxk: kxk¼1
(6:118)
kxk¼1
Together with the equation kAk ¼ kA0 k this leads to kAk2 ¼ sup kA0 xk2 ¼ sup x0 AA0 x ¼ lmax (AA0 ): kxk¼1
kxk¼1
(6:119)
304
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Using the diagonalization C ¼ U 0 LU, where U is an orthogonal matrix, U 0 U ¼ I, and L is a diagonal matrix with the eigenvalues lj (C) on the main diagonal, we have a two-sided bound on the quadratic form x0 Cx:
lmin (C)kxk2 x0 Cx ¼ (Ux)0 LUx lmax (C)kxk2 : The left inequality in Eq. (6.117) is proved as follows:
lmax (ACA0 ) ¼ sup x0 ACA0 x ¼ sup (A0 x)0 C(A0 x) kxk¼1
kxk¼1
lmax (C) sup kA0 xk2 ¼ lmax (C)lmax (AA0 ): kxk¼1
The right inequality follows from sup x0 ACA0 x sup {lmin (C)kA0 xk2 } ¼ lmin (C)lmax (AA0 ): kxk¼1
B
kxk¼1
6.5.4 Lemma on an Enveloping Resolvent Let us call BF ¼
1 X
Bi F(Bi )0
i¼0
an enveloping resolvent (see related definitions in Section 8.4.1). Here F is a p p matrix. Lemma. If M , 1, then the equation X BXB0 ¼ F has a unique solution X ¼ BF for any right-hand side F. Proof. If there are two different solutions, X1 and X2 , then the difference X ¼ X1 X2 satisfies X BXB0 ¼ 0. Since kBk , 1, we arrive at a contradiction: 0 , kXk kBk2 kXk , kXk. Further, X ¼ BF is really a solution: X BXB0 ¼
1 X
Bi F(Bi )0
i¼0
1 X
Biþ1 F(Biþ1 )0 ¼ F: B
i¼0
6.5.5 Lemma on the Same Order Lemma.
Denote Vn ¼
Pn
i¼1
Yi Yi0 . lmax (Vn ) and
Pn
i¼1
kYi k2 are of the same order.
6.5 SOME ALGEBRA RELATED TO VECTOR AUTOREGRESSION
305
Proof. Denoting lj the eigenvalues of Vn and using properties of trace we have
lmax (Vn )
p X
li (Vn ) ¼ tr Vn ¼ tr
i¼1
¼
n X
Yi Yi0
i¼1
n X
tr Yi0 Yi ¼
n X
i¼1
kYi k2 plmax (Vn ):
(6.120)
i¼1
B
6.5.6 Towards the Lower Bound Let w(l) ¼ jB lIp j ¼ lp þ a1 l p1 þ þ ap be the characteristic polynomial of B and let a0 ¼ 1. The variables Zn ¼ Yn þ
p X
aj Ynj ¼
j¼1
p X
aj Ynj , n p,
(6:121)
j¼0
are instrumental in the estimation of lmin (Vn ) from below. P P Pn p n 0 2 0 p Z Z a Lemma. lmin i i i¼p i¼0 Yi Yi . j¼0 j lmin Proof. Let u be a unit vector, kuk ¼ 1. By Eq. (6.121) and Ho¨lder’s inequality p X
u0 Zi Zi0 u ¼
!2 aj u0 Yij
p X
j¼0
a2j
j¼0
p X
(u0 Yij )2
j¼0
and therefore n X
u
0
Zi Zi0 u
p X
i¼p
! a2j
p n X X
(u0 Yij )2
i¼p j¼0
j¼0
(in the inner sum replace i j ¼ k) ¼
p X
! a2j
ip n X X
(u0 Yk )2
i¼p k¼i
j¼0
(changing summation order)
¼
p X j¼0
! a2j
kþp n X X k¼0 i¼k
0
2
(u Yk ) ¼ p
p X j¼0
! a2j
n X
u0 Yk Yk0 u:
k¼0
The lemma follows from this inequality and that for any symmetric matrix A the equation lmin (A) ¼ inf kuk¼1 u0 Au is true. B
306
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
6.5.7 Representation in Terms of Errors Denote Cj ¼
j X
ai B ji , j ¼ 0, . . . , p 1:
(6:122)
i¼0
Lemma (i) Along with Eq. (6.121) there is the representation Zk ¼
p1 X
Cj ekj :
(6:123)
j¼0
(ii) Denoting p1 X
Ak,l ¼
0 e0m Ckm ,
(6.124)
0 (A0k,l e0l Ckl þ Ckl el Ak,l ),
(6.125)
j¼klþ1
Rn ¼ we have n X
m¼kpþ1
n lþp2 X X l¼2
l1 X
e0kj Cj0 ¼
k¼l
Zi Zi0 ¼
p1 X
i¼p
Cj
n X
j¼0
! ekj e0kj Cj0 þ Rn :
(6:126)
k¼p
Proof. (i) The recursion Eq. (6.106), applied to Ykj , j p, is used to reveal its dependence on Ykp : Ykj ¼ Bkj Y0 þ
kp X
Bkji ei þ
i¼1
¼B
pj
B
kp
kj X i¼kpþ1
Y0 þ
kp X
! B
kpi
ei þ
i¼1
¼ B pj Ykp þ
Bkji ei
kj X
kj X
Bkji ei
i¼kpþ1
Bkji ei
i¼kpþ1
(the sum in the expression above is empty when j ¼ p). Now we plug these expressions in Eq. (6.121) and use the fact that B satisfies its determinantal equation w(B) ¼ 0 by the Caley – Hamilton
6.5 SOME ALGEBRA RELATED TO VECTOR AUTOREGRESSION
307
theorem (see Herstein, 1975). For p k n
Zi ¼
p X
aj Yij ¼
j¼0
p X
! aj B
pj
Yip þ
p X
j¼0
j¼0
ij X
aj
Bijh eh
h¼ipþ1
[the first sum is null; in the second one change the summation order and apply notation Eq. (6.122)] i X
ih X
h¼ipþ1
j¼0
¼
! aj B
ijh
i X
eh ¼
p1 X
Cih eh ¼
Cj eij :
j¼0
h¼ipþ1
This is Eq. (6.123). P (ii) We substitute Eq. (6.123) in nk¼p Zk Zk0 , multiply through the inner sums and sort out the terms into groups with i ¼ j, i , j and i . j: n X
Zk Zk0
¼
k¼p
p1 n X X k¼p
! Ci eki
p1 X
i¼0
! e0kj Cj0
¼ Si¼j þ Si,j þ Si.j : (6:127)
j¼0
The terms at the right-hand side are rearranged as below:
Si¼j ¼
p1 n X X
Ci eki e0ki Ci0 ¼
k¼p i¼0
p1 X
n X
Ci
i¼0
Si,j ¼
! eki e0ki Ci0 ,
(6:128)
k¼p
p2 X p1 n X X
Ci eki e0kj Cj0
k¼p i¼0 j¼iþ1
(replacing i according to k i ¼ l)
¼
n k X X
p1 X
Ckl el
k¼p l¼kpþ2
¼
n lþp2 X X l¼2
k¼l
! e0kj Cj0
j¼klþ1 p1 X
Ckl el
! e0kj Cj0
j¼klþ1
and
Si.j ¼
p2 X p1 n X X k¼p j¼0 i¼jþ1
Ci eki e0kj Cj0
,
(6.129)
308
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
(replacing j according to k j ¼ l) ¼
n k X X
!
p1 X
0 Ci eki e0l Ckl
k¼p l¼kpþ2 i¼klþ1
¼
n lþp2 X X k¼l
l¼2
!
p1 X
0 Cj ekj e0l Ckl :
(6.130)
j¼klþ1
Collecting Eqs. (6.127) – (6.130) and using notations (6.124) and (6.125) we finish the proof of Eq. (6.126). B
6.5.8 Spectrum-Separating Decomposition Let M . 1 m. From the rational canonical form (over the real field) of the real matrix B (see Herstein, 1975, p. 307) it follows that there exists a nonsingular real matrix M such that B ¼ M 1
B1 0
0 M, B2
(6:131)
where B1 and B2 are real square matrices such that min jlj (B1 )j . 1, max jlj (B2 )j 1:
(6:132)
Substituting Eq. (6.131) in Yn ¼ BYn1 þ en and premultiplying the resulting equation by M we obtain
MYn ¼
B1 0
0 MYn1 þ Men : B2
With conformal partitioning
MYn ¼
jn Sn , Men ¼ Tn zn
(6:133)
the system breaks up into Sn ¼ B1 Sn1 þ jn , Tn ¼ B2 Tn1 þ zn :
(6:134)
The first of the processes in Eq. (6.134) is purely explosive and the second one is nonexplosive. As a result of nonsingularity of M estimating Vn is the same as estimating 0
MVn M ¼
n X 1
Pn
0
(MYi )(MYi ) ¼
0 1 Si Si Pn 0 1 Ti Si
Pn
0 1 Si Ti Pn 0 1 Ti Ti
! :
(6:135)
6.5 SOME ALGEBRA RELATED TO VECTOR AUTOREGRESSION
309
6.5.9 Exploiting Order Properties of Symmetric Matrices Lemma.
Let A be a p p symmetric positive definite matrix
(i) If A1 ¼ Ip þ B þ C, where B, C are symmetric, B is nonnegative definite and kCk , 1, then kAk 1=(1 kCk):
(6:136)
(ii) If A is partitioned as
A¼
H , Q
P H0
where P and Q are, respectively, r r and s s matrices such that p ¼ r þ s, then for u [ Rr
0
u u u0 P1 u(1 þ kA1 ktrQ): A1 0 0
(6:137)
Proof. (i) Note that
lmin (A1 ) ¼ inf x0 A1 x ¼ inf (x0 x þ x0 Bx þ x0 Cx) kxk¼1
kxk¼1
0
inf x x sup jx0 Cxj 1 kCk: kxk¼1
kxk¼1
Since kAk ¼ lmax (A) ¼ 1=lmin (A1 ), Eq. (6.136) follows. (ii) By the partitioned matrix inversion rule [Lemma 1.7.6(ii)] A1 ¼
P1 þ P1 HGH 0 P1 GH 0 P1
P1 HG , G
(6:138)
where (6:139) G1 ¼ Q H 0 P1 H is positive definite
0 is positive definite. Taking x2 [ Rs , 0 [ Rr and because G ¼ (0 I)A1 I 0 we have from Eq. (6.138) letting y ¼ x2 kGk ¼ sup x02 Gx2 ¼ sup y0 A1 y kx2 k¼1
sup p
x[R , kxk¼1
kx2 k¼1
0 1
x A x ¼ kA1 k:
(6.140)
310
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
By Eq. (6.139) tr(H 0 P1 H) ¼ trQ trG1 trQ. Therefore
lmax (P1=2 HH 0 P1=2 ) tr(P1=2 HH 0 P1=2 ) ¼ tr(H 0 P1 H) trQ: (6:141) In the equation
0
u 1 u ¼ u0 P1 u þ u0 P1 HGH 0 P1 u A 0 0
(6:142)
we need to estimate only the end term. Using the inequality v0 Cv kCkkvk2 (C is symmetric) twice we get u0 P1 HGH 0 P1 u ¼ (H 0 P1 u)0 G(H 0 P1 u) kGkkH 0 P1 uk
2
¼ kGku0 P1 HH 0 P1 u ¼ kGk(P1=2 u)0 P1=2 HH 0 P1=2 (P1=2 u) kGkkP1=2 HH 0 P1=2 kkP1=2 uk
2
¼ kGklmax (P1=2 HH 0 P1=2 )u0 P1 u:
(6.143)
Combining Eqs. (6.140) – (6.143) we obtain Eq. (6.137).
B
6.6 PRELIMINARY ANALYSIS 6.6.1 Bound on the Error Norm Lemma.
If the m.d. {en , F n } satisfies Eq. (6.105), then for any b . 1=a ken k ¼ o(nb ):
(6:144)
Proof. By the conditional Chebyshov inequality (Lemma 6.1.7) for any d . 0 P(kei ka . diab jF i1 )
1 c E(kei ka j F i1 ) ab : ab di di
As ab . 1, we get 1 X i¼1
P(kei ka . diab j F i1 )
1 cX iab , 1: d i¼1
Letting Zi ¼ 1(kei ka . diab ) in the conditional Borel– Cantelli lemma (Lemma 6.1.2) P a ab for all large i. Since this is we see that 1 i¼1 Zi , 1 a.s. This means that kei k di B true for any d . 0, we obtain Eq. (6.144).
6.6 PRELIMINARY ANALYSIS
311
6.6.2 Laws of Iterated Logarithm-Type Upper Bound The bound (6.145) below and, more generally, Eq. (6.72) is the same rate as is seen in the so-called laws of iterated logarithm; see Stout (1974). Lemma. Suppose that the (real-valued) m.d. sequence {en , F n } satisfies Eq. (6.105) and that l is a complex number with jlj ¼ 1. Denote R(a) and I (a) the real and imaginary parts of a complex number a and put Rn ¼
n X
R(lj e jk ), In ¼
n X
j¼1
I(lj e jk ), k ¼ 1, . . . , p:
j¼1
Then Rn ¼ O((n log log n)1=2 ), In ¼ O((n log log n)1=2 ) a:s:
(6:145)
pffiffiffiffiffiffiffi iw l ¼ e , i ¼ 1, implies lj ¼ eijw ¼ Proof. By the Euler formula Pn cos jw i sin jw and Rn ¼ j¼1 e jk cos jw. {e jk cos jw, F j } is a m.d. sequence such that sup E(je jk cos jwja j F j1 ) sup E(kej ka j F j1 ) , 1: j
j
Putting wn ¼ 1 for all n, we satisfy all conditions required for Wei’s bound (Section 6.3.10), where sn ¼ n1=2 and jwn j ¼ 1 ¼ o(scn ) ¼ o(nc=2 ) for all 0 , c , 1. Therefore Rn ¼ O(n1=2 ( log log n1=2 )1=2 ) ¼ O((n log log n)1=2 ): Obviously, such a proof works for In too.
B
6.6.3 Case of One Jordan Cell Lemma.
Let l be a complex number with jlj ¼ 1. Define a p p matrix 0
l 1 0 B0 l 1 B B D ¼ B0 0 l B .. .. .. @. . . 0 0 0
1 0 0 C C C C: .. C .. . . A l
If the m.d. sequence {en , F n } satisfies Eq. (6.105), then n X Dni ei ¼ O(n p1=2 ( log log n)1=2 ) a:s: i¼1
312
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Proof. We begin with the expression for the powers of D: 0
k
Bl B B B B0 k D ¼B B B . B .. @ 0
k k1 l 1
lk
.. .
..
0
1 k kpþ1 l C p1 C
C k C lkpþ2 C p2 C, C C .. C . A k l
.
k ¼ 0, 1, . . .
(6:146)
a! 0 a a if a b. Therefore Here we set ¼ 1, ¼ 0 if a , b and ¼ 0 b b b!(a b)! D
ni
ei ¼
p1 X
l
nin
n¼0
!0
0 X ni nin n i ei,nþ1 , . . . , ei,nþp : l n n n¼0
Introducing for n ¼ 0, 1, . . . , p 1 and k ¼ n þ 1, . . . , p the sum Sn (n, k) ¼ l
nn
n X ni i¼1
n
li eik ,
(6:147)
we have n X
D
ni
ei ¼
p1 X n X
l
nin
¼
p1 X
ni n
n¼0 i¼1
i¼1
ei,nþ1 , . . . ,
0 X n X
l
nin
ni n
n¼0 i¼1
Sn (n, n þ 1), . . . ,
n¼0
0 X
!0
ei,nþp
!0 Sn (n, n þ p) :
(6.148)
n¼0
By partial summation,
Pn
i¼1
ai bi ¼
Pn1 j¼1
(aj a jþ1 )
Pj
i¼1
bi , we have
n n1 X X ni nj nj1 R(li eik ) ¼ Rj , n n n i¼1 j¼1
(6:149)
where Rj and R(li eik ) are from Lemma 6.6.2. Note that, as m ! 1,
m n
m1 n
¼
m 1 (m 1)! m 1 ¼ n!(m 1 n)! m n n1
¼
(m 1) (m (n 1)) mn1 : (n 1)! (n 1)!
(6.150)
(The equivalence relation “” between two sequences an and bn means c1 an bn c2 an with constants independent of n:) For moderate m the left side is
6.6 PRELIMINARY ANALYSIS
313
bounded. Lemma 6.6.2 and Eqs. (6.149) and (6.150) lead to the bound n X ni i¼1
n
R(li eik ) ¼
n1 X
O((n j)n1 )O(( j log log j)1=2 )
j¼1
¼
n1 X
O(nn1=2 ( log log n)1=2 )
j¼1
¼ O(nnþ1=2 ( log log n)1=2 ) a:s:
(6.151)
Likewise, n X ni I (li eik ) ¼ O(nnþ1=2 ( log log n)1=2 ) a:s: n i¼1
(6:152)
Since n p 1 and jlj ¼ 1, Eqs. (6.151) and (6.152) imply, in view of Eq. (6.147), that Sn (n, k) ¼ O(n p1=2 ( log log n)1=2 ) a.s. Therefore the lemma follows from Eq. (6.148). B
6.6.4 Order of Magnitude of kYnk in Case M < 1 Lemma. Let the m.d. sequence {en , F n } satisfy Eq. (6.105) and let M , 1. Then kYn k ¼ o(nb ) a.s. for every b . 1=a. Proof. As M , 1, we can simplify Eq. (6.114) as kBn k cnm1 M n ¼ c(nm1 M n=2 )M n=2 c1 M n=2 :
(6:153)
By Lemma 6.6.1 for any 1 . 0 there exists I(1) . 0 such that kei k 1ib ,
i I(1):
(6:154)
i , I(1):
(6:155)
We can extend this bound by writing kei k c2 ,
Apply Eq. (6.106) and Eqs. (6.153)– (6.155) to get kYn k kBn kkY0 k þ
n X
kBni kkei k
i¼1
c1 kY0 kM n=2 þ c1 c2
I(1)1 X i¼1
M (ni)=2 þ c1 1
n X i¼I(1)
M (ni)=2 ib
314
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
b
¼ o(n ) þ c3 M
(nI(1))=2
b
þ c1 1n
n X
M
(ni)=2
i¼I(1)
¼ o(nb ) þ c1 1nb
1 X
b i n
M i=2 :
i¼0
Since 1 . 0 is arbitrary, the lemma is proved.
B
6.6.5 Order of Magnitude of kYnk in Case M 5 1 Lemma. Let the m.d. sequence {en , F n } satisfy Eq. (6.105) and let M ¼ 1. Then kYn k ¼ O(nm1=2 ( log log n)1=2 ) a.s. Proof. Plugging the Jordan representation (6.112) in autoregression (6.104) and premultiplying the resulting equation by C we obtain CYn ¼ DCYn1 þ Cen :
(6:156)
Partition the vectors CYn and Cen conformably with D as 0 (1) 1 0 (1) 1 zn un CYn ¼ @ A, Cen ¼ @ A, z(q) u(q) n n
(6:157)
where z(n j) and u(n j) are mj 1 vectors, j ¼ 1, . . . , q. The properties of en imply a sup j,n E(ku(nj) k j F n1 ) , 1. Then, because D is diagonal, system (6.156) breaks up into q equations j) þ u(nj) ¼ Dnj z(0j) þ z(nj) ¼ Dj z(n1
n X
Djni u(i j) :
(6:158)
i¼1
For j with jlj j ¼ 1 from Eq. (6.158) and Lemmas 6.5.2 and 6.6.3 we obtain kz(nj) k ¼ O(nmj 1 ) þ O(nmj 1=2 ( log log n)1=2 ) ¼ O(nmj 1=2 ( log log n)1=2 ): (6:159) For j with jlj j ¼ 1 by part (i) of Lemma 6.5.2 kz(nj) k ¼ o(n1=2 ). Hence, Eq. (6.159) holds for every j. This proves the statement because C is nonsingular. B
6.6.6 Order of Magnitude of kYnk in Case M > 1 Lemma. Let {en , F n } be a m.d. sequence satisfying Eq. (6.105). If M . 1, then kYn k ¼ O(nm1 M n ) a.s. where m and M are as defined in Eq. (6.113). Proof. We use Eq. (6.156) with CYn and Cen partitioned as in Eq. (6.157). Equations (6.154) and (6.155) imply ku(i j) k c max {1, 1ib }. Using this bound, Eq. (6.158) and
6.6 PRELIMINARY ANALYSIS
315
Eq. (6.114) we estimate one component of CYn as follows: kz(nj) k kDnj kkz(0j) k þ
n X
kDjni kku(i j) k
i¼1
c1 nmj 1 jlj jn þ c2
n X
(n i)mj 1 jlj jni max{1, 1ib }
i¼1
"
¼ c1 nmj 1 jlj jn 1 þ c3
n X i¼1
i 1 n
mj 1
# jlj ji max{1, 1ib }
c4 nmj 1 jlj jn : This implies the required bound on Yn .
B
6.6.7 Generalization of Rubin’s Theorem For purely explosive systems (i.e. m ¼ min jlj j . 1), Rubin (1950) showed in the 1-D case that if the en are i.i.d. random variables with Ee1 ¼ 0 and Ee21 ¼ s 2 . 0, then Yn diverges exponentially fast so that P(Bn Yn converges to nonzero limit) ¼ 1: The theorem below is a multivariate generalization of Rubin’s result. Theorem. Let the m.d. sequence {en , F n } satisfy Eq. (6.105) and let Yn be defined by Eq. (6.106). If m . 1, then B is invertible and 1 X Bi ei : (6:160) Bn Yn converges a:s: to Y ¼ Y0 þ i¼1 If, furthermore, " #! r X 0 Brk enrþk e0nrþk (Brk ) jF nr lim inf lmin E .0 (6:161) n!1
k¼1
for some r 1, then the limit Y in Eq. (6.160) has the property that a0 Y has a continuous distribution for all a = 0:
(6:162)
P Proof. Let Zn ¼ Y0 þ ni¼1 Bi ei be the initial segment of Eq. (6.160). By Eq. (6.106) Bn Yn ¼ Zn . By Lemma 6.5.2 kBn k cmn . This bound and Eq. (6.105) imply 1 X i¼1
2
E(kBi ei k j F i1 )
1 X i¼1
2
kBi k sup E(ken k2 j F n1 ) , 1 a:s: n
By the martingale convergence theorem (Theorem 6.1.3) Zn ! Y a.s.
316
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Now we turn to the proof of Eq. (6.162). Note that, since Bi is nonsingular, the vectors (Bi )0 a, i ¼ 1, 2, . . . , are nonzero for a = 0. Let us write the series in Eq. (6.160) in batches of r terms: 1 X
Bi ei ¼
i¼1
1 X r X
Bnrk enrþk ¼
n¼0 k¼1
1 X
B(nþ1)r
n¼0
r X
Brk enrþk :
k¼1
Denote un ¼
r X
Brk enrþk , F~ n ¼ F (nþ1)r :
k¼1
The sequence {un , F~ n } is a m.d. sequence, E(un j F~ n1 ) ¼
r X
Brk E(enrþk j F nr ) ¼ 0,
k¼1
with conditional second moments E(un u0n
j F~ n1 ) ¼ E
r X
! rk
B
enrþk e0nrþl (Brl )0
j F nr :
k,l¼1
Here for k , l E(enrþk e0nrþl j F nr ) ¼ E[E(enrþk e0nrþl j F nrþl1 ) j F nr ] ¼ E[enrþk E(e0nrþl j F nrþl1 ) j F nr ] ¼ 0 and similarly for l , k E(enrþk e0nrþl j F nr ) ¼ 0. Hence, E(un u0n
j F~ n1 ) ¼ E
r X
! B
rk
enrþk e0nrþk (Brk )0
j F nr
k¼1
and condition Eq. (6.161) rewrites as lim inf lmin E(un u0n j F~ n1 ) . 0: n!1
As {un , F~ n } satisfies the multivariate local M – Z condition [Eq. (6.65)] and the coefficients An ¼ (B(nþ1)r )0 a satisfy Eq. (6.69), Theorem 6.3.9 yields 0
0
P(a Y ¼ c) ¼ P a Y0 þ
1 X
! A0n un
¼c ¼0
n¼0
for every constant c.
B
6.6 PRELIMINARY ANALYSIS
317
6.6.8 kFnk is Bounded Lemma. If the m.d. sequence {en , F n } satisfies Eq. (6.105), then the expression P Fn ¼ n1 ni¼1 ei e0i satisfies kF n k ¼ O(1): Proof. Let di ¼ kei k2 E(kei k2 j F i1 ). Taking some r [ (1, min {2, a=2}), in a way similar to Eq. (6.77) we have supi E(jdi jr j F i1 ) , 1. Letting Un ¼ n in Theorem 6.1.4 we see that 1 X
E(jdi jr j F i1 )Uir c
1 X
1
and therefore n X
Pn
i¼1
kei k2 ¼
i¼1
ir , 1 a:s:
1
di ¼ o(n). Hence,
n X
di þ
i¼1
n X
E(kei k2 j F i1 ) ¼ o(n) þ O(n) ¼ O(n) a:s:
(6:163)
i¼1
Since Fn is nonnegative definite, by Lemma 6.5.5 it is true that kFn k ¼ lmax (Fn )
p X i¼1
n 1X li (Fn ) ¼ tr ei e0 n i¼1 i
! ¼
n 1X kei k2 : n i¼1
Equations (6.163) and (6.164) prove the lemma.
B
6.6.9 kXnk is Bounded Lemma.
If M , 1 and Eq. (6.105) holds, then Xn ¼ 1n kXn k
(6:164)
Pn
i¼1
Yi Yi0 satisfies
n 1X kYi k2 ¼ O(1): n i¼1
(6:165)
Proof. By the recursion (6.106) n X
2
!1=2 kYk k2
4
k¼1
n X
kBk kkY0 k þ
!2 31=2 kBki kkei k 5
i¼1
k¼1
" n X
k X
#1=2 2
k
(kB kkY0 k)
k¼1
2 þ4
n X
k X
k¼1
i¼1
kB
ki
!2 31=2 kkei k 5 :
(6.166)
318
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
The first term on the right causes no trouble as M , 1. The second term is bounded using Eq. (6.163): !2 ! ! n k n k k X X X X X kBki kkei k kBki k kBki kkei k2 k¼1
i¼1
i¼1
k¼1
1 X
! i
kB k
i¼1 n n X X
i¼0
1 X i¼0
i¼1
!2 kBi k
n X
kB
! ki
k kei k2
k¼i
kei k2 ¼ O(n) a:s:
(6.167)
i¼1
For Xn a relationship of type Eq. (6.164) is true. Therefore Eqs. (6.166) and (6.167) give Eq. (6.165). B
6.6.10 Lemma on Almost Decreasing Sequences Lemma. (Nielsen, 2005, Lemma 8.5) If a numerical nonnegative sequence {at } is almost decreasing, in the sense that there exist constants c . 0 and k . 0 such that atþ1 at þ ct k for all large t, and the numbers a1 , . . . , aT are jointly bounded as T X
at ¼ o(T d )
(6:168)
t¼1
for all d . 0, then these numbers tend to zero at the rate aT ¼ o(T r ) for all r , min {1, k=2}: Proof. By the “almost decreasing” condition, there exists T0 such that atþ1 at þ ct k for all t T0 : Suppose T . T0 and consider T0 t T: Inductively, at atþ1 ct k atþ2 ct k c(t þ 1)k aT [ct k þ c(t þ 1)k þ þ c(T 1)k ]:
(6:169)
Now let 0 , r , 1: For large enough T, Eq. (6.169) is applicable to t [ [T T r , T] # [T0 , T]: The sum in the brackets at the right of Eq. (6.169) contains at most T r terms, and each of these does not exceed c(T T r )k 2cT k for large T. Therefore min
TtTT r
at aT 2cT rk :
Restricting r to r [ (0, min{1, k=2}) it is seen that T X t¼1
at
T X
at T r (aT 2cT rk )
t¼TT r
¼ T r aT 2cT 2rk T r aT 2c
6.7 STRONG CONSISTENCY FOR VECTOR AUTOREGRESSION AND RELATED RESULTS
319
P for large T: Combining this with Eq. (6.168) gives aT T r Tt¼1 at þ 2c ¼ B o(T dr ): As d can be chosen arbitrarily small, this proves the lemma.
6.7 STRONG CONSISTENCY FOR VECTOR AUTOREGRESSION AND RELATED RESULTS 6.7.1 Theorem on Relative Compactness Recall that a sequence of vectors {ak } # Rp is called relatively compact if any subsequence {bk } of it contains a further subsequence {ck } that is convergent. We say that a sequence of random matrices {Xn } is relatively compact with probability 1 if for almost any v [ V the sequence {Xn (v)} is relatively compact. In this situation it is convenient to denote Lim{Xn } the set of limit points of {Xn }. The purpose here is to establish the relationship between Xn ¼
n n X 1X Yi Yi0 and Fn ¼ n1 ei e0i : n i¼1 i¼1
The relative compactness notion is required because {Xn } and {Fn } do not converge. Theorem. Let the m.d. sequence {en , F n } satisfy Eq. (6.105) and let Yn be defined by Yn ¼ BYn1 þ en ,
(6:170)
where B satisfies M ¼ maxjlj j , 1. Then with probability 1 the matrix sequence {Xn } is relatively compact and its set of limit points is BF, where B is the enveloping resolvent from Lemma 6.5.4 and F is the set of limit points of {Fn } (which is also relatively compact): Lim{Xn } ¼ B Lim{Fn }. Proof. In a finite-dimensional space relative compactness is equivalent to boundedness, so the statements about relative compactness of {Xn } and {Fn } follow from Lemmas 6.6.8 and 6.6.9. From Eq. (6.170) we get n X
Yi Yi0 ¼
i¼1
n X
0 (BYi1 þ ei )(Yi1 B0 þ e0i ) ¼ B
i¼1
þ
n X i¼1
n X
0 Yi1 Yi1 B0
i¼1 0 (BYi1 e0i þ ei Yi1 B0 ) þ
n X
ei e0i
i¼1
or, using Xn and Fn , n 1 1X 0 (BYi1 e0i þ ei Yi1 B0 ) þ Fn : Xn ¼ BXn B0 þ B(Y0 Y00 Yn Yn0 )B0 þ n n i¼1
(6:171)
320
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Because Yi1 is F i1 -measurable, {BYi1 e0i } is a m.d. sequence. Let us bound n X
2
E( kBYi1 e0i k jF i1 )=i2
n X
1
kBk2 kYi1 k2 E( kei k2 jF i1 )=i2
1
c
n X
kYi1 k2 =i2 :
(6:172)
1
Denote Sn ¼ n X 1
Pn 1
kYi1 k2 , ai ¼ i2 . From Lemma 6.6.9 we see that
kYi1 k2 =i2 ¼
n X
(Si Si1 )ai
1
¼ S0 a1 þ Sn an þ
n1 X
Sj (aj a jþ1 )
1
n1 1 X 1 1 þ O( j) n2 j2 ( j þ 1)2 1
n1 X 1 ¼ O(1) þ O 2 ¼ O(1): j 1 ¼ O(1) þ O(n)
(6:173)
Now Eqs. (6.172), (6.173) and the martingale strong law (Section 6.1.4) with p ¼ 2, Xn ¼ BYn1 e0n and Un ¼ n imply n 1X 0 (BYi1 e0i þ ei Yi1 B0 ) ¼ o(1) a:s: n i¼1
(6:174)
Besides, by Lemma 6.6.4 with b ¼ 1=2 . 1=a 1 B(Y0 Y00 Yn Yn0 )B0 ¼ o(1): n
(6:175)
The consequence of Eqs. (6.171), (6.174) and (6.175) is that Xn BXn B0 ¼ Fn þ o(1) or, in terms of the operator B, Xn ¼ B(Fn þ o(1)):
(6:176)
By Lemma 6.6.8 {Fn } is relatively compact. If {Fn (v): n 1} is convergent for some v [ V, then, because of boundedness of B, {Xn (v): n 1} is also convergent. The set of limit points of {Xn } is an image under B of the set of limit points B of {Fn }.
6.7 STRONG CONSISTENCY FOR VECTOR AUTOREGRESSION AND RELATED RESULTS
321
6.7.2 Bounding lmax (Vn) Theorem. If the m.d. sequence {en , F n } satisfies Eq. (6.105), then for Vn ¼ P n 0 i¼1 Yi Yi the following is true: (i) lmax (Vn ) ¼ O(n2m2 M 2n ) a.s. if M . 1; (ii) lmax (Vn ) ¼ O(n2m log log n) a.s. if M ¼ 1; (iii) lmax (Vn ) ¼ O(n) a.s. if M , 1. Proof. (i) From Eq. (6.120) and Lemma 6.6.6 we derive
lmax (Vn )
n X
kYi k2 c
n X
i¼1
i2m2 M 2i
i¼1
n 2m2 X i in 2m2 2n ¼ cn M (M 2 ) n i¼1
cn2m2 M 2n
1 X
i
(M 2 ) :
i¼0
(ii) Similarly, by Lemma 6.6.5, excluding i for which log log i is not defined,
lmax (Vn ) kY1 k2 þc
n X
i2m1 log log i
i¼2
c1 þ cn
2m1
(log log n)(n 2) ¼ O(n2m log log n):
Statement (iii) follows from Eq. (6.165).
B
6.7.3 Bounding lmin(Vn) from Below in the Stable Case The importance of the theorem on relative compactness (Section 6.7.1) is demonstrated by the following application. For the notations B, Fn , Xn and Vn see Sections 6.5.4, 6.7.1 and 6.7.2, respectively. Lemma.
If the m.d. sequence {en , F n } satisfies Eq. (6.105), then the condition lim inf lmin (BFn ) . 0 a:s: n!1
(6:177)
is necessary and sufficient for lim inf n!1
1 lmin (Vn ) ¼ lim inf lmin (Xn ) . 0 a:s: n!1 n
(6:178)
Proof. Suppose Eq. (6.177) is true and denote a ¼ lim inf n!1 lmin (Xn ): Then there exists a subsequence {Xnk } # {Xn } such that a ¼ limn!1 lmin (Xnk ): By Lemma 6.6.9
322
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
{Xn } is relatively precompact. To simplify the notation, we can assume that {Xnk } converges. Similarly, by Lemma 6.6.8 we may assume that {Fnk } converges. By Theorem 6.7.1 the respective limit points satisfy Lim{Xn } ¼ BLim{Fn } and almost sure positivity of a follows from Eq. (6.177). The proof of (6.178) ) (6.177) is absolutely analogous. B To extend this result to unstable systems, we need to modify Eq. (6.177) because the operator B is not defined in the case M . 1: This is the subject of Sections 6.7.4 and 6.7.5, the final result being Theorem 6.7.5.
6.7.4 Conditions Equivalent to Equation (6.177) Denote Hk (L) ¼ (I, B, . . . , Bk1 )L1=2 ¼ (L1=2 , BL1=2 , . . . , Bk1 L1=2 ),
k ¼ 1, 2, . . .
The matrices Hk , Hp and Hp Hp0 are of sizes p (kp), p p2 and p p, respectively. Lemma. Under the conditions of Lemma 6.7.3, condition (6.177) of that lemma is equivalent to each of the next two conditions: P(rankHp (L) ¼ p)
for all L [ Lim{Fn }
(6:179)
and lim inf lmin (Hp (Fn )[Hp (Fn )]0 ) . 0 a:s: n!1
(6:180)
Proof. Step 1. We prove that for any L the following three conditions are equivalent
lmin (Hk (L)[Hk (L)]0 ) . 0 for some k p, rank Hp (L) ¼ p
(6:182)
lmin (Hp (L)[Hp (L)]0 ) . 0
(6:183)
(6:181)
and
(see Kushner, 1971, p. 264). With the identity and null matrices of appropriI ate sizes we can define A ¼ to obtain 0 (I, B, . . . , Bk1p )L1=2 ¼ (I, B, . . . , B p1 )L1=2 A ¼ Hp (L)A, Hk (L) ¼ (I, . . . , B p1 , Bp , . . . , Bk1 )L1=2 ¼ (Hp (L), B p (I, . . . , Bk1p )L1=2 ) ¼ (Hp (L), B p Hp (L)A):
(6:184)
6.7 STRONG CONSISTENCY FOR VECTOR AUTOREGRESSION AND RELATED RESULTS
323
This implies Hk (L)[Hk (L)]0 ¼ Hp (L)[Hp (L)]0 þ B p Hp (L)AA0 [Hp (L)]0 (B p )0 and
lmin (Hk (L)[Hk (L)]0 ) lmin (Hp (L)[Hp (L)]0 ):
(6:185)
Furthermore, with appropriately sized matrices we can define S ¼ (I, B p ) and T ¼ diag[I, A] to obtain from Eq. (6.184)
I SHp (L)T ¼ (Hp (L), B Hp (L)) 0 p
0 A
¼ Hk (L):
(6:186)
If rank Hp (L) , p, then rank Hk (L) min{rank S, rank Hp (L), rank T} , p by Eq. (6.186) and rank Hk (L)[Hk (L)]0 , p: Since Hk (L)[Hk (L)]0 is of size p p and nonnegative definite, we can use jHk (L)[Hk (L)]0 j ¼
p Y
lj (Hk (L)[Hk (L)]0 )
(6:187)
j¼1
to conclude that lmin (Hk (L)[Hk (L)]0 ) ¼ 0. Thus, (6.181) ) (6.182). By Eq. (6.187), the implication (6.182) ) (6.183) is true. And, finally, according to Eq. (6.185), (6.183) implies lmin (Hk (L)[Hk (L)]0 ) . 0 for all k p: Step 2. As we know from Lemma 6.6.8, {Fn (v): n 1} is relatively compact for almost every v: Suppose that Eq. (6.177) holds. Then there is a subsequence {Fnk } # {Fn } such that limn!1 lmin (BFnk (v)) . 0: By relative compactness we can assume that {Fnk (v)} converges to some L(v): Then lmin (BL(v)) . 0 and we can choose k ¼ k(v) p such that
0
lmin (Hk (L(v))[Hk (L(v))] ) ¼ lmin
k 1 X
! i
i 0
B L(v)(B )
. 0:
i¼0
From Step 1, this is equivalent to Eqs. (6.182) and (6.183) for the given L ¼ L(v): For the reason that this proof applies to each limit point L of {Fn }, we have proved that (6.177) ) (6.179). Since Eq. (6.183) is just an equivalent way of writing Eq. (6.180), we have also proved (6.179) () (6.180). The implication (6.180) ) (6.177) is obvious P p1 i B Fn (Bi )0 . B because Hp (Fn )[Hp (Fn )]0 ¼ i¼0
6.7.5 A General Lower Bound on lmin(Vn) The theorem here, unlike Lemma 6.7.3, does not require the assumption M , 1: Theorem. Let the m.d. sequence {en , F n } satisfy Eq. (6.105) and define Yn by Eq. (6.106). Suppose that condition (6.179) or, equivalently, (6.180) holds. Then lim inf n!1 n1 lmin (Vn ) . 0 a.s.
324
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Proof. Step 1. Let us show that the residual Rn in Eq. (6.126) satisfies Rn ¼ o(n): Obviously, the matrices from Eq. (6.122) satisfy c ¼ max kCj k , 1:
(6:188)
j
Plþp2 Ckl el Ak,l : As we can see from Eq. (6.124), Ak,l is Denote Xl ¼ k¼l F l1 -measurable and {Xl } is a m.d. sequence. By Eq. (6.188), kAk,l k c
l1 X
kem k
m¼kpþ1
and kXl k c
lþp2 X
kel kkAk,l k c1 kel k
lþp2 X
l1 X
k¼l
m¼kpþ1
k¼l
¼ c1 kel k
c2 kel k
l1 X
mþp1 X
m¼lpþ1
k¼l
kem k
kem k c1 ( p 1)kel k
l1 X
kem k
m¼lpþ1
!1=2
l1 X
kem k
2
:
m¼lpþ1
With this bound at hand, we can estimate n X l¼2
E( kXl k2 jF l1 )l2 c3
n X
l1 X
kem k2 E( kel k2 jF l1 )l2
l¼2 m¼lpþ1
c3 sup E( ken k2 jF n1 ) n
c4 sup E( ken k2 jF n1 ) n
n1 X m¼3p n1 X
kem k2
mþp1 X
l2
l¼mþ1
kem k2 m2 :
m¼3p
(6:189) P As we know from Eq. (6.163), n1 kem k2 ¼ O(n): Therefore the right side of Eq. (6.189) is bounded uniformly in n [see a similar argument in Section 6.7.1, in particular, Eq. (6.173)]. By Theorem 6.1.4 Rn ¼ P n 0 l¼2 (Xl þ Xl ) ¼ o(n). Step 2. In view of Eq. (6.126) we have proved that ! p1 n n X 1X 1X 0 0 Zi Zi ¼ Cj ekj ekj Cj0 þ o(1): n i¼p n j¼0 k¼p
6.7 STRONG CONSISTENCY FOR VECTOR AUTOREGRESSION AND RELATED RESULTS
This equation and Lemma 6.6.8 show that the sequence
n P n 1 n
i¼p
325
o Zi Zi0 is
relatively compact with probability 1. Moreover, its set of limit points is {F(F): F [ Lim{Fn }}, where
F(F) ¼
p1 X
Cj FCj0 :
j¼0
All terms in this expression are nonnegative definite. Therefore if x0 F(F)x ¼ 0, then F 1=2 x ¼ 0, F 1=2 (B þ a1 Ip )0 x ¼ 0, . . . , F 1=2 (B p1 þ þ a p1 Ip )0 x ¼ 0 which, in turn, implies F 1=2 x ¼ 0, F 1=2 B0 x ¼ 0, . . . , P p1 i i 0 F 1=2 (B p1 )0 x ¼ 0: Hence, if lmin B F(B ) . 0, then F(F) is noni¼0 P singular. Therefore lim inf n!1 lmin 1n ni¼p Zi Zi0 . 0 a.s. by assumption (6.180). Lemma 6.5.6 and this relation prove the theorem.
B
6.7.6 Application to Scalar Autoregression Consider an autoregressive model (6.110), which can be written as Eq. (6.104) if notation (6.111) is adopted. Theorem.
If the m.d. sequence {1n , F n } satisfies sup E(j1n ja j F n1 ) , 1 a:s: with some a . 2
(6:190)
n
and
lim inf n!1
n 1X E(12i j F i1 ) . 0 a:s: n i¼1
(6:191)
then lim inf n!1 lmin (Vn ) . 0 a.s. Proof. Obviously, Eq. (6.190) implies Eq. (6.105). We need to verify Eq. (6.179). With u ¼ (1, 0, . . . , 0)0 we have en ¼ 1n u and Fn ¼ n1 p1 X i¼0
n X
ei e0i ¼ n1
n X
i¼1
i¼1
i
n X
i 0
1
B Fn (B ) ¼ n
i¼1
1i uu0 1i ¼ n1 12i
" p1 X i¼0
n X i¼1
i
0
i 0
12i uu0 ,
#
B uu (B ) :
(6:192)
326
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
From
Bu ¼
2
b p1
I p1
B u¼
...
b1
b1
... I p1
b p1
0
1 b1 1 B C B C B 1 C bp B 0 C B C B C¼B 0 C C, 0 @...A B B C @...A 0 0 0 2 1 0 1 b1 þ b2 b1 B C B C B b1 C C B 1 C B C bp B C B 1 C B 0 C¼B B C B C 0 B C B 0 C C @...A B @ ... A 0 0 0
1
and similar expressions for other powers of B we observe that the matrix (u, Bu, . . . , B p1 u) is upper triangular with unities on the main diagonal and is therefore nonsingular. It follows that in Eq. (6.192) the matrix in the brackets is nonsingular:
det
p1 X
! Bi uu0 (Bi )0
i¼0
2
13 u0 0 C7 B 6 p1 B (Bu) C7 u) ¼ det6 (u, Bu, . . . , B @ . . . A5 . 0: 4 (B p1 u)0 0
Theorem 6.7.5 is applicable.
B
6.7.7 Purely Explosive Case Here we establish that in the purely explosive case (m ¼ minjlj j . 1) both lmin (Vn ) and lmax (Vn ) grow exponentially fast under assumption (6.161). Theorem. Let {en , F n } be a m.d. sequence satisfying Eqs. (6.105) and (6.161), and define Yn by Eq. (6.106). Suppose m . 1: Then P n n 0 i 0 i 0 (i) The product a.s. to G ¼ 1 i¼0 B YY (B ) , where Pn B 0 Vn (B ) convergesn Vn ¼ i¼1 Yi Yi and Y ¼ limn!1 B Yn : (ii) G is positive definite with probability 1. (iii) With probability 1 lim n1 log lmax (Vn ) ¼ 2 log M,
n!1
lim n1 log lmin (Vn ) ¼ 2 log m:
n!1
6.7 STRONG CONSISTENCY FOR VECTOR AUTOREGRESSION AND RELATED RESULTS
327
Proof. (i) As for the existence of the limit Y ¼ limn!1 Bn Yn , see Eq. (6.160). Convergence of G follows from kBi k cmi : Let us prove that Bn Vn (Bn )0 converges to G: Denoting Zn ¼ Bn Yn Yn0 (Bn )0 and Z ¼ YY 0 , by Eq. (6.160) we have Zn ! Z a.s. and therefore c ¼ supi kZi k , 1 a.s. For a given 1 . 0 let n(1) be such that kZ Zi k 1 for i n(1): We need to prove that Bn Vn (Bn )0 ¼
n X
Bin Bi Yi Yi0 (Bi )0 (Bin )0
i¼1
¼
n X
Bin Zi (Bin )0
i¼1
P i i 0 converges to G ¼ 1 i¼0 B Z(B ) : This convergence follows from the next three bounds. The first bound is n(1)1 n(1)1 X 2 X in in 0 B Zi (B ) kBin k c i¼0 i¼0 c
1 X
2
kBj k ! 0, n ! 1:
j¼nn(1)þ1
The second is 1 X j j 0 B Z(B ) j¼nn(1)þ1
1 X
2
kBj k kZk ! 0,
n ! 1:
j¼nn(1)þ1
Finally, nn(1) n X X in in 0 j j 0 B Zi (B ) B Z(B ) i¼n(1) j¼0 1 nn(1) X 2 X j j 0 ¼ B (Znj Z)(B ) 1 kBj k : j¼0 j¼0 For these bounds to yield the desired result, n(1) is chosen first and n next. (ii) Suppose that 0 ¼ x0 Gx ¼
1 X i¼0
x0 Bi YY 0 (Bi )0 x ¼
1 X
(x0 Bi Y)2 :
i¼0
Then x0 Bi Y ¼ ((Bi )0 x)0 Y ¼ 0: By Eq. (6.162) this is an impossible event if (Bi )0 x = 0, that is x = 0:
(iii) Denote Cn ¼ Bn Vn (Bn )0 : Since Cn ! G a.s. and G is positive definite, we have 0 , lim lmin (Cn ) lim lmax (Cn ) , 1 a.s: n!1
n!1
This relation, the equation Vn ¼ Bn Cn (Bn )0 and Lemma 6.5.3 imply n1 log lmax (Vn ) ¼ n1 loglmax (Bn (Bn )0 ) þ o(1):
(6:193)
However, by Lemma 6.5.2(ii) and Eq. (6.119) n1 log lmax (Bn (Bn )0 ) ¼ n1 log kBn k2 ¼ 2(1 þ o(1))log M:
(6:194)
The first equation in (iii) follows from Eqs. (6.193) and (6.194). As a result of Vn1 ¼ (Bn )0 Cn1 Bn , along with Eq. (6.193), we have n1 log lmin (Vn ) ¼ n1 log lmax (Vn1 ) ¼ n1 log lmax ((Bn )0 Bn ) þ o(1): Instead of Eq. (6.194) we need now n1 log lmax ((Bn )0 Bn ) ¼ n1 log kBn k2 ¼ 2(1 þ o(1))log m: The above two relations prove the second equation in (iii).
B
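A minimal simulation sketch of part (iii), with a hypothetical diagonal $B$ (so $m = 1.1$, $M = 1.3$) and i.i.d. standard normal errors: the normalized logarithms of the extreme eigenvalues of $V_n$ settle near $2\log M$ and $2\log m$.

```python
import numpy as np

rng = np.random.default_rng(0)
B = np.diag([1.3, 1.1])                 # purely explosive: min |eigenvalue| = 1.1 > 1
m, M = 1.1, 1.3
n = 120
Y = np.zeros(2)
V = np.zeros((2, 2))
for i in range(1, n + 1):
    Y = B @ Y + rng.standard_normal(2)  # Y_i = B Y_{i-1} + e_i
    V += np.outer(Y, Y)                 # V_n = sum of Y_i Y_i'

lam = np.linalg.eigvalsh(V)             # eigenvalues in ascending order
print("n^-1 log lambda_max(V_n) =", np.log(lam[-1]) / n, " vs 2 log M =", 2 * np.log(M))
print("n^-1 log lambda_min(V_n) =", np.log(lam[0]) / n,  " vs 2 log m =", 2 * np.log(m))
```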
6.7.8 Some Bounds Involving $V_n$

As before, we denote $V_n = \sum_{i=1}^{n} Y_i Y_i'$ and let
\[
N = \inf\{n\colon V_n \text{ is nonsingular}\}, \qquad \inf\varnothing = \infty.
\]
In the lemma below, we explicitly take account of the fact that the error vectors $e_n$ form an array $e_1 = (e_{11}, \ldots, e_{1p})$, $e_2 = (e_{21}, \ldots, e_{2p})$, $\ldots$, $e_n = (e_{n1}, \ldots, e_{np})$ [see Eq. (6.107)].

Lemma. Let the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfy Eq. (6.105) and suppose that Eq. (6.179) or, equivalently, Eq. (6.180) holds. If, additionally, $M \le 1$, then

(i) $N < \infty$ a.s. and $\|V_n^{-1/2}\| = O(n^{-1/2})$ a.s.

(ii) $Y_n'V_n^{-1}Y_n \le 1$ for $n \ge N$ and $\sum_{i=N}^{n} Y_i'V_i^{-1}Y_i = O(\log n)$ a.s.

(iii) We have the bound
\[
\Bigl\| V_n^{-1/2}\sum_{i=1}^{n} Y_i e_{i+1}' \Bigr\| = O((\log n)^{1/2}) \ \text{a.s.}
\]
Proof. (i) By Theorem 6.7.5 lim inf n!1 n1 lmin (Vn ) . 0, and so N , 1 and 2
2
kVn1=2 k ¼ sup kVn1=2 xk ¼ sup x0 Vn1 x kxk¼1
kxk¼1
¼ lmax (Vn1 ) ¼ 1=lmin (Vn ) ¼ O(n) a:s: (ii) As Vn1 is nonnegative definite and Vn is positive definite for n N, we have jVn1 j 0, jVn j . 0 and by Lemma 6.4.2 Yn0 Vn1 Yn ¼ 1 jVn1 j=jVn j 1, n N: Further, by Lemma 6.4.3 n X
Yi0 Vi1 Yi ¼ O(log lmax (Vn )):
i¼N
It remains to recall that log lmax (Vn ) ¼ O(log n), according to parts (ii) and (iii) of Theorem 6.7.2. (iii) To apply Lemma 6.4.5(iii), we need to reveal in 2 !0 ! n n n X X 1=2 X 0 0 1 0 Yi eiþ1 ¼ Yi e iþ1 Vn Yi e iþ1 V n i¼1 i¼1 i¼1 the structure associated with regression yn ¼ Xn b þ 1n . Let X 0 n ¼ (Y1 , . . . , Yn ), 1(nj ) ¼ (e2, j , . . . , enþ1, j ), Q(nj ) ¼ (1(nj ) )0 Xn ( X 0 n Xn )1 X 0 n 1(nj ) , j ¼ 1, . . . , p: P Then Vn ¼ ni¼1 Yi Y 0 i ¼ X 0 n Xn and the rows of Xn are not changed as new rows are appended. For each j, {eiþ1, j , F iþ1 } is a m.d. sequence and Yi is F i -measurable. Further, 0P n
n X
Yi e0iþ1
i¼1
1
0 0 (1) 1 X 1 B i¼1 C B C @ n n A ¼B : C¼ n @P A 0 ( p) X 1 n n Yi eiþ1,p Yi eiþ1,1
i¼1
So by Lemma 6.4.5(iii) 0 0 (1) 1 2 Xn 1n n 1=2 X C 0 (1) 0 ( p) 0 0 1 B Yi eiþ1 ¼ ((1n ) Xn , . . . , (1n ) Xn )( Xn Xn ) @ . . . A V n i¼1 Xn0 1n( p) ( p) ¼ Q(1) n þ þ Qn ¼ O(log lmax (Vn )) ¼ O(log n) a:s:
B
6.7.9 More Bounds Involving $V_n$

Lemma. Let the conditions of Lemma 6.7.8 hold and suppose that $B$ is nonsingular. Then

(i) $\|V_n^{1/2}B'V_{n+1}^{-1}BV_n^{1/2}\| \le 1 + O(n^{-1/2}(\log n)^{1/2})$ a.s.

(ii) Let $r > 1/a$, where $a$ is the integrability parameter from Eq. (6.105). Then
\[
\limsup_{n\to\infty} n^{1/2 - r}\bigl(Y_{n+1}'V_{n+1}^{-1}Y_{n+1} - Y_n'V_n^{-1}Y_n\bigr) \le 0 \ \text{a.s.}
\]
Proof. 1 BVn1=2 : To obtain a (i) We intend to apply Lemma 6.5.9(i) to An ¼ Vn1=2 B0 Vnþ1 recursion for Vnþ1 , consider
Vnþ1 ¼
nþ1 X
Yi Yi0 ¼
i¼1
¼
nþ1 X
nþ1 X
0 (BYi1 þ ei )(Yi1 B0 þ e0i )
i¼1 0 0 (BYi1 Yi1 B0 þ BYi1 e0i þ ei Yi1 B0 þ ei e0i )
i¼1
¼ B(Vn þ Y0 Y00 )B0 þ B
n X
Yi e0iþ1 þ
i¼0
n X
eiþ1 Yi0 B0 þ
nþ1 X
i¼0
ei e0i :
i¼1
Therefore 0
1=2 1 A1 B Vnþ1 (B0 )1 Vn1=2 ¼ Ip þ Bn þ C~ n þ C~ n n ¼ Vn
(6:195)
where " Bn ¼ Vn1=2 Y0 Y00 þ B1
nþ1 X
!
#
ei e0i (B0 )1 Vn1=2
i¼1
is nonnegative definite, C~ n ¼
Vn1=2
n X
! Yi e0iþ1
(B0 )1 Vn1=2 :
i¼0
By Lemma 6.7.8, parts (i) and (iii), n 1=2 X 0 ~ kC n k Vn Yi eiþ1 k(B0 )1 kkVn1=2 k i¼0 ¼ O(n1=2 (log n)1=2 ) a:s:
(6:196)
0 Noting that Cn ¼ C~ n þ C~ n is symmetric, we can apply Eqs. (6.195) and (6.196) and Lemma 6.5.9(i) to conclude that
kAn k 1=[1 O(n1=2 (log n)1=2 )] ¼ 1 þ O(n1=2 (log n)1=2 ): (ii) For n N we have 0 1 1 Vnþ1 Ynþ1 ¼ (BYn þ enþ1 )0 Vnþ1 (BYn þ enþ1 ) Ynþ1 1 0 1 BYn þ Ynþ1 Vnþ1 enþ1 ¼ Yn0 B0 Vnþ1 1 1 þ e0nþ1 Vnþ1 Ynþ1 e0nþ1 Vnþ1 enþ1 : 1 is positive definite, we continue as follows: Remembering that Vnþ1 0 1 1 Vnþ1 Ynþ1 (Vn1=2 Yn )0 Vn1=2 B0 Vnþ1 BVn1=2 (Vn1=2 Yn ) Ynþ1 1=2 1=2 þ 2(Cnþ1 Ynþ1 )0 (Cnþ1 enþ1 )
[applying statement (i)] 2
[1 þ O(n1=2 (log n)1=2 )] kVn1=2 Yn k 1=2 1=2 þ 2kVnþ1 Ynþ1 kkVnþ1 kkenþ1 k:
(6:197)
By Lemmas 6.6.1 and 6.7.8(i,ii) 2
ken k ¼ o(nr ), kVn1=2 k ¼ O(n1=2 ), kVn1=2 Yn k ¼ Yn0 Vn1 Yn 1: These bounds and Eq. (6.197) give 0 1 Vnþ1 Ynþ1 Yn0 Vn1 Yn þ O(n1=2 (log n)1=2 ) þ o(nr1=2 ) Ynþ1
¼ Yn0 Vn1 Yn þ o(nr1=2 ): This proves statement (ii).
(6:198) B
6.7.10 Convergence of Certain Quadratic Forms

An important difference between purely explosive ($m > 1$) and nonexplosive ($M \le 1$) systems is described in the theorem below.

Theorem. Let $\{e_n\}$ be a m.d. sequence satisfying Eqs. (6.105) and (6.161), and define $Y_n$ by Eq. (6.106).

(i) If $m > 1$, then for $k = 0, \pm 1, \pm 2, \ldots$
\[
\lim_{n\to\infty} Y_{n-k}'V_n^{-1}Y_{n-k} = (B^{-k}Y)'G^{-1}(B^{-k}Y) > 0 \ \text{a.s.},
\]
where $G$ and $Y$ are the same as in Theorem 6.7.7.
(ii) If $M \le 1$, then
\[
\lim_{n\to\infty}\max_{1 \le j \le n} Y_j'V_n^{-1}Y_j = 0 \ \text{a.s.}
\tag{6.199}
\]
Proof. Proving (i). The random variables under consideration converge by Theorem 6.7.7: 0 Vn1 Ynk ¼ (Bk Bkn Ynk )0 [Bn Vn (Bn )0 ]1 (Bk Bkn Ynk ) Ynk
! (Bk Y)0 G1 (Bk Y) where G1 exists. Proving (ii). The proof is split into several steps. Step 1. For the proof of Eq. (6.199) it suffices to show that lim Yn0 Vn1 Yn ¼ 0 a:s:
n!1
(6:200)
Indeed, by Lemma 6.7.8(i) for 1 j , N Yj0 Vn1 Yj max kYj k2 O(n1 ) ¼ O(n1 ): 1 j,N
For N j n we can use the fact that, according to Eq. (6.88), Vj1 V 1 jþ1 0 1 and therefore Yj0 Vj1 Yj Yj0 V 1 Y Y V Y . j j jþ1 j n Step 2. Case of a nonsingular B. By Lemma 6.7.9(ii), in Eq. (6.198) we can choose r [ (1=a, 1=2): Denoting at ¼ Yt0 Vt1 Yt , by Eq. (6.198) we have atþ1 at þ o(t k ), with k ¼ 1=2 r . 0, whereas by Lemma 6.7.8(ii) Pn d t¼N at ¼ O(log n) ¼ o(n ) for any d . 0: Therefore Lemma 6.6.10 b implies an ¼ o(n ) for all b , min{1, k=2}. Case of a singular B. In this case 0 is a root of the characteristic polynomial f(l) of B: Denote r its multiplicity, r p: Subcase r , p. In the spectrum-separating decomposition of Section 6.5.8 we can assume that M is nonsingular, B1 is nonsingular and all eigenvalues of B2 are zero. The error vectors jn , zn in Eq. (6.134) satisfy the same conditions as en . As we proved in the nonsingular case,
S0n
n X 1
!1 Si S0i
Sn ! 0 a.s:
(6:201)
Denoting An ¼ MVn M 0 , from Eqs. (6.133) and (6.135) we get Yn0 Vn1 Yn
¼
(MYn )0 A1 n MYn
þ
0
0
Tn
A1 n
¼ 0
Sn 0
0
þ2
Tn
A1 n Sn
0
0
Sn 0
A1 n
0
Tn
¼ In1 þ In2 þ 2In3 , say:
(6:202)
As M is nonsingular and kVn1 k ¼ O(n1 ) by Lemma 6.7.8(i), we have 0 1 1 1 1 kA1 n k k(M ) kkVn kkM k ¼ O(n ) a:s:
(6:203)
Lemma 6.6.4, applied to Tn , gives kTn k ¼ o(n1=2 ): Consequently, 2 1 0 In2 kA1 n k kTn k ¼ O(n )o(n) ¼ o(1):
(6:204)
By Lemma 6.6.9 n X
tr
! Ti Ti0
¼
1
n X
kTi k2 ¼ O(n):
(6:205)
1
As a result of the partitioning [Eq. (6.135)], Lemma 6.5.9(ii) can be applied to An . Using Eqs. (6.201), (6.203) and (6.205) we get
0 In1
S0n
n X
!1 Si S0i
" Sn 1 þ
kA1 n ktr
1
n X
!# Ti Ti0
1 1
¼ o(1)[1 þ O(n )O(n)] ¼ o(1):
(6:206)
The bounds obtained allow us to estimate In3 : 0
Sn 0 1=2 1=2 ¼ (In1 In2 )1=2 ¼ o(1): kIn3 k An An Tn 0
(6:207)
The desired conclusion Eq. (6.200) follows from Eqs. (6.202), (6.204), (6.206) and (6.207). Subcase r ¼ p. In this case Sn is empty, Yn ¼ Tn and the bound is similar to Eq. (6.204): Yn0 Vn1 Yn ¼ Tn0 Vn1 Tn kVn1 k kTn k2 ¼ O(n1 )o(n) ¼ o(1):
B
6.7.11 Another Lemma on Purely Explosive Processes

Lemma. Let $\{e_n\}$ be a m.d. sequence satisfying Eqs. (6.105) and (6.161). Suppose $m > 1$. Then
\[
\lim_{n\to\infty}\sum_{i=1}^{n}\|B^{-n}Y_i\| = \sum_{i=1}^{\infty}\|B^{-i}Y\| < \infty \ \text{a.s.},
\]
where Y is from Eq. (6.160). Proof. Obviously, for any I . 0 n X
n
kB Yi k ¼
i¼1
n X
þ
i¼Iþ1
I X
! kB(ni) Bi Yi k:
(6:208)
i¼1
By Theorem 6.6.7 the number I can be chosen in such a way that for i I we have kBi Yi Yk 1: We handle the first sum in Eq. (6.208) using Lemma 6.5.2: n nI1 X X (ni) i j kB B Yi k kB Yk i¼Iþ1 j¼0 (change j ¼ n 2 i in the second sum) n X (ni) i (ni) ¼ (kB B Yi k kB Yk) i¼Iþ1
n X
kB(ni) (Bi Yi Y)k 1
i¼Iþ1
1
n X
kB(ni) k
i¼Iþ1
1 X
kBi k c1 1
i¼0
1 X
mi ¼ c2 1:
(6:209)
i¼0
For the second sum in Eq. (6.208) we apply the bound c3 ¼ supi kBi Yi k , 1: I X
kB(ni) Bi Yi k c4
1 X
mi ¼ c5 m(nI) ! 0,
n ! 1:
(6:210)
i¼nI
i¼1
Besides, 1 X i¼nI
kBi Yk c6
1 X
mi ! 0,
n ! 1:
(6:211)
i¼nI
Equations (6.208) – (6.211) prove the statement.
B
6.7.12 Strong Consistency of the Ordinary Least Squares Estimator

Theorem. Suppose that the m.d. sequence $\{e_n\}$ satisfies Eqs. (6.105) and (6.179) [or, equivalently, Eq. (6.180)]. Then for $k = 1, \ldots, p$
\[
\lim_{n\to\infty}\hat b_k(n) = b_k \ \text{a.s.}
\tag{6.212}
\]
Proof. Setting up proper normalization. The elements of the representation !1 n1 n1 X X 0 Yi Y Yi eiþ1,k (6:213) b^ k (n) bk ¼ i
i¼1
i¼1
need to be properly normalized to converge. We are assuming that for the spectrumseparating decomposition Eq. (6.132) holds and the process {Sn } is purely explosive. Therefore, by Theorem 6.7.7(i) ! n1 X n 0 0 Si Si (Bn (6:214) lim B1 1 ) ; G is positive definite a.s. n!1
1
Denoting Vn ¼
n1 X
Ti Ti0 ,
Dn ¼
Bn 1
0
0
1=2 Vn1
1
, An1 ¼ M
n1 X
! Yi Yi0
M0,
(6:215)
1
from Eq. (6.213) we get the desired representation: " # n1 X 1 1=2 0 0 0 1=2 n Dn M Yi eiþ1,k : b^ k (n) bk ¼ [n M Dn ] [Dn An1 Dn ]
(6:216)
i¼1
Convergence of the denominator matrix. Here we prove that Dn An1 D0n !
G 0 0 Ir
a.s.,
(6:217)
where r is the dimension of the process {Tn }: From Eqs. (6.135) and (6.215) we obtain the following representation for the denominator matrix: 0 Dn An1 D0n ¼
B B 1 1=2 B n1 @P Vn1
Bn 1
0
0
1
0 B B ¼B @
n1 P
Bn 1
n1 P
1=2 Vn1
0 Si S0i (Bn 1 )
1 n1 P 1
0 Ti S0i (Bn 1 )
Si S0i Ti S0i Bn 1
n1 P 1 n1 P 1 n1 P 1
1 C (Bn )0 C 1 C A 0 0
Si Ti0 Ti Ti
1=2 Si Ti0 Vn1
Ir
1 C C C: A
0 1=2 Vn1
The limit of the upper left element of this matrix is given by Eq. (6.214). To prove Eq. (6.217), it suffices to show that Bn 1
n1 X
1=2 Si Ti0 Vn1 ¼
1
n1 X
0
1=2 (Bn 1 Si ) (Vn1 Ti ) ! 0:
(6:218)
1
Since the process {Tn } is nonexplosive, by Theorem 6.7.10(ii) 2
1=2 1 max kVn1 Tj k ¼ max Tj0 Vn1 Tj ! 0:
1 jn1
1 jn1
Besides, by Lemma 6.7.11 in the purely explosive case sup n
n1 X
kBn 1 Si k , 1:
(6:219)
i¼1
Clearly, Eq. (6.218) is a consequence of the above two equations. Bounding the first factor in Eq. (6.216). By Lemma 6.7.8(i) 1=2 kVn1 k ¼ O(n1=2 ) a.s:
(6:220)
This bound, definition (6.215) and Lemma 6.5.2 imply 1=2 0 kn1=2 M 0 D0n k n1=2 kM 0 k(k(Bn 1 ) k þ kVn1 k)
¼ O(n1=2 )[O(mn ) þ O(n1=2 )] ¼ O(1):
(6:221)
Bounding the last factor in Eq. (6.216). Using Eqs. (6.133) and (6.215) we get 0 1 nP 1
n B Si eiþ1,k C n1 X B1 0 B i¼1 C 1=2 1=2 Dn M Yi eiþ1,k ¼ n n C 1=2 B n1 P @ A 0 V n1 i¼1 Ti eiþ1,k i¼1
0 B B ¼B @
n1=2 Bn 1
n1 P
1
Si eiþ1,k C C C: A Ti eiþ1,k
i¼1 n1 1=2 P n1=2 Vn1 i¼1
(6:222)
Since ken k ¼ o(n1=2 ) by Lemma 6.6.1, Eq. (6.219) implies n1 1=2 n X B1 Si eiþ1,k n i¼1 (n1=2 max eiþ1,k ) 1in1
n1 X i¼1
kBn 1 Si k ! 0 a:s:
(6:223)
From Lemma 6.7.8(iii) we know that n1 1=2 X Ti eiþ1,k ¼ O((log n)1=2 ): Vn1 i¼1
(6:224)
Combining this with Eq. (6.220) we see that n1 1=2 1=2 X Vn1 Ti eiþ1,k ¼ O(n1=2 (log n)1=2 ): n i¼1
(6:225)
Equations (6.222), (6.223) and (6.225) prove that n1=2 Dn M
n1 X
Yi eiþ1,k ! 0 a:s:
(6:226)
i¼1
The strong consistency (6.212) is a consequence of Eqs. (6.216), (6.217), (6.222), (6.226) and the fact that G is positive definite a.s. B
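The theorem can be illustrated on a single simulated path. The Python sketch below uses a stable AR(2) with illustrative coefficients and i.i.d. normal errors; the OLS estimates approach the true values as the sample grows, in line with (6.212).

```python
import numpy as np

rng = np.random.default_rng(1)
b_true = np.array([1.2, -0.35])       # hypothetical AR(2) coefficients (stable roots 0.7 and 0.5)
N = 5000
y = np.zeros(N)
for t in range(2, N):
    y[t] = b_true[0] * y[t - 1] + b_true[1] * y[t - 2] + rng.standard_normal()

for n in (200, 1000, 5000):
    X = np.column_stack([y[1:n - 1], y[0:n - 2]])   # regressors (y_{t-1}, y_{t-2})
    b_hat, *_ = np.linalg.lstsq(X, y[2:n], rcond=None)
    print(n, b_hat)                                  # approaches b_true as n grows
```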
CHAPTER
7
NONLINEAR MODELS
IN THIS chapter we consider two types of nonlinear estimation techniques: NLS and the ML method. In the first case we give a full proof of the result by Phillips (2007) for the model $y_s = \beta s^{\gamma} + u_s$, $s = 1, \ldots, n$. This proof includes an expanded exposition of the Wooldridge (1994) approach to asymptotic normality of an abstract estimator. In the second case, we give an extension to unbounded explanatory variables of the result of Gouriéroux and Monfort (1981) for the binary selection model. Problems arising from the unboundedness assumption are explained. Some ideas of the proof, such as obtaining and analyzing the Lipschitz constant, can be used in models other than binary logit, and others, like the link to the linear model, are specific to the binary logit model.
7.1 ASYMPTOTIC NORMALITY OF AN ABSTRACT ESTIMATOR The theory of nonlinear estimation is complex, and some authors in this area “overcome” its complexities by hiding them under a pile of conditions. The result by Wooldridge (1994) stands out by being rigorous and applicable to nonlinear regressions for nonstationary dependent time series.
7.1.1 The Framework

We begin with an objective function $Q_n(\omega, \theta)$, where $\omega$ is the sample data and $\theta$ is the parameter in the parameter space $\Theta$. $\Theta$ is assumed to be of dimension $p$ and, correspondingly, all square matrices will be of size $p \times p$. Most of the time the dependence on $\omega$ of $Q_n$ and its derivatives is suppressed. The vector of first-order derivatives $S_n(\theta) = \nabla_\theta Q_n(\theta)'$ is called a score and the matrix of second-order derivatives $H_n(\theta) = \nabla_\theta S_n(\theta)$
is called a Hessian. By an estimator of the true value $\theta_0$ we mean a maximizing or minimizing point $\hat\theta_n$ of the objective function $Q_n(\omega, \theta)$, which, under appropriate conditions, is a solution to the first-order condition
\[
S_n(\hat\theta_n) = 0 \ \text{a.s.}
\tag{7.1}
\]
Usually such an estimator exists only asymptotically almost surely in the sense that the probability of the sample points $\omega$ for which it exists approaches unity as $n \to \infty$. We are interested in conditions sufficient for existence and consistency of such an estimator. Once we have it, a mean value expansion about $\theta_0$,
\[
0 = S_n(\hat\theta_n) = S_n(\theta_0) + H_n(\hat\theta_n, \theta_0)(\hat\theta_n - \theta_0),
\tag{7.2}
\]
can be used to investigate the asymptotic normality of $\hat\theta_n$. In Eq. (7.2) $H_n(\theta, \theta_0)$ denotes the Hessian with rows evaluated at mean values between $\theta$ and $\theta_0$. That is, if we denote $H_{n1}, \ldots, H_{np}$ the rows of $H_n$, then with some $\Delta_1, \ldots, \Delta_p \in [0, 1]$
\[
H_n(\theta, \theta_0) = \begin{pmatrix} H_{n1}(\theta_0 + \Delta_1(\theta - \theta_0)) \\ \vdots \\ H_{np}(\theta_0 + \Delta_p(\theta - \theta_0)) \end{pmatrix}
\tag{7.3}
\]
(in fact, the argument here belongs to the cube with vertices $\theta$ and $\theta_0$). The numbers $\Delta_i$ arise from an application of the mean value theorem to the $p$ components of $S_n$. It is well known that, in general, they are different (one cannot apply the mean value theorem to the whole vector $S_n$ to produce a single $\Delta$ for $i = 1, \ldots, p$). With some abuse of notation the argument in Eq. (7.3) is denoted $\bar\theta = \theta_0 + \Delta(\theta - \theta_0)$ and then $H_n(\theta, \theta_0)$ can be written as $H_n(\theta, \theta_0) = H_n(\bar\theta)$.
7.1.2 Wooldridge's Assumptions

The first assumption is a set of regularity conditions to ensure a proper smoothness of $Q_n$.

7.1.2.1 Assumption W1

1. $Q_n\colon \Omega \times \Theta \to \mathbb{R}$ is the objective function defined on the data space $\Omega$ and the parameter space $\Theta \subseteq \mathbb{R}^p$.
2. The true parameter $\theta_0$ belongs to the interior $\mathrm{int}(\Theta)$.
3. $Q_n$ satisfies standard measurability and differentiability conditions:
   a. for each $\theta \in \Theta$, $Q_n(\cdot, \theta)$ is measurable,
   b. for each $\omega \in \Omega$, $Q_n(\omega, \cdot)$ is twice continuously differentiable on $\mathrm{int}(\Theta)$.

The second assumption is about normalization of the score and Hessian at the true value $\theta_0$.
7.1.2.2 Assumption W2  There exists a sequence of nonstochastic positive definite diagonal matrices $\{D_n\}$ such that
\[
D_n^{-1}S_n(\theta_0) \xrightarrow{d} N(0, B_0)
\quad\text{and}\quad
D_n^{-1}H_n(\theta_0)D_n^{-1} \xrightarrow{p} A_0,
\tag{7.4}
\]
where $A_0$ and $B_0$ are nonrandom matrices and $A_0$ is positive definite.

The next assumption realizes the idea to provide a type of uniform convergence $H_n(\theta) \to H_n(\theta_0)$ of the Hessian normalized by something tending to infinity at a rate slower than $D_n$.

7.1.2.3 Assumption W3  There is a sequence of nonstochastic positive definite diagonal matrices $\{C_n\}$ such that
\[
C_n D_n^{-1} \to 0 \ \text{as } n \to \infty
\tag{7.5}
\]
and
\[
\max_{\theta \in N_n^r(\theta_0)}\|C_n^{-1}[H_n(\theta) - H_n(\theta_0)]C_n^{-1}\| = o_p(1),
\tag{7.6}
\]
where the neighborhood $N_n^r(\theta_0)$ of $\theta_0$ is defined by $N_n^r(\theta_0) = \{\theta \in \Theta\colon \|C_n(\theta - \theta_0)\| \le r\}$, $0 < r \le 1$. Owing to this assumption, (i) we allow each element of the Hessian to be standardized by a different function of the sample size and (ii) the neighborhood $N_n^r(\theta_0)$, over which the convergence $H_n(\theta) \to H_n(\theta_0)$ must take place, uniformly shrinks to $\theta_0$ as the sample size tends to infinity.

Everywhere in Section 7.1 Assumptions W1–W3 are assumed to hold.
7.1.3 Algebraic Lemma It is convenient to denote 0 1 Sn0 ¼ Sn (u0 ), Hn0 ¼ Hn (u0 ), An ¼ D1 n Hn Dn :
One of Wooldridge’s tricks is to use expansions about the point
u~n ¼ u0 (Hn0 )1 Sn0 ,
(7:7)
which mimics u^n [from Eq. (7.2) we see that u^n ¼ u0 (Hn (u))1 Sn0 )]. This point has the properties (u~n u0 )0 Sn0 ¼ (Sn0 )0 (Hn0 )1 Sn0 , (u~n u0 )0 Hn0 (u~n u0 ) ¼ (Sn0 )0 (Hn0 )1 Sn0 :
(7.8)
The purpose of the lemma below is to show that the difference Qn (u) Qn (u~n ) is a quadratic function. Lemma.
The objective function satisfies
1 Qn (u) Qn (u~n ) ¼ (u u~n )0 Hn0 (u u~n ) þ Rn (u, u0 ) Rn (u~n , u0 ), 2
(7:9)
where Rn (u, u0 ) ¼ (u u0 )0 [Hn (u, u0 ) Hn0 ](u u0 ): Proof. By the second-order Taylor expansion 1 Qn (u) Qn (u0 ) ¼ (u u0 )0 Sn0 þ (u u0 )0 Hn0 (u u0 ) þ Rn (u, u0 ): 2
(7:10)
Replacing u by u~n in Eq. (7.10) and using Eq. (7.8) we get 1 Qn (u~n ) Qn (u0 ) ¼ (u~n u0 )0 Sn0 þ (u~n u0 )0 Hn0 (u~n u0 ) þ Rn (u~n , u0 ) 2 1 0 0 0 1 0 ¼ (Sn ) (Hn ) Sn þ Rn (u~n , u0 ): 2
(7.11)
By Eq. (7.7) we have u0 ¼ u~n (Hn0 )1 Sn0 , so by Eq. (7.10) 1 [u u~n (Hn0 )1 Sn0 ]0 2 H 0 [u u~n (H 0 )1 S 0 ] þ Rn (u~n , u0 )
Qn (u) Qn (u0 ) ¼ [u u~n (Hn0 )1 Sn0 ]0 Sn0 þ n
n
n
1 ¼ (u u~n )0 Sn0 (Sn0 )0 (Hn0 )1 Sn0 þ (u u~n )0 Hn0 (u u~n ) 2 1 1 (u u~n )0 Hn0 (Hn0 )1 Sn0 (Sn0 )0 (Hn0 )1 Hn0 (u u~n ) 2 2 1 þ (Sn0 )0 (Hn0 )1 Sn0 þ Rn (u, u0 ): 2
Here some terms cancel out, and the result is 1 1 Qn (u) Qn (u0 ) ¼ (Sn0 )0 (Hn0 )1 Sn0 þ (u u~n )0 Hn0 (u u~n ) þ Rn (u, u0 ): (7:12) 2 2 Subtracting Eq. (7.11) from Eq. (7.12) we get Eq. (7.9).
B
7.1.4 Lemma on Convergence in Probability Lemma. If a sequence of random vectors {an } satisfies plim an ¼ 0, then there exists a sequence of positive numbers {rn } such that lim rn ¼ 0 and lim P(kan k . rn ) ¼ 0: Proof. By definition, for any d . 0 we have P(kan k . d) ! 0: Hence, letting d1 ¼ 1 we can find n1 such that P(kan k . d1 ) d1 for all n n1 : Similarly, for d2 ¼ 21 there exists n2 . n1 such that P(kan k . d2 ) d2 for all n n2 : On the kth step we put dk ¼ 2kþ1 and find nk . nk1 such that P(kan k . dk ) dk for all n nk :
(7:13)
Since <1 k¼1 [nk , nkþ1 ) ¼ [n1 , 1), for any n n1 there is a segment [nk , nkþ1 ) containing this n: We can define rn ¼ dk for nk n , nkþ1 : Then from Eq. (7.13) it follows that P(kan k . rn ) ¼ P(kan k . dk ) dk : Since the conditions n ! 1, B nk ! 1 and dk ! 0 are equivalent, this proves the lemma.
7.1.5 Bounding the Remainder in the Vicinity of u0 Lemma.
One has the bound sup jRn (u, u0 )j dn r2 for all r 1,
u [ Nnr (u0 )
where dn are random variables satisfying
dn ¼ op (1):
(7:14)
Proof. Denote
dn ¼
sup kCn1 [Hn (u) Hn0 ]Cn1 k:
u [ Nnr (u0 )
Then Eq. (7.14) follows from Eq. (7.6). Write Rn (u, u0 ) ¼ bn (u)0 Bn (u, u0 )bn (u); where bn (u) ¼ Cn (u u0 ), Bn (u, u0 ) ¼ Cn1 [Hn (u, u0 ) Hn0 ]Cn1 : Since for any r 1 we have {u : kbn (u)k r} ¼ Nnr (u0 ) # Nn1 (u0 ) and for u [ Nnr (u0 ) the argument u ¼ u0 þ D(u u0 ) of Hn (u, u0 ) belongs to Nnr (u0 ),
kCn (u u0 )k kCn (u u0 )k r, by condition (7.6) we obtain sup jRn (u, u0 )j
u [ Nnr (u0 )
sup kbn (u)k2 kBn (u, u0 )k dn r2 :
B
u [ Nnr (u0 )
7.1.6 Bounding the Remainder in the Vicinity of u~n Define N~ n (r) ¼ {u : kDn (u u~n )k r}, Vn ¼ {v: kCn (u~n u0 )k rn }: Lemma.
There exists a sequence of positive numbers {rn } such that
(i) P(Vn ) ! 1 as n ! 1: (ii) For all large n we have sup v [ Vn , u [ N~ n (rn )
jRn (u, u0 )j 4dn rn2 a:s:
(iii) For rn 1 sup jRn (u~n , u0 )j dn rn2 a:s:
v [ Vn
Proof. (i) By Assumption W2 and definition (7.7) d
1 0 1 1 Dn (u~n u0 ) ¼ Dn (Hn0 )1 Sn0 ¼ A1 n Dn Sn ! N(0, A0 B0 A0 )
and, therefore, Dn (u~n u0 ) ¼ Op (1):
(7:15)
mn ; kCn D1 n k ! 0:
(7:16)
By Eq. (7.5)
Equations (7.15), (7.16) and the bound kCn (u~n u0 )k mn kDn (u~n u0 )k imply plim Cn (u~n u0 ) ¼ 0:
(7:17)
According to Lemma 7.1.4, then there exists a positive sequence {rn } such that limn!1 rn ¼ 0 and P(Vn ) ¼ P(kCn (u~n u0 )k rn ) ! 1: This proves (i). (ii) Obviously, kDn (u u~n )k rn implies kCn (u u~n )k mn rn so N~ n (rn ) # {u : kCn (u u~n )k mn rn } , {u : kCn (u u~n )k rn} for n large: (7:18)
By the triangle inequality kCn (u u0 )k kCn (u u~n )k þ kCn (u~n u0 )k:
(7:19)
In view of Eqs. (7.18) and (7.19) we have the implication
v [ Vn , u [ N~ n (rn ) ) u [ Nn2rn (u0 ): Therefore (ii) follows from Lemma 7.1.5 and Eq. (7.17): jRn (u, u0 )j
sup v [ Vn , u [ N~ n (rn )
sup u [ Nn2rn (u0 )
jRn (u, u0 )j 4dn rn2
if n is large. (iii) Since u~n [ Nnrn (u0 ) for v [ Vn , statement (iii) follows immediately from Lemma 7.1.5. B
7.1.7 Consistency of u^n Theorem.
~ n } such that There exists a sequence of sets {V ~ n ) ! 1 as n ! 1 P(V
(7:20)
~ n there exists an estimator u^n [ N~ n (rn ) such that and for almost any v [ V Dn (u^n u0 ) ¼ Op (1):
(7:21)
Proof. Denote fn (u) ¼ Qn (u) Qn (u~n ): Obviously, the center u~n of the neighborhood N~ n (rn ) satisfies fn (u~n ) ¼ 0: We want to show that on the boundary @N~ n (rn ) of N~ n (rn ) the function fn is positive. Let ln denote the smallest eigenvalue of 0 1 An ¼ D1 n Hn Dn and let ~ n ¼ Vn > {v: 5dn ln =4}: V Since ln tends to a positive number by Assumption W2, Lemma 7.1.6(i) and ~n Eq. (7.14) imply Eq. (7.20). By Lemmas 7.1.3 and 7.1.6 for v [ V 1 0 1 ~ [Dn (u u~n )]0 (D1 min fn (u) min n Hn Dn )[Dn (u un )] u [ @N~ n (rn ) u [ @N~ n (rn ) 2 ~ þRn (u, u0 ) Rn (un , u0 ) 1 1 ln rn2 5dn rn2 ln rn2 . 0: 2 4 Thus, there are points inside N~ n (rn ) where fn takes values lower than on its boundary. Since fn is smooth by Assumption W1, it achieves its minimum inside N~ n (rn ) and
the point of minimum u^n satisfies the first-order condition (7.1). The inequality kDn (u^n u~n )k rn and Eq. (7.15) prove Eq. (7.21). B
7.1.8 Asymptotic Normality of u^n Theorem. satisfies
Under Assumptions W1– W3 the estimator u^n from Theorem 7.1.7
d 1 Dn (u^n u0 ) ! N(0, A1 0 B0 A0 ):
(7:22)
~ n we can use Eq. (7.2). PremultipliProof. By Theorem 7.1.7 for almost any v [ V 1 cation of that equation by Dn yields 0 1 ~ ^ 0 ¼ D1 n Sn þ Dn H n (un u0 ) 0 1 ~ 0 1 ^ ^ ¼ D1 n Sn þ An Dn (un u0 ) þ Dn (H n Hn )Dn Dn (un u0 ),
(7.23)
where we denote H~ n ¼ Hn (u^n , u0 ). Now we show that the end term in Equation (7.23) is negligible. Equation (7.21) implies that the mean value u n ¼ u0 þ D(u^n u0 ) satisfies Dn (u n u0 ) ¼ Op (1): By condition (7.5) then Cn (u n u0 ) ¼ op (1):
(7:24)
~ n : kCn (u n u0 )k 1} we have P(V n ) ! 1 by Eqs. (7.20) n ¼ {v [ V Denoting V n it holds that u n [ Nn (1) for all large n and therefore, by and (7.24). For v [ V Assumption W3 0 1 1 1 ~ 0 1 1 ~ kD1 n (H n Hn )Dn k kDn Cn kkCn (H n Hn )Cn kkCn Dn k ¼ op (1) 1 (D1 n Cn ¼ Cn Dn because these matrices are diagonal). This bound and consistency 0 (7.21) show that the end term in Eq. (7.23) is op (1): It follows that 0 ¼ D1 n Sn þ An Dn (u^n u0 ) þ op (1): As a result of Assumption W2 this implies Dn (u^n u0 ) ¼ 1 0 A1 n Dn Sn þ op (1): It remains to apply Assumption W2 again to prove Eq. (7.22). B
7.2 CONVERGENCE OF SOME DETERMINISTIC AND STOCHASTIC EXPRESSIONS To make the exposition of the Phillips’ method in Section 7.3 clearer, in Section 7.2 we collect some technical tools arising in his approach.
7.2.1 Approximation of Integrals by Integral Sums: General Statement

Lemma. Denote
\[
R(f) = \frac{1}{n}\sum_{t=1}^{n} f\Bigl(\frac{t}{n}\Bigr) - \int_0^1 f(t)\,dt.
\]
(i) If $f$ is absolutely continuous on $[0, 1]$, then $|R(f)| \le \|f'\|_1/n$.

(ii) Suppose $f$ is continuously differentiable on $(0, 1]$, $|f|$ is monotone on $[0, \delta_0)$ for some $0 < \delta_0 < 1$ and
\[
\sup_{\delta < t < 1}|f'(t)| \le c\,|f'(\delta)| \quad\text{for all } 0 < \delta \le \delta_0/2.
\tag{7.25}
\]
Then
\[
|R(f)| \le 2\int_0^{2\delta}|f(t)|\,dt + c\,\frac{|f'(\delta)|}{n} \quad\text{for all } 0 < \delta \le \delta_0/2.
\]
0
Proof. (i) Using the Newton – Leibniz formula and changing the order of integration we have [it ¼ ((t 1)=n, t=n)] ð t=n ð ð t f (s) ds j f 0 (t)j dt ds f n it s
it
ð
ðt
¼
ð 1 dsj f (t)j dt j f 0 (t)j dt: n 0
it (t1)=n
it
Hence, n ð n ð X 1X k f 0 k1 t : f (s) ds j f 0 (t)j d t ¼ jR( f )j f n n n t¼1 t¼1 it
it
1 t (ii) By monotonicity of j f j for small t the term f does not exceed either n n Ð Ð 1 d d 0 =2 it j f (s)j ds or itþ1 j f (s)j ds, so for n dþ1=n 2ðd ð nd 1 X t f j f (t)j dt j f (t)j dt: n t¼1 n 0
0
By the finite increments formula and condition (7.25) X X ð1 n ð 1 n t t 0 f f (s) ds f ( u ) s ds n n t¼nd t¼nd n it
d
n X 1 sup j f 0 (s)j n sd t¼1
ð ds
cj f 0 (d)j : n
it
The last two bounds result in nd 1 X t ðd jR( f )j f þ j f (s)j ds n t¼1 n 0
2ðd X ð1 1 n t j f 0 (d)j f (s) ds 2 j f (s)j ds þ c : f þ n n t¼nd n d
0
B
7.2.2 Approximation of Integrals by Integral Sums: Special Cases

Lemma. Let $i$ be a nonnegative integer and $\gamma_0$ a real number such that $\gamma_0 > -1$. Denote $f_i(t) = t^{\gamma}(\log t)^i$, where $\gamma$ belongs to a small neighborhood $O_\delta(\gamma_0) \subseteq (-1, \infty)$ of $\gamma_0$. Then uniformly in $\gamma \in O_\delta(\gamma_0)$
\[
R(f_i) = O\Bigl(\frac{1}{n}\Bigr) \quad\text{if } \gamma_0 > 1,
\tag{7.26}
\]
\[
R(f_i) = O\Bigl(\frac{(\log n)^i}{n^{(\gamma+1)/2}}\Bigr) \quad\text{if } 1 \ge \gamma_0 > -1.
\tag{7.27}
\]
Proof. From fi0 (t) ¼ gtg1 ( log t)i þ it g1 ( log t)i1 we get j fi0 (t)j ct g1 (1 þ jlog tji ):
(7:28)
Equation (7.26) follows from this bound and Lemma 7.2.1(i). In case g0 1 we see from Eq. (7.28) that fi satisfies assumptions of Lemma 7.2.1(ii). Therefore 2ðd
jR( fi )j 2 0
t g jlog tji dt þ c
j fi0 (d)j : n
(7:29)
It is easy to see that with some constants cij ða
t g (log t)i dt ¼ agþ1
ða
g
t ( log t)
kþ1
0
cij log j a:
(7:30)
j¼0
0
Indeed, for i ¼ 0 one has Then for i ¼ k þ 1
i X
Ða 0
t g dt ¼ agþ1 =(g þ 1): Suppose Eq. (7.30) is true for i ¼ k:
a ða t gþ1 ( log t)kþ1 k þ 1 g k dt ¼ g þ 1 t ( log t) dt gþ1 0 0
gþ1
¼
a
( log a) gþ1
kþ1
k k þ 1 gþ1 X a ckj log j a, gþ1 j¼0
which is of form Eq. (7.30). Equations (7.28) – (7.30) imply for small d jR( fi )j c1 dgþ1 j log dji þ c2 The choice d ¼ n1=2 yields dgþ1 ¼ Eq. (7.27).
dg1 j log dji : n
dg1 ¼ n(gþ1)=2 and finishes the proof of n B
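A quick numerical check of the rate in (7.27) is easy to carry out; in the Python sketch below the values $\gamma = -0.3$ and $i = 1$ are illustrative, and the error $R(f_1)$, scaled by $n^{(\gamma+1)/2}/\log n$, indeed remains bounded.

```python
import numpy as np

gamma = -0.3                                    # illustrative value with -1 < gamma <= 1; here i = 1
exact = -1.0 / (gamma + 1.0) ** 2               # integral_0^1 t^gamma * log t dt
for n in (10**3, 10**4, 10**5, 10**6):
    t = np.arange(1, n + 1) / n
    R = np.mean(t ** gamma * np.log(t)) - exact
    # (7.27): |R(f_1)| <= C * (log n) / n^{(gamma+1)/2}, so the scaled error stays bounded
    print(n, R, abs(R) * n ** ((gamma + 1) / 2) / np.log(n))
```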
7.2.3 Lp-Approximability of Power Sequences

Here we consider sequences $x_n = (1^{\gamma}, 2^{\gamma}, \ldots, n^{\gamma})$. For $\gamma \ge 0$, $L_p$-approximability of $x_n/\|x_n\|$ is shown in Section 2.7.3, where the fact that $\gamma$ is an integer did not play any role. Continuity, however, was important. Here we consider negative $\gamma$ when there is no continuity.

Lemma. Let $1 \le p < \infty$ and $0 > \gamma > -1/p$. Then

(i) $\|x_n\|_p^p = \dfrac{n^{\gamma p + 1}}{\gamma p + 1} + O(n^{(\gamma p + 1)/2})$;

(ii) the sequence $w_n = n^{-1/p}\bigl((1/n)^{\gamma}, \ldots, (n/n)^{\gamma}\bigr)$ is $L_p$-close to $W(s) = s^{\gamma}$. Moreover, $\|w_n - \delta_{np}W\|_p \to 0$ uniformly in $\gamma \in (-1/p + d, 0)$ for any $d > 0$.
Proof. (i) By Eq. (7.26) kxn kpp ¼
n X
t gp ¼ ngpþ1
t¼1
n gp 1X t n t¼1 n
21 3 ð ngpþ1 þ O(n(gpþ1)=2 ): ¼ ngpþ1 4 sgp ds þ O(n(gpþ1)=2 )5 ¼ gp þ 1 0
(ii) By Minkowski’s inequality nd X
kwn dnp Wkp
!1=p jwnt jp
þ
t¼1
þ
nd X
!1=p j(dnp W)t jp
t¼1
!1=p
n X
jwnt (dnp W)t j
p
:
(7.31)
t¼nd
Here, by monotonicity nd X t¼1
ðd nd gp 1X t jwnt j ¼ sgp ds ¼ cdgpþ1 : n t¼1 n p
(7:32)
0
Ð Using Ho¨lder’s inequality and the definition (dnp W)t ¼ n11=p it W(s) ds we obtain a similar bound for the second term at the right of Eq. (7.31) nd X
j(dnp W)t jp ¼ n p1
t¼1
nd X t¼1
n p1
nd ð X t¼1
ðd
p ð W(s) ds it 0
ð
1p1
W p (s) ds@ dsA
it
it
sgp ds ¼ cdgpþ1 :
¼
(7.33)
0
By the finite increments formula n X t¼nd
p ð n X g 1 t p g jwnt (dnp W)t j ¼ n s ds n n t¼nd it p ð n X 1 t 0 ¼ n W (u) s ds n n t¼nd it sup jW 0 (s)jp sd
n X t¼nd
1 sup jW 0 (s)jp : np sd
1 n pþ1 (7.34)
Since W satisfies condition (7.25), Eq. (7.34) implies n X
!1=p jwnt (dnp W)t j
p
t¼nd
c g1 d : n
(7:35)
Equations (7.31), (7.32), (7.33), and (7.35) yield dg1 gþ1=p : þ kwn dnp Wkp c d n The choice dgþ1=p ¼ dg1 =n or, equivalently, d ¼ n(1þ1=p) finishes the proof of Lp -approximability with the bound kwn dnp Wkp ¼ O (n( pgþ1)=( pþ1) ): The fact that this convergence is uniform in g [ (1=p þ d, 0) follows from the observation that the constants in Eqs. (7.32), (7.33), and (7.35) are uniformly bounded. B
7.2.4 Definition of Auxiliary Deterministic and Stochastic Expressions Phillips defines the matrix Cn required in the Wooldridge framework by Cn ¼ ng0 þ1=2d diag[1, log n]: With this definition the neighborhood from Assumption W3 becomes Nnr (u0 ) ¼ {u [ Q: kCn (u u0 )k r} 2
2
¼ {u : [ng0 þ1=2d (b b0 )] þ [ng0 þ1=2d ( log n)(g g0 )] r 2 }:
(7.36)
In particular, for u [ Nn1 (u0 ) jb b0 j
1 ng0 þ1=2d
,
jg g0 j
1 ng0 þ1=2d
log n
:
(7:37)
We need two types of deterministic expressions: D1ni ¼
n X
(bi s2g bi0 s2g0 ) logi s,
i ¼ 0, 1, 2,
(7.38)
s¼1
D2ni ¼
n X s¼1
(bsg b0 sg0 )bi sg logiþ1 s, i ¼ 0, 1,
(7.39)
and two types of stochastic ones: S1ni ¼
n X
us (bi sg bi0 sg0 ) logiþ1 s,
i ¼ 0, 1,
(7.40)
s¼1 n s g 1 X s S2ni (g) ¼ pffiffiffi us logi , n n n s¼1
i ¼ 0, 1, 2, 3:
(7.41)
In these definitions, (b, g) [ Nnr (u0 ): Therefore Eqs. (7.38), (7.39), and (7.40) should converge to zero in some sense.
7.2.5 Bounding Deterministic Expressions Lemma. Let g0 . 1=2: With some ( b , g ) between ( b0 , g0 ) and ( b, g) uniformly in ( b, g) [ Nn1 (u0 ) we have c logi n, ng0 2g 1=2d c jD2ni j g gg 1=2d logiþ1 n: 0 n
jD1ni j
(7.42) (7.43)
Proof. With some ( b , g ) between ( b0 , g0 ) and ( b, g) by the finite increments formula
bi s2g bi0 s2g0 ¼ i(b )i1 s2g (b b0 ) þ 2(b )i s2g (g g0 ) log s:
(7:44)
Therefore Eq. (7.38) becomes D1ni ¼ i( b )i1 ( b b0 )
n X
s2g logi s þ 2(b )i (g g0 )
s¼1
n X
s2g logiþ1 s:
(7:45)
s¼1
This requires estimation of Tni ¼
n X
s2g logi s,
i ¼ 0, . . . , 4:
s¼1
Substitution of log s ¼ log (s=n) þ log n yields Tni ¼
n X
s2 g
s¼1
¼ n2g
i X j¼0
þ1
i X j¼0
s ij log n Cji log j n
Cji ( logij n)
n 1X s s2g log j : n s¼1 n
(7:46)
As a result of Eq. (7.37) for all large n we have 2g þ 1 . d1 for some d1 . 0: Hence, by Lemma 7.2.2 jTni j c1 n2g
þ1
logi n:
(7:47)
Now Eqs. (7.37), (7.45), and (7.47) imply Eq. (7.42):
jD1ni j
c2 n2g þ1 logi n c3 n2g þ1 logiþ1 n ¼ c4 n2g g0 þ1=2þd logi n: þ g þ1=2d g þ1=2 d 0 0 log n n n
In the case of D2ni instead of Eqs. (7.44) and (7.45) we have, respectively,
bsg b0 sg0 ¼ sg (b b0 ) þ b sg (g g0 ) log s, D2ni ¼ bi (b b0 )
n X
sg
þg
logiþ1 s þ bi b (g g0 )
s¼1
n X
sg
þg
logiþ2 s,
(7:48)
i ¼ 0, 1:
s¼1
P Instead of Eq. (7.46) we need to estimate Tni ¼ ns¼1 sg þg logi s, i ¼ 1, 2, 3: In addition to Lemma 7.2.2, in the derivation of the analog of Eq. (7.47) we have to apply the Ho¨lder inequality ! ! n g þg n 2g n 1=2 1 X 1=2 1 X s 2g X s s 1 s s s log j log j log j : n s¼1 n n n s¼1 n n n s¼1 n n The result is jTni j c1 ng
þgþ1
jD2ni j
c2 ng
logi n and, hence,
þgþ1
logiþ1 n
ng0 þ1=2d
¼ c4 ng
þgg þ1=2þd 0
þ
c3 ng þgþ1 logiþ2 n ng0 þ1=2d log n
logiþ1 n:
B
7.2.6 Multiplication of Lp-Approximable Sequences by Continuous Functions Suppose sequences {wn (g): n ¼ 1, 2, . . .} depend on a parameter g [ Gn : We say that {wn (g)} is Lp -close to W [ Lp (0, 1) uniformly on Gn if sup kwn (g) dnp Wkp ! 0:
g [ Gn
The lemma below shows that this property is preserved under multiplication by continuous functions.
Lemma. Let f [ C[0, 1]: Denote M(s) ¼ W(s)f (s) and consider the product sequences n 1 , . . . , wnn (g) f : mn (g) ¼ wn1(g) f n n If {wn (g)} is Lp -close to W [ Lp (0, 1) uniformly on Gn , then {mn (g)} is Lp -close to M uniformly on Gn : Proof. Denote zn (g) ¼ mn (g) dnp M: We need to prove that sup kzn (g)kp ! 0:
(7:49)
g [ Gn
As f is uniformly continuous, for any 1 . 0 there exists d . 0 such that js s0 j d implies j f (s) f (s0 )j 1 and sups [ it j f (t=n) f (s)j 1 for n 1=d: Now we bound the tth component of zn (g): t [dnp (Wf )]t j[znt (g)]t j ¼ wnt (g) f n ð t 1=q f W(s) ds wnt (g) n n it ð h t i 1=q þ n W(s) f f (s) ds n it k f kC jwnt (g) (dnp W)t j þ 1(dnp jWj)t : Hence by boundedness of dnp kzn (g)kp k f kC sup kwn (g) dnp Wkp þ 1kWkp : g [ Gn
This proves Eq. (7.49).
B
7.2.7 Convergence in Distribution of $S_{ni}^2(\gamma_0)$

7.2.7.1 Assumption P1  The errors $u_t$ in the model $y_s = \beta s^{\gamma} + u_s$ are linear processes
\[
u_t = \sum_{j \in \mathbb{Z}} c_j e_{t-j}, \quad t \in \mathbb{Z},
\]
where $\{e_t, \mathcal{F}_t\colon t \in \mathbb{Z}\}$ is a m.d. array, $\sum_j|c_j| < \infty$, second conditional moments are constant, $E(e_t^2 \mid \mathcal{F}_{t-1}) = \sigma_e^2$ for all $t$ and the squares $e_t^2$ are uniformly integrable.

Lemma. If $\gamma_0 > -1/2$ and $u_t$ satisfy Assumption P1, then
\[
S_{ni}^2(\gamma_0) = \frac{1}{\sqrt{n}}\sum_{s=1}^{n} u_s\Bigl(\frac{s}{n}\Bigr)^{\gamma_0}\log^i\frac{s}{n}
\xrightarrow{d} N\Biggl(0,\ \sigma_e^2\Bigl(\sum_j c_j\Bigr)^2\int_0^1 s^{2\gamma_0}\log^{2i}s\,ds\Biggr).
\]
0
Proof. Let d . 0 be such that g0 d . 1=2: The sequence 1 s g0 d pffiffiffi : s ¼ 1, . . . , n , n ¼ 1, 2, . . . n n is L2 -close to W(s) ¼ sg0 d by Lemma 7.2.3(ii). Since the function f (s) ¼ sd logi s is continuous on [0, 1], the product sequence 1 s g0 d s d i 1 s g0 i pffiffiffi p ffiffi ffi log s ¼ log s n n n n n is L2 -close to M(s) ¼ W(s) f (s) ¼ sg0 logi s by Lemma 7.2.6 (where Gn ¼ {g0 }). The conclusion of this lemma follows from Theorem 3.5.2. B 2 7.2.8 A Uniform Bound on Sni (g )
Lemma. If g0 . 1=2 and ut satisfy Assumption P1, then uniformly in g [ Nn1 (u0 ) [see Eqs. (7.36) and (7.41) for the definitions] S2ni (g) ¼ Op (1):
(7:50)
Proof. To make use of Lemma 7.2.7, we approximate S2ni (g) by S2ni (g0 ): For the approximation to work, we need to bound g away from 1=2: Therefore we write n n s g s gd s d 1 X s 1 X s S2ni (g) ¼ pffiffiffi us logi ¼ pffiffiffi us logi , n n n n n n s¼1 n s¼1
where d . 0 is small and g d is uniformly bounded away from 1=2: By the mean value theorem there are points gs,n between g and g0 such that s gd s g0 d s gs,n d s ¼ (g g0 ) log : n n n n 1 s gd : s ¼ 1, . . . , n we have Denoting wn (g) ¼ pffiffiffi n n n gd s g0 d 2 1X s 2 kwn (g) wn (g0 )k2 ¼ n s¼1 n n ¼ jg g 0 j2
n 2(g d) s,n 1X s s log2 : n s¼1 n n
By Lemma 7.2.2 uniformly in n and gs,n the estimate n 2(g d) 1X s s,n s log2 c n s¼1 n n
is true. Thus, by Eq. (7.37) uniformly in g [ Nn1 (u0 ), kwn (g) wn (g0 )k2
c : ng0 þ1=2d log n
We know from Lemma 7.2.3 that wn (g0 ) is L2 -close to W(s) ¼ sg0 d , so the above inequality implies sup g [ Nn1 (u0 )
kwn (g) dn2 Wk2 ! 0:
With f (s) ¼ sd logi s then by Eq. (7.49) sup g [ Nn1 (u0 )
kmn (g) dn2 Mk2 ! 0,
where M(s) ¼ W(s) f (s) ¼ sg0 logi s and
1 s gd s d i s 1 s g i s ¼ pffiffiffi : log log mn (g) ¼ pffiffiffi n n n n n n n Now we use identity (3.25), orthogonality of m.d.’s and Lemma 2.3.2 to get 2
kS2ni (g) S2ni (g0 )k2
2
n n
1 X s g s g0 1 X
is i s
¼ pffiffiffi us log pffiffiffi us log
n s¼1 n n n s¼1 n n
2
2
n
X
¼
u [m (g) mn (g0 )]
s¼1 s n 2
2
X
¼
ei {Tn [mn (g) mn (g0 )]}i
i [ Z 2
¼
se2 kTn [mn (g)
se2
X i[Z
!2
mn (g0 )]k22
jci j kmn (g) mn (g0 )k22 :
(7:51)
Since mn (g0 ) is L2 -close to M, this bound and Eq. (7.51) imply S2ni (g) ¼ S2ni (g0 ) þ Op (1) uniformly in g [ Nn1 (u0 ): Lemma 7.2.7 and Eq. (7.52) prove Eq. (7.50).
(7:52) B
1 7.2.9 Bounding Sni
Lemma. satisfy
If g0 . 1=2 and ut satisfy Assumption P1, then the variables (7.40)
S1ni ¼ Op
( log n)iþ1 uniformly in u [ Nn1 (u0 ): ng0 g d
Proof. Using Eq. (7.48) rewrite Eq. (7.40) as S1ni ¼ i(b )i1 (b b0 )
n X
us sg logiþ1 s þ (b )i (g g0 )
s¼1
n X
us sg logiþ2 s,
i ¼ 0, 1:
(7:53)
s¼1
Therefore we need to bound n X
Uni ¼
us sg logi s, i ¼ 1, 2, 3:
s¼1
Substituting log s ¼ log(s=n) þ log n we get Uni ¼
n X
u s sg
s¼1
¼
i X
s Cji log j logij n n j¼0
i X
Cji ( logij n)ng
þ1=2
j¼0
¼ ng
þ1=2
i X
n s g 1 X s pffiffiffi us log j n n n s¼1
Cji ( logij n)S2nj (g ):
j¼0
By Lemma 7.2.8 Uni ¼ Op (ng S1ni
¼ Op
ng
¼ Op (ng uniformly in u [ Nn1 (u0 ):
þ1=2
þ1=2
logi n): This bound, Eqs. (7.37) and (7.53) yield
logiþ1 n
ng0 þ1=2d
g0 þd
g þ1=2 iþ2 n log n þ Op g þ1=2d log n n0
logiþ1 n) B
7.3 NONLINEAR LEAST SQUARES

7.3.1 The Model and History

Asymptotically collinear regressors arise in nonlinear regressions of type
\[
y_s = \beta s^{\gamma} + u_s, \quad s = 1, \ldots, n,
\tag{7.54}
\]
where the trend component $\gamma > -1/2$ is to be estimated along with the regression coefficient $\beta$. Let $\beta_0$ and $\gamma_0$ denote the true values of the parameters. The first-order expansion for $\beta s^{\gamma}$ is
\[
\beta s^{\gamma} \approx \beta_0 s^{\gamma_0} + s^{\gamma_0}(\beta - \beta_0) + (\beta_0 s^{\gamma_0}\log s)(\gamma - \gamma_0)
= \beta s^{\gamma_0} + (\gamma - \gamma_0)\beta_0 s^{\gamma_0}\log s.
\tag{7.55}
\]
Thus, the linearized form of Eq. (7.54) involves the regressors sg0 and sg0 log s, which are asymptotically collinear and whose second moment matrix is asymptotically singular upon appropriate (multivariate) normalization. Wu (1981, p. 509) noted that model (7.54) failed his conditions [which require a single normalizing quantity and a positive definite limit matrix for the second moment matrix of the linearized version of Eq. (7.54) ys ¼ bsg0 þ (g g0 )b0 sg0 log s þ us ]. More precisely, Wu noted that the model (7.54) satisfies his conditions for strong consistency of the least-squares estimator u^ ¼ (b^ , g^ ), but not his conditions for asymptotic normality. There are two reasons for the failure: 1. the Hessian requires different standardizations for the parameters b and g (whereas Wu’s approach uses a common standardization) and 2. the Hessian is asymptotically singular because of the asymptotic collinearity of the functions sg0 and sg0 log s that appear in the score (whereas Wu’s theory requires the variance matrix to have a positive definite limit). Phillips (2007) derived the asymptotic distribution of the NLS estimator for Eq. (7.54). The theory is very instructive because of: 1. his choice of standardizing matrices, 2. the way he modified the Wooldridge approach to suit Eq. (7.54) and 3. the possibility to adapt the theory to models different from Eq. (7.54). This section contains a full proof of his results using, where necessary, statements based on Lp -approximability instead of those based on Brownian motion. While it is in general a matter of taste which approach to use, in one case the Brownian motion methods do not seem adequate for applications. Phillips (2007, Lemma 6.1) is proved for smooth functions, while its purported application Phillips (2007, p. 607) is to functions which may not be continuous.
7.3.2 The Objective Function, Score and Hessian

In the NLS method, the objective function for model (7.54) is defined by
\[
Q_n(\theta) = \sum_{s=1}^{n}(y_s - \beta s^{\gamma})^2, \quad\text{where } \theta = (\beta, \gamma).
\]
The estimator $\hat\theta$, by definition, solves the extremum problem
\[
\hat\theta = \arg\min_{\theta} Q_n(\theta)
\]
(we minimize over $\theta$ the objective function and take the minimizing point as the estimator). As a result of smoothness of $Q_n$, the estimator satisfies the first-order condition
\[
S_n(\hat\theta) = 0.
\tag{7.56}
\]
Since scaling the derivatives by $1/2$ does not change the validity of the expansion (7.2) and does not affect asymptotic conditions, we define the score by [see the derivatives of $\beta s^{\gamma}$ in Eq. (7.55)]
\[
S_n(\theta) = \frac{1}{2}\nabla_\theta Q_n(\theta)' = -\sum_{s=1}^{n}\begin{pmatrix} s^{\gamma} \\ \beta s^{\gamma}\log s \end{pmatrix}(y_s - \beta s^{\gamma}).
\tag{7.57}
\]
The Hessian for Eq. (7.57) is
\[
H_n(\theta) = \nabla_\theta S_n(\theta) = \sum_{s=1}^{n} h_s(\theta),
\tag{7.58}
\]
where
\[
h_s(\theta) = \left(\frac{\partial}{\partial\beta}\begin{pmatrix} -s^{\gamma}(y_s - \beta s^{\gamma}) \\ -\beta s^{\gamma}(y_s - \beta s^{\gamma})\log s \end{pmatrix},\
\frac{\partial}{\partial\gamma}\begin{pmatrix} -s^{\gamma}(y_s - \beta s^{\gamma}) \\ -\beta s^{\gamma}(y_s - \beta s^{\gamma})\log s \end{pmatrix}\right).
\]
Replacing ys by its expression from the true model ys ¼ b0 sg0 þ us we find the elements of hs (u) to be hs11 (u) ¼ hs12 (u) ¼ hs21 (u) ¼
@ g s ( ys bsg ) ¼ s2g , @b
(7:59)
@ ( ys bsg )bsg log s @b
¼ ys sg log s þ 2bs2g log s ¼ b0 sg0 þg log s us sg log s þ 2bs2g log s ¼ bs2g log s us sg log s þ (bsg b0 sg0 )sg log s,
(7:60)
hs22 (u) ¼
@ ( ys bsg )bsg log s @g
¼ b2 s2g log2 s (b0 sg0 þ us bsg )bsg log2 s ¼ b2 s2g log2 s us bsg log2 s þ (bsg b0 sg0 )bsg log2 s:
(7:61)
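The formulas (7.57)–(7.61) can be verified against numerical differentiation of $Q_n$. The Python sketch below does this on simulated data; the true values $\beta_0 = 2$, $\gamma_0 = 0.25$ and i.i.d. $N(0,1)$ errors are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta0, gamma0 = 200, 2.0, 0.25
s = np.arange(1, n + 1.0)
y = beta0 * s ** gamma0 + rng.standard_normal(n)     # data from (7.54)

def Q(th):
    b, g = th
    return np.sum((y - b * s ** g) ** 2)

def score(th):                    # (7.57): half the gradient of Q_n
    b, g = th
    r = y - b * s ** g
    return -np.array([np.sum(r * s ** g), np.sum(r * b * s ** g * np.log(s))])

def hessian(th):                  # elements as in (7.59)-(7.61)
    b, g = th
    sg, ls = s ** g, np.log(s)
    r = y - b * sg
    h11 = np.sum(sg ** 2)
    h12 = np.sum(b * sg ** 2 * ls - r * sg * ls)
    h22 = np.sum(b ** 2 * sg ** 2 * ls ** 2 - r * b * sg * ls ** 2)
    return np.array([[h11, h12], [h12, h22]])

th = np.array([1.8, 0.3])         # an arbitrary evaluation point
eps = 1e-6
num_grad = np.array([(Q(th + eps * e) - Q(th - eps * e)) / (2 * eps) for e in np.eye(2)])
print(num_grad / 2 - score(th))   # approximately zero: half-gradient equals the score
num_hess = np.array([(score(th + eps * e) - score(th - eps * e)) / (2 * eps) for e in np.eye(2)])
print(num_hess - hessian(th))     # approximately zero: derivative of the score equals the Hessian
```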
7.3.3 Lemma on Convergence of the Score Define the normalization matrix Dn ¼ ng0 þ1=2 diag[1, log n]: Lemma.
(7:62)
If g0 . 1=2 and ut satisfy Assumption P1, then
D1 n Sn (u0 )
s2 ! N 0, 2g0 þ 1 d
1
b0
b0
b20
!! 2
, s ; se
X
!2
cj :
j
Proof. Equations (7.57) and (7.62) imply g0 1=2 D1 n Sn (u0 ) ¼ n
0 B B ¼ B B @
1 0 0 1= log n
X n s¼1
n g 1 X s 0 pffiffiffi us n n s¼1 n g X b s 0 pffiffiffi 0 (log s)us n log n s¼1 n
sg0 us b0 sg0 ( log s)us 1 C C C: C A
(7:63)
Replacing log s ¼ log(s=n) þ log n and using notation (7.41) we get, by Lemma 7.2.7, 0
D1 n Sn (u0 )
1
1 @ A S2n0 (g0 ) þ op (1), ¼ ¼ b0 2 2 b Sn1 (g0 ) þ b0 Sn0 (g0 ) 0 log n S2n0 (g0 )
d
where S2n0 (g0 ) ! N½0, s2 =(2g0 þ 1) : This proves the lemma.
B
7.3.4 Asymptotic Representation of the Normalized Hessian Lemma. Suppose g0 . 1=2 and ut satisfy Assumption P1. With Hn defined by Eqs. (7.58) – (7.61) from Section 7.3.2 and Dn defined by Eq. (7.62) for the matrix
1 An (u0 ) ¼ D1 n Hn (u0 )Dn , we have asymptotically
0 B 1B An (u0 ) ¼ B g1 B @
b0
1 b0 1
1 g1 log n
b20 1
1 1 g1 log n
2 2 þ 2 2 g1 log n g1 log n
1 C C 1 C C þ Op (n ), A
where g1 ¼ 2g0 þ 1 and 1 is some number from (0, g0 þ 1=2): Proof. By Eqs. (7.58) – (7.61) from Section 7.3.2
Hn (u) ¼
s2g0
n X s¼1
b0 s2g0 log s us sg0 log s
!
b0 s2g0 log s us sg0 log s b20 s2g0 log2 s us b0 sg0 log2 s
:
(7:64)
This equation and definition (7.62) lead to the following expressions for the elements of An (u0 ): An11 (u0 ) ¼
1
n X
n2g0 þ1
s¼1
s2g0 ¼
n 2g 1X s 0 , n s¼1 n
s 2g0 g0 þ log n ( b s u s ) log s 0 n2g0 þ1 log n s¼1 n " # 1 n 2g X 1 1X s 0 s log j ¼ b0 j n n n j¼0 log n s¼1 " # 1 n s g0 X 1 1 X js pffiffiffi : us log g0 þ1=2 log j n n n n s¼1 j¼0 n
An12 (u0 ) ¼
1
(7:65)
n X
(7:66)
Similarly, replacing log s by log ns þ log n, we have n 2 X X 1 2 2g0 g0 2 2j s log j n ( b s u b s ) C log s 0 0 j 2 n n2g0 þ1 log n s¼1 j¼0 " # 2 n 2g X Cj2 1X s 0 2 2j s log ¼ b0 2j n n n s¼1 n j¼0 log " # 2 n s g0 X Cj2 1 X 2j s pffiffiffi : us log b0 g0 þ1=2 log2j n n n n s¼1 j¼0 n
An22 (u0 ) ¼
(7:67)
The bounds from Lemma 7.2.2 can be joined as R( fi ) ¼ O(n1 )
(7:68)
with some 1 [ (0, g0 þ 1=2): It is easy to calculate that ð1 t 0
2g0
1 dt ¼ , g1
ð1 t
2g0
0
1 log t dt ¼ 2 , g1
ð1
t 2g0 log2 t dt ¼
0
2 : g13
(7:69)
Hence, Eq. (7.65) asymptotically is An11 (u0 ) ¼
1 þ O(n1 ): g1
(7:70)
For Eq. (7.66) Lemma 7.2.7 and Eqs. (7.68) and (7.69) imply An12 (u0 ) ¼
b0 b 1 S2 (g ) þ 0 þ O(n1 ) g þ1=2 log n n1 0 n0 g12 log n g1 1 ng0 þ1=2
S2n0 (g0 ) ¼
b0 b 2 0 þ Op (n1 ): g1 g1 log n
(7:71)
Finally, for Eq. (7.67), we use Eqs. (7.68) and (7.69) and Lemma 7.2.7 to obtain
An22 (u0 ) ¼ b20
¼
b20
2 X
1 2j n log j¼0
21 3 ð Cj2 4 t 2g0 log2j t dt þ O(n1 )5 þ Op (n1 ) 0
1 2 2 þ Op (n1 ): þ g1 g21 log n g31 log2 n
Equations (7.70) –(7.72) prove the lemma.
(7:72)
B
7.3.5 The Right Normalization of the Hessian From the proof of Theorem 7.1.8 we can see that it is really not the convergence of the normalized Hessian An (u0 ) that is important [see Eq. (7.4)] but the convergence of the inverse A1 n (u0 ): The lemma below shows that if Hn (u0 ) is normalized by Dn this inverse does not converge. The inverse converges if Dn is replaced by Fn ¼
1 Dn ¼ ng0 þ1=2 diag[1= log n, 1]: log n
Lemma.
Suppose b0 = 0, g0 . 1=2 and ut satisfy Assumption P1.
(i) The inverse of An (u0 ) is 0
1 2 2 1 1 B 1 g log n þ g2 log2 n b g log n 1 C 1 0 1 C 1 1 3 2 B An (u0 ) ¼ g1 log nB C @ A 1 1 1 1 2 b0 g1 log n b0 þ Op (n1 )
and, hence, A1 n (u0 ) diverges as n ! 1: (ii) Denote En ¼ Fn1 Hn (u0 )Fn1 : Then 0 2 2 2 þ b 1 0 g1 log n g21 log2 n g3 B B En1 ¼ 12 B b0 @ 1 1 b0 g1 log n
b0
1 1 1 C g1 log n C C A 1
þ Op (n1 )
b20 b0
g3 ¼ 12 b0
b 0 1
!
þ Op
1 : log n
Proof. (i) By Lemma 7.3.4 det An (u0 ) ¼
b20 2 2 2 1 þ 1 þ 1 g1 log n g 21 log2 n g1 log n g12 log2 n g 21 þ Op (n1 )
¼
b20 þ Op (n1 ): g14 log2 n
This equation, Lemma 7.3.4 and Eq. (4.62) prove (i). Part (ii) immediately follows from (i).
B
7.3.6 The Order of Eigenvalues of An (u0) Lemma.
If g0 . 1=2 and Assumption P1 is satisfied, then An (u0 ) has eigenvalues
1 þ b20 1 , l1 ¼ þ Op g1 log n
b20 (1 b20 ) 1 þ Op : l2 ¼ 3 g1 (1 þ b20 ) log2 n log3 n
(7:73)
1 : From Lemma 7.3.4 we see that the eigenvalues g1 log n m1 , m2 of g1 An (u0 ) are the roots of the equation b0 (1 an ) 1m det(g1 An (u0 ) mI) ¼ det b0 (1 an ) b20 (1 2an þ 2a2n ) m Proof. Denote an ¼
¼ m2 m[b20 (1 2an þ 2a2n ) þ 1] þ b20 (1 2an þ 2a2n ) b20 (1 2an þ a2n ) ¼ m2 m[(1 þ b20 ) þ b20 (2an þ 2a2n )] þ b20 a2n ¼ 0: Hence
m1,2
(1 þ b20 ) þ b20 (2an þ 2a2n ) + ¼ 2
pffiffiffiffi D
,
(7:74)
where the discriminant D is D ¼ (1 þ b20 )2 þ 2b20 (1 þ b20 )(2an þ 2a2n ) þ b40 (4a2n 8a3n þ 4a4n ) 4b20 a2n ¼ (1 þ b20 )2 þ 2b20 (1 þ b20 )(2an þ 2a2n ) þ b40 (8a3n þ 4a4n ) þ 4b20 a2n (b20 1): Using the approximation
pffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffi an a þ an a þ pffiffiffi we have 2 a
pffiffiffiffi 2b2 a2 (b2 1) 1 : D ¼ (1 þ b20 ) þ b20 (2an þ 2a2n ) þ 0 n 0 2 þ Op (1 þ b0 ) log3 n Therefore by Eq. (7.74)
m1 ¼ 1 þ
b20
þ Op
1 , log n
b20 a2n (1 b20 ) 1 b20 (1 b20 ) 1 ¼ 2 þ Op : m2 ¼ þ Op 1 þ b20 g1 (1 þ b20 ) log2 n log3 n log3 n Upon division by g1 we get Eq. (7.73).
B
7.3.7 Regulating Convergence of the Hessian With Lemma 7.3.6 we have finished preparing the ingredients necessary for modifying Wooldridge’s Assumption W2 (Section 7.1.2.2). In this and Section 7.3.8 we deal with definitions and statements required for his Assumption W3 (Section 7.1.2.3).
Taking some d [ (0, (g0 þ 1=2)=3) put Cn ¼
1 Dn ¼ ng0 þ1=2d diag[1, log n] nd
(7:75)
and Dn ¼ Cn1 [Hn (u) Hn (u0 )]Cn1 : Lemma.
The elements of the matrix Dn are (g1 ¼ 2g0 þ 1),
Dn11 ¼
Dn12 ¼
1
n X
ng1 2d
s¼1
(s2g s2g0 ), " n X
1 ng1 2d log n
n X
g
(bs2g b0 s2g0 ) log s
s¼1
g0
us (s s ) log sþ
s¼1
Dn22
(7:76)
n X
# g
g0
g
(bs b0 s )s log s ,
(7:77)
s¼1
" n X 1 ¼ g 2d 2 (b2 s2g b20 s2g0 ) log2 s log n s¼1 n1
n X
us (bsg b0 sg0 ) log2 s þ
s¼1
n X
# (bsg b0 sg0 )bsg log2 s :
(7:78)
s¼1
Proof. Using Eqs. (7.58) – (7.61) from Section 7.3.2 and Eq. (7.64) we find the elements of the difference Gn ¼ Hn (u) Hn (u0 ): Gn11 ¼
n X
(s2g s2g0 ),
s¼1
Gn12 ¼
n X
(bs2g b0 s2g0 ) log s
n X
s¼1
þ
us (sg sg0 ) log s
s¼1
n X
(bsg b0 sg0 )sg log s,
s¼1
Gn22 ¼
n X
(b2 s2g b20 s2g0 ) log2 s
s¼1
þ
n X
n X
us (bsg b0 sg0 ) log2 s
s¼1
(bsg b0 sg0 )bsg log2 s:
s¼1
These equations and Eq. (7.7) imply Eqs. (7.76) – (7.78).
B
7.3.8 Verifying Wooldridge’s Assumption W3 Lemma.
Provided that g0 . 1=2 and Assumption P1 holds we have kDn (u, u0 )k ¼ op (1):
sup u [ Nn1 (u0 )
Proof. To bound Eq. (7.76) we use Eq. (7.42) with i ¼ 0: Dn11 ¼ O
1 ng0 2g 1=2dþg1 2d
¼O
1
n3g0 2g þ1=23d
:
This tends to zero because g is close to g0 and g1 þ 1=2 3d . 0: Equation (7.77) is estimated with the help of Eqs. (7.42) and (7.43) and Lemma 7.2.9:
Dn12 ¼
1
log n
log n
log n
Op g 2g 1=2d þ g g d þ g gg 1=2d ng1 2d log n n0 n0 n0 1 1 1 ¼ Op 3g 2g þ1=23d þ 3g g þ13d þ 3g gg þ1=23d n 0 n 0 n 0 1 ¼ Op 1 n
with some 1 . 0: Here we remember that g and g are close to g0 : Finally, for (7.78) we use again Eqs. (7.42) and (7.43) and Lemma 7.2.9 to get
Dn22
1 log2 n log2 n log2 n ¼ g 2d 2 O g 2g 1=2d þ Op g g d þ O g gg 1=2d n0 n0 n0 log n n1 1 1 1 ¼ Op 3g 2g þ1=23d þ 3g g þ13d þ 3g gg þ1=23d n 0 n 0 n 0 1 ¼ Op 1 n
with some 1 . 0:
B
7.3.9 Summary of Phillips’ Statements Here we review Wooldridge’s assumptions in the light of the Phillips propositions proved so far. Everywhere the next two assumptions are assumed to hold.
7.3.9.1 Assumption P1 The errors ut in the model ys ¼ bsg þ us are linear processes ut ¼
X
cj e jt , t [ Z,
j[Z
P where {et , F t : t [ Z} is a m.d. array, j jcj j , 1, the second conditional moments are constant, E(e2t j F t1 ) ¼ se2 for all t, and the squares e2t are uniformly integrable. This condition, introduced in Section 7.2.7, provides convergence of stochastic expressions. 7.3.9.2
Assumption P2
(i) g0 . 1=2, (ii) b0 = 0 and (iii) jb0 j , 1: The inequality g0 . 1=2 is necessary for the integrability of f (s) ¼ s2g0 : The condition b0 = 0 provides the existence of A1 n (u0 ) (see Lemma 7.3.5). The inequality jb0 j , 1 is imposed to ensure positivity of the eigenvalues of An (u0 ) (see Lemma 7.3.6). This latter condition is missing in the Phillips paper. Assumption W1 obviously holds for the objective function Qn defined in Section 7.3.2. By Lemma 7.3.3 with Dn ¼ ng0 þ1=2 diag[1, log n] the first part of Assumption W2 is satisfied in the form D1 n Sn (u0 )
s2 1 ! N(0, B0 ), where B0 ¼ 2g0 þ 1 b0 d
b0 : b20
(7:79)
While the convergence (7.4) is true by Lemma 7.3.4, the part of Assumption W2 concerning positive definiteness of A0 is not satisfied. Phillips noticed that, in fact, the inverse Hn1 (u0 ) needs to be normalized. Introducing Fn ¼ 1=(log n) Dn and En ¼ Fn1 Hn (u0 )Fn1 he proved [see Lemma 7.3.5(ii)] En1
g3 ! 12 b0 p
b20 b0
b 0 , where g1 ¼ 2g0 þ 1: 1
(7:80)
The matrices Cn defined by Cn ¼ (1=nd )Dn trivially satisfy the first part of Assumption W3: d ¼ o(1): Cn D1 n ¼n
(7:81)
Finally, with the neighborhood Nnr (u0 ) ¼ {u: kCn (u u0 )k r}, by Lemma 7.3.8 the second part of Assumption W3 holds: max kCn1 [Hn (u) Hn (u0 )]Cn1 k ¼ op (1):
u [ Nn1 (u0 )
(7:82)
The algebraic Lemma 7.1.3 does not depend on Eq. (7.80) and continues to be true. The next lemma of the Wooldridge framework, Lemma 7.1.5, uses only Eq. (7.82) and is also true. The remaining statements, starting from Lemma 7.1.6, need a revision.
7.3.10 Bounding the Remainder in the Vicinity of u~n Vn is the same as in Section 7.1.6, Vn ¼ {v: kCn (u~n u0 )k rn }: In the definition of N~ n (r), however, Dn is replaced by Fn : N~ n (r) ¼ {u : kFn (u u~n )k r}:
(7:83)
Lemma. Let g0 . 1=2 and let Assumption P1 hold. There exists a sequence of positive numbers {rn } such that (i) P(Vn ) ! 1 as n ! 1: (ii) For all large n we have sup v [ Vn , u [ N~ n (rn )
jRn (u, u0 )j 4dn rn2 a:s:
(iii) For rn 1 sup jRn (u~n , u0 )j dn rn2 a:s:
v [ Vn
Proof. By definition (7.7) Fn (u~n u0 ) ¼ Fn (Hn0 )1 Sn0 ¼ En1 Fn1 Sn0 :
(7:84)
Here, by Eq. (7.79) and the definition of Fn we have 0 Fn1 Sn0 ¼ (log n)D1 n Sn ¼ Op (log n):
(7:85)
Therefore Eqs. (7.80) and (7.84) imply Fn (u~n u0 ) ¼ Op (log n):
(7:86)
Hence, by Eq. (7.81) 1 ~ ~ Cn (u~n u0 ) ¼ Cn D1 n Dn (un u0 ) ¼ (Cn Dn )[(log n)Fn (un u0 )] ¼ op (1): (7:87)
Lemma 7.1.4 and Eq. (7.87) prove statement (i).
Since kFn (u u~n )k rn implies kCn (u u~n )k kCn D1 n log nkrn rn for ~ ~ large n, we have the inclusion N n (rn ) # {u: kCn (u un )k rn } for large n. This inclusion, the triangle inequality and (7.19) lead to the implication v [ Vn , u [ N~ n (rn ) ) u [ Nn2rn (u0 ): Therefore statement (ii) follows from Lemma 7.1.5. As u~n [ Nn2rn for v [ Vn , statement (iii) follows directly from Lemma 7.1.5. B
7.3.11 Consistency of u^n Theorem. Let Assumptions P1 and P2 be satisfied. Then there exists a sequence of ~ n } such that sets {V ~ n ) ! 1 as n ! 1 P(V
(7:88)
~ n there exists an estimator u^n [ N~ n (rn ) such that and for almost any v [ V Fn (u^n u0 ) ¼ Op (log n):
(7:89)
Proof. By Lemma 7.1.3 0 Qn (u) Qn (u~n ) ¼ [Fn (u u~n )] En [Fn (u u~n )] þ 2Rn (u, u0 ) 2Rn (u~n , u0 ) (7:90)
[the right side of Eq. (7.9) gets multiplied by two because the derivatives in Section 7.3.2 are half the usual derivatives; the notation En ¼ Fn1 Hn (u0 )Fn1 is from Lemma 7.3.5(ii)]. On the boundary of N~ n (rn ) [see Eq. (7.83)] we have 0 [Fn (u u~n )] En [Fn (u u~n )] lmin (En )rn2 :
(7:91)
The numbers ln ; lmin (En ) ¼ lmin ½An (u0 )log2 n by Lemma 7.3.6 and Assumption P2 are bounded away from zero, ln c . 0: Denote fn (u) ¼ Qn (u) Qn (u~n ): Obviously, the point u~n [ N~ n (rn ) satisfies fn (u~n ) ¼ 0: We want to show that on the boundary @N~ n (rn ) of N~ n (rn ) the function fn ~ n ¼ Vn > {v: 5dn ln =2}. Since ln c, by Eq. (7.14) and is positive. Let V ~ n satisfies Eq. (7.88). Lemma 7.3.10(i) V ~n According to Lemma 7.3.10 and Eqs. (7.90) and (7.91), for v [ V 1 fn (u) ln rn2 5dn rn2 ln rn2 . 0: 2 u [ @N~ n (rn ) min
Thus, there are points inside N~ n (rn ) where fn takes lower values than on its boundary. Since fn is smooth, it achieves its minimum inside N~ n (rn ) and the point of minimum u^n satisfies the first-order condition (7.56). The inequality kFn (u^n u~n )k rn combined with Eq. (7.86) proves Eq. (7.89). B
7.3.12 Phillips’ Representation of the Standardized Estimator Lemma.
Under Assumptions P1 and P2 the estimator from Theorem 7.3.11 satisfies Fn (u^n u0 ) ¼ En1 Fn1 Sn0 þ op (1):
(7:92)
Proof. Expanding Sn (u^n ) about u0 , we have from Eq. (7.56) 0 ¼ S0n þ Hn0 (u^n u0 ) þ (Hn Hn0 )(u^n u0 ), where the Hessian Hn is evaluated at mean values between u0 and u^n : Scaling this condition we get 0 ¼ Fn1 Sn0 þ (Fn1 Hn0 Fn1 )Fn (u^n u0 ) þ [Fn1 (Hn Hn0 )Fn1 ]Fn (u^n u0 ) ¼ Fn1 Sn0 þ {En þ En En1 [Fn1 (Hn Hn0 )Fn1 ]}Fn (u^n u0 ) ¼ Fn1 Sn0 þ En (I þ En1 D~ n )Fn (u^n u0 ), where we have denoted D~ n (u , u0 ) ¼ Fn1 (Hn Hn0 )Fn1 : It follows that on the set Jn ¼ {v: (I þ En1 D~ n )1 exists} we have the representation Fn (u^n u0 ) ¼ (I þ En1 D~ n )1 En1 Fn1 Sn0 :
(7:93)
Now we prove that the probability of Jn approaches 1. By Lemma 7.3.8 (see also the definitions of Fn and Cn in Section 7.3.9)
sup
kD~ n (u, u0 )k ¼
u [ Nn1 (u0 )
log2 n 1 sup kDn (u, u0 )k ¼ op 1 2 d n u [ Nn1 (u0 ) n
with some 1 [ (0, 2d): Using Eq. (7.89) we get
Cn (u^n u0 ) ¼
2 log n log n ^ F ( u u ) ¼ O ¼ op (1): n n 0 p nd nd
(7:94)
This implies that u^n and u (which is a mean between u^n and u0 ) belong to Nn1 (u0 ) with probability approaching unity as n ! 1. Therefore by Eq. (7.94) and Lemma 7.3.5(ii) En1
~Dn (u , u0 ) ¼ op 1 , lim P(Jn ) ¼ 1 n!1 n1
and (I þ En1 D~ n )1 ¼
1 j X 1 En1 D~ n ¼ I þ op 1 n j¼0
on Jn :
(7:95)
By Eq. (7.85) and Lemma 7.3.5(ii), En1 Fn1 Sn0 ¼ Op (log n): Hence, Eqs. (7.93) and (7.95) give Eq. (7.92): 1 ^ Fn (un u0 ) ¼ I þ op 1 En1 Fn1 Sn0 n 1 ¼ En1 Fn1 Sn0 þ op 1 Op (log n): n
B
7.3.13 Asymptotic Normality

Theorem. Suppose that in the model (7.54) the errors satisfy Assumption P1 and the true parameter vector satisfies Assumption P2. Then the least-squares estimator $\hat\theta_n$ exists with probability approaching 1, is consistent in the sense of Eq. (7.89) and has the following limit distribution:
\[
F_n(\hat\theta_n - \theta_0) \xrightarrow{d} \begin{pmatrix} 1 \\ 1/\beta_0 \end{pmatrix} N\bigl(0,\ \sigma^2(2\gamma_0 + 1)^3\bigr).
\tag{7.96}
\]
Proof. We apply representation (7.92). The limit of En1 is singular [Lemma 7.3.5(ii)] and Fn1 Sn0 diverges [see Eq. (7.85)]. Therefore before letting n ! 1 we need to calculate the product En1 Fn1 Sn0 . By Eq. (7.63) and the definition Fn ¼ 1=(log n) Dn we have 0
1 n g log n X s 0 pffiffiffi us C B n g X n s¼1 n B C s 0 1 0 C ¼ p1ffiffiffi Fn Sn ¼ B us vs , n g B 1 X C n s¼1 n s 0 @ pffiffiffi A us b0 log s n s¼1 n
(7:97)
where vs ¼ (log n, b0 log s)0 : With the help of the expression from Lemma 7.3.5(ii) we calculate 0
1 2 2 1 1 þ 1 1 B C log n g1 log n g12 log2 n b0 g1 log n C 1 3B En vs } g1 B C 1 1 @ 1 A b0 log s 1 b0 g1 log n b20 0 1 2 2 1 1 log s þ log n þ 2 B C g1 g 1 log n g1 log n C: } g31 B @ A 1 1 log n þ log s b0 g1 In this and the next two equations } means equality up to a term of order Op (n1 ). Next we replace log s ¼ log (s=n) þ log n and retain the terms of order 1=log n: 1 2 2 1 s 1 log þ log n log n þ 2 þ B C g1 g1 log n g1 log n n B C En1 vs } g13 B C @ A 1 1 s þ log b0 g1 n 1 0 1 s 1 2 s þ log þ log B g1 g1 log n g1 n n C C } g13 B @ A 1 1 s þ log b0 g1 n 0 2 1 g1 2 s 1 1 s þ log þ @ log n g1 þ log } g13 n A: g1 n 1=b0 0 0
(7:98)
From Eqs. (7.97) and (7.98) we see that En1 Fn1 Sn0
}
g13 þ
1 1=b0
n g 1 X s 0 1 s pffiffiffi us þ log g1 n n s¼1 n
n g 1 1 X s 0 s pffiffiffi us 2g1 þ g12 log : log n n s¼1 n n
(7:99)
By Lemma 7.2.7 the second line of Eq. (7.99) is Op (1=log n): The proof of that lemma can be easily modified to show that with F(s) ¼ sg0 (1=g1 þ log s) 0 1 ð1 n g X 1 s 0 1 s d pffiffiffi ! N @0, s 2 F 2 (s) dsA: us þ log g1 n n s¼1 n 0
(7:100)
Here, by Eq. (7.69) ð1
2
ð1
F (s) ds ¼ 0
0
s2g0 2s2g0 1 2g0 2 þ log s þ s log s ds ¼ 3 : 2 g1 g1 g1
(7:101)
Equations (7.99) –(7.101) allow us to conclude that d 1 N(0, s 2 g13 ): En1 Fn1 Sn0 ! 1=b0 This equation and Eq. (7.92) prove Eq. (7.96).
B
7.4 BINARY LOGIT MODELS WITH UNBOUNDED EXPLANATORY VARIABLES

7.4.1 The Binary Logit Model and Log-Likelihood
Consider independent observations $(x_t, y_t)$, $t = 1, \dots, T$, where all $y_t$ are Bernoulli variables (with values 0 and 1) and $x_t = (x_{t1}, \dots, x_{tK})'$ are vectors of explanatory variables. The Bernoulli variable is uniquely characterized by the probability $P(y_t = 1)$. In the binary model it is assumed that
$$P(y_t = 1 \mid x_t) = F(x_t'\beta_0), \qquad (7.102)$$
where $F$ is some probability distribution function, $x_t'\beta_0 = \sum_{k=1}^K x_{tk}\beta_{0k}$ and $\beta_0 \in R^K$ is an unknown parameter vector. The choice of the logistic function
$$F(x) = 1/(1 + e^{-x}), \quad x \in R, \qquad (7.103)$$
makes the model a binary logit model. The density function of a single Bernoulli variable is $P(y) = p^y(1 - p)^{1-y}$. As a result of Eq. (7.102) and the assumed independence of observations the likelihood function (also known as the joint density) of the sequence of observations $(x, y) = \{(x_1, y_1), (x_2, y_2), \dots\}$ is
$$L_T(\beta; (x, y)) = \prod_{t=1}^T [F(x_t'\beta)]^{y_t}[1 - F(x_t'\beta)]^{1 - y_t}.$$
The log-likelihood is, obviously,
$$\log L_T(\beta; (x, y)) = \sum_{t=1}^T \{y_t \log F(x_t'\beta) + (1 - y_t)\log[1 - F(x_t'\beta)]\}
= \sum_{t=1}^T \left\{ y_t \log\frac{F(x_t'\beta)}{1 - F(x_t'\beta)} + \log[1 - F(x_t'\beta)] \right\}.$$
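For readers who want to experiment with the model, the following minimal sketch evaluates this log-likelihood numerically. It assumes only the logistic specification (7.103); the helper name logit_loglik and the use of numpy are ours and not part of the original text.

    import numpy as np

    def logit_loglik(beta, X, y):
        # Log-likelihood of the binary logit model of Section 7.4.1.
        # X is a T x K array with rows x_t', y is a 0/1 vector of length T.
        z = X @ beta                        # x_t' beta
        log_F = -np.logaddexp(0.0, -z)      # log F(z) = -log(1 + exp(-z))
        log_1mF = -np.logaddexp(0.0, z)     # log[1 - F(z)] = -log(1 + exp(z))
        return np.sum(y * log_F + (1.0 - y) * log_1mF)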
7.4.2 The Score and Hessian
Note that the logit function (7.103) satisfies $1 - F(x) = e^{-x}/(1 + e^{-x})$ and therefore
$$\frac{F(x)}{1 - F(x)} = e^x; \qquad f(x) \equiv F'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = F(x)[1 - F(x)]. \qquad (7.104)$$
Using Eq. (7.104) we find the derivatives
$$\frac{d}{d\beta}\log\frac{F(x_t'\beta)}{1 - F(x_t'\beta)} = \frac{d}{d\beta}x_t'\beta = x_t, \qquad
\frac{d}{d\beta}\log[1 - F(x_t'\beta)] = -\frac{F'(x_t'\beta)}{1 - F(x_t'\beta)}x_t = -F(x_t'\beta)x_t$$
and the score
$$\frac{d}{d\beta}\log L_T(\beta; (x, y)) = \sum_{t=1}^T [y_t x_t - F(x_t'\beta)x_t] = \sum_{t=1}^T [y_t - F(x_t'\beta)]x_t. \qquad (7.105)$$
Consequently, using the notation
$$H_T(\beta) = \sum_{t=1}^T f(x_t'\beta)x_t x_t', \qquad \beta \in R^K, \quad T = 1, 2, \dots$$
the Hessian can be expressed as
$$\frac{d^2 \log L_T(\beta; (x, y))}{d\beta\, d\beta'} = -\sum_{t=1}^T F'(x_t'\beta)x_t x_t' = -H_T(\beta). \qquad (7.106)$$
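A companion sketch computes the score (7.105) and the matrix $H_T(\beta)$ of (7.106); again the function name and the use of numpy are ours. Note that the Hessian of the log-likelihood itself is $-H_T(\beta)$.

    import numpy as np

    def logit_score_hessian(beta, X, y):
        # Score (7.105) and H_T(beta); the log-likelihood Hessian is -H_T(beta), Eq. (7.106).
        F = 0.5 * (1.0 + np.tanh(X @ beta / 2.0))   # F(x_t'beta), numerically stable form
        f = F * (1.0 - F)                           # f = F(1 - F), Eq. (7.104)
        score = X.T @ (y - F)                       # sum_t [y_t - F(x_t'beta)] x_t
        H_T = (X * f[:, None]).T @ X                # sum_t f(x_t'beta) x_t x_t'
        return score, H_T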
7.4.3 History
The strong consistency of the ML estimator for Eq. (7.102) was studied in the context of repeated samples by Amemiya (1976) and Morimune (1959). These authors assumed that the explanatory variables can take a finite number of values and that the number of observations goes to infinity for each set of possible values of these variables. Such assumptions are appropriate in the context of controlled experiments but are not satisfied in most econometric applications. Gouriéroux and Monfort (1981) (henceforth referred to as G&M) made a significant step forward by allowing the explanatory variables to take an infinite number of values. They obtained necessary and sufficient conditions for strong consistency of the ML estimator and proved its asymptotic normality. They discovered an interesting link between the logit model and the OLS estimator for a linear model (see Section 7.4.12); such a link does not exist in the case of the probit model. In their argument an important role belongs to the surjection theorem from Cartan
(1967), see Section 7.4.7. Strong consistency was proved with the help of Anderson and Taylor (1979), and asymptotic normality was obtained as an application of an asymptotic normality result for the OLS estimator due to Eicker (1966).
Let $\lambda_{KT}$ and $\lambda_{1T}$ denote, respectively, the largest and the smallest eigenvalues of $H_T(\beta_0)$, where $\beta_0$ is the true parameter vector:
$$\lambda_{KT} = \lambda_{\max}(H_T(\beta_0)), \quad \lambda_{1T} = \lambda_{\min}(H_T(\beta_0)), \quad M_T = \sup_{t \le T}\|x_t\|, \qquad (7.107)$$
where $\|x_t\| = \bigl(\sum_{k=1}^K x_{tk}^2\bigr)^{1/2}$. G&M assumed that the regressors are deterministic, bounded ($\sup_T M_T < \infty$), and that the eigenvalues of $H_T(\beta_0)$ are of the same order: $\sup_T \lambda_{KT}/\lambda_{1T} < \infty$. We relax some or all of these assumptions, depending on the situation. The case of unbounded explanatory variables requires an accurate estimation of the Lipschitz constant for the mapping [see Eqs. (7.105) and (7.106)]
$$\varphi_T(\beta; (x, y)) = \beta + H_T^{-1}(\beta_0)\frac{d \log L_T(\beta; (x, y))}{d\beta} = \beta + H_T^{-1}(\beta_0)\sum_{t=1}^T [y_t - F(x_t'\beta)]x_t, \qquad (7.108)$$
which is necessary to apply the Cartan theorem. Instead of the result by Anderson and Taylor (1979) we use a more general theorem (Theorem 6.4.8) by Lai and Wei (1982). The theorem by Eicker (1966) does not apply because the squares of the error terms in the linear model are not uniformly integrable (Lemma 7.4.16); instead, we apply the Lindeberg CLT. We show that our results include, as a special case, those due to G&M. Hsiao (1991) also addressed the issue of unbounded explanatory variables. However, since Hsiao’s intention was to cover errors in variables, the resulting conditions are complex and are not directly comparable to ours. Besides, the approach by G&M generalized here gives necessary and sufficient conditions in some cases. The main results are given in Sections 7.4.10, 7.4.13 and 7.4.21. Everywhere we maintain (without explicitly mentioning) the basic assumptions of the logit model: the observations are independent and satisfy Eq. (7.102) with the logit function (7.103). To distinguish the additional assumptions from those in the previous sections, here we provide their numbers with the prefix BL (for binary logit).
7.4.4 Uniqueness of the Maximum Likelihood Estimator
7.4.4.1 Assumption BL1 For all large $T$ the matrix $H_T(\beta_0)$ is positive definite: $\lambda_{1T} > 0$.
Lemma. If Assumption BL1 is satisfied and the ML estimator b^ T (x, y) exists, then it is unique.
Proof. Denote G(b) ¼ diag[ f (x01 b), . . . , f (x0T b)],
X ¼ (x1 , . . . , xT ):
(7:109)
Then HT (b) ¼ XG(b)X 0 and rankHT (b0 ) min[rankX, rankG(b0 )] ¼ rankX K
for T K:
As a result of this inequality and Assumption BL1, rankX ¼ K. Among the columns of X there are K linearly independent and the other T K P are linear functions of those K columns. In the sum Tt¼1 f (x0t b)xt x0t we can relocate the terms that correspond to the linearly independent xt to the beginning and the others to the end [this does not change HT (b)] and renumber the terms correspondingly. Then X ¼ (Y, AY) where Y is K K and nonsingular and A is some (T K) K matrix. Partitioning G(b) correspondingly we get HT (b) ¼ (Y, AY)
G1 (b)
0
0
G2 (b)
Y0
Y 0 A0
¼ YG1 (b)Y 0 þ AYG2 (b)Y 0 A0 YG1 (b)Y 0 : Since det[YG1 (b)Y 0 ] ¼ (detY)2 detG1 (b) = 0, HT (b) is nonsingular and the Hessian (7.106) is negative definite. This ensures uniqueness of the ML estimator. B
7.4.5 Lipschitz Condition for the Logit Density
Lemma. With the notation (7.107), for any $\beta, \beta_0 \in R^K$ we have
$$|f(x_t'\beta_0) - f(x_t'\beta)| \le 4\|\beta - \beta_0\|M_T e^{\|\beta - \beta_0\|M_T} f(x_t'\beta_0). \qquad (7.110)$$
Proof. It is convenient to use hyperbolic functions cosh x ¼
ex þ ex ex ex sinh x , sinh x ¼ , tanh x ¼ 2 2 cosh x
and their obvious properties: (cosh x)0 ¼ sinh x, tanh x ¼ tanh (x),
jtanh xj 1:
(7:111)
The density f in Eq. (7.104) can be represented as f (x) ¼ [cosh (x=2)]2 =4 and then Eq. (7.111) can be used to obtain j f 0 (x)j ¼ j f (x) tanh (x=2)j f (x),
1 jxj e f (x) ejxj : 4
(7:112)
By the first relation in Eq. (7.112) and finite increments formula for any x, h with some u [ (0, 1) we have j f (x þ h) f (x)j ¼ j f 0 (x þ uh)hj f (x þ uh)jhj:
(7:113)
Using the inequality jxj jx þ uhj jx (x þ uh)j jhj and the second relation in Eq. (7.112) we bound f (x þ uh) ejxþuhj ¼ ejxjjxþuhj ejxj ejhj ejxj 4f (x)ejhj :
(7:114)
Equations (7.113) and (7.114) imply j f (x þ h) f (x)j 4jhj f (x)ejhj , which leads to the desired estimate 0
j f (x0t b0 ) f (x0t b)j 4jx0t (b b0 )jejxt (bb0 )j f (x0t b0 ) 4kb b0 kMT ekbb0 kMT f (x0t b0 ): B
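The scalar inequality $|f(x + h) - f(x)| \le 4|h|e^{|h|}f(x)$ obtained in the proof is easy to spot-check numerically; the snippet below is only such a sanity check (numpy assumed, using the representation $f(x) = [2\cosh(x/2)]^{-2}$ derived above).

    import numpy as np

    f = lambda x: 1.0 / (4.0 * np.cosh(x / 2.0) ** 2)    # logit density, Eq. (7.104)
    rng = np.random.default_rng(0)
    x = 5.0 * rng.normal(size=100000)
    h = 5.0 * rng.normal(size=100000)
    lhs = np.abs(f(x + h) - f(x))
    rhs = 4.0 * np.abs(h) * np.exp(np.abs(h)) * f(x)
    assert np.all(lhs <= rhs + 1e-12)                    # the bound from the proof holds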
7.4.6 Lipschitz Condition for $\varphi_T$
Let $B(\beta_0, r) = \{\beta \in R^K : \|\beta - \beta_0\| < r\}$ denote an open ball in $R^K$ with center $\beta_0$ and radius $r$. The function $\varphi_T$ is defined in Eq. (7.108).
Lemma. If Assumption BL1 holds, then for any $r > 0$ and $T = 1, 2, \dots$ the function $\varphi_T$ satisfies the Lipschitz condition
$$\|\varphi_T(\beta; (x, y)) - \varphi_T(\tilde\beta; (x, y))\| \le L(r, T)\|\beta - \tilde\beta\| \quad \text{for } \beta, \tilde\beta \in B(\beta_0, r)$$
with the Lipschitz constant
$$L(r, T) = 4rM_T e^{rM_T}\lambda_{KT}/\lambda_{1T}. \qquad (7.115)$$
Proof. The finite increments formula for vector-valued functions (Kolmogorov and Fomin 1989, Ch. X, Part 1, Section 3) states that ~ (x, y))k kfT (b; (x, y)) fT (b;
dfT (b; (x, y))
kb bk ~ sup
db0
b [ B(b0 ,r)
and the lemma will follow if we prove
dfT (b; (x, y))
L(r, T): sup
db0
b [ B(b0 ,r)
(7:116)
From Eqs. (7.106) and (7.108) we get d fT (b; (x, y)) d 2 log LT (b; (x, y)) 1 ¼ I þ H (b ) 0 T db0 db db0 ¼ HT1 (b0 )[HT (b0 ) HT (b)] ¼ HT1 (b0 )
T X
xt x0t [ f (x0t b0 ) f (x0t b)]:
(7.117)
t¼1
Equation (7.117) explains the construction of f T : when HT (b) is close to HT (b0 ), the matrix d f T (b; (x, y))=db0 should be small. Denote
at ¼ j f (x0t b0 ) f (x0t b)j, A ¼ diag[a1 , . . . , aT ], bt ¼ 4kb b0 kMT ekbb0 kMT f (x0t b0 ), B ¼ diag[b1 , . . . , bT ]: By Lemma 7.4.5 at bt . Using the notation (7.109) we have [(, ) is the scalar product in RT ] kHT (b0 ) HT (b)k ¼ kXG(b0 )X 0 XG(b)X 0 k ¼ sup ([G(b0 ) G(b)]X 0 x, X 0 x) kxk¼1
sup (AX 0 x, X 0 x) sup (BX 0 x, X 0 x) kxk¼1
kxk¼1
¼ 4kb b0 kMT ekbb0 kMT kHT (b0 )k:
(7.118)
This bound and Eq. (7.117) imply
df T (b; (x, y))
kH 1 (b0 )kkHT (b0 ) HT (b)k
T
db0 4kb b0 kMT ekbb0 kMT which proves Eq. (7.116) and the lemma.
lKT , l1T B
7.4.7 Surjection Theorem
Theorem. (Cartan 1967, Theorem 4.4.1) If the function $\psi : B(\beta_0, r) \to R^K$ is such that the function $\varphi(\beta) = \beta - \psi(\beta)$ satisfies the Lipschitz condition $\|\varphi(\beta) - \varphi(\tilde\beta)\| \le c\|\beta - \tilde\beta\|$ for $\beta, \tilde\beta \in B(\beta_0, r)$ with a constant $c < 1$, then any element of $B[\psi(\beta_0), (1 - c)r]$ is the image by $\psi$ of an element of $B(\beta_0, r)$:
$$B(\psi(\beta_0), (1 - c)r) \subseteq \psi(B(\beta_0, r)). \qquad (7.119)$$
7.4.8 Definition Let {rT } be a sequence of positive numbers. We say that the ML estimator b^ T (x, y) exists a.s. and converges a.s. to the true value b0 at the rate o(rT ) if for almost any (x, y) 1. there exists T0 (x, y) such that b^ T (x, y) exists for all T T0 (x, y) and 2. b^ T (x, y) b0 ¼ o(rT ) a.s.
7.4.9 Existence and Convergence of the Maximum Likelihood Estimator in Terms of the Inverse Lipschitz Function The function L(r, T) defined in Eq. (7.115) is continuous and monotone in r and satisfies L(0, T) ¼ 0, L(1, T) ¼ 1 for each T. Therefore for any c [ (0, 1) we can define rT (c) by L[rT (c), T] ¼ c. We call rT (c) an inverse Lipschitz function. By Lemma 7.4.6 for any 1 [ (0, 1) ~ (x, y))k ckb bk ~ kf T (b; (x, y)) f T (b;
for b, b~ [ B(b0 , 1rT (c)):
(7:120)
Denote [see Eq. (7.108)]
cT (b; (x, y)) ¼ b f T (b; (x, y)) ¼ HT1 (b0 )
T X
[ yt F(x0t b)]xt :
(7:121)
t¼1
Lemma. Suppose Assumption BL1 is satisfied and let c [ (0, 1). Then b^ T (x, y) exists a.s. and converges a.s. to the true value b0 at the rate o[rT (c)] if and only if
cT (b0 ; (x, y)) ¼ o(rT (c)) a:s:
(7:122)
Proof. Sufficiency. Following G&M we apply the surjection theorem. If Eq. (7.122) is true, then for almost any (x, y) and for any 1 [ (0, 1) there exists T0 ¼ T0 (x, y, 1) such that kcT (b0 ; (x, y))k , (1 c)1rT (c)
for all T T0 :
(7:123)
Equation (7.120) shows that Theorem 7.4.7 is applicable with r ¼ 1rT (c). By Eq. (7.123) the null vector belongs to the ball B[cT (b0 ; (x, y)), (1 c)1rT (c)]. Therefore Eq. (7.119) ensures the existence of b^ T (x, y) [ B(b0 , 1rT (c))
(7:124)
such that cT [b^ T (x, y); (x, y)] ¼ 0 and, hence, b^ T (x, y) is the ML estimator. (Recall that by Lemma 7.4.4 it is unique.) Since Eq. (7.124) is true for all T T0 and 1 can be arbitrarily small, we have proved that b^ T (x, y) b0 ¼ o(rT (c)) a:s:
(7:125)
Necessity. Suppose Eq. (7.122) does not hold. Then for any (x, y) from a set of positive probability there exist 11 ¼ 11 (x, y) and a sequence {Tn } such that Tn ! 1 as n ! 1 and kcTn (b0 ; (x, y))k 11 rTn (c) for all n:
(7:126)
Letting 1 ¼ 11 =[2(1 þ c)] in Eq. (7.120) we get kcT (b; (x, y)) cT (b0 ; (x, y))k kb b0 k þ kf T (b; (x, y)) f T (b0 ; (x, y))k 11 1rT (c) þ c1rT (c) ¼ rT (c) 2 for all b [ B[b0 , 1rT (c)]. This inequality and Eq. (7.126) imply kcTn (b; (x, y))k kcTn (b0 ; (x, y))k kcTn (b; (x, y)) cTn (b0 ; (x, y))k 11 rTn (c) for all b [ B(b0 , 1rTn (c)): 2 This shows that the ML estimator b^ Tn (x, y), if it exists, cannot belong to B[b0 , 1rTn (c)] for all n. The resulting inequality kb^ Tn (x, y) b0 k 1(x, y)rTn (c), n ¼ 1, 2, . . . , is true on a set of positive probability and means that Eq. (7.125) cannot be true. B
7.4.10 Existence and Convergence of the Maximum Likelihood Estimator in Terms of $r_T$
Condition (7.122) is not very convenient because the inverse Lipschitz function is difficult to find explicitly. Here we show that it can be replaced by
$$r_T = \frac{1}{M_T}\,\frac{\lambda_{1T}}{\lambda_{KT}}. \qquad (7.127)$$
Let {aT }, {bT } be positive sequences of constants or random variables. We write aT bT if c1 aT bT c2 aT with constants c1 , c2 . 0 independent of T (c1 , c2 may depend on the point in the sample space V if aT , bT are random variables). Theorem. (From my drawer) If Assumption BL1 holds, then b^ T (x, y) exists a.s. and converges a.s. to b0 at the rate o(rT ) if and only if c T [b0 ; (x, y)] ¼ o(rT ) a:s. Proof. Let us prove that for any c [ (0, 1) rT (c) rT :
(7:128)
Denoting $\tilde r_T(c) = r_T(c)M_T$ and $c_T = \dfrac{c\lambda_{1T}}{4\lambda_{KT}}$, we rewrite the definition
$$L(r_T(c), T) = 4r_T(c)M_T e^{r_T(c)M_T}\frac{\lambda_{KT}}{\lambda_{1T}} = c$$
of $r_T(c)$ as
$$\tilde r_T(c)e^{\tilde r_T(c)} = c_T. \qquad (7.129)$$
Now consider two cases. 1. Suppose 0 , r~ T (c) 1. Obviously, r rer er for any 0 , r 1 so that r~ lT (c) r~ T (c) r~ uT (c) where the lower and upper bounds r~ lT (c), r~ uT (c) are defined by e~rlT (c) ¼ cT , r~ uT (c) ¼ cT . Thus, 1ecT r~ T (c) cT or cl1T cl1T rT (c) 4eMT lKT 4MT lKT
if rT (c)
1 : MT
(7:130)
2. Let r~ T (c) . 1. In the range r . 1 we have er rer e2r . It follows that l r~ lT (c) r~ T (c) r~ uT (c), where r~ lT (c), r~ uT (c) are defined from e2~r T (c) ¼ cT , u er~ T (c) ¼ cT . Hence, 12 ln cT r~ T (c) ln cT or 1 c l1T 1 c l1T rT (c) ln þ ln ln þ ln lKT lKT 2MT 4 MT 4
if rT (c) .
1 : MT (7:131)
Equations (7.130) and (7.131) allow us to prove Eq. (7.128). If lKT =l1T ! 1, then cT ! 0 and by Eq. (7.129) r~ T (c) ! 0. For all sufficiently large T Eq. (7.130) is true and Eq. (7.128) follows. If, however, lKT =l1T ¼ O(1), then rT 1=MT and Eq. (7.128) follows from either Eq. (7.130) or Eq. (7.131). Now the theorem follows from Lemma 7.4.9 and Eq. (7.128). B Equation (7.128) implies that rT (c0 ) rT (c00 ) for any c0 , c00 [ (0, 1), that is, the dependence of rT (c) on c is insignificant.
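To see what the rate (7.127) looks like on a concrete design, it can be computed directly. The sketch below reuses the logit_score_hessian helper from the Section 7.4.2 sketch; rate_rT is our illustrative name and not notation from the book.

    import numpy as np

    def rate_rT(beta0, X):
        # r_T of Eq. (7.127): lambda_{1T} / (M_T * lambda_{KT}) for H_T(beta0).
        _, H_T = logit_score_hessian(beta0, X, np.zeros(X.shape[0]))  # y does not enter H_T
        lam = np.linalg.eigvalsh(H_T)                 # eigenvalues in ascending order
        M_T = np.max(np.linalg.norm(X, axis=1))       # M_T = max_{t<=T} ||x_t||
        return lam[0] / (M_T * lam[-1])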
7.4.11 Consistency in the Case of Bounded Explanatory Variables 7.4.11.1 Assumption BL2 The explanatory variables are bounded and the eigenvalues of HT (b0 ) are of the same order: supT MT , 1, supT lKT =l1T , 1. Corollary. (Gourie´roux and Monfort 1981, Lemma 3) Suppose Assumptions BL1 and BL2 are satisfied. Then b^ T (x, y) exists a.s. and converges a.s. to b0 if and only if cT [b0 ; (x, y)] ¼ o(1) a.s. as T ! 1.
The proof follows from the fact that under Assumption BL2 rT 1. Comparison of this corollary and Theorem 7.4.10 shows that when MT lKT =l1T is allowed to grow, convergence of b^ T to b0 is faster than just “a.s.”
7.4.12 The Relationship between the Logit and Linear Models
Denote
$$z_t = x_t'\sqrt{f(x_t'\beta_0)}, \quad u_t = \frac{y_t - F(x_t'\beta_0)}{\sqrt{f(x_t'\beta_0)}}, \quad
Z = \begin{pmatrix} z_1 \\ \vdots \\ z_T \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ \vdots \\ u_T \end{pmatrix} \qquad (7.132)$$
and consider the model
$$g = Z\beta + u, \qquad (7.133)$$
where $\beta$ is a $K$-dimensional parameter.
Lemma. Under the specification of the logit model the variables $u_t$ defined in (7.132) are independent and satisfy
$$Eu_t = 0, \quad Eu_t^2 = 1. \qquad (7.134)$$
If, further, Assumption BL1 holds, then the OLS estimator for model (7.133) has the property
$$\hat\beta - \beta = (Z'Z)^{-1}Z'u = \psi_T(\beta_0; (x, y)).$$
Proof. By the binary model assumption (7.102) $E(y_t \mid x_t) = 1 \cdot P(y_t = 1 \mid x_t) + 0 \cdot P(y_t = 0 \mid x_t) = F(x_t'\beta_0)$, so
$$E(u_t \mid x_t) = \frac{E(y_t \mid x_t) - F(x_t'\beta_0)}{\sqrt{f(x_t'\beta_0)}} = 0$$
and, by the LIE, $Eu_t = 0$. Similarly, $E(y_t^2 \mid x_t) = F(x_t'\beta_0)$ and Eq. (7.104) implies
$$E(u_t^2 \mid x_t) = \frac{E(y_t^2 \mid x_t) - 2E(y_t \mid x_t)F(x_t'\beta_0) + F^2(x_t'\beta_0)}{f(x_t'\beta_0)}
= \frac{F(x_t'\beta_0)[1 - F(x_t'\beta_0)]}{f(x_t'\beta_0)} = 1, \quad Eu_t^2 = 1.$$
We have proved Eq. (7.134).
Independence of $u_t$ follows from the assumed independence of the observations $(x_t, y_t)$. Further, using Eq. (7.121) we get
$$Z'Z = H_T(\beta_0), \quad Z'u = \sum_{t=1}^T [y_t - F(x_t'\beta_0)]x_t, \qquad (7.135)$$
$$\hat\beta - \beta = (Z'Z)^{-1}Z'u = H_T^{-1}(\beta_0)\sum_{t=1}^T [y_t - F(x_t'\beta_0)]x_t = \psi_T(\beta_0; (x, y)).$$
B
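The identity (7.135) is easy to verify numerically: the vector $H_T^{-1}(\beta_0)\sum_t [y_t - F(x_t'\beta_0)]x_t$ coincides with the OLS expression $(Z'Z)^{-1}Z'u$ built from (7.132). The sketch below (our own helper names, simulated data) checks the algebra; it is an illustration, not part of the original argument.

    import numpy as np

    def direct_form(beta0, X, y):
        # H_T^{-1}(beta0) * sum_t [y_t - F(x_t'beta0)] x_t
        F = 0.5 * (1.0 + np.tanh(X @ beta0 / 2.0))
        f = F * (1.0 - F)
        H_T = (X * f[:, None]).T @ X
        return np.linalg.solve(H_T, X.T @ (y - F))

    def ols_form(beta0, X, y):
        # The same vector written as (Z'Z)^{-1} Z'u with Z, u from Eq. (7.132)
        F = 0.5 * (1.0 + np.tanh(X @ beta0 / 2.0))
        f = F * (1.0 - F)
        Z = X * np.sqrt(f)[:, None]                 # rows z_t = x_t' sqrt(f(x_t'beta0))
        u = (y - F) / np.sqrt(f)                    # u_t of Eq. (7.132)
        return np.linalg.solve(Z.T @ Z, Z.T @ u)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    beta0 = np.array([0.5, -1.0, 0.2])
    y = (rng.random(200) < 0.5 * (1.0 + np.tanh(X @ beta0 / 2.0))).astype(float)
    assert np.allclose(direct_form(beta0, X, y), ols_form(beta0, X, y))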
7.4.13 Conditions for Consistency in Terms of Eigenvalues of HT (b0) Theorem 7.4.10 and Corollary 7.4.11 supply necessary and sufficient conditions for a.s. convergence of b^ T to b0 at the rate o(rT ). These conditions are in terms of a relatively complex function cT [b0 ; (x, y)]. Using Lemma 7.4.12 and the result by Anderson and Taylor (1979) on strong consistency of the OLS estimator, G&M obtained a simpler condition for a.s. convergence of b^ T to b0 . The essence of their result is that under some circumstances the condition lim l1T ¼ 1
T !1
(7:136)
provides a.s. convergence. The result by Anderson and Taylor (1979), as generalized by Lai and Wei (1982), allows us to prove sufficiency of Eq. (7.136) under more general conditions than in the G&M theorem (Assumption BL2 is not required in the theorem below). The next two assumptions are imposed to satisfy the conditions of the Lai and Wei theorem (Theorem 6.4.8). 7.4.13.1
Assumption BL3
The explanatory variables xt are deterministic.
7.4.13.2
Assumption BL4
With some d . 0
(log lKT )1þd ¼ o(l1T ),
log lKT 1=2 ¼ o(rT ): l1T
(7:137)
Theorem. (From my drawer) If Assumptions BL3 and BL4 are satisfied and Eq. (7.136) holds, then b^ T (x, y) exists a.s. and converges a.s. to b0 at the rate o( rT ). Proof. Denote F t ¼ s (u1 , . . . , ut ) the s-field generated by u1 , . . . , ut [for the notation see Eq. (7.132)]. By Lemma 7.4.12 {ut , F t } is a m.d. sequence with supt E(u2t jF t1 ) ¼ supt Eu2t , 1. zt is deterministic and F t1 -measurable.
By Theorem 6.4.8 and Eq. (7.137)
kb^ bk ¼ O
log lKT l1T
1=2 ! ¼ o(rT ):
It remains to recall that by Lemma 7.4.12, b^ b ¼ cT [b0 ; (x, y)].
B
7.4.14 Corollary Lemma. (Gourie´roux and Monfort, 1981, Lemma 4, sufficiency part) Suppose Assumptions BL2 and BL3 are satisfied and condition (7.136) holds. Then the ML estimator b^ T (x, y) exists a.s. and converges a.s. to b0 . Proof. We need to verify that Eq. (7.137) follows from the conditions of this corollary. By Assumption BL2 with some constant c . 0
lKT cl1T :
(7:138)
Obviously, for any 1 . 0 there exists c1 . 0 such that log l l1 for l c1 . Choosing 1(1 þ d) , 1, by Eqs. (7.136) and (7.138) we get for all large T d) (cl1T )1(1þd) ¼ o(l1T ): ( log lKT )1þd l1(1þ KT
(7:139)
This is the first part of Eq. (7.137). Assumption BL2 implies rT 1. Therefore the second part of Eq. (7.137) follows from Eq. (7.139): log lKT o(( log lKT )1þd ) ¼ ¼ o(1) ¼ o(rT ): l1T l1T Thus, this corollary really follows from Theorem 7.4.13.
B
7.4.15 Example Here is an example with one unbounded explanatory variable when Corollary 7.4.14 is not applicable and Theorem 7.4.13 is. For this example G&M established just the existence of the ML estimator (see their Remark 1 on p. 88). EXAMPLE.
Let K ¼ 1, b0 ¼ 1, xt ¼ ln t. Then b^ T b0 ¼ o(1= ln T) a:s:
(7:140)
Proof. The first several equations in the linear model asymptotically don't matter, so it suffices to find the orders of the required quantities. Thus,
$$f(x_t\beta_0) = \frac{e^{-\ln t}}{(1 + e^{-\ln t})^2} = \frac{1/t}{(1 + 1/t)^2} = \frac{1 + o(1)}{t},$$
$$z_t = \frac{\ln t}{\sqrt t}(1 + o(1)), \qquad Z'Z = \sum_{t=1}^T \frac{\ln^2 t}{t}(1 + o(1)) \asymp \ln^3 T. \qquad (7.141)$$
It follows that $\lambda_{KT} = \lambda_{1T} \asymp \ln^3 T$. Obviously, $M_T = \ln T$, $r_T = 1/\ln T$ and Assumption BL4 is satisfied:
$$(\ln \ln^3 T)^{1+\delta} = o(\ln^3 T), \qquad \left(\frac{\ln \ln^3 T}{\ln^3 T}\right)^{1/2} = \left(\frac{3\ln\ln T}{\ln^3 T}\right)^{1/2} = o\!\left(\frac{1}{\ln T}\right) = o(r_T).$$
Application of Theorem 7.4.13 yields Eq. (7.140).
B
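A small simulation makes the rate $o(1/\ln T)$ of Eq. (7.140) visible. The sketch below reuses logit_score_hessian from the Section 7.4.2 sketch and fits the model by plain Newton steps; logit_mle is a hypothetical helper for illustration only and is not tuned for production use.

    import numpy as np

    def logit_mle(X, y, steps=30):
        # Plain Newton-Raphson for the logit ML estimator (illustrative sketch)
        b = np.zeros(X.shape[1])
        for _ in range(steps):
            score, H = logit_score_hessian(b, X, y)
            b = b + np.linalg.solve(H, score)
        return b

    rng = np.random.default_rng(1)
    for T in (10**3, 10**4, 10**5):
        t = np.arange(1, T + 1)
        X = np.log(t)[:, None]                           # x_t = ln t, K = 1
        p = 0.5 * (1.0 + np.tanh(X[:, 0] / 2.0))         # P(y_t = 1) with beta_0 = 1
        y = (rng.random(T) < p).astype(float)
        b_hat = logit_mle(X, y)
        print(T, (b_hat[0] - 1.0) * np.log(T))           # shrinks as T grows, cf. Eq. (7.140)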
7.4.16 Lack of Uniform Integrability of Squared Errors Our next goal is to provide conditions sufficient for asymptotic normality of the ML estimator. For this purpose G&M applied the theorem by Eicker (1966) for the OLS estimator for the linear model. Eicker’s theorem requires uniform integrability of the squares u2t : supt Eu2t 1(u2t . c) ! 0 as c ! 1: The lemma below shows that Eicker’s theorem is not applicable when xt are unbounded. Lemma (i) If xt are deterministic, then for c . 0 Eu2t 1(u2t . c) ¼ [1 F(Xt )]1(Xt . ln c) þ F(Xt )1(Xt . ln c),
(7:142)
where Xt ¼ x0t b0 : (ii) If, additionally, supt jXt j ¼ 1 then sup Eu2t 1(u2t . c) ¼ 1 for any c . 0:
(7:143)
t
Proof. (i) It is convenient to collect the properties of the relevant Bernoulli variables in Table 7.1 [see Eqs. (7.102), (7.104) and (7.132)]. Based on the table data and Eq. (7.103) we note that the following equivalences hold for c . 0: if yt ¼ 1 then u2t . c () F(Xt ) ,
1 , Xt . ln c 1þc
TABLE 7.1 Properties of Bernoulli Variables

Values of $y_t$:        1                                    0
Probabilities:          $F(X_t)$                             $1 - F(X_t)$
Values of $u_t$:        $[1 - F(X_t)]/\sqrt{f(X_t)}$         $-F(X_t)/\sqrt{f(X_t)}$
Values of $u_t^2$:      $[1 - F(X_t)]/F(X_t)$                $F(X_t)/[1 - F(X_t)]$
and if yt ¼ 0 then u2t . c () F(Xt ) .
c () Xt . ln c: 1þc
Together with the table data these give Eq. (7.142): Eu2t 1(u2t . c) ¼ P(yt ¼ 1)[u2t 1(u2t . c)]yt ¼1 þ P(yt ¼ 0)[u2t 1(u2t . c)]yt ¼0 ¼ F(Xt )
1 F(Xt ) 1(Xt . ln c) F(Xt )
þ [1 F(Xt )]
F(Xt ) 1(Xt . ln c) 1 F(Xt )
¼ [1 F(Xt )]1(Xt . ln c) þ F(Xt )1(Xt . ln c): (ii) Suppose jXtk j ! 1 along a subsequence {tk }: Then for any given c . 0 one has jXtk j . ln c for all large k. Since at least one of the numbers 1 F(Xtk ) or F(Xtk ) tends to 1, Eq. (7.142) implies Eq. (7.143). B
7.4.17 G&M Representation of the Bias Lemma.
We have the representation
HT1=2 (b^ T )(b^ T b0 ) ¼ [HT1=2 (b^ T )AT1=2 ][AT1=2 HT1=2 (b0 )][(Z 0 Z)1=2 Z 0 u], where Z and u are from Eq. (7.132),
AT ¼
T X
f (x0t btT )xt x0t
t¼1
and btT is some point belonging to the segment with extremities b^ T and b0 .
(7:144)
Proof. The ML estimator b^ T maximizes the log-likelihood function and, given its smoothness, is the root of the first-order condition [see Eq. (7.105)] T X t¼1
Subtracting from both sides Z 0u ¼
T X
PT
t¼1
yt xt ¼
T X t¼1
F(x0t b0 )xt and using Eq. (7.135) we get
[yt F(x0t b0 )]xt ¼
t¼1
F(x0t b^ T )xt :
T X
[F(x0t b^ T ) F(x0t b0 )]xt :
t¼1
Here by the finite increments formula, F(x0t b^ T ) F(x0t b0 ) ¼ f (x0t btT )x0t (b^ T b0 ). Therefore Z 0 u ¼ AT (b^ T b0 ) and 1=2 1=2 HT1=2 (b^ T )(b^ T b0 ) ¼ HT1=2 (b^ T )A1 (b0 )Z 0 u: T HT (b0 )HT
Application of the identity Z 0 Z ¼ HT (b0 ) [see Eq. (7.135)] finishes the proof of Eq. (7.144). B
7.4.18 Kadison Theorem Let H be a Hilbert space and let B(H) denote the space of all bounded operators in H: For a given set S # R, B(H)S denotes the set of all bounded self-adjoint operators with the spectrum s (A) # S: We are interested in conditions on a real-valued function f that provide strong continuity of f (A), A [ B(H)S : if {An x} converges for each x [ H, then { f (An )x} also converges for each x [ H. The theorem below is sufficient for our purposes: Theorem. (Kadison, 1968, Corollary 3.7) If S is a closed or open subset of R, then a real-valued function defined on S is strong-operator continuous on B(H)S if and only if it is continuous on S, bounded on bounded subsets of S, and O(x) at infinity. We need a simple case of this theorem. When H ¼ Rn , strong-operator continuity coincides with uniform continuity. In applications to symmetric nonnegative matrices we can put S ¼ [0, 1) and then the desired continuity takes place if f is real-valued, continuous on [0, 1) and satisfies f (x) ¼ O(x) as x ! 1.
7.4.19 Functions of Two Matrix Sequences Lemma. satisfy
Suppose two sequences of positive, symmetric K K matrices {AT }, {BT }
sup kBT k , 1, kAT BT k ¼ o(1): T
(7:145)
Let f be a real-valued, continuous on [0, 1) function such that f (x) ¼ O(x) as x ! 1. Then k f (AT ) f (BT )k ¼ o(1):
(7:146)
Proof. Suppose Eq. (7.146) is wrong. Then there exist 1 . 0 and a sequence {Tk } such that k f (ATk ) f (BTk )k 1 for all k:
(7:147)
A bounded sequence {BTk } contains a convergent subsequence {BTkn } such that kBTkn Bk ! 0 as n ! 1 for some B: Hence, by Eq. (7.145) kATkn Bk ¼ o(1): The Kadison theorem implies k f (ATkn ) f (B)k ¼ o(1), k f (BTkn ) f (B)k ¼ o(1) and, consequently, k f (ATkn ) f (BTkn )k ¼ o(1): This contradicts Eq. (7.147) and proves the statement. B
7.4.20 Convergence of the Elements of the G&M Representation 7.4.20.1 Assumption BL5 The ML estimator b^ T is consistent in the sense that b^ T b0 ¼ o( rT ) a.s. If necessary, this assumption can be replaced by sufficient conditions from Theorem 7.4.13. 7.4.20.2 Assumption BL6 The largest and smallest eigenvalues of HT (b0 ) are of the same order: supT lKT =l1T , 1. Lemma.
If Assumptions BL5 and BL6 hold, then a:s a:s AT1=2 HT1=2 (b0 ) ! I, HT1=2 ( b^ T )AT1=2 ! I:
(7:148)
Proof. Let us prove the first relation in Eq. (7.148). Denote ^ atT ¼ j f (x0t b0 ) f (x0t btT )j, btT ¼ 4kb^ T b0 kMT ekbT b0 kMT f (x0t b0 ):
By Lemma 7.4.5 and the inequality kbtT b0 k kb^ T b0 k we have atT btT : Then, )], we get similarly to Eq. (7.118) with G(bT ) ¼ diag[ f (x01 b1T ), . . . , f (x0T bTT kHT (b0 ) AT k ¼ kX[G(b0 ) G(bT )]X 0 k ^
4kb^ T b0 kMT ekbT b0 kMT lKT :
(7:149)
By Assumption BL5 and definition (7.127) kb^ T b0 kMT ¼ o(l1T =lKT ) ¼ o(1):
(7:150)
Equations (7.149) and (7.150) lead to ^ kHT (b0 )=lKT AT =lKT k 4kb^ T b0 kMT ekbT b0 kMT ¼ o(1):
By Lemma 7.4.19, in which we put BT ¼ HT (b0 )=lKT and f (x) ¼ x1=2 , this implies k(HT (b0 )=lKT )1=2 (AT =lKT )1=2 k ¼ o(1) or 1=2 kHT1=2 (b0 ) A1=2 T k ¼ o(lKT ):
(7:151)
We also need to bound kAT1=2 k: Owing to equations (7.149) and (7.150), kHT (b0 ) AT k ¼ o(l1T ): It follows that
lmin (AT z, z) ¼ inf (AT z, z) ¼ lmin (HT (b0 ))(1 þ o(1)) kzk¼1
and that k 2l1=2 for large T: kA1=2 T 1T
(7:152)
Now we combine Eqs. (7.151) and (7.152) and Assumption BL6 to get kAT1=2 HT1=2 (b0 ) Ik kA1=2 kkHT1=2 (b0 ) AT1=2 k T ¼ o((lKT =l1T )1=2 ) ¼ o(1) a.s. This is the first relation in Eq. (7.148). Replacing in the definition of atT the vector b0 by b^ T and making the necessary changes in the subsequent calculations, instead of Eq. (7.151) we obtain kHT1=2 ( b^ T ) AT1=2 k ¼ o(l1=2 KT ): Combining this bound with Eq. (7.152), as above, we finish the proof of the second relation in Eq. (7.148). B
7.4.21 Asymptotic Normality of the Maximum Likelihood Estimator Denote 2
gTt ¼ k(Z 0 Z)1=2 z0t k ¼ zt (Z 0 Z)1 z0t ¼ x0t HT1 (b0 )xt f (x0t b0 ), t ¼ 1, . . . , T
and X
dT (1) ¼
gTt , 1 . 0:
jx0t b0 j.log(12 =gTt )
7.4.21.1
Assumption BL7
limT !1 dT (1) ¼ 0 for all 1 . 0:
Theorem. (From my drawer) If Assumptions BL3, BL5, BL6, BL7 and the condition liml1T ¼ 1 are satisfied, then d HT1=2 ( b^ T )( b^ T b0 ) ! N(0, I), T ! 1:
(7:153)
Proof. In view of the representation (7.144), in which the first two factors in square brackets satisfy Eq. (7.148), we have to show that d
(Z 0 Z)1=2 Z 0 u ! N(0, I), T ! 1:
(7:154)
By the Crame´r – Wold theorem (Theorem 3.1.53), Eq. (7.154) follows, if we prove d
a0 (Z 0 Z)1=2 Z 0 u ! N(0, a0 a), T ! 1,
(7:155)
for any a [ RK , a = 0: Denote
XTt ¼
T X 1 0 0 1=2 0 1 0 0 1=2 0 a (Z Z) a (Z Z) zt ut , ST ¼ XTt ¼ Z u: kak kak t¼1
XTt are independent and by Eq. (7.134) 1 0 0 1=2 0 a (Z Z) zt Eut ¼ 0, kak o 1 n 0 0 1=2 0 0 0 1=2 0 0 ES2T ¼ E a (Z Z) Z u [a (Z Z) Z u] kak2
EXTt ¼
¼
1 0 0 1=2 0 a (Z Z) Z (Euu0 )Z(Z 0 Z)1=2 a ¼ 1: kak2
By the Lindeberg theorem (Davidson, 1994, p. 369) to prove Eq. (7.155) it is enough to prove that LT (1) ! 0 as T ! 1, for all 1 . 0, where
LT (1) ¼
T X t¼1
2 EXTt 1(jXTt j . 1)
2
is the Lindeberg function. Denoting g~ Tt ¼ [a0 (Z 0 Z)1=2 z0t =kak] we have
2 kak
(Z 0 Z)1=2 z0 ¼ gTt , X 2 ¼ g~ Tt u2 g~ Tt t Tt t kak
(7:156)
and LT (1) ¼
T X
g~ Tt Eu2t 1(u2t . 12 = g~ Tt ):
(7:157)
t¼1
Equation (7.142) yields Eu2t 1(u2t . c) [1 F(Xt )]1(jXt j . ln c) þ F(Xt )1(jXt j . ln c) ¼ 1(jx0t b0 j . ln c):
(7:158)
Now Eqs. (7.156) – (7.158) and Assumption BL7 lead to LT (1)
T X
gTt Eu2t 1(u2t . 12 =gTt )
t¼1
T X
gTt 1(jx0t b0 j . ln(12 =gTt )) ¼ dT (1) ! 0, T ! 1,
t¼1
for any 1 . 0: Application of the Lindeberg theorem completes the proof of Eq. (7.155). B
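For a concrete design one can evaluate the quantities $g_{Tt}$ and $d_T(\varepsilon)$ entering Assumption BL7 directly. The sketch below is an illustration under the logit specification only; the function name d_T_eps is ours, and numpy is assumed.

    import numpy as np

    def d_T_eps(eps, beta0, X):
        # g_Tt = x_t' H_T^{-1}(beta0) x_t * f(x_t'beta0) and the tail sum d_T(eps)
        z = X @ beta0
        F = 0.5 * (1.0 + np.tanh(z / 2.0))
        f = F * (1.0 - F)
        H_T = (X * f[:, None]).T @ X
        g = np.einsum('ti,ij,tj->t', X, np.linalg.inv(H_T), X) * f
        mask = np.abs(z) > np.log(eps**2 / g)
        return g[mask].sum()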
7.4.22 Corollary Lemma. (Gourie´roux and Monfort, 1981, Proposition 4) If the explanatory variables satisfy Assumptions BL2 and BL3, the smallest eigenvalue of HT (b0 ) goes to infinity and max1tT gTt ! 0, then Eq. (7.153) is true. Proof. Under Assumption BL2, rT 1. By Corollary 7.4.14 b^ T converges to b0 a.s., so Assumption BL5 is satisfied with rT ¼ 1. Assumption BL6 is a part of BL2. Thus, to apply Theorem 7.4.21 we have to verify Assumption BL7. Take any 1 . 0 and choose d . 0 so small that log(12 =d) . Mkb0 k, where M ¼ supt kxt k: Let T(d) be so large that maxt gTt d for T T(d): Then 2 2 1 1 jx0t b0 j Mkb0 k , log log , dT (1) ¼ 0 for T T(d): d gTt B
7.4.23 Example Here is an example that satisfies the conditions of Theorem 7.4.21 but not Corollary 7.4.22.
EXAMPLE.
For $K$, $\beta_0$ and $x_t$ from Example 7.4.15 one has
$$g_{Tt} \asymp \frac{\ln^2 t}{t\ln^3 T}, \qquad d_T(\varepsilon) \to 0 \quad \text{for all } \varepsilon > 0 \qquad (7.159)$$
and asymptotic normality (7.153) is true. Proof. From Example 7.4.15 we know that Assumption BL4 is satisfied and condition (7.136) holds. Therefore by Theorem 7.4.13, Assumption BL5 is satisfied (BL6 is trivial). To apply Theorem 7.4.21, it remains to check Assumption BL7. The expression for gTt in Eq. (7.159) follows from Eq. (7.141). dT (1) can only increase if summation over t such that jx0t b0 j . log(12 =gTt ) is replaced by summation over t satisfying eMkb0 k . 12 =gTt : Hence,
dT (1) c1
X t=ln2 tc2 =(12 ln3 T)
ln2 t : t ln3 T
Here the summation can be done over tT ¼ {t: t t0 , t=ln2 t c2 (12 ln3 T)} because the first several observations don’t matter. The function f (t) ¼ t=ln2 t has a positive minimum on [t0 , 1), so for large T tT ¼ ; and dT (1) ¼ 0: B
CHAPTER 8

TOOLS FOR VECTOR AUTOREGRESSIONS

IN THIS chapter we find some technical results that proved useful in the study of VARs. The first part (Sections 8.1–8.4) is from my work. It describes algebraic properties of Lp-approximable sequences (e.g., how they can be added and multiplied). The second part (Section 8.5) is devoted to a different notion of deterministic trends originated in Johansen (2000) and developed further in Nielsen (2005). Unlike simpler approaches (e.g., polynomial and logarithmic trends), it allows us to consider periodic (seasonal) trends. The Nielsen paper contains very deep results on strong consistency of the OLS estimator for VARs with deterministic trends. My initial intention was to cover all Nielsen's results but then I realized that they would require another book.
8.1 Lp-APPROXIMABLE SEQUENCES OF MATRIX-VALUED FUNCTIONS In this section some results from Chapter 2 concerning sequences of vectors are generalized to sequences of matrices, so as to satisfy the needs of the theory of VARs.
8.1.1 Matrix-Valued Functions By replacing real numbers x1 , . . . , xn with matrices X1 , . . . , Xn in the vector x ¼ (x1 , . . . , xn ) we arrive at the following definitions. Let Mp denote the set of all matrices (of different sizes) equipped with the norm k kp : Denote tn ¼ {1, . . . , n}: lp (tn , Mp ) stands for the set of matrix-valued functions X: tn ! Mp provided with the norm
$$\|X;\, l_p(\tau_n, M_p)\| = \begin{cases} \left(\sum_{t=1}^n \|X_t\|_p^p\right)^{1/p}, & p < \infty, \\ \max_{t=1,\dots,n}\|X_t\|_\infty, & p = \infty. \end{cases}$$
The size s(A) of a matrix A is defined as the product of its dimensions. We consider only functions X with values Xt of the same size s(X) (which may vary with X). Usual matrix algebra conventions are followed: all vectors are column vectors and all matrices in the same formula are assumed to be compatible.
8.1.2 Definition of Lp -Approximability Let X [ lp (tn , Mp ) be a matrix-valued function. Xtij denotes the (i, j)th element of Xt . For a fixed pair (i, j), the vector Tij (X) ¼ (X1ij , . . . , Xnij )0 is called an (i, j)-thread of X or just a thread when the pair (i, j) is clear from the context. Consider a sequence {Xn : n [ N} of matrix-valued functions of equal sizes s(X1 ) ¼ s(X2 ) ¼ . We say that {Xn } is Lp -approximable if for each (i, j) the sequence of (i, j)-threads {Tij (Xn ) : n [ N} is Lp -approximable in the sense of Section 2.5.1. This means existence of functions Xijc [ Lp satisfying kTij (Xn ) dnp Xijc kp ! 0 for any (i, j): Applying dnp element-wise to the matrix X c composed of Xijc , we equivalently write kXn dnp X c ; lp (tn , Mp )k ! 0,
n ! 1:
When this is true, we say that {Xn } is Lp -close to X c :
8.1.3 Convergence of Trilinear Forms Here we generalize Lemma 2.2.2 to the case of three factors. We write Z c [ C[0, 1] if all components of the matrix Z c belong to the space C[0, 1]: The notation X c [ Lp has a similar meaning. Theorem. Let 1 , p , 1: Consider sequences of matrix-valued functions {Xn }, {Yn }, {Zn } such that Xn , Yn , Zn are defined on tn for all n [ N. Suppose {Xn } is Lp -close to X c [ Lp , {Yn } is Lq -close to Y c [ Lq and {Zn } is L1 -close to Z c [ C[0, 1]: Then
lim
n !1
n X t¼1
ð1 Xnt Ynt Znt ¼
X c (x)Y c (x)Z c (x) dx:
(8:1)
0
Proof. Step 1. Let X c , Y c , Z c be scalar functions. Generalizing Eq. (2.6) we get ð1 0
(Pn X c )(Pn Y c )(Pn Z c ) dx ¼
n X t¼1
(dnp X c )t (dnq Y c )t (dn1 Z c )t :
(8:2)
Further, 1 ð ð1 (Pn X c )(Pn Y c )(Pn Z c ) dx X c Y c Z c dx 0
0
1 ð ð1 c c c c ¼ (Pn X X )(Pn Y )(Pn Z ) dx þ X c (Pn Y c Y c )(Pn Z c ) dx 0
0
ð1 þ 0
c c c c X Y (Pn Z Z ) dx
kPn X c X c kp kPn Y c kq kPn Z c k1 þ kX c kp kPn Y c Y c kq kPn Z c k1 þ kX c kp kY c kq kPn Z c Z c k1 :
(8.3)
In the case of X c and Y c we can apply boundedness of Haar projectors (Lemma 2.1.7) and their convergence to the identity operator in Lp , p , 1 (Lemma 2.2.1). For Z c we use uniform continuity: ð kPn Z c Z c k1 ¼ max maxn Z c (x) dx Z c (y) 1tn y[it it ð ¼ max maxn (Z c (x) Z c (y)) dx 1tn y[it it
jZ c (x) Z c ( y)j ! 0,
sup
n ! 1:
jxyj1=n
Now Eqs. (8.2) and (8.3) lead to
lim
n !1
n X
c
c
c
ð1
(dnp X )t (dnq Y )t (dn1 Z )t ¼
t¼1
The limit at the left equals limn!1
X c (x)Y c (x)Z c (x) dx:
0
Pn
t¼1
Xnt Ynt Znt because
n n X X c c c Xnt Ynt Znt (dnp X )t (dnq Y )t (dn1 Z )t t¼1 t¼1 kXn dnp X c kp kYn kq kZn k1 þ kdnp X c kp kYn dnq Y c kq kZn k1 þ kdnp X c kp kdnq Y c kq kZn dn1 Z c k1 ! 0:
Here we have used the Lp -approximability assumption of the theorem and boundedness of the discretization operator (Lemma 2.1.3). The statement in the case under consideration has been proved. Step 2. In the matrix case the matrix products at the left and right of Eq. (8.1) have elements X
(Xnt )iu (Ynt )uv (Znt )vj
and
X
u,v
c c Xiuc Yuv Zvj :
(8:4)
u,v
This means that the matrix version of Eq. (8.1) can be obtained by applying its scalar sibling to triplets of threads [the (i, u)-thread of Xn , (u, v)-thread of Yn and (v, j)-thread of Zn ] and summing the resulting equations over all pairs (u, v): B
8.1.4 Refined Convergence of Trilinear Forms Theorem.
Under the conditions of Theorem 8.1.3
lim
n !1
[nb] X
ðb Xnt Ynt Znt ¼
t¼[na]
X c (x)Y c (x)Z c (x) dx
a
uniformly with respect to the intervals [a, b] # [0, 1]: Proof. Since the result is not used in this book, only the main steps of the proof are indicated. Similarly to Eq. (2.7) we can obtain 1 ð ð1 Pn F Pn G Pn H dx FGH dx c[vp (F, 1=n)kGkq kHk1 0
0
þ kFkp vq (G, 1=n)kHk1 þ kFkp kGkq kPn H Hk1 ]: Letting F ¼ 1[a,b] X c , G ¼ 1[a,b] Y c and H ¼ Z c and arguing as in the proof of Theorem 2.2.3, it is possible to get
lim
n !1
n X t¼1
c
c
c
ðb
[dnp (1[a,b] X )]t [dnq (1[a,b] Y )]t (dn1 Z )t ¼
X c (x)Y c (x)Z c (x) dx:
a
The proof in the scalar case is completed by making obvious changes in the proof of Theorem 2.5.3. The generalization to the matrix case is straightforward. B
8.2 T-OPERATOR AND TRINITY

8.2.1 T-Operator Definition (Matrix Case)
The definition of the convolution operator from Section 2.3.1 needs to be modified to take into account the possibility of pre- and postmultiplication. $l_p(Z, M_p)$ is obtained from $l_p(\tau_n, M_p)$ by way of replacing $\tau_n$ with $Z$. Let $\{c_s^l : s \in Z\}$ and $\{c_s^r : s \in Z\}$ be two sequences of matrices intended for multiplication from the left and right, respectively. The T-operator $^lT_n^r : l_p(\tau_n, M_p) \to l_p(Z, M_p)$ is defined by
$$(^lT_n^r X)_j = \sum_{t=1}^n c_{t-j}^l X_t c_{t-j}^r, \qquad j \in Z.$$
8.2.2 The Adjoint of l Tnr Define (l Tnr ) by ½(l Tnr ) X j ¼
n X
c rjt Xt c ljt , j [ Z:
t¼1
Pn
The formula kX, Yl ¼ tr t¼1 Xj Yj defines a bilinear form with the argument (X, Y) from the product lp (tn , Mp ) lq (tn , Mq ) because n n n X X X X X X Xj Yj ¼ (Xj Yj )ii ¼ (Xj )iu (Yj )ui tr i j¼1 i j¼1 u j¼1
n X
kXj kp kYj kq kX; lp (tn , Mp )kkY; lq (tn , Mq )k:
j¼1
Lemma.
The operator (l Tnr ) is the adjoint of l Tnr in the sense that kl Tnr X, Yl ¼ kX, (l Tnr ) Yl:
Proof. Whenever the products AB and BA are square, the equation trAB ¼ trBA
(8:5)
is true (see Lu¨tkepohl, 1991, Section A.7). Change the summation order: tr
n X
(l Tnr X)j Yj ¼ tr
j¼1
n X n X
l r ctj Xt ctj Yj
j¼1 t¼1
¼ tr
n X t¼1
Xt
n X j¼1
! r l ctj Yj ctj
¼ tr
n X t¼1
Xt [(l Tnr ) Y]t : B
8.2.3 Boundedness of l Tnr Denote
ac ¼
X
kcsl k1 kcsr k1 :
s[Z
Lemma.
If ac , 1, then k l Tnr X; lp (Z, Mp )k ac kX; lp (tn , Mp )k:
Proof. Denoting xt ¼ kXt kp , x ¼ (x1 , . . . , xn ), c s ¼ kcsl k1 kcsr k1 we have k( l Tnr X)j kp c
n X
l r kctj k1 kXt kp kctj k1
t¼1
¼c
n X
xt c tj ¼ (Tn x)j :
t¼1
Here c depends only on the sizes of the matrices involved. Hence, by Lemma 2.3.2 X
!1=p k(
l
p Tnr X)j kp
c
j[Z
X
!1=p j(Tn x)j j
p
¼ ckTn xklp (Z)
j[Z
cac kxkp ¼ cac
n X
!1=p kXt kpp
:
t¼1
B
8.2.4 The Trinity and Lp -Approximable Sequences (Matrix Case) Here, to conserve notation, we omit the superscripts l and r and denote (Tnþ X)j ¼ (Tn X)j , j . n; (Tn0 X)j ¼ (Tn X)j , 1 j n; (Tn X)j ¼ (Tn X)j , j , 1; X bc X ¼ csl X csr : s[Z
Besides, the norms in lp (tn , Mp ), lp ({ j , 1}, Mp ) and lp ({ j . n}, Mp ) are denoted simply k kp : Theorem.
If ac , 1, p , 1 and {Xn } is Lp -close to X c [ Lp , then lim max{kTn0 Xn bc Xn kp , kTn Xn kp , kTnþ Xn kp } ¼ 0:
n !1
Moreover, {Tn0 Xn } is Lp -close to bc X c [ Lp :
(8:6)
Proof. From the generic expression (8.4) for the (i, j)th element of a product of three matrices we see that the tth coordinate of the (i, j)-thread of Tn0 Xn bc Xn equals n X X s¼1
¼
l r (cst )iu (Xns )uv (cst )vj
u,v
" n X X u,v
XX
(csl )iu (Xnt )uv (csr )vj
s[Z u,v l r (cst )iu (cst )vj (Xns )uv
s¼1
X
# (csl )iu (csr )vj (Xnt )uv
:
s[Z
The expression in the brackets here is of the type considered in Section 2.5.4 with
c t ¼ (ctl )iu (ctr )vj , bc ¼
X
c s :
s[Z
Since the summation over (u, v) is finite, the Lp -norm of the (i, j)-thread of Tn0 Xn bc Xn tends to zero. As the number of threads is finite, we have kTn0 Xn bc Xn kp ! 0: The other two relations in Eq. (8.6) are proved similarly. The final statement of the theorem follows from Lp -approximability of {Xn }: kTn0 Xn dnp bc X c kp kTn0 Xn bc Xn kp þ kbc (Xn dnp X c )kp ! 0:
B
8.2.5 Shift Operators The backward shift (or lag) operator L and forward shift operator Lþ are defined by (L X)t ¼ Xt1 , 2 t n, (L X)1 ¼ 0, (Lþ X)t ¼ Xtþ1 , 1 t n 1, (Lþ X)n ¼ 0, where X [ lp (tn , Mp ): Lemma. to X c :
If {Xn } is Lp -close to X c [ Lp and p , 1, then {L+ Xn } are Lp -close
l ¼ I, csl ¼ 0 for s=1, Proof. The lag operator obtains from l Tnr if we choose c1 r þ cs ¼ I for all s: A similar choice works for L : Thus the statement follows from the previous theorem. B
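For completeness, a direct numerical implementation of the two shift operators is immediate; the snippet below is only a sketch (numpy assumed, with zero-indexed arrays holding X_1, ..., X_n).

    import numpy as np

    def backward_shift(X):
        # (L^- X)_1 = 0 and (L^- X)_t = X_{t-1} for 2 <= t <= n
        out = np.zeros_like(X)
        out[1:] = X[:-1]
        return out

    def forward_shift(X):
        # (L^+ X)_t = X_{t+1} for 1 <= t <= n-1 and (L^+ X)_n = 0
        out = np.zeros_like(X)
        out[:-1] = X[1:]
        return out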
8.3 MATRIX OPERATIONS AND LP-APPROXIMABILITY 8.3.1 Transposition and Summation of Lp -Approximable Sequences By definition, the transposed matrix-valued function X 0 has values Xt0 , t ¼ 1, . . . , n: A sum of two functions X, Y is defined as the function with values Xt þ Yt : Theorem.
Let {Xn } be Lp -close to X c [ Lp : Then
(i) {Xn0 } is Lp -close to (X c )0 , (ii) if {Yn } is Lp -close to Y c [ Lp , then {Xn þ Yn } is Lp -close to X c þ Y c : Proof. Statement (i) is obvious because transposition does not change k kp -norms of matrices: kXn0 dnp (X c )0 kp ¼ kXn dnp X c kp ! 0; (ii) follows from the triangle inequality: k(Xn þ Yn ) dnp (Xn þ Yn )kp kXn dnp X c kp þ kYn dnp Y c kp ! 0: B
8.3.2 Multiplication of Lp -Approximable Sequences A product of two functions X, Y is defined as the function with values Xt Yt : The argument here is similar to that in Section 7.2.6. Theorem. If {Xn } is Lp -close to X c [ Lp and {Yn } is L1 -close to Y c [ C[0, 1], then {Xn Yn } is Lp -close to X c Y c : Proof. Step 1. Consider the scalar case. The fact that {Yn } is L1 -close to Y c means that for any 1 . 0 there is n0 . 0 such that for n . n0 max jYnt (dn1 Y c )t j , 1: t
Since Y c is uniformly continuous, we can also assert that max max jY c (x) Y c ( y)j , 1, t
x,y [ it
increasing, if necessary, n0 : Therefore for all t ð t t c c c c ¼ Ynt (dn1 Y )t þ n Y (x) dx Y Ynt Y n n it ð t jYnt (dn1 Y c )t j þ n Y c (x) Y c dx n it
21:
(8.7)
It follows that h t i t jXnt Ynt (dnp X c Y c )t j Xnt Ynt Y c þ (Xnt (dnp X c )t )Y c n n n h t io þ dnp X c () Y c () Y c n t 21jXnt j þ max jY c (x)jjXnt (dnp X c )t j þ 1(dnp jX c j)t : x [ [0,1]
Taking Lp -norms on both sides we get kXn Yn dnp X c Y c kp 21kXn kp þ max jY c (x)jkXn dnp X c kp x [ [0,1]
c
þ 1kdnp jX jkp : Recalling that dnp is bounded (Section 2.1.3) and kXn dnp X c kp ! 0 we see that the right-hand side here can be made arbitrarily small by increasing n: Step 2. In the matrix case we note that the (i, j)-thread of Xn Yn dnp X c Y c has the tth component equal to (Xn Yn )tij (dnp X c Y c )tij ¼
X
{(Xnt )iu (Ynt )uj [dnp (X c )iu (Y c )uj ]}:
u
Here {(Xnt )iu : t [ tn } is Lp -close to (X c )iu and {(Ynt )uj : t [ tn } is L1 -close to (Y c )uj , so the expression in the curly brackets tends to 0 in Lp -norm. Since the summation over u is finite, the theorem follows. B
8.3.3 Functions with Constant Matrix Values We say that a matrix-valued function Y: tn ! Mp is constant if Yt ¼ A, t ¼ 1, . . . , n:
Theorem (i) If {Xn } is L1 -close to X c [ L1 , then {n1=p Xn } is Lp -close to X c : (ii) If {Xn } is Lp -close to X c [ Lp and {Yn } is a sequence of constant functions with values An ! A, then {Xn þ n1=p Yn } is Lp -close to X c þ A and {Xn Yn } is Lp -close to X c A: Proof. (i) Since kn
1=p
c
Xn dnp X kp ¼
n XX i, j
!1=p 1=p
jn
c
(Xnt )ij (dnp X )tij j
p
,
t¼1
it suffices to prove the statement for threads. Obviously, p ð n n X X p 1=p 1=p c 11=p c jn (Xnt )ij (dnp X )tij j ¼ (Xnt )ij n X (x) dx n t¼1 t¼1 it p ð 1 c n max (Xnt )ij n X (x) dx t n it kXn dn1 X c ; l1 (tn , M1 )kp ! 0: (ii) This statement follows from part (i) of this theorem and Theorems 8.3.1 (ii) and 8.3.2. B
8.4 RESOLVENTS 8.4.1 Definition of Resolvents As time series autoregressions are difference equations, difference equations resolvents should play a special role in the theory of autoregressions. Here we look at examples of three resolvents. With a square matrix B we can associate an operator ( Pt1 (lB X)t ¼
s¼1
0,
Bt1s Xs ,
2 t n,
X [ lp (tn , Mp ):
t ¼ 1,
It is easy to check that lB X satisfies the difference equation (lB X)t B(lB X)t1 ¼ Xt1 ,
2 t n:
In the definition of lB the values of X are premultiplied by powers of B, and therefore lB can be termed a left resolvent. Instead of premultiplying by powers of B and/or summing over initial values of s ¼ 1, . . . , t 1 we can postmultiply by powers of B0 and/or sum over terminal values s ¼ t þ 1, . . . , n, as in ( Pn (rB X)t ¼
s¼tþ1
Xs B0st1 ,
0,
1 t n 1, t ¼ n,
X [ lp (tn , Mp ):
Now the difference equation is (rB X)t B0 (rB X)t1 ¼ Xtþ1 B0 , 2 t n: rB is called a right resolvent. Since properties of operators obtained by different combinations of pre- and/or postmultiplication and summation sets are similar, it is enough to study one example of each type. Besides, a statement on boundedness of a resolvent generates a statement on boundedness of its adjoint (Section 8.2.2). This point is not elaborated here. The enveloping resolvent is defined by ( Pt1 (eB X)t ¼
s¼1 B
t1s
Xs B0t1s ,
0,
2 t n,
X [ lp (tn , Mp ):
t ¼ 1,
It satisfies the equation (eB X)t B(eB X)t1 B0 ¼ Xt1 , 2 t n:
8.4.2 Convergence of Resolvents Theorem. Suppose that all eigenvalues of B are inside the unit circle jlj , 1 and that {Xn } is Lp -close to X c [ Lp , p , 1: Then (i) {lB Xn } is Lp -close to (I B)1 X c , {rB Xn } is Lp -close to X c (I B0 )1 , and {eB Xn } is Lp -close to
1 X
Bs X c B0s :
(8:8)
s¼0
(ii) If {Yn } is Lq -close to Y c [ Lq , q , 1, 1=p þ 1=q ¼ 1, then
lim
n !1
n X t¼1
ð1 Ynt (eB Xn )t ¼ 0
Y c (x)
1 X s¼0
Bs X c (x)B0s dx:
(8:9)
Proof. (i) Comparison of definitions from Sections 8.2.1 and 8.2.4 shows that the resolvents can be obtained from Tn0 with the following choices of the matrices csl and csr : 0, s0 r l , c ¼ I for all s, (a) choice for lB : cs ¼ Bs1 , s , 0 s 0s1 B , s.0 , (b) choice for rB : csl ¼ I for all s, csr ¼ 0, s0 0, s0 0, s0 r c ¼ (c) choice for eB : csl ¼ , : s Bs1 , s , 0 B0s1 , s , 0 By Eq. (6.114) the assumption about the spectrum of B ensures convergence of all series involving B, including ac : By Theorem 8.2.4: P s1 c X ¼ (I B)1 X c , (a) {lB Xn } is Lp -close to bc X c ¼ 1 s¼1 B P c 0s1 (b) {rB Xn } is Lp -close to bc X c ¼ 1 ¼ X c (I B0 )1 , s¼1 X B P s c 0s (c) {eB Xn } is Lp -close to bc X c ¼ 1 s¼0 B X B : (ii) Equation (8.8) and Theorem 8.1.3 imply Eq. (8.9).
B
8.5 CONVERGENCE AND BOUNDS FOR DETERMINISTIC TRENDS 8.5.1 Definition and Examples Suppose a sequence of deterministic vectors {dt : t ¼ 0, 1, . . .} # Rp satisfies the recurrent equation dt ¼ Ddt1 , t ¼ 1, 2, . . . ,
(8:10)
where D is a p p matrix. If D has all its eigenvalues on the unit circle and the vectors d1 , . . . , dp are linearly independent, jlj (D)j ¼ 1, j ¼ 1, . . . , p; rank(d1 , . . . , dp ) ¼ p,
(8:11)
then {dt } is called a deterministic trend. Obviously, Eq. (8.10) is equivalent to dt ¼ Dt d0 , t ¼ 1, 2, . . .
(8:12)
EXAMPLE 8.1. When p ¼ 1, assuming that D ¼ eiw , from the Euler formula and Eq. (8.12) we get dt ¼ eitf d0 ¼ (cos t f þ i sin t f) d0 : This shows that in the 1-D case a monomial dt ¼ t k is not a deterministic trend, unless k ¼ 0:
(Nielsen, 2005, p. 535). Let
EXAMPLE 8.2.
D¼
0 , 1
1 1
d0 ¼
1 : 1
Then
1 1
d1 ¼
0 1
1 1 1 , d2 ¼ ¼ 0 1 1
0 1
1 1 : ¼ 1 0
If the data are biannual, then the first component dt1 generates a constant in a regression model and the second component dt2 is a dummy for even-numbered years. The eigenvalues of D are +1: EXAMPLE 8.3.
In the 2-D case a linear trend is obtained as follows. Put D¼
1 1 0 , dt ¼ t 1 1
Then Ddt1 ¼
1
0
1
1 1 rank(d1 , d2 ) ¼ rank 1
1
¼
1
t1 t 1 ¼ rankD: 2
¼ dt ;
l(D) ¼ 1 is an eigenvalue of multiplicity two. EXAMPLE 8.4. (Adapted from Johansen, 2000, p. 744.) A periodical trend (biannual dummy) from Example (8.2) can be combined with a linear trend from Example (8.3). Let s1 (t) ¼
1, 0,
t is odd; t is even;
s2 (t) ¼
1, 0,
t is even; t is odd:
Then, obviously, s1 (t þ 1) ¼ s2 (t) ¼ 1 s1 (t) and with 0
1 0 D ¼ @1 1 1 0
0 1 1 1 0 0 A, dt ¼ @ t A, s1 (t) 1
we have 0
1
0
B Ddt1 ¼ @ 1 1 0
1 0
B ¼@
0
10
1
1
C CB 0 A@ t 1 A 1 s1 (t 1) 1 0 1 1 1 C B C t A ¼ @ t A ¼ dt ;
s1 (t) 1 s1 (t 1) 0 1 1 1 1 B C rank(d1 , d2 , d3 ) ¼ rank@ 1 2 3 A ¼ 3 ¼ rankD 1
0 1
and the eigenvalues of D are 1 and 1:
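A quick numerical check of Example 8.4, assuming the matrix $D$ we read off the (partly garbled) display above; the snippet only confirms the recursion (8.10) and the stated eigenvalues and is not from the original text.

    import numpy as np

    D = np.array([[1.0, 0.0, 0.0],      # as we read Example 8.4: d_t = (1, t, s1(t))'
                  [1.0, 1.0, 0.0],
                  [1.0, 0.0, -1.0]])
    d = np.array([1.0, 1.0, 1.0])       # d_1 = (1, 1, s1(1))', since s1(1) = 1
    for t in range(2, 9):
        d = D @ d                       # recursion (8.10): d_t = D d_{t-1}
        assert d[0] == 1.0 and d[1] == t and d[2] == t % 2
    print(np.linalg.eigvals(D))         # eigenvalues 1, 1, -1, all on the unit circle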
8.5.2 The Jordan Representation of D By assumption, D has eigenvalues on the unit circle. From now on we suppose that these occur at l distinct complex pairs eiuj and eiuj for 0 uj p, which, of course, reduce to a single value of 1 or 1 if uj equals 0 or p: By a theorem of (Herstein, 1975, p. 308) there exists a regular, real matrix P that block-diagonalizes D as PDP1 ¼ diag[D1 , . . . , Dl ]
(8:13)
where Dj are real Jordan matrices of the form 0 B Lj B0 B Dj ¼ B B B0 @ 0
Ej Lj .. 0
..
.
1 0 C 0 C C C C Ej C A Lj
.
Ej .. . ..
.
(8:14)
and the pair (Lj , Ej ) is one of the pairs (1, 1), (1, 1)
or
cos uj sin uj
sin uj 1 , cos uj 0
0 1
for 0 , uj , p: (8:15)
The numbers
dj ¼ dim Dj =dim Lj , d ¼ max dj
(8:16)
are the multiplicity of the eigenvalue lj (D) and the largest multiplicity of the eigenvalues of D, respectively.
8.5.3 Normalization of dt The block structure (8.13) induces the block structures of the process dt , 0 0 0 0 0 0 , . . . , dt,l ) , and of the initial vector, d0 ¼ (d0,1 , . . . , d0,l ) : For the jth block dt ¼ (dt,1 t we have the equation dt, j ¼ Dj d0, j , j ¼ 1, . . . , l: The partial initial vector d0, j itself consists of dj blocks that correspond to the diagonal blocks of Eq. (8.14): d0, j ¼ (d0,0 j, 1 , . . . , d0,0 j, dj )0 :
(8:17)
The block dt, j is normalized by NT, j ¼ diag[(Lj =T)dj 1 , . . . , (Lj =T)0 ]
(8:18)
and, correspondingly, dt is normalized by NT ¼ diag[NT,1 , . . . , NT,l ]:
(8:19)
Definitions (8.18) and (8.19) imply kNT k ¼ O(1),
kNT1 k ¼ O(T d1 ):
(8:20)
Denote f (n, ) the vector that consists of the first n terms of the Taylor series:
0 un1 u0 ,..., f (n, u) ¼ : (n 1)! 0! Lemma.
Uniformly in t ¼ 0, . . . , T
NT, j dt, j ¼ (1 þ o(1)) f (dj , t=T) (Ltj d0, j,dj ) þ O
1 , T
as T ! 1:
(8:21)
Proof. Using Eq. (8.14), where the pair (Lj , Ej ) is one of those listed in Eq. (8.15), we write the powers of Dj as 0
Ltj
B B B t Dj ¼ B B 0 B @ 0
t Lt1 j 1
Ltj
0
1 t tdj þ1 Lj C dj 1 C C t tdj þ2 C: Lj C dj 2 C A Ltj
(8:22)
This is quite similar to the expression for Dk from Section 6.6.3, where the notation has been introduced. Upon premultiplication of Eq. (8.22) by Eq. (8.18) we get 0 BT B B t NT, j Dj ¼ B B B @
1dj
tþd 1 Lj j
T
1dj
t tþd 2 Lj j 1 tþdj 2
T 2dj Lj
0 0
0
a b
1 t Ltj C T dj 1 C C t 2dj Ltj C T C: dj 2 C A Ltj 1dj
Postmultiplying this by Eq. (8.17) we see that the first block of NT, j Dtj d0, j ¼ NT, j dt, j is
(NT, j dt, j )1 ¼
dj 1 1 X
T dj 1
k¼0
t Ltþk j d0, j,dj k dj 1 k
(8:23)
(the blocks of d0, j are counted from the end). For nonzero values of the binomial coefficients from t! t(t 1) (t k þ 1) t k 1 k 1 tk t ¼ 1 ¼ 1 ¼ k k! k! k!(t k)! tk t t we have tk t ¼ (1 þ o(1)) as t ! 1, k k!
for 0 k dj 1:
(8:24)
Equations (8.23) and (8.24) imply dX j 1
(t=T)dj 1k Ltþk d0, j,dj k k j ( d 1 k)!T j k¼0 dj 1 (t=T) 1 Ltj d0, j,dj þ O ¼ (1 þ o(1)) (dj 1)! T
(NT, j dt, j )1 ¼ (1 þ o(1))
(8.25)
(the term with k ¼ 0 leads; all others do not exceed c=T). Equation (8.25) is true for t large, t t0 and T t0 : To extend Eq. (8.25) to values t , t0 , we simply bound t k c and then from Eq. (8.23) in the case dj 2 (NT, j dt, j )1 ¼ O
1 T dj 1
1 : ¼O T
(8:26)
Since also (t=T)dj 1 t 1 Lj d0, j,dj ¼ O dj 1 , (dj 1)! T
(8:27)
Equation (8.25) follows from Eqs. (8.26) and (8.27) for t , t0 in the case dj 2: Finally, if dj ¼ 1, then Dtj consists of just one block, NT, j ¼ I and NT, j Dtj d0, j ¼ Ltj d0, j,dj ¼
(t=T)dj 1 t L d0, j,dj : (dj 1)! j
Equation (8.25) has been proved for all dj 1 and 0 t T: Replacing in the above argument dj 1 by dj 2, . . . , 0 we obtain analogs of Eq. (8.25) for other blocks of NT, j dt, j : The resulting equations are collected as 1 (t=T)dj 1 t 1 L d þ O (1 þ o(1)) j 0, j,dj B C ( d 1)! T j B C B C (t=T)dj 2 t 1 C B B (1 þ o(1)) C Lj d0, j,dj þ O (dj 2)! T C ¼B B C B C B C B C 0 @ A (t=T) t 1 (1 þ o(1)) Lj d0, j,dj þ O 0! T 1 , ¼ (1 þ o(1))f (dj , t=T) (Ltj d0, j,dj ) þ O T 0
NT, j dt, j
as T ! 1,
and this relationship is uniform in t, 0 t T:
B
A slight change of this argument shows that max kNT Dt k ¼ O(1): 0tT
8.5.4 Trigonometric Lemma Lemma (i) Let L¼
cos u sin u
sin u , cos u
a x¼ : b
Then
a2 þ b2 þ A cos 2t u þ B sin 2t u, t ¼ 1, 2, . . . , 2 where A and B are constant 2 2 matrices with elements depending only on a, b: t
t
0
L x(L x) ¼
1 0
0 1
(ii) Let Lj ¼
cos uj sin uj
sin uj , cos uj
xj ¼
aj , bj
j ¼ 1, 2:
Then 0
Lt1 x1 (Lt2 x2 ) ¼
X
[A+ cos(t(u1 + u2 )) þ B+ sin(t(u1 + u2 ))]
where A+ , B+ are 2 2 matrices with elements depending only on x1 , x2 : Proof. (i) Since L is rotation by angle u, Lt is rotation by angle tu and
t
Lx¼
cos t u sin t u
sin t u cos t u
a cos t u b sin t u a ¼ : a sin t u þ b cos t u b
(8:28)
Therefore 0
Lt x(Lt x) ¼
a cos t u b sin t u
a sin t u þ b cos t u c11 c12 : ¼ c21 c22
(a cos t u b sin t u, a sin t u þ b cos t u)
where c11 ¼ a2 cos2 t u 2ab sin t u cos t u þ b2 sin2 t u, c12 ¼ c21 ¼ a2 sin t u cos t u þ ab cos2 t u ab sin2 t u b2 sin t u cos t u, c22 ¼ a2 sin2 t u þ 2ab sin t u cos t u þ b2 cos2 t u: Using equations 2 sin a cos a ¼ sin 2a, cos2 a ¼ 12( cos 2a þ 1) and sin2 a ¼ 12(1 cos 2a) this can be rewritten as c11 ¼
a2 b2 (cos 2t u þ 1) þ (1 cos 2t u) ab sin 2t u, 2 2
c12 ¼ c21 ¼ c22 ¼
a2 b2 sin 2t u þ ab cos 2t u, 2
a2 b2 (1 cos 2t u) þ (cos 2t u þ 1) þ ab sin 2t u: 2 2
The result is 0
1 0 2 a2 þ b2 a b2 0 C B B 0 2 CþB 2 Lt x(Lt x) ¼ B @ a2 þ b2 A @ 0 ab 2 0 1 a2 b2 ab B 2 C C sin 2t u: þB @ a2 b2 A ab 2
1 ab
C C cos 2t u b a A 2 2
2
(ii) This time using Eq. (8.28) we have 0 Lt1 x1 (Lt2 x2 )
¼
a1 cos t u1 b1 sin t u1 a1 sin t u1 þ b1 cos t u1
(a2 cos t u2 b2 sin t u2 , a2 sin t u2 þ b2 cos t u2 ): The terms in the above expressions are linear combinations of the products cos t u1 cos t u2 , cos t u1 sin t u2 , sin t u1 cos t u2 , and sin t u1 sin t u2 : Using the formulas cos(a þ b) þ cos(a b) , 2 cos(a b) cos(a þ b) , sin a sin b ¼ 2 sin(a þ b) þ sin(a b) sin a cos b ¼ 2
cos a cos b ¼
these terms can be rewritten as linear combinations of cos(t(u1 + u2 )) and sin(t(u1 + u2 )): B
8.5.5 Sample Covariance of the Deterministic Process Lemma (i) With an appropriate pair (Lj , Ej ) from Eq. (8.15) we have ð1 T kd0, j,dj k2 1X 0 (NT, j dt1, j )(NT, j dt1, j ) ¼ (1 þ o(1)) f (dj , u) f 0 (dj , u) du dimLj T t¼1 0 1 : Ej þ O T (ii) If kd0, j,dj k2 . 0 then the limiting matrix in (i) is positive definite. P (iii) T1 Tt¼1 (NT,i dt1,i )(NT, j dt1, j )0 ¼ O T1 for i = j:
Proof. (i) By Lemma 8.5.3 T T 1X 1X t1 0 t1 f dj , (NT, j dt1, j )(NT, j dt1, j )0 ¼ (1 þ o(1)) f dj , T t¼1 T t¼1 T T 1 t1 0 0 (Lt1 j d0, j,dj d0, j,dj (Lj ) ) þ O T (8.29) (the terms corresponding to t ¼ 1 are of order T1 and can be included/ excluded without affecting the result). Let 0 p, q dj 1 be integer numbers. One block of the expression at the right of Eq. (8.29) equals T 1X 1 t 1 pþq t1 1 0 : Lj d0, j,dj d0,0 j,dj (Lt1 ) þ O j T t¼1 p!q! T T
(8:30)
Consider two cases. (a) Suppose dimLj ¼ 1: In this case Lj ¼ +1, d0, j,dj is a real number and t1 0 0 2 Lt1 j d0, j,dj d0, j,dj (Lj ) ¼ jd0, j,dj j ¼ Ej
kd0, j,dj k2 : dimLj
The limit of Eq. (8.29), therefore, is 1 p!q!
ð1
u pþq duEj
kd0, j,dj k2 , dimLj
0
which, in combination with Eq. (8.29), proves (i). (b) Let dimLj ¼ 2: By Lemma 8.5.4(i), Eq. (8.30) equals t 1 pþq Ej kd0, j,dj k2 =dimLj T T 1 1X t 1 pþq 1 : þ (A cos 2t uj þ B sin 2t uj ) þ O p!q! T t¼1 T T
T 1 1X p!q! T t¼1
The desired result follows if we prove that T 1X t 1 pþq 1 : (A cos 2t uj þ B sin 2t uj ) ¼ O T t¼1 T T
To this end, it is sufficient to prove T 1 1 X
T lþ1
t¼0
1 t cos (2t u þ a) ¼ O T l
(8:31)
for any nonnegative integer l and a constant a ¼ 0 or a ¼ p=2. Direct calculation gives jþ4k @ p , j ¼ 0, 2, sin 2t u þ a þ 2 @u jþ3k jþ4k j4k @ t cos(2t u þ a) ¼ 2 sin(2t u þ a), j ¼ 1, 3, @u t jþ4k cos(2t u þ a) ¼ 2j4k
P l for k ¼ 0, 1, . . . These identities show that T1 t¼0 t cos(2t u þ a) is, up l X @ T1 l to a constant factor cl , t sin(2t u þ a), where x ¼ a or t¼0 @u x ¼ a þ p=2. By (Gradshteyn and Ryzhik 2007, Formula 1.341.1) PT1 t¼0 sin(2t u þ x) ¼ f (x, T, u), where f (x, T, u) ¼ sin[x þ (T 1)u] sin(T u)=sin u [by definition, f (x, T, u) ¼ 0 when sin u ¼ 0]. Thus, T 1 1 X
T lþ1
t l cos(2t u þ a) ¼
t¼0
l cl @ f (x, T, u): lþ1 T @u
l Xl @ f (x, T, u) ¼ a(x, T, u)T m , where m¼0 @u a(x, T, u) are bounded in T. Therefore Eq. (8.31) follows. Ð1 (ii) The matrix 0 f (dj , u) f 0 (dj , u) du is positive definite as a Gram matrix of a linearly independent system (Section 1.7.5). It is easy to see that
(iii) By Lemma 8.5.3
$$\frac{1}{T}\sum_{t=1}^{T}(N_{T,i}d_{t-1,i})(N_{T,j}d_{t-1,j})' = (1+o(1))\,\frac{1}{T}\sum_{t=1}^{T} f\!\left(\delta_i,\frac{t-1}{T}\right) f'\!\left(\delta_j,\frac{t-1}{T}\right) \otimes \left(L_i^{t-1}d_{0,i,\delta_i}d_{0,j,\delta_j}'(L_j^{t-1})'\right) + O\!\left(\frac{1}{T}\right).$$
Therefore the result follows from Lemma 8.5.4(ii) and Eq. (8.31). ∎
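To make part (i) and Eq. (8.31) concrete, the following small numerical illustration (added here, not from the book) treats the simplest scalar case $\dim L_j = 1$, $L_j = 1$, $E_j = 1$, $d_{0,j,\delta_j} = 1$, in which the $(p,q)$ block of the limit reduces to $1/(p!\,q!\,(p+q+1))$; the frequency `theta` is an arbitrary value with $\sin\theta \ne 0$.

```python
# Illustration: the (p,q) block of the sample average converges to 1/(p! q! (p+q+1))
# at rate O(1/T), and the oscillatory term of Eq. (8.31) is also O(1/T).
import numpy as np
from math import factorial

p, q, theta = 1, 2, 0.7

for T in (100, 1000, 10000):
    u = (np.arange(1, T + 1) - 1) / T                         # (t-1)/T for t = 1..T
    block = np.mean(u ** (p + q)) / (factorial(p) * factorial(q))
    limit = 1.0 / (factorial(p) * factorial(q) * (p + q + 1))
    osc = np.mean(u ** (p + q) * np.cos(2 * np.arange(1, T + 1) * theta))
    print(T, block - limit, osc)    # both differences shrink roughly like 1/T
```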
8.5.6 Asymptotic Behavior of Normalized Deterministic Trends

Theorem. (Nielsen, 2005, Theorem 4.1) Suppose $d_t$ satisfies Eqs. (8.10) and (8.11). Then
(i) $\max_{0\le t\le T}|N_T d_t| = O(1)$ and, in particular, $\max_{t\le T}\|d_t\| = O(T^{\delta-1})$, where $\delta = \max_j \delta_j$ is the largest multiplicity of eigenvalues of $D$;
(ii) $\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}(N_T d_{t-1})(N_T d_{t-1})'$ is positive definite;
(iii) $\max_{t\le T} d_t'\left(\sum_{s=1}^{T} d_{s-1}d_{s-1}'\right)^{-1} d_t = O\!\left(\frac{1}{T}\right)$.

Proof. (i) The process $N_T d_t$ is obtained by stacking the processes $N_{T,j}d_{t,j}$:
$$N_T d_t = \begin{pmatrix} N_{T,1} & & 0\\ & \ddots & \\ 0 & & N_{T,l}\end{pmatrix}\begin{pmatrix} d_{t,1}\\ \vdots\\ d_{t,l}\end{pmatrix} = \begin{pmatrix} N_{T,1}d_{t,1}\\ \vdots\\ N_{T,l}d_{t,l}\end{pmatrix}.$$
Therefore the first statement follows from Lemma 8.5.3 and the inequality $\|N_T d_t\| \le l\,\max_j\|N_{T,j}d_{t,j}\|$. Further,
$$d_{t,j} = N_{T,j}^{-1}N_{T,j}d_{t,j} = \mathrm{diag}\!\left[(L_j/T)^{1-\delta_j},\ldots,(L_j/T)^{0}\right]N_{T,j}d_{t,j} = O(T^{\delta-1}),$$
which proves the statement.

(ii) The equation
$$(N_T d_{t-1})(N_T d_{t-1})' = \begin{pmatrix} N_{T,1}d_{t-1,1}\\ \vdots\\ N_{T,l}d_{t-1,l}\end{pmatrix}\left((N_{T,1}d_{t-1,1})',\ldots,(N_{T,l}d_{t-1,l})'\right) = \left(N_{T,i}d_{t-1,i}(N_{T,j}d_{t-1,j})'\right)_{i,j=1}^{l}$$
implies that the matrix
$$M \equiv \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}(N_T d_{t-1})(N_T d_{t-1})' = \lim_{T\to\infty}\left(\frac{1}{T}\sum_{t=1}^{T} N_{T,i}d_{t-1,i}(N_{T,j}d_{t-1,j})'\right)_{i,j=1}^{l}$$
has null blocks outside the main diagonal by Lemma 8.5.5(iii) and positive definite diagonal blocks by Lemma 8.5.5(ii). Thus, M itself is positive definite.
(iii) In the identity
$$d_t'\left(\sum_{s=1}^{T} d_{s-1}d_{s-1}'\right)^{-1} d_t = \frac{1}{T}\,(N_T d_t)'\left[\frac{1}{T}\sum_{s=1}^{T}(N_T d_{s-1})(N_T d_{s-1})'\right]^{-1}(N_T d_t)$$
the vector $N_T d_t$ is bounded by part (i). The matrix in the brackets is positive definite by part (ii). ∎
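As a brief numerical illustration of part (iii) (added here; the particular trend is our choice, not the book's), take the linear trend $d_t = (1, t)'$: the maximal quadratic form decays like $1/T$, so $T$ times the maximum stabilizes.

```python
# Illustration of Theorem 8.5.6(iii) for d_t = (1, t)':
# max_{t<=T} d_t' (sum_s d_{s-1} d_{s-1}')^{-1} d_t = O(1/T).
import numpy as np

for T in (50, 500, 5000):
    D_lag = np.column_stack([np.ones(T), np.arange(0, T)])       # rows d_{s-1}', s = 1..T
    G = D_lag.T @ D_lag                                           # sum_s d_{s-1} d_{s-1}'
    D_all = np.column_stack([np.ones(T + 1), np.arange(0, T + 1)])  # rows d_t', t = 0..T
    lev = np.max(np.sum((D_all @ np.linalg.inv(G)) * D_all, axis=1))
    print(T, lev, T * lev)    # T * lev settles near a constant, consistent with O(1/T)
```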
8.5.7 Corollary

Lemma. Under the conditions of Theorem 8.5.6
$$\max_{t\le T}\left\| d_t'\left(\sum_{s=1}^{T} d_{s-1}d_{s-1}'\right)^{-1/2}\right\| = O(T^{-1/2}).$$
Proof. For a vector $x$ and a symmetric matrix $A$ we have
$$\|x'A\| = \|A'x\| = \left((A'x)'A'x\right)^{1/2} = (x'A^2x)^{1/2},$$
which implies
$$\left\| d_t'\left(\sum_{s=1}^{T} d_{s-1}d_{s-1}'\right)^{-1/2}\right\| = \left[d_t'\left(\sum_{s=1}^{T} d_{s-1}d_{s-1}'\right)^{-1} d_t\right]^{1/2}.$$
Now the statement follows from part (iii) of Theorem 8.5.6. ∎
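The identity used in the proof, and the $O(T^{-1/2})$ rate, can also be checked numerically (an illustrative sketch only) for the same hypothetical trend $d_t = (1, t)'$; the inverse square root is computed from the eigendecomposition of the symmetric positive definite matrix $A = \sum_s d_{s-1}d_{s-1}'$.

```python
# Check that ||d_t' A^{-1/2}|| = (d_t' A^{-1} d_t)^{1/2} and that the maximum
# over t <= T shrinks like T^{-1/2} for d_t = (1, t)'.
import numpy as np

for T in (50, 500, 5000):
    D_lag = np.column_stack([np.ones(T), np.arange(0, T)])        # rows d_{s-1}', s = 1..T
    A = D_lag.T @ D_lag
    w, V = np.linalg.eigh(A)
    A_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T                     # symmetric A^{-1/2}
    D_all = np.column_stack([np.ones(T + 1), np.arange(0, T + 1)])  # rows d_t', t = 0..T
    norms = np.linalg.norm(D_all @ A_inv_sqrt, axis=1)            # ||d_t' A^{-1/2}||
    quad = np.sqrt(np.sum((D_all @ np.linalg.inv(A)) * D_all, axis=1))
    print(T, np.max(np.abs(norms - quad)), np.sqrt(T) * np.max(norms))
```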
REFERENCES Aljancˇic´, S., Bojanic´, R., Tomic´, M. 1955. Deux the´ore`mes relatifs au comportement asymptotique des se´ries trigonome´triques. Srpska Akad. Nauka. Zb. Rad. Mat. Inst., 43(4), 15– 26. Amemiya, T. 1976. The maximum likelihood, the minimum chi-square and the nonlinear weighted leastsquares estimator in the general qualitative response model. J. Amer. Statist. Assoc., 71(354), 347 –351. Amemiya, T. 1985. Advanced Econometrics. Oxford: Blackwell. Anderson, T. W. 1959. On asymptotic distributions of estimates of parameters of stochastic difference equations. Ann. Math. Stat., 30, 676–687. Anderson, T. W. 1971. The Statistical Analysis of Time Series. New York: John Wiley & Sons Inc. Anderson, T. W., Kunitomo, N. 1992. Asymptotic distributions of regression and autoregression coefficients with martingale difference disturbances. J. Multivar. Anal., 40(2), 221–243. Anderson, T. W., Taylor, J. B. 1979. Strong consistency of least squares estimates in dynamic models. Ann. Stat., 7(3), 484–489. Anselin, L. 1988. Spatial Econometrics: Methods and Models. Boston: Kluwer Academic Publishers. Anselin, L., Bera, A. K. 1998. Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah, A., Giles, D.E.A. (eds), Handbook of Applied Economics Statistics. New York: Marcel Dekker. Barlow, W. J. 1975. Coefficient properties of random variable sequences. Ann. Probab., 3(5), 840–848. Barro, R. J., Sala-i-Martin, X. 2003. Economic Growth. 2nd edn. Cambridge: MIT Press. Bellman, R. 1995. Introduction to Matrix Analysis. Classics in Applied Mathematics, Vol. 12. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). Reprint of the 1960 original. Beveridge, S., Nelson, C. R. 1981. A new approach to decomposition of economic time series into permanent and transitory components with particular attention to measurement of the “business cycle”. J. Monetary Econ., 7, 151–174. Billingsley, P. 1968. Convergence of Probability Measures. New York: John Wiley & Sons Inc. Billingsley, P. 1995. Probability and Measure. 3rd edn. New York: John Wiley & Sons Inc. Brown, B. M. 1971. Martingale central limit theorems. Ann. Math. Stat., 42, 59–66. Burkholder, D. L. 1968. Independent sequences with the Stein property. Ann. Math. Stat., 39, 1282– 1288. Burkholder, D. L., Gundy, R. F. 1970. Extrapolation and interpolation of quasi-linear operators on martingales. Acta Math., 124, 249–304. Cartan, H. 1967. Calcul Diffe´rentiel. Paris: Hermann. Case, A. C. 1991. Spatial patterns in household demand. Econometrica, 59, 953–965. Chow, Y. S. 1965. Local convergence of martingales and the law of large numbers. Ann. Math. Stat., 36, 552–558. Chow, Y. S. 1969. Martingale extensions of a theorem of Marcinkiewicz and Zygmund. Ann. Math. Stat., 40, 427– 433. Chow, Y. S. 1971. On the Lp-convergence for n 21/p Sn, 0 , p , 2. Ann. Math. Stat., 42, 393– 394. Christopeit, N., Helmes, K. 1980. Strong consistency of least squares estimators in linear regression models. Ann. Stat., 8(4), 778–788. Chudik, A., Pesaran, M. H., Tosetti, E. 2010. Weak and strong cross section dependence and estimation of large panels, 45 pp. http://ideas.repec.org/p/ces/ceswps/_2689.html Cˇ´ız´ek, P. 2008. Robust and efficient adaptive estimation of binary-choice regression models. J. Amer. Statist. Assoc., 103, 687 –696. Cliff, A. D., Ord, K. 1981. Spatial Processes: Models Applications. London: Pion Ltd.
Cressie, N. A. C. 1993. Statistics for Spatial Data. New York: John Wiley & Sons Inc. Davidson, J. 1994. Stochastic Limit Theory. New York: Oxford University Press. An introduction for econometricians. de Haan, L., Resnick, S. 1996. Second-order regular variation and rates of convergence in extreme-value theory. Ann. Prob., 24(1), 97–124. Dvoretzky, A. 1972. Asymptotic normality for sums of dependent random variables. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory. pp. 513– 535, Berkeley: Univ. California Press. Eicker, F. 1963. Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Stat., 34, 447–456. Eicker, F. 1966. A multivariate central limit theorem for random linear vector forms. Ann. Math. Stat., 37, 1825– 1828. Gantmacher, F. R. 1959. The Theory of Matrices. Vols. 1, 2. New York: Chelsea Publishing Co. Gohberg, I. C., Kreı˘n, M. G. 1969. Introduction to the Theory of Linear Nonselfadjoint Operators. Translations of Mathematical Monographs, Vol. 18. Providence: American Mathematical Society. Goldie, C. M., Smith, R. L. 1987. Slow variation with remainder: Theory and applications. Q. J. Math., 38, 45– 71. Gourie´roux, C., Monfort, A. 1981. Asymptotic properties of the maximum likelihood estimator in dichotomous logit models. J. Econometrics, 17(1), 83– 97. Gradshteyn, I. S., Ryzhik, I. M. 2007. Table of Integrals, Series, and Products. 7th edn. Amsterdam Elsevier/Academic Press. Translated from the Russian, Translation edited and with a preface by Alan Jeffrey and Daniel Zwillinger, with one CD-ROM (Windows, Macintosh and UNIX). Gundy, R. F. 1967. The martingale version of a theorem of Marcinkiewicz and Zygmund. Ann. Math. Stat., 38, 725– 734. Hahn, M. G., Kuelbs, J., Samur, J. D. 1987. Asymptotic normality of trimmed sums of mixing random variables. Ann. Probab., 15, 1395–1418. Hall, P., Heyde, C. C. 1980. Martingale Limit Theory and Its Application. New York: Academic Press Inc. Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press. Hannan, E. J. 1979. The central limit theorem for time series regression. Stoch. Proc. Appl., 9(3), 281–289. Herstein, I. N. 1975. Topics in Algebra. 2nd edn. Lexington: Xerox College Publishing. Hill, J. B. 2010. Least tail-trimmed squares for infinite variance autoregressions, under review at Journal of the Royal Statistical Society Series B. http://www.unc.edu/~jbhill/working%20papers.htm Hill, J. B. 2011. Central limit theory for kernel self-normalized tail-trimmed sums of dependent data with applications. Working paper, 30 pp. http://www.unc.edu/~jbhill/clt_tail_trim.pdf Holly, S., Pesaran, M. H., Yamagata, T. 2008. A spatio-temporal model of house prices in the US, 30 pp. http://ideas.repec.org/p/cam/camdae/0654.html Hoque, A. 1985. The exact moments of forecast error in the general dynamic model. Sankhya¯ Ser. B, 47(1), 128–143. Hsiao, C. 1991. Identification and estimation of dichotomous latent variables models using panel data. Rev. Econ. Stud., 58(4), 717– 731. Hurvich, C. M., Deo, R., Brodsky, J. 1998. The mean squared error of Geweke and Porter– Hudak’s estimator of the memory parameter of a long-memory time series. J. Time Ser. Anal., 19(1), 19– 46. Iosida, K. 1965. Functional Aalysis. Berlin: Springer-Verlag. Johansen, S. 2000. A Bartlett correction factor for tests on the cointegrating relations. Economet. Theor., 16(5), 740– 778. 
Jones, M. C. 1986. Expressions for inverse moments of positive quadratic forms in normal variables. Austral. J. Stat., 28(2), 242– 250. Kadison, R. V. 1968. Strong continuity of operator functions. Pacific J. Math., 26, 121– 129. Kelejian, H. H., Prucha, I. R. 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Finance, 17, 99– 121. Kelejian, H. H., Prucha, I. R. 1999. A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev., 40, 509– 533. Kelejian, H. H., Prucha, I. R. 2001. On the asymptotic distribution of the Moran I test statistic with applications. J. Econometrics, 104, 219 –257.
Kelejian, H. H., Prucha, I. R. 2002. 2SLS and OLS in a spatial autoregressive model with equal spatial weights. Reg. Sci. Urban Econ., 32, 691–707. Kelejian, H. H., Prucha, I. R., Yuzefovich, Y. 2004. Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: large and small sample results. Adv. Econometrics, 18, 163–198. Kolmogorov, A. N., Fomin, S. V. 1989. Elementy Teorii Funktsii i Funktsional’nogo Analiza. 6th edn. Moscow: “Nauka”. With a supplement, “Banach algebras”, by V. M. Tikhomirov. Kushner, H. 1971. Introduction to Stochastic Control. New York: Holt, Rinehart and Winston, Inc. Lai, T. L., Wei, C. Z. 1982. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Stat., 10(1), 154–166. Lai, T. L., Wei, C. Z. 1983a. Asymptotic properties of general autoregressive models and strong consistency of least-squares estimates of their parameters. J. Multivar. Anal., 13(1), 1– 23. Lai, T. L., Wei, C. Z. 1983b. A note on martingale difference sequences satisfying the local Marcinkiewicz – Zygmund condition. Bull. Inst. Math. Acad. Sinica, 11(1), 1 –13. Lai, T. L., Wei, C. Z. 1985. Asymptotic properties of multivariate weighted sums with applications to stochastic regression in linear dynamic systems. In: Multivariate Analysis VI (Pittsburgh, Pa., 1983). Amsterdam: North-Holland., pp. 375– 393. Lai, T. L., Robbins, H., Wei, C. Z. 1978. Strong consistency of least squares estimates in multiple regression. Proc. Nat. Acad. Sci. USA, 75(7), 3034–3036. Lai, T. L., Robbins, H., Wei, C. Z. 1979. Strong consistency of least squares estimates in multiple regression. II. J. Multivar. Anal., 9(3), 343– 361. Lee, LungFei. 2001. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models I: spatial autoregressive processes. Ohio State University. Lee, LungFei. 2002. Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models. Economet. Theor., 18(2), 252– 277. Lee, LungFei. 2003. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Economet. Rev., 22(4), 307–335. Lee, LungFei. 2004a. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72(6), 1899–1925. Lee, LungFei. 2004b. A supplement to “Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models”. http://economics.sbs.ohio-state.edu/lee/wp/sar-qml-rappen-04feb.pdf. Long, R. L. 1993. Martingale Spaces and Inequalities. Beijing: Peking University Press. Lu¨tkepohl, H. 1991. Introduction to Multiple Time Series Analysis. Berlin: Springer-Verlag. Mann, H. B., Wald, A. 1943. On the statistical treatment of linear stochastic difference equations. Econometrica, 11, 173– 220. Marcinkiewicz, J., Zygmund, A. 1937. Sur les fonctions inde´pendantes. Fund. Math., 29, 60–90. Mathai, A. M., Provost, S. B. 1992. Quadratic Forms in Random Variables. Statistics: Textbooks and Monographs, Vol. 126. New York: Marcel Dekker Inc. McLeish, D. L. 1974. Dependent central limit theorems and invariance principles. Ann. Prob., 2, 620 –628. Morimune, K. 1959. Comparison of normal and logistic models in the bivariate dichotomous analysis. J. Econometrics, 47, 957– 976. Moussatat, M. W. 1976. On the asymptotic theory of statistical experiments and some of its applications. PhD thesis, Univ. of California, Berkeley. 
Muench, T. J. 1974. Consistency of least squares estimates of coefficients of stochastic differential equations. Univ. of Minnesota Economic Department Technical Report. Mynbaev, K. T. 1997. Linear models with regressors generated by square-integrable functions. In: Programa e Resumos. 7a Escola de Se´ries Temporais e Econometria. Porto Alegre: ABE e SBE. 6 a 8 de agosto, pp. 80–82. Mynbaev, K. T. 2000. Limits of weighted sums of random variables. Discussion text No. 218/2000, Economics Department, Federal University of Ceara´, Brazil. Mynbaev, K. T. 2001. Lp-approximable sequences of vectors and limit distribution of quadratic forms of random variables. Adv. Appl. Math., 26(4), 302– 329. Mynbaev, K. T. 2006a. Asymptotic properties of OLS estimates in autoregressions with bounded or slowly growing deterministic trends. Commun. Stat. Theor. Methods, 35(1-3), 499– 520.
Mynbaev, K. T. 2006b. OLS Estimator for a Mixed Regressive, Spatial Autoregressive Model: Extended Version. http://mpra.ub.uni-muenchen.de/15153/. Mynbaev, K. T. 2009. Central limit theorems for weighted sums of linear processes: Lp-approximability versus Brownian motion. Economet. Theor., 25(3), 748–763. Mynbaev, K. T. 2010. Asymptotic distribution of the OLS estimator for a mixed regressive, spatial autoregressive model. J. Multivar. Anal., 10(3), 733–748. Mynbaev, K. T. 2011. Regressions with asymptotically collinear regressors. Economet. Journal (forthcoming). Mynbaev, K. T., Castelar, I. 2001. The strengths and weaknesses of L2-approximable regressors. Two Essays on Econometrics. Vol. 1, Fortaleza, Brazil: Expressa˜o Gra´fica. http://mpra.ub.uni-muenchen. de/9056/. Mynbaev, K. T., Ullah, A. 2008. Asymptotic distribution of the OLS estimator for a purely autoregressive spatial model. J. Multivar. Anal., 99, 245–277. Nabeya, S., Tanaka, K. 1988. Asymptotic theory of a test for the constancy of regression coefficients against the random walk alternative. Ann. Stat., 16(1), 218–235. Nabeya, S., Tanaka, K. 1990. A general approach to the limiting distribution for estimators in time series regression with nonstable autoregressive errors. Econometrica, 58(1), 145–163. Nielsen, B. 2005. Strong consistency results for least squares estimators in general vector autoregressions with deterministic terms. Economet. Theor., 21(3), 534– 561. Nielsen, B. 2008. Singular vector autoregressions with deterministic terms: Strong consistency and lag order determination. http://www.nuffield.ox.ac.uk/economics/papers/2008/w14/Nielsen08VAR explosive.pdf. Ord, K. 1975. Estimation methods for models of spatial interaction. J. Amer. Statist. Assoc., 70, 120–126. Pesaran, M. H., Chudik, A. 2010. Econometric analysis of high dimensional VARs featuring a dominant unit, 41 pp. http://ideas.repec.org/p/ces/ceswps/_3055.html Phillips, P. C. B. 1999. Discrete Fourier transforms of fractional processes. Cowles Foundation discussion paper no. 1243, Yale University. Phillips, P. C. B. 2007. Regression with slowly varying regressors and nonlinear trends. Economet. Theor., 23, 557– 614. Po¨tscher, B. M., Prucha, I. R. 1997. Dynamic Nonlinear Econometric Models. Berlin: Springer-Verlag. Rao, C. R. 1965. Linear Statistical Inference and its Applications. New York: John Wiley & Sons Inc. Rao, M. M. 1961. Consistency and limit distributions of estimators of parameters in explosive stochastic difference equations. Ann. Math. Stat., 32, 195–218. Robinson, P. M. 1995. Log-periodogram regression of time series with long range dependence. Ann. Stat., 23(3), 1048–1072. Rubin, H. 1950. Consistency of maximum likelihood estimates in the explosive case. In: Koopmans, T. C. (ed), Statistical Inference in Dynamic Economic Models. Cowles Commission Monograph No. 10. New York: John Wiley & Sons Inc., pp. 356–364. Schmidt, P. 1976. Econometrics. Statistics: Textbooks and monographs. New York: Marcel Dekker, Inc. Seneta, E. 1985. Pravilno Menyayushchiesya Funktsii. Moscow: “Nauka”. Translated from English by I. S. Shiganov, Translation edited and with a preface by V. M. Zolotarev, With appendices by I. S. Shiganov and V. M. Zolotarev. Smirnov, O. L., Anselin, L. 2001. Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput. Statist. Data Anal., 35(3), 301– 319. Smith, R. L. 1982. Uniform rates of convergence in extreme-value theory. Adv. in Appl. Probab., 14, 600–622. 
Stigum, B. P. 1974. Asymptotic properties of dynamic stochastic parameter estimates. III. J. Multivar. Anal., 4, 351–381. Stout, W. F. 1974. Almost Sure Convergence. Academic Press Inc., New York-London. Probability and Mathematical Statistics, Vol. 24. Tanaka, K. 1996. Time Series Analysis. New York: John Wiley & Sons Inc. Nonstationary and noninvertible distribution theory. Taylor, R. L. 1978. Stochastic Convergence of Weighted Sums of Random Elements in Linear Spaces. Lecture Notes in Mathematics, Vol. 672. Berlin: Springer. Theil, H. 1971. Principles of Econometrics. New York: John Wiley & Sons Inc.
Trenogin, V. A. 1980. Funktsional’nyi Analiz. Moscow: “Nauka”. Varga, R. S. 1962. Matrix Iterative Analysis. Englewood Cliffs, Prentice-Hall Inc. Vilenkin, N. Ja. 1969. Kombinatorika. Moscow: Nauka. Wei, C. Z. 1985. Asymptotic properties of least-squares estimates in stochastic regression models. Ann. Stat., 13(4), 1498–1508. White, H. 1994. Estimation, Inference and Specification Analysis. Econometric Society Monographs, Vol. 22. Cambridge: Cambridge University Press. White, J. S. 1958. The limiting distribution of the serial correlation coefficient in the explosive case. Ann. Math. Stat., 29, 1188– 1197. Wooldridge, J. M. 1994. Estimation and inference for dependent processes. Handbook of Econometrics, Vol. IV. Handbooks in Econom., Vol. 2. Amsterdam: North-Holland., pp. 2639– 2738. Wu, Chien-Fu. 1981. Asymptotic theory of nonlinear least squares estimation. Ann. Stat., 9(3), 501 –513. Wu, W. B. 2005. Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci. USA, 102, 14150– 14154. Wu, W. B., Min, W. 2005. On linear processes with dependent innovations. Stochastic Process. Appl., 115, 939–958. Zhuk, V. V., Natanson, G. I. 2001. Seminorms and moduli of continuity of functions defined on a segment. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 276 (Anal. Teor. Chisel i Teor. Funkts. 17), 155– 203.
AUTHOR INDEX Aljancˇic´, S., 143 Amemiya, T., 36, 41, 42, 96, 264, 374 Anderson, T.W., xi, 36, 41, 42, 95 –96, 299, 302, 375, 383 Anselin, L., 189, 236, 263 Barlow, W.J., 282, 285 Barro, R.J., 131 Bellman, R., 22, 23 Bera, A.K., 189 Beveridge, S., 104 Billingsley, P., 26, 29, 43, 125 Bojanic´, R., 143 Brodsky, J., 131 Brown, B.M., 94 Burkholder, D.L., 267, 269, 272, 280 Cartan, H., 374, 378 Case, A.C., 230 –231 Castelar, I., 37, 80 Chow, Y.S., xii, 93, 266, 268, 272 Christopeit, N., 299 Cliff, A.D., 189 Davidson, J., xii, xiii, 20, 21, 25, 26, 32, 41, 91– 95, 266, 269, 270, 390 de Haan, L., 141 Deo, R., 131 Dvoretzky, A., 95
Gradshteyn, I.S., 413 Gundy, R.F., 271– 272 Hall, P., 95, 103, 266, 267 Hamilton, J.D., 33 Hannan, E.J., 95 Helmes, K., 299 Herstein, I.N., 307, 308, 406 Heyde, C.C., 95, 103, 266, 267 Hoque, A., 225 Hsiao, C., 375 Hurvich, C.M., 131 Iosida, K., 76 Johansen, S., x, 393, 405 Jones, M.C., 226 Kadison, R.V., 387 Kelejian, H.H., 190, 191, 214, 222, 230, 236, 263 Kolmogorov, A.N., 14, 18, 80, 115, 116, 377 Kreı˘n, M.G., 121, 201, 293 Kunitomo, N., xi, 95 Kushner, H., 322
Fomin, S.V., 14, 18, 80, 115, 116, 377
Lai, T.L., xii, 265, 270, 272, 274, 278, 281– 282, 284, 287, 288, 289, 293, 299, 301–302, 375, 383 Lee, LungFei, 127, 190, 191, 214, 217, 222, 230, 234, 236, 263 Long, R.L., 267, 281 Lu¨tkepohl, H., 24, 25, 220, 231, 397
Gohberg, I.C., 121, 201, 293 Gourie´roux, C., xii, 339, 374, 381, 384, 391
Mann, H.B., 302 Marcinkiewicz, J., 271, 280, 286 Mathai, A.M., 224
Eicker, F., 36, 43, 375, 385
McLeish, D.L., xii, 94 Monfort, A., xii, 339, 374, 381, 384, 391 Morimune, K., 374 Moussatat, M.W., 41 Muench, T.J., 302 Mynbaev, K.T., x, xi, xii, 37, 41, 52, 54, 65, 69, 70, 75, 77, 80, 97, 103, 121, 122 –123, 126, 149, 157, 181, 189, 190, 192, 214, 216, 218, 223, 227, 254, 263 Nabeya, S., xii, 40, 110, 122 Natanson, G.I., 46, 60 Nelson, C.R., 104 Nielsen, B., x, xii, xiii, 265, 301 –302, 318, 393, 405, 414 Ord, K., 189, 224, 236, 263 Paley, R.E.A.C., 267, 276, 280 Phillips, P.C.B., x, xii, 36, 131– 132, 133, 135, 137, 140 –142, 148, 153, 159 – 161, 166 –167, 169, 171 –172, 175 – 176, 179, 181, 185 –186, 339, 346, 351, 358, 366 –367, 370 Provost, S.B., 224 Prucha, I.R., 41, 191, 214, 222, 230, 236, 263 Po¨tscher, B.M., 41 Rao, C.R., 29– 30, 38 Rao, M.M., 302 Resnick, S., 141 Robbins, H., 299 Robinson, P.M., 131
Rubin, H., 302, 315 Ryzhik, I.M., 413 Sala-i-Martin, X., 131 Schmidt, P., 36, 42 Seneta, E., 132, 133, 143–146 Smirnov, O.L., 236 Stigum, B.P., 302 Stout, W.F., 311 Tanaka, K., xii, 40, 99, 104, 110, 122, 123 Taylor, J.B., 299, 375, 383 Taylor, R.L., 95 Theil, H., 140 Tomic´, M., 143 Trenogin, V.A., 6 Ullah, A., xii, 189, 192, 209, 214, 216, 218, 223, 227, 230, 264 Varga, R.S., 303 Vilenkin, N.Ja., 168 Wald, A., 302 Wei, C.Z., xii, 265, 270, 272, 274, 278, 281–282, 284, 287–289, 293, 299, 301–302, 375, 383 White, H., 217 White, J.S., 302 Wooldridge, J.M., 339, 340, 341, 351, 358, 364, 366, 368 Wu, Chien-Fu, 131, 358 Zhuk, V.V., 46, 60 Zygmund, A., 267, 268, 271, 276, 278, 280, 286, 287
SUBJECT INDEX s-additive measures, 14 s-additivity of Lebesgue integrals, 18 L2 -approximable sequence, 41 Lp-approximability abstract example, 87 convergence of the trinity, 70 criterion, 77, 87 definitions, 1-D, 68 explicit construction, 79 exponential trend, 80 geometric progression, 80 logarithmic trend, 80 matrix-valued functions, 110 polynomial trend, 80 c-properties, 68 m-properties, 68 refined convergence of bilinear forms, 69 sp classes, 117 L2-close, 41 Lp-close, 68 matrix case, 110, 394 uniformly, 353 h-continuity, 195 T-decomposition, 103, 104 s-field, 13 s-finite measure, 16 d-function, 174 1-function, 134 G-function, 141 H-function, 155 h-function, 194 L ¼ K(1), 134 L2-generated by, 41 Lp-generated by, 45 s-numbers, 117 T-operator, 55, 104, 297 adjoint, 114, 397 boundedness, 26. 298 definition (matrix case), 397
h-series, 194 2-step estimator, 226 absolute continuity, 18 of Lebesgue integrals, 18 adapted sequence of random variables, 31 adaptive normalizer, 34 adjoint operator, 11, 114 annihilation lemma, 224 asymptotic independence, condition, 39 of normalized regressors, 237 asymptotic linear independence, condition, 40 asymptotically almost surely, 340 autoregressive term domination, 258 Auxiliary vector, 245 awkward aggregates, 206 balancer mixed spatial model, 238 basis, 3 Beveridge–Nelson decomposition, 104 bilinear form, 10 binary model, 373 Borel s-field, 13 Borel –Cantelli lemma, 265 conditional, 266 Borel-measurability, 14 bounded operator, 7 sequence, 4 boundedness in probability, 26 Cauchy sequence, 5 Central Limit Theorems for weighted sums, of linear processes, 103 chain product, 205 characteristic function, 29 characteristic subspace, 21
class of normal vectors, 30 classification theorem, 181 CLT, xi CLT for quadratic forms of linear processes version 1, 122 version 2, 126 CLT for weighted sums of linear processes, 103 of martingale differences, 97 coefficients bad, 176 good, 176 compact operator, 112 companion matrix, 301 complete, 5 condition X, 75 conditional expectation, 18 conjugate number, 11 continuous in mean, 47 continuity modulus 2-D, 107 condition X, 75 definition, 46 doubling property, 59 limit at zero, 47 monotonicity, 47 of a continuous function, 47 of a step function, 72 uniform continuity, 60 continuous mapping theorem, 26 conventional scheme, 36 conventional-scheme-compliant normalizer, 37 convergence in distribution, 26 in normed spaces, 4 in probability, 24 in probability to zero, 28 convergence neighborhood, 234 convergence of the trinity (matrix case), 398 convergence on Lp-generated sequences, 65 correction term, 226 correlation, 16 counting function, 266 covariance, 16 covering, 45 Crame´r-Wold device, 91 theorem, 91
criterion of uniform integrability, 91 CSC normalizer, 37 cutter, 64 1-D, 26 2-D, 2 3-D, 20 denominator, 238 dense, 16 dense set, 16 density, 25 dimensional, 3 discretization operator, 40 1-D, 45 2-D, 106 distance, 12 distribution function, 25 double A lemma, 206 double P lemma, 240 Egorov’s theorem, 269 eigenvalue, 21, 115 eigenvector, 21, 115 embedding operator, 55 errors contribution negligibility condition, 34 escort, 195 essential formula of the ML theory of spatial models, 220 essential supremum, 15 estimator, 340 exogenous regressors domination, 258 extended homogeneity, 20 eXtended M, 240 eXtender, 240 first moment, 28 fitted value, 42 Fourier coefficients, 113 series, 113 functions of operators, 115 G&M, 374 gauge, 128 generalized M-Z theorem I, 278 II, 280 multivariate case, 287 genie, 241 Gram matrix, 23
Ho¨lder’s inequality, 15 in lp, 11, 15 Haar projector 1-D, 48 2-D, 107 Hessian, 340 high-level conditions, 44 Hilbert space, 11 Hilbert-Schmidt operators, 117 ID, 218 identically distributed vectors, 25 image, 6 increments, 293 independent events, 20 family, 20 variables, 20 indicator, 16 inequality Cauchy –Schwarz, 10 Chebyshov, 89 conditional Chebyshov, 267 conditional Ho¨lder, 267 conditional Jensen, 267 conditional Paley–Zygmund, 268 gauge, 128 Paley–Zygmund, 267 infinite-dimensional, 3 infinitely often, 265 innovations, 32 integral operator, 111 1-D, 48 2-D, 106 interpolation operator, 42 intrinsic characterization, 72 invariant subspace, 22 inverse Lipschitz function, 379 inverse of the major, 62 invertibility criterion, 260 isomorphic spaces, 8 isomorphism, 7 Karamata representation, 133, 143 kernel, 111 Kronecker’s lemma, 270 lag, 399 law of iterated expectations (LIE), 20 least s-field, 13
Lebesgue measure, 14 lemma on almost decreasing sequences, 318 on convex combinations of nonnegative random variables, 269 likelihood function, 373 Lindeberg function, 391 Lindeberg– Le´vy theorem, 92 linear independent, 2 operator, 6 space, 2 span, 2 subspace, 2 linear combination of operators, 6 of vectors, 1 linear form in independent standard normals, 92 linear process, 32, 103 linearity of Lp-approximable sequences, 178 link, 205 Lipschitz condition, 377 locator, 45 log-likelihood, 373 logistic function, 373 logit model, 373 low-level conditions, 44 m.d., xiv M–Z condition, 272 major, 62 Marcinkiewicz-Zygmund condition local, 272, 287 multivariate local, 287 martingale difference, 31 difference array, 93 transform, 270 WLLN, 93 martingale convergence theorem, 265–266 martingale strong law, 267 matrix-valued functions Lp-approximable, 394 constant, 384, 401 convergence of trilinear forms, 394 definition, 393 multiplication, 400
matrix-valued functions (Continued) summation, 400 transposition, 400 measurable function, 14 space, 14 measure nonatomic, 269 Minkowski inequality in Lp, 15 in lp , 5 ML, xiv ML estimator a.s. existence, 379 convergence at some rate, 379 MM, 189 modification factor, 226 monotonicity of lp norms, 5 Moore-Penrose inverse, 38 multicollinearity detector, 257 multiplicity of an eigenvalue, 21, 115 negative part of a function, 19 neighborhood, 13 Newton-Leibniz formula, 221 non zero-tail sequences, 63 nonnegative operator, 114 nonreduced model, 174, 176 norm equivalent, 6 homogeneity, 3 nondegeneracy, 3 nonnegativity, 3 of a vector, 3 of an operator, 7 triangle inequality, 3 normal equation, 30 normalization of system of vectors, 113 normalizer, 32 NLS, xiv nuclear operators, 117 nuke, 201 null space, 6 numerator, 238 OLS, xii open set, 13 orthogonality of subspaces, 12
of system of vectors, 113 of vectors, 12 orthogonal matrix, 21 orthogonality lemma, 76 orthonormal system, 21, 113 orthoprojector, 12 pair, 208 Parseval identity, 114 parsimony principle, 44 perturbation argument, 104 pig-in-a-poke result, 43 positive part of a function, 19 precompact set, 76 premeasurable random variables, 265 principal projectors, 240 projy mixed spatial model, 245 purely spatial model, 209 probabilistic measure, 14 probability space, 14 process long-memory, 32 non-explosive, 301 purely explosive, 301 short-memory, 32 product of operators, 7 projector, 8 proper random variable, 90 proXy mixed spatial model, 245 purely spatial model, 208 pseudo-Case matrices, 232 purely spatial model, 190 QML, 191 quadratic form in independent standard normals, 92 quasi-monotone, 147 random variable, 15, 24 vector, 24 reduced form, 191 refined convergence of bilinear forms of Lp-approximable sequences, 69 of Lp-generated sequences, 52 regulator, 63 relatively compact with probability 1, 319
429
theorem on relative compactness, 319 thread, 394 transition matrix, 176 translation continuous, 47 translation operator 1-D, 46 2-D, 107 in lp (Z), 55 trend deterministic, 404 exponential, 80 geometric progression, 80 linear, 34 logarithmic, 80 polynomial, 80 trinity, 55 boundedness, 54 convergence on Lp-generated sequences, 65 definition, 55 matrix representation, 56 truncation argument, 270 two-step estimator, 226 uncorrelated variables, 30 uniform absolute continuity, 91 L1 -boundedness, 91 convergence of operators, 8 equicontinuity in mean, 76 uniform convergence theorem, 132, 144 uniform partition, 45 uniformly integrable, 91 unit vector in lp , 5 in Rn , 2 vanishing sequence, 4 VAR, xv variance stabilization condition, 33 variance-stabilizing normalizer, 36 variation, 268 vector, 1 vector spaces, 2 weak law of large numbers, 93 WLLN, 93 zero-tail sequences, 63