PERIODICALLY CORRELATED RANDOM SEQUENCES
Spectral Theory and Practice

Harry L. Hurd, The University of North Carolina at Chapel Hill
Abolghassem Miamee, Hampton University

WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication
Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:

Hurd, Harry L. (Harry Lee), 1940-
Periodically correlated random sequences : spectral theory and practice / Harry L. Hurd.
p. cm. (Wiley series in probability and statistics)
Includes index.
ISBN 978-0-471-34771-2 (cloth)
1. Spectral theory (Mathematics) 2. Sequences (Mathematics) 3. Correlation (Statistics) 4. Stochastic processes. I. Miamee, Abolghassem, 1944- II. Title.
QC20.7.S64H87 2007
515'.24--dc22
2007013742

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
To Marcia, Cheryl, Robert, Olivia, Angela and to Efie, Goly, Naxy, and Ali
CONTENTS

Preface xiii
Acknowledgments xv
Glossary xvii

1 Introduction 1
1.1 Summary 6
1.2 Historical Notes 14
Problems and Supplements 16

2 Examples, Models, and Simulations 19
2.1 Examples and Models 20
2.1.1 Random Periodic Sequences 20
2.1.2 Sums of Periodic and Stationary Sequences 21
2.1.3 Products of Scalar Periodic and Stationary Sequences 21
2.1.4 Time Scale Modulation of Stationary Sequences 22
2.1.5 Pulse Amplitude Modulation 23
2.1.6 A More General Example 24
2.1.7 Periodic Autoregressive Models 25
2.1.8 Periodic Moving Average Models 27
2.1.9 Periodically Perturbed Dynamical Systems 28
2.2 Simulations 29
2.2.1 Sums of Periodic and Stationary Sequences 29
2.2.2 Products of Scalar Periodic and Stationary Sequences 30
2.2.3 Time Scale Modulation of Stationary Sequences 32
2.2.4 Pulse Amplitude Modulation 33
2.2.5 Periodically Perturbed Logistic Maps 35
2.2.6 Periodic Autoregressive Models 38
2.2.7 Periodic Moving Average Models 40
Problems and Supplements 42

3 Review of Hilbert Spaces 45
3.1 Vector Spaces 45
3.2 Inner Product Spaces 47
3.3 Hilbert Spaces 49
3.4 Operators 51
3.5 Projection Operators 53
3.6 Spectral Theory of Unitary Operators 60
3.6.1 Spectral Measures 60
3.6.2 Spectral Integrals 61
3.6.3 Spectral Theorems 64
Problems and Supplements 65

4 Stationary Random Sequences 67
4.1 Univariate Spectral Theory 68
4.1.1 Unitary Shift 68
4.1.2 Spectral Representation 70
4.1.3 Mean Ergodic Theorem 72
4.1.4 Spectral Domain 74
4.2 Univariate Prediction Theory 75
4.2.1 Infinite Past, Regularity and Singularity 75
4.2.2 Wold Decomposition 76
4.2.3 Innovation Subspaces 78
4.2.4 Spectral Theory and Prediction 84
4.2.5 Finite Past Prediction 91
4.3 Multivariate Spectral Theory 99
4.3.1 Unitary Shift 100
4.3.2 Spectral Representation 101
4.3.3 Mean Ergodic Theorem 102
4.3.4 Spectral Domain 102
4.4 Multivariate Prediction Theory 107
4.4.1 Infinite Past, Regularity and Singularity 107
4.4.2 Wold Decomposition 108
4.4.3 Innovations and Rank 109
4.4.4 Regular Processes 116
4.4.5 Infinite Past Prediction 119
4.4.6 Spectral Theory and Rank 121
4.4.7 Spectral Theory and Prediction 123
4.4.8 Finite Past Prediction 125
Problems and Supplements 129

5 Harmonizable Sequences 133
5.1 Vector Measure Integration 134
5.2 Harmonizable Sequences 141
5.3 Limit of Ergodic Average 145
5.4 Linear Time Invariant Filters 146
Problems and Supplements 149

6 Fourier Theory of the Covariance 151
6.1 Fourier Series Representation of the Covariance 152
6.2 Harmonizability of PC Sequences 160
6.3 Some Properties of B_k(τ), F_k, and F 168
6.4 Covariance and Spectra for Specific Cases 170
6.4.1 PC White Noise 170
6.4.2 Products of Scalar Periodic and Stationary Sequences 171
6.5 Asymptotic Stationarity 172
6.6 Lebesgue Decomposition of F 173
6.7 The Spectrum of m_t 174
6.8 Effects of Common Operations on PC Sequences 176
6.8.1 Linear Time Invariant Filtering 176
6.8.2 Differencing 181
6.8.3 Random Shifts 182
6.8.4 Sampling 187
6.8.5 Bandshifting 191
6.8.6 Periodically Time Varying (PTV) Filters 192
Problems and Supplements 194

7 Representations of PC Sequences 199
7.1 The Unitary Operator of a PC Sequence 200
7.2 Representations Based on the Unitary Operator 201
7.2.1 Gladyshev Representation 201
7.2.2 Another Representation of Gladyshev Type 203
7.2.3 Time-Dependent Spectral Representation 203
7.2.4 Harmonizability Again 205
7.2.5 Representation Based on Principal Components 207
7.3 Mean Ergodic Theorem 210
7.4 PC Sequences as Projections of Stationary Sequences 212
Problems and Supplements 213

8 Prediction of PC Sequences 215
8.1 Wold Decomposition 218
8.2 Innovations 220
8.3 Periodic Autoregressions of Order 1 226
8.4 Spectral Density of Regular PC Sequences 229
8.4.1 Spectral Densities for PAR(1) 231
8.5 Least Mean-Square Prediction 235
8.5.1 Prediction Based on Infinite Past 235
8.5.2 Prediction for a PAR(1) Sequence 236
8.5.3 Finite Past Prediction 237
Problems and Supplements 246

9 Estimation of Mean and Covariance 249
9.1 Estimation of m_t: Theory 250
9.2 Estimation of m_t: Practice 261
9.2.1 Computation of m̂_N 262
9.2.2 Computation of m̂_{k,N} 263
9.3 Estimation of R(t+τ, t): Theory 264
9.3.1 Estimation of R(t+τ, t) 265
9.3.2 Estimation of B_k(τ) 272
9.4 Estimation of R(t+τ, t): Practice 282
9.4.1 Computation of R̂_N(t+τ, t) 283
9.4.2 Computation of B̂_{k,NT}(τ) 288
Problems and Supplements 292

10 Spectral Estimation 297
10.1 The Shifted Periodogram 299
10.2 Consistent Estimators 302
10.3 Asymptotic Normality 306
10.4 Spectral Coherence 308
10.4.1 Spectral Coherence for Known T 308
10.4.2 Spectral Coherence for Unknown T 310
10.5 Spectral Estimation: Practice 312
10.5.1 Confidence Intervals 312
10.5.2 Examples 313
10.6 Effects of Discrete Spectral Components 322
10.6.1 Removal of the Periodic Mean 323
10.6.2 Testing for Additive Discrete Spectral Components 323
10.6.3 Removal of Detected Components 327
Problems and Supplements 328

11 A Paradigm for Nonparametric Analysis of PC Time Series 331
11.1 The Period T is Known 332
11.2 The Period T is Unknown 334

References 337
Index 351
PREFACE
Periodically correlated (or cyclostationary) processes are random processes that have a periodic structure, but are still very much random. Roughly speaking, if the model of a physical system contains randomness and periodicity together, then measurements made on the system (over time) will very likely have a structure that is periodically nonstationary, or in the second order case, periodically correlated. For example, meteorological systems, communication systems, systems containing rotating shafts, and economic systems all have these properties.

The intent of this work is to introduce the main ideas of periodically correlated processes through the simpler periodically correlated sequences. Our approach is to provide (1) motivating and illustrative examples, (2) an account of the second order theory, and (3) some basic theory and methods for practical time series analysis. Our particular view of the second order theory places emphasis on the unitary operator that propagates or shifts the sequence by one period. This view makes clear the well known connection between stationary vector sequences and periodically correlated sequences. But we do not rely completely on this connection and have sometimes chosen methods of proof that are extensible to continuous time or to almost PC processes.

As for time series analysis, we suppose that a reader is presented with a sample of a time series and asked to determine if periodic correlation is present, and if so, to say something about it, to characterize it. We present the theory, methods, and algorithms that will help the reader answer this question, within the scope of covariance and spectral estimation. The topic of periodic autoregressive moving average (PARMA) sequences became too large for inclusion at this time, especially when we began to consider sequences of less than full rank.

Accordingly, the book is roughly organized into three parts. Chapters 1 and 2 present basic definitions, simple mathematical models, and simulations whose intent is to motivate and give insight. In this we present a number of examples that illustrate that the usual periodogram analysis cannot be expected to reveal the presence of periodic correlation in a time series. We give a historical review of the topic that mainly emphasizes the early development but gives references to application-specific bibliographies. Chapters 3-8 give background and theoretical structure, beginning with a review of Hilbert space, including the spectral theorem for unitary operators, and correlation and spectral theory for multivariate stationary sequences. We present the (spectral) theory of harmonizable sequences and then the Fourier theory for the covariance of PC sequences. This is naturally followed by representations for PC sequences, and here is where the unitary operator plays its part. We then treat the prediction problem for PC sequences and introduce the rank of a PC sequence. The last three chapters (Chapters 9-11) treat issues of time series analysis for PC sequences. We first treat the nonparametric estimation of mean, correlation, and spectrum. Chapter 11 summarizes the methods into a paradigm for nonparametric time series analysis of possibly PC sequences.

MATLAB scripts used in preparing the figures and in conducting the time series analyses, as well as the data used, can be obtained from the website http://www.unc.edu/~hhurd/pc-sequences. The material beginning with Chapter 3 would be useful as a basis for a course of study. It would be helpful for students to have a senior level background in vector spaces, probability, and random processes. The material of Chapter 2 is designed to provide motivation and insight and would probably be helpful to most students except those who may have some familiarity with the topic.

HARRY L. HURD AND ABOLGHASSEM MIAMEE
Chapel Hill, NC and Hampton, VA
January 31, 2007
ACKNOWLEDGMENTS
The authors gratefully acknowledge the support of ONR, USARO, NSA, and the Iranian IPM for work leading to this book. In addition, we acknowledge the encouragement, interest, and helpfulness of Stamatis Cambanis, Harry Chang, Dominique Dehay, Neil Gerr, J. C. Hardin, Christian Houdré, Gopinath Kallianpur, Timo Koski, Douglas Lake, Robert Launer, Jacek Leskow, Andrzej Makagon, P. R. Masani, Antonio Napolitano, M. Pourahmadi, M. M. Rao, H. Salehi, and A. M. Yaglom.

HLH and AGM
GLOSSARY

A univariate process (or sequence).
A vector (or multivariate) sequence.
The T-variate sequence formed from blocking.
The mean of X_t; that is, m(t) = E{X_t}.
The covariance of X_t evaluated at (s, t).
The matrix spectral distribution function of the T-variate vector stationary sequence arising from the blocking (lifting) of a univariate PC-T sequence.
The matrix spectral density of the T-variate vector stationary sequence arising from the blocking (lifting) of a univariate PC-T sequence.
The matrix spectral distribution function of the T-variate vector stationary sequence {Z_t^j : j = 0, 1, ..., T-1, t ∈ Z} resulting from Gladyshev's transformation.
The rank of the PC-T sequence X_t.
The rank of the matrix A.
Hilbert space generated by the sequence X_t.
Generic set with a linear structure.
M: Generic subspace of a Hilbert space.
NND: Nonnegative definite.
CHAPTER 1
INTRODUCTION
Periodically correlated (PC) random processes are random processes in which there exists a periodic rhythm in the structure that is generally more complicated than periodicity in the mean function. We will begin with an illustration using some meteorological data. The top trace of Figure 1.1 shows a 40 day record of hourly solar radiation levels taken at meteorological station DELTA on Ellesmere Island, N.W.T., Canada. A daily (24 hour period) rhythm may be observed in this data in two ways: in the periodic average (or mean) and in the variation about the periodic mean. Since solar radiation can be expected to have a 24 hour period, let us compute the average of the 40 measurements for each of the 24 hours. Precisely, if the time series is denoted by X_t, t = 1, 2, ..., NT, where NT = 960, then the sample periodic mean (with period T = 24) is computed by
\hat{m}_N(t) = \frac{1}{N} \sum_{n=0}^{N-1} X_{t+nT}, \qquad t = 1, 2, \ldots, T,    (1.1)
Figure 1.1  (Top) Solar radiation from station DELTA of the Taconite Inlet Project [211]. (Bottom) m̂_N(t) with 95% confidence intervals determined by the Student's t; T = 24, N = 40.
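The averaging behind the sample periodic mean and variance above amounts to reshaping the record into an N × T array and averaging down the columns. The book's companion scripts are MATLAB; the sketch below is an independent Python/NumPy rendering on a synthetic PC-24 signal of our own devising (the DELTA record is not reproduced here), with 0-based indexing where the text indexes from 1:

```python
import numpy as np

def periodic_mean_var(x, T):
    """Sample periodic mean and variance over the N = len(x)//T full periods."""
    N = len(x) // T
    blocks = x[:N * T].reshape(N, T)      # row n holds X_{nT}, ..., X_{nT+T-1}
    m_hat = blocks.mean(axis=0)           # average the N values seen at each hour
    s2_hat = blocks.var(axis=0, ddof=1)   # sample variance at each hour, divisor N-1
    return m_hat, s2_hat

# Synthetic stand-in for the solar record: periodic mean plus periodically modulated noise
rng = np.random.default_rng(0)
T, N = 24, 40
t = np.arange(N * T)
x = 5 * np.sin(2 * np.pi * t / T) \
    + (1 + 0.5 * np.cos(2 * np.pi * t / T)) * rng.standard_normal(t.size)

m_hat, s2_hat = periodic_mean_var(x, T)
print(m_hat.shape, s2_hat.shape)          # (24,) (24,)
```

With N = 40 periods, m̂_N recovers the true periodic mean to within roughly the noise standard deviation divided by √40, which is what the confidence intervals in the figure quantify.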
and plotted in the bottom trace of Figure 1.1. For t not in the base interval, m̂_N(t) is defined periodically. It is visually clear that the sample periodic mean is not constant (but properly periodic), and a simple hypothesis test for difference in mean, say, between hour 1 and hour 13, indicates a difference with much significance. We postpone the details of testing for a proper fluctuation in the mean (i.e., for rejection of the hypothesis that the true mean m(t) is constant) to Chapter 9. The top trace of Figure 1.2 is the deviation Y_t = X_t - m̂_N(t) of X_t from the sample periodic mean m̂_N(t). The bottom trace presents the sample periodic variance,
\hat{s}_N^2(t) = \frac{1}{N-1} \sum_{n=0}^{N-1} \left[ X_{t+nT} - \hat{m}_N(t) \right]^2, \qquad t = 1, 2, \ldots, T,    (1.2)
and it too appears to have a significant (with the details again postponed) variation through the period. So it is not just the mean that appears to have a periodic rhythm; the variance does too, suggesting that the entire probability law may have a periodic rhythm. We will state this more precisely following some discussion of notation.

First, a stochastic (or random) process X(t, ω) is taken to be a function X : I × Ω → C, where C is the set of complex numbers, I is called the index set, and Ω is a space on which a sigma-algebra F of subsets and a probability measure P are defined. An F-measurable function is called a random variable, and for a stochastic process, the function X(t, ·) is assumed to be a random variable for each t ∈ I. Although the focus of this book is random sequences
Figure 1.2  (Top) Deviation around the sample periodic mean. (Bottom) ŝ_N(t) with 95% confidence limits determined by the chi-squared distribution with N - 1 = 39 degrees of freedom.
(I = Z) having a periodic rhythm, extensions of the ideas to fields (I = Z²), to processes (I = R), to multivariate sequences, and to almost periodic sequences are briefly described in the supplements to this chapter. We will most often denote the element of the random sequence by X_t, so that the dependence on ω is suppressed and the index is the subscript symbol t, conveying time. The essential structure needed to characterize a stochastic process is its probability law, meaning the collection of finite dimensional distributions, defined as the probabilities

P_{t_1, t_2, \ldots, t_n}(A_1, A_2, \ldots, A_n) = P[X_{t_1} \in A_1, X_{t_2} \in A_2, \ldots, X_{t_n} \in A_n]    (1.3)

for arbitrary n, collection of times t_1, t_2, ..., t_n in Z, and Borel sets A_1, A_2, ..., A_n of C.
Definition 1.1 (Strict Stationarity) A stochastic process X_t(ω) is called (strictly) stationary if its probability law is invariant with respect to time shifts, or more precisely, if for arbitrary n, collection of times t_1, t_2, ..., t_n in Z, and Borel sets A_1, A_2, ..., A_n of C we have

P_{t_1+1, t_2+1, \ldots, t_n+1}(A_1, A_2, \ldots, A_n) = P_{t_1, t_2, \ldots, t_n}(A_1, A_2, \ldots, A_n).    (1.4)
Now we can formalize the structure suggested by Figures 1.1 and 1.2.
Definition 1.2 (Periodic Stationarity) A stochastic sequence X_t(ω) is called (strictly) periodically stationary with period T if, for every n, any collection of times t_1, t_2, ..., t_n in Z, and Borel sets A_1, A_2, ..., A_n of C,

P_{t_1+T, t_2+T, \ldots, t_n+T}(A_1, A_2, \ldots, A_n) = P_{t_1, t_2, \ldots, t_n}(A_1, A_2, \ldots, A_n),    (1.5)
and there are no smaller values of T > 0 for which (1.5) holds.

Synonyms for periodically stationary include periodically nonstationary, cyclostationary (think of cyclically stationary), processes with periodic structure, and a few others. For a little more on this nomenclature, see the historical notes (Section 1.2) at the end of this chapter. If (1.5) holds for T = 1, then the process (or sequence) is stationary, and it is clear that if X_t is periodically stationary with period T, then it is also for period kT, k ∈ Z. And so we say that a sequence is properly periodically stationary if the least T for which (1.5) holds exceeds 1. Most often we will be considering second order random sequences, so that
X_t ∈ L². The mean exists for second order sequences,

m(t) := \int_\Omega X_t(\omega) \, P(d\omega), \qquad \text{for all } t \in \mathbb{Z},

and we define the covariance of the pair (X_s, X_t) to be

R(s,t) := \mathrm{Cov}(X_s, X_t) = E\{[X_s - m_s]\,\overline{[X_t - m_t]}\}.

If there is no ambiguity, we will write m(t) and R(s, t) for the mean and covariance of X_t. Sometimes, in order to conserve space, we will write variables as subscripts rather than in parentheses, such as m_t for m(t) and R_{s,t} for R(s, t). Since, for a zero mean sequence X_t, the covariance

\mathrm{Cov}(X_s, X_t) = E\{X_s \overline{X_t}\}

is clearly the L² inner product, our conclusions about zero mean second order random sequences can be interpreted for sequences of vectors in a Hilbert space. For some topics (e.g., those involving shift operators) it will be more natural to think of X_t in this manner. The notion of stationarity for second order sequences is expressed in terms of the first two moments.
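The mean and covariance just defined can be illustrated by ensemble (Monte Carlo) averaging over independent realizations of a simple periodically modulated sequence. Everything below, the model, names, and tolerances, is our own illustrative choice, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_real = 6, 100000
t = np.arange(2 * T)
f = 1 + 0.5 * np.sin(2 * np.pi * t / T)        # periodic modulation, period T

# n_real independent realizations of X_t = cos(2*pi*t/T) + f(t) * Z_t over 2T time points
X = np.cos(2 * np.pi * t / T) + f * rng.standard_normal((n_real, t.size))

m_hat = X.mean(axis=0)                          # ensemble estimate of m(t) = E{X_t}
Xc = X - m_hat
R_hat = Xc.T @ Xc / n_real                      # estimate of R(s, t) (real-valued case)

# Periodic rhythm (formalized in Definition 1.4 below): m(t) = m(t+T) and
# R(s, t) = R(s+T, t+T), up to Monte Carlo sampling error
print(np.max(np.abs(m_hat[:T] - m_hat[T:])))
print(np.max(np.abs(R_hat[:T, :T] - R_hat[T:, T:])))
```

Note that for complex-valued sequences the second factor of the covariance would need a conjugate; the sketch uses real data, so the conjugate is vacuous.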
Definition 1.3 (Weak Stationarity) A second order random process X_t ∈ L²(Ω, F, P) with t ∈ Z is called (weakly) stationary if for every s, t ∈ Z

m(t) = m \quad \text{and} \quad R(s,t) = R(s-t).

If X_t is of second order, periodic stationarity induces a rhythmic structure in the mean and covariance.
Definition 1.4 (Periodically Correlated) A second order process X_t ∈ L²(Ω, F, P) is called periodically correlated with period T (PC-T) if for every s, t ∈ Z

m(t) = m(t+T)    (1.6)

and

R(s,t) = R(s+T, t+T),    (1.7)

and there are no smaller values of T > 0 for which (1.6) and (1.7) hold.

It is clear that if the period is T, then (1.6) and (1.7) also hold when T is replaced by kT, for any integer k. If X_t is PC-1 then it is stationary (weakly) because then R(s, t) is a function only of s - t. Clearly a stationary sequence is PC with every period. We will write an indexed collection {X_t^j, j = 1, 2, ..., q} of random sequences as the vector sequence X_t = [X_t^1, ..., X_t^q]'.
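Definition 1.4 can be exercised on the simplest product construction (variants of which appear in Chapter 2): if f has least period T and Z_t is zero mean, unit variance white noise, then X_t = f(t)Z_t has m(t) = 0 and R(s, t) = f(s)f(t) when s = t and 0 otherwise. The small Python check below (our own illustration, not the book's) confirms that (1.7) holds for T and its multiples and for no smaller candidate period:

```python
import numpy as np

T = 4
f = lambda t: 2.0 + np.cos(2 * np.pi * t / T)   # periodic amplitude with least period 4

# Exact covariance of X_t = f(t) Z_t, with Z_t zero mean unit variance white noise
def R(s, t):
    return f(s) * f(t) if s == t else 0.0

def is_period(p, span=40):
    """Does R(s, t) = R(s+p, t+p), as in (1.7), over a grid of (s, t) pairs?"""
    return all(abs(R(s, t) - R(s + p, t + p)) < 1e-9
               for s in range(span) for t in range(span))

print([p for p in range(1, 9) if is_period(p)])   # [4, 8]
```

So the sequence is PC-4: the rhythm (1.7) holds for p = 4 and p = 8, but the definition singles out the least such value.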
Definition 1.5 (Multivariate Stationarity) A second order q-variate random sequence X_t with t ∈ Z is called (weakly) stationary if

E\{X_t^j\} = m_j    (1.8)

and

R_{jk}(s,t) = \mathrm{Cov}(X_s^j, X_t^k) = R_{jk}(s-t)    (1.9)

for all s, t ∈ Z and j, k ∈ {1, 2, ..., q}. If this is the case, we denote m = [m_1, m_2, ..., m_q]' and R(τ) = [R_{jk}(τ)]_{j,k=1}^{q}.

Multivariate (or vector) sequences obtained from the blocking of univariate (or scalar) sequences will be indexed by n and thus denoted as X_n. That is, the univariate sequence X_t is related by T-blocking to the T-variate sequence X_n by

[\mathbf{X}_n]^j = X_{j+nT}, \qquad n \in \mathbb{Z}, \; j = 0, 1, \ldots, T-1.    (1.10)

The following proposition is a simple matter of following the indices.

Proposition 1.1 (Gladyshev) A second order random sequence {X_t : t ∈ Z} is PC with period T if and only if T is the smallest integer for which the T-variate blocked sequence X_n (1.10) is stationary.

Proof. Considering the covariance Cov([X_n]^j, [X_m]^k) = Cov(X_{j+nT}, X_{k+mT}), stationarity of X_n implies

\mathrm{Cov}([\mathbf{X}_n]^j, [\mathbf{X}_m]^k) = R^{jk}(n-m) = \mathrm{Cov}(X_{j+nT}, X_{k+mT}),
which implies (1.7) holds for X_t, and conversely. The same argument applies to the mean.

Periodically correlated sequences are generally nonstationary, but they are nonstationary in a very simple way that, when the period T is known, makes them equivalent to vector valued stationary processes. The term periodically correlated was introduced by E. G. Gladyshev [77], but the same property was introduced by W. R. Bennett [la], who called them cyclostationary. Since PC sequences are so closely related to stationary vector sequences, which are rather well understood, one can legitimately ask: why go to the effort to study the structure of these processes? There are several answers. First, the value of T, required to transform a PC sequence to a vector stationary sequence, sometimes is not known prior to the analysis of an observed time series. Thus studying the time and spectral structure of the process using its natural time organization can provide clues to help us develop tests for PC structure and estimators for the period T. Second, the issues concerning innovation rank are more easily understood for PC sequences than for multivariate sequences because the natural time order eliminates some ambiguity. Third, the methods developed here for sequences naturally carry over to continuous time and to the almost periodic case; and in those cases it is not generally possible to block the process into a stationary sequence of finite dimensional vectors. We will often assume that E{X_t} ≡ 0, as it is the covariance (or quadratic) structure that is of most interest. However, we shall carefully discuss the issue of the additive periodic terms of a PC sequence, how they can be conceptually viewed, and how they can be treated in the analysis of time series. There are several ways in which two sequences can be considered equal.
For example, two random processes X_t and Y_t can be called equal if for each ω ∈ Ω their respective sample paths X_t(ω) and Y_t(ω) are the same. However, throughout this book, unless otherwise specified, we take two processes X_t and Y_t to be equal if

E|X_t - Y_t|^2 = 0, \quad \text{for every } t \in \mathbb{Z}.
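The blocking map (1.10) behind Proposition 1.1 is a plain reshape. The check below is only a Monte Carlo illustration of the proposition, not a proof: the modulated noise model is our own choice, and "stationarity" is probed only by comparing lag zero covariance estimates over two halves of the record.

```python
import numpy as np

def block(x, T):
    """T-blocking, as in (1.10): [X_n]^j = X_{j + nT}, j = 0, ..., T-1."""
    n = len(x) // T
    return x[:n * T].reshape(n, T)       # row n is the vector X_n

rng = np.random.default_rng(1)
T, n_blocks = 3, 20000
t = np.arange(T * n_blocks)
x = (1 + 0.8 * np.cos(2 * np.pi * t / T)) * rng.standard_normal(t.size)   # a PC-3 sequence

X = block(x, T)                           # shape (20000, 3)
# (1.10) as an index identity
assert X[5, 2] == x[2 + 5 * T]

# For a zero mean stationary vector sequence, the lag-0 covariance matrix does not
# depend on n; estimates from two disjoint halves should agree up to sampling error.
half = n_blocks // 2
C1 = X[:half].T @ X[:half] / half
C2 = X[half:].T @ X[half:] / half
print(np.max(np.abs(C1 - C2)))
```

The diagonal of C1 estimates the three distinct within-period variances of the PC-3 sequence, which the scalar record mixes together but the blocked sequence keeps separate.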
1.1 SUMMARY

This summary provides a little more detail about the contents, with enough precision to make our direction clear but not with the same care we will give subsequently. It also provides further discussion of notation.
Chapter 1: Introduction. Gives an introductory empirical example to motivate the definitions, and then this summary, followed by a historical development of the study of these processes. In this we do not attempt a complete bibliography but concentrate on the beginnings of the topic and give additional references that contain more complete bibliographies.

Chapter 2: Examples, Models, and Simulations. Presents simple models for constructing PC sequences, usually by combining randomness (usually through stationary sequences) with periodicity. Some important examples are sums and products of periodic sequences and stationary sequences, time scale modulation of stationary sequences, pulse amplitude modulation, periodic autoregressions, periodic moving averages, and periodically perturbed dynamical systems. For most of these examples, results of simulations are presented to show the extent to which some sort of periodic rhythm is visually perceptible in the time series. These also illustrate that the usual periodogram typically does not reveal the presence of the periodic structure in PC sequences, and the periodogram of the squares sometimes can reveal the periodic structure, but not always.

Chapter 3: Review of Hilbert Spaces. Presents the basic facts about Hilbert space that will be needed. After definitions of vector space, inner product, and Hilbert space, general properties of (linear) operators are discussed. Of particular interest are projection operators, which have an important use in prediction, and unitary operators, which have a fundamental role in stationary and PC sequences. Finally, we review the spectral theory for unitary operators, including spectral measures, integrals, and the representation
U = \int_0^{2\pi} e^{i\lambda} \, E(d\lambda).    (1.11)
This spectral representation plays a critical role in the spectral theory for stationary and PC sequences.

Chapter 4: Stationary Random Sequences. Emphasizes the role of the unitary operator and its spectral representation, as we believe this helps to give a clear view of PC sequences. The core result is that if X_t^j, j = 1, 2, ..., q are jointly (weakly) stationary and H = \overline{\mathrm{sp}}\{X_t^j : j = 1, 2, \ldots, q, \; t \in \mathbb{Z}\}, the stationary covariance structure allows one to prove quite easily that there exists a unitary operator U : H → H for which

X_{t+1}^j = U X_t^j    (1.12)

for every j = 1, 2, ..., q and t ∈ Z. Iterating (1.12) gives X_t^j = U^t X_0^j for all t, and by applying the spectral representation (1.11) we obtain the spectral
representation of the sequence,

X_t^j = \int_0^{2\pi} e^{i\lambda t} \, \xi_j(d\lambda),    (1.13)

where ξ_j is orthogonally scattered. We then discuss the main topics connected with prediction: regularity and singularity, the Wold decomposition, innovations, the predictor expressed by innovations, the connection between spectral theory and prediction, and finally, finite past prediction. We also discuss the issue of rank in connection with innovations and spectral theory.

Chapter 5: Harmonizable Sequences. Presents the main facts about harmonizable random sequences with emphasis on what is important to PC sequences. As a generalization of the spectral representation for stationary sequences (and also for continuous time), M. Loève [138], who also wrote about (strongly) harmonizable processes in the first edition of Probability Theory [139], defined a sequence to be harmonizable if it has a spectral representation
X_t = \int_0^{2\pi} e^{i\lambda t} \, \xi(d\lambda),    (1.14)

where ξ(·) is an L²(Ω, F, P) valued measure but no longer has orthogonally scattered (or uncorrelated) increments as it does in the stationary case. In order to convey the precise meaning of (1.14), we discuss vector valued measures and integration with respect to such measures. Then we discuss weakly and strongly harmonizable sequences, their connection to projections of stationary sequences, and the spectral representation

R(s,t) = \int_0^{2\pi} \! \int_0^{2\pi} e^{i\lambda_1 s - i\lambda_2 t} \, F(d\lambda_1, d\lambda_2)    (1.15)

of the covariance, where the sense of integration depends on whether X_t is weakly or strongly harmonizable. Finally, we show how time invariant linear filtering affects the spectral representation of a harmonizable sequence (and of its covariance).

Chapter 6: Fourier Theory of the Covariance. This is a topic introduced and mainly completed by Gladyshev [77]. The bijection between PC-T sequences and T-variate stationary vector sequences makes it no surprise that the Fourier theory for the covariance of PC sequences is very much related to the Fourier theory for the covariance of stationary vector sequences. The PC structure in the covariance (1.7) easily implies that
R(t+\tau, t) = \sum_{k=0}^{T-1} B_k(\tau) \, e^{i 2\pi k t / T},    (1.16)
Figure 1.3  Support S_T of the spectral measure F for a periodically correlated sequence.
where B_k(\tau) = T^{-1} \sum_{t=0}^{T-1} e^{-i 2\pi k t / T} R(t+\tau, t). Using the connection to stationary vector sequences, Gladyshev argued that the coefficient functions {B_k(τ) : k = 0, 1, ..., T-1} are Fourier transforms

B_k(\tau) = \int_0^{2\pi} e^{i\lambda\tau} \, F_k(d\lambda).    (1.17)

We show it by use of a characterization of Fourier transforms based on a theorem of Riesz. The plausibility that R(s, t) given by (1.16) can be put into the form (1.15), which would make the covariance strongly harmonizable, turns out to be a fact, so every PC sequence is strongly harmonizable. The defining rhythm (1.7) associated with a PC sequence constrains the support set of the spectral measure F appearing in (1.15) to the 2T - 1 diagonal lines
S_T = \{(\lambda_1, \lambda_2) \in [0, 2\pi)^2 : \lambda_2 = \lambda_1 - 2\pi k / T, \; k = -(T-1), \ldots, T-1\},    (1.18)
as illustrated in Figure 1.3. The support lines of F may be identified with the sequence {F_k(·) : k = 0, ..., T-1} of complex measures whose Fourier transforms are B_k(τ). We discuss the Lebesgue decomposition of F and the issue of point masses in the random spectral measure ξ(·), some of which are produced by the mean m(t). The effects of time invariant and periodic filtering, sampling, and random time shifting of PC sequences are examined. We also give the mapping between the spectral measure F and the matrix valued spectral measure F of the (blocked) vector stationary sequence X_n.
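The coefficient functions B_k(τ) of (1.16) are simply a length-T discrete Fourier transform of t ↦ R(t+τ, t). The sketch below uses our own product-noise covariance (as in the earlier illustrations, not a model from the book) to compute the B_k and to confirm that the Fourier series (1.16) reconstructs R:

```python
import numpy as np

T = 4
f = lambda t: 2.0 + np.cos(2 * np.pi * np.asarray(t, dtype=float) / T)

def R(tau, t):
    """Covariance R(t+tau, t) of X_t = f(t) Z_t with unit variance white noise Z_t."""
    return f(t) ** 2 * (tau == 0)

def B(k, tau):
    """B_k(tau) = T^{-1} sum_{t=0}^{T-1} exp(-i 2 pi k t / T) R(t+tau, t), cf. (1.16)."""
    t = np.arange(T)
    return np.mean(np.exp(-2j * np.pi * k * t / T) * R(tau, t))

# Fourier reconstruction (1.16): R(t+tau, t) = sum_{k=0}^{T-1} B_k(tau) exp(i 2 pi k t / T)
for t0 in range(T):
    recon = sum(B(k, 0) * np.exp(2j * np.pi * k * t0 / T) for k in range(T))
    assert abs(recon - R(0, t0)) < 1e-10
```

For k ≠ 0 the B_k are generally complex; here B_0(0) = (9 + 4 + 1 + 4)/4 = 4.5, the time average of the periodic variance, while the higher coefficients carry the periodic part that a stationary analysis would miss.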
Chapter 7: Representations of PC Sequences. Addresses various representations of PC sequences, with an emphasis on the connection to the unitary operator of a PC sequence. The basic covariance structure (1.7) implies that on the Hilbert space H, the closed linear span of {X_t : t ∈ Z}, there exists a unitary operator U : H → H for which

X_{t+T} = U X_t   (1.19)

for every t ∈ Z. Thus U is a shift operator for X, but only for shifts of length T. The most basic consequence of (1.19) is that we can find (derived from U) another unitary operator V : H → H and a periodic function P_t taking values in H for which

X_t = V^t P_t, for every t ∈ Z.   (1.20)

Using the spectral theorem for the unitary V leads to a spectral representation X_t = ∫_0^{2π} e^{iλt} ξ_1(dλ, t), where ξ_1(·, t) is orthogonally scattered for all t, whereas harmonizability implies that PC sequences also have a spectral decomposition (1.14) with respect to a time invariant random spectral measure ξ that is not orthogonally scattered. With the aid of (1.20) we explicitly construct the time independent random measure ξ. By expanding P_t in a Fourier series we obtain the Gladyshev representation X_t = Σ_{k=0}^{T-1} Z_t^k e^{i2πkt/T} as a Fourier series having jointly stationary coefficients {Z_t^k : k = 0, 1, …, T-1}. We show (see [160]) how to explicitly construct a dilated sequence Y_t such that X_t can be recovered by projection, X_t = P Y_t.

Chapter 8: Prediction of PC Sequences. Treats the prediction problem for PC sequences, again with the help of the unitary operator U. We discuss regularity and singularity, the Wold decomposition and innovations, where we find that, at any t, the dimension d_t of the innovation space is either 0 or 1, and d_t = d_{t+T}. It follows that a regular PC-T sequence has an infinite moving average representation

X_t = Σ_{j ≥ 0 : t-j ∈ D_+} a_j^t ξ_{t-j}

with respect to the orthonormal sequence {ξ_t : t ∈ D_+}, where the ℓ² sequence of coefficients A_t = {a_j^t : j ≥ 0} is periodic, A_t = A_{t+T}, and D_+ = {t : d_t > 0} is the set of times where nontrivial innovation occurs. The number r = Σ_{j=1}^{T} d_{t+j} is a constant (independent of t) and is defined to be the rank of a PC-T sequence. A PC-T sequence is of full rank whenever r = T. For a simply constructed PC sequence of less than full rank, let {ξ_t : t ∈ Z} be an orthonormal sequence; the sequence {…, ξ_{-1}, ξ_{-1}, ξ_0, ξ_0, ξ_1, ξ_1, …} is PC-2 but of rank 1. We discuss the prediction problem for infinite and finite sets of predictors and give some illustrative results for periodic autoregressions of order 1, which, although simple, may also be of less than full
rank. We then discuss prediction based on a finite past, periodic partial autocorrelations, and the periodic Durbin-Levinson algorithm. We also give the innovation algorithm for nonnegative definite (and hence possibly of deficient rank) covariances, along with a Cholesky decomposition for NND matrices.

Chapter 9: Estimation of Mean and Covariance. Addresses the problems of estimation of the time-varying mean m_t = E{X_t} = m_{t+T} and covariance R_{t+τ,t} = E{[X_{t+τ} - m_{t+τ}][X_t - m_t]} = R_{t+T+τ,t+T}, and their Fourier coefficients

m_k = T^{-1} Σ_{t=0}^{T-1} m_t e^{-i2πkt/T}   and   B_k(τ) = T^{-1} Σ_{t=0}^{T-1} R_{t+τ,t} e^{-i2πkt/T}.

Here X_t is taken to be a real valued PC-T sequence. The corresponding estimators, which may be motivated by the lifting to the stationary vector sequence X_n, are given by the following: for m̂_{t,N} see (1.1). For m̂_{t,N} and m̂_{k,N}, we give conditions for mean square consistency in spectral terms, express the limits spectrally, and discuss the connection to the mean ergodic theorem. We show how to use the random time shift to give almost sure consistency in terms of B_0 and F_0 from known stationary results. The random time shift is used again to obtain asymptotic normality via mixing for linear PC sequences. The practical estimation programs permest.m and permcoeff.m are presented and demonstrated. To test for a proper periodic mean (the null is m(t) ≡ m), the former produces confidence intervals based on Student's t and an ANOVA test; the latter uses a variance contrast method applied to the periodogram to produce p-values for m_k = 0. For R̂_N(t+τ, t) we give conditions for consistency in probability for a linear PC sequence using the lifted correlations and the fact that a linear PC sequence, when lifted, is a linear T-variate stationary sequence, so known results may be applied. For X_t with bounded fourth moments, various conditions on the second moments of Z_{τ,t} = [X_{t+τ} - m_{t+τ}][X_t - m_t] - R(t+τ, t) ensure mean square consistency; and other conditions give almost sure consistency. Asymptotic normality is obtained using either of two approaches: (1) a condition on the covariance of Z_{τ,t+jT} along with φ-mixing, and (2) normality of X_t and a summability condition on the covariance, namely, Σ_{τ=-∞}^{∞} |R(t+τ, t)|² < ∞. Similar results are obtained for consistency of B̂_{k,NT}(τ).
The practical estimation programs persigest.m and Kcoeff.m are presented and demonstrated. Program peracf.m computes R̂_N(t+τ, t) and ρ̂_N(t+τ, t). Assuming normal X_t, confidence limits for the latter are computed (and plotted) by use of the Fisher transformation. Also computed are tests for (1) equality of correlations ρ(t+τ, t) = ρ(τ), where ρ(τ) is some unknown constant; and (2) ρ(t+τ, t) = 0 for some specific τ (meaning for all t in a period). Program Bcoeff.m computes (and plots) B̂_{k,NT}(τ) for k = 0, 1, …, [(T-1)/2] (real X_t) via the sample Fourier transform applied to Ẑ_{τ,t} = [X_{t+τ} - m̂_{t+τ,N}][X_t - m̂_{t,N}]. This permits the computing of p-values for the test B_{k,NT}(τ) = 0, based on the variance contrast method of Section 9.2.2. Also, program persigest.m computes σ̂²_N(t) along with confidence intervals based on normality (χ² distribution with N-1 degrees of freedom) and the Bartlett test for heterogeneous variances.
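To illustrate the flavor of the phase-averaged estimators (this is a Python sketch of ours, not the permest.m implementation; all function names and the simulated model are our own choices), one can estimate the periodic mean by averaging over all observations sharing the same phase t mod T, and then take its finite Fourier transform:

```python
import numpy as np

def periodic_mean(x, T):
    # \hat m_{t,N}: average of x over all samples with time index == t (mod T)
    x = np.asarray(x, dtype=float)
    return np.array([x[t::T].mean() for t in range(T)])

def mean_fourier_coeffs(mhat):
    # \hat m_k = T^{-1} sum_{t=0}^{T-1} \hat m_t exp(-i 2 pi k t / T)
    T = len(mhat)
    t = np.arange(T)
    return np.array([np.mean(mhat * np.exp(-2j * np.pi * k * t / T))
                     for k in range(T)])

rng = np.random.default_rng(0)
T, ncycles = 8, 2000
t = np.arange(T * ncycles)
# PC-8 sequence: periodic mean cos(2 pi t / T) plus white noise
x = np.cos(2 * np.pi * t / T) + rng.standard_normal(t.size)

mhat = periodic_mean(x, T)
mk = mean_fourier_coeffs(mhat)
# the estimate tracks the true periodic mean, and a pure cosine at the cycle
# frequency puts its Fourier mass at k = 1 (and its conjugate k = T-1)
print(np.round(mhat, 2), round(abs(mk[1]), 2))
```

With 2000 cycles the phase averages have standard error about 1/√2000 ≈ 0.02, so the periodic structure of the mean is clearly resolved.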
Chapter 10: Spectral Estimation. Addresses the problems of estimation of the possibly complex density functions f_k(λ) when the F_k(·) in (1.17) are absolutely continuous with respect to Lebesgue measure. The principal idea for the estimation of f_k(λ) is based on smoothing the two-dimensional periodogram

f(N, λ_1, λ_2) = (2πN)^{-1} X̃_N(λ_1) \overline{X̃_N(λ_2)}   (1.21)

along lines of support of F in [0, 2π) × [0, 2π), where

X̃_N(λ) = Σ_{t=0}^{N-1} [X_t - m_t] e^{-iλt}

is the sample Fourier transform of [X_t - m_t], t = 0, 1, …, N-1. Note the usual estimators for the spectral density in the stationary case are formed by smoothing f(N, λ_1, λ_2) along the main diagonal λ_1 = λ_2. We begin by showing that f̂_{k,N}(λ) = f(N, λ, λ - 2πk/T) is the Fourier transform of B̂_{k,N}(τ), and if Σ_{τ=-∞}^{∞} Σ_{t=0}^{T-1} |R(t+τ, t)| < ∞, then f̂_{k,N}(λ) is an asymptotically unbiased estimator for f_k(λ). By assuming X_t is Gaussian, we obtain lim_{N→∞} Var[f̂_{k,N}(λ)] = f_0(λ) f_0(λ - 2πk/T) if λ ≠ πm/T, and the limit is f_0(λ) f_0(λ - 2πk/T) + |f_{m-k}(πm/T)|² if λ = πm/T, thus showing, as in the stationary case, that the estimator is not consistent. However, as in the stationary case, consistency can be achieved by smoothing f̂_{k,N}(λ) by the Fourier transform W(λ) of a summable weight sequence w(j),

f̄_{k,N}(λ) = β_N^{-1} ∫_0^{2π} W((α - λ)/β_N) f̂_{k,N}(α) dα,

where β_N is a positive sequence with β_N → 0 and Nβ_N → ∞ as N → ∞. We give conditions under which estimators formed in this manner are consistent and asymptotically normal. If X_t is a Gaussian PC-T sequence for which

Σ_{τ=-∞}^{∞} [Σ_{t=0}^{T-1} |R(t+τ, t)|²]^{1/2} < ∞,
then there exists a K > 0 for which Nβ_N Cov[f̄_j(λ_1), f̄_k(λ_2)] ≤ K for any j, k ∈ {0, 1, …, T-1} and λ_1, λ_2 ∈ [0, 2π). If X_t is periodically stationary with fourth moments and uniformly φ-mixing with Σ_{j=-∞}^{∞} φ_j^{1/2} < ∞, then the smoothed estimators are again consistent, provided β_N → 0 and Nβ_N → ∞ as N → ∞. In addition, we discuss the empirical spectral analysis of harmonizable sequences, which leads naturally to the notion of spectral coherence, which may be defined theoretically as
γ(λ_1, λ_2) = lim_{N→∞} Corr[X̃_N(λ_1), X̃_N(λ_2)],

and whose squared magnitude may be estimated by

|γ̂(λ_p, λ_q; M)|² = |Σ_{m=0}^{M-1} X̃_N(λ_{p+m}) \overline{X̃_N(λ_{q+m})}|² / (Σ_{m=0}^{M-1} |X̃_N(λ_{p+m})|² · Σ_{m=0}^{M-1} |X̃_N(λ_{q+m})|²),

where X̃_N is the sample Fourier transform of X_t of length N. For PC-T sequences, setting λ_1 = λ and λ_2 = λ_1 - 2πk/T, we obtain an estimate of the coherence along the kth support line of F.
Spectral coherence is useful because (1) it may be estimated in a simple manner, (2) its distribution is known under a null hypothesis condition, and (3) in the case of PC sequences, it gives a way to judge the largeness of f̂_{k,N}(λ). If a harmonizable sequence has a jump (or atom) in its random spectral measure at λ_a and at λ_b, then the theoretical spectral coherence at (λ_a, λ_b) will be unity. Hence it becomes important to sense the presence of discrete spectral components in a time series and to remove them. Such methods are also discussed in this chapter. Finally, we present programs fkest.m and scoh.m that implement the estimator f̂_{k,N}(λ) and the empirical spectral coherence |γ(λ_p, λ_q; M)|² given above.

Chapter 11: A Paradigm for Nonparametric Analysis of PC Time Series. Suppose one is given a sample of a time series and asked the question: Does this series exhibit the PC property or not? If so, what can we say about it? This chapter summarizes and organizes the methods discussed in previous chapters into a procedural outline, or paradigm, for answering these questions only within the scope of nonparametric time series analysis. That is, we consider only the tools of mean, correlation, and spectral measurements. Obviously, our ability to answer these questions, especially regarding characterization, will substantially improve with the inclusion of PARMA time series analysis, a topic to be addressed in future writings.
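As a rough illustration of the empirical spectral coherence described in the Chapter 10 summary above (a Python sketch under our own indexing conventions, not the scoh.m program), one may compare the statistic for an amplitude-modulated white noise, which is PC, against stationary white noise; for the PC series the coherence between frequency bins separated by 2π/T is noticeably large:

```python
import numpy as np

def coherence_sq(x, p, q, M):
    # sample magnitude-squared coherence between FFT bins p and q, smoothed
    # over M neighboring bin pairs (in the style of |gamma(p, q; M)|^2 above)
    X = np.fft.fft(np.asarray(x, dtype=float))
    a, b = X[p:p + M], X[q:q + M]
    num = np.abs(np.sum(a * np.conj(b))) ** 2
    den = np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2)
    return float(num / den)

rng = np.random.default_rng(1)
N, T, M = 1024, 8, 32
t = np.arange(N)
xi = rng.standard_normal(N)
pc = (1 + np.cos(2 * np.pi * t / T)) * xi   # amplitude-modulated noise, PC-8
stat = rng.standard_normal(N)               # stationary white noise

k = N // T      # FFT-bin separation corresponding to frequency shift 2*pi/T
p = 300         # an arbitrary bin well inside [0, N/2)
print(coherence_sq(pc, p, p - k, M), coherence_sq(stat, p, p - k, M))
```

By the Cauchy-Schwarz inequality the statistic always lies in [0, 1]; for the stationary series its expected value is near 1/M, while for the modulated series the shared spectral mass on the shifted diagonal drives it well above that level.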
1.2 HISTORICAL NOTES

The notion of PC processes seems to have begun with W. R. Bennett [12], who observed their presence in a communication theoretic context and called them cyclostationary. L. I. Gudzenko [84] initiated the subject of nonparametric spectral analysis for PC processes. V. A. Markelov [149] addressed some level crossing problems for Gaussian PC processes. A short time later E. G. Gladyshev [77] published the first analysis of spectral properties and representations based on the connection between PC sequences and stationary vector sequences. He gave necessary and sufficient conditions, in the spirit of A. Khintchine [129], for a doubly indexed sequence R(s,t) to be the correlation of a PC sequence, argued that all PC sequences are strongly harmonizable, and showed that their spectral support consists of a family of lines parallel to the main diagonal and having spacing of 2π/T. He also gave two representations for the processes and conditions for the processes to be purely nondeterministic. In 1963 Gladyshev [78] treated continuous time PC processes and introduced the almost periodically correlated processes. In a series of papers L. J. Herbst [90-96] explored sequences and processes whose variances may be periodic or almost periodic with respect to time; this work was done without the benefit of the PC structure. W. M. Brelsford [25] obtained, for PC sequences, a spectral-like representation of mixed summation and integral form. He also presented methods for estimation of the periodic coefficients in periodic autoregression models. In various investigations of asymptotic stationarity, J. Kampé de Fériet [126], J. Kampé de Fériet and F. N. Frenkiel [127], and Parzen [178, 180] mentioned processes that are PC in nature, but their work concentrated on the estimation of the asymptotic correlation and spectral density functions (i.e., on estimation of B_0(τ) and f_0(λ); see the preceding summary of Chapter 6 for this notation).
To give some of the early connections to applications, Markelov [149] states that the noise output of a parametric amplifier has the PC property if the noise input is stationary; along similar lines, Parzen [179] suggested that a Poisson process with time periodic parameter would be a model for electron emissions from the cathode of a temperature limited diode whose filament was heated by an alternating current. A. S. Monin [165] suggests using PC processes as
models for meteorological time series, and R. H. Jones and Brelsford [122] do this for a time series of temperatures. Several additional books touch on various aspects of PC processes. First, A. Papoulis [176] discusses various properties of PC processes in an early edition of his book on probability and stochastic processes; he calls them periodically stationary. L. E. Franks [58] discusses cyclostationary processes in a book on communication theory. For the continuous time case, H. L. Hurd [101] showed the nature of the spectral support (an extension of Gladyshev's theorem) for strongly harmonizable PC processes, identified the connection between PC processes and those that can be made stationary by an independent uniformly distributed time shift, and obtained consistency results for estimation of the coefficient functions B_k(τ) and the densities f_k(λ). H. Ogura [174] presented some of the spectral theory based on harmonizable processes. W. A. Gardner, in his dissertation [63], developed various representations of continuous time PC processes and used them in the solution of estimation problems. Much of this appears in the paper by Gardner and Franks [64]. After the initial work by Gladyshev, the topic was seriously examined in the former Soviet Union by Y. A. Dragan and his colleagues; much of their work seems to be summarized in three books [50-52], all in Russian. A. M. Yaglom [227] gives many references and a nice exposition of many of the basic relationships; we recommend this as a reference for those who wish to work in the topic. Much work on cyclostationary processes followed, mainly motivated by communications problems and led by Gardner, resulting in two books [66,67]. In these books Gardner principally takes a viewpoint much like Wiener's generalized harmonic analysis, where an observed sequence is considered to be a nonrandom sequence.
The usual notion of probability is replaced with a limit of occupation time above a threshold, or fraction of time [67,166]. The approach produced understanding and solutions to many problems and is therefore interesting and useful. A discussion of the two views (random and nonrandom) may be found in [69]. Some later efforts to clarify the Wold isomorphism [67,226] applied to random and nonrandom cyclostationary sequences are given in [116,117]. In 1992 Gardner initiated a large meeting, whose subject was Cyclostationarity in Communications and Signal Processing [73], which brought together engineers, statisticians, and mathematicians. Many new problems and collaborations came from this meeting. Subsequent work took several directions, in addition to continued work on communications problems. In statistics, work ensued on the structural theory of PC and almost PC processes [23,44,46,65,97,98,102-104,106,110-113,116-118,143,145,157,160], on spectral and covariance estimation [5-7,30,39-41,43,75,105,108,109,135,212,228], on testing [7,37,72,107], and on PARMA time series [10,137,140,175,205,212,213,217-219].
Interesting work on PC processes, a little under-represented here, exists in some other fields of study. As pointed out earlier by Monin [165] and Jones and Brelsford [122], there is a natural application in meteorology due to the obvious daily or annual forcing. Applications of both parametric and nonparametric methods are indicated and have been used rather extensively. We find the connection between PC processes and Bloch's theorem, pointed out by K. Kim, G. North, and J. Huang [130,131], to be of great interest. An extensive body of related work on periodic control [15-17] has a direct relation to PC sequences. For connected work in economics, see the book by P. H. Franses [60] and the references therein. A recent survey on cyclostationarity by Gardner, A. Napolitano, and L. Paura [74] contains a very complete bibliography. During this period of development of PC processes, the theory of harmonizable processes also matured. H. Cramér [36] attributes the word harmonizable to M. Loève [138], who also wrote about (strongly) harmonizable processes in [139]. Yu. A. Rozanov [200] made an important early contribution, before the more recent developments [1,31,38,82,99,100,144,155,169,189].

PROBLEMS AND SUPPLEMENTS
1.1 PC fields indexed on Z². A collection of second order random variables X_{s,t} indexed on Z² is called a (strongly) PC field with period (S,T) if its mean and covariance functions satisfy

m(s, t) = m(s+kS, t+lT),   (1.22)

R(s, t, s', t') = R(s+kS, t+lT, s'+kS, t'+lT)   (1.23)

for all integers s, t, s', t' and k, l in Z. The second order random field X_{s,t} is called weakly PC with period (S,T) if

R(s, t, s', t') = R(s+S, t+T, s'+S, t'+T)   (1.24)

for every s, t, s', t'. Here we require S ≥ 0 and T ≥ 0 but do not permit S = T = 0 because this would put no constraint on the covariance structure of the field. A weakly PC random field is essentially a countable collection of PC sequences arranged along parallel lines of slope T/S in Z². If X is strongly PC, then it is also weakly PC.

1.2 Multivariate PC sequences. The multivariate sequence X_t = [X_t^1, X_t^2, …, X_t^N]' is PC with period T if

m^j(t) = E{X_t^j} = m^j(t+T)   (1.25)

and

R^{jk}(s, t) = E{[X_s^j - m^j(s)] \overline{[X_t^k - m^k(t)]}} = R^{jk}(s+T, t+T)   (1.26)
for every j, k = 1, 2, …, N and s, t ∈ Z.

1.3 Almost PC sequences. A complex valued nonrandom sequence f_t is called almost periodic in the Bohr sense if for every ε > 0 the set

E(ε) = {τ ∈ Z : sup_t |f_{t+τ} - f_t| ≤ ε}   (1.27)

has bounded gaps, meaning there is a real number Λ for which every interval of length Λ intersects E(ε). An L² sequence is almost PC if for every τ the sequence R(t+τ, t) is Bohr AP with respect to t.

1.4 Continuous time processes and fields. For index set R, the defining equations (1.6) and (1.7), as well as (1.22) and (1.23), read exactly the same. Continuous time APC processes are L² processes for which R(t+τ, t) is Bohr AP with respect to t for each τ. A function f : R → C is Bohr almost periodic if it is continuous and for every ε > 0 the set E(ε) defined by (1.27) has bounded gaps.
CHAPTER 2
EXAMPLES, MODELS, AND SIMULATIONS
With the objective of building intuition, we will now consider some mathematical models that produce PC sequences and present some simulated sample paths based on these models. As expected, if the PC structure is strong enough, it can be perceived by viewing the time series. The simulated sample paths are also used to demonstrate that the classical periodogram¹ is unable to determine the presence of the PC structure in a time series. Its modification for PC sequences is given in Chapter 10.

¹Although there are several useful methods, the periodogram is probably the most widely used method for determining the presence of periodicities in a time series. A review of its use in this context will be given in Chapter 10.
Periodically Correlated Random Sequences: Spectral Theory and Practice. By H. L. Hurd and A. G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
2.1 EXAMPLES AND MODELS

Although the more interesting and most general examples of PC processes come from combining periodicity with stationary random processes, we begin with random periodic sequences.

2.1.1 Random Periodic Sequences

EXAMPLE 2.1
If X_t ∈ L²(Ω, F, P) is a random periodic sequence for which

X_t = X_{t+T}, for every t ∈ Z,   (2.1)

then X_t is PC with period T. To see this, recall that (2.1) means that the random variables X_t and X_{t+T} are the same modulo L²; that is, ‖X_t - X_{t+T}‖ = 0 for every t. Then it becomes clear that for every s, t

m(t) = E{X_t} = E{X_{t+T}} = m(t+T)   (2.2)

and

R(s, t) = E{X_s X̄_t} = E{X_{s+T} X̄_{t+T}} = R(s+T, t+T),   (2.3)

so X_t is PC-T. A special case of a periodic sequence is given by

X_t = X · f_t,   (2.4)

where X is a second order random variable and f_t is a scalar periodic sequence, f_t = f_{t+T}. This special case gives E{X_t} = f_t E{X} and Var[X_t] = Var[X] f_t². By taking X to be a constant random variable, say X = 1, we see that a scalar periodic sequence is PC. Note that a periodic sequence has a periodicity in the covariance that is stronger than the condition (1.7) given above. That is,

Proposition 2.1 A sequence X_t is periodic if and only if m(t) is periodic and R(s,t) is doubly periodic; in symbols,

m(t) = m(t+T),
R(s, t) = R(s+mT, t+nT)   (2.5)

for every s, t, m, n ∈ Z.

Periodic sequences are also present, along with unitary operators, in general representations of PC sequences. This is the subject of Chapter 7.
2.1.2 Sums of Periodic and Stationary Sequences

EXAMPLE 2.2

Suppose X_t and Y_t are uncorrelated random sequences. If X_t is T-periodic and Y_t is wide sense stationary with mean m_Y and covariance R_Y(u), then

Z_t = X_t + Y_t

is PC with period T. By (2.3) it is easy to see

m_Z(t) = m_X(t) + m_Y = m_X(t+T) + m_Y = m_Z(t+T)

and

R_Z(s, t) = R_X(s, t) + R_Y(s-t) = R_X(s+T, t+T) + R_Y(s+T-t-T) = R_Z(s+T, t+T).

So the conditions (1.6) and (1.7) are satisfied for Z_t. Here, and in some of the following examples, we write the time arguments using parentheses to help legibility. Sums of periodic and stationary sequences are among the simplest of PC sequences, although it is important to understand them. Perhaps a little more interesting are the cases where additive periodic components are not present. Such are the following cases.
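This computation can be mirrored numerically. The sketch below (a Python illustration of ours, with an arbitrary choice of R_X and R_Y) checks that R_Z(s,t) = R_X(s,t) + R_Y(s-t) is invariant under the shift (s,t) → (s+T, t+T):

```python
import numpy as np

T = 6

def R_X(s, t):
    # doubly periodic covariance of a T-periodic random sequence (Prop. 2.1),
    # e.g. X_t = A f_t with E{A^2} = 1 gives R_X(s, t) = f(s) f(t)
    f = lambda u: np.cos(2 * np.pi * u / T)
    return f(s) * f(t)

def R_Y(u):
    # a stationary covariance
    return 0.8 ** abs(u)

def R_Z(s, t):
    # Z = X + Y with X and Y uncorrelated
    return R_X(s, t) + R_Y(s - t)

for s in range(-10, 10):
    for t in range(-10, 10):
        assert abs(R_Z(s, t) - R_Z(s + T, t + T)) < 1e-12
print("Z is PC with period", T)
```

Note that R_Z is not a function of s - t alone (Z is not stationary); only the joint shift by T leaves it unchanged.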
2.1.3 Products of Scalar Periodic and Stationary Sequences

EXAMPLE 2.3

If X_t is wide sense stationary with E{X_t} = 0 and f_t is a scalar periodic sequence, f_t = f_{t+T}, then

Y_t = f_t · X_t

is PC with period T. The required computations yield

m_Y(t) = f_t E{X_t} = 0, for every t ∈ Z,   (2.6)

and

R_Y(s, t) = f_s f̄_t R_X(s-t) = f_{s+T} f̄_{t+T} R_X(s+T-t-T) = R_Y(s+T, t+T).   (2.7)

In engineering, it is sometimes said that Y_t is produced by amplitude modulation of a stationary process by a scalar periodic function. Let us observe that the variance of Y_t is properly² periodic with period T whenever |f_t| is properly periodic, since

R_Y(t, t) = |f_t|² R_X(0) = |f_{t+T}|² R_X(0) = R_Y(t+T, t+T).   (2.8)

The sample paths can be just as complicated as those of stationary sequences, but the amplitude is periodically scaled. Of course, information about X_t is lost if f_t = 0 for some t = 0, 1, 2, …, T-1.

2.1.4 Time Scale Modulation of Stationary Sequences
EXAMPLE 2.4

If X_t is wide sense stationary with E{X_t} ≡ 0 and f_t is a scalar periodic sequence, f_t = f_{t+T}, taking values in Z, then

Y_t = X_{t+f_t}

is PC with period T. For every s, t in Z,

m_Y(t) = E{X_{t+f_t}} = 0 = m_Y(t+T)   (2.9)

and

R_Y(s, t) = R_X(s + f_s - t - f_t) = R_X(s+T+f_{s+T} - t-T-f_{t+T}) = R_Y(s+T, t+T),   (2.10)

thus showing Y_t is PC-T. In engineering, time scale modulation is related to phase or frequency modulation. In contrast to the amplitude modulation case, the variance of Y_t is never properly periodic; it is always constant:

R_Y(t, t) = R_X(t + f_t - t - f_t) = R_X(0).   (2.11)

Hence there exist simply constructed PC processes whose variance function is constant in time. The next example seems to be of both amplitude scale and time scale type.

²A sequence g_t will be called properly periodic with period T if g_t = g_{t+T} for every t but g is not constant.
EXAMPLE 2.5

If X_t is stationary with E{X_t} = 0 and f_t = f_{t+T} is a scalar periodic sequence defined by

f_t = -1 if 0 ≤ t < [T/2],  f_t = 1 if [T/2] ≤ t < T,   (2.12)

where [x] denotes the greatest integer not exceeding x, then

Y_t = f_t · X_t

is PC with period T (from the amplitude modulation example). Although Y_t is formed by multiplication of a stationary sequence by a periodic function, we still get R_Y(t, t) = |f_t|² R_X(0) = R_X(0), so in this way Y_t acts like time scale modulation.
2.1.5 Pulse Amplitude Modulation

This example is closely related to the amplitude modulation example.

EXAMPLE 2.6

If X_n is a zero mean stationary sequence and f_r is a nonrandom real valued function defined on {0, 1, …, T-1}, then for t = nT + r, 0 ≤ r < T, the sequence

Y_t = X_n f_r   (2.13)

is PC with period T. To see this, for arbitrary integers s, t write s = mT + q and t = nT + r, where 0 ≤ q, r < T. Then

E{Y_s Y_t} = E{X_m X_n} f_q f_r = r_X(m-n) f_q f_r = E{Y_{s+T} Y_{t+T}}.   (2.14)

Bennett [12] studied sequences with this form, where the sequence {f_r : 0 ≤ r < T} is called a pulse and the random sequence {X_n : n ∈ Z} contains information that is carried by the amplitude of the pulse over consecutive intervals of length T; hence these sequences are called pulse amplitude modulation. Bennett's work, which was in the data communications context, began the study of cyclostationary signals. From (2.14) it follows easily that

R_Y(t, t) = f_r² r_X(0) for t = nT + r,

so if f_r² is not constant these sequences will have a properly periodic variance.
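A direct check of (2.14) is straightforward. The following sketch (pulse values and amplitude covariance chosen arbitrarily, for illustration only) evaluates E{Y_s Y_t} = r_X(m-n) f_q f_r and confirms both the PC-T property and the properly periodic variance:

```python
import numpy as np

T = 5
f = np.array([1.0, 0.5, -0.3, 0.0, 2.0])   # the pulse, defined on {0,...,T-1}
r_X = lambda u: 0.7 ** abs(u)               # stationary covariance of the amplitudes

def R_Y(s, t):
    # E{Y_s Y_t} = r_X(m - n) f_q f_r with s = mT + q, t = nT + r  (eq. 2.14);
    # divmod implements floor division, so negative times are handled too
    m, q = divmod(s, T)
    n, r = divmod(t, T)
    return r_X(m - n) * f[q] * f[r]

# PC-T: shifting both arguments by T leaves the covariance unchanged
for s in range(-15, 15):
    for t in range(-15, 15):
        assert abs(R_Y(s, t) - R_Y(s + T, t + T)) < 1e-12

# properly periodic variance when f_r^2 is not constant
var = [R_Y(t, t) for t in range(T)]
print("variances over one period:", var)
```

The variance list traces f_r² r_X(0), which is visibly non-constant here, exactly as the closing remark of the example predicts.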
2.1.6 A More General Example

This example is a generalization of the simple amplitude modulation discussed earlier and requires the following definition.

Definition 2.1 The second order sequences {X_t^j : j = 1, 2, …, m, t ∈ Z} are called jointly stationary if m_j(t) = E{X_t^j} = m_j(0) is constant with respect to t and R_{jk}(s, t) = E{X_s^j X̄_t^k} depends only on s - t for every j, k = 1, 2, …, m and s, t in Z.

Jointly stationary sequences are discussed further in Chapter 4, where they are called a multivariate stationary sequence.

EXAMPLE 2.7
Suppose the sequences in the collection {X_t^j : j = 1, 2, …, N, t ∈ Z} are jointly stationary and the scalar functions f_t^j, j = 1, 2, …, N, are all periodic with period T. Then the sequence

Y_t = Σ_{j=1}^{N} f_t^j X_t^j   (2.15)

is PC with period T. To see this, let m_j = E{X_t^j}, so that

m_Y(t) = Σ_{j=1}^{N} f_t^j m_j = m_Y(t+T)

and

R_Y(s, t) = Σ_{j=1}^{N} Σ_{k=1}^{N} f_s^j f̄_t^k R_{jk}(s-t) = Σ_{j=1}^{N} Σ_{k=1}^{N} f_{s+T}^j f̄_{t+T}^k R_{jk}(s+T-t-T) = R_Y(s+T, t+T).   (2.16)

In Chapter 7 it will be shown that every PC sequence can be given in this form with N ≤ T.
2.1.7 Periodic Autoregressive Models

A very important class of PC sequences are those given by the periodic parametric models. Here we will begin to examine the periodic autoregressive (PAR) models of order 1, and in the next subsection we will introduce periodic moving average (PMA) processes. In this book we will only treat simple periodic autoregressive models and periodic moving average models, leaving the treatment of periodic autoregressive moving average (PARMA) models for future work.

Definition 2.2 A zero mean second order sequence X_t is called a periodic autoregression of order 1 (PAR(1)) if

X_t = φ(t) X_{t-1} + σ(t) ξ_t,   (2.17)

where φ(t) = φ(t+T) and σ(t) = σ(t+T) are real and {ξ_t : t ∈ Z} is an orthonormal sequence.

Defining {σ(t)ξ_t} as the shock sequence, here we set σ(t) ≡ σ and denote the resulting model as the PAR(1) with constant variance shocks, or PAR(1)-CVS; in the computations below we take σ = 1. Assuming X_t is causal with respect to the shocks in the sense that E{X_t ξ_s} = 0 whenever s > t, we can obtain the main result of this section:

EXAMPLE 2.8

A PAR(1)-CVS sequence X_t is PC-T if and only if |A| < 1, where

A = Π_{t=0}^{T-1} φ(t).   (2.18)

The proof is contained in the following discussion. Let us first recall that when φ(t) is constant, φ(t) ≡ φ, the sequence X_t is a homogeneous autoregression of order 1 (AR(1)) and (2.17) becomes X_t = φ X_{t-1} + σ ξ_t. Then the input shocks are always of constant variance and the sequence X_t is stationary with autocorrelation r_X(τ) = φ^{|τ|} if and only if |φ| < 1.
Taking s > t to be specific, let us now calculate E{X_s X_t} for a PAR(1)-CVS sequence. By recursion of (2.17),

E{X_s X_t} = E{[φ(s) X_{s-1} + ξ_s] X_t} = E{φ(s) X_{s-1} X_t} + E{ξ_s X_t} = E{φ(s) X_{s-1} X_t} + 0 for s > t = φ(s) φ(s-1) ⋯ φ(t+1) E{X_t X_t},   (2.19)

and we see from the periodicity of φ(t) that we will obtain R_X(s, t) = R_X(s+T, t+T) provided

R_X(t, t) = E{X_t X_t} = E{X_{t+T} X_{t+T}} < ∞ for every t.

So X_t will be PC-T provided the variance function R_X(t, t) is periodic. Periodicity and boundedness of the variance are always necessary for a PC sequence because R(t, t) = R(t+T, t+T) < ∞ for all t. The following lemma completes our current discussion of the condition |A| < 1. In Chapter 8 we explore more thoroughly the connection between causality, boundedness of ‖X_t‖, and |A| < 1.

Lemma 2.1 A necessary and sufficient condition for the variance σ_X²(t) of a PAR(1)-CVS to be periodic, σ_X²(t) = σ_X²(t+T), is that |A| < 1, where A is given by (2.18).
Proof. First, for arbitrary t and p = 1, 2, …, let us denote

A_p(t) = φ(t) φ(t-1) ⋯ φ(t-p+1), with A_0(t) ≡ 1,   (2.20)

so that, in particular, A_{T-1}(t) = φ(t) φ(t-1) ⋯ φ(t-T+2), and A_T(t) is the product of φ(t) over exactly one cycle, so that

A_T(t) = φ(t) φ(t-1) ⋯ φ(t-T+1) = A.   (2.21)

Using (2.17) we obtain

σ_X²(t) = A_1²(t) σ_X²(t-1) + 1,   (2.22)

which may be continued recursively to obtain

σ_X²(t) = Σ_{j=0}^{∞} A^{2j} [A_0²(t) + A_1²(t) + ⋯ + A_{T-1}²(t)],   (2.23)

and this converges if and only if |A| < 1. Hence, if |A| < 1, σ_X²(t) is bounded. And again from the (convergent) representation (2.23), σ_X²(t) = σ_X²(t+T) for all t because A_p(t) = A_p(t+T) for all t and p = 0, 1, …, T-1. Conversely, if σ_X²(t) = ‖X_t‖² is bounded, then since for every n

σ_X²(t) = A^{2n} σ_X²(t-nT) + Σ_{j=0}^{n-1} A^{2j} [A_0²(t) + ⋯ + A_{T-1}²(t)]

and A_0²(t) + ⋯ + A_{T-1}²(t) ≥ 1, we must conclude that Σ_{j=0}^{∞} A^{2j} < ∞, and hence |A| < 1. ∎
Given |A| < 1 we may solve for σ_X²(t) by doing the aforementioned recursion only T times,

σ_X²(t) = A² σ_X²(t-T) + 1 + A_1²(t) + ⋯ + A_{T-1}²(t),   (2.24)

and so if σ_X²(t) = σ_X²(t-T), then easily

σ_X²(t) = (1 + A_1²(t) + ⋯ + A_{T-1}²(t)) / (1 - A²).   (2.25)
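The closed form (2.25) can be confirmed by simply iterating the recursion (2.22). In the sketch below (an arbitrary periodic φ(t) of our own choosing, with |A| < 1 and unit variance shocks), the iterated variance settles on the periodic limit given by (2.25):

```python
import numpy as np

T = 4
phi = np.array([1.6, 0.5, 0.9, 0.4])   # periodic AR coefficients phi(t)
A = np.prod(phi)                        # A = phi(0) phi(1) ... phi(T-1)
assert abs(A) < 1                       # here A = 0.288

# iterate sigma^2(t) = phi(t)^2 sigma^2(t-1) + 1 until it reaches its limit
sig2, hist = 0.0, []
for t in range(4000):
    sig2 = phi[t % T] ** 2 * sig2 + 1.0
    hist.append(sig2)

def closed_form(t):
    # (2.25): sigma^2(t) = (A_0^2(t) + ... + A_{T-1}^2(t)) / (1 - A^2),
    # where A_p(t) = phi(t) phi(t-1) ... phi(t-p+1), A_0(t) = 1
    Ap = [np.prod([phi[(t - i) % T] for i in range(p)]) for p in range(T)]
    return sum(a * a for a in Ap) / (1 - A ** 2)

for t in range(3990, 4000):
    assert abs(hist[t] - closed_form(t % T)) < 1e-8
print("periodic limit variances:", [round(closed_form(t), 4) for t in range(T)])
```

Note that individual coefficients may exceed 1 in magnitude (here φ(0) = 1.6) as long as the product over one cycle satisfies |A| < 1, which is exactly the point of Example 2.8.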
2.1.8 Periodic Moving Average Models

A second order random sequence X_t is a periodic moving average of order q (PMA(q)) with period T if it satisfies

X_t = Σ_{j=0}^{q} θ_j(t) ξ_{t-j},   (2.26)

where θ_j(t) = θ_j(t+T) for every j, t and where the ξ_t are taken to be of zero mean and orthonormal. Then X_t has mean zero for all t and its covariance is easily computed to be

R(s, t) = Σ_{j=0}^{q} Σ_{k=0}^{q} θ_j(s) θ̄_k(t) E{ξ_{s-j} ξ̄_{t-k}} = Σ_{j ∈ I_{s,t}} θ_j(s) θ̄_{j-(s-t)}(t),   (2.27)

where I_{s,t} = {j : 0 ≤ j ≤ q and j = s - t + k for some 0 ≤ k ≤ q}. Since I_{s,t} = I_{s+T,t+T}, we conclude that X_t is PC-T and its variance is

R(t, t) = Σ_{j=0}^{q} |θ_j(t)|².   (2.28)

We may also conclude from (2.27) that R(s, t) = 0 if |s - t| > q. Usually the θ_j(t) are taken to be real.
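The band structure and periodicity of (2.27) are easy to verify. The sketch below (a Python illustration of ours) evaluates the sum formula for an arbitrary choice of periodic coefficients θ_j(t):

```python
import numpy as np

rng = np.random.default_rng(2)
T, q = 3, 2
theta = rng.standard_normal((q + 1, T))   # theta[j, t mod T] = theta_j(t)

def R(s, t):
    # R(s, t) = sum over shocks shared by X_s and X_t: pairs (j, k) with
    # s - j = t - k, i.e. k = j - (s - t)  (cf. eq. 2.27)
    total = 0.0
    for j in range(q + 1):
        k = j - (s - t)
        if 0 <= k <= q:
            total += theta[j, s % T] * theta[k, t % T]
    return total

for s in range(-9, 9):
    for t in range(-9, 9):
        # PC with period T
        assert abs(R(s, t) - R(s + T, t + T)) < 1e-12
        # banded covariance: R(s, t) = 0 when |s - t| > q
        if abs(s - t) > q:
            assert R(s, t) == 0.0
print("PMA(q) covariance is PC-T and vanishes off the band |s - t| <= q")
```

Setting s = t in the sum recovers the variance formula (2.28), since then k = j and each term is θ_j²(t).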
2.1.9 Periodically Perturbed Dynamical Systems

The PAR and PMA models we have just briefly discussed are simple examples of periodically time varying systems that produce PC sequences when driven by random shocks. This leads quite naturally to the question: Do nonlinear dynamical systems, when periodically perturbed, give rise to orbits that are theoretically or empirically consistent with periodic correlation or cyclostationarity? An obvious physical example that motivates this question is the perturbation of meteorological processes by the daily variation in solar radiation. Below we will see that periodically perturbing a simple family of maps (the logistic maps) yields orbits that exhibit periodic correlation. To make the notion of a periodically perturbed map more precise, suppose we have a family g_a : R → R of maps and a finite collection of parameters {a_0, a_1, …, a_{T-1}}. The orbit of x under the periodically perturbed map (or family of maps) is the sequence given by

x_1 = g_{a_0}(x),
x_2 = g_{a_1} ∘ g_{a_0}(x),
…,
x_n = g_{a_r} ∘ ⋯ ∘ g_{a_1} ∘ g_{a_0}(x),

where r = (n-1) mod T. Simulated orbits for the logistic family g_a(x) = a x(1-x) are presented in the next section.
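A minimal implementation of such a perturbed orbit (with parameter values of our own choice, not taken from the simulations of the next section) is:

```python
import numpy as np

def perturbed_orbit(a, x0, n):
    # orbit of x0 under the periodically perturbed logistic family
    # g_a(x) = a x (1 - x), cycling through the parameters a[0], a[1], ...
    T = len(a)
    x, orbit = x0, []
    for k in range(n):
        x = a[k % T] * x * (1.0 - x)
        orbit.append(x)
    return np.array(orbit)

# example: period-2 perturbation alternating a chaotic and a tamer parameter
orbit = perturbed_orbit([3.9, 3.5], x0=0.3, n=20000)

# the sample variance computed separately over the two phases differs,
# a crude empirical hint of period-2 correlation structure in the orbit
print("phase variances:", orbit[0::2].var(), orbit[1::2].var())
```

Since 0 < a ≤ 4 and x_0 ∈ (0,1), the orbit remains in (0,1); formal nonparametric tests of the PC property for such orbits are exactly the subject of the later chapters.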
2.2 SIMULATIONS

The main goal of this section is to provide some simulated sample paths of real valued series generated from the models described in the previous section. We first present plots of the simulated series to illustrate what can be visually perceived about the presence of PC structure in a time series. Then, for some of the series, we show the periodograms in order to illustrate that they are ineffective in revealing the presence of PC structure unless there are additive periodic components.
2.2.1 Sums of Periodic and Stationary Sequences

Figure 2.1 presents some time series formed from the sum of a very simple periodic sequence and a very simple stationary sequence,

Yt = A cos(πt/16) + ξt,

where ξt is white noise with E{ξt} = 0 and σξ = 1. In Figure 2.1(a), where A = 0.5/512, the additive periodic sequence is not clearly perceived from the raw time series. However, it is easily perceived in Figure 2.1(b), where A = 2.5/512.
Figure 2.1 Time series Yt = A cos(πt/16) + ξt, σξ = 1; horizontal axis is the time index. (a) A = 0.5/512. (b) A = 2.5/512.
EXAMPLES, MODELS, AND SIMULATIONS

Figure 2.2 shows that even the weak periodic sequence can be perceived by comparing the squares of the sample Fourier transform at frequency index j = 32 to those near it, a procedure we will formalize later. In this figure, the Fourier transform was taken of a sample of 512 points, so that the period 16 gives j = 512/16 = 32 periods.³ The squares of the sample Fourier transform are called the periodogram. More on the periodogram will be given in Chapters 9 and 10.
Figure 2.2 Periodograms of Yt = A cos(πt/16) + ξt, σξ = 1, based on one FFT of length 512; horizontal axis is the frequency index j. (a) A = 0.5/512. (b) A = 2.5/512.
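The comparison just described is easy to reproduce numerically. The plain-Python sketch below is our own illustration, not the book's code; the amplitude, seed, and helper name are our choices. Note that the printed formula cos(πt/16) would have period 32 and put the peak at index j = 16, while the text's arithmetic (j = 512/16 = 32) treats the period as 16 samples; the sketch follows the index arithmetic.

```python
import cmath
import math
import random

def periodogram_ordinate(y, j):
    """Squared modulus of the sample Fourier transform of y at frequency index j."""
    n = len(y)
    s = sum(y[t] * cmath.exp(-2j * math.pi * j * t / n) for t in range(n))
    return abs(s) ** 2

random.seed(0)
N, A, period = 512, 1.0, 16          # amplitude chosen large enough to detect
y = [A * math.cos(2 * math.pi * t / period) + random.gauss(0.0, 1.0)
     for t in range(N)]

j_star = N // period                 # = 32, where the cosine must appear
peak = periodogram_ordinate(y, j_star)
nearby = [periodogram_ordinate(y, j)
          for j in range(j_star - 4, j_star + 5) if j != j_star]
```

With these values the ordinate at j = 32 carries a deterministic contribution of about (AN/2)² = 65536 on top of the noise, while neighboring ordinates fluctuate around Nσξ² = 512, so the peak stands out by roughly two orders of magnitude.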
2.2.2 Products of Scalar Periodic and Stationary Sequences
Figure 2.3 presents some time series formed from a simple periodic amplitude modulation of white noise,

Yt = [1 + m cos(2πt/64)] ξt,    (2.30)

where ξt is mean zero white noise with unit variance, σξ² = 1. In Figure 2.3(a), where m = 0.25, the multiplicative periodic structure is not clearly perceived from the raw time series. However, it is easily perceived in Figure 2.3(b), where m = 1. But even in the case of strong modulation (m = 1), Figure 2.4 shows that the periodogram of a 512 point sample does not reveal the presence of the period 64 modulation. (It would be expected at j = 512/64 = 8.) However, Figure 2.5 shows that the period index j = 8 is clearly perceived in the periodogram of the squared series Yt². Consideration of the time series of squares immediately reveals the reason; the series of squares has a periodic mean, E{Yt²} = [1 + 2m cos(2πt/64) + m² cos²(2πt/64)] σξ².
³The peak is actually at j = 33 because the MATLAB computing environment begins indexing at 1 rather than 0.
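The mechanism is easy to check numerically. In this plain-Python sketch (seed and helper name are our choices, not the book's), the periodogram of the squared series of (2.30) with m = 1 is examined at index j = 512/64 = 8, where the period 64 mean of the squares must appear; the raw series, whose mean is identically zero, produces no such line.

```python
import cmath
import math
import random

def periodogram_ordinate(y, j):
    n = len(y)
    s = sum(y[t] * cmath.exp(-2j * math.pi * j * t / n) for t in range(n))
    return abs(s) ** 2

random.seed(1)
N, m, P = 512, 1.0, 64
xi = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [(1.0 + m * math.cos(2 * math.pi * t / P)) * xi[t] for t in range(N)]
y2 = [v * v for v in y]              # squared series has a periodic mean

j_star = N // P                      # = 8
raw = periodogram_ordinate(y, j_star)
peak = periodogram_ordinate(y2, j_star)
nearby = [periodogram_ordinate(y2, j) for j in (4, 5, 6, 7, 9, 10, 11, 12)]
```

The deterministic part of the peak comes from the component 2m cos(2πt/64) of E{Yt²}; the neighbor set excludes j = 16, where the cos² term contributes a second, smaller line.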
Figure 2.3 Time series Yt = [1 + m cos(2πt/64)] ξt. (a) m = 0.25 (weak modulation). (b) m = 1 (strong modulation).
Figure 2.4 Periodogram of Yt = [1 + m cos(2πt/64)] ξt, based on one FFT of length 512; horizontal axis is the frequency index j. (a) m = 0.25 (weak modulation). (b) m = 1 (strong modulation).
Can this simple modification, forming the periodogram of the squared series Yt², always permit us to perceive the presence of any arbitrary PC structure? The answer is no; its success depends on Yt² having a properly periodic mean, that is, E{Yt²} = RY(t, t) must be a proper periodic function. This proper periodicity typically occurs for amplitude modulation models but never occurs for time scale modulation models.
Figure 2.5 (a) Time series of squares Yt² for Yt = [1 + m cos(2πt/64)] ξt, with m = 1 (strong modulation). (b) Periodogram of Yt², based on one FFT of length 512, with strong peak at index j = 8.
2.2.3 Time Scale Modulation of Stationary Sequences

Figure 2.6 Time series. (a) Xt = 0.9Xt−1 + ξt (AR(1)). (b) Yt = X_{t+f_t} (time scale modulation).

Figure 2.6(a) shows the time series of a stationary first order autoregression Xt = 0.9Xt−1 + ξt with covariance RX(k) = 0.9^|k| σξ², and Figure 2.6(b) is the series Yt = X_{t+f_t}, where ft = ft+32 is a permutation of the integers 1, 2, ..., 32. Since from (2.11) we have RY(t, t) = RX(0), it is no surprise
Figure 2.7 Yt² for Yt = X_{t+f_t} (time scale modulation). (a) Time series. (b) Periodogram based on one FFT of length 512.
that we cannot perceive any periodicity in the mean of the squares Yt² shown in Figure 2.7(a), nor in the periodogram in Figure 2.7(b). As another example, Figure 2.8 shows that neither the time series of the squares of Yt = ft · Xt, nor the periodogram, shows any evident sign of periodicity when we take Xt to be a stationary first order autoregression Xt = 0.8Xt−1 + ξt with correlation RX(k) = 0.8^|k| and we take ft to be the ±1 sequence of (2.12). Although Yt is most certainly PC with period T = 32, E{Yt²} = |ft|² RX(0) = RX(0) is constant with respect to t (and thus not properly periodic). The last two examples show that in the case of time scale modulation we cannot expect to perceive the PC structure visually from the time series nor from the periodogram of the squares, even though both cases represent severe modulation of the original sequences. The contents of Chapters 6, 9, and 10 will give a better understanding of these examples, and there we will give methods that do permit the perception of their PC structure.
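A small experiment confirms the point. In the plain-Python sketch below (the particular permutation, seed, and sample sizes are our choices), the series Yt = X_{t+f_t} is generated from an AR(1) sequence and its sample variance is computed separately at each phase; the per-phase variances come out essentially flat, so the squares carry no visible period.

```python
import random

random.seed(2)
T = 32
f = list(range(1, T + 1))
random.shuffle(f)                     # a fixed periodic permutation of 1,...,32

# AR(1): X_t = 0.9 X_{t-1} + xi_t, with extra length so shifted indexing fits
n_periods = 500
n = n_periods * T + 2 * T
x = [0.0]
for _ in range(n):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
x = x[T:]                             # crude burn-in of one period

y = [x[t + f[t % T]] for t in range(n_periods * T)]   # time scale modulation

phase_var = []
for tau in range(T):
    vals = [y[k * T + tau] for k in range(n_periods)]
    mu = sum(vals) / len(vals)
    phase_var.append(sum((v - mu) ** 2 for v in vals) / len(vals))
```

Each phase_var entry estimates RY(t, t) = RX(0) = 1/(1 − 0.9²) ≈ 5.26, so nothing periodic survives in the variance even though Yt is PC with period 32.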
2.2.4 Pulse Amplitude Modulation
Figure 2.9(a) is a time series resulting from the pulse amplitude modulation Yt = Xn f(τ), t = nT + τ, 0 ≤ τ < T, of a triangular f(τ) of duration 32 by an uncorrelated noise sequence Xn having RX(0) = 1. The periodogram of Yt (based on a fast Fourier transform (FFT) of length 1024) shows no clear component at j = 1024/32 = 32. The large spectral "hole" at 64 periods (in the 1024 point FFT) is caused by the exact symmetry ft = f_{T−t+1}. The
Figure 2.8 Yt² for Yt = ft · Xt with ft = ft+32 and ft = ±1 according to Eq. (2.12). (a) Time series. (b) Periodogram based on one FFT of length 512. A period 32 component would be seen at frequency index 512/32 = 16 or harmonics thereof.
Figure 2.9 PAM sequence Yt = Xn f(τ), t = nT + τ, 0 ≤ τ < T, for triangular f(τ) and Xn uncorrelated. (a) Time series. (b) Periodogram based on one FFT of length 1024.
periodogram of Yt² shown in Figure 2.10(b) clearly shows the component at j = 32. The previous examples have illustrated that PC sequences can arise from stationary sequences whose amplitudes or time scales are given a periodic rhythm. This example, aside from its historical importance, shows that PC
Figure 2.10 Yt² for the PAM sequence. (a) Time series. (b) Periodogram based on one FFT of length 1024.
sequences can arise when random events (such as a pulse with a random amplitude) occur on a periodic schedule.
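The PAM construction and the detection step through the squares can be sketched as follows (plain Python; the explicit triangular pulse, the seed, the longer sample length used for statistical stability, and the direct periodogram helper are all our choices, not the book's):

```python
import cmath
import math
import random

def periodogram_ordinate(y, j):
    n = len(y)
    s = sum(y[t] * cmath.exp(-2j * math.pi * j * t / n) for t in range(n))
    return abs(s) ** 2

random.seed(3)
T, N = 32, 4096
# symmetric triangular pulse of duration T (so f[tau] == f[T - 1 - tau])
f = [1.0 - abs(2 * tau - (T - 1)) / (T - 1) for tau in range(T)]

X = [random.gauss(0.0, 1.0) for _ in range(N // T)]   # uncorrelated amplitudes
y = [X[t // T] * f[t % T] for t in range(N)]          # Y_t = X_n f(tau)
y2 = [v * v for v in y]

j_star = N // T                                       # fundamental of period T
peak = periodogram_ordinate(y2, j_star)
nearby = [periodogram_ordinate(y2, j)
          for j in range(j_star - 4, j_star + 5) if j != j_star]
```

The squared series has periodic mean E{Yt²} = f(τ)², so its periodogram peaks at the fundamental index N/T (and harmonics), while the raw series, having zero mean, shows no such line.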
2.2.5 Periodically Perturbed Logistic Maps
In the experimental results that follow, we shall take gα(x) to be the family of logistic maps

gα(x) = αx(1 − x)    (2.31)

for 2 ≤ α ≤ 4. These maps carry [0, 1] into [0, 1] and have been extensively studied (e.g., Devaney [45]). For α = 4, the logistic map is chaotic (has sensitive dependence on initial conditions) but converges to fixed points or periodic points for many other values of α; for example, g2.6 has a single attracting fixed point at x = 8/13. To go immediately to our objective, we chose T = 20 and set α0 = α1 = ··· = α9 = 4 to give chaotic behavior for the first half of the period and α10 = α11 = ··· = α19 = 2.6 to give stable behavior for the second half of the period. Figure 2.11 presents 200 consecutive values of xn and the estimated periodic mean m̂n : 0 ≤ n ≤ 19 based on N = 100 periods. We can easily perceive some sort of periodic character in the raw orbit and that the estimated periodic mean is significantly nonzero (based on normal
Figure 2.11 Periodically perturbed logistic map. (Top) Two hundred points from a 2000 point simulation. (Bottom) Estimated periodic mean m̂n : 0 ≤ n ≤ 19 with 95% confidence intervals based on N = 2000/20 = 100 periods.
distribution of errors in the estimated periodic mean). The orbit seems to behave rather chaotically (qualitatively) in the first half of each cycle, and then appears to converge rapidly in the second half. However, the confidence intervals of m̂n for 0 ≤ n ≤ 4 suggest that the orbit is rather stable in this interval from one period to the next, even though α = 4 in this region. The reason for the apparent stability is that x20k+20 (the first point in a new cycle) is always determined by g4 operating on the last point, x20k+19, in the previous cycle, which is always very close to the fixed point for g2.6 at x = 8/13. The sensitive dependence of g4 causes the confidence intervals of m̂n to continue to increase until n = 10, when the map switches to g2.6, whose stability causes the confidence intervals to decrease until the map switches again to g4 at the start of the next cycle. The demeaned orbit, x̃n = xn − m̂n, shown in the top trace of Figure 2.12, appears to be comprised of irregular pulses occurring on a periodic schedule, a qualitative clue for periodic correlation or cyclostationarity. The estimated periodic σ̂n in the bottom trace clearly shows that the sample variance is not constant through the period, as did the confidence intervals we discussed earlier. And, as we now might expect, the usual periodogram shown in the top trace of Figure 2.13 shows no evidence of periodicity at period T = 20 (which would produce 1000/20 = 50 periods in 1000 points and would present at index 51 on the periodogram). However, in the periodogram of the squares, the peak at index 51 is very significant (p-value < 10⁻⁶). The dashed lines are the α = 0.01 threshold for the test of variance contrast based on a half-neighborhood of size m = 8.

Figure 2.12 (Top) Two hundred points of the demeaned orbit x̃n. (Bottom) Estimated σ̂X(t) with 95% confidence intervals.

Figure 2.13 Periodograms based on one FFT of length 1000. (Top) x̃n. (Bottom) x̃n².
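The experiment is easy to reproduce. The plain-Python sketch below (the starting point and burn-in are our choices; the rest follows the construction described above) estimates the periodic mean and per-phase spread of the orbit, exhibiting both the stable funnel near the fixed point 8/13 and the chaotic widening:

```python
import math

T = 20
alpha = [4.0] * 10 + [2.6] * 10       # chaotic first half, stable second half

def orbit(x0, n):
    xs, x = [], x0
    for k in range(n):
        x = alpha[k % T] * x * (1.0 - x)   # x_{k+1} = g_alpha(x_k)
        xs.append(x)
    return xs

n_periods = 100
xs = orbit(0.3, (n_periods + 5) * T)[5 * T:]   # drop 5 periods as burn-in

mean, sd = [], []
for ph in range(T):
    vals = [xs[k * T + ph] for k in range(n_periods)]
    mu = sum(vals) / len(vals)
    mean.append(mu)
    sd.append(math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)))
```

Phase 9 (the end of the chaotic half) shows the largest spread across periods, while phase 19 (the end of the stable half) is pinned near the attracting fixed point x = 8/13 of g2.6, just as the confidence intervals in Figure 2.11 suggest.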
2.2.6 Periodic Autoregressive Models
Some simulated PAR(1) time series illustrate our ability to perceive evidence of periodic correlation, either visually in the series or by detecting periodicity in the sample periodic variance. Figures 2.14 and 2.15 present 256 samples of the simulated series, the theoretical (see (2.25)) and estimated values of σX(t), and periodograms of Xt and of Xt² for the model

Xt = φ(t) Xt−1 + ξt,    (2.32)

where φa = 0.95, φb = −0.95, and T = 16. This yields Δ = 0.4403, which is substantially larger than the largest value of |Δ| in the group with sinusoidally varying φ(t). Note that the values of σ̂X(t) and their confidence intervals strongly support the hypothesis of a constant variance, which is indeed the true situation. But still there is strong visual evidence of something periodic in the time series. Neither the usual periodogram nor the periodogram of the squares reveals any hint of periodicity with period T = 32. So here is a case in which the detection of periodic correlation must wait for the more general methods we shall discuss in later chapters.
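For the sinusoidally varying coefficient model of Figure 2.14, the periodic variance satisfies the recursion σ²X(t) = φ²(t) σ²X(t−1) + σ²ξ (cf. (2.25)). The plain-Python sketch below (seed, burn-in, and tolerances are our choices) iterates this recursion to its periodic cycle and checks it against per-phase sample variances:

```python
import math
import random

T = 32
phi = [0.6 + 0.4 * math.cos(2 * math.pi * t / T) for t in range(T)]

# periodic variance: sigma2[t] = phi[t]^2 sigma2[t-1] + 1, iterated to its cycle
sigma2 = [1.0] * T
for _ in range(200):
    for t in range(T):
        sigma2[t] = phi[t] ** 2 * sigma2[t - 1] + 1.0   # t-1 wraps via index -1

random.seed(4)
n_periods = 400
x, xs = 0.0, []
for k in range(T * (n_periods + 10)):                   # 10 periods of burn-in
    x = phi[k % T] * x + random.gauss(0.0, 1.0)
    xs.append(x)
xs = xs[10 * T:]

est = []
for t in range(T):
    vals = [xs[k * T + t] for k in range(n_periods)]
    mu = sum(vals) / len(vals)
    est.append(sum((v - mu) ** 2 for v in vals) / len(vals))
```

The variance is largest just after the stretch where φ(t) ≈ 1 and close to σ²ξ = 1 where φ(t) ≈ 0.2, and the per-phase sample variances track the recursion closely.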
Figure 2.14 Simulated Xt = φ(t)Xt−1 + ξt, where ξt is N(0, 1) white noise and φ(t) = φ0 + φ1 cos(2πt/T), T = 32, φ0 = 0.6, φ1 = 0.4. (a) (Top) A sample of 256 points from a 5120 point simulation. (Bottom) Theoretical and estimated σX(t) with 95% confidence intervals based on N = 5120/32 = 160 periods. (b) Average of 10 periodograms using FFTs of length 512 for Xt (Top) and Xt² (Bottom). Dashed lines are the α = 0.01 threshold for test of variance contrast based on a half neighborhood of size m = 8.
Figure 2.15 Same model as Figure 2.14 except φ0 = −0.6, φ1 = 0.4. (a) Time series and σX(t) using same parameters as Figure 2.14(a). (b) Average of 10 periodograms as in Figure 2.14(b).
Figure 2.16 Simulated Xt with φ(t) = 0.95 for 0 ≤ t < T/2 and φ(t) = −0.95 for T/2 ≤ t < T, with T = 32 and Cov(ξs, ξt) = δ(s−t). (a) Time series and σX(t) using same parameters as Figure 2.14(a). (b) Average of 10 periodograms as in Figure 2.14(b).
2.2.7 Periodic Moving Average Models
Again some simulated series illustrate our ability to visually perceive evidence of periodicity in the time series, in the sample periodic variance, or in the periodograms of Xt and Xt². Figure 2.17 presents the now familiar views of a simulated Xt for the slowly varying model

Xt = ξt + θ1(t) ξt−1

with θ1(t) = 1 + 0.25 cos(2πt/T), T = 32, and Cov(ξs, ξt) = δ(s−t). We see no clear visual evidence of the periodicity in the time series trace of Figure 2.17(a), although it is reasonably clear in σ̂X(t). Again, the usual periodogram is unable to detect any evidence of periodicity at α = 0.01, but the periodicity in Xt² is clear; the p-value at 16 periods (in a 512 point FFT) is 5.4 × 10⁻⁴. Note from the form of (2.28) that it is easy to construct PMA models that have constant variance. For example, the model θ0(t) = 1, θ1(t) = cos(2πt/T), and θ2(t) = sin(2πt/T) would give

R(t, t) = Σ_{j=0}^{2} θj²(t) = 1 + cos²(2πt/T) + sin²(2πt/T) = 2.

Figure 2.18 presents an N = 600 point simulation of such a series with no clear evidence, either from σ̂X(t) or from the periodogram of squares, against constancy of σ(t).
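The constant-variance claim R(t, t) = 2 is easy to verify by simulation. The plain-Python sketch below (seed and sample sizes are our choices) generates the model with θ0 = 1, θ1(t) = cos(2πt/T), θ2(t) = sin(2πt/T) and checks that the per-phase sample variance stays near 2:

```python
import math
import random

random.seed(5)
T, n_periods = 32, 500
n = T * n_periods + 2
xi = [random.gauss(0.0, 1.0) for _ in range(n)]

# X_t = xi_t + cos(2 pi t / T) xi_{t-1} + sin(2 pi t / T) xi_{t-2}
x = [xi[t]
     + math.cos(2 * math.pi * t / T) * xi[t - 1]
     + math.sin(2 * math.pi * t / T) * xi[t - 2]
     for t in range(2, n)]

phase_var = []
for t in range(T):
    vals = [x[k * T + t] for k in range(n_periods - 1)]
    mu = sum(vals) / len(vals)
    phase_var.append(sum((v - mu) ** 2 for v in vals) / len(vals))
```

Every phase gives a sample variance close to 1 + cos² + sin² = 2, so neither the periodic variance estimate nor the periodogram of squares can reveal the (genuine) PC structure of this model.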
Figure 2.17 Simulated Xt = ξt + θ1(t)ξt−1 with θ1(t) = 1 + 0.25 cos(2πt/T) for T = 32 and Cov(ξs, ξt) = δ(s−t). (a) Time series and σX(t) using same parameters as Figure 2.14(a). (b) One periodogram using same parameters as in Figure 2.14(b).
Another constant variance model, given by (2.33), permits switching between two MA models in a way similar to the switching AR model, which we can understand from the analysis of stationary models. By choosing the parameters (θ0, θ1) = (1, 1), the switching model (2.33) gives a two-point averaging filter for the first half of the period and a two-point differencing filter for the second half. We leave experimentation with this model to the reader.

A Closing Question. Does periodicity in the variance imply the presence of periodic correlation? Based on all the simple models, the presence of a periodic variance is a good indicator that PC structure is present. But strictly, the answer is no, as seen from the following example, for which we acknowledge Andrzej Makagon. Suppose, for t ∈ ℤ, Xt is a stationary sequence, ft is a periodic scalar sequence with |ft| properly periodic, and gt is a nonrandom scalar sequence taking values ±1 with
lim_{N→∞} [1/(2N + 1)] Σ_{t=−N}^{N} gt = 0.
Figure 2.18 Simulated PMA Xt = ξt + cos(2πt/T)ξt−1 + sin(2πt/T)ξt−2.
Then Yt = ft gt Xt has standard deviation σY(t) = |ft| σX (since gt² = 1), which is properly periodic, but generally

RY(s + T, t + T) ≠ RY(s, t)

for all s, t because gt ≠ gt+T for all t. It would seem that we need to look pretty hard to find a sequence with periodic variance that is not PC. And so for practical purposes we use it as an important clue that periodic correlation is present.

PROBLEMS AND SUPPLEMENTS

2.1 Prove Proposition 2.1.
2.2 What is the spectral representation of a periodic L²(Ω, F, P)-valued sequence? What is the spectral representation of a sequence that is periodic and also stationary?

2.3 Use the program parbatch.m to further explore the effects of φ1 in the PAR(1)-CVS simulation of

Xt = [φ0 + φ1 cos(2πt/T)] Xt−1 + ξt.
2.4 Construct versions of the elementary examples for the cases of continuous time PC processes, for PC fields, and for almost PC processes.
2.5 Continuous time wide sense stationary processes that are not continuous in quadratic mean are often considered pathological because of the usual assumption of continuity of the covariance R(u) at u = 0. This exercise illustrates that this is not the case for continuous time PC processes. Show that if Xt is stationary and continuous in quadratic mean, and Yt = ft Xt, where ft has a simple discontinuity at t0, then Yt is not continuous in quadratic mean at t0. Hence there are simple PC processes that are not continuous in quadratic mean.
CHAPTER 3
REVIEW OF HILBERT SPACES
The purpose of this chapter is to study those aspects of Hilbert space theory that are needed for understanding the rest of this book. It is directed toward the spectral theorem for unitary operators. We give some proofs; the remainder can be found in the indicated pages of Akhiezer and Glazman [2].
3.1 VECTOR SPACES
Definition 3.1 A vector space over the field ℂ of complex numbers consists of a set X in which two operations called addition and scalar multiplication are defined so that for each pair x and y in X and each complex number α there are unique elements x + y and αx in X such that the following conditions hold for any vectors x, y, z ∈ X and any complex numbers α and β ∈ ℂ:

(a) x + y = y + x;

(b) (x + y) + z = x + (y + z);

(c) there is an element 0 ∈ X, called zero, such that x + 0 = x;

(d) α(x + y) = αx + αy;

(e) (α + β)x = αx + βx;

(f) (αβ)x = α(βx);

(g) 0x = 0;

(h) 1x = x.

Periodically Correlated Random Sequences: Spectral Theory and Practice. By H.L. Hurd and A.G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.

The elements of a vector space X are called vectors. Here are examples of some important vector spaces.

Examples.
1. It is easy to check that the Euclidean space ℂᵏ = {x = [x¹, x², ..., xᵏ] : xⁱ ∈ ℂ, i = 1, 2, ..., k}, with component-wise addition and scalar multiplication, is a vector space; its zero vector is 0 = [0, 0, ..., 0].

2. The space of square summable sequences ℓ² = {x = (xj)_{j=1}^{∞} : xj ∈ ℂ, Σj |xj|² < ∞}, with component-wise addition and scalar multiplication, is a vector space. Its closedness under scalar multiplication and addition follows from

‖αx‖² = |α|² ‖x‖²,

which is easy to check, and

‖x + y‖² ≤ 2‖x‖² + 2‖y‖²,

which is immediate from Problem 3 at the end of this chapter. The rest of the requirements for a vector space can be easily checked. In particular, here the 0 vector is the sequence with all zero entries.

3. The space of square integrable functions on a probability space (Ω, F, P), that is, the space of all random variables X on Ω satisfying the condition

E|X|² < ∞,

with the usual point-wise addition and scalar multiplication, is a vector space. As in the last example, checking the vector space properties becomes straightforward once we notice that for any two square integrable functions X and Y and any complex number α

E|αX|² = |α|² E|X|² < ∞,

which is easy to check, and

E|X + Y|² ≤ 2E|X|² + 2E|Y|²,

which is an immediate consequence of Problem 3 at the end of this chapter.
Definition 3.2 Let X be a vector space and L be a subset of X. L is called a linear manifold if αx + βy ∈ L for any α, β ∈ ℂ and any x, y ∈ L.
Definition 3.3 Given a subset E of a vector space X, the set of all finite linear combinations from E is called the span of E and is denoted by sp(E).

One can easily verify that sp(E) turns out to be a vector space, which is called the vector space (or linear manifold) generated by E.

Remark. If Lγ are all the linear manifolds containing E, then ∩γ Lγ is a linear manifold containing E, and it is easy to check that it is the smallest among all such linear manifolds.
Definition 3.4 A set E is called linearly independent if for every finite subset {x1, x2, ..., xn} of E, the condition α1x1 + α2x2 + ··· + αnxn = 0 implies α1 = α2 = ··· = αn = 0.
Definition 3.5 A subset E of a vector space X is called a basis for X if it is linearly independent and sp(E) = X.

3.2 INNER PRODUCT SPACES
Here the notion of inner product space is introduced, its properties are given and some specific inner product spaces are discussed.
Definition 3.6 A map (·, ·) : X × X → ℂ, where X is a vector space, is called an inner product on X if for all complex numbers α, β and all vectors x, y, z ∈ X we have

(a) (x, y) = (y, x)* (the complex conjugate);

(b) (αx + βy, z) = α(x, z) + β(y, z);

(c) (x, x) ≥ 0;

(d) (x, x) = 0 if and only if x = 0.

If X is a vector space and (·, ·) is an inner product on X, then the pair (X, (·, ·)) is called an inner product space. For a vector x ∈ X we define its norm to be

‖x‖ = √(x, x).

Two vectors x and y in an inner product space X are said to be orthogonal, written x ⊥ y, if (x, y) = 0. Some properties of the inner product and norm are given in the following proposition (see [2, Chapter I]).
Proposition 3.1 For any vectors x, y in an inner product space X and any complex number α we have

(a) Cauchy-Schwarz inequality: |(x, y)| ≤ ‖x‖ ‖y‖;

(b) triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖;

(c) ‖αx‖ = |α| ‖x‖;

(d) ‖x‖ = 0 if and only if x = 0;

(e) parallelogram law: ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

From the triangle inequality for norms it follows that d(x, y) = ‖x − y‖ is a metric. The topology of an inner product space is the one induced by this metric.
Definition 3.7 A sequence xn of vectors in an inner product space X is said

(a) to converge to x ∈ X if ‖xn − x‖ → 0, as n → ∞;

(b) to be Cauchy if for any ε > 0 there exists N > 0 such that ‖xn − xm‖ < ε, whenever n, m > N.
Proposition 3.2 (Continuity of Inner Product and Norm) If {xn} and {yn} are two sequences in an inner product space converging to x and y, respectively, then

(a) ‖xn‖ → ‖x‖;

(b) (xn, yn) → (x, y).

Proof. From the triangle inequality it follows that

‖x‖ = ‖x − y + y‖ ≤ ‖x − y‖ + ‖y‖,

and similarly one arrives at ‖y‖ ≤ ‖x − y‖ + ‖x‖. Combining these two inequalities gives

| ‖x‖ − ‖y‖ | ≤ ‖x − y‖,

from which (a) follows. For (b), write

|(xn, yn) − (x, y)| = |(xn, yn − y) + (xn − x, y)| ≤ |(xn, yn − y)| + |(xn − x, y)| ≤ ‖xn‖ ‖yn − y‖ + ‖xn − x‖ ‖y‖,

from which (b) follows.
3.3 HILBERT SPACES

Definition 3.8 A Hilbert space H is an inner product space that is complete with respect to the norm induced by the inner product; that is, a Hilbert space is an inner product space in which every Cauchy sequence (xn) converges in norm to some vector x ∈ H.

Here are some examples of Hilbert spaces.

Examples.
1. The Euclidean space ℂᵏ = {x = [x¹, x², ..., xᵏ] : xⁱ ∈ ℂ, i = 1, 2, ..., k} equipped with

(x, y) = Σ_{i=1}^{k} xⁱ ȳⁱ

becomes an inner product space. Here the norm of a vector x = [x¹, x², ..., xᵏ] in ℂᵏ is given by ‖x‖ = (Σ_{i=1}^{k} |xⁱ|²)^{1/2}. This inner product space turns out to be a Hilbert space. To verify this, which means verifying its completeness, let xn = [xn¹, ..., xnᵏ] be a Cauchy sequence in ℂᵏ. Given any positive ε there exists a positive integer N such that

ε² > ‖xn − xm‖² = Σ_{j=1}^{k} |xnʲ − xmʲ|², whenever m, n > N.

Hence for each integer j = 1, 2, ..., k we have |xnʲ − xmʲ| < ε whenever m, n > N. Now by completeness of the complex numbers ℂ, for each j there is a complex number xʲ to which xnʲ converges. Taking x = [x¹, x², ..., xᵏ] we get ‖xn − x‖ → 0 as n → ∞, which completes the proof.

2. The space ℓ² = {x = (xj)_{j=1}^{∞} : xj ∈ ℂ, Σj |xj|² < ∞} equipped with

(x, y) = Σ_{j=1}^{∞} xj ȳj

becomes an inner product space with norm ‖x‖ = (Σj |xj|²)^{1/2}. Using an argument similar to the one used for the Euclidean space ℂᵏ above, one can verify that this space is a Hilbert space.
This turns out to be a prototype Hilbert space in the sense that every separable Hilbert space is essentially the same as (isomorphic to) this one. However, the next example, which is the cornerstone of the theory developed in this book, is not of this type (because it is not separable) and thus deserves our attention.

3. Consider the space L²(Ω, F, P) of square integrable functions (random variables with finite second moments) on a probability space (Ω, F, P) and define the inner product to be

(X, Y) = E X Ȳ = ∫_Ω X(ω) Ȳ(ω) dP(ω).
It is easy to check that (X, Y) satisfies requirements (a)-(c) in Definition 3.6. However, requirement (d) does not hold; that is, (X, X) = 0 in general does not imply that X is identically zero. It only implies that X is zero almost surely. This problem can be settled by saying two random variables are equivalent if they are equal almost surely and by reinterpreting L²(Ω, F, P) as the collection of all these equivalence classes. Each class is uniquely represented by any specific random variable in that class, and we shall use the notation X, Y, and so on for elements of L²(Ω, F, P) and call them random variables, although it is sometimes necessary to remember that X stands for the class of all random variables equivalent to X. We similarly reinterpret the definition of the inner product (X, Y). To show that this modified space L²(Ω, F, P) is a Hilbert space, it is necessary to establish its completeness, a task beyond the scope of this presentation, but readily found in measure theory texts. See, for example, [202, Theorem 3.11]. This Hilbert space is of particular importance to us because most of the stochastic processes in this book are just two-sided sequences of vectors in such a Hilbert space.

Definition 3.9 If E is a subset of a Hilbert space H, then the closure of sp(E), denoted by sp̄(E), is called the span closure of E.

Definition 3.10 Any closed linear manifold M of a Hilbert space H will be called a subspace. The inner product (·, ·) of H induces an inner product on M. The subspace M together with this inherited inner product is itself complete (this follows from completeness of H) and hence is a Hilbert space.
3.4 OPERATORS

Definition 3.11 Let H and K be two Hilbert spaces. A mapping (or transformation) T from H into K is said to be additive if T(x + y) = Tx + Ty for all x, y ∈ H, and homogeneous if T(αx) = αTx for all complex numbers α and all vectors x ∈ H. A transformation T that is both additive and homogeneous is called linear.

It is easy to check that a transformation T : H → K is linear if and only if T(αx + y) = αTx + Ty for all x, y ∈ H and all complex numbers α. A linear transformation from a Hilbert space H into the complex numbers ℂ is called a linear functional.
Definition 3.12 A linear transformation T : H → K is called bounded if there exists a positive number M such that

‖Tx‖ ≤ M ‖x‖, for all x ∈ H.

The smallest such M is called the norm of T and is denoted by ‖T‖. A bounded linear transformation from H into itself is called an operator on H.

Proposition 3.3 For any linear transformation T : H → K the following statements are equivalent:

(a) T is bounded;

(b) T is uniformly continuous on H;

(c) T is everywhere continuous on H;

(d) T is continuous at 0.
Proof. Let T be a bounded linear transformation and ε be a positive number. Take δ = ε/‖T‖. Now whenever ‖y − x‖ ≤ δ, we can write

‖Ty − Tx‖ = ‖T(y − x)‖ ≤ ‖T‖ ‖y − x‖ ≤ ‖T‖ δ = ε,

which means T is uniformly continuous. This shows (a) ⇒ (b). Implications (b) ⇒ (c) and (c) ⇒ (d) are obvious. To show (d) ⇒ (a), suppose T is continuous at 0. Since T0 = 0, there exists some δ > 0 such that ‖Tx‖ = ‖Tx − T0‖ < 1 whenever ‖x‖ < δ. Now for any x ∈ H we can write

‖Tx‖ = (‖x‖/δ) ‖T(δx/‖x‖)‖ ≤ ‖x‖/δ.

So taking M = 1/δ we get

‖Tx‖ ≤ M ‖x‖, for all x ∈ H,

which means T is bounded. This completes the proof of (d) ⇒ (a). ∎
Definition 3.13 A bounded linear transformation T : H → K is called an isomorphism if it is onto and (Tx, Ty) = (x, y) for all x, y ∈ H. Two Hilbert spaces H and K are called isomorphic if there exists an isomorphism from H onto K.

For the proof of the following two important theorems one can refer to [202, Theorem 4.12] or [87, Section 22].
Theorem 3.1 (Riesz Theorem) If φ is a bounded linear functional on H, there exists a unique vector y ∈ H such that

φ(x) = (x, y), for all x ∈ H,

and ‖y‖ = ‖φ‖.
A complex valued function φ(x, y), x, y ∈ H, which is linear in each variable, is called a bilinear functional, and it is called bounded if there exists a positive number M such that |φ(x, y)| ≤ M ‖x‖ ‖y‖ for all x, y ∈ H.
Theorem 3.2 For any bounded bilinear functional φ on H there exists a unique operator A on H such that

φ(x, y) = (Ax, y), for all x, y ∈ H.
Proposition 3.4 For any operator T on H there exists another operator T* on H such that

(Tx, y) = (x, T*y), for all x, y ∈ H.

The operator T*, which is called the adjoint of T, has the same norm as T.

Proof. Let y be any fixed vector in H. Consider the linear functional φ defined by φ(x) = (Tx, y), for all x ∈ H. This functional φ is bounded because for any x ∈ H we can write

|φ(x)| = |(Tx, y)| ≤ ‖Tx‖ ‖y‖ ≤ ‖T‖ ‖y‖ ‖x‖.

By Theorem 3.1 (Riesz), there exists y* ∈ H such that φ(x) = (x, y*) for every x ∈ H. Defining T* : H → H by T*y = y*, one can check that T* is linear, bounded, and satisfies (Tx, y) = φ(x) = (x, y*) = (x, T*y), for all x, y ∈ H. ∎
Definition 3.14 An operator T on the Hilbert space H is called normal if it commutes with T*, Hermitian if it is equal to T*, and unitary if T⁻¹ = T*.

3.5 PROJECTION OPERATORS
For more about the material covered in this section one can refer to [2, Sections 30-33, 35, 36].
Definition 3.15 Let x be a vector in a Hilbert space H and M be a subset of H. We say x is orthogonal to M and write x ⊥ M if x ⊥ y for all y in M. In general, two sets are said to be orthogonal to each other if every vector in one set is orthogonal to the other set. The orthogonal complement M⊥ of a subset M of a Hilbert space H is defined to be M⊥ = {x ∈ H : x ⊥ M}.

Proposition 3.5 The orthogonal complement of any subset M of a Hilbert space H is a subspace of H.

Proof. Let y and z be any two vectors in M⊥, x be any vector in M, and α, β be any two complex numbers; then we have

(x, αy + βz) = ᾱ(x, y) + β̄(x, z) = 0,

which shows αy + βz ∈ M⊥ and hence M⊥ is a linear manifold. To show the remaining requirement, closedness of M⊥, let (yn) be a sequence in M⊥
converging to some y in H. Now for any x in M, by continuity of the inner product (see Proposition 3.2) we can write

(x, y) = lim_{n→∞} (x, yn) = lim_{n→∞} 0 = 0,

which shows y must be in M⊥. ∎
Consider a subset M of a Hilbert space H and a vector x in H outside M. Is there in M a "closest" vector to x? If there is such a vector, is it unique? Considering δ(x) = inf_{y∈M} ‖x − y‖, by a closest element in M to x we mean any vector x̂ ∈ M with ‖x − x̂‖ = δ(x). It is easy to see that for a general set M the answer is negative. The following important theorem, which can be found in any book on Hilbert spaces (e.g., see [202, Theorem 4.11]), provides a complete answer to this question.
Theorem 3.3 (Projection Theorem) If M is a subspace of the Hilbert space H and x is any vector in H, then

(a) there is a unique vector x̂ ∈ M, called the orthogonal projection of x onto M, such that

δ(x) = inf_{y∈M} ‖x − y‖ = ‖x − x̂‖;

(b) a vector x̂ ∈ M serves as the orthogonal projection of x onto M if and only if x − x̂ ∈ M⊥.
Proof. For (a), let δ(x) = inf_{y∈M} ‖x − y‖; then there is a sequence xn ∈ M with ‖x − xn‖ → δ(x). Since M is a subspace, (xn + xm)/2 ∈ M, and an application of the parallelogram law gives

0 ≤ ‖xn − xm‖² = 2‖xn − x‖² + 2‖xm − x‖² − 4‖(xn + xm)/2 − x‖² ≤ 2‖xn − x‖² + 2‖xm − x‖² − 4δ²(x).

Since the last quantity on the right has limit zero as m, n → ∞, the sequence xn is Cauchy and hence has a limit, x̂, in M. Now by continuity of the norm given in Proposition 3.2(a),

‖x − x̂‖ = lim_{n→∞} ‖x − xn‖ = δ(x),

which shows the existence in (a). Assuming there are two different vectors x̂ and x̃ ∈ M with

‖x − x̂‖ = ‖x − x̃‖ = δ(x),

by the parallelogram law (Proposition 3.1) we get

‖x̂ − x̃‖² = 2‖x̂ − x‖² + 2‖x̃ − x‖² − 4‖(x̂ + x̃)/2 − x‖² ≤ 2δ²(x) + 2δ²(x) − 4δ²(x) = 0,

which contradicts x̂ ≠ x̃.

For (b), if x̂ is in M and x − x̂ is in M⊥, then x̂ is the unique vector defined in (a). This is because for any y ∈ M

‖x − y‖² = (x − x̂ + x̂ − y, x − x̂ + x̂ − y) = ‖x − x̂‖² + ‖x̂ − y‖² ≥ ‖x − x̂‖².

Conversely, if x̂ ∈ M but x − x̂ is not in M⊥, then x̂ is not the closest vector of M to x: one can easily check that, for any y in M with (x − x̂, y) ≠ 0, the vector

x̂ + [(x − x̂, y)/‖y‖²] y

is in M and closer to x than x̂. ∎
Part (a) of Theorem 3.3 establishes the existence of the orthogonal projection x̂ of x, while part (b), as we see in the following examples, helps us in identifying it.

Examples
1. Let x_i, i = 1, 2, …, n, be n vectors in a Hilbert space H, let M be the subspace M = sp{x_1, x_2, …, x_n}, and let x be any vector in H but not in M. The closest vector in M to x is of the form

x̂ = Σ_{i=1}^n α_i x_i.

Using part (b) of the Projection Theorem, for all j = 1, 2, …, n we must have x − x̂ ⊥ x_j, or (x − Σ_i α_i x_i, x_j) = 0. So one can find the α_i by solving the following system of n linear equations:

Σ_i (x_i, x_j) α_i = (x, x_j),  j = 1, 2, …, n.
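Example 1 can be carried out numerically. The following is an illustrative sketch (not from the text), with random vectors in R^5 under the Euclidean inner product; the names X, G, a, x_hat are ours:

```python
import numpy as np

# Closest vector in M = span{x_1, x_2, x_3} to x, found by solving the
# normal equations sum_i (x_i, x_j) alpha_i = (x, x_j) from Example 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # columns are x_1, x_2, x_3
x = rng.normal(size=5)

G = X.conj().T @ X               # Gram matrix with entries (x_i, x_j)
b = X.conj().T @ x               # right-hand sides (x, x_j)
a = np.linalg.solve(G, b)        # coefficients alpha_i
x_hat = X @ a                    # orthogonal projection of x onto M
```

Part (b) of the Projection Theorem predicts that the residual x − x_hat is orthogonal to every x_j; that is exactly what characterizes the solution of the normal equations.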
2. As a special case, when M = sp{e_1, e_2, …, e_n} with e_i ⊥ e_j for all i ≠ j and ‖e_i‖ = 1 for all i, one can see that the orthogonal projection of x on M turns out to be (see Problem 3.10 at the end of this chapter)

x̂ = Σ_{i=1}^n (x, e_i) e_i.
56
REVIEW OF HILBERT SPACES
Definition 3.16 Let M be a subspace of the Hilbert space H; then the projection P_M : H → M is an operator defined by

P_M x = x̂, for all x ∈ H.

Sometimes instead of P_M x we use the notation (x | M). The following proposition gives some useful properties of the projection operator.
Proposition 3.6 If M and N are two subspaces of a Hilbert space H, then

(a) (I − P_M) is the projection from H onto M⊥;

(b) every x ∈ H has a unique representation as the sum of two orthogonal vectors, one in M and the other one in M⊥:

x = P_M x + (I − P_M)x;

(c) ‖x‖² = ‖P_M x‖² + ‖(I − P_M)x‖²;

(d) P_M is a bounded operator with norm one (i.e., ‖P_M‖ = 1);

(e) x ∈ M if and only if P_M x = x, and x ∈ M⊥ if and only if P_M x = 0;

(f) P_M is idempotent (i.e., P_M² = P_M);

(g) P_M is self-adjoint;

(h) P_M is nonnegative definite (i.e., (P_M x, x) ≥ 0 for all x ∈ H);

(i) M ⊆ N if and only if P_M P_N = P_M.
Proof. Part (a) follows from the definition of projection. Part (b) follows from applying both sides of I = P_M + (I − P_M) to the vector x. Part (c) follows from (b) and the fact that the summands P_M x and (I − P_M)x are orthogonal. For part (d), consider any two vectors x, y ∈ H and any two complex numbers α, β. It is clear that αx̂ + βŷ ∈ M and

(αx + βy) − (αx̂ + βŷ) = (αx − αx̂) + (βy − βŷ) ⊥ M,

which implies P_M(αx + βy) = αP_M x + βP_M y, and hence P_M is linear. Now from (c) we can write ‖P_M x‖ ≤ ‖x‖, while ‖P_M x‖ = ‖x‖ for every x ∈ M, which proves (d). Part (e) follows directly from the definition of x̂, and (f) follows from (e) because P_M x ∈ M. For (g), using (b) and the orthogonality of the two summands, for any x, y ∈ H we can write

(P_M x, y) = (P_M x, P_M y) = (x, P_M y).

This means P_M is self-adjoint. Applying (f) and (g) we can write

(P_M x, x) = (P_M² x, x) = (P_M x, P_M x) = ‖P_M x‖² ≥ 0,

which completes the proof of (h). The proof of (i) is left to the reader as Problem 3.7 at the end of this chapter. ∎
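A minimal finite-dimensional sketch of Proposition 3.6 (our own setup, in R^6 with the Euclidean inner product): build P_M from an orthonormal basis Q of a two-dimensional subspace M and check parts (c), (d), (f), (g), and (h) numerically.

```python
import numpy as np

# P_M = Q Q^T, where the columns of Q are an orthonormal basis of M.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))
P = Q @ Q.T                      # matrix of the projection P_M

x = rng.normal(size=6)           # an arbitrary vector to project
```

The assertions below are exactly properties (f) idempotent, (g) self-adjoint, (c) the Pythagorean splitting, (d) norm one, and (h) nonnegativity.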
Proposition 3.7 If P is a self-adjoint idempotent operator on a Hilbert space H, then it is a projection on some closed subspace M of H (i.e., P = P_M for some M).

Proof. Let M = {x ∈ H : Px = x}. It is easy to check that M is a linear manifold of H. To show M is closed, let x_n be a sequence in M that converges to some x ∈ H. So we have Px_n = x_n for all n ∈ N. Taking the limit of both sides and using the continuity of P, we get Px = x, which shows that x is in M and hence M is closed. Take any x ∈ H: first, Px ∈ M, because Px = P²x = P(Px); and second, x − Px ⊥ M, because for any y ∈ M we have

(x − Px, y) = (x − Px, Py) = ((I − P)x, Py) = (P(I − P)x, y) = ((P − P²)x, y) = (0, y) = 0.

So for any x ∈ H, Px = x̂ = P_M x, by the definitions of x̂ and P_M. ∎

Projections have the following additional properties, the proofs of which can be found in [2, Chapter III].
Proposition 3.8 If M_j, j = 1, 2, …, n, are subspaces of H, then:

(a) The composition P_{M_1}P_{M_2} is a projection if and only if P_{M_1}P_{M_2} = P_{M_2}P_{M_1}. If this is the case, then P_{M_1}P_{M_2} = P_{M_1 ∩ M_2}.

(b) The operator

Q = P_{M_1} + P_{M_2} + ⋯ + P_{M_n}

is a projection if and only if P_{M_j}P_{M_k} = 0 whenever j ≠ k, and if this is the case then Q = P_M, where M = M_1 ⊕ M_2 ⊕ ⋯ ⊕ M_n.

(c) The operator P_{M_1} − P_{M_2} is a projection if and only if M_2 ⊆ M_1, and if this is the case then P_{M_1} − P_{M_2} = P_M, where

M = M_1 ⊖ M_2 = {x ∈ M_1 : x ⊥ M_2}.
Proposition 3.9 Suppose (P_{M_n}), n ∈ N, is a nondecreasing sequence of projections; that is, P_{M_n} ≤ P_{M_{n+1}} for all n. Then P = lim_{n→∞} P_{M_n} exists and is a projection. The limit here is in the strong sense:

lim_{n→∞} ‖Px − P_{M_n}x‖ = 0, for all x ∈ H.
The next proposition gives some properties of orthogonal complements.

Proposition 3.10 Let M be a subspace and L be a subset of H; then

(a) L ∩ L⊥ = {0};

(b) sp̄(L) = (L⊥)⊥;

(c) L ⊆ (L⊥)⊥ and M = (M⊥)⊥;

(d) [⋁(∪_n M_n)]⊥ = ∩_n M_n⊥.
Proof. For (a), suppose x ∈ L ∩ L⊥; then we must have x ⊥ x and ‖x‖² = (x, x) = 0, which implies x = 0. For (b), let x ∈ L; then x ⊥ y for every y ∈ L⊥. This means x ⊥ L⊥, or x ∈ (L⊥)⊥. So L ⊆ (L⊥)⊥. Now since (L⊥)⊥ is a subspace, we get the inclusion sp̄(L) ⊆ (L⊥)⊥. To show the other inclusion, suppose there is a vector x ∈ (L⊥)⊥ not in sp̄(L), and let x̂ be the projection of x on sp̄(L). Then x − x̂ is in both (L⊥)⊥ and L⊥, so ‖x − x̂‖² = (x − x̂, x − x̂) = 0, which is a contradiction. Part (c) is an immediate consequence of (b). The proof of (d) is left to the reader as Problem 3.8 at the end of this chapter. ∎
Definition 3.17 For a finite or countable family (H_n) of Hilbert spaces, their direct sum H = ⊕Σ_n H_n is defined to be the collection of all sequences X = (x_n), where each x_n ∈ H_n and Σ_n ‖x_n‖² < ∞. Addition and scalar multiplication on H are defined coordinate-wise and the inner product is defined by

(X, Y) = Σ_n (x_n, y_n).

One can easily check that H = ⊕Σ_n H_n, together with this inner product, becomes a Hilbert space. Sometimes an arbitrary X ∈ H is written as X = ⊕Σ_n x_n. If the H_n are mutually orthogonal subspaces of some Hilbert space H, then H = ⊕Σ_n H_n is called the orthogonal direct sum of the (H_n). If we have finitely many H_n, n = 1, 2, …, k, then H = ⊕Σ_{n=1}^k H_n can be identified with the space of all vectors X = [x^1, x^2, …, x^k] with x^i ∈ H_i.
Proposition 3.11 Suppose {H_n : n = 1, 2, …, k} is a collection of mutually orthogonal subspaces of a Hilbert space H such that ∩_{n=1}^k (H_n)⊥ = {0}. Then H is isomorphic to the orthogonal direct sum ⊕Σ_{n=1}^k H_n.
The straightforward proof is omitted.

Remark. If M is a subspace of H, then by the preceding proposition H turns out to be isomorphic to the orthogonal sum M ⊕ M⊥, but we usually say H is the same as the orthogonal sum M ⊕ M⊥ and write H = M ⊕ M⊥. If this is the case, any x ∈ H can be expressed as the sum

x = y + z

of some y ∈ M and some z ∈ M⊥.
Definition 3.18 A set E ⊂ H is called normalized if ‖x‖ = 1 for all x ∈ E. If (x, y) = 0 for any distinct pair x, y ∈ E, then E is called an orthogonal set. A set E that is both normalized and orthogonal is called orthonormal. A complete orthonormal set in H is an orthonormal subset E of H whose span is dense in H.

Gram-Schmidt Orthogonalization Process. From any linearly independent sequence (x_n) ⊂ H, using the Gram-Schmidt orthogonalization process, one can construct an orthonormal set {e_n} that is equivalent to it, namely,

sp{e_1, e_2, …, e_k} = sp{x_1, x_2, …, x_k}, for any k ≥ 1.
The process is based on induction and starts with setting

e_1 = x_1/‖x_1‖.

After e_1, e_2, …, e_k have been constructed, e_{k+1} is taken to be

e_{k+1} = c_{k+1}/‖c_{k+1}‖, with c_{k+1} = x_{k+1} − Σ_{n=1}^k (x_{k+1}, e_n) e_n.

For proofs and a more complete discussion of the Gram-Schmidt process, see [2, Sections 8 and 9].

Remark. The original sequence need not be linearly independent, because we can always extract a linearly independent subsequence from one that is not.
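The inductive recipe above translates directly into code. A minimal sketch (our own function name and random test vectors, in R^4):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list, as in the text:
    e_1 = x_1/||x_1||, then c_{k+1} = x_{k+1} - sum_n (x_{k+1}, e_n) e_n
    and e_{k+1} = c_{k+1}/||c_{k+1}||."""
    basis = []
    for x in vectors:
        c = x - sum(np.vdot(e, x) * e for e in basis)  # remove components along e_n
        basis.append(c / np.linalg.norm(c))
    return basis

rng = np.random.default_rng(2)
xs = [rng.normal(size=4) for _ in range(3)]   # generically independent
es = gram_schmidt(xs)
```

The resulting e_k are orthonormal and, as the text notes, sp{e_1, …, e_k} = sp{x_1, …, x_k} for each k.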
Definition 3.19 A Hilbert space H is called separable if it has a countable dense subset E.

One can easily prove that the Gram-Schmidt orthogonalization process gives the following important result (see Problem 3.9 at the end of this chapter).

Proposition 3.12 Every separable Hilbert space has a complete orthonormal sequence.
Proposition 3.13 (Fourier Series) If E = {e_n : n ∈ N} is a complete orthonormal set in H, then any vector x ∈ H can be expressed as

x = Σ_{n=1}^∞ (x, e_n) e_n.

One can then check that a countable orthonormal set E = {e_n : n ∈ N} ⊂ H is complete if and only if

‖x‖² = Σ_{n=1}^∞ |(x, e_n)|², for all x ∈ H.

That is why a complete orthonormal set in H is also called an orthonormal basis for H.
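In C^n any orthonormal basis is complete, so both the expansion and the Parseval identity of Proposition 3.13 can be checked directly. A sketch with a random unitary basis (our own construction):

```python
import numpy as np

# Columns of E form a complete orthonormal set in C^5.
rng = np.random.default_rng(3)
E, _ = np.linalg.qr(rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5)))
x = rng.normal(size=5) + 1j * rng.normal(size=5)

coeffs = E.conj().T @ x          # Fourier coefficients (x, e_n)
x_rebuilt = E @ coeffs           # x = sum_n (x, e_n) e_n
```

The reconstruction recovers x exactly, and the squared norm of x equals the sum of squared moduli of the coefficients.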
3.6 SPECTRAL THEORY OF UNITARY OPERATORS

One of the classical results of operator theory is the spectral theorem for normal operators. In this section we state this result for the case of unitary operators so we can use it in subsequent chapters. For more detail on the material presented here one can refer to [2, Sections 62 and 63].

3.6.1 Spectral Measures

Here we first introduce the notion of spectral measures and then briefly discuss their properties. Throughout this section we work with a measure space (X, F) consisting of a set X and a σ-algebra F of its subsets.
Definition 3.20 A function E defined on a σ-algebra F of subsets of X whose values are orthogonal projections in a Hilbert space H is called a spectral measure if E(X) = I and, for any sequence (M_n) of disjoint subsets in F,

E(∪_{n=1}^∞ M_n) = Σ_{n=1}^∞ E(M_n).
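In finite dimensions Definition 3.20 can be made concrete. A sketch under our own setup: for a unitary matrix U with eigenangles theta_k and orthonormal eigenvectors (the columns of W), define E(B) as the sum of the eigenprojections with theta_k in B.

```python
import numpy as np

# Build a 4x4 unitary with prescribed eigenangles (made-up data).
rng = np.random.default_rng(4)
W, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
theta = np.array([0.5, 1.5, 2.5, 4.0])            # eigenangles in [0, 2*pi)
U = W @ np.diag(np.exp(1j * theta)) @ W.conj().T

def E(lo, hi):
    """Value of the spectral measure on the arc [lo, hi)."""
    Ws = W[:, (theta >= lo) & (theta < hi)]
    return Ws @ Ws.conj().T                        # orthogonal projection
```

Each value E(lo, hi) is an orthogonal projection, E([0, 2π)) = I, and E is additive over disjoint arcs and multiplicative over intersections, matching the properties listed below.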
For an elementary and yet interesting example of spectral measures see Problem 3.11 at the end of this chapter. The following proposition lists several useful properties of spectral measures. For proofs the reader can refer to [87, Section 36].
Proposition 3.14 If E is a spectral measure on F, then E(∅) = 0 and E is

(a) finitely additive: for any finite disjoint family of sets M_1, …, M_N in F,

E(∪_{n=1}^N M_n) = Σ_{n=1}^N E(M_n);
(b) modular: for any two sets M, N in F,

E(M ∪ N) = E(M) + E(N) − E(M ∩ N);

(c) multiplicative: for any two sets M, N in F,

E(M ∩ N) = E(M)E(N);

(d) subtractive: for any two sets M, N in F with M ⊆ N,

E(N − M) = E(N) − E(M);

(e) monotone: for any two sets M, N in F with M ⊆ N,

E(M) ≤ E(N);

(f) commutative: for any two sets M, N in F,

E(M)E(N) = E(N)E(M);

(g) orthogonally scattered: for any two disjoint sets M, N in F,

E(M) ⊥ E(N).

The following theorem, proved in [87, Section 36], reveals a very close and useful tie that exists between spectral measures and scalar measures.
Theorem 3.4 Let (X, F) be a measure space. A projection valued set function E on F is a spectral measure if and only if E(X) = I and, for each pair of vectors x and y in H, the scalar valued set function μ_{x,y}(M) = (E(M)x, y) is a countably additive measure.

3.6.2 Spectral Integrals
Our presentation here is in a manner found in [87] and [187]. Let f be a bounded measurable function on (X, F) and E be a spectral measure on F. For each pair of vectors x and y in H, the familiar integral ∫ f(λ) dμ_{x,y} of f with respect to the scalar valued measure μ_{x,y}(M) = (E(M)x, y) can be formed. It is easy to see that φ defined by φ(x, y) = ∫ f(λ) dμ_{x,y} is a bilinear functional. We claim this is also bounded. In fact, for any vector x ∈ H,

|φ(x, x)| ≤ sup|f| · μ_{x,x}(X) = sup|f| ‖x‖²,

which in conjunction with the parallelogram law (part (e) of Proposition 3.1) gives

|φ(x, y)| ≤ 2 sup|f| ‖x‖ ‖y‖.
Therefore by Theorem 3.2 there is a unique operator A(f) on H such that

(A(f)x, y) = φ(x, y) = ∫ f(λ) (E(dλ)x, y).

The dependence of A(f) on f and E will be denoted by

∫ f(λ)E(dλ) = A(f),

and this defines the spectral integral ∫ f(λ)E(dλ) as A(f). The following properties of the spectral integral we just introduced are proved in [87, Section 37] and [187, Section 1.4].
Theorem 3.5 If E is a spectral measure on (X, F), f and g are bounded complex valued measurable functions on X, and α is a complex number, then

(a) ∫ (αf(λ))E(dλ) = α ∫ f(λ)E(dλ);

(b) ∫ (f(λ) + g(λ))E(dλ) = ∫ f(λ)E(dλ) + ∫ g(λ)E(dλ);

(c) ∫ f̄(λ)E(dλ) = [∫ f(λ)E(dλ)]*;

(d) ∫ f(λ)g(λ)E(dλ) = ∫ f(λ)E(dλ) ∫ g(λ)E(dλ).
The dependence of A(f) and ∫ f(λ)E(dλ) mentioned above is often expressed in the more application oriented form of

∫ f(λ)E(dλ)x = A(f)x, for every x ∈ H.  (3.2)

The integral ∫ f(λ)E(dλ)x here can also be defined in the customary measure theory way. First, we define the integral of a simple function f = Σ_{j=1}^n α_j 1_{Δ_j} to be

∫ f(λ)E(dλ)x = Σ_{j=1}^n α_j E(Δ_j)x.
Second, using the usual techniques of scalar measures we check that this integral, defined for simple functions, is well defined, linear, and additive. Third, we show that for any simple function f

‖∫ f(λ)E(dλ)x‖² = ∫ |f|² dμ_x,  (3.3)

where μ_x(Δ) = μ_{x,x}(Δ) = (E(Δ)x, x). In fact, if we express the simple function f in the form f = Σ_{j=1}^n α_j 1_{Δ_j} with mutually disjoint sets Δ_j, then we can write

‖∫ f(λ)E(dλ)x‖² = ‖Σ_{j=1}^n α_j E(Δ_j)x‖² = Σ_{j=1}^n |α_j|² ‖E(Δ_j)x‖² = ∫ |f|² dμ_x,

where the second equality follows from the orthogonality property of spectral measures stated in part (g) of Proposition 3.14. Fourth, take an arbitrary complex valued function f in L²(μ_x) and let (f_n) be any sequence of simple functions converging to f in L²(μ_x)-norm. This implies that the sequence (f_n) is Cauchy in L²(μ_x). Therefore by (3.3) the sequence ∫ f_n(λ)E(dλ)x must be Cauchy in H and thus has a limit, B_x, in H. We define

∫ f(λ)E(dλ)x = B_x = lim_{n→∞} ∫ f_n(λ)E(dλ)x.  (3.4)

Finally, we check that this integral retains linearity and additivity as well as (3.3). Now we have two seemingly different definitions for the spectral integral ∫ f(λ)E(dλ)x: namely, A(f)x given through (3.2) and B_x(f) given through (3.4). We claim these two are the same, that is, A(f)x = B_x, and hence there is no confusion. To verify our claim for a simple f = Σ_j α_j 1_{Δ_j} and any y ∈ H we can write

(B_x, y) = (Σ_{j=1}^n α_j E(Δ_j)x, y) = Σ_{j=1}^n α_j (E(Δ_j)x, y) = ∫ f(λ) dμ_{x,y} = (A(f)x, y),
which, by virtue of the uniqueness part of the Riesz Theorem 3.1, completes the proof. The argument for a general function f is the standard limiting argument of passing from the case of simple functions to general ones.
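In finite dimensions the spectral integral reduces to functional calculus on the eigenangles, which makes the identities of Theorem 3.5 easy to check. A sketch with made-up spectral data (the unitary W and angles theta are ours):

```python
import numpy as np

# For a unitary U = W diag(e^{i theta_k}) W^H, the spectral measure is
# concentrated on the eigenangles, and
# int f(lambda) E(d lambda) = sum_k f(theta_k) v_k v_k^H.
rng = np.random.default_rng(5)
W, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
theta = np.array([0.3, 1.1, 2.2, 3.3])
U = W @ np.diag(np.exp(1j * theta)) @ W.conj().T

def spectral_integral(f):
    return W @ np.diag(f(theta)) @ W.conj().T

A = spectral_integral(lambda t: np.exp(1j * t))   # should recover U itself
```

The integral of e^{iλ} reproduces U, and the multiplicativity in part (d) of Theorem 3.5 holds exactly in this setting.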
3.6.3 Spectral Theorems
In the last two subsections we introduced the notion of spectral measure and then, starting with a given spectral measure on a measure space (X, F) with values in a Hilbert space H, we proceeded to define the spectral integral ∫ f(λ)E(dλ) as an operator A(f) such that

∫ f(λ) d(E(λ)x, y) = (A(f)x, y), for any pair x, y ∈ H.
However, in most applications we have an operator A : H → H and we would like to represent it as a spectral integral. The question is: Given an operator A : H → H, does there exist a spectral measure E on some measure space (X, F) and a function f such that A = ∫ f(λ)E(dλ)? The answer for a large class of operators, known as normal operators, is affirmative. This important result, which is called the spectral theorem for normal operators, says that every bounded normal operator A has a spectral measure E on the Borel subsets of the complex plane C for which A = ∫ λE(dλ). This result can be found, for example, in [87], [187, Section 1.4] and [2, Section 62]. In this book we are interested in the special case of this result for unitary operators: If U is a unitary operator on a Hilbert space H, then there exists a unique spectral measure E on the Borel subsets of the unit circle T such that

U = ∫_T λE(dλ).

If we identify T with [0, 2π) in the usual way, one can state this theorem as follows.
Theorem 3.6 (Spectral Theorem for Unitary Operators) For any unitary operator U on a Hilbert space H there exists a unique spectral measure E on the Borel subsets of [0, 2π) such that

U = ∫_0^{2π} e^{iλ}E(dλ).
Finally, this last theorem together with parts (c) and (d) of Theorem 3.5 establishes the following result.
Theorem 3.7 For any unitary operator U on a Hilbert space H there exists a unique spectral measure E on the Borel subsets of [0, 2π) such that

U^t = ∫_0^{2π} e^{itλ}E(dλ), for any integer t.
PROBLEMS AND SUPPLEMENTS
3.1 Show that for any two complex numbers α and β,

|α + β|² ≤ 2|α|² + 2|β|².

3.2 Show that the Hilbert space ℓ² as in Section 3.3 is separable.
3.3 Show that any separable Hilbert space H is isomorphic to ℓ². Hint: Take a countable dense set {x_n} in H, denote its Gram-Schmidt orthogonalized sequence by e_n, and then show that the mapping T : H → ℓ² defined by Tx = ((x, e_n)) is an isomorphism.

3.4 Let U be a unitary operator on some Hilbert space H.

(a) Show that for any two orthogonal subspaces M and N of H we have U(M ⊕ N) = UM ⊕ UN.

(b) Show that if M and N are subspaces of H, then U(M ⊖ N) = UM ⊖ UN.
3.5 Suppose M is a subspace of a Hilbert space H and U is a unitary operator on H. Show that UP_M = P_M U if and only if UM = M.

3.6 Show that the "subspace" assumption in Theorem 3.3 is essential.
3.7 Let M and N be two subspaces of a Hilbert space H and show that M ⊆ N if and only if P_M P_N = P_M.

3.8 Let (M_n) be a sequence of subspaces of a Hilbert space H and show that [⋁(∪_n M_n)]⊥ = ∩_n M_n⊥.

3.9 Show that any separable Hilbert space has a complete orthonormal sequence. Hint: Start with a countable dense subset {x_n : n ∈ N} of H and consider its corresponding Gram-Schmidt orthogonalized set.

3.10 Consider a Hilbert space H and its subspace M spanned by an orthonormal set {e_1, e_2, …, e_n}. Show that the projection of a vector x ∈ H on M is given by

x̂ = Σ_{i=1}^n (x, e_i) e_i.
3.11 Define E : F → L²(X, F, μ) by taking E(Δ) : L²(X, F, μ) → L²(X, F, μ), for each set Δ ∈ F, to be multiplication by the characteristic function of Δ. That is, E(Δ)g = g1_Δ, or (E(Δ)g)(λ) = g(λ)1_Δ(λ). Show that E is a spectral measure (see Definition 3.20).
CHAPTER 4
STATIONARY RANDOM SEQUENCES
Here we review the pertinent facts needed about stationary random sequences, both univariate and multivariate. Half the chapter is devoted to the univariate and the other half to the multivariate case. For a good introduction to stationary univariate sequences, see Pourahmadi [183] and Brockwell and Davis [28]; for multivariate stationary sequences, see Rozanov [201], Wiener and Masani [224], [225] and Masani [152].

We recall from Chapter 1 that (weak) stationarity is defined by the conditions EX_t = m (constant) for all t and R(s, t) = Cov(X_s, X_t) = R(s − t) for all s, t ∈ Z. Here R(τ) = Cov(X_{t+τ}, X_t) is called the covariance function of X_t. Throughout this chapter we assume that all our random variables have zero mean and finite variance, and we equip the set of all these random variables with the inner product

(X, Y) = Cov(X, Y).

One of the most important properties of the covariance function of a stationary random process is its nonnegative definiteness. We omit its easy proof.

Periodically Correlated Random Sequences: Spectral Theory and Practice. By H.L. Hurd and A.G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
Definition 4.1 A complex valued function K(τ) defined on the integers is said to be nonnegative definite if K(−τ) = K̄(τ) for all τ ∈ Z and

Σ_{i,j=1}^n a_i ā_j K(t_i − t_j) ≥ 0

for any positive integer n, scalars a_1, a_2, …, a_n and integers t_1, t_2, …, t_n.

Its importance stems from Herglotz' theorem (see [28]), which says any nonnegative definite function has a spectral representation.
Theorem 4.1 (Herglotz) A complex valued function K(·) defined on the integers is nonnegative definite if and only if there exists a bounded, nondecreasing and left continuous function F on [0, 2π), which vanishes at 0, and for which

K(τ) = ∫_0^{2π} e^{iτλ} dF(λ), for any τ ∈ Z.
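One direction of Herglotz' theorem is easy to check numerically. A sketch assuming a discrete spectral distribution (the atom locations and weights below are made up): K(τ) built from nonnegative atoms is Hermitian and yields a nonnegative definite Toeplitz matrix.

```python
import numpy as np

# Discrete F with mass w_k at angle lambda_k, so
# K(tau) = sum_k w_k e^{i tau lambda_k}.
lambdas = np.array([0.7, 2.0, 5.1])
weights = np.array([1.0, 0.5, 2.0])      # nonnegative atom masses

def K(tau):
    return np.sum(weights * np.exp(1j * tau * lambdas))

# The matrix [K(t_i - t_j)] must be Hermitian nonnegative definite.
t = np.arange(6)
M = np.array([[K(i - j) for j in t] for i in t])
eigvals = np.linalg.eigvalsh(M)
```

All eigenvalues of M are nonnegative (up to rounding), and K(−τ) = K̄(τ), as Definition 4.1 requires.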
The function F is called the spectral distribution function of X_t. Some authors develop the spectral representation of stationary random processes starting with this theorem. However, because we wish to emphasize the role of unitary operators in the spectral theory for periodically correlated processes, we will start with their role in the spectral theory of univariate stationary sequences.

4.1 UNIVARIATE SPECTRAL THEORY

In this section the spectral theorem for unitary operators is used to obtain spectral representations of stationary sequences and their covariance functions. Then the isomorphism between the time and spectral domains is studied and its importance in the prediction theory of stationary random sequences is discussed.

4.1.1 Unitary Shift
Although much of the spectral theory of stationary sequences can be derived, and historically was derived, without the explicit use of the unitary shift, we believe it is the most fundamental idea present. Hence we will use it here as the basis for discussing the spectral theory of stationary sequences; later we will use it in discussing the spectral theory for periodically correlated random sequences. For any second order random sequence X_t, there is a naturally defined smallest subspace on which we can focus our attention. This subspace, called the time domain of X_t, is defined next.
Definition 4.2 The time domain H_X of a second order process X_t is the closed subspace spanned by all the vectors X_t with t ∈ Z,

H_X = sp̄{X_t : t ∈ Z},  (4.1)

where the closure is in the mean-square sense.

In the following we take m = E{X_t} to be zero but will remark later on the consequence of assuming m ≠ 0. We now show that the condition of stationarity is equivalent to the existence of a unitary shift operator.
Proposition 4.1 A second order stochastic sequence X_t is stationary if and only if there exists a unitary operator U defined on H_X for which

UX_t = X_{t+1},  (4.2)

for every t ∈ Z.

Proof. If there exists a unitary operator U for which (4.2) holds, then it is easy to see (X_t, X_s) = (UX_t, UX_s) = (X_{t+1}, X_{s+1}), which means X_t is stationary. Conversely, suppose X_t is stationary; then we define U : L_X = sp{X_t : t ∈ Z} → L_X by Uz = Σ_{j=1}^n a_j X_{t_j+1}, for any z = Σ_{j=1}^n a_j X_{t_j} in L_X. We will show that U is well defined, linear, and onto. To show it is well defined, suppose a vector z in L_X has two different expressions, which, without loss of generality, can be taken to be z = Σ_{j=1}^n a_j X_{t_j} and z = Σ_{j=1}^n b_j X_{t_j}. Then we can write

‖Σ_{j=1}^n a_j X_{t_j+1} − Σ_{j=1}^n b_j X_{t_j+1}‖² = ‖Σ_{j=1}^n a_j X_{t_j} − Σ_{j=1}^n b_j X_{t_j}‖² = 0,

since by stationarity both sides equal Σ_{j,k=1}^n (a_j − b_j)(a_k − b_k)* R(t_j − t_k). This means Σ_{j=1}^n a_j X_{t_j+1} = Σ_{j=1}^n b_j X_{t_j+1}. For linearity, suppose z = Σ_{j=1}^n a_j X_{t_j} and z′ = Σ_{j=1}^n b_j X_{t_j} are two members of L_X; then for any
scalar α we have

U(αz + z′) = Σ_{j=1}^n (αa_j + b_j)X_{t_j+1} = α Σ_{j=1}^n a_j X_{t_j+1} + Σ_{j=1}^n b_j X_{t_j+1} = αUz + Uz′.

To show U is onto, pick any z = Σ_{j=1}^n a_j X_{t_j} ∈ L_X; then we have U(Σ_{j=1}^n a_j X_{t_j−1}) = Σ_{j=1}^n a_j X_{t_j} = z. Finally, U is an isometry because for any two vectors z = Σ_{j=1}^n a_j X_{t_j} and z′ = Σ_{j=1}^n b_j X_{t_j} in L_X we can write

(Uz, Uz′) = Σ_{j,k=1}^n a_j b_k* (X_{t_j+1}, X_{t_k+1}) = Σ_{j,k=1}^n a_j b_k* (X_{t_j}, X_{t_k}) = (z, z′),

which shows U preserves the inner product as well as the norm. Therefore U can be extended by continuity (see [2]) to a unitary operator U on H_X = L̄_X, and (4.2) holds for U. ∎

Now we address the issue of m = EX_t ≠ 0. Since H_X ⊂ L² and the constant random variable 1 is also in L², then

m = ∫ X_t(ω)P(dω) = (X_t, 1).

Thus a constant mean for a process says that every X_t has the same projection onto 1. But it is not necessarily true that 1 ∈ H_X. If 1 ∈ H_X, then U1 = 1, meaning 1 is an eigenvector of U with eigenvalue 1. See the supplements for a sketch. The fuss over the case of nonzero mean can be avoided by always including the vector 1 in forming L_X and H_X; that is, define

L_X = sp{1, X_t : t ∈ Z}

and then H_X = L̄_X.

4.1.2 Spectral Representation
Let X_t be a stationary sequence with unitary shift U as defined in Section 4.1.1. Due to the Spectral Theorem for Unitary Operators (Theorem 3.7), there exists a spectral measure E on the Borel subsets of [0, 2π) for which

U^t = ∫_0^{2π} e^{itλ}E(dλ).  (4.3)

Therefore we can write

X_t = U^t X_0 = ∫_0^{2π} e^{itλ}E(dλ)X_0 = ∫_0^{2π} e^{itλ}ξ(dλ),
where the set function ξ(Δ) = E(Δ)X_0, for Δ any Borel subset of [0, 2π), turns out to be a countably additive vector measure called the random spectral measure or random measure of X_t. Furthermore, ξ(·) is orthogonally scattered in the sense that (ξ(Δ_1), ξ(Δ_2)) = 0 whenever Δ_1 ∩ Δ_2 = ∅. These properties are inherited from the corresponding properties of the spectral measure E explained in Section 3.6. We have just proved the necessity part of the following spectral representation theorem for stationary random processes.
Theorem 4.2 For a second order sequence X_t to be stationary it is necessary and sufficient that there exists a countably additive orthogonally scattered measure ξ(·) on the Borel subsets of [0, 2π) such that

X_t = ∫_0^{2π} e^{itλ}ξ(dλ).  (4.4)
Proof. To prove sufficiency suppose X_t has representation (4.4). Formally, we then have

(X_{t+τ}, X_t) = (∫_0^{2π} e^{i(t+τ)λ}ξ(dλ), ∫_0^{2π} e^{itλ}ξ(dλ)) = ∫_0^{2π} e^{iτλ}F(dλ),  (4.5)

which shows X_t is stationary. Here F is the scalar valued spectral measure of X_t defined on the Borel subsets of [0, 2π) by

F(Δ_1 ∩ Δ_2) = (ξ(Δ_1), ξ(Δ_2)) = ‖ξ(Δ_1 ∩ Δ_2)‖².

This can be made precise by interpreting (4.3) and hence (4.4) in the Riemann-Stieltjes sense. Or it can be argued using Proposition 5.8. ∎
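Theorem 4.2 can be simulated. A Monte Carlo sketch under our own assumptions (a discrete random measure with three made-up atoms; tolerances are loose statistical bounds): with orthogonal random weights, the empirical covariance depends on the lag only.

```python
import numpy as np

# X_t = sum_k e^{i t lambda_k} xi_k, with independent mean-zero complex
# weights xi_k and E|xi_k|^2 = sigma_k^2, so that
# R(tau) = sum_k sigma_k^2 e^{i tau lambda_k}.
rng = np.random.default_rng(6)
lambdas = np.array([0.5, 1.7, 3.9])
sigmas = np.array([1.0, 2.0, 0.5])
n_paths = 100_000
xi = rng.normal(size=(n_paths, 3)) + 1j * rng.normal(size=(n_paths, 3))
xi *= sigmas / np.sqrt(2)                  # now E|xi_k|^2 = sigma_k^2

def X(t):
    return xi @ np.exp(1j * t * lambdas)   # one realization of X_t per row

def emp_cov(t, s):
    return np.mean(X(t) * np.conj(X(s)))   # Monte Carlo covariance estimate

R = lambda tau: np.sum(sigmas ** 2 * np.exp(1j * tau * lambdas))
```

The estimated Cov(X_t, X_s) matches R(t − s) to sampling accuracy, i.e., the process is (weakly) stationary.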
Corresponding to the Riemann-Stieltjes interpretation of (4.3) and (4.4), the expression (4.5) can also be expressed as a Riemann-Stieltjes integral,

R(τ) = ∫_0^{2π} e^{iτλ} dF(λ),  (4.6)

with respect to a distribution function F defined on [0, 2π) by

F(λ) = ‖ξ([0, λ))‖².

The function F, which is called the spectral distribution function of X_t, is nondecreasing and left continuous. Henceforth we will use the notation F for both the spectral measure and the spectral distribution function of X_t. The integration with respect to F the measure is in the Lebesgue sense and is identified by using F(dλ), while the integration with respect to F the distribution is in the Riemann-Stieltjes sense and is identified by using dF(λ).

Nonzero Mean. Only a slight modification must be made if X_t has a nonzero mean m. Writing X_t = X_t′ + m, we see that X_t′ has zero mean. This leads us to see that ξ(·) must have an atom of weight m at λ = 0; that is, ξ(Δ) = ξ′(Δ) + m whenever {0} ⊂ Δ and otherwise ξ(Δ) = ξ′(Δ), where ξ′(·) is the random measure associated with X_t′. This leads to (X_{t+τ}, X_t) = (X′_{t+τ}, X′_t) + |m|² and F_X(Δ) = F_{X′}(Δ) + |m|² whenever {0} ⊂ Δ and otherwise F_X(Δ) = F_{X′}(Δ). We emphasize that the ξ(·) measure of a single point is a random variable, and a constant is a special random variable. So it is possible for ξ′({0}) to be a nonzero random variable, of mean zero but positive variance, and still X_t′ would have zero mean. This comment is very closely connected to the following topic.
4.1.3 Mean Ergodic Theorem

For stationary sequences, the mean ergodic theorem addresses mean square convergence, that is, convergence in L²(Ω, F, P), of

S_N = (1/(2N+1)) Σ_{t=−N}^{N} X_t.  (4.7)

More generally, we can address the convergence of

S_N(λ) = (1/(2N+1)) Σ_{t=−N}^{N} e^{−itλ} X_t  (4.8)

from the spectral representation (4.4).

Proposition 4.2 If X_t is a stationary sequence, then in the L² sense

lim_{N→∞} S_N(λ) = ξ({λ}).  (4.9)

Proof. This follows immediately from

S_N(λ) = ∫_0^{2π} d_N(ν − λ) ξ(dν),

where the Dirichlet kernel d_N(x) is bounded, |d_N(x)| ≤ 1, is continuous for every N, and converges to 1_{{0}}(x). The convergence S_N(λ) → ξ({λ}) in L² follows from

‖S_N(λ) − ξ({λ})‖² = ∫_0^{2π} |d_N(ν − λ) − 1_{{λ}}(ν)|² F(dν) → 0. ∎
Definition 4.3 Any stationary sequence for which

lim_{N→∞} S_N(0) = m,  (4.10)

in the mean square or L²(Ω, F, P) sense, is called mean ergodic.

Thus from Proposition 4.2 and the discussion in the preceding section, we can see that S_N(0) → m if and only if the atom of ξ(·) at {0} is the constant random variable m. This also means that F({0}) = |m|², for if F({0}) > |m|², then ξ({0}) = X + m, where X has mean zero but positive variance. In other words, the sequence is mean ergodic (S_N(0) → m) if and only if F({0}) = |m|², meaning the atom at λ = 0 is only large enough to account for the mean; there is no random component having positive variance at λ = 0.
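A simulated sketch of the failure of mean ergodicity (our own toy construction): give ξ(·) an atom at λ = 0 with positive variance by adding a mean-zero random offset V that is fixed along each path. The time average of one path then converges to m + V rather than to m.

```python
import numpy as np

rng = np.random.default_rng(7)
m = 3.0
V = rng.normal()                         # mean-zero atom, fixed on the path
X = m + V + rng.normal(size=100_000)     # one path: random offset + noise
time_avg = X.mean()                      # tends to m + V, not m

# Averaging across many independent paths (each with its own V) does
# recover the ensemble mean m.
paths = m + rng.normal(size=(5000, 1)) + rng.normal(size=(5000, 50))
ensemble_avg = paths[:, 0].mean()
```

This is the distinction drawn above: the noise component is mean ergodic, but the random component at λ = 0 makes the full sequence fail to be.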
4.1.4 Spectral Domain
Let X_t be a stationary sequence with spectral measure F and let L²(F) denote the set of all complex valued Borel measurable functions on [0, 2π) that are square integrable with respect to the measure F. That is,

L²(F) = {f : [0, 2π) → C with ∫_0^{2π} |f(λ)|² F(dλ) < ∞}.  (4.11)

L²(F) equipped with the inner product

(f, g) = ∫_0^{2π} f(λ) ḡ(λ) F(dλ)  (4.12)

becomes a Hilbert space, which is called the spectral domain of X_t. The spectral representation of Theorem 4.2 establishes a bridge between the time and spectral domains of X_t. To see this, from the spectral representation of X_t it is natural to consider the transformation V, which maps each finite linear combination Σ_{j∈J} c_j X_{t_j} in L_X to Σ_{j∈J} c_j e^{iλt_j} in its spectral domain L²(F). It is easy to see from

‖Σ_{j∈J} c_j X_{t_j}‖² = Σ_{j,k∈J} c_j c_k* R(t_j − t_k) = ∫_0^{2π} |Σ_{j∈J} c_j e^{iλt_j}|² F(dλ)

that V is a well defined isometry. Hence V can be extended to an isometric isomorphism, called the Kolmogorov isomorphism, from H_X onto L²(F). This Kolmogorov isomorphism, as we see later in this chapter, allows us to transfer questions regarding a stationary sequence from the time domain to corresponding questions in the spectral domain. Then, using Fourier analysis, we analyze the question in the spectral domain and transfer our findings back to the time domain.
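The isometry behind the Kolmogorov isomorphism can be checked directly for a discrete spectral measure (atom masses and angles below are made up for the sketch):

```python
import numpy as np

# F has mass Fmass_k at angle lambda_k, so R(tau) = sum_k Fmass_k e^{i tau lambda_k}.
lambdas = np.array([0.4, 1.3, 2.9])
Fmass = np.array([2.0, 0.7, 1.1])

def R(tau):
    return np.sum(Fmass * np.exp(1j * tau * lambdas))

c = np.array([1.0, -2.0, 0.5j])       # coefficients of sum_j c_j X_{t_j}
ts = np.array([0, 1, 4])

# time-domain norm^2 via the covariance function
lhs = sum(c[j] * np.conj(c[k]) * R(ts[j] - ts[k])
          for j in range(3) for k in range(3))

def img(lam):                          # image V(sum c_j X_{t_j}) at lambda
    return np.sum(c * np.exp(1j * lam * ts))

# spectral-domain norm^2 in L^2(F)
rhs = sum(Fmass[k] * abs(img(lambdas[k])) ** 2 for k in range(3))
```

The two norms agree exactly, which is the isometry displayed above.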
4.2 UNIVARIATE PREDICTION THEORY
A predictor¹ X̂_t for X_t, based on some subset S = {X_{j_1}, X_{j_2}, …}, is a random variable that is close to X_t in some acceptable sense. Typically, we make the error, X_t − X̂_t, as small as possible in the chosen sense. For example, we can choose X̂_t so that either P(|X_t − X̂_t| ≥ ε) or E|X_t − X̂_t|² is smallest. Here we will address only linear least-squares prediction, so that X̂_t is a linear function of the elements of S that minimizes E|X_t − X̂_t|². The problem is solved by the projection of X_t onto M(S), the Hilbert subspace spanned by the elements of S. Here we examine two cases that have conceptual and practical significance: first, when S is the infinite set {X_s : s ≤ t} and we wish to predict the element X_{t+1}, and second, when S is a finite contiguous sequence of n elements {X_{t−n+1}, X_{t−n+2}, …, X_t} and we wish to predict X_{t−n} (backward) and X_{t+1} (forward). There are other cases of theoretical and practical interest, but these two permit us to discuss the main ideas and provide the basis for extending them to the case of PC random sequences. We do not pursue finding the complete solution to the case when S is the infinite past, as to do so would take us a little too far from our main direction. However, we will give appropriate references for interested readers.

4.2.1 Infinite Past, Regularity and Singularity
We begin with some subspaces of the time domain H_X that are important for discussing prediction on an infinite past. In the following discussion, the closure is with respect to the inner product defined on H_X.

Definition 4.4 Let X_t be any second order random sequence. Its (linear) past up to and including time t is defined to be the subspace

H(t) = sp̄{X_s : s ≤ t}  (4.14)

generated by all the vectors X_s with s ≤ t, and its remote past is defined to be the subspace

H(−∞) = ∩_{t∈Z} H(t).  (4.15)
When the context requires it, we will add the additional notation H_X, H_X(t), and H_X(−∞) to signify the random sequence X_t in question. We note that H(t) is nondecreasing with respect to t, that is, H(t) ⊇ H(s) for t ≥ s, and for this reason we can also express

H(−∞) = ∩_t H(t) = ∩_k H(t_k),

where {t_k} is any sequence of integers that converges to −∞.

¹In some disciplines, X̂_t is called the predictand and S the predictors.
Definition 4.5 A stationary random sequence is called purely nondeterministic or regular if

H(−∞) = {0}

and is called deterministic or singular if

H(−∞) = H,

or equivalently, H(s) = H(t) for all s, t ∈ Z.
When we are working with a second order random sequence X_t, it is natural to take the predictor X̂_{t+1} of X_{t+1}, based on the past of the process up to and including time t, to be that random variable in H(t) which generates the least error. It is natural to work with the linear past as the set of acceptable predictors and with mean-square error as the criterion for goodness because the solution to the prediction problem is then given by the Projection Theorem 3.3: that is, X̂_{t+1} is simply the projection (X_{t+1} | H(t)) of X_{t+1} on H(t). The Projection Theorem also ensures that the predictor X̂_{t+1} (as a vector) is unique, although its implementation may not be unique, especially in nonstationary situations to be addressed later. Another important reason for considering the linear least-squares predictor is the fact that in the very important Gaussian case, the nonlinear and linear predictors become identical. See [183] for a good introductory discussion of nonlinear prediction.
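The finite-past predictor is a direct application of the normal equations from Example 1 of Section 3.5. A sketch for an assumed AR(1)-type covariance R(τ) = φ^|τ| (so the known optimal one-step predictor is φX_t, an assumption of this example):

```python
import numpy as np

# Predict X_{t+1} from X_t, ..., X_{t-n+1}: X_hat = sum_k a_k X_{t-k},
# where the Projection Theorem gives the normal equations
# sum_k a_k R(j - k) = R(j + 1), j = 0, ..., n-1.
phi = 0.6
n = 4
R = lambda tau: phi ** abs(tau)

G = np.array([[R(j - k) for k in range(n)] for j in range(n)])
b = np.array([R(j + 1) for j in range(n)])
a = np.linalg.solve(G, b)               # predictor coefficients
```

For this covariance the solution collapses to a = (φ, 0, …, 0), and the mean-square prediction error R(0) − Σ_k a_k R(k + 1) equals 1 − φ².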
4.2.2 Wold Decomposition

Some important results in prediction on the infinite past arise from the relationship between the propagating unitary operator $U$ defined in (4.2) and the subspaces we have just defined. At this time we give the following relationships.

Lemma 4.1 If $X_t$ is second order stationary with unitary shift $U$, then
(a) $\mathcal{H}(t+1) = U\mathcal{H}(t)$;
(b) $\mathcal{H} = U\mathcal{H}$;
(c) $\mathcal{H}(-\infty) = U\mathcal{H}(-\infty)$.
Proof. For (a), recall that for any mapping $A : \mathcal{H} \to \mathcal{H}$ and subset $M \subset \mathcal{H}$, $AM = \{y \in \mathcal{H} : y = Ax, \, x \in M\}$. Thus taking $M$ to be $L(t) = sp\{X_s : s \le t\}$, we have $L(t+1) = UL(t)$; for if $z \in L(t)$, then $z = \sum_{j=1}^{n} c_j X_{t_j}$, $t_j \le t$, so that $Uz = \sum_{j=1}^{n} c_j X_{t_j+1} \in L(t+1)$,

UNIVARIATE PREDICTION THEORY

which means $UL(t) \subset L(t+1)$. Now let $z \in U\mathcal{H}(t)$, so $z = Uw$ for some $w \in \mathcal{H}(t)$. Then $w = \lim w_n$ for $w_n \in L(t)$. By continuity of $U$ we can write $z = Uw = U(\lim w_n) = \lim Uw_n$. But since $Uw_n \in L(t+1)$ for all $n$, we conclude that $z \in \overline{L(t+1)} = \mathcal{H}(t+1)$ and so $U\mathcal{H}(t) \subset \mathcal{H}(t+1)$. A similar argument using the continuity of $U^{-1}$ produces $\mathcal{H}(t+1) \subset U\mathcal{H}(t)$. The proof of (b) is essentially the same. For (c), first suppose $z \in \mathcal{H}(-\infty)$, which means $z \in \mathcal{H}(t)$ for all $t$. But then $z \in \mathcal{H}(t+1) = U\mathcal{H}(t)$, so for each $t$ there is some $y_t \in \mathcal{H}(t)$ with $z = Uy_t$. But since $U$ is one-to-one, $y_t = y$ for some $y$ and all $t \in \mathbb{Z}$. It is clear that $y \in \mathcal{H}(-\infty)$ and $z = Uy$, which implies $z \in U\mathcal{H}(-\infty)$. ∎
Theorem 4.3 (Wold Decomposition Theorem) Any second order stationary sequence $X_t$ has a unique decomposition
$$X_t = Y_t + Z_t \qquad (4.16)$$
in terms of two orthogonal stationary sequences $Y_t$ and $Z_t$ such that
(a) $\mathcal{H}_X(t) = \mathcal{H}_Y(t) \oplus \mathcal{H}_Z(t)$;
(b) $\mathcal{H}_X(t) = \mathcal{H}_Y(-\infty) \oplus \mathcal{H}_Z(t)$;
(c) $Y_t$ is deterministic and $Z_t$ is purely nondeterministic;
(d) $U_Y = U_X|_{\mathcal{H}(-\infty)}$ and $U_Z = U_X|_{\mathcal{H}(-\infty)^\perp}$.
Proof. For each $t \in \mathbb{Z}$ set
$$Y_t = (X_t \mid \mathcal{H}(-\infty)) \quad \text{and} \quad Z_t = X_t - Y_t. \qquad (4.17)$$
On one hand, for each $s$, $Y_s \in \mathcal{H}_X(-\infty)$, and on the other hand, $Z_t = X_t - Y_t = X_t - (X_t \mid \mathcal{H}_X(-\infty)) \perp \mathcal{H}_X(-\infty)$ for each $t$. Hence $Y_s \perp Z_t$ for all $s, t \in \mathbb{Z}$, which means $\mathcal{H}_Y \perp \mathcal{H}_Z$. For each $t \in \mathbb{Z}$, $Y_t \in \mathcal{H}_X(-\infty) \subset \mathcal{H}_X(t)$ and hence $Z_t = X_t - Y_t \in \mathcal{H}_X(t)$, and therefore $\mathcal{H}_Y(t) \oplus \mathcal{H}_Z(t) \subset \mathcal{H}_X(t)$. In order to complete the proof of (a), it suffices to show that $\mathcal{H}_X(t) \subset \mathcal{H}_Y(t) \oplus \mathcal{H}_Z(t)$. To do this, pick some $w \in \mathcal{H}_X(t)$; then $w = \lim w_n$, where each vector $w_n \in L_X(t)$ and hence $w_n = u_n + v_n$, $u_n \in L_Y(t)$, $v_n \in L_Z(t)$. But since $w_n$ must be Cauchy and $\|u_n - u_m\| \le \|w_n - w_m\|$, and similarly for $v_n$, then $u_n \to u \in \mathcal{H}_Y(t)$ and $v_n \to v \in \mathcal{H}_Z(t)$. Taking the limit of $w_n = u_n + v_n$, we get $w = u + v$, which shows $\mathcal{H}_X(t) \subset \mathcal{H}_Y(t) \oplus \mathcal{H}_Z(t)$. This completes the proof of (a). For (b), it is sufficient to show
$$\mathcal{H}_X(-\infty) = \mathcal{H}_Y(t), \quad t \in \mathbb{Z}. \qquad (4.18)$$
From the definition of $Y_t$ it is clear that $\mathcal{H}_Y(t) \subset \mathcal{H}_X(-\infty)$. If this subset were proper for some $t$, then there must exist a nonzero vector $u \in \mathcal{H}_X(-\infty) \ominus \mathcal{H}_Y(t)$. In that case, for each $s \le t$, we have $u \perp Y_s$ and $u \perp Z_s$, so that $u \perp X_s = Y_s + Z_s$ and hence $u \perp \mathcal{H}_X(t)$, and therefore $u \perp \mathcal{H}_X(-\infty)$. This implies $u = 0$, which is a contradiction. Singularity of $Y_t$ in (c) is an immediate consequence of (4.18). Taking the intersection, over all integers $t$, of both sides of the equality in item (b) implies $\mathcal{H}_X(-\infty) = \mathcal{H}_Y(-\infty) \oplus \mathcal{H}_Z(-\infty)$. From this and (4.18) one gets $\mathcal{H}_Z(-\infty) = \{0\}$, which means $Z_t$ is regular. For (d), using the definition (4.17) of $Y_t$ yields
$$Y_{t+1} = (X_{t+1} \mid \mathcal{H}_X(-\infty)) = (U_X(X_t) \mid \mathcal{H}_X(-\infty)) = U_X(X_t \mid \mathcal{H}_X(-\infty)) = U_X Y_t$$
for any $t \in \mathbb{Z}$. This and Proposition 4.1 imply that $Y_t$ is stationary. This also shows that $U_X$ and $U_Y$ both act on $\mathcal{H}_Y$, which together with the fact that $\mathcal{H}_X(-\infty) = \mathcal{H}_Y$, proved above, gives $U_Y = U_X|_{\mathcal{H}(-\infty)}$. The statements about $Z_t$ follow easily. ∎
4.2.3 Innovation Subspaces

Since innovation refers to something entirely new, it is both natural and important to consider the innovation subspace $\mathcal{I}_X(t)$ at each time $t$. We start by defining several notions that we will need later in this chapter.

Definition 4.6 (Innovations and their subspaces)
(a) An uncorrelated sequence $\xi_t$ of random variables with mean $m$ and variance $\sigma^2$ is called a white noise and is denoted by $WN(m, \sigma^2)$. If $m = 0$ and $\sigma^2 = 1$, then the sequence is called a normalized white noise.
(b) A sequence $e_t$ of vectors in a Hilbert space is called orthogonal if $(e_t, e_s) = 0$ whenever $s \ne t$, and it is called orthonormal if $(e_t, e_s) = \delta_{s-t}$.
(c) The innovation of a second order process $X_t$ at time $t$ is defined to be
$$\zeta_t = X_t - P_{\mathcal{H}(t-1)} X_t.$$
(d) The innovation space of a second order sequence $X_t$ at time $t$ is defined to be
$$\mathcal{I}_X(t) = \mathcal{H}_X(t) \ominus \mathcal{H}_X(t-1) = \{x \in \mathcal{H}_X(t) : x \perp \mathcal{H}_X(t-1)\}.$$
The preceding expression can be written as
$$\mathcal{H}_X(t) = \mathcal{I}_X(t) \oplus \mathcal{H}_X(t-1), \qquad (4.19)$$
which when iterated seems to suggest that we can express the entire history of $X_t$ as
$$\mathcal{H}_X(t) = \bigoplus_{j=0}^{\infty} \mathcal{I}_X(t-j).$$
But this is not quite correct, because some nonzero vectors $x \in \mathcal{H}_X(-\infty)$ do not necessarily belong to $\bigoplus_{j \le t} \mathcal{I}_X(j)$. This suggests the following modification:
$$\mathcal{H}_X(t) = \mathcal{H}_X(-\infty) \oplus \bigoplus_{j=0}^{\infty} \mathcal{I}_X(t-j),$$
which turns out to be true. We will now make this a bit more precise while presenting some basic facts about $\mathcal{I}_X(t)$ that are important for understanding its role in prediction.
Lemma 4.2 If $X_t$ is a second order stationary sequence with unitary shift $U$, then for every integer $t$,
(a) $\mathcal{I}_X(t+1) = U\mathcal{I}_X(t)$;
(b) $\mathcal{I}_X(t) \subset \mathcal{H}_X(t)$ and $\mathcal{I}_X(t) = \mathcal{I}_Z(t)$, where $Z_t$ is the regular component of $X_t$;
(c) $\mathcal{H}_X(t) = \mathcal{H}_X(-\infty) \oplus \bigoplus_{j=0}^{\infty} \mathcal{I}_X(t-j)$;
(d) $\dim \mathcal{I}_X(t) \le 1$, with $\dim \mathcal{I}_X(t) = 0$ if and only if $X_t$ is singular.
Proof. Part (a) is clear from
$$\mathcal{I}_X(t+1) = \mathcal{H}_X(t+1) \ominus \mathcal{H}_X(t) = U\mathcal{H}_X(t) \ominus U\mathcal{H}_X(t-1) = U[\mathcal{H}_X(t) \ominus \mathcal{H}_X(t-1)] = U\mathcal{I}_X(t),$$
where the first equality in the last row follows from Problem 3.4 at the end of Chapter 3. The first part of (b) is obvious. For the second part, applying part (a) of the Wold Decomposition Theorem 4.3 followed by its part (c), we get
$$\begin{aligned}
\mathcal{I}_X(t) &= \mathcal{H}_X(t) \ominus \mathcal{H}_X(t-1) \\
&= [\mathcal{H}_Y(t) \oplus \mathcal{H}_Z(t)] \ominus [\mathcal{H}_Y(t-1) \oplus \mathcal{H}_Z(t-1)] \\
&= [\mathcal{H}_Y(t) \ominus \mathcal{H}_Y(t-1)] \oplus [\mathcal{H}_Z(t) \ominus \mathcal{H}_Z(t-1)] \\
&= \{0\} \oplus [\mathcal{H}_Z(t) \ominus \mathcal{H}_Z(t-1)] = \mathcal{I}_Z(t).
\end{aligned}$$
The proof of (c) is left to the reader as an exercise. For (d), if $X_t$ is singular, then $X_t \in \mathcal{H}_X(t-1)$ and hence $\dim \mathcal{I}_X(t) = 0$. Now if $X_t$ is not singular, then the
regular component $Z_t$ of the Wold decomposition (4.16) must have positive length, that is, $\|Z_t\| > 0$ for all $t \in \mathbb{Z}$. Furthermore, the innovation vector
$$\zeta_t = X_t - P_{\mathcal{H}_X(t-1)} X_t = Z_t - P_{\mathcal{H}_Z(t-1)} Z_t \in \mathcal{I}_Z(t) = \mathcal{I}_X(t) \qquad (4.20)$$
cannot be null. Thus $\mathcal{I}_X(t) = \mathcal{I}_Z(t)$ contains at least one nonzero vector. That means $\dim \mathcal{I}_X(t) \ge 1$. Suppose $\dim \mathcal{I}_X(t) > 1$; then there would be a vector $w \in \mathcal{I}_X(t)$ that is orthogonal to $\zeta_t$. Hence $w \perp X_t - P_{\mathcal{H}_X(t-1)} X_t$ and also $w \perp \mathcal{H}_X(t-1)$, which leads to $w \perp \mathcal{H}_X(t)$, a contradiction. Note that it is also clear from part (a) that $\dim \mathcal{I}_X(t)$ is constant with respect to $t$. ∎

It is clear from the preceding that $\mathcal{I}_X(t) = sp\{\zeta_t\}$, and we denote the prediction error variance by $\sigma_X^2(t) = \mathrm{Var}\,(\zeta_t)$. Note that $\sigma_X(t) > 0$ if and only if $X_t$ has a nontrivial regular part. The least square predictor $\hat{X}_t$ of $X_t$ based on its past $\mathcal{H}_X(t-1)$ is defined to be the orthogonal projection of $X_t$ on $\mathcal{H}_X(t-1)$. The following additional facts about innovations are true in a trivial way (i.e., with $\sigma_X(t) = 0$) if $X_t$ is deterministic or singular. However, the more interesting situation for prediction purposes is when the sequence is nondeterministic, which is equivalent to $\sigma_X(t) \ne 0$.
Lemma 4.3 Let $\zeta_t$ be the innovation sequence of a stationary sequence $X_t$. Then
(a) $\zeta_t$ is stationary and has the same shift as $X_t$; hence $X_t$ and $\zeta_t$ are jointly stationary and $\sigma_X(t) = \sigma$;
(b) $\zeta_t$ is an orthogonal sequence with constant variance, that is, $(\zeta_s, \zeta_t) = \sigma^2 \delta_{s-t}$ for any $s, t \in \mathbb{Z}$;
(c) for any integer $t$ and any positive integer $k$,
$$(\zeta_t, X_{t-k}) = 0 \quad \text{and} \quad (\zeta_t, X_t) = \sigma^2,$$
which in particular shows that any future innovation is orthogonal to the past of the sequence up to that point.

Proof. For (a), let $U$ denote the shift operator of the stationary sequence $X_t$. Since $U$ commutes with the projection onto $\mathcal{H}(t)$, for every $t$,
$$\zeta_{t+1} = X_{t+1} - P_{\mathcal{H}(t)} X_{t+1} = UX_t - P_{\mathcal{H}(t)} UX_t = U(X_t - P_{\mathcal{H}(t-1)} X_t) = U\zeta_t,$$
showing that $U$ serves as a unitary shift for $\zeta_t$, and therefore $\zeta_t$ is stationary with variance $\sigma^2$. For (b), if $s < t$, then $\zeta_s \in \mathcal{H}_X(s)$ and $\zeta_t = X_t - \hat{X}_t \perp \mathcal{H}_X(s)$. Therefore $\zeta_s \perp \zeta_t$. For (c), note that for any $k \ge 1$, $X_{t-k} \in \mathcal{H}_X(t-1)$ and $\zeta_t = X_t - \hat{X}_t$ is orthogonal to $\mathcal{H}_X(t-1)$. Therefore we have $(\zeta_t, X_{t-k}) = 0$. Furthermore, from the definition of $\sigma^2$ in part (a), we have $(\zeta_t, X_t) = (\zeta_t, \zeta_t + \hat{X}_t) = \sigma^2$. ∎

Next, we study the relationship between regular stationary sequences, white noise, orthonormal sequences, and innovation sequences. The very elementary, yet important, result is that any white noise sequence is regular. We begin with a lemma.

Lemma 4.4
(a) Any orthogonal sequence $e_t$ is regular.
(b) Any white noise $\xi_t$ is regular.
(c) The innovation $\zeta_t$ of any stationary sequence $X_t$ is regular.
Proof. Since all white noises and all innovation sequences are orthogonal, we need only prove the first statement. Take any vector $Y$ in $\mathcal{H}_e(-\infty) \subset \mathcal{H}_e$. Since $\{e_t\}$ forms an orthonormal basis for $\mathcal{H}_e$, we have the expansion $Y = \sum_{t=-\infty}^{\infty} a_t e_t$, with $a_t = (Y, e_t)$. On the other hand, for each integer $t$, the expression
$$Y \in \mathcal{H}_e(-\infty) \subset \mathcal{H}_e(t-1) \subset \mathcal{H}_e(t),$$
in conjunction with $e_t \perp \mathcal{H}_e(t-1)$, implies that $a_t = (Y, e_t) = 0$ for each integer $t$. Therefore $Y = 0$. ∎

Using this lemma one can characterize regular sequences.

Proposition 4.3 (Moving Average Representation) A second order random sequence $X_t$ is stationary and regular if and only if it has a one sided moving average expansion
$$X_t = \sum_{j \ge 0} a_j e_{t-j}, \quad t \in \mathbb{Z}, \quad \text{with} \quad \sum_{j \ge 0} |a_j|^2 < \infty, \qquad (4.21)$$
with respect to an orthonormal sequence $e_t$.
Proof. If $X_t$ satisfies (4.21), then it is clear that $X_t$ is an $L^2$ random variable for every $t$. Taking $t \ge s$ and observing that
$$R(t, s) = (X_t, X_s) = \sum_{j \ge 0} a_{j+t-s} \overline{a_j} = R(t+1, s+1) \qquad (4.22)$$
shows that $X_t$ is stationary. Since clearly
$$\mathcal{H}_X(t) \subset \mathcal{H}_e(t) = \overline{sp}\{e_s : s \le t\},$$
we can write
$$\bigcap_{t} \mathcal{H}_X(t) \subset \bigcap_{t} \mathcal{H}_e(t) = \{0\},$$
showing that $X_t$ is regular. Conversely, suppose $X_t$ is a regular stationary sequence. The innovation spaces $\mathcal{I}_X(t)$, $t \in \mathbb{Z}$, defined in (4.19) are one dimensional and hence $\mathcal{I}_X(t) = sp\{\zeta_t\}$, where $\zeta_t = X_t - P_{\mathcal{H}(t-1)} X_t$ is the innovation of $X_t$ at time $t$. Since $X_t$ is regular, part (c) of Lemma 4.2 reduces to $\mathcal{H}_X(t) = \bigoplus_{j=0}^{\infty} \mathcal{I}_X(t-j)$. Therefore any vector $Y \in \mathcal{H}_X(t)$ can be represented as a series in the innovations. In particular, since $X_t \in \mathcal{H}_X(t)$, it must have a representation
$$X_t = \sum_{j \ge 0} a_j(t) \zeta_{t-j}.$$
Now on the one hand,
$$X_{t+1} = \sum_{j \ge 0} a_j(t+1) \zeta_{t+1-j},$$
and on the other hand,
$$X_{t+1} = UX_t = U \sum_{j \ge 0} a_j(t) \zeta_{t-j} = \sum_{j \ge 0} a_j(t) \zeta_{t+1-j}.$$
In the last equation, $U$ moved inside the summation because of the mean-square convergence of the partial sums and the continuity of $U$. By virtue of the uniqueness of
such representations, for each $j$, $a_j(t)$ must be constant in $t$. That is, $a_j(t) = a_j$, and we get the one sided moving average representation
$$X_t = \sum_{j \ge 0} a_j \zeta_{t-j}. \qquad (4.23)$$
∎
For another characterization of regular stationary processes see Problem 4.9. Since we usually observe the process $X_t$ itself and not its innovations, it may seem pointless, from a practical viewpoint, to express the predictor $\hat{X}_\delta$ in terms of innovations, and that is true. However, expressing the predictor this way permits us to evaluate the error variance $\|X_\delta - \hat{X}_\delta\|^2$ as a function of $\delta$. We begin with expressing the predictor for a purely nondeterministic sequence.
Proposition 4.4 If $X_t$ is a regular stationary sequence with one sided moving average representation (4.23) in terms of its innovations $\zeta_t$, then the $\delta$-step ahead predictor of $X_\delta$ based on its past, namely, $\hat{X}_\delta = (X_\delta \mid \mathcal{H}_X(0))$, is given by
$$\hat{X}_\delta = \sum_{j=\delta}^{\infty} a_j \zeta_{\delta-j}, \qquad (4.24)$$
and the resulting prediction error, given by
$$X_\delta - \hat{X}_\delta = \sum_{j=0}^{\delta-1} a_j \zeta_{\delta-j}, \qquad (4.25)$$
has variance
$$\sigma^2(\delta) = \sigma^2 \sum_{j=0}^{\delta-1} |a_j|^2. \qquad (4.26)$$

Proof. It is immediate from (4.23) that $\mathcal{H}_X(0) \subset \mathcal{H}_\zeta(0)$, and it is equally immediate from $\zeta_t = X_t - P_{\mathcal{H}(t-1)} X_t$ that $\mathcal{H}_\zeta(0) \subset \mathcal{H}_X(0)$. Thus we have $\mathcal{H}_\zeta(0) = \mathcal{H}_X(0)$. Therefore $\sum_{j=\delta}^{\infty} a_j \zeta_{\delta-j}$, which clearly belongs to $\mathcal{H}_\zeta(0)$, is in $\mathcal{H}_X(0)$. On the other hand, $\sum_{j=0}^{\delta-1} a_j \zeta_{\delta-j}$, which is clearly orthogonal to $\mathcal{H}_\zeta(0)$, is orthogonal to $\mathcal{H}_X(0)$. Therefore by the projection theorem we have
$$\hat{X}_\delta = (X_\delta \mid \mathcal{H}_X(0)) = \sum_{j=\delta}^{\infty} a_j \zeta_{\delta-j}.$$
The error formulas are now immediate. ∎
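As a quick numerical illustration of (4.24)–(4.26), the following sketch computes the $\delta$-step error variance $\sigma^2 \sum_{j<\delta} |a_j|^2$ for a hypothetical coefficient sequence $a_j = 0.5^j$ with unit innovation variance (these values are illustrative only, not from the text):

```python
import numpy as np

def prediction_error_variance(a, sigma2, delta):
    """delta-step prediction error variance sigma^2 * sum_{j=0}^{delta-1} |a_j|^2,
    per (4.26), for a regular sequence X_t = sum_j a_j zeta_{t-j}."""
    a = np.asarray(a, dtype=float)
    return sigma2 * np.sum(np.abs(a[:delta]) ** 2)

# Hypothetical moving average coefficients a_j = 0.5**j with sigma^2 = 1.
a = 0.5 ** np.arange(60)
errors = [prediction_error_variance(a, 1.0, d) for d in range(1, 61)]
# errors[0] is the one-step error sigma^2 * a_0^2; as delta grows, the error
# variance increases toward Var(X_t) = sigma^2 * sum_j a_j^2 = 4/3 here.
```

The monotone growth of the error with the horizon $\delta$ is exactly the content of (4.26): each additional step exposes one more innovation term that cannot be predicted.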
Corollary 4.4.1 If $X_t$ is a stationary sequence having Wold decomposition $X_t = Y_t + Z_t$ (see Theorem 4.3), with $Y_t$ its deterministic and $Z_t = \sum_{j \ge 0} a_j \zeta_{t-j}$ its regular component, then the $\delta$-step ahead predictor of $X_\delta$ based on its past $\ldots, X_{-1}, X_0$ is given by
$$\hat{X}_\delta = Y_\delta + \sum_{j=\delta}^{\infty} a_j \zeta_{\delta-j}, \qquad (4.27)$$
with (4.25) and (4.26) remaining valid.

Proof. In this case, $\mathcal{H}_X(0) = \mathcal{H}_X(-\infty) \oplus \mathcal{H}_Z(0)$, and so $Y_\delta + \sum_{j=\delta}^{\infty} a_j \zeta_{\delta-j}$ belongs to $\mathcal{H}_X(0)$ while $\sum_{j=0}^{\delta-1} a_j \zeta_{\delta-j}$ is orthogonal to $\mathcal{H}_X(0)$. ∎
The next lemma shows that in principle the moving average coefficients $\{a_j : j \ge 0\}$ of $X_t$ can be found in terms of its autocorrelation function $R(t)$. In what follows we use the convention $a_j = 0$ for $j < 0$.

Lemma 4.5 If $X_t$ has the one sided moving average representation $X_t = \sum_{j \ge 0} a_j \zeta_{t-j}$ in terms of its innovation sequence, then
$$R(t) = \sigma^2 \sum_{j=0}^{\infty} a_{j+t} \overline{a_j}, \quad t \in \mathbb{Z},$$
or in matrix form
$$R = \Gamma \Gamma^*. \qquad (4.28)$$
Here, for each $i, j \in \mathbb{Z}$, the $ij$th entries of the matrices $R$ and $\Gamma$ are defined by $R^{ij} = R(j-i)$ and $\Gamma^{ij} = \sigma a_{i-j}$, respectively.

Thus the moving average coefficients, and hence the predictor coefficients $\{a_j\}$, of a regular sequence can be obtained from the triangular or Cholesky factorization (4.28) of its covariance matrix $R$. See Sections 4.2.5.3 and 8.5.3.3 for further discussion, including proofs, of Cholesky factorizations of positive definite and nonnegative definite matrices.
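As a sketch of (4.28) in practice, one can build a finite section of the covariance matrix of a hypothetical one sided moving average (here an invertible MA(2) with coefficients $a = (1, 0.6, 0.3)$ and $\sigma^2 = 1$; these numbers are illustrative, not the book's) and verify that its Cholesky factor is lower triangular with $R = \Gamma\Gamma'$:

```python
import numpy as np

# Hypothetical one-sided MA coefficients (a_0, a_1, a_2), innovation variance 1.
a = np.array([1.0, 0.6, 0.3])

def R(t):
    """Autocovariance R(t) = sigma^2 * sum_j a_{j+|t|} a_j, per Lemma 4.5."""
    t = abs(t)
    return float(np.dot(a[t:], a[:len(a) - t])) if t < len(a) else 0.0

n = 8
Rmat = np.array([[R(j - i) for j in range(n)] for i in range(n)])
Gamma = np.linalg.cholesky(Rmat)   # lower triangular factor with Rmat = Gamma Gamma'
```

For large $n$, the bottom rows of the finite Cholesky factor approach the (reversed) moving average coefficients, which is the sense in which the $\{a_j\}$ "can be obtained from" the factorization.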
4.2.4 Spectral Theory and Prediction
Here we discuss how, using the Kolmogorov isomorphism (4.13), one can translate some prediction problems to spectral domain problems. However, we will omit the development of the optimal predictor in the spectral domain. Although interesting, this would take us a bit too far from the most practical issues of prediction, namely, finite prediction. A complete coverage of infinite past prediction can be found in Doob [49], Wiener and Masani [224,225], and Masani [152,153]. Our presentation of infinite prediction theory is the same as developed in the Wiener and Masani references. Recall the spectral representations (4.5) and (4.6):
$$R(t) = \int_0^{2\pi} e^{it\lambda} F(d\lambda) = \int_0^{2\pi} e^{it\lambda}\, dF(\lambda),$$
where the spectral distribution function $F(\lambda)$ of the stationary sequence $X_t$ was chosen to be nondecreasing and left continuous on $[0, 2\pi)$ with $F(0) = 0$. By well known properties of such functions (e.g., see [195, Chapter 5, Theorem 2]), $F$ has a derivative $F'(\lambda)$ (in the usual sense) for a.e. $\lambda$, which is nonnegative valued and belongs to $L^1[0, 2\pi)$, and we have
$$F(\lambda) - F(0) \ge \int_0^{\lambda} F'(\theta)\, d\theta, \quad \text{for all } \lambda \in [0, 2\pi).$$
If the equality holds for all $\lambda \in [0, 2\pi)$, we say $F(\lambda)$ is absolutely continuous, denote its derivative by $f(\lambda)$, and call it the spectral density of the stationary sequence $X_t$. If this is the case, then
$$R(t) = (X_t, X_0) = E\{X_t \overline{X_0}\} = \int_0^{2\pi} e^{it\lambda} f(\lambda)\, d\lambda,$$
which means the spectral measure $F(d\lambda)$ of $X_t$ is also absolutely continuous (a.c.) with respect to Lebesgue measure and its Radon-Nikodym derivative is $f(\lambda)$. The next theorem gives sufficient conditions for a process to have a moving average representation.
Theorem 4.4
(a) A stationary process $X_t$ has a moving average representation
$$X_t = \sum_{k=-\infty}^{\infty} b_k \xi_{t-k}, \quad \text{with} \quad \sum_{k=-\infty}^{\infty} |b_k|^2 < \infty,$$
in terms of a white noise $\xi_t$, if and only if its spectral measure is absolutely continuous. If this is the case, then
$$f(\lambda) = \frac{\sigma^2}{2\pi} |\varphi(e^{-i\lambda})|^2, \quad \text{with} \quad \varphi(e^{-i\lambda}) = \sum_{k=-\infty}^{\infty} b_k e^{-ik\lambda}.$$
(b) If the moving average is one sided, that is, $b_k = 0$ for every $k < 0$, then
$$f(\lambda) = \frac{\sigma^2}{2\pi} |\varphi_+(e^{-i\lambda})|^2,$$
and either $\varphi_+(z) = \sum_{k \ge 0} b_k z^k$ is identically zero or $\ln f \in L^1[0, 2\pi)$ and
$$\sigma^2 |b_0|^2 \le 2\pi \exp\left(\frac{1}{2\pi} \int_0^{2\pi} \ln f(\lambda)\, d\lambda\right).$$
Proof. Suppose $X_t$ has the moving average in (a); then one can easily conclude that
$$R(t) = \sigma^2 \sum_{k=-\infty}^{\infty} b_{k+t} \overline{b_k}.$$
On the other hand, since $\sum_{k=-\infty}^{\infty} |b_k|^2 < \infty$, the function $\varphi$ is in $L^2[0, 2\pi)$ and we can write
$$\sigma^2 \sum_{k=-\infty}^{\infty} b_{k+t} \overline{b_k} = \frac{\sigma^2}{2\pi} \int_0^{2\pi} e^{it\lambda} |\varphi(e^{-i\lambda})|^2\, d\lambda.$$
Therefore we arrive at
$$R(t) = \int_0^{2\pi} e^{it\lambda} \frac{\sigma^2}{2\pi} |\varphi(e^{-i\lambda})|^2\, d\lambda,$$
which, by virtue of the uniqueness of the Fourier transform, implies that $X_t$ has a spectral density $f$ of the form $f(\lambda) = \frac{\sigma^2}{2\pi} |\varphi(e^{-i\lambda})|^2$.
Conversely, suppose $X_t$ has an absolutely continuous spectral measure, so that $f = \frac{\sigma^2}{2\pi}|\varphi|^2$ for some $\varphi \in L^2[0, 2\pi)$ with $\sum_{k=-\infty}^{\infty} |b_k|^2 < \infty$. Now letting $\tilde{\xi}(\Delta) = \int_\Delta \varphi^{-1}(e^{-i\lambda})\, \xi(d\lambda)$, where $\xi$ is the random spectral measure of $X_t$, we can write
$$X_t = \int_0^{2\pi} e^{it\lambda} \varphi(e^{-i\lambda})\, \tilde{\xi}(d\lambda) = \sum_{k=-\infty}^{\infty} b_k \xi_{t-k},$$
where $\xi_t = \int_0^{2\pi} e^{it\lambda}\, \tilde{\xi}(d\lambda)$ is clearly a white noise. For (b), since $\varphi \in L^2[0, 2\pi)$, it follows that $\varphi_+(z)$ belongs to the Hardy class $H^2$ and hence either $\varphi_+$ vanishes identically or $\ln |\varphi(e^{-i\lambda})|$ is integrable on $[0, 2\pi)$ and
$$\ln |\varphi_+(0)| \le \frac{1}{2\pi} \int_0^{2\pi} \ln |\varphi(e^{-i\lambda})|\, d\lambda.$$
Multiplying both sides of this by 2 and substituting $b_0$ for $\varphi_+(0)$ gives
$$\ln |b_0|^2 \le \frac{1}{2\pi} \int_0^{2\pi} \ln |\varphi(e^{-i\lambda})|^2\, d\lambda.$$
Now substituting for $|\varphi(e^{-i\lambda})|^2$ from $f(\lambda) = \frac{\sigma^2}{2\pi} |\varphi(e^{-i\lambda})|^2$, we get
$$\ln |b_0|^2 \le \frac{1}{2\pi} \int_0^{2\pi} \left[\ln f(\lambda) + \ln(2\pi) - \ln(\sigma^2)\right] d\lambda.$$
Transporting the last term on the right hand side to the left and combining the resulting two terms on the right hand side, we get
$$\sigma^2 |b_0|^2 \le 2\pi \exp\left(\frac{1}{2\pi} \int_0^{2\pi} \ln f(\lambda)\, d\lambda\right). \qquad \blacksquare$$
Combining this theorem with the Wold Decomposition Theorem we get the following important result.

Theorem 4.5 Suppose $X_t$ is a stationary sequence with spectral distribution function $F$. Let $F_Y$ and $F_Z$ denote the spectral distribution functions of the singular and regular components $Y_t$ and $Z_t$ in its Wold decomposition (see Theorem 4.3); then
(a) $F = F_Y + F_Z$;
(b) $F_Z$ is absolutely continuous and its spectral density is of the form $F_Z'(\lambda) = \frac{\sigma^2}{2\pi} |\varphi_+(e^{-i\lambda})|^2$;
(c) $\displaystyle \ln \frac{\sigma^2}{2\pi} \le \frac{1}{2\pi} \int_0^{2\pi} \ln F_Z'(\lambda)\, d\lambda.$
Proof. Since $\{Y_t\} \perp \{Z_t\}$, we have
$$R_X(t) = R_Y(t) + R_Z(t),$$
which implies
$$\int_0^{2\pi} e^{it\lambda} F(d\lambda) = \int_0^{2\pi} e^{it\lambda} F_Y(d\lambda) + \int_0^{2\pi} e^{it\lambda} F_Z(d\lambda).$$
This in turn implies $F = F_Y + F_Z + \text{constant}$. However, the constant must be zero because all three spectral distribution functions $F$, $F_Y$, and $F_Z$ vanish at $0$. This proves (a). Parts (b) and (c) follow from the last theorem because, by the Wold Decomposition Theorem 4.3, $Z_t = \sum_{j \ge 0} a_j \zeta_{t-j}$ is a one sided moving average. ∎

Now we can prove the following two useful theorems.
Theorem 4.6 The stationary sequence $X_t$ is nondeterministic if and only if its spectral distribution function $F$ satisfies $\ln F' \in L^1[0, 2\pi)$. In this case, the prediction error $\sigma^2$ is given by
$$\sigma^2 = 2\pi \exp\left(\frac{1}{2\pi} \int_0^{2\pi} \ln F'(\lambda)\, d\lambda\right).$$

Proof. Let $X_t$ be a nondeterministic stationary sequence with Wold decomposition $X_t = Y_t + Z_t$. Denoting the spectral distribution function of $Z_t$ by $F_Z$, it follows from part (c) of Theorem 4.5 that $\ln F_Z' \in L^1[0, 2\pi)$ and
$$\ln \frac{\sigma^2}{2\pi} \le \frac{1}{2\pi} \int_0^{2\pi} \ln F_Z'(\lambda)\, d\lambda \le \frac{1}{2\pi} \int_0^{2\pi} \ln F'(\lambda)\, d\lambda. \qquad (4.29)$$
Since the sequence $X_t$ is nondeterministic, $\sigma^2$ is positive and hence the last integral cannot be $-\infty$. But neither can it be $+\infty$ because, by Jensen's inequality, it is dominated by $\ln\left(\frac{1}{2\pi}\int_0^{2\pi} F'(\lambda)\, d\lambda\right)$, which is finite since $F'$ belongs to $L^1[0, 2\pi)$ and has nonnegative values. Therefore $\ln F' \in L^1[0, 2\pi)$.

Next, suppose that $X_t$ is such that $\ln F' \in L^1[0, 2\pi)$ and consider the innovation $\zeta_0 = X_0 - (X_0 \mid \mathcal{H}_X(-1))$; since the projection $(X_0 \mid \mathcal{H}_X(-1))$ belongs to $\mathcal{H}_X(-1)$, we can write $\zeta_0 = \lim_{t \to \infty} u_t$ for some $u_t$ of the form $u_t = X_0 - \sum_{k=1}^{t} u_{tk} X_{-k}$. By the Kolmogorov isomorphism we can write
$$\|u_t\|^2 = \int_0^{2\pi} \Big|1 - \sum_{k=1}^{t} u_{tk} e^{-ik\lambda}\Big|^2 F(d\lambda) \ge \int_0^{2\pi} \Big|1 - \sum_{k=1}^{t} u_{tk} e^{-ik\lambda}\Big|^2 F'(\lambda)\, d\lambda \ge 2\pi \exp\left(\frac{1}{2\pi} \int_0^{2\pi} \ln F'(\lambda)\, d\lambda\right),$$
where the second inequality follows from Jensen's inequality (see [195, page 110]) and the last from Jensen's formula (see [3, page 206]), since the polynomial $1 - \sum_{k=1}^{t} u_{tk} z^k$ equals 1 at $z = 0$, so that
$$\frac{1}{2\pi} \int_0^{2\pi} \ln \Big|1 - \sum_{k=1}^{t} u_{tk} e^{-ik\lambda}\Big|^2\, d\lambda \ge 0.$$
Now taking the limit of both sides of the inequality obtained above, we get
$$\ln \frac{\sigma^2}{2\pi} \ge \frac{1}{2\pi} \int_0^{2\pi} \ln F'(\lambda)\, d\lambda.$$
Since by assumption the integral is finite, we conclude that $\sigma > 0$, that is, $X_t$ is nondeterministic. The error formula is a result of this last inequality and (4.29). ∎

The following theorem reveals the close correspondence between the spectral measures (or distribution functions) associated with the components of the Wold decomposition of $X_t$, on one hand, and the Lebesgue decomposition of $F_X$ on the other. For the Lebesgue decomposition of a measure and its properties one can refer to Halmos [86, Section 32] or Royden [195, Chapter 11].

Theorem 4.7 (Concordance) The spectral distribution functions of the Wold decomposition components $Z_t$ and $Y_t$ of a nondeterministic sequence $X_t$ are given by
$$F_Z = F^{ac} \quad \text{and} \quad F_Y = F^{s},$$
where $F^{ac}$ and $F^{s}$ are the absolutely continuous and singular parts of $F = F_X$, respectively.
Proof. Since we have already seen that the spectral distribution function of $X_t$ is given by $F = F_Z + F_Y$ and that $F_Z$ is absolutely continuous, it suffices to show that $F_Y' = 0$ a.e. Now from Theorem 4.5(a) we can write
$$F'(\lambda) = F_Z'(\lambda) + F_Y'(\lambda) = \frac{\sigma^2}{2\pi} |\varphi(e^{-i\lambda})|^2 + F_Y'(\lambda), \quad \text{with } \varphi \in L^2[0, 2\pi).$$
Taking the natural logarithm of both sides and then integrating results in
$$\int_0^{2\pi} \ln F'(\lambda)\, d\lambda \ge \int_0^{2\pi} \ln\left(\frac{\sigma^2}{2\pi} |\varphi(e^{-i\lambda})|^2\right) d\lambda \ge 2\pi \ln \frac{\sigma^2}{2\pi}.$$
The last inequality follows from Theorem 4.5(c) and the fact that $X_t$ is assumed to be nondeterministic. But by Theorem 4.6 the integral on the left-hand side of the last inequality is $2\pi \ln(\sigma^2/2\pi)$. Hence the integral
$$\int_0^{2\pi} \ln \frac{F'(\lambda)}{\frac{\sigma^2}{2\pi} |\varphi(e^{-i\lambda})|^2}\, d\lambda$$
is nonpositive, but its integrand is clearly nonnegative. Therefore
$$\frac{F_Y'(\lambda)}{\frac{\sigma^2}{2\pi} |\varphi(e^{-i\lambda})|^2} = 0, \quad \text{for a.e. } \lambda.$$
Therefore the numerator must vanish; that is, $F_Y'(\lambda) = 0$ for a.e. $\lambda$. ∎
The following theorem, whose proof is now straightforward (and hence omitted), gives a spectral criterion for regularity.

Theorem 4.8 A stationary sequence $X_t$ is regular if and only if it has an absolutely continuous spectral distribution function $F$ with a spectral density $f$ such that $\ln f \in L^1[0, 2\pi)$. If this is the case, then
$$\sigma^2 = 2\pi \exp\left(\frac{1}{2\pi} \int_0^{2\pi} \ln f(\lambda)\, d\lambda\right).$$
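A numerical check of this error formula is easy for a concrete density. The sketch below uses a hypothetical invertible MA(1), $X_t = \zeta_t + b\zeta_{t-1}$ with $b = 0.5$ and $\sigma^2 = 1$ (so $f(\lambda) = \frac{1}{2\pi}|1 + be^{-i\lambda}|^2$; the numbers are illustrative, not from the text), and recovers $\sigma^2 = 2\pi \exp\big(\frac{1}{2\pi}\int_0^{2\pi} \ln f\big)$:

```python
import numpy as np

b, sigma2 = 0.5, 1.0                       # hypothetical invertible MA(1), |b| < 1
N = 4096
lam = 2 * np.pi * np.arange(N) / N         # uniform grid on [0, 2*pi)
f = sigma2 / (2 * np.pi) * np.abs(1 + b * np.exp(-1j * lam)) ** 2
# The grid mean of ln f approximates (1/2pi) * integral of ln f over [0, 2pi);
# for a smooth periodic integrand this quadrature is extremely accurate.
pred_err = 2 * np.pi * np.exp(np.mean(np.log(f)))
# pred_err recovers the one-step prediction error sigma^2.
```

The check works because $\frac{1}{2\pi}\int_0^{2\pi} \ln |1 + be^{-i\lambda}|^2\, d\lambda = 0$ when $|b| < 1$ (Jensen's formula), so the geometric mean of $f$ is exactly $\sigma^2/2\pi$.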
4.2.5 Finite Past Prediction

The problem we address here is that of predicting $X(t+\delta)$, for a stationary process $X_t$, based on the $n$ observations $\{X_{t-n+1}, X_{t-n+2}, \ldots, X_t\}$. The best linear predictor here, of course, is the orthogonal projection of $X_{t+\delta}$ onto $M(t; n) = sp\{X_s : t - n < s \le t\}$, and we will thus denote it by
$$\hat{X}_{t+\delta,n} = (X_{t+\delta} \mid M(t; n)). \qquad (4.30)$$
Throughout this section we consider only real random sequences and only two values of $\delta$, namely, $\delta = 1$ and $\delta = -n$. Beginning with $\delta = 1$, we seek the coefficients in the forward prediction
$$\hat{X}_{t+1,n} = \sum_{j=1}^{n} \alpha_{nj} X_{t-j+1}. \qquad (4.31)$$
The normal equations arising from the properties of projection are
$$(X_{t+1} - \hat{X}_{t+1,n}, X_s) = 0, \quad t - n + 1 \le s \le t,$$
or
$$\sum_{j=1}^{n} \alpha_{nj} R(t - j + 1 - s) = R(t + 1 - s), \quad t - n + 1 \le s \le t. \qquad (4.32)$$
These equations can be expressed in matrix form as
$$\begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(n) \end{bmatrix} = \begin{bmatrix} R(0) & R(1) & \cdots & R(n-1) \\ R(1) & R(0) & \cdots & R(n-2) \\ \vdots & & & \vdots \\ R(n-1) & R(n-2) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} \alpha_{n1} \\ \alpha_{n2} \\ \vdots \\ \alpha_{nn} \end{bmatrix}. \qquad (4.33)$$
It is clear that the solution of this equation, namely, $\{\alpha_{nj} : j = 1, 2, \ldots, n\}$, is independent of $t$, a fact that could be argued from the stationarity of $X_t$ as well. So from now on we suppress $t$ and express the last equation in the compact form
$$R_n^f = R_n \alpha_n, \qquad (4.34)$$
where $R_n^f$ denotes the $n \times 1$ matrix on the left-hand side of the above equation. For any $\alpha_n = [\alpha_{n1}\, \alpha_{n2}\, \cdots\, \alpha_{nn}]'$ that solves (4.33), the prediction error
$$\varepsilon_{t+1,n} = X_{t+1} - \hat{X}_{t+1,n} \qquad (4.35)$$
has variance, again independent of $t$, given by
$$(\sigma_n^f)^2 = \|X_{t+1} - \hat{X}_{t+1,n}\|^2 = (X_{t+1} - \hat{X}_{t+1,n}, X_{t+1}) = R(0) - \sum_{j=1}^{n} \alpha_{nj} R(j) = R(0) - (R_n^f)' \alpha_n. \qquad (4.36)$$
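The normal equations can be solved directly with any linear algebra routine. A minimal sketch, assuming a hypothetical autocovariance $R(t) = \rho^{|t|}$ with $\rho = 0.8$ (the AR(1) case; values are illustrative, not from the text):

```python
import numpy as np

rho, n = 0.8, 5
R = lambda t: rho ** abs(t)          # hypothetical AR(1)-type autocovariance

# Toeplitz matrix R_n and right-hand side R_n^f = (R(1), ..., R(n))'.
Rn = np.array([[R(i - j) for j in range(n)] for i in range(n)])
Rf = np.array([R(j) for j in range(1, n + 1)])
alpha = np.linalg.solve(Rn, Rf)      # forward coefficients alpha_n, per (4.34)
err_var = R(0) - Rf @ alpha          # (sigma_n^f)^2, per (4.36)
# For this covariance the solution is alpha = (rho, 0, ..., 0) and
# err_var = 1 - rho**2: only the most recent observation matters.
```

That the extra coefficients vanish reflects the Markov structure of this particular covariance; for a general $R(t)$ all $n$ coefficients would be nonzero.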
Now let us examine the case $\delta = -n$ (backward prediction), where we predict $X_{t-n}$ based on $M(t; n) = sp\{X_s : t - n < s \le t\}$. The coefficients $\beta_{nj}$ in the best linear estimator
$$\hat{X}_{t-n,n} = \sum_{j=1}^{n} \beta_{nj} X_{t-j+1} \qquad (4.37)$$
are determined by the matrix equation
$$\begin{bmatrix} R(n) \\ R(n-1) \\ \vdots \\ R(1) \end{bmatrix} = \begin{bmatrix} R(0) & R(1) & \cdots & R(n-1) \\ R(1) & R(0) & \cdots & R(n-2) \\ \vdots & & & \vdots \\ R(n-1) & R(n-2) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} \beta_{n1} \\ \beta_{n2} \\ \vdots \\ \beta_{nn} \end{bmatrix}, \qquad (4.38)$$
which in compact format becomes
$$R_n^b = R_n \beta_n. \qquad (4.39)$$
$R_n^b$ is clearly the flip of $R_n^f$, that is, $R_n^b(j) = R_n^f(n - j + 1)$. The same is true of the column matrices $\beta_n$ and $\alpha_n$; that is, for any $k = 1, 2, \ldots, n$,
$$\beta_{nk} = \alpha_{n(n-k+1)} \quad \text{and} \quad (R_n^b)_k = (R_n^f)_{n-k+1}. \qquad (4.40)$$
The prediction error at time $t - n$, namely,
$$\varepsilon_{t-n,n} = X_{t-n} - \hat{X}_{t-n,n}, \qquad (4.41)$$
has variance $(\sigma_n^b)^2 = R(0) - (R_n^b)' \beta_n$,
which, by virtue of (4.40), is identical to $(\sigma_n^f)^2 = R(0) - (R_n^f)' \alpha_n$ (see (4.36)). Thus $\sigma_n^b = \sigma_n^f$, and so from now on we write both of them as $\sigma_n$. The prediction coefficients $\alpha_n$ and $\beta_n$ are unique if and only if $R_n$ is invertible, and this is true if and only if any $n$ consecutive members $\{X_{s+1}, \ldots, X_{s+n}\}$ of the process are LI [56]. The normal equations producing the solutions $\alpha_n$ and $\beta_n$ are sometimes called the Yule-Walker equations. Some important properties of $\hat{X}_{t+1,n}$ and $\hat{X}_{t-n,n}$ and their errors are now given in a manner suggested by the presentation in [183, Chapter 7]. Here we use $\sigma^2$ to denote the error variance of predicting $X_{t+1}$ on its infinite past; that is, $\sigma^2 = \|X_{t+1} - \hat{X}_{t+1}\|^2$ with $\hat{X}_{t+1} = (X_{t+1} \mid \mathcal{H}_X(t))$.
Proposition 4.5 If $X_t$ is stationary and $\hat{X}_{t+1,n}$ and $\hat{X}_{t-n,n}$ are the best linear predictors of $X_{t+1}$ and $X_{t-n}$ based on the $n$ observations $\{X_{t-n+1}, \ldots, X_t\}$, then
(a) $\sigma_n$ is nonincreasing and bounded below by $\sigma$; that is, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge \cdots \ge \sigma$;
(b) $\sigma_n \to \sigma$ as $n \to \infty$;
(c) $\lim_{n \to \infty} \hat{X}_{t+1,n} = \hat{X}_{t+1}$;
(d) if $\sigma_m = 0$ for some $m$, then $\sigma_n = 0$ for all $n \ge m$;
(e) $R_{n+1}$ is invertible for some $n$ if and only if $\sigma_n > 0$ for that $n$. In that case $R_n$ is also invertible and
$$\sigma_n^2 = R(0) - (R_n^f)' R_n^{-1} R_n^f = \frac{|R_{n+1}|}{|R_n|}; \qquad (4.42)$$
(f) if $\sigma_n > 0$, then $\mathrm{rank}\, R_{n+1} = \mathrm{rank}\,(R_n) + 1$;
(g) if $X_t$ is nondeterministic, then $\sigma > 0$ and $|R_n| \ne 0$ for all $n \ge 1$. If this is the case, we have further that
$$\sigma = \exp\left(\lim_{n \to \infty} \frac{1}{n} \ln \sqrt{|R_n|}\right) > 0. \qquad (4.43)$$
Proof. For (a), $\sigma_n$ is bounded and nonincreasing because of the top line of (4.36). For (b), this follows from $\lim_{n \to \infty} M(t; n) = \overline{sp}\{X_s : s \le t\}$ in conjunction with the fact that the predictor $\hat{X}_{t+1}$ that achieves error $\sigma^2$ can be approximated arbitrarily closely by elements of $sp\{X_s : s \le t\}$. For (c), since $(\hat{X}_{t+1} - X_{t+1}) \perp (\hat{X}_{t+1,n} - \hat{X}_{t+1})$, we can write
$$\|\hat{X}_{t+1,n} - X_{t+1}\|^2 = \|\hat{X}_{t+1,n} - \hat{X}_{t+1}\|^2 + \|\hat{X}_{t+1} - X_{t+1}\|^2,$$
and hence
$$\|\hat{X}_{t+1,n} - \hat{X}_{t+1}\|^2 = \sigma_n^2 - \sigma^2.$$
This in conjunction with part (b) gives
$$\lim_{n \to \infty} \|\hat{X}_{t+1,n} - \hat{X}_{t+1}\|^2 = 0,$$
which completes the proof. Part (d) is an immediate consequence of part (a). For part (e), $\sigma_n^2 > 0$ means that $X_{t+1} \notin M(t; n)$, which, in turn, means that $\{X_{t-n+1}, \ldots, X_t, X_{t+1}\}$ is LI. The left equality in (4.42) follows from (4.34) and (4.36). To prove the right equality, we apply the result of Problem 4.14 at the end of this chapter to the partitioned matrix
$$R_{n+1} = \begin{bmatrix} R(0) & (R_n^f)' \\ R_n^f & R_n \end{bmatrix}$$
and obtain
$$|R_{n+1}| = |R_n| \left[ R(0) - (R_n^f)' R_n^{-1} R_n^f \right] = |R_n|\, \sigma_n^2.$$
For (f), if $\sigma_n > 0$, then both $R_{n+1}$ and $R_n$ are invertible and have respective ranks of $n + 1$ and $n$. For (g), since $X_t$ is nondeterministic, $\sigma > 0$ and, consequently, for every positive integer $n$, $\sigma_n > 0$ and $R_n$ is invertible. Therefore we can write
$$\ln \sigma = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \ln \sigma_k = \lim_{n \to \infty} \frac{1}{2n} \sum_{k=1}^{n} \ln \frac{|R_{k+1}|}{|R_k|} = \lim_{n \to \infty} \ln \sqrt[2n]{|R_{n+1}|},$$
which implies (4.43). ∎
4.2.5.1 Partial Autocorrelations

For a general second order random sequence $X_t$ and any $n \ge 0$, its $n$th partial autocorrelation at time $t$ is defined to be
$$\pi(t, n+1) = \mathrm{Corr}\,(X_{t+1} - \hat{X}_{t+1,n},\; X_{t-n} - \hat{X}_{t-n,n}), \qquad (4.45)$$
which gives the immediate interpretation that $\pi(t, n+1)$ is the correlation of the prediction errors $\varepsilon_{t+1,n}$ and $\varepsilon_{t-n,n}$. Another interpretation is that $\pi(t, n+1)$ is the correlation between $X_{t+1}$ and $X_{t-n}$ when the effects of the intermediate variables $\{X_{t-n+1}, \ldots, X_t\}$ are removed. Note that when $n = 0$, we obtain $\pi(t, 1) = \mathrm{Corr}\,(X_{t+1}, X_t)$, since there are no variables in between.
+
For stationary processes we expect, for each n, n ( t , n 1) t o be constant with respect t o t . The following result shows that this is in fact true.
Lemma 4.6 If X t is stationary, then each n ( t ,n + 1) is independent o f t and hence from now on wall be denoted by n(n 1).
+
Proof. By the remarks preceding Proposition 4.5, it is clear that the denominator of (4.45), defining the partial autocorrelation, is independent o f t . So we only need to check time independence of its numerator and that is clear from
n
n
j=1 n
3=1
n
(4.46) j = 1 k=l
Again we note that the vectors an and Pn need not be unique solutions t o the forward and backward YuleWalker equations because as long as they are solutions, they represent the projections. The expression (4.46) for n(n 1) can be shortened, since for each k in the last line of that equation,
+
n
j=1
thus causing the cancellation of the last two items and producing - (RT)’an + 1) = R(n + 1)4
7r(n
(4.47)
Due t o the “flip” relationships between an and ,Bn as well as between R; and R;, equation (4.47) can be written as
n(n + 1) = R(n + 1) - (Ri)’Pn
(4.48)
g;
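Formula (4.47) gives a direct way to compute partial autocorrelations from the Yule-Walker solutions. A sketch under the same hypothetical AR(1)-type covariance $R(t) = \rho^{|t|}$ used earlier (for which $\pi(1) = \rho$ and $\pi(n+1) = 0$ for $n \ge 1$; the numbers are illustrative, not from the text):

```python
import numpy as np

rho = 0.6
R = lambda t: rho ** abs(t)          # hypothetical AR(1)-type autocovariance

def pacf(n):
    """pi(n+1) via (4.47): [R(n+1) - (R_n^b)' alpha_n] / sigma_n^2."""
    if n == 0:
        return R(1) / R(0)           # pi(1) is the ordinary lag-1 correlation
    Rn = np.array([[R(i - j) for j in range(n)] for i in range(n)])
    alpha = np.linalg.solve(Rn, np.array([R(j) for j in range(1, n + 1)]))
    Rb = np.array([R(n + 1 - j) for j in range(1, n + 1)])   # flipped column R_n^b
    num = R(n + 1) - Rb @ alpha
    den = R(0) - np.array([R(j) for j in range(1, n + 1)]) @ alpha   # sigma_n^2
    return num / den
```

The vanishing of $\pi(n+1)$ for $n \ge 1$ here is the familiar cutoff property of the partial autocorrelation for an autoregression of order one.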
4.2.5.2 Durbin-Levinson Algorithm

The idea of the Durbin-Levinson algorithm is to find a computationally economical way to compute $\alpha_{n+1}$ given the vector of predictor coefficients $\alpha_n$ (a solution of (4.33)). To do this, write the matrix equation (4.33), with $n + 1$ replacing $n$, as
$$\begin{bmatrix} R_n^f \\ R(n+1) \end{bmatrix} = \begin{bmatrix} R_n & R_n^b \\ (R_n^b)' & R(0) \end{bmatrix} \begin{bmatrix} \alpha_u \\ \alpha_l \end{bmatrix}. \qquad (4.49)$$
We seek the vector of coefficients $\alpha_{n+1} = [\alpha_u'\; \alpha_l]'$. Writing the two equations separately produces
$$R_n^f = R_n \alpha_u + R_n^b \alpha_l \quad \text{and} \quad R(n+1) = (R_n^b)' \alpha_u + \alpha_l R(0). \qquad (4.50)$$
Since $R_n^f = R_n \alpha_n$, it is natural to try $\alpha_u = \alpha_n + w$, which transforms the preceding equations into
$$0 = R_n w + R_n^b \alpha_l \quad \text{and} \quad R(n+1) = (R_n^b)'(\alpha_n + w) + \alpha_l R(0), \qquad (4.51)$$
respectively. But the first equation is solved by $w = -\alpha_l \beta_n$. Substituting this expression for $w$ in the second equation and using equations (4.36) and (4.47), we get
$$\alpha_l = \frac{R(n+1) - (R_n^b)' \alpha_n}{R(0) - (R_n^b)' \beta_n} = \pi(n+1).$$
Note that $\alpha_l$ is the last coordinate of the vector $\alpha_{n+1}$ of regression coefficients as described in [28, Section 3.4]. In other words, $\alpha_l = \alpha_{(n+1)(n+1)}$. Hence the last equation gives
$$\pi(n+1) = \alpha_{(n+1)(n+1)} = \beta_{(n+1)1}.$$
Given $\alpha_n$ and $\beta_n$, if $\pi(n+1) \ne 0$ (meaning both $X_{t+1}$ and $X_{t-n}$ are LI of $M(t, n-1)$), we determine $\alpha_l$ from the preceding and then $\alpha_u = \alpha_n - \alpha_l \beta_n$. If $\pi(n+1) = 0$, then (4.49) is solved by $\alpha_l = 0$ and $\alpha_u = \alpha_n$, which makes perfect sense because $X_{t-n}$ does not add any new information. For the backward coefficients, we solve for $\beta_{n+1}$ (predicting to time $t - n$ based on a sample of size $n + 1$ into the future) in terms of $\beta_n$. Beginning with (4.38) we obtain
$$\begin{bmatrix} R(n+1) \\ R_n^b \end{bmatrix} = \begin{bmatrix} R(0) & (R_n^f)' \\ R_n^f & R_n \end{bmatrix} \begin{bmatrix} \beta_l \\ \beta_u \end{bmatrix}, \qquad (4.52)$$
which leads, as above, to
$$\beta_l = \frac{R(n+1) - (R_n^f)' \beta_n}{R(0) - (R_n^f)' \alpha_n} = \pi(n+1),$$
with $\beta_{n+1} = [\beta_l\; \beta_u']'$, where $\beta_u = \beta_n - \beta_l \alpha_n$. Given that we wish to compute the coefficients up through some $n = n_0$, we begin with $n = 1$ and directly obtain
$$\alpha_1 = [\alpha_{11}] = [R(1)/R(0)] \quad \text{and} \quad \beta_1 = [\beta_{11}] = [R(1)/R(0)].$$
What we need for the second step, namely, computing the coordinates of $\alpha_2$ and $\beta_2$, are $\alpha_1 = [\alpha_{11}]$ and $\beta_1 = [\beta_{11}]$, which we already have. The process continues recursively up to $n = n_0$.
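The recursion just described can be collected into a few lines. This sketch assumes real autocovariances $r[0], \ldots, r[n]$, so that $\beta_n$ is the flip of $\alpha_n$ and the new last coefficient at each step is the partial autocorrelation $\pi(k+1)$ (the test data below, $r[t] = 0.7^t$, is illustrative only):

```python
import numpy as np

def durbin_levinson(r, n):
    """Recursively compute the forward coefficients alpha_n and the error
    variance sigma_n^2 from real autocovariances r[0], ..., r[n]."""
    alpha = np.array([r[1] / r[0]])            # alpha_1, the n = 1 solution
    sigma2 = r[0] * (1.0 - alpha[0] ** 2)      # sigma_1^2
    for k in range(1, n):
        # last coefficient alpha_{(k+1)(k+1)} = pi(k+1), as derived above
        pi = (r[k + 1] - alpha @ r[k:0:-1]) / sigma2
        # alpha_u = alpha_k - pi * beta_k, with beta_k the flip of alpha_k
        alpha = np.concatenate([alpha - pi * alpha[::-1], [pi]])
        sigma2 *= (1.0 - pi ** 2)
    return alpha, sigma2

# Hypothetical autocovariances r[t] = 0.7**t (AR(1)-type).
r = 0.7 ** np.arange(6)
alpha, sigma2 = durbin_levinson(r, 5)
```

Each step costs $O(n)$ operations, so building all coefficients up to order $n_0$ costs $O(n_0^2)$, against $O(n_0^3)$ for directly inverting the Yule-Walker system at each order.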
4.2.5.3 Cholesky Decomposition and Innovation Algorithm

A variation of the Durbin-Levinson idea is the innovation algorithm. This algorithm is useful for recursively computing finite past prediction coefficients without the need for an explicit matrix inversion as in (4.33) or (4.34). Additionally, it does not depend on the stationarity of $X_t$, a feature we will utilize in Chapter 8, where the issue of deficient rank covariance matrices is also treated. We will now show how the innovation algorithm is essentially connected to the Cholesky decomposition (or factorization) of a positive definite, and therefore invertible, covariance matrix $R$. The case of a rank deficient $R$ will be treated in Chapter 8.
Proposition 4.6 (Cholesky Decomposition) If the $n \times n$ matrix $R$ is positive definite, then there exists a lower triangular matrix $\Theta$ for which
$$R = \Theta \Theta'. \qquad (4.53)$$
This factor $\Theta$ is unique if we demand its diagonal elements to be positive.

Proof. Recall [56] that $R$ is positive definite if and only if there exist $n$ LI random variables $\{X_1, X_2, \ldots, X_n\}$ of finite variance such that $R = \mathrm{Cov}\,(\mathbf{X}, \mathbf{X})$, where $\mathbf{X} = [X_1, X_2, \ldots, X_n]'$. Here we sketch a proof based on the Gram-Schmidt orthogonalization procedure applied to the vectors $\{X_1, X_2, \ldots, X_n\}$. To start the Gram-Schmidt procedure, set $Y_1 = X_1$ and then $\eta_1 = Y_1/\|Y_1\|$. Since the set $\{X_1, X_2\}$ is LI, the vector $Y_2 = X_2 - P_{M_1} X_2$ is not null; then set $\eta_2 = Y_2/\|Y_2\|$. It is clear that $\{\eta_1, \eta_2\}$ are orthonormal and that $X_1 = \theta_{11}\eta_1$ and $X_2 = \theta_{21}\eta_1 + \theta_{22}\eta_2$. Assuming the orthonormal set $\{\eta_1, \eta_2, \ldots, \eta_k\}$ has been determined from $\{X_1, X_2, \ldots, X_k\}$, the linear independence of $\{X_1, X_2, \ldots, X_{k+1}\}$ implies $Y_{k+1} = X_{k+1} - P_{M_k} X_{k+1}$ is not null; thus we set $\eta_{k+1} = Y_{k+1}/\|Y_{k+1}\|$ and we can write
$$\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{k+1} \end{bmatrix} = \begin{bmatrix} \theta_{11} & 0 & \cdots & 0 \\ \theta_{21} & \theta_{22} & \cdots & 0 \\ \vdots & & \ddots & \\ \theta_{k+1,1} & \theta_{k+1,2} & \cdots & \theta_{k+1,k+1} \end{bmatrix} \begin{bmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_{k+1} \end{bmatrix}.$$
Thus $\mathbf{X} = \Theta\boldsymbol{\eta}$. Since $\{X_1, X_2, \ldots, X_n\}$ is LI, the preceding holds for $k+1 = n$ and the set $\{\eta_1, \eta_2, \ldots, \eta_n\}$ is orthonormal. This leads finally to
$$R = E\{\mathbf{X}\mathbf{X}'\} = \Theta\, E\{\boldsymbol{\eta}\boldsymbol{\eta}'\}\, \Theta',$$
which, since $E\{\boldsymbol{\eta}\boldsymbol{\eta}'\} = I$, is (4.53) with $\Theta$ lower triangular as required. ∎
Discussions of the Cholesky decomposition (or factorization) may be found in many references. For example, see [207], and for a matrix oriented proof, see Golub and Van Loan [79, Theorem 5.2-3].
STATIONARY RANDOM SEQUENCES
The preceding sketch only shows the existence of the Θ appearing in the Cholesky decomposition. The connection to finite past prediction is readily made by recognizing that η_{k+1} is the prediction error vector X_{k+1} − P_{M_k} X_{k+1} normalized to unit length, where we recognize that

    M_k = sp{X_1, X_2, ..., X_k} = sp{η_1, η_2, ..., η_k},    1 ≤ k ≤ n.

Since X_{k+1} = Σ_{j=1}^{k+1} θ_{k+1,j} η_j can be uniquely decomposed

    X_{k+1} = P_{M_k} X_{k+1} + [X_{k+1} − P_{M_k} X_{k+1}],

and then noting Σ_{j=1}^{k} θ_{k+1,j} η_j ∈ M_k and η_{k+1} ⊥ M_k, we can easily make the identifications

    P_{M_k} X_{k+1} = Σ_{j=1}^{k} θ_{k+1,j} η_j    (4.54)

and

    X_{k+1} − P_{M_k} X_{k+1} = θ_{k+1,k+1} η_{k+1}.    (4.55)
So at row k + 1, the first k terms form the least squares predictor P_{M_k} X_{k+1} of X_{k+1} and the last term θ_{k+1,k+1} η_{k+1} is the prediction error, whose norm is θ_{k+1,k+1}. Thus we see that computing the Cholesky decomposition for R is the same problem as computing the coefficients for the predictor expressed in terms of the prediction errors (the finite past innovations). Finally, we come to the innovation algorithm, which gives a method of recursively computing the (k + 1)st row of Θ given the coefficients of the first k rows.

Proposition 4.7 (The Innovation Algorithm) If the n × n matrix R is positive definite, then the lower triangular matrix Θ in (4.53) can be computed recursively as follows. First set θ_{11} = [R(1,1)]^{1/2}. The remainder of the coefficients θ_{k+1,j} are computed left to right beginning with k = 1 (row 2) as follows. For j = 1, 2, ..., k, set

    θ_{k+1,j} = θ_{jj}^{−1} [ R(k+1, j) − Σ_{m=1}^{j−1} θ_{k+1,m} θ_{j,m} ].    (4.56)

For the diagonal term, set

    θ_{k+1,k+1} = [ R(k+1, k+1) − Σ_{j=1}^{k} θ_{k+1,j}² ]^{1/2}.    (4.57)

Subsequent rows (k = 2, 3, ..., n − 1) are computed in increasing order.
Proof. Let X = [X_1, X_2, ..., X_n]' be a random vector whose components are LI and R = Cov(X, X). Since every subset is also LI, its corresponding submatrix of R is positive definite. The process is started by setting X̂_1 = 0, so that X_1 − X̂_1 = X_1 and hence ‖X_1 − X̂_1‖ = ‖X_1‖, and X_1 = θ_{11} η_1 leads us to θ_{11}² = R(1,1). Suppose now the first k rows of Θ have been determined. By the Gram–Schmidt construction, we first have ⟨X_{k+1} − X̂_{k+1}, η_j⟩ = 0 for j = 1, 2, ..., k. But this may be written

    ⟨X_{k+1}, η_j⟩ = ⟨X̂_{k+1}, η_j⟩ = θ_{k+1,j},

where the last equality follows from (4.54). Using η_j = θ_{jj}^{−1}(X_j − X̂_j) in the preceding,

    θ_{k+1,j} = θ_{jj}^{−1} ⟨X_{k+1}, X_j − X̂_j⟩ = θ_{jj}^{−1} [ R(k+1, j) − Σ_{m=1}^{j−1} θ_{k+1,m} θ_{j,m} ],

giving (4.56). Note that for computing θ_{k+1,j}, only θ's from previous rows, and from the current row for m < j, are needed. The diagonal term in row k + 1 is computed by

    θ_{k+1,k+1} = ‖X_{k+1} − X̂_{k+1}‖ = [ R(k+1, k+1) − Σ_{j=1}^{k} θ_{k+1,j}² ]^{1/2},

giving (4.57). ∎
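The recursion (4.56)–(4.57) amounts to computing the Cholesky factor row by row; a direct transcription as an illustrative sketch (0-based indices replace the text's 1-based ones, so `R[k, j]` plays the role of R(k+1, j+1)):

```python
import numpy as np

def innovation_cholesky(R):
    """Compute the lower triangular Theta of Proposition 4.7 row by row.

    R must be positive definite; because of 0-based indexing, entry (k, j)
    here corresponds to the text's R(k+1, j+1).
    """
    n = R.shape[0]
    Theta = np.zeros_like(R, dtype=float)
    Theta[0, 0] = np.sqrt(R[0, 0])
    for k in range(1, n):                      # row k+1 of the text
        for j in range(k):                     # (4.56), computed left to right
            s = sum(Theta[k, m] * Theta[j, m] for m in range(j))
            Theta[k, j] = (R[k, j] - s) / Theta[j, j]
        # (4.57): diagonal term = norm of the finite past prediction error
        Theta[k, k] = np.sqrt(R[k, k] - sum(Theta[k, j] ** 2 for j in range(k)))
    return Theta

R = np.array([[0.8 ** abs(i - j) for j in range(4)] for i in range(4)])
Theta = innovation_cholesky(R)
assert np.allclose(Theta @ Theta.T, R)
assert np.allclose(Theta, np.linalg.cholesky(R))   # agrees with the library factor
```

Note that no matrix inversion occurs: each new row uses only previously computed entries, which is the point of the algorithm.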
4.3
MULTIVARIATE SPECTRAL T H E O R Y
Let us recall from Definition 1.5 that a second order q-variate sequence X_t = [X_t^1, X_t^2, ..., X_t^q]' is stationary if m_j(t) ≡ m_j and R_{jk}(s, t) = R_{jk}(s − t) for all s, t ∈ Z and all j, k = 1, 2, ..., q.
4.3.1
Unitary Shift
Multivariate sequences also have a unitary shift defined on their time domain. The time domain of a q-variate random sequence is defined as

    H_X = Sp{X_t^j : t ∈ Z, 1 ≤ j ≤ q}.
Proposition 4.8 A zero mean q-variate sequence X_t is stationary if and only if there exists a unitary operator U defined on H_X such that

    X_{t+1}^j = U X_t^j,    (4.58)

for each t ∈ Z and 1 ≤ j ≤ q.

Proof. Since unitary operators preserve inner products, from (4.58) we get

    R_{jk}(s, t) = ⟨X_s^j, X_t^k⟩ = ⟨U X_s^j, U X_t^k⟩ = ⟨X_{s+1}^j, X_{t+1}^k⟩ = R_{jk}(s + 1, t + 1),

and this implies R_{jk}(s, t) = R_{jk}(s − t). To prove the converse we set

    L_X = sp{X_t^p : t ∈ Z, 1 ≤ p ≤ q}

so that H_X = L̄_X, and for any x = Σ_{j=1}^{k} a_j X_{t_j}^{p_j} in L_X we define U x = Σ_{j=1}^{k} a_j X_{t_j+1}^{p_j}. Using stationarity one can show, just as was done above in the univariate case, that U is well defined, linear, and preserves inner products as a map from L_X onto L_X. Then we similarly extend U to a unitary map from H_X onto H_X. ∎

In other words, a multivariate stationary sequence has a single unitary operator U acting as the shift operator for all its components. This is a basic characteristic of multivariate stationary processes. Sometimes we will express X_{t+1}^j = U X_t^j for t ∈ Z and j = 1, 2, ..., q in the brief form X_{t+1} = U X_t, t ∈ Z.
The issue of a nonzero mean is similar to the univariate case. Specifically, if E{X_t^j} = m_j for all t, then

    (X_t^j | sp{1}) = ⟨X_t^j, 1⟩ 1 = m_j 1.

Thus a constant mean says that each component sequence X_t^j has the fixed projection m_j onto 1. Again it is not necessarily true that 1 ∈ H_X, but if so, it still remains true that U1 = 1. See Problems 4.1 and 4.2.
4.3.2 Spectral Representation
The spectral representation for multivariate stationary sequences follows from an application of

    U^t = ∫_0^{2π} e^{itλ} dE(λ)

(Spectral Theorem for Unitary Operators (Theorem 3.7)) to the unitary operator U that gives X_{t+1} = U X_t in Proposition 4.8. Denoting E as the corresponding spectral measure, we can define a column vector valued random measure by ξ(dλ) = [ξ^j(dλ)]_{j=1}^{q} = [E(dλ) X_0^j]_{j=1}^{q} and write, for j = 1, 2, ..., q,

    X_t^j = ∫_0^{2π} e^{itλ} ξ^j(dλ).

The countable additivity of ξ and the orthogonality of its increments,

    ⟨ξ^i(Δ), ξ^j(Δ')⟩ = 0  whenever Δ ∩ Δ' = ∅,

for any i, j = 1, 2, ..., q, follow from the properties of the spectral measure E. These remarks in conjunction with Theorem 4.2 applied component-wise yield the following.
Theorem 4.9 If X_t is a q-variate stationary sequence, then

(a) there exists a q-variate vector measure ξ, called its random spectral measure or simply its random measure, such that

    X_t = ∫_0^{2π} e^{itλ} ξ(dλ);

(b) the spectral measure F of X_t defined by

    F(Δ) = [⟨ξ^i(Δ), ξ^j(Δ)⟩]_{i,j=1}^{q},  for any Borel subset Δ of [0, 2π),

is a nonnegative definite matrix valued measure;

(c) the covariance has the spectral representation

    R(s − t) = ∫_0^{2π} e^{i(s−t)λ} F(dλ),
where its matrix valued distribution F_λ of X_t is related to its spectral measure F(dλ) just as in the univariate case. By the Lebesgue decomposition we can always write F = F^{ac} + F^{s}. We use F' to denote the Radon–Nikodym derivative of its absolutely continuous part F^{ac} with respect to Lebesgue measure. If F is absolutely continuous (w.r.t. Lebesgue measure), then we denote f = F' and call it the spectral density of X_t.

4.3.3 Mean Ergodic Theorem
For multivariate stationary sequences, the mean ergodic theorem addresses mean-square convergence of the time averages

    S_N(0) = (1/N) Σ_{t=0}^{N−1} X_t    (4.59)

at λ = 0. By considering Proposition 4.2 applied to each component of X_t, we obtain the following.

Proposition 4.9 If X_t is a stationary sequence, then

    lim_{N→∞} S_N(0) = ξ({0}).    (4.60)

Extending Definition 4.3, if E{X_t} = m, we will say that X_t is mean ergodic if

    lim_{N→∞} S_N(0) = m    (4.61)

component-wise, in the mean-square sense. Then, as in the univariate case, S_N(0) → m if and only if the atom of ξ(·) at {0} is the vector of constants m, or, in terms of F, F_{jj}({0}) = |m_j|².

4.3.4 Spectral Domain
In this section we introduce the spectral domain of a multivariate stationary sequence as a Hilbert space of functions. Then the Kolmogorov isomorphism between time and spectral domains of such a sequence is presented and its importance in prediction theory is discussed. We show that to any random variable Y ∈ H_X there corresponds a vector function

    φ(λ) = [φ_j(λ)]_{j=1}^{q} = [φ_1(λ), φ_2(λ), ..., φ_q(λ)]

in the spectral domain such that

    Y = ∫ φ(λ) ξ(dλ).
To be more precise, let F = [F_{ij}]_{i,j=1}^{q} be the spectral measure of X_t as defined in part (b) of Theorem 4.9 and let μ be any measure with respect to which all diagonal measures F_{jj} of F are absolutely continuous. For example, one can take μ = Σ_{j=1}^{q} F_{jj}. In view of the nonnegative definiteness of F(Δ) for any Borel set Δ, the vanishing of all diagonal elements F_{jj}(Δ) implies that F_{ij}(Δ) = 0 for any i, j = 1, 2, ..., q. Hence each F_{ij} is absolutely continuous with respect to this measure μ. Define the spectral density f_μ by

    f_μ(λ) = [f_μ^{ij}(λ)] = [F_{ij}(dλ)/μ(dλ)].

Since the matrix F(Δ) is nonnegative definite for each Borel set Δ, the matrix function f_μ(λ) will be nonnegative definite for almost every λ. We define the space L²(f_μ) to be the set of all vector valued functions φ(λ) = [φ_j(λ)]_{j=1}^{q} for which

    ∫ φ*(λ) f_μ(λ) φ(λ) μ(dλ) < ∞.

If both vector functions φ and ψ belong to L²(f_μ), then the function φ* f_μ ψ belongs to L¹(μ). In fact, since f_μ is nonnegative definite, we have

    |φ*(λ) f_μ(λ) ψ(λ)| ≤ [φ*(λ) f_μ(λ) φ(λ)]^{1/2} [ψ*(λ) f_μ(λ) ψ(λ)]^{1/2}

for almost every λ w.r.t. μ. Hence by the Cauchy–Schwarz inequality φ* f_μ ψ belongs to L¹(μ). We define an inner product on L²(f_μ) by

    ⟨φ, ψ⟩ = ∫ φ*(λ) f_μ(λ) ψ(λ) μ(dλ).
One can easily check that neither the spectral domain L²(f_μ) nor its inner product depends on the choice of the measure μ. We denote this (μ-invariant) common space by L²(F) and this common inner product by ⟨φ, ψ⟩; that is,

    ⟨φ, ψ⟩ = ∫ φ*(λ) F(dλ) ψ(λ) = ∫ φ*(λ) f_μ(λ) ψ(λ) μ(dλ).

From now on we set the auxiliary measure to μ = Σ_{j=1}^{q} F_{jj} and suppress all μ indices. If we identify two functions φ and ψ in L²(F) when

    ‖φ − ψ‖ = 0,

then L²(F) becomes an inner product space. The following lemma shows that this space, which is called the spectral domain of X_t, is complete and hence a Hilbert space.
Lemma 4.7 The inner product space L²(F) with norm ‖φ‖ = ⟨φ, φ⟩^{1/2} is complete.
Proof. It is easy to see that the spectral density f is different from zero almost everywhere w.r.t. μ. In fact, if f = [f^{ij}] = [F_{ij}(dλ)/μ(dλ)] is zero on a set Δ, then F_{jj}(Δ) = ∫_Δ f^{jj}(λ) μ(dλ) = 0, j = 1, ..., q. Hence μ(Δ) = Σ_{j=1}^{q} F_{jj}(Δ) = 0. For each λ ∈ [0, 2π) let m(λ) denote the smallest nonzero eigenvalue of the matrix f(λ); further denote by S(λ) the subspace of all q-dimensional row vectors φ = [φ_j] satisfying the condition φ*(λ) f(λ) = 0, and by R(λ) the orthogonal complement of S(λ) in C^q. It follows readily from the nonnegativeness of the matrix f(λ) for almost every λ that ‖φ‖ = 0 if and only if φ(λ) ∈ S(λ) for almost every λ w.r.t. μ. It is easily seen that for any row vector function φ ∈ L²(F) one can find another such function φ̃ ∈ L²(F) such that ‖φ − φ̃‖ = 0 and φ̃(λ) ∈ R(λ) for almost every λ w.r.t. μ, and for these λ's

    φ̃*(λ) f(λ) φ̃(λ) ≥ m(λ) Σ_{j=1}^{q} |φ̃_j(λ)|².
Now let {φ_p} be a Cauchy sequence in L²(F) and {φ̃_p} be its corresponding sequence as just defined. We may assume without loss of any generality that φ_p = φ̃_p for any positive integer p. Given any ε > 0, since {φ_p} is Cauchy, there exists a positive integer N such that

    ‖φ_p − φ_{p'}‖ < ε,  whenever p, p' > N.

This implies

    ∫ m(λ) Σ_{j=1}^{q} |φ_p^j(λ) − φ_{p'}^j(λ)|² μ(dλ) < ε²,  whenever p, p' > N,

which in turn implies that, for each j = 1, 2, ..., q,

    ∫ |φ_p^j(λ) √m(λ) − φ_{p'}^j(λ) √m(λ)|² μ(dλ) < ε²,  whenever p, p' > N.

So for each j = 1, 2, ..., q the sequence {φ_p^j √m} is Cauchy in the Hilbert space L²(μ). Hence for each j = 1, 2, ..., q there is a function Φ^j ∈ L²(μ) such that

    lim_{p→∞} φ_p^j √m = Φ^j  in the L²(μ) sense.
Hence for each j = 1, 2, ..., q there is a subsequence of φ_p^j, which we again denote by φ_p^j, and some sets Λ_j with μ(Λ_j) = 0 such that

    lim_{p→∞} φ_p^j(λ) √m(λ) = Φ^j(λ),  for every λ ∉ Λ_j.

Taking

    φ^j(λ) = Φ^j(λ)/√m(λ),

and noting that m(λ) is positive for almost every λ with respect to μ, we see that for each j = 1, 2, ..., q

    lim_{p→∞} φ_p^j(λ) = φ^j(λ),  for almost all λ ∉ Λ_j.

Letting Λ = ∪_j Λ_j, clearly μ(Λ) = 0, and for every λ ∉ Λ, and hence almost every λ, we have

    lim_{p→∞} φ_p*(λ) f(λ) φ_p(λ) = φ*(λ) f(λ) φ(λ).

As the sequence {φ_p} is Cauchy and hence bounded, there exists a constant M such that

    ‖φ_p‖² = ∫ φ_p*(λ) f(λ) φ_p(λ) μ(dλ) ≤ M,  for all p.

By Fatou's lemma,

    ∫ φ*(λ) f(λ) φ(λ) μ(dλ) ≤ M,

which shows that the row vector function φ belongs to L²(F). Now for any ε > 0 there exists a positive N such that

    ‖φ_p − φ_{p'}‖ < ε,  whenever p, p' > N.

Fixing p > N and letting p' → ∞ along those values of p' for which φ_{p'}(λ) → φ(λ) almost everywhere w.r.t. μ, and using Fatou's lemma again, we obtain

    ‖φ_p − φ‖ ≤ ε,  for any p > N. ∎
The Kolmogorov Isomorphism. Now we can establish a multivariate extension of the Kolmogorov isomorphism, which can transfer some prediction problems from the time domain to the spectral domain, where Fourier analysis can be used to solve the problem, and then transfer the result back to the time domain. Let φ = [φ^j]_{j=1}^{q} be a row vector such that

    ∫ |φ^j(λ)|² F_{jj}(dλ) < ∞,  j = 1, 2, ..., q.    (4.62)
For each j the integral ∫ φ^j(λ) ξ^j(dλ) exists and is a random variable in H_X, the time domain. We correspond to this vector function φ ∈ L²(F) the random variable Y defined by

    Y = Σ_{j=1}^{q} ∫ φ^j(λ) ξ^j(dλ) = ∫ φ(λ) ξ(dλ),

which is clearly in H_X. This correspondence is an isometric mapping because if

    Y' = ∫ ψ(λ) ξ(dλ)

is another such random variable, then one can easily check that

    ⟨Y, Y'⟩ = ⟨φ, ψ⟩.

This correspondence can be extended by linearity to the set of all finite linear combinations of φ's satisfying (4.62) and then by continuity to their span closure, which is L²(F). Using the standard arguments one can show that this extension remains an isometry. Now since the range of this mapping contains all random variables X_t^j (because X_t^j = ∫ φ_t^j(λ) ξ(dλ) with φ_t^j = [e^{itλ} δ_{jk}]_{k=1}^{q}), this mapping is an isometry from the spectral domain L²(F) onto the time domain H_X. So any element Y in the time domain of a multivariate stationary sequence has a spectral representation
    Y = ∫ φ(λ) ξ(dλ)

for some vector function φ ∈ L²(F). Note that any q-variate vector random variable Y = [Y^k]_{k=1}^{q} with components in H_X can also be represented as Y = ∫ φ(λ) ξ(dλ), where φ is a q × q matrix valued function. In particular, we can write X_t = ∫ e^{itλ} ξ(dλ). It should be remarked that if T is a linear operator on the time domain, then the equation

    T ∫ φ(λ) ξ(dλ) = ∫ φ'(λ) ξ(dλ)

defines a linear operator T' on the spectral domain, namely, T'φ = φ'. Conversely, to any linear operator T' in the spectral domain there corresponds a linear operator T in the time domain. In particular, one can see that multiplication by e^{iλ} is the operator in the spectral domain corresponding to the unitary shift operator U we defined earlier in the time domain.
4.4
MULTIVARIATE PREDICTION THEORY
In this section we study the prediction of multivariate stationary random sequences. The presentation is based mostly on the treatments of Wiener and Masani, Masani [153], and Rozanov [201].

4.4.1 Infinite Past, Regularity and Singularity

We start by generalizing some more definitions to the multivariate case.
Definition 4.7 Let X_t = [X_t^j]_{j=1}^{q} be a second order random sequence. We define its past up to and including time t and remote past to be

    H_X(t) = Sp{X_s^j : s ≤ t, 1 ≤ j ≤ q}    (4.63)

and

    H_X(−∞) = ∩_{t ∈ Z} H_X(t),    (4.64)

respectively.
Definition 4.8 A q-variate stationary random sequence X_t is called purely nondeterministic (or regular) if H_X(−∞) = {0} and is called deterministic (or singular) if

    H_X(−∞) = H_X,

or equivalently if H_X(s) = H_X(t) for all s, t ∈ Z.

Definition 4.9 For any q-variate vector X = [X^j]_{j=1}^{q}, with all its q components belonging to some Hilbert space H, and for any subspace M of this Hilbert space, we define (X|M) to be

    (X|M) = [(X^j|M)]_{j=1}^{q}.

We refer to this as the projection of X on M and hence sometimes denote it by P_M(X). By the predictor X̂_{t+δ} of X_t based on H_X(t) here we mean

    X̂_{t+δ} = P_{H_X(t)} X_{t+δ} = (X_{t+δ} | H_X(t)).
For more on this see Problem 4.13.
4.4.2 Wold Decomposition Again, as in the univariate case, we characterize the relationship between the propagating unitary operator U defined in (4.58) and the time domain subspaces. The proofs follow exactly as in the univariate case (see Lemma 4.1).
Lemma 4.8 If X_t is q-variate stationary with unitary shift U, then

(a) H_X(t + 1) = U H_X(t);

(b) H_X = U H_X;

(c) H_X(−∞) = U H_X(−∞).
Proposition 4.10 (Wold Decomposition) Any multivariate stationary sequence X_t can be uniquely decomposed as the sum of two orthogonal stationary sequences Y_t and Z_t,

    X_t = Y_t + Z_t,    (4.65)

such that

(a) H_X(t) = H_Y(t) ⊕ H_Z(t);

(b) Y_t is deterministic and Z_t purely nondeterministic;

(c) Y_t and Z_t are stationary with the same unitary shift as X_t;

(d) F_X = F_Y + F_Z.

Proof. For each t ∈ Z set

    Y_t = (X_t | H_X(−∞))  and  Z_t = X_t − Y_t.

Since for each s, Y_s ∈ H_X(−∞), and for each t, Z_t = X_t − (X_t | H_X(−∞)) ⊥ H_X(−∞), the components of Y_s are orthogonal to the components of Z_t, and this implies H_Y ⊥ H_Z. For each t ∈ Z, Y_t ∈ H_X(−∞) ⊂ H_X(t) and hence Z_t = X_t − Y_t ∈ H_X(t). Therefore H_Y(t) ⊕ H_Z(t) ⊂ H_X(t). In order to complete the proof of (a) it suffices to show that H_X(t) ⊂ H_Y(t) ⊕ H_Z(t). Pick w ∈ H_X(t); then each component w^j of w satisfies w^j = lim_n w_n^j, where each w_n^j ∈ L_X(t) and hence w_n^j = u_n^j + v_n^j, u_n^j ∈ L_Y(t), v_n^j ∈ L_Z(t). But since w_n^j must be Cauchy and ‖u_n^j − u_m^j‖ ≤ ‖w_n^j − w_m^j‖, and similarly for v_n^j, then u_n^j → u^j ∈ H_Y(t) and v_n^j → v^j ∈ H_Z(t). Taking the limit of w_n^j = u_n^j + v_n^j, we get w^j = u^j + v^j, which shows H_X(t) ⊂ H_Y(t) ⊕ H_Z(t).

For (b), we first show that H_X(−∞) = H_Y(t) for all t ∈ Z, which implies Y_t is deterministic. Since it is clear that H_Y(t) ⊂ H_X(−∞) for each t, we show that H_X(−∞) ⊂ H_Y(t). For contradiction, suppose for fixed t there is a nonzero u ∈ H_X(−∞) ⊖ H_Y(t); then for each s ≤ t we have u ⊥ Y_s and u ⊥ Z_s, so that u ⊥ X_s = Y_s + Z_s (component-wise). Hence u ⊥ H_X(t), and therefore u ⊥ H_X(−∞) component-wise, implying u = 0, which is a contradiction.

For (c), using Lemma 4.8, for any t ∈ Z we can write

    Y_{t+1} = (X_{t+1} | H_X(−∞)) = (U X_t | U H_X(−∞)) = U (X_t | H_X(−∞)) = U Y_t,

from which one can conclude that Y_t is stationary. This also shows that U and U_Y act the same on H_Y, which together with H_X(−∞) = H_Y, proved above, gives U_Y = U|_{H_X(−∞)}. The statements about Z_t follow easily. Part (d) is easy to see. ∎
4.4.3 Innovations and Rank

Here we discuss notions of rank and innovation for multivariate stationary processes.

Definition 4.10 The innovation space of the q-variate sequence X_t at time t is defined to be

    I_X(t) = H_X(t) ⊖ H_X(t − 1) = {x ∈ H_X(t) : x ⊥ H_X(t − 1)},    (4.66)

which can also be identified through

    H_X(t) = I_X(t) ⊕ H_X(t − 1).

Now we give some basic facts generalizing those for the univariate case given in Lemma 4.2.
Lemma 4.9 If X_t is a q-variate stationary sequence with Wold decomposition X_t = Y_t + Z_t, then

(a) I_X(t + 1) = U I_X(t);

(b) I_X(t) ⊥ H_X(−∞) and I_X(t) = I_Z(t);

(c) dim I_X(t + 1) = dim I_X(t) ≤ q;

(d) X_t singular ⟹ dim I_X(t) = 0.

Proof. For (a), we follow the same reasoning as in the univariate case, namely,

    I_X(t + 1) = H_X(t + 1) ⊖ H_X(t) = U H_X(t) ⊖ U H_X(t − 1) = U [H_X(t) ⊖ H_X(t − 1)] = U I_X(t),
where the last line follows from the fact that U is unitary. For (b), let u ∈ I_X(t) and v ∈ H_X(−∞). Then u ∈ H_X(t) and u ⊥ H_X(t − 1), but v ∈ H_X(s) for all s and in particular for s = t − 1, so component-wise ⟨u^j, v^k⟩ = 0. To see the second part of (b), write

    I_X(t) = H_X(t) ⊖ H_X(t − 1)
           = [H_Y(t) ⊕ H_Z(t)] ⊖ [H_Y(t − 1) ⊕ H_Z(t − 1)]
           = [H_Y(t) ⊖ H_Y(t − 1)] ⊕ [H_Z(t) ⊖ H_Z(t − 1)]
           = H_Z(t) ⊖ H_Z(t − 1) = I_Z(t),

because for all s, t we have H_Z(s) ⊥ H_Y(t) and H_Y(t) ⊖ H_Y(t − 1) = {0}. For (c), suppose dim I_X(t) = q' for some q' ≤ q and let {η_1, η_2, ..., η_{q'}} be a basis for I_X(t). Since I_X(t + 1) = U I_X(t), where U is unitary, we only need to observe that {η_j' = U η_j : j = 1, 2, ..., q'} will be a basis for I_X(t + 1); thus dim I_X(t + 1) = q'. For (d), if X_t is singular, then X_t ∈ H_X(t − 1) for all t, and hence I_X(t) = H_X(t) ⊖ H_X(t − 1) = {0} for all t, meaning dim I_X(t) = 0. ∎
Now we further examine the case when X_t is not singular, so the regular component Z_t of the Wold decomposition (4.65) has at least one component of positive length: that is, ‖Z_t^j‖ > 0 for some j ≥ 1 and each t ∈ Z. The innovation vector defined here by

    ζ_t = [X_t − P_{H_X(t−1)} X_t] ∈ I_X(t)
        = [Z_t − P_{H_Z(t−1)} Z_t] ∈ I_Z(t)    (4.67)

cannot be null. For if ζ_t = 0, then Z_t ∈ H_Z(t − 1), which together with ‖Z_t^j‖ > 0 for some j contradicts that Z_t is regular. It is clear from (4.67) that Sp{ζ_t^j} ⊂ I_X(t). In fact we have I_X(t) = Sp{ζ_t^j}. To see this, first write

    X_t = [X_t − P_{H_X(t−1)} X_t] + P_{H_X(t−1)} X_t = ζ_t + P_{H_X(t−1)} X_t,

where ζ_t ⊥ H_X(t − 1). Then suppose Y ∈ H_X(t) and Y ⊥ ζ_t^j, j = 1, 2, ..., q. Then Y ∈ H_X(t − 1) and so Y ⊥ I_X(t), yielding I_X(t) ⊂ Sp{ζ_t^j}. Thus we have proved part (a) of the following lemma.

Lemma 4.10 If ζ_t is the innovation process of a q-variate stationary sequence X_t, then

(a) I_X(t) = Sp{ζ_t^j}, t ∈ Z;

(b) ζ_t is stationary and has the same shift as X_t;

(c) X_t and ζ_t are jointly stationary; that is, (X_t, ζ_s) = (X_{t+1}, ζ_{s+1}), s, t ∈ Z;

(d) Σ = Cov(ζ_t, ζ_t) is independent of t and we have

    (ζ_s, ζ_t) = Σ δ_{s−t};

(e) any future innovation of X_t is orthogonal to the past of the sequence X_t. In fact, for any positive integer k and any integer t we have

    (ζ_t, X_{t−k}) = 0  and  (ζ_t, X_t) = Σ.
Proof. For (b), since the unitary shift operator U maps H_X(t − 2) onto H_X(t − 1) and hence commutes with the corresponding projections, for every t

    ζ_t = X_t − P_{H_X(t−1)} X_t = U X_{t−1} − P_{H_X(t−1)} U X_{t−1} = U [X_{t−1} − P_{H_X(t−2)} X_{t−1}] = U ζ_{t−1},    (4.68)

showing that U is the shift for ζ_t. Hence ζ_t is stationary. For (c), it suffices to note, because of (b), that ζ_t and X_t have the same unitary shift. For (d), note that Cov(ζ_t, ζ_t) is independent of t due to (c), proving the equation when s = t. Now if s < t, then ζ_s ∈ H_X(s) and ζ_t = X_t − X̂_t ⊥ H_X(s). Therefore ζ_s ⊥ ζ_t, which means [(ζ_s^i, ζ_t^j)]_{i,j=1}^{q} = 0. For (e), for any integer k ≥ 1, X_{t−k} ∈ H_X(t − 1) and ζ_t ⊥ H_X(t − 1). Therefore we have (ζ_t, X_{t−k}) = 0. Using the top line of (4.67) to express X_t, it is clear that

    (ζ_t, X_t) = (ζ_t, ζ_t + P_{H_X(t−1)} X_t) = Σ. ∎
Next we turn to the notion of rank of a multivariate stationary sequence X_t. In the time domain, where we work with a process and its innovation, there are two types of rank that we consider: its process rank, denoted by p_X, which is the dimension of sp{X_t^j : 1 ≤ j ≤ q}, and its innovation rank, denoted by r_X, which is the dimension of sp{ζ_t^j : 1 ≤ j ≤ q}. These, which because of stationarity are independent of t, turn out to be equal to rank R(0) and rank Σ, respectively. Working with the spectral domain of X_t, we say X_t has spectral rank s_X if it possesses a spectral density matrix f(λ) having rank s_X for a.e. λ. The innovation rank is the most informative one because it describes the number of new (i.e., LI) random variables entering H_X(t) at each time step, and hence it has a direct bearing on prediction and on the complexity of the sequence. From now on the rank of X_t will mean its innovation rank. Here dim I_X(t) stands for the dimension of the set {ζ_t^j : 1 ≤ j ≤ q}, and by the dimension of any finite subset A of a vector space we mean the dimension of the span of A, which turns out to be the maximum number of LI vectors in A. From the preceding remarks, any deterministic multivariate sequence X_t has rank zero. But when X_t is nonsingular, it must have a nontrivial regular part having dim I_Z(0) = dim I_X(0) = r > 0. Since I_X(t) = Sp{ζ_t^j : 1 ≤ j ≤ q}, the rank of X_t is the number of LI components of ζ_t, giving the following.
Corollary 4.10.1 The rank of a q-variate stationary sequence X_t is

    r_X = rank Σ.    (4.69)

Lemma 4.11 If X_t is a multivariate sequence, then r_X ≤ p_X.
For any a_1, a_2, ..., a_q we can write

    Σ_{j=1}^{q} a_j ζ_t^j = Σ_{j=1}^{q} a_j X_t^j − P_{H_X(t−1)} ( Σ_{j=1}^{q} a_j X_t^j ),

which shows that for any positive integer k ≤ q, {X_t^j : 1 ≤ j ≤ k} is LI whenever {ζ_t^j : 1 ≤ j ≤ k} is so. Using this fact one can prove the following lemma.

Lemma 4.12 If A is a q × q and B is a q × r matrix such that A = BB*, then A and B have the same rank.

Proof. It suffices to show that A and B have the same row rank, and this is an immediate consequence of the following claim: if A_j and B_j denote the jth rows of A and B, respectively, then

    Σ_{j=1}^{q} a_j A_j = 0  if and only if  Σ_{j=1}^{q} a_j B_j = 0,

for any scalars a_1, a_2, ..., a_q. To verify this claim, we first note that the jth row of A is

    A_j = [⟨B_j, B_1⟩, ⟨B_j, B_2⟩, ..., ⟨B_j, B_q⟩]

and hence

    Σ_{j=1}^{q} a_j A_j = [⟨Σ_{j=1}^{q} a_j B_j, B_1⟩, ..., ⟨Σ_{j=1}^{q} a_j B_j, B_q⟩].    (4.70)

Now if Σ_{j=1}^{q} a_j B_j = 0, then it is clear from (4.70) that each entry of the row vector Σ_{j=1}^{q} a_j A_j is zero, and hence Σ_{j=1}^{q} a_j A_j itself is zero. On the other hand, if we assume Σ_{j=1}^{q} a_j A_j is zero, then all its entries must be zero, which according to (4.70) gives

    ⟨Σ_{j=1}^{q} a_j B_j, B_k⟩ = 0,  k = 1, 2, ..., q.

But that means Σ_{j=1}^{q} a_j B_j, which is in the row space of B, is also orthogonal to the row space. Therefore Σ_{j=1}^{q} a_j B_j must be zero. ∎
Lemma 4.13 Any nonnegative definite q × q matrix f = [f^{kj}] of rank r can be factored as

    f = ΦΦ*,    (4.71)

where Φ = [φ^{kj}] is a q × r rectangular matrix of rank r.

Proof. Let G be a nonnegative definite q × q square root of the matrix f and V be a q × q unitary matrix that diagonalizes G as

    D = VGV* = [d_{jj}],

where D is diagonal with d_{jj} > 0 for j = 1, 2, ..., r. Now let D̃ be the rectangular matrix obtained from D by omitting the last q − r columns. It is easy to see that the matrix Φ = V*D̃ serves as the desired factor. The statement about the rank of Φ is clear from the last lemma. See also our discussion on the Cholesky factorization in Chapter 8. ∎
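Numerically, a factor of the kind guaranteed by Lemma 4.13 can be produced from an eigendecomposition rather than the square root G used in the proof; a hedged sketch (the factor Φ is not unique, and the test matrix below is an arbitrary illustration):

```python
import numpy as np

def rank_factor(f, tol=1e-10):
    """Factor a nonnegative definite q x q matrix f of rank r as f = Phi Phi*,
    with Phi of size q x r, as in Lemma 4.13; here Phi is built from the
    eigendecomposition f = V diag(w) V* by keeping the positive eigenvalues."""
    w, V = np.linalg.eigh(f)
    keep = w > tol                          # the r positive eigenvalues
    return V[:, keep] * np.sqrt(w[keep])    # scale the kept eigenvector columns

# A rank 2 nonnegative definite 3 x 3 matrix built as BB'.
B = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
f = B @ B.T
Phi = rank_factor(f)
assert Phi.shape == (3, 2)                  # q x r with r = rank f = 2
assert np.allclose(Phi @ Phi.T, f)          # reproduces (4.71)
```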
Lemma 4.14 If Φ = [φ^{kj}] is a q × r rectangular matrix of rank r, then there exists an r × q matrix Ψ = [ψ_{ik}] such that

    ΨΦ = I_r,    (4.72)

with I_r the r × r identity matrix.

Proof. For each i = 1, 2, ..., r the system of r linear equations

    Σ_{k=1}^{q} ψ_{ik} φ^{kj} = δ_{ij},  j = 1, 2, ..., r,

with q unknowns ψ_{ik}, k = 1, 2, ..., q, has a solution because the rank of its coefficient matrix Φ is r. If ψ_{i1}, ψ_{i2}, ..., ψ_{iq} satisfy this system, then the matrix Ψ = [ψ_{ik}] is clearly the required factor. ∎

Now consider a q-variate stationary sequence X_t = [X_t^j] of rank r with spectral density f(λ) = [f^{kj}(λ)] and spectral measure ξ(dλ). By Lemma 4.13 we have the factorization f(λ) = Φ(λ)Φ*(λ), where the factor Φ(λ) is a q × r matrix valued function, each entry of which is in L²[0, 2π). So we can write

    Φ(λ) = Σ_{n=−∞}^{∞} A_n e^{inλ}.

Thus there exists an r × q matrix valued function Ψ = [ψ_{ik}] that satisfies (4.71) and (4.72), and consequently

    Ψ(λ) f(λ) Ψ*(λ) = I_r,  for a.e. λ.

Define Λ^j(Δ), for any j = 1, 2, ..., r and any Borel subset Δ of [0, 2π), by

    Λ^j(Δ) = ∫_Δ Ψ^j(λ) ξ(dλ),

where Ψ^j is the jth row of the matrix Ψ. By this choice of Ψ the random measures Λ^j, j = 1, 2, ..., r, are mutually uncorrelated, namely, E{Λ^j(Δ) ᾱΛ^{j'}(Δ')} = 0 for j ≠ j', where the overbar denotes conjugation. Moreover

    E|Λ^j(dλ)|² = dλ,  j = 1, 2, ..., r.

Consider the r-variate stationary sequence

    ζ_t^j = ∫_0^{2π} e^{itλ} Λ^j(dλ),  j = 1, 2, ..., r.

The above-mentioned properties of the measures Λ^j imply that the sequence ζ_t = [ζ_t^j] is uncorrelated. In fact one can see that the set {ζ_t^j : t ∈ Z, 1 ≤ j ≤ r} is an orthonormal set. That is, for any i, j = 1, 2, ..., r we have

    E{ζ_t^i ζ̄_s^j} = 1 if i = j and t = s, and 0 otherwise.
Now one can easily check that (ΦΨ − I_q) f (ΦΨ − I_q)* = 0 almost everywhere. Therefore ΦΨ and I_q are identical in L²(F) and hence we can write

    X_t = ∫_0^{2π} e^{itλ} ξ(dλ) = ∫_0^{2π} e^{itλ} I_q ξ(dλ) = ∫_0^{2π} e^{itλ} Φ(λ) Ψ(λ) ξ(dλ),

which gives

    X_t = ∫_0^{2π} e^{itλ} Φ(λ) Λ(dλ).    (4.73)

Substituting the above Fourier expansion of Φ in (4.73), we get the moving average representation

    X_t = Σ_{n=−∞}^{∞} A_n ζ_{t−n}.

This establishes the following theorem.

Theorem 4.10 If a q-variate stationary sequence X_t has spectral rank r, then it has a moving average representation

    X_t = Σ_{n=−∞}^{∞} A_n ζ_{t−n},    (4.74)

where (1) the q × r coefficients A_n have square summable entries, Σ_n |A_n^{kj}|² < ∞ for k = 1, 2, ..., q, j = 1, 2, ..., r, and (2) the components {ζ_t^j : t ∈ Z, 1 ≤ j ≤ r} of the r-variate sequence ζ_t form an orthonormal basis for H_X.
Proof. Based on the remarks preceding the statement of the theorem, we only need to show the last statement about the basicity of ζ_t. Any element of H_Λ, the span closure of {ζ_t^j : t ∈ Z, 1 ≤ j ≤ r}, may be written as

    ∫ Ψ̃(λ) Λ(dλ)  with  ∫ ‖Ψ̃(λ)‖² dλ < ∞.

From the representation (4.74) it is clear that H_X ⊂ H_Λ. On the other hand, since each Λ^j(Δ) = ∫_Δ Ψ^j(λ) ξ(dλ) belongs to H_X, we have H_Λ ⊂ H_X, so H_X = H_Λ. We only need to show that X_t has rank r. That is, we must show that its spectral density, which turns out to be f(λ) = Φ(λ)Φ*(λ), has rank r for a.e. λ. Suppose not; that is, suppose that on a set of positive measure the rank of the matrix f(λ) is less than r. Then there exists a vector function Ψ̃(λ) = [ψ̃^j(λ)] such that ∫ ‖Ψ̃(λ)‖² dλ ≠ 0 and Φ(λ)Ψ̃*(λ) = 0 for a.e. λ. This would mean that the element u = ∫ Ψ̃(λ) Λ(dλ) is in H_Λ and at the same time orthogonal to H_Λ = H_X, because for every t and j,

    ⟨u, X_t^j⟩ = ∫ e^{−itλ} Ψ̃(λ) Φ_j*(λ) dλ = 0,

where Φ_j denotes the jth row of Φ; this is a contradiction. ∎
The moving average representation claimed in this theorem is not unique. In fact, every such factorization of the form (4.71) of the spectral density f gives rise to one such moving average representation. The moving average representation in the last result is two sided. However, one sided moving averages are more useful in studying prediction of regular stationary processes and other issues studied in the next subsection.
4.4.4
Regular Processes
We start by defining the multivariate versions of uncorrelated and white noise sequences. A multivariate sequence ξ_t = [ξ_t^j]_{j=1}^{q} is called uncorrelated if for any t ≠ s,

    Cov(ξ_t^j, ξ_s^k) = 0,  for every j, k = 1, 2, ..., q.

Uncorrelated multivariate sequences are not necessarily stationary. For a simple univariate example, start with an orthonormal sequence Y_t and set X_t = Y_t for even t and X_t = 0.5 Y_t for odd t. When an uncorrelated multivariate sequence ξ_t is stationary it is called a white noise, and in this case we have Cov(ξ_t, ξ_s) = Σ δ_{t−s}. We call a white noise ξ_t for which Cov(ξ_t, ξ_s) = I δ_{t−s} a normalized white noise. As we see in the next lemma, any white noise can be expressed in terms of a normalized white noise.

Lemma 4.15 Every white noise ξ_t of rank r ≤ q can be written as ξ_t = A ε_t, where A is a q × r matrix and ε_t is an r-variate normalized white noise. In this case we also have Cov(ξ_t, ξ_t) = AA'.
Proof. Let {ε_0^1, ε_0^2, ..., ε_0^r} be an orthonormal basis for sp{ξ_0^j : 1 ≤ j ≤ q} and ε_0 = [ε_0^1, ..., ε_0^r]'. Now let A be the q × r matrix that expresses the components of ξ_0 in terms of the orthonormal basis {ε_0^1, ..., ε_0^r}; that is, ξ_0 = A ε_0. Let U be the unitary shift for ξ_t and set ε_t = U^t ε_0 = [U^t ε_0^1, U^t ε_0^2, ..., U^t ε_0^r]' for each integer t. It is easy to see that ε_t is a normalized white noise for which

    ξ_t = U^t ξ_0 = U^t A ε_0 = A U^t ε_0 = A ε_t.
The statement about the covariance follows from ξ_t = A ε_t and the easily checked fact that Cov(ε_t, ε_t) = I_r. ∎
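The earlier univariate example of an uncorrelated but nonstationary sequence (X_t = Y_t for even t and 0.5 Y_t for odd t) is easy to check by simulation; an illustrative sketch (seed and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
T, reps = 6, 200_000
Y = rng.standard_normal((reps, T))             # i.i.d. N(0,1), an orthonormal sequence
scale = np.where(np.arange(T) % 2 == 0, 1.0, 0.5)
X = Y * scale                                  # X_t = Y_t (t even), 0.5 Y_t (t odd)

# Uncorrelated: E{X_s X_t} = 0 for s != t, yet Var(X_t) alternates 1, 0.25,
# so the sequence is not stationary.
var = X.var(axis=0)
assert np.allclose(var[::2], 1.0, atol=0.02)
assert np.allclose(var[1::2], 0.25, atol=0.02)
```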
The preceding lemma enables us to show that any uncorrelated multivariate stationary sequence is regular. We start with the following lemma.

Lemma 4.16 Every multivariate normalized white noise ε_t = [ε_t^j] is regular.

Proof. The assumptions on the sequence ε_t ensure that {ε_t^j : t ∈ Z, 1 ≤ j ≤ q} forms an orthonormal basis for H_ε. So any fixed vector u ∈ H_ε(−∞) ⊂ H_ε can be written

    u = Σ_{t,j} a_t^j ε_t^j  with  Σ_{t,j} |a_t^j|² < ∞.

On the other hand, the expression

    u ∈ H_ε(−∞) ⊂ H_ε(t − 1) ⊂ H_ε(t),  t ∈ Z,

in conjunction with the fact that ε_t^j ⊥ H_ε(t − 1), implies that

    a_t^j = ⟨u, ε_t^j⟩ = 0  for each t ∈ Z, 1 ≤ j ≤ q.

This, in turn, implies that u = 0. Therefore we get H_ε(−∞) = {0}, which completes the proof. ∎
Proposition 4.11 Every uncorrelated multivariate stationary sequence ξ_t is regular.

Proof. By Lemma 4.15 we can write ξ_t = A ε_t for some q × r matrix A and some r-variate normalized white noise ε_t. This implies that H_ξ(t) ⊂ H_ε(t) for every integer t, and hence

    H_ξ(−∞) ⊂ H_ε(−∞).

But H_ε(−∞) = {0} by Lemma 4.16, and so H_ξ(−∞) = {0}. This completes the proof. ∎

Next we characterize all multivariate regular stationary sequences.
Proposition 4.12 (Moving Average Representation)

(a) Any q-variate one sided moving average of an r-variate normalized white noise ε_t,

    X_t = Σ_{s≥0} A_s ε_{t−s},    (4.75)

with

    Σ_{s≥0} |a_s^{jk}|² < ∞,  1 ≤ j ≤ q, 1 ≤ k ≤ r,    (4.76)

is stationary and regular.

(b) Any q-variate regular stationary process X_t of rank r ≤ q has a one sided moving average representation

    X_t = Σ_{s≥0} A_s ε_{t−s},    (4.77)

where ε_t is an r-variate normalized white noise and the matrices A_s = [a_s^{jk}] satisfy (4.76).
and E t having Proof. For (a), suppose Xt is given by (4.75) with the stated properties. First note that orthonormality of all the components { E : : t E Z,1 5 j 5 r } of E t and the square summability (4.76) of { u l k } ensure that X i is well defined for every integer t and any j = 1 . 2 , . . . , q. It is obvious from (4.75) that X x ( t ) c X & ( t )for every integer t and hence
c Xe(-co).
Xx(-co)
This, together with the regularity of ε_t, implies that X_t is regular. Now consider the unitary operator U defined on H_ε via U ε_t^j = ε_{t+1}^j, t ∈ Z, j = 1, 2, …, r. The linearity and continuity of U imply that

U X_t = U( Σ_{j≥0} A_j ε_{t−j} ) = Σ_{j≥0} A_j U ε_{t−j} = Σ_{j≥0} A_j ε_{t+1−j} = X_{t+1},

showing that U serves as a unitary shift for X_t, and hence X_t is stationary.

For (b), conversely suppose X_t is a regular q-variate stationary sequence of rank r ≤ q. Since the innovation spaces Z_X(t) appearing in (4.66) are of dimension r and Z_X(t) ⊥ Z_X(s) for t ≠ s, we may express any vector u ∈ H_X(t) as
MULTIVARIATE PREDICTION THEORY 119

u = Σ_{s≥0} Σ_{j=1}^{r} (u, ε_{t−s}^j) ε_{t−s}^j,

where {ε_0^j : 1 ≤ j ≤ r} is any fixed orthonormal basis for Z_X(0) and

ε_t^j = U^t ε_0^j,  t ∈ Z, 1 ≤ j ≤ r.
In particular, taking u = X_t^j, we get

X_t^j = Σ_{s≥0} Σ_{k=1}^{r} (X_t^j, ε_{t−s}^k) ε_{t−s}^k.

Thus we have shown that each component of X_t, and hence X_t itself, has the representation (4.77). It remains to show that each coefficient (X_t^j, ε_{t−s}^k) is independent of t. However, this is an immediate consequence of the uniqueness of such expansions together with

X_{t+1}^j = Σ_{s≥0} Σ_{k=1}^{r} (X_{t+1}^j, ε_{t+1−s}^k) ε_{t+1−s}^k

and

U X_t^j = U( Σ_{s≥0} Σ_{k=1}^{r} (X_t^j, ε_{t−s}^k) ε_{t−s}^k ) = Σ_{s≥0} Σ_{k=1}^{r} (X_t^j, ε_{t−s}^k) ε_{t+1−s}^k = X_{t+1}^j.

Here U may be brought inside the sum due to the convergence of the partial sums and the continuity of U. The last equality follows from the fact that ε_{t+1}^j = U ε_t^j, a conclusion that may be drawn from item (a) of Lemma 4.3. Taking a_s^{jk} = (X_t^j, ε_{t−s}^k) one arrives at (4.76).
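The stationarity claim of part (a) can be sketched numerically. The block below (with hypothetical coefficient matrices A_0, A_1, A_2) computes Cov(X_t, X_s) for a 2-variate MA(2) directly from the orthonormality of the white-noise components and checks that it depends only on the lag t − s.

```python
import numpy as np

# Hypothetical coefficients for a 2-variate MA(2) driven by 2-variate
# normalized white noise: X_t = sum_{s>=0} A_s eps_{t-s}  (cf. (4.75)).
A = [np.array([[1.0, 0.0], [0.2, 1.0]]),
     np.array([[0.5, 0.1], [0.0, 0.3]]),
     np.array([[0.2, 0.0], [0.1, 0.1]])]   # A_s = 0 for s > 2

def cov(t, s):
    """Cov(X_t, X_s) = sum over shared noise indices u of A_{t-u} A_{s-u}^T,
    using orthonormality of the white-noise components."""
    total = np.zeros((2, 2))
    for u in range(min(t, s) - 2, min(t, s) + 1):   # only lags 0..2 contribute
        i, j = t - u, s - u
        if 0 <= i <= 2 and 0 <= j <= 2:
            total += A[i] @ A[j].T
    return total

# Stationarity: the covariance depends on t - s only.
for (t, s), (t2, s2) in [((5, 3), (9, 7)), ((4, 4), (8, 8)), ((6, 5), (3, 2))]:
    assert np.allclose(cov(t, s), cov(t2, s2))
```

This also anticipates Problem 4.10, which asks for exactly this component-wise verification.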
4.4.5
Infinite Past Prediction
As in the univariate case in Section 4.2, we can evaluate the prediction error of X_δ, δ ≥ 1, by expressing it in terms of the innovations of X_t.

Proposition 4.13 If X_t is a regular q-variate stationary sequence with one-sided moving average representation (4.77) in terms of its innovations ζ_t, then its δ-step-ahead predictor X̂_δ = (X_δ | H_X(0)) based on its past is given by

X̂_δ = Σ_{s=δ}^{∞} A_s ζ_{δ−s}.   (4.78)

Its prediction error is given by

X_δ − X̂_δ = Σ_{s=0}^{δ−1} A_s ζ_{δ−s}   (4.79)
120
STATIONARY RANDOM SEQUENCES
and has variance

Var(X_δ − X̂_δ) = Σ_{s=0}^{δ−1} A_s Σ A_s^*,   (4.80)

where Σ denotes the innovation covariance Cov(ζ_0, ζ_0).
Proof. It is clear that Σ_{s≥δ} A_s ζ_{δ−s} belongs to H_X(0) and that Σ_{s=0}^{δ−1} A_s ζ_{δ−s} is orthogonal to H_X(0). These facts, in conjunction with the uniqueness of the projection, give

X̂_δ = (X_δ | H_X(0)) = Σ_{s=δ}^{∞} A_s ζ_{δ−s}.

The error formulas are now immediate. ∎
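A small numerical sketch of the error formulas (4.79)-(4.80), using illustrative matrices A_s and an assumed innovation Gramian Σ: the δ-step error covariance Σ_{s&lt;δ} A_s Σ A_s^* is computed for two horizons and checked to be monotone in the Loewner order.

```python
import numpy as np

# Sketch of (4.80): with X_t = sum_s A_s zeta_{t-s} and innovation covariance
# Sigma, the delta-step error covariance is sum_{s<delta} A_s Sigma A_s^T.
# A and Sigma are illustrative, not taken from the text.
A = [np.eye(2),
     np.array([[0.5, 0.2], [0.1, 0.3]]),
     np.array([[0.2, 0.0], [0.0, 0.1]])]
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])   # hypothetical innovation Gramian

def error_cov(delta):
    return sum(A[s] @ Sigma @ A[s].T for s in range(min(delta, len(A))))

E1, E2 = error_cov(1), error_cov(2)
# The error grows (in the Loewner order) with the prediction horizon:
assert np.all(np.linalg.eigvalsh(E2 - E1) >= -1e-12)
```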
Corollary 4.13.1 Let X_t be a q-variate stationary sequence with innovation process ζ_t, and let X_t = Y_t + Z_t be its Wold decomposition (4.65) of Proposition 4.10, with its regular component Z_t having the one-sided moving average representation Z_t = Σ_{s≥0} A_s ζ_{t−s} satisfying (4.76). Then the δ-step-ahead predictor of X_δ based on its past X_0, X_{−1}, … is given by

X̂_δ = Y_δ + Σ_{s=δ}^{∞} A_s ζ_{δ−s}.   (4.81)

In this case (4.79) and (4.80) remain true.

Proof. From the Wold decomposition theorem we see that H_X(0) = H_X(−∞) ⊕ H_Z(0), which implies that Y_δ + Σ_{s≥δ} A_s ζ_{δ−s} belongs to H_X(0). On the other hand, Σ_{s=0}^{δ−1} A_s ζ_{δ−s} is obviously orthogonal to H_X(0), which completes the proof of the first part. The error formulas are again immediate. ∎
In principle, the moving average matrix coefficients A_s can be found in terms of the matrix autocorrelation function R(t). One can prove the following corollary as in its univariate version.

Corollary 4.13.2 If X_t is a regular stationary sequence with one-sided moving average representation X_t = Σ_{k=0}^{∞} A_k ζ_{t−k} in terms of its innovation process ζ_t, with the A_k satisfying (4.76), then

R(t) = Σ_{s=0}^{∞} A_{s+t} Σ A_s^*,  t ∈ Z.

Here we use the convention of taking A_s = 0 for each negative s.
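The recovery of moving average coefficients from the autocorrelation can be illustrated numerically. The sketch below (illustrative 2-variate MA(1) with normalized noise, so the innovation Gramian is the identity) builds the block Toeplitz covariance matrix and checks that the last block row of its Cholesky factor approximates A_0 and A_1, in the spirit of the matricial Cholesky factorization discussed in the text.

```python
import numpy as np

# Illustrative invertible 2-variate MA(1): X_t = A0 eps_t + A1 eps_{t-1},
# normalized noise.  The Cholesky factor of the block Toeplitz covariance of
# (X_1, ..., X_n) has a last block row converging to (..., 0, A1, A0).
q, n = 2, 40
A0 = np.eye(q)
A1 = np.array([[0.5, 0.2], [0.1, 0.3]])   # zeros of det(A0 + A1 z) outside |z| <= 1
R0 = A0 @ A0.T + A1 @ A1.T                # R(0)
R1 = A1 @ A0.T                            # R(1) = Cov(X_{t+1}, X_t)

T = np.zeros((q * n, q * n))
for i in range(n):
    T[q*i:q*i+q, q*i:q*i+q] = R0
    if i + 1 < n:
        T[q*(i+1):q*(i+1)+q, q*i:q*i+q] = R1
        T[q*i:q*i+q, q*(i+1):q*(i+1)+q] = R1.T

L = np.linalg.cholesky(T)                 # lower-triangular factor, T = L L^T
B0 = L[q*(n-1):, q*(n-1):]                # last diagonal block ~ A0
B1 = L[q*(n-1):, q*(n-2):q*(n-1)]         # last sub-diagonal block ~ A1
assert np.allclose(B0, A0, atol=1e-4)
assert np.allclose(B1, A1, atol=1e-4)
```

The agreement is only asymptotic in n (finite-past innovations converge to infinite-past ones), which is why the comparison uses a tolerance.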
As in the univariate case, one can write the last equation in matrix form as

R = Ã Σ̃ Ã*,

where for any j, k = 0, 1, …, the block entries of the matrices R and Ã are defined by R_{kj} = R(j − k) and Ã_{kj} = A_{j−k}, respectively, and Σ̃ is the block diagonal matrix with diagonal blocks Σ. This again suggests that the moving average coefficients for a regular sequence, and hence its predictor's coefficients {A_k : k = 0, 1, …}, can be obtained from a matricial Cholesky factorization of the covariance matrix R. For further discussion see Propositions 4.6 and 8.9.

4.4.6 Spectral Theory and Rank
The notion of rank of multivariate stationary sequences can be described nicely in spectral terms. This is another reason that the innovation rank is considered a more natural notion than the rank of R(0). The following is due to Rozanov [199] who credits a 1941 note of Zasuhin [229].
Theorem 4.11 (Spectral Characterization of Regularity) A q-variate stationary sequence X_t is regular if and only if all entries F_jk of its spectral measure F are absolutely continuous w.r.t. Lebesgue measure and its spectral density f(λ) can be factored as

f(λ) = Φ(λ) Φ*(λ)   (4.82)

with the factor Φ(λ) being a q × r matrix-valued function (for some r ≤ q) with entries in L²[0, 2π] such that

Φ_{jk}(λ) = Σ_{n=0}^{∞} a_n^{jk} e^{−iλn},  Σ_{n=0}^{∞} |a_n^{jk}|² < ∞,  j = 1, 2, …, q; k = 1, 2, …, r.
Proof. Suppose X_t is a regular q-variate stationary sequence. By part (b) of Proposition 4.12 it has a one-sided moving average representation X_t = Σ_{s≥0} A_s ε_{t−s} with the properties specified there. The r-variate normalized white noise sequence ε_t has a spectral representation

ε_t = ∫_0^{2π} e^{itλ} η(dλ),

where the r-variate random measure η(dλ) has orthogonal components and uniform spectral density.
Substituting for ε_t in (4.77) we get

X_t = ∫_0^{2π} e^{itλ} Φ(λ) η(dλ),

which gives

R(τ) = ∫_0^{2π} Φ(λ) Φ*(λ) e^{iτλ} dλ.

From this, by virtue of the uniqueness of Fourier transforms, we arrive at f(λ) = Φ(λ)Φ*(λ), which is the desired factorization. Conversely, suppose X_t is a q-variate stationary sequence having an absolutely continuous spectral measure with spectral density f(λ) satisfying (4.82). Consider an infinite-dimensional Hilbert space with an orthonormal basis specially labeled as {ζ_t^j : t ∈ Z, 1 ≤ j ≤ r}, and let Y_t be the q-variate sequence defined by

Y_t = Σ_{s≥0} A_s ζ_{t−s},

where the A_s are the Fourier coefficients of Φ.
It is clear that Y_t is regular and has autocovariance function

R_Y(τ) = ∫_0^{2π} Φ(λ) Φ*(λ) e^{iτλ} dλ.

This shows that X_t and Y_t have the same correlation structure, and hence X_t is regular as well. ∎
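The factorization f(λ) = Φ(λ)Φ*(λ) can be checked numerically for a finite moving average: the Fourier coefficients of ΦΦ* should reproduce the autocovariances. The 1/(2π) normalization in the sketch below is an assumption of the example, not necessarily the book's convention.

```python
import numpy as np

# For a finite MA with coefficients A_n, Phi(lam) = sum_n A_n e^{-i lam n},
# and the Fourier coefficients of Phi Phi* recover R(tau) = sum_s A_{s+tau} A_s^*.
A = [np.eye(2), np.array([[0.5, 0.2], [0.1, 0.3]])]   # illustrative coefficients

N = 64                                   # enough grid points for exact quadrature
lams = 2 * np.pi * np.arange(N) / N
R1 = np.zeros((2, 2), dtype=complex)
for lam in lams:
    Phi = sum(A[n] * np.exp(-1j * lam * n) for n in range(len(A)))
    f = Phi @ Phi.conj().T               # spectral density, up to normalization
    R1 += f * np.exp(1j * 1 * lam) / N   # (1/2pi) integral, via Riemann sum

assert np.allclose(R1, A[1] @ A[0].T, atol=1e-10)   # R(1) = A_1 A_0^*
```

For trigonometric polynomials the N-point Riemann sum is an exact Fourier coefficient, which is why the tolerance can be so tight.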
Proposition 4.14 The various ranks of a regular q-variate stationary sequence X_t satisfy the following: if s denotes the a.e. rank of its spectral density f(λ) and r its innovation rank, then

s = r ≤ q.

Proof. By Proposition 4.12, X_t = Σ_{s≥0} A_s ζ_{t−s}, where ζ_t is an r-variate white noise. This forces its spectral density to factor as in (4.82),

f(λ) = Φ(λ) Φ*(λ),   (4.84)

with Φ(λ) being a q × r matrix-valued function. Now, on one hand, since Φ(λ) is q × r, its rank can never exceed r. On the other hand, its rank is, by Lemma 4.12, a.e. equal to s. That is, s ≤ rank Φ(λ) ≤ r.
Since f(λ) has rank s, for a.e. λ we have f(λ) = Φ̃(λ) Φ̃*(λ), where Φ̃(λ) is q × s. Let η be an s-dimensional vector of orthogonally scattered random measures and set

Y_t = ∫_0^{2π} Φ̃(λ) e^{itλ} η(dλ).

Then R_Y(τ) = R_X(τ), so X_t and Y_t are essentially the same. But Φ̃(λ) is L²[0, 2π) because f(λ) is L¹[0, 2π). So Φ̃(λ) has an L² Fourier series, Φ̃(λ) = Σ_{j=0}^{∞} A_j e^{ijλ}, leading to

Y_t = Σ_{j=0}^{∞} A_j ε_{t−j},

where ε_t is an s-vector of mutually orthogonal white sequences. But this means that the innovation space of Y_t is of dimension at most s, meaning r ≤ s. ∎
4.4.7
Spectral Theory and Prediction
If X_t is a q-variate regular sequence of rank r, then by part (b) of Proposition 4.12 it has a one-sided moving average representation (with convergence in the mean-square sense, component-wise)

X_t = Σ_{k=0}^{∞} A_k ζ_{t−k}

in terms of a sequence of q × r matrices A_k and an r-variate white noise ζ_t with I(t) = sp{ζ_t}. Hence, as discussed earlier, one can write the predictor of X_δ based on its past …, X_{−1}, X_0 as

X̂_δ = Σ_{k=δ}^{∞} A_k ζ_{δ−k}.   (4.85)

But this representation of the predictor is in terms of the innovations ζ_t, which usually cannot be observed. To resolve this problem, note that for each integer t the innovation ζ_t belongs to H_X(t) and thus is a limit of finite linear combinations of the past of X_t, namely,

ζ_t = lim_{n→∞} Σ_{k=0}^{n} C_k(n) X_{t−k},
where the fact that C_k(n) is independent of t is due to the stationarity of X_t. Under suitable conditions on f (cf. [152, 156, 159, 225]) one can show that the limit can be achieved by a series,

ζ_t = Σ_{k=0}^{∞} C_k X_{t−k},

with convergence in the mean-square sense. Substituting for ζ_t from the last equation into (4.85), we obtain

X̂_δ = Σ_{k=0}^{∞} ( Σ_{n=0}^{k} A_{δ+n} C_{k−n} ) X_{−k}.

This expresses the predictor in terms of the past of the process itself, which is observable. Algorithms for determining the coefficients A_k, C_k, and subsequently D_k = Σ_{n=0}^{k} A_{δ+n} C_{k−n}, from the spectral density, generalizing our presentation in the univariate case, are available in the literature. For a full account one can refer to [152, 224, 225].
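The coefficients C_k of the inverse generating function can be computed recursively from the identity Φ(z) Φ^{-1}(z) = I. A sketch for an illustrative invertible 2-variate MA(1):

```python
import numpy as np

# For Phi(z) = A0 + A1 z (illustrative matrices), Phi^{-1}(z) = sum_k C_k z^k
# with C_0 = A0^{-1} and C_k = -A0^{-1} A1 C_{k-1} for k >= 1, obtained by
# matching coefficients in Phi(z) Phi^{-1}(z) = I.
A0 = np.eye(2)
A1 = np.array([[0.5, 0.2], [0.1, 0.3]])
K = 30

C = [np.linalg.inv(A0)]
for k in range(1, K):
    C.append(-np.linalg.inv(A0) @ A1 @ C[-1])

# Check the convolution sum_{n<=k} A_n C_{k-n} = I, 0, 0, ...
A = [A0, A1]
for k in range(3):
    conv = sum(A[n] @ C[k - n] for n in range(min(k + 1, len(A))))
    target = np.eye(2) if k == 0 else np.zeros((2, 2))
    assert np.allclose(conv, target, atol=1e-12)
```

Given the C_k, the observable predictor coefficients D_k of the display above are just the convolution of the shifted A sequence with C.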
Lemma 4.17 Let X_t be a regular full-rank q-variate stationary sequence with spectral distribution function F and spectral density f. Suppose ζ_t and Φ denote its q-variate innovation and generating function, respectively. Then

(a) e^{itλ} Φ^{−1} is in the spectral domain L²(F) of X_t and corresponds to ζ_t in its time domain H_X;

(b) for any Ψ ∈ L²(F), ΨΦ ∈ L²[0, 2π);

(c) for any Ψ ∈ L²(F), if A_k is the k-th Fourier coefficient of ΨΦ, then

lim_{n→∞} ( Σ_{k=−n}^{n} A_k e^{ikλ} ) Φ^{−1} = Ψ, in the L²(F) sense.
Theorem 4.12 Suppose the spectral measure F of a random sequence X_t is absolutely continuous w.r.t. Lebesgue measure and its spectral density f satisfies the boundedness condition

νI ≤ f ≤ μI,  ν > 0.   (4.86)

Then for any δ ≥ 1,

X̂_δ = Σ_{k=0}^{∞} ( Σ_{n=0}^{k} A_{δ+n} C_{k−n} ) X_{−k}.
The function Φ = Σ_{k=0}^{∞} A_k e^{−ikλ} is called the spectral generating function of X_t and Φ^{−1} = Σ_{k=0}^{∞} C_k e^{−ikλ} is its inverse. In this case the prediction error matrix of lag δ is given by

Var(X_δ − X̂_δ) = Σ_{n=0}^{δ−1} A_n A_n^*,

where, in particular, R(0) = Cov(X_0, X_0) = Σ_{n=0}^{∞} A_n A_n^*. Note that when the boundedness condition (4.86) is assumed, the corresponding processes turn out to be of full rank; therefore this last theorem holds only for full-rank processes. For generalizations of this result to the non-full-rank case, see [156] and [159].

4.4.8 Finite Past Prediction
The problem addressed here is that of predicting a member, say X_{t+δ}, of a q-variate stationary sequence X_t based on a finite number of observations {X_{t−n+1}, X_{t−n+2}, …, X_t} of its past. We take the best linear predictor to mean the component-wise orthogonal projection of X_{t+δ} onto M_X(t; n) = sp{X_s^j : t − n < s ≤ t, 1 ≤ j ≤ q}, namely,

X̂_{t+δ,n} = (X_{t+δ} | M_X(t; n)).   (4.87)
As in the univariate case, we are especially interested in δ = 1 and δ = −n. We will treat only the case δ = 1 and assume the process is real; completion of some details, being similar to the univariate case, is suggested as a problem. Assuming δ = 1, we seek the coefficients of the following linear expression:

X̂_{t+1,n} = Σ_{j=1}^{n} A_{nj} X_{t+1−j}.   (4.88)
The multivariate normal equations arising from the properties of projection can be written in terms of all the components of X_{t+1} − X̂_{t+1,n} and X_s, but they are more conveniently expressed as

(X_{t+1} − X̂_{t+1,n}, X_s) = 0,  t − n + 1 ≤ s ≤ t,
where the Gramian (X, Y) of any two q-vectors X = [X¹, X², …, X^q]′ and Y = [Y¹, Y², …, Y^q]′ is the q × q matrix whose (i, j)-th entry is (X^i, Y^j). These normal equations can be expressed in terms of the autocorrelation function as

Σ_{j=1}^{n} A_{nj} R(t + 1 − j − s) = R(t + 1 − s),  t − n + 1 ≤ s ≤ t,   (4.89)
or, setting m = t + 1 − s, in matrix form

Σ_{j=1}^{n} A_{nj} R(m − j) = R(m),  m = 1, 2, …, n.   (4.90)

Finally, with R_n = [R(j − k)]_{j,k=1}^{n} and R′_n = [R(1), R(2), …, R(n)]′, the normal equations can be expressed compactly as

R′_n = R_n A_n.   (4.91)
As we see, these matrices R′_n, R_n, and hence A_n, do not depend on t, which is of course due to the stationarity of X_t. For any A_n = [A_{n1} A_{n2} … A_{nn}]′ that solves (4.90), the prediction error

X_{t+1} − X̂_{t+1,n}   (4.92)

has covariance

Σ_n(t + 1) = Var(X_{t+1} − X̂_{t+1,n})
           = (X_{t+1} − X̂_{t+1,n}, X_{t+1})
           = R(0) − Σ_{j=1}^{n} A_{nj} R(j)
           = R(0) − (R′_n)′ A_n.   (4.93)

The second line follows from the fact that (X_{t+1} − X̂_{t+1,n}, X̂_{t+1,n}) = 0. The last line shows that Σ_n(t + 1) is independent of t, and so from now on we denote it by Σ_n. The corresponding relationship for predicting based on M(t; n) = sp{X_s^j : t − n < s ≤ t, 1 ≤ j ≤ q} follows exactly as in the univariate case (see the discussion following (4.36) leading up to (4.42)). In the following proposition we extend to the multivariate case some of the results we proved earlier for the univariate case. Here we use the following notation: Σ = [σ^{ij}] = Var(X_{t+1} − X̂_{t+1}) and Σ_n = [σ_n^{ij}] = Var(X_{t+1} − X̂_{t+1,n}). Recall that for two n × n matrices A and B we write A ≥ B if A − B is nonnegative definite.
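The normal equations and the error covariance (4.93) can be solved numerically. The sketch below uses an illustrative 2-variate MA(1) under the convention R(h) = E[X_{t+h} X_t′]; since the unknown matrices multiply R on the left, a transposed block system is solved. It then checks the monotonicity Σ_1 ≥ Σ_3 of Proposition 4.15(a).

```python
import numpy as np

# Illustrative 2-variate MA(1): X_t = A0 eps_t + A1 eps_{t-1}, normalized noise.
A0, A1 = np.eye(2), np.array([[0.5, 0.2], [0.1, 0.3]])
R = {0: A0 @ A0.T + A1 @ A1.T, 1: A1 @ A0.T}

def Rh(h):                       # R(-h) = R(h)^T for a real process
    if abs(h) > 1:
        return np.zeros((2, 2))
    return R[h] if h >= 0 else R[-h].T

def error_cov(n):
    q = 2
    # sum_j A_{nj} R(m - j) = R(m)  <=>  G a = b, with G[m,j] = R(m-j)^T and
    # the unknowns a = stacked A_{nj}^T.
    G = np.block([[Rh(m - j).T for j in range(1, n + 1)] for m in range(1, n + 1)])
    b = np.vstack([Rh(m).T for m in range(1, n + 1)])
    a = np.linalg.solve(G, b)
    Anj = [a[q*(j-1):q*j, :].T for j in range(1, n + 1)]
    # Sigma_n = R(0) - sum_j A_{nj} R(-j)
    return R[0] - sum(Anj[j-1] @ Rh(-j) for j in range(1, n + 1))

S1, S3 = error_cov(1), error_cov(3)
assert np.all(np.linalg.eigvalsh(S1 - S3) >= -1e-10)   # Sigma_1 >= Sigma_3
```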
Proposition 4.15 If X_t is a q-variate stationary sequence, then

(a) for any two positive integers n and k with k > n,

Σ_n ≥ Σ_k ≥ Σ;

(b) for every j = 1, 2, …, q, the sequence σ_n^{jj} is bounded and nonincreasing;
(c) whenever σ_n^{jj} = 0 for some j and some n, then σ_m^{jj} = 0 for any m ≥ n;

(d) σ_n^{jj} → σ^{jj}, for any j = 1, 2, …, q;

(e) lim_{n→∞} Σ_n = Σ, entry-wise;

(f) lim_{n→∞} X̂_{t+1,n} = X̂_{t+1};

(g) the rank of Σ_n is nonincreasing and the nullity of Σ_n is nondecreasing;

(h) lim_{n→∞} |Σ_n| = |Σ|;

(i) if R_n is invertible, then |Σ_n| = |R_{n+1}| / |R_n|;

(k) whenever X_t is full rank and nondeterministic, then |R_n| ≠ 0 for all n ≥ 1, and we have

|Σ| = exp( lim_{n→∞} (1/n) ln |R_n| ) > 0.
Proof. For (a), it is not hard to see that for any n′ ≥ n,

Σ_n − Σ_{n′} = Var(X̂_{t+1,n′} − X̂_{t+1,n}).

But the term on the right-hand side, being a covariance, is nonnegative, thus proving the first inequality in (a). The second one is similar. Part (b) is an immediate consequence of part (a), noting that the diagonal elements of a nonnegative definite matrix are nonnegative. Part (c) is immediate from part (b). The limit argument in (d) is actually univariate and can be argued as we did in part (b) of Proposition 4.5. For (e), since Σ_n − Σ is nonnegative and hence its diagonal entries dominate the rest, it suffices to show

lim_{n→∞} (σ_n^{jj} − σ^{jj}) = 0,
and this was proved in part (d). For (f), we can write for any j = 1, 2, …, q

σ_n^{jj} = ||X_{t+1}^j − X̂_{t+1,n}^j||² = ||X_{t+1}^j − X̂_{t+1}^j||² + ||X̂_{t+1,n}^j − X̂_{t+1}^j||²,   (4.94)

which implies

||X̂_{t+1,n}^j − X̂_{t+1}^j||² = σ_n^{jj} − σ^{jj}.

By virtue of the convergence statement of part (d), this completes the proof of (f). For (g), since it is well known that null Σ_n + rank Σ_n = q, it is enough to verify the statement only for rank. For this, note that for each positive integer n, rank Σ_n = dim I_n is the maximum number of linearly independent (LI) vectors in the generators {X_{t+1}^j − X̂_{t+1,n}^j : 1 ≤ j ≤ q} of I_n. So it suffices to show that for any two integers n′ ≥ n the maximum number of LI vectors in the generators of I_{n′} does not exceed (≤) the maximum number of LI vectors in the generators of I_n. The inequality is true if every linearly dependent (LD) set of generators from I_n is also LD in I_{n′}. We prove the latter by considering the subsets of the generators in order, as in the Gram-Schmidt process. Indeed, suppose the s (≤ q) vectors {X_{t+1}^j − X̂_{t+1,n}^j : 1 ≤ j ≤ s} are LD; then there exist scalars {a_j : 1 ≤ j ≤ s}, not all zero, such that

Σ_{j=1}^{s} a_j (X_{t+1}^j − X̂_{t+1,n}^j) = 0.

Thus Σ_{j=1}^{s} a_j X_{t+1}^j is in M(t, n) and hence in M(t, n′), which in turn implies

Σ_{j=1}^{s} a_j (X_{t+1}^j − X̂_{t+1,n′}^j) = 0.
Therefore {X_{t+1}^j − X̂_{t+1,n′}^j : 1 ≤ j ≤ s} are LD, which completes the proof (see also Problem 4.16). Part (h) follows from part (e) and the fact that the determinant of a matrix is an algebraic, hence continuous, function of its entries. The first equality in part (i) follows from taking the determinant of both sides of (4.93). For the second equality, the partitioning

R_{n+1} = [ R(0)  (R′_n)′ ; R′_n  R_n ],   (4.95)

in conjunction with the fact that if A and D are invertible n × n and m × m matrices then

det [ A  B ; C  D ] = det A det(D − C A^{−1} B),

gives us

|R_{n+1}| = |R(0) − (R′_n)′ R_n^{−1} R′_n| |R_n| = |Σ_n| |R_n|

(see [183, Ch. 7, Problem 14] and [148, Appendix A]). For (k), since |Σ_n| → |Σ| by part (h), we obtain

|Σ| = lim_{n→∞} |R_{n+1}| / |R_n| = exp( lim_{n→∞} (1/n) ln |R_n| ) > 0. ∎
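The determinant identity of part (i) is easy to verify directly for a univariate MA(1) with R(0) = 1.25, R(1) = 0.5 (an illustrative choice, corresponding to X_t = ε_t + 0.5 ε_{t−1}):

```python
import numpy as np

def R(h):
    return {0: 1.25, 1: 0.5}.get(abs(h), 0.0)

def toeplitz(n):
    return np.array([[R(i - j) for j in range(n)] for i in range(n)])

for n in range(1, 6):
    Rn, Rn1 = toeplitz(n), toeplitz(n + 1)
    r = np.array([R(m) for m in range(1, n + 1)])   # the vector R'_n
    # Sigma_n = R(0) - (R'_n)' R_n^{-1} R'_n, from (4.93)
    sigma_n = R(0) - r @ np.linalg.solve(Rn, r)
    assert np.isclose(sigma_n, np.linalg.det(Rn1) / np.linalg.det(Rn))
```

For this example σ_n decreases toward the innovation variance 1, in line with part (k)'s limit for (1/n) ln |R_n|.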
PROBLEMS AND SUPPLEMENTS
4.1 If X_t is a stationary process with mean m and shift U, then 1 is an eigenvector of U with eigenvalue 1. For a proof, let {φ_j : j = 1, 2, …} be a complete orthonormal set generated from X_0, X_1, X_{−1}, X_2, X_{−2}, … obtained by the Gram-Schmidt procedure. Now note that each φ_k is a finite linear combination φ_k = Σ_{j=1}^{n_k} a_j^{(k)} X_{t_j}, so that

(φ_k, 1) = Σ_{j=1}^{n_k} a_j^{(k)} (X_{t_j}, 1) = Σ_{j=1}^{n_k} a_j^{(k)} m;

from this it follows that

(φ_k, U1) = Σ_{j=1}^{n_k} a_j^{(k)} (U^{−1} X_{t_j}, 1) = Σ_{j=1}^{n_k} a_j^{(k)} m = (φ_k, 1).

Thus the vectors 1 and U1 have the same Fourier coefficients with respect to the complete orthonormal sequence {φ_j : j = 1, 2, …}, and hence U1 = 1.

4.2 Show that if X_t is q-variate stationary with mean m and 1 ∈ H, then 1 is an eigenvector of U with eigenvalue 1. The sketch for the univariate case can be modified to give (φ_k, 1) = (φ_k, U1) for arbitrary k and hence the proof. The main point to note is that (X_t^j, 1) = m^j = (U^{−1} X_t^j, 1).
4.3 Show that if X_t is a stationary process with mean zero then

P_{sp{1, X_0, X_1, …, X_n}} X_{n+1} = P_{sp{X_0, X_1, …, X_n}} X_{n+1}.

4.4 Consider X_t = ε_t − c ε_{t−1}, with ε_t a white noise. Show that when |c| < 1 the least-squares predictor of X_{t+1} in terms of its past H(t) is given by

X̂_{t+1} = − Σ_{j=1}^{∞} c^j X_{t+1−j}.

What happens if c = 1?
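A simulation sketch of Problem 4.4 (with the illustrative choice c = 0.5): truncating the predictor at K terms, the one-step prediction error should coincide with the innovation ε_{t+1} up to a term of order c^K.

```python
import numpy as np

# X_t = eps_t - c eps_{t-1}, |c| < 1; predictor X^_{t+1} = -sum_{j>=1} c^j X_{t+1-j}.
rng = np.random.default_rng(0)
c, T = 0.5, 200
eps = rng.standard_normal(T)
X = eps.copy()
X[1:] -= c * eps[:-1]

t, K = 150, 100                            # K-term truncation; tail is O(c^K)
pred = -sum(c**j * X[t + 1 - j] for j in range(1, K + 1))
# The prediction error equals the innovation (telescoping sum):
assert abs((X[t + 1] - pred) - eps[t + 1]) < 1e-8
```

When c = 1 the series of coefficients no longer decays, which is the point of the problem's final question.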
4.5 Let X_t be a mean-zero stationary process with autocovariance R(·). Show that the series Σ_{k=0}^{∞} θ_k X_k converges in the mean-square sense if Σ_{i=0}^{∞} Σ_{j=0}^{∞} θ_i θ_j R(i − j) is finite.
4.6 Show that the one-step prediction error variance σ² > 0 if and only if X_t has a nontrivial regular part.
4.7 Show that item (c) of Lemma 4.1 is equivalent to saying that U commutes with the projection onto H(−∞).

4.8 Determine the autocovariance function of the stationary process with spectral density f(λ) = (π − |λ|)/π², λ ∈ [−π, π).

4.9 Show that a stationary sequence X_t is regular if and only if lim_{n→∞} (X_t | H_X(t − n)) = 0. Here is a sketch of the "only if" part. If X_t is regular, it has a representation (4.21), and

||(X_t | H_X(t − n))||² ≤ ||(X_t | H_ε(t − n))||² = Σ_{k=n+1}^{∞} |a_k|² → 0

as n → ∞, as claimed.
4.10 An alternative proof that the infinite moving average X_t in (4.75) is stationary is to take t ≥ s and show that Cov(X_t, X_s) depends only on t − s. Complete the proof by showing this component-wise: that is, by taking any 1 ≤ j, k ≤ q and any integers s and t with t ≥ s and showing R^{jk}(t, s) = R^{jk}(t + 1, s + 1).
4.11 Give an example of a regular q-variate sequence for which rank R(0) = q and rank Σ < q. Now give a q-variate sequence that is singular but with rank R(0) = q.

4.12 Complete the details appearing in (4.88) through Proposition 4.15 for the estimator X̂_{t+δ,n} = (X_{t+δ} | M_X(t; n)) when δ = −n.
4.13 With the notation of Definition 4.9, show that (X|M) is the unique q-variate vector such that

|||X − (X|M)||| = inf{ |||X − Y||| : components of Y are in M },

with |||X||| representing the Euclidean norm (tr(X X*))^{1/2} of X.
4.14 Show that for any n × n invertible matrix A and any m × m invertible matrix D we have

(A + B D B′)^{−1} = A^{−1} − A^{−1} B (B′ A^{−1} B + D^{−1})^{−1} B′ A^{−1}

and

det [ A  B ; C  D ] = det A det(D − C A^{−1} B).
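Both identities of Problem 4.14 are easy to verify numerically for generic matrices (the diagonal shift below is just a convenient way to keep everything invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((n, n)) + 5 * np.eye(n)   # shifted => safely invertible
D = rng.standard_normal((m, m)) + 5 * np.eye(m)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
Ai = np.linalg.inv(A)

# (A + B D B')^{-1} = A^{-1} - A^{-1} B (B' A^{-1} B + D^{-1})^{-1} B' A^{-1}
lhs = np.linalg.inv(A + B @ D @ B.T)
rhs = Ai - Ai @ B @ np.linalg.inv(B.T @ Ai @ B + np.linalg.inv(D)) @ B.T @ Ai
assert np.allclose(lhs, rhs)

# det [[A, B], [C, D]] = det(A) det(D - C A^{-1} B)
M = np.block([[A, B], [C, D]])
assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(D - C @ Ai @ B))
```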
4.15 Suppose X_t and Y_t are stationary processes satisfying X_t − α X_{t−1} = V_t and Y_t − α Y_{t−1} = X_t + Z_t, with V_t and Z_t being two uncorrelated white noises WN(0, σ²) and |α| < 1. Find the spectral density of Y_t.
4.16 If M and N are two subspaces of a vector space X such that for any k LI vectors in N one can find k LI vectors in M , then dim M 2 dim N . Hint: The dimension of any vector space is the maximum number of LI vectors found in that vector space.
CHAPTER 5
HARMONIZABLE SEQUENCES
In various applications, both data and physical models suggest the absence of stationarity. This motivates the extension of the well-developed theory of stationary processes to some classes of nonstationary ones. One such class is that of periodically correlated processes, which is the subject of this book. Another large and useful class is that of harmonizable processes. The concept of harmonizable processes was introduced by Loève (see [139] for a discussion) and was studied by Cramér in his paper on nonstationary processes [35]. The idea there was to extend the harmonic integral representations (4.4) and (4.5) of Theorem 4.2, namely,
X_t = ∫_0^{2π} e^{itλ} ξ(dλ)

and

R(s, t) = R(s − t) = ∫_0^{2π} e^{i(s−t)λ} F(dλ),
Periodically Correlated Random Sequences: Spectral Theory and Practice. By H.L. Hurd and A.G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
beyond the class of stationary processes. In the stationary case, ξ is a Hilbert-space-valued orthogonally scattered vector measure on the Borel subsets B of [0, 2π), but in the case of harmonizable sequences, studied in this chapter, the vector measure ξ is not necessarily orthogonally scattered. Thus we briefly present integration with respect to such general measures (see [48, Chapter I]).

5.1 VECTOR MEASURE INTEGRATION

In this section we review basic properties of vector measures and present integration of scalar functions with respect to such measures.
Definition 5.1 Let F be a field of subsets of a set Ω. A set function ξ from F to a Hilbert space H is called a finitely additive vector measure, or simply a vector measure, if for any two disjoint sets E₁ and E₂ in F, ξ(E₁ ∪ E₂) = ξ(E₁) + ξ(E₂); and it is called countably additive if

ξ( ∪_{n=1}^{∞} E_n ) = Σ_{n=1}^{∞} ξ(E_n)

for any pairwise disjoint sequence E_n of sets in F for which ∪_{n=1}^{∞} E_n is also in F.

EXAMPLE 5.1 A finitely additive vector measure

Let T : L^∞[0, 1] → H be a linear transformation. For each Lebesgue measurable subset E of [0, 1], we define ξ(E) to be T(1_E). Then by the linearity of T it is clear that ξ is a finitely additive vector measure that in general may fail to be countably additive.
EXAMPLE 5.2 A countably additive vector measure

Let T : L¹[0, 1] → H be a continuous linear operator and consider the measure η defined by η(E) = T(1_E) for each Lebesgue measurable subset E of [0, 1]. For each such set E we have ||η(E)|| ≤ ||T|| λ(E); here and throughout this chapter λ stands for Lebesgue measure. Consequently, for any sequence {E_n} of disjoint Lebesgue measurable subsets of [0, 1], we can write

|| η( ∪_{n=1}^{∞} E_n ) − Σ_{n=1}^{m} η(E_n) || ≤ ||T|| Σ_{n=m+1}^{∞} λ(E_n) → 0 as m → ∞,
which means η is countably additive. In the study of vector measures and harmonizable processes, two notions of variation are used.
Definition 5.2 If ξ is a vector measure on a field F of subsets of Ω, then its total variation (or simply variation) is the extended real-valued set function |ξ| : F → R⁺ defined by

|ξ|(B) = sup Σ_{i=1}^{n} ||ξ(B_i)||,

and its semivariation is the extended real-valued set function ||ξ|| : F → R⁺ defined by

||ξ||(B) = sup || Σ_{i=1}^{n} α_i ξ(B_i) ||,

where in each case the sup is over all finite F-partitions {B_i} of B and, for the semivariation, over all finite collections of scalars α_i with |α_i| ≤ 1. The measure ξ is said to be of bounded variation (of bounded semivariation) if |ξ|(Ω) < ∞ (||ξ||(Ω) < ∞).
The finitely additive vector measure ξ of Example 5.1 is of bounded semivariation, because for any finite partition {B_i} of Ω and any scalars α_i with |α_i| ≤ 1, we have

|| Σ_i α_i ξ(B_i) || = || T( Σ_i α_i 1_{B_i} ) || ≤ ||T||.

However, ξ need not be of finite variation. To see this for a suitable T, consider the finite partition of [0, 1] consisting of the intervals E_k = (2^{−k}, 2^{−k+1}], for k = 1, 2, …, n, and E_{n+1} = [0, 2^{−n}), and note that for each positive integer n the sum Σ_{k=1}^{n+1} ||ξ(E_k)|| can be made arbitrarily large. This means |ξ|([0, 1]) is infinite.

The vector measure η in Example 5.2 is of bounded variation. This is an immediate consequence of the fact that for any finite partition {B_i} of Ω we have

Σ_i ||η(B_i)|| ≤ ||T|| Σ_i λ(B_i) = ||T|| λ([0, 1]).
Proposition 5.1 For any vector measure ξ on F and any set B ∈ F,

||ξ||(B) = sup{ |x*ξ|(B) : x* ∈ H*, ||x*|| ≤ 1 }.
Proof. If {B₁, B₂, …, B_m} is a finite F-partition of B and α₁, α₂, …, α_m are scalars with |α_n| ≤ 1, n = 1, 2, …, m, then

|| Σ_{n=1}^{m} α_n ξ(B_n) || = sup{ | x*( Σ_{n=1}^{m} α_n ξ(B_n) ) | : x* ∈ H*, ||x*|| ≤ 1 } ≤ sup{ |x*ξ|(B) : x* ∈ H*, ||x*|| ≤ 1 }.

This implies ||ξ||(B) ≤ sup{ |x*ξ|(B) : x* ∈ H*, ||x*|| ≤ 1 }. Now for any x* ∈ H* with ||x*|| ≤ 1 and any finite F-partition {B₁, B₂, …, B_m} of B, choosing unimodular scalars α_n with α_n x*ξ(B_n) = |x*ξ(B_n)|, we arrive at the following inequalities:

Σ_{n=1}^{m} |x*ξ(B_n)| = | x*( Σ_{n=1}^{m} α_n ξ(B_n) ) | ≤ || Σ_{n=1}^{m} α_n ξ(B_n) || ≤ ||ξ||(B),

which implies the needed reverse inequality. ∎
The next few propositions give some basic properties of variation and semivariation.
Proposition 5.2 For any vector measure ξ and any set B ∈ F,

||ξ||(B) ≤ |ξ|(B).

Proof. It is immediate from the definitions. ∎
Proposition 5.3 The variation |ξ| of a vector measure ξ is additive: for any two disjoint sets E and F in F,

|ξ|(E ∪ F) = |ξ|(E) + |ξ|(F).

Proof. For any finite F-partition {B_i} of E ∪ F, the collections {B_i ∩ E} and {B_i ∩ F} are finite F-partitions of E and F, respectively, and

Σ_i ||ξ(B_i)|| ≤ Σ_i ||ξ(B_i ∩ E)|| + Σ_i ||ξ(B_i ∩ F)||,

which means

|ξ|(E ∪ F) ≤ |ξ|(E) + |ξ|(F).

So if |ξ|(E ∪ F) = ∞, the inequality in the last equation must be an equality and we are done. Now if |ξ|(E ∪ F) < ∞, then for any ε > 0 there are finitely many disjoint subsets {E_i : i = 1, 2, …, n} and {F_j : j = 1, 2, …, m} of members of F, with E_i ⊆ E and F_j ⊆ F, such that

Σ_{i=1}^{n} ||ξ(E_i)|| ≥ |ξ|(E) − ε/2  and  Σ_{j=1}^{m} ||ξ(F_j)|| ≥ |ξ|(F) − ε/2.

This gives

|ξ|(E ∪ F) ≥ Σ_{i=1}^{n} ||ξ(E_i)|| + Σ_{j=1}^{m} ||ξ(F_j)|| ≥ |ξ|(E) + |ξ|(F) − ε,

which, since ε is an arbitrary positive real number, implies the reverse inequality |ξ|(E) + |ξ|(F) ≤ |ξ|(E ∪ F). ∎
Proposition 5.4 A vector measure of bounded variation is countably additive if and only if its variation is countably additive.

Proof. For a proof one can refer to [48]. ∎
Proposition 5.5 If ξ is a measure on a sigma-field F with values in some Hilbert space H, then

(a) ||ξ|| is a monotone set function on F;

(b) sup{ ||ξ(C)|| : C ∈ F, C ⊆ B } ≤ ||ξ||(B), for any B ∈ F;

(c) ||ξ|| is subadditive on F: that is, for any sequence {B_n} of sets in F,

||ξ||( ∪_{n=1}^{∞} B_n ) ≤ Σ_{n=1}^{∞} ||ξ||(B_n);

(d) ||ξ||(B) ≤ 4 sup{ ||ξ(C)|| : C ∈ F, C ⊆ B }, for any B ∈ F (with 2 in place of 4 when H is real).

Proof. Parts (a) and (b) are immediate consequences of the definition of semivariation ||ξ||. For (c), let {F₁, F₂, …, F_k} be a finite F-partition of ∪_{n=1}^{∞} B_n (we may assume the B_n pairwise disjoint, replacing B_n by B_n \ ∪_{m<n} B_m if necessary),
and for each n, the collection {B_n ∩ F₁, B_n ∩ F₂, …, B_n ∩ F_k} forms a finite disjoint partition of B_n. Thus if |α_j| ≤ 1, for j = 1, 2, …, k,

|| Σ_{j=1}^{k} α_j ξ(F_j) || = || Σ_n Σ_{j=1}^{k} α_j ξ(B_n ∩ F_j) || ≤ Σ_n || Σ_{j=1}^{k} α_j ξ(B_n ∩ F_j) || ≤ Σ_n ||ξ||(B_n),

which implies

||ξ||( ∪_{n=1}^{∞} B_n ) ≤ Σ_{n=1}^{∞} ||ξ||(B_n).

For (d), first assume that the Hilbert space H is real. In that case, if {B₁, B₂, …, B_m} is a disjoint F-partition of B, set

π⁺ = { n : 1 ≤ n ≤ m, x*ξ(B_n) ≥ 0 }  and  π⁻ = { n : 1 ≤ n ≤ m, x*ξ(B_n) < 0 };

then for any x* ∈ H* with ||x*|| ≤ 1, we can write

Σ_{n=1}^{m} |x*ξ(B_n)| = x*ξ( ∪_{n∈π⁺} B_n ) − x*ξ( ∪_{n∈π⁻} B_n ) ≤ 2 sup{ ||ξ(C)|| : C ∈ F, C ⊆ B },

as required, by virtue of Proposition 5.1. One can then check that for complex Hilbert spaces, by splitting x*ξ into its real and imaginary parts and applying the real estimate to each, the estimate continues to hold with 4 instead of 2. ∎

In contrast to complex-valued measures, the variation of a countably additive vector measure ξ need not be finite. However, parts (b) and (d) of Proposition 5.5 imply the following.

Proposition 5.6 A vector measure is bounded if and only if it is of bounded semivariation.

This fact suggests using semivariation instead of variation in developing an integration theory with respect to vector measures. In fact, this theory has been successfully developed and such an integral has been defined for a wide class of scalar-valued functions (see [53, Section IV.10]). In the rest of this section, we develop this integral for the smaller class of bounded measurable functions, which is large enough for the scope of the present book.
Definition 5.3 (Integral of Simple Functions) Let F be a field of subsets of Ω and ξ : F → H a bounded vector measure. We define the integral of a simple function f = Σ_{i=1}^{n} α_i 1_{E_i}, where the α_i are nonzero scalars and the E_i are pairwise disjoint members of F, by

∫ f dξ = Σ_{i=1}^{n} α_i ξ(E_i).

It is easy to show that this integral, on the set of simple functions, is well defined. Furthermore, if we define the sup norm of f as ||f||_∞ = sup{ |f(ω)| : ω ∈ Ω }, then we have

|| ∫ f dξ || ≤ ||f||_∞ ||ξ||(Ω).

The last inequality is a consequence of the definition of semivariation once we notice that |α_i| / ||f||_∞ ≤ 1 for i = 1, 2, …, n. It is also easy to see that this integral is linear: that is, if f = Σ_{i=1}^{n} α_i 1_{E_i} and g = Σ_{j=1}^{m} β_j 1_{F_j} are two simple functions, then

∫ (f + g) dξ = ∫ f dξ + ∫ g dξ.
This integral can be extended to all uniform limits of simple functions. In fact, let f be the uniform limit of a sequence {f_n} of simple functions; thus {f_n} is Cauchy in the sup norm. On the other hand, we can write

|| ∫ f_n dξ − ∫ f_m dξ || ≤ ||f_n − f_m||_∞ ||ξ||(Ω),

which shows that { ∫ f_n dξ } is a Cauchy sequence in the Hilbert space H and thus has a limit. We define ∫ f dξ to be this limit: that is, we define

∫ f dξ = lim_{n→∞} ∫ f_n dξ.

Because of the discussion above, we say a function is F-integrable if it is the uniform limit of some F-simple functions. The integral of an F-integrable function f on a set B in F is defined by ∫_B f dξ = ∫ f 1_B dξ. It can easily be shown that any bounded F-measurable function f on Ω is the uniform limit of a sequence of F-simple functions and hence is F-integrable with respect to any bounded vector measure ξ. The next proposition lists basic properties of the vector integral we just defined.
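Definition 5.3 can be made concrete on a finite set, where the field is all subsets and the vector measure is determined by its atoms; all names and values below are illustrative. The brute-force semivariation uses the fact that, for real scalars, the sup over |α_ω| ≤ 1 of a norm (convex in α) is attained at α_ω = ±1, and that the finest partition dominates.

```python
import numpy as np
from itertools import product

# A finite sketch: Omega = {0,...,4}, F = all subsets, xi valued in R^3.
rng = np.random.default_rng(2)
atoms = rng.standard_normal((5, 3))               # xi({w}) for each point w

def xi(E):
    return sum(atoms[w] for w in E) if len(list(E)) else np.zeros(3)

def integral(f):                                  # f: array of 5 scalar values
    return sum(f[w] * atoms[w] for w in range(5)) # integral of a simple function

def semivariation():
    # sup over |alpha_w| <= 1 of ||sum_w alpha_w xi({w})||, attained at +-1
    return max(np.linalg.norm(sum(s * a for s, a in zip(signs, atoms)))
               for signs in product([-1.0, 1.0], repeat=5))

f = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
# The basic estimate ||integral f dxi|| <= ||f||_inf * ||xi||(Omega):
assert np.linalg.norm(integral(f)) <= np.max(np.abs(f)) * semivariation() + 1e-12
```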
Proposition 5.7 If the functions f and g are both F-integrable, c is any scalar, and B, C are any two disjoint members of F, then

(a) ∫_B (cf + g) dξ = c ∫_B f dξ + ∫_B g dξ;

(b) ∫_{B∪C} f dξ = ∫_B f dξ + ∫_C f dξ;

(c) || ∫_B f dξ || ≤ ||f||_∞ ||ξ||(B).
Proof. For (a), pick two sequences {f_n} and {g_n} of simple functions that converge uniformly to f and g, respectively. Then it is easy to see that the sequence {c f_n + g_n} converges to c f + g uniformly, and hence lim_n ∫_B (c f_n + g_n) dξ = ∫_B (c f + g) dξ. Now, because of the linearity of the integral on simple functions,

∫_B (c f_n + g_n) dξ = c ∫_B f_n dξ + ∫_B g_n dξ.

Taking the limit as n goes to infinity on both sides, we get
∫_B (c f + g) dξ = c ∫_B f dξ + ∫_B g dξ,

which clearly shows (a). The proofs of parts (b) and (c) are similar and left to the reader. ∎

We close this section with the following result.

Proposition 5.8 Let ξ be a bounded vector measure on the Borel subsets of [0, 2π) with values in some Hilbert space H. If the scalar-valued function F defined by F([a, b) × [c, d)) = (ξ([a, b)), ξ([c, d))) can be extended to a measure on the Borel subsets of [0, 2π)², then for any two bounded measurable functions f, g : [0, 2π) → C we have

( ∫_0^{2π} f dξ, ∫_0^{2π} g dξ ) = ∫_0^{2π} ∫_0^{2π} f(λ) ḡ(θ) F(dλ, dθ).   (5.8)

Proof. If the functions f and g are simple, then we can represent them as f = Σ_{i=1}^{n} α_i 1_{E_i} and g = Σ_{j=1}^{m} β_j 1_{F_j} and write

( ∫ f dξ, ∫ g dξ ) = Σ_{i=1}^{n} Σ_{j=1}^{m} α_i β̄_j ( ξ(E_i), ξ(F_j) ) = Σ_{i=1}^{n} Σ_{j=1}^{m} α_i β̄_j F(E_i × F_j) = ∫_0^{2π} ∫_0^{2π} f(λ) ḡ(θ) F(dλ, dθ).
Now suppose f and g are any two bounded measurable functions. Then there are sequences f_n and g_n of simple functions that converge uniformly to f and g, respectively. According to the simple-function case above, we have

( ∫_0^{2π} f_n dξ, ∫_0^{2π} g_n dξ ) = ∫_0^{2π} ∫_0^{2π} f_n(λ) ḡ_n(θ) F(dλ, dθ), for any n ∈ N.

Since the sequence f_n ḡ_n converges uniformly to f ḡ, taking the limit of both sides of this equation, by virtue of the continuity of inner products as in Proposition 3.2, we arrive at (5.8). ∎
5.2 HARMONIZABLE SEQUENCES

Here we introduce harmonizable sequences and present their basic properties. The study of harmonizable processes was started by Loève in 1948 [138] and continued by Cramér [36]. The weaker type of harmonizability seems to have originated with Rozanov in 1959 [200], but its characterization in terms of stationary dilations was completed in 1978 by Miamee and Salehi [155]. Before we define these two types of harmonizability, we recall that a sequence {X_t} of random variables on some probability space (Ω, F, P) is said to be of second order if every element X_t is of finite variance, that is, X_t ∈ L²(Ω, F, P) for all t ∈ Z. However, for the presentation here one can think of {X_t} as being a sequence in an arbitrary Hilbert space H.
Definition 5.4 A random sequence X_t with autocovariance function R(s, t) is called strongly harmonizable (harmonizable, in short) if there exists a scalar-valued measure F of bounded variation on the Borel subsets of [0, 2π)² such that

R(s, t) = ∫_0^{2π} ∫_0^{2π} e^{i(sλ − tθ)} F(dλ, dθ),   (5.4)

and weakly harmonizable if there exists a bounded vector measure ξ on the Borel subsets of [0, 2π) such that

X_t = ∫_0^{2π} e^{itλ} ξ(dλ).   (5.5)

F (respectively, ξ) is called the spectral (respectively, random) measure of X_t.
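A discrete sketch of strong harmonizability: placing a positive semidefinite array of spectral mass F_{jk} on a finite frequency grid yields a covariance via the discrete analogue of (5.4), and off-diagonal mass makes R(s, t) depend on (s, t) rather than on s − t alone. The grid and masses below are illustrative.

```python
import numpy as np

# Spectral mass F_{jk} on the grid (lam_j, lam_k), with F = G G* >= 0, so that
# R(s, t) = sum_{j,k} exp(i(s lam_j - t lam_k)) F_{jk} is a valid covariance.
lam = np.array([0.7, 1.9, 3.1])
G = np.array([[1.0, 0.0], [0.5, 0.5], [0.2, -0.4]])
F = G @ G.conj().T                       # bounded-variation discrete "measure"

def Rcov(s, t):
    e_s = np.exp(1j * s * lam)
    e_t = np.exp(1j * t * lam)
    return e_s @ F @ e_t.conj()

# Off-diagonal spectral mass breaks stationarity:
assert not np.isclose(Rcov(1, 0), Rcov(2, 1))
```

If F were concentrated on the diagonal (F_{jk} = 0 for j ≠ k), the same formula would reduce to a function of s − t, i.e., a stationary covariance.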
Lemma 5.1 Every weakly harmonizable process X_t is bounded.

Proof. Let X_t be a weakly harmonizable sequence, so that (5.5) holds for some bounded random measure ξ. By virtue of the properties of integrals with respect to such random measures given in Section 5.1, we get

||X_t|| = || ∫_0^{2π} e^{itλ} ξ(dλ) || ≤ ||ξ||([0, 2π)), for every integer t. ∎
Definition 5.5 A sequence X_t in a Hilbert space H is said to have a stationary dilation if there exists a Hilbert space K containing H and a stationary sequence Z_t in K such that

X_t = P Z_t,  t ∈ Z,

where P is the orthogonal projection from K onto H. If this is the case, the sequence Z_t is called a stationary dilation for X_t.

Addressing the interesting question of which sequences have stationary dilations, Abreu [1] proved the following result.
Proposition 5.9 Every harmonizable process has a stationary dilation.

Proof. Suppose {X_t} ⊂ H is a harmonizable process whose spectral measure is F. That is, suppose

R(s, t) = (X_s, X_t) = ∫_0^{2π} ∫_0^{2π} e^{i(sλ − tθ)} F(dλ, dθ),  s, t ∈ Z.
The total variation |F| of F is a finite symmetric Borel measure on [0, 2π)². Let μ be the finite positive measure defined on the Borel subsets B[0, 2π) of [0, 2π) by

μ(B) = |F|( B × [0, 2π) ) + |F|( [0, 2π) × B )

for any B ∈ B[0, 2π). Then for every continuous complex-valued function f on [0, 2π) we have

∫_0^{2π} f dμ = ∫_0^{2π} ∫_0^{2π} (f × 1) d|F| + ∫_0^{2π} ∫_0^{2π} (1 × f) d|F|.   (5.6)
But for each λ, θ ∈ [0, 2π) we have

| f(λ) f̄(θ) | ≤ ½ ( |f(λ)|² + |f(θ)|² ).

Therefore

| ∫_0^{2π} ∫_0^{2π} (f × f̄) dF | ≤ ½ ∫_0^{2π} ∫_0^{2π} ( |f|² × 1 ) d|F| + ½ ∫_0^{2π} ∫_0^{2π} ( 1 × |f|² ) d|F|,

which, by virtue of (5.6), yields

∫_0^{2π} ∫_0^{2π} (f × f̄) dF ≤ ∫_0^{2π} |f|² dμ.   (5.7)
Now let μ̃ be the measure on the square [0, 2π)² which is concentrated on its diagonal with μ̃(B × B) = μ(B), for every B ∈ B[0, 2π), and define ρ = μ̃ − F. Using (5.7) we get

∫_0^{2π} ∫_0^{2π} (f × f̄) dρ = ∫_0^{2π} |f|² dμ − ∫_0^{2π} ∫_0^{2π} (f × f̄) dF ≥ 0.

Thus ρ is a nonnegative measure and hence it is the spectral measure of some harmonizable sequence Y_t. Recalling H_X(∞) = sp̄{X_t : t ∈ Z} and H_Y(∞) = sp̄{Y_t : t ∈ Z}, we define K to be their direct sum H_X(∞) ⊕ H_Y(∞) and let

Z_t = X_t + Y_t, for each t ∈ Z.

Then

R_Z(s, t) = (Z_s, Z_t) = (X_s + Y_s, X_t + Y_t) = (X_s, X_t) + (Y_s, Y_t).

Hence F_Z = F + ρ = μ̃ is the covariance measure of Z_t; since μ̃ is concentrated on the diagonal, this implies that Z_t is stationary. Finally, if P : K → H_X(∞) denotes the orthogonal projection from K onto H_X(∞), it is clear from the construction of Z_t that

X_t = P Z_t, for each t ∈ Z. ∎
The following proposition shows why harmonizable processes are sometimes called strongly harmonizable.

Proposition 5.10 Every harmonizable sequence X_t is weakly harmonizable.
Proof. Let {X_t} ⊂ H be a harmonizable sequence. By Proposition 5.9 this sequence has a stationary dilation {Z_t} in some Hilbert space K containing H; that is,

X_t = P Z_t, t ∈ Z,

with P being the orthogonal projection P : K → H. Let ζ be the orthogonally scattered random measure of the stationary sequence Z_t. Then integration theory with respect to vector measures, as presented in Section 5.1, and continuity of the projection P allow us to write

X_t = P Z_t = P ∫₀^{2π} e^{itλ} ζ(dλ) = ∫₀^{2π} e^{itλ} ξ(dλ),

where the random measure ξ is defined by ξ(B) = P ζ(B), for each Borel subset B of [0,2π). It remains to show that ξ is of bounded semivariation. To see this, note that for any disjoint collection {B_i : 1 ≤ i ≤ n} of Borel subsets of [0,2π) and any collection {α_i : 1 ≤ i ≤ n} of scalars with |α_i| ≤ 1, we have

‖Σ_{i=1}^{n} α_i ξ(B_i)‖ = ‖P (Σ_{i=1}^{n} α_i ζ(B_i))‖ ≤ ‖Σ_{i=1}^{n} α_i ζ(B_i)‖ = [Σ_{i=1}^{n} |α_i|² ‖ζ(B_i)‖²]^{1/2} ≤ ‖ζ([0,2π))‖ < ∞. ∎
Abreu [1] observed that the class of harmonizable processes fails to include all projections of stationary sequences. Seeking a characterization of sequences possessing a stationary dilation, Rozanov [200] showed that the projection of any stationary process is weakly harmonizable. The converse of Rozanov's result was given by Miamee and Salehi [155]. These comments establish the following characterization.

Proposition 5.11 A sequence X_t has a stationary dilation if and only if it is weakly harmonizable. Niemi in a series of papers [168,170-172] studied various related questions and proved the following general result.
Theorem 5.1 If H is a Hilbert space, then for any bounded vector measure ξ : F → H there exist (1) a Hilbert space K containing H and (2) an orthogonally scattered vector measure ζ : F → K, such that

ξ(E) = P ζ(E), for any E ∈ F.

Here P : K → H stands for the orthogonal projection.
For more information on the subject of dilation the reader may want to check Rosenberg [196], Makagon and Salehi [144], Miamee [154,160], Chattergi [32], Mlak [164], Masani [153], and the references therein. We close this section with the following result concerning boundedness of harmonizable sequences.

Corollary 5.11.1 Every harmonizable process is bounded.

5.3 LIMIT OF ERGODIC AVERAGE
Here we extend Proposition 4.2 to express the limit of

S_N(λ) = (1/N) Σ_{t=0}^{N−1} e^{−iλt} X_t

in spectral terms.

Proposition 5.12 If X_t is a harmonizable sequence having the random spectral measure ξ and corresponding spectral (correlation) measure F, then S_N(λ) → ξ({λ}) as N → ∞, in the mean-square sense.
Proof. Using (1.14), we obtain

S_N(λ) = ∫₀^{2π} (1/N) Σ_{t=0}^{N−1} e^{it(γ−λ)} ξ(dγ) = ∫₀^{2π} d_N(γ − λ) ξ(dγ),

where the Dirichlet kernel d_N(x) is bounded and continuous and converges to 1_{{0}}(x). The convergence ‖S_N(λ) − ξ({λ})‖ → 0 follows because F(·, {λ}) is a (finite) measure in the first coordinate for each fixed λ, and from the fact that

E|S_N(λ) − ξ({λ})|² = ∫₀^{2π} ∫₀^{2π} [d_N(γ−λ) − 1_{{λ}}(γ)] [d_N(γ′−λ) − 1_{{λ}}(γ′)]¯ F(dγ, dγ′)

converges to 0 by the dominated convergence theorem. ∎
5.4 LINEAR TIME INVARIANT FILTERS
The process Y_t is said to be obtained from the process X_t by application of the linear filter W = {w_k(t) : t, k ∈ Z} if

Y_t = Σ_{k=−∞}^{∞} w_k(t) X_k.

The coefficients w_k(t) are called the weights of the filter. A filter W = {w_k(t) : t, k ∈ Z} is called a linear time invariant (LTI) filter if w_k(t) depends only on t − k, that is, if w_k(t) = w_{t−k}. In this case we can write

Y_t = Σ_{k=−∞}^{∞} w_{t−k} X_k = Σ_{k=−∞}^{∞} w_k X_{t−k}.   (5.10)

The time invariant linear filter W = {w_k} is said to be causal if w_k = 0 for k < 0, which according to (5.10) means that we can express Y_t in the form

Y_t = Σ_{k=0}^{∞} w_k X_{t−k}.
Examples.

1. The filter defined by Y_t = a X_{−t}, t ∈ Z is linear but not time invariant since the weights are w_k(t) = a δ_{t+k}, which do not depend only on t − k.

2. The filter defined by Y_t = a_{−1} X_{t+1} + a_0 X_t + a_1 X_{t−1}, t ∈ Z is linear and time invariant but not causal unless a_{−1} happens to be zero.

3. The infinite moving average Y_t = Σ_{j=0}^{∞} a_j ξ_{t−j} (where ξ_t is usually taken to be orthonormal) is an instance of a causal linear time invariant filter.
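As a numerical illustration of the causal form of (5.10), the filtered sequence can be computed directly or with `np.convolve` (the weights and input below are our own example, not from the text):

```python
import numpy as np

# Causal LTI filter of (5.10): y_t = sum_{k>=0} w_k x_{t-k}, with x treated as 0 before time 0.
w = np.array([1.0, 0.5, 0.25])            # weights w_0, w_1, w_2; w_k = 0 otherwise
x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])  # a finite stretch of the input sequence

y = np.zeros(len(x))
for t in range(len(x)):
    for k, wk in enumerate(w):
        if t - k >= 0:
            y[t] += wk * x[t - k]         # only past inputs enter: the filter is causal

y_conv = np.convolve(x, w)[:len(x)]       # the same causal convolution via numpy
```

Both computations agree term by term, which is just (5.10) with the sum truncated to the available data.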
Definition 5.6 A linear filter W = {w_k} with absolutely summable weights, Σ_{k=−∞}^{∞} |w_k| < ∞, is called stable.

Lemma 5.2 If W = {w_k} is a stable filter then the series Σ_{k=−∞}^{∞} w_k e^{−ikλ} converges pointwise to some function W(λ). This function W(λ) is called its transfer function.

Proof. Clearly for every λ the series Σ_{k=−∞}^{∞} w_k e^{−ikλ} is absolutely convergent, and hence convergent. ∎
Since a harmonizable sequence X_t is defined as an integral of the functions e^{iλt} with respect to some random measure ξ(dλ), namely,

X_t = ∫₀^{2π} e^{iλt} ξ(dλ),

the familiar frequency domain interpretation of the effect of time invariant linear filtering for stationary sequences is retained.
Proposition 5.13 Suppose X_t is a harmonizable sequence with random measure ξ and suppose W = {w_j} is a stable time invariant linear filter; then for every t, the sum in

Y_t = Σ_{j=−∞}^{∞} w_j X_{t−j}   (5.11)

converges in norm and thus defines the filtered sequence Y_t. Denoting by W(λ) the transfer function (see Lemma 5.2) of the filter, the sequence Y_t is harmonizable with spectral representation

Y_t = ∫₀^{2π} e^{itλ} W(λ) ξ(dλ)   (5.12)

and spectral measure defined for all B ∈ B([0,2π)²) by

F_Y(B) = ∫∫_B W(λ₁) W̄(λ₂) F(dλ₁, dλ₂).   (5.13)
Proof. By Lemma 5.1, {X_t} is bounded. Therefore M_X = sup_t ‖X_t‖ < ∞. Now if 0 < m < n, then because of

‖Σ_{j=m}^{n} w_j X_{t−j}‖² = Σ_{j=m}^{n} Σ_{k=m}^{n} w_j w̄_k (X_{t−j}, X_{t−k}) ≤ Σ_{j=m}^{n} Σ_{k=m}^{n} |w_j| |w_k| ‖X_{t−j}‖ ‖X_{t−k}‖ ≤ M_X² [Σ_{j=m}^{n} |w_j|]²,

the norm ‖Σ_{j=m}^{n} w_j X_{t−j}‖ goes to zero as m, n → ∞. The same argument applies to the case −n < −m < 0 and so by the Cauchy criterion the series Σ_{j=−∞}^{∞} w_j X_{t−j} converges in norm to some random variable, which we call Y_t; that is,

Y_t = lim_{n→∞} Σ_{j=−n}^{n} w_j X_{t−j}.

To prove (5.12) and (5.13), we note that by Lemma 5.2

lim_{n→∞} Σ_{k=−n}^{n} w_k e^{−ikλ} = W(λ), for every λ ∈ [0,2π).
So applying the Dominated Convergence Theorem IV.10.10 in [53] with f_n(λ) = Σ_{k=−n}^{n} w_k e^{−ikλ}, f(λ) = W(λ), and g(λ) being the constant function g(λ) = Σ_{k=−∞}^{∞} |w_k|, we get

lim_{n→∞} ∫₀^{2π} e^{itλ} (Σ_{k=−n}^{n} w_k e^{−ikλ}) ξ(dλ) = ∫₀^{2π} e^{itλ} W(λ) ξ(dλ).   (5.14)

Now letting n go to infinity on both sides of

Σ_{j=−n}^{n} w_j X_{t−j} = ∫₀^{2π} e^{itλ} (Σ_{j=−n}^{n} w_j e^{−ijλ}) ξ(dλ)   (5.15)

and using (5.14) and (5.15), we get (5.12), as claimed. To prove the last assertion, according to (5.12) and (5.8) we can write

(Y_s, Y_t) = ∫₀^{2π} ∫₀^{2π} e^{i(sλ₁−tλ₂)} W(λ₁) W̄(λ₂) F(dλ₁, dλ₂),

which, in conjunction with the uniqueness of Fourier inversion, completes the proof. ∎
PROBLEMS AND SUPPLEMENTS

5.1 Let F(dλ) be a measure on [0,2π) and let F(λ) be its associated distribution function defined on [0,2π) by F(λ) = F([0,λ]). Show that for any continuous function f(λ) on [0,2π), the integral ∫₀^{2π} f(λ) F(dλ) coincides with the Stieltjes integral ∫₀^{2π} f(λ) dF(λ).
5.2 Determine the transfer function of the time invariant linear filter with coefficients w₀ = 1, w₁ = −2a, w₂ = 1, and w_j = 0 for j ≠ 0, 1, 2. What value of a should be used for the filter to suppress sinusoidal oscillations of period 6?
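A quick numerical check of this problem (reading the coefficient as w₁ = −2a; the candidate value a = cos(2π/6) = 1/2 is our own computation, not supplied by the text):

```python
import numpy as np

def transfer(w, lam):
    # W(lambda) = sum_k w_k e^{-i k lambda} (Lemma 5.2) for a finite filter (w_0, w_1, ...)
    return sum(wk * np.exp(-1j * k * lam) for k, wk in enumerate(w))

a = 0.5                       # candidate: W(lam) = e^{-i lam}(2 cos(lam) - 2a) vanishes when cos(lam) = a
w = [1.0, -2.0 * a, 1.0]      # w_0 = 1, w_1 = -2a, w_2 = 1 (assumed reading of the problem)
lam6 = 2.0 * np.pi / 6.0      # frequency of a sinusoid of period 6

gain_at_period6 = abs(transfer(w, lam6))
```

With a = 1/2 the gain at the period-6 frequency is zero, so oscillations of period 6 are suppressed.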
5.3 Consider the process

X_t = A cos(πt/3) + B sin(πt/3) + Z_t + 2.5 Z_{t−1},

where A and B are uncorrelated mean zero random variables that are also uncorrelated with the white noise Z_t. Find

(a) the covariance function and the spectral distribution function of X_t;

(b) the covariance function and the spectral distribution function of the filtered process Y_t = X_t − 2X_{t−1} + X_{t−2}.
5.4 Let Z_λ, 0 ≤ λ < 2π, be an orthogonally scattered process with distribution function F_λ and suppose that ψ ∈ L²(F).

(a) Show that W_μ = ∫₀^{μ} ψ(λ) dZ_λ, μ ∈ [0,2π), is also orthogonally scattered and has spectral distribution function

G_μ = ∫₀^{μ} |ψ(λ)|² dF_λ.

(b) Show that if g ∈ L²(G) then gψ ∈ L²(F) and

∫₀^{2π} g(λ) dW_λ = ∫₀^{2π} g(λ) ψ(λ) dZ_λ.

(c) Show that if |ψ| > 0, a.e., then

Z_μ = ∫₀^{μ} (1/ψ(λ)) dW_λ.

5.5 Prove parts (b) and (c) of Proposition 5.7.
CHAPTER 6

FOURIER THEORY OF THE COVARIANCE

As in the stationary case it is possible and somewhat natural to develop a Fourier theory for the covariance without considering the representation of the process itself. In the stationary case, the Fourier theory for the covariance was complete as soon as it was understood that every stationary covariance was nonnegative definite and that the Herglotz theorem (Bochner's theorem for continuous parameter) implied that every stationary covariance was the Fourier transform of a positive measure. Not long afterward, Kolmogorov and Cramér developed the spectral theory for stationary processes. Kolmogorov's result used the spectral theory for unitary operators whereas Cramér developed, in a more direct fashion, the isomorphism that makes the process at time t correspond to a stochastic integral of the exponential function e^{iλt} with respect to a process of orthogonal increments (in other words, an orthogonally scattered vector measure). In the case of PC processes the representation theory for the covariance and for the process has developed more or less simultaneously in the sense that Gladyshev gave results on process representation in his first (1961) paper [77] on this topic. But here we will follow the route of the stationary case and develop the Fourier theory for the covariance and the representation for PC sequences that is a consequence of harmonizability, also due to Gladyshev. The clarification brought by unitary operators, treated in Chapter 7, will help in understanding Gladyshev's and other representations of PC sequences.

Periodically Correlated Random Sequences: Spectral Theory and Practice. By H.L. Hurd and A.G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
6.1 FOURIER SERIES REPRESENTATION OF THE COVARIANCE

Following the notation of Gladyshev [77], since R(t+τ, t) is periodic in the variable t with period T, it can be represented as a discrete Fourier series

R(t+τ, t) = Σ_{k=0}^{T−1} B_k(τ) e^{i2πkt/T},   (6.1)

where

B_k(τ) = (1/T) Σ_{t=0}^{T−1} R(t+τ, t) e^{−i2πkt/T},   (6.2)

and, as shown in the following lemma, the representation (6.1) is a pointwise equality. In continuous time, or if R(t+τ, t) is almost periodic in t, a Fourier series representation still exists but the sense of (6.1) may not be pointwise equality.
Lemma 6.1 (Discrete Fourier Series) Let V be a vector space over C. The map D : V^n → V^n defined by

(Dv)_k = Σ_{j=0}^{n−1} v_j e^{−i2πjk/n}, k = 0, 1, ..., n−1,

is invertible (a bijection) and D^{−1} is given by

(D^{−1}u)_j = (1/n) Σ_{k=0}^{n−1} u_k e^{i2πjk/n}, j = 0, 1, ..., n−1.   (6.4)

Proof. Given v ∈ V^n, we shall show D is invertible by showing that (6.4) explicitly gives the inverse, that is, D^{−1}Dv = v. Let j′ be any integer in {0, 1, ..., n−1}; then

(D^{−1}Dv)_{j′} = (1/n) Σ_{k=0}^{n−1} Σ_{j=0}^{n−1} v_j e^{−i2πjk/n} e^{i2πj′k/n} = Σ_{j=0}^{n−1} v_j [(1/n) Σ_{k=0}^{n−1} e^{i2π(j′−j)k/n}] = v_{j′},

where it is not difficult to show that

(1/n) Σ_{k=0}^{n−1} e^{i2π(j′−j)k/n} = δ_{jj′} for j, j′ ∈ {0, 1, ..., n−1}. ∎
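The inversion of Lemma 6.1 is easy to check numerically (the exponent signs below follow the conventions stated in the lemma):

```python
import numpy as np

def D(v):
    # (D v)_k = sum_{j=0}^{n-1} v_j e^{-i 2 pi j k / n}: the map of Lemma 6.1
    n = len(v)
    jj = np.arange(n)
    return np.array([np.sum(v * np.exp(-2j * np.pi * jj * k / n)) for k in range(n)])

def D_inv(u):
    # (D^{-1} u)_j = (1/n) sum_{k=0}^{n-1} u_k e^{+i 2 pi j k / n}: formula (6.4)
    n = len(u)
    kk = np.arange(n)
    return np.array([np.sum(u * np.exp(2j * np.pi * j * kk / n)) / n for j in range(n)])

rng = np.random.default_rng(0)
v = rng.normal(size=8) + 1j * rng.normal(size=8)
roundtrip = D_inv(D(v))        # should recover v, i.e. D^{-1} D v = v
```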
Thus the function R(t+τ, t) is completely represented by the collection of coefficient functions {B_k(τ) : k = 0, 1, ..., T−1}, and the sequence X_t is stationary if and only if R(t+τ, t) = B₀(τ) for all t. We now turn our attention to properties of the coefficient functions {B_k(τ) : k = 0, 1, ..., T−1}. First, the connection between nonnegative definite functions and second-order sequences also holds in the nonstationary case.
Proposition 6.1 A necessary and sufficient condition that a complex sequence {R(s,t) : s,t ∈ Z} be the covariance of a random sequence is that it is nonnegative definite (we will henceforth say NND), meaning that for every n, every collection of complex constants {α₁, α₂, ..., α_n}, and collection of times {t₁, t₂, ..., t_n},

Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q R(t_p, t_q) ≥ 0.   (6.6)

The necessity is very simple and is omitted. Several proofs exist for the sufficiency; see, for example, Lemma 5.1 of Rozanov [201]. For the stationary case the covariance is a function of the difference of the arguments (see Chapter 4), so

R(s,t) = R(s−t) = ∫₀^{2π} e^{iλ(s−t)} F(dλ),

where F is a nonnegative measure on the Borel subsets of [0,2π). Now we show that B₀(τ) is always the covariance of some stationary sequence; that is, it is always NND.
Proposition 6.2 If X_t is a PC sequence, then

B₀(τ) = (1/T) Σ_{t=0}^{T−1} R(t+τ, t)

is nonnegative definite.

Proof. To see that B₀(τ) is NND, take any n and any collection of complex numbers {α₁, α₂, ..., α_n} and integers {t₁, t₂, ..., t_n} and consider

Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q B₀(t_p − t_q) = Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q (1/T) Σ_{t=0}^{T−1} R(t + t_p − t_q, t).   (6.9)
But for each fixed t_p, t_q, the transformation u = t − t_q converts the last sum into

Σ_{u=−t_q}^{−t_q+T−1} R(u + t_p, u + t_q) = Σ_{u=0}^{T−1} R(u + t_p, u + t_q),

where the last equality occurs because we are summing a periodic sequence over exactly one period and so it does not matter where in the period the sum begins. Thus (6.9) becomes

Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q B₀(t_p − t_q) = (1/T) Σ_{u=0}^{T−1} Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q R(u + t_p, u + t_q) ≥ 0,

because

Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q R(u + t_p, u + t_q) ≥ 0

follows from the application of (6.6) to the times {u + t_j}_{j=1}^{n}. ∎
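Proposition 6.2 can be illustrated numerically; the PC covariance below (an amplitude-modulated stationary covariance) is our own example:

```python
import numpy as np

T = 4
f = np.array([1.0, 2.0, 0.5, 1.5])     # periodic amplitude with period T
rho = 0.6

def R(s, t):
    # Covariance of X_t = f_{t mod T} Y_t with stationary R_Y(tau) = rho^|tau|;
    # it satisfies R(s+T, t+T) = R(s, t), so X_t is PC-T.
    return f[s % T] * f[t % T] * rho ** abs(s - t)

def B0(tau):
    # B_0(tau) = (1/T) sum_{t=0}^{T-1} R(t + tau, t)
    return sum(R(t + tau, t) for t in range(T)) / T

# The matrix [B_0(t_p - t_q)] should be nonnegative definite.
n = 12
M = np.array([[B0(p - q) for q in range(n)] for p in range(n)])
min_eig = np.linalg.eigvalsh(M).min()
```

The smallest eigenvalue of the Toeplitz matrix built from B₀ is nonnegative (up to floating point), as the proposition asserts.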
An immediate application of Herglotz's theorem (Theorem 4.1) to the coefficient B₀(τ) yields

B₀(τ) = ∫₀^{2π} e^{iλτ} F₀(dλ)   (6.10)

for some nonnegative measure F₀. But what about the other coefficients B_k(τ)? Are they Fourier transforms too? The answer is yes, but not necessarily with respect to nonnegative measures. We answer the question using the following characterization of sequences that are Fourier transforms of complex measures μ.

Proposition 6.3 A bounded complex sequence {a_n : n ∈ Z} has representation

a_n = ∫₀^{2π} e^{iλn} μ(dλ)   (6.11)

for μ a complex measure on the Borel subsets of [0,2π) if and only if it satisfies the following boundedness condition: there exists a positive number M such that

|Σ_{j=1}^{m} α_j a_{n_j}| ≤ M sup_{λ∈[0,2π)} |Σ_{j=1}^{m} α_j e^{iλn_j}|   (6.12)

for any positive integer m, any complex sequence {α_j}_{j=1}^{m}, and any integers {n_j}_{j=1}^{m}.
Proof. If a_n is given by (6.11), then the boundedness condition holds with M = ∫₀^{2π} |dμ|. For the converse, suppose we are given a sequence {a_n} that satisfies the boundedness. We will show that this condition expresses the boundedness of a linear functional on a dense subspace of C[0,2π), and hence extension to all of C[0,2π) gives the desired result. Let M be the space of finite linear combinations of functions from E = {e^{iλn}, n ∈ Z}; that is, M is the set of trigonometric polynomials t(λ) = Σ_{j=1}^{m} α_j e^{iλn_j}. Clearly M ⊂ C[0,2π). A linear functional may be defined on M by setting, for every t ∈ M,

L(t) = Σ_{j=1}^{m} α_j a_{n_j},   (6.13)

where t(λ) = Σ_{j=1}^{m} α_j e^{iλn_j}. We immediately see that the boundedness condition (6.12) expresses the boundedness of L on M. But since M is dense in C[0,2π), a result from elementary Fourier series or from the Stone–Weierstrass theorem, L may be extended to C[0,2π) without increasing the norm. Hence from the Riesz representation theorem (see Rudin [202, Chapter 6]) for bounded linear functionals on C[0,2π) there exists a measure μ so that for f ∈ C[0,2π), the functional L is given by

L(f) = ∫₀^{2π} f(λ) μ(dλ)   (6.14)

and ∫₀^{2π} |dμ| ≤ M. Observe that L evaluated on the elements of E produces (6.11), as required. ∎
For bounded linear functionals on C[0,1] the Riesz representation leads to Stieltjes integrals with respect to functions of bounded variation (see Riesz and Nagy [192, page 106, Section 50]). The characterization of Fourier transforms given in Proposition 6.3 is sometimes called the problem of trigonometric moments (see Riesz and Nagy [192, page 116]). Bochner [20] extended the characterization to bounded functions f : R → C and Eberlein to LCA groups, an account of which can be found in the book by Rudin [203]. Now we apply the preceding to the B_k(τ).
Proposition 6.4 If X_t is PC-T, then for each k = 0, 1, ..., T−1 there exists a measure F_k on Borel subsets of [0,2π) such that

B_k(τ) = ∫₀^{2π} e^{iλτ} F_k(dλ).   (6.15)

Proof. For general k we shall apply Proposition 6.3, as we have already remarked that Proposition 6.2 and Herglotz's theorem together imply (6.15) for k = 0. Hence it suffices to show that there exists a positive constant M for which

|Σ_{j=1}^{m} α_j B_k(τ_j)| ≤ M sup_{λ∈[0,2π)} |Σ_{j=1}^{m} α_j e^{iλτ_j}|   (6.16)

for any m, complex sequence {α_j : 1 ≤ j ≤ m}, and integer sequence {τ_j : 1 ≤ j ≤ m}. Substituting (6.2) into the left-hand side of the previous expression leads to

|Σ_{j=1}^{m} α_j B_k(τ_j)| = |(1/T) Σ_{t=0}^{T−1} e^{−i2πkt/T} E{X̄_t Σ_{j=1}^{m} α_j X_{t+τ_j}}| ≤ (1/T) Σ_{t=0}^{T−1} (E|X_t|²)^{1/2} (E|Σ_{j=1}^{m} α_j X_{t+τ_j}|²)^{1/2},

where the preceding inequality follows from the Cauchy–Schwarz inequality applied to second order random variables. Application of the Cauchy–Schwarz inequality to finite sequences yields

|Σ_{j=1}^{m} α_j B_k(τ_j)| ≤ (1/T) [Σ_{t=0}^{T−1} E|X_t|²]^{1/2} [Σ_{t=0}^{T−1} E|Σ_{j=1}^{m} α_j X_{t+τ_j}|²]^{1/2}.
But

Σ_{t=0}^{T−1} E|Σ_{j=1}^{m} α_j X_{t+τ_j}|² = Σ_{k=1}^{m} Σ_{j=1}^{m} α_k ᾱ_j Σ_{t=0}^{T−1} R(t+τ_k, t+τ_j) = T Σ_{k=1}^{m} Σ_{j=1}^{m} α_k ᾱ_j B₀(τ_k − τ_j) = T ∫₀^{2π} |Σ_{j=1}^{m} α_j e^{iλτ_j}|² F₀(dλ)

and

Σ_{t=0}^{T−1} E|X_t|² = Σ_{t=0}^{T−1} R(t,t) = T B₀(0) = T ∫₀^{2π} F₀(dλ).

Using these facts in (6.16) leads finally to

|Σ_{j=1}^{m} α_j B_k(τ_j)| ≤ [∫₀^{2π} F₀(dλ)]^{1/2} [∫₀^{2π} |Σ_{j=1}^{m} α_j e^{iλτ_j}|² F₀(dλ)]^{1/2} ≤ ∫₀^{2π} F₀(dλ) · sup_{λ∈[0,2π)} |Σ_{j=1}^{m} α_j e^{iλτ_j}|,   (6.18)

and so M = ∫₀^{2π} F₀(dλ) serves for the constant, and this same constant serves for all k. ∎
Note that the inequality (6.18) says only that |F_k|([0,2π)) ≤ F₀([0,2π)), that is, the total mass of |F_k| is dominated by that of F₀; this is not absolute continuity (|F_k| ≪ F₀), which would require |F_k|(A) ≤ F₀(A) for all Borel sets A. It will be shown later using the Cauchy–Schwarz inequality that there is a sense in which F_k is absolutely continuous with respect to F₀. To see that the measures F_k need not be nonnegative, consider any properly PC sequence X_t for which R(t,t) is constant with respect to t. For then there must exist some τ and k > 0 for which B_k(τ) ≠ 0, meaning |F_k|([0,2π)) ≠ 0. But R(t,t) constant means that F_k([0,2π)) = 0 for all k > 0 because

F_k([0,2π)) = B_k(0) = (1/T) Σ_{t=0}^{T−1} R(t,t) e^{−i2πkt/T} = 0 for k > 0.   (6.19)

Examples of such processes were given in Sections 2.1.4, 2.2.6, and 2.2.7. The reader may wish to show that Proposition 6.4 is not necessarily true if the underlying covariance R(s,t) is replaced with an arbitrary f(s,t) = f(s+T, t+T). The following result, due to Gladyshev [77], is a statement in the spirit of Khintchine's theorem that is related to this problem.
Proposition 6.5 (Gladyshev) Suppose the sequence {B_k(τ), k = 0, 1, ..., T−1, τ ∈ Z} of coefficient functions is determined by (6.2) from some R(s,t) that satisfies R(s,t) = R(s+T, t+T). Then R(s,t) is NND if and only if for any n, any set of complex numbers {α₁, α₂, ..., α_n}, any set of times {t₁, t₂, ..., t_n}, and any set of indices {k₁, k₂, ..., k_n ∈ {0, 1, ..., T−1}}, one has

Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q e^{−i2πk_p(t_p−t_q)/T} B_{k_p−k_q}(t_p − t_q) ≥ 0.   (6.20)

Proof. Suppose R is NND so that (6.6) holds; then evaluation of the left side of (6.20) using (6.2) gives

S = Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q e^{−i2πk_p(t_p−t_q)/T} (1/T) Σ_{t=0}^{T−1} R(t + t_p − t_q, t) e^{−i2π(k_p−k_q)t/T}.   (6.21)

In the last sum of the preceding line, setting t′ = t − t_q yields

Σ_{t=0}^{T−1} R(t + t_p − t_q, t) e^{−i2π(k_p−k_q)t/T} = Σ_{t′=0}^{T−1} R(t′ + t_p, t′ + t_q) e^{−i2π(k_p−k_q)(t′+t_q)/T},

because for any p, q the summand is periodic in t′ with period T and the sum is over one full period. Hence the sum can be started anywhere and the same value will be obtained. So finally we obtain

S = (1/T) Σ_{t′=0}^{T−1} Σ_{p=1}^{n} Σ_{q=1}^{n} [α_p e^{−i2πk_p(t′+t_p)/T}] [α_q e^{−i2πk_q(t′+t_q)/T}]¯ R(t′ + t_p, t′ + t_q) ≥ 0,

where the inner double sum is nonnegative by (6.6) applied to the times {t′ + t_j}_{j=1}^{n} with coefficients {α_j e^{−i2πk_j(t′+t_j)/T}}_{j=1}^{n}.
To show the converse, we are given (6.20) and wish to show that (6.6) holds for arbitrary n, any collection of complex numbers {α₁, α₂, ..., α_n}, and any collection of integers {t₁, t₂, ..., t_n}. Using (6.1) we first obtain

S = Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q R(t_p, t_q) = Σ_{p=1}^{n} Σ_{q=1}^{n} α_p ᾱ_q Σ_{k=0}^{T−1} B_k(t_p − t_q) e^{i2πkt_q/T}.

By observing the way B_k(τ) is formed it is clear that for k outside the base set {0, 1, ..., T−1} the relation B_k(τ) = B_{k+T}(τ) holds for every k and τ. For any arbitrary periodic sequence h_k = h_{k+T} the truth of

Σ_{k=0}^{T−1} h_k = (1/T) Σ_{k=0}^{T−1} Σ_{k′=0}^{T−1} h_{k−k′}   (6.23)

applied to h_k = B_k(t_p − t_q) e^{i2πkt_q/T} yields

S = (1/T) Σ_{p=1}^{n} Σ_{q=1}^{n} Σ_{k=0}^{T−1} Σ_{k′=0}^{T−1} α_p ᾱ_q B_{k−k′}(t_p − t_q) e^{i2π(k−k′)t_q/T}.   (6.24)

We now claim that the preceding quadruple sum can be put into the form

S = (1/T) Σ_{p′=1}^{nT} Σ_{q′=1}^{nT} α_{p′} ᾱ_{q′} e^{−i2πk_{p′}(t_{p′}−t_{q′})/T} B_{k_{p′}−k_{q′}}(t_{p′} − t_{q′}),   (6.25)

which can be recognized as the form of (6.20), and the result S ≥ 0 follows. To do this we first note that every p′ ∈ {1, 2, ..., nT} can be written as p′ = kn + p for k ∈ {0, 1, ..., T−1} and p ∈ {1, 2, ..., n}. The identifications

k_{p′} = k, t_{p′} = t_p, α_{p′} = α_p e^{i2πkt_p/T}

transform the last sum in (6.24) into (6.25) and the claim is proved. ∎
Note that condition (6.20) means that {B_{jk}(·)} for j, k ∈ {0, 1, ..., T−1} is a cross-covariance matrix of a T-variate stationary sequence. Gladyshev [77] used this to argue that the B_k(τ) are Fourier transforms. We have chosen our particular path because it introduces us to the Riesz theorem about linear functionals, which leads us to similar results both in the continuous time and almost periodic cases [102,106].
6.2 HARMONIZABILITY OF PC SEQUENCES

Gladyshev [77] also showed that all PC sequences are (strongly) harmonizable, which, in view of Definition 5.4, means that

R(s,t) = ∫₀^{2π} ∫₀^{2π} e^{i(sλ₁−tλ₂)} F(dλ₁, dλ₂),

where F is a finite measure on B([0,2π)²). The proof of this relation for covariances of PC sequences seems almost obvious, since using (6.15) in (6.1) produces

R(s,t) = Σ_{k=0}^{T−1} e^{i2πkt/T} ∫₀^{2π} e^{iλ(s−t)} F_k(dλ),   (6.26)

a representation of R(s,t) by a finite sum of one dimensional Fourier transforms. We only need to see how to express (6.26) in the form (5.4). We get a clue about making this transformation from the following result, in which we assume R(s,t) is the autocovariance function of a harmonizable sequence (see Definition 5.4).

Proposition 6.6 If the autocovariance R(s,t) is a Fourier transform of a measure of bounded variation F, then R(s,t) = R(s+T, t+T) if and only if the support of F is contained in the union S_T = ∪_{k=−T+1}^{T−1} S_k of 2T−1 diagonal lines

S_k = {(λ₁, λ₂) ∈ [0,2π) × [0,2π) : λ₂ = λ₁ − 2πk/T}.   (6.27)

Proof. If supp(F) ⊂ S_T, then (5.4) becomes

R(s+T, t+T) = ∫∫_{S_T} e^{i((s+T)λ₁−(t+T)λ₂)} F(dλ₁, dλ₂) = R(s,t)   (6.28)

because e^{i((s+T)λ₁−(t+T)λ₂)} = e^{i(sλ₁−tλ₂)} e^{iT(λ₁−λ₂)} = e^{i(sλ₁−tλ₂)} whenever (λ₁, λ₂) ∈ S_T. Conversely if R(s,t) = R(s+T, t+T), then for every natural number N,

R(s,t) = (1/(2N+1)) Σ_{k=−N}^{N} R(s + kT, t + kT) = ∫₀^{2π} ∫₀^{2π} e^{i(sλ₁−tλ₂)} D_N(T(λ₁−λ₂)) F(dλ₁, dλ₂),
where

D_N(u) = (1/(2N+1)) Σ_{k=−N}^{N} e^{iku}   (6.30)

is a sequence of bounded continuous functions with D_N(j2π) = 1, j ∈ Z, and converging to 0 for u ∉ 2πZ. Hence the Lebesgue convergence theorem implies (6.28) holds for every s, t, and the invertibility of Fourier transforms (or the uniqueness of the measure) implies that the support of F must be contained in S_T. ∎

Now we have a target for the rearrangement of the finite collection of measures on lines given by (6.26). The proof of the following proposition is just a demonstration of the rearrangement. It gives us the correspondence between the complex measures F_k and the measure F, which we now know to be supported (concentrated) on a finite number of lines.
Proposition 6.7 The covariance of every PC sequence is a Fourier transform of a measure with bounded total variation. In other words, every PC sequence is harmonizable.

Proof. We only need to show that (6.26) can be rearranged to yield (6.28). First, (6.26) can be rewritten as

R(s,t) = ∫₀^{2π} ∫₀^{2π} e^{ixt+iy(s−t)} μ(dx, dy),   (6.31)

where the support of the measure μ(·), as shown in Figure 6.1 for T = 5, is the set {(x,y) : 0 ≤ y < 2π, x = 2πk/T, 0 ≤ k ≤ T−1} of T vertical lines of length 2π and spacing 2π/T. The total variation of μ is given by

|μ|([0,2π)²) = Σ_{k=0}^{T−1} |F_k|([0,2π)).
The transformation λ₁ = y and λ₂ = y − x in (6.31) produces

R(s,t) = ∫∫ e^{iλ₁s − iλ₂t} μ′(dλ₁, dλ₂).   (6.32)

The support of μ′ is a family of T diagonal lines given by λ₂ = λ₁ − 2πk/T for λ₁ ∈ [0,2π), as illustrated in Figure 6.2. This follows from the fact that the support of μ is T vertical lines with spacing 2π/T. For example, the line segment {(x,y) : x = 0, 0 ≤ y < 2π} is transformed to {(λ₁, λ₂) : λ₂ = λ₁, 0 ≤ λ₁ < 2π}.
Figure 6.1 Support of μ in [0,2π) × [0,2π) for a PC-T sequence with T = 5.

Figure 6.2 Support of μ′ under the transformation λ₁ = y and λ₂ = y − x.
Figure 6.3 Support of F = μ′ with λ₂ taken (mod 2π).

The final step to put (6.31) into the desired form (6.28) is to replace λ₂ in (6.31) with λ₂ (mod 2π), noting that the exponentials are invariant under these shifts (see Figure 6.3). The transformation λ₂ (mod 2π) translates the support of μ′, where λ₂ < 0, upward by 2π to fill in the square [0,2π) × [0,2π). So we finally see that the measure F is merely a rearrangement of μ having the same (finite) variation as μ. ∎

It follows from the sequence of transformations that

Σ_{k=0}^{T−1} e^{i2πkt/T} ∫₀^{2π} e^{iλ(s−t)} F_k(dλ) = ∫₀^{2π} ∫₀^{2π} e^{i(sλ₁−tλ₂)} F(dλ₁, dλ₂),   (6.33)

and so from the uniqueness of Fourier transforms we conclude that we can identify the measure F_k with the restriction of F to the set S_k. An application of the inversion formula (see Loève [139, page 199] or Brockwell and Davis [28, Theorem 4.9.1]) to (6.33) makes precise the relation (6.34). The set S₁ for T = 5 is illustrated in Figure 6.4.
Figure 6.4 Support set S₁ for T = 5.
Harmonizability of X_t. Since PC sequences are (strongly) harmonizable and thus, by Proposition 5.10, also weakly harmonizable, we have

X_t = ∫₀^{2π} e^{iλt} ξ(dλ),   (6.35)

where the covariances of the increments of ξ were described by Proposition 6.6. This same representation will be proved again in Chapter 7 via the explicit construction of the random measure ξ. Now we show that the representation (6.35) for a PC-T sequence X_t leads to another connection to multivariate stationary sequences, but in the frequency domain.
Proposition 6.8 (Gladyshev) A second order sequence X_t is PC-T if and only if there exists a T-variate stationary sequence {Z_t^k}_{k=0}^{T−1}, t ∈ Z, such that

X_t = Σ_{k=0}^{T−1} Z_t^k e^{i2πkt/T}, for every t ∈ Z.   (6.36)

Proof. If X_t has the representation (6.36), then

R(s+T, t+T) = Σ_{k=0}^{T−1} Σ_{j=0}^{T−1} (Z_s^k, Z_t^j) e^{i2π(ks−jt)/T} = R(s,t)

because for every k, k′ the covariances (Z_s^k, Z_t^{k′}) depend only on s − t.
Conversely, if X_t is PC with period T, then from (6.35),

X_t = ∫₀^{2π} e^{itλ} ξ(dλ) = Σ_{k=0}^{T−1} ∫_{2πk/T}^{2π(k+1)/T} e^{itλ} ξ(dλ) = Σ_{k=0}^{T−1} e^{i2πkt/T} Z_t^k,   (6.38)

where Z_t^k = ∫₀^{2π/T} e^{itλ} ξ_k(dλ) and ξ_k(·) is defined on the Borel sets of [0,2π/T) by

ξ_k(Δ) = ξ(Δ + 2πk/T).   (6.39)

This gives the result since the random measures ξ_k are jointly orthogonally scattered. That is, if A and B are subsets of [0,2π/T), then

𝓕^{pq}(A ∩ B) = E{ξ_p(A) ξ̄_q(B)} = 0 whenever A ∩ B = ∅,

because the translated sets A + 2πp/T and B + 2πq/T also do not intersect, and this is true for every p, q = 0, 1, ..., T−1. Figure 6.5 may help in the perception of this fact. ∎
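A numerical sketch of the "if" direction of Proposition 6.8, with a simple T-variate stationary sequence of our own choosing (Z_t^k = c_k Y_t, a degenerate but jointly stationary family):

```python
import numpy as np

T = 3
c = np.array([1.0, 0.5 - 0.25j, 0.2j])   # Z_t^k = c_k Y_t: cross-covariances depend only on s - t
rho = 0.7

def R_X(s, t):
    # R(s,t) for X_t = sum_k Z_t^k e^{i 2 pi k t / T}; here X_t = f_t Y_t with
    # f_t = sum_k c_k e^{i 2 pi k t / T} and (Z_s^k, Z_t^j) = c_k conj(c_j) rho^|s-t|.
    k = np.arange(T)
    fs = np.sum(c * np.exp(2j * np.pi * k * s / T))
    ft = np.sum(c * np.exp(2j * np.pi * k * t / T))
    return fs * np.conj(ft) * rho ** abs(s - t)

# Periodicity of the covariance: R(s+T, t+T) = R(s, t), i.e. X_t is PC-T.
max_diff = max(abs(R_X(s + T, t + T) - R_X(s, t))
               for s in range(-4, 5) for t in range(-4, 5))
```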
The following proposition, also due to Gladyshev [77], relates the matrix valued cross spectral measure (distribution) F of the multivariate stationary sequence {[X_n]^p = X_n^p = X_{nT+p} : 0 ≤ p ≤ T−1} to the measure 𝓕 of the multivariate stationary sequence {Z_t^k : 0 ≤ k ≤ T−1}. Here we note that F and 𝓕 are both T × T, but F is defined on the Borel sets of [0,2π), whereas 𝓕 is defined on the Borel sets of [0,2π/T).

Proposition 6.9 If X_t is PC-T, then

F(dλ) = T V(λ) 𝓕(dλ/T) V^{−1}(λ),   (6.41)

where V(λ) is a unitary matrix (a map C^T → C^T) whose (p,k)th element is given by

v_{pk}(λ) = e^{i2πpk/T + iλp/T} / √T,   (6.42)

where p, k are both in the set {0, 1, ..., T−1}.
Figure 6.5 Partition of [0,2π) × [0,2π) into subsquares of side 2π/T. The cross spectral measure 𝓕^{pq}(·) is obtained from the diagonal of subsquare pq by 𝓕^{pq}(Δ) = F(Δ + 2πp/T, Δ + 2πq/T).

Proof. For the T-variate stationary sequence X_n^p = X_{p+nT}, n ∈ Z, p = 0, 1, ..., T−1, we can write

(X_s^p, X_t^{p′}) = ∫₀^{2π} e^{iλ(s−t)} F^{pp′}(dλ).   (6.43)
On the other hand, using representation (6.36) of Proposition 6.8 we can write

(X_s^p, X_t^{p′}) = (X_{p+sT}, X_{p′+tT}) = Σ_k Σ_{k′} e^{i2πpk/T} e^{−i2πp′k′/T} ∫₀^{2π} e^{iλ(p−p′)/T + iλ(s−t)} 𝓕^{kk′}(dλ/T),

where we changed variables by λ = Tγ. Comparing the last display with expression (6.43) and applying the uniqueness of Fourier transforms together yield

F^{pp′}(dλ) = T Σ_k Σ_{k′} v_{pk}(λ) v̄_{p′k′}(λ) 𝓕^{kk′}(dλ/T).   (6.44)
Since the functions v_{pk}(λ) are all continuous, setting Δ = [a,b) with b − a sufficiently small, we can write

F^{pp′}([a,b)) ≈ T Σ_k Σ_{k′} v_{pk}(λ^{(ab)}) v̄_{p′k′}(λ^{(ab)}) 𝓕^{kk′}([a,b)/T),

where a ≤ λ^{(ab)} < b. We abbreviate this by

F^{pp′}(dλ) = T Σ_k Σ_{k′} v_{pk}(λ) v̄_{p′k′}(λ) 𝓕^{kk′}(dλ/T),   (6.45)

which by taking all the p, p′ gives (6.41). The invertibility and continuity of V(λ) for λ ∈ [0,2π) mean that (6.41) can also be expressed as

𝓕(dλ) = (1/T) V^{−1}(Tλ) F(T dλ) V(Tλ).   (6.46)
Corollary 6.9.1 The complex matrix measures F and 𝓕 are mutually absolutely continuous in the sense that for every Borel set Δ, |F^{pp′}|(Δ) = 0, p, p′ = 0, 1, ..., T−1 if and only if |𝓕^{kk′}|(Δ/T) = 0, k, k′ = 0, 1, ..., T−1.

Proof. If |𝓕^{kk′}|(Δ/T) = 0, k, k′ = 0, 1, ..., T−1, then by (6.45)

|F^{pp′}|(Δ) ≤ T Σ_k Σ_{k′} ∫_Δ |v_{pk}(λ)| |v̄_{p′k′}(λ)| |𝓕^{kk′}|(dλ/T) = 0   (6.48)

for p, p′ = 0, 1, ..., T−1. The reverse implication follows from (6.46). ∎
We will say that the complex matrix measure F is absolutely continuous with respect to Lebesgue measure μ and write |F| ≪ μ if |F^{pp′}| ≪ μ for p, p′ = 0, 1, ..., T−1. Similarly, |𝓕| ≪ μ will mean |𝓕^{kk′}|(·/T) ≪ μ for k, k′ = 0, 1, ..., T−1. We have immediately from Corollary 6.9.1 that |F| ≪ μ if and only if |𝓕| ≪ μ. Thus if either of these holds we have

dF^{pp′}/dμ = T Σ_k Σ_{k′} v_{pk}(λ) v̄_{p′k′}(λ) d𝓕^{kk′}(·/T)/dμ   (6.49)

or, more compactly, setting f_X(λ) = dF/dμ and f_Z(λ/T) = d𝓕(·/T)/dμ, we have

f_X(λ) = T V(λ) f_Z(λ/T) V^{−1}(λ)   (6.50)

and the inverse whose form is expressed by (6.46). See the supplements at the end of this chapter for some discussion of Proposition 6.9 in terms of the spectral processes of Y(n) and Z_k(t).
6.3 SOME PROPERTIES OF B_k(τ), F_k, AND F

From (6.2) it is clear that B_k(τ) is periodic in the index k, that is, B_k(τ) = B_{k+T}(τ). In addition, since R(t+τ, t) = R̄(t, t+τ),

B_k(τ) = (1/T) Σ_{t=0}^{T−1} R̄(t, t+τ) e^{−i2πkt/T} = e^{i2πkτ/T} (1/T) Σ_{s=τ}^{τ+T−1} R̄(s−τ, s) e^{−i2πks/T} = B̄_{−k}(−τ) e^{i2πkτ/T} = B̄_{T−k}(−τ) e^{i2πkτ/T},   (6.51)

so there is a symmetry about the index (T−1)/2; in other words, (6.51) shows that the entire collection of coefficient functions {B_k(τ) : k = 0, 1, ..., T−1} is determined by the collection {B_k(τ) : k = 0, 1, ..., [(T−1)/2]}. This also means that the spectral measures {F_k : k = 0, 1, ..., T−1} are determined by the collection {F_k : k = 0, 1, ..., [(T−1)/2]}. This conclusion may also be drawn from the Hermitian property of F (see Chapter 5),

F([a,b) × [c,d)) = E{ξ([a,b)) ξ̄([c,d))} = F̄([c,d) × [a,b))   (6.52)

for any rectangle [a,b) × [c,d) in [0,2π) × [0,2π). In the case of PC sequences, for which the support of F is contained in 2T−1 diagonal lines, we conclude that the measure on the kth line below the main diagonal is determined by the conjugate of the kth line above the main diagonal. And since each spectral measure F_k may be identified with two pieces appropriately spliced from F, the kth line below the main diagonal and the (T−k)th line above the main diagonal, the conjugate symmetry of F implies that the {F_k : k = 0, ..., T−1} are determined entirely by F restricted to λ₂ ≤ λ₁ or, from the preceding remark regarding splicing, by the collection {F_k : k = 0, 1, ..., [(T−1)/2]}.
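The index symmetry of the coefficient functions, in the form B_k(τ) = B̄_{T−k}(−τ) e^{i2πkτ/T} used above, can be checked numerically on a PC covariance of our own construction:

```python
import numpy as np

T = 4
rng = np.random.default_rng(1)
f = rng.normal(size=T) + 1j * rng.normal(size=T)   # periodic complex amplitudes
rho = 0.5

def R(s, t):
    # PC-T covariance of X_t = f_t Y_t: R(s,t) = f_s conj(f_t) rho^|s-t|
    return f[s % T] * np.conj(f[t % T]) * rho ** abs(s - t)

def B(k, tau):
    # B_k(tau) = (1/T) sum_{t=0}^{T-1} R(t + tau, t) e^{-i 2 pi k t / T}
    return sum(R(t + tau, t) * np.exp(-2j * np.pi * k * t / T) for t in range(T)) / T

# Check B_k(tau) = conj(B_{T-k}(-tau)) e^{i 2 pi k tau / T} for all k and a range of lags.
max_err = max(abs(B(k, tau) - np.conj(B((T - k) % T, -tau)) * np.exp(2j * np.pi * k * tau / T))
              for k in range(T) for tau in range(-5, 6))
```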
Figure 6.6 Geometric relationship between F and F_k for a PC sequence. I₁ = [a,b), I₂ = [c,d). (a) Hermitian symmetry of F implies F(Δ₂) = F̄(Δ₁) for Δ₁ = I₁ × I₂ and Δ₂ = I₂ × I₁. (b) Cauchy–Schwarz inequality implies |F([a,b) × [c,d))|² ≤ F([a,b)²) F([c,d)²).

The main diagonal measure F₀ has a domination property arising from the Cauchy–Schwarz inequality applied to ξ,

|E{ξ([a,b)) ξ̄([c,d))}|² ≤ E{|ξ([a,b))|²} E{|ξ([c,d))|²},

which can also be written

|F([a,b) × [c,d))|² ≤ F([a,b) × [a,b)) F([c,d) × [c,d)).

This immediately implies that if F₀ is absolutely continuous with respect to Lebesgue measure, then F_k is also, and its density satisfies

|f_k(λ)|² ≤ f₀(λ) f₀(λ − 2πk/T).   (6.55)

But also if F has a point mass at (λ₁, λ₂) then the diagonal must have point masses at (λ₁, λ₁) and (λ₂, λ₂). For additional conclusions that may be drawn when X_t is a real PC sequence, see the supplements to this chapter.
6.4 COVARIANCE AND SPECTRA FOR SPECIFIC CASES

The notation and definitions just completed are applied here to some specific examples.

6.4.1 PC White Noise

Periodically correlated white noise might also be called an uncorrelated PC sequence, thus providing an example of a situation where the terminology cyclostationary works better.
Definition 6.1 If ε_t is a normalized white noise (see Definition 4.6) and σ(t) = σ(t+T) is a nonnegative periodic sequence, then

X_t = σ(t) ε_t   (6.56)

is called PC white noise.

It is easy to compute that

R(s,t) = σ²(t) δ_{s−t},   (6.57)

which leads to

B_k(τ) = δ_τ (1/T) Σ_{t=0}^{T−1} σ²(t) e^{−i2πkt/T}.   (6.58)

So the spectral densities for PC white noise are simply the constants

f_k(λ) = (1/(2πT)) Σ_{t=0}^{T−1} σ²(t) e^{−i2πkt/T}, λ ∈ [0,2π).
Note if the variance is constant, a 2 ( t )= CT’! then a’
Bk(7)
=
{0
k=Oandr=O otherwise
>
showing that the nonstationary part (corresponding to k Taking the other extreme, suppose
# 0) of F is null. (6.59)
then we find $B_k(\tau) = \sigma^2 \delta_\tau / T$ and hence the spectral densities are all the same:
$$f_k(\lambda) = \frac{\sigma^2}{2\pi T} \quad \text{for all } k = 0, 1, \dots, T-1. \tag{6.60}$$
In view of (6.60), we can see that each element of the matrix spectral distribution function $F^{pp'}$ is comprised only of the absolutely continuous part, and so
$$f_Z(\lambda) = \frac{\sigma^2}{2\pi T} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}, \quad 0 \le \lambda < 2\pi/T. \tag{6.61}$$
Clearly $f_Z(\lambda)$ is of rank 1 and since $V(\lambda)$ is unitary for every $\lambda$, then from (6.50), $f_X(\lambda)$ is of rank 1 for every $\lambda$. Anticipating the application of multivariate (innovation) rank (see Section 4.4.3) to PC sequences (see Chapter 8), this particular case (6.59) is called rank-1 PC-T white noise. Note that PC-T white noise is actually a special case of $X_t = f_t y_t$ for $f_t$ periodic and $y_t$ stationary, where we identify $f_t^2 = \sigma^2(t)$. By convention $\sigma(t)$ is the positive square root of $\sigma^2(t)$.

6.4.2 Products of Scalar Periodic and Stationary Sequences
If $X_t = f_t y_t$ for $f_t$ periodic and $y_t$ stationary, then from (2.7) the simple result $R_X(s,t) = f_s \overline{f_t}\, R_y(s-t)$ leads to
$$B_k(\tau) = R_y(\tau) \sum_{p=0}^{T-1} F_p \overline{F_{p-k}}\, e^{i 2\pi p \tau/T}, \tag{6.62}$$
where $F_p$ is the Fourier coefficient
$$F_p = \frac{1}{T} \sum_{t=0}^{T-1} f_t\, e^{-i 2\pi p t/T},$$
with
$$f_t = \sum_{p=0}^{T-1} F_p\, e^{i 2\pi p t/T}.$$
If the sequence $y_t$ has an absolutely continuous spectrum, $F_y(d\lambda) = f_y(\lambda)\, d\lambda$, then from (6.62) and using standard Fourier relationships we obtain
$$f_k(\lambda) = \sum_{p=0}^{T-1} F_p \overline{F_{p-k}}\, f_y(\lambda - 2\pi p/T), \tag{6.63}$$
where in (6.63) we must use $F_p = F_{p+T}$ and the argument of $f_y$ must be taken (mod $2\pi$).
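The identity behind (6.62) can be verified exactly at $\tau = 0$ when $y_t$ is a unit white noise, since then $B_k(0) = \frac{1}{T}\sum_t f_t^2\, e^{-i2\pi kt/T}$ must equal $\sum_p F_p \overline{F_{p-k}}$. A small sketch; the periodic factor values are assumptions:

```python
import numpy as np

T = 4
f = np.array([1.0, -0.5, 2.0, 0.3])             # assumed real periodic factor f_t
Fp = np.fft.fft(f) / T                          # F_p = (1/T) sum_t f_t e^{-i2pi pt/T}

# B_k(0) from (6.62) with R_y(0) = 1 and tau = 0
Bk_from_662 = np.array([sum(Fp[p] * np.conj(Fp[(p - k) % T]) for p in range(T))
                        for k in range(T)])

# direct computation: B_k(0) = (1/T) sum_t f_t^2 e^{-i2pi kt/T}
Bk_direct = np.fft.fft(f ** 2) / T

print(np.max(np.abs(Bk_from_662 - Bk_direct)))  # agreement to machine precision
```

The index $p-k$ is taken modulo $T$, which is precisely the convention $F_p = F_{p+T}$ used in (6.63).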
6.5 ASYMPTOTIC STATIONARITY
Definition 6.2 For a second order sequence $X_t$, we define
$$R_{avg}(\tau) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{t=t_0-N}^{t_0+N} R(t+\tau, t) \tag{6.64}$$
whenever the limit exists. If $R_{avg}(\tau)$ is defined for all $\tau \in \mathbb{Z}$, then we say the sequence $X_t$ is asymptotically stationary.

It is not hard to see that the limit is independent of $t_0$ at any $\tau$ for which the limit exists, and if the limit exists for all $\tau$, then the function $R_{avg}(\tau)$ must be NND and hence a covariance.

Proposition 6.10 Every PC sequence is asymptotically stationary and
$$R_{avg}(\tau) = B_0(\tau). \tag{6.65}$$

Proof. The proof follows from the fact that for any positive integer $k$,
$$\frac{1}{kT} \sum_{t=t_0}^{t_0+kT-1} R(t+\tau, t) = B_0(\tau),$$
and so for any $N$, choosing $k^* = \lfloor (2N+1)/T \rfloor$ produces an average over $k^* T$ full periods that differs from the average over $2N+1$ terms by boundary terms whose contribution vanishes as $N \to \infty$. $\blacksquare$
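Proposition 6.10 can be illustrated numerically: for simulated PC white noise the empirical time average of $X_{t+\tau} X_t$ should approach $B_0(\tau) = \delta_\tau \frac{1}{T}\sum_t \sigma^2(t)$. A sketch under the same illustrative Gaussian assumptions used earlier:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5
sigma2 = np.array([1.0, 4.0, 0.25, 2.0, 1.0])   # illustrative periodic variance
N = 100_000
X = np.sqrt(sigma2[np.arange(N * T) % T]) * rng.standard_normal(N * T)

def R_avg(tau, n=N * T - 5):
    # empirical time-averaged covariance, cf. (6.64)
    return np.mean(X[tau:tau + n] * X[:n])

print(R_avg(0))   # close to B_0(0) = mean(sigma2) for large N
print(R_avg(1))   # close to B_0(1) = 0
```

The averages converge to the stationary covariance $B_0(\tau)$ even though $R(t+\tau, t)$ itself oscillates with period $T$.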
Proposition 6.11 Every harmonizable sequence is asymptotically stationary and
$$R_{avg}(\tau) = \int_\Delta e^{i\lambda_1 \tau}\, F(d\lambda_1, d\lambda_2). \tag{6.66}$$

Proof. The partial sum
$$\frac{1}{2N+1} \sum_{t=-N}^{N} E\{X_{t+\tau} \overline{X_t}\} = \int_0^{2\pi}\!\!\int_0^{2\pi} D_N(\lambda_1 - \lambda_2)\, e^{i\lambda_1 \tau}\, F(d\lambda_1, d\lambda_2), \tag{6.67}$$
where $D_N(\cdot)$ is given by (6.30), converges as $N \to \infty$ to the right-hand side of (6.66) from the same argument used in Proposition 6.6. $\blacksquare$
Proposition 6.12 If $F$ is the spectral measure of a harmonizable sequence and $\Delta$ is the diagonal of $[0,2\pi) \times [0,2\pi)$, then the diagonal measure defined by $F_\Delta(A) = F(\Delta \cap A \times A)$ satisfies $F_\Delta(A) \ge 0$ for any measurable subset $A$ of $[0, 2\pi)$.

Proof. For any $n$ the diagonal $\Delta$ can be covered by $n$ disjoint squares $S_{jn} \times S_{jn}$ of side $2\pi/n$,
$$\Delta \subset \Delta_n = \bigcup_j S_{jn} \times S_{jn},$$
in a way that $\Delta_n \downarrow \Delta$. Since $F(S_{jn} \times S_{jn} \cap A \times A) = E\{|\xi(S_{jn} \cap A)|^2\} \ge 0$ and $\Delta_n \cap A \times A \downarrow \Delta \cap A \times A$, then by the continuity of measure $F(\Delta_n \cap A \times A) \to F(\Delta \cap A \times A)$, and thus $F(\Delta \cap A \times A) \ge 0$. $\blacksquare$

In particular, if $X_t$ is stationary it is clearly asymptotically stationary and $R_{avg}(\tau)$ is the same as the covariance function of $X_t$. And we also recover the facts that were proved independently in Proposition 6.10: PC-T sequences are asymptotically stationary with $R_{avg}(\tau) = B_0(\tau)$ and $F_\Delta$ is $F_0$. For some additional facts, including some early references, on asymptotic stationarity, see [38,42,121,126,178].
6.6 LEBESGUE DECOMPOSITION OF F
Since $F$ is a measure on the Borel sets of $[0, 2\pi)^2$, from its Lebesgue decomposition we can write
$$F = F^{ac} + F^{s}, \tag{6.68}$$
where $F^{ac}$ is absolutely continuous with respect to two-dimensional Lebesgue measure $\mu_2$ and $F^{s}$ is singular with respect to $\mu_2$. The first observation is that
for a stationary or PC-T sequence, $F$ is singular with respect to $\mu_2$ because $\mu_2(S_T) = 0$. Since for any harmonizable sequence it is always true that $F^{ac}(\Delta) = 0$ for the diagonal $\Delta$, we conclude that $F(\Delta) > 0$ if and only if $F^s(\Delta) > 0$. To get a sense of the meaning of the absolutely continuous part, we will say that a second order sequence $X_t$ is transient whenever the time averaged variance is zero, or more explicitly whenever
$$\lim_{N \to \infty} \frac{1}{2N+1} \sum_{t=-N}^{N} E\{|X_t|^2\} = 0. \tag{6.69}$$
Proposition 6.13 If $X_t$ is harmonizable with spectral measure $F$, then the following are equivalent: (a) $X_t$ is transient; (b) $F(\Delta) = 0$; (c) $F = F^{ac}$.

Proof. The result follows from (6.66) and the remarks above. $\blacksquare$

See the supplements for more on this issue.

6.7 THE SPECTRUM OF $m_t$
An important source of singular discrete components in the spectrum of a PC sequence is the mean function $m_t = E\{X_t\}$, where $|m_t| \le E\{|X_t|\} < \infty$ because $X_t \in L^2(\Omega, \mathcal{F}, P)$. If a PC-T sequence $X_t$ has a nonzero mean then we can write $X_t = X_t' + m_t$, where $m_t = m_{t+T}$ and $X_t'$ is PC-T but with zero mean. Thus
$$R_X(s,t) = R_{X'}(s,t) + m_s \overline{m_t}, \tag{6.70}$$
Figure 6.7 Possible locations of spectral atoms of $F$ produced by the periodic mean $m_t$ for $T = 5$.
and the periodicity $m_t = m_{t+T}$ permits the discrete Fourier series representation
$$m_t = \sum_{k=0}^{T-1} \tilde m_k\, e^{i 2\pi k t/T} \tag{6.72}$$
with scalar coefficients
$$\tilde m_k = \frac{1}{T} \sum_{t=0}^{T-1} m_t\, e^{-i 2\pi k t/T}. \tag{6.73}$$
It follows easily that the spectral measure associated with $R_X(s,t)$ can be expressed as
$$F_X(d\lambda_1, d\lambda_2) = F_{X'}(d\lambda_1, d\lambda_2) + \sum_{j=0}^{T-1} \sum_{k=0}^{T-1} \tilde m_j \overline{\tilde m_k}\, \delta(\lambda_1 - 2\pi j/T,\ \lambda_2 - 2\pi k/T),$$
where $F_{X'}$ is the spectral measure associated with the covariance $R_{X'}$, and $\delta(a,b) = 1$ if $a = 0$ and $b = 0$ and otherwise $\delta(a,b) = 0$. So the point masses produced by $m_t$ are at the points $(2\pi j/T, 2\pi k/T)$ with mass $\tilde m_j \overline{\tilde m_k}$. The possible locations of the point masses attributed to the periodic mean for a PC-5 sequence are illustrated in Figure 6.7.

Now suppose $X_t$ is PC-T but its periodic mean is null and we are studying the spectral measure for the covariance of $X_t$. Can there still remain point masses in the spectrum of $X_t$, that is, in $F_X$? The answer is yes. To give a simple example, suppose
$$X_t = A e^{i\lambda_a t} + B e^{i\lambda_b t},$$
where $A$ and $B$ are zero mean second order random variables. This sequence has $m_t = E\{X_t\} = 0$ and covariance
$$R_X(s,t) = E\{|A|^2\}\, e^{i\lambda_a(s-t)} + E\{A\overline{B}\}\, e^{i\lambda_a s - i\lambda_b t} + E\{\overline{A}B\}\, e^{i\lambda_b s - i\lambda_a t} + E\{|B|^2\}\, e^{i\lambda_b(s-t)}.$$
If $E\{A\overline{B}\} = 0$, then $R_X(s,t)$ depends only on $s-t$ so $X_t$ is stationary and hence PC-T for every integer $T \ge 1$. More to the point, if $E\{A\overline{B}\} \ne 0$, the sequence is still PC-T (i.e., $R_X(s+T, t+T) = R_X(s,t)$) provided $\lambda_a - \lambda_b = 2j\pi/T$ for some integer $j$. Extending this, a PC-T sequence can have a countable number of harmonic terms in its singular discrete part
$$X_t^{sd} = \sum_j A_j e^{i\lambda_j t},$$
provided $\sum_j E\{|A_j|^2\} < \infty$ and $E\{A_j \overline{A_k}\} = 0$ unless $\lambda_j - \lambda_k = 2\pi m/T$ for some integer $m$. That is, the frequency pairs corresponding to nonzero correlations must all lie in the set $S_T$. The condition of Hermitian symmetry is seen in the simple statement $E\{A_j \overline{A_k}\} = \overline{E\{A_k \overline{A_j}\}}$ and that of the Schwarz inequality
$$|E\{A_j \overline{A_k}\}|^2 \le E\{|A_j|^2\}\, E\{|A_k|^2\}.$$
In order for $X_t^{sd}$ to be stationary it is necessary and sufficient for $E\{A_j \overline{A_k}\} = 0$ for $j \ne k$.

6.8 EFFECTS OF COMMON OPERATIONS ON PC SEQUENCES
This section contains a treatment of the most common operations that are often used to precondition time series or that serve as components of more complicated algorithms.

6.8.1 Linear Time Invariant Filtering
Given a random sequence $X_t$ and a nonrandom sequence $w_n$, $n \in \mathbb{Z}$, called filter coefficients, a new (filtered) sequence is formed by
$$Y_t = \sum_{n \in \mathbb{Z}} w_n X_{t-n}, \tag{6.78}$$
provided the sum converges in the mean-square sense. If $w_n = 0$ for $n < 0$ the filter is called causal: $Y_t$ depends only on the input $X_s$ for $s \le t$. If $\sum_{n \in \mathbb{Z}} |w_n| < \infty$, the filter is called stable. Since $R_X(t,t)$ is bounded for PC sequences, the finiteness of
$$\sum_{n \in \mathbb{Z}} \sum_{m \in \mathbb{Z}} |w_n|\, |w_m|\, |R_X(s-n, t-m)|$$
along with the Cauchy criterion suffices for the existence of the sum $Y_t$, and then
$$R_Y(s,t) = \sum_{n \in \mathbb{Z}} \sum_{m \in \mathbb{Z}} w_n \overline{w_m}\, R_X(s-n, t-m) = \sum_{n \in \mathbb{Z}} \sum_{m \in \mathbb{Z}} w_n \overline{w_m}\, R_X(s+T-n, t+T-m) = R_Y(s+T, t+T),$$
showing that the PC property is preserved by stable linear filtering. Causality is not required for the existence of the sum (6.78) defining $Y_t$, but in practical situations filters are typically causal.

The harmonizability of PC sequences permits a useful and helpful interpretation of the effects of linear time invariant filtering of PC sequences.¹ In Chapter 5 it was shown that the resulting sequence $Y_t$ (6.78) from the filtering of a harmonizable sequence $X_t$ by a stable linear filter can also be expressed as
$$Y_t = \int_0^{2\pi} e^{i\lambda t}\, W(\lambda)\, \xi(d\lambda), \tag{6.79}$$
where
$$W(\lambda) = \sum_{n \in \mathbb{Z}} w_n\, e^{-i\lambda n} \tag{6.80}$$
is the Fourier transform of the filter weights. The covariance of $Y_t$ then becomes
$$R_Y(s,t) = \int_0^{2\pi}\!\!\int_0^{2\pi} e^{i\lambda_1 s - i\lambda_2 t}\, W(\lambda_1) \overline{W(\lambda_2)}\, F_X(d\lambda_1, d\lambda_2), \tag{6.81}$$
where $F_X$ is the spectral measure for the sequence $X_t$. When $X_t$ is PC-T so the support of $F_X$ is contained in $S_T$, (6.81) describes how the filter response $W(\lambda)$ modifies $F_X$ and hence the output covariance. For example, Figure 6.8(a) presents the frequency response $W(\lambda)$ for a lowpass filter having 8

¹This interpretation may also be utilized in continuous time, where PC processes are not necessarily harmonizable.
real valued coefficients that were determined by a least-squares algorithm² and for which the cutoff design frequency is $\lambda_c = 0.30\pi$. Figure 6.8(b) shows a
Figure 6.8 Effects of filtering a PC-5 sequence by an 8 coefficient lowpass filter with $\lambda_c = 0.30\pi$. (a) Frequency response $W(\lambda)$. (b) $10\log_{10} |W(\lambda_1)W(\lambda_2)|$ image plot. (c) (Top) $20\log_{10} |W(\lambda)|$; (middle) $20\log_{10} |W(\lambda - 2\pi/T)|$; (bottom) $10\log_{10} |W(\lambda)W(\lambda - 2\pi/T)|$. (d) $10\log_{10} |W(\lambda_1)W(\lambda_2)|$ with $\lambda_2 = \lambda_1 - 2\pi/T$.
greyscale image of $10\log_{10} |W(\lambda_1)W(\lambda_2)|$ with $S_T$ for $T = 5$ overlaid in black. This image, whose diagonal is the response of Figure 6.8(a), shows that only the very low frequencies survive the filtering and the contributions of the off-diagonal measures, where the product $W(\lambda_1)W(\lambda_2)$ is small, are strongly

²This particular set of coefficients was determined by the function firls from the MATLAB Signal Processing Toolbox.
suppressed. Real coefficients produce the symmetry $W(\lambda) = \overline{W(2\pi - \lambda)}$ and hence the large values of $W(\lambda_1)W(\lambda_2)$ in the upper left and lower right corners. The plots of Figure 6.8(c) illustrate the values of $W(\lambda_1)W(\lambda_2)$ along the $k = 1$ support line $\lambda_2 = \lambda_1 - 2\pi/T$: the top trace is $20\log_{10} |W(\lambda)|$, the middle is $20\log_{10} |W(\lambda - 2\pi/T)|$, and the bottom is $10\log_{10} |W(\lambda)W(\lambda - 2\pi/T)|$. Figure 6.8(d) shows the densities $f_k(\lambda)$ produced by application of the same filter to PC-5 rank 1 white noise, for which all the densities are equal to the same constant. Note that an ideal lowpass filter that passes only frequencies in $0 \le \lambda < \lambda_c$ would completely suppress the PC structure provided $\lambda_c < 2\pi/T$ (in this example, $T = 5$). This result makes intuitive sense; if a PC sequence is smoothed by a filter having sufficiently long memory, then the nonstationary fluctuations will be smoothed out. Before continuing with further illustrations we shall formalize the statement of effects of linear time invariant filtering on a PC sequence.
Proposition 6.14 Suppose $X_t$ is PC-T having spectral measures $F_k^X$, $k = 0, 1, \dots, T-1$, and $\{w_n, n \in \mathbb{Z}\}$ are the coefficients of a stable linear filter. Then the spectral measures of the filtered sequence $Y_t$ are given by
$$F_k^Y(d\lambda) = W(\lambda)\, \overline{W(\lambda - 2\pi k/T)}\, F_k^X(d\lambda), \tag{6.82}$$
where $W(\lambda)$ is given by (6.80).

Proof. Graphically, almost by inspection. But, using Proposition 5.13 and (6.33), we obtain (6.82) by restricting (6.81) to $S_k = \{(\lambda_1, \lambda_2) : \lambda_2 = \lambda_1 - 2\pi k/T\}$ and always taking $\lambda_1$ and $\lambda_2$ modulo $2\pi$. $\blacksquare$

Note that the effect of LTI filtering on the main diagonal, which represents the "average" stationary spectrum, is exactly that of filtering a stationary sequence.
In the case of LTI filtering (with real coefficients) as illustrated in Figure 6.9, the extent to which the off-diagonal (nonstationary) part of $F_X$ is passed depends on where the passband intersects its reflection about $\lambda = \pi$. If the
Figure 6.9 Effects of filtering a PC-5 sequence by a 12 coefficient bandpass filter with band edges $(\lambda_1, \lambda_2) = (0.4\pi, 0.6\pi)$. (a) Frequency response $W(\lambda)$. (b) $10\log_{10} |W(\lambda_1)W(\lambda_2)|$ image plot. (c) (Top) $20\log_{10} |W(\lambda)|$; (middle) $20\log_{10} |W(\lambda - 2\pi/T)|$; (bottom) $10\log_{10} |W(\lambda)W(\lambda - 2\pi/T)|$. (d) $10\log_{10} |W(\lambda_1)W(\lambda_2)|$ with $\lambda_2 = \lambda_1 - 2\pi/T$.
area of high response covers a support line, then the nonstationary spectrum in this region will contribute to the output. It may be seen from Figure 6.9(b) that the area of high response is not directly over a support line. The bottom trace of Figure 6.9(c) shows that the response along the $k = 1$ support line is substantially smaller than the response on the diagonal.
In a subsequent section we will extend this analysis to compute the effect of periodically time varying (PTV) filters.

6.8.2 Differencing
In many applications (such as in economics or meteorology) the observed time series contains a trend term proportional to time $t$ or a very low frequency fluctuation that appears as a trend in short series. In this case it is common practice to produce a new sequence
$$Y_t = (1 - B) X_t = X_t - X_{t-1},$$
where $B$ is the back shift operator defined by $X_{t-1} = B X_t$. A nonrandom periodic component is sometimes called a periodic trend and in this case differencing with a one period lag,
$$Y_t = (1 - B^T) X_t = X_t - X_{t-T},$$
will completely suppress (eliminate) from $X_t$ any additive periodic function with period $T$. The $T$-point differences are also used to remove stochastic periodic terms produced by models having roots on the unit circle at $z_j = e^{i\lambda_j}$, where $\lambda_j = 2j\pi/T$, $j = 0, 1, \dots, T-1$. Since both of these operations are LTI filters, their effect on harmonizable sequences and, in particular, on PC sequences can be understood using (6.81) and Proposition 6.14.

In the case of first differences with lag 1, the filter coefficient sequence is $w_0 = 1$ and $w_1 = -1$ and the resulting frequency response is simply computed by (6.80) to be $W(\lambda) = 1 - e^{-i\lambda}$. Figure 6.10 presents the resulting $20\log_{10} |W(\lambda)|$ relative to the maximum ($W(\pi) = 2$) along with the grayscale image of $10\log_{10} |W(\lambda_1)W(\lambda_2)|$, also relative to its maximum. The filter response at $\lambda = 0$ is null and exceeds 0.707 of the maximum in the region $\pi/2 \le \lambda < \pi$. However, for low frequencies, $0 < \lambda < \pi/2$, the filter has large suppressive effects. So the entire lower part of the spectrum of a harmonizable (including stationary) sequence will be seriously affected by first differencing. Hence, if one thinks that the low frequencies carry some important information, more thought should be given to the design of the detrending filter. The support set $S_5$ for a PC-5 sequence is overlaid in the usual manner to show how the filter response would affect the spectrum.

Figure 6.11 presents the corresponding displays for the first difference with lag 5. In this case $W(2j\pi/5) = 0$ for $j = 0, 1, \dots, 5$ and so $W(\lambda_1)W(\lambda_2) = 0$ at the points $(\lambda_1, \lambda_2) = (2j\pi/T, 2k\pi/T)$ for $j, k \in \{0, 1, \dots, 5\}$ and where $T = 5$.
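The zeros of the differencing responses can be checked directly: lag-$T$ differencing nulls exactly the support frequencies $2\pi j/T$, while lag-1 differencing nulls only $\lambda = 0$. A minimal sketch:

```python
import numpy as np

T = 5
lam = 2 * np.pi * np.arange(T) / T      # the frequencies 2*pi*j/T
W1 = 1 - np.exp(-1j * lam)              # lag-1 difference response
WT = 1 - np.exp(-1j * T * lam)          # lag-T difference response

print(np.abs(WT))                       # all (numerically) zero
print(np.abs(W1))                       # zero only at lam = 0
```

Since $|1 - e^{-i\lambda}| = 2|\sin(\lambda/2)|$, the lag-1 response at the remaining support frequencies stays well above zero while the lag-$T$ response vanishes at every one of them.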
Figure 6.10 Frequency response for first difference with lag 1, $\Phi(B) = 1 - B$. (a) Frequency response $20\log_{10} |W(\lambda)|$. (b) Outer product frequency response $10\log_{10} |W(\lambda_1)W(\lambda_2)|$. $S_T$ for $T = 5$ is overlaid in black.
So any discrete components at these points will be completely suppressed in the output, where recall that discrete components can be produced by the periodic mean or by random periodic components. Of course, any other discrete components in the original spectrum will be passed but with their amplitudes modified by the filter response. Figure 6.11(b) shows that on the central part of each small square $[2j\pi/T, 2(j+1)\pi/T) \times [2k\pi/T, 2(k+1)\pi/T)$ the measure on the diagonal lines does not experience much suppression, but near the points $(\lambda_1, \lambda_2) = (2j\pi/T, 2k\pi/T)$ the measure is substantially affected. As in the lag 1 case, this argues for methods that do not affect so much of the spectral covariance measure $F_X$ in such a significant manner.
6.8.3 Random Shifts

We have already noted several ways in which periodically correlated processes have a very close connection to stationary processes. In this section we show the connection in yet another way: periodically nonstationary and periodically correlated sequences are essentially those sequences that can be made stationary by an independent random time shift. The motivation for this problem comes from engineering problems in which periodic functions arise and one wishes to treat the periodic function as a stationary process. Typically this is done because treating the periodic function as a stationary process makes calculations concerning the spectrum easier, and in some cases the average spectrum conveys enough information to solve the engineering problem at hand. And one is tempted by the argument that
Figure 6.11 Frequency response for first difference with lag 5, $\Phi(B) = 1 - B^5$. (a) Frequency response $20\log_{10} |W(\lambda)|$. (b) Outer product frequency response $10\log_{10} |W(\lambda_1)W(\lambda_2)|$. $S_T$ for $T = 5$ is overlaid in black.
the exact time origin is unknown anyway and so we may as well consider it as random and uniformly distributed over the period of the periodic function.

Fredrick Beutler [24] studied this problem for nonrandom functions in some detail. He showed that if $f : \mathbb{R} \to \mathbb{R}$, $f(t) = f(t+T)$ is Borel measurable and $\Theta$ is a real random variable uniformly distributed on $\{0, 1, \dots, T\}$, then $Y_t(\omega) = f(t + \Theta(\omega))$ is strictly stationary. He also showed that the uniform distribution for $\Theta$ is not necessarily the only distribution that can make $Y_t(\omega)$ stationary. The connection to PC processes (see Hurd [103]) and sequences is a little more complicated because the nonrandom function $f(\cdot)$ is replaced with the random function $X_t$. See Gardner [65] for a treatment that also includes certain almost PC processes. In discrete time, the shifted sequence can be explicitly written as
$$Y_t(\omega) = X_{t+\Theta(\omega)}(\omega),$$
so that in forming $Y_t(\omega)$ the randomness of the process $X_t(\omega)$ is mixed up by the random shift $\Theta(\omega)$. Given that both $X$ and $\Theta$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$, we come immediately to this question: How big does $\mathcal{F}$ need to be to ensure that $Y_t(\omega)$ is $\mathcal{F}$ measurable for each fixed $t$? If we denote $\mathcal{F}_X$ to be the sigma-field induced by $X$ (the smallest sigma-field containing the $\omega$-sets giving the finite dimensional distributions) and $\mathcal{F}_\Theta$ to be the sigma-field induced by $\Theta$, then we will see below that it is enough for $\mathcal{F}$ to contain the join $\mathcal{F}_X \vee \mathcal{F}_\Theta$. Recall $\mathcal{F}_X \vee \mathcal{F}_\Theta$ is the smallest sigma-field containing sets of the form $A \cap B$ for $A \in \mathcal{F}_X$ and $B \in \mathcal{F}_\Theta$.
Although $\Theta$ could generally be taken to have range $\mathbb{Z}$, for our current problem it suffices to consider $\Theta : \Omega \mapsto \{0, 1, \dots, T-1\}$. To see that $\mathcal{F}_X \vee \mathcal{F}_\Theta$ is indeed sufficient for our problem, let
$$S_j = \{\omega : \Theta(\omega) = j\} = \Theta^{-1}(j), \quad j = 0, 1, \dots, T-1, \tag{6.85}$$
so that
$$\Theta(\omega) = \sum_{j=0}^{T-1} 1_{S_j}(\omega)\, j, \tag{6.86}$$
and this leads at once to
$$Y_t(\omega) = X_{t+j}(\omega), \quad \omega \in S_j,\ j = 0, 1, \dots, T-1. \tag{6.87}$$
So now it is clear that for each fixed $t$, $Y_t(\omega)$ is a sum of products of $\mathcal{F}_X$ measurable functions with $\mathcal{F}_\Theta$ measurable functions; that is, $Y_t(\omega)$ is $\mathcal{F}_X \vee \mathcal{F}_\Theta$ measurable. Now if $X_t$ is independent of $\Theta$ in the sense that $\mathcal{F}_X$ and $\mathcal{F}_\Theta$ are independent sigma-fields, then for any $t$ and Borel set $A$,
$$\Pr[Y_t \in A] = \sum_{j=0}^{T-1} \Pr[\Theta = j \cap X_{t+j} \in A] = \sum_{j=0}^{T-1} \Pr[\Theta = j]\, \Pr[X_{t+j} \in A] = \sum_{j=0}^{T-1} p_j \Pr[X_{t+j} \in A], \tag{6.88}$$
where $p_j = \Pr[\Theta = j] = P(S_j)$.

This may immediately be extended to obtain for every $n$, every collection of times $t_1, t_2, \dots, t_n$ and Borel sets $A_1, A_2, \dots, A_n$,
$$\Pr[Y_{t_1} \in A_1, Y_{t_2} \in A_2, \dots, Y_{t_n} \in A_n] = \sum_{j=0}^{T-1} p_j \Pr[X_{t_1+j} \in A_1, X_{t_2+j} \in A_2, \dots, X_{t_n+j} \in A_n]. \tag{6.89}$$
The finite dimensional distributions of $Y_t$ are just $p$-weighted time averages of the finite dimensional distributions of $X_t$.
The same thing holds for moments. For example, if $X_t$ is second order, then
$$E\{Y_{t+\tau} \overline{Y_t}\} = \sum_{j=0}^{T-1} \sum_{k=0}^{T-1} E\{1_{S_j} 1_{S_k}\, X_{t+\tau+j} \overline{X_{t+k}}\} = \sum_{j=0}^{T-1} p_j\, R_X(t+\tau+j,\ t+j), \tag{6.90}$$
where the last equality results from $S_j \cap S_k = \emptyset$ for $j \ne k$ and the independence of $X$ with $\Theta$. After the following simple lemma we will be prepared to state the main result.
Lemma 6.2 A sequence $p : \mathbb{Z} \to \mathbb{C}$ is periodic with period $T$, that is, $p_t = p_{t+T}$, if and only if the sequence
$$\bar p_t = \frac{1}{T} \sum_{j=0}^{T-1} p_{t+j} \tag{6.91}$$
is constant with respect to the variable $t$.

Proof. If $p_t = p_{t+T}$, then $\bar p_t$ will not depend on $t$ because it is a uniform average over exactly one period; it does not matter where in the period one begins the sum. Conversely, if $\bar p_t$ does not depend on $t$, then
$$0 = \bar p_{t+1} - \bar p_t = \frac{1}{T}\,(p_{t+T} - p_t)$$
and so $p_t$ must be periodic with period $T$. $\blacksquare$
Proposition 6.15 If $X_t$ is periodically nonstationary with period $T$ and $\Theta$ is an integer valued random variable, uniformly distributed on $\{0, 1, \dots, T-1\}$ and independent of $X_t$, then $Y_t = X_{t+\Theta}$ is strictly stationary. Conversely, if $Y_t = X_{t+\Theta}$ is strictly stationary for some $\Theta$ uniformly distributed on $\{0, 1, \dots, T-1\}$ and independent of $X_t$, then $X_t$ is periodically nonstationary with period $T$.

Proof. Both statements are applications of Lemma 6.2 to (6.89). $\blacksquare$
We state the following separately because of our focus on the second order case.

Proposition 6.16 If $X_t$ is PC with period $T$ and $\Theta$ is an integer valued random variable, uniformly distributed on $\{0, 1, \dots, T-1\}$ and independent of $X_t$, then $Y_t = X_{t+\Theta}$ is wide sense stationary and its covariance is $B_0(\tau)$. Conversely, if $Y_t = X_{t+\Theta}$ is wide sense stationary for some $\Theta$ uniformly distributed on $\{0, 1, \dots, T-1\}$ and independent of $X_t$, then $X_t$ is PC with period $T$.

Proof. Both statements are applications of Lemma 6.2 to (6.90). In the first claim, (6.2) is used to see that the covariance of $Y$ is $B_0(\tau)$ (which is NND by Proposition 6.2). $\blacksquare$
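The direct claim of Proposition 6.16 can be checked at the covariance level: averaging $R_X(t+\tau+j,\, t+j)$ uniformly over $j = 0, \dots, T-1$ as in (6.90) removes the dependence on $t$. A sketch for PC white noise; the variance values are assumed:

```python
import numpy as np

T = 5
sigma2 = np.array([1.0, 4.0, 0.25, 2.0, 1.0])

def R_X(s, t):
    return sigma2[t % T] if s == t else 0.0      # PC white noise covariance

def R_Y(s, t):
    # (6.90) with uniform p_j = 1/T
    return sum(R_X(s + j, t + j) for j in range(T)) / T

# R_Y(t+tau, t) is independent of t and equals B_0(tau)
B0 = lambda tau: np.mean(sigma2) if tau == 0 else 0.0
max_dev = max(abs(R_Y(t + tau, t) - B0(tau))
              for t in range(10) for tau in range(-3, 4))
print(max_dev)   # ~0
```

The uniform average over one period is exactly the mechanism of Lemma 6.2: it turns the periodic function $t \mapsto R_X(t+\tau, t)$ into the constant $B_0(\tau)$.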
When $X_t$ is harmonizable the covariance $E\{Y_s \overline{Y_t}\}$ can be expressed nicely in terms of the spectral measure $F_X$.

Proposition 6.17 If $X_t$ is a harmonizable sequence and $Y_t = X_{t+\Theta}$, where $\Theta : \Omega \mapsto \{0, 1, \dots, T-1\}$ is independent of $X_t$, then
$$R_Y(s,t) = \int_0^{2\pi}\!\!\int_0^{2\pi} e^{i\lambda_1 s - i\lambda_2 t}\, \Phi_\Theta(\lambda_1 - \lambda_2)\, F_X(d\lambda_1, d\lambda_2), \tag{6.92}$$
where $\Phi_\Theta(u) = E\{e^{i\Theta u}\} = \sum_{j=0}^{T-1} e^{iju} p_j$ is the characteristic function of the random variable $\Theta$.

Proof. Since $E\{X_{s+\Theta} \overline{X_{t+\Theta}} \mid \Theta = \theta_0\} = R_X(s+\theta_0,\, t+\theta_0)$, then setting $p_j = \Pr[\Theta = j]$,
$$R_Y(s,t) = \sum_{j=0}^{T-1} p_j\, R_X(s+j,\, t+j) = \sum_{j=0}^{T-1} p_j \int_0^{2\pi}\!\!\int_0^{2\pi} e^{i\lambda_1(s+j) - i\lambda_2(t+j)}\, F_X(d\lambda_1, d\lambda_2). \tag{6.93}$$
The preceding line is easily identified as (6.92). $\blacksquare$
In the special case when $\Theta$ is uniformly distributed on $\{0, 1, \dots, T-1\}$, we obtain
$$\Phi_\Theta(u) = e^{i(T-1)u/2}\, \frac{\sin(Tu/2)}{T \sin(u/2)}.$$
Note that $\Phi_\Theta(\lambda_1 - \lambda_2) = 0$ whenever $\lambda_2 = \lambda_1 + 2\pi k/T$ for any integer $k \ne 0$. Hence the main diagonal of $S_T$ is preserved but all the off-diagonal components of $S_T$ are removed by the random time shift. This gives another way to view the direct claim of Proposition 6.16 and shows that if the support of $F_X$ is null on some of the off-diagonals $S_k$, then there are distributions other than uniform that will cause $X_{t+\Theta}$ to be stationary (see [24,65,103]).
6.8.4 Sampling

If $X_t$ is a random sequence and we form $Y_t = X_{kt}$ for $k$ a positive integer and $t \in \mathbb{Z}$, how are the covariance and spectral properties of $Y_t$ related to those of the original process $X_t$? The question can be answered for arbitrary harmonizable sequences with random spectral measure $\xi(\cdot)$ and spectral covariance measure $F$. For then by (6.35),
$$Y_t = X_{kt} = \int_0^{2\pi} e^{i\lambda k t}\, \xi(d\lambda), \tag{6.94}$$
which is an integral of exponential weights with respect to the random measure $\xi$. But $e^{i(\lambda + 2\pi/k)kt} = e^{i\lambda k t}$ implies that the exponential weights are the same for all $t \in \mathbb{Z}$ for any frequencies that differ by an integer multiple of $2\pi/k$. Hence we can combine the parts of $\xi$ that differ by multiples of $2\pi/k$, resulting in
$$Y_t = \int_0^{2\pi/k} e^{i\lambda k t} \sum_{j=0}^{k-1} \xi(d\lambda + 2\pi j/k), \tag{6.95}$$
which, after the transformation $\gamma = k\lambda$, may finally be written as
$$Y_t = \int_0^{2\pi} e^{i\gamma t}\, \xi'(d\gamma), \tag{6.96}$$
where $\xi'(d\gamma) = \sum_{j=0}^{k-1} \xi(d\gamma/k + 2\pi j/k)$. The interpretation of (6.96) is that sampling by a factor of $k$ does not preserve the original weighting of the random amplitudes by the exponentials, but causes the random amplitudes corresponding to frequencies that differ by an integer multiple of $2\pi/k$ to be weighted identically. This confounding of frequencies is usually called aliasing. As for the covariance of $Y_t$, one can repeat the preceding steps or compute directly from (6.96) to reach
$$R_Y(s,t) = \int_0^{2\pi}\!\!\int_0^{2\pi} e^{i\gamma_1 s - i\gamma_2 t}\, F_Y(d\gamma_1, d\gamma_2), \tag{6.97}$$
where
$$F_Y(d\gamma_1, d\gamma_2) = \sum_{j=0}^{k-1} \sum_{j'=0}^{k-1} F\big(d\gamma_1/k + 2\pi j/k,\ d\gamma_2/k + 2\pi j'/k\big). \tag{6.98}$$
To illustrate using a sampling factor of two ($k = 2$), the spectrum in each subsquare in Figure 6.12(a) is shifted onto the principal square and then rescaled to fill the entire square $[0, 2\pi)^2$ in Figure 6.12(b). This illustrates also the idea that if the original sequence $X_t$ had been lowpass filtered prior to sampling, then the spectrum in the "upper" quadrants would have been greatly suppressed so their effect on the sum (6.98) would have been small.
Figure 6.12 Aliasing effects due to sampling with $k = 2$. (a) Spectral support for a PC-8 sequence. (b) Resulting $F_Y$ produced by sampling with $k = 2$.
To determine whether $Y_t = X_{kt}$ is PC or stationary we examine $R_Y(s,t) = R_X(ks, kt)$ to find the smallest positive $\delta$ for which $R_Y(s+\delta, t+\delta) = R_Y(s,t)$ will be true (nontrivially) for all integers $s, t$. Since
$$R_Y(s+\delta, t+\delta) = R_X(ks + k\delta,\ kt + k\delta),$$
the previous equality will be satisfied whenever $k\delta = nT$, which has solutions $\delta, n$ since $k$ and $T$ are integers. The smallest $\delta$ giving a solution is paired with the least positive integer $n$ that makes $\delta = nT/k$ a positive integer. If $T$ and $k$ are relatively prime, then $n = k$ and the period of the new sequence is again $T$. If $T/k$ is an integer, then the period of the new sequence is $\delta = T/k$ (in the new index set). For example, if $T = 4$ and $k = 2$, then by choosing $n = 1$ we get $Y_t$ to be PC with period $\delta = 2$. But if $T = 5$ and $k = 2$, then we need to take $n = 2$ in order to see that $Y_t$ is again PC with period $\delta = 5$. Figure 6.13 illustrates the spectral effects of sampling a PC-4 sequence by a factor of $k = 2$ and $k = 4$. Figure 6.13(a) shows the support of the original $F_X$ and the support of $F_Y$ after the factor of 2 sampling is presented in Figure
6.13(b). An important case is when $k = T$, for then $\delta = 1$ so the sampled sequence is stationary. In this case all of the measures on the diagonals in the small squares of Figure 6.13(c) are added together to form the measure $F_Y$ of Figure 6.13(d).
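The period bookkeeping above amounts to $\delta = T/\gcd(T, k)$, which reproduces the examples in the text. A small sketch that also confirms the period against a concrete PC-$T$ covariance (the product covariance model is an assumption for the check):

```python
from math import gcd
import numpy as np

def sampled_period(T, k):
    # smallest positive delta with k*delta a multiple of T
    return T // gcd(T, k)

# matches the examples in the text
print(sampled_period(4, 2), sampled_period(5, 2), sampled_period(4, 4))   # 2 5 1

# confirm against a concrete PC-4 covariance: X_t = f_t y_t with y_t AR(1)
T, k = 4, 2
f = np.array([1.0, -0.5, 2.0, 0.3])
R_X = lambda s, t: f[s % T] * f[t % T] * 0.7 ** abs(s - t)
R_Y = lambda s, t: R_X(k * s, k * t)            # covariance of Y_t = X_{kt}
d = sampled_period(T, k)
max_dev = max(abs(R_Y(s + d, t + d) - R_Y(s, t))
              for s in range(8) for t in range(8))
print(max_dev)   # 0: Y_t is PC with period delta = 2
```

The case $k = T$ gives $\delta = 1$, recovering the statement that sampling once per period yields a stationary sequence.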
Figure 6.13 Result of sampling a PC-4 sequence by factors of 2 and 4. Small dashed lines define the subsets that are aliased. (a) Support set $S_4$ for a PC-4 sequence. (b) Support set for $F_Y$ resulting from sampling by a factor of $k = 2$. (c) Support set $S_4$ for a PC-4 sequence. (d) Support set for $F_Y$ resulting from sampling by a factor of $k = 4$.
An interesting application of filtering and sampling is the procedure called aggregation. Given a random sequence $X_t$ that has a periodic structure with period $T$, it is sometimes of interest to inquire about the sequence obtained by summing (or perhaps averaging) the values of $X_t$ over one period. For example, if $t$ signified a monthly index and $T = 12$, then one might have interest in the yearly aggregate.
Definition 6.3 The $T$-sample aggregate of a process $X_t$ is
$$Y_t = \sum_{p=0}^{T-1} X_{tT-p}. \tag{6.99}$$
The effects of aggregation may now be understood in terms of LTI filtering and sampling. Precisely, the sequence $Y_t$ may be seen as the $T$-point sampling of the filtered sequence $y_t = \sum_{n \ge 0} X_{t-n} w_n$, where $\{w_n\}$ has finite uniform weights, $w_n = 1$ for $n = 0, 1, \dots, T-1$ and $w_n = 0$ for $n \ge T$. This yields a lowpass type of filter whose frequency response is illustrated in Figure 6.14 for $T = 12$. Note that the frequency response has zeros at $2\pi k/T$ so the periodic components with period $T$ are removed, and near $\lambda = \pi$ the suppression of frequencies is 20 dB (an order of magnitude). The support lines for a PC-12 sequence are overlaid in black. Figure 6.14 shows the spectral densities that result from filtering PC-12 white noise by the uniform weight filter with $N = 12$. Recall that if $X_t$ is PC-12 then the filter output $y_t$ will generally be PC-12 but the 12-point sampling of $y_t$ will be stationary and its spectrum is the sum of all the diagonal measures of the $T \times T = 144$ subsquares, but scaled to fill $[0, 2\pi) \times [0, 2\pi)$. However, the original spectral covariance measure $F_X$ will be greatly suppressed on all the squares except those corresponding to low frequencies $\lambda_k < 2\pi/12$ (mod $2\pi$).

Figure 6.14 Frequency response for filter with uniform weights $w_n = 1$ for $n = 0, 1, \dots, N-1$ for $N = 12$. The PC-12 support set is overlaid in black. (a) Frequency response $20\log_{10} |W(\lambda)|$. (b) Outer product frequency response $10\log_{10} |W(\lambda_1)W(\lambda_2)|$.
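The zeros of the uniform-weight response at $2\pi k/T$ can be verified directly; a minimal sketch:

```python
import numpy as np

T = 12
n = np.arange(T)
# W(lam) = sum_{n=0}^{T-1} e^{-i lam n} for the uniform weights w_n = 1
W = lambda lam: np.sum(np.exp(-1j * lam * n))

zeros = [abs(W(2 * np.pi * k / T)) for k in range(1, T)]
print(max(zeros))          # ~0: periodic components with period T are removed
print(abs(W(0.0)))         # T = 12 at lam = 0, so the low frequencies pass
```

The response is a finite geometric sum, which vanishes exactly at every nonzero multiple of $2\pi/T$ and peaks at $\lambda = 0$.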
6.8.5 Bandshifting
Bandshifting refers to forming the product sequence
$$Y_t = X_t\, e^{i\lambda_s t}, \tag{6.100}$$
an operation we can easily understand in the context of harmonizable sequences. For then it follows from (6.35) that
$$Y_t = \int_0^{2\pi} e^{i(\lambda + \lambda_s)t}\, \xi(d\lambda) = \int_0^{2\pi} e^{i\gamma t}\, \xi'(d\gamma), \tag{6.101}$$
where the meaning of $\xi'(d\gamma) = \xi(d\gamma - \lambda_s)$ should be clear (for $0 \le \gamma < 2\pi$ take $[\gamma - \lambda_s]$ (mod $2\pi$)). So the random spectral measure $\xi'$ for $Y_t$ is that of $X_t$ shifted upward by $\lambda_s$. This is often called complex bandshifting because of multiplication by the complex exponential rather than by sines or cosines. The covariance of $Y_t$ is simply computed to be (even for nonharmonizable sequences)
$$R_Y(s,t) = e^{i\lambda_s(s-t)}\, R_X(s,t), \tag{6.102}$$
from which it is easy to see that complex bandshifting a stationary or a PC-T sequence by any shift frequency $\lambda_s$ leaves the respective result stationary or PC-T. This may also be understood from the spectral covariance measure. Indeed, for any harmonizable sequence
$$F_Y(d\gamma_1, d\gamma_2) = F_X(d\gamma_1 - \lambda_s,\ d\gamma_2 - \lambda_s) \tag{6.103}$$
and, as above, $d\gamma_1 - \lambda_s$ and $d\gamma_2 - \lambda_s$ are to be taken mod $2\pi$. The effect of the bandshifting by frequency $\lambda_s$ is to shift the measure $F$ along the main diagonal upward by $\lambda_s$. Since frequency is taken mod $2\pi$, the amplitude associated with a frequency $\gamma_0 > 2\pi$ will now appear at $\gamma_0 - 2\pi$, a phenomenon sometimes called wrapping. To illustrate, we consider the weighted measure $W(\lambda_1)\overline{W(\lambda_2)}\, F(d\lambda_1, d\lambda_2)$ produced by the lowpass LTI filter with cutoff frequency $\lambda_c = 0.15\pi$ whose response is illustrated in Figure 6.15(b). Figures 6.15(c) and 6.15(d) show the effect of bandshifting by $\lambda_s = 0.2\pi$ and $0.5\pi$ respectively. Note the "wrapping" effect.
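That bandshifting preserves the PC-T property follows from (6.102) and can be checked on any PC-T covariance; here a product-type covariance $R_X(s,t) = f_s f_t\, \rho^{|s-t|}$ is assumed for illustration:

```python
import numpy as np

T = 5
f = np.array([1.0, -0.5, 2.0, 0.3, 1.5])        # assumed periodic factor
lam_s = 0.2 * np.pi                             # shift frequency

def R_X(s, t):
    return f[s % T] * f[t % T] * 0.7 ** abs(s - t)   # a PC-T covariance

def R_Y(s, t):
    return np.exp(1j * lam_s * (s - t)) * R_X(s, t)  # (6.102)

max_dev = max(abs(R_Y(s + T, t + T) - R_Y(s, t))
              for s in range(10) for t in range(10))
print(max_dev)   # ~0: the bandshifted sequence is still PC-T
```

The exponential factor depends only on $s - t$, so it cannot disturb the period-$T$ shift invariance of $R_X$.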
Figure 6.15 Effect of bandshifting and lowpass filtering. (a) Frequency response $W(\lambda)$ for an 8 coefficient lowpass filter with cutoff $\lambda_c = 0.15\pi$. (b) $10\log_{10} |W(\lambda_1)W(\lambda_2)|$. (c) Effect of bandshifting by $\lambda_s = 0.2\pi$. (d) Effect of bandshifting by $\lambda_s = 0.5\pi$.
6.8.6 Periodically Time Varying (PTV) Filters

If in (6.78) the filter coefficients $w_n$, $n \in \mathbb{Z}$, are replaced by a collection $w_n(t)$, $n \in \mathbb{Z}$, satisfying $w_n(t) = w_n(t+T)$ for every $n \in \mathbb{Z}$, the filtered sequence is formed by
$$Y_t = \sum_{n \in \mathbb{Z}} w_n(t)\, X_{t-n}, \tag{6.104}$$
provided the sum converges in the mean square sense. If $w_n(t) = 0$ for all $t$ and $n < 0$ the filter is called causal: $Y_t$ depends only on the input $X_s$ for $s \le t$. If $\sum_{n \in \mathbb{Z}} |w_n(t)| < \infty$ for $t = 0, 1, \dots, T-1$, the filter is called stable; in this
case note C n E Z m a x t = o , l , . . . , ~Iwn(t)l -l < co. If a PC-T sequence is filtered by a stable PTV filter, also with period T , then since R x ( t ,t ) is bounded,
cc
lwn(t)lIwn~(t)llRx(t - n,t - n/)l< m,
(6.105)
n E Z n’EZ
ensuring the existence of the sum Yt. Furthermore,
R Y ( s , ~ )=
C cw n ( ~ ) w n ( ( t ) R ~n,t( s n’) C C + T)w,/(t + T ) R x ( s+ T -
-
n E Z n’EZ
=
W,(S
- 72,
t +T
- n’)
nEZ n‘EZ
=
RY ( S
+ T ,t + T ) ,
showing that the PC-T property is preserved by stable PTV filtering of the same period T . As for the case of LTI filtering, we can use the harmonizability of PC sequences to express and interpret the covariance Yt. First we define for stable PTV filters. (6.106) n=-m
so the sum in (6.104) becomes
$$Y_t = \sum_{n \in \mathbb{Z}} w_n(t) \int_0^{2\pi} e^{i\lambda(t-n)}\, \xi(d\lambda) = \int_0^{2\pi} e^{i\lambda t}\, W(\lambda, t)\, \xi(d\lambda). \qquad (6.107)$$
The interchange of integral and sum is justified essentially by (6.105) and Fubini's theorem. Second, the covariance of $Y_t$ then becomes
$$R_Y(s,t) = \int_0^{2\pi} \int_0^{2\pi} e^{i(\lambda_1 s - \lambda_2 t)}\, W(\lambda_1, s)\, \overline{W(\lambda_2, t)}\, F_X(d\lambda_1, d\lambda_2), \qquad (6.108)$$
where $F_X$ is the spectral covariance measure for the sequence $X_t$. A little more interpretation may be obtained by expressing $w_n(t)$ by its discrete Fourier series
$$w_n(t) = \sum_{k=0}^{T-1} w_n^k\, e^{i2\pi kt/T} \qquad (6.109)$$
with
$$w_n^k = \frac{1}{T} \sum_{t=0}^{T-1} w_n(t)\, e^{-i2\pi kt/T}, \qquad (6.110)$$
where it is clear that $\sum_{n} |w_n^k| < \infty$ for $k = 0, 1, \dots, T-1$. Then the expression for $Y_t$ becomes
$$Y_t = \sum_{k=0}^{T-1} e^{i2\pi kt/T} \sum_{n \in \mathbb{Z}} w_n^k\, X_{t-n} = \sum_{k=0}^{T-1} e^{i2\pi kt/T} \int_0^{2\pi} e^{i\lambda t}\, W^k(\lambda)\, \xi(d\lambda), \qquad (6.111)$$
where $W^k(\lambda) = \sum_{n \in \mathbb{Z}} w_n^k\, e^{-i\lambda n}$,
showing that the action of a PTV filter is represented by the filtering of $X_t$ by $T$ different LTI filters and then bandshifting the $k$th filter output by $2\pi k/T$ and summing the results. This finally gives the alternative expression
$$R_Y(s,t) = \sum_{k=0}^{T-1} \sum_{k'=0}^{T-1} e^{i2\pi(ks - k't)/T} \int_0^{2\pi}\!\int_0^{2\pi} e^{i(\lambda_1 s - \lambda_2 t)}\, W^k(\lambda_1)\, \overline{W^{k'}(\lambda_2)}\, F_X(d\lambda_1, d\lambda_2). \qquad (6.112)$$
We note that PTV filtering of PC sequences can also be understood by blocking $X_t$ into vectors of length $T$. This lifts the problem to one of filtering stationary $T$-variate sequences by a time invariant filter having matrix coefficients. For further discussion of PTV filtering applied to PC sequences (and continuous time PC processes) see [59]. The design of PTV filters is addressed in [184], where related references are given.

PROBLEMS AND SUPPLEMENTS

6.1 If $X_t$ is stationary, show that $X_t + X_{2t}$ is harmonizable and describe the most general form of support of its spectral measure $F$. What happens if $X_t$ is PC-T?

6.2 Denote $\{\lambda_j : j = 1, 2, \dots\}$ as the set of rational numbers in $[0, 2\pi)$. Let $\{A_j : j = 1, 2, \dots\}$ be a sequence of orthogonal second order random variables with $\sum_j E\{|A_j|^2\} < \infty$. The sequence $X_t = \sum_j A_j e^{i\lambda_j t}$ is a stationary sequence whose discrete spectrum is dense in $[0, 2\pi)$. Use this idea to construct a PC-T sequence whose discrete spectrum is dense in $S_T$.
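As a supplement to Section 6.8.6, the decomposition (6.111), a PTV filter acting as $T$ LTI filters whose outputs are bandshifted and summed, can be checked numerically. The sketch below uses hypothetical random coefficients; `w[n, t % T]` plays the role of $w_n(t)$.

```python
import numpy as np

# Direct PTV filtering y_t = sum_n w_n(t) x_{t-n} versus the decomposition
# (6.111): T LTI filters w^k, bandshift the kth output by 2*pi*k/T, and sum.
rng = np.random.default_rng(0)
T, L, N = 4, 8, 200                      # period, filter length, signal length
w = rng.standard_normal((L, T))          # w[n, t % T] plays the role of w_n(t)
x = rng.standard_normal(N)

y_direct = np.zeros(N, dtype=complex)
for t in range(L, N):
    for n in range(L):
        y_direct[t] += w[n, t % T] * x[t - n]

# DFT coefficients: wk[n, k] = (1/T) * sum_t w[n, t] e^{-i 2 pi k t / T}
wk = np.fft.fft(w, axis=1) / T
y_decomp = np.zeros(N, dtype=complex)
for k in range(T):
    zk = np.zeros(N, dtype=complex)      # LTI filtering of x by the kth filter
    for t in range(L, N):
        zk[t] = np.dot(wk[:, k], x[t - np.arange(L)])
    y_decomp += np.exp(2j * np.pi * k * np.arange(N) / T) * zk
```

The two outputs agree because $w_n(t) = \sum_k w_n^k e^{i2\pi kt/T}$ holds term by term.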
6.3 In the case that $X_t$ is a real PC sequence, $R(s,t)$ is real and so
$$B_k(-\tau) = \overline{B_{-k}(\tau)}\, e^{-i2\pi k\tau/T} = \overline{B_{T-k}(\tau)}\, e^{-i2\pi k\tau/T}. \qquad (6.113)$$
Appealing directly to (6.2) we obtain further that
$$\operatorname{Re} B_k(\tau) = \operatorname{Re} B_{T-k}(\tau) \quad \text{and} \quad \operatorname{Im} B_k(\tau) = -\operatorname{Im} B_{T-k}(\tau), \qquad (6.114)$$
so
$$B_k(\tau) = \overline{B_{T-k}(\tau)}. \qquad (6.115)$$
Expressions (6.51) and (6.115) together lead to
$$B_k(-\tau) = B_k(\tau)\, e^{-i2\pi k\tau/T}. \qquad (6.116)$$
So if we define a modified coefficient function by
$$\widetilde{B}_k(\tau) = B_k(\tau)\, e^{-i\pi k\tau/T}, \qquad (6.117)$$
it may be seen that $\widetilde{B}_k(\tau)$ is even, $\widetilde{B}_k(-\tau) = \widetilde{B}_k(\tau)$. An application of the inversion formula (see Loève [139, page 199], Brockwell and Davis [28, page 151])
leads to a condition of symmetry about the point $\lambda = \pi$ expressed by
$$\widetilde{F}_k([a, b)) = \widetilde{F}_k([2\pi - b,\, 2\pi - a)) \qquad (6.118)$$
for any interval $[a, b) \subset [0, 2\pi)$. From (6.116) it also follows that $F_k([a,b)) = \widetilde{F}_k([a + \pi k/T,\, b + \pi k/T))$, where the meaning of the latter is obtained by extending $\widetilde{F}_k$ periodically with period $2\pi$ over the entire real line. For $k = 0$ this symmetry exists even for the complex case because $B_0(-\tau) = \overline{B_0(\tau)}$ and $F_0$ is real. However, for $k \neq 0$, we evidently need $X_t$ to be real in order to conclude that $\widetilde{B}_k(\tau)$ is even. We also note that (6.118) leads to a condition of symmetry about the point $\lambda = 0$, expressed for the shifted measure $\breve{F}_k([a,b)) = \widetilde{F}_k([a + \pi,\, b + \pi))$ by
$$\breve{F}_k([a, b)) = \breve{F}_k([-b, -a)).$$
This may also be seen as the consequence of using a different exponential sequence in (6.117), specifically
$$\breve{B}_k(\tau) = B_k(\tau)\, e^{i\pi\tau - i\pi k\tau/T},$$
to obtain an even $\breve{B}_k(\tau)$. It is also of interest to observe the meaning of these symmetries in terms of the measure $F$. Beginning with the measure $\mu$ from which $F$ was constructed,
we note that the symmetries about $\gamma = \pi$ and $\gamma = 0$ in the measures $F_k$ become symmetries about the lines $\lambda_2 = 2\pi - \lambda_1$ and $\lambda_2 = -\lambda_1$ in Figure 6.2. These lines become the antidiagonal of the square $[0, 2\pi) \times [0, 2\pi)$ under the transformation that takes $\mu$ into $F$. Thus when $X_t$ is real, the measure $F$ is symmetric about the antidiagonal $\lambda_2 = 2\pi - \lambda_1$. But as we have already noted, there is also the symmetry about the main diagonal given by $F([a,b),[c,d)) = \overline{F([c,d),[a,b))}$. Thus $F$ is evidently determined completely on the square $[0,2\pi) \times [0,2\pi)$ by its values on any one of the four triangles formed by the diagonal, the antidiagonal, and the boundaries of the square.
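The symmetries (6.115) and (6.116) can be verified numerically for a concrete real PC covariance. The amplitude-modulated form $R(s,t) = f(s)f(t)\rho(s-t)$ below is a hypothetical example, not one from the text.

```python
import numpy as np

# Hypothetical real PC-T covariance: R(s,t) = f(s) f(t) rho(s-t), f periodic.
T = 6
def f(t):
    return 1.0 + 0.5 * np.cos(2 * np.pi * np.asarray(t) / T)
def rho(tau):
    return 0.5 ** np.abs(tau)
def R(s, t):
    return f(s) * f(t) * rho(np.asarray(s) - np.asarray(t))

def B(k, tau):
    # B_k(tau) = (1/T) sum_{t=0}^{T-1} R(t + tau, t) e^{-i 2 pi k t / T}
    t = np.arange(T)
    return np.mean(R(t + tau, t) * np.exp(-2j * np.pi * k * t / T))
```

One can check $B_k(\tau) = \overline{B_{T-k}(\tau)}$ and $B_k(-\tau) = B_k(\tau)e^{-i2\pi k\tau/T}$ directly for any $k$ and $\tau$.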
6.4 Show that if the limit
$$R_{\mathrm{avg}}(\tau) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{t=-N}^{N} E\{X_{t+\tau}\overline{X_t}\}$$
exists for every $\tau \in \mathbb{Z}$, then $R_{\mathrm{avg}}(\tau)$ is NND.
6.5 Compute the densities $f_k(\lambda)$ for $X_t = f_t Y_t$ with $Y_t$ stationary white noise and with $f_t = 1 + m\cos(2\pi t/T)$, where $m$ is a real parameter (usually $m \le 1$).

6.6 To get a version of Proposition 6.9 in terms of the spectral processes of $Y(n)$ and $Z_p(t)$, first re-express (6.38) as
$$Z_p(t) = \sum_{k=0}^{T-1} e^{ik2\pi p/T} \int_0^{2\pi/T} e^{i\gamma t}\, \xi(d\gamma + 2\pi k/T) = \int_0^{2\pi/T} e^{i\gamma t}\, \eta_p(d\gamma),$$
so considering the random value $\eta_p(\{0\})$ gives
$$\eta_p(\{0\}) = \sum_{k=0}^{T-1} e^{ik2\pi p/T}\, \xi(\{2\pi k/T\}).$$
Under conditions for the WLLN, $\eta_p(\{0\}) = m(p)$, the mean at time $p$, and $\xi(\{2\pi k/T\}) = m_k$, the $k$th Fourier coefficient in the Fourier series of $m(t)$.
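The algebraic identity underlying the last statement, that the periodic mean $m(t)$ is recovered from its Fourier coefficients $m_k$, can be sketched as follows (the values of $m(t)$ are hypothetical).

```python
import numpy as np

T = 5
m = np.array([1.0, -0.3, 2.0, 0.7, -1.1])   # one period of a hypothetical m(t)
mk = np.fft.fft(m) / T                      # m_k = (1/T) sum_t m(t) e^{-i2pikt/T}
p = np.arange(T)
# m(p) = sum_k m_k e^{i 2 pi k p / T}
m_rec = np.sum(mk[None, :] * np.exp(2j * np.pi * np.outer(p, np.arange(T)) / T),
               axis=1)
```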
6.7 Verify that the matrix $V(\lambda)$ of (6.42) is unitary for all $0 \le \lambda < 2\pi$.

6.8 If $F(\Delta) > 0$, we will say that the sequence is not transient, or continuing. If $F(\Delta) > 0$ and the transient part is null, we will say the sequence is purely not transient, or purely continuing (persistent). Stationary sequences, PC-T sequences, and some others are purely continuing. They have no inherent transient part. Of
course, the sum sequence
$$X_t = Y_t + Z_t, \qquad (6.119)$$
for $Y_t$ PC-T and $Z_t$ transient, with $Y_t$ and $Z_t$ orthogonal sequences, is neither transient nor purely continuing; it is just continuing. The measures $F_k$ can be further decomposed,
$$F_k = F_k^{ac} + F_k^{s}, \qquad (6.120)$$
where $F_k^{ac} \ll \mu$ and $F_k^{s} \perp \mu$, and this leads to the further decomposition
$$F_k = F_k^{sd} + F_k^{sc} + F_k^{ac}, \qquad (6.121)$$
where $F_k^{sd}$ is the singular discrete part consisting of point masses; $F_k^{sc}$ is the singular continuous part, consisting of $F_k^{s}$ less the point masses; and $F_k^{ac}$ is absolutely continuous with respect to Lebesgue measure. The component $F_k^{ac}$ can be expressed in terms of a density
$$f_k(\lambda) = \frac{dF_k^{ac}}{d\lambda}(\lambda), \qquad (6.122)$$
which we will call the spectral density of the PC sequence. If $F_k^{s}$ is null, then
$$B_k(\tau) = \int_0^{2\pi} e^{i\lambda\tau} f_k(\lambda)\, d\lambda.$$
It is possible to further develop the decomposition of the singular part of $F$,
$$F^{s} = F^{sc} + F^{sd_1} + F^{sd_2},$$
where $F^{sd_2}$ is concentrated on a countable set of points of $[0, 2\pi)^2$, $F^{sd_1}$ is concentrated on a countable set of lines in $[0, 2\pi)^2$, and $F^{sc}$ is the remainder. Note the remainder is singular continuous with respect to the specific other singular parts $F^{sd_1}$ and $F^{sd_2}$. Since $|F_k|$ is a finite measure, $F_k^{sd}$ can contain no more than a countable number of point masses and hence $F^{sd}$ can contain no more than a countable set of point masses. Of course, this could have been concluded directly from the singular discrete part of $F^{s}$.
CHAPTER 7
REPRESENTATIONS OF PC SEQUENCES
As we mentioned previously, PC sequences arise from the mixing of stationarity and periodicity. This is very clearly seen, we believe, through the unitary operator of a PC sequence and the representations that follow. At the most basic level, the connection of the PC structure to a group of shift operators was first mentioned by Ogura [174] as a motivation for the development of spectral representations (that are much like those of Gladyshev). The explicit use of the unitary operator was introduced in [111] for continuous time PC processes; its role in almost PC processes was explored in [112]. In this chapter we show the existence of this operator and develop the representations that follow from it. We then use the spectral theory to prove Gladyshev's representation theorem (Proposition 6.8) but in a broader context than the original proof. The spectral theorem also permits the explicit construction of the random spectral measure $\xi$ appearing in the harmonizable representation. We will also see that, not surprisingly, the role of the unitary operator for PC sequences is a straightforward extension of its role in the stationary case.
Periodically Correlated Random Sequences: Spectral Theory and Practice. By H.L. Hurd and A.G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
7.1 THE UNITARY OPERATOR OF A PC SEQUENCE

Reviewing from Chapter 3, a unitary operator on a Hilbert space $\mathcal{H}$ is a linear operator $U$ from $\mathcal{H}$ onto $\mathcal{H}$ for which $(Ux, Uy) = (x, y)$ for every $x, y \in \mathcal{H}$; that is, unitary operators are linear and preserve inner products. Every unitary operator can be written as an integral with respect to a spectral measure (see Theorem 3.6)
$$U = \int_0^{2\pi} e^{i\lambda}\, E(d\lambda). \qquad (7.1)$$
Proposition 7.1 A second order stochastic sequence $X_t$ is PC-T if and only if there exists a unitary operator $U$ on its time domain $\mathcal{H}_X$ (see Definition 4.2) such that
$$X_{t+T} = U X_t, \quad \text{for every } t \in \mathbb{Z}. \qquad (7.2)$$

Proof. In view of the discussion in Chapter 4, we assume $1 \in \mathcal{H}_X$. If there exists a unitary $U$ for which (7.2) is true, then the mean and correlation functions of $X_t$ satisfy $m(t) = m(t+T)$ and $R(s,t) = R(s+T, t+T)$ for every $s, t \in \mathbb{Z}$. Conversely, if $X_t$ is PC with period $T$, then for $z = \sum_{j=1}^{n} a_j X_{t_j}$ in $\mathcal{L}_X = \operatorname{sp}\{X_t, t \in \mathbb{Z}\}$, define the operator $U$ by $Uz = \sum_{j=1}^{n} a_j X_{t_j + T}$. It is not difficult to show (see Problem 7.5 at the end of the chapter) that this extension of $U$ to $\mathcal{L}_X$ is linear, well defined, and preserves inner products. It is also easy to see that $U$, as a map from $\mathcal{L}_X$ to $\mathcal{L}_X$, is onto. It then follows from the continuity of $U$ that it extends to an operator on the closure $\mathcal{H}_X = \overline{\mathcal{L}_X}$. One can finally verify that this extension is unitary. ∎

So for PC-T sequences, shifts of length $T$ are unitary. If $X_t$ is a PC sequence and $U$ is a shift operator for $T = 1$, then $X_t$ is stationary. Recall from Proposition 4.8 that multivariate stationary sequences have the property that there is a single unitary operator $U_{\mathbf{X}}$ that acts as the shift operator for each component of $\mathbf{X}_n$. It should be clear that if $\mathbf{X}_n$ is the $T$-variate stationary sequence arising from $T$-blocks of a PC-T sequence, then $U = U_{\mathbf{X}}$.
The existence of $U$ leads to another characterization of PC sequences that becomes very useful for obtaining representations of PC processes.

Proposition 7.2 A second order sequence $X_t$ is PC-T if and only if there exists a unitary operator $V$ and a periodic sequence $P_t$, taking values in $\mathcal{H}_X$, for which
$$X_t = V^t P_t \qquad (7.3)$$
for every $t \in \mathbb{Z}$.
Proof. If $X_t$ is given by (7.3) then
$$(X_s, X_t) = (V^s P_s,\, V^t P_t) = (V^{s+T} P_{s+T},\, V^{t+T} P_{t+T}) = (X_{s+T}, X_{t+T}), \qquad (7.4)$$
so $X_t$ is PC. Conversely, if $X_t$ is PC-T, the spectral representation of $U$ given by (7.1) provides a means to construct a unitary operator by
$$V = \int_0^{2\pi} \exp(i\lambda/T)\, E(d\lambda), \qquad (7.5)$$
where $V^T = U$. In other words, $V$ is a $T$th root of $U$. If we take $P_t = V^{-t} X_t$ we can easily verify that (7.3) holds and $P_t$ is periodic because
$$\|P_{t+T} - P_t\| = \|V^{-t-T}[X_{t+T}] - V^{-t}[X_t]\| = \|V^{-T}[X_{t+T}] - X_t\| = 0,$$
by virtue of (7.2). ∎
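In finite dimensions the construction (7.5) of the principal $T$th root can be imitated directly: map each eigenvalue $e^{i\theta}$, with $\theta \in [0, 2\pi)$, to $e^{i\theta/T}$. A sketch, where the matrix $U$ is a hypothetical random unitary:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 4
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
U, _ = np.linalg.qr(A)                    # a random 5x5 unitary matrix

w, W = np.linalg.eig(U)                   # U = W diag(w) W^{-1}, |w_j| = 1
theta = np.mod(np.angle(w), 2 * np.pi)    # principal angles in [0, 2*pi)
V = W @ np.diag(np.exp(1j * theta / T)) @ np.linalg.inv(W)
```

By construction $V^T = U$, and $V$ is (numerically) unitary because $U$ is normal, mirroring the spectral-integral definition.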
The expression (7.3) is a clear contrast to the stationary case, in which $P_t = X_0$, a fixed random variable. We also note that the characterization is not unique, as $\widetilde{V} = V \exp(i2\pi/T)$ and $\widetilde{P}_t = P_t \exp(-i2\pi t/T)$ yield another unitary operator and periodic function that also solve (7.3). The spectral theorem shows that any Borel measurable function $f_V(\lambda)$ with $|f_V(\lambda)| = 1$ and $f_V^T(\lambda) = \exp(i\lambda)$ will yield an operator, not necessarily equal to (7.5), but with $V^T = U$. The operator $V$ defined by (7.5) is, however, unitary and we shall call it the principal $T$th root of $U$.

7.2
REPRESENTATIONS BASED ON THE UNITARY OPERATOR
Representations for $X_t$ follow naturally from (7.3) by simultaneously employing various representations for $V$ and $P_t$. We begin by proving again the representation of Gladyshev (Proposition 6.8).

7.2.1
Gladyshev Representation
Proposition 7.3 (Gladyshev) A second order sequence $X_t$ is PC-T if and only if there exists a $T$-variate stationary sequence $Z_t = [Z_t^j]$ such that
$$X_t = \sum_{j=0}^{T-1} Z_t^j\, e^{i2\pi jt/T}. \qquad (7.6)$$
Proof. If $X_t$ has the stated representation, it is clearly PC with period $T$. Conversely, if $X_t$ is PC with period $T$, then (7.6) follows from Proposition 7.2 and the representation of the periodic function $P_t$ by its (finite) Fourier series
$$P_t = \sum_{j=0}^{T-1} \widetilde{P}_j\, e^{i2\pi jt/T},$$
where $\widetilde{P}_j \in \mathcal{H}_X$, and then making the identification
$$Z_t^j = V^t[\widetilde{P}_j].$$
The $Z_t^j$ are jointly stationary because they are orbits of different starting vectors $\widetilde{P}_j$ under the same unitary operator $V$. ∎
As noted in Chapter 6, Gladyshev actually stated a result in which the jointly stationary sequences $Z_t^j$ are "band limited" to the interval $[0, 2\pi/T)$. This corresponds to taking $V$ to be the principal root of $U$ as we have done in (7.5). Then the change of variable $\gamma = \lambda/T$ in (7.5) produces
$$V = \int_0^{2\pi/T} e^{i\gamma}\, \widetilde{E}(d\gamma), \qquad \widetilde{E}(\Delta) = E(T\Delta).$$
Since $V$ is now back in the canonical integral form (see [2]), it is clear that the spectral support of $V$ is contained in the interval $[0, 2\pi/T)$, and so the same is true of each sequence $Z_t^j = V^t[\widetilde{P}_j]$.

The representation (7.6), with $V^t$ acting on the terms $\widetilde{P}_j e^{i2\pi jt/T}$, provides additional meaning to the term spectral redundancy that is used in the description of cyclostationary processes (see Gardner [70]). It shows that the correlation or redundancy that appears in the spectrum of $X_t$ is caused by the action of the one unitary operator on possibly $T$ different vectors. Proposition 1.1 made the simple observation that every PC-T sequence $X_t$ may be viewed as a $T$-variate stationary sequence $\mathbf{X}_n = [X_n^j]$, where $X_n^j = X_{j+nT}$, $n \in \mathbb{Z}$, for $j = 0, 1, \dots, T-1$. Proposition 7.3, also by Gladyshev, has just shown the important fact that every PC-T sequence may be represented, via (7.6), by a $T$-variate stationary sequence $Z_t$ whose components are of bandwidth $2\pi/T$. The next result is essentially the same as Proposition 7.3 but relies on a different proof and gives a minimal representation for $X_t$.
7.2.2 Another Representation of Gladyshev Type
Proposition 7.4 A second order sequence $X_t$ is PC-T if and only if there exist a $q$-variate stationary sequence $Z_t = [Z_t^j]$ and $q$ scalar periodic sequences $\{f_t^j = f_{t+T}^j,\ j = 1, 2, \dots, q\}$ with $q \le T$, such that
$$X_t = \sum_{j=1}^{q} f_t^j\, Z_t^j. \qquad (7.10)$$
Proof. If $X_t$ has the stated representation, it is clearly PC with period $T$. Conversely, if $X_t$ is PC with period $T$, then we first note that $\mathcal{L}_P = \operatorname{sp}\{P_t : t = 0, 1, \dots, T-1\}$ has dimension $q = \dim \mathcal{L}_P \le T$. Let $\{\phi_j : j = 1, 2, \dots, q\}$ be an orthonormal basis for $\mathcal{L}_P$, and so
$$P_t = \sum_{j=1}^{q} (P_t, \phi_j)\, \phi_j. \qquad (7.11)$$
Now the claim (7.10) follows from Proposition 7.2 and then making the identifications
$$Z_t^j = V^t[\phi_j] \qquad (7.12)$$
and
$$f_t^j = (P_t, \phi_j). \qquad (7.13)$$

Finally, we turn to spectral integral representations for PC sequences. The first is a direct application of the spectral representation (7.1) for unitary operators and the representation (7.11) for $P_t$.
7.2.3
Time-Dependent Spectral Representation
Proposition 7.5 A second order sequence $X_t$ is PC-T if and only if there exists a time dependent random spectral measure $\xi(\cdot, t) = \xi(\cdot, t+T)$ on the Borel subsets of $[0, 2\pi)$ that is orthogonally scattered (in the sense that $(\xi(A, s), \xi(B, t)) = 0$ for every $s, t \in \mathbb{Z}$ whenever $A \cap B = \emptyset$) and such that
$$X_t = \int_0^{2\pi} e^{i\lambda t}\, \xi(d\lambda, t). \qquad (7.14)$$

Remark. This is in contrast to the harmonizable representation, in which the measure is independent of $t$ but is not orthogonally scattered.

Proof. If $X_t$ has the form (7.14) with $\xi(\cdot, t) = \xi(\cdot, t+T)$ and $(\xi(A, s), \xi(B, t)) = 0$ whenever $A \cap B = \emptyset$, then define
$$F_{s,t}(A) = (\xi(A, s),\, \xi(A, t)), \qquad (7.15)$$
for which the Schwarz inequality gives
$$|F_{s,t}(A)|^2 \le F_{s,s}(A)\, F_{t,t}(A).$$
Furthermore, $F_{t,t}(\cdot) \ge 0$ and $F_{s,t}(\cdot)$ inherits countable additivity from $\xi(\cdot, t)$. Thus $F_{s,t}(\cdot)$ is a (finite) complex measure and
$$R(s,t) = \int_0^{2\pi} e^{i\lambda(s-t)}\, F_{s,t}(d\lambda). \qquad (7.16)$$
Note that a representation (7.16) for the correlation of a PC-T sequence is obtained as a by-product. Conversely, if $X_t$ is PC-T, we use the spectral representation (7.5) for the unitary operator $V$ to obtain
$$X_t = V^t P_t = \int_0^{2\pi} e^{i\lambda t}\, \xi(d\lambda, t), \qquad (7.17)$$
where the time dependent spectral measure $\xi(\cdot, t)$ is defined through the application of the spectral measure appearing in (7.1) to the vector $P_t$. To be precise, for any Borel $A$, set
$$\xi(A, t) = E(A)[P_t],$$
from which it follows directly that $\xi(A, t) = \xi(A, t+T)$. It is also easy to see that $\xi(\cdot, t)$ and $\xi(\cdot, s)$ are mutually orthogonally scattered in the sense
$$(\xi(A, t),\, \xi(B, s)) = 0, \quad \text{whenever } A \cap B = \emptyset. \qquad \blacksquare$$
Oscillatory PC Sequences. A random sequence $X_t$ is called oscillatory (see Priestley [185]) if there is an orthogonally scattered random spectral measure $\xi(\cdot)$ on the Borel sets of $[0, 2\pi)$ and a function $a(t, \lambda)$ for which
$$X_t = \int_0^{2\pi} e^{i\lambda t}\, a(t, \lambda)\, \xi(d\lambda), \qquad (7.18)$$
where $a(t, \cdot) \in L^2(F)$ for all $t \in \mathbb{Z}$ and $F$ is the spectral measure associated with $\xi(\cdot)$. Oscillatory processes are generally nonstationary but clearly include the stationary processes by taking $a(t, \lambda) \equiv 1$. It follows that the correlation of an oscillatory sequence has the representation
$$R(s,t) = \int_0^{2\pi} e^{i\lambda(s-t)}\, a(s, \lambda)\, \overline{a(t, \lambda)}\, F(d\lambda). \qquad (7.19)$$
If $a(t, \lambda) = a(t+T, \lambda)$ for all $t \in \mathbb{Z}$ and $\lambda \in [0, 2\pi)$, then it is clear that $R(s,t) = R(s+T, t+T)$, so $X_t$ is PC-T. In this case the spectral measure $F_{s,t}(\cdot)$ defined in (7.15) is given here by
$$F_{s,t}(A) = \int_A a(s, \lambda)\, \overline{a(t, \lambda)}\, F(d\lambda).$$
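A quick numerical illustration that a $T$-periodic modulating function $a(t, \lambda)$ yields a PC-T covariance through (7.19); the discrete spectral measure and the form of $a$ below are hypothetical.

```python
import numpy as np

T = 3
lams = np.array([0.3, 1.1, 2.5, 4.0])     # support of the discrete measure F
sig2 = np.array([1.0, 0.5, 2.0, 0.25])    # weights F({lam_j})
def a(t, lam):                            # T-periodic in t: a(t + T, .) = a(t, .)
    return 1.0 + 0.4 * np.cos(2 * np.pi * t / T) * np.sin(lam)

def Rcov(s, t):                           # discrete version of (7.19)
    return np.sum(np.exp(1j * lams * (s - t)) * a(s, lams)
                  * np.conj(a(t, lams)) * sig2)
```

The covariance satisfies $R(s,t) = R(s+T, t+T)$ exactly, yet $R(t,t)$ is not constant, so the sequence is properly PC.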
The characterization of oscillatory PC sequences is an open problem.

7.2.4
Harmonizability Again
Although Proposition 6.7 shows (in the same manner as Gladyshev) that all PC sequences are harmonizable, we will now make the argument again using the unitary operator. We will do this by explicitly giving the random measure $\xi$ and then arguing that its spectral correlation measure $F$ is of bounded variation.
Proposition 7.6 Every PC-T sequence $X_t$ is strongly harmonizable.

Proof. In order to express $\xi$ explicitly, we use the spectral representations (7.5) and (7.8) in relation (7.6) to obtain
$$X_t = \sum_{p=0}^{T-1} e^{i2\pi pt/T} \int_0^{2\pi} e^{i\lambda t/T}\, E(d\lambda)[\widetilde{P}_p]. \qquad (7.20)$$
Now for any interval $A \subset [0, 2\pi)$, if we define
$$\xi(A) = \sum_{p=0}^{T-1} E(TA - 2p\pi)[\widetilde{P}_p], \qquad (7.21)$$
then (7.20) can be read as (1.14). Here it is important to remember that the projection valued measure $E(\cdot)$ is defined on the Borel sets of $[0, 2\pi)$. The measure $\xi$ may be viewed in the following manner. First note we can write
$$A = \bigcup_{p=0}^{T-1} A \cap A_p, \quad \text{where } A_p = [2p\pi/T,\ 2(p+1)\pi/T). \qquad (7.22)$$
Then $E(TA - 2p\pi)[\widetilde{P}_p]$ is the action of $E(T(A \cap A_p))$ on the vector $\widetilde{P}_p$, where $T(A \cap A_p)$ is taken modulo $2\pi$. Thus the correlation (or redundancy [70]) in the spectral measure arises from the repeated use of the same projection measure on different vectors $\widetilde{P}_p$. The spectral (correlation) measure is determined by first defining it on intervals $A$ and $B$ in $[0, 2\pi)$ by
$$F(A, B) = \sum_{p=0}^{T-1} \sum_{p'=0}^{T-1} F_{pp'}(A, B), \qquad (7.23)$$
where
$$F_{pp'}(A, B) = \bigl( E(TA - 2\pi p)[\widetilde{P}_p],\; E(TB - 2\pi p')[\widetilde{P}_{p'}] \bigr). \qquad (7.24)$$
The countable additivity and boundedness is inherited from the projection valued measure $E$; the measure defined on intervals may then be extended in the usual way to the Borel sets of $[0, 2\pi) \times [0, 2\pi)$. But (7.24) for $A$, $B$ intervals gives $F_{pp'}(A, B) = 0$ unless
$$(TA - 2p\pi) \cap (TB - 2p'\pi) \neq \emptyset. \qquad (7.25)$$
This leads to the conclusion that the support set of $F$ must be contained in $S_T = \{(\lambda_1, \lambda_2) : \lambda_2 = \lambda_1 - 2\pi k/T,\ k \in [-(T-1), T-1]\}$ intersected with the square $[0, 2\pi) \times [0, 2\pi)$. The finiteness of the spectral correlation measure can be established from the finiteness of each of the measures $F_{pp'}(A, B)$. Indeed, from the Schwarz inequality,
$$|F_{pp'}(A, B)|^2 \le F_{pp}(A, A)\, F_{p'p'}(B, B), \qquad (7.26)$$
and from (7.23) it can be seen that $F_{pp}(A, A)$ is a nonnegative spectral measure defined on the Borel sets of the rectangles $A_p \times A_p$ defined in (7.22) with $F_{pp}(A_p, A_p) \le \|\widetilde{P}_p\|^2_{\mathcal{H}_X}$, and similarly for $F_{p'p'}(B, B)$. Hence each $F_{pp'}(A, B)$ has finite variation and since $F(A, B)$ is just a finite sum of these parts, it too
has finite variation. ∎
7.2.5
Representation Based on Principal Components
We now show that a representation of the type given in Proposition 7.4 may be obtained from the principal component analysis of the covariance $R(s,t)$. This idea appeared in a report by K. L. Jordan [123] and was given a thorough treatment for continuous time by Gardner and Franks [64]. More recently, some finer points, also for continuous time, have been given by Hurd and Kallianpur [111] and Makagon [145]. For continuous time, the representations are based essentially on Mercer's representation of NND compact operators on $L^2[0,1]$ (see [192]); this leads to the well known Karhunen-Loève expansion. It has appeared again recently in the context of EOFs (empirical orthogonal functions) for cyclostationary processes (see papers by Kim et al. [130-132]).

If $R(s,t)$ is the correlation of a PC-T sequence, the restriction of $R(s,t)$ to the principal square
$$R_0 = [R(s,t) : (s,t) \in \{0, 1, \dots, T-1\}^2] \qquad (7.27)$$
is a Hermitian NND matrix (which we assume to be of full rank) and so there exists a matrix $\Phi$ whose $j$th column is $(\phi_j(t) : t = 0, 1, \dots, T-1)$ and for which
$$R_0 = \Phi \Sigma \Phi^*, \qquad (7.28)$$
where
$$\Sigma = \begin{bmatrix} \sigma_0^2 & 0 & \cdots & 0 \\ 0 & \sigma_1^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{T-1}^2 \end{bmatrix}. \qquad (7.29)$$
That is, the columns of $\Phi$ are right eigenvectors of $R_0$ with eigenvalues $\sigma_j^2$, so
$$R_0 \phi_j = \sigma_j^2 \phi_j, \quad j = 0, 1, \dots, T-1,$$
or more explicitly,
$$\sigma_j^2\, \phi_j(s) = \sum_{t=0}^{T-1} [R_0]_{s,t}\, \phi_j(t), \quad s, j = 0, 1, \dots, T-1. \qquad (7.30)$$
Furthermore, the eigenvectors $\{\phi_j : j = 0, 1, \dots, T-1\}$ are orthonormal,
$$\sum_{t=0}^{T-1} \phi_j(t)\, \overline{\phi_{j'}(t)} = \delta_{j,j'}, \qquad (7.31)$$
which may be written $\Phi^* \Phi = I$. These facts mean that $R(s,t)$ may be represented in the principal square by
$$R(s,t) = \sum_{j=0}^{T-1} \sigma_j^2\, \phi_j(s)\, \overline{\phi_j(t)}. \qquad (7.32)$$
Using the block notation (see (1.10)), the $T$-block $\mathbf{X}_0 = [X_0, X_1, \dots, X_{T-1}]$ (treated as a column vector) is transformed by the linear transformation
$$\eta_0 = \Phi^* \mathbf{X}_0 \qquad (7.33)$$
into a vector $\eta_0$ of uncorrelated random variables,
$$E\{\eta_0 \eta_0^*\} = E\{\Phi^* \mathbf{X}_0 (\Phi^* \mathbf{X}_0)^*\} = \Phi^* R_0 \Phi = \Sigma.$$
We shall denote $\eta_0$ as the principal component vector of block number 0. Multiplying both sides of (7.33) by $\Phi$ and using $\Phi\Phi^* = \Phi^*\Phi = I$ yields $\mathbf{X}_0 = \Phi \eta_0$, or in detail,
$$X_t = \sum_{j=0}^{T-1} \phi_j(t)\, [\eta_0]_j, \quad t = 0, 1, \dots, T-1.$$
The PC-T property $R(s,t) = R(s+T, t+T)$ implies that the very same $\Phi$ will diagonalize
$$R_n = [R(s,t) : (s,t) \in \{nT, nT+1, \dots, nT+T-1\}^2] \qquad (7.35)$$
for every $n \in \mathbb{Z}$. That is,
$$R_n = \Phi \Sigma \Phi^*, \qquad (7.36)$$
or in other words, the eigenvectors $\phi_j(t)$ solve
$$\sigma_j^2\, \phi_j(s) = \sum_{t=0}^{T-1} [R_n]_{s,t}\, \phi_j(t) \qquad (7.37)$$
for $s, j = 0, 1, \dots, T-1$, and hence
$$R(s,t) = \sum_{j=0}^{T-1} \sigma_j^2\, \phi_j(s - nT)\, \overline{\phi_j(t - nT)} \qquad (7.38)$$
for $(s,t) \in \{nT, nT+1, \dots, (n+1)T-1\} \times \{nT, nT+1, \dots, (n+1)T-1\}$ (i.e., in the $n$th square) and for every $n \in \mathbb{Z}$. From these facts we conclude that for each fixed $n$, the random vector of principal components
$$\eta_n = \Phi^* \mathbf{X}_n, \qquad (7.39)$$
given in detail by
$$[\eta_n]_j = \sum_{t=0}^{T-1} \overline{\phi_j(t)}\, X_{t+nT}, \qquad (7.40)$$
has orthogonal components,
$$([\eta_n]_j,\, [\eta_n]_{j'}) = \sigma_j^2\, \delta_{j,j'}, \qquad (7.41)$$
and $\operatorname{Var}([\eta_n]_j) = \sigma_j^2$. Furthermore, the principal component sequences $[\eta_n]_j$ (indexed on $n$) are jointly stationary because they are all propagated by the same unitary operator $U$,
$$[\eta_n]_j = \sum_{t=0}^{T-1} \overline{\phi_j(t)}\, U^n[X_t] = U^n[\eta_0]_j. \qquad (7.42)$$
Stationarity of the vector sequence $\eta_n$ could also have been concluded by writing out the expression for $([\eta_n]_j, [\eta_m]_k)$ and noting that the PC-T property (1.7) implies that the inner product depends only on $n - m$. And even though the principal component vectors $\eta_n$ have orthogonal components for fixed $n$, it is not necessarily true that $([\eta_n]_j, [\eta_m]_k) = 0$ for $m \neq n$. Finally, using $\Phi^*\Phi = \Phi\Phi^* = I$ yields as before
$$\mathbf{X}_n = \Phi\, \eta_n,$$
or in detail (using $t = k + nT$ with $0 \le k \le T-1$),
$$X_t = [\mathbf{X}_n]_k = \sum_{j=0}^{T-1} \phi_j(k)\, [\eta_n]_j \qquad (7.43)$$
for $t = nT, nT+1, \dots, nT+T-1$. This representation has wide use and interpretation in communications engineering [64,73], where it is called the translation series representation (TSR). By extending $\phi_j(t)$ periodically, $\phi_j(t) = \phi_j(t+T)$, (7.43) is sometimes written
$$X_t = \sum_{j=0}^{T-1} \phi_j(t)\, [\eta_n]_j, \quad n = \lfloor t/T \rfloor,$$
making it clear that the "pulses" $\phi_j(t)$ are the same in each period with the randomness (the message) entering via the amplitude vector $\eta_n$.

Here we ask how the preceding remarks change when $R_0$ is rank deficient, let us say $\operatorname{rank} R_0 = r < T$. Then $T - r$ eigenvalues $\sigma_j^2 = \operatorname{Var}([\eta_n]_j)$ in (7.29) will be null. In this case $\Phi$ and $\Sigma$ in representations (7.28) and (7.36) can be taken to be of dimension $T \times r$ and $r \times r$. Alternatively, the dimensions can remain unchanged because whatever vectors $\phi_j$ are used in the columns corresponding to $\sigma_j = 0$ will be ignored. As for uniqueness, all vectors corresponding to distinct eigenvalues will be unique (recall we always assume $\sum_{t=0}^{T-1} |\phi_j(t)|^2 = 1$). But there will be ambiguity whenever there are eigenvalues $\sigma_j^2$ with multiplicity greater than one. Let us say that $\sigma_{j_1}^2 = \sigma_{j_2}^2 = \dots = \sigma_{j_m}^2 = \sigma^2$. Then any orthonormal set of vectors that span the eigenspace $\mathcal{P}_{\sigma^2} = \{x \in \mathbb{C}^T : R_0 x = \sigma^2 x\}$ will serve as normalized eigenvectors. See the supplements for discussion of the case when $X_t$ is stationary.
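The factorization $R_0 = \Phi \Sigma \Phi^*$ of (7.28) is exactly a Hermitian eigendecomposition and can be sketched numerically; the covariance below is a hypothetical PC-T example of amplitude-modulated AR(1) type, not one from the text.

```python
import numpy as np

T = 4
f = 1.0 + 0.5 * np.cos(2 * np.pi * np.arange(T) / T)   # periodic amplitude
R0 = np.array([[f[s] * f[t] * 0.6 ** abs(s - t) for t in range(T)]
               for s in range(T)])                      # principal square (7.27)

sig2, Phi = np.linalg.eigh(R0)    # eigenvalues sigma_j^2, orthonormal columns
```

`eigh` returns the eigenvalues in ascending order; any ordering of the columns of $\Phi$ serves, since (7.28) holds either way.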
7.3
M E A N ERGODIC T H E O R E M
Generally, ergodic theorems address convergence of the sum
$$S_N = \frac{1}{2N+1} \sum_{t=-N}^{N} X_t, \qquad (7.44)$$
where $X_t$ is the orbit of a dynamical system or, in our case, a random sequence. When $X_t$ is stationary we expect from
$$E\{S_N\} = \frac{1}{2N+1} \sum_{t=-N}^{N} E\{X_t\} = \mu$$
that $S_N \to \mu$, and when this is true, the sequence is called ergodic. The mean ergodic theorem addresses the mean square convergence of $S_N$ to $\mu$ and pointwise ergodic theorems address the convergence for each $\omega \in \Omega$. A more complete treatment of ergodicity for cyclostationary sequences, including both
pointwise and mean-square theory, is given by Boyles and Gardner [23], who also give a good set of references to related work. Also see Honda [98]. Here we will give only a few results for PC sequences that are connected to the mean ergodic theorem. We will use the spectral theory for harmonizable sequences from Section 5.3, for which we can express the limit of a more general form of (7.44), namely,
$$S_N(\lambda) = \frac{1}{2N+1} \sum_{t=-N}^{N} X_t\, e^{-i\lambda t}, \qquad (7.45)$$
which was also given in (5.8). Note that if $X_t$ is PC-T then $F(\{(\lambda, \lambda)\}) = F_0(\{\lambda\})$, the $F_0$ measure of the point $\lambda$, where $F_0$ is defined in (6.26). Similarly, the limit of $S_N(\lambda)$ is $\xi(\{\lambda\})$, the random measure of the point $\lambda$. See Loève [139, page 474] for similar results when $\xi(\lambda)$ is interpreted as the random spectral process and $F(\lambda_1, \lambda_2)$ as the two dimensional spectral distribution function. For a PC sequence with possibly nonzero mean $m_t$, we defined (in Chapter 6) $\widetilde{m}_k$ as the $k$th Fourier coefficient of $m_t$,
$$\widetilde{m}_k = \frac{1}{T} \sum_{t=0}^{T-1} m_t\, e^{-i2\pi kt/T}. \qquad (7.46)$$
Now we can observe that the quantity
$$E\{S_N(\lambda)\} = \frac{1}{2N+1} \sum_{t=-N}^{N} m_t\, e^{-i\lambda t}$$
converges to $\widetilde{m}_k$ for $\lambda = 2\pi k/T$, $k = 0, 1, \dots, T-1$, and otherwise converges to 0. So it is natural to say that a PC-T sequence is mean ergodic whenever $S_N(2\pi k/T) \to \widetilde{m}_k$ in mean-square sense for all $k = 0, 1, \dots, T-1$.

Proposition 7.7 A PC-T sequence is mean ergodic if and only if
$$F_0(\{2\pi k/T\}) = F(\{2\pi k/T\}, \{2\pi k/T\}) = |\widetilde{m}_k|^2 \qquad (7.47)$$
for all $k = 0, 1, \dots, T-1$.
Proof. Let $m_t = m_{t+T}$ be the periodic mean of the PC-T sequence $X_t$ and write $X_t = X_t' + m_t$, where $X_t'$ has zero mean. Denoting $S_N'(\lambda)$ as (5.8) evaluated for $X_t'$, we see that
$$S_N(2\pi k/T) = S_N'(2\pi k/T) + \frac{1}{2N+1} \sum_{t=-N}^{N} m_t\, e^{-i2\pi kt/T}$$
converges to $\xi(\{2\pi k/T\}) = \xi'(\{2\pi k/T\}) + \widetilde{m}_k$. Continuing to follow the stationary case, since
$$F_0(\{2\pi k/T\}) = E\{|\xi(\{2\pi k/T\})|^2\} = E\{|\xi'(\{2\pi k/T\})|^2\} + |\widetilde{m}_k|^2,$$
we obtain a mean ergodic result $S_N(2\pi k/T) \to \widetilde{m}_k$ for every $k$, in mean-square sense, if and only if
$$F_0(\{2\pi k/T\}) - |\widetilde{m}_k|^2 = E\{|\xi'(\{2\pi k/T\})|^2\} = 0 \quad \text{for every } k. \qquad \blacksquare$$
Thus we get mean ergodicity if and only if the diagonal measure $F_0(\{2\pi k/T\})$ is exactly large enough to account for $\widetilde{m}_k$, meaning that there is no random component contributing positive variance to $F_0(\{2\pi k/T\})$. This would be expected from the multivariate stationary case, but we make it precise in the next paragraph.

The mean function $m_t$ can be viewed as the mean of the $t$th component of the blocked vector sequence $\mathbf{X}_n$ defined in (1.10). That is, $\mathbf{m} = E\{\mathbf{X}_n\} = (m_0, m_1, \dots, m_{T-1})'$. Thus we can examine the weak law for $\mathbf{X}_n$ and relate it to the preceding paragraphs. Denoting $\mathbf{F}$ as the matrix spectral measure of $\mathbf{X}_n$ and using
$$\mathbf{S}_N(\lambda) = \frac{1}{2N+1} \sum_{n=-N}^{N} \mathbf{X}_n\, e^{-i\lambda n}, \qquad (7.48)$$
we found in Proposition 4.9 that $\mathbf{S}_N(0) \to \mathbf{m}$ if and only if the atom of the random spectral measure of $\mathbf{X}_n$ at $\{0\}$ is the vector of constants $\mathbf{m}$, or in terms of $\mathbf{F}$, $\mathbf{F}_{jj}(\{0\}) = |m_j|^2$, $j = 0, 1, \dots, T-1$. Relating the matrix spectral measures $\mathbf{F}$ and $F$ via (6.41), along with $\mathbf{F}_{kk}(\{0\}) = F(\{2\pi k/T\}, \{2\pi k/T\}) = F_0(\{2\pi k/T\})$, we find that the preceding conditions on the $\mathbf{F}_{jj}(\{0\})$ are equivalent to (7.47).
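A Monte Carlo sketch of mean ergodicity for a sequence that is white noise plus a periodic mean (all values hypothetical): the average $S_N(2\pi k/T)$ should approach $\widetilde{m}_k$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 4, 20000
m_period = np.array([2.0, 0.0, -1.0, 1.0])   # hypothetical periodic mean m_t
t = np.arange(-N, N + 1)
x = m_period[np.mod(t, T)] + rng.standard_normal(t.size)

k = 1
S_N = np.mean(x * np.exp(-2j * np.pi * k * t / T))          # (7.45) at 2*pi*k/T
m_k = np.mean(m_period * np.exp(-2j * np.pi * k * np.arange(T) / T))
```

White noise satisfies (7.47), so the estimate converges; a term like $e^{i2\pi t/T}Z$ with a random amplitude $Z$ would add variance to the atom at $2\pi/T$ and break mean ergodicity.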
7.4 PC SEQUENCES AS PROJECTIONS OF STATIONARY SEQUENCES
Since by Proposition 6.7 PC sequences are strongly harmonizable, they have a stationary dilation, meaning they can be expressed (Proposition 5.9) as the orthogonal projection
$$X_t = P_{\mathcal{H}_X} Y_t = (Y_t \mid \mathcal{H}_X) \qquad (7.49)$$
of a stationary sequence $Y_t$ in some larger space $\mathcal{H}_Y \supseteq \mathcal{H}_X$. See the problems for a very simple example of a PC sequence obtained from a projection. More importantly, in the case of PC sequences we can always construct from $X_t$ a stationary dilation $Y_t$ solving (7.49) (see Miamee [160]) as follows.
Given a PC-T sequence $X_t$, let $\mathcal{H}$ denote the direct sum of $T$ copies of $\mathcal{H}_X$, namely, $\mathcal{H} = \mathcal{H}_X^T$, on which the inner product between two vectors $\mathbf{x} = (x_1, x_2, \dots, x_T)$ and $\mathbf{y} = (y_1, y_2, \dots, y_T)$ is defined to be
$$\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{k=1}^{T} (x_k, y_k),$$
where $(\cdot, \cdot)$ is the inner product on each copy of $\mathcal{H}_X$. The sequence $\mathbf{Y}_t = [X_t, X_{t+1}, \dots, X_{t+T-1}]$ is stationary in $\mathcal{H}^T$ because
$$\langle \mathbf{Y}_{s+1}, \mathbf{Y}_{t+1} \rangle = \sum_{k=1}^{T} (X_{s+k}, X_{t+k}) = \sum_{k=0}^{T-1} (X_{s+k}, X_{t+k}) = \langle \mathbf{Y}_s, \mathbf{Y}_t \rangle,$$
the middle equality holding because $(X_{s+T}, X_{t+T}) = (X_s, X_t)$. Furthermore, it is clear that $X_t$ is the projection of $\mathbf{Y}_t$ onto the space $\mathcal{H}_X \times 0 \times 0 \times \dots \times 0$, which is essentially $\mathcal{H}_X$.

PROBLEMS AND SUPPLEMENTS
7.1 The representations following from (7.2) can also be obtained for PC processes defined on other index sets. For example, every PC field indexed on $\mathbb{Z}^2$ can be represented in similar forms, also with a finite sum (see Hurd et al. [118]). A similar representation is possible for continuous time but with possibly a countable sum (see Hurd and Kallianpur [111]). Groups of unitary operators may be used to form the almost periodically unitary processes, a class of almost PC processes [112].

7.2 For the case $T = 2$ show that if $f(\lambda) = e^{i\lambda/2}$ for $0 \le \lambda < \pi$ and $f(\lambda) = e^{i\lambda/2} e^{i\pi}$ for $\pi \le \lambda < 2\pi$, then $V = \int_0^{2\pi} f(\lambda)\, dE(\lambda)$ solves $V^2 = U = \int_0^{2\pi} e^{i\lambda}\, dE(\lambda)$.
7.3 Put the PC sequences given by the following examples into the form $X_t = V^t[P(t)]$, for $V$ unitary:
(a) amplitude scale modulated stationary sequences (Section 2.1.3);
(b) time scale modulated stationary sequences (Section 2.1.4).
7.4 The TSR when $X_t$ is stationary. If $X_t$ is stationary, the representation (7.43) holds for every $T \ge 1$. This motivates the question: Is there anything different in the proper PC case (meaning the sequence is PC but not stationary) relative to the stationary case? Here is one difference. Note that if $X_t$ is PC-T, then for every $N \ge 1$ the restricted correlations
$$R_\tau^N = [R(s,t) : (s,t) \in \{\tau, \tau+1, \dots, \tau+N-1\}^2] \qquad (7.51)$$
are periodic in $\tau$, $R_\tau^N = R_{\tau+T}^N$. For some $N$ this periodicity need not be proper (meaning all the elements of $R_\tau^N$ are constant with respect to $\tau$), but in order for $X_t$ to be PC-T there must be some finite $N$ for which the periodicity is proper. But if $X_t$ is stationary, $R_\tau^N = R_{\tau+1}^N$ for every $N$. So by taking $N = kT$, a representation of the form (7.43) can be obtained for any PC-T sequence (including stationary sequences). But for sufficiently large $N$, the matrices $\Phi$ and $\Sigma$ will depend on $\tau$ for a proper PC-T sequence, while they will never depend on $\tau$ for a stationary sequence. To construct a proper PC-T sequence for which $R_\tau^N$ is constant with respect to $\tau$ for small $N$, consider the PMA model $X_t = \theta_0(t)\xi_t + \beta_k(t)\xi_{t-k}$ for $k > T$ and choose $\theta_0(t)$, $\beta_k(t)$ to be properly periodic but with $\theta_0^2(t) + \beta_k^2(t) = \sigma^2$. Then $R_\tau^N = R_{\tau+1}^N$ for $1 \le N < k$.
7.5 Show the operator $U$ defined on $\mathcal{L}_X = \operatorname{sp}\{X_t, t \in \mathbb{Z}\}$ by $U(\sum_{j=1}^{n} a_j X_{t_j}) = \sum_{j=1}^{n} a_j X_{t_j+T}$ is well defined, linear, and inner product preserving. Hint: Denote $z_1 = \sum_{j=1}^{n} a_j X_{t_j}$ and $z_2 = \sum_{k=1}^{m} b_k X_{t_k}$. To show $U$ is well defined, assume $z_1 = z_2$ and show $Uz_1 = Uz_2$. For linearity, show $U(\alpha z_1 + z_2) = \alpha Uz_1 + Uz_2$. To show that inner products are preserved, note that for any two vectors $z_1, z_2 \in \mathcal{L}_X$, (1.7) implies $(z_1, z_2) = (Uz_1, Uz_2)$.

7.6 A simple example of a PC sequence obtained as a projection. Suppose the vectors $Y_t$ are orthonormal, $(Y_s, Y_t) = \delta_{s,t}$, so $Y_t$ is stationary white noise. Let $\mathcal{H}_X = \overline{\operatorname{sp}}\{Y_{2t} : t \in \mathbb{Z}\}$. Then
$$X_t = P_{\mathcal{H}_X} Y_t = \begin{cases} Y_t, & t \text{ even} \\ 0, & t \text{ odd.} \end{cases}$$
So Xt is clearly properly P C with period 2 7.7 Makagon and Salehi [142]show that every periodically stationary Xt (in the strict sense of Definition 1.2) can be represented in law by a time varying linear combination of strictly stationary sequences. This may be viewed as an P). extension of Gladyshev’s theorem, which is a representation in L2(R,F,
CHAPTER 8
PREDICTION OF PC SEQUENCES
The primary objective of this chapter is to present, for PC sequences, some of the basic ideas that are fundamental to linear prediction. We first give some notation and then proceed to the Wold decomposition and linear prediction before presenting the solution for sequences that are periodic autoregressions of order one (PAR(1)). Although many aspects of the prediction problem can be solved via the connection to multivariate sequences, we believe that viewing PC sequences as nonstationary can facilitate our general understanding of nonstationary processes, and so we take this tack. Some of the following definitions, already given in Chapter 4, are repeated here due to the new context. We begin with the subspaces that are important for discussing the prediction problem. In the following discussion, the closure is with respect to L²(Ω, F, P).
Definition 8.1 The time domain of a second order sequence X_t is the subspace generated by all the values of X_t, t ∈ Z:

ℋ = ⋁{X_t : t ∈ Z}.  (8.1)

Periodically Correlated Random Sequences: Spectral Theory and Practice. By H.L. Hurd and A.G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
The Hilbert subspace generated by all the values of X_s for s ≤ t is denoted by

ℋ(t) = ⋁{X_s : s ≤ t}  (8.2)

and is called the past of the process up to and including time t. The remote past is defined by

ℋ(−∞) = ∩_{t∈Z} ℋ(t).  (8.3)

We note that ℋ(t) is nondecreasing with respect to t, that is, ℋ(t) ⊇ ℋ(s) for t ≥ s, and for this reason we can also express

ℋ(−∞) = ∩_k ℋ(t_k),

where {t_k} is any sequence of integers that converges to −∞.
Definition 8.2 A second order random sequence is called purely nondeterministic or regular if

ℋ(−∞) = {0}

and is called (globally) deterministic or singular if

ℋ(−∞) = ℋ, or equivalently, if

ℋ(s) = ℋ(t) for all s, t in Z.

A sequence is called nondeterministic at time t if X_t ∉ ℋ(t − 1) and deterministic at time t if X_t ∈ ℋ(t − 1). It is called nondeterministic if it is nondeterministic for some t ∈ Z. Note the negation of nondeterminism, X_t ∈ ℋ(t − 1) for all t, is equivalent to the condition ℋ(−∞) = ℋ given above as the definition of determinism.

Also note in the case of stationary sequences, if X_t ∉ ℋ(t − 1) for some t then X_t ∉ ℋ(t − 1) for all t, and so in that case it is convenient to state the condition for nondeterminism as X_0 ∉ ℋ(−1), as we did in Section 4.2. However, the following PC-2 sequence illustrates that for nonstationary sequences we need to permit the issue of determinism to be time dependent. Suppose {Y_t} is a stationary purely nondeterministic sequence and a sequence X_t is constructed by "doubling" Y_t; that is, X_t is the sequence
X_t : …, Y_{−1}, Y_{−1}, Y_0, Y_0, Y_1, Y_1, …

or in symbols

X_{2t} = X_{2t+1} = Y_t, t ∈ Z.  (8.5)
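A quick simulation confirms that the doubled sequence is properly PC with period 2; this is a sketch of ours (the variable names and the white-noise choice for Y_t are illustrative, not from the text).

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of R(s, t) = E[X_s X_t] for the "doubled" sequence
# X_{2t} = X_{2t+1} = Y_t of (8.5), taking Y_t to be stationary white noise.
n_paths, n_time = 100000, 8

Y = rng.standard_normal((n_paths, n_time))
X = np.repeat(Y, 2, axis=1)          # doubling: ..., Y_0, Y_0, Y_1, Y_1, ...

R = X.T @ X / n_paths                # R[s, t] estimates E[X_s X_t]

# Periodic correlation with period T = 2: R(s, t) = R(s + 2, t + 2)
print(np.allclose(R[:-2, :-2], R[2:, 2:], atol=0.05))

# Properly PC (not stationary): R(0, 1) is near 1 while R(1, 2) is near 0,
# so R(s, t) is not a function of s - t alone.
print(abs(R[0, 1] - 1) < 0.05, abs(R[1, 2]) < 0.05)
```

The failure of R(s, t) to depend only on s − t is exactly what distinguishes this proper PC-2 sequence from a stationary one.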
It is easily seen that X_t is purely nondeterministic because ℋ_X(2t) = ℋ_Y(t) implies

∩_{t∈Z} ℋ_X(t) = ∩_{t∈Z} ℋ_Y(t) = {0}.

But we find that X_t ∈ ℋ_X(t − 1) for t odd and X_t ∉ ℋ_X(t − 1) for t even. So X_t is purely nondeterministic and hence nondeterministic, although it is deterministic (locally) for t odd. This is not a contradiction because pure nondeterminism is a global quality whereas determinism for specific times is local. Sometimes a sequence X_t can be expressed in terms of another (not necessarily unique) sequence ξ_t, for example, when {ξ_t} is a basis (in some sense) for X_t. This motivates the concepts of causality, autonomy, and stability.

Causality, Autonomy, and Stability.
Definition 8.3 (Causal) A sequence X_t is called causal with respect to the sequence ξ_t if ℋ_X(t) ⊆ ℋ_ξ(t) for all t ∈ Z.

In summary, causality addresses the issue of time order. If X_t is causal with respect to ξ_t, then X_t cannot depend on future values of ξ_t, that is, on ξ_s for s > t. If X_t is causal with respect to an orthogonal sequence ξ_t, then the orthogonality of ξ_t implies

ℋ_X(−∞) ⊆ ∩_{t∈Z} ℋ_ξ(t) = {0},

showing that X_t is regular or purely nondeterministic. If X_t is causal with respect to the orthonormal sequence ξ_t, then for all t ∈ Z,

X_t = Σ_{k=−∞}^{t} c_k(t) ξ_k

and

Σ_{k=−∞}^{t} |c_k(t)|² < ∞.

Setting j = t − k gives

X_t = Σ_{j=0}^{∞} ψ_j(t) ξ_{t−j},

where ψ_j(t) = c_{t−j}(t).

Remark. See Lütkepohl [141] for some other notions of causality.
Definition 8.4 (Autonomous) The sequence X_t has an autonomous¹ representation in terms of ξ_t if for all t ∈ Z

X_t = Σ_{j=−∞}^{∞} ψ_j ξ_{t−j},

where the sum is in the mean-square sense. The condition Σ_j |ψ_j|² < ∞, along with the orthogonality of ξ_t and boundedness of ‖ξ_t‖, are sufficient for the existence of X_t for each fixed t. In summary, X_t is autonomous with respect to ξ_t if the coefficients of the representation do not depend on t.

Definition 8.5 (Stable) The sequence X_t has a stable representation in terms of the sequence ξ_t if for all t ∈ Z

Σ_{j=−∞}^{∞} |ψ_j(t)| < ∞.

In other words, X_t is stable with respect to ξ_t if the coefficients of the representation are absolutely summable.
8.1 WOLD DECOMPOSITION

As in the case of stationary sequences, the Wold decomposition is closely related to the relationship between the propagating unitary operator U_X defined in (7.2) and the subspaces we have just defined. We now give these relationships for PC sequences.

Lemma 8.1 If X_t is PC-T with unitary T-shift U, then

(a) ℋ(t + T) = Uℋ(t);

(b) ℋ = Uℋ;

(c) ℋ(−∞) = Uℋ(−∞).

Proof. For item (a), recall that if A : ℋ → ℋ and M is any subset of ℋ, then AM = {y ∈ ℋ : y = Ax, x ∈ M}. Thus taking M to be L(t) = sp{X_s : s ≤ t}, then L(t + T) = UL(t); for if z ∈ L(t), then z = Σ_{j=1}^n a_j X_{t_j} with t_j ≤ t, so that Uz = Σ_{j=1}^n a_j X_{t_j+T} ∈ L(t + T), which means UL(t) ⊆ L(t + T). Now let z ∈ Uℋ(t), so z = Uw for w ∈ ℋ(t). Then w = lim w_n for w_n ∈ L(t). By continuity of U we can write z = Uw = U(lim w_n) = lim Uw_n. But since Uw_n ∈ L(t + T) for all n, we conclude that z ∈ L̄(t + T) = ℋ(t + T). Thus Uℋ(t) ⊆ ℋ(t + T). A similar argument using the continuity of U⁻¹ produces ℋ(t + T) ⊆ Uℋ(t).

For item (b), the proof is essentially the same. One first shows that UL ⊆ L, where L = sp{X_s, s ∈ Z}, and Uℋ ⊆ ℋ follows from the continuity of U. Then ℋ ⊆ Uℋ follows similarly.

To prove item (c), first suppose z ∈ ℋ(−∞), which means z ∈ ℋ(t) for all t. But then z ∈ ℋ(t + T) = Uℋ(t), so there is a y ∈ ℋ(t) (for all t ∈ Z) with z = Uy. Hence z ∈ Uℋ(−∞).

¹The word autonomous conveys the notion of time invariance and usually refers to the solution of an unforced deterministic system; see Elaydi [54, page 2].
The Wold decomposition now follows, in essence, from the facts given in the preceding lemma.
Proposition 8.1 (Wold Decomposition for PC Sequences) Any PC-T sequence X_t has a unique decomposition

X_t = Y_t + Z_t  (8.10)

with

(a) ℋ_X(t) = ℋ_Y(t) ⊕ ℋ_Z(t);

(b) Y_t is deterministic and Z_t is purely nondeterministic;

(c) Y_t and Z_t are PC-T;

(d) U_Y and U_Z are restrictions of U_X to ℋ_X(−∞) and ℋ_X(−∞)^⊥, respectively.

Proof. For each t ∈ Z set Y_t = (X_t | ℋ_X(−∞)) and Z_t = X_t − Y_t. On the one hand, for each s, Y_s ∈ ℋ_X(−∞), and on the other hand, Z_t = X_t − Y_t = X_t − (X_t | ℋ_X(−∞)) ⊥ ℋ_X(−∞) for each t. Hence Y_s ⊥ Z_t for all s, t ∈ Z, which means ℋ_Y ⊥ ℋ_Z. For each t ∈ Z, Y_t ∈ ℋ_X(−∞) ⊆ ℋ_X(t) and hence Z_t = X_t − Y_t ∈ ℋ_X(t), and therefore ℋ_Y(t) ⊕ ℋ_Z(t) ⊆ ℋ_X(t). In order to complete the proof of item (a) it suffices to show that ℋ_X(t) ⊆ ℋ_Y(t) ⊕ ℋ_Z(t). Pick w ∈ ℋ_X(t); then w = lim w_n, where each vector w_n ∈ L_X(t) and hence w_n = u_n + v_n, u_n ∈ L_Y(t), v_n ∈ L_Z(t). But since w_n is Cauchy and ‖u_n − u_m‖ ≤ ‖w_n − w_m‖, u_n must be Cauchy. Hence u_n → u ∈ ℋ_Y(t). Similarly v_n → v ∈ ℋ_Z(t). Taking the limit of w_n = u_n + v_n we get w = u + v, which shows ℋ_X(t) ⊆ ℋ_Y(t) ⊕ ℋ_Z(t).
For item (b), we first show that ℋ_X(−∞) = ℋ_Y(t) for all t ∈ Z, which implies Y_t is deterministic. Since it is clear that ℋ_Y(t) ⊆ ℋ_X(−∞) for each t, we show that ℋ_X(−∞) ⊆ ℋ_Y(t). For contradiction, suppose for fixed t there is a nonzero u ∈ ℋ_X(−∞) ⊖ ℋ_Y(t); then for each s ≤ t we have u ⊥ Y_s and u ⊥ Z_s (the latter because Z_s ⊥ ℋ_X(−∞)), so that u ⊥ X_s = Y_s + Z_s, and hence u ⊥ ℋ_X(t), and therefore u ⊥ ℋ_X(−∞). This implies u = 0, which is a contradiction.

For item (c), using Lemma 8.1, for any t ∈ Z we can write

Y_{t+T} = (X_{t+T} | ℋ_X(−∞)) = (U_X X_t | ℋ_X(−∞)) = U_X (X_t | ℋ_X(−∞)),

from which one can conclude that Y_t is PC-T. The argument in the proof of item (c) also shows that U_X and U_Y act the same on ℋ_Y, which together with the fact ℋ_X(−∞) = ℋ_Y proved above gives U_Y = U_X|_{ℋ_X(−∞)}, which proves the statement in item (d) about Y_t. The statements about Z_t follow easily. ∎
8.2 INNOVATIONS

Suppose for the nondecreasing spaces ℋ_X(t) there is some t_0 where the inclusion ℋ_X(t_0) ⊃ ℋ_X(t_0 − 1) is proper. Then something is added to the history of the sequence at time t_0, and this motivates the following definition.

Definition 8.6 The innovation space of the sequence X_t at time t is defined to be

I_X(t) = ℋ_X(t) ⊖ ℋ_X(t − 1) = {x ∈ ℋ_X(t) : x ⊥ ℋ_X(t − 1)}.

The preceding can also be expressed as

ℋ_X(t) = I_X(t) ⊕ ℋ_X(t − 1),

and this suggests that, continuing to iterate, one can express the entire history as ℋ_X(t) = I_X(t) ⊕ I_X(t − 1) ⊕ ⋯, but this is not quite correct because any nonzero vector x ∈ ℋ_X(−∞) will not necessarily be in ⊕ Σ_{j≤t} I_X(j). But such vectors are in ℋ_X(−∞). So our next guess is

ℋ_X(t) = I_X(t) ⊕ I_X(t − 1) ⊕ ⋯ ⊕ ℋ_X(−∞).

This is in fact true. We will now make it a bit more precise while presenting some basic facts about I_X(t) that are important for understanding its role in
prediction. We do this for PC-T sequences and mention the connection to the univariate stationary case as well as the multivariate stationary case.

Lemma 8.2 If X_t is a PC-T sequence with unitary operator U, then

(a) I_X(t + T) = U I_X(t);

(b) I_X(t) ⊥ ℋ_X(−∞) for every t ∈ Z, which implies

I_X(t) = ℋ_Z(t) ⊖ ℋ_Z(t − 1);

(c) d_X(t) = dim I_X(t) = 0 or 1, and d_X(t + T) = d_X(t), t ∈ Z.

Proof. For item (a), first write

I_X(t + T) = ℋ_X(t + T) ⊖ ℋ_X(t + T − 1)
           = U ℋ_X(t) ⊖ U ℋ_X(t − 1)
           = U [ℋ_X(t) ⊖ ℋ_X(t − 1)],

where the last line follows from the fact that U is unitary (see Problem 3.4 in Chapter 3).

For item (b), take two vectors u ∈ I_X(t) and v ∈ ℋ_X(−∞). Then u ∈ ℋ_X(t) and u ⊥ ℋ_X(t − 1), but v ∈ ℋ_X(s) for all s and in particular for s = t − 1, so (u, v) = 0. To see the second part of item (b), write

I_X(t) = ℋ_X(t) ⊖ ℋ_X(t − 1)
       = [ℋ_Y(t) ⊕ ℋ_Z(t)] ⊖ [ℋ_Y(t − 1) ⊕ ℋ_Z(t − 1)]
       = [ℋ_Y(t) ⊖ ℋ_Y(t − 1)] ⊕ [ℋ_Z(t) ⊖ ℋ_Z(t − 1)]
       = ℋ_Z(t) ⊖ ℋ_Z(t − 1) = I_Z(t),

because for all s, t we have ℋ_Z(s) ⊥ ℋ_Y(t) and ℋ_Y(t) ⊖ ℋ_Y(t − 1) = {0}.

For item (c), we only need to observe that the vector X_t is either in ℋ_X(t − 1), in which case dim I_X(t) = 0, or if not, then dim I_X(t) = 1. The periodicity of the innovation dimension, d_X(t) = d_X(t + T), follows immediately from item (a). ∎
If X_t is stationary then T = 1, and from this it follows that either d_X(t) = 0 for all t ∈ Z or d_X(t) = 1 for all t ∈ Z. In the first case we can conclude that X_t is singular or globally deterministic; in the second case X_t has a regular or purely nondeterministic component. The PC-2 sequence X_t formed by "doubling" (see (8.5)) clearly has d_X(t) = 1 for t even and d_X(t) = 0 for t odd.

The preceding lemma also shows that the block innovation at time t, defined by

I_X^T(t) = ℋ_X(t) ⊖ ℋ_X(t − T)
         = I_X(t) ⊕ I_X(t − 1) ⊕ ⋯ ⊕ I_X(t − T + 1),  (8.11)
is of dimension

d_X^T(t) = Σ_{s=0}^{T−1} d_X(t − s).  (8.12)

Furthermore, the periodicity d_X(t) = d_X(t + T) implies d_X^T(t) is constant with respect to t. This constant d_X^T gives a very natural way to define the rank of PC-T sequences.

Definition 8.7 The rank r_X of a PC-T sequence is the number of times in one period that the innovation is not trivial, that is, the number of times in one period that d_X(t) = 1.

This is consistent with the rank (see Section 4.4.3) of the T-variate stationary process obtained by blocking. It is also clear that I_X^T(t) ⊥ ℋ_X(−∞) for every t ∈ Z. The PC-2 sequence X_t formed by "doubling" a regular sequence Y_t is thus of rank 1. Now we can state and prove our earlier suggestion, that ℋ_Z(t) is precisely the orthogonal sum of the current and previous innovation spaces.
Proposition 8.2 If X_t is PC-T then

ℋ_Z(t) = I_X(t) ⊕ I_X(t − 1) ⊕ ⋯ = ⊕ Σ_{j=0}^{∞} I_X(t − j).

Proof. The inclusion

⊕ Σ_{j=0}^{∞} I_X(t − j) ⊆ ℋ_Z(t)

follows from item (b) in Lemma 8.2. To show the opposite inclusion, we use the truth of the preceding inclusion to write

ℋ_Z(t) = ⊕ Σ_{j=0}^{∞} I_X(t − j) ⊕ M

for some subspace M. Now if u ∈ ℋ_Z(t) we can write u = v + w, where v ∈ Σ_{j=0}^{∞} ⊕ I_X(t − j) and w ∈ M, which implies w ⊥ I_X(t − j) for j = 0, 1, 2, …. Thus w ∈ ℋ_Z(t) and w ⊥ I_X(t), which imply w ∈ ℋ_Z(t − 1). This argument can be carried out inductively to reach the conclusion w ∈ ℋ_Z(t − j) for all j, which, because Z_t is purely nondeterministic, together with

∩_{j=0}^{∞} ℋ_Z(t − j) = {0}

gives w = 0, and consequently u ∈ ⊕ Σ_{j=0}^{∞} I_X(t − j). ∎
The following corollary follows easily from (8.11).

Corollary 8.2.1 If X_t is a PC-T sequence, then

ℋ_Z(t) = ⊕ Σ_{j=0}^{∞} I_X^T(t − jT).

The representation of ℋ_Z(t) by an orthogonal sum of innovation spaces gives rise to the moving average representation of Z_t (or of X_t when X_t is purely nondeterministic). From the previous discussion about d_X(t), we now denote

D⁺ = {t : d_X(t) > 0}  (8.13)

to be the set of time indices where X_t has positive innovation dimension. We note that D⁺ is a periodic set in the sense that t ∈ D⁺ implies t + kT ∈ D⁺ for every k ∈ Z, and that d_X^T = card(D⁺ ∩ {0, 1, …, T − 1}). This notation permits us to give the moving average representation even when X_t is not of full rank, meaning d_X^T < T.
Proposition 8.3 (Moving Average Representation) The second order process X_t is a purely nondeterministic PC-T sequence of rank d_X^T if and only if there exists a T-periodic set (of indices) D⁺ with d_X^T = card(D⁺ ∩ {0, 1, …, T − 1}) and a set of orthonormal innovation vectors

Ξ = {ξ_m : m ∈ D⁺}  (8.14)

such that for every t

X_t = Σ_{j≥0 : t−j∈D⁺} a_j(t) ξ_{t−j},  (8.15)

where

Σ_{j≥0 : t−j∈D⁺} |a_j(t)|² < ∞  (8.16)

and

a_j(t + kT) = a_j(t)  (8.17)

for every j, k, t with t − j ∈ D⁺.

Remark. To clarify the notation, note that if s ∉ D⁺, then there is no corresponding innovation vector ξ_s ∈ Ξ. An alternative representation having T innovations per period, some of which may be ignored by a_j(t), is given in
the Remark following the proof.

Proof. Suppose X_t is given by (8.15), where the {a_j(t)} and ξ_m have the stated properties. The orthonormality of the ξ_m and the square summability (8.16) together ensure that X_t is a second order random process. Now to be specific we take t ≥ s and observe

R(t, s) = (X_t, X_s) = Σ_{k≥0 : s−k∈D⁺} a_{t−s+k}(t) ā_k(s) = R(t + T, s + T),  (8.18)

where we use the fact that for t − j ∈ D⁺ and s − k ∈ D⁺,

(ξ_{t−j}, ξ_{s−k}) = 1 if t − j = s − k and 0 if t − j ≠ s − k,

and so X_t is PC-T. Since it is clear that ℋ_X(t) ⊆ ℋ_ξ(t) for every t, then

∩_t ℋ_X(t) ⊆ ∩_t ℋ_ξ(t) = {0},

showing that X_t is purely nondeterministic (regular). To see that X_t is of rank card(D⁺ ∩ {0, 1, …, T − 1}), we note from (8.15) that for t ∈ {0, 1, …, T − 1} there are exactly card(D⁺ ∩ {0, 1, …, T − 1}) values of t for which X_t depends on ξ_t; for the others, X_t depends only on the past innovations (j > 0; i.e., j = 0 is not permitted). In other words, X_t has exactly card(D⁺ ∩ {0, 1, …, T − 1}) nonzero innovations for t ∈ {0, 1, …, T − 1} (so d_X(t) = dim I_X(t) = 1 for precisely these t), and this implies rank(X) = card(D⁺ ∩ {0, 1, …, T − 1}).

Conversely, suppose X_t is PC-T, purely nondeterministic, and of rank d_X^T. Since the innovation spaces I_X(p) appearing in (8.13) are of dimension at most one and I_X(p) ⊥ I_X(q) for p ≠ q, we may express any vector Y ∈ ℋ_X(t) as

Y = Σ_{p≤t, p∈D⁺} c_p ξ_p,
where D⁺ = {t : d_X(t) = dim I_X(t) = 1} and ξ_p is the unit vector of I_X(p) when p ∈ D⁺. Finally, since X_t ∈ ℋ_X(t) we may write

X_t = Σ_{j≥0 : t−j∈D⁺} a_j(t) ξ_{t−j},

where for each t we must have

Σ_{j≥0 : t−j∈D⁺} |a_j(t)|² < ∞,
which are (8.15) and (8.16). To obtain the periodicity of the a_j(t) we first write

X_{t+T} = Σ_{j≥0 : t+T−j∈D⁺} a_j(t + T) ξ_{t+T−j} = Σ_{j≥0 : t−j∈D⁺} a_j(t + T) ξ_{t+T−j},

where the change of indexing in the last expression follows because D⁺ is a periodic set (with period T, due to X_t being PC-T). But then we may also express

X_{t+T} = U_X X_t = Σ_{j≥0 : t−j∈D⁺} a_j(t) U_X[ξ_{t−j}] = Σ_{j≥0 : t−j∈D⁺} a_j(t) ξ_{t+T−j},

where U_X may be brought inside the sum due to mean-square convergence of the partial sums and continuity of U_X. The last equality follows from the fact that for p ∈ D⁺ we have ξ_{p+T} = U_X ξ_p, a conclusion that may be drawn from I_X(t + T) = U_X I_X(t), which was established in Lemma 8.2. Then from the uniqueness of the decomposition it follows that

a_j(t) = a_j(t + T)

whenever t − j ∈ D⁺, as we have claimed in (8.17). ∎
Remark. The regular part has an alternative representation that may be useful. More precisely, a second order sequence X_t is PC-T, purely nondeterministic, and of rank d_X^T if and only if there exist a periodic set (of indices) D⁺ of period T having d_X^T = card(D⁺ ∩ {0, 1, …, T − 1}) and a sequence of orthonormal vectors

Ξ = {ξ′_m : m ∈ Z}  (8.19)

such that for every t

X_t = Σ_{j=0}^{∞} a′_j(t) ξ′_{t−j},  (8.20)

with

Σ_{j=0}^{∞} |a′_j(t)|² < ∞  (8.21)

and

a′_j(t + kT) = a′_j(t)  (8.22)

for every k, t and j ≥ 0, but

a′_j(t) = 0  (8.23)

whenever t − j ∉ D⁺. The representation (8.20) for X_t has T orthonormal vectors per period but ignores some of them via (8.23) if d_X^T < T.
8.3 PERIODIC AUTOREGRESSIONS OF ORDER 1

From Section 2.1.7, a second order sequence X_t is called a periodic autoregression of order 1, or PAR(1), if it satisfies

X_t = φ(t) X_{t−1} + σ(t) ξ_t,  (8.24)

where {ξ_t : t ∈ Z} is a collection of orthonormal random variables and φ(t) = φ(t + T) and σ(t) = σ(t + T). In the introduction to this topic in Section 2.1.7 we set σ(t) = 1, but here we will permit σ(t), t = 0, 1, …, T − 1, to be arbitrary real numbers, which can be taken to be nonnegative without any loss of generality. However, we assume that not all σ(t) are zero. There are two important related concepts that must be considered in addressing the solution X_t of this simple system, namely, boundedness of solutions and causality. Boundedness means that sup_t ‖X_t‖ < ∞ and causality means ℋ_X(t) ⊆ ℋ_ξ(t). If X_t is causal then X_t cannot depend on future values of ξ_t (i.e., on ξ_s for s > t). From Chapter 2 we already know that the number A = ∏_{t=1}^{T} φ(t) plays a crucial role in the nature of the solutions to (8.24). The following theorem gives the relationship between these notions.
Theorem 8.1 Let X_t be a PAR(1) sequence given by (8.24). Any two of the following conditions implies the other one:

(a) X_t is bounded;

(b) ℋ_X(t) ⊆ ℋ_ξ(t) (causality);

(c) |A| < 1.

Proof. First assume (a) and (b) hold. Define

A_j(t) = φ(t) φ(t − 1) ⋯ φ(t − j + 1), A_0(t) = 1, and A := A_T(t) = φ(t) φ(t − 1) ⋯ φ(t − T + 1),  (8.25)
and then the first recursion of (8.24) produces

X_t = φ(t)[φ(t − 1) X_{t−2} + σ(t − 1) ξ_{t−1}] + σ(t) ξ_t,

and repeating N − 2 times gives

X_t = Σ_{j=0}^{N−1} A_j(t) σ(t − j) ξ_{t−j} + A_N(t) X_{t−N},  (8.26)

where A_j(t) = A^p A_r(t) for j = pT + r, 0 ≤ r < T. For N = MT the causality condition implies that

‖X_t‖² = Σ_{j=0}^{MT−1} A_j²(t) σ²(t − j) + A^{2M} ‖X_{t−MT}‖².  (8.27)

Since X_t is assumed to be bounded, we must have

Σ_{j=0}^{MT−1} A_j²(t) σ²(t − j) + A^{2M} ‖X_{t−MT}‖² ≤ sup_s ‖X_s‖² < ∞ for every M,

which implies |A| < 1.

Now assume (b) and (c) hold. Since we have assumed causality, we have (8.27). Taking the limit as M → ∞ of both sides of (8.27), the limit of the second term on the right becomes zero (because |A| < 1) and we get

‖X_t‖² = Σ_{k=0}^{∞} A_k²(t) σ²(t − k) < ∞ for every t.

Therefore X_t is bounded.

Finally assume that (a) and (c) hold. Using (8.26) with N = MT we get

X_t = Σ_{j=0}^{MT−1} A_j(t) σ(t − j) ξ_{t−j} + A^M X_{t−MT}.  (8.28)
Letting M go to infinity, since X_t is bounded and |A| < 1, the second term on the right goes to zero, giving

X_t = Σ_{j=0}^{∞} A_j(t) σ(t − j) ξ_{t−j},  (8.29)

which shows X_t is causal. ∎
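As a quick numerical illustration of Theorem 8.1, the following sketch (our own construction, with illustrative coefficient values) sets up a PAR(1) with period T = 2 in which one phase is locally expanding, |φ(0)| > 1, yet |A| = |φ(0)φ(1)| < 1, and computes the periodic variance ‖X_t‖² from the causal representation (8.29).

```python
import numpy as np

# PAR(1): X_t = phi[t % T] * X_{t-1} + sigma[t % T] * xi_t  (cf. (8.24)).
# Locally expanding at phase 0 but contracting on the whole: |A| < 1.
phi = np.array([1.6, 0.4])        # A = phi(0) * phi(1) = 0.64
sigma = np.array([1.0, 0.5])
T = 2

A = np.prod(phi)
assert abs(A) < 1                 # condition (c) of Theorem 8.1

# ||X_t||^2 from (8.29): X_t = sum_j A_j(t) sigma(t-j) xi_{t-j},
# with A_j(t) = phi(t) phi(t-1) ... phi(t-j+1), A_0(t) = 1.
def var_X(t, n_terms=400):
    total, Aj = 0.0, 1.0
    for j in range(n_terms):
        total += Aj**2 * sigma[(t - j) % T]**2
        Aj *= phi[(t - j) % T]    # A_{j+1}(t) = A_j(t) * phi(t - j)
    return total

v = [var_X(t) for t in range(4)]
print(v)                          # finite, and periodic: v[0] == v[2], v[1] == v[3]
```

The terms of the series scale by A² = 0.4096 once per period, so the variance is finite and T-periodic, consistent with boundedness in the theorem.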
The preceding theorem gives sufficient conditions for X_t to be PC.

Proposition 8.4 Any two of the conditions in the statement of Theorem 8.1 imply the solution to (8.24) is PC-T.

Proof. Any two of the conditions imply the third, and hence (8.29) always holds along with Σ_j |A_j(t)|² σ²(t − j) < ∞. Taking s > t to be specific, we thus compute

R(s, t) = E{X_s X̄_t} = Σ_{j=0}^{∞} A_j(t) A_{s−t+j}(s) σ²(t − j),

which together with A_j(t) = A_j(t + T) implies R(s, t) = R(s + T, t + T), meaning X_t is PC-T. ∎
Note that |φ(t)| > 1 is possible for some t while still |A| < 1. In other terms, the system described by (8.24) can be locally expanding for some t and locally contracting for other t and yet contracting on the whole, |A| < 1. It is also clear that under any two of the conditions of Theorem 8.1, the resulting causality ℋ_X(t) ⊆ ℋ_ξ(t) implies X_t is purely nondeterministic. So when a PAR(1) sequence is PC-T and either causal or |A| < 1, it must have an infinite moving average representation of the form (8.15) or (8.20). The next corollary explicitly gives the coefficients a_j(t) and a′_j(t) appearing in (8.15) and (8.20).
Corollary 8.4.1 If X_t is a causal PC solution to the PAR(1) system (8.24), then we have the moving average representation

X_t = Σ_{j=0}^{∞} a_j(t) ξ_{t−j},  (8.30)

where a_j(t) = A^p A_r(t) σ(t − r) = a_j(t + T) for j = pT + r, 0 ≤ r < T, with A and A_p(t) defined by (8.25), and Σ_{j=0}^{∞} |a_j(t)|² < ∞.

Proof. The proof follows from identifying the coefficients a_j(t) with the coefficients in representation (8.29). It is clear that a_j(t) = a_j(t + T) from the periodicity of A_r(t) and σ(t). We obtain representations of the type (8.15) by ignoring the extraneous ξ_{t−j}, where σ(t − j) = 0, and we obtain a representation of type (8.20) if we include them, but then their coefficients must satisfy a′_j(t) = 0. ∎
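The closed form a_j(t) = A^p A_r(t) σ(t − r) of Corollary 8.4.1 can be checked against brute-force unrolling of the recursion; the sketch below does this for an illustrative period-2 system (the coefficient values and helper names are ours).

```python
import numpy as np

# Moving average coefficients of a causal PAR(1), two ways:
#   closed form: a_j(t) = A^p * A_r(t) * sigma(t - r),  j = p*T + r
#   unrolled:    a_j(t) = phi(t) phi(t-1) ... phi(t-j+1) * sigma(t - j)
phi = np.array([1.6, 0.4])        # |A| = 0.64 < 1, so the MA series converges
sigma = np.array([1.0, 0.5])
T = len(phi)
A = np.prod(phi)

def A_r(t, r):                    # A_r(t) = phi(t) phi(t-1) ... phi(t-r+1)
    return np.prod([phi[(t - i) % T] for i in range(r)])

def a_closed(t, j):               # Corollary 8.4.1
    p, r = divmod(j, T)
    return A**p * A_r(t, r) * sigma[(t - r) % T]

def a_unrolled(t, j):             # direct product A_j(t) * sigma(t - j)
    return np.prod([phi[(t - i) % T] for i in range(j)]) * sigma[(t - j) % T]

ok = all(np.isclose(a_closed(t, j), a_unrolled(t, j))
         for t in range(T) for j in range(12))
print(ok)
```

The agreement reflects the identity A_{pT+r}(t) = A^p A_r(t): the product of any pT consecutive φ values over whole periods collapses to A^p.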
Before continuing further, we need to say something about the effects of σ(t) = 0 and φ(t) = 0. First, if σ(t_0) = 0, then it is clear that ℋ_X(t_0) = ℋ_X(t_0 − 1) and hence d_X(t_0) = 0. Thus rank(X) = card{t ∈ {0, 1, …, T − 1} : σ(t) ≠ 0}. Furthermore, X_t is deterministic at all t for which σ(t) = 0, since X_t ∈ ℋ_X(t − 1) there. Hence if X_t is a PAR(1) sequence and PC, then it is deterministic at some time if and only if it is not of full rank. The occurrence of φ(t) = 0 does not affect rank but only the memory. For if φ(t_0) = 0, then clearly A = 0, and so the infinite moving average representation (8.29) terminates after some finite number J of terms, where J < T. If for some t we have both σ(t) = 0 and φ(t) = 0, then X_t = 0, but still X_t ∈ ℋ_X(t − 1), so that also d_X(t) = 0, meaning there is no innovation at t that contributes to the rank.

8.4 SPECTRAL DENSITY OF REGULAR PC SEQUENCES
We begin with the spectral density

f(λ) = (dF/dμ)(λ)

(μ is Lebesgue measure) of the T-variate stationary sequence X_n made from T-blocks of a PC-T process X_t. For a regular PC-T sequence of rank r = d_X^T, it may be seen from the infinite moving average representation (8.15) that X_t depends only on r innovation vectors from the block of indices {t, t − 1, …, t − T + 1}. In order to make a correspondence between t and a block number n we assume t = nT. We define ξ_n to be the column vector whose components are the ξ_j with j ∈ {nT − T + 1, …, nT} ∩ D⁺. Using this notation, X_n may be expressed in terms of the vectors ξ_k for k ≤ n, where the coefficients are taken from the a_j(t) with t − j ∈ D⁺. Similarly, X_{t−1} can be expressed in terms of the same ξ_k, except the coefficient for the innovation occurring at t, which, if among the components of ξ_n, must be zero. This can be continued to X_{t−T+1} to obtain

X_n = Σ_{p=0}^{∞} A_p ξ_{n−p},  (8.31)
where A_p is of dimension T × r and (8.16) implies

Σ_{p=0}^{∞} |A_p^{ij}|² < ∞  (8.32)

for all the entries A_p^{ij} of A_p. To obtain A_0 explicitly, denote {j_1, j_2, …, j_r} = {j : 0 ≤ j ≤ T − 1 and t − j ∈ D⁺}, where the j_i are also ordered j_1 < j_2 < ⋯ < j_r. Then the (k, i) entry of A_0 may be written

(A_0)_{k,i} = a_{j_i − k}(t − k), k = 0, 1, …, T − 1, i = 1, …, r,  (8.33)

where we note that for j = j_1, j_2, …, j_r the coefficient a_j(t′) is never present whenever t′ − j ∉ D⁺. But, in addition, for row k we take a_j(t − k) = 0 if j < k.
The spectral density of X_n can be written (see Theorem 4.11)

f(λ) = Φ(λ) Φ*(λ),  (8.34)

where

Φ(λ) = Σ_{p=0}^{∞} A_p e^{iλp}.

Recall that the matrix valued measures F and F̃, linked in (6.41) by the continuous transformation V, are either both absolutely continuous or neither is. And since the collection of measures {F_0, F_1, …, F_{T−1}} is formed by splicing the elements of F̃, then F, F̃, and the collection {F_0, F_1, …, F_{T−1}} all are absolutely continuous (meaning all elements of the matrix are absolutely continuous) or none are. Applied to the current case, since F is absolutely continuous with density f(λ), then F̃ and the collection {F_0, F_1, …, F_{T−1}} are absolutely continuous, where the densities of the collection are {f_0, f_1, …, f_{T−1}}. We can also note that when the PC sequence has rank r, then from Theorem 4.11 and Proposition 4.14, f(λ) is of rank r for a.e. λ, and from the invertibility and continuity of V(λ) we can conclude that rank f(λ) = rank f̃(λ) for a.e. λ. The rank is reflected into the collection {f_0, f_1, …, f_{T−1}} through the way they are formed from the elements of f̃.
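The structure f(λ) = Φ(λ)Φ*(λ) in (8.34) can be illustrated numerically. In this sketch we invent a few square-summable T × r coefficient matrices A_p (they are random, not derived from any particular PC sequence) and check that the resulting density is Hermitian, nonnegative definite, and of rank r.

```python
import numpy as np

# f(lambda) = Phi(lambda) Phi*(lambda) with Phi(lambda) = sum_p A_p e^{i lam p},
# for illustrative T x r matrices A_p with geometrically decaying norms.
rng = np.random.default_rng(1)
T, r = 3, 2
A = [rng.standard_normal((T, r)) * 0.5**p for p in range(8)]

def f(lam):
    Phi = sum(Ap * np.exp(1j * lam * p) for p, Ap in enumerate(A))
    return Phi @ Phi.conj().T

F = f(1.3)                                   # evaluate at an arbitrary lambda
print(np.allclose(F, F.conj().T))            # Hermitian
print(np.all(np.linalg.eigvalsh(F) > -1e-9)) # nonnegative definite
print(np.linalg.matrix_rank(F))              # at most r; generically equal to r
```

Since Φ(λ) is T × r, the T × T density f(λ) can have rank at most r, matching the statement that a rank-r PC sequence yields a rank-r density for a.e. λ.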
8.4.1 Spectral Densities for PAR(1)

There are two ways to compute the spectral densities for a PAR(1): a direct method that retains the nonstationary setup, and the lifting method that translates the problem to a vector autoregression or VAR. We will present them both.
8.4.1.1 The Direct Method This is a frequency domain approach in the sense discussed in Chapter 4. It was presented in a treatment of a first order autoregression with almost periodic coefficients [147]. We assume |A| < 1, so X_t given by (8.24) is PC-T. Let V denote the unitary operator, mapping ℋ_X to itself, defined by Vξ_t = ξ_{t+1}. Let Ψ be the unitary linear transformation defined by Ψ : ξ_t → e^{it·}, which maps ℋ_ξ onto L² = L²([0, 2π), dμ). Here and in the sequel dμ denotes the normalized Lebesgue measure and [0, 2π) is regarded as a group with addition mod 2π. It is easy to see that ΨVΨ⁻¹ is the operator of multiplication by e^{i·}, and the process X_t is unitarily equivalent to the L² sequence h_t. In terms of the equivalent sequence h_t, the system (8.24) takes the form

h_t(·) = φ(t) h_{t−1}(·) + σ(t) e^{it·}, t ∈ Z,  (8.35)

and the moving average representation (8.30) takes the form

h_t(λ) = Σ_{j=0}^{∞} a_j(t) e^{i(t−j)λ},  (8.36)

where a_j(t) = A^p A_r(t) σ(t − r) = a_j(t + T) for j = pT + r, 0 ≤ r < T, with A and A_p(t) defined by (8.25), and Σ_{j=0}^{∞} |a_j(t)|² < ∞. Now one can express h_t as h_t = e^{it·} g_t, in terms of a periodic sequence g_t given by

g_t(λ) = [1 − A e^{−iTλ}]⁻¹ Σ_{k=0}^{T−1} A_k(t) σ(t − k) e^{−ikλ}.  (8.37)
Indeed, if |A| < 1, then from (8.36) we can write

g_t(λ) = e^{−itλ} h_t(λ) = Σ_{j=0}^{∞} A_j(t) σ(t − j) e^{−ijλ}
       = Σ_{N=0}^{∞} A^N e^{−iNTλ} Σ_{k=0}^{T−1} A_k(t) σ(t − k) e^{−ikλ}
       = [1 − A e^{−iTλ}]⁻¹ Σ_{k=0}^{T−1} A_k(t) σ(t − k) e^{−ikλ}.

We identify

G_t(λ) = Σ_{k=0}^{T−1} A_k(t) σ(t − k) e^{−ikλ}

and note that G_t(λ) is the source of the periodicity g_t(λ) = g_{t+T}(λ). Also, since rank deficiency of X_t corresponds to σ(t) = 0 for some values of t, then G_t(λ) also carries the rank information. If A_k(t) ≠ 0, k = 0, 1, …, T − 1, for some t, then rank deficiency means that, for at least one value of k, a term e^{−ikλ} does not appear in the Fourier series of G_t(λ). Recall from (1.17) that the spectral measures F_k can be defined by

B_k(τ) = ∫₀^{2π} e^{iτλ} F_k(dλ),  (8.38)
from which we obtain the following.

Proposition 8.5 Let (φ(t)) be a T-periodic sequence of nonzero complex numbers with |A| = |φ(1) ⋯ φ(T)| < 1. Let (F_k), k = 0, …, T − 1, be the spectrum of the (PC) solution to the system (8.24). Then the measures F_k are absolutely continuous with respect to the normalized Lebesgue measure dλ and

(dF_k/dλ)(λ) = Σ_{l=0}^{T−1} |1 − A e^{−iTλ}|^{−2} G̃_l(λ + 2πl/T) G̃̄_{l−k}(λ + 2πl/T),  (8.39)

where G_t(λ) = Σ_{k=0}^{T−1} A_k(t) σ(t − k) e^{−ikλ} and G̃_j(λ) = Σ_{t=0}^{T−1} G_t(λ) e^{2πijt/T}, j ∈ Z.
Proof. Let g_t(λ) be given by (8.37). From the periodicity g_t(λ) = g_{t+T}(λ) we can write (dropping the λ) g_t = Σ_{j=0}^{T−1} e^{−2πijt/T} g̃_j, where g̃_j collects the jth cyclic component of g_t. Hence, in view of (8.38) and the equivalence of (X_t) and (e^{it·} g_t), the Fourier coefficients B_k(τ) may be computed from products of the g̃_j. Since g_t(λ + 2πl/T) = [1 − A e^{−iTλ}]⁻¹ G_t(λ + 2πl/T) and the Fourier coefficients determine a measure, the proposition is proved. ∎
8.4.1.2 The Method of Lifting Although the following is easily proved for PARMA(p, q) systems, this generality is not needed here. We omit the proof.

Proposition 8.6 The univariate PAR(1) system (8.24) can be expressed as a T-variate VAR(1),

Φ₀ X_n = Φ₁ X_{n−1} + Θ₀ ξ_n,  (8.40)

where

X_n = [X_{nT+T−1}, X_{nT+T−2}, …, X_{nT}]′,  (8.41)

ξ_n = [ξ_{nT+T−1}, ξ_{nT+T−2}, …, ξ_{nT}]′,  (8.42)

with cov(ξ_m, ξ_n) = δ_{m−n} I_T, Θ₀ = diag(σ(T − 1), σ(T − 2), …, σ(0)),

Φ₀ = [[1, −φ(T − 1), 0, …, 0], [0, 1, −φ(T − 2), …, 0], …, [0, 0, …, 0, 1]],  (8.43)

that is, ones on the diagonal and −φ(T − 1), …, −φ(1) on the first superdiagonal, and

Φ₁ = [[0, 0, …, 0], …, [0, 0, …, 0], [φ(0), 0, …, 0]],  (8.44)

whose only nonzero entry, φ(0), couples X_{nT} to X_{nT−1}, the first element of X_{n−1}.
By premultiplying each side of (8.40) by the invertible transformation Φ₀⁻¹ we can obtain the usual form

X_n = Φ₀⁻¹ Φ₁ X_{n−1} + Φ₀⁻¹ Θ₀ ξ_n,

where Φ(z) = I_T − Φ₀⁻¹ Φ₁ z. Note the elements of X_n are ordered from later to earlier and do not overlap with the elements of X_{n−1}. If the elements of X_n are ordered from earlier to later, the matrix Φ₀ will be lower triangular rather than upper triangular. Using well known results [28, Theorem 11.3.1] from multivariate sequences, if det Φ(z) ≠ 0 for |z| ≤ 1, then X_n is causal in the sense that

X_n = Σ_{j=0}^{∞} Ψ_j ξ_{n−j},  (8.45)

where Ψ(z) = Φ⁻¹(z) Φ₀⁻¹ Θ₀ with Σ_{j=0}^{∞} ‖Ψ_j‖ < ∞. Then the spectral density is of the form (4.82),

f(λ) = Ψ(e^{−iλ}) Ψ*(e^{−iλ}),  (8.46)

which exists and is continuous (hence bounded) for all λ ∈ [0, 2π). Since by assumption Φ(e^{−iλ}) is invertible for all λ ∈ [0, 2π), and Φ₀ is always invertible, then clearly rank f(λ) = rank Θ₀. As long as the condition for causality is met, the rank of f(λ), which is the (innovation) rank of the sequence, is governed by Θ₀. An expression of the form (8.46) was introduced for PARMA sequences by Sakai [205].
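The lifting can be made concrete for T = 2. The sketch below (our own construction following the text, with illustrative coefficients) builds Φ₀, Φ₁, and Θ₀, and verifies that the causality condition det Φ(z) ≠ 0 for |z| ≤ 1 amounts to the spectral radius of Φ₀⁻¹Φ₁ being |A| = |φ(0)φ(1)| < 1.

```python
import numpy as np

# Lifted PAR(1), T = 2: Phi0 X_n = Phi1 X_{n-1} + Theta0 xi_n,
# with X_n = [X_{2n+1}, X_{2n}]' ordered from later to earlier.
phi = [1.6, 0.4]                  # phi(0), phi(1); A = 0.64
sigma = [1.0, 0.5]

Phi0 = np.array([[1.0, -phi[1]],  # X_{2n+1} - phi(1) X_{2n} = sigma(1) xi_{2n+1}
                 [0.0,  1.0]])
Phi1 = np.array([[0.0,   0.0],    # X_{2n} = phi(0) X_{2n-1} + sigma(0) xi_{2n},
                 [phi[0], 0.0]])  # X_{2n-1} being the first element of X_{n-1}
Theta0 = np.diag([sigma[1], sigma[0]])

# Phi(z) = I - Phi0^{-1} Phi1 z; det Phi(z) != 0 for |z| <= 1 iff all
# eigenvalues of M = Phi0^{-1} Phi1 lie strictly inside the unit circle.
M = np.linalg.inv(Phi0) @ Phi1
eigs = np.linalg.eigvals(M)
print(max(abs(eigs)))             # spectral radius; equals |phi(0) phi(1)| here
```

One eigenvalue of M is A = φ(0)φ(1) and the other is 0, so the VAR causality condition reproduces condition (c) of Theorem 8.1.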
8.5 LEAST MEAN-SQUARE PREDICTION

The problem of linear mean-square prediction or interpolation addresses the formation of a least mean-square estimator X̂_t for t ∈ S_p (the indices of prediction) based on observed values of X_s for s ∈ S_o (the indices of observations). Here the set S_o can be an infinite past, S_o = {s ∈ Z : s ≤ t}, or a finite past, S_o = {s ∈ Z : t − N ≤ s ≤ t}. Sometimes the observations are also called predictors, but X̂_t is always an optimal predictor comprised from the predictors in S_o.
8.5.1 Prediction Based on Infinite Past

Here we will consider only the case S_p = {t + δ}; that is, we predict X_{t+δ}. And we will focus here on regular PC-T sequences because the prediction error for the singular component is zero; the prediction error comes only from the regular component. As discussed in Chapter 4, the optimal least squares predictor X̂_{t+δ} is that vector in the subspace of observations ℋ_X(t) that minimizes

‖X̂_{t+δ} − X_{t+δ}‖²;

the solution (the sought vector) is the orthogonal projection (X_{t+δ} | ℋ_X(t)) of X_{t+δ} onto ℋ_X(t). Now since X_t is assumed regular, it has representation (8.15); then

X_{t+δ} = Σ_{j≥0 : t+δ−j∈D⁺} a_j(t + δ) ξ_{t+δ−j}  (8.47)

makes it easy to see that

X̂_{t+δ} = (X_{t+δ} | ℋ_X(t)) = Σ_{j≥δ : t+δ−j∈D⁺} a_j(t + δ) ξ_{t+δ−j}

because of the orthogonality of the ξ_k. The prediction error e_{t+δ} = X_{t+δ} − (X_{t+δ} | ℋ_X(t)) has the variance

‖e_{t+δ}‖² = Σ_{j=0 : t+δ−j∈D⁺}^{δ−1} |a_j(t + δ)|².  (8.48)
Since we also have

‖X_{t+δ}‖² = Σ_{j≥0 : t+δ−j∈D⁺} |a_j(t + δ)|²,

it follows that if |a_j(t)| decreases slowly with respect to j, then ‖e_{t+δ}‖² will be a small fraction of ‖X_{t+δ}‖². If |a_j(t)| decreases quickly with respect to j, ‖e_{t+δ}‖² will be a larger fraction of ‖X_{t+δ}‖². To get a real example we examine the PAR(1) PC sequences.
8.5.2 Prediction for a PAR(1) Sequence

Now we can address the linear prediction of a PAR(1) sequence that is PC. We wish to express $(X_{t+\delta}|\mathcal{H}_X(t))$ explicitly in terms of the coefficients $\phi(t)$ and $\sigma(t)$.
Proposition 8.7 If $X_t$ is a PC solution to (8.24) that is causal (or $|\Delta| < 1$), then using the notation of Corollary 8.4.1,

$$\hat{X}_{t+\delta} = (X_{t+\delta}|\mathcal{H}_X(t)) = \Delta^p A_r(t+\delta)\,X_t, \qquad (8.49)$$

where $\delta = pT + r$ for $0 \le r < T$.
Proof. First we observe that

$$\hat{X}_{t+\delta} = (X_{t+\delta}|\mathcal{H}_X(t)) = (X_{t+\delta}|\mathcal{H}_\xi(t)),$$

where $\mathcal{H}_\xi(t) = \overline{sp}\{\xi_s : s \le t \text{ and } s \in D^+\}$. Here $D^+$ turns out to be the set $\{t \in \mathbb{Z} : \sigma(t) \ne 0\}$. The desired projection can be written explicitly from the infinite moving average representation (8.29):

$$\hat{X}_{t+\delta} = \sum_{k=\delta\,:\,t+\delta-k \in D^+}^{\infty} A_k(t+\delta)\,\sigma(t+\delta-k)\,\xi_{t+\delta-k} = \Delta^p A_r(t+\delta)\sum_{k=0\,:\,t-k\in D^+}^{\infty} A_k(t)\,\sigma(t-k)\,\xi_{t-k} = \Delta^p A_r(t+\delta)\,X_t, \qquad (8.50)$$

where the second equality follows from the definition of $A_j(t)$ and the relation

$$A_{j+\delta}(t+\delta) = A_\delta(t+\delta)\,A_j(t),$$

which can be derived from (8.25). This finally yields $A_\delta(t+\delta) = \Delta^p A_r(t+\delta)$, where $\delta = pT + r$ with $0 \le r < T$. ∎
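The one-step case of (8.49) ($\delta = 1$, $p = 0$) gives $\hat{X}_{t+1} = \phi(t+1)X_t$, so the phase-wise regression coefficient $E\{X_{t+1}X_t\}/E\{X_t^2\}$ should equal $\phi(t+1)$. A minimal Monte Carlo sketch with hypothetical period-2 parameters (the values are made up, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2
phi = np.array([0.9, -0.4])   # phi(t mod T); |Delta| = |0.9 * (-0.4)| < 1, so causal
sig = np.array([1.0, 0.5])    # sigma(t mod T), periodic innovation scales

# simulate X_t = phi(t mod T) X_{t-1} + sigma(t mod T) xi_t, discard burn-in
n, burn = 200_000, 1000       # burn-in chosen even so phases are preserved
xi = rng.standard_normal(n + burn)
X = np.zeros(n + burn)
for t in range(1, n + burn):
    X[t] = phi[t % T] * X[t - 1] + sig[t % T] * xi[t]
X = X[burn:]

# one-step predictor coefficient per phase: E{X_{t+1} X_t} / E{X_t^2} ~ phi(t+1)
for t in (0, 1):
    idx = np.arange(t, n - 1, T)
    coef = np.mean(X[idx + 1] * X[idx]) / np.mean(X[idx] ** 2)
    print(t, coef, phi[(t + 1) % T])
```

The estimated coefficients should agree with $\phi(t+1)$ up to Monte Carlo error.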
It is now quite easy to obtain the prediction error.
Corollary 8.7.1 Under the conditions of Proposition 8.7,

$$\sigma^2(t+\delta) = \|X_{t+\delta} - \hat{X}_{t+\delta}\|^2 = \sum_{j=0}^{\delta-1} A_j^2(t+\delta)\,\sigma^2(t+\delta-j) = \sum_{j=0}^{\delta-1} \Delta^{2p(j)}A_{r(j)}^2(t+\delta)\,\sigma^2(t+\delta-j), \qquad (8.51)$$

where $j = p(j)T + r(j)$, $0 \le r(j) < T$.

In the case of predicting to $t+1$ ($\delta = 1$) based on $\mathcal{H}_X(t)$, the prediction error is zero if $\sigma(t+1) = 0$; on the other hand, the prediction is 0 if $\Delta^0 A_1(t+1) = \phi(t+1) = 0$, and then the prediction error has variance

$$\sigma^2(t+1) = \|X_{t+1} - \hat{X}_{t+1}\|^2 = \|X_{t+1}\|^2.$$

The prediction error can be zero for other $t, \delta$ provided $\delta \le T-1$, because we assume that $\sigma(t) > 0$ for at least one $t$ in $\{0,1,\dots,T-1\}$. Aside from this constraint for general $\delta$, any occurrences of $\sigma^2(t-r) = 0$ clearly diminish the prediction error. On the other hand, the prediction will be 0 whenever there is an $s \in \{t, t+1, \dots, t+\delta\}$ with $\phi(s) = 0$, because then $\Delta^p A_r(t+\delta) = 0$. This of course again leads to

$$\sigma^2(t+\delta) = \|X_{t+\delta} - \hat{X}_{t+\delta}\|^2 = \|X_{t+\delta}\|^2.$$
As in the stationary case, the general solution to the prediction of PC sequences based on the infinite past of observations is considered beyond our current goals. But the path to it, via spectral theory for multivariate sequences, seems clear. From a practical viewpoint, its need has been diminished by ever-increasing computing capacity that makes finite past computation practical for very large finite pasts. Thus it becomes very important to solve the problem of prediction based on finite pasts.
8.5.3 Finite Past Prediction

The problem addressed here is the prediction of $X_{t+\delta}$ based on the $n$ observations $\{X_{t-n+1}, \dots, X_t\}$. The best linear predictor is, of course, the orthogonal projection of $X_{t+\delta}$ onto $\mathcal{M}(t;n) = sp\{X_s,\ t-n+1 \le s \le t\}$, and we will thus denote

$$\hat{X}_{t+\delta,t;n} = (X_{t+\delta}|\mathcal{M}(t;n)). \qquad (8.52)$$
We shall now consider only real sequences and $\delta = 1$, and seek the coefficients $a^{(t+1)}_{nj}$ in the linear expression

$$\hat{X}_{t+1,t;n} = \sum_{j=1}^{n} a^{(t+1)}_{nj}\,X_{t+1-j}. \qquad (8.53)$$
The normal equations arising from the properties of projection are

$$E\{[X_{t+1} - \hat{X}_{t+1,t;n}]X_s\} = 0, \qquad s = t-n+1, \dots, t,$$

or

$$\sum_{j=1}^{n} a^{(t+1)}_{nj}\,R(t+1-j, s) = R(t+1, s), \qquad s = t-n+1, \dots, t, \qquad (8.54)$$

which can be expressed in matrix form as

$$\begin{bmatrix} R_{t+1,t} \\ \vdots \\ R_{t+1,t-n+1} \end{bmatrix} = \begin{bmatrix} R_{t,t} & \cdots & R_{t,t-n+1} \\ \vdots & & \vdots \\ R_{t-n+1,t} & \cdots & R_{t-n+1,t-n+1} \end{bmatrix}\begin{bmatrix} a^{(t+1)}_{n1} \\ \vdots \\ a^{(t+1)}_{nn} \end{bmatrix}, \qquad (8.55)$$

where we use $R_{u,v}$ for $R(u,v)$ as needed. In a shorter notation (8.55) becomes
$$\mathbf{r}_{t+1,t:t-n+1} = R(t,n)\,\mathbf{a}_n^{(t+1)}. \qquad (8.56)$$

For any $\mathbf{a}_n^{(t+1)} = [a^{(t+1)}_{n1}\ a^{(t+1)}_{n2}\ \cdots\ a^{(t+1)}_{nn}]'$ that solves (8.55), the prediction error

$$\epsilon_{t+1,n} = X_{t+1} - \hat{X}_{t+1,t;n} \qquad (8.57)$$

has variance

$$\sigma_n^2(t+1) = \|X_{t+1} - \hat{X}_{t+1,t;n}\|^2 = R(t+1,t+1) - \mathbf{r}'_{t+1,t:t-n+1}\,\mathbf{a}_n^{(t+1)}. \qquad (8.58)$$
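The normal equations (8.55) and the error variance (8.58) can be checked numerically. The sketch below (hypothetical period-2 PAR(1) parameters, made up for illustration) builds $R(t,n)$ from the closed-form causal PAR(1) covariance of Problem 8.4 and solves for $\mathbf{a}_n^{(t+1)}$; for a PAR(1) the solution should reduce to $[\phi(t+1), 0, \dots, 0]$ with error variance $\sigma^2(t+1)$:

```python
import numpy as np

T = 2
phi = {0: 0.9, 1: -0.4}        # phi(t mod T), hypothetical
sig2 = {0: 1.0, 1: 0.25}       # sigma^2(t mod T), hypothetical

# periodic variance: sigma_X^2(t) = phi(t)^2 sigma_X^2(t-1) + sigma^2(t)
v = np.zeros(T)
for _ in range(200):           # iterate the recursion to its periodic fixed point
    for u in range(T):
        v[u] = phi[u] ** 2 * v[(u - 1) % T] + sig2[u]

def R(s, t):
    """Causal PAR(1) covariance: for s >= t,
    R(s,t) = phi(s) phi(s-1) ... phi(t+1) sigma_X^2(t) (Problem 8.4)."""
    if s < t:
        return R(t, s)
    out = v[t % T]
    for u in range(t + 1, s + 1):
        out *= phi[u % T]
    return out

# normal equations (8.55): predict X_{t+1} from X_t, ..., X_{t-n+1}
t, n = 5, 4
Rn = np.array([[R(t - i, t - j) for j in range(n)] for i in range(n)])
r = np.array([R(t + 1, t - i) for i in range(n)])
a = np.linalg.solve(Rn, r)
var_pred = R(t + 1, t + 1) - r @ a   # sigma_n^2(t+1), from (8.58)
print(a, var_pred)
```

Here $t+1 = 6$ has phase 0, so the expected solution is $[0.9, 0, 0, 0]$ with error variance $1.0$.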
In the stationary case the prediction of $X_{t-n}$ based on $\mathcal{M}(t;n) = sp\{X_s : t-n+1 \le s \le t\}$ is the same problem as the prediction of $X_{t+1}$, due to the symmetry arising from stationarity (Section 4.2.5). But in the nonstationary case it must be addressed separately. Still, from the arguments above, the coefficients $\beta^{(t-n)}_{nj}$ in the best linear estimator

$$\hat{X}_{t-n,t;n} = \sum_{j=1}^{n}\beta^{(t-n)}_{nj}\,X_{t-j+1} \qquad (8.59)$$

are determined by

$$\mathbf{r}_{t-n,t:t-n+1} = R(t,n)\,\boldsymbol{\beta}_n^{(t-n)}, \qquad (8.60)$$
and the prediction error

$$\epsilon_{t-n,n} = X_{t-n} - \hat{X}_{t-n,t;n} \qquad (8.61)$$

has variance

$$\sigma_n^2(t-n) = \|X_{t-n} - (X_{t-n}|\mathcal{M}(t;n))\|^2 = E\{[X_{t-n} - \hat{X}_{t-n,t;n}]X_{t-n}\} = R(t-n,t-n) - \sum_{j=1}^{n}\beta^{(t-n)}_{nj}\,R(t-n, t-j+1) = R(t-n,t-n) - \mathbf{r}'_{t-n,t:t-n+1}\,\boldsymbol{\beta}_n^{(t-n)}. \qquad (8.62)$$
Now we present, for PC sequences, some important properties of $\hat{X}_{t+1,t;n}$ and $\hat{X}_{t-n,t;n}$ corresponding to Propositions 4.5 and 4.15, and again in a manner motivated by Pourahmadi [183, Chapter 7].
Proposition 8.8 If $X_t$ is PC-T and $\hat{X}_{t+1,t;n}$ and $\hat{X}_{t-n,t;n}$ are the best linear predictors of $X_{t+1}$ and $X_{t-n}$ based on $\{X_{t-n+1}, \dots, X_t\}$, then

(a) if $\mathbf{a}_n^{(t+1)}$ solves (8.55) and $\boldsymbol{\beta}_n^{(t-n)}$ solves (8.60), then irrespective of invertibility of $R(t,n)$, they are also solutions when $t$ is replaced with $t+T$;

(b) for $n$ fixed, irrespective of the invertibility of $R(t,n)$,

$$\sigma_n^2(t+1) = \sigma_n^2(t+T+1) \quad \text{and} \quad \sigma_n^2(t-n) = \sigma_n^2(t-n+T); \qquad (8.63)$$

(c) for $t$ fixed, $\sigma_n^2(t+1)$ is bounded and nonincreasing with respect to $n$, and $\sigma_n^2(t+1) \to \sigma^2_{X,t}(t+1)$, which is 0 if $\{X_t\}$ is deterministic at $t+1$;

(d) $R(t+1,n+1)$ is invertible if and only if $\sigma_n^2(t+1) > 0$ and $R(t,n)$ is invertible; whenever $R(t,n)$ is invertible,

$$\sigma_n^2(t+1) = R(t+1,t+1) - \mathbf{r}'_{t+1,t:t-n+1}\,R(t,n)^{-1}\,\mathbf{r}_{t+1,t:t-n+1}. \qquad (8.64)$$

If $\sigma_n^2(t+1) > 0$, then rank $R(t+1,n+1) = \operatorname{rank} R(t,n) + 1$;

(e) $R(t,n+1)$ is invertible if and only if $\sigma_n^2(t-n) > 0$ and $R(t,n)$ is invertible; whenever $R(t,n)$ is invertible,

$$\sigma_n^2(t-n) = R(t-n,t-n) - \mathbf{r}'_{t-n,t:t-n+1}\,R(t,n)^{-1}\,\mathbf{r}_{t-n,t:t-n+1}. \qquad (8.65)$$

If $\sigma_n^2(t-n) > 0$, then rank $R(t,n+1) = \operatorname{rank} R(t,n) + 1$;

(f) if $X_t$ is nondeterministic at $t_0+1$, then $\sigma_n^2(t_0+1) > 0$ for all $n \ge 1$; if, in addition, $|R(t_0,n)| \ne 0$ for $n \ge 1$, then

$$\sigma^2_{X,t}(t_0+1) = \exp\left\{\lim_{n\to\infty}\frac{1}{n}\log|R(t_0,n)|\right\}.$$
Proof. For (a), the vectors $\mathbf{r}_{t+1,t:t-n+1}$, $\mathbf{r}_{t-n,t:t-n+1}$ and matrix $R(t,n)$ in (8.55) and (8.60) are invariant when $t$ is replaced with $t+T$. For (b), the result is clear due to (a) if $\mathbf{a}_n^{(t+1)}$ is unique. But even if $\mathbf{a}_n^{(t+1)}$ is not a unique solution of (8.55), it still represents the projection $P_{\mathcal{M}(t;n)}X_{t+1}$ because it solves the normal equations (8.54), and the claim follows by using the top line of (8.58). The same argument gives the result for $\sigma_n^2(t-n)$.

For (c), $\sigma_n^2(t+1)$ is bounded and nonincreasing because of the top line of (8.58); the limit argument follows first because $\lim_{n\to\infty}\mathcal{M}(t;n) = \overline{sp}\{X_s, s \le t\}$ and then because the predictor $\hat{X}_{t+1}$ that achieves error variance $\sigma^2_{X,t}(t+1)$ can be approximated arbitrarily closely by elements of $sp\{X_s, s \le t\}$.

For (d), note that

$$R(t+1,n+1) = \begin{bmatrix} R_{t+1,t+1} & \mathbf{r}'_{t+1,t:t-n+1} \\ \mathbf{r}_{t+1,t:t-n+1} & R(t,n) \end{bmatrix}, \qquad (8.66)$$

from which it follows (see [183, Chapter 7] or [148, Appendix A]) that

$$|R(t+1,n+1)| = \left[R_{t+1,t+1} - \mathbf{r}'_{t+1,t:t-n+1}R(t,n)^{-1}\mathbf{r}_{t+1,t:t-n+1}\right]|R(t,n)| = \sigma_n^2(t+1)\,|R(t,n)|. \qquad (8.67)$$

First, (8.67) implies we can write (8.64) when $|R(t,n)| \ne 0$. To finish the proof, note that $|R(t+1,n+1)| \ne 0$ if and only if the random variables $\{X_{t+1}, \dots, X_{t-n+1}\}$ are LI, and this occurs if and only if $\sigma_n^2(t+1) \ne 0$. From (8.67), $|R(t+1,n+1)| \ne 0$ then implies $|R(t,n)| \ne 0$. The proof of (e) follows in the same manner by using

$$R(t,n+1) = \begin{bmatrix} R(t,n) & \mathbf{r}_{t-n,t:t-n+1} \\ \mathbf{r}'_{t-n,t:t-n+1} & R_{t-n,t-n} \end{bmatrix} \qquad (8.68)$$

and

$$|R(t,n+1)| = \left[R_{t-n,t-n} - \mathbf{r}'_{t-n,t:t-n+1}R(t,n)^{-1}\mathbf{r}_{t-n,t:t-n+1}\right]|R(t,n)| = \sigma_n^2(t-n)\,|R(t,n)|. \qquad (8.69)$$

For (f), if $X_t$ is nondeterministic at $t_0+1$, then $X_{t_0+1} \notin \mathcal{H}_X(t_0)$ and so $\sigma_n^2(t_0+1) = \|X_{t_0+1} - P_{\mathcal{M}(t_0;n)}X_{t_0+1}\|^2 > 0$ for all $n \ge 1$. If $|R(t_0,n)| \ne 0$ for $n \ge 1$, then $\sigma_n^2(t_0+1) > 0$ implies $|R(t_0+1,n+1)| \ne 0$ for all $n \ge 1$. Since $\sigma_n^2(t_0+1) \to \sigma^2_{X,t}(t_0+1)$, we finally obtain

$$\sigma^2_{X,t}(t_0+1) = \exp\left\{\lim_{n\to\infty}\frac{1}{n}\log|R(t_0,n)|\right\}.$$

This completes the proof. ∎
8.5.3.1 Partial Autocorrelations For a second order random sequence $X_t$, the partial autocorrelation is defined, for $n \ge 1$, as

$$\pi(t, n+1) = \operatorname{Corr}\{\epsilon_{t+1,n},\ \epsilon_{t-n,n}\}, \qquad (8.70)$$

which gives the immediate interpretation that $\pi(t,n+1)$ is the correlation of the prediction errors $\epsilon_{t+1,n}$ with $\epsilon_{t-n,n}$. Another interpretation is that $\pi(t,n+1)$ is the correlation between $X_{t+1}$ and $X_{t-n}$ when the effects of the variables $\{X_{t-n+1}, \dots, X_t\}$ are removed. Note that when $n = 0$ we obtain $\pi(t,1) = \operatorname{Corr}\{X_{t+1}, X_t\}$, as there are no variables between them. In the nonstationary case, it is possible for either or both $\epsilon_{t+1,n-1}$ and $\epsilon_{t-n+1,n-1}$ to be zero, and this may also be true of the random variables $X_{t+1}$ and $X_{t-n}$ and of the predictors $\hat{X}_{t+1,n-1}$ and $\hat{X}_{t-n+1,n-1}$. Since $\pi(t,n+1)$ is defined as a correlation, we define it to be zero when either of the random variables $\epsilon_{t+1,n}$, $\epsilon_{t-n,n}$ is zero. Subsequently we will give some examples of simple PC sequences that exhibit some of these situations. For nonstationary processes we hardly expect $\pi(t,n+1)$ to be constant with respect to $t$ for fixed $n$, as in the stationary case.
Lemma 8.3 If $X_t$ is a PC-T process, then

$$\pi(t+T, n+1) = \pi(t, n+1).$$

Proof. Using the periodicity $a^{(t+T+1)}_{nj} = a^{(t+1)}_{nj}$ (similarly for $\beta^{(t-n)}_{nj}$ and $\sigma_n(t+1)$; see Proposition 8.8) and $R(s,t) = R(s+T, t+T)$, the periodicity of $\pi(t,n+1)$ follows from

$$\pi(t,n+1) = \frac{E\{\epsilon_{t+1,n}\,\epsilon_{t-n,n}\}}{\sigma_n(t+1)\,\sigma_n(t-n)} \qquad (8.71)$$

and the expansion

$$E\{\epsilon_{t+1,n}\,\epsilon_{t-n,n}\} = R(t+1,t-n) - \sum_{j=1}^{n}a^{(t+1)}_{nj}\,R(t+1-j, t-n) \qquad (8.72)$$

$$-\ \sum_{k=1}^{n}\beta^{(t-n)}_{nk}\,R(t+1, t-k+1) + \sum_{j=1}^{n}\sum_{k=1}^{n}a^{(t+1)}_{nj}\beta^{(t-n)}_{nk}\,R(t+1-j, t-k+1). \qquad (8.73)$$
Remark. This result has obvious implications for the estimation of $\pi(t,n+1)$ for PC sequences. Again we note that the vectors $\mathbf{a}_n^{(t+1)}$ and $\boldsymbol{\beta}_n^{(t-n)}$ need not be unique solutions to the forward and backward Yule-Walker equations, because as long as they are solutions they represent the projections. The expression for $\pi(t,n+1)$ can be shortened, since for each $k$ in the last line of (8.73),

$$\sum_{j=1}^{n}a^{(t+1)}_{nj}\,R(t+1-j,\ t-k+1) = R(t+1,\ t-k+1),$$

causing the cancellation of the last two lines and producing

$$E\{\epsilon_{t+1,n}\,\epsilon_{t-n,n}\} = R(t+1,t-n) - \sum_{j=1}^{n}a^{(t+1)}_{nj}\,R(t+1-j,\ t-n). \qquad (8.74)$$

If in the last line of (8.73) we sum first on $k$, then we additionally obtain

$$E\{\epsilon_{t+1,n}\,\epsilon_{t-n,n}\} = R(t+1,t-n) - \sum_{k=1}^{n}\beta^{(t-n)}_{nk}\,R(t+1,\ t-k+1). \qquad (8.75)$$
Next, we give some simple PC sequences that demonstrate the $t$ dependence of the various quantities comprising $\pi(t,n)$ in (8.70). For the first simple example, consider again the sequence $X_t$ given by doubling $Y_t$,

$$\dots,\ X_{-2} = Y_{-1},\ X_{-1} = Y_{-1},\ X_0 = Y_0,\ X_1 = Y_0,\ X_2 = Y_1,\ X_3 = Y_1,\ \dots,$$

where $Y_t$ is an orthonormal sequence. Then using (8.5) and $\|Y_t\| = 1$, we obtain ($n = 0$)

$$\pi(t,1) = \begin{cases} 1, & t \text{ even}, \\ 0, & t \text{ odd}. \end{cases}$$

Next, $\pi(t,2) = 0$ because

$$\epsilon_{t-1,1} = X_{t-1} - \hat{X}_{t-1,1} = 0, \quad t \text{ odd}, \qquad \epsilon_{t+1,1} = X_{t+1} - \hat{X}_{t+1,1} = 0, \quad t \text{ even},$$
where these follow from $X_{t-1} = X_t$ for $t$ odd and $X_{t+1} = X_t$ for $t$ even. From here it is easy to see that for $n \ge 2$, $\pi(t,n+1) = 0$ because $\epsilon_{t+1,n-1}$ is always 0 or $X_{t+1}$, and in the latter case $X_{t+1} \perp X_{t-n+1}$.

If $X_t$ is a causal PAR(1) sequence (8.24), then $\pi(t,1)$ is proportional to $E\{X_{t+1}X_t\} = \phi(t+1)\,\sigma^2_X(t)$, which shows that $\pi(t,1)$ can be zero. Note that, in view of $\sigma^2_X(t+1) = |\phi(t+1)|^2\sigma^2_X(t) + \sigma^2(t+1)$, in order for $E\{X_{t+1}X_t\} = 0$ it is necessary for either $\phi(t+1) = 0$ or $\sigma^2_X(t) = 0$; either of these produces $\pi(t,1) = 0$. For $n \ge 2$, observe that

$$\epsilon_{t+1,n-1} = X_{t+1} - \hat{X}_{t+1,n-1} = \sigma(t+1)\,\xi_{t+1}, \qquad \epsilon_{t-n+1,n-1} = X_{t-n+1} - (X_{t-n+1}|\mathcal{M}(t;n-1)),$$

so $\xi_{t+1} \perp [X_{t-n+1} - \hat{X}_{t-n+1,n-1}]$ implies $\pi(t,n+1) = 0$. Note that $\sigma_X(t) = 0$ means a rank deficiency occurs for any collection that includes $X_t$. But $\pi(t,1) = 0$ only means $E\{X_{t+1}X_t\} = 0$, which occurs, for example, if $X_t$ is an orthogonal sequence. For PC sequences, the above examples show the possibility of $\pi(t,1) \ne 0$ for some $t$ while $\pi(t,1) = 0$ for other $t$.
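For the doubled sequence the lag-one quantity can be computed directly from the covariance $R(s,t) = \mathbf{1}\{\lfloor s/2\rfloor = \lfloor t/2\rfloor\}$; a minimal sketch:

```python
import numpy as np

def R(s, t):
    """Covariance of the doubled sequence X_{2k} = X_{2k+1} = Y_k,
    with Y_t orthonormal: R(s,t) = 1 iff s and t lie in the same pair."""
    return 1.0 if s // 2 == t // 2 else 0.0

def pacf1(t):
    """pi(t,1) = Corr{X_{t+1}, X_t} (the n = 0 case, nothing between)."""
    return R(t + 1, t) / np.sqrt(R(t + 1, t + 1) * R(t, t))

print([pacf1(t) for t in range(4)])   # alternates between 1 (t even) and 0 (t odd)
```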
8.5.3.2 Durbin-Levinson Algorithm The idea of the Durbin-Levinson algorithm is to find a computationally economical way to compute $\mathbf{a}_{n+1}^{(t+1)}$ given the vector of predictor coefficients $\mathbf{a}_n^{(t+1)}$ (a solution of (8.56)). We follow the general presentation of Pourahmadi [183, Chapter 7] (which is similar to that of Sharf [207]). Write the matrix equation (8.56), with $n+1$ replacing $n$, as

$$\begin{bmatrix}\mathbf{r}_{t+1,t:t-n+1} \\ R_{t+1,t-n}\end{bmatrix} = \begin{bmatrix} R(t,n) & \mathbf{r}_{t-n,t:t-n+1} \\ \mathbf{r}'_{t-n,t:t-n+1} & R_{t-n,t-n}\end{bmatrix}\begin{bmatrix}\mathbf{a}_u \\ a_l\end{bmatrix}. \qquad (8.76)$$

We seek the vector of coefficients $\mathbf{a}_{n+1}^{(t+1)} = [\mathbf{a}_u'\ a_l]'$. Writing the two equations separately produces

$$\begin{aligned}\mathbf{r}_{t+1,t:t-n+1} &= R(t,n)\,\mathbf{a}_u + \mathbf{r}_{t-n,t:t-n+1}\,a_l, \\ R_{t+1,t-n} &= \mathbf{r}'_{t-n,t:t-n+1}\,\mathbf{a}_u + a_l\,R_{t-n,t-n}.\end{aligned} \qquad (8.77)$$
Since $\mathbf{r}_{t+1,t:t-n+1} = R(t,n)\,\mathbf{a}_n^{(t+1)}$, it is natural to try $\mathbf{a}_u = \mathbf{a}_n^{(t+1)} + \mathbf{w}$, which transforms the preceding into

$$\begin{aligned}\mathbf{0} &= R(t,n)\,\mathbf{w} + \mathbf{r}_{t-n,t:t-n+1}\,a_l, \\ R_{t+1,t-n} &= \mathbf{r}'_{t-n,t:t-n+1}\,(\mathbf{a}_n^{(t+1)} + \mathbf{w}) + a_l\,R_{t-n,t-n}.\end{aligned} \qquad (8.78)$$

But the top line is solved by $\mathbf{w} = -a_l\,\boldsymbol{\beta}_n^{(t-n)}$, and the bottom line gives

$$a_l = \frac{R_{t+1,t-n} - \mathbf{r}'_{t-n,t:t-n+1}\,\mathbf{a}_n^{(t+1)}}{R_{t-n,t-n} - \mathbf{r}'_{t-n,t:t-n+1}\,\boldsymbol{\beta}_n^{(t-n)}} = \frac{\pi(t,n+1)\,\sigma_n(t+1)}{\sigma_n(t-n)}. \qquad (8.79)$$

So given $\mathbf{a}_n^{(t+1)}$ and $\boldsymbol{\beta}_n^{(t-n)}$, if $\pi(t,n+1) \ne 0$ (meaning both $X_{t+1}$ and $X_{t-n}$ are LI of $\mathcal{M}(t,n-1)$), determine $a_l$ from the preceding and then $\mathbf{a}_u = \mathbf{a}_n^{(t+1)} - a_l\,\boldsymbol{\beta}_n^{(t-n)}$. If $\pi(t,n+1) = 0$, then (8.77) is solved by $a_l = 0$ and $\mathbf{a}_u = \mathbf{a}_n^{(t+1)}$, which makes perfect sense because $X_{t-n}$ does not add anything new.

For the backward coefficients, we solve for $\boldsymbol{\beta}_{n+1}^{(t-n)}$ in terms of $\boldsymbol{\beta}_n^{(t-n)}$ (predicting to the same time $t-n$ based on a longer sample into the future). Beginning with (8.60) we obtain

$$\begin{bmatrix}\mathbf{r}_{t-n,t:t-n+1} \\ R_{t-n,t+1}\end{bmatrix} = \begin{bmatrix} R(t,n) & \mathbf{r}_{t+1,t:t-n+1} \\ \mathbf{r}'_{t+1,t:t-n+1} & R_{t+1,t+1}\end{bmatrix}\begin{bmatrix}\boldsymbol{\beta}_l \\ \beta_u\end{bmatrix}, \qquad (8.80)$$

which leads, as above, to

$$\beta_u = \frac{R_{t-n,t+1} - \mathbf{r}'_{t+1,t:t-n+1}\,\boldsymbol{\beta}_n^{(t-n)}}{R_{t+1,t+1} - \mathbf{r}'_{t+1,t:t-n+1}\,\mathbf{a}_n^{(t+1)}} = \frac{\pi(t,n+1)\,\sigma_n(t-n)}{\sigma_n(t+1)} \qquad (8.81)$$

and $\boldsymbol{\beta}_{n+1}^{(t-n)} = [\beta_u\ \boldsymbol{\beta}_l']'$, where $\boldsymbol{\beta}_l = \boldsymbol{\beta}_n^{(t-n)} - \beta_u\,\mathbf{a}_n^{(t+1)}$.

Given that we wish to compute the coefficients up through some $n = n_0$, we begin with $n = 0$ and directly obtain

$$\mathbf{a}_1^{(t+1)} = \{R_{t+1,t}/R_{t,t}\}, \qquad \boldsymbol{\beta}_1^{(t-1)} = \{R_{t-1,t}/R_{t,t}\}$$

for each $t = 0,1,\dots,T-1$. All the coefficients needed for $\{\mathbf{a}_1^{(t+1)}, \boldsymbol{\beta}_1^{(t-1)} : t = 0,1,\dots,T-1\}$ are present in this first set, and coefficients for any other values of $t$ can be obtained by periodicity. The process is continued recursively to $n = n_0$. The expressions (8.79) and (8.81) give, for PC sequences, the connection between the last regression coefficient and the partial autocorrelation (see the discussion of the Durbin-Levinson algorithm in Chapter 4, or in [28, Section 3.4] for the stationary case). Another, and perhaps more practical, solution along similar lines is the innovations algorithm.
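The relation (8.79) between the last regression coefficient and $\pi(t,n+1)$ can be verified against a direct solve of the order $n+1$ system. The sketch below uses a hypothetical periodic MA(1) covariance (parameters made up, chosen so all quantities are well defined), computes $\sigma_n^2(t+1)$, $\sigma_n^2(t-n)$, the error cross-covariance via (8.74), and compares:

```python
import numpy as np

T = 2
theta = {0: 0.8, 1: -0.5}   # hypothetical periodic MA(1) coefficient theta(t mod T)

def R(s, t):
    """Covariance of the PC-2 moving average X_t = xi_t + theta(t) xi_{t-1},
    with xi_t i.i.d. unit variance."""
    if s < t:
        return R(t, s)
    if s == t:
        return 1.0 + theta[t % T] ** 2
    if s == t + 1:
        return theta[s % T]
    return 0.0

t, n = 6, 3
# direct solve of the order n+1 forward system (8.76), unknowns over X_t, ..., X_{t-n}
R1 = np.array([[R(t - i, t - j) for j in range(n + 1)] for i in range(n + 1)])
r1 = np.array([R(t + 1, t - i) for i in range(n + 1)])
a_direct = np.linalg.solve(R1, r1)

# order-n forward and backward solutions and their error variances
Rn = R1[:n, :n]
a_f = np.linalg.solve(Rn, r1[:n])                  # a_n^{(t+1)}
r_b = np.array([R(t - n, t - i) for i in range(n)])
b_b = np.linalg.solve(Rn, r_b)                     # beta_n^{(t-n)}
s_f2 = R(t + 1, t + 1) - r1[:n] @ a_f              # sigma_n^2(t+1), (8.58)
s_b2 = R(t - n, t - n) - r_b @ b_b                 # sigma_n^2(t-n), (8.62)

# cross-covariance of the errors via (8.74), then the last coefficient via (8.79)
cross = R(t + 1, t - n) - a_f @ r_b                # E{eps_{t+1,n} eps_{t-n,n}}
pi = cross / np.sqrt(s_f2 * s_b2)                  # pi(t, n+1)
a_last = pi * np.sqrt(s_f2) / np.sqrt(s_b2)        # (8.79)
print(a_last, a_direct[-1])
```

The last coefficient of the direct solve should match (8.79), and the remaining coefficients should match $\mathbf{a}_u = \mathbf{a}_n^{(t+1)} - a_l\boldsymbol{\beta}_n^{(t-n)}$.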
8.5.3.3 Innovations Algorithm Section 4.2.5.3 showed the close connection between the Cholesky decomposition and the innovations algorithm whenever $R$ is positive definite. The treatment given there was for general positive definite $R$, not just those arising from covariances of stationary sequences. However, the covariance of a nonstationary sequence need not be positive definite, although it certainly must be nonnegative definite. Here we modify the discussion of Section 4.2.5.3 to accommodate covariances $R$ that are not necessarily positive definite, thus not necessarily of full rank.
Proposition 8.9 (Cholesky Decomposition for an NND Matrix) If the $n \times n$ matrix $R$ is nonnegative definite and of rank $r$, then

(a) there exists an $n \times n$ lower triangular matrix $\Theta$ of rank $r$ for which

$$R = \Theta\Theta'; \qquad (8.82)$$

(b) there exists a lower semitriangular $n \times r$ matrix $\tilde{\Theta}$ for which

$$R = \tilde{\Theta}\tilde{\Theta}'. \qquad (8.83)$$

Proof. We use the same notation as in the proof of Proposition 4.6 and rely on the fact (see [56, Theorem 2]) that $R$ is nonnegative definite and of rank $r$ if and only if there exist random variables $\{X_1, X_2, \dots, X_n\}$ of finite variance with $R = \operatorname{Cov}\mathbf{X}$, where $\mathbf{X} = (X_1, X_2, \dots, X_n)'$, and $r$ is the maximum number of LI vectors that can be found in $\{X_1, X_2, \dots, X_n\}$. The proof is a simple modification of the proof of Proposition 4.6. In the Gram-Schmidt orthogonalization process, whenever we find that $X_k \in \mathcal{M}_{k-1} = sp\{X_1, X_2, \dots, X_{k-1}\}$, meaning that $Y_k = X_k - P_{\mathcal{M}_{k-1}}X_k = 0$ (the prediction error is null), we can proceed in two ways. One way gives us (a) and the other gives us (b). In the first case, we set the new basis (or innovation) vector $\eta_k$ to be a dummy unit vector, call it $\eta_k'$, orthogonal to $\mathcal{M}_n$ and to all previous dummy vectors. The resulting $\Theta$ will be lower triangular and still $\mathbf{X} = \Theta\boldsymbol{\eta}$, with $R = \Theta E\{\boldsymbol{\eta}\boldsymbol{\eta}'\}\Theta' = \Theta\Theta'$, giving (8.82) as required. However, since no $X_j$ depends on any of the $n-r$ dummy vectors, there will be $n-r$ columns of $\Theta$ that will be zero. Alternatively, we do not introduce a dummy $\eta_k$ when $Y_k = X_k - P_{\mathcal{M}_{k-1}}X_k = 0$ and only retain the $\eta_j$ required to represent $\{X_1, X_2, \dots, X_k\}$. Hence $\boldsymbol{\eta}$ will contain only $r$ elements when $k = n$, and the matrix of coefficients $\tilde{\Theta}$ will be $n \times r$. But $\tilde{\Theta}$ will have a semitriangular property: denoting $c_k = \max_j\{j : \tilde{\theta}_{kj} \ne 0\}$, then $c_k$ is nondecreasing and $c_n = r$. ∎
See the problems at the end of the chapter for the connection between the Cholesky decomposition and rank revealing factorizations (see Gu and Miaranian [83]). The recursive computation of the matrices $\Theta$ and $\tilde{\Theta}$ requires only a simple modification to Proposition 4.7.
Proposition 8.10 (The Innovations Algorithm for an NND Matrix) If the $n \times n$ matrix $R$ is nonnegative definite, then the lower triangular matrix $\Theta$ in (8.82) can be computed recursively as follows. First set $\theta_{11} = [R(1,1)]^{1/2}$. The remainder of the coefficients $\theta_{k+1,j}$ are computed left to right beginning with $k = 1$ (row 2) as follows. For $j = 1, 2, \dots, k$ set

$$\theta_{k+1,j} = \begin{cases}\left[R(k+1,j) - \sum_{i=1}^{j-1}\theta_{k+1,i}\,\theta_{j,i}\right]\big/\theta_{jj}, & \theta_{jj} \ne 0, \\ 0, & \theta_{jj} = 0.\end{cases} \qquad (8.84)$$

For the diagonal term, set

$$\theta_{k+1,k+1} = \left[R(k+1,k+1) - \sum_{i=1}^{k}\theta_{k+1,i}^2\right]^{1/2}. \qquad (8.85)$$

Subsequent rows ($k = 2, \dots, n-1$) are computed in increasing order. The matrix $\tilde{\Theta}$ is just $\Theta$ with the null columns removed. In the context of PC sequences, the innovations algorithm for full rank PC sequences was presented by Anderson, Meerschaert, and Vecchia [10] and Lund and Basawa [140].
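A minimal sketch of this recursion, applied to the rank-deficient covariance of the doubled sequence from Section 8.5.3.1 (the zero-diagonal branch produces the null columns that are dropped to form $\tilde{\Theta}$):

```python
import numpy as np

def nnd_cholesky(R):
    """Lower triangular Theta with R = Theta Theta' for nonnegative definite R,
    following the recursion of Proposition 8.10: when a diagonal entry vanishes
    (a null innovation), the corresponding column entries are set to zero."""
    n = R.shape[0]
    Th = np.zeros((n, n))
    for k in range(n):
        for j in range(k):
            if Th[j, j] > 1e-12:
                Th[k, j] = (R[k, j] - Th[k, :j] @ Th[j, :j]) / Th[j, j]
            else:
                Th[k, j] = 0.0
        d = R[k, k] - Th[k, :k] @ Th[k, :k]
        Th[k, k] = np.sqrt(max(d, 0.0))
    return Th

# rank-2 NND example: covariance of (X_0, X_1, X_2, X_3) = (Y_0, Y_0, Y_1, Y_1)
R = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])
Th = nnd_cholesky(R)
print(np.round(Th, 6))
# Theta tilde: drop the null columns, leaving an n x rank(R) factor
Th_tilde = Th[:, np.any(np.abs(Th) > 1e-12, axis=0)]
print(Th_tilde.shape)
```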
PROBLEMS AND SUPPLEMENTS

8.1 How much of the claim of the Wold decomposition of $X_t$ (Proposition 8.1) can be obtained without using the result that $\mathcal{H}_X(-\infty)$ is invariant under $U_X$? That is, for an arbitrary subspace $\mathcal{M}$ of $\mathcal{H}_X$ we can always write

$$X_t = Y_t + Z_t,$$

where $Y_t = P_{\mathcal{M}}X_t$ and $Z_t = X_t - Y_t = P_{\mathcal{M}^\perp}X_t$, and it is clear that $Y_t \perp Z_s$ for any $s, t \in \mathbb{Z}$, or equivalently

$$\mathcal{H}_X(t) = \mathcal{H}_Y(t) \oplus \mathcal{H}_Z(t),$$

because if $x \in \mathcal{H}_X(t)$ then $x = \lim x_n$, where $x_n$ is a linear combination of the vectors $\{X_s, s \le t\}$. But then $x_n = y_n + z_n$, where $y_n = P_{\mathcal{M}}x_n$ and $z_n = x_n - y_n$, and from the continuity of projection and the closedness of $\mathcal{M}$ it follows that there is a $y \in \mathcal{H}_Y(t)$ and $z \in \mathcal{H}_Z(t)$ such that $x = y + z$. It also follows immediately that $\mathcal{H}_Y \perp \mathcal{H}_Z$.
Since $X_t$ is PC-T, the linearity of $U_X$ permits us to write

$$X_{t+T} = U_XX_t = U_XY_t + U_XZ_t,$$

but at the same time

$$X_{t+T} = Y_{t+T} + Z_{t+T},$$

so can we say that $Y_{t+T} = U_XY_t$ for all $t$? Yes, we can, provided that $U_XY_t$ never escapes from $\mathcal{M}$; that is, $\mathcal{M}$ must be closed under $U_X$. Thus we see how the invariance of $\mathcal{H}_X(-\infty)$ under $U_X$ is used.

8.2 A subspace $\mathcal{M} \subset \mathcal{H}$ is called reducing for operator $A$ if $A\mathcal{M} \subset \mathcal{M}$ and $A\mathcal{M}^\perp \subset \mathcal{M}^\perp$. Show that $\mathcal{H}_X(-\infty)$ is a reducing subspace for $U_X$.

8.3 Suppose $X_t$ is PC-T and there are orthogonal processes $Y_t$ and $Z_t$ with $Y_t$ deterministic and $Z_t$ purely nondeterministic such that $X_t = Y_t + Z_t$. Show that $Y_t = (X_t|\mathcal{H}_X(-\infty))$.

8.4 Show by direct calculation that if a PAR(1) solution is causal, then for any $s > t$,

$$E\{X_sX_t\} = \phi(s)\,\phi(s-1)\cdots\phi(t+1)\,\sigma^2_{X,t},$$

where $\sigma^2_{X,t} = E\{X_tX_t\}$.

8.5 What statements corresponding to Proposition 8.8, part (c), can be made about $\sigma^2_n(t-n)$ as $n \to \infty$?
8.6 Cholesky decomposition and rank revealing factorizations. The issue of a Cholesky decomposition for NND (not necessarily positive definite) matrices has been examined in the computing literature in the broader context of rank revealing factorizations. In the case of a Cholesky decomposition of an $n \times n$ NND matrix $A$, the resulting rank revealing decomposition produces (see Gu and Miaranian [83])

$$\Pi A\Pi' = LDL', \qquad (8.86)$$

where $\Pi$ is a permutation matrix. The $n' \times n'$ matrix $A_{n'}$ is the Cholesky factor of the full rank part of $A$ and the rest describes the rank deficient part. The essence of this factorization is easily seen in the discussion in the text.
CHAPTER 9

ESTIMATION OF MEAN AND COVARIANCE

The main topic of this chapter is the estimation of the mean

$$m_t = E\{X_t\} = m_{t+T},$$

the covariance

$$R_{t+\tau,t} = E\{[X_{t+\tau} - m_{t+\tau}][X_t - m_t]\} = R_{t+T+\tau,t+T},$$

and their Fourier coefficients

$$\hat{m}_k = \frac{1}{T}\sum_{t=0}^{T-1} m_t\,e^{-i2\pi kt/T} \quad \text{and} \quad B_k(\tau) = \frac{1}{T}\sum_{t=0}^{T-1} R_{t+\tau,t}\,e^{-i2\pi kt/T}.$$

Throughout this chapter it is assumed that $X_t$ is a real PC-T sequence. In this discussion we shall mainly treat consistency in mean square (i.e., convergence in $L^2(\Omega, \mathcal{F}, P)$), in order to show what can be expected to be true and to make the connections between the harmonizable $X_t$ and the lifted $T$-variate

Periodically Correlated Random Sequences: Spectral Theory and Practice. By H.L. Hurd and A.G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
stationary sequence $\mathbf{X}_n$. Strong (almost sure) consistency and asymptotic normality will also be discussed. Although results for all the consistency issues can be established via the lifted stationary sequence $\mathbf{X}_n$, some of the direct methods are discussed because they can also be applied to the almost periodic case, where the bijective mapping to finite dimensional vector stationary sequences is not possible [109, 135].

9.1 ESTIMATION OF $m_t$: THEORY
from which it is clear that % t , ~is unbiased, E{hit,N} = mt. Results on the limiting behavior of % t , can ~ be obtained from the stationarity of the lifted T-variate sequence X,, or from estimating the Fourier coefficients fib of mt:
We begin with consistency of C t , N . In order to discuss the mean-square consistency of % t , N from a spectral viewpoint, recall that the matrix valued cross spectral measure for the Tvariate stationary sequence {[X,], = X,T+,,P = O , 1, ...: T - 1, n E Z} is denoted as F and its density as f . Recall we take F and f to be associated with the covariance of X,.
Proposition 9.1 (Mean-Square Consistency of $\hat{m}_{t,N}$) If $X_t$ is PC-T, then

(a) $\lim_{N\to\infty} E\{[\hat{m}_{t,N} - m_t]^2\} = 0$ if and only if $F_{tt}(\{0\}) = 0$, where

$$E\{X_{t+jT}X_t\} = \int_0^{2\pi} e^{ij\lambda}\,F_{tt}(d\lambda);$$

(b) $\sum_{k=-\infty}^{\infty}|R_{t+kT,t}| < \infty$ is sufficient for (a) and then

$$\lim_{N\to\infty} N\,E\{[\hat{m}_{t,N} - m_t]^2\} = \sum_{k=-\infty}^{\infty}R_{t+kT,t} = f_{tt}(0),$$

where $f_{tt}(\lambda)$ is continuous.
Proof. The first claim is just a direct application of Section 4.3.3, but we indicate the proof using the notation of the PC context. Indeed, since

$$\hat{m}_{t,N} - m_t = \frac{1}{N}\sum_{p=0}^{N-1}[X_{t+pT} - m_t],$$

we have

$$E\{[\hat{m}_{t,N} - m_t]^2\} = \frac{1}{N^2}\sum_{p=0}^{N-1}\sum_{q=0}^{N-1}R(t+(p-q)T,\ t) = \int_0^{2\pi}\left|\frac{1}{N}\sum_{p=0}^{N-1}e^{ip\lambda}\right|^2 F_{tt}(d\lambda),$$

which converges to $F_{tt}(\{0\})$ as $N \to \infty$. For the second part, use $R(t+pT, t+qT) = R(t+(p-q)T, t)$ (the PC structure), with $r = p - q$, to write

$$N\,E\{[\hat{m}_{t,N} - m_t]^2\} = \frac{1}{N}\sum_{p=0}^{N-1}\sum_{q=0}^{N-1}R(t+(p-q)T, t) = \sum_{r=-N+1}^{N-1}\frac{N-|r|}{N}\,R(t+rT, t) \longrightarrow \sum_{r=-\infty}^{\infty}R(t+rT, t) = f_{tt}(0)$$

as $N \to \infty$. If $\sum_{k=-\infty}^{\infty}|R_{s+kT,t}| < \infty$ for some $s, t$, then the same argument gives

$$\lim_{N\to\infty} N\,E\{[\hat{m}_{s,N} - m_s][\hat{m}_{t,N} - m_t]\} = f_{st}(0).$$

The slightly weaker condition $\sum_{k=-\infty}^{\infty}|R_{u+kT,u}| < \infty$ for $u = s, t$ gives only that, for arbitrary $\epsilon > 0$,

$$N\left|E\{[\hat{m}_{s,N} - m_s][\hat{m}_{t,N} - m_t]\}\right| \le N\left[E\{\hat{m}_{s,N} - m_s\}^2\right]^{1/2}\left[E\{\hat{m}_{t,N} - m_t\}^2\right]^{1/2} \le f_{ss}^{1/2}(0)\,f_{tt}^{1/2}(0) + \epsilon$$
for $N > N_0$. See the problems at the end of the chapter for some related issues.

It will be useful later (in spectral estimation) to have some more conditions that imply mean-square convergence of the averages of $[X_t - m_t]e^{-i\lambda t}$. Thus denote

$$J_N(\lambda) = \frac{1}{N}\sum_{t=0}^{N-1}[X_t - m_t]\,e^{-i\lambda t}$$

and

$$\Phi_N(\lambda) = E|J_N(\lambda)|^2 = \frac{1}{N^2}\sum_{t=0}^{N-1}\sum_{s=0}^{N-1}R_{t,s}\,e^{-i\lambda(t-s)}. \qquad (9.5)$$
Lemma 9.1 Suppose $X_t$ is an $L^2$ random sequence whose covariance $R_{s,t}$ satisfies $R_{t,t} \le M$ for all $t$; then either of the conditions (a), (b) below is sufficient for

$$\lim_{N\to\infty} J_N(\lambda) = 0 \text{ in mean square} \qquad (9.6)$$

uniformly in $\lambda$:

(a) $\lim_{\tau\to\infty} R_{t+\tau,t} = 0$ uniformly in $t$;

(b) for $X_t$ PC-T, $\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{T-1}|R_{t+\tau,t}| < \infty$.

Proof. Clearly, for any fixed $\lambda$, (9.6) is equivalent to $\lim_{N\to\infty}\Phi_N(\lambda) = 0$. To see that (a) is sufficient, for arbitrary $\epsilon > 0$ choose $N_0$ such that $|R_{t+\tau,t}| < \epsilon/2$ for $|\tau| > N_0$. Defining the sets $A = [0,N-1]\times[0,N-1]$ and $B = A \cap \{(s,t) : |s-t| < N_0\}$, then

$$\Phi_N(\lambda) \le \frac{1}{N^2}\sum_{(s,t)\in B}|R_{s,t}| + \frac{1}{N^2}\sum_{(s,t)\in A\setminus B}|R_{s,t}| \le \frac{2NN_0M}{N^2} + \frac{\epsilon}{2} < \epsilon$$

if $N > 8N_0M/\epsilon$, where $M = \max_{s,t}|R_{s,t}| = \max_{t=0,\dots,T-1}R_{t,t}$. Condition (b) follows easily from

$$\Phi_N(\lambda) \le \frac{1}{N^2}\sum_{t=0}^{N-1}\sum_{s=0}^{N-1}|R_{t,s}| \le \frac{1}{N}\sum_{\tau=-\infty}^{\infty}\max_t|R_{t+\tau,t}| \le \frac{1}{N}\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{T-1}|R_{t+\tau,t}|$$

or Proposition 5.12. In both of these cases the uniformity with respect to $\lambda$ is clear.
Remark. In the stationary case, condition (a) is $\lim_{\tau\to\infty}R_\tau = 0$ and condition (b) is $\sum_{\tau=-\infty}^{\infty}|R_\tau| < \infty$.

Now we examine $\hat{m}_{k,N}$, where the harmonizability of $X_t$ helps express the result spectrally.
Proposition 9.2 If $X_t$ is PC-T and $\xi(\lambda)$ is the random spectral measure associated with $X_t$, then in the mean-square sense we have

$$\lim_{N\to\infty}\hat{m}_{k,N} = \xi(\{2\pi k/T\}), \qquad \lim_{N\to\infty}\hat{m}_{t,N} = \sum_{k=0}^{T-1}\xi(\{2\pi k/T\})\,e^{i2\pi kt/T}. \qquad (9.7)$$

Proof. Note that here $N$ refers to the number of whole periods in the sample. Then

$$\hat{m}_{k,N} = \frac{1}{NT}\sum_{t=0}^{NT-1}X_t\,e^{-i2\pi kt/T} = \frac{1}{T}\sum_{t=0}^{T-1}\left[\frac{1}{N}\sum_{p=0}^{N-1}X_{t+pT}\right]e^{-i2\pi kt/T}.$$

The first claim follows from Proposition 5.12. The second claim follows from the inversion of (9.2),

$$\hat{m}_{t,N} = \sum_{k=0}^{T-1}\hat{m}_{k,N}\,e^{i2\pi kt/T},$$

and the first claim. ∎
Proposition 9.3 (Mean-Square Consistency of $\hat{m}_{k,N}$) If $X_t$ is PC-T, $F$ is its spectral measure, and $F_0$ is the diagonal measure (associated with $B_0(\tau)$), then

(a) $\lim_{N\to\infty}E\{[\hat{m}_{k,N} - \hat{m}_k]^2\} = 0$ if and only if

$$F(\{2\pi k/T\}, \{2\pi k/T\}) = F_0(\{2\pi k/T\}) = |\hat{m}_k|^2; \qquad (9.9)$$

(b) the condition $\sum_{u=0}^{T-1}\sum_{k=-\infty}^{\infty}|R_{u+kT,u}| < \infty$ is sufficient for (a) and then

$$\lim_{N\to\infty}N\,E\{[\hat{m}_{k,N} - \hat{m}_k]^2\} = \frac{1}{T}\,f_0(2\pi k/T), \qquad (9.10)$$

where $f_0(\lambda)$ is the (continuous) density defined by

$$f_0(\lambda) = \sum_{\tau=-\infty}^{\infty}B_0(\tau)\,e^{-i\lambda\tau}. \qquad (9.11)$$

Proof. For claim (a), since

$$\hat{m}_{k,N} - \hat{m}_k = \frac{1}{NT}\sum_{t=0}^{NT-1}[X_t - m_t]\,e^{-i2\pi kt/T}, \qquad (9.12)$$

the convergence of $E\{[\hat{m}_{k,N} - \hat{m}_k]^2\}$ to zero is equivalent to the average of $[X_t - m_t]e^{-i2\pi kt/T}$ converging in $L^2$ to zero. A necessary and sufficient condition for this convergence is given in Proposition 5.12 and is expressed by (9.9). The interpretation of (9.9) is that the point mass $F_0(\{2\pi k/T\})$ of the diagonal measure is only large enough to account for $|\hat{m}_k|^2$; there is no random component of positive variance at frequency $\lambda = 2\pi k/T$. For (b), $\sum_{u=0}^{T-1}\sum_{k=-\infty}^{\infty}|R_{u+kT,u}| < \infty$ implies $\sum_\tau|B_0(\tau)| < \infty$, and hence $f_0(\lambda)$ given by (9.11) is clearly continuous. See Proposition 4.3.2 of Brockwell and Davis [28] for the stationary case. ∎

See the problems for another proof of (9.10). From (9.10), if $\sum_{u=0}^{T-1}\sum_{k=-\infty}^{\infty}|R_{u+kT,u}| < \infty$ for $u = s,t$, then it is easy to obtain a corresponding bound for all $N$ sufficiently large.
Almost Sure Consistency. Although almost sure consistency can be addressed directly, we now show how results from stationary univariate sequences can be adapted for PC sequences through the fact that the sequence $Y_t = X_{t+\theta}$ is stationary if $\theta$ is independent of $X$ and uniformly distributed on $0, 1, \dots, T-1$ (see Section 6.8.3). In particular, we use this device to study the almost sure limits of

$$J_{X,N}(\lambda) = \frac{1}{N}\sum_{t=0}^{N-1}[X_t - m_t]\,e^{-i\lambda t};$$

that is, to apply the strong law to the sequence $[X_t - m_t]e^{-i\lambda t}$. Note that $J_{X,N}(\lambda) \to 0$ a.s. implies $\hat{m}_{k,N} \to \hat{m}_k$ a.s.
Proposition 9.4 If $X_t$ is PC-T with mean $m_t$ and $\theta$ is independent of $X_t$ and uniformly distributed on $0, 1, \dots, T-1$, then if either $J_{X,N}(\lambda)$ or $J_{Y,N}(\lambda)$ converges a.s., the other does also, and

$$\lim_{N\to\infty}\left[e^{-i\lambda\theta}J_{Y,N}(\lambda) - J_{X,N}(\lambda)\right] = 0 \quad \text{a.s.} \qquad (9.13)$$

Remark. This result is adapted from the case in which $X_t$ is a PC process indexed on the reals; see Cambanis et al. [30]. A result for APC processes is given there also.

Proof. Set $Z_t = X_t - m_t$ and consider the difference ($\omega$ is sometimes suppressed)

$$e^{-i\lambda\theta}J_{Y,N}(\lambda) - J_{X,N}(\lambda) = \frac{1}{N}\sum_{t=N}^{N-1+\theta}Z_t\,e^{-i\lambda t} - \frac{1}{N}\sum_{t=0}^{\theta-1}Z_t\,e^{-i\lambda t} = f_N(\lambda,\omega) - h_N(\lambda,\omega),$$

where it is assumed $N > T > \theta(\omega)$ for all $\omega$. The quantity $h_N(\lambda,\omega)$ is $O(N^{-1})$ a.s. because $\sum_{t=0}^{\theta-1}|Z_t(\omega)| < \infty$ a.s. due to $E\{|Z_t|\} < \infty$, $t = 0, 1, \dots, T-1$. The convergence of $f_N(\lambda,\omega)$ is a little more troublesome because the interval changes with $N$. But since

$$E\{|f_N(\lambda)|^2\} \le \frac{1}{N^2}\sum_{s=N}^{N-1+T}\sum_{t=N}^{N-1+T}E\{|Z_sZ_t|\} = O(N^{-2}),$$

the Borel-Cantelli lemma provides the result $f_N(\lambda,\omega) \to 0$ a.s. for every fixed $\lambda$. ∎
Many of the known sufficient conditions for almost sure convergence of $J_{X,N}(\lambda)$ to zero appear in the article by Gaposhkin [62], where more subtle conditions, such as item (c) below, are also given. The following conditions result from the application of the preceding proposition to some of these known conditions for stationary sequences.

Proposition 9.5 If $X_t$ is PC-T with mean $m_t$, then each of the following conditions is sufficient for $J_{X,N}(\lambda) \to 0$ a.s.:

(a) for fixed $\lambda$, there exist $\alpha > 0$ and $N_0 > 0$ for which

$$\int_0^{2\pi}\frac{\sin^2\pi N(\eta-\lambda)}{N^2\sin^2\pi(\eta-\lambda)}\,F_0(d\eta) \le \frac{K}{N^\alpha}$$

whenever $N > N_0$;

(b) each of the conditions $B_0(\tau) = O(\tau^{-\alpha})$ for $\alpha > 0$ or $\sum_{\tau=-\infty}^{\infty}|B_0(\tau)| < \infty$ is sufficient for condition (a) uniformly in $\lambda$;

(c) for fixed $\lambda$, convergence of the series

$$\sum_{k=3}^{\infty}\Phi_{2^k}(\lambda)\,\log\log k.$$

Proof. For the stationary sequence $Y_t = X_{t+\theta} - m_{t+\theta}$, conditions (a) and (b) imply $J_{Y,N}(\lambda) \to 0$ a.s. by application of Theorem 6.1 of Doob [49] to the covariance $B_0(\tau)$ of $Y_t$; condition (c) implies $J_{Y,N}(\lambda) \to 0$ a.s. by application of Theorem 4A of Gaposhkin [62]. The claim $J_{X,N}(\lambda) \to 0$ a.s. follows from Proposition 9.4. ∎

Remark. Note that $\max_t|R_{t+\tau,t}| = O(\tau^{-\alpha})$ and $\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{T-1}|R_{t+\tau,t}| < \infty$ are sufficient for the conditions given in (b). This latter condition is equivalent to condition (b) of Lemma 9.1.
In the stationary case (see Doob [49, page 493]) condition (a) is established for $\lambda = 0$ and then argued for $\lambda \ne 0$ by noting that $X_t' = X_te^{-i\lambda t}$ is also stationary with spectral distribution function $F_{X'}(u) = F_X(u+\lambda)$ modulo $[-\pi,\pi)$ (or $[0,2\pi)$). If $X_t$ is PC-T, then $X_t' = X_te^{-i\lambda t}$ also remains PC since

$$E\{X_s'\overline{X_t'}\} = E\{X_s\overline{X_t}\}\,e^{-i\lambda(s-t)} = E\{X_{s+T}\overline{X_{t+T}}\}\,e^{-i\lambda(s+T-t-T)} = E\{X_{s+T}'\overline{X_{t+T}'}\},$$

and its spectral measure $F_{X'}$ is that of $X_t$ shifted down the main diagonal by $\lambda$ (see Section 6.8.5); that is, $F_{X'}(\lambda_1,\lambda_2) = F_X(\lambda_1+\lambda, \lambda_2+\lambda)$, and so the main diagonal measure $F_0$ shifts down by $\lambda$ modulo $2\pi$. Gaposhkin [62] also gives conditions for almost sure convergence (a) in spectral terms and (b) for a class of quasistationary processes. The application of these is left to the reader. Finally, I. Honda [98] applies existing results to the lifted $\mathbf{X}_n$ to show that if $X_t$ is PC-T and Gaussian, then $J_{X,N}(\lambda) \to 0$ a.s. for all $\lambda$ if and only if the diagonal spectral measure $F_0$ is absolutely continuous with respect to Lebesgue measure.
Asymptotic Normality. The two main approaches to asymptotic normality are via the assumptions (1) that the process is linear (an infinite moving average) and (2) that the process is mixing in some sense. We shall give results using both approaches. But first we see how the random shift can be used for this problem. Recall that both $J_{X,N}(\lambda)$ and $J_{Y,N}(\lambda)$ have mean-square limits. The direct proof that $E\{|e^{-i\lambda\theta}J_{Y,N}(\lambda) - J_{X,N}(\lambda)|^2\} \to 0$ is even simpler than the a.s. result (Proposition 9.4) and is left as an exercise. This is enough to give the following, where $\Rightarrow$ denotes convergence in distribution.

Proposition 9.6 If $X_t$ is PC-T with mean $m_t$, and $\theta$ is independent of $X_t$ and uniformly distributed on $0, 1, \dots, T-1$, then $J_{Y,N}(\lambda) \Rightarrow F_Y$ implies $J_{X,N}(\lambda)e^{i\lambda\theta} \Rightarrow F_Y$.

Proof. The proof is a consequence of $e^{-i\lambda\theta}J_{Y,N}(\lambda) - J_{X,N}(\lambda) \xrightarrow{P} 0$ and elementary facts about convergence in distribution. See Loève [139, page 278] or Brockwell and Davis [28, Proposition 6.3.3]. ∎

This result gives a way to obtain asymptotic normality of $\hat{m}_{k,N}$ for a single fixed $k$.
Proposition 9.7 Suppose $X_t$ is PC-T with mean $m_t$, and $\theta$ is independent of $X_t$ and uniformly distributed on $0, 1, \dots, T-1$. Any condition on $X$ that gives $\hat{m}_{k,N} \to \hat{m}_k$ in probability and asymptotic normality for the mean estimator of the stationary sequence $Y_t^{(k)} = X_{t+\theta}e^{-i2\pi k(t+\theta)/T}$ will give asymptotic normality for $\hat{m}_{k,N}$.

Proof. Denote $X_t^{(k)} = X_te^{-i2\pi kt/T}$ (a PC-T sequence) and $Y_t^{(k)} = X_{t+\theta}^{(k)}$. The quantity $J_{Y^{(k)},N}(0)$ converges in distribution to $N(\mu,\sigma^2)$ by hypothesis, while the corresponding mean estimator converges in probability to $e^{-i2\pi k\theta/T}e^{i2\pi k\theta/T}\hat{m}_k = \hat{m}_k$, so evidently $\mu = \hat{m}_k$. Taking $\lambda = 0$ in the preceding proposition gives $\hat{m}_{k,N} = J_{X^{(k)},N}(0) \Rightarrow N(\hat{m}_k, \sigma^2)$. ∎
Proposition 9.8 Suppose $X_t$ is a real linear PC-T sequence

$$X_t = m_t + \sum_{j=-\infty}^{\infty}\psi_j(t)\,\xi_{t-j}, \qquad (9.14)$$

where $m_t = m_{t+T}$ and $\psi_j(t) = \psi_j(t+T)$, $j \in \mathbb{Z}$, are real, $\sum_j|\psi_j(t)| < \infty$ for $t = 0,1,\dots,T-1$, and $\{\xi_t\}$ is a real zero mean i.i.d. sequence. Then if $\Psi$ is of rank $T$,

$$\sqrt{N}\,(\hat{\mathbf{m}}_N - \mathbf{m}) \Rightarrow N(\mathbf{0},\ \Psi\Sigma\Psi'),$$

where $\hat{\mathbf{m}}_N = (\hat{m}_{T-1,N}, \hat{m}_{T-2,N}, \dots, \hat{m}_{0,N})'$, $\mathbf{m} = (m_{T-1}, m_{T-2}, \dots, m_0)'$, and

$$\Psi = \sum_{j=-\infty}^{\infty}\Psi_j, \qquad (9.15)$$

with $\Psi_j$ the $T \times T$ matrix coefficients of the lifted representation (9.16) below.

Proof. The linear PC-T sequence $X_t$, lifted to $\mathbf{X}_n = (X_{nT-1}, X_{nT-2}, \dots, X_{nT-T})'$, has the representation

$$\mathbf{X}_n = \mathbf{m} + \sum_{j=-\infty}^{\infty}\Psi_j\,\boldsymbol{\xi}_{n-j} \qquad (9.16)$$

with $\Psi_j$ given by (9.15) and where $\boldsymbol{\xi}_n = (\xi_{nT-1}, \xi_{nT-2}, \dots, \xi_{nT-T})'$ are i.i.d. with $\operatorname{Cov}(\boldsymbol{\xi}_n, \boldsymbol{\xi}_m) = \Sigma\,\delta_{n-m}$. Since $\sum_{j=-\infty}^{\infty}|[\Psi_j]_{pq}| \le \sum_k|\psi_k(t)| < \infty$, Proposition 11.2.2 of Brockwell and Davis [28] may be applied to conclude the result. ∎

Since the Fourier coefficients $\hat{\mathbf{m}} = (\hat{m}_0, \hat{m}_1, \dots, \hat{m}_{T-1})'$ are linearly related to $\mathbf{m}$ via $\hat{\mathbf{m}} = T^{-1/2}V^*(0)\mathbf{m}$, where $V(\lambda)$ is the unitary matrix defined in Proposition 6.9, we obtain the following.
Corollary 9.8.1 Under the conditions of Proposition 9.8,

(a) if rank $\Psi = T$, then
$$\sqrt N\,(\widehat{\mathbf m}_N - \widehat{\mathbf m}) \Rightarrow N(0,\ T^{-1}V\Psi\Sigma\Psi'V^*);$$

(b) if rank $\Psi \le T$, then for any $\boldsymbol\beta$ with $\boldsymbol\beta'\Psi\Sigma\Psi'\boldsymbol\beta > 0$, we have
$$\sqrt N \sum_{t=0}^{T-1}\beta_t(\hat m_{t,N} - m_t) \Rightarrow N(0,\ \boldsymbol\beta'\Psi\Sigma\Psi'\boldsymbol\beta);$$

(c) if rank $\Psi \le T$, then for any $\boldsymbol\beta$ with $\boldsymbol\beta'V(0)\Psi\Sigma\Psi'V^*(0)\boldsymbol\beta > 0$, we have
$$\sqrt N \sum_{k=0}^{T-1}\beta_k(\hat m_{k,N} - \hat m_k) \Rightarrow N(0,\ T^{-1}\boldsymbol\beta'V(0)\Psi\Sigma\Psi'V^*(0)\boldsymbol\beta).$$
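To make Proposition 9.8 concrete, the following Python sketch (not from the book; all names are ours, and the infinite moving average is truncated to two nonzero coefficient sequences) simulates a real linear PC-T sequence and checks that the seasonal mean estimates approach the periodic mean.

```python
import numpy as np

# Hypothetical illustration of Proposition 9.8: simulate a linear PC-T sequence
# X_t = m_t + sum_j psi_j(t) xi_{t-j} with period T = 4, keeping only j = 0, 1
# nonzero, then compare the seasonal mean estimates with the true periodic mean.
rng = np.random.default_rng(0)

T, N = 4, 2000                       # period and number of observed periods
m = np.array([1.0, -0.5, 2.0, 0.0])  # periodic mean m_t = m_{t+T}
psi = {0: np.array([1.0, 0.5, 1.5, 0.8]),   # psi_j(t), periodic in t
       1: np.array([0.3, -0.2, 0.4, 0.1])}  # truncation of the infinite sum

n = N * T
xi = rng.standard_normal(n + 1)      # i.i.d. zero mean innovations
X = np.empty(n)
for t in range(n):
    X[t] = m[t % T] + sum(psi_j[t % T] * xi[t + 1 - j]
                          for j, psi_j in psi.items())

# Seasonal mean estimator: hat m_{t,N} = (1/N) sum_p X_{t+pT}
m_hat = X.reshape(N, T).mean(axis=0)
print(np.max(np.abs(m_hat - m)))     # small for large N, as consistency predicts
```

The maximum seasonal error shrinks at the $1/\sqrt N$ rate suggested by the normal limit.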
This is a common way to address the convergence of $\sqrt N(\widehat{\mathbf m}_N - \mathbf m)$ to a possibly degenerate normal, meaning its covariance $\Psi\Sigma\Psi'$ is possibly not of full rank. If $\Psi\Sigma\Psi'$ is of full rank, then $\boldsymbol\beta'\Psi\Sigma\Psi'\boldsymbol\beta > 0$ for all nonzero $\boldsymbol\beta$. The notation is still correct if we consider $\boldsymbol\beta$ to be a projection onto a subspace and positivity to mean positive definiteness.

Asymptotic Normality via Mixing. Asymptotic normality can also be obtained from certain mixing conditions that govern the memory of the process. Various notions of mixing exist, and here we shall utilize the concepts of strong ($\alpha$-) mixing and $\phi$-mixing. For a sequence $X_t$, denote the Borel sigma-fields $\mathcal F_t = \sigma(X_s, s \le t)$ and $\mathcal G_t = \sigma(X_s, s \ge t)$. Note that we can ignore the presence of a nonrandom mean $m_t$ in $X_t$ because $\mathcal F_t = \sigma(X_s - m_s, s \le t)$, and similarly for $\mathcal G_t$. We define the $\alpha$-mixing and $\phi$-mixing functions to be
$$\alpha_{t,n} = \sup\{|P(A \cap B) - P(A)P(B)| : A \in \mathcal F_t,\ B \in \mathcal G_{t+n}\}, \qquad (9.17)$$
$$\phi_{t,n} = \sup\{|P(B \mid A) - P(B)| : A \in \mathcal F_t,\ P(A) > 0,\ B \in \mathcal G_{t+n}\}, \qquad (9.18)$$
and see that if $X_t$ is stationary (strictly), $\alpha_{t,n}$ and $\phi_{t,n}$ are independent of $t$. If
$$\lim_{n\to\infty}\sup_t \alpha_{t,n} = 0 \quad\text{or}\quad \lim_{n\to\infty}\sup_t \phi_{t,n} = 0,$$
then $X_t$ is correspondingly called uniformly strongly mixing or uniformly $\phi$-mixing. If $X_t$ is periodically stationary (see Definition 1.2) with period $T$, then $\alpha_{t,n} = \alpha_{t+T,n}$ and $\phi_{t,n} = \phi_{t+T,n}$ for every $n$; furthermore, $\sup_t$ in the preceding displays can be replaced with $\max_{t=0,1,\dots,T-1}$. If $\lim_{n\to\infty}\alpha_{t,n} = 0$ or $\lim_{n\to\infty}\phi_{t,n} = 0$, then we say that a sequence is strongly mixing or $\phi$-mixing for the reference time $t$; but since $\phi_{t-k_1,n+k_1} \le \phi_{t,n} \le \phi_{t+k_2,n-k_2}$ for any $k_1, k_2 \ge 0$, it follows from the left inequality that $\phi$-mixing at any reference time $t$ implies it for all $t$ (similarly for $\alpha$-mixing). Furthermore, $\phi_{t_2,n} = O(\phi_{t_1,n})$ for any $t_1, t_2$ provided $\lim_{n\to\infty}\phi_{t_1,n}$ exists. To see this, in the inequality above set $t = t_2$ and $t - k_1 = t_1 - T$, $t + k_2 = t_1$, where we can take $T > t_1 - t_2 > 0$. Then
$$\frac{\phi_{t_1,n+k_1}}{\phi_{t_1,n}} \le \frac{\phi_{t_2,n}}{\phi_{t_1,n}} \le \frac{\phi_{t_1+T,n-k_2}}{\phi_{t_1,n}}$$
and the limits of the rightmost and leftmost quantities are both unity. The mixing sequences $\phi_{t_1,n}$ and $\phi_{t_2,n}$ may be a little different but are asymptotically the same. Note if a periodically stationary sequence is constructed by interleaving stationary processes having different mixing rates, say, then for each $t$ the sequence $X_{t+jT}$ is $\phi$-mixing with mixing function
$$\phi_n^{(t)} = \sup\{|P(B \mid A) - P(B)| : A \in \mathcal F_m^{(t)},\ P(A) > 0,\ B \in \mathcal G_{m+n}^{(t)}\},$$
where $\mathcal F_m^{(t)} = \sigma(X_{t+jT}, j \le m)$ and $\mathcal G_{m+n}^{(t)} = \sigma(X_{t+jT}, j \ge m+n)$. We may conclude from $\mathcal F_m^{(t)} \subset \mathcal F_{t+mT}$ and $\mathcal G_{m+n}^{(t)} \subset \mathcal G_{t+mT+nT}$ that $\phi_n^{(t)} \le \phi_{t+mT,nT} = \phi_{t,nT}$, and since $\phi_{t,nT} = O(\phi_{t_1,nT})$, the periodic mixing functions $\phi_{t,nT}$ are bounded below by $\max_t \phi_n^{(t)}$. In other words, the slowest individual mixing rate governs the periodic mixing rate. Finally, setting $\bar\alpha_n = \max_t \alpha_{t,n}$, we see that $X_t$ is uniformly strongly mixing ($\lim_{n\to\infty}\bar\alpha_n = 0$) if and only if it is strongly mixing for some $t$; and then also $\bar\alpha_n = O(\alpha_{t,n})$ for any $t$. The same statements hold for $\phi$-mixing.

Rosenblatt [197] showed that strong (i.e., $\alpha$-) mixing along with a moment condition gives asymptotic normality for the sample mean of a stationary process. Using mixing hypotheses, Rozanov [201, Section III] gives a central limit result for the sample mean of a stationary multivariate process but requires that the spectral density matrix be of full rank at $\lambda = 0$. The result for the rank-deficient case is only a slight elaboration of the Rozanov result. The $\phi$-mixing facilitates CLT results for covariance and spectral estimation because of the following fact. If, in (9.18), $\zeta_1$ and $\zeta_2$ are $\mathcal F_t$- and $\mathcal G_{t+n}$-measurable, respectively, then
$$|E\{\zeta_1\zeta_2\} - E\{\zeta_1\}E\{\zeta_2\}| \le 2\,\phi_{t,n}^{1/r}\,\|\zeta_1\|_r\,\|\zeta_2\|_s, \qquad (9.19)$$
where $r, s > 1$ and $1/r + 1/s = 1$. (See Ibragimov [120, Lemma 1.1] or Billingsley [13, page 170].)
Proposition 9.9 If $X_t$ is PC-T and (a) $\bar\alpha_n = O(n^{-1-\epsilon})$ for some $\epsilon > 0$, (b) $E|X_t|^{2+\delta} < \infty$, $t \in \mathbb Z$, for $\delta > 4/\epsilon$, and (c) the spectral density matrix $f$ of the blocked sequence $\mathbf X_n$ is bounded and continuous at $\lambda = 0$, then
$$\sqrt N \sum_{t=0}^{T-1}\beta_t(\hat m_{t,N} - m_t) \Rightarrow N(0,\ 2\pi\boldsymbol\beta'f(0)\boldsymbol\beta)$$
whenever $\boldsymbol\beta'f(0)\boldsymbol\beta > 0$. If $\det[f(0)] \ne 0$, then $\sqrt N(\widehat{\mathbf m}_N - \mathbf m) \Rightarrow N(0,\ 2\pi f(0))$.

Proof. When $f(0)$ is of full rank, this is Theorem 11.2 of Rozanov [201]. If $\boldsymbol\beta'f(0)\boldsymbol\beta > 0$, then the sequence $Y_n = \sum_{j=0}^{T-1}\beta_j X_{nT+j}$ is $\alpha$-mixing with mixing function $\alpha_n^Y = O(\bar\alpha_n)$. Furthermore, the asymptotic variance for the estimator $\hat m_{Y,N} = N^{-1}\sum_{n=0}^{N-1} Y_n$ of $m_Y = \sum_{j=0}^{T-1}\beta_j m_j$ is clearly $\sigma^2 = 2\pi\boldsymbol\beta'f(0)\boldsymbol\beta > 0$. Also, $E|Y_n|^{2+\delta} < \infty$, $n \in \mathbb Z$, and thus
$$\sqrt N(\hat m_{Y,N} - m_Y) \Rightarrow N(0,\ 2\pi\boldsymbol\beta'f(0)\boldsymbol\beta).$$
The asymptotic normality of $\hat m_{k,N}$, $k = 0, 1, \dots, T-1$, can be established from the linear relation between the vectors $(\hat m_{k,N})$ and $(\hat m_{t,N})$ through $V(0)$, as in Proposition 9.8. If $\det[f(0)] \ne 0$, then we easily see that $\bar\alpha_n = O(n^{-1-\epsilon})$ implies that the mixing function for $\mathbf X_n$ satisfies $\alpha_n^{\mathbf X} = O((Tn)^{-1-\epsilon}) = O(n^{-1-\epsilon})$, and so asymptotic normality for $\widehat{\mathbf m}_N$ follows directly from Rozanov [201, Section III]. ∎
Corollary 9.9.1 Under the conditions of Proposition 9.9, if rank $f(0) = T$, then
$$\sqrt N\,(\widehat{\mathbf m}_N - \widehat{\mathbf m}) \Rightarrow N(0,\ 2\pi V f(0)V^*);$$
if rank $f(0) \le T$, then for any $\boldsymbol\beta$ with $\boldsymbol\beta'Vf(0)V^*\boldsymbol\beta > 0$, we have
$$\sqrt N \sum_{k=0}^{T-1}\beta_k(\hat m_{k,N} - \hat m_k) \Rightarrow N(0,\ \boldsymbol\beta'Vf(0)V^*\boldsymbol\beta).$$
9.2 ESTIMATION OF $m_t$: PRACTICE

Since we can describe the mean of a PC-T sequence $X_t$ by $m_t$ or by the Fourier coefficients $\hat m_k$, we wish to estimate either of these quantities. Here we describe the corresponding estimators, denoted by $\hat m_{t,N}$ (9.1) and $\hat m_{k,N}$. Although we can estimate $\hat m_k$ from $\hat m_{t,N}$, as in (9.8), here we estimate it directly from the series by the first line in (9.8), namely,
$$\hat m_{k,N} = \frac{1}{NT}\sum_{t=0}^{NT-1} X_t\,e^{-i2\pi kt/T}, \qquad (9.20)$$
because it can be implemented as a sample Fourier transform of $\{X_t,\ t = 0, 1, \dots, NT-1\}$ evaluated at frequency $2\pi k/T$. This permits the use of a frequency-based method for assessing significance. In the previous sections we have already examined the consistency of these estimators under the assumption that $X_t$ is PC with period $T$; to summarize, under mild and unsurprising hypotheses, these estimators are consistent in several senses. In the following paragraphs we describe the computation of $\hat m_{t,N}$ and $\hat m_{k,N}$, and their implementation by the programs permest.m and permcoeff.m.
9.2.1 Computation of $\hat m_{t,N}$

Since the mean sequence $m_t$ can be an arbitrary periodic real valued sequence, it is also of interest to know whether $m_t$ is properly periodic or whether it is a constant. In order to help the perception of $m_t \equiv m$, our realization of the estimator $\hat m_{t,N}$ (to be described more completely later) includes $1-\alpha$ confidence intervals around each point estimate; these confidence intervals are based on the assumption that for the $t$th season, the random variables $\{Y_p^{(t)} = X_{t+pT} - m_t,\ p = 0, 1, \dots, N-1\}$ are Normal$(0, \sigma_t^2)$. Since $\sigma_t^2$ must be estimated, the confidence intervals are determined by a $t$ distribution with $N-1$ degrees of freedom. The existence of nonoverlapping confidence intervals, as in Figure 1.1, gives a preliminary clue that $m_t \not\equiv m$. One-way analysis of variance can be used to test for $m_t \equiv m$ under the assumption that the random variables $\{Y_p^{(t)}\}$ are Normal$(0, \sigma^2)$ (homogeneous variances in $t$) and that the collections $\{Y_p^{(s)}\}$ and $\{Y_p^{(t)}\}$ are independent whenever $s \ne t$. Note that the condition $m_t \equiv m$ conveys nothing conclusive about the presence of PC structure in the covariance. Indeed, the true mean of a PC sequence may be constant, and a sequence may have a periodic mean and stationary covariance structure. So we cannot use the outcome of a test for $m_t \equiv m$ to avoid the further testing for covariance structure.
Program permest.m. The program permest.m implements the estimator $\hat m_{t,N}$, but slightly more generally because the series may contain missing values and the length of the series may not be an integral number of periods. Given an input time series vector and a specified period $T$, the program computes and returns the periodic mean based on all the values that are present (not missing) in the series. That is, for each $t$ in the base period,
$$\hat m_{t,N} = \frac{1}{N_t}\sum_{\{p\,:\,X_{t+pT}\ \text{not missing}\}} X_{t+pT}, \qquad (9.21)$$
where $N_t = \mathrm{card}\{p \in \{0, 1, \dots, N-1\} : X_{t+pT}\ \text{not missing}\}$. Using a specified $\alpha$, the $1-\alpha$ confidence intervals are computed at each $t = 1, 2, \dots, T$. The original series is plotted with missing values replaced by the periodic mean and marked by "x"; the periodic mean is also plotted along with $1-\alpha$ confidence intervals based on the normality assumption. The $p$-value for a one-way ANOVA test for equality of means is also computed and is presented on the plots. The demeaned series $X_t - \hat m_{t,N}$ is computed and returned to the calling program.

Figure 1.1 presented an application of permest to 40 periods ($T = 24$) of a solar radiation series from station DELTA of the Taconite Inlet Project. The $p$-value from the one-way ANOVA for $m(t) \equiv m$ was computed by MATLAB to be zero. This is not too surprising considering the clarity of the periodicity and the size of the dataset. But even after shortening the dataset to 2 periods we found that the $p$-value is zero, and for 4 periods it again returns a $p$-value of zero.
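A minimal Python sketch of the seasonal-mean computation (9.21) in the style of permest.m, using NaN to mark missing values; the function name `periodic_mean` and the truncation to whole periods are our own simplifications, not the book's MATLAB code.

```python
import numpy as np

# Periodic mean per (9.21): for each season t, average the non-missing X_{t+pT}.
def periodic_mean(x, T):
    n_whole = (len(x) // T) * T          # keep whole periods for reshaping
    seasons = np.asarray(x[:n_whole], dtype=float).reshape(-1, T)
    return np.nanmean(seasons, axis=0)   # per-season mean over non-missing p

x = np.array([1.0, 2.0, 3.0, 1.0, np.nan, 3.0, 1.0, 2.0, 3.0])
print(periodic_mean(x, T=3))             # -> [1. 2. 3.]
```

Season 1 (values 2.0, NaN, 2.0) is averaged over its two non-missing entries, exactly as $N_t$ in (9.21) prescribes.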
9.2.2 Computation of $\hat m_{k,N}$

Although we could construct frequency-based tests when the length is not an integral number of periods, the payoff seems hardly worth the effort, so we assume here that the series is of length $NT$. In this case we have already noted in (9.8) that $\hat m_{k,N} = \tilde X_{NT}(2\pi k/T)$, where
$$\tilde X_{NT}(\lambda) = \frac{1}{NT}\sum_{t=0}^{NT-1} X_t\,e^{-i\lambda t}. \qquad (9.22)$$
Note that the FFT algorithm provides the Fourier transform (9.22) evaluated at the Fourier frequencies $\lambda_j = 2\pi j/NT$, $j = 0, 1, \dots, NT-1$, so that $\hat m_{k,N}$ is the FFT coefficient with index $j = kN$. Taking this view helps us to construct tests for a specific $\hat m_k = 0$ and also for $\{\hat m_k = 0,\ k \ne 0\}$, which exactly corresponds to $m_t \equiv m$. These tests are based on variance contrast, which at some frequency index $j$ is the value of $|\tilde X_{NT}(2\pi j/NT)|^2$ in contrast to the average of the $|\tilde X_{NT}(2\pi j'/NT)|^2$ values in a neighborhood. More details will be given in Chapter 10.
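The FFT view can be sketched as follows (a hypothetical example; numpy's FFT uses the sign convention $e^{-i2\pi jn/NT}$, which matches (9.22) up to the $1/NT$ normalization):

```python
import numpy as np

# With a series of length N*T, hat m_{k,N} of (9.20) equals (1/NT) times the
# DFT coefficient at index j = k*N, as described in the text.
rng = np.random.default_rng(1)
T, N = 6, 50
t = np.arange(N * T)
x = 2.0 + np.cos(2 * np.pi * t / T) + 0.1 * rng.standard_normal(N * T)

k = 1
direct = np.sum(x * np.exp(-2j * np.pi * k * t / T)) / (N * T)   # (9.20)
via_fft = np.fft.fft(x)[k * N] / (N * T)                         # FFT index j = kN
print(np.allclose(direct, via_fft))   # True: the two computations coincide
```

For this cosine-plus-noise series the true $\hat m_1 = 1/2$, and the estimate lands near it.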
Program permcoeff.m. For a real series and specified period $T$, the program permcoeff.m implements the estimator $\hat m_{k,N}$, but slightly more generally because the series may contain missing values and the series may not have a length that is an integral number of periods. Missing values are set to the sample mean of the nonmissing values, and the series is cut to the largest number $N$ of whole periods in the series.
Table 9.1 Results from program permcoeff.m applied to 40 periods of solar radiation data of Figure 1.1. The overall test $\{\hat m_k = 0,\ k = 1, 2, \dots, 11\}$ gives a $p$-value of 0.

k    $|\hat m_{k,N}|$   n1   n2   Variance ratio   p-value
0    2.94e+002          1    16   1.12e+003        3.33e-016
1    6.98e+001          2    32   2.30e+003        0.00e+000
2    4.26e+000          2    32   9.26e+000        6.69e-004
3    9.42e-001          2    32   7.90e-001        4.62e-001
4    1.40e+000          2    32   1.37e+000        2.68e-001
5    1.82e-001          2    32   5.09e+000        1.20e-002
6    5.49e-001          2    32   5.24e-001        5.97e-001
7    9.20e-001          2    32   1.76e+000        1.88e-001
8    2.14e-001          2    32   9.41e-002        9.10e-001
9    1.76e-001          2    32   1.07e-001        8.98e-001
10   7.68e-001          2    32   1.57e+000        2.24e-001
11   6.61e-001          2    32   3.61e+000        3.87e-002
The values of $\hat m_{k,N}$ are computed only for $k = 0, 1, \dots, \lfloor(T-1)/2\rfloor$ because for real series $\hat m_k = \overline{\hat m}_{T-k}$. In addition, the $p$-values for the test $\hat m_k = 0$, discussed in the previous paragraphs, are returned. The results of permcoeff.m when applied to 40 periods (with $T = 24$) of the solar radiation data of Figure 1.1 are presented in Table 9.1. Note the $k = 0$ term corresponds to the sample mean. The value of $|\hat m_{k,N}|$ for $k = 1$ is so significant that the reported $p$-value is 0, and so a $p$-value correction is meaningless. However, when shortening the series to 4 periods, the value of $n_2$ is just 2 and the $p$-value for $k = 1$ is 1.39e-005, still significant even when corrected by a factor of 11. The overall test $\{\hat m_k = 0,\ k \ne 0\}$ gives a $p$-value of zero.
9.3 ESTIMATION OF $R(t+\tau, t)$: THEORY

Now we address the estimation of $R(t+\tau, t)$ and its Fourier coefficients $B_k(\tau)$. For motivation, we recall that for a stationary sequence $X_t$ having correlation function $R(\tau) = E\{X_{t+\tau}X_t\}$, the natural estimator for $R(\tau)$ based on a finite sample of length $N$ is the well known
$$\hat R_N(\tau) = \frac{1}{N}\sum_{t=0}^{N-\tau-1}[X_{t+\tau} - \hat m_N][X_t - \hat m_N], \qquad (9.23)$$
which, under conditions such as $\lim_{\tau\to\infty}R(\tau) = 0$ or $\sum_\tau |R(\tau)| < \infty$, is mean-square consistent. If $X$ is Gaussian, then $\hat R_N$ is consistent if and only if the spectral d.f. (measure) for $X$ has no discrete component (see Doob [49, Theorem 7.1]). The estimator for the autocorrelation function is
$$\hat\rho_N(\tau) = \frac{\hat R_N(\tau)}{\hat R_N(0)}. \qquad (9.24)$$
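A short Python sketch of (9.23) and (9.24), dividing by $N$ rather than $N-\tau-1$ as discussed below; the function names are ours.

```python
import numpy as np

# Biased ("divide by N") autocovariance and autocorrelation estimators for a
# stationary series; dividing by N keeps hat R_N nonnegative definite in tau.
def acov(x, tau):
    x = np.asarray(x, dtype=float)
    N, m = len(x), np.mean(x)
    return np.sum((x[tau:] - m) * (x[:N - tau] - m)) / N   # (9.23)

def acor(x, tau):
    return acov(x, tau) / acov(x, 0)                        # (9.24)

x = [1.0, -1.0, 1.0, -1.0]
print(acov(x, 1), acor(x, 1))   # -0.75 -0.75
```

For this alternating series the mean is 0, $\hat R_4(0) = 1$, and the three lag-1 products are each $-1$, so $\hat R_4(1) = -3/4$.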
9.3.1 Estimation of $R(t+\tau, t)$

Corresponding results for PC sequences are easily obtained for estimation of the covariance $R(t+\tau, t)$ based on a finite sample $X_0, X_1, \dots, X_{NT-1}$. The periodicity $R(t+\tau, t) = R(t+\tau+T, t+T)$ suggests the estimator
$$\hat R_N(t+\tau, t) = \frac{1}{N}\sum_{k=0}^{N-1}[X_{t+kT+\tau} - \hat m_{t+\tau,N}][X_{t+kT} - \hat m_{t,N}] \qquad (9.25)$$
for $t = 0, 1, \dots, T-1$ and for all $\tau$ possible, and then use $\hat R_N(s, t) = \hat R_N(s+T, t+T)$ as necessary. Denoting $\hat\sigma_N^2(t) = \hat R_N(t, t)$, the autocorrelation estimator is
$$\hat\rho_N(t+\tau, t) = \frac{\hat R_N(t+\tau, t)}{\hat\sigma_N(t+\tau)\,\hat\sigma_N(t)}. \qquad (9.26)$$
Recall in the stationary case that the sum in (9.23) may be divided by $N - \tau - 1$ rather than $N$ to obtain the maximum likelihood estimator. But $N$ is often preferred because $\hat R_N(\tau)$ will be a NND function of $\tau$. Later we will have need for an estimator for $R(s, t)$ that is a NND function of $(s, t)$. Generally, $\hat R_N(s, t)$ given by (9.25) will not be NND unless $N = KT$ for integer $K$. This can be ensured by either truncating the series to $KT$ observations, where $K = \lfloor N/T \rfloor$, or by filling the series with zeros from $N+1$ to $KT$, where $K = \lfloor N/T \rfloor + 1$. This process produces estimates that may be interpreted as the components of the matrix autocovariance and autocorrelation functions of the lifted sequence $\mathbf X_n$. That is, if
$$\hat R_K^{\mathbf X}(h) = \frac{1}{K}\sum_{k=0}^{K-1}[\mathbf X_{k+h} - \widehat{\mathbf m}_K][\mathbf X_k - \widehat{\mathbf m}_K]' \qquad (9.27)$$
and
$$[\hat\rho_K^{\mathbf X}(h)]_{ij} = \frac{[\hat R_K^{\mathbf X}(h)]_{ij}}{\{[\hat R_K^{\mathbf X}(0)]_{ii}\,[\hat R_K^{\mathbf X}(0)]_{jj}\}^{1/2}}, \qquad (9.28)$$
the mapping (1.10) implies that $[\mathbf X_{k+h}]_i = X_{i+(k+h)T}$, and so the correspondence between $\hat R_N^{\mathbf X}(h)$ and $\hat R_N(t+\tau, t)$ for $N = KT$ is
$$[\hat R_N^{\mathbf X}(h)]_{ij} = \hat R_N(i+hT,\ j) = \hat R_N(j + (i-j) + hT,\ j).$$
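The truncate-and-block construction can be sketched as follows; `lifted_acov` is a hypothetical helper, not the book's code.

```python
import numpy as np

# Cut the series to K whole periods, form the blocked vectors X_n in R^T, and
# compute the matrix autocovariance in the spirit of (9.27).
def lifted_acov(x, T, h):
    K = len(x) // T
    blocks = np.asarray(x[:K * T], dtype=float).reshape(K, T)  # row n = block X_n
    c = blocks - blocks.mean(axis=0)
    # (1/K) sum_k (X_{k+h} - m)(X_k - m)'
    return sum(np.outer(c[k + h], c[k]) for k in range(K - h)) / K

x = np.arange(12.0) % 4        # period-4 sequence 0,1,2,3,0,1,2,3,...
R0 = lifted_acov(x, T=4, h=0)
print(R0)                      # zero matrix: each season is constant
```

Each matrix entry corresponds componentwise to a seasonal covariance $\hat R_N(t+\tau, t)$, as in the correspondence above.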
Sufficient conditions for consistency can be given for linear PC sequences or directly in terms of $R(t+\tau, t)$. For the former, Theorem 11.2.1 of Brockwell and Davis [28] gives the following.

Proposition 9.10 If $X_t$ is a linear PC-T sequence and satisfies the conditions of Proposition 9.8, then
$$\lim_{N\to\infty}[\hat R_N^{\mathbf X}(h)]_{ij} = [R^{\mathbf X}(h)]_{ij} \quad\text{in probability}. \qquad (9.29)$$
Next, we obtain consistency results for an estimator expressed more clearly in terms of $X_t$; specifically, consider
$$\hat R_N^m(t+\tau, t) = \frac{1}{N}\sum_{k=0}^{N-1}[X_{t+kT+\tau} - m_{t+\tau}][X_{t+kT} - m_t], \qquad (9.30)$$
in which the mean sequence $m_t$ is assumed to be known. The following lemma shows that under broad conditions, $|\hat R_N^m(t+\tau, t) - \hat R_N(t+\tau, t)| \to 0$ in probability.

Lemma 9.2 If $X_t$ is PC-T and has fourth moments, and if $\hat m_{s,N} \xrightarrow{P} m_s$ for $s = t$ or $s = t+\tau$, then
$$\Delta R_N = \hat R_N(t+\tau, t) - \hat R_N^m(t+\tau, t) \xrightarrow{P} 0.$$

Proof. Denote
$$Z_{t,\tau} = [X_{t+\tau} - \hat m_{t+\tau,N}][X_t - \hat m_{t,N}] - R(t+\tau, t) \qquad (9.31)$$
and
$$Z^0_{t,\tau} = [X_{t+\tau} - m_{t+\tau}][X_t - m_t] - R(t+\tau, t). \qquad (9.32)$$
Then a simple direct computation expresses $\Delta R_N$ in terms of the sample means of $Z_{t,\tau}$ and $Z^0_{t,\tau}$ and the errors $\hat m_{t,N} - m_t$ and $\hat m_{t+\tau,N} - m_{t+\tau}$. Since $R(s,t)$ must be bounded, due to $|R(s,t)| \le \max_t E\{X_t^2\} = M_2$, the error $\hat m_{t,N} - m_t$ is bounded in probability for each $t = 0, 1, \dots, T-1$. Hence if either $\hat m_{t,N} \xrightarrow{P} m_t$ or $\hat m_{t+\tau,N} \xrightarrow{P} m_{t+\tau}$, the result follows by straightforward convergence results. (See Proposition 6.1.1 of Brockwell and Davis [28].) ∎
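The known-mean estimator can be sketched in Python; the helper name `pc_acov` and the handling of edge lags are our own choices, not the book's.

```python
import numpy as np

# Known-mean seasonal covariance: average the demeaned lag products
# [X_{t+kT+tau} - m_{t+tau}][X_{t+kT} - m_t] over the available periods.
def pc_acov(x, m, T, t, tau):
    x = np.asarray(x, dtype=float)
    m_per = np.asarray(m, dtype=float)
    mt = lambda s: m_per[s % T]                  # periodic extension of m_t
    ks = [k for k in range(len(x) // T) if t + k * T + tau < len(x)]
    return sum((x[t + k*T + tau] - mt(t + tau)) * (x[t + k*T] - mt(t))
               for k in ks) / len(ks)

x = np.array([2.0, -2.0, 2.0, -2.0, -2.0, 2.0])  # T = 2, zero periodic mean
print(pc_acov(x, m=[0.0, 0.0], T=2, t=0, tau=1)) # average of X_{2k+1} X_{2k}
```

Here the three lag products are $(-2)(2)$, $(-2)(2)$, $(2)(-2)$, so the estimate is $-4$.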
The convergence $\Delta R_N \xrightarrow{P} 0$ is sufficient to ensure the equality of the limits in the various modes of convergence of interest here. The following proposition gives conditions for consistency in terms of the zero mean sequence $Z^0_{t,\tau}$, whose covariance is $R_{Z^0}(t_1, t_2, \tau) = E\{Z^0_{t_1,\tau}Z^0_{t_2,\tau}\}$.

Proposition 9.11 If $X$ is PC-T with bounded fourth moments, then each of the following is sufficient for mean-square consistency of $\hat R_N^m(t+\tau, t)$ (for $t, \tau$ fixed):

(a) $\lim_{N\to\infty}\dfrac{1}{N^2}\displaystyle\sum_{j=0}^{N-1}\sum_{k=0}^{N-1} R_{Z^0}(t+jT,\ t+kT,\ \tau) = 0$;

(b) $\lim_{k\to\infty} R_{Z^0}(t+jT,\ t+jT+kT,\ \tau) = 0$ uniformly in $j$.

If $Z^0_{t+jT,\tau}$ is stationary in $j$, these simplify to

(a') $\lim_{N\to\infty}\dfrac{1}{N}\displaystyle\sum_{k=0}^{N-1}\left(1 - \frac{k}{N}\right)R_{Z^0}(t,\ t+kT,\ \tau) = 0$;

(b') $\lim_{k\to\infty} R_{Z^0}(t,\ t+kT,\ \tau) = 0$;

(c') $\displaystyle\sum_{j=0}^{\infty}|R_{Z^0}(t,\ t+jT,\ \tau)|^2 < \infty$.

If $X_t$ is Gaussian, the following conditions suffice for (b') and (c'):

(b'') $R(u+kT, v) \to 0$ as $k \to \infty$ for $(u,v) = (t+\tau, t),\ (t, t),\ (t+\tau, t+\tau),\ (t, t+\tau)$;

(c'') $\displaystyle\sum_{j=0}^{\infty}|R(u+jT, v)|^2 < \infty$ for $(u,v) = (t+\tau, t),\ (t, t),\ (t+\tau, t+\tau),\ (t, t+\tau)$.

Proof. Item (a) follows from the computation
$$\mathrm{Var}\big[\hat R_N^m(t+\tau, t)\big] = \frac{1}{N^2}\sum_{j=0}^{N-1}\sum_{k=0}^{N-1} R_{Z^0}(t+jT,\ t+kT,\ \tau). \qquad (9.33)$$
Item (b) is an application of (a) from Lemma 9.1 to the sequence $Z^0_{t,\tau}$. Item (a') follows from item (a) using the Toeplitz structure of stationary covariances. Item (b') can be seen directly from item (b), and item (c') is sufficient for (b'). Item (b'') is sufficient for item (b') by use of Isserlis' formula,
$$E\{X_{t_1}X_{t_2}X_{t_3}X_{t_4}\} = E\{X_{t_1}X_{t_2}\}E\{X_{t_3}X_{t_4}\} + E\{X_{t_1}X_{t_3}\}E\{X_{t_2}X_{t_4}\} + E\{X_{t_1}X_{t_4}\}E\{X_{t_2}X_{t_3}\}, \qquad (9.34)$$
which for the zero mean Gaussian case gives
$$R_{Z^0}(s, t, \tau) = R(s+\tau, t+\tau)R(s, t) + R(s+\tau, t)R(s, t+\tau). \qquad (9.35)$$
Again by using (9.35) and the Schwarz inequality, item (c'') suffices for $\sum_{j=0}^{\infty}|R_{Z^0}(t,\ t+jT,\ \tau)|^2 < \infty$, which implies (c'), or (b') directly. ∎
It is important to observe that the presence of a discrete component in the spectrum of $Z^0_{t,\tau}$ is in contradiction to the conditions of this proposition. In the Gaussian case, the presence of any discrete components in the spectrum of $X_t$ is in contradiction to the conditions of this proposition.

Almost Sure Consistency. Conditions for almost sure consistency may be obtained from the stationary case when $\{Z^0_{t,\tau}\}$ is PC in $t$ for fixed $\tau$, for then $\{Z^0_{t+jT,\tau}\}$ is stationary in $j$ for fixed $t, \tau$.
Proposition 9.12 If $X$ is PC-T with bounded fourth moments and $\{Z^0_{t+jT,\tau}\}$ is stationary in $j$ for fixed $t, \tau$, then the following are sufficient for a.s. consistency of $\hat R_N^m(t+\tau, t)$ (for $t, \tau$ fixed):

(a) for some $\alpha > 0$,
$$\frac{1}{N}\sum_{j=0}^{N-1} R_{Z^0}(t+jT,\ t,\ \tau) = O(N^{-\alpha}),$$
where $R_{Z^0}(t+jT,\ t,\ \tau) = E\{Z^0_{t+jT,\tau}Z^0_{t,\tau}\}$;

(b) either $\sum_{j=-\infty}^{\infty}|R_{Z^0}(t+jT,\ t,\ \tau)| < \infty$ or $R_{Z^0}(t+jT,\ t,\ \tau) = O(j^{-\alpha})$ is sufficient for (a);

(c) if $X_t$ is Gaussian, the conditions $\sum_{j=-\infty}^{\infty}|R(u+jT, v)|^2 < \infty$ or $R(u+jT, v) = O(j^{-\alpha})$ for $(u,v) = (t+\tau, t),\ (t, t),\ (t+\tau, t+\tau),\ (t, t+\tau)$ are sufficient for those in (b).

Proof. Item (a) results from the application of Theorem 6.2 of Doob [49] to the stationary sequence $\{Z^0_{t+jT,\tau}\}$. Both claims of item (b) are an application of Doob Theorem 7.1.1 to the stationary sequence $\{Z^0_{t+jT,\tau}\}$. For item (c), application of the Isserlis formula (9.34) yields
$$\sum_{j=-\infty}^{\infty}|R_{Z^0}(t+jT,\ t,\ \tau)| \le \sum_{j=-\infty}^{\infty}\big[|R(t+jT+\tau,\ t+\tau)R(t+jT,\ t)| + |R(t+jT+\tau,\ t)R(t+jT,\ t+\tau)|\big]$$
and, for example, the first of the sums on the right-hand side is bounded by
$$\left[\sum_{j=-\infty}^{\infty} R(t+jT+\tau,\ t+\tau)^2\right]^{1/2}\left[\sum_{j=-\infty}^{\infty} R(t+jT,\ t)^2\right]^{1/2}$$
and the other similarly. Thus the first condition in item (b) is satisfied. If $R(u+jT, v) = O(j^{-\alpha})$ for $(u,v) = (t+\tau, t),\ (t, t),\ (t+\tau, t+\tau),\ (t, t+\tau)$, then again by (9.35) we conclude the second condition, $R_{Z^0}(t+jT,\ t,\ \tau) = O(j^{-\alpha})$, in item (b) is satisfied. ∎
Asymptotic Normality. The first results on asymptotic normality for estimators of covariances of PC sequences were due to Pagano [175] for periodic autoregressions, and to Vecchia and Ballerini [219], who obtained asymptotic normality results for one-sided (causal) infinite periodic moving averages (8.15) under the additional conditions $\sum_{j\ge n}|a_j(t)| < \infty$ and that the orthonormal sequence $\xi_k$ has fourth moments, $E\{\xi_k^4\} < \infty$. We will mainly concentrate here on the results based on $\phi$-mixing (introduced by Ibragimov [120]), but an outline of the linear model approach is given in the supplements. The mixing approach has been used to obtain consistency and asymptotic normality for covariance and spectral estimators for almost PC processes in continuous time [108, 109, 135]. Although the results are easier here for discrete time PC processes, the main ideas are present. The following lemma relates the mixing function for $Z_{t,\tau} = X_{t+\tau}X_t$ to the mixing function for $X_t$.

Lemma 9.3 If $X_t$ is periodically stationary with period $T$ and uniformly $\phi$-mixing with mixing function $\phi_n$, then for arbitrary real numbers $\beta_1, \beta_2$ and arbitrary integers $t_1, \tau_1, t_2, \tau_2$, the sequence
$$\zeta_j = \beta_1 X_{t_1+\tau_1+jT}X_{t_1+jT} + \beta_2 X_{t_2+\tau_2+jT}X_{t_2+jT}$$
is $\phi$-mixing with mixing function $\phi_{(n-n_0)T}$, where $n_0 \ge 0$ is a fixed integer.

Proof. Set $n_{\min} = \lfloor\min\{t_1, \tau_1, t_2, \tau_2\}/T\rfloor$ and $n_{\max} = \lfloor\max\{t_1, \tau_1, t_2, \tau_2\}/T\rfloor + 1$. Since the variables appearing in $\zeta_j$ are measurable with respect to the sigma-fields generated by $\{X_s : (j+n_{\min})T \le s \le (j+n_{\max})T\}$, we have that the mixing function of $\zeta_j$ is dominated by $\phi_{(n-n_0)T}$, where $n_0 = n_{\max} - n_{\min}$. Since we are only interested in $\phi_{(n-n_0)T}$ as $n \to \infty$, we can take $\phi_n = 0$ for $n \le 0$. ∎
Proposition 9.13 If $X_t$ is periodically stationary with period $T$, $EX_t^4 < \infty$, $t \in \mathbb Z$, and uniformly $\phi$-mixing with $\sum_{n=0}^{\infty}(\phi_n)^{1/2} < \infty$, then for any $t = 0, 1, \dots, T-1$ and $\tau \in \mathbb Z$ for which $\sigma^2(t,\tau) = \sum_{j=-\infty}^{\infty} R_{Z^0}(t+jT,\ t,\ \tau) > 0$,
$$\sqrt N\,\big[\hat R_N^m(t+\tau, t) - R(t+\tau, t)\big] \Rightarrow N(0,\ \sigma^2(t,\tau)).$$

Proof. The hypotheses imply that $Z^0_{t+jT,\tau}$ is a zero mean stationary sequence in $j$ with $E\{(Z^0_{t+jT,\tau})^2\} < \infty$, $j \in \mathbb Z$, and the estimator for the mean of $Z^0_{t+jT,\tau}$ is $\hat R_N^m(t+\tau, t) - R(t+\tau, t)$. The preceding lemma with $\beta_1 = 1$, $\beta_2 = 0$ implies, for every $t = 0, 1, \dots, T-1$ and $\tau \in \mathbb Z$, that $Z^0_{t+jT,\tau}$ is $\phi$-mixing with mixing function $\phi_{(n-n_0)T}$, and hence the results of Ibragimov [120, Theorem 1.5] can be applied directly. The condition $\sum_{n=0}^{\infty}(\phi_{(n-n_0)T})^{1/2} < \infty$ along with the relation (9.19) for $r = s = 2$ ensures the existence of the sum $\sigma^2(t,\tau) = \sum_{j=-\infty}^{\infty}E\{Z^0_{t+jT,\tau}Z^0_{t,\tau}\} = \sum_{j=-\infty}^{\infty}R_{Z^0}(t+jT,\ t,\ \tau)$, even though the limit may be zero, an example of which is given in the supplements. ∎
Corollary 9.13.1 Under the conditions of Proposition 9.13, for any vector $\boldsymbol\beta$ such that $\boldsymbol\beta'\Sigma\boldsymbol\beta > 0$, we have
$$\sqrt N \sum_{k=1}^{K}\beta_k\big[\hat R_N^m(t_k+\tau_k,\ t_k) - R(t_k+\tau_k,\ t_k)\big] \Rightarrow N(0,\ \boldsymbol\beta'\Sigma\boldsymbol\beta),$$
where
$$[\Sigma]_{pq} = \sum_{j=-\infty}^{\infty} E\{Z^0_{t_p+jT,\tau_p}Z^0_{t_q,\tau_q}\}$$
and $\beta_k$, $k = 1, 2, \dots, K$, are arbitrary real numbers and $(t_k, \tau_k)$ are arbitrary unique pairs of integers.

Proof. Consider now, for arbitrary real $\beta_k$ and integers $t_k, \tau_k$, $k = 1, 2, \dots, K$, the sequence
$$\zeta_j = \sum_{k=1}^{K}\beta_k Z^0_{t_k+jT,\tau_k},$$
which is clearly stationary in $j$ and of mean zero, $E\{\zeta_j\} = \sum_{k=1}^{K}\beta_k E\{Z^0_{t_k+jT,\tau_k}\} = 0$. From the preceding lemma, $\zeta_j$ is $\phi$-mixing with mixing function $\phi_{(n-n_0)T}$, for which $\sum_{n=0}^{\infty}(\phi_{(n-n_0)T})^{1/2} < \infty$ and where $n_0$ depends on the collection $\{(t_k, \tau_k) : k = 1, 2, \dots, K\}$. Setting
$$J_N = \frac{1}{N}\sum_{j=0}^{N-1}\sum_{k=1}^{K}\beta_k Z^0_{t_k+jT,\tau_k}$$
produces
$$N\,\mathrm{Var}(J_N) = N\,E\{J_N^2\} \to \boldsymbol\beta'\Sigma\boldsymbol\beta,$$
which shows the asymptotic variance of $\sqrt N J_N$ is $\boldsymbol\beta'\Sigma\boldsymbol\beta$. The sums defining $[\Sigma]$ exist due to $\sum_{n=0}^{\infty}(\phi_{(n-n_0)T})^{1/2} < \infty$ and the inequality (9.19). The preceding proposition (or Ibragimov [120, Theorem 1.5]) can be applied provided the asymptotic variance is positive, $\boldsymbol\beta'\Sigma\boldsymbol\beta > 0$. ∎

If $X_t$ is stationary and Gaussian, several facts combine to make a result that depends only on a summability of covariances.

Proposition 9.14 If $X_t$ is periodically stationary with period $T$, Gaussian, and $\sum_{t=0}^{T-1}\sum_{\tau=-\infty}^{\infty}|R(t+\tau, t)|^2 < \infty$, then
$$\sqrt N\,\big[\hat R_N^m(t+\tau, t) - R(t+\tau, t)\big] \Rightarrow N(0,\ \sigma^2(t,\tau)), \qquad t = 0, 1, \dots, T-1,$$
where $\sigma^2(t,\tau) = \sum_{j=-\infty}^{\infty}R_{Z^0}(t+jT,\ t,\ \tau)$ and
$$R_{Z^0}(t+jT,\ t,\ \tau) = R(t+jT+\tau,\ t+\tau)R(t+jT,\ t) + R(t+jT+\tau,\ t)R(t+jT,\ t+\tau).$$
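For a concrete Gaussian example, the asymptotic variance $\sigma^2(t,\tau)$ of Proposition 9.14 can be computed directly from a hypothetical covariance $R$; here we take PC white noise, for which the sum over $j$ collapses to the $j = 0$ term.

```python
import numpy as np

# Hypothetical PC-T example: X_t = sigma_t * xi_t with i.i.d. standard normal
# xi_t, so R(s, t) = sigma_t^2 if s == t and 0 otherwise (period T = 3).
sigma = np.array([1.0, 2.0, 0.5])
T = len(sigma)

def R(s, t):
    return sigma[t % T] ** 2 if s == t else 0.0

def RZ(s, t, tau):        # Isserlis expansion of R_{Z^0} for Gaussian X
    return R(s + tau, t + tau) * R(s, t) + R(s + tau, t) * R(s, t + tau)

def sigma2(t, tau, jmax=50):   # sigma^2(t,tau) = sum_j R_Z(t + jT, t, tau)
    return sum(RZ(t + j * T, t, tau) for j in range(-jmax, jmax + 1))

print(sigma2(0, 0), sigma2(1, 0))  # 2*sigma_0^4 = 2.0 and 2*sigma_1^4 = 32.0
```

At $\tau = 0$ only the $j = 0$ term survives and equals $2\sigma_t^4$, the familiar variance of a sample variance of Gaussian data.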
9.3.2 Estimation of $B_k(\tau)$

Since in the stationary case the natural estimator for the covariance is (9.23), examination of (6.2) suggests the following form for the estimator of $B_k(\tau)$ based on a sample of length $NT$:
$$\hat B_{k,NT}(\tau) = \frac{1}{NT}\sum_{t \in I_{NT,\tau}}[X_{t+\tau} - \hat m_{t+\tau,N}][X_t - \hat m_{t,N}]\,e^{-i2\pi kt/T}, \qquad (9.36)$$
where the interval $I_{NT,\tau}$ over which the sum is performed (for $\tau \ge 0$, $I_{NT,\tau} = \{0, 1, \dots, NT-\tau-1\}$) must be modified for $\tau < 0$ because $X_t$ is no longer stationary.

An alternative approach is to base the estimation of $B_k(\tau)$ on its Fourier relationship to $R(t+\tau, t)$. For this approach we would seek conditions for which the estimator $\hat R_N(t+\tau, t)$ is consistent for each $t = 0, 1, \dots, T-1$, and then use the fact that each element of the vector $\mathbf B(\tau) = (B_0(\tau), B_1(\tau), \dots, B_{T-1}(\tau))'$ is given by (6.2), which in vector form is
$$\mathbf B(\tau) = T^{-1/2}\,V^*(0)\,[R(0+\tau,\ 0),\ R(1+\tau,\ 1),\ \dots,\ R(T-1+\tau,\ T-1)]'.$$
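A Python sketch of an estimator in the spirit of (9.36), simplified here to a zero-mean series so no demeaning is needed; the function name and the test sequence are ours, not the book's.

```python
import numpy as np

# For tau >= 0 and a zero-mean series of length N*T, average the lag products
# against the periodic exponential e^{-i 2 pi k t / T}, as in (9.36).
def B_hat(x, T, k, tau):
    x = np.asarray(x, dtype=complex)
    n = len(x)
    t = np.arange(n - tau)
    return np.sum(x[t + tau] * x[t] * np.exp(-2j * np.pi * k * t / T)) / n

T, N = 4, 400
rng = np.random.default_rng(2)
sigma = np.array([1.0, 3.0, 1.0, 3.0])          # PC white noise, period 4
x = np.repeat([sigma], N, axis=0).ravel() * rng.standard_normal(N * T)

# For PC white noise, B_0(0) is the average variance (1/T) sum_t sigma_t^2 = 5.
print(B_hat(x, T, k=0, tau=0).real)             # close to 5 for large N
```

The $k = 0$ coefficient at $\tau = 0$ is just the time-averaged variance, which the estimate recovers.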
In this manner many of the preceding results adapt easily to give conditions for consistency. We will concentrate on the first approach because it can be applied to the case of almost PC processes, for which the covariance $R(t+\tau, t)$ cannot be directly estimated. See the supplements for a list of references that address the estimation of the Fourier coefficients of $B(t, \tau)$ in the almost periodic case. As in the estimation of $R(t+\tau, t)$, we actually study
$$\hat B^m_{k,NT}(\tau) = \frac{1}{NT}\sum_{t \in I_{NT,\tau}}[X_{t+\tau} - m_{t+\tau}][X_t - m_t]\,e^{-i2\pi kt/T} \qquad (9.37)$$
and use the following lemma.

Lemma 9.4 If $X_t$ is PC-T and has fourth moments, and if $\hat m_{t,N} \xrightarrow{P} m_t$ for $t = 0, 1, \dots, T-1$, then
$$\Delta B_N = \hat B_{k,NT}(\tau) - \hat B^m_{k,NT}(\tau) \xrightarrow{P} 0.$$
Proof. Computing $\Delta B_N$ directly gives
$$\Delta B_N = \frac{1}{NT}\sum_{t \in I_{NT,\tau}}\big[X_{t+\tau}(m_t - \hat m_{t,N}) + X_t(m_{t+\tau} - \hat m_{t+\tau,N}) + (\hat m_{t+\tau,N}\hat m_{t,N} - m_{t+\tau}m_t)\big]e^{-i2\pi kt/T},$$
and the moduli of the resulting sums are bounded by products of the form
$$\left[\frac{1}{NT}\sum_{t \in I_{NT}} X_t^2\right]^{1/2}\left[\frac{1}{NT}\sum_{t \in I_{NT,\tau}}(\hat m_{t,N} - m_t)^2\right]^{1/2}, \qquad (9.38)$$
where the latter bounds are due to the Schwarz inequality for finite sums. Each of the quantities on the left in (9.38) is bounded in probability, due to the Chebyshev inequality and the fact that they have finite expectation. By hypothesis, the quantity in each of the right brackets in (9.38) converges to zero in probability, giving the claimed result using the same argument as in Lemma 9.2. ∎

For the next lemma, set $M_2 = \max_t E\{X_t^2\}$.
Lemma 9.5 If $X_t$ is PC-T, then
$$E\{\hat B^m_{k,NT}(\tau)\} = B_k(\tau) + \frac{1}{NT}\,\varepsilon_{NT}(\tau), \qquad (9.39)$$
where $\varepsilon_{NT}(\tau)$ is bounded for all $N$ and $\tau$. If $\sum_{\tau=-\infty}^{\infty}|R(t+\tau, t)| < \infty$ for $t = 0, 1, \dots, T-1$, then there is a constant $K$ for which
$$\sum_{\tau=-\infty}^{\infty}|\varepsilon_{NT}(\tau)| \le K. \qquad (9.40)$$
Proof. We shall show the result for $\tau \ge 0$. For any $N$,
$$E\{\hat B^m_{k,NT}(\tau)\} = \frac{1}{NT}\sum_{t=0}^{NT-\tau-1} R(t+\tau, t)\,e^{-i2\pi kt/T}, \qquad (9.41)$$
where the first $mT$ terms reproduce $B_k(\tau)$, $m$ being the largest integer in $(NT-\tau-1)/T$. Since $NT - \tau - 1 = mT + a$, where $0 \le a < T$, the required $\varepsilon_{NT}(\tau)$ is given by
$$\varepsilon_{NT}(\tau) = -a\,B_k(\tau) + \sum_{t=mT}^{NT-\tau-1} R(t+\tau, t)\,e^{-i2\pi kt/T}$$
and
$$|\varepsilon_{NT}(\tau)| \le 2\sum_{t=0}^{T-1}|R(t+\tau, t)| \le 2TM_2, \qquad (9.42)$$
where $M_2$ is the bound on the second moments of $X_t$. The second part of the claim follows from (9.42). ∎
Proposition 9.15 If $X$ is PC-T with $E\{X_t^4\} \le M_4$, then each of the following is sufficient for the mean-square consistency of $\hat B^m_{k,NT}(\tau)$ for $k = 0, 1, \dots, T-1$:

(a) $\lim_{u\to\infty} R_{Z^0}(t+u,\ t,\ \tau) = 0$ uniformly in $t$;

(b) $Z^0_{t,\tau}$ is PC in $t$ and $\sum_{t_1=-\infty}^{\infty}\sum_{t_2=0}^{T-1}|R_{Z^0}(t_1, t_2, \tau)| < \infty$;

(c) $X$ is Gaussian and $\sum_{\tau=-\infty}^{\infty} R^2(t+\tau, t) < \infty$;

(d) $X$ is Gaussian and $\lim_{\tau\to\infty} R(t+\tau, t) = 0$ uniformly in $t$.

Proof. We shall show the result for $\tau \ge 0$. First note that if
$$J_{k,NT}(\tau) = \hat B^m_{k,NT}(\tau) - E\{\hat B^m_{k,NT}(\tau)\} \qquad (9.43)$$
converges in mean square to zero as $N \to \infty$, then the triangle inequality and Lemma 9.5 produce the desired result: $\hat B^m_{k,NT}(\tau) \to B_k(\tau)$ in the mean-square sense. That condition (a) is sufficient follows from
$$E\{|J_{k,NT}(\tau)|^2\} \le \frac{1}{(NT)^2}\sum_{u,v=0}^{NT-\tau-1}|R_{Z^0}(u, v, \tau)| \qquad (9.44)$$
and performing the sum in two disjoint regions $B$ and $A - B$, where $A$ and $B$ are defined in the proof of Lemma 9.1. As in that proof, $N_0$ may be chosen large enough that $|R_{Z^0}(u, v, \tau)| < \epsilon/2$ everywhere in $A - B$, and so the part of the sum in (9.44) over $A - B$ satisfies the same inequality. Having fixed $N_0$, the value of $N$ may be chosen to make the sum over $B$ arbitrarily small. Note this is precisely condition (b') of Proposition 9.11 holding for $t = 0, 1, \dots, T-1$ (uniformly in $t$). The sufficiency of (b) follows from the application of condition (b) of Lemma 9.1 to the sequence $Z^0_{t,\tau}$; this is condition (c') of Proposition 9.11 holding uniformly in $t$. For (d) we use the fact that for any $\epsilon > 0$ there is an $N_0$ for which $|u - v| > N_0$ implies $|R(u, v)| < \epsilon^{1/2}$ for all $v$. Thus, using the Isserlis formula,
$$|R_{Z^0}(u, v, \tau)| \le |R(u+\tau,\ v+\tau)R(u, v)| + |R(u+\tau,\ v)R(u,\ v+\tau)| < 2M_2\,\epsilon^{1/2}$$
for all $v$ if $|u - v| > N_0 + |\tau|$. The argument of condition (a) may then be applied to obtain the conclusion; this condition is sufficient for condition (b') of Proposition 9.11 to hold for all $t, \tau$. To establish (c), we use the Isserlis formula again to obtain
$$\sum_{u,v=0}^{NT-1}|R_{Z^0}(u, v, \tau)| \le \sum_{u,v=0}^{NT-1}\big[|R(u+\tau,\ v+\tau)R(u, v)| + |R(u+\tau,\ v)R(u,\ v+\tau)|\big]$$
and then the Schwarz inequality, applied, for example, to the first term on the right, to obtain
$$\sum_{u,v=0}^{NT-1}|R(u+\tau,\ v+\tau)R(u, v)| \le \left[\sum_{u,v=0}^{NT-1} R^2(u+\tau,\ v+\tau)\right]^{1/2}\left[\sum_{u,v=0}^{NT-1} R^2(u, v)\right]^{1/2}.$$
But these sums of squared terms, such as $\sum R^2(u+\tau,\ v+\tau)$, under the transformation $s = u - v$, $t = v$, become
$$\sum_{u,v=0}^{NT-1} R^2(u+\tau,\ v+\tau) \le \sum_{s=-\infty}^{\infty}\sum_{t=0}^{NT-1} R^2(s+t+\tau,\ t+\tau).$$
The other three sums are similarly bounded by this last quantity, and so $E\{|J_{k,NT}(\tau)|^2\}$ is $O(1/N)$, hence establishing (c). ∎

Note all the conditions give consistency independently of $k = 0, 1, \dots, T-1$, and conditions (c) and (d) are also independent of $\tau$. All of these conditions are versions of the conditions of Proposition 9.11 but with uniformity.

Almost Sure Consistency. Conditions for almost sure consistency can also be obtained when $\{Z^0_{t,\tau}\}$ is PC in $t$ for fixed $\tau$, for then $\{Z^0_{t+jT,\tau}\}$ is stationary in $j$ for fixed $t, \tau$.
Proposition 9.16 If $X$ is PC-T with bounded fourth moments, then the following are sufficient for a.s. consistency of $\hat B^m_{k,NT}(\tau)$:

(a) for some $\alpha > 0$,
$$\frac{1}{N}\sum_{u=0}^{N-1} R_{Z^0}(t+u,\ t,\ \tau) = O(N^{-\alpha}), \qquad t = 0, 1, \dots, T-1, \qquad (9.45)$$
where $R_{Z^0}(s, t, \tau) = E\{Z^0_{s,\tau}Z^0_{t,\tau}\}$;

(b) either $\sum_{u=-\infty}^{\infty}|R_{Z^0}(t+u,\ t,\ \tau)| < \infty$ or $R_{Z^0}(t+u,\ t,\ \tau) = O(u^{-\alpha})$ for $t = 0, 1, \dots, T-1$ are sufficient for (a);

(c) if $X_t$ is Gaussian, the conditions $\sum_{t=0}^{T-1}\sum_{\tau=-\infty}^{\infty}|R(t+\tau, t)|^2 < \infty$ or $\max_t |R(t+\tau, t)| = O(\tau^{-\alpha})$ are sufficient for those in (b).

Proof. Item (a) is sufficient for mean-square consistency because the left quantity in (9.45) dominates $E\{|J_{k,NT}(\tau)|^2\}$ (see (9.43)). Following the proof of Doob [49, Chapter X, Theorem 6.1], (9.45) implies $J_{k,N_mT}(\tau) \to 0$ a.s. on a subsequence $N_m$, and to make the general statement it is enough that $\sup_t R_{Z^0}(t, t, \tau) < \infty$, a condition ensured by the bounded fourth moments. We note that (9.45) is equivalent to
$$\frac{1}{N}\left|\sum_{j=-N+1}^{N-1} E\{Z^0_{s+jT,\tau}Z^0_{t,\tau}\}\right| \le \frac{K}{N^{\alpha}}, \qquad s, t = 0, 1, \dots, T-1,$$
which is a little stronger than the uniformity of condition (a) of Proposition 9.12 with respect to $t$. Both statements of item (b) are sufficient for item (a), and these are stronger than the uniformity of condition (b) in Proposition 9.12 with respect to $t$. For item (c), application of the Isserlis formula (9.34) yields
$$\sum_{j=-\infty}^{\infty}|R_{Z^0}(t+jT,\ t,\ \tau)| \le \sum_{j=-\infty}^{\infty}\big[|R(t+jT+\tau,\ t+\tau)R(t+jT,\ t)| + |R(t+jT+\tau,\ t)R(t+jT,\ t+\tau)|\big] \qquad (9.46)$$
and, for example, the first of these sums is bounded by
$$\left[\sum_{j=-\infty}^{\infty} R^2(t+jT+\tau,\ t+\tau)\right]^{1/2}\left[\sum_{j=-\infty}^{\infty} R^2(t+jT,\ t)\right]^{1/2} \le \left[\sum_{t=0}^{T-1}\sum_{j=-\infty}^{\infty} R^2(t+jT+\tau,\ t+\tau)\right]^{1/2}\left[\sum_{t=0}^{T-1}\sum_{j=-\infty}^{\infty} R^2(t+jT,\ t)\right]^{1/2}$$
and the other similarly. ∎

Note that whenever the conditions of Lemma 9.2 are also satisfied, then there is also a.s. consistency for $\hat B_{k,NT}(\tau)$. Furthermore, $\max_t |R_{Z^0}(t+j,\ t,\ \tau)| = O(j^{-\alpha})$ and $\sum_{j=-\infty}^{\infty}\sum_{t=0}^{T-1}|R_{Z^0}(t+j,\ t,\ \tau)| < \infty$ suffice correspondingly for the conditions given in (b).
Asymptotic Normality. The first results on asymptotic normality for estimators of covariances of PC sequences were due to Pagano [175] for periodic autoregressions, and to Vecchia and Ballerini [219], who began with a one-sided (causal) infinite moving average. Here we will give only one result, which is a consequence of Proposition 9.13 and Corollary 9.13.1. Denote by $e_k^*$ the conjugate transpose of the row vector
$$e_k = (1,\ e^{i2\pi k/T},\ e^{i2\pi 2k/T},\ \dots,\ e^{i2\pi(T-1)k/T}).$$

Proposition 9.17 Under the conditions of Proposition 9.13,
$$\sqrt N\,\big(\hat B^m_{k,N}(\tau) - B_k(\tau)\big) \Rightarrow N(0,\ T^{-1}e_k^*\Sigma e_k)$$
whenever $e_k^*\Sigma e_k > 0$, where
$$[\Sigma]_{pq} = \sum_{j=-\infty}^{\infty} E\{Z^0_{t_p+jT,\tau}Z^0_{t_q,\tau}\}, \qquad t_p, t_q = 0, 1, \dots, T-1,$$
and $V(\lambda)$ is defined by (6.42).

Proof. The result follows from the fact that $\hat B^m_{k,N}(\tau)$ can be expressed as a linear combination of random variables $\hat R^m_N(t+\tau, t)$ that are each asymptotically normal under the assumptions, and so the limit will also be asymptotically normal provided the variance $e_k^*\Sigma e_k > 0$. ∎

For real-valued discrete-time Gaussian PC sequences, Genossar, Lev-Ari, and Kailath [75] show that consistency for every $k$ and $\tau$ occurs if and only if one of the following holds: (a) the measure $F_0$ has no discrete component; (b) $\lim_{A\to\infty} A^{-1}\sum_{\tau=0}^{A-1}|B_0(\tau)|^2 = 0$.
When $X_t$ is Gaussian, the Isserlis formula allows us to estimate the rate of convergence of the covariance $\mathrm{Cov}\left[\widehat{B}_{j,N}(\tau_1), \widehat{B}_{k,N}(\tau_2)\right]$ as $N \to \infty$. This result is needed to prove consistency of estimators for the spectral densities $f_k(\lambda)$ under the Gaussian assumption. A similar result will show consistency of these estimators under the assumption of $\phi$-mixing.

Convergence of $\mathrm{Cov}\left[\widehat{B}_{j,N}(\tau_1), \widehat{B}_{k,N}(\tau_2)\right]$ for $X_t$ Gaussian. In the following lemma all the variables $a, b, \dots, f, u, t, t_1, t_2$ are integers and to save space we employ subscripts, writing $B_{k,\tau} = B_k(\tau)$ and $R_{t+\tau,t} = R(t+\tau, t)$.
ESTIMATION OF R(t + τ, t): THEORY

Lemma 9.6 If $X_t$ is PC-T with $\sum_{\tau=-\infty}^{\infty} \sum_{t=0}^{T-1} |R_{t+\tau,t}|^2 = K < \infty$, then for $t_2 > t_1$,

$$\sum_{t=t_1}^{t_2} B_{t+au+b,\,u+c}\; B_{t+du+e,\,u+f}\; e^{-i2\pi n t/T} = (t_2 - t_1) \sum_{p=0}^{T-1} \alpha_p + \varepsilon_{u,t_1,t_2}, \tag{9.47}$$

where

$$\alpha_p = B_{p,\,u+c}\; B_{n-p,\,u+f}\; e^{i2\pi p(au+b)/T}\, e^{i2\pi(n-p)(du+e)/T},$$

and the error sequence $\varepsilon_{u,t_1,t_2}$ is bounded and summable uniformly in $t_1, t_2$. That is,

$$|\varepsilon_{u,t_1,t_2}| \le K_1 \qquad \text{and} \qquad \sum_{u=-\infty}^{\infty} |\varepsilon_{u,t_1,t_2}| \le K_2,$$

where $K_1, K_2$ may be chosen to be independent of $t_1$ and $t_2$.

Proof. Denote $I_t$ as the summand appearing on the left side of (9.47). Then by defining $t_2' = t_1 + mT$, where $m = \lfloor (t_2 - t_1)/T \rfloor$ is the number of whole intervals of length $T$ in the interval $[t_1, t_2]$, we obtain

$$\sum_{t=t_1}^{t_2} I_t = \sum_{t=t_1}^{t_2'} I_t + \sum_{t=t_2'+1}^{t_2} I_t = (t_2 - t_1) \sum_{p=0}^{T-1} \alpha_p + \varepsilon_{u,t_1,t_2}, \tag{9.48}$$

where $\varepsilon_{u,t_1,t_2}$ collects the $(t_2 - t_1) \bmod T$ leftover terms, and is null if $(t_2 - t_1) \bmod T = 0$. To obtain the bound on $\varepsilon_{u,t_1,t_2}$, the Schwarz inequality gives

$$|\varepsilon_{u,t_1,t_2}| \le \left[ \sum_{t=0}^{T-1} B^2_{t+au+b,\,u+c} \right]^{1/2} \left[ \sum_{t=0}^{T-1} B^2_{t+du+e,\,u+f} \right]^{1/2}. \tag{9.49}$$

To show that $|\varepsilon_{u,t_1,t_2}|$ is summable with respect to $u$, observe that each factor on the right of (9.49) is an $\ell^2$ sequence in the variable $u$.
The final estimate $\sum_{u=-\infty}^{\infty} |\varepsilon_{u,t_1,t_2}| \le 2K$ is obtained by application of the Schwarz inequality for $\ell^2$ to the top line of (9.49). ∎

This lemma is used in the next proposition to obtain estimates on the rate of convergence of the covariance $\mathrm{Cov}\left[\widehat{B}_{j,\tau_1,N}, \widehat{B}_{k,\tau_2,N}\right]$ as $N \to \infty$ for Gaussian PC sequences. However, the square summability must be replaced with the stronger absolute summability.

Proposition 9.18 If $X_t$ is PC-T, Gaussian, and $\sum_{\tau=-\infty}^{\infty} \sum_{t=0}^{T-1} |R_{t+\tau,t}| < \infty$, then for any $0 \le \tau_1, \tau_2 \le N$,

$$N\, \mathrm{Cov}\left[\widehat{B}_{j,\tau_1,N},\, \widehat{B}_{k,\tau_2,N}\right] = \sum_{u=-N}^{N} U_N(u, \tau_1, \tau_2)\left[F_{j,k}(u, \tau_1, \tau_2) + G_{j,k}(u, \tau_1, \tau_2)\right] + O(N^{-1}), \tag{9.50}$$

where

$$F_{j,k}(u, \tau_1, \tau_2) = \sum_{p=0}^{T-1} B_{p,\,u+\tau_1+\tau_2}\; B_{j-k-p,\,u}\; e^{-i2\pi(ju+pu)/T},$$

$$G_{j,k}(u, \tau_1, \tau_2) = \sum_{p=0}^{T-1} B_{p,\,u+\tau_1}\; B_{j-k-p,\,u-\tau_2}\; e^{-i2\pi(ju - j\tau_2 + k\tau_2 + p\tau_2)/T}, \tag{9.51}$$

and the real function $U_N(u, \tau_1, \tau_2) \le 1$ for $u \in [-N, N]$, and increases to 1 as $N \to \infty$.
Proof. The proof will be given for $\tau_2 \ge \tau_1 \ge 0$ in order to conserve space. From the definition of $\widehat{B}_{k,\tau,N}$ and the use of the Isserlis formula we find

$$\mathrm{Cov}\left[\widehat{B}_{j,\tau_1,N},\, \widehat{B}_{k,\tau_2,N}\right] = \frac{1}{N^2} \sum_{s=0}^{N-\tau_1-1} \sum_{t=0}^{N-\tau_2-1} \left[ B_{t+\tau_2,\,s-t+\tau_1-\tau_2}\; B_{t,\,s-t} + \cdots \right] e^{-i2\pi(js-kt)/T},$$

and these two sums may be associated with the sums involving $F$ and $G$ in (9.50); we shall evaluate the second of these only. To perform the evaluation, partition the rectangle $E = [0, N-\tau_1-1] \times [0, N-\tau_2-1]$ into three sets,

$$A_1 = \{(s,t) \in E : t > s\}, \qquad A_2 = \{(s,t) \in E : s \ge t \text{ and } s \le t + \tau_2 - \tau_1\}, \qquad A_3 = \{(s,t) \in E : s > t + \tau_2 - \tau_1\},$$

and denote the sums over these sets by $S_1$, $S_2$, and $S_3$. To evaluate $S_1$, the transformation $u = s - t$, $v = s$ results in

$$S_1 = \frac{1}{N^2} \sum_{u=-N+\tau_2}^{0} \sum_{v=0}^{N-\tau_2+u} (\cdots)\, e^{-i2\pi(jv - kv + ku)/T},$$

and the application of Lemma 9.6 to the inner sum yields the corresponding term of (9.50). The expression $[1 - (\tau_2 - u)/N]$ in the preceding line may be identified with $U_N(u, \tau_1, \tau_2)$ in the interval $[-N+\tau_2, 0]$. The same transformation yields the other two parts, and we then identify $[1 - \tau_2/N]$ and $[1 - (\tau_1+u)/N]$ with $U_N(u, \tau_1, \tau_2)$ over the respective sets of indices appearing in these two lines, and define $U_N(u, \tau_1, \tau_2) = 0$ for $u$ not in $[-N+\tau_2,\, N-\tau_1]$. We may thus write the result in the form (9.50). ∎

Convergence of $\mathrm{Cov}\left[\widehat{B}_{j,N}(\tau_1), \widehat{B}_{k,N}(\tau_2)\right]$ for $X_t$ $\phi$-Mixing.
Lemma 9.7 If $X_t$ is periodically stationary with period $T$, $E X_t^4 < \infty$, $t \in \mathbb{Z}$, and uniformly $\phi$-mixing with $\sum_{n=-\infty}^{\infty} (\phi_n)^{1/2} < \infty$, then there are real numbers $C_1, C_2 \ge 0$ for which the mean-square error of the lag-$\tau$ covariance estimator is bounded by $[C_1 |\tau| + C_2]/N_T$, with $I(N_T, \tau) = \{0, 1, \dots, N_T - |\tau| - 1\}$; the result follows if we show $E|\varepsilon_{N_T,\tau}|^2 \le [C_1 |\tau| + C_2]/N_T$. To do this, note that $Z_{t,\tau} = X_{t+\tau} X_t$ implies for any $s, t, \tau$ that

$$\sigma(Z_{u,\tau},\ u \ge t+s) \subset \sigma(X_u,\ u \ge t+s).$$

Then we conclude that $Z_{t,\tau}$ is $\phi$-mixing with mixing function $\phi^Z_s = \max_t \phi^Z_{t,s} \le \max_t \phi^X_{t,s-\tau} = \phi^X_{s-\tau}$ for $s > \tau$. Combining the like results for $\tau < 0$ gives $\phi^Z_s \le \phi^X_{s-|\tau|}$, and for $s < 0$ gives $\phi^Z_{|s|} \le \phi^X_{|s|-|\tau|}$ when $|s| > |\tau|$. Using the fact that (see (9.19)) $|R_{Z,\tau}(u, v)| \le 2M_4\, \phi^{1/2}(u-v) \le 2M_4\, \phi^{1/2}_{|u-v|-|\tau|}$ when $|u-v| > |\tau|$ produces the estimate

$$E|\varepsilon_{N_T,\tau}|^2 \le \frac{1}{N_T^2}\left[\, \sum_{(u,v)\in E} |R_{Z,\tau}(u,v)| + \sum_{(u,v)\in F\setminus E} |R_{Z,\tau}(u,v)| \right],$$

where $F = I(N_T, \tau) \times I(N_T, \tau)$ and $E = \{(u,v) \in F : |u-v| < |\tau|\}$. The first sum is $O(C_1 |\tau|/N)$ and the second is $O(C_2/N)$. ∎
9.4 ESTIMATION OF R(t + τ, t): PRACTICE
As in the case of $m_t$, we can describe the time dependent covariance of a PC sequence by $R(t+\tau, t)$ or by the Fourier coefficients $B_k(\tau)$, and thus wish to estimate either. Here we describe implementations of the estimators $\widehat{R}_N(t+\tau, t)$ and $\widehat{B}_{k,NT}(\tau)$, where the latter is estimated directly from the series by (9.54),
as in the estimation of the mean $m_t$. Again, as in the case of estimating $m_t$, the main reason $\widehat{B}_{k,NT}(\tau)$ is computed via (9.54) is that a test for significance based on neighboring frequencies becomes available, as in Section 9.2.2. As mentioned earlier, $\widehat{B}_{k,NT}(\tau)$ can alternatively be computed from $\widehat{R}_N(t+\tau, t)$ by

$$\widehat{B}_{k,NT}(\tau) = \frac{1}{T} \sum_{t=0}^{T-1} \widehat{R}_{N_{t,\tau}}(t+\tau, t)\, e^{-i2\pi kt/T}, \tag{9.55}$$
but then tests for $B_k(\tau) = 0$ must be formulated differently, for example, by assuming that the series is stationary white noise.

9.4.1 Computation of $\widehat{R}_N(t + \tau, t)$
Given a sample of length $NT$, the general idea is to estimate $\widehat{R}_N(t+\tau, t)$ according to (9.25). One primary objective of this estimation is to determine if there is a $\tau$ for which $R(t+\tau, t)$ is properly periodic in $t$. As in the stationary case, the range of $\tau$ for which we can reliably estimate $R(t+\tau, t)$ is limited by the sample size. As $|\tau|$ increases, the sample size available for the correlation estimate diminishes toward zero. So our strategy must be, as it always is, to say as much as we can from the finite sample. In this case, we can only estimate the finite set of correlations $\{R(t+\tau, t),\ t = 0, 1, \dots, T-1,\ |\tau| \le \tau_{\max}\}$, and for each $\tau$ determine whether or not $R(t+\tau, t)$ has a periodic variation with respect to $t$. This effort can be lessened somewhat because for a real PC sequence $R(t, t-\tau) = R((t-\tau)+\tau, t-\tau)$, and hence it is sufficient to test only $0 \le \tau \le \tau_{\max}$. If the tests for proper periodicity are negative for all $0 \le \tau \le \tau_{\max}$, then our conclusion is that the observed series is not consistent with PC-T structure. This statement is made only for the period $T$ tested, so there is a natural question of whether $T$ could be incorrect.

To further investigate the estimation of $R(t+\tau, t)$ we begin with ($\tau = 0$) the estimation of $R(t,t) = \sigma^2(t)$. After testing for the presence of a periodic mean, this is the next natural step in determining whether a series has periodic structure with period $T$. Although there are some arguments to the contrary, we consider the rejection of $\sigma^2(t) \equiv \sigma^2$ (in favor of the proper $T$-periodicity of $\sigma^2(t)$) to signify presence of periodic correlation of period $T$. The first argument to the contrary is that tests for $\sigma^2(t) \equiv \sigma^2$ must rely on a single sample path, and so we cannot know if we are seeing a true PC sequence $X_t$ or
a randomly shifted version $Y_t = X_{t+\theta}$ (with $\theta$ uniformly distributed on $\{0, 1, \dots, T-1\}$) of it. Using the principle of parsimony discussed in Section 6.8.3, the simplest approach is to always consider that the time we call $t = 0$ is some specific time that is not a random variable; that is, we do not allow random shifts to come into the probabilistic model we assume for our observational system. Another argument to the contrary is that we can construct random sequences with periodic variances (see the example at the end of Chapter 2) that are not PC. However, we do not believe they exist in any practical sense and thus we again come to deciding for the existence of PC-T structure whenever we can accept that $\sigma^2(t)$ is properly periodic with period $T$.

In order to help the perception of $\sigma^2(t) \equiv \sigma^2$, our realization of the estimator

$$\widehat{\sigma}^2_N(t) = \frac{1}{N} \sum_{p=0}^{N-1} \left[ X_{t+pT} - \widehat{m}_t \right]^2, \tag{9.56}$$

introduced in (1.2), includes $1-\alpha$ confidence intervals around each point estimate; these confidence intervals are based on the assumption that for the $t$th season, the random variables $\{X_{t+pT} - m_t,\ p = 0, 1, \dots, N-1\}$ are Normal$(0, \sigma_t^2)$. Hence the confidence intervals are determined by a $\chi^2$ distribution with $N-1$ degrees of freedom. The existence of nonoverlapping confidence intervals, as in Figure 1.2, gives a preliminary clue that $\sigma^2(t) \not\equiv \sigma^2$. There are several possible tests for heterogeneous variance that can be employed, but we use Bartlett's test coded in MATLAB by Trujillo-Ortiz and Hernandez-Walls [216].

Program persigest.m. The program persigest.m implements the estimator $\widehat{\sigma}_N(t)$, but slightly more generally because the series may contain missing values and the length of the series may not be an integral number of periods. Given an input time series vector and a specified period $T$, the program computes and returns $\widehat{\sigma}_N(t)$ based on all the values that are present (not missing) in the series; that is, for each $t$ in the base period, $N_t = \mathrm{card}\{p \in \{0, 1, \dots, N-1\} : X_{t+pT}\ \text{not missing}\}$. Using a specified $\alpha$, the $1-\alpha$ confidence intervals are computed at each $t = 1, 2, \dots, T$. The $p$-value for Bartlett's test for homogeneity of variances is also computed and is present on the plots. The demeaned series $X_t - \widehat{m}_{t,N}$ is normalized by $\widehat{\sigma}_N(t)$ and returned to the calling program. When all the data for time $t$ (modulo $T$) are missing, the missing points corresponding to these times are omitted from the time series plot, and $\widehat{\sigma}_N(t)$ is not plotted.

Figure 1.2 presented an application of persigest to a solar radiation series from station DELTA of the Taconite Inlet Project. The $p$-value from the Bartlett test for $\sigma(t) \equiv \sigma$ was computed by MATLAB to be zero. As in the case of the periodic mean, this is not too surprising considering the clarity of the periodicity and the size of the dataset (82 periods). After shortening the
dataset to 4 periods we found that the $p$-value is $\approx 0.6$, and for 8 periods it is $\approx 3.7\mathrm{e}{-007}$.

The estimation of $\sigma(t)$ and the result of the hypothesis test for $\sigma(t) \equiv \sigma$ represent only the simplest analysis of second order properties of $X_t$. Rejecting $\sigma(t) \equiv \sigma$ strongly suggests PC structure, but leaves open whether or not $X_t$ is simply the result of a stationary process subjected to amplitude-scale modulation, as described in Section 2.1.3; to resolve this, we must estimate $R(t+\tau, t)$ for $\tau \ne 0$. And on the other hand, $\sigma(t) \equiv \sigma$ is possible for PC processes that are formed by various forms of time-scale modulation, as described in Section 2.1.4. So again we must estimate $R(t+\tau, t)$ for $\tau \ne 0$ in order to complete the analysis. A test for $\sigma(t) \equiv \sigma$ based on the Fourier series representation of $R(t,t) = \sigma^2(t)$ will be given in Section 9.4.2.
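The seasonal variance estimates and the Bartlett statistic behind this test can be sketched in a few lines. The following pure-Python sketch is our own illustration (function names, the toy PC-2 series, and the assumption of equal-length seasons are ours, not the persigest.m code); the statistic is compared against a $\chi^2(T-1)$ critical value, e.g. 6.63 at the 1% level for $T = 2$:

```python
import math, random

def seasonal_variances(x, T):
    """Sample variance of each season t (mod T), seasonal mean removed."""
    N = len(x) // T
    out = []
    for t in range(T):
        vals = [x[t + p * T] for p in range(N)]
        m = sum(vals) / N
        out.append(sum((v - m) ** 2 for v in vals) / (N - 1))
    return out

def bartlett_stat(x, T):
    """Bartlett's test statistic for equality of the T seasonal variances;
    approximately chi-square with T-1 degrees of freedom under H0."""
    N = len(x) // T
    s2 = seasonal_variances(x, T)
    k, n = T, N
    Nk = k * (n - 1)                       # total degrees of freedom
    sp2 = sum((n - 1) * s for s in s2) / Nk   # pooled variance
    num = Nk * math.log(sp2) - (n - 1) * sum(math.log(s) for s in s2)
    corr = 1 + (k / (n - 1) - 1 / Nk) / (3 * (k - 1))
    return num / corr

random.seed(1)
# A PC-2 toy series: the variance alternates between the two seasons.
pc = [random.gauss(0, 2.0 if t % 2 == 0 else 0.5) for t in range(2000)]
# A stationary comparison series with constant variance.
flat = [random.gauss(0, 1.0) for t in range(2000)]
```

For the periodic-variance series the statistic is enormous (the null is rejected), while for the stationary series it stays near the $\chi^2(1)$ range.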
Program peracf.m. The program peracf.m implements a slight modification of $\widehat{R}_N(t+\tau, t)$ to accommodate missing values and the possibility that the length of the series may not be an integral number of periods. Given an input time series vector and a specified period $T$, the program computes and returns $\widehat{R}_{N_{t,\tau}}(t+\tau, t)$, where $N_t$ is the cardinality of the indexes $I_t = \{k : X_{t+kT}\ \text{not missing}\}$, and similarly for $I_{t+\tau}$. The quantity $N_{t,\tau}$ is the cardinality of $I_t \cap I_{t+\tau}$. Denoting $\widehat{\sigma}^2_N(t) = \widehat{R}_N(t, t)$, the estimator of the autocorrelation (coefficient)

$$\widehat{\rho}_{N_{t,\tau}}(t+\tau, t) = \frac{\widehat{R}_{N_{t,\tau}}(t+\tau, t)}{\widehat{\sigma}_N(t+\tau)\, \widehat{\sigma}_N(t)} \tag{9.58}$$

is also provided. For each $t, \tau$, confidence limits for $\widehat{\rho}_{N_{t,\tau}}(t+\tau, t)$ are computed by use of the Fisher transformation

$$Z = \frac{1}{2} \log \frac{1 + \widehat{\rho}}{1 - \widehat{\rho}}, \tag{9.59}$$

under which the $Z$ are approximately $N(\mu_Z, \sigma_Z^2)$, where $\mu_Z \approx \zeta + (1/2N_{t,\tau})\rho$, with $\zeta$ the Fisher transformation of the true $\rho$ and $\sigma_Z^2 = 1/(N_{t,\tau} - 3)$ (see Cramér [34, page 399]). Assuming that the term $(1/2N_{t,\tau})\rho$ can be ignored, the confidence limits for $\rho$ are determined simply from those of $Z$ by the inverse of the Fisher transformation. A test for equality of correlations $\rho(t+\tau, t) = \rho(\tau)$, where $\rho(\tau)$ is some unknown constant, may be made from the variable
which, under the null hypothesis, is (approximately) $\chi^2(T-1)$. Sometimes it is also of interest to test for $\rho(t+\tau, t) \equiv 0$ for some specific $\tau$. Then we take $\mu_Z = 0$, so the test for $\rho(t+\tau, t) \equiv 0$ becomes that of testing the $Z$s for $\mu_Z = 0$.

Figure 9.1 presents the estimates of $\widehat{\rho}_{N_{t,1}}(t+1, t)$ for the solar radiation data of Figures 1.1 and 1.2. The sample sizes are $N_{t,1} = 82$ for $t = 1, \dots, 23$ and $N_{t,1} = 81$ for $t = 24$. The 0.99 confidence intervals for the true $\rho(t+1, t)$ are shown and the test for $\rho(t+1, t) = \rho(1)$ yields a $p$-value of 5.3e-005. The evidence for periodicity in $\rho(t+1, t)$ is strong although not as strong as for $\sigma_t$. For lags $\tau = 2, 3, 4$, the $p$-values for the test of $\rho(t+\tau, t) = \rho(\tau)$ were 0.16, 0.58, 0.91, but the specific test $\rho(t+\tau, t) \equiv 0$ gave computed $p$-values of zero; for these lags, significant nonzero correlation is present, but its time variation is insignificant.
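The confidence limits used in these figures come directly from the Fisher transformation (9.59) and its inverse. A minimal sketch (the function name is ours, and the 99% normal quantile 2.576 is hardcoded as an illustrative choice; the bias term $\rho/2N_{t,\tau}$ is ignored, as in the text):

```python
import math

def fisher_ci(rho_hat, n, z_crit=2.576):
    """Approximate 99% confidence limits for a correlation coefficient:
    z = atanh(rho_hat) is approximately normal with variance 1/(n-3);
    the limits are mapped back to the rho scale by tanh."""
    z = math.atanh(rho_hat)           # Fisher transformation (9.59)
    half = z_crit / math.sqrt(n - 3)  # half-width on the z scale
    return math.tanh(z - half), math.tanh(z + half)

# e.g. an estimate rho_hat = 0.8 based on N_{t,tau} = 82 samples
lo, hi = fisher_ci(0.8, 82)
```

Note the asymmetry of the resulting interval about $\widehat{\rho}$: the transformation stretches the scale near $\pm 1$, which is exactly why it is preferred to a naive normal interval for $\rho$.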
Figure 9.1 Values of $\widehat{\rho}_{t+1,t,N_{t,1}}$ versus $t$ for solar radiation from station DELTA (see Figures 1.1 and 1.2) using $T = 24$. The sample sizes are $N_{t,1} = 82$ for $t = 1, \dots, 23$ and $N_{t,1} = 81$ for $t = 24$. The error bars are the 99% confidence limits determined by the Fisher transformation and its inverse. The test for $\rho(t+1, t) = \rho_1$ yields a $p$-value of 5.3e-005, but for $\tau = 2, 3, 4$, the smallest was 0.16.
Recall from Sections 2.2.6 and 2.2.7 that we can construct simple PAR and PMA models whose variances are constant with respect to time so that a periodogram of squares does not detect the presence of PC structure. But we can detect it using $\widehat{\rho}_{N_{t,1}}(t+1, t)$, as demonstrated by the following examples. Figure 9.2 presents the estimates of $\widehat{\rho}_{N_{t,1}}(t+1, t)$ for the data of Figure 2.16, where $\sigma(t) \equiv \sigma$ could not be rejected by the periodogram of the squares. Here, using $N = 512$, the Bartlett test for $\sigma(t) \equiv \sigma$ gives a $p$-value of 1 (in Figure 2.16, $N = 5120$ was used). The test for $\rho(t+1, t) = \rho(1)$, also based on $N = 512$, yields a $p$-value of 0. However, the more specific test
Figure 9.2 Values of $\widehat{\rho}_{t+1,t,N_{t,1}}$ versus $t$ for the simulated switching AR(1) of Figure 2.16, where $\phi(t) = 0.95$ for $0 \le t < T/2$ and $\phi(t) = -0.95$ for $T/2 \le t < T$, with $T = 32$; $\xi_t$ is white noise. The sample sizes are $N_{t,1} = 16$ for $t = 1, \dots, 31$ and $N_{t,1} = 15$ for $t = 32$. The error bars are the 99% confidence limits determined by the Fisher transformation and its inverse. The test for $\rho(t+1, t) = \rho_1$ yields a $p$-value of 0, but for $\rho(t+1, t) \equiv 0$, the $p$-value was 0.7. See the text for an explanation.
$\rho(t+1, t) \equiv 0$ produced a $p$-value of 0.7, although the sample correlations are clearly not zero. This is caused by half the sample correlations being roughly 0.95 and the other half being roughly $-0.95$, giving an average that is consistent with 0. On the other hand, the $\rho(t+1, t) = \rho(1)$ test uses a $\chi^2$ and is very powerful in this situation. This example illustrates that the perception of PC structure can be just one lag away.

In another example of a constant variance PC sequence, here we reexamine the case of the PMA(2) sequence $X_t = \xi_t + \cos(2\pi t/T)\xi_{t-1} + \sin(2\pi t/T)\xi_{t-2}$ presented in Figure 2.18, where $\sigma(t) \equiv \sigma$ could not be rejected by the periodogram of the squares. Nor can it be rejected by the Bartlett test, which gives a $p$-value of 0.6. However, the hypothesis $\rho(t+1, t) = \rho(1)$ is strongly rejected (even visually, as seen in Figure 9.3) by a $p$-value of 0 based on $N = 600$; the more specific test $\rho(t+1, t) \equiv 0$ produced a $p$-value of 3.9e-07. So the PC structure is clearly detected by the tests provided in program peracf.m. The reader may wish to change parameters to make the contrast less extreme, or experiment with the other constant variance PC sequences discussed in Chapter 2.
Figure 9.3 Values of $\widehat{\rho}_{t+1,t,N_{t,1}}$ versus $t$ for the PMA(2) sequence $X_t = \xi_t + \cos(2\pi t/T)\xi_{t-1} + \sin(2\pi t/T)\xi_{t-2}$ presented in Figure 2.18. $\xi_t$ is white noise and the resulting variance is constant, $R_X(t,t) \equiv 2$. The sample sizes are $N_{t,1} = 50$ for $t = 1, \dots, 11$ and $N_{t,1} = 49$ for $t = 12$. The error bars are the 99% confidence limits determined by the Fisher transformation and its inverse. The test for $\rho(t+1, t) = \rho_1$ yields a $p$-value of 0, and for $\rho(t+1, t) \equiv 0$, the $p$-value was 3.9e-07.
9.4.2 Computation of $\widehat{B}_{k,NT}(\tau)$

As in the case of computing $\widehat{m}_{k,N}$, our approach is to apply the Fourier transform, as in (9.22), to the product series

$$Y_{t,\tau} = [X_{t+\tau} - \widehat{m}_{t+\tau,N}][X_t - \widehat{m}_{t,N}]$$

in order to compute $\widehat{B}_{k,NT}(\tau)$. Continuing with the Fourier transform method, we make the computation for each fixed $\tau$ of interest and note that the set $I_{NT,\tau}$, defined by (9.36), is not necessarily an integral number of periods in length. Thus for any $\tau$ we shall always cut the set $I_{NT,\tau}$ to be of length $N'T$, where $N' = \lfloor \mathrm{card}\{I_{NT,\tau}\}/T \rfloor$, so the frequency $\lambda_k = 2\pi k/T$ occurring in the estimation of $B_k(\tau)$ is actually a Fourier frequency for an FFT of length $N'T$. The same algorithm discussed in Section 9.2.2 is then applied. In addition, we can see that, for $\tau = 0$,
the estimates $\widehat{B}_{k,NT}(0)$ are given by the Fourier transform of the squares evaluated at certain Fourier frequencies. Hence the hypothesis test $\sigma_t^2 \equiv \sigma^2$ is the same as $B_k(0) = 0$ for all $k = 1, 2, \dots, T-1$.

Program Bcoeff.m. For a real series and specified period $T$, the program Bcoeff.m implements the estimator $\widehat{B}_{k,NT}(\tau)$, but slightly more generally because the series may contain missing values and the series may not have a length of an integral number of periods. Missing values are set to the sample mean of the nonmissing values and the series is cut to the largest number $N$ of whole periods in the series. For each specified value of $\tau$, the values of $\widehat{B}_{k,NT}(\tau)$ are computed only for
$k = 0, 1, \dots, \lfloor (T-1)/2 \rfloor$, because for real series $\widehat{B}_{k,NT}(\tau) = \overline{\widehat{B}}_{T-k,NT}(\tau)$. In addition, the $p$-values for the test $B_k(\tau) = 0$, based on the variance contrast method, are returned. These $p$-values should be treated with caution because the requisite assumptions may not be met. Here are the considerations. For large $NT$, the sample Fourier transform of $Y_{t,\tau} = [X_{t+\tau} - \widehat{m}_{t+\tau,N}][X_t - \widehat{m}_{t,N}]$ can be considered a long average, and so the Fourier coefficients $\widetilde{Y}(\lambda_k)$ would be expected to tend to normality. However, even if the series $\{X_t,\ t = 0, 1, \dots, NT-1\}$ is i.i.d. Normal, the series $Y_{t,\tau}$ may not even be i.i.d., except when $\tau = 0$, and so simple central limit results cannot be used; see the supplements at the end of the chapter for a problem. However, if we assume that the spectral density of $Y_{t,\tau}$ is smooth in the neighborhood of the frequencies $2\pi k/T$, then the $p$-values based on local estimates of the background variance are thought to be reasonable.

The results of Bcoeff.m when applied to the solar radiation data of Figures 1.1 and 1.2 are presented in Table 9.2. The $\tau = 0$ results presented in Table 9.2(a) show that $B_0(0) = 0$ is strongly rejected, an expected result because $B_0(0)$ is the average variance of the sequence and must be nonzero for nontrivial sequences. The strong rejection of $B_1(0) = 0$ indicates the periodicity in the variance, a result consistent with the result of program persigest. The seemingly stronger rejection of $B_1(0) = 0$ relative to $B_0(0) = 0$ is attributed to the differences in the degrees of freedom in the variance samples. The $\tau = 1$ results presented in Table 9.2(b) show that $B_0(1) = 0$ and $B_1(1) = 0$ are strongly rejected, showing very significant $\tau = 1$ average correlation and very significant $\tau = 1$ periodic variation in the correlation. The $k = 1$ coefficient is also extremely significant at larger lags; analysis of larger $k$ is not warranted until we remove the effect of the periodic variance.
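The core of the computation just described, a discrete Fourier transform of the demeaned product series $Y_{t,\tau}$ at the frequencies $2\pi k/T$, can be sketched in pure Python. This is our own illustrative code, not the Bcoeff.m implementation: it uses a single overall mean instead of the periodic mean, a direct DFT instead of an FFT, and a toy amplitude-modulated white noise series whose theoretical coefficients at $\tau = 0$ are $B_0 = 4.5$, $B_1 = B_3 = 2$, $B_2 = 0.5$:

```python
import cmath, math, random

def bhat(x, T, tau):
    """B_k(tau) estimates, k = 0..T-1: the DFT of the demeaned product
    series Y_t = (x[t+tau]-m)(x[t]-m) evaluated at frequencies 2*pi*k/T."""
    n = (len(x) - tau) // T * T   # cut to a whole number of periods
    m = sum(x) / len(x)           # overall mean (simplification)
    Y = [(x[t + tau] - m) * (x[t] - m) for t in range(n)]
    return [sum(Y[t] * cmath.exp(-2j * math.pi * k * t / T)
                for t in range(n)) / n
            for k in range(T)]

random.seed(2)
# Amplitude-modulated white noise: a PC-4 sequence with
# sigma(t) = 2 + cos(2*pi*t/4), so sigma^2(t) has nonzero
# Fourier coefficients at k = 0, 1, 2, 3.
x = [(2 + math.cos(2 * math.pi * t / 4)) * random.gauss(0, 1)
     for t in range(8000)]
B = bhat(x, T=4, tau=0)
```

The Hermitian symmetry $\widehat{B}_{T-k} = \overline{\widehat{B}}_k$ of the real product series is visible in the output, which is the reason Bcoeff.m only reports $k \le \lfloor (T-1)/2 \rfloor$.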
Table 9.2 Result of program Bcoeff.m applied to solar radiation data of Figures 1.1 and 1.2. NT = 1968. Local variance estimates based on n_A = 16 neighboring Fourier coefficients.

(a) τ = 0

k    |B̂_{k,NT}(τ)|   n1   n2   Variance ratio   p-value
0    1.11e+004       1    16   1.63e+002        8.16e-010
1    4.42e+003       2    32   2.95e+002        0.00e+000
2    5.41e+002       2    32   5.08e+000        1.21e-002
3    2.89e+002       2    32   2.86e+000        7.18e-002
4    1.57e+002       2    32   1.43e+000        2.53e-001
5    9.08e+001       2    32   8.32e-001        4.44e-001
6    1.75e+001       2    32   3.73e-002        9.63e-001
7    7.25e+001       2    32   7.89e-001        4.63e-001
8    5.43e+001       2    32   8.38e-001        4.42e-001
9    5.99e+001       2    32   8.89e-001        4.21e-001
10   6.63e+001       2    32   8.54e-001        4.35e-001
11   6.86e+001       2    32   1.21e+000        3.12e-001

(b) τ = 1

k    |B̂_{k,NT}(τ)|   n1   n2   Variance ratio   p-value
0    1.01e+004       1    16   1.57e+002        1.11e-009
1    4.00e+003       2    32   2.23e+002        0.00e+000
2    4.72e+002       2    32   3.95e+000        2.93e-002
3    2.49e+002       2    32   2.78e+000        7.74e-002
4    1.14e+002       2    32   9.69e-001        3.90e-001
5    4.19e+001       2    32   3.77e-001        6.89e-001
6    2.49e+001       2    32   1.46e-001        8.65e-001
7    7.86e+001       2    32   1.36e+000        2.70e-001
8    5.89e+001       2    32   1.27e+000        2.96e-001
9    6.16e+001       2    32   1.54e+000        2.29e-001
10   7.71e+000       2    32   2.99e-002        9.71e-001
11   6.79e+001       2    32   3.31e+000        4.94e-002
To see if the Fourier coefficient method indicates $\rho_{t+\tau,t} \not\equiv \rho_\tau$, the program Bcoeff.m was applied to the series $X_t - \widehat{m}_{t,N}$ scaled by $\widehat{\sigma}_N(t)$, its sample periodic standard deviation. If the series is the result of an amplitude-scale modulation of a stationary series, then we expect that $\rho_k(\tau) = 0$ will be rejected for $k = 0$ and $\tau = 0$ and possibly some other $\tau$; and it will never be rejected for any other $k$ (i.e., for all $k > 0$) and $\tau$.
Tables 9.3(a) and 9.3(b) first indicate that $\rho_k(\tau) = 0$ is strongly rejected for $k = 0$ and $\tau = 1, 2$, meaning that there are large average correlation coefficients at lags $\tau = 1, 2$. But also the coefficients $\rho_k(\tau)$ are never rejected for $k > 0$ and any $\tau = 1, 2$; that is, we cannot reject $\rho_{t+\tau,t} = \rho_\tau$ for $\tau = 1, 2$ using the Fourier coefficient method with $n_A = 16$ neighboring Fourier coefficients for the variance contrast.

Table 9.3 Result for τ = 1, 2 of program Bcoeff.m applied to [X_t − m̂_{t,N}]/σ̂_N(t) for solar radiation data of Figures 1.1 and 1.2. N'T = 1944 because one full period must be cut, giving N' = 81. Local variance estimates based on n_A = 16 neighboring Fourier coefficients.

(a) τ = 1

k    |B̂_{k,NT}(τ)|   n1   n2   Variance ratio   p-value
0    9.06e-001       1    16   1.25e+002        5.56e-009
1    2.90e-003       2    32   3.02e-002        9.70e-001
2    4.21e-003       2    32   4.48e-002        9.56e-001
3    7.93e-003       2    32   6.67e-001        5.20e-001
4    4.32e-003       2    32   2.35e-001        7.92e-001
5    2.97e-003       2    32   4.24e-001        6.58e-001
6    3.12e-003       2    32   3.18e-001        7.30e-001
7    4.58e-003       2    32   7.15e-001        4.97e-001
8    2.28e-003       2    32   4.10e-001        6.67e-001
9    3.70e-003       2    32   1.07e+000        3.55e-001
10   1.22e-003       2    32   1.29e-001        8.80e-001
11   2.27e-003       2    32   6.94e-001        5.07e-001

(b) τ = 2

k    |B̂_{k,NT}(τ)|   n1   n2   Variance ratio   p-value
0    8.57e-001       1    16   1.11e+002        1.32e-008
1    5.74e-003       2    32   1.34e-001        8.75e-001
2    7.96e-003       2    32   2.38e-001        7.90e-001
3    9.47e-003       2    32   1.59e+000        2.20e-001
4    6.61e-003       2    32   1.68e+000        2.03e-001
5    2.71e-003       2    32   2.87e-001        7.52e-001
6    5.08e-003       2    32   1.26e+000        2.96e-001
7    3.86e-003       2    32   9.29e-001        4.05e-001
8    3.75e-003       2    32   5.59e-001        5.77e-001
9    7.83e-004       2    32   2.37e-002        9.77e-001
10   4.50e-003       2    32   7.50e-001        4.80e-001
11   2.93e-003       2    32   4.02e-001        6.72e-001

The most significant coefficient for the
additional (not shown) analysis of $\tau = 3, 4, \dots, 20$ yielded the $p$-value 6e-003; but when this is corrected for multiple hypotheses (20 values of $\tau$, 11 values of $k$) it becomes insignificant. So program Bcoeff.m does not reject the model of an amplitude modulated stationary sequence, whereas the direct method does reject it with a $p$-value of 5.3e-005 obtained from the test for $\rho(t+1, t) = \rho_1$ based on $\widehat{\rho}_{t+1,t,N_{t,1}}$ (see Figure 9.1). This is not unexpected because the direct method examines the sample time-dependent correlations $\widehat{\rho}_{t+\tau,t,N_{t,\tau}}$ (or $\widehat{R}_{N_{t,\tau}}(t+\tau, t)$) for $t$ in the base period, whereas $\widehat{\rho}_k(\tau)$ or $\widehat{B}_{k,NT}(\tau)$ are estimators for Fourier coefficients. If time sequences, as in the plot of $\widehat{\rho}_{N_{t,1}}(t+1, t)$ in Figure 9.1, do not have a strong projection onto any particular one of the Fourier basis vectors, then deviation from constancy will be more easily observed in the time sequence. Thus we can reject $\rho(t+1, t) = \rho_1$ better in the time domain than in the frequency domain. Of course, this is not always the case and we advocate use of both methods for determining if $\rho(t+\tau, t) = \rho_\tau$.
PROBLEMS AND SUPPLEMENTS

9.1 Show if $\sum_{p} |R_{s+pT,k}| < \infty$ that the limit in (9.60) equals $\frac{1}{T} f_k(2\pi j/T)$.
T 9.2 Show that if N E [ f ? ~ , , ~ - m , ] [ 6 i ~ , ~ --+ - fm S t ,( O] ) , a sufficient condition for which is C,"=-,j R s + k r , t / < 00 for all (s, t ) E ( 0 , l : . . . ;T-l}', then using
we obtain
by taking the kkth entry of (6.46). (Note that [V*(O)]k is the kth row of V*(O) (or kth column of V ( 0 ) )defined in (6.42).) Finally, we recall that fkk(X) = fo(2k7r/T X/T), X E [O; 27r). So this agrees with (9.10) of Proposition 9.3.
+
9.3 Show directly the following $L^2$ version of Proposition 9.4. Specifically, show that if $X_t$ is PC-T with zero mean, and $\theta$ is independent of $X_t$ and uniformly distributed on $0, 1, \dots, T-1$, then $\lim_{N\to\infty} E\{|e^{-i\lambda\theta} \widetilde{Y}_N(\lambda) - \widetilde{X}_N(\lambda)|^2\} = 0$.
9.4 Here we construct a PC sequence with $\sigma^2_{t,\tau} = \sum_{j=-\infty}^{\infty} E\{Z_{t+jT,\tau} Z_{t,\tau}\} = 0$, where $Z_{t,\tau} = [X_{t+\tau} - m_{t+\tau}][X_t - m_t] - R_{t+\tau,t}$. Let $X_t = C(\omega)$ for $t$ odd and $X_t = \xi_t$ for $t$ even, where $C(\omega)$ is a random variable with $E\{C^4\} = [E\{C^2\}]^2$ and $\xi_t$ is white noise. Then $t = 1$, $\tau = 2$ gives $Z_{t,\tau} = C^2(\omega) - E\{C^2\}$ and thus $E\{Z_{t+jT,\tau} Z_{t,\tau}\} = E\{[C^2(\omega) - E\{C^2\}]^2\} = 0$ for all $j$.
9.5 The following is a periodic extension of the Brockwell-Davis [28] Theorem 7.3.1. Suppose $X_t$ is a linear PC-T sequence

$$X_t = \sum_{j=-\infty}^{\infty} \psi_j(t)\, \xi_{t-j}, \tag{9.61}$$

where $\psi_j(t) = \psi_j(t+T)$, $j \in \mathbb{Z}$, $\sum_j |\psi_j(t)| < \infty$, $t = 0, 1, \dots, T-1$, and $\xi_t$ is an i.i.d. mean zero, variance $\sigma^2$ sequence with $E\{\xi_t^4\} = \eta\sigma^4$. Then for $p \ge 0$, $q \ge 0$, the limit of $N\,\mathrm{Cov}\left[\widehat{R}_{t+p,t,N}, \widehat{R}_{t+q,t,N}\right]$ exists and is expressed through $\sigma^4$, $\eta$, and the quantities $s_{t,q,v} = \psi_{t+q+vT}(t+q)\, \psi_{t+vT}(t)$.

Proof. First, the relation

$$E\{\xi_s \xi_t \xi_u \xi_v\} = \begin{cases} \eta\sigma^4, & s = t = u = v, \\ \sigma^4, & s = t \ne u = v \text{ (plus two other ways)}, \\ 0, & \text{none of } s, t, u, v \text{ are equal}, \end{cases}$$

leads to an expansion in terms of

$$A_{\mu,\nu}(s, t, p, q) = \sum_{i=-\infty}^{\infty} \psi_{i+p}(s + \mu T + p)\, \psi_i(s + \mu T)\, \psi_{i+t-s+\nu T-\mu T+q}(t + \nu T + q)\, \psi_{i+t-s+\nu T-\mu T}(t + \nu T).$$

When computing $N\,\mathrm{Cov}\left[\widehat{R}_{s+p,s,N}, \widehat{R}_{t+q,t,N}\right]$ (where $s = t$) the term $R_{s+\mu T+p,\,s+\mu T}\, R_{t+\nu T+q,\,t+\nu T}$ is cancelled by the subtraction of the mean. The remaining two terms having products of $R$ collapse to a single sum due to the stationarity in the variables $(\mu, \nu)$. As an example, one of the remaining terms is

$$\frac{1}{N} \sum_{\mu=1}^{N-1} \sum_{\nu=1}^{N-1} R_{s+\mu T+p,\,t+\nu T+q}\, R_{s+\mu T,\,t+\nu T} = \sum_{\mu=-N+1}^{N-1} \left(1 - \frac{|\mu|}{N}\right) R_{s+\mu T+p,\,t+q}\, R_{s+\mu T,\,t}.$$

The term involving $s_{t,q,v}$ is established from (again $s = t$) the corresponding sum of the $A_{\mu,\nu}$ and the Lebesgue convergence theorem. ∎
9.6 Under the hypotheses of the proposition above, the estimators $\widehat{R}_{t+\tau,t,N}$ are asymptotically normal.
9.7 Here is a sketch of the proof of Proposition 9.14. The existence of $\sigma^2_{t,\tau} = \sum_{j=-\infty}^{\infty} R_{Z_{t,\tau}}(t + jT, t)$ may be verified for every $t, \tau$ using Isserlis' formula, (9.47), and $\sum_{\tau=-\infty}^{\infty} \sum_{t=0}^{T-1} |R_{t+\tau,t}|^2 < \infty$. But since we must also have $\lim_{\tau\to\infty} |R_{t+\tau,t}| = 0$, results of Maruyama [151] and Kolmogorov and Rozanov [133] show that $X_t$ is mixing (no rate specified). We get existence of all moments due to the fact that $X_t$ is Gaussian. Finally, apply Theorem 1.4 of Ibragimov [120] to the stationary sequence $\zeta_j = Z_{t+jT,\tau}$.
9.8 Give an example to show that if $X_1, X_2, X_3$ are i.i.d., then the random variables $X_1 X_2$ and $X_2 X_3$ are not necessarily independent. This applies to the discussion of program Bcoeff.m.
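One possible construction (ours, not necessarily the one the authors intend): take the $X_i$ i.i.d. Bernoulli(1/2). Both products share the factor $X_2$, and $P(X_1 X_2 = 1,\, X_2 X_3 = 1) = P(X_1 = X_2 = X_3 = 1) = 1/8$, while $P(X_1 X_2 = 1)\, P(X_2 X_3 = 1) = 1/16$, so the products are dependent. A quick simulation confirming the arithmetic:

```python
import random

random.seed(3)
n = 200_000
count_a = count_b = count_both = 0
for _ in range(n):
    x1, x2, x3 = (random.randint(0, 1) for _ in range(3))
    a, b = x1 * x2, x2 * x3           # the two products share X2
    count_a += a
    count_b += b
    count_both += a * b
p_a, p_b, p_ab = count_a / n, count_b / n, count_both / n
# p_ab is near 1/8 = 0.125 while p_a * p_b is near 1/16 = 0.0625,
# so X1*X2 and X2*X3 are not independent.
```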
9.9 Almost PC processes. For results on consistency and asymptotic normality for covariance and spectral estimators for almost PC processes (continuous and discrete time) see references [30, 39-41, 43, 44, 46, 67, 108, 109, 135].
9.10 Almost sure consistency of $\widehat{B}_{k,N}(\tau)$ via the random shift. Assume $\{Z_{t,\tau}\}$ is PC in $t$ for fixed $\tau$ and denote by $B_{Z,k}(\tau, j)$ the corresponding Fourier coefficients, where $R_Z(t, \tau, \tau') = E\{X_{t+\tau+\tau'} X_{t+\tau'} X_{t+\tau} X_t\} - R(t+\tau+\tau',\, t+\tau')\, R(t+\tau,\, t)$. Show the following are sufficient for a.s. consistency of $\widehat{B}_{k,N}(\tau)$:

(a) For some $\alpha > 0$, …

(b) $\sum_{j=-\infty}^{\infty} |B_{Z,0}(\tau, j)| < \infty$ with $k = 0, 1, \dots, T-1$, or $B_{Z,0}(\tau, j) = O(j^{-\alpha})$, are sufficient for (a).
CHAPTER 10
SPECTRAL ESTIMATION
Periodically Correlated Random Sequences: Spectral Theory and Practice. By H. L. Hurd and A. G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.

Suppose $X_t$ is a zero mean stationary sequence and we wish to estimate, from a single sample path, the spectral density function $f$, where $R(\tau) = \int_0^{2\pi} e^{i\lambda\tau} f(\lambda)\, d\lambda$. The typical approach to the nonparametric estimation of a continuous $f$ is through use of the sample Fourier transform

$$\widetilde{X}_N(\lambda) = \frac{1}{\sqrt{2\pi N}} \sum_{t=0}^{N-1} X_t\, e^{-i\lambda t}, \tag{10.1}$$

from which the periodogram

$$I_N(\lambda) = |\widetilde{X}_N(\lambda)|^2 \tag{10.2}$$

is formed. Practically, if $X_t$ is not known to be of zero mean, then we replace $X_t$ with $X_t - \widehat{m}$. The periodogram is an asymptotically unbiased estimator for $f$ although it is not consistent. Consistent estimators for $f$ are obtained by tapering (or weighting) the covariance estimator by a multiplicative kernel
$k(\cdot) \in \ell^1(-\infty, \infty)$ that is bounded and even with $k(0) = 1$. This estimator can equivalently be expressed as a smoothing of the periodogram by $W(\cdot)$, the Fourier transform of $k(\cdot)$; thus

$$\widehat{f}_N(\lambda) = \int_0^{2\pi} W_N(\lambda - \mu)\, I_N(\mu)\, d\mu, \tag{10.3}$$

where $\beta_N \to 0$ and $N\beta_N \to \infty$ as $N \to \infty$. This procedure gives smoothing kernels that are dependent on $N$ in a manner that produces the consistency of $\widehat{f}_N(\lambda)$. For example, if a stationary process $X$ is Gaussian and $R(\cdot) \in \ell^1(-\infty, \infty)$, then $\widehat{f}_N(\lambda)$ is a consistent estimator at every $\lambda$. Details may be found in many places [8, 27, 28, 88, 177, 198]. The consistency can be explained from the fact that neighboring periodogram values, say, $I_N(\lambda_1)$ and $I_N(\lambda_2)$ with $\lambda_1 \ne \lambda_2$, become asymptotically independent as $N \to \infty$. Hence in a fixed neighborhood $(\lambda_0 - \epsilon, \lambda_0 + \epsilon)$ of some $\lambda_0$, the number of independent periodogram values increases proportionally to $N$, suggesting that on the interval $(\lambda_0 - \epsilon, \lambda_0 + \epsilon)$ we can consistently estimate the average spectral density $(2\epsilon)^{-1} \int_{\lambda_0-\epsilon}^{\lambda_0+\epsilon} f(\lambda)\, d\lambda$. Generally this represents a bias in the estimation of $f(\lambda_0)$. However, if the smoothing is controlled as $N \to \infty$ in the manner prescribed above, the number of independent samples still increases fast enough to reduce the variation but yet the width of the smoothing kernel, and hence the bias, shrinks to zero, producing the consistency.

Here we are concerned with the estimation of the possibly complex density functions $f_k(\lambda)$ for real PC sequences when the $F_k(\cdot)$ in (6.15) are absolutely continuous with respect to Lebesgue measure so that

$$B_k(\tau) = \int_0^{2\pi} e^{i\lambda\tau} f_k(\lambda)\, d\lambda. \tag{10.4}$$
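The periodogram-smoothing procedure of (10.1)-(10.3) can be illustrated concretely. The sketch below is our own (pure Python, a direct DFT rather than an FFT, and a crude boxcar average over neighboring Fourier frequencies in place of a general kernel $W_N$); for unit-variance white noise the true spectral density is the constant $1/2\pi \approx 0.159$:

```python
import cmath, math, random

def periodogram(x):
    """I_N at the Fourier frequencies 2*pi*j/N, via a direct DFT,
    with the normalization of (10.1)-(10.2)."""
    N = len(x)
    I = []
    for j in range(N):
        X = sum(x[t] * cmath.exp(-2j * math.pi * j * t / N)
                for t in range(N))
        I.append(abs(X) ** 2 / (2 * math.pi * N))
    return I

def smoothed(I, half_width):
    """Average 2*half_width+1 neighboring periodogram ordinates,
    a boxcar stand-in for the kernel smoothing in (10.3)."""
    N = len(I)
    w = 2 * half_width + 1
    return [sum(I[(j + d) % N] for d in range(-half_width, half_width + 1)) / w
            for j in range(N)]

random.seed(4)
x = [random.gauss(0, 1) for _ in range(512)]
f_hat = smoothed(periodogram(x), half_width=8)
```

The raw periodogram ordinates fluctuate wildly (each is roughly an exponential random variable), while the smoothed values cluster near $1/2\pi$; shrinking the relative bandwidth as $N$ grows is what turns this into a consistent estimator.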
Since the measures $F_k(\cdot)$ can be identified (see Section 6.2 and Figures 6.1-6.3) with the concentration of the spectral measure $F$ in the harmonizable representation (1.15) on the lines $S_k = \{(\lambda_1, \lambda_2) : \lambda_1 \in [0, 2\pi),\ \lambda_2 = (\lambda_1 - k2\pi/T) \bmod 2\pi\}$, then an estimation of $f_k(\lambda)$ for $k = 0, 1, 2, \dots, T-1$ comprises a spectral analysis of the PC sequence. But $F$ can also be composed by splicing together the measures $\{F_{pq}\}$ from the matrix valued spectral measure $\mathbf{F}$ associated with the multivariate sequence $Z_t^k$, $k = 0, 1, \dots, T-1$ (see Proposition 6.8 and Figure 6.5). Or, to obtain $F_k$, splice together the $F_{pq}$ for which $(p-q) \bmod T = k$. Using the relationship

$$\mathcal{F}(d\lambda) = T\, V(\lambda)\, \mathbf{F}(d\lambda/T)\, V^{-1}(\lambda) \tag{10.5}$$
(see Proposition 6.9 for the definition of the unitary and continuous $V(\lambda)$) between $\mathcal{F}$ and the matrix valued spectral measure $\mathbf{F}$ of the blocked multivariate sequence $\mathbf{X}_n$, we arrive at the relation between the densities, namely,
$$\boldsymbol{f}(\lambda) = V(\lambda)\,f_{\mathbf{X}}(\lambda/T)\,V^{-1}(\lambda) \qquad (10.6)$$
when $\mathbf{F}$ and $\mathcal{F}$ are absolutely continuous with respect to Lebesgue measure. Recall from Corollary 6.9.1 that $|F| \ll |\mathcal{F}(\cdot/T)| \ll |F|$. Then any condition on $\mathbf{X}_n$ that permits the estimation of $f_{\mathbf{X}}(\lambda)$ gives a way to estimate $f_k(\lambda)$ via inverting (10.6) and splicing. But here we will mainly take the approach of estimating $f_k(\lambda)$ directly from the sample Fourier transform of an observed series, noting that these methods extend to continuous time and APC sequences. Spectral analysis for PC processes seems to have begun with Gudzenko [84] and was treated in the continuous time case, using the same methods employed here, by Hurd [101,105]. This treatment is essentially an adaptation of Parzen's work (e.g., see [177]); we note that Parzen's work on spectral density estimation for stationary processes having periodically missing observations [180] essentially treats the estimation of $f_0(\lambda)$ for a particular type of PC process. This problem has also been treated in a nonprobabilistic context by Gardner [66,67,69], who discusses the connection between the probabilistic and non-probabilistic treatments. An estimator for $f_k(\lambda)$ is constructed quite naturally from an extension of the periodogram.

10.1 THE SHIFTED PERIODOGRAM
The principal idea for the PC case is based on the formation of a two-dimensional periodogram
$$\tilde f_N(\lambda_1,\lambda_2) = \frac{1}{2\pi N}\,\tilde X_N(\lambda_1)\,\overline{\tilde X_N(\lambda_2)} \qquad (10.7)$$
from the tensor product of the sample Fourier transform (10.1) with itself. Since $I_N(\lambda) = \tilde f_N(\lambda,\lambda)$, the usual estimators for the stationary spectral density are formed by smoothing $\tilde f_N(\lambda_1,\lambda_2)$ along the main diagonal $\lambda_1 = \lambda_2$. Now, taking the clue from the support of the spectral covariance measure of a PC-T sequence (Figure 1.3), we can guess that estimators of $f_k(\lambda)$ can be formed by smoothing $\tilde f_N(\lambda_1,\lambda_2)$ along the line of support of $F$ in $[0,2\pi) \times [0,2\pi)$ that corresponds to $F_k$, that is, along $\lambda_2 = (\lambda_1 - k2\pi/T) \bmod 2\pi$ for $\lambda_1 \in [0,2\pi)$. To complete the details, we first show the shifted periodogram
$$\tilde f_{k,N}(\lambda) = \tilde f_N(\lambda,\ \lambda - 2\pi k/T) \qquad (10.8)$$
is the Fourier transform of $\hat B_{k,N}(\tau)$ and is an asymptotically unbiased but inconsistent estimator for $f_k(\lambda)$. Then we examine the smoothing of the shifted periodogram $\tilde f_{k,N}(\lambda)$ by a family of kernels, depending on $N$, to yield consistent estimators for $f_k(\lambda)$.
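For intuition, the shifted periodogram (10.8) can be computed from a single FFT. The sketch below (function name and normalization are illustrative, not the book's code) assumes $T$ divides $N$, so that the shift $2\pi k/T$ is a whole number of FFT bins:

```python
import numpy as np

def shifted_periodogram(x, k, T):
    """f_{k,N}(lambda_j) = X(lambda_j) * conj(X(lambda_j - 2*pi*k/T)) / (2*pi*N)
    at the Fourier frequencies lambda_j = 2*pi*j/N; requires T | N so the
    shift 2*pi*k/T is an integer number of FFT bins."""
    N = len(x)
    if N % T:
        raise ValueError("N must be a multiple of T")
    X = np.fft.fft(x)
    X_shift = np.roll(X, k * N // T)   # ordinates at lambda_j - 2*pi*k/T
    return X * np.conj(X_shift) / (2 * np.pi * N)

rng = np.random.default_rng(1)
x = rng.standard_normal(1024)
f0 = shifted_periodogram(x, 0, 16)     # k = 0 recovers the ordinary periodogram
f1 = shifted_periodogram(x, 1, 16)     # k > 0 is complex-valued in general
```

For $k = 0$ the result is real and nonnegative, as it must be for the ordinary periodogram; for $k > 0$ both real and imaginary parts carry information about $f_k$.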
Lemma 10.1 If $\{X_n,\ n = 0,\dots,N-1\}$ is any real sequence, then $\tilde f_{k,N}(\lambda)$ is the Fourier transform of $\hat B_{k,N}(\tau)$.
Proof. First write
$$\tilde f_{k,N}(\lambda) = \frac{1}{2\pi N}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} X_m X_n\, e^{-i\lambda(m-n)}\, e^{-i2\pi k n/T}.$$
We now perform the indicated sum over the region $[0,N-1]\times[0,N-1]$ in two parts, denoted $S_1$ for $u = m - n < 0$ and $S_2$ for $u \ge 0$. This produces
the identity
$$\tilde f_{k,N}(\lambda) = \frac{1}{2\pi}\sum_{u=-N+1}^{N-1} \hat B_{k,N}(u)\, e^{-iu\lambda}, \qquad (10.10)$$
where we take $\hat B_{k,N}(u) = 0$ for $u$ not in the interval $[-N+1,\ N-1]$. ∎
As in the stationary case, the shifted periodogram is asymptotically unbiased.

Proposition 10.1 If $X_t$ is PC-T and $\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{T-1} |R(t+\tau,\,t)| < \infty$, then $\tilde f_{k,N}(\lambda)$ is an asymptotically unbiased estimator for $f_k(\lambda)$.
Proof. Use (9.39) of Lemma 9.5 to write
$$E\,\tilde f_{k,N}(\lambda) = \frac{1}{2\pi}\sum_{u=-N+1}^{N-1} B_k(u)\,e^{-iu\lambda} + \frac{1}{2\pi}\sum_{u=-N+1}^{N-1} \epsilon_{k,N}(u)\,e^{-iu\lambda}. \qquad (10.11)$$
The condition $\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{T-1}|R(t+\tau,\,t)| < \infty$ ensures that $\sum_{\tau=-\infty}^{\infty}|B_k(\tau)| < \infty$, so that
$$f_k(\lambda) = \frac{1}{2\pi}\sum_{\tau=-\infty}^{\infty} B_k(\tau)\,e^{-i\tau\lambda}$$
is continuous in $\lambda$. As $N \to \infty$, the first term in (10.11) converges to $f_k(\lambda)$ and the second term converges to 0 by Lemma 9.5. ∎
Since the periodogram is an inconsistent estimator for $f(\lambda)$ in the stationary case, we certainly expect inconsistency of $\tilde f_{k,N}(\lambda)$ as an estimator for $f_k(\lambda)$. The following proposition leads to a similar conclusion for the PC case, assuming that $X_t$ is Gaussian.
Proposition 10.2 If $X_t$ is a Gaussian PC-T sequence for which $\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{T-1}|R(t+\tau,\,t)| < \infty$, then
$$\lim_{N\to\infty} \mathrm{Var}\,[\tilde f_{k,N}(\lambda)] = \begin{cases} f_0(\lambda)\,f_0(\lambda - 2\pi k/T), & \lambda \neq \pi n/T \\ f_0(\lambda)\,f_0(\lambda - 2\pi k/T) + |f_{n-k}(\pi n/T)|^2, & \lambda = \pi n/T. \end{cases} \qquad (10.12)$$
Proof. Using the definition (10.8) and Isserlis' formula again, we obtain
$$\mathrm{Var}\,[\tilde f_{k,N}(\lambda)] = I_1 + I_2, \qquad (10.13)$$
where
$$I_1 = \frac{1}{(2\pi N)^2}\sum_{s=0}^{N-1}\sum_{u=0}^{N-1}\sum_{t=0}^{N-1}\sum_{v=0}^{N-1} R_{s,u}\,R_{t,v}\; e^{-i\lambda(s-u+v-t)\,-\,i2\pi k(t-v)/T}$$
and
$$I_2 = \frac{1}{(2\pi N)^2}\sum_{s=0}^{N-1}\sum_{u=0}^{N-1}\sum_{t=0}^{N-1}\sum_{v=0}^{N-1} R_{s,v}\,R_{t,u}\; e^{-i\lambda(s-u+v-t)\,-\,i2\pi k(t-v)/T}.$$
The computations used in obtaining (10.11) from (10.10) and (9.39) may be used to evaluate the two terms appearing in (10.13). Setting $\tau_1 = s - u$ and $\tau_2 = t - v$, the first of the two terms may be written
where $\epsilon_1(N,\tau_1)$ and $\epsilon_2(N,\tau_2)$ are summable by virtue of Lemma 9.5. From the summability of $B_0(\tau)$ we may write
and so conclude
$$\lim_{N\to\infty} I_1 = f_0(\lambda)\,f_0(\lambda - 2\pi k/T). \qquad (10.15)$$
Similarly, for $\lambda \neq \pi n/T$, an application of the Riemann–Lebesgue lemma yields $\lim_{N\to\infty} I_2 = 0$. For $\lambda = \pi n/T$,
$$\lim_{N\to\infty} I_2 = f_{n-k}(\pi n/T)\,f_{k-n}(-\pi n/T) = |f_{n-k}(\pi n/T)|^2. \qquad (10.16)$$
∎

Regarding the issue of consistency, at any $\lambda$ for which $f_k(\lambda) \neq 0$ it is necessary, from the domination by the diagonal (see Section 6.3),
$$|f_k(\lambda)|^2 \le f_0(\lambda)\,f_0(\lambda - 2\pi k/T),$$
that both $f_0(\lambda) \neq 0$ and $f_0(\lambda - 2\pi k/T) \neq 0$; thus from (10.12) the asymptotic variance is positive. It may also be seen that the limits in (10.12) reduce to the familiar result (see Parzen [177]) for the stationary case by setting $k = n = 0$:
$$\lim_{N\to\infty} \mathrm{Var}\,[\tilde f_{0,N}(\lambda)] = \begin{cases} f_0^2(\lambda), & \lambda \neq 0, \pi \\ 2 f_0^2(\lambda), & \lambda = 0, \pi. \end{cases}$$
10.2 CONSISTENT ESTIMATORS

We now show that consistent estimators for the $f_k(\lambda)$ may be obtained by smoothing $\tilde f_N(\lambda_1,\lambda_2)$ along the support line in $[0,2\pi)\times[0,2\pi)$ that corresponds to $F_k$, that is, along $\lambda_2 = \lambda_1 - 2\pi k/T$. In view of (10.8) and Lemma 10.1, these estimators may also be expressed as a weighting of $\hat B_{k,N}(\tau)$ by a kernel $w(\cdot)$; that is,
$$\hat f_{k,N}(\lambda) = \frac{1}{2\pi}\sum_{\tau=-N+1}^{N-1} w(p_N\tau)\,\hat B_{k,N}(\tau)\,e^{-i\tau\lambda}, \qquad (10.17)$$
where $p_N$ is a positive sequence with $p_N \to 0$ and $N p_N \to \infty$ as $N \to \infty$. The sequence $w(j)$ is taken here to be even, bounded, summable and $w(0) = 1$;
thus $w(j)$ is also square summable, and the spectral smoothing kernel $W(\lambda)$ is the $L^2[0,2\pi)$ function determined by
$$W(\lambda) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty} w(j)\,e^{-ij\lambda}.$$
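To make the weighting concrete, here is a rough numerical sketch of a (10.17)-style estimator. It uses one common biased lag-product form for $\hat B_{k,N}(\tau)$ (the book's exact definition is in Chapter 9) and a Bartlett-type kernel for $w$; all function names and these particular choices are illustrative assumptions, not the book's code:

```python
import numpy as np

def b_hat(x, k, T, tau):
    """One common biased lag estimate of B_k(tau) (illustrative; cf. Chapter 9)."""
    N = len(x)
    t = np.arange(N - abs(tau))
    return np.sum(x[t + abs(tau)] * x[t] * np.exp(-2j * np.pi * k * t / T)) / N

def f_hat(x, k, T, lam, p_n, w=lambda v: max(0.0, 1.0 - abs(v))):
    """Smoothed estimator (2*pi)^(-1) * sum_tau w(p_n*tau) B_hat_k(tau) e^(-i*tau*lam)."""
    N = len(x)
    total = 0j
    for tau in range(-N + 1, N):
        wt = w(p_n * tau)
        if wt > 0.0:    # Bartlett kernel vanishes for |p_n * tau| >= 1
            total += wt * b_hat(x, k, T, tau) * np.exp(-1j * tau * lam)
    return total / (2 * np.pi)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
est = f_hat(x, 0, 16, lam=0.5, p_n=0.05)   # for white noise, near 1/(2*pi)
```

Shrinking `p_n` widens the lag window; the conditions $p_N \to 0$, $N p_N \to \infty$ stated above are what balance bias against variance.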
We now give two types of consistency results. The first is for Gaussian processes, where the Isserlis formula is the crucial ingredient, and the second is for $\phi$-mixing processes.
Gaussian Sequences. We now show that $\mathrm{Cov}\,[\hat f_{j,N}(\lambda_1),\ \hat f_{k,N}(\lambda_2)]$ is $O(1/Np_N)$ for arbitrary $j,k$ and $\lambda_1,\lambda_2$, and hence the consistency is established.
Proposition 10.3 If $X_t$ is a Gaussian PC-T sequence for which $\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{T-1}|R(t+\tau,\,t)| < \infty$, then there exists a $K > 0$ such that for any $j,k \in \{0,1,\dots,T-1\}$ and $\lambda_1,\lambda_2 \in [0,2\pi)$,
$$N p_N\,\mathrm{Cov}\,[\hat f_{j,N}(\lambda_1),\ \hat f_{k,N}(\lambda_2)] \le K$$
for $N$ sufficiently large.
Proof. From the definition of $\hat f_{k,N}(\lambda)$ we write
$$N p_N\,\mathrm{Cov}\,[\hat f_{j,N}(\lambda_1),\ \hat f_{k,N}(\lambda_2)] = \frac{N p_N}{(2\pi)^2}\sum_{\tau_1=-N+1}^{N-1}\sum_{\tau_2=-N+1}^{N-1} w(p_N\tau_1)\,w(p_N\tau_2)\,\mathrm{Cov}\,[\hat B_{j,N}(\tau_1),\ \hat B_{k,N}(\tau_2)]\,e^{-i\tau_1\lambda_1 + i\tau_2\lambda_2} = S_F + S_G + S_O, \qquad (10.18)$$
where the three terms $S_F$, $S_G$, and $S_O$ result from the expression (9.50) for $N\,\mathrm{Cov}\,[\hat B_{j,N}(\tau_1),\ \hat B_{k,N}(\tau_2)]$. Recall the real function $U_N(u,\tau_1,\tau_2) \le 1$ for $u \in [-N,N]$ increases to 1 as $N \to \infty$. First, the sum $S_O$, involving the $O(1/N)$ term in (9.50), when transformed by $v_j = p_N\tau_j$, $j = 1,2$, is bounded by
a quantity that converges to 0 as $N \to \infty$, because $w(j)$ is summable and $N p_N \to \infty$ as $N \to \infty$.
Now for the term $S_F$ containing $F(u,\tau_1,\tau_2)$, first set
$$S(\tau) = \sum_{j=0}^{T-1} |B_j(\tau)|$$
and note by hypothesis that $\sum_{\tau=-\infty}^{\infty} S(\tau) < \infty$. Then, after bounding the covariance terms by sums of $S$, the transformation $u_1 = \tau_1 - \tau_2$, $u_2 = \tau_2$ yields a bound for $S_F$.
An application of the Schwarz inequality for square summable sequences gives, for any $N$, a bound in terms of the square sums of $w$. The summability of $S(\tau)$ then produces a finite bound, and the result for $S_F$ follows from these facts. A similar analysis produces the result for $S_G$. ∎
We leave it as a problem to show the condition $\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{T-1}|R(t+\tau,\,t)| < \infty$ is sufficient for
$$\sum_{\tau=-\infty}^{\infty}\Big[\sum_{t=0}^{T-1}|R(t+\tau,\,t)|^2\Big]^{1/2} < \infty.$$
$\phi$-Mixing Sequences. Consistency of $\hat f_{k,N}(\lambda)$ under $\phi$-mixing assumptions is interesting because it can be established with considerably less effort than even for the case of Gaussian processes. The result here relies on Lemma 9.7, which gave a simple bound on the rate of convergence for $|\mathrm{Cov}\,[\hat B_{j,NT}(\tau_1),\ \hat B_{k,NT}(\tau_2)]|$.
Proposition 10.4 If $X_t$ is periodically stationary with period $T$, $E X_t^2 < \infty$ for $t \in \mathbb{Z}$, and uniformly $\phi$-mixing with $\sum_{j=-\infty}^{\infty}(\phi_j)^{1/2} < \infty$, and $k(j)$ is any sequence with $\sum_{j=-\infty}^{\infty} k(j)\,|j|^{1/2} < \infty$, then $\hat f_{k,N}(\lambda)$ is a consistent estimator for $f_k(\lambda)$.
Proof. The proof is a result of the estimates obtained from Lemma 9.7. The next to last line follows from considering $N$ fixed: there must be a $u_0$ (which may depend on $N$) and a $C_1'$ for which $[C_1|u/p_N| + C_2] \le C_1'\,|u/p_N|$ for $|u| > |u_0|$; and then for $|u| \le |u_0|$, since $[C_1|u/p_N| + C_2]$ is bounded, there is a $C_2'$ with $[C_1|u/p_N| + C_2] \le C_2'$. Combining these observations with the hypotheses gives the result. ∎
10.3 ASYMPTOTIC NORMALITY
Here we sketch some results on asymptotic normality of $\hat f_{k,N}(\lambda)$ for linear PC sequences. We omit detailed proofs but indicate the path to the result via stationary results presented in [28].

Proposition 10.5 Suppose $X_t = \sigma(t)\epsilon_t$ for $\epsilon_t$ real, i.i.d., mean zero with variance $\sigma_\epsilon^2$, and $\sigma(t) = \sigma(t+T) > 0$ for all $t$. Then for any $0 \le \lambda < \pi$ and $0 \le k < T$, we have that $\mathrm{Re}\,\hat f_{k,N}(\lambda)$ and $\mathrm{Im}\,\hat f_{k,N}(\lambda)$ are asymptotically normal.
This follows easily from Brockwell and Davis [28, Proposition 10.3.2], where we only need to see that the periodic variances $\sigma^2(t)$ have a minimum, and this controls the Lindeberg condition. We leave it as a problem to compute $\mathrm{Var}[\hat f_{k,N}(\lambda)]$ and $\mathrm{Cov}(\hat f_{k,N}(\lambda_1),\ \hat f_{j,N}(\lambda_2))$ for $\lambda, \lambda_1, \lambda_2$ in the Fourier frequencies for sample size $N$. Recall for PC white noise that $f_k(\lambda) = B_k(0)/2\pi$, $0 \le \lambda < 2\pi$, where $B_k(0) = \frac{1}{T}\sum_{t=0}^{T-1}\sigma^2(t)\,e^{-i2\pi kt/T}$. In the following, we assume $X_t$ is Gaussian in order to use Propositions 10.1, 10.2, and 10.3.
Proposition 10.6 Suppose $X_t$ is a Gaussian, zero mean real linear PC-T sequence
$$X_t = \sum_{j=-\infty}^{\infty} \psi_j(t)\,\xi_{t-j}, \qquad (10.20)$$
where the $\psi_j(t) = \psi_j(t+T)$, $j \in \mathbb{Z}$, are real and $\sum_j |\psi_j(t)| < \infty$ for each $t$. Then:

(a) $\tilde f_{k,N}(\lambda)$ is an asymptotically unbiased estimator for $f_k(\lambda)$;

(b)
$$\lim_{N\to\infty} \mathrm{Var}\,[\tilde f_{k,N}(\lambda)] = \begin{cases} f_0(\lambda)\,f_0(\lambda - 2\pi k/T), & \lambda \neq \pi n/T \\ f_0(\lambda)\,f_0(\lambda - 2\pi k/T) + |f_{n-k}(\pi n/T)|^2, & \lambda = \pi n/T; \end{cases} \qquad (10.21)$$

(c) for any $j,k \in \{0,1,\dots,T-1\}$ and $\lambda_1,\lambda_2 \in [0,2\pi)$, there exists a $K > 0$ for which $N p_N\,\mathrm{Cov}\,[\hat f_{j,N}(\lambda_1),\ \hat f_{k,N}(\lambda_2)] \le K$ for $N$ sufficiently large;

(d) $\hat f_{k,N}(\lambda)$ is asymptotically normal.
Proof. For (a)–(c), we first note that $\ell_1$ sequences are also $\ell_2$, so $\sum_{j=-\infty}^{\infty}|\psi_j(t)|^2 < \infty$; and with $\overline\psi_j = \max_{t=0,1,\dots,T-1}|\psi_j(t)|$ we have $\sum_{j=-\infty}^{\infty}\overline\psi_j < \infty$. Propositions 10.1, 10.2, and 10.3 then give the results. For (d), the same argument used in [28, Section 11.7] for asymptotic normality of smoothed estimators based on cross spectral densities is applicable here. In summary, a discrete frequency version of the smoothed periodogram estimator (10.17) is constructed from a shifted periodogram based on a sample Fourier transform at the Fourier frequencies $j2\pi/N$, $j = 0,1,\dots,N-1$. As $N$ increases, item (c) says that the number of effectively independent samples accumulated in the smoothing increases, giving the asymptotic normality. ∎
Another route to asymptotic results can be based on the fact that any linear PC sequence, when blocked, gives a linear $T$-variate stationary sequence (see (9.16)). Thus the asymptotic normality of spectral estimators for $f_{\mathbf X}(\lambda)$, the spectral density of the blocked sequence $\mathbf X_n$, can be transformed to estimators for $\boldsymbol f(\lambda/T)$ via the inversion of (10.6), and estimates for $f_k(\lambda)$ are produced by splicing the latter. Figures 6.1–6.5 may be helpful. We do not give the details here, but see [28, Section 11.7] and [88, page 289] for a detailed proof where fourth moments are required. Also see Nematollahi and Rao [167] for a treatment of spectral analysis for PC sequences based on $\mathbf X_n$. Confidence limits for $\hat f_{k,N}(\lambda)$ will be discussed in a later section.
10.4 SPECTRAL COHERENCE
Since we can get consistent estimators of $f_k(\lambda)$, a natural question is whether we can use the estimator $\hat f_{k,N}(\lambda)$ given by (10.17) to produce a test for the presence of PC-T structure. That is, if an observed value of $\hat f_{k,N}(\lambda)$ is significantly non-zero, we would declare that PC-T structure is present. Since $f_k(\lambda)$ can be identified with cross spectral densities, the notion of coherence (or coherency) provides a natural framework for making such tests. Recall coherency [27, Chapters 7 and 8] indicates the linear relation between the random spectral measures $\xi_1$ and $\xi_2$ of two stationary series. For PC sequences, we wish to measure the linear relation between the random amplitudes $\xi(d\lambda)$ and $\xi(d\lambda - \lambda_0)$, where $\lambda_0 = 2\pi k/T$. Thus spectral coherence refers to coherence statistics applied to the random spectral measure of a possibly nonstationary sequence; in order to obtain empirical measurements, we apply it to the sample spectra, namely, the sample Fourier transform. Many of the properties of complex random variables that are pertinent to spectral analysis of time series were initiated by N. R. Goodman in his thesis and subsequently (see [55,80–82]), including the idea of testing for various nonstationary structures based on correlations among FFT ordinates.

10.4.1 Spectral Coherence for Known T
For a PC-T sequence whose spectral measures $F_k$, $k = 0,1,\dots,T-1$, are absolutely continuous with respect to Lebesgue measure, the domination of the diagonal of $F$, namely $|f_k(\lambda)|^2 \le f_0(\lambda)\,f_0(\lambda - 2\pi k/T)$, gives a natural way to measure whether $f_k(\lambda)$ is large. That is, for $f_0(\lambda) \neq 0$ and $f_0(\lambda - 2\pi k/T) \neq 0$, we define the theoretical complex coherence (coherency) between the random amplitudes at frequencies $\lambda$ and $\lambda - 2\pi k/T$ by
$$\gamma(\lambda,\ \lambda - 2\pi k/T) = \frac{f_k(\lambda)}{[f_0(\lambda)\,f_0(\lambda - 2\pi k/T)]^{1/2}}. \qquad (10.22)$$
Assuming the conditions of Proposition 10.1, we can also write
$$\gamma(\lambda,\ \lambda - 2\pi k/T) = \lim_{N\to\infty} \mathrm{Corr}\,[\tilde X_N(\lambda),\ \tilde X_N(\lambda - 2\pi k/T)].$$
Adapting this to sample quantities, we replace the quantities in the first line of (10.22) with their estimates, $\hat f_{k,N}(\lambda)$, $\hat f_{0,N}(\lambda)$, and $\hat f_{0,N}(\lambda - 2\pi k/T)$,
and then express these estimates in terms of the values of a sample Fourier transform evaluated at the usual Fourier frequencies. From (10.17), setting $\lambda_j = j2\pi/N$, this leads to
$$\hat f_{k,N}(\lambda_j) = \sum_{m=1}^{M} W_m(N)\,\tilde X_N(\lambda_{j+m})\,\overline{\tilde X_N(\lambda_{j+m} - 2\pi k/T)}, \qquad (10.23)$$
where in the discrete frequency approximation the weights $W_m(N)$ incorporate the denominator term $1/p_N$ from the first line. Thus we obtain an approximate squared coherence
$$|\hat\gamma(\lambda_j,\ \lambda_j - 2\pi k/T)|^2 = \frac{\big|\sum_{m=1}^{M} W_m\,\tilde X_N(\lambda_{j+m})\,\overline{\tilde X_N(\lambda_{j+m} - 2\pi k/T)}\big|^2}{\sum_{m=1}^{M} W_m\,|\tilde X_N(\lambda_{j+m})|^2\ \sum_{m=1}^{M} W_m\,|\tilde X_N(\lambda_{j+m} - 2\pi k/T)|^2}. \qquad (10.24)$$
Since the smoothing sequence $W_m$ is typically concentrated over a small interval of frequencies, we simplify to the case where $W_m$ is constant, and thus
$$|\hat\gamma(\lambda_j,\ \lambda_j - 2\pi k/T)|^2 = \frac{\big|\sum_{m=1}^{M} \tilde X_N(\lambda_{j+m})\,\overline{\tilde X_N(\lambda_{j+m} - 2\pi k/T)}\big|^2}{\sum_{m=1}^{M} |\tilde X_N(\lambda_{j+m})|^2\ \sum_{m=1}^{M} |\tilde X_N(\lambda_{j+m} - 2\pi k/T)|^2}. \qquad (10.25)$$
The quantity $|\hat\gamma(\lambda_j,\ \lambda_j - 2\pi k/T)|^2$ is called the sample magnitude squared coherence (or spectral coherence), where the dependence on $M$ is suppressed. A slightly more general version (see (10.30)) of sample magnitude squared coherence was one of the tests proposed by Goodman for testing for various nonstationary structures [82] based on correlations among FFT ordinates. It turns out to be perfectly matched to perceiving the spectral correlations of PC sequences. Under the null case, where the FFT ordinates $Z_j$ are complex Gaussian with uncorrelated real and imaginary parts for each $j$ and $E\{Z_j \overline{Z_{j'}}\} = 0$ for $j \neq j'$, the sample squared coherence $|\hat\gamma|^2$ has probability density
$$p(|\hat\gamma|^2) = (M-1)\,\big(1 - |\hat\gamma|^2\big)^{M-2}, \qquad 0 \le |\hat\gamma|^2 \le 1. \qquad (10.26)$$
Setting $X = |\hat\gamma|^2$, it is easily determined that $P[X \le x] = 1 - (1-x)^{M-1}$, which for a Type I error of $\alpha$ leads to the solution for the $\alpha$-threshold
$$x_\alpha = |\gamma|_\alpha^2 = 1 - e^{\log(\alpha)/(M-1)}. \qquad (10.27)$$
Note that (10.26) depends only on the length M of the smoothing window. Since M is the number of complex products, it should be identified with 2M real products or degrees of freedom in the usual sense.
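Numerically, the threshold (10.27) is one line. The quick Python sketch below (illustrative, not the book's code) reproduces with $M = 32$ the values $|\gamma|^2_{0.05} \approx 0.0921$ and $|\gamma|^2_{0.05/512} \approx 0.2576$ quoted later for Figure 10.4:

```python
def coherence_threshold(alpha, M):
    """alpha-threshold from P[|gamma|^2 <= x] = 1 - (1 - x)**(M - 1):
    x_alpha = 1 - exp(log(alpha)/(M - 1)) = 1 - alpha**(1/(M - 1))."""
    return 1.0 - alpha ** (1.0 / (M - 1))

print(coherence_threshold(0.05, 32))        # about 0.0921
print(coherence_threshold(0.05 / 512, 32))  # about 0.2576 (Bonferroni-corrected)
```

Note how the threshold falls as $M$ grows, which is the sensitivity argument discussed below for the choice of $M$.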
If, in (10.24) or (10.25), the $Z_j$ do not have mean zero, but share a common mean $E(Z_j) = \mu$, then (10.25) may be replaced by its mean-corrected version, in which the sample mean of each set of FFT ordinates is subtracted before the products are formed (10.28), and then
$$p(|\hat\gamma|^2) = (M-2)\,\big(1 - |\hat\gamma|^2\big)^{M-3},$$
leading to the solution for the $\alpha$-threshold
$$|\gamma|_\alpha^2 = 1 - e^{\log(\alpha)/(M-2)}. \qquad (10.29)$$
In summary, if $|\hat\gamma(\lambda_j,\ \lambda_j - 2\pi k/T)|^2$ exceeds the threshold, then we can declare that evidence of PC structure exists for these specific $T$, $\lambda$, and $k$. We repeat for emphasis that the preceding test is for a specific $T$, $\lambda$, and $k$. Frequently, $T$ can be considered known, and then a test for presence of PC-T structure can be constructed from a family of hypothesis tests defined by a set $H_T$ of pairs $(\lambda_j, k)$. If there is no prior knowledge, $H_T$ should cover the set $[0,2\pi) \times \{1,2,\dots,T-1\}$ by a finite collection of points that accounts for $X_t$ being real. In this case we suggest $H_T = \Lambda \times \{1,2,\dots,\lfloor(T-1)/2\rfloor\}$, where $\Lambda = \{\lambda_j = jM\pi/N,\ j = 1,2,\dots,\lfloor N/M\rfloor\}$. If none of the null hypotheses in $H_T$ is rejected, then there is no evidence within the family $H_T$ that $X_t$ is PC-T. However, the thresholds for the individual tests require adjustment for multiple hypotheses. This problem has not been systematically studied for the hypothesis testing problems connected with determining the presence of PC structure. Our elementary approach has been to use the Bonferroni correction together with a simply reasoned estimate of the number of independent tests in the family.
10.4.2 Spectral Coherence for Unknown T

In our previous discussion of spectral coherence, $T$ was assumed known. Here we show how spectral coherence can be used as a basis for testing for the presence of PC structure when $T$ is unknown. Rather than restricting the computation of the coherence statistic to a specific support line determined by $T$, we compute
$$|\hat\gamma(\lambda_p,\ \lambda_q,\ M)|^2 = \frac{\big|\sum_{m=1}^{M} \tilde X_N(\lambda_{p+m})\,\overline{\tilde X_N(\lambda_{q+m})}\big|^2}{\sum_{m=1}^{M} |\tilde X_N(\lambda_{p+m})|^2\ \sum_{m=1}^{M} |\tilde X_N(\lambda_{q+m})|^2} \qquad (10.30)$$
for $(p,q)$ in a square array and determine the $(p,q)$ for which $|\hat\gamma(\lambda_p,\lambda_q,M)|^2$ is significant. The perception of this empirical spectral coherence is aided by plotting the coherence values only at points where a threshold is exceeded, where the threshold is determined by the null distribution of $|\hat\gamma(\lambda_p,\lambda_q,M)|^2$
under the assumption that the $\tilde X_j$ are i.i.d. complex Gaussian variates, so that (10.26) may be used for setting thresholds. The spectral coherence statistic of (10.30) is sometimes called diagonal spectral coherence because it may be seen as a smoothing of the two-dimensional periodogram
$$\tilde f(N,j,k) = \frac{1}{2\pi N}\,\tilde X_j\,\overline{\tilde X_k}$$
along a diagonal line (having unity slope) starting at the coordinate $(p,q)$, and then normalizing by the product of the smoothed diagonal terms. Since the support of the spectral measure $F$ for PC sequences consists of straight lines of unity slope, the diagonal spectral coherence computation gives a test for the presence of PC structure [107].
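A direct (unoptimized) sketch of this diagonal scan follows; it uses a uniform window and illustrative names, computing $|\hat\gamma(\lambda_p,\lambda_q,M)|^2$ for every offset $d = p - q$ over the lower quarter square:

```python
import numpy as np

def diagonal_coherence(x, M):
    """|gamma(lambda_p, lambda_q, M)|^2 on a (p, d) grid with d = p - q,
    smoothing M adjacent FFT ordinates with a uniform window."""
    X = np.fft.fft(x)
    N = len(X)
    n = N // 2                       # scan the lower-left quarter square
    coh = np.zeros((n - M, n))
    for d in range(n):
        Y = np.roll(X, d)            # Y[p] = X[p - d], i.e. q = p - d
        num = X[:n] * np.conj(Y[:n])
        a = np.abs(X[:n]) ** 2
        b = np.abs(Y[:n]) ** 2
        for p in range(n - M):
            s = slice(p, p + M)
            coh[p, d] = np.abs(num[s].sum()) ** 2 / (a[s].sum() * b[s].sum())
    return coh

rng = np.random.default_rng(5)
coh = diagonal_coherence(rng.standard_normal(256), M=8)
```

By the Schwarz inequality every entry lies in $[0,1]$, and the $d = 0$ column is identically 1 (each ordinate is perfectly coherent with itself); significant off-diagonal columns at $d = N/T, 2N/T, \dots$ are the signature of PC-$T$ structure.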
Effects of Parameter M. From the viewpoint of sensitivity, (10.27) shows that the threshold $|\gamma|_\alpha^2$ decreases as $M$ increases, meaning smaller values of true coherence will be called significant. So sensitivity argues for larger $M$. But since the parameter $M$ controls the length of a smoothing window applied to some diagonal line, we see that choosing $M$ too small relative to the smoothness of the underlying coherence will diminish our ability to detect the presence of significant coherence. On the other hand, if the underlying coherence varies rapidly along some diagonal line, then choosing $M$ too large will cause the coherence statistic to be diluted with terms having low values of true coherence. This causes the effective $M$ to be smaller, and hence the threshold for significant coherence is set too low, producing too many erroneous rejections of the null. As in most nonparametric smoothing procedures, it is thus useful to observe the results of a collection of smoothing parameters. We typically use $M = 8, 16$, and $32$ to begin. If the underlying coherence along some line were very smooth (in the limit, uniform, as in PC white noise), then we are motivated to consider making $M$ as large as permitted (i.e., $M = N$) so that only one value of spectral coherence is determined for each separation $d$ from the main diagonal. It is not difficult to show that for $M = N$ the numerator sum of (10.30) is given by
$$\sum_{m=1}^{N} \tilde X_N(\lambda_{p+m})\,\overline{\tilde X_N(\lambda_{q+m})} = N\sum_{n=0}^{N-1} |X_n|^2\, e^{-i2\pi dn/N}, \qquad d = p - q;$$
thus it may be seen that $|\hat\gamma(\lambda_p,\lambda_q,M)|^2$, as a function of $d = p - q$, is proportional to the magnitude of the normalized Fourier transform of $|X_n|^2$. That is, it becomes the usual periodogram of the squares, whose utility (and limitations) for recognizing the presence of PC structure we have already seen (see Sections 2.2.6 and 2.2.7). Specifically, the periodogram of squares is useful only when $\sigma^2(t)$ is properly periodic, which corresponds to $B_k(0) \neq 0$ for some $k > 0$. If $B_k(0) = 0$ for some $k > 0$, since $B_k(0) = \int_0^{2\pi} f_k(\lambda)\,d\lambda$, we
conclude that the density $f_k(\lambda)$ along the $k$th support line integrates to 0. From (10.22) we can see that the theoretical coherence $\gamma(\lambda,\ \lambda - 2\pi k/T)$ can thus change sign along the support line.
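The $M = N$ limiting statistic just described is essentially the periodogram of the squares. A small sketch (illustrative names; numpy assumed) shows how a properly periodic variance produces a peak at frequency index $N/T$:

```python
import numpy as np

def periodogram_of_squares(x):
    """Periodogram of X_t^2 (mean removed); a peak near N/T signals a periodic variance."""
    y = np.abs(np.asarray(x)) ** 2
    y = y - y.mean()                 # remove the DC term
    N = len(y)
    return np.abs(np.fft.fft(y)) ** 2 / N

rng = np.random.default_rng(2)
T, N = 16, 1024
sigma = 1.0 + 0.8 * np.cos(2 * np.pi * np.arange(N) / T)
x = sigma * rng.standard_normal(N)   # PC white noise: properly periodic variance
P = periodogram_of_squares(x)
peak = int(np.argmax(P[1 : N // 2])) + 1   # expected at N/T = 64
```

If instead the variance is constant (as in the polarity-modulated example later in this chapter), this statistic shows nothing, which is exactly the limitation noted above.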
10.5 SPECTRAL ESTIMATION: PRACTICE
The practical computations of $\hat f_{k,N}(\lambda)$ and $|\hat\gamma(\lambda_j,\ \lambda_j - 2\pi k/T)|^2$ follow exactly as described above. The sample Fourier transform $\tilde X_N(\lambda)$ is computed for a finite collection of $\lambda$, and for given $T$ the shifted periodograms are computed and smoothed to produce estimates $\hat f_{k,N}(\lambda)$ of $f_k(\lambda)$. Using the distribution of $|\hat\gamma(\lambda_j,\ \lambda_j - 2\pi k/T)|^2$ we make a test to see if $\hat f_k(\lambda)$ is significant in comparison to $f_0^{1/2}(\lambda)\,f_0^{1/2}(\lambda - 2\pi k/T)$. In contrast, confidence intervals for $f_k(\lambda)$ or $\gamma(\lambda,\ \lambda - 2\pi k/T)$ tell us something about the estimation errors.

10.5.1 Confidence Intervals
First note that $f_k(\lambda)$ and $\hat f_{k,N}(\lambda)$ are typically not real when $k > 0$. As in Brillinger [26], we treat the real and imaginary parts separately, using a Student's $t$ to describe the distribution of $\mathrm{Re}\,[\hat f_k(\lambda_j) - f_k(\lambda_j)]$ and $\mathrm{Im}\,[\hat f_k(\lambda_j) - f_k(\lambda_j)]$ relative to their sample variances $\hat s^2_{\mathrm{re}}$ and $\hat s^2_{\mathrm{im}}$. Then, setting $\Delta_{\mathrm{re}}$ and $\Delta_{\mathrm{im}}$ to be the corresponding $t$-based half-widths (10.32), the confidence interval for $\mathrm{Re}\,f_k(\lambda)$ is $[\mathrm{Re}\,\hat f_k(\lambda_j) - \Delta_{\mathrm{re}},\ \mathrm{Re}\,\hat f_k(\lambda_j) + \Delta_{\mathrm{re}}]$, and the confidence interval for $\mathrm{Im}\,f_k(\lambda)$ is $[\mathrm{Im}\,\hat f_k(\lambda_j) - \Delta_{\mathrm{im}},\ \mathrm{Im}\,\hat f_k(\lambda_j) + \Delta_{\mathrm{im}}]$. Confidence intervals for $|\gamma(\lambda,\ \lambda - 2\pi k/T)|$ can also be based on the Fisher transformation. Enochson and Goodman [55] show that for
$$z = \mathrm{arctanh}\,|\hat\gamma| = \tfrac{1}{2}\log\frac{1+|\hat\gamma|}{1-|\hat\gamma|} \qquad (10.33)$$
and for large values ($0.4 < |\gamma|^2 < 0.95$), $z$ is close to $N(\mu_z,\sigma_z^2)$, where
$$\sigma_z^2 = \frac{1}{2M}, \qquad \mu_z = \zeta + \frac{1}{2(M-1)},$$
with $\zeta = \mathrm{arctanh}\,|\gamma|$. Nuttall and Carter [173] show graphically the errors in the transformation for a few values of $M$. But with modern computing, the confidence intervals can be computed exactly, as in Wang and Tang [220].
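The Fisher-transform approximation can be sketched as follows; this simplified version keeps only the $1/\sqrt{2M}$ standard deviation and drops the small bias term, and the function name is illustrative:

```python
import math

def coherence_ci(gamma_hat, M, z_crit=1.96):
    """Approximate CI for |gamma| via z = atanh(|gamma_hat|), treating z as
    roughly normal with std 1/sqrt(2M); the bias term 1/(2(M-1)) is omitted."""
    z = math.atanh(gamma_hat)
    half = z_crit / math.sqrt(2 * M)
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = coherence_ci(0.6, 32)   # an interval around the observed 0.6
```

Mapping back through tanh keeps the interval inside $(0, 1)$, which a naive symmetric interval on $|\hat\gamma|$ would not guarantee.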
The estimators for $f_k(\lambda)$ and $|\gamma(\lambda,\ \lambda - 2\pi k/T)|^2$ have been implemented in programs fkest.m and scoh.m. Although the programs were constructed for real series, only slight modifications should be required to make them applicable to complex series.

Program fkest.m. For specified period $T$ and $k$, the program fkest.m implements the estimator $\hat f_{k,N}(\lambda)$. The Fourier transform of the sample is computed by the FFT program at the frequencies $\Lambda_F = \{\lambda_j = j2\pi/N,\ j = 0,1,\dots,N-1\}$. The estimator $\hat f_{k,N}(\lambda)$ is computed for a subset of $\Lambda_F$ using a specified window $W_m$. The quantities $\mathrm{Re}\,\hat f_{k,N}(\lambda)$ and $\mathrm{Im}\,\hat f_{k,N}(\lambda)$ are plotted along with their confidence intervals determined by Student's $t$ as discussed earlier. In addition, the values of $|\hat\gamma(\lambda,\ \lambda - 2\pi k/T)|^2$ are plotted along with the threshold (10.27) for significant coherence.

Program scoh.m. First, an FFT of length $N$ is computed at the frequencies $\Lambda_F = \{\lambda_j = j2\pi/N,\ j = 0,1,\dots,N-1\}$. Next, the magnitude squared coherence $|\hat\gamma(\lambda_p,\lambda_q,M)|^2$ is computed on a specified square set of $(p,q)$ using a specified smoothing window $W_m$. Only values of $|\hat\gamma|^2$ exceeding the threshold (10.27) are plotted. For multiple hypothesis corrections, we replace the specified $\alpha$ with $\alpha/N_s$, where $N_s$ is the number of points on one side of the plotting square. A provision is also made to smooth the resulting image with a two-dimensional smoothing window.
10.5.2 Examples
10.5.2.1 White Noise. Starting with a very simple case, we examine $\hat f_{k,N}(\lambda_j)$, computed as if the sequence were PC with $T = 16$, when it is actually just white noise. Therefore we know the support of the resulting spectral measure $F$ is just the main diagonal of $[0,2\pi)\times[0,2\pi)$, and hence $f_k(\lambda) = 0$ for all $\lambda$ when $k > 0$ and $B_k(\tau) = 0$ for all $\tau$ when $k > 0$. Figure 10.1 presents $\hat f_{1,N}(\lambda_j)$, $N = 1024$, for $\lambda_j = 2j\pi/N$, $j = 0,1,\dots,255$, in the case that $X_n$ is just stationary white noise and hence $E\{\tilde X_j \overline{\tilde X_{j'}}\} = 0$ for $j \neq j'$. The smoothing weights are uniform, $W_k = 2\pi/NM$, with $M = 16$. It is clear that $B_k(\tau) = 0$ and hence $f_k(\lambda) = 0$ for $k \neq 0$; in addition, for $k = 0$ we have $B_0(0) = \sigma_\epsilon^2$ and $f_0(\lambda) = \sigma_\epsilon^2/2\pi$. The confidence intervals for $\mathrm{Re}\,\hat f_{1,N}(\lambda)$ and $\mathrm{Im}\,\hat f_{1,N}(\lambda)$ are determined by the $t$ statistic method (10.32) with $\nu = M - 1$ degrees of freedom. The spectral coherence images in Figure 10.2 show, for white Gaussian noise, the speckled character of the image for a small smoothing window ($M = 4$) and the effect of increasing the window to $M = 16$. Since a spectral coherence image is a presentation of many values of $|\hat\gamma|^2$, the issue of multiple hypothesis correction naturally arises. By counting only
Figure 10.1 Estimate $\hat f_{1,N}(\lambda)$ and $|\hat\gamma(\lambda,\ \lambda - 2\pi/T)|^2$ for white noise, using an FFT of length 1024, $T = 16$, and a uniform smoothing window of length 16. (a) Top is $\mathrm{Re}\,\hat f_{1,N}(\lambda)$; bottom is $\mathrm{Im}\,\hat f_{1,N}(\lambda)$. Confidence intervals based on the $t$ statistic (95%). (b) Top is $|\hat\gamma(\lambda,\ \lambda - 2\pi/T)|^2$; bottom is $\arg\hat\gamma(\lambda,\ \lambda - 2\pi/T)$. Coherence threshold based on (10.27) with $\alpha = 0.05$.
one side of the diagonal (say, $j > i$) for an image of side $n$, we obtain $n(n-1)/2$ distinct values of $|\hat\gamma|^2$. But these are clearly not independent random variables, so using the Bonferroni correction $\alpha/(n(n-1)/2)$ in place of $\alpha$ is too harsh. Using $\alpha/n$, although still conservative, is considered more reasonable. The bottom two spectral coherence images of Figure 10.2 show the same images as the top row, but the plot threshold is determined by (10.27) with $\alpha/n$ in place of $\alpha$.

10.5.2.2 PC White Noise. If $X_t$ is PC white noise (see Sections 2.2.2 and 6.4.1), $X_t = \sigma(t)\epsilon_t$, where $\epsilon_t$ is stationary white noise and $\sigma(t) = \sigma(t+T)$, then from the remarks following (6.59) we have
$$f_k(\lambda) = \frac{B_k(0)}{2\pi}, \qquad 0 \le \lambda < 2\pi,$$
where $B_k(0) = T^{-1}\sum_{t=0}^{T-1}\sigma^2(t)\,e^{-i2\pi kt/T}$. Taking the simple $\sigma(t) = \sigma_0[1 + \alpha\cos(2\pi t/T)]$, we obtain $B_0(0) = \sigma_0^2(1+\alpha^2/2)$, $B_1(0) = \alpha\sigma_0^2$, $B_2(0) = \alpha^2\sigma_0^2/4$, and since the sequence is real, $B_{T-1}(0) = B_1(0)$, $B_{T-2}(0) = B_2(0)$. All other values of $k$ produce $B_k(0) = 0$. It follows (for $\sigma_0 = 1$) that $f_0(\lambda) = (1+\alpha^2/2)/2\pi$, $f_1(\lambda) = \alpha/2\pi$, and $f_2(\lambda) = (\alpha^2/4)/2\pi$. All other values of $k$ produce $f_k(\lambda) = 0$. The support of the resulting spectral measure $F$ is shown in Figure 10.3 for $T = 16$. Figure 10.4(a) presents $\hat f_{1,N}(\lambda_j)$, $N = 1024$, for $\lambda_j = 2j\pi/N$, $j = 0,1,\dots,255$, for PC white noise with $\alpha = 1$. The smoothing weights are uniform, $W_k = 2\pi/NM$, with $M = 16$.
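The closed-form values of $B_k(0)$ above are easy to verify numerically; a quick check (numpy assumed) computes $B_k(0)$ as the DFT of $\sigma^2(t)$ for $\sigma_0 = \alpha = 1$:

```python
import numpy as np

T, sigma0, alpha = 16, 1.0, 1.0
t = np.arange(T)
var = (sigma0 * (1.0 + alpha * np.cos(2 * np.pi * t / T))) ** 2   # sigma^2(t)
B = np.fft.fft(var) / T       # B_k(0) = T^{-1} sum_t sigma^2(t) e^{-i 2 pi k t / T}
# closed forms: B_0(0) = sigma0^2 (1 + alpha^2/2) = 1.5, B_1(0) = alpha*sigma0^2 = 1,
#               B_2(0) = alpha^2 sigma0^2 / 4 = 0.25, and B_k(0) = 0 for k = 3..T-3
```

The conjugate-symmetric entries `B[T-1]` and `B[T-2]` reproduce $B_{T-1}(0)$ and $B_{T-2}(0)$, consistent with the sequence being real.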
Figure 10.2 Spectral coherence image of white noise based on an FFT of length 1024. Only values exceeding the $\alpha$ thresholds are plotted. For (a) and (b), $\alpha = 0.05$. For (c) and (d), the Bonferroni correction is used, $\alpha = 0.05/512$.
Figure 10.3 Support sets $S_1$ and $S_{15}$ for real $X_t = \sigma_0[1 + \alpha\cos(2\pi t/16)]\epsilon_t$. Since $X_t$ is real, $B_{15}(\tau) = \overline{B_1(\tau)}$ and $f_1(\lambda) = \overline{f_{15}(\lambda)}$.
It is clear from the top of Figure 10.4(b) that almost all of the $|\hat\gamma|^2$ values are significant relative to the 0.05 threshold. The estimates and confidence intervals for $\mathrm{Re}\,\hat f_{1,N}(\lambda)$ and $\mathrm{Im}\,\hat f_{1,N}(\lambda)$ show that the values of $\mathrm{Im}\,\hat f_k(\lambda)$ evidently are not contributing to the large values of $|\hat\gamma(\lambda,\ \lambda - 2\pi/T)|^2$. This is correct, as the true values of $\mathrm{Im}\,f_1(\lambda)$ are zero. Figures 10.4(c) and 10.4(d) show, for the PC white noise simulation with $\alpha = 1$, spectral coherence images based on an FFT of length 1024. Note the indices run from 1 to 512, corresponding to $\lambda_j = 2j\pi/N$, $j = 0,1,\dots,511$. This corresponds to the lower left quarter square $[0,\pi)\times[0,\pi)$ of $[0,2\pi)\times[0,2\pi)$, and so the support sets of $F_1$ and $F_{15}$ outside this quarter square are not seen. See Figure 10.3. But the value of the period $T$ may easily be inferred from these images by determining the least value of $p - q$ for which there is significant coherence. The plot threshold for Figure 10.4(c) is determined by p-value $= 0.05$, whereas for Figure 10.4(d) the plot threshold is determined by p-value $= 0.05/512$.

10.5.2.3 PAR(1). We demonstrate the use of spectral coherence on some simulated PAR(1) series, where we can compute the true spectral density. Figure 10.5 shows 1200 points of a simulated PAR(1) series $X_t = \phi(t)X_{t-1} + \xi_t$ for which $\phi(t) = 0.6 + 0.3\cos(2\pi t/12)$. Figure 10.6(a) shows the estimates $\mathrm{Re}\,\hat f_{0,N}(\lambda)$ and $\mathrm{Im}\,\hat f_{0,N}(\lambda)$ due to smoothing by $M = 32$ points, along with the true densities computed according to (8.39). Note the non-zero values of $\mathrm{Im}\,\hat f_{0,N}(\lambda)$ are essentially computational noise and may be considered zero. The two panels of Figure 10.6(b) show the
Figure 10.4 Simulated periodic white noise, $X_t = [1+\cos(2\pi t/T)]\epsilon_t$ with $T = 16$. Spectral estimates based on an FFT of length 1024 and $M = 16$. (a) Top is $\mathrm{Re}\,\hat f_{1,N}(\lambda)$ and bottom is $\mathrm{Im}\,\hat f_{1,N}(\lambda)$, both with 95% confidence intervals. (b) Top is $|\hat\gamma(\lambda,\ \lambda - 2\pi/T)|^2$ with $\alpha = 0.05$ threshold; bottom is $\arg[\mathrm{Re}\,\hat f_{1,N}(\lambda) + i\,\mathrm{Im}\,\hat f_{1,N}(\lambda)]$ (radians). In (c) and (d), only values exceeding the $|\gamma|^2$ thresholds are plotted. (c) $M = 32$, $|\gamma|^2_{0.05} = 0.0921$. (d) $M = 32$, $|\gamma|^2_{0.05/512} = 0.2576$.
estimates $\mathrm{Re}\,\hat f_{1,N}(\lambda)$ and $\mathrm{Im}\,\hat f_{1,N}(\lambda)$. The confidence intervals contain the theoretically computed densities. Figure 10.7 presents a spectral coherence image for this series based on one FFT of length 1200. Since $T = 12$, there are 100 periods in the sample, and so the $F_1$ line should occur at a shift of $1200/12 = 100$ from the diagonal. The line
is present, but it does not extend very far because $|\hat\gamma(\lambda,\ \lambda - 2\pi/T)|^2$ quickly diminishes as $\lambda$ increases.
+
Figure 10.6 Estimates and true values of fo(X) and f l ( A ) for simulated PAR(1) with + ( t )= 0.6 0.3cos(27it/12). The FFT length was 1200 and smoothing was uniform with M = 32. Solid lines are true values: dashed lines are estimates and 95% confidence intervals. (a) Top is Re f o , ~ ( X ) and bottom is I m (b) Top is Re f?.,y(X) and bottom is Im f?,l,~(X).
+
fl.h~(X).
10.5.2.4 Polarity Modulated AR(1) llJe begin with the zero mean AR(1) sequence yt = O I ' -+~c t . where c t is an orthonormal sequence and @ < 1. Then
319
SPECTRAL ESTIMATION. PRACTICE
Figure 10.7 Spectral coherence image using one FFT of length 1200 and $M = 64$ for simulated PAR(1) for which $\phi(t) = 0.6 + 0.3\cos(2\pi t/12)$.
we form
$$X_t = \begin{cases} Y_t, & 0 \le t \bmod T < \lfloor T/2 \rfloor \\ -Y_t, & \lfloor T/2 \rfloor \le t \bmod T < T. \end{cases} \qquad (10.34)$$

Figure 10.8 Sample of $X_t = p_t Y_t$ where $Y_t = 0.95\,Y_{t-1} + \epsilon_t$ and $p_t = 1$ when $t \bmod T < \lfloor T/2 \rfloor$, $p_t = -1$ otherwise.
Since this is an amplitude scale modulation of a stationary sequence by a periodic sequence, $X_t$ is PC-T. But since $E\{X_t^2\} = E\{Y_t^2\}$, it is another case where $X_t$ is PC-T but the variance is not properly periodic. Figure 10.8 shows a 512 point sample of the simulated $X_t$ for $\phi = 0.95$ and $T = 32$. Although we do not show the plot of $\hat\sigma_X^2(t)$, the Bartlett test for constancy of $\sigma^2(t)$ produces a p-value of 0.99. The main point of interest here is the spectral coherence image shown in Figure 10.9. Although the occurrence of significant spectral coherence on the support lines is remarkable, the same phenomenon occurs for other models having constant variance.
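The construction (10.34) can be reproduced as follows (a sketch with illustrative names; note that the polarity flips leave the variance constant even though the sequence is PC-T):

```python
import numpy as np

def polarity_modulated_ar1(n, phi, T, rng, burn=500):
    """X_t = p_t * Y_t with Y_t = phi*Y_{t-1} + eps_t and p_t = +1 for
    t mod T < T//2, else -1. Since E{X_t^2} = E{Y_t^2}, the variance of
    X_t is constant, yet X_t is PC-T."""
    eps = rng.standard_normal(n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = phi * y[t - 1] + eps[t]
    y = y[burn:]
    p = np.where(np.arange(n) % T < T // 2, 1.0, -1.0)   # periodic polarity
    return p * y

rng = np.random.default_rng(4)
x = polarity_modulated_ar1(512, 0.95, 32, rng)
```

Because $X_t^2 = Y_t^2$, the periodogram-of-squares statistic is blind to this model, while the diagonal spectral coherence scan is not, which is exactly the point of the example.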
Figure 10.9 Spectral coherence image for the switching AR(1) described in Figure 10.8, based on one FFT of length 512, $M = 32$.
Bird Call. Figure 10.10 presents a series of 4096 points from an audio recording of the Eastern screech owl [85]. During this segment of 0.93 second, the modulation was very regular, appearing periodic, but with unknown period. The periodograms of $X_t$ and $X_t^2$, shown in Figure 10.11, reveal no significant periodicity in $X_t$ (top panel) but very significant periodicity in the squares $X_t^2$ (bottom panel). Although the threshold line is drawn using $\alpha = 0.01$, the strongest harmonics are also significant at $\alpha = 0.01/128$, the threshold for Bonferroni correction based on the 128 displayed values. The periodicity is also very clear in the spectral coherence image of Figure 10.12 because the apparent amplitude modulation is very severe. Since the strongest harmonic is at frequency index 16, corresponding to $16 - 1$ periods in the record, the period is estimated to be $4096/(16-1) = 273.07$. Although an isolated bird call is clearly a signal of finite duration (and hence theoretically unable to be a PC sequence), the sample observed is consistent with the PC structure. We
consider it not to be a paradox, but a fact related to the inferences possible from a finite sample.
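The period arithmetic used above (period ≈ N divided by the number of whole cycles at the strongest harmonic of the squares) can be checked on synthetic data. The Python sketch below is not the owl recording; it builds an amplitude-modulated noise with an assumed period of 256 samples, chosen so the record holds a whole number of cycles:

```python
import numpy as np

rng = np.random.default_rng(1)
N, period = 4096, 256      # record length and an assumed modulation period
t = np.arange(N)

# Amplitude-modulated noise: no periodicity visible in x, strong periodicity in x^2
envelope = 1.0 + 0.8 * np.cos(2 * np.pi * t / period)
x = envelope * rng.standard_normal(N)

# Periodogram of the mean-removed squares; the strongest harmonic sits at the
# bin equal to the number of whole cycles in the record
sq = x ** 2 - np.mean(x ** 2)
I = np.abs(np.fft.fft(sq)) ** 2 / N
kmax = 1 + int(np.argmax(I[1:N // 2]))   # skip the zero-frequency bin
print(kmax, N / kmax)                    # cycles in record, estimated period
```

With 16 cycles in 4096 points the estimate is exact; for the owl data the 15 cycles are not a divisor of 4096, which is why the estimate 273.07 is fractional.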
Figure 10.10 Sample (0.93 second) of audio recording of an Eastern screech owl. See [85].
Figure 10.11 Periodograms of Eastern screech owl audio signal based on one FFT of length 4096. Dashed lines are the α = 0.01/128 (Bonferroni correction for 128 points plotted) threshold for the test of variance contrast based on a half neighborhood of size m = 8. Top: X_t. Bottom: X_t².
Figure 10.12 Spectral coherence image of Eastern screech owl audio signal based on one FFT of length 4096 and M = 16.
10.6 EFFECTS OF DISCRETE SPECTRAL COMPONENTS

Let us observe that the presence of any discrete spectral components in X_t will spoil every one of the consistency results for R̂(t + τ, t), B̂_{k,NT}(τ), and f̂_k(λ) given in Chapters 9 and 10. This applies to random discrete spectral components as well as nonrandom ones, such as the ones associated with the periodic mean. In addition, discrete spectral components can seriously distort and bias the estimation of spectral coherence. To see this, suppose for a harmonizable X_t we have ξ({λ_a}) = A(ω)e^{iΦ_a} and ξ({λ_b}) = B(ω)e^{iΦ_b}. Now from a realization of X_t consider the computation of |γ̂(λ_p, λ_q, M)|², where we assume for simplicity that λ_p = λ_a and λ_q = λ_b. Then
    γ̂(λ_p, λ_q, M) = [A(ω)B*(ω)e^{i(Φ_a − Φ_b)} + ζ₁] / ([|A(ω)|² + ζ₂]^{1/2} [|B(ω)|² + ζ₃]^{1/2}),

where ζ₁, ζ₂, and ζ₃ are the remaining random parts in the respective sums. In the realization, if A(ω) and B(ω) are large compared to the remaining parts, then |γ̂(λ_p, λ_q, M)|² will be near 1. So to perceive the presence of PC-T structure with absolutely continuous spectra we must first detect and remove any discrete spectral components, both the random and the nonrandom.
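A quick numerical illustration of this distortion, with bin locations and amplitudes of our own choosing: two strong sinusoids in white noise produce a sample coherence magnitude near 1 between their bins even though the underlying noise spectra are independent.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 1024, 16            # record length and coherence smoothing width
t = np.arange(N)
ka, kb = 100, 160          # assumed bins of the two discrete components

x = (rng.standard_normal(N)
     + 5 * np.cos(2 * np.pi * ka * t / N)
     + 5 * np.cos(2 * np.pi * kb * t / N))
Z = np.fft.fft(x)

def coh(Z, p, q, M):
    # |gamma-hat(lambda_p, lambda_q, M)|: M-point smoothed sample coherence
    num = np.sum(Z[p:p + M] * np.conj(Z[q:q + M]))
    den = np.sqrt(np.sum(np.abs(Z[p:p + M]) ** 2)
                  * np.sum(np.abs(Z[q:q + M]) ** 2))
    return np.abs(num) / den

# Near 1: the single cross product of the two large DFT values dominates
# both the numerator and the denominator sums
print(coh(Z, ka, kb, M))
```

The large value at (λ_a, λ_b) says nothing about PC structure in the noise; it is produced entirely by the two discrete components, which is exactly why they must be removed first.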
Our approach for removing discrete spectral components is to first remove the periodic mean and then the remaining components that are detectable.
10.6.1 Removal of the Periodic Mean

The removal of the sample periodic mean by Y_t = X_t − m̂_{t,N} has already been described and utilized in the results of Chapter 9. We note that for a sequence to be PC-T, the only nonrandom discrete spectral components are those of the periodic mean, for any others would produce a mean function that did not satisfy m(t) = m(t + T). Recall from Section 6.7 that the periodic mean produces atoms in the measure F at (λ₁, λ₂) = (2πj/T, 2πk/T) of weight m̃_j m̃*_k for j, k = 0, 1, ..., T − 1. Since from (9.7)

    m̂_{t,N} → Σ_{k=0}^{T−1} ξ({2πk/T}) e^{i2πkt/T}

(in the mean square sense), removal of the sample mean eliminates more than just the m̃_k; it removes the effects of random discrete spectral components having the same frequencies as the sample mean. That is, we may have ξ({2πk/T}) = m̃_k + A_k, where m̃_k is the Fourier coefficient of the (nonrandom) mean m_t and A_k is the random but zero mean remainder. Note that E{|A_k|²} = 0, k = 0, 1, ..., T − 1 is the condition for the lifted sequence X_n to be L² mean ergodic (see Propositions 9.2 and 9.3). Finally, since X′_t = X_t − m(t) will clearly have F_{X′}(j2π/T, j2π/T) = 0, j = 0, 1, ..., T − 1 and ||X_t − m̂_{t,N}|| → ||X_t − m(t)||, the spectral measure of X_t − m̂_{t,N} must converge to zero at the frequencies (j2π/T, j2π/T) for j = 0, 1, ..., T − 1. We leave it as a problem to show this using the spectral representation of X_t.
10.6.2 Testing for Additive Discrete Spectral Components

From the previous paragraphs, the other random periodic components necessarily must have random amplitudes with zero means, since the mean m_t is explained completely by the m̃_k. Assuming X_t is harmonizable (as it would be if X_t were PC-T or stationary), the other periodic components can be written

    Σ_{j=1}^{J} ξ({λ_j}) e^{iλ_j t}.

In order for X_t to be PC-T it is necessary¹ for (λ₁, λ₂) ∈ S_T whenever E{ξ({λ₁})ξ*({λ₂})} ≠ 0. Although detecting and removing all the remaining discrete spectral components is not feasible with a finite sample, we can still detect and remove or suppress the larger ones to reduce their effect on the tests for PC structure.

¹This requirement does not exist if X_t is almost PC. See the supplements in Chapter 1 for the definition.

The problem of detecting the presence of a periodic sequence added to a random sequence is a very old one and has a large literature; see, for example, Schuster [208, 209] and Fisher [57]. Thorough treatments can be found in books by Hannan [88, page 463], Anderson [8], Brockwell and Davis [28, page 334], and Quinn [186]. Here we use a method based on the periodogram, whose original purpose was finding the frequencies and amplitudes of additive sine and cosine components in a time series (see Schuster [209]). The periodogram can also be viewed as the least-squares estimation of the amplitudes of the sine and cosine components when the frequencies are fixed, as in the FFT algorithm. Its use for estimating spectral densities was described earlier in this chapter. The method we use may be motivated by the simple case of detecting the discrete frequency terms comprising a trigonometric polynomial in the presence of additive uncorrelated (white) noise. That is,
    X_t = f_t + ξ_t,   f_t = Σ_{p=1}^{P} A_p cos(λ_p t + φ_p),     (10.36)
where ξ_t is real and i.i.d. N(0, σ_ξ²) and A_p > 0, p = 1, 2, ..., P. The usual periodogram for a sample of length N is defined simply as

    I_N(λ) = N⁻¹ |X̃_N(λ)|²,     (10.37)

where

    X̃_N(λ) = Σ_{t=0}^{N−1} X_t e^{−iλt}.     (10.38)
Since we can only compute the transform (10.38) for a finite set of λ, we will typically use the set of Fourier frequencies {0, 2π/N, 4π/N, ..., (N − 1)2π/N}, which are precisely the frequencies computed by the FFT. Under the null hypothesis that f_t = 0, X_t is white noise with zero mean and variance σ_ξ², so for λ_j = j2π/N (a Fourier frequency) with j ≥ 1, the random variables Re{X̃_N(λ_j)} and Im{X̃_N(λ_j)} are of zero mean, of variance Nσ_ξ²/2, and orthogonal to Re{X̃_N(λ_{j′})} and Im{X̃_N(λ_{j′})} for j ≠ j′. Hence X̃_N(2πj/N), j = 1, ..., N − 1, will be complex Gaussian with zero mean and variance Nσ_ξ². The random variable X̃_N(0) is always real for a real
series. Hence for j > 0 the random variables

    Z_j = 2|X̃_N(λ_j)|² / (Nσ_ξ²)     (10.39)

are independent and distributed χ²(2). For the case j = 0, N⁻¹X̃_N(0) is the sample mean and Z₀ = |X̃_N(0)|²/(Nσ_ξ²) is χ²(1), since E{X_t} = 0 under the null. Finally, as a test for the presence of a harmonic component at some λ_{k₀} = 2πk₀/N ≠ 0, we use the variance contrast ratio, defined as the ratio of Z_{k₀} to the average of Z_k over a deleted neighborhood Λ(k₀) of k₀. Sometimes we specify the deleted neighborhood in terms of its half-width m = n_Λ/2, where n_Λ = card(Λ). Under the null hypothesis stated above,
    Z_{k₀} / [ n_Λ⁻¹ Σ_{k∈Λ(k₀)} Z_k ].     (10.40)

Thus the distribution of the variance contrast ratio (10.40) is F(n₁, 2n_Λ), where n₁ = 2 except when k₀ = 0, and then Z₀ has an imaginary part of zero, making n₁ = 1. This is a slight variant of the test described by Anderson [8, Section 4.3.3]. Assume under the alternative hypothesis that the frequencies of f_t in (10.36) are among the Fourier frequencies, so at such a frequency, say λ_p = 2πk_p/N, the random variables Z_{k_p} will each be noncentral χ²(2) with noncentrality parameter N|A_p|²/(2σ_ξ²), and the variance contrast ratio is a noncentral F. For power calculations of related tests in a signal detection application, see Whalen [221] and Robertson [194]. The efficacy of the variance contrast can be argued even when the model (10.36) of the null is relaxed to permit ξ_t to be a stationary sequence with a sufficiently smooth spectral density, for then the denominator of the contrast ratio remains a good estimate of the variance in a neighborhood of λ_{k₀}.
This method is used as a basis for testing m̃_k = 0 and B_k(τ) = 0 in the programs permcoeff.m and Bcoeff.m described in Sections 9.2.2 and 9.4.2. In this case the frequencies to be tested are 2πj/T, j = 0, 1, ..., T − 1. In the more general case, when no specific frequency is known, we test all the Fourier frequencies λ_{k₀} possible by using a neighborhood of typically 8 or 16 points centered at λ_{k₀} with some points deleted near the center. This is done
for all k₀ for which the neighborhoods can be obtained from the periodogram. Again, the threshold is chosen (using the F) and discrete components declared at frequencies where the threshold is exceeded. The preceding defines the basic idea, but there is another issue to discuss in getting to a practical algorithm. Suppose there is a large discrete component at λ_{k₀} and none within the deleted neighborhood Λ(k₀); then the algorithm behaves as expected when the neighborhood is centered at k₀. But when the neighborhood is centered at k₁ with k₀ ∈ Λ(k₁), then Σ_{k∈Λ(k₁)} I_N(λ_k) will be too large due to the large discrete component at λ_{k₀}. To alleviate this problem we eliminate from Σ_{k∈Λ(k₁)} I_N(λ_k) values of I_N(λ_k) that are too far from their median, where decision thresholds are determined by the probability of incorrectly eliminating large values. When large values are "trimmed" out of the background, the value 2n_Λ is adjusted and the threshold for a specified α is recomputed.
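A minimal Python sketch of the variance contrast with median trimming; the function name, the trimming rule (a fixed multiple of the median rather than the book's probability-calibrated thresholds), and the parameters are our own simplifications:

```python
import numpy as np

def variance_contrast(I, k0, m=8, trim_factor=5.0):
    # Ratio of I[k0] to the trimmed mean of a deleted neighborhood of
    # half-width m.  Background ordinates far above their median are
    # trimmed, so a second large discrete component falling inside the
    # neighborhood does not inflate the background estimate.
    idx = [k for k in range(k0 - m, k0 + m + 1) if k != k0 and 0 < k < len(I) // 2]
    bg = I[idx]
    bg = bg[bg <= trim_factor * np.median(bg)]   # trim outlying background values
    return I[k0] / bg.mean()

rng = np.random.default_rng(3)
N = 512
t = np.arange(N)
# Noise plus a sinusoid at frequency 2*pi/8, i.e. bin 64 (64 cycles in 512 points)
y = rng.standard_normal(N) + 0.5 * np.cos(2 * np.pi * t / 8)
I = np.abs(np.fft.fft(y)) ** 2 / N

print(variance_contrast(I, 64))    # large: discrete component present
print(variance_contrast(I, 100))   # modest: background only
```

In practice the first value would be compared against an F(2, 2n_Λ) quantile, with the denominator degrees of freedom reduced by the number of trimmed ordinates.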
Figure 10.13 (Top) Simulated series Y_t. (Bottom) Periodogram based on 512 point sample. Dotted line is the α = 0.001 threshold for the test of variance contrast using half-width m = 8.
The top trace of Figure 10.13 presents 128 consecutive values of the sum

    Y_t = A cos λ₁t + ξ_t,

where E{ξ_t} = 0, Var{ξ_t} = 1, A = 0.5, and λ₁ = 2π/8. The bottom trace of Figure 10.13 shows the periodogram values along with an α = 0.001 threshold for the test of variance contrast when n_Λ = 2m = 16. Only at (or very near) frequency λ = 2π/8 (for which there are 64 periods in 512 samples) is the observed periodogram value significantly above the average of the 16 neighboring values. The computed p-value of the variance contrast at λ = 2π/8 is 2.5 × 10⁻⁷. Note this p-value is for one test at a specific fixed
and known frequency. The simple but conservative Bonferroni adjustment for multiple hypotheses (see Westfall and Young [222, Section 2.3.1]) yields 250 × 2.5 × 10⁻⁷ = 6.25 × 10⁻⁵, still extremely significant.
10.6.3 Removal of Detected Components
The use of the periodogram to detect the presence of harmonic components of the form A cos λ_{k₀}t + B sin λ_{k₀}t in a real series suggests how to remove such terms from a series. For if a discrete component is detected at some λ_{k₀}, then the Fourier components Re{2X̃_N(λ_{k₀})/N} and Im{2X̃_N(λ_{k₀})/N} are the ordinary least-squares coefficients of regression of the observed series on the functions cos λ_{k₀}t and −sin λ_{k₀}t. Hence if a discrete frequency component is detected at λ_{k₀}, the residual from the regression is

    Y_t − Re{2X̃_N(λ_{k₀})/N} cos λ_{k₀}t + Im{2X̃_N(λ_{k₀})/N} sin λ_{k₀}t.
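At an exact Fourier frequency this regression removal amounts to a DFT coefficient subtraction, and it zeroes the targeted bin exactly. A Python sketch (variable names our own):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 512
t = np.arange(N)
k0 = 64                          # detected Fourier-frequency index (lambda = 2*pi/8)
lam = 2 * np.pi * k0 / N

y = rng.standard_normal(N) + 0.5 * np.cos(lam * t + 1.0)
Z = np.fft.fft(y)                # Z[k] = sum_t y_t exp(-i lambda_k t)

# Least-squares coefficients at a Fourier frequency come directly from the DFT
a = 2 * Z[k0].real / N           # coefficient of cos(lam t)
b = 2 * Z[k0].imag / N           # coefficient of -sin(lam t)
resid = y - a * np.cos(lam * t) + b * np.sin(lam * t)

# The DFT of the residual vanishes at bin k0 up to rounding error
print(abs(np.fft.fft(resid)[k0]))
```

Equivalently, the subtracted signal is (2/N) Re{X̃_N(λ_{k₀}) e^{iλ_{k₀}t}}, whose DFT at bin k₀ equals X̃_N(λ_{k₀}) exactly (for 0 < k₀ < N/2), which is why the residual bin is driven to zero.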
Figure 10.14 shows that subtracting the detected discrete component in this manner effectively suppresses completely the component at λ = 2π/8, not
Figure 10.14 (Top) Y_t − Re{2X̃_N(2π/8)/N} cos(2πt/8) + Im{2X̃_N(2π/8)/N} sin(2πt/8). (Bottom) Periodogram based on 512 point sample. Dotted line is the α = 0.001 threshold for the test of variance contrast using m = 8.
just the discrete part. This effect can be mitigated by adding back a random component a cos λ_k t + b sin λ_k t, where a and b are distributed N(0, s̃²) and s̃² is essentially the denominator in the variance contrast ratio (10.40).
PROBLEMS AND SUPPLEMENTS

10.1 The two spectra. Here we would like to point out the relationship between an empirical spectral analysis (what we can measure) and the spectral representation of the operator that propagates the process. First let us suppose we have a Hilbert space H on which there is defined some unitary operator U (more than one unitary operator can be defined on H). Let X ∈ H and consider the stationary sequence X_n = UⁿX (we are taking X₀ = X) whose correlation sequence is given by

    R(n) = (UⁿX, X) = ∫₀^{2π} e^{inλ} F(dλ),

where the existence of F and the truth of this representation is guaranteed by the Herglotz theorem or via the spectral theorem for unitary operators (Chapter 4). Clearly it is possible to define many stationary sequences on a Hilbert space H on which we have a unitary operator. The quantities we can observe and measure are determined by the action of U on specific vectors. Indeed, for two points X₁, X₂ ∈ H it is possible for F_{X₁} to be absolutely continuous with respect to Lebesgue measure while F_{X₂} is discrete. This same observation carries over to PC-T sequences. Given H and unitary operator U on H, a PC-T sequence may be formed by starting first with a set of T vectors X = (X₀, X₁, ..., X_{T−1})′ and then for any n = j + kT with 0 ≤ j ≤ T − 1 we define X_{kT+j} = U^k X_j. This PC-T sequence will have a correlation R_X(m, n) and spectral distribution functions F_{X,k}, k ∈ {0, 1, ..., T − 1}. These quantities are specific to the starting collection X just as the correlation and spectrum are specific to X in the stationary case. And so all we can do empirically is attempt to estimate these quantities for the specific X we happen to receive in our experiment.
10.2 Show the condition

    Σ_{τ=−∞}^{∞} [ Σ_{t=0}^{T−1} |R(t + τ, t)|² ]^{1/2} < ∞

is sufficient for

    Σ_{τ=−∞}^{∞} Σ_{t=0}^{T−1} |R(t + τ, t)| < ∞.

As a hint, note that for fixed τ,

    max_{0≤t≤T−1} |R(t + τ, t)| = |R(t_τ + τ, t_τ)|,

where t_τ ∈ {0, 1, ..., T − 1}. Then show Σ_{τ=−∞}^{∞} |R(t_τ + τ, t_τ)| < ∞.
10.3 Compute Var[f̂_{k,N}(λ)] and Cov(f̂_{k,N}(λ₁), f̂_{k,N}(λ₂)), beginning with Proposition 10.5, for λ, λ₁, λ₂ in the Fourier frequencies. Recall for PC white noise that f_k(λ) = B_k(0)/2π, where B_k(0) = T⁻¹ Σ_{t=0}^{T−1} σ²(t) e^{−i2πkt/T}, 0 ≤ λ < 2π.
10.4 Estimation of spectral densities for two-dimensional PC fields has been studied by Alekseev [6] and Dehay and Hurd [47].

10.5 Observing random periodic components. Consider the application of the periodogram test for discrete spectral components to the three following random sequences, where we have included the dependence on ω ∈ Ω:
1. Y(t, ω) = 0.5 cos(2πt/16) + ξ(t, ω); the periodic component of Y(t, ω) is a nonrandom function as constructed in the simulation.

2. Y(t, ω) = X(ω) cos(2πt/16) + ξ(t, ω) with Var[X] < ∞; the periodic component of Y(t, ω) is then a nonstationary random function.

3. Y(t, ω) = 0.5 cos(2π(t + θ(ω))/16) + ξ(t, ω), where θ(ω) is a random variable independent of ξ(t, ω) and uniformly distributed on {0, 1, ..., 15}; the periodic component of Y(t, ω) is a stationary random function.

Our observations of time series generated by these three cases will be very much the same. In particular, if we happen to experience X(ω) = 0.5, then the observed Y(t, ω) will appear essentially the same in all three cases except for the shift of phase due to θ. And the periodogram test for periodicity will produce the same result because, for all three cases, the forming of I_N(λ) by (10.2) destroys the dependence on θ(ω). Our point is that from a single sample we can perceive whether a periodic component is present, but we cannot perceive whether it arises from a nonrandom function (such as a periodic mean), from a nonstationary random periodic function, or from a stationary random periodic function. The manner in which we view it thus becomes an assumption about the underlying model. The most general (inclusive) view, without additional side information, is that it is a nonstationary random periodic function.
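The phase-invariance claim behind this point is easy to verify directly: a time shift of a periodic component whose period divides N is a circular shift, which changes only the phase of the DFT, not its magnitude. A small Python check on the noise-free periodic components:

```python
import numpy as np

N, P = 512, 16
t = np.arange(N)

# The same periodic component with two different integer phase shifts
f0 = 0.5 * np.cos(2 * np.pi * t / P)
f5 = 0.5 * np.cos(2 * np.pi * (t + 5) / P)

I0 = np.abs(np.fft.fft(f0)) ** 2 / N
I5 = np.abs(np.fft.fft(f5)) ** 2 / N

# The periodograms are identical: the shift alters only DFT phases
print(np.max(np.abs(I0 - I5)))
```

So cases 1 and 3 (and case 2 on the event X(ω) = 0.5) yield the same periodogram, exactly as argued above.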
10.6 As pointed out in Section 10.4.2, choosing M = N can sometimes be a detriment because it is possible for the phase of the spectral correlation measure F_k to change along its support line. Thus it is possible that the quantity (10.30) may average to zero. This argues that small values of M also have their purpose. To obtain a more powerful test for small values of M that still requires persistence along diagonal lines, we can incoherently average [107] all the values of |γ̂(p, q, M)|² from a coherence plot for a fixed value of d = p − q. To be precise, the incoherent statistic is the average

    γ̄(d, M) = L⁻¹ Σ_{l=1}^{L} |γ̂(p_l, p_l − d, M)|²,     (10.42)

where the p_l index L successive blocks of length M along the diagonal and we typically use L = ⌊(N − 1 − d)/M⌋. The quantity γ̄(d, M) is plotted as a function of the difference frequency d. In essence, this statistic was
330
SPECTRAL ESTIMATION
utilized by Bloomfield, Hurd, and Lund [19] for determining whether residuals of time series after model fitting are cyclostationary.
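A Python sketch of the incoherent statistic under the description above (the block indexing p_l = d + lM is one natural choice of our own): for an amplitude-modulated noise with period T = 8, the average along the support diagonal d = N/T is markedly larger than along an arbitrary diagonal.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, M = 2048, 8, 32
t = np.arange(N)

# PC sequence: periodic amplitude modulation of white noise
x = (1.0 + 0.9 * np.cos(2 * np.pi * t / T)) * rng.standard_normal(N)
Z = np.fft.fft(x)

def coh2(p, q, M):
    # |gamma-hat(p, q, M)|^2: squared M-point smoothed sample coherence
    num = np.abs(np.sum(Z[p:p + M] * np.conj(Z[q:q + M]))) ** 2
    den = np.sum(np.abs(Z[p:p + M]) ** 2) * np.sum(np.abs(Z[q:q + M]) ** 2)
    return num / den

def incoherent(d, M):
    # Average |gamma-hat|^2 over non-overlapping M-blocks on the diagonal p - q = d
    L = (N - 1 - d) // M
    return float(np.mean([coh2(d + l * M, l * M, M) for l in range(L)]))

d_true = N // T    # support line lambda2 = lambda1 - 2*pi/T lies d = N/T bins away
print(incoherent(d_true, M), incoherent(100, M))
```

The incoherent average discards the phase of each block's coherence before averaging, so a phase that drifts along the support line, which could cancel a single M = N computation of (10.30), no longer destroys the statistic.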
10.7 For a harmonizable X_t, show that

    X_t − m̂_{t,N} = ∫₀^{2π} e^{iλt} W_N(λ, T) ξ(dλ),

where W_N(λ, T) is continuous in λ, and W_N(λ, T) = 0 for λ = j2π/T, j = 0, 1, ..., T − 1, and all N ≥ 1. Also lim_{N→∞} W_N(λ, T) = 1 for λ ≠ j2π/T, j = 0, 1, ..., T − 1.
CHAPTER 11

A PARADIGM FOR NONPARAMETRIC ANALYSIS OF PC TIME SERIES

Suppose one is given a sample of a time series and asked the question: Does this series exhibit the PC property? If so, what can we say about it? This chapter summarizes and organizes the methods discussed in previous chapters into a procedural outline, or paradigm, for answering these questions. This is done only within the scope of mean, correlation, and spectral measurements. Obviously, our ability to answer these questions, especially regarding characterization, will substantially improve by the inclusion of PARMA time series analysis, a topic to be addressed in future writings. These questions are to be answered by the observation of a single realization, and there are two distinct subcases to be considered: when the period T is known and when it is unknown.
Period is known. This is the case when the periodic effects are suspected or known to be coupled to a physical system that determines the period. The most obvious examples are the diurnal period in hours (T = 24) and the annual and quarterly periods, say in months, so the periods are T = 12 and T = 4, respectively. Known periods may also arise in mechanical or electrical systems. When T is known, a strategy for determining the presence of PC structure can be based on the consistency results discussed in the preceding chapters. The tools we have at our disposal are estimators for m_t or its Fourier coefficients m̃_k, for R(t + τ, t) or its Fourier coefficients B_k(τ), and for the densities f_k(λ). When the sequence is found to be PC-T, there remains the question of whether or not it is just a periodically scaled (amplitude modulated) stationary sequence.

Periodically Correlated Random Sequences: Spectral Theory and Practice. By H. L. Hurd and A. G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
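The first of these tools, the sample periodic mean, simply averages the observations at each phase t mod T. The book's program permest.m is MATLAB; a minimal Python sketch on synthetic data (names and parameters our own):

```python
import numpy as np

rng = np.random.default_rng(6)
T, ncyc = 12, 200          # period (e.g. monthly data) and number of periods observed
N = T * ncyc
t = np.arange(N)

m_true = 2.0 * np.cos(2 * np.pi * t / T)      # a properly periodic mean
x = m_true + rng.standard_normal(N)

# m-hat_{t,N}: average the observations at each phase t mod T
m_hat = np.array([x[s::T].mean() for s in range(T)])

# Each phase average is root-ncyc consistent for the periodic mean
print(np.max(np.abs(m_hat - m_true[:T])))
```

A test of m_t = m can then compare the spread of the T phase averages against their sampling variability, which is the idea behind the tests permest.m reports.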
Period is unknown. This situation arises when one has no prior information about the presence of PC structure in the data or about its period. The general strategy is to use the multiple hypothesis test:

H₀: observed series is not PC;
H_j, j = 1, ..., N: observed series is PC with period T_j.

Although multiple alternatives can be tested using the same tools, we prefer spectral methods for detection and removal of harmonic components and for determining the presence of the PC property.
11.1 THE PERIOD T IS KNOWN

We build our understanding through a series of steps, roughly in the order given below. In all these tests, some adjustment for multiple hypotheses is likely required. Our programs implement simple multiple hypothesis corrections as appropriate.

The mean m_t. Use permest.m to form and plot m̂_{t,N}, to test for m_t = m, and to return X_t − m̂_{t,N} for later analysis. Rejection of m_t = m indicates a properly periodic mean. In physical systems it is a clue that periodic nonstationarity in the covariance structure (more generally, in the probability law) may be present. Although it is theoretically possible for the sample mean to converge to a random limit, lim_{N→∞} m̂_{t,N} = Σ_{k=0}^{T−1} ξ({2πk/T}) e^{i2πkt/T}, we usually assume the limit is nonrandom and so ξ({2πk/T}) = m̃_k.
Removal of mean and discrete spectral components. Using pgram.m, identify frequencies of discrete spectral components and remove them by the method indicated in Section 10.6.3.
Fourier coefficients of m_t. Use permcoeff.m to form m̂_{k,N} and test for m̃_k = 0, for k = 0, 1, ..., T − 1. m̃₀ = 0 corresponds to the average T⁻¹ Σ_{t=0}^{T−1} m_t = 0; m̃_k ≠ 0 for some k ≥ 1 corresponds to m_t ≢ m, a properly periodic mean m_t. The interpretation regarding nonstationarity is the same as the preceding.
The variance σ²(t). Use persigest.m to form and plot σ̂²_N(t), to test σ(t) = σ, and to return the demeaned and scaled series [X_t − m̂_{t,N}]/σ̂_N(t) for later analysis. Rejection of σ(t) = σ indicates a properly periodic variance but leaves open whether or not X_t is simply the result of a stationary process subjected to amplitude-scale modulation. Another test, based on the Fourier coefficients B_k(0) of R(t, t) = σ²(t), is given below.
The covariance R(t + τ, t). Use peracf.m to form and plot R̂_N(t + τ, t), to test for (a) R(t + τ, t) = R(τ) (for given lag τ, the covariance is constant with respect to t) and (b) R(t + τ, t) = 0 (for given lag τ, the covariance is zero for all t). Rejection of R(t + τ, t) = R(τ) indicates the sequence is properly PC (there exist lags for which R(t + τ, t) is properly periodic). Rejection of R(t + τ, t) = 0 for some τ ≠ 0 indicates the sequence is not PC white noise.
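The estimator behind peracf.m averages lagged products period by period. A Python sketch (names our own; PC white noise with T = 8 as the test input), for which R̂(t₀, t₀) should track σ²(t₀) while R̂(t₀ + τ, t₀) should be near 0 for τ ≠ 0:

```python
import numpy as np

rng = np.random.default_rng(7)
T, ncyc = 8, 500
N = T * ncyc
t = np.arange(N)

sigma = 1.0 + 0.5 * np.cos(2 * np.pi * t / T)   # periodic standard deviation
x = sigma * rng.standard_normal(N)              # PC white noise, zero mean

def R_hat(tau, t0):
    # Estimate R(t0 + tau, t0) by averaging lagged products over periods
    i = t0 + T * np.arange(ncyc)
    i = i[(i + tau >= 0) & (i + tau < N)]
    return float(np.mean(x[i + tau] * x[i]))

print(R_hat(0, 0), R_hat(3, 0))   # near sigma(0)^2 and near 0, respectively
```

Tests (a) and (b) above amount to asking whether these phase-indexed averages vary with t₀ for a fixed τ, and whether they differ from zero, against their sampling variability.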
The correlation ρ(t + τ, t). The application of peracf.m to the demeaned and normalized [X_t − m̂_{t,N}]/σ̂_N(t) produces ρ̂_N(t + τ, t). Using test (a) above, rejection of ρ(t + τ, t) = ρ(τ) indicates that X_t is properly PC and is not just an amplitude modulated stationary sequence. That is, there exist lags for which ρ(t + τ, t) is properly periodic. Test (b) gives the same information as it does for the covariance: rejection of ρ(t + τ, t) = 0 for some τ ≠ 0 indicates the sequence is not PC white noise.
The coefficients B_k(τ). Use Bcoeff.m to form B̂_{k,NT}(τ) for a collection of τ and for k = 0, 1, ..., ⌊(T − 1)/2⌋. For these k and τ, test B_k(τ) = 0 (we use the variance contrast method). Recall B₀(0) > 0 is expected. Rejection of B_k(0) = 0 for some k > 0 indicates a properly periodic σ(t) (recall B_k(0) = T⁻¹ Σ_{t=0}^{T−1} σ²(t) e^{−i2πkt/T}). Rejection of B_k(τ) = 0 for some k > 0, τ ≠ 0 indicates R(t + τ, t) is properly periodic at lag τ with frequency 2πk/T.

Note that testing B_k(τ) = 0 for some specific τ can be a more sensitive (more powerful) detector of the presence of PC structure if only a few B_k(τ) dominate the rest. We leave the quantification of this statement as a problem.
The coefficients ρ_k(τ). The application of Bcoeff.m to the demeaned and normalized [X_t − m̂_{t,N}]/σ̂_N(t) produces ρ̂_{k,NT}(τ) for a collection of τ and for k = 0, 1, ..., ⌊(T − 1)/2⌋. For these k and τ, test ρ_k(τ) = 0 (we use the variance contrast method). Since the normalized series has constant unit variance, we expect ρ₀(0) > 0 and ρ_k(0) = 0 for k > 0. Rejection of ρ_k(τ) = 0 for some k > 0, τ ≠ 0 indicates ρ(t + τ, t) is properly periodic at lag τ with frequency 2πk/T, and that X_t is properly PC-T and is not just an amplitude modulated stationary sequence.
The densities f_k(λ) for B_k(τ). Use fkest.m to form f̂_k(λ) based on X_t − m̂_t for k = 0, 1, ..., ⌊(T − 1)/2⌋ and a subset of the Fourier frequencies Λ_F = {λ_j = j2π/N, j = 0, 1, ..., N − 1}, where N is the sample size. For these k and λ, test f_k(λ) = 0 via the sample magnitude squared coherence |γ̂(λ_j, λ_j − 2πk/T)|². We expect f₀(λ) ≥ 0 for all λ. Rejection of f_k(λ) = 0 for some k and λ indicates the spectral measure F is not zero on the line λ₂ = λ₁ − 2πk/T, and hence X_t is PC-T.
The densities f_k(λ) for ρ_k(τ). Use fkest.m to form f̂_k(λ) based on [X_t − m̂_t]/σ̂_N(t) for k = 0, 1, ..., ⌊(T − 1)/2⌋ and a subset of the Fourier frequencies Λ_F = {λ_j = j2π/N, j = 0, 1, ..., N − 1}, where N is the sample size. For these k and λ, test f_k(λ) = 0 via the sample magnitude squared coherence |γ̂(λ_j, λ_j − 2πk/T)|². We expect f₀(λ) ≥ 0 for all λ. Rejection of f_k(λ) = 0 for some k and λ indicates the spectral measure F is not zero on the line λ₂ = λ₁ − 2πk/T, meaning [X_t − m_t]/σ(t) is PC-T, and so X_t is properly PC-T and is not just an amplitude modulated stationary sequence.

The spectral measures. Use scoh.m to make a quick check that the support of the spectral measure F (for both the original and the normalized series) is in the expected location, namely in the set S_T defined by (1.18) and shown in Figure 1.3.
11.2 THE PERIOD T IS UNKNOWN

When T is unknown we are faced with either conducting the fixed-T tests for a range of T or conducting some tests organized for the more general case of determining an unknown T. The steps below, although few, are useful for finding PC structure when the value of T is unknown. The idea is to get some candidate values of T, or in other words, to identify some diagonals in [0, 2π) × [0, 2π) where there is significant spectral coherence. We can then estimate the densities on these lines. Some of these can also be used for inference on almost PC sequences, a topic beyond our current scope. The tests we employ are somewhat more "spectral" in nature, as in nonparametric spectral estimation. As before, our programs implement simple multiple hypothesis corrections as appropriate.
The mean m_t. Use pgram.m to plot the periodogram and identify significantly large harmonic terms. A program spermean.m (not demonstrated here) forms X_t − m̂_t, the residual from the regression on Fourier frequency components. Without the knowledge of T, this is the only method we currently have to remove large components, whether they arise from a time varying mean or from random-amplitude components.
The variance σ²(t). Use pgram.m and spermean.m on the squares [X_t − m̂_t]² to plot the periodogram and identify significantly large harmonic terms. Use the significantly large terms to form σ̂(t) and [X_t − m̂_t]/σ̂(t) for later analysis.

The spectral measure F. Use scoh.m applied to X_t − m̂_t to determine if there are support lines on which there is significant coherence. Once some candidate support lines λ₂ = λ₁ − δ_k, k = 1, 2, ..., n are identified, the density f_{δ_k} on those lines can be estimated using fkest.m. Repeat the process for [X_t − m̂_t]/σ̂_N(t) to find lines δ̃_k on which to estimate the density of the normalized series.
The densities f on the support lines defined by δ₁, δ₂, ..., δ_n. Use fkest.m to form f̂ based on X_t − m̂_t on the given lines and for a subset of the Fourier frequencies Λ_F = {λ_j = j2π/N, j = 0, 1, ..., N − 1}, where N is the sample size. Note we have already rejected that the density is zero on the line λ₂ = λ₁ − δ_k.
The densities f on the support lines defined by δ̃₁, δ̃₂, ..., δ̃_n. Use fkest.m to form f̂ based on [X_t − m̂_t]/σ̂_N(t) on the given lines and for a subset of the Fourier frequencies Λ_F = {λ_j = j2π/N, j = 0, 1, ..., N − 1}, where N is the sample size. Note we have already rejected that the density is zero on the line λ₂ = λ₁ − δ̃_k.

For unknown T the forming of m̂_t and σ̂(t) by identification of large components may require experimentation with thresholds. It is part of the price of dropping the constraint of a fixed T.
REFERENCES
1. I. L. Abreu, "A note on harmonizable and stationary sequences," Bol. Soc. Mat. Mexicana, 15, pp. 48-51, 1970.
2. N. I. Akheizer and I. M. Glazman, Theory of Linear Operators in Hilbert Space, Fredrick Unger, 1961 and Dover, New York, 1993.
3. L. V. Ahlfors, Complex Analysis, 2nd ed., McGraw-Hill, New York, 1966.
4. E. J. Akutowicz, "On an explicit formula in least square prediction," Math. Scand., 5, pp. 261-266, 1957.
5. V. G. Alekseev, "Estimating the spectral densities of a Gaussian periodically correlated stochastic process," Prob. Inf. Transm., 24, pp. 109-115, 1988.
6. V. G. Alekseev, "On spectral density estimates of Gaussian periodically correlated random fields," Prob. Math. Stat., 11, pp. 157-167, 1991.
7. J. Allen and S. Hobbs, "Detecting target motion by frequency-plane smoothing," in Proceedings of the Twenty-Sixth Asilomar Conference on Systems and Computers, Pacific Grove, CA, pp. 1042-1047, 1992.
8. T. W. Anderson, The Statistical Analysis of Time Series, Wiley, Hoboken, NJ, 1971.
9. T. W. Anderson, An Introduction to Multivariate Analysis, 2nd ed., Wiley, Hoboken, NJ, 1984.
10. P. L. Anderson, M. M. Meerschaert, and A. Vecchia, "Innovations algorithm for periodically stationary time series," Stoch. Proc. Appl., 83, pp. 149-169, 1999.
11. C. F. Ansley, "An algorithm for the exact likelihood of a mixed autoregressive moving average process," Biometrika, 66, pp. 59-65, 1979.
12. W. R. Bennett, "Statistics of regenerative digital transmission," Bell Syst. Tech. J., 37, pp. 1501-1542, 1958.
13. P. Billingsley, Convergence of Probability Measures, Wiley, Hoboken, NJ, 1968.
14. P. Billingsley, Probability and Measure, 2nd ed., Wiley-Interscience, Hoboken, NJ, 1986.
15. S. Bittanti and G. De Nicolao, "Markovian representations of cyclostationary processes," in Lecture Notes in Control and Information Sciences No. 161, L. Gerencsér and P. E. Caines, Eds., Springer, New York, 1991.
16. S. Bittanti, P. Bolzern, L. Piroddi, and G. De Nicolao, "Representation, prediction and identification of cyclostationary processes - a state-space approach," in Cyclostationarity in Communications and Signal Processing, W. A. Gardner, Ed., IEEE Press, New York, 1993.
17. S. Bittanti and P. Colaneri, "Invariant representations of discrete-time periodic systems," Automatica, 36, pp. 1777-1793, 2000.
18. P. Bloomfield, Fourier Analysis of Time Series: An Introduction, Wiley, Hoboken, NJ, 1976.
19. P. Bloomfield, H. L. Hurd, and R. Lund, "Periodic correlation in stratospheric ozone time series," J. Time Series Anal., 15, pp. 127-150, 1994.
20. S. Bochner, "A theorem on Fourier-Stieltjes integrals," Bull. AMS, 40, pp. 272-276, 1934.
21. G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden Day, San Francisco, 1970.
22. G. E. P. Box, G. M. Jenkins, and G. Reinsel, Time Series Analysis, 3rd ed., Prentice-Hall, Englewood Cliffs, NJ, 1994.
23. R. A. Boyles and W. A. Gardner, "Cycloergodic properties of discrete-parameter nonstationary stochastic processes," IEEE Trans. Inf. Theory, IT-29, pp. 105-114, 1983.
24. F. J. Beutler, "On stationary conditions for certain periodic random processes," J. Math. Anal. Appl., 3, pp. 25-36, 1961.
25. W. M. Brelsford, "Probability predictions and time series with periodic structure," PhD Dissertation, Johns Hopkins University, Baltimore, MD, 1967.
26. D. R. Brillinger, Time Series: Data Analysis and Theory, Holt, Rinehart and Winston, New York, 1965.
27. D. R. Brillinger, Time Series: Data Analysis and Theory, Holt, Rinehart and Winston, New York, 1973.
28. P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd ed., Springer, New York, 1991.
29. S. Cambanis and C. Houdré, "On the continuous wavelet transform of second order random processes," IEEE Trans. Inf. Theory, IT-41, pp. 628-642, 1995.
30. S. Cambanis, C. Houdré, H. L. Hurd, and J. Leskow, "Laws of large numbers for periodically and almost periodically correlated processes," Stoch. Proc. Appl., 53, pp. 37-54, 1994.
31. D. K. Chang and M. M. Rao, "Bimeasures and nonstationary processes," in Real and Stochastic Analysis, M. M. Rao, Ed., Wiley, Hoboken, NJ, 1987.
32. S. D. Chatterji, "Orthogonally scattered dilation of Hilbert space valued functions," Lecture Notes in Mathematics, No. 920, Springer Verlag, pp. 570-580, New York, 1982.
33. C. Corduneanu, Almost Periodic Functions, Chelsea Publishing Company, New York, 1989.
34. H. Cramér, Methods of Mathematical Statistics, Princeton University Press, Princeton, NJ, 1961.
35. H. Cramér, "On the theory of stationary random processes," Math. Ann., 41, pp. 215-230, 1940.
36. H. Cramér, "On some classes of nonstationary stochastic processes," Proc. Fourth Berkeley Symp. Math. Stat. Prob., 2, pp. 55-77, 1961.
37. A. V. Dandawate and G. B. Giannakis, "Statistical test for presence of cyclostationarity," IEEE Trans. Signal Proc., 42, pp. 2355-2369, 1994.
38. D. Dehay, "On a class of asymptotically stationary harmonizable processes," J. Multivariate Anal., 22, pp. 251-257, 1987.
39. D. Dehay, "Nonlinear analysis for almost periodically correlated strongly harmonizable processes," presented at the 2nd World Congress of the Bernoulli Society, Uppsala, Sweden, August 13-18, 1990.
40. D. Dehay, "Processus bivariés presque périodiquement corrélés: analyse spectrale et estimation des densités spectrales croisées," in Journées de Statistique, Strasbourg, XXIII, pp. 187-189, 1991.
41. D. Dehay, "Estimation de paramètres fonctionnels spectraux de certains processus non-nécessairement stationnaires," C. R. Acad. Sci. Paris, 314(4), pp. 313-316, 1992.
42. D. Dehay and R. Moché, "Trace measures of a positive definite bimeasure," J. Multivariate Anal., 40, pp. 115-131, 1992.
43. D. Dehay, "Asymptotic behavior of estimators of cyclic functional parameters for some nonstationary processes," Stat. and Decisions, 13, pp. 273-286, 1995.
44. D. Dehay, "Spectral analysis of the covariance of the almost periodically correlated processes," Stoch. Proc. Appl., 50, pp. 315-330, 1994.
45. R. L. Devaney, An Introduction to Chaotic Dynamics, Benjamin, San Francisco, 1986.
46. D. Dehay and H. L. Hurd, "Representation and estimation for periodically and almost periodically correlated random processes," in Cyclostationarity in Communications and Signal Processing, W. A. Gardner, Ed., IEEE Press, New York, 1993.
47. D. Dehay and H. L. Hurd, "Spectral estimation for strongly periodically correlated random fields defined on Z²," Math. Methods Stat., 11, No. 2, pp. 135-151, 2002.
48. J. Diestel and J. J. Uhl, Jr., Vector Measures, Mathematical Surveys, No. 15, American Mathematical Society, Providence, RI, 1977.
49. J. L. Doob, Stochastic Processes, Wiley, Hoboken, NJ, 1953.
50. Y. P. Dragan, Structure and Representation of Stochastic Signal Models (in Russian), Naukova Dumka, Kiev, 1980.
51. Y. P. Dragan and I. N. Yavorskiy, Rhythmics of Sea Waves and Underwater Acoustic Signals (in Russian), Naukova Dumka, Kiev, 1982.
52. Y. P. Dragan, V. A. Rozhkov, and I. N. Yavorskiy, Methods of Probabilistic Analysis of Rhythms of Oceanological Processes (in Russian), Gidrometeoizdat, Leningrad, 1987.
53. N. Dunford and J. T. Schwartz, Linear Operators, Part I: General Theory, Wiley-Interscience, Hoboken, NJ, 1958.
54. S. N. Elaydi, An Introduction to Difference Equations, 2nd ed., Academic Press, New York, 1999.
55. L. D. Enochson and N. R. Goodman, "Gaussian approximation to the distribution of sample coherence," Measurement Analysis Corporation, Technical Report AFFDL-TR-65-57, AD620987, June 1965.
56. C. J. Everett and H. J. Ryser, "The Gram matrix and Hadamard theorem," Am. Math. Monthly, 53, No. 1, pp. 21-23, 1946.
57. R. A. Fisher, "Tests of significance in harmonic analysis," Proc. R. Soc. London Ser. A, 125, No. 796, pp. 54-59, 1929.
58. L. E. Franks, Signal Theory, Prentice-Hall, Englewood Cliffs, NJ, 1969.
59. L. E. Franks, "Polyperiodic linear filtering," in Cyclostationarity in Communications and Signal Processing, W. A. Gardner, Ed., IEEE Press, New York, 1994.
60. P. H. Franses, Periodicity and Stochastic Trends in Economic Time Series, Oxford University Press, New York, 1996.
61. R. Gangolli, "Wide sense stationary sequences of distributions on Hilbert space and the factorization of operator-valued functions," J. Math. Mech., 12, pp. 893-910, 1963.
62. V. F. Gaposhkin, "Criteria for the strong law of large numbers for some classes of second order stationary processes and homogeneous random fields," Theory Probab. Appl., XXII, No. 2, pp. 286-310, 1977.
63. W. A. Gardner, "Representation and estimation of cyclostationary processes," Ph.D. Dissertation, Department of Electrical and Computer Engineering, University of Massachusetts, August, 1972; reprinted as Signal and Image Processing Lab Technical Report No. SIPL-82-1, Department of Electrical and Computer Engineering, University of California at Davis, 1982.
64. W. A. Gardner and L. E. Franks, "Characterization of cyclostationary random signal processes," IEEE Trans. Inf. Theory, IT-21, pp. 4-14, 1975.
65. W. A. Gardner, "Stationarizable random processes," IEEE Trans. Inf. Theory, IT-24, pp. 8-22, 1978.
66. W. A. Gardner, Introduction to Random Processes with Application to Signals and Systems, Macmillan, New York, 1985.
67. W. A. Gardner, Statistical Spectral Analysis: A Nonprobabilistic Theory, Prentice-Hall, Englewood Cliffs, NJ, 1987.
68. W. A. Gardner, "Signal interception: a unifying theoretical framework for feature detection," IEEE Trans. Commun., COM-36, pp. 897-906, 1988.
69. W. A. Gardner, "Two alternative philosophies for estimation of the parameters of time-series," IEEE Trans. Inf. Theory, 37, pp. 216-218, 1991.
70. W. A. Gardner, "Exploiting spectral redundancy in cyclostationary signals," IEEE ASSP Mag., 8, pp. 14-36, 1991.
71. W. A. Gardner and C. M. Spooner, "Signal interception: performance advantages of cyclic feature detectors," IEEE Trans. Commun., COM-40, pp. 149-159, 1992.
72. W. A. Gardner and C. M. Spooner, "Detection and source location of weak cyclostationary signals: simplification of the maximum likelihood receiver," IEEE Trans. Commun., COM-41, pp. 905-916, 1993.
73. W. A. Gardner, "An introduction to cyclostationary signals," in Cyclostationarity in Communications and Signal Processing, W. A. Gardner, Ed., IEEE Press, New York, 1994.
74. W. A. Gardner, A. Napolitano, and L. Paura, "Cyclostationarity: half a century of research," Signal Processing, 86, pp. 639-697, 2006.
75. M. J. Genossar, H. Lev-Ari, and T. Kailath, "Consistent estimation of the cyclic autocorrelation," IEEE Trans. Signal Proc., 42, pp. 595-603, 1994.
76. E. G. Gladyshev, "On multi-dimensional stationary random processes," Theory Probab. Appl., 3, pp. 425-428, 1958.
77. E. G. Gladyshev, "Periodically correlated random sequences," Sov. Math., 2, pp. 385-388, 1961.
78. E. G. Gladyshev, "Periodically and almost periodically correlated random processes with continuous time parameter," Theory Probab. Appl., 8, pp. 173-177, 1963.
79. G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins Press, Baltimore, 1987.
80. N. R. Goodman, "On the joint estimation of the spectrum, co-spectra and quadrature spectrum of a two-dimensional stationary Gaussian process," Dissertation, Princeton University, 1957; also Scientific Paper No. 10, Engineering Scientific Laboratory, New York University, AD134919, 1957.
81. N. R. Goodman, "Statistical analysis based on the multivariate complex Gaussian distribution," Ann. Math. Stat., 34, pp. 152-177, 1963.
82. N. R. Goodman, "Statistical tests for nonstationarity within the framework of harmonizable processes," Rocketdyne Research Report No. 65-28, AD619270, August 2, 1965.
83. M. Gu and L. Miranian, "Strong rank revealing Cholesky factorization," Electron. Trans. Numer. Anal., 17, pp. 76-92, 2004.
84. L. I. Gudzenko, "On periodically nonstationary processes," Radiotekhnika i elektronika, 4, No. 6, pp. 1062-1064, 1959.
85. http://www.Amnh.ufl.edu/natsci/ornithology/sounds.htm.
86. P. R. Halmos, Measure Theory, Van Nostrand, Princeton, NJ, 1950.
87. P. R. Halmos, Introduction to Hilbert Space, Chelsea Publishing Company, New York, 1957.
88. E. J. Hannan, Multiple Time Series, Wiley, Hoboken, NJ, 1970.
89. H. Helson and G. Szegö, "A problem in prediction theory," Ann. Mat. Pura Appl., 51, pp. 107-138, 1960.
90. L. J. Herbst, "Almost periodic variances," Ann. Math. Stat., 34, pp. 1549-1557, 1963.
91. L. J. Herbst, "Periodogram analysis and variance fluctuations," J. R. Stat. Soc. B, 25, pp. 442-450, 1963.
92. L. J. Herbst, "A test for variance heterogeneity in the residuals of a Gaussian moving average," J. R. Stat. Soc. B, 25, pp. 451-454, 1963.
93. L. J. Herbst, "Spectral analysis in the presence of variance fluctuations," J. R. Stat. Soc. B, 26, pp. 354-360, 1964.
94. L. J. Herbst, "Stationary amplitude fluctuations in a time series," J. R. Stat. Soc. B, 26, pp. 361-364, 1964.
95. L. J. Herbst, "The statistical Fourier analysis of variances," J. R. Stat. Soc. B, 27, pp. 159-165, 1965.
96. L. J. Herbst, "Fourier methods in the study of variance fluctuations in time series analysis," Technometrics, 11, pp. 103-113, 1969.
97. I. Honda, "On the spectral representation and related properties of periodically correlated stochastic processes," Trans. IECE Japan, E65, pp. 723-729, 1982.
98. I. Honda, "On the ergodicity of Gaussian periodically correlated stochastic processes," Trans. IEICE Japan, E73, pp. 1729-1737, 1990.
99. C. H. Houdré, "Harmonizability, V-boundedness, (2, p)-boundedness of stochastic processes," Prob. Theory Relat. Fields, 84, pp. 39-54, 1987.
100. C. H. Houdré, "Linear Fourier and stochastic analysis," Prob. Theory Relat. Fields, 87, pp. 167-188, 1990.
101. H. L. Hurd, "An investigation of periodically correlated stochastic processes," PhD Dissertation, Duke University, Department of Electrical Engineering, Nov. 1969.
102. H. L. Hurd, "Periodically correlated processes with discontinuous correlation functions," Theory Probab. Appl., 19, pp. 834-838, 1974.
103. H. L. Hurd, "Stationarizing properties of random shifts," SIAM J. Appl. Math., 26, pp. 203-211, 1974.
104. H. L. Hurd, "Representation of strongly harmonizable periodically correlated processes and their covariances," J. Multivariate Anal., 29, pp. 53-67, 1989.
105. H. L. Hurd, "Nonparametric time series analysis for periodically correlated processes," IEEE Trans. Inf. Theory, IT-35, pp. 350-359, 1989.
106. H. L. Hurd, "Correlation theory of almost periodically correlated processes," J. Multivariate Anal., 37, pp. 24-45, 1991.
107. H. L. Hurd and N. L. Gerr, "Graphical methods for determining the presence of periodic correlation in time series," J. Time Series Anal., 12, pp. 337-350, 1991.
108. H. L. Hurd and J. Leskow, "Estimation of the Fourier coefficient functions and their spectral densities for φ-mixing almost periodically correlated processes," Stat. Prob. Lett., 14, pp. 299-306, 1992.
109. H. L. Hurd and J. Leskow, "Strongly consistent and asymptotically normal estimation of the covariance for almost periodically correlated processes," Stat. Decisions, 10, pp. 201-225, 1992.
110. H. L. Hurd and V. Mandrekar, "Spectral theory of periodically and quasi-periodically stationary SαS sequences," Technical Report No. 349, Center for Stochastic Processes, Department of Statistics, UNC at Chapel Hill, Sept. 1991.
111. H. L. Hurd and G. Kallianpur, "Periodically correlated and periodically unitary processes and their relationship to L²[0, T]-valued stationary sequences," in Nonstationary Stochastic Processes and Their Applications, J. C. Hardin and A. G. Miamee, Eds., World Scientific Publishing, Singapore, 1992.
112. H. L. Hurd, "Almost periodically unitary stochastic processes," Stoch. Proc. Appl., 43, pp. 99-113, 1992.
113. H. L. Hurd and A. Russek, "Almost periodically correlated and almost periodically unitary processes in the sense of Stepanov," Theory Probab. Appl., 41, 1996.
114. H. L. Hurd and A. Russek, "Almost periodically correlated processes in LCA groups," Technical Report No. 369, Center for Stochastic Processes, Department of Statistics, UNC at Chapel Hill, 1992.
115. H. L. Hurd and C. H. Jones, "Dynamical systems with cyclostationary orbits," in The Chaos Paradigm: Developments and Applications in Engineering and Science, R. Katz, Ed., AIP Press, New York, 1994.
116. H. L. Hurd and T. Koski, "The Wold isomorphism for cyclostationary sequences," Signal Processing, 84, No. 5, pp. 813-824, 2004.
117. H. L. Hurd and T. Koski, "Cyclostationary arrays: their unitary operators and representations," in Stochastic Processes and Functional Analysis: A Volume of Recent Advances in Honor of M. M. Rao, Lecture Notes in Pure and Applied Mathematics No. 238, Alan Krinik and Randall Swift, Eds., Marcel Dekker, New York, 2004.
118. H. L. Hurd, G. Kallianpur, and J. Farshidi, "Correlation and spectral theory for periodically correlated random fields indexed on Z²," J. Multivariate Anal., 90, No. 2, pp. 359-383, 2004.
119. H. L. Hurd, "Periodically correlated sequences of less than full rank," J. Stat. Planning Inference, 129, pp. 279-303, 2005.
120. I. A. Ibragimov, "Some limit theorems for stationary processes," Theory Probab. Appl., 12, pp. 349-382, 1962.
121. Y. Isokawa, "An identification problem in almost and asymptotically almost periodically correlated processes," J. Appl. Prob., 19, pp. 53-67, 1982.
122. R. H. Jones and W. M. Brelsford, "Time series with periodic structure," Biometrika, 54, pp. 403-408, 1967.
123. K. L. Jordan, "Discrete representations of random signals," Technical Report No. 378, MIT Research Laboratory of Electronics, 1961.
124. G. Kallianpur and V. Mandrekar, "Spectral theory of stationary H-valued processes," J. Multivariate Anal., 1, pp. 1-16, 1971.
125. R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. ASME J. Basic Eng., 83D, pp. 35-45, 1960.
126. J. Kampé de Fériet, "Correlation and spectrum of asymptotically stationary random functions," Math. Stud., 30, pp. 55-67, 1962.
127. J. Kampé de Fériet and F. N. Frenkiel, "Correlation and spectra for nonstationary random functions," Math. Comp., 16, pp. 1-21, 1962.
128. Y. Katznelson, An Introduction to Harmonic Analysis, Dover, New York, 1976.
129. A. Khintchine, "Korrelationstheorie der stationären stochastischen Prozesse" [Correlation theory of stationary stochastic processes], Math. Ann., 109, pp. 604-615, 1934.
130. K. Kim, G. North, and J. Huang, "EOFs of one-dimensional cyclostationary time series: computations, examples and stochastic modeling," J. Atmos. Sci., 53, pp. 1007-1017, 1996.
131. K. Kim and G. North, "EOFs of harmonizable cyclostationary processes," J. Atmos. Sci., 54, pp. 2417-2427, 1997.
132. K. Kim and Q. Wu, "A comparison study of EOF techniques: analysis of nonstationary data with periodic statistics," J. Climate, 12, pp. 185-199, 1999.
133. A. Kolmogorov and Y. Rozanov, "On strong mixing conditions for stationary Gaussian processes," Theory Probab. Appl., 5, pp. 204-208, 1960.
134. A. Kolmogorov, "Stationary sequences in Hilbert space" (in Russian), Bull. Math. Univ. Moscow, 2, 1941. Translated in report CN/74/2, J. F. Barrett, trans., Department of Engineering, Cambridge University, pp. 1-24, 1974.
135. J. Leskow, "Asymptotic normality of the spectral density estimators for almost periodically correlated processes," Stoch. Proc. Appl., 52, pp. 351-360, 1994.
136. B. M. Levitan and V. V. Zhikov, Almost Periodic Functions and Differential Equations, Cambridge University Press, London, 1982.
137. W. K. Li and Y. V. Hui, "An algorithm for the exact likelihood of periodic autoregressive-moving average (PARMA) models," Commun. Stat. Simulation, 17, No. 4, pp. 1484-1494, 1988.
138. M. Loève, "Fonctions aléatoires du second ordre" [Second-order random functions], in P. Lévy's Processus Stochastiques et Mouvement Brownien, pp. 228-252, Gauthier-Villars, Paris, 1948.
139. M. Loève, Probability Theory, Van Nostrand, New York, 1965.
140. R. Lund and I. V. Basawa, "Recursive prediction and likelihood evaluation for periodic ARMA models," J. Time Series Anal., 21, pp. 75-93, 2000.
141. H. Lütkepohl, Introduction to Multiple Time Series Analysis, 2nd ed., Springer-Verlag, New York, 1993.
142. A. Makagon and H. Salehi, "Structure of periodically distributed stochastic sequences," in Stochastic Processes, A Festschrift in Honour of Gopinath Kallianpur, pp. 245-251, Springer-Verlag, 1993.
143. A. Makagon, A. G. Miamee, and H. Salehi, "Continuous time periodically correlated processes: spectrum and prediction," Stoch. Proc. Appl., 49, pp. 277-295, 1994.
144. A. Makagon and H. Salehi, "Spectral dilation of operator valued measures and its application to infinite dimensional harmonizable processes," Studia Math., 85, pp. 254-297, 1987.
145. A. Makagon, "Induced stationary process and structure of locally square integrable periodically correlated processes," Studia Math., 136, pp. 71-85, 1999.
146. A. Makagon and A. Weron, "Wold-Cramér concordance theorems for interpolation of q-variate stationary processes over locally compact Abelian groups," J. Multivariate Anal., 6, pp. 123-137, 1976.
147. A. Makagon, A. G. Miamee, and H. L. Hurd, "On AR(1) models with periodic and almost periodic coefficients," Stoch. Proc. Appl., 100, pp. 167-185, 2002.
148. K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, Academic Press, New York, 1979.
149. V. A. Markelov, "Axis crossings and relative time of existence of periodically nonstationary random processes," Sov. Radiophys., 9, pp. 440-443, 1966.
150. D. E. K. Martin, "Estimation of the minimal period of periodically correlated sequences," Ph.D. Dissertation, Department of Mathematics, University of Maryland at College Park, 1990.
151. G. Maruyama, "The harmonic analysis of stationary stochastic processes," Mem. Fac. Sci. Kyushu Univ. Ser. A, 4, 1949. Reprinted in Gisiro Maruyama Selected Papers, Kaigai Publications, Tokyo, 1988.
152. P. Masani, "The prediction theory of multivariate stochastic processes III," Acta Math., 104, pp. 141-162, 1960.
153. P. Masani, "Recent trends in multivariate prediction theory," in Multivariate Analysis, P. R. Krishnaiah, Ed., pp. 351-382, Academic Press, New York, 1966.
154. A. G. Miamee, "Spectral dilation of L(B, R)-valued measures and its application to stationary dilation for Banach space valued processes," Indiana Univ. Math. J., 38, pp. 841-860, 1989.
155. A. G. Miamee and H. Salehi, "Harmonizability, V-boundedness and stationary dilation of stochastic processes," Indiana Univ. Math. J., 27, pp. 37-50, 1978.
156. A. G. Miamee and H. Salehi, "On the bilateral prediction error matrix of a multivariate stationary stochastic process," SIAM J. Appl. Math., 10, pp. 247-253, 1979.
157. A. G. Miamee and H. Salehi, "On the prediction of periodically correlated stochastic processes," in Multivariate Analysis V, P. R. Krishnaiah, Ed., pp. 167-179, North-Holland, Amsterdam, 1980.
158. A. G. Miamee and H. Salehi, "On an explicit representation of the linear predictor of a weakly stationary stochastic sequence," Bol. Soc. Mat. Mexicana, 28, pp. 81-93, 1983.
159. A. G. Miamee, "On determining the predictor of nonfull-rank multivariate stationary random processes," SIAM J. Appl. Math., 18, pp. 909-918, 1987.
160. A. G. Miamee, "Periodically correlated processes and their stationary dilations," SIAM J. Appl. Math., 50, pp. 1194-1199, 1990.
161. A. G. Miamee and M. Pourahmadi, "Best approximation in L^p(dμ) and prediction problems of Szegö, Kolmogorov, Yaglom and Nakazi," J. London Math. Soc., 38, pp. 133-145, 1988.
162. S. Mittnik, "Computation of theoretical autocovariance matrices of multivariate autoregressive moving average time series," J. R. Stat. Soc. B, 52, pp. 151-155, 1990.
163. S. Mittnik, "Computing theoretical autocovariances of multivariate autoregressive moving average models by using a block Levinson method," J. R. Stat. Soc. B, 55, pp. 435-440, 1993.
164. W. Mlak, "Dilations of Hilbert space operators (general theory)," Dissertationes Math., CLIII, pp. 1-61, 1978.
165. A. S. Monin, "Stationary and periodic time series in the general circulation of the atmosphere," in Proceedings of Symposium on Time Series Analysis, M. Rosenblatt, Ed., Wiley, Hoboken, NJ, 1963.
166. A. Napolitano and J. Leskow, "Quantile prediction for time series in the fraction-of-time probability framework," Signal Proc., 82, pp. 1727-1741, 2002.
167. A. R. Nematollahi and T. Subba Rao, "On the spectral density estimation of periodically correlated (cyclostationary) time series," Sankhya, 67, Part 3, pp. 568-589, 2005.
168. H. Niemi, "On stationary dilations and the linear prediction of certain stochastic processes," Soc. Sci. Fenn. Comment. Phys.-Math., 45, pp. 111-130, 1975.
169. H. Niemi, "Stochastic processes as Fourier transforms of stochastic measures," Ann. Acad. Sci. Fenn. Ser. A.I. Math., 591, pp. 1-47, 1975.
170. H. Niemi, "On orthogonally scattered dilations of bounded vector measures," Ann. Acad. Sci. Fenn. Ser. A.I. Math., 3, pp. 43-52, 1977.
171. H. Niemi, "Diagonal measure of a positive definite bimeasure," in Lecture Notes in Mathematics, No. 945, pp. 237-246, Springer-Verlag, New York, 1982.
172. H. Niemi, "Grothendieck's inequality and minimal orthogonally scattered dilations," in Lecture Notes in Mathematics, No. 1080, pp. 175-187, 1984.
173. A. H. Nuttall and G. Clifford Carter, "Approximation to the cumulative distribution function of the magnitude-squared coherence estimate," IEEE Trans. ASSP, ASSP-29, No. 4, pp. 932-936, 1981.
174. H. Ogura, "Spectral representation of periodic nonstationary random processes," IEEE Trans. Inf. Theory, IT-17, pp. 143-149, 1971.
175. M. Pagano, "On periodic and multiple autoregressions," Ann. Stat., 6, pp. 1310-1317, 1978.
176. A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1962.
177. E. Parzen, "On consistent estimates of the spectrum of a stationary time series," Ann. Math. Stat., 28, pp. 24-43, 1957.
178. E. Parzen, "Spectral analysis of asymptotically stationary time series," Bull. Int. Stat. Inst., 39, No. 2, pp. 87-103, 1962.
179. E. Parzen, Stochastic Processes, Holden-Day, San Francisco, 1962.
180. E. Parzen, "On spectral analysis with missing observations," Sankhya, Ser. A, 25, pp. 383-392, 1963.
181. M. Pourahmadi and H. Salehi, "On subordination and linear transformation of harmonizable and periodically correlated processes," in Probability Theory on Vector Spaces III, pp. 195-213, Springer-Verlag, New York/Berlin, 1984.
182. M. Pourahmadi, "Taylor expansion of exp(∑_{k=0}^∞ a_k z^k) and some applications," Am. Math. Monthly, 91, pp. 303-307, 1984.
183. M. Pourahmadi, Foundations of Time Series Analysis and Prediction Theory, Wiley, Hoboken, NJ, 2001.
184. J. S. Prater and C. M. Loeffler, "Analysis and design of periodically time-varying IIR filters, with applications to transmultiplexing," IEEE Trans. Signal Proc., 40, pp. 2715-2725, 1992.
185. M. B. Priestley, "Evolutionary spectra and nonstationary processes," J. R. Stat. Soc., Ser. B, 27, pp. 204-237, 1965.
186. B. Quinn, The Estimation of Frequency, Academic Press, New York, 2001.
187. H. Radjavi and P. Rosenthal, Invariant Subspaces, Springer-Verlag, New York/Berlin, 1973.
188. M. M. Rao, "Harmonizable processes: structure theory," L'Enseign. Math., 28, pp. 295-351, 1982.
189. M. M. Rao and K. Chang, "Bimeasures and nonstationary processes," in Real and Stochastic Analysis, M. M. Rao, Ed., pp. 7-118, Wiley, Hoboken, NJ, 1986.
190. J. Ramanathan and O. Zeitouni, "On the wavelet transform of fractional Brownian motion," IEEE Trans. Inf. Theory, IT-37, pp. 1156-1158, 1991.
191. G. C. Reinsel, Elements of Multivariate Time Series Analysis, Springer-Verlag, New York, 1997.
192. F. Riesz and B. Sz.-Nagy, Functional Analysis, Frederick Ungar, New York, 1965.
193. R. A. Roberts, W. A. Brown, and H. H. Loomis, "A review of digital spectral correlation analysis: theory and implementation," in Cyclostationarity in Communications and Signal Processing, W. A. Gardner, Ed., IEEE Press, New York, 1994.
194. G. H. Robertson, "Operating characteristics for a linear detector of CW signals in narrowband Gaussian noise," Bell Syst. Tech. J., 46, pp. 755-774, 1967.
195. H. L. Royden, Real Analysis, Macmillan, New York, 1968.
196. M. Rosenberg, "Quasi-isometric dilations of operator-valued measures and Grothendieck's inequality," Pacific J. Math., 103, pp. 135-161, 1982.
197. M. Rosenblatt, "A central limit theorem and a strong mixing condition," Proc. NAS, 42, pp. 43-47, 1956.
198. M. Rosenblatt, Stationary Sequences and Random Fields, Birkhäuser, Boston, 1985.
199. Y. A. Rozanov, "Spectral theory of multi-dimensional stationary random processes with discrete time," Usp. Mat. Nauk, 13, No. 2, pp. 93-142, 1958.
200. Y. A. Rozanov, "Spectral analysis of abstract functions," Theory Probab. Appl., 4, pp. 271-287, 1959.
201. Y. A. Rozanov, Stationary Random Processes, Holden-Day, San Francisco, 1967.
202. W. Rudin, Real and Complex Analysis, McGraw-Hill, New York, 1987.
203. W. Rudin, Fourier Analysis on Groups, Wiley, Hoboken, NJ, 1990.
204. H. Sakai, "Circular lattice filtering using Pagano's method," IEEE Trans. ASSP, 30, pp. 279-287, 1982.
205. H. Sakai, "On the spectral density matrix of a periodic ARMA process," J. Time Series Anal., 12, pp. 73-82, 1991.
206. Q. Shao and R. Lund, "Computation and characterization of autocorrelations and partial autocorrelations in periodic ARMA models," J. Time Series Anal., 25, No. 3, pp. 359-372, 2004.
207. L. Scharf, Statistical Signal Processing, Addison-Wesley, New York, 1990.
208. A. Schuster, "On lunar and solar periodicities of earthquakes," Proc. R. Soc., 61, pp. 455-465, 1897.
209. A. Schuster, "On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena," Terr. Magn., 3, pp. 13-41, 1898.
210. M. H. Stone, "On one parameter unitary groups in Hilbert space," Annals of Math., 33, pp. 643-648, 1932.
211. Taconite Inlet Project, http://eclogite.geo.umass.edu/climate/TILPHTML/TILPhome.html.
212. C. J. Tian, "A limiting property of sample autocovariances of periodically correlated processes with application to period determination," J. Time Series Anal., 9, pp. 411-417, 1988.
213. G. C. Tiao and M. R. Grupe, "Hidden periodic autoregressive moving average models in time series data," Biometrika, 67, pp. 365-373, 1980.
214. D. Tjostheim and J. B. Thomas, "Some properties and examples of random processes that are almost wide sense stationary," IEEE Trans. Inf. Theory, IT-21, pp. 257-262, 1975.
215. D. Tjostheim, "On the analysis of a class of multivariate nonstationary stochastic processes," in Prediction Theory and Harmonic Analysis, V. Mandrekar and H. Salehi, Eds., pp. 403-416, North-Holland, Amsterdam, 1983.
216. A. Trujillo-Ortiz and R. Hernandez-Walls, "Btest: Bartlett's test for homogeneity of variances," see URL http://www.mathworks.com/matlabcentral/fileexchange.
217. A. Vecchia, "Maximum likelihood estimation for periodic autoregressive moving average models," Technometrics, 27, pp. 375-384, 1985.
218. A. Vecchia, "Periodic autoregressive-moving average (PARMA) modeling with applications to water resources," Water Resour. Bull., 21, No. 5, 1985.
219. A. Vecchia and R. Ballerini, "Testing for periodic autocorrelations in seasonal time series data," Biometrika, 78, pp. 53-63, 1991.
220. S. Wang and M. Tang, "Exact confidence interval for magnitude-squared coherence estimates," IEEE Signal Proc. Lett., 11, No. 3, pp. 326-329, 2004.
221. A. D. Whalen, Detection of Signals in Noise, Academic Press, New York, 1971.
222. P. Westfall and S. Young, Resampling-Based Multiple Testing, Wiley, Hoboken, NJ, 1993.
223. N. Wiener, "Generalized harmonic analysis," Acta Math., 55, pp. 117-258, 1930.
224. N. Wiener and P. Masani, "The prediction theory of multivariate stochastic processes I," Acta Math., 98, pp. 111-150, 1957.
225. N. Wiener and P. Masani, "The prediction theory of multivariate stochastic processes II," Acta Math., 99, pp. 93-137, 1958.
226. H. O. A. Wold, "On prediction in stationary time series," Ann. Math. Stat., 19, pp. 558-567, 1948.
227. A. M. Yaglom, Correlation Theory of Stationary and Related Random Functions, Springer-Verlag, New York, 1987.
228. I. N. Yavorskiy, "The statistical analysis of periodically correlated random processes" (in Russian), Radiotekhnika i elektronika, 30, No. 6, pp. 1096-1104, 1985.
229. V. N. Zasuhin, "On the theory of multi-dimensional stationary random processes," Dokl. Akad. Nauk SSSR, 116, pp. 435-437, 1941.
INDEX
aggregation, 189
aliasing, 187
almost PC random sequences, 17
almost periodic, 3
almost periodic sequences, 17
almost sure consistency of m̂_{t,N} and B̂_{k,N}, 255
almost sure consistency of B̂_{k,NT}(τ), 276
almost sure consistency of R̂(t+τ, t), 268
amplitude modulation, 22
asymptotic covariance of shifted periodogram, φ-mixing sequences, 305
asymptotic covariance of shifted periodogram, Gaussian sequences, 303
asymptotic normality, 269, 277
asymptotic normality of m̂_{t,N} and B̂_{k,N}, 257
asymptotic normality of γ̂_{k,N}(λ), 306
asymptotic stationarity, 14, 172
asymptotic variance of shifted periodogram, 301
autocorrelation function, 84, 120, 265
autocovariance function, 130, 141, 160, 265
autonomous, 218
band limited, 202
bandshifting, 191
bandwidth, 202
Bcoeff.m, 289
blocking, 5
causal filters, 146, 177, 192
causal sequence, 217
characterization of Fourier transform, 154
Cholesky decomposition, 97, 121, 245, 247
classical periodogram, 19
complex bandshifting, 191
concordance, 90
confidence intervals for γ̂_{k,N}(λ), 312
covariance function, 67, 149
cyclostationary, 4, 14, 170
detecting a periodic sequence, 324
Periodically Correlated Random Sequences: Spectral Theory and Practice. By H. L. Hurd and A. G. Miamee. Copyright © 2007 John Wiley & Sons, Inc.
deterministic, 76, 107, 216
deterministic system, 218
Durbin-Levinson, 95, 243
effect of spectral coherence smoothing parameter M, 311
effects of discrete spectral components, 322
evidence of PC structure, 310
F-integrable, 139
FFT, 33, 263
finite past prediction, 125, 237
finite past predictor, 91
fkest.m, 313
Fourier frequencies, 263
Fourier series representation, 152
Gaussian, 14, 76, 271
Gram-Schmidt procedure, 97, 129
harmonizable, 141, 205
Herglotz's theorem, 68, 154
Hilbert space, 74
infinite moving average, 147
infinite past prediction, 75, 107, 119, 235
innovation, 78, 81, 109, 124, 220
innovation algorithm, 97, 98
Jensen, 89
jointly stationary, 24, 80
Kolmogorov isomorphism, 74, 89, 105
L²[0, 2π), 86
least square predictor, 80, 98
Lebesgue decomposition, 90, 173
linear PC sequence, 258, 266, 293, 306
linear time invariant (LTI) filter, 146, 176, 179
mean ergodic theorem, 72, 102, 145, 210
mean square consistency of m̂_{t,N}, 250
mean square consistency of B̂_{k,N}, 253
mean square consistency of B̂_{k,NT}(τ), 274
mean square consistency of R̂(t+τ, t), 267
mixing conditions, 259, 260, 269
moving average coefficients, 84, 121
moving average representation, 81, 85, 115, 117, 130, 223
multivariate stationarity, 5
multivariate white noise, 117
nonnegative definite (NND), 68, 84, 153
normal equations, 91, 125, 238
orthogonal increment, 151
orthogonally scattered, 8, 71, 151, 165
oscillatory, 205
φ-mixing, 259
PAR(1), 38, 215, 236
PAR(1)-CVS, 25, 26, 42
PARMA, 25
partial autocorrelation, 94, 241
past, 75, 216
PC random fields, 16
PC white noise, 170
peracf.m, 285
periodic autoregression (PAR), 25, 38, 215
periodic moving average (PMA), 25, 27, 40, 214
periodic time varying (PTV) filters, 192
periodically correlated (PC), 1, 5
periodically perturbed dynamical systems, 28
periodically stationary, 3
periodogram, 19
permcoeff.m, 264
permest.m, 262
persigest.m, 284
positive definite (PD), 84
predictand, 75
predictor, 75
projection, 53, 107
purely nondeterministic, 76, 107, 216
Radon-Nikodym, 85
random measure, 71, 101, 141
random periodic sequences, 20
random spectral measure, 71, 101, 145, 187, 191, 253
rank, 111, 121, 222
regular, 76, 107, 216
remote past, 107
sample periodic mean, 1
scoh.m, 313
semivariation, 135
shifted periodogram, 299
singular, 76, 107, 216
spectral coherence, 308
spectral density, 85-88, 91, 102, 103, 116, 121, 124, 260
spectral density matrix, 261
spectral density of PC sequence, 229
spectral distribution function, 68, 87, 88, 90, 91, 124, 149, 171, 257
spectral domain, 74, 102, 104
spectral measure, 71, 121, 141
stable filters, 177, 192
stable linear filter, 147
stable representation, 218
stationarity, 4
stationary dilation, 142, 212
stochastic integral, 151
stochastic or random process, 2
strict stationarity, 3
strongly harmonizable, 141
time dependent spectral measure, 203
time domain, 68, 69, 74, 100, 106, 215
total variation, 135
transfer function, 147, 149
transient, 174
translation series representation (TSR), 210
uncorrelated multivariate sequence, 116
uniformly mixing, 260
unitary shift, 68, 100, 200
variance contrast, 325
vector measure integration, 134
white noise, 78, 81, 116, 170
Wold decomposition, 76, 108, 218
Yule-Walker equations, 95, 242
WILEY SERIES IN PROBABILITY AND STATISTICS ESTABLISHED BY WALTER A. SHEWHART AND SAMUEL S. WILKS Editors: David J. Balding, Noel A . C. Cressie, Nicholas I. Fisher, Iain M. Johnstone, J. B. Kadane, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Sanford Weisberg Editors Emeriti: Vic Barnett, J. Stuart Hunter, David G. Kendall, Jozef L. Teugels The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state-of-the-art developments in the field and classical methods. Reflecting the wide range of current research in statistics, the series encompasses applied, methodological and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches. This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.
ABRAHAM and LEDOLTER . Statistical Methods for Forecasting AGRESTI . Analysis of Ordinal Categorical Data AGRESTI . An Introduction to Categorical Data Analysis, Second Edition AGRESTI . Categorical Data Analysis, Second Edition ALTMAN, GILL, and McDONALD . Numerical Issues in Statistical Computing for the Social Scientist AMARATUNGA and CABRERA . Exploration and Analysis of DNA Microarray and Protein Array Data ANDEL . Mathematics of Chance ANDERSON . An Introduction to Multivariate Statistical Analysis, Third Edition ANDERSON . The Statistical Analysis of Time Series ANDERSON, AUQUIER, HAUCK, OAKES, VANDAELE, and WEISBERG . Statistical Methods for Comparative Studies ANDERSON and LOYNES . The Teaching of Practical Statistics ARMITAGE and DAVID (editors) . Advances in Biometry ARNOLD, BALAKRISHNAN, and NAGARAJA . Records ARTHANARI and DODGE . Mathematical Programming in Statistics BAILEY . The Elements of Stochastic Processes with Applications to the Natural Sciences BALAKRISHNAN and KOUTRAS . Runs and Scans with Applications BALAKRISHNAN and NG . Precedence-Type Tests and Applications BARNETT . Comparative Statistical Inference, Third Edition BARNETT . Environmental Statistics BARNETT and LEWIS . Outliers in Statistical Data, Third Edition BARTOSZYNSKI and NIEWIADOMSKA-BUGAJ . Probability and Statistical Inference BASILEVSKY . Statistical Factor Analysis and Related Methods: Theory and Applications BASU and RIGDON . Statistical Methods for the Reliability of Repairable Systems BATES and WATTS . Nonlinear Regression Analysis and Its Applications
*Now available in a lower priced paperback edition in the Wiley Classics Library. †Now available in a lower priced paperback edition in the Wiley-Interscience Paperback Series.
BECHHOFER, SANTNER, and GOLDSMAN . Design and Analysis of Experiments for Statistical Selection, Screening, and Multiple Comparisons BELSLEY . Conditioning Diagnostics: Collinearity and Weak Data in Regression BELSLEY, KUH, and WELSCH . Regression Diagnostics: Identifying Influential Data and Sources of Collinearity BENDAT and PIERSOL . Random Data: Analysis and Measurement Procedures, Third Edition BERRY, CHALONER, and GEWEKE . Bayesian Analysis in Statistics and Econometrics: Essays in Honor of Arnold Zellner BERNARDO and SMITH . Bayesian Theory BHAT and MILLER . Elements of Applied Stochastic Processes, Third Edition BHATTACHARYA and WAYMIRE . Stochastic Processes with Applications BILLINGSLEY . Convergence of Probability Measures, Second Edition BILLINGSLEY . Probability and Measure, Third Edition BIRKES and DODGE . Alternative Methods of Regression BISWAS, DATTA, FINE, and SEGAL . Statistical Advances in the Biomedical Sciences: Clinical Trials, Epidemiology, Survival Analysis, and Bioinformatics BLISCHKE AND MURTHY (editors) . Case Studies in Reliability and Maintenance BLISCHKE AND MURTHY . Reliability: Modeling, Prediction, and Optimization BLOOMFIELD . Fourier Analysis of Time Series: An Introduction, Second Edition BOLLEN . Structural Equations with Latent Variables BOLLEN and CURRAN . Latent Curve Models: A Structural Equation Perspective BOROVKOV . Ergodicity and Stability of Stochastic Processes BOULEAU . Numerical Methods for Stochastic Processes BOX . Bayesian Inference in Statistical Analysis BOX . R. A. Fisher, the Life of a Scientist BOX and DRAPER . Response Surfaces, Mixtures, and Ridge Analyses, Second Edition BOX and DRAPER . Evolutionary Operation: A Statistical Method for Process Improvement BOX and FRIENDS . Improving Almost Anything, Revised Edition BOX, HUNTER, and HUNTER . Statistics for Experimenters: Design, Innovation, and Discovery, Second Edition BOX and LUCEÑO . Statistical Control by Monitoring and Feedback Adjustment BRANDIMARTE . 
Numerical Methods in Finance: A MATLAB-Based Introduction BROWN and HOLLANDER . Statistics: A Biomedical Introduction BRUNNER, DOMHOF, and LANGER . Nonparametric Analysis of Longitudinal Data in Factorial Experiments BUCKLEW . Large Deviation Techniques in Decision, Simulation, and Estimation CAIROLI and DALANG . Sequential Stochastic Optimization CASTILLO, HADI, BALAKRISHNAN, and SARABIA . Extreme Value and Related Models with Applications in Engineering and Science CHAN . Time Series: Applications to Finance CHARALAMBIDES . Combinatorial Methods in Discrete Distributions CHATTERJEE and HADI . Regression Analysis by Example, Fourth Edition CHATTERJEE and HADI . Sensitivity Analysis in Linear Regression CHERNICK . Bootstrap Methods: A Guide for Practitioners and Researchers, Second Edition CHERNICK and FRIIS . Introductory Biostatistics for the Health Sciences CHILES and DELFINER . Geostatistics: Modeling Spatial Uncertainty CHOW and LIU . Design and Analysis of Clinical Trials: Concepts and Methodologies, Second Edition CLARKE and DISNEY . Probability and Random Processes: A First Course with Applications, Second Edition COCHRAN and COX . Experimental Designs, Second Edition
CONGDON . Applied Bayesian Modelling CONGDON . Bayesian Models for Categorical Data CONGDON . Bayesian Statistical Modelling CONOVER . Practical Nonparametric Statistics, Third Edition COOK . Regression Graphics COOK and WEISBERG . Applied Regression Including Computing and Graphics COOK and WEISBERG . An Introduction to Regression Graphics CORNELL . Experiments with Mixtures, Designs, Models, and the Analysis of Mixture Data, Third Edition COVER and THOMAS . Elements of Information Theory COX . A Handbook of Introductory Statistical Methods COX . Planning of Experiments CRESSIE . Statistics for Spatial Data, Revised Edition CSORGO and HORVATH . Limit Theorems in Change Point Analysis DANIEL . Applications of Statistics to Industrial Experimentation DANIEL . Biostatistics: A Foundation for Analysis in the Health Sciences, Eighth Edition DANIEL . Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition DASU and JOHNSON . Exploratory Data Mining and Data Cleaning DAVID and NAGARAJA . Order Statistics, Third Edition DEGROOT, FIENBERG, and KADANE . Statistics and the Law DEL CASTILLO . Statistical Process Adjustment for Quality Control DEMARIS . Regression with Social Data: Modeling Continuous and Limited Response Variables DEMIDENKO . Mixed Models: Theory and Applications DENISON, HOLMES, MALLICK and SMITH . Bayesian Methods for Nonlinear Classification and Regression DETTE and STUDDEN . The Theory of Canonical Moments with Applications in Statistics, Probability, and Analysis DEY and MUKERJEE . Fractional Factorial Plans DILLON and GOLDSTEIN . Multivariate Analysis: Methods and Applications DODGE . Alternative Methods of Regression DODGE and ROMIG . Sampling Inspection Tables, Second Edition DOOB . Stochastic Processes DOWDY, WEARDEN, and CHILKO . Statistics for Research, Third Edition DRAPER and SMITH . Applied Regression Analysis, Third Edition DRYDEN and MARDIA . Statistical Shape Analysis DUDEWICZ and MISHRA . 
Modern Mathematical Statistics DUNN and CLARK . Basic Statistics: A Primer for the Biomedical Sciences, Third Edition DUPUIS and ELLIS . A Weak Convergence Approach to the Theory of Large Deviations EDLER and KITSOS . Recent Advances in Quantitative Methods in Cancer and Human Health Risk Assessment ELANDT-JOHNSON and JOHNSON . Survival Models and Data Analysis ENDERS . Applied Econometric Time Series ETHIER and KURTZ . Markov Processes: Characterization and Convergence EVANS, HASTINGS, and PEACOCK . Statistical Distributions, Third Edition FELLER . An Introduction to Probability Theory and Its Applications, Volume I, Third Edition, Revised; Volume II, Second Edition FISHER and VAN BELLE . Biostatistics: A Methodology for the Health Sciences FITZMAURICE, LAIRD, and WARE . Applied Longitudinal Analysis FLEISS . The Design and Analysis of Clinical Experiments FLEISS . Statistical Methods for Rates and Proportions, Third Edition
FLEMING and HARRINGTON . Counting Processes and Survival Analysis FULLER . Introduction to Statistical Time Series, Second Edition FULLER . Measurement Error Models GALLANT . Nonlinear Statistical Models GEISSER . Modes of Parametric Statistical Inference GELMAN and MENG . Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives GEWEKE . Contemporary Bayesian Econometrics and Statistics GHOSH, MUKHOPADHYAY, and SEN . Sequential Estimation GIESBRECHT and GUMPERTZ . Planning, Construction, and Statistical Analysis of Comparative Experiments GIFI . Nonlinear Multivariate Analysis GIVENS and HOETING . Computational Statistics GLASSERMAN and YAO . Monotone Structure in Discrete-Event Systems GNANADESIKAN . Methods for Statistical Data Analysis of Multivariate Observations, Second Edition GOLDSTEIN and LEWIS . Assessment: Problems, Development, and Statistical Issues GREENWOOD and NIKULIN . A Guide to Chi-Squared Testing GROSS and HARRIS . Fundamentals of Queueing Theory, Third Edition HAHN and SHAPIRO . Statistical Models in Engineering HAHN and MEEKER . Statistical Intervals: A Guide for Practitioners HALD . A History of Probability and Statistics and their Applications Before 1750 HALD . A History of Mathematical Statistics from 1750 to 1930 HAMPEL . Robust Statistics: The Approach Based on Influence Functions HANNAN and DEISTLER . The Statistical Theory of Linear Systems HEIBERGER . Computation for the Analysis of Designed Experiments HEDAYAT and SINHA . Design and Inference in Finite Population Sampling HEDEKER and GIBBONS . Longitudinal Data Analysis HELLER . MACSYMA for Statisticians HINKELMANN and KEMPTHORNE . Design and Analysis of Experiments, Volume 1: Introduction to Experimental Design, Second Edition HINKELMANN and KEMPTHORNE . Design and Analysis of Experiments, Volume 2: Advanced Experimental Design HOAGLIN, MOSTELLER, and TUKEY . Exploratory Approach to Analysis of Variance HOAGLIN, MOSTELLER, and TUKEY . 
Exploring Data Tables, Trends and Shapes HOAGLIN, MOSTELLER, and TUKEY . Understanding Robust and Exploratory Data Analysis HOCHBERG and TAMHANE . Multiple Comparison Procedures HOCKING . Methods and Applications of Linear Models: Regression and the Analysis of Variance, Second Edition HOEL . Introduction to Mathematical Statistics, Fifth Edition HOGG and KLUGMAN . Loss Distributions HOLLANDER and WOLFE . Nonparametric Statistical Methods, Second Edition HOSMER and LEMESHOW . Applied Logistic Regression, Second Edition HOSMER and LEMESHOW . Applied Survival Analysis: Regression Modeling of Time to Event Data HUBER . Robust Statistics HUBERTY . Applied Discriminant Analysis HUBERTY and OLEJNIK . Applied MANOVA and Discriminant Analysis, Second Edition HUNT and KENNEDY . Financial Derivatives in Theory and Practice, Revised Edition
HURD and MIAMEE . Periodically Correlated Random Sequences: Spectral Theory and Practice HUSKOVA, BERAN, and DUPAC . Collected Works of Jaroslav Hajek with Commentary HUZURBAZAR . Flowgraph Models for Multistate Time-to-Event Data IMAN and CONOVER . A Modern Approach to Statistics JACKSON . A User's Guide to Principal Components JOHN . Statistical Methods in Engineering and Quality Assurance JOHNSON . Multivariate Statistical Simulation JOHNSON and BALAKRISHNAN . Advances in the Theory and Practice of Statistics: A Volume in Honor of Samuel Kotz JOHNSON and BHATTACHARYYA . Statistics: Principles and Methods, Fifth Edition JOHNSON and KOTZ . Distributions in Statistics JOHNSON and KOTZ (editors) . Leading Personalities in Statistical Sciences: From the Seventeenth Century to the Present JOHNSON, KOTZ, and BALAKRISHNAN . Continuous Univariate Distributions, Volume 1, Second Edition JOHNSON, KOTZ, and BALAKRISHNAN . Continuous Univariate Distributions, Volume 2, Second Edition JOHNSON, KOTZ, and BALAKRISHNAN . Discrete Multivariate Distributions JOHNSON, KEMP, and KOTZ . Univariate Discrete Distributions, Third Edition JUDGE, GRIFFITHS, HILL, LUTKEPOHL, and LEE . The Theory and Practice of Econometrics, Second Edition JURECKOVA and SEN . Robust Statistical Procedures: Asymptotics and Interrelations JUREK and MASON . Operator-Limit Distributions in Probability Theory KADANE . Bayesian Methods and Ethics in a Clinical Trial Design KADANE AND SCHUM . A Probabilistic Analysis of the Sacco and Vanzetti Evidence KALBFLEISCH and PRENTICE . The Statistical Analysis of Failure Time Data, Second Edition KARIYA and KURATA . Generalized Least Squares KASS and VOS . Geometrical Foundations of Asymptotic Inference KAUFMAN and ROUSSEEUW . Finding Groups in Data: An Introduction to Cluster Analysis KEDEM and FOKIANOS . Regression Models for Time Series Analysis KENDALL, BARDEN, CARNE, and LE . Shape and Shape Theory KHURI . 
Advanced Calculus with Applications in Statistics, Second Edition KHURI, MATHEW, and SINHA . Statistical Tests for Mixed Linear Models KLEIBER and KOTZ . Statistical Size Distributions in Economics and Actuarial Sciences KLUGMAN, PANJER, and WILLMOT . Loss Models: From Data to Decisions, Second Edition KLUGMAN, PANJER, and WILLMOT . Solutions Manual to Accompany Loss Models: From Data to Decisions, Second Edition KOTZ, BALAKRISHNAN, and JOHNSON . Continuous Multivariate Distributions, Volume 1, Second Edition KOVALENKO, KUZNETZOV, and PEGG . Mathematical Theory of Reliability of Time-Dependent Systems with Practical Applications KOWALSKI and TU . Modern Applied U-Statistics KVAM and VIDAKOVIC . Nonparametric Statistics with Applications to Science and Engineering LACHIN . Biostatistical Methods: The Assessment of Relative Risks LAD . Operational Subjective Statistical Methods: A Mathematical, Philosophical, and Historical Introduction LAMPERTI . Probability: A Survey of the Mathematical Theory, Second Edition
LANGE, RYAN, BILLARD, BRILLINGER, CONQUEST, and GREENHOUSE . Case Studies in Biometry LARSON . Introduction to Probability Theory and Statistical Inference, Third Edition LAWLESS . Statistical Models and Methods for Lifetime Data, Second Edition LAWSON . Statistical Methods in Spatial Epidemiology LE . Applied Categorical Data Analysis LE . Applied Survival Analysis LEE and WANG . Statistical Methods for Survival Data Analysis, Third Edition LEPAGE and BILLARD . Exploring the Limits of Bootstrap LEYLAND and GOLDSTEIN (editors) . Multilevel Modelling of Health Statistics LIAO . Statistical Group Comparison LINDVALL . Lectures on the Coupling Method LIN . Introductory Stochastic Analysis for Finance and Insurance LINHART and ZUCCHINI . Model Selection LITTLE and RUBIN . Statistical Analysis with Missing Data, Second Edition LLOYD . The Statistical Analysis of Categorical Data LOWEN and TEICH . Fractal-Based Point Processes MAGNUS and NEUDECKER . Matrix Differential Calculus with Applications in Statistics and Econometrics, Revised Edition MALLER and ZHOU . Survival Analysis with Long Term Survivors MALLOWS . Design, Data, and Analysis by Some Friends of Cuthbert Daniel MANN, SCHAFER, and SINGPURWALLA . Methods for Statistical Analysis of Reliability and Life Data MANTON, WOODBURY, and TOLLEY . Statistical Applications Using Fuzzy Sets MARCHETTE . Random Graphs for Statistical Pattern Recognition MARDIA and JUPP . Directional Statistics MASON, GUNST, and HESS . Statistical Design and Analysis of Experiments with Applications to Engineering and Science, Second Edition McCULLOCH and SEARLE . Generalized, Linear, and Mixed Models McFADDEN . Management of Data in Clinical Trials, Second Edition McLACHLAN . Discriminant Analysis and Statistical Pattern Recognition McLACHLAN, DO, and AMBROISE . Analyzing Microarray Gene Expression Data McLACHLAN and KRISHNAN . The EM Algorithm and Extensions, Second Edition McLACHLAN and PEEL . Finite Mixture Models McNEIL . 
Epidemiological Research Methods MEEKER and ESCOBAR . Statistical Methods for Reliability Data MEERSCHAERT and SCHEFFLER . Limit Distributions for Sums of Independent Random Vectors: Heavy Tails in Theory and Practice MICKEY, DUNN, and CLARK . Applied Statistics: Analysis of Variance and Regression, Third Edition MILLER . Survival Analysis, Second Edition MONTGOMERY, PECK, and VINING . Introduction to Linear Regression Analysis, Fourth Edition MORGENTHALER and TUKEY . Configural Polysampling: A Route to Practical Robustness MUIRHEAD . Aspects of Multivariate Statistical Theory MULLER and STOYAN . Comparison Methods for Stochastic Models and Risks MURRAY . X-STAT 2.0 Statistical Experimentation, Design Data Analysis, and Nonlinear Optimization MURTHY, XIE, and JIANG . Weibull Models MYERS and MONTGOMERY . Response Surface Methodology: Process and Product Optimization Using Designed Experiments, Second Edition MYERS, MONTGOMERY, and VINING . Generalized Linear Models: With Applications in Engineering and the Sciences
NELSON . Accelerated Testing, Statistical Models, Test Plans, and Data Analyses NELSON . Applied Life Data Analysis NEWMAN . Biostatistical Methods in Epidemiology OCHI . Applied Probability and Stochastic Processes in Engineering and Physical Sciences OKABE, BOOTS, SUGIHARA, and CHIU . Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, Second Edition OLIVER and SMITH . Influence Diagrams, Belief Nets and Decision Analysis PALTA . Quantitative Methods in Population Health: Extensions of Ordinary Regressions PANJER . Operational Risk: Modeling and Analytics PANKRATZ . Forecasting with Dynamic Regression Models PANKRATZ . Forecasting with Univariate Box-Jenkins Models: Concepts and Cases PARZEN . Modern Probability Theory and Its Applications PEÑA, TIAO, and TSAY . A Course in Time Series Analysis PIANTADOSI . Clinical Trials: A Methodologic Perspective PORT . Theoretical Probability for Applications POURAHMADI . Foundations of Time Series Analysis and Prediction Theory POWELL . Approximate Dynamic Programming: Solving the Curses of Dimensionality PRESS . Bayesian Statistics: Principles, Models, and Applications PRESS . Subjective and Objective Bayesian Statistics, Second Edition PRESS and TANUR . The Subjectivity of Scientists and the Bayesian Approach PUKELSHEIM . Optimal Experimental Design PURI, VILAPLANA, and WERTZ . New Perspectives in Theoretical and Applied Statistics PUTERMAN . Markov Decision Processes: Discrete Stochastic Dynamic Programming QIU . Image Processing and Jump Regression Analysis RAO . Linear Statistical Inference and Its Applications, Second Edition RAUSAND and HØYLAND . System Reliability Theory: Models, Statistical Methods, and Applications, Second Edition RENCHER . Linear Models in Statistics RENCHER . Methods of Multivariate Analysis, Second Edition RENCHER . Multivariate Statistical Inference with Applications RIPLEY . Spatial Statistics RIPLEY . 
Stochastic Simulation ROBINSON . Practical Strategies for Experimenting ROHATGI and SALEH . An Introduction to Probability and Statistics, Second Edition ROLSKI, SCHMIDLI, SCHMIDT, and TEUGELS . Stochastic Processes for Insurance and Finance ROSENBERGER and LACHIN . Randomization in Clinical Trials: Theory and Practice ROSS . Introduction to Probability and Statistics for Engineers and Scientists ROSSI, ALLENBY, and McCULLOCH . Bayesian Statistics and Marketing ROUSSEEUW and LEROY . Robust Regression and Outlier Detection RUBIN . Multiple Imputation for Nonresponse in Surveys RUBINSTEIN . Simulation and the Monte Carlo Method RUBINSTEIN and MELAMED . Modern Simulation and Modeling RYAN . Modern Engineering Statistics RYAN . Modern Experimental Design RYAN . Modern Regression Methods RYAN . Statistical Methods for Quality Improvement, Second Edition SALEH . Theory of Preliminary Test and Stein-Type Estimation with Applications SCHEFFÉ . The Analysis of Variance SCHIMEK . Smoothing and Regression: Approaches, Computation, and Application SCHOTT . Matrix Analysis for Statistics, Second Edition SCHOUTENS . Lévy Processes in Finance: Pricing Financial Derivatives
SCHUSS . Theory and Applications of Stochastic Differential Equations SCOTT . Multivariate Density Estimation: Theory, Practice, and Visualization SEARLE . Linear Models for Unbalanced Data SEARLE . Matrix Algebra Useful for Statistics SEARLE, CASELLA, and McCULLOCH . Variance Components SEARLE and WILLETT . Matrix Algebra for Applied Economics SEBER . A Matrix Handbook for Statisticians SEBER . Multivariate Observations SEBER and LEE . Linear Regression Analysis, Second Edition SEBER and WILD . Nonlinear Regression SENNOTT . Stochastic Dynamic Programming and the Control of Queueing Systems SERFLING . Approximation Theorems of Mathematical Statistics SHAFER and VOVK . Probability and Finance: It's Only a Game! SILVAPULLE and SEN . Constrained Statistical Inference: Inequality, Order, and Shape Restrictions SMALL and McLEISH . Hilbert Space Methods in Probability and Statistical Inference SRIVASTAVA . Methods of Multivariate Statistics STAPLETON . Linear Statistical Models STAUDTE and SHEATHER . Robust Estimation and Testing STOYAN, KENDALL, and MECKE . Stochastic Geometry and Its Applications, Second Edition STOYAN and STOYAN . Fractals, Random Shapes and Point Fields: Methods of Geometrical Statistics STREET and BURGESS . The Construction of Optimal Stated Choice Experiments: Theory and Methods STYAN . The Collected Papers of T. W. Anderson: 1943-1985 SUTTON, ABRAMS, JONES, SHELDON, and SONG . Methods for Meta-Analysis in Medical Research TAKEZAWA . Introduction to Nonparametric Regression TANAKA . Time Series Analysis: Nonstationary and Noninvertible Distribution Theory THOMPSON . Empirical Model Building THOMPSON . Sampling, Second Edition THOMPSON . Simulation: A Modeler's Approach THOMPSON and SEBER . Adaptive Sampling THOMPSON, WILLIAMS, and FINDLAY . Models for Investors in Real World Markets TIAO, BISGAARD, HILL, PEÑA, and STIGLER (editors) . Box on Quality and Discovery: with Design, Control, and Robustness TIERNEY . 
LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics TSAY . Analysis of Financial Time Series, Second Edition UPTON and FINGLETON . Spatial Data Analysis by Example, Volume II: Categorical and Directional Data VAN BELLE . Statistical Rules of Thumb VAN BELLE, FISHER, HEAGERTY, and LUMLEY . Biostatistics: A Methodology for the Health Sciences, Second Edition VESTRUP . The Theory of Measures and Integration VIDAKOVIC . Statistical Modeling by Wavelets VINOD and REAGLE . Preparing for the Worst: Incorporating Downside Risk in Stock Market Investments WALLER and GOTWAY . Applied Spatial Statistics for Public Health Data WEERAHANDI . Generalized Inference in Repeated Measures: Exact Methods in MANOVA and Mixed Models WEISBERG . Applied Linear Regression, Third Edition
WELSH . Aspects of Statistical Inference WESTFALL and YOUNG . Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment WHITTAKER . Graphical Models in Applied Multivariate Statistics WINKER . Optimization Heuristics in Economics: Applications of Threshold Accepting WONNACOTT and WONNACOTT . Econometrics, Second Edition WOODING . Planning Pharmaceutical Clinical Trials: Basic Statistical Principles WOODWORTH . Biostatistics: A Bayesian Introduction WOOLSON and CLARKE . Statistical Methods for the Analysis of Biomedical Data, Second Edition WU and HAMADA . Experiments: Planning, Analysis, and Parameter Design Optimization WU and ZHANG . Nonparametric Regression Methods for Longitudinal Data Analysis YANG . The Construction Theory of Denumerable Markov Processes YOUNG, VALERO-MORA, and FRIENDLY . Visual Statistics: Seeing Data with Dynamic Interactive Graphics ZELTERMAN . Discrete Distributions: Applications in the Health Sciences ZELLNER . An Introduction to Bayesian Inference in Econometrics ZHOU, OBUCHOWSKI, and McCLISH . Statistical Methods in Diagnostic Medicine