Time Series and Linear Systems


Author: Sergio Bittanti


Lecture Notes in Control and Information Sciences Edited by M.Thoma and A. Wyner

86 Time Series and Linear Systems

Edited by S. Bittanti

Springer-Verlag Berlin Heidelberg New York London Paris Tokyo

Series Editors M. Thoma • A. Wyner

Advisory Board L. D. Davisson • A. G. J. MacFarlane • H. Kwakernaak • J. L. Massey • Ya. Z. Tsypkin • A. J. Viterbi

Editor
Sergio Bittanti, Dipartimento di Elettronica, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano (Italy)

ISBN 3-540-16903-2 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-16903-2 Springer-Verlag New York Berlin Heidelberg

Library of Congress Cataloging in Publication Data
Time series and linear systems. (Lecture notes in control and information sciences; 86) Includes bibliographies. 1. Time-series analysis. 2. Linear systems. I. Bittanti, Sergio. II. Series. QA280.T558 1986 519.5'5 86-20244 ISBN 0-387-16903-2 (U.S.)

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to "Verwertungsgesellschaft Wort", Munich.

© Springer-Verlag Berlin, Heidelberg 1986
Printed in Germany
Offsetprinting: Color-Druck, G. Baucke, Berlin
Binding: B. Helm, Berlin
2161/3020-543210

PREFACE

Over the past five years, a stream of research activity in the methodology of time series analysis and identification has been going on at the Politecnico di Milano (Italy). Several specialists of different backgrounds, including Statistics, Econometrics, Numerical Analysis, and System and Control Theory, visited the Politecnico, contributing by means of series of talks to develop a train of ideas on the subject. The motivation underlying this was to set up a workshop for the art of modelling of linear systems from a system-theoretic point of view. This book is a partial report of such an activity. The various chapters are extended papers overviewing problems of current interest. They also constitute useful introductions to advanced topics and research directions in the field.

The book is organized as follows. The first chapter is an introductory overview of the problem of modelling in time series analysis. The problem of finding the best linear model for the data at hand is interpreted as the problem of determining a suitable rational transfer function approximant, i.e. of approximating the infinite Hankel matrix of the impulse response coefficients with a Hankel matrix of finite rank. Among other things, the use of criteria such as AIC or BIC is critically discussed.

Linear systems where all variables are subject to errors are considered in the second chapter. The motivation here is that prejudicial assumptions of causality can then be avoided. The problem of determining the transfer function from the observed second moments is studied.

A new class of dynamic models for time series is proposed in the third chapter. These models are based on the classical Factor Analysis approach, and are strictly related to the systems introduced in Chapter 2.

The fourth chapter is devoted to the use of stochastic complexity in the analysis of time series. A model is judged by the number of binary digits with which it permits to encode the observed data. This approach leads to the notion of Minimum Description Length of the data, i.e. the shortest number of binary digits with which the observed data can be encoded.

Chapter 5 deals with periodic systems, i.e. linear systems with periodically time-varying coefficients, which can be used to describe seasonal time series. The attention focuses on the basic structural properties of these systems: reachability, stabilizability and so on. The role played by these properties in the analysis of stochastic systems is then studied.

Some numerical problems in linear system theory are considered in the sixth chapter. An extensive overview of the LU, QR, Schur and Singular Value Decomposition algorithms is provided. Then, the problem of computing the reachability subspace of a time-invariant system is touched upon.

The last chapter is devoted to the discussion of some recent trends and perspectives in Econometrics.

The volume can be used either as a textbook for monographic courses on the subject or as a reference book providing researchers with the main trends in the field.

The editor expresses his sincere acknowledgment to the fellow authors for their most valuable contributions, as well as for their care and patience in the preparation of the manuscripts. The support of the Centro di Teoria dei Sistemi of the National Research Council (C.N.R.) and that of the Ministry of Education (M.P.I.) is gratefully acknowledged.

Sergio Bittanti

ABSTRACTS

Chapter 1

TIME SERIES AND STOCHASTIC MODELS by E.J. Hannan

The basic concept of this paper is a linear system wherein an output $y(t)$, of $q$ components, is related to an input, $u(t)$, of $p$ components, via a relation

$$y(t) = \sum_{0}^{\infty} W_i e(t-i) + \sum_{1}^{\infty} L_i u(t-i)$$

wherein the $e(t)$ are the linear prediction errors for $y(t) - \sum L_i u(t-i)$. The methods of the paper are substantially valid when the system is truly linear in the sense that linear prediction is optimal, but may prove useful over a much wider range. To bring the problem back to reasonable proportions the statistical methods are based on the approximation of the true structure by one wherein $W(z) = \sum W_i z^{-i}$, $L(z) = \sum L_i z^{-i}$ are approximated by matrices of rational functions. A brief discussion is given of some basic theory relating to such an approximation process. It is necessary, in the approximation, to choose the "order" of the approximant, i.e. effectively the maximum lags, $h$, in the ARMAX model,

$$\sum_{0}^{h} A_i y(t-i) = \sum_{1}^{h} B_i u(t-i) + \sum_{0}^{h} C_i e(t-i),$$

to which the rational transfer function approximant corresponds. Various algorithms that are basic in time series analysis are described and are then used to effect a solution to the problem of finding a suitable approximant. The main algorithm described does this by a Gauss-Newton iteration in which the order is redetermined at each iteration by a calculation recursive in the order. Finally, on-line implementations of the algorithm are presented for the case where $y(t)$ is scalar.

Chapter 2

LINEAR ERRORS-IN-VARIABLES SYSTEMS by M. Deistler

Linear errors-in-variables (EV) systems, i.e. linear systems where all observed variables are subject to errors, are considered. The statistical analysis of such systems turns out to be significantly more complicated compared to conventional errors-in-equations (e.g. ARMAX) systems. A good part of these complications arises from the fact that in the EV case, in general, the transfer function of the system is not uniquely determined from the second moments of the observations.

The paper is organized as follows: In section 2 some well known results concerning the static case are restated. In sections 3 - 5 the information about the transfer function contained in the (ensemble) second moments of the observations is analysed: In section 3 the set of all transfer functions corresponding to given second moments of the observations is described. Section 4 deals with the same problem when the system is a priori known to be causal and with the problem whether causality can be detected from the second moments of the observations. In section 5 conditions for identifiability using information coming from the second moments are derived. Section 6 deals with conditions for identifiability from moments of order greater than two.

Chapter 3

A NEW CLASS OF DYNAMIC MODELS FOR STATIONARY TIME SERIES by G. Picci and S. Pinzoni

A new class of dynamic models for stationary time series is presented. They are a natural generalization of the well-known linear Factor Analysis Models widely used in Statistics and Psychometrics. It is shown in this note that the Dynamic Factor Analysis Models considered provide a simple mathematical structure for the identification of multivariate time series which avoids the unjustified introduction of a priori causality assumptions subsumed by conventional schemes, as for example ARMAX models. They also reduce to (and to some extent clarify the structure of) Dynamic Errors-In-Variables Models discussed in the recent literature.

Chapter 4

PREDICTIVE AND NONPREDICTIVE MINIMUM DESCRIPTION LENGTH PRINCIPLES by J. Rissanen

This chapter presents in a tutorial manner the basic ideas behind the recently developed Minimum Description Length principles. Briefly, a model in a class of stochastic models is judged by the number of binary digits with which the model permits one to encode the observed data. The shortest code length available is called the stochastic complexity of the data, relative to the considered class of models, and the model with which it is obtained may be taken to be the optimal model. Depending on how the coding is done, two kinds of complexities are defined, the predictive and the nonpredictive ones, which for large samples tend to the same value. The stochastic complexity involves a tight lower bound for the errors with which the data can be predicted. Hence, we may say that it represents all the statistical information that can be extracted from the data with the considered models. The associated principle also provides a single scheme for the fundamental problems in modeling: estimation both of the number of the parameters and of their values. We also describe how the prior knowledge about the parameters, as represented by their estimated values from earlier data sets, can be taken advantage of. As applications we describe the calculation of the stochastic complexity and the associated estimates of the parameters for models in the ARMA class, both in the single and the multiple input/output case, relative to the gaussian models. We illustrate with simulations the consistency of the estimates of the structures, and the feasibility of the calculations is demonstrated by simulations.

Chapter 5

DETERMINISTIC AND STOCHASTIC LINEAR PERIODIC SYSTEMS by S. Bittanti

The main results concerning the structural properties of linear periodic systems are reviewed. Both continuous-time and discrete-time systems are dealt with. By a comparison with time-invariant systems, five structural properties are discussed. Three of them are basic properties concerning the reachability and controllability subspaces. The fourth one concerns the length of the time interval required to perform the reachability and controllability transition. The modal (spectral) characterizations are presented as the fifth property. The extended structural properties (i.e. stabilizability and detectability) are also dealt with. Finally, periodic stochastic systems are considered. The existence of a cyclostationary solution is investigated by analyzing the appropriate periodic Lyapunov equation.

Chapter 6

NUMERICAL PROBLEMS IN LINEAR SYSTEM THEORY by D. Boley and S. Bittanti

We discuss some numerical aspects in linear system theory. We start by showing the numerical algorithm to solve systems of linear equations and non-degenerate least squares problems. We then move on to an introduction to more sophisticated matrix decompositions, used to solve more sophisticated problems, and introduce the concept of backward error analysis (Wilkinson, 1965). Among the decompositions we introduce:

name                                    form      used to obtain
LU (Gaussian Elimination)               A=LU      solution of linear equations; determinant
QR (orthogonal triangularization)       A=QR      soln. to least squares problem (linear nondegenerate); soln. to linear equations without need to pivot
Schur                                   A=QRQ'    eigenvalues/vectors
Singular Value Decomposition (S.V.D.)   A=PΣQ'    singular values; rank; distance to singularity; 2-norm of matrix; 2-norm condition number

where
P, Q denote orthogonal matrices,
U, R denote upper triangular matrices,
L denotes lower triangular matrices,
Σ is non-negative diagonal.

In the last section we discuss some numerical aspects in linear system theory. The attention is focused on the problem of computing the controllable subspace of a time-invariant linear system. It is shown how some classical methods lead to numerical problems, and we give some recent results giving bounds on the errors in terms of results from these classical numerical methods.
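As an illustration of the first row of the table above (LU used to obtain the solution of linear equations and the determinant), here is a minimal sketch in plain Python. It is not taken from the chapter: the function names `lu_decompose`, `lu_solve` and `lu_det` are hypothetical, and partial pivoting (row exchanges, so that PA = LU) is assumed as the usual safeguard.

```python
def lu_decompose(a):
    """Gaussian elimination with partial pivoting: P A = L U.

    Returns (perm, lu, sign). L (unit diagonal, below the diagonal) and
    U (on and above the diagonal) are packed into the single matrix lu;
    perm[i] is the original index of the row now in position i; sign is
    the parity of the row exchanges.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    sign = 1
    for k in range(n):
        # Pick the largest entry in column k as pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
            sign = -sign
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]            # multiplier, stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu, sign


def lu_solve(perm, lu, b):
    """Solve A x = b from the packed factors (forward, then back substitution)."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def lu_det(lu, sign):
    """Determinant of A: product of the diagonal of U times the exchange parity."""
    d = float(sign)
    for i in range(len(lu)):
        d *= lu[i][i]
    return d
```

Once the factors are available, each further right-hand side costs only the two triangular substitutions, which is the main reason for factoring rather than eliminating afresh.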

Chapter 7

SOME RECENT DEVELOPMENTS IN ECONOMETRICS by M. McAleer and M. Deistler

In this paper we discuss some of the main recent developments in econometrics: in particular, methods for specification search, diagnostic checking and specification testing; macroeconomic modelling and forecasting; and some models associated with empirical microeconomics.

AUTHORS

Sergio Bittanti, Dipartimento di Elettronica, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 MILANO, ITALY

Daniel Boley, Department of Computer Science, University of Minnesota, 136 Lind Hall, 207 Church Street S.E., MINNEAPOLIS, Minnesota 55455, U.S.A.

Manfred Deistler, Institut für Ökonometrie und Operations Research, Technische Universität Wien, Argentinierstrasse 8/119, A-1040 WIEN, AUSTRIA

Edward G. Hannan, Department of Statistics, Mathematical Sciences Bldg., The Australian National University, GPO Box 4, CANBERRA, ACT 2601, AUSTRALIA

Michael J. McAleer, Department of Statistics, The Faculties, The Australian National University, GPO Box 4, CANBERRA, ACT 2601, AUSTRALIA

Giorgio Picci, Istituto di Elettrotecnica ed Elettronica, Università di Padova, Via Gradenigo 6/A, 35131 PADOVA, ITALY

Stefano Pinzoni, LADSEB-CNR, Corso Stati Uniti 4, 35020 PADOVA, ITALY

Jorma Rissanen, IBM-RES, 650 Harry Road, SAN JOSE, CA 95193, U.S.A.

TABLE OF CONTENTS

Chapter 1
TIME SERIES AND STOCHASTIC MODELS by E.J. Hannan
1. Introduction
2. Some Basic Algorithms
3. Approximation Criteria
4. Rational Transfer Function Approximation
5. A Gauss-Newton Procedure
6. Some Theoretical Considerations
References

Chapter 2
LINEAR ERRORS-IN-VARIABLES SYSTEMS by M. Deistler
1. Introduction
2. The Static Case
3. Second Moments and Dynamic Models: the General Case
4. Causality
5. Conditions for Identifiability from the Second Moments of the Observations
6. Identifiability from High Order Moments
References

Chapter 3
A NEW CLASS OF DYNAMIC MODELS FOR STATIONARY TIME SERIES by G. Picci and S. Pinzoni
1. Introduction
2. Dynamic Factor Analysis Models
3. Stochastic Realization
4. Causality
References

Chapter 4
PREDICTIVE AND NONPREDICTIVE MINIMUM DESCRIPTION LENGTH PRINCIPLES by J. Rissanen
1. Introduction
2. Coding and Prediction
3. ARMA Estimation and Prediction
4. Vector Time Series Models
References

Chapter 5
DETERMINISTIC AND STOCHASTIC LINEAR PERIODIC SYSTEMS by S. Bittanti
1. Introduction
2. Structural Properties of Continuous-time Periodic Systems
2.1 Continuous-time Linear Periodic Systems
2.2 Structural Properties
2.3 Grammian Matrices
2.4 Five Structural Properties of Time-invariant Systems
2.5 Five Structural Properties of Continuous-time Periodic Systems
3. Structural Properties of Discrete-time Periodic Systems
3.1 Discrete-time Linear Periodic Systems
3.2 Structural Properties
3.3 Grammian Matrices
3.4 Five Structural Properties of Discrete-time Periodic Systems
4. Kalman Canonical Decomposition
5. Extended Structural Properties
6. Stochastic Linear Periodic Systems
References

Chapter 6
NUMERICAL PROBLEMS IN LINEAR SYSTEM THEORY by D. Boley and S. Bittanti
1. Introduction
2. Review of Simpler Computational Methods
2.1 LU Decomposition
2.2 Orthogonal Decomposition
2.2.1 QR Decomposition
2.2.2 Geometric Interpretation of a Rotation
2.2.3 QR Decomposition by Householder Decompositions
2.2.4 Solving Least Squares Problems Using Orthogonal Decompositions
3. Special Forms Used in Numerical Linear Algebra - Why
3.1 The Jordan Canonical Form
3.2 Numerical Conditioning of a Problem
4. Schur Decomposition
5. Singular Value Decomposition - Condition Number of a Matrix
6. Applications of Previous to Linear Systems
References

Chapter 7
SOME RECENT DEVELOPMENTS IN ECONOMETRICS by M. McAleer and M. Deistler
1. Introduction
2. Specification and Quality Control of a Model
2.1 Model Specification
2.2 Tight and Loose Specifications
2.3 Principles for Testing
2.4 Diagnostic Testing
2.4.1 Serial Correlation
2.4.2 Heteroscedasticity
2.4.3 Exogeneity
2.4.4 Functional Form
2.4.5 Parameter Constancy
2.4.6 Non-nested Alternatives
3. Macroeconomic Modelling and Forecasting
4. Microeconometrics
References

Chapter 1

Time Series and Stochastic Models

E.J. Hannan

1. Introduction

This chapter will be concerned with procedures for analysing data, $y(t)$, $t = 1, 2, \dots, T$, where $y(t)$ is a vector of $q$ components that can be thought of as the output of some system to which the input is $u(t)$, an observed vector of $p$ components. The situation held in mind is one where no very precise information is available about the system and the description will be on the basis of models of such generality that experience suggests will suffice for a good explanation. This will be further discussed below. These models will always be stochastic.

Let us begin by considering $y(t)$ alone, as generated by a stationary stochastic process with finite mean square, $E\{y_j(t)^2\} < \infty$, $j = 1, 2, \dots, q$, where $y_j(t)$ is the $j$'th component of $y(t)$. It is costless to assume that $y(t)$ is ergodic since only one history or realization of the process is ever seen, and reasonable to require that it be purely non-deterministic, so that there is no influence on $y(t)$ from the indefinitely far past, or rather, if there is such an influence, it can only be through such effects as the mean or periodic components such as diurnal or seasonal movements. Such effects could first be removed by regression so that, for example, all calculations will be with the mean corrected quantities $y(t) - \bar y$, $\bar y = \frac{1}{T} \sum_{1}^{T} y(t)$. In relation to calculations it will be assumed that this has already been done so that $y(t)$ is the residual from such an adjustment. This makes notation simpler.

Any such stationary, non-deterministic process can be analysed, at least in part, through its spectrum, $f(\omega)$, a $q \times q$ matrix valued function satisfying

$$f(\omega) = f(\omega)^* = f(-\omega)', \qquad \Gamma(t) = E\{y(s) y(s+t)'\} = \int_{-\pi}^{\pi} e^{it\omega} f(\omega)\, d\omega.$$

We shall not discuss Fourier methods in any detail because the main methods of this paper are different. Here "finite parameter" models are emphasised in contrast to Fourier methods that are, essentially, non-parametric and in which the generality is reduced to manageable proportions by smoothness requirements for $f(\omega)$. These finite parameter models have been especially emphasised in econometrics and systems engineering and are called ARMAX, an acronym for autoregressive moving-average with exogenous components. (Here exogenous means input.)

For $y(t)$ stationary and non-deterministic,

$$y(t) = \sum_{0}^{\infty} W_i e(t-i), \quad W_0 = I_q, \quad \sum \|W_i\|^2 < \infty, \quad E\{e(s) e(t)'\} = \delta_{st}\, \Omega. \qquad (1.1)$$

Here the $e(t)$ are the linear innovations, i.e. $e(t) = y(t) - \hat y(t|t-1)$ where $\hat y(t|t-1)$ is the best linear predictor of $y(t)$ from $y(t-1), y(t-2), \dots$ There is an extensive theory due to Kolmogoroff, Wiener and others concerning the construction of $W(z)$, $\hat y(t|t-1)$ from knowledge of $f(\omega)$ but this will not be important, algorithmically, here. Put

$$W(z) = \sum_{0}^{\infty} W_i z^{-i}.$$

Then $W(z)$ is analytic for $|z| > 1$ and $\det W(z) \ne 0$, $|z| > 1$. However we always assume $\det W(z) \ne 0$, $|z| \ge 1$, since zeros on $|z| = 1$ cause considerable problems. There is a decomposition

$$f(\omega) = \frac{1}{2\pi} W(e^{-i\omega})\, \Omega\, W(e^{-i\omega})^*, \qquad (1.2)$$

which is unique since there is no other such decomposition with $W(z)$ having the properties stated above.

To take account of $u(t)$ we generalise (1.1) and put

$$y(t) = \sum_{0}^{\infty} W_i e(t-i) + \sum_{1}^{\infty} L_i u(t-i), \qquad L(z) = \sum_{1}^{\infty} L_i z^{-i}. \qquad (1.3)$$

The essential restriction on (1.3) is that the relation from $u(t)$ to $y(t)$ is causal, so that there is no influence of $u(s)$, $s > t$, on $y(t)$. However (1.1), (1.2), (1.3) are too general to serve as a basis for a worthwhile statistical analysis. To introduce a further specialisation consider the infinite (Hankel) matrix

$$H = \begin{bmatrix} W_1 & L_1 & W_2 & L_2 & W_3 & L_3 & \cdots \\ W_2 & L_2 & W_3 & L_3 & W_4 & L_4 & \cdots \\ W_3 & L_3 & W_4 & L_4 & W_5 & L_5 & \cdots \\ \vdots & & \vdots & & \vdots & & \end{bmatrix}$$

Here $[W_j \; L_j]$ will be called a "block", of $q$ rows and $p + q$ columns. The importance of $H$ can be seen from the, almost obvious, fact that the best $j$ step ahead predictor of $y(t+j)$ is, ignoring $u(t+1), \dots, u(t+j)$,

$$\hat y(t+j|t) = \sum_{0}^{\infty} W_{i+j}\, e_{t-i} + \sum_{0}^{\infty} L_{i+j}\, u_{t-i}, \qquad j \ge 1,$$

so that $H$ has, as the $j$'th row of blocks, the coefficient blocks in that prediction. The importance of $H$ will be made evident in other ways in section 4.

Let $H_0$ be a set of $n$ rows of $H$ that span all of the rows of $H$, so that any row can be linearly represented in terms of them. Of course $n$ would be infinite in general. The integer $n$, the rank of $H$, is called the order or the McMillan degree of $H$ or, equivalently, of $[W(z), L(z)]$. Call $H_1$ the first block of (full) rows of $H$ and put $H_0 = [K \; L \; H_2]$ where $K$, $L$ comprise, respectively, the first $q$, the next $p$ columns of $H_0$. Put

$$\xi(t) = [e(t)' \; u(t)' \; e(t-1)' \; u(t-1)' \; \dots]', \qquad x(t) = H_0\, \xi(t-1). \qquad (1.4)$$

Then from (1.3), (1.4)

$$y(t) = H_1 \xi(t-1) + e(t), \qquad H_0 \xi(t) = K e(t) + L u(t) + H_2 \xi(t-1).$$

Since the rows of $H_1$, $H_2$ are composed of rows of $H$, then $H_1 = H H_0$, $H_2 = F H_0$ for suitable choice of $F$, $H$ and

$$x(t+1) = F x(t) + L u(t) + K e(t), \qquad y(t) = H x(t) + e(t). \qquad (1.5)$$

This is the state space representation in prediction error form. Its lack of uniqueness, given that $F$ is minimal, i.e. of dimension $n$, is entirely due to the lack of uniqueness in a choice of $H_0$. That can be made unique by choosing the rows of $H_0$ as the first linearly independent set found as you go down the rows of $H$. We will return to this later.

The methods used herein are dependent on acting as if $n$ is finite. Then, and only then, $W(z)$ and $L(z)$ are matrices of rational functions of $z$ and can thus be written in the form

$$[W(z) \; L(z)] = A(z^{-1})^{-1} [C(z^{-1}) \; B(z^{-1})] \qquad (1.6)$$

where $A(z)$, $B(z)$, $C(z)$ are matrices of polynomials. Of course (1.6) is far from unique but we shall later describe how the unique prescription of $H_0$ just described leads to a unique "matrix fraction description", i.e. (1.6). We shall use $z^{-1}$ also to indicate the shift operator, $z^{-1} y(t) = y(t-1)$. Corresponding to (1.6) we have the ARMAX representation

$$A(z^{-1}) y(t) = B(z^{-1}) u(t) + C(z^{-1}) e(t). \qquad (1.7)$$

This is important partly because it expresses $y(t)$ in terms of $y(t-1), y(t-2), \dots$, $u(t-1), u(t-2), \dots$, $e(t), e(t-1), e(t-2), \dots$ and can serve as a basis for an iterative estimation procedure where the coefficient matrices are estimated by regression, the $e(t)$, which are unobserved, being replaced by estimates from a previous stage in the iteration. This will be dealt with in section 5. When no input (or exogenous) variable is observed we speak of the ARMA case.

Notes on References. There are many references for the basic spectral theory, for example Hannan (1970). For the structure theory of systems see Kailath (1980), Casti (1977).
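The link between the rank of the Hankel matrix and the order $n$ can be checked numerically in the simplest setting. The following is a minimal sketch, not part of the chapter, for the scalar ARMA case ($q = 1$, no input): it expands $W(z) = C(z^{-1})/A(z^{-1})$ into its coefficients $W_i$, builds a finite section of $H$, and computes its rank exactly with rational arithmetic. The function names `impulse_response`, `hankel` and `rank` are hypothetical, and the use of a finite $m \times m$ section with $m$ larger than the expected order is an assumption of the sketch.

```python
from fractions import Fraction


def impulse_response(a, c, n):
    """Coefficients W_0..W_n of W(z) = C(z^-1) / A(z^-1), scalar case.

    a = [1, a_1, ..., a_h] and c = [1, c_1, ..., c_h] are the polynomial
    coefficients; matching powers in A * W = C gives
    W_j = c_j - sum_{i>=1} a_i W_{j-i}.
    """
    w = []
    for j in range(n + 1):
        cj = c[j] if j < len(c) else Fraction(0)
        acc = cj - sum(a[i] * w[j - i] for i in range(1, min(j, len(a) - 1) + 1))
        w.append(acc)
    return w


def hankel(w, m):
    """m x m section of the Hankel matrix, entry (i, j) = W_{i+j+1}."""
    return [[w[i + j + 1] for j in range(m)] for i in range(m)]


def rank(matrix):
    """Exact rank by Gaussian elimination (entries should be Fractions)."""
    rows = [list(r) for r in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r
```

For instance, with `a = [1, -5/6, 1/6]` and `c = [1, 1/4]` (as `Fraction` values) the section has rank 2, while replacing `c` by `[1, -1/2]` cancels a common factor and the rank drops to 1, which is the sense in which the McMillan degree, and not the nominal lag length, measures the order.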

2. Some Basic Algorithms

There are three basic algorithms of time series analysis.

(i) The first algorithm is the discrete Fourier transform

$$d(\omega) = T^{-1/2} \sum_{1}^{T} y(t) e^{it\omega},$$

which is cheaply computable at frequencies $2\pi j/T'$, $j = 0, 1, \dots, [\tfrac{1}{2} T']$, for $T'$ highly composite. Under smoothness conditions on $f(\omega)$,

$$E\{d(2\pi j/T)\, d(2\pi k/T)^*\} \approx \delta_{jk}\, 2\pi f(2\pi j/T)$$

and, indeed, the error is uniformly $O(T^{-1})$ if $n$ is finite. This algorithm will not be so important to us.

(ii) The second algorithm is the Levinson-Whittle recursion. This is designed, in a sense, to produce approximations to $e(t)$ in (1.1) by considering

$$\Phi(z) = W(z)^{-1}, \qquad e(t) = \Phi(z) y(t).$$

The procedure recursively calculates polynomial approximations $\hat\Phi_n$ of degree $n$ to $\Phi$. We have used $n$ for the degree because a system for which $\Phi(z)$ is a polynomial of degree $n$ will, in fact, be of McMillan degree $n$. The recursive calculation uses the data through the natural estimates of $\Gamma(t)$ of the form

$$\hat\Gamma(t) = \frac{1}{T} \sum_{s=1}^{T-t} y(s) y(s+t)', \quad t \ge 0; \qquad \text{for } t < 0 \text{ put } \hat\Gamma(t) = \hat\Gamma(-t)'.$$

However it will be convenient to put this Levinson-Whittle recursion in a more general setting because of its many uses later. Thus let $v(t)$ be a vector of $s$ components and put

$$\hat\Gamma_v(t) = \frac{1}{T} \sum_{s=1}^{T-t} v(s) v(s+t)' = \hat\Gamma_v(-t)', \qquad t \ge 0.$$

The recursion calculates matrices $F_{n,j}$, $\tilde F_{n,j}$, $S_n$, $\tilde S_n$. If $v(t) = y(t)$ then $F_{n,j}$ is $\hat\Phi_{n,j}$, the coefficient of $z^{-j}$ in $\hat\Phi_n(z)$, and correspondingly we have an estimate

$$\hat e_n(t) = \sum_{0}^{n} \hat\Phi_{n,j}\, y(t-j) \qquad (2.1)$$

for $e(t)$. The $\tilde F_{n,j}$ would, in this case where $v(t) = y(t)$, correspond to the time reversed process ("backwards" residuals, $\hat r_n(t)$, as distinct from the forwards residuals $\hat e_n(t)$). Thus, putting $\tilde F_{n,j} = \tilde\Phi_{n,j}$,

$$\hat r_n(t) = \sum_{0}^{n} \tilde\Phi_{n,j}\, y(t-n+j), \qquad \hat\Omega_n = S_n = \frac{1}{T} \sum_{1}^{T+n} \hat e_n(t) \hat e_n(t)', \qquad \tilde\Omega_n = \tilde S_n = \frac{1}{T} \sum_{1}^{T+n} \hat r_n(t) \hat r_n(t)'. \qquad (2.2)$$

We now give the recursive algorithm in terms of $v(t)$.

$$F_{n,j} = F_{n-1,j} + F_{n,n} \tilde F_{n-1,n-j}, \qquad \tilde F_{n,j} = \tilde F_{n-1,j} + \tilde F_{n,n} F_{n-1,n-j},$$

$$F_{n,0} = \tilde F_{n,0} = I_s, \qquad F_{n,n} = -\Delta_{n-1} \tilde S_{n-1}^{-1}, \qquad \tilde F_{n,n} = -\Delta_{n-1}' S_{n-1}^{-1},$$

$$\Delta_n = \sum_{0}^{n} F_{n,j}\, \hat\Gamma_v(j-n-1),$$

$$S_n = (I_s - F_{n,n} \tilde F_{n,n}) S_{n-1}, \qquad \tilde S_n = (I_s - \tilde F_{n,n} F_{n,n}) \tilde S_{n-1}, \qquad S_0 = \tilde S_0 = \hat\Gamma_v(0).$$

In case $s = 1$ we have $S_n = \tilde S_n$, $F_{n,j} = \tilde F_{n,j}$, $j = 1, \dots, n$, so that the algorithm is simplified.

These procedures have severe disadvantages when $T$ is small or, better, when $n/T$ is not small. This is because they are founded on the Toeplitz assumption that, implicitly, $v(t) = 0$, $-n < t \le 0$ or $T < t \le T+n$. (This is so called because the system of equations for the $F_{n,j}$, for given $n$, has a block Toeplitz matrix, i.e. one with the same elements down any diagonal.) There have been many modifications, often based on calculating, for example, $\hat e_n(t)$, $\hat r_n(t)$ (see (2.1), (2.2)) recursively by

$$\hat e_n(t) = \hat e_{n-1}(t) + \hat\Phi_{n,n}\, \hat r_{n-1}(t-1), \qquad \hat r_n(t) = \hat r_{n-1}(t-1) + \tilde\Phi_{n,n}\, \hat e_{n-1}(t), \qquad (2.3)$$

$$\hat r_n(0) = 0, \qquad \hat e_0(t) = \hat r_0(t) = y(t), \quad 1 \le t \le T.$$

Then also

$$\Delta_n = \frac{1}{T} \sum_{1}^{T+n} \hat e_n(t)\, \hat r_n(t-1)'.$$

It is the terms in (2.1), (2.2), for $1 \le t < n$, $T < t \le T+n$, that seem to cause most of the trouble, since those involve the Toeplitz assumption. In case $q = 1$ it has been suggested that $\hat\phi_{n,n}$ in (2.3) be replaced by

$$-\sum_{n}^{T} \hat e_{n-1}(t)\, \hat r_{n-1}(t-1) \Big/ \Big\{ \tfrac{1}{2} \sum_{n}^{T} \hat e_{n-1}(t)^2 + \tfrac{1}{2} \sum_{n}^{T} \hat r_{n-1}(t-1)^2 \Big\}, \qquad (2.4)$$

but one might equally use the coefficient of correlation between $\hat e_{n-1}(t)$, $\hat r_{n-1}(t-1)$, $n \le t \le T$. A virtue of (2.4) is that the $\hat\phi_{n,n}$ lies in $(-1, 1)$, as also does the correlation coefficient. This is also true of those for the Levinson-Whittle recursion and is completely equivalent to the fact that $\hat\Phi_n(z) \ne 0$, $|z| \ge 1$, a desirable property. The use of (2.3) (or variants of it) results in a substantial number of additional calculations. These lattice or ladder methods (so called because of the flow diagrams that are used in describing them) are important in real time calculations. For the purposes of this account we shall continue to write in terms of the Levinson-Whittle recursion, but wherever that is used it could be replaced by a lattice calculation. The $\hat\phi_{n,n}$ (using a lower case symbol for $q = 1$) are called partial autocorrelations by statisticians and reflection coefficients by systems engineers.

To see why the algorithm has been presented for general $v(t)$ consider computing an estimate of $e(t)$ when inputs are observed. Put, then,

$$\hat e_n(t) = \sum_{0}^{n} \hat\Phi_{n,j}\, y(t-j) - \sum_{1}^{n} \hat\Psi_{n,j}\, u(t-j).$$

Here $\sum \hat\Psi_{n,j} z^{-j}$ is an approximation to $W(z)^{-1} L(z) = C(z^{-1})^{-1} B(z^{-1})$, using (1.6). To obtain $\hat\Phi$, $\hat\Psi$ and also $\hat\Omega_n$, the covariance matrix of the $\hat e_n(t)$, take $v(t)' = (y(t)', u(t)')$, $s = p + q$, take $[\hat\Phi_{n,j}, -\hat\Psi_{n,j}]$ as the first block of $q$ rows in $F_{n,j}$, and take $\hat\Omega_n$ as the top left hand $q \times q$ matrix of $S_n$.

en(t),

Fn,j.

Then

is the top left

This type of procedure will repeatedly be used below.

(iii) The third major algorithm is the Kalman filter, which computes a finite past equivalent to $e(t)$, $\hat\varepsilon(t)$, on the basis of (1.5), for $n$ finite. The algorithm is

$$\hat x(t+1) = F\hat x(t) + Lu(t) + K(t)\hat\varepsilon(t), \qquad y(t) = H\hat x(t) + \hat\varepsilon(t),$$

$$K(t) = \{FP(t)H' + K\Omega\}\{HP(t)H' + \Omega\}^{-1},$$

$$P(t+1) = FP(t)F' + K\Omega K' - K(t)\{HP(t)H' + \Omega\}K(t)',$$

$$P(1) = FP(1)F' + K\Omega K', \qquad \hat x(1) = 0.$$

It may be wise to symmetrise $P(t)$, replacing it by $\tfrac12\{P(t)+P(t)'\}$, to reduce the effects of rounding errors. There is an enormous literature surrounding this algorithm. For our purposes its importance lies in the fact that it allows the Gaussian likelihood, or better $(-2T^{-1})$ times the logarithm of that likelihood, to be calculated, which we call $L(\theta)$ and still speak of as the likelihood. This is, apart from a constant,

$$\frac{1}{T}\sum_{1}^{T}\log\det\{HP(t)H'+\Omega\} + \frac{1}{T}\sum_{1}^{T}\hat\varepsilon(t)'\{HP(t)H'+\Omega\}^{-1}\hat\varepsilon(t). \qquad (2.5)$$
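As an illustration only, the recursion and the criterion (2.5) can be sketched in a few lines of NumPy. The function name `kalman_likelihood` and the array layout (rows of `y` and `u` indexed by time) are assumptions made for this sketch and are not taken from the text; the stationary $P(1)$ is obtained by solving the linear equation $P = FPF' + K\Omega K'$ directly.

```python
import numpy as np

def kalman_likelihood(y, u, F, L, H, K, Omega):
    """Sketch of the Kalman recursion and the criterion (2.5).

    y: (T, q) outputs, u: (T, p) inputs (hypothetical layout).
    F: (n, n), L: (n, p), H: (q, n), K: (n, q), Omega: (q, q).
    Returns the value of (2.5), up to the constant, and the final P.
    """
    T = y.shape[0]
    n = F.shape[0]
    Q = K @ Omega @ K.T
    # P(1) solves P = F P F' + K Omega K' (row-major vec identity).
    P = np.linalg.solve(np.eye(n * n) - np.kron(F, F), Q.reshape(-1)).reshape(n, n)
    x = np.zeros(n)                      # x(1) = 0
    total = 0.0
    for t in range(T):
        P = 0.5 * (P + P.T)              # symmetrise against rounding error
        S = H @ P @ H.T + Omega          # covariance of the residual
        e = y[t] - H @ x                 # finite-past residual
        total += np.linalg.slogdet(S)[1] + e @ np.linalg.solve(S, e)
        Kt = (F @ P @ H.T + K @ Omega) @ np.linalg.inv(S)
        x = F @ x + L @ u[t] + Kt @ e
        P = F @ P @ F.T + Q - Kt @ S @ Kt.T
    return total / T, P
```

In a scalar trial with $F = 0.5$, $H = 1$, $K = 0.3$, $\Omega = 1$ the matrix $P(t)$ starts at $0.12$ and shrinks rapidly towards zero, which is the behaviour one expects when the filter is started from the stationary covariance.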

Here $\theta$ stands for the parameters in $F$, $H$, $K$, $\Omega$. Those in $F$, $H$, $K$ we shall call system parameters and shall indicate by $\tau$. Those in $\Omega$ are the variances and covariances involved. In (2.5) the likelihood has been written down treating $u(t)$ as a fixed sequence of vectors. Few, if any, of the methods of the remainder of this chapter depend greatly on the Gaussian assumption, i.e. that the $e(t)$ are Gaussian. We emphasise that the likelihood, (2.5), is used to obtain an estimation method rather than because it is the true likelihood.

Notes on References. The fast Fourier transform was introduced to latter day science in Cooley and Tukey (1965). The vector form of the Levinson-Whittle algorithm was given in Whittle (1963). Lattice forms are surveyed in Friedlander (1982). A great amount of detail about the Kalman filter is found in Anderson and Moore (1979).

3. Approximation Criteria

The problem to be considered in the remainder of this chapter is that of approximating the true system by one of finite McMillan degree. This degree, $n$, has to be determined. Once this is recognised it must also be recognised that it is not possible to proceed purely through the minimisation of (2.5), since that can always be further reduced by taking $n$ large. The alternative procedures here considered choose $n$ by minimising some form of

$$\log\det\hat\Omega_n + d(n)C_T/T, \qquad n = 0, 1, \ldots, N. \qquad (3.1)$$

Here $\hat\Omega_n$ is the maximum likelihood estimate of $\Omega$, for $n$ given, and the first term in (3.1) is, except for a constant, essentially the minimal value of (2.5), for $n$ given. (Some approximation is involved in that statement.) Here $d(n)$ is the dimension, which is $n(2q+p)$, and $C_T$ is a constant that may depend on $T$. The second term in (3.1) is a penalty term which increases as $n$ increases, whereas the first decreases. Two commonly used sequences are $C_T = 2$, in which case (3.1) will be called AIC($n$), and $C_T = \log T$, in which case (3.1) will be called BIC($n$). An upper bound, $N$, has been imposed on $n$ ($N$ might increase with $T$) and is needed in connection with proofs of asymptotic properties of the method. In practice such bounds do not seem to be used, probably because the bounds needed for validity are much larger than values of $n$ that an experienced investigator would consider reasonable and are needed in the theoretical investigation only to exclude ridiculously large values.

For the case of $C_T = \log T$ a justification has been given by Rissanen on the basis of a minimum description length principle. The idea is to use the model set to record the data in as few bits as possible. The first term (or rather $T/2$ times it) gives a measure of the average number of bits required for an optimal encoding when $n$ is fixed and the maximum likelihood structure, on Gaussian assumptions, is taken to be the true structure. To decode, the model parameters must also be transmitted and $T/2$ times the second term in (3.1), for BIC, measures the number of bits for an optimal encoding of these, to an accuracy determined by that of the method of maximum likelihood. The use of $C_T = 2$ has been justified by Akaike on the basis of a prediction theory, and has been widely used.

The emphasis in this chapter will principally be on the use of rational transfer function systems as approximations to systems of a more general kind. This will be further discussed in the next section. However here some discussion of the case where there is a true rational transfer function system will be given in relation to the use of (3.1).
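A minimal sketch of how a criterion of the form (3.1) might be evaluated is given here. It assumes that the values $\log\det\hat\Omega_n$, $n = 0, 1, \ldots, N$, have already been obtained by some fitting procedure; the function name `select_order` and the `rule` argument are hypothetical and serve only this illustration.

```python
import numpy as np

def select_order(log_det_omega, T, q, p, rule="bic"):
    """Minimise log det(Omega_n) + d(n) * C_T / T over n = 0, ..., N.

    log_det_omega[n] is assumed to hold log det of the estimated
    innovation covariance for McMillan degree n (hypothetical input).
    d(n) = n * (2q + p); C_T = 2 for AIC and log T for BIC.
    """
    log_det_omega = np.asarray(log_det_omega, dtype=float)
    n = np.arange(len(log_det_omega))
    c_T = 2.0 if rule == "aic" else np.log(T)
    crit = log_det_omega + n * (2 * q + p) * c_T / T
    return int(np.argmin(crit)), crit

# Illustrative numbers only: the fit improves slowly beyond n = 1.
fits = [0.0, -0.5, -0.56, -0.57]
n_aic, _ = select_order(fits, T=100, q=1, p=0, rule="aic")
n_bic, _ = select_order(fits, T=100, q=1, p=0, rule="bic")
```

With these made-up values the heavier BIC penalty selects a smaller degree than AIC does, which is the qualitative difference between the two choices of $C_T$.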

The conditions under which the statements below hold true are essentially (6.1), (6.2), below, plus the finiteness of fourth moments of the $e_j(t)$, but the proofs of the theorems also depend on a condition (compare below (1.1))

$$\det W(z) \neq 0, \qquad |z| \geq 1-\delta, \qquad \delta > 0.$$

This $\delta$ may be as small as desired but is prescribed a priori. Now assume there is a true $n_0$ and $\hat n$ minimises (3.1) while, as $T \to \infty$, $C_T/T \to 0$ (which is an insignificant requirement). Then the following holds, where a.s. stands for "almost surely".

(i) If $\liminf_{T\to\infty} C_T/(2\log\log T) > 1$ then $\hat n \to n_0$, a.s. If $\limsup_{T\to\infty} C_T/(2\log\log T) < 1$ then $\hat n$ does not converge a.s. to $n_0$.

(ii) If $\liminf_{T\to\infty} C_T = \infty$ then $\hat n \to n_0$ in probability. If $\limsup_{T\to\infty} C_T < \infty$ then

$$\lim_{\delta\to 0}\lim_{T\to\infty} P\{\hat n > n_0\} = 1. \qquad (3.2)$$

These results deserve careful interpretation. In the first place (i) should not be interpreted as saying that $C_T = 2\log\log T$ is a good value to use because $2\log\log T$ changes far too slowly with $T$ to be meaningful. At $T = 10$ it is 1.7 and at $T = 1000$ it is 3.9. It is therefore not far from $C_T = 2$ for most values of $T$ met in practice, which is the value for AIC($n$). The result (3.2) suggests that AIC($n$) is bad because it will always overestimate the McMillan degree. However in practice there will be no true degree and then $\hat n$ should increase with $T$. The question is how fast. Some investigations suggest that $C_T = 2$, i.e. AIC, gives an optimal rate of increase, according to certain criteria.

The result (3.2) deserves further discussion. We give this for the simplest case where $q = 1$, $n_0 = 0$ so that $y(t) = e(t)$. When $n = 1$ is the model then

$$y(t) + ay(t-1) = e(t) + ce(t-1), \qquad |a| < 1, \quad |c| < 1-\delta. \qquad (3.3)$$

We indicate why $n = 1$ will be preferred to $n = 0$, the true value, when $C_T$ is uniformly bounded. The choice between the two values will be based on

$$\log\hat\sigma_1^2 + 2C_T/T - \log\hat\sigma_0^2 = -\log(\hat\sigma_0^2/\hat\sigma_1^2) + 2C_T/T$$

so that $n = 1$ will be preferred when $\Lambda_T = T\log(\hat\sigma_0^2/\hat\sigma_1^2) > 2C_T$. Consider Fig. 1.
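The two numerical points made above, the slow growth of $2\log\log T$ and the rule that $n = 1$ is preferred when $\Lambda_T > 2C_T$, can be checked with a short sketch. The function names are hypothetical and the variance estimates passed in are illustrative values, not results from the text.

```python
import math

def two_loglog(T):
    """The boundary penalty 2 log log T discussed in (i)."""
    return 2.0 * math.log(math.log(T))

def prefers_n1(sigma0_sq, sigma1_sq, T, c_T):
    """Return (decision, Lambda_T) for the comparison of n = 1 with n = 0.

    Lambda_T = T log(sigma0^2 / sigma1^2); n = 1 wins when it exceeds 2 C_T.
    """
    lam = T * math.log(sigma0_sq / sigma1_sq)
    return lam > 2.0 * c_T, lam

penalty_10 = round(two_loglog(10), 1)      # value quoted for T = 10
penalty_1000 = round(two_loglog(1000), 1)  # value quoted for T = 1000

# Illustrative variances: a 5% reduction in residual variance at T = 100.
aic_choice, lam = prefers_n1(1.0, 0.95, 100, 2.0)
bic_choice, _ = prefers_n1(1.0, 0.95, 100, math.log(100))
```

For these illustrative inputs $\Lambda_T$ is a little above 5, so the AIC threshold $2C_T = 4$ is exceeded while the BIC threshold $2\log 100 \approx 9.2$ is not.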

[Figure 1: the $(a, c)$ plane, with $a$ running from $-1$ to $1$ on the horizontal axis and $c$ from $-1+\delta$ to $1-\delta$ on the vertical axis.]

Figure 1. The region of optimisation for $n = 1$ is that below and above the lines through $\pm(1-\delta)$, excluding the diagonal,

the l i k e l i h o o d c o u l d be at the boundary. maximum likelihood estimates that

(~,~)

it may be s h o ~

m o v e s to the diagonal.

but the m a x i m u m of

In fact if

Thus

that AT

that d i a g o n a l b y

lal < i o g { ( 2 - 6 ) / ~ } 1T

= 4,

let us say.

s t a t i o n a r y r a n d o m f u n c t i o n of

(a - c) ÷ 0,

Fig.

~ = log{(l+a)/(l-a)}

6,

i.e.

~(~)2 ~

so

i.

where

Let us

so that

Then this function,

is e v e n t u a l l y the m a x i m u m v a l u ~ is

are the

is e v e n t u a l l y the

m a x i m u m of a f u n c t i o n d e f i n e d on the d i a g o n a l of parameterise

a, 6

~(s)

of w h i c h is a

t a k e s the p l a c e of

t

in our p r e v i o u s

considerations

spectral d e n s i t y ~(u)

will,

as

~ + 0 el.

so that

Thus

A

(a,c)

as follows.

becomes

large v a l u e s

which will m a k e o p t i m i s a t i o n interpretation

continuously.

-~ < ~ < =.

its m a x i m u m for i n c r e a s i n g l y that approach

but v a r i e s

{cosh ~ } - i

will

increasingly of

u

(i,i)

has

that

large,

i.e. v a l u e s

approach

difficult.

~(u)

It is e v i d e n t

or

take

of

a

(-i,-i),

This r e s u l t has a n o t h e r

It is a p p r o x i m a t e l y

true that

~i

is the

minimum value of

-~{Id(~)12/lw(eiC~) l2}

IV

where that

W(z)

=

(z-c)/(z-a).

a-S ~ 0.

If

Iw(ei~)12 ÷ i,

a,S

(e,a) = (I,i)

than

does,

0 (for -i)

then

IW(ei~)12

or

(3.4)

a "notch"

If

a

reduced

~n

or

0.

bf

Where

Id(~)l 2

This f u n c t i o n

near

0,

en.

is i n c r e d i b l y

local minima to find.

to

(3.4) and the a b s o l u t e

This c o r r e s p o n d s

local maxima and minima, neighbourhoods the function

of

~,e

+i) or

IW(ei~)12 ~).

Thus

the n o t c h will by the shape of irregular that w i l l

for

(for

~

m i n i m u m m a y be very d i f f i c u l t 2 ~(u) will have many

small)

into small

because

of the n a t u r e of

e = log{(l+a)/(1-a)}. situation,

essentially

the same. in that

i.e.

that for general

It must be e m p h a s i s e d T

n, q, p, that

(3.2)

is is very

may need to be very large before

it is

relevant.

Notes on References. suggested in A k a i k e

The procedures (1969),

and above are in H a n n a n relating to

AIC

Rissanen

(1980),

described (1983).

(1981),

w h e n there is no true

Hannah and K a v a l i e r i s

(1980).

T

give

to the fact that

w h i c h w i l l be c o m p r e s s e d

a = ±i

The general "asymptotic"

for

faster

(for

(since

shape will be is d e t e r m i n e d

so that there w i l l be many v a l u e s

el

±n

be and what its precise large,

We know that

then

goes to

zero at

at other values

at

d(~)). el

so that it seeks to move

becomes

is f u r t h e r

for

The m e t h o d of m a x i m u m

(3.4),

(-i,-i).

to unity u n i f o r m l y

develops

2(i)

away f r o m

a-a ÷ o.

IW(ei~)l-2

a n d thus

will converge

as

to m i n i m i s e

towards e

(See s e c t i o n

remain b o u n d e d

uniformly,

likelihood attempts

(3.4)

de

in this s e c t i o n were The results

(1984). nO

in

(3.2)

For the results

see S h i b a t a

(1980),

4. Rational Transfer Function Approximation

In this section a b r i e f theory c o n c e r n i n g

the a p p r o x i m a t i o n

by a p p r o x i m a t i n g

to

H

less c o n c e r n e d w i t h methods

account w i l l be given of some d e e p of the true structure

by a Hankel matrix

theory may c h o o s e

relate mostly

to the case where

W(z)

by a

possible

W(z)

for

n

finite

in the Hankel norm, w h i c h

singular value norm) to another.

for

H

from

has Thus

)',

past.

The

structure

~ ~).

matrices being (5.5).

describes

space on w h i c h

By i.e.

R.

H

(I

Q ~)

as the

operates

(1.2).

space

F(t)'

= 0,

(4.1)

(j,k)th

block,

Wj = 0,

of the f u t u r e on the

is therefore matrix

of

e n d o w e d with a e t,

namely

definition H

of tensor p r o d u c t

operates

blocks see b e l o w

is endowed w i t h a m e t r i c

m a t r i x of

Yt+l'

namely

that

block

is the

t'th

the c a n o n i c a l

= F(j-k)'

Fourier coefficient factorisation

Let this f a c t o r i s a t i o n

(This n o t a t i o n

(e(t)',e(t+l)',...)'

a b l o c k diagonal m a t r i x w i t h the diagonal

E{y(t+j)y(t+k) '} = F(k-j) Since

(or

we mean the T e n s o r p r o d u c t of the two

The space to w h i c h

we c o n s i d e r

norm

yt = ( y ( t - l ) ' , y ( t - 2 ) ' , . . . ) '

given by the c o v a r i a n c e

For the general

(j,k}th

)'

the d e p e n d e n c e

structure given by the c o v a r i a n c e with

is as small as

from one H i l b e r t

et =

E ( e t et+l)

Wj_k,J,k=l,2,... (4.1)

metric (I

H-~

(i.I),

Yt+l = Her + Ket+l' K

so t h a t

is the E u c l i d e a n

(y(t)',y(t+l)' ....

as is e a s i l y c h e c k e d

j < 0.

The idea is to a p p r o x i m a t e

as an o p e r a t o r

e t = (e(t-l) ',e(t-2)',...

where

The

To see w h a t is i n v o l v e d put Yt =

Then,

(Readers

this section.)

there are no inputs and

only that case w i l l be d i s c u s s e d here. to

of finite rank.

to "skip"

be

f(-m)

matrix of

f(--a)

of this, as for f(m) in -I~ -iv --iw * = (2~) W(e )~W(e ) .

is in a g r e e m e n t w i t h that in section 2 b e c a u s e

is the s p e c t r u m of the time r e v e r s e d process.)

Here

W(z)

f(-w]

= E WjzJ


and

det W + 0,

block,

W(j}

Izl > I.

Let

= 0, j < 0.

W

have

W(k-j)

(j,k)th

as the

Then

s = ( i • ~-%) w - 1 ~ ( z ~ n ½) operates from £2 to £2' sequences al,a2,.., with produc~

(a,b)

decomposition is upper

= Zajbj.

Thus S

triangular

The blocks,

Sj+k_l,

S

and for

z = exp i~ If

Toeplitz,

singular

by the matrix

matrix

so that

W-IH

then that

it is easily

q = 1

In the scalar

whose

W -I

is of Hankel

value

because

then

case,

f(-~)

q = I,

Thus we write

in the typical

form.

(j,k) th

place

function (4.3)

checked

= f(~)

we shall

w(z).

W

is also of matrix

= ~-½ W ( z ) -I W(z -I) n ½

matrix. letters.

S

j,k = 1,2,...,

are g e n e r a t e d S(z)

it is

is also a Hankel

and block

It follows

in

where £2 is the space of all ZIajl 2 < ~ and with the ihner

is sought.

that form.

(4.2)

that this is a unitary

so that in future

In this case

~ = ~

and

W = W.

use lower case

therefore

(4.3)

becomes

s(z) = w(z-l)/w(z) which

is o b v i o u s l y

of modulus

1

for

has real coefficients.

Of course

analytic

for

However

analytic

part,

The singular that operator

Izl ~ I. i.e.

value

Introduce

unlike

since W(z),

only the c o e f f i c i e n t s

the coefficients decomposition

of

to be appropriate,

S = 2 pjnj~j, 1

z = exp i~

S(z),

e.g.

of S

z 3,

j > 0,

is of the form

w(z) is not

of the

occur in

S.

(assuming

compact)

njnk = ~j~k = ~jk

Pl ~ P2 ~ "'" ~ 0.

the new random variables uj ~ nj

(I

@ ~½)

*

(I

@ n -½) e .

xj = ~j

W -I Yt+l t

Then E(uju k) = E(xjx k) = 6jk; The occur

uj,

xj

m i g h t be called

in the classical

analysis

as functions

theory

E(ujx k) = 6jkP j.

"discriminant

functions"

of statistical

canonical

that are used to c l a s s i f y

since they correlation

individuals.

The


pj

themselves

e s, s ~ t, canonical

w o u l d be c a l l e d

spans the same correlations

w o u l d be o b t a i n e d with the metric yr.

if

"canonical

space as do the

and the same S

uj

determined

(at least for

Hankel norm a p p r o x i m a t i o n given

n.

virtues

to

H

these

for the A R M A case.

H.

block,

Call

that such a c a n o n i c a l

r(v,j)

the

v=l,2, .... r(v,j),

after f i r s t

Then

by the f i r s t j'th

n

row,

of

it is

the b e s t W(z),

for

since the in a

to estimate

ideas h a v e b e e n used by Akaike

representation

is chosen as c o n s t i t u t e d

to

of h a v i n g

survey,

It will be r e c a l l e d

matrix

is known

into that here

that we shall b r i e f l y (1.6),

6

xi)

from a space

are b y no means evident

c o n t e x t nor are the e f f e c t s However

of

and e q u i v a l e n t

We shall not enter further

pj, uj, xj.

the same

by the c o v a r i a n c e

q = I) to d e t e r m i n e

of such an a p p r o x i m a t i o n

statistical

s ~ t,

as an operator

Once the singular value d e c o m p o s i t i o n

possible u n i q u e l y

of

yS,

Since

(but not the same

were c o n s i d e r e d

structure

correlatlcns".

introducing

in a way

a canonical

form is a t t a i n e d linearly

the

if

H0

independent

j = 1,...,q,

rows

in the v'th

such a set of rows is always of the form

v = 1 ..... nj;

j = 1 ..... q;

Znj = n.

(4.4)

The $n_j$ are known as the Kronecker indices. They uniquely determine these first linearly independent rows of $H_0$. There is a corresponding unique factorisation of $W(z) = A(z^{-1})^{-1}C(z^{-1})$, where $A(z^{-1}) = Z\tilde A(z)$, $C(z^{-1}) = Z\tilde C(z)$ and $Z$ is diagonal with $z^{-n_j}$ in the $j$'th place in the diagonal. $\tilde A$, $\tilde C$ are matrices of polynomials with $\tilde A$ having diagonal elements of degree $n_j$ which are monic, i.e. have unity as the coefficient of $z^{n_j}$. Putting $\tilde E = \tilde C - \tilde A$ the decomposition is uniquely defined by the inequalities on degrees

$$\deg\tilde a_{ij} < \deg\tilde a_{jj},\ j \neq i; \qquad \deg\tilde a_{ij} \leq \deg\tilde a_{ii},\ j \leq i; \qquad \deg\tilde a_{ij} < \deg\tilde a_{ii},\ j > i;$$

$$\deg\tilde e_{ij} < \deg\tilde a_{ii}, \qquad i, j = 1, 2, \ldots, q.$$

Akaike's method leads to estimates of the $n_j$ and of $\tilde A$. Put $\tilde y_t = (y(t)', y(t-1)', \ldots, y(t-h)')'$ where $h$ might be chosen by fitting an autoregression and determining $h$ as the order minimising BIC or AIC. Put, for $\ell = 0, 1, \ldots$; $m = 0, 1, \ldots, q-1$,

$$y_{\ell,m}(t)' = (y(t+1)', y(t+2)', \ldots, y(t+\ell)', y_1(t+\ell+1), \ldots, y_m(t+\ell+1)).$$

If the smallest $n_j$ is for $j = m$ and $n_m = \ell$ then row $r(\ell+1, m)$ (see (4.4)) is linearly dependent on earlier rows of $H$ and correspondingly (see (4.1)) there will be some linear function of $y_{\ell,m}(t)$ that is orthogonal to the past, while this will not be true for $\ell_1 < \ell$ or for $\ell_1 = \ell$, $m_1 < m$. To judge when this is so we consider the solutions of

$$\det[\hat\rho_j^2 I_{\ell q+m} - \hat S_{\ell,m}\hat S_{\ell,m}'] = 0, \qquad \hat\rho_1 \geq \hat\rho_2 \geq \cdots \geq \hat\rho_{\ell q+m},$$

$$\hat S_{\ell,m} = \Big\{\frac{1}{T}\sum y_{\ell,m}(t)y_{\ell,m}(t)'\Big\}^{-1/2}\,\frac{1}{T}\sum y_{\ell,m}(t)\tilde y_t'\,\Big\{\frac{1}{T}\sum\tilde y_t\tilde y_t'\Big\}^{-1/2},$$

where the summations are over $h+1 \leq t \leq T-\ell-1$. It is assumed that $\ell q+m \leq hq$. The $\hat\rho_j$ are the canonical correlations between the $y_{\ell,m}(t)$ and $\tilde y_t$. Successively examining these canonical correlations (ordering $(\ell, m)$ in dictionary order, first according to $\ell$ and then $m$) we stop when, for the first time

_(T_V£,m)log (l-~£q+m) 2 - ~£,m > 0; If this happens at eliminate

£(i), m(1)

Ym(1) (t+£(1)+j),

then

j > 0,

nm(1)

~£,m = q(h-£)-m+l. is put at

from all future

£(i). Y£,m

Now

and

continue, always taking 9£,m as qh-dim y£,m(t) + i. Once nm(1) is determined we eliminate Ym(2) (t+£(2)+j), j > 0, from future y£,m(t) and continue and so on. In this way all nk are determined and with each will be associated an ~(k), which is the ~j for the smallest ~j at the step when nk was determined. ~j is determined only up to a scalar factor and that is fixed in ~(k) making the last element unity. of the estimate of to yj(t+v) determined,

A(z)

Now

~(k)

so that the element of

nk

in canonical form corresponding to

available.

~(k)

k'th

by row

corresponding

in y£,m(t), for £,m at the values where nk was is the coefficient of z v-I in the estimate of ak,j(z).

Thus at the end of the calculation the A(z),

determines the

and

estimate

~

of

the Kronecker indices, are

It is then necessary to estimate $C(z)$. This would be done by forming $\hat A(z^{-1})y(t)$ and using the calculated autocovariances of this to estimate those of $C(z^{-1})e(t)$. Then an estimate of the spectrum will be obtained and factored to find an estimate of $C(z)$. Since $A(0) = C(0)$ and the row degrees of $C(z)$ are prescribed by the degree inequalities this would have to be done carefully and would not be a trivial calculation for $q > 1$. In any case these estimates of $A(z)$, $C(z)$ are inefficient but could be used to initiate a minimisation of (2.5), in the form for (1.5) corresponding to the canonical choice of $H_0$ and the $n_k$.

We do not proceed further with the description because there are problems with the method. It is, so far, restricted to the ARMA case. The $n_k$ are determined in an inefficient estimation procedure and no later adjustment of them has been suggested. However the method is of interest because of its association with the theory of the first part of this section.

Notes on References.

Adamyan, Arov and Krein (1971), Glover (1983) and Jewell and Bloomfield (1983) deal with the theory of Hankel norm approximation. Jewell and Bloomfield (1983,a) suggest, for $q = 1$, that the canonical correlations be found directly from $s(z) = w(z)/w(z^{-1})$, which is to be obtained by factoring a spectral estimate. Akaike (1969,a) presents his method. For some estimation of a moving-average model see Hannan (1970).

5. A Gauss-Newton Procedure

(i) First the case $q = 1$ will be discussed because this is important and the calculations are then quite feasible. The idea is to use a Gauss-Newton procedure but to include $n$ in the estimation. At each iteration this is to be done recursively. Thus consider

$$\frac{1}{T}\sum_{1}^{T}e_\tau(t)^2, \qquad e_\tau(t) = c_\tau(z^{-1})^{-1}\{a_\tau(z^{-1})y(t) - b_\tau(z^{-1})u(t)\}. \qquad (5.1)$$

Here $a_\tau$, $b_\tau$, $c_\tau$ are the transfer functions, for given $n$, in the ARMAX model for $q = 1$ used to approximate to the true ARMAX structure, and $\tau$ is the vector of system parameters, i.e. the $3n$ freely varying coefficients in $a_\tau$, $b_\tau$, $c_\tau$. Here, again, we use lower case letters for the scalar case. Note that $b_\tau$ is, in general, a row vector since we do not require $p = 1$. The $e_\tau(t)$ are functions only of $w_\tau(z)^{-1} = c_\tau(z^{-1})^{-1}a_\tau(z^{-1})$ and $w_\tau(z)^{-1}\ell_\tau(z) = c_\tau(z^{-1})^{-1}b_\tau(z^{-1})$. The procedure is to linearise these functions about a previous estimate, which reduces the minimisation of (5.1) to a linear problem. As has been said the procedure is Gauss-Newton but includes $n$ in the optimisation. It is necessary to obtain a first estimate from which to commence the iteration. This is done by taking $c_\tau \equiv 1$ and choosing $a_\tau$, $b_\tau$ by regression, that is by autoregression. We go on to describe the algorithm.

Step 0. Put

$$v(t) = (y(t), -u(t)')', \qquad t = 1, \ldots, T,$$

and use the Levinson-Whittle algorithm. Let $\hat\sigma_n^2$ be the top left hand element of $S_n$. Choose $\hat n$ to minimise

$$\log\hat\sigma_n^2 + n(p+1)\log T/T.$$

Let the first row of $\hat F_{\hat n,j}$ be called $(\hat a_j, \hat b_j')$ where $\hat a_j$ is scalar and $\hat b_j$ has $p$ elements. Then $\hat a_j$ is the $j$'th coefficient in $\hat a(z)$ and $\hat b_j$ is the $j$'th coefficient vector in $\hat b(z)$.

The basic algorithm is now given by step 1 which is repeated until convergence.

To commence step 1 one needs estimates

These will initially come from step 0, with Step i.

Define

e(t), ~(t), ~(t), ~(t)

. . . .

ce(t)=~y(t),

~

c~(t)=y(t),

n, a, ~, c.

~ ~ 1.

by

^

(t)=e(t), c~(t)=u(t),

y(t)=~(t)=~(t)=~(t)=e(t)=0,

t < 0.

Put

I ~ (t~\

v(t) :

l-~(t)~ ~-~(t)/

,

t = 1,2 ..... T

Fn,j' Fn,j' Sn'

and use the Levinson-Whittle recursion to generate

Sn" Put n

£n(t) = ~Fn,jV(t-n+j)

Sa(1)\ n,n~ (1)| = Sn_11 ~n,n[

~(l~ I n an /

<~ (1}.~

/~(1)

- T7{~(t)-~(t)+~(t) TI 1

I n-l,j\

n,sl

| n-l,jl

~(z)/

~^(l)

n,j!

a2

n tn~

+ Fn_l(n-j)'

/

n,O

^2

s(l~l

n,O

.^(l)

•

n ,n|

j

=

a(1)l n in!

\Cn-l,j/

n,O

<~

(1)\

\

n,~|

(5.2)

} 9n_l(t-l)

~(l)'

n = °n-i - ~an,n'

n,n '

6(11) n,n- Sn-i

/

n,n~

I~(11~ ]

n,n I

\ n,n/ ^2

oo

:

1 T

s{~ft)

1

- ~(t)

+ ~(t)}2

1,2,...,n-l.


Choose

~,i, Ar

to minimise

log ~2 + n(p+2) and set

~ = ~%1;

aJ

log T/T

a(t) = Z~jz ] •

#

5. = ~ ( l )

= ~(I) n,J

'

3

= zS~zJ

C(Z) = ZGjzZ• ~(z) ~j = ~(I)

n,j

~,j •

'

and proceed to repeat step I. 2 log On+n(p+2) log T/T

Cease the iteration when the minimised value of stabilises. We make a number of remarks. i.

If there is a true rational

algorithm will provide as

T

increases.

(see section parameters,

7)

transfer function

estimates

that converge

system then this

to the true

Under fairly general conditions T½(T--T0) ,

TO

2sl Z

values e(t)

being the true vector of system

will be asymptotically

could be estimated

on the

normal.

The covariance

matrix

as (t-l)

.[v(t)',v(t-1)'• .... v(t-fl))} -I

V where

2 2 is o~ at the last iteration and n used at the last iteration. A more efficient

v(t)

is that vector

estimate

of the

covariance matrix would be obtained via the $\hat a_j$, $\hat b_j$, $\hat c_j$, $j = 1, \ldots, \hat n$, $\hat\sigma^2$, at the last iteration but we omit the formulae for brevity.

2. It is not fully apparent that it is best to use BIC and some would argue that in a situation where it is unreasonable to believe in a true rational transfer function system then AIC should be used. Much depends upon the end purpose of the analysis.

3. Though the Levinson-Whittle recursion has been used above this could be replaced by other recursive calculations of the form discussed briefly in sub-section 2(ii). An alternative would be finally to effect one iteration of an algorithm to optimise (2.5), initiating at $n = \hat n$ and with $\hat a_j$, $\hat b_j$, $\hat c_j$, $j = 1, \ldots, \hat n$, $\hat\sigma^2$.

4. One problem with the algorithm is that unless $\hat c(z) \neq 0$, $|z| \leq 1$, then it will fail at step 1 since $\hat\xi(t)$ etc. will grow exponentially with $t$. What should be done is to reflect those zeros of $\hat c(z)$ that are inside the unit circle, in the unit circle. This may be done as follows.

Form

^2 ~ -j 7. afl,jZ

Z C,, .Z j

O

0

0

which will be factored

as

~2 ~ -j 7. ~^ .Z 0 n,J where now

n,3

Z c^ .z j 0 n'3

7~^ .z j + 0, n,3

Izl ~ i.

To achieve

n-] p(j) =

this form

n ~2

7 ~^ ^ k=0 n'j+kCn'k

/ Z 0

n,j

and put !~

!~

, =

~j+l = 2 j + 2 j' ~

=

6(i)

6(2)

6(k)

~j,

c^

n,]

=

algorithm

c(z)

of

6.. 3

c~,j
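The effect of reflecting zeros, as called for in remark 4, can be pictured with a direct root-finding sketch. This is not the autocovariance-based factorisation that the remark itself sets out; it is a swapped-in technique, assuming a polynomial $c(z) = \sum_0^n c_j z^j$ of degree at least one with $c_0 = 1$, given by its coefficients in ascending powers. The name `reflect_zeros` is hypothetical.

```python
import numpy as np

def reflect_zeros(c):
    """Reflect zeros of c(z) = sum_j c[j] z**j lying inside the unit circle.

    Each zero z_k with |z_k| < 1 is replaced by 1 / conj(z_k); the result
    is rescaled so that its constant term is 1. Assumes c[0] != 0 and
    degree >= 1. On |z| = 1 the modulus of the new polynomial is a
    constant multiple of the modulus of the old one.
    """
    c = np.asarray(c, dtype=float)
    zeros = np.roots(c[::-1])                 # np.roots wants descending powers
    inside = np.abs(zeros) < 1.0
    zeros = np.where(inside, 1.0 / np.conj(zeros), zeros)
    asc = np.poly(zeros)[::-1]                # back to ascending powers
    asc = asc / asc[0]                        # normalise constant term to 1
    return np.real_if_close(asc)

# c(z) = 1 + 2.5 z + z^2 = (1 + 2z)(1 + z/2) has zeros at -1/2 and -2.
reflected = reflect_zeros([1.0, 2.5, 1.0])    # (1 + z/2)^2
```

In the example the zero at $-1/2$ is moved to $-2$, giving $(1 + z/2)^2 = 1 + z + 0.25z^2$, whose squared modulus on the unit circle is one quarter of that of the original polynomial.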

5.

In some applications

test shows

~

:

:

:

0

...

~ (0)

6 (£~ 6 (n-l)

/

is the limit of

step i, on each

to check the location

algorithm)

that there

by taking

at each iteration

An alternative

i.

before

cheaper

by allowing

for example

to vary arbitrarily

and use the above

are zeros

the degrees

of steps

has been worthwhile.

At the first iteration

of

a(z), c(z)

using

b(z),

lower.

1 to 4, one can allow

but this will be much more

always

inside

in the use of

the degree of

would be to proceed

after the last iteration, elimination

•.. • ..

it may be felt that economy

may be e f f e c t e d

In principle,

If

at step

(by a Schur-Cohn

= i.

to differ,

6(1) 6(0)

o

this calculation

only when the

parameters

+

then

Izl

6.

...

it may by c o m p u t a t i o n a l l y

of the zeros of

costly.

0

6c~)/~(0).

Instead of performing

degrees

o

is used in place of

occasion,

c(z)

...

is the k'th element

this sequence,

cn,j

...0)

°

6}~)

Now

(i,0,0,

= 2(l,p(1) ..... p(~))G-l(6j)

G(6j)

Here

~0

BIC

computationally

by eliminating

terms

to determine

whether

We omit details.

of step 1 then

these

c(z)

~ 1

and hence

only this


~(t)

~ y(t),

~(t)

= eCt),

~(t) ~ u(t).

Computationally more

e f f i c i e n t a l g o r i t h m s h a v e b e e n g i v e n for this case, w h e n

u(t)

does

^

not Occur, w h i c h e x p l o i t the fact that to

y(t-j),

j=l,...,n,

and that

e(t)

e(t-j),

o r t h o g o n a l and c o u l d be t r e a t e d as such.

is T o e p l i t z o r t h o g o n a l

e(t-k)

are a p p r o x i m a t e l y

It is p o s s i b l e t h a t a m o r e

e f f i c i e n t i m p l e m e n t a t i o n may also be found at later iterations. m a y be m e n t i o n e d that in c o n f i n i n g s u m m a t i o n s we have treated

y(t),

t h a t assLunption

for

7.

u(t)

At step 1 from the

This is b e c a u s e

so that

It

t=l,2,...,T

t < 0,

but have a v o i d e d

This seems p r e f e r a b l e .

first i t e r a t i o n

p r o c e d u r e one may r e p l a c e n > n.

as zero for

t > T.

to

(i.e. repetition)

~(t)-~(t)+$(t) 6 (z-l)~(t)

a(z-l)~(t)-&~(t)=0

and

by

e(t)

implies that

~(t)- ~(t)

is,

of the

in

(5.2)

for

a(z-l)~(t) for

n > n

= e(t) a

linear c o m b i n a t i o n of the v a r i a b l e s in the regression. W h e n this ~(1) ~(1) i s done t h e ~ ., must be r e g a r d e d as a d j u s t m e n t s t o t h e n,3 n,] previous a~,j, c~,j i.e. m u s t be added to these. (ii)

Now consider the vector case, which is more elaborate. First return to the set, $M(n)$, of all systems, (1.3), of McMillan degree $n$, for given $\Omega$. (We fix $\Omega$ for the moment only, because it is the system parameters that need discussion.) $M(n)$ is equivalently the set of all Hankel matrices $H$ of rank $n$ and of all pairs of transfer functions $W(z)$, $L(z)$ (for $W(z)$ obeying the requirements below (1.1)). It may be conceptualised as a smooth surface of dimension $n(2q+p)$ and, technically, is an analytic manifold. A reasonable approach to estimating a system would therefore be to determine $n$ and then the appropriate point on $M(n)$ and this is what was done for $q=1$. For $q > 1$ this is however a problem because $M(n)$ cannot then be covered by one neighbourhood that may be mapped homeomorphically into Euclidean space.

$M(n)$ is the union of the sets of all systems whose Kronecker indices sum to $n$ and hence an alternative to the consideration of $M(n)$ is the determination of the Kronecker indices, as was the technique used in section 4. There is, however, something very arbitrary in the decomposition of $M(n)$ into sets corresponding to different partitions of $n$ as a sum of $q$ integers and the effort required for an efficient procedure to discover these is fairly considerable. Amongst the sets of Kronecker indices summing to $n$ there is one special set, namely those which, for $n = qh + m$, $0 \le m < q$, are of the form

$n_1 = n_2 = \ldots = n_m = h+1$, $n_{m+1} = \ldots = n_q = h$.

Then the first linearly independent rows in $H$ are just the first $n$ rows. If $U(n)$ is the subset of $M(n)$ for which these rows are linearly independent then $U(n)$ is open and dense in $M(n)$. Thus little or nothing is lost in restricting attention to $U(n)$. It is most unlikely that the maximum of the likelihood in $M(n)$ will be found off $U(n)$. (However $U(n)$ would provide a bad coordinate system in which to work if the maximum was near the edge.)
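As a small illustration of the special index set just described, the following Python sketch computes the Kronecker indices $n_1 = \ldots = n_m = h+1$, $n_{m+1} = \ldots = n_q = h$ for $n = qh + m$, together with the dimension $n(2q+p)$ quoted above for $M(n)$. The function names are hypothetical and chosen only for this illustration; they do not come from the original text.

```python
def generic_kronecker_indices(n, q):
    """Kronecker indices of the special ("generic") set for McMillan degree n
    and q outputs: the first m indices equal h+1 and the remaining q-m equal h,
    where n = q*h + m with 0 <= m < q."""
    if n < 0 or q < 1:
        raise ValueError("need n >= 0 and q >= 1")
    h, m = divmod(n, q)  # n = q*h + m, 0 <= m < q
    return [h + 1] * m + [h] * (q - m)


def parameter_dimension(n, q, p):
    """Dimension n(2q+p) of the surface M(n) (q outputs, p inputs)."""
    return n * (2 * q + p)


indices = generic_kronecker_indices(7, 3)   # n = 7 = 3*2 + 1, so h = 2, m = 1
print(indices, sum(indices))                # the indices always sum to n
print(parameter_dimension(7, 3, 2))         # 7 * (2*3 + 2)
```

For $q = 1$ the list has a single entry equal to $n$, which matches the scalar case where no choice of indices arises.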

We describe $U(n)$ in another way by giving a unique description of $A(z)$, $B(z)$, $C(z)$ in $W(z) = A(z^{-1})^{-1}C(z^{-1})$, $L(z) = A(z^{-1})^{-1}B(z^{-1})$ for a system in $U(n)$. We do this by describing the coefficient matrices $A_{n,j}$, $B_{n,j}$, $C_{n,j}$ in $A(z)$, $B(z)$, $C(z)$.

These will be depicted

indicating a freely varying are after the

All partitions

m'th row or column

A n ,0

= [~m Oq_m] ' Bnw0 = O, An , 1 = [[ 0. ]

Cn, 0

An,h+l' All other

below with a star

submatrix of elements.

Bn,h+l'

Cn,h+l =

An,j, Bn,j, Cn,j,

j ~ h+l,

are unrestricted.

We do not mean that $A_{n,h+1}$, $B_{n,h+1}$, $C_{n,h+1}$ are equal. The vector $\tau$ of system parameters coordinatising $U(n)$ is of dimension $n(2q+p)$ and is made up of the freely varying elements in the coefficient matrices.

We now go on to describe how to estimate $n$, $\tau$ and $\Omega$. We do this by a series of steps that are related to those for $q=1$ but are more complicated. Steps 0 to 1 are not repeated. Only step 2 is iterated. Always the output from the previous step is the input to the next so we do not indicate those by a special notation, i.e. we do not for example write $\hat A_j^{(1)}$ for the $\hat A_j$ matrix found at step 1 since it is clear which $\hat A_j$ is used in step 2, i.e. that from step 1 and not step 0. Also we shall now index the stages in the Levinson-Whittle recursion by $h$, rather than $n$ as before.

Step 0.

Put vet)

~ \-uCtl/'

t

=

1, ....

T

and use the Levinson Whittle recursion. hand choose

q x q ~,

submatrix of i.e.

n,

S h, h = 0,1,2,...,

be the top left n where n = qh, and

to minimise

log det ~n + n(q+p) Let the first block of

Let

q

log T/T,

rows in

Fh, j

n

=

hq,

h=

be called

0,i,...

|Ae,

I~j ]

and


let ~ be called ~. Then Aj: Bj n c o e f f i c i e n t m a t r i c e s in A(z), B{z), and

n

Step i.

=

are the

j = l,...,h

with

A0 = Ig,

C(z)

~ Iq

.

Put 8(t)

= Z Ajy(t-j) 0

- Z Bju(t-j) 1

and

v(t) =

|-u(t)J

k-act)/

and use the L e v i n s o n - W h i t t l e element of

Sh

is c a l l e d

algorithm.

~n'

Again

n = hq

the top left hand

and we choose

h

i.e.

to m i n i m i s e log det ~ Now

IAj, Bj, 6 9 ]

coefficient

Now

m

= ~-i

+ n(2q+p)

n

are the top

matrices

in

to the case

~-i

or equal to the true

in

in

C(z),

and p r o v i d e

with

A0 = B0 = I

n = £q+m,

for

0 ~ m < q.

m = 1,2,...,q-l,

at step i,

£,

T

is to insert

(5.3) and the elements

transfer

function

procedure,

in the a p p r o p r i a t e

a l g o r i t h m will be used for of

m

need be taken.)

n =

(h-l)q + m 1 < j < m

m.

An, 0, Cn, 0.

the c a l c u l a t i o n

we regress

Yj(t)

here

at (see

t h e m as a r e g r e s s i o n

(It is u n l i k e l y

and then only

The r e g r e s s i o n

other variables,

that we d e s c r i b e If

q > 5

places

c h e a p l y using the c a l c u l a t i o n s

It is simpler to d e s c r i b e

one for each value of

than

We c o n s i d e r

done at step i.

step 1 but the details are too c o m p l i c a t e d to be d e s c r i b e d the references).

m=0

will be greater

indicated by a star in

This can be done c o m p u t a t i o n a l l y

for

was p r e f e r r e d

our procedure.

then are a l r e a d y

zero e l e m e n t s

h

h

the

We c h o o s e

since

to w h i c h

at step 1

which e x p l a i n s

but the c a l c u l a t i o n s

h=0,1,2,...

F~,j

If there is a true r a t i o n a l

system then for large e n o u g h

The p r o b l e m

n = hq,

rows in

B(z),

and need only compute

by the criterion.

m = q

q

A(z),

has to be determined,

corresponds

log T/T,

4

that the

or fewer v a l u e s

is of a v e c t o r v a r i a b l e

but is c a r r i e d for a typical

on

out row by row so row,

j,

j = l,...,q.

on the f o l l o w i n g v a r i a b l e s ^

(i) and

- Yk(t-i), i = 2,...,h

for

k=l,...,q;

where

k = m+l,...,q.

i = l,...,h

for

k ~ m


(2}

Uk(t-i),

k = l,...,q;

i = 1 .... ,~

(3)

ek{t-i),

k = l,...,q;

i = 1 .... ,h

where ^j^ Z C4e(tj) = 0

g

g

^j Z A4y(tj) 0

Z Bju(t-j ), 1

A

y(t) FQr

m < j < q

= u(t)

we regress

= e(t) yj(t)

(i)

-Yk(t-i),

(2) (3)

-(Yk(t) - Sk(t)), uk(t-i), ek(t-i),

The coefficient regression ek(t-i)

of

-Yk(t-i) aj,k(i) ,

in relation as

or

-(Yk(t)

in

A(z) C(z).

from the

choosing

to m i n i m i s e

q

m = q

the left

end of step

I).

Now we have

an

indicated

by

As was said above

for

Step 2.

q2, q2

n=(~-l)q+m, products

of

is chosen by

n = q(~-l)

with

+ m,

m = i, .... q.

expression

the latter

0, 1 are not repeated.

will be necessary,

F o r m matrices

respectively

~(z)

~n'

uk(t-i),

at the

of the form

n = ~.

steps

often no repetition

n

j'th

for

and cross

Now

log T/T,

B(z),

is the

The matrix

side is just the m i n i m i s e d

n, A(z),

(5.3)

- ek(t))

and s i m i l a r l y

regressions.

log det ~n + n{2q+p) (For

i = 1 ..... ~-i.

by the sums of squares

the residuals m,

i = 1 .... ,h-l.

k = m+l ..... q. k = 1 ..... q,

to • B(z),

T -I

t ~ O.

on

k = l,...,q;

estimate,

is estimated

= O,

q(t), and

~(t),

qp

~(t),

columns

Step

2 may be but

or at most one. of

q

rows

and,

by solving

A

h 0Z ~j [q(t-j},

~(t-j) , ~(t-j)]

=

(y(t)',

(y(t} ', u(t) ', e(t) ') = 0, Here

A

e(t)

is obtained 0Z ~je(t-j)

with the usual product wherein

Iq,(5.4)

t ~ 0.

from

= - 1Z ~ u(t-J)3

initial

u(t)', e(t)')@

conditions.

a typical

block

is

+ By

0Z Ajy^ (t-j) X @ Y

xijY ,

we m e a n

(5.5) the tensor

i = 1,...,a;

j = l,...,b


where

X

is

a x b.

Of course in (5.4)

all blocks are a scalar multiple of column of

n(t), for example, is

X

lq.

is

1 x (2q+p)

Thus the

and

i+q(j-l)th

O(z=l)-iEijy(t),

where

Eij

consists of zeros save for a unit in the (i,j)th place. Put

I [n(t)!] ~v(J ) = ~ Xl-C(t) ~-l[q(t+j), -C(t+j), -~(t+j)] . L-C (t) This matrix is of dimension

q(2q + p).

It is to be the

that is the input to the Levinson-Whittle carried out.

It is, thus,

computational effort.

For

(5.6)

~v(j)

recursion which is to be

q(2q + p)

that determines the

q = p = 5

this is

75,

which already

would be a rather large scale implementation of the Levinson-Whittle recursion. In cases where q is larger it may be necessary to use some other expedient and we discuss this in remarks below. Let

~(t)

be the vector obtained by adding columns numbered

i + q(i - I),

i -- l,...,q

in

n(t)

and similarly for

~(t), ~(t)

in relation to ~(t), u(t). (It is ~(t), ~(t), ~(t) that correspond most closely to the quantities defined for q=l.) Thus h 0Z Cjn(t-j) = y(t), 0Z Cj~(t-j) = e(t), 0Z Cj~(t-j)=u(t). ^

^

^

Now form, for each h recursion with (5.6), ^ = ~-I_ ~h,h Sh i

^

value considered in the L e v i n s o n - ~ i t t l e

h-I 7 ~ !Z[n(t-h+j) 0 h-l,j T

-~(t-h+j), {~(t)

Here e(t) q (2q+p).

is as from (5.5).

This vector,

Th, j = Th_l, j + Fh_l,h_jTh, h,

-~(t-h+j)] '

- ~(t) {h,h'

+ e(tl}. is of dimension

j = 0,1,2 .... ,h-l.

To initiate take ~0,0 to have zeros everywhere save for units in the places numbered i + q(i-l), i = l,...,q; q(q+p) + i + q(i-l), ^

i = l,...,q.

NOW the

Th, j

Bj, Cj, for n = hq. Thus element in the i + q(k-l)'th

provide^ estimates of the matrices

Aj,

Ah, j has as estimate of aik(j) the place in ~h,j" ~h,j has as its

(i,k) 'th element the element in place

q2 + i + q(k-l)

while Ch, j has as its (i,k) 'th element that in the {q2 + qP + i + q(k-l)}'th place. Next put,

in

~h,j


~n = T1 Z ~n(t)&n(t)'"

n = hq,

where h h Z th, jen(t- j) = Z A ,jy(t-j) 0 0 h and choose

i.e.

~

h - Z Bh,jult-j) 1

so that this minimises

log det ~n + n(2q + p) log T/T,

Now we seek to estimate

m

in

n = (~ - l)q + m,

as in step I of the algorithm. this as a regression formed at

(5.4) columns

elements

in

numbered

[A(j),

B(j), C(j)]

i+q(k-l),

bik(J)

matrix

Oik(J). element

in addition,

q2+qp+i+q(k-l)

are added,

having been eliminated) Now call

T

order,

where

parameter

comes

column index,

~h,j' the

i+q(k-l),

from k

all columns

i=l,...,m; Xj(t)

columns numbered i=m+l,...,q,

A(j),

j,

B(j)

i,k=l .... ,q for which

to be null.

Thus

k=m+l,...,q

are

except that for the i+q(k-l),

k=l,...,m

to form a matrix of only

to lag,

q(2q+p)

is associated

q2+qp+i+q(k-l),

is prescribed

these parameters

first according

~(t-j)]

aik(J) , the

k=l,...,p

Now eliminate

the vector of estimates

n

n=(~-l)g+m,

~(t-j),

As in forming

Call the resulting matrix

X0(t),

|-n(t-j),

is associated with

in (5.3)

columns numbered

eliminated.

it could be computed using

i=l,...,q;

and the column numbered

the corresponding j=l

step.

to describe

in the sense that the column

i,k=l ..... q

q2+i+q(k-l),

is associated with for

though

Consider

in this matrix are associated with the

column numbered with

m,

(5.6).

from the previous

q(2q+p)

m = 1,2,...,q-l,

Again it will be easiest

for each

the output from the use of

(5.7)

n = hg.

(all others

m(q-m)

of system parameters

are arranged or

C(j),

for

in dictionary

then according

and finally according

columns.

to whether

then according

to row index

i.

the to

Then

~n = {TI ZX(t) '~-ix(t)}-i {~I ZX(t) ,~-l(~(t) + e(t) -~(t))}; x(t) = [X0(t),

We emphasise

Xl(t) . . . .

that

of step 2.

~(t), ~(t)

are all formed using

step, which at the first use of step 2

step I, but later will have been from a previous use

Only

at this step.

].

X(t), ~, ~(t),

the output from the previous will have been

(5.8)

h

has been determined

The notation

~n

in (5.8)

by previous calculations should not be confused


with ;fh,~ type

earlier. h,j

~n

is made up of many submatrices

and is of dimension

n(2q + p),

of the

n ~ (h-l)q + m.

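We now again put

$$\hat\Omega_n = \frac{1}{T}\sum_t \hat e_n(t)\hat e_n(t)', \qquad n = (\hat h - 1)q + m, \tag{5.9}$$

$$\sum_{j=0}^{\hat h} \hat C_{n,j}\,\hat e_n(t-j) = \sum_{j=0}^{\hat h} \hat A_{n,j}\,y(t-j) - \sum_{j=1}^{\hat h} \hat B_{n,j}\,u(t-j),$$

where $\hat A_{n,j}$, $\hat B_{n,j}$, $\hat C_{n,j}$ have elements obtained from $\hat\tau_n$ according to the identification discussed before (5.7). We choose $\hat m$ to minimise

$$\log\det\hat\Omega_n + n(2q+p)\,\log T/T, \qquad n = (\hat h - 1)q + m, \quad m = 1, 2, \ldots, q. \tag{5.10}$$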
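The selection rule in (5.10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the values of $\log\det\hat\Omega_n$ for the candidate orders are taken as given inputs (made-up numbers here) rather than computed from the regressions described in the text.

```python
import math


def order_criterion(log_det_omega, n, q, p, T):
    """Value of log det(Omega_n) + n(2q+p) log(T) / T for one candidate n."""
    return log_det_omega + n * (2 * q + p) * math.log(T) / T


def choose_order(log_dets, q, p, T):
    """Return the candidate n with the smallest criterion value.

    log_dets maps each candidate order n to log det of the corresponding
    residual covariance estimate (assumed to be available already)."""
    return min(log_dets, key=lambda n: order_criterion(log_dets[n], n, q, p, T))


# Hypothetical inputs: a large drop in log det from n=1 to n=2, a tiny one after.
candidates = {1: 0.50, 2: 0.10, 3: 0.09}
best = choose_order(candidates, q=1, p=1, T=100)
print(best)
```

The penalty term grows linearly in $n$, so a small further reduction in $\log\det\hat\Omega_n$ is not enough to justify a larger order.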

Again for

m = q

optimised

(5.7).

the value of this criterion Then

values corresponding of

~

from

~j, ~j, ~j

to

~, i

is

(h-l)q + m

(5.9) that optimised

the

Remarks.

I.

analogues

here.

(5.4)

~

in

(5.5)).

All of the remarks In relation

from these

a

in relation

this factorisation references). 2. and

AA " C(e i ~)~C(e I~)

which does have

det C(z)

are available

Much of the work involved q

begin to be important.

would not be unreasonable step 1.

for

*

[z[ ~ i.

simulations

of

transfer n

function

is improved

at

Iz[ ~ I.

so as to obtain Algorithms

for

(see the

h, m

p

there it

found at

of step 2 the values of

step.

of

and if that

To reduce the calculation

to do them only at the

at the previous

with rational

that the determination

C(z)

is in step 2 where the sizes of

(5.5) have been computed we may move determined

# 0,

criterion

canonically

# 0,

of

but we omit details here

In any case at repetitions

h, m

the description

to the scalar case have

det ~(z)

from the first use of step 2 could be used. (5.4),

is the value

to the use of an estimate

again a problem will arise unless

C(z),

~

~j, ~j, Cj, ~,

This completes

Again this can be checked via a Schur-Cohn fails we should factor

and

to be the

(5.10).

We may now repeat step 2 commencing (which defines the algorithm.

is that which

are finally defined

h,

When that is done, once straight to

(5.8),

(5.9)

However experience with generated

data shows

at the first use of step

2 and it may improve again at later iterations

of that step.


Notes on References. The algorithms here described were first presented in Hannan and Rissanen (1982), Hannan and Kavalieris (1984). The emphasis there was more on order determination. For the determination of $m$ in step 1 (i.e. for $q > 1$) an alternative calculation is given in the second reference. For the structure theory at the beginning of subsection (ii) see Deistler and Hannan (1981), for example. The algorithm in remark 4 in subsection (i) is due to Tunnicliffe Wilson (1969) and its matricial version to Tunnicliffe Wilson (1972).


6. Some Theoretical Considerations

This section will be very brief since theory is not the purpose of this account, nor could such theory be fully presented in the space available here. However there seems to be some virtue in indicating the scope of the theory underlying the methods.

In the first place it is not necessary that the linear innovations, $e(t)$, be Gaussian and all of the methods are valid under much more general conditions in the sense that the same theory obtains as if the $e(t)$ were Gaussian. The essential condition is that

$$E\{e(t) \mid e(t-1), e(t-2), \ldots\} = 0. \tag{6.1}$$

This is equivalent to the assertion, for (1.3), that the best linear predictor (in the least squares sense) of $y(t) - \sum L_i u(t-i)$ is the best predictor so that the data is, in that sense, generated by a linear system. Asymptotic distributions, if they are to be the same as for the Gaussian case, require additionally that

$$E\{e(t)e(t)' \mid e(t-1), \ldots\} = \Omega. \tag{6.2}$$

For $u(t)$ some regularity conditions of a reasonably general nature are needed but we do not discuss them here. Of course (6.1), (6.2) will hold if the $e(t)$ are independent, with zero mean vector and finite second moments, but are considerably more general.

7. On-Line Procedures

Here only the case $p = q = 1$ will be considered, though the method easily generalises to $p > 1$. There is a large literature concerning methods for real time, on-line estimation of systems and this has recently been surveyed, as will be indicated in the references. Here attention will be concentrated on an on-line implementation, for $q = 1$, of the algorithm described in section 5. In other words we implement the two steps of this algorithm in an on-line fashion, with the step 1 iterated (i.e. repeated) once. Before describing that let us describe three known on-line procedures. Each is of the form

$$\tau(t) = \tau(t-1) + P(t)x(t)\hat e(t), \qquad \hat e(t) = w(t) - \tau(t-1)'x(t),$$

$$P(t) = \Big\{\sum_{s=1}^{t} x(s)x(s)'\Big\}^{-1} = P(t-1) - \{1 + x(t)'P(t-1)x(t)\}^{-1}P(t-1)x(t)x(t)'P(t-1).$$

Here $w(t)$ is the "dependent variable" in a regression on $x(t)$, and $\tau(t)$ is the estimate at time $t$ of the vector of regression coefficients. In the basic procedures $w(t)$, $x(t)$ must be constructed on-line at time $t$ from data to time $t$ and probably together with the estimate $\tau(t-1)$. In each case we identify $\tau$, $x(t)$ and $w(t)$.

(1) RLS = Recursive least squares. This corresponds to step 0.

$\tau(t)' = (a_1(t), a_2(t), \ldots, a_h(t), b_1(t), b_2(t), \ldots, b_h(t))$

$x(t)' = (-y(t-1), -y(t-2), \ldots, -y(t-h), u(t-1), u(t-2), \ldots, u(t-h))$

$w(t) = y(t)$.
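The RLS recursion can be tried out with a short, dependency-free Python sketch. Everything about the setup is an assumption made for illustration: the helper names, the noise-free first-order test system, the deterministic input and the large initial value of $P$ (the usual practical substitute for the undefined inverse of an empty sum) are not from the original text. The optional argument `lam` is a constant forgetting factor of the kind this section goes on to discuss; with `lam=1.0` the update is the plain recursion for $P(t)$.

```python
import math


def regressor(y, u, t, h):
    """x(t)' = (-y(t-1),...,-y(t-h), u(t-1),...,u(t-h)).
    Values before the start of the sample are treated as zero."""
    def past(series, lag):
        return series[t - lag] if t - lag >= 0 else 0.0
    return ([-past(y, j) for j in range(1, h + 1)]
            + [past(u, j) for j in range(1, h + 1)])


def rls_update(tau, P, x, w, lam=1.0):
    """One recursive least squares step; returns (tau(t), P(t), prediction error).
    P is assumed symmetric, so P x x' P equals (P x)(P x)'."""
    k = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(k)) for i in range(k)]   # P(t-1) x(t)
    denom = lam + sum(x[i] * Px[i] for i in range(k))                # lam + x' P x
    P_new = [[(P[i][j] - Px[i] * Px[j] / denom) / lam for j in range(k)]
             for i in range(k)]
    err = w - sum(tau[i] * x[i] for i in range(k))                   # w(t) - tau(t-1)' x(t)
    gain = [sum(P_new[i][j] * x[j] for j in range(k)) for i in range(k)]  # P(t) x(t)
    tau_new = [tau[i] + gain[i] * err for i in range(k)]
    return tau_new, P_new, err


# Hypothetical test system: y(t) + a*y(t-1) = b*u(t-1), no noise.
a_true, b_true, N = 0.5, 2.0, 60
u = [math.sin(0.7 * t) + 0.5 * math.cos(2.1 * t) for t in range(N)]
y = [0.0] * N
for t in range(1, N):
    y[t] = -a_true * y[t - 1] + b_true * u[t - 1]

tau = [0.0, 0.0]                      # tau(0)
P = [[1e6, 0.0], [0.0, 1e6]]          # large initial P, an assumed starting value
for t in range(N):
    tau, P, _ = rls_update(tau, P, regressor(y, u, t, 1), y[t])
print(tau)                            # estimates of (a, b)
```

Because the data are noise-free and the model order is correct, the estimates settle very close to the values used to generate the data; with noisy data one would expect convergence only as $t$ grows.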

(2) AML = Approximate maximum likelihood. This corresponds to the first use of step 1.

$\tau(t)' = (a_1(t), \ldots, a_n(t), b_1(t), \ldots, b_n(t), c_1(t), \ldots, c_n(t))$

$x(t)' = (-y(t-1), \ldots, -y(t-n), u(t-1), \ldots, u(t-n), \hat e(t-1), \ldots, \hat e(t-n))$

$w(t) = y(t)$.

In fact what is most properly called AML uses not $\hat e(t-j)$ but instead $\tilde e(t-j)$ in $x(t)$, where $\tilde e(t) = y(t) - \tau(t)'x(t)$. This can be done since at time $t$ the latest value used is $\tilde e(t-1)$ which uses $\tau(t-1)$.

(3) RML = Recursive maximum likelihood. This corresponds to the second use of step 1.

$\tau(t)' = (a_1(t), \ldots, a_n(t), b_1(t), \ldots, b_n(t), c_1(t), \ldots, c_n(t))$

$x(t)' = (-\eta(t-1), \ldots, -\eta(t-n), \zeta(t-1), \ldots, \zeta(t-n), \xi(t-1), \ldots, \xi(t-n))$

$$w(t) = \hat e(t) + \eta(t) - \xi(t), \tag{7.1}$$

$$\sum_{j=0}^{n} c_j(t)\,(-\eta(t-j), \zeta(t-j), \xi(t-j)) = (-y(t), u(t), \hat e(t)), \qquad c_0(t) \equiv 1,$$

$$y(t) = u(t) = \hat e(t) = 0, \qquad t \le 0. \tag{7.2}$$

As presented above each of these could be used as a procedure independently of the others. Of course $n$ is fixed.

It is known that AML may not converge, even if the true system is ARMAX of McMillan degree $n$, unless

$$2\,\Re\{c(e^{i\omega})^{-1} - \tfrac{1}{2}\} > 0, \qquad \omega \in [-\pi, \pi],$$

i.e. unless the positive real condition is satisfied. It seems that RML may fail unless the location of the zeros of $C(z)$ is monitored and when these move inside the unit circle then the $c_j(t)$ used in forming $\eta(t)$, $\zeta(t)$, $\xi(t)$, $\hat e(t)$ must be held at fixed values, with zeros outside of the unit circle, until the output vector, $\tau(t)$, corresponds to a stable $c_j(t)$ set, i.e. a set with zeros outside of the circle. For these reasons it has been suggested that the algorithms be run in parallel, with the $\hat e(t-j)$ for AML provided by the $\hat e(t)$ from RLS and with the $\hat e(t)$ in (7.1), (7.2) for RML being the $\hat e(t)$ from AML. The value of $h$ in RLS would in general be much larger than $n$ in AML, RML where the $n$ is the assumed true order. A common choice would be $h = 2n$, but this is arbitrary.

One main reason for on-line calculation is to allow the estimates to adapt to an evolving mechanism

generating the data. In that case one should also be "forgetting" the remote past since that will be irrelevant to the estimation problem at time $t$. Thus a "forgetting factor" $\beta_t(s)$ is included that multiplies $w(s)$ and $x(s)$ in the calculations. If

$$\beta_t(s) = \prod_{u=s+1}^{t} \lambda(u), \qquad \beta_s(s) = 1,$$

then the nett effect is that only the formula for $P(t)$ is changed, becoming

$$P(t) = \frac{1}{\lambda(t)}\Big[P(t-1) - \{\lambda(t) + x(t)'P(t-1)x(t)\}^{-1}P(t-1)x(t)x(t)'P(t-1)\Big].$$

One reasonable procedure would be to take $\lambda(t) \equiv \lambda$, where $0 < \lambda < 1$ and $\lambda$ is fairly near to 1, e.g. 0.95. However it is felt that $h$ and $n$ might be made to depend on $t$. In particular even if the true system were of the known order, $n$, then $h$ will have to increase with $t$ in order that $\tau(t)$ will converge to the true $\tau$. Of course if $h$ increases with $t$ as $\log t$, as it will if AIC or BIC is used to choose $h$, then eventually the calculation cannot be done in real time. However if "forgetting" is used then the sample size is not, truly, increasing with $t$ and thus $h$ should not increase indefinitely. The criterion should be

$$\log \hat\sigma_h^2(t) + h\,\log f(t)/f(t) \tag{7.3}$$

where, when "forgetting" is used, $f(t)$ measures the sample size to time $t$ and is given by

$$f(t+1) = \lambda(t+1)f(t) + 1,$$

since the effective sample size is

$$f(t) = \sum_{s=1}^{t}\prod_{u=s+1}^{t}\lambda(u).$$

It remains to describe how to compute $\hat\sigma_h^2(t)$ in (7.3) when $h$ is allowed so to vary. Though these procedures are described for RLS, where $h$ indicates the order, and for $p = 1$ it will be readily seen that they can be used in the same way for AML or RML, with $n$ taking the place of $h$, and for $p > 1$. Indeed they could also fairly easily be generalised to $q > 1$. Call $x_h(t)$ the vector $x(t)$ when this has been rearranged as

$(-y(t-1), u(t-1), -y(t-2), u(t-2), \ldots, -y(t-h), u(t-h))$

and rearrange

T(t)

Xh(t)' If

Q

accordingly,

is orthogonal

and

Rh(t)

and

is upper

f (t)-Ish (t) 2

the calculations be obtained S =

is

[~(t)

,

[xH~t+l~ Q

acts only on rows place

in

QiQi_l

Qi_iQi_2

... Q1 S

:

[~

(t)

rh(t) 1

0

sn(t)j

triangular, c~(t).

then

Moreover,

as

cost.

Put

= (y(1) ..... y(t}).

~(t)~h(t)

= -rh(t|

as now will be indicated, a~(t),

h = I,...,H

may

consider

rH (t) ]

y(t+l~] Q2HQ2H_I

i, 2H+I ... Q1 s. are

Indeed

%h(t).

v~t)'

may be done so that all

at little

and construct

it

: [Xh~l) ..... xh(t)],

Q[X h(t)v(t)]

where

calling

--- Q2QI

where

and introduces

Qi'

orthogonal,

a zero in the

Then if the rows numbered

i,

(2H+l,i)'th (2h+l)

in


and

(0,0,

...,

0, d ½, d % r 2 . . . .

~0,0,

...,

0 • ~_iXl

xI i'

di,

~i

2

e

, 6 i~ - i x 2 '

are defined

d i = d + 6i_ixi

,

= d / d i,

i,

(0,0

(2h+l)

#

..., 0,

where

SH(t)2 we may find

of

"'" Q1

chosen

... S

d r h+l )

Q

---

, ~ X h + l ). Q2HQ2H-I

"'" Q 1

right hand element

of

Q2HQ2H_I

... Q 1 S

is

~(t)

h ~

H

h = H,

Moreover

at no e x t r a ... Q 1 S

6~ h.

Thus

this,

and

(7.3)

the w h o l e

thing may be done with f o r m of the a l g o r i t h m s

q = i.

How

this

well and the and via

h

h

to h a v e

transfer

It w o u l d

then,

eventually

the a l g o r i t h m

algorithm

could

t = 2000)

that

and often

log t

itself

with

RML.

This then

5, at l e a s t

to b e seen.

should t

could to i t s

for

If

h,

If it w a s fit the d a t a

set

l(t)

£ 1

and could be chosen

increase

as

log t.

Tnis

n o t r u n in real time. long run value

increases

b e r u n In r e a l t i m e

it c o u l d b e r e g a r d e d

recursive.

system would

then we

to i n c r e a s e

would De small compared

very large

in AML,

in s e c t i o n

h ~ H,

made

some virtues.

function

right

for a l l

calculation n

gives

and that of

will be remains

system was not evolving

that eventually

However got

certainly

a rational

should be allowed

(7.3).

means

it s e e m s

algorithm

the b o t t o m

m a y be c o m p u t e d

the same

that

is

Thus given

the c a l c u l a t i o n

cost, s i n c e % ^ is 62heh(t)

an o n - l i n e

are f i x e d

in RLS.

is

+ ~H(t) 2

Q2hQ2h_l is

for

defines n

are

"'"

recursively.

useful

xI = 0

of

to minimise

Precisely

e

r I = r I = I,

element

~H(t)

~(t)

hand element

0, 6 X 2,

= SH(t-1)2

for all

QiQi_l

right hand

and the bottom

6 %2 H e^H (t),

!

r k = c r k + sx k,

r o w s of

the bottom

believed

---- 1

60

= ~i_ixi/d i

O, d ~i e d z r 2,

--oQ

(0,0,

Q2hQ2h-i

,

!

t h e n the

~h(t) 2

by the recursion

s

x k = x K - xirk,

6 H

' 6 %i - i x 2h+l"%

"'"

6i = d 6 i - i / d i

e

Moreover

, a½r2h+l)

slowly

u p uo v a l u e s

as a r e a l

until

it

so t h a t the so l a r g e

time algorithm.

(say


Allowing $h$ and $n$ to increase needs investigation but could prove useful, with $\lambda(t)$ adroitly varied, to model an evolving phenomenon or even a non-linear, episodic phenomenon. When $h$ or $n$ varies it is likely that occasionally they will change appreciably from one value of $t$ to another. This is because (7.3) is likely to be flat near its minimum or even have several minima near to equality. This may not matter much since all of the competing models are behaving about equally well but could be misinterpreted as evolution.

Notes on References. The field of on-line calculation is extensively surveyed in Ljung and Söderström (1983). The basic procedure of this section, for $h$, $n$ fixed, was suggested in Mayne, Åström and Clarke (1983) and the procedure for $h$, $n$ varying in Hannan, Kavalieris and Mackisack (1986).


References. Adamyan,

V.M., Arov,

D.F. and Krein, M.G.

(1971)

of Schmidt pairs for a Hankel operator Schur-Takagi Akaike,

H.

Ann. Akaike,

problem.

(1969) Inst.

H.

Fitting

(1969,a)

autoregressive

Canonical Advances

and D.G. Lainiotis, Anderson,

B.D.O.

15, 31-73.

models

for prediction.

6, 416-431. correlation

and the use of an information Identification,

and the generalised

Maths USSR Sbornik,

Star. Math.

Analytic properties

analysis

criterion.

and Case Studies,

Academic

and Moore, J.B.

Press, (1979)

of time series

In: System eds. R.K. Mehra

New York,

29-91.

Optimal Filtering,

Prentice Hall, Englewood Cliffs. Casti,

J,L.

(1977)

Academic Cooley,

J.W.

and Tukey,

calculation

Glover,

M. and Hannan,

B.

(1983)

E.J.

(1981)

Anal.

Lattice

systems and their Cambridge,

Multiple

Hannah,

E.J.

(1980)

The estimation

system.

of the

for adaptive

processing.

Ann. (1981)

Statist.

error bounds

Tlme Ser~es,

Research Dept.

Wiley,

New York.

of the order of an ARMA

8, 1071-1081.

Estimating

J. Multivariate

L~

Systems Division,

of linear

EngLand.

(1970)

E.J.

of

All optimal Hankel norm approximatlons

E.J.

Hannah,

Mathematics

Some properties

filters

Hannan,

process.

for machine

ii, 474-484.

Control and Management

Engineering,

An algorlthm

70, 830-867.

multivariable Report,

(1955)

of ARMA systems with unknown order.

(1982)

IEEE,

K.

J.W.

19, 297-301.

J. Multivariate

Proc.

and Their Applications,

of complex Fourier Series.

parameterization

Friedlander,

Systems

Press, New York.

Computation, Diestler,

Dynamical

the dimension

Anal.

of a linear

ii, 459-473.

of


Hannan,

E.J.

(1982)

criterion.

Testing

for autocorrelation

In: Essays in Statistical

and E.J. Hannan,

and Akaike's

Science,

Applied Probability

Trust,

eds. J.M. Gani

Sheffield,

403-412. Hannan,

E.J.

and Kavalieris,

series models. Hannan, E.J. models. Hannan, E.J.,

and Kavalieris,

Kavalieris,

autoregressive

Ann. Statist.

of past and future

~1986)

order.

Recursive

733, no.l. estimation

BiometriKa,

of mlxed

59, 81-94.

Prentice Hall, Englewood

~1983)

canonical

definitions

Identification,

P.

(1983,a)

for time serles:

T.

(1983)

MIT Press,

D.Q., Astrom,

for recursive

correlations

correlations

bounds and computation.

K.J.

and Clarke,

Research

Imperial

(1983)

Theory and Practice

College,

Universal

J.M.

(1983)

of parameters

Report,

of

Mass. A new algorithm

in controlled

Dept. of Electrical

London.

prior

estimation by minimum description

for parameters length.

and

Ann. Statist.

416-431. R.

(1980)

Asymptotically

order of the model process.

of

and theory.

Canonical

Cambridge,

identif±cation

A~MA processes.

Shibata,

Cliffs.

l_!l, 848-855.

Lgung, L. and S6derstrom,

J.

autoregression

l_!l, 837-847.

Ann. Statist.,

Rissanen,

M.

Biometrika,

for time series:

and Bloomfield,

Engineering,

Regression,

J. (1982) Recursive

P.

linear time

7.

Linear Systems,

past and future

Mayne,

(1986)

moving-average

(1980)

Multivariate i_~6, 492-561.

L. and MacKisack,

Jewell, N.P. and Bloomfield,

Jewell, N.P.

L.

of linear systems.

Hannan, E.J. and Rissanen,

T.

I1984) Prob.

J. Time Series Anal.

estimation

Kailath,

L.

Adv. Appl.

Ann.

efficient

for estimating

Statist.

selection

parameters

8, 147-164.

of the

of a linear

~,


Tunnicliffe Wilson, G. (1969) Factorization of the covariance generating function of a pure moving-average process. SIAM J. Numer. Anal., 6, 1-7.

Tunnicliffe Wilson, G. (1972) The factorization of matricial spectral densities. SIAM J. Appl. Math., 23, 420-426.

Whittle, P. (1963) On the fitting of multivariate auto-regressions and the approximate canonical factorization of a spectral density matrix. Biometrika, 50, 129-134.

Chapter 2

Linear Errors-in-Variables Models

Manfred Deistler

1. Introduction

In this contribution we are concerned with some aspects of the identification problem for linear systems where both inputs and outputs are subject to ("observational") errors. Models of this kind are called errors-in-variables (EV) models.

The conventional setting in the statistical analysis of linear systems is to attribute all errors to the outputs, or (for our purposes) equivalently to add the errors to the equations. This gives the errors in equations (EE) models.

Let $\hat x_t$ and $\hat y_t$ denote the "true" inputs and outputs respectively and let $x_t$ and $y_t$ denote the observed inputs and outputs, then the situation can be illustrated as follows:

EV models are of the form:

[Fig 1: Schematic representation of an EV model]

There $u_t$ and $v_t$ are the errors of the inputs and the outputs respectively. On the other hand EE models are of the form:

[Fig 2: Schematic representation of an EE model]


Of course the EV setting is more general than the EE setting. For a number of purposes, e.g. for the prediction of the observed outputs from observed inputs, the EE setting is adequate. In many cases however, the EV setting seems to be more appropriate, e.g.

(i) if our main interest concerns the "true" system generating the data (rather than a good representation of the data) and if we cannot be sure a priori that the true inputs are not contaminated by errors

(ii) if we want to decouple the common effect between the variables from the individual effects

(iii) if there is no a priori classification of the observed variables into inputs and outputs and if thus a symmetric treatment of the variables would be appropriate.

We are dealing here only with linear systems in a stationary context. Also, if the contrary has not been stated explicitly, we restrict ourselves to the single input - single output case. Our primary interest is in the characteristics of the system, i.e. in the transfer function (or the parameters of the transfer function); but also the characteristics of the errors and of $(\hat x_t)$ are of interest.

The statistical theory of linear dynamic EE systems, especially of ARMAX systems (also in the multi-input - multi-output case), has now reached a certain stage of completeness (see Hannan and Kavalieris (1984)).

In the EV case, on the other hand, there is still a great number of open problems, and this is the reason why there is still a relatively small number of applications in this field. The main problems in the EV case arise from the fact that the (ensemble) second moments of the observations do not in general uniquely determine the transfer function of the system. Another difference to EE models is that in the EV case higher order moments may contain (in the non Gaussian case) additional information about the transfer function.

Our emphasis is on two problems. The first is the problem of identifiability, i.e. the problem whether the characteristics of interest mentioned above can be uniquely determined from certain characteristics of the observations, e.g. from their (ensemble) second moments or from their probability law (see Deistler and Seifert (1978)). If the answer is negative, then the second problem is to describe the sets of observationally equivalent characteristics of interest, i.e. the sets of characteristics of interest which correspond to the same characteristics of the observations. These questions precede estimation in the narrow sense and, as has been stated already, they turn out to be the main difficulty in the process of estimation (or inference) in EV models.

This difficulty is the reason why not very much attention has been paid to EV models for a long time. However, in the last decade there has been a resurging interest in EV models in econometrics, statistics and system theory, see e.g. Aigner and Goldberger (1977), Aigner et al. (1984), Anderson B.D.O. (1985), Anderson and Deistler (1984), Anderson T.W. (1984), Deistler (1984), Deistler (1985a), Fuller (1980), Green and Anderson (1985), Hinich and Weber (1984), Kalman (1982), Kalman (1983), Maravall (1979), Picci (1985), Söderström (1980), Schneeweiß and Mittag (1985), Wegge (1983).

The paper is organized as follows. In section 2 we repeat some well known results for the static case. In sections 3 to 5 we consider the (dynamic) case when the characteristics of the observations considered are their second moments. Thereby in section 3 the set of all transfer functions corresponding to given second moments of the observations is described. Section 4 deals with the same problem when the system is a priori known to be causal, and with the problem whether causality can be detected from the second moments of the observations. In section 5 several conditions for identifiability are given. Finally in section 6 we derive conditions for identifiability using information coming from moments of order greater than two.

The system considered is of the form

(1.1)  $\hat{y}_t = w(B)\,\hat{x}_t$

where B is a complex variable as well as the backward-shift operator on $\mathbb{Z}$, and where

(1.2)  $w(B) = \sum_{i=-\infty}^{\infty} w_i B^i$

is the transfer function. The summation on the r.h.s. of (1.2) ranges over all integers and thus in general the system is not a priori assumed to be causal.

The observed processes $(x_t)$ and $(y_t)$ are given by

(1.3)  $x_t = \hat{x}_t + u_t$

(1.4)  $y_t = \hat{y}_t + v_t$
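The following is a minimal sketch of how observations from (1.1), (1.3), (1.4) could be generated numerically. It assumes a transfer function with finitely many nonzero weights and treats the true input as zero outside the sample; the function name `ev_observations` and the argument `n_neg` (number of negative lags) are illustrative choices, not notation from the text.

```python
import numpy as np

def ev_observations(x_hat, w, n_neg, u, v):
    """Return observed (x, y) for the EV system (1.1), (1.3), (1.4).

    w holds the weights w_{-n_neg}, ..., w_0, ..., w_{n_pos} of a finite,
    possibly non-causal transfer function. x_hat is taken to be zero
    outside the sample (an assumption of this sketch).
    """
    x_hat = np.asarray(x_hat, dtype=float)
    w = np.asarray(w, dtype=float)
    full = np.convolve(x_hat, w, mode="full")
    # full[k] = sum_m x_hat[m] * w[k - m]; array index j of w is lag i = j - n_neg,
    # so y_hat[t] = sum_i w_i * x_hat[t - i] = full[t + n_neg].
    y_hat = full[n_neg:n_neg + len(x_hat)]
    x = x_hat + np.asarray(u, dtype=float)   # (1.3)
    y = y_hat + np.asarray(v, dtype=float)   # (1.4)
    return x, y

# Impulse input, weights w_{-1} = 0.5, w_0 = 1.0, w_1 = 0.25, no errors.
impulse = np.zeros(11)
impulse[5] = 1.0
x_obs, y_obs = ev_observations(impulse, [0.5, 1.0, 0.25], 1,
                               np.zeros(11), np.zeros(11))
```

With the impulse at t = 5 the output is nonzero at t = 4, 5, 6, which illustrates that a nonzero weight at a negative lag makes the system non-causal.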

We assume throughout:

(1.5)  All processes considered are (wide sense) stationary; all limits of random variables are understood in the sense of mean squares convergence

(1.6)  $E\hat{x}_t = Eu_t = Ev_t = 0$

(1.7)  $E\hat{x}_s u_t = E\hat{x}_s v_t = 0 \quad \forall s,t$

and

(1.8)  $(u_t, v_t)$ has a spectral density, $\Sigma$ say.

These assumptions are called the standard assumptions here and they will not be explicitly restated.

The assumption $E\hat{x}_t = 0$ is imposed for notational convenience only and may easily be relaxed. (1.7) is natural in our context. Also the assumption (1.8) is natural for errors. In many cases we assume in addition

(1.9)  $Eu_s v_t = 0 \quad \forall s,t$, i.e. $\Sigma$ is diagonal

(1.10)  All processes considered have a spectral density

Thereby, if $(z_t)$ is a stationary process, we often use $f_z$ to denote its spectral density.

Assumption (1.9) means that all common effects between $(x_t)$ and $(y_t)$ are due to the system and that only individual effects are attributed to the errors. Of course situations may occur where such an assumption cannot be justified, e.g. if the errors in the measurement devices for inputs and outputs are correlated. Without any additional assumption the situation is hopeless, because "too many" (linear) systems then correspond to given second moments of the observations. Additional information to separate the errors could be obtained from certain frequency domain properties of the errors, or from higher order moments.

2. The Static Case

Here we consider the special case where the system is static, i.e. the transfer function w is simply the slope parameter of a line, and all processes are white noise. This case has been discussed in great detail in the literature, see e.g. Gini (1921), Frisch (1934) and the surveys by Madansky (1959), Moran (1971), Aigner et al. (1984) and T.W. Anderson (1984). For the multivariable case, which is much more complicated, see Kalman (1982) and Klepper and Leamer (1984).

The static EV model is written as

(2.1)  $\hat{y}_t = a\,\hat{x}_t$ ;  $a \in \mathbb{R}$

together with (1.3) and (1.4), where $(\hat{x}_t)$, $(u_t)$ and $(v_t)$ are white noise and thus

$E\hat{x}_s \hat{x}_t = \delta_{st}\,\sigma_{\hat{x}}$ ;  $Eu_s u_t = \delta_{st}\,\sigma_u$ ;  $Ev_s v_t = \delta_{st}\,\sigma_v$

In addition we assume (1.9), i.e. $Eu_s v_t = 0$.

If we try to write (2.1), (1.3), (1.4) as a "regression" in the observed variables, we obtain:

$y_t = a x_t + (v_t - a u_t)$

But here $E x_t (v_t - a u_t) = -a\,\sigma_u$ and thus in general (ordinary) least squares estimators will not be consistent. Therefore we have to investigate the problem in more detail.

The parameters of interest are $\theta = (a, \sigma_{\hat{x}}, \sigma_u, \sigma_v)$. The relation between these parameters and the second moments of the observations is given by

(2.2)  $\sigma_x = E x_t^2 = \sigma_{\hat{x}} + \sigma_u$

(2.3)  $\sigma_{xy} = E x_t y_t = E \hat{x}_t \hat{y}_t = a\,\sigma_{\hat{x}}$

(2.4)  $\sigma_y = E y_t^2 = a^2 \sigma_{\hat{x}} + \sigma_v$
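The relations (2.2) to (2.4) and the resulting inconsistency of least squares can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names `observed_moments` and `ols_limit` are hypothetical and the numerical values are arbitrary.

```python
def observed_moments(a, s_xhat, s_u, s_v):
    """Second moments of the observations implied by theta, eqs. (2.2)-(2.4)."""
    s_x = s_xhat + s_u             # (2.2)
    s_xy = a * s_xhat              # (2.3)
    s_y = a ** 2 * s_xhat + s_v    # (2.4)
    return s_x, s_xy, s_y

def ols_limit(a, s_xhat, s_u, s_v):
    """Probability limit sigma_xy / sigma_x of the least squares slope of y on x."""
    s_x, s_xy, _ = observed_moments(a, s_xhat, s_u, s_v)
    return s_xy / s_x

# True slope 2.0, input error variance 0.25: the least squares limit is 1.6.
print(ols_limit(2.0, 1.0, 0.25, 0.5))
```

The limit equals $a\,\sigma_{\hat{x}} / (\sigma_{\hat{x}} + \sigma_u)$, so the least squares slope is biased towards zero whenever $\sigma_u > 0$ and $a \neq 0$.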

Thus the problem of identifiability from second moments for this model is whether $\theta$ is uniquely determined from $\sigma_x$, $\sigma_{xy}$, $\sigma_y$. A slightly more general model would be of the form

(2.5)  $b\,\hat{y}_t = a\,\hat{x}_t$

(where a and b are suitably normalized, e.g. by $a^2 + b^2 = 1$) together with (1.3) - (1.4). Loosely speaking, here we allow for the case $a = \infty$ in (2.1). Then the problem of observational equivalence is equivalent to the following "Frisch" problem (see Kalman (1982)): Given the covariance matrix

$K = \begin{pmatrix} \sigma_x & \sigma_{xy} \\ \sigma_{yx} & \sigma_y \end{pmatrix}$

find all decompositions

(2.6)  $K = \hat{K} + \Sigma$

into covariance (i.e. symmetric, nonnegative definite) matrices $\hat{K}$ and $\Sigma$, such that $\hat{K}$ is singular and $\Sigma$ is diagonal. This equivalence is straightforward: here $\hat{K}$ is the covariance matrix of $(\hat{x}_t, \hat{y}_t)$; a and b, after suitable normalization, are defined from the linear dependence relations in $\hat{K}$; and

$\Sigma = \begin{pmatrix} \sigma_u & 0 \\ 0 & \sigma_v \end{pmatrix}$

In the case (2.1) (which excludes the possibility b = 0 in (2.5) and which is the only one we treat here, unless the contrary is stated explicitly)

$\hat{K} = \sigma_{\hat{x}} \begin{pmatrix} 1 & a \\ a & a^2 \end{pmatrix}$

holds.

By the singularity of $\hat{K}$ we have

(2.7)  $\det \hat{K} = \sigma_{\hat{x}}\,\sigma_{\hat{y}} - \sigma_{xy}^2 = 0$

where $\sigma_{\hat{y}} = E\hat{y}_t^2$, and furthermore

(2.8)  $0 \le \sigma_{\hat{x}} \le \sigma_x$

(2.9)  $0 \le \sigma_{\hat{y}} \le \sigma_y$

and these are the only restrictions on $\sigma_{\hat{x}}$ and $\sigma_{\hat{y}}$. Thus the range of pairs $(\sigma_{\hat{x}}, \sigma_{\hat{y}})$ compatible with the given second moments of the observations is a part of a hyperbola, as illustrated in Fig. 3.

[Fig. 3: The range of compatible pairs $(\sigma_{\hat{x}}, \sigma_{\hat{y}})$]

We do assume throughout that

(2.10)  $\sigma_{\hat{x}} > 0$

and that

(2.11)  $\det K > 0$.

Then the range of compatible slope parameters $a = \sigma_{xy}\,\sigma_{\hat{x}}^{-1}$ is given by the intervals

(2.12)
$[\sigma_{xy}\,\sigma_x^{-1},\ \sigma_y\,\sigma_{xy}^{-1}]$  for $\sigma_{xy} > 0$
$[\sigma_y\,\sigma_{xy}^{-1},\ \sigma_{xy}\,\sigma_x^{-1}]$  for $\sigma_{xy} < 0$
$\{0\}$  for $\sigma_{xy} = 0$

Note that the end points of these intervals correspond to the coefficients of the (theoretical) regressions from $x_t$ to $y_t$ and from $y_t$ to $x_t$ respectively. This is an appealing result, as the EV model contains the two regressions as extreme cases (where either $u_t = 0$ or $v_t = 0$). The set of compatible parameters $\sigma_u$ and $\sigma_v$ is obvious from (2.2) and (2.4).

From what we said above, it also follows that every covariance matrix K can be decomposed as in (2.6).
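As an illustration of (2.12), the interval of compatible slopes can be computed directly from the three observed second moments. This is a sketch under the assumption (2.11); the function name `slope_interval` is hypothetical.

```python
def slope_interval(s_x, s_xy, s_y):
    """Interval of slopes a compatible with (sigma_x, sigma_xy, sigma_y), eq. (2.12)."""
    if s_x <= 0 or s_x * s_y - s_xy ** 2 <= 0:
        raise ValueError("requires sigma_x > 0 and det K > 0, cf. (2.11)")
    if s_xy == 0:
        return (0.0, 0.0)
    # End points: regression of y on x, and inverted regression of x on y.
    lower, upper = sorted((s_xy / s_x, s_y / s_xy))
    return (lower, upper)

print(slope_interval(1.25, 2.0, 4.5))    # (1.6, 2.25)
print(slope_interval(1.25, -2.0, 4.5))   # (-2.25, -1.6)
```

For these moments the true slope 2.0 used to generate them (with $\sigma_{\hat{x}} = 1$, $\sigma_u = 0.25$, $\sigma_v = 0.5$) lies strictly inside the interval.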

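Let us summarize:

Theorem 2.1: For every covariance matrix K there exists a corresponding EV system (2.5), (1.3), (1.4) satisfying (1.9). Under the additional assumptions (2.1), (2.10) and (2.11), the set of parameters $\theta = (a, \sigma_{\hat{x}}, \sigma_u, \sigma_v)$ compatible with given second moments of the observations is given by

(2.13)  $\{\theta = (a,\ a^{-1}\sigma_{xy},\ \sigma_x - a^{-1}\sigma_{xy},\ \sigma_y - a\,\sigma_{xy}) \in \mathbb{R}^4 \mid a \in (\mathrm{sign}\,\sigma_{xy}) \cdot [\,|\sigma_{xy}\,\sigma_x^{-1}|,\ |\sigma_y\,\sigma_{xy}^{-1}|\,]\}$  for $\sigma_{xy} \neq 0$

and

$\{\theta = (0,\ \sigma_{\hat{x}},\ \sigma_x - \sigma_{\hat{x}},\ \sigma_y) \in \mathbb{R}^4 \mid 0 < \sigma_{\hat{x}} \le \sigma_x\}$  for $\sigma_{xy} = 0$

This result is due to Gini (1921) and Frisch (1934).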
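A small numerical check of (2.13) for the case $\sigma_{xy} \neq 0$: every slope in the compatible interval yields a parameter vector with nonnegative variances that reproduces the same observed moments. The function names below are hypothetical and the numbers are arbitrary.

```python
def theta_for_slope(a, s_x, s_xy, s_y):
    """Parameter vector (a, s_xhat, s_u, s_v) from (2.13), assuming s_xy != 0."""
    s_xhat = s_xy / a
    return (a, s_xhat, s_x - s_xhat, s_y - a * s_xy)

def moments_from_theta(theta):
    """Observed moments implied by theta via (2.2)-(2.4)."""
    a, s_xhat, s_u, s_v = theta
    return (s_xhat + s_u, a * s_xhat, a ** 2 * s_xhat + s_v)

s_x, s_xy, s_y = 1.25, 2.0, 4.5
for a in (1.6, 2.0, 2.25):               # both end points and one interior slope
    theta = theta_for_slope(a, s_x, s_xy, s_y)
    print(theta, moments_from_theta(theta))
```

At the lower end point the input error variance vanishes and at the upper end point the output error variance vanishes, in line with the two regressions being the extreme cases.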

If we drop the assumption $\sigma_{\hat{x}} > 0$ and consider the decomposition (2.6) (i.e. the more general case (2.5)), then we have the following picture: In the case $\sigma_{xy} \neq 0$, $\sigma_{\hat{x}} > 0$ must hold and in every decomposition (2.6), $\hat{K}$ must have rank equal to one. For otherwise $\hat{K} = 0$ and $K = \Sigma$ would not be diagonal. If $\sigma_{xy} = 0$ and $K \neq 0$, then $\hat{K}$ may either have rank equal to one, or $\hat{K} = 0$ and thus $K = \Sigma$. In the latter case the errors correspond to a maximal extraction of individual factors of the observed variables.

For singular K, the Frisch problem is trivial, because then $\hat{K} = K$ defines the unique decomposition (2.6) whenever $\sigma_{xy} \neq 0$.

If $x_t$ and $y_t$ are not necessarily one dimensional, then an analogous Frisch problem can be formulated. Then the main problems are to determine the maximum corank, m* say, of $\hat{K}$ among all decompositions (2.6) of K (i.e. to determine the maximum number of linear relations between the true variables) and to characterize the set of all (suitably normalized) linear relations corresponding to given K.

These problems have not yet been solved for the general, multivariable case (see Kalman (1982), Klepper and Leamer (1984)) and they will not be treated here.

Now, let us turn again to the one dimensional case. In the case of Gaussian observations $(x_t, y_t)$, there is no information from the data exceeding the information obtained from the second moments; thus e.g. for the slope parameter a there is an "interval of uncertainty"; therefore, of course, the model is not identifiable in this case.

There are several possibilities to overcome this "basic" nonidentifiability of the EV model. Again the reader is referred to the survey papers by Madansky (1959) and Moran (1971). As is easily seen, if $\sigma_u$ or $\sigma_v$ or $\sigma_u\,\sigma_v^{-1}$ are known then we have identifiability. Assumptions of this kind may be justified in physical applications where either the properties of the measurement instruments are a priori known, or where the measurements can be repeated whereas the true variables are kept constant. However, for most applications such assumptions cannot be justified.

Another possibility is to utilize (in the non Gaussian case) information coming from moments of order greater than two (Geary (1942), Reiersøl (1950)). Here, for simplicity, we assume throughout that all moments up to a suitable large order exist. For technical reasons, we deal with cumulants rather than with moments (see e.g. Kendall and Stuart (1969)). If $z_1, \ldots, z_n$ are random variables, then their joint n-th order cumulant $c_{z_1 \ldots z_n}$ is given by the coefficient of $(i)^n t_1 \cdots t_n$ in the Taylor series expansion of $\ln E \exp\left(i \sum_{j=1}^{n} z_j t_j\right)$ about the origin. $c_{z_1 \ldots z_n}$ has the following properties (see e.g. Brillinger (1981)):

$c_{z_1} = E z_1$ ;  $c_{z_1 z_2} = E(z_1 - E z_1)(z_2 - E z_2)$

$c_{z_1 \ldots z_n}$ is symmetric in its arguments.

$c_{(a_1 z_1 + a_2 z_2)\, z_3 \ldots z_n} = a_1\, c_{z_1 z_3 \ldots z_n} + a_2\, c_{z_2 z_3 \ldots z_n}$

If the random variables $(z_1, \ldots, z_n)$ and $(w_1, \ldots, w_n)$ are independent, then

$c_{(z_1 + w_1) \ldots (z_n + w_n)} = c_{z_1 \ldots z_n} + c_{w_1 \ldots w_n}$

Now, in addition we assume

(2.14)  $\hat{x}_t$, $t \in \mathbb{Z}$, are independent and identically distributed, and so are the $u_t$ and the $v_t$, and the processes $(\hat{x}_t)$, $(u_t)$ and $(v_t)$ are mutually independent.

Then we have the following relations between the n-th order cumulants:

(2.15)  $c_{x^n} = c_{\hat{x}^n} + c_{u^n}$

(2.16)  $c_{y^r x^{(n-r)}} = c_{\hat{y}^r \hat{x}^{(n-r)}} + c_{v^r u^{(n-r)}} = c_{\hat{y}^r \hat{x}^{(n-r)}} = a \cdot c_{\hat{y}^{(r-1)} \hat{x}^{(n+1-r)}} = a \cdot c_{y^{(r-1)} x^{(n+1-r)}}$ ;  $r = 2, \ldots, n-1$

(2.17)  $c_{y^n} = c_{\hat{y}^n} + c_{v^n}$

where we have used the notation $c_{z^r w^{n-r}} = c_{z \ldots z\, w \ldots w}$ (with z repeated r times and w repeated n-r times) and the fact that $c_{v^r u^{n-r}} = 0$ for $r > 0$, $n - r > 0$, since v and u are independent.

Let us assume that $\hat{x}_t$ is non Gaussian; then there is an n > 2 such that $c_{\hat{x}^n} \neq 0$. If in addition $a \neq 0$, then $c_{\hat{y}^{(r-1)} \hat{x}^{(n+1-r)}} = a^{r-1} c_{\hat{x}^n} \neq 0$ and, using (2.16),

(2.18)  $a = \dfrac{c_{y^r x^{(n-r)}}}{c_{y^{(r-1)} x^{(n+1-r)}}}$

and thus a is uniquely determined from the cumulants of the observed processes. Note that for n > 2 (as opposed to the case n = 2) there are at least two cumulants of the form $c_{y^r x^{(n-r)}}$, $r > 0$, $n - r > 0$, and for these (2.16) holds whenever $r - 1 > 0$. Once a is determined, $\sigma_{\hat{x}}$, $\sigma_u$ and $\sigma_v$ can be uniquely determined from (2.2) - (2.4). Thus we have shown:

Theorem 2.2: Consider the static EV model (2.1), (1.3), (1.4) together with the assumptions (2.14). Then, under the assumptions that $\sigma_{\hat{x}} > 0$, $\hat{x}_t$ is non Gaussian and $a \neq 0$, the model is identifiable.

This theorem can be extended to the multivariate case in a straightforward manner.

If, instead of assuming that $(u_t)$ and $(v_t)$ are independent, we postulate that $(u_t, v_t)$ is Gaussian, then $c_{u^r v^{n-r}} = 0$ whenever n > 2, and thus (2.16) holds for all n > 2 and for all r, and therefore a is uniquely determined provided that there is an n > 2 such that $c_{\hat{x}^n} \neq 0$.
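The identification of the slope from higher order cumulants can be illustrated by simulation. The sketch below uses (2.18) with n = 3 and r = 2, where the joint third-order cumulants of centred variables equal the corresponding third moments. The skewed input distribution (a centred exponential), the Gaussian errors, the sample size and all function names are assumptions made for the illustration only.

```python
import numpy as np

def simulate_static_ev(a, n, seed):
    """Draw n observations from (2.1), (1.3), (1.4) with a skewed true input."""
    rng = np.random.default_rng(seed)
    x_hat = rng.exponential(1.0, n) - 1.0   # non Gaussian, mean 0, variance 1
    u = rng.normal(0.0, 0.5, n)             # input errors, variance 0.25
    v = rng.normal(0.0, 0.5, n)             # output errors, variance 0.25
    return x_hat + u, a * x_hat + v

def third_order_slope(x, y):
    """Sample version of (2.18) for n = 3, r = 2: c_{yyx} / c_{yxx}."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.mean(yc * yc * xc) / np.mean(yc * xc * xc)

def ols_slope(x, y):
    """Least squares slope of y on x, which is attenuated by the input errors."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.mean(xc * yc) / np.mean(xc * xc)

x, y = simulate_static_ev(a=2.0, n=200_000, seed=0)
print(third_order_slope(x, y), ols_slope(x, y))
```

With these settings the cumulant ratio should be close to the true slope 2.0, while the least squares slope should be close to its attenuated limit 1.6.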

Now, let us make a few remarks on estimation: The negative logarithmic Gaussian likelihood function of the static model (2.1), (1.3), (1.4) is of the form (where constants have been neglected):

(2.19)  $L_T(\theta) = T \log \det K(\theta) + \sum_{t=1}^{T} (x_t, y_t)\, K^{-1}(\theta)\, (x_t, y_t)'$

Thereby $K(\theta)$ is the covariance matrix K in (2.6) corresponding to the parameters $\theta$, and T is the sample size. Thus the corresponding maximum likelihood estimator (MLE), obtained by minimizing $L_T(\theta)$, is given by

$K_T = \begin{pmatrix} \sigma_{x,T} & \sigma_{xy,T} \\ \sigma_{xy,T} & \sigma_{y,T} \end{pmatrix} = \dfrac{1}{T} \sum_{t=1}^{T} \begin{pmatrix} x_t \\ y_t \end{pmatrix} (x_t, y_t)$, say.

The corresponding set of parameters, i.e. the estimate of the set of all observationally equivalent parameters corresponding to the true K, is then, according to Theorem 2.1, given by

(2.20)  $\{\theta = (a,\ a^{-1}\sigma_{xy,T},\ \sigma_{x,T} - a^{-1}\sigma_{xy,T},\ \sigma_{y,T} - a\,\sigma_{xy,T}) \mid a \in [\sigma_{xy,T}\,\sigma_{x,T}^{-1},\ \sigma_{y,T}\,\sigma_{xy,T}^{-1}]\}$

(for the case $\sigma_{xy,T} > 0$).

In the non Gaussian case, of course, (2.18) can be used to define a consistent estimator for a from the sample cumulants. Here the problem arises which cumulants should be selected; also, information coming from sample cumulants of different order can be combined. The reader is referred to Drion (1951) and Scott (1950). Note also that the estimators obtained from (2.18) do not necessarily satisfy the restrictions coming from the second moments.

3. Second Moments and Dynamic Models: The General Case

From now on, linear dynamic systems are considered. In this and in the two following sections, only the information coming from the second moments of the observed processes $(x_t, y_t)$ is used.

For the moment let $x_t$ and $y_t$ be not necessarily one dimensional. Let $z_t = (x_t, y_t)'$, $\hat{z}_t = (\hat{x}_t, \hat{y}_t)'$, $w_t = (u_t, v_t)'$. The general form of a linear dynamic system is:

(3.1)  $\lim_{N \to \infty} \sum_{i=-N}^{N} w_i^{(N)} \hat{z}_{t-i} = 0$,  $w_i^{(N)} \in \mathbb{R}^{m \times n}$

; m
where

$w(B) = \lim_{N \to \infty} \sum_{i=-N}^{N} w_i^{(N)} B^i \neq 0$

is the corresponding transfer function. In (3.1), analogously to (2.5) for the static case, we allow for a completely symmetric treatment of the variables. In this section we assume that the spectral densities

$f = \begin{pmatrix} f_x & f_{xy} \\ f_{yx} & f_y \end{pmatrix}$,  $\hat{f} = \begin{pmatrix} f_{\hat{x}} & f_{\hat{x}\hat{y}} \\ f_{\hat{y}\hat{x}} & f_{\hat{y}} \end{pmatrix}$

and $\Sigma$ of $(x_t, y_t)$, $(\hat{x}_t, \hat{y}_t)$ and of $(u_t, v_t)$ respectively exist and that (1.9) holds in the sense that $\Sigma$ is diagonal. Then for an EV system (3.1), (1.3), (1.4) we have a decomposition analogous to (2.6):

(3.2)  $f = \hat{f} + \Sigma$

where $\hat{f}$ is singular (since $w(e^{-i\lambda})\,\hat{f}(\lambda) = 0$ by (3.1)) and $\Sigma$ is diagonal.

Note that a matrix $f: [-\pi, \pi] \to \mathbb{C}^{n \times n}$ is a spectral density of a (real) stationary process if and only if it is an integrable, nonnegative definite Hermitian matrix satisfying $f(\lambda) = f(-\lambda)'$. Conversely, if we commence from f, every decomposition (3.2), where $\hat{f}$ is a singular spectral density matrix and where $\Sigma$ is a diagonal spectral density matrix (we always omit the statement $\lambda$-a.e.), corresponds to an EV system (3.1), (1.3), (1.4): Since $\hat{f}$ is singular, a transfer function can be found satisfying

(3.3)  $w(e^{-i\lambda})\,\hat{f}(\lambda) = 0$

Thereby a suitable normalization can always be chosen such that the limit in (3.1) exists. As can easily be seen, for every spectral density matrix f a decomposition (3.2) exists; take e.g. $\Sigma(\lambda) = \lambda_{\min}(\lambda) \cdot I$, where $\lambda_{\min}(\lambda)$ is the smallest eigenvalue of $f(\lambda)$.
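The decomposition obtained by subtracting the smallest eigenvalue can be verified numerically at a single frequency. The following sketch works on one Hermitian, nonnegative definite matrix standing in for $f(\lambda)$ at a fixed $\lambda$; the function name and the example matrix are illustrative assumptions.

```python
import numpy as np

def min_eigenvalue_decomposition(f):
    """Split a Hermitian nonnegative definite matrix f into (f_hat, noise)
    with noise = lambda_min * I diagonal and f_hat = f - noise singular."""
    f = np.asarray(f, dtype=complex)
    lam_min = np.linalg.eigvalsh(f)[0]      # eigvalsh returns ascending eigenvalues
    noise = lam_min * np.eye(f.shape[0])
    return f - noise, noise

f = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])           # eigenvalues 1 and 4
f_hat, noise = min_eigenvalue_decomposition(f)
print(np.linalg.eigvalsh(f_hat))            # smallest eigenvalue is (numerically) zero
```

Here the singular part has eigenvalues 0 and 3, so it is nonnegative definite of rank one, and the diagonal part is the identity matrix.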

Clearly, for given f, in general the decomposition (3.2) is not unique. Analogously to the static multivariate case, the main problem then is to determine the maximum possible number of (independent) linear dynamic relations between the $\hat{z}_t$, m* say, over all decompositions for given f, and to describe the set of all observationally equivalent w. Note that (3.2) may also be formulated as a dynamic factor analysis model.

Now again we restrict ourselves to the case n = 2, i.e. when $x_t$ and $y_t$ are one-dimensional. We assume

(3.4)  $f_{\hat{x}}(\lambda) > 0 \quad \forall \lambda$

and that (3.1) can be written as

$\sum_{i=-\infty}^{\infty} w_i \hat{z}_{t-i} = 0$.

Clearly, for n = 2, $\hat{f}(\lambda)$ can either have rank one or rank zero. In the second case $\hat{f}(\lambda) = 0$ and $f_{xy}(\lambda) = 0$. Imposing (3.4) implies m* = 1 and $\hat{f}(\lambda)$ has rank 1 for all $\lambda$. (3.4) is automatically fulfilled if $f_{xy}(\lambda) \neq 0$ for all $\lambda$. Thus under (3.4) the system can always be written as (1.1), where the transfer function in (3.1) takes the form $(w, -1)$ and is unique for given $\hat{f}$, and $f_{xy}(\lambda) = 0$ implies $w(e^{-i\lambda}) = 0$.

If f itself is singular, then $\hat{f} = f$ and $\Sigma = 0$ defines a decomposition corresponding to an error-free system. This decomposition is unique whenever $f_{xy}(\lambda) \neq 0$. For $\lambda$'s where $f_{xy}(\lambda) = 0$ we may have e.g. $f_y(\lambda) = 0$ ($f_x(\lambda) > 0$ and $f_{\hat{x}}(\lambda) > 0$), and $f_u(\lambda) > 0$ gives rise to another decomposition. Of course in this case we have $w(e^{-i\lambda}) = 0$ for the corresponding transfer function. For the rest of the paper we always assume that $f(\lambda)$ is nonsingular on a set of Lebesgue measure greater than zero.

Besides the transfer function w, the other characteristics of interest are $f_{\hat{x}}$, $f_u$, $f_v$.

Analogously to the static case, the set of pairs $(f_{\hat{x}}, f_{\hat{y}})$ compatible with given f satisfies

(3.5)  $0 < f_{\hat{x}} \le f_x$,  $f_{\hat{x}}(\lambda) = f_{\hat{x}}(-\lambda)$;  $f_{\hat{x}}$ is measurable

(3.6)  $0 \le f_{\hat{y}} \le f_y$,  $f_{\hat{y}}(\lambda) = f_{\hat{y}}(-\lambda)$;  $f_{\hat{y}}$ is measurable

and, since $\hat{f}$ is singular,

(3.7)  $|f_{xy}|^2 = f_{\hat{x}}\, f_{\hat{y}}$

and (3.5), (3.6), (3.7) are the only restrictions on $(f_{\hat{x}}, f_{\hat{y}})$. Thus we have (Anderson and Deistler (1984), Deistler (1985a)):

Theorem 3.1: Consider the linear dynamic EV system (1.1), (1.3), (1.4). Under the additional assumptions (1.9), (1.10) and (3.4), the set of all transfer functions w satisfying

(3.8)  $|f_{yx}(\lambda)| \cdot f_x^{-1}(\lambda) \le |w(e^{-i\lambda})| \le f_y(\lambda) \cdot |f_{xy}(\lambda)|^{-1}$,  $\varphi(w(e^{-i\lambda})) = \varphi(f_{yx}(\lambda))$  for $f_{xy}(\lambda) \neq 0$

where $\varphi(z)$ denotes the phase of the complex number z, and

(3.9)  $w(e^{-i\lambda}) = 0$  for $f_{xy}(\lambda) = 0$

is the set of all transfer functions w corresponding to given f. The corresponding set of the other characteristics of interest, $f_{\hat{x}}$, $f_u$, $f_v$, satisfies the following relations:

(3.10)  $f_{\hat{x}}(\lambda) = f_{yx}(\lambda) \cdot w^{-1}(e^{-i\lambda})$  for $f_{xy}(\lambda) \neq 0$;  $0 < f_{\hat{x}}(\lambda) \le f_x(\lambda)$  for $f_{xy}(\lambda) = 0$

(3.11)  $f_u = f_x - f_{\hat{x}}$

(3.12)  $f_v = f_y - w(e^{-i\lambda}) \cdot f_{xy}$
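The bounds (3.8) and the relations (3.10) to (3.12) can be evaluated frequency by frequency. The sketch below does this for the spectral density values at one fixed frequency with $f_{xy} \neq 0$; the function names and the numbers are illustrative assumptions, not part of the theorem.

```python
import numpy as np

def gain_band_and_phase(f_x, f_xy, f_y):
    """Lower gain bound, upper gain bound and phase of w from (3.8), f_xy != 0."""
    f_yx = np.conj(f_xy)
    return abs(f_yx) / f_x, f_y / abs(f_xy), np.angle(f_yx)

def characteristics_for_gain(gain, f_x, f_xy, f_y):
    """(w, f_xhat, f_u, f_v) for a chosen admissible gain, via (3.10)-(3.12)."""
    f_yx = np.conj(f_xy)
    w = gain * np.exp(1j * np.angle(f_yx))   # phase is fixed by f_yx
    f_xhat = (f_yx / w).real                 # (3.10)
    f_u = f_x - f_xhat                       # (3.11)
    f_v = (f_y - w * f_xy).real              # (3.12)
    return w, f_xhat, f_u, f_v

f_x, f_xy, f_y = 2.0, 1.0 - 1.0j, 3.0
g_lo, g_hi, phase = gain_band_and_phase(f_x, f_xy, f_y)
print(g_lo, g_hi, phase)
print(characteristics_for_gain(g_lo, f_x, f_xy, f_y))   # f_u vanishes
print(characteristics_for_gain(g_hi, f_x, f_xy, f_y))   # f_v vanishes
```

At the lower gain bound the input error spectrum is zero and at the upper bound the output error spectrum is zero, which mirrors the role of the two regressions in the static case.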

Theorem 3.1 is the dynamic analogon to Theorem 2.1. It shows that the phase of the transfer function is uniquely determined from f, whereas the gain of w may vary in a band whose boundaries (which depend on frequency) correspond to the dynamic regressions when either $u_t = 0$ or $v_t = 0$.

4. Causality

We here discuss two aspects of causality in the EV setting: First we consider the kind of additional identifying information obtained from an a priori causality assumption, i.e. if in (1.2) $w_i = 0$, $i < 0$ (and thus the summation ranges from zero to infinity only). Second, the problem of what can be said about the causality status of the system, given the second moments of the observations, is treated.

Let us introduce some notation: For a polynomial

$p(B) = \sum_{i=0}^{\delta p} p_i B^i$,  $p_i \in \mathbb{R}$,  $B \in \mathbb{C}$

by $\delta p$ we denote its degree and by $p^*$ we denote the rational function

$p^*(B) = p(B^{-1}) = \sum_{i=0}^{\delta p} p_i B^{-i}$

Let $\delta p_0$ denote the multiplicity of the zero of p(B) at B = 0; by $\tilde{p}(B)$ we denote the polynomial defined as

$\tilde{p}(B) = p^*(B) \cdot B^{\delta p}$

$\tilde{p}$ has degree equal to $\delta p - \delta p_0$. If $B_1, \ldots, B_{\delta p - \delta p_0}$ are the nonzero roots of p, then $B_1^{-1}, \ldots, B_{\delta p - \delta p_0}^{-1}$ are the roots of $\tilde{p}$. $p_0 \neq 0$ implies $\tilde{\tilde{p}} = p$.
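In terms of coefficient vectors, forming $\tilde{p}$ amounts to reversing the coefficients of p and dropping the vanishing highest-order terms. The sketch below illustrates this, with coefficients stored in ascending order of powers of B; the helper names are hypothetical.

```python
import numpy as np

def reversed_polynomial(p):
    """Coefficients (ascending powers of B) of p~(B) = p(1/B) * B**deg(p)."""
    p = np.trim_zeros(np.asarray(p, dtype=float), trim="b")  # len(p) - 1 = degree
    # Reversing gives B**deg * p(1/B); zeros of p at B = 0 become vanishing
    # high-order coefficients, which are dropped.
    return np.trim_zeros(p[::-1], trim="b")

def roots_ascending(p):
    """Roots of a polynomial given by ascending coefficients."""
    return np.roots(np.asarray(p, dtype=float)[::-1])

p = [0.0, 1.0, -0.75, 0.125]        # B (1 - 0.5 B)(1 - 0.25 B): roots 0, 2, 4
p_tilde = reversed_polynomial(p)    # 0.125 - 0.75 B + B**2
print(p_tilde, roots_ascending(p_tilde))
```

Here $\delta p = 3$ and $\delta p_0 = 1$, so $\tilde{p}$ has degree 2, and its roots 0.5 and 0.25 are the reciprocals of the nonzero roots 2 and 4 of p.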

Furthermore, we define $p^+$ and $p^-$ respectively by

$p = p^+ \cdot p^- \cdot B^{\delta p_0}$

and

$p^+(B) \neq 0$, $|B| < 1$;  $p^-(B) \neq 0$, $|B| \ge 1$;  $p^-(0) = 1$

where $p^+$ and $p^-$ are required to be polynomials.

As is well known, every rational function f of the form

$f(e^{-i\lambda}) = p_1(e^{-i\lambda})\, p_2(e^{-i\lambda})^{-1}\, p_3^*(e^{-i\lambda})\, p_4^*(e^{-i\lambda})^{-1}$

defined on the unit circle of the complex plane has a unique rational extension to $\mathbb{C}$, given by

(4.1)  $f(B) = p_1(B)\, p_2(B)^{-1}\, p_3^*(B)\, p_4^*(B)^{-1}$

In this section we assume that the transfer function w and the spectra considered are rational, i.e.

$w(B) = a^{-1}(B)\, b(B)$

(4.2)  $f_{\hat{x}} = d^{-1} e\, \sigma_\varepsilon\, e^* d^{-1*}$,  $f_u = c^{-1} h\, \sigma_\eta\, h^* c^{-1*}$,  $f_v = f^{-1} g\, \sigma_\nu\, g^* f^{-1*}$

where

$a(B) = \sum_{i=0}^{\delta a} a_i B^i$,  $b(B) = \sum_{i=0}^{\delta b} b_i B^i$,
$d(B) = \sum_{i=0}^{\delta d} d_i B^i$,  $e(B) = \sum_{i=0}^{\delta e} e_i B^i$,
$c(B) = \sum_{i=0}^{\delta c} c_i B^i$,  $h(B) = \sum_{i=0}^{\delta h} h_i B^i$,
$f(B) = \sum_{i=0}^{\delta f} f_i B^i$,  $g(B) = \sum_{i=0}^{\delta g} g_i B^i$

are polynomials, where in (4.2) we consider the spectra to be defined (by the rational extension described above) on $\mathbb{C}$ (rather than on $|B| = 1$ or on $[-\pi, \pi]$), where a factor of $2\pi$ has been omitted, and where we in addition assume:

(4.3)  $a(B) \neq 0$, $|B| = 1$;  $d(B) \neq 0$, $|B| \le 1$;  $e(B) \neq 0$, $|B| < 1$;  $f(B) \neq 0$, $|B| \le 1$;  $g(B) \neq 0$, $|B| < 1$

such that $\sigma_\varepsilon$, $\sigma_\eta$, $\sigma_\nu$ are the respective innovation variances. $e(B) \neq 0$, $|B| = 1$ then is equivalent to $f_{\hat{x}}(B) \neq 0$, $|B| = 1$, i.e. to (3.4).

And finally it is assumed:

(4.4)  a, b are relatively prime and so are d, e and c, h and f, g.

(4.5)  $a^+(0) = d(0) = e(0) = c(0) = h(0) = f(0) = g(0) = 1$

If we impose (3.4), then the assumptions (4.3), (4.4), (4.5) are costless in the sense that they do not restrict the class of transfer functions w and spectra $f_{\hat{x}}$, $f_u$, $f_v$ considered, but serve only as norming conditions to obtain unique parameters. It should be stressed that concerning b we neither impose the miniphase assumption $b(B) \neq 0$, $|B| < 1$, nor do we assume b(0) = 1. The assumption $a(B) \neq 0$, $|B| = 1$ guarantees the existence of a stationary solution (1.1) of $a(B)\,\hat{y}_t = b(B)\,\hat{x}_t$.

(4.6)  $a(B) \neq 0$, $|B| \le 1$

then is the causality assumption.

The rationality assumptions are often imposed even if we do not know a priori that the true transfer function and spectra are rational, in order to approximate the true quantities by quantities which can be described by a finite number of parameters.

Note that even if f is rational, the rationality assumption for the matrices $\hat{f}$ and $\Sigma$ in the decomposition (3.2) is an additional a priori restriction, i.e. in general there are also non-rational $\hat{f}$, $\Sigma$. This is easily seen as, e.g. for $f_x(\lambda) > 0$, $f_y(\lambda) > 0$, $\det f > 0$ $\forall \lambda$, every $f_{\hat{x}}$ such that

$\dfrac{|f_{xy}(\lambda)|^2}{f_y(\lambda)} \le f_{\hat{x}}(\lambda) \le f_x(\lambda)$,

where $f_{\hat{x}}$ is measurable, defines a decomposition (3.2) via (3.7), and clearly $f_{\hat{x}}$ can be chosen so as to be non-rational, as, by the nonsingularity of $f(\lambda)$, the strict inequality

$|f_{xy}(\lambda)|^2 < f_x(\lambda)\, f_y(\lambda)$

holds. On the other hand, if f is rational, a rational decomposition always exists: Take for instance $f_{\hat{x}} = f_x$; then (3.7), (3.11) and (3.12) define rational $f_{\hat{y}}$, $f_u$ and $f_v$.

Consider the following example (Anderson and Deistler (1984)): Let

(4.7)
$f_x(\lambda) = c_1 > 0$
$f_{xy}(\lambda) = c_2 \cdot (1 + b e^{-i\lambda})^*$ ;  $|b| < 1$,  $c_1 > c_2 > 0$
$f_y(\lambda) = |1 + b e^{-i\lambda}|^2 \cdot c_2 + c_3$ ;  $c_3 > 0$

Then the set of all feasible $f_{\hat{x}}$ is given by

$\dfrac{c_2^2}{c_2 + c_3\, |1 + b e^{-i\lambda}|^{-2}} \le f_{\hat{x}}(\lambda) \le c_1$,  $f_{\hat{x}}$ measurable

[Fig. 4: Range of feasible $f_{\hat{x}}$]
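The band of feasible $f_{\hat{x}}$ in example (4.7) can be tabulated on a frequency grid. The sketch below computes the lower bound as $|f_{xy}|^2 / f_y$ and compares it with the closed form given in the text; the parameter values and the function name are arbitrary choices for illustration.

```python
import numpy as np

def feasible_band_example(lam, b, c1, c2, c3):
    """Lower and upper bound for f_xhat in example (4.7) on the grid lam."""
    z = 1.0 + b * np.exp(-1j * lam)
    f_xy = c2 * np.conj(z)
    f_y = np.abs(z) ** 2 * c2 + c3
    lower = np.abs(f_xy) ** 2 / f_y            # from (3.6) and (3.7)
    upper = np.full(lam.shape, float(c1))      # f_xhat <= f_x = c1
    return lower, upper

lam = np.linspace(-np.pi, np.pi, 201)
lower, upper = feasible_band_example(lam, b=0.5, c1=2.0, c2=1.0, c3=0.5)
closed_form = 1.0 ** 2 / (1.0 + 0.5 * np.abs(1.0 + 0.5 * np.exp(-1j * lam)) ** -2)
print(np.max(np.abs(lower - closed_form)))     # agreement up to rounding
```

Since the lower bound stays below $c_2$ and $c_2 < c_1$, the band has positive width at every frequency, which is why non-rational choices of $f_{\hat{x}}$ are available.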

The range of feasible $f_{\hat{x}}$ in this case contains all rational spectral densities of the form

c. p (e-il-z_.) .P(e-il-zj) I J I

f~(l)

. ~q(e-il_wj) c q. z~e -il -w.;. °1 ] I for arbitrarily

chosen $|z_j| < 1$, $|w_j| < 1$, p and q, and for suitably chosen c, $c_0$. Without restriction of generality we assume $z_j \neq w_i$.

Under the rationality assumptions of this section, the relation between the second moments of the observations and the characteristics of interest is of the form:

(4.8)  $f_x = d^{-1} e\, \sigma_\varepsilon\, e^* d^{-1*} + c^{-1} h\, \sigma_\eta\, h^* c^{-1*}$

(4.9)  $f_{yx} = a^{-1} b\, d^{-1} e\, \sigma_\varepsilon\, e^* d^{-1*}$

(4.10)  $f_y = a^{-1} b\, d^{-1} e\, \sigma_\varepsilon\, e^* d^{-1*}\, b^* a^{-1*} + f^{-1} g\, \sigma_\nu\, g^* f^{-1*}$

In a first step, we analyse the information about w (and $f_{\hat{x}}$) coming from $f_{yx}$: If w, $f_{\hat{x}}$ correspond to an EV system also satisfying (4.9), then we have: (4.11)

(4.12)

: w.f . ilo cf2 2f 1 1 f - f2 w c-lf1 if 1 1.B6f2- flf

where, using an obvious notation --I c = qeoe , and thus fi are polynomials (4.13)

fi(B)

+ 0

fl = ed,

f2 = ed

satisfying IBI < 1

;

i = 1,2

(4.9)

57

(4.14)

f. (0) 1

= I

;

i

=

1,2

and

(4.15)

c > 0.

Conversely, defines

if

let

Then

n = ~b

less

than

f

yx

us w r i t e

one minus

4.1:

(iii)

the

(i) a n d

(ii)

assumptions

exist

n = 6b

+ 6b O - ~a

with

same

then

is b o t h

function

fyx

(4.6)

if a n d

(4.14)

and

of

zeros

our

assumptions)

of w of m o d u l u s

-

following

(4.5)

that

invariant

is a t r a n s f e r

function

same

satisfying

(4.11)

one.

given

Then:

to the

fl,f2

than

for

theorem:

hold.

corresponding

polynomials

less

determined

- ~a ° is a n

causal

there

fyx (4.13)

holds

for

all

EV systems

there

which

only

which

that

then

if t h e r e

(corresponding

to

function

which

is c a u s a l ,

but

is m i n i p h a s e .

is n o c a u s a l

known

holds,

w

and miniphase

is a t r a n s f e r

function

it is a p r i o r i if

in the

c > 0 such

there

If n < 0, t h e n

i.e.

obviously

f

If n > 0, t h e n

If

(4.12)

of w of m o d u l u s

(4.1)

if t h e r e

and a constant

is a t r a n s f e r

(vi)

and

satisfying

n is u n i q u e l y

if a n d o n l y

which

(and

of p o l e s

see t h a t

functions

no transfer

(v)

we

shown

Let

If n = 0,

(4.11)

- ~a O is t h e n u m b e r

the n u m b e r

(4.13)

the

then

(4.9)

transfer

fxy)

(iv)

via

yx

w and w are

(4.14)

(ii)

f

hold,

w = b + b - B ~ b ° ( a + a - B~a°) -I

. Thus we have

(i)

(4.15)

+ ~b O - ~a

(4.11) a n d

Theorem

-

a w a n d f^ g i v i n g x

Now,

From

(4.13)

transfer

function,

but

there

is m i n i p h a s e

the

transfer

w = a-lb exists

functions

and w correspond

a polynomial

f2

are causal, to the

satisfying

same (4.13)

58

(4.16)

0 < $f2 <- n

and a c o n s t a n t c > 0 such that = c. (b+~ -) (a+~ - ) - 1 . f 2 ~ 2 . B n - ~ f 2

(4.17)

holds Proof:

If in

(4.11) w e take f2 = a w = c ( b + ~ -)

and fl

(a+~-)-lB n

where $(b^{+}\tilde b^{-})(a^{+}\tilde a^{-})^{-1}$ is a causal and miniphase transfer function; together with (ii) this implies (iii) - (v), and then also (vi) is easily seen.

This result has been stated in Deistler (1985a) and Deistler (1985b). Partly more general results have been given for the causal, not necessarily rational single input - single output case in Anderson, B.D.O. (1985) and for a causal multivariable case in Green and Anderson (1985).

From (4.17) we see that the causality assumption gives a substantial reduction of the set of all transfer functions compatible with given $f_{yx}$. If n = 0, then in the causal case, w is unique up to multiplication by a positive constant (this has been pointed out by Hinich (1983) and Anderson, B.D.O. (1985)). An estimation procedure for the case n = 0 has been developed by Hinich (1983) and Hinich and Weber (1984).

5. Conditions for Identifiability from the Second Moments of the Observations

This section consists of two parts: In the first part, we investigate the additional information coming from (4.8) and (4.10). So this is the second step of the analysis of the previous section for the rational case. Special emphasis is put on identifiability. In the second part we give some other conditions for identifiability.

We have (see Anderson and Deistler (1984), Anderson, B.D.O. (1985), Deistler (1985a) (1985b), Maravall (1979), Nowak (1983)):

Theorem 5.1: Let the assumptions (4.1) - (4.5) hold. Then:

(i) If the transfer functions are a priori known to be causal and if either n = 0 or if

(5.1) d, b are relatively prime

(5.2) a, e are relatively prime

(5.3) $b^{+}, \tilde b^{-}$ are relatively prime

then w is uniquely determined from f under each of the following conditions:

(5.4) d, c are relatively prime and $\delta d > 0$

(5.5) a·d, f are relatively prime and $\delta ad > 0$

(5.6) $\delta d = 0$ and $\delta e > \delta h - \delta c$

(5.7) $\delta ad = 0$ and $\delta e + \delta b > \delta g - \delta f$

(ii) If w is a priori assumed to be causal and if d and c are a priori assumed to be relatively prime, then there is only a finite number of factors $f_2$ in (4.14) compatible with given $f_x$.

(iii) If the transfer functions are not necessarily causal, and if the assumptions (5.1) - (5.3),

(5.8) a, e are relatively prime

(5.9) $a^{+}, \tilde a^{-}$ are relatively prime

hold, then under (5.4) or (5.5) or (5.6) or (5.7) w is unique.

(iv) If $w, f_{\hat x}$ corresponds to given $f_{yx}$, then all $cw, c^{-1}f_{\hat x}$, where c satisfies

(5.10) $0 < c_{\min} \le c \le c_{\max}$

with $c_{\min}$ and $c_{\max}$ defined by

(5.11) $\min_{|B|=1} \bigl(f_x(B) - c_{\min}^{-1} f_{\hat x}(B)\bigr) = 0$

and

(5.12) $\min_{|B|=1} \bigl(f_y(B) - c_{\max}\,|w(B)|^2 f_{\hat x}(B)\bigr) = 0$

correspond to given f, and for all other c, $cw, c^{-1}f_{\hat x}$ does not correspond to given f.

Proof:

(i): If n = 0, then as has already been stated, w is uniquely determined from (4.9) up to multiplication by a positive constant. The same holds under (5.1) - (5.3): Due to (5.1) and (5.2) no pole-zero cancellations on the r.h.s. in (4.9) can occur and thus a and d are uniquely determined from the poles in $f_{yx}$. By (5.3), e then is uniquely determined from those zeros of $a\,d\,d^{*}f_{yx}$, $B_i$ say, where also $B_i^{-1}$ is a zero and thus b and w are unique up to multiplication by a positive constant. From (4.8) we have

(5.13) $d f_x d^{*} = e\,\sigma_\varepsilon e^{*} + d\,c^{-1}h\,\sigma_u h^{*}c^{-1*}d^{*}$

If (5.4) holds, then there exists at least one zero of d, $B_1$ say, and we have

$d f_x d^{*}(B_1) = e\,\sigma_\varepsilon e^{*}(B_1)$

and from this $\sigma_\varepsilon$ and thus b are uniquely determined. The proof for (5.5) is completely analogous.

If (5.6) holds then (4.8) is of the form

$f_x = e\,\sigma_\varepsilon e^{*} + c^{-1}h\,\sigma_u h^{*}c^{-1*}$

and thus c is uniquely obtained from the poles of $f_x$. Then $\sigma_\varepsilon$ is obtained from a comparison of coefficients of power $\delta e + \delta c$ in

$c f_x c^{*} = c\,e\,\sigma_\varepsilon e^{*}c^{*} + h\,\sigma_u h^{*}$

and in the same way we proceed if (5.7) holds.

The proof of (iii) is completely analogous, since (5.1) - (5.3), (5.8), (5.9) here again guarantee that w is determined from $f_{yx}$ up to multiplication by a positive constant.

(ii): If d and c are relatively prime, then all zeros of d are poles of $f_x$ and thus there is only a finite number of candidates for d and thus also for $f_2$ in (4.17).

(iv) is an immediate consequence of Theorem 4.1 and of (4.8) and (4.10), taking into account the non-negativity of spectral densities for $|B| = 1$.
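The bounds in part (iv) of Theorem 5.1 can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the spectra are evaluated on a finite frequency grid, and the example system (an AR(1) latent input, white errors, a first-order transfer function) as well as the name `scale_bounds` are hypothetical choices made for illustration, not taken from the text.

```python
import numpy as np

def scale_bounds(f_x, f_y, f_xhat, w):
    """Return (c_min, c_max) on a frequency grid, following (5.10)-(5.12).

    f_x, f_y : observed spectral densities (real, positive)
    f_xhat   : spectral density of the latent input for one feasible system
    w        : frequency response of that system on the same grid
    """
    # (5.11): f_x - f_xhat / c >= 0 everywhere, with equality somewhere
    c_min = np.max(f_xhat / f_x)
    # (5.12): f_y - c |w|^2 f_xhat >= 0 everywhere, with equality somewhere
    c_max = np.min(f_y / (np.abs(w) ** 2 * f_xhat))
    return c_min, c_max

# Hypothetical example system (all numbers are illustrative assumptions).
lam = np.linspace(-np.pi, np.pi, 2001)
B = np.exp(-1j * lam)                        # points on the unit circle
f_xhat = 1.0 / np.abs(1 - 0.6 * B) ** 2      # AR(1) latent input
w = 1.0 + 0.5 * B                            # transfer function, nonzero on |B| = 1
f_u = 0.3 * np.ones_like(lam)                # white input error
f_v = 0.2 * np.ones_like(lam)                # white output error
f_x = f_xhat + f_u
f_y = np.abs(w) ** 2 * f_xhat + f_v

c_min, c_max = scale_bounds(f_x, f_y, f_xhat, w)
```

Since the generating system itself corresponds to c = 1, the computed interval must contain 1; every c in the interval rescales w and the latent spectrum without changing the observed second moments.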

Clearly, once w is uniquely determined and if $w(e^{-i\lambda}) \neq 0$ then also $f_{\hat x}, f_u, f_v$ are unique. (i) and (iii) show that e.g. using (4.19), once the degrees are prescribed and if $\delta d > 0$, we have identifiability on a generic subset of the parameter space.

Now we discuss some other cases where additional a priori restrictions guarantee identifiability from the second moments of the observations.

(i) Let the inputs $\hat x_t$ have a spectral distribution function $F_{\hat x}$ given by

(5.14) $F_{\hat x}(\lambda) = \int_{[-\pi,\lambda]} f_{\hat x}\,d\lambda + \sum_{j:\lambda_j \le \lambda} F_{\hat x,j}\ ; \quad F_{\hat x,j} > 0$

Thus $(\hat x_t)$ is a fairly general process where $F_{\hat x}$ has an absolutely continuous and a discrete part and where the discrete part corresponds to a stationary harmonic process $\sum_j e^{i\lambda_j t} z_{\hat x,j}$, where $F_{\hat x,j} = E|z_{\hat x,j}|^2$. Here we do not impose (1.9). By assumption (1.8), $(u_t, v_t)$ has a spectral density and thus we have

(5.15) $F_x(\lambda) = \int_{[-\pi,\lambda]} (f_{\hat x} + f_u)\,d\lambda + \sum_{j:\lambda_j \le \lambda} F_{\hat x,j}$

and

(5.16) $F_{yx}(\lambda) = \int_{[-\pi,\lambda]} w(e^{-i\lambda})\,dF_{\hat x}(\lambda) + \int_{[-\pi,\lambda]} f_{vu}\,d\lambda = \int_{[-\pi,\lambda]} w(e^{-i\lambda}) f_{\hat x}\,d\lambda + \sum_{j:\lambda_j \le \lambda} w(e^{-i\lambda_j}) F_{\hat x,j} + \int_{[-\pi,\lambda]} f_{vu}\,d\lambda$

Thus, from the jumps $F_{x,j}$ and $F_{yx,j}$ in $F_x$ and $F_{yx}$ we obtain:

(5.17) $w(e^{-i\lambda_j}) = F_{yx,j} \cdot F_{x,j}^{-1}$

If $w = a^{-1} \cdot b$ is rational with prescribed (maximal) degrees $n_a$ and $n_b$ say for a and b respectively then w is determined from $n_a + n_b + 1$ (different) values $(\lambda_j, w(e^{-i\lambda_j}))$, $j = 1 \dots n_a + n_b + 1$. Clearly this result can be extended to the multivariate case.

(ii) If it is a priori known, that the input-errors have the property

(5.18) $f_u(\lambda) = 0 \quad \forall \lambda \in \Lambda \subset [0,\pi]$

where $\Lambda$ is a nonempty open set then clearly $f_{uv}(\lambda) = 0$ and $f_{\hat x}(\lambda) = f_x(\lambda)$ $\forall \lambda \in \Lambda$, and therefore, provided $f_{\hat x}(\lambda) \neq 0$, $\lambda \in \Lambda$, we have $w(e^{-i\lambda}) = f_{yx}(\lambda) f_{\hat x}(\lambda)^{-1}$, $\lambda \in \Lambda$. If w is rational, then under our assumptions it is uniquely determined from (5.18), since $\Lambda$ is open, e.g. by the derivatives of $w(e^{-i\lambda})$ at some point $\lambda \in \Lambda$.

(iii) From (4.8) we see that if c = 1, d is uniquely determined from $f_x$: In this case the poles of $f_x$ which are located outside the unit circle are the zeros of d and as d(0) = 1, d is unique. If (1.9) holds, $\delta d > \delta e$, and if the input errors $u_t$ are white noise (i.e. $c^{-1} \cdot h = 1$), then w is unique (Söderström (1980)). From (4.8) we obtain:

(5.19) $d f_x d^{*} = e\,\sigma_\varepsilon e^{*} + d\,\sigma_u d^{*}$

As d is uniquely determined from $f_x$, a comparison of coefficients corresponding to power $\delta d$ in (5.19) gives $\sigma_u$. Then the usual factorization of $(f_x - \sigma_u)$ into factors that have no poles inside or on and no zeros inside the unit circle gives $d^{-1}e$ and $\sigma_\varepsilon$ and thus also $f_{\hat x}$ and w.
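For case (i) above, the step from the $n_a + n_b + 1$ values of w to the polynomials a and b amounts to solving a linear system. The sketch below assumes the normalisation a(0) = 1 and real coefficients; the function name `fit_rational` and the numerical example are hypothetical and serve only to illustrate the counting argument, with exact values of w standing in for the jump ratios (5.17).

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_rational(lams, w_vals, na, nb):
    """Recover a(B) = 1 + a_1 B + ... and b(B) = b_0 + b_1 B + ... from w = b/a.

    lams   : na + nb + 1 distinct frequencies lambda_j
    w_vals : the values w(e^{-i lambda_j}) at those frequencies
    """
    B = np.exp(-1j * np.asarray(lams))
    w_vals = np.asarray(w_vals)
    # a(B_j) w_j = b(B_j)  <=>  sum_k b_k B_j^k - w_j sum_{k>=1} a_k B_j^k = w_j
    cols_b = [B ** k for k in range(nb + 1)]
    cols_a = [-w_vals * B ** k for k in range(1, na + 1)]
    M = np.column_stack(cols_b + cols_a)
    theta = np.linalg.solve(M, w_vals)
    b = theta[: nb + 1].real
    a = np.concatenate(([1.0], theta[nb + 1:].real))
    return a, b

# Hypothetical example with na = nb = 1, hence three frequencies.
a_true = np.array([1.0, -0.5])
b_true = np.array([0.8, 0.3])
lams = np.array([0.3, 1.1, 2.0])
B = np.exp(-1j * lams)
w_vals = P.polyval(B, b_true) / P.polyval(B, a_true)

a_fit, b_fit = fit_rational(lams, w_vals, na=1, nb=1)
```

With fewer than $n_a + n_b + 1$ distinct frequencies the system is underdetermined, which is why the number of jumps of the input spectral distribution matters in case (i).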

(iv) If (1.9) holds, if $(\hat x_t)$ is autoregressive (i.e. e = 1) and $(u_t)$ is a moving average process (i.e. c = 1) and if $\delta d > 0$ (i.e. $(\hat x_t)$ is autoregressive in the narrow sense, not white noise) then w is unique (Söderström (1980)). Here (4.8) is of the form:

$f_x = d^{-1}\sigma_\varepsilon d^{-1*} + h\,\sigma_u h^{*}$

Again, d is uniquely determined from $f_x$. Now

$d f_x d^{*} = \sigma_\varepsilon + d\,h\,\sigma_u h^{*}d^{*}$

and if $B_1$ is a zero of d, we obtain $\sigma_\varepsilon = d f_x d^{*}(B_1)$ and thus $f_{\hat x}$ and w are unique.

A common feature of the cases (i) - (iv) is that $f_{\hat x}$ is obtained from (4.8) and the extra assumptions imposed on the spectra $f_{\hat x}$ and $f_u$; once $f_{\hat x}$ is known, the uniqueness of $w, f_u, f_v$ is immediate. Analogous conditions may be imposed to detect $f_{\hat y}$ from (4.10). Once $f_{\hat y}$ is unique, w is obtained from $f_{\hat y}$ and $f_{xy}$, provided that $f_{xy}(\lambda) \neq 0\ \forall \lambda$, and the rest is easy.

6. Identifiability from High Order Moments

For non Gaussian observations, information coming from moments of order greater than two may be useful for identifiability (Akaike (1966), Deistler (1986)), analogously to the static case. In addition to our standard assumptions we here assume:

(6.1) $(\hat x_t)$ and $(u_t, v_t)$ are strictly stationary and (mutually) independent processes; all moments of $(\hat x_t)$ and of $(\hat y_t)$ up to order n, where n is sufficiently large, exist and the cumulants satisfy conditions of the form

$\sum_{t_1,\dots,t_{n-1}=-\infty}^{\infty} \bigl| c_{\hat y_{t_1}\hat y_{t_2}\cdots\hat y_{t_r}\hat x_{t_{r+1}}\cdots\hat x_{t_{n-1}}\hat x_0} \bigr| < \infty$

and the same holds for $(u_t)$ and $(v_t)$. Then (see e.g. Brillinger (1981)) the corresponding n-th order cumulant spectrum exists and is given by

(6.2) $f_{\hat y^{r}\hat x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) = (2\pi)^{-n+1} \sum_{t_1,\dots,t_{n-1}=-\infty}^{\infty} c_{\hat y_{t_1}\cdots\hat y_{t_r}\hat x_{t_{r+1}}\cdots\hat x_{t_{n-1}}\hat x_0} \exp\Bigl\{-i\sum_{j=1}^{n-1}\lambda_j t_j\Bigr\}$

and analogously for $(u_t)$ and $(v_t)$. As easily seen, due to linearity and continuity of cumulants with respect to one variable (when the others are kept constant), we obtain from (6.2) and (1.1):

(6.3) $f_{\hat y^{r}\hat x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) = (2\pi)^{-n+1} \sum_{t_1,\dots,t_{n-1}=-\infty}^{\infty} \Bigl(\sum_{i=-\infty}^{\infty} w_i\, c_{\hat x_{t_1-i}\hat y_{t_2}\cdots\hat y_{t_r}\cdots\hat x_0}\Bigr) \exp\Bigl\{-i\sum_{j=1}^{n-1}\lambda_j t_j\Bigr\} = w(e^{-i\lambda_1})\, f_{\hat y^{(r-1)}\hat x^{(n-r+1)}}(\lambda_1 \dots \lambda_{n-1})$

Furthermore, from the properties of the cumulants we obtain

(6.4) $f_{y^{r}x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) = f_{\hat y^{r}\hat x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) + f_{v^{r}u^{(n-r)}}(\lambda_1 \dots \lambda_{n-1})$

If, in addition we assume

(6.5) $(u_t)$ and $(v_t)$ are independent

then

(6.6) $f_{v^{r}u^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) = 0$ for r > 0, n - r > 0

If we assume

(6.7) $(u_t, v_t)$ is Gaussian,

then $f_{v^{r}u^{(n-r)}} = 0$ for all n > 2. Thus we have

Theorem 6.1: Consider the (dynamic) EV-model (1.1) (1.3) (1.4). If in addition the assumptions (6.1), (6.5) and

(6.8) $f_{y^{(r-1)}x^{(n-r+1)}}(\lambda_1 \lambda_2 \dots \lambda_{n-1}) \neq 0 \quad \forall \lambda_1$,

for suitable $\lambda_2 \dots \lambda_{n-1}$ and suitable n > 2; r - 1 > 0, n - r > 0 are satisfied then w is uniquely determined from

(6.9) $w(e^{-i\lambda_1}) = f_{y^{r}x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) \cdot f_{y^{(r-1)}x^{(n-r+1)}}(\lambda_1 \dots \lambda_{n-1})^{-1}$

An analogous result holds if (6.5) is replaced by (6.7). Theorem 6.1 is the dynamic analogon to Theorem 2.2.

If $\hat x_t$ has a Wold decomposition (see e.g. Hannan (1970))

$\hat x_t = w_2(z)\varepsilon_t$

where $(\varepsilon_t)$ is i.i.d., then (see e.g. Brillinger (1981))

(6.10) $f_{\hat x^{n}}(\lambda_1 \dots \lambda_{n-1}) = (2\pi)^{-n+1} \cdot w_2(e^{-i\lambda_1}) \cdots w_2(e^{-i\lambda_{n-1}}) \cdot w_2\Bigl(\exp\, i\sum_{j=1}^{n-1}\lambda_j\Bigr) \cdot c_{\varepsilon n}$

If all moments of $\varepsilon_t$ exist and if $\varepsilon_t$ is non Gaussian (provided that $\varepsilon_t \neq 0$), there is an n > 2 such that $c_{\varepsilon n} \neq 0$. If in addition $w_2(e^{-i\lambda}) \neq 0\ \forall \lambda$, then $f_{\hat x^{n}}(\lambda_1 \dots \lambda_{n-1}) \neq 0\ \forall \lambda_1 \dots \lambda_{n-1}$ and then due to (6.3) condition (6.8) is fulfilled.

The generalization of Theorem 6.1 to the multivariable case is straightforward. Estimators of the transfer function w may be obtained from (6.9) replacing the cumulant spectra by their estimators.
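The idea behind (6.9) can be seen in a simple simulation. What follows is only a rough sketch of the static special case (a constant transfer function, n = 3, r = 2), using sample cumulants in place of estimated cumulant spectra; the distributions, the sample size and the names `cum3`, `w_cum` and `w_ls` are assumptions made for illustration and do not reproduce an estimator from the references.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400_000
w0 = 1.5                                   # hypothetical static "transfer function"

x_hat = rng.exponential(1.0, N) - 1.0      # non-Gaussian latent input, third cumulant 2
u = 0.5 * rng.standard_normal(N)           # Gaussian input error
v = 0.5 * rng.standard_normal(N)           # Gaussian output error
x = x_hat + u                              # observed input
y = w0 * x_hat + v                         # observed output

def cum3(a, b, c):
    """Sample third-order joint cumulant (equals the centred third moment)."""
    a, b, c = a - a.mean(), b - b.mean(), c - c.mean()
    return np.mean(a * b * c)

# Static counterpart of (6.9) with n = 3, r = 2: cum(y, y, x) / cum(y, x, x)
w_cum = cum3(y, y, x) / cum3(y, x, x)

# Second-moment regression for comparison: biased towards zero by the input error
w_ls = np.cov(y, x)[0, 1] / np.var(x)
```

Because the Gaussian errors have vanishing third-order cumulants, the cumulant ratio should settle near `w0`, whereas the regression coefficient is attenuated by the factor var(x̂)/var(x), which is 0.8 in this hypothetical setup.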

References

Aigner, D.J. and A.S. Goldberger (Eds.) (1977): Latent Variables in Socio-Economic Models. North Holland P.C., Amsterdam

Aigner, D.J., C. Hsiao, A. Kapteyn and T. Wansbeek (1984): Latent Variable Models in Econometrics. In: Griliches, Z. and M.D. Intriligator (Eds.) Handbook of Econometrics. North Holland P.C., Amsterdam

Akaike, H. (1966): On the Use of Non-Gaussian Process in the Identification of a Linear Dynamic System. Annals of the Institute of Statistical Mathematics 18, 269 - 276

Anderson, B.D.O. (1985): Identification of scalar errors-in-variables models with dynamics. Forthcoming in Automatica

Anderson, B.D.O. and M. Deistler (1984): Identifiability in Dynamic Errors-in-Variables models. Journal of Time Series Analysis 5, 1 - 13

Anderson, T.W. (1984): Estimating Linear Statistical Relationships. Annals of Statistics 12, 1 - 45

Brillinger, D.R. (1981): Time Series: Data Analysis and Theory. Expanded Edition. Holden Day, San Francisco

Deistler, M. (1984): Linear errors-in-variables models. In: J. Franke, W. Härdle and D. Martin (Eds.), Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics, Springer-Verlag, Berlin

Deistler, M. (1985a): Linear dynamic errors-in-variables models. In: J. Gani and M. Priestley (Eds.) Essays in Time Series and Allied Processes. Forthcoming

Deistler, M. (1985b): Identifiability and Causality in Linear Dynamic Errors-in-Variables Systems. In: Proc. 5th Franco Belgian Meeting of Statisticians. Forthcoming

Deistler, M. and H.G. Seifert (1978): Identifiability and Consistent Estimability in Dynamic Econometric Models. Econometrica 46, 969 - 980

Drion, E.F. (1951): Estimation of the Parameters of a Straight Line and of the Variances of the Variables, if they are Both Subject to Error. Indagationes Math. 13, 256 - 260

Frisch, R. (1934): Statistical Confluence Analysis by Means of Complete Regression Systems. Publication No. 5, University of Oslo, Economic Institute

Fuller, W.A. (1980): Properties of some Estimators for the Errors-in-Variables Model. Annals of Statistics 8, 407 - 422

Geary, R.C. (1942): Inherent Relations between Random Variables. Proceedings of the Royal Irish Academy, Sec. A, 47, 63 - 76

Geary, R.C. (1943): Relations between Statistics: The General and the Sampling Problem When the Samples are Large. Proceedings of the Royal Irish Academy, Sec. A, 49, 177 - 196

Gini, C. (1921): Sull'interpolazione di una retta quando i valori della variabile indipendente sono affetti da errori accidentali. Metron 1, 63 - 82

Green, M. and B.D.O. Anderson (1985): Identification of multivariable errors-in-variables models with dynamics. Mimeo.

Hannan, E.J. (1970): Multiple Time Series. Wiley, New York

Hannan, E.J. and L. Kavalieris (1984): Multivariate Linear Time Series Models. Advances in Applied Probability 16, 492 - 561

Hinich, M.J. (1983): Estimating the Gain of a Linear Filter from Noisy Data. In: D.R. Brillinger and P.R. Krishnaiah (Eds.) Handbook of Statistics, Vol 3. North Holland, Amsterdam

Hinich, M.J. and W.E. Weber (1984): Estimating Linear Filters with Errors in Variables Using the Hilbert Transform. Federal Reserve Bank of Minneapolis, Res. Dept. Staff Report 96

Kalman, R.E. (1982): System Identification from Noisy Data. In: A. Bednarek and L. Cesari (Eds.) Dynamical Systems II, a University of Florida International Symposium. Academic Press, New York

Kalman, R.E. (1983): Identifiability and Modeling in Econometrics. In: Krishnaiah, P.R. (Ed.) Developments in Statistics, vol 4. Academic Press, New York

Kendall, M.G. and A. Stuart (1969): The Advanced Theory of Statistics. Vol 1, 3rd Edition, Griffin, London

Klepper, S. and E. Leamer (1984): Consistent Sets of Estimates for Regressions with Errors in all Variables. Econometrica 52, 163 - 183

Madansky, A. (1959): The Fitting of Straight Lines when Both Variables are Subject to Error. Journal of the American Statistical Association 54, 173 - 205

Maravall, A. (1979): Identification in Dynamic Shock-Error Models. Springer Verlag, Berlin.

Moran, P.A.P. (1971): Estimating Structural and Functional Relationships. Journal of Multivariate Analysis 1, 232 - 255

Nowak, E. (1983): Identification of the Dynamic Shock-Error Model with Autocorrelated Errors. Journal of Econometrics 23, 211 - 221

Picci, G. (1985): Factor Analysis Models via Stochastic Realization Methods. This Volume

Reiersøl, O. (1941): Confluence Analysis by Means of Lag Moments and other Methods of Confluence Analysis. Econometrica 9, 1 - 24

Reiersøl, O. (1950): Identifiability of a Linear Relation Between Variables which are subject to Error. Econometrica 18, 375 - 389

Schneeweiß, H. and H.J. Mittag (1985): Lineare Modelle mit fehlerbehafteten Daten. Physica Verlag, Würzburg

Scott, E.L. (1950): Note on Consistent Estimates of the Linear Structural Relation Between two Variables. Annals of Mathematical Statistics 21, 284 - 288

Söderström, T. (1980): Spectral Decomposition with Application to Identification. In: Archetti, F. and M. Cugiani (Eds.) Numerical Techniques for Stochastic Systems. North Holland P.C., Amsterdam

Wegge, L. (1983): ARMAX-Models Parameter Identification without and with Latent Variables. Working Paper. Dept. of Economics, Univ. of California, Davis.

Chapter 3

A New Class of Dynamic Models For Stationary Time Series

Giorgio Picci and Stefano Pinzoni

1. Introduction

In this note we shall discuss a new class of dynamic models which may be better suited than conventional ARMAX schemes to describe non-causally interacting time series. Typical areas of application that we have in mind include econometrics (where it is often not clear what variables are "endogenous" and what are "exogenous") and identification of industrial processes operating under feedback. In these situations there is no a priori clear causality relation among the variables and, in fact, a possible goal of the identification experiment could be the testing for existence of causal relations.

The class of models introduced here is a natural dynamic generalization of the well-known static Factor Analysis model which in various equivalent forms (the most popular of which seems to be the so-called Errors-In-Variables scheme) has been the object of much study in the past, especially by econometricians and psychologists. (For definitions of these concepts and a rather comprehensive survey of the literature one may consult the recent paper by Van Schuppen (1985).) The study of these models has recently been revitalized by Kalman in a series of papers (Kalman, 1982a, 1982b and 1983) and some of the critiques presented in Kalman's work have been the motivating stimulus for the earlier paper (Finesso and Picci, 1984). The present exposition represents the natural continuation and generalization of the results presented there. In order to improve readability we have chosen to skip some non essential technical details. A more complete story can be found in (Picci and Pinzoni, 1986). People interested in general philosophical discussions on the modelling problem considered here are referred to the introduction of (Finesso and Picci, 1984).

We should mention that some of the specific issues dealt with in this paper are also treated (in the scalar E.I.V. context) in the work of Anderson and Deistler (1984), Anderson (1985), Deistler (1985). Although the primary motivations (and hence the basic assumptions) in these papers are of a rather different nature than ours, the reader might find some ground for comparisons in the discussion of the causality problem presented in section 4.

For the sake of motivating the introduction of Dynamic Factor Analysis models we shall briefly review the definition of causality of a dynamical model, first in the deterministic and then in the stochastic (Gaussian) case. The idea that we want to convey is that causal models are quite "nongeneric" mathematical descriptions to impose aprioristically on real data, e.g. economic time series or data coming from industrial processes involving feedback.

In a deterministic framework the notion of causality is of course well known. Assume that the components of the m-dimensional variable y(t), whose temporal evolution is described by a certain dynamical model, have been grouped in two subvectors,

$y(t) = \begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix}$ ,   (1.1)

with $y_i(t) \in \mathbb{R}^{m_i}$, i = 1,2, and $m_1 + m_2 = m$. It is intuitively clear that a dynamical model should quantify the dynamic relation occurring between the variables $y_1$ and $y_2$ (i.e. how much $y_1$ "influences" $y_2$ and vice versa). This is made precise in J.C. Willems' refoundation of Systems Theory (Willems, 1979): any model (or equivalently dynamical system) with external variables y is just a subset of trajectories $\mathcal{B}$ (called the behaviour of the system) in $(\mathbb{R}^{m})^{\mathbb{Z}} = (\mathbb{R}^{m_1})^{\mathbb{Z}} \times (\mathbb{R}^{m_2})^{\mathbb{Z}}$ and therefore a bona fide relation between $y_1$ (ranging over $(\mathbb{R}^{m_1})^{\mathbb{Z}}$) and $y_2$ (ranging over $(\mathbb{R}^{m_2})^{\mathbb{Z}}$). We say that $y_1$ causes $y_2$ or, equivalently, that $y_1$ is the input and $y_2$ is the output variable of the system, if this relation specializes to a very particular kind of function, namely if

$y_2(t) = f(y_1)$ ,  $t \in \mathbb{Z}$ ,   (1.2)

where f depends only on the values taken by $y_1$ before and at time t.

In the stochastic case the sharply defined subset $\mathcal{B}$ is replaced by a probability measure on the sample space $(\mathbb{R}^{m})^{\mathbb{Z}}$ and thus the external variable y becomes a stochastic process

{y(t)}. The model is in this case just the probability law of {y(t)}. To make things simple we shall consider here just about the simplest possible class of random processes, described in the following

BASIC ASSUMPTION The process {y(t)} is an m-dimensional Gaussian stationary process with zero mean and has a rational spectral density S strictly positive definite on the unit circle (i.e. $S(e^{i\theta}) > 0$). □

We shall write the spectrum S in a partitioned form corresponding to the subdivision (1.1) of the external variables,

$S = \begin{bmatrix} S_1 & S_{12} \\ S_{21} & S_2 \end{bmatrix}$ ,   (1.3)

where the blocks $S_i$, of dimension $m_i \times m_i$, represent the auto spectra and $S_{12}$ the cross spectrum of the two components $\{y_1(t)\}$ and $\{y_2(t)\}$ of dimension $m_1$ and $m_2$. The definition of causality in this context, essentially due to Granger (1963 and 1969), sounds as follows.

DEFINITION 1.1 We say that the process $y_1$ causes $y_2$ or, equivalently, that $y_1$ is an input process with corresponding output $y_2$ if, for all $t \in \mathbb{Z}$,

$E[y_2(t) \mid y_1] = E[y_2(t) \mid y_1(s),\ s \le t]$ ,   (1.4)

where the first conditional expectation is with respect to the whole history $\{y_1(t);\ t \in \mathbb{Z}\}$ of the component $y_1$.

□

Causality is just conditional independence of the past and present output history $\{y_2(s);\ s \le t\}$ from future inputs $\{y_1(s);\ s > t\}$ given the past of the input $\{y_1(s);\ s \le t\}$ and can of course be defined in a much more general setting than the one adopted here. In a Gaussian setting we can however translate everything in the convenient Hilbert space language of the linear theory of random processes (see e.g. Rozanov, 1967). Some of this material necessary for future use will be quickly reviewed in the next paragraphs.

We shall denote the vector space of all finite linear combinations of the scalar random variables $\{a'y(t);\ a \in \mathbb{R}^{m},\ t \in \mathbb{Z}\}$, closed in the metric induced by the scalar product $\langle x, z \rangle := E\,xz$, by the symbol H(y) (sometimes abbreviated to H). $H_t^{-}(y)$, $H_t^{+}(y)$ will denote the past and future subspaces spanned by the random variables y(s) up to and, respectively, after and at, time t. Clearly,

$H_t^{\pm}(y) = U^{t} H_0^{\pm}(y)$ ,   (1.5)

where $U : y(t) \to y(t+1)$ is the (unitary) shift operator of the process {y(t)}. Normally the subscript zero in (1.5) will be dropped. For the two components $y_1$ and $y_2$ we shall define the subspaces $H(y_1)$, $H(y_2)$ (abbreviated to $H_1$ and $H_2$ when there is no danger of confusion) accordingly. Obviously $H = H_1 \vee H_2$ where the wedge denotes closed vector sum.

Subspaces like $H_1$ and $H_2$ are doubly invariant for the shift U, in the sense that they satisfy $U^{t}H_i = H_i$ for all $t \in \mathbb{Z}$. The multiplicity of a doubly invariant subspace $X \subset H$ is the cardinality of any minimal generating set, i.e. is the smallest n for which one can find random variables $\{x_1, \dots, x_n\}$ in X such that the vector space generated by $\{U^{t}x_i;\ i = 1, \dots, n,\ t \in \mathbb{Z}\}$ is dense in X. The process {x(t)} with $x_i(t) = U^{t}x_i$ is called a generating process of X.

By the Spectral Representation Theorem (Rozanov, 1967), there is a unitary representation of the random variables in H(x) as n-dimensional (row) vector functions in the Hilbert space $L^2_n(C, dQ)$ where $C = \{z;\ |z| = 1\}$ is the unit circle in the complex plane and Q is the n x n matrix spectral distribution measure of the process {x(t)}. Each random variable $\xi(t) := U^{t}\xi$ with $\xi \in X$ can be written as

$\xi(t) = \int_{-\pi}^{\pi} e^{i\theta t} f(e^{i\theta})\, d\hat x(e^{i\theta})$

for a unique $f \in L^2_n(C, dQ)$. Here $\hat x$ is the n-dimensional random spectral measure of the stationary process {x(t)}. As it is well known (Rozanov, 1967) the spectral distribution matrix is related to $\hat x$ by $dQ = E(d\hat x\, d\hat x^{*})$, where the star means conjugate transpose. The representation will be symbolically written as

$\xi(t) = f(z)x(t)$ .   (1.6)

The System Theoretic interpretation of the notation is that the stationary process $\{\xi(t)\}$ is obtained by passing the stationary process {x(t)} through the linear (stable) filter of transfer function f.

In all cases of interest for us the spectral distribution measure of {x(t)} will be absolutely continuous with respect to Lebesgue measure on C. The spectral density matrix will still be denoted by the symbol Q. It is well known (compare e.g. Fuhrmann, 1981, p. 111) that $\{x_1, \dots, x_n\}$ being a minimal generating set is equivalent to $Q(e^{i\theta})$ being strictly positive definite on a set of positive Lebesgue measure. For example the assumption $S(e^{i\theta}) > 0$ a.e. guarantees that H = H(y) has precisely multiplicity m, a possible minimal set of generators being given by the m scalar components of the random vector y(0). Observe further that any other minimal generating process for H(x) can be written as

u(t) = T(z)x(t) ,   (1.7)

with T an n x n matrix function having rows in $L^2_n(C, dQ)$ and Q-a.e. nonsingular on the unit circle.

Of course when {x(t)} admits a spectral density Q which is a.e. positive definite on the unit circle, then all admissible T's will be a.e. nonsingular on C and all minimal generating processes for H(x) will have an a.e. positive definite spectral density on the unit circle. In particular, by choosing $T = W^{-1}/\sqrt{2\pi}$ where W is any square solution of the standard spectral factorization problem $WW^{*} = Q$, we obtain white noise generators {u(t)} for H(x). The transfer function in the representation (1.6) in this case belongs to $L^2_n(C, d\theta/2\pi)$. In this context we shall call causal any function f with vanishing positive Fourier coefficients in $L^2_n(C, d\theta/2\pi)$, i.e. such that

$\int_{-\pi}^{\pi} e^{-i\theta k} f(e^{i\theta})\, d\theta/2\pi = 0$   (1.8)

for all k > 0. Thus any causal function belongs to the n-dimensional conjugate Hardy space $\bar H^2_n$ (Hoffmann, 1962) and can be extended to a function of the complex variable z analytic on $\{|z| > 1\}$ (including the point at infinity). A matrix valued function T will be called causal if its rows are. It can be verified directly that for any generating process {x(t)} with a strictly positive definite spectral density matrix (°) we have $H_t^{-}(u) \subset H_t^{-}(x)$ if and only if the transfer matrix T in (1.7) is causal. A (left-) invertible matrix T with rows in $L^2_n(C, d\theta/2\pi)$ will be called minimum phase if it is causal and its extension has an analytic (left-) inverse on $\{|z| > 1\}$. This is the same thing as a conjugate outer matrix function in $H^2$-theory.

(°) or, more generally, full rank purely non deterministic (Rozanov, 1967).

We finally recall the concept of conditional orthogonality. Two subspaces $H_1, H_2$ of H will be said conditionally orthogonal, given a third subspace X (notation: $H_1 \perp H_2 \mid X$), if

$\langle h_1 - E^{X}h_1,\ h_2 - E^{X}h_2 \rangle = 0$   (1.9)

for all $h_1 \in H_1$ and $h_2 \in H_2$. Here the symbol $E^{X}$ denotes orthogonal projection onto X. Since in the Gaussian case conditional expectation given a certain family of random variables in H is the same thing as orthogonal projection onto the subspace of H spanned by them, we see that conditional orthogonality is the same property as conditional independence, given X, of the two families $H_1$ and $H_2$ of Gaussian random variables. The concept of conditional orthogonality will be extensively used in this paper. For additional information one may consult (Lindquist and Picci, 1985).

We return to our discussion of causality in the stochastic setting. The following is a rather well known fact although often stated in a different terminology.

THEOREM 1.1 The process $\{y_1(t)\}$ causes $\{y_2(t)\}$ if and only if

$y_2(t) = A(z)y_1(t) + v(t)$ ,   (1.10)

where A(z) is an $m_2 \times m_1$ causal matrix function and {v(t)} is a stationary process completely independent of $\{y_1(t)\}$, i.e.

$E\,y_1(t)v'(s) = 0$  for all $t,s \in \mathbb{Z}$.   (1.11)
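The causality requirement on A(z) in Theorem 1.1 is the Fourier-coefficient condition (1.8), and in the scalar case (1.10) - (1.11) imply $S_{21} = A S_1$, so that A can be read off from the spectra. The following sketch checks this numerically on a uniform grid of the unit circle; the particular spectra, the grid size and the helper names `fourier_coeffs` and `is_causal` are illustrative assumptions.

```python
import numpy as np

N = 512
theta = 2 * np.pi * np.arange(N) / N
z = np.exp(1j * theta)

def fourier_coeffs(f_vals):
    """Grid approximation of c[k] = (1/2pi) * integral of e^{-i theta k} f(e^{i theta})."""
    return np.fft.fft(f_vals) / len(f_vals)

def is_causal(f_vals, tol=1e-8):
    """Check (1.8): the coefficients with index k = 1, ..., N/2 - 1 must vanish."""
    c = fourier_coeffs(f_vals)
    return bool(np.max(np.abs(c[1 : len(f_vals) // 2])) < tol)

S1 = 1.0 / np.abs(1 - 0.6 / z) ** 2        # spectrum of y1 (illustrative AR(1) shape)
A = 0.8 / z / (1 - 0.5 / z)                # causal: 0.8 z^-1 + 0.4 z^-2 + ...
S21 = A * S1                               # cross spectrum implied by (1.10)-(1.11)

A_rec = S21 / S1                           # transfer function recovered from the spectra
causal_ok = is_causal(A_rec)
a1 = fourier_coeffs(A_rec)[-1].real        # coefficient of z^-1
lead_ok = is_causal(z * np.ones(N))        # y2(t) = y1(t+1), a one-step lead
```

In this convention index k of the FFT output corresponds to the power $z^{k}$, so a causal function has all its mass on the non-positive indices; the one-step lead fails the check because its only coefficient sits at k = 1.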

This result is essentially due to (Caines and Chan, 1976). It is also discussed in (Caines and Chan, 1975) and (Gevers and Anderson, 1982). In these references causality is called "absence of feedback" (from $y_2$ to $y_1$). Note that (1.10) is nothing else but the popular ARMAX scheme widely used in time series identification. Just express {v(t)} by its innovation representation,

v(t) = G(z)e(t) ,   (1.12)

where G(z) is minimum phase, normalized so as to make $G(\infty) = I$, and {e(t)} is a white noise process. Recall that, by rationality of S(z), both A(z) and the spectrum of {v(t)} are rational and then express the rational matrix [A(z) G(z)] by a left coprime M.F.D. $D(z)^{-1}[B(z)\ \ C(z)]$ to get

$D(z)y_2(t) = B(z)y_1(t) + C(z)e(t)$.   (1.13)

The orthogonality condition (1.11) holds if and only if

$E\,e(t)y_1'(s) = 0$ ,  $t,s \in \mathbb{Z}$ ,   (1.14)

and therefore using ARMAX models with independent (or uncorrelated) noise and input ($y_1$) processes (°) is equivalent to imposing a priori a causality relation on the data. In this case the statistical inference problem of estimating the joint law of $\{y_1(t)\}$ and $\{y_2(t)\}$ is reduced to the much simpler problem of estimating just the conditional law of future $y_2$'s given past inputs $y_1$.

(°) Actually condition (1.14) is often considered to be part of the definition of an ARMAX model and is not even explicitly mentioned.

Quite often there is no evidence in the data which justifies the use of causal models. What kind of models should then be used in this situation? One obvious answer would be to describe the whole (joint) process {y(t)} by an m-dimensional ARMA scheme corresponding say to the m x m rational minimum phase spectral factor of the joint spectrum S. Our main concern is however in describing how two given groups of variables ($y_1$ and $y_2$) interact dynamically.

In practice $y_1$ and $y_2$ have a precise physical or economic meaning and the main reason for doing modelling and identification is to discover how much of the temporal evolution of each variable is "explained" by the other. For this purpose it would be much more useful to have models which (although necessarily equivalent to the joint ARMA scheme mentioned above) put into explicit evidence the mutual influence of the variables $y_1$ and $y_2$. A class of mathematical descriptions which in a certain sense generalizes the causal model (1.10) is the stochastic feedback scheme

$y_2(t) = L(z)y_1(t) + v_1(t)$,
$y_1(t) = K(z)y_2(t) + v_2(t)$,   (1.15)

where L and K are causal transfer functions and $\{v_1(t)\}$ and $\{v_2(t)\}$ stationary "error" processes whose innovations can at most be assumed orthogonal to the past histories of $y_1$ and $y_2$, respectively. This class of models has been extensively investigated in recent years, especially by Gevers and Anderson (1981 and 1982) and Anderson and Gevers (1982) with the main motivation of understanding identifiability of control systems operating under feedback. Practical use of these models for time series identification seems however to have been very limited so far.

We shall propose here a different class of models in which the dynamic interaction between $y_1$ and $y_2$ is explicitly described by the introduction of an auxiliary variable x. This auxiliary variable will play a role similar to the state variable in Systems Theory.

DEFINITION 1.2 A Dynamic Factor Analysis Model with external variables the (jointly stationary) vector processes $\{y_1(t)\}$ and $\{y_2(t)\}$ is a linear relation of the form

$$
y_1(t) = A_1(z)x(t) + w_1(t), \qquad y_2(t) = A_2(z)x(t) + w_2(t), \tag{1.16}
$$

where $A_1(z)$ and $A_2(z)$ are transfer matrices of dimension $m_1 \times n$ and $m_2 \times n$ and $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ are zero mean stationary processes of dimensions $n$, $m_1$, $m_2$ which are pairwise uncorrelated, i.e.

$$
\{w_1(t)\} \perp \{x(t)\} \perp \{w_2(t)\}. \tag{1.17}
$$

[]
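A minimal simulation sketch of a scalar model of the form (1.16) may help fix ideas. Everything specific here is an assumption made for illustration and does not come from the text: the filters are finite impulse response, $z^{-1}$ is read as the unit delay, the three driving processes are independent Gaussian white noises, and the tap values are hypothetical. A tap at a negative lag makes the filter noncausal.

```python
import random

def apply_filter(taps, x, t):
    """Evaluate (A(z)x)(t) = sum_k a_k x(t-k) for taps given as {lag k: a_k}.

    Negative lags make the filter noncausal; samples outside the record count as 0.
    """
    total = 0.0
    for lag, coeff in taps.items():
        idx = t - lag
        if 0 <= idx < len(x):
            total += coeff * x[idx]
    return total

def simulate_fa(a1, a2, n_samples, seed=0):
    """Draw one path of a scalar model of the form (1.16) with white x, w1, w2."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    w1 = [rng.gauss(0.0, 0.5) for _ in range(n_samples)]
    w2 = [rng.gauss(0.0, 0.5) for _ in range(n_samples)]
    y1 = [apply_filter(a1, x, t) + w1[t] for t in range(n_samples)]
    y2 = [apply_filter(a2, x, t) + w2[t] for t in range(n_samples)]
    return x, w1, w2, y1, y2

# Hypothetical filters: A1 is causal, A2 has one anticipative tap (lag -1).
A1 = {0: 1.0, 1: 0.5}
A2 = {-1: 0.3, 0: 1.0}
x, w1, w2, y1, y2 = simulate_fa(A1, A2, n_samples=200)
```

Because the three noises are drawn independently, they are pairwise uncorrelated in the sense of (1.17); only $y_1$ and $y_2$ would be available to an observer.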

Note that $A_1$ and $A_2$ need not be causal. The process $\{x(t)\}$ will sometimes be referred to as the factor process of the model. A Dynamic Factor Analysis (F.A.) model will be called rational if $A_1$, $A_2$ are rational matrices and $\{x(t)\}$ has rational spectrum. The terminology (although not terribly elegant) has been extrapolated from the static case.

In the next sections we shall present a first rudimentary analysis of the model (1.16). The main questions one would like to answer concern the representability of an arbitrary joint stationary process $\{y(t)\}$ (with $y(t)$ partitioned as in (1.1)) by models of the type (1.16), the equivalence of representations (i.e. when do different representations describe the same spectrum $S$ or the same process $\{y(t)\}$), the "external behaviour" of the model which is obtained once $\{x(t)\}$ is eliminated, finding a natural notion of minimality and characterizations of minimal models, parametrizations and canonical forms in the rational case and, above all, the use of Factor Analysis models in Statistical Inference (i.e. identification). This is quite a large program and only a few of these aspects will be touched upon in this paper. Others (especially the last two mentioned above), which still need more research, will not be discussed here.


2. Dynamic Factor Analysis Models

The stationary processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ which define a Factor Analysis model span a certain Hilbert space $H(x,w_1,w_2)$ which we denote by $H_o$. The Factor Space $X$ of the model (1.16) is the doubly invariant subspace of $H_o$ generated by the factor process,

$$
X = \mathrm{span}\,\{a'x(t);\ a \in \mathbb{R}^n,\ t \in \mathbb{Z}\}. \tag{2.1}
$$

Let $\hat n \le n$ be the multiplicity of $X$ and let $\{\hat x(t)\}$ be a minimal generating process for $X$. Clearly, since $x(t) = T(z)\hat x(t)$ for some $n \times \hat n$ matrix $T$, we can always rewrite the model (1.16) with $A_1(z)$ and $A_2(z)$ replaced by $\hat A_1(z) = A_1(z)T(z)$ and $\hat A_2(z) = A_2(z)T(z)$ and a factor process $\hat x(t)$ which is a minimal generating process for $X$. We shall therefore adhere from now on to the convention of considering only F.A. models in which $\{x(t)\}$ is a minimal generating process for $X$. Hence the multiplicity of $X$ will always coincide with the dimension of $x(t)$.

Two F.A. models which differ by a change of (minimal) generators in $X$ will be called equivalent. Obviously two equivalent models have the same $\{w_i(t)\}$ processes (for $i = 1,2$), the same factor space $X$ and transfer matrices and factor processes related by

$$
\hat A_i(z) = A_i(z)T(z)^{-1}, \quad i = 1,2, \qquad \hat x(t) = T(z)x(t), \tag{2.2}
$$

where $T$ is a $Q$-a.e. nonsingular $n \times n$ matrix function whose rows belong to $L^2_n(C, dQ)$, $Q$ being the spectral distribution measure of $\{x(t)\}$. It is easy to check that (2.2) defines an equivalence relation on the class of all F.A. models of $\{y_1(t)\}$, $\{y_2(t)\}$.

We shall now introduce the concept of splitting subspace. By this idea we shall be able to attach a precise probabilistic meaning to F.A. models and at the same time reduce this notion to a very simple geometric object. Let $H_i = H(y_i)$, $i = 1,2$, be the Hilbert spaces spanned by the components of $\{y_i(t)\}$, $i = 1,2$. It will be useful to think of $H_1$ and $H_2$ as (doubly invariant) subspaces embedded in a large Hilbert space $H_o$ obtained by suitably augmenting $H_1 \vee H_2$. On $H_o$ there is defined a unitary shift operator $U$ which reduces to the shift of the process $\{y(t)\}$ on the subspace $H = H(y) = H_1 \vee H_2$. (The role played by $H_o$ is very similar to that of the space $H(x,w_1,w_2)$ introduced at the beginning of this section.)

DEFINITION 2.1 A (stationary) splitting subspace is a doubly invariant subspace $X$ of $H_o$ which makes $H(y_1)$ and $H(y_2)$ conditionally orthogonal given $X$, i.e. satisfies

$$
H(y_1) \perp H(y_2) \mid X \tag{2.3}
$$

together with $UX = X$. A splitting subspace $X$ is called minimal if there are no proper subspaces of $X$ which are doubly invariant and still satisfy condition (2.3).

[]

The concept of splitting subspace is a generalization of the idea of sufficient statistic (at least in the Gaussian case). It follows in fact from the definition of conditional orthogonality (1.9) that

$$
E[h_1 \mid X \vee H_2] = E[h_1 \mid X], \qquad h_1 \in H_1,
$$

and, equivalently,

$$
E[h_2 \mid X \vee H_1] = E[h_2 \mid X], \qquad h_2 \in H_2,
$$

so that all that is relevant in $H_2$ ($H_1$) for the purpose of predicting any $h_1 \in H_1$ ($h_2 \in H_2$) is already contained in $X$. Therefore if $X$ (or any system of generators of $X$) is given, we can disregard $H_2$ ($H_1$) completely. Note that the concept of splitting is of interest only if it corresponds to effective data reduction. Hence the notion of minimality is of central importance.

LEMMA 2.1 (Ruckebusch, 1976 and Lindquist and Picci, 1985) A splitting subspace $X$ is minimal if and only if

$$
\bar E^X H_1 = X, \qquad \bar E^X H_2 = X \tag{2.4}
$$

(here $\bar E^X H_i$ is the closure of $\{E^X h_i;\ h_i \in H_i\}$, $i = 1,2$). []
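The sufficiency property behind the splitting condition can be illustrated in the simplest possible setting. The sketch below is an assumption-laden toy, not the dynamic case of the text: it takes the static scalar relations $y_1 = a_1x + w_1$, $y_2 = a_2x + w_2$ with uncorrelated $x$, $w_1$, $w_2$ (hypothetical variances), and computes the best linear predictor of $y_1$ from the pair $(x, y_2)$ by solving the normal equations. The coefficient on $y_2$ should vanish, which is the finite-dimensional counterpart of $E[h_1 \mid X \vee H_2] = E[h_1 \mid X]$.

```python
def solve2(m, b):
    """Solve the 2x2 linear system m c = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    c0 = (b[0] * m[1][1] - m[0][1] * b[1]) / det
    c1 = (m[0][0] * b[1] - b[0] * m[1][0]) / det
    return c0, c1

def predictor_coeffs(a1, a2, q, r2):
    """Coefficients of the best linear predictor of y1 given (x, y2).

    q is the variance of x and r2 the variance of w2; w1 does not enter
    because it is uncorrelated with both regressors.
    """
    gram = [[q, a2 * q], [a2 * q, a2 * a2 * q + r2]]   # covariance of (x, y2)
    cross = [a1 * q, a1 * a2 * q]                      # covariance of y1 with (x, y2)
    return solve2(gram, cross)

# Hypothetical numbers chosen only for illustration.
coef_x, coef_y2 = predictor_coeffs(a1=2.0, a2=-1.5, q=1.0, r2=0.4)
```

Once $x$ is known, $y_2$ carries no further linear information about $y_1$: the predictor reduces to $a_1x$.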

The following theorem shows that (modulo choice of generators) splitting subspaces and Dynamic Factor Analysis models are essentially the same thing.

THEOREM 2.1 The factor space $X$ of any F.A. model of $\{y_1(t)\}$, $\{y_2(t)\}$ is a splitting subspace. Vice versa, to every splitting subspace $X$ for $H(y_1)$, $H(y_2)$ of finite multiplicity there corresponds the equivalence class, defined modulo choice of generators, of F.A. models having $X$ as factor space.

Proof: Let $X$ be given by (2.1). Then, since

$$
A_i(z)x(t) = E^X y_i(t), \qquad t \in \mathbb{Z}, \quad i = 1,2, \tag{2.5}
$$

the orthogonality relation of $\{w_1(t)\}$ and $\{w_2(t)\}$, which holds by assumption for any model (1.16), can be rewritten as

$$
y_1(t) - E^X y_1(t) \ \perp\ y_2(s) - E^X y_2(s), \qquad t,s \in \mathbb{Z}. \tag{2.6}
$$

As $\{y_i(t)\}$ is a generating process for $H_i$, it follows from the definition (1.9) that indeed $X$ is splitting. Vice versa, let $X$ be a splitting subspace and $\{x(t)\}$ a minimal generating process for $X$ of dimension $n$. The projections $E^X y_i(t)$ can be written as in (2.5) for suitable transfer functions $A_i(z)$ of dimension $m_i \times n$. Define

$$
w_i(t) := y_i(t) - E^X y_i(t), \qquad t \in \mathbb{Z}, \quad i = 1,2, \tag{2.7}
$$

then the stationary processes $\{w_i(t)\}$ are orthogonal to $X$ and, by the conditional orthogonality condition (2.3), we also have $E\,w_1(t)w_2(s)' = 0$ for all $t,s \in \mathbb{Z}$. Therefore $\{y_1(t)\}$ and $\{y_2(t)\}$ can be written as in (1.16), while satisfying (1.17). []

The equivalence established by Theorem 2.1 allows us to define a first rough notion of minimality for F.A. models. We shall say that an F.A. model is irreducible if its factor space is minimal splitting.

THEOREM 2.2 (Picci and Pinzoni, 1986) An F.A. model is irreducible if and only if the rank a.e. on the unit circle of the matrices $A_1(z)$ and $A_2(z)$ is equal to the multiplicity of $X$. All irreducible F.A. models have the same multiplicity (i.e. the same number of factors) $n$, equal to the rank a.e. on the unit circle of the cross spectrum $S_{12}$ of the processes $\{y_1(t)\}$ and $\{y_2(t)\}$.

In the rest of this paper we shall concentrate on irreducible models. As we have just seen, these models are characterized by a.e. left invertible matrices $A_k(z)$, $k = 1,2$. Their factor process has an absolutely continuous spectrum with an a.e. positive definite spectral density matrix $Q$ on the unit circle (Picci and Pinzoni, 1986).
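The rank statement of Theorem 2.2 can be checked numerically at a single frequency. The sketch below is only an illustration under stated assumptions: one factor ($n = 1$), two-dimensional $y_1$ and $y_2$, hypothetical complex values for the columns $A_1$ and $A_2$, and a hand-written rank routine (`matrix_rank` is a helper defined here, not a library call).

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a small complex matrix by Gaussian elimination with partial pivoting."""
    m = [[complex(v) for v in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue                       # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def cross_spectrum(a1, a2, q):
    """S12 = A1 Q A2^* at one frequency when A1, A2 are single columns (n = 1)."""
    return [[complex(u) * q * complex(v).conjugate() for v in a2] for u in a1]

# Hypothetical values of the two transfer columns at some frequency.
A1_COL = [1 + 1j, 2]
A2_COL = [0.5, 1j]
S12 = cross_spectrum(A1_COL, A2_COL, q=2.0)
N_FACTORS = matrix_rank(S12)
```

With a single factor the $2 \times 2$ cross spectrum is an outer product, so its rank recovers the number of factors.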


If in an irreducible F.A. model we eliminate the auxiliary variable $\{x(t)\}$, we obtain a scheme of the following type,

$$
A_2(z)^{-L}\hat y_2(t) = A_1(z)^{-L}\hat y_1(t), \tag{2.8}
$$

$$
y_1(t) = \hat y_1(t) + w_1(t), \tag{2.9a}
$$

$$
y_2(t) = \hat y_2(t) + w_2(t). \tag{2.9b}
$$

This is essentially what is commonly called an Errors-In-Variables (E.I.V.) model of the processes $\{y_1(t)\}$, $\{y_2(t)\}$. Here $y_1(t)$ and $y_2(t)$ are represented as "noisy" observations of the "true" variables $\hat y_1(t)$, $\hat y_2(t)$ obeying the deterministic relation (2.8). Note that the correlation structure of $\{y_1(t)\}$ and $\{y_2(t)\}$ is completely embodied in the relation (2.8), as the noise processes $\{w_k(t)\}$ are mutually uncorrelated and also orthogonal to the "true" variables $\{\hat y_k(t)\}$. An equivalent form of the deterministic link (2.8) between the true variables is obtained by substituting $x(t) = A_1(z)^{-L}\hat y_1(t)$ into the second equation in (1.16), getting

$$
\hat y_2(t) = W(z)\hat y_1(t), \qquad W(z) := A_2(z)A_1(z)^{-L}, \tag{2.10}
$$

or, dually,

$$
\hat y_1(t) = \tilde W(z)\hat y_2(t), \qquad \tilde W(z) := A_1(z)A_2(z)^{-L}. \tag{2.11}
$$

Note that the transfer functions $W(z)$, $\tilde W(z)$ and also the relation (2.8) are invariant under change of generators, $\hat x(t) = T(z)x(t)$ ($T$ nonsingular), and are therefore uniquely attached to the (minimal) splitting subspace $X$ of the model. An important question concerns the existence of models for which $W$ (or $\tilde W$) is a causal transfer function. This is the same as asking if two stationary processes described by an arbitrary joint spectrum $S$ can be represented by the "noisy" input-output model

$$
y_1(t) = \hat y_1(t) + w_1(t), \qquad y_2(t) = W(z)\hat y_1(t) + w_2(t), \tag{2.12}
$$

where $W(z)$ is causal and $\{w_1(t)\} \perp \{\hat y_1(t)\} \perp \{w_2(t)\}$. We shall take up this kind of question in Section 4.

As a last general comment about F.A. models, we remark that the freedom of changing generators in the factor space $X$ permits us to choose transfer matrices $A_k(z)$ or factor processes of very special structure. For example we can always take $\{x(t)\}$ to be a white noise process or require that both $A_1(z)$ and $A_2(z)$ be causal transfer functions.
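The invariance of $W(z)$ under a change of generators can be seen directly in the scalar case, where a left inverse is just a reciprocal. The following sketch evaluates everything at one point of the unit circle; the FIR filters, the frequency and the convention $A(z) = \sum_k a_k z^{-k}$ are assumptions made only for this illustration.

```python
import cmath

def fir(taps, theta):
    """Frequency response sum_k a_k e^{-i k theta} of taps given as {lag: coefficient}."""
    return sum(c * cmath.exp(-1j * k * theta) for k, c in taps.items())

def transfer_w(a1, a2):
    """W = A2 * A1^{-L}; in the scalar case the left inverse is the reciprocal."""
    return a2 / a1

THETA = 0.7                                   # arbitrary frequency
a1 = fir({0: 1.0, 1: 0.5}, THETA)             # hypothetical A1
a2 = fir({0: 0.8, 1: -0.2}, THETA)            # hypothetical A2
t = fir({0: 1.0, 1: 0.3}, THETA)              # hypothetical nonsingular T(z)

w_before = transfer_w(a1, a2)
w_after = transfer_w(a1 / t, a2 / t)          # A_k T^{-1}, as in (2.2)
```

The factor $T^{-1}$ cancels in the ratio, so the two values of $W$ coincide, while $A_1$ and $A_2$ individually do change.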

For simplicity we shall state the next result for the case of rational F.A. models.

PROPOSITION 2.1 For every rational irreducible F.A. model there is a choice of (minimal) generators in $X$, $\hat x(t) = T(z)x(t)$, which (maintains rationality and) achieves causality of the transfer function matrices

$$
\hat A_k(z) = A_k(z)T(z)^{-1}, \qquad k = 1,2. \tag{2.13}
$$

Proof: In a rational model both the spectrum $Q$ and the matrices $A_k$, $k = 1,2$, are rational functions. Since the joint spectrum of the processes $\hat y_k(t) = A_k(z)x(t)$, $k = 1,2$,

$$
\hat S = \begin{bmatrix} \hat S_1 & S_{12} \\ S_{21} & \hat S_2 \end{bmatrix} = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} Q \begin{bmatrix} A_1^* & A_2^* \end{bmatrix}, \tag{2.14}
$$

is then itself a rational function, it admits causal (in particular minimum phase) rational spectral factors. Note that irreducibility implies that $\mathrm{rank}\,\hat S = n = \mathrm{rank}\,S_{12}$. Pick a causal full rank rational spectral factor $\hat A$ (of dimension $m \times n$) of $\hat S$ and write it as a partitioned matrix with two blocks $\hat A_k(z)$ of dimensions $m_k \times n$, $k = 1,2$. The spectral factorization

$$
\hat S(z) = \begin{bmatrix} \hat A_1(z) \\ \hat A_2(z) \end{bmatrix} \begin{bmatrix} \hat A_1(z)^* & \hat A_2(z)^* \end{bmatrix}
$$

is clearly equivalent to the representations $\hat y_k(t) = \hat A_k(z)\hat x(t)$, $k = 1,2$, with $\{\hat x(t)\}$ an $n$-dimensional white noise process. We interpret $\{\hat x(t)\}$ as the new factor process of the model. Since $\hat A(z)$ is full rank, we can solve for $\hat x(t)$ in the representation

$$
\begin{bmatrix} \hat y_1(t) \\ \hat y_2(t) \end{bmatrix} = \begin{bmatrix} A_1(z) \\ A_2(z) \end{bmatrix} x(t) = \begin{bmatrix} \hat A_1(z) \\ \hat A_2(z) \end{bmatrix} \hat x(t),
$$

getting

$$
\hat x(t) = \begin{bmatrix} \hat A_1(z) \\ \hat A_2(z) \end{bmatrix}^{-L} \begin{bmatrix} A_1(z) \\ A_2(z) \end{bmatrix} x(t) := T(z)x(t).
$$

Note that $T$ is square $n \times n$ and nonsingular because of irreducibility. This proves Proposition 2.1.

[]

In the proof we could in particular have chosen $\hat A(z) = [\hat A_1(z)'\ \ \hat A_2(z)']'$ minimum phase. We see that an irreducible rational F.A. model can always be written as a pair of ARMAX equations,

$$
D_1(z)y_1(t) = B_1(z)\hat x(t) + C_1(z)e_1(t), \qquad D_2(z)y_2(t) = B_2(z)\hat x(t) + C_2(z)e_2(t), \tag{2.15}
$$

with $\{\hat x(t)\}$, $\{e_1(t)\}$, $\{e_2(t)\}$ pairwise uncorrelated white noise processes and $D_k(z)$ and $C_k(z)$ stable polynomial matrices of dimensions $m_k \times m_k$ and $m_k \times p_k$, $p_k$ being the multiplicity of the noise process $\{w_k(t)\}$, $k = 1,2$.

3. Stochastic realization

The main problem of this section will be to describe the class of all irreducible F.A. models which match a given spectral density matrix. We shall see that this is equivalent to solving the following problem.

PROBLEM P.1 Given an $m \times m$ spectral density matrix $S$ partitioned as in (1.3) and satisfying the Basic Assumption of Sect. 1, find all 5-tuples of matrix functions $\{A_1, A_2, Q, R_1, R_2\}$ on the unit circle, with $A_k$ of dimension $m_k \times n$ and of rank $n$, $Q$ of dimension $n \times n$ and nonsingular, $R_k$ of dimension $m_k \times m_k$, $k = 1,2$, which

i) satisfy the system of equations

$$
S_1 = A_1QA_1^* + R_1, \qquad S_{12} = A_1QA_2^*, \qquad S_2 = A_2QA_2^* + R_2, \tag{3.1}
$$

ii) make the $(m+n) \times (m+n)$ matrix

$$
\begin{bmatrix} S_1 & S_{12} & A_1Q \\ S_{12}^* & S_2 & A_2Q \\ QA_1^* & QA_2^* & Q \end{bmatrix} \tag{3.2}
$$

into a spectral density matrix (in particular Hermitian and nonnegative definite on the unit circle).

[]

Assume we have an irreducible F.A. model,

$$
z_1(t) = A_1(z)x(t) + w_1(t), \qquad z_2(t) = A_2(z)x(t) + w_2(t). \tag{3.3}
$$

If we interpret $Q$ as the spectral density matrix of $\{x(t)\}$ and $R_k$, $k = 1,2$, as the spectra of the two noise processes $\{w_k(t)\}$, we see that eqns. (3.1) express precisely the fact that the joint spectrum of $\{z_1(t)\}$ and $\{z_2(t)\}$ coincides with the given joint spectrum $S$. Note also that the matrix in (3.2) is just the joint spectral density of the three processes $\{z_1(t)\}$, $\{z_2(t)\}$ and $\{x(t)\}$.

Vice versa, assume we are given a 5-tuple $\{A_1, A_2, Q, R_1, R_2\}$ of matrices satisfying eqns. (3.1) and condition (ii). It is not hard to see, and we shall check this later, that condition (ii) implies that $Q$, $R_1$, $R_2$ are necessarily bounded Hermitian positive semidefinite ($Q$ is actually positive definite) matrices on the unit circle and can therefore be interpreted as spectral densities of three mutually uncorrelated zero mean Gaussian processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$. Starting from these processes, we generate $\{z_1(t)\}$ and $\{z_2(t)\}$ by the linear transformation (3.3). We see from (3.1) that the joint spectrum of the stationary processes $\{z_1(t)\}$, $\{z_2(t)\}$ is precisely equal to the given joint spectral density matrix $S$. In short, solving Problem P.1 is the same thing as finding all irreducible F.A. models (3.3) for which the joint spectrum of the external variables $\{z_1(t)\}$, $\{z_2(t)\}$ is equal to the given spectral density matrix $S$.

This problem is a distributional or "weak sense" stochastic realization problem (Finesso and Picci, 1984 and Lindquist and Picci, 1985). Interpreting $S$ as the joint spectrum of two given Gaussian processes $\{y_1(t)\}$, $\{y_2(t)\}$, we are looking for all irreducible models (3.3) such that $\{z_k(t)\}$ and $\{y_k(t)\}$ are equal processes in distribution. In "practical" terms this means that the model (3.3) will only be useful to simulate the signals $\{y_k(t)\}$ in an "average" sense but not samplewise in general.

A 5-tuple $\{A_1, A_2, Q, R_1, R_2\}$ satisfying conditions (i) and (ii) above, or, equivalently, an F.A. model of the type (3.3) matching the given spectrum $S$, will be called an F.A. representation of the spectrum $S$. A (strong sense) F.A. representation of the processes $\{y_1(t)\}$, $\{y_2(t)\}$ is instead an F.A. model of the type (3.3) for which $z_k(t) = y_k(t)$ almost surely for all $t \in \mathbb{Z}$. This type of (samplewise) equality is clearly stronger than equality in distribution and can only occur when the processes $\{z_k(t)\}$ and $\{y_k(t)\}$ are defined on the same probability space. This means that the various processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ in (3.3) must be built in such a way that $H_o := H(x,w_1,w_2) \supset H(y_1,y_2) = H(z_1,z_2)$. Samplewise (i.e. strong sense) F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$ can be classified according to "how big" an underlying space $H_o$ is needed to support the processes which specify the model. Later we shall study in some detail the class of F.A. representations for which $H_o = H(y_1,y_2)$. These representations will be called "y-measurable" (o).

(o) Clearly an equivalent condition for y-measurability is that the factor space $X$ is included in $H(y)$.
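The link between condition (ii) of Problem P.1 and the nonnegativity of the noise spectra can be made concrete in the scalar case. The sketch below, with hypothetical numbers at a single frequency, builds $S_1$, $S_{12}$, $S_2$ from (3.1) and then forms the Schur complement of the $Q$ block in the matrix (3.2). Under (3.1) that Schur complement should come out as $\mathrm{diag}(R_1, R_2)$, so that, for $Q > 0$, the matrix (3.2) is nonnegative definite exactly when $R_1 \ge 0$ and $R_2 \ge 0$.

```python
def joint_spectrum(a1, a2, q, r1, r2):
    """Scalar version of (3.1) at one frequency; a1, a2 complex, q, r1, r2 real."""
    s1 = abs(a1) ** 2 * q + r1
    s12 = a1 * q * a2.conjugate()
    s2 = abs(a2) ** 2 * q + r2
    return s1, s12, s2

def schur_complement_of_q(a1, a2, q, s1, s12, s2):
    """Schur complement of the Q block in the 3x3 matrix (3.2), as a 2x2 nested list."""
    b1, b2 = a1 * q, a2 * q                      # the blocks A1 Q and A2 Q
    return [
        [s1 - b1 * b1.conjugate() / q, s12 - b1 * b2.conjugate() / q],
        [s12.conjugate() - b2 * b1.conjugate() / q, s2 - b2 * b2.conjugate() / q],
    ]

# Hypothetical values at one frequency.
a1, a2 = complex(1.0, 0.5), complex(0.4, -0.8)
q, r1, r2 = 2.0, 0.3, 0.7
s1, s12, s2 = joint_spectrum(a1, a2, q, r1, r2)
comp = schur_complement_of_q(a1, a2, q, s1, s12, s2)
```

The off-diagonal entries of `comp` vanish and its diagonal returns the two noise spectra, which is the scalar form of the uncorrelatedness of $w_1$ and $w_2$.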

Note that whenever $\{x(t)\}$ is given, the noise processes $\{w_k(t)\}$ are automatically fixed as functions of $\{x(t)\}$, $\{y_1(t)\}$, $\{y_2(t)\}$ by the orthogonality condition (1.17), as

$$
w_k(t) = y_k(t) - E^X y_k(t), \qquad k = 1,2, \tag{3.4}
$$

where $X$ is the splitting subspace generated by $\{x(t)\}$. Therefore a (strong) F.A. representation is completely specified once the factor process $\{x(t)\}$ is assigned as a function of some available generators of the space $H_o$. In particular a y-measurable representation is completely specified once $\{x(t)\}$ is given as a function of $\{y_1(t)\}$ and $\{y_2(t)\}$.

In order to avoid complicated statements about equivalence classes, it will be useful to fix once and for all a rule for choosing generators in each factor space $X$. A convenient way to do this is to fix a full rank factorization of the cross spectrum $S_{12}$,

$$
S_{12}(z) = H(z)G^*(z), \tag{3.5}
$$

where $H$ and $G$ are of respective dimensions $m_1 \times n$, $m_2 \times n$ and of rank equal to $n = \mathrm{rank}\,S_{12}$ a.e. on $C$. Since $S_{12}$ is rational, we can always choose $H$ and $G$ to be rational matrices. In fact we shall choose $H$ and $G$ in such a way that (3.5) is a minimal factorization of the rational matrix $S_{12}$ (in the sense of Bart, Gohberg and Kaashoek (1979), p. 84). Since all entries of a rational spectral density matrix must be analytic on the unit circle, it follows that both $H(z)$ and $G(z)$ must also be analytic on the unit circle. In the following we shall make the simplifying assumption that $S_{12}(z)$ has no zeros on the unit circle, i.e.

$$
\mathrm{rank}\,S_{12}(e^{i\theta}) = n, \qquad \forall\,\theta \in [0,2\pi). \tag{3.6}
$$

This guarantees that neither $H(z)$ nor $G(z)$ can have zeros on the unit circle; more precisely, both $H(e^{i\theta})$ and $G(e^{i\theta})$ will be of constant rank $n$ for all $\theta \in [0,2\pi)$. From now on the matrices $H$ and $G$ will be considered as data of our problem.

LEMMA 3.1 Let condition (3.6) hold. Then for each equivalence class of irreducible F.A. models of $\{y_1(t)\}$, $\{y_2(t)\}$ there is a unique choice of generating process $\{x(t)\}$ in the factor space $X$ such that

$$
A_1(z) = H(z), \qquad A_2(z) = G(z)Q(z)^{-1}, \tag{3.7}
$$

where $Q$ is the (nonsingular) spectrum of $\{x(t)\}$. Alternatively, a unique generating process $\{\bar x(t)\}$ can be chosen for which

$$
A_1(z) = H(z)\bar Q(z)^{-1}, \qquad A_2(z) = G(z), \tag{3.8}
$$

where $\bar Q$ is the spectrum of $\{\bar x(t)\}$. The generating processes $\{x(t)\}$, $\{\bar x(t)\}$ for the same minimal splitting subspace $X$ are related by the transformation

$$
\bar x(t) = Q^{-1}(z)x(t). \tag{3.9}
$$

Proof: In fact, if we start with an arbitrary irreducible model (1.16), there is a unique change of generators in $X$, $\hat x(t) = T(z)x(t)$, with $T$ such that $H(z)T(z) = A_1(z)$. Note that there is a unique a.e. nonsingular solution to this equation as both $A_1$ and $H$ are of full rank $n$. Moreover $T \in L^2(C, Q\,d\theta)$ where $Q$ is the spectral density of $\{x(t)\}$. This follows from $T(z) = H(z)^{-L}A_1(z)$, because $A_1 \in L^2(C, Q\,d\theta)$ and any left inverse of $H(z)$ is analytic on the unit circle, in force of assumption (3.6). With this choice we get

$$
S_{12}(z) = H(z)\hat Q(z)\bigl(T(z)^{-1}\bigr)^*A_2(z)^*,
$$

where $\hat Q$ is the spectrum of $\{\hat x(t)\}$. From (3.5) it then follows that $A_2(z)T(z)^{-1} = G(z)\hat Q(z)^{-1}$. Similarly, by choosing $\bar x(t) = T(z)x(t)$ with $G(z)T(z) = A_2(z)$, we obtain (3.8). In particular, when $\{x(t)\}$ is already the generator of (3.7), we find $T = Q^{-1}$.

[]

By choosing the generators as stated in Lemma 3.1, we get a unique irreducible F.A. model representative of each minimal splitting subspace $X$. These models, for the two different choices (3.7) and (3.8), can be written as

$$
y_1(t) = H(z)x(t) + w_1(t), \qquad y_2(t) = G(z)Q(z)^{-1}x(t) + w_2(t), \tag{3.10}
$$

and, respectively, as

$$
y_1(t) = H(z)\bar Q(z)^{-1}\bar x(t) + w_1(t), \qquad y_2(t) = G(z)\bar x(t) + w_2(t). \tag{3.11}
$$

We shall call "first" and "second" type canonical forms the two representations (3.10) and (3.11). Clearly each equivalence class of irreducible F.A. representations of a given spectrum $S$ can in turn be represented by a unique 5-tuple $\{H,\ GQ^{-1},\ Q,\ R_1,\ R_2\}$ or by $\{H\bar Q^{-1},\ G,\ \bar Q,\ R_1,\ R_2\}$. Note that $R_1$ and $R_2$ are uniquely determined from the equalities (3.1) as functions of $S_1$, $A_1$, $Q$ and $S_2$, $A_2$, $Q$. We conclude that all irreducible F.A. representations of the spectrum $S$, written in the first canonical form, are parametrized in a one-to-one way by the $n \times n$ nonsingular matrix function $Q$ as

$$
\{H,\ GQ^{-1},\ Q,\ S_1 - HQH^*,\ S_2 - GQ^{-1}G^*\}, \tag{3.12}
$$

where $Q$ is constrained to satisfy the condition that the matrix

$$
\begin{bmatrix} S_1 & HG^* & HQ \\ GH^* & S_2 & G \\ QH^* & G^* & Q \end{bmatrix} \tag{3.13}
$$

be a spectral density. Dually, all irreducible F.A. representations of $S$ written in the second canonical form are parametrized in a one-to-one way by the nonsingular $n \times n$ matrix function $\bar Q$ as

$$
\{H\bar Q^{-1},\ G,\ \bar Q,\ S_1 - H\bar Q^{-1}H^*,\ S_2 - G\bar QG^*\}, \tag{3.14}
$$

where $\bar Q$ is constrained by the condition that the matrix

$$
\begin{bmatrix} S_1 & HG^* & H \\ GH^* & S_2 & G\bar Q \\ H^* & \bar QG^* & \bar Q \end{bmatrix} \tag{3.15}
$$

be a spectral density function.

At this point we are ready to describe the solution set of our stochastic realization problem P.1. We introduce the $n \times n$ Hermitian matrices

$$
\bar Q_1 := H^*S_1^{-1}H, \qquad Q_2 := G^*S_2^{-1}G, \tag{3.16}
$$

and set

$$
Q_1 := \bar Q_1^{-1}, \qquad \bar Q_2 := Q_2^{-1}. \tag{3.17}
$$

Note that both $Q_1$ and $Q_2$ are strictly positive definite rational spectral density matrices in force of condition (3.6) and our standing assumptions on $S$. We also define the $n \times n$ Hermitian matrices

$$
\Delta := Q_1 - Q_2, \qquad \bar\Delta := \bar Q_2 - \bar Q_1. \tag{3.18}
$$
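A scalar sketch of the quantities in (3.16) to (3.18) follows. It assumes the reading $\bar Q_1 = H^*S_1^{-1}H$, $Q_2 = G^*S_2^{-1}G$, $Q_1 = \bar Q_1^{-1}$, $\bar Q_2 = Q_2^{-1}$ of those definitions, and uses hypothetical values of $S_1$, $S_2$, $H$, $G$ at one frequency. In the scalar case $\Delta = (S_1S_2 - |S_{12}|^2)/(|H|^2S_2)$, so $\Delta > 0$ is just positive definiteness of the joint spectrum.

```python
def extremal_spectra(s1, s2, h, g):
    """Scalar versions of (3.16)-(3.18) at a single frequency."""
    q1_bar = abs(h) ** 2 / s1        # \bar Q_1 = H^* S1^{-1} H
    q2 = abs(g) ** 2 / s2            # Q_2 = G^* S2^{-1} G
    q1 = 1.0 / q1_bar                # Q_1 = \bar Q_1^{-1}
    q2_bar = 1.0 / q2                # \bar Q_2 = Q_2^{-1}
    return q1, q2, q1 - q2, q2_bar - q1_bar

# Hypothetical data: S12 = H G^* must satisfy |S12|^2 < S1 S2.
s1, s2 = 2.0, 3.0
h, g = complex(1.0, 1.0), complex(0.5, -0.5)
s12 = h * g.conjugate()
q1, q2, delta, delta_bar = extremal_spectra(s1, s2, h, g)
```

Both differences are positive here, which is the scalar form of $Q_1 > Q_2$ and $\bar Q_2 > \bar Q_1$.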

THEOREM 3.1 All irreducible F.A. representations of the spectrum $S$ written in the first canonical form (3.12) are parametrized by the solutions $Q$ of the matrix inequality

$$
Q - Q_2 - (Q - Q_2)\Delta^{-1}(Q - Q_2)^* \ge 0. \tag{3.19}
$$

Dually, all irreducible F.A. representations of $S$ written in the second canonical form (3.14) are parametrized by the solutions of the inequality

$$
\bar Q - \bar Q_1 - (\bar Q - \bar Q_1)\bar\Delta^{-1}(\bar Q - \bar Q_1)^* \ge 0. \tag{3.20}
$$

An $n \times n$ matrix function $Q$ solves (3.19) if and only if $\bar Q = Q^{-1}$ solves (3.20). All solutions $Q$ ($\bar Q$) of (3.19) (resp. (3.20)) are Hermitian, bounded and strictly positive definite; in fact they satisfy

$$
Q_1 \ge Q \ge Q_2, \qquad \bar Q_2 \ge \bar Q \ge \bar Q_1, \tag{3.21}
$$

where $Q_1$ and $Q_2$ ($\bar Q_1$ and $\bar Q_2$) are the spectral densities defined by (3.16) and (3.17).


Proof: What needs to be shown is that an $n \times n$ matrix function $Q$ makes (3.13) a spectral density matrix if and only if it satisfies the quadratic inequality (3.19). Assume there is a $Q$ making (3.13) into a spectral density matrix. Then, by a standard block diagonalization procedure, the positive semidefiniteness of (3.13) is seen to be equivalent to

$$
\begin{aligned}
& S_2 > 0, \\
& \tilde S_1 := S_1 - S_{12}S_2^{-1}S_{21} > 0, \\
& Q - G^*S_2^{-1}G - (Q - G^*S_2^{-1}G)\,H^*\tilde S_1^{-1}H\,(Q - G^*S_2^{-1}G)^* \ge 0.
\end{aligned} \tag{3.22}
$$

The first two inequalities are trivially satisfied. In fact, by our Basic Assumption on $S$, $S_2$ and $\tilde S_1$ are strictly positive definite on the whole of $C$. By simple matrix manipulations it can be checked that

$$
H^*\tilde S_1^{-1}H = H^*(S_1 - HQ_2H^*)^{-1}H = (Q_1 - Q_2)^{-1} \tag{3.23}
$$

and therefore, recalling our notations (3.18), we see that $Q$ has to satisfy the inequality (3.19). Note that $Q$ makes the matrix (3.13) positive semidefinite if and only if $\bar Q = Q^{-1}$ makes (3.15) positive semidefinite. This in turn happens if and only if $\bar Q$ satisfies the dual inequality (3.20), as can be seen by exactly the same argument used before. Thus $Q$ satisfies (3.19) if and only if $Q^{-1}$ satisfies (3.20).

Observe now that the matrix $\Delta^{-1}$, given by the expression (3.23), is strictly positive definite Hermitian on the unit circle and therefore any solution $Q$ to (3.19) makes $Q - Q_2$ positive semidefinite Hermitian. Hence $Q$ is Hermitian and $Q \ge Q_2$. Similarly any solution $\bar Q$ of (3.20) satisfies $\bar Q \ge \bar Q_1$. Then, writing $\bar Q$ as $Q^{-1}$, we obtain the first inequality in (3.21). So, any solution to (3.19) has a lower ($Q_2$) and an upper ($Q_1$) bound, $Q_2$ being strictly positive definite and $\bar Q_1^{-1}$ being trivially bounded on $C$. It follows that any solution to (3.19) is a spectral density matrix. The matrix (3.13) constructed from such a solution is also Hermitian positive semidefinite and has bounded entries on the unit circle. Therefore it is a spectral density matrix. []

The solution set of the inequalities (3.19), (3.20) can be described quite explicitly.

THEOREM 3.2 An $n \times n$ matrix valued function $Q$ on the unit circle solves the inequality (3.19) if and only if it is Hermitian and $Q_1 \ge Q \ge Q_2$. Dually, an $n \times n$ matrix $\bar Q$ solves (3.20) if and only if it is Hermitian and satisfies $\bar Q_2 \ge \bar Q \ge \bar Q_1$.

Proof: The "only if" part is already contained in the statement of Theorem 3.1. We only need to prove the "if" part. Assume first that $Q_1 > Q > Q_2$ (with strict inequalities) holds. Then $(Q_1 - Q)$ and $(Q - Q_2)$ are both Hermitian strictly positive definite and therefore $(Q - Q_2)^{-1} + (Q_1 - Q)^{-1}$ is strictly positive definite. By a well-known formula for the inverse of a sum of matrices we see that this positivity condition is equivalent to

$$
Q - Q_2 - (Q - Q_2)\Delta^{-1}(Q - Q_2) > 0. \tag{3.24}
$$

Now, every $Q$ satisfying $Q_1 \ge Q \ge Q_2$ can be approximated in $L^\infty_{n\times n}(C)$ by a sequence of matrices $Q_k$ for which the strict inequalities hold. Take for instance

$$
Q_k = \frac{k-1}{k}\,Q + \frac{1}{2k}\,(Q_1 + Q_2),
$$

for which apparently $Q_k - Q_2 > 0$ and $Q_1 - Q_k > 0$. Hence $Q_k$ satisfies the strict inequality (3.24). But the left hand side of (3.24) is a positive definite matrix which is a continuous function of $Q_k$ and, as $k \to \infty$, it can at most become positive semidefinite.

[]

REMARK As a corollary of Theorems 3.1, 3.2 we obtain that the inequality $Q_1 \ge Q \ge Q_2$ is equivalent to

$$
S_1 \ge HQH^*, \qquad S_2 \ge GQ^{-1}G^*, \qquad Q > 0,
$$

which form in turn an equivalent set of conditions to the positivity of the matrix (3.13). This fact in particular guarantees that if $Q$ satisfies (3.21) (or equivalently (3.19)), then the noise spectra $R_1$ and $R_2$ will be (Hermitian and) positive semidefinite. Note that the maximal solution $Q_1$ is in this sense just the matrix which corresponds to the largest approximant of rank $n$ of $S_1$ in the ordering of Hermitian positive semidefinite matrices.

[]
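The equivalences collected in Theorem 3.2 and in the Remark can be tabulated in the scalar case. The sketch below uses hypothetical single-frequency data; it evaluates the left-hand side of (3.19), which for scalars factors as $(Q - Q_2)(Q_1 - Q)/\Delta$, together with the noise spectra $R_1 = S_1 - |H|^2Q$ and $R_2 = S_2 - |G|^2/Q$ of (3.12), for a few trial values of $Q$.

```python
def quadratic_form(q, q1, q2):
    """Left-hand side of (3.19) for scalar Q, with Delta = Q1 - Q2."""
    d = q - q2
    return d - d * d / (q1 - q2)

def noise_spectra(q, s1, s2, h, g):
    """Scalar noise spectra R1 = S1 - H Q H^*, R2 = S2 - G Q^{-1} G^* from (3.12)."""
    return s1 - abs(h) ** 2 * q, s2 - abs(g) ** 2 / q

# Hypothetical data at one frequency.
s1, s2 = 2.0, 3.0
h, g = complex(1.0, 1.0), complex(0.5, -0.5)
q1, q2 = s1 / abs(h) ** 2, abs(g) ** 2 / s2

rows = []
for q in (0.1, 0.3, 0.6, 0.9, 1.2):           # trial values away from the end points
    r1, r2 = noise_spectra(q, s1, s2, h, g)
    rows.append((q, q2 <= q <= q1, quadratic_form(q, q1, q2) >= 0, r1 >= 0 and r2 >= 0))

for row in rows:
    print(row)
```

For each trial value the three flags agree: $Q$ lies between $Q_2$ and $Q_1$ exactly when (3.19) holds and exactly when both noise spectra are nonnegative.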

Theorem 3.1 provides a recipe for computing all irreducible F.A. representations describing a given spectral density matrix $S$ in a fixed coordinate system. We can now easily see that there are many such representations (a fact that we have not bothered to show till now). For example, as the two "extreme" spectra $Q_1$ and $Q_2$ defined in (3.16) and (3.17) both satisfy the inequality (3.19) (with equality sign), we see that there are "maximal" and "minimal" irreducible F.A. representations (in the first canonical form) which correspond respectively to the maximal ($Q_1$) and minimal ($Q_2$) solutions to the inequality (3.19).

Solutions like $Q_1$, $Q_2$ above, for which (3.19) is satisfied with equality sign, have a special meaning. They correspond to joint spectra (3.13) of minimum possible rank, $m$, as can be seen from the block diagonalization (3.22). Since the rank of the joint spectrum of $\{z_1(t)\}$, $\{z_2(t)\}$ and $\{x(t)\}$ is equal to the multiplicity of the doubly invariant subspace $H(x,z_1,z_2)$ spanned by these processes, the multiplicity $m$ of $H(x,z_1,z_2)$ is equal to the multiplicity of the subspace $H(z_1,z_2)$. This can only happen if $H(x,z_1,z_2) = H(z_1,z_2)$, or, what is the same, if $x(t) \in H(z_1,z_2)$ for all $t \in \mathbb{Z}$. We see that all models which correspond to solutions $Q$ of (3.19) with equality sign are characterized by the fact that the factor process $\{x(t)\}$ is a function of $\{z_1(t)\}$, $\{z_2(t)\}$. This observation is the key to the following result.

PROPOSITION 3.1 The solutions $Q$ to the quadratic matrix equation

$$
Q - Q_2 - (Q - Q_2)\Delta^{-1}(Q - Q_2)^* = 0 \tag{3.25}
$$

parametrize in a one-to-one way the (strong) irreducible y-measurable representations of the processes $\{y_1(t)\}$, $\{y_2(t)\}$ of the form (3.10). Dually, all solutions $\bar Q$ to the quadratic equation

$$
\bar Q - \bar Q_1 - (\bar Q - \bar Q_1)\bar\Delta^{-1}(\bar Q - \bar Q_1)^* = 0 \tag{3.26}
$$

parametrize in a one-to-one way the (strong) irreducible F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$ of the form (3.11) for which $X \subset H(y)$.

Proof: Consider an F.A. representation of the type (3.10). If the factor space $X$ is contained in $H(y)$, then $H(x,y_1,y_2) = H(y_1,y_2) = H(y)$ and hence the joint spectrum of $\{y_1(t)\}$, $\{y_2(t)\}$, $\{x(t)\}$ has rank $m$. This implies that the spectrum $Q$ of $\{x(t)\}$ satisfies (3.19) with equality sign. Vice versa, assume $Q$ is a solution of (3.25). Then, as discussed previously, the factor process of the F.A. model of type (3.3) attached to the weak realization $\{H,\ GQ^{-1},\ Q,\ S_1 - HQH^*,\ S_2 - GQ^{-1}G^*\}$ of the spectrum $S$ has the property that $x(t)$ belongs to $H(z_1,z_2)$ for all $t$. It can therefore be written as $x(t) = P_1(z)z_1(t) + P_2(z)z_2(t)$, where $P_i(z)$, $i = 1,2$, are $n \times m_i$ transfer matrices. Define an $n$-dimensional process $\{\hat x(t)\}$ by setting

$$
\hat x(t) = P_1(z)y_1(t) + P_2(z)y_2(t). \tag{3.27}
$$

Then $\{\hat x(t)\}$, $\{y_1(t)\}$, $\{y_2(t)\}$ have exactly the same joint second order statistics (i.e. the same spectrum) as $\{x(t)\}$, $\{z_1(t)\}$, $\{z_2(t)\}$. Since conditional orthogonality depends on joint second order moments only, it then follows that $\hat X := \mathrm{span}\,\{\hat x(t);\ t \in \mathbb{Z}\}$ is splitting for $H(y_1)$, $H(y_2)$ exactly as the factor space $X$ was splitting for $H(z_1)$, $H(z_2)$. Hence $\{\hat x(t)\}$ is the factor process of a strong F.A. representation of the type (3.10). By construction $\{\hat x(t)\}$ has spectral density matrix equal to $Q$ and $\hat X \subset H(y)$. []

Let us define the stationary $n$-dimensional processes

$$
x_1(t) = Q_1(z)\bar x_1(t), \qquad \bar x_1(t) = H^*(z)S_1(z)^{-1}y_1(t), \qquad x_2(t) = G^*(z)S_2(z)^{-1}y_2(t), \tag{3.28}
$$

where $Q_1$ is defined by (3.16), (3.17). Observe that the spectra of $\{x_1(t)\}$ and $\{x_2(t)\}$ are precisely the extremal solutions $Q_1$, $Q_2$ of the quadratic inequality (3.19). It is immediate to check that $\{x_1(t)\}$ and $\{x_2(t)\}$ are minimal generators for the subspaces

$$
X_1 := \bar E^{H(y_1)}H(y_2), \qquad X_2 := \bar E^{H(y_2)}H(y_1).
$$

(In fact, for example, $X_2$ is generated by $\hat y_1(t) = S_{12}(z)S_2(z)^{-1}y_2(t) = H(z)x_2(t)$.) Moreover both $X_1$ and $X_2$ are minimal splitting subspaces (compare e.g. Lindquist, Picci and Ruckebusch, 1979),

$$
X_1 \subset H(y_1), \qquad X_2 \subset H(y_2),
$$

therefore they specify two equivalence classes of strong irreducible F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$. The particular generators $\{x_1(t)\}$ and $\{x_2(t)\}$ defined in (3.28) correspond to choosing these representations in the first canonical form, namely

$$
y_1(t) = H(z)x_1(t) + w_{1,1}(t), \qquad y_2(t) = G(z)Q_1(z)^{-1}x_1(t) + w_{1,2}(t), \tag{3.29}
$$

and

$$
y_1(t) = H(z)x_2(t) + w_{2,1}(t), \qquad y_2(t) = G(z)Q_2(z)^{-1}x_2(t) + w_{2,2}(t). \tag{3.30}
$$

Observe that in the representation (3.29) the second equation is just the decomposition of $y_2(t)$ as the sum of the (noncausal) estimate $\hat y_2(t) = S_{21}(z)S_1(z)^{-1}y_1(t)$ and of the corresponding estimation error. The first equation is more interesting. It can be rewritten in the form

$$
y_1(t) = \Pi_H(z)y_1(t) + (I - \Pi_H(z))y_1(t), \tag{3.31}
$$

where $\Pi_H$ is the projection valued matrix function

$$
\Pi_H(z) = H(z)\bigl(H^*(z)S_1(z)^{-1}H(z)\bigr)^{-1}H^*(z)S_1(z)^{-1} \tag{3.32}
$$

mapping onto the column space of $H$. Note that $\Pi_H$ is $S_1$-orthogonal, i.e. $\Pi_HS_1(I - \Pi_H)^* = 0$ a.e. on the unit circle. Thus $x_1$ formally looks like the classical least squares estimate of $x$ in the linear model $y_1 = Hx + w$. An analogous interpretation holds for the second equation in (3.30).

The next theorem describes quite explicitly the family of all (strong) y-measurable irreducible F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$.
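The least squares reading of (3.31) and (3.32) can be checked with a small numerical sketch. It works with real numbers at a single frequency, so that $*$ reduces to transposition, and takes a hypothetical $2 \times 1$ matrix $H$ and a hypothetical positive definite $S_1$; the helper functions are written out by hand and are not library calls.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def ls_projection(h, s1):
    """Pi_H = H (H' S1^{-1} H)^{-1} H' S1^{-1} for a 2x1 column H, as in (3.32)."""
    row = matmul(transpose(h), inv2(s1))          # H' S1^{-1}, a 1 x 2 row
    gain = 1.0 / matmul(row, h)[0][0]             # (H' S1^{-1} H)^{-1}, scalar here
    return matmul(h, [[gain * v for v in row[0]]])

H = [[1.0], [2.0]]                 # hypothetical value of H at one frequency
S1 = [[2.0, 0.5], [0.5, 1.0]]      # hypothetical positive definite S1
PI = ls_projection(H, S1)
I_MINUS_PI = [[(1.0 if i == j else 0.0) - PI[i][j] for j in range(2)] for i in range(2)]
PI_SQUARED = matmul(PI, PI)
ORTHOGONALITY = matmul(matmul(PI, S1), transpose(I_MINUS_PI))
```

In this example `PI_SQUARED` reproduces `PI` and `ORTHOGONALITY` vanishes, which says that the two terms of (3.31) are uncorrelated: the fitted part $Hx_1$ and the residual $w_{1,1}$.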

THEOREM 3.3 The factor process of any irreducible y-measurable F.A. representation (in the first canonical form) is a combination of $\{x_1(t)\}$ and $\{x_2(t)\}$ of the form

$$
x(t) = \Pi(z)x_1(t) + (I - \Pi(z))x_2(t), \tag{3.33}
$$

where

$$
\Pi = (Q - Q_2)\Delta^{-1} \tag{3.34}
$$

is a $\Delta$-orthogonal projection valued matrix function on the unit circle.

Proof: The proof relies on the easily checked fact that $E[x_1(t) \mid H(x_2)] = x_2(t)$. Then $\tilde x(t) := x_1(t) - x_2(t)$ forms a process which is orthogonal to $\{x_2(t)\}$ and so $H(x_1,x_2) = H(\tilde x) \oplus H(x_2)$, where the direct sum is orthogonal. Now, any minimal splitting subspace $X \subset H(y)$ is actually contained in $H(x_1,x_2) \subset H(y)$ (Lindquist and Picci, 1985), so that the corresponding factor process $\{x(t)\}$ can be expressed as

$$
x(t) = S_{x,\tilde x}(z)\Delta(z)^{-1}\tilde x(t) + S_{x,x_2}(z)Q_2(z)^{-1}x_2(t), \tag{3.35}
$$

where the cross spectra are easily computed from

$$
S_{x,x_1} = S_{x,y_1}S_1^{-1}HQ_1 = Q, \qquad S_{x,x_2} = S_{x,y_2}S_2^{-1}G = Q_2.
$$

Equation (3.35) is exactly the same as (3.33). In order to check that $\Pi$ is a projection, notice that right multiplication of (3.25) by $\Delta^{-1}$ gives

$$
(Q - Q_2)\Delta^{-1} = (Q - Q_2)\Delta^{-1}(Q - Q_2)\Delta^{-1},
$$

which shows that $\Pi = \Pi^2$; moreover (3.25) can be rewritten to look exactly like $\Pi\Delta(I - \Pi)^* = 0$. Thus $\Pi$ is a $\Delta$-orthogonal projection. []

If we couple formula (3.33) with the explicit expressions (3.28) given for $x_1(t)$ and $x_2(t)$, we obtain a linear transformation acting on the "data" $\{y_1(t)\}$, $\{y_2(t)\}$ that we want to represent. This is precisely the rule telling us how the factor process of each y-measurable representation is manufactured. Note that (3.33) is still parametrized by $Q$. To complete the picture we need now to describe the solution set of the quadratic equation (3.25).

PROPOSITION 3.2 Let $V$ be a square spectral factor of the spectral density matrix $\Delta = Q_1 - Q_2$. Then all solutions $Q \neq Q_2$ to (3.25) are given by

$$Q = Q_2 + V\Gamma\Gamma^*V^*, \qquad (3.36)$$

where $\Gamma$ is any $n \times k$ $(k \le n)$ isometric matrix function on the unit circle, i.e. such that

$$\Gamma^*\Gamma = I_k, \qquad (3.37)$$

$I_k$ being the $k \times k$ identity matrix.

Proof: Write $Q - Q_2$, assumed to be of rank $k$, in factorized form as

$$Q - Q_2 = UU^*,$$

with $U$ a full rank spectral factor of dimension $n \times k$. Since $U$ has a left inverse, (3.25) can be reduced to

$$I_k = U^*(VV^*)^{-1}U,$$

from which we see that $\Gamma := V^{-1}U$ satisfies (3.37). []

Note that there is just one $Q$ such that $\mathrm{rank}(Q-Q_2) = n$. In this case $\Gamma$ is square and (3.37) is equivalent to $\Gamma\Gamma^* = I$; hence we obtain the "maximal" solution $Q = Q_1$. At the other extreme, the "minimal" solution $Q = Q_2$ is formally obtained by setting $\Gamma = 0$ in formula (3.36). Observe that by choosing $\Gamma$ varying over the set of rational isometric matrices we have a parametrization of all rational solutions of equation (3.25). In other words, recalling that $H$ and $G$ were chosen rational, we have a parametrization of all rational irreducible y-measurable F.A. representations of the processes $\{y_1(t)\}$ and $\{y_2(t)\}$.
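As a sanity check on the parametrization (3.36), the short computation below verifies that $Q = Q_2 + V\Gamma\Gamma^*V^*$ solves the quadratic equation. It assumes that (3.25) can be written as $Q - Q_2 = (Q-Q_2)\Delta^{-1}(Q-Q_2)$, which is the form suggested by the proof of Theorem 3.3; the statement of (3.25) itself lies outside this passage, so this form is an assumption. It uses only $\Delta = VV^*$ with $V$ square and invertible, and $\Gamma^*\Gamma = I_k$.

```latex
\begin{aligned}
(Q-Q_2)\,\Delta^{-1}\,(Q-Q_2)
  &= V\Gamma\Gamma^*V^*\,(VV^*)^{-1}\,V\Gamma\Gamma^*V^* \\
  &= V\Gamma\,\Gamma^*\bigl[V^*(V^*)^{-1}V^{-1}V\bigr]\Gamma\,\Gamma^*V^* \\
  &= V\Gamma\,(\Gamma^*\Gamma)\,\Gamma^*V^* \\
  &= V\Gamma\Gamma^*V^* \;=\; Q-Q_2 .
\end{aligned}
```

The same computation shows that $\Pi = (Q-Q_2)\Delta^{-1} = V\Gamma\Gamma^*V^{-1}$ is idempotent, in agreement with Theorem 3.3.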


4. Causality

As an application of the characterization obtained in Sect. 3 we shall discuss here the question of causality of the transfer function $W(z)$ defined at the end of Sect. 2. We shall call an F.A. model

$$y_k(t) = \hat y_k(t) + w_k(t), \qquad \hat y_k(t) = A_k(z)x(t), \qquad k = 1,2, \qquad (4.1)$$

causal whenever we can write

$$\hat y_2(t) = W(z)\hat y_1(t) \qquad (4.2)$$

for a causal $m_2 \times m_1$ transfer function matrix. The question is if there are any causal F.A. models for a given pair of processes $\{y_1(t)\}$, $\{y_2(t)\}$ satisfying our Basic Assumption. Note that for irreducible models $W(z) = A_2(z)A_1(z)^{-L}$ and at the effect of (4.2) the choice of the left inverse is immaterial. Therefore an irreducible model will be causal if, for at least one left inverse $A_1^{-L}$, the transfer matrix $A_2A_1^{-L}$ is causal.

We shall need the concept of Wiener-Hopf factorization relative to the unit circle $C$ of the rational matrix $S_{12}(z)$. As noticed in (Fuhrmann and Willems, 1979), the original arguments of (Gohberg and Krein, 1960) can be adapted to cover the nonsquare (singular) case which is of interest here. Recall that by our Basic Assumption and in force of condition (3.6), $S_{12}(z)$ has constant rank $n$ on the unit circle.

LEMMA 4.1 (WIENER-HOPF FACTORIZATION) The rational matrix function $S_{12}(z)$ can be factored as

$$S_{12}(z) = \hat H(z)D(z)\hat G(z)^*, \qquad (4.3)$$

where $\hat H(z)$ and $\hat G(z)$ are $m_1 \times n$ and $m_2 \times n$ causal rational matrices of rank $n$ on the unit circle with a causal left inverse, and $D(z)$ is an $n \times n$ diagonal matrix of the type

$$D(z) = \mathrm{diag}\{z^{-k_1},\ldots,z^{-k_\ell},\, z^{k_{\ell+1}},\ldots,z^{k_n}\}. \qquad (4.4)$$

The integers

$$-k_\ell \le \ldots \le -k_1 < 0 \le k_{\ell+1} \le \ldots \le k_n \qquad (4.5)$$

are uniquely determined and are called the (left) Wiener-Hopf factorization indices of $S_{12}$, relative to $C$.

[]

Note that $D(z)$ can in turn be factored as

$$D(z) = D_1(z)D_2(z)^*, \qquad (4.6)$$

where

$$D_1(z) = \mathrm{diag}\{z^{-k_1},\ldots,z^{-k_\ell},1,\ldots,1\}, \qquad D_2(z) = \mathrm{diag}\{1,\ldots,1,z^{-k_{\ell+1}},\ldots,z^{-k_n}\}. \qquad (4.7)$$

The factorization (4.3) can thus be rewritten in the form

$$S_{12}(z) = [\hat H(z)D_1(z)]\,[\hat G(z)D_2(z)]^*. \qquad (4.8)$$

In this section we shall identify the rational matrix functions $H(z)$ and $G(z)$ of the minimal factorization (3.5) with the two terms within square brackets in (4.8). We shall consider irreducible F.A. models written in the second canonical form

$$y_1(t) = \hat H(z)D_1(z)Q(z)^{-1}x(t) + w_1(t), \qquad y_2(t) = \hat G(z)D_2(z)x(t) + w_2(t), \qquad (4.9)$$

with $Q(z)$ any Hermitian $n \times n$ matrix function satisfying

$$Q_2 \ge Q \ge Q_1, \qquad (4.10)$$

where

$$Q_2 = (D_2^*\hat G^*S_2^{-1}\hat G D_2)^{-1}, \qquad (4.11)$$

$$Q_1 = D_1^*\hat H^*S_1^{-1}\hat H D_1. \qquad (4.12)$$

In this framework the transfer function matrix $W$ relative to an arbitrary irreducible F.A. model is

$$W(z) = \hat G(z)D_2(z)Q(z)D_1(z)^*\hat H(z)^{-L}. \qquad (4.13)$$

LEMMA 4.2 The transfer function $W(z)$ is causal (for at least one choice of the left inverse $\hat H^{-L}$) if and only if $D_2(z)Q(z)D_1(z)^*$ is a causal matrix function.

Proof: (If). Since $\hat H(z)$ is minimum phase there is a causal left inverse. Thus if $D_2QD_1^*$ is causal, $W$ is causal.

(Only if). Since $D_2(z)Q(z)D_1(z)^* = \hat G(z)^{-L}W(z)\hat H(z)$ and $\hat G(z)$ is minimum phase, it follows that $W$ causal implies $D_2QD_1^*$ causal. []

THEOREM 4.1 Under the stated assumptions a causal irreducible F.A. model of $\{y_1(t)\}$, $\{y_2(t)\}$ can only exist if the Wiener-Hopf factorization indices of $S_{12}(z)$ are all nonnegative (i.e. $D_1(z) = I_n$ in (4.7)).

Proof: We show that if $D_1(z) \neq I_n$, or equivalently $\ell > 0$ in (4.5), then $D_2QD_1^*$ cannot be causal. In fact, for $j \le \ell$, the $j$-th diagonal element of $D_2QD_1^*$ is $z^{k_j}q_{jj}(z)$, where $q_{jj}(z)$ is the $j$-th diagonal element of $Q(z)$. By definition of causality we must have (compare (1.9))

$$\int_{-\pi}^{\pi} e^{i\theta(k_j-k)}\,q_{jj}(e^{i\theta})\,d\theta/2\pi = 0$$

for all $k > 0$. By taking complex conjugate and recalling that $q_{jj}$ is a real function, we also obtain

$$\int_{-\pi}^{\pi} e^{i\theta(k-k_j)}\,q_{jj}(e^{i\theta})\,d\theta/2\pi = 0$$

for $k > 0$. Now, if these two relations hold for some $k_j > 0$, they imply that

$$\int_{-\pi}^{\pi} e^{-i\theta h}\,q_{jj}(e^{i\theta})\,d\theta/2\pi = 0$$

for all $h \in \mathbb{Z}$. This is equivalent to $q_{jj}(e^{i\theta}) = 0$ a.e. on $C$ and contradicts the (strict) positive definiteness of $Q(e^{i\theta})$. Thus $D_2QD_1^*$ cannot be causal. []

At this point, to be able to proceed any further we have to introduce the assumption that the Wiener-Hopf indices of $S_{12}$ are all nonnegative, i.e. that

$$D(z) = D_2(z)^* = \mathrm{diag}\{z^{k_1},\ldots,z^{k_n}\}, \qquad (4.14)$$

where

$$0 \le k_1 \le \ldots \le k_n. \qquad (4.15)$$

We next introduce the notion of $n \times n$ matrix trigonometric polynomial $P(z) = [p_{ij}(z)]$ with indices the ordered set of $n$ natural numbers $\{k_1,\ldots,k_n\}$. The $i,j$-th entry of $P(z)$ has the structure

$$p_{ij}(z) = \sum_{k=-k_j}^{k_i} p_{ij}^k z^k. \qquad (4.16)$$

THEOREM 4.2 Assume that the Wiener-Hopf factorization indices of $S_{12}(z)$ are all nonnegative. Then there are causal irreducible F.A. models if and only if there are Hermitian trigonometric polynomial solutions $Q$ to the inequality

$$(D_2^*\hat G^*S_2^{-1}\hat G D_2)^{-1} \ge Q \ge \hat H^*S_1^{-1}\hat H, \qquad (4.17)$$

with indices equal to the factorization indices (4.15) of $S_{12}(z)$.

Proof: By our assumption (4.14) and Lemma 4.2 each causal F.A. model is characterized by $D_2(z)Q(z)$ being causal. By definition this happens if and only if

$$\int_{-\pi}^{\pi} e^{-i\theta(k_i+k)}\,q_{ij}(e^{i\theta})\,d\theta/2\pi = 0$$

for $i,j = 1,\ldots,n$ and all $k > 0$. By taking complex conjugate and recalling that $Q$ is Hermitian, i.e. $q_{ij}(e^{i\theta})^* = q_{ji}(e^{i\theta})$, we also get

$$\int_{-\pi}^{\pi} e^{i\theta(k_j+k)}\,q_{ij}(e^{i\theta})\,d\theta/2\pi = 0$$

for all $k > 0$. Taken together these two relations are equivalent to

$$\int_{-\pi}^{\pi} e^{-i\theta h}\,q_{ij}(e^{i\theta})\,d\theta/2\pi = 0$$

for all $h$ satisfying $h < -k_j$ and $h > k_i$. This shows that $q_{ij}(z)$ has the expression (4.16).

[]
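The band structure (4.16) can be made concrete with a small sketch. The helper below is purely illustrative (the name `allowed_exponents` and the sample indices are assumptions, not part of the text): for given nonnegative indices $k_1 \le \ldots \le k_n$ it lists, for every entry $(i,j)$, the powers of $z$ that may carry a nonzero coefficient, namely $-k_j \le k \le k_i$.

```python
def allowed_exponents(indices):
    """Return {(i, j): exponents k with -k_j <= k <= k_i}, using 1-based (i, j).

    `indices` is the ordered list [k_1, ..., k_n] of nonnegative integers
    playing the role of the factorization indices in (4.15).
    """
    n = len(indices)
    return {
        (i + 1, j + 1): list(range(-indices[j], indices[i] + 1))
        for i in range(n)
        for j in range(n)
    }


# Hypothetical example with indices (k_1, k_2, k_3) = (0, 1, 2).
exponents = allowed_exponents([0, 1, 2])
for (i, j), ks in sorted(exponents.items()):
    print(f"q_{i}{j}(z): powers {ks}")
```

For a Hermitian $Q$ the exponent set of entry $(j,i)$ is the mirror image of that of entry $(i,j)$, which the table produced above reflects.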

Observe that a positive definite trigonometric polynomial can be factored as

$$Q(z) = N(z)N(z)^*, \qquad (4.18)$$

where $N(z)$ is an $n \times n$ polynomial matrix which can be taken row-proper and with row degrees exactly equal to the indices $k_1 \le \ldots \le k_n$ of $Q$. Recalling the remark made after the proof of Theorem 3.2, we can recast the conditions of Theorem 4.2 in terms of the joint spectrum $S$ in the following way.

COROLLARY 4.1 Assume the Wiener-Hopf factorization indices of $S_{12}(z)$ are all nonnegative. Then there are causal irreducible F.A. models if and only if there are $n \times n$ polynomial matrices $N(z)$ with (ordered) row degrees equal to the indices $k_1 \le \ldots \le k_n$ of $S_{12}(z)$ such that

$$S_2 \ge \hat G D_2 N N^* D_2^* \hat G^*, \qquad S_1 \ge \hat H (N^*)^{-1}N^{-1}\hat H^*. \qquad (4.19)$$

At the beginning of this section causality was defined with respect to a certain choice of input ($\hat y_1$) and output ($\hat y_2$) variables. If we choose instead $\hat y_2(t)$ as input and $\hat y_1(t)$ as output, we can of course go through a very similar analysis and obtain analogous conditions for the existence of causal irreducible F.A. models of the type

$$y_1(t) = \tilde W(z)\hat y_2(t) + w_1(t), \qquad y_2(t) = \hat y_2(t) + w_2(t), \qquad (4.20)$$

where $\hat y_1(t)$ is obtained as

$$\hat y_1(t) = \tilde W(z)\hat y_2(t) \qquad (4.21)$$

for a causal $m_1 \times m_2$ transfer function matrix. Relative to the Wiener-Hopf factorization (4.3), $\tilde W$ has the expression

$$\tilde W(z) = \hat H(z)D_1(z)Q(z)D_2(z)^*\hat G(z)^{-L}. \qquad (4.22)$$

THEOREM 4.3 Causal irreducible F.A. models of the type (4.20) can only exist if the Wiener-Hopf factorization indices of $S_{12}(z)$ are all negative or zero, i.e. only if

$$D(z) = D_1(z) = \mathrm{diag}\{z^{-k_1},\ldots,z^{-k_n}\}. \qquad (4.23)$$

In case (4.23) is satisfied, there are causal irreducible F.A. models of the type (4.20) if and only if there are Hermitian trigonometric polynomial solutions $Q$ to the inequality

$$(D_1^*\hat H^*S_1^{-1}\hat H D_1)^{-1} \ge Q \ge \hat G^*S_2^{-1}\hat G, \qquad (4.24)$$

with indices equal to the opposite $\{k_1,\ldots,k_n\}$ of the factorization indices of $S_{12}(z)$. An equivalent condition is the existence of $n \times n$ polynomial matrices $N(z)$ with (ordered) row degrees $k_1,\ldots,k_n$ such that

$$S_1 \ge \hat H D_1 N N^* D_1^* \hat H^*, \qquad S_2 \ge \hat G (N^*)^{-1}N^{-1}\hat G^*. \qquad (4.25)$$

[]

Let us agree to call minimum phase those F.A. models for which both (4.2) and (4.21) are causal input-output relations. Then, as a corollary of Theorems 4.1-4.3, we get that minimum phase models exist only if the Wiener-Hopf factorization of $S_{12}(z)$ has $D(z) = I_n$ (i.e. is "canonical" in the terminology of Gohberg and Krein (1960) and Bart, Gohberg and Kaashoek (1979)). There exist minimum phase F.A. models if and only if there are constant Hermitian $n \times n$ matrices $Q$ for which

$$S_1 \ge \hat H Q \hat H^*, \qquad S_2 \ge \hat G Q^{-1}\hat G^* \qquad (4.26)$$

on the unit circle. We see that the factor process of minimum phase irreducible F.A. models written in either canonical form, for which $H$ and $G$ are chosen equal to the Wiener-Hopf factors, must be an $n$-dimensional white noise process.
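The sign conditions in Theorems 4.1 and 4.3 and in the minimum phase remark can be summarized in a few lines. The sketch below is only a bookkeeping aid under the assumption that the Wiener-Hopf indices of $S_{12}$ are already known as a list of integers; the function name `causal_directions` is hypothetical. It reports the necessary conditions only: a `True` entry means the corresponding kind of model is not ruled out by the indices, not that it exists (existence further requires a solution $Q$ of (4.17), (4.24) or (4.26)).

```python
def causal_directions(indices):
    """Necessary conditions on the Wiener-Hopf indices of S12.

    indices: list of integers (the factorization indices, with sign).
    Returns which causal structures are not excluded by the indices alone.
    """
    return {
        # Theorem 4.1: y1 -> y2 causal models need all indices >= 0.
        "y1_to_y2": all(k >= 0 for k in indices),
        # Theorem 4.3: y2 -> y1 causal models need all indices <= 0.
        "y2_to_y1": all(k <= 0 for k in indices),
        # Minimum phase models need D(z) = I, i.e. all indices zero.
        "minimum_phase": all(k == 0 for k in indices),
    }


print(causal_directions([0, 1, 2]))
print(causal_directions([-1, 2]))
print(causal_directions([0, 0]))
```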

References

Anderson, B.D.O. (1985): Identification of Scalar Errors-in-Variables Models with Dynamics. Automatica, 21, 709-716.

Anderson, B.D.O. and M. Deistler (1984): Identifiability in Dynamic Errors-in-Variables Models. J. Time Series Analysis, 5, 1-13.

Anderson, B.D.O. and M.R. Gevers (1982): Identifiability of Linear Stochastic Systems Operating Under Linear Feedback. Automatica, 18, 195-213.

Bart, H., I. Gohberg and M.A. Kaashoek (1979): Minimal Factorization of Matrix and Operator Functions, Operator Theory: Advances and Applications, Vol. 1, Birkhäuser Verlag, Basel.

Bart, H., I. Gohberg and M.A. Kaashoek (1984): Wiener-Hopf Factorization and Realization, in Proc. Int. Symp. on Mathematical Theory of Networks and Systems, Beer Sheva, Israel, June 1983, Springer-Verlag Lect. Notes in Control and Inf. Sciences, 58, 42-62.

Caines, P.E. and C.W. Chan (1975): Feedback Between Stationary Stochastic Processes. IEEE Trans. Aut. Control, AC-20, 498-508.

Caines, P.E. and C.W. Chan (1976): Estimation, Identification and Feedback, in System Identification: Advances and Case Studies, R.K. Mehra and D.G. Lainiotis eds., Academic Press, New York.

Deistler, M. (1985): Identifiability and Causality in Linear Dynamic Errors-in-Variables Systems. Report, Inst. of Econometrics and Operations Research, University of Technology, Vienna.

Finesso, L. and G. Picci (1984): Linear Statistical Models and Stochastic Realization Theory, in Proc. VI-th Int. Conf. on Analysis and Optimization of Systems, Nice, France, June 1984, Springer-Verlag Lect. Notes in Control and Inf. Sciences, 62, 445-470.

Fuhrmann, P.A. (1981): Linear Operators and Systems in Hilbert Space, McGraw-Hill, New York.

Fuhrmann, P.A. and J.C. Willems (1979): Factorization Indices at Infinity for Rational Matrix Functions. Integral Equations and Operator Theory, 2, 287-301.

Gevers, M.R. and B.D.O. Anderson (1981): Representations of Jointly Stationary Stochastic Feedback Processes. Int. J. of Control, 33, 777-809.

Gevers, M.R. and B.D.O. Anderson (1982): On Jointly Stationary Feedback-Free Stochastic Processes. IEEE Trans. Aut. Control, AC-27, 431-436.

Gohberg, I. and M.G. Krein (1960): Systems of Integral Equations on a Half Line with Kernels Depending on the Difference of Arguments. Amer. Math. Soc. Transl. (2), 14, 217-287.

Granger, C.W.J. (1963): Economic Processes Involving Feedback. Information and Control, 6, 28-48.

Granger, C.W.J. (1969): Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica, 37.

Hoffman, K. (1962): Banach Spaces of Analytic Functions, Prentice-Hall, Englewood Cliffs.

Kalman, R.E. (1982a): System Identification from Noisy Data, in Dynamical Systems II, A.R. Bednarek and L. Cesari eds., Academic Press, New York.

Kalman, R.E. (1982b): Identification from Real Data, in Current Developments in the Interface: Economics, Econometrics, Mathematics, M. Hazewinkel and A.H.G. Rinnooy Kan eds., Reidel, Dordrecht.

Kalman, R.E. (1983): Identifiability and Modeling in Econometrics, in Developments in Statistics, Vol. 4, P.R. Krishnaiah ed., Academic Press, New York.

Lindquist, A. and G. Picci (1985): Realization Theory for Multivariate Stationary Gaussian Processes. SIAM J. Control and Optim., 23, 809-857.

Lindquist, A., G. Picci and G. Ruckebusch (1979): On Minimal Splitting Subspaces and Markovian Representations. Math. Systems Theory, 12, 271-279.

Picci, G. and S. Pinzoni (1986): Dynamic Factor Analysis Models for Stationary Processes, IMA J. Math. Control and Information, to appear.

Rozanov, Y.A. (1967): Stationary Random Processes, Holden-Day, San Francisco.

Ruckebusch, G. (1976): Représentations Markoviennes de Processus Gaussiens Stationnaires. C.R. Acad. Sc. Paris, Sér. A, 282, 649-651.

Van Schuppen, J.H. (1985): Stochastic Realization Problems Motivated by Econometric Modelling. Report OS-R8507, Centre for Mathematics and Computer Science, Amsterdam.

Willems, J.C. (1979): System Theoretic Models for the Analysis of Physical Systems. Ricerche di Automatica, 10, 71-106.

Chapter 4

Predictive and Nonpredictive Minimum Description Length Principles

Jorma Rissanen

1. Introduction

Statistical estimation or modeling is an activity aimed at inferring from a set of observed data certain properties that are expected to hold in future data. This involves a fundamental dilemma in that whatever we estimate will be determined by the current data, and yet the success of our attempts will be judged by the behaviour in the future data, which evidently are not available now. The way this difficulty is dealt with in traditional statistics is to regard the current data as a sample from a larger, in effect an infinite population, represented by a "true" probability distribution with parameters, each meant to define some property of the data. These parameters then provide the targets to be estimated, which can be done by minimization of some measure of nearness, such as the squared deviations or the likelihood function, between the existing data and the fitted parametric distributions. In trivial cases the number of the "true" parameters is taken to be known, but frequently, in order to leave all the doors open, the "true" parent distribution is assumed to have infinitely many parameters, which evidently is a safe hypothesis in that it can neither be verified nor disproved.

The problem with the "true" distribution hypothesis is not so much the fact that the distribution has to be chosen subjectively (in fact, selecting a large enough class will allow a lot of leeway) as the fact that this hypothesis forces us to regard models as approximations of the assumed distribution, the goodness of which, however, must be judged in the light of the observed data. Hence, if we fit models having different numbers of parameters, then a model with more parameters is likely to provide a better fit than one with fewer parameters without any guarantee of better performance on future data. And this is true no matter how we measure the nearness. What instead is needed is the ability to compare models regardless of the number of parameters they have, which simply cannot be done by their nearness to an abstract and subjectively selected parent distribution.

In this paper we present in a tutorial fashion a rather different approach to statistical reasoning, introduced and studied in a number of papers, Rissanen (1978), (1983b), (1984a,b,c), (1985a,b). The reasoning goes as follows: The main problem in statistical modeling is regarded as one of understanding and explaining the set of observed data, which, to be sure, often look quite chaotic. Intuitively, "understanding" presumably means something related to an ability to learn and to discover various regular features that constrain the data and that imply redundancy if we were to describe the data without taking them into account. Additionally, an understanding permits a degree of prediction. A trivial example is a sequence such as 1, 4, 9, 16, 25, which, if we spot the rule, can be described very concisely as well as perfectly predicted, provided, of course, that the rule holds even in the future. A less trivial example is Newton's law of gravitation, which is a model that permitted a great improvement over the Ptolemaic models by Eudoxus and Hipparchus as well as Tycho Brahe's tables for the planetary motions both in regard to the description length and predictability. Notice, in particular, that no "true" law is needed to do prediction; for example, Hipparchus' epicycles and eccentric circles were clearly incorrect explanations of the planetary motions, but still they provided useful predictions of the lunar eclipses. Similarly, Newton's law is also incorrect, but it provides predictions with astonishing accuracy. Many people think that there is a difference of kind between a grossly incorrect model like Eudoxus' and an accurate one like Newton's, because, indeed, Newton's model "explains" the planetary motions with the help of the most elegant law of universal gravitation. But the difference is simply one of degree, and the universal "law" of gravitation is just another incorrect model, which, incidentally, involves the rather disturbing and, in fact, absurd idea that a force is being transmitted instantly. To summarize, there are no "true" laws nor systems outside the realm of mathematics, but that does not prevent us from understanding observed data.

We are interested in statistical features which, of course, somehow reflect the underlying data generating machinery. Since we usually are not allowed to open up the machinery and take a direct look, we must have some means to recognize the regular features in the observations and to measure their amount. This can be done by counting the number of binary digits with which the observed data can be written down by taking advantage of the various models, that serve as an expression of the rules. In technical terms, we say that the data are encoded for the purpose of getting a short code length; i.e., "compressed". The resulting code length, then, represents a universal and immutable criterion for model fitting, which is just about as free from subjective and whimsical choices as we can make it. There remains the subjective selection of the class of models, but that must necessarily be so; after all, how can we learn from the data unless we can formulate the properties we wish to find? Similarly, by carefully selecting the model class we can also influence the properties we wish to discover, which gives us a means of learning. It is important to see that the code must include the description of the model itself, for otherwise the imagined decoder could not recover the data. We call the process of minimizing such a criterion the Minimum Description Length (MDL) principle.

It is clear that the length of coding the data cannot be reduced below a certain level, which is entirely determined by the data and the class of the selected models, regardless of whether the models have the same number of parameters or not. We call this critical level the stochastic complexity of the data, relative to the considered class of models. Different model classes can be judged by their stochastic complexity, and perhaps to some consternation a subclass of another class may produce a strictly smaller stochastic complexity than the larger class. Hence, the way to good models is not just to make the model classes larger and larger; that is to say, to make the models increasingly complex.

In stark contrast with traditional statistics, the optimal model, determined by the stochastic complexity, or with which it is reached, is not an approximation of anything at all. Rather, it has an independent meaning in incorporating all the statistical information in the data that can be extracted with the considered class of models. In particular, it has an optimal number of parameters, which are calculated by an estimator which is either efficient or approaches an efficient estimator in the traditional sense. In addition, stochastic complexity also sets the greatest lower bound with which the data can be predicted with the considered class of models, and we may say that its calculation and the search for a class of models giving a small stochastic complexity are the two fundamental problems in statistics.

It may be worthwhile to elaborate our view on modeling a bit further. Quite often the observed sequence is regarded as a random sample, and it is thought to consist of the important information bearing part and of the random noise that is clearly a nuisance to be gotten rid of. Hence, one may think that it is necessary to extract somehow the "useful" signal so that we then can fit our models to it. Such a prefiltering, however, hides a dangerous prejudice, for strictly speaking the observations never include any noise. They are just numbers, and the only way to separate a portion of them and to call it noise is to use models. Hence, noise is something we define it to be, namely, the difference between a modeled signal and the observed numbers, rather than something imposed by nature. The fact is that nature produces the observations, and the rest is man made - to paraphrase the famous saying of Kronecker. To take a simple example, we may model the observed input $u$ and the observed output $y$ as being related as follows,

$$x_t = f(u_t), \qquad y_t = x_t + e_t,$$

where $x_t$ is considered the "useful" signal and $e_t$ represents the "noise". Evidently, for a given pair of the observations this decomposition depends completely on the modeled function $f$. We may, of course, impose a condition on the non-observed $e_t$, such as that its variance is a prescribed number. The effect is a restriction on the functions $f$ that satisfy the extra requirement, which is perfectly in order. A superficial thinking might lead one to the idea that the MDL principle, which has no prefilters for noise, forces us to fit models to noise. Such thinking is evidently contradictory, because as we just saw, noise itself is a result of the modeling process. The MDL principle fits models to data, and we can actually see directly how it automatically avoids inserting parameters to capture "noise": If, indeed, a certain portion of the observations consists of random fluctuations, such as $e_t$ in the previous example, then no modeling can shorten their description; i.e., "compress" them. Suppose that, say, two parameters in the modeled function $f$ are sufficient to compress the "useful" part in the observations and that we try to add parameters to compress the fluctuations. Since these cannot be compressed by any means whatsoever, the extra parameters do not "buy" any compression while their own description costs bits. Hence, the MDL principle will remove such parameters, and what remains are only the effective ones, which is just what is needed. The random fluctuations just "pass through" the model unchanged, and they do no harm. As a matter of fact, to push this point to its extreme, you can add pure random noise to the observations without much effect on the optimal MDL model!
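The claim that regular features can be compressed while random fluctuations cannot is easy to try out. The sketch below is only an illustration, not the coding scheme of the text: it uses the general-purpose `zlib` compressor from the Python standard library as a stand-in for a model-based code, and the three test sequences (a rule-generated sequence, seeded pseudo-random bytes, and their sum) are hypothetical choices.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Number of bytes zlib needs to describe `data` (maximum effort)."""
    return len(zlib.compress(data, 9))


n = 10_000
rng = random.Random(0)  # fixed seed so the experiment is repeatable

# A sequence generated by a simple rule: squares modulo 251.
structured = bytes((t * t) % 251 for t in range(n))
# Fluctuations the compressor has no rule for.
noise = bytes(rng.getrandbits(8) for _ in range(n))
# "Signal plus noise": the rule-generated part perturbed by two random bits.
noisy = bytes((s + (r & 3)) % 256 for s, r in zip(structured, noise))

for name, data in [("structured", structured), ("noise", noise), ("noisy", noisy)]:
    print(f"{name:10s}: {len(data)} bytes -> {compressed_size(data)} bytes")
```

The rule-generated sequence shrinks to a small fraction of its raw size, the pseudo-random bytes do not shrink at all, and the perturbed sequence sits in between: the compressor pays for the fluctuations but gains nothing by trying to model them.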

A deeper issue involves the question of how to measure the amount of "information", that we call complexity, in the data. Fisher's famous idea was to measure this information content by the determinant of his information matrix, which by the Cramér-Rao inequality represents the smallest variance of the parameter estimates. Hence, intuitively, if an estimator does achieve this lower bound, then it must be the case that the process has extracted all the useful information in the data. This is a curious, roundabout procedure, and it works just because the considered parametric likelihood function is restricted to have a fixed number of parameters. When we consider the larger classes of models where the number of parameters is not fixed, as we must in order to get better models, then the variance of the parameter estimates ceases to be a meaningful measure of the information content in the data, which is why we define the stochastic complexity directly in terms of the data, "noise" and all. In a sense, then, Fisher's idea is brilliant but the concept is far too restricted to do what was intended. The information matrix still, of course, is an important quantity, but not as a measure of the useful information in the data.

As a further point, in our view in the absence of prior knowledge we must take every observed sequence to be "typical" in that it is representative of the underlying mechanism. After all, it is all we have, and we have no right to claim otherwise. Hence, we should fit models to these data and not to what we might imagine the data should be. Only if we have a rather firm idea of the probabilistic model of a source, such as the gambling machines, obtained on some prior grounds can we claim that a certain odd observation sequence might not be typical. As a final point, it is often felt that a model is good if its parameters are such that repeated estimates are close to each other; in other words, their estimated variance is small. This is the thinking of confidence intervals and such. Well, let me define a one-parameter model, where the parameter has the value 1.2 no matter what the data are. Clearly, you cannot have a smaller variance, but such a model is probably worthless. In fact, it may be a very dangerous prejudice to isolate a quantity and call it a parameter. An example is the blood pressure, which has been regarded as an important "parameter", carrying information about the health of a human body. It has been found relatively recently that it should be regarded as a variable, because it fluctuates considerably in perfectly healthy people, and to diagnose illness, because a measurement happens to deviate from what has been thought to be its normal value, has led to needless and even dangerous medications. In conclusion, the moral of all this discussion is that it is important to understand the nature of statistical reasoning and modeling in order to be able to avoid the many pitfalls that lurk along the way, the most important of which are the arbitrary and unjustified choices that have a tendency of creeping in even if we are on our guard. The only reality consists of the data; the rest are models and other theories, which well may be gray - as Goethe claimed - but which we ought to select as intelligently as we can.

The stochastic complexity clearly has its roots in the algorithmic notion of information, Solomonoff (1964), Kolmogorov (1965), and Chaitin (1975), which defines the complexity of a binary string to be the length of the shortest program needed to generate it in a universal computer. However, in order to make the principle practicable we must not select the class of models too rich - certainly not to include all computable functions - because then the complexity can neither be computed nor estimated by any algorithm.

2. Coding and Prediction

Our modeling principle is founded on the issues of how to describe or encode data efficiently; that is, with short code length. Although we do not really need any details of such codes, it nevertheless is useful to have an idea of the relevant issues in coding, above all, the code length. This may also add perspective to those interested in prediction, for the reason that it turns out to be a special case of coding. We begin with the traditional coding problem involving one probability distribution, and then we discuss the newer more general situation involving a family of them. We consider the observed data to be a string of symbols $y = y_1, \ldots, y_n$, each symbol, for instance, being a binary number written with some finite precision. We do not need to specify this precision, and, in fact, the reader may think of these observations to be just numbers as usual. More generally, the data string may consist of pairs $(u_t, y_t)$, where the first component is an observed input and the second the observed output response. Nothing substantially different arises from this generalization; instead of a distribution for the outputs we simply consider a conditional distribution for the outputs given the inputs. A code $C$, then, is a one-to-one function taking each string $y$ of every length $n$ to a binary string $C(y)$. Moreover, the code length $L(y)$, defined to be the number of binary digits in $C(y)$, is required to satisfy the so-called Kraft inequality, Abramson (1968),

$$\sum_{y \in Y^n} 2^{-L(y)} \le 1, \qquad (2.1)$$

for all $n$, where $Y^n$ denotes the set of all strings of length $n$. This inequality is intimately connected with a desirable property of the code, known as the prefix property, which means that no code string $C(y)$ is a prefix of another $C(y')$, where $y$ and $y'$ are two distinct strings of the same length. If we place the code strings $C(y)$, $y$ running through the set of all strings, in a binary tree (in an obvious fashion), then each code string appears as a leaf having no successor nodes. But then, given such a tree, we can tell which initial portion in any string of binary symbols defines a valid code string. In other words, we can tell without a comma when we have reached the end of a code string. It is not an accident that with such a self-contained description of data the code length defines a distribution $Q(y) = 2^{-L(y)}$ in $Y^n$ if we just add the requirement that the code be efficient in the sense that the code strings have no superfluous digits, which turns (2.1) into an equality.
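The Kraft inequality and the prefix property can be checked mechanically on a toy code. The sketch below is illustrative only: the code table, the helper names and the test message are assumptions chosen for the example, with strings of length $n = 2$ playing the role of $Y^n$.

```python
from fractions import Fraction


def kraft_sum(codewords):
    """Left-hand side of the Kraft inequality (2.1), computed exactly."""
    return sum(Fraction(1, 2 ** len(w)) for w in codewords)


def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (checked on sorted neighbours)."""
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, code):
    """Split a comma-free bit string back into source strings using a prefix code."""
    inverse = {w: y for y, w in code.items()}
    decoded, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # a leaf of the code tree has been reached
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return decoded


# Hypothetical code C for the four strings of length n = 2.
code = {"00": "0", "01": "10", "10": "110", "11": "111"}
print(kraft_sum(code.values()), is_prefix_free(code.values()))

# Q(y) = 2^{-L(y)} is a probability distribution because the Kraft sum equals 1.
Q = {y: Fraction(1, 2 ** len(w)) for y, w in code.items()}
print(sum(Q.values()))

message = ["01", "11", "00"]
bits = "".join(code[y] for y in message)
print(bits, decode(bits, code))

# The Kraft sum alone does not make a given code decodable: this set also
# sums to 1, yet "0" is a prefix of "01".
bad = ["0", "01", "11"]
print(kraft_sum(bad), is_prefix_free(bad))
```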

Suppose the strings in $Y^n$ have a probability distribution $P(y)$ assigned to them. Then, for any code with the length satisfying (2.1), we get by Jensen's inequality, stating that for a concave function $f(x)$, $Ef(x) \le f(Ex)$,

$$E \log \frac{2^{-L(y)}}{P(y)} = -E\,L(y) - E \log P(y) \le \log \sum_{y \in Y^n} P(y)\,\frac{2^{-L(y)}}{P(y)} = \log \sum_{y \in Y^n} 2^{-L(y)} \le 0,$$

where the equality holds if and only if $2^{-L(y)} = P(y)$ for all $y$. In other words, the mean code length satisfies the inequality due to Shannon: $E\,L(y) \ge H(n)$, where $H(n) = -\sum_{y \in Y^n} P(y) \log P(y)$ denotes the entropy of the strings of length $n$. This means that the ideal way to encode the strings relative to the given distribution is to assign to string $y$ a code string with length $-\log P(y)$. This, of course, cannot always be done exactly because a code string must have an integer length, but at least we know what we should be striving for, and we call it the ideal code length. Another good name would be Shannon complexity of $y$ relative to the given distribution.
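A small numerical sketch may help fix the relation between the ideal code length $-\log P(y)$, integer code lengths and the entropy. The two distributions below are hypothetical, and rounding $-\log_2 P(y)$ up to the next integer is one standard way (not prescribed by the text) of obtaining integer lengths that still satisfy the Kraft inequality.

```python
import math


def entropy_bits(p):
    """Entropy H = -sum P(y) log2 P(y), in bits."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def rounded_lengths(p):
    """Integer code lengths obtained by rounding the ideal length -log2 P(y) up."""
    return {y: math.ceil(-math.log2(q)) for y, q in p.items()}


def mean_length(p, lengths):
    """Expected code length E L(y) under the distribution p."""
    return sum(p[y] * lengths[y] for y in p)


# Dyadic probabilities: the ideal lengths are already integers.
dyadic = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
# Non-dyadic probabilities: rounding up costs less than one extra bit on average.
skewed = {"00": 0.4, "01": 0.3, "10": 0.2, "11": 0.1}

for name, p in [("dyadic", dyadic), ("skewed", skewed)]:
    lengths = rounded_lengths(p)
    kraft = sum(2.0 ** -l for l in lengths.values())
    print(name, lengths,
          "Kraft sum:", kraft,
          "mean length:", round(mean_length(p, lengths), 4),
          "entropy:", round(entropy_bits(p), 4))
```

In the dyadic case the mean length equals the entropy, which is the equality case $2^{-L(y)} = P(y)$; in the other case the mean length exceeds the entropy, as Shannon's inequality requires.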

A direct application of these ideas to compressing strings confronts us with the same problem as met in traditional statistics: the distribution $P(y)$ is not known to us, and it either has to be imagined or, better, estimated. For this reason we consider a parametric family $\{P_\theta(y)\}$ of such distributions or models, where $\theta = (\theta_1, \ldots, \theta_k)$, and $k$ ranges over the set of all natural numbers. How now to calculate the ideal code length is the central problem in the MDL principle to be discussed next.


There are two basic ways to go about encoding a string of data. In the first way we read the entire string and we somehow form the best estimate $\hat\theta(y)$ of the parameter vector $\theta$. Then we design a code $C$ such that the length of the code string $C(y)$ is close to the ideal $-\log P_{\hat\theta(y)}(y)$. We need not concern ourselves with the details of how such a code can be designed, which is just a routine matter. The important thing to realize is that the data $y$ can be decoded from the code string $C(y)$ only if the decoder also knows the estimated parameter vector $\hat\theta(y)$. This has to be given in an explicitly coded form, because the decoder at the time it is needed does not yet know $y$ and, hence, cannot calculate the estimate by any conceivable algorithm. The binary code string for the parameter vector, which may be placed as a preamble in front of $C(y)$, must clearly be a prefix code, for otherwise the decoder would not be able to separate it from the subsequent binary code of the data. Hence, its length $L(\theta)$ must satisfy the Kraft inequality, $\sum_\theta 2^{-L(\theta)} \le 1$, where $\theta$ runs through all its possible values. These values are clearly truncations (think of computing the maximum likelihood estimates, which surely result in truncated numbers). If we carry too many fractional digits, the required code will have to be long, while if we truncate too heavily, the results will deviate too much from the optimum, and we end up coding the string with non-optimal parameters. It turns out that when each component is truncated to its optimal precision, reflecting its importance to the entire code length, the code length for the $k$-component parameter vector and the loss due to truncation is $\frac{k}{2} \log n$ bits, Rissanen (1978). In addition, the decoder will have to be given the number of the components $k$ in the estimated parameter vector as another prefix coded preamble, which takes a little more than $\log k$ bits. This number, of course, is almost always quite negligible in comparison with the other length, and we drop it. All told, the best ideal code length with this type of "nonpredictive" coding is to within terms of order $\log n$ given by

$$I_{NP}(y) = \min_{k,\theta} \left\{ -\log P_\theta(y) + \frac{k}{2} \log n \right\}. \tag{2.2}$$
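A toy instance of the criterion (2.2) may help fix ideas. The sketch below, under assumptions of my own choosing, encodes a binary string with either a parameter-free fair-coin model ($k = 0$) or a one-parameter Bernoulli model fitted by maximum likelihood ($k = 1$), and picks the shorter total length. The model family, the function names, and the two test strings are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source, with 0 log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def two_part_length(bits, k):
    """-log2 P(y) + (k/2) log2 n for k = 0 (fair coin) or k = 1 (Bernoulli)."""
    n = len(bits)
    if k == 0:
        return float(n)                   # each symbol costs exactly one bit
    p_hat = sum(bits) / n                 # maximum likelihood estimate
    data_bits = n * binary_entropy(p_hat) # equals -log2 P_{p_hat}(y)
    return data_bits + 0.5 * k * math.log2(n)


def best_order(bits):
    """Number of parameters minimizing the two-part code length."""
    return min((0, 1), key=lambda k: two_part_length(bits, k))


skewed = [1] * 10 + [0] * 90   # mostly zeros: the parameter pays for itself
balanced = [0, 1] * 50         # half ones: the extra parameter is wasted

print(best_order(skewed), best_order(balanced))
```

For the balanced string the fitted model cannot shorten the data part, so the $\frac{1}{2}\log n$ parameter cost decides in favor of the simpler model.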

The same expression but with different content and scope was also derived under restrictive Bayesian assumptions in Schwarz (1978). We also refer to the pioneering work of Akaike (1974) for another criterion, where the weaker model complexity penalizing term $k$ gets added to the first, the negative logarithm of the likelihood term. In contrast with (2.2) such a term is too weak to produce consistent estimates of the number of parameters in all the analyzed cases, Hannan (1980) and Shibata (1976). Finally, we add that when the parameter coding job is done more carefully, Rissanen (1983a), a third term is required, namely, $k \log \|\theta\|_{M(\theta)}$, where $M(\theta)$ denotes the Hessian matrix of $-\log P_\theta(y)$. This term turns out to be sensitive to the structure in which the parameters of a multivariable dynamic system are represented, Rissanen (1983b); see also Section 4.

The other way of coding data strings requires no explicit code for the parameters, because the coding will be done in a "predictive" way. What this means is that from each portion $y^t = y_1, \ldots, y_t$ of the data string we form an estimate of the distribution $P_\theta(y_{t+1} \mid y^t)$ for the possible values of the next symbol, where $\theta$ is to be replaced by an estimated value $\hat\theta(t) = \hat\theta(y^t)$, calculated by an algorithm from the so-far processed string. The decoder knows this algorithm, and he can also calculate the same estimate provided that it indeed does not depend on the future and not yet decoded data points. We know from Shannon's result, derived above, that the best way to do the coding is to assign to the next symbol the code length $-\log P_{\hat\theta(t)}(y_{t+1} \mid y^t)$, and hence the best total ideal code length with this type of predictive coding is

$$I_P(y) = \min_k \left\{ -\sum_{t=0}^{n-1} \log P_{\hat\theta(t)}(y_{t+1} \mid y^t) \right\}. \tag{2.3}$$

We should also have included the code length, $\log k$, required to describe the number of the components in the estimated parameter vectors, but as above this term is negligible. How should we pick the estimates $\hat\theta(t)$? It seems to make eminent sense to pick them in such a way that the accumulated past code lengths

$$-\log P_\theta(y^t) = -\sum_{i=0}^{t-1} \log P_\theta(y_{i+1} \mid y^i), \tag{2.4}$$

are minimized, which is seen to be done by the maximum likelihood estimates of the parameters for each value of $k$. This represents a most attractive principle of inductive inference: Make that choice that has worked best in the past. And who can argue against that, provided that we have no other "prior" knowledge about the behavior of the data! This philosophy was also discovered independently by Dawid (1984) in his "prequential" approach to estimation. A somewhat similar and yet crucially different "cross-validation" principle has been studied in Stone (1977). Because no "honesty" of the predictions is required, the associated criterion is asymptotically equivalent with Akaike's AIC, and hence the resulting estimates of the number of parameters are not consistent.
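The predictive code length (2.3) can be sketched for a binary string. One caveat: the strict maximum likelihood estimate is undefined before any data are seen and can assign probability zero to the next symbol, so the sketch below swaps in the add-one (Laplace) estimate $(\text{ones} + 1)/(t + 2)$. That substitution, and the function name, are assumptions of this example and not the text's prescription.

```python
import math


def predictive_code_length(bits):
    """Accumulated -log2 of the predicted probability of each next symbol.

    The prediction for symbol t+1 uses only the first t symbols, so a
    decoder that has decoded those symbols can reproduce it exactly.
    """
    ones = 0
    total = 0.0
    for t, b in enumerate(bits):
        p_one = (ones + 1) / (t + 2)        # add-one estimate from the past
        p_next = p_one if b == 1 else 1.0 - p_one
        total += -math.log2(p_next)
        ones += b                           # only now reveal the symbol
    return total


print(predictive_code_length([0, 1, 0, 1]))
print(predictive_code_length([0] * 20))
```

No parameter preamble is transmitted: the cost of learning the parameter is absorbed in the poorer early predictions.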


In order to avoid ill-conditioned optimization problems, we in (2.4) never estimate more parameters than data points; that is, we begin with $k = 0$ and increase $k$ gradually to each final value with which the criterion in (2.3) is evaluated. The case with no free parameters means that we need an initial distribution $P(y_1)$ to predict or encode the very first observation. This could be done by having a fixed parameter value $\theta(0)$, obtained somehow on prior grounds, which singles out a distribution from the family. We discuss later how such a prior knowledge can be taken advantage of in modeling and prediction.

The predictive coding process is seen to be very similar to prediction: In both cases we try to unravel the uncertainty about the "next" observation $y_{t+1}$ by acting on the past data only. In fact, the two processes are equivalent. To see this, let $\delta(y_{t+1} - \hat y(t+1 \mid t))$ be any reasonable prediction error measure, where $\hat y(t+1 \mid t)$ is some prediction of the next observation, involving parameters to be estimated from the past data. Define a conditional density $f_\theta(y_{t+1} \mid y^t)$ proportional to $e^{-\delta(y_{t+1} - \hat y(t+1 \mid t))}$ and we get a family of parametric probabilistic models, where the code length, apart from an irrelevant term due to truncation and proportional to $n$, is the sum of the prediction errors. A particularly important special case results from the quadratic prediction error measure, because then the predictive MDL principle reduces to a predictive least squares (LS) principle. We discuss its application to ARMA estimation in the next section. Because the non-predictive coding process cannot be interpreted as prediction, we conclude that coding is a strictly more general process than prediction.
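The predictive least squares principle can be sketched on the simplest possible case: choosing between no parameter (predict zero) and a single autoregressive coefficient. Everything specific here is an assumption for illustration: the generator uses the convention $y(t) = a\,y(t-1) + e(t)$, and the names `simulate_ar1` and `pls_ar` are hypothetical.

```python
import random


def simulate_ar1(a, n, seed=0):
    """Generate y(t) = a*y(t-1) + e(t) with unit-variance Gaussian e."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = a * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


def pls_ar(y, order):
    """Accumulated squared 'honest' prediction errors for order 0 or 1.

    The coefficient used to predict y[t] is the least squares estimate
    computed from y[0..t-1] only; y[t] itself enters the sums afterwards.
    """
    sxy = sxx = 0.0
    total = 0.0
    for t in range(len(y)):
        if order == 1 and t > 0 and sxx > 0.0:
            pred = (sxy / sxx) * y[t - 1]
        else:
            pred = 0.0                     # no usable estimate yet
        total += (y[t] - pred) ** 2
        if t > 0:                          # update sums after predicting
            sxy += y[t] * y[t - 1]
            sxx += y[t - 1] ** 2
    return total


data = simulate_ar1(0.9, 300)
print(pls_ar(data, 0), pls_ar(data, 1))
```

The order with the smaller accumulated error is selected. Because each prediction is made before the predicted value is used for fitting, an over-parameterized model is penalized automatically through its poorer early predictions.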

We conclude this section by stating that the two described coding lengths are asymptotically optimal in the sense that their mean, relative to any process in the considered class of "smooth" models, is shortest among all codes satisfying (2.1). Because the variance of these lengths, computed per observation, behaves like $1/n$, we may take these lengths themselves to represent well the shortest possible per symbol code lengths (prediction errors), and we call $I_{NP}(y)$ and $I_P(y)$ the non-predictive and predictive stochastic complexities, respectively, of the string $y$, relative to the considered class of models. This result not only generalizes the above mentioned Shannon theorem, giving a tight lower bound for the code length and the prediction errors, but it also serves a similar role as the Cramer-Rao inequality for estimators, except that we may assess the goodness of any estimators, including the number of parameters. The name "complexity" seems apt in view of the fact that it represents the ultimate limit to which the three fundamental tasks, prediction, estimation, and coding, can be performed.


3. ARMA Estimation and Prediction

As we outlined in the preceding section, estimation and prediction are intertwined: you cannot predict optimally without performing estimation optimally. Here we mean the real prediction problem where we are given an observed sequence of numbers, $y(1), \ldots, y(n)$, one by one, and we are asked to predict for each $n$ the next value. This is to be done without knowing the probabilistic source of the numbers as usually done in prediction theory. Our approach is to select a class of models, or perhaps several classes, and fit a model in each class with the predictive LS principle. The prediction will be done with the best model at each instant of time, and if the past is any guidance to the future this strategy will provide the best predictions obtainable with the selected class. We shall choose the model class as the gaussian ARMA class, which means that we shall have to know how prediction is done optimally for such processes. The Kalman theory in principle is applicable, but the solution it provides involves parameters that cannot be estimated from the observations. For this reason we shall use another approach, Rissanen (1967), and we give the relevant recurrence equations below.

Consider a process generated by the recursion

$$y(t) + a_1 y(t-1) + \cdots + a_p y(t-p) = e(t) + c_1 e(t-1) + \cdots + c_q e(t-q), \tag{3.1}$$

for $t > p$, where $e$ is an orthogonal zero-mean process with variance $E(e(t)^2) = \sigma^2$. Letting $u(t)$ for $t \ge p$ stand for the MA process

$$u(t) = e(t) + c_1 e(t-1) + \cdots + c_q e(t-q), \tag{3.2}$$

we see that the covariance $E(u(t)u(s)) = r(t,s)$, $t,s \ge p$, satisfies the crucial "bandedness" property

$$r(t,s) = 0, \quad \text{for } |t - s| > q. \tag{3.3}$$
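The bandedness property follows directly from the orthogonality of $e$: for a stationary MA process the covariance at lag $h$ is $\sigma^2 \sum_i c_i c_{i+h}$ with $c_0 = 1$, and the sum is empty once $h > q$. A minimal sketch, with hypothetical coefficients and a hypothetical function name:

```python
def ma_autocovariance(c, lag, sigma2=1.0):
    """Covariance E(u(t)u(t+lag)) of u(t) = e(t) + c[0]e(t-1) + ... ."""
    coeffs = [1.0] + list(c)          # leading coefficient of e(t) is 1
    lag = abs(lag)
    # Only noise terms shared by u(t) and u(t+lag) contribute.
    return sigma2 * sum(coeffs[i] * coeffs[i + lag]
                        for i in range(len(coeffs) - lag))


c = [0.5, -0.3]                       # an MA(2) example, so q = 2
for lag in range(5):
    print(lag, ma_autocovariance(c, lag))
```

For this example the values at lags 3 and 4 are exactly zero, so the covariance matrix has only $q = 2$ nonzero diagonals on each side of the main one.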

We let the initial variables be specified by the covariances as follows

$$E(u(t)y(s)) = r(t,s) = 0 \ \text{ if } t - s > q, \qquad E(y(t)y(s)) = r(t,s), \quad t,s < p. \tag{3.4}$$


The problem is to find the orthogonal projection of $y(t)$ on $Y_0^{t-1}$, the subspace spanned by the observations up to $t - 1$, written as $\hat y(t \mid t-1)$. The task is simple if we find a representation of the process $u$ as follows:

$$u(t) = \epsilon(t) + c_1(t)\epsilon(t-1) + \cdots + c_q(t)\epsilon(t-q), \quad t > q, \tag{3.5}$$

where $\epsilon(t)$ is an uncorrelated (but not of unit variance) process; the variables for non-positive indices are zero. The coefficients are found by the Cholesky factorization of the covariance matrix $R = \{r(i,j)\}$ as $R = B'B$, where $B$ is upper triangular,

$$B = \begin{pmatrix} b(0,0) & b(0,1) & \cdots & b(0,n) \\ 0 & b(1,1) & \cdots & \vdots \\ \vdots & & \ddots & b(n-1,n) \\ 0 & \cdots & 0 & b(n,n) \end{pmatrix}.$$
Specifically, the factors are defined by the following recursions, which also are seen to result from the Gram-Schmidt orthogonalization procedure.

$$\begin{aligned}
b(t-i,t) &= \Big[ r(t-i,t) - \sum_{j=i+1}^{q} b(t-j,t)\, b(t-j,t-i) \Big]\, b^{-1}(t-i,t-i), \quad 1 \le i \le q, \\
b(t,t) &= +\big[ r(t,t) - b^2(t-q,t) - \cdots - b^2(t-1,t) \big]^{1/2}, \quad t > 0, \\
b(0,0) &= +\sqrt{r(0,0)}, \qquad b(t-i,t) = 0, \quad i > t.
\end{aligned} \tag{3.6}$$

We then have

$$c_i(t) = b(t-i,t)\, b^{-1}(t-i,t-i), \quad i = 1, \ldots, q. \tag{3.7}$$
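For $q = 1$ the recursions (3.6) and (3.7) collapse to a two-line loop, which makes the mechanics easy to follow. The sketch below assumes a stationary MA(1) covariance, $r(t,t) = \sigma^2(1 + c^2)$ and $r(t-1,t) = \sigma^2 c$, in place of the general initial conditions; the function name is hypothetical.

```python
import math


def ma1_time_varying_coefficients(c, sigma2, steps):
    """Return c_1(1), ..., c_1(steps) from the banded Cholesky recursion."""
    r0 = sigma2 * (1.0 + c * c)      # r(t,t)
    r1 = sigma2 * c                  # r(t-1,t)
    diag = math.sqrt(r0)             # b(0,0)
    coeffs = []
    for _ in range(steps):
        off = r1 / diag              # b(t-1,t): the sum in (3.6) is empty
        coeffs.append(off / diag)    # c_1(t) = b(t-1,t) / b(t-1,t-1)
        diag = math.sqrt(r0 - off * off)   # b(t,t)
    return coeffs


coeffs = ma1_time_varying_coefficients(0.5, 1.0, 20)
print(coeffs[0], coeffs[-1])
```

With $|c| < 1$ the time-varying coefficient approaches $c$ geometrically, in line with the convergence statement for polynomials having their roots outside the unit circle.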

Since the $\epsilon$- and the $y$-processes span one and the same linear space for all $n$, we easily get the desired recursive equations for the optimal predictor

$$\hat y(t \mid t-1) + \sum_{i=1}^{q} c_i(t)\, \hat y(t-i \mid t-i-1) = \sum_{i=1}^{k} d_i(t)\, y(t-i), \tag{3.8}$$

where $d_i(t) = c_i(t) - a_i$, $i = 1, \ldots, k$ for $k = \max\{p,q\}$, and the coefficients with undefined index values are zero.

One can show that if the polynomial defined by the coefficients $c_i$ has its roots strictly outside the unit circle, then $c_i(t) \to c_i$. The limiting predictor, then, agrees with the stationary optimal predictor

$$\hat y(t \mid t-1) + \sum_{i=1}^{q} c_i\, \hat y(t-i \mid t-i-1) = \sum_{i=1}^{k} (c_i - a_i)\, y(t-i). \tag{3.9}$$
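For $p = q = 1$ the stationary predictor (3.9) reads $\hat y(t \mid t-1) = (c - a)\,y(t-1) - c\,\hat y(t-1 \mid t-2)$. The sketch below runs this recursion on data generated by (3.1) with zero initial conditions, in which case the prediction errors reproduce the driving noise exactly. The function names and the short noise sequence are assumptions for illustration.

```python
def simulate_arma11(noise, a, c):
    """Generate y(t) + a*y(t-1) = e(t) + c*e(t-1), zero initial values."""
    y = []
    prev_y = prev_e = 0.0
    for e in noise:
        value = -a * prev_y + e + c * prev_e
        y.append(value)
        prev_y, prev_e = value, e
    return y


def arma11_predictions(y, a, c):
    """One-step predictions from the stationary recursion (3.9)."""
    preds = []
    prev_pred = prev_y = 0.0
    for value in y:
        pred = (c - a) * prev_y - c * prev_pred
        preds.append(pred)
        prev_pred, prev_y = pred, value
    return preds


noise = [1.0, -0.5, 0.25, 2.0, -1.0]
y = simulate_arma11(noise, a=0.5, c=-0.3)
errors = [yt - pt for yt, pt in zip(y, arma11_predictions(y, 0.5, -0.3))]
print(errors)
```

With estimated rather than true coefficients the errors no longer equal the noise, and their accumulated squares are what the predictive LS criterion compares across orders.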

We now return to the main problem of how to do the prediction when the coefficients and the two order numbers $p$ and $q$ are not known. We apply the predictive LS principle and proceed as follows. For each pair $(p,q)$ and each $t$ we solve the following ordinary least squares problem

$$\min_\theta \sum_{i=1}^{t} \epsilon^2(i), \tag{3.10}$$

where $\theta$ denotes the vector of the coefficients $\alpha = (a_1, \ldots, a_p, c_1, \ldots, c_q)$ together with the $p(p+1)/2 + q(q+1)/2$ initial elements $r(i,j)$ in (3.6), defining a vector $\beta$, and $\epsilon(i)$ may be solved recursively from (3.1), (3.2), and (3.5). Let the minimizing parameters be $\hat\theta(t) = (\hat\alpha(t), \hat\beta(t))$. With these we now extend the Cholesky factorization one more step; i.e., we compute the coefficients (3.6) for $t + 1$, and with (3.7) we calculate the new prediction $\hat y(t+1 \mid t)$ from the formula (3.8), which clearly depends only on the past data and the pair $(p,q)$, because the calculation of $\hat\theta(t)$ is done by the fixed ordinary least squares algorithm. This gives the prediction error $\hat e(t+1 \mid p,q) = y(t+1) - \hat y(t+1 \mid t)$. As the final step we find the best pair $(\hat p(n), \hat q(n))$ which solves the optimization problem


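$$I_P(y) = \min_{p,q} \sum_{t=0}^{n-1} \hat e^2(t+1 \mid p,q). \tag{3.11}$$

It can be shown that asymptotically

$$\frac{1}{n} I_P(y) \simeq \sigma^2 \left( 1 + \frac{p+q}{n} \ln n \right), \tag{3.12}$$

where

$$\sigma^2 = \frac{1}{n} \sum_{t=1}^{n} e^2(t). \tag{3.13}$$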

Remarks.

In the above described procedure we did not pay any attention to the amount of computations needed. Rather, our aim was to do the prediction as well as we know how, provided, though, that there is no prior knowledge about the parameter values. Clearly, when calculating $\hat\theta(t+1)$ by a suitable hill climbing routine, we should use $\hat\theta(t)$ as the initial estimate. It is also possible to calculate the Cholesky factorization by an order of magnitude faster algorithm, Rissanen (1973), in case the covariance matrix $R(t)$ is a Toeplitz matrix; i.e., if the process $u$ is stationary, and if we set the initial conditions to zero. Alternatively, it is enough to have the initial conditions such that the $y$-process is stationary. The resulting fast predictor recursions have been described by Lindquist (1974), Kailath, Morf, and Sidhu (1974), and by Rissanen (1975), after this author lectured on the topic at Stanford University during 1971-1972. Much earlier, the impulse response of a stationary predictor was found with a fast algorithm by Levinson, but that algorithm required an ever growing memory.

The entire Cholesky factorization can be avoided if we ignore the influence of the initial conditions and simply replace the representation (3.5) by (3.2). The only problem remaining then is to compute the sequence of estimates $\hat\alpha(t)$, $t = 0, \ldots, n-1$ for different values of $p$ and $q$. In the case of AR processes such calculations can be done recursively by the so-called ladder forms; see Wax (1985).

As a final remark, the difference between the complexity and the sum of the squared residuals (3.13) was observed in Bittanti (1983), where it was wondered whether the relationship between the two could be clarified. Well, (3.12) does it in a most decisive manner.

We computed in Rissanen (1984c) a small simulation to test the predictive least squares principle for estimating an ARMA order. We used the stationary equations which do not require the Cholesky factorization. The data were generated with an ARMA(1,1) system with parameters $a = .5$ and $c = -.3$, where $e(t)$ was a computer generated zero mean unit variance independent gaussian sequence. We fitted models of type ARMA(p,q) with (p,q) = (1,0), (2,0), (1,1), and (2,2). The following table gives the sum in (3.12), calculated for various values of p, q and divided by n, along a single sample of size 600.

(p,q)    n = 50    n = 100   n = 200   n = 300   n = 600
---------------------------------------------------------
(1,0)    1.336     1.276     1.101     1.107     1.015
(2,0)    1.629     1.385     1.156     1.120       -
(1,1)    1.505     1.307     1.117     1.091     0.996
(2,2)    1.925     1.520     1.221     1.159       -

Table 1. Simulations of ARMA processes

We see that the models (2,0) and (2,2) give uniformly worse values than the two best models (1,0) and (1,1) in the table for all sample sizes (we did not calculate the last entry for them, which surely would have been worse, too). In the last model, in particular, the two extra parameters penalize heavily the prediction errors. For sample sizes up to 200 the simpler model (1,0) performs best, but eventually the model with the right numbers of parameters (1,1) is the winner. This makes sense in that there is no predictive benefit in estimating the second less significant parameter until there is enough data, even if we knew that such a parameter existed; the data are the ultimate arbiter in deciding what is optimal and what is not.


We then wanted to study how initial estimates of the parameters might be taken advantage of to improve the parameter estimates and the predictions. After all, in our opinion, the most natural and easy way to incorporate initial knowledge is directly in terms of the estimate of the parameters, including their numbers. Indeed, the parameters usually represent constants, and any Bayesian type of prior distribution for them is both awkward to justify and just about impossible to estimate in a meaningful way. The traditional Bayesian formalism does not permit a representation of initial knowledge in terms of a parameter value, because the so defined singular distribution cannot be altered by the data. However, our formalism does it easily. In fact, let $\hat\theta(0)$ denote the initial estimate with $p + q$ components. Then with $\hat\theta(t)$ denoting the predictive LS estimate from the first $t$ observations with initial knowledge ignored, as described above, define a new estimate as a linear combination of the two

$$\bar\theta(t) = \alpha_t \hat\theta(0) + (1 - \alpha_t)\, \hat\theta(t). \tag{3.14}$$

The coefficient is defined as follows

$$\alpha_t = \frac{1}{1 + 2^{L_0(p,q,t) - L(p,q,t)}}, \tag{3.15}$$

where $L(p,q,t)$ denotes the accumulated prediction errors, (3.11) before the minimization, and $L_0(p,q,t)$ is the same when the parameter is the initial estimate $\hat\theta(0)$. Because this parameter is the same throughout the data, $L_0(p,q,t)$ coincides with the usual non-predictive sum of the squared deviations. We see that a good initial estimate tends to make the corresponding code length shorter than the length $L(p,q,t)$ for small values of $t$, because initially the estimate $\hat\theta(t)$ tends to be poor due to a small sample size. This causes $\alpha_t$ to be near one, and the effective estimate $\bar\theta(t)$ is close to the initial estimate. However, eventually $\alpha_t$ gets small, unless the initial estimate is perfect, and the effective estimate tends to the steadily improving estimate $\hat\theta(t)$.
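The weighting in (3.14) and (3.15) is simple to compute once the two accumulated errors are available. A minimal sketch follows; the function names, the overflow guard, and the numbers in the usage lines are hypothetical and are not taken from the experiment reported in the text.

```python
def prior_weight(l0, l):
    """Weight 1 / (1 + 2**(l0 - l)) given to the initial estimate."""
    diff = l0 - l
    if diff > 1000.0:          # 2.0**diff would overflow; weight is ~0
        return 0.0
    return 1.0 / (1.0 + 2.0 ** diff)


def combined_estimate(theta0, theta_hat, alpha):
    """Convex combination alpha*theta0 + (1 - alpha)*theta_hat."""
    return [alpha * x0 + (1.0 - alpha) * x
            for x0, x in zip(theta0, theta_hat)]


# Hypothetical accumulated errors early in a run: the prior predicts better.
alpha = prior_weight(l0=17.0, l=20.0)
print(alpha, combined_estimate([0.7, -0.1], [0.2, -0.4], alpha))
```

Because the difference $L_0 - L$ appears in an exponent, the weight moves quickly toward one or zero as soon as either source of information pulls ahead by a few units.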

To test the feasibility of this scheme we generated a data sequence of length 100 with the ARMA(1,1) model defined by the two parameters $a = 0.7$, $c = -0.1$, and with a unit variance zero mean gaussian independent input sequence. We set the initial estimate $\hat\theta(0) = (0.7, -0.1)$ of the parameters at the "true" values. We wish to compare the convergence of the parameter estimate $\bar\theta(t) = (\bar a, \bar c)$, given by such a perfect initial knowledge, with that of the least squares estimates $\hat\theta(t) = (\hat a, \hat c)$. In this experiment we, then, kept the number of parameters at the correct value. The two estimates along with the two sums of the squared prediction errors, corresponding to the two estimators $\bar\theta$ and $\hat\theta$, respectively, were computed along the 100 sample points, and the results are in the following table.

            with prior estimates         without prior estimates
time t      a        c        L          a        c        L
----------------------------------------------------------------
10          0.15     0.01     5.0        0.04     0.03     4.99
20          0.47    -0.23    17.1        0.16    -0.41    20.0
30          0.62    -0.17    28.7        0.24    -0.50    33.7
40          0.69    -0.11    36.0        0.43    -0.28    42.9
100         0.69    -0.11    98.9        0.50    -0.33   108.2

Table 2. Effect of prior estimates

We see that indeed good initial estimates improve both the convergence and the prediction error.

4. Vector Time Series Models

As is quite well known, the class of multi-input/output linear dynamic systems, even of a fixed dimensionality, is topologically a lot more complex space than in the case when either the input or the output is scalar. Hence, when we search for the stochastic complexity of an observed vector time series, relative to the class of such models, we may find a model which with relatively few parameters will capture the essence in the data. In the older statistical literature the only models that were fitted to a series with, say, p components, had the maximal dimensionality, a multiple of p. This was justified on the grounds that since the estimated Hankel matrix, or its equivalent, has the maximal rank, there is no point in fitting other models. Such an argument indicates a gross misunderstanding of modeling, and, in fact, an equivalent argument would dismiss fitting dynamic systems to scalar sequences as well; after all, no observed sequence is generated by any dynamic system.

Since the theory of multivariable linear dynamic systems is by now well known, and, in fact, it may even be covered in some of the other chapters of this book, we do not need to describe it here in any detail. Instead, we just summarize the relevant facts. The set of all linear dynamic systems with, say, p inputs and equally many outputs, is in one-to-one correspondence with the set of all Hankel matrices of p x p blocks with finite rank, say n. Given such a system, its matrix input/output impulse response defines a Hankel matrix of p x p blocks and rank n. Conversely, any of the usual realization algorithms defines a p-input/output system of degree n of such a Hankel matrix.

The set of all finite rank Hankel matrices of p x p blocks clearly admits a partition into equivalence classes by the rank n. How can each such class be parameterized? Unlike in the case with p = 1, an equivalence class corresponding to a rank n, and hence the set of all linear systems of order n with p inputs and outputs, cannot be parameterized with a single coordinate system, and the set is not a linear space. This important observation is at the root of the modern theory of linear dynamic systems, and it also affects in a profound manner the way such models ought to be fitted to the observed time series. Consider, for example, the set of all Hankel matrices with p = 2 and n = 3. If we further assume the first two rows to be linearly independent, as we should to avoid pathology, then the Hankel property implies that either the third or the fourth row must be the last remaining row that together with the first two forms a 3-element basis for the span of all the rows in the matrix. Again the Hankel property implies that these three basis rows are defined just as soon as we specify the two first elements in each, hence, six altogether. In the former case, where the basis consists of the first three rows, the fourth and the fifth row are linear combinations of the basis elements and, hence, to specify them we need six coefficients. All the other rows in the Hankel matrix are now just shifts and truncations of these and the basis rows. Similarly, 2np = 12 parameters are needed to specify all the Hankel matrices, where the first, second, and the fourth rows form a basis.

Consider now the set of matrices where the fourth row is a basis element. Then the third row perforce is linearly dependent on the first, second, and the fourth. Consider the further subset where the third row in fact is linearly dependent on the first two. Evidently no such matrix and the corresponding linear system could be expressed in terms of the parameters defined by the basis consisting of the first, second, and the third row. From this we conclude that in order to parameterize the set of all systems of degree 3 having two inputs and outputs, we need two distinct coordinate systems.


In general, then, the set of all linear systems of degree n having p inputs and outputs, may be partitioned into finitely many equivalence classes, each class corresponding to the so-called lexicographic basis defined by each matrix as follows: Each of the first p rows is included in the basis, and the next basis element is the first row which is not in the linear span of those above it, and so on. Consider the ith row, i _
The state space representation of the multi-input/output linear systems is somewhat simpler than the ARMA representation for the reason that the invariants; i.e., the coordinates, appear directly as elements in the system matrices; in the ARMA representation the matrix elements are functions of the invariants. What we need is the equations for the predictions, which we for simplicity take to be time invariant, which means that we need not solve the Riccati equations or, equivalently, find the Cholesky factors recursively; otherwise, we would have to proceed as in the previous section. The predictor equations, then, are as follows:

$$\begin{aligned}
\hat x(t+1 \mid t) &= F \hat x(t \mid t-1) + G u(t) + K\,[\,y(t) - H \hat x(t \mid t-1)\,] \\
\hat y(t \mid t-1) &= H \hat x(t \mid t-1),
\end{aligned} \tag{4.1}$$


where $\hat y(t \mid t-1)$ is the prediction of the observed p-component output sequence, $u$ is the possibly present observed r-component input sequence, and $\hat x(t \mid t-1)$ is the prediction of the intermediate n-component state sequence, which we, otherwise, have no interest in. Take initially $\hat x(1 \mid 0) = 0$. The observed input sequence is quite irrelevant; its number of components, whether single or multiple or none at all, has no important effect on the theory. It is the number of outputs p that we now require to be greater than unity. We may take all the elements in the two matrices G and K as free. The matrix F has np free parameters in the manifold representation, and possibly fewer in its minimal representation. Their locations depend on the choice of the coordinate system, which also determines the location of the 0's and 1's in H, which are its only elements. Hence, there are k free parameters in the model (4.1), which we arrange into a vector $\theta$. In the case with the manifold representation, k = n(2p + r) in each of the structures corresponding to the dimensionality n, while in the minimal realization k depends on the structure; however, k < n(2p + r).
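The predictor (4.1) is a plain linear recursion and can be sketched without any matrix library. In the sketch below the matrices are nested lists, the state starts at zero as the text prescribes, and `free_parameter_count` evaluates k = n(2p + r) for the manifold representation. All function names and any matrices passed in are assumptions for illustration.

```python
def mat_vec(m, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec_add(*vectors):
    """Componentwise sum of equally long vectors."""
    return [sum(items) for items in zip(*vectors)]


def predict_outputs(F, G, K, H, ys, us):
    """Return the one-step output predictions y_hat(t|t-1) of (4.1)."""
    x_hat = [0.0] * len(F)                   # x_hat(1|0) = 0
    predictions = []
    for y, u in zip(ys, us):
        y_hat = mat_vec(H, x_hat)            # y_hat(t|t-1) = H x_hat(t|t-1)
        predictions.append(y_hat)
        innovation = [yi - hi for yi, hi in zip(y, y_hat)]
        x_hat = vec_add(mat_vec(F, x_hat),   # state update of (4.1)
                        mat_vec(G, u),
                        mat_vec(K, innovation))
    return predictions


def free_parameter_count(n, p, r):
    """k = n(2p + r): np entries in F, nr in G, and np in K."""
    return n * (2 * p + r)


print(free_parameter_count(3, 2, 1))
```

Each prediction uses only outputs and inputs up to the previous time instant, which is what makes the accumulated errors usable as a predictive code length.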

If we agree to measure the prediction errors by squares, we get the so-called Gauss-Markov class of models. Relative to this class the non-predictive stochastic complexity of the data $(y \mid u) = (y(1) \mid u(1)), \ldots, (y(N) \mid u(N))$ is given by

$$I_{NP}(y \mid u) = \min_{s,\theta} \left\{ \frac{N}{2} \log \det R(\theta) + \frac{k}{2} \log N + k \log \|\theta\|_{M(\theta)} \right\}, \tag{4.2}$$

where

$$R(\theta) = \frac{1}{N} \sum_{t=1}^{N} \big( y(t) - \hat y_\theta(t \mid t-1) \big) \big( y(t) - \hat y_\theta(t \mid t-1) \big)' \tag{4.3}$$

and $M(\theta)$ denotes the Hessian defined by the double derivatives of $\log \det R(\theta)$, evaluated at $\theta$. It was shown in Rissanen (1983b) that if, indeed, the data were generated by some such system in a structure $s$, then the estimated structure $\hat s(N)$ and the associated parameters $\hat\theta(N)$ will approach the corresponding generating parameters. In particular, the last term in the complexity forces the structure estimates to converge. The last two terms in the stochastic complexity (4.2) represent the optimal model complexity as measured in terms of the number of binary digits required to encode the parameters. As the number of data points grows, the second term becomes dominantly greater of the two, and we see that the optimal model complexity grows proportional to $\frac{1}{2} \hat k(N) \log N$. This is because one must express the parameters collectively with increasing precision as the sample size grows, which makes eminent sense.

We next describe the predictive stochastic complexity. It is given by

$$I_P(y \mid u) = \min_s \frac{N}{2} \log \det \frac{1}{N} \sum_{t=1}^{N} \big( y(t) - \hat y(t \mid t-1) \big) \big( y(t) - \hat y(t \mid t-1) \big)', \tag{4.4}$$

where $\hat y(t \mid t-1)$ denotes the prediction obtained as follows: The maximum likelihood estimate $\hat\theta(t-1)$ is determined from the data up to time $t - 1$. With this parameter the predictions $\hat y(i \mid i-1)$ are computed from (4.1) up to time $i = t$, which gives the prediction needed in (4.4) at this time instant. The whole process is to be repeated for the next time instant, which requires a lot of computations. The bulk of these goes to the evaluations of the estimates $\hat\theta(t)$, which must be done for each $t$. It is clear that $\hat\theta(t)$ provides an excellent starting point for getting $\hat\theta(t+1)$, but nevertheless an order of $t$ computations are needed to calculate $\hat y(t \mid t-1)$, so that the entire string takes an order of $N^2$ operations.

The stochastic complexity (4.4) involves the minimization with respect to the structure index $s$. If we represent the models in the manifold, then the predictors computed in two equivalent coordinate systems are identical, to within the numerical precision. Therefore, equivalent coordinate systems also produce the same value for the stochastic complexity, and cannot be distinguished. In the minimal representations there are no equivalent coordinate systems and the predictive stochastic complexity is given by a unique structure index. These facts can be turned to an advantage in that for each order $n$ we evaluate the accumulated prediction errors in (4.4) (before minimization) by calculating the required least squares estimates $\hat\theta(t)$ in a good structure $\hat s(n)$, determined suitably, say, with the procedure in van Overbeek and Ljung (1982). This is important to ensure that the estimates can be determined with sufficient precision. Then the minimization required in (4.4) may be done with respect to $n$, just as in the single input/output case, with $\hat n$ as the result. Finally, we may also find the minimizing structure $\hat s(\hat n) = (\hat n_1, \ldots, \hat n_p)$ by letting the models be in their minimal representations.

Simulations.

These simulations were done by V. Wertz, see Rissanen and Wertz (1985). The data were generated by the system, Example 5.1 in van Overbeek and Ljung (1982),

\[ x(t+1) = F x(t) + G u(t) + K e(t), \qquad y(t) = H x(t) + e(t), \tag{4.5} \]

which in the manifold representation has the matrices

F=

.50.

,

G=

0

and

K=

-.000547

.063

.119

.157

.674

.0000666

This system has the structure s = (2,1) corresponding to the basis defined by the first three rows in the Hankel matrix. The 2-component input e(t) is a sample from a 2-variate zero mean gaussian process with the covariance matrix given by .2 x I, where I denotes the 2 x 2 identity matrix. The scalar input u(t) is an observed more or less random signal, taking values +1 or -1, which was added for good measure. The sample size is N = 500; i.e., t = 1, 2, ..., 500.

Three different models with reasonable structures s1 = (1,1), s2 = (2,1), and s3 = (2,2) were fitted. The corresponding values of the minimized accumulated prediction errors are .3031, .0649, and .0671, respectively. We see that the second structure, which is the same as the structure of the data generating system, gives the predictive complexity. The first being under-parameterized is clearly the worst, while the last, having only one excess parameter, suffers only slightly for this. We give the optimal matrices for the winning model

0 F=

1

0

• 5 -.02 .

-.o2

.008 ,

G=

.032

L.0490J

.

and

K=

23

--.023 ,

-.11

.083 j

H= 0

References

Abramson, N. (1968), Information Theory and Coding. McGraw-Hill, New York.
Akaike, H. (1974), "A New Look at the Statistical Model Identification", IEEE Trans. AC-19, 716-723.
Akaike, H. (1975), "Markovian Representation of Stochastic Processes by Canonical Variables", SIAM J. Control, 13.
Bittanti, S. (1983), "Is the Prediction Error of a Regression Model White?", J. Franklin Inst. Vol. 315, No. 4, 239-246.
Chaitin, G.J. (1975), "A Theory of Program Size Formally Identical to Information Theory", J. ACM, 22, 329-340.
Clark, J.M.C. (1976), "The Consistent Selection of Local Coordinates in Linear Systems Identification", JACC, Purdue University, Lafayette, Indiana, pp. 576-580, July 1976.
Dawid, A.P. (1984), "Present Position and Potential Developments: Some Personal Views, Statistical Theory, The Prequential Approach", J. Royal Stat. Soc. Series A, Vol. 147, Part 2, 278-292.
Geisser, S. and Eddy, W. (1979), "A Predictive Approach to Model Selection", J. American Stat. Ass., Vol. 74, Nr. 365, 153-160.
Glover, K. and Willems, J.C. (1974), "Parameterizations of Linear Dynamical Systems: Canonical Forms and Identifiability", IEEE Trans. AC-19, no. 6.
Hannan, E.J. (1980), "The Estimation of the Order of an ARMA Process", Ann. Stat. 8, No. 5, 1071-1081.
Hjorth, U. (1982), "Model Selection and Forward Validation", Scand. J. Stat. 9, 95-105.
Kailath, T., Morf, M., Sidhu, G.S. (1974), "Some New Algorithms for Recursive Estimation in Constant Discrete-Time Linear Systems", IEEE Tr. Automatic Control, Vol. AC-19, 315-323.
Kalman, R.E. (1974), "Algebraic Geometric Description of the Class of Linear Systems of Constant Dimension", 8th Ann. Princeton Conf. on Inf. Sciences and Systems, Princeton, New Jersey.
Kolmogorov, A.N. (1965), "Three Approaches to the Quantitative Definition of Information", Problems of Information Transmission 1, 4-7.
Lindquist, A. (1974), "A New Algorithm for Optimal Filtering of Discrete-Time Stationary Processes", SIAM J. Control 4, 736-747.
Ljung, L. and Rissanen, J. (1976), "On Canonical Forms, Parameter Identifiability and the Concept of Complexity", IFAC Symp. on Identification, Tbilisi, USSR.
Luenberger, D.G. (1974), "Canonical Forms for Linear Multivariable Systems", IEEE Trans. AC-12, 290-293.
Popov, V.M. (1972), "Invariant Description of Linear, Time-Invariant Controllable Systems", SIAM J. Control, 10, 254-264.
Rissanen, J. (1967), "An algebraic approach to the problems of linear prediction and identification", IBM Res. Rep. RJ 468, Oct. 23.
Rissanen, J. (1973), "Algorithms for Triangular Decomposition of Block Hankel and Toeplitz Matrices with Application to Factoring Positive Matrix Polynomials", Mathematics of Computation, Vol. 27, 147-154.
Rissanen, J. (1974), "Basis of Invariants and Canonical Forms for Linear Dynamic Systems", Automatica, Vol. 10, pp. 175-182.
Rissanen, J. (1975), "Canonical Markovian Representations and Linear Prediction", Proc. of the 6th IFAC Symposium, Part 1, Paper 29.3, 1-9.
Rissanen, J. (1978), "Modeling by shortest data description", Automatica, Vol. 14, pp. 465-471.
Rissanen, J. (1983a), "A Universal Prior for Integers and Estimation by Minimum Description Length", Ann. of Statistics, Vol. 11, No. 2, 416-431.
Rissanen, J. (1983b), "Estimation of Structure by Minimum Description Length", Circuits, Systems, and Signal Processing, special issue on Rational Approximations, Vol. 1, Nr. 3-4, 395-406.
Rissanen, J. (1984a), "Universal Coding, Information, Prediction, and Estimation", IEEE Trans. Inf. Theory, Vol. IT-30, Nr. 4, 629-636.
Rissanen, J. (1984b), "Stochastic Complexity and Modeling", (to appear in Ann. of Statistics).
Rissanen, J. (1984c), "Order Estimation by Accumulated Prediction Errors", Essays in Time Series and Allied Processes (eds. J. Gani, M.B. Priestley).
Rissanen, J. (1985a), "Minimum Description Length Principle", Encyclopedia of Statistical Sciences, Vol. V, (S. Kotz & N. L. Johnson eds.), pp. 523-527. John Wiley and Sons, New York.
Rissanen, J. (1985b), "A Predictive Least Squares Principle", (to appear).
Rissanen, J. and Ljung, L. (1975), "Estimation of Optimum Structures and Parameters for Linear Systems", Proc. CNR-CISM Symp. on Algebraic System Theory, Udine, Math. System Theory 131, Springer-Verlag, pp. 76-91.
Rissanen, J. and Wertz, V. (1985), "Structure Estimation by Accumulated Prediction Error Criterion", Eighth IFAC Symposium on Identification and System Parameter Estimation, York, England.
Schwarz, G. (1978), "Estimating the Dimension of a Model", Ann. Statist. 6, 461-464.
Shibata, R. (1976), "Selection of the Order of an Autoregressive Model by Akaike's Information Criterion", Biometrika, 63, 1, 117-126.
Solomonoff, R.J. (1964), "A Formal Theory of Inductive Inference", Part I, Information and Control 7, 1-22; Part II, Information and Control 7, 224-254.
Stone, M. (1977), "An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion", J. Royal Stat. Soc., Ser. B, 39, 44-47.
van Overbeek, A.J.M. and Ljung, L. (1982), "On Line Structure Selection for Multivariable State Space Models", Automatica, vol. 18, no. 5, 529-543.
Wax, M. (1985), "Order Selection for AR Models by Predictive Least Squares", (to appear).
Wertz, V. (1982), Structure Selection for the Identification of Multivariate Processes, Dr. Sci. Appl. thesis, Universite Catholique de Louvain, Louvain-La-Neuve.
Wertz, V., Gevers, M., Hannan, E.J. (1982), "The Determination of Optimum Structures for the State Space Representation of Multivariable Stochastic Processes", IEEE Trans. Autom. Control, Vol. AC-27, No. 6, 1200-1211.

Chapter 5

Deterministic and Stochastic Linear Periodic Systems

Sergio Bittanti

I. Introduction Linear periodic

systems are linear systems described by diffe-

rential or difference coefficients.

equations with periodically

Deterministic

and stochastic

periodic

are useful to model natural and artificial dic type. As such, of application,

Signal Modelling and Processing.

stands on the observation practical

significance

Moreover,

(PSYCO)

(DaPrato, 1979),

(Gilbert,

1971 and 1976), 1967),

(theory and applications)are 1976)

1973), 1971),

1968),

Systems and (Bailey,

1980), b, c),

(Dorato and Knudsen,

1985),

(Houlinhan,

periodic

(Colonius,1985a,

(Gilbert and Lyons,

(Hernandez and Jodar,

(Horn and Bailey,

action in

mentioned:

(Berstein and Gilbert,

(Dorato and Levis, 1977),

is a

the following

to the area of Periodic

Fronza and Guardabassi, 1984),

systems of

control problem of stochastic

relative

(Bekir and Bucy,

(Bittanti,

This theory

calls for a detailed

Without any claim of completeness,

Control 1973),

theory.

of such a periodic

of stochastic disturbances

general references

systems

for which the best operation

analysis of the periodic systems.

periodic

control

fields

Engineering,

that there exist several

periodic one. The implementation presence

in various

and Aerospace

play a key role in optimal periodic

systems

phenomena of perio-

they are of great interest

such as Chemical

time-varying

1981),

(Guardabassi,

(Horn and Lin, Cliff and Kelley,


1982),

(Khargonekar,Poolla

(Khandelwal,

and Tannenbaum,

Sharma and Ray,

1974),

(Marcus,

1973),

Onogi,

1981),

1976),

(Nistri,1983),

1979),

(Noldus,

1984),

1978),

(Meyer and Burrus,

(Shayman,

1976),

1980),

1985),

1973 and 1976),

(Watanabe, Nishimura and Matsubara,

1981),

(Maffezzoni,

(Onogi and Matsubara,

1983),

(Speyer,

nabe, Nishimura and Matsubara, Matsubara,

1978),

1975),

(Sch~dlich, Hoffmann and Hofmann,

Evans,

(Kono, 1980),

(Kern, 1980),

(Matsubara, Nishimura, Watanabe and

(Matsubara and Onogi,

(Sincic and Bailey,

1985),

(Speyer and

1984),

(Wata-

(Watanabe, Onogi and

(Watanabe, Kurimoto and Matsubara,

1984).

Further references will be quoted in the sections below.

Obviously, the linear time-invariant systems belong to the class of linear periodic systems. However, the extension of the properties of time-invariant systems to the periodic case is far from straightforward, as witnessed by the very development of the theory of PSYCO. Indeed, many peculiar and challenging problems are encountered along this route. Only in the last few years, a number of these problems have been solved and some open questions have been clarified.

In particular, a somewhat detailed understanding of the structural properties of periodic systems has been achieved. This paper is intended to provide a first general picture of such a subject by surveying the appropriate literature, which covers the two last decades. The use of these properties in the study of some basic questions relative to stochastic linear periodic systems is also discussed.

The paper is expository in nature and is organized as follows. The structural properties (reachability, controllability, and so on) of continuous-time and discrete-time systems are dealt with in Sections 2 and 3 respectively. Precisely, five well known properties valid in the time-invariant case are first recalled. Then, their generalization to periodic systems is discussed. The Kalman canonical decomposition is the subject of Section 4. Then, Section 5 deals with the extended structural properties, i.e. stabilizability and detectability. Finally, Section 6 is devoted to the analysis of stochastic systems. Precisely, the existence of a cyclostationary stochastic solution is investigated in terms of these properties.

2. Structural Properties of Continuous-time Linear Periodic Systems

2.1 Continuous-time Periodic Systems

Consider the system described by the differential equation:

\[ \dot{x}(t) = A(t)\,x(t) + B(t)\,u(t) \tag{1.a} \]

where $A : R \to R^{n \times n}$ and $B : R \to R^{n \times m}$ are continuous, real and T-periodic. The period T is the smallest value for which

\[ A(t+T) = A(t); \quad B(t+T) = B(t), \quad \forall t. \tag{1.b} \]

The transition matrix $\Phi(t,\tau)$, i.e. the solution of

\[ \frac{d}{dt}\Phi(t,\tau) = A(t)\,\Phi(t,\tau), \quad \Phi(\tau,\tau) = I, \quad t > \tau, \]

is such that

\[ \Phi(t+T, \tau+T) = \Phi(t,\tau). \tag{2} \]

By the Floquet theory, see e.g. Halanay (1966), Chen (1970) and Yakubovich and Starzhinskii (1975), $\Phi(t,0)$ can be expressed as follows:

\[ \Phi(t,0) = \Psi(t)\, e^{Rt} \]

where

\[ \Psi(t+T) = \Psi(t), \quad \forall t; \qquad \Psi(0) = I. \]

The matrix $\Phi(t+T,t)$ is named the monodromy matrix at time t. In particular $\Phi(T,0) = e^{RT}$ is the monodromy matrix at t = 0, or simply the monodromy matrix. Since $\Phi(t+T,t) = \Phi(t,\tau)\,\Phi(\tau+T,\tau)\,\Phi(t,\tau)^{-1}$, the eigenvalues of $\Phi(t+T,t)$ are independent of t. They are called characteristic multipliers. By Jacobi's Theorem, the product of these eigenvalues must be positive. In fact

\[ \det(e^{RT}) = e^{(\mathrm{tr}\, R)\,T}. \]

The eigenvalues of R are named characteristic exponents. System (1) is asymptotically stable if and only if the characteristic multipliers lie within the open unit disk, which is equivalent to requiring that the characteristic exponents belong to the open left plane.

The spectrum of $e^{RT}$ will be denoted by $\Gamma$, while $\Gamma_1$ is used to denote the set of the "unstable" eigenvalues of the same matrix (namely the eigenvalues not belonging to the open unit disk).
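The two facts just recalled (the characteristic multipliers do not depend on the time at which the monodromy matrix is evaluated, and their product is positive) can be checked numerically. The sketch below is only an illustration: the matrix `A(t)` is a hypothetical example of period 2π, the helper name `transition_matrix` is not from the text, and a fixed-step Runge-Kutta scheme stands in for any more careful integrator.

```python
import numpy as np

def transition_matrix(A, t0, t1, steps=2000):
    """Integrate dPhi/dt = A(t) Phi, Phi(t0) = I, with classical RK4."""
    n = A(t0).shape[0]
    Phi = np.eye(n)
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = A(t) @ Phi
        k2 = A(t + h / 2) @ (Phi + h / 2 * k1)
        k3 = A(t + h / 2) @ (Phi + h / 2 * k2)
        k4 = A(t + h) @ (Phi + h * k3)
        Phi = Phi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return Phi

T = 2 * np.pi

def A(t):
    # Hypothetical T-periodic dynamic matrix; trace A(t) = -1 + cos t + sin t.
    return np.array([[-0.5 + np.cos(t), 1.0],
                     [-1.0, -0.5 + np.sin(t)]])

M0 = transition_matrix(A, 0.0, T)        # monodromy matrix at t = 0
M1 = transition_matrix(A, 1.0, 1.0 + T)  # monodromy matrix at t = 1

print("multipliers at t=0:", np.linalg.eigvals(M0))
print("multipliers at t=1:", np.linalg.eigvals(M1))
print("det at t=0:", np.linalg.det(M0), " exp(integral of trace):", np.exp(-T))
```

For a 2 x 2 matrix the trace and the determinant fix the eigenvalues, so comparing these two quantities for `M0` and `M1` is enough to confirm that the multipliers coincide.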

2.2 Structural properties

For the sake of completeness, the classical definitions of reachability and controllability are given below.

Definition 1

1.1 The state $x \in R^n$ is reachable over $(\tau,t)$, $\tau < t$ [controllable over $(t,\tau)$, $\tau > t$] for System (1.a) if there exists an input function which carries the event $(\tau,0)$ into $(t,x)$ [$(t,x)$ into $(\tau,0)$].

1.2 $X_r(\tau,t)$ [$X_c(t,\tau)$] denotes the set of the reachable [controllable] states over $(\tau,t)$ [$(t,\tau)$].

1.3 System (1.a) (or, equivalently, the pair (A,B)) is reachable [controllable] over $(\tau,t)$ [$(t,\tau)$] if $X_r(\tau,t) = R^n$ [$X_c(t,\tau) = R^n$].

1.4 The state $x \in R^n$ is reachable at t [controllable at t] if there exists a time point $\tau$, $\tau < t$ [$\tau > t$], such that x is reachable over $(\tau,t)$ [controllable over $(t,\tau)$].

1.5 $X_r(t)$ [$X_c(t)$] denotes the set of the reachable [controllable] states at t.

1.6 System (1.a) (or, equivalently, the pair (A,B)) is reachable [controllable] at time t if $X_r(t) = R^n$ [$X_c(t) = R^n$].

1.7 System (1) (or, equivalently, the pair (A,B)) is reachable [controllable] if $X_r(t) = R^n$, $\forall t$ [$X_c(t) = R^n$, $\forall t$].

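2.3 Grammian matrices

The following n x n matrices are named reachability and controllability Grammian matrices respectively.

\[ W_r(\tau,t) = \int_{\tau}^{t} \Phi(t,\sigma)\, B(\sigma)\, B(\sigma)'\, \Phi(t,\sigma)'\, d\sigma, \quad t > \tau \tag{3.a} \]

\[ W_c(t,\tau) = \int_{t}^{\tau} \Phi(t,\sigma)\, B(\sigma)\, B(\sigma)'\, \Phi(t,\sigma)'\, d\sigma, \quad \tau > t \tag{3.b} \]

It is well known (Kalman, 1969) that

\[ X_r(\tau,t) = \mathcal{R}[W_r(\tau,t)], \qquad X_c(t,\tau) = \mathcal{R}[W_c(t,\tau)] \]

where $\mathcal{R}$ is the range operator.

In the periodic case, the following recursions can be derived in view of (2):

\[ W_r(t-(i+1)T,\, t) = W_r(t-iT,\, t) + [\Phi(t+T,t)]^{i}\, W_r(t-T,\, t)\, [\Phi(t+T,t)']^{i} \tag{4.a} \]

\[ W_c(t,\, t+(i+1)T) = W_c(t,\, t+iT) + [\Phi(t+T,t)]^{-i}\, W_c(t,\, t+T)\, [\Phi(t+T,t)']^{-i} \tag{4.b} \]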
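For the reader's convenience, recursion (4.a) can be obtained in two steps from definition (3.a) and property (2). The derivation below is a sketch added here, not taken from the source; it uses only the periodicity of B, the composition rule $\Phi(t,\sigma) = \Phi(t,s)\Phi(s,\sigma)$ and the consequence $\Phi(t+iT,t) = [\Phi(t+T,t)]^i$ of (2).

```latex
\begin{align*}
W_r(t-(i+1)T,\,t)
  &= \int_{t-iT}^{t} \Phi(t,\sigma)B(\sigma)B(\sigma)'\Phi(t,\sigma)'\,d\sigma
   + \int_{t-(i+1)T}^{t-iT} \Phi(t,\sigma)B(\sigma)B(\sigma)'\Phi(t,\sigma)'\,d\sigma \\
  &= W_r(t-iT,\,t)
   + \int_{t-T}^{t} \Phi(t,s-iT)\,B(s)B(s)'\,\Phi(t,s-iT)'\,ds
   \qquad (\sigma = s - iT,\ B(s-iT) = B(s)) \\
\Phi(t,s-iT)
  &= \Phi(t,t-iT)\,\Phi(t-iT,s-iT)
   = \Phi(t+iT,t)\,\Phi(t,s)
   = [\Phi(t+T,t)]^{i}\,\Phi(t,s) \\
W_r(t-(i+1)T,\,t)
  &= W_r(t-iT,\,t) + [\Phi(t+T,t)]^{i}\,W_r(t-T,\,t)\,[\Phi(t+T,t)']^{i}.
\end{align*}
```

The controllability recursion follows in the same way with the substitution $\sigma = s + iT$, which produces the inverse powers of the monodromy matrix.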

2.4 Five structural properties of time-invariant systems

The structural properties of linear time-invariant systems have received ample coverage in the literature, see e.g. Kalman (1969), Chen (1970). Five well known properties are listed below. Although some of them are trivial, it is advisable to list them all, in the proper order, for the subsequent discussion on periodic systems.

A) The reachability and controllability subspaces at time t do coincide:

\[ X_r(t) = X_c(t), \quad \forall t \]

B) The reachability and controllability subspaces are time-invariant:

\[ X_r(t) = \text{const.}, \quad \forall t; \qquad X_c(t) = \text{const.}, \quad \forall t. \]

C) If the pair (A,B) is reachable [controllable] at a time point, it is reachable [controllable] at any time point:

\[ X_r(\bar{t}) = R^n \;\Rightarrow\; X_r(t) = R^n, \quad \forall t; \qquad X_c(\bar{t}) = R^n \;\Rightarrow\; X_c(t) = R^n, \quad \forall t. \]

D) If a state is reachable at t [controllable at t], then, for any $\varepsilon > 0$, it is reachable over $(t-\varepsilon, t)$ [controllable over $(t, t+\varepsilon)$]:

\[ X_r(t) = X_r(t-\varepsilon, t), \qquad X_c(t) = X_c(t, t+\varepsilon). \]

E) The pair (A,B) is reachable if and only if $[\,sI - A \;\; B\,]$ is full rank on the spectrum of A.

Due to the system time-invariance, Property B is intuitive. Property C is a trivial consequence of B and is listed here only to stress the difference between the time-invariant and the periodic case. Property D can be rephrased by saying that, in time-invariant systems, reachability and controllability are "asymptotically instantaneous". The reachability and controllability transitions can be made to occur in an arbitrarily short time interval. This is connected with the absence of any constraint on the energy of the input function in the classical Definitions 1. Finally, the spectral characterization E is often referred to as the PBH (Popov-Belevitch-Hautus) condition, see Johnson (1966), Popov (1966), Belevitch (1968), Hautus (1969).

2.5 Five structural properties of continuous-time periodic systems

The following basic questions are considered in this section:

• Do properties A-C hold true for periodic systems?
• Can anything be said about the reachability and controllability intervals?
• Does there exist a periodic version of the PBH test?

In the first place, Property A still holds true, i.e. the reachability and controllability subspaces coincide even in the periodic case:

\[ X_r(t) = X_c(t), \quad \forall t. \]

As many results concerning the structural properties of linear periodic systems, this can be proven either via geometric or algebraic methods. As an example of a typical geometric proof, let us outline the derivation of inclusion $X_c(t) \subseteq X_r(t)$. Let $\bar{x}$ be a state which is controllable at t and denote by N a positive integer such that any state controllable at t can be driven to zero in an interval NT. Due to periodicity, $\bar{x}$ is controllable at t+NT as well. Let $x = \Phi(t, t+NT)\,\bar{x}$. It is apparent that x is controllable at t, since $\bar{x}$ is controllable at t+NT. Therefore there exists an input function which transfers the event (t,x) into (t+NT,0). By the same input function, the event (t,0) will then be transferred to $(t+NT, -\bar{x})$. Therefore $-\bar{x}$, and consequently $\bar{x}$, is reachable at t+NT, so that $\bar{x}$ is reachable at t thanks to periodicity. This leads to the conclusion $X_c(t) \subseteq X_r(t)$.

The coincidence of the reachability and controllability subspaces of a periodic system at each time point is an extension of the analogous well known property for time-invariant systems. However, contrary to the time-invariant case, the subspaces $X_r(\tau,t)$ and $X_c(t,\tau)$ may not coincide (see Example 1 below).

Property B is obviously false. Instead $X_r(t)$ and $X_c(t)$ are periodically time-varying:

\[ X_r(t) = X_r(t+T), \quad \forall t; \qquad X_c(t) = X_c(t+T), \quad \forall t. \]

However, by the same arguments used in discrete-time in Bittanti and Bolzern (1984,a), Lemma 3, it can be proven that

\[ \dim X_r(t) = \text{const.}, \qquad \dim X_c(t) = \text{const.} \]

From this, it follows that property C still holds true: If a periodic system is reachable at a given time point, it is reachable at any time point. Hence it is possible to speak of system reachability and controllability without further specifications.

The attention is now focused on properties D and E, the stories of which are somewhat interwoven with each other. To the best knowledge of the present author, the first statements concerning the length of the reachability and controllability intervals of periodic systems go back to the late sixties. In Brunovsky (1969), the following result is proven by means of algebraic arguments. An alternative proof, of geometric type, can be found in Bittanti and Bolzern (1984,a), Lemma 1.

Theorem 1 (Brunovsky, 1969)

If system (1) is controllable, then the controllability transition can be performed in an interval of time of length nT (n is the system order):

\[ X_c(t) = R^n \;\Rightarrow\; X_c(t) = X_c(t, t+nT). \]

In Kalman (1969), a stronger statement is reported without proof.

Proposition (Kalman, 1969)

If system (1) is controllable, then the controllability transition can be performed in an interval of time of length T.

The question of proving Kalman proposition remained open until 1975, when, in a lengthy paper on the periodic Riccati equation, Hewer gave a proof of Kalman proposition. Furthermore, he gave a spectral condition of system controllability which generalizes the PBH test to the periodic case. A slightly different yet equivalent condition is due to Kano and Nishimura (1979), where reference is made to the monodromy matrix $\Phi(T,0) = e^{RT}$ in place of R. The condition, named H-condition, will now be stated in terms of a generic monodromy matrix $\Phi(t+T,t)$. This is especially useful for the extension to discrete-time systems. Recall that $\Gamma$ is the spectrum of $e^{RT}$.

H-condition at t (continuous-time)

Given a time point t, the matrix

\[ [\, sI - \Phi(t+T,t) \quad W_c(t,t+T) \,] \]

is full rank on $\Gamma$.

The first paper where the condition was stated in these terms is probably Bittanti, Bolzern, Colaneri and Guardabassi (1983).
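Read as a rank test, the H-condition is straightforward to evaluate once the monodromy matrix and the one-period controllability Grammian are available as numerical matrices. The following sketch assumes they are given; the function name `h_condition`, the tolerance and the two small test matrices are hypothetical and serve only to illustrate the test.

```python
import numpy as np

def h_condition(Phi_T, Wc, tol=1e-9):
    """Return True if [sI - Phi_T, Wc] has full row rank at every
    eigenvalue s of the monodromy matrix Phi_T."""
    n = Phi_T.shape[0]
    for s in np.linalg.eigvals(Phi_T):
        M = np.hstack([s * np.eye(n) - Phi_T, Wc])
        if np.linalg.matrix_rank(M, tol=tol) < n:
            return False
    return True

# Hypothetical data: a diagonal monodromy matrix and two rank-one Grammians.
Phi_T = np.diag([2.0, 3.0])
W_good = np.outer([1.0, 1.0], [1.0, 1.0])  # excites both eigen-directions
W_bad = np.outer([1.0, 0.0], [1.0, 0.0])   # misses the second eigen-direction

print(h_condition(Phi_T, W_good))
print(h_condition(Phi_T, W_bad))
```

Note that a rank-one Grammian can still pass the test: the condition involves the pair formed by the monodromy matrix and the Grammian, not the rank of the Grammian alone.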

However, the spectral conditions played a key role in the analysis of the periodic Lyapunov and Riccati Equations, Hewer (1975), Kano and Nishimura (1979). The proof given by Hewer of the validity of the H-condition as controllability test was based on Kalman proposition, though. Unfortunately, the proof given in Hewer (1975) of such a proposition was not correct. Even more so: the Kalman proposition itself is not true, as shown by the following counterexample.

Example 1

For a given integer n, let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be n given distinct real numbers. Consider the single-input system:

\[ A(t) = \mathrm{diag}\,[\lambda_1, \lambda_2, \ldots, \lambda_n], \]

\[ B(t) = [\, e^{-\lambda_1(1-t)} \;\; e^{-\lambda_2(1-t)} \;\; \ldots \;\; e^{-\lambda_n(1-t)} \,]' \sin \pi t, \quad t \in [0,1], \]

\[ B(t) = \text{periodic extension of previous}, \quad t \notin [0,1]. \]

For this system, which is periodic of period T = 1,

\[ \Phi(0,\sigma)\,B(\sigma) = (\sin \pi\sigma)\, x_1, \quad \text{where } x_1 = [\, e^{-\lambda_1} \;\; e^{-\lambda_2} \;\; \ldots \;\; e^{-\lambda_n} \,]'. \]

Letting

\[ \lambda = \int_0^1 \sin^2 \pi\sigma \, d\sigma, \]

it follows from (3.b) that

\[ W_c(0,1) = \lambda\, x_1 x_1'. \]

Therefore, $\dim X_c(0,1) = 1$, and, assuming n > 1, this system is not controllable over (0,1).

For a given positive integer k, $k \le n$, consider now the time interval (0,k) and let

\[ x_i = [\, e^{-i\lambda_1} \;\; e^{-i\lambda_2} \;\; \ldots \;\; e^{-i\lambda_n} \,]', \quad i = 1, 2, \ldots, k. \]

Then, by applying recursion (4), for any integer $k \le n$, $W_c(0,k)$ can be given the following expression:

\[ W_c(0,k) = \lambda\,(x_1 x_1' + x_2 x_2' + \ldots + x_k x_k'). \]

Consequently,

\[ X_c(0,k) = \mathrm{span}\,[x_1, x_2, \ldots, x_k], \quad k \le n. \]

Since $x_1, x_2, \ldots, x_n$ are independent, it follows that

\[ \dim X_c(0,k) = k, \quad \forall k \le n. \]

Therefore the system is controllable, but there exist some states which cannot be driven to zero in an interval of time shorter than n. Interestingly enough, it turns out that

\[ X_r(0,k) = \mathrm{span}\,[x_0, x_{-1}, \ldots, x_{-k+1}]. \]
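The rank count in Example 1 can be reproduced numerically from the closed-form expression of the Grammian over k periods. The sketch below assumes the reading B(t) proportional to sin(πt), so that the scalar factor equals 1/2, and uses hypothetical values for the distinct numbers λ_1, ..., λ_n; only the ranks matter, not the particular values.

```python
import numpy as np

lam = np.array([0.0, 0.5, 1.0])  # hypothetical distinct lambda_1..lambda_n
n = lam.size
scale = 0.5                      # integral of sin^2(pi*sigma) over [0, 1]

def x(i):
    """Vector x_i = [exp(-i*lambda_1), ..., exp(-i*lambda_n)]."""
    return np.exp(-i * lam)

def Wc(k):
    """Controllability Grammian over (0, k) from the closed-form sum."""
    return scale * sum(np.outer(x(i), x(i)) for i in range(1, k + 1))

ranks = [int(np.linalg.matrix_rank(Wc(k))) for k in range(1, n + 1)]
print(ranks)  # dimension of X_c(0, k) for k = 1..n

# X_c(0,1) = span[x_1] and X_r(0,1) = span[x_0] are different lines.
print(np.linalg.matrix_rank(np.column_stack([x(1), x(0)])))
```

The dimension of the controllable subspace grows by one per period, so n periods are needed here, and the one-period controllable and reachable subspaces do not coincide.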

This entails that $X_c(0,k)$ and $X_r(0,k)$ may not coincide. []

Notice that, although the reachability and controllability transitions cannot be made to occur in a single period, the H-condition calls for the controllability Grammian matrix over a single period.

The validity of the H-condition as a controllability condition must now be proven independently of Kalman's conjecture. One such proof can be found in Bittanti, Bolzern, Colaneri and Guardabassi (1983) and Bittanti, Colaneri and Guardabassi (1984), which contain the following results. In Theorem 2(b), $\nu$ is the degree of the minimal polynomial of the monodromy matrix $\Phi(t+T,t)$. Note that the minimal polynomial does not depend upon t, see Bittanti, Bolzern, Colaneri and Guardabassi (1983).

Theorem 2

(a) If system (1) is controllable at time t, then the H-condition at t is satisfied.

(b) Suppose that the H-condition at t is satisfied. Then, the system is controllable at t and $X_c(t) = X_c(t, t+\nu T)$. []

In conclusion, the H-condition is in fact a controllability condition. In view of the PBH test, this means that the periodic system is controllable at t if and only if the time-invariant system defined by the pair $(\Phi(t+T,t),\, W_c(t,t+T))$ is controllable. Since a periodic system is controllable at any time point whenever it is controllable at a given time point, it readily follows that, if the H-condition is satisfied at a given time point, then the same condition holds true at any time point. Finally, from part (b) it is clear that Brunovsky's theorem can be slightly strengthened by claiming that, if system (1) is controllable at t, the controllability transition can be performed in an interval of time of length $\nu T$.

Remarks

1. The above is the strongest conclusion that can be drawn on the controllability interval length in terms of matrix A only. This means that, given any periodic matrix A, there exists a periodic matrix B such that (A,B) is controllable, and some state cannot be driven to zero in $\nu - 1$ periods. If one considers the pair (A,B), then a further result can be given (Kabamba, 1985): the number of periods required to perform the controllability transition is at most the controllability index (Chen, 1970) of $(\Phi(t+T,t),\, W_c(t,t+T))$.

2. In this subsection, only the controllability notion has been taken into consideration. Needless to say that, since $X_r(t) = X_c(t)$, analogous results can be given for reachability.

3. A periodic output equation can be added to the state equation of system (1):

\[ y(t) = C(t)\, x(t) \]

with $C : R \to R^{p \times n}$ continuous, real and T-periodic: $C(t+T) = C(t)$, $\forall t$. Then, the notions of state observability and reconstructibility can be introduced. As is well known, Kalman (1969), Chen (1970), these notions deal with the possibility of distinguishing a free motion starting from or ending in a given state from the free motion starting from or ending in the origin of the state-space. Obviously, this last free motion results in the zero output function. The observability and reconstructibility properties are not explicitly considered here. The reason is that the observability and reconstructibility properties of periodic systems can be derived from the ones concerning reachability and controllability via the duality theory, Kalman (1969), Chen (1970). Indeed, the observability [reconstructibility] of (A,C) is equivalent to the reachability [controllability] of the dual pair (A', C').

3. Structural Properties of Discrete-time Periodic Systems

3.1 Discrete-time linear periodic systems

Turning now to discrete-time systems, consider the difference equation

\[ x(t+1) = A(t)\,x(t) + B(t)\,u(t) \tag{5.a} \]

where $A : Z \to R^{n \times n}$ and $B : Z \to R^{n \times m}$ are real and T-periodic (for an integer T):

\[ A(t+T) = A(t); \quad B(t+T) = B(t), \quad \forall t \tag{5.b} \]

Since A may be singular at certain time points, system (5) may not be reversible. This is a major difference between discrete-time and continuous-time systems. As a consequence of non-reversibility, the trajectory portrait in discrete-time can be quite different than in continuous-time. Precisely, in continuous-time there is one and only one free motion passing through any given event (x,t). In discrete-time, there may exist events where two or more free motions end and events with no free motion ending in it. If (x,t) is an event with no free motion ending in it, x is named new-born state at time t, Bittanti and Bolzern (1986).

The system transition matrix is now given by

\[ \Phi(t,\tau) = A(t-1)\,A(t-2) \cdots A(\tau). \]

The spectrum of the monodromy matrix $\Phi(t+T,t)$ is independent of t. Indeed, for any pair $(\tau, t \in [0, T-1])$, the matrices $\Phi(t+T,t)$ and $\Phi(\tau+T,\tau)$ can be expressed in the form $\Phi(t+T,t) = FG$ and $\Phi(\tau+T,\tau) = GF$. If $\lambda$ is a nonzero eigenvalue of $\Phi(t+T,t)$, i.e. $FGx = \lambda x$, $\lambda \neq 0$ and $x \neq 0$, then $GFy = \lambda y$, where $y = Gx$. Notice that, since $\lambda \neq 0$ and $x \neq 0$, x cannot belong to the null-space of G. Hence $y \neq 0$, so that $\lambda$ is an eigenvalue of GF as well. Therefore, all the nonzero eigenvalues of $\Phi(t+T,t)$ are eigenvalues of $\Phi(\tau+T,\tau)$. By reversing the role of $\Phi(t+T,t)$ and $\Phi(\tau+T,\tau)$ in the above argument, it also follows that all the nonzero eigenvalues of $\Phi(\tau+T,\tau)$ are eigenvalues of $\Phi(t+T,t)$. This implies that the nonzero eigenvalues of $\Phi(t+T,t)$ and $\Phi(\tau+T,\tau)$ do coincide, so demonstrating the time invariance of the spectrum $\Gamma$.
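The FG versus GF argument can be illustrated on a small non-reversible example. The two matrices below are hypothetical values chosen for a system of period T = 2 with A(0) singular; the monodromy matrices at t = 0 and t = 1 are then the two products taken in opposite order.

```python
import numpy as np

# Hypothetical period-2 system: A(t) = A0 for t even, A1 for t odd.
A0 = np.array([[1.0, 2.0],
               [0.0, 0.0]])   # singular, so the system is not reversible
A1 = np.array([[0.5, 0.0],
               [1.0, 3.0]])

M0 = A1 @ A0   # Phi(2, 0) = A(1) A(0)
M1 = A0 @ A1   # Phi(3, 1) = A(2) A(1) = A(0) A(1)

eig0 = np.sort(np.linalg.eigvals(M0).real)
eig1 = np.sort(np.linalg.eigvals(M1).real)
print(eig0, eig1)
```

The two monodromy matrices are different, yet they share the same spectrum, including the zero eigenvalue produced by the singular factor.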

Some eigenvalues of #(t+T,t)

may be zero. The symbol [0 will

denote the set of the nonzero elements of lAlthough the characteristic polynomial of #(t+T,t)

does not

change with t, the minimal polynomial may depend on t. The degree of the minimal polynomial will be denoted by 9t" Finally, note that the determinant of ~(t+T,t) may not be positive.

3.2 Structural properties

The definitions of reachability and controllability given above apply to discrete-time systems as well. When considering the structural properties, the only care to be taken in discrete-time concerns the definition of reconstructibility. This is due to the peculiar phase portrait of nonreversible systems. Since the attention is mainly focused here on reachability and controllability, this aspect will not be discussed in this paper. The interested reader is referred to Bittanti and Bolzern (1986).

3.3 Gramian matrices

The only Gramian matrix which can be defined in general is the reachability one:

$$W_r(\tau,t) = \sum_{j=\tau+1}^{t} \Phi(t,j)B(j-1)B(j-1)'\Phi(t,j)' \qquad (6)$$

For reversible systems, the backward transition matrix $\Phi(\tau,t) = \Phi(t,\tau)^{-1}$, $t \ge \tau$, can be defined. Then, the controllability Gramian matrix is given by:

$$W_c(t,\tau) = \sum_{j=t+1}^{\tau} \Phi(t,j)B(j-1)B(j-1)'\Phi(t,j)'$$
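The reachability Gramian (6) can be evaluated directly from its definition. The sketch below does so for a hypothetical 2-periodic system (the matrices `A_seq`, `B_seq` and all function names are assumptions made for illustration) and reports the rank of the Gramian over two periods ending at an even and at an odd time point.

```python
import numpy as np

# Hypothetical 2-periodic data; A(1) is singular (nonreversible system).
A_seq = [np.array([[1.0, 1.0], [0.0, 1.0]]),
         np.array([[1.0, 0.0], [0.0, 0.0]])]
B_seq = [np.array([[0.0], [1.0]]),
         np.array([[1.0], [0.0]])]
T = len(A_seq)

def A(k):
    return A_seq[k % T]   # periodic extension, also valid for negative k

def B(k):
    return B_seq[k % T]

def transition(t, tau):
    """Phi(t, tau) = A(t-1) ... A(tau), with Phi(t, t) = I."""
    Phi = np.eye(2)
    for j in range(tau, t):
        Phi = A(j) @ Phi
    return Phi

def reach_gramian(tau, t):
    """W_r(tau, t) = sum_{j=tau+1}^{t} Phi(t,j) B(j-1) B(j-1)' Phi(t,j)'."""
    W = np.zeros((2, 2))
    for j in range(tau + 1, t + 1):
        P = transition(t, j) @ B(j - 1)
        W += P @ P.T
    return W

W_even = reach_gramian(-2, 2)   # two periods ending at t = 2
W_odd = reach_gramian(-3, 1)    # two periods ending at t = 1
print(np.linalg.matrix_rank(W_even), np.linalg.matrix_rank(W_odd))
```

For this particular choice the Gramian is rank deficient at even time points and full rank at odd ones, so reachability holds at some time points only.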

As in continuous-time, $X_r(\tau,t) = R[W_r(\tau,t)]$. Moreover, for reversible systems, $X_c(t,\tau) = R[W_c(t,\tau)]$. In particular, system (5.a) is reachable at time $t$ if and only if, for some $\tau$, $W_r(\tau,t)$ is full rank. For time-invariant systems, the system reachability can also be tested by resorting to the PBH condition. This coincides with the PBH condition for continuous-time systems (Sect. 2.4).

3.4 Five structural properties of discrete-time periodic systems

Property A is false for periodic discrete-time systems. The only conclusion which holds in general is that (Bittanti and Bolzern, 1984,a, Lemma 2)

$$X_r(t) \subseteq X_c(t), \quad \forall t.$$

Furthermore, only the dimension of the controllability subspace is time-invariant (Bittanti and Bolzern, 1984,a, Lemma 2):

$$\dim X_c(t) = \text{const.} \qquad (7)$$

The following simple example shows that the dimension of the reachability subspace may change in time.

Example 2

Consider the scalar system

$$x(t+1) = a(t)x(t) + b(t)u(t), \qquad a(t) = b(t) = \begin{cases} 0, & t \text{ even} \\ 1, & t \text{ odd.} \end{cases}$$

Then:

$$X_r(t) = \begin{cases} R^1, & t \text{ even} \\ \{0\}, & t \text{ odd.} \end{cases} \qquad \Box$$
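The behaviour of the scalar system of Example 2 can be checked by brute force. The sketch below starts from the zero state, applies random inputs over a few steps, and collects the states obtained at time $t$; the horizon, the number of trials and the function names are arbitrary choices made for illustration.

```python
import random

def a(t):
    """a(t) = b(t) = 0 for t even, 1 for t odd (Example 2)."""
    return 0.0 if t % 2 == 0 else 1.0

b = a

def simulate(x0, t0, inputs):
    """Run x(t+1) = a(t) x(t) + b(t) u(t) from x(t0) = x0 over len(inputs) steps."""
    x = x0
    for k, u in enumerate(inputs):
        t = t0 + k
        x = a(t) * x + b(t) * u
    return x

def reachable_states(t, horizon=4, trials=200, seed=0):
    """States reached at time t from the zero state under random inputs."""
    rng = random.Random(seed)
    states = set()
    for _ in range(trials):
        u = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        states.add(simulate(0.0, t - horizon, u))
    return states

print(len(reachable_states(2)))   # many distinct states at an even time point
print(reachable_states(3))        # only the zero state at an odd time point
```

At odd time points the last step uses $a = b = 0$, so every motion ends in the origin, in agreement with $X_r(t) = \{0\}$ for $t$ odd.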

As for Property C, from (7) it follows that, if system (5) is controllable at a certain time point, it is controllable at any time point. It is apparent from Example 2 that the same conclusion is false for reachability.

In order to guarantee the reachability at any time point, it is necessary to require the reachability of the system at each time point of a suitable set $\delta$, which, for reversible systems only, reduces to any $t \in [0, T-1]$. Precisely, let

$$\delta = \begin{cases} \text{any singleton time point in } [0, T-1], & \text{if the system is reversible} \\ \{t \in [0,T-1] \text{ such that } \det A(t-1) = 0\}, & \text{otherwise.} \end{cases}$$

Then, by analyzing the reachability Gramian, it is proven in Bittanti and Bolzern (1985,b) that system (5) is reachable at any time point, if it is reachable for each $t \in \delta$.

As far as the reachability and controllability interval length is concerned, the following result is derived in Bittanti and Bolzern (1984,a, Lemma 1) by considerations of geometric kind:

$$X_r(t) = X_r(t-nT,\,t), \quad \forall t$$

$$X_c(t) = X_c(t,\,t+nT), \quad \forall t.$$

Obviously, this entails the discrete-time version of Brunovsky Theorem. Another result concerning the reachability and controllability interval length, given in Bittanti and Bolzern (1985,b), can be stated as follows:

$$X_r(t) = R^n \;\Rightarrow\; X_r(t) = X_r(t-\nu_t T,\,t) \qquad (8.a)$$

$$X_c(t) = R^n \;\Rightarrow\; X_c(t) = X_c(t,\,t+\nu_t T), \qquad (8.b)$$

where it is recalled that $\nu_t$ is the degree of the minimal polynomial of $\Phi(t+T,t)$.

Remarks

4. For nonreversible systems, $\nu_t$ may depend upon $t$. This fact may give rise to an interesting paradox. Consider a controllable system with, say, $\nu_1 = 3$ and $\nu_2 = 1$. Then, starting from $t=1$, any state can be driven to zero in at most 3 periods. However, since any state can also be driven to zero in an interval no longer than $T$ starting from $t=2$, the stronger conclusion would follow that, starting from $t=1$, any state can be driven to zero in at most 2 periods. However, this paradox resolves readily, since it can be shown (Bittanti and Bolzern, 1985,b) that

$$|\nu_t - \nu_\tau| \le 1, \quad \forall \tau, t.$$

5. The discrete-time version of the result mentioned in Remark 1 can be found in Bittanti, Colaneri, De Nicolao (1986).

Precisely, let $\mu_{rt}$ be the reachability index of the pair $(\Phi(t+T,t),\ W_r(t,t+T))$ and denote by $\mu_{ct} = \max(\mu_{rt}, \mu_z)$, where $\mu_z$ is the dimension of the largest Jordan block of the controllable and unreachable part of system $(\Phi(t+T,t),\ W_r(t,t+T))$. Then

$$X_r(t) = X_r(t-\mu_{rt}T,\,t)$$

$$X_c(t) = X_c(t,\,t+\mu_{ct}T).$$

From this, it follows that conclusions (8) hold true even if the assumptions $X_r(t) = R^n$ and $X_c(t) = R^n$ are removed.

6. The impossibility of defining in general a controllability Gramian leads to the question of working out a controllability test of periodic systems based on the reachability Gramian. A test of this type can be found in Bittanti and Bolzern (1985,b).

The discrete-time version of the spectral characterizations of the structural properties presented in Sect. 2.5 can be stated as follows (Bittanti and Bolzern, 1985,b, Sect. 6).

Theorem 3

System (5) is reachable [controllable] at time $t$ if and only if the following H-condition at $t$ is satisfied.

H-condition at t (discrete-time)

(i) For reachability: Given the time point $t$, the matrix

$$[\,sI - \Phi(t+T,t) \quad W_r(t,t+T)\,] \qquad (9)$$

is full rank on $\Lambda$.

(ii) For controllability: Given the time point $t$, the matrix (9) is full rank on $\Lambda_0$. □

Remark

7. As in continuous-time, if the H-controllability condition is satisfied at a time point, it is satisfied at each time point. As it follows from the previous discussion, this conclusion does not apply to the H-reachability condition.

4. Kalman Canonical Decomposition

A fundamental result in System Theory is that any time-invariant system can be decomposed in 4 parts, i.e. the controllable-observable, controllable-unobservable, uncontrollable-observable and uncontrollable-unobservable parts. For continuous-time periodic systems, this result is extended in the appendix of (Bittanti and Bolzern, 1985,c). The proof is based on the time-invariance of the rank of the Gramians, which corresponds to the time invariance of the dimensions of the structural subspaces. The result says that, by means of a T-periodic state transformation, any T-periodic system can be decomposed into the 4 parts of the Kalman canonical decomposition.

As discussed in the previous section, the reachability and controllability subspaces may not coincide for a discrete-time periodic system.

Dually, the observability and reconstructibility subspaces may not coincide either. Therefore, four canonical decompositions would have to be considered, i.e. the canonical decompositions based on the pairs (reachability, observability), (controllability, observability), (reachability, reconstructibility) and (controllability, reconstructibility).

To see which one of these decompositions can actually be used for discrete-time periodic systems, note that the reachability subspace and the dual observability subspace may have time-varying dimensions (Sect. 3.4). Consequently, only the (controllability, reconstructibility) decomposition is a candidate for a canonical decomposition of general validity. This is why, contrary to the continuous-time case, the theory cannot be based on the Gramian matrices. Indeed, as already observed, only the reachability and observability Gramians can be defined for discrete-time systems.

In (Bittanti and Bolzern, 1984,b and 1986), a theory for the Kalman canonical decomposition of any time-varying and discrete-time system is worked out. Letting $X_a(t)$ be either the reachability or the controllability subspace at $t$ and $X_u(t)$ be either the unobservability or unreconstructibility subspace, it is proven that an $(a,u)$ canonical decomposition exists if the following three conditions are met:

(i) $\dim X_a(t) = \text{const.}$

(ii) $\dim X_u(t) = \text{const.}$

(iii) $\dim [X_a(t) \cap X_u(t)] = \text{const.}$

Conditions (i)-(iii) are named dimension-invariance condition. In (Bittanti and Bolzern, 1986), it is also shown that, for periodic systems, the dimension-invariance condition is verified by taking a = controllability and u = reconstructibility.

In conclusion, discrete-time periodic systems can be canonically decomposed by making reference to the pair (controllability, reconstructibility). By focusing on periodic systems only, this result is also derived in Grasselli (1984).

5. Extended Structural Properties

The notions of stabilizability and detectability are here called extended structural properties. Since the detectability results can be derived from the ones relative to stabilizability by duality, only stabilizability is considered in this section.

As a concise introduction to the stabilizability concept, consider the time invariant system:

$$\dot{x}(t) = Ax(t) + Bu(t) \qquad (10)$$

or

$$x(t+1) = Ax(t) + Bu(t) \qquad (11)$$

A classical result of System Theory, see e.g. Kalman (1969), is that, if $(A,B)$ is reachable, then the system can be stabilized by a suitable feedback control law. Stated differently, there

exists a matrix $K \in R^{m \times n}$ such that

$$\dot{x}(t) = Ax(t) + Bu(t), \qquad u(t) = Kx(t)$$

or

$$x(t+1) = Ax(t) + Bu(t), \qquad u(t) = Kx(t)$$

is asymptotically stable, respectively. While reachability is a sufficient condition for the existence of a stabilizing feedback control law, it is not necessary. This leads to the notion of stabilizability. Precisely, according to a classical definition by Wonham (1968), a system is stabilizable whenever there exists a stabilizing feedback control law. Thus, the definition is the same in continuous or discrete-time. However, the PBH condition for the stabilizability of time-invariant systems is different in continuous or discrete-time. Precisely, the system (10) [(11)] is stabilizable if and only if the matrix

$$[\,sI - A \quad B\,]$$

is full rank for all eigenvalues of $A$ with nonnegative real part [with modulus greater than or equal to 1].

The main characterizations of the stabilizability notion for continuous-time periodic systems can be summarized as follows (Bittanti and Bolzern, 1985,b) and also (Bittanti and Bolzern, 1984,c) and (Shayman, 1984). Recall that $\Lambda_1$ is the set of the eigenvalues of $\Phi(t+T,t)$ not belonging to the open unit disk (unstable part of the spectrum).

Theorem 4

The following statements are equivalent to each other.

(a) There exists a T-periodic matrix $K: R \to R^{m \times n}$ such that

$$\dot{x}(t) = [A(t) - B(t)K(t)]\,x(t)$$

is asymptotically stable.

(b) The uncontrollable part of system (1) is asymptotically stable.

(c) For at least a time point $t \in [0,T]$, the matrix

$$[\,sI - \Phi(t+T,t) \quad W_c(t,t+T)\,]$$

is full rank on $\Lambda_1$. □
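Condition (c) is a rank test evaluated on the unstable part of the spectrum of the monodromy matrix. A minimal sketch of such a test is given below, assuming the monodromy matrix `Phi` and the Gramian `W` over one period have already been computed by some other means; the function name, the tolerance and the two test matrices are hypothetical.

```python
import numpy as np

def full_rank_on_unstable_spectrum(Phi, W, tol=1e-9):
    """Check that [sI - Phi, W] has full row rank at every eigenvalue s of Phi
    lying outside the open unit disk (|s| >= 1, up to the tolerance tol)."""
    n = Phi.shape[0]
    for s in np.linalg.eigvals(Phi):
        if abs(s) >= 1.0 - tol:
            M = np.hstack([s * np.eye(n) - Phi, W])
            if np.linalg.matrix_rank(M) < n:
                return False
    return True

# Hypothetical data: one unstable mode (2.0) and one stable mode (0.5).
Phi = np.diag([2.0, 0.5])
W_good = np.diag([1.0, 0.0])   # the Gramian excites the unstable mode
W_bad = np.diag([0.0, 1.0])    # the Gramian excites only the stable mode

print(full_rank_on_unstable_spectrum(Phi, W_good))
print(full_rank_on_unstable_spectrum(Phi, W_bad))
```

Only the eigenvalues outside the open unit disk are tested, which is why an unexcited stable mode does not affect the outcome.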

Characterizations (a) and (b) are direct extensions to periodic systems of analogous characterizations for time-invariant systems. Characterization (c) is a natural extension of the H-controllability condition (Sect. 2.5). In view of the discrete-time version of the PBH test, (c) is equivalent to saying that system (1) is stabilizable if and only if the discrete-time pair defined by $(\Phi(t+T,t),\ W_c(t,t+T))$ is stabilizable.

The discrete-time version of this theorem is given in (Bolzern, 1986) and can be stated as follows.

Theorem 5

The following statements are equivalent:

(a) There exists a T-periodic matrix $K: Z \to R^{m \times n}$ such that

$$x(t+1) = [A(t) - B(t)K(t)]\,x(t)$$

is asymptotically stable.

(b) The uncontrollable part of system (1) is asymptotically stable.

(c) For at least one time point $t \in [0,T]$, the matrix

$$[\,sI - \Phi(t+T,t) \quad W_r(t,t+T)\,]$$

is full rank on $\Lambda_1$. □

The modal characterizations of stabilizability (point (c) of Thm. 4 and 5) are extensions of the H-conditions previously introduced. Needless to say, if the matrix $[\,sI - \Phi(t+T,t) \quad W_c(t,t+T)\,]$ is full rank on $\Lambda_1$ at a given $t$, then the same matrix is full rank on $\Lambda_1$ at any time. An analogous statement holds for discrete-time systems.

6. Stochastic Linear Periodic Systems

This section is devoted to the study of linear periodic systems subject to inputs which are stochastic processes of periodic type.

Precisely, focusing in this Section on the continuous-time case only, the system taken into consideration is

$$dx(t) = A(t)x(t)\,dt + B(t)\,dv(t) \qquad (12)$$

where $v$ is an $m$ vector valued stochastic process characterized as follows: Let

$$m(t) := E[v(t)];$$

Then

$$v(t) = m(t) + z(t) \qquad (13)$$

where $z$ satisfies the stochastic differential equation

$$dz(t) = q(t)\,dw(t), \qquad z(0) = 0. \qquad (14)$$

In (14), $w$ is an $m$-dimensional standard Wiener process, while $q$ is T-periodic and continuous. Moreover, it is also assumed that function $m$ in (13) is T-periodic, continuous and piecewise continuously differentiable. Therefore, (12) is a stochastic differential equation of the form

$$dx(t) = [A(t)x(t) + B(t)\dot{m}(t)]\,dt + \eta(t)\,dw(t) \qquad (15)$$

with $\eta(t) := B(t)q(t)$.

For a given random vector $x(0)$ independent of $\{w(t),\ t \ge 0\}$, the meaning of (15) is precisely that

$$x(t) = x(0) + \int_0^t [A(\sigma)x(\sigma) + B(\sigma)\dot{m}(\sigma)]\,d\sigma + \int_0^t \eta(\sigma)\,dw(\sigma), \qquad (16)$$

the stochastic integral being understood in the sense of Ito (Wong and Hajek, 1985, Ch. 4). The solution of (15), or its equivalent integral version (16), is given by

$$x(t) = \Phi(t,0)x(0) + \int_0^t \Phi(t,\sigma)B(\sigma)\dot{m}(\sigma)\,d\sigma + \int_0^t \Phi(t,\sigma)\eta(\sigma)\,dw(\sigma) \qquad (17)$$

as follows readily from the Ito differential rule. Should $x(0)$ be Gaussian, say with expected value $\bar{x}_0$ and covariance matrix $\Gamma_0$, then (17) defines a Gaussian process, with expected value

$$\bar{x}(t) = \Phi(t,0)\bar{x}_0 + \int_0^t \Phi(t,\sigma)B(\sigma)\dot{m}(\sigma)\,d\sigma \qquad (18)$$

and covariance matrix

$$\Gamma(t) = \Phi(t,0)\Gamma_0\Phi(t,0)' + \int_0^t \Phi(t,\sigma)B(\sigma)Q(\sigma)B(\sigma)'\Phi(t,\sigma)'\,d\sigma \qquad (19)$$

where $Q(t) = q(t)q(t)'$

represents the covariance of the noise entering (15). From (18) and (19), it is well known (Brockett, 1970) that

$$\dot{\bar{x}}(t) = A(t)\bar{x}(t) + B(t)\dot{m}(t) \qquad (20)$$

$$\dot{\Gamma}(t) = A(t)\Gamma(t) + \Gamma(t)A(t)' + B(t)Q(t)B(t)' \qquad (21)$$

with the initial conditions $\bar{x}(0) = \bar{x}_0$ and $\Gamma(0) = \Gamma_0$. The process covariance function,

$$\gamma(t,\tau) := E[(x(t) - \bar{x}(t))(x(\tau) - \bar{x}(\tau))'],$$

possesses the following properties

$$\gamma(t,\tau) = \gamma(\tau,t)'$$

and, for $t > \tau$,

$$\gamma(t,\tau) = \Phi(t,\tau)\Gamma(\tau). \qquad (22)$$

Eq. (22) can be derived from (17) by simple computations. Since $A$, $B$, $m$ and $Q$ are T-periodic, it is natural to investigate the existence of periodic solutions of (20) and (21). If such is the case, in view of (22) and (2),

$$\gamma(t+T,\tau+T) = \Phi(t+T,\tau+T)\Gamma(\tau+T) = \Phi(t,\tau)\Gamma(\tau) = \gamma(t,\tau).$$

A stochastic process for which the covariance function satisfies

$$\gamma(t+T,\tau+T) = \gamma(t,\tau), \quad \forall t,\tau,$$

is called cyclostationary. Processes of this type find numerous applications in various areas. They can be used for the modelling of seasonal time series or to describe uncertain signals of periodic type (Gardner and Franks, 1975). For an illustrative example, see (Bittanti and Hernandez, 1986).

In conclusion, the problem is to find T-periodic solutions to (20) and (21), that is, solutions satisfying the initial conditions

$$\bar{x}(T) = \bar{x}(0) \qquad (23)$$

$$\Gamma(T) = \Gamma(0) \qquad (24)$$

respectively. More precisely, for the Lyapunov equation (21), the problem is to find a T-periodic solution which is symmetric and positive semidefinite at each time point.

As for the expected value, it is easy to see that, if no characteristic multiplier is equal to 1, then equation (20) admits a unique solution satisfying (23).

Consider now the Lyapunov equation (21) and, given a $n \times n$ real symmetric matrix $\bar{\Gamma}$, let $\Gamma_\tau(t)$ be a solution such that $\Gamma_\tau(\tau) = \bar{\Gamma}$. It is well known that, for any $\bar{\Gamma}$, (21) has a unique solution $\Gamma_\tau(t)$, $-\infty < t < +\infty$, see e.g. Brockett (1970). Let $\tilde{B}$ be a T-periodic matrix such that

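$$\tilde{B}(t)\tilde{B}(t)' = B(t)Q(t)B(t)'$$

and denote by $W_r(\tau,t)$ the reachability Gramian matrix of $(A,\tilde{B})$, i.e. (see (3)),

$$W_r(\tau,t) = \int_\tau^t \Phi(t,\sigma)\tilde{B}(\sigma)\tilde{B}(\sigma)'\Phi(t,\sigma)'\,d\sigma \qquad (25)$$

For $t > \tau$, the solution of the Lyapunov equation is given by

$$\Gamma_\tau(t) = \Phi(t,\tau)\,\bar{\Gamma}\,\Phi(t,\tau)' + W_r(\tau,t). \qquad (26)$$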

Setting now $\tau = 0$, $t = T$ and imposing the periodicity constraint (24), the following equation is obtained:

$$\bar{\Gamma} = \Phi(T,0)\,\bar{\Gamma}\,\Phi(T,0)' + W_r(0,T).$$

This is the discrete-time algebraic Lyapunov equation. It can be shown (Graham, 1981) that, if the characteristic multipliers lie within the unit circle, this equation admits a unique solution.

From these results, it follows that: if the system is asymptotically stable, then both (20) and (21) admit a unique T-periodic solution. As a matter of fact, under the assumption of asymptotic stability, the following can be shown to hold true (Bittanti, Bolzern and Colaneri, 1984): Consider the solution $\Gamma_\tau(t)$ of (21) such that $\Gamma_\tau(\tau) = \bar{\Gamma}$. Then $\Gamma_\tau(t)$ converges to the periodic solution of (21) as $\tau \to -\infty$, for whichever $\bar{\Gamma}$. In particular, taking $\bar{\Gamma} = 0$, (26) entails that $W_r(-\infty,t)$ is the T-periodic solution. Moreover,

in view of (25), it is also apparent that this solution is positive semidefinite (at each $t$). In fact, should $(A,\tilde{B})$ be reachable, the solution is obviously positive definite (at each $t$). This last conclusion is part of the so-called Periodic Lyapunov Lemma, which can be stated as follows (Bittanti, Bolzern and Colaneri, 1985).

Theorem 6

The system is asymptotically stable if and only if, for any $\tilde{B}$ such that $(A,\tilde{B})$ is reachable, there exists a T-periodic positive definite solution of the Lyapunov equation (21). □
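The periodic solution of the Lyapunov equation (21) can be approximated numerically along the lines described above: integrate over one period to obtain $\Phi(T,0)$ and $W_r(0,T)$, then solve the algebraic equation $\bar\Gamma = \Phi(T,0)\bar\Gamma\Phi(T,0)' + W_r(0,T)$. The sketch below is only an illustration under stated assumptions: the stable periodic matrix `A(t)`, the constant input matrix with $Q = 1$, the fixed-step Runge-Kutta integrator and all names are hypothetical choices, not part of the text.

```python
import numpy as np

T = 1.0  # period (assumed)

def A(t):
    # Hypothetical T-periodic, asymptotically stable dynamics (upper triangular).
    return np.array([[-1.0 + 0.5 * np.cos(2 * np.pi * t / T), 1.0],
                     [0.0, -2.0]])

def BQB(t):
    # B(t) Q(t) B(t)' with a constant B = [0, 1]' and Q = 1 (assumed).
    B = np.array([[0.0], [1.0]])
    return B @ B.T

def rk4(f, y, t0, t1, steps=400):
    """Classical fixed-step Runge-Kutta integration of dy/dt = f(t, y)."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

def lyap_rhs(t, G):
    return A(t) @ G + G @ A(t).T + BQB(t)   # right-hand side of (21)

def phi_rhs(t, P):
    return A(t) @ P                          # d/dt Phi(t,0) = A(t) Phi(t,0)

n = 2
Phi = rk4(phi_rhs, np.eye(n), 0.0, T)         # Phi(T, 0)
W = rk4(lyap_rhs, np.zeros((n, n)), 0.0, T)   # W_r(0, T), from (26) with zero initial value

# Solve Gamma = Phi Gamma Phi' + W via row-major vectorization:
# vec(Phi Gamma Phi') = kron(Phi, Phi) vec(Gamma).
vec_gamma = np.linalg.solve(np.eye(n * n) - np.kron(Phi, Phi), W.reshape(-1))
Gamma_bar = vec_gamma.reshape(n, n)

# Check periodicity and definiteness of the resulting solution.
Gamma_T = rk4(lyap_rhs, Gamma_bar, 0.0, T)
periodicity_error = np.max(np.abs(Gamma_T - Gamma_bar))
min_eig = np.min(np.linalg.eigvalsh((Gamma_bar + Gamma_bar.T) / 2))
print(periodicity_error, min_eig)
```

For this reachable pair the computed initial value is positive definite and is reproduced after one period, which is the behaviour stated in Theorem 6 for an asymptotically stable system.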

An extended version of this lemma can be given under the assumption that $(A(t),\tilde{B}(t))$ be stabilizable only.

Theorem 7

The system is asymptotically stable if and only if, for any $\tilde{B}$ such that $(A,\tilde{B})$ is stabilizable, there exists a T-periodic positive semidefinite solution of the Lyapunov equation (21). □

Theorem 7 is proven in (Bittanti, Bolzern and Colaneri, 1985) by means of a decomposition technique. Precisely, the Lyapunov equation is decomposed into three subequations corresponding to the reachability canonical decomposition of $(A,\tilde{B})$.

One could wonder whether the Lyapunov equation may admit a T-periodic positive semidefinite solution even if the system is not stable. In case the system is not asymptotically stable,

matrix $\Phi(T,0)$ has some eigenvalues on or outside the unit circle. If a characteristic multiplier lies on the unit circle, then (21) does not admit any T-periodic solution, if the pair $(A,\tilde{B})$ is stabilizable (see (Bittanti and Colaneri, 1986, Thm. 2(a))). Suppose now that, say, $p$ characteristic multipliers have modulus greater than 1, while the remaining $n-p$ ones have modulus lower than 1. Then, in (Shayman, 1984) and (Wimmer and Ziebur, 1975) and (Bittanti, Colaneri, 1986), it is shown that, if $(A,\tilde{B})$ is reachable or stabilizable, the T-periodic solution of (21) (if any) has $p$ negative eigenvalues for each time-point. The remaining $n-p$ ones are all positive if $(A,\tilde{B})$ is reachable or nonnegative if $(A,\tilde{B})$ is stabilizable.

Since it is obvious that only the positive semidefinite solutions of (21) correspond to a cyclostationary solution of (12), the conclusion is the following: Assume that $(A,\tilde{B})$ is stabilizable. Then, if the system is not asymptotically stable, eq. (12) admits no cyclostationary solution.

The analysis of the discrete-time periodic Lyapunov equation is currently underway and partially reported in (Bolzern and Colaneri, 1986).

Acknowledgment

The author is grateful to Professors Guido Guardabassi and Diego Bricio Hernandez for their helpful and stimulating comments.

References

Bailey, J.E. (1973): Periodic Operation of Chemical Reactors: A Review. Chem. Eng. Commun. 1, 111-124. Bekir, E. and R.S. Bucy (1976): Periodic Equilibria for Matrix Riccati Equations. Stochastics 2, 1-104. Belevitch, V. (1968): Classical Network Theory. Holden Day, San Francisco. Bernstein, D.S. and E.G. Gilbert (1980): Optimal Periodic Control: The H Test Revisited. IEEE Trans. Automatic Control AC-25, 673-684. Bittanti, S. and P. Bolzern (1984,a): Can the Kalman Canonical Decomposition be performed for a Discrete-time Linear Periodic System? 1st Latin American Conference on Automatica, Campina Grande, Brazil, 449-453. Bittanti, S. and P. Bolzern (1984,b): Canonical Decomposition and Discrete-time Linear Systems. 23rd Conference on Decision and Control, Las Vegas, U.S.A., 1737, 1738. Bittanti, S. and P. Bolzern (1984,c): Four Equivalent Notions of Stabilizability of Periodic Linear Systems. 3rd American Control Conference, San Diego, U.S.A., 1321-1323. Bittanti, S. and P. Bolzern (1985,a): Reachability and Controllability of Discrete-time Linear Systems. IEEE Trans. Automatic Control 30, 399-491. Bittanti, S. and P. Bolzern (1985,b): Discrete-time Linear Periodic Systems: Grammian and Modal Criteria for Reachability and Controllability. International J. Control 41, 899-928. Bittanti, S. and P. Bolzern (1985,c): Stabilizability and Detectability of Linear Periodic Systems. Systems and Control Letters 6, 141-145. Plus Addendum, to appear in Systems and Control Letters (1986), 7, 73. Bittanti, S. and P. Bolzern (1986): On the Structure Theory of Discrete-time Linear Systems. International J. Systems Science, 17, 33-47.


Bittanti, S., P. Bolzern and P. Colaneri (1984): Stability Analysis of Linear Periodic Systems via the Lyapunov Equation. 9th IFAC World Congress, Budapest, 8, 169-172. Bittanti, S., P. Bolzern and P. Colaneri (1985): The Extended Periodic Lyapunov Lemma. Automatica 5, 603-605. Bittanti, S., P. Bolzern, P. Colaneri and G. Guardabassi (1983): H and K-Controllability of Linear Periodic Systems. 22nd Conference on Decision and Control, S. Antonio, U.S.A., 1376-1379. Bittanti, S. and P. Colaneri (1986): Lyapunov and Riccati Equations: Periodic Inertia Theorems. IEEE Trans. Automatic Control (to appear). Bittanti, S., P. Colaneri and G. De Nicolao (1986): Discretetime Periodic Systems: a note on the Reachability and Controllability interval length. Centro Teoria Sistemi, Politecnico di Milano, Int. Rep. 86-003. Bittanti, S., P. Colaneri and G. Guardabassi (1984): H-Controllability and Observability of Linear Periodic Systems. SIAM J. Control and Optimization 22, 889-893. Bittanti, S., G. Fronza and G. Guardabassi (1973): Periodic Control: A Frequency Domain Approach. IEEE Trans. Automatic Control 18, 33-38. Bittanti, S., G. Guardabassi, C. Maffezzoni and L. Silverman (1978): Periodic Systems: Controllability and the Matrix Riccati Equation. SIAM J. Control and Optimization 16, 37-40. Bittanti, S. and D.B. Hernandez (1986): The Simple Pendulum as an Illustrative Example of the Periodic Control Problem. Centro Teoria dei Sistemi, Politecnico di Milano, Int. Rep. 86-010.

Bolzern, P. (1986): Criteria for Reachability, Controllability and Stabilizability of Discrete-time Linear Periodic Systems. V Polish-English Seminar on Real-Time Process Control, Warsaw, Poland.

Bolzern, P. and P. Colaneri (1986): Existence and Uniqueness Conditions for the Periodic Solutions of the Discrete-time Periodic Lyapunov Equation. Centro Teoria dei Sistemi, Politecnico di Milano, Int. Rep. 86-011. Brockett, R.W. (1970): Finite Dimensional Linear Systems. J. Wiley and Sons. Brunovsky, P. (1969): Controllability and Linear Closed loop Controls in Linear Periodic Systems. J. Differential Equations 6, 296-313. Chen, C.T. (1970): Introduction to Linear System Theory. Holt, Rinehart and Winston. Colonius, F. (1985,a): Optimality for Periodic Control of Functional Differential Systems. J. Mathematical Analysis and Applications (to appear). Colonius, F. (1985,b): The High Frequency Pi-Criterion for Retarded Systems. IEEE Trans. Automatic Control 11, 1045-1048. DaPrato, G. (1984): Periodic Solutions of Infinite Dimensional Riccati Equations. Rendiconti Accademia Nazionale dei Lincei, (to appear). Dorato, P. and A.H. Levis (1971): Optimal Linear Regulators: the Discrete-time Case. IEEE Trans. Automatic Control 6, 613-620. Dorato, P. and H.K. Knudsen (1979): Periodic Optimization with Applications to Solar Energy Control. Automatica 15, 673-679. Gardner, W.A. and D.E. Franks (1975): Characterization of Cyclo-stationary Random Processes. IEEE Trans. Information Theory 21, 1-24. Gilbert, E.G. (1977): Optimal Periodic Control: A General Theory of Necessary Conditions. SIAM J. Control and Optimization 15, 717-746.

Gilbert, E.G. and D.T. Lyons (1981): The Improvement of Aircraft Specific Range by Periodic Control. AIAA Guidance and Control Conference, Albuquerque. Graham, A. (1981): Kronecker Products and Matrix Calculus with Applications. Ellis Horwood Limited, Chichester. Grasselli, O.M. (1984): A Canonical Decomposition of Linear Periodic Discrete-time Systems. International J. Control 40, 201-214. Guardabassi, G. (1971): Optimal Steady State Versus Periodic Control. Ricerche di Automatica 2, 240-252. Guardabassi, G. (1976): The Optimal Periodic Control Problem. Journal A 17, 75-83. Halanay, A. (1966): Differential Equations. Academic Press, New York. Hautus, M.L.J. (1969): Controllability and Observability Conditions of Linear Autonomous Systems. Indag. Math. 72, 443-448. Hernandez, V. and L. Jodar (1985): Boundary Problems and Periodic Riccati Equations. IEEE Trans. Automatic Control 11, 1131-1133. Hewer, G.A. (1975): Periodicity, Detectability and the Matrix Riccati Equation. SIAM J. Control 13, 1235-1251. Horn, F.J.M. and R.C. Lin (1967): Periodic Processes: A Variational Approach. Ind. Eng. Chem. Process Des. Dev. 6, 1, 21-30. Horn, F.J.M. and J.E. Bailey (1968): An Application of the Theorem of Relaxed Control to the Problem of Increasing Catalyst Selectivity. J. Optimization Theory and Applications 2, 441-449. Houlihan, S.C., E.M. Cliff and H.J. Kelley (1982): Study of Chattering Cruise. Journal Aircraft 19, 119-124.

Johnson, C.D. (1966): Invariant Hyperplanes for Linear Dynamical Systems. IEEE Trans. Automatic Control 11, 113-116. Kabamba, P.T. (1985): Monodromy Eigenvalue Assignment in Linear Periodic Systems. 24th Conference on Decision and Control, Ft. Lauderdale, U.S.A., 177, 178. Kalman, R.E. (1969): Theory of Regulators for Linear Plants. In: Kalman R.E., P.L. Falb and M.A. Arbib: Topics in Mathematical System Theory. McGraw-Hill Co., New York. Kano, H. and T. Nishimura (1979): Periodic Solutions of Matrix Riccati Equations with Detectability and Stabilizability. International J. Control 29, 471-487. Kern, G. (1980): Linear Closed-loop Control in Linear Periodic Systems with Application to Spin-stabilized Bodies. International J. Control 31, 905-916. Khandelwal, D.N., J. Sharma and L.M. Ray (1979): Optimal Periodic Maintenance of a Machine. IEEE Trans. Automatic Control 24, 513. Khargonekar, P.P., K. Poolla and A. Tannenbaum (1985): Robust Control of Linear Time-invariant Plants Using Periodic Compensation. IEEE Trans. Automatic Control 11, 1088-1098. Kono, M. (1980): Eigenvalue Assignment in Linear Periodic Discrete-time Systems. International J. Control 1, 149-158. Maffezzoni, C. (1974): Hamilton-Jacobi Theory for Periodic Control Problems. J. Optimization Theory and Applications 14, 21-29. Markus, L. (1973): Optimal Control of Limit Cycles or what Control Theory can do to Cure a Heart Attack or to Cause One. Symposium on Ordinary Differential Equations, Minneapolis, Minnesota (1972). W.A. Harris, Y. Sibuya, eds., Springer-Verlag, Berlin. Matsubara, M., N. Nishimura, N. Watanabe and K. Onogi (1981): Periodic Control Theory and Applications. Research Reports of Automatic Control Laboratory Vol. 28, Faculty of Engineering, Nagoya University. Matsubara, M. and K. Onogi (1978): Stabilized Suboptimal Periodic Control of a Chemical Reactor. IEEE Trans. Automatic Control 23, 1005-1008. Meyer, R.A. and C.S. Burrus (1976): Design and Implementation of Multirate Digital Filters. IEEE Trans. Acoustics, Speech and Signal Processing 1, 53-58. Nistri, P. (1983): Periodic Control Problems for a Class of Nonlinear Periodic Differential Systems. Nonlinear Analysis, Theory, Methods and Applications 7, 79-90. Noldus, E. (1975): A Survey of Optimal Periodic Control of Continuous Systems. Journal A 16, 11-16. Onogi, K. and M. Matsubara (1980): Structure Analysis of Periodically Controlled Chemical Processes. Chem. Eng. Sci. 34, 1009-1019. Popov, V.M. (1973): Hyperstability of control systems. Springer, Berlin.

Schädlich, K., U. Hoffmann and H. Hofmann (1983): Periodical Operation of Chemical Processes and Evaluation of Conversion Improvements. Chemical Engineering Science 38, 1375-1384. Shayman, M.A. (1984): Inertia Theorems for the Periodic Lyapunov Equation and Periodic Riccati Equation. Systems and Control Letters 4, 27-32. Shayman, M.A. (1985): On the Phase Portrait of the Matrix Riccati Equation Arising from the Periodic Control Problem. SIAM J. Control and Optimization 23, 717-751. Sincic, D. and J.E. Bailey (1978): Optimal Periodic Control of Variable Time-delay Systems. International J. Control 27, 547-555. Speyer, J.L. (1973): On the Fuel Optimality of Cruise. J. Aircraft 10, 763-764. Speyer, J.L. (1976): Non-optimality of Steady-state Cruise for Aircraft. AIAA Journal 14, 1604-1610. Speyer, J.L. and R.T. Evans (1984): A Second Variational Theory of Optimal Periodic Processes. IEEE Trans. Automatic Control 29, 138-148. Valko, P. and G.A. Almasy (1982): Periodic Optimization of Hammerstein-type Systems. Automatica 18, 245-148. Watanabe, N., Y. Nishimura and M. Matsubara (1976): Singular Control Test for Optimal Periodic Control Problems. IEEE Trans. Automatic Control 21, 609-610. Watanabe, N., K. Onogi and M. Matsubara (1981): Periodic Control of Continuous Stirred Tank Reactors - I, Chem. Eng. Sci. 36, 809-818, II ibid. 37, 745-752. Watanabe, N., H. Kurimoto and M. Matsubara (1984): Periodic Control of Continuous Stirred Tank Reactors - III, Case of multistage reactors. Chem. Eng. Sci. 39, 31-36. Wimmer, H.K. and A.D. Ziebur (1975): Remarks on Inertia Theorems for Matrices. Czechoslovak Mathematical Journal 25, 556-561. Wong, E. and B. Hajek (1985): Stochastic Processes in Engineering. Springer-Verlag, Berlin. Wonham, W.M. (1968): On a Matrix Riccati Equation for Stochastic Control. SIAM Journal Control 6, 681-698. Yakubovich, V.A. and V.M. Starzhinskii (1975): Linear Differential Equations with Periodic Coefficients. J. Wiley, New York.

Chapter 6

Numerical Problems in Linear System Theory

Daniel Boley and Sergio Bittanti

1. Introduction

The analysis of multivariable linear systems involves the computation of matrix problems, ranging from linear equations to matrix rank and eigenvalues. In general, the techniques to be used on a computer are not the same as those used for hand computations. In this work, we outline a few techniques used to solve numerical problems that arise in system theory. We begin with a review of the simpler methods used for computer calculations, and give examples of where and why they are useful, then go on to more sophisticated decompositions (Schur and Singular Value Decompositions). We illustrate the use of these decompositions for matrix problems, eigenvalue and rank computation, and related concepts. We conclude with a few applications to the analysis of time-invariant linear control systems.

2. Review of Simpler Computational Methods

2.1 - LU Decomposition

We begin by introducing the concept of a matrix decomposition; that is, we try to reduce a matrix A to the product of several simpler matrices, from which we can calculate whatever it is we would like to calculate.

The first example is the LU decomposition, in which we decompose a matrix A into the product A = LU, where L, U are lower, upper triangular, respectively. This decomposition is computed using Gaussian Elimination. To see this, it is best to use an example. Consider

\[ A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{bmatrix} \tag{1} \]

In Gaussian Elimination, the first step is to add multiples of row 1 to rows 2 and 3. This can be accomplished by multiplying A on the left by the matrix

\[ M_1 = \begin{bmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{bmatrix} \]

where, in this case, $m_{21} = -1/3$, $m_{31} = -2/3$ are the multipliers. Then, the result is

\[ M_1 A = \begin{bmatrix} 3 & 1 & 1 \\ 0 & 2/3 & 5/3 \\ 0 & 1/3 & 1/3 \end{bmatrix} \tag{2} \]

Then, in the next step we apply a matrix

\[ M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & m_{32} & 1 \end{bmatrix} \]

where $m_{32} = -1/2$. This has the effect of adding $m_{32} = -1/2$ times row 2 to row 3. The final result is

\[ U = M_2 M_1 A = \begin{bmatrix} 3 & 1 & 1 \\ 0 & 2/3 & 5/3 \\ 0 & 0 & -1/2 \end{bmatrix} \tag{3} \]

We note that $M_1$, $M_2$ are nonsingular ($\det M_i = 1$), so we may multiply both sides by $M_1^{-1} M_2^{-1}$ to obtain A = LU, where $L = M_1^{-1} M_2^{-1}$. By comparing (1) with (2) and (3), note that the action of matrix $M_1$ is to set to zero all the subdiagonal elements of column 1 of A; then $M_2$ is used to set to zero the subdiagonal elements of column 2.

In the general case, where A is n x n, we must apply n-1 matrices, one for each column. Matrix $M_k$, k = 1, 2, ..., n-1, has the following structure:

\[ M_k = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & m_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & m_{n,k} & & & 1 \end{bmatrix} \]

where the multipliers occupy the k-th column. Coefficients $m_{j,k}$, j = k+1, k+2, ..., n, will be referred to as the multipliers.

The last item we need to complete the description of the LU decomposition is: what is "L"? To see what form L has, we note that the inverse of $M_k$, as can be easily verified, is the same as $M_k$ with the multipliers in the k-th column negated:

\[ M_k^{-1} = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & -m_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & -m_{n,k} & & & 1 \end{bmatrix} \]

Secondly, we note that when we form the product $L = M_1^{-1} \cdots M_{n-1}^{-1}$, the result is simply to fill in all the multipliers below the diagonal. In our 3 × 3 example, we have:

\[ L = M_1^{-1} M_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 1/3 & 1 & 0 \\ 2/3 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1/2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1/3 & 1 & 0 \\ 2/3 & 1/2 & 1 \end{bmatrix} \]

Here, one can see that the net effect is to collect all the multipliers from all the $M_k$, change their sign and place them below the diagonal in their corresponding position. On the diagonal, we have all "1"'s. Hence, the (i,j) position of L, i > j, is $-m_{ij}$, where $m_{ij}$ is the multiplier used on row j when added to row i during the stage when column j is being zeroed:

\[ L = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ (-m_{ij}) & & 1 \end{bmatrix} \]

In conclusion, we have found a decomposition for A: A = LU, where L is lower triangular with "1"'s on the diagonal and U is upper triangular. What can we do with this? We give 2 uses of this decomposition:

A. Solve Linear Equations

By using A = LU, the system Ax = b is equivalent to

\[ LUx = b. \tag{4} \]

If we call y = Ux, we then reduce equation (4) to two triangular systems:

\[ Ly = b, \qquad Ux = y. \]
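The elimination and the two triangular solves can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it uses exact rational arithmetic, performs no pivoting (so it assumes every pivot is nonzero), and the 3 × 3 matrix, the right-hand side, and the names `lu_decompose` and `solve` are chosen here for illustration only.

```python
from fractions import Fraction


def lu_decompose(a):
    """Return (L, U) with A = L*U, L unit lower triangular (no pivoting)."""
    n = len(a)
    u = [[Fraction(v) for v in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):                 # zero the subdiagonal of column k
        for i in range(k + 1, n):
            m = -u[i][k] / u[k][k]         # multiplier (assumes u[k][k] != 0)
            l[i][k] = -m                   # L collects the negated multipliers
            for j in range(k, n):
                u[i][j] += m * u[k][j]     # row_i := row_i + m * row_k
    return l, u


def solve(l, u, b):
    """Solve L y = b (forward substitution), then U x = y (back substitution)."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        y[i] = Fraction(b[i]) - sum(l[i][j] * y[j] for j in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x


A = [[3, 1, 1], [1, 1, 2], [2, 1, 1]]      # illustrative matrix (assumption)
L, U = lu_decompose(A)
x = solve(L, U, [5, 4, 4])                 # right-hand side chosen as A * (1, 1, 1)'
det_A = U[0][0] * U[1][1] * U[2][2]        # product of the diagonal elements of U
```

Once `L` and `U` are available, `solve` can be called again with a different right-hand side without repeating the decomposition.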

Triangular systems are easily solved using the process known as "back-substitution". We also note that if we apply the row operations of Gaussian Elimination to the vector b, the result will be

\[ M_{n-1} M_{n-2} \cdots M_1 b = L^{-1} b = L^{-1} A x = U x = y. \]

Then the solution x can be found by solving

\[ Ux = y. \]

It turns out that the extra work involved in applying the row operations to b to obtain y is exactly the same as the work to solve Ly = b for y. The two schemes are exactly equivalent, except that by using Ly = b, we may also solve directly Ax = b, with a new right hand side b, without repeating the decomposition.

B. Computing Determinant

Since the determinant of a product of matrices is equal to the product of the determinants, we may immediately write the determinant of A as:

\[ \det A = \det L \cdot \det U = 1 \cdot (u_{11} u_{22} \cdots u_{nn}), \]

where $u_{ij}$ is the (i,j) element of U. Here, we have used the well known fact that the determinant of a triangular matrix is simply the product of the diagonal elements. Using the LU decomposition is much faster than using the definition of det A (Stewart, 1973).

2.2 - Orthogonal Decomposition

2.2.1 - QR Decomposition

In the LU decomposition, we have applied matrices that are not

orthogonal; they do not preserve lengths or angles of vectors. Since 2 vectors may be made almost parallel by such transformations, we would like to see what one can do with orthogonal transformations, which do preserve lengths and angles.

An n x n matrix Q is orthogonal if its columns $q_i$ are mutually ortho-normal, i.e.

\[ q_i' q_j = \begin{cases} 0, & \text{if } i \neq j \\ 1, & \text{if } i = j, \end{cases} \]

or, in matrix notation, $Q'Q = I$. In this section, we show how one may triangularize a matrix using only orthogonal transformations, thereby preserving lengths and angles. Then we show why the use of orthogonal decompositions is particularly useful by giving an example of its use.

Consider the matrix:

\[ A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{bmatrix} \]

We would like to reduce the first column of A to the form $[\,?\ \ 0\ \ 0\,]'$, where "?" denotes a nonzero element whose value is to be determined. We see how to do this using a "rotation", i.e. a transformation of the form

\[ Q_1 = \begin{bmatrix} c & s & 0 \\ -s & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{5} \]

where, by orthogonality, $c^2 + s^2 = 1$.

We will use $Q_1$ to zero element $a_{21}$, i.e.

\[ \begin{bmatrix} c & s & 0 \\ -s & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ a_{31} \end{bmatrix} = \begin{bmatrix} ? \\ 0 \\ a_{31} \end{bmatrix} \]

The second line yields:

\[ -s \cdot 3 + c \cdot 1 = 0, \]

which, together with $c^2 + s^2 = 1$, yields

\[ c = 3/\sqrt{10}, \qquad s = 1/\sqrt{10}. \]

Having defined $Q_1$, we apply it to A to obtain $Q_1 A$. Then, in the same way, we find the elements c and s of the rotation

\[ Q_2 = \begin{bmatrix} c & 0 & s \\ 0 & 1 & 0 \\ -s & 0 & c \end{bmatrix} \tag{6} \]

to zero $a_{31}$. To complete the triangular decomposition, we need a third rotation $Q_3$ to zero $a_{32}$, obtaining finally

\[ R = Q_3 Q_2 Q_1 A. \]

In general, in the n x n case, we need r = n(n-1)/2 rotations, one for each subdiagonal element, to zero all the subdiagonal elements of A:

\[ R = Q_r Q_{r-1} \cdots Q_1 A = \text{an upper triangular matrix.} \]

Let $Q = (Q_r Q_{r-1} \cdots Q_1)^{-1}$. By orthogonality:

\[ Q = Q_1' Q_2' \cdots Q_r'. \]

We have now the so called QR decomposition or orthogonal triangularization of A:

\[ A = QR, \]

where Q is orthogonal and R is upper triangular.

2.2.2 - Geometric Interpretation of a Rotation

A rotation only affects 2 components of a vector, as can be seen from (5) and (6). Hence, we may look at a representative problem in the 2-space. Consider an arbitrary 2 × 2 rotation:

\[ Q = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \]

We may also represent $c = \cos\varphi$, $s = \sin\varphi$ for some angle $\varphi$. Consider a vector x of $R^2$, and denote by $\theta$ the angle between x and $e_1 = [1\ \ 0]'$:

\[ x = \|x\| \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \]

Hence,

\[ Qx = \|x\| \begin{bmatrix} c\cos\theta + s\sin\theta \\ -s\cos\theta + c\sin\theta \end{bmatrix} = \|x\| \begin{bmatrix} \cos(\theta - \varphi) \\ \sin(\theta - \varphi) \end{bmatrix} \]

This means that Qx is the vector x rotated by angle $-\varphi$.
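The triangularization by rotations described in 2.2.1 can be sketched as follows. This is a minimal illustration in floating point, assuming a square matrix; the function name `givens_qr` and the test matrix are hypothetical choices made here. Each rotation acts on rows j and i only and zeroes one subdiagonal element, so an n x n matrix needs n(n-1)/2 of them.

```python
import math


def givens_qr(a):
    """QR by plane rotations: returns (Q, R, count) with A = Q*R."""
    n = len(a)
    r = [[float(v) for v in row] for row in a]
    qt = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulates Q'
    count = 0
    for j in range(n - 1):
        for i in range(j + 1, n):
            h = math.hypot(r[j][j], r[i][j])
            if h == 0.0:
                c, s = 1.0, 0.0            # nothing to zero: identity rotation
            else:
                c, s = r[j][j] / h, r[i][j] / h
            for m in (r, qt):              # rotate rows j and i of R and of Q'
                for k in range(n):
                    top, bot = m[j][k], m[i][k]
                    m[j][k] = c * top + s * bot
                    m[i][k] = -s * top + c * bot
            count += 1
    q = [[qt[j][i] for j in range(n)] for i in range(n)]        # Q = (Q')'
    return q, r, count


A = [[3.0, 1.0, 1.0], [1.0, 1.0, 2.0], [2.0, 1.0, 1.0]]  # illustrative matrix
Q, R, count = givens_qr(A)
```

For the first rotation the code picks c = 3/sqrt(10) and s = 1/sqrt(10), which is the pair satisfying -3s + c = 0 and c^2 + s^2 = 1.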

2.2.3 - QR Decomposition by Householder Transformations

As we have seen in 2.2.1, the matrix Q of the QR decomposition can be obtained by multiplying n(n-1)/2 rotations. There is an alternative way of computing the QR decomposition, by means of the so-called "Householder transformations". The main advantage of these transformations is that one can zero out as many components of a vector as one likes by a single transformation, whereas, in general, by a rotation one can zero out one component only. This implies that, to obtain the QR decomposition, one can use n-1 Householder transformations, instead of n(n-1)/2 rotations.

The Householder transformation (or "elementary reflection") can be introduced by making reference to $R^2$ as follows. We want to transform a vector x to a vector v along, e.g., axis $e_1 = [1\ \ 0]'$ such that

\[ \|v\| = \|x\|. \]

This can be achieved by reflecting the vector x around x + v (see Fig. 1).

[Figure 1: the vectors x and v, the axis $e_1$, and the axis of reflection z = x + v.]

We go through the following steps (note that we know x and $v = \|x\| e_1$):

- Axis of reflection:

\[ z = x + v = [\,x_1 + \|x\|,\ x_2\,]'. \]

The corresponding unit vector is $z/\|z\|$.

- Project x onto the axis of reflection to obtain:

\[ a = \frac{z z' x}{\|z\|^2}. \]

- Find the difference between x and its projection:

\[ b = a - x. \]

- Reflect x around its projection a (or equivalently z):

\[ v = x + 2b = x + 2(a - x) = 2a - x = -x + 2\,\frac{z z'}{\|z\|^2}\, x = -\left(I - 2\,\frac{z z'}{\|z\|^2}\right) x. \]

The matrix

\[ P = I - 2\,\frac{z z'}{\|z\|^2} \]

gives the Householder transformation. Since v = -Px, we can conclude that, by such a linear transformation, one can "reflect" a vector x into any desired target direction of the space. In n-space, we can pick the axis z so as to zero out at once as many components of a vector as we like.
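A short numerical check of the construction above, under the assumption that the target direction is $e_1$ and that x does not point along $-e_1$ (otherwise z = 0). The function name `householder` and the sample vector are illustrative choices, not taken from the text.

```python
import math


def householder(x):
    """Return (P, z) with P = I - 2 z z'/||z||^2 and z = x + ||x|| e1."""
    n = len(x)
    norm_x = math.sqrt(sum(t * t for t in x))
    z = [float(t) for t in x]
    z[0] += norm_x                          # z = x + v, with v = ||x|| e1
    zz = sum(t * t for t in z)              # ||z||^2, assumed nonzero
    p = [[float(i == j) - 2.0 * z[i] * z[j] / zz for j in range(n)]
         for i in range(n)]
    return p, z


x = [3.0, 1.0, 2.0]                         # illustrative vector (assumption)
P, z = householder(x)
Px = [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]
norm_x = math.sqrt(sum(t * t for t in x))
# Px equals -||x|| e1 up to rounding, i.e. v = -Px = ||x|| e1:
# a single transformation zeroes every component except the first.
```

Because P is symmetric and orthogonal, applying it to the remaining columns of a matrix preserves lengths and angles, which is what the QR decomposition requires.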

2.2.4 - Solving Least Squares Problems Using Orthogonal Decompositions

Let $A \in R^{m \times n}$, $m \geq n$, $b \in R^m$ and $x \in R^n$. The Linear Least Squares problem is the problem of finding the following minimum:

\[ \min_x \|Ax - b\|. \tag{7} \]

The algorithm of 2.2.1 may be applied to rectangular matrices just as well as square ones. In this case we see that the QR decomposition of the rectangular matrix A is:

\[ A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \tag{8} \]

where $Q \in R^{m \times m}$ is orthogonal, $R \in R^{n \times n}$ is upper triangular and $0 \in R^{(m-n) \times n}$ is a block of zero elements.

As Q is orthogonal, it does not change the norm. Hence:

\[ \|Ax - b\| = \|Q'(Ax - b)\| = \left\| \begin{bmatrix} R \\ 0 \end{bmatrix} x - c \right\|, \]

where c = Q'b. Partitioning this vector conformally with $\begin{bmatrix} R \\ 0 \end{bmatrix}$,

\[ c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}, \]

we have

\[ \|Ax - b\| = \left\| \begin{bmatrix} Rx - c_1 \\ -c_2 \end{bmatrix} \right\|. \]

In order to minimize this norm, we set

\[ x = R^{-1} c_1. \tag{9} \]

Thus,

\[ \min_x \|Ax - b\| = \|c_2\|. \]

To find the optimum value of x given by (9) we have to solve system

\[ Rx = c_1. \tag{10} \]

In this respect, it is worth noticing that a direct minimization of $\|Ax - b\|$ leads to the celebrated normal equations:

\[ A'Ax = A'b. \tag{11} \]

In view of (8), system (11) is equivalent to

\[ R'Rx = R'c_1. \tag{12} \]

It is important to observe that solving system (10) is preferable to solving (12) as one obtains fewer errors. For a complete discussion of this, see e.g. the error analysis of Linear Systems (Lawson and Hanson, 1974). However, we may give an example of the problem. Suppose that we are working on a computer with precision $10^{-7}$, i.e. we carry only 7 significant digits. Consider matrix

\[ A = \begin{bmatrix} 1 & 1 \\ 10^{-4} & 0 \\ 0 & 10^{-4} \end{bmatrix} \tag{13} \]
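For the matrix (13), what happens to A'A in 7-digit arithmetic can be reproduced with a short sketch. It assumes that Python's `decimal` module, with the context precision set to 7 significant digits, is an acceptable stand-in for the hypothetical 7-digit computer of the text; the variable names are illustrative.

```python
from decimal import Decimal, getcontext

getcontext().prec = 7                       # model 7 significant digits

d = Decimal("1e-4")
A = [[Decimal(1), Decimal(1)], [d, Decimal(0)], [Decimal(0), d]]

# Form A'A entry by entry; every arithmetic result is rounded to 7 digits,
# so 1 + 10^-8 is stored as 1.
AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
det_AtA = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
```

In this model the computed A'A has all four entries equal to 1 and a zero determinant, although the two columns of A are linearly independent.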

A has rank 2, but if we form

\[ A'A = \begin{bmatrix} 1 + 10^{-8} & 1 \\ 1 & 1 + 10^{-8} \end{bmatrix} \]

in our computer we will lose the part "$10^{-8}$" and obtain instead

\[ A'A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \]

which has rank 1. So, we lose rank information.

3. Special Forms Used in Numerical Linear Algebra - Why

The LU and QR decompositions discussed above are used as the basic step in the computation of the decompositions to be introduced in the following. The previous section also serves to give the flavour of the approach used in the rest of this work. We now concentrate on the problems of finding a number of useful things about a given square matrix A, i.e.

(i) Eigenvalues: $\lambda_i$
(ii) Determinant: det A
(iii) Rank of A
(iv) Nullspace of A: ker A
(v) Image or Range of A: colsp A.

3.1 - The Jordan Canonical Form

As is well known, there are many classical decompositions for

matrices, the most common being the Jordan Canonical decomposition. The matrix A is then decomposed into the product of 3 matrices:

\[ A = P J P^{-1} \]

where P is nonsingular, and J is in the so-called Jordan Canonical Form (Gantmacher, 1959). This form can tell us the eigenvalues of A (elements of the diagonal of J), the determinant (product of the diagonal elements of J), and the rank (dimension of the matrix minus the number of Jordan blocks corresponding to $\lambda_i = 0$). Furthermore, the columns of P corresponding to all zero columns of J generate the nullspace of A, whereas the columns of P corresponding to nonzero rows of J generate the range of A. So, the Jordan Canonical Form enables one to find out all items (i) - (v).

However, for numerical computations, the Jordan Canonical Form is almost never advised. The matrix P can be extremely ill-conditioned (i.e. almost singular), especially if the eigenvalues $\lambda_i$ are poorly separated (almost coinciding). But even if the $\lambda_i$ are "well separated", finding the Jordan form is an ill-posed problem. This calls for the question of conditioning.

3.2 - Numerical Conditioning of a Problem

Numerical conditioning is an important consideration whenever one considers the use of a digital computer. Because of the finite word length, numbers can be represented only approximately in the computer. In the treatment of such approximations, the model most often used is to consider what happens if the numbers are perturbed slightly.

The eigenvalues can be extremely sensitive to perturbations: take for example the 3 x 3 matrix

\[ A = \begin{bmatrix} -64 & 82 & 21 \\ 144 & -178 & -46 \\ -771 & 962 & 248 \end{bmatrix} \tag{14} \]

which has eigenvalues 1, 2, 3. If we add a small perturbation $\varepsilon E$, where

01 E = 10 -4

-0.6

I. I

-6

_-0.1

0.3

-1

is a rank one matrix of norm $\approx 10^{-3}$, then, for any $\varepsilon > 0.45$, the perturbed matrix $A + \varepsilon E$ will have complex eigenvalues! This shows that problems may occur even on small innocuous-looking matrices!

Even with 7-16 digits of accuracy, this is a relevant problem. In the following 20 x 20 example, due to Wilkinson (1965), perturbations in the 9th place in some element will destroy all digits of accuracy in the eigenvalues - even the order of magnitude will be wrong in some cases:

\[ A = \begin{bmatrix} 20 & 20 & & & \\ & 19 & 20 & & \\ & & \ddots & \ddots & \\ & & & 2 & 20 \\ & & & & 1 \end{bmatrix} \tag{15} \]
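The sensitivity of matrix (15) can be explored without an eigenvalue routine. The sketch below rests on two assumptions that are not in the text: the perturbation is a single number eps placed in element (20, 1), and its size is 10^-10. Expanding the determinant along the first column then gives det(A - lam I) = (20 - lam)(19 - lam)...(1 - lam) - eps 20^19, which the code evaluates exactly with rational numbers. Counting sign changes on a grid is only an indication of the number of real eigenvalues, not a proof.

```python
from fractions import Fraction


def char_poly(lam, eps):
    """det(A - lam*I) for matrix (15) with eps added in element (20, 1)."""
    p = Fraction(1)
    for k in range(1, 21):
        p *= k - lam                        # product of the diagonal terms
    return p - eps * Fraction(20) ** 19     # contribution of the (20, 1) element


def sign_changes(eps):
    """Sign changes of the characteristic polynomial on a grid of step 1/20."""
    grid = [Fraction(2 * i + 1, 40) for i in range(420)]   # 0.025 ... 20.975
    signs = [char_poly(lam, eps) > 0 for lam in grid]
    return sum(a != b for a, b in zip(signs, signs[1:]))


unperturbed = sign_changes(Fraction(0))             # one change per eigenvalue 1..20
perturbed = sign_changes(Fraction(1, 10 ** 10))     # most real crossings disappear
```

The constant term eps 20^19 is about 5 x 10^14 even for eps = 10^-10, which is why such a small change in one element moves the roots so far.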

We point out that, to obtain the solution to a problem, one must apply an algorithm. If a problem is ill-posed or ill-conditioned, then slight perturbations to the data (such as those occurring from the finite word length of the computer) can destroy the accuracy of a solution by any algorithm, as was seen in example (15). In this case, there is no algorithm that can give the desired results.

In case a problem is not so badly conditioned, then we may consider methods for its solution. In this case, we would like to use methods which introduce as few errors as possible, or at least provide the smallest bounds on such errors. These desirable properties are called collectively the stability of an algorithm. The best result one can hope for is limited by the conditioning of the problem and the stability of the algorithm. If a problem is well-conditioned and the method is stable, then we may believe the answer. If the problem is well-conditioned but the method is unstable, then we can hope for a better algorithm. In the opposite case, where the problem is badly conditioned, we cannot hope to improve the result, whether the algorithm is stable or not.

The usual measure of stability for algorithms in Linear Algebra is so-called "Backward Stability". This is introduced in the next section in terms of a specific algorithm.

4. Schur Decomposition

The decomposition which is numerically useful and which most closely corresponds to the Jordan decomposition is the so-called Schur decomposition. Denoting by "*" conjugate transpose, this decomposition is given by

\[ A = Q R Q^*, \tag{16} \]

where Q is a (possibly complex) orthogonal matrix ($Q^*Q = I$) and R is a (possibly complex) upper triangular matrix.
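For a 2 × 2 real matrix with real eigenvalues, form (16) can be written down directly, which gives a compact illustration. The sketch below is not a general Schur algorithm (practical codes use the QR iteration); the function name `schur_2x2` and the sample matrix are assumptions made for this example. The first column of Q is a unit eigenvector, which forces the (2,1) element of R = Q'AQ to vanish.

```python
import math


def schur_2x2(a):
    """Real Schur form A = Q R Q' of a 2x2 matrix with real eigenvalues."""
    (a11, a12), (a21, a22) = a
    tr, det = a11 + a22, a11 * a22 - a12 * a21
    disc = tr * tr - 4.0 * det
    if disc < 0:
        raise ValueError("complex eigenvalues: a complex Q would be needed")
    lam = (tr + math.sqrt(disc)) / 2.0       # one eigenvalue of A
    v = (a12, lam - a11)                     # solves the first row of (A - lam I) v = 0
    if v == (0.0, 0.0):
        v = (lam - a22, a21)                 # fall back to the second row
    if v == (0.0, 0.0):
        v = (1.0, 0.0)                       # A is a multiple of the identity
    h = math.hypot(v[0], v[1])
    c, s = v[0] / h, v[1] / h
    q = [[c, -s], [s, c]]                    # first column: unit eigenvector
    aq = [[sum(a[i][k] * q[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    r = [[sum(q[k][i] * aq[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return q, r


Q, R = schur_2x2([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 5 and 2
```

The diagonal of `R` carries the eigenvalues, and their product equals det A = 10.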

We first note the mathematical relevance of the Schur decomposition: since (16) is a similarity transformation, the eigenvalues of A are those of R, i.e. the diagonal elements of R. The determinants satisfy the relation:

\[ \det A = \det Q \, \det R \, \det Q^* = \det R = \text{product of the diagonal elements of } R. \]

Hence, the Schur form yields items (i) and (ii) of Sect. 3.

What about the numerical properties? Since Q is orthogonal, it is always bounded in size and well-conditioned (the columns are never "almost parallel", so that Q is never "close to singularity"). The advantage of the Schur form is that perturbations in A will result in perturbations of the same size in R, so the ill-conditioning of the original problem is not made worse. Though no algorithm/form can hope to remove the ill-conditioning in a problem, we can make sure that the result is not more sensitive to perturbations than the starting problem.

In general, any algorithm based on unitary transformations can be shown (Wilkinson, 1965) to be "backward" stable. This is the distinction between the numerical instability of an algorithm and the ill-conditioning of a problem just mentioned in Sect. 3.2. Instead of saying

(a) the algorithm is forward stable: "The algorithm has produced an answer R close to the R that we would hope to obtain with exact arithmetic starting with the original problem A",

we say

(b) the algorithm is backward stable: "The algorithm has produced an answer R which is exact for a slightly perturbed problem close to the original problem A".

Here, "close" means on the order of the precision of the computer used. As is seen from examples (14) and (15), (a) is not true in general - slight changes to A can destroy

the exact eigenvalues, hence no method can satisfy (a), as examples (14) and (15) show, but we can still hope for (b). In exact arithmetic, if we make no errors, then we obtain the exact eigenvalues. If our method makes slight errors, we may still obtain, by using orthogonal transformations, the exact eigenvalues for a matrix close to A (backward stability). This is the best one can expect in a method.

Unfortunately, the Schur form cannot be used reliably for items (iii)-(v). This can be illustrated as follows. If A is already upper triangular, then the Schur form is obtained with R = A, Q = I. Consider the n × n matrix:

\[ A = \begin{bmatrix} 1 & -1 & \cdots & -1 \\ & 1 & \ddots & \vdots \\ & & \ddots & -1 \\ \varepsilon & & & 1 \end{bmatrix} \tag{17} \]

If $\varepsilon = 0$, all the eigenvalues are exactly 1. Nevertheless, the rank is close to deficient, since "perturbing" $\varepsilon$ to the value $\varepsilon = -1/2^{(n-2)}$ will make A singular! This can be verified by forming Ax with

\[ x = (2^{-1},\ 2^{-2},\ 2^{-3},\ \ldots,\ 2^{-(n-1)},\ 2^{-(n-1)})'. \]

Therefore, we need a better way to find the rank, det, etc. of a matrix. Such a way is provided by the Singular Value Decomposition.
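The claim about matrix (17) is easy to confirm in exact arithmetic. The sketch below assumes the layout described in the text (ones on the diagonal, -1 above it, and the perturbation in the bottom-left corner); the size n = 12 and the helper name `matrix17` are arbitrary choices made for the illustration.

```python
from fractions import Fraction


def matrix17(n, eps):
    """1 on the diagonal, -1 above it, eps in position (n, 1)."""
    a = [[Fraction(int(i == j) - int(j > i)) for j in range(n)]
         for i in range(n)]
    a[n - 1][0] = Fraction(eps)
    return a


n = 12
eps = -Fraction(1, 2 ** (n - 2))             # about -0.001 for n = 12
A = matrix17(n, eps)
# Candidate null vector x = (2^-1, 2^-2, ..., 2^-(n-1), 2^-(n-1))'
x = [Fraction(1, 2 ** (k + 1)) for k in range(n - 1)] + [Fraction(1, 2 ** (n - 1))]
Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
```

Every component of `Ax` is exactly zero, so the perturbed matrix is singular even though the unperturbed triangular matrix has all its eigenvalues equal to 1; the required perturbation shrinks like 2^-(n-2) as n grows.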

5. Singular Value Decomposition - Condition Number of a Matrix

In this section, we introduce another decomposition relevant to such items as rank: the Singular Value Decomposition (S.V.D.). We also introduce the concept of Condition Number of a matrix and try to explain its significance.

The S.V.D. of an m x p matrix A is

\[ A = U \Sigma V^* \tag{18} \]

where U and V are square and orthogonal matrices and $\Sigma$ is an m x p real and diagonal matrix, with nonnegative diagonal elements. By letting n = min(m,p), we usually assume that

\[ \Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n), \qquad \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0. \]

In what follows, we will use the matrix 2-norm:

\[ \|A\| = \max_{\|x\| = 1} \|Ax\|, \tag{19} \]

where $\|x\|$ is the usual vector 2-norm. In this norm, we have several properties. The most relevant property is the fact that orthogonal matrices do not affect the 2-norm of a matrix or vector (Stewart, 1973):

\[ \|Qx\| = \|x\| \tag{20a} \]

\[ \|QA\| = \|A\|. \tag{20b} \]

Notice also that $\|Q\| = 1$.

What kind of information can one glean from the S.V.D.? The most obvious is the norm of A. From (18), (20) and (19):

\[ \|A\| = \|\Sigma\| = \sigma_1. \]

If A is a nonsingular square matrix, its inverse is given by

\[ A^{-1} = V \Sigma^{-1} U^*. \]

Moreover,

\[ \|A^{-1}\| = \|\Sigma^{-1}\| = \sigma_n^{-1}. \]

Given a square nonsingular matrix A, the number

\[ k(A) = \|A\| \cdot \|A^{-1}\| \tag{21} \]

is said to be the condition number of A. Obviously,

\[ k(A) = \sigma_1 / \sigma_n. \]

The condition number happens to be a very useful quantity in estimating the sensitivity of such items as rank, determinant, inverse, solution to a set of linear equations, etc., with respect to perturbations to the matrix A. It also gives the "distance to singularity". To see this, we start with the classical origin of k(A). Consider the problem of solving the matrix equation Ax = b. When using a computer, we obtain an approximate result $\tilde{x}$, which we consider exact for the slightly perturbed problem $A\tilde{x} = \tilde{b}$. Note that we have perturbed only b, not A. We have:

\[ Ax = b, \qquad A\tilde{x} = \tilde{b}. \]

Subtract to get

\[ A(\tilde{x} - x) = \tilde{b} - b. \]

Multiplying both sides by $A^{-1}$ and taking the norms, one obtains:

\[ \|\tilde{x} - x\| \leq \|A^{-1}\| \, \|\tilde{b} - b\|, \tag{22} \]

i.e. the (error in answer) is bounded by the (error in right hand side) magnified by $\|A^{-1}\|$. However, to estimate the number of digits of accuracy in x, we need the "relative error"

\[ \frac{\|\tilde{x} - x\|}{\|x\|}. \]

If the relative error is e.g. $\approx 10^{-6}$, then we have about 6 digits of accuracy, regardless of the size of x. To obtain an estimate of the relative error, we use the relation Ax = b to obtain:

\[ \|A\| \, \|x\| \geq \|b\|, \]

i.e.

\[ \frac{\|A\|}{\|b\|} \geq \frac{1}{\|x\|}. \tag{23} \]

From (22) and (23), and definition (21),

\[ \frac{\|\tilde{x} - x\|}{\|x\|} \leq k(A) \, \frac{\|\tilde{b} - b\|}{\|b\|}, \tag{24a} \]

i.e. the relative error in x, which determines the number of good digits in x, is bounded by the relative error in b (the number of good digits in b) magnified by k(A).

For perturbations in A, one can obtain an analogous result:

\[ \frac{\|\tilde{x} - x\|}{\|\tilde{x}\|} \leq k(A) \, \frac{\|\tilde{A} - A\|}{\|\tilde{A}\|}. \tag{24b} \]

Here, the errors are relative to the approximate values.

We give a few examples of condition numbers of some particular matrices:

(a) k(I) = 1

(b) k(A) $\geq$ 1, for any A.

Indeed:

\[ k(A) = \|A\| \, \|A^{-1}\| \geq \|A A^{-1}\| = \|I\| = 1. \]

(c) k(Q) = 1, if Q is orthogonal.

(d) Let $T_6$ be the 6 × 6 Hilbert matrix, the elements of which are $t_{ij} = (1+i+j)^{-1}$. Then, $k(T_6) \approx 10^6$.
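The bound (24a) can be checked on a small example. The sketch below is restricted to 2 × 2 matrices, where the singular values follow in closed form from $\sigma_1 \sigma_2 = |\det A|$ and $\sigma_1^2 + \sigma_2^2 = \|A\|_F^2$ (a shortcut valid only for this size, used here to avoid a general S.V.D. routine). The matrix, the right-hand sides and the function names are illustrative assumptions.

```python
import math


def singular_values_2x2(a):
    """Closed form for 2x2: s1*s2 = |det A|, s1^2 + s2^2 = ||A||_F^2."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    f = sum(v * v for row in a for v in row)
    s1 = math.sqrt((f + math.sqrt(max(f * f - 4.0 * det * det, 0.0))) / 2.0)
    return s1, abs(det) / s1


def solve_2x2(a, b):
    """Cramer's rule, adequate for a 2x2 illustration."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


A = [[1.0, 1.0], [1.0, 1.0001]]              # nearly singular (illustrative)
b, b_pert = [2.0, 2.0001], [2.0, 2.0002]     # perturbation of size 1e-4 in b
s1, s2 = singular_values_2x2(A)
k = s1 / s2                                  # condition number, about 4e4
x, x_pert = solve_2x2(A, b), solve_2x2(A, b_pert)
rel_x = norm([p - q for p, q in zip(x_pert, x)]) / norm(x)
rel_b = norm([p - q for p, q in zip(b_pert, b)]) / norm(b)
```

Here a relative change of a few parts in 10^5 in b changes the solution from about (1, 1) to about (0, 2), a relative change of order 1, which stays below the bound `k * rel_b`.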

The S.V.D. and k(A) can be used to find the "distance to singularity" of a matrix A. We note that

\[ \mathrm{rank}\, A = \mathrm{rank}\, \Sigma = \text{number of nonzero } \sigma_i\text{'s}. \]

In particular, if A is nonsingular, $\sigma_i > 0$, $\forall i$. Now, suppose that A is nonsingular and E is a perturbation such that A + E is singular. Then, we may write, using the S.V.D. of A (18):

\[ U^*(A + E)V = U^* A V + U^* E V = \Sigma + F, \]

where $F = U^* E V$. Because U, V are orthogonal, $\|E\| = \|F\|$. Thus, perturbations E to A correspond exactly to perturbations F to $\Sigma$. We can define the "distance to singularity" as the norm of the smallest E such that A + E is singular:

\[ d_{sing} = \min_{A + E \ \mathrm{sing.}} \|E\|. \tag{25} \]

In view of the discussion above, this corresponds to

\[ d_{sing} = \min_{\Sigma + F \ \mathrm{sing.}} \|F\|. \tag{26} \]

Since $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n > 0$, it is clear that the F which achieves the minimum in (26) is

\[ F = \mathrm{diag}(0, 0, \ldots, 0, -\sigma_n) \]

so that

\[ \|F\| = \sigma_n. \]

Hence, the E achieving the minimum in (25) is

\[ E = U F V^* = -\sigma_n u_n v_n^*, \]

where we have labeled the columns of U, V:

\[ U = [u_1\ u_2\ \ldots\ u_n], \qquad V = [v_1\ v_2\ \ldots\ v_n]. \]

Notice also that

\[ \|E\| = \sigma_n. \]

Hence, the distance to singularity is

\[ d_{sing} = \sigma_n. \tag{27} \]

Consequently, the "relative distance to singularity" (relative to the size of the starting matrix A) is:

\[ \frac{d_{sing}}{\|A\|} = \frac{\sigma_n}{\sigma_1} = \frac{1}{k(A)}. \tag{28} \]

So, k(A) not only indicates the difficulty one can expect in solving Ax = b, but also shows how close A is to a singular

matrix relative to the size of the matrix. In other words, k(A) gives the sensitivity of rank A to perturbations in A.

It should be noted in passing that k(A) can be defined using any matrix norm corresponding to a vector norm, and relations analogous to (24) hold in any such norm. The results (27), (28), however, are valid only in the 2-norm.

We have seen how the S.V.D. can be used to obtain such quantities as $\|A\|$, rank A, k(A). What about points (iv) and (v) of Sect. 3, involving the spaces ker A, colsp A? Let us start with a singular n × n A. For this case, suppose that the singular values of A satisfy:

\[ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > \sigma_{r+1} = \sigma_{r+2} = \cdots = \sigma_n = 0, \]

so that rank A = r. We note that the quantity $\sigma_r$ gives the size of the smallest perturbation needed to further reduce the rank, and $(\sigma_r / \sigma_1)$ gives the relative size of such perturbation. We write the S.V.D. of A as

\[ A = U \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} V^* = [\,U_1\ \ U_2\,] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^* \\ V_2^* \end{bmatrix} \tag{29} \]

where $\Sigma_1 = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$

is r × r and nonsingular. U, V have been partitioned conformally to $\Sigma$. Thus,

\[ A = U_1 \Sigma_1 V_1^*, \]

where $U_1$, $V_1$ are n x r orthonormal matrices and $\Sigma_1$ is r × r nonsingular. Hence, $U_1$ is an orthonormal basis for colsp A, and ker A is the orthogonal complement of the space with $V_1$ as basis, i.e. $V_2$ is an orthonormal basis for ker A.

In practice, to use (29), one will frequently encounter the situation

\[ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > \sigma_{r+1} \geq \cdots \geq \sigma_n \geq 0, \]

where some of the singular values are "small", on the order of the machine precision, instead of exactly zero. Hence the problem is to decide "how small is small", i.e. at what point to consider a small singular value zero. It is best to illustrate the problem here. Assume that the singular values (scaled so that $\sigma_1 = 1$) are, e.g.,

10^0   10^-1   10^-1   10^-2   10^-4   10^-8   10^-9   10^-10   0   0,

where only the order of magnitude is shown. Then we see that the exact rank of the matrix is 8. But if the original data had only 6 digits of accuracy, then any number < 10^-6 should be considered zero, and we would consider the rank to be 5. If, instead, we had the values

10^0   10^-2   10^-4   10^-6   10^-8   10^-10   10^-12   10^-14   10^-16   10^-18

or

10^0   10^-1   10^-2   10^-3   10^-4   10^-5   10^-6   10^-7   10^-8   10^-9,

then there almost

is no o b v i o u s

entirely

Unfortunately, practice,

small

see

this

the S.V.D.

This

arise

really

means

perturbation

to A w i l l

reduce

the rank,

only

slightly

larger,

in

is n o t a d e f e c t

arises,it

For a full d i s c u s s i o n

this

frequently

situation

(Klema and Laub,

We close

c a n and d o e s

in l a r g e m a t r i c e s .

perturbation,

further.

rank depends

of the z e r o t o l e r a n c e !

situation

If this

(negligible)

another even

so the e f f e c t i v e

o n the c h o i c e

especially

of the S.V.D.

gap,

can r e d u c e

of the S.V.D.

that a and

the r a n k

and the r a n k

1980).
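The rank decision described above can be sketched in a few lines. This is only an illustration in Python with NumPy (an assumption; the text names no language). The helper `numerical_rank` and the tolerance values are hypothetical, and the singular values are the orders of magnitude from the illustration.

```python
import numpy as np

def numerical_rank(singular_values, tol):
    """Count the singular values strictly larger than the zero tolerance."""
    s = np.asarray(singular_values, dtype=float)
    return int(np.sum(s > tol))

# Orders of magnitude from the illustration (sigma_1 scaled to 1).
sigma = [1e0, 1e-1, 1e-1, 1e-2, 1e-4, 1e-8, 1e-9, 1e-10, 0.0, 0.0]

exact_rank = numerical_rank(sigma, tol=0.0)   # every non-zero value counts
data_rank = numerical_rank(sigma, tol=1e-6)   # data trusted to 6 digits only

# A spectrum without a gap: the answer simply follows the tolerance.
no_gap = [float(f"1e-{k}") for k in range(10)]
ranks_no_gap = {tol: numerical_rank(no_gap, tol) for tol in (5e-4, 5e-7, 5e-10)}
```

With the first spectrum the two tolerances give ranks 8 and 5, as in the text. With the gap-free spectrum each choice of tolerance yields a different rank, which is exactly the difficulty being described.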

We close this section with a few examples of situations involving the S.V.D. We just point out the idea, leaving the details to the reader.

(a) Least Squares (Lawson & Hanson, 1974)

This was the classical origin of the S.V.D. It is useful to solve problem (7) in cases where $A$ is rank deficient. In such cases, we cannot use the QR decomposition because the matrix $R$ in (8) is singular, and hence we cannot solve (9). If we resort instead to the S.V.D. of $A$, we have, in view of (20a),

$$\|Ax - b\| = \|U \Sigma V^* x - b\| = \|\Sigma V^* x - U^* b\| = \|\Sigma y - c\|,$$

where $y = V^*x$ and $c = U^*b$. The result is that we have converted the original problem to a diagonal problem involving $\Sigma$. We partition the above as in (29) to obtain

$$\left\| \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} - \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|.$$

We minimize this norm by setting $y_1 = \Sigma_1^{-1} c_1$. In the solution, we find that $y_2$ is free!
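A minimal sketch of the least squares recipe above, assuming Python with NumPy. The function name `svd_least_squares`, the relative tolerance, and the choice $y_2 = 0$ (which picks the minimum-norm solution among the free ones) are illustrative assumptions, not part of the text.

```python
import numpy as np

def svd_least_squares(A, b, tol=1e-10):
    """Minimise ||Ax - b|| via the S.V.D.; the free part y2 is set to zero."""
    U, s, Vh = np.linalg.svd(A, full_matrices=True)
    r = int(np.sum(s > tol * s[0]))      # numerical rank relative to sigma_1
    c = U.conj().T @ b                   # c = U* b
    y = np.zeros(A.shape[1], dtype=c.dtype)
    y[:r] = c[:r] / s[:r]                # y1 = Sigma_1^{-1} c1, y2 = 0
    return Vh.conj().T @ y, r            # x = V y

# Rank-deficient example: both columns of A are identical.
A = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
b = np.array([2.0, 2.0, 1.0])
x, r = svd_least_squares(A, b)
```

Here the rank is 1, every $x$ with $x_1 + x_2 = 2$ minimizes the residual, and setting $y_2 = 0$ selects $x = (1, 1)$.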

(b) Pseudo Inverse

The pseudo inverse $A^+$ of $A$ (Lawson and Hanson, 1974) can be expressed as

$$A^+ = V \begin{bmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{bmatrix} U^*$$

where we have used the partitioning (29).

(c) Relation to A*A

We point out the relationship of the S.V.D. to the classical idea of eigenvalues. If $A = U \Sigma V^*$, then

$$A^*A = V \Sigma U^* U \Sigma V^* = V \Sigma^2 V^*, \qquad \Sigma^2 = \mathrm{diag}(\sigma_1^2, \dots, \sigma_n^2).$$

Hence, the singular values $\sigma_i$ are just the positive square roots of the eigenvalues of $A^*A$, and the columns of $V$ are the eigenvectors of $A^*A$. In fact, for any $A$, one can carry this argument in reverse to prove the existence of the S.V.D., using the fact that $A^*A$ is symmetric positive semi-definite (Lawson and Hanson, 1974). However, for actual computation, as can be seen from example (a), it is almost always more accurate to compute the S.V.D. or the solution of the least squares problem (13) directly, without forming $A^*A$. In the case of a $2 \times 2$ matrix (or $2 \times n$ for any $n$), the results obtained when using $A^*A$ are often sufficiently accurate, and the computation is faster, especially when performing the computation by hand.
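The loss of accuracy incurred by forming $A^*A$ can be seen on a small example. The sketch below assumes Python with NumPy and double precision arithmetic. The test matrix is a hypothetical illustration (it is not taken from the text), chosen so that its exact singular values are $\sqrt{2 + \epsilon^2}$ and $\epsilon$.

```python
import numpy as np

eps = 1e-9
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

# Direct S.V.D.: singular values in decreasing order.
s_direct = np.linalg.svd(A, compute_uv=False)

# Via the eigenvalues of A*A (eigvalsh returns them in ascending order).
lam = np.linalg.eigvalsh(A.T @ A)
s_via_gram = np.sqrt(np.clip(lam[::-1], 0.0, None))
```

In double precision $1 + \epsilon^2$ rounds to 1, so the computed $A^*A$ is exactly singular and the small singular value $\epsilon$ is lost, whereas the direct S.V.D. recovers it.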

6. Applications of Previous Discussion to Linear Systems

We finally take a look at how the previous discussion on numerical stability applies to Linear Dynamic Systems. Consider either the continuous-time system

$$\dot{x}(t) = F x(t) + G u(t), \qquad y(t) = H x(t) \qquad (30)$$

or the discrete-time system

$$x(t+1) = F x(t) + G u(t), \qquad y(t) = H x(t) \qquad (31)$$

where $F$ is $n \times n$, $G$ is $n \times m$ and $H$ is $p \times n$. The so-called Markov parameters are defined as

$$w(i) = H F^{i-1} G, \qquad i = 1, 2, \dots$$

In discrete time they are the values of the pulse response, whereas in continuous time they are the values at the origin of the pulse response and its derivatives.

We shall consider two problems as particular examples from the point of view of numerical stability:

(i) Determining whether a given system is reachable or not.

(ii) Determining the system order $n$ from the external sequence $w(\cdot)$.

As for problem (i), we recall that (Kalman, Falb, Arbib, 1969) a system is said to be reachable if, for each state $x$ and each time point $t$, there exist a $T$ and a function $u(\cdot)$ defined over $[T, t)$ such as to carry $x(T) = 0$ into $x(t) = x$. Obviously, this depends upon matrices $F$, $G$ only. The problem of determining whether a given system is reachable or not has been extensively studied in the last 30 years, and many criteria have been developed. Examples of popular criteria for reachability for continuous-time systems are given in a theorem:
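To make the definition of the Markov parameters concrete, the following sketch (Python with NumPy assumed; the matrices and the helper `markov_parameters` are hypothetical) computes $w(i) = H F^{i-1} G$ and compares it with a simulated pulse response of the discrete-time system (31).

```python
import numpy as np

def markov_parameters(F, G, H, count):
    """Return [w(1), ..., w(count)] with w(i) = H F^(i-1) G."""
    params = []
    FkG = G.copy()                 # holds F^(i-1) G, starting with i = 1
    for _ in range(count):
        params.append(H @ FkG)
        FkG = F @ FkG
    return params

F = np.array([[0.5, 0.0], [0.0, 0.25]])
G = np.array([[1.0], [1.0]])
H = np.array([[1.0, 1.0]])
w = markov_parameters(F, G, H, 3)

# Pulse response of (31): x(0) = 0, u(0) = 1 and u(t) = 0 afterwards.
x = np.zeros((2, 1))
pulse = []
for t in range(4):
    pulse.append((H @ x).item())
    x = F @ x + G * (1.0 if t == 0 else 0.0)
```

For this single-input, single-output example, `pulse[1:]` reproduces `w(1), w(2), w(3)`.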

Theorem A

The system (30) is reachable if and only if any of the following equivalent conditions is true:

(C1) The $n \times (nm)$ matrix $\mathcal{C} = [G \;\; FG \;\; F^2G \;\; \dots \;\; F^{n-1}G]$ has rank $n$ (full rank).

(C2) The matrix $P(s) := [sI - F \;\; G]$ has rank $n$ for any complex number $s$ (PBH test).

(C3) There exists an $m \times n$ matrix $K$ such that the poles (eigenvalues) of $F + GK$ are all different from those of $F$ (state feedback).

(C4) (In case $F$ is stable, i.e. all the eigenvalues of $F$ have negative real part.) $W$ has rank $n$, where $W$ is the solution of the Lyapunov equation

$$FW + WF' = -GG' \qquad (32)$$

(Grammian condition).

(C5) There exists no pair $(\tilde{F}, \tilde{G})$, related to (30) by $\tilde{F} = TFT^{-1}$, $\tilde{G} = TG$, where $T$ is some nonsingular matrix, such that $\tilde{F}$, $\tilde{G}$ can be partitioned conformally as

$$\tilde{F} = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix}, \qquad \tilde{G} = \begin{bmatrix} G_1 \\ 0 \end{bmatrix} \qquad (33)$$

In (33), $F_{11}$ and $F_{22}$ are square matrices. This condition can then be read as: "If there is a pair $(\tilde{F}, \tilde{G})$ of the form (33), the system is not reachable". □
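Conditions (C1) and (C2) translate directly into rank computations. The sketch below assumes Python with NumPy, and the two small test pairs are hypothetical. The PBH test is evaluated only at the eigenvalues of $F$, since $P(s)$ can lose rank only there. The rank calls use NumPy's default tolerance, which is harmless on this well-conditioned example.

```python
import numpy as np

def reachability_matrix(F, G):
    """C = [G, FG, F^2 G, ..., F^(n-1) G] from condition (C1)."""
    n = F.shape[0]
    blocks = [G]
    for _ in range(n - 1):
        blocks.append(F @ blocks[-1])
    return np.hstack(blocks)

def pbh_ranks(F, G):
    """Rank of P(s) = [sI - F, G] at each eigenvalue s of F (condition (C2))."""
    n = F.shape[0]
    return [int(np.linalg.matrix_rank(np.hstack([s * np.eye(n) - F, G])))
            for s in np.linalg.eigvals(F)]

F = np.array([[1.0, 1.0], [0.0, 2.0]])
G_good = np.array([[0.0], [1.0]])   # input enters through the second state
G_bad = np.array([[1.0], [0.0]])    # the mode at s = 2 is never excited

rank_good = int(np.linalg.matrix_rank(reachability_matrix(F, G_good)))
rank_bad = int(np.linalg.matrix_rank(reachability_matrix(F, G_bad)))
```

Both tests agree: the first pair gives full rank everywhere, while the second loses rank in $\mathcal{C}$ and in $P(s)$ at $s = 2$.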

For discrete-time systems, we have an analogous theorem:

Theorem B

The system (31) is reachable if and only if any of the equivalent conditions (C1), (C2), (C3) or (C5) holds, or, equivalently, if and only if the discrete-time Grammian $W := \mathcal{C}\mathcal{C}'$ is non-singular. □

We discuss this theorem first from a numerical point of view by showing the limitations regarding (C1), (C2), (C3), (C4), and then giving some positive results for conditions (C1), (C3), (C5).

Many of the conditions in Theorems A and B depend on the computation of the rank of a matrix or sub-matrix. This can lead to problems when the rank problem becomes ill-posed. Frequently, a matrix or a submatrix will be rank-deficient, not exactly, but only to the working precision of our computer. In this case we might conclude that the starting system is "unreachable to working precision", or "almost unreachable". For example, if we take criterion (C1) and the example of Paige (1981):

$$F = \mathrm{diag}(1, 1/2, 1/2^2, \dots, 1/2^{n-1}), \qquad G = [1, 1, \dots, 1]' \qquad (34)$$

then by inspection it is apparent that the system defined by the pair (34) is reachable. However, if we form the matrix $\mathcal{C}$ for $n = 10$, we find that it has a smallest singular value $\sigma_{\min} = 10^{-12}$, so we would conclude that $\mathcal{C}$ has rank $\le n-1$, i.e. the system is not reachable. The exact rank of $\mathcal{C}$ is $n$ (all singular values are non-zero), but, to the working precision of most computers, $10^{-12}$ is considered zero.

A similar problem arises when considering the problem of determining the system order $n$ from the external sequence $w(\cdot)$.
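The ill-conditioning in Paige's example can be reproduced numerically. The following is a sketch in Python with NumPy (an assumption), with the pair (34) taken as $F = \mathrm{diag}(1, 1/2, \dots, 1/2^{n-1})$ and $G$ a column of ones. The helper names are hypothetical.

```python
import numpy as np

def paige_pair(n):
    """F = diag(1, 1/2, ..., 1/2^(n-1)) and G = (1, ..., 1)' as read from (34)."""
    F = np.diag(0.5 ** np.arange(n))
    G = np.ones((n, 1))
    return F, G

def smallest_singular_value_of_C(n):
    """sigma_min of the reachability matrix C = [G, FG, ..., F^(n-1) G]."""
    F, G = paige_pair(n)
    cols = [G]
    for _ in range(n - 1):
        cols.append(F @ cols[-1])
    C = np.hstack(cols)
    return np.linalg.svd(C, compute_uv=False)[-1]

sigma_min = {n: smallest_singular_value_of_C(n) for n in (5, 6, 7, 8, 10)}
```

The smallest singular value shrinks rapidly as $n$ grows, so a fixed zero tolerance eventually declares a reachable system unreachable.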

This problem is indeed the problem of finding the rank of a suitable matrix. Precisely, let

$$\mathcal{H}_r = \begin{bmatrix} w(1) & w(2) & \dots & w(r) \\ w(2) & w(3) & \dots & w(r+1) \\ \vdots & \vdots & & \vdots \\ w(r) & w(r+1) & \dots & w(2r-1) \end{bmatrix}$$

be the so-called $r$-dimension Hankel matrix. We recall that, given a system defined by the triple $(F, G, H)$, the system defined by $(F', H', G')$ is named the dual system, and that a system is said to be observable if its dual system is reachable (see Kalman, Falb, Arbib, 1969). Then, from (C1) of Theorems A and B it follows that the system is observable if and only if the matrix

$$\mathcal{O}' = [H' \;\; F'H' \;\; (F')^2H' \;\; \dots \;\; (F')^{n-1}H']$$

is full rank. It is easy to see that

$$\mathcal{H}_n = \mathcal{O}\,\mathcal{C}.$$

Therefore, if a system is both reachable and observable, rank $\mathcal{H}_n = n$. In fact, it can be shown that a system is reachable and observable if and only if

$$\mathrm{rank}\ \mathcal{H}_r = n, \qquad \forall r \ge n.$$
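The factorization $\mathcal{H}_n = \mathcal{O}\mathcal{C}$ and the rank property can be checked on a small example. This is a sketch under stated assumptions: Python with NumPy, a hypothetical reachable and observable triple, and NumPy's default rank tolerance (adequate here because the example is well conditioned).

```python
import numpy as np

def hankel_from_system(F, G, H, r):
    """Block Hankel matrix whose (i, j) block is w(i + j - 1) = H F^(i+j-2) G."""
    powers = [np.eye(F.shape[0])]
    for _ in range(2 * r - 2):
        powers.append(F @ powers[-1])
    return np.block([[H @ powers[i + j] @ G for j in range(r)] for i in range(r)])

F = np.array([[0.5, 1.0], [0.0, 0.25]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
n = F.shape[0]

C = np.hstack([np.linalg.matrix_power(F, k) @ G for k in range(n)])   # reachability
O = np.vstack([H @ np.linalg.matrix_power(F, k) for k in range(n)])   # observability
H_n = hankel_from_system(F, G, H, n)
H_big = hankel_from_system(F, G, H, n + 2)
```

The rank stays at $n = 2$ when the Hankel matrix is enlarged, which is the property used to read off the system order.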

This means that, in principle, the order of a reachable and observable system can be found from the pulse response, by determining the rank of a Hankel matrix of large dimension.

Now suppose that $F$ and $G$ are given by (34), and $H = G'$. In this case the controllability and observability matrices are equal up to transposition, $\mathcal{C} = \mathcal{O}'$, and hence the Hankel matrix $\mathcal{H}_n = \mathcal{C}'\mathcal{C}$ is symmetric and $\sigma_{\min}(\mathcal{H}_n) = (\sigma_{\min}(\mathcal{C}))^2$. We give in Table 1 the singular values for various values of $n$.

Table 1

n    sigma_min(C)     sigma_min(H_n)
5    8.8 x 10^-4      7.7 x 10^-7
6    4.8 x 10^-5      2.3 x 10^-9
7    1.4 x 10^-6      1.9 x 10^-12
8    2.1 x 10^-8      4.3 x 10^-16

Hence, depending on the precision of the computer being used, the matrix could be considered rank-deficient or not, even though the system is reachable.

The discussion above motivates the introduction of the concept of "almost reachability". With this objective in mind, we formally define the concept of distance of a reachable system to the nearest unreachable system (Paige, 1981), (Eising, 1984).

Definition

Given a reachable system (30), we say it has a distance $\mu(F,G)$ from an unreachable system if

(a) The pair $(\tilde{F}, \tilde{G}) := (F + \delta F, G + \delta G)$ is not reachable, with $\|[\delta F \;\; \delta G]\| = \mu$, and

(b) For any pair $(\tilde{F}, \tilde{G})$ with $\|[\tilde{F} - F \;\; \tilde{G} - G]\| < \mu$, the pair $(\tilde{F}, \tilde{G})$ is reachable. □

In other words, $\mu$ is the norm of the smallest perturbation $\delta F$, $\delta G$ to $F$ and $G$ of (30) that yields an unreachable system.

Miminis (1981) has found a more computational way to define $\mu$ in terms of (C2):

$$\mu = \min_{s \in \mathbb{C}} \sigma_n(P(s)) \qquad (35)$$

where $\sigma_n$ denotes the $n$-th (smallest) singular value of $P(s)$. We take the minimum over all complex numbers; in fact, in some cases the $s^*$ achieving the minimum is not real, as illustrated by the example

$$F = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \qquad G = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad (36)$$

for which it has been shown (Boley and Lu, 1984) that $\mu = .6614$, achieved when $s = \pm i\sqrt{15}/4$, and that the minimum is not achieved for any real $s$. Hence, in this case, the perturbation $\delta F$, $\delta G$ is not real.
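Formula (35) can be explored numerically for the example (36). The sketch below assumes Python with NumPy and reads (36) as $F = [0, -1; 1, 0]$, $G = [1; 0]$. The grid bounds and resolution are arbitrary illustrative choices, and a grid search is only a crude stand-in for a proper minimization.

```python
import numpy as np

def sigma_n(F, G, s):
    """Smallest singular value of P(s) = [sI - F, G]."""
    n = F.shape[0]
    P = np.hstack([s * np.eye(n) - F, G.astype(complex)])
    return np.linalg.svd(P, compute_uv=False)[-1]

F = np.array([[0.0, -1.0], [1.0, 0.0]])
G = np.array([[1.0], [0.0]])

# Coarse grid over a square of the complex plane (hypothetical search window).
grid = np.linspace(-2.0, 2.0, 161)
points = [complex(a, b) for a in grid for b in grid]
values = np.array([sigma_n(F, G, s) for s in points])
k = int(np.argmin(values))
mu_grid, s_star = values[k], points[k]

# Restricting s to the real axis gives a strictly larger minimum.
mu_real = min(sigma_n(F, G, complex(a, 0.0)) for a in grid)
mu_at_candidate = sigma_n(F, G, 1j * np.sqrt(15.0) / 4.0)
```

For this pair one can check by hand that $\sigma_n(P(a + ib))^2 = \tfrac{1}{2}\left(2a^2 + 2b^2 + 3 - \sqrt{1 + 16b^2}\right)$, which is minimized at $a = 0$, $b^2 = 15/16$ with value $7/16$, i.e. $\mu = \sqrt{7}/4 \approx .6614$, while the best real $s$ only gives 1.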

Paige (1981) points out the numerical problems one may encounter using (C1) - (C3). An example for (C1) we already mentioned. Using (C2), one may show that, if rank $P(s) < n$, then $s$ must be an eigenvalue of $F$. One then has a choice of limiting the search to just the eigenvalues of $F$, or doing it over the whole complex plane. The problem with the former is the fact that in some cases, one may be unable to compute the eigenvalues of $F$ to any accuracy (e.g. if $F$ is the matrix (15)) (Paige, 1981). In the latter case, the computation involved can become prohibitively expensive (Miminis, 1981). Eising (1982) has also described a way to compute $\mu$ in terms of an $n$-dimensional minimization problem, which can also be expensive.

In the case of (C3), we can encounter again the same problem as in (C2), in that the success depends critically on the eigenvalue problem in distinguishing the poles, which may be badly conditioned ((15) is an extreme example).

We point out that using the Grammian can lead to similar problems, because

(a) in the discrete time case, the Grammian $W$ is defined as $\mathcal{C}\mathcal{C}'$, which we have seen is a poor vehicle in this regard;

(b) in the continuous time case, the determination of reachability depends on the solution of the Lyapunov equation (32), which, among other things, becomes ill-conditioned when $F$ is almost unstable.

We can

say a few p o s i t i v e

with

regard

al.

(1979),

finds

to

(C5),

have

a unitary

such exists. the J o r d a n

reachable,

given

form,

In p a r t i c u l a r ,

the

produced

by the

by i n s p e c t i o n

principles With

regard

values

of C

of

in B o l e y

and u s i n g

(36)

(36)

and Lu

simple

to c o n d i t i o n

(36)

F21

that

(Cl)

and

does

o n e can imply

that if

not give

a good

estimate

(36).

back

~.

in the f o r m

a nearby

by g o i n g

Unun-

the d i s t a n c e

is a l r e a d y

since

1968).

but a l m o s t

~ is a c t u a l l y

algebraic

do n o t n e c e s s a r i l y

(1961

to 0 in

(1984)

(33)

et

to one u s e d to c o m p u t e

The best

is ~ 1

by s e t t i n g

algorithm")

to e s t i m a t e

algorithm".

w a y to see from

obtained

system

of all,

Van Dooren

the f o r m

is reachable,

algorithm"

the

"staircase

system can be o b t a i n e d

value

(30)

"staircase

for ~. In fact,

no o b v i o u s

similar

is no e a s y w a y

estimate

can o b t a i n

T exhibiting

is v e r y

First

(1981),

("staircase

Kublanovskaya

if the s y s t e m t h e n there

(CI)-(C5).

Paige

an a l g o r i t h m

algorithm

canonical

about

people,

transformation

This

fortunately,

things

several

one

unreachable

There .6614,

is a

to first

argument. show that almost

small

singular

unreachability

of

218 the

system

bound

(F,G),

but under

the d i s t a n c e

Theorem

If C

(Boley

has

~o+ell+

B in t e r m s

and

Lu,

singular

...

certain

+e I n n

of the

singular

one

values

can of

C

:

1984)

values

YI ~ "'" ~ Yn-1 ~ Yn > 0 a n d

is the

characteristic

polynomial

of F

with

= 1, t h e n

n

<

where

This

~

(I

+ max I ~ i l )

F is n x n,

theorem

not on

the

the

spread

Van

Dooren

that

the

of t h e

of the

find

smallest

that

how

shows useful

(¥n)

but

on

example,

C

has

show

of (37)

regard

Hence use.

to o b t a i n may

least

(C3) one

U of

on the o r d e r (38)

In

fact Boley

a nearbyreal physical

discussion

we c a n

direction,

around

also

give

this

of

/~ a n d

is n o t O ( e ) ,

the criterion

have more

(see e.g.

to

values

distance

some

which

one

results:

singular

the

to u s e

in at

0

(1981)).

something

that

two

that

is s t i l l

with

(38)

-1/2

a complex

Finally,

values

(yn/Yn_1) . In the

[ij20] [

can

system,

singular

values

(~) d e p e n d s

G =

modified,

than

to u n r e a c h a b l e

(1981):

O ( / ~ ) , (Van D o o r e n

showed

(37)

n-1

"distance

singular

,~"

. One

~(n

G n x m. D

says

size

y

F =

we

circumstances

but

(C1)

suitably

a n d Lu

(1984)

unreachable significance eq.

(36)).

a result

condition

that

can

give


Theorem (Boley and Lu, 1984)

Assume that the pair $(A, B)$ is reachable and that $\lambda_0$ is a simple eigenvalue of $A$. Then, for any sufficiently small $h > 0$, there exists a feedback matrix $K$ with $\|K\| \le h$ such that all the eigenvalues $\mu_1, \dots, \mu_n$ of the closed loop matrix $A + BK$ differ from $\lambda_0$ by at least $h\,\mu(A,B)$. □

What this theorem says is that if some eigenvalue of $A$ is moved by any state feedback $K$ no more than $\varepsilon\|K\|$, asymptotically as $\|K\|$ approaches zero, then $\mu < \varepsilon$. In other words, if some eigenvalue is hard to move by any small state feedback, then the system is almost unreachable. In this direction this is a positive result, but as mentioned above the converse is not true, since the poles may move less due to ill-conditioning of the eigenproblem rather than to the reachability property.

We have given in this section just a few examples of the limitations of some of the classical criteria for reachability from a numerical point of view, and some of the positive aspects to counterbalance these limitations.


References

Boley, D.L. and W.S. Lu, 1984: The Quasi Kalman Decomposition and State Feedback. American Control Conference, S. Diego.

Gantmacher, F., 1959: Theory of Matrices I & II. Chelsea (New York).

Eising, R., 1982: "The Distance Between a System and the Set of Uncontrollable Systems". Memo COSOR 82-19, Eindhoven Univ. of Technology.

Eising, R., 1984: "Between Controllable and Uncontrollable". Systems & Control Letters, Vol. 4, no. 5, pp. 263-264, July 1984.

Klema, V.C. and A.J. Laub, 1980: The Singular Value Decomposition: its Computation and Some Applications. IEEE Trans. Automatic Control, Vol. AC-25, no. 2, pp. 164-167.

Kalman, R.E., P. Falb and M.A. Arbib, 1969: Topics in Mathematical System Theory. McGraw-Hill.

Kublanovskaya, V.N., 1961: On Some Algorithms for the Solution of the Complete Eigenvalue Problem. Zh. Vych. Mat., Vol. 1, pp. 555-570.

Kublanovskaya, V.N., 1968: On a Method for Solving the Complete Eigenvalue Problem for a Degenerate Matrix. USSR Computational Math. and Math. Phys., Vol. 6, pp. 1-14.

Lawson, C. and R. Hanson, 1974: Solving Linear Least Squares Problems. Prentice-Hall.

Miminis, G., 1981: Numerical Algorithms for Controllability and Eigenvalue Allocation. M. Sc. Thesis, McGill University.

Paige, C.C., 1981: Properties of Numerical Algorithms Related to Computing Controllability. IEEE Trans. Automatic Control, Vol. AC-26, no. 1, pp. 130-138.

Smith, B.T., et al., 1976: Matrix Eigensystem Routines - EISPACK Guide. Lecture Notes in Computer Science 6, Springer-Verlag (Berlin).

Stewart, G.W., 1973: Introduction to Matrix Computations. Academic Press.

Van Dooren, P., A. Emami-Naeini and L. Silverman, 1979: Stable Extraction of the Kronecker Structure of Pencils. Proc. 17th IEEE Conference on Decision and Control, pp. 521-524.

Van Dooren, P., 1979: The Computation of Kronecker's Canonical Form of a Singular Pencil. Linear Algebra and Applications, Vol. 27, pp. 103-141.

Van Dooren, P., 1981: The Generalized Eigenstructure in Linear Systems Theory. IEEE Trans. Automatic Control, Vol. AC-26, no. 1, pp. 111-130.

Wilkinson, J.H., 1965: The Algebraic Eigenvalue Problem. Clarendon Press (Oxford).

Wilkinson, J.H. and C. Reinsch, 1971: Linear Algebra - Handbook for Automatic Computation. Vol. 2, Springer-Verlag (Berlin).

Chapter 7

SOME RECENT DEVELOPMENTS IN ECONOMETRICS

Michael McALEER and Manfred DEISTLER

1. INTRODUCTION

Econometrics, in a wide sense, is concerned with the application of statistical or mathematical methods to the analysis of economic phenomena. In this sense, econometrics may be thought of as consisting of the following four fields:

(i) Economic statistics: problems of definition of economic variables (such as in National Income and Product Accounts), problems of data collection, sampling and data construction, and problems of validating the data;

(ii) Econometrics in the narrow sense: econometric methods and econometric model building;

(iii) Economic theories: the use of mathematical formulations;

(iv) Econometric computing: data bank systems for economic data and computer programs, and interactive computing systems for data transformation, estimation and test procedures, graphical displays, calculation of solutions, and computer simulation.

We will be mainly concerned with econometric methods here. Econometrics was born in the 1930's, from the evolving (Keynesian) business cycle theories and the first national accounts, and was especially advanced by the statistical methods developed by the Cowles Commission.

*

The authors are grateful to Dr. A. Pagan (Canberra) for valuable comments.

The first econometric models for national economies were due to Tinbergen and Klein, and were built in the forties. In the sixties, for almost every industrialized country, macroeconometric models had been established. These models, and especially forecasts based on these models, behaved fairly well in the periods of steady economic growth, but showed a relatively poor performance in trying to cope with the economic problems arising from the seventies. This poor performance of econometric models had great implications for the standing of econometrics, but the attempt to overcome these difficulties has been one of the main driving forces for the development of current econometrics.

In analyzing these problems, it was found that many of the "a priori" assumptions which had been used in traditional econometric model building, such as those concerning the classification of variables as endogenous and exogenous, the functional form of the relation between the variables, the dynamic specification of the model, and the correlation structure of the errors, could hardly be justified on the basis of real economic a priori knowledge, and that these assumptions had been imposed primarily for statistical convenience. Moreover, the fact that often, by using different a priori assumptions, different conclusions from the same or from similar data sets could be derived, showed that econometrics was far from obtaining objective results from data. The consequence was a critical re-examination of traditional methods and of the assumptions justifying them, and the development of more appropriate methods and research strategies that were more closely related to the actual problems arising in economics.

A criticism that has frequently been raised is that there is a great discrepancy between the process of actually drawing conclusions from economic data, and inference, as described by the decision theoretically-oriented mathematical theory of statistics (see e.g. Leamer (1978)).

In many applications the situation is far too complicated to express a statistical decision with one formula. Learning from data may consist of several steps where subjective decisions cannot be excluded at each stage. This was paralleled by the development of exploratory data analysis, as reported in the seminal work of Mosteller and Tukey (1977) in (general) statistics.

Special emphasis has recently been directed in econometrics towards developing methods for checking the model specification from the data, and on data-oriented specification search procedures. In particular, a great number of tests and diagnostic checks have been developed in the last fifteen years to detect misspecification of different kinds (see Pagan and Hall (1983) for a useful discussion of many of these developments). Information criteria have also been developed and used to determine automatically the dynamic specification of various lags of models. A further development concerns the performance of estimators or tests if the data generating process is not described in the model class, and this area of (potentially) misspecified models has recently been investigated by Kent (1982) and White (1982). Two additional areas of current research interest are the robustness of estimators and tests to departures from the assumptions made in using models, as well as the sensitivity of inferences drawn to changes in the assumptions and differences in a priori information.

Until the late sixties, a good part of econometric model-building activity was concerned with macroeconomic modelling and forecasting.

Since then, there has been an increasing number of econometric investigations on a far less aggregated level which has led to new models and methods. Moreover, owing to the increased quantity and improved quality of data available, applications of more data-consuming techniques have increased. These "microeconometric" methods are definitely among the most important developments in econometrics today, and we will describe them briefly in this paper.

The question of appropriate macroeconomic modelling and forecasting is still an unresolved issue, after the difficulties encountered in traditional structural model building. Several proposals have been made to overcome these difficulties associated with the traditional approach, and we will describe some of the most important developments in this paper.

The paper is organized as follows. Section 2 is concerned with the specification and quality control of models, and the related issue of specification searches. Macroeconomic modelling and forecasting is discussed in Section 3, and some examples of modern "microeconometric" models are given in Section 4. Needless to say, our account is far from complete and a number of important topics have been omitted. In particular, the (relatively) inexpensive role of computers in econometrics, such as for conducting sophisticated Monte Carlo experiments for comparing different estimators and different test statistics, and for bootstrapping the small sample distributions of estimators and test statistics, is not discussed although they will play a very important role in the development of the discipline in the decades to come.

2. SPECIFICATION AND QUALITY CONTROL OF A MODEL

Differences in formulating a model have been described by McAleer and Pesaran (1986) as follows: differences in theoretical paradigms, differences in the way that auxiliary assumptions within a paradigm are specified, or different strategies that might be adopted in the process of model construction.

By a model specification is meant the set of all assumptions which define the model class, and hence also the parameter space for the inference procedure. Economic theory, e.g., often suggests the variables in a relationship, but not the appropriate functional form or the direct links between the various parts of a system. For these reasons, a data-oriented specification search procedure is warranted, where by specification search is meant the set of procedures followed in moving from an initial model specification to a final model class.

Two matters which arise when there are conflicting views regarding assumptions are the justification of the assumptions from the data and the effect of altering any of them on the properties of inference procedures. The former issue has to do, e.g., with hypothesis testing and diagnostic evaluation, whereas the latter is concerned with robustness of inference procedures to changes in the underlying assumptions.

There are several ways of conducting a specification search, and they may be given as follows:

(i) Data analytic methods: these procedures are based on recognizing patterns in data, as well as in their transformations, and rely heavily on subjective decisions. A well known example of this approach is the method advocated by Box and Jenkins (1970).

(ii) Information criteria, or criteria which provide a trade-off between goodness-of-fit and the number of parameters used to obtain this fit for different model specifications, that is, for different candidates in the specification search.

(iii) Testing procedures: this would appear to be the most common specification search procedure in econometrics. This is, in fact, the third of five stages used in the quality control of a model, as outlined by McAleer, Pagan and Volker (1985). The other stages are given as: checking consistency with economic theory; economic and statistical considerations, such as signs and magnitudes of estimated coefficients, as well as statistical significance; sensitivity analysis; and reconciliation of empirical findings with the results obtained from previous research using alternative non-nested models.
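The trade-off described under (ii) can be illustrated with a short sketch. The text names no particular criterion; the Akaike-type form used below, the candidate labels and the residual sums of squares are all hypothetical choices made for illustration.

```python
import math

def aic(rss, n_obs, n_params):
    """Akaike-type criterion in one common form: n ln(RSS / n) + 2 k."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

# Hypothetical candidates: (label, residual sum of squares, number of parameters).
candidates = [("1 lag", 12.0, 2), ("2 lags", 10.0, 3), ("3 lags", 9.95, 4)]
n_obs = 100

scores = {label: aic(rss, n_obs, k) for label, rss, k in candidates}
chosen = min(scores, key=scores.get)   # smallest criterion value wins
```

Here the third lag lowers the residual sum of squares only marginally, so the penalty on the extra parameter outweighs the gain and the two-lag candidate is selected.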

2.1 Model Specification

In what follows, we will concentrate on the role of diagnostic checking within the framework of the multiple linear regression model

$$y = X\beta + u, \qquad u \sim N(0, \sigma^2 I),$$

which may be written for the $t$'th observation as

$$y_t = x_t'\beta + u_t, \qquad u_t \sim NID(0, \sigma^2) \qquad (2.1)$$

where $y_t$ is the dependent variable, $x_t'$ is the $t$'th row of the $T \times k$ observation matrix $X$ comprising $T$ observations on $k$ regressors, and $u_t$ is the random error that is assumed to be normally distributed with zero mean, identically and independently distributed for all observations $t = 1, 2, \dots, T$, and uncorrelated with $X$.

It should be noted that virtually all of the assumptions made in the context of the model given above are, in fact, testable. The properties of the error term, namely, zero mean, serial independence, homoscedasticity and normality, are testable using what are by now standard procedures. Assumptions regarding linearity of the model, a correctly specified set of explanatory variables, and constancy of the coefficients are also all testable. Finally, the informational content of the model given in equation (2.1) may be reconciled with the empirical findings of previous research by recourse to recently developed non-nested testing procedures.

2.2 Tight and Loose Specifications

Before proceeding, it will be necessary to discuss briefly two alternative approaches to specification searches, namely, tight and loose specifications, with corresponding tests of misspecification and specification.

In a tight specification, a very small model set is specified as a first step, and then a series of diagnostic checks is used to indicate ways in which it may be respecified by enlarging the model set. Diagnostic checks are primarily tests of misspecification since only the null hypothesis needs to be specified in advance of performing the test. Consider the following examples: testing for serial independence of errors against either AR(p) or MA(p) alternatives can result in the same test statistic (Godfrey (1978)); testing for homoscedasticity of the errors against either multiplicative or additive heteroscedasticity as alternatives can also result in the same test statistic (see Bera and McKenzie (1986)).

Bearing the caveats given above in mind, rejection of the null hypothesis using diagnostic checks may suggest where to look to improve the model specification (see Table 1).

The scheme given in Table 1 should be used regardless of whether or not a tight specification is used. However, the more tight the specification, the more likely it is that instances will be found where the diagnostic checks lead to rejection of the null hypothesis.

If there are many observations available, it may be useful to commence with a very large model set and to test restrictions on that loose specification. In situations such as this, in which both the null and alternative hypotheses are specified, a test of specification is being considered. An important approach to consider in testing restrictions on a loosely specified model class using time series data is that of uniquely ordered hypotheses (see Anderson (1971)). In this approach, if any hypothesis is rejected, any succeeding hypothesis is also rejected and need not be tested. It is advisable to start with a (fairly) general hypothesis (i.e. the maintained hypothesis) and to test hypotheses in increasing order of restrictiveness until a rejection occurs, or the most restrictive hypothesis is accepted (i.e. is not rejected). The accepted hypothesis is the one prior to rejection. An interesting and useful application of uniquely ordered hypotheses in econometrics is that of testing for common factors (see Sargan (1980) for theoretical considerations, and Hendry and Mizon (1978) for an illustration).

TABLE 1

Using Diagnostic Checks to Test for Possible Model Misspecification

Diagnostic check           Possible sources of error

Serial correlation         Correlated errors
                           Omitted variables
                           Incorrect functional form
                           Incorrect transformation of variables

Heteroscedasticity         Non-constant variances
                           Incorrect functional form
                           Incorrectly transformed dependent variable

Exogeneity                 Measurement errors
                           Omitted links with larger system

Functional form            Omitted variables
                           Incorrect transformations on variables
                           Incorrect functional form

Parameter constancy        Structural change
                           Varying coefficients
                           Weak forecasting ability

Non-nested alternatives    Incorrect model
                           Alternative explanations possible

2.3 Principles for Testing

Returning now to the model given in equation (2.1), let us denote the ordinary least squares (OLS) estimators of $\beta$ and $\sigma^2$ as $\hat\beta = (X'X)^{-1}X'y$ and $\hat\sigma^2 = (y - X\hat\beta)'(y - X\hat\beta)/(T-k)$. The OLS residuals are given as $\hat u = y - X\hat\beta$, with t'th component given by $\hat u_t = y_t - x_t'\hat\beta$.
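The OLS quantities defined above can be computed as in the following minimal sketch. It assumes NumPy is available; the function name `ols` and the simulated data are illustrative only, and a least-squares solver is used instead of forming $(X'X)^{-1}$ explicitly.

```python
import numpy as np


def ols(y, X):
    """Return (beta_hat, sigma2_hat, residuals) for the model y = X beta + u."""
    T, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves min ||y - X b||^2
    resid = y - X @ beta_hat                          # u_hat = y - X beta_hat
    sigma2_hat = resid @ resid / (T - k)              # u_hat'u_hat / (T - k)
    return beta_hat, sigma2_hat, resid


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)
beta_hat, sigma2_hat, resid = ols(y, X)
```

By construction the residuals are orthogonal to the columns of X, which is a convenient check on any implementation.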

Tests may be constructed using the following Principles: the Likelihood Ratio, Wald and Lagrange Multiplier (or Score) Principles (see Engle (1983) for a discussion); the Cox (1961, 1962) Principle for non-nested (or separate) families of hypotheses (see McAleer and Pesaran (1986)); and the test procedures based on the work of Durbin (1954) and Hausman (1978) (see Ruud (1984) for further details).

The first three Principles lead to tests which are asymptotically equivalent under the null hypothesis as well as under local alternatives, and the likelihood ratio test is the only one of the three which requires estimation under both the null and the alternative hypothesis. The Lagrange multiplier (LM) test is extremely straightforward to use computationally, as it can frequently be computed as $TR^2$, that is, the sample size times the coefficient of multiple determination from an auxiliary linear regression. Several examples will be given below to illustrate the simplicity of the LM procedure.

The Cox Principle is a general approach that may be used for testing non-nested hypotheses, a special case of which is that of nested hypotheses. It essentially involves centring any given test statistic under the null hypothesis, and then deriving its asymptotic null distribution. This procedure can be applied to the likelihood ratio statistic itself, or to residual sums of squares from different regression models, or to differences in some or all of the parameter estimates of alternative models. The Hausman test procedure (Hausman (1978)) can be considered to be an application of the Cox Principle. This procedure is based on the difference between two estimators, one of which is efficient under the null, but not even consistent under the alternative hypothesis, whereas the other is consistent regardless of whether the null or alternative hypothesis is true.
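As a sketch of how the difference between two estimators is commonly turned into a test statistic, the notation below is assumed for illustration and does not appear in the text: $\hat\beta_0$ is the estimator that is efficient under the null, $\hat\beta_1$ is the estimator that is consistent under both hypotheses, and $\hat V_0$, $\hat V_1$ are their estimated covariance matrices.

```latex
\hat q = \hat\beta_1 - \hat\beta_0 ,
\qquad
H = \hat q' \left[ \hat V_1 - \hat V_0 \right]^{-1} \hat q
```

Under the null hypothesis, and provided $\hat V_1 - \hat V_0$ is nonsingular, $H$ is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the dimension of $\hat q$; large values point towards the alternative.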

2.4 Diagnostic Testing

Throughout this section it will be presumed that a regression package is available for providing OLS estimates and for storing the predictions and residuals from OLS estimation. Unless stated otherwise, all test statistics discussed below may be obtained from OLS regressions based on simple auxiliary equations. Emphasis is placed on non-structural models, and the interested reader is referred to the review by Pagan and Hall (1983) for a detailed discussion of extensions to structural models. Since the following discussion is necessarily limited in scope, the broader treatment provided by Pagan (1984) is also highly recommended.

2.4.1 Serial correlation

Serial correlation of the error term $u_t$ can lead to inefficient estimators and predictions, and to inconsistent estimates of $\beta$ if a lagged dependent variable is in the set of regressors. Perhaps the most useful tests for serial independence against AR(p) or MA(p) alternatives have been developed by Durbin (1970); see also Breusch (1978) and Godfrey (1978). If $u_t$ follows an AR(p) process, then

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \dots + \rho_p u_{t-p} + e_t,$$

where $e_t$ is white noise, and the regression model is given by

$$y_t = x_t'\beta + \rho_1 u_{t-1} + \rho_2 u_{t-2} + \dots + \rho_p u_{t-p} + e_t.$$

The LM test of the null hypothesis $H_0: \rho_1 = \rho_2 = \dots = \rho_p = 0$ is obtained by replacing $u_{t-j}$ $(j = 1, 2, \dots, p)$ with $\hat u_{t-j}$, the lagged values of the OLS residuals. The LM test statistic is calculated as $TR^2$ from the auxiliary regression of the OLS residuals on the regressors and the lagged residuals,

$$\hat u_t = x_t'\delta + \rho_1 \hat u_{t-1} + \rho_2 \hat u_{t-2} + \dots + \rho_p \hat u_{t-p} + \varepsilon_t,$$

with $TR^2$ distributed asymptotically as $\chi^2(p)$ under the null hypothesis. The LM test for serial independence against an MA(p) process is given by the same auxiliary regression. Setting $p = 1$ gives an asymptotically equivalent test to Durbin's (1970) h-statistic.
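The $TR^2$ calculation for serial correlation can be sketched as follows. This is a minimal illustration assuming NumPy; the function names and simulated data are hypothetical, the OLS residuals are used as the dependent variable of the auxiliary regression, and pre-sample residuals are set to zero, which is one common convention rather than something prescribed by the text.

```python
import numpy as np


def r_squared(y, X):
    """Centred R^2 from an OLS regression of y on X (X includes a constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)


def lm_serial_correlation(y, X, p):
    """T * R^2 from regressing OLS residuals on X and p lagged residuals."""
    T = len(y)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    # Column j holds u lagged j periods; missing pre-sample values are zeros.
    lags = np.column_stack(
        [np.concatenate([np.zeros(j), u[:-j]]) for j in range(1, p + 1)]
    )
    return T * r_squared(u, np.hstack([X, lags]))


# Simulated regression with AR(1) errors (rho = 0.8), for illustration only.
rng = np.random.default_rng(1)
T = 500
x = rng.normal(size=T)
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.8 * u[t - 1] + e[t]
X = np.column_stack([np.ones(T), x])
y = X @ np.array([1.0, 0.5]) + u
lm_stat = lm_serial_correlation(y, X, p=1)
```

The statistic would be compared with a $\chi^2(p)$ critical value, for example 3.84 at the 5% level when $p = 1$.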

2.4.2 Heteroscedasticity

When the variance of the error term is not constant but varies with each observation, we might think of it as having the relation

$$\sigma_t^2 = \sigma^2 + z_t'\gamma,$$

where $z_t$ is given. The LM test statistic is calculated by regressing the squared OLS residuals, $\hat u_t^2$, on a constant and a vector of variables. For example, if $z_t$ is given as the scalar $E(y_t)$ or $\ln E(y_t)$, the LM statistic is obtained as $TR^2$ from the auxiliary regression

$$\hat u_t^2 = \alpha + \gamma \hat y_t + v_t
\qquad \text{or} \qquad
\hat u_t^2 = \alpha + \gamma \ln \hat y_t + v_t,$$

where $v_t$ is the equation error. In cases where the explicit form of heteroscedasticity might be suspected, this could be incorporated into the construction of the vector $z_t$.
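A minimal sketch of this auxiliary regression is given below, assuming NumPy. The function name, the `use_log` switch and the simulated data are hypothetical; the logarithmic variant additionally assumes that all fitted values are positive.

```python
import numpy as np


def lm_heteroscedasticity(y, X, use_log=False):
    """T * R^2 from regressing squared OLS residuals on a constant and y_hat
    (or ln y_hat when use_log is True; fitted values must then be positive)."""
    T = len(y)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    u2 = (y - y_hat) ** 2
    z = np.log(y_hat) if use_log else y_hat
    Z = np.column_stack([np.ones(T), z])
    g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    v = u2 - Z @ g
    r2 = 1.0 - (v @ v) / np.sum((u2 - u2.mean()) ** 2)
    return T * r2


# Simulated data whose error standard deviation grows with x, illustration only.
rng = np.random.default_rng(2)
T = 500
x = rng.uniform(1.0, 5.0, size=T)
y = 1.0 + 2.0 * x + x * rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
lm_level = lm_heteroscedasticity(y, X)
lm_log = lm_heteroscedasticity(y, X, use_log=True)
```

With a single variable in the auxiliary regression, each statistic would be referred to a $\chi^2(1)$ distribution.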

2.4.3 Exogeneity

In the model $y = X\beta + u$, lack of exogeneity of $X$, either through measurement errors or because some elements of $X$ are determined within a larger system, may lead to inconsistent estimators of $\beta$. The Hausman test for exogeneity can be applied straightforwardly, as discussed in Section 2.3. A computationally convenient method for calculating the test is to use a set of instrumental variables $W$ and to obtain the predictions $\hat X = W(W'W)^{-1}W'X$ from regressing the columns of $X$ on those of $W$. The auxiliary regression

$$y = X\beta + \hat X\theta + u$$

is estimated to test the null hypothesis of exogeneity $H_0: \theta = 0$, where the F statistic is asymptotically valid for testing $H_0$ (see also Durbin (1954)).
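The variable-addition form of the exogeneity test can be sketched as follows, assuming NumPy. All names and the simulated design are hypothetical. Only the fitted values of the suspect regressor are added, on the assumption that regressors which also appear among the instruments would simply reproduce themselves and cause perfect collinearity.

```python
import numpy as np


def rss(y, M):
    """Residual sum of squares from an OLS regression of y on M."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    r = y - M @ b
    return r @ r


def added_variable_f(y, X, X_add):
    """F statistic for the joint significance of X_add when added to X."""
    T = len(y)
    full = np.hstack([X, X_add])
    q = X_add.shape[1]
    rss0, rss1 = rss(y, X), rss(y, full)
    return ((rss0 - rss1) / q) / (rss1 / (T - full.shape[1]))


# Simulated data in which x is correlated with the error u, illustration only.
rng = np.random.default_rng(3)
T = 500
w = rng.normal(size=T)            # instrument
v = rng.normal(size=T)
x = w + v                         # regressor sharing the shock v with u
u = 0.8 * v + rng.normal(size=T)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(T), x])
W = np.column_stack([np.ones(T), w])
coef, *_ = np.linalg.lstsq(W, x, rcond=None)
x_hat = (W @ coef).reshape(-1, 1)  # predictions of x from the instruments
f_stat = added_variable_f(y, X, x_hat)
```

A large F statistic, as in this constructed example, indicates rejection of the null hypothesis of exogeneity.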

2.4.4 Functional form

The most straightforward diagnostic check for omitted variables and/or incorrect functional form is undoubtedly Ramsey's (1969) regression specification error test (RESET). This involves adding powers of the predictions from the null model to the system and testing for the presence of the additional factors. The augmented regression is, for example,

$$y_t = x_t'\beta + \gamma_1 \hat y_t^2 + \gamma_2 \hat y_t^3 + u_t,$$

and the F test of $H_0: \gamma_1 = \gamma_2 = 0$ is distributed exactly as $F(2, T-k-2)$ under $H_0$ if the regressors $x_t$ are exogenous.

Several additional tests for incorrect functional form are available, and some of these methods have been based on the data transformations suggested by Box and Cox (1964). In particular, Andrews (1971) has derived a linearized version of the Box-Cox model which does not require estimation of the Box-Cox model itself, but only the specialization of it that is being tested. Godfrey and Wickens (1981) have derived LM tests of both linear and log-linear specifications against the more general Box-Cox model that are calculated as $TR^2$ from auxiliary regressions. For a survey of alternative tests of linear and log-linear models, as well as a discussion of their small sample properties, see McAleer (1985, Section 6).
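The RESET calculation described in this subsection can be sketched as below, assuming NumPy. The function name and simulated data are hypothetical, and the use of the second and third powers of the fitted values is one common choice of augmentation.

```python
import numpy as np


def reset_f(y, X):
    """RESET F statistic: add y_hat^2 and y_hat^3 to X and test them jointly."""
    T, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    rss0 = np.sum((y - y_hat) ** 2)
    XA = np.column_stack([X, y_hat ** 2, y_hat ** 3])
    ba, *_ = np.linalg.lstsq(XA, y, rcond=None)
    rss1 = np.sum((y - XA @ ba) ** 2)
    return ((rss0 - rss1) / 2) / (rss1 / (T - k - 2))


# A linear model fitted to data with an omitted quadratic term, illustration only.
rng = np.random.default_rng(4)
T = 300
x = rng.normal(size=T)
y = 1.0 + x + x ** 2 + 0.5 * rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
f_reset = reset_f(y, X)
```

The statistic would be compared with an $F(2, T-k-2)$ critical value.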

2.4.5 Parameter constancy

Perhaps the most well known test for parameter constancy in econometrics is the Chow test, in which the null hypothesis of constancy of parameters in the linear regression model is tested against the alternative of a change in coefficients at a known point in time. The cumulative sum and cumulative sum of squares tests of Brown, Durbin and Evans (1975), which are based on recursive residuals, are available when a broader class of parameter non-constancies is considered as an alternative. The constancy of parameters may also be checked by testing for predictive ability based upon post-sample observations (see Salkever (1976) for a very simple test for parameter constancy). A useful review of this literature is given in Pesaran, Smith and Yeo (1985).
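As an illustration of the Chow test with a known break point, the sketch below compares the pooled residual sum of squares with the sum from the two sub-samples. It assumes NumPy and that both sub-samples contain more than k observations; the function names and simulated data are hypothetical.

```python
import numpy as np


def rss(y, M):
    """Residual sum of squares from an OLS regression of y on M."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    r = y - M @ b
    return r @ r


def chow_f(y, X, split):
    """Chow F statistic for a change in all k coefficients at index `split`."""
    T, k = X.shape
    rss_pooled = rss(y, X)
    rss_parts = rss(y[:split], X[:split]) + rss(y[split:], X[split:])
    return ((rss_pooled - rss_parts) / k) / (rss_parts / (T - 2 * k))


# Simulated data with a slope change halfway through the sample, illustration only.
rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
slope = np.where(np.arange(T) < T // 2, 1.0, 3.0)
y = 1.0 + slope * x + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
f_chow = chow_f(y, X, split=T // 2)
```

Under the null hypothesis of constant parameters the statistic is referred to an $F(k, T-2k)$ distribution.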

2.4.6 Non-nested alternatives

The result of applying diagnostic checks to various model specifications may lead to two or more models that are not rejected. When one model cannot be obtained from another by the imposition of restrictions, the models are said to be non-nested. The most well known test, namely the Cox test, was introduced into the econometric literature by Pesaran (1974). Let the null hypothesis be given by $H_0: y = X\beta + u$, and let the alternative be $H_1: y = Z\gamma + v$. The simplest test of $H_0$ against $H_1$ is the test of $H_0: \gamma = 0$ in the augmented regression

$$y = X\beta + Z\gamma + u,$$

and this has been justified in the literature as testing "parameters of interest", as an encompassing test, and as a test based on Roy's Union-Intersection Principle (see McAleer and Pesaran (1986) for details). Two other Cox-type tests are given in Davidson and MacKinnon (1981) and Fisher and McAleer (1981). All of these tests are compared with regard to small sample properties in McAleer (1985, Section 7).
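The augmented-regression test of $\gamma = 0$ can be sketched as an ordinary F test for added variables. The example assumes NumPy; the names and the simulated design are hypothetical, and only the columns of $Z$ that are not already in $X$ are added (here the constant is common to both models).

```python
import numpy as np


def rss(y, M):
    """Residual sum of squares from an OLS regression of y on M."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    r = y - M @ b
    return r @ r


def encompassing_f(y, X, Z_extra):
    """F statistic for gamma = 0 in y = X beta + Z_extra gamma + u."""
    T = len(y)
    full = np.hstack([X, Z_extra])
    q = Z_extra.shape[1]
    rss0, rss1 = rss(y, X), rss(y, full)
    return ((rss0 - rss1) / q) / (rss1 / (T - full.shape[1]))


# Data generated by the rival model (y depends on z, not x), illustration only.
rng = np.random.default_rng(7)
T = 300
z = rng.normal(size=T)
x = 0.6 * z + 0.8 * rng.normal(size=T)   # x is correlated with z
y = 1.0 + 2.0 * z + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])      # regressors of the null model H0
Z_extra = z.reshape(-1, 1)                # regressors of H1 not contained in X
f_enc = encompassing_f(y, X, Z_extra)
```

Reversing the roles of the two models gives the corresponding test of $H_1$ against $H_0$, so both, one or neither of the models may be rejected.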

3. MACROECONOMIC MODELLING AND FORECASTING

Macroeconomic modelling is concerned with modelling the dynamics of, and the interaction between, highly aggregated macroeconomic variables such as national income, consumption, investment, unemployment and prices; of special interest is the study of the business cycle, forecasting and policy simulation.

The traditional approach to macroeconomic modelling is structural model building, in which large-scale models comprising, e.g., a hundred or more equations are estimated from the data and used for economic analysis, forecasting and policy simulation. In view of the relatively large number of equations and the relatively small data sets involved, a great number of restrictions on the parameters have to be imposed to obtain reasonably low-dimensioned parameter spaces and reasonable parameter estimates. These restrictions are frequently in the form of zero restrictions, indicating that a certain variable does not influence some other variable in a certain equation of the system. Under the assumption of an a priori given specification, the theory of identifiability and maximum likelihood estimation of linear simultaneous equation systems (with uncorrelated errors) was developed by Koopmans, Rubin and Leipnik (1950); two-stage and three-stage least squares methods were subsequently developed in order to simplify calculations. However, in practical applications the most common estimation method was, and still is, ordinary least squares, despite its lack of consistency in the simultaneous equation framework.

In the period of steady economic growth, the traditional large scale models showed satisfactory forecasting behaviour. But with the increasing fluctuations in many economic variables after the oil shock of the seventies, many of these models showed rather poor forecasting performances. At this time the first macroeconomic forecasts were made with Box-Jenkins models and these simple univariate models often outperformed large econometric models, at least for short-term forecasts.

As has been said previously, these facts led to a widespread critique of the traditional model-building approach. One of the main reasons for the weakness of many forecasts was found in the poor specification of the respective models, where too much economic a priori information was presumed to be available. As a consequence, data-oriented specification search procedures (as described in Section 2) and new models have been proposed.

At present, different modelling philosophies, and hence different models, have been proposed, even for identical or similar data sets. Therefore, macroeconometric modelling is still far from lacking in controversy. There are still advocates of traditional structural model-building, especially if the main aim is an analysis of the interaction between variables and policy simulation, rather than (unconditional) forecasting; on the other hand, several different proposals for new modelling approaches have been made, and we will describe some of these below.

A method for obtaining the dynamic specification of a structural model is described in Zellner and Palm (1974) as follows.

Let

$$\sum_{i=0}^{p} A_i y_{t-i} = \sum_{i=0}^{q} B_i z_{t-i} + u_t \qquad (3.1)$$

be the structural model, where $A_i \in \mathbb{R}^{s \times s}$, $B_i \in \mathbb{R}^{s \times m}$ are the parameter matrices, and $y_t$, $z_t$ and $u_t$ are the endogenous, exogenous and white noise error variables, respectively. Equation (3.1) may be transformed by a left multiplication by the adjoint of $\sum A_i B^i$ (where $B$ is the backward-shift operator) to yield a system which is decoupled in the sense that only the i'th endogenous variable (including its lagged values) appears in its i'th equation. In this form, the equations can be treated as $s$ single equations and the Box-Jenkins method can be applied to obtain the dynamic specification for each of these single equations. The dynamic specification of the original model is then obtained from these specifications.
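The decoupling step can be written out as a short derivation. The notation $A(B) = \sum_{i=0}^{p} A_i B^i$ and $C(B) = \sum_{i=0}^{q} B_i B^i$ is introduced here only for illustration; the symbol $C(B)$ is an assumption made to avoid a clash between the shift operator $B$ and the matrices $B_i$.

```latex
A(B)\, y_t = C(B)\, z_t + u_t
\;\Longrightarrow\;
\det[A(B)]\, y_t
  = \operatorname{adj}[A(B)]\, C(B)\, z_t + \operatorname{adj}[A(B)]\, u_t ,
\qquad \text{since } \operatorname{adj}[A(B)]\, A(B) = \det[A(B)]\, I_s .
```

Because $\det[A(B)]$ is a scalar polynomial in $B$, the i'th row contains only the i'th endogenous variable and its lags on the left-hand side, with distributed lags of $z_t$ and a moving average of the errors on the right-hand side.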

Of course, one problem associated with this procedure is that the zero restrictions of the original model are not taken into account in the transformed single equations.

The classification of the observed variables as endogenous and exogenous often cannot be justified on a priori grounds; this is especially true if conflicting economic theories imply different classifications for the variables. This ambiguity has led to discussions concerning the concept of exogeneity and to tests for exogeneity. One concept of exogeneity is related to causality in the sense introduced by Granger (1969). In this analysis, variables are called exogenous if there is a unidirectional causal influence from the exogenous to the endogenous variables. Tests for causality have been proposed, e.g. by Sims (1972), and Pierce and Haugh (1977). The second concept of exogeneity was given in Engle, Hendry and Richard (1983). The defining property of exogeneity here is that conditioning the other observed variables on the exogenous variables gives no loss of information about the parameters of interest.

Another approach to overcome the classification problem discussed above is to provide a symmetric treatment to all observed variables, by describing them jointly as a vector autoregressive (VAR) process. This has been proposed e.g. in Sims (1980). Once the VAR system has been estimated, questions such as the classification of variables into exogenous and endogenous, or whether there are zero restrictions among the parameters, may be answered on an empirical basis (see Sargent and Sims (1977), and Hsiao (1982)). Although these vector autoregressions usually contain significantly fewer equations (about ten) than the usual structural models, both the dimension of the parameter space as well as the dynamic specification of the model remain problematic.

In order to reduce the dimensions of the parameter space, Sargent and Sims (1977) have proposed a dynamic principal component analysis where the dynamics in the observed variables are introduced by factors of lower dimension; this is very closely related to the idea that the business cycle in most macroeconomic variables can be explained by a few dynamic factors (Bowden (1972)). However, in determining the number of dynamic factors empirically, there seems to be no sharp delineation of principal components (Sims, private communication).

A Bayesian procedure for analyzing VAR models has been proposed in Doan, Litterman and Sims (1984), which seems to have surprisingly good forecasting properties. In this approach the prior means of all coefficients corresponding to lags greater than one are set equal to zero and the estimation problem is reduced to the estimation of relatively few "hyperparameters", such as the tightness of the prior means.

Another method of reducing the dimension of VAR systems is the use of the AIC or BIC criteria to determine both the maximum lags in the VAR model and zero restrictions on the coefficients corresponding to smaller lags.
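Choosing the maximum lag of a VAR by AIC can be sketched as follows. The example assumes NumPy and uses one common form of the criterion, $\ln\det\hat\Sigma_p + 2pn^2/N$, where the penalty counts the $pn^2$ lag coefficients and $N$ is the number of observations used; the function name and simulated data are hypothetical, and the selection of zero restrictions on individual coefficients is not covered.

```python
import numpy as np


def var_aic(Y, p, max_lag):
    """AIC of a VAR(p) with intercept, fitted by OLS on rows max_lag..T-1
    so that every candidate order uses the same sample."""
    T, n = Y.shape
    target = Y[max_lag:]
    n_obs = T - max_lag
    cols = [np.ones((n_obs, 1))]
    for j in range(1, p + 1):
        cols.append(Y[max_lag - j:T - j])       # Y lagged j periods
    Z = np.hstack(cols)
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    sigma = resid.T @ resid / n_obs             # residual covariance matrix
    _, logdet = np.linalg.slogdet(sigma)
    return logdet + 2.0 * p * n * n / n_obs


# Simulated bivariate VAR(1), for illustration only.
rng = np.random.default_rng(6)
T, n = 400, 2
A = np.array([[0.5, 0.1], [0.0, 0.4]])
Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + rng.normal(size=n)
max_lag = 4
aic = {p: var_aic(Y, p, max_lag) for p in range(max_lag + 1)}
best_p = min(aic, key=aic.get)
```

Replacing the penalty by $pn^2 \ln N / N$ gives the corresponding BIC-type criterion, which penalizes additional lags more heavily.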

For multivariate subset autoregressive modelling, see e.g. Penm and Terrell (1984). This method also performs well in forecasting.

A different approach is to concentrate on the modelling of one equation at a time (see Davidson et al. (1978) and Hendry (1986)), which implicitly assumes that the effects of simultaneity are negligible. This approach stresses that economic a priori knowledge is primarily concerned with the long-run equilibrium solutions of the system, whereas in many cases economic theory has very little to say about short-run behaviour. For this reason, equilibrium solutions of dynamic models should be consistent with economic theory.

4. MICROECONOMETRICS

During the last decade there has been a substantial development of econometric techniques to answer questions posed in empirical microeconomics. As an important example, consider the case where the variable being explained by the model either takes on only discrete values, or is limited in its range. Sample survey data frequently require such models to be used: for example, a binary-choice model might be used to explain the decision to buy a car or not, and a multiple-choice model might be used to explain whether a commuter travels by bus, train or car. The explanatory variables in each of these examples would be the personal and economic characteristics of various individuals.

The conditional probabilities of the outcomes of the discrete variable are related to various explanatory variables in the model. Owing to the nature of probabilities, the functional form of these relations must be restricted; in particular, linear relations are excluded. In binary models, where there is only one conditional probability to be explained, the most important class of models is of the form

$$P(E \mid x'\beta) = F(x'\beta),$$

where $E$ denotes outcome of one event and $F$ is a cumulative distribution function. The most frequently used functions are the normal and the logistic, leading to probit and logit models, respectively.
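The two choices of $F$ and the resulting log-likelihood for binary data can be sketched using only the Python standard library. The function names and the small data set are hypothetical and serve only to show the structure of the calculation.

```python
import math


def probit_cdf(index):
    """Standard normal distribution function evaluated at x'beta."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def logit_cdf(index):
    """Logistic distribution function evaluated at x'beta."""
    return 1.0 / (1.0 + math.exp(-index))


def binary_log_likelihood(beta, rows, outcomes, cdf):
    """Sum of log P(E) for outcomes equal to 1 and log(1 - P(E)) otherwise."""
    total = 0.0
    for x, d in zip(rows, outcomes):
        p = cdf(sum(b * xi for b, xi in zip(beta, x)))
        total += math.log(p) if d == 1 else math.log(1.0 - p)
    return total


# Hypothetical observations: a constant and one explanatory variable.
rows = [(1.0, 0.5), (1.0, -1.0), (1.0, 2.0)]
outcomes = [1, 0, 1]
ll_logit = binary_log_likelihood((0.0, 1.0), rows, outcomes, logit_cdf)
ll_probit = binary_log_likelihood((0.0, 1.0), rows, outcomes, probit_cdf)
```

Both functions map any real index into the unit interval, which is why they are admissible where a linear relation is not.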

The most well known model where the dependent variable is limited in its range is the Tobit model (Tobin (1958)). For example, many observations in a sample may take on the value zero (if, say, it is decided not to buy a car) while other individuals may spend varying amounts on cars. In this sense, the dependent variable is part qualitative and part quantitative. The standard method of estimation of these models is by maximum likelihood. For further details, see Amemiya (1981), Maddala (1983) and McFadden (1976).
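The likelihood that is maximized for a Tobit model can be sketched as below, using only the standard library. The sketch assumes the standard textbook form with normal errors and censoring at zero; the function names and test values are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_log_likelihood(beta, sigma, rows, y):
    """Log-likelihood of a Tobit model censored at zero with N(0, sigma^2) errors."""
    total = 0.0
    for x, yi in zip(rows, y):
        xb = sum(b * xi for b, xi in zip(beta, x))
        if yi <= 0.0:
            # Qualitative part: probability that the observation is censored.
            total += math.log(norm_cdf(-xb / sigma))
        else:
            # Quantitative part: normal density of the observed amount.
            z = (yi - xb) / sigma
            total += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    return total


ll_censored = tobit_log_likelihood((0.0,), 1.0, [(1.0,)], [0.0])
ll_observed = tobit_log_likelihood((1.0,), 1.0, [(1.0,)], [1.0])
```

The two branches correspond to the qualitative and quantitative parts of the dependent variable; in practice the function would be passed to a numerical optimizer.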

REFERENCES

Amemiya, T. (1981): Qualitative Response Models: A Survey. Journal of Economic Literature 19, 1483-1536.

Anderson, T.W. (1971): The Statistical Analysis of Time Series. Wiley, New York.

Andrews, D.F. (1971): A Note on the Selection of Data Transformations. Biometrika 58, 249-254.

Bera, A.K. and C.R. McKenzie (1986): Alternative Forms and Properties of the Score Test. Forthcoming in Journal of Applied Statistics.

Bowden, R.J. (1972): More Stochastic Properties of the Klein-Goldberger Model. Econometrica 40, 87-98.

Box, G.E.P. and D.R. Cox (1964): An Analysis of Transformations. Journal of the Royal Statistical Society B 26, 211-252.

Box, G.E.P. and G.M. Jenkins (1970): Time Series Analysis, Forecasting and Control. Holden Day, San Francisco.

Breusch, T.S. (1978): Testing for Autocorrelation in Dynamic Linear Models. Australian Economic Papers 17, 334-355.

Brown, R.L., J. Durbin and J.M. Evans (1975): Techniques for Testing the Constancy of Regression Relationships Over Time. Journal of the Royal Statistical Society B 37, 149-192.

Chow, G.C. (1960): Tests of Equality Between Sets of Coefficients in Two Linear Regressions. Econometrica 28, 591-605.

Cox, D.R. (1961): Tests of Separate Families of Hypotheses. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1. University of California Press, Berkeley.

Cox, D.R. (1962): Further Results on Tests of Separate Families of Hypotheses. Journal of the Royal Statistical Society B 24, 406-424.

Davidson, J.E.H., D.F. Hendry, F. Srba and J.S. Yeo (1978): Econometric Modelling of the Aggregate Time-Series Relationship Between Consumers' Expenditure and Income in the United Kingdom. Economic Journal 88, 661-692.

Davidson, R. and J. MacKinnon (1981): Several Tests for Model Specification in the Presence of Alternative Hypotheses. Econometrica 49, 781-793.

Doan, T., R. Litterman and C. Sims (1984): Forecasting and Conditional Projection Using Realistic Prior Distributions. Econometric Reviews 3, 1-100.

Durbin, J. (1954): Errors in Variables. Review of the International Statistical Institute 22, 23-32.

Durbin, J. (1970): Testing for Serial Correlation in Least Squares Regression When Some of the Regressors are Lagged Dependent Variables. Econometrica 38, 410-421.

Engle, R.F. (1983): Wald, Likelihood Ratio and Lagrange Multiplier Tests in Econometrics. In: Z. Griliches and M. Intriligator (Eds.) Handbook of Econometrics. North-Holland, Amsterdam.

Engle, R.F., D.F. Hendry and J-F. Richard (1983): Exogeneity. Econometrica 51, 277-304.

Fisher, G. and M. McAleer (1981): Alternative Procedures and Associated Tests of Significance for Non-Nested Hypotheses. Journal of Econometrics 16, 103-119.

Godfrey, L.G. (1978): Testing for Higher Order Serial Correlation in Regression Equations When the Regressors Include Lagged Dependent Variables. Econometrica 46, 1303-1310.

Godfrey, L.G. and M.R. Wickens (1981): Testing Linear and Log-Linear Regressions for Functional Form. Review of Economic Studies 48, 487-496.

Granger, C.W.J. (1969): Investigating Causal Relationships by Econometric Models and Cross-Spectral Methods. Econometrica 37, 424-438.

Hausman, J.A. (1978): Specification Tests in Econometrics. Econometrica 46, 1251-1271.

Hendry, D.F. (1986): Empirical Modelling in Dynamic Economics. Forthcoming in Applied Mathematics and Computation.

Hendry, D.F. and G.E. Mizon (1978): Serial Correlation as a Convenient Simplification, Not a Nuisance: A Comment on a Study of the Demand for Money by the Bank of England. Economic Journal 88, 549-563.

Hsiao, C. (1982): Autoregressive Modelling and Causal Ordering of Economic Variables. Journal of Economic Dynamics and Control 4, 243-259.

Kent, J.T. (1982): Robust Properties of Likelihood Ratio Tests. Biometrika 69, 19-27.

Koopmans, T.C., H. Rubin and R.B. Leipnik (1950): Measuring the Equation Systems of Dynamic Economics. In: T.C. Koopmans (Ed.) Statistical Inference in Dynamic Economic Models. Wiley, New York.

Leamer, E.E. (1978): Specification Searches: Ad Hoc Inference with Nonexperimental Data. Wiley, New York.

Maddala, G.S. (1983): Limited Dependent and Qualitative Variables in Econometrics. Cambridge University Press.

McAleer, M. (1985): Specification Tests for Separate Models: A Survey. In: M.L. King and D.E.A. Giles (Eds.) Specification Analysis in the Linear Model. Routledge and Kegan Paul, London.

McAleer, M., A.R. Pagan and P.A. Volker (1985): What Will Take the Con Out of Econometrics? American Economic Review 75, 293-307.

McAleer, M. and M.H. Pesaran (1986): Statistical Inference in Non-nested Econometric Models. Forthcoming in Applied Mathematics and Computation.

McFadden, D. (1976): Quantal Choice Analysis: A Survey. Annals of Economic and Social Measurement 5, 363-390.

Mosteller, F. and J.W. Tukey (1977): Data Analysis and Regression. Addison-Wesley, New York.

Pagan, A.R. (1984): Model Evaluation by Variable Addition. In: D.F. Hendry and K.F. Wallis (Eds.) Econometrics and Quantitative Economics. Blackwell, Oxford.

Pagan, A.R. and A.D. Hall (1983): Diagnostic Tests as Residual Analysis. Econometric Reviews 2, 159-218.

Penm, J.H.W. and R.D. Terrell (1984): Multivariate Subset Autoregressive Modelling with Zero Constraints for Detecting "Overall Causality". Journal of Econometrics 24, 311-330.

Pesaran, M.H. (1974): On the General Problem of Model Selection. Review of Economic Studies 41, 153-171.

Pesaran, M.H., R.P. Smith and J.S. Yeo (1985): Testing for Structural Stability and Predictive Failure: A Review. The Manchester School 53, 280-295.

Pierce, D.A. and L.D. Haugh (1977): Causality in Temporal Systems. Journal of Econometrics 5, 265-293.

Ramsey, J.B. (1969): Tests for Specification Errors in Classical Linear Least Squares Regression Analysis. Journal of the Royal Statistical Society B 31, 350-371.

Ruud, P.A. (1984): Tests of Specification in Econometrics. Econometric Reviews 3, 211-242.

Salkever, D.S. (1976): The Use of Dummy Variables to Compute Predictions, Prediction Errors and Confidence Intervals. Journal of Econometrics 4, 393-397.

Sargan, J.D. (1980): Some Tests of Dynamic Specification for a Single Equation. Econometrica 48, 879-897.

Sargent, T.J. and C.A. Sims (1977): Business Cycle Modelling Without Pretending to Have Too Much A Priori Economic Theory. In: C.A. Sims (Ed.) New Methods of Business Cycle Research. Federal Reserve Bank of Minneapolis.

Sims, C.A. (1972): Money, Income and Causality. American Economic Review 62, 540-552.

Sims, C.A. (1980): Macroeconomics and Reality. Econometrica 48, 1-48.

Tobin, J. (1958): The Estimation of Relationships for Limited Dependent Variables. Econometrica 26, 24-36.

White, H. (1982): Maximum Likelihood Estimation of Misspecified Models. Econometrica 50, 1-25.

Zellner, A. and F. Palm (1974): Time Series Analysis and Simultaneous Equation Econometric Models. Journal of Econometrics 2, 17-54.
