Lecture Notes in Control and Information Sciences Edited by M.Thoma and A. Wyner
86 Time Series and Linear Systems
Edited by S. Bittanti
Springer-Verlag Berlin Heidelberg New York London Paris Tokyo
Series Editors M. Thoma • A. Wyner
Advisory Board L. D. Davisson • A. G. J. MacFarlane • H. Kwakernaak • J. L. Massey • Ya. Z. Tsypkin • A. J. Viterbi
Editor Sergio Bittanti, Dipartimento di Elettronica, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano (Italy)
ISBN 3-540-16903-2 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-16903-2 Springer-Verlag New York Berlin Heidelberg
Library of Congress Cataloging in Publication Data: Time series and linear systems. (Lecture notes in control and information sciences; 86) Includes bibliographies. 1. Time-series analysis. 2. Linear systems. I. Bittanti, Sergio. II. Series. QA280.T558 1986 519.5'5 86-20244 ISBN 0-387-16903-2 (U.S.)
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to "Verwertungsgesellschaft Wort", Munich.
© Springer-Verlag Berlin, Heidelberg 1986. Printed in Germany. Offsetprinting: Color-Druck, G. Baucke, Berlin. Binding: B. Helm, Berlin. 216113020-543210
PREFACE
Over the past five years, the Politecnico di Milano (Italy) has been visited by a stream of specialists of different backgrounds, including System and Control Theory, Econometrics, Statistics and Numerical Analysis, contributing to develop the research activity at the Politecnico in the methodology of modelling and identification of linear systems. Several series of talks were given, overviewing their ideas on the subject. The train of activity underlying these visits led to setting up a workshop. This book is a partial report of such an activity. The various chapters are extended introductions to advanced topics of current interest in the field. They also constitute useful introductory papers to research directions in modelling from a system-theoretic point of view.

The book is organized as follows.

The first chapter is an introduction to the use of stochastic models in time series analysis. The problem of finding a suitable linear model for the data at hand is interpreted as the problem of approximating the infinite Hankel matrix of the impulse response coefficients with a Hankel matrix of finite rank. Among other things, the use of criteria such as AIC or BIC is discussed.

Linear systems where all variables are subject to errors are considered in the second chapter. The problem of determining the rational transfer function of the system from the observed second moments is studied here, a problem which is critically important for the art of modelling. The motivation is that prejudicial assumptions of causality can then be avoided.

A new class of dynamic models for time series is proposed in the third chapter. These models are strictly related to the so called Factor Analysis models and are based on the classical systems theory approach.

The fourth chapter is devoted to the notion of stochastic complexity, introduced with the Minimum Description Length approach. A model is then judged by the number of binary digits with which it permits to encode the observed data; the best model is the one which leads to the shortest description of the data.

Chapter 5 deals with periodically time-varying systems, which can be used to describe seasonal time series. This chapter focuses on the basic structural properties of these systems, i.e. reachability, stabilizability and so on. An extensive overview on the subject is provided. The role played by these properties in the analysis of stochastic periodic systems is touched upon.

Some numerical problems in linear system theory are considered in the sixth chapter. The main properties of the LU, QR, Schur and Singular Value Decomposition algorithms are considered. Then, the problem of computing the reachability subspace of a time-invariant system is studied.

The last chapter is devoted to the discussion of some recent trends and perspectives in Econometrics.

The volume can be used either as a textbook for monographic courses on the subject or as a reference book providing researchers with an overview of recent trends in the field.

The editor expresses his most sincere acknowledgment to the authors for their valuable contributions, as well as to the fellow researchers for their care and patience in the preparation of the manuscripts. The support of the Centro di Teoria dei Sistemi of the National Research Council (C.N.R.) and that of the Ministry of Education (M.P.I.) is gratefully acknowledged.

Sergio Bittanti
ABSTRACTS
Chapter 1
TIME SERIES AND STOCHASTIC MODELS
by E.J. Hannan

The basic concept of this paper is a linear system wherein an output $y(t)$, of $q$ components, is related to an input, $u(t)$, of $p$ components via a relation

$$y(t) = \sum_{0}^{\infty} W_i\, e(t-i) + \sum_{1}^{\infty} L_i\, u(t-i),$$

wherein the $e(t)$ are the linear prediction errors for $y(t) - \sum L_i u(t-i)$. The methods of the paper are substantially valid when the system is truly linear, in the sense that the linear prediction is optimal, but may prove useful over a much wider range. To bring the problem back to reasonable proportions the statistical methods are based on the approximation of the true structure by one wherein $W(z) = \sum W_i z^{-i}$, $L(z) = \sum L_i z^{-i}$ are approximated by matrices of rational functions. A brief discussion is given of some basic theory relating to such an approximation process. It is necessary, in the approximation, to choose the "order" of the approximant, i.e. effectively the maximum lags $h$ in the ARMAX model,

$$\sum_{0}^{h} A_i\, y(t-i) = \sum_{1}^{h} B_i\, u(t-i) + \sum_{0}^{h} C_i\, e(t-i),$$

to which the rational transfer function approximant corresponds. Various algorithms that are basic in time series analysis are described and are then used to effect a solution to the problem of finding a suitable approximant. The main algorithm described does this by a Gauss-Newton iteration in which the order is redetermined at each iteration by a calculation recursive in the order. Finally, on-line implementations of the algorithm are presented for the case where $y(t)$ is scalar.

Chapter 2
LINEAR ERRORS-IN-VARIABLES SYSTEMS
by M. Deistler

Linear errors-in-variables (EV) systems, i.e. systems where all observed variables are subject to errors, are considered. The statistical analysis of such systems turns out to be significantly more complicated compared to conventional errors-in-equations (e.g. ARMAX) systems. A good part of these complications arises from the fact that in the EV case, in general, the transfer function of the system is not uniquely determined from the (ensemble) second moments of the observations.

The paper is organized as follows: In section 2 some well known results concerning the static case are restated. In sections 3 - 5 the information about the transfer function contained in the second moments of the observations is analysed: In section 3 the set of all transfer functions corresponding to given second moments of the observations is described. Section 4 deals with the same problem when the system is a priori known to be causal and with the problem whether causality can be detected from the second moments of the observations. In section 5 conditions for identifiability using the information coming from the second moments are derived. Section 6 deals with conditions for identifiability from moments of order greater than two.
Chapter 3
A NEW CLASS OF DYNAMIC MODELS FOR STATIONARY TIME SERIES
by G. Picci and S. Pinzoni

A new class of dynamic models for stationary time series is presented. They are a natural generalization of the well-known linear Factor Analysis Models widely used in Statistics and Psychometrics. It is shown in this note that the Factor Analysis Models of time series considered here provide a simple mathematical structure for the identification of multivariate time series which avoids the unjustified introduction of a priori causality assumptions subsumed by conventional schemes, as for example ARMAX models. They clarify (and to some extent reduce to) the Dynamic Errors-In-Variable Models discussed in the recent literature.
Chapter 4
PREDICTIVE AND NONPREDICTIVE MINIMUM DESCRIPTION LENGTH PRINCIPLES
by J. Rissanen

This chapter presents in a tutorial manner the basic ideas behind the recently developed Minimum Description Length principles. Briefly, a statistical model is judged by the number of binary digits with which it permits one to encode the observed data. The shortest code length with which the observed data can be encoded in a class of stochastic models is defined to be the stochastic complexity of the data, relative to the considered class of models. Depending on how the coding is done, two kinds of stochastic complexities are defined, the predictive and the nonpredictive ones, which for large samples tend to the same value. The stochastic complexity involves a tight lower bound for the errors with which the data can be predicted. Hence, we may say that it represents all the statistical information that can be extracted from the data with the considered class of models. The model associated with the stochastic complexity, and the associated estimates both of the number of the parameters and of their values, may be taken to be the optimal ones. We also describe how the prior knowledge available about the parameters can be taken advantage of.

As applications we describe how to calculate the stochastic complexity and the associated estimates for the gaussian ARMA class of models, both in the single input/output and the multiple input/output case. We illustrate the calculation of the complexity with simulations; the consistency of the estimates of the number of the parameters is also demonstrated by simulations.
Chapter 5
DETERMINISTIC AND STOCHASTIC LINEAR PERIODIC SYSTEMS
by S. Bittanti

The main results concerning the structural properties of linear periodic systems are reviewed. Both continuous-time and discrete-time systems are dealt with. By a comparison with time-invariant systems, five structural properties are discussed. Three of them are basic properties concerning the reachability and controllability subspaces. The fourth one concerns the length of the time interval required to perform the reachability and controllability transition. The modal (spectral) characterizations are presented as fifth property. The extended structural properties (i.e. stabilizability and detectability) are also dealt with. Finally, periodic stochastic systems are considered. The existence of a cyclostationary solution is investigated by analyzing the appropriate periodic Lyapunov equation.
Chapter 6
NUMERICAL PROBLEMS IN LINEAR SYSTEM THEORY
by D. Boley and S. Bittanti

We discuss some numerical aspects in linear system theory. We start by showing the numerical algorithm to solve systems of linear equations and non-degenerate least squares problems. We then move on to an introduction to more sophisticated matrix decompositions, used to solve more sophisticated problems, and introduce the concept of backward error analysis (Wilkinson, 1965). Among the decompositions we introduce:

name                                    form        used to obtain
LU (Gaussian Elimination)               A = LU      solution of linear equations; determinant
QR (orthogonal triangularization)       A = QR      soln. to least squares problem (linear nondegenerate); soln. to linear equations without need to pivot
Schur                                   A = QRQ'    eigenvalues/vectors
Singular Value Decomposition (S.V.D.)   A = PΣQ'    singular values; rank; distance to singularity; 2-norm of matrix; 2-norm condition number

where
P, Q denote orthogonal matrices,
U, R denote upper triangular matrices,
L denotes lower triangular matrices,
Σ is non-negative diagonal.

In the last section we discuss some numerical aspects in linear system theory. The attention is focused on the problem of computing the controllable subspace of a time-invariant linear system. It is shown how some classical methods lead to numerical problems, and we give some recent results giving bounds on the errors in terms of results from these classical methods.
Chapter 7
SOME RECENT DEVELOPMENTS IN ECONOMETRICS
by M. McAleer and M. Deistler

In this paper we discuss some of the main recent developments in econometrics: in particular, methods for specification search, diagnostic checking and specification testing; macroeconomic modelling and forecasting; and some models associated with empirical microeconomics.
AUTHORS
Sergio Bittanti, Dipartimento di Elettronica, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 MILANO, ITALY
Daniel Boley, Department of Computer Science, University of Minnesota, 136 Lind Hall, 207 Church Street S.E., MINNEAPOLIS, Minnesota 55455, U.S.A.
Manfred Deistler, Institut für Ökonometrie und Operations Research, Technische Universität Wien, Argentinierstrasse 8/119, A-1040 WIEN, AUSTRIA
Edward G. Hannan, Department of Statistics, Mathematical Sciences Bldg., The Australian National University, GPO Box 4, CANBERRA, ACT 2601, AUSTRALIA
Michael J. McAleer, Department of Statistics, The Faculties, The Australian National University, GPO Box 4, CANBERRA, ACT 2601, AUSTRALIA
Giorgio Picci, Istituto di Elettrotecnica ed Elettronica, Università di Padova, Via Gradenigo 6/A, 35131 PADOVA, ITALY
Stefano Pinzoni, LADSEB-CNR, Corso Stati Uniti 4, 35020 PADOVA, ITALY
Jorma Rissanen, IBM-RES, 650 Harry Road, SAN JOSE, CA 95193, U.S.A.
TABLE OF CONTENTS

Chapter 1
TIME SERIES AND STOCHASTIC MODELS
by E.J. Hannan
  1. Introduction
  2. Some Basic Algorithms
  3. Approximation Criteria
  4. Rational Transfer Function Approximation
  5. A Gauss-Newton Procedure
  6. Some Theoretical Considerations
  References

Chapter 2
LINEAR ERRORS-IN-VARIABLES SYSTEMS
by M. Deistler
  1. Introduction
  2. The Static Case
  3. Second Moments and Dynamic Models: the General Case
  4. Causality
  5. Conditions for Identifiability from the Second Moments of the Observations
  6. Identifiability from High Order Moments
  References

Chapter 3
A NEW CLASS OF DYNAMIC MODELS FOR STATIONARY TIME SERIES
by G. Picci and S. Pinzoni
  1. Introduction
  2. Dynamic Factor Analysis Models
  3. Stochastic Realization
  4. Causality
  References

Chapter 4
PREDICTIVE AND NONPREDICTIVE MINIMUM DESCRIPTION LENGTH PRINCIPLES
by J. Rissanen
  1. Introduction
  2. Coding and Prediction
  3. ARMA Estimation and Prediction
  4. Vector Time Series Models
  References

Chapter 5
DETERMINISTIC AND STOCHASTIC LINEAR PERIODIC SYSTEMS
by S. Bittanti
  1. Introduction
  2. Structural Properties of Continuous-time Periodic Systems
    2.1 Continuous-time Linear Periodic Systems
    2.2 Structural Properties
    2.3 Grammian Matrices
    2.4 Five Structural Properties of Time-invariant Systems
    2.5 Five Structural Properties of Continuous-time Periodic Systems
  3. Structural Properties of Discrete-time Periodic Systems
    3.1 Discrete-time Linear Periodic Systems
    3.2 Structural Properties
    3.3 Grammian Matrices
    3.4 Five Structural Properties of Discrete-time Periodic Systems
  4. Kalman Canonical Decomposition
  5. Extended Structural Properties
  6. Stochastic Linear Periodic Systems
  References

Chapter 6
NUMERICAL PROBLEMS IN LINEAR SYSTEM THEORY
by D. Boley and S. Bittanti
  1. Introduction
  2. Review of Simpler Computational Methods
    2.1 LU Decomposition
    2.2 Orthogonal Decomposition
      2.2.1 QR Decomposition
      2.2.2 Geometric Interpretation of a Rotation
      2.2.3 QR Decomposition by Householder Decompositions
      2.2.4 Solving Least Squares Problems Using Orthogonal Decompositions
  3. Special Forms Used in Numerical Linear Algebra - Why
    3.1 The Jordan Canonical Form
    3.2 Numerical Conditioning of a Problem
  4. Schur Decomposition
  5. Singular Value Decomposition - Condition Number of a Matrix
  6. Applications of Previous to Linear Systems
  References

Chapter 7
SOME RECENT DEVELOPMENTS IN ECONOMETRICS
by M. McAleer and M. Deistler
  1. Introduction
  2. Specification and Quality Control of a Model
    2.1 Model Specification
    2.2 Tight and Loose Specifications
    2.3 Principles for Testing
    2.4 Diagnostic Testing
      2.4.1 Serial Correlation
      2.4.2 Heteroscedasticity
      2.4.3 Exogeneity
      2.4.4 Functional Form
      2.4.5 Parameter Constancy
      2.4.6 Non-nested Alternatives
  3. Macroeconomic Modelling and Forecasting
  4. Microeconometrics
  References
Chapter 1
Time Series and Stochastic Models
E.J. Hannan

1. Introduction

This chapter will be concerned with procedures for analysing data, $y(t)$, $t = 1,2,\ldots,T$, where $y(t)$ is a vector of $q$ components that can be thought of as the output of some system to which the input is $u(t)$, an observed vector of $p$ components. The situation held in mind is one where no very precise information is available about the system and the description will be on the basis of models of such generality that experience suggests will suffice for a good explanation. This will be further discussed below. These models will always be stochastic.

Let us begin by considering $y(t)$, alone, as generated by a stationary stochastic process with finite mean square, $E\{y_j(t)^2\} < \infty$, $j = 1,2,\ldots,q$, where $y_j(t)$ is the $j$'th component of $y(t)$. It is costless to assume that $y(t)$ is ergodic, since only one history or realization of the process is ever seen, and reasonable to require that it be purely non-deterministic, so that there is no influence on $y(t)$ from the indefinitely far past, or rather if there is such an influence it can only be through such effects as the mean or periodic components such as diurnal or seasonal movements. Such effects could first be removed by regression so that, for example, all calculations will be with the mean corrected quantities $y(t) - \bar y$, $\bar y = \frac{1}{T}\sum_{1}^{T} y(t)$. In relation to calculations it will be assumed that this has already been done so that $y(t)$ is the residual from such an adjustment. This makes notation simpler.

Any such stationary, non-deterministic process can be analysed, at least in part, through its spectrum, $f(\omega)$, a $q \times q$ matrix valued function satisfying

$$f(\omega) = f(\omega)^* = f(-\omega)', \qquad \Gamma(t) = E\{y(s)y(s+t)'\} = \int_{-\pi}^{\pi} e^{it\omega} f(\omega)\, d\omega.$$
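The relation between the spectrum and the autocovariances can be checked numerically. The following is a minimal sketch, assuming a hypothetical scalar process $y(t) = e(t) + \theta e(t-1)$ with innovation variance $\sigma^2$ (this example process and the function names are illustrative assumptions, not taken from the text); its spectrum is $f(\omega) = \frac{\sigma^2}{2\pi}|1 + \theta e^{-i\omega}|^2$, so that $\Gamma(0) = \sigma^2(1+\theta^2)$, $\Gamma(1) = \sigma^2\theta$ and $\Gamma(t) = 0$ for $|t| > 1$.

```python
import numpy as np

def ma1_spectrum(omega, theta=0.5, sigma2=1.0):
    """Spectrum of the hypothetical scalar process y(t) = e(t) + theta*e(t-1)."""
    return sigma2 / (2.0 * np.pi) * np.abs(1.0 + theta * np.exp(-1j * omega)) ** 2

def autocovariance_from_spectrum(f, lag, n_grid=4096):
    """Approximate Gamma(lag) = integral over [-pi, pi) of exp(i*lag*w) f(w) dw."""
    omega = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    # Rectangle rule on a uniform grid covering one full period
    total = np.sum(np.exp(1j * lag * omega) * f(omega))
    return float(np.real(total) * 2.0 * np.pi / n_grid)

# Autocovariances at lags 0, 1, 2 recovered from the spectrum
gamma = [autocovariance_from_spectrum(ma1_spectrum, k) for k in range(3)]
```

With the default values $\theta = 0.5$, $\sigma^2 = 1$ the three computed values should agree with $1.25$, $0.5$ and $0$ up to rounding error.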
We shall not discuss Fourier methods in any detail because the main methods of this paper are different. Here "finite parameter" models are emphasised, in contrast to Fourier methods that are, essentially, non-parametric and in which the generality is reduced to manageable proportions by smoothness requirements for $f(\omega)$. These finite parameter models have been especially emphasised in econometrics and systems engineering and are called ARMAX, an acronym for autoregressive moving-average with exogenous components. (Here exogenous means input.) For $y(t)$ stationary and non-deterministic

$$y(t) = \sum_{0}^{\infty} W_i\, e(t-i), \quad W_0 = I_q, \quad \sum_{0}^{\infty} \|W_i\|^2 < \infty, \quad E\{e(s)e(t)'\} = \delta_{st}\,\Omega. \tag{1.1}$$

Here the $e(t)$ are the linear innovations, i.e. $e(t) = y(t) - \hat y(t|t-1)$, where $\hat y(t|t-1)$ is the best linear predictor of $y(t)$ from $y(t-1), y(t-2), \ldots$. There is an extensive theory due to Kolmogoroff, Wiener and others concerning the construction of $W(z)$, algorithmically, from knowledge of $f(\omega)$, but this will not be important here. Put

$$W(z) = \sum_{0}^{\infty} W_i z^{-i}.$$

Then $W(z)$ is analytic for $|z| > 1$ and $\det W(z) \neq 0$, $|z| > 1$. However we always assume $\det W(z) \neq 0$, $|z| \geq 1$, since zeros on $|z| = 1$ cause considerable problems. There is a decomposition

$$f(\omega) = \frac{1}{2\pi}\, W(e^{-i\omega})\, \Omega\, W(e^{-i\omega})^*, \tag{1.2}$$

which is unique since there is no other such decomposition having the properties stated above.

To take account of $u(t)$ we generalise (1.1) to

$$y(t) = \sum_{0}^{\infty} W_i\, e(t-i) + \sum_{1}^{\infty} L_i\, u(t-i), \tag{1.3}$$

and put $L(z) = \sum_{1}^{\infty} L_i z^{-i}$. The essential restriction on (1.3) is that the relation from $u(t)$ to $y(t)$ is causal, so that there is no influence of $u(s)$, $s > t$, on $y(t)$. However (1.1), (1.2), (1.3) are too general to serve as a basis for a worthwhile statistical analysis. To introduce a further specialisation consider the infinite (Hankel) matrix

$$\mathcal{H} = \begin{bmatrix} W_1 & L_1 & W_2 & L_2 & W_3 & L_3 & \cdots \\ W_2 & L_2 & W_3 & L_3 & W_4 & L_4 & \cdots \\ W_3 & L_3 & W_4 & L_4 & W_5 & L_5 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \end{bmatrix}.$$
Here $[W_j, L_j]$ will be called a "block", of $q$ rows and $p+q$ columns. The importance of $\mathcal{H}$ can be seen from the, almost obvious, fact that the best $j$ step ahead predictor of $y(t+j)$, $j \geq 1$, is, ignoring the inputs after time $t$,

$$\hat y(t+j|t) = \sum_{0}^{\infty} W_{i+j}\, e(t-i) + \sum_{0}^{\infty} L_{i+j}\, u(t-i),$$

so that $\mathcal{H}$ has, as the $j$'th row of blocks, the coefficient blocks in that prediction. The importance of $\mathcal{H}$ will be made evident in other ways in section 4.

Let $\mathcal{H}_0$ be a set of $n$ rows of $\mathcal{H}$ that span all of the rows of $\mathcal{H}$, so that any row can be linearly represented in terms of them. Of course $n$ would be infinite in general. The integer $n$, the rank of $\mathcal{H}$, is called the order or the McMillan degree of $\mathcal{H}$, or equivalently of $[W(z), L(z)]$. Call $\mathcal{H}_1$ the first block of $q$ (full) rows of $\mathcal{H}$. Put

$$\xi(t) = [\,e(t)',\, u(t)',\, e(t-1)',\, u(t-1)',\, \ldots\,]', \qquad x(t) = \mathcal{H}_0\, \xi(t-1).$$

Then from (1.3), $y(t) = \mathcal{H}_1 \xi(t-1) + e(t)$. Since $\mathcal{H}_1 = H \mathcal{H}_0$ for a suitable choice of $H$, then

$$y(t) = H x(t) + e(t). \tag{1.4}$$

Also put $\mathcal{H}_0 = [K, L, \mathcal{H}_2]$, where $K$, $L$ comprise, respectively, the first $q$ and the next $p$ columns of $\mathcal{H}_0$. Then $x(t+1) = \mathcal{H}_0 \xi(t) = K e(t) + L u(t) + \mathcal{H}_2\, \xi(t-1)$. Since the rows of $\mathcal{H}_2$ are composed of rows of $\mathcal{H}$, $\mathcal{H}_2 = F \mathcal{H}_0$ for a suitable $F$, and

$$x(t+1) = F x(t) + L u(t) + K e(t). \tag{1.5}$$

This is the state space representation in prediction error form. Its lack of uniqueness, given that it is minimal, i.e. of dimension $n$, is entirely due to the lack of uniqueness in the choice of $\mathcal{H}_0$. That can be made unique by choosing the rows of $\mathcal{H}_0$ as the first linearly independent set found as you go down the rows of $\mathcal{H}$. We will return to this later.
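The link between the state space form and the Hankel matrix can be illustrated numerically. Iterating (1.5) and substituting into (1.4) gives $W_j = H F^{j-1} K$ and $L_j = H F^{j-1} L$ for $j \geq 1$, so a finite section of the block Hankel matrix can be built from $(F, H, K, L)$ and its rank compared with the state dimension. The sketch below assumes an arbitrary illustrative system with $n = 2$, $q = p = 1$; the numerical values and function names are hypothetical and not taken from the text.

```python
import numpy as np

def markov_blocks(F, H, K, L, count):
    """Blocks [W_j, L_j], j = 1..count, implied by y = Hx + e, x(t+1) = Fx + Lu + Ke."""
    blocks, F_power = [], np.eye(F.shape[0])
    for _ in range(count):
        blocks.append(np.hstack([H @ F_power @ K, H @ F_power @ L]))
        F_power = F_power @ F
    return blocks

def block_hankel(blocks, rows, cols):
    """Finite section of the Hankel matrix: block (i, j) is [W_{i+j+1}, L_{i+j+1}]."""
    return np.block([[blocks[i + j] for j in range(cols)] for i in range(rows)])

# Hypothetical minimal system of order n = 2 with q = 1 output and p = 1 input
F = np.array([[0.5, 1.0], [0.0, 0.3]])
H = np.array([[1.0, 0.0]])
K = np.array([[1.0], [0.5]])
L = np.array([[0.0], [1.0]])

hankel = block_hankel(markov_blocks(F, H, K, L, count=7), rows=4, cols=4)
order = np.linalg.matrix_rank(hankel)  # numerical rank of the finite section
```

For this system the finite section has rank 2, the state dimension, because the pair $(H, F)$ is observable and $(F, [K, L])$ is reachable. For a non-minimal choice of matrices the rank would be smaller than the state dimension.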
The methods used herein are dependent on acting as if $n$ is finite. Then, and only then, $W(z)$ and $L(z)$ are matrices of rational functions of $z$ and can thus be written in the form

$$[\,W(z),\; L(z)\,] = A(z^{-1})^{-1}\, [\,C(z^{-1}),\; B(z^{-1})\,], \tag{1.6}$$

where $A(z)$, $B(z)$, $C(z)$ are matrices of polynomials. Of course (1.6) is far from unique but we shall later describe how the unique prescription of $\mathcal{H}_0$ just described leads to a unique "matrix fraction description", i.e. (1.6). We shall use $z^{-1}$ also to indicate the shift operator, $z^{-1} y(t) = y(t-1)$. Corresponding to (1.6) we have the ARMAX representation

$$A(z^{-1})\, y(t) = B(z^{-1})\, u(t) + C(z^{-1})\, e(t). \tag{1.7}$$

This is important partly because it expresses $y(t)$ in terms of $y(t-1), y(t-2), \ldots$, $u(t-1), u(t-2), \ldots$, $e(t), e(t-1), e(t-2), \ldots$ and can serve as a basis for an iterative estimation procedure where the coefficient matrices are estimated by regression, the $e(t)$, which are unobserved, being replaced by estimates from a previous stage in the iteration. This will be dealt with in section 5. When no input (or exogenous) variable is observed we speak of the ARMA case.

Notes on References. There are many references for the basic spectral theory, for example Hannan (1970). For the structure theory of systems see Kailath (1980), Casti (1977).

2. Some Basic Algorithms
There are three basic algorithms of time series analysis.

(i) The first algorithm is the discrete Fourier transform

$$d(\omega) = T^{-1/2} \sum_{1}^{T} y(t)\, e^{it\omega},$$

which is cheaply computable at frequencies $2\pi j/T'$, $j = 0, 1, \ldots, [\tfrac{1}{2}T']$, for $T'$ highly composite. Under smoothness conditions on $f(\omega)$,

$$E\{d(2\pi j/T)\, d(2\pi k/T)^*\} \approx \delta_{jk}\, 2\pi f(2\pi j/T)$$

and, indeed, the error is uniformly $O(T^{-1})$ if $n$ is finite. This algorithm will not be so important to us.

(ii) The second algorithm is the Levinson-Whittle recursion. This is designed, in a sense, to produce approximations to $e(t)$ in (1.1) by considering

$$\Phi(z) = W(z)^{-1}, \qquad e(t) = \Phi(z)\, y(t).$$

The procedure recursively calculates polynomial approximations $\hat\Phi_n$ of degree $n$ to $\Phi$. We have used $n$ because a system for which $\Phi(z)$ is a polynomial of degree $n$ will, in fact, be of McMillan degree $n$. The recursive calculation uses the data through the natural estimates of $\Gamma(t)$ of the form

$$\hat\Gamma(t) = \frac{1}{T} \sum_{r=1}^{T-t} y(r)\, y(r+t)', \quad t \geq 0.$$

For $t < 0$ put $\hat\Gamma(t) = \hat\Gamma(-t)'$. However it will be convenient to put this Levinson-Whittle recursion in a more general setting because of its many uses later. Thus let $v(t)$ be a vector of $s$ components and put

$$\hat\Gamma_v(t) = \frac{1}{T} \sum_{r=1}^{T-t} v(r)\, v(r+t)' = \hat\Gamma_v(-t)', \quad t \geq 0.$$

The recursion calculates matrices $\hat F_{n,j}$, $\tilde F_{n,j}$, $\hat S_n$, $\tilde S_n$. If $v(t) = y(t)$ then $\hat F_{n,j}$ is $\hat\Phi_{n,j}$, the coefficient of $z^{-j}$ in $\hat\Phi_n(z)$, and correspondingly we have an estimate

$$\hat e_n(t) = \sum_{0}^{n} \hat\Phi_{n,j}\, y(t-j) \tag{2.1}$$

of $e(t)$. The $\tilde F_{n,j}$ would, in this case where $v(t) = y(t)$, correspond to the time reversed process and, putting $\tilde F_{n,j} = \tilde\Phi_{n,j}$,

$$\hat r_n(t) = \sum_{0}^{n} \tilde\Phi_{n,j}\, y(t-n+j)$$

are the "backwards" residuals (as distinct from the forwards residuals $\hat e_n(t)$). Thus, in this case,

$$\hat\Omega_n = \hat S_n = \frac{1}{T} \sum_{1}^{T+n} \hat e_n(t)\, \hat e_n(t)', \qquad \tilde\Omega_n = \tilde S_n = \frac{1}{T} \sum_{1}^{T+n} \hat r_n(t)\, \hat r_n(t)'. \tag{2.2}$$

We now give the recursive algorithm in terms of $v(t)$:

$$\hat F_{n,j} = \hat F_{n-1,j} + \hat F_{n,n}\, \tilde F_{n-1,n-j}, \qquad \tilde F_{n,j} = \tilde F_{n-1,j} + \tilde F_{n,n}\, \hat F_{n-1,n-j}, \quad j = 1, \ldots, n-1,$$

$$\hat F_{n,0} = \tilde F_{n,0} = I_s, \qquad \hat F_{n,n} = -\Delta_{n-1}\, \tilde S_{n-1}^{-1}, \qquad \tilde F_{n,n} = -\Delta_{n-1}'\, \hat S_{n-1}^{-1},$$

$$\Delta_n = \sum_{0}^{n} \hat F_{n,j}\, \hat\Gamma_v(j-n-1),$$

$$\hat S_n = (I_s - \hat F_{n,n} \tilde F_{n,n})\, \hat S_{n-1}, \qquad \tilde S_n = (I_s - \tilde F_{n,n} \hat F_{n,n})\, \tilde S_{n-1}, \qquad \hat S_0 = \tilde S_0 = \hat\Gamma_v(0).$$

In case $s = 1$ we have $\hat S_n = \tilde S_n$, $\hat F_{n,j} = \tilde F_{n,j}$, $j = 1, \ldots, n$, so that the algorithm is simplified.
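The recursion just given can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names and the random test data are hypothetical, the rows of the array `v` hold $v(1)', \ldots, v(T)'$, and the autocovariance estimate uses the $1/T$ normalisation.

```python
import numpy as np

def sample_autocov(v, max_lag):
    """Gamma_hat(t) = (1/T) sum_r v(r) v(r+t)', with Gamma_hat(-t) = Gamma_hat(t)'."""
    T = v.shape[0]
    gam = {}
    for t in range(max_lag + 1):
        g = v[: T - t].T @ v[t:] / T
        gam[t], gam[-t] = g, g.T
    return gam

def levinson_whittle(gam, order):
    """Return forward/backward coefficient lists and residual covariances."""
    I = np.eye(gam[0].shape[0])
    Fh, Ft = [I], [I]                       # order-0 coefficients F_{0,0} = I
    Sh, St = gam[0].copy(), gam[0].copy()   # S_0 = Gamma_hat(0), both directions
    for n in range(1, order + 1):
        # Delta_{n-1} = sum_j Fhat_{n-1,j} Gamma_hat(j - n)
        delta = sum(Fh[j] @ gam[j - n] for j in range(n))
        Fh_nn = -delta @ np.linalg.inv(St)
        Ft_nn = -delta.T @ np.linalg.inv(Sh)
        new_Fh = [I] + [Fh[j] + Fh_nn @ Ft[n - j] for j in range(1, n)] + [Fh_nn]
        new_Ft = [I] + [Ft[j] + Ft_nn @ Fh[n - j] for j in range(1, n)] + [Ft_nn]
        Sh, St = (I - Fh_nn @ Ft_nn) @ Sh, (I - Ft_nn @ Fh_nn) @ St
        Fh, Ft = new_Fh, new_Ft
    return Fh, Ft, Sh, St

# Illustrative data: T = 200 observations of an s = 2 component vector
rng = np.random.default_rng(0)
v = rng.standard_normal((200, 2))
gam = sample_autocov(v, 3)
Fh, Ft, Sh, St = levinson_whittle(gam, 3)
```

The forward coefficients should agree with a direct solution of the block Toeplitz system $\sum_{j=0}^{n} \hat F_{n,j} \hat\Gamma_v(j-k) = 0$, $k = 1, \ldots, n$. The recursion reaches the same answer while inverting only $s \times s$ matrices at each step.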
These procedures have severe disadvantages when $T$ is small or, better, when $n/T$ is not small. This is because they are founded on the Toeplitz assumption that, implicitly, $v(t) = 0$, $-n < t \leq 0$ or $T < t \leq T+n$. (This is so called because the system of equations for the $\hat F_{n,j}$, for given $n$, has a block Toeplitz matrix, i.e. one with the same elements down any diagonal.) There have been many modifications, often based on calculating, for example, $\hat e_n(t)$, $\hat r_n(t)$ (see (2.1), (2.2)) recursively by

$$\hat e_n(t) = \hat e_{n-1}(t) + \hat\Phi_{n,n}\, \hat r_{n-1}(t-1), \qquad \hat r_n(t) = \hat r_{n-1}(t-1) + \tilde\Phi_{n,n}\, \hat e_{n-1}(t), \tag{2.3}$$

$$\hat r_n(0) \equiv 0, \qquad \hat e_0(t) = \hat r_0(t) = y(t), \quad 1 \leq t \leq T.$$

Then also

$$\Delta_n = \frac{1}{T} \sum_{1}^{T+n} \hat e_n(t)\, \hat r_n(t-1)'.$$

It is the terms in (2.1), (2.2) for $1 \leq t < n$ and $T < t \leq T+n$ that seem to cause most of the trouble, and those involve the Toeplitz assumption. In case $q = 1$ it has been suggested that $\hat\Phi_{n,n}$ be replaced by

$$\hat\phi_{n,n} = -\,\frac{\sum_{n}^{T} \hat e_{n-1}(t)\, \hat r_{n-1}(t-1)}{\tfrac{1}{2}\sum_{n}^{T} \hat e_{n-1}(t)^2 + \tfrac{1}{2}\sum_{n}^{T} \hat r_{n-1}(t-1)^2}, \tag{2.4}$$

which involves only the terms for $n \leq t \leq T$, but one might equally use the coefficient of correlation between $\hat e_{n-1}(t)$ and $\hat r_{n-1}(t-1)$. The use of (2.3), (2.4) involves a substantial number of additional calculations. A virtue of (2.4) is that $\hat\phi_{n,n}$ lies in $(-1,1)$, as also does the correlation coefficient, and this is completely equivalent to the fact that $\hat\Phi_n(z) \neq 0$, $|z| \geq 1$, a desirable property. This is also true of the Levinson-Whittle recursion. These lattice or ladder methods (so called because of the flow diagrams describing them) are important in real time calculations. For the purposes of this account we shall continue to write in terms of the Levinson-Whittle recursion, but wherever that is used it could be replaced by a lattice calculation. The $\hat\phi_{n,n}$ (using a lower case symbol for $q = 1$) are called partial autocorrelations by statisticians and reflection coefficients by systems engineers.

The algorithm has been presented for general $v(t)$. To see why, consider computing an estimate of $e(t)$ when inputs are observed. Put, then,

$$\hat e_n(t) = \sum_{0}^{n} \hat\Phi_{n,j}\, y(t-j) - \sum_{1}^{n} \hat\Psi_{n,j}\, u(t-j).$$

Here $\sum \hat\Psi_{n,j} z^{-j}$ is an approximation to $W(z)^{-1} L(z) = C(z^{-1})^{-1} B(z^{-1})$, using (1.6). To obtain $\hat\Phi_{n,j}$, $\hat\Psi_{n,j}$ take $v(t)' = (y(t)', u(t)')$, $s = p + q$, and take $[\hat\Phi_{n,j}, -\hat\Psi_{n,j}]$ as the first block of $q$ rows in $\hat F_{n,j}$. Then also $\hat\Omega_n$, the covariance matrix of the $\hat e_n(t)$, is the top left hand $q \times q$ matrix of $\hat S_n$. This type of procedure will repeatedly
be used below.

(iii) The third major algorithm is the Kalman filter, which computes $\varepsilon(t)$ on the basis of a finite past, for the system (1.5) with $n$ finite. The algorithm is

$$\hat{x}(t+1) = F\hat{x}(t) + Lu(t) + K(t)\varepsilon(t), \qquad y(t) = H\hat{x}(t) + \varepsilon(t),$$

$$K(t) = \{FP(t)H' + K\Omega\}\{HP(t)H' + \Omega\}^{-1},$$

$$P(t+1) = FP(t)F' + K\Omega K' - K(t)\{HP(t)H' + \Omega\}K(t)',$$

$$\hat{x}(1) = 0, \qquad P(1) = FP(1)F' + K\Omega K'.$$

It may be wise to symmetrise $P(t)$, replacing it by $\tfrac{1}{2}\{P(t) + P(t)'\}$, to reduce the effects of rounding errors. There is an enormous literature surrounding this algorithm. For our purposes its importance lies in the fact that it allows the Gaussian likelihood to be calculated, or better $(-2T^{-1})$ times the logarithm of that likelihood, which we call $L(\theta)$ and still speak of as the likelihood. This is, apart from a constant,

$$L(\theta) = \frac{1}{T}\sum_{1}^{T}\log\det\{HP(t)H' + \Omega\} + \frac{1}{T}\sum_{1}^{T}\varepsilon(t)'\{HP(t)H' + \Omega\}^{-1}\varepsilon(t). \qquad (2.5)$$
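The recursion and the criterion (2.5) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `stationary_P` and `gaussian_criterion` are hypothetical, `y` and `u` are assumed to be arrays of shape `(T, q)` and `(T, p)`, and the start-up value $P(1)$ is obtained by solving the linear equation $P = FPF' + K\Omega K'$ directly, which requires the eigenvalues of $F$ to lie inside the unit circle.

```python
import numpy as np

def stationary_P(F, K, Omega):
    """Solve P = F P F' + K Omega K' for the start-up value P(1)."""
    n = F.shape[0]
    Q = K @ Omega @ K.T
    # Row-major vec identity: vec(F P F') = (F kron F) vec(P).
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(F, F), Q.reshape(-1))
    return vec_p.reshape(n, n)

def gaussian_criterion(y, u, F, L, H, K, Omega):
    """Evaluate (2.5), apart from a constant, by the Kalman recursion."""
    T = y.shape[0]
    x = np.zeros(F.shape[0])                 # x_hat(1) = 0
    P = stationary_P(F, K, Omega)            # P(1)
    total = 0.0
    for t in range(T):
        Sigma = H @ P @ H.T + Omega          # H P(t) H' + Omega
        eps = y[t] - H @ x                   # innovation eps(t)
        gain = (F @ P @ H.T + K @ Omega) @ np.linalg.inv(Sigma)   # K(t)
        _, logdet = np.linalg.slogdet(Sigma)
        total += logdet + eps @ np.linalg.solve(Sigma, eps)
        x = F @ x + L @ u[t] + gain @ eps    # x_hat(t+1)
        P = F @ P @ F.T + K @ Omega @ K.T - gain @ Sigma @ gain.T
        P = 0.5 * (P + P.T)                  # symmetrise against rounding error
    return total / T
```

In a numerical optimisation over $\theta$ this function would be called once per trial value of the parameters; the symmetrisation step is the one suggested in the text.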
In (2.5), $\theta$ stands for the parameters in $F$, $H$, $K$, $\Omega$. Those in $F$, $H$, $K$ we shall call system parameters and shall indicate by $\tau$. The remainder are the variances and covariances involved, i.e. those in $\Omega$. The Gaussian likelihood in (2.5) has been written down treating $u(t)$ as a fixed sequence of vectors. We emphasise that few, if any, of the methods of this chapter depend greatly on the assumption that the $\varepsilon(t)$ are Gaussian. The likelihood, (2.5), is used to obtain an estimation method rather than because it is the true likelihood.

Notes on References.
The fast Fourier transform was introduced to latter day science in Cooley and Tukey (1965). The vector form of the Levinson-Whittle algorithm was given in Whittle (1963). Lattice forms are surveyed in Friedlander (1982). A great amount of detail about the Kalman filter is found in Anderson and Moore (1979).

3. Approximation Criteria
The problem to be considered in the remainder of this chapter is that of approximating the true system by one of finite McMillan degree. This degree, $n$, has to be determined. Once this is recognised it must also be recognised that it is not possible to proceed purely through the minimisation of (2.5), since that can always be further reduced by taking $n$ large. The alternative procedures here considered choose $n$ by minimising some form of

$$\log\det\hat{\Omega}_n + d(n)\,C_T/T, \qquad n = 0, 1, \ldots, N. \qquad (3.1)$$

Here $\hat{\Omega}_n$ is the maximum likelihood estimate of $\Omega$, for $n$ given, and the first term in (3.1) is, except for a constant, essentially the minimal value of (2.5), for $n$ given. (Some approximation is involved in that statement.) The constant $d(n)$ is the dimension of $\tau$, which is $n(2q + p)$. The second term in (3.1) is a penalty term which increases as $n$ increases, whereas the first decreases. Two commonly used sequences $C_T$ are $C_T = 2$, in which case (3.1) will be called AIC($n$), and $C_T = \log T$, in which case (3.1) will be called BIC($n$). An upper bound, $N$, has been imposed on $n$ ($N$ might increase with $T$) and is needed in connection with proofs of asymptotic properties of the method. In practice such bounds do not seem to be used, probably because the bounds needed for validity are much larger than values of $n$ that an experienced investigator would consider reasonable and are needed in the theoretical investigation only to exclude ridiculously large values.

For the case of
$C_T = \log T$ a justification has been given by Rissanen on the basis of a minimum description length principle. The idea is to use the model set to record the data in as few bits as possible. The first term (or rather $T/2$ times it) gives a measure of the average number of bits required for an optimal encoding when $n$ is fixed and the maximum likelihood structure, on Gaussian assumptions, is taken to be the true structure. To decode, the model parameters must also be transmitted and $T/2$ times the second term in (3.1), for BIC, measures the number of bits for an optimal encoding of these, to an accuracy determined by that of the method of maximum likelihood. The use of $C_T = 2$ has been justified by Akaike on the basis of a prediction theory, and has been
widely used.

The emphasis in this chapter will principally be on the use of rational transfer function systems as approximations to systems of a more general kind. This will be further discussed in the next section. However here some discussion of the case where there is a true rational transfer function system will be given in relation to the use of (3.1). The conditions under which the statements below hold true are essentially (6.1), (6.2), below, plus the finiteness of fourth moments of the $\varepsilon_j(t)$, but the proofs of the theorems also depend on a condition (compare below (1.1))

$$\det W(z) \neq 0, \qquad |z| \geq 1 - \delta, \quad \delta > 0.$$

This $\delta$ may be as small as desired but is prescribed a priori.

Now assume there is a true $n_0$ and that $\hat{n}$ minimises (3.1) while, as $T \to \infty$, $C_T/T \to 0$ (which is an insignificant requirement). Then the following holds, where a.s. stands for "almost surely".

(i) If $\liminf_{T\to\infty} C_T/(2\log\log T) > 1$ then $\hat{n} \to n_0$, a.s. If $\limsup_{T\to\infty} C_T/(2\log\log T) < 1$ then $\hat{n}$ does not converge a.s. to $n_0$.

(ii) If $\liminf_{T\to\infty} C_T = \infty$ then $\hat{n} \to n_0$ in probability. If $\limsup_{T\to\infty} C_T < \infty$ then

$$\lim_{\delta\to 0}\,\lim_{T\to\infty} P\{\hat{n} > n_0\} = 1. \qquad (3.2)$$
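To make the role of the penalty sequence concrete, the following sketch evaluates (3.1) for a given list of values of $\log\det\hat{\Omega}_n$ and compares the three sequences $C_T = 2$, $C_T = 2\log\log T$ and $C_T = \log T$ that appear in (i) and (ii). The function names and the numerical values of $\log\det\hat{\Omega}_n$ are hypothetical and chosen only for illustration.

```python
import math

def criterion(log_det_omega, n, q, p, c_t, T):
    """Value of (3.1): log det(Omega_hat_n) + d(n) C_T / T, with d(n) = n(2q + p)."""
    return log_det_omega + n * (2 * q + p) * c_t / T

def select_order(log_dets, q, p, c_t, T):
    """Return the n in 0..N minimising (3.1); log_dets[n] is log det(Omega_hat_n)."""
    values = [criterion(ld, n, q, p, c_t, T) for n, ld in enumerate(log_dets)]
    return min(range(len(values)), key=values.__getitem__)

def penalties(T):
    """The three penalty sequences C_T discussed in the text, at sample size T."""
    return {"AIC": 2.0,
            "2 log log T": 2.0 * math.log(math.log(T)),
            "BIC": math.log(T)}

# Hypothetical values of log det(Omega_hat_n), n = 0..3, for q = 1, p = 0, T = 100.
example = [0.0, -0.5, -0.56, -0.57]
n_aic = select_order(example, q=1, p=0, c_t=penalties(100)["AIC"], T=100)
n_bic = select_order(example, q=1, p=0, c_t=penalties(100)["BIC"], T=100)
```

With these illustrative numbers the lighter penalty $C_T = 2$ accepts the small further reduction at $n = 2$ while $C_T = \log T$ does not, which is the kind of behaviour that (i) and (ii) describe asymptotically.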
These results deserve careful interpretation. In the first place (i) should not be interpreted as saying that $C_T = 2\log\log T$ is a good value to use, because $2\log\log T$ changes far too slowly with $T$ to be meaningful. At $T = 10$ it is 1.7 and at $T = 1000$ it is 3.9. It is therefore not far from $C_T = 2$ for most values of $T$ met in practice, which is the value for AIC($n$). The result (3.2) suggests that AIC($n$) is bad because it will always overestimate the McMillan degree. However in practice there will be no true degree and then $\hat{n}$ should increase with $T$. The question is how fast. Some investigations suggest that $C_T = 2$, i.e. AIC, gives an optimal rate of increase, according to certain criteria.

The result (3.2) deserves further discussion. We give this for the simplest case where $q = 1$, $n_0 = 0$ so that $y(t) = \varepsilon(t)$. When $n = 1$ is the model then

$$y(t) + a\,y(t-1) = \varepsilon(t) + c\,\varepsilon(t-1), \qquad |a| < 1, \quad |c| < 1 - \delta. \qquad (3.3)$$

We indicate why $n = 1$ will be preferred to $n = 0$, the true value, when $C_T$ is uniformly bounded. The choice between the two values will be based on

$$\log\hat{\sigma}_1^2 + 2C_T/T - \log\hat{\sigma}_0^2 = -\log(\hat{\sigma}_0^2/\hat{\sigma}_1^2) + 2C_T/T,$$

so that $n = 1$ will be preferred when

$$A_T = T\log(\hat{\sigma}_0^2/\hat{\sigma}_1^2) > 2C_T.$$

Consider Fig. 1.

[Figure 1: the $(a, c)$ plane, with $a$ on the horizontal axis running from $-1$ to $1$ and $c$ on the vertical axis, showing the lines through $\pm(1-\delta)$ and the diagonal.]

Figure 1. The region of optimisation for $n = 1$ is that below and above the lines through $\pm(1-\delta)$, excluding the diagonal,
the l i k e l i h o o d c o u l d be at the boundary. maximum likelihood estimates that
(~,~)
it may be s h o ~
m o v e s to the diagonal.
but the m a x i m u m of
In fact if
Thus
that AT
that d i a g o n a l b y
lal < i o g { ( 2 - 6 ) / ~ } 1T
= 4,
let us say.
s t a t i o n a r y r a n d o m f u n c t i o n of
(a - c) ÷ 0,
Fig.
~ = log{(l+a)/(l-a)}
6,
i.e.
~(~)2 ~
so
i.
where
Let us
so that
Then this function,
is e v e n t u a l l y the m a x i m u m v a l u ~ is
are the
is e v e n t u a l l y the
m a x i m u m of a f u n c t i o n d e f i n e d on the d i a g o n a l of parameterise
a, 6
~(s)
of w h i c h is a
t a k e s the p l a c e of
t
in our p r e v i o u s
considerations
spectral d e n s i t y ~(u)
will,
as
~ + 0 el.
so that
Thus
A
(a,c)
as follows.
becomes
large v a l u e s
which will m a k e o p t i m i s a t i o n interpretation
continuously.
-~ < ~ < =.
its m a x i m u m for i n c r e a s i n g l y that approach
but v a r i e s
{cosh ~ } - i
will
increasingly of
u
(i,i)
has
that
large,
i.e. v a l u e s
approach
difficult.
~(u)
It is e v i d e n t
or
take
of
a
(-i,-i),
This r e s u l t has a n o t h e r
It is a p p r o x i m a t e l y
true that
~i
is the
minimum value of
-~{Id(~)12/lw(eiC~) l2}
IV
where that
W(z)
=
(z-c)/(z-a).
a-S ~ 0.
If
Iw(ei~)12 ÷ i,
a,S
(e,a) = (I,i)
than
does,
0 (for -i)
then
IW(ei~)12
or
(3.4)
a "notch"
If
a
reduced
~n
or
0.
bf
Where
Id(~)l 2
This f u n c t i o n
near
0,
en.
is i n c r e d i b l y
local minima to find.
to
(3.4) and the a b s o l u t e
This c o r r e s p o n d s
local maxima and minima, neighbourhoods the function
of
~,e
+i) or
IW(ei~)12 ~).
Thus
the n o t c h will by the shape of irregular that w i l l
for
(for
~
m i n i m u m m a y be very d i f f i c u l t 2 ~(u) will have many
small)
into small
because
of the n a t u r e of
e = log{(l+a)/(1-a)}. situation,
essentially
the same. in that
i.e.
that for general
It must be e m p h a s i s e d T
n, q, p, that
(3.2)
is is very
may need to be very large before
it is
relevant.
Notes on References. suggested in A k a i k e
The procedures (1969),
and above are in H a n n a n relating to
AIC
Rissanen
(1980),
described (1983).
(1981),
w h e n there is no true
Hannah and K a v a l i e r i s
(1980).
T
give
to the fact that
w h i c h w i l l be c o m p r e s s e d
a = ±i
The general "asymptotic"
for
faster
(for
(since
shape will be is d e t e r m i n e d
so that there w i l l be many v a l u e s
el
±n
be and what its precise large,
We know that
then
goes to
zero at
at other values
at
d(~)). el
so that it seeks to move
becomes
is f u r t h e r
for
The m e t h o d of m a x i m u m
(3.4),
(-i,-i).
to unity u n i f o r m l y
develops
2(i)
away f r o m
a-a ÷ o.
IW(ei~)l-2
a n d thus
will converge
as
to m i n i m i s e
towards e
(See s e c t i o n
remain b o u n d e d
uniformly,
likelihood attempts
(3.4)
de
in this s e c t i o n were The results
(1984). nO
in
(3.2)
For the results
see S h i b a t a
(1980),
4. Rational Transfer Function Approximation
In this section a b r i e f theory c o n c e r n i n g
the a p p r o x i m a t i o n
by a p p r o x i m a t i n g
to
H
less c o n c e r n e d w i t h methods
account w i l l be given of some d e e p of the true structure
by a Hankel matrix
theory may c h o o s e
relate mostly
to the case where
W(z)
by a
possible
W(z)
for
n
finite
in the Hankel norm, w h i c h
singular value norm) to another.
for
H
from
has Thus
)',
past.
The
structure
~ ~).
matrices being (5.5).
describes
space on w h i c h
By i.e.
R.
H
(I
Q ~)
as the
operates
(1.2).
space
F(t)'
= 0,
(4.1)
(j,k)th
block,
Wj = 0,
of the f u t u r e on the
is therefore matrix
of
e n d o w e d with a e t,
namely
definition H
of tensor p r o d u c t
operates
blocks see b e l o w
is endowed w i t h a m e t r i c
m a t r i x of
Yt+l'
namely
that
block
is the
t'th
the c a n o n i c a l
= F(j-k)'
Fourier coefficient factorisation
Let this f a c t o r i s a t i o n
(This n o t a t i o n
(e(t)',e(t+l)',...)'
a b l o c k diagonal m a t r i x w i t h the diagonal
E{y(t+j)y(t+k) '} = F(k-j) Since
(or
we mean the T e n s o r p r o d u c t of the two
The space to w h i c h
we c o n s i d e r
norm
yt = ( y ( t - l ) ' , y ( t - 2 ) ' , . . . ) '
given by the c o v a r i a n c e
For the general
(j,k}th
)'
the d e p e n d e n c e
structure given by the c o v a r i a n c e with
is as small as
from one H i l b e r t
et =
E ( e t et+l)
Wj_k,J,k=l,2,... (4.1)
metric (I
H-~
(i.I),
Yt+l = Her + Ket+l' K
so t h a t
is the E u c l i d e a n
(y(t)',y(t+l)' ....
as is e a s i l y c h e c k e d
j < 0.
The idea is to a p p r o x i m a t e
as an o p e r a t o r
e t = (e(t-l) ',e(t-2)',...
where
The
To see w h a t is i n v o l v e d put Yt =
Then,
(Readers
this section.)
there are no inputs and
only that case w i l l be d i s c u s s e d here. to
of finite rank.
to "skip"
be
f(-m)
matrix of
f(--a)
of this, as for f(m) in -I~ -iv --iw * = (2~) W(e )~W(e ) .
is in a g r e e m e n t w i t h that in section 2 b e c a u s e
is the s p e c t r u m of the time r e v e r s e d process.)
Here
W(z)
f(-w]
= E WjzJ
and
det W + 0,
block,
W(j}
Izl > I.
Let
= 0, j < 0.
W
have
W(k-j)
(j,k)th
as the
Then
s = ( i • ~-%) w - 1 ~ ( z ~ n ½) operates from £2 to £2' sequences al,a2,.., with produc~
(a,b)
decomposition is upper
= Zajbj.
Thus S
triangular
The blocks,
Sj+k_l,
S
and for
z = exp i~ If
Toeplitz,
singular
by the matrix
matrix
so that
W-IH
then that
it is easily
q = 1
In the scalar
whose
W -I
is of Hankel
value
because
then
case,
f(-~)
q = I,
Thus we write
in the typical
form.
(j,k) th
place
function (4.3)
checked
= f(~)
we shall
w(z).
W
is also of matrix
= ~-½ W ( z ) -I W(z -I) n ½
matrix. letters.
S
j,k = 1,2,...,
are g e n e r a t e d S(z)
it is
is also a Hankel
and block
It follows
in
where £2 is the space of all ZIajl 2 < ~ and with the ihner
is sought.
that form.
(4.2)
that this is a unitary
so that in future
In this case
~ = ~
and
W = W.
use lower case
therefore
(4.3)
becomes
s(z) = w(z-l)/w(z) which
is o b v i o u s l y
of modulus
1
for
has real coefficients.
Of course
analytic
for
However
analytic
part,
The singular that operator
Izl ~ I. i.e.
value
Introduce
unlike
since W(z),
only the c o e f f i c i e n t s
the coefficients decomposition
of
to be appropriate,
S = 2 pjnj~j, 1
z = exp i~
S(z),
e.g.
of S
z 3,
j > 0,
is of the form
w(z) is not
of the
occur in
S.
(assuming
compact)
njnk = ~j~k = ~jk
Pl ~ P2 ~ "'" ~ 0.
the new random variables uj ~ nj
(I
@ ~½)
*
(I
@ n -½) e .
xj = ~j
W -I Yt+l t
Then E(uju k) = E(xjx k) = 6jk; The occur
uj,
xj
m i g h t be called
in the classical
analysis
as functions
theory
E(ujx k) = 6jkP j.
"discriminant
functions"
of statistical
canonical
that are used to c l a s s i f y
since they correlation
individuals.
The
pj
themselves
e s, s ~ t, canonical
w o u l d be c a l l e d
spans the same correlations
w o u l d be o b t a i n e d with the metric yr.
if
"canonical
space as do the
and the same S
uj
determined
(at least for
Hankel norm a p p r o x i m a t i o n given
n.
virtues
to
H
these
for the A R M A case.
H.
block,
Call
that such a c a n o n i c a l
r(v,j)
the
v=l,2, .... r(v,j),
after f i r s t
Then
by the f i r s t j'th
n
row,
of
it is
the b e s t W(z),
for
since the in a
to estimate
ideas h a v e b e e n used by Akaike
representation
is chosen as c o n s t i t u t e d
to
of h a v i n g
survey,
It will be r e c a l l e d
matrix
is known
into that here
that we shall b r i e f l y (1.6),
6
xi)
from a space
are b y no means evident
c o n t e x t nor are the e f f e c t s However
of
and e q u i v a l e n t
We shall not enter further
pj, uj, xj.
the same
by the c o v a r i a n c e
q = I) to d e t e r m i n e
of such an a p p r o x i m a t i o n
statistical
s ~ t,
as an operator
Once the singular value d e c o m p o s i t i o n
possible u n i q u e l y
of
yS,
Since
(but not the same
were c o n s i d e r e d
structure
correlatlcns".
introducing
in a way
a canonical
form is a t t a i n e d linearly
the
if
H0
independent
j = 1,...,q,
rows
in the v'th
such a set of rows is always of the form
v = 1 ..... nj;
j = 1 ..... q;
Znj = n.
(4.4)
The
n. are known as the K r o n e c k e r indices. They uniquely ] determine these first linearly i n d e p e n d e n t rows of H . There is 0 a c o r r e s p o n d i n g unique f a c t o r i s a t i o n of W(z) = A(z-l)-ic(z-l), w h e r e A(z -I) = ZA(z),
C(z)
= ZC(z)
and
in the j'th place in the diagonal. n o m i a l s with monic,
A having diagonal
Z
is diagonal with
A, C
are m a t r l c e s
elements
of degree
i.e. have unity as the c o e f f i c i e n t
= C - A
the d e c o m p o s i t i o n
is u n i q u e l y
of
znJ.
nj
z-nJ or p o l y w h i c h are
Putting
d e f i n e d by the i n e q u a l i t i e s
on degrees deg aij
< deg ~jj, j + i;
deg ~ij
~ deg aii'
j ~ i;
deg aij < deg aii" deg eij < deg ~ ii'
i,j = 1,2 .... ,q.
A k a i k e ' s m e t h o d leads to estimates y~t = (y(t)', y ( t - l ) ' , . . . , y ( t - h ) ' ) ' fitting an a u t o r e g r e s s i o n minimising
BIC
or
AIC.
j > i
of the
nj
and of
A.
Put
where
h
m i g h t be c h o s e n by
and d e t e r m i n i n g
h
as the order
Put,
for
£ = 0,1,...;
m = 0,1,...,q-l,
y£m(t) ' = (y(t+l} ', y(t+2) ', .... y(t+£) ', Yl(t+£+l),..., Ym(t+£+l) ) '. If the smallest
nj
is for
j = m
and
nm = £
then row
(see (4.4)) is linearly dependent on earlier rows of correspondingly y£,m(t)
H
r(E+l,m) and
(see (4.1)) there will be some linear function of
that is orthogonal to the
past,
while this will not be
true for £i < £ or for £I = £' ml < m. we consider the solutions of ~J[DjI£q+m - ~£,m~£,m ]"
To judge when this is so
01 > D2 > "'" > 0£q+m
1 ~£,m = {T Zy£,m(t)Y£,m(t),}-½ T1 Z ~ , ~ t ) (gt) ,{~l zgt(gt),}-% where the summations are over £q+m 4 hq.
The
~j
h+l < t < T-£-I.
It is assumed that
are the canonical correlations between the
Y£,m(t) and 9 t. Successively examining these canonical correlations (ordering (£,m) in dictionary order, first according to £ and then
m)
we stop when,
for the first time
_(T_V£,m)log (l-~£q+m) 2 - ~£,m > 0; If this happens at eliminate
£(i), m(1)
Ym(1) (t+£(1)+j),
then
j > 0,
nm(1)
~£,m = q(h-£)-m+l. is put at
from all future
£(i). Y£,m
Now
and
continue, always taking 9£,m as qh-dim y£,m(t) + i. Once nm(1) is determined we eliminate Ym(2) (t+£(2)+j), j > 0, from future y£,m(t) and continue and so on. In this way all nk are determined and with each will be associated an ~(k), which is the ~j for the smallest ~j at the step when nk was determined. ~j is determined only up to a scalar factor and that is fixed in ~(k) making the last element unity. of the estimate of to yj(t+v) determined,
A(z)
Now
~(k)
so that the element of
nk
in canonical form corresponding to
available.
~(k)
k'th
by row
corresponding
in y£,m(t), for £,m at the values where nk was is the coefficient of z v-I in the estimate of ak,j(z).
Thus at the end of the calculation the A(z),
determines the
and
estimate
~
of
the Kronecker indices, are
It is then necessary to estimate $C(z)$. This would be done by forming $\hat{A}(z^{-1})y(t)$ and using the calculated autocovariances of this to estimate those of $C(z^{-1})\varepsilon(t)$. Then an estimate of the spectrum will be obtained and factored to find an estimate of $C(z)$. Since $A(0) = C(0)$ and the row degrees of $C(z)$ are prescribed by the degree inequalities this would have to be done carefully and would not be a trivial calculation for $q > 1$. In any case these estimates of $A(z)$, $C(z)$ are inefficient but could be used to initiate a minimisation of (2.5), in the form for (1.5) corresponding to the canonical choice of $H_0$ and the $n_k$. We do not proceed further with the description because there are problems with the method. It is, so far, restricted to the ARMA case. The $n_k$ are determined in an inefficient estimation procedure and no later adjustment of them has been suggested. However the method is of interest because of its association with the theory of the first part of this section.

Notes on References.
Adamyan, Arov and Krein (1971), Glover (1983) and Jewell and Bloomfield (1983) deal with the theory of Hankel norm approximation. Jewell and Bloomfield (1983,a) suggest, for $q = 1$, that the canonical correlations be found directly from $s(z) = w(z)/w(z^{-1})$, which is to be obtained by factoring a spectral estimate. Akaike (1969,a) presents his method. For some estimation of a moving-average model see Hannan (1970).

5. A Gauss-Newton Procedure
(i) First the case $q = 1$ will be discussed because this is important and the calculations are then quite feasible. The idea is to use a Gauss-Newton procedure to approximate to the true ARMAX structure but to include $n$ in the estimation. At each iteration this is to be done recursively. Thus consider

$$\frac{1}{T}\sum_{1}^{T}\varepsilon_\tau(t)^2, \qquad \varepsilon_\tau(t) = c_\tau(z^{-1})^{-1}\{a_\tau(z^{-1})y(t) - b_\tau(z^{-1})u(t)\}. \qquad (5.1)$$

Here $a_\tau$, $b_\tau$, $c_\tau$ are the transfer functions in the ARMAX model for $q = 1$, and $\tau$ is the vector of system parameters for given $n$, i.e. the $3n$ freely varying coefficients in $a_\tau$, $b_\tau$, $c_\tau$. Here, again, we use lower case letters for the scalar case. Note that $b_\tau$ is, in general, a row vector since we do not require $p = 1$. The $\varepsilon_\tau(t)$ are functions only of $w_\tau(z)^{-1} = c_\tau(z^{-1})^{-1}a_\tau(z^{-1})$ and $w_\tau(z)^{-1}\ell_\tau(z) = c_\tau(z^{-1})^{-1}b_\tau(z^{-1})$. The procedure is to linearise these functions about a previous estimate, which reduces the minimisation of (5.1) to a linear problem. As has been said the procedure is Gauss-Newton but includes $n$ in the optimisation. It is necessary to obtain a first estimate from which to commence the iteration. This is done by taking $c_\tau \equiv 1$ and choosing $a_\tau$, $b_\tau$ by autoregression. We go on to describe the algorithm.

Step 0. Put

$$v(t) = \begin{pmatrix} y(t) \\ -u(t) \end{pmatrix}, \qquad t = 1, \ldots, T,$$

and use the Levinson-Whittle algorithm. Let $\hat{\sigma}_n^2$ be the top left hand element of $S_n$. Choose $\hat{n}$ to minimise

$$\log\hat{\sigma}_n^2 + n(p+1)\log T/T.$$

Let the first row of $F_{\hat{n},j}$ be called $(\hat{a}_j, \hat{b}_j')$ where $\hat{a}_j$ is scalar and $\hat{b}_j$ has $p$ elements. Then $\hat{a}_j$ is the $j$'th coefficient in $\hat{a}(z)$ and $\hat{b}_j$ is the $j$'th coefficient vector in $\hat{b}(z)$.
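Step 0 can be illustrated by the sketch below. It is not the Levinson-Whittle recursion of the text: as a simplifying assumption it fits each order by ordinary least squares over a common sample, which is slower but gives nested fits that are easy to check. The function name `step0_arx` and its arguments are hypothetical. The sign convention follows (5.1) with $c_\tau \equiv 1$, i.e. $y(t) + \sum_j a_j y(t-j) = \sum_j b_j' u(t-j) + \varepsilon(t)$.

```python
import numpy as np

def step0_arx(y, u, max_order):
    """Choose n by log(sigma_n^2) + n(p+1) log T / T over least-squares ARX fits."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]                      # a single input is treated as p = 1
    T, p = u.shape
    target = y[max_order:]                  # same sample for every order
    best = None
    sigma2 = []
    for n in range(max_order + 1):
        # Regressors -y(t-j) carry a_j; regressors u_k(t-j) carry b_j.
        cols = [-y[max_order - j:T - j] for j in range(1, n + 1)]
        cols += [u[max_order - j:T - j, k]
                 for j in range(1, n + 1) for k in range(p)]
        if cols:
            X = np.column_stack(cols)
            coef = np.linalg.lstsq(X, target, rcond=None)[0]
            resid = target - X @ coef
        else:
            coef = np.empty(0)
            resid = target
        s2 = float(resid @ resid) / len(target)
        sigma2.append(s2)
        value = np.log(s2) + n * (p + 1) * np.log(T) / T
        if best is None or value < best[0]:
            best = (value, n, coef[:n], coef[n:].reshape(n, p))
    _, n_hat, a_hat, b_hat = best
    return n_hat, a_hat, b_hat, sigma2
```

The returned `a_hat` and `b_hat` play the part of the $\hat{a}_j$ and $\hat{b}_j$ that initiate step 1, with $\hat{c} \equiv 1$.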
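The basic algorithm is now given by step 1 which is repeated until convergence. To commence step 1 one needs estimates $\hat{n}$, $\hat{a}$, $\hat{b}$, $\hat{c}$. These will initially come from step 0, with $\hat{c} \equiv 1$.

Step 1. Define $\hat{\varepsilon}(t)$, $\hat{\eta}(t)$, $\hat{\xi}(t)$, $\hat{\zeta}(t)$ by

$$\hat{c}\,\hat{\varepsilon}(t) = \hat{a}\,y(t) - \hat{b}\,u(t), \qquad \hat{c}\,\hat{\eta}(t) = y(t), \qquad \hat{c}\,\hat{\xi}(t) = \hat{\varepsilon}(t), \qquad \hat{c}\,\hat{\zeta}(t) = u(t),$$

$$y(t) = \hat{\eta}(t) = \hat{\xi}(t) = \hat{\zeta}(t) = \hat{\varepsilon}(t) = 0, \qquad t \leq 0,$$

where $\hat{a}$, $\hat{b}$, $\hat{c}$ stand for $\hat{a}(z^{-1})$, $\hat{b}(z^{-1})$, $\hat{c}(z^{-1})$. Put

$$v(t) = \begin{pmatrix} \hat{\eta}(t) \\ -\hat{\zeta}(t) \\ -\hat{\xi}(t) \end{pmatrix}, \qquad t = 1, 2, \ldots, T,$$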
Fn,j' Fn,j' Sn'
and use the Levinson-Whittle recursion to generate
Sn" Put n
£n(t) = ~Fn,jV(t-n+j)
Sa(1)\ n,n~ (1)| = Sn_11 ~n,n[
~(l~ I n an /
<~ (1}.~
/~(1)
- T7{~(t)-~(t)+~(t) TI 1
I n-l,j\
n,sl
| n-l,jl
~(z)/
~^(l)
n,j!
a2
n tn~
+ Fn_l(n-j)'
/
n,O
^2
s(l~l
n,O
.^(l)
•
n ,n|
j
=
a(1)l n in!
\Cn-l,j/
n,O
<~
(1)\
\
n,~|
(5.2)
} 9n_l(t-l)
~(l)'
n = °n-i - ~an,n'
n,n '
6(11) n,n- Sn-i
/
n,n~
I~(11~ ]
n,n I
\ n,n/ ^2
oo
:
1 T
s{~ft)
1
- ~(t)
+ ~(t)}2
1,2,...,n-l.
Choose
~,i, Ar
to minimise
log ~2 + n(p+2) and set
~ = ~%1;
aJ
log T/T
a(t) = Z~jz ] •
#
5. = ~ ( l )
= ~(I) n,J
'
3
= zS~zJ
C(Z) = ZGjzZ• ~(z) ~j = ~(I)
n,j
~,j •
'
and proceed to repeat step 1. Cease the iteration when the minimised value of $\log\hat{\sigma}_n^2 + n(p+2)\log T/T$ stabilises.

We make a number of remarks.

1.
If there is a true rational
algorithm will provide as
T
increases.
(see section parameters,
7)
transfer function
estimates
that converge
system then this
to the true
Under fairly general conditions T½(T--T0) ,
TO
2sl Z
values e(t)
being the true vector of system
will be asymptotically
could be estimated
on the
normal.
The covariance
matrix
as (t-l)
.[v(t)',v(t-1)'• .... v(t-fl))} -I
V where
2 2 is o~ at the last iteration and n used at the last iteration. A more efficient
v(t)
is that vector
estimate
of the
covariance matrix would he obtained via the aj, ~J' °''3 j=l ..... n• ~2, a t the last iteration but we omit the formulae for brevity. 2.
It is not fully apparent that it is best to use BIC and some would argue that in a situation where it is unreasonable to believe in a true rational transfer function system then AIC should be used. Much depends upon the end purpose of the analysis.

3. Though the Levinson-Whittle recursion has been used above this could be replaced by other recursive calculations of the form discussed briefly in sub-section 2(ii). An alternative would be finally to effect one iteration of an algorithm to optimise (2.5) at $n = \hat{n}$, initiating with $\hat{a}_j$, $\hat{b}_j$, $\hat{c}_j$, $j = 1, \ldots, \hat{n}$, $\hat{\sigma}^2$.

4. One problem with the algorithm is that unless $\hat{c}(z) \neq 0$, $|z| \leq 1$, then it will fail at step 1 since the quantities obtained there by filtering with $\hat{c}(z^{-1})^{-1}$ ($\hat{\xi}(t)$ etc.) will grow exponentially with $t$. What should be done is to reflect those zeros of $\hat{c}(z)$ that are inside the unit circle in the unit circle.
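As an illustration of what reflection achieves, the sketch below reflects the zeros directly, by finding the roots of $\hat{c}(z) = \sum_j \hat{c}_j z^j$ numerically. This is a different route from the factorisation described in the text and is offered only as a simple check; the function name `reflect_zeros` is hypothetical, and the coefficients are assumed real with $\hat{c}_0 = 1$ and no zeros exactly on the unit circle.

```python
import numpy as np

def reflect_zeros(c):
    """c = [c_0, c_1, ..., c_n], coefficients of c(z) = sum_j c_j z^j with c_0 = 1.

    Zeros with |z| < 1 are replaced by 1 / conj(z); the result is rescaled so
    that its constant term is again 1.
    """
    c = np.asarray(c, dtype=float)
    roots = np.roots(c[::-1])                      # np.roots wants highest power first
    reflected = np.where(np.abs(roots) < 1.0, 1.0 / np.conj(roots), roots)
    poly = np.real(np.poly(reflected))[::-1]       # back to ascending powers
    return poly / poly[0]
```

Reflection changes $|\hat{c}(e^{i\omega})|^2$ only by a constant factor, so the fitted spectral shape is unaltered while the recursions at step 1 become stable. For example, `reflect_zeros([1.0, 2.0])` moves the zero at $z = -0.5$ to $z = -2$ and returns the coefficients of $1 + 0.5z$.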
This may be
done as follows.
Form
^2 ~ -j 7. afl,jZ
Z C,, .Z j
O
0
0
which will be factored
as
~2 ~ -j 7. ~^ .Z 0 n,J where now
n,3
Z c^ .z j 0 n'3
7~^ .z j + 0, n,3
Izl ~ i.
To achieve
n-] p(j) =
this form
n ~2
7 ~^ ^ k=0 n'j+kCn'k
/ Z 0
n,j
and put !~
!~
, =
~j+l = 2 j + 2 j' ~
=
6(i)
6(2)
6(k)
~j,
c^
n,]
=
algorithm
c(z)
of
6.. 3
c~,j
5.
In some applications
test shows
~
:
:
:
0
...
~ (0)
6 (£~ 6 (n-l)
/
is the limit of
step i, on each
to check the location
algorithm)
that there
by taking
at each iteration
An alternative
i.
before
cheaper
by allowing
for example
to vary arbitrarily
and use the above
are zeros
the degrees
of steps
has been worthwhile.
At the first iteration
of
a(z), c(z)
using
b(z),
lower.
1 to 4, one can allow
but this will be much more
always
inside
in the use of
the degree of
would be to proceed
after the last iteration, elimination
•.. • ..
it may be felt that economy
may be e f f e c t e d
In principle,
If
at step
(by a Schur-Cohn
= i.
to differ,
6(1) 6(0)
o
this calculation
only when the
parameters
+
then
Izl
6.
...
it may by c o m p u t a t i o n a l l y
of the zeros of
costly.
0
6c~)/~(0).
Instead of performing
degrees
o
is used in place of
occasion,
c(z)
...
is the k'th element
this sequence,
cn,j
...0)
°
6}~)
Now
(i,0,0,
= 2(l,p(1) ..... p(~))G-l(6j)
G(6j)
Here
~0
BIC
computationally
by eliminating
terms
to determine
whether
We omit details.
of step 1 then
these
c(z)
~ 1
and hence
only this
~(t)
~ y(t),
~(t)
= eCt),
~(t) ~ u(t).
Computationally more
e f f i c i e n t a l g o r i t h m s h a v e b e e n g i v e n for this case, w h e n
u(t)
does
^
not Occur, w h i c h e x p l o i t the fact that to
y(t-j),
j=l,...,n,
and that
e(t)
e(t-j),
o r t h o g o n a l and c o u l d be t r e a t e d as such.
is T o e p l i t z o r t h o g o n a l
e(t-k)
are a p p r o x i m a t e l y
It is p o s s i b l e t h a t a m o r e
e f f i c i e n t i m p l e m e n t a t i o n may also be found at later iterations. m a y be m e n t i o n e d that in c o n f i n i n g s u m m a t i o n s we have treated
y(t),
t h a t assLunption
for
7.
u(t)
At step 1 from the
This is b e c a u s e
so that
It
t=l,2,...,T
t < 0,
but have a v o i d e d
This seems p r e f e r a b l e .
first i t e r a t i o n
p r o c e d u r e one may r e p l a c e n > n.
as zero for
t > T.
to
(i.e. repetition)
~(t)-~(t)+$(t) 6 (z-l)~(t)
a(z-l)~(t)-&~(t)=0
and
by
e(t)
implies that
~(t)- ~(t)
is,
of the
in
(5.2)
for
a(z-l)~(t) for
n > n
= e(t) a
linear c o m b i n a t i o n of the v a r i a b l e s in the regression. W h e n this ~(1) ~(1) i s done t h e ~ ., must be r e g a r d e d as a d j u s t m e n t s t o t h e n,3 n,] previous a~,j, c~,j i.e. m u s t be added to these. (ii)
Now consider the vector case, which is more elaborate. First return to the set, $M(n)$, of all systems, (1.3), of McMillan degree $n$, for given $\Omega$. (We fix $\Omega$ for the moment only, because it is the system parameters $\tau$ that need discussion.) $M(n)$ is equivalently the set of all Hankel matrices $H$ of rank $n$ (for $W(z)$ obeying the requirements below (1.1)) and of all pairs of transfer functions $W(z)$, $L(z)$. It may be conceptualised as a smooth surface of dimension $n(2q+p)$ and, technically, is an analytic manifold. A reasonable approach to estimating a system would therefore be to determine $n$ and then the appropriate point on $M(n)$, and this is what was done for $q = 1$. This is however a problem for $q > 1$ because $M(n)$ cannot then be covered by one neighbourhood that may be mapped homeomorphically into Euclidean space. $M(n)$ is the union of all systems whose Kronecker indices sum to $n$ and hence an alternative to the consideration of $M(n)$ is the determination of the Kronecker indices, as was the technique used in section 4. There is, however, something very arbitrary in the decomposition of $M(n)$ into sets corresponding to different partitions of $n$ as a sum of $q$ integers, and the effort required for an efficient procedure to discover these is fairly considerable.

Amongst the set of Kronecker indices summing to $n$ there is one special set, namely those which, for $n = qh + m$, $0 \leq m < q$, are of the form

$$n_1 = n_2 = \cdots = n_m = h + 1, \qquad n_{m+1} = \cdots = n_q = h.$$

Then the first linearly independent rows in $H$ are just the first $n$ rows. If $U(n)$ is the subset of $M(n)$ for which these rows are linearly independent then $U(n)$ is open and dense in $M(n)$. Thus little or nothing is lost in restricting attention to $U(n)$. It is most unlikely that the maximum of the likelihood in $M(n)$ will be found off $U(n)$. (However $U(n)$ would provide a bad coordinate system in which to work if the maximum was near the edge.) We describe $U(n)$ in another way by giving a unique description of $A(z)$, $B(z)$, $C(z)$ in $W(z) = A(z^{-1})^{-1}C(z^{-1})$, $L(z) = A(z^{-1})^{-1}B(z^{-1})$ for a system in $U(n)$. We do this by describing the coefficient matrices $A_{n,j}$, $B_{n,j}$, $C_{n,j}$ in $A(z)$, $B(z)$, $C(z)$.
These will be depicted
indicating a freely varying are after the
All partitions
m'th row or column
A n ,0
= [~m Oq_m] ' Bnw0 = O, An , 1 = [[ 0. ]
Cn, 0
An,h+l' All other
below with a star
submatrix of elements.
Bn,h+l'
Cn,h+l =
An,j, Bn,j, Cn,j,
j ~ h+l,
are unrestricted.
We do not mean that $A_{n,h+1}$, $B_{n,h+1}$, $C_{n,h+1}$ are equal. The vector $\tau$ of system parameters coordinatising $U(n)$ is of dimension $n(2q+p)$ and is made up of the freely varying elements in the coefficient matrices.

We now go on to describe how to estimate $n$, $\tau$ and $\Omega$. We do this by a series of steps that are related to those for $q = 1$ but are more complicated. Steps 0 to 1 are not repeated. Only step 2 is iterated. Always the output from the previous step is the input to the next so we do not indicate those by a special notation, i.e. we do not for example write $\hat{A}_j^{(1)}$ for the $\hat{A}_j$ matrix found at step 1 since it is clear which $\hat{A}_j$ is used at step 2, i.e. that from step 1 and not step 0. Also we shall now index the stages in the Levinson-Whittle recursion by $h$, rather than $n$ as before.

Step 0. Put

$$v(t) = \begin{pmatrix} y(t) \\ -u(t) \end{pmatrix}, \qquad t = 1, \ldots, T,$$

and use the Levinson-Whittle recursion. Let $\hat{\Omega}_n$ be the top left hand $q \times q$ submatrix of $S_h$, $h = 0, 1, 2, \ldots$, where $n = qh$, and choose $\hat{h}$, i.e. $\hat{n}$, to minimise

$$\log\det\hat{\Omega}_n + n(q+p)\log T/T, \qquad n = hq, \quad h = 0, 1, \ldots$$

Let the first block of $q$ rows in $F_{\hat{h},j}$ be called $[\hat{A}_j, \hat{B}_j]$ and let $\hat{\Omega}_{\hat{n}}$ be called $\hat{\Omega}$. Then $\hat{A}_j$, $\hat{B}_j$ are the coefficient matrices in $\hat{A}(z)$, $\hat{B}(z)$, $j = 1, \ldots, \hat{h}$, with $\hat{A}_0 = I_q$, and $\hat{C}(z) \equiv I_q$.

Step 1.
Put 8(t)
= Z Ajy(t-j) 0
- Z Bju(t-j) 1
and
v(t) =
|-u(t)J
k-act)/
and use the L e v i n s o n - W h i t t l e element of
Sh
is c a l l e d
algorithm.
~n'
Again
n = hq
the top left hand
and we choose
h
i.e.
to m i n i m i s e log det ~ Now
IAj, Bj, 6 9 ]
coefficient
Now
m
= ~-i
+ n(2q+p)
n
are the top
matrices
in
to the case
~-i
or equal to the true
in
in
C(z),
and p r o v i d e
with
A0 = B0 = I
n = £q+m,
for
0 ~ m < q.
m = 1,2,...,q-l,
at step i,
£,
T
is to insert
(5.3) and the elements
transfer
function
procedure,
in the a p p r o p r i a t e
a l g o r i t h m will be used for of
m
need be taken.)
n =
(h-l)q + m 1 < j < m
m.
An, 0, Cn, 0.
the c a l c u l a t i o n
we regress
Yj(t)
here
at (see
t h e m as a r e g r e s s i o n
(It is u n l i k e l y
and then only
The r e g r e s s i o n
other variables,
that we d e s c r i b e If
q > 5
places
c h e a p l y using the c a l c u l a t i o n s
It is simpler to d e s c r i b e
one for each value of
than
We c o n s i d e r
done at step i.
step 1 but the details are too c o m p l i c a t e d to be d e s c r i b e d the references).
m=0
will be greater
indicated by a star in
This can be done c o m p u t a t i o n a l l y
for
was p r e f e r r e d
our procedure.
then are a l r e a d y
zero e l e m e n t s
h
h
the
We c h o o s e
since
to w h i c h
at step 1
which e x p l a i n s
but the c a l c u l a t i o n s
h=0,1,2,...
F~,j
If there is a true r a t i o n a l
system then for large e n o u g h
The p r o b l e m
n = hq,
rows in
B(z),
and need only compute
by the criterion.
m = q
q
A(z),
has to be determined,
corresponds
log T/T,
4
that the
or fewer v a l u e s
is of a v e c t o r v a r i a b l e
but is c a r r i e d for a typical
on
out row by row so row,
j,
j = l,...,q.
on the f o l l o w i n g v a r i a b l e s ^
(i) and
- Yk(t-i), i = 2,...,h
for
k=l,...,q;
where
k = m+l,...,q.
i = l,...,h
for
k ~ m
(2}
Uk(t-i),
k = l,...,q;
i = 1 .... ,~
(3)
ek{t-i),
k = l,...,q;
i = 1 .... ,h
where ^j^ Z C4e(tj) = 0
g
g
^j Z A4y(tj) 0
Z Bju(t-j ), 1
A
y(t) FQr
m < j < q
= u(t)
we regress
= e(t) yj(t)
(i)
-Yk(t-i),
(2) (3)
-(Yk(t) - Sk(t)), uk(t-i), ek(t-i),
The coefficient regression ek(t-i)
of
-Yk(t-i) aj,k(i) ,
in relation as
or
-(Yk(t)
in
A(z) C(z).
from the
choosing
to m i n i m i s e
q
m = q
the left
end of step
I).
Now we have
an
indicated
by
As was said above
for
Step 2.
q2, q2
n=(~-l)q+m, products
of
is chosen by
n = q(~-l)
with
+ m,
m = i, .... q.
expression
the latter
0, 1 are not repeated.
will be necessary,
F o r m matrices
respectively
~(z)
~n'
uk(t-i),
at the
of the form
n = ~.
steps
often no repetition
n
j'th
for
and cross
Now
log T/T,
B(z),
is the
The matrix
side is just the m i n i m i s e d
n, A(z),
(5.3)
- ek(t))
and s i m i l a r l y
regressions.
log det ~n + n{2q+p) (For
i = 1 ..... ~-i.
by the sums of squares
the residuals m,
i = 1 .... ,h-l.
k = m+l ..... q. k = 1 ..... q,
to • B(z),
T -I
t ~ O.
on
k = l,...,q;
estimate,
is estimated
= O,
q(t), and
~(t),
qp
~(t),
columns
Step
2 may be but
or at most one. of
q
rows
and,
by solving
A
h 0Z ~j [q(t-j},
~(t-j) , ~(t-j)]
=
(y(t)',
(y(t} ', u(t) ', e(t) ') = 0, Here
A
e(t)
is obtained 0Z ~je(t-j)
with the usual product wherein
Iq,(5.4)
t ~ 0.
from
= - 1Z ~ u(t-J)3
initial
u(t)', e(t)')@
conditions.
a typical
block
is
+ By
0Z Ajy^ (t-j) X @ Y
xijY ,
we m e a n
(5.5) the tensor
i = 1,...,a;
j = l,...,b
where
X
is
a x b.
Of course in (5.4)
all blocks are a scalar multiple of column of
n(t), for example, is
X
lq.
is
1 x (2q+p)
Thus the
and
i+q(j-l)th
O(z=l)-iEijy(t),
where
Eij
consists of zeros save for a unit in the (i,j)th place. Put
I [n(t)!] ~v(J ) = ~ Xl-C(t) ~-l[q(t+j), -C(t+j), -~(t+j)] . L-C (t) This matrix is of dimension
$q(2q+p)$. It is to be the matrix (5.6) that is the input to the Levinson-Whittle recursion which is to be carried out. It is, thus, $q(2q+p)$ that determines the computational effort. For $q = p = 5$ this is 75, which already would be a rather large scale implementation of the Levinson-Whittle recursion. In cases where $q$ is larger it may be necessary to use some other expedient and we discuss this in remarks below. Let
~(t)
be the vector obtained by adding columns numbered
i + q(i - I),
i -- l,...,q
in
n(t)
and similarly for
~(t), ~(t)
in relation to ~(t), u(t). (It is ~(t), ~(t), ~(t) that correspond most closely to the quantities defined for q=l.) Thus h 0Z Cjn(t-j) = y(t), 0Z Cj~(t-j) = e(t), 0Z Cj~(t-j)=u(t). ^
^
^
Now form, for each h recursion with (5.6), ^ = ~-I_ ~h,h Sh i
^
value considered in the L e v i n s o n - ~ i t t l e
h-I 7 ~ !Z[n(t-h+j) 0 h-l,j T
-~(t-h+j), {~(t)
Here e(t) q (2q+p).
is as from (5.5).
This vector,
Th, j = Th_l, j + Fh_l,h_jTh, h,
-~(t-h+j)] '
- ~(t) {h,h'
+ e(tl}. is of dimension
j = 0,1,2 .... ,h-l.
To initiate take ~0,0 to have zeros everywhere save for units in the places numbered i + q(i-l), i = l,...,q; q(q+p) + i + q(i-l), ^
i = l,...,q.
NOW the
Th, j
Bj, Cj, for n = hq. Thus element in the i + q(k-l)'th
provide^ estimates of the matrices
Aj,
Ah, j has as estimate of aik(j) the place in ~h,j" ~h,j has as its
(i,k) 'th element the element in place
q2 + i + q(k-l)
while Ch, j has as its (i,k) 'th element that in the {q2 + qP + i + q(k-l)}'th place. Next put,
in
~h,j
~n = T1 Z ~n(t)&n(t)'"
n = hq,
where h h Z th, jen(t- j) = Z A ,jy(t-j) 0 0 h and choose
i.e.
~
h - Z Bh,jult-j) 1
so that this minimises
log det ~n + n(2q + p) log T/T,
Now we seek to estimate
m
in
n = (~ - l)q + m,
as in step I of the algorithm. this as a regression formed at
(5.4) columns
elements
in
numbered
[A(j),
B(j), C(j)]
i+q(k-l),
bik(J)
matrix
Oik(J). element
in addition,
q2+qp+i+q(k-l)
are added,
having been eliminated) Now call
T
order,
where
parameter
comes
column index,
~h,j' the
i+q(k-l),
from k
all columns
i=l,...,m; Xj(t)
columns numbered i=m+l,...,q,
A(j),
j,
B(j)
i,k=l .... ,q for which
to be null.
Thus
k=m+l,...,q
are
except that for the i+q(k-l),
k=l,...,m
to form a matrix of only
to lag,
q(2q+p)
is associated
q2+qp+i+q(k-l),
is prescribed
these parameters
first according
~(t-j)]
aik(J) , the
k=l,...,p
Now eliminate
the vector of estimates
n
n=(~-l)g+m,
~(t-j),
As in forming
Call the resulting matrix
X0(t),
|-n(t-j),
is associated with
in (5.3)
columns numbered
eliminated.
it could be computed using
i=l,...,q;
and the column numbered
the corresponding j=l
step.
to describe
in the sense that the column
i,k=l ..... q
q2+i+q(k-l),
is associated with for
though
Consider
in this matrix are associated with the
column numbered with
m,
(5.6).
from the previous
q(2q+p)
m = 1,2,...,q-l,
Again it will be easiest
for each
the output from the use of
(5.7)
n = hg.
(all others
m(q-m)
of system parameters
are arranged or
C(j),
for
in dictionary
then according
and finally according
columns.
to whether
then according
to row index
i.
the to
Then
~n = {TI ZX(t) '~-ix(t)}-i {~I ZX(t) ,~-l(~(t) + e(t) -~(t))}; x(t) = [X0(t),
We emphasise
Xl(t) . . . .
that
of step 2.
~(t), ~(t)
are all formed using
step, which at the first use of step 2
step I, but later will have been from a previous use
Only
at this step.
].
X(t), ~, ~(t),
the output from the previous will have been
(5.8)
h
has been determined
The notation
~n
in (5.8)
by previous calculations should not be confused
26 ^
with ;fh,~ type
earlier. h,j
~n
is made up of many submatrices
and is of dimension
n(2q + p),
of the
n ~ (h-l)q + m.
We now again put = TI Z e n (t) e^n (t)',
~n
n = (h-l)q + m
(5.9)
h
Z 6 e (t-j) 0 n,3 n where
~n,j'
= Z A (t-j) - Z B ju(t-j) 0 n'JY 1 n,
~n,j' dn,j
the identification
have elements
discussed
before
obtained
(5.7).
~n
according
We choose
~
to
to
minimise log det ~n + n(2q + p) log T/T,
n = (h-l)q + m,
m = 1,2,...,q. (5.10)
Again for
m = q
optimised
(5.7).
the value of this criterion Then
values corresponding of
~
from
~j, ~j, ~j
to
~, i
is
(h-l)q + m
(5.9) that optimised
the
Remarks.
I.
analogues
here.
(5.4)
~
in
(5.5)).
All of the remarks In relation
from these
a
in relation
this factorisation references). 2. and
AA " C(e i ~)~C(e I~)
which does have
det C(z)
are available
Much of the work involved q
begin to be important.
would not be unreasonable step 1.
for
*
[z[ ~ i.
simulations
of
transfer n
function
is improved
at
Iz[ ~ I.
so as to obtain Algorithms
for
(see the
h, m
p
there it
found at
of step 2 the values of
step.
of
and if that
To reduce the calculation
to do them only at the
at the previous
with rational
that the determination
C(z)
is in step 2 where the sizes of
(5.5) have been computed we may move determined
# 0,
criterion
canonically
# 0,
of
but we omit details here
In any case at repetitions
h, m
the description
to the scalar case have
det ~(z)
from the first use of step 2 could be used. (5.4),
is the value
to the use of an estimate
again a problem will arise unless
C(z),
~
~j, ~j, Cj, ~,
This completes
Again this can be checked via a Schur-Cohn fails we should factor
and
to be the
(5.10).
We may now repeat step 2 commencing (which defines the algorithm.
is that which
are finally defined
h,
When that is done, once straight to
(5.8),
(5.9)
However experience with generated
data shows
at the first use of step
2 and it may improve again at later iterations
of that step.
Notes on References. The algorithms here described were first presented in Hannan and Rissanen (1982), Hannan and Kavalieris (1984). The emphasis there was more on order determination. For the determination of $m$ in step 1 (i.e. $q > 1$) an alternative calculation is given in the second reference. For the structure theory at the beginning of subsection (ii) see Deistler and Hannan (1981), for example. The algorithm in remark 4 in subsection (i) is due to Tunnicliffe Wilson (1969) and its matricial version to Tunnicliffe Wilson (1972).
6. Some Theoretical Considerations

This section will be very brief since theory is not the purpose of this account, nor could such theory be fully presented in the space available here. However there seems to be some virtue in indicating the scope of the theory underlying the methods.

In the first place it is not necessary that linear innovations, $\epsilon(t)$, be Gaussian and all of the methods are valid under much more general conditions in the sense that the same theory obtains as if the $\epsilon(t)$ were Gaussian. The essential condition is that

$$E\{\epsilon(t) \mid \epsilon(t-1), \epsilon(t-2), \ldots\} = 0. \qquad (6.1)$$

This is equivalent to the assertion, for $y(t) - \sum L_i u(t-i)$, see (1.3), that the best linear predictor (in the least squares sense) is the best predictor so that the data is, in that sense, generated by a linear system. Asymptotic distributions, if they are to be the same as for the Gaussian case, require additionally that

$$E\{\epsilon(t)\epsilon(t)' \mid \epsilon(t-1), \ldots\} = \Sigma. \qquad (6.2)$$

For $u(t)$ some regularity conditions of a reasonably general nature are needed but we do not discuss them here. Of course (6.1), (6.2) will hold if the $\epsilon(t)$ are independent, with zero mean vector and finite second moments, but are considerably more general.

7. On-Line Procedures

Here only the case $p = q = 1$ will be considered, though the method easily generalises to $p > 1$. There is a large literature concerning methods for real time, on-line estimation of systems and this has recently been surveyed, as will be indicated in the references. Here attention will be concentrated on an on-line implementation, for $q = 1$, of the algorithm described in section 5. In other words we implement the two steps of this algorithm in an on-line fashion, with the step 1 iterated (i.e. repeated) once. Before describing that let us describe three known on-line procedures. Each is of the form
$$\tau(t) = \tau(t-1) + P(t)x(t)\hat\epsilon(t), \qquad \hat\epsilon(t) = w(t) - \tau(t-1)'x(t),$$

$$P(t) = \Big\{\sum_{s=1}^{t} x(s)x(s)'\Big\}^{-1} = P(t-1) - \{1 + x(t)'P(t-1)x(t)\}^{-1}P(t-1)x(t)x(t)'P(t-1).$$

Here $x(t)$ is the "independent variable" in a regression of $w(t)$ on $x(t)$ and $\tau(t)$ is the estimate at time $t$ of the vector of regression coefficients. In the basic on-line procedures $x(t)$, and probably $w(t)$, must be constructed from data to time $t$ together with the estimate at time $t-1$, $\tau(t-1)$. In each case we identify $\tau$, $x(t)$, $w(t)$.

(1) RLS = Recursive least squares. This corresponds to step 0.

$\tau(t)' = (a_1(t), a_2(t), \ldots, a_h(t), b_1(t), b_2(t), \ldots, b_h(t))$
$x(t)' = (-y(t-1), -y(t-2), \ldots, -y(t-h), u(t-1), u(t-2), \ldots, u(t-h))$
$w(t) = y(t)$.
(2) AML = Approximate maximum likelihood. This corresponds to the first use of step 1.

$\tau(t)' = (a_1(t), \ldots, a_n(t), b_1(t), \ldots, b_n(t), c_1(t), \ldots, c_n(t))$
$x(t)' = (-y(t-1), \ldots, -y(t-n), u(t-1), \ldots, u(t-n), \hat\epsilon(t-1), \ldots, \hat\epsilon(t-n))$
$w(t) = y(t)$.

In fact what is most properly called AML uses not $\hat\epsilon(t-j)$ in $x(t)$ but instead $\tilde\epsilon(t-j)$, where $\tilde\epsilon(t) = y(t) - \tau(t)'x(t)$. This can be done since at time $t$ the latest value used is $\tilde\epsilon(t-1)$, which uses $\tau(t-1)$.
(3) RML = Recursive maximum likelihood. This corresponds to the second use of step 1.

$\tau(t)' = (a_1(t), \ldots, a_n(t), b_1(t), \ldots, b_n(t), c_1(t), \ldots, c_n(t))$
$x(t)' = (-\hat\eta(t-1), \ldots, -\hat\eta(t-n), \hat\zeta(t-1), \ldots, \hat\zeta(t-n), \hat\xi(t-1), \ldots, \hat\xi(t-n))$

$$w(t) = \hat\epsilon(t) + \hat\eta(t) - \hat\xi(t), \qquad (7.1)$$

$$\sum_{j=0}^{n} c_j(t)\,\big(-\hat\eta(t-j), \hat\zeta(t-j), \hat\xi(t-j)\big) = \big(-y(t), u(t), \hat\epsilon(t)\big), \qquad c_0(t) \equiv 1, \qquad (7.2)$$

$$y(t) = u(t) = \hat\epsilon(t) = 0, \qquad t \le 0.$$

As presented above each of these could be used as a procedure independently of the others. Of course $n$ is fixed.
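The common recursion, and the way RLS and AML differ only in how $x(t)$ is built, can be sketched as follows for $p = q = 1$. This is a minimal illustration, not the authors' implementation: the function names, the start-up choice $P(0) = p_0 I$ with a large $p_0$ (in place of the exact inverse moment matrix), and the convention that values before the start of the record are zero are all assumptions made here.

```python
import numpy as np


def recursion_step(tau, P, x, w):
    """One step of the common recursion: update P, then tau."""
    Px = P @ x
    P = P - np.outer(Px, Px) / (1.0 + x @ Px)   # P(t) from P(t-1)
    pred_err = w - tau @ x                       # prediction error, uses tau(t-1)
    tau = tau + (P @ x) * pred_err
    return tau, P


def lagged(z, t, j):
    """z(t-j), taken as zero before the start of the record (assumption)."""
    return z[t - j] if t - j >= 0 else 0.0


def rls(y, u, h, p0=1e6):
    """RLS: x(t)' = (-y(t-1..t-h), u(t-1..t-h)), w(t) = y(t)."""
    tau, P = np.zeros(2 * h), p0 * np.eye(2 * h)
    for t in range(len(y)):
        x = np.array([-lagged(y, t, j) for j in range(1, h + 1)]
                     + [lagged(u, t, j) for j in range(1, h + 1)])
        tau, P = recursion_step(tau, P, x, y[t])
    return tau


def aml(y, u, n, p0=1e4):
    """AML in the 'proper' form: x(t) also holds lagged residuals
    resid(t) = y(t) - tau(t)'x(t), computed after each update."""
    tau, P = np.zeros(3 * n), p0 * np.eye(3 * n)
    resid = np.zeros(len(y))
    for t in range(len(y)):
        x = np.array([-lagged(y, t, j) for j in range(1, n + 1)]
                     + [lagged(u, t, j) for j in range(1, n + 1)]
                     + [lagged(resid, t, j) for j in range(1, n + 1)])
        tau, P = recursion_step(tau, P, x, y[t])
        resid[t] = y[t] - tau @ x
    return tau, resid
```

With the sign convention of the text, a system $y(t) = 0.5\,y(t-1) + u(t-1)$ corresponds to $a_1 = -0.5$, $b_1 = 1$. An RML variant would replace the regressors by filtered quantities as in (7.1), (7.2) and is not shown.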
It is known that AML may not converge, even if the true system is ARMAX of McMillan degree $n$, unless

$$2\,\mathcal{R}\{c(e^{i\omega})^{-1} - \tfrac{1}{2}\} > 0, \qquad \omega \in [-\pi, \pi],$$

i.e. unless the positive real condition is satisfied.
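The positive real condition is easy to check numerically for a given $c(z) = 1 + c_1 z + \ldots + c_n z^n$. The sketch below evaluates the condition on a finite frequency grid, which is an approximation to the statement for all $\omega$; the function name `positive_real` and the grid size are choices made for this illustration.

```python
import numpy as np


def positive_real(c, grid=2048):
    """Check 2*Re(1/c(e^{iw}) - 1/2) > 0 on a grid of w in [-pi, pi].

    c : sequence (c_1, ..., c_n); c_0 = 1 is implied.
    A grid check is only an approximation to the condition for all w.
    """
    w = np.linspace(-np.pi, np.pi, grid)
    z = np.exp(1j * w)
    cz = np.ones_like(z)
    for j, cj in enumerate(c, start=1):
        cz = cz + cj * z ** j
    return bool(np.all(2.0 * ((1.0 / cz).real - 0.5) > 0.0))
```

For a first order $c(z) = 1 + c_1 z$ with $|c_1| < 1$ the condition always holds, since it reduces to $c_1^2 < 1$. A second order example such as $c(z) = 1 + 1.6z + 0.8z^2$ has its zeros outside the unit circle but fails the condition near $\omega = 0$.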
It seems that RML may fail unless the location of the zeros of $C(z)$ is monitored and when these move inside the unit circle then the $c_j(t)$ used in forming $\hat\eta(t)$, $\hat\zeta(t)$, $\hat\xi(t)$, $\hat\epsilon(t)$ must be held at fixed values outside of the unit circle until the output vector, $\tau(t)$, corresponds to a stable $c_j(t)$ set, i.e. a set with zeros outside of the circle. For these reasons it has been suggested that the algorithms be run in parallel, with the $\hat\epsilon(t-j)$ for AML provided by the $\hat\epsilon(t)$ from RLS and with the $\hat\epsilon(t)$ in (7.1), (7.2) for RML being the $\tilde\epsilon(t)$ from AML. The value of $h$ in RLS would in general be much larger than $n$ in AML, RML where the $n$ is the assumed true order. A common choice would be $h = 2n$, but this
is arbitrary.

One main reason for on-line calculation is to allow the estimates to adapt to an evolving mechanism generating the data. In that case one should also be "forgetting" the remote past since that will be irrelevant to the estimation problem at time $t$. Thus a "forgetting factor", $\lambda_t(s)$, is included that multiplies $x(s)$ and $w(s)$ in the calculations. If

$$\lambda_t(s) = \prod_{u=s+1}^{t} \lambda(u), \qquad \lambda_s(s) = 1,$$

then the net effect is that only the formula for $P(t)$ is changed, becoming

$$P(t) = \frac{1}{\lambda(t)}\Big[P(t-1) - \{\lambda(t) + x(t)'P(t-1)x(t)\}^{-1}P(t-1)x(t)x(t)'P(t-1)\Big].$$

One reasonable procedure would be to take $\lambda(t) \equiv \lambda$, where $0 < \lambda < 1$ and $\lambda$ is fairly near to 1, e.g. $\lambda$ is 0.95.

However it is felt that $h$ and $n$ might be made to depend on $t$. In particular even if the true system were of the known order, $n$, then $h$ will have to increase with $t$ in order that $\tau(t)$ will converge to the true $\tau$. Of course if $h$ increases with $t$, as $\log t$, as it will if AIC or BIC is used to choose $h$, then eventually the calculation cannot be done in real time. However if "forgetting" is used then the sample size is not, truly, increasing with $t$ and thus $h$ should not increase indefinitely. The criterion should be

$$\log \hat\sigma_h^2(t) + h \log f(t)/f(t) \qquad (7.3)$$

where, when "forgetting" is used, $f(t)$ measures the sample size to time $t$ and is

$$f(t+1) = \lambda(t+1)f(t) + 1,$$

since the effective sample size is

$$f(t) = \sum_{s=1}^{t} \prod_{u=s+1}^{t} \lambda(u).$$

It remains to describe how to compute $\hat\sigma_h^2(t)$ in (7.3) when $h$ is allowed so to vary. Though these procedures are described for RLS, where $h$ indicates the order, and for $p = 1$ it will be readily seen that they can be used in the same way for AML or RML, with $n$ taking the place of $h$, and for $p > 1$. Indeed they could also fairly easily be generalised to $q > 1$. Call $x_h(t)$ the vector $x(t)$ when this has been rearranged as
(-y (t-l) ,u (t-l) ,-y (t-2) ,u (t-2) ,... ,-y (t-h) ,u (t-h)) and rearrange
T(t)
Xh(t)' If
Q
accordingly,
is orthogonal
and
Rh(t)
and
is upper
f (t)-Ish (t) 2
the calculations be obtained S =
is
[~(t)
,
[xH~t+l~ Q
acts only on rows place
in
QiQi_l
Qi_iQi_2
... Q1 S
:
[~
(t)
rh(t) 1
0
sn(t)j
triangular, c~(t).
then
Moreover,
as
cost.
Put
= (y(1) ..... y(t}).
~(t)~h(t)
= -rh(t|
as now will be indicated, a~(t),
h = I,...,H
may
consider
rH (t) ]
y(t+l~] Q2HQ2H_I
i, 2H+I ... Q1 s. are
Indeed
%h(t).
v~t)'
may be done so that all
at little
and construct
it
: [Xh~l) ..... xh(t)],
Q[X h(t)v(t)]
where
calling
--- Q2QI
where
and introduces
Qi'
orthogonal,
a zero in the
Then if the rows numbered
i,
(2H+l,i)'th (2h+l)
in
and
(0,0,
...,
0, d ½, d % r 2 . . . .
~0,0,
...,
0 • ~_iXl
xI i'
di,
~i
2
e
, 6 i~ - i x 2 '
are defined
d i = d + 6i_ixi
,
= d / d i,
i,
(0,0
(2h+l)
#
..., 0,
where
SH(t)2 we may find
of
"'" Q1
chosen
... S
d r h+l )
Q
---
, ~ X h + l ). Q2HQ2H-I
"'" Q 1
right hand element
of
Q2HQ2H_I
... Q 1 S
is
~(t)
h ~
H
h = H,
Moreover
at no e x t r a ... Q 1 S
6~ h.
Thus
this,
and
(7.3)
the w h o l e
thing may be done with f o r m of the a l g o r i t h m s
q = i.
How
this
well and the and via
h
h
to h a v e
transfer
It w o u l d
then,
eventually
the a l g o r i t h m
algorithm
could
t = 2000)
that
and often
log t
itself
with
RML.
This then
5, at l e a s t
to b e seen.
should t
could to i t s
for
If
h,
If it w a s fit the d a t a
set
l(t)
£ 1
and could be chosen
increase
as
log t.
Tnis
n o t r u n in real time. long run value
increases
b e r u n In r e a l t i m e
it c o u l d b e r e g a r d e d
recursive.
system would
then we
to i n c r e a s e
would De small compared
very large
in AML,
in s e c t i o n
h ~ H,
made
some virtues.
function
right
for a l l
calculation n
gives
and that of
will be remains
system was not evolving
that eventually
However got
certainly
a rational
should be allowed
(7.3).
means
it s e e m s
algorithm
the b o t t o m
m a y be c o m p u t e d
the same
that
is
Thus given
the c a l c u l a t i o n
cost, s i n c e % ^ is 62heh(t)
an o n - l i n e
are f i x e d
in RLS.
is
+ ~H(t) 2
Q2hQ2h_l is
for
defines n
are
"'"
recursively.
useful
xI = 0
of
to minimise
Precisely
e
r I = r I = I,
element
~H(t)
~(t)
hand element
0, 6 X 2,
= SH(t-1)2
for all
QiQi_l
right hand
and the bottom
6 %2 H e^H (t),
!
r k = c r k + sx k,
r o w s of
the bottom
believed
---- 1
60
= ~i_ixi/d i
O, d ~i e d z r 2,
--oQ
(0,0,
Q2hQ2h-i
,
!
t h e n the
~h(t) 2
by the recursion
s
x k = x K - xirk,
6 H
' 6 %i - i x 2h+l"%
"'"
6i = d 6 i - i / d i
e
Moreover
, a½r2h+l)
slowly
u p uo v a l u e s
as a r e a l
until
it
so t h a t the so l a r g e
time algorithm.
(say
Allowing $h$ and $n$ to increase needs investigation but could prove useful, with $\lambda(t)$ adroitly varied, to model an evolving phenomenon or even a non-linear, episodic phenomenon. When $h$ or $n$ varies it is likely that occasionally they will change appreciably from one value of $t$ to another. This is because (7.3) is likely to be flat near its minimum or even have several minima near to equality. This may not matter much since all of the competing models are behaving about equally well but could be misinterpreted as evolution.

Notes on References. The field of on-line calculation is extensively surveyed in Ljung and Söderström (1983). The basic procedure of this section, for $h$, $n$ fixed, was suggested in Mayne, Åström and Clarke (1983) and the procedure for $h$, $n$ varying in Hannan, Kavalieris and Mackisack (1986).
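The forgetting-factor form of the recursion in this section, the effective sample size $f(t)$ and the criterion (7.3) can be sketched together. This is an illustration under stated assumptions only: a constant $\lambda$, a start-up $P(0) = p_0 I$ with large $p_0$, and hypothetical function names; the residual variance passed to `order_criterion` is assumed to come from whatever fitting procedure is in use.

```python
import numpy as np


def effective_sample_size(lams):
    """f(t) via f(t+1) = lambda(t+1) f(t) + 1, starting from f(0) = 0."""
    f = 0.0
    for lam in lams:
        f = lam * f + 1.0
    return f


def rls_forgetting(X, w, lam=0.95, p0=1e6):
    """Recursion with a constant forgetting factor lam (assumption).

    X : (T, d) array whose rows are x(t)'; w : length-T array of w(t).
    Returns the final estimate and the effective sample size f(T).
    """
    d = X.shape[1]
    tau, P, f = np.zeros(d), p0 * np.eye(d), 0.0
    for x, wt in zip(X, w):
        Px = P @ x
        # Only the P(t) formula changes when forgetting is introduced.
        P = (P - np.outer(Px, Px) / (lam + x @ Px)) / lam
        tau = tau + (P @ x) * (wt - tau @ x)
        f = lam * f + 1.0
    return tau, f


def order_criterion(sigma2, h, f):
    """Criterion (7.3): log sigma_h^2(t) + h log f(t) / f(t)."""
    return np.log(sigma2) + h * np.log(f) / f
```

With $\lambda = 0.95$ the effective sample size settles near $1/(1-\lambda) = 20$, which is why the penalty in (7.3) stops shrinking and $h$ should not grow indefinitely.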
References.

Adamyan, V.M., Arov, D.Z. and Krein, M.G. (1971) Analytic properties of Schmidt pairs for a Hankel operator and the generalised Schur-Takagi problem. Math. USSR Sbornik, 15, 31-73.
Akaike, H. (1969) Fitting autoregressive models for prediction. Ann. Inst. Statist. Math., 6, 416-431.
Akaike, H. (1969a) Canonical correlation analysis of time series and the use of an information criterion. In: System Identification, Advances and Case Studies, eds. R.K. Mehra and D.G. Lainiotis, Academic Press, New York, 29-91.
Anderson, B.D.O. and Moore, J.B. (1979) Optimal Filtering, Prentice Hall, Englewood Cliffs.
Casti, J.L. (1977) Dynamical Systems and Their Applications, Academic Press, New York.
Cooley, J.W. and Tukey, J.W. (1965) An algorithm for machine calculation of complex Fourier series. Mathematics of Computation, 19, 297-301.
Deistler, M. and Hannan, E.J. (1981) Some properties of the parameterization of ARMA systems with unknown order. J. Multivariate Anal., 11, 474-484.
Friedlander, B. (1982) Lattice filters for adaptive processing. Proc. IEEE, 70, 830-867.
Glover, K. (1983) All optimal Hankel norm approximations of linear multivariable systems and their L∞ error bounds. Research Report, Control and Management Systems Division, Dept. of Engineering, Cambridge, England.
Hannan, E.J. (1970) Multiple Time Series, Wiley, New York.
Hannan, E.J. (1980) The estimation of the order of an ARMA process. Ann. Statist., 8, 1071-1081.
Hannan, E.J. (1981) Estimating the dimension of a linear system. J. Multivariate Anal., 11, 459-473.
Hannan, E.J. (1982) Testing for autocorrelation and Akaike's criterion. In: Essays in Statistical Science, eds. J.M. Gani and E.J. Hannan, Applied Probability Trust, Sheffield, 403-412.
Hannan, E.J. and Kavalieris, L. (1984) Multivariate linear time series models. Adv. Appl. Prob., 16, 492-561.
Hannan, E.J. and Kavalieris, L. (1986) Regression, autoregression models. J. Time Series Anal., 7.
Hannan, E.J., Kavalieris, L. and Mackisack, M. (1986) Recursive estimation of linear systems. Biometrika, 73, no. 1.
Hannan, E.J. and Rissanen, J. (1982) Recursive estimation of mixed autoregressive moving-average order. Biometrika, 69, 81-94.
Jewell, N.P. and Bloomfield, P. (1983) Canonical correlations of past and future for time series: definitions and theory. Ann. Statist., 11, 837-847.
Jewell, N.P. and Bloomfield, P. (1983a) Canonical correlations of past and future for time series: bounds and computation. Ann. Statist., 11, 848-855.
Kailath, T. (1980) Linear Systems, Prentice Hall, Englewood Cliffs.
Ljung, L. and Söderström, T. (1983) Theory and Practice of Recursive Identification, MIT Press, Cambridge, Mass.
Mayne, D.Q., Åström, K.J. and Clarke, J.M. (1983) A new algorithm for recursive estimation of parameters in controlled ARMA processes. Research Report, Dept. of Electrical Engineering, Imperial College, London.
Rissanen, J. (1983) Universal prior for parameters and estimation by minimum description length. Ann. Statist., 11, 416-431.
Shibata, R. (1980) Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Ann. Statist., 8, 147-164.
Tunnicliffe Wilson, G. (1969) Factorization of the covariance generating function of a pure moving-average process. SIAM J. Numer. Anal., 6, 1-7.
Tunnicliffe Wilson, G. (1972) The factorization of matricial spectral densities. SIAM J. Appl. Math., 23, 420-426.
Whittle, P. (1963) On the fitting of multivariate auto-regressions and the approximate canonical factorization of a spectral density matrix. Biometrika, 50, 129-134.
Chapter 2
Linear Errors-in-Variables Models
Manfred Deistler
1. Introduction

In this contribution we are concerned with some aspects of the identification problem for linear systems where both inputs and outputs are subject to ("observational") errors. Models of this kind are called errors-in-variables (EV) models.

The conventional setting in the statistical analysis of linear systems is to attribute all errors to the outputs, or (for our purposes) equivalently to add the errors to the equations. This gives the errors in equations (EE) models.

Let $\hat x_t$ and $\hat y_t$ denote the "true" inputs and outputs respectively and let $x_t$ and $y_t$ denote the observed inputs and outputs, then the situation can be illustrated as follows: EV models are of the form:

[Fig 1: Schematic representation of an EV model]

There $u_t$ and $v_t$ are the errors of the inputs and the outputs respectively. On the other hand EE models are of the form:

[Fig 2: Schematic representation of an EE model]
Of course the EV setting is more general than the EE setting. For a number of purposes, e.g. for the prediction of the observed outputs from observed inputs, the EE setting is adequate. In many cases however, the EV setting seems to be more appropriate, e.g.

(i) if our main interest concerns the "true" system generating the data (rather than a good representation of the data) and if we cannot be sure a priori that the true inputs are not contaminated by errors

(ii) if we want to decouple the common effect between the variables from the individual effects

(iii) if there is no a priori classification of the observed variables into inputs and outputs and if thus a symmetric treatment of the variables would be appropriate.
We are dealing here only with linear systems in a stationary context. Also, if the contrary has not been stated explicitly, we restrict ourselves to the single input - single output case. Our primary interest is in the characteristics of the system, i.e. in the transfer function (or the parameters of the transfer function); but also the characteristics of the errors and of $(\hat x_t)$ are of interest.
The statistical theory of linear dynamic EE systems, especially of ARMAX systems (also in the multi-input multi-output case), has reached a certain stage of completeness now (see Hannan and Kavalieris (1984)). In the EV case on the other hand there is still a great number of open problems and this is the reason why there is still a relatively small number of applications in this field. The main problems in the EV case arise from the fact that the (ensemble) second moments of the observations do in general not uniquely determine the transfer function of the system. Another difference to EE models is that in the EV case higher order moments (in the non Gaussian case) may contain additional information about the transfer function.

Our emphasis is on two problems: The first is the problem of identifiability, i.e. the problem whether the characteristics of interest mentioned above can be uniquely determined from certain characteristics of the observations as e.g. from their (ensemble) second moments or from their probability law (see Deistler and Seifert (1978)). If the answer is negative then the second problem is to describe the sets of observationally equivalent characteristics of interest, i.e. the sets of characteristics of interest which correspond to the same characteristics of the observations.

These questions are questions preceding estimation in the narrow sense and as has been stated already they turn out to be the main difficulty in the process of estimation (or inference) in EV models.
This difficulty is the reason why not very much attention has been paid to EV models for a long time. However, in the last decade there has been a resurging interest in EV models in econometrics, statistics and system theory, see e.g. Aigner and Goldberger (1977), Aigner et al. (1984), Anderson B.D.O. (1985), Anderson and Deistler (1984), Anderson T.W. (1984), Deistler (1984), Deistler (1985a), Fuller (1980), Green and Anderson (1985), Hinich and Weber (1984), Kalman (1982), Kalman (1983), Maravall (1979), Picci (1985), Söderström (1980), Schneeweiß and Mittag (1985), Wegge (1983).
The paper is organized as follows. In section 2 we repeat some well known results for the static case. In sections 3 to 5 we consider the (dynamic) case when the characteristics of the observations considered are their second moments. Thereby in section 3 the set of all transfer functions corresponding to given second moments of the observations is described. Section 4 deals with the same problem, when the system is a priori known to be causal and with the problem whether causality can be detected from the second moments of the observations. In section 5 several conditions for identifiability are given. Finally in section 6 we derive conditions for identifiability using information coming from moments of order greater than two.

The system considered is of the form

(1.1) $\hat y_t = w(B)\hat x_t$
where $B$ is a complex variable as well as the backward-shift operator on $\mathbb{Z}$ and where

(1.2) $w(B) = \sum_i w_i B^i$

is the transfer function. The summation on the r.h.s. of (1.2) ranges over all integers and thus in general the system is not a priori assumed to be causal. The observed processes $(x_t)$ and $(y_t)$ are given by

(1.3) $x_t = \hat x_t + u_t$

(1.4) $y_t = \hat y_t + v_t$

We assume throughout:

(1.5) All processes considered are (wide sense) stationary; all limits of random variables are understood in the sense of mean squares convergence

(1.6) $E\hat x_t = Eu_t = Ev_t = 0$

(1.7) $E\hat x_s u_t = E\hat x_s v_t = 0 \quad \forall s,t$

and

(1.8) $(u_t, v_t)$ has a spectral density, $\Sigma$ say.

These assumptions are called the standard assumptions here and they will not be further explicitly restated. The assumption $E\hat x_t = 0$ is imposed for notational convenience only and may easily be relaxed. (1.7) is natural in our context. Also the assumption (1.8) is natural for errors. In many cases we in addition assume

(1.9) $Eu_s v_t = 0 \quad \forall s,t$

i.e. $\Sigma$ is diagonal

(1.10) All processes considered have a spectral density

Thereby, if $(z_t)$ is a stationary process, we often use $f_z$ to denote its spectral density.

Assumption (1.9) means that all common effects between $(x_t)$ and $(y_t)$ are due to the system and that only individual effects are attributed to the errors. Of course situations may occur where such an assumption can not be justified, e.g. if the errors in the measurement devices for inputs and outputs are correlated. Without any additional assumption the situation is hopeless because "too many" (linear) systems then correspond to given second moments of the observations. Additional information to separate the errors could be obtained from certain frequency domain properties of the errors, or from higher order moments.
2. The Static Case

Here we consider the special case where the system is static, i.e. the transfer function $w$ is simply the slope parameter of a line, and all processes are white noise. This case has been discussed in great detail in the literature, see e.g. Gini (1921), Frisch (1934) and the surveys by Madansky (1959), Moran (1971), Aigner et al. (1984) and T.W. Anderson (1984). For the multivariable case, which is much more complicated, see Kalman (1982) and Klepper and Leamer (1984).

The static EV model is written as

(2.1)   $\hat y_t = a\,\hat x_t$ ;  $a \in \mathbb{R}$

and (1.3), (1.4), where $(\hat x_t)$, $(u_t)$ and $(v_t)$ are white noise and thus

$E\hat x_s \hat x_t = \delta_{st}\,\sigma_{\hat x}$ ;  $Eu_s u_t = \delta_{st}\,\sigma_u$ ;  $Ev_s v_t = \delta_{st}\,\sigma_v$

In addition we assume (1.9), i.e. $Eu_s v_t = 0$.

If we try to write (2.1)(1.3)(1.4) as a "regression" in the observed variables, we obtain:

$y_t = a x_t + (v_t - a u_t)$

But here $E x_t (v_t - a u_t) = -a\,\sigma_u$ and thus in general (ordinary) least squares estimators will not be consistent. Therefore we have to investigate the problem in more detail.

The parameters of interest are $\theta = (a, \sigma_{\hat x}, \sigma_u, \sigma_v)$. The relation between these parameters and the second moments of the observations is given by

(2.2)   $\sigma_x = E x_t^2 = \sigma_{\hat x} + \sigma_u$
(2.3)   $\sigma_{xy} = E x_t y_t = E \hat x_t \hat y_t = a\,\sigma_{\hat x}$
(2.4)   $\sigma_y = E y_t^2 = a^2 \sigma_{\hat x} + \sigma_v$
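The inconsistency of least squares can be made concrete with a small simulation. The following is only a sketch under assumed parameter values (the numbers, the Gaussian distributions and the function names `simulate_static_ev` and `ols_slope` are illustrative choices, not part of the text): by (2.2) and (2.3) the least squares slope tends to $\sigma_{xy}\sigma_x^{-1} = a\,\sigma_{\hat x}(\sigma_{\hat x}+\sigma_u)^{-1}$ rather than to $a$.

```python
import random

def simulate_static_ev(a, var_xhat, var_u, var_v, n, seed=0):
    """Draw n observations (x_t, y_t) from the static EV model (2.1)(1.3)(1.4).

    Gaussian white noise is an assumption made only for this illustration.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x_hat = rng.gauss(0.0, var_xhat ** 0.5)   # true input
        u = rng.gauss(0.0, var_u ** 0.5)          # input error
        v = rng.gauss(0.0, var_v ** 0.5)          # output error
        xs.append(x_hat + u)                      # (1.3)
        ys.append(a * x_hat + v)                  # (2.1) and (1.4)
    return xs, ys

def ols_slope(xs, ys):
    """Least squares slope through the origin (all means are zero by (1.6))."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

a, var_xhat, var_u, var_v = 2.0, 1.0, 1.0, 0.5   # hypothetical values
xs, ys = simulate_static_ev(a, var_xhat, var_u, var_v, n=100_000)
a_ols = ols_slope(xs, ys)
# Probability limit implied by (2.2) and (2.3): sigma_xy / sigma_x
a_plim = a * var_xhat / (var_xhat + var_u)
print(a_ols, a_plim)
```

With these assumed values the least squares slope settles near 1 although the true slope is 2, which is the attenuation described by $E x_t(v_t - a u_t) = -a\,\sigma_u$.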
Thus the problem of identifiability from second moments for this model is whether $\theta$ is uniquely determined from $\sigma_x$, $\sigma_{xy}$, $\sigma_y$. A slightly more general model would be of the form

(2.5)   $b\,\hat y_t = a\,\hat x_t$

(where $a$ and $b$ are suitably normalized, e.g. by $a^2 + b^2 = 1$), together with (1.3) - (1.4). Loosely speaking, here we allow for the case $a = \infty$ in (2.1). Then the problem of observational equivalence is equivalent to the following "Frisch" problem (see Kalman (1982)): Given the covariance matrix

$K = \begin{pmatrix} \sigma_x & \sigma_{xy} \\ \sigma_{yx} & \sigma_y \end{pmatrix}$

find all decompositions

(2.6)   $K = \hat K + \tilde K$

into covariance (i.e. symmetric, nonnegative definite) matrices $\hat K$ and $\tilde K$, such that $\hat K$ is singular and $\tilde K$ is diagonal. This equivalence is straightforward; here $\hat K$ is the covariance matrix of $(\hat x_t, \hat y_t)$; $a$ and $b$, after suitable normalization, are defined from the linear dependence relations in $\hat K$; and

$\tilde K = \begin{pmatrix} \sigma_u & 0 \\ 0 & \sigma_v \end{pmatrix}$

In the case (2.1) (which excludes the possibility $b = 0$ in (2.5) and which is the only one we treat here, unless the contrary has been explicitly stated)

$\hat K = \sigma_{\hat x} \begin{pmatrix} 1 & a \\ a & a^2 \end{pmatrix}$

holds.
By the singularity of $\hat K$ we have

(2.7)   $\det \hat K = \sigma_{\hat x}\,\sigma_{\hat y} - \sigma_{xy}^2 = 0$

where $\sigma_{\hat y} = E\hat y_t^2$, and furthermore

(2.8)   $0 \le \sigma_{\hat x} \le \sigma_x$
(2.9)   $0 \le \sigma_{\hat y} \le \sigma_y$

and these are the only restrictions on $\sigma_{\hat x}$ and $\sigma_{\hat y}$. Thus the range of pairs $(\sigma_{\hat x}, \sigma_{\hat y})$ compatible with the given second moments of the observations is a part of a hyperbola, as illustrated in Fig. 3.

Fig. 3: The range of compatible pairs $(\sigma_{\hat x}, \sigma_{\hat y})$
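We do assume throughout that

(2.10)   $\sigma_{\hat x} > 0$

and that

(2.11)   $\det K > 0.$

Then the range of compatible slope parameters $a = \sigma_{xy}\,\sigma_{\hat x}^{-1}$ is given by the intervals

(2.12)   $[\sigma_{xy}\,\sigma_x^{-1},\ \sigma_y\,\sigma_{xy}^{-1}]$   for $\sigma_{xy} > 0$
         $[\sigma_y\,\sigma_{xy}^{-1},\ \sigma_{xy}\,\sigma_x^{-1}]$   for $\sigma_{xy} < 0$
         $\{0\}$   for $\sigma_{xy} = 0$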
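The interval (2.12) and the parameters attached to one admissible slope follow directly from (2.2) - (2.4). A minimal sketch, assuming given population moments (the function names `slope_interval` and `parameters_for_slope` and the numerical values are hypothetical):

```python
def slope_interval(sigma_x, sigma_xy, sigma_y):
    """Interval (2.12) of slopes a compatible with the observed second moments.

    Requires det K > 0 as in (2.11); returns (low, high) with low <= high.
    """
    if sigma_x * sigma_y - sigma_xy ** 2 <= 0:
        raise ValueError("K must be nonsingular, see (2.11)")
    if sigma_xy == 0:
        return (0.0, 0.0)
    regression_y_on_x = sigma_xy / sigma_x    # end point belonging to u_t = 0
    inverse_regression = sigma_y / sigma_xy   # end point belonging to v_t = 0
    return tuple(sorted((regression_y_on_x, inverse_regression)))

def parameters_for_slope(a, sigma_x, sigma_xy, sigma_y):
    """theta = (a, sigma_xhat, sigma_u, sigma_v) solving (2.2)-(2.4) for a != 0."""
    sigma_xhat = sigma_xy / a
    return (a, sigma_xhat, sigma_x - sigma_xhat, sigma_y - a * sigma_xy)

# Moments generated by a = 2, sigma_xhat = 1, sigma_u = 1, sigma_v = 0.5
low, high = slope_interval(sigma_x=2.0, sigma_xy=2.0, sigma_y=4.5)
theta = parameters_for_slope(2.0, 2.0, 2.0, 4.5)
print(low, high, theta)
```

Every slope inside the interval yields nonnegative $\sigma_u$ and $\sigma_v$; a slope outside it would make one of the two error variances negative.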
Note that the end points of these intervals correspond to the coefficients of the (theoretical) regressions from $x_t$ to $y_t$ and from $y_t$ to $x_t$ respectively. This is an appealing result, as the EV model contains the two regressions as extreme cases (where either $u_t = 0$ or $v_t = 0$). The set of compatible parameters $\sigma_u$ and $\sigma_v$ is obvious from (2.2) and (2.4).

From what we said above, it also follows that every covariance matrix $K$ can be decomposed as in (2.6).
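Let us summarize:

Theorem 2.1: For every covariance matrix $K$ there exists a corresponding EV system (2.5)(1.3)(1.4) satisfying (1.9). Under the additional assumptions (2.1), (2.10) and (2.11), the set of parameters $\theta = (a, \sigma_{\hat x}, \sigma_u, \sigma_v)$ compatible with given second moments of the observations is given by

(2.13)   $\{\theta = (a,\ a^{-1}\sigma_{xy},\ \sigma_x - a^{-1}\sigma_{xy},\ \sigma_y - a\,\sigma_{xy}) \in \mathbb{R}^4 \mid a \cdot \mathrm{sign}(\sigma_{xy}) \in [\,|\sigma_{xy}|\,\sigma_x^{-1},\ \sigma_y\,|\sigma_{xy}|^{-1}]\}$   for $\sigma_{xy} \neq 0$

and

$\{\theta = (0,\ \sigma_{\hat x},\ \sigma_x - \sigma_{\hat x},\ \sigma_y) \in \mathbb{R}^4 \mid 0 < \sigma_{\hat x} \le \sigma_x\}$   for $\sigma_{xy} = 0$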
This result is due to Gini (1921) and Frisch (1934).

If we drop the assumption $\sigma_{\hat x} > 0$ and consider the decomposition (2.6) (i.e. the more general case (2.5)) then we have the following picture: In the case $\sigma_{xy} \neq 0$, $\sigma_{\hat x} > 0$ must hold and in every decomposition (2.6), $\hat K$ must have rank equal to one. For otherwise $\hat K = 0$ and $K = \tilde K$ would not be diagonal. If $\sigma_{xy} = 0$ and $K \neq 0$, then $\hat K$ may either have rank equal to one, or $\hat K = 0$ and thus $K = \tilde K$. In the latter case the errors correspond to a maximal extraction of individual factors of the observed variables.

For singular $K$, the Frisch problem is trivial, because then $\hat K = K$ defines the unique decomposition (2.6) whenever $\sigma_{xy} \neq 0$.
If $x_t$ and $y_t$ are not necessarily one dimensional then an analogous Frisch problem can be formulated. Then the main problems are to determine the maximum corank, $m^*$ say, of $\hat K$ among all decompositions (2.6) of $K$ (i.e. to determine the maximum number of linear relations between the true variables) and to characterize the set of all (suitably normalized) linear relations corresponding to given $K$.

These problems have not yet been solved for the general, multivariable case (see Kalman (1982), Klepper and Leamer (1984)) and they will not be treated here.
Now, let us turn again to the one dimensional case. In the case of Gaussian observations $(x_t, y_t)$, there is no information from the data exceeding the information obtained from the second moments; thus e.g. for the slope parameter $a$ there is an "interval of uncertainty"; therefore, of course, the model is not identifiable in this case.

There are several possibilities to overcome this "basic" nonidentifiability of the EV model. Again the reader is referred to the survey papers by Madansky (1959) and Moran (1971). As easily seen, if $\sigma_u$ or $\sigma_v$ or $\sigma_u \cdot \sigma_v^{-1}$ are known then we have identifiability. Assumptions of this kind may be justified in physical applications
where either the properties of the measurement instruments are a-priori known, or where the measurements can be repeated, whereas the true variables are kept constant. However, for most applications such assumptions cannot be justified.

Another possibility (in the non Gaussian case) is to utilize information coming from moments of order greater than two (Geary (1942), Reiersøl (1950)). Here, for simplicity, we assume throughout that all moments up to a suitable large order exist. For technical reasons, we deal with cumulants rather than with moments (see e.g. Kendall and Stuart (1969)). If $z_1 \ldots z_n$ are random variables, then their joint n-th order cumulant $c_{z_1 \ldots z_n}$ is given by the coefficient of $(i)^n t_1 \cdots t_n$ in the Taylor series expansion of $\ln E \exp\left(i \sum_{j=1}^{n} z_j t_j\right)$ about the origin. $c_{z_1 \ldots z_n}$ has the following properties (see e.g. Brillinger (1981)):

$c_{z_1} = E z_1$ ;  $c_{z_1 z_2} = E(z_1 - Ez_1)(z_2 - Ez_2)$

$c_{z_1 \ldots z_n}$ is symmetric in its arguments.

$c_{(a_1 z_1 + a_2 z_2)\, z_3 \ldots z_n} = a_1\, c_{z_1 z_3 \ldots z_n} + a_2\, c_{z_2 z_3 \ldots z_n}$

If the random variables $(z_1 \ldots z_n)$ and $(w_1 \ldots w_n)$ are independent then

$c_{(z_1+w_1) \ldots (z_n+w_n)} = c_{z_1 \ldots z_n} + c_{w_1 \ldots w_n}$

Now, in addition we assume

(2.14)   $\hat x_t$, $t \in \mathbb{Z}$ are independent and identically distributed; so are the $u_t$ and the $v_t$, and the processes $(\hat x_t)$, $(u_t)$ and $(v_t)$ are mutually independent.

Then we have the following relations between the n-th order cumulants:

(2.15)   $c_{x^n} = c_{\hat x^n} + c_{u^n}$
(2.16)   $c_{y^r x^{n-r}} = c_{\hat y^r \hat x^{n-r}} + c_{v^r u^{n-r}} = c_{\hat y^r \hat x^{n-r}} = a \cdot c_{\hat y^{r-1} \hat x^{n+1-r}} = a \cdot c_{y^{r-1} x^{n+1-r}}$ ;  $r = 2 \ldots n-1$
(2.17)   $c_{y^n} = c_{\hat y^n} + c_{v^n}$

where we have used the notation $c_{z^r w^{n-r}} = c_{z \ldots z\, w \ldots w}$ ($z$ taken $r$ times, $w$ taken $n-r$ times) and the fact that $c_{v^r u^{n-r}} = 0$ for $r > 0$, $n - r > 0$, since $v$ and $u$ are independent.
Let us assume that $\hat x_t$ is non Gaussian (and that thus there is an $n > 2$ such that $c_{\hat x^n} \neq 0$). If in addition $a \neq 0$, then $c_{\hat y^{r-1} \hat x^{n+1-r}} \neq 0$ and, using (2.16),

(2.18)   $a = \dfrac{c_{y^r x^{n-r}}}{c_{y^{r-1} x^{n+1-r}}}$ ;  $r - 1 > 0$, $n - r > 0$

and thus $a$ is uniquely determined from the cumulants of the observed processes. Note that for $n > 2$ (as opposed to the case $n = 2$) there are at least two cumulants of the form $c_{y^r x^{n-r}}$, $r > 0$, $n - r > 0$, and for these (2.16) holds. Once $a$ is determined, $\sigma_{\hat x}$, $\sigma_u$ and $\sigma_v$ can be uniquely determined from (2.2) - (2.4). Thus we have shown:

Theorem 2.2: Consider the static EV model (2.1)(1.3)(1.4) together with the assumption (2.14). Then, under the assumptions that $\hat x_t$ is non Gaussian and $a \neq 0$, the model is identifiable.

This theorem can be extended to the multivariate case in a straightforward manner.

If instead of assuming that $(u_t)$ and $(v_t)$ are independent we postulate that $(u_t, v_t)$ is Gaussian, then $c_{u^r v^{n-r}} = 0$ whenever $n > 2$, and thus (2.16) holds for all $r$ and for all $n > 2$; therefore $a$ is uniquely determined provided that there is an $n > 2$ such that $c_{\hat x^n} \neq 0$.
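Relation (2.18) can be tried out on simulated data. The sketch below takes $n = 3$, $r = 2$, so that $a = c_{y^2 x} / c_{y x^2}$, and replaces the cumulants by sample third-order central moments. The centred exponential distribution for $\hat x_t$, the Gaussian errors, the sample size and the function names are assumptions made only for this illustration.

```python
import random

def third_cumulant(zs1, zs2, zs3):
    """Sample joint third-order cumulant: the mean of the product of the
    three centred series (this identity holds for order three only)."""
    n = len(zs1)
    m1, m2, m3 = sum(zs1) / n, sum(zs2) / n, sum(zs3) / n
    return sum((p - m1) * (q - m2) * (r - m3)
               for p, q, r in zip(zs1, zs2, zs3)) / n

def slope_from_third_cumulants(xs, ys):
    """Estimate a via (2.18) with n = 3, r = 2: a = c_{yyx} / c_{yxx}."""
    return third_cumulant(ys, ys, xs) / third_cumulant(ys, xs, xs)

rng = random.Random(1)
a_true = 2.0                                  # hypothetical true slope
xs, ys = [], []
for _ in range(200_000):
    x_hat = rng.expovariate(1.0) - 1.0        # skewed, hence non Gaussian input
    xs.append(x_hat + rng.gauss(0.0, 0.7))             # (1.3)
    ys.append(a_true * x_hat + rng.gauss(0.0, 0.7))    # (2.1) and (1.4)

a_hat = slope_from_third_cumulants(xs, ys)
print(a_hat)
```

A skewed input is chosen so that the third-order cumulant of $\hat x_t$ is nonzero; for a symmetric non Gaussian input one would have to move to $n = 4$.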
Now, let us make a few remarks on estimation: The negative logarithmic Gaussian likelihood function of the static model (2.1)(1.3)(1.4) is of the form (where constants have been neglected):

(2.19)   $L_T(\theta) = T \log\det K(\theta) + \sum_{t=1}^{T} (x_t, y_t)\, K^{-1}(\theta)\, (x_t, y_t)'$

Thereby $K(\theta)$ is the covariance matrix $K$ in (2.6) corresponding to the parameters $\theta$, and $T$ is the sample size. Thus the corresponding maximum likelihood estimator (MLE), obtained by minimizing $L_T(\theta)$ in (2.19), is given by

(2.20)   $K_T = \begin{pmatrix} \sigma_{x,T} & \sigma_{xy,T} \\ \sigma_{xy,T} & \sigma_{y,T} \end{pmatrix} = \dfrac{1}{T} \sum_{t=1}^{T} \begin{pmatrix} x_t \\ y_t \end{pmatrix} (x_t, y_t)$

The corresponding set of parameters, i.e. the estimate of the set of all parameters observationally equivalent to the true $K$, then according to Theorem 2.1 is given by

$\{\theta = (a,\ a^{-1}\sigma_{xy,T},\ \sigma_{x,T} - a^{-1}\sigma_{xy,T},\ \sigma_{y,T} - a\,\sigma_{xy,T}) \mid a \in [\sigma_{xy,T}\,\sigma_{x,T}^{-1},\ \sigma_{y,T}\,\sigma_{xy,T}^{-1}]\}$

(for the case $\sigma_{xy,T} > 0$ say).

In the non Gaussian case, of course, (2.18) can be used to define a consistent estimator for $a$ from the sample cumulants. Here the problem arises which cumulants should be selected and also how information coming from sample cumulants of different order can be combined. The reader is referred to Drion (1951) and Scott (1950). Note also that the estimators of $a$ obtained from (2.18) do not necessarily satisfy the restrictions coming from the second moments.

3. Second Moments and Dynamic Models: The General Case

From now on, linear dynamic systems are considered. In this and in the two sections following, only the information coming from the second moments of the observed processes $(x_t, y_t)$ is used.
For the moment let $x_t$ and $y_t$ be not necessarily one dimensional. Let $z_t = (x_t', y_t')'$, $\hat z_t = (\hat x_t', \hat y_t')'$, $w_t = (u_t', v_t')'$. The general form of a linear dynamic system is:

(3.1)   $\lim_{N\to\infty} \sum_{i=-N}^{N} w_i^{(N)}\, \hat z_{t-i} = 0$ ,  $w_i^{(N)} \in \mathbb{R}^{m \times n}$

where

$w(B) = \lim_{N\to\infty} \sum_{i=-N}^{N} w_i^{(N)} B^i \neq 0$

is the corresponding transfer function. In (3.1), analogously to (2.5) for the static case, we allow for a completely symmetric treatment of the variables.

In this section we assume that the spectral densities

$f = \begin{pmatrix} f_x & f_{xy} \\ f_{yx} & f_y \end{pmatrix}$ ,  $\hat f = \begin{pmatrix} f_{\hat x} & f_{\hat x \hat y} \\ f_{\hat y \hat x} & f_{\hat y} \end{pmatrix}$

and $\tilde f$ of $(x_t, y_t)$, $(\hat x_t, \hat y_t)$ and of $(u_t, v_t)$ respectively exist and that (1.9) holds in the sense that $\tilde f$ is diagonal. Then for an EV system (3.1)(1.3)(1.4) we have a decomposition analogous to (2.6):

(3.2)   $f = \hat f + \tilde f$

where $\hat f$ is singular (since $w(e^{-i\lambda})\hat f(\lambda) = 0$ by (3.1)) and $\tilde f$ is diagonal.

Note that a matrix $f: [-\pi,\pi] \to \mathbb{C}^{n \times n}$ is a spectral density of a (real) stationary process if and only if it is an integrable, nonnegative definite Hermitian matrix satisfying $f(\lambda) = f(-\lambda)'$. Conversely, if we commence from $f$, every decomposition (3.2) where $\hat f$ is a singular spectral density matrix and where $\tilde f$ is a diagonal spectral density matrix (we always omit the statement $\lambda$-a.e.) corresponds to an EV system (3.1)(1.3)(1.4): Since $\hat f$ is singular, a transfer function can be found satisfying

(3.3)   $w(e^{-i\lambda})\,\hat f(\lambda) = 0$

Thereby always a suitable normalization can be chosen such that the limit in (3.1) exists.

As easily can be seen, for every spectral density matrix $f$ a decomposition (3.2) exists; take e.g. $\tilde f(\lambda) = \lambda_{\min}(\lambda) \cdot I$, where $\lambda_{\min}(\lambda)$ is the smallest eigenvalue of $f(\lambda)$.
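For $n = 2$ the decomposition via the smallest eigenvalue can be written out in closed form at a fixed frequency. The following sketch is an illustration only (the function name and the numerical values are assumptions): it subtracts the smallest eigenvalue of a 2x2 Hermitian nonnegative definite matrix from its diagonal and checks that the remainder is singular.

```python
def min_eigenvalue_decomposition(f_x, f_y, f_xy):
    """Split f = [[f_x, f_xy], [conj(f_xy), f_y]] at one frequency into
    a singular part f_hat plus the diagonal part lam_min * I."""
    mean = 0.5 * (f_x + f_y)
    radius = (0.25 * (f_x - f_y) ** 2 + abs(f_xy) ** 2) ** 0.5
    lam_min = mean - radius                      # smallest eigenvalue of f
    f_hat = ((f_x - lam_min, f_xy),
             (f_xy.conjugate(), f_y - lam_min))
    return f_hat, lam_min

# Hypothetical values of the spectral density matrix at one frequency
f_hat, lam_min = min_eigenvalue_decomposition(3.0, 2.0, 1.0 + 1.0j)
det_f_hat = f_hat[0][0] * f_hat[1][1] - abs(f_hat[0][1]) ** 2
print(lam_min, det_f_hat)
```

This is just one admissible decomposition; as the text goes on to stress, for given $f$ the decomposition is in general not unique.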
Clearly, for given $f$, in general the decomposition (3.2) is not unique. Analogously to the static multivariate case the main problem then is to determine the maximum possible number of (independent) linear dynamic relations between the $\hat z_t$, $m^*$ say, over all decompositions for given $f$ and to describe the set of all observationally equivalent $w$. Note that (3.2) may also be formulated as a dynamic factor analysis model.
Now again we restrict ourselves to the case $n = 2$, i.e. when $x_t$ and $y_t$ are one-dimensional. We assume

(3.4)   $f_{\hat x}(\lambda) > 0 \quad \forall\lambda$

and that (3.1) can be written as

$\sum_{i=-\infty}^{\infty} w_i\, \hat z_{t-i} = 0.$

Clearly, for $n = 2$, $\hat f(\lambda)$ can either have rank one or rank zero. In the second case $\hat f(\lambda) = 0$ and $f_{xy}(\lambda) = 0$. Imposing (3.4) implies $m^* = 1$ and $\hat f(\lambda)$ has rank 1 for all $\lambda$. (3.4) is automatically fulfilled if $f_{xy}(\lambda) \neq 0$ for all $\lambda$. Thus under (3.4) the system can always be written as (1.1), where the transfer function $(w, -1)$ in (3.1) is unique for given $\hat f$, and $f_{xy}(\lambda) = 0$ implies $w(e^{-i\lambda}) = 0$.

If $f$ itself is singular, then $\hat f = f$ and $\tilde f = 0$ defines a decomposition corresponding to an error-free system. This decomposition is unique whenever $f_{xy}(\lambda) \neq 0$. For $\lambda$'s where $f_{xy}(\lambda) = 0$ we may have e.g. $f_y(\lambda) = 0$ ($f_x(\lambda) > 0$ and $f_{\hat x}(\lambda) > 0$), and $f_u(\lambda) > 0$ gives rise to another decomposition. Of course in this case we have for the corresponding transfer function $w(e^{-i\lambda}) = 0$. For the rest of the paper we always assume that $f(\lambda)$ is nonsingular on a set of Lebesgue measure greater than zero.
Besides the transfer function $w$, the other characteristics of interest are $f_{\hat x}$, $f_u$, $f_v$.

Analogously to the static case, the set of pairs $(f_{\hat x}, f_{\hat y})$ compatible with given $f$ satisfies

(3.5)   $0 < f_{\hat x} \le f_x$ ,  $f_{\hat x}(\lambda) = f_{\hat x}(-\lambda)$ ;  $f_{\hat x}$ is measurable
(3.6)   $0 \le f_{\hat y} \le f_y$ ,  $f_{\hat y}(\lambda) = f_{\hat y}(-\lambda)$ ;  $f_{\hat y}$ is measurable

and, since $\hat f$ is singular

(3.7)   $|f_{xy}|^2 = f_{\hat x}\, f_{\hat y}$

and (3.5)(3.6)(3.7) are the only restrictions on $(f_{\hat x}, f_{\hat y})$. Thus we have (Anderson and Deistler (1984), Deistler (1985a)):

Theorem 3.1: Consider the linear dynamic EV system (1.1),(1.3),(1.4). Under the additional assumptions (1.9)(1.10) and (3.4), the set of all transfer functions $w$ satisfying

(3.8)   $|f_{yx}(\lambda)| \cdot f_x^{-1}(\lambda) \le |w(e^{-i\lambda})| \le f_y(\lambda) \cdot |f_{xy}(\lambda)|^{-1}$ ,  $\varphi(w(e^{-i\lambda})) = \varphi(f_{yx}(\lambda))$   for $f_{xy}(\lambda) \neq 0$

where $\varphi(z)$ denotes the phase of the complex number $z$, and

(3.9)   $w(e^{-i\lambda}) = 0$   for $f_{xy}(\lambda) = 0$

is the set of all transfer functions $w$ corresponding to given $f$. The corresponding set of the other characteristics of interest, $f_{\hat x}$, $f_u$, $f_v$, satisfies the following relations

(3.10)   $f_{\hat x}(\lambda) = f_{yx}(\lambda) \cdot w^{-1}(e^{-i\lambda})$   for $f_{xy}(\lambda) \neq 0$
         $0 < f_{\hat x}(\lambda) \le f_x(\lambda)$   for $f_{xy}(\lambda) = 0$
(3.11)   $f_u = f_x - f_{\hat x}$
(3.12)   $f_v = f_y - w(e^{-i\lambda}) \cdot f_{xy}$
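At a single frequency the content of Theorem 3.1 reduces to a few arithmetic operations. The following sketch, with hypothetical function names and hypothetical spectral values, evaluates the gain band and the phase from (3.8) and recovers $(f_{\hat x}, f_u, f_v)$ from (3.10) - (3.12) for one admissible value of $w$.

```python
import cmath

def transfer_function_band(f_x, f_y, f_yx):
    """Gain bounds (3.8) and phase of w(e^{-i lambda}) at one frequency.

    f_x, f_y are real, f_yx is the complex cross spectrum.  For f_yx == 0
    the only compatible value is w = 0, see (3.9).
    """
    if f_yx == 0:
        return (0.0, 0.0, 0.0)
    gain_low = abs(f_yx) / f_x      # dynamic regression of y on x (u_t = 0)
    gain_high = f_y / abs(f_yx)     # inverse dynamic regression (v_t = 0)
    return (gain_low, gain_high, cmath.phase(f_yx))

def characteristics_for_w(w, f_x, f_y, f_yx):
    """(f_xhat, f_u, f_v) from (3.10)-(3.12) for a compatible w != 0."""
    f_xhat = (f_yx / w).real
    f_u = f_x - f_xhat
    f_v = f_y - (w * f_yx.conjugate()).real   # f_xy is the conjugate of f_yx
    return (f_xhat, f_u, f_v)

# Values generated by w = 1 + i, f_xhat = 1, f_u = 1, f_v = 0.5 (assumed)
w = 1.0 + 1.0j
f_x, f_y, f_yx = 2.0, 2.5, 1.0 + 1.0j
gain_low, gain_high, phase = transfer_function_band(f_x, f_y, f_yx)
recovered = characteristics_for_w(w, f_x, f_y, f_yx)
print(gain_low, gain_high, phase, recovered)
```

Any other value of $w$ with the same phase and a gain inside the band gives a different, equally admissible triple $(f_{\hat x}, f_u, f_v)$.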
Theorem 3.1 is the dynamic analogon to Theorem 2.1. It shows that the phase of the transfer function is uniquely determined from $f$, whereas the gain of $w$ may vary in a band whose boundaries (which depend on frequency) correspond to the dynamic regressions when either $u_t = 0$ or $v_t = 0$.

4. Causality

We here discuss two aspects of causality in the EV setting: First we consider the kind of additional identifying information obtained from an a priori causality assumption, i.e. if in (1.2) $w_i = 0$, $i < 0$ (and thus the summation is ranging from zero to infinity only). Second, the problem of what can be said about the causality status of the system, given the second moments of the observations, is treated.

Let us introduce some notation: For a polynomial

$p(B) = \sum_{i=0}^{\delta p} p_i B^i$ ,  $p_i \in \mathbb{R}$ ,  $B \in \mathbb{C}$

by $\delta p$ we denote its degree and by $p^*$ we denote the rational function

$p^*(B) = p(B^{-1}) = \sum_{i=0}^{\delta p} p_i B^{-i}$

Let $\delta p_0$ denote the multiplicity of the zero of $p(B)$ at $B = 0$; by $\tilde p(B)$ we denote the polynomial defined as

$\tilde p(B) = p^*(B) \cdot B^{\delta p}$

$\tilde p$ has degree equal to $\delta p - \delta p_0$. If $B_1 \ldots B_{\delta p - \delta p_0}$ are the nonzero roots of $p$, then $B_1^{-1} \ldots B_{\delta p - \delta p_0}^{-1}$ are the roots of $\tilde p$. $p_0 \neq 0$ implies $\delta\tilde p = \delta p$. Furthermore, we define $p^+$ and $p^-$ respectively by

$p = p^+ \cdot p^- \cdot B^{\delta p_0}$

and

$p^+(B) \neq 0$ , $|B| < 1$ ;  $p^-(B) \neq 0$ , $|B| \ge 1$ ;  $p^-(0) = 1$

Let $p_1, \ldots, p_4$ be polynomials. As well known, every rational function $f$ of the form

$f(e^{-i\lambda}) = p_1(e^{-i\lambda})\, p_2(e^{-i\lambda})^{-1}\, p_3^*(e^{-i\lambda})\, p_4^*(e^{-i\lambda})^{-1}$

defined on the unit circle of the complex plane has a unique rational extension to $\mathbb{C}$, given by

(4.1)   $f(B) = p_1(B)\, p_2(B)^{-1}\, p_3^*(B)\, p_4^*(B)^{-1}$

In this section we assume that the transfer function $w$ and spectra considered are rational, i.e.

(4.2)   $w(B) = a^{-1}(B)\, b(B)$
        $f_{\hat x} = d^{-1} e\, \sigma_\varepsilon\, e^* d^{-1*}$
        $f_u = c^{-1} h\, \sigma_\eta\, h^* c^{-1*}$
        $f_v = f^{-1} g\, \sigma_\nu\, g^* f^{-1*}$

where

$a(B) = \sum_{i=0}^{\delta a} a_i B^i$ ,  $b(B) = \sum_{i=0}^{\delta b} b_i B^i$
$d(B) = \sum_{i=0}^{\delta d} d_i B^i$ ,  $e(B) = \sum_{i=0}^{\delta e} e_i B^i$
$c(B) = \sum_{i=0}^{\delta c} c_i B^i$ ,  $h(B) = \sum_{i=0}^{\delta h} h_i B^i$
$f(B) = \sum_{i=0}^{\delta f} f_i B^i$ ,  $g(B) = \sum_{i=0}^{\delta g} g_i B^i$

are polynomials, where in (4.2) we consider the spectra to be defined (by the rational extension described above) on $\mathbb{C}$ (rather than on $|B| = 1$ or on $[-\pi,\pi]$), where a factor of $2\pi$ has been omitted, and where we in addition assume:

(4.3)   $a(B) \neq 0$ , $|B| = 1$
        $d(B) \neq 0$ , $|B| \le 1$ ;  $e(B) \neq 0$ , $|B| < 1$
        $c(B) \neq 0$ , $|B| \le 1$ ;  $h(B) \neq 0$ , $|B| < 1$
        $f(B) \neq 0$ , $|B| \le 1$ ;  $g(B) \neq 0$ , $|B| < 1$

such that $\sigma_\varepsilon$, $\sigma_\eta$, $\sigma_\nu$ are the respective innovation variances. $e(B) \neq 0$, $|B| = 1$ then is equivalent to $f_{\hat x}(B) \neq 0$, $|B| = 1$, i.e. to (3.4).
And finally it is assumed:

(4.4)   $a, b$ are relatively prime and so are $d, e$ and $c, h$ and $f, g$.
(4.5)   $a^+(0) = d(0) = e(0) = c(0) = h(0) = f(0) = g(0) = 1$

If we impose (3.4), then the assumptions (4.3)(4.4)(4.5) are costless in the sense that they do not restrict the class of transfer functions $w$ and spectra $f_{\hat x}$, $f_u$, $f_v$ considered, but serve only as norming conditions to obtain unique parameters. It should be stressed that concerning $b$ we neither impose the miniphase assumption $b(B) \neq 0$, $|B| < 1$, nor do we assume $b(0) = 1$. The assumption $a(B) \neq 0$, $|B| = 1$ guarantees the existence of a stationary solution (1.1) of $a(B)\hat y_t = b(B)\hat x_t$.

(4.6)   $a(B) \neq 0$ , $|B| \le 1$

then is the causality assumption.

The rationality assumptions often are imposed, even if we do not a priori know that the true transfer function and spectra are rational, in order to approximate the true quantities by quantities which can be described by a finite number of parameters.
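The reversed polynomial $\tilde p(B) = p^*(B) \cdot B^{\delta p}$ from the notation of this section amounts to reversing the coefficient list. A minimal sketch, in which the list representation `[p_0, ..., p_deg]` and the function names are assumptions made for illustration:

```python
def reverse_polynomial(p):
    """Coefficients of p~(B) = p(1/B) * B**deg(p).

    p is a list [p_0, p_1, ..., p_deg] with p[-1] != 0.  Zeros of p at
    B = 0 show up as vanishing leading coefficients of p~, which are dropped,
    so that p~ has degree deg(p) minus the multiplicity of the zero at 0.
    """
    if not p or p[-1] == 0:
        raise ValueError("leading coefficient must be nonzero")
    reversed_coeffs = p[::-1]
    while len(reversed_coeffs) > 1 and reversed_coeffs[-1] == 0:
        reversed_coeffs.pop()
    return reversed_coeffs

def evaluate(p, B):
    """Evaluate the polynomial with coefficient list p at B (Horner scheme)."""
    value = 0
    for coeff in reversed(p):
        value = value * B + coeff
    return value

p = [0, 1, -2]                  # p(B) = B - 2 B^2, roots 0 and 1/2
p_tilde = reverse_polynomial(p)
print(p_tilde)
```

In the example the nonzero root $1/2$ of $p$ is mapped to the root $2$ of $\tilde p$, and the zero of $p$ at the origin lowers the degree of $\tilde p$ by one.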
Note that even if $f$ is rational, the rationality assumption for the matrices $\hat f$ and $\tilde f$ in the decomposition (3.2) is an additional a-priori restriction, i.e. in general there are also non-rational $\hat f$, $\tilde f$. This is easily seen as, e.g. for $f_x(\lambda) > 0$, $f_y(\lambda) > 0$, $\det f > 0$, $\forall\lambda$, every $f_{\hat x}$ such that

$|f_{xy}(\lambda)|^2\, f_y(\lambda)^{-1} \le f_{\hat x}(\lambda) \le f_x(\lambda)$,

where $f_{\hat x}$ is measurable, defines a decomposition (3.2) via (3.7), and clearly $f_{\hat x}$ can be chosen so as to be non-rational, as, by the nonsingularity of $f(\lambda)$, the strict inequality

$|f_{xy}(\lambda)|^2 < f_x(\lambda)\, f_y(\lambda)$

holds. On the other hand, if $f$ is rational, a rational decomposition always exists: Take for instance $f_{\hat x} = f_x$; then (3.7), (3.11) and (3.12) define rational $f_{\hat y}$, $f_u$ and $f_v$.

Consider the following example (Anderson and Deistler (1984)): Let

(4.7)   $f_x(\lambda) = c_1 > 0$
        $f_{xy}(\lambda) = c_2 \cdot (1 + b e^{-i\lambda})^*$ ;  $|b| < 1$ ,  $c_1 > c_2 > 0$
        $f_y(\lambda) = |1 + b e^{-i\lambda}|^2 \cdot c_2 + c_3$ ;  $c_3 > 0$

Then the set of all feasible $f_{\hat x}$ is given by

$\dfrac{c_2^2}{c_2 + c_3\,|1 + b e^{-i\lambda}|^{-2}} \le f_{\hat x}(\lambda) \le c_1$ ,  $f_{\hat x}$ measurable

Fig. 4: Range of feasible $f_{\hat x}$
The range of feasible $f_{\hat x}$ in this case contains all rational spectral densities of the form
c. p (e-il-z_.) .P(e-il-zj) I J I
f~(l)
. ~q(e-il_wj) c q. z~e -il -w.;. °1 ] I for arbitrarily
chosen
Izjl<1,
lwjl<1, p and q and for suitably
chosen c, c o . Without restriction of generality we assume zj ~ w i. Under the rationality assumptions
of this section,
ween the second moments of the observations
the relation bet-
and the characteristics
of interest is of the form:
(4.8)
fx = d-le qee,d-1* + c-lh q h * c -I*
(4.9)
fxy = a-lb d-le see,d -I*
(4.10)
fy = a-lb d-le q e,d-l*b*a-1*
+ f-1 g q~g.f-1*
In a first step, we analyse the information about $w$ (and $f_{\hat x}$) coming from $f_{yx}$: If $w$, $f_{\hat x}$ correspond to an EV system also satisfying
then we have: (4.11)
(4.12)
: w.f . ilo cf2 2f 1 1 f - f2 w c-lf1 if 1 1.B6f2- flf
where, using an obvious notation --I c = qeoe , and thus fi are polynomials (4.13)
fi(B)
+ 0
fl = ed,
f2 = ed
satisfying IBI < 1
;
i = 1,2
(4.9)
(4.14)   $f_i(0) = 1$ ;  $i = 1,2$

and

(4.15)   $c > 0.$
Conversely, defines
if
let
Then
n = ~b
less
than
f
yx
us w r i t e
one minus
4.1:
(iii)
the
(i) a n d
(ii)
assumptions
exist
n = 6b
+ 6b O - ~a
with
same
then
is b o t h
function
fyx
(4.6)
if a n d
(4.14)
and
of
zeros
our
assumptions)
of w of m o d u l u s
-
following
(4.5)
that
invariant
is a t r a n s f e r
function
same
satisfying
(4.11)
one.
given
Then:
to the
fl,f2
than
for
theorem:
hold.
corresponding
polynomials
less
determined
- ~a ° is a n
causal
there
fyx (4.13)
holds
for
all
EV systems
there
which
only
which
that
then
if t h e r e
(corresponding
to
function
which
is c a u s a l ,
but
is m i n i p h a s e .
is n o c a u s a l
known
holds,
w
and miniphase
is a t r a n s f e r
function
it is a p r i o r i if
in the
c > 0 such
there
If n < 0, t h e n
i.e.
obviously
f
If n > 0, t h e n
If
(4.12)
of w of m o d u l u s
(4.1)
if t h e r e
and a constant
is a t r a n s f e r
(vi)
and
satisfying
n is u n i q u e l y
if a n d o n l y
which
(and
of p o l e s
see t h a t
functions
no transfer
(v)
we
shown
Let
If n = 0,
(4.11)
- ~a O is t h e n u m b e r
the n u m b e r
(4.13)
the
then
(4.9)
transfer
fxy)
(iv)
via
yx
w and w are
(4.14)
(ii)
f
hold,
w = b + b - B ~ b ° ( a + a - B~a°) -I
. Thus we have
(i)
(4.15)
+ ~b O - ~a
(4.11) a n d
Theorem
-
a w a n d f^ g i v i n g x
Now,
From
(4.13)
transfer
function,
but
there
is m i n i p h a s e
the
transfer
w = a-lb exists
functions
and w correspond
a polynomial
f2
are causal, to the
satisfying
same (4.13)
(4.16)   $0 \le \delta f_2 \le n$

and a constant $c > 0$ such that

(4.17)   $\bar w = c \cdot (b^+ \tilde a^-)(a^+ \tilde b^-)^{-1} \cdot f_2\, \tilde f_2 \cdot B^{\,n - \delta f_2}$
holds Proof:
If in
(4.11) w e take f2 = a w = c ( b + ~ -)
and fl
(a+~-)-lB n
w h e r e ( b + ~ - ( a + ~ -) is a causal and m i n i p h a s e with
(ii) this implies
(iii) - (v) and a l s o
This result has been stated in D e i s t l e r
non n e c e s s a r i l y
case a n d in G r e e n a n d A n d e r s o n From
rational (1985)
transfer (vi)
function;
together
is e a s i l y seen.
(1985a)and D e i s t l e r
Partly more general r e s u l t s have been given for the causal,
then
in Anderson,
(1985 b).
B.D.O.
(1985)
single input - single output
for a causal m u l t i v a r i a b l e
case.
From (4.17) we see that the causality assumption gives a substantial reduction of the set of all transfer functions compatible with given $f_{yx}$. If $n = 0$, then in the causal case, $w$ is unique up to multiplication by a positive constant (this has been pointed out by Hinich (1983) and Anderson, B.D.O. (1985)). An estimation procedure for the case $n = 0$ has been developed by Hinich (1983) and Hinich and Weber (1984).

5. Conditions for Identifiability from the Second Moments of the Observations

This section consists of two parts: In the first part, we investigate the additional information coming from (4.8) and (4.10). So this is the second step of the analysis of the previous section for the rational case. Special emphasis is put on identifiability. In the second part we give some other conditions for identifiability.

We have (see Anderson and Deistler (1984), Anderson, B.D.O. (1985), Deistler (1985a) (1985b), Maravall (1979), Nowak (1983)):

Theorem 5.1: Let the assumptions (4.1) - (4.5) hold. Then:

(i) If the transfer functions are a priori known to be causal and if either $n = 0$ or if

(5.1)   $d, b$ are relatively prime
(5.2)   $a, e$ are relatively prime
(5.3)   $b^+, \tilde b^-$ are relatively prime

then $w$ is uniquely determined from $f$ under each of the following conditions:

(5.4)   $d, c$ are relatively prime and $\delta d > 0$
(5.5)   $a \cdot d, f$ are relatively prime and $\delta ad > 0$
(5.6)   $\delta d = 0$ and $\delta e > \delta h - \delta c$
(5.7)   $\delta ad = 0$ and $\delta e + \delta b > \delta g - \delta f$

(ii) If $w$ is a priori assumed to be causal and if $d$ and $c$ are a priori assumed to be relatively prime, then there is only a finite number of factors $f_2$ in (4.17) compatible with given $f_x$.

(iii) If the transfer functions are not necessarily causal, and if the assumptions (5.1) - (5.3) and

(5.8)   $a, \tilde e$ are relatively prime
(5.9)   $a^+, \tilde a^-$ are relatively prime

hold, then under (5.4) or (5.5) or (5.6) or (5.7) $w$ is unique.

(iv) If $w$, $f_{\hat x}$ corresponds to given $f_{yx}$, then all $cw$, $c^{-1} f_{\hat x}$, where $c$ satisfies

(5.10)   $0 < c_{\min} \le c \le c_{\max}$

with $c_{\min}$ and $c_{\max}$ defined by

(5.11)   $\min_{|B|=1}\, (f_x(B) - c_{\min}^{-1} f_{\hat x}(B)) = 0$

and

(5.12)   $\min_{|B|=1}\, (f_y(B) - c_{\max} |w(B)|^2 f_{\hat x}(B)) = 0$

correspond to given $f$, and for all other $c$, $cw$, $c^{-1} f_{\hat x}$ does not correspond to given $f$.

Proof:
(i): If $n = 0$, then as has already been stated, $w$ is uniquely determined from (4.9) up to multiplication by a positive constant. The same holds under (5.1) - (5.3): Due to (5.1) and (5.2) no pole-zero cancellations on the r.h.s. in (4.9) can occur and thus $a$ and $d$ are uniquely determined from the poles in $f_{yx}$. By (5.3), $e$ then is uniquely determined from those zeros of $a\,d\,d^* f_{yx}$, $B_i$ say, where also $B_i^{-1}$ is a zero, and thus $b$ and $w$ are unique up to multiplication by a positive constant.

From (4.8) we have

(5.13)   $d f_x d^* = e\, \sigma_\varepsilon\, e^* + d\, c^{-1} h\, \sigma_\eta\, h^* c^{-1*} d^*$

If (5.4) holds, then there exists at least one zero of $d$, $B_1$ say, and we have

$d f_x d^*(B_1) = e\, \sigma_\varepsilon\, e^*(B_1)$

and from this $\sigma_\varepsilon$ and thus $b$ are uniquely determined. The proof for (5.5) is completely analogous.

If (5.6) holds then (4.8) is of the form

$f_x = e\, \sigma_\varepsilon\, e^* + c^{-1} h\, \sigma_\eta\, h^* c^{-1*}$

and thus $c$ is uniquely obtained from the poles of $f_x$. Then $\sigma_\varepsilon$ is obtained from a comparison of coefficients of power $\delta e + \delta c$ in

$c f_x c^* = c\, e\, \sigma_\varepsilon\, e^* c^* + h\, \sigma_\eta\, h^*$

and in the same way we proceed if (5.7) holds.

The proof of (iii) is completely analogous, since (5.1) - (5.3)(5.8)(5.9) here again guarantee that $w$ is determined from $f_{yx}$ up to multiplication by a positive constant.

(ii) If $d$ and $c$ are relatively prime, then all zeros of $d$ are poles of $f_x$ and thus there is only a finite number of candidates for $d$ and thus also for $f_2$ in (4.17).

(iv) is an immediate consequence of Theorem 4.1 and of (4.8) and (4.10), taking into account the non-negativity of spectral densities for $|B| = 1$.

Clearly, once $w$ is uniquely determined and if $w(e^{-i\lambda}) \neq 0$ then also $f_{\hat x}$, $f_u$, $f_v$ are unique. (i) and (iii) show that e.g. using (4.19), once the degrees are prescribed and if $\delta d > 0$, we have identifiability on a generic subset of the parameter space.

Now
we discuss some other cases where additional
guarantee
identiflability
(i) Let the inputs xt have a spectral distribution (5.14)
F^(I) x
=
a priori restrictions
from the second moments of the observations
; f^dl + Z Fx, j [_~,~] x j:lj<~_
function FR given by ;
F
. > 0 x,3
Thus (x t) is a fairly general process where F x has an absolutely
con-
tinuous and a discrete part and where the discrete part corresponds to a stationary harmonic process ZeiljtZx,j, where Fx, j = ElZx,jl 2. Here we do not impose
(1.9). By assumption
(1.8),
(ut,v t) has a spectral
density and thus we have (5.15)
Fx(1)
=
S (f~+fu)dl [-z,l]
+
Z Fx, j j:lj
and (5.16)
5 fvu dl Fyx (l) = [-z!l] w (e-il)dFR (l) + [-~,l]
=
5 w(e-il) ( ~ + f u ) d l [-~,~]
Thus, from the jumps Fx, j and Fxy,j
+
Z w(e-~J)Fx, j + [ 5 ]fuv dl j:~j&~ -~,~
in F x and Fxy we obtain:
62
(5.17)
w(e -ilJ) ~
..F-1
= F xY,3
If w = a-l.b
is rational
°
x,3
with p r e s c r i b e d
(maximal)
for a and b respectively
then w is determined
from na+nb+1
values
Clearly (ii)
(different)
this result can be extended
If it is a priori known, fu(l)
and therefore,
(5.18)
to the multivariate
open set then clearly
then
j = 1...na÷nb+1.
that f~(1)
fuv(1)
= fyx(1)f~(1)-1 it is uniquely
determined
(4.8) we see that if c=I, d is uniquely
holds, d
(5.18),
since
determined
from f : x the unit circle
d is unique.
if the input errors
e, then w is unique
= fx(1)
at some point I£A.
which are located outside
x are the zeros of d and as d(0)=1, (1.9)
of w(e -il)
from
(iii) From
In this case the poles of f
= 0 and f~(1)
16A
e.g. by the derivatives
If
have the property
# 0 , 16A, we have
A is open,
and if
case.
VI6A~[0,~]
provided
w(e -il)
If w is rational,
(e-ilJ))
that the input-errors
= 0
where A is a nonempty VI6A,
(l@,w
degrees na and nb say
under our assumptions
u t are white noise
(S~derstr~m
(1980)).
(i.e. c-l.h=1)
From
(4.8) we
obtain: (5.19)
d f~d* = e%o e* + do d*
As d is uniquely corresponding zation of
determined
to power
(fR-o~)
6d in
(iv) If
(5.19)
into factors
zeros inside the unit circle (1.9) holds,
a moving average
from fx' a comparison
if
process
gives
of coefficients
oH. Then the usual
that have no poles
factori-
inside or on and no
gives d-le and oE and thus also f~ and w.
(S t) is autoregressive (i.e. c=I)
and if ~d>0
(i.e. e=1) (i.e.
and
(xt))
(u t) is
is auto-
63 regressive
in the narrow
(S~derstr6m
fs
Again,
sense,
(1980)).Here
=
d is uniquely df~d*
not white noise)
(4.8)
d-1~E d-l*
+
determined
=
G
then w is unique
is of the form:
ha
P h*
from fx" Now
+ dhap
h'd*
and if B I is a zero of d, we obtain = df~d* (B I ) and thus fs and w are unique. A common
feature
of the cases
(i) - (iv)
(4.8) and the extra assumptions once f~ is known, conditions
on the spectra
the uniqueness
may be imposed
is that f~ is obtained
fs and fu imposed;
of W,fu,f v is immediate.
to detect
f~ from
from
(4.10).
Analogous
Once f9 is unique,
we have
provided
that $f_{xy}(\lambda) \neq 0 \;\forall\lambda$ and the rest is easy.

6. Identifiability from High Order Moments

For non Gaussian observations, analogously to the static case, information coming from moments of order greater than two may be useful for identifiability (Akaike (1966), Deistler (1986)). In addition to our standard assumptions we here assume:

(6.1) $(\hat x_t)$ and $(u_t, v_t)$ are strictly stationary processes; the processes are (mutually) independent and all moments up to order $n$, where $n$ is sufficiently large, exist and the cumulants of $(\hat x_t)$ and of $(\hat y_t)$ satisfy conditions of the form

$$\sum_{t_1,\dots,t_{n-1}=-\infty}^{\infty} \left| c_{\hat y_{t_1}\hat y_{t_2}\cdots\hat y_{t_r}\hat x_{t_{r+1}}\cdots\hat x_{t_{n-1}}\hat x_0} \right| < \infty$$
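and the same holds for $(u_t)$ and $(v_t)$. Then (see e.g. Brillinger (1981)) the corresponding $n$-th order cumulant spectrum exists and is given by

$$f_{\hat y^r \hat x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) = (2\pi)^{-n+1} \sum_{t_1,\dots,t_{n-1}=-\infty}^{\infty} c_{\hat y_{t_1}\cdots\hat y_{t_r}\hat x_{t_{r+1}}\cdots\hat x_{t_{n-1}}\hat x_0} \exp\Big\{-i\sum_{j=1}^{n-1}\lambda_j t_j\Big\} \tag{6.2}$$

and analogously for $(u_t)$ and $(v_t)$. As is easily seen, due to linearity and continuity of cumulants with respect to one variable (when the others are kept constant), we obtain from (6.2) and (1.1):

$$f_{\hat y^r \hat x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) = (2\pi)^{-n+1} \sum_{t_1,\dots,t_{n-1}=-\infty}^{\infty} \Big( \sum_{i=-\infty}^{\infty} w_i\, c_{\hat x_{t_1-i}\hat y_{t_2}\cdots\hat y_{t_r}\hat x_{t_{r+1}}\cdots\hat x_0} \Big) \exp\Big\{-i\sum_{j=1}^{n-1}\lambda_j t_j\Big\} = w(e^{-i\lambda_1})\, f_{\hat y^{(r-1)}\hat x^{(n-r+1)}}(\lambda_1 \dots \lambda_{n-1}). \tag{6.3}$$

Furthermore, from the properties of the cumulants we obtain

$$f_{y^r x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) = f_{\hat y^r \hat x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) + f_{v^r u^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}). \tag{6.4}$$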
If, in addition, we assume

(6.5) $(u_t)$ and $(v_t)$ are independent

then

$$f_{v^r u^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) = 0 \quad \text{for } r > 0,\ n - r > 0. \tag{6.6}$$

If we assume

(6.7) $(u_t, v_t)$ is Gaussian,

then $f_{v^r u^{(n-r)}} = 0$ for all $n > 2$. Thus we have

Theorem 6.1: Consider the (dynamic) EV-model (1.1), (1.3), (1.4). If in addition the assumptions (6.1), (6.5) and

$$f_{y^{(r-1)} x^{(n-r+1)}}(\lambda_1 \lambda_2 \dots \lambda_{n-1}) \neq 0 \quad \forall\lambda_1, \tag{6.8}$$

for suitable $\lambda_2 \dots \lambda_{n-1}$ and suitable $n > 2$; $r - 1 > 0$, $n - r > 0$, are satisfied, then $w$ is uniquely determined from

$$w(e^{-i\lambda_1}) = f_{y^r x^{(n-r)}}(\lambda_1 \dots \lambda_{n-1}) \cdot f_{y^{(r-1)} x^{(n-r+1)}}(\lambda_1 \dots \lambda_{n-1})^{-1} \tag{6.9}$$
An analogous result holds if (6.5) is replaced by (6.7). Theorem 6.1 is the dynamic analogue of Theorem 2.2. If $\hat x_t$ has a Wold decomposition (see e.g. Hannan (1970))

$$\hat x_t = w_2(z)\varepsilon_t$$

where $(\varepsilon_t)$ is i.i.d., then (see e.g. Brillinger (1981))

$$f_{\hat x^n}(\lambda_1 \dots \lambda_{n-1}) = (2\pi)^{-n+1}\, w_2(e^{-i\lambda_1}) \cdots w_2(e^{-i\lambda_{n-1}}) \cdot w_2\Big(\exp\Big(i\sum_{j=1}^{n-1}\lambda_j\Big)\Big) \cdot c_{\varepsilon^n}. \tag{6.10}$$

If all moments of $\varepsilon_t$ exist and if $\varepsilon_t$ is non Gaussian (provided that $\varepsilon_t \neq 0$), there is an $n > 2$ such that $c_{\varepsilon^n} \neq 0$. If in addition $w_2(e^{-i\lambda}) \neq 0 \;\forall\lambda$, then $f_{\hat x^n}(\lambda_1 \dots \lambda_{n-1}) \neq 0 \;\forall\lambda_1 \dots \lambda_{n-1}$ and then, due to (6.3), condition (6.8) is fulfilled.

The generalization of Theorem 6.1 to the multivariable case is straightforward. Estimators of the transfer function $w$ may be obtained from (6.9) by replacing the cumulant spectra by their estimators.
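The ratio in (6.9) can be illustrated in the simplest static setting. The following is only a sketch under stated assumptions, not the estimator of the text: it takes a static model $y = w\hat x + v$, $x = \hat x + u$ with mutually independent $\hat x$, $u$, $v$, uses $n = 3$, $r = 2$, and computes exact third-order cumulants by enumerating small discrete distributions. The distributions, the value `W_TRUE` and all function names are hypothetical.

```python
from itertools import product

# Hypothetical discrete distributions as (value, probability) pairs.
# xhat must be skewed so that its third cumulant is nonzero.
XHAT = [(0.0, 0.7), (1.0, 0.2), (3.0, 0.1)]
U = [(-1.0, 0.5), (0.5, 0.25), (1.5, 0.25)]
V = [(-0.5, 0.6), (0.75, 0.4)]
W_TRUE = 2.5  # assumed static "transfer function"

# Joint sample space of the mutually independent triple (xhat, u, v).
SPACE = [
    (px * pu * pv, (xh, u, v))
    for (xh, px), (u, pu), (v, pv) in product(XHAT, U, V)
]


def observed_x(outcome):
    """Observed input x = xhat + u."""
    return outcome[0] + outcome[1]


def observed_y(outcome):
    """Observed output y = w * xhat + v."""
    return W_TRUE * outcome[0] + outcome[2]


def mean(f):
    return sum(p * f(o) for p, o in SPACE)


def third_cumulant(f, g, h):
    """Joint third cumulant; at order three it equals the central cross-moment."""
    mf, mg, mh = mean(f), mean(g), mean(h)
    return sum(p * (f(o) - mf) * (g(o) - mg) * (h(o) - mh) for p, o in SPACE)


# Static counterpart of (6.9) with n = 3, r = 2: w = cum(y, y, x) / cum(y, x, x).
w_recovered = (
    third_cumulant(observed_y, observed_y, observed_x)
    / third_cumulant(observed_y, observed_x, observed_x)
)
print(w_recovered)
```

Because cumulants of independent variables vanish, the noise contributions drop out of both numerator and denominator, which is the mechanism behind (6.6). With sample cumulants in place of exact ones the ratio would only be an estimate.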
References

Aigner, D.J. and A.S. Goldberger (Eds.) (1977): Latent Variables in Socio-Economic Models. North Holland P.C., Amsterdam
Aigner, D.J., C. Hsiao, A. Kapteyn and T. Wansbeek (1984): Latent Variable Models in Econometrics. In: Griliches, Z. and M.D. Intriligator (Eds.) Handbook of Econometrics. North Holland P.C., Amsterdam
Akaike, H. (1966): On the Use of Non-Gaussian Process in the Identification of a Linear Dynamic System. Annals of the Institute of Statistical Mathematics 18, 269 - 276
Anderson, B.D.O. (1985): Identification of scalar errors-in-variables models with dynamics. Forthcoming in Automatica
Anderson, B.D.O. and M. Deistler (1984): Identifiability in Dynamic Errors-in-Variables Models. Journal of Time Series Analysis 5, 1 - 13
Anderson, T.W. (1984): Estimating Linear Statistical Relationships. Annals of Statistics 12, 1 - 45
Brillinger, D.R. (1981): Time Series: Data Analysis and Theory. Expanded Edition. Holden Day, San Francisco
Deistler, M. (1984): Linear errors-in-variables models. In: J. Franke, W. Härdle and D. Martin (Eds.), Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics, Springer-Verlag, Berlin
Deistler, M. (1985a): Linear dynamic errors-in-variables models. In: J. Gani and M. Priestley (Eds.) Essays in Time Series and Allied Processes. Forthcoming
Deistler, M. (1985b): Identifiability and Causality in Linear Dynamic Errors-in-Variables Systems. In: Proc. 5th Franco Belgian Meeting of Statisticians. Forthcoming
Deistler, M. and H.G. Seifert (1978): Identifiability and Consistent Estimability in Dynamic Econometric Models. Econometrica 46, 969 - 980
Drion, E.F. (1951): Estimation of the Parameters of a Straight Line and of the Variances of the Variables, if they are Both Subject to Error. Indagationes Math. 13, 256 - 260
Frisch, R. (1934): Statistical Confluence Analysis by Means of Complete Regression Systems. Publication No. 5, University of Oslo, Economic Institute
Fuller, W.A. (1980): Properties of some Estimators for the Errors-in-Variables Model. Annals of Statistics 8, 407 - 422
Geary, R.C. (1942): Inherent Relations between Random Variables. Proceedings of the Royal Irish Academy, Sec. A, 47, 63 - 76
Geary, R.C. (1943): Relations between Statistics: The General and the Sampling Problem When the Samples are Large. Proceedings of the Royal Irish Academy, Sec. A, 49, 177 - 196
Gini, C. (1921): Sull'interpolazione di una retta quando i valori della variabile indipendente sono affetti da errori accidentali. Metron 1, 63 - 82
Green, M. and B.D.O. Anderson (1985): Identification of multivariable errors-in-variables models with dynamics. Mimeo.
Hannan, E.J. (1970): Multiple Time Series. Wiley, New York
Hannan, E.J. and L. Kavalieris (1984): Multivariate Linear Time Series Models. Advances in Applied Probability 16, 492 - 561
Hinich, M.J. (1983): Estimating the Gain of a Linear Filter from Noisy Data. In: D.R. Brillinger and P.R. Krishnaiah (Eds.) Handbook of Statistics, Vol 3. North Holland, Amsterdam
Hinich, M.J. and W.E. Weber (1984): Estimating Linear Filters with Errors in Variables Using the Hilbert Transform. Federal Reserve Bank of Minneapolis, Res. Dept. Staff Report 96
Kalman, R.E. (1982): System Identification from Noisy Data. In: A. Bednarek and L. Cesari (Eds.) Dynamical Systems II, a University of Florida International Symposium. Academic Press, New York
Kalman, R.E. (1983): Identifiability and Modeling in Econometrics. In: Krishnaiah, P.R. (Ed.) Developments in Statistics, Vol 4. Academic Press, New York
Kendall, M.G. and A. Stuart (1969): The Advanced Theory of Statistics. Vol I, 3rd Edition, Griffin, London
Klepper, S. and E. Leamer (1984): Consistent Sets of Estimates for Regressions with Errors in all Variables. Econometrica 52, 163 - 183
Madansky, A. (1959): The Fitting of Straight Lines when Both Variables are Subject to Error. Journal of the American Statistical Association 54, 173 - 205
Maravall, A. (1979): Identification in Dynamic Shock-Error Models. Springer Verlag, Berlin
Moran, P.A.P. (1971): Estimating Structural and Functional Relationships. Journal of Multivariate Analysis 1, 232 - 255
Nowak, E. (1983): Identification of the Dynamic Shock-Error Model with Autocorrelated Errors. Journal of Econometrics 23, 211 - 221
Picci, G. (1985): Factor Analysis Models via Stochastic Realization Methods. This Volume
Reiersøl, O. (1941): Confluence Analysis by Means of Lag Moments and other Methods of Confluence Analysis. Econometrica 9, 1 - 24
Reiersøl, O. (1950): Identifiability of a Linear Relation Between Variables which are subject to Error. Econometrica 18, 375 - 389
Schneeweiß, H. and H.J. Mittag (1985): Lineare Modelle mit fehlerbehafteten Daten. Physica Verlag, Würzburg
Scott, E.L. (1950): Note on Consistent Estimates of the Linear Structural Relation Between two Variables. Annals of Mathematical Statistics 21, 284 - 288
Söderström, T. (1980): Spectral Decomposition with Application to Identification. In: Archetti, F. and M. Cugiani (Eds.) Numerical Techniques for Stochastic Systems. North Holland P.C., Amsterdam
Wegge, L. (1983): ARMAX-Models Parameter Identification without and with Latent Variables. Working Paper. Dept. of Economics, Univ. of California, Davis.
Chapter 3

A New Class of Dynamic Models for Stationary Time Series

Giorgio Picci and Stefano Pinzoni
1. Introduction

In this note we shall discuss a new class of dynamic models which may be better suited than conventional ARMAX schemes to describe non-causally interacting time series. Typical areas of application that we have in mind include econometrics (where it is often not clear which variables are "endogenous" and which are "exogenous") and identification of industrial processes operating under feedback. In these situations there is no a priori clear causality relation among the variables and, in fact, a possible goal of the identification experiment could be testing for the existence of causal relations. The class of models introduced here is a natural dynamic generalization of the well-known static Factor Analysis model which, in various equivalent forms (the most popular of which seems to be the so-called Errors-In-Variables scheme), has been the object of much study in the past, especially by econometricians and psychologists.
(For definitions of these concepts and a rather comprehensive survey of the literature one may consult the recent paper by Van Schuppen (1985).) The study of these models has recently been revitalized by Kalman in a series of papers (Kalman, 1982a, 1982b and 1983) and some of the critiques presented in Kalman's work have been the motivating stimulus for the earlier paper (Finesso and Picci, 1984). The present exposition represents the natural continuation and generalization of the results presented there. In order to improve readability we have chosen to skip some non essential technical details. A more complete story can be found in (Picci and Pinzoni, 1986). People interested in general philosophical discussions on the modelling problem considered here are referred to the introduction of (Finesso and Picci, 1984).

We should mention that some of the specific issues dealt with in this paper are also treated (in the scalar E.I.V. context) in the work of Anderson and Deistler (1984), Anderson (1985), Deistler (1985). Although the primary motivations (and hence the basic assumptions) in these papers are of a rather different nature than ours, the reader might find some ground for comparisons in the discussion of the causality problem presented in section 4.

For the sake of motivating the introduction of Dynamic Factor Analysis models we shall briefly review the definition of causality of a dynamical model, first in the deterministic and then in the stochastic (Gaussian) case. The idea that we want to convey is that causal models are quite "nongeneric" mathematical descriptions to impose a priori on real data, e.g. economic time series or data coming from industrial processes involving feedback.

In a deterministic framework the notion of causality is of course well known. Assume that the components of the m-dimensional variable y(t), whose temporal evolution is described by a certain dynamical model, have been grouped in two subvectors,
$$y(t) = \begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix}, \tag{1.1}$$

with $y_i(t) \in \mathbb{R}^{m_i}$, $i = 1,2$, and $m_1 + m_2 = m$. It is intuitively clear that a dynamical model should quantify the dynamic relation occurring between the variables $y_1$ and $y_2$ (i.e. how much $y_1$ "influences" $y_2$ and vice versa). This is made precise in J.C. Willems' refoundation of Systems Theory (Willems, 1979): any model (or equivalently dynamical system) with external variables $y$ is just a subset of trajectories $\mathcal{B}$ (called the behaviour of the system) in $(\mathbb{R}^m)^{\mathbb{Z}} = (\mathbb{R}^{m_1})^{\mathbb{Z}} \times (\mathbb{R}^{m_2})^{\mathbb{Z}}$ and therefore a bona fide relation between $y_1$ (ranging over $(\mathbb{R}^{m_1})^{\mathbb{Z}}$) and $y_2$ (ranging over $(\mathbb{R}^{m_2})^{\mathbb{Z}}$). We say that $y_1$ causes $y_2$ or, equivalently, that $y_1$ is the input and $y_2$ is the output variable of the system, if this relation specializes to a very particular kind of function, namely if

$$y_2(t) = f(y_1), \quad t \in \mathbb{Z}, \tag{1.2}$$

where $f$ depends only on the values taken by $y_1$ before and at time $t$.

In the stochastic case the sharply defined subset $\mathcal{B}$ is replaced by a probability measure on the sample space $(\mathbb{R}^m)^{\mathbb{Z}}$ and thus the external variable $y$ becomes a stochastic process
{y(t)}. The model is in this case just the probability law of {y(t)}. To keep things simple we shall consider here about the simplest possible class of random processes, described in the following

BASIC ASSUMPTION The process $\{y(t)\}$ is an $m$-dimensional Gaussian stationary process with zero mean and has a rational spectral density $S$ strictly positive definite on the unit circle (i.e. $S(e^{i\theta}) > 0$). []

We shall write the spectrum $S$ in a partitioned form corresponding to the subdivision (1.1) of the external variables,

$$S = \begin{bmatrix} S_1 & S_{12} \\ S_{21} & S_2 \end{bmatrix}, \tag{1.3}$$

where the blocks $S_i$, of dimension $m_i \times m_i$, represent the auto spectra and $S_{12}$ the cross spectrum of the two components $\{y_1(t)\}$ and $\{y_2(t)\}$ of dimension $m_1$ and $m_2$. The definition of causality in this context, essentially due to Granger (1963 and 1969), sounds as follows.
DEFINITION 1.1 We say that the process $y_1$ causes $y_2$ or, equivalently, that $y_1$ is an input process with corresponding output $y_2$ if, for all $t \in \mathbb{Z}$,

$$E[y_2(t) \mid y_1] = E[y_2(t) \mid y_1(s),\ s \le t], \tag{1.4}$$

where the first conditional expectation is with respect to the whole history $\{y_1(t);\ t \in \mathbb{Z}\}$ of the component $y_1$. []

Causality is just conditional independence of the past and present output history $\{y_2(s);\ s \le t\}$ from future inputs $\{y_1(s);\ s > t\}$ given the past of the input $\{y_1(s);\ s \le t\}$ and can of course be defined in a much more general setting than the one adopted here. In a Gaussian setting we can however translate everything into the convenient Hilbert space language of the linear theory of random processes (see e.g. Rozanov, 1967). Some of this material, necessary for future use, will be quickly reviewed in the next paragraphs.

We shall denote the vector space of all finite linear combinations of the scalar random variables $\{a'y(t);\ a \in \mathbb{R}^m,\ t \in \mathbb{Z}\}$, closed in the metric induced by the scalar product $\langle x, z \rangle := E\,xz$, by the symbol $H(y)$ (sometimes abbreviated to $H$). $H_t^-(y)$, $H_t^+(y)$ will denote the past and future subspaces spanned by the random variables $y(s)$ up to and, respectively, after and at, time $t$. Clearly,

$$H_t^{\pm}(y) = U^t H_0^{\pm}(y), \tag{1.5}$$

where $U : y(t) \to y(t+1)$ is the (unitary) shift operator of the process $\{y(t)\}$. Normally the subscript zero in (1.5) will be dropped. For the two components $y_1$ and $y_2$ we shall define the subspaces $H(y_1)$, $H(y_2)$ (abbreviated to $H_1$ and $H_2$ when there is no danger of confusion) accordingly. Obviously $H = H_1 \vee H_2$
where the wedge denotes closed vector sum. Subspaces like $H_1$ and $H_2$ are doubly invariant for the shift $U$, in the sense that they satisfy $U^t H_i = H_i$ for all $t \in \mathbb{Z}$. The multiplicity of a doubly invariant subspace $X \subset H$ is the cardinality of any minimal generating set, i.e. is the smallest $n$ for which one can find random variables $\{x_1, \dots, x_n\}$ in $X$ such that the vector space generated by $\{U^t x_i;\ i = 1,\dots,n,\ t \in \mathbb{Z}\}$ is dense in $X$. The process $\{x(t)\}$ with $x_i(t) = U^t x_i$ is called a generating process of $X$.

By the Spectral Representation Theorem (Rozanov, 1967), there is a unitary representation of the random variables in $H(x)$ as $n$-dimensional (row) vector functions in the Hilbert space $L^2_n(C, dQ)$, where $C = \{z;\ |z| = 1\}$ is the unit circle in the complex plane and $Q$ is the $n \times n$ matrix spectral distribution measure of the process $\{x(t)\}$. Each random variable $\xi(t) := U^t \xi$ with $\xi \in X$ can be written as

$$\xi(t) = \int_{-\pi}^{\pi} e^{i\theta t} f(e^{i\theta})\,d\hat x(e^{i\theta})$$

for a unique $f \in L^2_n(C, dQ)$. Here $\hat x$ is the $n$-dimensional random spectral measure of the stationary process $\{x(t)\}$. As is well known (Rozanov, 1967), the spectral distribution matrix is related to $\hat x$ by $dQ = E(d\hat x\,d\hat x^*)$, where the star means conjugate transpose. The representation will be symbolically written as

$$\xi(t) = f(z)x(t). \tag{1.6}$$
The System Theoretic interpretation of the notation is that the stationary process $\{\xi(t)\}$ is obtained by passing the stationary process $\{x(t)\}$ through the linear (stable) filter of transfer function $f$. In all cases of interest for us the spectral distribution measure of $\{x(t)\}$ will be absolutely continuous with respect to Lebesgue measure on $C$. The spectral density matrix will still be denoted by the symbol $Q$. It is well known (compare e.g. Fuhrmann, 1981, p. 111) that $\{x_1, \dots, x_n\}$ being a minimal generating set is equivalent to $Q(e^{i\theta})$ being strictly positive definite on a set of positive Lebesgue measure. For example the assumption $S(e^{i\theta}) > 0$ a.e. guarantees that $H = H(y)$ has precisely multiplicity $m$, a possible minimal set of generators being given by the $m$ scalar components of the random vector $y(0)$. Observe further that any other minimal generating process for $H(x)$ can be written as

$$u(t) = T(z)x(t), \tag{1.7}$$

with $T$ an $n \times n$ matrix function having rows in $L^2_n(C, dQ)$ and $Q$-a.e. nonsingular on the unit circle.
Of course when $\{x(t)\}$ admits a spectral density $Q$ which is a.e. positive definite on the unit circle, then all admissible $T$'s will be a.e. nonsingular on $C$ and all minimal generating processes for $H(x)$ will have an a.e. positive definite spectral density on the unit circle. In particular, by choosing $T = W^{-1}/\sqrt{2\pi}$, where $W$ is any square solution of the standard spectral factorization problem $WW^* = Q$, we obtain white noise generators $\{u(t)\}$ for $H(x)$. The transfer function in the representation (1.6) in this case belongs to $L^2_n(C, d\theta/2\pi)$. In this context we shall call causal any function $f$ with vanishing positive Fourier coefficients in $L^2_n(C, d\theta/2\pi)$, i.e. such that

$$\int_{-\pi}^{\pi} e^{-i\theta k} f(e^{i\theta})\,\frac{d\theta}{2\pi} = 0 \tag{1.8}$$
for all $k > 0$. Thus any causal function belongs to the $n$-dimensional conjugate Hardy space $\bar H^2_n$ (Hoffmann, 1962) and can be extended to a function of the complex variable $z$ analytic on $\{|z| > 1\}$ (including the point at infinity). A matrix valued function $T$ will be called causal if its rows are. It can be verified directly that for any generating process $\{x(t)\}$ with a strictly positive definite spectral density matrix (°) we have $H_t^-(u) \subset H_t^-(x)$ if and only if the transfer matrix $T$ in (1.7) is causal. A (left-) invertible matrix $T$ with rows in $L^2_n(C, d\theta/2\pi)$ will be called minimum phase if it is causal and its extension has an analytic (left-) inverse on $\{|z| > 1\}$. This is the same thing as a conjugate outer matrix function in $H^2$-theory.

(°) or, more generally, full rank purely non deterministic (Rozanov, 1967).

We finally recall the concept of conditional orthogonality. Two subspaces $H_1$, $H_2$ of $H$ will be said conditionally orthogonal, given a third subspace $X$ (notation: $H_1 \perp H_2 \mid X$), if

$$\langle h_1 - E^X h_1,\ h_2 - E^X h_2 \rangle = 0 \tag{1.9}$$

for all $h_1 \in H_1$ and $h_2 \in H_2$. Here the symbol $E^X$ denotes orthogonal projection onto $X$. Since in the Gaussian case conditional expectation given a certain family of random variables in $H$ is the same thing as orthogonal projection onto the subspace of $H$ spanned by them, we see that conditional orthogonality is the same property as conditional independence, given $X$, of the two families $H_1$ and $H_2$ of Gaussian random variables. The concept of conditional orthogonality will be extensively used in this paper. For additional information one may consult (Lindquist and Picci, 1985).

We return to our discussion of causality in the stochastic setting. The following is a rather well known fact, although often stated in a different terminology.
THEOREM 1.1 The process $\{y_1(t)\}$ causes $\{y_2(t)\}$ if and only if

$$y_2(t) = A(z)y_1(t) + v(t), \tag{1.10}$$

where $A(z)$ is an $m_2 \times m_1$ causal matrix function and $\{v(t)\}$ is a stationary process completely independent of $\{y_1(t)\}$, i.e.

$$E\,y_1(t)v'(s) = 0 \quad \text{for all } t,s \in \mathbb{Z}. \tag{1.11}$$
This result is essentially due to (Caines and Chan, 1976). It is also discussed in (Caines and Chan, 1975) and (Gevers and Anderson, 1982). In these references causality is called "absence of feedback" (from $y_2$ to $y_1$). Note that (1.10) is nothing else but the popular ARMAX scheme widely used in time series identification. Just express $\{v(t)\}$ by its innovation representation,

$$v(t) = G(z)e(t), \tag{1.12}$$

where $G(z)$ is minimum phase, normalized so as to make $G(\infty) = I$, and $\{e(t)\}$ is a white noise process. Recall that, by rationality of $S(z)$, both $A(z)$ and the spectrum of $\{v(t)\}$ are rational and then express the rational matrix $[A(z)\ \ G(z)]$ by a left coprime M.F.D. $D(z)^{-1}[B(z)\ \ C(z)]$ to get

$$D(z)y_2(t) = B(z)y_1(t) + C(z)e(t). \tag{1.13}$$
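To make the causal structure of (1.13) concrete, here is a minimal scalar sketch of the ARMAX recursion. It assumes that D, B and C are polynomials in the backward shift $z^{-1}$ (consistent with causal functions being analytic on $|z| > 1$), that $D$ has leading coefficient 1, and that all signals vanish before time 0. The coefficient values and the name `armax_output` are hypothetical.

```python
def armax_output(y1, e, d=(1.0, -0.5), b=(1.0, 0.4), c=(1.0, 0.3)):
    """Scalar sketch of D(z) y2(t) = B(z) y1(t) + C(z) e(t).

    Coefficients are listed from lag 0 upward, in powers of the backward
    shift; signals are taken to be zero before time 0 (an assumption).
    """
    y2 = []
    for t in range(len(y1)):
        acc = sum(bk * y1[t - k] for k, bk in enumerate(b) if t - k >= 0)
        acc += sum(ck * e[t - k] for k, ck in enumerate(c) if t - k >= 0)
        # Move the lagged output terms of D(z) y2(t) to the right-hand side.
        acc -= sum(
            dk * y2[t - k] for k, dk in enumerate(d) if k >= 1 and t - k >= 0
        )
        y2.append(acc / d[0])
    return y2


noise = [0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 0.2, 0.0]
input_a = [1.0, 0.5, -1.0, 2.0, 0.0, 1.0, 1.0, 1.0]
input_b = input_a[:5] + [9.0, -9.0, 9.0]  # same past, different future

out_a = armax_output(input_a, noise)
out_b = armax_output(input_b, noise)

# Outputs up to time 4 do not depend on inputs after time 4.
same_past = out_a[:5] == out_b[:5]
print(same_past)
```

The two input sequences share their first five samples, so the first five outputs coincide regardless of what the input does later. This is the deterministic content of causality in (1.2) carried over to the scheme (1.13).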
The orthogonality condition (1.11) holds if and only if

$$E\,e(t)y_1'(s) = 0, \quad t,s \in \mathbb{Z}, \tag{1.14}$$

and therefore using ARMAX models with independent (or uncorrelated) noise and input ($y_1$) processes (o) is equivalent to imposing a priori a causality relation on the data. In this case the statistical inference problem of estimating the joint law of $\{y_1(t)\}$ and $\{y_2(t)\}$ is reduced to the much simpler problem of estimating just the conditional law of future $y_2$'s given past inputs $y_1$.

(o) Actually condition (1.14) is often considered to be part of the definition of an ARMAX model and is not even explicitly mentioned.

Quite often there is no evidence in the data which justifies the use of causal models. What kind of models should then be used in this situation? One obvious answer would be to describe the whole (joint) process $\{y(t)\}$ by an $m$-dimensional ARMA scheme corresponding, say, to the $m \times m$ rational minimum phase spectral factor of the joint spectrum $S$. Our main concern is however in describing how two given groups of variables ($y_1$ and $y_2$) interact dynamically. In practice $y_1$ and $y_2$ have a precise physical or economic meaning and the main reason for doing modelling and identification is to discover how much of the temporal evolution of each variable is "explained" by the other. For this purpose it would be much more useful to have models which (although necessarily equivalent to the joint ARMA scheme mentioned above) put into explicit evidence the mutual influence of the variables $y_1$ and $y_2$. A class of mathematical descriptions which in a certain sense generalizes the causal model (1.10) is the stochastic feedback scheme
$$\begin{aligned} y_2(t) &= L(z)y_1(t) + v_1(t), \\ y_1(t) &= K(z)y_2(t) + v_2(t), \end{aligned} \tag{1.15}$$

where $L$ and $K$ are causal transfer functions and $\{v_1(t)\}$ and $\{v_2(t)\}$ are stationary "error" processes whose innovations can at most be assumed orthogonal to the past histories of $y_1$ and $y_2$, respectively. This class of models has been extensively investigated in recent years, especially by Gevers and Anderson (1981 and 1982) and Anderson and Gevers (1982), with the main motivation of understanding identifiability of control systems operating under feedback. Practical use of these models for time series identification seems however to have been very limited so far.

We shall propose here a different class of models in which the dynamic interaction between $y_1$ and $y_2$ is explicitly described by the introduction of an auxiliary variable $x$. This auxiliary variable will play a role similar to the state variable in Systems Theory.
DEFINITION 1.2 A Dynamic Factor Analysis Model with external variables the (jointly stationary) vector processes $\{y_1(t)\}$ and $\{y_2(t)\}$ is a linear relation of the form

$$\begin{aligned} y_1(t) &= A_1(z)x(t) + w_1(t), \\ y_2(t) &= A_2(z)x(t) + w_2(t), \end{aligned} \tag{1.16}$$

where $A_1(z)$ and $A_2(z)$ are transfer matrices of dimension $m_1 \times n$ and $m_2 \times n$ and $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ are zero mean stationary processes of dimensions $n$, $m_1$, $m_2$ which are pairwise uncorrelated, i.e.

$$\{w_1(t)\} \perp \{x(t)\} \perp \{w_2(t)\}. \tag{1.17}$$

[]
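As a worked consequence of Definition 1.2, and assuming that the factor and noise processes have spectral densities $Q$, $R_1$ and $R_2$ (the symbols $R_1$, $R_2$ are introduced here only for illustration), the pairwise uncorrelatedness (1.17) implies that the blocks of the joint spectrum (1.3) take the following form on the unit circle:

$$S_1 = A_1 Q A_1^* + R_1, \qquad S_{12} = A_1 Q A_2^*, \qquad S_2 = A_2 Q A_2^* + R_2.$$

In particular the cross spectrum $S_{12}$ involves the factor process only, so under these assumptions its rank cannot exceed the number $n$ of factors.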
Note that $A_1$ and $A_2$ need not be causal. The process $\{x(t)\}$ will sometimes be referred to as the factor process of the model. A Dynamic Factor Analysis (F.A.) model will be called rational if $A_1$, $A_2$ are rational matrices and $\{x(t)\}$ has rational spectrum. The terminology (although not terribly elegant) has been extrapolated from the static case.

In the next sections we shall present a first rudimentary analysis of the model (1.16). The main questions one would like to answer concern the representability of an arbitrary joint stationary process $\{y(t)\}$ (with $y(t)$ partitioned as in (1.1)) by models of the type (1.16), the equivalence of representations (i.e. when do different representations describe the same spectrum $S$ or the same process $\{y(t)\}$), the "external behaviour" of the model which is obtained once $\{x(t)\}$ is eliminated, finding a natural notion of minimality and characterizations of minimal models, parametrizations and canonical forms in the rational case and, above all, the use of Factor Analysis models in Statistical Inference (i.e. identification). This is quite a large program and only a few of these aspects will be touched upon in this paper. Others (especially the last two mentioned above), which still need more research, will not be discussed here.
2. Dynamic Factor Analysis Models

The stationary processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ which define a Factor Analysis model span a certain Hilbert space $H(x, w_1, w_2)$ which we denote by $H_o$. The Factor Space $X$ of the model (1.16) is the doubly invariant subspace of $H_o$ generated by the factor process,

$$X = \operatorname{span}\{a'x(t);\ a \in \mathbb{R}^n,\ t \in \mathbb{Z}\}. \tag{2.1}$$
Let $\bar n \le n$ be the multiplicity of $X$ and let $\{\bar x(t)\}$ be a minimal generating process for $X$. Clearly, since $x(t) = T(z)\bar x(t)$ for some $n \times \bar n$ matrix $T$, we can always rewrite the model (1.16) with $A_1(z)$ and $A_2(z)$ replaced by $\bar A_1(z) = A_1(z)T(z)$ and $\bar A_2(z) = A_2(z)T(z)$ and a factor process $\bar x(t)$ which is a minimal generating process for $X$. We shall therefore adhere from now on to the convention of considering only F.A. models in which $\{x(t)\}$ is a minimal generating process for $X$. Hence the multiplicity of $X$ will always coincide with the dimension of $x(t)$.

Two F.A. models which differ by a change of (minimal) generators in $X$ will be called equivalent. Obviously two equivalent models have the same $\{w_i(t)\}$ processes (for $i = 1,2$), the same factor space $X$ and transfer matrices and factor processes related by

$$\hat A_i(z) = A_i(z)T(z)^{-1}, \quad i = 1,2, \qquad \hat x(t) = T(z)x(t), \tag{2.2}$$

where $T$ is a $Q$-a.e. nonsingular $n \times n$ matrix function whose rows belong to $L^2_n(C, dQ)$, $Q$ being the spectral distribution measure of $\{x(t)\}$. It is easy to check that (2.2) defines an equivalence relation on the class of all F.A. models of $\{y_1(t)\}$, $\{y_2(t)\}$.

We shall now introduce the concept of splitting subspace. By this idea we shall be able to attach a precise probabilistic meaning to F.A. models and at the same time reduce this notion to a very simple geometric object. Let $H_i = H(y_i)$, $i = 1,2$, be the Hilbert spaces spanned by the components $\{y_i(t)\}$, $i = 1,2$. It will be useful to think of $H_1$ and $H_2$ as (doubly invariant) subspaces embedded in a large Hilbert space $H_o$ obtained by suitably augmenting $H = H_1 \vee H_2$. On $H_o$ there is defined a unitary shift operator $U$ which reduces to the shift of the process $\{y(t)\}$ on the subspace $H = H(y) = H_1 \vee H_2$. (The role played by $H_o$ is very similar to that of the space $H(x, w_1, w_2)$ introduced at the beginning of this section).
DEFINITION 2.1 A (stationary) splitting subspace is a doubly invariant subspace $X$ of $H_o$ which makes $H(y_1)$ and $H(y_2)$ conditionally orthogonal given $X$, i.e. satisfies

$$H(y_1) \perp H(y_2) \mid X \tag{2.3}$$

together with $UX = X$. A splitting subspace $X$ is called minimal if there are no proper subspaces of $X$ which are doubly invariant and still satisfy condition (2.3). []

The concept of splitting subspace is a generalization of the idea of sufficient statistic (at least in the Gaussian case). It follows in fact from the definition of conditional orthogonality (1.9) that

$$E[h_1 \mid X \vee H_2] = E[h_1 \mid X], \quad h_1 \in H_1,$$

and, equivalently,

$$E[h_2 \mid X \vee H_1] = E[h_2 \mid X], \quad h_2 \in H_2,$$

so that all that is relevant in $H_2$ ($H_1$) for the purpose of predicting any $h_1 \in H_1$ ($h_2 \in H_2$) is already contained in $X$. Therefore if $X$ (or any system of generators of $X$) is given, we can disregard $H_2$ ($H_1$) completely. Note that the concept of splitting is of interest only if it corresponds to effective data reduction. Hence the notion of minimality is of central importance.

LEMMA 2.1 (Ruckebusch, 1976 and Lindquist and Picci, 1985) A splitting subspace $X$ is minimal if and only if

$$\overline{E^X H_1} = X, \qquad \overline{E^X H_2} = X \tag{2.4}$$

(here $\overline{E^X H_i}$ is the closure of $\{E^X h_i;\ h_i \in H_i\}$, $i = 1,2$). []
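The conditional orthogonality condition (2.3) can be illustrated in a finite-dimensional (static) setting using second moments only. The sketch below assumes scalar variables $y_i = a_i x + w_i$ with $x$, $w_1$, $w_2$ pairwise uncorrelated; the numerical values and function names are hypothetical.

```python
def model_covariance(a1, a2, q, r1, r2):
    """Covariance matrix of (y1, y2, x) for the static scalar factor model
    y_i = a_i * x + w_i, with var(x) = q, var(w_i) = r_i and x, w1, w2
    pairwise uncorrelated (illustrative assumption)."""
    return [
        [a1 * a1 * q + r1, a1 * a2 * q, a1 * q],
        [a1 * a2 * q, a2 * a2 * q + r2, a2 * q],
        [a1 * q, a2 * q, q],
    ]


def residual_cross_covariance(cov):
    """<y1 - E^X y1, y2 - E^X y2> with X = span{x}, i.e. the left-hand side
    of (1.9); E^X y_i = (cov(y_i, x) / var(x)) * x."""
    return cov[0][1] - cov[0][2] * cov[1][2] / cov[2][2]


cov = model_covariance(a1=2.0, a2=-1.5, q=0.5, r1=1.0, r2=0.25)
print(cov[0][1])                       # y1 and y2 are correlated
print(residual_cross_covariance(cov))  # but orthogonal once x is projected out
```

In this toy case the span of $x$ plays the role of a splitting subspace: the two observed variables are correlated, yet their projection residuals are orthogonal.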
The following theorem shows that (modulo choice of generators) splitting subspaces and Dynamical Factor Analysis models are essentially the same thing.

THEOREM 2.1 The factor space $X$ of any F.A. model of $\{y_1(t)\}$, $\{y_2(t)\}$ is a splitting subspace. Vice versa, to every splitting subspace $X$ for $H(y_1)$, $H(y_2)$ of finite multiplicity there corresponds the equivalence class, defined modulo choice of generators, of F.A. models having $X$ as factor space.

Proof: Let $X$ be given by (2.1). Then, since

$$A_i(z)x(t) = E^X y_i(t), \quad t \in \mathbb{Z},\ i = 1,2, \tag{2.5}$$

the orthogonality relation of $\{w_1(t)\}$ and $\{w_2(t)\}$, which holds by assumption for any model (1.16), can be rewritten as

$$y_1(t) - E^X y_1(t) \perp y_2(s) - E^X y_2(s), \quad t,s \in \mathbb{Z}. \tag{2.6}$$

As $\{y_i(t)\}$ is a generating process for $H_i$ it follows from the definition (1.9) that indeed $X$ is splitting. Vice versa, let $X$ be a splitting subspace and $\{x(t)\}$ a minimal generating process for $X$ of dimension $n$. The projections $E^X y_i(t)$ can be written as in (2.5) for suitable transfer functions $A_i(z)$ of dimension $m_i \times n$. Define

$$w_i(t) := y_i(t) - E^X y_i(t), \quad t \in \mathbb{Z},\ i = 1,2, \tag{2.7}$$

then the stationary processes $\{w_i(t)\}$ are orthogonal to $X$ and, by the conditional orthogonality condition (2.3), we have also $E\,w_1(t)w_2(s)' = 0$ for all $t,s \in \mathbb{Z}$. Therefore $\{y_1(t)\}$ and $\{y_2(t)\}$ can be written as in (1.16), while satisfying (1.17). []

The equivalence established by Theorem 2.1 permits us to define a first rough notion of minimality for F.A. models. We shall say that a F.A. model is irreducible if its factor space is minimal splitting.
THEOREM 2.2 (Picci and Pinzoni, 1986) A F.A. model is irreducible if and only if the rank a.e. on the unit circle of the matrices $A_1(z)$ and $A_2(z)$ is equal to the multiplicity of $X$. All irreducible F.A. models have the same multiplicity (i.e. the same number of factors) $n$, equal to the rank a.e. on the unit circle of the cross spectrum $S_{12}$ of the processes $\{y_1(t)\}$ and $\{y_2(t)\}$.

In the rest of this paper we shall concentrate on irreducible models. As we have just seen these models are characterized by a.e. left invertible matrices $A_k(z)$, $k = 1,2$. Their factor process has an absolutely continuous spectrum with an a.e. positive definite spectral density matrix $Q$ on the unit circle (Picci and Pinzoni, 1986).
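The rank statement of Theorem 2.2 can be checked numerically on a toy example. The sketch below assumes a single factor ($n = 1$) with constant spectral density `q`, hypothetical rational transfer matrices of size $2 \times 1$, and the expression $S_{12} = A_1 Q A_2^*$ that follows from (1.16) and (1.17); none of these specific choices come from the text.

```python
import cmath


def a1(z):
    """Hypothetical 2 x 1 transfer matrix A1(z), written in powers of 1/z."""
    return [1 + 0.5 / z, complex(0.3)]


def a2(z):
    """Hypothetical 2 x 1 transfer matrix A2(z)."""
    return [1 / (1 - 0.4 / z), 1 / z]


def cross_spectrum(theta, q=1.0):
    """S12 = A1 Q A2^* on the unit circle for a scalar factor with density q."""
    z = cmath.exp(1j * theta)
    column = a1(z)
    row = [value.conjugate() for value in a2(z)]
    return [[c * q * r for r in row] for c in column]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


thetas = [0.1, 0.7, 1.9, 3.0]
dets = [abs(det2(cross_spectrum(t))) for t in thetas]
sizes = [max(abs(v) for row in cross_spectrum(t) for v in row) for t in thetas]

# A nonzero 2 x 2 matrix with vanishing determinant has rank one.
print(max(dets), min(sizes))
```

At every sampled frequency the $2 \times 2$ cross spectrum is nonzero but singular, so its rank equals the number of factors, here one.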
84
If in an irreducible F.A. model we eliminate the auxiliary variable $\{x(t)\}$, we obtain a scheme of the following type,
$$A_2(z)^{-L}\hat y_2(t) = A_1(z)^{-L}\hat y_1(t), \tag{2.8}$$
$$y_1(t) = \hat y_1(t) + w_1(t), \tag{2.9a}$$
$$y_2(t) = \hat y_2(t) + w_2(t). \tag{2.9b}$$
This is essentially what is commonly called an Errors-In-Variables (E.I.V.) model of the processes $\{y_1(t)\}$, $\{y_2(t)\}$. Here $y_1(t)$ and $y_2(t)$ are represented as "noisy" observations of the "true" variables $\hat y_1(t)$, $\hat y_2(t)$, which obey the deterministic relation (2.8). Note that the correlation structure of $\{y_1(t)\}$ and $\{y_2(t)\}$ is completely embodied in the relation (2.8), as the noise processes $\{w_k(t)\}$ are mutually uncorrelated and also orthogonal to the "true" variables $\{\hat y_k(t)\}$. An equivalent form of the deterministic link (2.8) between the true variables is obtained by substituting $x(t) = A_1(z)^{-L}\hat y_1(t)$ into the second equation in (1.16), getting
$$\hat y_2(t) = W(z)\hat y_1(t), \qquad W(z) := A_2(z)A_1(z)^{-L}, \tag{2.10}$$
or, dually,
$$\hat y_1(t) = W(z)^{\#}\hat y_2(t), \qquad W(z)^{\#} := A_1(z)A_2(z)^{-L}. \tag{2.11}$$
Note that the transfer functions $W(z)$, $W(z)^{\#}$ and also the relation (2.8) are invariant under a change of generators, $\tilde x(t) = T(z)x(t)$ ($T$ nonsingular), and are therefore uniquely attached to the (minimal) splitting subspace $X$ of the model. An important question concerns the existence of models for which $W$ (or $W^{\#}$) is a causal transfer function. This is the same as asking whether two stationary processes described by an arbitrary joint spectrum $S$ can be represented by the "noisy" input-output model
$$y_1(t) = \hat y_1(t) + w_1(t), \qquad y_2(t) = W(z)\hat y_1(t) + w_2(t), \tag{2.12}$$
where $W(z)$ is causal and $\{w_1(t)\} \perp \{\hat y_1(t)\} \perp \{w_2(t)\}$. We shall take up this kind of question in Section 4.
As a last general comment about F.A. models, we remark that the freedom of changing generators in the factor space $X$ allows us to choose transfer matrices $A_k(z)$ or factor processes of very special structure. For example, we can always take $\{x(t)\}$ to be a white noise process, or require that both $A_1(z)$ and $A_2(z)$ be causal transfer functions.
For simplicity we shall state the next result for the case of rational F.A. models.
PROPOSITION 2.1 For every rational irreducible F.A. model there is a choice of (minimal) generators in $X$, $\tilde x(t) = T(z)x(t)$, which (maintains rationality and) achieves causality of the transfer function matrices
$$\tilde A_k(z) = A_k(z)T(z)^{-1}, \qquad k = 1,2. \tag{2.13}$$
Proof: In a rational model both the spectrum $Q$ and the matrices $A_k$, $k = 1,2$, are rational functions. Since the joint spectrum of the processes $\hat y_k(t) = A_k(z)x(t)$, $k = 1,2$,
$$\hat S = \begin{bmatrix} \hat S_1 & S_{12} \\ S_{21} & \hat S_2 \end{bmatrix} = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} Q \begin{bmatrix} A_1^* & A_2^* \end{bmatrix}, \tag{2.14}$$
is then itself a rational function, it admits causal (in particular minimum phase) rational spectral factors. Note that irreducibility implies that $\operatorname{rank} \hat S = n = \operatorname{rank} S_{12}$. Pick a causal full rank rational spectral factor $\tilde A$ (of dimension $m \times n$) of $\hat S$ and write it as a partitioned matrix with two blocks $\tilde A_k(z)$ of dimensions $m_k \times n$, $k = 1,2$. The spectral factorization
$$\hat S(z) = \begin{bmatrix} \tilde A_1(z) \\ \tilde A_2(z) \end{bmatrix} \begin{bmatrix} \tilde A_1(z)^* & \tilde A_2(z)^* \end{bmatrix}$$
is clearly equivalent to the representations $\hat y_k(t) = \tilde A_k(z)\tilde x(t)$, $k = 1,2$, with $\{\tilde x(t)\}$ an $n$-dimensional white noise process. We interpret $\{\tilde x(t)\}$ as the new factor process of the model. Since $\tilde A(z)$ is full rank, we can solve for $\tilde x(t)$ in the representation
$$\begin{bmatrix} \hat y_1(t) \\ \hat y_2(t) \end{bmatrix} = \begin{bmatrix} \tilde A_1(z) \\ \tilde A_2(z) \end{bmatrix} \tilde x(t),$$
getting
$$\tilde x(t) = \begin{bmatrix} \tilde A_1(z) \\ \tilde A_2(z) \end{bmatrix}^{-L} \begin{bmatrix} A_1(z) \\ A_2(z) \end{bmatrix} x(t) := T(z)x(t).$$
Note that $T$ is square $n \times n$ and nonsingular because of irreducibility. This proves Proposition 2.1. □
In the proof we could in particular have chosen $\tilde A(z) = [\tilde A_1(z)' \ \tilde A_2(z)']'$ minimum phase. We see that an irreducible rational F.A. model can always be written as a pair of ARMAX equations,
$$D_1(z)y_1(t) = B_1(z)\tilde x(t) + C_1(z)e_1(t), \qquad D_2(z)y_2(t) = B_2(z)\tilde x(t) + C_2(z)e_2(t), \tag{2.15}$$
with $\{\tilde x(t)\}$, $\{e_1(t)\}$, $\{e_2(t)\}$ pairwise uncorrelated white noise processes and $D_k(z)$ and $C_k(z)$ stable polynomial matrices of dimensions $m_k \times m_k$ and $m_k \times p_k$, $p_k$ being the multiplicity of the noise process $\{w_k(t)\}$, $k = 1,2$.
3. Stochastic realization
The main problem of this section will be to describe the class of all irreducible F.A. models which match a given spectral density matrix. We shall see that this is equivalent to solving the following problem.
PROBLEM P.1 Given an $m \times m$ spectral density matrix $S$ partitioned as in (1.3) and satisfying the Basic Assumption of Sect. 1, find all 5-tuples of matrix functions $\{A_1, A_2, Q, R_1, R_2\}$ on the unit circle, with $A_k$ of dimension $m_k \times n$ and of rank $n$, $Q$ of dimension $n \times n$ and nonsingular, $R_k$ of dimension $m_k \times m_k$, $k = 1,2$, which
i) satisfy the system of equations
$$S_1 = A_1 Q A_1^* + R_1, \qquad S_{12} = A_1 Q A_2^*, \qquad S_2 = A_2 Q A_2^* + R_2, \tag{3.1}$$
ii) make the $(m+n) \times (m+n)$ matrix
$$\begin{bmatrix} S_1 & S_{12} & A_1 Q \\ S_{21} & S_2 & A_2 Q \\ Q A_1^* & Q A_2^* & Q \end{bmatrix} \tag{3.2}$$
into a spectral density matrix (in particular Hermitian and nonnegative definite on the unit circle).
□
Assume we have an irreducible F.A. model,
$$z_1(t) = A_1(z)x(t) + w_1(t), \qquad z_2(t) = A_2(z)x(t) + w_2(t). \tag{3.3}$$
If we interpret $Q$ as the spectral density matrix of $\{x(t)\}$ and $R_k$, $k = 1,2$, as the spectra of the two noise processes $\{w_k(t)\}$, we see that eqns. (3.1) express precisely the fact that the joint spectrum of $\{z_1(t)\}$ and $\{z_2(t)\}$ coincides with the given joint spectrum $S$. Note also that the matrix in (3.2) is just the joint spectral density of the three processes $\{z_1(t)\}$, $\{z_2(t)\}$ and $\{x(t)\}$. Vice versa, assume we are given a 5-tuple $\{A_1, A_2, Q, R_1, R_2\}$ of matrices satisfying eqns. (3.1) and condition (ii). It is not hard to see, and we shall check this later, that condition (ii) implies that $Q, R_1, R_2$ are necessarily bounded Hermitian positive semidefinite ($Q$ is actually positive definite) matrices on the unit circle and can therefore be interpreted as spectral densities of three mutually uncorrelated zero mean Gaussian processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$. Starting from these processes, we generate $\{z_1(t)\}$ and $\{z_2(t)\}$ by the linear transformation (3.3). We see from (3.1) that the joint spectrum of the stationary processes $\{z_1(t)\}$, $\{z_2(t)\}$ is precisely equal to the given joint spectral density matrix $S$. In short, solving
problem P.1 is the same thing as finding all irreducible F.A. models (3.3) for which the joint spectrum of the external variables $\{z_1(t)\}$, $\{z_2(t)\}$ is equal to the given spectral density matrix $S$. This problem is a distributional or "weak sense" stochastic realization problem (Finesso and Picci, 1984 and Lindquist and Picci, 1985). Interpreting $S$ as the joint spectrum of two given Gaussian processes $\{y_1(t)\}$, $\{y_2(t)\}$, we are looking for all irreducible models (3.3) such that $\{z_k(t)\}$ and $\{y_k(t)\}$ are equal processes in distribution. In "practical" terms this means that the model (3.3) will only be useful to simulate the signals $\{y_k(t)\}$ in an "average" sense, but not samplewise in general.
A 5-tuple $\{A_1, A_2, Q, R_1, R_2\}$ satisfying conditions (i) and (ii) above or, equivalently, a F.A. model of the type (3.3) matching the given spectrum $S$, will be called a F.A. representation of the spectrum $S$. A (strong sense) F.A. representation of the processes $\{y_1(t)\}$, $\{y_2(t)\}$ is instead a F.A. model of the type (3.3) for which $z_k(t) = y_k(t)$ almost surely for all $t \in \mathbb{Z}$. This type of (samplewise) equality is clearly stronger than equality in distribution and can only occur when the processes $\{z_k(t)\}$ and $\{y_k(t)\}$ are defined on the same probability space. This means that the various processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ in (3.3) must be built in such a way that $H_0 := H(x, w_1, w_2) \supset H(y_1, y_2) = H(z_1, z_2)$.
Samplewise (i.e. strong sense) F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$ can be classified according to "how big" an underlying space $H_0$ is needed to support the processes which specify the model. Later we shall study in some detail the class of F.A. representations for which $H_0 = H(y_1, y_2)$. These representations will be called "y-measurable" (o).
(o) Clearly an equivalent condition for y-measurability is that the factor space $X$ is included in $H(y)$.
Note that whenever $\{x(t)\}$ is given, the noise processes $\{w_k(t)\}$ are automatically fixed as functions of $\{x(t)\}$, $\{y_1(t)\}$, $\{y_2(t)\}$ by the orthogonality condition (1.17), as
$$w_k(t) = y_k(t) - E^{X} y_k(t), \qquad k = 1,2, \tag{3.4}$$
where $X$ is the splitting subspace generated by $\{x(t)\}$. Therefore a (strong) F.A. representation is completely specified once the factor process $\{x(t)\}$ is assigned as a function of some available generators of the space $H_0$. In particular a y-measurable representation is completely specified once $\{x(t)\}$ is given as a function of $\{y_1(t)\}$ and $\{y_2(t)\}$.
In order to avoid complicated statements about equivalence classes, it will be useful to fix once and for all a rule for choosing generators in each factor space $X$. A convenient way to do this is to fix a full rank factorization of the cross spectrum $S_{12}$,
$$S_{12}(z) = H(z)G^*(z), \tag{3.5}$$
where $H$ and $G$ are of respective dimensions $m_1 \times n$, $m_2 \times n$ and of rank equal to $n = \operatorname{rank} S_{12}$ a.e. on $C$. Since $S_{12}$ is rational, we can always choose $H$ and $G$ to be rational matrices. In fact we shall choose $H$ and $G$ in such a way that (3.5) is a minimal factorization of the rational matrix $S_{12}$ (in the sense of Bart, Gohberg and Kaashoek (1979), p. 84).
Since all entries of a rational spectral density matrix must be analytic on the unit circle, it follows that both $H(z)$ and $G(z)$ must also be analytic on the unit circle. In the following we shall make the simplifying assumption that $S_{12}(z)$ has no zeros on the unit circle, i.e.
$$\operatorname{rank} S_{12}(e^{i\theta}) = n, \qquad \forall\, \theta \in [0, 2\pi). \tag{3.6}$$
This guarantees that neither $H(z)$ nor $G(z)$ can have zeros on the unit circle; more precisely, both $H(e^{i\theta})$ and $G(e^{i\theta})$ will be of constant rank $n$ for all $\theta \in [0, 2\pi)$. From now on the matrices $H$ and $G$ will be considered as data of our problem.
LEMMA 3.1 Let condition (3.6) hold. Then for each equivalence class of irreducible F.A. models of $\{y_1(t)\}$, $\{y_2(t)\}$ there is a unique choice of generating process $\{x(t)\}$ in the factor space $X$ such that
$$A_1(z) = H(z), \qquad A_2(z) = G(z)Q(z)^{-1}, \tag{3.7}$$
where $Q$ is the (nonsingular) spectrum of $\{x(t)\}$. Alternatively, a unique generating process $\{\bar x(t)\}$ can be chosen for which
$$\bar A_1(z) = H(z)\bar Q(z)^{-1}, \qquad \bar A_2(z) = G(z), \tag{3.8}$$
where $\bar Q$ is the spectrum of $\{\bar x(t)\}$. The generating processes $\{x(t)\}$, $\{\bar x(t)\}$ for the same minimal splitting subspace $X$ are related by the transformation
$$\bar x(t) = Q^{-1}(z)x(t). \tag{3.9}$$
Proof: In fact, if we start with an arbitrary irreducible model (1.16), with generating process $\{x(t)\}$, there is a unique change of generators in $X$, $\tilde x(t) = T(z)x(t)$, with $T$ such that $H(z)T(z) = A_1(z)$. Note that there is a unique a.e. nonsingular solution to this equation, as both $A_1$ and $H$ are of full rank $n$. Moreover $T \in L^2(C, Q\,d\theta)$, where $Q$ is the spectral density of $\{x(t)\}$. This follows from $T(z) = H(z)^{-L}A_1(z)$, because $A_1 \in L^2(C, Q\,d\theta)$ and any left inverse of $H(z)$ is analytic on the unit circle, in force of assumption (3.6). With this choice we get
$$S_{12}(z) = H(z)\tilde Q(z)\,(T(z)^{-1})^* A_2(z)^*,$$
where $\tilde Q$ is the spectrum of the new generator $\{\tilde x(t)\}$. From (3.5) it then follows that $A_2(z)T(z)^{-1} = G(z)\tilde Q(z)^{-1}$, which is (3.7) written for the new generator. Similarly, by choosing $\bar x(t) = T(z)x(t)$ with $G(z)T(z) = A_2(z)$, we obtain (3.8). In particular, for the two generators $x(t)$, $\bar x(t)$ of (3.7) and (3.8) we find $T = Q^{-1}$, which is (3.9).
□
By choosing the generators as stated in Lemma 3.1, we get a unique irreducible F.A. model representative of each minimal splitting subspace $X$. These models, for the two different choices (3.7) and (3.8), can be written as
$$y_1(t) = H(z)x(t) + w_1(t), \qquad y_2(t) = G(z)Q(z)^{-1}x(t) + w_2(t), \tag{3.10}$$
and, respectively, as
$$y_1(t) = H(z)\bar Q(z)^{-1}\bar x(t) + w_1(t), \qquad y_2(t) = G(z)\bar x(t) + w_2(t). \tag{3.11}$$
We shall call the two representations (3.10) and (3.11) the "first" and "second" type canonical forms. Clearly each equivalence class of irreducible F.A. representations of a given spectrum $S$ can in turn be represented by a unique 5-tuple
$$\{H,\ GQ^{-1},\ Q,\ R_1,\ R_2\}$$
or by
$$\{H\bar Q^{-1},\ G,\ \bar Q,\ R_1,\ R_2\}.$$
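The change of generators used in Lemma 3.1 can be checked numerically at a single point of the unit circle. The following is only a sketch under stated assumptions: the transfer matrices are replaced by random constant complex matrices, the factorization $S_{12} = HG^*$ is manufactured from a hypothetical nonsingular matrix `T0`, and the names `crandn`, `random_pd`, `Qt` are illustrative and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Complex Gaussian matrix standing in for a transfer matrix at one frequency."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_pd(n):
    """Random Hermitian strictly positive definite matrix."""
    M = crandn(n, n)
    return M @ M.conj().T + n * np.eye(n)

m1, m2, n = 3, 4, 2
A1, A2 = crandn(m1, n), crandn(m2, n)      # arbitrary irreducible model (full column rank)
Qt = random_pd(n)                          # spectrum of the original generator
S12 = A1 @ Qt @ A2.conj().T                # cross spectrum, as in (3.1)

# Manufacture a full rank factorization S12 = H G^* (T0 is a hypothetical choice).
T0 = crandn(n, n)
H = A1 @ np.linalg.inv(T0)
G = (np.linalg.pinv(H) @ S12).conj().T     # G^* = H^{-L} S12

# Change of generators fixed by H T = A1, as in the proof of Lemma 3.1.
T = np.linalg.pinv(H) @ A1
Q = T @ Qt @ T.conj().T                    # spectrum of the new generator
A1_new = A1 @ np.linalg.inv(T)
A2_new = A2 @ np.linalg.inv(T)

print(np.allclose(A1_new, H), np.allclose(A2_new, G @ np.linalg.inv(Q)))
```

With this choice of generator the pair `(A1_new, A2_new)` has the first canonical form (3.7), and the recovered `T` coincides with the matrix used to manufacture `H`.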
Note that $R_1$ and $R_2$ are uniquely determined from the equalities (3.1) as functions of $S_1, A_1, Q$ and $S_2, A_2, Q$. We conclude that all irreducible F.A. representations of the spectrum $S$, written in the first canonical form, are parametrized in a one-to-one way by the $n \times n$ nonsingular matrix function $Q$ as
$$\{H,\ GQ^{-1},\ Q,\ S_1 - HQH^*,\ S_2 - GQ^{-1}G^*\}, \tag{3.12}$$
where $Q$ is constrained to satisfy the condition that the matrix
$$\begin{bmatrix} S_1 & HG^* & HQ \\ GH^* & S_2 & G \\ QH^* & G^* & Q \end{bmatrix} \tag{3.13}$$
be a spectral density. Dually, all irreducible F.A. representations of $S$ written in the second canonical form are parametrized in a one-to-one way by the nonsingular $n \times n$ matrix function $\bar Q$ as
$$\{H\bar Q^{-1},\ G,\ \bar Q,\ S_1 - H\bar Q^{-1}H^*,\ S_2 - G\bar Q G^*\}, \tag{3.14}$$
where $\bar Q$ is constrained by the condition that the matrix
$$\begin{bmatrix} S_1 & HG^* & H \\ GH^* & S_2 & G\bar Q \\ H^* & \bar Q G^* & \bar Q \end{bmatrix} \tag{3.15}$$
be a spectral density function.
At this point we are ready to describe the solution set of our stochastic realization problem P.1. We introduce the $n \times n$ Hermitian matrices
$$\bar Q_1 := H^* S_1^{-1} H, \qquad Q_2 := G^* S_2^{-1} G, \tag{3.16}$$
and set
$$Q_1 := \bar Q_1^{-1}, \qquad \bar Q_2 := Q_2^{-1}. \tag{3.17}$$
Note that both $Q_1$ and $Q_2$ are strictly positive definite rational spectral density matrices in force of condition (3.6) and our standing assumptions on $S$. We define also the $n \times n$ Hermitian matrices
$$\Delta := Q_1 - Q_2, \qquad \bar\Delta := \bar Q_2 - \bar Q_1. \tag{3.18}$$
THEOREM 3.1 All irreducible F.A. representations of the spectrum $S$ written in the first canonical form (3.12) are parametrized by the solutions $Q$ of the matrix inequality
$$Q - Q_2 - (Q - Q_2)\Delta^{-1}(Q - Q_2)^* \ge 0. \tag{3.19}$$
Dually, all irreducible F.A. representations of $S$ written in the second canonical form (3.14) are parametrized by the solutions of the inequality
$$\bar Q - \bar Q_1 - (\bar Q - \bar Q_1)\bar\Delta^{-1}(\bar Q - \bar Q_1)^* \ge 0. \tag{3.20}$$
An $n \times n$ matrix function $Q$ solves (3.19) if and only if $\bar Q = Q^{-1}$ solves (3.20). All solutions $Q$ ($\bar Q$) of (3.19) (resp. (3.20)) are Hermitian, bounded and strictly positive definite; in fact they satisfy
$$Q_1 \ge Q \ge Q_2, \qquad \bar Q_2 \ge \bar Q \ge \bar Q_1, \tag{3.21}$$
where $Q_1$ and $Q_2$ ($\bar Q_1$ and $\bar Q_2$) are the spectral densities defined by (3.16) and (3.17).
Proof: What needs to be shown is that an $n \times n$ matrix function $Q$ makes (3.13) a spectral density matrix if and only if it satisfies the quadratic inequality (3.19). Assume there is a $Q$ making (3.13) into a spectral density matrix. Then, by a standard block diagonalization procedure, the positive semidefiniteness of (3.13) is seen to be equivalent to
$$S_2 > 0, \qquad \tilde S_1 := S_1 - S_{12}S_2^{-1}S_{21} > 0,$$
$$Q - G^*S_2^{-1}G - (Q - G^*S_2^{-1}G)\,H^*\tilde S_1^{-1}H\,(Q - G^*S_2^{-1}G)^* \ge 0. \tag{3.22}$$
The first two inequalities are trivially satisfied. In fact, by our Basic Assumption on $S$, $S_2$ and $\tilde S_1$ are strictly positive definite on the whole of $C$. By simple matrix manipulations it can be checked that
$$H^*\tilde S_1^{-1}H = H^*(S_1 - HQ_2H^*)^{-1}H = (Q_1 - Q_2)^{-1}, \tag{3.23}$$
and therefore, recalling our notations (3.18), we see that $Q$ has to satisfy the inequality (3.19). Note that $Q$ makes the matrix (3.13) positive semidefinite if and only if $\bar Q = Q^{-1}$ makes (3.15) positive semidefinite. This in turn happens if and only if $\bar Q$ satisfies the dual inequality (3.20), as can be seen by exactly the same argument used before. Thus $Q$ satisfies (3.19) if and only if $Q^{-1}$ satisfies (3.20).
Observe now that the matrix $\Delta^{-1}$, given by the expression (3.23), is strictly positive definite Hermitian on the unit circle, and therefore any solution $Q$ to (3.19) makes $Q - Q_2$ positive semidefinite Hermitian. Hence $Q$ is Hermitian and $Q \ge Q_2$. Similarly, any solution $\bar Q$ of (3.20) satisfies $\bar Q \ge \bar Q_1$. Then, writing $\bar Q$ as $Q^{-1}$, we obtain the first inequality in (3.21). So any solution to (3.19) has a lower bound ($Q_2$) and an upper bound ($Q_1 = \bar Q_1^{-1}$), $Q_2$ being strictly positive definite and $Q_1$ being trivially bounded on $C$. It follows that any solution to (3.19) is a spectral density matrix. The matrix (3.13) constructed from such a solution is also Hermitian positive semidefinite and has bounded entries on the unit circle. Therefore it is a spectral density matrix. □
The solution set of the inequalities (3.19), (3.20) can be described quite explicitly.
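The identity (3.23) used in the proof above lends itself to a quick numerical check at one frequency. The sketch below is an illustration only: it assumes the reading $\bar Q_1 = H^*S_1^{-1}H$, $Q_2 = G^*S_2^{-1}G$ of (3.16), replaces the spectra by random constant matrices, and builds $S_1$, $S_2$ from a hypothetical factor spectrum `Q0` and hypothetical noise spectra so that the joint spectrum is strictly positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
inv = np.linalg.inv

def crandn(*shape):
    """Complex Gaussian matrix standing in for a transfer matrix at one frequency."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_pd(n):
    """Random Hermitian strictly positive definite matrix."""
    M = crandn(n, n)
    return M @ M.conj().T + n * np.eye(n)

m1, m2, n = 3, 4, 2
H, G = crandn(m1, n), crandn(m2, n)
Q0 = random_pd(n)                               # hypothetical factor spectrum
S1 = H @ Q0 @ H.conj().T + random_pd(m1)        # S1 = H Q0 H^* + R1
S2 = G @ inv(Q0) @ G.conj().T + random_pd(m2)   # S2 = G Q0^{-1} G^* + R2

Q1 = inv(H.conj().T @ inv(S1) @ H)              # (3.16)-(3.17), assumed reading
Q2 = G.conj().T @ inv(S2) @ G
Delta = Q1 - Q2                                 # (3.18)

# Left and right hand sides of (3.23); note S12 S2^{-1} S21 = H Q2 H^*.
S1_tilde = S1 - H @ Q2 @ H.conj().T
identity_gap = np.linalg.norm(H.conj().T @ inv(S1_tilde) @ H - inv(Delta))

def lhs_319(Q):
    """Left hand side of the quadratic inequality (3.19)."""
    D = Q - Q2
    return D - D @ inv(Delta) @ D.conj().T

min_eig_Q0 = np.linalg.eigvalsh(lhs_319(Q0)).min()
print(identity_gap, min_eig_Q0)
```

Since `Q0` generated the data, it should satisfy (3.19), so `min_eig_Q0` is expected to be positive while `identity_gap` stays at rounding level.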
THEOREM 3.2 An $n \times n$ matrix valued function $Q$ on the unit circle solves the inequality (3.19) if and only if it is Hermitian and $Q_1 \ge Q \ge Q_2$. Dually, an $n \times n$ matrix $\bar Q$ solves (3.20) if and only if it is Hermitian and satisfies $\bar Q_2 \ge \bar Q \ge \bar Q_1$.
Proof: The "only if" part is already contained in the statement of Theorem 3.1. We only need to prove the "if" part. Assume first that $Q_1 > Q > Q_2$ (with strict inequalities) holds. Then $(Q_1 - Q)$ and $(Q - Q_2)$ are both Hermitian strictly positive definite, and therefore $(Q - Q_2)^{-1} + (Q_1 - Q)^{-1}$ is strictly positive definite. By a well-known formula for the inverse of a sum of matrices we see that this positivity condition is equivalent to
$$Q - Q_2 - (Q - Q_2)\Delta^{-1}(Q - Q_2)^* > 0. \tag{3.24}$$
Now, every $Q$ satisfying $Q_1 \ge Q \ge Q_2$ can be approximated in $L^\infty_{n \times n}(C)$ by a sequence of matrices $Q_k$ for which the strict inequalities hold. Take for instance
$$Q_k = \frac{k-1}{k}\,Q + \frac{1}{2k}\,(Q_1 + Q_2),$$
for which clearly $Q_k - Q_2 > 0$ and $Q_1 - Q_k > 0$. Hence $Q_k$ satisfies the strict inequality (3.24). But the left hand side of (3.24) is a positive definite matrix which is a continuous function of $Q_k$ and, as $k \to \infty$, it can at most become positive semidefinite.
□
REMARK As a corollary of Theorems 3.1, 3.2 we obtain that the inequality $Q_1 \ge Q \ge Q_2$ is equivalent to
$$S_1 \ge HQH^*, \qquad S_2 \ge GQ^{-1}G^*, \qquad Q > 0,$$
which form in turn an equivalent set of conditions to the positivity of the matrix (3.13). This fact in particular guarantees that if $Q$ satisfies (3.21) (or equivalently (3.19)), then the noise spectra $R_1$ and $R_2$ will be (Hermitian and) positive semidefinite. Note that the maximal solution $Q_1$ is in this sense just the matrix which corresponds to the largest approximant of rank $n$ of $S_1$ in the ordering of Hermitian positive semidefinite matrices. □
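The equivalence stated in Theorem 3.2 and in the Remark can be illustrated at a single frequency. The sketch below is not part of the original argument: it assumes the block layout of (3.13) with third block column $(HQ,\ G,\ Q)$, uses random constant matrices in place of spectra, and tests a few points of the interval $[Q_2, Q_1]$ together with one hypothetical point outside it.

```python
import numpy as np

rng = np.random.default_rng(2)
inv = np.linalg.inv

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_pd(n):
    M = crandn(n, n)
    return M @ M.conj().T + n * np.eye(n)

def min_eig(M):
    """Smallest eigenvalue of the Hermitian part of M."""
    return np.linalg.eigvalsh((M + M.conj().T) / 2).min()

m1, m2, n = 3, 4, 2
H, G = crandn(m1, n), crandn(m2, n)
Q0 = random_pd(n)                                # hypothetical factor spectrum
S1 = H @ Q0 @ H.conj().T + random_pd(m1)
S2 = G @ inv(Q0) @ G.conj().T + random_pd(m2)
Q1 = inv(H.conj().T @ inv(S1) @ H)               # maximal solution
Q2 = G.conj().T @ inv(S2) @ G                    # minimal solution

def joint_matrix(Q):
    """Block matrix (3.13) for a candidate factor spectrum Q."""
    Hh, Gh = H.conj().T, G.conj().T
    return np.block([[S1, H @ Gh, H @ Q],
                     [G @ Hh, S2, G],
                     [Q @ Hh, Gh, Q]])

def noise_spectra(Q):
    """R1 and R2 of (3.12) for a candidate Q."""
    return S1 - H @ Q @ H.conj().T, S2 - G @ inv(Q) @ G.conj().T

inside = [t * Q1 + (1 - t) * Q2 for t in (0.0, 0.5, 1.0)]
outside = Q1 + (Q1 - Q2)                         # violates Q <= Q1

min_eigs_inside = [min_eig(joint_matrix(Q)) for Q in inside]
min_eigs_noise = [min(min_eig(R) for R in noise_spectra(Q)) for Q in inside]
min_eig_outside = min_eig(joint_matrix(outside))
print(min_eigs_inside, min_eigs_noise, min_eig_outside)
```

For the three points inside the interval the smallest eigenvalues are expected to be nonnegative up to rounding (zero at the two extremes), whereas the point outside the interval gives an indefinite matrix.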
Theorem 3.1 provides a recipe for computing all irreducible F.A. representations describing a given spectral density matrix $S$ in a fixed coordinate system. We can now easily see that there are many such representations (a fact that we have not bothered to show until now). For example, as the two "extreme" spectra $Q_1$ and $Q_2$ defined in (3.16) and (3.17) both satisfy the inequality (3.19) (with equality sign), we see that there are a "maximal" and a "minimal" irreducible F.A. representation (in the first canonical form), which correspond respectively to the maximal ($Q_1$) and minimal ($Q_2$) solutions to the inequality (3.19).
Solutions like $Q_1$, $Q_2$ above, for which (3.19) is satisfied with equality sign, have a special meaning. They correspond to joint spectra (3.13) of minimum possible rank, $m$, as can be seen from the block diagonalization (3.22). Since the rank of the joint spectrum of $\{z_1(t)\}$, $\{z_2(t)\}$ and $\{x(t)\}$ is equal to the multiplicity of the doubly invariant subspace $H(x, z_1, z_2)$ spanned by these processes, the multiplicity $m$ of $H(x, z_1, z_2)$ is equal to the multiplicity of the subspace $H(z_1, z_2)$. This can only happen if $H(x, z_1, z_2) = H(z_1, z_2)$ or, which is the same, if $x(t) \in H(z_1, z_2)$ for all $t \in \mathbb{Z}$. We see that all models which correspond to solutions $Q$ of (3.19) with equality sign are characterized by the fact that the factor process $\{x(t)\}$ is a function of $\{z_1(t)\}$, $\{z_2(t)\}$. This observation is the key to the following result.
PROPOSITION 3.1 The solutions $Q$ to the quadratic matrix equation
$$Q - Q_2 - (Q - Q_2)\Delta^{-1}(Q - Q_2)^* = 0 \tag{3.25}$$
parametrize in a one-to-one way the (strong) irreducible y-measurable representations of the processes $\{y_1(t)\}$, $\{y_2(t)\}$ of the form (3.10). Dually, all solutions $\bar Q$ to the quadratic equation
$$\bar Q - \bar Q_1 - (\bar Q - \bar Q_1)\bar\Delta^{-1}(\bar Q - \bar Q_1)^* = 0 \tag{3.26}$$
parametrize in a one-to-one way the (strong) irreducible F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$ of the form (3.11) for which $X \subset H(y)$.
Proof: Consider a F.A. representation of the type (3.10). If the factor space $X$ is contained in $H(y)$, then $H(x, y_1, y_2) = H(y_1, y_2) = H(y)$
and hence the joint spectrum of $\{y_1(t)\}$, $\{y_2(t)\}$, $\{x(t)\}$ has rank $m$. This implies that the spectrum $Q$ of $\{x(t)\}$ satisfies (3.19) with equality sign. Vice versa, assume $Q$ is a solution of (3.25). Then, as discussed previously, the factor process of the F.A. model of type (3.3) attached to the weak realization $\{H,\ GQ^{-1},\ Q,\ S_1 - HQH^*,\ S_2 - GQ^{-1}G^*\}$ of the spectrum $S$ has the property that $x(t)$ belongs to $H(z_1, z_2)$ for all $t$. It can therefore be written as $x(t) = P_1(z)z_1(t) + P_2(z)z_2(t)$, where $P_i(z)$, $i = 1,2$, are $n \times m_i$ transfer matrices. Define an $n$-dimensional process $\{\hat x(t)\}$ by setting
$$\hat x(t) = P_1(z)y_1(t) + P_2(z)y_2(t). \tag{3.27}$$
Then $\{\hat x(t)\}$, $\{y_1(t)\}$, $\{y_2(t)\}$ have exactly the same joint second order statistics (i.e. the same spectrum) as $\{x(t)\}$, $\{z_1(t)\}$, $\{z_2(t)\}$. Since conditional orthogonality depends on joint second order moments only, it then follows that $\hat X := \operatorname{span}\{\hat x(t);\ t \in \mathbb{Z}\}$ is splitting for $H(y_1)$, $H(y_2)$ exactly as the factor space $X$ was splitting for $H(z_1)$, $H(z_2)$. Hence $\{\hat x(t)\}$ is the factor process of a strong F.A. representation of the type (3.10). By construction $\{\hat x(t)\}$ has spectral density matrix equal to $Q$ and $\hat X \subset H(y)$. □
Let us define the stationary $n$-dimensional processes
$$x_1(t) = Q_1(z)\bar x_1(t), \qquad \bar x_1(t) = H^*(z)S_1(z)^{-1}y_1(t),$$
$$x_2(t) = G^*(z)S_2(z)^{-1}y_2(t), \tag{3.28}$$
where $Q_1$ is defined by (3.16), (3.17). Observe that the spectra of $\{x_1(t)\}$ and $\{x_2(t)\}$ are precisely the extremal solutions $Q_1$, $Q_2$ of the quadratic inequality (3.19). It is immediate to check that $\{x_1(t)\}$ and $\{x_2(t)\}$ are minimal generators for the subspaces
$$X_1 := E^{H(y_1)}H(y_2), \qquad X_2 := E^{H(y_2)}H(y_1).$$
(In fact, for example, $X_2$ is generated by $\hat y_1(t) = S_{12}(z)S_2(z)^{-1}y_2(t) = H(z)x_2(t)$.) Moreover both $X_1$ and $X_2$ are minimal splitting subspaces (compare e.g. Lindquist, Picci and Ruckebusch, 1979) with
$$X_1 \subset H(y_1), \qquad X_2 \subset H(y_2);$$
therefore they specify two equivalence classes of strong irreducible F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$. The particular generators $\{x_1(t)\}$ and $\{x_2(t)\}$ defined in (3.28) correspond to choosing these representations in the first canonical form, namely
$$y_1(t) = H(z)x_1(t) + w_{1,1}(t), \qquad y_2(t) = G(z)Q_1(z)^{-1}x_1(t) + w_{1,2}(t), \tag{3.29}$$
and
$$y_1(t) = H(z)x_2(t) + w_{2,1}(t), \qquad y_2(t) = G(z)Q_2(z)^{-1}x_2(t) + w_{2,2}(t). \tag{3.30}$$
Observe that in the representation (3.29) the second equation is just the decomposition of $y_2(t)$ as the sum of the (noncausal) estimate $\hat y_2(t) = S_{21}(z)S_1(z)^{-1}y_1(t)$ and of the corresponding estimation error. The first equation is more interesting. It can be rewritten in the form
$$y_1(t) = \Pi_H(z)y_1(t) + (I - \Pi_H(z))y_1(t), \tag{3.31}$$
where $\Pi_H$ is the projection valued matrix function
$$\Pi_H(z) = H(z)\,(H^*(z)S_1(z)^{-1}H(z))^{-1}H^*(z)S_1(z)^{-1} \tag{3.32}$$
mapping onto the column space of $H$. Note that $\Pi_H$ is $S_1$-orthogonal, i.e. $\Pi_H S_1 (I - \Pi_H)^* = 0$ a.e. on the unit circle. Thus $x_1$ formally looks like the classical least squares estimate of $x$ in the linear model $y_1 = Hx + w$. An analogous interpretation holds for the second equation in (3.30).
the family of
all (strong) y-measurable irreducible F.A. representations of {Y1(t) },
{Y2(t) }.
THEOREM 3.3 The factor process of any irreducible y-measurable F.A. representation (in the first canonical form) is a combination of {x1(t)} and {x2(t)} of the form x(t) = H(z)x1(t) + (I-E(z))x2(t) ,
(3.33)
where = (Q-Q2)A -1
(3.34)
is a A-ortho~onal projection valued matrix function on the unit circle. Proof : The proof relies on the easily checked fact that IN(x2) ] =x2(t). Then
x(t): =x1(t)-x2(t)
is orthogonal to {x2(t)} and so the direct sum is orthogonal.
ELx1(t) I
form a process which
H(x 1,x 2) =H(x)8H(x2) , where Now, any minimal splitting
suhspace X C H(y) is actually contained in
H(x~,x 2) C H(y)
(Lindquist and Picci, 1985), so that the corresponding factor process {x(t)} can be expressed as x(t) = S
~(z)A(z)-Ix(t) +S (z)Q2(z)-Ix2(t) x,x x,x 2 '
(3.35)
where the cross spectra are easily computed from
$$S_{x,x_1} = S_{x,y_1}S_1^{-1}HQ_1 = Q, \qquad S_{x,x_2} = S_{x,y_2}S_2^{-1}G = Q_2,$$
so that $S_{x,\tilde x} = Q - Q_2$. Equation (3.35) is exactly the same as (3.33). In order to check that $\Pi$ is a projection, notice that right multiplication of (3.25) by $\Delta^{-1}$ gives
$$(Q - Q_2)\Delta^{-1} = (Q - Q_2)\Delta^{-1}(Q - Q_2)\Delta^{-1},$$
which shows that $\Pi = \Pi^2$; moreover (3.25) can be rewritten to look exactly like $\Pi\Delta(I - \Pi)^* = 0$. Thus $\Pi$ is a $\Delta$-orthogonal projection.
If we couple formula (3.33) with the explicit expressions (3.28) given for $x_1(t)$ and $x_2(t)$, we obtain a linear transformation acting on the "data" $\{y_1(t)\}$, $\{y_2(t)\}$ that we want to represent. This is precisely the rule telling us how the factor process of each y-measurable representation is manufactured. Note that (3.33) is still parametrized by $Q$. To complete the picture we now need to describe the solution set of the quadratic equation (3.25).
PROPOSITION 3.2 Let $V$ be a square spectral factor of the spectral density matrix $\Delta = Q_1 - Q_2$. Then all solutions $Q \ne Q_2$ to (3.25) are given by
$$Q = Q_2 + V\Gamma\Gamma^*V^*, \tag{3.36}$$
where $\Gamma$ is any $n \times k$ ($k \le n$) isometric matrix function on the unit circle, i.e. such that
$$\Gamma^*\Gamma = I_k, \tag{3.37}$$
$I_k$ being the $k \times k$ identity matrix.
Proof: Write $Q - Q_2$, assumed to be of rank $k$, in factorized form as
$$Q - Q_2 = UU^*,$$
with $U$ a full rank spectral factor of dimension $n \times k$. Since $U$ has a left inverse, (3.25) can be reduced to
$$I_k = U^*(VV^*)^{-1}U,$$
from which we see that $\Gamma := V^{-1}U$ satisfies (3.37). □
Note that there is just one $Q$ such that $\operatorname{rank}(Q - Q_2) = n$. In this case $\Gamma$ is square and (3.37) is equivalent to $\Gamma\Gamma^* = I$; hence we obtain the "maximal" solution $Q = Q_1$. At the other extreme, the "minimal" solution $Q = Q_2$ is formally obtained by setting $\Gamma = 0$ in formula (3.36). Observe that by letting $\Gamma$ vary over the set of rational isometric matrices we have a parametrization of all rational solutions of equation (3.25). In other words, recalling that $H$ and $G$ were chosen rational, we have a parametrization of all rational irreducible y-measurable F.A. representations of the processes $\{y_1(t)\}$ and $\{y_2(t)\}$.
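The parametrization (3.36) and the projection of Theorem 3.3 can be exercised together at a single frequency. The sketch below is only an illustration: it assumes $Q_1 = (H^*S_1^{-1}H)^{-1}$ and $Q_2 = G^*S_2^{-1}G$, uses a Cholesky factor as the square factor $V$ of $\Delta$, and draws a random isometry `Gamma` by a QR decomposition; none of these choices is prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(4)
inv = np.linalg.inv

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_pd(n):
    M = crandn(n, n)
    return M @ M.conj().T + n * np.eye(n)

m1, m2, n, k = 4, 5, 3, 2
H, G = crandn(m1, n), crandn(m2, n)
Q0 = random_pd(n)                                # hypothetical factor spectrum
S1 = H @ Q0 @ H.conj().T + random_pd(m1)
S2 = G @ inv(Q0) @ G.conj().T + random_pd(m2)
Q1 = inv(H.conj().T @ inv(S1) @ H)
Q2 = G.conj().T @ inv(S2) @ G
Delta = Q1 - Q2
Delta = (Delta + Delta.conj().T) / 2             # remove rounding asymmetry

V = np.linalg.cholesky(Delta)                    # Delta = V V^*
Gamma, _ = np.linalg.qr(crandn(n, k))            # n x k, Gamma^* Gamma = I_k
Q = Q2 + V @ Gamma @ Gamma.conj().T @ V.conj().T # formula (3.36)

D = Q - Q2
residual = np.linalg.norm(D - D @ inv(Delta) @ D.conj().T)   # equation (3.25)

Pi = D @ inv(Delta)                              # projection (3.34)
I = np.eye(n)
pi_is_idempotent = np.allclose(Pi @ Pi, Pi)
pi_is_delta_orthogonal = np.allclose(Pi @ Delta @ (I - Pi).conj().T, 0)
rank_of_D = np.linalg.matrix_rank(D, tol=1e-9)
print(residual, pi_is_idempotent, pi_is_delta_orthogonal, rank_of_D)
```

Taking `k = n` in this sketch makes `Gamma` unitary and returns `Q = Q1`, the maximal solution mentioned after the proof.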
4. Causality
As an application of the characterization obtained in Sect. 3 we shall discuss here the question of causality of the transfer function $W(z)$ defined at the end of Sect. 2. We shall call a F.A. model
$$y_k(t) = \hat y_k(t) + w_k(t), \qquad \hat y_k(t) = A_k(z)x(t), \qquad k = 1,2, \tag{4.1}$$
causal whenever we can write
$$\hat y_2(t) = W(z)\hat y_1(t) \tag{4.2}$$
for a causal $m_2 \times m_1$ transfer function matrix. The question is whether there are any causal F.A. models for a given pair of processes $\{y_1(t)\}$, $\{y_2(t)\}$ satisfying our Basic Assumption. Note that for irreducible models $W(z) = A_2(z)A_1(z)^{-L}$, and for the purposes of (4.2) the choice of the left inverse is immaterial. Therefore an irreducible model will be causal if, for at least one left inverse $A_1^{-L}$, the transfer matrix $A_2A_1^{-L}$ is causal.
We shall need the concept of Wiener-Hopf factorization relative to the unit circle $C$ of the rational matrix $S_{12}(z)$. As noticed in (Fuhrmann and Willems, 1979), the original arguments of (Gohberg and Krein, 1960) can be adapted to cover the nonsquare (singular) case which is of interest here. Recall that by our Basic Assumption and in force of condition (3.6), $S_{12}(z)$ has constant rank $n$ on the unit circle.
LEMMA 4.1 (WIENER-HOPF FACTORIZATION) The rational matrix function $S_{12}(z)$ can be factored as
$$S_{12}(z) = \hat H(z)D(z)\hat G(z)^*, \tag{4.3}$$
where $\hat H(z)$ and $\hat G(z)$ are $m_1 \times n$ and $m_2 \times n$ causal rational matrices of rank $n$ on the unit circle with a causal left inverse, and $D(z)$ is an $n \times n$ diagonal matrix of the type
$$D(z) = \operatorname{diag}\{z^{-k_1}, \ldots, z^{-k_\ell}, z^{k_{\ell+1}}, \ldots, z^{k_n}\}. \tag{4.4}$$
The integers
$$-k_1 \le \ldots \le -k_\ell < 0 \le k_{\ell+1} \le \ldots \le k_n \tag{4.5}$$
are uniquely determined and are called the (left) Wiener-Hopf factorization indices of $S_{12}$, relative to $C$.
□
Note that $D(z)$ can in turn be factored on the unit circle as
$$D(z) = D_1(z)D_2(z)^*, \tag{4.6}$$
where
$$D_1(z) = \operatorname{diag}\{z^{-k_1}, \ldots, z^{-k_\ell}, 1, \ldots, 1\}, \qquad D_2(z) = \operatorname{diag}\{1, \ldots, 1, z^{-k_{\ell+1}}, \ldots, z^{-k_n}\}. \tag{4.7}$$
The factorization (4.3) can thus be rewritten in the form
$$S_{12}(z) = [\hat H(z)D_1(z)]\,[\hat G(z)D_2(z)]^*. \tag{4.8}$$
In this section we shall identify the rational matrix functions $H(z)$ and $G(z)$ of the minimal factorization (3.5) with the two terms within square brackets in (4.8). We shall consider irreducible F.A. models written in the second canonical form
$$y_1(t) = \hat H(z)D_1(z)\bar Q(z)^{-1}\bar x(t) + w_1(t), \qquad y_2(t) = \hat G(z)D_2(z)\bar x(t) + w_2(t), \tag{4.9}$$
with $\bar Q(z)$ any Hermitian $n \times n$ matrix function satisfying
$$\bar Q_2 \ge \bar Q \ge \bar Q_1, \tag{4.10}$$
where
$$\bar Q_2 = (D_2^*\hat G^*S_2^{-1}\hat G D_2)^{-1}, \tag{4.11}$$
$$\bar Q_1 = D_1^*\hat H^*S_1^{-1}\hat H D_1. \tag{4.12}$$
In this framework the transfer function matrix $W$ relative to an arbitrary irreducible F.A. model is
$$W(z) = \hat G(z)D_2(z)\bar Q(z)D_1(z)^*\hat H(z)^{-L}. \tag{4.13}$$
LEMMA 4.2 The transfer function $W(z)$ is causal (for at least one choice of the left inverse $\hat H^{-L}$) if and only if $D_2(z)\bar Q(z)D_1(z)^*$ is a causal matrix function.
Proof: (If). Since $\hat H(z)$ is minimum phase there is a causal left inverse. Thus if $D_2\bar Q D_1^*$ is causal, $W$ is causal.
(Only if). Since $D_2(z)\bar Q(z)D_1(z)^* = \hat G(z)^{-L}W(z)\hat H(z)$ and $\hat G(z)$ is minimum phase, it follows that $W$ causal implies $D_2\bar Q D_1^*$ causal. □
THEOREM 4.1 Under the stated assumptions a causal irreducible F.A. model of $\{y_1(t)\}$, $\{y_2(t)\}$ can only exist if the Wiener-Hopf factorization indices of $S_{12}(z)$ are all nonnegative (i.e. $D_1(z) = I_n$ in (4.7)).
Proof: We show that if $D_1(z) \ne I$, or equivalently $\ell > 0$ in (4.5), then $D_2\bar Q D_1^*$ cannot be causal. In fact, for $j \le \ell$, the $j$-th diagonal element of $D_2\bar Q D_1^*$ is $z^{k_j}\bar q_{jj}(z)$, where $\bar q_{jj}(z)$ is the $j$-th diagonal element of $\bar Q(z)$. By definition of causality we must have (compare (1.9))
$$\int_{-\pi}^{\pi} e^{i\theta(k_j - k)}\,\bar q_{jj}(e^{i\theta})\,d\theta/2\pi = 0$$
for all $k > 0$. By taking the complex conjugate and recalling that $\bar q_{jj}$ is a real function, we also obtain
$$\int_{-\pi}^{\pi} e^{i\theta(k - k_j)}\,\bar q_{jj}(e^{i\theta})\,d\theta/2\pi = 0$$
for $k > 0$. Now, if these two relations hold for some $k_j > 0$, they imply that
$$\int_{-\pi}^{\pi} e^{-i\theta h}\,\bar q_{jj}(e^{i\theta})\,d\theta/2\pi = 0$$
for all $h \in \mathbb{Z}$. This is equivalent to $\bar q_{jj}(e^{i\theta}) = 0$ a.e. on $C$, and contradicts the (strict) positive definiteness of $\bar Q(e^{i\theta})$. Thus $D_2\bar Q D_1^*$ cannot be causal. □
At this point, to be able to proceed any further we have to introduce the assumption that the Wiener-Hopf indices of $S_{12}$ are all nonnegative, i.e. that
$$D(z) = D_2(z)^* = \operatorname{diag}\{z^{k_1}, \ldots, z^{k_n}\}, \tag{4.14}$$
where
$$0 \le k_1 \le \ldots \le k_n. \tag{4.15}$$
We next introduce the notion of an $n \times n$ matrix trigonometric polynomial $P(z) = [p_{ij}(z)]$ with indices the ordered set of $n$ natural numbers $\{k_1, \ldots, k_n\}$. The $i,j$-th entry of $P(z)$ has the structure
$$p_{ij}(z) = \sum_{k=-k_j}^{k_i} p_{ij}^{k}\,z^{k}. \tag{4.16}$$
THEOREM 4.2 Assume that the Wiener-Hopf factorization indices of $S_{12}(z)$ are all nonnegative. Then there are causal irreducible F.A. models if and only if there are Hermitian trigonometric polynomial solutions $\bar Q$ to the inequality
$$(D_2^*\hat G^*S_2^{-1}\hat G D_2)^{-1} \ge \bar Q \ge \hat H^*S_1^{-1}\hat H, \tag{4.17}$$
with indices equal to the factorization indices (4.15) of $S_{12}(z)$.
Proof: By our assumption (4.14) and Lemma 4.2, each causal F.A. model is characterized by $D_2(z)\bar Q(z)$ being causal. By definition this happens if and only if
$$\int_{-\pi}^{\pi} e^{-i\theta(k_i + k)}\,\bar q_{ij}(e^{i\theta})\,d\theta/2\pi = 0$$
for $i,j = 1, \ldots, n$ and all $k > 0$. By taking the complex conjugate and recalling that $\bar Q$ is Hermitian, i.e. $\bar q_{ij}(e^{i\theta})^* = \bar q_{ji}(e^{i\theta})$, we also get
$$\int_{-\pi}^{\pi} e^{i\theta(k_j + k)}\,\bar q_{ij}(e^{i\theta})\,d\theta/2\pi = 0$$
for all $k > 0$. Taken together these two relations are equivalent to
$$\int_{-\pi}^{\pi} e^{-i\theta h}\,\bar q_{ij}(e^{i\theta})\,d\theta/2\pi = 0$$
for all $h$ satisfying $h < -k_j$ and $h > k_i$. This shows that $\bar q_{ij}(z)$ has the expression (4.16).
□
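The index structure (4.16) and the causality of $D_2\bar Q$ can be compared on a small example. The sketch below is only an illustration: each entry $\bar q_{ij}$ is stored as a dictionary mapping the power of $z$ to its coefficient, causality is taken to mean that no positive power of $z$ appears (the convention implied by the Fourier conditions in the proof), and the two sample matrices are hypothetical.

```python
def is_hermitian(Q):
    """Check q_ji(z) = conj(q_ij(1/conj(z))): coefficient of z^h in q_ji
    equals the conjugate of the coefficient of z^(-h) in q_ij."""
    n = len(Q)
    return all(
        Q[j][i].get(-h, 0) == complex(c).conjugate()
        for i in range(n) for j in range(n) for h, c in Q[i][j].items()
    )

def has_index_structure(Q, k):
    """Entry (i, j) only contains powers z^h with -k_j <= h <= k_i, as in (4.16)."""
    n = len(k)
    return all(
        -k[j] <= h <= k[i]
        for i in range(n) for j in range(n)
        for h, c in Q[i][j].items() if c != 0
    )

def d2_q_is_causal(Q, k):
    """Entry (i, j) of D2 Q is z^(-k_i) q_ij; causal means no positive power of z."""
    n = len(k)
    return all(
        h - k[i] <= 0
        for i in range(n) for j in range(n)
        for h, c in Q[i][j].items() if c != 0
    )

k = [0, 1]  # hypothetical nonnegative Wiener-Hopf indices, ordered as in (4.15)

# Hermitian trigonometric polynomial with the index structure (4.16).
Q_good = [[{0: 2.0}, {-1: 0.5, 0: 0.3}],
          [{0: 0.3, 1: 0.5}, {-1: 0.2, 0: 3.0, 1: 0.2}]]

# Hermitian, but entry (1, 1) has degree exceeding k_1 = 0.
Q_bad = [[{-1: 0.1, 0: 2.0, 1: 0.1}, {-1: 0.5, 0: 0.3}],
         [{0: 0.3, 1: 0.5}, {-1: 0.2, 0: 3.0, 1: 0.2}]]

for Q in (Q_good, Q_bad):
    print(is_hermitian(Q), has_index_structure(Q, k), d2_q_is_causal(Q, k))
```

For Hermitian matrices the two checks agree, which is the content of the proof; positivity and the bounds (4.17) are not examined by this sketch.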
Observe that a positive definite trigonometric polynomial can be factored as
$$\bar Q(z) = N(z)N(z)^*, \tag{4.18}$$
where $N(z)$ is an $n \times n$ polynomial matrix which can be taken row-proper and with row degrees exactly equal to the indices $k_1 \le \ldots \le k_n$ of $\bar Q$. Recalling the remark made after the proof of Theorem 3.2, we can recast the conditions of Theorem 4.2 in terms of the joint spectrum $S$ in the following way.
COROLLARY 4.1 Assume the Wiener-Hopf factorization indices of $S_{12}(z)$ are all nonnegative. Then there are causal irreducible F.A. models if and only if there are $n \times n$ polynomial matrices $N(z)$ with (ordered) row degrees equal to the indices $k_1 \le \ldots \le k_n$ of $S_{12}(z)$ such that
$$S_2 \ge \hat G D_2 N N^* D_2^* \hat G^*, \qquad S_1 \ge \hat H (N^*)^{-1}N^{-1}\hat H^*. \tag{4.19}$$
At the beginning of this section causality was defined with respect to a certain choice of input ($\hat y_1$) and output ($\hat y_2$) variables. If we choose instead $\hat y_2(t)$ as input and $\hat y_1(t)$ as output, we can of course go through a very similar analysis and obtain analogous conditions for the existence of causal irreducible F.A. models of the type
$$y_1(t) = W(z)^{\#}\hat y_2(t) + w_1(t), \qquad y_2(t) = \hat y_2(t) + w_2(t), \tag{4.20}$$
where $\hat y_1(t)$ is obtained as
$$\hat y_1(t) = W(z)^{\#}\hat y_2(t) \tag{4.21}$$
for a causal $m_1 \times m_2$ transfer function matrix. Relative to the Wiener-Hopf factorization (4.3), $W^{\#}$ has the expression
$$W(z)^{\#} = \hat H(z)D_1(z)Q(z)D_2(z)^*\hat G(z)^{-L}. \tag{4.22}$$
THEOREM 4.3 Causal irreducible F.A. models of the type (4.20) can only exist if the Wiener-Hopf factorization indices of $S_{12}(z)$ are all negative or zero, i.e. only if

$$D(z) = D_1(z) = \mathrm{diag}\{z^{-k_1}, \dots, z^{-k_n}\}. \tag{4.23}$$

In case (4.23) is satisfied, there are causal irreducible F.A. models of the type (4.20) if and only if there are Hermitian trigonometric polynomial solutions Q to the inequality

$$(D_1^* \hat H^* S_1^{-1} \hat H D_1)^{-1} > Q > \hat G^* S_2^{-1} \hat G, \tag{4.24}$$

with indices equal to the opposite $\{k_1, \dots, k_n\}$ of the factorization indices of $S_{12}(z)$. An equivalent condition is the existence of n x n polynomial matrices N(z) with (ordered) row degrees $k_1, \dots, k_n$ such that

$$S_1 > \hat H D_1 N N^* D_1^* \hat H^*, \qquad S_2 > \hat G (N^*)^{-1} N^{-1} \hat G^*. \tag{4.25}$$
[]
Let us agree to call minimum phase those F.A. models for which both (4.2) and (4.21) are causal input-output relations. Then, as a corollary of Theorems 4.1-4.3, we get that minimum phase models exist only if the Wiener-Hopf factorization of $S_{12}(z)$ has $D(z) = I_n$ (i.e. is "canonical" in the terminology of Gohberg and Krein (1960) and Bart, Gohberg and Kaashoek (1979)). There exist minimum phase F.A. models if and only if there are constant Hermitian n x n matrices Q for which

$$S_1 > \hat H Q \hat H^*, \qquad S_2 > \hat G Q^{-1} \hat G^* \tag{4.26}$$

on the unit circle. We see that the factor process of minimum phase irreducible F.A. models written in either canonical form, for which H and G are chosen equal to the Wiener-Hopf factors, must be an n-dimensional white noise process.
References
Anderson, B.D.O. (1985): Identification of Scalar Errors-in-Variables Models with Dynamics. Automatica, 21, 709-716.
Anderson, B.D.O. and M. Deistler (1984): Identifiability in Dynamic Errors-in-Variables Models. J. Time Series Analysis, 5, 1-13.
Anderson, B.D.O. and M.R. Gevers (1982): Identifiability of Linear Stochastic Systems Operating Under Linear Feedback. Automatica, 18, 195-213.
Bart, H., I. Gohberg and M.A. Kaashoek (1979): Minimal Factorization of Matrix and Operator Functions, Operator Theory: Advances and Applications, Vol. 1, Birkhäuser Verlag, Basel.
Bart, H., I. Gohberg and M.A. Kaashoek (1984): Wiener-Hopf Factorization and Realization, in Proc. Int. Symp. on Mathematical Theory of Networks and Systems, Beer Sheva, Israel, June 1983, Springer-Verlag Lect. Notes in Control and Inf. Sciences, 58, 42-62.
Caines, P.E. and C.W. Chan (1975): Feedback Between Stationary Stochastic Processes. IEEE Trans. Aut. Control, AC-20, 498-508.
Caines, P.E. and C.W. Chan (1976): Estimation, Identification and Feedback, in System Identification: Advances and Case Studies, R.K. Mehra and D.G. Lainiotis eds., Academic Press, New York.
Deistler, M. (1985): Identifiability and Causality in Linear Dynamic Errors-in-Variables Systems. Report, Inst. of Econometrics and Operations Research, University of Technology, Vienna.
Finesso, L. and G. Picci (1984): Linear Statistical Models and Stochastic Realization Theory, in Proc. VI-th Int. Conf. on Analysis and Optimization of Systems, Nice, France, June 1984, Springer-Verlag Lect. Notes in Control and Inf. Sciences, 62, 445-470.
Fuhrmann, P.A. (1981): Linear Operators and Systems in Hilbert Space, McGraw-Hill, New York.
Fuhrmann, P.A. and J.C. Willems (1979): Factorization Indices at Infinity for Rational Matrix Functions. Integral Equations and Operator Theory, 2, 287-301.
Gevers, M.R. and B.D.O. Anderson (1981): Representations of Jointly Stationary Stochastic Feedback Processes. Int. J. of Control, 33, 777-809.
Gevers, M.R. and B.D.O. Anderson (1982): On Jointly Stationary Feedback-Free Stochastic Processes. IEEE Trans. Aut. Control, AC-27, 431-436.
Gohberg, I. and M.G. Krein (1960): Systems of Integral Equations on a Half Line with Kernels Depending on the Difference of Arguments. Amer. Math. Soc. Transl. (2), 14, 217-287.
Granger, C.W.J. (1963): Economic Processes Involving Feedback. Information and Control, 6, 28-48.
Granger, C.W.J. (1969): Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica, 37.
Hoffman, K. (1962): Banach Spaces and Analytic Functions, Prentice-Hall, Englewood Cliffs.
Kalman, R.E. (1982a): System Identification from Noisy Data, in Dynamical Systems II, A.R. Bednarek and L. Cesari eds., Academic Press, New York.
Kalman, R.E. (1982b): Identification from Real Data, in Current Developments in the Interface: Economics, Econometrics, Mathematics, M. Hazewinkel and A.H.G. Rinnooy Kan eds., Reidel, Dordrecht.
Kalman, R.E. (1983): Identifiability and Modeling in Econometrics, in Developments in Statistics, Vol. 4, P.R. Krishnaiah ed., Academic Press, New York.
Lindquist, A. and G. Picci (1985): Realization Theory for Multivariate Stationary Gaussian Processes. SIAM J. Control and Optim., 23, 809-857.
Lindquist, A., G. Picci and G. Ruckebusch (1979): On Minimal Splitting Subspaces and Markovian Representations. Math. Systems Theory, 12, 271-279.
Picci, G. and S. Pinzoni (1986): Dynamic Factor Analysis Models for Stationary Processes, IMA J. Math. Control and Information, to appear.
Rozanov, Y.A. (1967): Stationary Random Processes, Holden-Day, San Francisco.
Ruckebusch, G. (1976): Représentations Markoviennes de Processus Gaussiens Stationnaires. C.R. Acad. Sc. Paris, Sér. A, 282, 649-651.
Van Schuppen, J.H. (1985): Stochastic Realization Problems Motivated by Econometric Modelling. Report OS-R8507, Centre for Mathematics and Computer Science, Amsterdam.
Willems, J.C. (1979): System Theoretic Models for the Analysis of Physical Systems. Ricerche di Automatica, 10, 71-106.
Chapter 4
Predictive and Nonpredictive Minimum Description Length Principles
Jorma Rissanen
1. Introduction
Statistical estimation or modeling is an activity aimed at inferring from a set of observed data certain properties that are expected to hold in future data. This involves a fundamental dilemma in that whatever we estimate will be determined by the current data, and yet the success of our attempts will be judged by the behaviour in the future data, which evidently are not available now. The way this difficulty is dealt with in traditional statistics is to regard the current data as a sample from a larger, in effect an infinite population, represented by a "true" probability distribution with parameters, each meant to define some property of the data. These parameters then provide the targets to be estimated, which can be done by minimization of some measure of nearness, such as the squared deviations or the likelihood function, between the existing data and the fitted parametric distributions. In trivial cases the number of the "true" parameters is taken to be known, but frequently, in order to leave all the doors open, the "true" parent distribution is assumed to have infinitely many parameters, which evidently is a safe hypothesis in that it can neither be verified nor disproved.
The problem with the "true" distribution hypothesis is not so much the fact that the distribution has to be chosen subjectively (in fact, selecting a large enough class will allow a lot of leeway) as the fact that this hypothesis forces us to regard models as approximations of the assumed distribution, the goodness of which, however, must be judged in the light of the observed data. Hence, if we fit models having different numbers of parameters, then a model with more parameters is likely to provide a better fit than one with fewer parameters without any guarantee of better performance on future data. And this is true no matter how we measure the nearness. What instead is needed is the ability to compare models regardless of the number of parameters they have, which simply cannot be done by their nearness to an abstract and subjectively selected parent distribution.
In this paper we present in a tutorial fashion a rather different approach to statistical reasoning, introduced and studied in a number of papers, Rissanen (1978), (1983b), (1984a,b,c), (1985a,b). The reasoning goes as follows: The main problem in statistical modeling is regarded as one of understanding and explaining the set of observed data, which, to be sure, often look quite chaotic. Intuitively, "understanding" presumably means something related to an ability to learn and to discover various regular features that constrain the data and that imply redundancy if we were to describe the data without taking them into account. Additionally, an understanding permits a degree of prediction. A trivial example is a sequence such as 1, 4, 9, 16, 25, which, if we spot the rule, can be described very concisely as well as perfectly predicted, provided, of course, that the rule holds even in the future. A less trivial example is Newton's law of gravitation, which is a model that permitted a great improvement over the Ptolemaic models by Eudoxus and Hipparchus as well as Tycho Brahe's tables for the planetary motions, both in regard to the description length and predictability. Notice, in particular, that no "true" law is needed to do prediction; for example, Hipparchus' epicycles and eccentric circles were clearly incorrect explanations of the planetary motions, but still they provided useful predictions of the lunar eclipses. Similarly, Newton's law is also incorrect, but it provides predictions with astonishing accuracy. Many people think that there is a difference in kind between a grossly incorrect model like Eudoxus' and an accurate one like Newton's, because, indeed, Newton's model "explains" the planetary motions with the help of the most elegant law of universal gravitation.
But the difference is simply one of degree, and the universal "law" of gravitation is just another incorrect model, which, incidentally, involves the rather disturbing and, in fact, absurd idea that a force is being transmitted instantly. To summarize, there are no "true" laws nor systems outside the realm of mathematics, but that does not prevent us from understanding observed data.
We are interested in statistical features which, of course, somehow reflect the underlying data generating machinery. Since we usually are not allowed to open up the machinery and take a direct look, we must have some means to recognize the regular features in the observations and to measure their amount. This can be done by counting the number of binary digits with which the observed data can be written down by taking advantage of the various models that serve as an expression of the rules. In technical terms, we say that the data is encoded for the purpose of getting a short code length; i.e., "compressed". The resulting code length, then, represents a universal and immutable criterion for model fitting, which is just about as free from subjective and whimsical choices as we can make it. There remains the subjective selection of the class of models, but that must necessarily be so; after all, how can we learn from the data unless we can formulate the properties we wish to find? Similarly, by carefully selecting the model class we can also influence the properties we wish to discover, which gives us a means of learning. It is important to see that the code must include the description of the model itself, for otherwise the imagined decoder could not recover the data. We call the process of minimizing such a criterion the Minimum Description Length (MDL) principle.
It is clear that the length of coding the data cannot be reduced below a certain level, which is entirely determined by the data and the class of the selected models, regardless of whether the models have the same number of parameters or not. We call this critical level the stochastic complexity of the data, relative to the considered class of models. Different model classes can be judged by their stochastic complexity, and perhaps to some consternation a subclass of another class may produce a strictly smaller stochastic complexity than the larger class. Hence, the way to good models is not just to make the model classes larger and larger; that is to say, to make the models increasingly complex.
In stark contrast with traditional statistics, the optimal model, determined by the stochastic complexity, or with which it is reached, is not an approximation of anything at all. Rather, it has an independent meaning in incorporating all the statistical information in the data that can be extracted with the considered class of models. In particular, it has an optimal number of parameters, which are calculated by an estimator which is either efficient or approaches an efficient estimator in the traditional sense. In addition, stochastic complexity also sets the greatest lower bound with which the data can be predicted with the considered class of models, and we may say that its calculation and the search for a class of models giving a small stochastic complexity are the two fundamental problems in statistics.
It may be worthwhile to elaborate our view on modeling a bit further. Quite often the observed sequence is regarded as a random sample, and it is thought to consist of the important information bearing part and of the random noise that is clearly a nuisance to be gotten rid of. Hence, one may think that it is necessary to extract somehow the "useful" signal so that we then can fit our models to it. Such a prefiltering, however, hides a dangerous prejudice, for strictly speaking the observations never include any noise. They are just numbers, and the only way to separate a portion of them and to call it noise is to use models. Hence, noise is something we define it to be, namely, the difference between a modeled signal and the observed numbers, rather than something imposed by nature. The fact is that nature produces the observations, and the rest is man made - to paraphrase the famous saying of Kronecker. To take a simple example, we may model the observed input u and the observed output y as being related as follows,

$$x_t = f(u_t), \qquad y_t = x_t + e_t,$$
where $x_t$ is considered the "useful" signal and $e_t$ represents the "noise". Evidently, for a given pair of the observations this decomposition depends completely on the modeled function f. We may, of course, impose a condition on the non-observed $e_t$, such as that its variance is a prescribed number. The effect is a restriction on the functions f that satisfy the extra requirement, which is perfectly in order. A superficial thinking might lead one to the idea that the MDL principle, which has no prefilters for noise, forces us to fit models to noise. Such thinking is evidently contradictory, because as we just saw, noise itself is a result of the modeling process. The MDL principle fits models to data, and we can actually see directly how it automatically avoids inserting parameters to capture "noise": If, indeed, a certain portion of the observations consists of random fluctuations, such as $e_t$ in the previous example, then no modeling can shorten their description; i.e., "compress" them. Suppose that, say, two parameters in the modeled function f are sufficient to compress the "useful" part in the observations and that we try to add parameters to compress the fluctuations. Since these cannot be compressed by any means whatsoever, the extra parameters do not "buy" any compression while their own description costs bits. Hence, the MDL principle will remove such parameters, and what remains are only the effective ones, which is just what is needed. The random fluctuations just "pass through" the model unchanged, and they do no harm. As a matter of fact, to push this point to its extreme, you can add pure random noise to the observations without much effect on the optimal MDL model!
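The claim that random fluctuations cannot be compressed can be checked informally with a general-purpose compressor. The following is only an illustrative sketch and not part of the MDL formalism: Python's `zlib` stands in for "a code", and the two byte strings `regular` and `noisy` are made-up examples (a repeated pattern and seeded pseudo-random bytes).

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed representation of data."""
    return len(zlib.compress(data, 9))


n = 4096
# A highly regular string: the same 16-byte pattern repeated.
regular = b"0123456789abcdef" * (n // 16)
# A pseudo-random string (fixed seed so that the run is repeatable).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(n))

regular_size = compressed_size(regular)
noisy_size = compressed_size(noisy)

print("regular:", len(regular), "->", regular_size, "bytes")
print("noisy:  ", len(noisy), "->", noisy_size, "bytes")
```

The regular string shrinks to a small fraction of its length, while the pseudo-random string does not shrink at all: spending more "parameters" on it buys no compression.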
A deeper issue involves the question of how to measure the amount of "information", that we call complexity, in the data. Fisher's famous idea was to measure this information content by the determinant of his information matrix, which by the Cramer-Rao inequality represents the smallest variance of the parameter estimates. Hence, intuitively, if an estimator does achieve this lower bound, then it must be the case that the process has extracted all the useful information in the data. This is a curious, roundabout procedure, and it works just because the considered parametric likelihood function is restricted to have a fixed number of parameters. When we consider the larger classes of models where the number of parameters is not fixed, as we must in order to get better models, then the variance of the parameter estimates ceases to be a meaningful measure of the information content in the data, which is why we define the stochastic complexity directly in terms of the data, "noise" and all. In a sense, then, Fisher's idea is brilliant but the concept is far too restricted to do what was intended. The information matrix still, of course, is an important quantity, but not as a measure of the useful information in the data.
As a further point, in our view in the absence of prior knowledge we must take every observed sequence to be "typical" in that it is representative of the underlying mechanism. After all, it is all we have, and we have no right to claim otherwise. Hence, we should fit models to these data and not to what we might imagine the data should be. Only if we have a rather firm idea of the probabilistic model of a source, such as the gambling machines, obtained on some prior grounds can we claim that a certain odd observation sequence might not be typical. As a final point, it is often felt that a model is good if its parameters are such that repeated estimates are close to each other; in other words, their estimated variance is small. This is the thinking of confidence intervals and such. Well, let me define a one-parameter model, where the parameter has the value 1.2 no matter what the data are. Clearly, you cannot have a smaller variance, but such a model is probably worthless. In fact, it may be a very dangerous prejudice to isolate a quantity and call it a parameter. An example is the blood pressure, which has been regarded as an important "parameter", carrying information about the health of a human body. It has been found relatively recently that it should be regarded as a variable, because it fluctuates considerably in perfectly healthy people, and to diagnose illness, because a measurement happens to deviate from what has been thought to be its normal value, has led to needless and even dangerous medications.
In conclusion, the moral of all this discussion is that it is important to understand the nature of statistical reasoning and modeling in order to be able to avoid the many pitfalls that lurk along the way, the most important of which are the arbitrary and unjustified choices that have a tendency of creeping in even if we are on our guard. The only reality consists of the data; the rest are models and other theories, which well may be gray - as Goethe claimed - but which we ought to select as intelligently as we can.
The stochastic complexity clearly has its roots in the algorithmic notion of information, Solomonoff (1964), Kolmogorov (1965), and Chaitin (1975), which defines the complexity of a binary string to be the length of the shortest program needed to generate it in a universal computer. However, in order to make the principle practicable we must not select the class of models too rich - certainly not to include all computable functions - because then the complexity can neither be computed nor estimated by any algorithm.
2. Coding and Prediction
Our modeling principle is founded on the issues of how to describe or encode data efficiently; that is, with short code length. Although we do not really need any details of such codes, it nevertheless is useful to have an idea of the relevant issues in coding, above all, the code length. This may also add perspective to those interested in prediction, for the reason that it turns out to be a special case of coding. We begin with the traditional coding problem involving one probability distribution, and then we discuss the newer more general situation involving a family of them. We consider the observed data to be a string of symbols $y = y_1, \ldots, y_n$, each symbol, for instance, being a binary number written with some finite precision. We do not need to specify this precision, and, in fact, the reader may think of these observations to be just numbers as usual. More generally, the data string may consist of pairs $(u_t, y_t)$, where the first component is an observed input and the second the observed output response. Nothing substantially different arises from this generalization; instead of a distribution for the outputs we simply consider a conditional distribution for the outputs given the inputs. A code C, then, is a one-to-one function taking each string y of every length n to a binary string C(y). Moreover, the code length L(y), defined to be the number of binary digits in C(y), is required to satisfy the so-called Kraft inequality, Abramson (1968),

$$\sum_{y \in Y^n} 2^{-L(y)} \le 1, \tag{2.1}$$
for all n, where $Y^n$ denotes the set of all strings of length n. This inequality is intimately connected with a desirable property of the code, known as the prefix property, which means that no code string C(y) is a prefix of another C(y'), where y and y' are two distinct strings of the same length. If we place the code strings C(y), y running through the set of all strings, in a binary tree (in an obvious fashion), then each code string appears as a leaf having no successor nodes. But then, given such a tree, we can tell which initial portion in any string of binary symbols defines a valid code string. In other words, we can tell without a comma when we have reached the end of a code string. It is not an accident that with such a self containing description of data the code length defines a distribution $Q(y) = 2^{-L(y)}$ in $Y^n$ if we just add the requirement that the code be efficient in the sense that the code strings have no superfluous digits, which turns (2.1) into an equality.
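The connection between the Kraft inequality, the prefix property, and decoding "without a comma" can be made concrete with a toy code. This is a minimal sketch under stated assumptions: the four-word code `code` and the helper names `kraft_sum`, `is_prefix_free`, and `decode` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def kraft_sum(codewords):
    """Sum of 2^(-length) over the code words, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(w)) for w in codewords)


def is_prefix_free(codewords):
    """True if no code word is a prefix of a different code word."""
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(codewords, 2))


def decode(bits, code):
    """Split a concatenation of code words that has no separators.

    This greedy scan is only valid for codes with the prefix property.
    """
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:      # reached a leaf of the code tree
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical code for four strings; the words are leaves of a binary tree.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
words = list(code.values())

print(kraft_sum(words))             # equality in (2.1): no superfluous digits
print(is_prefix_free(words))
print(decode("0101100111", code))
```

Because the Kraft sum equals one here, the numbers $2^{-L(y)}$ (1/2, 1/4, 1/8, 1/8) form a probability distribution over the four strings, as the text states.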
Suppose the strings in $Y^n$ have a probability distribution P(y) assigned to them. Then, for any code with the length satisfying (2.1), we get by Jensen's inequality, stating that for a concave function f(x), $E f(x) \le f(E x)$,

$$-E\,L(y) - E \log P(y) = E \log \frac{2^{-L(y)}}{P(y)} \le \log \sum_{y \in Y^n} P(y)\,\frac{2^{-L(y)}}{P(y)} = \log \sum_{y \in Y^n} 2^{-L(y)} \le 0,$$
where the equality holds if and only if $2^{-L(y)} = P(y)$ for all y. In other words, the mean code length satisfies the inequality due to Shannon: $E\,L(y) \ge H(n)$, where $H(n) = -\sum_{y \in Y^n} P(y) \log P(y)$ denotes the entropy of the strings of length n. This means that the ideal way to encode the strings relative to the given distribution is to assign to string y a code string with length $-\log P(y)$. This, of course, cannot always be done exactly because a code string must have an integer length, but at least we know what we should be striving for, and we call it the ideal code length. Another good name would be Shannon complexity of y relative to the given distribution.
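A small numerical check of Shannon's inequality may help. The sketch below assumes two made-up distributions `p` and `p2` over four strings and logarithms to base 2; the helper names are illustrative only. It compares the mean code length of the ideal lengths $-\log P(y)$ with that of another code satisfying (2.1), and shows what happens when non-integer ideal lengths are rounded up.

```python
import math


def entropy(p):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def mean_length(p, lengths):
    """Mean code length, sum over y of P(y) L(y)."""
    return sum(pi * li for pi, li in zip(p, lengths))


# Hypothetical distribution whose ideal code lengths are integers.
p = [0.5, 0.25, 0.125, 0.125]
ideal = [-math.log2(pi) for pi in p]   # ideal code lengths -log P(y)
mismatched = [2, 2, 2, 2]              # another code satisfying (2.1)

H = entropy(p)
print(H, mean_length(p, ideal), mean_length(p, mismatched))

# Hypothetical distribution whose ideal lengths are not integers:
# rounding up still satisfies (2.1) and costs less than one extra bit.
p2 = [0.4, 0.3, 0.2, 0.1]
rounded = [math.ceil(-math.log2(pi)) for pi in p2]
kraft = sum(2.0 ** -l for l in rounded)
print(kraft, entropy(p2), mean_length(p2, rounded))
```

The mean length of the ideal code equals the entropy, any other admissible code is at least as long on average, and rounding the ideal lengths up to integers stays within one bit of the entropy.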
A direct application of these ideas to compressing strings confronts us with the same problem as met in traditional statistics: The distribution P(y) is not known to us, and it either has to be imagined or, better, estimated. For this reason we consider a parametric family $\{P_\theta(y)\}$ of such distributions or models, where $\theta = (\theta_1, \ldots, \theta_k)$, and k ranges over the set of all natural numbers. How now to calculate the ideal code length is the central problem in the MDL principle to be discussed next.
There are two basic ways to go about encoding a string of data. In the first way we read the entire string and we somehow form the best estimate $\hat\theta(y)$ of the parameter vector $\theta$. Then we design a code C such that the length of the code string C(y) is close to the ideal $-\log P_{\hat\theta(y)}(y)$. We need not concern ourselves with the details of how such a code can be designed, which is just a routine matter. The important thing to realize is that the data y can be decoded from the code string C(y) only if the decoder also knows the estimated parameter vector $\hat\theta(y)$. This has to be given in an explicitly coded form, because the decoder at the time it is needed does not yet know y and, hence, cannot calculate the estimate by any conceivable algorithm. The binary code string for the parameter vector, which may be placed as a preamble in front of C(y), must clearly be a prefix code, for otherwise the decoder would not be able to separate it from the subsequent binary code of the data. Hence, its length $L(\theta)$ must satisfy the Kraft inequality, $\sum_\theta 2^{-L(\theta)} \le 1$, where $\theta$ runs through all its possible values. These values are clearly truncations (think of computing the maximum likelihood estimates, which surely result in truncated numbers). If we carry too many fractional digits, the required code will have to be long, while if we truncate too heavily, the results will deviate too much from the optimum, and we end up coding the string with non-optimal parameters. It turns out that when each component is truncated to its optimal precision, reflecting its importance to the entire code length, the code length for the k-component parameter vector and the loss due to truncation is $\frac{k}{2} \log n$ bits, Rissanen (1978). In addition, the decoder will have to be given the number of the components k in the estimated parameter vector as another prefix coded preamble, which takes a little more than log k bits. This number, of course, is almost always quite negligible in comparison with the other length, and we drop it. All told, the best ideal code length with this type of "nonpredictive" coding is to within terms of order log n given by

$$I_{NP}(y) = \min_{k,\theta} \Big\{ -\log P_\theta(y) + \frac{k}{2} \log n \Big\}. \tag{2.2}$$
The same expression but with different content and scope was also derived by restrictive Bayesian assumptions in Schwarz (1978). We also refer to the pioneering work of Akaike (1974) for another criterion, where the weaker model complexity penalizing term k gets added to the first, the negative logarithm of the likelihood term. In contrast with (2.2) such a term is too weak to produce consistent estimates of the number of parameters in all the analyzed cases, Hannan (1980) and Shibata (1976). Finally, we add that when the parameter coding job is done more carefully, Rissanen (1983a), a third term is required, namely, $k \log \|\theta\|_{M(\theta)}$, where $M(\theta)$ denotes the Hessian matrix of $-\log P_\theta(y)$. This term turns out to be sensitive to the structure in which the parameters of a multivariable dynamic system are represented, Rissanen (1983b); see also Section 4.
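To see criterion (2.2) at work, the following sketch selects the order of a binary Markov chain. Everything specific here is an assumption made for illustration: the model class (Markov chains of order 0 to 4 with $k = 2^{\text{order}}$ free transition probabilities), the simulated first-order data, the seed, and the function name `markov_code_length`. The third term mentioned above is ignored, and all orders encode the same symbols (those after the first `max_order`) so that the lengths are comparable.

```python
import math
import random
from collections import Counter


def markov_code_length(y, order, start):
    """Criterion (2.2) in bits for a binary Markov chain of the given order.

    The symbols y[start:] are encoded given the preceding ones. The maximum
    likelihood transition probabilities play the role of the estimated
    parameter vector, and k = 2**order is the number of free parameters.
    """
    n = len(y) - start
    counts = Counter()   # (context, symbol) -> number of occurrences
    totals = Counter()   # context -> number of occurrences
    for t in range(start, len(y)):
        ctx = tuple(y[t - order:t])
        counts[ctx, y[t]] += 1
        totals[ctx] += 1
    neg_log_lik = -sum(c * math.log2(c / totals[ctx])
                       for (ctx, _), c in counts.items())
    k = 2 ** order
    return neg_log_lik + 0.5 * k * math.log2(n)


# Simulated data from a first-order chain (assumed for the illustration).
rng = random.Random(1)
p_one = {0: 0.9, 1: 0.2}   # P(next symbol = 1 | previous symbol)
y = [0]
for _ in range(1999):
    y.append(1 if rng.random() < p_one[y[-1]] else 0)

max_order = 4
lengths = {m: markov_code_length(y, m, max_order) for m in range(max_order + 1)}
best = min(lengths, key=lengths.get)

for m, length in lengths.items():
    print(m, round(length, 1))
print("selected order:", best)
```

Orders above the one that generated the data lower the first term only slightly, while the term $\frac{k}{2}\log n$ doubles with each added order, so the minimum is not reached by simply enlarging the model.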
The other way of coding data strings requires no explicit code for the parameters, because the coding will be done in a "predictive" way. What this means is that from each portion $y^t = y_1, \ldots, y_t$ of the data string we form an estimate of the distribution $P_\theta(y_{t+1} \mid y^t)$ for the possible values of the next symbol, where $\theta$ is to be replaced by an estimated value $\hat\theta(t) = \hat\theta(y^t)$, calculated by an algorithm from the so-far processed string. The decoder knows this algorithm, and he can also calculate the same estimate provided that it indeed does not depend on the future and not yet decoded data points. We know from Shannon's result, derived above, that the best way to do the coding is to assign to the next symbol the code length $-\log P_{\hat\theta(t)}(y_{t+1} \mid y^t)$, and hence the best total ideal code length with this type of predictive coding is

$$I_P(y) = \min_k \Big\{ -\sum_{t=0}^{n-1} \log P_{\hat\theta(t)}(y_{t+1} \mid y^t) \Big\}. \tag{2.3}$$
We should also have included the code length, log k, required to describe the number of the components in the estimated parameter vectors, but as above this term is negligible. How should we pick the estimates $\hat\theta(t)$? It seems to make eminent sense to pick them in such a way that the accumulated past code lengths

$$-\log P_\theta(y^t) = -\sum_{i=0}^{t-1} \log P_\theta(y_{i+1} \mid y^i), \tag{2.4}$$

are minimized, which is seen to be done by the maximum likelihood estimates of the parameters for each value of k. This represents a most attractive principle of inductive inference: Make that choice that has worked best in the past. And who can argue against that, provided that we have no other "prior" knowledge about the behavior of the data! This philosophy was also discovered independently by Dawid (1984) in his "prequential" approach to estimation. A somewhat similar and yet crucially different "cross-validation" principle has been studied in Stone (1977). Because no "honesty" of the predictions is required, the associated criterion is asymptotically equivalent with Akaike's AIC, and hence the resulting estimates of the number of parameters are not consistent.
In order to avoid ill-conditioned optimization problems, we in (2.4) never estimate more parameters than data points; that is, we begin with k = 0 and increase k gradually to each final value with which the criterion in (2.3) is evaluated. The case with no free parameters means that we need an initial distribution $P(y_1)$ to predict or encode the very first observation. This could be done by having a fixed parameter value $\theta(0)$, obtained somehow on prior grounds, which singles out a distribution from the family. We discuss later how such a prior knowledge can be taken advantage of in modeling and prediction.
The predictive coding process is seen to be very similar to prediction: In both cases we try to unravel the uncertainty about the "next" observation $y_{t+1}$ by acting on the past data only. In fact, the two processes are equivalent. To see this, let $\delta(y_{t+1} - \hat y(t+1 \mid t))$ be any reasonable prediction error measure, where $\hat y(t+1 \mid t)$ is some prediction of the next observation, involving parameters to be estimated from the past data. Define a conditional density $f_\theta(y_{t+1} \mid y^t)$ proportional to $e^{-\delta(y_{t+1} - \hat y(t+1 \mid t))}$ and we get a family of parametric probabilistic models, where the code length, apart from an irrelevant term due to truncation and proportional to n, is the sum of the prediction errors. A particularly important special case results from the quadratic prediction error measure, because then the predictive MDL principle reduces to a predictive least squares (LS) principle. We discuss its application to ARMA estimation in the next section. Because the non-predictive coding process cannot be interpreted as prediction, we conclude that coding is a strictly more general process than prediction.
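The predictive least squares principle can be sketched for a simple family of predictors. The example below is an illustration under assumptions that are not in the text: the model class is pure autoregression of order k = 0, ..., 3, the data are simulated from a first-order recursion, the helper names `solve` and `predictive_ls` are hypothetical, and, as a simplification of the gradual increase of k described above, all orders start predicting at the same time `start` so that each has more past data points than parameters.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(m[r][i]))
        m[i], m[piv] = m[piv], m[i]
        for r in range(i + 1, k):
            f = m[r][i] / m[i][i]
            for c in range(i, k + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][c] * x[c] for c in range(i + 1, k))) / m[i][i]
    return x


def predictive_ls(y, k, start):
    """Accumulated squared one-step prediction errors of an order-k predictor.

    Before predicting y[t], the coefficients are re-estimated by least
    squares from y[0..t-1] only, so no future data are used.
    """
    total = 0.0
    for t in range(start, len(y)):
        if k == 0:
            pred = 0.0   # no free parameters: a fixed prediction
        else:
            # Normal equations for the past regressions y[i] ~ y[i-1..i-k].
            a = [[0.0] * k for _ in range(k)]
            b = [0.0] * k
            for i in range(k, t):
                x = [y[i - j] for j in range(1, k + 1)]
                for r in range(k):
                    b[r] += x[r] * y[i]
                    for c in range(k):
                        a[r][c] += x[r] * x[c]
            coef = solve(a, b)
            pred = sum(coef[j] * y[t - 1 - j] for j in range(k))
        total += (y[t] - pred) ** 2
    return total


# Simulated data (assumed): y[t] = 0.8 y[t-1] + e(t), e(t) standard gaussian.
rng = random.Random(0)
y = [0.0]
for _ in range(299):
    y.append(0.8 * y[-1] + rng.gauss(0.0, 1.0))

pls = {k: predictive_ls(y, k, start=10) for k in range(4)}
best = min(pls, key=pls.get)
for k, value in pls.items():
    print(k, round(value, 1))
print("selected order:", best)
```

No penalty term appears anywhere: superfluous parameters are punished only through the worse "honest" predictions they produce while they are still poorly estimated.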
We conclude this section by stating that the two described coding lengths are asymptotically optimal in the sense that their mean, relative to any process in the considered class of "smooth" models, is shortest among all codes satisfying (2.1). Because the variance of these lengths, computed per observation, behaves like 1/n, we may take these lengths themselves to represent well the shortest possible per symbol code lengths (prediction errors), and we call $I_{NP}(y)$ and $I_P(y)$ the non-predictive and predictive stochastic complexities, respectively, of the string y, relative to the considered class of models. This result not only generalizes the above mentioned Shannon theorem, giving a tight lower bound for the code length and the prediction errors, but it also serves a similar role as the Cramer-Rao inequality for estimators, except that we may assess the goodness of any estimators, including the number of parameters. The name "complexity" seems apt in view of the fact that it represents the ultimate limit to which the three fundamental tasks, prediction, estimation, and coding, can be performed.
3. ARMA Estimation and Prediction
As we outlined in the preceding section, estimation and prediction are intertwined: you cannot predict optimally without performing estimation optimally. Here we mean the real prediction problem where we are given an observed sequence of numbers, $y(1), \ldots, y(n)$, one by one, and we are asked to predict for each $n$ the next value. This is to be done without knowing the probabilistic source of the numbers, which is usually taken as known in prediction theory. Our approach is to select a class of models, or perhaps several classes, and fit a model in each class with the predictive LS principle. The prediction will be done with the best model at each instant of time, and if the past is any guidance to the future this strategy will provide the best predictions obtainable with the selected class. We shall choose the model class as the gaussian ARMA class, which means that we shall have to know how prediction is done optimally for such processes. The Kalman theory in principle is applicable, but the solution it provides involves parameters that cannot be estimated from the observations. For this reason we shall use another approach, Rissanen (1967), and we give the relevant recurrence equations below.
Consider a process generated by the recursion

$$y(t) + a_1 y(t-1) + \cdots + a_p y(t-p) = e(t) + c_1 e(t-1) + \cdots + c_q e(t-q), \tag{3.1}$$

for $t > p$, where $e$ is an orthogonal zero-mean process with variance $E(e(t)^2) = \sigma^2$. Letting $u(t)$ for $t \geq p$ stand for the MA process

$$u(t) = e(t) + c_1 e(t-1) + \cdots + c_q e(t-q), \tag{3.2}$$

we see that the covariance $E(u(t)u(s)) = r(t,s)$, $t,s \geq p$, satisfies the crucial "bandedness" property

$$r(t,s) = 0 \quad \text{for } |t - s| > q. \tag{3.3}$$
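The bandedness property can be checked numerically for a stationary MA process. The sketch below is only an illustration: NumPy is assumed to be available, the function name `ma_autocovariance` is ours, and the covariance is computed for the stationary case, where $r(t,s)$ depends on $|t - s|$ alone.

```python
import numpy as np

def ma_autocovariance(c, sigma2=1.0, max_lag=None):
    """Autocovariance r(k), k = 0..max_lag, of u(t) = e(t) + c[0] e(t-1) + ... + c[q-1] e(t-q)."""
    coeffs = np.concatenate(([1.0], np.asarray(c, dtype=float)))  # leading coefficient is 1
    q = len(coeffs) - 1
    if max_lag is None:
        max_lag = q + 2
    r = np.zeros(max_lag + 1)
    for k in range(min(q, max_lag) + 1):
        # r(k) = sigma^2 * sum_j c_j c_{j+k}; lags beyond q stay at zero
        r[k] = sigma2 * np.dot(coeffs[: q + 1 - k], coeffs[k:])
    return r

# MA(2) example with hypothetical coefficients c_1 = 0.5, c_2 = -0.2
print(ma_autocovariance([0.5, -0.2]))
```

For this MA(2) example the covariances at lags 3 and 4 vanish, which is the band structure that the Cholesky recursions exploit.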
We let the initial variables be specified by the covariances as follows:

$$E(u(t)y(s)) = r(t,s) = 0 \quad \text{if } t - s > q, \qquad E(y(t)y(s)) = r(t,s), \quad t,s < p. \tag{3.4}$$
The problem is to find the orthogonal projection of $y(t)$ on $Y_0^{t-1}$, the subspace spanned by the observations up to $t-1$, written as $\hat{y}(t|t-1)$. The task is simple if we find a representation of the process $u$ as follows:

$$u(t) = e(t) + c_1(t) e(t-1) + \cdots + c_q(t) e(t-q), \quad t > q, \tag{3.5}$$
where $e(t)$ is an uncorrelated (but not of unit variance) process; the variables for non-positive indices are zero. The coefficients are found by the Cholesky factorization of the covariance matrix $R = \{r(i,j)\}$ as $R = B'B$, where $B$ is upper triangular,

$$B = \begin{pmatrix} b(0,0) & b(0,1) & \cdots & b(0,n) \\ 0 & b(1,1) & \cdots & b(1,n) \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & b(n,n) \end{pmatrix}.$$
Specifically, the factors are defined by the following recursions, which also are seen to result from the Gram-Schmidt orthogonalization procedure:

$$\begin{aligned}
b(t-i,t) &= \Big[ r(t-i,t) - \sum_{j=i+1}^{q} b(t-j,t)\, b(t-j,t-i) \Big]\, b^{-1}(t-i,t-i), \quad 1 \leq i \leq q, \\
b(t,t) &= +\big[ r(t,t) - b^2(t-q,t) - \cdots - b^2(t-1,t) \big]^{1/2}, \quad t > 0, \\
b(0,0) &= +\sqrt{r(0,0)}, \qquad b(t-i,t) = 0, \quad i > t.
\end{aligned} \tag{3.6}$$
We then have

$$c_i(t) = b(t-i,t)\, b^{-1}(t-i,t-i), \quad i = 1, \ldots, q. \tag{3.7}$$
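The recursions (3.6) and (3.7) can be sketched as follows. This is a minimal illustration under simplifying assumptions of our own, not the full procedure of the text: the process $u$ is taken to be stationary and to start at $t = 0$, so that $R$ is a banded Toeplitz matrix with entries $r(|t - s|)$ and the initial-condition block (3.4) is left out. NumPy is assumed, and the names `banded_cholesky` and `ma_coefficients` are hypothetical.

```python
import numpy as np

def banded_cholesky(r, n):
    """Upper-triangular B with R = B.T @ B, where R[s, t] = r[|t - s|] for |t - s| <= q.

    r holds r(0), ..., r(q); covariances beyond lag q are zero (bandedness (3.3)).
    """
    q = len(r) - 1
    B = np.zeros((n + 1, n + 1))
    for t in range(n + 1):
        band = min(q, t)  # entries b(t-i, t) with i > t are zero
        # off-diagonal entries b(t-i, t), computed for i = band, ..., 1
        for i in range(band, 0, -1):
            s = sum(B[t - j, t] * B[t - j, t - i] for j in range(i + 1, band + 1))
            B[t - i, t] = (r[i] - s) / B[t - i, t - i]
        # diagonal entry b(t, t)
        B[t, t] = np.sqrt(r[0] - sum(B[t - j, t] ** 2 for j in range(1, band + 1)))
    return B

def ma_coefficients(B, t, q):
    """Time-varying coefficients c_i(t) = b(t-i, t) / b(t-i, t-i), i = 1..q; requires t >= q."""
    return np.array([B[t - i, t] / B[t - i, t - i] for i in range(1, q + 1)])

# MA(1) with c = 0.5 and unit innovation variance: r(0) = 1.25, r(1) = 0.5
B = banded_cholesky([1.25, 0.5], n=30)
print(ma_coefficients(B, 30, 1))
```

In this MA(1) example the root of the polynomial lies outside the unit circle, and the computed $c_1(t)$ settles at the generating coefficient as $t$ grows.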
Since the $e$- and the $y$-processes span one and the same linear space for all $n$, we easily get the desired recursive equations for the optimal predictor

$$\hat{y}(t|t-1) + \sum_{i=1}^{q} c_i(t)\, \hat{y}(t-i|t-i-1) = \sum_{i=1}^{k} d_i(t)\, y(t-i), \tag{3.8}$$

where $d_i(t) = c_i(t) - a_i$, $i = 1, \ldots, k$ for $k = \max\{p,q\}$, and the coefficients with undefined index values are zero.
One can show that if the polynomial defined by the coefficients $c_i$ has its roots strictly outside the unit circle, then $c_i(t) \to c_i$. The limiting predictor, then, agrees with the stationary optimal predictor

$$\hat{y}(t|t-1) + \sum_{i=1}^{q} c_i\, \hat{y}(t-i|t-i-1) = \sum_{i=1}^{k} (c_i - a_i)\, y(t-i). \tag{3.9}$$
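The stationary predictor (3.9) is a simple recursion once the coefficients are known. The following sketch assumes NumPy, uses the hypothetical name `arma_one_step_predictions`, and takes all values before the first observation to be zero, which is our own choice of initial conditions.

```python
import numpy as np

def arma_one_step_predictions(y, a, c):
    """Return yhat with yhat[t] = prediction of y[t] from y[0..t-1], following (3.9).

    Model convention as in (3.1): y(t) + sum a_i y(t-i) = e(t) + sum c_i e(t-i).
    """
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    k = max(len(a), len(c))
    a_pad = np.concatenate((a, np.zeros(k - len(a))))  # undefined coefficients are zero
    c_pad = np.concatenate((c, np.zeros(k - len(c))))
    d = c_pad - a_pad                                  # d_i = c_i - a_i
    yhat = np.zeros(len(y))
    for t in range(len(y)):
        acc = 0.0
        for i in range(1, k + 1):
            if t - i >= 0:
                acc += d[i - 1] * y[t - i] - c_pad[i - 1] * yhat[t - i]
        yhat[t] = acc
    return yhat

# Usage: prediction errors for a short record and hypothetical parameters a = 0.5, c = -0.3
y = np.array([1.0, -0.4, 0.3, 0.8])
print(y - arma_one_step_predictions(y, [0.5], [-0.3]))
```

When the data are generated by the same model from zero initial conditions, the prediction errors returned by this recursion reproduce the driving sequence $e(t)$.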
We now return to the main problem of how to do the prediction when the coefficients and the two order numbers $p$ and $q$ are not known. We apply the predictive LS principle and proceed as follows. For each pair $(p,q)$ and each $t$ we solve the following ordinary least squares problem

$$\min_{\theta} \sum_{i=1}^{t} e^2(i), \tag{3.10}$$
where $\theta$ denotes the vector of the coefficients $\alpha = (a_1, \ldots, a_p, c_1, \ldots, c_q)$ together with the $p(p+1)/2 + q(q+1)/2$ initial elements $r(i,j)$ in (3.6), defining a vector $\beta$, and $e(i)$ may be solved recursively from (3.1), (3.2), and (3.5). Let the minimizing parameters be $\hat{\theta}(t) = (\hat{\alpha}(t), \hat{\beta}(t))$. With these we now extend the Cholesky factorization one more step; i.e., we compute the coefficients (3.6) for $t + 1$, and with (3.7) we calculate the new prediction $\hat{y}(t+1|t)$ from the formula (3.8), which clearly depends only on the past data and the pair $(p,q)$, because the calculation of $\hat{\theta}(t)$ is done by the fixed ordinary least squares algorithm. This gives the prediction error $\hat{e}(t+1|p,q) = y(t+1) - \hat{y}(t+1|t)$. As the final step we find the best pair $(\hat{p}(n), \hat{q}(n))$ which solves the optimization problem

$$I_P(y) = \min_{p,q} \sum_{t=0}^{n-1} \hat{e}^2(t+1|p,q). \tag{3.11}$$
It can be shown that asymptotically

$$\frac{1}{n} I_P(y) \cong \hat{\sigma}^2 \Big( 1 + \frac{p+q}{n} \ln n \Big), \tag{3.12}$$

where

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{t=1}^{n} e^2(t). \tag{3.13}$$
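The selection rule (3.11) can be sketched for the pure AR case, where every fit is a linear least squares problem. This is a minimal illustration under our own assumptions and not the procedure of the text: NumPy is assumed, the names `accumulated_prediction_errors` and `select_ar_order` and the `burn_in` rule are hypothetical, and the sum starts only after `burn_in` observations rather than at $t = 0$, so that all orders are scored on the same predictions. The fitted regression coefficients correspond to $-a_i$ in the sign convention of (3.1).

```python
import numpy as np

def accumulated_prediction_errors(y, order, burn_in=None):
    """Sum of squared honest one-step errors of an AR(order) model refitted at every step."""
    y = np.asarray(y, dtype=float)
    if burn_in is None:
        burn_in = 2 * order + 1  # hypothetical choice: enough data for a determined fit
    total = 0.0
    for t in range(burn_in, len(y) - 1):
        # regress y[s] on (y[s-1], ..., y[s-order]) for s = order, ..., t (past data only)
        X = np.column_stack([y[order - i : t + 1 - i] for i in range(1, order + 1)])
        coef, *_ = np.linalg.lstsq(X, y[order : t + 1], rcond=None)
        prediction = np.dot(coef, y[t : t - order : -1])  # regressors y[t], ..., y[t-order+1]
        total += (y[t + 1] - prediction) ** 2
    return total

def select_ar_order(y, max_order):
    """Return the order minimizing the accumulated prediction errors, and all scores."""
    burn_in = 2 * max_order + 1  # common starting point for every candidate order
    scores = {p: accumulated_prediction_errors(y, p, burn_in) for p in range(1, max_order + 1)}
    return min(scores, key=scores.get), scores

# Usage on a simulated AR(2) record with hypothetical coefficients
rng = np.random.default_rng(1)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 1.2 * y[t - 1] - 0.9 * y[t - 2] + rng.standard_normal()
print(select_ar_order(y, 3))
```

Refitting from scratch at every $t$ costs an order of $n^2$ operations; it is used here only because it makes the "past data only" character of each prediction explicit.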
Remarks.
In the above described procedure we did not pay any attention to the amount of computations needed. Rather, our aim was to do the prediction as well as we know how, provided, though, that there is no prior knowledge about the parameter values. Clearly, when calculating $\hat{\theta}(t+1)$ by a suitable hill climbing routine, we should use $\hat{\theta}(t)$ as the initial estimate. It is also possible to calculate the Cholesky factorization by an order of magnitude faster algorithm, Rissanen (1973), in case the covariance matrix $R(t)$ is a Toeplitz matrix; i.e., if the process $u$ is stationary, and if we set the initial conditions to zero. Alternatively, it is enough to have the initial conditions such that the $y$-process is stationary. The resulting fast predictor recursions have been described by Lindquist (1974), Kailath, Morf, and Sidhu (1974), and by Rissanen (1975), after this author lectured the topic at Stanford University during 1971-1972. Much earlier, the impulse response of a stationary predictor was found with a fast algorithm by Levinson, but that algorithm required an ever growing memory.
The entire Cholesky factorization can be avoided if we ignore the influence of the initial conditions and simply replace the representation (3.5) by (3.2). The only problem remaining then is to compute the sequence of estimates $\hat{\alpha}(t)$, $t = 0, \ldots, n-1$, for different values of $p$ and $q$. In the case of AR processes such calculations can be done recursively by the so-called ladder forms; see Wax (1985).
As a final remark, the difference between the complexity and the sum of the squared residuals (3.13) was observed in Bittanti (1983), where it was wondered whether the relationship between the two could be clarified. Well, (3.12) does it in a most decisive manner.
We computed in Rissanen (1984c) a small simulation to test the predictive least squares principle for estimating an ARMA order. We used the stationary equations, which do not require the Cholesky factorization. The data were generated with an ARMA(1,1) system with parameters a = .5 and c = -.3, where e(t) was a computer generated zero mean unit variance independent gaussian sequence. We fitted models of type ARMA(p,q) with (p,q) = (1,0), (2,0), (1,1), and (2,2). The following table gives the sum in (3.12), calculated for various values of p,q and divided by n, along a single sample of size 600.
(p,q)    n = 50    n = 100    n = 200    n = 300    n = 600
-----------------------------------------------------------
(1,0)    1.336     1.276      1.101      1.107      1.015
(2,0)    1.629     1.385      1.156      1.120      -
(1,1)    1.505     1.307      1.117      1.091      0.996
(2,2)    1.925     1.520      1.221      1.159      -

Table 1. Simulations of ARMA processes
We see that the models (2,0) and (2,2) give uniformly worse values than the two best models (1,0) and (1,1) in the table for all sample sizes (we did not calculate the last entry for them, which surely would have been worse, too). In the last model, in particular, the two extra parameters penalize heavily the prediction errors. For sample sizes up to 200 the simpler model (1,0) performs best, but eventually the model with the right numbers of parameters (1,1) is the winner. This makes sense in that there is no predictive benefit in estimating the second less significant parameter until there is enough data, even if we knew that such a parameter existed; the data are the ultimate arbiter in deciding what is optimal and what is not.
We then wanted to study how initial estimates of the parameters might be taken advantage of to improve the parameter estimates and the predictions. After all, in our opinion, the most natural and easy way to incorporate initial knowledge is directly in terms of the estimate of the parameters, including their numbers. Indeed, the parameters usually represent constants, and any Bayesian type of prior distribution for them is both awkward to justify and just about impossible to estimate in a meaningful way. The traditional Bayesian formalism does not permit a representation of initial knowledge in terms of a parameter value, because the so defined singular distribution cannot be altered by the data. However, our formalism does it easily. In fact, let $\hat{\theta}(0)$ denote the initial estimate with $p + q$ components. Then with $\hat{\theta}(t)$ denoting the predictive LS estimate from the first $t$ observations with initial knowledge ignored, as described above, define a new estimate as a linear combination of the two

$$\tilde{\theta}(t) = \alpha_t \hat{\theta}(0) + (1 - \alpha_t) \hat{\theta}(t). \tag{3.14}$$
The coefficient is defined as follows

$$\alpha_t = \frac{1}{1 + 2^{L_0(p,q,t) - L(p,q,t)}}, \tag{3.15}$$
where $L(p,q,t)$ denotes the accumulated prediction errors, (3.11) before the minimization, and $L_0(p,q,t)$ is the same when the parameter is the initial estimate $\hat{\theta}(0)$. Because this parameter is the same throughout the data, $L_0(p,q,t)$ coincides with the usual non-predictive sum of the squared deviations. We see that a good initial estimate tends to make the corresponding code length shorter than the length $L(p,q,t)$ for small values of $t$, because initially the estimate $\hat{\theta}(t)$ tends to be poor due to a small sample size. This causes $\alpha_t$ to be near one, and the effective estimate $\tilde{\theta}(t)$ is close to the initial estimate. However, eventually $\alpha_t$ gets small, unless the initial estimate is perfect, and the effective estimate tends to the steadily improving estimate $\hat{\theta}(t)$.
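The weighting in (3.14) and (3.15) takes only a few lines. The sketch below uses hypothetical names (`prior_weight`, `blended_estimate`) and adds an overflow guard of our own for very large differences of the accumulated errors.

```python
def prior_weight(L0, L):
    """alpha_t = 1 / (1 + 2**(L0 - L)), as in (3.15).

    L0: accumulated errors of the fixed initial estimate; L: those of the LS estimates.
    """
    exponent = L0 - L
    if exponent > 1000:  # guard (our addition): 2.0**exponent would overflow a float
        return 0.0
    return 1.0 / (1.0 + 2.0 ** exponent)

def blended_estimate(theta0, theta_hat, L0, L):
    """Effective estimate alpha_t * theta0 + (1 - alpha_t) * theta_hat, as in (3.14)."""
    alpha = prior_weight(L0, L)
    return [alpha * t0 + (1.0 - alpha) * th for t0, th in zip(theta0, theta_hat)]

# A prior estimate that has predicted better so far (smaller L0) dominates the blend
print(prior_weight(17.1, 20.0))
print(blended_estimate([0.7, -0.1], [0.16, -0.41], 17.1, 20.0))
```

Equal accumulated errors give the weight one half, and the weight moves towards one as the initial estimate out-predicts the least squares estimates.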
To test the feasibility of this scheme we generated a data sequence of length 100 with the ARMA(1,1) model defined by the two parameters $a = 0.7$, $c = -0.1$, and with a unit variance zero mean gaussian independent input sequence. We set the initial estimate $\hat{\theta}(0) = (0.7, -0.1)$ of the parameters at the "true" values. We wish to compare the convergence of the parameter estimate $\tilde{\theta}(t) = (\tilde{a}, \tilde{c})$, given by such a perfect initial knowledge, with that of the least squares estimates $\hat{\theta}(t) = (\hat{a}, \hat{c})$. In this experiment we, then, kept the number of parameters at the correct value. These estimates along with the two sums of the squared prediction errors, corresponding to the two estimators $\tilde{\theta}$ and $\hat{\theta}$, respectively, were computed along the 100 sample points, and the results are in the following table.
          with prior estimates        without prior estimates
time t    a       c        L          a       c        L
----------------------------------------------------------------
10        0.15    0.01     5.0        0.04    0.03     4.99
20        0.47    -0.23    17.1       0.16    -0.41    20.0
30        0.62    -0.17    28.7       0.24    -0.50    33.7
40        0.69    -0.11    36.0       0.43    -0.28    42.9
100       0.69    -0.11    98.9       0.50    -0.33    108.2

Table 2. Effect of prior estimates
We see that indeed good initial estimates improve both the convergence and the prediction error.
4. Vector Time Series Models
As is quite well known, the class of multi-input/output linear dynamic systems, even of a fixed dimensionality, is topologically a lot more complex space than in the case when either the input or the output is scalar. Hence, when we search for the stochastic complexity of an observed vector time series, relative to the class of such models, we may find a model which with relatively few parameters will capture the essence in the data. In the older statistical literature the only models that were fitted to a series with, say, p components, had the maximal dimensionality, a multiple of p. This was justified on the grounds that since the estimated Hankel matrix, or its equivalent, has the maximal rank, there is no point in fitting other models. Such an argument indicates a gross misunderstanding of modeling, and, in fact, an equivalent argument would dismiss fitting dynamic systems to scalar sequences as well; after all, no observed sequence is generated by any dynamic system.
Since the theory of multivariable linear dynamic systems is by now well known, and, in fact, it may even be covered in some of the other chapters of this book, we do not need to describe it here in any detail. Instead, we just summarize the relevant facts. The set of all linear dynamic systems with, say, $p$ inputs and equally many outputs, is in one-to-one correspondence with the set of all Hankel matrices of $p \times p$ blocks with finite rank, say $n$. Given such a system, its matrix input/output impulse response defines a Hankel matrix of $p \times p$ blocks and rank $n$. Conversely, any of the usual realization algorithms defines a $p$-input/output system of degree $n$ of such a Hankel matrix.
The set of all finite rank Hankel matrices of $p \times p$ blocks clearly admits a partition into equivalence classes by the rank $n$. How can each such class be parameterized? Unlike in the case with $p = 1$, an equivalence class corresponding to a rank $n$, and hence the set of all linear systems of order $n$ with $p$ inputs and outputs, cannot be parameterized with a single coordinate system, and the set is not a linear space. This important observation is at the root of the modern theory of linear dynamic systems, and it also affects in a profound manner the way such models ought to be fitted to the observed time series. Consider, for example, the set of all Hankel matrices with $p = 2$ and $n = 3$. If we further assume the first two rows to be linearly independent, as we should to avoid pathology, then the Hankel property implies that either the third or the fourth row must be the last remaining row that together with the first two forms a 3-element basis for the span of all the rows in the matrix. Again the Hankel property implies that these three basis rows are defined just as soon as we specify the two first elements in each, hence, six altogether. In the former case, where the basis consists of the first three rows, the fourth and the fifth row are linear combinations of the basis elements and, hence, to specify them we need six coefficients. All the other rows in the Hankel matrix are now just shifts and truncations of these and the basis rows. Similarly, $2np = 12$ parameters are needed to specify all the Hankel matrices, where the first, second, and the fourth rows form a basis.
Consider now the set of matrices where the fourth row is a basis element. Then the third row perforce is linearly dependent on the first, second, and the fourth. Consider the further subset where the third row in fact is linearly dependent on the first two. Evidently no such matrix and the corresponding linear system could be expressed in terms of the parameters defined by the basis consisting of the first, second, and the third row. From this we conclude that in order to parameterize the set of all systems of degree 3 having two inputs and outputs, we need two distinct coordinate systems.
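The two coordinate systems for $p = 2$, $n = 3$ can be seen on small examples. The sketch below builds a finite section of the block Hankel matrix from the impulse response $HF^kG$ and picks, from the top, every row that is not in the span of the rows above it. NumPy is assumed; the function names and both example systems are hypothetical, chosen by us so that one system has its basis in rows 1, 2, 3 and the other in rows 1, 2, 4 (indices 0, 1, 2 and 0, 1, 3 in the code).

```python
import numpy as np

def block_hankel(F, G, H, blocks=4):
    """Finite block Hankel matrix whose (i, j) block is H F^(i+j) G."""
    F, H = np.asarray(F, dtype=float), np.asarray(H, dtype=float)
    X = np.asarray(G, dtype=float)
    markov = []
    for _ in range(2 * blocks - 1):
        markov.append(H @ X)  # impulse response term H F^k G
        X = F @ X
    return np.block([[markov[i + j] for j in range(blocks)] for i in range(blocks)])

def row_basis_from_top(hankel, tol=1e-9):
    """Indices of the rows that are not in the linear span of the rows above them."""
    basis = []
    for i in range(hankel.shape[0]):
        if np.linalg.matrix_rank(hankel[basis + [i]], tol=tol) > len(basis):
            basis.append(i)
    return basis

G = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# System A: the first three Hankel rows are independent
F_a = np.diag([0.5, -0.3, 0.2])
H_a = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]

# System B: the third Hankel row repeats the second, so the fourth row enters the basis
F_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.2, 0.3]]
H_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

for F, H in ((F_a, H_a), (F_b, H_b)):
    hankel = block_hankel(F, G, H)
    print(np.linalg.matrix_rank(hankel, tol=1e-9), row_basis_from_top(hankel))
```

Both systems have degree 3, yet system B cannot be described by the parameters attached to the basis of the first three rows, which is the point of the argument above.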
In general, then, the set of all linear systems of degree n having p inputs and outputs, may be partitioned into finitely many equivalence classes, each class corresponding to the so-called lexicographic basis defined by each matrix as follows: Each of the first p rows is included in the basis, and the next basis element is the first row which is not in the linear span of those above it, and so on. Consider the ith row, i _
The state space representation of the multi-input/output linear systems is somewhat simpler than the ARMA representation for the reason that the invariants; i.e., the coordinates, appear directly as elements in the system matrices; in the ARMA representation the matrix elements are functions of the invariants. What we need is the equations for the predictions, which we for simplicity take to be time invariant, which means that we need not solve the Riccati equations or, equivalently, find the Cholesky factors recursively; otherwise, we would have to proceed as in the previous section. The predictor equations, then, are as follows:
$$\begin{aligned}
\hat{x}(t+1|t) &= F\hat{x}(t|t-1) + Gu(t) + K[y(t) - H\hat{x}(t|t-1)] \\
\hat{y}(t|t-1) &= H\hat{x}(t|t-1),
\end{aligned} \tag{4.1}$$
where $\hat{y}(t|t-1)$ is the prediction of the observed $p$-component output sequence, $u$ is the possibly present observed $r$-component input sequence, and $\hat{x}(t|t-1)$ is the prediction of the intermediate $n$-component state sequence, which we, otherwise, have no interest in. Take initially $\hat{x}(1|0) = 0$. The observed input sequence is quite irrelevant; its number of components, whether single or multiple or none at all, has no important effect on the theory. It is the number of outputs $p$ that we now require to be greater than unity. We may take all the elements in the two matrices $G$ and $K$ as free. The matrix $F$ has $np$ free parameters in the manifold representation, and possibly fewer in its minimal representation. Their locations depend on the choice of the coordinate system, which also determines the location of the 0's and 1's in $H$, which are its only elements. Hence, there are $k$ free parameters in the model (4.1), which we arrange into a vector $\theta$. In the case with the manifold representation, $k = n(2p + r)$ in each of the structures corresponding to the dimensionality $n$, while in the minimal realization $k$ depends on the structure; however, $k < n(2p + r)$.
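The predictor (4.1) and the parameter count can be sketched directly. NumPy is assumed, the names `predict_outputs` and `free_parameters` are ours, and the arrays `y` (N rows of p outputs) and `u` (N rows of r inputs) are a layout we chose for the illustration.

```python
import numpy as np

def predict_outputs(F, G, K, H, y, u):
    """One-step output predictions yhat(t|t-1) from (4.1), started at xhat(1|0) = 0."""
    F, G, K, H = (np.asarray(m, dtype=float) for m in (F, G, K, H))
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    x = np.zeros(F.shape[0])
    yhat = np.zeros_like(y)
    for t in range(len(y)):
        yhat[t] = H @ x                                  # yhat(t|t-1) = H xhat(t|t-1)
        x = F @ x + G @ u[t] + K @ (y[t] - yhat[t])      # state update driven by the error
    return yhat

def free_parameters(n, p, r):
    """k = n(2p + r): np entries in F, nr in G, and np in K (manifold representation)."""
    return n * (2 * p + r)

print(free_parameters(3, 2, 1))
```

If the data come from the innovations model with the same matrices and a zero initial state, the errors `y - yhat` reproduce the driving noise, which gives a convenient check of an implementation.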
If we agree to measure the prediction errors by squares, we get the so-called Gauss-Markov class of models. Relative to this class the non-predictive stochastic complexity of the data $(y|u) = (y(1)|u(1)), \ldots, (y(N)|u(N))$ is given by

$$I_{NP}(y|u) = \min_{s,\theta} \Big\{ \frac{N}{2} \log\det R(\theta) + \frac{k}{2} \log N + k \log \|\theta\|_{M(\theta)} \Big\}, \tag{4.2}$$

where

$$R(\theta) = \frac{1}{N} \sum_{t=1}^{N} \big(y(t) - \hat{y}_\theta(t|t-1)\big)\big(y(t) - \hat{y}_\theta(t|t-1)\big)' \tag{4.3}$$
and $M(\theta)$ denotes the Hessian defined by the double derivatives of $\log\det R(\theta)$, evaluated at $\theta$. It was shown in Rissanen (1983b) that if, indeed, the data were generated by some such system in a structure $s$, then the estimated structure $\hat{s}(N)$ and the associated parameters $\hat{\theta}(N)$ will approach the corresponding generating parameters. In particular, the last term in the complexity forces the structure estimates to converge. The last two terms in the stochastic complexity (4.2) represent the optimal model complexity as measured in terms of the number of binary digits required to encode the parameters. As the number of data points grows, the second term becomes dominantly greater of the two, and we see that the optimal model complexity grows proportional to $\frac{1}{2}\hat{k}(N) \log N$. This is because one must express the parameters collectively with increasing precision as the sample size grows, which makes eminent sense.
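The fit term and the dominant complexity term can be evaluated from a set of prediction errors as follows. This is a rough sketch under assumptions of our own: NumPy is assumed, natural logarithms are used, the coefficients $N/2$ and $k/2$ are taken as in the usual Gaussian criterion, and the remaining term involving $M(\theta)$ is left out. The function names are hypothetical.

```python
import numpy as np

def error_covariance(y, yhat):
    """R = (1/N) sum_t (y(t) - yhat(t)) (y(t) - yhat(t))', with one row per time instant."""
    err = np.asarray(y, dtype=float) - np.asarray(yhat, dtype=float)
    return err.T @ err / len(err)

def complexity_terms(y, yhat, k):
    """Return ((N/2) log det R, (k/2) log N) for given predictions and k free parameters."""
    N = len(y)
    _, logdet = np.linalg.slogdet(error_covariance(y, yhat))
    return 0.5 * N * logdet, 0.5 * k * np.log(N)

# Two observations of a 2-component output with zero predictions, k = 4 parameters
print(complexity_terms([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]], k=4))
```

Comparing the sum of the two terms across candidate structures mimics the trade-off described above: a richer structure lowers the first term and raises the second.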
We next describe the predictive stochastic complexity. It is given by

$$I_P(y|u) = \min_{s} \frac{N}{2} \log\det \frac{1}{N} \sum_{t=1}^{N} \big(y(t) - \hat{y}(t|t-1)\big)\big(y(t) - \hat{y}(t|t-1)\big)', \tag{4.4}$$
where $\hat{y}(t|t-1)$ denotes the prediction obtained as follows: The maximum likelihood estimate $\hat{\theta}(t-1)$ is determined from the data up to time $t-1$. With this parameter the predictions $\hat{y}(i|i-1)$ are computed from (4.1) up to time $i = t$, which gives the prediction needed in (4.4) at this time instant. The whole process is to be repeated for the next time instant, which requires a lot of computations. The bulk of these goes to the evaluations of the estimates $\hat{\theta}(t)$, which must be done for each $t$. It is clear that $\hat{\theta}(t)$ provides an excellent starting point for getting $\hat{\theta}(t+1)$, but nevertheless an order of $t$ computations are needed to calculate $\hat{y}(t|t-1)$, so that the entire string takes an order of $N^2$ operations.
The stochastic complexity (4.4) involves the minimization with respect to the structure index $s$. If we represent the models in the manifold, then the predictors computed in two equivalent coordinate systems are identical, to within the numerical precision. Therefore, equivalent coordinate systems also produce the same value for the stochastic complexity, and cannot be distinguished. In the minimal representations there are no equivalent coordinate systems and the predictive stochastic complexity is given by a unique structure index. These facts can be turned to an advantage in that for each order $n$ we evaluate the accumulated prediction errors in (4.4) (before minimization) by calculating the required least squares estimates $\hat{\theta}(t)$ in a good structure $\hat{s}(n)$, determined suitably, say, with the procedure in van Overbeek and Ljung (1982). This is important to ensure that the estimates can be determined with sufficient precision. Then the minimization required in (4.4) may be done with respect to $n$, just as in the single input/output case, with $\hat{n}$ as the result. Finally, we may also find the minimizing structure $\hat{s}(\hat{n}) = (\hat{n}_1, \ldots, \hat{n}_p)$ by letting the models be in their minimal representations.
Simulations.
These simulations were done by V. Wertz, see Rissanen and Wertz (1985). The data were generated by the system, Example 5.1 in van Overbeek and Ljung (1982),

$$\begin{aligned}
x(t+1) &= Fx(t) + Gu(t) + Ke(t) \\
y(t) &= Hx(t) + e(t),
\end{aligned} \tag{4.5}$$

which in the manifold representation has the matrices
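[Matrices F, G, and K of the generating system; the matrix layout did not survive extraction. Surviving entries, in source order: F: .50; G: 0; K: -.000547, .063, .119, .157, .674, .0000666.]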
This system has the structure $s = (2,1)$ corresponding to the basis defined by the first three rows in the Hankel matrix. The 2-component input $e(t)$ is a sample from a 2-variate zero mean gaussian process with the covariance matrix given by $.2 \times I$, where $I$ denotes the $2 \times 2$ identity matrix. The scalar input $u(t)$ is an observed more or less random signal, taking values +1 or -1, which was added for good measure. The sample size is $N = 500$; i.e., $t = 1, 2, \ldots, 500$.
Three different models with reasonable structures $s_1 = (1,1)$, $s_2 = (2,1)$, and $s_3 = (2,2)$ were fitted. The corresponding values of the minimized accumulated prediction errors are .3031, .0649, and .0671, respectively. We see that the second structure, which is the same as the structure of the data generating system, gives the predictive complexity. The first being under-parameterized is clearly the worst, while the last, having only one excess parameter, suffers only slightly for this. We give the optimal matrices for the winning model
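[Matrices F, G, K, and H of the winning model; the matrix layout did not survive extraction. Surviving entries, in source order: F: 0, 1, 0, .5, -.02, -.02, .008; G: .032, .0490; K: 23, -.023, -.11, .083; H: 0.]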
References

Abramson, N. (1968), Information Theory and Coding. McGraw-Hill, New York.
Akaike, H. (1974), "A New Look at the Statistical Model Identification", IEEE Trans. AC-19, 716-723.
Akaike, H. (1975), "Markovian Representation of Stochastic Processes by Canonical Variables", SIAM J. Control, 13.
Bittanti, S. (1983), "Is the Prediction Error of a Regression Model White?", J. Franklin Inst., Vol. 315, No. 4, 239-246.
Chaitin, G.J. (1975), "A Theory of Program Size Formally Identical to Information Theory", J. ACM, 22, 329-340.
Clark, J.M.C. (1976), "The Consistent Selection of Local Coordinates in Linear Systems Identification", JACC, Purdue University, Lafayette, Indiana, pp. 576-580, July 1976.
Dawid, A.P. (1984), "Present Position and Potential Developments: Some Personal Views, Statistical Theory, The Prequential Approach", J. Royal Stat. Soc. Series A, Vol. 147, Part 2, 278-292.
Geisser, S. and Eddy, W. (1979), "A Predictive Approach to Model Selection", J. American Stat. Ass., Vol. 74, Nr. 365, 153-160.
Glover, K. and Willems, J.C. (1974), "Parameterizations of Linear Dynamical Systems: Canonical Forms and Identifiability", IEEE Trans. AC-19, no. 6.
Hannan, E.J. (1980), "The Estimation of the Order of an ARMA Process", Ann. Stat. 8, No. 5, 1071-1081.
Hjorth, U. (1982), "Model Selection and Forward Validation", Scand. J. Stat. 9, 95-105.
Kailath, T., Morf, M., Sidhu, G.S. (1974), "Some New Algorithms for Recursive Estimation in Constant Discrete-Time Linear Systems", IEEE Tr. Automatic Control, Vol. AC-19, 315-323.
Kalman, R.E. (1974), "Algebraic Geometric Description of the Class of Linear Systems of Constant Dimension", 8th Ann. Princeton Conf. on Inf. Sciences and Systems, Princeton, New Jersey.
Kolmogorov, A.N. (1965), "Three Approaches to the Quantitative Definition of Information", Problems of Information Transmission 1, 4-7.
Lindquist, A. (1974), "A New Algorithm for Optimal Filtering of Discrete-Time Stationary Processes", SIAM J. Control 4, 736-747.
Ljung, L. and Rissanen, J. (1976), "On Canonical Forms, Parameter Identifiability and the Concept of Complexity", IFAC Symp. on Identification, Tbilisi, USSR.
Luenberger, D.G. (1974), "Canonical Forms for Linear Multivariable Systems", IEEE Trans. AC-12, 290-293.
Popov, V.M. (1972), "Invariant Description of Linear, Time-Invariant Controllable Systems", SIAM J. Control, 10, 254-264.
Rissanen, J. (1967), "An algebraic approach to the problems of linear prediction and identification", IBM Res. Rep. RJ 468, Oct. 23.
Rissanen, J. (1973), "Algorithms for Triangular Decomposition of Block Hankel and Toeplitz Matrices with Application to Factoring Positive Matrix Polynomials", Mathematics of Computation, Vol. 27, 147-154.
Rissanen, J. (1974), "Basis of Invariants and Canonical Forms for Linear Dynamic Systems", Automatica, Vol. 10, pp. 175-182.
Rissanen, J. (1975), "Canonical Markovian Representations and Linear Prediction", Proc. of the 6th IFAC Symposium, Part 1, Paper 29.3, 1-9.
Rissanen, J. (1978), "Modeling by shortest data description", Automatica, Vol. 14, pp. 465-471.
Rissanen, J. (1983a), "A Universal Prior for Integers and Estimation by Minimum Description Length", Ann. of Statistics, Vol. 11, No. 2, 416-431.
Rissanen, J. (1983b), "Estimation of Structure by Minimum Description Length", Circuits, Systems, and Signal Processing, special issue on Rational Approximations, Vol. 1, Nr. 3-4, 395-406.
Rissanen, J. (1984a), "Universal Coding, Information, Prediction, and Estimation", IEEE Trans. Inf. Theory, Vol. IT-30, Nr. 4, 629-636.
Rissanen, J. (1984b), "Stochastic Complexity and Modeling", (to appear in Ann. of Statistics).
Rissanen, J. (1984c), "Order Estimation by Accumulated Prediction Errors", Essays in Time Series and Allied Processes (eds. J. Gani, M.B. Priestley).
Rissanen, J. (1985a), "Minimum Description Length Principle", Encyclopedia of Statistical Sciences, Vol. V, (S. Kotz & N. L. Johnson eds.), pp. 523-527. John Wiley and Sons, New York.
Rissanen, J. (1985b), "A Predictive Least Squares Principle", (to appear).
Rissanen, J. and Ljung, L. (1975), "Estimation of Optimum Structures and Parameters for Linear Systems", Proc. CNR-CISM Symp. on Algebraic System Theory, Udine, Math. System Theory 131, Springer-Verlag, pp. 76-91.
Rissanen, J. and Wertz, V. (1985), "Structure Estimation by Accumulated Prediction Error Criterion", Eighth IFAC Symposium on Identification and System Parameter Estimation, York, England.
Schwarz, G. (1978), "Estimating the Dimension of a Model", Ann. Statist. 6, 461-464.
Shibata, R. (1976), "Selection of the Order of an Autoregressive Model by Akaike's Information Criterion", Biometrika, 63, 1, 117-126.
Solomonoff, R.J. (1964), "A Formal Theory of Inductive Inference", Part I, Information and Control 7, 1-22; Part II, Information and Control 7, 224-254.
Stone, M. (1977), "An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion", J. Royal Stat. Soc., Ser. B, 39, 44-47.
van Overbeek, A.J.M. and Ljung, L. (1982), "On Line Structure Selection for Multivariable State Space Models", Automatica, vol. 18, no. 5, 529-543.
Wax, M. (1985), "Order Selection for AR Models by Predictive Least Squares", (to appear).
Wertz, V. (1982), Structure Selection for the Identification of Multivariate Processes, Dr. Sci. Appl. thesis, Universite Catholique de Louvain, Louvain-La-Neuve.
Wertz, V., Gevers, M., Hannan, E.J. (1982), "The Determination of Optimum Structures for the State Space Representation of Multivariable Stochastic Processes", IEEE Trans. Autom. Control, Vol. AC-27, No. 6, 1200-1211.
Chapter 5
Deterministic and Stochastic Linear Periodic Systems
Sergio Bittanti
1. Introduction
Linear periodic systems are linear systems described by differential or difference equations with periodically time-varying coefficients. Deterministic and stochastic periodic systems are useful to model natural and artificial phenomena of periodic type. As such, they are of great interest in various fields of application, such as Chemical and Aerospace Engineering, Signal Modelling and Processing. Moreover, periodic systems play a key role in optimal periodic control theory. This theory stands on the observation that there exist several systems of practical significance for which the best operation is a periodic one. The implementation of such a periodic action in presence of stochastic disturbances calls for a detailed analysis of the control problem of stochastic periodic systems.
Without any claim of completeness, the following general references relative to the area of Periodic Systems and Control (PSYCO) (theory and applications) are mentioned:
(DaPrato, 1979),
(Gilbert,
1971 and 1976), 1967),
1976)
1973), 1971),
1968),
(Bailey,
1980), b, c),
(Dorato and Knudsen,
1985),
(Houlihan,
(Colonius, 1985a,
(Gilbert and Lyons,
(Hernandez and Jodar,
(Horn and Bailey,
(Bernstein and Gilbert,
(Dorato and Levis, 1977),
Fronza and Guardabassi, 1984),
(Bekir and Bucy,
(Bittanti,
1973),
1981),
(Guardabassi,
(Horn and Lin, Cliff and Kelley,
1982),
(Khargonekar,Poolla
(Khandelwal,
and Tannenbaum,
Sharma and Ray,
1974),
(Marcus,
1973),
Onogi,
1981),
1976),
(Nistri,1983),
1979),
(Noldus,
1984),
1978),
(Meyer and Burrus,
(Shayman,
1976),
1980),
1985),
1973 and 1976),
(Watanabe, Nishimura and Matsubara,
1981),
(Maffezzoni,
(Onogi and Matsubara,
1983),
(Speyer,
nabe, Nishimura and Matsubara, Matsubara,
1978),
1975),
(Sch~dlich, Hoffmann and Hofmann,
Evans,
(Kono, 1980),
(Kern, 1980),
(Matsubara, Nishimura, Watanabe and
(Matsubara and Onogi,
(Sincic and Bailey,
1985),
(Speyer and
1984),
(Wata-
(Watanabe, Onogi and
(Watanabe, Kurimoto and Matsubara,
1984).
Further references will be quoted in the sections below.
Obviously, the linear time-invariant systems belong to the class of linear periodic systems. However, the extension of the properties of time-invariant systems to the periodic case is far from straightforward, as witnessed by the very development of the theory of PSYCO. Indeed, many peculiar and challenging problems are encountered along this route. Only in the last few years have a number of these problems been solved and some open questions been clarified.
In particular, a somewhat detailed understanding of the structural properties of periodic systems has been achieved. This paper is intended to provide a first general picture of such a subject by surveying the appropriate literature, which covers the last two decades. The use of these properties in the study of some basic questions relative to stochastic linear periodic systems is also discussed.
The paper is expository in nature and is organized as follows. The structural properties (reachability, controllability, and so on) of continuous-time and discrete-time systems are dealt with in Sections 2 and 3 respectively. Precisely, five well known properties valid in the time-invariant case are first recalled. Then, their generalization to periodic systems is discussed. The Kalman canonical decomposition is the subject of Section 4. Then, Section 5 deals with the extended structural properties, i.e. stabilizability and detectability. Finally, Section 6 is devoted to the analysis of stochastic systems. Precisely, the existence of a cyclostationary stochastic solution is investigated in terms of these properties.

2. Structural Properties of Continuous-time Periodic Systems
2.1 Continuous-time Linear Periodic Systems
Consider the system described by the differential equation:
$$\dot{x}(t) = A(t)\,x(t) + B(t)\,u(t) \tag{1.a}$$
where $A : \mathbb{R} \to \mathbb{R}^{n \times n}$ and $B : \mathbb{R} \to \mathbb{R}^{n \times m}$ are continuous, real and $T$-periodic. The period $T$ is the smallest value for which
$$A(t+T) = A(t); \quad B(t+T) = B(t), \quad \forall t. \tag{1.b}$$
The system transition matrix $\Phi(t,\tau)$, i.e. the solution of
$$\frac{d}{dt}\Phi(t,\tau) = A(t)\,\Phi(t,\tau), \quad \Phi(\tau,\tau) = I, \quad t > \tau,$$
is such that
$$\Phi(t+T,\tau+T) = \Phi(t,\tau). \tag{2}$$
By the Floquet theory, see e.g. Halanay (1966), Chen (1970) and Yakubovich and Starzhinskii (1975), $\Phi(t,0)$ can be expressed as follows:
$$\Phi(t,0) = P(t)\,e^{Rt}$$
where
$$P(t+T) = P(t), \quad \forall t; \qquad P(0) = I.$$
The matrix $\Phi(t+T,t)$ is named monodromy matrix at time $t$. In particular, $\Phi(T,0) = e^{RT}$ is the monodromy matrix at $t=0$, or simply the monodromy matrix. Since
$$\Phi(t+T,t) = \Phi(t,\tau)\,\Phi(\tau+T,\tau)\,\Phi(t,\tau)^{-1},$$
the eigenvalues of $\Phi(t+T,t)$ are independent of $t$. They are called characteristic multipliers. By Jacobi's Theorem, the product of these eigenvalues must be positive. In fact
$$\det(e^{RT}) = e^{(\mathrm{tr}\,R)\,T}.$$
The eigenvalues of $R$ are named characteristic exponents. System (1) is asymptotically stable if and only if the characteristic multipliers lie within the open unit disk, which is equivalent to requiring that the characteristic exponents belong to the open left plane.
The spectrum of $e^{RT}$ will be denoted by $\Lambda$, while $\Lambda_1$ is used to denote the set of the "unstable" eigenvalues of the same matrix (namely the eigenvalues not belonging to the open unit disk).
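As an illustration of the Floquet representation, the following scalar example (not taken from the text; the coefficient $a$ and the choice $T = 2\pi$ are assumptions made only for this sketch) can be worked out in closed form.

```latex
% Scalar periodic system with period T = 2\pi (illustrative assumption)
\dot{x}(t) = (a + \cos t)\,x(t), \qquad a \in \mathbb{R}.

% Transition matrix, obtained by direct integration
\Phi(t,\tau) = \exp\bigl(a\,(t-\tau) + \sin t - \sin\tau\bigr),
\qquad \Phi(t+2\pi,\tau+2\pi) = \Phi(t,\tau).

% Floquet factorization \Phi(t,0) = P(t)\,e^{Rt}
P(t) = e^{\sin t}, \quad P(t+2\pi) = P(t), \quad P(0) = 1, \qquad R = a.

% Monodromy matrix (here a scalar): the same for every t, and positive
\Phi(t+2\pi,t) = e^{2\pi a}.
```

In this sketch the single characteristic multiplier is $e^{2\pi a}$ and the single characteristic exponent is $a$, so the system is asymptotically stable exactly when $a < 0$, in agreement with the two equivalent stability conditions stated above.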
2.2 Structural properties
For the sake of completeness, the classical definitions of reachability and controllability are given below.

Definition 1
1.1 The state $x \in \mathbb{R}^n$ is reachable over $(\tau,t)$, $\tau < t$ [controllable over $(t,\tau)$, $\tau > t$] for System (1.a) if there exists an input function which carries the event $(\tau,0)$ into $(t,x)$ [$(t,x)$ into $(\tau,0)$].
1.2 $X_r(\tau,t)$ [$X_c(t,\tau)$] denotes the set of the reachable [controllable] states over $(\tau,t)$ [$(t,\tau)$].
1.3 System (1.a) (or, equivalently, the pair $(A,B)$) is reachable [controllable] over $(\tau,t)$ [$(t,\tau)$] if $X_r(\tau,t) = \mathbb{R}^n$ [$X_c(t,\tau) = \mathbb{R}^n$].
1.4 The state $x \in \mathbb{R}^n$ is reachable at $t$ [controllable at $t$] if there exists a time point $\tau$, $\tau < t$ [$\tau > t$], such that $x$ is reachable over $(\tau,t)$ [controllable over $(t,\tau)$].
1.5 $X_r(t)$ [$X_c(t)$] denotes the set of the reachable [controllable] states at $t$.
1.6 System (1.a) (or, equivalently, the pair $(A,B)$) is reachable [controllable] at time $t$ if $X_r(t) = \mathbb{R}^n$ [$X_c(t) = \mathbb{R}^n$].
1.7 System (1) (or, equivalently, the pair $(A,B)$) is reachable [controllable] if $X_r(t) = \mathbb{R}^n$, $\forall t$ [$X_c(t) = \mathbb{R}^n$, $\forall t$].
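2.3 Gramian matrices
The following $n \times n$ matrices are named reachability and controllability Gramian matrices respectively.
$$W_r(\tau,t) = \int_{\tau}^{t} \Phi(t,\sigma)\,B(\sigma)\,B(\sigma)'\,\Phi(t,\sigma)'\,d\sigma, \quad t > \tau \tag{3.a}$$
$$W_c(t,\tau) = \int_{t}^{\tau} \Phi(t,\sigma)\,B(\sigma)\,B(\sigma)'\,\Phi(t,\sigma)'\,d\sigma, \quad \tau > t \tag{3.b}$$
It is well known (Kalman, 1969) that
$$X_r(\tau,t) = \mathcal{R}[W_r(\tau,t)], \qquad X_c(t,\tau) = \mathcal{R}[W_c(t,\tau)]$$
where $\mathcal{R}$ is the range operator.
In the periodic case, the following recursions can be derived in view of (2):
$$W_r(t-(i+1)T,\,t) = W_r(t-iT,\,t) + [\Phi(t+T,t)]^{i}\,W_r(t-T,\,t)\,[\Phi(t+T,t)']^{i} \tag{4.a}$$
$$W_c(t,\,t+(i+1)T) = W_c(t,\,t+iT) + [\Phi(t+T,t)]^{-i}\,W_c(t,\,t+T)\,[\Phi(t+T,t)']^{-i} \tag{4.b}$$

2.4 Five structural properties of time-invariant systems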
The structural properties of linear time-invariant systems have received ample coverage in the literature, see e.g. Kalman (1969), Chen (1970). Five well known properties are listed below. Although some of them are trivial, it is advisable to list them all, in the proper order, for the subsequent discussion on periodic systems.

A) The reachability and controllability subspaces at time $t$ do coincide:
$$X_r(t) = X_c(t), \quad \forall t$$
B) The reachability and controllability subspaces are time-invariant:
$$X_r(t) = \text{const.}, \quad \forall t$$
$$X_c(t) = \text{const.}, \quad \forall t.$$
C) If the pair $(A,B)$ is reachable [controllable] at a time point, it is reachable [controllable] at any time point:
$$X_r(\bar{t}) = \mathbb{R}^n \implies X_r(t) = \mathbb{R}^n, \quad \forall t$$
$$X_c(\bar{t}) = \mathbb{R}^n \implies X_c(t) = \mathbb{R}^n, \quad \forall t.$$
D) If a state is reachable at $t$ [controllable at $t$], then, for any $\varepsilon > 0$, it is reachable over $(t-\varepsilon,t)$ [controllable over $(t,t+\varepsilon)$]:
$$X_r(t) = X_r(t-\varepsilon,t)$$
$$X_c(t) = X_c(t,t+\varepsilon).$$
E) The pair $(A,B)$ is reachable if and only if $[sI - A \;\; B]$ is full rank on the spectrum of $A$.

Due to the system time-invariance, Property B is intuitive. Property C is a trivial consequence of B and is listed here only to stress the difference between the time-invariant and the periodic case. Property D can be rephrased by saying that, in time-invariant systems, reachability and controllability are "asymptotically instantaneous". The reachability and controllability transitions can be made to occur in an arbitrarily short interval of time. This is connected with the absence of any constraint on the input function energy in the classical Definitions 1. Finally, the spectral characterization E is often referred to as the PBH (Popov-Belevitch-Hautus) condition, see Johnson (1966), Popov (1966), Belevitch (1968), Hautus (1969).

2.5 Five structural properties of continuous-time periodic systems
The following basic questions are considered in this section:
• Do properties A-C hold true for periodic systems?
• Can anything be said about the reachability and controllability intervals?
• Does there exist a periodic version of the PBH test?

In the first place, property A still holds true, i.e. the reachability and controllability subspaces coincide even in the periodic case:
$$X_r(t) = X_c(t), \quad \forall t.$$
As many results concerning the structural properties of linear periodic systems, this can be proven either via algebraic or geometric methods. As an example of a typical geometric proof, let us outline the derivation of the inclusion $X_c(t) \subseteq X_r(t)$. Let $\bar{x}$ be a state which is controllable at $t$ and denote by $N$ a positive integer such that any state controllable at $t$ can be driven to zero in an interval $NT$. Due to periodicity, $\bar{x}$ is controllable at $t+NT$ as well, so that $\bar{x} \in X_c(t+NT)$. Let $x = \Phi(t,t+NT)\,\bar{x}$. It is apparent that $x$ is controllable at $t$, since $\bar{x}$ is controllable at $t+NT$. Therefore there exists an input function which transfers the event $(t,x)$ into $(t+NT,0)$. By the same input function, the event $(t,0)$ will then be transferred to $(t+NT,-\bar{x})$. Therefore $\bar{x}$ is reachable at $t+NT$, and consequently at $t$ thanks to periodicity. This leads to the conclusion $X_c(t) \subseteq X_r(t)$.
The coincidence of the reachability and controllability subspaces of a periodic system at each time point is an extension of the analogous well known property for time-invariant systems. However, contrary to the time-invariant case, the subspaces $X_r(\tau,t)$ and $X_c(\tau,t)$ may not coincide (see Example 1 below).
Property B is obviously false. Instead, $X_r(t)$ and $X_c(t)$ are periodically time-varying:
$$X_r(t) = X_r(t+T), \quad \forall t$$
$$X_c(t) = X_c(t+T), \quad \forall t.$$
However, by the same arguments used in discrete-time in Bittanti and Bolzern (1984,a), Lemma 3, it can be proven that
$$\dim X_r(t) = \text{const.}$$
$$\dim X_c(t) = \text{const.}$$
From this, it follows that property C still holds true: if a periodic system is reachable at a given time point, it is reachable at any time point. Hence it is possible to speak of system reachability and controllability without further specifications.
The attention is now focused on properties D and E, the stories of which are somewhat interwoven with each other. To the best knowledge of the present author, the first statements concerning the length of the reachability and controllability intervals of periodic systems go back to the late sixties. In Brunovsky (1969), the following result is proven by means of algebraic arguments. An alternative proof, of geometric type, can be found in Bittanti and Bolzern (1984,a), Lemma 1.

Theorem 1 (Brunovsky, 1969)
If system (1) is controllable, then the controllability transition can be performed in an interval of time of length $nT$ ($n$ is the system order):
$$X_c(t) = \mathbb{R}^n \implies X_c(t) = X_c(t,t+nT). \qquad \square$$
In Kalman (1969), a stronger statement is reported without proof.

Proposition (Kalman, 1969)
If system (1) is controllable, then the controllability transition can be performed in an interval of time of length $T$. □

The question of proving Kalman proposition remained open until 1975, when, in a lengthy paper on the periodic Riccati equation, Hewer gave a proof of Kalman proposition. Furthermore, he gave a spectral condition of system controllability which generalizes the PBH test to the periodic case. A slightly different yet equivalent condition is due to Kano and Nishimura (1979), where reference is made to the monodromy matrix $\Phi(T,0) = e^{RT}$ in place of $R$. The condition, named H-condition, will now be stated in terms of a generic monodromy matrix $\Phi(t+T,t)$. This is especially useful for the extension to discrete-time systems. Recall that $\Lambda$ is the spectrum of $e^{RT}$.

H-condition at t (continuous-time)
Given a time point $t$, the matrix
$$[\,sI - \Phi(t+T,t) \;\;\; W_c(t,t+T)\,]$$
is full rank on $\Lambda$. □

The first paper where the condition was stated in these terms is probably Bittanti, Bolzern, Colaneri and Guardabassi (1983).
However, the spectral conditions played a key role in the analysis of the periodic Lyapunov and Riccati equations, Hewer (1975), Kano and Nishimura (1979). The proof given by Hewer of the validity of the H-condition as controllability test was based on Kalman proposition, though. Unfortunately, the proof given in Hewer (1975) of such a proposition was not correct. Even more so: the Kalman proposition itself is not true, as shown by the following counterexample.

Example 1
For a given integer $n$, let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be $n$ given distinct real numbers. Consider the single-input system:
$$A(t) = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]$$
$$B(t) = [\,e^{-\lambda_1(1-t)} \;\; e^{-\lambda_2(1-t)} \;\; \ldots \;\; e^{-\lambda_n(1-t)}\,]' \sin \pi t, \quad t \in [0,1]$$
$$B(t) = \text{periodic extension of previous}, \quad t \notin [0,1]$$
For this system, which is periodic of period $T = 1$,
$$\Phi(0,\sigma)\,B(\sigma) = (\sin \pi\sigma)\,x_1, \quad \sigma \in [0,1]$$
where
$$x_1 = [\,e^{-\lambda_1} \;\; e^{-\lambda_2} \;\; \ldots \;\; e^{-\lambda_n}\,]'.$$
Letting
$$\xi = \int_0^1 \sin^2 \pi\sigma \, d\sigma$$
it follows from (3.b) that
$$W_c(0,1) = \xi\, x_1 x_1'.$$
Therefore, $\dim X_c(0,1) = 1$, and, assuming $n > 1$, this system is not controllable over $(0,1)$.
For a given positive integer $k$, $k \leq n$, consider now the time interval $(0,k)$ and let
$$x_i = [\,e^{-i\lambda_1} \;\; e^{-i\lambda_2} \;\; \ldots \;\; e^{-i\lambda_n}\,]', \quad i = 1, 2, \ldots, k.$$
Then, by applying recursion (4), for any integer $k \leq n$, $W_c(0,k)$ can be given the following expression:
$$W_c(0,k) = \xi\,(x_1 x_1' + x_2 x_2' + \cdots + x_k x_k').$$
Consequently,
$$X_c(0,k) = \mathrm{span}\{x_1, x_2, \ldots, x_k\}, \quad \forall k \leq n.$$
Since $x_1, x_2, \ldots, x_n$ are independent, it follows that
$$\dim X_c(0,k) = k, \quad k \leq n.$$
Therefore the system is controllable, but there exist some states which cannot be driven to zero in an interval of time shorter than $n$. Interestingly enough, it turns out that
$$X_r(0,k) = \mathrm{span}\{x_0, x_{-1}, \ldots, x_{-k+1}\}.$$
This entails that $X_r(0,k)$ and $X_c(0,k)$ may not coincide. □

The validity of the H-condition as a controllability condition must now be proven independently of Kalman's conjecture. One such proof can be found in Bittanti, Colaneri and Guardabassi (1984), and Bittanti, Bolzern, Colaneri and Guardabassi (1983), which contain the following results. Notice that, although the reachability and controllability transitions cannot be made to occur in a single period, the H-condition, at any time point, calls for the controllability Gramian over a single period. In Theorem 2(b), $\nu$ is the degree of the minimal polynomial of the monodromy matrix $\Phi(t+T,t)$. Note that the minimal polynomial of $\Phi(t+T,t)$ does not depend upon $t$, see Bittanti, Bolzern, Colaneri and Guardabassi (1983).

Theorem 2
(a) If system (1) is controllable at time $t$, then the H-condition at $t$ is satisfied.
(b) Suppose that the H-condition at $t$ is satisfied. Then, the system is controllable at $t$ and $X_c(t) = X_c(t,t+\nu T)$. □

In conclusion, the H-condition is in fact a controllability condition. In view of the PBH test, this means that the periodic system (1) is controllable at $t$ if and only if the time-invariant system defined by the pair $(\Phi(t+T,t),\, W_c(t,t+T))$ is controllable. Since a periodic system is controllable at any time point whenever it is controllable at a given time point, it readily follows that, if the H-condition is satisfied at a given time point, then the same condition holds true at any time point. Finally, it is clear from part (b) that Brunovsky's theorem can be slightly strengthened by claiming that, if system (1) is controllable at $t$, the controllability transition can be performed in an interval of time of length $\nu T$.
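The counterexample of Example 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $n = 3$ with the arbitrary values $\lambda = (0, 0.5, 1)$, uses the closed-form vectors $x_i$ and the constant $\int_0^1 \sin^2 \pi\sigma\, d\sigma = 1/2$, and computes ranks by plain Gaussian elimination with a fixed tolerance. The names `rank`, `gramian_c` and `h_condition_holds` are hypothetical helpers introduced here, not notation from the text.

```python
import math

LAMBDAS = [0.0, 0.5, 1.0]   # assumed distinct real numbers (n = 3)
N = len(LAMBDAS)
XI = 0.5                    # integral of sin^2(pi*s) over [0, 1]


def rank(rows, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        p = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[p][c]) < tol:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, n_rows):
            f = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r


def x_vec(i):
    """x_i = [exp(-i*lambda_1), ..., exp(-i*lambda_n)]'."""
    return [math.exp(-i * lam) for lam in LAMBDAS]


def gramian_c(k):
    """W_c(0, k) = xi * (x_1 x_1' + ... + x_k x_k'), as in Example 1."""
    w = [[0.0] * N for _ in range(N)]
    for i in range(1, k + 1):
        xi_vec = x_vec(i)
        for a in range(N):
            for b in range(N):
                w[a][b] += XI * xi_vec[a] * xi_vec[b]
    return w


def h_condition_holds():
    """Check that [sI - Phi(1,0), W_c(0,1)] has rank n on the spectrum of Phi(1,0)."""
    multipliers = [math.exp(lam) for lam in LAMBDAS]   # Phi(1,0) = diag(exp(lambda_j))
    w1 = gramian_c(1)
    for s in multipliers:
        mat = [[(s - multipliers[i]) if i == j else 0.0 for j in range(N)] + w1[i]
               for i in range(N)]
        if rank(mat) < N:
            return False
    return True


ranks = [rank(gramian_c(k)) for k in range(1, N + 1)]
print("rank W_c(0,k), k = 1..n:", ranks)
print("H-condition at t = 0 satisfied:", h_condition_holds())
```

In this sketch the one-period Gramian has rank one, so the system is not controllable over a single period, while the rank test built on the same one-period Gramian still succeeds, which is the point made by Theorem 2.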
Remarks
1. The above is the strongest result on the controllability interval length in terms of matrix $A$ only. This means that, given any periodic matrix $A$, there exists a matrix $B$ such that the pair $(A,B)$ is controllable, and some state of system (1) cannot be driven to zero in $\nu - 1$ periods. If one considers the pair $(A,B)$, then a further conclusion can be drawn (Kabamba, 1985): the number of periods required to perform the state transition of a controllable system is at most the controllability index (Chen, 1970) of the pair $(\Phi(t+T,t),\, W_c(t,t+T))$.
2. In this subsection, only the controllability notion has been taken into consideration. Needless to say that, since $X_r(t) = X_c(t)$, analogous results can be given for reachability.
3. A periodic output equation can be added to the state equation of system (1):
$$y(t) = C(t)\,x(t)$$
with $C : \mathbb{R} \to \mathbb{R}^{p \times n}$ continuous, real and $T$-periodic:
$$C(t+T) = C(t), \quad \forall t.$$
Then, the notions of state observability and reconstructibility can be introduced. As is well known, Kalman (1969), Chen (1970), these notions deal with the possibility of distinguishing a free motion starting from or ending in a given state from the free motion starting from or ending in the origin of the state-space. Obviously, this last free motion results in the zero output function.
The observability and reconstructibility properties are not explicitly considered here. The reason is that the observability and reconstructibility results of periodic systems can be derived from the ones concerning reachability and controllability via the duality theory, Kalman (1969), Chen (1970). Indeed, the observability [reconstructibility] of $(A,C)$ is equivalent to the reachability [controllability] of the dual pair $(A', C')$.
3. Structural Properties of Discrete-time Periodic Systems
3.1 Discrete-time linear periodic systems
Turning now to discrete-time systems, consider the difference equation
$$x(t+1) = A(t)\,x(t) + B(t)\,u(t) \tag{5.a}$$
where $A : \mathbb{Z} \to \mathbb{R}^{n \times n}$ and $B : \mathbb{Z} \to \mathbb{R}^{n \times m}$ are (for an integer $T$) $T$-periodic:
$$A(t+T) = A(t); \quad B(t+T) = B(t), \quad \forall t \tag{5.b}$$
Since $A$ may be singular at certain time points, system (5) may not be reversible. This is a major difference between discrete-time and continuous-time systems. As a consequence of nonreversibility, the trajectory portrait in discrete-time can be quite different than in continuous-time. Precisely, in continuous-time there is one and only one free motion passing through any given event $(x,t)$. In discrete-time, there may exist events where two or more free motions end and events where no free motion can end. If $(x,t)$ is an event with no free motion ending in it, $x$ is named new-born state at time $t$, Bittanti and Bolzern (1986).
The system transition matrix is now given by
$$\Phi(t,\tau) = A(t-1)\,A(t-2) \cdots A(\tau).$$
The spectrum of the monodromy matrix $\Phi(t+T,t)$ is independent of $t$. Indeed, for any pair $(\tau, t \in [0,T-1])$, the monodromy matrices $\Phi(t+T,t)$ and $\Phi(\tau+T,\tau)$ can be expressed in the form $\Phi(t+T,t) = FG$ and $\Phi(\tau+T,\tau) = GF$. Notice that, if $\lambda$ is a nonzero eigenvalue of $\Phi(t+T,t)$, i.e. $FGx = \lambda x$, $x \neq 0$, then, since $\lambda \neq 0$ and $x \neq 0$, $x$ cannot belong to the null-space of $G$. Hence $GFy = \lambda y$, where $y = Gx$, $y \neq 0$, so that $\lambda$ is an eigenvalue of $GF$ as well. This implies that all the nonzero eigenvalues of $\Phi(t+T,t)$ are eigenvalues of $\Phi(\tau+T,\tau)$. By reversing the role of $\Phi(t+T,t)$ and $\Phi(\tau+T,\tau)$ in the above argument, it also follows that all the nonzero eigenvalues of $\Phi(\tau+T,\tau)$ are eigenvalues of $\Phi(t+T,t)$. Hence, the nonzero eigenvalues of $\Phi(t+T,t)$ and $\Phi(\tau+T,\tau)$ do coincide, so demonstrating the time invariance of the spectrum $\Lambda$.
Some eigenvalues of $\Phi(t+T,t)$ may be zero. The symbol $\Lambda_0$ will denote the set of the nonzero elements of $\Lambda$.
Although the characteristic polynomial of $\Phi(t+T,t)$ does not change with $t$, the minimal polynomial may depend on $t$. The degree of the minimal polynomial will be denoted by $\nu_t$. Finally, note that the determinant of $\Phi(t+T,t)$ may not be positive.
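The following sketch illustrates, on a small nonreversible example, that the characteristic polynomial of the monodromy matrix is the same at every time point while the degree of its minimal polynomial may change. The matrices `A_SEQ` (period 2, order 2) are an assumption chosen only for this illustration, and all function names are hypothetical helpers; exact rational arithmetic is used so that no tolerance is needed.

```python
from fractions import Fraction


def to_frac(mat):
    """Convert an integer matrix to exact rationals."""
    return [[Fraction(v) for v in row] for row in mat]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    n, p, q = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(q)]
            for i in range(n)]


def monodromy(a_seq, t):
    """Phi(t+T, t) = A(t+T-1) ... A(t+1) A(t) for a T-periodic sequence."""
    period = len(a_seq)
    phi = identity(len(a_seq[0]))
    for k in range(t, t + period):
        phi = matmul(a_seq[k % period], phi)
    return phi


def char_poly(a):
    """Characteristic polynomial coefficients (leading first), Faddeev-LeVerrier."""
    n = len(a)
    coeffs = [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        m = matmul(a, m)
        for i in range(n):
            m[i][i] += coeffs[-1]
        am = matmul(a, m)
        coeffs.append(-sum(am[i][i] for i in range(n)) / k)
    return coeffs


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def min_poly_degree(a):
    """Degree of the minimal polynomial = dimension of span{I, A, ..., A^(n-1)}."""
    n = len(a)
    power, rows = identity(n), []
    for _ in range(n):
        rows.append([v for row in power for v in row])
        power = matmul(a, power)
    return rank(rows)


# Assumed 2-periodic example with singular A(t), hence a nonreversible system.
A_SEQ = [to_frac([[1, 0], [0, 0]]), to_frac([[0, 1], [0, 0]])]

mono = [monodromy(A_SEQ, t) for t in range(2)]
polys = [char_poly(m) for m in mono]
nus = [min_poly_degree(m) for m in mono]
print("characteristic polynomials:", polys)
print("minimal polynomial degrees nu_t:", nus)
```

For this choice the monodromy matrix is the zero matrix at $t=0$ and a nilpotent Jordan block at $t=1$, so both have characteristic polynomial $s^2$ while the minimal polynomial degrees differ by one.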
3.2 Structural properties
The definitions of reachability and controllability given above apply to discrete-time systems as well. When considering the structural properties, the only care to be taken in discrete-time concerns the definition of reconstructibility. This is due to the peculiar phase portrait of nonreversible systems. Since the attention is mainly focused here on reachability and controllability, this aspect will not be discussed in this paper. The interested reader is referred to Bittanti and Bolzern (1986).

3.3 Gramian matrices
The only Gramian matrix which can be defined in general is the reachability one:
$$W_r(\tau,t) = \sum_{j=\tau+1}^{t} \Phi(t,j)\,B(j-1)\,B(j-1)'\,\Phi(t,j)' \tag{6}$$
For reversible systems, the backward transition matrix $\Phi(\tau,t) = \Phi(t,\tau)^{-1}$, $t \geq \tau$, can be defined. Then, the controllability Gramian matrix is given by:
$$W_c(t,\tau) = \sum_{j=t+1}^{\tau} \Phi(t,j)\,B(j-1)\,B(j-1)'\,\Phi(t,j)'$$
As in continuous-time, $X_r(\tau,t) = \mathcal{R}[W_r(\tau,t)]$. Moreover, for reversible systems, $X_c(t,\tau) = \mathcal{R}[W_c(t,\tau)]$. In particular, system (5.a) is reachable at time $t$ if and only if, for some $\tau$, $W_r(\tau,t)$ is full rank. For time-invariant systems the system reachability can also be tested by resorting to the PBH condition. This coincides with the PBH condition for continuous-time systems (Sect. 2.4).

3.4 Five structural properties of discrete-time periodic systems
Property A is false for periodic discrete-time systems. The only conclusion which holds in general is that (Bittanti and Bolzern, 1984,a, Lemma 2)
$$X_r(t) \subseteq X_c(t), \quad \forall t.$$
Furthermore, only the dimension of the controllability subspace is time-invariant (Bittanti and Bolzern, 1984,a, Lemma 2):
$$\dim X_c(t) = \text{const.} \tag{7}$$
The following simple example shows that the dimension of the reachability subspace may change in time.

Example 2
Consider the scalar system
$$x(t+1) = a(t)\,x(t) + b(t)\,u(t)$$
$$a(t) = b(t) = \begin{cases} 0, & t \text{ even} \\ 1, & t \text{ odd.} \end{cases}$$
Then:
$$X_r(t) = \begin{cases} \mathbb{R}^1, & t \text{ even} \\ \{0\}, & t \text{ odd.} \end{cases} \qquad \square$$

As for Property C, from (7) it follows that, if system (5) is controllable at a certain time point, it is controllable at any time point. It is apparent from Example 2 that the same conclusion is false for reachability.
In order to guarantee the system reachability at any time point, it is necessary to require the reachability at each time point of a suitable set $\Theta$, which reduces to any singleton in $[0,T-1]$ for reversible systems only. Precisely, let
$$\Theta = \begin{cases} \text{any } t \in [0,T-1], & \text{if the system is reversible} \\ \{t \in [0,T-1] \text{ such that } \det A(t-1) = 0\}, & \text{otherwise.} \end{cases}$$
Then, by analyzing the reachability Gramian, it is proven in Bittanti and Bolzern (1985,b) that system (5) is reachable if it is reachable for each $t \in \Theta$.
As far as the reachability and controllability interval length, in Bittanti and Bolzern (1984,a, Lemma 1) the following result is derived by considerations of geometric kind:
$$X_r(t) = X_r(t-nT,t), \quad \forall t$$
$$X_c(t) = X_c(t,t+nT), \quad \forall t.$$
Obviously, this entails the discrete-time version of Brunovsky Theorem. Another result concerning the reachability and controllability interval length, given in Bittanti and Bolzern (1985,b), can be stated as follows:
$$X_r(t) = \mathbb{R}^n \implies X_r(t) = X_r(t-\nu_t T,\,t) \tag{8.a}$$
$$X_c(t) = \mathbb{R}^n \implies X_c(t) = X_c(t,\,t+\nu_t T), \tag{8.b}$$
where it is recalled that $\nu_t$ is the degree of the minimal polynomial of $\Phi(t+T,t)$.

Remarks
4. For nonreversible systems, the fact that $\nu_t$ may depend upon $t$ may give rise to an interesting paradox. Consider a controllable system with, say, $\nu_1 = 3$ and $\nu_2 = 1$. Then, starting from $t=1$, any state can be driven to zero in at most 3 periods. However, since any state can also be driven to zero in an interval no longer than $T$ starting from $t=2$, the stronger conclusion would follow that, starting from $t=1$, any state can be driven to zero in at most 2 periods. However, this paradox resolves readily, since it can be shown (Bittanti and Bolzern, 1985,b) that
$$|\nu_t - \nu_\tau| \leq 1, \quad \forall \tau, t.$$
5. The discrete-time version of the result mentioned in Remark 1 can be found in Bittanti, Colaneri, De Nicolao (1986).
Precisely, let $\mu_{rt}$ be the reachability index of the pair $(\Phi(t+T,t),\, W_r(t,t+T))$ and denote by $\mu_{ct} = \max(\mu_{rt}, \mu_z)$, where $\mu_z$ is the dimension of the largest Jordan block of the controllable and unreachable part of system $(\Phi(t+T,t),\, W_r(t,t+T))$. Then
$$X_r(t) = X_r(t-\mu_{rt}T,\,t)$$
$$X_c(t) = X_c(t,\,t+\mu_{ct}T).$$
From this, it follows that conclusions (8) hold true even if the assumptions $X_r(t) = \mathbb{R}^n$ and $X_c(t) = \mathbb{R}^n$ are removed.
6. The impossibility of defining in general a controllability Gramian leads to the question of working out a controllability test of periodic systems based on the reachability Gramian. A test of this type can be found in Bittanti and Bolzern (1985,b).

The discrete-time version of the spectral characterizations of the structural properties presented in Sect. 2.5 can be stated as follows (Bittanti and Bolzern, 1985,b, Sect. 6).

Theorem 3
System (5) is reachable [controllable] at time $t$ if and only if the following H-condition at $t$ is satisfied.

H-condition at t (discrete-time)
(i) For reachability: Given the time point $t$, the matrix
$$[\,sI - \Phi(t+T,t) \;\;\; W_r(t,t+T)\,] \tag{9}$$
is full rank on $\Lambda$.
(ii) For controllability: Given the time point $t$, the matrix (9) is full rank on $\Lambda_0$. □
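As a small check of Theorem 3, the rank test (9) can be applied to the scalar system of Example 2. The sketch below is an illustration only: it assumes the scalar case, where the "matrix" (9) reduces to the pair of numbers $[\,s - \Phi(t+T,t),\; W_r(t,t+T)\,]$ and full rank simply means that at least one of them is nonzero. The function names are hypothetical helpers.

```python
T = 2  # period of the system in Example 2


def a(t):
    """a(t) = 0 for t even, 1 for t odd."""
    return 0 if t % 2 == 0 else 1


b = a  # in Example 2, b(t) = a(t)


def phi(t, tau):
    """Scalar transition Phi(t, tau) = a(t-1) a(t-2) ... a(tau), Phi(tau, tau) = 1."""
    p = 1
    for k in range(tau, t):
        p *= a(k)
    return p


def reach_gramian(tau, t):
    """W_r(tau, t) from eq. (6), scalar case."""
    return sum(phi(t, j) * b(j - 1) ** 2 * phi(t, j) for j in range(tau + 1, t + 1))


def h_test(t, spectrum):
    """True if [s - Phi(t+T,t), W_r(t,t+T)] is nonzero for every s in `spectrum`."""
    m = phi(t + T, t)
    w = reach_gramian(t, t + T)
    return all((s - m) != 0 or w != 0 for s in spectrum)


def h_reachable(t):
    # Lambda: the spectrum of the scalar monodromy "matrix"
    return h_test(t, {phi(t + T, t)})


def h_controllable(t):
    # Lambda_0: the nonzero elements of Lambda (empty here, so the test holds vacuously)
    return h_test(t, {s for s in {phi(t + T, t)} if s != 0})


for t in (0, 1):
    print(t, "reachable:", h_reachable(t), "controllable:", h_controllable(t))
```

Under these assumptions the test reports reachability at even time points only and controllability at every time point, which agrees with the subspaces computed in Example 2.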
Remark
7. As in c o n t i n u o u s - t i m e ,
if the H - c o n t r o l l a b i l i t y
is s a t i s f i e d
at a time point,
point.
follows
As
it
the r e a c h a b i l i t y conclusion
4. K a l m a n
does
not a p p l y
system
controllable-observable continuous-time
riodic
systems
is b a s e d
Gramians,
which
dimensions by means
the
on same
condition.
Theory
is t h a t
systems,
in 4 parts,
controllable-unobservable,
this
result
structural
is e x t e n d e d
of(Bittanti
on the t i m e - i n v a r i a n c e
of a T - p e r i o d i c
any t i m e - i n v a r i a n t
parts.
to pe-
and Bolzern, of the r a n k
i.e. un-
and u n c o n t r o l l a b l e - u n o b s e r v a b l e
corresponds
of the
systems,
can be d e c o m p o s e d
in the a p p e n d i x
The proof
of p e r i o d i c
time
discussion
to the H - r e a c h a b i l i t y
in S y s t e m
the controllable-observable,
For
previous
at e a c h
Decomposition
result
and c o n t i n u o u s - t i m e
the
properties
Canonical
A fundamental
from
it is s a t i s f i e d
condition
1985,c).
of the
to the time
invariance
of the
subspaces.
The r e s u l t
says
state
transformation,
that,
any T - p e r i o d i c
164
system can be decomposed into the 4 parts of the Kalman canonical decomposition. As discussed in the previous section, the reachability and controllability subspaces may not coincide for a discrete-time periodic system.
Dually,
the observability and reconstructibi-
lity subspaces may not coincide too. Therefore, decompositions
sbould have to be considered,
cal d e c o m p o s i t i o n s based on the pairs bility),
(controllability,
reconstructibility)
and
four canonical
i.e. the canoni-
(reachability,
observability),
(controllability,
observa-
(reachability, reconstructibility).
To see which one of these decompositions can actually be used for discrete-time periodic systems, note that the reachability subsPace and the dual observability subspace may have timevarying dimensions trollability,
(Sect.
3.4). Consequently,
reconstructibility)
only the
(con-
decomposition is a candidate
for a canonical decomposition of general validity.
This is
why, contrary to the continuous-time case, the theory cannot be based on the Gramian matrices.
Indeed, as already observed,
only the reachability and observability Gramians can be defined for discrete-time systems. In
(Bittanti and Bolzern,
1984,b and 1986), a theory for the
Kalman canonical decomposition of any time-varying and discrete -time system is worked out. Letting Xa(t) be either the reachability or
the
controllability
subspace at t and Xe(t) be either
the unobservability or unreconstructibility proven that an
subspace,
it is
(a,e) canonical decomposition exists if the
following three conditions are met with.
(i) $\dim X^a(t) = \text{const.}$
(ii) $\dim X^u(t) = \text{const.}$
(iii) $\dim X^a(t) \cap X^u(t) = \text{const.}$
(i)-(iii) are named dimension-invariance condition.
In (Bittanti and Bolzern, 1986), it is also shown that, for periodic systems, the dimension-invariance condition is verified by taking $a$ = controllability and $u$ = reconstructibility.
In conclusion, discrete-time periodic systems can be canonically decomposed by making reference to the pair (controllability, reconstructibility). By focusing on periodic systems only, this result is also derived in Grasselli (1984).
5. Extended Structural Properties

The notions of stabilizability and detectability are here called extended structural properties. Since the detectability results can be derived from the ones relative to stabilizability by duality, only stabilizability is considered in this section.

As a concise introduction to the stabilizability concept, consider the time-invariant system:

$$\dot{x}(t) = A x(t) + B u(t) \tag{10}$$

or

$$x(t+1) = A x(t) + B u(t) \tag{11}$$

A classical result of System Theory, see e.g. Kalman (1969), is that, if $(A,B)$ is reachable, then the system can be stabilized by a suitable feedback control law. Stated differently, there exists a matrix $K \in R^{m \times n}$ such that

$$\dot{x}(t) = A x(t) + B u(t), \qquad u(t) = K x(t)$$

or

$$x(t+1) = A x(t) + B u(t), \qquad u(t) = K x(t)$$

is asymptotically stable, respectively. Thus, reachability is a sufficient condition for the existence of a stabilizing feedback control law. However, it is not necessary. This leads to the notion of stabilizability. Precisely, according to a classical definition by Wonham (1968), a system is stabilizable whenever there exists a stabilizing feedback control law.

While the PBH condition for reachability of a system is the same in continuous or discrete-time, the PBH condition for the stabilizability of time-invariant systems is different in continuous or discrete-time. Precisely, the system (10) [(11)] is stabilizable if and only if the matrix

$$[\, sI - A \quad B \,]$$

is full rank for all eigenvalues $s$ of $A$ with nonnegative real part [with modulus greater than or equal to 1].

The main characterizations of the stabilizability notion for continuous-time periodic systems can be summarized as follows (Bittanti and Bolzern, 1985,b) and also (Bittanti and Bolzern, 1984,c) and (Shayman, 1984). Recall that $\Pi$ is the set of the eigenvalues of $\Phi(t+T,t)$ not belonging to the open unit disk (unstable part of the spectrum).

Theorem 4

The following statements are equivalent to each other.

(a) There exists a T-periodic matrix $K: R \to R^{m \times n}$ such that

$$\dot{x}(t) = [A(t) - B(t) K(t)]\, x(t)$$

is asymptotically stable.

(b) The uncontrollable part of system (1) is asymptotically stable.

(c) For at least a time point $t \in [0,T]$, the matrix

$$[\, sI - \Phi(t+T,t) \quad W_c(t,t+T) \,]$$

is full rank on $\Pi$.
Characterizations (a) and (b) are direct extensions to periodic systems of analogous characterizations for time-invariant systems. Condition (c) is a natural extension of the H-controllability condition (Sect. 2.5). In view of the discrete-time version of the PBH test, (c) is equivalent to saying that system (1) is stabilizable if and only if the discrete-time pair defined by $(\Phi(t+T,t),\ W_c(t,t+T))$ is stabilizable.

The discrete-time version of this theorem is given in (Bolzern, 1986) and can be stated as follows.

Theorem 5

The following statements are equivalent:

(a) There exists a T-periodic matrix $K: Z \to R^{m \times n}$ such that

$$x(t+1) = [A(t) - B(t) K(t)]\, x(t)$$

is asymptotically stable.

(b) The uncontrollable part of system (1) is asymptotically stable.

(c) For at least one time point $t \in [0,T]$, the matrix

$$[\, sI - \Phi(t+T,t) \quad W_r(t,t+T) \,]$$

is full rank on $\Pi$.
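As a hedged illustration of the rank test in its simplest setting, consider a time-invariant discrete-time system of the form (11). The matrices below are assumed purely for illustration and are not taken from the cited references; the feedback convention $u(t) = Kx(t)$ is the one used for (11).

```latex
A = \begin{bmatrix} 2 & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix}, \qquad
B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.

% Unstable eigenvalue s = 2 (modulus >= 1): the test matrix has full rank 2.
[\, 2I - A \quad B \,] =
\begin{bmatrix} 0 & 0 & 1 \\ 0 & \tfrac{3}{2} & 0 \end{bmatrix},
\qquad \operatorname{rank} = 2.

% Stable eigenvalue s = 1/2: the rank drops, so (A,B) is not reachable.
[\, \tfrac{1}{2} I - A \quad B \,] =
\begin{bmatrix} -\tfrac{3}{2} & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},
\qquad \operatorname{rank} = 1.

% A stabilizing feedback u(t) = K x(t):
K = \begin{bmatrix} -2 & 0 \end{bmatrix}, \qquad
A + BK = \begin{bmatrix} 0 & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix}.
```

The pair is therefore stabilizable without being reachable: the rank condition fails only at the eigenvalue $1/2$, which lies inside the unit disk and needs no correction by feedback.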
The modal characterizations of stabilizability (point (c) of Thm. 4 and 5) are extensions of the H-conditions previously introduced. Needless to say, if the matrix $[\, sI - \Phi(t+T,t) \quad W_c(t,t+T) \,]$ is full rank on $\Pi$ at a given t, then the same matrix is full rank on $\Pi$ at any time. An analogous statement holds for discrete-time systems.

6. Stochastic Linear Periodic Systems

This section is devoted to the study of linear periodic systems subject to inputs which are stochastic processes of periodic type. Precisely, focusing in this Section on the continuous-time case only, the system taken into consideration is

$$dx(t) = A(t)\, x(t)\, dt + B(t)\, dv(t) \tag{12}$$

where v is an m-vector valued stochastic process characterized as follows. Let $m(t) := E[v(t)]$. Then

$$v(t) = m(t) + z(t) \tag{13}$$

where z satisfies

$$dz(t) = q(t)\, dw(t), \qquad z(0) = 0. \tag{14}$$

In (14), w is an m-dimensional standard Wiener process, while q is T-periodic and continuous. Moreover, it is also assumed that function m in (13) is T-periodic, continuous and piecewise continuously differentiable. Therefore, (12) is a stochastic differential equation of the form

$$dx(t) = [A(t)x(t) + B(t)\dot{m}(t)]\, dt + \eta(t)\, dw(t) \tag{15}$$

with $\eta(t) := B(t)q(t)$.
For a given random vector $x(0)$ independent of $\{w(t),\ t \ge 0\}$, the meaning of (15) is precisely that

$$x(t) = x(0) + \int_0^t [A(\sigma)x(\sigma) + B(\sigma)\dot{m}(\sigma)]\, d\sigma + \int_0^t \eta(\sigma)\, dw(\sigma), \tag{16}$$

the stochastic integral being understood in the sense of Ito (Wong and Hajek, 1985, Ch. 4). The solution of (15), or its equivalent integral version (16), is given by

$$x(t) = \Phi(t,0)x(0) + \int_0^t \Phi(t,\sigma)B(\sigma)\dot{m}(\sigma)\, d\sigma + \int_0^t \Phi(t,\sigma)\eta(\sigma)\, dw(\sigma) \tag{17}$$

as follows readily from the Ito differential rule. Should $x(0)$ be Gaussian, say with expected value $\bar{x}_0$ and covariance matrix $\Gamma_0$, then (17) defines a Gaussian process, with expected value

$$\bar{x}(t) = \Phi(t,0)\bar{x}_0 + \int_0^t \Phi(t,\sigma)B(\sigma)\dot{m}(\sigma)\, d\sigma \tag{18}$$

and covariance matrix

$$\Gamma(t) = \Phi(t,0)\Gamma_0\Phi(t,0)' + \int_0^t \Phi(t,\sigma)B(\sigma)Q(\sigma)B(\sigma)'\Phi(t,\sigma)'\, d\sigma \tag{19}$$

where $Q(t) = q(t)q(t)'$, so that $B(t)Q(t)B(t)' = \eta(t)\eta(t)'$ represents the covariance of the noise entering (15).

From (18) and (19), it can be derived by simple computations that

$$\dot{\bar{x}}(t) = A(t)\bar{x}(t) + B(t)\dot{m}(t) \tag{20}$$

$$\dot{\Gamma}(t) = A(t)\Gamma(t) + \Gamma(t)A(t)' + B(t)Q(t)B(t)' \tag{21}$$

with the initial conditions $\bar{x}(0) = \bar{x}_0$ and $\Gamma(0) = \Gamma_0$.

The process covariance function,

$$\gamma(t,\tau) := E[(x(t) - \bar{x}(t))(x(\tau) - \bar{x}(\tau))'],$$

possesses the following properties:

$$\gamma(t,\tau) = \gamma(\tau,t)'$$

and, for $t > \tau$,

$$\gamma(t,\tau) = \Phi(t,\tau)\Gamma(\tau). \tag{22}$$

Eq. (22) is well known (Brockett, 1970) and can be derived from (17) in view of (2).

Since A, B, m and Q are T-periodic, it is natural to investigate the existence of periodic solutions of (20) and (21). If such is the case,

$$\gamma(t+T, \tau+T) = \Phi(t+T, \tau+T)\Gamma(\tau+T) = \Phi(t,\tau)\Gamma(\tau) = \gamma(t,\tau).$$
A stochastic process with T-periodic expected value and covariance function of this type, that is, satisfying

$$\gamma(t+T, \tau+T) = \gamma(t,\tau), \qquad \forall t, \tau,$$

is called cyclo-stationary (Gardner and Franks, 1975). Processes of this type find numerous applications in various areas. They can be used for the modelling of seasonal time series or to describe uncertain signals of periodic type. For an illustrative example, see (Bittanti and Hernandez, 1986).

In conclusion, the problem is to find for which initial conditions there are periodic solutions to (20) and (21), that is, solutions satisfying

$$\bar{x}(T) = \bar{x}(0) \tag{23}$$

$$\Gamma(T) = \Gamma(0) \tag{24}$$

respectively.
More precisely, for the Lyapunov equation (21), the problem is to find a T-periodic solution which is symmetric and positive semidefinite at each time point.

As for the expected value, it is easy to see that, if no characteristic multiplier is equal to 1, then equation (20) admits a unique solution satisfying (23).

Consider now the Lyapunov equation (21) and, given an $n \times n$ real symmetric matrix $\bar{\Gamma}$, let $\Gamma_\tau(t)$ be a solution such that $\Gamma_\tau(\tau) = \bar{\Gamma}$. It is well known that, for any $\bar{\Gamma}$, (21) has a unique solution $\Gamma_\tau(t)$, $-\infty < t < +\infty$, see e.g. Brockett (1970). Let $\tilde{B}$ be a T-periodic matrix such that

$$\tilde{B}(t)\tilde{B}(t)' = B(t)Q(t)B(t)'$$

and denote by $W_r(\tau,t)$ the reachability Grammian matrix of $(A,\tilde{B})$ (see (3)), i.e.

$$W_r(\tau,t) = \int_\tau^t \Phi(t,\sigma)\tilde{B}(\sigma)\tilde{B}(\sigma)'\Phi(t,\sigma)'\, d\sigma. \tag{25}$$

For $t > \tau$, the solution of the Lyapunov equation is given by

$$\Gamma_\tau(t) = \Phi(t,\tau)\bar{\Gamma}\Phi(t,\tau)' + W_r(\tau,t). \tag{26}$$

Setting now $\tau = 0$, $t = T$ and imposing the periodicity constraint (24), the following equation is obtained:

$$\bar{\Gamma} = \Phi(T,0)\bar{\Gamma}\Phi(T,0)' + W_r(0,T).$$

This is the discrete-time algebraic Lyapunov equation. It can be shown (Graham, 1981) that, if the characteristic multipliers lie within the unit circle, this equation admits a unique solution.

From these results, it follows that: if the system is asymptotically stable, then both (20) and (21) admit a unique T-periodic solution. As a matter of fact, under the assumption of asymptotic stability, the following can be shown to hold true (Bittanti, Bolzern and Colaneri, 1984):

Consider the solution $\Gamma_\tau(t)$ of (21) such that $\Gamma_\tau(\tau) = \bar{\Gamma}$. Then $\Gamma_\tau(t)$ converges to the periodic solution of (21) as $\tau \to -\infty$, for whichever $\bar{\Gamma}$.

In particular, taking $\bar{\Gamma} = 0$, (26) entails that $W_r(-\infty,t)$ is the T-periodic solution. Moreover, in view of (25), it is also apparent that this solution is positive semidefinite (at each t). In fact, should $(A,\tilde{B})$ be reachable, the solution is obviously positive definite (at each t). This last conclusion is part of the so-called Periodic Lyapunov Lemma, which can be stated as follows (Bittanti, Bolzern and Colaneri, 1985).

Theorem 6

The system is asymptotically stable if and only if, for any $\tilde{B}$ such that $(A,\tilde{B})$ is reachable, there exists a T-periodic positive definite solution of the Lyapunov equation (21).
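The discrete-time algebraic Lyapunov equation obtained above by imposing periodicity can be explored numerically. The following is a minimal sketch, assuming a hypothetical 2 x 2 monodromy matrix `PHI` (standing for $\Phi(T,0)$) and a hypothetical Grammian `W` (standing for $W_r(0,T)$); neither comes from the text. It iterates the map $\Gamma \mapsto \Phi \Gamma \Phi' + W$ starting from $\Gamma = 0$, which mirrors the convergence property stated for $\bar{\Gamma} = 0$ and is only expected to converge when all characteristic multipliers lie strictly inside the unit circle.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve_discrete_lyapunov(phi, w, tol=1e-12, max_iter=10000):
    """Fixed-point iteration G <- phi G phi' + w, started from G = 0.

    Assumes every eigenvalue of phi has modulus < 1; otherwise the
    iteration is not expected to converge and an error is raised.
    """
    n = len(phi)
    g = [[0.0] * n for _ in range(n)]
    phi_t = transpose(phi)
    for _ in range(max_iter):
        new = matmul(matmul(phi, g), phi_t)
        new = [[new[i][j] + w[i][j] for j in range(n)] for i in range(n)]
        step = max(abs(new[i][j] - g[i][j]) for i in range(n) for j in range(n))
        g = new
        if step < tol:
            return g
    raise RuntimeError("no convergence: are the multipliers inside the unit circle?")


# Hypothetical data, chosen only for illustration.
PHI = [[0.5, 0.1], [0.0, 0.25]]   # multipliers 0.5 and 0.25
W = [[1.0, 0.0], [0.0, 1.0]]      # positive definite Grammian

G = solve_discrete_lyapunov(PHI, W)
check = matmul(matmul(PHI, G), transpose(PHI))
residual = max(abs(check[i][j] + W[i][j] - G[i][j])
               for i in range(2) for j in range(2))
```

With these assumed data the computed `G` is symmetric and positive definite, in line with the statement of Theorem 6 for a reachable pair; a dedicated solver would normally be preferred over this plain iteration for larger problems.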
An extended version of this lemma can be given under the assumption that $(A(t), \tilde{B}(t))$ be stabilizable only.

Theorem 7

The system is asymptotically stable if and only if, for any $\tilde{B}$ such that $(A,\tilde{B})$ is stabilizable, there exists a T-periodic positive semidefinite solution of the Lyapunov equation (21).

Theorem 7 is proven in (Bittanti, Bolzern and Colaneri, 1985) by means of a decomposition technique. Precisely, the Lyapunov equation is decomposed into three subequations corresponding to the reachability canonical decomposition of $(A,\tilde{B})$.
One could wonder whether the Lyapunov equation may admit a T-periodic positive semidefinite solution even if the system is not stable. In case the system is not asymptotically stable, matrix $\Phi(T,0)$ has some eigenvalues on or outside the unit circle. If a characteristic multiplier lies on the unit circle, then, if the pair $(A,\tilde{B})$ is stabilizable, (21) does not admit any T-periodic solution (see (Shayman, 1984) and (Bittanti and Colaneri, 1986, Thm. 2(a))). Suppose now that, say, p characteristic multipliers have modulus greater than 1, while the remaining n-p ones have modulus lower than 1. Then, in (Wimmer and Ziebur, 1975) and (Bittanti and Colaneri, 1986), it is shown that, if $(A,\tilde{B})$ is reachable or stabilizable, the T-periodic solution of (21) (if any) has p negative eigenvalues for each time-point. The remaining n-p ones are all positive if $(A,\tilde{B})$ is reachable, or nonnegative if $(A,\tilde{B})$ is stabilizable. Since it is obvious that only the positive semidefinite solutions of (21) correspond to a cyclostationary solution of (12), the conclusion is the following: Assume that $(A,\tilde{B})$ is stabilizable. Then, if the system is not asymptotically stable, eq. (12) admits no cyclostationary solution.

The analysis of the discrete-time periodic Lyapunov equation is currently underway and partially reported in (Bolzern and Colaneri, 1986).
Acknowledgment

The author is grateful to Professors Guido Guardabassi and Diego Bricio Hernandez for their helpful and stimulating comments.
References

Bailey, J.E. (1973): Periodic Operation of Chemical Reactors: A Review. Chem. Eng. Commun. 1, 111-124.
Bekir, E. and R.S. Bucy (1976): Periodic Equilibria for Matrix Riccati Equations. Stochastics 2, 1-104.
Belevitch, V. (1968): Classical Network Theory. Holden Day, San Francisco.
Bernstein, D.S. and E.G. Gilbert (1980): Optimal Periodic Control: The Π Test Revisited. IEEE Trans. Automatic Control AC-25, 673-684.
Bittanti, S. and P. Bolzern (1984,a): Can the Kalman Canonical Decomposition be performed for a Discrete-time Linear Periodic System? 1st Latin American Conference on Automatica, Campina Grande, Brazil, 449-453.
Bittanti, S. and P. Bolzern (1984,b): Canonical Decomposition and Discrete-time Linear Systems. 23rd Conference of Decision and Control, Las Vegas, U.S.A., 1737, 1738.
Bittanti, S. and P. Bolzern (1984,c): Four Equivalent Notions of Stabilizability of Periodic Linear Systems. 3rd American Control Conference, San Diego, U.S.A., 1321-1323.
Bittanti, S. and P. Bolzern (1985,a): Reachability and Controllability of Discrete-time Linear Systems. IEEE Trans. Automatic Control 30, 399-491.
Bittanti, S. and P. Bolzern (1985,b): Discrete-time Linear Periodic Systems: Grammian and Modal Criteria for Reachability and Controllability. International J. Control 41, 899-928.
Bittanti, S. and P. Bolzern (1985,c): Stabilizability and Detectability of Linear Periodic Systems. Systems and Control Letters 6, 141-145. Plus Addendum, to appear in Systems and Control Letters (1986), 7, 73.
Bittanti, S. and P. Bolzern (1986): On the Structure Theory of Discrete-time Linear Systems. International J. Systems Science 17, 33-47.
Bittanti, S., P. Bolzern and P. Colaneri (1984): Stability Analysis of Linear Periodic Systems via the Lyapunov Equation. 9th IFAC World Congress, Budapest, 8, 169-172.
Bittanti, S., P. Bolzern and P. Colaneri (1985): The Extended Periodic Lyapunov Lemma. Automatica 5, 603-605.
Bittanti, S., P. Bolzern, P. Colaneri and G. Guardabassi (1983): H and K-Controllability of Linear Periodic Systems. 22nd Conference on Decision and Control, S. Antonio, U.S.A., 1376-1379.
Bittanti, S. and P. Colaneri (1986): Lyapunov and Riccati Equations: Periodic Inertia Theorems. IEEE Trans. Automatic Control (to appear).
Bittanti, S., P. Colaneri and G. De Nicolao (1986): Discrete-time Periodic Systems: a note on the Reachability and Controllability interval length. Centro Teoria Sistemi, Politecnico di Milano, Int. Rep. 86-003.
Bittanti, S., P. Colaneri and G. Guardabassi (1984): H-Controllability and Observability of Linear Periodic Systems. SIAM J. Control and Optimization 22, 889-893.
Bittanti, S., G. Fronza and G. Guardabassi (1973): Periodic Control: A Frequency Domain Approach. IEEE Trans. Automatic Control 18, 33-38.
Bittanti, S., G. Guardabassi, C. Maffezzoni and L. Silverman (1978): Periodic Systems: Controllability and the Matrix Riccati Equation. SIAM J. Control and Optimization 16, 37-40.
Bittanti, S. and D.B. Hernandez (1986): The Simple Pendulum as an Illustrative Example of the Periodic Control Problem. Centro Teoria dei Sistemi, Politecnico di Milano, Int. Rep. 86-010.
Bolzern, P. (1986): Criteria for Reachability, Controllability and Stabilizability of Discrete-time Linear Periodic Systems. V Polish-English Seminar on Real-Time Process Control, Warsaw, Poland.
Bolzern, P. and P. Colaneri (1986): Existence and Uniqueness Conditions for the Periodic Solutions of the Discrete-time Periodic Lyapunov Equation. Centro Teoria dei Sistemi, Politecnico di Milano, Int. Rep. 86-011.
Brockett, R.W. (1970): Finite Dimensional Linear Systems. J. Wiley and Sons.
Brunovsky, P. (1969): Controllability and Linear Closed loop Controls in Linear Periodic Systems. J. Differential Equations 6, 296-313.
Chen, C.T. (1970): Introduction to Linear System Theory. Holt, Rinehart and Winston.
Colonius, F. (1985,a): Optimality for Periodic Control of Functional Differential Systems. J. Mathematical Analysis and Applications (to appear).
Colonius, F. (1985,b): The High Frequency Pi-Criterion for Retarded Systems. IEEE Trans. Automatic Control 11, 1045-1048.
DaPrato, G. (1984): Periodic Solutions of Infinite Dimensional Riccati Equations. Rendiconti Accademia Nazionale dei Lincei (to appear).
Dorato, P. and A.H. Levis (1971): Optimal Linear Regulators: the Discrete-time Case. IEEE Trans. Automatic Control 6, 613-620.
Dorato, P. and H.K. Knudsen (1979): Periodic Optimization with Applications to Solar Energy Control. Automatica 15, 673-679.
Gardner, W.A. and D.E. Franks (1975): Characterization of Cyclo-stationary Random Processes. IEEE Trans. Information Theory 21, 1-24.
Gilbert, E.G. (1977): Optimal Periodic Control: A General Theory of Necessary Conditions. SIAM J. Control and Optimization 15, 717-746.
Gilbert, E.G. and D.T. Lyons (1981): The Improvement of Aircraft Specific Range by Periodic Control. AIAA Guidance and Control Conference, Albuquerque.
Graham, A. (1981): Kronecker Products and Matrix Calculus with Applications. Ellis Horwood Limited, Chichester.
Grasselli, O.M. (1984): A Canonical Decomposition of Linear Periodic Discrete-time Systems. International J. Control 40, 201-214.
Guardabassi, G. (1971): Optimal Steady State Versus Periodic Control. Ricerche di Automatica 2, 240-252.
Guardabassi, G. (1976): The Optimal Periodic Control Problem. Journal A 17, 75-83.
Halanay, A. (1966): Differential Equations. Academic Press, New York.
Hautus, M.L.J. (1969): Controllability and Observability Conditions of Linear Autonomous Systems. Indag. Math. 72, 443-448.
Hernandez, V. and L. Jodar (1985): Boundary Problems and Periodic Riccati Equations. IEEE Trans. Automatic Control 11, 1131-1133.
Hewer, G.A. (1975): Periodicity, Detectability and the Matrix Riccati Equation. SIAM J. Control 13, 1235-1251.
Horn, F.J.M. and R.C. Lin (1967): Periodic Processes: A Variational Approach. Ind. Eng. Chem. Process Des. Dev. 6, 1, 21-30.
Horn, F.J.M. and J.E. Bailey (1968): An Application of the Theorem of Relaxed Control to the Problem of Increasing Catalyst Selectivity. J. Optimization Theory and Applications 2, 441-449.
Houlihan, S.C., E.M. Cliff and H.J. Kelley (1982): Study of Chattering Cruise. Journal Aircraft 19, 119-124.
Johnson, C.D. (1966): Invariant Hyperplanes for Linear Dynamical Systems. IEEE Trans. Automatic Control 11, 113-116.
Kabamba, P.T. (1985): Monodromy Eigenvalue Assignment in Linear Periodic Systems. 24th Conference on Decision and Control, Ft. Lauderdale, U.S.A., 177, 178.
Kalman, R.E. (1969): Theory of Regulators for Linear Plants. In: Kalman R.E., P.L. Falb and M.A. Arbib: Topics in Mathematical System Theory. McGraw-Hill Co., New York.
Kano, H. and T. Nishimura (1979): Periodic Solutions of Matrix Riccati Equations with Detectability and Stabilizability. International J. Control 29, 471-487.
Kern, G. (1980): Linear Closed-loop Control in Linear Periodic Systems with Application to Spin-stabilized Bodies. International J. Control 31, 905-916.
Khandelwal, D.N., J. Sharma and L.M. Ray (1979): Optimal Periodic Maintenance of a Machine. IEEE Trans. Automatic Control 24, 513.
Khargonekar, P.P., K. Poolla and A. Tannenbaum (1985): Robust Control of Linear Time-invariant Plants Using Periodic Compensation. IEEE Trans. Automatic Control 11, 1088-1098.
Kono, M. (1980): Eigenvalue Assignment in Linear Periodic Discrete-time Systems. International J. Control 1, 149-158.
Maffezzoni, C. (1974): Hamilton-Jacobi Theory for Periodic Control Problems. J. Optimization Theory and Applications 14, 21-29.
Markus, L. (1973): Optimal Control of Limit Cycles or what Control Theory can do to Cure a Heart Attack or to Cause One. Symposium on Ordinary Differential Equations, Minneapolis, Minnesota (1972). W.A. Harris, Y. Sibuya, eds., Springer-Verlag, Berlin.
Matsubara, M., N. Nishimura, N. Watanabe and K. Onogi (1981): Periodic Control Theory and Applications. Research Reports of Automatic Control Laboratory Vol. 28, Faculty of Engineering, Nagoya University.
Matsubara, M. and K. Onogi (1978): Stabilized Suboptimal Periodic Control of a Chemical Reactor. IEEE Trans. Automatic Control 23, 1005-1008.
Meyer, R.A. and C.S. Burrus (1976): Design and Implementation of Multirate Digital Filters. IEEE Trans. Acoustics, Speech and Signal Processing 1, 53-58.
Nistri, P. (1983): Periodic Control Problems for a Class of Nonlinear Periodic Differential Systems. Nonlinear Analysis, Theory, Methods and Applications 7, 79-90.
Noldus, E. (1975): A Survey of Optimal Periodic Control of Continuous Systems. Journal A 16, 11-16.
Onogi, K. and M. Matsubara (1980): Structure Analysis of Periodically Controlled Chemical Processes. Chem. Eng. Sci. 34, 1009-1019.
Popov, V.M. (1973): Hyperstability of control systems. Springer, Berlin.
Schädlich, K., U. Hoffmann and H. Hofmann (1983): Periodical Operation of Chemical Processes and Evaluation of Conversion Improvements. Chemical Engineering Science 38, 1375-1384.
Shayman, M.A. (1984): Inertia Theorems for the Periodic Lyapunov Equation and Periodic Riccati Equation. Systems and Control Letters 4, 27-32.
Shayman, M.A. (1985): On the Phase Portrait of the Matrix Riccati Equation Arising from the Periodic Control Problem. SIAM J. Control and Optimization 23, 717-751.
Sincic, D. and J.E. Bailey (1978): Optimal Periodic Control of Variable Time-delay Systems. International J. Control 27, 547-555.
Speyer, J.L. (1973): On the Fuel Optimality of Cruise. J. Aircraft 10, 763-764.
Speyer, J.L. (1976): Non-optimality of Steady-state Cruise for Aircraft. AIAA Journal 14, 1604-1610.
Speyer, J.L. and R.T. Evans (1984): A Second Variational Theory of Optimal Periodic Processes. IEEE Trans. Automatic Control 29, 138-148.
Valko, P. and G.A. Almasy (1982): Periodic Optimization of Hammerstein-type Systems. Automatica 18, 245-148.
Watanabe, N., Y. Nishimura and M. Matsubara (1976): Singular Control Test for Optimal Periodic Control Problems. IEEE Trans. Automatic Control 21, 609-610.
Watanabe, N., K. Onogi and M. Matsubara (1981): Periodic Control of Continuous Stirred Tank Reactors - I, Chem. Eng. Sci. 36, 809-818, II ibid. 37, 745-752.
Watanabe, N., H. Kurimoto and M. Matsubara (1984): Periodic Control of Continuous Stirred Tank Reactors - III, Case of multistage reactors. Chem. Eng. Sci. 39, 31-36.
Wimmer, H.K. and A.D. Ziebur (1975): Remarks on Inertia Theorems for Matrices. Czechoslovak Mathematical Journal 25, 556-561.
Wong, E. and B. Hajek (1985): Stochastic Processes in Engineering Systems. Springer-Verlag, Berlin.
Wonham, W.M. (1968): On a Matrix Riccati Equation for Stochastic Control. SIAM Journal Control 6, 681-698.
Yakubovich, V.A. and V.M. Starzhinskii (1975): Linear Differential Equations with Periodic Coefficients. J. Wiley, New York.
Chapter 6

Numerical Problems in Linear System Theory

Daniel Boley and Sergio Bittanti

1. Introduction

The analysis of multivariable linear control systems involves the computation of matrix problems, ranging from linear equations to matrix rank and eigenvalues. In general, the techniques to be used on a computer are not the same as those used for hand computations. In this work, we outline a few techniques for computer calculations, illustrate why they are useful, and give examples of where these numerical problems arise in system theory.

We begin with a review of the simpler decompositions and related methods used to solve linear equations and rank problems, then go on to more sophisticated methods used for eigenvalue computation (Schur and Singular Value Decompositions). We conclude with a few applications of these concepts to the analysis of time-invariant linear systems.

2. Review of Simpler Computational Methods
2.1 - LU decomposition

We begin by introducing the concept of a matrix decomposition; that is, we try to reduce a matrix A to the product of several simpler matrices, from which we can calculate whatever it is we would like to calculate.

The first example is the LU decomposition, in which we decompose a matrix A into the product A = LU, where L, U are lower, upper triangular, respectively. This decomposition is computed using Gaussian Elimination. To see this, it is best to use an example. Consider

$$A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 1 & \ast \\ 2 & 1 & \ast \end{bmatrix} \tag{1}$$

where $\ast$ denotes an entry whose value plays no role in the computation of the multipliers. In Gaussian Elimination, the first step is to add multiples of row 1 to rows 2 and 3. This can be accomplished by multiplying A on the left by the matrix

$$M_1 = \begin{bmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{bmatrix}$$

where, in this case, $m_{21} = -1/3$, $m_{31} = -2/3$ are the multipliers. Then, the result is

$$M_1 A = \begin{bmatrix} 3 & 1 & 1 \\ 0 & 2/3 & \ast \\ 0 & 1/3 & \ast \end{bmatrix} \tag{2}$$

Then, in the next step we apply a matrix

$$M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & m_{32} & 1 \end{bmatrix}$$

where $m_{32} = -1/2$. This has the effect of adding $m_{32} = -1/2$ times row 2 to row 3. The final result is

$$U = M_2 M_1 A = \begin{bmatrix} 3 & 1 & 1 \\ 0 & 2/3 & \ast \\ 0 & 0 & \ast \end{bmatrix} \tag{3}$$

We note that $M_1$, $M_2$ are nonsingular ($\det M_i = 1$), so we may multiply both sides by $M_1^{-1} M_2^{-1}$ to obtain A = LU, where $L = M_1^{-1} M_2^{-1}$.

By comparing (1) with (2) and (3), note that the action of matrix $M_1$ is to set to zero all the subdiagonal elements of column 1 of A; then $M_2$ is used to set to zero all the subdiagonal elements of column 2. In the general case, where A is n x n, we must apply n-1 matrices, one for each column. Matrix $M_k$, k = 1, 2, ..., n-1, has the following structure:

$$M_k = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & m_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & m_{n,k} & & & 1 \end{bmatrix}$$

where the multipliers occupy the k-th column. Coefficients $m_{j,k}$, j = k+1, k+2, ..., n, will be referred to as the multipliers.

The last item we need to complete the description of the LU decomposition is: what is "L"? To see what form L has, we note that the inverse of $M_k$, as can be easily verified, is the same matrix as $M_k$ with the multipliers in the k-th column negated:

$$M_k^{-1} = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & -m_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & -m_{n,k} & & & 1 \end{bmatrix}$$

Secondly, we note that when we form the product $L = M_1^{-1} \cdots M_{n-1}^{-1}$, the result is simply to fill in all the multipliers below the diagonal. In our 3 x 3 example, we have:

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 1/3 & 1 & 0 \\ 2/3 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1/2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1/3 & 1 & 0 \\ 2/3 & 1/2 & 1 \end{bmatrix}$$

Here, one can see that the net effect is to collect all the multipliers from all the $M_k$, change their sign and place them below the diagonal in the corresponding position. On the diagonal, we have all "1"'s. Hence, the (i,j) position of L, i > j, is $-m_{ij}$, where $m_{ij}$ is the multiplier applied to row j when it is added to row i during the stage in which column j is being zeroed:

$$L = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ (-m_{ij}) & & 1 \end{bmatrix}$$

In conclusion, we have found a decomposition for A: A = LU, where L is lower triangular with "1"'s on the diagonal and U is upper triangular.
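The elimination procedure just described can be sketched in a few lines. This is a minimal illustration, assuming no zero pivot is met (no row interchanges are performed) and using exact fractions so that the multipliers are easy to read. The first two columns of the test matrix follow the 3 x 3 example; the last two entries of the third column are assumed here for illustration only.

```python
from fractions import Fraction


def lu_decompose(a):
    """Gaussian elimination without pivoting; returns (L, U) with A = L U."""
    n = len(a)
    u = [[Fraction(v) for v in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if u[k][k] == 0:
            raise ZeroDivisionError("zero pivot: row interchanges would be needed")
        for i in range(k + 1, n):
            m = -u[i][k] / u[k][k]        # multiplier m_{ik}
            for j in range(k, n):
                u[i][j] += m * u[k][j]    # add m times row k to row i
            l[i][k] = -m                  # L collects the negated multipliers
    return l, u


# Third-column entries of rows 2 and 3 are assumed, not taken from the text.
A = [[3, 1, 1], [1, 1, 1], [2, 1, 2]]
L, U = lu_decompose(A)
```

For this matrix the multipliers are -1/3, -2/3 and -1/2, so `L` carries 1/3, 2/3 and 1/2 below its unit diagonal, as in the example. Practical codes add row interchanges (pivoting), which this sketch deliberately omits.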
What can we do with this? We give 2 uses of this decomposition:

A. Solve Linear Equations

By using A = LU, the system Ax = b is equivalent to

$$LUx = b. \tag{4}$$

If we call y = Ux, we then reduce equation (4) to two triangular systems:

$$Ly = b$$

$$Ux = y.$$

Triangular systems are easily solved using the process known as "back-substitution". We note that if we also apply the row operations of Gaussian Elimination to the vector b, the result will be

$$M_{n-1} M_{n-2} \cdots M_1 b = L^{-1} b = L^{-1} A x = U x = y.$$

Then the solution x can be found by solving Ux = y. It turns out that the extra work involved in applying the row operations also to b to obtain y is exactly the same as the work to solve Ly = b for y. The two schemes are exactly equivalent, except that by using Ly = b, we may solve directly Ax = b, with a new right hand side b, without repeating the decomposition.
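The two triangular solves can be sketched as follows. The factors `L` and `U` below are those of an assumed 3 x 3 matrix A = [[3, 1, 1], [1, 1, 1], [2, 1, 2]] (its third column is an assumption made for illustration), and the function names are hypothetical. The point of the sketch is that the same factors serve several right-hand sides.

```python
from fractions import Fraction as F


def forward_substitution(l, b):
    """Solve L y = b for a lower triangular L, from the first row down."""
    y = []
    for i in range(len(b)):
        s = sum(l[i][j] * y[j] for j in range(i))
        y.append((b[i] - s) / l[i][i])
    return y


def back_substitution(u, y):
    """Solve U x = y for an upper triangular U, from the last row up."""
    n = len(y)
    x = [F(0)] * n
    for i in reversed(range(n)):
        s = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / u[i][i]
    return x


# Factors of the assumed matrix A = [[3, 1, 1], [1, 1, 1], [2, 1, 2]].
L = [[F(1), F(0), F(0)], [F(1, 3), F(1), F(0)], [F(2, 3), F(1, 2), F(1)]]
U = [[F(3), F(1), F(1)], [F(0), F(2, 3), F(2, 3)], [F(0), F(0), F(1)]]


def solve(b):
    """Solve A x = b by reusing the factors: L y = b, then U x = y."""
    return back_substitution(U, forward_substitution(L, [F(v) for v in b]))


x1 = solve([5, 3, 5])   # right-hand side equal to the sum of the columns of A
x2 = solve([3, 1, 2])   # right-hand side equal to the first column of A
```

Each call to `solve` costs only two triangular solves, since the decomposition itself is not repeated.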
B. Computing Determinant

Since the determinant of a product of matrices is equal to the product of the determinants, we may immediately write the determinant of A as:

$$\det A = \det L \cdot \det U = 1 \cdot (u_{11} u_{22} \cdots u_{nn}),$$

where $u_{ij}$ is the (i,j) element of U. Here, we have used the well known fact that the determinant of a triangular matrix is simply the product of the diagonal elements.

Using the LU decomposition is much faster than using the definition of det A (Stewart, 1973).

2.2 - Orthogonal Decomposition

2.2.1 - QR Decomposition
In the LU decomposition, we have applied matrices that are not orthogonal; they do not preserve lengths or angles of vectors. Since 2 vectors may be made almost parallel by such transformations, we would like to see what one can do with orthogonal transformations, which do preserve lengths and angles. An n x n matrix Q is orthogonal if its columns $q_i$ are mutually ortho-normal, i.e.

$$q_i' q_j = \begin{cases} 0, & \text{if } i \ne j \\ 1, & \text{if } i = j, \end{cases}$$

or, in matrix notation, Q'Q = I.

In this section, we show how one may triangularize a matrix using only orthogonal transformations, thereby preserving lengths and angles. Then we show why the use of orthogonal decompositions is particularly useful by giving an example of its use.

Consider a 3 x 3 matrix $A = [a_{ij}]$ with $a_{11} = 3$ and $a_{21} = 1$, as in example (1). We would like to reduce the first column to the form $[\,?\ \ 0\ \ 0\,]'$, where "?" denotes a nonzero element whose value is to be determined. We see how to do this using a "rotation", i.e. a transformation of the form

$$Q_1 = \begin{bmatrix} c & s & 0 \\ -s & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{5}$$

where, by orthogonality, $c^2 + s^2 = 1$. We will use $Q_1$ to zero element $a_{21}$, i.e.

$$\begin{bmatrix} c & s & 0 \\ -s & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ a_{31} \end{bmatrix} = \begin{bmatrix} ? \\ 0 \\ a_{31} \end{bmatrix}.$$

The second line yields:

$$-s \cdot 3 + c \cdot 1 = 0,$$

which, together with $c^2 + s^2 = 1$, yields

$$c = 3/\sqrt{10}, \qquad s = 1/\sqrt{10}.$$

Having defined $Q_1$, we apply it to A to obtain $Q_1 A$. Then, in the same way, we find the elements c and s of the rotation

$$Q_2 = \begin{bmatrix} c & 0 & s \\ 0 & 1 & 0 \\ -s & 0 & c \end{bmatrix} \tag{6}$$

to zero $a_{31}$. To complete the triangular decomposition, we need a third rotation $Q_3$ to zero $a_{32}$, obtaining finally $R = Q_3 Q_2 Q_1 A$.

In general, in the n x n case, we need r = n(n-1)/2 rotations to zero all the subdiagonal elements of A:

$$R = Q_r Q_{r-1} \cdots Q_1 A = \text{an upper triangular matrix}.$$

Let $Q = (Q_r Q_{r-1} \cdots Q_1)^{-1}$. By orthogonality:

$$Q = Q_1' Q_2' \cdots Q_r'.$$

We have now the so called QR decomposition or orthogonal triangularization of A:

$$A = QR,$$

with Q orthogonal and R upper triangular.

2.2.2 - Geometric Interpretation of a Rotation

A rotation only affects 2 components of a vector, as can be seen from (5) and (6). Hence, we may look at a representative problem in the 2-space. Consider the arbitrary 2 x 2 rotation:

$$Q = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}.$$

We may represent $c = \cos\varphi$, $s = \sin\varphi$ for some angle $\varphi$. Consider also a vector x of $R^2$, and denote by $\theta$ the angle between x and $e_1 = [1\ \ 0]'$:

$$x = \|x\| \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}.$$

Hence,

$$Qx = \|x\| \begin{bmatrix} c\cos\theta + s\sin\theta \\ -s\cos\theta + c\sin\theta \end{bmatrix} = \|x\| \begin{bmatrix} \cos(\theta - \varphi) \\ \sin(\theta - \varphi) \end{bmatrix}.$$

This means that Qx is the vector x rotated by the angle $-\varphi$.
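The triangularization by rotations of Section 2.2.1 can be sketched as follows. This is a minimal illustration in floating point arithmetic, with a hypothetical function name; each rotation acts on two rows only and is chosen so that one subdiagonal element becomes zero, in the order $a_{21}$, $a_{31}$, $a_{32}$ used in the text. The test matrix has $a_{11} = 3$ and $a_{21} = 1$ as in the example, while its remaining entries are assumed.

```python
import math


def givens_qr(a):
    """Return (Q, R) with A = Q R, zeroing subdiagonal entries by plane rotations."""
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    qt = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates Q'
    for j in range(n):
        for i in range(j + 1, m):
            if r[i][j] == 0.0:
                continue                      # nothing to zero
            h = math.hypot(r[j][j], r[i][j])
            c, s = r[j][j] / h, r[i][j] / h   # c^2 + s^2 = 1 and -s*r_jj + c*r_ij = 0
            for mat in (r, qt):               # apply the same rotation to R and Q'
                for k in range(len(mat[0])):
                    top, low = mat[j][k], mat[i][k]
                    mat[j][k] = c * top + s * low
                    mat[i][k] = -s * top + c * low
    q = [[qt[j][i] for j in range(m)] for i in range(m)]        # Q = (Q')'
    return q, r


# a11 = 3 and a21 = 1 as in the example; the other entries are assumed.
A = [[3, 1, 1], [1, 1, 1], [2, 1, 2]]
Q, R = givens_qr(A)
```

For the first rotation the code computes c = 3/sqrt(10) and s = 1/sqrt(10), the values derived above. The same routine also accepts rectangular matrices with more rows than columns.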
2.2.3 - QR Decomposition by Householder Transformations

As we have seen in 2.2.1, matrix Q of the QR decomposition can be obtained by multiplying n(n-1)/2 rotations. There is an alternative way of computing Q, based on the so-called "Householder transformations". The main advantage of these transformations is that one can zero out as many components of a vector as one likes by means of a single transformation, whereas, in general, by a rotation it is possible to zero out one component only. This implies that, to obtain the QR decomposition, one can use n-1 Householder transformations, instead of n(n-1)/2 rotations.

The Householder transformation (or "elementary reflection") can be introduced by making reference to $R^2$ as follows. We want to transform a vector x to a vector v along, e.g., axis $e_1 = [1\ \ 0]'$ such that $\|v\| = \|x\|$. This can be achieved by reflecting the vector x around x + v (see Fig. 1).

Figure 1. (The vector x, the target vector v along $e_1$, and the axis of reflection z = x + v.)

We go through the following steps (note that we know x and $v = \|x\| e_1$):

- Axis of reflection:

$$z = x + v = [\, x_1 + \|x\|, \ \ x_2 \,]'.$$

The corresponding unit vector is $z / \|z\|$.

- Project x onto the axis of reflection to obtain:

$$a = \frac{z z'}{\|z\|^2}\, x.$$

- Find the difference between x and its projection:

$$b = a - x.$$

- Reflect x around its projection a (or equivalently z):

$$v = x + 2b = x + 2(a - x) = 2a - x = -x + 2 \frac{z z'}{\|z\|^2}\, x = -\left( I - 2 \frac{z z'}{\|z\|^2} \right) x.$$

The matrix

$$P = I - 2 \frac{z z'}{\|z\|^2}$$

gives the Householder transformation.

Since v = -Px, we can conclude that, by such a linear transformation, one can "reflect" a vector x into any desired direction of the space. In n-space, we can pick the axis of the target vector so as to zero out at once as many components of a vector as we like.
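The construction of P can be checked numerically with a short sketch. It follows the steps above literally, with $z = x + \|x\| e_1$, and uses a hypothetical function name and an assumed test vector in $R^3$ to show that all components but the first are zeroed at once. It assumes x is not zero and not a negative multiple of $e_1$, since z would vanish in that case.

```python
import math


def householder_matrix(x):
    """Return P = I - 2 z z' / ||z||^2 with z = x + ||x|| e1 (z must be nonzero)."""
    n = len(x)
    norm_x = math.sqrt(sum(t * t for t in x))
    z = list(x)
    z[0] += norm_x                      # axis of reflection z = x + v, v = ||x|| e1
    zz = sum(t * t for t in z)          # ||z||^2
    return [[float(i == j) - 2.0 * z[i] * z[j] / zz for j in range(n)]
            for i in range(n)]


x = [3.0, 1.0, 2.0]                     # assumed test vector
P = householder_matrix(x)
Px = [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]
v = [-t for t in Px]                    # v = -P x, which should equal ||x|| e1
```

Here `Px` equals $-\|x\| e_1$, so `v` has first component $\sqrt{14}$ and its other two components vanish up to rounding, in agreement with v = -Px.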
194
2.2.4
Solving Least Squares Problems Using Orthogonal Decompositions
Let A e R mxn, m_>n, problem
b e R m and x e R n. The L i n e a r L e a s t Squares
is the p r o b l e m of f i n d i n g the f o l l o w i n g m i n i m u m
min I 1 ~ - b
II
X
The a l g o r i t h m of 2.2.1 may be a p p l i e d to r e c t a n g u l a r just as well as square ones. decomposition
matrices
In this case we see that the Q R
of the r e c t a n g u l a r
w h e r e Q ~ R m x m is o r t h o g o n a l ,
m a t r i x A is:
R ~ R nxn is upper t r i a n g u l a r
0 e R (m-n)xn is a b l o c k of zero elements.
As Q is o r t h o g o n a l ,
it does not change the norm.
llAx-bI' = I'Q' (Ax-b)'l
= " [RTx-c
Lol
II,
where
c = Q'b. Partitioning
C
=
this v e c t o r c o n f o r m a l l y
r
IC21 we have
IIAx - b II = II [RXc2-c I II
with [~I,
Hence:
and
195
In order to minimize this norm, we set

$$x = R^{-1} c_1. \tag{9}$$

Thus,

$$\min_x \|Ax - b\| = \|c_2\|.$$

To find the optimum value of $x$ given by (9) we have to solve system

$$Rx = c_1. \tag{10}$$

In this respect, it is worth noticing that a direct minimization of $\|Ax - b\|$ leads to the celebrated normal equations:

$$A'Ax = A'b. \tag{11}$$

In view of (8), system (11) is equivalent to

$$R'Rx = R'c_1. \tag{12}$$

It is important to observe that solving system (10) is preferable to solving (12), as one obtains fewer errors. For a complete discussion of this, see e.g. the error analysis of Linear Systems (Lawson and Hanson, 1974). However, we may give an example of the problem. Suppose that we are working on a computer with precision $10^{-7}$, i.e. we carry only 7 significant digits. Consider matrix

$$A = \begin{bmatrix} 1 & 1 \\ 10^{-4} & 0 \\ 0 & 10^{-4} \end{bmatrix}. \tag{13}$$
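Matrix (13) lends itself to a quick numerical experiment. The sketch below is only illustrative: the helper name `lstsq_qr`, the right-hand side `b`, and the use of IEEE single precision (`float32`, roughly 7 significant digits) as a stand-in for the 7-digit computer are all assumptions. It solves a least squares problem through (9)-(10) and compares the numerical rank of $A$ with that of the computed product $A'A$.

```python
import numpy as np

def lstsq_qr(A, b):
    """Least squares via A = Q [R; 0]: solve R x = c1 with c = Q'b; residual norm is ||c2||."""
    m, n = A.shape
    Q, R = np.linalg.qr(A, mode="complete")    # Q is m x m, R is m x n
    c = Q.T @ b
    x = np.linalg.solve(R[:n, :], c[:n])       # system (10)
    return x, np.linalg.norm(c[n:])

A = np.array([[1.0, 1.0], [1e-4, 0.0], [0.0, 1e-4]])
b = np.array([2.0, 1e-4, 1e-4])                # hypothetical data, fitted exactly by x = [1, 1]
x, resid = lstsq_qr(A, b)

A32 = A.astype(np.float32)                     # about 7 significant digits
gram32 = A32.T @ A32                           # the "10^-8" part is rounded away
rank_A32 = np.linalg.matrix_rank(A32)
rank_gram32 = np.linalg.matrix_rank(gram32)
print(x, resid, rank_A32, rank_gram32)
```

Working with $A$ directly keeps the small singular value $10^{-4}$ visible, whereas the product only sees its square.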
$A$ has rank 2, but if we form

$$A'A = \begin{bmatrix} 1 + 10^{-8} & 1 \\ 1 & 1 + 10^{-8} \end{bmatrix}$$

in our computer we will lose the part "$10^{-8}$" and obtain instead

$$A'A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$

which has rank 1. So, we lose rank information.

3. Special Forms Used in Numerical Linear Algebra - Why

The LU and QR decompositions discussed above are used as the basic step in the computation of the decompositions to be introduced in the following. The previous section also serves to give the flavour of the approach used in the rest of this work. We now concentrate on the problems of finding a number of useful things about a given square matrix $A$, i.e.

(i) Eigenvalues
(ii) Determinant: det $A$
(iii) Rank of $A$
(iv) Nullspace of $A$: ker $A$
(v) Image or Range of $A$: colsp $A$.

3.1 - The Jordan Canonical Form

As is well known, there are many classical decompositions for matrices, the most common being the Jordan Canonical decomposition. The matrix $A$ is then decomposed into the product of 3 matrices:

$$A = PJP^{-1},$$

where $P$ is nonsingular, and $J$ is in the so-called Jordan Canonical Form (Gantmacher, 1959). This form can tell us the eigenvalues of $A$ (elements of the diagonal of $J$), the determinant (product of the diagonal elements of $J$), and the rank (dimension of the matrix minus the number of Jordan blocks corresponding to $\lambda_i = 0$). Furthermore, the columns of $P$ corresponding to all zero columns of $J$ generate the nullspace of $A$, whereas the columns of $P$ corresponding to nonzero rows of $J$ generate the range of $A$. So, the Jordan Canonical Form enables one to find out all items (i) - (v). However, for numerical computations, the Jordan Canonical Form is almost never advised. The matrix $P$ can be extremely ill-conditioned (i.e. almost singular), especially if the eigenvalues $\lambda_i$ are poorly separated (almost coinciding). But even if the $\lambda_i$ are "well separated", finding the Jordan form is an ill-posed problem. This calls for the question of Numerical Conditioning.

3.2 - Numerical Conditioning of a Problem

Numerical conditioning is an important consideration whenever one considers the use of a digital computer. Because of the finite word length, numbers can be represented only approximately in the computer. In the treatment of such approximations, the model most often used is to consider what happens if the numbers are perturbed slightly. The eigenvalues can be extremely sensitive to perturbations: take for example the $3 \times 3$ matrix
$$A = \begin{bmatrix} -64 & 82 & 21 \\ 144 & -178 & -46 \\ -771 & 962 & 248 \end{bmatrix}, \tag{14}$$

which has eigenvalues 1, 2, 3. If we add a small perturbation $\varepsilon E$, where
01 E = 10 -4
-0.6
I. I
-6
_-0.1
0.3
-1
is a rank one matrix of norm $\approx 10^{-3}$, then, for any $\varepsilon > 0.45$, the perturbed matrix $A + \varepsilon E$ will have complex eigenvalues! This shows that problems may occur even on small innocuous-looking matrices!

Even with 7-16 digits of accuracy, this is a relevant problem: in the following $20 \times 20$ example, due to Wilkinson (1965), perturbations in the 9th place in some element will destroy all digits of accuracy in the eigenvalues - even the order of magnitude will be wrong completely in some cases:

$$A = \begin{bmatrix} 20 & 20 & & & & \\ & 19 & 20 & & & \\ & & 18 & 20 & & \\ & & & \ddots & \ddots & \\ & & & & 2 & 20 \\ & & & & & 1 \end{bmatrix}. \tag{15}$$
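The sensitivity of example (15) is easy to observe numerically. The following sketch assumes that (15) is the upper bidiagonal matrix with diagonal 20, 19, ..., 1 and superdiagonal entries 20, and it perturbs the lower-left corner by $10^{-10}$; both the choice of element and the size of the perturbation are assumptions made for illustration.

```python
import numpy as np

n = 20
# Diagonal 20, 19, ..., 1 and superdiagonal 20 (assumed reading of example (15)).
B = np.diag(np.arange(n, 0, -1).astype(float)) + 20.0 * np.diag(np.ones(n - 1), k=1)
eig_exact = np.linalg.eigvals(B)              # triangular matrix: the diagonal entries

B_pert = B.copy()
B_pert[n - 1, 0] = 1e-10                      # tiny change in a single element
eig_pert = np.linalg.eigvals(B_pert)
max_imag = np.max(np.abs(eig_pert.imag))      # size of the spurious imaginary parts

print(np.sort(eig_exact.real))
print(np.sort_complex(eig_pert))
print(max_imag)
```

A perturbation far below single-precision resolution is enough to move several real eigenvalues well into the complex plane.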
We point out that to obtain a solution to a problem one must apply an algorithm. If a problem is ill-posed or ill-conditioned, then slight perturbations to the data (such as that occurring from the finite word length) can destroy the desired result, as was seen in example (15). In this case, there is no algorithm that can give satisfactory results. In case a problem is not so badly conditioned, we would like to use methods for its solution which introduce as few errors as possible, or at least the smallest bounds on such errors. These desirable properties are called collectively the stability of an algorithm.

If the problem is well-conditioned, but the method is unstable, then we may consider methods providing a better algorithm; in the opposite case, if the problem is badly conditioned but the algorithm is stable, then we may believe the answer is probably the best the computer can obtain by any algorithm, and we cannot hope to improve it. The accuracy of a solution thus has limits defined by the conditioning (of the problem) and the stability (of the algorithm). The usual measure of stability for algorithms in Linear Algebra is so-called "Backward Stability". This is introduced in the next section in terms of a specific decomposition.

4. Schur Decomposition

The matrix decomposition which most closely corresponds to the Jordan decomposition and which is numerically useful is the so-called Schur decomposition. Denoting by "*" conjugate transpose, this decomposition is given by

$$A = QRQ^*, \tag{16}$$

where $Q$ is a (possibly complex) orthogonal matrix ($Q^*Q = I$) and $R$ is a (possibly complex) upper triangular matrix.
We first note the mathematical properties of the Schur decomposition: since (16) is a similarity transformation, the eigenvalues of $A$ are those of $R$, i.e. the diagonal elements of $R$. The determinants satisfy the relation:

$$\det A = \det Q \, \det R \, \det Q^* = \det R = \text{product of the diagonal elements of } R.$$

Hence, the Schur form yields items (i) and (ii) of Sect. 3.

What about the numerical properties? The advantage of the Schur form is that $Q$ is orthogonal, so that $Q$ is never "close to singularity" (the columns are never "almost parallel"). Since $Q$ is always bounded in size and well-conditioned, perturbations in $A$ will result in perturbations of the same size in $R$, so the ill-conditioning of the original problem is not made worse. Though no algorithm can hope to remove the ill-conditioning in a problem, we can make sure that the result $R$ is not more sensitive to perturbations than the starting problem.

Here is the relevance of the distinction between forward and backward stability of an algorithm mentioned in Sect. 3.2. Instead of saying

(a) the algorithm is forward stable: "The algorithm has produced an answer $R$ close to the $R$ that we would hope to obtain with exact arithmetic starting with problem $A$",

we say

(b) the algorithm is backward stable: "The algorithm has produced an answer $R$ which is exact for a slightly perturbed problem close to the original problem $A$".

Here, "close" means on the order of the precision of the computer used. In general, any algorithm/form based on unitary transformations can be shown to be "backward" stable (Wilkinson, 1965).

As is seen from examples (14) and (15), (a) is not true in general - slight changes to $A$ can destroy
the exact eigenvalues. By using orthogonal transformations, we cannot satisfy (a), but we can still satisfy (b). This shows the best one can expect in a method. In exact arithmetic, we make no errors, hence we obtain the exact eigenvalues of $A$. If our method makes slight errors, then we may still hope to obtain the exact eigenvalues for a matrix close to $A$ (backward stability). No method can satisfy (a), as can be illustrated by examples (14) and (15).

Unfortunately, the Schur form cannot be used reliably for items (iii)-(v). This can be illustrated as follows. If $A$ is already upper triangular, then the Schur form is obtained with $R = A$, $Q = I$. Consider the $n \times n$ matrix:

$$A = \begin{bmatrix} 1 & -1 & \cdots & \cdots & -1 \\ 0 & 1 & -1 & & \vdots \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & & & 1 & -1 \\ \varepsilon & 0 & \cdots & 0 & 1 \end{bmatrix}. \tag{17}$$

If $\varepsilon = 0$, all the eigenvalues are exactly 1. Nevertheless, the rank is close to deficient, since "perturbing" $\varepsilon$ to the value $\varepsilon = -1/2^{(n-2)}$ will make $A$ singular! This can be verified by forming $Ax$ with

$$x = [2^{-1}, \; 2^{-2}, \; 2^{-3}, \; \ldots, \; 2^{-(n-2)}, \; 2^{-(n-1)}, \; 2^{-(n-1)}]'.$$

Therefore, we need to find a better way to determine the rank, det, etc. of a matrix. Such a way is provided by the Singular Value Decomposition.
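The claim about example (17) can be checked directly. The sketch below assumes that (17) has ones on the diagonal, $-1$ everywhere above the diagonal and $\varepsilon$ in the lower-left corner; the function name `triangular_example` and the size $n = 12$ are hypothetical choices.

```python
import numpy as np

def triangular_example(n, eps):
    """1 on the diagonal, -1 above it, eps in position (n, 1) (assumed form of (17))."""
    A = np.eye(n) - np.triu(np.ones((n, n)), k=1)
    A[n - 1, 0] = eps
    return A

n = 12
eps = -1.0 / 2 ** (n - 2)
A = triangular_example(n, eps)

# x = [2^-1, 2^-2, ..., 2^-(n-1), 2^-(n-1)]'
x = np.array([2.0 ** -k for k in range(1, n)] + [2.0 ** -(n - 1)])
residual = np.linalg.norm(A @ x)                   # zero: A is exactly singular

eig_unperturbed = np.linalg.eigvals(triangular_example(n, 0.0))
print(eps, residual, eig_unperturbed)
```

For $n = 12$ the singular matrix differs from the triangular one by less than $10^{-3}$ in a single entry, although every eigenvalue of the triangular matrix equals 1.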
5. Singular Value Decomposition - Condition Number of a Matrix

In this section, we introduce another decomposition relevant to such items as rank: the Singular Value Decomposition (S.V.D.). We also introduce the concept of Condition Number of a matrix and try to explain its significance.
The S.V.D. of an $m \times p$ matrix $A$ is

$$A = U \Sigma V^*, \tag{18}$$

where $U$ and $V$ are square and orthogonal matrices, and $\Sigma$ is an $m \times p$ real and diagonal matrix, with nonnegative diagonal elements. By letting $n = \min(m,p)$,

$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n),$$

we usually assume that

$$\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_n \ge 0.$$

In what follows, we will use the matrix 2-norm:

$$\|A\| = \max_{\|x\| = 1} \|Ax\|, \tag{19}$$

where $\|x\|$ is the usual vector 2-norm. This norm has several properties. The most relevant property (Stewart, 1973) is the fact that orthogonal matrices do not affect the 2-norm of a matrix or vector:

$$\|Qx\| = \|x\|, \tag{20a}$$

$$\|QA\| = \|A\|. \tag{20b}$$

Notice also that $\|Q\| = 1$.

What kind of information can one glean from the S.V.D.? The most obvious is the norm of $A$. From (18), (20) and (19), we immediately have

$$\|A\| = \|\Sigma\| = \sigma_1.$$

If $A$ is a nonsingular square matrix, its inverse is given by

$$A^{-1} = V \Sigma^{-1} U^*.$$

Moreover,
$$\|A^{-1}\| = \|\Sigma^{-1}\| = \frac{1}{\sigma_n}.$$

Given a square nonsingular matrix $A$, the number

$$k(A) = \|A\| \cdot \|A^{-1}\| \tag{21}$$

is said to be the condition number of $A$. Obviously,

$$k(A) = \sigma_1 / \sigma_n.$$

The condition number happens to be a very useful quantity in estimating the sensitivity of such items as rank, determinant, inverse, solution to a set of linear equations, etc., with respect to perturbations to the matrix $A$. It also gives the "distance to singularity". To see this, we start with the classical origin of $k(A)$. Consider the problem of solving the matrix equation $Ax = b$. When using a computer, we obtain an approximate result $\tilde{x}$, which we consider exact for the slightly perturbed problem $A\tilde{x} = \tilde{b}$. Note that we have perturbed only $b$, not $A$. We have:

$$Ax = b, \qquad A\tilde{x} = \tilde{b}.$$

Subtract to get

$$A(\tilde{x} - x) = \tilde{b} - b.$$

Multiplying both sides by $A^{-1}$ and taking the norms, one obtains:

$$\|\tilde{x} - x\| \le \|A^{-1}\| \, \|\tilde{b} - b\|, \tag{22}$$

i.e. the (error in answer) is bounded by the (error in right hand side) magnified by $\|A^{-1}\|$. However, to estimate the number of digits of accuracy in $x$, we need the "relative error"

$$\frac{\|\tilde{x} - x\|}{\|x\|}.$$

If the relative error is e.g. $\approx 10^{-6}$, then we have about 6 digits of accuracy, regardless of the size of $x$. To obtain an estimate of the relative error, we use the relation $Ax = b$ to obtain:

$$\|A\| \, \|x\| \ge \|b\|,$$

i.e.

$$\frac{\|A\|}{\|b\|} \ge \frac{1}{\|x\|}. \tag{23}$$

From (22) and (23), and definition (21),

$$\frac{\|\tilde{x} - x\|}{\|x\|} \le k(A) \frac{\|\tilde{b} - b\|}{\|b\|}, \tag{24a}$$

i.e. the (number of good digits in $x$) is bounded by the (number of good digits in $b$) magnified by $k(A)$. For perturbations in $A$, one can obtain an analogous result:

$$\frac{\|\tilde{x} - x\|}{\|\tilde{x}\|} \le k(A) \frac{\|\tilde{A} - A\|}{\|\tilde{A}\|}. \tag{24b}$$

Here, the errors are relative to the approximate values.

We give a few examples of condition numbers of some particular matrices:

(a) $k(I) = 1$.

(b) $k(A) \ge 1$, for any $A$. Indeed:

$$k(A) = \|A\| \, \|A^{-1}\| \ge \|AA^{-1}\| = \|I\| = 1.$$

(c) $k(Q) = 1$, if $Q$ is orthogonal.

(d) Let $T_6$ be the $6 \times 6$ Hilbert matrix, the elements of which are $t_{ij} = (1+i+j)^{-1}$. Then, $k(T_6) \approx 10^6$.
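The bound (24a) and examples (b) and (c) can be checked on random data. This is only a sketch: the matrix size, the random seed and the perturbation level $10^{-8}$ are arbitrary assumptions, and a small slack is added to the bound to absorb rounding errors in the solver.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
sigma = np.linalg.svd(A, compute_uv=False)
k_A = sigma[0] / sigma[-1]                     # k(A) = sigma_1 / sigma_n

x = rng.standard_normal(5)
b = A @ x
b_tilde = b + 1e-8 * rng.standard_normal(5)    # perturb only the right-hand side
x_tilde = np.linalg.solve(A, b_tilde)

rel_err_x = np.linalg.norm(x_tilde - x) / np.linalg.norm(x)
rel_err_b = np.linalg.norm(b_tilde - b) / np.linalg.norm(b)

Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
k_Q = np.linalg.cond(Q)                        # orthogonal matrix: condition number 1

print(k_A, rel_err_x, k_A * rel_err_b, k_Q)
```

The observed relative error is usually far below the bound; $k(A)$ describes the worst case over all perturbations of the given size.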
The S.V.D. and $k(A)$ can be used to find the "distance to singularity" of a matrix $A$. We note that

$$\operatorname{rank} A = \operatorname{rank} \Sigma = \text{number of nonzero } \sigma_i\text{'s}.$$

In particular, if $A$ is nonsingular, $\sigma_i > 0$, $\forall i$. Now, suppose that $A$ is nonsingular and $E$ is a perturbation such that $A + E$ is singular. Then, we may write, using the S.V.D. of $A$ (18):

$$U^*(A + E)V = U^*AV + U^*EV = \Sigma + F,$$

where $F = U^*EV$. Because $U$, $V$ are orthogonal, $\|E\| = \|F\|$. Thus, perturbations $E$ to $A$ correspond exactly to perturbations $F$ to $\Sigma$. We can define the "distance to singularity" as the norm of the smallest $E$ such that $A + E$ is singular:

$$d_{sing} = \min_{A+E \text{ sing.}} \|E\|. \tag{25}$$

In view of the discussion above, this corresponds to

$$d_{sing} = \min_{\Sigma+F \text{ sing.}} \|F\|. \tag{26}$$
Since $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, with $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_n > 0$, it is clear that the $F$ which achieves the minimum in (26) is

$$F = \operatorname{diag}(0, 0, \ldots, 0, -\sigma_n),$$

so that $\|F\| = \sigma_n$. Hence, the $E$ achieving the minimum in (25) is

$$E = UFV^* = -\sigma_n u_n v_n^*,$$

where we have labeled the columns of $U$, $V$:

$$U = [u_1 \; u_2 \; \ldots \; u_n], \qquad V = [v_1 \; v_2 \; \ldots \; v_n].$$

Notice also that $\|E\| = \sigma_n$. Hence, the distance to singularity is

$$d_{sing} = \sigma_n. \tag{27}$$

Consequently, the "relative distance to singularity" (relative to the size of the starting matrix $A$) is:

$$\frac{d_{sing}}{\|A\|} = \frac{\sigma_n}{\sigma_1} = \frac{1}{k(A)}. \tag{28}$$

So, $k(A)$ not only indicates the difficulty one can expect in solving $Ax = b$, but also shows how close $A$ is to a singular
matrix relative to the size of the matrix. In other words, $k(A)$ gives the sensitivity of rank $A$ to perturbations in $A$.

It should be noted, however, that quantities such as $\|A\|$, $k(A)$ can be defined using any matrix norm corresponding to a vector norm, and relations analogous to (24) hold in any such norm. We note in passing that (27), (28), involving the S.V.D., are valid only in the 2-norm.

We have seen how the S.V.D. can be used to obtain rank $A$, $k(A)$. What about items (iv) and (v) of Sect. 3, ker $A$, colsp $A$? Let us start with a singular $n \times n$ matrix $A$. For this case, suppose the singular values of $A$ satisfy:

$$\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > \sigma_{r+1} = \sigma_{r+2} = \ldots = \sigma_n = 0,$$

so that rank $A = r$. The results analogous to (27), (28) are that $\sigma_r$ gives the size of the smallest perturbation needed to reduce the rank further, and the quantity $(\sigma_r / \sigma_1)$ gives the relative size of such perturbation.

We write the S.V.D. of $A$ as

$$A = U \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} V^* = [U_1 \; U_2] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^* \\ V_2^* \end{bmatrix}, \tag{29}$$

where $\Sigma_1 = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$
is $r \times r$ and nonsingular. $U$, $V$ have been partitioned conformally to $\Sigma$. Thus,

$$A = U_1 \Sigma_1 V_1^*,$$

where $U_1$, $V_1$ are $n \times r$ orthonormal matrices and $\Sigma_1$ is $r \times r$ nonsingular. Hence, $U_1$ is an orthonormal basis for colsp $A$, and $V_1$ is an orthonormal basis of the space orthogonal complement to ker $A$, i.e. $V_2$ is an orthonormal basis for ker $A$.

In practice, to use (29), one will frequently encounter the situation
$$\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > \sigma_{r+1} \ge \ldots \ge \sigma_n \ge 0,$$

where some of the later singular values are "small", on the order of the machine precision, instead of exactly zero. The problem is to decide at what point a small singular value is to be considered zero, i.e. the problem is "how small is small". It is best to illustrate here. Assume that the singular values (scaled so that $\sigma_1 = 1$) are e.g.

$$10^{0} \quad 10^{-1} \quad 10^{-1} \quad 10^{-2} \quad 10^{-4} \quad 10^{-8} \quad 10^{-9} \quad 10^{-10} \quad 0 \quad 0,$$

where only the orders of magnitude are shown. Then we see that the exact rank of the matrix is 8. But if the original data had only 6 digits of accuracy, then we would consider any number $< 10^{-6}$ to be zero. Hence the rank should be considered to be 5. If, instead, we had the values

$$10^{0} \quad 10^{-2} \quad 10^{-4} \quad 10^{-6} \quad 10^{-8} \quad 10^{-10} \quad 10^{-12} \quad 10^{-14} \quad 10^{-16} \quad 10^{-18}$$

or

$$10^{0} \quad 10^{-1} \quad 10^{-2} \quad 10^{-3} \quad 10^{-4} \quad 10^{-5} \quad 10^{-6} \quad 10^{-7} \quad 10^{-8} \quad 10^{-9},$$

then there is no obvious gap, so the effective rank depends almost entirely on the choice of the zero tolerance! Unfortunately, this situation can and does arise frequently in practice, especially in large matrices. This is not a defect of the S.V.D. If this situation arises, it really means that a small (negligible) perturbation to $A$ will reduce the rank, and another perturbation, only slightly larger, can reduce the rank even further. For a full discussion of the S.V.D. and the rank see (Klema and Laub, 1980).

We close this section with a few examples of situations involving the S.V.D. We just point out the idea, leaving the details to the reader.

(a) Least Squares (Lawson & Hanson, 1974)

This was the classical origin of the S.V.D. It is useful to solve problem (7) in cases where $A$ is rank deficient. In such cases, we cannot use the QR decomposition because the matrix $R$ in (8) is singular, and hence we cannot solve (9). If we resort instead to the S.V.D. of $A$, we have, in view of (20a),

$$\|Ax - b\| = \|U \Sigma V^* x - b\| = \|\Sigma V^* x - U^* b\| = \|\Sigma y - c\|,$$

where $y = V^*x$ and $c = U^*b$. The result is we have converted the original problem to a diagonal problem involving $\Sigma$. We partition the above as in (29) to obtain

$$\left\| \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} - \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|.$$

We minimize this norm by setting $y_1 = \Sigma_1^{-1} c_1$. In the solution, we find that $y_2$ is free!
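The rank-deficient least squares recipe above can be sketched as follows. The function name `svd_lstsq`, the relative zero tolerance `tol`, the test matrix and the choice $y_2 = 0$ (which yields the minimum-norm solution) are assumptions of this illustration; the text leaves $y_2$ free.

```python
import numpy as np

def svd_lstsq(A, b, tol):
    """Least squares via the S.V.D.: y1 = Sigma1^{-1} c1, with the free part y2 set to 0."""
    U, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > tol * s[0]))            # numerical rank, relative to sigma_1
    c = U.T @ b                                # c = U* b (real case)
    y = np.zeros(A.shape[1])
    y[:r] = c[:r] / s[:r]
    return Vh.T @ y, r                         # x = V y

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])                # third column = first + second
b = np.array([1.0, 2.0, 3.0, 4.0])
x, r = svd_lstsq(A, b, tol=1e-10)

null_vec = np.array([1.0, 1.0, -1.0])          # spans ker A for this matrix
print(r, x, A.T @ (A @ x - b), x @ null_vec)
```

Any multiple of `null_vec` can be added to `x` without changing the residual, which is the freedom in $y_2$; as discussed above, the outcome depends on the chosen tolerance whenever the singular values show no clear gap.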
(b) Pseudo Inverse

The pseudo inverse $A^+$ of $A$, Lawson and Hanson, 1974, can be expressed as

$$A^+ = V \begin{bmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{bmatrix} U^*,$$

where we have used the partitioning (29).

(c) Relation to $A^*A$

We point out the relationship of the S.V.D. to the classical idea of eigenvalues. If $A = U \Sigma V^*$, then

$$A^*A = V \Sigma U^* U \Sigma V^* = V \Sigma^2 V^*, \qquad \Sigma^2 = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2).$$

Hence, the singular values $\sigma_i$ are just the square root of the eigenvalues of $A^*A$, and the columns of $V$ are the eigenvectors of $A^*A$. In fact, using the fact that $A^*A$ is symmetric positive semi-definite for any $A$, one can carry this argument in reverse to prove the existence of the S.V.D. (Lawson and Hanson, 1974). However, for actual computation of the S.V.D. or the solution of the least squares problem (a), it is almost always more accurate to compute the S.V.D. directly without forming $A^*A$, as can be seen from example (13). In the computation of the S.V.D. of a $2 \times 2$ matrix (or $2 \times n$ for any $n$), the results obtained using $A^*A$ are often sufficiently accurate, and faster to obtain, especially when performing the computation by hand.
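Two of the facts from this section can be checked on a small random matrix: the rank-one perturbation $E = -\sigma_n u_n v_n^*$ of (27) makes $A + E$ singular, and the singular values are the square roots of the eigenvalues of $A^*A$. The matrix size and the seed are arbitrary assumptions; a well-conditioned real matrix is used so that forming $A'A$ does no visible harm here.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
U, s, Vh = np.linalg.svd(A)

# Smallest perturbation reaching singularity: E = -sigma_n u_n v_n*
E = -s[-1] * np.outer(U[:, -1], Vh[-1, :])
norm_E = np.linalg.norm(E, 2)
s_perturbed = np.linalg.svd(A + E, compute_uv=False)

# Eigenvalues of A'A, sorted in decreasing order, versus the singular values
eig_gram = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

print(s, norm_E, s_perturbed[-1])
print(np.sqrt(eig_gram))
```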
6. Applications of Previous Discussion to Linear Dynamic Systems

We finally take a look at how the previous discussion on numerical stability applies to Linear Systems. Consider either the continuous-time system

$$\dot{x}(t) = F x(t) + G u(t), \qquad y(t) = H x(t), \tag{30}$$

or the discrete-time system

$$x(t+1) = F x(t) + G u(t), \qquad y(t) = H x(t), \tag{31}$$

where $F$ is $n \times n$, $G$ is $n \times m$ and $H$ is $p \times n$. The so-called Markov parameters are defined as

$$w(i) = H F^{i-1} G, \qquad i = 1, 2, \ldots$$

In discrete time they are the values of the pulse response, whereas in continuous-time they are the values at the origin of the pulse response and its derivatives.

We shall consider two problems as particular examples from the point of view of numerical stability.

(i) Determining whether a given system is reachable
(ii) Determining the system order $n$ from the external sequence $w(\cdot)$.

As for problem (i), we recall that system (30) is said to be reachable if, for each state $x$ and each time point $t$, there exist a $T$ and a function $u(\cdot)$ defined over $[T, t)$ such as to carry $x(T) = 0$ into $x(t) = x$ (Kalman, Falb, Arbib, 1969). Obviously, reachability depends upon matrices $F$, $G$ only. The problem of determining whether a given system is reachable or not has been studied extensively in the last 30 years, and many criteria for reachability have been developed. Examples of popular criteria are given in a theorem for continuous-time systems:
Theorem A

The system (30) is reachable if and only if any of the following equivalent conditions is true:

(C1) The $n \times (nm)$ matrix $C = [G \;\; FG \;\; F^2G \;\; \ldots \;\; F^{n-1}G]$ has rank $n$ (full rank).

(C2) The matrix $P(s) := [sI - F \;\; G]$ has rank $n$ for any complex number $s$ (PBH test).

(C3) There exists an $m \times n$ matrix $K$ such that the poles (eigenvalues) of $F + GK$ are all different from those of $F$ (state feedback).

(C4) (In case $F$ is stable, i.e. all the eigenvalues of $F$ have negative real part). $W$ has rank $n$, where $W$ is the solution of the Lyapunov equation:

$$FW + WF' = -GG' \tag{32}$$

(Grammian condition).

(C5) There exists no pair $(\tilde{F}, \tilde{G})$ related to (30) by $\tilde{F} = TFT^{-1}$, $\tilde{G} = TG$, where $T$ is some nonsingular matrix, such that $\tilde{F}$, $\tilde{G}$ can be partitioned conformally as

$$\tilde{F} = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix}, \qquad \tilde{G} = \begin{bmatrix} G_1 \\ 0 \end{bmatrix}. \tag{33}$$

In (33), $F_{11}$ and $F_{22}$ are square matrices. This condition can be then read as: "If there is a pair $(\tilde{F}, \tilde{G})$ of the form (33), the system is not reachable". □
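Conditions (C1) and (C2) translate directly into rank computations. The sketch below is illustrative only: the names `ctrb` and `pbh_reachable`, the hard tolerance `tol` and the $2 \times 2$ example are assumptions, and the PBH test is evaluated only at the computed eigenvalues of $F$, since $sI - F$ is nonsingular for every other $s$.

```python
import numpy as np

def ctrb(F, G):
    """Matrix [G, FG, ..., F^(n-1) G] of condition (C1)."""
    n = F.shape[0]
    blocks = [G]
    for _ in range(n - 1):
        blocks.append(F @ blocks[-1])
    return np.hstack(blocks)

def pbh_reachable(F, G, tol=1e-10):
    """Condition (C2), checked at the computed eigenvalues of F only."""
    n = F.shape[0]
    for s in np.linalg.eigvals(F):
        P = np.hstack([s * np.eye(n) - F, G])
        if np.linalg.svd(P, compute_uv=False)[-1] < tol:
            return False
    return True

F = np.array([[0.0, 1.0], [-2.0, -3.0]])       # eigenvalues -1 and -2
G = np.array([[0.0], [1.0]])
G_bad = np.array([[1.0], [-1.0]])              # an eigenvector of F: F G_bad = -G_bad

print(np.linalg.matrix_rank(ctrb(F, G)), pbh_reachable(F, G))
print(np.linalg.matrix_rank(ctrb(F, G_bad)), pbh_reachable(F, G_bad))
```

Both tests rest on a yes/no rank decision with a fixed tolerance, so they say nothing about how close a reachable pair is to an unreachable one.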
For discrete-time systems, we have an analogous theorem:

Theorem B

The system (31) is reachable if and only if any of the equivalent conditions (C1), (C2), (C3) or (C5) holds, or, equivalently, if and only if the discrete-time Grammian $W := CC'$ is nonsingular. □

We discuss this theorem from a numerical point of view by first showing the limitations of the conditions, and then giving some positive results regarding (C1), (C3) and (C5).

Many of the conditions in Theorems A and B depend on the computation of the rank of a matrix or sub-matrix. This can lead to problems when a submatrix will be rank-deficient, not exactly but only to the working precision of our computer. In this case we might conclude that the starting system is "unreachable to working precision", or "almost unreachable". Frequently, for conditions (C1), (C2), (C4), the rank problem becomes ill-posed. For example, if we take criterion (C1), the example of Paige (1981):

$$F = \operatorname{diag}(1, 1/2, 1/2^2, \ldots, 1/2^{n-1}), \qquad G = [1, 1, 1, \ldots, 1]', \tag{34}$$

then by inspection it is apparent that the system is reachable. However, if we form the matrix $C$, we find that, for $n = 10$, it has a smallest singular value $\sigma_{min} = 10^{-12}$, so we would conclude that the system defined by the pair (34) is not reachable. The exact rank of $C$ is $n$ (all singular values are non-zero), but, to the working precision of most computers, $10^{-12}$ is considered zero, i.e. $C$ has rank $\le n - 1$.

A similar problem arises when considering the problem of determining the system order $n$ from the external sequence $w(\cdot)$.
This problem is indeed the problem of finding the rank of a suitable matrix. Precisely, let

$$\mathcal{H}_r = \begin{bmatrix} w(1) & w(2) & \ldots & w(r) \\ w(2) & w(3) & \ldots & w(r+1) \\ \vdots & \vdots & & \vdots \\ w(r) & w(r+1) & \ldots & w(2r-1) \end{bmatrix}$$

be the so-called r-dimension Hankel matrix. We recall that, given a system defined by the triple $(F, G, H)$, the system defined by $(F', H', G')$ is named the dual system and that a system is said to be observable if its dual is reachable (see Kalman, Falb, Arbib, 1969). Then, from (C1) of Theorems A and B it follows that the system is observable if and only if the matrix

$$O' = [H' \;\; F'H' \;\; (F')^2H' \;\; \ldots \;\; (F')^{n-1}H']$$

is full rank. It is easy to see that

$$\mathcal{H}_n = OC.$$

Therefore, if a system is both reachable and observable,

$$\operatorname{rank} \mathcal{H}_n = n.$$

In fact, it can be shown that a system is reachable and observable if and only if

$$\operatorname{rank} \mathcal{H}_r = n, \qquad \forall r \ge n.$$
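The relation between the Markov parameters, the Hankel matrix and the product $OC$ can be explored numerically on the example (34) of Paige. The sketch assumes the output matrix $H = G'$, and the helper names `paige_example`, `markov` and `hankel` are hypothetical. It prints the smallest singular values of $C$ and of the Hankel matrix for a few values of $n$.

```python
import numpy as np

def paige_example(n):
    """F = diag(1, 1/2, ..., 1/2^(n-1)), G = ones; H = G' is an assumption of this sketch."""
    F = np.diag(0.5 ** np.arange(n))
    G = np.ones((n, 1))
    return F, G, G.T

def markov(F, G, H, count):
    """Markov parameters w(i) = H F^(i-1) G for i = 1..count (scalars in this example)."""
    w, x = [], G
    for _ in range(count):
        w.append((H @ x).item())
        x = F @ x
    return np.array(w)

def hankel(w, r):
    """r x r Hankel matrix whose (i, j) entry is w(i + j - 1), with 1-based i, j."""
    return np.array([[w[i + j] for j in range(r)] for i in range(r)])

sigma_min_C, sigma_min_H = {}, {}
for n in range(5, 9):
    F, G, H = paige_example(n)
    C = np.hstack([np.linalg.matrix_power(F, k) @ G for k in range(n)])
    O = np.vstack([H @ np.linalg.matrix_power(F, k) for k in range(n)])
    Hn = hankel(markov(F, G, H, 2 * n - 1), n)
    assert np.allclose(Hn, O @ C)              # Hankel matrix equals O C
    sigma_min_C[n] = np.linalg.svd(C, compute_uv=False)[-1]
    sigma_min_H[n] = np.linalg.svd(Hn, compute_uv=False)[-1]
    print(n, sigma_min_C[n], sigma_min_H[n])
```

For the larger values of $n$ the smallest singular value of the Hankel matrix approaches the double-precision rounding level, so the printed figure there should not be trusted digit for digit.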
This means that, in principle, the order of a reachable and observable system can be found from the pulse response, by determining the rank of a Hankel matrix of large dimension.

Now suppose that $F$ and $G$ are given by (34), and $H = G'$. In this case the controllability and observability matrices are symmetric and equal: $C = O = C'$. Hence $\mathcal{H}_n = C^2$ and $\sigma_{min}(\mathcal{H}_n) = (\sigma_{min}(C))^2$. We give $\sigma_{min}(C)$ and $\sigma_{min}(\mathcal{H}_n)$ in Table 1 for various values of $n$.

Table 1

  n     sigma_min(C)      sigma_min(H_n)
  5     8.8 x 10^-4       7.7 x 10^-7
  6     4.8 x 10^-5       2.3 x 10^-9
  7     1.4 x 10^-6       1.9 x 10^-12
  8     2.1 x 10^-8       4.3 x 10^-16

Depending on the precision of the computer being used, the above matrices could be considered rank-deficient, and hence the system not reachable, even though the system is reachable.

The discussion above motivates the introduction of the concept of "almost reachability". With this objective in mind, we formally define the concept of distance of a reachable system to the nearest unreachable system.

Definition (Paige, 1981), (Eising, 1984)

Given a reachable system (30), we say it has a distance $\mu(F,G)$ from an unreachable system if

(a) The pair $(\tilde{F}, \tilde{G}) := (F + \delta F, \; G + \delta G)$ is not reachable, with $\|[\delta F, \; \delta G]\| = \mu$,
and

(b) For any pair $(\tilde{F}, \tilde{G})$ with $\|[\tilde{F} - F, \; \tilde{G} - G]\| < \mu$, the pair $(\tilde{F}, \tilde{G})$ is reachable. □

In other words, $\mu$ is the norm of the smallest perturbation $\delta F$, $\delta G$ to $F$ and $G$ of (30) that yields an unreachable system. Miminis (1981) has found a more computational way to define $\mu$ in terms of (C2):

$$\mu = \min_{s \in \mathbb{C}} \sigma_n(P(s)), \tag{35}$$

where $\sigma_n$ denotes the n-th (smallest) singular value of $P(s)$. We take the minimum over all complex numbers; in fact in some cases the $s^*$ achieving the minimum is not real, as illustrated by the example

$$F = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \qquad G = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \tag{36}$$

for which it has been shown, Boley and Lu (1984), that $\mu = .6614$, achieved when $s = \pm i \sqrt{15}/4$, and that the minimum is not achieved for any real $s$. Hence, in this case, the perturbation $\delta F$, $\delta G$ is not real.

Paige (1981) points out the numerical problems one may encounter using (C1) - (C3). An example for (C1) we already mentioned. For (C2), one may show that, if rank $P(s) < n$, then $s$ must be an eigenvalue of $F$. One then has a choice of limiting $s$ to just eigenvalues of $F$, or doing a search over the whole complex plane. The problem with the former is the fact that in some cases, one may be unable to compute the eigenvalues of $F$ to any accuracy (e.g. if $F$ is the matrix (15)) (Paige, 1981). In the latter case, the computation involved can become prohibitively expensive, Miminis (1981). Eising (1982) has also described a way to compute $\mu$ in terms of an n-dimensional minimization problem, which can also be expensive. In the case of (C3), we can encounter again the same problem as in (C2), in that the success in distinguishing the poles depends critically on the eigenvalue problem, which may be badly conditioned ((15) is an extreme example).

We point out that using the Grammian can lead to similar problems, because

(a) in the discrete time case, the grammian $W$ is defined as $CC'$, which we have seen is a poor vehicle in this regard;

(b) in the continuous time case, the determination of reachability depends on the solution of the Lyapunov equation (32), which, among other things, becomes ill-conditioned when $F$ is almost unstable.
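Definition (35) suggests a brute-force experiment on a small example. The sketch below assumes that example (36) reads $F = [0 \; -1; \; 1 \; 0]$, $G = [1 \; 0]'$, and minimizes $\sigma_n(P(s))$ over a rectangular grid in the complex plane; the grid limits and spacing are arbitrary choices, and a grid search is of course not a practical algorithm for larger $n$.

```python
import numpy as np

F = np.array([[0.0, -1.0], [1.0, 0.0]])        # assumed reading of example (36)
G = np.array([[1.0], [0.0]])

def sigma_n(s):
    """Smallest singular value of P(s) = [sI - F, G]."""
    P = np.hstack([s * np.eye(2) - F, G])
    return np.linalg.svd(P, compute_uv=False)[-1]

re = np.linspace(-1.5, 1.5, 61)                # grid for the real part of s
im = np.linspace(-1.5, 1.5, 301)               # grid for the imaginary part of s
values = np.array([[sigma_n(complex(a, b)) for b in im] for a in re])

i, j = np.unravel_index(np.argmin(values), values.shape)
mu_grid, s_star = values[i, j], complex(re[i], im[j])
mu_real = min(sigma_n(complex(a, 0.0)) for a in re)   # best value on the real axis only

print(mu_grid, s_star, mu_real)
```

On this grid the minimum over complex $s$ comes out near .6614 at a purely imaginary $s$, while restricting $s$ to the real axis gives 1.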
We can say a few positive things with regard to (C1)-(C5). First of all, with regard to (C5), several people, Paige (1981), Van Dooren et al. (1979), have given an algorithm ("staircase algorithm") that finds a unitary transformation $T$ exhibiting the form (33), if such exists. This algorithm is very similar to one used to compute the Jordan canonical form, Kublanovskaya (1968). Unfortunately, if the system is reachable, but almost unreachable, then there is no easy way to estimate the distance $\mu$ from the form produced by the "staircase algorithm". In particular, the system (36) given in Boley and Lu (1984) is already in the form produced by the "staircase algorithm". The best estimate for $\mu$ one can obtain by inspection is $\approx 1$, since a nearby unreachable system can be obtained by setting $F_{21}$ to 0 in (36). There is no obvious way to see from (36) that $\mu$ is actually .6614, a value obtained by going back to first principles and using a simple algebraic argument.

With regard to condition (C1), one can show that small singular values of $C$ do not necessarily imply almost unreachability of the system $(F,G)$, and that $C$ does not give a good estimate of the distance $\mu$, but under certain circumstances one can bound the distance $\mu$ in terms of the singular values of $C$:
Theorem
If C
(Boley
has
~o+ell+
B in t e r m s
and
Lu,
singular
...
certain
+e I n n
of the
singular
one
values
can of
C
:
1984)
values
YI ~ "'" ~ Yn-1 ~ Yn > 0 a n d
is the
characteristic
polynomial
of F
with
= 1, t h e n
n
<
where
This
~
(I
+ max I ~ i l )
F is n x n,
theorem
not on
the
the
spread
Van
Dooren
that
the
of t h e
of the
find
smallest
that
how
shows useful
(¥n)
but
on
example,
C
has
show
of (37)
regard
Hence use.
to o b t a i n may
least
(C3) one
U of
on the o r d e r (38)
In
fact Boley
a nearbyreal physical
discussion
we c a n
direction,
around
also
give
this
of
/~ a n d
is n o t O ( e ) ,
the criterion
have more
(see e.g.
to
values
distance
some
which
one
results:
singular
the
to u s e
in at
0
(1981)).
something
that
two
that
is s t i l l
with
(38)
-1/2
a complex
Finally,
values
(yn/Yn_1) . In the
[ij20] [
can
system,
singular
values
(~) d e p e n d s
G =
modified,
than
to u n r e a c h a b l e
(1981):
O ( / ~ ) , (Van D o o r e n
showed
(37)
n-1
"distance
singular
,~"
. One
~(n
G n x m. D
says
size
y
F =
we
circumstances
but
(C1)
suitably
a n d Lu
(1984)
unreachable significance eq.
(36)).
a result
condition
that
can
give
Theorem
Assume
(Boley
that p a i r
eigenvalue small
and Lu,
ho,
(A,B)
of A. T h e n there
1984)
is r e a c h a b l e
What
this
from
theorem
by any state
a feedback ~1,...,Vn
I n by at least
says
feedback
asymptotically
is t h a t
zero,
some
eigenvalue
is h a r d to m o v e
then
the
is a l m o s t
In this d i r e c t i o n above
this
the c o n v e r s e
to i l l - c o n d i t i o n i n g reachability
We h a v e
some
K with
sufficiently
IIK II ~ h such
of the c l o s e d
loop m a t r i x
h U(A,B). D
than then
eIIKll
, as
of A is m o v e d IIKII
U < e. In o t h e r
by any small
state
words,
if
feedback,
unreachable.
is a p o s i t i v e
is not true,
since
result,
but as m e n t i o n e d
the poles
of the e i g e n p r o b l e m
rather
may m o v e
due
than the
property.
given
limitations
matrix
if some e i g e n v a l u e
K no m o r e
approaches
system
In is a simple
for any h > 0 less t h a n
exists
that all the e i g e n v a l u e s A + BK d i f f e r
and that
in this
section
just a few e x a m p l e s
of some of the c l a s s i c a l
from a numerical to c o u n t e r b a l a n c e
point these
of view,
criteria
of the
for r e a c h a b i l i t y
and some of the p o s i t i v e
limitations.
aspects
References

Boley, D.L. and W.S. Lu, 1984: The Quasi Kalman Decomposition and State Feedback. American Control Conference, S. Diego.
Gantmacher, F., 1959: Theory of Matrices I & II. Chelsea (New York).
Eising, R., 1982: The Distance Between a System and the Set of Uncontrollable Systems. Memo COSOR 82-19, Eindhoven Univ. of Technology.
Eising, R., 1984: Between Controllable and Uncontrollable. Systems & Control Letters, Vol. 4, no. 5, pp. 263-264, July 1984.
Klema, V.C. and A.J. Laub, 1980: The Singular Value Decomposition: its Computation and Some Applications. IEEE Trans. Automatic Control, Vol. AC-25, no. 2, pp. 164-167.
Kalman, R.E., P. Falb and M.A. Arbib, 1969: Topics in Mathematical System Theory. McGraw-Hill.
Kublanovskaya, V.N., 1961: On Some Algorithms for the Solution of the Complete Eigenvalue Problem. Zh. Vych. Mat., Vol. 1, pp. 555-570.
Kublanovskaya, V.N., 1968: On a Method for Solving the Complete Eigenvalue Problem for a Degenerate Matrix. USSR Computational Math. and Math. Phys., Vol. 6, pp. 1-14.
Lawson, C. and R. Hanson, 1974: Solving Linear Least Squares Problems. Prentice-Hall.
Miminis, G., 1981: Numerical Algorithms for Controllability and Eigenvalue Allocation. M.Sc. Thesis, McGill University.
Paige, C.C., 1981: Properties of Numerical Algorithms Related to Computing Controllability. IEEE Trans. Automatic Control, Vol. AC-26, no. 1, pp. 130-138.
Smith, B.T., et al., 1976: Matrix Eigensystem Routines - EISPACK Guide. Lecture Notes in Computer Science 6, Springer-Verlag (Berlin).
Stewart, G.W., 1973: Introduction to Matrix Computations. Academic Press.
Van Dooren, P., A. Emami-Naeini and L. Silverman, 1979: Stable Extraction of the Kronecker Structure of Pencils. Proc. 17th IEEE Conference on Decision and Control, pp. 521-524.
Van Dooren, P., 1979: The Computation of Kronecker's Canonical Form of a Singular Pencil. Linear Algebra and Applications, Vol. 27, pp. 103-141.
Van Dooren, P., 1981: The Generalized Eigenstructure in Linear Systems Theory. IEEE Trans. Automatic Control, Vol. AC-26, no. 1, pp. 111-130.
Wilkinson, J.H., 1965: The Algebraic Eigenvalue Problem. Clarendon Press (Oxford).
Wilkinson, J.H. and C. Reinsch, 1971: Linear Algebra - Handbook for Automatic Computation, Vol. 2. Springer-Verlag (Berlin).
Chapter 7

SOME RECENT DEVELOPMENTS IN ECONOMETRICS

Michael McALEER and Manfred DEISTLER
1. INTRODUCTION

Econometrics, in a wide sense, is concerned with the application of statistical or mathematical methods to the analysis of economic phenomena. In this sense, econometrics may be thought of as consisting of the following four fields:

(i) Economic statistics: problems of definition of economic variables (such as in National Income and Product Accounts), problems of data collection, sampling and data construction, and problems of validating the data;
(ii) Econometrics in the narrow sense: econometric methods and econometric model building;
(iii) Economic theories: the use of mathematical formulations;
(iv) Econometric computing: data bank systems for economic data and computer programs, and interactive computing systems for data transformation, estimation and test procedures, graphical displays, calculation of solutions, and computer simulation.

We will be mainly concerned with econometric methods here. Econometrics was born in the 1930's, from the evolving (Keynesian) business cycle theories and the first national accounts, and was especially advanced by the statistical methods developed by the Cowles Commission.
* The authors are grateful to Dr. A. Pagan (Canberra) for valuable comments.
The first econometric models for national economies were due to Tinbergen and Klein, and were built in the forties. In the sixties, macroeconometric models had been established for almost every industrialized country. These models, and especially forecasts based on these models, behaved fairly well in the periods of steady economic growth, but showed a relatively poor performance in trying to cope with the economic problems arising from the seventies. This poor performance of econometric models had great implications for the standing of econometrics, but the attempt to overcome these difficulties has been one of the main driving forces for the development of current econometrics.

In analyzing these problems, it was found that many of the "a priori" assumptions which had been used in traditional econometric model building, such as those concerning the classification of variables as endogenous and exogenous, the functional form of the relation between the variables, the dynamic specification of the model, and the correlation structure of the errors, could hardly be justified on the basis of real economic a priori knowledge, and that these assumptions had been imposed primarily for statistical convenience. Moreover, the fact that different a priori assumptions often led to different conclusions from the same or from similar data sets showed that econometrics was far from obtaining objective results from data. The consequence was a critical re-examination of traditional methods and of the assumptions justifying them, and the development of more appropriate methods and research strategies that were more closely related to the actual problems arising in economics.

A criticism that has frequently been raised is that there is a great discrepancy between the process of actually drawing conclusions from economic data and inference as described by the decision-theoretically oriented mathematical theory of statistics (see e.g. Leamer (1978)).
In many applications the situation is far too complicated to express a statistical decision with one formula. Learning from data may consist of several steps where subjective decisions cannot be excluded at each stage. This was paralleled by the development of exploratory data analysis, as reported in the seminal work of Mosteller and Tukey (1977) in (general) statistics.

Special emphasis has recently been directed in econometrics towards developing methods for checking the model specification from the data, and on data-oriented specification search procedures.
In particular, a great number of tests and diagnostic checks have been developed in the last fifteen years to detect misspecification of different kinds (see Pagan and Hall (1983) for a useful discussion of many of these developments). Information criteria have also been developed and used to determine automatically the dynamic specification, such as the lag lengths, of models. A further development concerns the performance of estimators or tests if the data generating process is not contained in the model class, and this area of (potentially) misspecified models has recently been investigated by Kent (1982) and White (1982). Two additional areas of current research interest are the robustness of estimators and tests to departures from the assumptions made in using models, as well as the sensitivity of inferences drawn to changes in the assumptions and differences in a priori information.

Until the late sixties, a good part of econometric model-building activity was concerned with macroeconomic modelling and forecasting.
Since then, there has been an increasing number of econometric investigations on a far less aggregated level, which has led to new models and methods. Moreover, owing to the increased quantity and improved quality of data available, applications of more data-consuming techniques have increased. These "microeconometric" methods are definitely among the most important developments in econometrics today, and we will describe them briefly in this paper.

The question of appropriate macroeconomic modelling and forecasting is still an unresolved issue, after the difficulties encountered in traditional structural model building. Several proposals have been made to overcome these difficulties associated with the traditional approach, and we will describe some of the most important developments in this paper.

The paper is organized as follows. Section 2 is concerned with the specification and quality control of models, and the related issue of specification searches. Macroeconomic modelling and forecasting is discussed in Section 3, and some examples of modern "microeconometric" models are given in Section 4. Needless to say, our account is far from complete and a number of important topics have been omitted. In particular, the role of (relatively) inexpensive computing in econometrics, such as for conducting sophisticated Monte Carlo experiments for comparing different estimators and different test statistics, and for bootstrapping the small sample distributions of estimators and test statistics, is not discussed, although it will play a very important role in the development of the discipline in the decades to come.
2. SPECIFICATION AND QUALITY CONTROL OF A MODEL

Differences in formulating a model have been described by McAleer and Pesaran (1986) as follows: differences in theoretical paradigms, differences in the way that auxiliary assumptions within a paradigm are specified, or different strategies that might be adopted in the process of model construction.

By a model specification is meant the set of all assumptions which define the model class, and hence also the parameter space for the inference procedure. Economic theory, e.g., often suggests the variables in a relationship, but not the appropriate functional form or the direct links between the various parts of a system. For these reasons, a data-oriented specification search procedure is warranted, where by specification search is meant the set of procedures followed in moving from an initial model specification to a final model class.

Two matters which arise when there are conflicting views regarding assumptions are the justification of the assumptions from the data and the effect of altering any of them on the properties of inference procedures. The former issue has to do, e.g., with hypothesis testing and diagnostic evaluation, whereas the latter is concerned with robustness of inference procedures to changes in the underlying assumptions.

There are several ways of conducting a specification search, and they may be given as follows:

(i) Data analytic methods: these procedures are based on recognizing patterns in data, as well as in their transformations, and rely heavily on subjective decisions. A well known example of this approach is the method advocated by Box and Jenkins (1970).

(ii) Information criteria, or criteria which provide a trade-off between goodness-of-fit and the number of parameters used to obtain this fit for different model specifications, that is, for different candidates in the specification search.

(iii) Testing procedures: this would appear to be the most common specification search procedure in econometrics. This is, in fact, the third of five stages used in the quality control of a model, as outlined by McAleer, Pagan and Volker (1985). The other stages are given as: checking consistency with economic theory; economic and statistical considerations, such as signs and magnitudes of estimated coefficients, as well as statistical significance; sensitivity analysis; and reconciliation of empirical findings with the results obtained from previous research using alternative non-nested models.
2.1 Model Specification

In what follows, we will concentrate on the role of diagnostic checking within the framework of the multiple linear regression model

$$y = X\beta + u, \qquad u \sim N(0, \sigma^2 I),$$

which may be written for the t'th observation as

$$y_t = x_t'\beta + u_t, \qquad u_t \sim NID(0, \sigma^2), \tag{2.1}$$

where $y_t$ is the dependent variable, $x_t'$ is the t'th row of the $T \times k$ observation matrix $X$, comprising $T$ observations on $k$ regressors, and $u_t$ is the random error that is assumed to be normally distributed with zero mean, identically and independently distributed for all observations $t = 1, 2, \ldots, T$, and uncorrelated with $X$.
It should be noted that virtually all of the assumptions made in the context of the model given above are, in fact, testable. The properties of the error term, namely zero mean, serial independence, homoscedasticity and normality, are testable using what are by now standard procedures. Assumptions regarding linearity of the model, a correctly specified set of explanatory variables, and constancy of the coefficients are also all testable. Finally, the informational content of the model given in equation (2.1) may be reconciled with the empirical findings of previous research by recourse to recently developed non-nested testing procedures.

2.2 Tight and Loose Specifications

Before proceeding, it will be necessary to discuss briefly two alternative approaches to specification searches, namely tight and loose specifications, with corresponding tests of misspecification and specification.

In a tight specification, a very small model set is specified as a first step, and then a series of diagnostic checks is used to indicate ways in which it may be respecified by enlarging the model set. Diagnostic checks are primarily tests of misspecification since only the null hypothesis needs to be specified in advance of performing the test. Consider the following examples: testing for serial independence of errors against either AR(p) or MA(p) alternatives can result in the same test statistic (Godfrey (1978)); testing for homoscedasticity of the errors against either multiplicative or additive heteroscedasticity as alternatives can also result in the same test statistic (see Bera and McKenzie (1986)). A significant diagnostic check therefore does not, by itself, identify which alternative is responsible for the rejection.
Bearing the caveats given above in mind, rejection of the null hypothesis using diagnostic checks may suggest where to look to improve the model specification (see Table 1).

The scheme given in Table 1 should be used regardless of whether or not a tight specification is used. However, the tighter the specification, the more likely it is that instances will be found where the diagnostic checks lead to rejection of the null hypothesis.

If there are many observations available, it may be useful to commence with a very large model set and to test restrictions on that loose specification. In situations such as this, in which both the null and alternative hypotheses are specified, a test of specification is being considered. An important approach to consider in testing restrictions on a loosely specified model class using time series data is that of uniquely ordered hypotheses (see Anderson (1971)). In this approach, if any hypothesis is rejected, any succeeding hypothesis is also rejected and need not be tested. It is advisable to start with a (fairly) general hypothesis (i.e. the maintained hypothesis) and to test hypotheses in increasing order of restrictiveness until a rejection occurs, or the most restrictive hypothesis is accepted (i.e. is not rejected). The accepted hypothesis is the one prior to rejection. An interesting and useful application of uniquely ordered hypotheses in econometrics is that of testing for common factors (see Sargan (1980) for theoretical considerations, and Hendry and Mizon (1978) for an illustration).
TABLE 1

Using Diagnostic Checks to Test for Possible Model Misspecification

Diagnostic check           Possible sources of error
-------------------------  ------------------------------------------------
Serial correlation         Correlated errors
                           Omitted variables
                           Incorrect functional form
                           Incorrect transformation of variables

Heteroscedasticity         Non-constant variances
                           Incorrect functional form
                           Incorrectly transformed dependent variable

Exogeneity                 Measurement errors
                           Omitted links with larger system

Functional form            Omitted variables
                           Incorrect transformations on variables
                           Incorrect functional form

Parameter constancy        Structural change
                           Varying coefficients
                           Weak forecasting ability

Non-nested alternatives    Incorrect model
                           Alternative explanations possible
2.3 Principles for Testing

Returning now to the model given in equation (2.1), let us denote the ordinary least squares (OLS) estimators of $\beta$ and $\sigma^2$ as $\hat\beta = (X'X)^{-1}X'y$ and $\hat\sigma^2 = (y - X\hat\beta)'(y - X\hat\beta)/(T-k)$. The OLS residuals are given as $\hat u = y - X\hat\beta$, with t'th component given by $\hat u_t = y_t - x_t'\hat\beta$.

Tests may be constructed using the following Principles: the Likelihood Ratio, Wald and Lagrange Multiplier (or Score) Principles (see Engle (1983) for a discussion); the Cox (1961, 1962) Principle for non-nested (or separate) families of hypotheses (see McAleer and Pesaran (1986)); and the test procedures based on the work of Durbin (1954) and Hausman (1978) (see Ruud (1984) for further details).
The first three Principles lead to tests which are asymptotically equivalent under the null hypothesis as well as under local alternatives, and the likelihood ratio test is the only one of the three which requires estimation under both the null and the alternative hypothesis. The Lagrange multiplier (LM) test is extremely straightforward to use computationally, as it can frequently be computed as $TR^2$, that is, the sample size times the coefficient of multiple determination from an auxiliary linear regression. Several examples will be given below to illustrate the simplicity of the LM procedure.

The Cox Principle is a general approach that may be used for testing non-nested hypotheses, a special case of which is that of nested hypotheses. It essentially involves centring any given test statistic under the null hypothesis, and then deriving its asymptotic null distribution. This procedure can be applied to the likelihood ratio statistic itself, or to residual sums of squares from different regression models, or to differences in some or all of the parameter estimates of alternative models. The Hausman test procedure (Hausman (1978)) can be considered to be an application of the Cox Principle. This procedure is based on the difference between two estimators, one of which is efficient under the null, but not even consistent under the alternative hypothesis, whereas the other is consistent regardless of whether the null or alternative hypothesis is true.
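As a concrete reference point for the auxiliary regressions used in the diagnostic tests, the OLS quantities defined at the start of Section 2.3 can be computed in a few lines. The following is a minimal sketch, assuming NumPy is available; the function name `ols` and the simulated data are hypothetical and serve only as an illustration.

```python
import numpy as np

def ols(y, X):
    """OLS for y = X beta + u: returns (beta_hat, sigma2_hat, residuals)."""
    T, k = X.shape
    # lstsq gives the least-squares solution without forming (X'X)^{-1} explicitly
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta_hat                 # OLS residuals
    sigma2_hat = u_hat @ u_hat / (T - k)     # degrees-of-freedom corrected variance
    return beta_hat, sigma2_hat, u_hat

# Hypothetical simulated data, for illustration only
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=T)

beta_hat, sigma2_hat, u_hat = ols(y, X)
```

By construction the residuals are orthogonal to the columns of $X$, which is the property that the $TR^2$ form of the LM tests relies on.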
2.4 Diagnostic Testing

Throughout this section it will be presumed that a regression package is available for providing OLS estimates and for storing the predictions and residuals from OLS estimation. Unless stated otherwise, all test statistics discussed below may be obtained from OLS regressions based on simple auxiliary equations. Emphasis is placed on non-structural models, and the interested reader is referred to the review by Pagan and Hall (1983) for a detailed discussion of extensions to structural models. Since the following discussion is necessarily limited in scope, the broader treatment provided by Pagan (1984) is also highly recommended.
2.4.1 Serial correlation

Serial correlation of the error term $u_t$ can lead to inefficient estimators and predictions, and to inconsistent estimates of $\beta$ if a lagged dependent variable is in the set of regressors. Perhaps the most useful tests for serial independence against AR(p) or MA(p) alternatives have been developed by Durbin (1970); see also Breusch (1978) and Godfrey (1978). If $u_t$ follows an AR(p) process, then

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + e_t,$$

where $e_t$ is white noise, and the regression model is given by

$$y_t = x_t'\beta + \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + e_t.$$

The LM test of the null hypothesis $H_0: \rho_1 = \rho_2 = \cdots = \rho_p = 0$ is obtained by replacing $u_{t-j}$ $(j = 1, 2, \ldots, p)$ with $\hat u_{t-j}$, the lagged values of the OLS residuals. The LM test statistic is calculated as $TR^2$ from the auxiliary regression

$$\hat u_t = x_t'\beta + \rho_1 \hat u_{t-1} + \rho_2 \hat u_{t-2} + \cdots + \rho_p \hat u_{t-p} + e_t^{*},$$

with $TR^2$ distributed asymptotically as $\chi^2(p)$ under the null hypothesis. The LM test for serial independence against an MA(p) process is given by the same auxiliary regression. Setting $p = 1$ gives an asymptotically equivalent test to Durbin's (1970) h-statistic.
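The $TR^2$ calculation for the serial correlation test can be sketched as follows. This is an illustrative sketch only, assuming NumPy, a regressor matrix that contains a constant, and pre-sample residuals set to zero (one common convention); the function name `lm_serial_correlation` and the simulated series are hypothetical.

```python
import numpy as np

def lm_serial_correlation(y, X, p):
    """LM statistic T*R^2 for H0: rho_1 = ... = rho_p = 0."""
    T = len(y)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta_hat
    # Lagged OLS residuals; missing pre-sample values are set to zero (assumption)
    lags = np.column_stack(
        [np.concatenate([np.zeros(j), u_hat[:-j]]) for j in range(1, p + 1)]
    )
    Z = np.column_stack([X, lags])
    coef, *_ = np.linalg.lstsq(Z, u_hat, rcond=None)
    e = u_hat - Z @ coef
    # X contains a constant, so u_hat has zero mean and no centring is needed
    r2 = 1.0 - (e @ e) / (u_hat @ u_hat)
    return T * r2

# Hypothetical simulated data: AR(1) errors versus white-noise errors
rng = np.random.default_rng(1)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=T)])
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.8 * u[t - 1] + e[t]
lm_ar = lm_serial_correlation(X @ np.array([1.0, 2.0]) + u, X, p=2)
lm_wn = lm_serial_correlation(X @ np.array([1.0, 2.0]) + e, X, p=2)
```

The statistic would then be compared with a $\chi^2(p)$ critical value; with strongly autocorrelated errors it is expected to be far larger than with white-noise errors.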
2.4.2 Heteroscedasticity

When the variance of the error term is not constant but varies with each observation, we might think of it as having the relation $\sigma_t^2 = \sigma^2 + z_t'\gamma$, where $z_t$ is given. The LM test statistic is calculated by regressing the squared OLS residuals, $\hat u_t^2$, on a constant and a vector of variables. For example, if $z_t$ is given as the scalar $E(y_t)$ or $\ln E(y_t)$, the LM statistic is obtained as $TR^2$ from the auxiliary regression

$$\hat u_t^2 = \alpha + \gamma \hat y_t + v_t \qquad \text{or} \qquad \hat u_t^2 = \alpha + \gamma \ln \hat y_t + v_t,$$

where $v_t$ is the equation error. In cases where the explicit form of heteroscedasticity might be suspected, this could be incorporated into the construction of the vector $z_t$.
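The auxiliary regression of squared residuals on a constant and $z_t$ can be sketched along the following lines. The sketch assumes NumPy; the function name `lm_heteroscedasticity`, the choice $z_t = x_t^2$ and the simulated data are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def lm_heteroscedasticity(y, X, Z):
    """LM statistic T*R^2 from regressing squared OLS residuals on a constant and Z."""
    T = len(y)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta_hat) ** 2
    W = np.column_stack([np.ones(T), Z])      # constant plus the z variables
    coef, *_ = np.linalg.lstsq(W, u2, rcond=None)
    e = u2 - W @ coef
    r2 = 1.0 - (e @ e) / np.sum((u2 - u2.mean()) ** 2)   # centred R^2
    return T * r2

# Hypothetical simulated data: error variance depends on x^2 in the first case only
rng = np.random.default_rng(2)
T = 500
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
sd = np.sqrt(0.2 + 2.0 * x**2)
y_het = 1.0 + 2.0 * x + sd * rng.normal(size=T)
y_hom = 1.0 + 2.0 * x + rng.normal(size=T)
lm_het = lm_heteroscedasticity(y_het, X, x**2)
lm_hom = lm_heteroscedasticity(y_hom, X, x**2)
```

With a single variable in $z_t$ the statistic would be referred to a $\chi^2(1)$ distribution; passing the OLS fitted values as `Z` gives the variant based on $\hat y_t$ described above.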
2.4.3 Exogeneity

In the model $y = X\beta + u$, lack of exogeneity of $X$, either through measurement errors or because some elements of $X$ are determined within a larger system, may lead to inconsistent estimators of $\beta$. The Hausman test for exogeneity can be applied straightforwardly, as discussed in Section 2.3. A computationally convenient method for calculating the test is to use a set of instrumental variables $W$ and to obtain the predictions $\hat X = W(W'W)^{-1}W'X$ from regressing the columns of $X$ on those of $W$. The auxiliary regression $y = X\beta + \hat X\delta + u$ is estimated to test the null hypothesis of exogeneity $H_0: \delta = 0$, where the F statistic is asymptotically valid for testing $H_0$ (see also Durbin (1954)).
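A sketch of the augmented regression for the exogeneity test is given below. It assumes NumPy and, to avoid perfect collinearity, adds predictions only for those columns of $X$ whose exogeneity is in doubt; that restriction, the names `rss` and `exogeneity_f_test`, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def rss(y, M):
    """Residual sum of squares from the OLS regression of y on M."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    r = y - M @ b
    return r @ r

def exogeneity_f_test(y, X, W, suspect):
    """F statistic for H0: delta = 0 in y = X beta + Xhat_s delta + u."""
    T = len(y)
    Xs = X[:, suspect]                         # columns whose exogeneity is in doubt
    coef, *_ = np.linalg.lstsq(W, Xs, rcond=None)
    Xs_hat = W @ coef                          # predictions W (W'W)^{-1} W' Xs
    XA = np.column_stack([X, Xs_hat])
    q = Xs_hat.shape[1]
    rss0, rss1 = rss(y, X), rss(y, XA)
    return ((rss0 - rss1) / q) / (rss1 / (T - XA.shape[1]))

# Hypothetical simulated data: x is correlated with the error through v
rng = np.random.default_rng(3)
T = 500
w1, w2, v, e = rng.normal(size=(4, T))
x = w1 + w2 + v
y = 1.0 + 2.0 * x + 0.8 * v + e
X = np.column_stack([np.ones(T), x])
W = np.column_stack([np.ones(T), w1, w2])      # instruments, including the constant
f_stat = exogeneity_f_test(y, X, W, suspect=[1])
```

A large value of the F statistic leads to rejection of exogeneity for the suspect regressor.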
2.4.4 Functional form

The most straightforward diagnostic check for omitted variables and/or incorrect functional form is undoubtedly Ramsey's (1969) regression specification error test (RESET). This involves adding powers of the predictions from the null model to the system and testing for the presence of the additional factors. The augmented regression is, for example,

$$y_t = x_t'\beta + \gamma_1 \hat y_t^2 + \gamma_2 \hat y_t^3 + u_t,$$

and the F test of $H_0: \gamma_1 = \gamma_2 = 0$ is distributed exactly as $F(2, T-k-2)$ under $H_0$ if the regressors $x_t$ are exogenous.

Several additional tests for incorrect functional form are available, and some of these methods have been based on the data transformations suggested by Box and Cox (1964). In particular, Andrews (1971) has derived a linearized version of the Box-Cox model which does not require estimation of the Box-Cox model itself, but only the specialization of it that is being tested. Godfrey and Wickens (1981) have derived LM tests of both linear and log-linear specifications against the more general Box-Cox model that are calculated as $TR^2$ from auxiliary regressions. For a survey of alternative tests of linear and log-linear models, as well as a discussion of their small sample properties, see McAleer (1985, Section 6).
2.4.5 Parameter constancy

Perhaps the most well known test for parameter constancy in econometrics is the Chow test, in which the null hypothesis of constancy of parameters in the linear regression model is tested against the alternative of a change in coefficients at a known point in time. The cumulative sum and cumulative sum of squares tests of Brown, Durbin and Evans (1975), which are based on recursive residuals, are available when a broader class of parameter non-constancies is considered as an alternative. The constancy of parameters may also be checked by testing for predictive ability based upon post-sample observations (see Salkever (1976) for a very simple test for parameter constancy). A useful review of this literature is given in Pesaran, Smith and Yeo (1985).
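For a change in all coefficients at a known point in time, the Chow test compares the residual sum of squares of the pooled regression with the sum of those from the two sub-samples. The following is a minimal sketch of the standard F form of the test, assuming NumPy and sub-samples each containing more than $k$ observations; the names `rss` and `chow_test` and the simulated data are hypothetical.

```python
import numpy as np

def rss(y, M):
    """Residual sum of squares from the OLS regression of y on M."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    r = y - M @ b
    return r @ r

def chow_test(y, X, t_break):
    """Chow F statistic for a change in all k coefficients at observation t_break."""
    T, k = X.shape
    rss_pooled = rss(y, X)
    rss_split = rss(y[:t_break], X[:t_break]) + rss(y[t_break:], X[t_break:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (T - 2 * k))

# Hypothetical simulated data: the slope changes from 1 to 3 at mid-sample
rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
e = rng.normal(size=T)
slope = np.where(np.arange(T) < T // 2, 1.0, 3.0)
f_break = chow_test(1.0 + slope * x + e, X, T // 2)
f_stable = chow_test(1.0 + 1.0 * x + e, X, T // 2)
```

Under the null hypothesis of constant coefficients and the assumptions of model (2.1), the statistic is referred to an $F(k, T-2k)$ distribution.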
2.4.6 Non-nested alternatives

The result of applying diagnostic checks to various model specifications may lead to two or more models that are not rejected. When one model cannot be obtained from another by the imposition of restrictions, the models are said to be non-nested. The most well known test, namely the Cox test, was introduced into the econometric literature by Pesaran (1974). Let the null hypothesis be given by $H_0: y = X\beta + u$, and let the alternative be $H_1: y = Z\gamma + v$. The simplest test of $H_0$ against $H_1$ is the test of $H_0: \gamma = 0$ in the augmented regression $y = X\beta + Z\gamma + u$, and this has been justified in the literature as testing "parameters of interest", as an encompassing test, and as a test based on Roy's Union-Intersection Principle (see McAleer and Pesaran (1986) for details). Two other Cox-type tests are given in Davidson and MacKinnon (1981) and Fisher and McAleer (1981). All of these tests are compared with regard to small sample properties in McAleer (1985, Section 7).
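The test of $\gamma = 0$ in the augmented regression can be sketched as an ordinary F test for added variables. The sketch assumes NumPy and that only those columns of $Z$ not already contained in $X$ (here, everything except the shared constant) are added; the names `rss` and `added_variable_f` and the simulated data are hypothetical.

```python
import numpy as np

def rss(y, M):
    """Residual sum of squares from the OLS regression of y on M."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    r = y - M @ b
    return r @ r

def added_variable_f(y, X, Z_extra):
    """F statistic for H0: gamma = 0 in y = X beta + Z_extra gamma + u."""
    T = len(y)
    XA = np.column_stack([X, Z_extra])
    q = XA.shape[1] - X.shape[1]               # number of added columns
    rss0, rss1 = rss(y, X), rss(y, XA)
    return ((rss0 - rss1) / q) / (rss1 / (T - XA.shape[1]))

# Hypothetical simulated data generated by the model with regressor z
rng = np.random.default_rng(6)
T = 200
x, z, e = rng.normal(size=(3, T))
y = 1.0 + 2.0 * z + e
X = np.column_stack([np.ones(T), x])           # model under H0
Z = np.column_stack([np.ones(T), z])           # model under H1
f_h0 = added_variable_f(y, X, z)               # tests the x-model against the z-model
f_h1 = added_variable_f(y, Z, x)               # roles of the two models reversed
```

Reversing the roles of the two models, as in the last line, illustrates that with non-nested alternatives either, both or neither of the models may be rejected.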
3. MACROECONOMIC MODELLING AND FORECASTING

Macroeconomic modelling is concerned with modelling the dynamics of, and the interaction between, highly aggregated macroeconomic variables such as national income, consumption, investment, unemployment and prices; of special interest is the study of the business cycle, forecasting and policy simulation.

The traditional approach to macroeconomic modelling is structural model building, in which large-scale models comprising e.g. a hundred or more equations are estimated from the data and used for economic analysis, forecasting and policy simulation. In view of the relatively large number of equations and the relatively small data sets involved, a great number of restrictions on the parameters have to be imposed to obtain reasonably low-dimensioned parameter spaces and reasonable parameter estimates. These restrictions are frequently in the form of zero restrictions, indicating that a certain variable does not influence some other variable in a certain equation of the system. Under the assumption of an a priori given specification, the theory of identifiability and maximum likelihood estimation of linear simultaneous equation systems (with uncorrelated errors) was developed by Koopmans, Rubin and Leipnik (1950); two-stage and three-stage least squares methods were subsequently developed in order to simplify calculations. However, in practical applications the most common estimation method was, and still is, ordinary least squares, despite its lack of consistency in the simultaneous equation framework.

In the period of steady economic growth, the traditional large scale models showed satisfactory forecasting behaviour.
But with the increasing fluctuations in many economic variables after the oil shock of the seventies, many of these models showed rather poor forecasting performance. At this time the first macroeconomic forecasts were made with Box-Jenkins models, and these simple univariate models often outperformed large econometric models, at least for short-term forecasts.

As has been said previously, these facts led to a widespread critique of the traditional model-building approach. One of the main reasons for the weakness of many forecasts was found in the poor specification of the respective models, where too much economic a priori information was presumed to be available. As a consequence, data-oriented specification search procedures (as described in Section 2) and new models have been proposed.

At present, different modelling philosophies, and hence different models, have been proposed, even for identical or similar data sets. Therefore, macroeconometric modelling is still far from lacking in controversy. There are still advocates of traditional structural model-building, especially if the main aim is an analysis of the interaction between variables and policy simulation, rather than (unconditional) forecasting; on the other hand, several different proposals for new modelling approaches have been made, and we will describe some of these below.

A method for obtaining the dynamic specification of a structural model is described in Zellner and Palm (1974) as follows.
Let

$$\sum_{i=0}^{p} A_i y_{t-i} = \sum_{i=0}^{q} B_i z_{t-i} + u_t \tag{3.1}$$

be the structural model, where $A_i \in \mathbb{R}^{s \times s}$ and $B_i \in \mathbb{R}^{s \times m}$ are the parameter matrices, and $y_t$, $z_t$ and $u_t$ are the endogenous, exogenous and white noise error variables, respectively. Equation (3.1) may be transformed by a left multiplication by the adjoint of $\sum_i A_i B^i$ (where $B$ is the backward-shift operator) to yield a system which is decoupled in the sense that only the i'th endogenous variable (including its lagged values) appears in its i'th equation. In this form, the equations can be treated as $s$ single equations and the Box-Jenkins method can be applied to obtain the dynamic specification for each of these single equations. The dynamic specification of the original model is then obtained from these specifications. Of course, one problem associated with this procedure is that the zero restrictions of the original model are not taken into account in the transformed single equations.

The classification of the observed variables as endogenous and exogenous often cannot be justified on a priori grounds; this is especially true if conflicting economic theories imply different classifications for the variables.
This ambiguity has led to discussions concerning the concept of exogeneity and to tests for exogeneity. One concept of exogeneity is related to causality in the sense introduced by Granger (1969). In this analysis, variables are called exogenous if there is a unidirectional causal influence from the exogenous to the endogenous variables. Tests for causality have been proposed, e.g. by Sims (1972), and Pierce and Haugh (1977). The second concept of exogeneity was given in Engle, Hendry and Richard (1983). The defining property of exogeneity here is that conditioning the other observed variables on the exogenous variables gives no loss of information about the parameters of interest.

Another approach to overcome the classification problem discussed above is to provide a symmetric treatment to all observed variables, by describing them jointly as a vector autoregressive (VAR) process. This has been proposed e.g. in Sims (1980). Once the VAR system has been estimated, questions such as the classification of variables into exogenous and endogenous, or whether there are zero restrictions among the parameters, may be answered on an empirical basis (see Sargent and Sims (1977), and Hsiao (1982)). Although
239 these vector autoregreselons usually contain slgnlfleantly fewer equations (about ten) than the usual structural models, both the dimension of the parameter space as well as the dynamic specification of the model remain problematic. In order to reduce the dimensions of the parameter space, Sargent and Sims (1977) have proposed a dynamic principal component analysis where the dynamics in the observed variables are introduced by factors of lower dimension; cycle
this is very closely related to the idea
that the business
in most macroeconomic variables can be explained by a few dynamic
factors (Bowden
(1972)).
However, in determining the number of dynamic
factors empirically, there seems to be no sharp delineation of principal components (Sims,prlvate communication). A Bayesian procedure for analyzing VAE models has been proposed in Doan Litterman and Sims (1984), which seems to have surprisingly good forecasting properties.
In this approach the prior
means of all coefficients corresponding
to lags greater than one are set equal to zero and the estimation problem is reduced to the estimation of relatively few "hyperparameters", such as the tightness of the prior means. Another method of reducing the dimension of VAR systems is the use of the AIC or BIC criteria to determine both the maximum lags in the VAR model and zero restrictions on the coefficients corresponding to smaller lags.
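For orientation, one common way of writing the two criteria for a VAR with k variables and maximum lag p, fitted to T observations, is sketched below. The notation is an assumption of this illustration rather than that of the text: \hat{\Sigma}_p denotes the estimated residual covariance matrix of the model with lag p, and the chosen lag is the one minimising the criterion.

```latex
\mathrm{AIC}(p) = \ln\det\hat{\Sigma}_p + \frac{2\,p\,k^{2}}{T},
\qquad
\mathrm{BIC}(p) = \ln\det\hat{\Sigma}_p + \frac{p\,k^{2}\,\ln T}{T}
```

When zero restrictions are imposed, the count p k^2 would be replaced by the number of freely estimated coefficients, so that both the lag length and the restrictions can be compared on the same scale.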
For
multivariate subset autoregressive modelling, see e.g. Penm and Terrell (1984). This method also performs well in forecasting.

A different approach is to concentrate on the modelling of one equation at a time (see Davidson et al. (1978) and Hendry (1986)), which implicitly assumes that the effects of simultaneity are negligible.
This approach stresses
that economic a priori knowledge is primarily concerned with the long-run
equilibrium solutions of the system, whereas in many cases economic theory has very little to say about short-run behaviour.
For this reason,
equilibrium solutions of dynamic models should be consistent with economic theory.

4. MICROECONOMETRICS

During the last decade there has been a substantial development of
econometric techniques to answer questions posed in empirical microeconomics. As an important example, consider the case where the variable being explained by the model either takes on only discrete values, or is limited in its range. Sample survey data frequently requires such models to be used:
for example,
a binary-choice model might be used to explain the decision to buy a car
or
not, and a multiple-choice model might be used to explain whether a commuter travels by bus, train or car.
The explanatory variables in each of these examples
would be the personal and economic characteristics of various individuals. The conditional probabilities of the outcomes of the discrete variable are related to various explanatory variables in the model. Owing to the nature of probabilities, the functional form of these relations must be restricted; in particular, linear relations are excluded. In binary models, where there is only one conditional probability to be explained, the most important class of models is of the form

$$P(E \mid x, \beta) = F(x'\beta),$$

where $E$ denotes the outcome of one event and $F$ is a cumulative distribution function. The most frequently used functions are the normal and the logistic, leading to probit and logit models, respectively.
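The following sketch evaluates a binary-choice probability of the form P(E | x, beta) = F(x'beta) for the two distribution functions just mentioned. It is only an illustration: the regressors (a constant and an income measure), the coefficient values and the function names are hypothetical and are not estimates from any data set.

```python
import math

def logistic_cdf(z):
    """Logistic distribution function, F(z) = 1 / (1 + exp(-z)); gives the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal distribution function via the error function; gives the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def choice_probability(x, beta, cdf):
    """Return F(x'beta) for a chosen distribution function F."""
    index = sum(xi * bi for xi, bi in zip(x, beta))
    return cdf(index)

# Hypothetical individual: constant term and income in arbitrary units.
x = [1.0, 2.0]
beta = [-1.0, 0.5]  # illustrative coefficients, not estimates

p_logit = choice_probability(x, beta, logistic_cdf)
p_probit = choice_probability(x, beta, normal_cdf)
```

Because F is a distribution function, the result always lies between zero and one, whatever the value of the index x'beta. A linear specification would not guarantee this, which is why linear relations are excluded.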
The best-known model where the dependent variable is limited in
its range is the Tobit model (Tobin (1958)).
For example, many observations in
a sample may take on the value zero (if, say, it is decided not to buy a car) while other individuals may spend varying amounts on cars.
In this sense, the
dependent variable is part qualitative and part quantitative. The standard method of estimation of these models is by maximum likelihood. For further details, see Amemiya (1981), Maddala (1983) and McFadden (1976).
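To indicate what maximum likelihood involves here, the sketch below evaluates the log-likelihood of a Tobit model in its standard form with censoring at zero, which the text does not spell out. The assumptions are a latent variable y* = x'beta + u with u normal with mean zero and variance sigma^2, and an observed value y = max(0, y*). The sample values and parameter vectors are hypothetical, and a numerical optimiser would still be needed to obtain estimates.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_logpdf(z):
    """Logarithm of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(beta, sigma, xs, ys):
    """Log-likelihood of a Tobit model censored at zero.

    Assumed latent model: y* = x'beta + u, u ~ N(0, sigma^2); observed y = max(0, y*).
    """
    total = 0.0
    for x, y in zip(xs, ys):
        mean = sum(xi * bi for xi, bi in zip(x, beta))
        if y <= 0.0:
            # Qualitative part: a zero observation contributes P(y* <= 0).
            total += math.log(normal_cdf(-mean / sigma))
        else:
            # Quantitative part: a positive observation contributes its normal density.
            total += normal_logpdf((y - mean) / sigma) - math.log(sigma)
    return total

# Hypothetical sample: constant and income; spending on cars (zero means no purchase).
xs = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
ys = [0.0, 0.0, 1.5, 3.0]

ll_a = tobit_loglik([-2.0, 1.2], 1.0, xs, ys)  # spending rises with income
ll_b = tobit_loglik([5.0, -1.0], 1.0, xs, ys)  # spending falls with income
```

For this sample the first parameter vector gives the higher log-likelihood, as one would expect since spending rises with income in the data. The two branches of the function correspond to the part qualitative, part quantitative nature of the dependent variable.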
REFERENCES

Amemiya, T. (1981): Qualitative Response Models: A Survey. Journal of Economic Literature 19, 1483-1536.
Anderson, T.W. (1971): The Statistical Analysis of Time Series. Wiley, New York.
Andrews, D.F. (1971): A Note on the Selection of Data Transformations. Biometrika 58, 249-254.
Bera, A.K. and C.R. McKenzie (1986): Alternative Forms and Properties of the Score Test. Forthcoming in Journal of Applied Statistics.
Bowden, R.J. (1972): More Stochastic Properties of the Klein-Goldberger Model. Econometrica 40, 87-98.
Box, G.E.P. and D.R. Cox (1964): An Analysis of Transformations. Journal of the Royal Statistical Society B 26, 211-252.
Box, G.E.P. and G.M. Jenkins (1970): Time Series Analysis, Forecasting and Control. Holden Day, San Francisco.
Breusch, T.S. (1978): Testing for Autocorrelation in Dynamic Linear Models. Australian Economic Papers 17, 334-355.
Brown, R.L., J. Durbin and J.M. Evans (1975): Techniques for Testing the Constancy of Regression Relationships Over Time. Journal of the Royal Statistical Society B 37, 149-192.
Chow, G.C. (1960): Tests of Equality Between Sets of Coefficients in Two Linear Regressions. Econometrica 28, 591-605.
Cox, D.R. (1961): Tests of Separate Families of Hypotheses. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1. University of California Press, Berkeley.
Cox, D.R. (1962): Further Results on Tests of Separate Families of Hypotheses. Journal of the Royal Statistical Society B 24, 406-424.
Davidson, J.E.H., D.F. Hendry, F. Srba and J.S. Yeo (1978): Econometric Modelling of the Aggregate Time-Series Relationship Between Consumers' Expenditure and Income in the United Kingdom. Economic Journal 88, 661-692.
Davidson, R. and J. MacKinnon (1981): Several Tests for Model Specification in the Presence of Alternative Hypotheses. Econometrica 49, 781-793.
Doan, T., R. Litterman and C. Sims (1984): Forecasting and Conditional Projection Using Realistic Prior Distributions. Econometric Reviews 3, 1-100.
Durbin, J. (1954): Errors in Variables. Review of the International Statistical Institute 22, 23-32.
Durbin, J. (1970): Testing for Serial Correlation in Least Squares Regression When Some of the Regressors are Lagged Dependent Variables. Econometrica 38, 410-421.
Engle, R.F. (1983): Wald, Likelihood Ratio and Lagrange Multiplier Tests in Econometrics. In: Z. Griliches and M. Intriligator (Eds.) Handbook of Econometrics. North-Holland, Amsterdam.
Engle, R.F., D.F. Hendry and J-F. Richard (1983): Exogeneity. Econometrica 51, 277-304.
Fisher, G. and M. McAleer (1981): Alternative Procedures and Associated Tests of Significance for Non-Nested Hypotheses. Journal of Econometrics 16, 103-119.
Godfrey, L.G. (1978): Testing for Higher Order Serial Correlation in Regression Equations When the Regressors Include Lagged Dependent Variables. Econometrica 46, 1303-1310.
Godfrey, L.G. and M.R. Wickens (1981): Testing Linear and Log-Linear Regressions for Functional Form. Review of Economic Studies 48, 487-496.
Granger, C.W.J. (1969): Investigating Causal Relationships by Econometric Models and Cross-Spectral Methods. Econometrica 37, 424-438.
Hausman, J.A. (1978): Specification Tests in Econometrics. Econometrica 46, 1251-1271.
Hendry, D.F. (1986): Empirical Modelling in Dynamic Economics. Forthcoming in Applied Mathematics and Computation.
Hendry, D.F. and G.E. Mizon (1978): Serial Correlation as a Convenient Simplification, Not a Nuisance: A Comment on a Study of the Demand for Money by the Bank of England. Economic Journal 88, 549-563.
Hsiao, C. (1982): Autoregressive Modelling and Causal Ordering of Economic Variables. Journal of Economic Dynamics and Control 4, 243-259.
Kent, J.T. (1982): Robust Properties of Likelihood Ratio Tests. Biometrika 69, 19-27.
Koopmans, T.C., H. Rubin and R.B. Leipnik (1950): Measuring the Equation Systems of Dynamic Economics. In: T.C. Koopmans (Ed.) Statistical Inference in Dynamic Economic Models. Wiley, New York.
Leamer, E.E. (1978): Specification Searches: Ad Hoc Inference with Nonexperimental Data. Wiley, New York.
Maddala, G.S. (1983): Limited Dependent and Qualitative Variables in Econometrics. Cambridge University Press.
McAleer, M. (1985): Specification Tests for Separate Models: A Survey. In: M.L. King and D.E.A. Giles (Eds.) Specification Analysis in the Linear Model. Routledge and Kegan Paul, London.
McAleer, M., A.R. Pagan and P.A. Volker (1985): What Will Take the Con Out of Econometrics? American Economic Review 75, 293-307.
McAleer, M. and M.H. Pesaran (1986): Statistical Inference in Non-nested Econometric Models. Forthcoming in Applied Mathematics and Computation.
McFadden, D. (1976): Quantal Choice Analysis: A Survey. Annals of Economic and Social Measurement 5, 363-390.
Mosteller, F. and J.W. Tukey (1977): Data Analysis and Regression. Addison-Wesley, New York.
Pagan, A.R. (1984): Model Evaluation by Variable Addition. In: D.F. Hendry and K.F. Wallis (Eds.) Econometrics and Quantitative Economics. Blackwell, Oxford.
Pagan, A.R. and A.D. Hall (1983): Diagnostic Tests as Residual Analysis. Econometric Reviews 2, 159-218.
Penm, J.H.W. and R.D. Terrell (1984): Multivariate Subset Autoregressive Modelling with Zero Constraints for Detecting "Overall Causality". Journal of Econometrics 24, 311-330.
Pesaran, M.H. (1974): On the General Problem of Model Selection. Review of Economic Studies 41, 153-171.
Pesaran, M.H., R.P. Smith and J.S. Yeo (1985): Testing for Structural Stability and Predictive Failure: A Review. The Manchester School 53, 280-295.
Pierce, D.A. and L.D. Haugh (1977): Causality in Temporal Systems. Journal of Econometrics 5, 265-293.
Ramsey, J.B. (1969): Tests for Specification Errors in Classical Linear Least Squares Regression Analysis. Journal of the Royal Statistical Society B 31, 350-371.
Ruud, P.A. (1984): Tests of Specification in Econometrics. Econometric Reviews 3, 211-242.
Salkever, D.S. (1976): The Use of Dummy Variables to Compute Predictions, Prediction Errors and Confidence Intervals. Journal of Econometrics 4, 393-397.
Sargan, J.D. (1980): Some Tests of Dynamic Specification for a Single Equation. Econometrica 48, 879-897.
Sargent, T.J. and C.A. Sims (1977): Business Cycle Modelling Without Pretending to Have Too Much A Priori Economic Theory. In: C.A. Sims (Ed.) New Methods of Business Cycle Research. Federal Reserve Bank of Minneapolis.
Sims, C.A. (1972): Money, Income and Causality. American Economic Review 62, 540-552.
Sims, C.A. (1980): Macroeconomics and Reality. Econometrica 48, 1-48.
Tobin, J. (1958): The Estimation of Relationships for Limited Dependent Variables. Econometrica 26, 24-36.
White, H. (1982): Maximum Likelihood Estimation of Misspecified Models. Econometrica 50, 1-25.
Zellner, A. and F. Palm (1974): Time Series Analysis and Simultaneous Equation Econometric Models. Journal of Econometrics 2, 17-54.