Lecture Notes in Control and Information Sciences Edited by A.V. Balakrishnan and M.Thoma
10 Jan M. Maciejowski
The Modelling of Systems with Small Observation Sets
Springer-Verlag Berlin Heidelberg New York 1978
Series Editors
A. V. Balakrishnan • M. Thoma
Advisory Board
A. G. J. MacFarlane • H. Kwakernaak • Ya. Z. Tsypkin
Author
Dr. Jan Marian Maciejowski
Maudslay Research Fellow, Pembroke College, Cambridge
also with the Control and Management Systems Group, Cambridge University Engineering Department, Mill Lane, Cambridge CB2 1RX, England

ISBN 3-540-09004-5 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-09004-5 Springer-Verlag New York Heidelberg Berlin

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher.
© by Springer-Verlag Berlin Heidelberg 1978
Printed in Germany
SUMMARY
The p r o b l e m systems,
when
is i n t r o d u c e d defined
of a s s e s s i n g
only
of a system,
of a v a i l a b l e algorithm un d e r
A general "information more no
of models,
criteria
information gain
including
and its c o m p u t a t i o n gain
for the The
language about
of m o d e l l i n g ,
to the p r o b l e m
of s y s t e m
account
of the size of the set
A model
is d e f i n e d
observation
to be an set of a s y s t e m
to find
that
in the s e n s e the m o d e l
with
gain.
nonlinear
criterion
dynamical
with
gain
program.
i n s i g n i f i c a n t as the o b s e r v a t i o n
that
models,
is d e m o n s t r a t e d . requires
The c h o i c e
the m o d e l l e r ' s
It is s h o w n
class
The use of i n f o r m a t i o n
of rival m o d e l s
of i n f o r m a t i o n
for a w i d e
stochastic
is s t r a i g h t f o r w a r d .
is a s s o c i a t e d
with
It is p r o v e d
c a n exist,
in general,
its
consistency
is d i s c u s s e d .
algorithm"
as a c o m p u t e r
the system.
of a model,
and its
is a s u i t a b l e
assessment
calculation
be e x p r e s s e d
solution
is proposed,
modelling
Information
information
a characterisation
of the q u a l i t y
it is not possible,
the h i g h e s t
accounts
of a l g o r i t h m i c
the o u t p u t
criterion
conventional
that
is
restrictions.
gain",
"universal
identification
for
observations.
specified
are available,
to a t h e o r y w h i c h
taking
for c o m p u t i n g
of
of
a partial
while
models
f r o m a set of o b s e r v a t i o n s
on to d e v e l o p
constitutes
identification,
System
The c o n c e p t s
are d r a w n
interpreting
of o b s e r v a t i o n s
and discussed.
that b e h a v i o u r .
which
sets
as the p r o g r e s s i o n
the b e h a v i o u r
theory
small
and
this
that
of p r o g r a m m i n g
a priori choice
sets b e c o m e
the m o d e l
beliefs
becomes
large.
A detailed
investigation of
shows
"the s m a l l e s t
program.
t h a t it is p o s s i b l e
language"
A priori
knowledge
t h e r e f o r e be c o n s i d e r e d required
to r u n a p a r t i c u l a r
assumed
about a system can
to be d e f i n e d by the s m a l l e s t
language
to run the m o d e l .
Finally, which
required
to s p e a k p r e c i s e l y
the e f f e c t on m o d e l
system observations
t h a t a "safe"
are c o d e d
c o d i n g exists,
a s s e s s m e n t as w o u l d
a s s e s s m e n t of the m a n n e r is e x a m i n e d .
which often
It is f o u n d
leads to the
the use of m o s t o t h e r c o d i n g s .
in
same
ACKNOWLEDGEMENTS
The idea of examining modelling in the light of algorithmic information theory is due to Professor A.G.J. MacFarlane. His constant encouragement and enthusiasm, as well as detailed criticism, has been an essential ingredient of this work.

I have also benefited from discussions with many members of the Control and Management Systems Group, of whom Dr. F.P. Kelly, Dr. M.B. Beck and Dr. S.R. Watson deserve special mention. The quotation from Newton in the last chapter was pointed out to me by Dr. A.T. Fuller.

Financial support for this research came from the Science Research Council, and in the final stages from Pembroke College.

Roberta Hill has produced her usual excellent standard of typing, but special thanks are due to her for struggling so successfully through chapter 5.

My wife has asked me not to write one of those embarrassing acknowledgements, saying how impossible this research would have been without her constant encouragement and support; consequently I shall leave this to the reader's imagination.
CONTENTS
1. Introduction
2. Survey of Related Work
3. A Characterisation of Modelling
4. Incorporation of A Priori Knowledge
5. Fragments of Programming Languages
6. h-Comparability
7. Table Look-Up Codings
8. Discussion and Conclusion
References
Appendices:
A Formal Semantics of Programming Languages
B Syntax of the Algol W-Support of the Gas-Furnace Models
Table Look-Ups
C Diagrams for the Gas-Furnace Models
1. INTRODUCTION
1.1 Motivation
The areas in which the scientific method has been demonstrably and spectacularly successful are characterised by the possibility of performing experiments, or making observations, more or less freely whenever these are deemed desirable. The result of this has been that explicit consideration of the size of the set of observations from which a model is hypothesised, and to which a model is fitted, has been neglected. Any doubts which arise about the model can be resolved by further experimentation and observation. This pleasant property increasingly disappears as one enters the domains of complex industrial processes, environmental control systems, management systems, and socio-economic systems. The work described here aims to clarify the relationship between the smallness of the available observation sets for such systems and the degree of usefulness of the models obtained for them.

Until recently, the class of models which could be used in scientific investigations was restricted by a very practical consideration. The behaviour of the model had to be understood, and that understanding could only be obtained from the theory of the model. The model was constrained to be sufficiently simple for theoretical investigation to be possible.

The availability of the computer has changed this situation radically. It is now possible to investigate the behaviour of a model by simulation, with hardly any theoretical understanding of it. Consequently this constraint on the complexity of useful models has been removed, or at least greatly relaxed. It is now possible to postulate a complicated model structure, to observe its simulated behaviour, and to adjust the details of the model until its simulated behaviour resembles the behaviour of the system being investigated.

When
is such a model useful? When does it give any understanding of how the system really works? When can it be used as a reliable guide to how the system will behave in the future? The purpose of the thesis is to throw some light on these questions. A further aim is to investigate how rival models of the same system should be assessed. Most of the work reported in this thesis is concerned with the details of rival model assessment, but it is clear that the ability to distinguish between ostensibly competing models is intimately connected with the ability to say how good an isolated model is.

Why should a simulation model of the type described above not be useful or reliable? If it reproduces the observed system behaviour, is that not sufficient evidence to indicate the quality of the model? In fact, is it not clear that the better the reproduction of the observed behaviour, the better the model?
is the p o s s i b i l i t y complexity checked
against
the
time.
clear
the only
is no m o r e value.
agrees w i t h model
of some v a r i a b l e
no o t h e r
that v a l u e s
the v a l u e
in some
taken,
model,
then
It n e v e r
observations, prediction
also
confidence
amounts
assessment
say,
w o u l d be
than
confidence
increases
little
(which does
in
value
but
predictions, measurements of the
very quickly. after
doubt not
in the
to say
the p r e d i c t i o n s
of course,
have
is
any o t h e r
If further
in the m o d e l
third
it is
of c o n f i d e n c e
are b e t t e r
agree w i t h
correct
then
It is now p o s s i b l e
guesses.
one w o u l d
at some
It
is taken w h i c h
of the model,
to certainty,
it.
The p r e d i c t e d sense)
by the m o d e l
than m e r e
that
time of the v a r i a b l e
is nil.
increases.
and these
about
of the two o b s e r v a t i o n s ,
the p r e d i c t i o n
sense,
its
at two d i f f e r e n t
of the v a r i a b l e
with
reasonable
predicted
since
Suppose
information
if a third m e a s u r e m e n t
immediately
reason
and it is b e i n g
example.
(in an i n t u i t i v e
However,
The b a s i c
the model,
simple
of the m o d e l
likely
behaviour,
set of data.
are t a k e n
on the b a s i s
that
is no.
unconstrained,
following
to p r e d i c t
the p r e d i c t i o n
that
imply
only
ten
the next
that it
be). The
model
answer
If a linear v a r i a t i o n
proposed,
of the o b s e r v e d
"overfitting"
and that w e have
is d e s i r e d
would
of
a small
two m e a s u r e m e n t s
are
Our
is r e l a t i v e l y
Consider
times,
reproduction
confidence
clearly
which
depends
one is w i l l i n g
on the d i f f e r e n c e
to ascribe between
to this
the n u m b e r
of o b s e r v a t i o n s observations
required
the a v a i l a b l e then we have situation
of a r b i t r a r y
number
it "explains"
no
to c o n s t r u c t
that
by s a y i n g
then w e
if the n u m b e r
about
it fit the o b s e r v a t i o n s , i s
of o b s e r v a t i o n s ,
the model, This
that
have been m a d e
of
If all of
in its p r e d i c t i o n s .
also be d e s c r i b e d decisions
the model.
are used
confidence
to m a k e
and the n u m b e r
to c o n s t r u c t
observations
can
in o r d e r
which
the m o d e l ,
the same
have no c o n f i d e n c e
as~the
in the
model. This
point was made succinctly by Poincaré, when he dismissed Jeans' classical explanation of the ultraviolet catastrophe and the specific heat of solids (1):

"It is obvious that by giving suitable dimensions to the communicating tubes between his reservoirs and giving suitable values to the leaks, Jeans can account for any experimental results whatever. But this is not the role of physical theories. They should not introduce as many arbitrary constants as there are phenomena to be explained; they should establish connections between different experimental facts, and above all they should allow predictions to be made."

On the other hand, the accuracy with which the model reproduces the observed behaviour is clearly significant. If only a slight increase in complexity results in a large increase in accuracy, then in some sense fewer "arbitrary constants" have been added to it than the additional "number of phenomena" which it now explains. What is required for model assessment is some "trade-off" between the complexity of a model and its accuracy.
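
Poincaré's objection to models with as many arbitrary constants as observations can be illustrated numerically. The following is a minimal sketch only: the observations are invented for illustration, and the helper `interpolate` is a hypothetical name, not part of the original work. A polynomial with five free coefficients reproduces five observations exactly, yet its prediction for the next time instant lies far outside the observed range.

```python
from fractions import Fraction


def interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs)-1 through (xs, ys).

    Lagrange form with exact rational arithmetic, so "exact reproduction"
    really is exact and not subject to rounding.
    """
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# Hypothetical observations (an assumption): values alternating between 0 and 1.
times = [0, 1, 2, 3, 4]
values = [0, 1, 0, 1, 0]

# Five arbitrary constants reproduce the five observations exactly ...
fitted = [interpolate(times, values, t) for t in times]

# ... but the model's prediction for the next time is nowhere near 0 or 1.
prediction = interpolate(times, values, 5)
print(fitted == values, prediction)
```

Perfect reproduction of the observations therefore says nothing, on its own, about the reliability of the prediction; some account has to be taken of how many constants were spent in achieving it.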
A prerequisite
for this a wide
is a m e a s u r e
class
appears
casting
of m o d e l s
in such
of fit of m o d e l
behaviour
to the o b s e r v e d
is the
as a c o m p o n e n t is thus
a suitable
of m o d e l
achieved
assessment
qrthodox would
be
f r o m a small
approach
a form,
in
that behaviour
The r e q u i r e d
model
class,
ment problem
as a s t a t i s t i c a l has
of the
complexity
in
indeed been
follow
of m o d e l s
some
statistical
to f o r m u l a t e
decision
the a s s e s s -
problem.
investigated,
such
of m o d e l
assessment
then be p o s s i b l e
type e n c o u n t e r e d
We do not
the
and to p o s t u l a t e
It m a y
of a p p r o a c h
to the p r o b l e m
to e x a m i n e
framework.
(5).
introduced
complexity.
by a s s e s s i n g
to
manner.
A more
models
is a p p l i c a b l e
innovation
trade-off
chosen
of models.
which
A major
this w o r k poorness
of c o m p l e x i t y
even
in c o n t r o l
an a p p r o a c h
This
type
for d y n a m i c a l
studies for the
(2)(3)(4) following
reasons. Any m e t h o d w i l l be
arrived
appropriate
(such as l i n e a r
Such
compared
investigated market,
for a n a r r o w
(statistical)
corrupted
a method will
are b e i n g
only
difference-equation
set in a p a r t i c u l a r "observations
at from s t a t i s t i c a l
by w h i t e ,
n o t be u s e f u l - for e x a m p l e ,
is the b e h a v i o u r
it may be d e s i r e d
Forrester's
"Industrial
class
of m o d e l s
models,
for e x a m p l e ) ,
environment Gaussian,
(such
firms
models
being in some
a model based
techniques
noise").
different
if the s y s t e m
to c o m p a r e
as
additive
if two very
of c o m p e t i n g
Dynamics"
considerations
on
(6) w i t h
a model
which
uses
market's
game
theory
firms'
elements. usually
simulation
When
the p r o b a b i l i t y Furthermore, economic
difficult
when
under
conditions.
few o b s e r v a t i o n s
and there
is little
the
statistical
specification
may
i t s e l f be very u n c e r t a i n .
by n o t a s s u m i n g conclusions These fruitful
it to be known;
considerations
to i n v e s t i g a t e
by a p a i n s t a k i n g
three
of r e l e v a n t are
about
it,
environment little
is lost
misleading
indicate
that
by m a k i n g
the g e n e r a l
and d i f f i c u l t
it may be m o r e of m o d e l s
of complex,
as few a s s u m p t i o n s situation,
analysis
rather
as
than
of each m o d e l
as it arises.
Overview
We
case
in fact,
the a s s e s s m e n t
systems
and e x a m i n i n g
structure,
knowledge
these,
may be avoided.
understood
possible
behaviours
of a s y s t e m
of the s y s t e m ' s In this
(8).
and s o c i o -
stationariness
a priori
of
When modelling
processes. available,
it is
and i m p o r t a n t
to assume
Finally,when
nonlinear
variables
environmental
it may not be a p p r o p r i a t e
1.2
and the
the e v o l u t i o n
of r e l e v a n t
interesting
transient
contain
also d y n a m i c a l ,
to d e s c r i b e
investigating the m o s t
often
are
distributions
systems,
occur
models
such m o d e l s
extremely
poorly
actions
responses.
Realistic
often
(7) to e x p l a i n
develop
of A p p r o a c h
and Results.
a characterisation
"components":
the s y s t e m
of m o d e l l i n g
to be m o d e l l e d ,
which
has
a model
of
this system, The
and a c r i t e r i o n
system
pair of sets
of q u a l i t y
to be m o d e l l e d
of o b s e r v a t i o n s are
and accuracy,
observation
Each
each
therefore
discrete-time that this
of d a t a detail
does n o t
time,
of this
become
of such
reflects
evident
a system
to
the r e a l i t i e s
be d e f i n e d
in m o r e
which
implies
compute
a reversed
time
obtained.
exercise
interest,
ordering.
functions
defined
These
It only
are u s e l e s s
in a n e w s i t u a t i o n exercise),
as a r e f e r e n c e ,
with
will
of a p a r t i c u l a r
subsets
to a d m i t
be of m u c h
of the m o d e l l i n g
to m o d e l s
which
is b r o a d e n o u g h
system may behave
serve
the o u t p u t
a lack of any
of the m o d e l l i n g
Any r e s t r i c t i o n
onto
not n o r m a l l y
observations
the goal
by s p e c i f y i n g
The
is any a l g o r i t h m w h i c h maps
or even
h o w the
type w i l l
the success
It m e r e l y
interpretation
algorithms
on the p a r t i c u l a r
(presumably
finite.
it w i l l
the m o d e l s
definition
would
such as those w h o s e
for d e d u c i n g
resolution
a set of d i s c r e t e - s t a t e ,
of the o b s e r v a t i o n s
which
allows
limited
to be r a t i o n a l .
to be
However,
system
This
of
and output.
is a s s u m e d
A system will
of the
observations.
direction
by a
1.3.
subsets
algorithms
like
category.
in sec.
input
is a s s u m e d
constrain
collection.
A model certain
looks
to be d e f i n e d
obtained with
measurements.
be of the same
also
always
set of o b s e r v a t i o n s
system
is t a ke n
of its
Since m e a s u r e m e n t s
of the model.
but models
respect
to w h i c h
be assessed. type
of the o b s e r v a t i o n s
is a c c o m p l i s h e d lie
in the
domain
of the a l g o r i t h m ,
observations
are
deterministic
successive
successive
outputs,
the W i e n e r
- Kolmogorov
blocks
of i n p u t
elements
to be the c o r r e s p o n d i n g
For example, n e e d o n l y map
and w h i c h
images.
difference
blocks
whereas
of the o u t p u t
of input
stochastic
or K a l m a n
and p a s t o u t p u t
equation
models
observations
predicting
types m u s t map
observations
to
models
of
successive
to s u c c e s s i v e
outputs. The
term "algorithm" may be interpreted as "computer program". Thus we think of models as programs for computing the output observations, and these programs may use the specified subsets of the observations to help them in this task. This viewpoint would be excessively arbitrary, if it were not for the power of Church's Thesis (9), which states that any procedure which satisfies the intuitive notion of an "algorithm" can be expressed in any one of the equivalent formalisations of the theory of algorithms, and hence can be expressed as a computer program.

When the model is written as a computer program in some programming language, the criterion of quality is taken to be the shortness of that program, as measured by the number of characters in the program. The length of the program (relative to the programming language) is a measure of the number of arbitrary decisions which have been made in constructing the model. Furthermore, a model is required to compute the output observations exactly (to the accuracy with which the observations were originally made).
In order to do this, the model must generate internally those terms which would conventionally be thought of as "fitting errors". Since the programming language has a finite number of terminals, the length of the model increases when these terms increase. The criterion of quality thus incorporates a particular trade-off between complexity and approximation.
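
The way in which fitting errors lengthen a model can be sketched as follows. This is an illustration under stated assumptions, not the coding used in the thesis: the "programming language" is reduced to a plain character string, the trend formula `1000+37*t` and the error values are invented, and all function names are hypothetical. The model program consists of the formula followed by the errors needed to reproduce the observations exactly, and its length is counted in characters.

```python
TREND_TEXT = "1000+37*t"  # hypothetical model formula as it appears in the program text


def trend(t):
    """The same formula in executable form."""
    return 1000 + 37 * t


def table_program(observations):
    """A 'model' which merely lists the observations."""
    return ",".join(str(y) for y in observations)


def model_program(observations):
    """Formula text followed by the fitting errors required for exact reproduction."""
    errors = [y - trend(t) for t, y in enumerate(observations)]
    return TREND_TEXT + ";" + ",".join(str(e) for e in errors)


def run_model_program(program):
    """Recompute the observations from the errors stored in the program text."""
    errors = [int(e) for e in program.split(";")[1].split(",")]
    return [trend(t) + e for t, e in enumerate(errors)]


# Invented fitting errors (an assumption), small in the first case, 100x larger in the second.
small = [0, 1, 0, -1, 0, 0, 2, 0, 0, -1, 0, 1, 0, 0, 0, -2, 0, 0, 1, 0]
close_fit = [trend(t) + e for t, e in enumerate(small)]
poor_fit = [trend(t) + 100 * e for t, e in enumerate(small)]

len_table = len(table_program(close_fit))
len_close = len(model_program(close_fit))
len_poor = len(model_program(poor_fit))
print(len_table, len_close, len_poor)
```

In this toy coding the same formula yields a longer program when it approximates the observations less well, while both remain shorter than the bare listing; the lengths depend, of course, on the coding chosen.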

The above characterisation of modelling is explained in more detail in Chapter 3. Support for it is given in section 2.2. The essence of this support is that the length of the shortest program required to compute a sequence displays properties analogous to the properties of the entropy associated with a probability space. In particular, a long sequence, which requires a maximally long program to compute it, passes every effective test for randomness (asymptotically, with probability 1). This suggests that the amount by which it is possible to "compress" the program (model) required to compute a set of observations (system) represents the amount of information which it has been possible to extract from the observations. If the only model which has been found is one that merely reads out the observations from a look-up table, then no "compression" has been achieved, and such a model conveys no information about the observations.

A consequence of our characterisation is that no algorithm can exist for finding the best model (according to the above criterion of quality) of an arbitrary system.
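
The idea of compression relative to a table look-up can be sketched with a general-purpose compressor. This is only a rough illustration: since no algorithm can find the shortest program, the length of a `zlib`-compressed listing is used here as a crude, computable stand-in, and the function names and data are assumptions introduced for the example.

```python
import random
import zlib


def listing(observations):
    """The table look-up 'program': the observations written out in full."""
    return ",".join(str(y) for y in observations).encode("ascii")


def information_gain(observations):
    """Bytes saved by the compressed description relative to the plain listing.

    zlib is only a stand-in for the (uncomputable) shortest program.
    """
    table = listing(observations)
    return len(table) - len(zlib.compress(table, 9))


# A highly regular sequence and an irregular one over the same alphabet.
regular = [t % 7 for t in range(2000)]
rng = random.Random(0)  # fixed seed so that the sketch is repeatable
irregular = [rng.randrange(7) for _ in range(2000)]

gain_regular = information_gain(regular)
gain_irregular = information_gain(irregular)
print(gain_regular, gain_irregular)
```

The regular sequence admits a far shorter description than its listing, whereas the irregular one can be shortened much less; in the terms used above, more information has been extracted from the former.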

The choice of programming language to be used, for assessing the quality of a model, can be viewed as the specification of "what is to be taken for granted". It should therefore be made in the light of the modeller's a priori knowledge about the system, and of the purposes of the modelling exercise. In Chapter 4 this connection is examined more closely. It is shown that, if the observation sets are large enough, then the results of model assessment are independent of the choice of programming language. This can be interpreted to mean that the modeller's a priori beliefs become less significant as the set of observations available to him grows.

Nevertheless, the assessment of models of small observation sets is dependent on the modeller's specification of his a priori beliefs. Consequently such an assessment cannot be taken to be definitive. However, this is mitigated by the fact that the modeller does not need to choose between mutually exclusive sets of a priori beliefs: he can stipulate programming languages which imply a greater or smaller state of knowledge.

Several different models, even when written in the same language, will rarely use exactly the same features of that language. It is therefore questionable whether a comparison of their lengths gives a measure of their complexity relative to the same set of assumptions. Chapters 5 and 6 resolve this difficulty. Chapter 5 develops a formal equivalent of "a program makes use of such-and-such facilities of a language". A prerequisite for this is a formal method of defining the semantics of programming languages. One such method is outlined in Appendix A. Chapter 6 then uses the concepts developed in Chapter 5 to specify some conditions under which models may be meaningfully compared. It is demonstrated that model assessment is not much affected if these conditions are not met exactly.

The details of the complexity/approximation trade-off, which is inherent in our proposed method of model assessment, depend on the precise manner in which the observations are coded in the programming language. It is convenient to separate this aspect of the selection of a suitable programming language from those aspects considered in Chapter 4; consequently the coding of observations is discussed in Chapter 7. A distinguished minimal coding is shown to exist, and it is argued that this is a natural coding to use for model assessment.

The modelling of one particular system (Box and Jenkins' gas-furnace data (10)) is used as an example throughout. The rival models considered for this system are very simple and in no way represent the range of possibilities discussed in sec. 1.1. Nevertheless, the considerations raised there apply even to these simple models, as will be seen in Chapter 3. It will become apparent that the assessment method proposed in this thesis is immediately applicable to a much larger class of models.
1.3 System Identification, Realisation and Modelling
Modern notion
developments
of a d y n a m i c a l
experimental with
data
of systems
system
(ii),
the i n f e r e n c e
not y e t o b s e r v e d
conditions,
behaviour,
known
under
theory
emphasise
as an a b s t r a c t
(12),
of s y s t e m
and M o d e l l i n @
(13).
summary
Modelling
behaviour
under
is c o n c e r n e d
by w h i c h
is a c h i e v e d
is the p o s t u l a t i o n
the
system,
which
and
the s e l e c t i o n ,
from t h e s e
candidate
is p r e f e r r e d
on the basis
of some
criterion.
its h e a v y
emphasis
that
modern
discussing
as
However,
observations, upon
Consequently,
a more
than
if a s y s t e m
then as little
we adopt
structures,
and
one
the
of one The
on
to adopt,
less u s e f u l
when
view
of c o m p o n e n t s " . and the
by r e f e r e n c e
abstract
modelling
for
observations,
is to be m o d e l l e d ,
is to be g a u g e d
it, b e f o r e
these
natural
the o l d e r
structures
this
following
to the
structure
has begun,
success
should
be
as possible.
definition:
(1.3.1)
A system observations, U=
with
"an i n t e r c o n n e c t i o n
of the m o d e l l i n g
Definition
with
is t h e r e f o r e
modelling,
of a s y s t e m
(i)
compatible
v i e w of a system,
observations,
imposed
are
but
of p a s t
The m e t h o d
of a b s t r a c t
of
specified
from o b s e r v a t i o n s
conditions.
the
S is d e f i n e d S=
(u I , u 2
to be an o r d e r e d
(U, Y)
, where:
, .
,uM)
and Y=
(Yl
p a i r of
' Y2
'
,YN )
are the input and output observation sets respectively; $u_i = (u^i_1, u^i_2, \ldots, u^i_{\ell_i})$ and $y_i = (y^i_1, y^i_2, \ldots, y^i_{m_i})$ are ordered sets of observations carried out at time $t_i$, where $t_1, t_2, \ldots, t_N$ is the natural time ordering; $u^i_j \in \{\text{rationals}\} \cup \{b\}$, where $b$ (blank) denotes a missing observation; similarly for $y^i_j$; and

(ii) with the convention that $(b, b, \ldots, b) = b$: if $u_i = b$ then $\ell_i = 0$; if $y_i = b$ then $m_i = 0$; if $u_i \neq b$ then $u^i_{\ell_i} \neq b$; if $y_i \neq b$ then $y^i_{m_i} \neq b$; and $y_N \neq b$.

Conditions (ii) serve only to ensure that adding on a set of blanks (missing observations) does not create a new system.
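
Definition (1.3.1) translates fairly directly into a data structure. The sketch below is one possible reading, offered under assumptions: the names `System`, `make_system` and `strip_trailing_blanks` are hypothetical, rationals are represented by `fractions.Fraction`, and the blank $b$ by `None`. Normalisation follows conditions (ii): trailing blanks within a block are dropped, an all-blank block becomes empty, and trailing blank output blocks are removed so that the last output block is not blank.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional, Tuple

Reading = Optional[Fraction]   # None plays the role of the blank b
Block = Tuple[Reading, ...]    # one u_i or y_i: the observations made at time t_i


def strip_trailing_blanks(block):
    """Drop blanks at the end of a block; an all-blank block becomes ()."""
    items = list(block)
    while items and items[-1] is None:
        items.pop()
    return tuple(items)


@dataclass(frozen=True)
class System:
    """An ordered pair (U, Y) of input and output observation sets."""
    inputs: Tuple[Block, ...]
    outputs: Tuple[Block, ...]


def make_system(inputs, outputs):
    """Build a System normalised according to conditions (ii)."""
    u = tuple(strip_trailing_blanks(b) for b in inputs)
    y = [strip_trailing_blanks(b) for b in outputs]
    while y and not y[-1]:      # the last output block must not be blank
        y.pop()
    return System(u, tuple(y))


# Adding blanks (missing observations) does not create a new system.
padded = make_system([(Fraction(1), None)], [(Fraction(2),), (None, None)])
plain = make_system([(Fraction(1),)], [(Fraction(2),)])
print(padded == plain)
```

A blank in the interior of a block is kept, since it records a genuinely missing observation at a known position; only padding at the end is discarded.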

For concreteness, we have specified that $u_i$, $y_i$ refer to observations made at time $t_i$, since we are interested primarily in dynamical models. However, this interpretation is not essential. Also, each $u_i$, $y_i$ could be a multidimensional finite array of observations, rather than a one-dimensional array, without affecting later results.

The input observation set is allowed to be empty, in order to admit devices such as noise generators and oscillators, as systems of the form $(b, Y)$. It has been argued that when stating the general problem of system identification, it should not be necessary to distinguish between input and output (14). The two should be lumped together as a "system behaviour", and the task of system identification should
include
the
it seems the
two
separation
essential cases
shown
and
internal
structures
procedure inputs
must
have
sets.
The
f r o m the sets
lead
form
Our
especially field
difference
a system
define
of
cc. c e ~ n ~ d r
however,
interaction have
is s o m e
cbservaLions concise
referred we prefer the
with
with
to above
o f its
the
set of observations
of observation
themselves and
systems
(b, Y). seem odd,
theory.
by
by
In t h i s
a set
of
examining
equations. process.
We We
of a system
hehaviour.
"laws"
- such
"explain"
The as t h e
this
set of equations as
a "system".
reason
the o b s e r v a t i o n
a system
reverse
this
are
assume
because
eD~TircFme~.t - 'I o t h e r w o r d s ,
- which
to regard
control
the e x i s t e n c e
set of
are
for
the
observations
at f i r s t
of these
the
its
that
a
identification
unless
pair
input
its b e h a v i o u r
solutions
of
with
to define
properties
aware
may
any
It is
t h a t U # b)
familiar
and investigate
we
a system,
"system"
equations,
its
Note
different
between
of both
the
between
labelled
very
But
as an o r d e r e d
distinguishes
i t is c o n v e n t i o n a l
the
point.
distinguished.
of
to t h o s e
t h a t ?,e are
the
same model
(b, U) ( p r o v i d i n g
definition
boxes
-
observations.
U a n d Y, w h i c h
o f the
The black
consider
However,
of distinguishing
to h a v e
are
ordering
"output".
can be expected
to the
defined
output
i.
and an earthing
and outputs
that we
and
a means
in Fig.
"sink"
generator
"input"
to have
"source"
signal
of
goal
of because
of modelling
set of equations
interaction. as a " m o d e l " ,
Hence and
The definition of "system" which is proposed above is much cruder than the definitions usually encountered. It is worth stating in full one such definition, that of Kalman, Falb and Arbib (11):

Definition (1.3.2)

A dynamical system (input/output sense) is a composite mathematical concept defined as follows:

(a) (i) There is a given time set $T$, a set of input values $U$, a set of acceptable input functions $\Omega = \{\omega : T \to U\}$, a set of output values $Y$, and a set of output functions $\Gamma = \{\gamma : T \to Y\}$.

(ii) (Direction of time). $T$ is an ordered subset of the reals.

(iii) The input space $\Omega$ satisfies the following conditions:

(1) (Nontriviality). $\Omega$ is nonempty.

(2) (Concatenation of inputs). An input segment $\omega_{(t_1, t_2)}$ is $\omega \in \Omega$ restricted to $(t_1, t_2) \cap T$. If $\omega, \omega' \in \Omega$ and $t_1 < t_2 < t_3$, there is an $\omega'' \in \Omega$ such that $\omega''_{(t_1, t_2)} = \omega_{(t_1, t_2)}$ and $\omega''_{(t_2, t_3)} = \omega'_{(t_2, t_3)}$.

(b) There is given a set $A$ indexing a family of functions $F = \{f_\alpha : T \times \Omega \to Y,\ \alpha \in A\}$; each member of $F$ is written explicitly as $f_\alpha(t, \omega) = y(t)$, which is the output resulting at time $t$ from the input $\omega$ under the experiment $\alpha$. Each $f_\alpha$ is called an input/output function and has the following properties:

(i) (Direction of time). There is a map $\iota : A \to T$ such that $f_\alpha(t, \omega)$ is defined for all $t > \iota(\alpha)$.

(ii) (Causality). Let $\tau, t \in T$ and $\tau < t$. If $\omega, \omega' \in \Omega$ and $\omega_{(\tau, t)} = \omega'_{(\tau, t)}$, then $f_\alpha(t, \omega) = f_\alpha(t, \omega')$ for all $\alpha$ such that $\tau = \iota(\alpha)$.
The reason why our definition (1.3.1) can be cruder than definition (1.3.2) is that we consider our system to be defined by observations of reality. Our system is not so much "an abstract summary of experimental data", as the system of definition (1.3.2) is, but rather is the data itself. We do not have to include conditions ensuring "causality" or "concatenation of inputs", because we consider these to be conditions which will be imposed on the class of models which we are willing to consider for the system, rather than conditions on the system itself.
Definition (1.3.2) is undoubtedly very suitable for the deductive development of theories of system behaviour, but it is inappropriate as a starting point for the study of system identification, primarily because it assumes that the main task of system identification has already been accomplished. To see that this is so, consider the family of input/output functions $F$. Under a particular experiment $\alpha$ the corresponding function $f_\alpha$ determines the future output behaviour of the system for any acceptable input $\omega$. But the task of system identification is precisely to determine how the system will behave in response to certain inputs. In other words, it is to determine some of the $f_\alpha$'s.
Furthermore, the division of the record of observations we possess into separate "experiments" is properly a part of the identification process. If a machine is operated, switched off, and operated again, then the usual decision not to take account of its behaviour when it is switched off is already a reflection of some abstract conception of the machine. Someone examining a continuous record of its behaviour, and who does not know that it has been switched off, may find a suitable division into separate "experiments" to be far from obvious.

In our view,
system identification is essentially a progression from a system in the form of definition (1.3.1) to a system in the form of definition (1.3.2). This involves first of all a division of the observations into experiments, whereupon the system appears as a set of restrictions of the functions $F$ to certain subsets of $T \times \Omega$. There remains the determination, from these restrictions, of what the extensions of the functions $F$ are on larger subsets of $T \times \Omega$ (commonly on all of $T \times \Omega$). The trouble is that, even for a particular $\omega \in \Omega$, there are infinitely many possibilities to choose from.

It seems as well to distinguish the problem of system identification from that of system realisation, as defined by Kalman, Falb and Arbib (11). To do this, it is necessary to quote yet another definition of "dynamical system" - again from (11):
Definition (1.3.3)

A dynamical system (state space sense) is a composite mathematical concept defined by the following axioms:

(a) There are given sets $T$, $U$, $\Omega$, $Y$, $\Gamma$ satisfying all the properties required by definition (1.3.2).

(b) There is given a state set $X$ and a state-transition function $\varphi : T \times T \times X \times \Omega \to X$ whose value is the state $x(t) = \varphi(t;\tau,x,\omega) \in X$ resulting at time $t \in T$ from the initial state $x = x(\tau) \in X$ at initial time $\tau \in T$ under the action of the input $\omega \in \Omega$. $\varphi$ has the following properties:
    (i) (Direction of time). $\varphi$ is defined for all $t \ge \tau$, but not necessarily for all $t < \tau$.
    (ii) (Consistency). $\varphi(t;t,x,\omega) = x$ for all $t \in T$, all $x \in X$, and all $\omega \in \Omega$.
    (iii) (Composition property). For any $t_1 < t_2 < t_3$ we have $\varphi(t_3;t_1,x,\omega) = \varphi(t_3;t_2,\varphi(t_2;t_1,x,\omega),\omega)$ for all $x \in X$ and all $\omega \in \Omega$.
    (iv) (Causality). If $\omega, \omega' \in \Omega$ and $\omega_{(\tau,t]} = \omega'_{(\tau,t]}$, then $\varphi(t;\tau,x,\omega) = \varphi(t;\tau,x,\omega')$.

(c) There is given a readout map $\eta : T \times X \to Y$ which defines the output $y(t) = \eta(t,x(t))$. The map $(\tau,t] \to Y$ given by $\sigma \mapsto \eta(\sigma,\varphi(\sigma;\tau,x,\omega))$, $\sigma \in (\tau,t]$, is an output segment, that is, the restriction $\gamma_{(\tau,t]}$ of some $\gamma \in \Gamma$ to $(\tau,t]$.
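The axioms of definition (1.3.3) can be checked on a small example. The following is a minimal sketch, assuming a discrete time set (the integers), a scalar state, and a first-order linear update; the names `phi`, `eta` and the coefficient `a` are illustrative choices, not taken from the text.

```python
def phi(t, tau, x, omega, a=0.5):
    """State at time t, starting from state x at time tau, under input omega.

    Assumed toy dynamics: x <- a*x + omega(s) for each s in (tau, t].
    """
    if t < tau:
        # Axiom (i): phi need only be defined for t >= tau.
        raise ValueError("phi is only defined for t >= tau")
    for s in range(tau + 1, t + 1):
        x = a * x + omega(s)
    return x


def eta(t, x):
    """Readout map: the output is a fixed function of time and state."""
    return 2.0 * x


def omega(s):
    """An arbitrary acceptable input function."""
    return float(s % 3)


def omega_other(s):
    """Agrees with omega on (1, 7] and differs everywhere else."""
    return omega(s) if 1 < s <= 7 else 99.0


x0 = 1.25
consistency = phi(3, 3, x0, omega) == x0
composition = phi(7, 1, x0, omega) == phi(7, 4, phi(4, 1, x0, omega), omega)
causality = phi(7, 1, x0, omega) == phi(7, 1, x0, omega_other)
output_at_7 = eta(7, phi(7, 1, x0, omega))
```

The three booleans correspond to axioms (ii), (iii) and (iv). Because the state is updated using inputs on the half-open interval $(\tau, t]$ only, inputs outside that interval cannot influence the result.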
The problem of system realisation is now defined to be the problem of constructing a dynamical system in the sense of definition (1.3.3) from a system in the sense of definition (1.3.2). Kalman, Falb and Arbib (11) state that this is "simply an abstract way of looking at the problem of scientific model building". We disagree. If this were so, then scientific model building would be "merely" a mathematical problem. But the major problems with model building are, in fact, philosophical ones - questions concerning the possibility and nature of inference, the validity of induction, and all the problems arising out of our uncertainty about the nature of scientific method. These problems do not arise in connection with system realisation. They all arise, however, when the question of system identification, as defined above, is considered.

That is not to say, however, that system identification is "an abstract way of looking at scientific model building", any more than system realisation is. In order to be useful, a model must not only specify the input/output functions of a system; it must usually provide also a means of computing them. To do this it must take the form of a system in the sense of definition (1.3.3). A better abstract formulation of "scientific model building" is, then: the problem of constructing a system in the sense of definition (1.3.3) from a system in the sense of definition (1.3.1). This includes within it both the essentially philosophical problem of system identification, and the essentially mathematical one of system realisation.
It is important to note that this division is a purely conceptual one; its aim is to bring to the surface the precise nature of the modelling problem. It is not intended to imply that a modeller first carries out a process of identification, and then of realisation. Indeed, a model in state-space form is often used to obtain the input-output function of the model, rather than the other way round. But the point is that in going apparently directly from the data to a state-space model, the modeller has implicitly carried out a process of identification, and should be aware of the philosophical difficulties associated with this before accepting the model as a representation of the system.

An illustration of the above distinctions is provided by Ho's algorithm (11). This algorithm constructs a system in state-space form from a sequence of data. The existence of such an algorithm may appear to imply that for any sequence of data it is possible to determine uniquely a realisation, and that this must therefore be the "true" model of the system generating the sequence. However, the use of Ho's algorithm entails the assumptions that the system input/output function is linear, is time-invariant, and that its minimal realisation has the smallest dimension compatible with the data sequence. These assumptions of course specify the input/output function completely. Thus the decision to use Ho's algorithm itself constitutes the process of system identification.
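The point that the modelling assumptions, rather than the data, fix the model can be illustrated on a scalar sequence. The sketch below is not Ho's algorithm: it substitutes the Berlekamp-Massey procedure over the rationals, which solves the analogous scalar problem of finding the shortest linear, time-invariant recurrence compatible with a data sequence. A second function extrapolates the same data under a different assumption (a polynomial of lowest degree). All function names are illustrative.

```python
from fractions import Fraction


def minimal_recurrence(s):
    """Shortest c with s[n] = c[0]*s[n-1] + ... + c[L-1]*s[n-L] for all n >= L."""
    s = [Fraction(v) for v in s]
    C, B = [Fraction(1)], [Fraction(1)]  # current and previous connection polynomials
    L, m, b = 0, 1, Fraction(1)
    for n in range(len(s)):
        # Discrepancy between s[n] and the prediction of the current recurrence.
        d = sum(C[i] * s[n - i] for i in range(len(C)))
        if d == 0:
            m += 1
            continue
        T = C[:]
        C = C + [Fraction(0)] * (len(B) + m - len(C))
        for i, bi in enumerate(B):
            C[i + m] -= (d / b) * bi
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1  # the recurrence had to grow
        else:
            m += 1
    C = C + [Fraction(0)] * (L + 1 - len(C))
    return [-c for c in C[1:L + 1]]


def linear_next(s):
    """Next term under the 'minimal linear time-invariant recurrence' assumption."""
    c = minimal_recurrence(s)
    return sum(ci * s[-1 - j] for j, ci in enumerate(c))


def polynomial_next(s):
    """Next term under the 'lowest-degree interpolating polynomial' assumption."""
    rows = [list(s)]
    while len(rows[-1]) > 1:
        r = rows[-1]
        rows.append([y - x for x, y in zip(r, r[1:])])
    return sum(r[-1] for r in rows)  # sum of the last entry of each difference row


data = [1, 1, 2, 3, 5, 8]
```

Both functions reproduce `data` exactly, yet they disagree about the next observation. The choice between them is an act of identification that the data alone cannot settle.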
The term "identification" is used somewhat differently above than is conventional. This is deliberate, the aim being to distinguish clearly between the philosophical and the mathematical problems involved in modelling.

Our definition of system identification does not rely on the precise form of definition (1.3.2). Zadeh's notion of a system as an "oriented abstract object" (i.e. as a set of ordered pairs of input and output time functions) (12), or Windeknecht's concept of a system (13) - a relation on the cartesian product of the input and output function spaces - could be used in place of definition (1.3.2). Indeed, both of these approaches are closer in spirit to our own definition of a system as a set of observations. But they both share with definition (1.3.2) the essential characteristic that makes them suitable as "outputs" of the identification process, rather than as "inputs": the assumption that the behaviour of the system in any new situation is specified.

The use of the term "system" is perhaps unfortunate, because it can be understood in so many different senses. In a study of modelling, definitions (1.3.2) and (1.3.3) could really be said to define models rather than systems. The reason for not doing so is that we wish to reserve the term "model" for something rather more specific. On the other hand, we persist in calling our basic data a "system", in order to emphasise the fact that all abstract conceptions about a system should be derived from its interaction with its environment - in other words, from our observations of it.
Strictly speaking, the separation of the observations into "inputs" and "outputs" already reflects some abstract conceptions about the system, but, as has already been explained, this separation is considered to be an essential part of the "input" to the process of identification. Such aspects as the choice of measurement scales, and even how these measurements are arranged in the arrays $u_i$, $y_i$ (in definition (1.3.1)) also imply abstract preconceptions about the system. However, some preconceptions must be assumed, and we choose to regard observations as primitive. This is equivalent to considering that "objectivity" for the modeller is defined by the observations available to him.
2. SURVEY OF RELATED WORK

2.1 Complexity Measures

This survey will not attempt to cover the entire literature on the inference of models from samples of behaviour, since such an attempt would constitute a survey of much of the literature of science and philosophy. In particular, the literature of the philosophy of science and statistical inference will not be included. Instead, the survey will trace the development of the thesis that the quality of scientific hypotheses can be associated with the shortness of programs for computing them. It will also examine the work of those authors who are most relevant to the present study: namely, those who have considered the inference of hypotheses using the notion of support, and those who have examined the nature of computational complexity.

It is convenient to begin by stating some unifying ideas introduced by Blum (15), (16), to which most of the other work can be related. Blum's aim is a characterisation of the complexity of computable functions. His method is the development of an axiomatic theory whose theorems do not depend on the class of computing machines which is being considered. Our main interest is in his axioms, since these will help to classify other work.

Let $N$ denote the set of nonnegative integers, $\{\varphi_i\}$ an effective listing of all partial recursive functions of one variable, and $\{M_i\}$ a set of "machines", such that $M_i$ computes $\varphi_i$. The precise nature of $M_i$ is not specified. It can be viewed as a program.

Axioms for Size (Blum (15)). A recursive function $|\;|$ mapping $N$ (viewed as the set of indices) into $N$ (viewed as the set of sizes) is called a measure of the size of machines, $|i|$ being called the size of $M_i$, if and only if
(1) there exist at most a finite number of machines of any given size, and
(2) there exists an effective procedure for deciding, for any $y$, which machines are of size $y$.

Axioms for Step-counting (Blum (16)). The set $\{\Phi_i : i = 0, 1, \ldots\}$ is a step-counting measure on $\{\varphi_i : i = 0, 1, \ldots\}$ if and only if
(1) $\Phi_i$ is a partial recursive function,
(2) $\Phi_i(n)$ converges if and only if $\varphi_i(n)$ converges,
(3) $M(i,n,m) = 1$ if $\Phi_i(n) = m$, and $0$ otherwise, is (total) recursive. ($M$ is a measure on computation).

An example of a size measure is the length of a program, if measured by the number of characters appearing in it. The number of statements in a program is not usually a size measure, because it violates axiom (1). The length $\ell(p)$ of a program $p$ is not a step-counting measure because it violates axiom (2). Examples of step-counting measures are the number of statements executed by a program, and the amount of time taken
by a program (measured in CPU clock units). These measures are not size measures, since they depend on the argument $n$.

In the literature, size measures are often called "static" complexity measures, and step-counting measures are called "dynamic" complexity measures. Löfgren (17) gives them the names "complexity of description" and "complexity of interpretation", respectively. In some of the literature (e.g. Hartmanis and Hopcroft (18)), "complexity measure" is used as a synonym for "step-counting measure".

The theorems obtained by Blum are not sharp enough to be useful for our purposes. This is an inevitable consequence of the fact that they hold for any class of machines. For example: Let $|\;|_M$ and $|\;|_T$ be measures of the sizes of the machines $\{M_i\}$ and $\{T_i\}$ respectively, where the machines are so ordered that $M_i$ and $T_i$ both compute $\varphi_i$. Then (Blum (15)), there exists a recursive function $g$ such that, for all $i$:
(1) $|i|_M \le g(|i|_T)$,
(2) $g(|i|_M) \ge |i|_T$.
Thus, the sizes of $M_i$ and $T_i$ are not "too different". However, the bound imposed by $g$ is much greater than any bound which we may find useful. (It is only a bound in the sense that there are functions which grow more quickly than any recursive function).

A result of more interest to us is that there exists a primitive recursive function whose smallest primitive recursive derivation is considerably larger than its smallest general recursive derivation. The practical implication of this is that if we characterise modelling as a search for short algorithms, then we should be prepared to use programming languages which allow general recursion, even if we only wish to compute primitive recursive functions.
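The third step-counting axiom discussed earlier in this section (that $M(i,n,m)$ is total recursive even when the computation itself may not halt) can be made concrete. The following is a minimal sketch, assuming that a "machine" is modelled as a Python generator which yields once per step; the machines `collatz` and `loop_forever` and the function `steps_equal` are illustrative inventions, not Blum's construction.

```python
def collatz(n):
    """Halts when n reaches 1 (for the inputs used here); one step per update."""
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield
    return n


def loop_forever(n):
    """A machine whose computation diverges on every argument."""
    while True:
        yield


def steps_equal(machine, n, m):
    """Analogue of M(i, n, m): 1 if the machine halts on n after exactly m steps.

    The decision always terminates, because at most m + 1 steps are simulated,
    whether or not the machine itself would ever halt.
    """
    run = machine(n)
    for used in range(m + 1):
        try:
            next(run)
        except StopIteration:
            return 1 if used == m else 0
    return 0


halts_in_8 = steps_equal(collatz, 6, 8)
diverges = steps_equal(loop_forever, 0, 1000)
```

The step count depends on the argument `n`, which is why such a measure is "dynamic". The number of characters in the text of `collatz` does not depend on `n`, and is "static".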
2.2 Algorithmic Information Theory

2.2.1 Kolmogorov Complexity
Much of the support for our view of modelling is provided by the theory of complexity and information developed by Kolmogorov (19), (20). Kolmogorov complexity is defined in terms of lengths of programs, and is therefore more akin to a static than a dynamic measure. It is, however, a property of sequences (observed behaviours in our application) rather than of machines for computing them. The results presented below have been taken from the excellent survey paper by Zvonkin and Levin (21). Proofs of most of them can be found there.

The results in which we are interested are concerned with finite binary sequences, which we call words. By means of the bijection:

$\Lambda \leftrightarrow 0,\; 0 \leftrightarrow 1,\; 1 \leftrightarrow 2,\; 00 \leftrightarrow 3,\; 01 \leftrightarrow 4, \ldots$

we can regard words as nonnegative integers, and conversely (here $\Lambda$ is the empty word). Thus effective procedures transforming words into words are viewed as partial recursive functions mapping integers into integers.
We denote by $\ell(x)$ the length of the word $x$, i.e. the number of bits it contains, and by $xy$ the concatenation of the words $x$ and $y$. Whether $x$ denotes a word or the corresponding integer will always be clear from the context. For example, if $x = 00$, then by $\ell(x)$ is meant $\ell(00)$, but by $\log(x)$ is meant $\log(3)$. Thus we do not distinguish between $x = 00$ and $x = 3$. Note that the above bijection is not the usual binary coding of integers. In particular, sequences which differ only in the number of leading zeros are associated with different integers.

Clearly, the following result holds:

$\ell(xy) = \ell(x) + \ell(y)$    (2.1)

Also, it can easily be shown that

$|\ell(x) - \log_2(x)| \le 1, \quad (x > 0)$    (2.2)

Definition (2.2.1)

Let $F^1$ be an arbitrary partial recursive function of one variable. Then the Kolmogorov complexity of the word $x$ with respect to $F^1$ is defined as:

$K_{F^1}(x) = \begin{cases} \min \ell(p), & \text{s.t. } F^1(p) = x \\ \infty, & \text{if no such } p \text{ exists} \end{cases}$    (2.3)
A generalisation of this is the concept of conditional complexity:

Definition (2.2.2)

The conditional Kolmogorov complexity of the word $x$, for a given $y$, with respect to the partial recursive function $F^2$ (of two variables) is defined to be

$K_{F^2}(x|y) = \begin{cases} \min \ell(p), & \text{s.t. } F^2(p,y) = x \\ \infty, & \text{if no such } p \text{ exists} \end{cases}$    (2.4)
Complexity was introduced by Kolmogorov (19), although it was also used by Solomonoff (22) and Chaitin (23), who used the terminology "the length of the shortest program required to compute $x$ from $y$, using a Turing Machine which computes the function $F^2$". This terminology shows the intended interpretation of the definitions. $p$ is thought of as a code or program for $x$, and $F$ is thought of as a decoding device or computer. Since $p$ is the shortest program which will do the job, the (conditional) complexity is in some sense the "smallest amount of information required to obtain $x$ (from $y$), using $F$".
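The word/integer bijection and the definition of complexity with respect to a particular decoder can be sketched in a few lines. The decoder `toy_decoder` below is a hypothetical total function invented for illustration (it is not an optimal function in the sense used in this section). Because it has a literal mode, every word has some program, so the brute-force search is guaranteed to stop.

```python
from math import log2


def int_to_word(n):
    """Bijection of the text: 0 -> empty word, 1 -> '0', 2 -> '1', 3 -> '00', ..."""
    return bin(n + 1)[3:]  # binary of n + 1 with its leading 1 removed


def word_to_int(w):
    """Inverse bijection."""
    return int("1" + w, 2) - 1


def toy_decoder(p):
    """Assumed decoder F: '1'+u outputs u literally; '0'+u outputs u eight times."""
    if p == "":
        return None  # F is undefined on the empty program
    return p[1:] if p[0] == "1" else p[1:] * 8


def complexity(x, decoder=toy_decoder):
    """K_F(x): length of the shortest p with decoder(p) == x.

    Enumerating integers through int_to_word visits programs in order of
    increasing length, so the first hit is a shortest program.
    """
    n = 0
    while True:
        p = int_to_word(n)
        if decoder(p) == x:
            return len(p)
        n += 1


regular = "01" * 8
k_regular = complexity(regular)
k_irregular = complexity("0110")
```

For this decoder the regular word of 16 bits has complexity 3 (program `001`), while the 4-bit word `0110` can only be produced literally and has complexity 5. The search works only because the decoder is total; for a general partial recursive $F$ no such exhaustive procedure is available.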
Theorem (2.2.3)

There exists a partial recursive function $F_0^2$ (called optimal), such that, for any partial recursive function $G^2$, there exists a constant $C$ (depending only on $F_0^2$ and $G^2$), such that

$K_{F_0^2}(x|y) \le K_{G^2}(x|y) + C$    (2.5)

This theorem is due to Kolmogorov (19) and Solomonoff (22). If $F_0^2$ is thought of as a general purpose computer, then the theorem is easily seen to be true. There is a program for $F_0^2$ which causes it to simulate $G^2$. Thus, at worst, the program for obtaining $x$ using $G^2$ is prefixed by the simulation program for $F_0^2$, and the resulting program computes $x$ using $F_0^2$. $C$ is the length of the simulation program. However, $K_{F_0^2}(x|y)$ may be less than $K_{G^2}(x|y) + C$, because the greater flexibility of $F_0^2$ may allow $x$ to be obtained by some shorter program. As a corollary of this theorem we have:

Theorem (2.2.4)

For any two optimal partial recursive functions $F^2$ and $G^2$, there exists a constant $C$ (depending only on $F^2$ and $G^2$), such that

$|K_{F^2}(x|y) - K_{G^2}(x|y)| \le C$    (2.6)

We henceforth assume that a fixed optimal function has been chosen, omit the subscript, and refer to complexities simply as $K(x)$ or $K(x|y)$.

At this point it is useful to reflect on the significance of complexity, in the light of Church's Thesis. It is an intuitively appealing measure of the smallest amount of information required to obtain an entity, $x$, by any effective procedure. Furthermore, and remarkably, result (2.6) together with

$\lim_{x \to \infty} K(x|y) = \infty$    (2.7)
show that, for most such entities, this measure is approximately invariant. If the program $p$ is considered to be the coding of a theory which relates $x$ to $y$, then complexity gives a measure of the number of arbitrary decisions embodied in a theory. It can therefore also be viewed as measuring the (intuitive) quality of theories, in the sense that it is generally believed that, other things being equal, a simpler theory is better than a more complex one.

The following theorem has most important consequences for modelling, as will be seen later.

Theorem (2.2.5)

The function $K(x)$ is not partial recursive. Moreover, no partial recursive function $\phi(x)$, defined on an infinite set of points, can coincide with $K(x)$ in the whole of its domain of definition.

In other words, there is no effective way of obtaining the Kolmogorov complexity $K(x)$ (the corresponding theorem holds for $K(x|y)$). The theorem was first stated without proof by Kolmogorov (19). Zvonkin and Levin (21) prove the theorem on the following lines. Suppose that a partial recursive function $\phi(x)$, defined on an infinite set of points, and coincident with $K(x)$ on the domain of definition of $\phi(x)$, does exist. Then we can imagine a computer which operates as follows: for each $m$, it uses $\phi(x)$ to find an $x$ which is in the domain of $\phi(x)$, and for which $K(x) > m$. Denote this value $F(m)$. Then $K(F(m)) > m$. However, this computer (call it $F$) requires only to be given $m$ in order to find $F(m)$. Hence $K_F(F(m)) \le \ell(m)$. But we know that for some $C$, $K(F(m)) \le K_F(F(m)) + C \le \ell(m) + C$. Hence we know that, for some $C$, and for all $m$, $m < \ell(m) + C$, which is false. Hence such a $\phi$ cannot exist.

The above proof relies on $F(m)$ being general recursive. That this can be arranged for, is shown as follows: the domain of $\phi(x)$ is by supposition an infinite recursively enumerable set. But every such set contains an infinite recursive subset (Rogers (9), theorem 5-IV). Hence it is possible for $F(m)$ to examine only integers $x$ for which $\phi(x)$ is defined, and therefore $F(m)$ is defined for each $m$.
2.2.2 Randomness

Note that there exists a $C$, independent of $x$, such that

$K(x) \le \ell(x) + C$    (2.8)

This result says that if all else fails, we can always compute $x$ by making $p$ a copy of $x$, together with instructions telling the optimal computer simply to copy its input, symbol by symbol. This corresponds to computing $x$ by using a "table look-up".
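The "table look-up" bound (2.8) can be illustrated with an ordinary compressor. The sketch below uses the compressed length produced by Python's `zlib` as a stand-in for the length of a description; this is only a computable upper-bound proxy chosen for illustration and is not $K(x)$ itself. The function name `description_bits` and the fixed random seed are assumptions of the example.

```python
import random
import zlib


def description_bits(data: bytes) -> int:
    """Length in bits of one particular description of data (its zlib encoding)."""
    return 8 * len(zlib.compress(data, 9))


rng = random.Random(0)  # fixed seed so that the example is repeatable
regular = b"01" * 5000
irregular = bytes(rng.getrandbits(8) for _ in range(10000))

regular_bits = description_bits(regular)
irregular_bits = description_bits(irregular)
raw_bits = 8 * len(irregular)
```

The regular sequence admits a description far shorter than itself. For the irregular one the compressor can do little better than store the data verbatim plus a small fixed overhead, which plays the role of the constant $C$ in (2.8).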
Theorem (2.2.6)

The proportion of words of length $\ell(x)$ for which $K(x) < \ell(x) - m$ does not exceed $2^{-m+1}$.

This means that most finite sequences have nearly maximal complexity. Kolmogorov and Chaitin proposed that the property of maximal complexity is equivalent to the property of randomness. In other words, when we say that a sequence is "random", what we mean is that we have no way of computing it, other than by looking up its terms in a table. This idea is a development of Church's suggestion that von Mises' "Law of Excluded Gambling Systems" (25), (26) for collectives (random sequences) can be formalised by stipulating that no effective procedure can exist for computing successful gambles on the outcomes of such sequences, and associating "effective procedure" with "partial recursive function".

Sequences are considered to be non-random if they contain sufficiently many regularities. A regularity is "any verifiable property of a sequence inherent only in a narrower class". More precisely, the measure of the set of sequences containing more than $m$ bits of regularity cannot exceed $2^{-m}$. It is essential that the regularities are verifiable, so that, as Zvonkin and Levin (21) say: "We regard as random those sequences which under any algorithmic test and in any algorithmic experiment behave as random sequences".

To explain the above paragraph more carefully, we shift our attention from finite to infinite binary sequences. We denote an infinite binary sequence by $\omega$, and the set of all such sequences by $\Omega$. The initial segment of $\omega$, of length
$n$, is denoted by $(\omega)_n$.

Definition (2.2.7)

Let $P$ be a probability measure on $\Omega$. A correct method of proof of $P$-regularity, or $P$-test, is defined to be a function $F(x)$ which satisfies the following conditions:
(a) It is general recursive
(b) for $m > 0$, $P\{\omega : F(\omega) \ge m\} < 2^{-m}$, where $F(\omega) = \sup_n F((\omega)_n)$.

$F(\omega)$, which is the "quantity of regularities" found by a test, is taken to be the value of the test. The $P$-test $F$ is said to reject $\omega$ if $F(\omega) = \infty$.

Let $\Gamma_x$ denote the set of all binary sequences whose initial segment is the word $x$. A probability measure on $\Omega$ (strictly, on the Borel $\sigma$-algebra of subsets of $\Omega$) can be defined by giving its values on the sets $\Gamma_x$. (To see this, imagine infinite binary sequences to be binary expansions of real numbers in $[0,1)$. Then $\Gamma_{01}$, for example, corresponds to $[\frac{1}{4},\frac{1}{2})$).

Definition (2.2.8)

A probability measure $P$ on $\Omega$ is computable if there exist general recursive functions $F(x,n)$ and $G(x,n)$, such that the rational number

$\alpha_P(x,n) = \dfrac{F(x,n)}{G(x,n)}$    (2.9)

approximates $P(\Gamma_x)$ to within an accuracy of $2^{-n}$.

Theorem (2.2.9)

For any computable measure $P$ there exists a $P$-test $F$, called universal, such that for any $P$-test $G$ a constant $C$ can be found such that, for all $\omega$,

$G(\omega) \le F(\omega) + C$    (2.10)

Definition (2.2.10)

A sequence $\omega$ is called random with respect to a measure $P$ if it withstands any $P$-test.

With this definition, every sequence which is random with respect to $P$ satisfies every conceivable effectively verifiable law of probability theory, since the violation of such a law would constitute a regularity which would be detected by some $P$-test.

Now consider a finite sequence $x$. The following construction can be used to define the "number of regularities", $p(x)$, in $x$, with respect to the uniform measure $L$, defined by $L\{\Gamma_x\} = 2^{-\ell(x)}$ ($L$ corresponds to Lebesgue measure on $[0,1)$, and is the measure which corresponds to Bernoulli sequences with generating probability $\frac{1}{2}$). Let $F(x,n)$ denote the minimum value of the universal $L$-test on words of length $n$ beginning with $x$. Then

$p(x) = \lim_{n \to \infty} F(x,n)$    (2.11)
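Condition (b) of definition (2.2.7) can be verified exhaustively for a very simple test with respect to the uniform measure $L$. The sketch below assumes a hand-made test, `leading_zero_test`, which counts leading zeros minus one; it is one particular $L$-test chosen for illustration and is far weaker than the universal test of theorem (2.2.9).

```python
from fractions import Fraction
from itertools import product


def leading_zero_test(x):
    """F(x) = max(0, number of leading zeros of x, minus 1). Computable for every word."""
    zeros = len(x) - len(x.lstrip("0"))
    return max(0, zeros - 1)


def uniform_measure(x):
    """L{Gamma_x} = 2^(-l(x)), kept exact with rationals."""
    return Fraction(1, 2 ** len(x))


def mass_at_least(m, n):
    """L-measure of the sequences whose initial segment of length n scores >= m.

    The score never decreases when a word is extended, so for this test the
    supremum over initial segments up to length n is attained at length n.
    """
    words = ("".join(bits) for bits in product("01", repeat=n))
    return sum(uniform_measure(w) for w in words if leading_zero_test(w) >= m)


masses = {m: mass_at_least(m, 8) for m in range(1, 6)}
```

Scoring at least $m$ requires $m + 1$ leading zeros, an event of measure $2^{-(m+1)}$, which is strictly below the $2^{-m}$ allowed by condition (b). The only sequence this test rejects is the all-zero sequence.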
The quantity $\ell(x) - p(x)$ is analogous in several respects to complexity, and is related to it by the following theorem:

Theorem (2.2.11)

There exists a constant $C$, such that

$|(\ell(x) - p(x)) - K(x)| \le 4\ell(\ell(x)) + C$    (2.12)

As a corollary of this we obtain (since a random sequence has a finite number of regularities):

Theorem (2.2.12)

For any sequence $\omega$, random with respect to $L$, there exists a constant $C$, such that

$K((\omega)_n) \ge n - 4\ell(n) - C$    (2.13)

The above development is due to Martin-Löf (24), and supports the contention that "random" is equivalent to "maximally complex".

2.2.3 Information

As was
remarked in section 2.2.1, Kolmogorov complexity is an appealing measure of the intuitive concept of the "amount of information" required to obtain, or reconstruct, an object by any effective procedure. An analogy can be discerned between complexity and entropy: it is generally accepted that entropy is the "average amount of information" required to select (i.e. predict) an event from a given probability space. Furthermore, entropy is usually taken to be a measure of the randomness of a collection of events. Section 2.2.2 suggests that complexity is a suitable measure of the randomness of a sequence. Pursuing the analogy, we make the following definition, originally proposed by Kolmogorov (19):

Definition (2.2.13)

The quantity of information in $y$ about $x$ is

$I(y:x) = K(x) - K(x|y)$    (2.14)

This is in direct analogy with the classical Shannon quantity of information in one random variable about another, which is defined by

$J(\eta:\xi) = H(\xi) - H(\xi|\eta)$    (2.15)

where $H$ denotes entropy. This classical quantity has the properties:

$J(\eta:\xi) \ge 0$    (2.16)
$J(\xi:\xi) = H(\xi)$    (2.17)
$J(\eta:\xi) = J(\xi:\eta)$    (2.18)

For the algorithmic quantity of information, the corresponding properties hold only approximately:

Theorem (2.2.14)
•here
exist positive
constants,
C] ,C2,C~,
(independent
37
of x and y)
such
I(x:y)
that
~ -CI
. . . . . . . . . . . . . . . . . . .
II(x:x)
- K(x)I~C2
II(x~y)
- I(y:x)l~12£(K(xy))
A concrete established
Theorem
by
. . . . . . . . . . . . . . . .
link between the
(2.19)
+ C3
complexity
following
(2.20)
. . . . . . . . and entropy
(2.21) is
theorem:
(2.2.15)
Suppose
that
i words,
each
possible
words
with
frequency
the
a constant
of
a word
C,
length
of
r.
length
such
qk
x, of
length
Suppose
r
(label
(k=l,
...,
£(x) that
such 2r).
= ir, each
a word Then
consists
of k)
the
and
that
< i(H(qk) + ~(i)) + C . . . . . . . . . . . . 2r H(qk) = ~ k=lqk l°g2qk . . . . . . . . . . . . . e(i)
-- C r
A closer information stationary
£n
i
i
÷
connection
O as
i ~ "
between
can be established random
in x
exists
K(x) where
2r
occurs
there
of
processes
algorithmic
for
(see
. . . . . . . . . .
arbitrary
Zvonkin
(2.22) . ~2.23)
(2.24)
and probabilistic ergodic
and Levin
(21)).
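
The leading term of the entropy bound in Theorem (2.2.15) can be illustrated numerically. The sketch below is ours, not the author's: it splits a binary word into i blocks of length r, forms the empirical block frequencies, and returns i times their entropy. The function names are hypothetical, and the constant C and the vanishing term are ignored, so the result is only the dominant part of the bound and not a bound on K(x) itself.

```python
from collections import Counter
from math import log2


def block_entropy(x: str, r: int) -> float:
    """Entropy, in bits per block, of the empirical frequencies of the
    length-r blocks that make up the word x."""
    if r <= 0 or len(x) % r != 0:
        raise ValueError("len(x) must be a multiple of a positive block length r")
    blocks = [x[j:j + r] for j in range(0, len(x), r)]
    i = len(blocks)
    counts = Counter(blocks)
    # q_k = count / i for every block k that actually occurs in x.
    return -sum((c / i) * log2(c / i) for c in counts.values())


def leading_term(x: str, r: int) -> float:
    """i * H: the dominant term of the entropy bound, in bits
    (the additive constant and the vanishing term are omitted)."""
    return (len(x) // r) * block_entropy(x, r)
```

For the word 0101 the choice r = 1 gives four bits, whereas r = 2 gives zero, since the single block 01 is simply repeated; the block length at which regularity becomes visible therefore matters.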
2.2.4 The Work of Chaitin and Schnorr

Chaitin (23) also investigated the properties of binary sequences which need maximal-length programs for their computation. His formalism was different from that used above: he worked directly with particular Turing machines. Chaitin showed that most sequences have maximal complexity, and he obtained a result relating complexity to entropy, similar to theorem (2.2.15). Also, he showed that the predicate $[\ell(x)=n \;\&\; K(x)<m]$ is not decidable. This is a weaker result than theorem (2.2.5), since from it one can deduce that K(x) is not general recursive, but not that K(x) is not partial recursive.

Chaitin suggested that "maximal complexity" is an appropriate explication for "randomness". He also hinted that complexity may be a measure of the poorness of scientific theories, arguing that "static" complexity is a more appropriate measure for this than "dynamic" complexity (he did not use these terms of course, since they were not yet current at the time).

Schnorr (27) has investigated related questions without using the concept of program. Instead, he has examined Gödel numberings of partial recursive functions. He has shown that there exist optimal Gödel numberings, relative to which the lowest Gödel number of each partial recursive function is rather low when compared with the lowest Gödel number of the same function relative to any other Gödel numbering. Optimal Gödel numberings correspond closely to Kolmogorov's optimal partial recursive functions: if an optimal partial recursive function is universal, then it is an optimal Gödel numbering. (However, Schnorr shows that not all optimal functions are universal). It should be noted that Schnorr investigates properties of functions, whereas Kolmogorov investigates properties of sequences, without specifying the functions which are to compute them.

Schnorr's main result is that an optimal Gödel numbering is essentially unique in the following sense: given any two optimal Gödel numberings, one can obtain a Gödel number of a function f, relative to one of the numberings, from a Gödel number of f relative to the other numbering, by a recursive isomorphism t, such that t and $t^{-1}$ are linearly bounded (i.e. $\limsup_n t(n)/n < \infty$ and similarly for $t^{-1}$). However, Meyer (28) has shown that the sets of lowest Gödel numbers relative to any two Gödel numberings are not necessarily recursively isomorphic. Furthermore, Meyer remarks that this result extends to sets of "s-minimal" indices defined by any recursive function s which is a size-measure in the sense of Blum.

Thus a particular interpretation of these results of Schnorr and Meyer is the following: if one knows the shortest program for computing some function, written in a universal language, then one can effectively find a program, written in some other universal language, which computes the same function, such that the lengths of these two programs are "not too different". But one cannot, in general, effectively find the shortest program, written in the second language, which computes the same function.

2.3 Asymptotic Inference.
We now turn to investigations of the process of inference of hypotheses from observations. It is convenient to classify these into two categories. In the first are those investigations which concentrate on good "asymptotic" results, namely on eventually obtaining good results if the set of observations grows without limit. The second category consists of work which emphasises inference from a fixed, finite sample. In this section we examine the first of these categories.
2.3.1 Grammatical Inference

The problem of finding a model for a behaviour is often posed as follows: given a sample set of strings (words) from a language, identify the grammar which generates that language (29), (30). This formulation differs considerably from the characterisation of the modelling process which is developed in Chapter 3, and is mentioned here only for completeness. The main difference is that we regard a model as a function from observations to observations, whereas a grammar will in general be a relation between observations. In other words, a model in our sense can be regarded as a particular set of derivations in a grammar, rather than the grammar itself.

Another difference is that when considering grammatical inference it is often assumed that a "teacher" is available, who is able to supply not only those strings generated by the grammar to be inferred, but also some strings which cannot be generated by that grammar, and who "informs" the inference machine which type of string it is being shown. We do not allow this possibility, since it implies the existence of an agent who knows the "true model" of the system being investigated. Gold (31) has shown that the existence of such a teacher makes a great difference to the asymptotic capability of a machine for inferring grammars. A third difference is that it is usually assumed that each string in the language will eventually be presented to such a machine.

There are also similarities between grammatical inference and modelling. The most general class of grammars investigated, the so-called "type 0 grammars" or "general rewriting systems", are as powerful as Turing machines, in the sense that the set of strings generated by any such grammar is the range of some Turing machine, and conversely (32). Thus inferring a grammar is in general equivalent to inferring a Turing machine. This obviously makes the two problems strongly related.
Another similarity is that any language can be generated by more than one grammar, and any (finite) sample of strings which is used for inference can belong to more than one language. Consequently, some criterion is needed for choosing between rival grammars. One way of obtaining such a criterion is to regard the grammar as stochastic - each production in it occurs with a certain probability. Selection from among candidate grammars is then possible either by using Bayes' theorem to indicate the most probable grammar, or by using statistical testing techniques (33). The Bayesian approach requires the specification of a priori probabilities for the candidate grammars.

A second way of obtaining a criterion, and one which is of more interest to us, is to choose the least complex of the candidate grammars, (29), (34). Another significant difference now emerges between the problems of grammatical inference and modelling as we understand it. If the complexity measure used is a "static" one, such as the sum of the lengths of the productions in the grammar, then the grammar which will usually be chosen is a universal grammar which generates the language consisting of all possible strings from the alphabet. Consequently the complexity measure used must include a component which is a "dynamic" measure, such as the number of derivation steps required to generate the sample set of strings. When such a measure is used, it is possible to effectively find the best grammar (relative to the complexity measure) which generates a particular sample (34). In our formulation of modelling, on the other hand, the use of a static complexity measure is appropriate, and it is not usually possible to find the best model in an effective manner (see chapter 3).

We shall not discuss particular algorithms for grammatical inference, since these are not applicable to the modelling problem posed in the next chapter.
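
The effect of a purely static measure can be seen in a small example. The following sketch is an illustration of ours and is not taken from (29) or (34): it compares a universal right-linear grammar over {a, b} with a grammar that merely lists a three-string sample. The grammar encoding, the function names, and the choice of simply adding the static and dynamic components are all assumptions made for the illustration.

```python
from functools import lru_cache

# A right-linear grammar: nonterminal -> list of (terminal string, successor or None).
UNIVERSAL = {"S": [("a", "S"), ("b", "S"), ("", None)]}            # every string over {a, b}
LISTING = {"S": [("abba", None), ("baab", None), ("abab", None)]}  # only the sample itself
SAMPLE = ["abba", "baab", "abab"]


def static_size(grammar):
    """Static measure: sum of production lengths (left side plus right-side symbols)."""
    return sum(1 + len(terminals) + (successor is not None)
               for rules in grammar.values()
               for terminals, successor in rules)


def derivation_steps(grammar, word, start="S"):
    """Dynamic measure: fewest production applications deriving `word`, or None."""
    @lru_cache(maxsize=None)
    def steps(nonterminal, pos):
        best = None
        for terminals, successor in grammar[nonterminal]:
            if not word.startswith(terminals, pos):
                continue
            end = pos + len(terminals)
            if successor is None:
                rest = 0 if end == len(word) else None
            elif terminals:
                rest = steps(successor, end)
            else:
                rest = None  # ignore empty moves, which could loop forever
            if rest is not None and (best is None or rest + 1 < best):
                best = rest + 1
        return best
    return steps(start, 0)


def total_cost(grammar, sample):
    """Static size plus derivation steps over the sample (an assumed combination)."""
    return static_size(grammar) + sum(derivation_steps(grammar, w) for w in sample)
```

Under the static measure alone the universal grammar wins (7 against 15), as the text predicts. Once the derivation steps are added, the universal grammar costs 7 + 15 = 22 and the listing grammar 15 + 3 = 18, so the choice is reversed.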
2.3.2 Inductive Inference

We use the term "inductive inference", in contradistinction to "grammatical inference", to denote the problem of inferring, from an observed behaviour, the algorithm which produced that behaviour as output. It will be clear from section 2.3.1 that inductive and grammatical inference have much in common. In this section we review the two papers which we consider to be by far the most significant in this field, namely Solomonoff (22), and Blum and Blum (35).

Solomonoff considered the problem of extrapolating a very long sequence of symbols. He formulated this as the problem of finding the degree of confirmation c(a,T) of the hypothesis that the sequence a will occur, given the evidence that the sequence T has just occurred. He considered this degree of confirmation to be a logical probability in the sense of Carnap (36).

Solomonoff's distinctive contribution to the solution of this problem was that he regarded the observed and predicted sequences as outputs of some Turing machine, and examined the properties of those binary "programs" for this machine which caused the observed and predicted sequences to be computed. He was apparently the first to examine the problem in these terms.

He presented several alternative schemes for calculating c(a,T). The first gives c(a,T) a high value if the concatenated sequence Ta can be computed by short programs and/or if it can be computed by many programs. Short programs are favoured because they represent simple hypotheses about the structure of the observed sequence, while sequences with numerous programs are favoured because of the feeling that if they can have many alternative "causes" then they are more "likely". The Turing machine to be used is a "universal machine", namely one which can simulate another universal machine by prefixing its programs with a fixed set of "translation instructions". Such a machine corresponds to Kolmogorov's "optimal function", and so a theorem similar to theorem (2.2.4) holds. This is used to show that c(a,T) is fairly independent of which universal machine is used, providing that T is long enough.
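
The flavour of the first scheme can be conveyed by a toy calculation. The sketch below is ours and is deliberately not Solomonoff's construction: the "machine" is not universal (a program is a binary string which is simply repeated for ever), programs are weighted by 2 to the power of minus their length, and only programs up to an assumed maximum length are summed. All names and these choices are assumptions for illustration.

```python
from itertools import product


def toy_output(program: str, n: int) -> str:
    """Toy machine (an assumption, not a universal machine): repeat the program."""
    reps = -(-n // len(program))  # ceiling division
    return (program * reps)[:n]


def weight(prefix: str, max_len: int) -> float:
    """Sum of 2**-len(p) over toy programs p whose output begins with `prefix`."""
    total = 0.0
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            p = "".join(bits)
            if toy_output(p, len(prefix)) == prefix:
                total += 2.0 ** -length
    return total


def confirmation(a: str, T: str, max_len: int = 8) -> float:
    """Toy analogue of c(a,T): weight of programs producing Ta, relative to
    the weight of programs producing T. Requires len(T) <= max_len."""
    return weight(T + a, max_len) / weight(T, max_len)
```

With T = 010101 the continuation 0 receives a value above 0.9, because the short program 01 dominates the sum, while the continuation 1 receives the remainder. Every program contributes, not only the shortest one.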
The main drawback of this scheme is that the evaluation of c(a,T) requires the summation of an infinite number of terms, most of which are not effectively computable. Solomonoff tries to overcome this by deriving suitable approximations, but the necessary approximations depend heavily on the nature of the sequences being extrapolated.
His examples are: a Bernoulli sequence, a finite-order Markov chain, and the extrapolation of a set of strings generated by some formal language (i.e. grammatical inference).

The details of the other proposed schemes are not very relevant to us, but they all suffer from the same flaw - they require the computation of quantities which cannot be computed. To make this point clearer: Solomonoff's success in finding suitable approximations for certain special cases provides evidence in support of the conceptual validity of his schemes, but it does not make them into practical procedures. The amount of knowledge needed, in order to make a suitable approximation, is sufficient for proceeding direct to a solution, without any need to use Solomonoff's schemes.

It is important to note that Solomonoff's prediction is not based on just a single "best" program, but on the set of all programs which compute the sequence Ta. Thus c(a,T) is based on a "weighting" of all possible models, rather than on the "best" model.

The problem investigated by Blum and Blum (35) is very close to the problem of modelling, as formulated in Chapters 1 and 3. This is the same setting as that investigated by Solomonoff, but the formulation is slightly different. The observations are assumed to be a set of pairs (x,y), and the law which explains them is to be an algorithm which computes a function f such that f(x)=y. The problem of inductive inference is to find an algorithm for computing f.
Blum and Blum attempt to characterise those functions f which it is possible to infer - that is, those functions for which it is possible to identify a correct algorithm. The characterisations they obtain are in terms of step-counting (dynamic) complexity measures. Generally speaking, it is possible to infer a function if it is not too difficult to compute it. These results give the clearest connections between inference and complexity that have been established to date. However, a step-counting measure is very different in nature from a size measure, so these results can not be used to support the hypothesis that small models are good models.

Nevertheless, the authors state their conviction that "to have quality, a hypothesis n bits long must explain more than could merely be encoded . . . in that many bits". The machines which they construct in their proofs invariably identify functions by a process of enumeration - they search through a Gödel numbering of all partial recursive functions for a suitable algorithm. (It is not in general possible for them to find "small" algorithms, but the search is of course in order of increasing Gödel number, and a Gödel number is a size measure). In one example, nevertheless, Blum and Blum employ an interesting construction to ensure that the inferred algorithm is small enough to be meaningful in some sense. A machine is to identify arbitrarily difficult 0-1 valued recursive functions. (A machine converges in
the limit to i if it eventually outputs i and then never outputs a different number, upon being presented with some infinite sequence of pairs. It can identify f if, whenever it is given a complete sequence of pairs (x,f(x)), it converges to i and the partial recursive function $\varphi_i$ is an extension of f).

The machine works as follows. If its last conjecture is i, and it finds that $\varphi_i(y)$ is defined and $\varphi_i(y) \ne f(y)$, then it conjectures i+1. On the other hand, if $\varphi_i(y) = f(y)$ for all y<x then it tests $\varphi_i(x)$ in the following manner. First it constructs an upper bound, then it tests whether $\varphi_i(x) = f(x)$ within this upper bound. If so, it accepts i, otherwise it conjectures i+1. The interesting part is the construction of the upper bound: first a recursive function h is fixed. Let $\{\Phi_i\}$ be the set of step-counting measures being used. Then a j is searched for, such that $\varphi_j$ converges faster (i.e. in fewer steps) to f on inputs $0, 1, \ldots, \max(2j, 2x)$ than $\varphi_i$ converges on x. When a suitable j is found, take the upper bound to be

    $\max\bigl(h(x, f(x)),\ \max(\Phi_j(y) : y \le \max(2j, 2x))\bigr)$

Thus conjecture i is abandoned if it is discovered that some algorithm computes a restriction of the function to be inferred more quickly than i, for a considerably larger set of values. Now the reason why j can be regarded as a potentially meaningful explanation of the data is this: think of j as the jth program for some universal machine. If it is written in a binary alphabet its length is roughly $\log_2 j$, and it is therefore not large enough to store $(f(0), f(1), \ldots, f(2j))$ in a look-up table (recall that $f(n) \in \{0,1\}$).
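
The basic idea of identification by enumeration can be sketched in a few lines. The code below is a simplification of ours, not the machine of Blum and Blum: it enumerates a finite list of total functions (the list and all names are hypothetical), and it omits the step-counting upper bound which is the interesting part of their construction. It only shows how the conjecture moves from i to a later index when the data contradict it, and then settles.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# A hypothetical, finite stand-in for an enumeration of algorithms.
HYPOTHESES: List[Callable[[int], int]] = [
    lambda n: 0,
    lambda n: 1,
    lambda n: n % 2,
    lambda n: (n // 2) % 2,
]


def identify_by_enumeration(
    hypotheses: List[Callable[[int], int]],
    pairs: Iterable[Tuple[int, int]],
) -> Iterator[int]:
    """Yield the current conjecture (an index into `hypotheses`) after each pair (x, f(x))."""
    seen: List[Tuple[int, int]] = []
    i = 0
    for x, y in pairs:
        seen.append((x, y))
        # Abandon conjecture i while it disagrees with some observation so far.
        while i < len(hypotheses) and any(hypotheses[i](u) != v for u, v in seen):
            i += 1
        if i == len(hypotheses):
            raise LookupError("no hypothesis in the list fits the data")
        yield i
```

Presented with the pairs (n, n mod 2) for n = 0, 1, 2, 3, the conjectures are 0, 2, 2, 2: the machine converges to index 2 and never changes its mind again.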
2.4 Small-Sample Inference

2.4.1 Wrinch and Jeffreys

In 1921 Wrinch and Jeffreys (37) proposed that competing models of a set of observations should be assessed on the basis of simplicity. They suggested that any model in physics could be formulated as a differential equation; if two differential equations explained the same set of data, then the one with the fewer parameters should be preferred. In fact they proposed assigning probabilities to models in this form, the probability being a decreasing function of the number of parameters. Popper (38) suggested a similar but vaguer scheme, but argued that Wrinch and Jeffreys should have regarded the simpler models as the less probable ones. This argument seems to arise from an almost wilful confusion of two separate concepts, and it is convenient to clarify these at this stage.

It seems clear that Wrinch and Jeffreys use the term "probability" in the sense of "degree of confirmation". That is, having observed some data, and having conjectured some hypotheses which are capable of explaining the data, they are assessing the relative likelihoods of the various predictions entailed by those hypotheses. This is equivalent to assessing the quality of the competing hypotheses. In this sense "probability" is synonymous with Popper's "degree of corroboration", which, he insists, increases with simplicity. Thus Popper's view is consistent with that of Wrinch and Jeffreys, if this interpretation of "probability" is admitted. (Popper argues, however, that "degree of corroboration" cannot be interpreted as a probability. This is probably correct, but it is a separate point. An argument which supports it is given in Chapter 8).

Whereas Wrinch and Jeffreys refer to something like "the probability that this model correctly predicts future behaviour, in the light of the behaviour already observed and the models which have been conjectured", Popper uses "probability" to mean, roughly, "given that some observations are to be made, what is the probability that it will be possible to explain them using a model with n parameters?" Now it is quite reasonable, intuitively, that this probability should increase with the number of parameters, as Popper suggests (if it is defined at all).

Thus no conflict arises between Wrinch and Jeffreys and Popper, providing that it is borne in mind that Wrinch and Jeffreys use "probability" to indicate a property of a model, in the light of observations, whereas Popper uses it to denote an a priori property of the observations. This thesis is concerned entirely with the properties of models.

As it stands, the criterion of quality proposed by Wrinch and Jeffreys is not a practical one. Empirical data would usually need a very high order differential equation model to fit it. In practice, some approximation is tolerated in order to allow a simpler model, and some trade-off between approximation and complexity has to be made. Also, the only support for their proposal is the intuitive feeling that the number of parameters is the appropriate measure of complexity. Apart from this, however, the criterion of Wrinch and Jeffreys is very similar to the proposed criterion of model quality which is introduced in Chapter 3, and is close to being a special case of it.

2.4.2 Gaines
Gaines has recently proposed a formulation of the general system identification problem (14). The characterisation of Chapter 3 can be viewed as a special case of Gaines' proposals. Gaines considers an identification problem to be defined by an observed behaviour, a class of models which are of interest, an arbitrary but fixed partial ordering of models in this class (this ordering being called "complexity"), and an arbitrary partial ordering of models in this class which is induced by the particular behaviour being observed (this ordering being called "approximation"). The "admissible subset" of models is then the set of models which has the property that, if m is a member of this subset, then no model exists which is both less complex than m, and gives a better approximation to the behaviour being observed than m.

Gaines considers this admissible subset to be the solution to the identification problem. Which model from among those in the admissible subset is to be preferred depends on the particular circumstances of each modelling exercise. We go further than this in our characterisation, and in effect propose a particular trade-off between approximation and complexity.

A point which will be of interest later (in Chapter 8) is that Gaines associates changes in the ordering relations of approximation and complexity with Kuhn's "scientific revolutions" (39). Thus a change in either or both of these relations corresponds to a new "view of science".

Gaines has investigated the special case of the behaviours to be modelled being finite sequences of arbitrary symbols, the class of models being finite state automata (both deterministic and probabilistic), the measure of complexity being the number of states, and the approximation being assessed by several measures based on the Hamming distance between observed and computed behaviours. In this case the set of admissible models is effectively computable. However, it is not practical to identify most behaviours which are of interest to us in this manner, partly because of excessive computational requirements, but mainly because many behaviours, which can be produced by compact algorithms, can be modelled only by maximally complex finite-state automata. Thus the use of this class of models does not lead to a discernible structure when used for the identification of many simple behaviours. An example of such a behaviour is that produced by the program:

    n := 1;
    loop: n := n*(n+1);
          write(n);
          goto loop;

namely, the sequence 2, 6, 42, 1806, .... This program cannot be correctly implemented by either a deterministic or stochastic finite state automaton.
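
Returning to the admissible subset defined at the start of this section, its computation is straightforward once both orderings are available. The sketch below is ours and makes a simplifying assumption: complexity and approximation are represented by numbers (smaller is better), which makes both orderings total, whereas Gaines allows arbitrary partial orderings. The model names and values are hypothetical.

```python
from typing import Dict, List, Tuple


def admissible_subset(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """`models` maps a name to (complexity, approximation error).
    A model is admissible if no other model is both less complex
    and a better approximation."""
    def dominated(name: str) -> bool:
        c, e = models[name]
        return any(c2 < c and e2 < e
                   for other, (c2, e2) in models.items() if other != name)
    return sorted(name for name in models if not dominated(name))


# Hypothetical candidates: (number of states, distance from observed behaviour).
CANDIDATES = {"A": (1, 5), "B": (2, 3), "C": (3, 4), "D": (4, 1)}
```

Here model C is excluded because B is both simpler and closer to the data; A, B and D remain, and the choice among them is the trade-off which Gaines leaves to the circumstances of the exercise.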
2.4.3 Löfgren

Löfgren (40) has made the rather unlikely suggestion that "as soon as a scientist believes that he has produced a theory for some phenomenon, he should try to formalise the theory so as to make it effectively communicable". By "formalise" Löfgren means that the theory should be translated into one of the formal logical systems! However, if "formalise" is reinterpreted to mean that the theory should be expressed as an algorithm for computing the observed phenomenon, then Löfgren's views on the quality of theories become virtually identical with our views on models. For example, his key hypothesis is: "Let S and S' be two formal theories with the same logical basis, both of which explain one set of experimental facts with comparatively short and few proper axioms and proper rules of inference. Then, if both S and S' predict further experimental results (as being proper theorems or not), the theory with the simplest proper axioms and proper rules has the greater predictive power, in the sense that its predictions are more likely to agree with further experiments. The simplicity of the proper axioms and proper rules of inference can be measured by the total length of all corresponding well formed formulae". This hypothesis will be seen to be very close to the basic idea underlying this thesis, that the quality of a model can be measured by the shortness of the program required to realise it.

Also, Löfgren has anticipated one of the key concepts which will be introduced later: "Let $\{h_1, \ldots, h_n\}$ be a set of experimentally obtained facts. This set can be considered the theoremhood of a formal theory $S_0$ with $\{h_1, \ldots, h_n\}$ as the set of axioms, with no rules of inference, and thus with a logical basis which is empty .... Such a mere listing of the experimentally obtained facts has no predictive power whatsoever". The theory $S_0$ corresponds to what we shall later call the "trivial model", and which will serve as a standard against which the quality of models will be measured.

Löfgren also investigated the connection between randomness of a sequence and the length of program required to compute it. He proved a weaker form of theorem (2.2.5) by using the recursion theorem (see Rogers (9)), and showed
that there can be no algorithm for finding a sequence whose Kolmogorov complexity is greater than a given value.

2.5 Inference of Parameters of Gauss-Markov Process

Recently, Rissanen (41) has investigated the estimation of the structure, as well as the real-valued parameters, of a Gauss-Markov process, by using the idea that a good model is a short description of the observed data.

Suppose that the set $y = (y(0), \ldots, y(N-1))^T$ has been observed, and that it was generated by the process

    $x(t+1) = A\,x(t) + B\,e(t), \quad x(0) = 0,$
    $y(t) = C\,x(t) + e(t), \quad t = 0, \ldots, N-1.$

Then $((A,B,C);\ e(0), \ldots, e(N-1))$ can be considered to be a description of y, since y can be recovered from it by using $y = Te$, where
I T=
y is s u p p o s e d number
CB
I
CAB
CB
to be r e c o r d e d
of f r a c t i o n a l
consider
bits.
the s h o r t e s t
the e x p e c t e d
length
of such
of y.
triple
is such that
with
an a c c u r a c y
It is t h e r e f o r e
description
by the e n t r o p y (A,B,C)
I
of y.
a description
Rissanen the
meaningful
It is k n o w n is b o u n d e d
demonstrates sequence
of a f i x e d
that,
to
that below
if the
e is s e r i a l l y
SS
uncorrelated, Gaussian,
and if the d i s t r i b u t i o n
then e can be e n c o d e d
length n e a r l y
attains
in the
scalar
case
writing
each
There
e(t)
with
a pair
of
(A,B,C),
lower bound.
as a b i n a r y
number.
triples Each
(s,8),
triple
where
parameters
positions
of the c o m p o n e n t s
(s*,8*) z* =
make
e uncorrelated,
a concise coding
associated
of
y +
(s,e)
the
structure
and the
another
form of
~ is a code
of e.
a description
(s,).
Given
entropy
of e. with
of y, w h e r e
~* is the
s,8
are
to m i n i m i s i n g the m e a n
a structure
s, a c a n d i d a t e
asymptotically
the m a x i m u m
of @ g i v e n
where
also m i n i m i s e s
a Gaussian
If s* is the true
in a d d i t i o n
length of the
estimator
is one w h i c h
For
if the
is m i n i m i s e d ,
Thus,
of ~, a c o n c i s e
estimator
estimates
8 is a v e c t o r
of a set of
where
is c o n c i s e
(s,e,e)
for s,8.
concise
coincides
description
description
codes
the l e n g t h length
then
to the
of e.
An e s t i m a t o r
the b e s t
and
system
Thus
of y is a t r i p l e ( s , 8 , ~ ) ,
rise
(s*,e*,~*)
is c a l l e d shortest
of the
of 8.
that
can be i d e n t i f i e d
s represents
the o r d e r
shows
by s i m p l y
give
an e n u m e r a t i o n
which
description
which
(A,B,C)
(42).
and is in fact define
reached
s is an integer,
integers,
If
(A,B,C)
is
string w h o s e
He also
is n e a r l y
e.
of r e a l - v a l u e d
as a b i n a r y
the b o u n d
are m a n y
same s e q u e n c e
the
of the o b s e r v a t i o n s
at least m i n i m i s e s
the e s t i m a t e d
distribution,
an e s t i m a t o r
likelihood
structure
by this
such
estimator.
of the p r o c e s s ,
estimator
are
then
the
asymptotically
Gaussian at $s=s^*$. Consequently it is possible to estimate the entropy of $\theta$, conditional on $s^*$. Furthermore, $H(\theta|s) \geq H(\theta|s^*)$, where $H$ denotes entropy, and Rissanen shows how $H(\theta|s)$ may be estimated. If $H(\theta|s)$ is minimised, then not only has a $\theta$ been found whose shortest expected length of description is the shortest possible, but the associated $s$ must also be the true structure. Thus an estimator which minimises an estimate of $H(e) + \frac{1}{N}H(\theta|s)$, where the minimisation is over $(s,\theta)$, is an asymptotically concise estimator, namely one which eventually gives the shortest possible description of the observed data $y$, and which also estimates correctly the structure $s$. Furthermore, Rissanen shows that $H(\theta|s)$ is small when the error covariance is highly sensitive to variations in $\theta$.

2.6 Summary
The above survey has, hopefully, filled in the background to the treatment of models which will be developed in this thesis - that is, a treatment based on ideas of computational complexity. Blum's axioms help to distinguish two very different notions of complexity - the complexity required to describe something, and the complexity of interpreting that description.

A detailed review has been given of the theory developed from the notion of Kolmogorov complexity. This complexity is shown to be not effectively computable. The results which have been given support the idea that complexity represents the amount of information required to obtain an entity by any effective procedure. They also indicate that the property of being maximally complex is equivalent to the property of being random. These ideas are the chief support of our approach to modelling.

Kolmogorov complexity is asymptotically invariant with respect to a wide class of universal computers, but it will be seen later that this invariance property does not appear sufficiently quickly for the purposes of practical model assessment, and the choice of computer (in fact, of language) constitutes a major problem.

Aspects of grammatical inference have been briefly examined. Although grammatical inference is conceptually very close to the inference of any type of model from observations, the statement of the problem is sufficiently different from what we understand as the modelling problem, to make the nature of possible solutions very different. The use of dynamic complexity measures is appropriate (in fact, essential) for grammatical inference, and this allows an effective procedure to exist for finding the "best" grammar, within a sufficiently restricted class. (Admittedly the procedures attempted so far are based on exhaustive enumeration, and so are not practical, but the problem can at least be solved in principle).

Solomonoff's approach to inductive inference is the
progenitor
of the w o r k
Solomonoff
emphasised
confirmation"
and ours
behaviour,
terms.
is that
confirmation"
one - the b e s t The p a p e r it is c l e a r l y generally.
we
one
that
consider
which
That
because
Gaines
refer
inference
relevant
to this
to the a s y m p t o t i c
with
the
situation
"Small-Sample
thesis
situation,
of h a v i n g
suggests
in our
L~fgren
a fixed,
similar
The a p p r o a c h e s a characteristic a more
features,
to our
none
scheme
to i n f e r e n c e
less p l a u s i b l e of them
identification
and these
features
theories
for a s s e s s i n g
models.
surveyed
each of t h e m seems
account
can be a p p l i e d
of the
can
problem.
for a s s e s s i n g
w h i c h w e have
while
considered
case of our
of the m o d e l l i n g
a scheme
in common:
or
any s y s t e m
we
is i n t e r e s t i n g
as a s p e c i a l
that
formulation
suggests
Inference"
and J e f f r e y s
be r e g a r d e d
the same b a s i c
of inference,
for i n d u c t i v e
of W r i n c h
it can a l m o s t
is v e r y
provide
for a
of b e h a v i o u r .
three papers.
Finally,
programs
of
and B l u m has b e e n m e n t i o n e d
it is not v e r y
the h e a d i n g
be d i s c e r n e d
"degree
by only
are c o n c e r n e d
has
Solomonoff's
it to be d e t e r m i n e d
important
we
problem
by all the p o s s i b l e
of
sualmation
considered
whereas
proposals.
the
Solomonoff
all its r e s u l t s
because
required
"degree
between
of B l u m
However,
Under
and this
of
available.
most
sample
measure
However,
difference
because
finite
thesis.
A major
is d e c i d e d
whereas
in this
a quantitative
of a theory,
of u n c o m p u t a b l e work
reported
to
abstract
to a u s e f u l
have
nature
range
of p r a c t i c a l
a v i e w of m o d e l l i n g and w h i c h
which
can be d i r e c t l y
encountered
in c o n t r o l
In s e c t i o n which
models.
is b a s e d applied
assumes
a very
a particular
complexity
by an i n v e s t i g a t i o n
Gauss-Markov the d a t a
leads
estimation.
develop
of c o m p l e x i t y , of the type
with
search
to an e x t e n s i o n This w o r k g i v e s
of the c h a r a c t e r i s a t i o n
of the
of R i s s a n e n ,
v i e w of m o d e l l i n g .
explicit
the model.
the
the work
similar
can r e p l a c e
process,
to m o d e l s
probabilistically
Rissanen
associated
on ideas
examined
situation,
entropy
3 we shall
studies.
2.5 w e have
implicitly
By e x a m i n i n g
In C h a p t e r
described
consideration
classical For the
for the
of
Shannon
case of a
shortest
model
of
of m a x i m u m - l i k e l i h o o d strong
support
w h i c h we shall
to the p l a u s i b i l i t y
d e v e l o p below.
3. A CHARACTERISATION OF MODELLING

3.1 Introduction

In this chapter we propose a partial solution to the modelling problem introduced in section 1.3. The solution is partial because it does not provide an algorithm for obtaining a model for a system, but only a criterion for choosing between competing models. However, a consequence of our characterisation of modelling is that such an algorithm cannot exist. Our solution is, therefore, as complete as can be expected. (More complete solutions can be obtained if the class of candidate models is sufficiently restricted).

3.2 Systems

Systems have already been introduced in definition (1.3.1). For each system $S$ there exists a smallest positive integer $n$, such that multiplying every $u^i_j$, $y^i_j$ appearing in $S$ by $n$ (except for blanks, which are left unchanged) results in an integer system $S_I=(U_I,Y_I)$; that is, a system as defined previously, but now each observation is either an integer or a blank. Each integer system $S_I$ can be identified with a countable equivalence class of systems, each of which is mapped into $S_I$ by the above transformation. We shall consider the integer systems to be the entities which are to be modelled. The prefix "integer" will usually be omitted.

We need to establish that the set of integer systems can be identified with the set of nonnegative integers. However, we do not need to know the details of the correspondence.
Theorem (3.2.1)

There exists an effective bijection between the set of integer systems and the set of nonnegative integers.

Proof. For each $u^i_j$ in an integer system $S_I$, write $0$ if $u^i_j=b$, $u^i_j+1$ if $u^i_j \geq 0$, and $u^i_j$ otherwise. Similarly for each $y^i_j$. $S_I$ has now been replaced by an ordered pair of arrays of integers. Each integer can be mapped into a nonnegative integer by the function $p$, where:

$$p(n) = \begin{cases} 2|n| & \text{if } n \leq 0, \\ 2n-1 & \text{if } n > 0. \end{cases}$$

Instead of considering $S_I$, we can now consider $S_I^+$, where

$$S_I^+ = \bigl(((u^1_1,\ldots,u^1_{\ell_1}),\ldots,(u^M_1,\ldots,u^M_{\ell_M})),\ ((y^1_1,\ldots,y^1_{m_1}),\ldots,(y^N_1,\ldots,y^N_{m_N}))\bigr),$$

and where each $u^i_j$, $y^i_j$ is now a nonnegative integer.

Rogers (9) demonstrates the existence of a recursive bijection $\tau: \mathbb{N}\times\mathbb{N}\to\mathbb{N}$, which he calls a pairing function, and he uses it to define recursively a coding of $k$-tuples of nonnegative integers onto the nonnegative integers:

$$\tau^k(n_1,n_2,\ldots,n_k) = \tau(n_1,\tau^{k-1}(n_2,\ldots,n_k)), \quad \text{with } \tau^1(n)=n.$$

If $S_I^+$ is now replaced by an "indexed" version:

$$\bigl((M,(\ell_1,u^1_1,\ldots,u^1_{\ell_1}),\ldots,(\ell_M,u^M_1,\ldots,u^M_{\ell_M})),\ (N,(m_1,y^1_1,\ldots,y^1_{m_1}),\ldots,(m_N,y^N_1,\ldots,y^N_{m_N}))\bigr),$$

then it can be coded as the single nonnegative integer:

$$\tau\bigl(\tau^{M+1}(M,\tau^{\ell_1+1}(\ell_1,\ldots,u^1_{\ell_1}),\ldots,\tau^{\ell_M+1}(\ell_M,\ldots,u^M_{\ell_M})),\ \tau^{N+1}(N,\tau^{m_1+1}(m_1,\ldots,y^1_{m_1}),\ldots,\tau^{m_N+1}(m_N,\ldots,y^N_{m_N}))\bigr).$$

Since an "indexed" version of $S_I^+$ has been coded, it is possible to recover $S_I^+$ uniquely from this single integer (since $\tau$ is invertible). Also, it is obviously possible to recover $S_I$ from $S_I^+$. It is now possible to search through the sequence $(0,1,2,\ldots)$ for the sequence of integers $(n_0,n_1,n_2,\ldots)$ from which a valid $S_I^+$ can be recovered. By establishing the correspondence $n_0 \leftrightarrow 0$, $n_1 \leftrightarrow 1$, ..., the required bijection is obtained.
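The coding steps used in the proof can be sketched in a few lines. The sketch below is only illustrative: it uses the Cantor pairing function as an assumed stand-in for Rogers' pairing function $\tau$ (any recursive bijection would serve), represents a blank by `None`, and codes a single observation sequence rather than a whole system. All function names are hypothetical.

```python
from math import isqrt

def p(n):
    """Map an integer to a nonnegative integer: 2|n| if n <= 0, else 2n - 1."""
    return 2 * abs(n) if n <= 0 else 2 * n - 1

def p_inv(k):
    """Inverse of p."""
    return -(k // 2) if k % 2 == 0 else (k + 1) // 2

def tau(x, y):
    """Cantor pairing function (assumed stand-in for Rogers' tau)."""
    return (x + y) * (x + y + 1) // 2 + y

def tau_inv(z):
    """Inverse of the Cantor pairing function."""
    w = (isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y

def tau_k(values):
    """tau^k(n1,...,nk) = tau(n1, tau^(k-1)(n2,...,nk)), with tau^1(n) = n."""
    if len(values) == 1:
        return values[0]
    return tau(values[0], tau_k(values[1:]))

def tau_k_inv(z, k):
    """Recover the k-tuple coded by tau^k."""
    out = []
    for _ in range(k - 1):
        head, z = tau_inv(z)
        out.append(head)
    out.append(z)
    return out

def encode_observation(u):
    """Blank -> 0, u >= 0 -> u + 1, negative u unchanged; then apply p."""
    shifted = 0 if u is None else (u + 1 if u >= 0 else u)
    return p(shifted)

def decode_observation(k):
    s = p_inv(k)
    return None if s == 0 else (s - 1 if s > 0 else s)

def encode_sequence(seq):
    """Code the "indexed" sequence (l, x1, ..., xl) as one nonnegative integer."""
    return tau_k([len(seq)] + [encode_observation(u) for u in seq])

def decode_sequence(z):
    """Read the length index first, then unpack that many observations."""
    length, rest = tau_inv(z)
    if length == 0:
        return []
    return [decode_observation(k) for k in tau_k_inv(rest, length)]
```

Storing the length in front of each sequence is what makes decoding unambiguous; the coded integers grow very quickly with sequence length, which is harmless here because only the existence of the bijection matters.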
Theorem (3.2.1) establishes that a nonnegative integer can be uniquely associated with every integer system, and that it therefore makes sense to refer to "the $i$th (integer) system, $S_i$". But it was pointed out in section 1.3 that every input observation set $U$ (except $U=b$) and every output observation set can itself be regarded as a system. Consequently a nonnegative integer can be associated with each of these sets. As in section 2.2, we shall frequently not distinguish between a system and the integer associated with it.
3.3 Models

It is tempting to regard a model as an algorithm which operates on the input observation set to produce the output observation set of a system. This would be sufficient for many purposes, in particular for the treatment of deterministic models. However, in some cases it is desirable to allow models to use some of the output observations for computing others. The most common example of this occurs in the use of models for the prediction of system behaviour in a stochastic environment. In this case it would be unreasonable not to allow the model the use of the most recent information on system behaviour. (In the deterministic case this is unnecessary, since the model can determine system behaviour from the initial conditions and the input history).

It is therefore necessary to allow a model to be an effective procedure which operates on some outputs as well as on inputs, in order to compute the output. This capability must be restricted, however, if one is to exclude, for instance, models which compute the output observation set simply by copying it, or those which use future observations to compute previous ones. We accomplish this restriction in a rather roundabout way. Instead of defining models so as to exclude these useless types of algorithms, we define models with reference to certain sets of subsets of the observations. These sets determine which input and output observations may be used for the computation of each output. Restriction to the classes of models which are of interest is then accomplished by suitably delimiting these sets.

We distinguish between abstract models, which are partial recursive functions, and concrete models, which are computer programs. This terminology follows Chaitin (43).

Definition (3.3.1)
Given a system $S=(U,Y)$, let $A=\{A_i\}$ be a set of ordered subsets of $U$, let $B=\{B_i\}$ be a set of ordered subsets of $Y$, and let $C=\{C_i\}$ be a set of $m$ disjoint ordered subsets of $Y$, which is complete in the sense that every $y_j \in Y$ occurs in some $C_i$. The ordering of the elements of $A_i$, $B_i$ and $C_i$ is to be the same as their ordering in $U$ and $Y$. Let $D$ be a set of ordered pairs $D_i=(A_j,B_k)$, $(i=1,\ldots,m)$, and finally let $E$ be a set of ordered pairs $E_i=(D_i,C_i)$, $(i=1,\ldots,m)$. Then an abstract $E$-model of the system $S=(U,Y)$ is a partial recursive function $M: \mathbb{N}\times\mathbb{N}\to\mathbb{N}$, such that, for each $E_i \in E$, $(i=1,\ldots,m)$,

$$M(i,D_i) = C_i. \qquad (3.1)$$

This definition is best illuminated by some examples. We use the notation $S=((u_1,\ldots,u_N),(y_1,\ldots,y_N))$.

Example (3.3.2)
If only non-dynamic models were of interest, we might specify the sets $A,B,C,D,E$ as follows:

$A_i = u_i$, $i=1,\ldots,N$,
$B_1 = \emptyset$ ($\emptyset$ denotes the empty set),
$C_i = y_i$, $i=1,\ldots,N$,
$D_i = (A_i,B_1) = (u_i,\emptyset)$, $i=1,\ldots,N$,
$E_i = (D_i,C_i) = ((u_i,\emptyset),y_i)$, $i=1,\ldots,N$,

so that $E = \{((u_i,\emptyset),y_i) : i=1,\ldots,N\}$. In this case, an abstract $E$-model of $S$ is a partial recursive function $M$, such that $M(i,(u_i,\emptyset)) = y_i$ for $i=1,\ldots,N$.

Example (3.3.3)

If we were interested in dynamical, deterministic, finite-memory (say, two-period) models, a suitable specification of the sets might be:

$A = \{\emptyset,(u_1,u_2,u_3),(u_2,u_3,u_4),\ldots,(u_{N-2},u_{N-1},u_N)\}$,
$B = \{\emptyset\}$,
$C_i = y_i$, $i=1,\ldots,N$,
$D_i = (\emptyset,\emptyset)$ for $i=1,2$; $D_i = ((u_{i-2},u_{i-1},u_i),\emptyset)$ for $i=3,\ldots,N$,
$E_i = ((\emptyset,\emptyset),y_i)$ for $i=1,2$; $E_i = (((u_{i-2},u_{i-1},u_i),\emptyset),y_i)$ for $i=3,\ldots,N$.

This time an abstract $E$-model of $S$ is a partial recursive function $M$, such that $M(i,(\emptyset,\emptyset)) = y_i$ for $i=1,2$, and $M(i,((u_{i-2},u_{i-1},u_i),\emptyset)) = y_i$ for $i=3,\ldots,N$.

Example (3.3.4)
If we were interested in one-step-ahead predicting models, which were allowed to use all past observations, we could specify the sets as follows: (We take $U=b$ for simplicity in this case)

$A = \{\emptyset\}$,
$B = \{\emptyset, y_1, (y_1,y_2), (y_1,y_2,y_3), \ldots, (y_1,\ldots,y_{N-1})\}$,
$C_i = y_i$, $i=1,\ldots,N$,
$D_i = (\emptyset,\emptyset)$ for $i=1$; $D_i = (\emptyset,(y_1,\ldots,y_{i-1}))$ for $i=2,\ldots,N$,
$E_i = ((\emptyset,\emptyset),y_i)$ for $i=1$; $E_i = ((\emptyset,(y_1,\ldots,y_{i-1})),y_i)$ for $i=2,\ldots,N$.

In this case, an abstract $E$-model of $S$ is a partial recursive function $M$, such that $M(1,(\emptyset,\emptyset)) = y_1$ and $M(i,(\emptyset,(y_1,\ldots,y_{i-1}))) = y_i$ for $i=2,\ldots,N$.

$M$ takes two arguments so that the first can act as an index, which "tells" $M$ which block of observations it is operating on. The significance of this can be seen most clearly if we anticipate a little, and consider $M$ to be a
computer program. The program may be designed to operate on different blocks in different ways, for example by operating on them with different subroutines. In order to do this, the program must be told which block it is currently operating on. It would be possible to allow $M$ only one argument, but it would then be necessary to regard a model as a set of such partial recursive functions, each of which corresponded to a different computational algorithm.

It is not usual to specify the sets $A,B,C,D,E$ of a model in the manner of the above examples, because of the risk of inadvertently excluding some models which may be of interest. A more usual method is to specify the class of models which is primarily of interest by specifying conditions which these sets must satisfy. For example, if it were merely desired to exclude the possibility of "copying" the output observation set, it would be sufficient to consider only those $E$-models for which $E$ satisfied the condition:

for each $E_i \in E$, $(y_k \in B_j \;\&\; B_j \in D_i) \Rightarrow y_k \notin C_i$.

Here "$y_k \in B_j$" denotes "$B_j=(\ldots,y_k,\ldots)$", and "$B_j \in D_i$" denotes "$D_i=(\ldots,B_j)$".

Similarly, if models which use future observations to compute previous ones are of no interest, they can be excluded by imposing suitable conditions on $E$:

Definition (3.3.5)
A set $E$, defined as in definition (3.3.1), is nonanticipative if, for each $E_i \in E$:

(i) $\max(p : y_p \in B_j \;\&\; B_j \in D_i) < \min(p : y_p \in C_i)$ and
(ii) $\max(p : u_p \in A_j \;\&\; A_j \in D_i) \leq \min(p : y_p \in C_i)$.

An $E$-model will be said to be nonanticipative if $E$ is nonanticipative.

We wish to consider models to be computer programs. We formalise a computer (together with a programming language) as a (not necessarily universal) 3-place partial recursive function $F$, and a program as the first argument of this function.
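The conditions placed on the set $E$ can be checked mechanically. The following sketch builds the sets $E$ of Examples (3.3.3) and (3.3.4), representing each observation by its index only, and tests the "no copying" condition and the two conditions of definition (3.3.5). The representation of $E_i$ as the nested tuple `((A, B), C)`, the function names, and the convention that a condition over an empty set is treated as satisfied are all assumptions made for illustration.

```python
def two_period_E(N):
    """Example (3.3.3): D_i = ((u_{i-2}, u_{i-1}, u_i), empty), C_i = (y_i)."""
    E = []
    for i in range(1, N + 1):
        A = () if i < 3 else (i - 2, i - 1, i)   # indices of the inputs used
        E.append(((A, ()), (i,)))
    return E

def one_step_ahead_E(N):
    """Example (3.3.4): D_i = (empty, (y_1, ..., y_{i-1})), C_i = (y_i)."""
    return [(((), tuple(range(1, i))), (i,)) for i in range(1, N + 1)]

def copying_E(N):
    """A useless specification: y_i itself may be read when computing y_i."""
    return [(((), (i,)), (i,)) for i in range(1, N + 1)]

def excludes_copying(E):
    """No output index may appear both in B_j (within D_i) and in C_i."""
    return all(not (set(B) & set(C)) for (A, B), C in E)

def is_nonanticipative(E):
    """Conditions (i) and (ii) of definition (3.3.5), on observation indices."""
    for (A, B), C in E:
        first_output = min(C)
        if B and max(B) >= first_output:   # (i): outputs used must be strictly earlier
            return False
        if A and max(A) > first_output:    # (ii): inputs used may be simultaneous
            return False
    return True
```

For instance, `is_nonanticipative(two_period_E(6))` and `is_nonanticipative(one_step_ahead_E(6))` both hold, while `copying_E(6)` fails condition (i).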
Definition (3.3.6)

A concrete $(F,E)$-model of the system $S=(U,Y)$ is an integer $p$, such that

$$F(p,i,D_i) = C_i \qquad (3.2)$$

for every $E_i \in E$, where $C_i, D_i, E_i, E$ are defined as in definition (3.3.1), and $F: \mathbb{N}\times\mathbb{N}\times\mathbb{N}\to\mathbb{N}$ is a 3-place partial recursive function.

Theorem (3.3.7)

There exists an $F$, such that to every abstract $E$-model $M$ of $S$ there corresponds a concrete $(F,E)$-model $p$ of $S$, such that

$$F(p,x,y) = M(x,y) \qquad (3.3)$$

for all $x,y \in \mathbb{N}$.

Proof: Choose $F$ to be any 3-place universal partial recursive function. Then there exists $p$, such that $F(p,x,y)=M(x,y)$ for all $x,y \in \mathbb{N}$ (Rogers (9), p.22). But $M(i,D_i)=C_i$ for every $E_i \in E$. Therefore $F(p,i,D_i)=C_i$ for every $E_i \in E$. Consequently $p$ is an $(F,E)$-model of $S$.

Theorem (3.3.8)

To every concrete $(F,E)$-model $p$ of $S$ there corresponds an abstract $E$-model $M$ of $S$, such that

$$M(x,y) = F(p,x,y) \qquad (3.4)$$

for all $x,y \in \mathbb{N}$.

Proof: By the s-m-n theorem (Rogers (9), p.23), there exists a 2-place partial recursive function $M$, such that $M(x,y)=F(p,x,y)$ for all $x,y \in \mathbb{N}$. But $F(p,i,D_i)=C_i$ for every $E_i \in E$. Consequently $M(i,D_i)=C_i$ for every $E_i \in E$, and therefore $M$ is an abstract $E$-model of $S$.

Theorems (3.3.7) and (3.3.8) show that abstract and concrete models are essentially equivalent. The function $F$ which appears in definition (3.3.6) can be associated with some computing facility, by invoking Church's Thesis. The nonnegative integers can be used to enumerate the programs for such a facility in some standard manner (cf section 2.2.1). Thus concrete models can be associated with programs.
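The relation between concrete and abstract models can be made tangible with a toy computing facility. In the sketch below, which is an illustration under stated assumptions rather than the formal construction, $F$ is played by a Python function that decodes an integer $p$ into program text and runs it; the encoding of text as an integer via its bytes, the convention that a program defines a function called `model`, and the small example system are all hypothetical choices. Fixing the first argument of `F` with `functools.partial` plays the role that the s-m-n theorem plays in Theorem (3.3.8).

```python
from functools import partial

def program_to_int(source):
    """Associate a program text with a nonnegative integer (assumed encoding)."""
    return int.from_bytes(source.encode("utf-8"), "big")

def int_to_program(p):
    """Recover the program text from its integer."""
    return p.to_bytes((p.bit_length() + 7) // 8, "big").decode("utf-8")

def F(p, i, D):
    """Toy computing facility: run program number p on the arguments (i, D)."""
    namespace = {}
    exec(int_to_program(p), namespace)   # the program must define model(i, D)
    return namespace["model"](i, D)

# A small non-dynamic system in the style of Example (3.3.2): y_i = 2 * u_i.
U = (1, 2, 3, 4)
Y = (2, 4, 6, 8)
E = [(((u,), ()), (y,)) for u, y in zip(U, Y)]   # E_i = (D_i, C_i)

SOURCE = "def model(i, D):\n    A, B = D\n    return (2 * A[0],)\n"
p = program_to_int(SOURCE)   # a concrete (F,E)-model of the system

M = partial(F, p)            # the corresponding abstract E-model: M(x, y) = F(p, x, y)
```

Here `F(p, i, D_i)` returns `C_i` for every `E_i`, so `p` satisfies (3.2), and `M(i, D_i)` returns the same values, as in (3.4).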
Note that every system has infinitely many abstract $E$-models, and that to each of these abstract $E$-models there will usually correspond infinitely many concrete $(F,E)$-models (for a fixed $F$). If $F$ is universal then there will always correspond infinitely many concrete $(F,E)$-models to each abstract $E$-model, since the enumeration of the set of programs for $F$ then constitutes a Gödel numbering for the partial recursive functions: it is known that in every Gödel numbering, each partial recursive function has infinitely many Gödel numbers.

The task of modelling is to find an appropriate abstract model for a system. In practice this requires the postulation and examination of concrete models (programs) for it.
According to the discussion of section 1.3, the goal of the modelling exercise is a dynamical system in the sense of definition (1.3.3). To reconcile this with the above statement, we point out that a nonanticipative abstract $E$-model $M$ can be regarded as such a dynamical system providing that each $C_i$ has the form $C_i=(y_k,y_{k+1},\ldots,y_{k+n_i})$, (in other words, there are no gaps in the set $C_i$). It suffices to take the time set $T$ to be the set of integers $(0,1,2,\ldots,m)$ with the usual ordering, to take the initial time $\tau=1$, and to identify time $t$ with the computation of $C_t$ (using the terminology of definitions (1.3.3) and (3.3.1)).

The state at time $t$ can be taken to be the input and output history $x(t)=((u_1,u_2,\ldots,u_j),(y_1,y_2,\ldots,y_k))$, where $j=\max(p : u_p \in A_q \;\&\; A_q \in D_i,\ i \leq t)$ and $k=\max(p : y_p \in B_q \;\&\; B_q \in D_i,\ i \leq t)$. The input at time $t$ is the sequence $\omega(t)=(u_{j+1},\ldots,u_r)$ where $r=\max(p : u_p \in A_q \;\&\; A_q \in D_t)$, and the output is $C_t$. Since $E$ is nonanticipative, all the elements of $D_{t+1}$ appear in $(x(t),\omega(t+1))$, and so $D_{t+1}$ can be "assembled" from $(x(t),\omega(t+1))$. Let $\alpha$ be an algorithm for doing this. Then $\alpha$ and $M$ together determine the state transition function $\varphi$:

$$\varphi(t+1;\ t,\ ((u_1,\ldots,u_j),(y_1,\ldots,y_k)),\ \omega(t+1)) = ((u_1,\ldots,u_j,\omega(t+1)),\ (y_1,\ldots,y_k,M(t+1,\alpha(t+1,x(t),\omega(t+1))))).$$

The initial state $x(0)$ is a pair of empty sequences, and the readout map $\eta$ is defined by

$$\eta(t,((u_1,\ldots,u_\ell),(y_1,\ldots,y_m,C_t))) = C_t, \quad \text{for } t>0.$$

Conventionally, models are allowed to approximate system behaviour, rather than reproduce it exactly. Definitions (3.3.1) and (3.3.6) however, require models to compute the observed system behaviour exactly. This does not mean that the class of models which we can treat is any smaller than the class of models which are usually of interest. It merely means that whereas a conventional model may reproduce a system behaviour approximately, the corresponding model in our formalism has the additional task of generating the "corrections" which must be applied to the approximate behaviour, in order to produce the exact system behaviour. Fig. 2 shows the correspondence between a type of conventional model commonly encountered in control studies, and a model which satisfies definitions (3.3.1) and (3.3.6).

It will be recalled that theorem (2.2.12) suggests that "random" is equivalent to "can be computed only by using a table look-up". If a model is considered to be a summary of knowledge about a system, then those computations of the model which have to be performed by using a table look-up correspond to those aspects of the system behaviour which are not understood, and cannot be predicted - in fact, those that appear to be random. The role of these computations may be very different from that shown in fig. 2. For example, if they are "corrections", they need not be additive. But more generally, the terms computed by table look-up need not play the role of "corrections". They may, for instance, be parameters, which would conventionally be viewed as "randomly varying".
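A minimal sketch of the additive case may help fix ideas. It pairs a hypothetical conventional model (a rough linear law, chosen only for illustration) with a table of corrections, so that the combination reproduces the observed outputs exactly, as definitions (3.3.1) and (3.3.6) require. The data, the linear law and all names are assumptions, and additive corrections are only one of the possible roles for table look-up.

```python
def approximate_model(u):
    """Hypothetical conventional model: an approximate law y ~ 3u."""
    return 3 * u

def build_correction_table(U, Y):
    """Residuals which must be stored because the approximate law does not explain them."""
    return [y - approximate_model(u) for u, y in zip(U, Y)]

def exact_model(i, D, corrections):
    """Model in the sense of (3.1): approximate law plus the i-th tabulated correction."""
    A, _B = D
    return (approximate_model(A[0]) + corrections[i - 1],)

U = (1, 2, 3, 4, 5)
Y = (3, 7, 9, 11, 16)
CORRECTIONS = build_correction_table(U, Y)
E = [(((u,), ()), (y,)) for u, y in zip(U, Y)]   # E_i = (D_i, C_i), non-dynamic case
```

The better the approximate law, the more of the table consists of zeros or small numbers, which is the part of the behaviour that still "appears to be random" to the modeller.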
3.4 Criterion of Quality

The third component of our characterisation of modelling is a criterion of quality of a model.

Let $F$ represent a computing facility, together with a programming language. Let $c$ be an injective function from the integers to the set of strings of terminals in the programming language, which is used to represent the integers in programs. ($c$ therefore is included in the definition of the programming language $F$. The definition of programming languages is reviewed in Appendix A; more details of $c$ are given in Chapter 7). Let $S$ be an integer system as defined in section 3.2, with input and output observations $u^i_j$, $y^i_j$.

Definition (3.4.1)

The trivial $F$ model of $S$ is the shortest program which is a concrete $(F,E)$-model of $S$, such that each $c(y^i_j)$ appears in it, where the minimisation of length ranges over all possible sets $E$ (defined by def. (3.3.1)).

It is assumed that the length of a program is measured by the number of terminals appearing in it.
The trivial model of a system is one which computes the output observation set by simply reading it out from a table look-up. It is a model which the modeller has available right at the beginning of the modelling exercise, before he has found any structure or pattern in the system behaviour.

For any system $S$, let the sets $C_i, D_i$ be those defined by def. (3.3.1). One can think of the length of a concrete $(F,E)$-model of $S$ as the "perceived complexity", relative to $F$, of the set $(C_1,\ldots,C_m)$, conditional on the set $((1,D_1),\ldots,(m,D_m))$. The greatest lower bound of this "perceived complexity", taken over all concrete $(F,E)$-models of $S$, is just the conditional Kolmogorov complexity $K_F((C_1,\ldots,C_m) \mid ((1,D_1),\ldots,(m,D_m)))$. (Although Kolmogorov complexity was developed for binary sequences and binary programs, it can be readily generalised to sequences and programs containing any finite number of symbols). An approximate upper bound for this Kolmogorov complexity is the length of the trivial model of $S$.

The length of the trivial $F$ model of $S$ is the "perceived complexity" of $(C_1,\ldots,C_m)$ before any structure has been discovered in the system behaviour. If a shorter model of $S$ is found, then its "perceived complexity" will be reduced.
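The reduction in "perceived complexity" can be illustrated with program texts. The sketch below is hedged in several ways: Python source text stands in for the programming language $F$, the number of characters stands in for the number of terminals, the decimal numeral `str(y)` stands in for $c(y)$, and the table look-up program is merely one program of the trivial kind, not necessarily the shortest one. The data set (the first twenty squares) is invented for the example.

```python
def F(p, i, D):
    """Toy computing facility: p is program text which must define model(i, D)."""
    namespace = {}
    exec(p, namespace)
    return namespace["model"](i, D)

def table_lookup_program(Y):
    """A program of the trivial kind: every numeral c(y) appears literally in it."""
    table = ",".join(str(y) for y in Y)
    return f"def model(i,D):return(({table})[i-1],)"

def length(p):
    """Program length, measured here in characters (an assumed proxy for terminals)."""
    return len(p)

def length_reduction(p, t):
    """Difference between the two "perceived complexities": length(t) - length(p)."""
    return length(t) - length(p)

Y = tuple(i * i for i in range(1, 21))        # observed outputs y_1, ..., y_20
TRIVIAL = table_lookup_program(Y)             # stores the observations verbatim
CANDIDATE = "def model(i,D):return(i*i,)"     # exploits the regularity y_i = i^2
```

Both programs reproduce the observations exactly, but `CANDIDATE` does not grow with the number of observations, whereas `TRIVIAL` does; `length_reduction(CANDIDATE, TRIVIAL)` therefore increases as more observations are explained by the same short program.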
Recalling the analogy between complexity and entropy, it is appealing to measure the "perceived quantity of information" in $((1,D_1),\ldots,(m,D_m))$ about $(C_1,\ldots,C_m)$ as the difference between these two "perceived complexities". Since Kolmogorov complexity is not effectively computable, the only upper bound on this "perceived quantity of information" which is available, in general, is the length of the trivial model. Thus the length of the trivial model is a measure of the amount of information potentially to be conveyed by the modelling exercise.

Definition (3.4.2)

Let $p$ be a concrete $(F,E)$-model of $S$, and let $t$ be the trivial $F$ model of $S$. Then the information gain $I(p)$ of $p$ is the difference

$$I(p) = \ell(t) - \ell(p) \qquad (3.5)$$

where $\ell(\cdot)$ denotes the length of a program.

In section 1.1 a simple example was presented, which suggested that the confidence which one has in a model depends on the difference between the number of observations which the model explains and the number of observations required to construct the model. The information gain is a measure of this difference. If the information gain is zero, then all of the output observations have been used to construct the model; the trivial model is, of course, the prime example of such a model. If the information gain is
close to its u p p e r enough
bound
£(t),
to be c o n s t r u c t e d
observation
set,
by the model.
"parameters"than
that
the m o d e l
is j u s t i f i e d
course,
implies
(Chapter that
of a s y s t e m
contains
sets
that system.
This
that if w e have we can n e v e r
only
the
latter
accords
size
confidence
of the p r o g r a m m i n g
aspect).
This
model
the i n t u i t i v e
of
notion
of a system,
in any m o d e l
claim
in some m o d e l
of the t r i v i a l
well w i t h
of
is c o n t a i n e d
we m a y have
a few o b s e r v a t i o n s
have m u c h
We assume,
the s y s t e m
confidence
by the
in a m o d e l
increases.
about
about
set.
and in the d e f i n i t i o n
the p o s s i b l e
of course.
arbitrary
the c o n f i d e n c e
gain
4 deals with
is b o u n d e d
more
observation
that
all our k n o w l e d g e
in the o b s e r v a t i o n
set is " e x p l a i n e d "
by the amount of i n f o r m a t i o n
as its i n f o r m a t i o n
that
language
then,
of this
of the o u t p u t
gain m a y be n e g a t i v e ,
in the o u t p u t
We are c l a i m i n g ,
is simple
a small p a r t
and the r e m a i n d e r
contained
increases
the m o d e l
from o n l y
The i n f o r m a t i o n
This i n d i c a t e s
reality
then
then
of it w h i c h
may be postulated.

We embody our claim in the following axiom, on the basis of which a choice may be made between competing models.

Axiom (3.4.3)

If $S$ is a system, and $E_1$ and $E_2$ are sets such that $E_1$-models of $S$ and $E_2$-models of $S$ are of interest, and $p$ and $q$ are models, with $p$ being an $(F,E_1)$-model of $S$ and $q$ being an $(F,E_2)$-model of $S$, then the one of $p$ and $q$ which has the higher information gain is to be chosen as the better model of $S$.
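Since the trivial $F$ model $t$ of definition (3.4.1) is obtained by minimising over all possible sets $E$, the same $\ell(t)$ enters the information gain of both $p$ and $q$, even though they are models with respect to different sets $E_1$ and $E_2$. A short derivation, using only definition (3.4.2), shows what the comparison in the axiom then amounts to:

```latex
\begin{align*}
I(p) - I(q) &= \bigl(\ell(t) - \ell(p)\bigr) - \bigl(\ell(t) - \ell(q)\bigr) \\
            &= \ell(q) - \ell(p), \\
I(p) > I(q) &\iff \ell(p) < \ell(q).
\end{align*}
```

For a fixed system and a fixed $F$, choosing the model with the higher information gain is therefore the same as choosing the shorter program.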
This axiom implies that good models are small models. Good models will therefore tend to use the same (short) computational algorithm for as many computations as possible, since the specification of every new algorithm increases the size of the model. Thus the above axiom provides a specific link between the widely-held belief that simplicity (as measured by smallness) is desirable in scientific hypotheses, and the almost universal conviction that the more an observed regularity has been repeated, the more likely it is to recur.
The following theorem is a crucial feature of our characterisation of modelling. As before, ℓ(p) denotes the length of program p, measured by the number of terminal characters which appear in it.
Theorem (3.4.4)
There is, in general, no effective procedure for finding an (F,E)-model p of a system S, such that, for any other (F,E)-model q of S, ℓ(p) ≤ ℓ(q).
Proof.
Suppose that such an effective procedure exists. Consider the case E = {E_1} = {(∅,Y)} (where S=(U,Y)). Then we have an effective procedure for finding the shortest program which computes Y, using only the set {1}. Now suppose that the programming language F has only two terminal characters. Then there exists an effective procedure for finding the shortest binary program which computes Y, using {1}. But Y can be associated uniquely with a binary sequence: Y is a system, and can therefore be associated with its index in some fixed enumeration of systems. This index can be associated with a binary sequence by the bijection introduced in section 2.2.1. Since each of the above steps is effective, there exists an effective procedure for finding the shortest binary program which computes the binary sequence associated with Y, and hence there exists an effective procedure for finding its length, namely the Kolmogorov complexity K_F(Y|1). Suppose F is optimal. Then, by Church's Thesis, K(Y|1) is partial recursive, which contradicts theorem (2.2.5). This proves the theorem.

Theorem
(3.4.4) does not rely on F having only two terminals or on E having the form indicated in the proof. These assumptions are made in order to derive a contradiction to theorem (2.2.5). However, as mentioned earlier, this theorem can be generalised to the case where the sequences considered have an arbitrary finite number of symbols, and to cover the uncomputability of conditional complexity. On the other hand, the theorem does rely on F being optimal. A sufficiently restricted programming language may allow the shortest model to be found, if it exists. However, most systems will not possess any models in such a language (the simplest example is a programming language which always computes the same thing, whatever program it may be given).

Theorem
(3.4.4) implies that there is no algorithm for finding the model of a system which has the highest information gain. So, according to our axiom, there is no algorithm for finding the best model of a system. Consequently the modelling exercise cannot proceed according to some "universal modelling algorithm", but must involve a process of nonalgorithmic (creative?) postulation of hypotheses, followed by the assessment of these hypotheses, according to our axiom.

Note
that the
in s e c t i o n is still
(2.2.4),
Most models
of data w h i c h
up,
it is m o s t
can be explained.
error.
by the m u m b e r
It w i l l
of c h a r a c t e r s
a table
not b e e n e x p l a i n e d usually
be possible
system behaviour
any table
look-ups
algorithms
- that
in the m o d e l
in the rest
a trade-off
between
which would
conventionally
and the d e g r e e
the
a table required
look-
of the data as
to p r o g r a m
it,
of p r e d i c t i o n features
of
by the model. to e x p l a i n m o r e is,
to r e d u c e
of the
the size of
- only by u s i n g m o r e - that Thus
of q u a l i t y
complexity
the use of s m a l l n e s s of a m o d e l
leads
to
of the m o d e l
as the m o d e l
provided
elaborate
is, by i n c r e a s i n g
of that p a r t
be r e g a r d e d
of a p p r o x i m a t i o n
table
look-up,
is, the more
of the m o d e l
as a c r i t e r i o n
aspect
measure
the size of the rest of the model. of the p r o g r a m
artificially
at least one
that e v e r y
size of such
such
the s h o r t e s t
Criteria.
has not been
as a very g e n e r a l
The b i g g e r
the d a t a have
observed
The
there
model.
to c o n t a i n
unlikely
is m e n t i o n e d
is found,
for finding
with Conventional
can be e x p e c t e d
can be r e g a r d e d
(28), w h i c h
if a m o d e l
procedure
generated
measured
that
of that p a r t i c u l a r
Compatibility
since
of M e y e r
implies
no a l g o r i t h m i c
implementation
3.5
result
(cf fig. 2),
by that p a r t of the
model
to the o b s e r v e d
therefore model
provides
data.
The use of this
a safeguard
against
observation
set is large
criterion
"overfitting"
the
to the data. If the o u t p u t
to the size look-up,
of that part
then
the
of the m o d e l
size of the
the size of the model. are b e i n g quality table
compared,
leads
to the
selection This
p r e f e r e n c e for small is large
In this
which
is not
look-up(s)
case,
fitting
enough
the
of smaller
conventional
if the n u m b e r
for the d a n g e r
dominate
criterion
to the
errors,
a table
will
of the m o d e l w i t h
corresponds
relative
if two such m o d e l s
the use of the p r o p o s e d
look-up(s).
ations
table
enough,
of o b s e r v -
of o v e r f i t t i n g
to be
d~missed. The d e f i n i t i o n mines
the d e t a i l s
the s m a l l n e s s
of the p r o g r a m m i n g
of the
trade-off
criterion.
are
definition,
is c o n s i d e r e d
A serious proposed about
the s y s t e m
A typical
is not
situation
a program, a priori
the
has
smallest
of this where
that
of the
language
a priori this m a y
the use of the knowledge
indicate
that
a
should
be p r e f e r r e d .
in c o n v e n t i o n a l
system
identification
a particular
will
about
knowledge
a more
look-
available
~
a smaller
knowledge
then
part
table
7.
be m a d e
If s u f f i c i e n t
model with
It m a y h a p p e n
must
used d e t e r -
in the use of
in w h i c h
constitutes
in C h a p t e r
is a v a i l a b l e
example
parametric
which
reservation
criterion.
model w h i c h
is the
coded,
implicit
The m a n n e r
up e l e m e n t s
language
elaborate overall
prevent
indicates
structure model,
size;
that
is a p p r o p r i a t e .
when written
nevertheless,
that m o d e l
a
being
as
the
chosen
as
better.
Another example is parameter estimation of a linear dynamical process whose output is corrupted by noise. In this case a straightforward minimisation of the equation error usually leads to biased estimates (Eykhoff, (44)). So if two models are being compared whose table look-ups contain the equation errors, it is possible that the larger one will be preferred on probabilistic grounds. Once again, a priori knowledge (about the noise) is required if the smallness criterion is to be overridden. Furthermore, the smallness criterion could still be used to decide between the larger of these two models and a third model belonging to a different class.
As indicated in section 1.1, the proposed criterion is intended for use in situations where little a priori information is available, or in situations where it is too difficult to use such a priori knowledge for model assessment.
The smallness-of-model criterion leads to the same choice of model as do statistical considerations, for a very important class of system behaviours and models of them. If the system behaviour is a stationary random process with rational spectral density function, then it is known how to predict, at any time, its future behaviour, so as to minimise the mean-square prediction error. The method, due essentially to Wiener and Kolmogorov, is to make the prediction for any future time a suitable linear function of past observations of the behaviour (45), (46). If the observation intervals are equally spaced, and a prediction is being made at each instant of the system behaviour at the next observation instant, then the prediction errors are equal to the random, uncorrelated disturbances which are imagined to be acting on the system.

Suppose it is desired to build a concrete (F,E)-model of the system which will give useful one-step-ahead predictions. Any E can be chosen which allows the model to use previous observations to compute predictions (cf. example (3.3.4)). The model will have to generate terms corresponding to prediction errors by means of a table look-up. If the programming language used codes table look-up terms in such a way that length of code is nondecreasing with the magnitude of the term (cf. Chapter 7), then, for a sufficiently long sequence of observations, the model with smallest (in magnitude) prediction errors will be the smallest (in length) model.
But it is known that, for the system under consideration, the smallest mean square prediction error is obtained by the use of the Wiener-Kolmogorov theory. Furthermore, Sherman (47) has shown that, if the process is Gaussian, then the same linear predictor is obtained if the expectation of any even nondecreasing function of the prediction error is minimised. So, under these conditions, for a long enough sequence of observations, the "expected best model", judged according to axiom (3.4.3), is the Wiener-Kolmogorov model. The terms appearing in the table look-up of this model constitute a random, uncorrelated sequence. Theorem (2.2.12) therefore suggests that these terms could not be generated by any more efficient algorithm than a table look-up.
3.6 Prediction

If the best model that has been found up to some time is a concrete (F,E)-model p, and it is desired to find the system behaviour under some new (possibly not yet observed) conditions, which can be represented by a block of "virtual observations", D_{m+1}, that is, the observations which would be observed if the new conditions obtained, then the model p, and the computer F, can be used to find the "prediction" F(p,m+1,D_{m+1}). This provides a means of computing the values of a possible input/output function of the system on elements of its domain which have not been previously observed. According to our axiom, these values are the best "predictions" available to us. We have put the word "prediction" in quotes, because the value F(p,m+1,D_{m+1}) need not represent a future value (for example, if the model runs backwards through the observation interval).

It is possible, of course, that the value F(p,m+1,D_{m+1}) is not defined. In this case, it may not be possible to use p for predicting system behaviour under the conditions D_{m+1}. However, for some models, the value F(p,m+1,D_{m+1}) may be undefined simply because p includes the generation of certain parameters by means of a table look-up, and the table does not contain an element which is to be used for the (m+1)th computation. In this case, prediction is still possible if
such an element be supplied to the model. The problem is, what value should that element take? Our solution is to propose a second axiom, which may be thought of as an extension of the previous one.

Axiom for Prediction
If elements are to be supplied to table look-ups of a model, in order to allow that model to compute a prediction, then the best prediction will be obtained if these elements are chosen so as to minimise the resulting increase in size of the model.

The use of this axiom must be qualified, of course, by the value of the information gain. We can certainly supply an element to a trivial model, thus enabling it to predict; but we have no confidence in that prediction, whatever the element may be.

A rough justification of the axiom runs as follows. The basic assumption of scientific prediction is that any regularities, which have been detected in a sequence of observations, will continue to be present in the behaviour of the system during the prediction interval. Hence we should choose the elements to appear in the table look-up for the prediction to be such that previously observed regularities are present in the computed prediction. But, if the model is such that it requires a large amount of code in a table look-up, in order to compute an output which is consistent with such regularities, then we can certainly obtain a better model by using the "fixed" part of the model (i.e. the part that is common to all the computations) to compute the regularities. This is true because for a sufficiently large set of observations, the size of the model will be governed by the average length of code appearing as elements of the table look-up. Thus the axiom is reasonable if it is assumed that it is applied to the best available model.

The above argument can be illustrated by the following example.
Suppose a system is defined by the observations:

    S=(U,Y)=(∅,(552,553,546,551,549,544,547,554,557,551)).

If the programming language and computing facility F is taken to be Algol W, as implemented on the IBM 370/165 installation at Cambridge, and E_i=(∅,y_i), i=1,...,10, (so that E_3=(∅,546), for example), then a trivial (F,E)-model of S is:

    BEGIN INTEGER I,J; INTEGER ARRAY Y(1::10);
    FOR J:=1 UNTIL 10 DO READ (Y(J));
    READ (I);
    WRITE (Y(I));
    END.
    552,553,546,551,549,544,547,554,557,551,

When presented with an integer i (1≤i≤10), this program computes y_i by looking it up in the array Y. We know that this model is useless for prediction, because it is a trivial model. Nevertheless, we can make it compute a "prediction". We must first supply it with a new entry in its table look-up. To do this, we replace the integer 10 in line 2 by the integer 11, and add a new number
at the end of the program. When presented with the integer 11, this program will output the new number. What should this number be? According to our Axiom for Prediction, it should be one of the integers 0,...,9. The prediction of y_11 will then be that integer.

Clearly this prediction is a very bad one. A prediction nearer 550 would "obviously" be better. In other words, we can see that a prediction obtained by not obeying the Axiom for Prediction will be better than one obtained by obeying it. But why can we see this? Because we have detected a regularity in the system behaviour - namely, that the behaviour tends to remain close to 550. But in that case we can use our knowledge of this to build a better model. One way of doing this is to observe that the mean of the behaviour remains close to 550, and therefore to build a model which is of the conventional "mean plus random error" type:

    BEGIN INTEGER I,J; INTEGER ARRAY E(1::10);
    FOR J:=1 UNTIL 10 DO READ (E(J));
    READ (I);
    WRITE (550+E(I));
    END.
    2,3,-4,1,-1,-6,-3,4,7,1,

This model computes the system behaviour Y by computing the observed regularity (550), and correcting it by a term obtained from a table look-up. Admittedly, this model is only slightly better than the trivial model (its information gain is 12 terminals), but it would rapidly become decisively superior if more observations became available, which remained close to 550.

In this case, if we apply our Axiom for Prediction, we obtain as the predicted output next time, an integer lying between 550 and 559. This range is quite reasonable. Clearly, several similar models can be built, for these data, and for each of them there is a considerable range of "best" predictions. It may be possible to reduce this range, for example by estimating the probability distribution of the table look-up terms.
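The counting behind this example can be checked with a short Python sketch. It is not part of the original Algol W programs: the helper names (`table_text`, `best_new_entries`) and the candidate range of -20 to 20 are illustrative assumptions, and the table look-up is assumed to be coded as decimal integers each followed by a comma, as in the listings above.

```python
# Observations from the worked example in the text.
observations = [552, 553, 546, 551, 549, 544, 547, 554, 557, 551]
MEAN = 550  # the regularity written into the "mean plus random error" program


def table_text(values):
    """Render a table look-up as it is listed after END.: each entry plus a comma."""
    return "".join(f"{v}," for v in values)


trivial_table = table_text(observations)
residuals = [y - MEAN for y in observations]
residual_table = table_text(residuals)

# The two programs differ only in the WRITE statement, which gains "550+".
extra_program_chars = len("550+")
saving = len(trivial_table) - len(residual_table) - extra_program_chars


def best_new_entries(candidates):
    """Axiom for Prediction: keep the entries that lengthen the table least."""
    cost = {c: len(f"{c},") for c in candidates}
    shortest = min(cost.values())
    return [c for c in candidates if cost[c] == shortest]


# Hypothetical candidate range, chosen only for illustration.
new_entries = best_new_entries(range(-20, 21))
predictions = [MEAN + e for e in new_entries]

print("residual table:", residual_table)
print("saving in terminal characters:", saving)
print("predicted next outputs:", predictions[0], "to", predictions[-1])
```

Under these assumptions the residual table reproduces the one listed in the text, and the shortest admissible new entries are the single digits, since a minus sign costs an extra character.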
3.7 An Example

3.7.1 Introduction

In this section an example will be presented, which will portray a particular modelling exercise in terms of the above characterisation. The example which was used is the modelling of the gas furnace data, considered by Box and Jenkins (45). The data, (which is given by Box and Jenkins as Series J), consists of 296 pairs of input-output observations. The input observations are of gas flow rate into a furnace, and the output observations are of the concentration of carbon dioxide in the outlet gases. The observations were made at equal intervals of nine seconds.

Box and Jenkins obtain a model which consists of a deterministic transfer function relating the input flow rate of the gas to the output concentration of carbon dioxide, and a model of the noise process which disturbs the deterministic relationship. The model they obtain is:

    \hat{y}_t = -\frac{0.53 + 0.37B + 0.51B^2}{1 - 0.57B - 0.01B^2} \, u'_{t-3}     (3.6)

    n_t = \frac{1}{1 - 1.53B + 0.63B^2} \, w_t     (3.7)

    y'_t = \hat{y}_t + n_t     (3.8)

Here u'_t and y'_t represent the input and output variables, respectively, after removal of their mean values, at sampling instant t. \hat{y}_t represents the estimate of y'_t generated by the transfer function of eqn (3.6), and n_t is the error between y'_t and \hat{y}_t. Using conventional system identification terminology, n_t is an "output error" ("error in variables" in the terminology of econometrics (Johnston (48))). n_t represents a stochastic disturbance acting on the output of the process at time t, and w_t is a "discrete white-noise" process (i.e. a zero-mean, serially uncorrelated random sequence), which is considered to cause the disturbance n_t according to relationship (3.7). B is the backward shift operator, defined by Bx_t = x_{t-1}. The model can be given a conventional diagrammatic representation, as in fig. 3, where \bar{u}, \bar{y} denote the mean values of the input and output variables, respectively.
3.7.2 The System

In terms of definition (1.3.1), the system which we are considering is S=(U,Y) where

    U=(u_1,...,u_296),
    Y=(y_1,...,y_296),
    ℓ_i = m_i = 1,  i=1,...,296,

and {u_i,y_i} are the observations listed in Appendix C. As in the example of section 3.6, we shall take the programming language F to be Algol W, as implemented on the IBM 370/165 installation at Cambridge.

3.7.3 Model I - The Trivial Model

We must define the sets A,B,C,D,E, which occur in definition 3.3.1. For the trivial model, we can take these to be:

    A = {A_i : i=1,...,296},  A_i = ∅
    B = {B_i : i=1,...,296},  B_i = ∅
    C = {C_i : i=1,...,296},  C_i = y_i
    D = {D_i : i=1,...,296},  D_i = (A_i,B_i) = ∅
    E = {E_i : i=1,...,296},  E_i = (D_i,C_i) = (∅,y_i)

A concrete (F,E)-model, which is a trivial model of the system S, is:

    BEGIN INTEGER I,J; REAL ARRAY Y(1::296);
    FOR J:=1 UNTIL 296 DO READON (Y(J));
    READ (I);
    WRITE (Y(I));
    END.
    53.8  53.6  53.5  ...  57.0

The last line of the trivial model is the table look-up, which contains the output observations. The trivial model can be represented diagrammatically, as in fig. 4(a).

3.7.4 Model II - The Mean
Probably the first nontrivial model to be hypothesised for many systems is that the system behaviour has a constant mean value. This model is of the type which reproduces regularities only in the output observations, and does not exploit any information in the input observations. Consequently, the sets A,B,C,D,E may be taken to be the same as for the trivial model. The mean value of the output observations is 53.5. The following is an (F,E)-model of S which makes use of this fact:

    BEGIN INTEGER I,J; REAL ARRAY Y(1::296);
    FOR J:=1 UNTIL 296 DO READON (Y(J));
    READ (I);
    WRITE (53.5 + Y(I));
    END.
    .3  .1  0  0  -.1  ...  3.8  3.5

The table look-up of this model is listed in the column headed y'_i in Appendix C. Fig. 4(b) shows a diagrammatic representation of this model. The dashed line represents the boundary of the model.
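As a rough illustration of how the table look-up of Model II relates to the raw observations, the following Python sketch rebuilds the first five table entries and runs the model on them. It assumes that the first five output observations are 53.8, 53.6, 53.5, 53.5, 53.4, as they appear at the head of the table look-ups printed with the later models; the function name `model_ii` is hypothetical, and the rounding to one decimal place mirrors the precision of the listing.

```python
MEAN = 53.5  # constant written into the WRITE statement of Model II

# Assumed first five output observations (taken from the later table look-ups).
first_outputs = [53.8, 53.6, 53.5, 53.5, 53.4]

# Table look-up entries: observations with the mean removed, to one decimal.
table = [round(y - MEAN, 1) for y in first_outputs]


def model_ii(i, table):
    """Mimic READ (I); WRITE (53.5 + Y(I)) with 1-based index i."""
    return round(MEAN + table[i - 1], 1)


reproduced = [model_ii(i, table) for i in range(1, len(table) + 1)]
print(table)
print(reproduced)
```

The table entries are shorter than the raw observations, which is the only sense in which this model improves on the trivial one.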
3.7.5 Model III - Deterministic Transfer Function

We now assume that the transfer function of equation (3.6) has been hypothesised as the relationship between the input and output of the system. We make the restriction that the model may not assume knowledge of past output observations (other than initial conditions), but may assume knowledge of all the past and present input information. However, we have to choose new sets A,...,E:

    A = {A_i : i=1,...,296},  A_i = (u_1,u_2,...,u_i) for i≥6,  A_i = ∅ for i<6
    B = {B_i : i=1,...,296},  B_i = ∅
    C = {C_i : i=1,...,296},  C_i = y_i
    D = {D_i : i=1,...,296},  D_i = (A_i,B_i) = (u_1,u_2,...,u_i)
    E = {E_i : i=1,...,296},  E_i = ((u_1,u_2,...,u_i),y_i).

An (F,E)-model, which uses the hypothesis that the system behaviour is governed by equation (3.6), is:

    BEGIN INTEGER I,J; REAL ARRAY N,U,Y(1::296);
    FOR J:=1 UNTIL 296 DO READON (N(J));
    READ (I);
    IF I<6 THEN WRITE (N(I)) ELSE
    BEGIN
      FOR J:=1 UNTIL 5 DO
      BEGIN Y(J) := N(J)-53.5; READON (U(J));
      END;
      FOR J:=6 UNTIL I DO
      BEGIN READON (U(J));
        Y(J) := -.53*U(J-3)-.37*U(J-4)-.51*U(J-5)+.57*Y(J-1)+.01*Y(J-2);
      END;
      WRITE (Y(I)+53.4+N(I));
    END;
    END.
    53.8  53.6  53.5  53.5  53.4  -.2  -.4  ...  4.1  4.0

The table look-up of this model is listed in Appendix C in the column headed n_i. Note that the first five terms of the table are simply the output observations y_1,...,y_5. The reason for this is that equation (3.6) relates the value of \hat{y}_i to the values of u_{i-3},u_{i-4},u_{i-5}. Hence this equation cannot be used for generating the first five terms of the observed behaviour. So the first five terms are generated by table look-up. Fig. 4(c) shows a representation of this model. The sequence {n_i} is the same as the sequence {n_t} defined by equation (3.8).

Note that fig. 4(c) shows the subtraction of the mean of the input observations before these observations are submitted to the transfer function algorithm. In fact, the model achieves this more economically by having 53.4 instead of 53.5 in its output statement.
3.7.6 Model IV - Deterministic Transfer Function, Using Output Observations.

Now suppose that the same transfer function is hypothesised as governing the behaviour of the system, but that the model is now allowed to use all past output observations, as well as past and present input observations. Because of the nature of the transfer function, suitable sets A,...,E are:

    A = {A_i : i=1,...,296},  A_i = ∅ for i<6,
                              A_i = (u_{i-5},u_{i-4},u_{i-3}) for i≥6,
    B = {B_i : i=1,...,296},  B_i = ∅ for i<6,
                              B_i = (y_{i-1},y_{i-2}) for i≥6,
    C = {C_i : i=1,...,296},  C_i = y_i
    D = {D_i : i=1,...,296},  D_i = (A_i,B_i) = ∅ for i<6
                                  = ((u_{i-5},u_{i-4},u_{i-3}),(y_{i-1},y_{i-2})) for i≥6,
    E = {E_i : i=1,...,296},  E_i = (D_i,C_i) = (∅,y_i) for i<6
                                  = (((u_{i-5},u_{i-4},u_{i-3}),(y_{i-1},y_{i-2})),y_i) for i≥6.

A suitable (F,E)-model is:

    BEGIN INTEGER I,J; REAL ARRAY E(1::296); REAL U,V,W,Y,Z;
    FOR J:=1 UNTIL 296 DO READON (E(J));
    READ (I);
    IF I<6 THEN WRITE (E(I)) ELSE
    BEGIN READON (U,V,W,Y,Z);
      WRITE (-.53*U-.37*V-.51*W+.57*Y+.01*Z+22.4+E(I));
    END;
    END.
    53.8  53.6  53.5  53.5  53.4  -.2  -.3  ...  1.7  1.6

The table look-up for this model is listed as column e_i in Appendix C. The data items required by this model are: U=u_{i-3}, V=u_{i-4}, W=u_{i-5}, Y=y_{i-1}, Z=y_{i-2}. The term "22.4", which appears in the output statement, corrects for the non-zero means of both input and output observations. A diagrammatic representation of this model is shown in fig. 4(d).
Note that the elements e_i in the table look-up are not the same as the terms n_t which appear in equation (3.8). The reason for this is, of course, that the output of the transfer function part of the model is no longer \hat{y}_i, but is a new quantity y^*_i. \hat{y}_i is given by

    \hat{y}_i = -0.53u'_{i-3} - 0.37u'_{i-4} - 0.51u'_{i-5} + 0.57\hat{y}_{i-1} + 0.01\hat{y}_{i-2}     (3.9)

whereas y^*_i is given by

    y^*_i = -0.53u'_{i-3} - 0.37u'_{i-4} - 0.51u'_{i-5} + 0.57y'_{i-1} + 0.01y'_{i-2}     (3.10)

Since y'_i = \hat{y}_i + n_i, we have

    y^*_i = \hat{y}_i + 0.57n_{i-1} + 0.01n_{i-2}     (3.11)

In general,

    y^*_i = \hat{y}_i + (1 - A(B))n_i     (3.12)

if the scalar term of A(B) is 1, where A(B) is defined as in fig. 3. Eqn (3.8) and (3.12), together with

    e_i = y'_i - y^*_i     (3.13)

lead to

    e_i = A(B)n_i     (3.14)

which is a well-known result (44). e_i is in fact the "equation error" for the model y' = (B/A)u'.

In view of these differences, model IV is not of the form which was assumed by Box and Jenkins, and for which the model coefficients were estimated. This does not prevent us from measuring its information gain, and thus obtaining some assessment of its value as a one-step-ahead predictor which uses the most recent output observations. This underlines an important feature of our characterisation of modelling. Theorem (3.4.4) implies that, for the very general class of models which we consider (those of definition (3.3.1)), there exists a fundamental dichotomy between finding a model and assessing it. Thus the process by which a particular model was arrived at is quite irrelevant to its assessment by the use of information gain.
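The relation e_i = A(B)n_i of eqn (3.14) can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses synthetic random sequences in place of the gas furnace data (which are listed in Appendix C and not reproduced here), takes A(B) = 1 - 0.57B - 0.01B^2 from the denominator of eqn (3.6), and starts the output-error recursion from zero initial conditions. The function names are hypothetical.

```python
import random


def input_part(u, i):
    """Contribution of the inputs: -0.53u(i-3) - 0.37u(i-4) - 0.51u(i-5)."""
    return -0.53 * u[i - 3] - 0.37 * u[i - 4] - 0.51 * u[i - 5]


def equation_error_check(length=50, seed=0):
    """Return the largest |e_i - A(B)n_i| over a synthetic record."""
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(length)]  # mean-removed input
    n = [rng.gauss(0.0, 0.3) for _ in range(length)]     # output disturbance

    # Output-error recursion (3.9): y_hat is fed back on itself.
    y_hat = [0.0] * length
    for i in range(5, length):
        y_hat[i] = input_part(u, i) + 0.57 * y_hat[i - 1] + 0.01 * y_hat[i - 2]

    # Observed (mean-removed) output, eqn (3.8).
    y = [y_hat[i] + n[i] for i in range(length)]

    worst = 0.0
    for i in range(5, length):
        # One-step predictor (3.10): uses the observed outputs instead.
        y_star = input_part(u, i) + 0.57 * y[i - 1] + 0.01 * y[i - 2]
        e = y[i] - y_star                                   # eqn (3.13)
        a_of_b_n = n[i] - 0.57 * n[i - 1] - 0.01 * n[i - 2]  # A(B) n_i
        worst = max(worst, abs(e - a_of_b_n))
    return worst


max_discrepancy = equation_error_check()
print("largest |e_i - A(B) n_i| =", max_discrepancy)
```

The discrepancy should be at the level of floating-point rounding, which is consistent with the table look-up of model IV holding a filtered version of the sequence {n_i} rather than {n_i} itself.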
3.77
Model V - Stochastic Process We n o w c o n s i d e r
u s e s eqn From
(3.7)
Model
a r e f i n e d v e r s i o n of m o d e l
in an a t t e m p t
to p r e d i c t
a s t a t i s t i c a l p o i n t of v i e w this
the c o e f f i c i e n t s the p r o c e s s
ni,
appearing and eqn
(3.14)
and n i have quite different again,
however,
in eqn
spectral
we are free to a s s e s s
for o n e - s t e p - a h e a d p r e d i c t i o n , obtained.
the t e r m s e i-
is a n o n s e n s e ,
(3.7) w e r e
shows
IV, w h i c h
because
estimated
for
t h a t the p r o c e s s e s
ei
characteristics.
Once
the v a l u e
regardless
of the m o d e l
of h o w it was
95
In this case the model must be allowed output
observations,
exploit
equation
slightly
since it would
(3.7).
different
However,
otherwise
to past
be unable
the sets A,...,E
from those defined
A={Ai:i=l , .... 296},
access
for model
Ai= ~ for i<8,
to
must be
IV:
Ai=(ui_
,...,ui_1)
for i>8, B={Bi:i=l,...,296}
,
Bi=~
for i<8,
Bi=(Yi_7,...,yi_ I)
for i98, C={Ci:i=i,...,296},
Ci=Y i,
D={Di:i=l ..... 296},
Di=
for i<8
Ai,Bil= (Ui_~'''.,Ui_l) ,(Yi_7,.-.,Yi_!
for i~8, f for i<8 Ei=(Di,Ci )= ~(~'Yi )
E={Ei:i=l,...,296},
[
(((ui_7,...,ui_l),(Yi_ yi_,)),yi)
A model which exploits equations (3.6) and (3.7) could be built for smaller sets $A_i$ and $B_i$ than those defined above. However, the model would then have to be slightly larger. Since real interest lies not in building an (F,E)-model for a particular set E, but rather in building an (F,E)-model for any nonanticipative E, it is sensible to define the sets A,...,E in a way that allows the smallest (F,E)-model to be built, providing that E remains nonanticipative (cf. definition (3.3.5)). In this case the (F,E)-model is:

BEGIN INTEGER I,J; REAL ARRAY A,U,Y,Z(1::296);
  FOR J:=1 UNTIL 296 DO READON (A(J));
  READ (I);
  IF I<8 THEN WRITE (A(I)) ELSE
  BEGIN
    FOR J:=1 UNTIL 7 DO READON (U(I-J),Y(I-J));
    FOR J:=(I-2) UNTIL I DO
      Z(J):=-.53*U(J-3)-.37*U(J-4)-.51*U(J-5)+.57*Y(J-1)+.01*Y(J-2);
    WRITE (Z(I)+1.53*(Y(I-1)-Z(I-1))-.63*(Y(I-2)-Z(I-2))+A(I)+2.2);
  END;
END.
53.8 53.6 53.5 53.5 53.4 53.1 52.7 0 .... .1

The table look-up for this model is listed as column $a_i$ in Appendix C. All the required adjustments to the means of the observations are accomplished by the single term "2.2" in the output statement. A representation of the model is shown in fig. 4(e). The above form of the model clearly shows how the output is computed, but is slightly larger than necessary. A shorter version which performs the same computation is:
BEGIN INTEGER I,J; REAL ARRAY A,U,Y(1::296);
  FOR J:=1 UNTIL 296 DO READON (A(J));
  READ (I);
  IF I<8 THEN WRITE (A(I)) ELSE
  BEGIN
    FOR J:=1 UNTIL 7 DO READON (U(I-J),Y(I-J));
    WRITE (2.1*Y(I-1)-1.5*Y(I-2)+.34*Y(I-3)+.01*Y(I-4)-.53*U(I-3)+.44*U(I-4)
           -.28*U(I-5)+.55*U(I-6)-.32*U(I-7)+A(I)+2.2);
  END;
END.
53.8 53.6 ....

The table look-up for this version is of course the same as for the previous one.

3.7.8 Model VI - Box & Jenkins model

Finally, we consider the model which uses equations (3.6), (3.7) and (3.8) in the manner intended by Box and Jenkins (45). The sets A,...,E can remain the same as for Model V. The model, which can be compared with the forecast function given on p.407 of (45), is:
BEGIN INTEGER I,J; REAL ARRAY W,U,Y(1::296);
  FOR J:=1 UNTIL 296 DO READON (W(J));
  READ (I);
  IF I<8 THEN WRITE (W(I)) ELSE
  BEGIN
    FOR J:=1 UNTIL 7 DO READON (U(I-J),Y(I-J));
    WRITE (2.1*Y(I-1)-1.5*Y(I-2)+.34*Y(I-3)+.01*Y(I-4)-.53*U(I-3)+.44*U(I-4)
           -.28*U(I-5)+.55*U(I-6)-.32*U(I-7)+W(I)-.57*W(I-1)-.01*W(I-2)+2.2);
  END;
END.
53.8 53.6 53.5 53.5 53.4 53.1 52.7 .1 .1 .... 4
The table look-up for this model is shown in Appendix C as the column headed $w_i$. The elements of this column are estimates of the elements of the "white noise" sequence $\{w_t\}$ of equation (3.7). Fig. 4(f) shows the structure of this model, although the above program is a more efficient implementation than that shown in the figure (cf. model V). The equivalence of this model to equations (3.6)-(3.8) is shown by the following manipulations, where the operators A,B,C,D are as defined in fig. 3:

$$y_i = y_i^* + e_i$$
$$y_i^* = (1-A)y_i^* + Bu_i$$
$$Ae_i = (1-C)Ae_i + ADw_i$$
$$\phantom{Ae_i} = (1-C)A(y_i - y_i^*) + ADw_i$$
$$\phantom{Ae_i} = (1-C)(Ay_i - Bu_i) + ADw_i$$
$$\therefore\; y_i = (1-AC)y_i + BCu_i + ADw_i$$
$$\therefore\; ACy_i = BCu_i + ADw_i$$
$$\therefore\; y_i = \frac{B}{A}u_i + \frac{D}{C}w_i$$

Note that the operators (1-A) and (1-C) act only on past values of $y_i^*$ and $e_i$, respectively.

3.7.9 Assessment of the Models
The size of each of the above six models was measured as the number of terminal characters of Algol W that appear in it. Reserved words, such as BEGIN, were considered to be single terminals, as were standard procedure names, such as WRITE. This practice is justified in chapter 6. Unnecessary spaces were not counted. The elements of each table look-up were taken to be as shown in Appendix C, except that positive entries were considered to be preceded by "+". The reason for this is discussed in chapter 7.
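The counting rule just described can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the reserved-word list is a small hypothetical subset of Algol W chosen to cover the programs of this section, every space is treated as unnecessary, each remaining character (including both characters of ":=") counts as one terminal, and the names `model_size` and `with_explicit_signs` are ours. It should therefore not be expected to reproduce exactly the sizes reported for the six models.

```python
import re

# Hypothetical subset of Algol W reserved words and standard procedure
# names; each one is counted as a single terminal (assumption of this sketch).
SINGLE_TERMINALS = [
    "BEGIN", "END", "INTEGER", "REAL", "ARRAY", "FOR", "UNTIL", "DO",
    "IF", "THEN", "ELSE", "READON", "READ", "WRITE",
]

# Longest alternatives first, so that READON is tried before READ.
_WORDS = "|".join(sorted(SINGLE_TERMINALS, key=len, reverse=True))
# A reserved word must not be embedded in a longer identifier or number;
# anything else that is not white space counts character by character.
_TOKEN = re.compile(r"(?<![A-Z0-9])(?:%s)(?![A-Z0-9])|\S" % _WORDS)


def model_size(program: str) -> int:
    """Number of terminals: one per reserved word, one per other non-space character."""
    return len(_TOKEN.findall(program))


def with_explicit_signs(entries):
    """Return table look-up entries with positive values preceded by '+'."""
    return [e if e.startswith(("+", "-")) else "+" + e for e in entries]
```

For instance, `model_size("BEGIN INTEGER I,J; END.")` counts BEGIN, INTEGER and END once each, plus the five remaining characters.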
The following table gives the results of the model assessment. In addition to the size and information gain of each model, the "information explained" by it is shown. This quantity is the ratio of the information gain to the size of the trivial model and resembles an efficiency measure, if the information gain is not negative.
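The two quantities can be computed directly from the model sizes. The sketch below assumes, as the entries of Table I confirm, that the information gain of a model is the size of the trivial model minus the size of the model; the function names and the dictionary of sizes (copied from Table I) are ours.

```python
def information_gain(trivial_size: int, model_size: int) -> int:
    """Information gain: size of the trivial model minus size of the model."""
    return trivial_size - model_size


def information_explained(trivial_size: int, model_size: int) -> float:
    """Ratio of the information gain to the size of the trivial model."""
    return information_gain(trivial_size, model_size) / trivial_size


# Sizes of the Algol W versions of the six models, as reported in Table I.
TABLE_I_SIZES = {"I": 1532, "II": 1159, "III": 1076,
                 "IV": 964, "V": 1005, "VI": 1013}


def assess(sizes, trivial="I"):
    """Map each model name to (information gain, information explained in %)."""
    t = sizes[trivial]
    return {name: (information_gain(t, s), 100.0 * information_explained(t, s))
            for name, s in sizes.items()}
```

Applied to `TABLE_I_SIZES`, `assess` reproduces the information gains of Table I exactly; the percentages agree with the printed ones to within about 0.1 of a percentage point.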
MODEL   SIZE   INFORMATION GAIN   INFORMATION EXPLAINED
I       1532   0                  0
II      1159   373                24.4%
III     1076   456                29.8%
IV      964    568                37.0%
V       1005   527                34.4%
VI      1013   519                33.8%

TABLE I - Results of Model Assessment.

It is interesting to consider what the situation would be if the same six models were postulated for systems which consisted of the initial segments of the data, $S_j=((u_1,\ldots,u_j),(y_1,\ldots,y_j))$, $0\le j\le 296$. The model sizes and information gains of the six models for these systems are shown in figs. 5 and 6, respectively. From fig.
6 it is seen that after the first few
o b s e r v a t i o n s b e c o m e available,
the b e s t of the six
models
is m o d e l
II - namely,
This m o d e l
remains
been obtained, model more
elaborate
does
actually
Nevertheless, that
This
have
smaller
the
comparison
step-ahead, whereas
model
that
III,
output
for l o n g - t e r m than
also has
than
insignificant
not,
is b e i n g
output
gains
IV.
V
to the data. V.
V is p r e d i c t i n g
the w h o l e
than m o d e l
at each
onestep,
observation However,
II.
model
This
it is b e t t e r
the m e a n
to j u s t i f y
form i n t e n d e d
there
is little
information IV.
errors
those of m o d e l
It is i n t e r e s t i n g
to m o d e l
a slightly
gain.
lower
value
to use
of the
V, b u t
the i n c r e a s e d to note
by Box
information
w i are s l i g h t l y
that,
although
and Jenkins,
to choose
between
Furthermore,
while
gain than
smailer,
the d i f f e r e n c e complexity
model
on
is too
of m o d e l
model
VI
model
V is
them on the basis
neither
V
indicates
than m o d e l
information.
just p r e d i c t
that m o d e l
IV to m o d e l
gain
prediction
have
than m o d e l
information
over
gain
information
"overfitted"
since m o d e l
the i n p u t
The p r e d i c t i o n
the whole,
of the
from m o d e l
V - a
observations.
V.
VI.
latest
information
rather
M o d e l VI model
only
a higher
indicates
the
Model
C reveals
errors
have
Thereafter
a lower
of i n f o r m a t i o n
III is p r e d i c t i n g
interval t using III has
of A p p e n d i x
informa[~on
surprising,
using
model
a lower
better.
296 o b s e r v a t i o n s
prediction
the m o d e l
is not very
all
mean.
90 o b s e r v a t i o n s
IV - has
in c o m p l e x i t y
III has
about
the five models.
after
Examination
justified;
Model
of
of a c o n s t a n t
IV b e c o m e s
than m o d e l
IV, e v e n
the i n c r e a s e
is not
until
model
the b e s t
model
gain than m o d e l obtained.
the best
whereupon
IV remains
been
the p o s t u l a t i o n
is
of
is p r e f e r a b l e
3.8 Summary

The characterisation of modelling which has been developed in this chapter can be summarised as follows:

(1) A system is defined by a set of observations.
(2) A model of a system is an algorithm for computing the output observation set by using specified subsets of the system observations.
(3) Those aspects of a system's behaviour which are not understood are computed by the model with the aid of a table look-up.
(4) The situation at the beginning of the modelling exercise is captured by the concept of the trivial model.
(5) The modelling exercise progresses in "steps", each step resulting from the postulation of hypotheses about the system. In general, the transition from one step to the next cannot proceed algorithmically.
(6) At each step, the progress of the modelling exercise is measured by the information gain of the current model.
4. INCORPORATION OF A PRIORI KNOWLEDGE

4.1 Choice of Programming Language

The modeller has certain a priori beliefs about the system he is modelling. His choice of programming language should reflect these. It will be recalled from sec. 3.4 that assessing models on the basis of information gain is tantamount to comparing the number of "arbitrary elements" which make up a hypothesis about a behaviour with the "number of observations" of that behaviour. These arbitrary elements are always counted relative to some structure which is taken for granted. This structure is provided by the definition of the programming language used. In other words, the programming language embodies those arbitrary elements which will be common to all the models being assessed. It obviously makes sense to choose the language so that these common elements coincide with those assumptions that the modeller is willing to take for granted.

For example, suppose that the modeller believes that the gas-furnace data of sec. 3.7 is certainly produced by a model of the form

$$y_i = b_0u_i + b_1u_{i-1} + \ldots + b_mu_{i-m} - a_1y_{i-1} - \ldots - a_ny_{i-n} + e_i \qquad (4.1)$$

and that he is not prepared to seriously consider a model with any other structure. Then he can define the following programming language. Every program of the language is a list of integers and rationals which is given the interpretation (informally):

$n, m, a_1,\ldots,a_n, b_0,\ldots,b_m, e_1,\ldots,e_N.$

The data for such a program is a similar list, with the interpretation:

$i, y_{i-1},\ldots,y_{i-n}, u_i,\ldots,u_{i-m}.$

Given such a program and such a set of data, the computation of $y_i$ in accordance with eqn (4.1) is invoked. If input observations $(u_1,\ldots,u_N)$ and output observations $(y_1,\ldots,y_N)$ of a system are obtained, then a certain (infinite) set of programs in this language will constitute models of the system $((u_1,\ldots,u_N),(y_1,\ldots,y_N))$. The terms $e_1,\ldots,e_N$ which appear in the program form a table look-up. A trivial model is obtained if $m=n=b_0=0$, and $e_1=y_1,\ldots,e_N=y_N$.

The programming language just described will be called Linear Model Language, or LML. Fig. 7 shows the structure of the computations it performs. LML is used in Appendix A to illustrate the formal definition of programming languages.
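To make the informal interpretation of an LML program concrete, the following Python sketch evaluates eqn (4.1) for a given program and data list. It is a minimal illustration, not the formal definition given in Appendix A: the function names are ours, both lists are assumed to be complete (no special handling of indices i for which fewer than n past outputs or m past inputs exist), and numbers are ordinary Python numbers.

```python
def parse_lml(program):
    """Split the list n, m, a_1..a_n, b_0..b_m, e_1..e_N into its parts."""
    n, m = int(program[0]), int(program[1])
    a = list(program[2:2 + n])              # a_1, ..., a_n
    b = list(program[2 + n:3 + n + m])      # b_0, ..., b_m
    e = list(program[3 + n + m:])           # table look-up e_1, ..., e_N
    return n, m, a, b, e


def run_lml(program, data):
    """Compute y_i by eqn (4.1).

    data is the list i, y_{i-1}, ..., y_{i-n}, u_i, ..., u_{i-m}.
    """
    n, m, a, b, e = parse_lml(program)
    i = int(data[0])
    past_y = data[1:1 + n]                  # y_{i-1}, ..., y_{i-n}
    recent_u = data[1 + n:2 + n + m]        # u_i, ..., u_{i-m}
    y = sum(bk * uk for bk, uk in zip(b, recent_u))
    y -= sum(ak * yk for ak, yk in zip(a, past_y))
    return y + e[i - 1]                     # e_i is stored at list position i-1


def trivial_lml(outputs):
    """Trivial model: m = n = b_0 = 0 and e_i = y_i."""
    return [0, 0, 0] + list(outputs)
```

With the trivial program, `run_lml` simply returns the stored observation, whatever input value is supplied in the data list.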
Assuming that the same representation of numbers is used in LML as in Algol, it is clear that a model written in LML will be smaller than the Algol implementation of the same algorithm. So model assessment using LML will indicate fewer "arbitrary elements" in each model than would assessment using Algol. In a sense, some of the arbitrary elements have been shifted from the definition of each program to the definition of the language. This is seen clearly if LML is considered to be an Algol procedure, rather than a separate language.

It will be seen later in this chapter that the choice of programming language can affect the results of model assessment. So, if the choice of language is considered to be the specification of the modeller's a priori knowledge, then model assessment on the basis of information gain is seen to depend on a priori knowledge. This is a feature of any method of model assessment, and is not specific to the method being advocated here.

The modeller will often be uncertain about the correctness of his a priori beliefs. Fortunately, he does not always have to choose between conflicting assumptions. He can choose between programming languages which imply a greater or lesser state of knowledge. For example, the choice of LML for model assessment implies much more specific knowledge about the system than does the choice of Algol. An intermediate state of knowledge may perhaps be represented by use of a simulation language.

Note that the choice of language is not just a choice between a special-purpose and a universal language. The modeller's a priori beliefs may coincide fairly well with the structure embodied in some subset of Algol, but may be quite different from that embodied in some language designed for manipulating strings. Nevertheless both languages may be universal, in the sense that each is capable of computing every partial recursive function.

An obvious restriction which must be placed on a language which is to be used for model assessment arises as follows. Suppose that the language being used includes a standard procedure which can be called by its single-terminal identifier, say A. Suppose further that this procedure computes the output observation set of the system S by means of a table look-up. Then a program consisting of little more than the single terminal A would be a model of S. This model would be a shorter, and hence a better, model of S than any other model written in that language, yet it could be constructed without any understanding of the system S. Clearly, this situation must be outlawed. The use of the procedure A would imply that the modeller is willing to accept the system behaviour as an "exogenous variable" - an aspect of the world which is not problematic and one which he does not wish to investigate. Such acceptance of the system behaviour he is examining would, of course, render the modelling exercise redundant. Insisting on the inadmissibility of such a procedure can be regarded as another aspect of specifying the modeller's a priori knowledge about the system.
4.2 Asymptotic Models

In this section it is shown that the choice of programming language does not affect model assessment if the observation sets are large enough. The characterisation of modelling developed in chapter 3 considered a system to be defined by a single pair of observation sets. We now wish to extend this, so as to describe the modelling of an increasing set of observations.

Definition (4.2.1)
(a) Let $U_1=(u_1,\ldots,u_i)$ and $U_2=(u_{i+1},\ldots,u_j)$ be observation sets. Then the observation set $U_1U_2=(u_1,\ldots,u_i,u_{i+1},\ldots,u_j)$ is an extension of $U_1$.
(b) Let $S_1=(U_1,Y_1)$ and $S_2=(U_2,Y_2)$ be systems. Then the system $S_1S_2=((U_1U_2),(Y_1Y_2))$ is an extension of $S_1$.

Definition (4.2.2)
An infinite sequence of systems $\mathcal{S}=(S^1,S^2,\ldots)$, where $S^j$ is an extension of $S^{j-1}$ for every $j>1$ (i.e. $S^j=S_1S_2\ldots S_j$), is an asymptotic system.
We wish to consider models of the $S^j$'s which differ only in their table look-ups. To capture the idea of a table look-up without restricting it unduly, we shall consider models to be pairs (m,T). Each element of the pair is a part of a program, and the pair (m,T) is regarded as the complete program. This can be formalised quite easily, if required: take a pairing function $\tau$ (cf. proof of theorem (3.2.1)), and change definition (3.3.6), so that a concrete (F,E)-model becomes an ordered pair of integers (m,T), such that $F(\tau(m,T),i,D_i)=C_i$. These integers can be associated with programs, as before. m will be considered to be the part which is common to models of all the $S^j$'s, while $T^j$ will be regarded as a table look-up, which may be different for each $S^j$. When a translation of the program (m,T) from one language to another is considered, it will be assumed that T (or at least its length) remains unchanged. On the other hand, the translation of m will be assumed to be different from m. In this way a distinction is drawn between T and m, which corresponds to some aspects of the distinction between table look-up and other types of program.

In the following definition a particular programming language is assumed; m and $T^j$ are fragments of programs in this language. The definition is based on definition (3.3.1), and the notations of definition (4.2.1) are generalised in an obvious manner.

Definition (4.2.3)
Let $A^j=\{A^j_k\}$ be a set of ordered subsets of $(U_1U_2\ldots U_j)$, let $B^j=\{B^j_l\}$ be a set of ordered subsets of $(Y_1Y_2\ldots Y_j)$, and let $C^j=\{C^j_i\}$ be a complete set of $m_j$ disjoint ordered subsets of $(Y_1Y_2\ldots Y_j)$. Let $D^j$ be a set of ordered pairs $D^j_i=(A^j_k,B^j_l)$ $(i=1,2,\ldots,m_j)$, and let $E^j$ be a set of ordered pairs $E^j_i=(D^j_i,C^j_i)$ $(i=1,2,\ldots,m_j)$. Finally, let $\mathcal{E}$ be the sequence $\mathcal{E}=(E^1,E^2,\ldots)$, and $\mathcal{T}$ be the sequence $\mathcal{T}=(T^1,T^2,\ldots)$. Then the pair $(m,\mathcal{T})$ is an asymptotic $\mathcal{E}$-model of the asymptotic system $\mathcal{S}=(S^1,S^2,\ldots)$ if and only if $(m,T^j)$ is an $E^j$-model of $S^j$, for every $j=1,2,\ldots$

The following definitions distinguish between two possible asymptotic behaviours of rival models. $I(m,T^j)$ denotes the information gain of the model $(m,T^j)$, and $E(m,T^j)$ denotes the information explained by model $(m,T^j)$, namely the ratio of $I(m,T^j)$ to the size of the trivial model of $S^j$. $(m_1,\mathcal{T}_1)$ and $(m_2,\mathcal{T}_2)$ denote asymptotic models of some asymptotic system $\mathcal{S}$, with $\mathcal{T}_1=(T_1^1,T_1^2,\ldots)$ and $\mathcal{T}_2=(T_2^1,T_2^2,\ldots)$. We use $\liminf_{j\to\infty} x_j$ to denote $\lim_{j\to\infty}\inf_{k>j} x_k$, and similarly for lim sup.

Definition (4.2.4)
$(m_1,\mathcal{T}_1)$ is asymptotically weakly better than $(m_2,\mathcal{T}_2)$ (denoted by $(m_1,\mathcal{T}_1)>_w(m_2,\mathcal{T}_2)$) if and only if

$$\liminf_{j\to\infty}\,\{I(m_1,T_1^j)-I(m_2,T_2^j)\}=+\infty \qquad (4.2)$$

Definition (4.2.5)
$(m_1,\mathcal{T}_1)$ is asymptotically strongly better than $(m_2,\mathcal{T}_2)$ (denoted by $(m_1,\mathcal{T}_1)>_s(m_2,\mathcal{T}_2)$) if and only if

$$\liminf_{j\to\infty}\,\{E(m_1,T_1^j)-E(m_2,T_2^j)\}>0 \qquad (4.3)$$

The ideas behind these definitions are the following. Let $t_j$ denote the trivial model of $S^j$, and $|t_j|$ denote its size. We henceforth make the natural assumption that

$$\lim_{j\to\infty}|t_j|=+\infty \qquad (4.4)$$

If $(m_1,\mathcal{T}_1)$ is asymptotically weakly better than $(m_2,\mathcal{T}_2)$ then the "amount of information" extracted from $S^j$ by $(m_1,T_1^j)$ is eventually greater than that extracted by $(m_2,T_2^j)$, and the difference between them eventually grows without bound. But their "rates of information extraction", as measured by the information explained, may be converging towards each other. For example, if $|t_j|=kj$, $I(m_1,T_1^j)=pj^{1/2}$, $I(m_2,T_2^j)=qj^{1/2}$, with $p>q$, then $I(m_1,T_1^j)-I(m_2,T_2^j)=(p-q)j^{1/2}\to\infty$, while $E(m_1,T_1^j)-E(m_2,T_2^j)=\frac{p-q}{k}j^{-1/2}\to 0$.

If $(m_1,\mathcal{T}_1)$ is asymptotically strongly better than $(m_2,\mathcal{T}_2)$ then the "rate of information extraction" by $(m_1,T_1^j)$ is eventually greater than that by $(m_2,T_2^j)$. The "weak/strong" terminology is justified by the following theorem.

Theorem (4.2.6)
$(m_1,\mathcal{T}_1)>_s(m_2,\mathcal{T}_2) \Rightarrow (m_1,\mathcal{T}_1)>_w(m_2,\mathcal{T}_2)$.

Proof
Suppose $\liminf_{j\to\infty}\{I(m_1,T_1^j)-I(m_2,T_2^j)\}<+\infty$. Then $\exists N$, such that for any integer k, $\exists i>k$, such that $I(m_1,T_1^i)-I(m_2,T_2^i)<N$. Since $E(m,T^i)=I(m,T^i)/|t_i|$ and $|t_i|\to\infty$, this implies that for any integer k, and for any $\varepsilon>0$, $\exists i>k$, such that $E(m_1,T_1^i)-E(m_2,T_2^i)<\varepsilon$. But this contradicts $\liminf_{j\to\infty}\{E(m_1,T_1^j)-E(m_2,T_2^j)\}>0$. Hence $\liminf_{j\to\infty}\{E(m_1,T_1^j)-E(m_2,T_2^j)\}>0 \Rightarrow \liminf_{j\to\infty}\{I(m_1,T_1^j)-I(m_2,T_2^j)\}=+\infty$.
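The distinction between the two orderings can be checked numerically for the example given before the theorem, in which the trivial model has size kj and the information gains are $pj^{1/2}$ and $qj^{1/2}$. The Python sketch below is only an illustration; the function names and the particular values of p, q and k used in the note that follows are ours.

```python
import math


def gain_difference(j: int, p: float, q: float) -> float:
    """I(m1,T1^j) - I(m2,T2^j) when the gains are p*sqrt(j) and q*sqrt(j)."""
    return (p - q) * math.sqrt(j)


def explained_difference(j: int, p: float, q: float, k: float) -> float:
    """E(m1,T1^j) - E(m2,T2^j) when the trivial model of S^j has size k*j."""
    return gain_difference(j, p, q) / (k * j)


def tabulate(js, p, q, k):
    """Rows (j, difference in gain, difference in information explained)."""
    return [(j, gain_difference(j, p, q), explained_difference(j, p, q, k))
            for j in js]
```

For p=3, q=2, k=5 and j = 100, 10 000, 1 000 000 the difference in information gain is 10, 100 and 1000, while the difference in information explained falls from 0.02 to 0.0002: the first model is asymptotically weakly better than the second, but not strongly better.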
We now consider the effect of writing models in different languages on their asymptotic performance. For a precise discussion of what it means for a program to be written in some particular language, see chapter 5.

Let $(m_1,\mathcal{T}_1)$ and $(m_2,\mathcal{T}_2)$ be asymptotic models of $\mathcal{S}$ written in a programming language $\lambda$. Let $\pi$ be a programming language, such that programs $(p_1,T_1^j)$, $(p_2,T_2^j)$, $j=1,2,\ldots$, can be written in $\pi$, and such that these programs compute the same partial recursive functions as the programs $(m_1,T_1^j)$, $(m_2,T_2^j)$, $j=1,2,\ldots$, respectively. Using the notation of definition (3.3.6) we can write

$$F_\pi(\tau(p_1,T_1^j),\cdot,\cdot) = F_\lambda(\tau(m_1,T_1^j),\cdot,\cdot) \qquad (4.5)$$

where $\tau$ is an appropriate pairing function, and similarly for $p_2,m_2$. Consequently $(p_1,\mathcal{T}_1)$ and $(p_2,\mathcal{T}_2)$ are asymptotic models of $\mathcal{S}$ written in $\pi$.

Let $|p|$ denote the size of a program p; let $t_j^\lambda$ be the trivial model of $S^j$ written in $\lambda$, and let $t_j^\pi$ be the trivial model of $S^j$ written in $\pi$. We assume that

$$|t_j^\pi| = |t_j^\lambda| + k_1 \qquad (4.6)$$

Theorem (4.2.7)
With the notations and assumptions as stated above,
(a) $(m_1,\mathcal{T}_1)>_w(m_2,\mathcal{T}_2) \Leftrightarrow (p_1,\mathcal{T}_1)>_w(p_2,\mathcal{T}_2)$
(b) $(m_1,\mathcal{T}_1)>_s(m_2,\mathcal{T}_2) \Leftrightarrow (p_1,\mathcal{T}_1)>_s(p_2,\mathcal{T}_2)$

Proof
(a) There exist integers $k_2,k_3$, such that $|p_1|=|m_1|+k_2$ and $|p_2|=|m_2|+k_3$. Hence

$$I(m_1,T_1^j)-I(m_2,T_2^j) = |m_2|+|T_2^j|-|m_1|-|T_1^j|$$
$$= |p_2|+|T_2^j|-|p_1|-|T_1^j|+k_2-k_3$$
$$= I(p_1,T_1^j)-I(p_2,T_2^j)+k_2-k_3 .$$

The result follows from definition (4.2.4).

(b)
$$\liminf_{j\to\infty}\{E(p_1,T_1^j)-E(p_2,T_2^j)\} = \liminf_{j\to\infty}\frac{1}{|t_j^\pi|}\{I(p_1,T_1^j)-I(p_2,T_2^j)\}$$
$$= \liminf_{j\to\infty}\frac{1}{|t_j^\lambda|+k_1}\{I(m_1,T_1^j)-I(m_2,T_2^j)-k_2+k_3\}$$
$$= \liminf_{j\to\infty}\{E(m_1,T_1^j)-E(m_2,T_2^j)\}, \text{ by eqn (4.4).}$$

The result follows from definition (4.2.5).

Theorem (4.2.7) shows that the asymptotic relative merits of two asymptotic models are not changed by a change of programming language. This result, coupled with the view that choice of programming language specifies a priori knowledge, has the following interpretation: According to our characterisation of modelling, the relative merits of two rival models are independent of the modeller's a priori beliefs, if the observation sets are large enough. This is a weak condition which should be satisfied by any reasonable procedure for model assessment.

4.3
Practical Effect of Change of Language.

The asymptotic results of theorem (4.2.7) say nothing about the situation for small observation sets. In this section it is demonstrated that a change of programming language can affect the results of model assessment in practice.

Models I, II, IV and V of the gas-furnace observations (cf. section 3.7) will be assessed using Extended Linear Model Language (ELML). This is a language similar to LML, which was described in section 4.1, but which performs computations of the form

$$\hat{y}_i = b_0u_i+\ldots+b_mu_{i-m}-a_1y_{i-1}-\ldots-a_ny_{i-n}$$
$$e_i = d_0w_i+\ldots+d_pw_{i-p}-c_1e_{i-1}-\ldots-c_qe_{i-q} \qquad (4.7)$$
$$y_i = \hat{y}_i+e_i+\bar{y}$$

Fig. 8 shows the structure of these computations. Each ELML program is a list of integers and rationals which is given the interpretation:

$\bar{y},m,n,p,q,a_1,\ldots,a_n,b_0,\ldots,b_m,c_1,\ldots,c_q,d_0,\ldots,d_p,w_1,\ldots,w_N.$

The data for such a program is another list, with the interpretation:

$i,u_{i-q-m},\ldots,u_i,y_{i-q-n},\ldots,y_{i-1}.$

Since the algorithm (4.7) requires the $(i-\max(m+q,n+q))$th value of either u or y (assuming that p ...), the program outputs $w_i$ if $i\le\max(m+q,n+q)$.

Models III and VI of the gas-furnace data do not have the structure (4.7), and so cannot be written in ELML. However, models I, II, IV and V can. The ELML programs which constitute these models are:

Model I - Trivial
0,0,0,0,0,0,1,+53.8,+53.6,...,+57.0.

Model II - Mean
53.5,0,0,0,0,0,1,+.3,+.1,...,+3.5.
Model IV - Deterministic
22.4,5,2,0,0,.57,.01,0,0,0,-.53,-.37,-.51,1,+53.8,+53.6,...,+1.6.

Model V - Stochastic
2.2,5,2,0,2,.57,.01,0,0,0,-.53,-.37,-.51,.53,.63,1,+53.8,+53.6,...,+.2.

The table look-ups of these models are the same as they were for the Algol W versions of section 3.7. Table II shows a comparison of the performance of the models.

MODEL   SIZE   INFORMATION GAIN   INFORMATION EXPLAINED
I       1494   0                  0
II      1119   375                25.1%
IV      879    615                41.2%
V       853    641                42.9%

TABLE II - Results of Model Assessment Using ELML.

It is seen from the table that model V is now assessed as being better than model IV. However, assessment using Algol W, in section 3.7, showed model IV to be better than model V (cf. table I).

It is easy to see intuitively why this change in the assessment comes about. The language ELML has so much information inherent in its definition, that it requires a comparatively small increase in the number of arbitrary elements of a program to change from the algorithm (of the form (4.7)) appropriate to model IV to that appropriate to model V. Consequently a comparatively small improvement in the ability of this algorithm to explain the observations is sufficient to justify the increase. When the models are written in Algol W a greater improvement is required.
4.4 Summary

The choice of the programming language which appears in the characterisation of modelling developed in chapter 3 can be regarded as a specification of the modeller's a priori knowledge about the system which he is investigating. The characterisation has been extended to deal with the common case, where increasing sets of observations are being examined. It has been shown that in this case the modeller's a priori beliefs become increasingly irrelevant to the assessment of any alternative models which he may be considering, as the observation sets grow.

However, it has been demonstrated that in practice, especially for small sets of observations, the results of such an assessment are conditional on the modeller's a priori beliefs.
5. FRAGMENTS OF PROGRAMMING LANGUAGES

5.1 Introduction

In chapter 4 the idea was developed that the definition of the programming language to be used for model assessment corresponds to a specification of the modeller's a priori knowledge of the system. But can specific meaning be attached to the phrase "definition of a programming language"? When two models are compared, the languages required to write each of them as programs are rarely exactly the same. Can their comparison on the basis of program lengths be meaningful then, since the a priori knowledge assumed for each is slightly different?

This chapter and the next are concerned with these questions. The expedient adopted earlier, of associating a programming language with a partial recursive function, is no longer satisfactory, since it gives no information about the algorithm used for computing the function.

Appendix A reviews one method of defining programming languages formally, namely the so-called "Vienna Method". We henceforth assume familiarity with the contents of this Appendix. In this chapter, the Vienna method is used as a basis for a precise definition of "programming language". The notions of a program being "written in a language" and of "the function computed by a language" are then introduced. The concept of "fragment of a language" is made precise in section 5.4, and is used to define "equivalence of languages". Languages are equivalent only if they are indistinguishable to the user. However, equivalent languages need not be equal, since their interpreting automata, for instance, may be different.

Most programs are written in infinitely many languages, many of which are fragments of others. Therefore the "family of languages in which a program is written" is introduced. Such a family corresponds, roughly, to the set of languages which is referred to by a name like "Algol" or "Fortran". If suitably restricted, a family of languages in which a program is written has a "smallest" element, which is called the "support" of the program. The main aim of this chapter is the formalisation of this concept of "the support of a program", since this is the concept which is required for model assessment.

The following example may help to clarify this chapter. Consider the program:

begin integer i,j;
  if i=j then i:=j+1 else i:=j;
  i:=1;
end.

This program is written in Algol, but it uses only a few of the features which Algol usually provides. It uses block structure, some integer arithmetic, assignment statements, and conditional expressions. It does not use procedures, for statements, or goto statements, for instance. Also it uses very few terminals. In our terminology this program is written in an Algol-family of languages (more precisely, it is written in an AlgolX-family of languages, where AlgolX is some completely specified Algol-like language). Languages in this family include all the versions of Algol found implemented in practice. The AlgolX-support of the program is, roughly speaking, the smallest Algol-like fragment which allows the above program to be written. It will include block structure, integer addition, assignment statements, conditional expressions, the identifiers i,j, and the integer 1. It will not include any further features. A constraint on the concrete syntax of the AlgolX-family ensures that the AlgolX-support of the program allows more than one program to be written in it. Thus the AlgolX-support may not have the trivial and useless concrete syntax:

<program>::=begin integer i,j; if i=j then i:=j+1 else i:=j; i:=1;end

(However, a $\lambda$-support of the program exists for some language $\lambda$, which does have this concrete syntax).

The notion of "equivalence" which will be formalised is intended to coincide with the concept which appears in Ollongren (49), although the notion of "fragment" is slightly different.

The application of these ideas to model assessment will be presented in chapter 6.

5.2 Preliminaries
We shall use p to denote a program in the usual sense, namely a finite string of letters from a finite alphabet of terminals. When discussing a particular language with a specified (concrete) grammar G, we shall assume the existence of a parsing algorithm $\rho_G$, and write $\rho_G(p)$ to denote the derivation tree for p, in G (see section A.3 and Ollongren (49) sections 2.3 and 3). We shall always assume that G is unambiguous, namely that there is at most one derivation of any p in G. L(G) will denote the language generated by G.

Definition (5.2.1)
Consider three context-free grammars (cf. sec. A.3) $G_i=(N_i,\Sigma_i,P_i,S_i)$, $i=1,2,3$, where $N_i$ is a set of nonterminals, $\Sigma_i$ is a set of terminals, $P_i$ is a set of productions, and $S_i$ is a start symbol.
(i) $G_1\subseteq G_2 \Leftrightarrow$ (a) $N_1\subseteq N_2$, (b) $\Sigma_1\subseteq\Sigma_2$, (c) $P_1\subseteq P_2$, (d) $S_1=S_2$.
(ii) $G_1=G_2\cap G_3 \Leftrightarrow$ (a) $N_1=N_2\cap N_3$, (b) $\Sigma_1=\Sigma_2\cap\Sigma_3$, (c) $P_1=P_2\cap P_3$, (d) $S_1=S_2=S_3$.
(Note: we use $\subseteq$ to denote improper inclusion. Thus $G\subseteq G$, for all G).
(5.2.2)
G~G i
Proof:
2
Immediate
&G~G~GcG 2
from
2
3
]
3
transitivity
of set
inclusion.
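As an illustration only, the component-wise inclusion and intersection of definition (5.2.1) can be sketched with Python sets. The class `Grammar`, the helper `make_grammar` and the toy grammars below are hypothetical names invented for this sketch; they do not appear in the text, and productions are represented simply as (left-hand side, right-hand side) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar G = (N, Sigma, P, S), as in Definition (5.2.1)."""
    nonterminals: frozenset
    terminals: frozenset
    productions: frozenset  # pairs (left-hand side, tuple of right-hand symbols)
    start: str


def make_grammar(nonterminals, terminals, productions, start):
    """Hypothetical convenience constructor: freezes plain Python collections."""
    return Grammar(frozenset(nonterminals), frozenset(terminals),
                   frozenset(productions), start)


def is_subgrammar(g1, g2):
    """Improper inclusion of g1 in g2: conditions (i)(a)-(d)."""
    return (g1.nonterminals <= g2.nonterminals
            and g1.terminals <= g2.terminals
            and g1.productions <= g2.productions
            and g1.start == g2.start)


def intersect(g2, g3):
    """Intersection of g2 and g3: conditions (ii)(a)-(d)."""
    if g2.start != g3.start:
        raise ValueError("grammars must share their start symbol")
    return Grammar(g2.nonterminals & g3.nonterminals,
                   g2.terminals & g3.terminals,
                   g2.productions & g3.productions,
                   g2.start)


# Toy grammars, invented for illustration.
g_a = make_grammar({"S"}, {"a"}, {("S", ("a",))}, "S")
g_ab = make_grammar({"S"}, {"a", "b"}, {("S", ("a",)), ("S", ("b",))}, "S")
g_ac = make_grammar({"S"}, {"a", "c"}, {("S", ("a",)), ("S", ("c",))}, "S")
g_abc = make_grammar({"S"}, {"a", "b", "c"},
                     {("S", ("a",)), ("S", ("b",)), ("S", ("c",))}, "S")
```

With these toy grammars, `is_subgrammar(g_a, g_ab)` and `is_subgrammar(g_ab, g_abc)` both hold, and so does `is_subgrammar(g_a, g_abc)`, which is the instance of lemma (5.2.2) for this example; `intersect(g_ab, g_ac)` equals `g_a`.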
Lemma (5.2.3)
(i) $G_1 \subseteq G_2 \Rightarrow L(G_1) \subseteq L(G_2)$
(ii) $G_1 = G_2 \cap G_3 \Rightarrow L(G_1) \subseteq L(G_2) \cap L(G_3)$.
Proof:
(i) $G_1 \subseteq G_2 \;\&\; p \in L(G_1) \Rightarrow G_1 \subseteq G_2 \;\&\; S_1 \Rightarrow_{G_1} p$
    $\Rightarrow S_2 \Rightarrow_{G_2} p$ (by def. (5.2.1))
    $\Rightarrow p \in L(G_2)$
(ii) From (i), $G_1 = G_2 \cap G_3 \Rightarrow L(G_1) \subseteq L(G_2) \;\&\; L(G_1) \subseteq L(G_3)$
    $\Rightarrow L(G_1) \subseteq L(G_2) \cap L(G_3)$.
(Note that the converse inclusion does not hold, because $S \Rightarrow_{G_2} p$ and $S \Rightarrow_{G_3} p$ may be different derivations, neither of which is possible in $G_1$).

Definition (5.2.4)
$\Pi(G) = \{\pi_G(p) : p \in L(G)\}$ is the set of derivation trees generated by G.

Lemma (5.2.5)
(i) $G_1 \subseteq G_2 \Rightarrow \Pi(G_1) \subseteq \Pi(G_2)$
(ii) $G_1 = G_2 \cap G_3 \Rightarrow \Pi(G_1) \subseteq \Pi(G_2) \cap \Pi(G_3)$.
Proof:
(i) From definition (5.2.1) it follows that every production in $G_1$ is also in $G_2$. Hence every derivation in $G_1$ is also in $G_2$. But to every derivation there corresponds a unique derivation tree (Ollongren (49) sec. 2.3). Hence every derivation tree in $\Pi(G_1)$ is also in $\Pi(G_2)$.
(ii) From (i), $G_1 = G_2 \cap G_3 \Rightarrow \Pi(G_1) \subseteq \Pi(G_2) \;\&\; \Pi(G_1) \subseteq \Pi(G_3)$
    $\Rightarrow \Pi(G_1) \subseteq \Pi(G_2) \cap \Pi(G_3)$.

The availability of a metalanguage for the definition of programming languages is assumed, as is the existence of the set of tree-structured objects which is introduced in Appendix A. In the context of model assessment, the metalanguage is the same as the one that would (informally) be used by the modeller for specifying his a priori
knowledge.

The following definition of an interpreting automaton differs slightly from that given by Ollongren (49). Explicit reference to the set of tree-structured objects has been dropped, since the same set is assumed throughout. Also, a set of error states E is introduced. Its purpose will be seen later.

Definition (5.2.6)
An interpreting automaton is a 5-tuple $M = (\text{is-state}, \xi_0, \Lambda, F, E)$ where is-state is a predicate which defines a set of states, $\xi_0$ is the initial state, $\Lambda$ is the state-transition function, $F \subseteq$ is-state is a set of final states, and E is a set of error states, which has the property: $\xi \in E \Rightarrow \Lambda(\xi) \subseteq E$. Where it is defined, $\Lambda(\xi)$ is a (possibly empty) set of states. If $\xi \in F$, then $\Lambda(\xi)$ is not defined. (Thus $\Lambda$ is not defined over the whole set of states, in general). The predicate is-state is assumed to be more general than that described in section A.6, but it is further assumed that every state has a control part, and that this determines the state-transition function in the manner described for LML in section A.6.2.
Definition (5.2.7)
A computation of an interpreting automaton $M = (\text{is-state}, \xi_0, \Lambda, F, E)$ is a sequence $(\xi_0, \xi_1, \ldots)$, such that is-state$(\xi_i)$ and $\xi_i \notin F \Rightarrow \xi_{i+1} \in \Lambda(\xi_i)$, for $i = 0, 1, \ldots$. The computation terminates if $\xi_i \in F$ for some i.

5.3 Programming Languages

Definition (5.3.1)
A programming language is a 5-tuple $\lambda = (G, \text{is-program}, T, M, op)$ where G is an unambiguous, context-free grammar which defines the concrete syntax; is-program is a predicate which defines the abstract syntax; $T: \Pi(G) \to$ is-program is a translator; M is an interpreting automaton whose state-transition function is effectively computable, and whose initial state $\xi_0(p_A)$ depends on an object $p_A = T(p_G) \in$ is-program; and op is an output function whose domain is in the set F of final states of M. Furthermore, for every $p_A \in$ is-program, if $(\xi_0(p_A), \xi_1(p_A), \ldots, \xi_F(p_A))$ is a terminating computation, then every other computation $(\xi_0(p_A), \xi'_1(p_A), \ldots)$ terminates (with $\xi'_F \in F$), and $op(\xi_F(p_A)) = op(\xi'_F(p_A))$.

In the above definition it is assumed that the predicate is-program specifies the abstract syntax not only of the "program", but also of the "data". This constitutes a departure from the practice followed in the Vienna method, and in the definition of LML in Appendix A. To see that this
is no restriction, consider the following. If a predicate is-program has been defined which does not make provision for data, and the data is assumed to be located in some directory of the state (as is done in the case of LML), then a new predicate can be defined by:

    is-program$_1$ = (<s-program:is-program>, <s-data:is-data>)

where is-data is a predicate satisfying the abstract syntax of data sets. All that remains to be done is to modify the instruction associated with the initial state, so that the first action of the interpreting automaton is to read s-data(p) into the appropriate directory (or directories), and to make s-program(p) the next argument. $\xi_1$ of the new machine is then identical with $\xi_0$ of the old one.

The output function op is introduced in order to have available the notion of "result of a computation" without restricting the predicate is-state. Note that although the states appearing in alternative computations need not be the same, we impose the usual requirement that if the result of a computation is defined, then it is unique.

Definition (5.3.2)
Let $\lambda = (G, \text{is-program}, T, M, op)$ be a programming language. p is written in $\lambda$ ($p \in_w \lambda$) if and only if $p \in L(G)$, $p_A = T(p_G)$ (where $p_G = \pi_G(p)$), and for every $\xi_i$ appearing in the computation $(\xi_0(p_A), \xi_1(p_A), \ldots)$, $\Lambda(\xi_i)$ is defined unless $\xi_i \in F$, and $\xi_i \notin E$.
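The notions of computation, final state and error state can be illustrated by a small simulation. This is only a sketch under simplifying assumptions: states are plain integers, the transition function returns a finite set of successors, and the function `run`, the toy transition `step` and the step bound `max_steps` are hypothetical names introduced here rather than anything defined in the text. Because the automaton may offer several successors, the sketch follows just one computation by always taking the smallest successor.

```python
def run(initial, transition, final, error, max_steps=1000):
    """Follow one computation (xi_0, xi_1, ...) of a toy interpreting automaton.

    Returns the list of states visited and a status string:
    "terminated" (a final state was reached), "error" (an error state was
    entered), "stuck" (no successor exists) or "no verdict" (bound exhausted).
    """
    trace = [initial]
    state = initial
    for _ in range(max_steps):
        if state in final:
            return trace, "terminated"
        if state in error:
            return trace, "error"
        successors = transition(state)
        if not successors:
            return trace, "stuck"
        state = min(successors)  # pick one successor deterministically
        trace.append(state)
    return trace, "no verdict"


def step(n):
    """Toy transition: count down, but fall into the error state -1
    whenever the current state leaves remainder 4 on division by 5."""
    if n == -1:
        return {-1}  # an error state only leads to error states
    if n % 5 == 4:
        return {-1}
    return {n - 1}


FINAL = {0}
ERROR = {-1}
```

Here `run(3, step, FINAL, ERROR)` visits 3, 2, 1, 0 and terminates, whereas `run(6, step, FINAL, ERROR)` visits 6, 5, 4, -1 and stops in the error state. In the terminology of definition (5.3.2), only the first behaviour is compatible with a program being written in the language. A bounded simulation of this kind cannot decide the matter for computations that never terminate.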
The role of the set of error states E can now be seen. At the end of section A.6 it is remarked that specifying the concrete and abstract syntax of a language is not sufficient to define the valid programs in a language. Part of the definition must be accomplished by specification of the interpreting functions: whenever an invalid program is encountered, an error state is entered. This aspect of language definition is formalised in definition (5.3.2). By definition (5.2.6), once a computation enters an error state it remains in an error state.

Definition (5.3.3)
$P_\lambda = \{p : p \in_w \lambda\}$ is the set of all programs written in $\lambda$.

Definition (5.3.4)
The function computed by the language $\lambda$ is the function $\varphi_\lambda : P_\lambda \to$ range(op), such that $\varphi_\lambda(p) = op(\xi_F(p_A))$, where $p_A = T(\pi_G(p))$, $(\xi_0(p_A), \ldots, \xi_F(p_A))$ is a computation and $\xi_F(p_A) \in F$, and $\varphi_\lambda(p)$ is undefined if no such $\xi_F$ exists. (Definition (5.3.1) ensures that $\varphi_\lambda$ is a function).

The function $\varphi_\lambda$ is a partial, effectively computable function. If the sets $P_\lambda$ and range(op) are suitably arithmetised (i.e. put into one-to-one correspondence with the nonnegative integers), then, by Church's thesis, $\varphi_\lambda$ can be regarded as a partial recursive function. Furthermore, if the predicate is-program allows a clear distinction between "program" and "data", and if each of these can be arithmetised separately, then $\varphi_\lambda$ can be regarded as a partial recursive function of two arguments. If $\varphi_\lambda$, when so regarded, is universal (cf. Rogers (9)), then $\lambda$ is universal. All those entities conventionally considered to be programming languages are universal. However LML, which is included in definition (5.3.1), is not universal.

5.4 Fragments of Languages

In this section a subscript usually denotes the language to which reference is being made. For example, $G_\lambda$ denotes the grammar of $\lambda$.

Definition (5.4.1)
Let $\lambda_1$ and $\lambda_2$ be programming languages. $\lambda_1$ is a fragment of $\lambda_2$ ($\lambda_1 <_F \lambda_2$) if and only if
(i) $G_{\lambda_1} \subseteq G_{\lambda_2}$
(ii) $p \in_w \lambda_1 \Rightarrow p \in_w \lambda_2$
(iii) $\forall p \in_w \lambda_1$, $\varphi_{\lambda_1}(p) = \varphi_{\lambda_2}(p)$, where it is understood that if one of $\varphi_{\lambda_1}(p)$, $\varphi_{\lambda_2}(p)$ is undefined, then so is the other,
(iv) $\forall p \in L(G_{\lambda_1})$, $p \in_w \lambda_2 \Rightarrow p \in_w \lambda_1$.

Roughly speaking, if $\lambda_1$ is a fragment of $\lambda_2$, then everything that can be done using $\lambda_1$ can also be done using
$\lambda_2$. However, to make this definition useful, the languages concerned must be defined in a rather idiosyncratic manner. Part (iv) of the definition implies that if $\lambda_2$ contains standard procedures, for example, which are not available in $\lambda_1$, then they must be called by new terminal characters which do not appear in the grammar of $\lambda_1$. This is not the usual practice, but there is no reason why it should not be done. A given programming language (understood informally) does not have a unique formal definition. Definition (5.4.1) assumes that much more of the burden of the language definition has been transferred from the translator to the concrete syntax, than is convenient in practice. This point will be discussed further in section 6.2.

Theorem (5.4.2)
The relation $<_F$ is reflexive and transitive.
Proof: Reflexivity is obvious. Suppose $\lambda_1 <_F \lambda_2$ and $\lambda_2 <_F \lambda_3$. Then
(i) $G_{\lambda_1} \subseteq G_{\lambda_2}$ and $G_{\lambda_2} \subseteq G_{\lambda_3}$. Hence $G_{\lambda_1} \subseteq G_{\lambda_3}$ by lemma (5.2.2).
(ii) $p \in_w \lambda_1 \Rightarrow p \in_w \lambda_2$ and $p \in_w \lambda_2 \Rightarrow p \in_w \lambda_3$. Hence $p \in_w \lambda_1 \Rightarrow p \in_w \lambda_3$.
(iii) $\forall p \in_w \lambda_1$, $\varphi_{\lambda_1}(p) = \varphi_{\lambda_2}(p)$ and $\forall p \in_w \lambda_2$, $\varphi_{\lambda_2}(p) = \varphi_{\lambda_3}(p)$. So $\forall p \in_w \lambda_1$, $\varphi_{\lambda_1}(p) = \varphi_{\lambda_3}(p)$, by def. (5.4.1) (ii).
(iv) $\forall p \in L(G_{\lambda_1})$, $p \in_w \lambda_2 \Rightarrow p \in_w \lambda_1$   (A)
    and $\forall p \in L(G_{\lambda_2})$, $p \in_w \lambda_3 \Rightarrow p \in_w \lambda_2$.   (B)
    Suppose $p \in L(G_{\lambda_1})$ and $p \in_w \lambda_3$. Then $p \in L(G_{\lambda_2})$ by lemma (5.2.3) (i). So $p \in_w \lambda_2$, by (B). Hence $p \in_w \lambda_1$, by (A).
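The four conditions of definition (5.4.1) can be checked mechanically on a finite toy model. The sketch below makes strong simplifying assumptions that are not part of the text: a language is reduced to a set of production names (standing in for its grammar), a finite set of sentences (standing in for $L(G_\lambda)$) and a dictionary mapping each written program to its result (standing in for $\varphi_\lambda$). The names `ToyLanguage` and `is_fragment` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLanguage:
    productions: frozenset          # stands in for the grammar
    sentences: frozenset            # finite stand-in for L(G)
    results: dict = field(default_factory=dict)  # written program -> result


def is_fragment(l1, l2):
    """Check conditions (i)-(iv) of Definition (5.4.1) on toy languages."""
    written1, written2 = set(l1.results), set(l2.results)
    cond_i = l1.productions <= l2.productions
    cond_ii = written1 <= written2
    cond_iii = all(l1.results[p] == l2.results.get(p) for p in written1)
    cond_iv = all(p in written1 for p in l1.sentences if p in written2)
    return cond_i and cond_ii and cond_iii and cond_iv


small = ToyLanguage(frozenset({"S->a"}), frozenset({"a"}), {"a": 1})
medium = ToyLanguage(frozenset({"S->a", "S->b"}), frozenset({"a", "b"}),
                     {"a": 1, "b": 2})
large = ToyLanguage(frozenset({"S->a", "S->b", "S->c"}),
                    frozenset({"a", "b", "c"}), {"a": 1, "b": 2, "c": 3})
# Same syntax as `small`, but a different result for "a": violates (iii).
clash = ToyLanguage(frozenset({"S->a"}), frozenset({"a"}), {"a": 99})
# Generates "b" but does not accept it as written, although `medium` does:
# violates (iv).
restricted = ToyLanguage(frozenset({"S->a", "S->b"}), frozenset({"a", "b"}),
                         {"a": 1})
```

In this toy setting `small` is a fragment of `medium`, `medium` is a fragment of `large`, and `small` is a fragment of `large`, as theorem (5.4.2) requires. Neither `clash` nor `restricted` is a fragment of `medium`; the second case shows why condition (iv) forces extra facilities of the larger language to be visible in its syntax.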
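Definition (5.4.3)
$\lambda_1$ is equivalent to $\lambda_2$ ($\lambda_1 \sim \lambda_2$) if and only if $\lambda_1 <_F \lambda_2$ and $\lambda_2 <_F \lambda_1$.

Theorem (5.4.4)
$\sim$ is an equivalence relation.
Proof: Reflexivity and symmetry are obvious. Transitivity follows from theorem (5.4.2).

Theorem (5.4.5)
$\lambda_1 \sim \lambda_2 \Rightarrow$
(i) $P_{\lambda_1} = P_{\lambda_2}$
(ii) $\varphi_{\lambda_1} = \varphi_{\lambda_2}$
Proof: Immediate from definitions (5.4.1) and (5.4.3).

Lemma (5.4.6)
$\mu <_F \lambda \;\&\; \nu <_F \lambda \;\&\; G_\mu = G_\nu \Rightarrow \mu \sim \nu$
Proof: $\mu <_F \lambda \Rightarrow$
(i) $p \in_w \mu \Rightarrow p \in_w \lambda$
(ii) $\forall p \in_w \mu$, $\varphi_\mu(p) = \varphi_\lambda(p)$
(iii) $\forall p \in L(G_\mu)$, $p \in_w \lambda \Rightarrow p \in_w \mu$,
and similarly for $\nu$. Hence $G_\mu = G_\nu \;\&\; \mu <_F \lambda \;\&\; \nu <_F \lambda \Rightarrow$
(i) $G_\mu = G_\nu$
(ii) $p \in_w \mu \Leftrightarrow p \in_w \nu$
(iii) $\forall p \in_w \mu$, $\varphi_\mu(p) = \varphi_\nu(p)$
(iv) $\forall p \in L(G_\mu)$, $p \in_w \nu \Leftrightarrow p \in_w \mu$.
Hence $\mu \sim \nu$ by defs (5.4.1) and (5.4.3).

Definition (5.4.7)
$[\lambda]_\sim = \{\mu : \mu \sim \lambda\}$ is the equivalence class of $\lambda$ modulo $\sim$.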
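Lemma (5.4.8)
For any programming language $\lambda$, there exists at most a finite number of equivalence classes $[\mu]_\sim$, such that $\mu <_F \lambda$.
Proof: $G_\lambda = (N_\lambda, \Sigma_\lambda, P_\lambda, S_\lambda)$ (cf. def. (5.2.1)). But each of the sets $N_\lambda, \Sigma_\lambda, P_\lambda, S_\lambda$ is finite. Hence, by definition (5.2.1), there exist only finitely many distinct $G_\mu$, such that $G_\mu \subseteq G_\lambda$. The result follows from lemma (5.4.6).

Definition (5.4.9)
Let S be a set of programming languages. Then the set of common fragments of S is the set
    $\cap_F S = \{\lambda : \mu \in S \Rightarrow \lambda <_F \mu\}$

Theorem (5.4.10)
$\lambda \in \cap_F S \;\&\; \mu <_F \lambda \Rightarrow \mu \in \cap_F S$.
Proof: $\lambda \in \cap_F S \;\&\; \mu <_F \lambda \Rightarrow (\nu \in S \Rightarrow \lambda <_F \nu) \;\&\; \mu <_F \lambda$
    $\Rightarrow (\nu \in S \Rightarrow \mu <_F \nu)$ by theorem (5.4.2)
    $\Rightarrow \mu \in \cap_F S$ by definition (5.4.9).

Definition (5.4.11)
The greatest common fragment of a set of programming languages S is the set
    $\Gamma(S) = \{\lambda : \lambda \in \cap_F S \;\&\; (\mu \in \cap_F S \Rightarrow \mu <_F \lambda)\}$.

Theorem (5.4.12)
$\lambda \in \Gamma(S) \;\&\; \mu \in S \Rightarrow \lambda <_F \mu$

Theorem (5.4.13)
(i) $\lambda \in \Gamma(S) \;\&\; \mu \in \Gamma(S) \Rightarrow \lambda \sim \mu$
(ii) $\lambda \in \Gamma(S) \;\&\; \lambda \sim \mu \Rightarrow \mu \in \Gamma(S)$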
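Proof:
(i) $\lambda \in \Gamma(S) \;\&\; \mu \in \Gamma(S) \Rightarrow \lambda \in \Gamma(S) \;\&\; \mu \in \cap_F S \Rightarrow \mu <_F \lambda$, by definition (5.4.11). Similarly, $\lambda <_F \mu$. Hence $\lambda \sim \mu$.
(ii) $\lambda \in \Gamma(S) \;\&\; \lambda \sim \mu \Rightarrow \mu \in \cap_F S \;\&\; (\nu \in \cap_F S \Rightarrow \nu <_F \lambda)$, by theorem (5.4.10) and definition (5.4.3),
    $\Rightarrow \mu \in \cap_F S \;\&\; (\nu \in \cap_F S \Rightarrow \nu <_F \mu)$, by theorem (5.4.2),
    $\Rightarrow \mu \in \Gamma(S)$, by definition (5.4.11).

Lemma (5.4.14)
$\lambda \in S \;\&\; \lambda \in \cap_F S \Rightarrow \lambda \in \Gamma(S)$
Proof: $\lambda \in S \;\&\; \lambda \in \cap_F S \Rightarrow (\mu \in \cap_F S \Rightarrow \mu <_F \lambda) \;\&\; \lambda \in \cap_F S \Rightarrow \lambda \in \Gamma(S)$, by definition (5.4.11).

Definition (5.4.15)
If $p \in_w \lambda$, then the $\lambda$-family of languages in which p is written is the set
    $\Lambda_\lambda(p) = \{\mu : p \in_w \mu \;\&\; (\lambda <_F \mu \text{ or } \mu <_F \lambda)\}$.
If $p \notin_w \lambda$, then $\Lambda_\lambda(p) = \emptyset$.

Lemma (5.4.16)
$p \in_w \lambda \Leftrightarrow \lambda \in \Lambda_\lambda(p)$
Proof: Obvious.

Theorem (5.4.17)
$\mu \in \Lambda_\lambda(p) \;\&\; \nu \in \Lambda_\lambda(p) \Rightarrow \varphi_\mu(p) = \varphi_\nu(p)$.
Proof: $\mu \in \Lambda_\lambda(p) \Rightarrow p \in_w \mu \;\&\; (\lambda <_F \mu \text{ or } \mu <_F \lambda)$. If $\mu <_F \lambda$ then $\varphi_\mu(p) = \varphi_\lambda(p)$ by definition (5.4.1), and if $\lambda <_F \mu$ similarly.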
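Hence $\mu \in \Lambda_\lambda(p) \Rightarrow \varphi_\mu(p) = \varphi_\lambda(p)$.
Similarly $\nu \in \Lambda_\lambda(p) \Rightarrow \varphi_\nu(p) = \varphi_\lambda(p)$.
Therefore $\mu \in \Lambda_\lambda(p) \;\&\; \nu \in \Lambda_\lambda(p) \Rightarrow \varphi_\mu(p) = \varphi_\nu(p)$.

Note that if the convention $p \notin_w \lambda \Rightarrow \Lambda_\lambda(p) = \emptyset$ were dropped, then theorem (5.4.17) would not hold. Also, note that if $p \in_w \lambda$ then it is always possible to construct a $\mu$, such that $p \in_w \mu$, yet $\Lambda_\lambda(p) \cap \Lambda_\mu(p) = \emptyset$.

Definition (5.4.18)
The $\lambda$-support of a program p is the set of languages $\Sigma_\lambda(p) = \Gamma(\Lambda_\lambda(p))$.

By theorem (5.4.13) (i), all languages in the $\lambda$-support of p are equivalent. If $p \notin_w \lambda$, then $\Sigma_\lambda(p) = \emptyset$.

We now come to the central result of this chapter.

Theorem (5.4.19)
(i) $\forall \lambda', \lambda'' \in \Lambda_\lambda(p)$, $\exists \mu \in \Lambda_\lambda(p)$ such that $\mu <_F \lambda' \;\&\; \mu <_F \lambda''$.
(ii) $\Lambda_\lambda(p) \neq \emptyset \Rightarrow \exists \nu \in \Lambda_\lambda(p)$, such that $\forall \lambda' \in \Lambda_\lambda(p)$, $\nu <_F \lambda'$ (i.e. there is a smallest language in $\Lambda_\lambda(p)$).
(iii) Furthermore, $\nu \in \Sigma_\lambda(p)$.
Proof: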
(i) Suppose t"EA X(p)
Then pewl'&(l
or l'
and l " e A t (p). & p£w l'' &(l
then k~
(5.4.2).
Put ~=l'.
If I"
Put p=l".
If I
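    $G_\mu = G_{\lambda'} \cap G_{\lambda''}$
    is-program$_\mu$ $= \{T_{\lambda'}(p_{G_\mu}) : p_{G_\mu} \in \Pi(G_\mu)\}$
    $T_\mu(p_{G_\mu}) = T_{\lambda'}(p_{G_\mu})$, $\forall p_{G_\mu} \in \Pi(G_\mu)$
    $M_\mu = M_{\lambda'}$
    $op_\mu = op_{\lambda'}$

Now $p \in L(G_{\lambda'}) \;\&\; p \in L(G_{\lambda''}) \;\&\; G_{\lambda'} \subseteq G_\lambda \;\&\; G_{\lambda''} \subseteq G_\lambda$. If the derivations $S \Rightarrow_{G_{\lambda'}} p$ and $S \Rightarrow_{G_{\lambda''}} p$ were distinct then there would be two distinct derivations $S \Rightarrow_{G_\lambda} p$, by definition (5.2.1). But this would contradict the standing assumption that $G_\lambda$ is unambiguous. Consequently the derivations of p in $G_{\lambda'}$ and $G_{\lambda''}$ are the same. Consequently the productions which appear in this derivation are all in $G_\mu$. Hence $p \in L(G_\mu)$, and $p_{G_\mu} = \pi_{G_\mu}(p) \in \Pi(G_\mu)$. Therefore $T_\mu(p_{G_\mu}) = T_{\lambda'}(p_{G_\mu}) \in$ is-program$_\mu$. Also, $p \in_w \lambda' \Rightarrow \Lambda_\mu(\xi_i)$ is defined for every $\xi_i$ in the computation $(\xi_0(p), \xi_1(p), \ldots)$ ($\xi_i \notin F$), and $\Lambda_\mu(\xi_i) \cap E = \emptyset$. These last two properties follow because the interpreting automata of $\lambda'$ and $\mu$ are the same. Hence, by definition (5.3.2), $p \in_w \mu$ .................... (1)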
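From the construction, it is clear that:
(a) $G_\mu \subseteq G_{\lambda'}$.
(b) $p' \in_w \mu \Rightarrow p' \in_w \lambda'$
(c) $\forall p' \in_w \mu$ : $\varphi_{\lambda'}(p') = \varphi_\mu(p')$
(d) $\forall p' \in L(G_\mu)$ : $p' \in_w \lambda' \Rightarrow p' \in_w \mu$.
Hence, by definition (5.4.1), $\mu <_F \lambda'$ ............ (2)
But $\lambda' <_F \lambda$, hence $\mu <_F \lambda$, by theorem (5.4.2) ............ (3)
(1) and (3) together show that $\mu \in \Lambda_\lambda(p)$ ............ (4)

It remains to show that $\mu <_F \lambda''$.
(e) From the construction, it is clear that $G_\mu \subseteq G_{\lambda''}$.
(f) Let $p' \in_w \mu$. Then $p' \in_w \lambda$, since $\mu <_F \lambda$. But $p' \in L(G_{\lambda''})$ by (e) and lemma (5.2.3) (i), and $\lambda'' <_F \lambda$. Hence $p' \in_w \lambda''$, by definition (5.4.1) (iv).
(g) $p' \in_w \lambda'' \;\&\; \lambda'' <_F \lambda \Rightarrow \varphi_{\lambda''}(p') = \varphi_\lambda(p')$, by definition (5.4.1) (iii). $p' \in_w \mu \;\&\; \mu <_F \lambda \Rightarrow \varphi_\mu(p') = \varphi_\lambda(p')$ similarly. So, using (f), $p' \in_w \mu \Rightarrow \varphi_\mu(p') = \varphi_{\lambda''}(p')$.
(h) Let $p' \in L(G_\mu)$ and $p' \in_w \lambda''$. Then $p' \in_w \lambda$, since $\lambda'' <_F \lambda$. But $\mu <_F \lambda$. Hence $p' \in_w \mu$, by definition (5.4.1) (iv).
(e), (f), (g), (h) together show that $\mu <_F \lambda''$ ............ (5)
(2), (4) and (5) together prove (i).

(ii) By lemma (5.4.8), there are at most finitely many $[\lambda']_\sim \subseteq \Lambda_\lambda(p)$, such that $\lambda' <_F \lambda$. This, together with (i), shows that $\exists \nu \in \Lambda_\lambda(p)$ such that $\nu <_F \lambda'$ $\forall \lambda' \in \Lambda_\lambda(p)$.

(iii) By (ii), $\nu \in \cap_F \Lambda_\lambda(p)$ and $\nu \in \Lambda_\lambda(p)$. Hence, by lemma (5.4.14), $\nu \in \Gamma(\Lambda_\lambda(p)) = \Sigma_\lambda(p)$.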
Corollary (5.4.20)
$p \in_w \lambda \Rightarrow \Sigma_\lambda(p) \subseteq \Lambda_\lambda(p)$

Definition (5.4.21)
Let S and T be sets of programming languages. Then $S \sim T \Leftrightarrow (\forall s \in S \;\&\; \forall t \in T) : s \sim t$.

Theorem (5.4.22)
$\mu \in \Sigma_\lambda(p_1) \;\&\; \nu \in \Sigma_\lambda(p_2) \;\&\; \mu \sim \nu \Rightarrow \Sigma_\lambda(p_1) \sim \Sigma_\lambda(p_2)$
Proof: Immediate from theorems (5.4.13) (i) and (5.4.4).

Definition (5.4.23)
Let T(p) denote the set of terminals which occur in p.

Theorem (5.4.24)
$\Sigma_\lambda(p_1) \sim \Sigma_\lambda(p_2) \Rightarrow T(p_1) = T(p_2)$.
Proof: $\Sigma_\lambda(p_1) \sim \Sigma_\lambda(p_2) \Rightarrow G_{\Sigma_\lambda(p_1)} = G_{\Sigma_\lambda(p_2)}$, by definition (5.4.3). Let $G_{\Sigma_\lambda(p_1)} = (N, A, P, S)$, where A is the alphabet of terminals. Suppose $T(p_1) \neq T(p_2)$. Then $T(p_1) \subseteq A$ and $T(p_2) \subseteq A$ and either $T(p_1) \neq A$ or $T(p_2) \neq A$ (or both). Suppose $T(p_1) \neq A$. Consider the grammar $G = (N, T(p_1), P', S)$, where P' is obtained from P by deleting those production rules $\alpha \to \beta$ for which $\beta \in A - T(p_1)$. Clearly, $G \subseteq G_{\Sigma_\lambda(p_1)}$. If $\Sigma_\lambda(p_1)$ is now replaced by the language obtained from $\Sigma_\lambda(p_1)$ by replacing $G_{\Sigma_\lambda(p_1)}$ by G, then the resulting language will be a fragment of $\Sigma_\lambda(p_1)$ and will be in $\Lambda_\lambda(p_1)$. But this is a contradiction, since $\Sigma_\lambda(p_1)$ is a fragment of every language in $\Lambda_\lambda(p_1)$, and is not equivalent to the new language (because $T(p_1) \neq A$). Hence $T(p_1) = A$. Similarly $T(p_2) = A$. Hence $T(p_1) = T(p_2)$.
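Theorem (5.4.24) gives a condition that is easy to test mechanically: two programs whose sets of terminals differ cannot have equivalent supports. The following sketch assumes, purely for illustration, that programs are already split into tokens and that the terminal alphabet is known; the token lists, the alphabet `TERMINALS` and both function names are hypothetical.

```python
def terminals_used(tokens, terminals):
    """T(p): the set of terminals which occur in the tokenised program p."""
    return {tok for tok in tokens if tok in terminals}


def supports_may_be_equivalent(tokens1, tokens2, terminals):
    """Necessary (not sufficient) condition taken from Theorem (5.4.24)."""
    return terminals_used(tokens1, terminals) == terminals_used(tokens2, terminals)


# Hypothetical alphabet in which standard procedure names are terminals.
TERMINALS = {"begin", "end", ":=", ";", "+", "(", ")", "x", "y",
             "sqrt", "sin", "entier"}

model_a = ["begin", "y", ":=", "sqrt", "(", "entier", "(", "x", ")", ")",
           ";", "end"]
model_b = ["begin", "y", ":=", "sin", "(", "sqrt", "(", "entier", "(", "x",
           ")", ")", ")", ";", "end"]
model_c = ["begin", "x", ":=", "entier", "(", "sqrt", "(", "y", ")", ")",
           ";", "y", ":=", "x", ";", "end"]
```

`model_a` and `model_b` fail the test because only the second uses `sin`, so their supports cannot be equivalent. `model_a` and `model_c` pass it, which does not prove equivalence; it only means the theorem does not rule it out.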
5.5 Conclusion

Conditions (i) and (iv) of definition (5.4.1) ensure that it is possible for two different programs to have the same $\lambda$-support. Recall the short program given in section 5.1. Consider its AlgolW-support, where AlgolW is taken to be defined (informally) as in (50). Condition (i) of definition (5.4.1) implies that its concrete syntax specification
must include the production rules: <program>::=.
::=<statement>end
::=l<statement>; ::= beginl::=
head><declaration>
statement>I<simple
::=
clause><simple
statement> statemen%)
else <statement> etc.

The language generated by these production rules includes the program

    begin integer i,j; if i=j then i:=j else i:=j+1; j:=i; end.

Now, condition (iv) of definition (5.4.1) ensures that this program is written in the AlgolW-support of the first program, since it is obviously written in AlgolW. It is easy to see that the AlgolW-supports of the two programs are equivalent. However, a proof of this would require a full formal definition of AlgolW. In the above example, we have relied on an informal understanding of the semantics of AlgolW, and did not formally prove the equivalence of the AlgolW-supports of the two programs. It is envisaged that this will be the typical method of using the concept of $\lambda$-support in connection with model assessment. The content of the language $\lambda$ will not need to be specified formally, and theorem (5.4.24) provides an easily-checked necessary condition for the equivalence of $\lambda$-supports.

This chapter has shown that if programming languages are considered to be defined in a certain way (namely, using the Vienna method, and in accordance with the comments on definition (5.4.1)), then it is possible to speak precisely about the "smallest" language required to run a particular program. The reason for showing this is the following. Suppose that the program is a model, and that the definition of the smallest language required to run it is taken as a specification of the modeller's a priori knowledge. Then it has been shown that this concept of "a priori knowledge" can be precisely defined, if required. However, we have succeeded in defining this concept only relative to a particular family of languages.
6  $\lambda$-COMPARABILITY

6.1 Introduction

Two rival models of a system, written in a language $\lambda$, will very rarely have the same $\lambda$-support. It may be argued, then, that the use of each of them implies a slightly different set of a priori assumptions. In this case, a comparison of the two models may not seem meaningful. On the other hand, one may consider that the choice of a particular language specifies the a priori assumptions, and that these are not changed if it later appears that certain features of the language are not needed. Rather than argue the merits of either viewpoint, we shall attempt to show that it does not matter much for model assessment, whichever position is adopted.

For the purposes of model assessment, the term "programming language" must be understood rather differently than is usual in computer science. Since the definition of Algol 60 it has become common to define "reference" languages, which are independent of particular hardware facilities. Consequently, certain aspects of a language (such as input/output in Algol) are considered to lie outside the province of its definition. For us, however, "programming language" means a computing facility, exactly as it appears to the programmer.

The most important difference between this view of a language and a "reference" language (for us) is that standard
procedures, such as input/output procedures and mathematical procedures (e.g. sin, sqrt), must be considered to be included in the language definition.

6.2 $\lambda$-Comparable Models

Definition (6.2.1)
Let $m_1$ and $m_2$ be two models of a system S. Then $m_1$ and $m_2$ are $\lambda$-comparable if and only if

    $\Sigma_\lambda(m_1) \sim \Sigma_\lambda(m_2) \neq \emptyset$

In this definition it is of course assumed that $\lambda$ is a language which interprets $m_1$ and $m_2$ as models of S; in other words, in the terminology of definition (3.3.6), $m_1$ and $m_2$ are assumed to be $(\lambda, E_1)$- and $(\lambda, E_2)$-models of S respectively, for some $E_1$ and $E_2$.

$\lambda$-comparability ensures that all the "facilities" of a language which are used by one model are also used by the other. Such "facilities" are obviously such features as the availability of goto statements, or of arrays. More apparently trivial details, such as whether procedures are allowed to take one or several arguments, or how many terminal characters are available, can be regarded as "facilities". $\lambda$-comparability ensures the identity of all such features.

Perhaps the most significant feature of a language which can be expected to affect model assessment is its complement of standard procedures - namely, those procedures which can be called in a program without being defined (declared). Consequently,
if $\lambda$-comparability is to be a useful concept, it must be ensured that $\lambda$-comparable models call the same standard procedures. This is so if standard procedure identifiers are considered to be terminals of the language. As was mentioned in chapter 5, this is not the usual practice, but we adopt it for the following reasons.

Firstly, there is no essential distinction, for our purposes, between a standard procedure call and an operation such as addition, which is usually called by its own terminal. Such distinction as there is arises for reasons of convenience in language implementation. Secondly, in section 6.3 it is demonstrated how standard procedures can be included in the definition of a practical language, with their identifiers as terminals. Thirdly, if the syntax of standard procedure calls were treated as the same as the syntax of programmer-supplied procedure calls (that is, if standard procedure names were not terminals), then our concept of "fragment of a language", as defined in section 5.4, would not correspond to the intended intuitive notion, and $\lambda$-comparability would not ensure the use of the same standard procedures.

To see this, suppose that two models of some system are written in Algol. Suppose that one of them uses the standard procedures entier and sqrt, while the other uses sin as well as entier and sqrt. If entier, sqrt and sin are terminals of the language, then (by theorem (5.4.24)) the models are certainly not Algol-comparable.
But if entier, sqrt and sin are simply procedure identifiers whose syntax is given by, for example:

    <procedure identifier>::=<identifier>
    <identifier>::=<letter>|<identifier><letter>

then the Algol-supports of the two models may be equivalent. This syntax, which is part of the syntax of the Algol-support of the first model, allows the identifier sin to be used (since the productions <letter>::=i|n|s must also be part of the syntax), and furthermore it allows it to be used as a procedure identifier. Now, suppose that the Algol-support of the first model is a fragment of the Algol-support of the second model (a possibility which we wish to allow, intuitively). Then, according to definition (5.4.1) (iv), the sin call must have the same effect in both languages. So, the Algol-support of the first model must contain sin as a standard procedure. Clearly this contradicts the intended meaning of "Algol-support".

Consequently, we insist that standard procedure identifiers be regarded as terminals. If it is now stipulated that only $\lambda$-comparable models should be compared for model assessment, then we have the formal equivalent of the intuitive idea, that models should be compared only if they use the same facilities of a language. One reason for making this stipulation has already been referred to in section 6.1. It may be felt to be an "unfair" comparison if the models are not $\lambda$-comparable.

An obvious example of this would be a comparison of a
difference-equation model with a differential-equation model. If the differential-equation model were allowed to call a standard procedure for integration, would it be reasonable to compare the "number of arbitrary elements" embodied in it with the number embodied in the difference-equation model? The difference-equation model requires fewer a priori assumptions (if its $\lambda$-support is a fragment of the $\lambda$-support of the differential-equation model, where $\lambda$ is the language in which both models are written).

There are, however, two ways of making models $\lambda$-comparable. Rather than adding an explicitly declared integration procedure to the differential-equation model, it is possible to add a "dummy" call of the standard procedure for integration to the difference-equation model. It is the flexibility offered by this possibility, of "padding" models with redundant statements, that reduces the significance of any insistence on $\lambda$-comparability. This will be demonstrated in section 6.3.

If $\lambda$-comparability is required, there still remains the choice of a suitable $\lambda$-support for the models which are to be compared. Returning to the above example, there is still a decision to be made - should both models call the standard procedure for integration, or neither? This decision, of course, is very significant for model assessment. But this is the decision discussed in chapter 4. In other words, it will be governed by the a priori assumptions that the modeller wishes to make.
6.3 Example: Algol W-Comparable Gas Furnace Models

This section investigates how the assessment of the six models of the gas-furnace data (cf. chapter 3) is altered, if they are required to be AlgolW-comparable.

6.3.1 Standard Procedures

The definition of Algol W is assumed to be a formalised version of the specification given in (50). The six models use three standard procedures, namely the input/output procedures READ, READON and WRITE. In accordance with the discussion of section 6.3, we consider the syntax specification of (50) to be augmented by the productions:

    <simple statement>::=<standard procedure statement>
    <standard procedure statement>::=<standard procedure identifier>(<actual parameter list>)
    <standard procedure identifier>::=READ|READON|WRITE

The abstract syntax, translator and interpreting automaton are considered to be modified accordingly.

6.3.2 AlgolW-Comparable Models

In this example the models are modified so as to be AlgolW-comparable by inserting redundant statements and expressions, rather than by avoiding certain constructions.

Referring to section 3.7, and comparing the models in order, we notice first that the syntax of the AlgolW-support of model II contains the productions

    <simple t expression>::=<simple t expression>+<t term>|<t term>

whereas the syntax of the AlgolW-support of model I contains only the production

    <simple t expression>::=<t term>

(For the significance of "t" see (50) or the introduction to Appendix B). This discrepancy can be removed by changing the WRITE statement of model I to:

    WRITE(Y(I)+0);.

Model III requires several productions which are not needed for models I or II. These are:
< l e t t e r > : : = NIU ::=. <simple
t expression>::=<simple
: : = < t t e r m > * < t
t expression>-
term>
factor>
::= ::=<simple
t expression>
operator>
<simple t e x p r e s s i o n > ::=< <statement>::=
statement>
<simple s t a t e m e n t > : : = < b l o c k > l < : : = < t
t a s s i g n m e n t statement> left part><
t expression>
: : = : = : : = < i f
clause><simple
statement>ELSE
<statement> : : = IF < l o g i c a l e x p r e s s i o n >
THEN
Most but not all of these are needed for model IV, but model IV itself needs two productions which are not needed by models I, II, or III:

    <letter>::=E|V|W|Z
    <actual parameter list>::=<actual parameter>|<actual parameter list>,<actual parameter>

The only new productions required by models V and VI are <letter>::=A, and <letter>::=W, respectively, but these can easily be removed by using different identifiers.

We give below the six models, modified so as to be AlgolW-comparable. The concrete syntax of their common AlgolW-support is given in Appendix B.

I   The Trivial Model
    BEGIN INTEGER I,J,N,V,W,Z; REAL ARRAY E,U,Y(1::296);
      BEGIN FOR J:=1 UNTIL 296 DO READON(Y(J-0));
        READ(I);
        V:=0; IF I<1 THEN WRITE(0,V) ELSE WRITE(Y(I)*1+0);
      END
    END.
    53.8  53.6  53.5  ...  57.0

II  The Mean

    BEGIN INTEGER I,J,N,V,W,Z; REAL ARRAY E,U,Y(1::296);
      BEGIN FOR J:=1 UNTIL 296 DO READON(Y(J-0));
        READ(I);
        V:=0;
        IF I<1 THEN WRITE(0,V) ELSE WRITE(53.5+Y(I)*1);
      END
    END.
    .1  .3  0  0  -.1  ...  3.5

III Deterministic Transfer Function (Using Input Observations Only).

    BEGIN INTEGER I,J,E,V,W,Z; REAL ARRAY N,U,Y(1::296);
      FOR J:=1 UNTIL 296 DO READON(N(J));
      READ(I);
      IF I<6 THEN WRITE(N(I),0) ELSE
      BEGIN FOR J:=1 UNTIL 5 DO
        BEGIN Y(J):=N(J)-53.5; READON(U(J));
        END;
        FOR J:=6 UNTIL I DO
        BEGIN READON(U(J));
          Y(J):=.57*Y(J-1)+.01*Y(J-2)-.53*U(J-3)
                -.37*U(J-4)-.51*U(J-5);
        END;
        WRITE(Y(I)+53.4+N(I));
      END;
    END.
    53.8  53.6  ...  4.1
IV
Deterministic Transfer Function
(Using Input and O u t p u t Observations)
BEGIN I N T E G E R I,J;
REAL A R R A Y E(I::296);
REAL N , U , V , W , Y , Z ; FOR J:=l U N T I L 296 DO R E A D O N READ
(E(J));
(I) ;
N:=O; IF I<6 THEN W R I T E
(E(I)) ELSE
BEGIN READON WRITE
(U,V,W,Y,Z); (22.4 -.53"U - . 3 7 " V - . 5 1 * W + .57"Y + . O I * Z + E ( I ) ) ;
END; END. 53.8
V
53.6
...
1.6
Stochastic Process Model
BEGIN INTEGER I,J;
  REAL ARRAY E,U,Y(1::296);
  REAL N,V,W,Z;
  FOR J:=1 UNTIL 296 DO READON (E(J));
  READ (I);
  IF I<8 THEN WRITE (E(I)) ELSE
  BEGIN FOR J:=1 UNTIL 7 DO READON (U(I-J),Y(I-J));
    WRITE (2.1*Y(I-1)-1.5*Y(I-2)+.34*Y(I-3)+.01*Y(I-4)-.53*U(I-3)+.44*U(I-4)-.28*U(I-5)+.55*U(I-6)-.32*U(I-7)+E(I)+2.2);
  END;
END.
53.6  53.8  ....  2

VI  Box & Jenkins Model
BEGIN INTEGER I,J;
  REAL ARRAY E,U,Y(1::296);
  REAL N,V,W,Z;
  FOR J:=1 UNTIL 296 DO READON (E(J));
  READ (I);
  IF I<8 THEN WRITE (E(I)) ELSE
  BEGIN FOR J:=1 UNTIL 7 DO READON (U(I-J),Y(I-J));
    WRITE (2.1*Y(I-1)-1.5*Y(I-2)+.34*Y(I-3)+.01*Y(I-4)-.53*U(I-3)+.44*U(I-4)-.28*U(I-5)+.55*U(I-6)-.32*U(I-7)+E(I)-.57*E(I-1)-.01*E(I-2)+2.2);
  END;
END.
53.8  53.6  ....  4

It will be seen that the models have been "padded" so that they are not changed in essence, although the details of their compilation and execution may have been changed (for example, by the introduction of the extra block in models I and II). It is straightforward but tedious to check that each production listed in Appendix B is used in each of these models, and that every production required for the parsing of these models is listed in Appendix B.

6.3.3  Comparison of Assessments
Table III shows a comparison of the sizes of the models before and after the above modifications. Table IV compares their performances. The table look-ups remain unchanged (from those shown in Appendix C), since the syntax required for each of them is already common.

         Size, excluding
         table look-up           Size of          Total size
Model    Unmodified  Modified    table look-up    Unmodified  Modified
I             52         93          1480            1532       1573
II            57         96          1102            1159       1198
III          199        209           877            1076       1086
IV           129        131           835             964        966
V            203        212           802            1005       1014
VI           225        234           788            1013       1022

Table III - Sizes of Models Before and After Modification
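The figures in Tables III and IV are consistent with reading the information gain of a model as the total size of the trivial model (model I) minus the total size of that model, and the information explained as that gain expressed as a fraction of the trivial model's size. The following sketch recomputes some of the "Modified" figures under that assumption; the names `SIZES_MODIFIED`, `information_gain` and `information_explained` are illustrative and do not come from the text.

```python
# Total sizes of the modified models, copied from Table III.
SIZES_MODIFIED = {"I": 1573, "II": 1198, "III": 1086, "IV": 966, "V": 1014, "VI": 1022}


def information_gain(total_size: int, trivial_size: int) -> int:
    """Size saved relative to the trivial model (assumed definition)."""
    return trivial_size - total_size


def information_explained(total_size: int, trivial_size: int) -> float:
    """Information gain as a fraction of the trivial model's size (assumed definition)."""
    return information_gain(total_size, trivial_size) / trivial_size


trivial = SIZES_MODIFIED["I"]
gains = {name: information_gain(size, trivial) for name, size in SIZES_MODIFIED.items()}
print(gains)
print(f"model IV explains {100 * information_explained(SIZES_MODIFIED['IV'], trivial):.1f}%")
```

Under this reading the gains are simply differences of total sizes, so any "padding" that adds roughly the same amount to every model leaves them almost unchanged.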
Tables III and IV show that although the sizes of models I and II, excluding the table look-ups, nearly doubled as a result of the modifications, the information gains and the information explained by each model were very little affected. This can be expected to be true in many cases, for the following reason. The size of the trivial model will usually be dominated by the size of its table look-up, even for small observation sets. Any necessary "padding" of the model can usually be introduced very economically, so the overall size of the trivial model does not change much (from 1532 to 1573 in the present example). On the other hand, as the models become more sophisticated and their table look-ups smaller, they will need less "padding" since many more features of the language will already be in use (cf. table III). Thus, once again, the overall size will not change much.

         Information Gain         Information Explained
Model    Unmodified  Modified     Unmodified  Modified
I             0          0             0          0
II          373        375          24.4%      23.8%
III         456        487          29.8%      31.0%
IV          568        607          37.1%      38.6%
V           527        559          34.4%      35.4%
VI          519        551          33.8%      35.4%

Table IV - Performances of Models Before and After Modification

6.4  Conclusion
Although chapter 5 provides some precise concepts which can be associated with "a priori knowledge", it is not clear which concept is most appropriate. Should "a priori knowledge" be considered to be specified by the language λ in which one intends to write a model, or by the λ-support in which it turns out to be written? The first view corresponds to simply associating "a priori knowledge" with a certain partial recursive function without reference to the model, while the second view corresponds to associating it with a partial recursive function after examining the assumptions needed by the model. For the example investigated, it has been shown that it does not matter, in practice, which view is adopted. There is reason to suppose that this is true in most cases. This is most useful, not only because it is not clear whether models should be λ-comparable before being compared, but also because checking λ-comparability of complicated models would be quite difficult (although much of it could be automated).
7. TABLE LOOK-UP CODINGS

7.1  Introduction

Model assessment will obviously be affected by the manner in which table look-ups in models are coded. Since a table look-up can be viewed as a generalised error (cf. chapter 3), the details of the coding will determine the implicit trade-off between complexity and approximation. In the examples of previous chapters, a particular coding has been assumed without comment. We shall show that this coding is a natural one to use.
7.2  Size-Capturing Codings

We assume a finite alphabet $P$. Let $P^*$ be the set of all finite sequences of elements of $P$, including the empty sequence $\Lambda$. Suppose that model assessment is to be performed using a language λ, and that the set of programs written in λ is a subset of $P^*$. Suppose further that all the rival models being assessed are such that the elements of their table look-ups are rationals. By using a suitable scaling, all these elements can be regarded as integers (cf. sec. 3.2). In order to obtain a concrete λ-model, these integers must be coded into elements of $P^*$ which appear as fragments of programs written in λ.

Definition (7.2.1)
A coding is a total, effective, injective function $c : N \to P^*$, where $N$ denotes the set of all integers (previously $N$ has denoted the set of nonnegative integers).

Let $N^+$ denote the set of nonnegative integers.

Definition (7.2.2)
With each coding $c$ there can be associated a code-length function $L = c \circ \ell : N \to N^+$, where $\ell$ is the function which gives the length $\ell(p)$ of $p \in P^*$. Here $\circ$ denotes composition, so that $L(n) = \ell(c(n))$.

The terms which appear in a table look-up of a model represent the errors which the model makes in trying to compute the observed system behaviour. Our method of assessing models is based on the assumption that a short table look-up corresponds to small errors. This will only be so if the coding used in λ is suitable.

Definition (7.2.3)
A size-capturing coding is one for which the associated code-length $L$ satisfies:
(i)  $L(-n) = L(n)$, for all $n \in N$,
(ii) $|n_1| > |n_2| \Rightarrow L(n_1) \ge L(n_2)$.

Clearly, the use of a size-capturing coding leads to table look-ups having the desired property. The requirement that $L$ should be an even function reflects the usual indifference to the sense of the error. The positive entries appearing in Appendix C were considered to be preceded by "+", in order to obtain a size-capturing coding.

Size-capturing codings can be compared with the class L of loss-functions discussed by Deutsch (51). The use of a coding which was not size-capturing would be tantamount to declaring that certain large errors are preferable to certain small errors, or, perhaps, that in some cases positive errors are preferable to negative ones, and so on. This is perfectly permissible, of course, but we wish to restrict ourselves to the more usual situation. Notice also that the same effect could be achieved by using a size-capturing coding, but changing the measurement scales (i.e. by "transforming the variables").

Let $Q$ be a subset of $P$, containing $r$ elements ($r \ge 2$), and $C_Q$ be the set of size-capturing codings whose ranges are subsets of $Q^*$. For example, the ordinary decimal coding of the integers has as range a subset of $\{0,1,\ldots,9,+,-\}^*$.

Definition (7.2.4)
A smallest r-coding is a size-capturing coding $c_0 \in C_Q$, such that $L_0(n) = \lfloor \log_r |n| \rfloor + 1$, where $L_0 = c_0 \circ \ell$, $\lfloor x \rfloor$ denotes the greatest integer not exceeding $x$ ($x \in R$), and $L_0(0) = 0$.

For example, if $r=2$ then the coding $c_0$ defined as follows is a smallest 2-coding: $c_0(0)=\Lambda$, $c_0(1)=0$, $c_0(-1)=1$, $c_0(2)=00$, $c_0(-2)=01$, $c_0(3)=10$, etc. A corresponding coding can clearly be constructed for any $r>2$.

Theorem (7.2.5)
If $c_0$ is a smallest r-coding, and $L_0 = c_0 \circ \ell$, then for any $c \in C_Q$, $L = c \circ \ell$, $L(n) \ge L_0(n)$.

Proof:
Since $c$ is injective and total (definition (7.2.1)) all the elements of $S(n) = \{c(0), c(1), c(-1), c(2), \ldots, c(n)\}$ exist and are distinct. Suppose $n<0$. Then $|S(n)| = 2|n|+1$. Since $c$ is assumed to be size-capturing, we have $L(i) \le L(n)$ whenever $|i| \le |n|$. Now the number of possible code words (i.e. images of $c$) of code-length $j$ is $r^j$, since $c \in C_Q$. Hence

\[ |S(n)| \le \sum_{j=0}^{L(n)} r^j = \frac{r^{L(n)+1}-1}{r-1}. \]

Assume $L(n) < \lfloor \log_r |n| \rfloor + 1$, i.e. $L(n) \le \lfloor \log_r |n| \rfloor$. Then

\[ |S(n)| \le \frac{r^{\lfloor \log_r |n| \rfloor + 1}-1}{r-1} \le \frac{r|n|-1}{r-1} = \frac{r}{r-1}\,|n| - \frac{1}{r-1} < 2|n|+1 \quad (\text{since } r \ge 2) \quad = |S(n)|. \]

Hence the assumption leads to a contradiction. If $n>0$ then the assumption $L(n) < L_0(n)$ leads to the same contradiction since $L(n)=L(-n)$ and $L_0(n)=L_0(-n)$. Hence $L(n) \ge L_0(n)$.

The ordinary decimal coding of the integers reserves the symbols +,- for denoting whether the integer is positive or negative. Consequently its code-length is given by $L(n) = \lfloor \log_{10} |n| \rfloor + 2$, if it is assumed that positive integers are always preceded by "+". (The usual convention of leaving positive integers unsigned leads to this coding not being size-capturing, since $L(n) \ne L(-n)$; furthermore the usual identification of 0 with 00, 000, etc, means that the usual notation is not a coding according to our definition (7.2.1)). Note that since we are interested only in comparing models, it does not much matter whether the code we use has length $\lfloor \log_{10} |n| \rfloor + 1$ or $\lfloor \log_{10} |n| \rfloor + 2$.

7.3  Coding Determines Feature Selection

Suppose that the models being considered have the structure shown in fig. 9. Each model has one table look-up, whose contents are denoted by $E = \{e_i : i=1,\ldots,N\}$. The "fixed" part of the model, namely that part which is not the table look-up, is denoted by $m$. $m$ operates on the model input $D_i$ to give $m(D_i)$, which is summed with $e_i$ to give the model output $C_i$ (cf. sec. 3.3). Suppose further that $C_i = y_i$, the ith system output observation, and that $e_i$, $y_i$ are single integers (as opposed to vectors of integers). Let $\ell(m)$ be the length of the λ program implementing $m$, and let $L(E)$ denote $\sum_{i=1}^{N} L(e_i)$, namely the length of the table look-up when coded into λ (assuming that λ uses a coding $c$ for which $L = c \circ \ell$). Note that we have assumed that any separators used in the table look-up are included in $m$.

The suggested assessment procedure for rival models, namely choosing the shorter model, leads to the selection of the model with the shorter table look-up, if the number $N$ of observations is large enough. Thus a change of the coding used for model assessment tends to change the "weighting" which is given to various features of the model behaviour. For example, if the coding used is such that $L(n)=n^2$ then the model which gives a better least-squares fit will tend to be selected. If $L(n)=|n|^x$ and $x$ is large then the model which gives a better "minimax" fit will tend to be selected.

Suppose that the table look-up of one of the models has $|e_i|=a$, $i=1,\ldots,N$, whereas another one has $e_i=0$ for 99% of the entries, and $|e_i|=3a$ for 1% of the entries. Then the use of a coding with code-length $L(n)=|n|$, for example, would tend to favour the second model, while a coding with $L(n)=|n|^5$ would tend to favour the first. On the other hand, the use of a coding with $L(n)$ increasing sufficiently slowly may be indifferent to either table look-up. Ideally, the coding which should be used is one which best emphasises the features in which the modeller is interested. However, it is not clear whether a coding exists which is capable of reflecting the features which are important to the modeller, or even whether he can say precisely what they are.
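The effect of the code-length function on the comparison of two table look-ups, one with uniformly small errors and one with mostly zero errors and a few large ones, can be checked numerically. The sketch below treats $L$ purely as an abstract penalty on each entry; the values a = 2 and N = 1000, and all function names, are hypothetical choices made only for illustration.

```python
def lookup_length(entries, code_length):
    """L(E): the sum of the code-lengths of the table look-up entries."""
    return sum(code_length(e) for e in entries)


def linear_length(n):
    """Code-length growing like |n|."""
    return abs(n)


def fifth_power_length(n):
    """Code-length growing like |n|**5, which penalises large errors heavily."""
    return abs(n) ** 5


a, n_obs = 2, 1000                      # hypothetical error size and number of observations
uniform = [a] * n_obs                   # |e_i| = a for every entry
sparse = [0] * 990 + [3 * a] * 10       # e_i = 0 for 99%, |e_i| = 3a for 1%

for name, length in (("|n|", linear_length), ("|n|^5", fifth_power_length)):
    print(name, lookup_length(uniform, length), lookup_length(sparse, length))
```

With the slowly growing penalty the sparse look-up is shorter, whereas with the fifth power the uniform look-up is shorter, since ten entries of size 3a then cost more than a thousand entries of size a.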
The existence of a smallest r-coding strongly suggests that this coding should be used for model assessment. After all, in searching for a good model, the modeller is trying to find the most concise way of expressing, or computing, the observations. It therefore seems natural that he should use the most concise coding.
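The smallest 2-coding given as an example after definition (7.2.4) can be generated mechanically. The sketch below assumes one particular construction: the integers are enumerated as 0, 1, -1, 2, -2, ... and matched with the binary strings taken in order of increasing length, then lexicographically. The function names are illustrative, and the empty Python string stands for the empty sequence.

```python
def smallest_2_coding(n: int) -> str:
    """Code word c0(n) over the alphabet {0, 1}."""
    # Position of n in the enumeration 0, 1, -1, 2, -2, 3, ...
    index = 0 if n == 0 else (2 * n - 1 if n > 0 else -2 * n)
    # Strings of length j occupy positions 2**j - 1 .. 2**(j + 1) - 2.
    length = (index + 1).bit_length() - 1
    offset = index - (2 ** length - 1)
    return format(offset, "b").zfill(length) if length else ""


def smallest_code_length(n: int) -> int:
    """L0(n) = floor(log2 |n|) + 1 for n != 0, and L0(0) = 0."""
    return abs(n).bit_length()


for k in (0, 1, -1, 2, -2, 3):
    print(k, repr(smallest_2_coding(k)))
```

Every binary string is used exactly once in this construction, which is why no size-capturing coding over a two-symbol alphabet can have shorter code words.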
The use of the smallest r-coding for model assessment leads to the most conservative complexity/approximation trade-off, in the sense that any other size-capturing coding will more quickly favour a complex model with small fitting errors. This results in a "safe" procedure, in the sense that in certain circumstances the results agree with the results which would be obtained if any other size-capturing coding were being used. This will be shown in the next section.

7.4  Effect of Change of Coding.

Let λ' be a programming language which is identical to λ, except that integers are coded via the coding $c'$ rather than $c$. If the entries in a table look-up of a λ-model are transformed by applying the function $c^{-1} \circ c'$ to them, then the resulting program will be a λ'-model of the system. ($c^{-1}$ is a function, since $c$ is injective, cf. definition (7.2.1)). Furthermore, if the size of a λ-model with "fixed" part $m$ and table look-up $E$ is $\ell(m)+L(E)$ (cf. fig. 9), then the size of the λ'-model is $\ell(m)+L'(E)$, where $L = c \circ \ell$ and $L' = c' \circ \ell$.

Suppose that we have two rival models, with "fixed" parts $m_1$, $m_2$ and table look-ups $E_1$, $E_2$, respectively. Let $e_i^1$, $e_i^2$ denote the entries of $E_1$, $E_2$. Suppose that the models are assessed using the languages λ and λ'.

Theorem (7.4.1)
Let $c$, $c'$ be size-capturing codings and $L = c \circ \ell$, $L' = c' \circ \ell$ be such that, for any $i$, $j$,

\[ |L'(i)-L'(j)| \ge |L(i)-L(j)|. \]

If $\ell(m_1)+L(E_1) > \ell(m_2)+L(E_2)$, and a bijection $h : \{1,2,\ldots,N\} \to \{1,2,\ldots,N\}$ exists, such that $|e_i^2| \le |e_{h(i)}^1|$ $(i=1,\ldots,N)$, then $\ell(m_1)+L'(E_1) > \ell(m_2)+L'(E_2)$.

(The bijection $h$ is used to rearrange $E_1$, so that to every entry in $h(E_1)$ there corresponds a smaller entry in $E_2$).
Proof:

\[ |L'(e_{h(i)}^1)-L'(e_i^2)| \ge |L(e_{h(i)}^1)-L(e_i^2)|. \]

But $c$, $c'$ are size-capturing and $|e_{h(i)}^1| \ge |e_i^2|$. Therefore

\[ L'(e_{h(i)}^1)-L'(e_i^2) \ge L(e_{h(i)}^1)-L(e_i^2). \]

So, summing over $i$,

\[ L'(E_1)-L'(E_2) \ge L(E_1)-L(E_2). \]

But $\ell(m_1)+L(E_1) > \ell(m_2)+L(E_2) \Rightarrow L(E_1)-L(E_2) > \ell(m_2)-\ell(m_1)$. Therefore $L'(E_1)-L'(E_2) > \ell(m_2)-\ell(m_1)$, hence $\ell(m_1)+L'(E_1) > \ell(m_2)+L'(E_2)$.

This theorem, together with the previous discussion, shows that under the stated condition, if model 2 is better than model 1 according to a size-capturing coding $c$, then it remains better according to any other size-capturing coding $c'$, whose code-length everywhere increases at least as quickly as that of $c$. The condition stipulating the existence of $h$ can be expected to be satisfied in many cases of practical interest.

Example (7.4.2)  (N=2)
$E_1=(8,15)$, $E_2=(9,1)$. $h(1)=2$, $h(2)=1$.
$e_1^2 = 9 < 15 = e_{h(1)}^1$, $e_2^2 = 1 < 8 = e_{h(2)}^1$.
So the theorem can be applied, e.g.: $L(n) = \lfloor \log_{10}|n| \rfloor + 1$, $L'(n)=|n|$ gives
$L'(E_1)-L'(E_2) = 13 > 1 = L(E_1)-L(E_2)$.

Example (7.4.3)  (N=2)
$E_1=(1,15)$, $E_2=(8,9)$. In this case a suitable $h$ does not exist, and in fact we have
$L'(E_1)-L'(E_2) = -1 < 1 = L(E_1)-L(E_2)$, so the proof of the theorem would break down.

Corollary (7.4.4)
If $c_0$ is a smallest r-coding with code-length $L_0$, and a bijection $h : \{1,\ldots,N\} \to \{1,\ldots,N\}$ exists such that $|e_i^2| \le |e_{h(i)}^1|$ $(i=1,\ldots,N)$, and $\ell(m_1)+L_0(E_1) > \ell(m_2)+L_0(E_2)$, then for any $c \in C_Q$, $L = c \circ \ell$, such that

\[ L_0(|i|+1)-L_0(i) = 1 \Rightarrow L(|i|+1)-L(i) \ge 1, \]

we have $\ell(m_1)+L(E_1) > \ell(m_2)+L(E_2)$.

Proof:
From definitions (7.2.3) and (7.2.4), $0 \le L_0(|i|+1)-L_0(i) \le 1$, $i \ne 0$. From definition (7.2.3), $L(|i|+1)-L(i) \ge 0$. These two facts, together with the condition on $L$ imposed in the statement of the corollary, imply that $L(|i|+1)-L(i) \ge L_0(|i|+1)-L_0(i)$. Consequently, for any $i$, $j$, $|L(i)-L(j)| \ge |L_0(i)-L_0(j)|$. The corollary follows from theorem (7.4.1).

Corollary (7.4.4) states that, if model 2 is better than model 1 when a smallest r-coding is used, then it remains better when any other $c \in C_Q$ is used, providing only that $L$ increases at those points at which $L_0$ increases, and that a suitable $h$ can be found. The condition on $L$ does exclude some codings which may conceivably be of interest, such as those for which $L(n)=\max(k, L_0(n))$.

7.5  Summary

A class of codings of the integers has been introduced which endows table look-ups of models with the desirable characteristic that their size increases as the magnitudes of their entries increase. A smallest coding exists in this class, for a given size of alphabet. This smallest coding appears to be a natural one to use for model assessment. An advantage of using it is that in many cases the result of model assessment will agree with the result that would have been obtained with most other codings. The smallest coding is reasonably closely approximated by the conventional r-ary codings of the integers (cf. Bobrow and Arbib (52)).
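Examples (7.4.2) and (7.4.3) can be checked mechanically. The sketch below is only an illustration: the helper names are hypothetical, the bijection $h$ is found by brute force over all permutations (practical only for very small N), and indices are 0-based rather than 1-based as in the text.

```python
from itertools import permutations


def decimal_length(n: int) -> int:
    """L(n) = floor(log10 |n|) + 1 for n != 0, i.e. the number of decimal digits."""
    return len(str(abs(n)))


def magnitude_length(n: int) -> int:
    """L'(n) = |n|."""
    return abs(n)


def find_bijection(e1, e2):
    """Return h with |e2[i]| <= |e1[h[i]]| for every i, or None if no such h exists."""
    for h in permutations(range(len(e1))):
        if all(abs(e2[i]) <= abs(e1[h[i]]) for i in range(len(e2))):
            return h
    return None


def length_difference(e1, e2, code_length):
    """L(E1) - L(E2) for the given code-length function."""
    return sum(map(code_length, e1)) - sum(map(code_length, e2))


for e1, e2 in (((8, 15), (9, 1)), ((1, 15), (8, 9))):
    print(e1, e2, find_bijection(e1, e2),
          length_difference(e1, e2, decimal_length),
          length_difference(e1, e2, magnitude_length))
```

In the first example a bijection exists and the difference under $L'$ is at least the difference under $L$, as the theorem requires; in the second no bijection exists and the inequality fails.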
8. DISCUSSION AND CONCLUSION.

8.1  Model Assessment

The method of assessing rival models which we have proposed has a number of drawbacks. Does it nevertheless have practical value? We believe that it does.

The most obvious drawback is that model assessment depends on the choice of programming language. As has been mentioned before, the choice of language is taken as a specification of a priori assumptions. Consequently, this method of model assessment relies on some a priori assumptions holding. In this respect it is no worse than others: such assumptions are made in every method of modelling. Admittedly, our method of specifying a priori assumptions is rather indirect, and the type of assumption which can be specified is somewhat restricted. For example, it is not possible to specify statistical a priori knowledge. On the other hand, choosing a programming language is one of the forms in which a priori knowledge is actually specified in many simulation studies. In fact, it has become common in discussions of simulation to associate choice of simulation language with choice of "world view" (e.g. Mihram (53)). Furthermore, choices of programming language can be made which correspond to tighter or looser specifications of the modeller's a priori knowledge.

A second drawback is that, although a natural way of
coding table look-up elements has been demonstrated, such a coding exists for any positive integer $r \ge 2$. Which $r$ should be used? Fortunately, there is a smallest possible $r$ (namely 2), and the logarithmic nature of the smallest code-length implies that the relative sizes of models will not change much if $r$ ranges over the commonly used values (say $r \le 16$). However, the change may be significant; therefore the only meaningful procedure appears to be to try two or three low values of $r$, including $r=2$, and see how much this influences the model assessment.

As an example, let us investigate the assessment of the Algol gas-furnace models using binary-coded table look-ups rather than decimal ones. We shall consider only models I, IV and VI. If the elements of the table look-ups of these three models, shown in Appendix C, are each multiplied by 10 to make them integral, and the conventional binary coding applied to the resulting integers (preceded by "+" or "-"), then the assessment shown in table V is obtained. We have assumed that the fixed parts of the models are unchanged, and that the compiler translates the table look-ups back to standard Algol representation.

         Size, excluding    Size of          Total    Information    Information
Model    table look-up      table look-up    size     gain           explained
I              52               3184         3236         0              0
IV            129               1153         1282      1954          60.4%
VI            225               1059         1284      1952          60.3%

Table V - Assessment of Algol Gas-Furnace Models I, IV and VI, Using Binary-Coded Table Look-Ups.
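The size of a binary-coded table look-up of the kind summarised in Table V could be computed along the following lines. This is a minimal sketch under stated assumptions: each entry is scaled by 10 and written as a conventional binary numeral preceded by one sign symbol, zero is written as a single digit, and separators are not counted. The sample entries are illustrative values, not the contents of Appendix C, and the function names are hypothetical.

```python
def signed_binary_length(n: int) -> int:
    """Length of the binary numeral for |n| plus one sign symbol.

    For n != 0 this equals floor(log2 |n|) + 2; writing zero as the single
    digit "0" is an assumption made here.
    """
    return max(abs(n).bit_length(), 1) + 1


def table_lookup_size(entries, scale=10):
    """Total code-length of a table look-up after scaling its entries to integers."""
    integers = [round(e * scale) for e in entries]
    return sum(signed_binary_length(k) for k in integers)


sample_entries = [0.3, 0.1, 0.0, -0.1]   # illustrative values only
print(table_lookup_size(sample_entries))
```

Since the code-length grows only logarithmically with the magnitude of an entry, changing the radix rescales every entry's length by a similar factor, which is why relative model sizes tend to change little.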
The
coding
used
since we have L (n) = sequence,
since we
From is used
performances
VI.
are i n t e r e s t e d
of m o d e l s
c o u l d have
of m o d e l
good, than
in the
concluded
sense
that
for the other.
answer model
than m o d e l
no firm s u p p o r t primarily choose
The
two chief are
only n e c e s s a r y
because
However,
and
remarked
in this
that
to
there
is
interested
then one
could
information
used.
in C h a p t e r
the m o d e l s
assessment non-
being
similar.
It is
the m o d e l
i, this
is no g r e a t
of s i m u l a t i o n .
is n e c e s s a r y
the m o d e l s
model
to c o m p l e x
of o b t a i n i n g
of the p o s s i b i l i t y
thesis
trying
since
a higher
(2) that
a qualification
for one
elaborate
of the p r o p o s e d
a means
several
if one w e r e
procedure
to be s t r u c t u r a l l y
to have
- as was
restriction
assumed
models,
do not need
behaviour
"no",
(i) that it can be a p p l i e d
dynamical
compared
support
a more
F i n a ll y ,
advantages
to m o d e l
changed,
if one w e r e
VI.
was
case we have
are e q u a l l y
answer
2-coding
the
IV and VI
could
had
2-coding
Firstly,
using
decision
sizes.
reached.
is no m o r e
the m o d e l w h i c h
a smallest
procedures linear
one
in a c l e a r - c u t
gain w h e n
that m o d e l s
for m o d e l
as b e t t e r
perforance
"Is it w o r t h
IV?"
between
IV is s u p e r i o r
have b e e n
there
model
a smallest
In this
Alternatively,
the q u e s t i o n
is of no con-
no d i f f e r e n c e
that m o d e l
alternativeconclusionscould one
when
IV and VI.
the o r d e r i n g
this
2-coding,
in r e l a t i v e
that,
is v i r t u a l l y
firm i n d i c a t i o n
Had
a smallest
~ o g 21nl] +2, b u t
table V it is seen
, there
a rather
is not q u i t e
here.
of i n t e r e s t
W e have are
essentially deterministic causal processes, with stochastic effects entering in such a manner that they can be considered to be manifestations of the modeller's imperfect understanding of the system. The models which appear in control theory are of this type. But in operational research it is common practice to use models which are essentially stochastic in nature, such as the conventional models of queues. The characterisation of Chapter 3, and consequently the method of assessment based on it, is not immediately applicable to such models. (In some cases it may be possible to extend the characterisation to such models by requiring the models to compute not the observed behaviour itself, but some observed property of that behaviour - for example, a histogram).

Perhaps the best known type of complex model which can be assessed by our method is the "System Dynamics" type of model introduced by Forrester (6). Forrester does not in fact subject his models to tests against real data. He argues that complex industrial and socio-economic systems contain such strong cross-couplings and feedbacks that statistical validation is virtually impossible. Consequently he insists that the modeller should be able to justify all the detail of the model, and that elements of the model should correspond to "real elements" in the system. In other words, only those elements are permitted about which the modeller's a priori knowledge is so strong that he has no doubts about their inclusion. Furthermore, if the model behaves in an
unsatisfactory manner, then "causes for the discrepancy must be found which can be explained and defended on grounds other than that their inclusion corrects the system behaviour", since "a sufficiently elaborate formal curve-fitting procedure can be devised to fit arbitrarily closely any ensemble of data" (6). If adhered to strictly, these injunctions are very restrictive. If the modeller is unsure which of two competing structures should appear in some part of the model, then he should include neither. If, on the other hand, he disregards the above guidelines, and includes a structure about which he has doubts, then Forrester does not provide him with any means of deciding whether it should in fact be included, or of choosing between two competing structures.

The concept of information gain provides the modeller with an objective method of making such decisions. To make use of it, the modeller requires some relevant observed data, of course. He can take a "System Dynamics" model which includes only those structures about which he has no doubt, and assess its information gain. It does not matter at this stage what this is - it can even be negative; the modeller will not have based his model on the particular observations available to him, but on his a priori knowledge - that is, ultimately, on many other observations of similar systems. He can now incorporate one of the competing uncertain structures, and see whether the information gain is improved or not. Obviously, he can also establish which of these structures gives the greater information gain.

Note that this procedure ensures consistency with Forrester's proviso, that adding new features to a model should not merely be a "curve-fitting" exercise. An increase of information gain indicates that more has been explained about the system behaviour than has been added to
the model.

The desirability of validating a model by the use of a set of observations other than the one used for constructing it is now generally recognised. Indeed, it can be said more strongly: the validation of a model is held to be meaningless if the set of observations used for the validation is the same as the one used for constructing the model. This is of course correct if the validation criterion is goodness-of-fit, since, as has been stated several times already, an arbitrary goodness-of-fit can be achieved for any finite set of data. Now the use of information gain for model assessment is a procedure for model validation, yet only one set of observations is assumed. Why, then, is it not meaningless? The reason is that information gain does not just assess goodness-of-fit, but trades it off against complexity. The simpler the model, the fewer observations are required to construct it. Thus the information gain can be regarded as measuring, in a sense, the "proportion" of the observation set used for constructing the model, and comparing this with the "proportion" which successfully validates the model. An information gain of zero indicates, roughly, that all the observations have been used to build the model, and none remain for validation.

Suppose that a certain set of observations is used to construct a model, and that this model is then validated on a second set of observations, in the sense that its behaviour is a good fit to those observations. Then it is clear that the larger the second set of observations is, the greater will be the information gain of the model (viewed as a model of the system which is a concatenation of the two sets of observations). It may be conjectured that the converse is also true: if a model has a high information gain, then it is possible to divide the observation set into two disjoint sets, such that the model can be constructed using one of the sets, and successfully validated using the other.

Of course, the modeller will inevitably feel safer if he succeeds in validating his model on a second set of data, even if his model does have a high information gain when modelling the first set alone. But if this is not possible (for example, because he would have to wait too long to acquire new data) then information gain gives an alternative means of model validation.

To summarise, it is believed that the above arguments show that, in spite of certain features which appear at first to be arbitrary, the proposed method of model assessment can, if used intelligently and with care, be a useful guide to the modeller, insofar as it can indicate to him the success of his efforts to date. The features which remain arbitrary are no more prominent
166 than the a r b i t r a r y features of any m o d e l
a s s e s s m e n t procedure.
The a v a i l a b i l i t y of such a guide s i g n i f i c a n t l y extends range of t e c h n i q u e s
a v a i l a b l e to the m o d e l l e r ,
the
for an i m p o r t a n t
and large class of d y n a m i c a l models.
8.2
Model Buildin~
The characterisation of modelling which was presented in Chapters 3 and 4 is of course an idealisation and a simplification of the modelling process. Nevertheless, it exhibits some important characteristics of modelling. It differs from other attempted formalisations of modelling (e.g. (53), (54)) in that it questions some long-cherished beliefs: namely, that a "true model" of the system to be modelled necessarily exists, and that there is some inherent virtue in model complexity. These beliefs are a legacy from the modelling of engineering systems, and do not seem appropriate for socio-economic and other poorly-understood systems.

Most discussions of modelling assume at the outset that some "true model" of the system under investigation exists, and that the aim of the modelling exercise is to find a model which in some sense approximates the true one. We make no such assumption. Consider an asymptotic system $\mathcal{S} = (S^1, S^2, \ldots)$ (cf. Chapter 4). Let $x(n)$ be the Gödel number of the best model of the system $S^n$ (judged according to the criterion of Chapter 3). Then it is possible that the sequence $(x(1), x(2), \ldots)$ is random. In this case it does not seem meaningful to speak of the "true model" of $\mathcal{S}$. However, an asymptotic model of $\mathcal{S}$ can always be found, since
the s e q u e n c e
of trivial
When modelling detailed well
models
My m a k i n g
association
them more
in t h e s e
circumstances.
a model
performance
the o p p o s i t e are
using
with
useful
model
set.
Of course,
them directly
though
contradicting
usually
by large
pay
not very
conditions,
prominent,
a "bigger
For e x a m p l e ,
they m a k e
relationships hundred
are
the m e r i t s
of M e s a r o v i c
that e n o u g h justify
such
performance
data
ultimately
may
that
its
and
derived
not be able
of such o b s e r v a t i o n s .
the size which
complex
to
of u s e f u l m o d e l s have
models
b e e n made,
prove
useful
they m a y not be d i r e c t l y
sets.
Other
ideas; finds
treatments
however, the
of m o d e l l i n g
these
author
notions
slipping
are
back
into
of mind. and P e s t e l
as a p o i n t
statement:
(55)
in its
favour.
statement
and P e s t e l ' s
model.
it seems
should
causes
about
iOO,0OO to a few
doubts
It seems
the
to the
the v i e w p o i n t
grave
"world
even more
be so g o o d
about
From
refer
At one p o i n t
as c o m p a r e d
world models".
this
a large model;
repeatedly
"In our m o d e l
in the computer,
c o u l d be c o l l e c t e d
of the m o d e l
compare
of a c r e d i b l e
it is a m o d e l
though
in o t h e r w e l l - k n o w n
of our c h a r a c t e r i s a t i o n ,
but must
is i t s e l f
even
Mesarovic
stored
cannot
that very
frame
the f o l l o w i n g
if one
fact
to these
size of their m o d e l
Furthermore,
of o b s e r v a t i o n s
and one o f t e n
is better"
being
alone,
generally,
observation
lip-service
things
the m o d e l l e r
amount
the
other
by the size of the o b s e r v a t i o n
- in a sense,
total
an
In our c h a r a c t e r i -
the size
knowledge
it can be s t a t e d q u i t e
is l i m i t e d by the
large
limited
a priori even
then
better
and g o o d m o d e l s
is taken:
a priori knowledge
the
are
Thus
models
small models.
is in e f f e c t
certain
view
observations,
from o b s e r v a t i o n s ,
supported
detailed.
formed
good models
under
and more
systems to o b t a i n
detailed
equal,
without
to seek
model.
in w h i c h
within
complex,
precisely
Thus
relationships
it is r e a s o n a b l e
sation
exhibit
situations,
between
is n a t u r a l l y
build
is an a s y m p t o t i c
engineering
cause/effect
understood,
models
about
unlikely
system"
unlikely
as to j u s t i f y
to
that the the r e j e c t i o n
of a r e a s o n a b l y smaller.
(Although
made with the
same
care,
model
effective
since
the m o s t
procedure
can exist.
This
standard
guaranteed
of the n a i v e specific
important,
implies system
have
times
to be
sets w o u l d
expertise
achieve
not be
to compare
theories. principle
application
analysis"
it m a y
should
on w h a t
with
the
the
systems
as e c o n o m i c s theory,
of m o d e l l i n g
dispose
can r e p l a c e
concerned
of e c o n o m i c
can not be
though
This
such
of a s y s t e m
technique even
no
depend
and not on
and simulation.
Inference.
is c o n c e r n e d
with
the i n f e r e n c e
from o b s e r v a t i o n s .
briefly
of theories
It is t h e r e f o r e
the c h a r a c t e r i s a t i o n some p h i l o s o p h i c a l
interesting
of m o d e l l i n g studies
developed
of general
inference.
Both philosophers regarded
model,
in a field
trappings
thesis w i t h
scientific
in g e n e r a l
the m e c h a n i c a l
The b o u n d s
on the p r o g r e s s
contentious,
the best m o d e l
"systems
primarily
and h y p o t h e s e s
not
an "answer".
can
Modelling
that
a useful
modelling
Scientific
is that
of the d i s c i p l i n e
studied.
the t e c h n o l o g i c a l
albeit
identification
to p r o d u c e v i e w that
system being
in this
would
the o b s e r v a t i o n
for f i n d i n g
to p r o d u c e
be g u a r a n t e e d
8.3
is a t h o u s a n d
the c o m p a r i s o n s
of our c h a r a c t e r i s a t i o n
of some
which
for the two models).
Perhaps result
adequate
simplicity
of science
as a d e s i r a b l e
One of the b e s t k n o w n of O c k h a m ' s
Razor,
which
and s c i e n t i s t s characteristic expressions states
that
have
long
of s c i e n t i f i c
of this only
is the
those
entities
s h o u l d be i n t r o d u c e d
absolutely Philosophy" "Rule
necessary. (56) I:
Newton's
stem We
from
are
"Rules
than such their
Therefore
III:
and w h i c h
are
the
IV:
of our e m p h a s i s IV justify degree
to all b o d i e s
of our e x p e r i m e n t s ,
has b e e n made
qualities
philosophy inferred
m a d e more
accurate,
some rules
on high
we
of all
are
by w h i c h
such
I and II are
an i n f o r m a l
which
may be held
on the (57).
idea
that
However,
be
to exceptions".
with
gain
as o t h e r
they m a y e i t h e r
or liable
gain,
hypotheses
time
similarities
information
induction
or very n e a r l y
any c o n t r a r y till
to look
by g e n e r a l
as a c c u r a t e l y
occur,
by Bunge
are to
whatsoever.
notwithstanding
attack
admit neither
to b e l o n g
the use of i n f o r m a t i o n
A detailed
causes.
found
phenomena
of c o n f i d e n c e
the same
we m u s t ,
of degrees,
that m a y be imagined,
can be discerned:
and s u f f i c i e n t
effects
which
the u n i v e r s a l
from p h e n o m e n a
rules
true
remission
upon p r o p o s i t i o n s
In these
of n a t u r a l
nor
In e x p e r i m e n t a l
true,
assign
of bodies,
reach
be e s t e e m e d bodies
causes
to the same n a t u r a l
The q u a l i t i e s
within
in
appearances.
intensification
Rule
of R e a s o n i n g
as are b o t h
as far as p o s s i b l e , Rule
are
this p r i n c i p l e :
to e x p l a i n II:
a theory w h i c h
to admit no m o r e
things
Rule
into
our characterisation
while
counterpart
rules
as an i n d i c a t o r
III and of the
in a model. simplicity
is d e s i r a b l e
our c h a r a c t e r i s a t i o n
of modelling escapes the brunt of this attack. Bunge's main concern is that regarding simplicity as the sole criterion of scientific quality is too naive. Proper emphasis should also be placed on accuracy and depth: "The motto of science is not just Pauca but rather Plurima ex paucissimis - the most out of the least. In short, we wish economy and not merely parsimony."

Information gain does not just measure the size of a program. It measures the size of a program, given that the program is a model of the system. As was mentioned earlier, this leads to a trade-off between model complexity and approximation. Thus our characterisation accords with Bunge's motto. It should be added, however, that when modelling poorly understood systems, the plurima is limited by the nature of the available observation set; if this is small then useful models will necessarily have to be small also.
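The trade-off described above can be illustrated with a small sketch. It is not the formal definition used in the earlier chapters: program size is approximated here by the character count of a Python representation, and the names `trivial_model_size`, `model_size`, `information_gain`, `linear_rule` and `RULE_SOURCE` are hypothetical. The only idea taken from the text is that a model must reproduce the observations exactly (a rule plus a look-up table of corrections) and that its gain is the amount by which it is smaller than a program that simply lists the observations.

```python
def trivial_model_size(outputs):
    """Size of the trivial model: a program that just lists every output."""
    return len(repr(list(outputs)))


def model_size(rule_source, corrections):
    """Size of a model: its rule text plus a look-up table of corrections.

    Only non-zero corrections are stored, so a rule that fits well
    needs a small table and a rule that fits badly needs a large one.
    """
    table = {i: c for i, c in enumerate(corrections) if c != 0}
    return len(rule_source) + len(repr(table))


def information_gain(outputs, rule_source, rule):
    """Amount by which the model is smaller than the trivial model."""
    corrections = [y - rule(i) for i, y in enumerate(outputs)]
    return trivial_model_size(outputs) - model_size(rule_source, corrections)


def linear_rule(i):
    return 3 * i + 1


RULE_SOURCE = "3*i+1"  # hypothetical program text standing in for linear_rule

long_set = [linear_rule(i) for i in range(40)]
long_set[17] += 2  # one observation the rule does not reproduce exactly
short_set = [linear_rule(i) for i in range(2)]

gain_long = information_gain(long_set, RULE_SOURCE, linear_rule)
gain_short = information_gain(short_set, RULE_SOURCE, linear_rule)
print(gain_long, gain_short)
```

Under these assumptions the same rule earns a positive gain on the long observation set and none on the two-point set, which is the sense in which a small observation set can only support a small model.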
Bunge's second major criticism of a general exhortation of simplicity is that several different aspects of simplicity can be discerned, and that these are often mutually incompatible. An indiscriminating call for simplicity is therefore nonsensical. We easily evade this criticism, since we have been quite specific about how simplicity should be measured, and therefore about what type of simplicity we regard as important.

Our theory concerning models is in some respects a "realisation" of Popper's abstract theory of scientific method (38). The first point of agreement is the hypothetico-deductive nature of the theory. Theorem (3.4.4) shows that
the h y p o t h e s e s obtained
in some
algorithm" to us.
from w h i c h
our m o d e l s
routine
can exist.
Forcing
manner How
the m o d e l
it p o s s i b l e
to test
of the e x t e n t observations.
the theory
gain
not e x c l u d e
which
introduced
prima
facie
the e m p i r i c a l the o r i g i n a l is g r e a t e r Popper's
to save
auxiliary
content
the
that
whereas
an a l l e g e d
true
has
counterpart
If a m o d e l w h i c h possible cannot
model
prove
possible a smaller Our
has b e e n
found
of the s y s t e m
it.
model,
On
the o t h e r
then we
A
the so that of -
(58).
law w h i c h law w h i c h
is true is not
in our theory. (smallest)
investigation,
hand,
then we
if it is not the b e s t
can d e m o n s t r a t e
this by e x h i b i t i n g
model. assertion
to P o p p e r ' s
belief
that good m o d e l s that good
are small
theories
content,
all
hypothesis
system"
is the best
under
... by
auxiliary
a scientific
its
not e v e n
system - consisting
can n e v e r be v e r i f i e d , can be falsified,
from b e i n g
hypotheses,
that of the o r i g i n a l
contention
can be v i e w e d
hypotheses ....
auxiliary
theory plus
by the
model:
m a y be e v a d e d
of the
a measure
the e m p i r i c a l
all i m m u n i s a t i o n s ,
ad hoc
This m a k e s
the m o d e l
of the o v e r a l l
of t e s t a b l e
than
look-up
then m e a s u r e s
falsification
introduction
gives
is f a l s i f i e d
a table
of c o r r o b o r a t i o n " ,
"We m u s t
the o b s e r v a t i o n s
look-up
introduced
Information
is i r r e l e v a n t
- the size of the c o r r e c t i o n s
Alternatively,
falsified.
modelling
of the method.
by a table
as an ad hoc h y p o t h e s i s
or "degree
part
the t h e o r y
to w h i c h
are o b t a i n e d
to c o m p u t e
to the d e d u c t i v e
c a n n o t be
- no " u n i v e r s a l
they
corresponds
w h i c h m u s t be g e n e r a t e d
are b u i l t
corresponds
are simple.
Popper associates "simplicity" with "paucity of parameters", whereas we have a rather more general concept of simplicity - paucity of terms in the model. It is because of this correspondence that we suggest that information gain can be identified with Popper's "degree of corroboration".

Kuhn (39) has suggested that science does not in fact use a uniform method, but that it can be divided into two distinct phases. In the usual phase - "normal science" - fundamental assumptions are not questioned, and routine work of a "problem-solving" character is pursued. However, from time to time a "scientific revolution" occurs - sufficiently many anomalies and shortcomings of the established "Weltanschauung" accumulate to force a revision of basic assumptions. It is tempting to associate a change of the programming language used for writing models with such a "scientific revolution". For, as has been argued in chapter 4, a change of programming language implies a change in a priori beliefs about the system being investigated. Such a change leads to a change in the ordering of models, when they are ordered in accordance with their information gains. The suggestion that a change in the ordering of models corresponds to a "scientific revolution" has previously been made by Gaines ((14) and cf. sec. 2.4.2).

Formal developments of logical probability and of scientific induction by Carnap and his school (36), (59) always assume a particular formal language, in which all scientific statements could in principle be made. This is taken to be a "logical basis". In our characterisation of modelling a corresponding assumption is made, namely the assumption of a particular programming language. A major criticism of the assumption of a formalised language has always been that it is evident that scientific statements are never expressed in such languages, and that it is not certain whether a formal language capable of expressing interesting scientific statements can exist. This contrasts with the status of the programming languages which we have to assume. Obviously, such languages are capable of expressing statements which are interesting to the modeller, and in many cases it is practical to do so. Furthermore, it has been shown in chapter 5 that such languages can be defined quite precisely.

A tenuous similarity can be pointed out between our characterisation and Carnap's "Continuum of Inductive Methods" (60). Carnap introduces a "confirmation function", which is to be interpreted as the logical probability that a particular event will occur. The value of this function depends on a term which can be interpreted as an a priori logical factor, and on an a posteriori empirical factor, which is a relative frequency. The relative weighting of these two factors is governed by a positive real parameter λ, which thus indexes the "continuum of inductive methods". The value of the parameter λ is usually taken to be chosen subjectively, and depends on how regular or "lawlike" the inductivist believes his "universe of discourse" to be.
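As a concrete illustration of how such a parameter weights the two factors, the sketch below uses the form in which Carnap's confirmation function is commonly quoted, (s_i + λ/κ)/(s + λ). That formula is not given in the text above and is supplied here as an assumption, as are the function name `confirmation` and its argument names. The value λ = 0 is included only as the limiting case in which the estimate is the pure relative frequency.

```python
from fractions import Fraction


def confirmation(s_i, s, kappa, lam):
    """Lambda-continuum estimate in its commonly quoted form (an assumption).

    s_i   -- observed instances having the attribute of interest
    s     -- total number of observed instances
    kappa -- number of mutually exclusive attributes; 1/kappa is the
             a priori logical factor
    lam   -- weighting parameter lambda; requires lam >= 0 and s + lam > 0
    """
    lam = Fraction(lam)
    return (s_i + lam / kappa) / (s + lam)


# 8 of 10 observed emeralds are green, with two possible colours.
for lam in (0, 2, 100):
    print(lam, confirmation(8, 10, 2, lam))
```

Small values of λ let the empirical relative frequency dominate, while large values pull the estimate toward the a priori factor 1/κ.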
Somewhat analogously, we can consider our characterisation of modelling to be an "enumeration of inductive methods", which is indexed by a Gödel number of the programming language which has been chosen. The choice of this index is also subjective, since it reflects a priori beliefs. However, we are not proposing that our method leads to a logical probability. Although it is easy to obtain a [0,1]-valued
function to make
from i n f o r m a t i o n it b e h a v e
like
is that no e f f e c t i v e exist. need
effectively
w a y of n o r m a l i s i n g
the f u n c t i o n
would
over
out
they
as follows:
ravens
are black".
suppose This
that
Any
of i n d u c t i o n a ~pothesis
supports This
the m o d e l arise,
logical
algorithm a very
by o b s e r v a t i o n s
notion from the
"all to the
on " c o n f i r m i n g
seems
to lead to the c o n c l u s i o n , thing w h i c h
hypothesis
that
instances"
is also not "all ravens
that a are
absurd. clearly
does
behaviour, However,
a model
depend
on " c o n f i r m i n g
a hypothesised
the i n f o r m a t i o n
Hempel's
paradox
can be v i e w e d
no m o d e l w i l l
one - s u c h
not
corresponds
(we can w r i t e
an
Thus we have
type of e n t i t y
classical
does
as the s t a t e m e n t
but not x:~y;).
of w h a t
regularity
gain Of
exist w h i c h
of that h y p o t h e s i s
of the form x:=y;
different
that
paradox
is b a s e d
in g e n e r a l
negation
Hempel's
is e q u i v a l e n t
b e c a u s e every time
although
as "benchmark"
which
in the s y s t e m
because
consideration
are n o n - r a v e n s " .
is p a t e n t l y
is increased.
of
We include
it is h y p o t h e s i s e d
hypothesis
the o r i g i n a l
of a h y p o t h e s i s , to the
is not
of i n d u c t i o n
inference.
of a n o n - b l a c k
in a sense,
is r e p e a t e d
and w o u l d
things
Our characterisation instances"
paradoxes
and Goodman.
"all n o n - b l a c k
every observation
black".
which
tend to be r e g a r d e d
hypothesis
raven
famous
of i n d u c t i v e
arises
supporting
to be additive,
difficulty can
that our c h a r a c t e r i s a t i o n
of H e m p e l
of a c c o u n t s
theory
have
a set of m o d e l s
the two
those
of these b e c a u s e tests
The e s s e n t i a l
we p o i n t
escapes
(61), n a m e l y
not s e e m p o s s i b l e
computable).
Finally, modelling
it does
a probability.
(The f u n c t i o n
to sum to u n i £ y
gain,
can be "supported"
entities
cannot
be merely logical statements, but must be algorithms.

The second paradox is Goodman's "grue" paradox, which also arises from consideration of "confirming instances". Every time a green emerald is observed, the hypothesis that "emeralds are green" is confirmed, and the hypothesis that "emeralds are blue" is falsified. However, the observation also confirms the hypothesis that "emeralds are grue" - namely, green until 1980 and blue thereafter, and falsifies the hypothesis that "emeralds are bleen" - blue until 1980 and green thereafter. Thus the observation would appear to tell us nothing at all about the appearance of emeralds in the future.

This paradox is evaded, rather than solved, by our characterisation. We assume that a particular programming language has been chosen. The definition of this language can be regarded as the definition of what basic predicates are to be used in our scientific statements. If basic predicates like "blue" and "green" are defined, then predicates like "grue" and "bleen" can be constructed from these - it is helpful to think of them as being defined by procedures. Now a theory like "emeralds are green" can be expressed as a model by using just the terminal characters of the language. However, a model corresponding to "emeralds are grue" would need to include the declaration of the procedure "grue"; consequently, its information gain would be lower, and this model would be rejected in favour of the first one. Of course, this makes no contribution to the philosophical problem of why the language chosen should define "blue" and "green" rather than "grue" and "bleen".
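The evasion can be made concrete with a toy comparison. Everything in the sketch is hypothetical: the two program texts are written in an invented notation, size is taken to be character count, and the observation years are made up. It shows only that both models reproduce every observation made before 1980, so neither needs corrections, and that the model which must declare the procedure "grue" is the longer one and therefore has the lower gain.

```python
GREEN, BLUE = "green", "blue"  # basic predicates of the assumed language

# Hypothetical program texts in a toy notation; only their lengths matter.
GREEN_MODEL = "colour:=green;"
GRUE_MODEL = (
    "procedure grue(t);if t<1980 then green else blue;"
    "colour:=grue(t);"
)


def predict_green(year):
    return GREEN


def predict_grue(year):
    return GREEN if year < 1980 else BLUE


# Made-up observation set: every emerald seen so far (before 1980) is green.
observations = {year: GREEN for year in range(1970, 1978)}


def fits(predict):
    """True if the model reproduces every observation exactly."""
    return all(predict(y) == colour for y, colour in observations.items())


def gain(model_text):
    """Trivial model size (listing all observations) minus model size."""
    return len(repr(observations)) - len(model_text)


print(fits(predict_green), fits(predict_grue))
print(gain(GREEN_MODEL), gain(GRUE_MODEL))
```

Had the language supplied "grue" and "bleen" as its basic predicates instead, the ordering of the two models would be reversed, which is the sense in which the choice of language carries the a priori beliefs.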
8.4 Systems Science

It has long been perceived that control theory and, more generally, systems theory are principally concerned with the acquisition, transfer and use of information, rather than of energy. Models are often said to convey information. However, "information" in this sense is usually used in an intuitive, presystematic way. Information gain has been introduced in an attempt to formalise this idea of information conveyed by a model. The established theories of information do not appear adequate for this purpose. Use of the statistical theory of information transmission would have restricted us to the consideration only of random processes which could be described statistically, and the ideas of "cause" and "effect" could have found no place in such a framework. On the other hand, Carnap and Bar-Hillel's theory of semantic information (59), (62) would have involved the use of uncomputable "logical probabilities", and would in any case not be capable of practical application to modelling.

The algorithmic information theory of Kolmogorov (19) almost provides what is required, since it defines information not in terms of probabilities, but in terms of algorithms - and models are clearly algorithms. Consequently, information gain has been developed from the ideas of algorithmic information theory. However, it must be emphasised that information gain is not the same entity as Kolmogorov's "quantity of information". The latter is an uncomputable quantity, whereas it has been deliberately ensured that information gain is computable. Furthermore, Kolmogorov's "quantity of information" is a function defined on pairs of signals, whereas information gain is a function defined on models (assuming the "signals" to be given).
How can the assertion that information gain measures the performance of a model be justified? We have employed four arguments in its favour. The first is argument by association: algorithmic information is a plausible formalisation of "information"; information gain is intimately related to algorithmic information; therefore information gain is a plausible measure of "information". The second is argument by weight of opinion: many of the workers who have attacked similar problems have been drawn to similar conclusions (e.g. Wrinch and Jeffreys, Solomonoff, etc.; cf. chapter 2). The third argument we have used is the argument of consistency with what we would expect: there is no universal method for finding the best model; for large observation sets models are chosen on the basis of goodness of fit; for large observation sets a priori beliefs do not matter; for an important class of processes, the same model is chosen as best as would be chosen by established theory. The fourth argument is that of "operationalism": the information gain of models which are of practical interest can be calculated. This has been demonstrated by an example.

A stronger type of argument than any of these would be the use of the concept of information gain to obtain new results in systems theory, or to explain known phenomena of systems modelling. Rissanen has already provided one result of this kind ((41) and cf. sec. 2.5). In effect, Rissanen has shown that if the information gain of the model of a Gauss-Markov process is maximised, then the resulting identification scheme is an extension of standard maximum-likelihood techniques.

A second possible application of information gain is to the explanation of what Young has termed the "Law of Large Systems" (63). This is the conjecture, based on experience, that complex, poorly understood systems can often be adequately represented by very simple models. One can see immediately how the explanation of this in terms of information gain would run. The possible information gain of a model of such a system is limited by the size of the available observation sets. Suppose that models (understood informally) belonging to a certain class are fitted to these observation sets in order of increasing complexity. Let t be the size of the trivial model, and I_n be the information gain of the nth model so fitted when suitably formalised. Then in passing from the nth to the (n+1)th model, the greatest possible improvement in information gain is t - I_n. If t is small, it seems likely that diminishing returns would quickly set in, and no improvement in information gain would be possible after the first few models had been fitted. It would be interesting to establish the conditions under which this does indeed happen.
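The bound t - I_n follows in one line from the definition of information gain as the amount by which a model is smaller than the trivial model. In the sketch below, the symbol m_n for the size of the nth model is introduced only for illustration and does not appear in the text; the single assumption is that a model's size cannot be negative.

```latex
I_n = t - m_n, \qquad m_{n+1} \ge 0
\;\Longrightarrow\;
I_{n+1} - I_n \;=\; m_n - m_{n+1} \;\le\; m_n \;=\; t - I_n .
```

So once the first few models have brought I_n close to a small t, there is little room left for any later model to improve on them.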
8.5 Conclusion
A characterisation which
is b e l i e v e d
the m o d e l l i n g
of m o d e l l i n g
to i n c o r p o r a t e
of complex,
are e n c o u n t e r e d
and m a n a g e m e n t
studies,
This
characterlsation
model,
A system
sets
observation
set is a s s u m e d
of
such
socio-economic
industrial
control
a system,
studies. a
quality.
to be d e f i n e d
- an i n p u t
features
systems,
of three parts:
of m o d e l
observation
salient
understood
and in c e r t a i n
is c o n s i d e r e d
introduced
in e n v i r o n m e n t a l ,
consists
and a c r i t e r i o n
certain
poorly
as those w h i c h
has b e e n
by a p a i r
set and an o u t p u t to be a finite
set.
array
of Each
of r a t i o n a l
numbers. A model observation sets
but m u s t which
set.
to help
The m o d e l
is an a l g o r i t h m
it in this
is not compute
represents
exercise,
It m a y
before
which
use
certain
task,
allowed
computes
the o u t p u t
elements
as s p e c i f i e d
to a p p r o x i m a t e
by the modeller.
the
trivial
of the o b s e r v a t i o n
system behaviour,
it exactly.
The
model
the s i t u a t i o n
at the b e g i n n i n g
any s t r u c t u r e
has b e e n
is a m o d e l of the m o d e l l i n g
discerned
in the
system behaviour. The c r i t e r i o n gain. than
This
trade-off prevents puts
is the a m o u ~ b y
the t r i v i a l
program.
of m o d e l
model,
model
"overfitting"
a premium
which
when both
In c o n v e n t i o n a l between
quality
terms,
the m o d e l are w r i t t e n
of the m o d e l
and m o d e l
information
is s m a l l e r as a c o m p u t e r
this c r i t e r i o n
complexity
on finding
is the m o d e l ' s
leads
to a
accuracy.
to the o b s e r v a t i o n s ,
the g r e a t e s t
amount
This and
of r e g u l a r i t y
in the
s y s t e m behaviour.
The ranking depends
of m o d e l s
on the p r o g r a m m i n g
models.
However,
this
it can be a s s o c i a t e d about also
the
system.
affects
arbitrary, shown
according
A detailed "a p r i o r i
and
this
investigation
information
by a s s o c i a t i n g
required
to w r i t e of the
meaningfully consequence
if such
The w o r k Firstly,it logically
sound,
a formal
Secondly,
it g i v e s
the p r o g r e s s has
been
a model
that,
indicates
other
is not
coding
has been
one to use. the n o t i o n
of
language However,
models
an
can be
it is of no g r e a t
serves
analysis
a plausible
efforts. being
which
is
formalisation
by a model".
a simple
rather
two purposes.
of m o d e l l i n g
supplied
things
vacuousness
that
thesis
the m o d e l l e r
of his m o d e l l i n g
beliefs
are not m e t exactly.
gives
of the c o n c e p t of " i n f o r m a t i o n
because
can be p r e c i s e l y
the s m a l l e s t
indicated
in this
and w h i c h
that
under which
conditions
reported
provides
has
shown
as a program.
conditions
compared
smallest
by a model"
it w i t h
the m o d e l
this
is a n a t u r a l
has
assumed
defined,
examination
coding
the
the o b s e r v a t i o n s
Again,
a distinguished
for w r i t i n g
a priori
of coding
ranking.
criterion
arbitrary,
the m o d e l l e r ' s
The m a n n e r
because
chosen
is not e n t i r e l y
with
the m o d e l
to exist,
language
to this
technique Our
for
guiding
assessing
idea
equal,complexity
than
sophistication.
in
REFERENCES

1.
Weber, R.L, "A Random Walk in Science", The Institute of Physics, (1973), p.92.
2.
Astrom, K.J. and Eykhoff, P, "System Identification - A Survey", Automatica, 7, (1971), 123-162.
3.
Akaike, H, "Autoregressive Model Fitting for Control", Annals of the Institute of Statistical Maths, 23, (1971), 163-180.
4.
Chan, C-W, Harris, C.J, and Wellstead, P, "An Order-Testing Criterion for Mixed Autoregressive Moving Average Processes", Int. J. Control, 20, (1974), 817-834.
5.
Akaike, H, "A New Look at the Statistical Model Identification", IEEE Trans. Auto. Control, AC-19, (1974), 716-723.
6.
Forrester, J.W, "Industrial Dynamics", M.I.T. Press and Wiley, (1961).
7.
Von Neumann, J, and Morgenstern, O, "The Theory of Games and Economic Behaviour", Princeton, (1944).
8.
Fuller, A.T., "Analysis of Nonlinear Stochastic Systems by Means of the Fokker-Planck Equation", Int. J. Control, 9, (1969), 603-655.
9.
Rogers, H, "Theory of Recursive Functions and Effective Computability", McGraw-Hill, (1967).
10.
Box, G.E.P. and Jenkins, G.M, "Time Series Analysis, Forecasting and Control", Holden-Day, (1970).
11.
Kalman, R.E, Falb, P.L, and Arbib, M.A, "Topics in Mathematical Systems Theory", McGraw-Hill, (1969).
12.
Windeknecht, T.G, "General Dynamical Processes", Academic Press, (1971).
13.
Zadeh, L.A, and Desoer, C.A, "Linear System Theory - The State Space Approach", McGraw-Hill, (1963).
14.
Gaines, B.R, "System Identification, Approximation and Complexity", Int. J. Gen. Systems, 3, (1977), 145-174.
15.
Blum, M, "On the Size of Machines", Info & Contr., 11, (1967), 257-265.
16.
Blum, M, "A Machine-Independent Theory of the Complexity of Recursive Functions", J. ACM, 14, (1967), 322-336.
17.
Löfgren, L, "Complexity of Descriptions of Systems", Research Report IV1 7601, Dept. of Automata and General Systems Sciences, Lund Institute of Technology, (January 1976).
18.
Hartmanis, J, and Hopcroft, J.E, "An Overview of the Theory of Computational Complexity", J. ACM, 18, (1971), 444-475.
19.
Kolmogorov, A.N, "Three Approaches to the Quantitative Definition of Information", Problems of Information Transmission, 1, No. 1, (1965), 1-7.
20.
Kolmogorov, A.N, "Logical Basis for Information Theory and Probability Theory", IEEE Trans. Info. Theory, IT-14, (1968), 662-664.
21.
Zvonkin, A.K., and Levin, L.A., "The Complexity of Finite Objects and the Development of the Concepts of Information and Randomness by Means of the Theory of Algorithms", Russian Mathematical Surveys, 25, no. 6, (1970), 83-124.
22.
Solomonoff, R.J., "A Formal Theory of Inductive Inference", Information and Control, 7, (1964), 1-22, and 224-254.
23.
Chaitin, G.J., "On the Length of Programs for Computing Finite Binary Sequences", J. ACM, 13, (1966), 547-569.
24.
Martin-Löf, P., "The Definition of Random Sequences", Information and Control, 9, (1966), 602-619.
25.
Church, A, "On the Concept of a Random Sequence", Bull. Amer. Math. Soc., 46, (1940), 130-135.
26.
Gillies, D.A., "An Objective Theory of Probability", Methuen, (1973).
27.
Schnorr, C.P., "Optimal Enumerations and Optimal Gödel Numberings", Math. Syst. Th., 8, (1975), 182-191.
28.
Meyer, A.R, "Program Size in Restricted Programming Languages", Info & Contr., 21, (1972), 382-394.
29.
Biermann, A, & Feldman, J.A, "A Survey of Grammatical Inference" in: Watanabe, S, (ed), "Frontiers of Pattern Recognition", Academic Press, (1972), 31-54.
30.
Fu, K.S, & Booth, T.L, "Grammatical Inference: Introduction and Survey - Part I", IEEE Trans. Syst., Man & Cybernetics, SMC-5, (1975), 95-111.
31.
Gold, M, "Language Identification in the Limit", Info & Contr., 10, (1967), 447-474.
32.
Chomsky, N, "On Certain Formal Properties of Grammars", Info & Contr., 2, (1959), 137-167.
183
33.
Fu, K.S, & Booth, T.L, "Grammatical Inference: Introduction and Survey - Part II", IEEE Trans. Syst., Man & Cybernetics, SMC-5, (1975), 409-423.
34.
Feldman, J, "Some Decidability Results on Grammatical Inference and Complexity", Info & Contr., 20, (1972), 244-262.
35.
Blum, L, and Blum, M, "Toward a Mathematical Theory of Inductive Inference", Info & Contr., 28, (1975), 125-155.
36.
Carnap, R, "Logical Foundations of Probability", University of Chicago Press, (1950).
37.
Wrinch, D, and Jeffreys, H, "On Certain Fundamental Principles of Scientific Inquiry", Philosophical Magazine, ser. 6, vol. 42, (1921), 369-390.
38.
Popper, K.R., "The Logic of Scientific Discovery", Hutchinson, (1959).
39.
Kuhn, T.S, "The Structure of Scientific Revolutions", University of Chicago Press, (1962).
40.
Löfgren, L, "Relative Explanations of Systems" in Klir, G.J, "Trends in General Systems Theory", Wiley, (1972), 340-407.
41.
Rissanen, J, "Parameter Estimation by Shortest Description of Data", Proceedings of JACC Conference, ASME, (1976), 593-597.
42.
Rissanen, J, "Basis of Invariants and Canonical Forms for Linear Dynamic Systems", Automatica, 10, (1974), 175-182.
43.
Chaitin, G.J. "A Theory of Program Size Formally Identical to Information Theory", J. ACM, 22, (1975), 329-340.
44.
Eykhoff, P, "System Identification - Parameter and State Estimation", Wiley, (1974).
45.
Box, G.E.P., and Jenkins, G.M, "Time Series Analysis, Forecasting and Control", Holden-Day, (1970).
46.
Whittle, P, "Prediction and Regulation by Linear Least-Squares Methods", EUP, (1963).
47.
Sherman, S, "Non-Mean-Square Error Criteria", IRE Trans. Info. Theory, IT-4, (1958), 125-126.
48.
Johnston, J, "Econometric Methods", McGraw-Hill, (1963).
49.
Ollongren, A, "Definition of Programming Languages by Interpreting Automata", Academic Press, (1974).
50.
Challis, M.F, "Algol W Language Specification and Programmer's Guide", University of Cambridge Computing Service, 3rd edition (August 1975).
51.
Deutsch, R, "Estimation Theory", Prentice-Hall, (1965).
52.
Bobrow, L.S, and Arbib, M.A, "Discrete Mathematics: Applied Algebra for Computer and Information Science", W.B. Saunders, (1974).
53.
Mihram, G.A, "Simulation: Statistical Foundations and Methodology", Academic Press, (1971).
54.
Zeigler, B.P, "Theory of Modelling and Simulation", Wiley, (1976).
55.
Mesarovic, M, and Pestel, E, "Mankind at the Turning Point", Hutchinson, (1975).
56.
Cajori, F, "Sir Isaac Newton's Mathematical Principles of Natural Philosophy and his System of the World, vol. 2", University of California Press (1962).
57.
Bunge, M, "The Myth of Simplicity", Prentice-Hall, (1963).
58.
Popper, K.R, "Unended Quest", Fontana/Collins, (1976).
59.
Hintikka, J, and Suppes, P, "Information and Inference", Reidel, (1970).
60.
Carnap, R, "The Continuum of Inductive Methods", University of Chicago Press, (1952).
61.
Hesse, M, "The Structure of Scientific Inference", Macmillan, (1974).
62.
Bar-Hillel, Y, "Language and Information", Addison-Wesley and The Jerusalem Academic Press, (1964).
63.
Young, P.C, Shellswell, S.H, and Neethling, C.G., "A Recursive Approach to Time Series Analysis", Cambridge University Engineering Department, Technical Report CUED/B-Control/TR16 (1971).
64.
De Bakker, J.W. "Semantics of Programming Languages" in: Tou, J.T. (ed), "Advances in Information Systems Science, vol. 2", Plenum Press, (1969), pp. 173-227.
65.
Lauer, P. "Formal Definition of Algol 60", IBM Laboratory, Vienna, Technical Report TR 25.088, (1968).
66.
Zimmermann, K. "Outline of a Formal Definition of Fortran", IBM Laboratory, Vienna, Technical Report LR 25.3.053, (1969).
67.
Neuhold, E.J. "The Formal Description of Programming Languages", IBM Systems Journal, 10, 2, (1971), pp. 86-112.
68.
Lucas, P., Lauer, P, Stigleitner, H, "Method and Notation for the Formal Definition of Programming Languages", IBM Laboratory, Vienna, Technical Report TR 25.087, (1968), (revised 1970).
69.
Aho, A.V. & Ullman, J.D. "The Theory of Languages", Mathematical Systems Theory, 2, 2, pp. 97-125, (1969).
APPENDIX A
Formal Semantics of Programming Languages.
A.1
Introduction
The formal definition of the semantics of a programming language is a notoriously difficult problem of computer science (64). Fortunately, a lot of progress has been made in recent years, so that several solutions now exist. Only one of these solutions, the so-called Vienna Method, will be described. This method has been chosen because it is the best documented, and because it has been used for the definition of several practical programming languages (65), (66). There need therefore be no doubts about its power. A very complete and careful account of the Vienna Method is given by Ollongren (49), but more concise, and in some ways clearer, descriptions are available in (67) and (68). In this appendix a brief introduction can be given. This introduction will be illustrated by formally defining a special language, called "Linear Model Language", or LML. This language is chosen for exposition because it is very simple - too simple, in fact, to need most of the sophistication of the Vienna Method.
The Vienna Method involves the definition of a language in four stages. First, the concrete syntax of the language is defined. This specifies those strings of characters which are valid programs in the language. The specification of the concrete syntax is invariably in Backus-Naur form; the concrete syntax is therefore a context-free grammar (69).
The concrete syntax indicates how each string of characters which forms a program is to be parsed. On completion of parsing certain characters (such as semicolons and comment strings in Algol) become redundant, and can be discarded. The remaining entities which appear on the nodes of the parsing tree are now mapped into entities to which semantic roles will eventually be assigned. This mapping is specified by defining a translator; this is a function which maps the structured object which is a (parsed) concrete program into another structured object called an abstract program. The set of structured objects which are valid abstract programs of the language is called the abstract syntax of the language.
The language definition is completed by defining an interpreting automaton. This is defined by specifying a set of structured states of the automaton, and a state transition function. It is the definition of this function which really determines how programs in the language are to be interpreted. A computation is viewed as a sequence of states of this automaton. The initial state is determined by the program and its data.
Tree-structured objects are all-pervasive in the Vienna Method. They are used to represent both the concrete and abstract syntax of a language, and to represent the states of the interpreting automaton. A special function, called the $\mu$-function, is introduced to carry out "tree-surgery", and is used extensively.
Finally, it is assumed that a metalanguage is available which can be used for the definition of the above entities.
A.2
Linear Model Language
Suppose that we wish to investigate only single-input, single-output systems, whose input and output values are rational numbers, and whose input-output relation is a finite-order difference equation, together with an additive random disturbance of the output. Let $u_i$ be the ith input observation of such a system, $y_i$ its ith output observation, and $d_i$ the ith random disturbance. Let $a_j$, $b_j$ be coefficients of the difference equation, so that the following equation holds:
$$y_i = a_1 y_{i-1} + \ldots + a_n y_{i-n} + b_0 u_i + \ldots + b_m u_{i-m} + d_i.$$
We can (informally) define a programming language as follows. Every program of the language is a list of integers and rationals which is given the interpretation:
$$n, m, a_1, a_2, \ldots, a_n, b_0, b_1, \ldots, b_m, d_1, d_2, \ldots, d_N.$$
The data for such a program is a similar list, with the interpretation
$$i, y_{i-1}, \ldots, y_{i-n}, u_i, \ldots, u_{i-m}.$$
Given such a program and such a set of data, the computation of $y_i$ in accordance with the above equation is invoked. If input observations $(u_{-m},\ldots,u_N)$, and output observations $(y_{-n},\ldots,y_N)$ of a system are obtained, then a certain (infinite) set of programs in this language will constitute models of the system defined by the observations $((u_1,\ldots,u_N),(y_1,\ldots,y_N))$. The terms $d_1,\ldots,d_N$ which appear in the program are, in fact, a look-up table. Fig. 7 shows the structure of each such model.
The programming language described above will be called Linear Model Language, or LML. Note that LML is not a universal language, in the sense that not every algorithm can be implemented in it. A trivial model of a system is obtained in LML if $m=n=b_0=0$.
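The informal definition of LML above can be illustrated by a small Python sketch. It is only an illustration under stated assumptions: the function name `run_lml`, the use of Python lists for the program and the data, and the use of `fractions.Fraction` for rationals are choices made here, not part of the definition of LML.

```python
from fractions import Fraction

def run_lml(program, data):
    """Compute y_i from an LML program and its data (illustrative sketch).

    program = [n, m, a_1..a_n, b_0..b_m, d_1..d_N]
    data    = [i, y_{i-1}..y_{i-n}, u_i..u_{i-m}]
    """
    n, m = program[0], program[1]
    a = program[2:2 + n]                    # a_1 .. a_n
    b = program[2 + n:2 + n + m + 1]        # b_0 .. b_m
    d = program[2 + n + m + 1:]             # look-up table d_1 .. d_N
    i = data[0]
    y_past = data[1:1 + n]                  # y_{i-1} .. y_{i-n}
    u = data[1 + n:1 + n + m + 1]           # u_i .. u_{i-m}
    return (sum(aj * yj for aj, yj in zip(a, y_past))
            + sum(bj * uj for bj, uj in zip(b, u))
            + d[i - 1])                     # d_i (the table is 1-indexed)

# A program with n=2, m=1 and a one-entry look-up table.
program = [2, 1, Fraction(3, 5), Fraction(-3), Fraction(0),
           Fraction(-141, 100), Fraction(-26, 5)]
data = [1, Fraction(1), Fraction(2), Fraction(3), Fraction(4)]
y1 = run_lml(program, data)

# The trivial model m=n=b_0=0 simply reads y_i from the look-up table.
trivial = run_lml([0, 0, Fraction(0), Fraction(7), Fraction(9)], [2, Fraction(5)])
```

In the trivial case the program reproduces the observations purely through its table of disturbances, which is why the table length matters when the size of a program is measured.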
A.3
Tree-Structured Objects and the $\mu$-Function.
As mentioned earlier, tree-structured objects are much used in the Vienna Method. We assume here that such objects are familiar. The following example should clarify their nature, and will also introduce some terminology and notation. A typical tree-structured object (or simply "object", for short), is the following entity A:
A =
  s1: e1
  s2:
    s1: e2
    s2: e3
    s3: B
$s_1, s_2, s_3$ are called simple selectors. The objects appearing at each node of the tree are themselves tree-structured, with the objects $e_1, e_2, e_3$, which appear at the terminal nodes of the tree, being regarded as degenerate trees. These objects are called elementary objects. The object B is some tree-structured object which is not necessarily elementary.
Every object has a finite number of nodes, and each node, other than the unique root, is associated with a unique preceding node. A successor node $n_j$ of a preceding node $n_i$ is "selected" by a simple selector s, and this is denoted by $n_j=s(n_i)$. Strictly speaking, $n_i$ and $n_j$ here denote objects rather than nodes. Thus, in the example, we have $e_1=s_1(A)$. The simple selectors associated with any node must be pairwise distinct. This makes it possible to select the object associated with any node by composing simple selectors. For example, in the above object we have $e_2=s_1\circ s_2(A)$, $e_3=s_2\circ s_2(A)$, $B=s_3\circ s_2(A)$, where $\circ$ denotes composition of selectors. Entities of the form $s_i\circ\ldots\circ s_j$ are called composite selectors. Note that reading a composite selector from left to right corresponds to "reading" an object from bottom to top.
The null object is associated with the empty tree, that is the tree with no nodes, and is denoted $\Omega$.
Let $\kappa$ denote a composite selector, and e an elementary object. Then the characteristic set of an object A is the set of all pairs $\langle\kappa:e\rangle$ such that $\kappa(A)=e$. An object can be defined by giving its characteristic set. For example, the above object A is defined by:
$A=\{\langle s_1:e_1\rangle,\langle s_1\circ s_2:e_2\rangle,\langle s_2\circ s_2:e_3\rangle,\ldots\}$.
The characteristic set of B is not known in this case, so this definition cannot be completed. But suppose that B were the object:
$B=\{\langle s_1:e_4\rangle,\langle s_2:e_5\rangle\}$
then we would have
$A=\{\langle s_1:e_1\rangle,\langle s_1\circ s_2:e_2\rangle,\langle s_2\circ s_2:e_3\rangle,\langle s_1\circ s_3\circ s_2:e_4\rangle,\langle s_2\circ s_3\circ s_2:e_5\rangle\}$.
We now introduce the $\mu$-function, which is used to perform operations on objects. The $\mu$-function takes two arguments, the first of which is an object A, and the second is a pair $\langle\kappa:B\rangle$, where $\kappa$ is a composite selector, and B is an object. The range of $\mu$ is the set of all objects. The value $\mu(A;\langle\kappa:B\rangle)$ is an object which is obtained from A by replacing $\kappa(A)$ by B in such a way that $\kappa(\mu(A;\langle\kappa:B\rangle))=B$. This is most clearly shown by examples (taken from Lucas et al (68)):
Let A = [tree diagram of the object A, with elementary objects $e_1,\ldots,e_4$ at its terminal nodes]
Then
(i) $\mu(A;\langle s_3:B\rangle)$ = [tree diagram: A with an additional successor $s_3$ selecting B]
(ii) $\mu(A;\langle s_1\circ s_2:B\rangle)$ = [tree diagram]
(iii) $\mu(A;\langle s_1\circ s_1\circ s_1\circ s_2:B\rangle)$ = [tree diagram]
In particular, if $B=\Omega$, we obtain:
(i) $\mu(A;\langle s_3:\Omega\rangle) = A$
(ii) $\mu(A;\langle s_1\circ s_2:\Omega\rangle)$ = [tree diagram]
(iii) $\mu(A;\langle s_1\circ s_1\circ s_1\circ s_2:\Omega\rangle)$ = [tree diagram]
Ollongren (49) and Lucas et al (68) define the $\mu$-function more precisely than has been done here.
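We now introduce the following notations:
(i) $\mu(A;\langle\kappa_1:B_1\rangle,\langle\kappa_2:B_2\rangle,\ldots,\langle\kappa_n:B_n\rangle) = \mu(\mu(A;\langle\kappa_1:B_1\rangle);\langle\kappa_2:B_2\rangle,\ldots,\langle\kappa_n:B_n\rangle)$, with $\mu(A;)=A$.
Example
Let A =
  s1: e1
  s2: e2
Then $\mu(A;\langle s_3:e_3\rangle,\langle s_1\circ s_2:e_4\rangle)$ =
  s1: e1
  s2:
    s1: e4
  s3: e3
Ollongren (49) gives conditions under which interchanging the arguments of the $\mu$-function leaves the value unchanged.
(ii) $\mu_0(\langle\kappa_1:B_1\rangle,\ldots,\langle\kappa_n:B_n\rangle) = \mu(\Omega;\langle\kappa_1:B_1\rangle,\ldots,\langle\kappa_n:B_n\rangle)$
Thus $\mu_0$ is a function which "creates" objects.
Example
$\mu_0(\langle s_1:e_1\rangle,\langle s_1\circ s_2:B\rangle,\langle s_2\circ s_2:e_3\rangle)$ =
  s1: e1
  s2:
    s1: B
    s2: e3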
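The selector and $\mu$-function notation of this section can be mimicked in a few lines of Python. This is a sketch under assumptions made here, not the formal definition in (49) or (68): objects are represented as nested dictionaries keyed by simple selectors, elementary objects as strings, the null object as `None`, and a composite selector $s_1\circ s_2$ as the tuple `("s1", "s2")`, applied from right to left.

```python
from functools import reduce

OMEGA = None  # stands for the null object

def select(obj, kappa):
    """Apply the composite selector kappa = (s_1, ..., s_k), i.e. s_1 o ... o s_k."""
    for s in reversed(kappa):          # the rightmost selector is applied first
        if not isinstance(obj, dict):
            return OMEGA
        obj = obj.get(s, OMEGA)
    return obj

def _mu1(obj, kappa, new):
    """mu(obj; <kappa:new>) for a single pair; returns a new object."""
    if not kappa:
        return new
    *rest, first = kappa               # 'first' is the selector applied first
    base = dict(obj) if isinstance(obj, dict) else {}
    child = _mu1(base.get(first, OMEGA), tuple(rest), new)
    if child is OMEGA:
        base.pop(first, None)          # assigning the null object deletes a branch
    else:
        base[first] = child
    return base if base else OMEGA

def mu(obj, *pairs):
    """mu(A; <k1:B1>, ..., <kn:Bn>), with the pairs applied from left to right."""
    return reduce(lambda acc, pair: _mu1(acc, *pair), pairs, obj)

def mu0(*pairs):
    """mu_0 creates an object from the null object."""
    return mu(OMEGA, *pairs)

def characteristic_set(obj, prefix=()):
    """All pairs (kappa, e) with kappa(obj) = e, e elementary."""
    if obj is OMEGA:
        return set()
    if not isinstance(obj, dict):
        return {(prefix, obj)}
    pairs = set()
    for s, sub in obj.items():
        pairs |= characteristic_set(sub, (s,) + prefix)
    return pairs

A = {"s1": "e1", "s2": "e2"}
result = mu(A, (("s3",), "e3"), (("s1", "s2"), "e4"))
created = mu0((("s1",), "e1"), (("s1", "s2"), "B"), (("s2", "s2"), "e3"))
```

Here `result` corresponds to the example for notation (i) and `created` to the example for notation (ii). Objects are never modified in place, so `A` is unchanged after the call.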
A.4
Concrete Syntax.
The concrete syntax of a programming language can be defined by using the Backus-Naur form of writing production rules. This is a shorthand method of defining a grammar.
Suppose that there exists a finite non-empty set $\Sigma$ of terminals. Typical elements of $\Sigma$ are: b, 2, *, begin, and so on. Let $\Sigma^*$ denote the set of all finite strings of elements of $\Sigma$. Also, suppose N is a finite non-empty set of non-terminals such that $N\cap\Sigma=\emptyset$ and $N^*$ is the set of all finite strings of elements of N. Let $V=\Sigma\cup N$, and $V^*=(\Sigma\cup N)^*$. Let $V^+=V^*-\{\Lambda\}$, where $\Lambda$ is the empty string. Then the set of production rules is the set
$\Pi=\{(\alpha,\beta) : \alpha\in V^*\times N\times V^* \;\&\; \beta\in V^+\}$.
Each pair $(\alpha,\beta)\in\Pi$ is written as $\alpha\to\beta$. In Backus-Naur form, the set of production rules $\{\alpha\to\beta_1, \alpha\to\beta_2, \ldots, \alpha\to\beta_n\}$ is denoted by the single expression
$\langle\alpha\rangle ::= \beta_1 | \beta_2 | \ldots | \beta_n$.
The brackets <> are used to denote non-terminals. Backus-Naur notation can be used to express production rules $(\alpha,\beta)$, providing that $\alpha\in N$. Such production rules are called context-free.
A grammar G is a 4-tuple $G=(N,\Sigma,P,S)$, where P is a finite non-empty subset of $\Pi$, and $S\in N$ is the start symbol. If each production of a grammar is context-free then the grammar is said to be context-free. A context-free grammar can be conveniently defined by a finite set of expressions in Backus-Naur form.
If there exist $\delta_1,\delta_2\in V^*$ and $\alpha\to\beta\in P$ such that $\gamma_1=\delta_1\alpha\delta_2$ and $\gamma_2=\delta_1\beta\delta_2$ then $\gamma_1\Rightarrow\gamma_2$. If $\gamma_i\in V^*$ and $\gamma_{i-1}\Rightarrow\gamma_i$ $(i=1,2,\ldots,n)$, then $\gamma_0\Rightarrow^*\gamma_n$ ($\gamma_n$ is derived from $\gamma_0$). The grammar G is said to generate the language
$L(G)=\{x : S\Rightarrow^* x \;\&\; x\in\Sigma^*\}$.
Two grammars are equivalent if they generate the same language.
The Vienna Method defines two grammars for each language. One is the concrete syntax, the other the abstract syntax. Lucas et al (68) explain the distinction most clearly:
"An abstract syntax is one which only specifies the expressions of the language as to the structures significant for their subsequent interpretation and not as to how they are to be expressed for the purpose of communication either to oneself or to others. A concrete syntax specifies the expressions of the language as a set of character strings".
The concrete syntax of LML is defined as follows:
<program> ::= <integer>,<integer>,<rational>,...,<rational>.
<rational> ::= +<unsigned rational>|-<unsigned rational>|0
<unsigned rational> ::= <integer>|.<integer>|<integer>.<integer>
<integer> ::= <digit>|<integer><digit>
<digit> ::= 0|1|2|3|4|5|6|7|8|9
The terminals of LML are: 0 1 2 3 4 5 6 7 8 9 . , + -
Positive rationals are required to be signed so that the terms in the look-up table will be coded in a size-capturing manner (cf. chapter 7). This requirement is extended to the other terms solely for simplicity of definition.
An example of a valid string in LML is:
2,1,+.6,-3,0,-1.41,-5.2.
This program can be parsed (using the syntax definition) to give the object (not shown in full):
[tree diagram of the parsed concrete program, not shown in full]
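As an illustration of the concrete syntax, the following Python sketch recognises LML strings and splits them into the successive components of a parsed concrete program. It rests on assumptions made here: a rational is taken to be either 0 or a signed number written as d, .d or d.d (d a string of digits), a program is taken to contain at least one rational, and the components are labelled s1, s2, ... in order of appearance. A regular expression is used in place of a parser generated from the grammar; the names `is_lml_program` and `parse_lml` are hypothetical.

```python
import re

# Assumed reading of the LML grammar (see the lead-in above).
UNSIGNED = r"(?:\d+\.\d+|\.\d+|\d+)"
RATIONAL = rf"(?:[+-]{UNSIGNED}|0)"
PROGRAM = re.compile(rf"\d+,\d+(?:,{RATIONAL})+\.")
TOKEN = re.compile(rf"[+-]?{UNSIGNED}|[,.]")

def is_lml_program(text):
    """True if text is a valid LML string under the assumed grammar."""
    return PROGRAM.fullmatch(text) is not None

def parse_lml(text):
    """Return the components of a valid LML string as {'s1': ..., 's2': ...}."""
    if not is_lml_program(text):
        raise ValueError("not an LML program")
    tokens = TOKEN.findall(text)
    return {f"s{k}": tok for k, tok in enumerate(tokens, start=1)}

parsed = parse_lml("2,1,+.6,-3,0,-1.41,-5.2.")
```

For the example string this yields fourteen components: the two integers, the five rationals, the six commas and the final full stop.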
A.5
Abstract Syntax
We introduce the following notational conventions: if an object x satisfies a predicate P, we write is-P(x). The set of objects which satisfy P is denoted $\widehat{\text{is-}P}$. That is, $\widehat{\text{is-}P}=\{x : \text{is-}P(x)\}$. The set $\widehat{\text{is-}P}$ is defined by an expression of the form
is-P = (<s-P1: is-P1>, <s-P2: is-P2>, ..., <s-Pn: is-Pn>)
which indicates that for every $x\in\widehat{\text{is-}P}$,
$x=\mu_0(\langle\text{s-}P_1:x_1\rangle,\langle\text{s-}P_2:x_2\rangle,\ldots,\langle\text{s-}P_n:x_n\rangle)$,
where $x_1\in\widehat{\text{is-}P_1}$, $x_2\in\widehat{\text{is-}P_2}$, ..., $x_n\in\widehat{\text{is-}P_n}$. If is-P = (<s-P1: is-P1>) then we write is-P = is-P1. A predicate may also be defined by using the disjunction operator V, e.g.:
is-P = is-P1 V is-P2,
which denotes that $x\in\widehat{\text{is-}P}$ only if $x\in\widehat{\text{is-}P_1}\cup\widehat{\text{is-}P_2}$. It is assumed that certain predicates are satisfied by subsets of the elementary objects.
Using this notation, the abstract syntax of LML is defined as follows:
is-program = (<s-n: is-integer>, <s-m: is-integer>, <s-rational-list: is-rational-list>)
is-rational-list = (<s-head: is-rational>, <s-tail: is-rational-list V is-$\Omega$>)
It is assumed that $\widehat{\text{is-}\Omega}=\{\Omega\}$, and that the predicates is-integer and is-rational are satisfied by (countably) infinite sets of elementary objects. Every LML "abstract program" satisfies the predicate is-program. For example, the abstract program corresponding to the concrete LML program introduced in section A.4 is the object
p =
  s-n: 2
  s-m: 1
  s-rational-list:
    s-head: +.6
    s-tail:
      s-head: -3
      s-tail:
        s-head: 0
        s-tail:
          s-head: -1.41
          s-tail:
            s-head: -5.2
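The predicates of the abstract syntax can be written as ordinary Boolean functions. The sketch below assumes, purely for illustration, that objects are nested Python dictionaries keyed by selector names, that the null object is `None` (or an absent key), and that integers and rationals are represented by `int` and `fractions.Fraction`.

```python
from fractions import Fraction

OMEGA = None  # stands for the null object

def is_integer(x):
    return isinstance(x, int) and not isinstance(x, bool)

def is_rational(x):
    return isinstance(x, (int, Fraction)) and not isinstance(x, bool)

def is_rational_list(x):
    """is-rational-list = (<s-head: is-rational>, <s-tail: is-rational-list V is-OMEGA>)"""
    if not isinstance(x, dict) or not set(x) <= {"s-head", "s-tail"}:
        return False
    tail = x.get("s-tail", OMEGA)
    return is_rational(x.get("s-head")) and (tail is OMEGA or is_rational_list(tail))

def is_program(x):
    """is-program = (<s-n: is-integer>, <s-m: is-integer>, <s-rational-list: ...>)"""
    return (isinstance(x, dict)
            and set(x) == {"s-n", "s-m", "s-rational-list"}
            and is_integer(x["s-n"])
            and is_integer(x["s-m"])
            and is_rational_list(x["s-rational-list"]))

# The abstract program corresponding to 2,1,+.6,-3,0,-1.41,-5.2.
p = {"s-n": 2, "s-m": 1,
     "s-rational-list": {"s-head": Fraction(3, 5), "s-tail":
                         {"s-head": Fraction(-3), "s-tail":
                          {"s-head": Fraction(0), "s-tail":
                           {"s-head": Fraction(-141, 100), "s-tail":
                            {"s-head": Fraction(-26, 5)}}}}}}
```

The recursion in `is_rational_list` mirrors the recursive predicate definition, with the absent `s-tail` of the last element playing the role of the null object.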
How this object is obtained from the concrete program is specified by the translator, which will be defined in the next section. Note that writing "+.6" for the object s-head $\circ$ s-rational-list(p) does not attach any meaning to that object. "+.6" is merely an arbitrary label for that object, although it is chosen to have mnemonic value.
Most discussions of the Vienna Method point out that a language can be defined by its abstract syntax, and that the concrete syntax need not be considered in defining it (and is viewed as one of its realisations). If we were to adopt this point of view, then an abstract program would be viewed as a program. This alternative is not satisfactory for our purposes. It is essential for us to have a useful measure of the size of a program, and the length of the string of terminals which constitutes a concrete program is such a size measure. The abstract syntax introduced above, however, is not a "grammar" (as defined in section A.4), since it assumes an infinite set of "terminals". As discussed in chapter 3, Blum (15) has introduced a pair of axioms which must be satisfied by a measure of the size of "machines" (in our case, programs). These are:
(1) there exists at most a finite number of programs of any given size, and
(2) there exists an effective procedure for deciding, for any y, which programs are of size y.
Clearly, the first axiom is not satisfied if infinite sets of terminals are allowed.
Furthermore, we want programs to describe effective procedures for computing functions. Defining a language with infinitely many terminals would correspond to defining a Turing machine with infinitely many tape symbols. This would be a fundamental change in the notion of "computability".
To overcome these objections, it would be possible to define an abstract syntax for LML which specified a finite set of terminals. The object at each terminal node of an abstract program would then satisfy one of the predicates is-digit or is-sign, say, and these would be defined by
is-digit = is-0 V is-1 V ... V is-9
is-sign = is-+ V is--,
and $\widehat{\text{is-0}}=\{0\}$, ..., $\widehat{\text{is-9}}=\{9\}$, $\widehat{\text{is-+}}=\{+\}$, $\widehat{\text{is--}}=\{-\}$.
In this case the assembly of the digits into integers and rationals would have to be performed by the interpreting automaton, rather than by the translator.
A.6
The Translator
The translator is a function which maps the set of parsed concrete programs in a language into the set of abstract programs. To distinguish between concrete and abstract objects we introduce the conventions: is-<program>(x) means that x is a parsed concrete program, namely an object such as that shown in section A.4. More precisely, for LML we have, for some positive integer k:
is-<program> = (<s1: is-<integer>>, <s2: is-,>, ..., <s(2k-1): is-<rational>>, <s(2k): is-.>)
The predicates is-,, is-<integer>, etc. are obtained from the concrete syntax in exactly the same way. Obviously, we have $\widehat{\text{is-,}}=\{,\}$, $\widehat{\text{is-0}}=\{0\}$, etc.
In the following definition, the statement if...then...else... is a statement in the metalanguage. It is assumed that is-<program>(p) and is-<rational>($x_i$). The LML translator, trans-program, is defined as:
trans-program(p) = $\mu_0$(<s-n: trans-integer($s_1$(p))>, <s-m: trans-integer($s_3$(p))>, <s-rational-list: makelist($s_5$(p), $s_7$(p), ..., $s_{2k-1}$(p))>)
where
makelist($x_1, x_2, \ldots, x_n$) = $\mu_0$(<s-head: trans-rational($x_1$)>, <s-tail: if $x_2=\Omega$ & ... & $x_n=\Omega$ then $\Omega$ else makelist($x_2, \ldots, x_n$)>)
and the functions
trans-rational: $\widehat{\text{is-<rational>}} \to \widehat{\text{is-rational}}$
trans-integer: $\widehat{\text{is-<integer>}} \to \widehat{\text{is-integer}}$
are not further defined. For our purposes these two functions are best thought of as the usual mappings onto the rational numbers. (In an actual implementation, it may be more useful to consider them as mappings into bit-patterns. In this case the sets $\widehat{\text{is-rational}}$ and $\widehat{\text{is-integer}}$ would be finite sets, due to the fixed word-length of practical computers).
Note that trans-program(p) $\in\widehat{\text{is-program}}$, and makelist($x_1, \ldots, x_n$) $\in\widehat{\text{is-rational-list}}$.
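The translator can be sketched in Python as follows. The representation is an assumption made for illustration only: a parsed concrete program is taken to be a dictionary with keys "s1", ..., "s2k" holding the successive integers, rationals and punctuation marks as strings, abstract objects are nested dictionaries, and trans-integer and trans-rational are realised with `int` and `fractions.Fraction`.

```python
from fractions import Fraction

def trans_integer(token):
    return int(token)

def trans_rational(token):
    # Fraction accepts signed decimal strings such as "+.6" or "-1.41".
    return Fraction(token)

def makelist(tokens):
    """Build the s-head / s-tail list; the last element has no s-tail."""
    head, rest = tokens[0], tokens[1:]
    node = {"s-head": trans_rational(head)}
    if rest:
        node["s-tail"] = makelist(rest)
    return node

def trans_program(parsed):
    """Map a parsed concrete program {s1..s2k} to an abstract program."""
    two_k = len(parsed)
    rationals = [parsed[f"s{j}"] for j in range(5, two_k, 2)]  # s5, s7, ..., s(2k-1)
    return {"s-n": trans_integer(parsed["s1"]),
            "s-m": trans_integer(parsed["s3"]),
            "s-rational-list": makelist(rationals)}

tokens = ["2", ",", "1", ",", "+.6", ",", "-3", ",", "0", ",",
          "-1.41", ",", "-5.2", "."]
parsed = {f"s{k}": tok for k, tok in enumerate(tokens, start=1)}
p = trans_program(parsed)
```

The commas and the final full stop sit at the even-numbered selectors and are simply never selected, which is how the translator discards the characters that became redundant after parsing.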
A.7
The Interpreting Automaton
Following Ollongren (49), we define an interpreting automaton to be a 5-tuple $(O, \text{is-state}, \xi_0, \Lambda, F)$, where O is the set of tree-structured objects already introduced, and is-state is a predicate over O. Objects satisfying is-state are states of the automaton. $\xi_0\in\widehat{\text{is-state}}$ is the initial state of the automaton, and F is a set of final states. $\Lambda$ is the state transition function; however, its range is not $\widehat{\text{is-state}}$, but the power set of $\widehat{\text{is-state}}$. $\Lambda(\xi)$ is thus a set of states in general, although in our definition of LML, $\Lambda(\xi)$ will always be a single state.
A.7.1 The State
The state of the interpreting automaton is structured. The structure depends on the language to be defined, and for the definition of LML can be rather simple. A language with block structure, procedures, variable identifiers of various types, conditional and goto statements, and so on, will need a rather more complicated set of states.
For the LML interpreting automaton, or LML machine, we define
is-state = (<s-c: is-c>, <s-dn: is-dn>, <s-counter: is-integer>).
is-dn is a predicate satisfied by a denotation directory, and is defined by
is-dn = (<s-data: is-data>, <s-y: is-rational>, <s-parno: is-integer V is-$\Omega$>)
where
is-data = (<s-i: is-integer>, <s-list: is-rational-list>).
The data for a program, namely the sequence $i, y_{i-1}, y_{i-2}, \ldots$ appears in the initial state as the object
  s-i: i
  s-list:
    s-head: $y_{i-1}$
    s-tail:
      s-head: $y_{i-2}$
      s-tail: ...
We do not specify how this is achieved. Similarly, the result of the computation, $y_i$, is the object s-y $\circ$ s-dn($\xi_F$), where $\xi_F\in F$, and we do not specify how it is output. The number m+n, which is required for the correct interpretation of the program, is stored in s-parno $\circ$ s-dn($\xi$). (The term "denotation directory" is taken over from (49) and (68). For LML this directory is simpler than in (49) and (68), but it serves essentially the same purpose, namely storing intermediate and final results).
The most complex part of the state is the control, which is an object satisfying the predicate
is-c = (<s-in: is-in>, <s-al: is-obj-list>, <s-ri: is-dum V is-$\Omega$>, <r: is-c>) V is-$\Omega$,
where the following abbreviations have been used: c: control, in: instruction, al: argument list, obj: object, ri: return information, dum: dummy.
In this definition, $\widehat{\text{is-in}}$ is a subset of the elementary objects, called the set of instructions, and $\widehat{\text{is-dum}}$ is a subset of the elementary objects called the set of dummy names. r is a simple selector, different from s-in, s-al or s-ri. is-obj-list is a predicate satisfied by lists of objects, which we do not define further; Ollongren (49) gives an extensive discussion of lists.
An example of a control is the object:
  s-in: in1
  s-al: a
  r:
    s-in: in2
    s-al: x
    s-ri: a
This particular control may have the following effect: first the instruction in2 is performed, with x as its argument. The result of carrying out in2 is assigned to the dummy name a. in2 is then deleted, so that the control part of the next state is
  s-in: in1
  s-al: in2(x)
in1 is now carried out, with in2(x) as its argument. In this case, instruction in2 is said to be contracting. On the other hand, it may be that carrying out in2 requires first carrying out some other instruction in4 on x, and then an instruction in3 on in4(x). In this case in2 is said to be expanding, and carrying it out results in the next state having the control:
  s-in: in1
  s-al: a
  r:
    s-in: in3
    s-al: b
    s-ri: a
    r:
      s-in: in4
      s-al: x
      s-ri: b
If both in4 and in3 are contracting, the controls of the next two states will be:
  s-in: in1
  s-al: a
  r:
    s-in: in3
    s-al: in4(x)
    s-ri: a
and
  s-in: in1
  s-al: in3(in4(x)) = in2(x)
If an instruction is expanding, then performing it leaves all components of the state unchanged except the control itself. However, if it is contracting, then its effect may be to change any of the components of the state (in our case, s-counter($\xi$) and s-dn($\xi$), as well as s-c($\xi$)).
We need some definitions for later use.
The set of
205
control
selectors
selectors ~he
of a control
C is the set of composite
~(C)={K:K=roro...or
identity
of a control
selector)
&K(C)¥Q},
if C=~.
if C ~ ,
The terminal
C is the composite
and is I control
selector
selector
T ( C ) = { Y : T e ~ ( C ) & r o T % ~ ( C ) }. If K=r n is a control where
rn=rorg...0r
precedin~
selector
(n compositions),
control
selector
If K is a control
& s-alopreci(K)(C)
selectors
of i n s t r u c t i o n s
selectors
control C, then
o K(C)#~}
is the set of composite
control
of a n o n - e m p t y
(C,K)={s-alopreci(K):i~l
arguments
then the mth
(O~m~n).
selector
=s-ri
and n)l,
control,
of K is
prec m ( K ) = { K ' : r m o K ' = K }
prec-arg
of a n o n - e m p t y
which
select those
associated with preceding
of K w h i c h
are equal
to the dummy name
a s s o c i a t e d with K. If K is a control
selector
of a n o n - e m p t y
then the derived
return
the set r i ( C , K ) =
prec-arg
included because
these two sets differ
for the r e l a t i v e l y The initial
Here, p
(C,K).
a s s o c i a t e d with K is (This d e f i n i t i o n in
state of the LML m a c h i n e
is
(49), but coincide
(<s-data:
introduced
is
int-prog>,<s-al:p>)>,<s-counter: d>,<s-y:
is the LML program,
is-program
C
simple LML machine).
~o=~o(<S-C:~o(<S-in: <s-dn:~
information
control
which
I>,
O>)>). satisfies
in section A.5.
the p r e d i c a t e
The object d
206
satisfies
the p r e d i c a t e
section,
int-prog
is-data d e f i n e d
is an i n s t r u c t i o n
earlier
in this
w h i c h will be defined
later. The set of final states of the LML m a c h i n e F={~:is-state(~) A sequence
~o,~i,...
the LML machine. terminates. A.7.2
& s-c(~)=~}. , where
~i+l~A(~i ) is a c o m p u t a t i o n
The State T r a n s i t i o n
interpretin~
of
If, for some i, ~i e F then the c o m p u t a t i o n
(Every LML c o m p u t a t i o n
W i t h every
is the set
instruction
function
Function in
~in"
of a state ~, and K a control and let ARG= s-al.K(C)
terminates).
e is-in is a s s o c i a t e d Let C be a n o n - e m p t y selector
of C.
an
control
Let s-in-K(C)=in,
be the list of arguments
of in.
Then ~in(ARG,$,K)
= i f PI(ARG,~)
then
gl
else if P2(ARG,~)
then
g2
then
gm'
es___~e 1 . else if Pm(ARG,~) where PI,P2,...,Pm
are p r e d i c a t e s
(m~l),
and gj has one of
two forms: (i)
For the case of c o n t r a c t i n g
control,
gj=~(~(~ (~;) ;{:TEri(C,K) }) ; <s-counter: where
eJo ande3 are objects.
deletes
the i n s t r u c t i o n
in,its
is-integer>,<s-dn:E~(ARG)>)
In this e x p r e s s i o n argument
the innermost
list and its return
207
information,
the m i d d l e ~ r e t u r n s the o b j e c t
p r e c e d i n g control
selectors,
EJ(ARG) o
to
and the o u t e r m o s t ~ alters
c o m p o n e n t s of the state o t h e r than the control. (ii)
For the case of expanding control,

g_j = μ(ξ; <κ∘s-c: μ(ε^j(ARG); <s-ri: s-ri∘κ∘s-c(ξ)>)>),

where ε^j(ARG) satisfies the predicate is-c. In this case the inner μ associates the return information of κ(C) with the new control ε^j(ARG), and the outer μ replaces the control κ(C) with the new object thus created.

The Vienna Method uses a system of instruction schemata to define interpreting functions rather more concisely and in a more readable fashion than the above expressions. However, we shall not describe this feature, since it is
feasible to define LML in the above manner.

It is now possible to define the state transition function:

Λ(ξ) = {q : q = φ_in(ARG, ξ, κ) & κ ∈ T(s-c(ξ)) & in = s-in∘κ∘s-c(ξ) & ARG = s-al∘κ∘s-c(ξ)}.

From this definition it is apparent that the state transition is determined by always carrying out the instruction associated with the terminal control of the state, namely the instruction occurring at the "deepest" level of the control (cf. examples in section A.7.1). In general (although not for LML), T(s-c(ξ)) will be a set containing more than one control selector. Hence our earlier remark that Λ(ξ) will in general be a set of states, rather than a single state. In such a case, it does not matter which of the terminal instructions is performed first.
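The way a computation proceeds by repeatedly executing a terminal instruction can be illustrated with a minimal Python sketch. It is not the LML machine itself: the `Node` class, the `OPS` table and the dictionary used as a state are hypothetical names introduced here, only contracting instructions are modelled, and a computation is taken to end when the control has been consumed.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Hypothetical contracting instructions: each returns a value to its caller.
OPS = {"sum": lambda x, y: x + y, "product": lambda x, y: x * y}


@dataclass
class Node:
    """One control node: instruction, argument list, pending sub-instructions."""
    instr: str
    args: List[Any]
    children: List["Node"] = field(default_factory=list)
    ret_slot: Optional[int] = None  # argument position in the parent (return information)


def transition(state):
    """One state transition: carry out an instruction at the deepest level."""
    parent, node = None, state["control"]
    while node.children:                 # descend to a terminal node
        parent, node = node, node.children[0]
    value = OPS[node.instr](*node.args)
    if parent is None:                   # nothing left in the control: final state
        return {"control": None, "result": value}
    parent.children.pop(0)               # delete the executed instruction ...
    parent.args[node.ret_slot] = value   # ... and return its value to the parent
    return state


def run(state):
    """Iterate the transition until a final state (empty control) is reached."""
    while state["control"] is not None:
        state = transition(state)
    return state["result"]


# Control for sum(product(2, 3), 4): product is terminal and is executed first.
tree = Node("sum", [None, 4], [Node("product", [2, 3], ret_slot=0)])
result = run({"control": tree, "result": None})
```

In this sketch there is always exactly one terminal node, which mirrors the remark that for LML the set of terminal control selectors never contains more than one element.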
It is the specification of the interpreting functions of an interpreting automaton which assigns meaning to an abstract program.

A.7.3 Interpreting Functions for LML

We now complete the definition of LML by defining a set of interpreting functions for it. The instructions to be defined are as follows:

Instruction      Type          Domain
int-prog         expanding     is-program
int-mn           expanding     (is-integer)^2
set-mn           contracting   is-integer
int-prog-list    expanding     is-rational-list
updatey          contracting   is-rational
product          contracting   (is-rational)^2
sum              contracting   (is-rational)^2

We assume that the binary arithmetic operators + and * are available. The remarks at the end of section A.6 apply to these.

(1)
int-prog

φ_int-prog(p, ξ_0, I) = μ(ξ_0; <s-c: ε(p)>)

where ε(p) is the control in which int-mn has the argument list (s-m(p), s-n(p)) and int-prog-list has the argument s-rational-list(p). [Tree diagram of ε(p) not recoverable.]

(2) int-mn

φ_int-mn((x, y), ξ, κ) = μ(ξ; <κ∘s-c: ε(x, y)>)

where ε(x, y) is the control containing set-mn and the argument list (x, y). [Tree diagram of ε(x, y) not recoverable.]

(3) set-mn

φ_set-mn(x, ξ, κ) = μ(δ(ξ; [...]); <s-dn: μ(s-dn(ξ); <s-parno: x>)>)

Note: set-mn puts the value m+n into s-parno∘s-dn(ξ).

(4)
int-prog-list

φ_int-prog-list(x, ξ, κ) = if s-counter(ξ) < s-parno∘s-dn(ξ) + 2
                           then μ(ξ; [...])
                           else μ(ξ; <κ∘s-c: ε_2(x)>)

where ε_1(x) is the control containing int-prog-list with argument s-tail(x), product with argument list (s-head(x), s-head∘s-list∘s-data∘s-dn(ξ)), and the argument list (u, s-y∘s-dn(ξ)), and ε_2(x) is the control containing updatey and sum, the latter with argument list (s-y∘s-dn(ξ), s-head∘(s-tail)^i(x)), where i = s-i∘s-data∘s-dn(ξ). [Tree diagrams of ε_1(x) and ε_2(x) not recoverable.]

(5) updatey

φ_updatey(x, ξ, κ) = μ(μ(ξ; <κ∘s-c: Ω>); <s-dn: μ(s-dn(ξ); <s-y: x>, <s-list∘s-data: s-tail∘s-list∘s-data∘s-dn(ξ)>)>, <s-counter: s-counter(ξ) + 1>)

Note: updatey puts an intermediate value into s-y∘s-dn(ξ), brings the next data item to the top of s-list∘s-data∘s-dn(ξ), and increases s-counter(ξ) by 1.

(6) product

φ_product((x, y), ξ, κ) = μ(δ(ξ; [...]); [...]), where τ ∈ ri(s-c(ξ), κ).

(7) sum

φ_sum((x, y), ξ, κ) = μ(δ(ξ; [...]); [...]), where τ ∈ ri(s-c(ξ), κ).
In order to clarify the above definitions, some of the steps in an LML computation are shown below. To save space, only the control and those parts of the state which have just changed are shown.

[State diagrams up to ξ_4: the control passes from int-prog to int-mn and int-prog-list, then to set-mn and sum with argument list (m,n); in ξ_4, s-parno of s-dn holds m+n. Tree layout not recoverable.]

If m+n > 0 then

[State diagrams ξ_5 to ξ_8: the control holds int-prog-list, updatey, sum and product; product is applied to (a_1, y_{i-1}), sum to (a_1*y_{i-1}, 0), and updatey to a_1*y_{i-1}, after which s-counter is 2. Tree layout not recoverable.]

A sequence like ξ_5, ξ_6, ξ_7, ξ_8 is now repeated until s-counter(ξ_i) = m+n+2, whereupon we get

[State diagrams ξ_{i+1} to ξ_{i+3}: sum is applied to the accumulated value and d_i, and updatey to their sum; in ξ_{i+3}, s-counter = m+n+3, s-parno = m+n and s-y holds y_i. Tree layout not recoverable.]
ξ_{i+3} ∈ F, so the computation has terminated. Its result is available in s-y∘s-dn(ξ_{i+3}).

It should be remarked that the above definitions are not quite complete. There are restrictions on an LML program which cannot be expressed in the earlier specifications of concrete and abstract LML. These restrictions are, that the number of parameters be compatible with the values of m and n, and that the value of i should not exceed N, the length of the look-up table (the number of data items). These restrictions can be expressed in the definitions of the LML instructions. This is simply done by entering an error state if any of these conditions are violated. In languages like Algol, this technique can also be used to specify restrictions which cannot be expressed in the context-free grammar, and which would otherwise call for context-sensitive grammars (see (49)).
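The idea of enforcing such restrictions inside the instruction definitions, by entering an error state, can be sketched as follows. This is only an illustration under stated assumptions: the function name `check_restrictions` and the string states are hypothetical, and the compatibility rule "m + n + 1 parameters" is an assumption suggested by the computation shown above, not a rule stated in the text.

```python
def check_restrictions(m, n, parameters, i, table):
    """Return "ok", or "error" if a restriction on the LML program is violated.

    ASSUMPTION: 'compatible with the values of m and n' is taken to mean
    that exactly m + n + 1 parameters are supplied.
    """
    expected = m + n + 1
    if len(parameters) != expected:
        return "error"        # wrong number of parameters for this m and n
    if i > len(table):
        return "error"        # i exceeds N, the length of the look-up table
    return "ok"


# A program with m = 1, n = 2 and a look-up table of N = 5 data items.
state_good = check_restrictions(1, 2, [0.1, 0.2, 0.3, 0.4], 3, [0.0] * 5)
state_short = check_restrictions(1, 2, [0.1, 0.2], 3, [0.0] * 5)
state_past_end = check_restrictions(1, 2, [0.1, 0.2, 0.3, 0.4], 6, [0.0] * 5)
```

A real interpreter would make the error state a final state of the machine, so that the computation still terminates.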
A.8 Summary

The Vienna method of defining programming languages has been described. This method includes the formal definition of the semantics of a language, and is sufficiently powerful to be used for the definition of practical programming languages. It has been used here for the definition of the simple and special-purpose Linear Model Language. This has been done both to illustrate the method, and in order to make familiar a rather broader notion of "programming language" than is usual.

The Vienna Method of language definition is used in chapter 5 to formalise the notion of a "fragment" of a language.
APPENDIX B

Syntax of the AlgolW-Support of the Gas-Furnace Models

This appendix contains the concrete syntax of the AlgolW-support of the five models of section 6.3.2. It is based on the AlgolW syntax specification given in (50). The numbers in brackets to the right of subheadings indicate the relevant sections of (50), in order to facilitate comparison. Standard procedure statements are new non-terminals which do not appear in (50) (cf. sec. 6.3.1). The symbol "t" may be replaced by either "real" or "integer", in accordance with the rules specified in sections 1.1, 1.5, 1.5.3, I. and 1.6.2 of (50).

1. Identifiers (1.2)
[...] ::= [...]
[...] ::= [...]
<standard procedure identifier> ::= READ | READON | WRITE
[...] ::= E | I | J | N | U | V | W | Y | Z
[...] ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(Note: each of these appears in every look-up table).
[...] list> ::= [...] | [...] list>, [...]

2. Numbers (1.3.1)

[...] ::= [...]
[...] ::= [...] number>. [...] number> | [...] . [...]
[...] ::= [...] | [...] number> [...]

3. Declarations (1.4)

<declaration> ::= <simple variable declaration> | [...]

3.1 Simple Variable Declarations (1.4.1)

<simple variable declaration> ::= <simple type> [...]
<simple type> ::= INTEGER | REAL

3.2 Array Declarations (1.4.2)

[...] ::= <simple type> ARRAY [...] list> ([...])
[Five further productions of this subsection are not recoverable.]

4. Expressions (1.5)

[...] ::= <simple t expression>

4.1 Variables (1.5.1)

<simple t variable> ::= [...] | [...]
[...] ::= <simple t variable>
[...] ::= [...] array identifier> (<subscript list>)
<subscript list> ::= <subscript>
<subscript> ::= [...] expression>

4.2 Arithmetic Expressions (1.5.3)

<simple t expression> ::= [...] | <simple t expression> + [...] | <simple t expression> - [...]
[...] ::= [...] | [...] * [...]
[...] ::= [...]
[...] ::= [...] | [...]

4.3 Logical Expressions (1.5.4)

[...] ::= [...]
[...] ::= <simple t expression> [...] operator> <simple t expression>
[...] ::= <

5. Statements (1.6)

<program> ::= [...] .
(Note: we do not provide a specification of the syntax of [...]).
<statement> ::= <simple statement> | [...]
<simple statement> ::= [...] | [...] | <standard procedure statement>

5.1 Blocks (1.6.1)

[...] ::= [...] <statement> END
[...] ::= [...] | [...] <statement> ;
[...] ::= BEGIN | [...] <declaration> [...]

5.2 Assignment Statements (1.6.2)

[...] ::= [...] left part> [...]
[...] ::= [...] variable> :=

5.3 Standard Procedure Statements (cf. 1.6.3 and 1.6.8)

<standard procedure statement> ::= <standard procedure identifier> ([...] list>)
[...] list> ::= [...] parameter> | [...] list>, [...] parameter>
[...] parameter> ::= [...] expression> | [...] subarray designator>
[...] subarray designator> ::= [...] array identifier> (<subarray designator list>)
<subarray designator list> ::= <subscript>

5.4 If Statements (1.6.5)

[...] ::= [...] clause> <simple statement> | [...] clause> <simple statement> ELSE <statement>
[...] clause> ::= IF [...] expression> THEN

5.5 Iterative Statements (1.6.7)

[...] statement> ::= [...] clause> <statement>
[...] clause> ::= FOR [...] identifier> := [...] value> UNTIL [...] DO
[...] value> ::= [...] expression>
[...] ::= [...] expression>
[FIG 1: a SOURCE and a SINK, each with constant behaviour, connected through a block with complex behaviour; S1 = (O,X), S2 = (X,O).]
[FIG 2a: Representation of Conventional Model. The CONVENTIONAL MODEL takes the input u_i and gives an approximate output observation (the model output); this differs from the exact output observation y_i by the error e_i.]

[FIG 2b: Corresponding Model Which Satisfies Definitions 3.3.1, 3.3.6. A GENERATION OF CORRECTIONS block is combined with the CONVENTIONAL MODEL, driven by u_i, so that the model output is the exact output observation y_i.]
[FIG 3: The Box-Jenkins Model of the Gas-Furnace Data. The NOISE PROCESS D(B)/C(B), with output n_t, is added to the output of the DETERMINISTIC PROCESS B(B)/A(B), with input u_t, to give y_t.]

A(B) = 1 - 0.57B - 0.01B^2
B(B) = -(0.53 + 0.37B + 0.51B^2)B^3
C(B) = 1 - 0.53B + 0.63B^2
D(B) = 1
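The deterministic process of FIG 3 can be written as a difference equation and simulated directly. The sketch below is an illustration only: it expands A(B)y_t = B(B)u_t with the coefficients quoted for FIG 3, and it assumes zero initial conditions and series measured as deviations from their means; the function name `simulate_deterministic` is hypothetical.

```python
def simulate_deterministic(u):
    """Simulate A(B) y_t = B(B) u_t, i.e.

        y_t = 0.57 y_{t-1} + 0.01 y_{t-2}
              - 0.53 u_{t-3} - 0.37 u_{t-4} - 0.51 u_{t-5}

    ASSUMPTION: all values before the start of the record are zero.
    """
    a = [0.57, 0.01]          # coefficients of y_{t-1}, y_{t-2}
    b = [0.53, 0.37, 0.51]    # coefficients of u_{t-3}, u_{t-4}, u_{t-5}
    y = []
    for t in range(len(u)):
        ar = sum(a[k] * y[t - 1 - k] for k in range(2) if t - 1 - k >= 0)
        ma = sum(b[k] * u[t - 3 - k] for k in range(3) if t - 3 - k >= 0)
        y.append(ar - ma)
    return y


# Response to a unit step in the input.
step_response = simulate_deterministic([1.0] * 200)
steady_state_gain = -(0.53 + 0.37 + 0.51) / (1 - 0.57 - 0.01)
```

The step response stays at zero for the first three samples, reflecting the pure delay B^3, and settles at the steady-state gain B(1)/A(1).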
[FIG 4(a): Model I - The Trivial Model. A TABLE LOOK-UP of (y_1, ..., y_296) supplies y_i.]

[FIG 4(b): Model II - The Mean. The constant 53.5 is combined with a TABLE LOOK-UP to give y_i.]

[FIG 4(c): Model III - Deterministic Transfer Function Using Input Observations Only. Blocks: INITIAL CONDITIONS, the constant -0.049, the inputs (u_1, ..., u_i), the transfer function B(B)/A(B), and a TABLE LOOK-UP of (n_1, ..., n_296); separate paths give y_i for i ≥ 6 and for i < 6.]

[FIG 4(d): Model IV - Deterministic Transfer Function Using Input and Output Observations. Blocks: the constants -0.049 and 53.5, the inputs (u_{i-5}, u_{i-4}, u_{i-3}), the outputs (y_{i-2}, y_{i-1}), the transfer function B(B)/A(B), and a TABLE LOOK-UP of (e_1, ..., e_296) supplying e_i; separate paths give y_i for i ≥ 6 and for i < 6.]

[FIG 4(e): Model V - Stochastic Process Model. Blocks: the constants -0.049 and 53.5, the inputs (u_{i-7}, ..., u_{i-1}), the outputs (y_{i-7}, ..., y_{i-1}), the operators B(B), A(B) and D(B), and a TABLE LOOK-UP of (a_1, ..., a_296); separate paths give y_i for i ≥ 8 and for i < 8.]

[FIG 4(f): Model VI - Box & Jenkins Model. Blocks: the constants 53.5 and -0.049, the inputs (u_{i-7}, ..., u_{i-1}), the outputs (y_{i-7}, ..., y_{i-1}), the operators B(B), A(B) and D(B), the values (w_{i-2}, w_{i-1}, w_i), and a TABLE LOOK-UP of (w_1, ..., w_296); separate paths give y_i for i ≥ 8 and for i < 8.]
[FIG 5: Sizes of models I-VI, plotted against the number of observations (100 to 300); vertical axis marked at 500, 1000 and 1500.]
[FIG 6: Information gains of models I-VI, plotted against the number of observations (100 to 300); vertical axis from -200 to 600.]
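[FIG 7: Structure of computations performed by Linear Model Language (B denotes the backward shift operator). Blocks: the inputs (u_{i-m}, ..., u_i) pass through b_m B^m + ... + b_0; the result is combined with a term of order n in the past outputs and with a TABLE LOOK-UP to give y_i.]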
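The computation summarised in FIG 7 can be sketched in a few lines. This is a hedged illustration rather than a definition of LML: it assumes that one output is formed as a linear combination of n past outputs and m+1 inputs, accumulated by alternating product and sum as in the computation steps of appendix A, plus one look-up table entry. The function name `lml_output` and the argument layout are hypothetical.

```python
def lml_output(a, b, y_prev, u, d_i):
    """Compute one output of an assumed linear model.

    a      : [a_1, ..., a_n], coefficients of y_{i-1}, ..., y_{i-n}
    b      : [b_0, ..., b_m], coefficients of u_i, ..., u_{i-m}
    y_prev : [y_{i-1}, ..., y_{i-n}]
    u      : [u_i, ..., u_{i-m}]
    d_i    : the look-up table entry for step i
    """
    params = list(a) + list(b)        # m + n + 1 parameters in all
    data = list(y_prev) + list(u)     # the matching data items
    acc = 0.0                         # running value, as held in s-y
    for coeff, item in zip(params, data):
        acc = acc + coeff * item      # product, then sum
    return acc + d_i                  # final sum with the table entry


# n = 1, m = 1: y_i = 0.5*y_{i-1} + 2.0*u_i + 1.0*u_{i-1} + d_i
y_i = lml_output([0.5], [2.0, 1.0], [4.0], [1.0, 3.0], 0.25)
```

The loop visits the parameters one at a time, which corresponds to the counter running over the m+n+1 parameters before the table entry is added.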
[FIG 8: Structure of computations performed by Extended Linear Model Language. Blocks: the inputs (u_{i-q-m}, ..., u_i), the outputs (y_{i-q-n}, ..., y_{i-1}) and (y_{i-q}, ..., y_{i-1}), the operator D(B)/C(B), the values (w_{i-p}, ..., w_i), and a TABLE LOOK-UP; separate paths give y_i for i > max((m+q),(n+q)) and for i ≤ max((m+q),(n+q)).]
[FIG 9: Assumed structure of models (in chapter 7). A TABLE LOOK-UP block supplies e_i and D_i, and the FIXED PART OF MODEL supplies C_i. For definitions of D_i, C_i see def. (3.3.7).]