Optimization of Stochastic Systems Topics in Discrete-Time Systems
MATHEMATICS IN SCIENCE AND ENGINEERING A SERIES OF MONOGRAPHS AND TEXTBOOKS
Edited by Richard Be.Ilrrra.ri University of Southern California 1.
TRACY Y. THOMAS. Concepts from Tensor Analysis and Differential Geometry. Second Edition. 1965
2.
TRACY Y. THOMAS. Plastic Flow and Fracture in Solids. 1961
3.
RUTHERFORD ARIS. The Optimal Design of Chemical Reactors: A Study in Dynamic Programming. 1961
4.
JOSEPH LASALLE and SOLOMON LEFSCHETZ. Stability by Liapunov's Direct Method with Applications. 1961
5.
GEORGE LEITMANN (ed.). Optimization Techniques: With Applications to Aerospace Systems. 1962 RICHARD BELLMAN and KENNETH L. COOKE. Differential-Difference Equations. 1963 FRANK A. HAIGHT. Mathematical Theories of Traffic Flow. 1963
6. 7. 8.
F. V. ATKINSON. Discrete and Continuous Boundary Problems. 1964
9.
A. JEFFREY and T. TANIUTl. Non-Linear Wave Propagation: With Applications to Physics and Magnetohydrodynamics. 1964
10.
JULIUS T. Tou. Optimum Design of Digital Control Systems. 1963
11.
HARLEY FLANDERS. Differential Forms: With Applications to the Physical Sciences. 1963
12.
SANFORD M. ROBERTS. Dynamic Programming in Chemical Engineering and Process Control. 1964
13.
SOLOMON LEFSCHETZ. Stability of Nonlinear Control Systems. 1965
14.
DIMITRIS N. CHORAFAS. Systems and Simulation. 1965 A. A. PERVOZVANSKII. Random Processes in Nonlinear Control Systems. 1965 MARSHALL C. PEASE, III. Methods of Matrix Algebra. 1965
15. 16.
17.
V. E. BENES. Mathematical Theory of Connecting Networks and Telephone Traffic. 1965
18.
WILLIAM F. AMES. Nonlinear Partial Differential Equations in Engineering. 1965
19.
J. ACZEL. Lectures on
20.
R. E. MURPHY. Adaptive Processes in Economic Systems. 1965
21.
S. E. DREYFUS. Dynamic Programming and the Calculus of Variations. 1965 A. A. FEL'DBAUM. Optimal Control Systems. 1965
22.
Functional Equations and Their Applications. 1966
MATHEMATICS
IN
SCIENCE AND
ENGINEERING
23.
A. HALANAY. Differential Equations: Stability, Oscillations, Time Lags.
24. 25. 26. 27.
M. NAMIK OGUZTORELI. Time-Lag Control Systems. 1966 DAVID SWORDER. Optimal Adaptive Control Systems. 1966
1966
MILTON ASH. Optimal Shutdown Control of Nuclear Reactors. 1966 DIMITRIS N. CHORAFAS. Control System Functions and Programming Approaches. (In Two Volumes.) 1966 N. P. ERUGIN. Linear Systems of Ordinary Differential Equations. 1966
28. 29.
SOLOMON MARcus. Algebraic Linguistics; Analytical Models. 1967
30. 31.
A. M. LIAPUNOV. Stability of Motion. 1966 GEORGE LEITMANN (ed.). Topics in Optimization. 1967
32.
MASANAO AOKI. Optimization of Stochastic Systems. 1967
In preparation A. KAUFMANN. Graphs, Dynamic Programming, and Finite Games MINORU URABE. Nonlinear Autonomous Oscillations A. KAUFMANN and R. CRUON. Dynamic Programming: Sequential Scientific Management
Y. SAWARAGI, Y. SUNAHARA, and T. NAKAMIZO. Statistical Decision Theory in Adaptive Control Systems F. CALOGERO. Variable Phase Approach to Potential Scattering J. H. AHLBERG, E. N. NILSON, and J. L. WALSH. The Theory of Splines and Their Application HAROLD J. KUSHNER. Stochastic Stability and Control
This page intentionally left blank
Optimization of Stochastic Systems Topics in Discrete-Time Systems
MASANAO AOKI Department of Engineering University of California Los Angeles, California
1967 ACADEMIC PRESS New York • London
COPYRIGHT
© 1967,
BY ACADEMIC PRESS INC.
ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.
ACADEMIC PRESS INC.
111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD.
Berkeley Square House, London W.l
LIBRARY OF CONGRESS CATALOG CARD NUMBER:
PRINTED IN THE UNITED STATES OF AMERICA
66-30117
To M. F. A. and C. A.
This page intentionally left blank
Preface This book is an outgrowth of class notes of a graduate level seminar on optimization of stochastic systems. Most of the material in the book was taught for the first time during the 1965 Spring Semester while the author was visiting the Department of Electrical Engineering, University of California, Berkeley. The revised and expanded material was presented at the Department of Engineering, University of California, Los Angeles during the 1965 Fall Semester. The systems discussed in the book are mostly assumed to be of discrete-time type with continuous state variables taking values in some subsets of Euclidean spaces. There is another class of systems in which state variables are assumed to take on at most a denumerable number of values, i.e., these systems are of discrete-time discrete-space type. Although the problems associated with the latter class of systems are many and interesting, and although they are amenable to deep analysis on such topics as the limiting behaviors of state variables as time indexes increase to infinity, this class of systems is not included here, partly because there are many excellent books on the subjects and partly because inclusion of these materials would easily double the size of the book. The readers are referred to Refs. 47a, 52, 58, 63a, 74a and the books by K. L. Chung, J. G. Kemeny et al., and R. S. Varga listed in the Bibliography. Following the introductory remarks and simple one-dimensional examples to indicate the types of problems dealt with in the book, the procedures for deriving optimal Bayesian control policies for discrete-time stochastic systems are developed systematically III Chapters II through IV. Those readers who are being exposed to the types of problems in the examples in Chapter I for the first time should glance over these examples without unduly concerning themselves with the question of how the optimal controls are derived and then come back to them after reading Chapters I and III. Chapter II treats a class of stochastic control systems such that the complete information on the random variables in the system descriptions is available through their joint probability distribution functions. ix
x
PREFACE
Such systems are called purely stochastic. Chapter III treats a class of stochastic systems in which the joint probability distribution functions of the random variables are parametrized by unknown parameters in known parameter spaces. Such systems are called parameter adaptive. Chapter IV presents the most general formulation of optimal Bayesian optimization problems in the book and is a generalization of the material in Chapters II and III. Advanced readers may go directly to Chapter IV to see the general mathematical formulation used. The material in Chapters II and III is included primarily for pedagogical purposes. Since optimal control problems often involve estimation problems as subproblems, and since the topic is of interest in its own right, Chapter V is devoted to discussions of estimation problems of linear and nonlinear systems. Chapter VI concerns the convergence questions in Bayesian optimization method and includes material on stochastic observability of systems. Some of the material in this chapter is relevant to learning systems. Chapter VII presents approximations in control and estimation problems and current topics such as various suboptimal estimation schemes and construction of suboptimal control policies for adaptive systems. Control problems discussed are mostly of finite duration, N. The behaviors of systems as N -----). ware only touched upon in Chapters VIII and IX. Chapter VIII briefly describes the question of stability of stochastic systems. The last section of Chapter VI and this chapter constitute material on the quantitative aspects of discrete-time systems. Needless to say, the concept of optimal controls is meaningful only when the resultant system behaviors are stable. Implicit in this is the assumption that there is at least one control policy which makes the expected value of the criterion function finite. Although this point becomes important when the control problems with infinite duration are discussed, the stability question is considered primarily as an application of martingales discussed in Chapter VI. This and some other topics not contained in previous chapters are mentioned in Chapter IX and some future problems are also suggested. All my work on stochastic control systems included here has been supported by the Office of Naval Research. I am particularly grateful to Professor G. Estrin who has supported and encouraged my work in this area for many years since I was a graduate student, to Professor R. Bellman who introduced me to problems of optimization and suggested writing the book, to Professors Zadeh and Desoer who gave me the opportunity to visit the University of California, Berkeley and to give
PREFACE
Xl
a seminar, and to Professors A. V. Balakrishnan and C. T. Leondes for their support of the seminar conducted at the University of California, Los Angeles. The book has been improved materially as the results of discussion with D. D. Sworder, R. E. Mortensen, A. R. Stubberud, and J. R. Huddle. I want to express my sincere thanks and appreciation to my teachers, colleagues, and students for their help in preparing the book. The following charts are included as an aid for those readers who wish to follow particular topics of interest. Optimal Bayesian Control
Estimation
Stability
I
11.3
VI.3
II
V
VIA
III
I
VIII
I
I
I
I
I
VII.3-6
IV
I
I
I
I
VI
VI
I IX
I
I
VIII
I
VII.2
I
VII.!
Approximate Methods in Control and Estimation
Appendix IV
II
I
I--~
II
I~_---I
III VII.!, 2
Use of Sufficient Statistics in Control and Estimation
I VII.3-6
1-,I
IIL5
I
V.3
IV.2
Los Angeles, California December, 1966
MASANAO AOKI
This page intentionally left blank
Contents Preface
CHAPTER 1.
IX
Introduction
1. Introduction 2. Preliminary Examples
1 4
Optimal Bayesian Control of General Stochastic Dynamic Systems . . . . . . . . . . . . . . . . .
CHAPTER II.
20
1. Formulation of Optimal Control Problems . . . . . . . . . 2. Example. Linear Control Systems with Independent Parameter Variations . . . 3. Sufficient Statistics . . . . . . . . . . . 4. Discussions . . . . . . . . . . . . . . Appendix A. Minimization of a Quadratic Form Appendix B. Use of Pseudoinverse in Minimizing a Quadratic Form Appendix C. Calculation of Sufficient Statistics Appendix D. Matrix Identities . . . . . . . . . . . . . . . .
36 53 71 73 74 76 79
Adaptive Control Systems and Optimal Bayesian Control Policies . . . . . . . . . . . . . .
81
21
CHAPTER III.
1. 2. 3. 4. 5. 6. 7.
General Problem Statement (Scope of the Discussions) Systems with Unknown Noise Characteristics . . . . Systems with Unknown Plant Parameters . . . . . . Systems with Unknown Plant Parameters and Noise Characteristics Sufficient Statistics . . . . . . . . . . . . . . . Method Based on Computing Joint Probability Density Discussions . . . . . . . . . . . . . . . . . .
82 83 104 116 117 120 125
Optimal Bayesian Control of Partially Observed Markovian Systems . . .
128
1. Introduction 2. Markov Properties
128 132
CHAPTER IV.
xiii
CONTENTS
XIV
3. Optimal Control Policies . 4. Derivation of Conditional Probability Densities 5. Examples . . . . . . . . . . . . . . . .
140 142 143
v. Problem of Estimation
154
CHAPTER
1. Least-Squares Estimation . . . 2. Maximum Likelihood Estimation 3. Optimal Bayesian Estimation . Appendix. Completion of Squares .
155 168 173 195
Convergence Questions in Bayesian Optimization Problems . .
197
I. 2. 3. 4. 5.
Introduction Convergence Questions: A Simple Case Martingales . . . . . . . . . . . . Convergence Questions: General Case . Stochastic Controllability and Observability
197 199 202 204 209
Approximations
223
CHAPTER VI.
CHAPTER VII.
1. 2. 3. 4. 5.
Approximately Optimal Control Policies for Adaptive Systems Approximation with Open-Loop Feedback Control Policies Sensitivity and Error Analysis of Kalman Filters . . . . . . Estimation of State Vectors by a Minimal-Order Observer . . Suboptimal Linear Estimation by State Vector Partition Method: Theory . 6. Suboptimal Estimation by State Vector Partition: An Example . . Appendix A. Derivation of the Recursion Formula for Open-Loop Feedback Control Policies (Section 2) . . . . . . . . . . . . Appendix B. Derivation of the Constraint Matrix Equations (Section 4) Appendix C. Computation of Llr(i) (Section 6) .
224 241 246 250 265 269 276 278 279
Stochastic Stability
282
I. Introduction . 2. Stochastic Lyapunov Functions as Semimartingales 3. Examples . . . . . . . . . . . Appendix. Semimartingale Inequality
282 284 288 290
CHAPTER VlII.
Miscellany
291
1. Probability as a Performance Criterion 2. Min-Max Control Policies . . . 3. Extensions and Future Problems . .
291 298 300
CHAPTER IX.
CONTENTS
xv
Appendix I. Some Useful Definitions, Facts, and Theorems from Probability Theory . . . . . . . . . . . . . . . . . . . Appendix II. Pseudoinverse . . . . . . . Appendix III. Multidimensional Normal Distributions Appendix IV. Sufficient Statistics . . . . . . . . .
309 318 325 333
Bibliography . . References . . . List of Symbols.
339 339 347
Author Index .
349
Subject Index .
352
This page intentionally left blank
Optimization of Stochastic Systems Topics in Discrete-Time Systems
This page intentionally left blank
Chapter I
Introduction
1+ Introduction
There is a wide range of engineering problems in which we want to control physical equipment or ensembles of such equipment. These problems may range from a relatively simple problem of controlling a single piece of equipment, such as a motor, to a very complex one of controlling a whole chemical plant. Moreover, we want to control them in the best, or nearly best, possible manner with respect to some chosen criterion or criteria of optimality. These criteria are usually referred to as the performance indices, or criterion functions (functionals), etc. In each of these control problems, we are given a physical system (a plant) that cannot be altered, and a certain amount of key information on the plant and the nature of the control problems. The information on control problems may be classified loosely into four somewhat interrelated classess" ": (I) requirements on over-all control systems to be synthesized, (2) characteristics of plants, (3) characteristics of the controllers to be used, and (4) permissible interactions between the controllers and the plants. The first class of information will include such things as the desired responses of the plants which may be given indirectly by the performance indices or directly in terms of the desired outputs of the plants, such as the requirement that outputs of plants follow inputs exactly.
* Superscript
numbers refer to the references at the end of this book.
2
I.
INTRODUCTION
In the second class will be included descriptions of the dynamical behaviors of given plants. For example, plants may be governed by linear or nonlinear ordinary differential equations, difference equations, or by partial differential equations, the last being the case for distributed parameter systems. This class may also include information available on plant parameters and on random disturbances affecting the plant behavior, such as plant time-constant values, probability distribution functions of random noises acting on the outputs of plants, or random variations of some plant characteristics, and so on. Available controllers may be limited in amplitude or in total energy available for control purposes. Controllers to be used may be capable of storing certain amounts of information fed into them. Their complexities may also be constrained. For example, for some reason we may want to use only linear controllers, or we may want to limit their complexities by allowing no more than a specified number of components, such as integrators and so on. This information is given by the third class. Finally, the fourth class may include specifications on the types of feasible measurements to be performed on plants, on the ways actuators can influence plant behaviors, and generally on the way information on the states of plants is fed back to the controllers and descriptions of the class of inputs the controllers are expected to handle, etc. The natures and difficulties of optimal control problems, therefore, vary considerably, depending on the kinds of available information in each of these four categories. The theory of optimal control has reached a certain level of maturity, and we now possess such theoretical tools as Pontryagin's maximum principle.P! dynamic programming,20-22 functional analysis,"! RMS filtering and prediction theory,U,98 etc., in addition to the classical control theory, to synthesize optimal control systems, given the necessary information for the problems. However, one major shortcoming of these theoretical tools is that they assume "perfect" information for the problems to be solved. Namely, for such theories to be applicable one needs information such as the equation for a system to be controlled, the mechanism by which the system is observed, the statistical properties of internally and externally generated noises affecting system performance, if any, the criterion of performance, and so on. In other words, when all pertinent information on the structures, parameter values, and/or nature of random disturbances affecting the system performances are available, the problem of optimally controlling
1.
INTRODUCTION
3
such systems can, in principle, be solved. Such a theory of optimal control might be termed as the theory of optimal control under perfect information. In reality, the "perfect information" situation is never true, and one needs a theory of control which allows acceptable systems to be synthesized even when one or more pieces of key information required by the current optimal control theory are lacking. This book is intended as an attempt to offer partial answers to the defects of "perfect information" optimal control theories. It primarily discusses optimal control problems with varying assumptions on items in Classes 2 and 4, and with relatively standard assumptions on items in Classes 1 and 3. The main objective of the present book, therefore, may be stated as the unified investigation of optimal stochastic control systems including the systems where some information needed for optimal controller synthesis is missing and is to be obtained during the actual controlling of the systems. In this book we are concerned with closed-loop optimal control policies of stochastic and adaptive control systems. More detailed discussion on the nature of optimal controls is found in Section 1 of Chapter II. Although closed-loop control policies and open-loop control policies are equivalent in deterministic systems, they are quite different in systems involving random elements of some kinds.P For an elementary discussion of this point see, for example, S. Dreyfus.P Further discussions are postponed until Section 1, A of Chapter II. Whatever decision procedures controllers employ in supplying the missing information must, of course, be evaluated by the consequences reflected in the qualities of control in terms of the stated control objectives or chosen performance indices. Statistical decision theory 29 ,11 5 will have a large part to play in synthesizing optimal controllers. Papers on the theoretical and computational aspects of optimal stochastic and adaptive control problems began to appear about 1960. 3 , 21 , 55 ,60 ,61 In particular, in a series of four papers on dual control theory, Fel'dbaum recognized the importance of statistical decision theory." The major part of the present book is concerned with the question of how to derive optimal Bayesian control policies for discrete-time control systems, The derivation is somewhat different from that of Fel'dbaum, however, and is partly based on the method suggested by Stratonovich.P? For similar or related approaches see Refs. 2, 54, 105a,
124, 132, 133, 141.
4
I.
INTRODUCTION
2. Preliminary Examples In order to introduce the topics of the next three chapters and to illustrate the kinds of problems encountered there, very simple examples of optimal control problems are discussed in this section without showing in detail how the indicated optimal controls are derived, before launching into detailed problem formulations and their solutions. These simple examples will also be convenient in comparing the effects on the complexities of optimal control policies of various assumptions on the systems. The readers are recommended to verify these optimal controls after becoming familiar with the materials in Chapters II and III. The plant we consider in these examples is described by the firstorder scalar difference equation UiE(-OO,
(0),
0
~
i
~
N-l
(1)
where x, a, b, and u are all taken to be scalar quantities. The criterion function is taken to be That is, a final value control problem of the first-order system is under consideration. We will consider only nonrandomized control policies in the following examples. The questions of randomized controls versus nonrandomized controls will be discussed in the next chapter. For the purpose of comparison, a deterministic system is discussed in Example I where the plant parameters a and b are assumed to be known constants. Later this assumption is dropped and the optimal control of System (I) will be discussed (Examples 2, 5~ 7) where a and/or b are assumed to be random variables. The effects on the form of control of random disturbances on the plant and observation errors will be discussed in Examples 3 and 4. In all examples the control variable u is taken to be unconstrained. Optimization problems, where the magnitude of the control variable is constrained, are rather complex and are discussed in Ref. 45, for example.
A.
OPERATIONS WITH CONDITIONAL PROBABILITIES
Before beginning the discussion of examples, let us list here some of the properties of conditional probabilities'" (or probability densities when they exist) that are used throughout this book. These are given for probability density functions. Analogous relations are valid in terms
2.
PRELIMINARY EXAMPLES
5
of probabilities. Some definitions, as well as a more detailed discussion of expectations, conditional expectations, and other useful facts and theorems in the theory of probabilities, are found in Appendix I, at the end of this book. There are three basic operations on conditional probability density functions that are used constantly. The first of these is sometimes referred to as the chain rule: pea, b I c)
=
pCb I c) pea I b, c)
(2)
Equation (2) is easily verified from the definition of conditional probability densities. The second operation is the integrated version of (2): pea I c)
=
JpCb
1
c) pea I b, c) db
(3)
This operation is useful when it is easier to compute pCb I c) and pea I b, c) than to compute pea I c) directly. For example, consider a system with a plant equation
(4) where ex is a random system parameter. Assuming that p(ex I Xi) is available, this formula is used to compute p(xi+l I Xi)' since P(Xi+l I Xi , ex) is easily obtained from the plant equation (4) if the probability density P(~i) is assumed known. The last of the three basic operations is used to compute certain conditional probability densities when it is easier to compute those conditional probability densities where some of the variables and the conditioning variables are interchanged. This is known as Bayes' formula: pea 1 b c)
,
=
pea I b)p(c I a, b)
f pea I b) p(c I a, b) da
(5)
or its simpler version pea I b) =
pea) pCb I a)
f pea) pCb I a) da
(6)
The Bayes formula is used, for example, to compute p(x i 1 Yi) given
P(Yi I xi) where Yi is the observed value of Xi .
1.
6
INTRODUCTION
In this book the notation E(·) is used for the expectation operation. A detailed discussion of the E(·) operation can be found in Appendix 1. This is a linear operation so that, given two random variables X and Y with finite expectations and two scalar quantities a and b,
+ bY) =
E(aX
a E(X)
+ b E(Y)
(7)
This formula is also valid when E(X) and/or E(Y) is infinite when the right-hand side of (7) is well defined. Another useful formula is E(X2)
[E(X)]2
=
+ var X
(8)
where var X is the variance of X which is defined to be var X
B.
EXAMPLE
1.
=
E(X - EX)2
(9)
DETERMINISTIC CONTROL SYSTEM
Suppose we have a scalar deterministic control system described by the difference equation (1) with a and b known and observed by Yi
=
Xi'
(10)
O~i~N-l
Such a system is drawn schematically in Fig. 1.1. Equation (10) shows that the state of the system is observed exactly. That is, the control system of Example 1 is deterministic, completely specified, and its state is exactly measured. This final control problem has a very simple optimal control policy. Since ] =
X N2 =
(ax N - 1
+ buN _ 1 ) 2
clearly an optimal control variable at time N - 1, denoted by utr-l , is given by
(11) , - - - - - - - - - - - - -PLANT --, I
I
I
I
I
IL
_
Fig. I.l. System with deterministic plant and with exact measurement. a, bare known constants.
2.
7
PRELIMINARY EXAMPLES
U o*, U 1 *,..., Ut-2 are arbitrary, and min] = 0. Actually in this example we can choose anyone or several of the N control variables u o, U 1, ... , U N- 1 appropriately to minimize ]. For the purpose of later comparisons we will consider the policy given by (11) and choose U i * = ~ ax] b, i = 0, 1,... , N ~ 1. From (11) we see that this optimal control policy requires, among other things, that (i) a and b of (1) be exactly known, and that (ii) X N- 1 be exactly observable as indicated by (10). When both of these assumptions are not satisfied, the optimal control problem of even such a simple problem is no longer trivial. Optimal control problems without Assumptions (i) and/or (ii) will be discussed later. Now let us discuss the optimal control problem of a related stochastic system where the plant time-constant a of (I) is assumed to be a random variable.
C. EXAMPLE
2.
STOCHASTIC CONTROL SYSTEM:
SYSTEM WITH RANDOM TIME CONSTANT
Consider a discrete-time control system Ui
Xo Yi
E (-00,00)
(12)
gIven
=
(13)
O~i~N-I
Xi'
where {ai} is a sequence of independently and identically distributed random variables with known mean () and known variance a 2 • This system is a slight modification of the system of Example 1. It is given schematically in Fig. 1.2. The criterion function is still the same X N 2• Since X N PLANT
r--------------l
I
I
I
I I
I
L
~
__
~
__
~_
Fig. 1.2. System with random plant and with exact measurement. a, are independently and identically distributed random variable with known mean and variance; b is a known constant.
1.
8
INTRODUCTION
is a random variable now, an optimal control policy is a control policy which minimizes the expected value of ], Ej. Consider the problems of choosing U N- I at the (N - I )th control stage. Since (14)
where the outer expectation operation is taken with respect to the random variables X o , Xl , X N- I , * EX N 2 is minimized by minimizing the inner conditional expectation with respect to U N- I for every possible collection of X o ,... , X N- I , U o ,... , UN-I' Now
where
is taken to be some definite (i.e., nonrandom) function of In obtaining the last expression in (15), use is made of the basic formula of the expectation operations (7) and (8). From (15),
Xo
,
U N- I
Xl"'"
XN- I •
U~-l
= -8x rv_ I lb
(16)
and (17)
By assumption, a is a known constant. Therefore, the problem of choosing U N- 2 is identical to that of choosing UN-I' Namely, instead of choosing U N- I to minimize EX N 2, U N- 2 is now chosen to minimize a 2 E(x7v_I)' Thus it is generally seen that each control stage can be optimized separately with O~i~N-I
(18)
and min
E]
a 2 Nx o2
=
Uo'" "UN_l
(19)
This problem can also be treated by a routine application of dynamic programming. 20 Define Irv_n(x) =
* Since Uo
, ... ,
UN-l
min
u,u·····uN_I
E(X N
2
1
Xn
= x at time n)
(20)
only nonrandomized closed-loop control policies are under consideration, are some definite functions of Xo , ... , XN-l for any given control policy.
2.
9
PRELIMINARY EXAMPLES
starting from x at time n employing an optimal sequence of controls Un ,"', UN-I' Then, invoking the principle of optimality, I N - n satisfies the functional equation
IN_n(x) is the expected value of
XN 2
(21)
where To solve (21), it is easily seen that I N-n(Xn) is quadratic in put
Xn ;
therefore, (22)
where Q's and iL's are to be determined. Since Io(xN ) Qo = 1,
=
fLo = 0
X N 2,
we have (23)
From (21)-(23) one obtains Qn
=
fLn =
u 2n
(24)
0,
(25)
therefore min
uO·····uN_l
EXN 2
=
a 2Nx o2
with (26)
O~i~N-l
Comparing (18) with (11) of the previous example, one notices that , U 1 , ••• , U N- 2 are no longer arbitrary and the mean is regarded as "a" of the deterministic system. If you consider a system associated with (12) where the random variable ai is replaced by its expected value e, then we have a deterministic system
e
Uo
with i
= 0, 1,... , N
- 1
If you consider a control problem with this plant equation replacing the original system (12), then from (II) the optimal control policy for this associated system is such that
which turns out to be identical with the optimal control at time N ~ 1 for the original system (12), This is an example of applying what is
I.
10
INTRODUCTION
known as the certainty equivalence principle,49,136a where a given stochastic system is replaced by a corresponding deterministic system by substituting expected values for random variables. Sometimes optimal control policies for the deterministic system thus obtained are also optimal for the original stochastic systems. The detailed discussion of this principle is deferred until Chapter II, Section 2. Systems involving randomness of one sort or another are called stochastic to distinguish them from deterministic control systems. The adjective "purely" is used to differentiate stochastic systems with known probability distribution functions or moments, such as mean and variance, from stochastic systems in which some of the key statistical information is lacking, or incomplete. Such systems will be called adaptive to differentiate them from purely stochastic systems. The system of this section is therefore a simple example of a purely stochastic control system. One can go a step further in this direction and consider an adaptive system, for example, by assuming that the mean e is random with a given a priori distribution for e. Before doing this, let us go back to the basic system of Example I and add random disturbances to the state variable measurement (10) and/or to the plant equation (I).
D.
EXAMPLE
3.
STOCHASTIC CONTROL SYSTEM:
SYSTEM WITH NOISY OBSERVATION
Let us now assume that the observations of state variables are noisy. Figure 1.3 is the schematic diagram of this system. Later, in Example 4 of this section, as well as in Chapters III and IV, we will consider
,------ -
-
-
-
-
-PLANT ---, I
...,- _ _ r-r-
L
Yj
I
I Xi I I
J
I
Fig. 1.3. System with deterministic plant and with noisy measurement. a, bare known constants, and 'I; are measurement noises.
2.
PRELIMINARY EXAMPLES
11
several such examples which show that the optimal control problems with noisy observations are substantially more difficult than those with exact state variable observations. In this example, the plant parameters a and b are still assumed given, but instead of (10) we now assume that Yi
=
Xi
+ "Ii ,
(27)
O~i~N-l
where YJi is the noise in the observation mechanism (observation error random variable of the system at time i). Its first and second moments are assumed given. Otherwise, the system is that of Example 1. Note that it is no longer possible to say as we did in Example I that the control variable of (11) is optimal, since what we know at time N - I is the collection YN-l 'YN-2 , ... , Yo rather than that of X N- 1 , X N- 2 , ... , X o ; i.e., X N-1 is not available for the purpose of synthesizing control variable UN-I' We must now consider dosed-loop control policies where U i is some deterministic function of the current and past observations on the system state variable and of past employed controls. That is, the control is taken to be
and the function cPo, cPl ,... , cPN-l must be chosen to mimrmze E]. Control policies are discussed in Section I, A, Chapter II. Denote the conditional mean and variance of Xi by (28)
E(xi!Yo'''',Yi)=P-i
and var(xil Yo '''',Yi) =
Ui
2,
O~i~N-l
(29)
Then, from (7), (9), (28), and (29), E(XN
2 =
j
Yo '''',YN-l' E[(aX N_ I
Uo , ... , UN-I)
+ bUN_ I)21 Yo '''''YN-I' Uo , ... , UN-I] (30)
By choosing UN- 1 to minimize (30) for given is minimized, since
EX N 2
Yo , ... , YN-l , U o , ... , UN-I'
(31)
where the outer expectation is with respect to all possible y N - 1 and uN-I, where the notation y N - I is used for Yo , ... , YN-l and UN-I for Uo , ... , UN-I'
I.
12
INTRODUCTION
If GN_1 is independent of UN-I' then
=
-a E(XN _ 1 I yN-I)/b,
U o , ...
, U N- 2
arbitrary
(32)
is optimal in the sense that this control policy minimized E J, and (33)
Note that the problem of choosing U N-1 optimally is reduced to that of estimating X N-1 given yN-l by the conditional mean ILN-l. Later we will see how to generate such estimates using additional assumptions on the observation noises. See, for example, Section 3, Chapter II, and Section 2, Chapter III. Note also that one of the effects of noisy observations is to increase the minimal EJ value by some positive constant value proportional to the variance of the noise. E. EXAMPLE
4.
STOCHASTIC CONTROL SYSTEM:
SYSTEM WITH ADDITIVE PLANT NOISE
The system to be considered next is that of Example 1, with random disturbances added to the plant equation: Xi+l
=
Xo
grven
aX i
+ bu, +
ti'
(34) (35)
where
~i
are independent with (36) (37)
See Fig. 1.4 for the schematic diagram. Proceeding as in Example 2, (38)
smce and
2.
13
PRELIMINARY EXAMPLES
r-----
~i- - - - - -PLANT --, I
I I
I I I L
_
Fig. 104. System with deterministic plant, with additive random plant disturbances, and with exact measurement. a, b are known constants, and ti are random disturbances on the plant.
because the conditional probability density p(xN I X N- I ,UN-I) is given by that of ~N-I with ~N-I = X N - aX N_ I - UN-I' From (38), the optimal policy is given by (39)
since G N- I is a constant independent of UN-I' Observe that the random disturbance in the plant equation has an effect on EJ similar to that of the disturbance in the observation equation. In both cases the minimum of E(] I yN-I) is increased by an amount proportional to the variance of the disturbances. Since the mean of t, is zero, the system of Example 1 is the deterministic system obtained from the system of Example 4 by replacing ~i by its mean, i.e., by applying the certainty equivalence principle to the system. Comparing (11) with (39), the optimal control policy for this system is seen to be identical with that of Example 1. Comparing Example 3 with Example 4, the optimal control policy for Example 4 is seen to be simpler. In Example 3 it is necessary to compute f-t's, whereas the optimal control policy for Example 4 is the same as that of Example 1. As this example indicates, it is typically more difficult to obtain optimal control policies for systems with noisy state vector observations than with exact state vector measurements.
F.
EXAMPLE
5.
STOCHASTIC CONTROL SYSTEM:
SYSTEM WITH UNKNOWN TIME CONSTANT
In Examples 1, 3, and 4, it is essential that the plant time-constant a be known exactly since it appears explicitly in the expressions for
1.
14
INTRODUCTION
optimal control policies for these systems. In this example, we consider the system described by u,. E (-00, (0) Yi =
Xi
+ YJi
(40)
(41)
(42)
where "a" is now assumed to be a random variable with known mean and variance and where YJ's are assumed to be independent. It is further assumed that "a" is independent of YJ's and that E(a)
= ex
(43) (44)
where ex and Ul are assumed known. One may interpret the value of the time-constant "a" as a sample from a common distribution function with known mean and variance. Such a situation may arise when the plant under consideration is one of the many manufactured in which, due to the manufacturing tolerances, the time-constant of the plant is known to have a statistical distribution with known mean and variance. The noise in the observation (41) prevents the determination of "a" exactly by measuring the state variables at two or more distinct time instants. This problem is a simple example of plant parameter adaptive control systems. Later we consider another parameter adaptive system, in Section H (Example 7). In Example 3, we have derived the optimal control policy when a is a known constant. There we have U~_l
=
-afLN-l/ b
In this example a is not known exactly. In Examples 1 and 3, by comparing (11) and (32) we see that the only change in U N- 1 when the observations are noisy is to replace X N-1 by its conditional mean value fLN-l . In Example 2, where the time constant is chosen independently from the common distribution at each time instant, the time-constant a in the optimal control of (11) has been replaced by the mean value in the optimal control of (18). Therefore, it is not unreasonable to expect that (45)
is optimal where the random variable a is replaced by its a posteriori mean value
2.
15
PRELIMINARY EXAMPLES
The control of (45) is not optimal. Namely, the optimal control policy for Example 5 cannot be derived by applying the certainty equivalence principle mentioned in Examples 2 and 4. To obtain the optimal control at time N - 1, compute
=
J
XN
2
p(x N
X P(XN -
=
J(aX
N- I
X P(XN -
I ,
I X N- I
a I yN-\
+ I ,
, UN-I,
a)
UN-I)
dXN dXN - I da
UN-I)
dXN - I da
bU N _ I)2
a I yN-\
(46)
where the probability densities are assumed to exist. Denoting ~
N-I =
E(ax N-I I yN-I ,
uN-I)
and
(46) can be expressed as E(XN 2 I yN-t,
UN-I)
=
(~N-It-
bU N_ I)2
+ };~-I
Therefore, assuming that l:'~_1 is independent of control at time N - 1 is given by
UN-I'
the optimal
By the chain rule, we can write
In Chapter II, we will show that if the observation noises are Gaussian then the conditional probability density function of XN- I , given a, yN-\ and U N- 2, is Gaussian, and that its conditional mean satisfies the recursion equation where fLi = E(xi I a, yi, U i - 1) and where K N - I is a constant independent of y's and u's. We will also show that the conditional variance of XN- I , given a, yN-\ and UN-2, is independent of y's and u's.
I.
16
INTRODUCTION
The conditional mean and the variance of nonlinear functions of a. Therefore, -=F E(XN _ I
~N-I
XN~l'
I a, y N-l, U N- 2) E(a I yN-l,
however, are some UN-I)
showing that the control given by (45) is not optimal. We will take up the questions of computing the optimal control policies for systems with random parameters in Chapter III. G. EXAMPLE
6.
STOCHASTIC CONTROL SYSTEM:
SYSTEM WITH UNKNOWN GAIN
In Examples 1-4 we see that their optimal control policies have the common structure that the random or the unknown quantities are replaced by their (a posteriori) mean values; i.e., the certainty equivalent principle yields the optimal control policies for these examples. The optimal control policy in Example 5, however, does not have this structure. As another example of the latter nature let us consider a stochastic control system U i E (-00 , 00) (47) X H I = ax, + bu, + ti' Xo
given
Yi
(48)
O~i~N-l
= Xi'
where a is a known constant but where b is now assumed to be a random variable, independent of g's with finite mean and variance. The schematic diagram of this system is also given by Fig. 1.4. The plant disturbance g's are assumed to be independently and identically distributed random variables with (49) (50)
O:(:i~N-l
According to the certainty equivalence principle,
U!;_l , we consider the deterministic plant
ill
order to obtain (51)
where bN -
From (11), the optimal
U N-1
I
~
E(b I X N- I)
(52)
for the system (51) is given by (53)
2.
17
PRELIMINARY EXAMPLES
With this control, the conditional expected value of tribution to E I from the last control stage, is given by E(x N 2IX N-'I
UN-I)
+~
E [(ax N-I _ _b b_ ax N-I -
=
N
I
I.
N-I
r.e., the con-
)21 X N-I] (54)
where a~_1
=
I X N- I)
var(b
Let us try another control variable U N-I
= -
bN _ I
b2 N-I
+
02 N-I
(
aXN-I
)
(55)
With this control, E(x N 2[X N-'I
UN-I)
= E [(ax N-I _
N I _ bb+ 02
b2 N-I
ax N-I
N-I
+ SN-I t )21
X N-I]
(56)
Comparing (54) and (56), we see the optimal control for the deterministic system (5 I) is not optimal since the control variable of (55) is better. This is only one of the many subtle points that arise in optimal control of stochastic systems. In Chapter III we will show how to derive such a policy in a routine manner.
H.
EXAMPLE
7.
STOCHASTIC CONTROL SYSTEM:
RANDOM TIME-CONSTANT SYSTEM WITH UNKNOWN MEAN
In Example 2, the random time-constants {aJ are assumed to have known means. Now, we assume the mean is unknown. The system is described by u, Yi
=
Xi
E (-
OCJ, OCJ)
(57) (58)
where' a/s are independently and identically distributed Gaussian random variables with mean e and variance a 2 , where a is assumed known but e is assumed to be a random variable.
I.
18
INTRODUCTION
I t is convenient to introduce a notation 2(·) to denote the distribution of a random variable. Using this notation, it is assumed that (59)
°
where a is given and where N(a, b) is a standard notation for a normal distribution with mean a and variance b. The unknown mean is assumed to have the a priori distribution 2 0(8)
=
N(8 0
,
u 02 )
with 00 and U o given. This type of control problem, which is stochastic but not purely stochastic, is called adaptive or more precisely parameter-adaptive to distinguish it from purely stochastic problems. If, instead of assuming that the mean of a is known in Example 5, we assume that the mean is a random variable with given a priori distribution, then we obtain another example of adaptive control system. The optimal control policy for parameter adaptive control systems are discussed in Section 3, Chapter III.
I.
EXAMPLE
8.
SYSTEM WITH UNKNOWN NOISE
Most parts of this book are concerned with a class of control policies known as closed-loop Bayes control policies.w Loosely speaking, the Bayesian approach to the optimal control problems requires the assumption of a priori probability distribution functions for the unknown parameters. These distribution function are updated by the Bayes rule, given controls and state vector measurements up to the current time. The Bayes approach is examined in some detail in Chapter VI. The min-max approach does not assume the probability distribution functions for the unknown parameters. In Chapter IX, we will briefly discuss min-max control policiesw and their relationship with Bayes control policies. As an illustration, consider a system with perfect observation:
+ o + ~o
Xl =
ax o
Yo
X o given
=
U
where it is assumed that a is known and that
to is a random variable with
with probability p with probability I - P
where 01 and O2 are given, 01
> O2 •
2.
PRELIMINARY EXAMPLES
19
The criterion function is taken to be
] =
X1
2
=
(aX O
+ Uo + ~O)2
Since] is a function of U o as well as p we write it as ](p, u). The expected value of ] is given as
Therefore, the control given by minimizes E]:
Note that Y1* is maximized when p = 1. When p is known, the control is called the optimal Bayes control for the problem. If p is not given, U o* cannot be obtained. Let us look for the control which makes ] independent of 81 or 82 , Namely, consider Uo given by Uo
Then
Thus, if Uo is employed, X 1 2 is the same regardless of p values. Such a control policy is called an equalizer control policy. 58a ,133 The value of ] is seen to be equal to Y1 * when p = 1. In other words, the control Uo minimizes the criterion function for the worst possible case p = 1. Therefore Uo may be called the min-max control since it minimizes the maximal possible E] value. Comparing Uo and U o*, Uo is seen to be the optimal Bayes control for p = 1. For this example, an equalizer control policy is a min-max control policy, which is equal to the optimal Bayes control policy for the worst possible a priori distribution function for the unknown parameter 8. It is known that the above statements are true generally when the unknown parameter 8 can take on only a finite number of possible values. When 8 can take an infinite number of values, similar but weaker statements are known to be true. See Chapter IX, Section 2 of this book or Ferguson 58a and Sworder.I'" for details.
Chapter II
Optimal Bayesian Control of General Stochastic Dynamic Systems
In this chapter, we develop a systematic procedure for obtaining optimal control policies for discrete-time stochastic control systems, i.e., for systems where the random variables involved are such that they all have known probability distribution functions, or at least have known first, second, and possibly higher moments. Stochastic optimal control problems for discrete-time linear systems with quadratic performance indices have been discussed in literature under the assumptions that randomly varying systems parameters and additive noises in the plant and/or in the state variable measurements are independent from one sampling instant to the next. 67 ,80 The developments there do not seem to admit any ready extensions to problems where the independence assumption is not valid for random system parameters, nor to problems where distribution functions for noises or the plant parameters contain unknown parameters. In this chapter, a method will be given to derive optimal control policies which can be extended to treat a much larger class of optimal control problems than those mentioned above, such as systems with unknown parameters and dependent random disturbances. This method can also be extended to cover problems with unknown parameters or random variables with only partially known statistical properties. Thus, we will be able to discuss optimal controls of parameter adaptive systems without too much extra effort. The method to be discussed-v-" partly overlaps those discussed by other investigators, notably that of Fel'dbaum.v" Although the method presented here is essentially its equivalent,105a the present method is 20
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
21
believed to be more concise and less cumbersome to apply to control problems. For example, the concept of sufficient statistics'" are incorporated in the method and some assumptions on the systems which lead to simplified formulations are explicitly pointed out. 15 ,16 The evaluations of various expectation operations necessary in deriving optimal control policies are all based on recursive derivations of certain conditional probabilities or probability densities. As a result, the expositions are simpler and most formulas are stated recursively which are easier to implement by means of digital computers.
1. Formulation of Optimal Control Problems
A.
PRELIMINARIES
In this section, purely stochastic problems are considered. Namely, all random variables involved are assumed to have known probability densities and no unknown parameters are present in the system dynamics or in the system observation mechanisms. We consider a control system described by Uk E
Uv , k
= 0, I, ... , N - 1
(1)
where Po(x o) is given and observed by k
= 0, I, ...,N
(2)
and where X k is an n-dimensional state vector at kth time instant, Uk is a p-dimensional control vector at the kth time instant, Uk is the set in the p-dimensional Euclidean vector space and is called the admissible set of controls, t k is a q-dimensional random vector at the kth time instant, Yk is an m-dimensional observation vector at the kth time instant, and YJk is an r-dimensional random vector at the kth time instant. The functional forms of F k and G k are assumed known for all k. Figure 2.1 is the schematic diagram of the control system. The vectors tk and YJk are the random noises in the system dynamics and in the observation device, or they may be random parameters of the system. In this chapter, they are assumed to be mutually independent, unless stated otherwise. Their probability properties are assumed to be known completely. The problem of optimal controls with imperfect probability knowledge will be discussed in the next chapter.
22
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
CONTROLLER WITH
Fig. 2.1.
MEMORY
Schematic diagram of general stochastic control system.
From now on, Eq. (1) is referred to as the plant equation and Eq. (2) is referred to as the state variable observation equation or simply as the observation equation. The performance index is taken to be N
]
=
I
Wk(X k, Uk-I),
(3)
k~l
This form of performance index is fairly general. It contains the performance indices of final-value problems, for example, by putting Wi = 0, i = 1,... , N - 1 and taking W N to be a function of X N only. We use a notation uk to indicate the collection U o , U I , ... , Uk . Similarly x k stands for the collection X O, Xl" .. ' X k • Although in the most general formulation the set of admissible control at time k, Uk' will depend on x k and uk-I, Uk is assumed in this book to be independent of x k , uk-I. a. Optimal Control Policy
One of our primary concerns in the main body of this book is the problem of deriving optimal control policies, in other words, obtaining the methods to control dynamic systems in such a way that some chosen numbers related to system performances are minimized. Loosely speaking, a control policy is a sequence of functions (mappings) which generates a sequence of control actions U o , U I , ... according to some rule. The class of control policies to be considered throughout this book is that of closed-loop control policies, i.e., control policies such that the control Uk at time k is to depend only on the past and current observations yk and on the past control sequences U k- I which are assumed to be also observed. A nonrandomized closed-loop control policy for an N-stage
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
23
control process is a sequence of N control actions Ui , such that each Ui takes value in the set of admissible control Vi' Ui E Vi' 0 ~ i ~ N - 1, depending on the past and current observations on the system Yo , Yl ,... , Yi-l ,Yi and on the past control vectors Uo ,... , Ui-l' Since past controls Uo ,... , Ui-l really depend on Yo ,... , Yi-l' Ui depends on Yo ,... , Yi-l 'Yi . * Thus a control policy c?(u) is a sequence of functions (mappings) cpo , c?l ,..., c?N-l such that the domain of c?i is defined to be the collection of all points
v, E
with
Yj
,
O,s;; j ,s;; 1
where Y j is the set in which the jth observation takes its value, and such that the range of c?i is Vi' Namely, u; = Ui(yi, Ui- 1 ) = c?i(yi) E Vi ." When the value of Ui is determined uniquely from yi, u':", that is when the function c?i is deterministic, we say a control policy is nonrandomized. When c?i is a random transformation from y i, Ui-1 to a point in Vi' such that c?i is a probability distribution on Vi' a control policy is called randomized. A nonrandomized optimal control policy, therefore, is a sequence of mappings from the space of observable quantities to the space of control vectors; in other words, it is a sequence of functions which assigns definite values to the control vectors, given all the past and current observations, in such a way that the sequence minimizes the expected value of J. From (3), using E(·) to denote the expectation operation, the expected value of ] is evaluated as N
E]
=
E
(I
Wk)
lc~l
where Essentially, the method of Fel'dbaum'" consists in evaluating E(Wk ) by
* For the sake of convenience, initially available information on the system is included in the initial observation. t Uo = uo(Yo) = oPo(Yo), ... , u, = U;(yi, Ui- I) = U;(yi, oPo(Yo), ... , oPi-l(yi-I)) = oPi(yi).
24
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
where dx k f':, dx o dX1 '" dXk , dyk-l f':, dyo ... dYk-l, and dU k- 1 ~ duo'" dU k_l, and writing p(x k, yk-l) in terms of more elementary probability densities related to (1) and (2). Since we do not follow his method directly, we will not discuss it any further in this chapter. However, in order to give the readers some feeling for and exposure to his method, we give, as an example in Section6, his method of the treatment of a particular class of parameter adaptive systems. The other method, to be developed fully in this and later chapters, evaluates not R k directly but the conditional mean of W k , E(Wk I y k-l, u k- Z)
=
JWk(Xk , Uk-I) P(Xk , Uk- I Iyk-l, uk- Z) dXk dUk_I
(4)
and generates p(x k I yk, Uk-I)
and P(Yk+l I yk, Uk),
0 :;:;; k :;:;; N - 1
recursively. See (21) and (22) for the significance of these expressions.
b. Notations It may be helpful to discuss the notations used in the book here. In the course of our discussions, it will become necessary to compute various conditional probability densities such as P(Xi+1 I yi). As mentioned before, we are interested in obtaining optimal closed-loop control policies; i.e., the class of control policies to be considered is such that the ith control variable U i is to be a function of the past and current observable quantities only, i.e., of yi and Ui- l only, 0 ~ i ~ N - 1. If nonrandomized control policies are used, * then at time i, when the ith control Ui is to be determined as a function of yi as Ui = 1>i(yi), it is the functional form of 1>i that is to be chosen optimally, assuming 1>i-I are known. In other words, 1>i depends on 1>i-I. Note that even though the function 1>i is fixed, 1>i(yi) will be a random variable prior to time i since yi are random variables. It will be shown in the next section that these 1>'s are obtained recursively starting from 1>N-I on down to
i is expressed as a function of 1>0 ,...,1>i-l , which is yet to be determined. Therefore, it is sometimes more convenient to express Ui = 1>i(yi) as Ui = Ui(U i-l, yi), whereby the dependence of u i on past controls 1>0 ,...,1>i-1 is explicitly shown by a notational abuse of using Uj for 1>j , 0 :;:;; j :;:;; i. Since Ui is taken to be a measurable function of yi,
* It is shown later on that we need consider only the class of nonrandomized closedloop control policies in obtaining optimal Bayesian control policies.
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
25
Of course, one must remember that p(. I yi, u i ) is a function of Ui , among others, which may yet be determined as a function of yi (or equivalently of Ui-1 and yi). To make this explicit, sometimes a subscript 4>i is used to indicate the dependence of the argument on the form of the past and current control, e.g., pq,JXi+l I Xi' yi) = p(Xi+l I Xi'
ic,
= 4>i(yi)).
When randomized control policies are used, the situation becomes more complicated since it is the probability distribution on Vi that is to be specified as a function of Ui-1 and yi; i.e., a randomized control policy is a sequence of mappings 4>0 ,4>1 ,..., 4>N-1 such that 4>i maps the space of observed state vectors yi into a probability distribution on Vi . A class of nonrandomized control policies is included in the class of randomized control policies since a nonrandomized control policy may be regarded as a sequence of probability distributions, each of which assigns probability mass 1 to a point in Vi' 0 ~ 1 ~ N - 1. The question of whether one can really find optimal control policies in the class of nonrandomized control policies is discussed, for example, in Ref. 3. For randomized control policies,
hence P(Yi+1 I yi) is a functional depending on the form of the density function of ui , p(ui I yi). When Ui is nonrandomized, P(Yi+1 I yi) is a functional depending on the value of u, and we write
or simply P(Yi+1 I yi, ui )· The variables Ui or ui are sometimes dropped from expressions such as p(. I yi, u i ) or p(. I yi, ui ) where no confusion is likely to occur. Let (5) be the joint conditional probability that the sequence of the state vectors and observed vectors will lie in the elementary volume dx o'" dx, dyo ... dYi-1 around Xi and yi-\ given a sequence of control specified by 4>i-\ where the notation d(Xi,yi-1)
=
d(x o ,... , Xi ,Yo '''·,Yi-1) (6)
II.
26
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
is used to indicate the variables with respect to which the integrations are carried out. Let (7)
be the conditional probability that the observation at time k lies in the elementary volume dYk about Yk , given Xk • Finally, let (8)
be the probability that the initial condition is in the elementary volume about X o ' Various probability density functions in (5), (7), and (8) are assumed to exist. If not, they must be replaced by Stieltjes integral notations. B.
DERIVATION OF OPTIMAL CONTROL POLICIES
We will now derive a general formula to obtain optimal control policies. At this point, we must look for optimal control policies from the class of closed-loop randomized control policies. a. Last Stage Consider the last stage of control, assuming y N - 1 have been observed and U N- 2 have been determined somehow, and that only the last control variable U N-1 remains to be specified. Since U N-1 appears only in W N , EJ is minimized with respect to U N-1 by minimizing EWN with respect to UN-I' Since (9)
where the outer expectation is with respect to yN-l and U N- 2, R N is minimized if E(WN I yN-l, U N- 2) is minimized for every yN-l and U N- 2• One can write E(WN
I yN-\
U N-
2)
=
J
WN(X N, UN-I) p(X N , U N- I
I yN-I,
U
N- 2)
d(x N, UN-I)
(10)
By the chain rule, the probability density in (10) can be written as p(x N , U N-I
I yN-\
U N- 2) =
I yN-\ U N- 2) x p(x N I UN-I, yN-I)
P(U N- I
(11 )
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
where p(XN
I uN-I, yN-I)
=
27
Jp(XN I X
N - I , uN-I, yN-I)
X
P(XN - I I uN-I, yN-I) dXN _ I
(12)
If the fs and YJ's are mutually independent and for each k, i.e., if i. ,..., ~N-I , YJo ,... , YJN-l are all independent, then, from Eqs. (1) and (2), ~o ,
(13) O~i~k-l
We will use Eq. (13) throughout this section. Developments are quite similar when this Markov property'" does not hold. One merely uses the left-hand side of Eq. (13). See Section 2 of Chapter IV for more general discussions of the Markov property. In particular, in (12),
and (14)
since
U N- I
affects
XN
but not
X N-1 •
Define
Therefore, if one assumes that (14) is available, then (10) can be written as (15)
where AN
~
JW
N(x N , UN-I) p(X N
X P(XN -
I
I yN-I,
U
N- 2)
I XN - I
, UN-I)
d(XN , XN - I )
(16)
In (16), the probability density p(x N I X N- I ,UN-I) is obtainable from the known probability density function for ~N-l and the plant equation (I) under appropriate assumptions on (1). See for example Eq. (27). The second probability density in (16), P(XN-I I yN-I, U N- 2), is not generally directly available. It will be shown in the next section how it can be generated. For the moment assume that it is available.
II.
28
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Thus /ow is in principle computable as a function of yN-l and uN-I, hence its minimum with respect to U N-1 can in principle be found. Denote this minimizing U N- 1 by U"tr-l' Define
Thus, the minimization of EWN with respect to PN-l IS accomplished by that of E(WN I yN-\ UN- 2), which is achieved by taking \ .IS a f unction . PN-l 0~( U N- 1 UN-I' S'mce I\N 0 f y N-l an d U N-l , U N _ 1 IS obtained as a function of yN-l and U N- 2 as desired. See Fig. 2.2 for illustrations of random and nonrandom control policies and the corresponding values of the conditional expectation of WN • In Eq. (15) the expression PN-l(U N- 1 ) represents a probability density function of U N- 1 E UN-l' where the functional form of the density function depends on the history of observation or on yN-I. The functional form of PN-l specifies the probability PN-l(U N-1) dU N_ 1 with which a control in the neighborhood of a point U N-1 is used in the last control stage. However, we have seen that this generality is not necessary, at least for the last control UN-I' and we can actually confine our search for the optimal U N- 1 to the class of nonrandomized control policies; i.e., the value of the optimal control vector U N- 1 will actually be determined, given yN-\ and it is not that merely the form of the probability density will be determined. We can see by similar arguments that the Ui are all nonrandomized, 0 :(; i :(; N - 1. Thus, we can remove U N-1 from
-
* )
* .
>'N ,
f.~~
,
_ _ :-
-/1
--I
i: f x P
' ULtf -!------
i
N,
: ,
,
U~_I
UN_I
'
--i--S(uN-I -u"N-l )
t
-----
,
:
~
:
"
UN_I
Fig. 2.2.
d
UN-I
,
:.
:
N N_'(UN_,)
E(WN
:
i
P (
N-tUN_1
)
""'"'- UN_I
I yN_l)
versus
UN_I'
t If u'N- is not unique, then the following arguments must be modified slightly. By 1 choosing anyone control which minimizes /ow and concentrating the probability mass one there, a nonrandomized control still results.
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
29
the probability density function in Eq. (11) and we can deal with p(x N I yN-l) with the understanding that UN-1 is uniquely determined by
yN-l.
Figure 2.3 illustrates this fact schematically for scalar control variable. A typical PN-l(U) may have a form like Fig. 2.3(a), where UN- 1 is taken to be a closed interval. Optimal PN-l , however, is given by Fig. 2.3(b). A nonrandomized control is such that a point in UN-l is taken with probability 1. If U N-l consists of Points A, B, and C for two-dimensional control vectors, as shown in Fig. 2.4(a), then there are three possible nonrandomized U N- 1 , i.e., U N- 1 given by Point A, Point B, or Point C, whereas a neighborhood of any point in Triangle ABC typically may be chosen with a randomized control policy with probability PN-l(U) du, where du indicates a small area about U in Triangle ABC. This is shown in Fig. 2.4(b).
b. Last Two Stages Putting aside, for the moment, the question of how to evaluate
P(X N- 1 I yN-l), let us proceed next to the consideration of optimal control
(a)
Fig. 2.3.
Schematic representation of randomized and nonrandomized control.
(a)
(b)
I~C ~ Fig. 2.4. Admissible control variable with the randomized and nonrandomized control policies.
II.
30
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
policies for the last two stages of the process. Assume that yN-2 and UN-3 are given. The control variable U N- 2 appears in W N - 1 and W N • Since E[WN_I(X N_I, UN-2)
+ WN(xN, uN-I)]
=
E[E(WN_1
+ W N ly N-2, UN-
3
)]
where the outer expectation is with respect to yN-2, and since a choice of certain U N- 2 transforms the problem into the last stage situation just considered, EJ is minimized by choosing U N- 2 such that it minimizes E(WN_ 1 WN I yN-2, U N- 3) for every yN-2 and by following this U N- 2 by U'Ll . Analogous to (15) we have
+
(18)
where and where AN- 1
g
J
WN-1(X N-I 'U N-2) P(X N-I
I XN-2'
X P(X N-2 I yN-2, UN- 3 ) d(X N_I ,
Also since
E( W N I yN-2, UN- 3 ) yN-2
=
X N-
UN- 2)
(19)
2)
E[E( W N I yN-I, UN-2) I yN-2, UN- 3 ]
C yN-l. This is seen also from p(. I yN-2, UN- 3 )
=
Jp(. I yN-\ UN-2) P(YN-l I yN-2, UN-2) X P(UN-2)d(YN-l , UN-2)
where use is made of the elementary operations (1) and (2) discussed in Chapter I. The optimal PN-2 is such that it minimizes E(WN_ 1 WN * ) where the asterisk on W N is to indicate that ut_l is used for the last control. Now,
+
min E(WN_I
PN-2
+ W N* IyN-2, UN- 3 )
= min[E(WN_1 I y N- 2, UN- 3 ) PN-2 E(WN* I yN-2, UN- 3 ) ]
+
=
min E[WN _ 1 PN-2 E(WN* I yN-\ UN-2) I yN-2, UN- 3 ]
+
=
min E[WN_1
PN-2 YN* I yN-2, UN-3]
+
+ JYN * P(YN-I I UN-2, yN-2) dYN-I] X PN-2 dU N_2
(20)
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
31
where it is assumed that P(YN-l I UN- 2, y N- 2) is available. Defining YN-l by YN-I =
AN - I
+ f YN* P(YN-I I y N-2, UN-2) dYN-I
Eq. (20) is written as . E( W N-I mill PN-2
* IY N-2 + WN ' UN-3) =
. mill
PN-2
f YN-IPN-2 dUN-2
Comparing this with Eq. (I5), it is seen that the optimal control is such * = 0"(UN- 2 - UN*) h * .IS UN- 2 W hiIC h mnumizes .. . t h at PN-2 UN_2 YN-I , - 2 , were and the control at (N - 2)th stage is also nonrandomized. c. General Case
Generally, E('Lf:+l Wi) is minimized by minimizing E('Lf:+l Wi/y k, uk-I) with respect Pic for each yk, Uk- I and following it with pt+1 ,... , Pl;-I . It should now be clear that arguments quite similar to those employed in deriving Pl;-I and Pl;-2 can be used to determine Pic *. Define Yk
Y~+l
=
Ak
+ f Y:+I P(Yk I y k- l, Uk-I) dYk ,
(21)
== 0
where p(YIc I yk-\ Uk-I) is assumed available and where Ak IS gIven, assuming P(Xk-1 I yk-\ Uk-2) is available, by
Ak =
f Wk(Xk , Uk-I) p(Xk I Xk- I , Uk-I) P(Xk-1 I y k- I, Uk1
Then optimal control at time k -
. Y» mm uk_l
=
~k
2
)
d(Xk , Xk- I),
(22)
~N
1, ULI , is Ulc-I , which minimizes Ylc :
'Yk *,
(23)
By computing Ylc recursively, optimal control variables are derived in the order of Ul;-I , Ul;-2 ,... , Uo*. Once the optimal control policy is derived, these optimal control variables are used, of course, in the order of time u o*, u l *,..., Ul;-I. The conditional probability densities assumed available in connection with (21) and (22) are derived in Section 1,C. At each time k, Uo*, ... , UJ:_I and Yo ,... , Ylc are no longer random but known. Therefore, Ulc * is determined definitely since
and ePic is given as a deterministic function.
32
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
°
From (22), Ak = if Wk = 0. Therefore, if we have a final value problem, then Ak = 0, k = 1,2, ..., N - 1 and, from (21), Yk'S are simply obtained by repeated operations of minimization with respect to u's and integrations with respect to y's. From (21) and (23) we have Yk
*
.
mmYk
=
This is precisely the statement of the principle of optimality'" applied to this problem where
To see this simply, let us assume that the state vectors are perfectly observable, i.e., Yi
=
Xi,
Then, the key relation (21) reads
which is the result of applying the principle of optimally to Yk*
=
min
u k _ l' ...• uN_l
E[Wk
+ .., + WN [X k-1]
We have the usual functional equation of the dynamic programming if the {xk}-process is a first-order Markov sequence, for example, if tk's are all independent. Then
When the observations are not perfect, then the arguments of Yk * are generally yk-l and U k- 2• Thus the number of the arguments changes with k. "IN * is computed as a function of yN-l and U N- 2 and, at step k, Yk in Y[+J is integrated out and the presence of U k- I is erased by the minimization operation on Uk-l to obtain Yk * as a function of yk-I and U k- 2• As we will discuss in Section 3, when the information in (y k , Uk-I) is replaceable by that in quantities called sufficient statistics.I" Sk' and when Sk satisfies a certain condition, then the recursion relation for the
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
33
general noisy observation case also reduces to the usual functional equation of dynamic programming
where
Sk
satisfies the relation
for some function 1j;. For detail, the reader is referred to Sections II, 3 and IV,2. Similar observations are valid for recurrence equations in later chapters. C. DERIVATION OF CERTAIN CONDITIONAL PROBABILITY DENSITIES
Equations (21)-(23) constitute a recursive solution of optimal control policies. One must evaluate y's recursively and this requires that the conditional densities h(xi I yi) and Pq,(Yi+l I yi) or, equivalently, p(xi I yi, U i - 1) and P(Yi+l I yi, u i ) are available. * We have noted, also, that these conditional densities are not readily available in general. The general procedure for deriving such densities are developed in Chapters III and IV. To indicate the method, let us derive these densities under the assumption that noise random vectors es and Yj's are mutually independent and independent for each time. Consider a conditional density p(xi+l , Yi+l I yi, ui ). By the chain rule, remembering that we are interested in control policies in the form of Ui =rpi(yi, Ui - 1), 0 ~ i ~ N - 1,
We can write, using (13),
(24)
* Alternately, one can just as easily generate pfx.L, They are related by P(Xi+l
I y"
u i) =
f
P(Xi+l
I Xi
I y i , u') andp(Yi+l I y i ,
, Ui) pi»,
I y,., Ui~l)
dx,
u') recursively.
II.
34
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Thus, from (24),
=
J
p(Xi I yi, Ui- 1 ) P(X'+1 I x, , U,)
X
Hence p( X.
I yi+! Ui)
HI'
=
p(Yi+1 I x,+!) dx,
(25)
i) P(Xi+! 'Yi+~ yi: U P(Yi+l I y" U') f p(Xi I yi, Ui-1 ) P(Xi+l I Xi, Ui) P(Yi+l I Xi+l) dXi
J(numerator) dXi+l
(26)
where the denominator of (26) gives P(YHI I yi, ui) and where p(xi+!lxi, ui) and P(Yi I Xi) are obtainable from the plant and observation equations and the density functions for t i and YJi . The recursion formula is started from p(x o I Yo), which may be computed by the Bayes formula p(x I Yo) o
=
Po(xo) p(Yo I x o)
I Po(xo)p(Yo I x o) dx o
where Po(x o) is assumed available as a part of the a priori information on the system. Equation (26) is typical in that the recursion formulas for r». I yi, U i - 1) and P(Yi+! I y i , ui) generally have this structure for general stochastic and adaptive control problems in later chapters. In the numerator of Eq. (26), P(XH 1 I Xi , ui) is computed from the plant equation and the known density function for ti and P(YHI I Xi+!) is computed from the observation equation and the known density function for YJi' The first factor p(xi I yi, U i - 1) is available from the previous stage of the recursion formula. With suitable conditions 73 •10 9b and
= p(ti)1 J< I p(y, I Xi) = p(YJ,) I I, I
P(Xi+ 1 I Xi' Ui)
(27)
where J< and [; are appropriate Jacobians and where the plant and the observation equations are solved for t i and YJi' respectively, and substituted in the right-hand sides. When t's and YJ's enter into Eqs. (1) and (2) additively, then the probability densities in Eq. (26) can be obtained particularly simply
1.
35
FORMULATION OF OPTIMAL CONTROL PROBLEMS
from the probability densities for t's and 71'S. See Ref. 1 for multiplicative random variable case. For example, if Eqs. (1) and (2) are Xk+l = Fk(Xk, Uk) + gk Yk = Gk(Xk) + 'YJk
then
I J< I = I J~ I =
1
and and are substituted inthe right-hand sides of Eq. (27). Thus, if P(gi)
1
= (27T)1/2
exp (u1
gi2
~?)
and P('YJi)
1
= (27T)1/2
exp (u2
'YJi 2
~;Z)
then
and
Equation (26) indicates clearly the kind of difficulties we will encounter time and again in optimal control problems. Equation (26) can be evaluated explicitly by analytical methods only in a special class of problems. Although this special class contains useful problems of linear control systems with Gaussian random noises as will be discussed in later sections of this chapter, in a majority of cases, Eq. (26) cannot be integrated analytically. We must resort either to numerical evaluation, to some approximate analytical evaluations of Eq. (26), or to both. Numerical integrations of Eq. (26) are nontrivial by any means since the probability density function P(Xi I yi, U i - 1) will not be any well-known probability density in general, cannot be represented conveniently analytically, and hence must he stored numerically. See Appendix IV at the end of this hook and Chapter III for additional details. Also see Ref. 73a. In order to synthesize u.", it is necessary to compute p(xi I yi, U i - 1) by (26) and then to compute '-\+1' to generate P(Yi+1 I yi, u i ), to evaluate E(yt+2 I yi, u i ), to obtain Yi+1 , and finally to minimize Yi+1 with respect to Ui .
36
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Note that the controller must generally remember yi and U i - 1 at time i in order to generate Ui*' Although some of the information necessary to compute U i can be precomputed, i.e., generated off-line, all these operations must generally be done on the real-time basis if the control problem is the real-time optimization problem. If k sampling times are needed to perform these operations, one must then either find the optimal control policy from the class of control policies such that
i = k, k
+ 1,... , N
- 1
where U o* through ULI must be chosen based on the a priori information only, or use approximations so that all necessary computations can be performed within one sampling time. In practice we may have to consider control policies with the constraints on the size of the memory in the controller and/or we may be forced to use control policies as functions of several statistical moments (such as mean or variance) instead of the probability density functions and generate these statistics recursively. For example, ui * may have to be approximated from the last few observations and controls, say Yi-l , Yi , Ui-2 , and U i - 1 . The problems of suboptimal control policies-t-?" are important not only from the standpoint of simple engineering implementations of optimal control policies but also from the standpoint of approximately evaluating Eq. (26). The effects of any suboptimal control policies on the system performance need be evaluated carefully either analytically or computationally, for example, by means of Monte Carlo simulations of system behaviors. We will return to these points many times in the course of this book, in particular in Chapter VII, where some approximation techniques are discussed.
2. Example. Linear Control Systems with Independent Parameter Variations A.
INTRODUCTION
As an application of the optimal control formulation given in Sections 1,Band 1,C, the optimal control policy for a linear stochastic sampled-data control system with a quadratic performance index will be derived. We assume that system parameters are independent random variables, that systems are subject to external disturbances, and that
2.
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
37
the state vector measurements are noisy. These random disturbances are all assumed to have known means and covariances. Specializations of this general problem by dropping appropriate terms lead to various stochastic optimal control problems, such as the optimal control of a deterministic plant with noisy state vector measurements, the optimal control of random plant with exact state vector measurements, and so on. Scalar cases of such systems have been discussed as Examples 2-4 of Chapter I. This type of optimal control problem has been analyzed by means of dynamic programming. 67 ,Bo The key step in such an analysis is, of course, the correct application of the principle of optimality to derive the functional equation. By the method of Section I,B the correct functional equations will result naturally without invoking the principle of optimality explicitly. Consider the sampled-data control system of Fig. 2.5, where the state vector of the system satisfies the difference equation (28a), where the system output vector is given by (28b), and where the observation equation is given by (33): (28a)
where po(xo) is assumed given, (28b)
where Xk
Ak Bk
is an n-vector (state vector), is an n X n matrix, is an n X p matrix,
Fig. 2.5. System with linear random plant, with additive plant disturbances, and with noisy measurement. The sequence of imput signals d, are generated by Eq, (34).
38
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
is a p-vector (control vector), Uk E Uk , where Uk is a subset of E p (p-dimensional Euclidean space) and is called an admissible set of controls, gk is an n-vector (noise vector), Ck is an s-vector (output vector), and M k is an s X n matrix.
Uk
In (28a), A k , B k , and gk are generally random variables, which are assumed to be independent for each k. The {gk} random variables are also assumed to be independent of {A k } and of {B k } . The independence assumption on gk for each k can be weakened somewhat by introducing another random variable Vk such that k
=
0, 1,... , N - 1
(29)
where Ck is a known (n X n) matrix, D k is a known (n X q) matrix, v k is a q-vector, and V k is a random variable assumed to be independent for each k, and independent of A's and B's at all times. Equation (29) is introduced to handle random disturbances on the system which is not independent in k but which may be derived from another stochastic process {vk } which has the desirable property of being independent for each k. * This type of noises is not more general, since by augmenting the state vector Xk with gk' Eqs. (28) and (29) can be combined to give an equation similar to Eq. (28) with an independent random variable as a forcing term. Let
then (30)
where
and where Zk is the generalized (or augmented) state vector. t The random noise in (30), Ok' is independent for each k and of random variables Sk *'The noises fs are analogous to those generated by white noise through a linear shaping filter in continuous time processes. See for example Ref. 98. t See Chapter IV for more systematic discussions of the idea of augmented state vectors.
2.
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
39
and T k for all k. Thus, it is seen that, by augmenting the original equation for the system state vector by another equation describing the noise generation mechanism, it is possible to treat certain classes of dependent noises by the augmented state equation, Eq. (30), on which only independent noises act. Thus, it is no loss of generality to discuss Eq. (28) with independent ~k for this class. Assume that the control problem is to make the system output follow the desired output sequence {dk } as closely as possible, measured in terms of the performance index J:
J
N
=
I
1
Wk(e k , Uk-I),
(31)
where Wk is a functional which assigns a real number to each pair of an error vector ek ('\, dk - Ck and Uk-I. For example, Wk may be a quadratic form in ek :
where V k is a positive symmetric (s X s) matrix, and a pnme denotes a transpose. The feedback is assumed to consist of (33)
where Yk is an m vector (observation vector); i.e., the controller does not observe X k directly but receives Yk where YJk is the random observation error. In most control situations, the desired output sequence {dn } is a sampled sequence of a solution to some linear differential equation on which some noise is possibly superimposed. Assume that {dk } is generated by gk+l
=
Fkg k + c.i,
dk
=
Hkg k
(34)
where gk Fk
Gk
Sk
tt,
is is is is is
an an an an an
m' vector, (m' X m') matrix, (m' X r) matrix, r-dimensional random vector independent for each k, and (s X m') matrix.
40
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Since most deterministic signals are solutions of linear differential or difference equations or can be approximated by such solutions, the class of desired output sequences described by (34) is fairly large. It is possible to combine Eqs. (28) and (34) into a single equation. Define (35)
Then (36)
where r-i
.!:jk
= (10
and the generalized output of the system is given by (37)
where
The performance index for systems described by (36) can be expressed as a quadratic from in X by defining a new V k appropriately when W's are quadratic in (31). For example, since ek
=
dk - Ck
=
Hkg k - MkXk (-M k , Hk)X k
=
Letting the new V k be
one can write (Xk'VkXk) instead of (ek'Vkek)' where the new V« again is positive symmetric with dimension (m' n). * Thus, by suitably
+
* For those not familiar with operating with partitioned matrices, see for example Gantmacher. Baa
2.
41
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
augmenting the state equation for the plant, it is possible to incorporate the mechanisms for dependent noises and/or input signals and the control problems can be taken to be the regulator problem, i.e., that of bringing the (augmented) state vector to the origin in the state space. Since we are interested in closed-loop control policies, the control at the kth sampling instant is assumed to depend only on the initially available information plus yk and U k-1 and on nothing else. We see from the above discussions that the problem formulation of this section with the system of (28) observed by (33) is not as restrictive as it may appear at first and is really a very general formulation of linear control systems with quadratic performance indices. It can cover many different control situations (for example, by regarding (28) as the state equation for the augmented systems). With this in mind, we will now discuss the regulator problem of the original system (28). In the development that follows, Wk of the performance index is taken, for definiteness, to be
B.
PROBLEM STATEMENT
Having given a general description of the nature of the problem, we are ready to state the problem more precisely. The problem is to find a control policy U N-1 such that it minimizes the expected value of the performance index E] where U i given by
E
Vi' 0
~
i ~ N - 1, and where the performance index N
] =
l:>k'VkX k 1
IS
N-l
+L
(38)
Uk'PkUk
0
where Vk's and Pk's are symmetric positive matrices, and where the system's dynamics is given by k
where Po(xo) is given and where A k , B k with
,
= 0, 1,... , N - 1 and
tk
(39a)
are random variables
i = 0, 1,0,0, N - 1
(39b)
42
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
It is assumed that gk's are independent of all (A k , B k ), that gk and (A k , B k ) are independent for each k, and the system is observed by k
=
0, 1, . , N - 1
(40)
where E(7Jk) = 0, E(7Jk7Jk') = R k , and for simplicity of exposition 7Jk is assumed independent for each k and of all other random variables gk and (A k , B k ) , k = 0, I, ... , N - 1. R's and Q's are assumed known. The situation where fs and 7J's are not independent can be treated also. See for example Section 3,E. We have seen in the previous section that this problem statement can cover situations with dependent noise, input signal dynamics, and others by considering Eq. (39) as the equation for the augmented state vectors, if necessary. Various conditional probability density functions and moments are all assumed to exist in the following discussions.
C.
ONE-DIMENSIONAL EXAMPLE
Before launching into the derivations of the optimal control policy for the problem, let us examine its simpler version of the one-dimensional problem so that various steps involved in arriving at the optimal control policy are made clear. In this way, we will avoid getting lost when we deal with general vector cases. The one dimensional problem is given with the plant equation
°
~ i ~ N -
1,
u,
(41)
E ( - 00. 00)
and the observation equation Yi
=
Xi
where (Xi' fJi , gi, and 7Ji' pendent random variables. It is assumed that E(ai)
+ TJi,
°
~
(42)
0~i~N-1
i
~
N - I, are assumed to be inde-
a,
(43a)
E({3i) = b,
(43b)
E(gi)
=
=
E(TJi)
=
0,
O~i~N-l
(43c)
and that the random variables all have finite variances. Take] to be (44)
2.
43
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
Then, according to the development in Section 1,B, in order to obtain must compute
ut-l , one
f
=
2 XN p(x N
I CXN-I , (3N-I , tN-I'
X P(CX N-I , (3N-I , tN-I' X d(x N , XN-I , =
f
(CXN-IXN-I
XN-I
XN-I , UN-I)
I yN-I)
CXN-I , (3N-I , tN-I)
+ (3N-IUN-I + t N_ I)2
X P(CX N-I ,
(3N-I , tN-I'
XN-I
X d(XN_ I ,
CX N-I , (3N-I ,
tN-I)
I yN-I) (45)
By the independence assumption, one has, in (45),
o~ and YN
=
f
[(aN-Ix N-I
X P(X N-I
+ bN _ I UN_ I )2 + a~_lx~_1
I yN-l)
+ .E~_lU~_1
i
~
N -
I
(46)
+ q~-I] (47)
dX N_I
where (48a)
.El var(ti) = ql, var((3i) =
(48b) O~i~N-I
(48c)
Let (49a)
and O~i~N-I
(49b)
These p,'s and zl's may be computed explicitly with the additional assumptions on the random variables. For example, if these random variables are all assumed to be Gaussian, then they can be computed as in the examples of Section 3. From (47)-(49),
(50)
44
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Assuming fLi and Lli are independent of to U N-1 to give
Ui , YN
is minimized with respect (51)
where (52)
and
. mmYN = UN_I
YN *
(53)
where /
I
£:,
=
L'2
L'~_I
N-I
+ b2
2
N-I
aN -
I
+
2
(54a)
UN-I
and (54b)
The expression for YN * can be put slightly differently, retaining the conditional expectation operation E(· I yN-l) in the expression for YN*' In this alternate form, (47) can be written as (55)
where (56)
One can easily check that Eqs. (53) and (55) give the same value for since
YN*
E[/lX;"-I
+ VI I yN-l]
+ .1~_lJI = IIiL;"-I + PI =
/liL~-I
+ VI
Having obtained some insight into the problem, we will treat the general case next.
D.
OPTIMAL CONTROL POLICY
As discussed in Section I,B, in order to determine the optimal control policy for problem one must first compute Ak , I ~ k ~ N: .\.k =
E(Wk I yk-\
U k-2)
(57a)
2.
45
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
where p(X k I yk-l, Uk- 2) =
J
p(X k I Xle-I , Uk-I' A k- I , B k- I , glc-I)
X
P(Xlc-1 , A k- I , B k- I , gk-I I yk-l, Uk- 2 )
X d(Xk_l
,
Ale-I, B lc-I , gk-I)
(57b)
By the independence assumptions of the random variables Ai , B; , and YJi , 0 :(: i :(: N - 1, one can write
Therefore,
x;
=
~i
,
J[xlc'VleXle + U~-IPk-IUk-]] X p(x k I Xk- I , Ulc-I , A k-I , B le-I , gk-I)
X P(Xk- 1 I yle-l, Uk- 2 ) p(A lc-1 X
,
B k- I) P(gk-I)
d(x le , Xk- I , A lc-I , B k- I , gk-I)
(59)
To obtain U}:;_I , AN is evaluated first. Since the mean of ~N-I is zero by Assumption (39b), the contribution of (xN' V NXN) to AN is given by
J(Xk'VNXN)P(XN I XN- I' UN-I' AN-I, B N-I, gN-I) P(XN- I I yN-I) X p(A N-I, B N- I) P(gN-I) d(x N 'X N-I' AN-I' B N-I, gN-I) =
J[(AN-IXN- I + BN-IUN-I)' VN(AN-IXN-I + BN-IUN-I) + E!;(f vNg)] X
p(A N- I , B N- I) P(XN-I I yN-I) d(XN- I , AN-I, B N- I)
(60)
r
where E!; is the expectation operation with respect to Denoting by a bar the expectation operation with respect to the random variables A and B, we have, from (39b), (59), and (60),
(61 )
By minimizing (61) with respect to UN-I' the optimal UN- I is given by (62)
46
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
where (63)
and where (64)
In (63), the superscript plus denotes the pseudoinverse. The pseudoinverse is discussed in Appendix II at the end of this book. If the inverse exists, for example, if P N - 1 is assumed to be positive definite, then the pseudoinverse coincides, of course, with the inverse. See Appendix A and B at the end of this chapter for further discussion of the pseudoinverse and its applications to minimizations of quadratic forms. Substituting (62) into (61) and making use of (B.9) in Appendix B,
(65)
where
and
The first term in Eq. (66c) is independent of X N-1 and yN-l by assumption on ( N - l ' The second term will be independent of past controls if the estimation error (X N-1 - fLN-l) has a conditional covariance matrix independent of X N-1 and yN-\ for example, if X N-1 - fLN-l is normally distributed (see Section 2,F). To obtain U':i-2 , we compute
where use is made of the property of the conditional expectation (68)
2.
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
47
We encountered this relation earlier in Section I,B. Proceeding as before, noting that now (VN- 1 II) corresponds to V N , PN-2 to P N- 1, etc., the development from (60) to (66) is repeated to give
+
(69)
and (70)
where (71)
and where (72a)
and V2
g
VI
+ tr[(VN_ 1 +I1)Q N- 2] + E[(X N_2 -
X 7T 2(XN-2 -
JLN-2)'
JLN-2) I yN-2]
(72d)
In general, O~i~N-l
and O~i~N-l
(73) (74a)
where (74b)
and where (75a)
and VN-i
g
VN-i-l
+ tr[( V i +1 + I N-i-l)Qi]
+ E[(Xi -
JL, , 7T N-i(Xi - JLi)) I yi]
(75d)
48
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
When fL/s are computed explicitly as a function of yi and Ui~\ Eqs. (73)(75d) solve the proposed problem completely. Equations (74) and (75) show that the feedback coefficients A are computable before the control process begins, i.e., off-line since they do not depend on the previous control vectors nor on the observation vectors. Note also that A's are not random. Computations of fL's generally must be done on-line. They are computed later in Section 3 with the additional assumptions that the noise are all Gaussian random vectors. Figure 2.6 shows the configuration of the optimal control system. In terms of fL, (73) can also be written as (76a)
where
(76b)
and where
(76c)
is the conditional covariance matrix of
Xi .
RANDOMLY VARYING PLANT WITH ADDITIVE N O /
,------
I
I
OPTIMAL ESTIMATOR
Fig. 2.6. Optimal controller for the stochastic system of Fig. 2.5 with noisy state vector measurement. See Fig. 2.8 for the schematic diagram of the optimal estimator. = [P k + Bk'(Vk+1 -+- IN_k_dBkJi B/(VH ated by Eq. (75).
.11 k
1
+
IN __ k_l)A k; {lil. i = I, ... , N gener-
2.
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
49
When the state vectors can be observed exactly, E(Xi I yi) reduces to Xi and the term E[(Xi - fLi)' '71ixi - fLi) I yi] vanishes in the equation for Vi . Replacing fLi by Xi in (62)-(76), the optimal control vectors with the exact state vector measurements are given by O:S;;i:S;;N-l
(77)
where Ai is the same as before and is given by (75c) and (78)
where (79a)
with (79b)
Figure 2.7 is the optimal control system configuration with no observation noises. Thus, as already observed in connection with a simple system of Example 4 of Chapter I, the effect of additive noise t to the system is merely to increase y* by ii. When the performance index is given by (80)
RANDOMLY VARYING PLANT
I" - - - - - - - - -- - l
.-
I
;-1_ _-.1
I I Ix
I I
Ck k +1
I
I
UNIT DELAY 1
L
Fig. 2.7.
I I I .J
Optimal controller for the stochastic system of Fig. 2,5 when the state
vector measurement is exact, A k = -[Pk {Ii}, i = 1, .,', N, generated by Eq. (75).
+ Bk'(Vk+l ~-
IN~k-_l)Bk]t
Bk;(Vk : 1 + IN_k_1)A k;
50
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
rather than by (38), the recursion formula for y* is obtained by putting all Pi equal to zero. Then, from (75c), the optimal control policy with the new performance index of (80) is given by (74a) and (75c) with Pi 0. In particular,
UJr_l
(81 )
and
Unlike the previous problem, where Pi =F 0, this problem permits a certain simplification and one can write (83)
where PI =
Generally, with Pi
A N _1 =
0,
-
BN_l(B~_1
°
~
VN B N _ 1 ) + (B~_1
VNA N _ 1)
(84)
i ~ N - 1, (85)
where
(86)
Equations (74) and (75), which define recursively the optimal feedback control coefficients and the optimal criterion function values, can be put in a little more transparent forms when the system parameters A's and B's are deterministic and fs and Y)'s are the only random variables. From (74a) and (75c), we write Ai as (74a-l)
where (74a-2)
and where we give a symbol L N - i to Defining Ii by
Vi
+ I N-i for
ease of reference. (75a-1)
we have from (75a) and (75b) Ji
=
L N _ i_1(I - BiNi)
(75a-2)
2.
51
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
and (75a-3)
The recursion formulas (74a-2), (75a-2), and (75a-3) for N's, ]'S and L's are initiated by putting 10 = 0 or equivalently 1N = O. Then from (75a-3) From (74a-l) and (74a-2) AN - I
=
NN-IAN-I
=
(P N -
I
+ B~-IVNBN-IrB~-IVNAN-I
which is in agreement with (63), taking note of the fact that A N B N-I are now deterministic by assumption. By using (75a-2), we have IN-I
=
Lo(I -
I
and
BN-INN-l)
and from (75a-3) Now, N N-2 , 12 , and L 2 etc. are determined in the orders indicated. Later in Section 3 of this chapter as well as in Chapter V, we will encounter a similar set of recursion equations in expressions for conditional means and conditional covariance matrices of certain Gaussian random sequences. We will postpone the discussions of the significance of this similarity until then. E. CERTAINTY EQUIVALENCE PRINCIPLE
If we consider a plant with nonrandom plant parameters and if es and Y)'s are the only random variables in the system, then the bars over the expressions for Ai , Ii , and 7Ti in (75) can be removed. Since these quantities are independent of the plant noise process {ti}, and of the observation noise process hi}' they are identical to the ones derived for a deterministic plant with no random input and with exact state vector measurements. As observed earlier in connection with (58), (66), (74), and (75), {t i } and {Y)i} processes affect only Vi and E(x i I yi). Since the optimal control vectors are specified fully when E(xi I yi) are given, the problem of optimal control is separated into two parts: the estimation of the state vectors, given a set of observation data; and the determination of proper feedback coefficients, {Ai}, which can be done from the corresponding deterministic plant.
II.
52
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
If A's are random but B's are deterministic in the plant equation, then Ai , Ii , and 7Ti are the same as the ones for the equivalent deterministic plant The procedure to obtain control policies for stochastic systems by considering the optimal control policies for the related deterministic systems where the random variables are replaced by their expected values, is called the certainty equivalence principle.49.136a One may speak of a modified certainty equivalence principle when the random variables are replaced with some functions of their statistical moments. For systems with random A's and deterministic B's, their optimal certainty equivalent control policies are the optimal control policies for the same class of stochastic systems with Yi = Xi , i.e., when the Xi are observed exactly and when E(qi) = 0, ~ i ~ N - 1, or if Yi oj::. Xi' then Xi is replaced by E(x i I y i ) . When A's and B's are both random, the optimal certainty equivalent control policies are optimal control policies for the deterministic system with the plant equation
°
For example, with N
] =
I
1
u;* = -[Pi
[X/ViX i
+ U;-lPi-1U i- 1]
+ B/(Vi+l + IN-i)B i]+ B/(Vi+l + IN-i)Ai E(Xi Iyi)
where
+ IN-i-1)Ai - A/(Vi+l + I N- i- 1) Bi[Pi + B/(Vi+l + IN-i-1)B i]+ B/(Vi+l + IN-i-1)Ai
I N-i = A/(Vi+l X X
Since B/(Vi+l
+ IN_i_1)B; oj::. B/(Vi+l + IN-i-1)B i
the optimal certainty equivalent control policy is not optimal for this class of stochastic control systems.
F.
GAUSSIAN RANDOM VARIABLES
It has been assumed in connection with Eq. (74) that quantities
are independent of
Xi
and y i •
3.
53
SUFFICIENT STATISTICS
Two sufficient conditions for this to be true are: (a) All random variables in the problem have a joint Gaussian distribution. (b) The plant and observation equations are all linear. This will be shown by computing the conditional error covariance matrix E[(xi - fLi)'(Xi - fLi) [yi] explicitly under Assumptions (a) and (b) in the next section, Section 3. See Appendix III at the end of this book for brief expositions of Gaussian random variables.
3. Sufficient Statistics We have seen in previous sections that Uk is generally a function of yk and not just of Yk . From (21) and (22) of Section I,B, we note that this dependence of Uk on yk occurs through P(x k I yk) and P(Yk+l I v". Uk) in computing y's. Intuitively speaking, if a vector Sk' a function of yk, exists such that P(Xk I yk) = p(x k I sic), then the dependence of Uk on past observation is summarized by Sic and optimal Uk will be determined, given Sk and perhaps Ylc without the need of additional knowledge of ylc-l. Such a function of observations is called a sufficient statistic.:" See also Appendix IV. We discuss two simple one-dimensional examples first. Those readers who are familiar with matrix operations and Gaussian random variables may go directly to Section 3, C.
A.
ONE-DIMENSIONAL EXAMPLE
To show that such a function exists and to see how it helps simplify the control problem solution, consider a scalar control system with a plant equation
o~ and the observation equation Yi
= hixi
+ 7Ji ,
hi
i
~
* 0,
N -1,
UiE(-oo,oo)
0 ~ i ~ N - 1
(87)
(88)
Take as the performance index a quadratic form in x and u,
] =
N
I
I
(ViXi
2
+ ti_1uL),
(89)
II.
54
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
where a i and b, are known deterministic plant parameters and where fs and YJ's are assumed to be independent Gaussian random variables with E(ti) = E('r)i) = 0, E(ti 2) = qi2 > 0, E('r)i 2) = r i2 > 0, E(ti'r)j) = 0,
O~i~N-I
(90a)
O~i~N-I
(90b)
O~i~N-I
(9Oc)
all i and j
(90d)
Assume also that X o is Gaussian, independent of fs and YJ's with mean ex and variance a 2 • This system is a special case of the class of systems discussed in Section 2. Now
where fLo l/a o2
= =
(ex/a 2 + hoyo/ro2)/(I/a2 + ho2/ro2) l/a
2
+
(92a) *
ho2 /ro2
(92b)
From (26) of Section 1,C, P
(
x. HI
I i+l) Y
=
f p(xi I yi) p(xi+1 I Xi, U i)P(Yi+l I Xi+1) dXi P(Yi+1 I y i)
(93)
From (88), (90a), and (90c), P(Yi I Xi)
(y - hX)2)
const exp ( -
=
'2r. 2"
,
(94)
From (87), (90a), and (90b), p(x i+1 I Xi 'U i )
=
const exp ( -
(Xi+1 - aix i - biui )2) 2qi2
(95)
We will now show by mathematical induction on i that p(xi I yi, Ui- 1 )
=
const exp (_ (Xi2-:/'i)2 )
(96)
holds for all 0 ~ i ~ N - 1 with appropriately chosen fLi and ai' This relation is satisfied for i = 0 by (91). Substituting (94)-(96) into (93) and carrying out the integration with respect to Xi , P(Xi+l I yi+l}
=
const exp ( _
(xi+12~
{
* If p(Xo) = 3(x o - a), i.e., if we are absolutely sure of the value of xo, then E(x o I Yo) i.e., the measurement Yo does not change our mind about Xo = a.
= a,
3.
55
SUFFICIENT STATISTICS
where where (98a)
and where h7+l(qi + alUi ) + r7+1 r;+l(qi 2 + ai2ui 2 ) 2
2
(98b)
Thus (96) is established for all i = 0, 1,... Note that fLi and ai 2 in (96) are the conditional mean and variance of Xi' respectively, given yi and U i - 1. Ki+l can also be expressed as Ki+l = U7+1 hi +l /r 7+1 . Equation (96) shows that (fLi' a i 2 ) are sufficient statistics and contain all information carried by (yi, Ui~l). Equation (97) shows that the equation satisfied by the sufficient statistic fLi is composed of two parts: the first part is the same as the dynamic equation of the plant, and the second part constitutes a correction term proportional to Yi+l ~ hi+l(afLi bu i) which may be interpreted as the difference between the actual observed value of Xi+l and the predicted value of the observation based on the estimate fLi of Xi' hi+l(afLi bUi). Note that yi and Ui~l are replaced by fLi and ui 2 and that a's are computed from the knowledge of the noise variance and are constants independent of the observations and controls. In other words, a's can be generated off-line. We are now ready to determine U~~l • As usual we first compute from (87), (89), (90), and (96):
+
+
=
J
(V NXN
X
=
2
P(X N-1
J
+ tN-IU~-I) I yN-I)
[vN(aN-IxN-I
X
P(XN-1
p(x N
I X N-1 ,
UN-I)
d(x N , X N-1)
+ bN _ 1U N _ 1) 2 + VNq~-1
+ tN-IU~~I]
I yN-I) dXN_1 (99)
II.
56
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
where (100)
Minimization of (99) with respect to
UN-l
gives (101)
where (102)
Substituting (101) into (99) gives YN* =
CN
+ TNfL~-1
(103)
where (104a) (104b)
To compute Ut-2 , one must compute YN-I . AN _ 1 is given by computations similar to (99):
The probability density P(YN-I I y N-2), necessary to evaluate E(YN *jy N-2), is given from (88) and (96) by the Gaussian probability density with mean h N- 1(aN- 2P-N- 2 bN- 2UN- 2) and variance
+
From (97) and (103), YN* is seen to depend on YN-I only through since P-N-2 are functions of y N-2 and UN-3. We have
P-N-I ,
(106)
and
Therefore, from (103), (106), and (107), E(YN*
l
y N- 2)
=
+ T N{(aN-2fLN-2 + bN_2UN_2)2 + K~_I[h~_I(a~_2a~_2 + q~-2) + r~-l]}
CN
(108)
3.
57
SUFFICIENT STATISTICS
From (105) and (108) YN-I
=
A. N _ I
=
CN- I
+ E(YN* I yN-2) + tN-2U~-2 + (VN-I + T N)(aN-2iLN-2 + bN_2UN_2)2
(109)
where CN - I
D N- I
=
+ C + TNK~_I[h~_I(a~_2a~_2
+ q~-2)
N
Therefore, by minimizing (109) with respect to
+ T~_I]
(110)
U N- 2 ,
where and where _
T
N-I -
+
(VN-I T N)tN-2a~_2 t N_2 + (VN_I + T N)b7v_2
The above process is perfectly general and one has u," t
where J1. = •
Yi*
=
= -Ji·u·e
(Vi+1 + T i+2)biai t; + (v i+1 + T i +2 )bl ' C, + T iiLi 2
where C,
=
D,
C N +1
=
0
(IlIa)
~j
o ,e:;; i
,e:;; N - 1, T N +1 = 0 (ilIb) (1I Ie)
+ CHI + Ti+1Ki2[hi2(aLaLI + qL) +
Ti
2
J
(IIId) (IIIe)
and TN+!
= 0, I,e:;; i ,e:;; N
(lllf)
When t i = 0, 1 ~ i ~ N - 1, in (89), from (Ill b), £1 0 = ao/bo, and, from (Ll l a), aofLo bouo* = O. More generally, Ai = ai/bi and aifLi biu i * = 0 for all 0 ~ i ~ N - 1. Therefore, from (97),
+
+
O~i~N-I
II.
58
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
and, from (111 a), O~i~N-l
is the optimal policy for the system with the plant equation (87) and the observation equation (88) and the criterion function] = L:f V iXi 2; i.e., with this criterion function u i * is proportional to Yi More discussions on this point are found in Section 3,D. B.
ONE-DIMENSIONAL EXAMPLE
2
As a special case of the above example, consider a system bci=O, Yi =
Xi
+ "t« ,
UiE(-OO,
OO )
(112) (113)
O~i~N-l
°
where a and b are known constants and where r/s are independent random variables with E(TJi) = 0, var(TJi) = ri 2 , ~ i ~ N - 1, and where X o is a random variable independent of everything else. This is the system discussed as Example 3 of Chapter I, with] = XN 2• There we have obtained the optimal control policy in terms of the statistics Et», I yi)
=
iLi and
var(x i I yi)
=
al,
0
~
i
~
N
without indicating how they may be computed. With the additional assumption that X o and TJ'S are Gaussian, O~i~N-1
the result of Example 1 of this chapter can be used to compute these statistics. Namely, p.:s and a's are sufficient and can be computed as I
~
i
~
N - 1 (114a) (1I4b)
(1l4c)
O~i~N-l
l/ai
2
l/a0 2
2
=
l/r i
=
l/a 2
+ 1/(a + l/r 0
2aLl)'
2
l~i~N-l
(1l4d) (114e)
When ] is given, -\ is computable since p(x i I y i ) is known as a Gaussian probability density function with the conditional mean fLi
3.
59
SUFFICIENT STATISTICS
and the conditional variance ui 2 • The conditional probability density P(Yi+1 I yi, ui ) needed to compute E(yft2 I y i ) is obtained analogous to P(fJ'N-1 I yN-2) given by (l06) and (107) or independently as follows. From Eq. (I 13), hence or Similarly From Eqs. (112) and (113), Yi+l
=
aYi
+ bu, +
?)i+l -
a?);
where Yi+1 is a Gaussian random variable since it is a linear combination of Gaussian random variables. Now
+ bu, + B(?)i+1 I y i ) = aYi + bu, - a(Yi - f-Li) = af-Li + bu,
B(Yi+l [yi, u i) = aYi
aE(?)i I y i )
(115)
because Also hence (116)
Equations (115) and (116) determine P(Yill I yi, u i ) completely. The reader is asked to compare the effectiveness of the optimal closed-loop control policy for the system of Example 2 with that of the optimal open-loop control policy (i.e., the controls are function of Yo only) with a quadratic criterion function. What difference do these two policies make in the values of E(] I Yo) ?
C.
EXAMPLE
3.
UN CORRELATED GAUSSIAN NOISES
The above examples can be extended to a system with a vector difference equation as the plant equation (117)
60
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
and the vector observation equation Yi
+ 7Ji
Hi»,
=
(118)
where X o is assumed to be a Gaussian random variable with E(xo) = a, cov(x o) = Eo, where fs and Yj's are Gaussian random variables independent of X o with E(gi)
=
0
E(7Ji)
=
0
E(gig/)
= 0i;Qi
E(7Ji7J/)
=
0i;Ri
E(gi7J/)
=
0
(119) for all i andj
where Qi and R i are assumed to be positive definite. The last assumption on the independence of fs of Yj's is made to simplify computations and can easily be removed with some additional complications in the derivations. This is indicated in Section 3,E. To derive optimal control policies, one must first compute N
(120)
P(Xi- 1 I yi-1, U i- 2) dX i_1
(121)
1 ~i
where P(Xi I y i - 1 ) =
f p(x i
1
Xi-I'
U H)
~
where P(Xi- 1 I yi-\ U i- 2) is computed recursively by (26) of Section 1,C. Actually, one could just as well derive the recursion relation for P(Xi+1 I yi) rather than for P(Xi I yi). See (149) in Section 3,E for derivation. The recursion process is initiated by computing (122)
where by assumption Po(xo)
and where the notation
const exp(-
=
til X o - a 1110 , )
II z II} /':, z'Sz is used.
p(Yo 1x o) = const exp( -
(123)
From (119),
t II Yo -
Hox o Ilh ')
(124)
From (123), p(xo I Yo) is seen to be a Gaussian: p(x o IYo)
=
const exp( -
til X o -
fLo 11~01)
(125)
3.
61
SUFFICIENT STATISTICS
where (126)
and (127)
The detail is carried out in Appendixes C and D at the end of this chapter. See also Refs. 84-86, 141. Assume that P(Xi I yi) has also a Gaussian density: p(X i
= x I y i) = const exp( - t II x -
fLi llh I )
(128)
where fLi L E(xi I yi) and r, Q cov(xi I yi). The variable Vi is therefore the conditional mean and i is the conditional covariance matrix of Xi , given the past and current observation Yo ,..., Yi . This is certainly true for i = 0 from (125). From (26),
r
J
= canst exp( - t
(129)
E i ) dx,
where
(130)
SInce
and P(Yi I Xi)
= const exp( - til Yi - H,», Ilh;)
After carrying out the integration, which is shown m detail in Appendix C, one has P(Xi+l I y i +1)
= const exp( - i II Xi +1 -
(131)
fLi+l l~i-~I)
where fLi+l
= =
+ Bsu, + ri+lH:+lRZt.\[Yi+l ri+l(Qi + AiriA/)-l(AifLi + Biu i)
AifLi
Hi+l(AifLi
+ Biu i)] (132a)
62
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
and where
(132b) This completes the mathematical induction on i and Eq. (128) has been established for i = 0, 1,.... Figure 2.8 shows a schematic diagram of a filter which generates fLi . Note that in (132a) the terms AifLi Biu i show that the fLi+! satisfies the same dynamic equation as the plant equation plus a correction term proportional to Yi+! - Hi+!(AifLi Biu i) which may be interpreted as the error in predicting Xi+! based on the estimate fLi of Xi . An alternate expression for the constant multiplying the correction term in (l32a) is given by (C20) in Appendix C. Thus we have seen from (132a) that fLi+l is computable given fLi 'Yi+l , and U i . Using (132b) and the matrix identity in Appendix D T i +! is computable from T i • Hence, fLi and T i are sufficient statistics for Xi and summarizes all available information contained in y i - 1 on Xi . There is another way of obtaining the recursion formula for the sufficient statistics. To do this we first obtain the expression for P(Xi+l I y i) in terms of p(x i I yi). The detail is also found in Appendix C.
+
+
P(Xi+l I yi) =
const exp( - til
Xi+l -
AifLi -
Biu i
Ilk!)
(133)
-'-l I I I I
,, , ,,, , I
,
L.
._ • __ oJ
SYSTEM IDENTICAL TO PLANT Y,'+I
-;----+i K i+1 1-
--'
_._._._J CONDITIONAL MEAN GENERATOR
Kj=rjH:Rjl L;=I-KjH j
Fig. 2.8. Schematic diagram of the conditional mean generator (Wiener-Kalman filter). This is the optimal estimator in Fig. 2.6.
3.
SUFFICIENT STATISTICS
63
where (134)
The last expression is obtained using the matrix identity in Appendix D, or more directly knowing that p(x i +1 I yi) has a Gaussian distribution. Since the conditional mean of x i +1 is given by
(135)
where the independence assumptions of ts and y/s and (119) are used in putting the last term equal to zero. To obtain the conditional covariance, we compute
(136)
since from the independence assumptions of ts
We see, therefore, that M i +1 of (134) is given by (136) From (135) and the recursion relation of J-ti given by (132a), the recursion formula for Vi is simply given by
where (138)
Note that, since T/s do not depend on particular y's nor on u's, they can be precomputed if necessary. We see that the derivation of P(xi+1 I yi+1) by first obtaining the conditional mean and the conditional variance of P(Xi+l I yi) by (135) and (136) and then by making use of Bayes rule
yields the relations of (l32a) and (l32b) without too much manipulations of matrices.
64
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Duality Principle The set of recursion relations to generate fJ-'S and T'e bears a remarkable resemblance to that given by Eq. (75a-l)-(75a-3) at the end of Section 2,D. The problem discussed in this section is that of the filtering, i.e., to obtain the conditional mean and conditional variance associated with the Gaussian probability density function where the constant associated with the correction term (called the filter gain), K; and r i are generated by the following set of equations [see Eqs. (134), (Cl7), and (CI8)]
K, = MiH/(R i
+ HiMiH/)-l
= Qi + AiriA/ P, = M;(I - H/K/)
M i +1
where the system equations are given by (117) and (118) where g's and YJ's are Gaussian random variables with moments given by (1l9). The regulator problem discussed in Section 2,D is that of minimizing the quadratic criterion function where the plant equation is given by
The result of the minimization is expressed by n
L (x/c'V/cx/c + U~-lP/c-lU/c-l)
min
ni_l' ...•uN_I i
where I N-i L N-i I,
g g =
n, g
A/IiAi Vi A/IiAi LN_i_1(I - BiNi) (Pi + B/LN_i_lBi)+B/LN_i_l
+
Therefore, by comparing these two sets of equations we can establish the correspondence between these two problems as follows Ki~N/
Hi~G/
Mi~LN-i-l
Qi~
r,«: Ii
Vi
Ri~Pi
Ai~F/
This correspondence is sometimes referred to as the duality principle.v'- Making use of this principle, whatever results we obtain
3.
SUFFICIENT STATISTICS
65
for regulator (filtering) problems can be translated into the corresponding results for filtering (regulator) problems.
D.
PROPORTIONAL PLUS INTEGRAL CONTROL
Using the sufficient statistics just derived, the optimal control policy for the system of (117) and (118) can now be given explicitly with the criterion function
] =
N
I
1
Wi(X i , U i - 1),
The optimal control policy for this system has already been derived in Section 2 if we take A k , B k , and H k to be deterministic with the additional assumption that tk and Y)k are Gaussian random variables with mean 0 and covariances of (119). Then u i * = -AifLi of (74a) still gives the optimal control policy where fLi is now given explicitly by (132a). Bars over various matrices can be removed in the expression for Ai since A, B, and H are assumed deterministic in this section. Therefore, the optimal controller has the structure shown in Fig. 2.9, where fL-generator has the structure shown in Fig. 2.9. Since t and Y) are now assumed Gaussian, what appeared as an assumption in Section 3-that (Xi - fLi) has a conditional covariance matrix independent of Xi and yi-is now one of the properties of the Gaussian random variables and this constant is T i of (132b). Since fLo is proportional to Yo when .Eo1 is a null matrix,
OPTIMAL CONTROLLER -- ---- - -- ---,
,,r------------- -
,,
,,I
,,I
,
I
: I I
:
,,
I
I
,
Ui _ 1
:
p.-GENERATOR
~-
Fig. 2.9. Structure of optimal controller for stochastic system with linear plant and observation equations.
66
II.
one has AofLo
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
+ Bouo* = =
AofLo - BoAofLo = (A o - BoAo)fLo (A o - BoAo)ToHo'Rr/yo
°
Therefore, it is' easily seen that P« is linear in y i , ~ i ~ N - 1. The assumption that .E 1 is null implies no a priori knowledge of X o . Generally, E(xi I yi) is some (measurable) function of yi. When Gaussian random variables are involved, we have just seen that E(x i I yi) turns out to be linear in yi. This fact may be used to construct an approximation to E(x i I yi) when the random variables are not Gaussian. From the recursion formula for fL, (l32a), with the optimal control u i * given by -AifLi' (l32a) can be rewritten as
o
(139)
where C,
=
(I - K j Hj)(A j _ 1
-
Bj-lAj-l)
(140)
and where K, is given by (138). This can be written as i
fLi+l = K i +lY H
l
+ L: CHI'"
Cj+lKjYj
j~O
Therefore
u;* = -AiKiYi - Ai
i-I
(I C, ... Cj+lKjYj),
1~i~N-1
J~O
(141)
which can be interpreted to mean that the optimal control is of the proportional plus integral type.P? where the first term in (141) gives the control proportional to the measurement of the current state vector and the second term in (141) expresses the control due to the integral on the past state vector measurements. Figure 2.10 gives a block diagram description of the proportional plus integral control generation. The effects of past observations are therefore weighted according to the weight C. Thus, if 11(I1~}+l Ck)Kj II <{ II K i II for i}> j, then the remote past measurements have negligible effects on the current control variables to be chosen. In the extreme case C k = 0, u i * depends only on Yi and past observations yi-l have no effect on u.": If the control problem is such that (142)
3.
67
SUFFICIENT STATISTICS
r--------------------------,
I
I
I I
I
I u* I
I I I
I +
I
I
I I
I UNIT DELAY
I I , UNIT DELAY
I
I
J
L
Fig. 2.10.
I
Schematic diagram of proportional-plus-integral controller.
then from (139) and (140) we see that for all i = 0, 1,...,N-1 Therefore, 0~i~N-1
is the optimal control policy. Namely, the optimal controller becomes a pure proportional controller. One-dimensional case of (142) has been mentioned at the end of Section 3,A. One sufficient condition that (142) holds is that Pi = 0 and that Bi l exists for all i = 0, 1,... , N - 1. Then, Ai = BilAi and Condition (142) will be met. a. Accurate Measurements
We see from (139) that the control U i * consists of proportional part plus the integral part unless C, = 0 of all j ~ i - 1. We have seen one way that C, = 0 results by having A j - BjA j = O. Now, suppose A j - BjA j i= O. Then, unless KjHj = I, the proportional part does not disappear. Intuitively speaking, if the measurements of the state vectors are exact and there are no unknown parameters in the problem statements as we are assuming now, then the control at time i will be a function of Xi alone, indicating KjHj will be equal to I under the perfect measurements. For systems with poor measurements of the state vectors it is intuitively reasonable that the controller makes use not only of the current measurements but also of past measurements in synthesizing optimal controls.P"
68
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
This turns out to be true, as we can see from the following. When the measurements are accurate, this will be expressed by small covariance matrices R j in some sense. Let us therefore write ERj instead of R j with the understanding that E is a small positive scalar quantity. Then, from (127a),
+ EL';l)-l
=
E(Ho'R~;tHo
R::i
E(Ho'R;lHo)-l
In general, from (l32b), Therefore
Thus (143) Equation (143) shows that U i * is small compared with Yi and the integral part will be negligible compared with the proportional part.
b. Inaccurate Measurements Now, let us examine the relative magnitudes of the integral and the proportional parts when the accuracy of measurements is poor. We will now suppose that R, is large or Rjl is small in some sense. Writing ERjl instead of RjJ, where E is a small positive scalar quantity as before, we now have
In general (144) Equation (144) can be solved as
3. where
69
SUFFICIENT STATISTICS
t., g Qj L_ I
g
+ AjL;_IA/
20
Thus
Thus i-2
iLi = ELi_IHLIR;!IYi
+ I E
(A k
-
B kA k)20Ho'R-;;lyo
(145)
k~l
It is seen from (145) that the integral part is of the same order as the proportional part unless II nt:~ (A k - BkAk)11 is of the order E or less, for example by satisfying the inequality 11 A k - BkA k 11 <{ 1 for 1 ;(; k ;(; i - 2.
E.
EXAMPLE
4.
CORRELATED GAUSSIAN NOISES
Before closing this chapter, we shall briefly outline the derivations of sufficient statistics when t-noises and 'I-noises are correlated, while retaining the assumption that t and 'I are independent at different times. As discussed in Section 2, this independence assumption can also be dropped by dealing with augmented state vectors. See Refs. 33a and 141 for continuous-time counterparts. Instead of Eq. (119), it is now assumed that t and 'I are jointly normally distributed with E(gi) = E(YJi) = 0 E(gig/) = QiOi; E(YJiYJ/)
=
( 146)
Rio i;
E(giYJ/) = SiOij
where Qi and R, are assumed positive definite. The joint probability density function for (t i , 'Ii) has the form P(gi, "Ii) = const exp( -
~
(g;" "I;') ei l
(~:)
(147)
where 6 E [(gi) C i ,,= -n . "/'t
(t' ')] _ [Qi Si , YJiS' t
Si] R. 1
It is convenient to work with the expression for P(xi+l I yi) rather than P(xi I yi) now since t i and n. are correlated.
II.
70
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
To obtain the recursion equation for p(xi
I yi~l),
consider
P(Xi , Yi, Xi+l I yi-l).
It can be written by the chain rule: P(Xi, y, , Xi+l I yi-l)
=
(148)
P(Xi I yi-l) p(xi+l , Yi I Xi' yi-l)
Also
Therefore, P(Xi+l I yi)
=
=
f p(xi , Xi+l I yi) dx, const f p(xi I yi-l) P(Xi+l , Yi I Xi, yi-l) dx,
(149)
where the constant (a function of yi) is determined by the relation
f P(Xi+l I yi) dXi
+1 =
I
In (149), p(xi+l ,Yi I Xi ,yi-l) is given by
- Aixi -- B iu i)I/ -1 ) (150) P(Xi +1 ,Yi I Xi' Yi-I) -- const exp (1 - -211!(Xi+l , y,. _ H.,X,. Ci 2
Thus, we obtain the recursion formula p(xi+l I yi) = const
f
(
p(x, I yi-l) exp _
111(X' -- A·x· - BU)11I
2
~.
2 I
HI
e
1
,
y. - H,», 1. 1 1,
t.
c-i 1
)
dx,
(151)
The relation to be verified by the mathematical induction is now p(Xi I yi-l)
=
const exp( - ~'l
Xi - Vi
Ilh·
I)
(152)
where (Vi' T i ) is the sufficient statistics. Substituting (152) into (151) and carrying out Integration (lSI), recursion formula for Vi and Zi are obtained in much the same way as before. Only the result is listed here: (153)
4.
71
DISCUSSIONS
where Ki+l
g r i+ l ( -Diz
+ G/LiFi)
(154a)
rill g nil + G/LiG i Gi
g
r, g
D~z
=
(154b)
A/Dil
+ H/Di~
(154d)
H/D~z
-+
A/Diz
(154e)
-(Qi - SiRilS()-lSiRil
(154g)
»; = u;: + RilS/(Qi Assuming X o is independent of are given as follows:
SiRilS()SiRil ~o,
1)0
(154h)
and of N(a, A o), the initial values (155)
o
with To replaced by A o in the expression for K 1 and .E
1.
4. Discussions There are several related classes of problems that may be investigated using the techniques developed in this chapter. We have already mentioned the desirability of investigating the control problems with control policy based on observation data k sampling time or more old i
=
k
+ 1,..., N
- 1
where u o, U l , ... , Uk must be chosen from some other considerations. Then, instead of generating p(x i I yi) and P(Yi+l I yi), it is necessary to generate P(xi I yi-k) and P(Yi I yi-k). Using these latter density expressions, the formulation of optimal control is formally identical to the one given in this chapter. The reader is invited to investigate the optimal control problem for the system of Section 3, when the criterion function includes an additional term, representing the cost of computing, which may be taken to be a decreasing function of k.
72
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Another class of problems that are amenable to analysis with the techniques of this chapter is that of control systems with delay. Then defining new augmented state and control vectors appropriately the difference equation for it can be put into a standard form. The theory can now be applied to the augmented systems. Closely related to control problems with delays either in the plants or in the observation data available for control synthesis are the problems with intermittent observation data. Although we develop in this book the optimal control synthesis method assuming that system state vectors are observed each sampling time instant, there is a class of systems, be it chemical or aerospace, where it is neither feasible nor desirable to observe the state vectors at every sampling time instant. For such systems it is more realistic to derive optimal control policies imposing some constraints on the way observations on the state vectors are performed. One such possibility is to specify the total number of possible observations for N-stage control process and to optimize E] with respect to a control policy and the spacing of observations. See Kushner'" for a preliminary study of such systems. A more straightforward example of systems with constrained observation schemes is a system where observations are taken every k sampling instants for some fixed k. Such a system can be treated by the techniques of this chapter by rewriting the plant (i.e., the state transition) equation in terms of the time instants at which observations are made. Another way of imposing constraints on observations is to assume that at any time i there is a positive probability that the state vector will not be observed. Such a probability may be constant throughout the process or may be modified by the control variable with possible penalty incurred for modifying the probability. See Eaton-" for such analysis for purely stochastic systems. More direct constraint can be imposed on possible observation schemes by incorporating a cost associated with observation in the system performance indices. Realistically, such cost of observations will be functions of state vectors. See Breakwell-s for an elementary example where the cost of control is taken to be independent of the state vectors. Note that the recursive procedure developed in Section 2,C for generating P(Xi I yi) can be modified to generate p(xj I yi) for j < i recursively. Such conditional probability densities can be used to obtain more accurate estimate of x j based on the observations Yo , ... , Yj , Yj+l , ... , Yi , rather than on just Yo , ... , Yj .
73
APPENDIX A: MINIMIZATION OF A QUADRATIC FORM
Appendix A.
Minimization of a Quadratic Form
Consider the problem of finding u which minimizes (u, Su)
+ 2(u, Tx)
(AI)
where (', .) is an inner product and where it is assumed that 8 is symmetric and positive definite, hence 8-1 exists. By completing the square in (AI), (u, Su)
+ 2(u, Tx)
= (u
+ S'
(x, 1"S-ll'x)
one sees that u which minimizes (AI) is given by u which minimizes (u
+ S'
(A2)
Since 8 is positive definite, (A2) is minimized by
+ S-lTx
u
=
0
or u
and min[(u, Su) u
=
-S-lTx
+ 2(u, Tx)]
= -(x, 1"S-lTx)
(A3)
(M)
Now consider the case where x is a random variable and it is desired to minimize E[(u, Su) + 2(u, Tx) i y] (A5) with respect to a deterministic u(y) where y is another random variable. (See Appendix I at the end of this book for a more general discussion on conditional expectations.) Then again, by completing the square in (A5), l(y)
+ S'
g min E{(u u =
(A6)
Defining another random variable w by (A7)
one sees that u which minimizes I(y) is the same u which minimizes min E[((u - w), S(u - w)) [y] u
(A8)
74
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Equation (A8) can be rewritten by defining
w= min E[(u u
=
~
w
+w -
E(w Iy)
(A9)
w, S(u - W + w - w» I y]
min{[(u - w, S(u - w» u
+ E«w -
w), S(w - w) I y]} (AIO)
Thus, one sees that u::::::::::w
(All)
minimizes (AIO) and (A5). The minimizing u is given from (A7), (A9), and (All) by u* = -Ax
(AI2)
where A
= S-lT and x = E(x Iy)
(Al3)
The minimal value of (A5) is given, then, from (A6), (A7), and (AlO), I(y)
= = =
-E[(x, T'S-lTx) I y] -E[(x, T'S-lTx) I y] --(x, T'S-lTx)
+ E[«w + E[«x -
w), S(w - w» I y]
x), T'S-lT(x - x» I y] (AI4)
Note that u* given by (AI2) is such that it satisfies the equation E{(u - u*, S(u* - w» I y}
=
0
for any u. This fact is sometimes referred to as the orthogonality principle. 8 .5 ,86.109 b See also Chapter V for other instances where the orthogonality principle is applied.
Appendix B. Use of Pseudoinverse m Minimizing a Quadratic Form Consider I(u)
=
(u, Su)
+ 2(u, Tx) + (x, Rx)
(BI)
where 8 is positive and symmetric. We know from Appendix A that when 8-1 exists I is minimized by choosing u to be u
=
-S-lTx
(B2)
75
APPENDIX B. USE OF PSEUDOINVERSE
and (B 1) becomes 1
=
(x, (R - T'S-1T)x)
~
I x IILT'S-IT
(B3)
When 8-1 does not exist, we will see that u * given by u*
=
-S+Tx
(B4)
minimizes I, where 8+ is the pseudoinverse of 8. Pseudoinverses are discussed in Appendix II at the end of this book. As it is discussed there in detail, when the pseudoinverses are involved, the minimizing u are not usually unique unless one imposes the additional condition such as the condition that II u II is also minimal. The u given by (B4) is the one with the minimum norm. One can write
where (range space of S)
and (null space of S)
Since 8 is symmetric, 21!(8) and %(8) are orthogonal and we have
To derive (B4) we will rewrite (BI) such that it includes a term
II u
+ S+Tx IIs
2
?o 0
(B5)
and a term independent of u. Then (B5) is never negative and it vanishes when u = u*. Then it will be seen that I(u) is minimized by u* of (B4). Using the identities'V
1(u)
=
II u
S+SS+
=
S+
(S+)'
=
(S')+
(B6a) =
S+
(B6b)
+ S+Tx Iii + II x IILT'sly + 2(Tx, (l -
S+S)u)
(B7)
In (B7), note that (1 - S+S)u
+ = U1 + U2 =
U1
U2 -
S+S(u 1
+u
2)
U1
(B8)
76
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
since S+S restricted to .'?i( S) is the identity.P" Also, (B9)
Thus,
The first term in (B 10) is minimized by choosing
From the requirement of the minimal norm of u one has
Thus, u* = -S+Tx
(BII)
Appendix C. Calculation of Sufficient Statistics In this appendix an integral of the form f exp( - t E i ) dx, is evaluated where E i is given by (130). By completing the square, Ei can be rewritten as
(Cl)
Let (C2)
APPENDIX C. CALCULATION OF SUFFICIENT STATISTICS
77
Then
(C3)
After integration with respect to Xi , E· Jexp ( - T) d»,
=
const exp ( -
E! T)
(C4)
where
(C5)
and where (C6)
By substituting (C2) into (C6) and using the identity of Appendix D, Mi~A
can be shown to be equal to (C7)
The first term II xi+l - AifLi f P(Xi+l I Xi , u i ) p(Xi I yi) dXi
B i uo ;llt
:-' in (C5) is the result of evaluating
HI
p(Xi+l I yi). Therefore, from (C5) one sees that the conditional distribution of Xi+l given yi is normal with =
(C8)
and (C9)
The expression
E/
of (C5) can be rewritten as
78
+IIY·
~+
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
I-
2 H 1.+1(A./I.. +B.u·)IIR - 1 R-1H i: H'R- 1 Z,z z t i-l-l- i+l i+l i+l i+l £+1
(ClO)
where (CIl)
Hence P(Xi+l I yi+l)
=
canst exp( - til Xi+l - fLi+lll~;:)
(CI2)
and (CI3)
where (CI4)
is the optimal gam of the filter. There is another expreSSlOn for fJ.-i+l given by (CIS)
These two expressions are identical since T i+1 H:+1R;;l[Yi+l - H i+1(AifLi
+ BiUi)]
+ Ti+1(M;;1 + H:+1R:;;lHi+l)(A ifLi + Biui) = T i+1 H:+1R;;iYi+l - Hi+l(AifLi + Billi » + AifLi + Biu since
by definition (ell). An alternate expression for ri+l can be obtained directly from its definition ri+l = cOV(Xi + 1 I yi+ l). We do this by computing 6 COV(xi) and noting that T i = I', for all i. Since fJ.-i+1 = AifJ.-i + BiUi + Ki+l(Yi+l - Hi+l(AifJ.-i + Biu i where Ki+l is the gain of the filter, the estimation error satisfies
r,
»,
= Ai(Xi - fLi) + gi - Ki+l(Yi+l - H i+1(AifLi + Biu i » =
(/ -
K i+lHi+1)(Ai(Xi - fLi)
+ g;) -
Ki+1"li+l
79
APPENDIX D. MATRIX IDENTITIES
therefore, noting that EXi = 0 for all i, E(Xi+IX;+l)
g I'i+l = (I -
+ Qi)(l -
Ki+lHi+l)(AiI'i A/
Ki+lHi+l)'
(Cl6) Note that the expression for 1'i+1 given by (CI6) is valid not only for the optimal filter gain Ki+1 = ri+1Hi+lR"i~l but for any arbitrary gain. By completing the square in K i +1 in (CI6),
(Cl7) where
g c, g
K'::I
(Cl8a)
Mi+IH;+lC;1 Hi+lMi+lH;+l
+ Ri+l
(Cl8b)
Thus the norm of 1'i+1 is minimized by choosing Ki+1 to be equal to Kt+l since Ci is positive definite. By the matrix identity in Appendix D
and by the mathematical induction we see that for all i
(Cl9)
The equivalence of (CI8a) and (CI4) is established by means of the matrix identity of Appendix D. Now l Mi+lH;+ICi - Ti+lH;+lRi;1 =
Mi+1H;+IC;1 - (Mi+l - Mi+lH;+lCiIHi+lMi+I)H;+lRi;1
=
Mi+lH;+l[Cil
=
Mi+lH;+l{Cil[Ri+1
=
Mi+lH;+l{ C;ICiR;;1 - Ri;l}
Appendix D.
-
R;;l
+ C;IHi+lMi+IH;+IRi;l]
+ H,+lMi+IH;+l]R;}1 =
R;;I}
0
Matrix Identities
The following matrix identities are often useful in obtaining equivalent, computationally convenient expressions for error-covariance matrices and gain matrices. (A (A
+ BCB')-I = + BCB')-l =
A-I - A-IBqC
A-I - A-IB(C-I
+ CB'A-IBC)-ICB'A-I + B'A-IB)-IB'A-I
80
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
where the indicated inverses are assumed to exist. These are due to Householder. 74 The proof is by direct substitution. There are similar identities involving pseudoinverses:
+ BCB' A+ + BCB'
A+
+ CB'ABC)-lCB'A]+ AB(C+ + B'AB)-lB'A]+
=
[A - ABC(C
=
[A -
where the indicated inverses are assumed to exist. These identities are due to Farrison.P Another useful formula is the expression for the inverse of a matrix in terms of submatrices:
S)-l = (Cn
R
Cn
where Cn C12
= (Q - SR-1S')-1 = -CnSR-1
C 21 = C{2 C 22 = R-1
+ R-1S'CnSR-1
Chapter III
Adaptive Control Systems and Optimal Bayesian Control Policies
In the previous chapter, we developed the method of deriving optimal control policies for purely stochastic systems. In this chapter we follow the same line of attack to discuss optimal control problems of certain parameter adaptive systems. Namely, we develop methods for obtaining optimal control policies for a class of dynamic systems where some of the key items of information on the optimal control problems are not known exactly. More specifically, we assume in this chapter that there is at least one unknown parameter in control problem descriptions, i.e., we assume that at least one unknown parameter is contained in the plant equation, the observation equation, various probability densities of the noises, initial conditions and/or in the descriptions of the inputs. The unknown parameter is assumed to be a member of a known parameter space. Therefore, the class of control systems considered in this chapter may be regarded as a class of parameter adaptive stochastic control systems. We will further develop in this chapter the procedure introduced in Section 1,B of Chapter II for obtaining optimal control policies by making use of recursively generated conditional densities. In the next chapter, we give the most comprehensive formulation of the Bayesian optimal control problems for the class of stochastic and adaptive discrete-time dynamic systems.!" Actually, we could have given this general formulation first and treated the subjects in Chapters II and III as special cases. It is felt, however, that this approach is less revealing, although it may be formally more elegant. 81
82
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
I + General Problem Statement (Scope of the Discussions) A control system is now assumed to be described by the plant equation k
and is observed by Yk :
=
0,1, ..., N - 1
(1) (2)
where Xk, Yk , Uk , gk' and 7Jk are the same vectors as III Chapter II, and where ex and fJ are parameter vectors of Eqs. (1) and (2), * where ex E {9.. and fJ E {9fJ and where {9.. and (9fJ are parameter spaces assumed given. The random vectors g and YJ are assumed to be such that their joint probability density p(gk, YJk I 8 1,82) is given, 8 1 EO {91 and 8 2 E {92 , where {91 and {92 are known parameter spaces. When each of these parameter spaces contains a single element, all the parameter values are therefore known and the problem will reduce to a purely stochastic or a deterministic one. Therefore at least one parameter space is assumed to contain more than a single element in this chapter. It will be seen in the course of development of this chapter that the method of this chapter applies equally well to situations where the probability distribution functions of the plant parameters ex and/or fJ contain unknown parameters 8.. and 8fJ , 8.. E {9.. , 8fJ E {9fJ' with {9.. and {9fJ known. Example 7 of Chapter I is an example of such a system where the plant time-constant is taken to be a random variable with unknown mean. For additional examples, see Section 3,B. Since both types of systems can be treated by the method of this chapter, no careful distinction will be made in deriving their optimal control policies. A prior probability density functions are assumed to be given for each and all unknown parameters as well as for initial state vector X o . The distribution function for the initial state vector X o may contain an unknown parameter 8 3 , 8 3 E {93' where (93 is assumed known. Known deterministic inputs and/or disturbance to the systems are assumed to be incorporated in (I) and (2). Most of the formulas of this chapter are developed for systems of (1) and (2) where various random variables are assumed to be independent at each time instant, unless stated otherwise. The control problems of systems with dependent random variables are discussed in Chapter IV. Desired inputs to the systems are assumed to be stochastic and the probability density function of such an input is assumed to contain a
* These parameters could be time varying. For the simplicity of exposition, they are assumed to be time invariant in this chapter.
2.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
83
parameter vector which may be unknown, fL E 81' , where 81' is assumed given. The symbol dk(fL), k = 0, 1,... , N, is used to denote a sequence of such random inputs. The actual inputs to the systems are denoted by Zk and are assumed to be related to d k by (3)
'k
where are random noise with known probability densities. The random variables Zk are assumed to be observed. Therefore, the observation data at any time k consists of zk, yk, and Uk-I. The criterion function] is taken to be essentially the same as in Chapter II: N
]
=
I
Wk(X k,
dk
, Uk-I)
(4)
k~I
It is desired to find an optimal closed-loop control policy which rrurumizes EJ. Control variables u to be considered in minimizing EJ depend only on the initially available information plus past and current observations and past controls and do not depend explicitly on unknown parameter values. As in Chapter II, optimal control policies are nonrandomized, i.e., each control Uk is specified to be a definite function of observed data only. Since the optimal control problem in this general formulation is rather complicated, we will proceed in steps and will treat in the next section a class of problems where 8I E 8 1 and 82 E 8 2 are the only parameters vectors assumed unknown. Problems with unknown ex and/or fJ are discussed in Section 3 of this chapter. A control problem where the distribution for the initial state vector X o contains an unknown parameter can be treated quite analogously. Only an example is discussed in Section 2,D in order to avoid repetitions of general formulations. The formulation for more general situations, where all of 81 , 82 , 83 , 8", , 8f3 , and 81' are assumed unknown, is quite analogous and it should become apparent by then how to obtain the general formulation. See also Section 7 for further discussions.
2. Systems with Unknown Noise Characteristics A.
INTRODUCTION
In this section, we will derive optimal control policies for systems such that the random noises in the plant equation as well as in the observation equation have unknown probability density functions. Inputs to the systems are assumed to be deterministic and known.
84
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
The system to be considered in this section, therefore, is described by (5) (6)
where fs are random disturbances in the plant. Their probability density functions are assumed to contain unknown parameter (}1 , (}1 E 6)1 . A similar assumption is made for 7/S with their densities containing the unknown parameter (}2 E 6)2' The random variables to ,... , gN-l , YJo , ... , YJN-l are all assumed to be independent for all (}1 E 6)1 and (}2 E 6)2 . * A joint a priori density PO«(}I' (}2 , x o) is assumed given. The assumption that fs and YJ's are independent is made to simplify presentations. This assumption can readily be removed by working with the joint density for fs and YJ's rather than with separate densities for fs and YJ's. Problems with dependent noises are taken up in Chapter IV. Since the parameters ex and f3 are assumed known in this section, they are not exhibited in (5) and (6). Because inputs are known they are incorporated in (5). Instead of (4), the criterion function is taken to be N
J= L
(7)
Wk(X k, Uk-I)
k~1
* Suppose t, , i = 0, I, ... , are a sequence of random variables which are independent for each 0 E e. Namely pa'IO)
[lP(t, I 0)
=
for each
0
E
e
;=0
The a priori density function of 0 is assumed given by PoCO). This conditional independence does not imply the independence of ti , i = 0, I, ... , when 8 is not specified. To see this, compute
pet, , t,)
=
=
J J
p(t" t, I 0) PoCO) dO
pa, I 0) pa, I O)Po(O)
se
On the other hand,
pen pai)
=
Jpa, I 0) PoCO) pet, It/» poet/»~
Thus, generally,
p(t, , t,)
u», t/»
"* pet,) paj)
unless PocO) is a delta function which means the density function for completely a priori.
fs are known
2.
85
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
As before, an optimal control policy minimizes the expected value of
where B.
u,
= EWk
,
I
~
k
~
J:
N.
OPTIMAL CONTROL POLICY
It can again be shown that only nonrandomized control policies need be considered in optimization. This is done heuristically as follows: a. Last Control Stage
We use the notations of Chapter II. The arguments are almost identical to those of Section I,B, Chapter II. Assume that we have somehow determined optimal controls Po*,"" PJ;-2 and that PN-1 is the only control function yet to be determined. As before, R N is minimized by min E(WN I yN-1)
(8)*
PN-l
for all yN-1 where E(WN I yN-1) =
JWN(XN , UN-I) p(XN , UN-1 [ yN-l) d(XN , UN-I)
(9)
U sing the chain rule, write the probability density in (9) as p(X N , UN-1 I yN-l) = P(UN- 1 I yN-l) p(xN I UN-I' yN-l) =
In (10), we can write p(x N I UN-I' yN-l) =
PN-l(U N-1) p(X N I UN-I, yN-l)
Jp(XN I XN-1 , UN-I, 8
1)
(10)
P(XN-1 , 81 I yN-l) d(XN-1 , 81 )
where P(Xi+l I Xi' U i , 81 ) is obtainable from (5) and the density functions for es. Note that, unlike in Chapter II, p(xi+l I Xi' ui) is not known since the density function of t contains the unknown parameter 81 , It is only when the value for 81 is assumed that the density function of t i is completely specified. * As in Chapter II, the control variables are understood as parts of the conditioning variables when the observed state variables are shown, and are not exhibited explicitly.
86
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
Assume that P(XN - 1 ,81 I yN-1) is available. We will derive a recursion equation to generate this density function in Section 2, C using the technique of Section I, C of Chapter II. Then substituting (10) into (9), E(WN
I yN-1) =
(11)
JAN PN-l(U N-1) dU N-1
where AN
g
J WN(X N, UN-I) p(XN
X P(XN-1 ,
81 I yN-1)
I XN_l,
UN-I'
d(x N , X N-1 ,
81)
81 )
From (11), the optimal control policy is now seen to be nonrandomized with the optimal control policy Pti-l = 8(UN_1 - Uti-I)' where Uti-1 is U N- 1 which minimizes AN' The minimization is performed for U N_1 in UN-I' Define . (12) where YN = AN YN* =[j, rnrn YN' uN_l
then (13)
b. Last Two Control Stages Now consider the situation where two more control actions remain to be exerted. The probability density for U N- 2 ,PN-2 are to be chosen so that, when followed by Pti-1 , the sum R N - 1 R N = E(WN _ 1 WN ) is minimized or, equivalently,
+
+
(14) is minimized for all yN-2. Proceeding analogously with the last stage case, we write (15)
where AN - 1
g
J W N-1(XN-1 , UN-2) P(XN-1
I XN- 2 , UN- 2 , 81 )
X P(X N-2 , 81 I yN-2) d(XN_ 2 , X N-1 , 81 )
where P(XN - 1 I XN - 2 , U N- 2 ,81 ) is obtained from (5) and the known density for t N - 2 and the conditional density function P(XN - 2 , 81 I yN-2)
2.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
87
is assumed to be available. The discussion of its computation is deferred until the next section. To evaluate E( W N I yN-Z) in (14) note as we did in Section 1,B of Chapter II that E(WN I yN-Z)
=
E(E(WN I yN-l) I yN-Z)
(16)
From (14), (15), and (16), min E( WN-l PN-2
+ WN* I yN-Z)
=
min
PN-2
[f A
N- 1
PN-Z(U N- Z) duN_Z
where the asterisk indicate the use of plJ-l . Since we know that pJi-l is nonrandomized, the last term in (17) becomes, from (11) and (13), (18)
Proceeding analogously to Section 1,B of Chapter II, (18) is evaluated as
Thus, (17) becomes
where it is assumed that P(YN-l I uN- Z, yN-Z) is known. This density is also derived in Section 2,C. Define (21)
then (22)
minimizes (20) where
uJi-z
is
UN-Z
in UN- Z which minimizes
YN-l :
(23)
88
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
c. General Case Proceeding recursively, it is now easy to see that an optimal control function at time k, Pk , is determined by min E(Wk+l Pk
=
n;,~n
+ W:+ 2 + ... + WN* [yk)
J [Ak+ + JY:+2 p(Yk+1 I Uk, yk) dYk+l] Pk du.; 1
where (24)
and where
Therefore optimal Pk is given by Pk
*
=
o(uk
Uk *)
-
where Uk * is Uk which minimizes Yk+1 : k
=
O, ...,N-1
(26)
Thus, by computing ,\'s and y*'s recursively, the optimal control policy is obtained. The density functions P(Xk' 81 I yk) and P(Yk+1 I uk, yk), needed in computing ,\'s and Y*'s, are derived next. Remarks similar to those made at the end of Section I,B of Chapter II can be made here about the possible simplifications of the recursion formula and of the control policy implementations when sufficient statistics exist. C. DERIVATION OF CONDITIONAL PROBABILITY DENSITIES
We have shown that, if p(x k , 81 I yk) and P(Yk+1 I yk, Uk), k = 0, ... , N - 1, are available, then (24), (25), and (26) allow us to derive an optimal control policy as a nonrandomized sequence of control vectors u o* ," " ut-1' We now generate these conditional probability density functions recursively. Since it is easier to obtainp(xi , 81 , 82 I yi) recursively, this conditional density is generated instead of p(x i , 81 I yi). These two densities are related by (27)
2.
89
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
We will now derive p(8 1 , 82 , xi+ll yi+l), assuming that p(8 1 , 82 , Xi I yi) is available. For this purpose consider p(8 1 , 82 , Xi , Xi+l 'Yi+1 I yi). It can be written by applying the chain rule; p(81 , 82 , Xi , Xi+l ,Yi+l I y i) =
p(81 , 82 , Xi I yi) P(Xi+l I 81 ,(J2 , Xi ,yi) P(Yi+l I 81 , 82 , Xi , Xi+l ,yi) (28)
where the independence assumptions on fs and YJ's are used to obtain the expression on the last line. In (28), the first term of the right-hand side is assumed to be given, and the other two terms are computed from (5) and (6) and the assumed density functions for P(gi I 81 ) and P(YJi+l I 82 ) , Now integrate the left-hand side of (28) with respect to Xi to obtain
Jp(81 ,
82 , Xi , Xi+l ,Yi+1 I yi) dx,
= p(81 , 82 , Xi+l ,Yi+l I yi) =
P(Yi+l I yi) p(81 , 82 , 8i +l I yi+l)
(29)
Hence, from (28) and (29), the desired recursion equation is obtained as p(81 , 82 , Xi+1 I yi+l) =
J
p(81 , 82 , Xi I yi) p(xi+l I 81 , Xi , Ui)
X
P(Yi+1 I 82 , Xi+l) dXi
(30)
f [numerator] d(Xi+l , 8
1 , ( 2)
where the denominator gives P(Yi+l I yi). From (27), p(8 1 , Xi I yi) is obtained by integrating (30) with respect to 82 • By the repeated applications of (30), p(81 , 82 , Xi I yi), i = 0, 1,... , N - 1, are computed recursively, starting from p( 8l ' 82,
I yO)
X 0
=
P(()1 , 82 , Xo ,Yo) p(Yo) PO(()l,
()2 ,
XO) p(Yo I ()2 , Xo)
where Po(81 , 82 , x o) is the given a priori joint density of 81 Note that
(31)
,
82 , and Xo .
where p(8 1 I yi) is the ith a posteriori probability density of 81 given yi, i.e., given all past and current observations on the systems state vectors Yo ,... , Yi .
90
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
If the state vectors happen to be perfectly observable, i.e., if the observation equation is noise free and can be solved for x uniquely, then instead of P({J1 , Xi I yi) we use simply P({J1 I Xi) in deriving equations similar to (24)-(31). We can derive a similar recursive relationship for P(Xi I yi) or P(gi , Xi I yi) or, more generally, for TJi , Xi I yi), i = 0, ..., N - 1. In some problems, the latter is more convenient in computing EJ. For example, the conditional density P(Xi+l I Xi , Ui , gi) may be simpler to manipulate than that of p(xi+l I Xi , Ui ,(1 ) , If this is the case write E(Wi I yi-1) as
«e: ,
E(Wi I yi-1) =
=
JWi(Xi , Ui-1) p(Xi I yi-1) dx,
JWi(Xi , Ui-1)p(Xi I Xi-I' Ui-1 , gi-1) X
(32)
P(Xi-1 , gi-1 I yi-1) d(Xi , Xi-I' gi-1)
°
where i = 1,..., N and where u's are taken to be nonrandomized. Now to obtain P(Xj , gj I yj), ~ j ~ N - 1, recursively, write
=
=
X
gi , Xi I yi) P(gi+l I 81 , gi , Xi , yi) P(Xi+l I 81 , gi' Xi , gi+1 , yi)
X
P(Yi+l I 82 , gi' Xi , gi+1 , Xi+l ,yi)
p(81 , 82
,
gi , Xi \ yi) P(gi+1 I 81 ) p(xi+1 I gi , xi , u i ) P(Yi+l I x i+1 , 82 )
p(81 , 82 , X
where the independence assumption of fs and TJ's for any 81 82 E €i)2 is used. From (5),
The left-hand side of (33), after Xi and written as
Thus p(81, 82
,
gi+l , x i+1 I yi+1) =
Jp(81 , 8
2 ,
X
gi
(33) E €i)1
and
are integrated out, can be
gi' Xi I yi) P(gi+l I 81 ) O(Xi+1 - F i )
P(Yi+l I Xi+1 , 82) d(Xi , gi) J[numerator] d(81 , 82 , gi+l' xi+l)
(34)
2.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
91
t«. ,
Similarly we can compute YJi , Xi I yi) recursively to obtain the optimal control policies when 8} and 82 are unknown. Sometimes this form is more convenient to use in deriving optimal control policies. To obtain the recursion relation for P(gi , YJi , Xi I yi), write p(81 , 82 , Xi , ti' Y"fi , X H I , ti+l , Y"fi+l ,Yi+l I y i)
= p(81 , 82 , Xi , ti , Y"fi I yi) P(Xi+l I Xi , ti , Ui) X P(ti+l
I 81 ) P(Y"fHI I 82) P(Yi+l I xi+! , Y"fi+!)
where the conditional independence assumption on been used. From (5), P(Xi+1 I Xi , u, , ti)
=
t»
(35)
and YJ's have
O(X H I - F i )
From (6), Integrating (35) with respect to Xi ,
f p(8
gi , and YJi
,
i) 1 , 82 , Xi , ti , Y"fi , Xi+l , ti+l , Y"fi+l , YHI I y d(x i , ti , Y"fi)
= p(81 , 82 ,
Xi+l , ti+l , TJHI , Yi+l I y i)
= p(81 , 82 , X H I , ti+l , Y"fi+l I yi+l) P(Yi+l I yi) Hence =
f p(8
1 , 82 , Xi , ti , Y"fi I yi) O(X H I - F i ) O(Yi+l - Gi+l)
X _P(ti+l
I 81) P(Y"fi+1 I 82)
J(numerator] d( 8
1 ,
d(xi' ti' ~
(36)
82 , Xi+! , ti+l , Y"fi+l)
Integrating both sides of (36) with respect to 8} and 82 , the desired recursion equation results. Equations (30) and (36) have been obtained assuming that gi' i = 0, 1,... , and YJj ,j = 0, 1,... , are all independent for each 8} and 82 • If this conditional independence assumption is weakened and if it is assumed that gi' i = 0, 1,... , is dependent for each 8}, that the g process and the YJ process are still independent then, by considering p(B} , 82 , gi , Xi , gi+l ,Yi+l I yi), for example, instead of Eq. (33), the recursion equation for p(8} , 82 , gi' Xi I yi) is obtained as p(81 , 82
=
,
e.; ,Xi+l Iyi+l)
Jp(81 , 8 X
2 ,
ti' Xi I yi) P(ti+l
I 81 ,
t i) O(X H I - F H I )
P(Yi+l I Xi+l , 82) d(xi , ti) 1 , 82 , ti+} , Xi-t-l)
J[numerator] d(8
(37)
92
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
The most convenient form should be used for a problem under consideration. The indicated integrations in these recursion equations should not be carried out by themselves when they involve delta functions. Perform the integration in evaluating E(Wj I yj-l). It is not usually necessary to compute the denominators in expressions such as (30), (31), and (34) since they are normalization constants. By repeated applications of (30) or (34), p(8 l, 82 , Xi I yi) or p(8 1 , 82 , ~i , Xi I yi) can be given directly in terms of Po(8 l , 82 , Xo), p(Xj+l I Xj , Uj , 8), and «», I Xj), 0 ~ j ~ 1. Fel'dbaum uses this form of the conditional probability density expressions in his works. As an example, let us rewrite the left-hand side of (30) in this form. See also the brief exposition of Fel'dbaurn's method in Section 6 for further details. Since
p(8l , 82
, Xl
Iyl )
=
f
p(8l , 82 , X o I Yo) p(x l I 81 , X o , uo) P(YI I Xl) dxo -=----------------[numerator] d( 81 , 82 , Xl)
f
f Po(8
1 ,
82 , x o)p(xl I 81 , x o, U o) p(Yo I x o, 82 ) P(YI I Xl , 82 ) dxo f[ numerator] d(81 , 82 , Xl)
It is easy to see that the general expression is obtained to be
i
TI p(x; I 8
1 , X;_l ,
U;-l)
;~l
X
f [numerator] d(81 , 82 , Xi)
Similarly, we can write p(8 1 , 82
,
~i ,
i
TI S(x; ;~1
X
p(y; I x; , 82 ) dXi - 1
(38)
Xi I yi) as
i-I
F;-l)
TI p(t; I 8
1)
;~o
f[numerator] d(81 , 82
,
Xi ,
ti)
«», I x
j ,
82 ) d(xi-I,
ti - l ) (39)
2.
D.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
93
EXAMPLES
a. System with Unknown Initial Condition
In Example 3 of Chapter I, we have obtained the optimal control policy for one-dimensional system Xi+! Yi
= ax, + bu, , = Xi + YJi,
]=
u.;
E (-
00, 00)
O~i~N-l
N
I
Xi
2
1
under the assumption that a and b are known, and X o , YJo ,•••, 'YJN-l are all independent Gaussian random variables with known mean and vanances. Now assume that the mean of X o is unknown, i.e., assume
where 2(·) denotes the distribution function, N(·, .) stands for a normal distribution, and where e is a random variable, the a priori distribution of which is given by an independent Gaussian distribution
The optimal control policy is still given by Ui
* =
-aiLdb
O~i~N-l
where iLi is as defined by (28) of Chapter 1. The recursion equation for iL has been derived in Section 3,B of Chapter II as (l14a). The expression for fLo is the only difference that results from the assumption of unknown e. The initial value fLo is now given by computing p(xo I Yo)
where and where
JPo(B) Po(xo I B) p(Yo I xo) se J [numerator] dx o
=
--,...--------
=
const exp (_ (x o - iLO)2 ) 2ao2
94
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
Thus, the unknown mean e of X o has the effect of replacing a 2, the variance of X o , by o? 6 2 • Otherwise, the recursion formula for fL and a 2 as given by Eqs. (114a) and (114d) of Section 3,B, Chapter II, remains valid.
+
b. Perfect Observation System with Unknown Plant Noise Consider a one-dimensional linear control system described by the plant equation UiE(-W,W),
O~i~N-l
(40)
and assume to be perfectly measured Yi =
Xi'
(41)
O~i~N-l
where a and b are known constant plant parameters and where fs are random variables assumed to be independently and identically distributed with the distribution function (42) for each e on the real line, where e is the unknown random parameter of the distribution function for [; The variance is assumed given. The a priori distribution function for e is assumed given by
ar
(43)
where fL and a o2 are assumed known. The initial state X o of the plant is assumed to be a random variable independent of e and assumed to be distributed according to (44)
where
(X
and a 2 are assumed known. The criterion function is taken to be qi > 0, I
1
~
i ~ N
We will now derive the optimal control policy for this system. According to the general procedure developed in Section 2, in order to derive Uf;-l , we first compute
2.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
95
(45)
Define the conditional mean and variance of gi by (46a)
and 0~1~N-1
(46b)
From (45) and (46), (47)
From (47), if fti and T i do not depend on the (N - I )th stage is found to be
Ui ,
the optimal control at
U~-l
= -(axN_1 + J-LN-l)/b
(48)
YN*
= rt,
(49)
and Computations of J-L's and I''« are carried out later and given by (59) and (60). There we show that T'« are all constants and J-Li is independent of Ui' thus satisfying the assumption made in deriving (48). Equation (49) shows that YN* is a constant. Generally Yi* becomes constant, hence the dependence of Yi+l on U i is determined by that of Ai +1 only and each control stage can be optimized separately. The optimal control policy for this problem, therefore, consists of N one-stage optimal control policy. The result is given as U;*
=
r.
=
-(ax;
+ J-L;)/b
N-l
I rl,
0~i~N-1
(50)
(51)
i-I
We compute ft'S and T'e next. Since x's are perfectly observed, the knowledge of Xi is equivalent to that of gi-l since gi = Xi+l - aXi - bu; . Therefore, (52)
96
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
where the conditional independence of g for given 8 is used. The first factor of the integrand of (52) is given by (42). The second factor is computed recursively. By the chain rule we write p(B, gi+l I gi)
=
pCB I gi)P(gi+l I B, gi)
= p(B I gi)P(gi+l I B),
O~i~N-l
(53)
Therefore, O~i~N-l
(54)
To initiate the recursion (54), we note that, by the independence assumption of 8 and X o , p(B, go, xo)
=
p(B, x o) p(go I B, x o)
=
PoCB) PoCx) p(go I B)
Therefore,
(55)
Namely, we have derived fLo
=
fL
(56a) (56b)
Assume (57)
This is certainly true for i = 0 with (58a) (58b)
2.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
97
When the right-hand side of (54) is computed, using the assumed form for p(8 I gi), we obtain p(O I gi+l) = canst exp ( __ (0 - ~i+1)2) 217i+1
(59a)
with
e _ Oil17i + gi+1l u i+1 1/17? + 1/u 1/17;+1 = 1/17i + 1/u 2
1
2
(59b)
12
2
1
2
(59c)
the assumed form for the a posteriori probability density function for 8 given gi. Thus, (57) is established for all i = 0, 1,.... This shows also that (8i , r i ) is the sufficient statistics for the unknown parameter 8. From (52), (57), and (59), (60a) 1 <:'i<:'N-1
(60b)
From (60b), T'« are all seen to be constants as asserted earlier. Thus, YN* and all subsequent y*'s become constants and each control stage
can be optimized separately with the optimal control policy of (51). From (51) we have Yl*
= min EJ =
N-l
L Tl o
c. System with Noisy Observation Having Unknown Bias Consider a one-dimensional linear control system described by the plant equation UiE(-oo, (0),
0
<:. i <:. N-1
(61)
and observed by Yi = Xi + TJi'
(62)
where a and b are known constant plant parameters and where fs are independently and identically distributed with the distribution function given by (63)
III.
98
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
The random variables T) are assumed to be independent of fs. The random variables T) are also assumed to be independently and identically distributed with (64)
for each () E ( - 00, 00) where e is the unknown parameter of the distribution. Its a priori distribution function is assumed to be given by (65)
The initial state X o of the plant is assumed to be a random variable, independent of all other random variables and assumed to be distributed according to (66)
where
(X
and a 3 are known. The criterion function is taken to be qi > 0
The optimal control policy is now derived for the above system. In the previous two examples, the recursion equations for conditional probability density functions involved a one-dimensional normal distribution function. In this example, we will see that, because of noisy observations, two-dimensional normal distribution functions must be computed in the recursive generations of the conditional density functions. To obtain ut-l , compute first
=
=
f
I
QN XN
2
QN XN
2
p(x N I yN-l) p(X N
I X N- l
dX N
, UN-I'
tN-I) P(X N-1 , t N-1 I yN-l) d(x N
, X N- l , tN-l)
In Eq. (67), (68)
2. since
gi
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
is independent of
gi-l and
of
YJi
99
by assumption. From Eq. (62), O:(;i:(;N-l
(69)
Substituting Eqs. (68) and (69) into Eq. (67), AN =
I
qN[(ax N-1
+ bU N_1)2 + ao2] P(X N- 1 I "fJN-l ,yN-l)
X P("fJN-l I yN-l) d(X N- 1 , "fJN-l)
=
I
qN{[a(YN-l - "fJN-l)
+ bUN- l ]2 + ao
2
Defining the conditional mean and variance of
}
P("fJN-l I yN-l) d"fJN-l YJi
(70)
by (71a)
O:(;i~N-l
and (71b)
we have
Since ?}N-l and T N- 1 are independent of UN-I' by minimizing Eq. (72) with respect to U N- 1 the optimal control variable at time (N - 1) is given as (73)
and (74)
Therefore, if fLN-l and T N - 1 are known, so are U'fi-l and YN*' We also see from Eq. (74) that, since YN * is independent of YN-l , each control stage can be optimized separately and the optimal control policy consists of a sequence of one stage optimal policy (75)
In order to compute fL's and T'«, we show first that the conditional probability density of e and YJi are jointly normally distributed, i.e., p(8, n, I yi)
=
canst exp[-
~
(8 - (Ji , 7]i
-
?}i) Mil (8 -
~i)]
"fJi - YJi
(76a)
100 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
where (76b)
is a constant covariance matrix and where
8i
=
E(8
Il)
TJi = E(YJi I yi)
=
var( 8 I yi)
M~2
=
var( YJi I yi)
(76c)
= E[(8 - 8i )(YJi - TJi) IlJ
= M~l
M~2
Mtl
Then with the notation defined by (71b),
To verify (76) for i = 0, consider p(8 YJ I yO) ,
=
0
p(8, YJo ,yo) p(yo)
Po(8) p(YJo I 8) p(Yo I 8, YJo) p(Yo)
(77a)
where, from (65), Po(8)
=
const exp (_ (8 -
fL)2)
2u 22
(77b)
from (64), p(YJo I 8)
= canst exp (- (YJ02~ 28)2 )
(77c)
and from (66),
From (77), (76a) is seen to hold for i (j _
o-
°
with
+ (yo - rx)/(U12 + U 32) 1/u22 + 1/(u12 + u 2) + (yo - rx)/U 2 fL/( u1 + u2 3 1/(u + U2 2) + Iju3 2
fL/U22
2 3 )
(78a)
2
~
YJo
=
(77d)
=
1
2
(78b) (78c) (78d) (78e)
Note that M o is a constant matrix.
2.
101
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
Thus (76a) is verified for i = O. Next, assume that (76a) is true for some i > O. We will show that the equation is true for i + 1, thus completing the mathematical induction on (76a). To go from ito (i + 1), consider p(8, YJi , YJi+1 I yi). By the chain rule, this conditional density can be written as p(8, YJi
,YJi+l ,Yi+l
I y i)
=
p(8, YJi I y i) P(YJi+l I 8, YJi' yi) X
P(Yi+l I 8, YJi , YJi+l ,yi)
(79)
where the second factor reduces to P(YJi+l I 8) since YJ/s are conditionally independent of all other YJ's and t's by assumption. From (61) and (62), y satisfies the difference equation (80)
The third factor of (79) is given, therefore, by
By integrating both sides of (79) with respect to YJi , p(8, YJi+l ,Yi+l I yi) = P(Yi+l I yi) p(8, YJi+l I yi+l)
=
I p(8, YJi I yi) P(YJi+l I 8)P(Yi+l I 8, YJi' YJi+lY) dYJi
Therefore,
P(
I i+l) _
8 ,YJi+l
Y
-
f p(8, YJi I yi) p(YJi+l I 8) P(Yi+l 18, YJi; YJi+l' yi) f·
.
dTji
p(8, YJi I y') P(YJi+l I 8) P(Yi+l I 8, YJi' YJi+l' y') d(8, YJi'
YJi+l)
(81)
After carrying out the tedious integration in (81), we establish (76a) for 1 with i 8 _ [(Ci + Di)/ui 2 + BiD i - Ci2]8i + (B i + Ci)Z;juI 2 i+l L1 2
+
i
~
YJi+l =
Mit
1
(C i
+ Di)8;ju1 2 + [(Bi + Ci)/UI 2 + BiDi L1.2
= (l/u I 2 + B i)/L1 i 2
Miri 1 =
(l/u I 2
-
M~ril
=
(l/uI 2
+D
=
Yi+l - a(Yi - Y]i) - bu,
Zi
Ci )/L1 i 2 i)/L1 1
2
1
C i2]Zi
102 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
where
and where
Therefore, M i in (76a) all turn out to be constants which can be precalculated (i.e., calculated off-line). The only on-line computation in generating ui * is that of updating YJi and Bi where Zi depends on Yi , Yi+1 , and U i .
d. System with Unknown Noise Variance The technique similar to that used in the previous example can be used to treat control problems with unknown noise variances. As an illustration, we derive an optimal control policy for the system of Section 2,D,c given by (61) and (62) assuming now that the variances of the observation noises are unknown, and the mean is known. We only give a summary of the steps involved since they are quite similar to those of Section 2,D,c. Another approach to treat unknown noise variances is to use CYJi instead of YJi with 2?( YJi) = N(O, 1) and assume C to be the unknown parameter and apply the method of Section 3 of this chapter. Also see Sections 2 and 3 of Chapter VII. For an entirely different approach see Ref. 91a. Instead of (64), we assume that 2?(TJi)
=
N(O,.E)
where 1: is the variance which is assumed to be distributed according to
where ZO,l
-+- ZO.2
=
1
2.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
103
Namely, the unknown variance of the observation noise is assumed to be either l:1 or l:2 with the a priori probability given by ZO,I and ZO,2 respectively. Other assumptions on the distribution functions of fs and X o are the same as before. The probability of l: is now taken to be independent of X o . With the criterion function as before, each control stage can be optimized separately. To obtain u i *, we compute
Defining j E(xj I y )
\+1 can
=
and
Xj
var(xj I yj)
=
r.,
(82)
O~j~N-I
be expressed as Qi+l[(axi
"i+1 =
+ bui )2 + a2Ti + u02]
Under the assumption, to be verified later, that Xi and of U i we obtain o ~ i ~ -1
r, are independent
as the optimal control variable at time i. In order to compute Xi , consider the joint probability density function p(xi , l: I yi). It is easy to see that it satisfies the recursion equation
.
p(xi+l , l: I y'+1)
I-p(X i , l: yi)P(Xi+l Xi , Ui)P(Yi+l Xi+l , l:) dx, ----------------I [numerator] d(xi+l' l:) 1
=
I
I
It is also easy to show inductively that p(Xi , l: I y i)
=
[Zi.l S(l: - l:1)
+ Zi.2 S(l: -
l:2)]N[fl-i(l:), TiCl:)]
(83)
The second factor in (83) is the Gaussian probability density function with mean fl-i(l:) and variance riCl:) , where
+ bui)/(aTi(l:) + u 2) + Yi+1Il:J/[I/(a 2Ti(l:) + u 2) + Ill:] fl-o(l:) = (alu 2 + Yo/l:)/(Ilu 2 + Ill:) IITi+1(l:) = I!(a 2Ti(l:) + 2) + Ill: IITo(l:) = Ih + Ill: Zi+l.l = Zi,lwi.I!(Zi.l + Zi.2 (0 ~ Zi+1.2 = Zi.2 + Zi.2 fl-i+l(l:)
=
[(afl-i(l:)
0
0
3
3
U0
2
Wi.1
Wi.2)
Wi.2/(Zi.l Wi.l
W i . 2)
i
~
N - 1)
104 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
and where j
=
1,2
Then from (82) and (83) Xi
=
Zi,lfLi(l:l)
r,
+ Zi,2fLi(l:2)
=
Zi,lri(l:l)
+ Zi,2ri(l:2) + Zi,lZi,2[fLi(l:1)
the assumption that satisfied.
Xi
and
r, are independent of
- fL;(l:2)]2
Ui
is thus seen to be
3. Systems with Unknown Plant Parameters A.
OPTIMAL CONTROL POLICY
We had an occasion to mention briefly as Example 7 of Chapter I a control problem where the time constant of a system is random with an unknown mean. We did not derive its optimal control policy however. In this section we will derive optimal control policies for systems having unknown plant and/or observation parameters. Derivations of optimal control policies for this class of systems are carried out quite analogously as in Section 2. Their plant and observation equations are given by (I) and (2). The derivations hinge on the availability of conditional densities P(Ci, {3, Xi I yi), where Ci and {3 are unknown plant and observation parameters of the system. The class of control policies can again be taken to be nonrandomized by the arguments that parallel exactly those of Section 2. Optimal control policies are derived from (84)
where YN
g
AN
Yk+l
g
Ak+l
+ E(Yk+2 I yk),
O~k~N-2
and where Ai
g
JWi(X i ,
U i-1)
p(x i I Xi-I' ex,
U i-1)
1 ~ i ~ N
pea,
Xi-l
I yi-l) d(Xi_1 ,Xi, a),
(85)
3.
105
SYSTEMS WITH UNKNOWN PLANT PARAMETERS
To obtain p(ex, Xi 1 yi), which appears in (85) recursively, consider fl, Xi , Xi+l ,Yi+l I yi) and write it by the chain rule as
p(ex,
p( ex, f3, Xi , Xi+l 'Yi+l I y i) =
p(ex, f3, Xi I yi) P(Xi+l I ex,
f3, Xi ,yi) P(Yi+l I ex, f3, Xi+l , Xi ,yi)
= p(ex, f3, Xi Iyi) p(Xi+l I ex, Xi , Ui) P(Yi+l I f3, Xi+l) Now since
the
recursion equation for Xi I yi) is obtained as
the
p(ex, fl,
p(ex, f3, Xi+l I yi+l)
=
conditional
probability
density
f p(ex, f3, Xi I yi) p(Xi+l I Xi' Ui, ex) X
P(Yi+l I f3, Xi+l) dXi
(86)
J[numerator] d(ex, f3, Xi+l)
and (87)
We next consider several examples.
B.
EXAMPLES
a. System with Unknown Random Time Constant Consider a one-dimensional control system of Example 7, Chapter I, described by the plant equation
o~
i
~
N - 1,
u,
E (-
00,
(0)
(88)
where a's are independently and identically distributed random variables with (89)
for each e where e is the unknown parameter of the distribution function. It is assumed to have an a priori distribution function (90)
106
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
The system is assumed to be perfectly observed: Yi
=
Xi'
(91 )
O~i~N-l
When the common mean of the random time constants is known, the problem reduces to that of a purely stochastic system. As such, this problem has already been discussed in Section 2 of Chapter II. By letting U o ~ 0, the solution of this problem reduces to that for the purely stochastic system as we will see shortly. The criterion function is taken to be (92)
Now
(93)
°
Because of the assumption of perfect observation, the knowledge of is equivalent to that of a N- 2, since ai = (xi+l - Ui)!xi if Xi =F from (88). If Xi = for some i, it is easy to see that u j = 0, j = i, i 1,... , N - 1, is optimal from that point on. Define X
N- I
+
°
ul
=
var(a i I ai-I),
~i~N-l
(94)
~i~N-l
(95)
From Eq. (93), using symbols just defined by (94) and (95), (96)
From Eq. (96), the optimal control variable at time N - 1 is given by (97)
since aN - I and uN - I can be seen to be independent of minimal value of YN is given as
UN-I.
The
(98)
3. Now \
=
SYSTEMS WITH UNKNOWN PLANT PARAMETERS
0, i
<
107
N, because of the criterion function (92), and we have
As we will see later, aN - I is independent of U N- 2 • Therefore, the problem of finding optimal control policy for the remaining (N - 1) control stages essentially remains unchanged. Hence O:::;;i:::;;N-l
(100)
is the optimal control policy and min
EJ =
h*
N-I
(Il a
=
i 2) X 0 2
(101)
1.=0
It now remains to compute quantities defined by (94) and (95). Now from the conditional independence assumption on 8, pta; I ai-I)
=
Jp(a i I B) p(B I ai-I) dB
(102)
(ai 2-':-2 B)2 )
(103)
where, from (89), p(a i I B)
= canst exp (-
To compute p(8 I ai-I), we use the recursion formula p(B I ai)
=
H
H
p(B I a ) p(aj I B, a ) I p(B I aJ-l) p(aj I B, a,-I) dB
,
1 :::;;j :::;; N - 1
(104)
where its initial conditional probability density function is given by
(105)
where fLo
I/E02
=
ao/a2 l/a2
=
l(a 2
+ Bo/a 02
+ l/a0 + l/u0 2
2
(106a) (106b)
108
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
As in previous examples, 8 has the sufficient statistics and we can write p(8 I a j ) = canst exp (_ (6 ;~j)2
,
)
(107)
where the sufficient statistics (fLi , I:i ) are given by (108a)
O~j~N-l
1/Lj 2
=
l/L;_l
+ l/a
2
(108b)
where fLo and I:o are given by Eq. (106). Substituting (103) and (104) into Eq. (102), p( a . I ai-I)
,
=
canst exp (_ (a i - fLi_I)2 2(a2 + LLl)
)
(109)
Thus we have l~i~N-l
(110)
and
Or, more explicitly,
By letting U o -+ 0, we see that iii all reduces to 80 and I:i to O. As expected, the optimal control policy given by (100) reduces to the optimal control policy for the corresponding stochastic case; a special case discussed in Section II,2,C.
b. System with Unknown Time Constant Let us consider a problem related to Example 5 of Chapter I. The plant equation is assumed given by Xo
given,
u, E (-00,00)
(111)
3.
109
SYSTEMS WITH UNKNOWN PLANT PARAMETERS
where it is assumed that b is known, that a is a random variable independent of ts with (112)
where
eis assumed unknown with its a priori probability density given by (113)
and where
are independently and identically distributed with
~/s
(114)
The observation is assumed perfect: Yi
(115)
Xi
=
The system is assumed to be a final-value control system. Let us take its criterion function to be To obtain an optimal control policy, we compute first YN
= =
E(X N
2
I X N- 1)
I X 2 p(X N
N
I XN - 1 , UN-I' a) pea I X N- 1) d(x N , a)
(116)
where CX N - 1
g
E(a I X N- 1)
(117a)
r
g var(a I X N-1)
(1l7b)
N-l
It is now shown that p(a I xi) has the form pea I Xi) = canst exp (_ (a -:;rCXi'L), z
i = 0, 1,..., N - 1
This form is seen to hold for i = 0, since pea I x o) = po(a) = =
f pea I B) Po(B) dB (a-Bo)2) 2 --:l)
canst exp ( - 2(
U
o
-+ Ul
(118)
110
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
Therefore, (118) is true with
Assuming that (118) is true for some i, one obtains p(a I Xi+l) = canst exp (_
-.ia - CXi+~) 2Ti +l
using the recursion relation p(a I Xi+l)
p(a I Xi) P(Xi+l I Xi, Ui , a) u, , a) da
=
f p(a I X') P(Xi+l I Xi,
where
and where (119a)
I/TH1
=
I/T i
To = a 0
2
+ xlla 22
+ a1
(1l9b)
2
This establishes (118) for all i = 0, 1,..., N - 1. The optimal last control is obtained by minimizing (116) with respect to U N- 1 : U~_I
(120)
=
and (121)
Since] depends only on
Y~-I
=
JYN* P(X
N-
XN ,
Ai
=
0, i
=
1,... , N - 2. Therefore,
N- 2) dXN_1 I IX
=
JYN* P(XN-I I XN-2 , UN-I' a) p(a I XN-2) d(XN_1 , a) a22 + T N-1 J [(ax N_2 + bu N-2)2 + a 22] p(a I XN-2) da
=
2 a2 (1
=
+ TN-I) + T N-I[(
CXN-2
XN-2
+ bU N_2)2
1- x~-2TN-2] (122)
3.
SYSTEMS WITH UNKNOWN PLANT PARAMETERS
I I1
By comparing (116) with (122), one immediately gets
and generally
* =-T aixi
(123)
». where
(Xi
are computed from (119).
c. System with Unknown Gain
U~~l
Let us now re-examine Example 6 of Chapter 1. There, we derived as U~-l
=
where and 2 UN-l
YN * -_ b2
N-l
2 -+ U N_1
(
aX N_1
'\' 2 -+ Loo
)2
without computing bN - 1 and UN-l explicitly. Let us now compute them under the assumption that the random variable b is Gaussian, independent of t's and of X o and has the unknown mean 0, and that 0 itself is a random variable with a priori density function given by where eo, L:1 , and L:2 are assumed known. We first show that p(b I Xi)
=
canst exp ( -
Since p(b I x o)
=
Po(b) =
=
(b - by
~i-), ,
i
=
0, I, ...
Jp(b I (}) Po({}) d{} (b -
canst exp ( - 2(L;
lZ
{}O)2
-+ L;zZ) )
(124)
112 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
°
(124) is certainly true for i =
with
bo
=
80
(125a)
1/u 0 2 = 1/171 2 +- 1/1722
(l25b)
From the recursion relation p(b I xi+l)
=
P(Xi+l I Xi , b, Ui)
=
p(b I Xi) P(Xi+l I b, Xi , Ui) b, u.;) db
(126)
I p(b I x') P(Xi+l I Xi'
where const exp ( -
(X
'+
1 -
ax - bU)2 ) 217;2 '
(127)
one gets p(b I xi+1 )
=
const exp ( -
(b - b 2Ui +l
--2~
)2)
where b i+l
_ bd ui2 +- Ui(Xi+l - ax i)/170 2 1/ui2 +- Ui2/170 2
(128a)
and (128b)
where ];0 2 is the variance of the plant noise, thus verifying (124) for all i = 0, 1,.... The statistics (bi , ui 2 ) are the sufficient statistics for b. From (128a), therefore, (129)
where bN~2 is a function of XN~2. Since the criterion function depends only on N - 1, and we have
Y~-l
= =
Xt_l
J
YN
J
X N,
II;
=
0,
* P(XN - 1 I XN - 2 ) dXN - 1
YN*
P(XN - 1 I XN - 2' b, UN - 2) p(b I XN -
l
= 1,... ,
(130) 2
)
d(x N _ 1 , b)
Notice that appears not only explicitly in YN * but also through bN - 1 in carrying out the integration with respect to X N-1 •
3.
SYSTEMS WITH UNKNOWN PLANT PARAMETERS
113
The integration (131)
is of the form 1=
oo
J
-s-co
(ax
X2
+ b)2 +
CZ
( X - fL)2 ) exp - - - 2 - - dx
a, b, creal
(132)
after a suitable change of variables or, more generally, (133)
where Ql and Qz are quadratic polynomials in x. The integration of Eq. (131) cannot be carried explicitly to give an analytically closed expression. Therefore, it is impossible to obtain expressions for u o*,..., u!;_z explicitly analytically in a closed form. They must be obtained either numerically or by approximation. Here is an example of a common type of difficulty we face in obtaining optimal control policies. The system of this example is rather simple, yet the assumption of the unknown mean of the random gain made it impossible to obtain the optimal control policy explicitly. There are many such examples, even when the systems are linear with quadratic performance indices, that they do not admit analytic solutions for optimal control policies. We note that the integrand YN * in (131) is bounded from above by
If this upper bound is used in (131), then we have an approximation of yJ;-l given by
This approximation is equivalent to one-stage optimization where each control stage is optimized separately. This approximation yields a suboptimal control policy where u, = -
b,Z
+ b·
,
a. Z
(axi),
O~i~N-1
114 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
Even though (131) cannot be carried out analytically, some insight can be obtained into the structure of optimal control policies. From (128a) and (128b) we note that
Therefore, the coefficient multiplying X~_l approaches zero:
Substituting
aX N_ 2
+
bU N_ 2
+ ~N-2
for
in the expression for
in
X N-1
YN*'
YN
*
we have
If b is known, then the minimum of the expected control cost for the last two stages using an open-loop control policy is equal to17 0 2(1 + a 2 ) , i.e., the contribution from ~N-l and a~N-2 terms. Let us now examine the effects of I U N_ 3 I -+ 00. Then X N-1 XN
+ gN-2 aX N_1 + bU N-1 + gN-l
~ a~N-3' ~
By employing the controls given by U N-1 =
+ U~-l)
-abN_lxN_l/(b~_l
and and noting that bN -
2
-+
band
U N- 2
-+
0, we have
therefore, Thus, one optimal policy is to let I U N- 3 I -+ 00. These considerations indicate that this example is singular, i.e., the control cost does not attain its minimum for control policies using finite control variables. To be more meaningful, the criterion function must be modified to include the cost of control, for example, from ]= x N 2to N-l
] =
XN
2
+,\ I
o
Ui
2
3.
SYSTEMS WITH UNKNOWN PLANT PARAMETERS
115
With this modified criterion function, we can easily derive optimal control policy when b is known. It is given by u*N-, = -a 2i 0
ix
ob/(,\2 + b~1... a2(j-l))
N-,
j~i
When b is assumed to be the random variable as originally assumed in this example, we encounter the same difficulties in deriving the optimal control policy. Assuming I ai2/bi I is small, i.e., assuming the learning process on b is nearly completed, we may expand y*'s in Taylor series in (al/b i 2 ) . This approximation results in the control policy
where K N - i and L N - i are complicated algebraic functions of a, '\, bN _ i , and aN-i' Computational works on this and other approximation schemes are found in a report by Horowitz.v'< See also Ref. 135a for discussions of suboptimal control policies of a related but simpler problem where b is assumed to be either bi or b2 where the latters are given. Let us return now to (132) and consider its approximate evaluation. Since exp[ - ~{x - fL)2) in (132) is very small for I x - fL I large, one may approximately evaluate 1 by expanding (ax
+ b)2 + c
2
about x = fL and retaining terms up to quadratic in (x - fL), say. When this is carried out, I
R:oO
(27T)1/2fL2 \ (afL + {;)2+ c2 11
+
b2 + c2 b(afL + b)2 + c2(2afL + b) I 2] fL2[(afL + b)2 + c - 2afL fL2[(afL + b)Z + C2]2 \
(134)
After E(YN * I X N- 2) is approximately carried out, it will be, in general, a complicated function of U N- 2 • To find U7;_2 , the following sequential
116 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
scheme (an approximation in policy space-") may be used. First, YN-1 is approximated by a quadratic function of U N- 2 about some U~_2. This U~_2 could be the optimal U N- 2 for the control system with some definite b value. Minimization of this quadratic form gives UJv_2 as the optimal U N- 2 • Then, YN-1 is approximated again by a quadratic function of U N- 2 about UJv_2. The optimal U N- Z now is denoted by U1_1. Generally, YN-1 is approximated by a quadratic form in U N_Z about Ut-2 and Ut+!.2 is the minimizing U N- 2 . Under suitable conditions on YN-1 , Ut_2 ---+ ul;_z as i ---+ 00. See Ref. 8 for an exposition of a similar successive approximation method. This sequential determination of the optimal control, coupled with the approximate evaluation of E(yr:+1 , X k- 1), for example, by the Taylor series expansion, can generate Uk *, 0 ~ k ~ N - 1, approximately.
4. Systems with Unknown Plant Parameters and Noise Characteristics Lastly we will derive recursive formulas for p( ex, (3, 81 , 82 , Xi I yi) and p( ex, (3, 81 , 82 , Xi , ~i , YJi I yi) needed in computing optimal control policies for systems with both unknown plant parameters and noise statistics; the simpler of these two conditional probability density forms should be used for any given problem to evaluate Ak and Yk , 1 ~ k ~ N. These conditional densities are used quite analogously to (24) and (25) of Section 2 or to (84) and (85) of Section 3 in deriving optimal control policies. To obtain p(ex, (3, 81 , 82 , Xi I yi) recursively, consider
and write it by the chain rule as p(ex, (3, 81 , 8z , Xi =
, xi+l ,Yi+1
Iy
i)
p(ex, (3, 81 , 8z , Xi I yi) P(Xi+l I ex, 81
, Xi ,
ui ) P(Yi+l I (3, 82
Integrate both sides with respect to Xi to obtain
f p(ex, (3, 8
1 ,
8z , Xi
, Xi+l ,Yi+1
= p(ex, (3,81 , 82 ,
I yi) dx,
Xi+l 'Yi+1
I y i)
, Xi+1)
5.
117
SUFFICIENT STATISTICS
Hence
=
I p(ex, (3, 8
1 ,
X
82 , Xi I yi) P(Xi+1 I ex, 81 , Xi , Ui) (136)
P(Yi+! 1 (3, 82 , Xi+!) dXi
S[numerator] d(ex, (3,8
1,82 ,
Similarly, p(ex, (3, 81 , 82 , Xi , relation
gi , YJi
Xi+!)
I yi) is obtained recursively by the
= p(ex, (3, 81 , 82 , Xi , ti , YJi I yi) P(Xi+! I Xi , ti , ui) X
P(ti+! I 81 ) P(Yi+1 I Xi+! , 82 , YJi+1) P(YJi+1 I 82 )
(137)
where P(Xi+1 1 Xi , ti , Ui)
=
O(Xi+1 - F i )
from
(1)
P(Yi+! I Xi+1 , YJi+l)
=
O(Yi+1 - Gi+1)
from
(2)
Therefore,
=
I p(ex, (3,8
1 , 82 ,
X
Xi , ti' YJi I yi) o(x i+! - F i ) P(ti+1 I 81 )
P(YJi+1 1 82) O(Yi+! - G i+!) d(x i, ti, YJi) S[numerator] d(ex, (3,81,82 , Xi+!, ti+!' YJi+1)
(138)
The conditional densities (136) and (138) can also be put in forms similar to those of (38) or (39) of Section 2.
5. Sufficient Statistics In Chapter II, one example of sufficient statistics is discussed to replace the collection of observations made on x j , 0 :s;; j :s;; i, by fi-'i and T i so that P(Xi I yi, U i - 1) = (P(xi I P« ,Ti ), i = 0,1, ... , N - 1. One consequence of this is that the optimal control variable U i * becomes a function of T i and fi-'i, where fi-'i is a function of on-line variables U i- 1 , fi-'i-1 , and Yi and Other quantities that can be generated off-line. Similar simplifications of functional dependence of y's of (24) or (84) on y i and U i - 1 are possible if sufficient statistics exist for some or all of the variables ex, {3, 81 , 82 , and Xi . See Section 3 of Chapter IV for details.
118 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
As an example, consider a simple scalar final-value control problem where the state vector is perfectly observed." The plant equation is (139)
where UkC(-oo,oo)
and where y's are used instead of x's because of the assumption of perfect observation. The criterion function is taken to be J = WN(YN)' The plant noise rk's are assumed to be independently and identically distributed binomial random variables with probability 8 with probability 1 - 8
(140)
for each e, where eis the unknown parameter with its a priori distribution given. Assume, for example, that with probability with probability
(0 ~
Zo ~
1)
(141)*
An aircraft landing system that lends itself to this model is considered by Grishin.f" From (139), therefore, at time k, the past realization of the noise sequence (r o , r 1 "'0' rk~l) is available from the knowledge of y k. Therefore, the joint probability** of r o ,... , rlc- 1 is given by Pr[r o, r1
, ... ,
r k - 1 I 8] = 8k - i (1 - 8)i
(142)
where i is the number of times +c is observed and is known from yk. In other words, any function of r o ,... , rk - 1 can be computed given (k, i).
* Assumption (141) on the form of the unknown parameter 8 is not essential for the development. If 8 is a continuous variable, 0 ,;;; 8,;;; I, then an a priori probability density may be taken, for example, to be of Beta type, i.e., r(c + d + 2) 8'(1 _ 8)d ( 8) Po = T(c + I)T(d + I) ,
C,
a> 0
** In this example, the random variables are discrete and it is convenient to deal with probabilities rather than probability densities. It is, therefore, necessary to modify prior developments in an obvious way. Such modifications will be made without further comments.
5.
SUFFICIENT STATISTICS
119
The pair (k, i) is said to be sufficient for 8, or the number of times r's are +c is the sufficient statistic for 8. Denote this number i by Sk . To obtain an optimal control policy for the system (139) one computes, as usual, YN(yN-I)
=
E(WN I y N-I)
J
WN(YN)P(YN I y N- I) dYN
=
(143)
The conditional probability one needs in evaluating YN, therefore, P(r i I y i ) or P(ri I r i - 1 ) , 0 ~ i ~ N - 1. One can write it as Prir,
=
e I yi)
Pr[r i
=
=
IS
e I ri- I ]
where and where Si = t(i + (lie) LJ:~ rj) is the number of times +c is observed. Therefore, in (143) the conditioning variable yN-l, which is an N-dimensional vector, is replaced by a single number SN-l' and we can write (143) as E(WN I yN-I) = E(WN I SN-I' yN-I) = YN(YN-I, ZN-I) =
+ e + UN-I) + (l - 81)WN(aYN-I - e + UN-I)] + (1 - ZN-I)[8 N(aYN-I + e + UN-I) + (1 - 8 N(aYN_I - e + uN-I)] WN(aYN_I + e + UN-I)fJN-I + WN(aYN-I -- e + UN_I )(1 - ON-I)
zN_I[8 I WN(aYN-I
2W
2)W
=
where
0N-l
is the a posteriori estimate of 8, given yN-\
(144)
120
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
and where
where
cxN-1
[:"
=
(~)SN-l
81
(1 -
82 1 - 81
)N-1-S N_
1
Thus YN *, which is generally a function of yN-1 and U N- 2, is seen to be a function of YN-1 and ZN-1 (or SN-1) only, a reduction in the number of variables from 2N - 3 to just two. The derivation of the optimal control policy for this problem and its relation with optimal control policies of the corresponding stochastic systems, where 8 = 81 or 82 with probability one, is discussed later in Section 1 of Chapter VII.
6. Method Based on Computing Joint Probability Density Next, we will describe briefly the procedure along the lines proposed by Fel'dbaum'" to evaluate optimal control policies for the class of system of (1) and (2). The method consists in evaluating R i by first computing
rather than by computing p(81 , 82 , ex, j3, ti , YJi , Xi I yi). We evaluate R k for any non-randomized control policies eP k - 1 = (ePo ,... , eP"-l) where Ui = ePi(Ui - l, yi), i = 0, 1,... , k - 1, since the proof for non-randomized controls proceeds quite analogously. We will first discuss the case when 81 and 82 are the only unknowns.
A.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
In this section we will obtain optimal control policies for the same class of systems under the same set of assumptions as in Section 2. Define
X
p",k-l(X k , yk-1,
X d(x k , yk-1,
gk-l, YJ k - l,
gk-1, YJ k - l,
81 , 82 )
81 , 82 )
(145)
6. where
COMPUTING JOINT PROBABILITY DENSITY
121
pq;-I(X k, yk-\ gk-\ 71 k- \ 81 , (12)
p(81 , ( 2 ) pq;-I(X k I x k- \ yk-l, gk-\ 71 k- \ 81 X P"k-I(Xk-\yk-\ gk-\ 71 k - 1 181 , ( 2)
=
, ( 2)
(146)
and where
- TI P . I(X Ie-I
-
c , 11'l i i v Y i , Si
l't-
I Xi-1, yi-l , <:i-1 8 ' 8) s , TJ i-I ' 1 2
(147)
i~O
By convention, the quantities with negative indices are to be ignored, so that, in (147), P,,-I(XO ,
go , 710 I x-I, y-\
= p(xo ,
g-1, 1')-\ 81
, ( 2)
to , 710 I (II , ( 2 )
(148)
The second term in (146) is simply equal to p"k-I(X k I Xk-1 , tk-1 ,yk-1) = P(Xk I Xk-1 , Uk-I' tk-I) =
!l(x/c - F k- I (X/C - l , U/c-l , tie-I»
(149)
An abbreviated notation O(Xk - F k - I ) will be used. O(Yi - Gi) will stand for O(Yi - Gi(xi , 7)i» in what follows. The right-hand side of (147) is
X
!l(Yi - G i ),
l~i~k-l
(150)
where the independence assumption of ts and 7)'s is used. The subscripts on p disappear on the densities for g and 7) since they are independent of the controls.
122
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
Thus (146) becomes, from (150),
k-l X
TI P(gi I g;-1, 01)P(YJi I YJ i- 1, 02) S(Xi -
F i- l ) S(Yi - G i )
(151)
i~1
If fs and YJ's are serially independent, then (152)
and (153)
As in Chapter II, when no confusion is likely, the superscript or subscript k on 1>k or 1>k will be omitted when 1>k or 1>k appears as a subscript of p such as Pq,k( ... ), or it may be dropped altogether. It is always clear which 1> is meant. For example, in Pq,(Xi+1 I Xi , t i , yi), 1> is really 1>i and Pq,,(Xi+l I Xi' gi ,yi) = P(Xi+l I Xi' gi' u, = ePi(yi)) If Eq. (2) is invertible, then instead of Eq. (151) the joint probability density can be rewritten as
,°
Pq,k_I(Xk,yk-l, e- l , YJ k- l, 81 2) k-l = peel , 02) P(gi I gi-l, 01)P(YJi I YJi-l, 02)
TI
i=l
X P(Yi I Yi-l , Ui-l , gi-l , tu-: , "Ii) P(Xi I Yi , YJi) X
p(X k I Xk- l , Uk_I' gk-l)
(154)
This expression is simpler if P(Yi I Yi-l , U i- 1 , t i - 1 , YJi-1 , YJi) is easily computable. If the values of parameter vectors (J1 and (J2 are known, say (J1 * and (J2 *, then (155)
but values of 81 and 82 are actually unknown and only the a priori density function for 81 and (J2 is assumed given as poe (Jll (J2)' We use the a posteriori density function for p(81 , (J2)' After y k has been observed, define Pk(Ol , 02) = Pq,(OI , 02 I y k)
°
PO(OI , 02) pq,(yk I 1,°2)
(156)
6.
COMPUTING JOINT PROBABILITY DENSITY
123
Equation (156) is evaluated from (154) or (151) using the simpler expression of the two. Equation (156) is rather complicated in general. If (154) is applicable, then (156) becomes a little simpler with Pk(y k I (Jl , (JZ)
=
JTI p( e. I ti-l, k
(Jl)
P(TJi I TJ i-\ (JZ)
i~O
Otherwise, one needs to evaluate pk-1(yi
I (Jl' (J2)
JTI k
=
P(ti I t i-\ (Jl) P(TJi I Tf i - \ (JZ)
i~O
Note the joint density expressions such as (151) and (154) are essentially the same as (38) or (39) which are obtained by the repeated application of the recursion formula for the conditional probability density functions. The method of the present section requires the generation of the joint probability density functions such as (151) or (154), which are used in obtaining the optimal control policies. In our method developed in Chapter II and in the previous sections of this Chapter, the conditional density expressions appear directly in the equations for optimal control policies, and the conditional densities are generated recursively. For example, Pk(8 1 , 82 ) of (156) is generated as follows. Using the chain rule, P((Jl, (Jz , Xi , Xi+1 , Yi+l I yi)
=
P((Jl , (Jz , Xi I yi) X P(Xi+1 I Xi , (Jl ,yi) P(Yi+l I Xi+l , (Jz)
Integrating it with respect to
thus, P((Jl , (Jz , Xi+1 I yi+l)
=
Xi ,
JP((Jl , (Jz , Xi I yi) p(xi+1 I Xi , e, , ui) X
and
P(Yi+l I Xi+l , (Jz) dx,
-;:-----"---=---'-'-=-_:....:c:c_=-,-
_
J [numerator] d((Jl , (Jz , x i+1)
124 III.
B.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
OPTIMAL CONTROL POLICY
The procedure to obtain the optimal control policy is quite similar to that already discussed in Sections 2-4:
Ak ~
JWk(Xk , Uk-1)p(xo , to, TJo I 8
1 ,
k-1 X
II P(ti I t i - \
82 ) p(Xk I Xk-1 , Uk-1 , e-1)Pk(81 , 82 )
81 ) P(TJi I TJ i-\ 82 ) p(X i I Xi-1 , Ui-1 , ti-1)
i~1
X
Eq,k-1(Wk )
=
P(Yi I Xi , TJi) d(xk, t k- \ TJk-1)
JA dy 7e
(159)
k- 1
The optimal control policy is now obtained from
(160)
and k
where the asterisk indicates
= 1,... , N - 1
ut-1 , ut_2 , ... , Uk
* are
substituted in
h *.
C. SYSTEMS WITH UNKNOWN PLANT PARAMETERS
From the discussions in Sections 6,A and B, it is clear that similar developments are possible when the plant equation contains unknown parameters iX and f3 or, more generally, iX, f3, 81> and 82 , Since the procedure to treat such systems are almost identical, only results are given for the system when iX is the only unknown plant parameter. An a priori probability density function for iX, PO{iX) is assumed given. Again, optimal control policies turn out to be nonrandomized. Hence the probability density for Uk is omitted. The probability density function one needs is now k
=
0, 1, 2, ... ,N - 1 (161)
where (162)
7.
125
DISCUSSIONS
and where p",(Xk+1 !IX,yk) = p(XO)
Remember that they depend on
Uk.
k
TI P(Xi+1 I Xi , Ui , IX) o
Equation (162) can be computed from
J
k dx" h( IX I yk) = -;;-'-PO(IX) p",(x I uk-I, IX) c:-:-~
~
JPO(IX)P(Xk I uk-I, IX) d(xk, IX)
Define
Ak
(163)
(164)
by
k-I
=
k-I
JWkP(XO) TIo P(Yi I Xi) TI P(Xi+1 I Xi' Uk' IX) h-I(IX) d(xk, IX)
(165)
0
k
= 1,... ,N
(166)
Define A
YN N-2) YN*(yN-I, U
= =
"\
liN
minAN(yN-I, UN-I)
uN_l
(167)
where Define
= f(yN-I, UN- 2)
U~-I
* (yN-2 UN-3) YN-I' A
=
minimizes (167) with respect to
min [A N-I
UN_2
+ JYA*(yN-I , U
N
-
2) dy ] N-I
(168)
Optimal UN- 2 is obtained from (168) as a function of yN-2 and UN- 3. In general, by defining A *(Yk-I , Uk-2) Yk
=
~~~
. ["Ilk
+ Yk+1 A* (k Y , Uk-I) dYk] ,
k
=
1,... , N - 1
(69)
we thus obtain a sequential procedure for evaluating Bayes' control policy.
7. Discussions In this chapter, the main problem has been to compute certain conditional densities of the form p( Vi I yi), where the variable Vi contains
126 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
the unobservable variables such as Xi , (Xi' e), (e, Xi , ;i)' or(o;, fJ, e1 , e2 , ;i , YJi)' as the case may be. The variable Vi is chosen in such a way that p(v i I yi) can be computed recursively starting from p(v o I Yo) and such that E(Wk I yk-l) is simply evaluated in terms of P(Vk~l I yk-l). The conditioning variables contain only observable variables or known functions of the observable variables. In the next chapter, we will expand on this point further and develop a general theory of optimal control for a more general class of systems. So far, all the discussions are on control systems with known deterministic inputs. When stochastic inputs are considered as indicated in Section I, and a different criterion function Xi ,
N
]
=
L Wk(X k, d k , Uk-I) k~1
is taken for the same system (1) and (2), where dk is the desired stochastic response of the system at time k and where actual input is given by Zk , Zk
=
Kk(d k , 'k)
then a development similar to Sections 2, 3, 4, and 6 to obtain closedloop optimal control policy is possible if the desired form for Uk is specified to be For example, we may assume that the probability density function for dk is known except for the parameter fL E ei" ' e i" given, and that the probability density function for Sk , which is assumed to be independent of fs and YJ's, is completely given. If zk, in addition to yk and uk-I, is assumed to be observed, then, using f P(fL I Zk) p( dk I fL) dfL as the probability density function for dk , it is possible to discuss optimal control policies for discrete-time systems where e's, 0;, fJ, and/or other parameters may be additionally assumed unknown. Unlike a posteriori density functions for e's, 0; or fJ, the P(fL I zk) does not depend on the employed control policy since the Zk are observed outside the control feedback loop. In such cases information on fL is accumulated passively by merely observing a sequence of random variables whose realizations cannot be influenced by any employed control policies. The procedure of obtaining the optimal control policies in Section 6 accomplishes the same thing as the procedure based on evaluation of conditional expectation
7.
DISCUSSIONS
127
In the former, the computation of A. k is complicated whereas in the latter, that of p(vi I yi) is the major computation where Vi could be Xi , (Xi' ex) or (Xi' ex, fl, 81 , 82), as the case may be. Once P(vi I yi) is available the computation of A. k is relatively easy. Thus the method of this book differs from that of Fel'dbaum primarily in the way the computations of E(Wk ) are divided. Our method is superior in that the dependence of A. k on p(Vi I yi) is explicitly exhibited and hence the introduction of sufficient statistics is easily understood. The dependence of A.'S on some statistics is also explicitly shown in our method. The similarity of the main recursion equations of Chapters II and III are also more clearly seen in our formulation. Also, in Section 6, the a posteriori density function for unknown system and/or noise distribution function parameters are incorporated somewhat arbitrarily and heuristically, whereas in our method it is incorporated naturally when P(vi I yi) are computed. It is worthwhile to mention again that the problems of optimal control become more difficult when observations are noisy. We have discussed enough examples to see that the derivations of optimal control policies are much easier when state vectors and realizations of random variables are measurable without error than when only noisy measurements on state vectors are given. The difficulties in deriving optimal controls are compounded many times when the statistics of the measurement noises are only partially known.
Chapter IV
Optimal Bayesian Control of Partially Observed Markovian Systems
1 Introduction +
In the previous two chapters we have derived formulations for optimal Bayesian control policies for purely stochastic and adaptive systems. We noted that the main recursion equations for optimal control policies are identical for these two classes of systems. The slight differences are in the auxiliary equations that generate certain conditional probability densities for these two classes. The only quantities that are not immediately available and must be generated recursively are the conditional probability densities which are p(Xi I yi) in the case of purely stochastic systems and are P(Xi' 8 I yi) or p(x i, 8i , 82 , ex, fJ I yi), etc., in the parameter adaptive systems. The other probability densities needed in computing y's are immediately available from the plant and observation equations and from the assumed probability distribution functions of noises and/or random system parameters. In each case, the conditioning variables are the variables observed by the controller or some functions of these observed variables, such as y's, u's, or sufficient statistics. The other variables are the quantities not observed by the controller, such as x's or (Xi' 81 , 82 , ex, fJ), etc. Developments in the previous two chapters are primarily for systems with independent random disturbances, although possible extensions for systems with dependent noises have been pointed out from time to time. In this chapter we consider more general systems where noises t and 1] may be dependent and where unknown plant and observation parameters ex and fJ may be time-varying. We present a formulation 128
1.
129
INTRODUCTION
general enough to cover much wider classes of control systems than those considered so far. See also Refs. 2, 14-16, 56, 105a, 130, and 135 for subjects related to this chapter. The class of systems of this chapter is assumed to be described by a plant equation i = 0,1, ... , N
-~
1
(1)
and the observation equation Yi =
Gi(X i ,
i
iu , f3i),
=
0, 1,... , N - 1
(2)
where t's and YJ's are noises and where system parameters ex and f3 are subscripted now to include the possibility of these unknown system parameters being time-varying. When they are not subscripted, they are understood to be constants. The criterion function is the same as before: N
J= I
Wi(Xi , Ui-1)
1
Only the class of nonrandomized control policies will be considered. It is fairly clear that one can heuristically argue that optimal control policies for systems of (1) and (2) are nonrandomized in much the same way as before. It is fairly clear also that the approach of Chapters II and III, where certain conditional probability densities have been computed recursively to derive optimal control policies, can be extended to cover the class of control problems of this chapter. As an illustration, consider a system with the unknown plant parameter ex and the unknown observation parameter f3. The noises t and YJ are assumed to form mutually independent first-order Markov sequences such that the unknown parameters ()1 and ()2 characterize their respective transition probability densities, i.e., and We know that, if p( ex, Ai
g =
Xi '
ti I yi)
E(Wi(X i, U i-1)
J
Wi(Xi , U i-1)
X
p(ex,
X i-1 , ~i-1
is known for all
J
I yi-1) =
Wi(X i, U i-1)
p(x i I X i-1
I yi-1)
, U i-1 ,
~i-1
d(X i_1 , Xi ,
,
ex)
ex, ~i"1)
°:s;;
j
:s;;
N - 1, then
p(x i ! yi-1) dx,
130
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
is computable for all 1 ~ i ~ Nand nonrandomized optimal control policies are derived from them. The conditional density p(Ci., Xi , ~i I yi) is obtained by computing recursively conditional densities of a certain vector which suitably augments (xi' Yi)' all conditioned on yi. For example, P(Xi' ~i' TJi , 01 , 02 , Ci., (3 I yi) is computed recursively. The derivation of such a recursion relation is carried out as usual. The chain rule is used to write =
f3 I yi) P(Xi+1 I Xi , ti , <x, ui ) P(ti+1 I ti' 81) P(YJi+1 I YJi' 82) P(Yi+1 I xi+1' (3)
P(Xi , ti , YJi , 81 , 82 X
, <X,
where the assumptions on fs and TJ's are used to simplify some of the conditional density expressions. Integrating both sides with respect to Xi , ~i , and TJi ,
Jp(xi , ti , YJi ,8
1,82 , o ,
f3, Xi+1 , ti+1 , YJi+1 ,Yi+1 I y i) d(x i , ti , YJi)
=
P(Yi+1 I yi) p(x i+1 , ti+1 , YJi+1 , 81 , 82
=
Jp(xi, ti' YJi' 8
, <x,
f3 I yi+1)
Therefore,
1 , 82 , <x,
X
f31 yi) O(Xi+1
-Fi ) p(t i+1
I t., 81)
p(YJi+1 I YJi , 82 ) O(Yi+1 - Gi+1) d(Xi , ti , 7]i)
f [numerator] d(x i+1 , t i+1 , YJi+1 , 81 , 82 , Ci., (3)
(3)
and
The recursion (3) is started from a given a priori probability density for (x o , ~o , YJo , 01 , O2 , Ci., (3): c 8 8 p( x o, £0 , YJo' l ' 2'
<X,
f3 I ) -
Yo-
Po( Xo , to , YJo , 81 , 82
, <x,
(3) p(Yo I Xo , (3)
f [numerator] d(x o , to, YJo , 81 , 82 , Ci., (3)
Conditional probability densities and optimal control policies for systems under different sets of assumptions can be similarly derived by first augmenting (x, y) appropriately so that the conditional densities for the augmented vectors are more easily obtainable.
1.
131
INTRODUCTION
As another example, if the system parameter cx is not a constant but a Markov random variable with known transition probability density P(CXi+l I (Xi)' and if fs and 7)'s are all independent with known densities, then P(xi , (Xi I yi) can be recursively generated similarly to (3) and used to evaluate
=
J
Wi+l(Xi+l , u i ) p(xi+l I Xi'
X
where P(Xi+l , (Xi+l I yi+l)
=
CXi ,
Ui)
p(Xi , CXi I yi) d(x i , Xi+l , CXi)
Jp(Xi , X
(Xi
I yi) p(Xi+l I Xi , (Xi , Ui) P(IXi+l I (Xi)
P(Yi+l I Xi+l) d(Xi , (Xi)
f [numerator] d(xi+l ,
IXi+l)
Instead of cataloging all such systems which are amenable to this approach, we will develop a general method which subsumes these particular cases. The approach we use in deriving optimal control policies for such systems is to augment the state vector x and the observation vector y with appropriate variables in such a way that the augmented state vector becomes a (first-order) Markov sequence. Then, in very much the same way as in Chapters II and III, the optimal control policy is obtained once we compute certain conditional probability densities of the unobserved portion of the augmented state vector, i.e., the components of the augmented state vector which are not available to the controllers, conditioned on the observed portion, i.e., the components of the state vector which are made available to the controllers. The knowledge of the controller consists, then, of the past controls, the observation data, i.e., the collection of the observed portions of the augmented state vectors and of the a posteriori probability distribution function of the unobserved portion of the augmented state vector. This amount of information is summarized by sufficient statistics if they exist. The derivation of the optimal control policy and the a posteriori probability distribution function, assuming the existence of the probability density function, is discussed in Section 3 and 4, respectively. In the next section, we pursue the subject of Markov properties of the augmented state vectors which are of basic importance. See also Ref.51a.
132
IV. CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
2. Markov Properties A.
INTRODUCTION
In some cases {(Xi' Yin is already a first-order Markov sequence, as will be shown presently in this section. When this is not the case, there is more than one way, depending on the assumptions about noises and parameters in plant and observation equations, of augmenting (Xi' Yi) so that the resulting vector becomes a first-order Markov sequence. Clearly, if gi} is Markovian, where 'i is some augmented state vector, then we do not destroy the Markov property by adding to independent random variables with known distribution functions. Simplicity and ease of computing the conditional densities would dictate particular choice in any problem. The question of the minimum dimension of augmented state vectors 'i to make gi} Markovian is important theoretically but will not be pursued here. Generally speaking, the a posteriori probability density functions such as p(xi I y i ) and P(xi' gi , YJi I yi) are sufficient in the sense that the corresponding density functions at time i 1 are computable from their known value at time i. We can include the a posteriori probability density function as a part of an augmented state vector to make it Markovian. The dimension of such augmented vectors, generally, are infinite. We are primarily interested in finite dimensional augmented state vectors. As an example of a system where {(Xi' Yin is a first-order Markov sequence, consider a purely stochastic dynamic system of Chapter II:
'i
+
Yi = Gi(Xi, "1i),
i = 0, 1,... , N - 1
(4)
where fs and YJ's are mutually independent and independent in time and have known probability densities and (4) contains no unknown parameters. Consider a vector (5)
In (5), Yi and Ui are the only components observed by the controller. We will see that under certain assumptions gi}-process is a first-order Markov sequence, where the conditional probability of 'HI is such that Pr['i+l EEl
'0 =
Zo , ... , ~i = Zi] = Pr['i+l EEl 'i = Zi]
for all i, where E is any measurable set in the Euclidean space with the same dimension as ,. This is the transition probability of g}-process.
2.
133
MARKOV PROPERTIES
It is assumed furthermore that the transition probability density p( 'i+l I 'i) exists so that Pr['i+l EEl 'i = z;] =
r
&.'
~i+lEE
P('i+l I 'i = Zi) d'i+l
Let us compute the conditional probability density P('i+l I 'i) of (5) assuming that Ui depends only on Yi' or at most on Yi and Ui- 1 • This assumption will be referred to as Assumption Y. We have seen several examples in previous chapters where Assumption Y holds true. Generally speaking Assumption Y implies that Yi is the sufficient statistics for Xi , i.e., p(x i I y i) = p(x i I Yi) and P(Yi+l I yi, Ui) = P(Yi+l I Yi , Ui)' Then 'Yi+l will be a function of Yi and Ui~l rather than y' and u'', and ui * is obtained as a function of 'Yi rather than of y i [see, for example, Eq. (21) of Section II, 2, BJ. Detailed discussions on the validity of Assumption Y is presented later in this section. With Assumption Y, we can write p( 'i+l I 'i) as
= 8(x i+l - F i) P(gi+l) p("Ii+l) 8(Yi+l - Gi+l) X 8(Ui+l ~ (MYi+l , u i ) )
where the independence assumption of the random noises Thus, we see that
IS
used.
and P"'('i+l I 'i) is computable as a function of
'i
(6)
134
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
where p(xi+l I Xi' u i ) is computed from (4) making use of the known distribution of gi and similarly for P(Yi+l I xi+l)' If Assumption Y does not hold, however, then Ui depends on y i and we must consider (7)
instead. Then Pq,i(X i+1, gi+1 , 7)i+1 , yi+1 I Xi, gi, 7)\ y i) i»P(gi+1) P(7)i+1) = p(Xi+1 I Xi' gi' u, = ePi(y X
P(Yi+1 I Xi+1
,7)i+1)
and
(8)
'i
and gJ becomes a first-order Markov sequence. Since the dimension of grows with i, this process is not a conventional Markov sequence. One way to avoid the growing state vectors will be discussed later for problems with sufficient statistics. As another example of constructing a first-order Markov sequence gi}' consider the system of (4) again, this time assuming that the distribution of noises g and 7) contain unknown parameters 81 and 82 , respectively. The random noises are still assumed all independent. Then, we can no longer compute p(xi+l , Yi+l I Xi 'Yi) since p(xi+l I Xi ,Ui) is a function of 81 and P(Yi+l I xi+l) contains 82 , which are both assumed unknown, i.e., {(Xi' Yin is no longer Markovian even with Assumption Y. * Consider instead where IJ1i
=
IJ1
IJ 2i = IJ 2
i
,
=
0, 1,... , N - 1
Then =
P(Xi+1 I Xi, gi , u, = ePi(Yi» P(gi+1 [ IJ 1) X P(7)i+1 I IJ 2) o(IJU+1 X
=
,7)i+1)
O(Xi+1 -Fi ) p(gi+1 1IJ1)P(7)i+11 IJ 2) X
* It
P(Yi+1 ! Xi+1
IJ1) O(IJ 2 •i+1 - IJ 2)
o(IJU+ 1 - IJ1) O(IJ 2. i+1 - IJ 2) O(Yi+1 - G i +1)
is conditionally Markovian in the sense that P(Xi+l ,Yi+l I Xi, Y" 0 , , O2)
=
P(Xi+l ,Yi+l I Xi , Yi , 0, , O2 ) ,
This fact may be used advantageously in some cases.
2.
MARKOV PROPERTIES
135
which is computable again knowing 'i with Assumption Y, hence gi} is Markovian with Assumption Y. Also, as indicated in connection with (3) in the previous section, if we change the independence assumption on fs and r/s, then {(Xi' Yin is no longer Markovian even if the noise distributions are assumed known. For example, assume that fs and r/s are the first-order Markov sequences, and {t}-process is independent of {?)}-process. Assume that P(ti+1 I ti) and P(YJi+1 I YJi) are known. Then, in (6), P(Xi+1 I Xi' Ui) is not known since t i is not given; P(Yi+l I x i+1) is unknown because YJi+1 is not known. The augmented state vector 'i' where
still forms a Markov sequence, however, with Assumption Y. This is seen by =
p(x i+1 I Xi, X
B.
gi , U i ) P(gi+1 I gi)
p( 1)i+l I YJi) P(Yi+1 I Xi+1 , 1)i+1)
PROBLEMS WITH SUFFICIENT STATISTICS
In all the examples discussed so far in this chapter, Assumption Y is used to guarantee that Ui is a function of Yi and not of y i so that a first-order Markov sequence {SJ can be constructed by augmenting Yi and not y i. Now we consider the possibility of replacing y i by some sufficient statistics. We have seen in previous chapters that U i is generally a function of yi and not just of Yi . This dependence of Ui on yi occurs through p( . I yi) in computing \+1 . Intuitively speaking, if sufficient statistics, Si' exist so that p(xi+1 I yi) = P(Xi+l I Si), where Si is some function of Si-l , Yi' and possibly of Ui- 1 , then the dependence of Ui on past observation is summarized by Si. By augmenting the observed portion of the state vector with the addition of Si' say
g}-process may become Markovian even without Assumption Y. In order to make this more precise, consider a situation where the conditional density of xi+1 , given Xi' (), and any control Ui' is known, where () is a random parameter in a known parameter space B. That IS, it is assumed that p(Xi+1 I Xi, u, , 8),
8E
e
136
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
is given, that the observation noise random variables are independent among themselves and of all other random variables, and that P(Yi I Xi) is known completely. 0 and X o are assumed to be independent. Denote their a priori probability density functions by Po(O) and Po(x o), respectively. Such dependence of the conditional density xi+l on 0 may arise either through the unknown system parameter 0 in the plant equation or through the plant noises whose probability distribution function contains O. This dependence on 0 can occur, for example, for a system with the plant equation Xi+l =
a(8) Xi
+ b(8) u, +
ti
where a and b are known functions of 0 and where the t i are serially independent and have a known probability density, or for a system described by
where a is a known constant and t's are independently and identically distributed with a probability density I 0),0 E 8. We know from Chapter III that, in order to find an optimal Ui , we need the expression for
«e.
A;
= =
=
E(Wi I yi-l)
J
Wi(Xi , Ui-l)
p(x i
I yi-l) dx,
J
Wi(X i , Ui-l)
p(x i
I Xi-I' U i- l , 8) p(8, X i - l I yi-l)
where p(B, write
Xi - 1
Iyi-
1)
d(Xi_1 ,Xi'
8)
(9)
can be generated recursively as such, or we can (10)
and obtain p(B I yi-l) and P(Xi - 1 I 0, yi-l) by two separate recursion equations. The sequence of observations Yo , ... is related to the sequence X o , Xl , ... through the observation equation. Hence it is assumed that the sequence contains information on B, i.e., the joint probability density function of yi and 0 is assumed to exist and not identically equal to zero for almost all realization yi.
2.
137
MARKOV PROPERTIES
Given any control policy, the a posteriori probability density function of 8 given yi is computed from Po(8) by Bayes' rule. Define (II) Suppose now that a set of sufficient statistics Si = S(yi) exists for 8 such that it satisfies the equation (12)
When a sufficient statistic exists it is known 73 that the probability density of yi given 8 for any control policy can be factored as (13)
A large class of sufficient statistics satisfy Condition (12). For example, consider a class of probability density functions known as the KoopmanPitman class.P The density function of this class has the form p(y I B)
exp{r(B) K(y)
=
+ S(y) + q(B)},
-00
<
Y
<
00
where r(8) is a nontrivial continuous function of 8, 8(y) and K'(y) =f= 0 are continuous in y, and q(8) is some function of 8. The density of a Gaussian random variable, for example, belongs to this class. Then, p(yi I B)
=
exp [r(B) n
I K(Yi) + t S(Yi) + (n + 1) q(B)] o
0
]
[
= R [ ~ K(Yi) exp r(B).~
n
K(Yi)
+ (n + 1) q(B)]
exp(L~S(Yi))
R[L~
K(y)]
where R is a function that arises in the one-to-one transformation defined by
so that L~ K(Yi) is seen to be sufficient and IS m the form of (12). From (11) and (13), the a posteriori probability density of 8 is given by Pi(B) =po(B)f(B, 5i) J dB po(B)f(B, 5 i )
=
p(BI 5,)
(14)
138
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
Equation (14) shows that Pi(B), the ith a posteriori probability density of B, depends on yi only through S(yi) when s is the sufficient statistic for B. Then (10) becomes
Therefore, instead of deriving the recursive relation for p(B, x j I yj), one can obtain p(Xj I B, yj) recursively. This recursion equation will be generally easier to manage since B is now assumed known and can be generated in the usual manner. First, write
Now
Therefore, P(XH
I
Iy+l, B) =
Jp(xi I yi, e) P(Xi+l I Xi' Ui , 8) X
P(Yi+l I Xi+l) dXi
J[numerator] dX +
(I5)
i 1
with p( x
o
IY
8) 0'
=
p(xo ,Yo I e) p(Yo I e) Po(xo I e) p(Yo I X o , e)
JPo(xo I e) p(Yo I Xo , e) dx o Po(xo) p(Yo I x o)
Suppose that (15) is such that another sufficient statistic t i exists such that p(Xj I B, yj) = p(x j I B, t j), where tj+l is a function of t j, Yj+l , and u j, and that tj+1 = f2(t j ,Yj+l , Uj). Then, (9) shows that '\ will be a function of Ui-I , Si-l , and t i - 1 • To show that Yi also depends at most on U i- I , Si~l , t i - 1 , and on Yi-l , we must next investigate the functional dependence of E(yt+1 I yi-I). We know from the above augument that
and therefore
2.
139
MARKOV PROPERTIES
Then, YN* will be a function of we need
and
SN-l
tN-I.
In computing
YN-l ,
where and by assumption. Now P(YN-l
I y N- 2 )
=
I
P(YN-l
X =
I X
I X N_1) P(X N_1 I X N- 2
p(8, X N- 2 I y N-2)
P(YN-1
p(8
, UN-1 ,
d(X N_1 , X N- 2 ,
8)
8)
I XN-1) P(X N_1 I X N- 2 , U N- 2 , 8)
I SN-2) P(X N-2 I 8, t N_ 2) d(XN_1 , XN- 2 , 8)
Hence E(YN * I y N- 2) will depend at most on SN-2 , t N- 2 , and on U N- 2 • By similar reasonings, E(yl+1 I yi-l) is seen to depend at most on Si-l , t i - 1 , and on U i - 1 . Thus,
e
To summarize, Si is the sufficient statistic for and t i is the sufficient statistic for yi. Si+1 and t H I are computed as known functions of Si' t i , U i , and YH1 . Now we are ready to show that gi} is a first-order Markov sequence where (16)
O~i~N-l
This can be shown by computing
= P(Xi+l I xi, yi, r, Si) P(Yi+l X P(ti+1
I Xi+l)
I t., , Yi+1 , u i ) P(Si+l lSi, Yi+l
, Ui)
where the assumptions of the dependence of t i + 1 and and U i are used. The first factor can be written as
Yi+1 ,
Si+1
on
t i , Si'
140
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
Since Thus The observed portion of the vector ~i is (Yi , t i , Si). It can also be shown that gi}' where 'i = (Xi' Bi , Yi , Si , t i ), Bi = B, is also Markovian.
3. Optimal Control Policies Suppose that gi}-process, where Si is derived by augmenting (Xi' Yi) appropriately, is a first-order Markov sequence with a known transition probability density function. In the previous section several ways of constructing such "s are discussed. Components of the vector ~i can usually be grouped into two classes, one group consisting of components not observed and not available for control signal synthesis, the other group consisting of known (vector) functions of components observed and stored by the controller. Denote them by f-Li and Vi , respectively, 'i
=
(0i, Vi)
Therefore, f-Li contains some function of Xi among others and Vi contains some function of y i . * For example, if 'i = (Xi' ti' YJi, Yi), then f-Li = (Xi' t i, YJi) and Vi = Yi . The available data at time i to the controller is vi, and U i is to be determined by choosing ePi where
a. Last Stage
Let us determine U~_l' assuming U o* , ..., Ui;-2 have been already chosen. E(WN ) is minimized by minimizing E(WN I v N - 1 ) for every possible v N - l • Now, as a function of UN-I' E( W
N
I v N-1) = = =
J
W N(X N , UN-I) p(X N , U N- I
J J
I v N- I) d(X N , UN-I)
W N(X N , UN-I) P(U N-I
I v N- I) p(XN I UN-I' v N- I) d(X N
WN(X N , UN-I) P(U N-I
I v N- I) P(0N I 0N-I
X P(0N-I
I UN-I' v
N- I)
, UN-I)
, UN-I' vN-I)
d(0N , 0N-I , UN-I)
'i
* In some cases it may be convenient to take such that both fLi and of observed quantities by the controller. See Section 5 for examples.
Vi
are functions
3. assuming
UN-I) is available. The density p(Pw I fLN-I , is computed from the known transition density P(f-LN'
P(PW-I
VN-I ,UN-I)
141
OPTIMAL CONTROL POLICIES
IvN - \
I f-LN-I , V N- I , UN_I)' Following the line of reasoning in Chapters II and III, one sees that the nonrandomized control is optimal, and is given by VN
where
Uj$_1
minimizes
N I AN(U N- I , v - )
AN'
g
J
WN(X N , UN-I) P(floN
X P(floN-I I UN-I' V
I floN-I
,UN-I' V N- I)
N I) d(floN , floN-I)
Define
b. Last Two Stages Next determine Uj$_2 in such a way that when followed by Uj$_1 it minimizes E(WN - I WN I vN - 2 ) for every vN - 2 , assuming u o*, ..., Uj$_3 have been determined. As before,
+
assuming
P(fLN-2
I U N- 2 ,
vN -
2
)
is known. Since
+ JYN*(vN-I)p(V N_ I I UN- 2, v N - 2 ) X dVN-I] P(U N- 2 I V N -
Defining YN-I
=
A N- I
+ JYN * P(VN- I
I
U N- 2 , v
N- 2
2
) dU N_ 2
) dVN-I
where P(V N- I I U N- 2 , v N- 2) is assumed known, the optimal control at time N - 2, Uj$_2 , is found by Y~-I
g min YN-I u'N_2
142
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
and the optimal control is given as
c. General Case
Generally U't_I is found by minimizing Yk where Ak
g
J
Ylc
g
Ak
Wk(Xk , Uk-I) P(Pk
+ JY:+l P(Vk I U
Ic-
I Pk-I , Vk-I , Uk-I) P(Pk-1 I vic-I, Uk-I) k-I) dVk, I ,v Yk
*=
k
min U k_ 1
=
1, ... , N
d(fLk , Pk-I)
(17)
Yk
where P(fLk-I I Uk-I' vk-I) and P(Vk I Uk-I, vk-I) are assumed known. We see that the optimal control policies are obtainable if P(fLk I Uk' vk) and P(Vk+l I U/c , vk), k = 0, ... , N - 1, are known. Therefore, our attention is next turned to computing these conditional probability density functions. Here, it should be pointed out that this approach may be computationa:Ily advantageous even when {x/c} and {y/c} are Markovian by themselves.
4. Derivation of Conditional Probability Densities It is not generally true that {VIc} itself is Markovian even though g/c} is. We want to compute the conditional probability densities of parts of the components of a multidimensional Markov sequence conditioned on components that are observed. We assume that the a priori probability density of
'0 ,
(18)
is given. Let us now obtain the recursion equation for P(fLi I vi) and P(vi+l I vi). By now, the method of obtaining such a recursion relation should be routine for us. We consider
By the chain rule, we can write it as P(fLi ,
fLi+l ,
Vi+l
1 Vi)
= P(fLi I vi) P(fLi+1 , Vi+1 I fLi , vi)
= where the last line
IS
P(fLi
I vi) P(fLi+1
,
Vi+1
I fLi , Vi)
obtained from the Markovian property of gi}'
5.
143
EXAMPLES
Integrating both sides with respect to
fLi,
we have
Therefore, (19)
and the denominator gives P(vi+1 I vi).
5. Examples A.
SYSTEM WITH UNKNOWN RANDOM TIME CONSTANT
Let us rework Example a of Chapter III, Section 3,B, by another method, using the idea of the augmented state vector. The system equations are given by (88) and (91) of Chapter III. We know from our previous investigation of this example that a sufficient statistic exists for 8:
(20)
From (107) and (l08) of Chapter III, we can identify Sic with fLlc-1 and with };1c-1 • With this identification of the sufficient statistics, they are seen to satisfy the recursion equation:
Ua,1c
2
Ua.1+1 a.
S. = )+1
a
_1_=_1_+~ U~.H1
2
uL
J
+ Ua.1+l s. 2
2)
Ue,j
u
2
(21) (22) (23)
or
144
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
and ao2L~-1(Xi+l Sj
=
a2
budxi)
+ jao
+
a 280
2
Note that (X i + 1 - bui)/xi = ai in (21), showing that the value the random variable takes at time i is exactly computable because of no measurement errors. The same comment found in Example b, Section 4,B of Chapter III, applies when Xi = 0 for some i. Since Sk summarizes all the information contained in x k about e, from our discussion in Section 2,B, it is seen that controls Uk depending on Sic and x k are just as good as controls depending on x lc• Therefore, we consider the class of nonrandomized control policies such that O~i~N-l
(24)
Define (25)
Therefore, the augmented state vector equation
Sj
obeys the augmented plant
(26)
Since Uj depends only on x j and Sj and since aj is mutually independent, (26) shows that {Sj} is a first-order Markov chain. This is rather a special case where fLi = Xi and Vi = Si, i.e., every component of Si is observed. Its transition probability density is given by
The right-hand side of (28) is computable since
(28)
where
pee I Sj)
is given by (20). From (21), (29)
5.
EXAMPLES
145
An optimal control policy for the problem is now computed using (17). We first compute AN by
From the plant equation (88) of Chapter III and the probability density function of ai ((89) of Chapter III), p(xN I X N- 1 , UN-I' 0) is Gaussian with mean OXN~1 bUN_1 and variance a2x~_1 • Therefore,
+
(31)
Therefore, (32)
and (33)
Since this
IS
a final-value problem, '\; =0,
i
=
1"'0' N - 1
(34)
and
Comparing (35) with (30), we see immediately that, aside from the multiplicative constant factor, minimization of AN with respect to U N_ 1 is identical to that of YN-l with respect to U N- 2. Therefore,
or, in general, i
=
0, 1,0,0, N - 1
(36)
146
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
where Si is given by (21). Note that, as u0 2/u 2 2 ---+ 0, i.e., as our knowledge of the unknown mean (j becomes more precise, as expected. B.
SYSTEM WITH MARKOVIAN GAIN
Consider a one-dimensional control system described by i
= 0, 1,... , N
- 1
(37)
where a and x o are assumed known. The gain of the system bi is assumed to form a first-order Markov chain with two possible states, (38)
with known stationary transition probabilities i,j=I,2 k
=
0, 1,... , N - 1
(39)
and Pr[bo] given. The observations are assumed perfect, Yi =
i
Xi'
0,1, ..., N - 1
=
(40)
Hence x's are used throughout, instead of y's. The performance index is taken to be N
]
=
L (Xi + AuLl) 2
(41)
1
Since b.
=
Xj+l -
aXj
Uj
J
if
Uj
*°
the knowledge of xi implies that the value of bi - 1 is known. This fact is used to obtain the probability distribution for bi which we need in evaluating -\. *
* If Ui-l =
0 then bi _ 1 is unknown. In this unlikely case, we must work with 2
P(b i
I bi - 2 ) =
L: ru; I b
i..,
= f3k)P(b i -
1
= f3k I bi -
2)
5.
147
EXAMPLES
The augmented state vector for this example can be taken to be i
=
1,2, ... , N
This example problem is special in that all components of Si are known at time i. * The fact that {Si} is a first-order Markov sequence is verified by computing the conditional density P( Si+l lSi) P(~i+l
I ~i)
=
p(b i , Xi+l I br:", Xi)
= p(b i I bi-l)P(Xi+l I Xi , u,
, bi)
I ~i)
= P(~i+l
(42)
assuming ui = epi(Xi ,bi- l ) . We will see shortly that the optimal control variables have the assumed functional dependence. To derive the optimal control policy, we consider the last control stage first. Since it is convenient to indicate the dependence of the conditional expectation E(WN I SN~l) on the value of bN - 2 , let us define AN. k ~ E(WN I XN-l' bN- 2 = f3k)
=
I (XN + AU~_l)P(XN
=
I
2
2
i~l
=
I (XN2 + AU~_l)P(XN
2
I
, bN- l I XN- l , bN_2
[(aXN~l
=
f3k) d(x N , bN- l )
I XN- l , bN- l = f3i , UN-l) Pik dXN
+ f3i UN_l)2 + AU~-l]Pik
(43)
i~l
Minimizing (43) with respect to at time N - 1
U N-1'
we obtain the optimal control k
=
1,2
(44)
where (45)
In the denominator of (45), note that L:;~l f3iPik = E(bN _ l I bN - 2 = 13k)· Thus, from (44) and (45) we see that U~_l depends on the value of bN - 2 , *
~o
and p( ~o)
require an obvious special handling because of Pr[boJ.
148
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
i.e., ut-l = cPN-l(XN- 1 , bN - 2) as assumed in connection with (42). Substituting (45) into (43), we obtain (46)
where
Equation (46) shows that AN,l and A N,2 are both quadratic in X N- l• Now we compute the optimal control variable generally. Assume that
k = 1,2,
i
= 0, 1,..., N - 1
(47)
By definition, CN,k
= 0,
k
=
1,2
Then, from (47),
(48)
On the other hand, Yi\
= ~in
1-1
2
[,\;.k + I
m=!
E(yi+l.m I Xi - l ' bi- 2=
where
and where 2
I
E[yi+l.m I X i - l , bi -
m=l
2= ,sk]
2
=
L
m=l
E( Ci,mxll
Xi-l
.s., =
f3k)
,sk)]
(49)
5.
149
EXAMPLES
Therefore, making use of (50),
k
By minimizing (51) with respect to
Ui-1 ,
we obtain
=
1,2
Ut-l.k
(51)
to be (52a)
where
where k = 1 or 2, depending on bi - 2 = (31 or (32' respectively. Substituting (52a) into (51), the recursion equation for Cj,k is obtained: C'--l.k
=
azll + PlkC + PZkCZ,k i,l
_
[Plk,81( Ci • + 1) + P2k,82(C'-,2 + 1)]Z I ,\ + P1kf3 1Z(C'-,1 + 1) + PZk,822(C,-,Z + 1)\' 1
k
=
1,2
(53)
Equation (53) can be simplified somewhat by writing it in a vector form. Define Then
c.'-1
C,
'-1
=
aZ
(1)1 + aZ (Pn PIZ
=
(C,.-l.l) Ci-l.
(54)
Z
PZl) PZZ
C, + D, 1
,
,
i = 2,3, ... , N
(55a)
where (55h)
and where (55e) (55d)
Note that the initial vector Co must be computed using the a pnon probability Pr[bo = (3,-J, i = 1, 2, rather than the transition probabilities. From (52b) and (55),
Ai- 1 g (AA
i-
1'1) = _ (dil/flil) diZlfl iZ
i - 1,Z
(56)
150
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
Since C's and A's can be precomputed, the only operation that the optimal controller must perform on-line at time i is the determination of bi - 1 to be either (31 or (32 by
i
=
1,2, ... , N - 1
(57)
i.e., Pr[b i _ 1 = (3k I xi, Ui-1] = 1, where (3k is given by (57). Once bi - 1 = (3k is determined, then ui = -Ai,kXi' Comment. Instead of the system of (37)-(40), if the system is governed by (58)
where b is now assumed known, and where
ai
is such that (59)
for all i = 0, 1,...
(60)
then the optimization problem of the above system with respect to the performance index
J=
N
L
(Xi
2
1
+ '\ul)
can be similarly carried out. Next consider the same plant equation (37), the Markov parameter (38), and the performance index (41) with noisy observation
The previous development has shown that, if the properties of the noise are such that if the values of bi can be determined exactly from yi+1 and ui, namely (61)
for k = 1 or 2, then the previously still optimal for this modified noisy is not true, however, then the above The reader is invited to rework this augmented state vector.
derived optimal control policy is observation problem. When (61) policy will no longer be optimal. example by suitably defining an
5. C.
151
EXAMPLES
DISCRETE-STATE STOCHASTIC SYSTEM
Consider a three-stage discrete-state, discrete-time, stochastic control process whose state at time i is denoted by Xi and the observation of Xi at time i is given by Yi . It is given that Xi is either a 1 or a z , and Yi is either b1 or bz . The criterion of control is to minimize the expected value of where
i
= 0, 1,2
gi and YJi are some noise processes. The control U i is assumed to take on only two values, 0 and m. The vector Si = (Xi' Yi) is assumed to be a Markov chain with known stationary transition probabilities, given U i . The initial probabilities of So are also assumed known. Since this example deals with discrete-state variables, the developments in the main body of the chapter must be modified in an obvious way to deal with probabilities rather than probability densities. Such modifications will be made without further comments. Four possible states of So are labeled as follows:
C1 = (aI' bl)' c2 = (a 2 , bl ) c3
=
C4
=
(aI' b2), (a2 , b2 )
Given 1 :(;i:(;4
where gl
=
0.45,
g2 = 0.05 g3 = 0.1,
s, = 0.4 and the stationary transition probabilities gij(U) =
Prob(~k+1
= c, I ~k = Cj, u),
k
=
0, 1
152
IV.
CONTROL OF PARTIALLY OBSERVED MARKOVIAN SYSTEMS
where p(O) = (gij(O» =
and p(m) = (gij(m» =
C
0.05 0.35 ~.~5 0 0.05 0.6
005)
0.5 0 0.55 0.45 0.1 0.05 0.3
C
05 )
5 0.4 0.05 0 0.45 0.05 ~:~5 0.55 0 0.4 0.4 0.05 0.5 0.05
it is desired to find the optimal sequence of control (uo , uI ) . Suppose a particular realization of YJi is such that Yo = bI and YI = b2 are observed at times 0 and 1. Let us now obtain an optimal U I • Let Define
or (a 1 )
Wo
=
0.45 0.45 + 0.05
=
O.
9
0.05 (a2 ) = 0.45+ 0.05 = 0.1
Wo
Let us define i = 1,2
The transition probabilities for x are given by
etc. Therefore, w
and
l(a1
I u)
=
(g31(U)
g31(U) wO(a1) + g32(U) wO(a2) wO(a1) + (g32(U) + gd u» w O(a 2)
+ g41(U»
5.
153
EXAMPLES
Thus, Y2 *(yl,
Uo
= 0) = m~n 1 L (I X2 - al I + u2) P(~2
I ~l)
Wl(X l
I 0) (
I ~l)
Wl(X I
I m) ~
Xl'~2
and Yz *(y1, Uo = m) =
~}n 1 L (I "1"2
'1
X2 - a 1 I + u2) P(~2
'2
where the summation over Xl and ranges over all possible states. Since = (Xl' b2) , defining d = I a 2 - a l I,
Y2*(Y" u«
~
and
d[(gdO) + gdO)) wl(al 10) + (g24(0) + g44(0)) w l (a2 I 0)],
= 0) = min m2[(g13(m)
I \
Xl
+ g33(m)) w1(al 10) + (g14(m) + g34(m)) w (a2 I 0)] + (d + m'l)[(gdm) + gdm)) w1(al I 0) + (g24(m) + g44(m)) w (a2 0)] 1
l
'2
I'
j
1
A similar expression for Y2 *(y\ U o = m) is obtained by replacing 0) with wl(ai I m). In performing numerical calculations, the optimal U l , when U o = 0 and (bl , b2 ) have been observed, is given by wl(a i I
and =m
if
d
<
3.6 m 2
if
d
>
3.6 m 2
Chapter V
Problem of Estimation
We have discussed in Chapters II-IV optimal Bayesian control policies for discrete-time dynamic systems under a wide variety of assumptions on plant equations, observation equations, system parameters, and on random noises. In this chapter, we consider the problems of estimation and discuss three principal methods of estimation; the least-squares, maximumlikelihood, and Bayesian estimators are discussed for both linear and nonlinear systems. Their interrelations are also indicated. Not only are the estimation (or identification) problems of interest on their own merits, but also they are of inherent interest as a part of over-all system optimization problems. We have noted that for a limited class of systems the optimal control problems naturally separate into two subproblems: one is the construction of the optimal estimators of state vectors or unknown parameters, and the other is the synthesis of optimal controllers. For this class of systems, therefore, the over-all optimal control schemes are optimal estimators followed by optimal controllers. For a much larger class of control problems, however, the over-all optimization requirements do not permit such a convenient and simplifying separation of the estimation processes from the control processes. For such a class of problems this separation of problems affords initial approximate solutions to the original control problems which may be improved upon if desired. Such approximation schemes are important in practice. As we have seen in the previous chapters the sets of equations resulting from the formulations of the over-all control system optimization problems in many cases are often too complex to permit exact analytical solutions in closed forms. We must consider approximate solutions of the op154
1. LEAST-SQUARES ESTIMATION
155
timization equations and approximate implementations of optimal control policies for the theory of optimal control to be useful in many practical problems. There are many such approximations. We have mentioned approximations arising from the separations of estimation problems from the over-all control optimal problems with possibly further approximations being introduced in estimation and/or control subproblems. In Chapter VII we discuss some of the techniques of approximation in control problems plus a few topics on approximate estimation schemes.
1. Least-Squares Estimation A.
INTRODUCTION
The method of least squares will be discussed first. One advantage of the method of least squares is that the method does not make explicit assumptions on various statistical properties of random variables involved. We need not know what the noise covariance matrices are so long as they exist in order to construct the least-squares estimators. We will show later that when noises are Gaussian and plant and observation equations are linear, the results obtained from the method of least squares with appropriate weights agree with those obtainable from maximum likelihood methods, or Bayesian methods. The subject of the least-squares estimation is an ancient one and there are many ways of introducing the subject of this section. 71a ,101 We will use the techniques of dynamic programming and invariant imbedding of R. Bellman.s'' The developments in this chapter are also based in part on a work of Ho and Lee. 72 ,100
B.
STATIC SYSTEM
There are numerous problems involving both static or dynamic systems which are amenable to the method of least squares. We will examine one static problem in some detail. The examination of this particular problem will lead us naturally to the least-squares estimation procedures for state vectors of dynamic systems. Suppose that we have an equation connecting an observable variable y to some variable or parameter x of a system or of some physical material, say (1) Hx =Y where H is a known m X n matrix, x is an n vector, and y is an m vector.
V.
156
PROBLEM OF ESTIMATION
It is desired to solve (l) for x in terms of Hand y. If the observation is exact and if H is invertible, or if the matrix H has rank n (this implies m ?: n), then one can determine x exactly from one observation y: x
(H'H)-lH'y
=
When either or both of these assumptions are not valid, say m < n and/or y contains observation noise, then one may decide to make a number of observations Yo , Yl ,... , Yk on x and choose x which satisfies (1) with Yo ,..., Yk in a sense that the sum of the squares of the errors (residuals) are minimized. * With this as a motivation, let us discuss the way of choosing an optimal x from a set of noisy observation data where Yi
=
H,»
+ (measurement noise),
i = 0, 1,...
(2)
in the sense that the chosen x minimizes lk(x)
g
k
I /I u» -
(3)
Yi lI~i
i=O
where Vi is an m X m symmetric positive matrix and acts as a weight for different Yi' This implies that we have some idea of the relative magnitudes of the random noises involved, otherwise V may be taken to be the identity matrix. lk is the criterion function of the estimation problem. The optimal x will be determined sequentially so that one need not re-solve the least-squares problems from the beginning when new observation becomes available. From (3), one readily sees that lk+l(x)
Denote by
Xk
* the
lk(x)
=
+ II Hk+lx
- Yk+lll~""
(4)
optimal x of (3), that is
u.. *)
~
(5)
lk(x)
for any x in the Euclidean n space. From (5) and (3), by considering x = Xk * Llx, we obtain the equation X k * satisfies:
+
k
Llx'
I
H/ Vi(H,x k*
i~O
-
Yi) =
°
(6)
for any Llx in the Euclidean n space. t
* There is a close relation between the concept of pseudoinverse':" of H, its sequential determination and the method of least squares. t This condition is called the orthogonality principle. 8 5 • 8 6 , 1 0 90 Note that Llx is a linear function of y".
I.
157
LEAST-SQUARES ESTIMATION
Let us write (7)
i.e., the optimal x for 1k+1 is written as a sum of Xlc *, the optimal x with the data ylc and a correction term due to the additional observation data, Ylc+1 . The correction term can be determined as fol1ows. From (4) and (7), making use of (6), k
Ik+l(x:+1) = lk(x k*)
+ Llx~+l
+I
(Hi Llxk+l)' Vi(H i Llxk+1)
i=O
k+l
(I H;' ViH i) Llxk+l
(8)
,~o
An optimal correction term, ..::::lxlc+1' minimizes (8) with respect to ..::::lxlc+1 . Since (8) is quadratic in the correction term, ..::::lX[+1 is given by
(to H/ViHi) kH
Llx:+1 = -
+
H~+l
Vk+l(Hk+lX k *
-
Ylc+1)
(9)
If the inverse of (L~~l H/ ViHi) exists, then the pseudoinverse is to be replaced by the inverse. See Appendix B of Chapter II for derivation. The pseudoinverses are discussed in Appendix II at the end of this book. Later, in discussing the maximum likelihood estimators, the weights Vi of (3) are identified with the inverses of the covariance matrices of the observation noise random variables. Thus, in many cases the Vi are positive definite and the inverse in (9) wil1 exist. Define Pk+l =
+
k+1
(Lo H;'ViH
i)
(10)
This is an (n X n) matrix. Then, from (7) and (9), (11)
158
V. Y
PROBLEM OF ESTIMATION
r-------- - --- - - - - - - - - - - ---- --,
I
NEW k+1 OBSERVATION:
TIME VARYING GAIN
I
Kk + 1
\ ..
I I
" I X:+ 1
:
UNIT DELAY
L
K
k+1
=P
: J OPTIMAL FILTER
H' V k+1 k+1 k+1
Fig. 5.1.
I I
Optimal least-squares estimator for a static system.
where
n.; =
k+l
I
o
H/ViHi =
k
I
H/V,.Hi
+ H~+1Vk+lHk+l (12)
since A++ = A for any matrix A. See Fig. 5.1 for the schematic diagram of the optimal estimator. If there are no measurement errors and if X k * is exact, then (II) reduces to Xt+l = X k * as it should. Equations (II) and (12) are the desired recursive equations for X Ic *. Using the matrix identity in Appendix D, Chapter II, (1I) can also be written as
when the indicated inverse exists. From (10), P's are seen to be (n X n) matrices and (vt+1 + H k + 1PkH!c+1) is an (m X m) matrix. By taking the additional data one by one, i.e., by taking m = I, the inversion of the matrix in (13) is reduced to the trivial operation of inverting a scalar number. Thus, generating Pic by (13) is a much easier task computationally than to invert an (n X n) matrix of (10) directly. Note the similarity of the forms of (II) and (12) with those of (132a) and (I32b) of Chapter II.
C.
LINEAR DYNAMIC SYSTEM
Having solved the least-squares estimation problem of the static system, one can quickly dispose of the least-squares estimation problems for deterministic dynamic systems. The variable x in the previous section is now regarded as the state vector of a system governed by Ai nonsingular,
i
=
0, 1,...
( 14)
1.
159
LEAST-SQUARES ESTIMATION
where Xi is the n-dimensional state vector of the system at time i. The state vector is assumed to be observed by Yi
H,x,
=
+ YJi ,
i = 0, 1,...
(I5)
where Hi is an (m X m) matrix and where »;» are measurement noises. It is desired to obtain at time k the best least-squares estimates of X k as functions of Yo ,..., Yk' ie, obtain Xi *, 0 ~ i ~ k, as a function of y i such that they minimize the criterion function
I,
k
=
L: I it», -
(16)
Yi I ~i
i~O
where Vi is now taken to be positive definite for simplicity lk can be regarded as a function of X k since (14) can be used to express Xi as a function of X k , i < k: where
k-l
TI A
i
j ,
=
0, 1,2, ..., k - 1
(I7)
J·=i
is the transition matrix of the system (14). Thus (16), which is a function of Xo ,... , Xk , can be rewritten as a function of Xk only: k
lk(xk) ~
L: I K!'Xk i=O
(I8)
Yi I ~i
where Kl ~ Hi
+ II(H k+1 Xk+1 -
(19)
Yk+1)II~k+1
Since X k *, if exact, produces the exact state variable at time (k by (14), let us write Xt+l as
+
1)
(20)
Then, from (19), quite similarly to (8), lk+l(x:+l)
=
lk(x/)
+ (LlXk+l)'A~-l
k
(L: K:'ViK/) A"k Llxk+l 1
.~O
(21)
*A
slight, obvious modification of the method works even when A's are singular.
V.
160
PROBLEM OF ESTIMATION
Thus, by minimizing (21) with respect to LlXk+l , we get the sequential scheme of generating the best least-squares estimate for the dynamic system of (14):
where P k is defined analogously to (10). Making use of the relation Kf+l = KlAj/, k+l
P;11
=
L: (K~+l)'
Vi(K~+l)
=
A~-l
i=O
k
(L: K{ ViK/) A;l + H~+l
Vlc+1Hk+l
i=O
(23)
when the indicated inverse exists. When we compare (16), (22), and (23) with (II7), (132a), and (132b) of Chapter II, respectively, we note that they are identical if we equate Vi with R;l, Pi with I', , and if we put B/s and Q/s identically equal to zero to reduce the system of (II 7) of Chapter II to that given by (14). Therefore, if the noises TJ in (15) are Gaussian random variables with mean 0 and covariance matrices R, then the best least-squares estimate of Xk , X k * is the same as fLk , the conditional mean of Xk given yk. Thus we see that fLk minimizes lk , i.e., the trace of T k • This fact is sometimes stated by saying that fLk is the minimum variance estimate" of X k • See Fig. 5.2 for the schematic diagram of the optimal estimator. Figure 5.3 shows the diagram of Fig. 5.2 rearranged in order to show more clearly the way the optimal estimates are constructed. When P does not have the inverse, P+ should be used in (23) quite analogous to
TIME VARYING GAIN Kk +1
, PREDICTION
Fig. 5.2.
Optimal least-squares estimator for a deterministic dynamical system.
1.
161
LEAST-SQUARES ESTIMATION
,-----------------------, y
I +
Y +
TIME VARYING GAIN
k+1
I
L }-----------.-'---:
CO~~~~TION
NEW OBSERVATION
I X"
I I
~
k.,
I I
_..J
Xk + 1=A k X~
I
PREDICTION
Xk +1=Ak \
yk
= Hk X + NOISE k
PREDICTED VALUE AT POINT X CORRECTION TERM AT POINT Y
SYSTEM
Fig. 5.3. The optimal least-squares estimate at time k + I consists of the updated optimal estimate at k plus a correction term generated by a new observation at time k + I.
(12) of Section B. If there are no measurement errors and if Xk * is exact, then, of course, (22) reduces to xA!'+1 = Akxk *. When measurement errors are present, AkXlc * is the best least-squares estimate of Xk+1 , given Yo ,... , Yk but not Yk+1 . From (23), P~l = A~-lPklAkl H~+1Vk+1Hk+1' From Appendix II, D, this can be rewritten as
+
Pk+l
=
AkPkA k' - AkPkAk'Hk+I(V;Jl
+ Hk+lAkPkAk'Hk+lflHk+lAkPkAk' (24)
where remarks similar to those of the previous section apply to taking the inverse of (V;;;l + Hk+1AkPkA'H~+1)' When (15) is replaced by a more general equation Yi
=
h(xi)
+ 1;
one can still discuss the problem of obtaining
(25) Xk
* such that it
minimizes
k
lk =
L (h(x i) -
i=O
Yi)'Vi(h(x i) - Yi)
(26)
lk is, however, no longer quadratic in x's. The optimal Xk+1 is still obtained by minimizing Ik-rl(Axk * LlXk+1)' numerically if necessary, although it is not possible now to give a simple recursion equation for optimal correction term LlXk+l . One may, however, expand h(x) into a Taylor series and treat the linearized problem by means of the technique of this section. Another technique is discussed in Section 1, E of generating sequentially approximate estimates of the state vector when the observation equation is given by (25). See also Section 3 of this chapter for more detail.
+
V.
162
D.
PROBLEM OF ESTIMATION
NOISY LINEAR PLANT
N ow consider that not only the observations are nOlsy but also the system plant is subject to noise: (27) (28)
where g's and YJ's are random noises. Unlike our previous problems we now have two sources of errors in estimating the state vector. In our previous problems, if Xk is known exactly, so is x k +! . Now, because of g's, this is no longer the case. The criterion function may now consist of two kinds of error terms,
I, =
k
L [II KikXk -
Yi
II~i
i=O
+ II Xi+l
-
AiXi
II},)
(29)
where Vi and T, are symmetric positive matrices. They express relative weights we attribute to the two error sources. Equation (29) implies that we have a good idea about the relative magnitudes of the various random variables involved. Thus, if we are led to believe that the effects of YJ are smaller than those of g's, then one would tend to believe the observations y more and attach a larger weight to the first term. Here we can make comments similar to those we made in connection with (22) and (23). That is, by properly identifying the weight matrices V and T with the error covariance matrices Rand Q we see that the best least-squares estimate of Xk of the system (27) and (28) is the same as the conditional mean of X k given yk. Therefore f-Lk is the minimum-variance estimate of Xk' Detailed discussions on (29) are deferred to Section 3, where the subject of the Bayesian optimal estimates are taken up. There it will become clear that the weight matrix should be taken to be the inverse of the covariance matrix of the noise involved.
E.
NONLINEAR SYSTEMS
a. Introduction
We now discuss the least-squares estimation problems of a nonlinear system. Consider the plant and the observation equation given by X k +1
=
fk(X k)
Yk
=
h,,(Xk)
(30)
+ YJk ,
k
=
0,1, ...
(31)
1.
163
LEAST-SQUARES ESTIMATION
If (30) can be solved for X k uniquely in terms of Xk+I as X/c = gk(Xk+I)' where gk is the inverse of fk , then X o , Xl"'" X!c-l can be expressed in terms of X!c as before and l!c can be regarded as a function of X k . Defining I» analogous to (3) or (18), Jk+l(X)Ctl)
Jk(gk(Xk+l))
=
+ (hk+l(Xk+l)
x Vk+l(hk+l(Xk+l)
- Yk+l)' (32)
Yk+l)
-~
The estimate xt+l , then, will be expressed as x:+ I
= fk(X k*)
+ Llxk+1
The optimal correction term is that Llxk+l which minimizes lk+1 with respect to Llxk+I . Equation (32) is, however, no longer quadratic in Llxk+I . As mentioned in Section I,C, we may linearize (30) and (31). Then we can apply the technique of Section I,C. Instead, we next derive approximately optimal least-squares estimates recursively by the method of invariant imbedding. 2o , 22 Sequential estimation problems for general nonlinear dynamic systems are discussed more fully in Section 3.
b. Approximate Solution The method is given for one-dimensional systems in order to present the basic procedure most clearly. It is a discrete-time version of the method proposed by Bellman et al. 24 Consider a plant equation
i = 0, 1,...
(33)
where J. is assumed differentiable and the observation equation i
=
0, 1,...
(34)
where 71's are noises in observation, and where hi IS assumed twice differentiable. Denote by Xi * the best least-squares estimate of Xi at time i. Namely x i * is the best estimate of the current state variable Xi of the system. We will look for the recursion formula for Xi * in the form of (35)
Define the criterion function of estimation by
J(x, N) =
N
L Vi[Yi o
h i(Xi)]2 ,
Vi
;;:::0,
°
(36)
V.
164
PROBLEM OF ESTIMATION
with the understanding that x is the state vector at time N, i.e., X N = x. The v/s are weights of the observations, which implies that some ideas of relative magnitudes of the variances of YJi are available. Otherwise one may take vi = 1, i = 0, 1,... , N. The optimal X N *, therefore, satisfies (37)
where
Ix is the
I with respect to l.,(xt+l ,N + 1) = 0
partial derivative of
x. Similarly, (38)
From (35), (37), and (38),
o=
lx(x;l+l , N lx(x N*, N
=
+
1)
+ 1) + lxx(x N*, N + l)gN(x N*) + ...
(39)
Therefore, we obtain the expression for the best correction term as gN(X N*) ~ - lx(x N*, N
To compute
Ix
and
Ixx
+ 1)/lx.,(xN*, N + 1)
(40)
in (40), note from (33) and (36) that
+ VN+l[YN+l - hN+l(x + fN(X))]2 = l(x + fN(X), N + I) = lex, N + I) + lx(x, N + l)fN(X) + t lxix, N + l)fN 2(X) +...
lex, N)
(41)
where the last line is obtained by the Taylor series. Differentiating (41) with respect to x, lx(x, N)
=
+ fN'(X)]{],,(x, N + 1) + lxx(x, N + l)fN(x) + 2vN+lb'N+l - hN+l(x + fN(X))] h;'l+l(x + fN(X))} + (terms in lxxx and higher)
[1
(42)
Substituting x N * for x in (42) and noting (37), lixN*' N
where
+ 1) ~
- lxx(x N*, N
+ l)fN(X N*)
~ 2v N+l[YN+l -
hN+l(xN+l)]h;"+l(xN+l)
(43) (44)
is the best estimate of X N +1 at time N. Since ](x, N) is nearly quadratic in the neighborhood of x N *, terms in Ixxx and higher derivatives are neglected in obtaining (43) from (42).
1.
165
LEAST-SQUARES ESTIMATION
From (40) and (43), therefore, (45)
where (46)
See Fig. 5.4 for the schematic diagram of the estimator. Note that (45) has the form identical with the one-dimensional version of the recursion formula of (22) previously obtained for linear dynamic and observation equations. There XN+ 1 is ANxN* since the plant equation is linear. One may, therefore, be tempted to interpret the term 2/lxx(xN *, N + 1) in KN+l as PN+l of (22) and infer that its inverse satisfies a linear recursion equation similar to (23). We next show that this is indeed the case when terms in lxxx and of higher derivatives of I are neglected. Define (47)
Differentiating (42) with respect to x, substituting X N * for x, and noting that lxx(x N *, N) = lxx(x'k-l , N) + (terms in lxxx and higher), Ixx(x N*, N
+ 1)[1 + f/(x N*)]2 , N) - 2v N+l[YN+l - hN+l(XN+l)] h~+JCXN+l)
Ixx(x~-l
x [1
+ fN'(XN*)] + 2v N+l{h N+l(XN+l) [1 + f/(xN*)W
(48)
r - - - -- - - - - - - - - - - - - - - - - - - --- --I Yk + l
I
,
NEW I OBSERVATIONI :
PREDICTION
X
+
k+1
*
I I X
I
I :
L
I I I
TIME VARYING GAIN Kk + 1
I
l:
k
I
I
I I
UNIT DELAY ~
x*
k+1
__ J
LEAST SQUARE ESTIMATOR
Fig. 5.4. Optimal least-squares estimator for systems with nonlinear plant and observation equations.
V.
166
PROBLEM OF ESTIMATION
Substituting this into (47),
P-i:/~l
=
[1
+ fN\X N*)]2{P,,;I + vN+l(h~+l(xN+l)[1
- vN+l[YN+l - h N-el(XN+l)] h':v+l(X N+1 ) [1
+fN'(XN*)])2
+ f~(XN*)]}
(49)
This is the desired recursion formula for P-l. Note that unless the observation equation is linear, P could become negative. Equation (47) indicates, then, that the estimate will not be optimal, suggesting that the Taylor series expansions and other approximations made in deriving (45) and (46) are no longer valid. c. Particular Case
For a linear system given by Yi
=
hix i
+ Y]i
(45), (46), and (49) reduce to the correct formulas:
where and where
From (49), it is seen that the computation for P's can be done off-line when the observation equation is linear. When the observation equation is nonlinear both xl+! and Pi+I must be generated on-line, and the signs of P's must be checked. When negative P's are encountered, the sequential estimation process must be reinitialized since the estimates that have been generated so far have, in all likelihood, drifted away from the true values significantly. It is a straightforward exercise to repeat the above for the vector case and will not be given here.
F.
LINEAR CONTROL SYSTEMS
We consider a system described by Xi+l
=
Yi =
+ Biu i H,», + Y],:,
(50)
Aix i
i = 0,1, ...
(51)
1.
167
LEAST-SQUARES ESTIMATION
where Ail exists. The control vectors u are also assumed to be directly observable. The criterion is taken again to be
I,
k
=
L II u», o
Yi
lit,
(52)
In this section, the developments essentially parallel those of Section l,e. The only difference that the newly introduced control term makes is in the predicted value oX k+1 of Xk+1 based on the observations Yo ,... , Yk . Namely, instead ofAxk * in (20) of Section l,e, we will now have (53)
Thus XZ+l can be obtained from Eq. (52) by minimizing it with respect to Llxk +1 • Since the control terms act as known biases in estimating x's it is not actually necessary to derive the expression for optimal Llxk +1 all over agam. Define i-I
Wi =
I
Xi -
(54)
¢i.j+lBjU j
j~O
and Zi =
Hiw i
+ YJi
(55)
Then, Wi satisfies w i +1 = Aiwi . This is the dynamic equation for w's. Then using the optimal estimation formula for w's with the observation
NEW OBSERVATION,
~
Yk+1
: TIME VARYING GAIN
Uk CURRENT CONTROL
Kk+1
I
I I I
•
it
X =A X +B u k+ I k k k k
'PREDICTION
\+I=AkXk+BkU k ) Y k
Fig. 5.5.
=H X + NOISE k k
SYSTEM
Optimal least-squares estimator for linear control systems.
168
V.
z's, and noting that Zk+l obtain
PROBLEM OF ESTIMATION
=
Yk+l - Hk+l(Xk+l - wk+l)' we immediately
where U O, Ui , ... , Uk are the actual controls used up to the present time and P's are as defined by (23). See Fig. 5.5 for the schematic diagram of the optimal estimator.
2. Maximum Likelihood Estimation A.
INTRODUCTION
In Section 1, equations for sequentially generating the best leastsquares estimates of the state vectors have been derived. One of the most attractive features of the method of least squares is its wide applicability to many practical situations since the method does not assume any statistical properties of the random disturbances beyond the mere existence of finite variances. We show in this and the next section that, for linear systems with Gaussian random noises, the results obtained by the method of least squares coincide with those obtainable from other well-known estimation methods, such as the maximum likelihood method or Bayesian estimation method upon proper identifications of the weights used in the leastsquares estimates. We will first discuss the maximum likelihood estimates. 39 , 73 For the applications of the maximum likelihood and other similar methods to the estimation of the plant parameters (sometimes called identification problems 72a ) rather than to the estimation of the state vectors, see the detailed discussion in Ref. 91a. See also Ref. 100. The maximum likelihood method is an important method of point estimation, i.e., method of estimating a value of the unknown parameter or variable, and is used widely in many practical problems. Roughly speaking, the maximum likelihood method chooses the value of a parameter or a variable to be estimated appearing in the probability distribution functions in such a way as to maximize the likelihood function. A likelihood function is the probability (density) function when regarded as a function of the parameter. For example, suppose y k is a set of data from which an estimate of a parameter 8 is to be constructed. Suppose also that the joint probability density of yk, given 8, p(yk I 8), is available where 8 appears as a parameter. When 8 is regarded as the variable, then p(yk I 8) is a likelihood function.
2.
169
MAXIMUM LIKELIHOOD ESTIMATION
Denoting the maximum likelihood estimate of by the relation
e by e*, e*
IS
gIven
p(yk I 8*) = max p(yk I 8) e
The intuitive reason behind this choice is that the sequence of observations having the maximum density functional value is the most likely one to be realized (observed). B.
STATIC SYSTEMS
Consider first the static system (2) of Section 1 Yi = Hix
+ YJi
(57)
where YJ/s are the m-dimensional Gaussian random vector with E(YJi)
=
0
(58a) (58b)
where R, is an m X m positive definite matrix. The probability density function for YJi is then given by (59)
where I R; I is the determinant, and the joint probability density function of YJo ,... , YJk is given by p(YJo ,..., YJk) =
k
(27T)-!mlk+ll
III R; 1-1/ 2 i=O
(60)
We could write explicitly the density function of the Gaussian random variable YJ as (59), as soon as we know its mean value and the covariance matrix Eq. (58) because these two quantities are sufficient statistics and serve as the parameters which specify the density function uniquely from a family of distribution functions for Gaussian random variables. * This is a useful fact which is used many times. Gaussian
* Gaussian random variables have many remarkable properties which are in part responsible for their widespread use in theoretical works. We discuss one such property, self-reproducing property of the distribution function forms,I28.129 in Appendix IV on sufficient statistics at the end of the book.
V.
170
PROBLEM OF ESTIMATION
random variables are said to have normal distributions. For a brief discussion on multidimensional Gaussian random variables and their distribution functions, see Appendix III at the end of this book. By changing variables from YJ's to y's in (60), p(Yo, Yl ""'Yk I x)
k
(27T)-!m(k+l)
=
TI ! n, 1-
1 2 /
i=O
exp( -Ek/2)
(61)
where
s,
k
k
i~O
0
L YJ/Ri 1YJi = L II Hix
=
- Yi IIh'
(62)
In (61), the variable to be estimated, x, appears as a parameter of the probability density expression. Namely, by interpreting x as the unknown parameter vector in the joint probability density function, (61) is exactly the likelihood function for the observation Yo '00" Yk' The maximum likelihood estimate of x, Xk *, is that x which maximizes (61) or which minimizes Ek • By comparing (62) with (3), we see immediately that Xk * obtained in (11) is the maximum likelihood estimator of x if Vi is identified with R;t, i = 0, 1, 2'00" k. Furthermore, since X k * minimizes Ek , x lc*
Ie
=
Ie
-1
(L H,'Ri 1H i) L H/RiIYi i=O
i=O
Ie
=
Pic L H/Ri\Hix + YJi) i=O
k
=
X
+ Pie L H/RiIYJi
(63)
i=O
where P k is as defined in Section 1,B. The schematic diagram of the optimal estimator is identical to Fig. 5.1 when V's are replaced by R-I'S. Since x is a constant and YJ's are Gaussian, x k * is a Gaussian random variable with E(x k * I x) = x
(64)
and cov(x lc*) = E[(x k * - x)(x,c* - x)'J k:
= E
[Pk (L
l=O
H/Ri
IYJiYJ/Ri 1H
i)
Pic] (65)
2.
MAXIMUM LIKELIHOOD ESTIMATION
171
Thus, under the stated assumptions, Xk * is the unbiased estimator of x, i.e., the mean of the estimator is the variable to be estimated. Define ek = x - X k * as the error of estimation. Then P k in Eq. (65) can be interpreted as the covariance matrix of the error of the estimation. The conditional error covariance matrices of the state vector estimates have been calculated for systems with linear plant and observation equations in Appendix C of Chapter II when the noises are all Gaussian. We denoted them by rk's. When plants are static with no noises, r k reduces to P k as it should. Thus, by identifying R,;,I by Vk of Section 1, we see that, for systems of (57), the maximum likelihood estimate minimizes its error covariance. An estimator is said to be efficient when it has this minimal variance property.i'?
C.
DYNAMIC SYSTEMS
We can similarly interpret the least-squares estimates obtained in Section 1,C for the linear dynamic systems as the maximum likelihood estimator by identifying Vi with R;:l. Namely, given (66) (67)
where YJ's are as in Section 2,B, (68)
where
s,
k
=
I
=
I II tt», -
o
YJ/RiIYJi
k
o
Yi
IIh
1
(69)
and where p(YJk) is to be maximized by choosing Xi to minimize E k subject to the constraint (66). By comparing (69) with (16) of Section 1,C, one immediately obtains
where
172
V.
PROBLEM OF ESTIMATION
or
(71)
The schematic diagram of the optimal estimator is identical to Fig. 5.2, where V's are replaced by R's. Pursuing this line of investigation further, one can discuss the best maximum-likelihood estimator of the state vector for a noisy linear system with noisy observation. Consider (72) Yi =
tt»,
+ "Ii
where A, C, and H are known matrices and random vectors with *
E(tit/)
= Q;'8i j
E(tiTJ/)
= 0
(73) ~/s
and '/s are Gaussian
(74)
The effect of ~i in Eq. (72) is to introduce uncertainty in the value of Xi+l given Xi , In other words, we now have the additional probability density function P(Xi+l I Xi) to deal with in evaluating the error covariance for the estimation error ek+1 = x k+1 - Xf+l . This adds a term CkQkCk' to the previously obtained expression for Pk+1 , i.e., instead of A"PfcAfc' in (71), one would have (75)
With (75) replacing the expression AfcPkA'fc in (71), (70) is now a valid expression for the maximum likelihood estimator for the system state vector of (72). This point will be made much more explicit when the various joint conditional probability densities are evaluated. This is done in the next section where optimal Bayesian estimation of the state vector is discussed.
* The
reader is invited to consider the case with E(t(TJ/)
=
Sia,; as an exercise.
3.
OPTIMAL BAYESIAN ESTIMATION
173
3. Optimal Bayesian Estimation A.
BEST ESTIMATORS
In Chapters II-IV we have seen that the probability density function p(xi I yi), or its generalized form P(fLi I vi), plays an essential role in constructing optimal Bayesian control policies. We have also seen that, for a certain class of problems with quadratic criterion functions, we can construct optimal control policies if we know only E(xi 1 yi) without knowing P(xi I yi). E(x i I yi) is far easier to deal with than p(xi I yi) especially numerically since the latter is a function. It requires, in principle, a table with an infinite number of entries to represent a function numerically. Thus, we may decide to use E(xi ! yi) instead of p(xi yi) at least for this class of problems. Therefore, for estimation problems with quadratic criterion functions the conditional expectation E(x i yi) is said to be the best (optimal) estimate of Xi given yi. There are other classes of problems where E(x i I yi) is not the best estimate. The optimal estimate depends on the types of criterion functions employed. If the criterion function is such that L(Xi , z) = I Xi - z I, then the best estimate is given by choosing z to be the median of p(x i ! yi).22 The maximum likelihood estimate is the best estimate in the sense that it maximizes the P(Xi ! yi). For example, Cox investigated the estimation method closely related to the maximum likelihood method. 4o , 41 He uses the modes of the a posteriori probability distributions as the best estimates. Given p(x i I yi), one has already three estimates which are best, depending on the criteria used. These estimates are, in general, different. When the density functions, however, are symmetric and unimodal, for example, Gaussian, these three estimates are the same. In this section we will discuss optimal Bayesian estimates. These are the best estimates in the sense described above (such as the conditional mean or median, etc., depending on the criteria used for problems), where p(x i I yi) is computed from a priori probability density functions according to the Bayes rule. The conditional means are taken to be the best estimates throughout this section. Namely we use quadratic criterion functions. The procedure to obtain optimal Bayesian estimates should be clear by now since we have been computing p(x.i I yi) all along by the Bayes rule. We have also obtained sufficient statistics for p(x i I yi) for linear systems with Gaussian random noises in Section 3 of Chapter II. Hence, we will only briefly summarize the steps involved to obtain best Bayesian estimates. 1
1
174 B.
V.
PROBLEM OF ESTIMATION
STATIC SYSTEM
Consider the problem of estimating x, given y, where y = G(x, YJ)
(76)
and where x is an n-dimensional state vector, y is the observation vector (m-dimensional), and Y) is the noise vector (q-dimensional). Let us now compute p(x y). Assume the probability density function p(x, Y)) is given. This will be the case, for example, if p(x) and p(Y)) are given and x and Y) are independent. Then, from (76) and p(x, Y)), p(y) and p(y I x) are obtained by the Bayes rule, 1
p(x I y) = p(xJJ(y I x) p(y)
If (76) is invertible for
(77)
Y),
where g-l is the inverse of G, then p(x, y) = p(x, g-l(X, y))1 ] I
where
J is
the Jacobian
] = det
(~)
and p(x I y) =
p(:~_!_y)
(78)
p(y)
Thus, the a posteriori density p(x I y) is obtained from either (77) or (78) and its mean, median, or mode can be computed. As an example, consider a linear system given by y = Hx
+ YJ
(79)
where x is an n-vector, H is an m X n matrix, and where Y) is an m vector. Assume that x and Y) are independent and that p(x, Y)) = Pl(X), P2(Y)), where Pl(X) is Gaussian with E(x) =
x,
cov(x)
=
Po
(80)
3.
OPTIMAL BAYESIAN ESTIMATION
175
and where P2('ry) is a Gaussian with E(YJ)
=
0,
(81)
From (79)-(81), y is seen to be a Gaussian random vector with E(y)
=
cov(y) = HPoH'
He,
+R
(82)
Hence,
Since the Jacobian is 1, p(x,y) =p(x,YJ) =P1(X)P2(YJ) = P1(X) P2(y - Hx) and p(x I y) .
=
P1(X) P2(y - Hx) Pa(Y)
=
C exp( -
t {(x
- x)' P(j1(X - x)
- (y - Hx)'CR
+ HPoH')-l(y -
where 1
C
=
+ (y
(27T)n(2·
- Hx)' R-1(y - Hx)
Hx)})
(83)
[ HPoH' + R [1(2 I Po 11 ( 2 [ R 1 ( 2 1
By completing squares in (83) (See Appendix A at the end of this chapter), p(x Iy)
=
C exp( -
where x*
=
x
t (x
- x*)' P-1(X - x*))
+ PH'R-1(y -
Hx)
(84) (85)
and (86)
or, from the matrix identity in Appendix D at the end of the Chapter II,
Since p(x I y) is symmetric and unimodal about x*, all three best estimates, i.e., the conditional mean, mediam, and mode of p(x I y), are given by x*.
176
V.
PROBLEM OF ESTIMATION
Incidentally, we have just rederived in the example the WienerKalman filter for a single-stage estimation problem.
C.
DYNAMIC SYSTEM
We next discuss estimation problems for general dynamic systems. Consider a dynamic system described by (87)
and observed by (88)
where the subscript i refers to the ith time instant and ti is the noise in the dynamics of the system. Assume that p(x o I Yo) is computable from the a priori information on (87) and (88) and that we can obtain probability density functions P(Xi+l I xi) and P(Yi I Xi) from those of t i and YJi and t i and YJi are independent. As in Chapters II-IV, we can generate P(Xi I y i ) recursively. By the Bayes rule, (89)
Also, by applying the chain rule to P(Xk , xk+l , Yk+l I y k), we write
Integrating the above with respect to
Xk ,
From Eqs. (89) and (90), (91)
is the recursion formula for p(x i I yi), i = 0, 1,.... Using (91), various optimal estimates of Xk+l can now be obtained by specifying the estimation criterion functions.
3.
OPTIMAL BAYESIAN ESTIMATION
177
If the assumptions of independence of (;i and YJi are removed, then one must deal with P(Xk+l I x k) and P(Y1c+l I x k+1) instead of p(xk+1 I x k) and P(Yk+l I xk+l)' Then
and p( Xk+l I yk+l)
=
p( Xk+l , yk+1) p(yk+l) p(x k I yk) p(xk+l I xk, yk) P(Yk+l I xk+l, y k) P(Yk+l I yk)
f D.
p(x k I yk)p(Xk+l I xk,ylc)P(YIc+l I Xk+l,y lc) p(xkl ylc)p(x lc+;\ x k , ylc)P(YIc+:T Xk+l, y lc) d x k+1
(92)
EXAMPLE. SEQUENTIAL (KALMAN) FILTER
As an illustration of the previous section, let us rederive the equation for the best estimate Xk* of Xk when the system is linear and the random noises are Gaussian:
+ Citi
Xi+l
=
Aix i
Yi
=
Hix, +1)i,
i
=
0, 1,...
(93)
with the assumption that f;i and YJi are independent white Gaussian random vectors such that E(ti)
E(1)i)
=
0,
=
QiOi;
COV(1)i ,1);) =
R/)i;
=
COV(ti,
t;)
COV(;i , 1);)
= 0,
all i
all i andj
The details of the derivation are found in Section 3 and Appendix C of Chapter II. The best estimates with correlated noises are also found in Section 3 of Chapter II. The probability density function p(xi I y i ) has been shown to be Gaussian, hence has the sufficient statistics
v.
178
PROBLEM OF ESTIMATION
and
Now P(Xlc+1 I ylc)
IS
Gaussian with E(x k+1 I yk)
=
COV(Xk+1 I yk) =
Akx k* ~ xk+l
r,
where
Since P(Ylc+1 I ylc) is Gaussian with
from (89) and (93),
where (95) (96)
K k+1 = rk+1H~+1R;}l
r;;l
or
=
P;tl
+ H~+1R;~lHk+1
where (98)
To = covjx, I Yo] is given from the a priori information on the system. Alternate expressions for T's and K's have been obtained in Section 4 of Chapter II as
tc, = r i+1
PiH/[HiPiH/
+ R;j-l
= [1 - K i+1Hi+l]Pi+l[1 - Ki+1Hi+1J'
(96)'
+ Ki+1Ri+1K;+l
(97)'
Symbol M i was used there instead of Pi with e's replaced by identity matrices.
3.
OPTIMAL BAYESIAN ESTIMATION
179
It is important to realize that even if the random noises are not Gaussian, the best linear estimate of Xi given yi, i.e., the estimate which is linear in the observations Yo, Y1 ,.", Yi is given by (95). The estimate x i * is the orthogonal projection of Xi on the linear manifold spanned by y i. The best estimate with a quadratic criterion function, E(x i I yi), is the best linear estimate when alI variables, x's and y's, are Gaussian random variables, adding further importance to this estimate.
E.
CONTROL SYSTEMS
Let us modify the example discussed in the last section by adding a control term to the plant equation (93): X i+ 1 =
Yi =
+ Biu, + Gig i Hix i + 7Ji Aix,:
(99) (100)
where Ui is the closed-loop control vector, i.e., Ui is some function of Under the same set of assumptions, we can derive the expression for Xt+1 in terms of Xk * and Uk . As before, by computing p(xk+l I yk+l), one can obtain xt+l in a straightforward manner. The only difference now is that
1'i .
whereas E(xk+l I yk) = Akxk * in the last section. Thus, we obtain
where
The .schematic diagram of the optimal Bayesian estimation is identical to Fig. 5.5, the expression for the gain being the only difference. F. KALMAN FILTER FOR LINEARIZED NONLINEAR SYSTEMS
a. Introduction
Optimal Bayesian filters for general dynamic systems have been discussed in Section 3,C. The special case is discussed in Section 3,D, where the plant and the observation equations are both linear and when the conditional means are taken to be the best estimates.
V.
180
PROBLEM OF ESTIMATION
In this section, we will construct Kalman filters for nonlinear systems by linearizing the plant and the observation equations.s" I t should be noted,. however, that the procedure of linearizing the plant and the observation equations first and constructing an optimal linear filter for the linearized system does not necessarily yield a better approximation than that of obtaining the exact or approximate conditional probability densities for the nonlinear system, as was done in Section 3,C, and then approximating it further if desired by appropriate linearization for ease of handling and/or implementation. This latter approach will be developed in Section 3,G.
b. Construction of Filter In order to construct an approximate filter for a nonlinear system we linearize a nonlinear plant and observation equations xCi
+ 1) =
F(x(i), t(i), i)
y(i)
G(x(i), "f)(i), i)
=
(101)
by a Taylor series expansion about a nominal sequence of state vectors = 1](i) = 0, retaining only terms up to the first order. The subscripts are now used to denote components of vectors. The time index is carried as the argument:
x's and about t(i)
xCi
+ 1) = F(x(i), 0, i) + A(i) (x(i) y(i) = G(x(i), 0, i) + H(i) (x(i) -
+ v(i) xCi)) + wei)
xCi))
(102)
where xCi) is the nominal xCi) and where A(i) is the Jacobian matrix of 1" with respect to x. Its (i, j)th element is given by
which is the partial derivative of the jth component of F by the kth component of x and the partial derivative is evaluated at xCi) = xCi), t(i) = 'Y)(i) = 0, and i. Similarly, we compute H(i)
g [(
oG) ]
ox
ik (x(;) .0. i)
,( ') = "L., of/x(i), 0, i) OC
V,t
Wj
k
Sk
C (') Sk t
.) _ " oGj(x)i), 0, i) (') (t - L., 0 "f)k t
k
"f)k
3.
181
OPTIMAL BAYESIAN ESTIMATION
where vj(i) is the jth component of the vector v(i). Assume
=
E(v(i»
E(w(i»
= 0
E(v(i) v'(j» = V(i)
0ij
E(w(i) w'(j» = W(i)
s;
E(v(i) w'(j» = 0
Define a(i) = x(i) - x(i) f3(i
+ 1) =
x(i
+ 1) -
(103)
F(x(i), 0, i)
y(i) = y(i) - G(x(i), 0, i)
Then f3(i
+ 1) = y(i)
=
+ v(i) H(i) a(i) + w(i) A(i) a(i)
(104)
are the linearized plant and observation equations. Using symbols defined as a*(i) = E(a(i) I y(O),..., y(i»
~(i
+ 1) =
E((3(i
+ 1) I y(O),
(3*(i + 1) = E((3(i + 1) I y(O),
~(i
+ 1) =
, y(i» , y(i + 1»
A(i) a*(i)
the Kalman filter for the system of (104) is governed by
(3*(i +1)
=
~(i
+ 1) + K(i + l)[y(i + 1) - H(i + 1)~(i
+ 1)] (105)
Then from (103) the approximately optimal estimate is given by x*(i + 1) = F(x(i), 0, i) + (3*(i + 1)
(106)
and a*(i) = x*(i) - x(i)
The expressions for error-covariance matrices can be similarly obtained. If the nominal x's are chosen to satisfy the nonlinear plant equation with no noise, then a(i) and (3(i) coincide. IfF and G are linearized about x*(i), instead of about the nominal x, then a*(i) = 0
since a(i) = x(i) - x*(i)
V.
182
PROBLEM OF ESTIMATION
Hence ~(i
+ 1) =
0
[3*(i
+ 1) =
K(i
and Define x(i
+ 1) =
F(x*(i), 0, i)
x*(i
+ 1) =
x(i
Then
+ 1) y(i + 1)
(107) (108)
+ 1) + K(i + 1) [y(i + 1) -
G(x(i
+ 1),0, i)]
(109)
is the Kalman filter for the linearized system, where the gain is computed recursively from
ru + 1) =
+ 1) H(i + 1)] M(i + 1) [1 + K(i + 1) R(i + 1) K'(i + 1)
[1 - K(i
K(i
+ 1) H(i + 1)]' (110)
where
+ 1) = K(i + 1) =
M(i
+ V(i) M(i + 1) H'(i + 1) [H(i + 1) M(i + 1) H'(i + 1) + W(i + 1)]-1 A(i) T(i) A'(i)
c. Numerical Example
A numerical example is now presented. The example is computed by R. E. Orr 109a and is included here with his consent. In this example the arguments refer to the time instants. The subscripts refer to the components of vectors. Consider a point mass accelerated in a vacuum by the reaction force of expelled mass. Then the acceleration, a(t), is given by a(t) = F(t) = m(t)
_
(111)
assuming a constant velocity, C, and flow rate, m, of expelled mass; mo is the mass at an arbitrary time to = O. For simplicity, assume the acceleration to be directly away from a point at which range x and/or range rate x are being observed. Variations in the direction of acceleration, as well as motion of the observation point, can be taken into account, but will not be considered here since they do not fundamentally affect the problem. It is desired to estimate x, x, and x = a(t) from observations of x and/or x contaminated by white noise. Because a(t) is nonlinear, a linear approximation method of the previous section is used and the optimal Kalman filter is obtained for the linearized problem.
3.
183
OPTIMAL BAYESIAN ESTIMATION
Letting C, m, and rno vary by small amounts,
oa =
(C + oC)(m + om) ( ) mo + omo - (m + om)t - a t [memo
mt)] oc + [Cmo] om - [Cm] om o + [mo - mt] oc om [mo - mtp + [mo - mt] omo - [(mo ~ mt)t]om
~
This expression may be approximated by a linear equation
if the perturbations are small with respect to the parameters C, m, and rno · The resulting linearized system may be represented diagrammatically as in Fig. 5.6. The uncertainties in C and m are assumed to be represented by white noise passed through a first-order filter, while orn o is a bias. The "nominal" value of a is independently determined as a known function of time. The state vector x is taken to be
(113)
The observation equation is given by y
=
Hx
+ Yj
(114)
a
Fig. 5.6. Schematic diagram of a linearized system for the nonlinear estimation problem ( 112).
184
V.
PROBLEM OF ESTIMATION
or, in its component form, 0 0 1 0 0 0 0 0 0 0
[f] ~
0 0 0 0 0
m:~J m +
(114)'
when x and x are observed. If either quantity were to cease being observed, the corresponding element of the H matrix must be put to zero. The plant equation for x of (113) is given by x(k
+ 1) =
rj>(k) x(k)
+ ;(k)
(lI5)
where 1
o
rj>(k) =
1
0 0
[
o
AC1(k) A'C1(k) e:"
0
o o
0 o 0
and where A
=
B
=
A'
BC2(k) B'C2(k)
e-
a
e-
b
e- b
0
+a +1 a2
+b+1 b2
1 - e- a
=----
a 1 - e" B' = - - b
and
The time-varying gams are considered constant during the sampling intervals, i.e.,
3.
185
OPTIMAL BAYESIAN ESTIMATION
If the change in oa is sufficiently small during a sampling interval, a sample-and-hold can be introduced at Sa to further simplify the 1> matrix:
1>(k)
~1
=
tC1(k) C1(k) e- a 0 0
o
01 0 0 o 0 o 0
tC'(k]
tC2(k) C2(k) 0 e- b 0
C3(k) 0 0 I
(116)
Noise covariance matrices are given by
~ [a'~
R(k)
~ ~~
Q(k)
The initial value of the
0 0
r
0 0 0 0 0
0 0 0 0 0
0 0 a 32 0 0
0 0 0 a4 2
~]
0
~ ~ ~,
G
0 0
2
xo
0 0
0 0 0 a~~o 0
x(k
+ I)
2
0 0 0
asco
Letting 1>(k) x*(k)
=
(117)
~J
0
matrix is given by a xo
F(O)
0 0 0
0 0 0 0 0
0
a 22
( 118)
JJ
(119)
or, in terms of its components, x(k
+ I)
=
+ 1) = oC(k + 1) = om(k + 1) = omo(k + 1) = x(k
x*(k)
+ x*(k) + tC
+ tC
3(k)
x*(k)
+C
oC*(k)
+ tC2(k) om*(k)
omo*(k)
e:" oC*(k)
e" om*(k) omo*(k)
1(k)
1(k)
oC*(k)
+C
2(k)
om*(k)
+C
3(k)
om o*(k)
V.
186
PROBLEM OF ESTIMATION
The sequential estimation equations are given by x*(k
+ 1) =
x*(k -l- 1)
=
+ 1) ~~ Sm*(k + 1) = Smo*(k + 1) = SC*(k
+ 1) + Kn(Yl - ,Q(k + 1))+ K 12(Yz - &(k-t 1)) &(k + 1) + K 2l(Yl - x(k -1- 1)) K 22(Y2 - &(k -I 1)) SC(k + 1) + K:n(Yl - x(k;- 1)) + K 32(Y2 -- &(k + 1») Sih(k + 1) + K 4l (Yl - x(k + 1)) + K 42(Y2 - &(k + 1)) MiO(k + 1) + Kobl - x(k + 1)) + K 52(Y2 - :f:(k+ 1)) x(k
( 120)
The error covariance of a is not computed by T's because a is not contained in x. Since this covariance is of interest, it is computed as follows. From (112),
20
40 TIME. SECONDS
Fig. 5.7.
Variances of the estimates of the linearized system versus time.
3.
187
OPTIMAL BAYESIAN ESTIMATION
or Ga
Z
=
c 1 Zr3 3
+C
+ 2C
1CZr34 -
2c ZC3 r45
Zr 2 44 -
2C 1C3r35 + C zr 3
55
Figures 5.7 and 5.8 show the error covariances T i j and G a 2 versus time when only x is observed (H 22 = 0, R u = 100) for the parameter values given in the accompanying tabulation:
Parameter
Time constant of filter t
a
C = 10 4 ft/sec = 10 slugs/sec rno = 2000 slugs
10.0 sec 1.0 sec co (bias)
«c = 10. a,;, = 0.01 a mo = 200.
til
10
2
4
a. XIO m
0.1
0.01
20
40
60
80
100
TIME, SECONDS
Fig. 5.8.
Variances of the estimates of the linearized system versus time.
V.
188
PROBLEM OF ESTIMATION
where t's are the time constants of the filters used to represent errors in C, m, and mo . Note the initial rapid decrease in the error covariances as the first measurements improve the estimates. Since mo is a constant, the error covariance of its estimate decreases monotonically as long as measurements are made. C and m are constants with additive stochastic variations, so the estimation errors reach constant values; oX and x, as well as a, are time-varying quantities with additive time-varying stochastic components proportional to l/m(t) and 1/m(t)2. After an initial decrease, the error covariances of these quantities begin to increase as the mass becomes smaller. If the process is allowed to continue indefinitely, these error covariances will go to infinity as met) ---+ O. The filter gains K i l are shown in Figs. 5.9 and 5.10. Similar reasoning 8
7
6
5
4
:3
2
Fig. 5.9.
Time-varying gains of the Kalman filter for the linearized system.
3.
OPTIMAL BAYESIAN ESTIMATION
189
7
6
5
4
3
2
40
20
60
TIME. SECONDS
Fig. 5.10.
Time-varying gains of the Kalman filter for the linearized system.
to the above applies. The components of the error covanances T li 2 ~ i ~ 5, are shown in Figs. 5.11 and 5.12.
G.
,
ApPROXIMATE BAYESIAN ESTIMATION OF NONLINEAR SYSTEMS
a. Estimation Based on the First Two Moments
In this section another approximate method of obtaining the conditional means of a posteriori probability densities is carried out for a scalar nonlinear system given by Xi+l =
under the assumptions that
+ ~i h(Xi) + Y"fi
f(x i )
(I21)
Yi
=
Xo
is Gaussian with mean a and vanance
190
V.
PROBLEM OF ESTIMATION
20
40
60
80
100
TIME • SECONDS
Fig. 5.11.
The off-diagonal elements of the error-covariance matrices versus time.
that f and h are twice differentiable, and that random variables independent of X o with
0
2,
E(~i)
E(~i~i)
e. and TJi are Gaussian
= £(7)i) = 0 = qi2Dii
The method similar to Section 3,E of Chapter II can be used when t and TJ are correlated. The following development is given for the uncorrelated t and YJ noises. Unlike Section F, we do not linearize (121) at the beginning.
3.
191
OPTIMAL BAYESIAN ESTIMATION
100
10
20
40
60
80
100
TIME. SECONDS
Fig. 5.12.
The off-diagonal elements of the error-covariance matrices versus time.
The recursion equation for the conditional probability density P(Xi I yi) is given by
. 1)
P(Xi+l I Y'+
=
Sp(Xi I yi) P(Xi+ 1 \ xJ P(Yi+l I Xi+1) dx, ----0----------numerator] dXi + 1
J[
(122)
The initial conditional density p(x o I Yo) is given by (123)
where
192
V.
PROBLEM OF ESTIMATION
and
The approximation consists in regarding p(x i I yi) as approximately Gaussian. In other words, only the first term in the asymptotic expansion of P(Xi I yi), such as the Gram-Charlier or Edgeworth series.P is retained in the approximation. Expanding h(xo) about X o = a, (Yo - h(XO))2
=
(Yo - h(a))2 - 2(yo - h(a)) h'(a) (x o - a) + [h'(a)2 - (Yo - h(a)) h"(a)](xo - a)2
+ o«xo -
a)2)
An approximate expression for (123) is given by (124)
provided
1/02 + [h'(a)2 - (Yo - h(a)) h"(a)]/r0 2 > 0
where
(125a)
1/00 2 = 1/a2 + [h'(a)2 - (Yo - h(a)) h"(a)]/r02
(125b)
=
a
+
(Yo - h(a)) h'(a)/r02 (Yo _ h"-'c'(a----')'-h)~"c-(a~)]-/ro- c-2
+ [h'(a)2 _
P»
1/02
Assume that (126)
where fLi and (Ji are generally functions of yi. From (121) we have (127)
and
In order to carry out the integration with respect to of (122), expand j'(e.] about fLi as a Taylor series:
Xi
in the numerator
3.
193
OPTIMAL BAYESIAN ESTIMATION
From (126), (127), and (128),
J
p(Xi Iyi) P(Xi+l I Xi) dx, ~ canst exp( -
t
Ei )
where
To carry out the integral with respect to Xi+1 in the denominator of (122), expand h(x i +1 ) into a Taylor series about (131 )
to obtain
(132)
The exponent in (128) can be written as [Yi+1 - h(Xi+1)]2 = [Yi+1 - h(Xi+l)J2 - 2(Yi+l - h(Xi+l» h'(xi+l) (xi +1 - Xi+1)
+ [h'(Xi+l) + O[(Xi+1 -
(Yi+1 - h(Xi+l» h"(Xi+1)] (Xi+l - Xi+1)2 Xi+1)2]
(133)
Combining (130) and (133), after the denominator in (122) is computed, we obtain P(Xi+l I yi+l)
~
canst exp (- (Xi +12--: fLi+1)2 ) a i +1
(134)
where (135a)
where
xi+1
is given by (13 I), and where
provided l/a7+1 turns out positive.
194
V.
PROBLEM OF ESTIMATION.
From (135b), the variance is seen to depend on the observation data. This effect is not observed if the nonlinear equations are linearized first and the Kalman filter is constructed for the linearized systems. Comments similar to those in Section 1,E can be made about negative variances which may occur in using this method. In short, a negative variance, when it occurs, is an indication that some of the assumptions of the Section are no longer valid and that the sequential estimation procedure of (135a) should be restarted.
b. Higher-Order Estimator In order to increase the accuracy of the filter, it may be necessary to include the effects of the moments higher than the second. Namely, instead of approximating p(xi I yi) by N(fJ-i' ui 2 ) , it may be necessary to include the effect of the third moment, i.e., the skewnesss? of the probability distribution, in approximating I yi) by
«».
p(X; I yi) ~ const
[1 _ Yi6 )3 (Xi -UiJ!:.i.) _ (Xi -a,J!:.i.)3 jIJ exp (_
(Xi - !-Li)2) 2ai2
where
is the index of the skewness of the distribution. Then proceeding quite analogously to the previous section, assuming now that f and h are differentiable at least three times, and retaining more terms in Taylor series expansions of f and h, a set of recursion equations for fJ-i , Ui , and Yi are obtained. Then fJ-i+l is given as a function of fJ-i , Ui , Yi , and Yi+l among others. Of course, this approach can be extended to include the effects of fourth or higher moments to increase the accuracy of approximation at the expense of added complexities of computation. The method suggested in this section and some variations have been examined in some detail and performances of these nonlinear filters, as well as that of the Wiener-Kalman 'filter applied to the linearized system (method of Section F), have been investigated numerically. See Ph. D. Thesis by H. Sorensens for computational results. The preliminary numerical experiments seem to indicate that the method of Section F is quite good if the nominal xCi), about which the system equations are linearized, is taken to be the current optimal estimate x*(i), and that it is usually necessary to include more than the first two moments to achieve a comparable or a slightly better result than that obtainable by the method of Section F.
195
APPENDIX. COMPLETION OF SQUARES
Another approach would be to use a density function from a class of probability density functions specified by a finite number of parameters, such as a Pearson type density function.i" to approximate p(x i I yi). This approach may be more advantageous since the approximating functions are always true probability density functions. For a more general discussion of nonlinear estimation problems, see, for example, Balakrishnan.l" See also Ref. 46.
Appendix.
Completion of Squares
The procedure of completing squares of expressions involving matrices and vectors has already been discussed in detail in Appendixes C and D, Chapter II. Since Appendix C contains many other topics, we excerpt here the portion on completing the matrices for easy reference. With regard to (83), define E = (x - x)'P~\x
- x)
- (y ~ Hx)'(R
+ (y
- Hx)'R-\y - Hx)
+ HPoH'tl(y -
Hx)
Note that R and Po are symmetric since they are covariance matrices. Collecting terms which are quadratic in x, linear in x, and independent of x, we have E
=
(x - X)'P;;I(X - x)
+ [y
-- Hx - H(x - X)]'R-l[y - EIx- H(x - x)]
- (y - Hx)(R-l - R-IHPH'R-l)(y -- Hx)
+ H'R-IH)(x - x) - 2(x -- x)'H'R-l(y ExnR-1 - (R + HPoH')-l(y- Hx)
= (x .- x)'(P;;I
+ (y
-
Define p-l = P;;I
Hx)
+ H'R-IH
From the matrix identity in Appendix D, Chapter II, (R
+ HPoH')-l = u-: -
R-IH(P;;l+ H'R-IH)-lH'R-l
= R-l - R-IHPH'R-l or R:' - (R
+ HPoH')-l
= R-IHPH'R-l
196
V.
PROBLEM OF ESTIMATION
Therefore, E = (x - .i),P-l(X - i) - 2(x - i),H'R-l(y - Hi)
+ [H'R-l(y -
Hx)]'PH'R-l(y - Hi)
= (x - X*),p-l(X - X*) where x* = i
+ PH'R-l(y -
Hi)
Chapter VI
Convergence Questions in Bayesian Optimization Problems
1. Introduction We have adopted the Bayesian approach in this book in studying the optimal control problems. The Bayesian approach for adaptive control problems is characterized by the assumption that there exists an a priori distribution of the unknown parameter vector e, e E e, where e is a known parameter space, which specifies the system parameter and/or the distribution functions of the random variables affecting the system. This a priori distribution is updated by the Bayes rule to obtain the a posteriori distribution of e when a new set of information (observation data) becomes available. The Bayesian approach is widely used not only for optimal control problems but also for many other situations involving estimation of parameters in communication theory, statical theory of learning, operations research, and so on. Because of the widespread use and the importance of the Bayesian approach in applications, it is important to investigate several related questions on the Bayesian approach such as: (1) When can the search for optimal policies be restricted to the class of Bayesian policies? (2) How do we choose a priori distribution for e? (3) When do the sequences of a posteriori distributions computed by the Bayes rule converge to the true distributions? 197
198
VI.
CONVERGENCE QUESTIONS
(4) Do two different choices of a priori distribution converge necessarily eventually to the same distribution? (5) What is the effect of a particular choice of a priori distribution on the performance index of the system? Answers to these questions will help us see whether the Bayesian approach is reasonable or applicable in any given problem. The first question really asks for the conditions for the completeness of a class of Bayesian policies, i.e., the conditions under which optimal policies are guaranteed to be Bayesian, in a given problem. This question has been answered in Chapters II-IV and is not pursued any further here. See also Sworder.l 33 ,136 The second question has been a source of various criticisms against the Bayesian approach. So far no rational procedure has been put forth to choose a priori distributions. Spragins pointed out the desirableness of choosing self-reproducing-type a priori distribution functions. 128 It is sometimes suggested to choose a priori distributions which maximize the entropy, subject to some constraints such as fixed means of a priori distributions.?? In case the same decision problems occur repeatedly and independently with the same p(y I 8) and the same Po(8), the empirical Bayes approach may be used.P? ,118 ,138 In this approach no specific a priori probability density Po(8) is assumed. Under certain conditions, the empirical Bayes control policies can be shown to approach asymptotically to the optimal but unknown Bayes control policies, which would be used if Po(8) is known. Question (2) is intimately connected with Questions (3), (4), and (5), because if it is true that different a priori distributions do converge to the same true distributions eventually then the arbitrariness associated with the initial choice of distribution functions becomes largely immaterial except for questions related to transient behaviors such as Question (5). Investigation of Question (5) will give, for example, upper and/or lower bounds on additional costs of control or losses in the performance indexes due to "ignorance" on the part of controllers. Needless to say, such bounds are useful in many engineering problems. 1 2 ,134 In Section 1, Chapter VII, we investigate some problems related to Question (5). For a class of linear control systems with quadratic criterion function we derive expressions giving approximate costs of control for parameter adaptive systems, as functions of control costs for the related stochastic systems, of a priori probability distributions and of the state vectors. Questions (3) and (4), which are questions of convergence of a
2.
CONVERGENCE QUESTIONS: A SIMPLE CASE
199
posteriori distributions, are discussed in Section 4 of this chapter. It turns out that answers to such convergence questions are already available in the existing mathematical and statistical literature in various forms as martingale convergence theorems. 4 7a ,10 2 After an elementary discussion of convergence in Section 2, more precise statements of Questions (3) and (4) are given and some pertinent theorems are collected and stated in forms convenient for our purposes in Section 4. Since stochastic processes known as martingales are unfamiliar to most control engineers, we digress from the convergence questions and discuss martingales in Section 3. The problem of observability of stochastic and adaptive systems is intimately related to the convergence questions. This point is only touched upon in the last section of this chapter.
2. Convergence Questions: A Simple Case Although the questions of convergence of a posteriori probability distributions will be treated generally in the next two sections, it is instructive to consider a simple example using heuristic arguments and elementary techniques. The following discussions of this example is based on Fukao.vA particular problem considered in this section is the problem where the parameter space e contains a finite number of elements, i.e., the value of the unknown parameter B is assumed to be one of Bi , 1 ~ i ~ s, and where the observations yare discrete and assumed independent for each in e. Suppose an a priori probability for is given by
e
e
ZO,i = Pr[B = 8i ] ,
(1)
The a priori probability Zo = (ZO,l , ... , zo,s) is transformed by Bayes' rule into the a posteriori probability given an observation y. After Yo, Y1 , , Yn have been observed, the a posteriori probability, zn+l = (Zn+l,l , , zn+l,s)' is given by Z _ p(yn I 8i ) ZO,i (2) n+1,i - '"'. p(yn I 8) z 0.). L.)~l ) where by the independence assumption of y's
p(yn I 8i ) =
n
I1 p(y; I 8
i),
;~o
°
(3)
Note that if Zi.k = 0 at some i, then Zj,k = for all j ;:;: i, that if 1 at some i, then Zj,k = I for all j;:;: i and consequently Zj, I = 0, .i ;:;: i, 1:# k
Zi,k =
VI.
200
CONVERGENCE QUESTIONS
For example, given i, if «s; I Bi ) = 0 for some m, then Z""+l,i = 0 from (2) and zn,i = 0 for n ::? m 1. Assume now that y can take only a finite number of values a 1 , ... , a f each time, with probability
+
Prey
k
= ail = Pi ,
where p(y
LPi
1
1,
=
i
~
~
I
(4)
1
a, I OJ)
=
>0
for all 1 :::;;; i :::;;; I and 1 :::;;; j :::;;; s. From the comments following (3), one can take ZO,i > 0, 1 :::;;; i :::;;; s, without loss of generality. After a large number, N, of independent observations, ai will be observed approximately NPi times with probability close to one. Then
.
ZN+l.i
=:= '"'. L..k~l
OJ~l
[p(y
=
OfJ~l [p(y
aj =
I 0i)]NPi ZO.i
a-IO )]NPi J k
Z
o,k
Equation (5) will now be used to discuss the limiting behavior of as N tends to infinity.
(a) If
for at least one j, then clearly, from (5), lim
N~oo
ZN - = ,t
0
(b) If, on the other hand,
TI [P(y
k~lP(Y
=
=
a k I OJ) ak
I 0i)
]Pk <
1
for all j :j::. i, then lim
N"")rf.)
ZN - = ,1.
1
(c) We may have an intermediate situation where for some
i
«i
ZN,i
2.
20I
CONVERGENCE QUESTIONS: A SIMPLE CASE
and for all other
I
then , I 1m
N-.oo
ZN , = .'
1
1
+ ZO.i/ZO.;
Zo,;
+ ZO.i
(d) If for all j
E
J
where] is some subset of {I, 2,... , I} containing i and for all j ¢=
J
then lim
.
Z
N-.oo
N.'
= ~-;-
LiE} zO.i
As can be seen from these four special cases, the ratio R, (6)
plays an important role in deciding the limiting behaviors of tends to infinity. The condition R ~ I is equivalent to that of
L: P« log p(y = le
ale I 0i) ~
L: PIc log p(y = le
ZN.i
ale I 0;)
as N
(7)
It is easy to see that }
H = -
L: PIc log p(y =
ale I 0;)
(8)
le~1
is minimal with respect to the probability Pk' 1
~
k
~
I, when (9)
Thus I
L: k~l
I
Ple log p(y = ale I OJ) ?': -
L: Ple log Ple , k~l
forall
l~i~s
(10)
VI.
202
CONVERGENCE QUESTIONS
Given Pj' 1 ~ J ~ I, of the probability that y = a j , suppose that p(y = a j I 0i) = pj for all J and p(y = a j 10k) pj for all k i, then, from (9) and (10),
"*
"*
hence, from (7),
This corresponds to Case (b) and 1imN_oo given Pj, 1 ~ J ~ I, if
ZN i
= 1. More generally,
is realized for a unique OJ , then
- L: Pk log p(y = k
«, I 8j )
< -
and, from Case (b), limN_oo
ZN.i
L: Pk log p(y = k
=
a k I 8;)
for all i #- j
1.
3. Martingales As a preparation to our general discussions of convergence problems in Section 4, an introductory account of a special kind of stochastic process, martingales, will be presented in this section. Also see Appendix I at the end of this book. Consider a discrete-time stochastic process {Zk; k = 0, 1, 2, ...}. It may be a sequence of state vectors {xk ; k = 0, I, ...} of a given stochastic system or a sequence of observation vectors of the initial condition of a system {Yn ; n = 0, I, ...} contaminated by noise. A stochastic process {Zk ; k = 0, I, ...} is called a martingale if E I Zk I < 00 for all k ? and, for any n subscripts, ~ k, ~ k 2 < ... < k n ,
°
°
with probability
(11)
For example, Zk is such that the expectation of Zn+l conditioned on the n + 1 preceding realization of z's, Zo , ZI , ... , Zn , is equal to Zn . Before we discuss the meanings of (11)" let us mention two other stochastic processes closely related to martingales. When the equality sign in (11)
3.
203
MARTINGALES
is replaced by an inequality sign ~():) we call the stochastic process a semimartingale, or an expectation-decreasing (increasing) martingale. At first sight of (11), one may think that martingales are too special to arise in many engineering problems. This suspicion turns out to be unfounded and actually there are many ways in which martingales arise in optimization problems. One classical example of martingales is a fair gambling situation.s?> By a slight change of terminology, this example can be rephrased as a control system with the plant and observation equations
°
Assume that its control policy is given by Ui = ePi(Xi), Suppose that t/s are independent and that Efi[ti, ePi(Xi) I xi] = for all Xi . Then E(xi+l I Xi) = Xi and {Xi} is a martingale. There are other, less trivial, examples. We discuss next the maximum likelihood estimation problem of an unknown parameter. We know from Chapters II-VI that, for some optimal adaptive control problems, the optimal control policy synthesis can be separated from the optimal estimation process of the unknown plant parameters and/or noise statistics. Maximum likelihood estimates are often used when exact Bayesian conditional expectation estimates are not available or are too cumbersome to work with. If the random variables are Gaussian, these two kinds of estimates coincide. Suppose we have a system where its unknown parameter 8 is assumed to be either 81 or 82 • 47a Consider a problem of constructing the maximum likelihood estimate of 8 given a set of (n + 1) observed state vectors at time n, yn. Suppose that p(yn I Bi ) is defined for all n = 0, 1,... and i = 1,2. Form the ratio (12)
The probability density p(yn I 8), when regarded as a function of 8 for fixed yn, is a likelihood function mentioned in Section 2, Chapter V. Hence Zn is called the likelihood ratio. Since 8 = 81 or 82 in the example, the maximum likelihood estimate of 8 is 82 if Zn > 1, 81 if Zn < 1, and undecided for Zn = 1. Thus, the stochastic process {zn} of (12) describes the time history of estimate of 8. To study the behavior of the sequential estimate of 8, one must study the behavior of {zn} as n ---+ 00. Since p is a density function, the denominator is nonzero with probability one. Let us assume that
204
VI.
CONVERGENCE QUESTIONS
p(yn I ()2) = 0 whenever p(yn I ()1) = 0 since otherwise we can decide () to be ()2 immediately. Suppose ()1 is the true parameter value. Then
and
Then, since zn are random variables which are functions of yn, with probability
I
(13)
Taking the conditional expectation of (13) with respect to zn, E(E(zn+1 I yn) I zn)
=
E(E(zn+1 I yn, zn) I zn)
= E(Zn+l I zn) =
st»; I zn)
Thus, it is seen that the sequence of likelihood ratios, {zn}, is a martingale. For more practical engineering examples, see, for example, Daly,42.43 Kallianpur'", Raviv.P"
4. Convergence Questions: General Case A.
PROBLEM STATEMENT
We now make precise statements made in Section 1. This section is based on Blackwell and Dubins.s! A common frame of reference used in this book in judging system performances of control systems is the expected values of some scalarvalued functions]. In case of adaptive control systems, their expected values EJ depend, among others, on the unknown parameters () taking their values in some known parameter spaces e.
4.
CONVERGENCE QUESTIONS: GENERAL CASE
205
There are many other systems, not necessarily control systems, whose performances are judged using this common frame of reference. A sequence of measurements Yo , YI ,... is made while a given system is in operation where the measurement mechanisms are assumed to be designed so that y's are functions, among others, of 8; i.e., joint conditional probability density p(yn I 8) is assumed given for each 8 E e. An a priori probability density for 8, Po(8), is also assumed given. The a posteriori probability density p(8 I yn) is computed by the Bayes rule
p(e I yn)
=
po(e)p(yn I e)
J de Po( e)p(yn I e)
Now we formulate (from the questions on p. 197 and 198) Question 3'. Under what conditions does p(8 I yn) converge as n---+ oo ? Question 4'. Given two a priori probability densities Po(8) and qo(8), under what conditions do they approach the same density? In Questions 3' and 4', the closeness or the distance of any two probabilities PI and P 2 , defined for the same class of observable (i.e., measurable) events, is measured by the least upper bound of the absolute differences of the probabilities assigned to all such events by PI and P 2 • Denote the distance of the two probabilities by p(P I , P 2 ) and the class of observable events by ff:
In the language of the theory of probability, Q is a sample space, ff is the a field of subsets of Q, and PI and P 2 are two probability measures so that PI(A) and P 2(A) are the probabilities assigned to A, for every AEff. Some of the definitions and facts from the probability theory are summarized in Appendix I at the end of this book. Question 3' asks for the conditions for p(pn, P*) ---+ 0 as n ---+ 00 for some probability P" where pn is the nth a posteriori probability. Question 4', therefore, asks for the conditions under which
where pn and Qn are the nth a posteriori probabilities starting from the a priori probabilities Po and Qo , respectively.
206 B.
VI.
CONVERGENCE QUESTIONS
MARTINGALE CONVERGENCE THEOREMS
Both Questions 3' and 4' of Section 4,A are answered by straightforward applications of martingale convergence theorems. The forms in which we will use them are stated here without the proofs: The proofs can be found, for example, in Refs. 31, 47a. See also Appendix I at the end of this book.
Theorem 1. Let Zn be a sequence of random variables such that I has a finite expectation, converges almost everywhere to a random variable z, and let Yo, Yl ,... be a sequence of measurements. Then
SUPn , Zn
lim E(zn I yn)
n->ro
=
E(z I Yo, Yl ,... )
Theorem 2. Let in be any sequence of random variables that converges to 0 with probability 1. Then, with probability 1 and for all € > 0, converges to 0 as n
---+ 00.
We also note here that zero-one laws in their various versions,47a,102 although they can be proved directly, can also be proved by applying Theorem 1. Let
Z
= Is, where Is is the indicator of the event B, i.e., wEB w¢B
where the event B is defined on the sequence of the measurements Yo 1 Yl ,.... Then, from Theorem 1, one has
Theorem 3. on the Yn's.
PCB I y n )
. s t,
with probability 1, where B is defined
C. CONVERGENCE
We now consider the convergence of the a posteriori probability densities pee I y n ) to the true value of e, eo . It is assumed that there exists: (i) an a priori probability density which assigns a positive probability to some neighborhood of eo ;
4.
CONVERGENCE QUESTIONS: GENERAL CASE
207
e
(ii) a subset B in such that the event 80 E B is defined on the Yn's; namely, with nonzero probability, there is a realization of measurements Yo 'Y1 ,... such that the sequence of functions of Yo, Y1 ,... computed according to the Bayes rule converges to 80 • Then, by Theorem 3, PCB [yn)
L
d8p(8 jyn) ......... 1 or 0
=
depending on
This is equivalent to saying
D.
MUTUAL CONVERGENCE
For the sake of convenience in writing, let w = (y n ) be the measurements that have been performed and let v = (Yn+l , Yn+2 ,...) be the future measurements. Let A be a measurable set in the product space ~n+l X ~n+2 X "', where ~k is a a field of subsets of Y k , the space of outcomes for the kth measurement Yle . Let B be a measurable set in ff, a a field of subsets of e. Let pn(A, B I w) be the conditional probability of v being in A and B being in B, given w, when the probability density Po(B) is adopted as the a priori density function on e. The conditional probability Qn(A, B I w) is similarly defined with qo(B) as its a priori probability density. qo( B) is assumed to be absolutely continuous with respect to Po(8). Namely it is assumed that a nonnegative function of 8, feB) ?: 0, exists such that
qo(8)
= Po(8)f(8).
The main result can be stated that except for the measurement sequence with Q-probability (i.e., the probability constructed from qo(8») zero, p(pn, Qn) -+ O. The convergence to zero of distance SUPB p[pn(B I w), Qn(B I w)] as n -+ 00 is implied by the convergence to zero of SUPA,B p(pn(A, B I w), Qn(A, B I w). Therefore, the proof for this latter is sketched in this section. See Ref. 31 for details. Because of the absolute continuity assumption, we can write Q(e) =
Ie
VI.
208
CONVERGENCE QUESTIONS
where C is a set of points (w, v, (), i.e., C E ff X fJ80 X /!4 I X a; X fJ8 n +l X Define fn(w)
=
x···
Jrf(w, v, B) dpn(v, B I w)
where the integration ranges over the whole space, i.e.,
with respect to P measure. Define
Then, from Theorem 1, almost surely with respect to P probability i.e., with probability 1 when the probability is given by P
Consequently, lim n, n
~
almost surely with respect to P measure and dn almost surely with respect to Q measure
1,
-+
1
Define
for all D E:F X
~n+l
X ...
Then Qn is a conditional probability of future measurements and 8 given the past measurements. Now
= ~
r
d n-l >0
[dn(w, v, 8) - 1] dpn(v, 8 I w)
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
=
J
dn-l
>€
(dn ~ 1) dpn(v, 8 I w)
+J
Q
+J
dn-l>f"
(dn ~ 1) dpn(v, 8 I w) ~
u; -
1) dpn(v, 8 I w)
where G
=
209
{(v, 8): dn(w, v, 8) - 1
~
E
E
+J
G
dn dpn(v, 8 I w)
> E}
Thus The last step comes from Theorem 2. Thus, if given a priori density function for 8, Po(8), any other choices of a priori density functions also converge to the same density function eventually so long as they are absolutely continuous with respect to Po(8).
5. Stochastic Controllability and Observability A.
INTRODUCTION
In deterministic dynamical systems the concepts of controllability and observabilit y8 7- 89 , l oo , 142 play a very important theoretical role in characterizing possible system behaviors. As pointed out in Section 1, the corresponding concepts of observability and controllability of stochastic systems exist and are intimately connected with the convergence questions of a posteriori probability density functions such as P(xi I yi) or p(x o 11') as i -+ 00. We will define these concepts in terms of the covariance matrices associated with these a posteriori density functions. * We have discussed in Chapters II-V the procedure for generating these a posteriori density functions for general systems with nonlinear plant and observation equations. Therefore, their definitions will in principle be applicable
* By the duality principle discussed in Section 3,C, Chapter II, the results of this section can be translated into the corresponding results on the asymptotic behaviors of regulator systems or vice versa. See Ref 89a. The independent investigation of the asymptotic behaviors of the error covanance matrices is of interest since it sheds additional light on the subject.
VI.
210
CONVERGENCE QUESTIONS
to general nonlinear stochastic systems even though they are developed for stochastic systems with linear plant and observation equations in this section. Let us illustrate by simple examples how the question of observability arises in stochastic control systems. Consider a system with the plant equation A nonsingular (14) and with the observation equation Yi =
Hix i
+ n,
(15)
where the matrix A is assumed known, where Hi YJ's are some observation noises. Then, from (14) and (15),
=
Ar', and where
showing that Yi observes a noisy version of X o for all i = 0, 1,.... In this case, since (14) is deterministic, Xi for any i > 0 can be constructed if X o is known exactly. Thus assuming the existence of p(x o I yn), if p(xo I v") converges as n ---+ 00 to a delta function then so does the density function for Xi , at least for stable systems, since Xi = Aix o . Instead of (14) consider now a system (16)
Then Xi
=
i-I
Ai X O
+ L:
Ai-l-jCi~j
j~O
and i-I
Yi =
HAi X o
+ L:
HAi-HC;~j
+ 7];
j~O
If HAi = 0 for some i o , then HAk = 0 for all k ~ i o . Therefore, no matter how many observations are made, Yk does not contain X o for k ~ i o . It is not possible to get more precise information on X o than that contained in Yo ,... , Yi o- l . Similarly, if the density function for X o is not completely known, for example if the distribution function of X o contains an unknown parameter 81 , then the observation scheme of (16) is such that p(81 I y n ) remains the same for n ;:? io . Then we cannot hope to improve our knowledge of 81 beyond that at time i o no matter how
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
211
many observations are taken. Therefore, we may want to call the system with (14) and (15) stochastically observable and the system with (16) stochastically unobservable or observable in a weaker or wider sense. Observability of stochastic systems is then defined as the existence condition of the system state vector estimates with certain specified asymptotic behaviors, where the class of the state vector estimates of Xi is taken to be functions of y i . Such observability may be called on-line observability. There is another type of observability concept which may be called off-line observability. The class of state vector estimates of Xi is no longer restricted to be functions of yi but is taken to be of Yj ,Yj+l ,..., Yj+k' where j > i or j < i. The behavior in some probability sense of such estimates as k ----+ CX) is then investigated. Both are of interest in system applications. In this book, the on-line observability is developed using the convergence in the mean. We now make these preliminary remarks more precise. Given two square matrices of the same dimension, A and B, we use the notation A ? B when (A - B) is nonnegative definite and A > B when (A - B) is positive definite. B.
OBSERVABILITY OF DETERMINISTIC SYSTEMS
Consider a deterministic system described by
where x's are n vectors and y's are m vectors and where Ai exists for all i. The problem is to determine the x/s from a sequence of observations Yo ,Yl ,.... Because of the deterministic plant equation, the problem is equivalent to determining x at anyone particular time, say x o, from a sequence of observations Yo , Yl ,.... Of course, the problem is trivial if m ? n and H o has rank n. Then X o is determined from Yo alone by Xo =
Xi
(Ho'Ho)-lHoyo
More interesting situations arise when m < n, Let us determine from y i . Defining the (i l)m X n augmented H matrix by
+
H z.
=
~
H 0 1> O ' i ~
H l 1>l.i •
Hi
212
VI.
CONVERGENCE QUESTIONS
where ePk,j is the transition matrix from x j to y vector by
Xk
and an augmented
we can write
N ow if (H/Hi ) is nonsingu1ar, then
i.e., if Hi has rank n, then the definition of Hi to
Xi
can be determined from
yi.
By changing
we obtain
Such a system is called observable. The condition that the rank of Hi is n is the observability condition of a deterministic systemJ42 This concept has been introduced by Kalman,87-89 Physically speaking, when a system is observable the observation mechanism of the system is such that all modes of the system response become available to it in a finite time. In other words, when the system is observable it is possible to determine any Xi' i = 0, 1'00" exactly from only a finite number of observations y.
C.
STOCHASTIC OBSERVABILITY OF DETERMINISTIC PLANT WITH NOISY MEASUREMENTS
Let us now consider a deterministic plant with noisy observations: (17) Yi =
H.»,
+ TJi
(18)
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
213
where X o is a Gaussian random variable with a covariance matrix l:o , where noises are independent Gaussian random variables with E(1)i)
0
=
(19)
E(1)i1)/) = R/Jij
and where R i is nonsingular, i = 0, 1,.... Here again, if we can determine (or estimate) the state vector x at anyone time, such as X o , then from (17) we can determine all other x's. Because of the noisy observations, it is no longer possible to determine Xi from a finite number of observations. We compute, instead, the probability density function p(xi I yi) as our knowledge of Xi , or compute the least-squares estimate of Xi from yi if noises are not Gaussian. Since and where
rpn.i is the state transition matrix from
g An-I'" Ai ,
g
time ito n,
g
(20)
I
(21)
from (18), (22)
Define (23)
The best Wiener-Kalman estimate of
Xn , Xn
*, is obtained by minimizing (24)
with respect to
Then, from Chapter V, we know that
Xn .
Xn
*=
L H;'R;lH; )+( L H;'Ri1Yi n
(
_
_
o
n
_
)
0
From (22) and (25),
r; g
cov(xn * I yn )
=
(Io i1;'Ri1i1 i(
(25)
VI.
214
CONVERGENCE QUESTIONS
a. Stochastic Observability Matrix Define an (n X n) matrix, called the stochastic observability matrix, by
n
L (
R"i 1H;<jJi,n)
(26)
i=O
when Lo 1 is null. This matrix recursion equation {l)n+1
=
,p~,n+1(1)n,pn,n+1
IS
nonnegative definite. It satisfies the
+ H~+1R;;~IHn+1 (27)
If mn is positive definite, then the first term can be written as (Anm:;;IA n')-I. Thus, from Chapters II and V, we see that m;;1 satisfies the same recursion equation as the error-covariance matrix of the Kalman filter, when ,EOI is taken to be null matrix. When ,Eo l is not null mo must be replaced by ,Eo! + H o'R 1Ho • Since the second term of (27) is nonnegative definite, we see that mj is positive definite for all j ~ i o when m i o is positive definite.
o
b. Definition of Stochastic Observability (Strict Sense) The system with (17) and (18) is said to be stochastically observable in the strict sense if and only if the covariance matrices associated with the conditional probability density function of X lc given v" goes to zero as k ---+ 00, or equivalently the stochastic observability matrix (!)Ic is positive definite for some k and its error-covariance matrix, Tic , converges to the null matrix as k ---+ 00. *
* It is possible to define the observability by behavior of the variance of p(xo I yn). Thus, if Xo is more and more accurately known as the number of observations increases, then the system is observable in the strict sense. The stochastic process (zn), defined by Zn =
E(x o i yn)
is a martingale. If EI X o I < 00, then, by the martingale convergence theorem, E(xo I yn) ~ E(x o I Yo ,Y, ,... )
with probability I as n ~ 00, i.e., the associated covariance matrices converge to the null matrix. For observable systems this limit is X o •
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
215
By the Chebychev inequality, the stochastic observability, therefore, implies that the estimation errors of the state vectors converge to zero in probability as the number of observations increases. Thus, with L 0 1 being null, if mio is positive definite then for all k.:? i o
(2Sa)
When L 0 1 is not null, we have (2Sb)
A sufficient condition for stochastic observability is now derived. The sufficient condition must insure that mk is positive definite for some k and that II mk II increases indefinitely as k ---+ 00. For this purpose it is convenient to modify mk as (i)k
=
k
L (f;,oH/K;lHifi.O)
(29)
i~O
Then, from (26) and (29), (30)*
The matrix (i)k satisfies the recursion equation (31)
c. Sufficient Condition for Stochastic Observability (Strict Sense)
The sufficient condition for stochastic observability is that there exists a positive integer q such that the partial sum of H/ Hi l Hi over any q consecutive time intervals is positive definite and that II (9,;1 II decreases in such a way that 2
II fk,O 11 II ([i",,/ II -+
0 as k
-+ 00
Since 1Ji.O is nonsingular, the partial sum of 1J~.oH/ R;1 Hi1Ji, 0 over any q consecutive time intervals is positive definite and we have for any i
> q,
k.:? i o
* The two observability matrices of (26) and (29) differ only in that time 0 or current time is used as the time of reference.
VI.
216 where
f\
CONVERGENCE QUESTIONS
is positive definite. Then, from (28a) and (30), II
when
r
Ie
II
<
II cPle,o 11
211
1
@k II
..E0 1 is null. When ..E0 1 is not null, from (28b) and (30), (32)
For large enough k we have
II @k II J> 1I..E0 1 II and (33)
Thus in both cases
II T k 11-+ 0 as k -+ 00.
D. STOCHASTIC OBSERVABILITY OF GENERAL DYNAMICAL SYSTEMS:
I
As a next class of systems, consider
+ Cig i HiXi + YJi
Xi+l = Aixi Yi
where Cs and
r/s
=
(34) (35)
are independent Gaussian random variables with E(gi) = E(YJi) = 0 E(git/)
=
Q/)ij
where Qi and R i are nonsingular, i,j = 0, I, .... The initial state vector X o is taken to be random, independent of Cs and r/s with
In the previous section, the stochastic observability matrix is shown to be related to the inverse of the error-covariance matrix. Because of the plant noise, it is no longer possible to hope that any Xi will be learned eventually from Yo , YI"'" nor that the knowledge of Xi at any i suffices to determine x j exactly, j =F- i. A reasonable definition of observability, then, is to focus our attention on the error-covariance matrix of the state vector estimate and define the systems to be stochastic-
5.
217
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
ally observable in the wide sense when the error-covariance matrices remain bounded in some sense. * Definition of Stochastic Observability (Wide Sense) A system is stochastically observable in the wide sense if the covariance matrix associated with P(Xi I yi) remains bounded as i ---+ co, In the process of obtaining a sufficient condition for this wide sense stochastic observability we need a concept of controllability of stochastic systems which is a companion or dual of the concept of observability. This concept is discussed in Section E and was also introduced by Kalman. 8 7- 89 We will return to the topic of this section in Section F.
E.
CONTROLLABILITY OF STOCHASTIC SYSTEMS
Consider a dynamical system with the plant equation
where
ti
are independent random variables with
Because of the noise disturbances the state vector at time n, x n from ePn.oxo , the state vector at time n with no disturbances by
,
differs
n~l
L q,n.i+l Citi
=
i=O
The random variable d n is such that E(dn ) = 0
and n-l
cov(dn) =
L: q,n.i+1CiQiC/
(36)
If the matrix '&' n is such that II '&' n II remains bounded for all n, where II . II is a norm in an Euclidean space, then II II will remain bounded
s;
* By Chebychev's inequality, the observability in the wide sense enables one to establish an upper bound on the probability of the estimation error exceeding a given threshold.
218
VI.
CONVERGENCE QUESTIONS
for all n. In other words, the effects of the random disturbances remain bounded. Therefore, '(} n is called the stochastic controllability matrix of the dynamical system (34). The system (34) is called stochastically controllable if its stochastic controllability matrix is positive definite for some n, and remains bounded for all n. We are now ready to discuss the observability of the general system (34) and (35).
F.
STOCHASTIC OBSERVABILITY OF GENERAL DYNAMICAL SYSTEMS;
II
The stochastic observability of the general system given by (34) and (35) is now discussed by applying the results of Sections C and E to its subsystems to be defined shortly. Since n-l
Xn =
fn.oxo
+L o
fn.i+lCiti
it is easy to obtain an upper bound on Tn , the error-covariance matrix of the state vector X n of the Wiener-Kalman filter, as (37)
If the system is stochastically observable, then, given a positive integer io , there exists NUo) > 0 such that NUo)' Therefore, * (37a)
This is essentially the bound obtained by Kalman.s" Let us also obtain a lower bound on Tn and improve on the upper bound given by (37a). For this purpose, it is convenient to write the vectors Xi and Yi as the sums of other vectors defined as follows: (38)
(39)
* The conditioning variables yn in (37) are more complicated than yn used in defining the observability matrix, since the former is a function of ~n-l and TJn while the latter is simply a function of TJn. However, if the system is observable then E(xo I yn) -->- 0, even when yn are functions of ~ and TJ. See Appendix I for detail. Hence the errorcovariance matrix converges to the null matrix and can be bounded as in (37).
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
219
where (40a)
and (40b)
The subsystem (40a) therefore has a stochastic plant with no measurement errors. The other subsystem, (40b), has a deterministic plant with noisy measurements. Assume that the subsystem (40a) is stochastically controllable and the subsystem (40b) is stochastically observable (strict or wide sense). Assume that the state vector X i 2 is estimated by (41)
where Kl+ 1 is the gain of the filter, and that the estimation of the state vector xl is done by (42)
where K;+l is the gain of the filter. It is easy to see that the estimate of Xi given by will be optimal if (43)
where Klt-1 is the optimal filter gain for the Wiener-Kalman filter for the system of (34) and (35). The error-covariance matrix associated with Xi is given by T',
=
E(x i - Xi)(Xi - Xi)'
+ [';2
~ ['i 1
(44)
where and The cross product terms disappear in (44) because of the particular choice of initial state vectors for the two subsystems. The lower limit for T i will now be established. Since T/, the second term of (44), is the error-covariance matrix for the filter (41) with the nonoptimal gain Kl+1 , we have
r*2 ,
~
T.2 t
VI.
220
CONVERGENCE QUESTIONS
where Tl 2 is the error-covariance matrix of Xi 2 with the optimal gain. By assumption, the subsystem of (40b) is stochastically observable. From (28b), (45)
In obtaining the estimate for I'l, the first term in (43), we will encounter the controllability condition for the subsystem (40a). From (42),
Defining the state transition matrix for (46) by if;L ,
(xl - xl) =
i-I
-
x/) + L
j~O
i-I
=
L
j~O
where is used. Thus,
1'/
i-I
=
L
j~O
(47)
where
By comparing (47) with (36), we see that when if;'s are substituted for ¢'s in (36) we obtain ~n . This matrix is defined as the modified stochastic controllability matrix with the gain given by (43). 'li'i results from ~i by K, = O. The matrix ~ is not simple to compute since the optimal gain for the Kalman filter for the original system is used. See Sorenson-" for more detailed computations. From (44), (45), and (47), we obtain a lower bound on T i as
(eP~.i
17;;l1>o.i
+
-1 :(:
(
To get a sharper upper bound on using the gain
Til
+ (Ii;) + ~i
:(:
-1
we compute E[(Xi2
-
r,
Xi2)(Xi2 -
(48)
xl)'] (49)
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
221
From (40b) and (41),
Defining the state transition matrix for (50) by we have
fL
analogously to
fL ,
i-I
( Xi 2 -
' 2)
Xi
,/.2 ( 2 = 'r t» X o
' 2)
-
L, '1/i.Hl K j 2'YJj
'\' .1 2
-
Xo
(51)
j=O
In view of (51), define i-I
~
f?
=
+ L tJ;;,j+lK j R jKNL+1
tJ;~,oEoif; ,~
Then, substituting (49) as ;'2 _ i
-
.1.2
'"Pi,O
2
j=O
K/,
E O"Pi.O ,/.2'
+ L~
.1.2
1H )+,/.2' . .(H.'R-: J 1, 1 '"f?"J
'"fl.?
(52)
j=l
Since the gain (49) is not optimal for the system (34) and (35) (53)
where ~i Since
IS
defined analogously to
«,
with K/ replaced by K/ (49).
we have, from (45) and (53), (54)
Thus, the system given by (34) and (35) is stochastically observable in the wide sense if the subsystem (40b) is stochastically observable (either in the strict or wide sense) and if the subsystem (40a) is stochastically controllable. G.
IDENTIFIABILITY OF STOCHASTIC SYSTEMS
Consider the system described by
where A is now assumed to be unknown.
222
VI.
CONVERGENCE QUESTIONS
To date, several iterative procedures to identify A have been proposed. 62, 72a The system is said to be identifiable when limk->oo Ak = A in some probabilistic sense such as in quadratic mean or in probability, where A k is some estimate of A given Yo '00" Yk i.e., the system is identifiable when the error covariance matrix of Ak converges to a null matrix. The proof of this convergence62a , 72a is adapted from the convergence proof of the stochastic approximation. 51 Thus, the condition of the identifiability is essentially that of the convergence of the stochastic approximation and not of observability. It can be shown, however, that the Kalman observability criterion in Section 3 is sufficient for the identifiability. 62a, 72a,100 This can also be inferred from the fact 62a that the convergence proof of the identification procedure is essentially the same for systems with or without the inverse of H. This observability criterion is, however, not necessary. This can be seen by considering the next example. * Consider a system described by
Xi+I) (Zi+l
=
Yi =
(a 0) (Xi) (0 ) ° a + e. (1,0) (:i) Zi
t
where the state vector has two components x and z, where z's are independent Gaussian random variables with means 0 and known finite vanances. Even without the plant noise, this system is not observable since it does not satisfy the rank condition, but is identifiable since if Yi
*- Q
Even when the observation equation is modified to Yi
= (1,0)
G:) + n,
where y]/s are observation noises of known probability distribution function, it is still possible to identify a under certain conditions.P'e
* This
example is due to T. Fukao (private communication).
Chapter VII
Approximations
We come to realize very quickly that various optimal control formulations derived in Chapters II-IV do not generally admit closedform analytic solutions. Even if we restrict our attention to a class of control systems with linear plant equations and quadratic performance indices, we can obtain optimal Bayesian control policies in analytically closed form only for special cases. As pointed out repeatedly, it is usually the rule rather than the exception that we must solve optimal control problems approximately. We have mentioned some approximation schemes in discussing Example c, Section 3,B, Chapter III, when the gain of the system is unknown. In this chapter, we discuss several other approximation schemes. There are many ways to approximately obtain optimal control policies: approximations in policy spaces,4,9,11,20 linear or nonlinear programmings including gradient techniques of various kinds;'! ,59 ,68 ,69,71,139 stochastic approximations! and separation of optimization from estimation with possibly additional approximations being made for control and/or estimation parts,63,64,105,129 to name only a few possibilities. The subject of this Chapter could indeed be a basis for an entire book. We have chosen five topics somewhat arbitrarily, all of which, however, have to do with the question of how to reduce the amount of computations involved in obtaining optimal control policies. The first topic discusses a method of approximate synthesis of optimal closed-loop control policies for a class of linear adaptive systems from the knowledge of optimal closed-loop control policies for a corresponding class of purely stochastic control systems. 5,62 As will be discussed in detail in Section I, under certain conditions,
223
VII.
224
APPROXIMATIONS
optimal policies for adaptive systems can even be synthesized exactly, in this manner. Since the amount of computations involved in deriving optimal closed-loop policies for adaptive systems are usually several orders of magnitude larger than that for purely stochastic systems, the saving in computational work could sometimes be very significant. The second topic discusses an approximate control method which employs what is sometimes denoted as open-loop feedback control policies instead of closed-loop feedback control policies. 4 9 ,126 ,1 27 The remaining three topics are devoted to the problems of approximate estimations of state vectors and/or unknown parameters. After performing some sensitivity and error analysis of Wiener--Kalman filter'" in Section 3, we discuss two particular methods of approximate state vector estimation: by constructing an observer of state vectors to supplement the available measurements on the state vector in Section 4,17,103 and by partitioning state vectors in Sections 5 and 6. 8 1 ,104 .104a , 11 2
1. Approximately Optimal Control Policies for Adaptive Systems A.
INTRODUCTION
The problems of obtaining optimal control policies are generally much harder for adaptive systems than for purely stochastic systems. We have seen many examples testifying to this fact. It is natural and important to ask, therefore, if it is possible to construct approximate optimal control policies for adaptive systems from the knowledge of optimal control policies for purely stochastic systems. For brevity, the former will be referred to as optimal adaptive control policy and the latter as optimal stochastic control policy. Before we make this idea more precise, it is instructive to examine the relation between optimal adaptive and stochastic policies for a simple control problem. The system we consider is the same one used in Section 5 of Chapter III. B.
ONE-DIMENSIONAL EXAMPLE 5
The system to be considered is governed by the plant and observation equations
Yi =
Xi'
O:'(i:'(N-l,
u, E (-00, (0)
1.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
225
where x, a, u, r, and yare all scalar quantities and where ri is the independently and identically distributed random variable with r =
,
with probability with probability
Ie
I-e
8 1- 8
(1)
We take the usual criterion function
]=
N
I
Wi(X i, Ui-1)
1
e
When in (I) is assumed known, we have a purely stochastic control problem. When 0 is not known we have an adaptive control problem. Here, for the sake of simplicity, we consider the adaptive system to be such that Pr[8 = 81 ] = Zo Pr[8
82]
=
=
1-
(2)
Zo
where Zo is the given a priori probability. The a posteriori probability that be equal to 01 at time i is denoted by Zi . We will see that when Wi is quadratic the optimal adaptive control policy can be synthesized exactly by knowing the optimal stochastic control policies with 0 = 0i' i = 1, 2. Thus, with about twice the labor of solving a purely stochastic optimal control problem, an exact solution for the corresponding adaptive optimal control problem is obtained for this example. This is a large saving of labor compared with solving the adaptive problem exactly. Some idea of the degrees of saving may be obtained by the analogy with the difference in labor of integrating a function of one variable for two different parameter values and one function of two variables. In the next section we will see that for more general adaptive problems approximations to optimal adaptive control policies can be made with a similar savings of labor. For the adaptive problem of this section the augmented state vector {(Xi' Zi)} forms a first-order Markov sequence. The minimum of the expected cost of control for the adaptive system at time i, Yi*' is therefore a function of (X i-1 ,zi-1)' The equation for YN* is already given by minimizing (144) of Chapter III. The general recursion equation for Yi* is given as
e
Yi *(X'_1 , Z'_1) = min[A, -1. E(Y'~l U _ i
1
I Xi-I'
Zi-l)]
(3)
where, from the definition of ;\ , Ai
= E(W,(x" U'_I) I Xi_I' Z'-l) = ZH E(W, I X'_l' 8J + (1 --
Z'-l)
E(W, I X'-l' 82)
(4)
VII.
226
APPROXIMATIONS
where E(Wi I Xi-I' 8j )
= 8j
Wi(xL , Ui-l)
+ (1 -
(5)
8;) W i(xi-l , Ui_l) j = 1,2
and where (6)
In (3) we can write
where E(y7+11 Xi-I' Zi-l , 8j ) = 8j yi*rl(X7-1 , Z7-1)
+ (1 -
8j ) yi+1(xi-l ,Zi-l) (8)
j = 1,2
and where +
Zi-l
=
Zi_181
Zi-181
+ (1 ~
_ Zi-l = Zi-l(l -
Zi-l)8 2
zi_l(1 ( 1)
+ (l
(9)
81 )
- Zi-l)(1 - ( 2 )
are the a posteriori probabilities that 8 and Yi-l = -c, respectively. From (4) and (8) we can write (3) as
=
81
,
given that
Yi-l
=
+c
where j = 1,2
(11)
for 1 :(; i :(; Nand
Note that the variable %i-l which expresses our Ignorance about the precise value of 8 appears linearly in the inside of the minimization operation of the recursion equation (10). Note also that the purely stochastic problem with 8 = 81 corresponds to the problem with the a priori probability Zo = 1 and the problem with 8 = 82 to %0 = o. The recursion equation for the corresponding stochastic system where 8 = 81 therefore is obtained by putting Zi-l = I for I :(; i :(; N in (10).
1.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
227
-t,
Denoting by the optimal control cost for the stochastic problem with 8 = 8j , j = 1,2, it satisfies the recursion equation: yti(X i - l) = min<Wi(x i , Ui - l) U _ i
1
+ -z.,«.», j = 1,2 (12)
where i
=
1,2, ..., N
Now suppose Wi(x, u) is quadratic in x and u:
Then the optimal control at the last stage for the stochastic system with 8 = OJ , denoted as U't-1,j , is given by U~-l,;
VN[aX N_ l
=
VN
The optimal control at N is given from (10) by
+ ((28; +t
- 1)]
j
N- l
=
1,2
(13)
1 for the adaptive system, denoted as
U't-1'
U~_l
VN[axN_l
+ ((28 N _ l +t
VN
N- l
-
1)]
(14)
where
(15) By comparing (13) with (15), we see that the optimal adaptive control is given by a linear combination of the optimal stochastic controls (16)
where ZN-1 is the a posteriori probability that 8 = 01 at the (N - l)th stage, i.e., after X o , Xl'"'' X N- l have been observed. Thus, at least for this example, the last optimal adaptive control is obtained once the last optimal stochastic controls for 8 = 81 and 82 are known. We can show from (10) and (12) that the inequality (17)
228
VII.
APPROXIMATIONS
holds. Furthermore, Wi need not be quadratic for (17) to be true." More precisely, we will show later on that y/(x, z)
=
z yi\(x)
+ (1
- z) y72(X)
+ LlYi(Z)
(18)
where LlYi(Z) depends only on z. Numerical experiments done for some stable final-value control problems show that LlYi is small compared with ytl and yt2 for i of 11000
10000
9000
8000
7000
2000 11 x· 0.25
•
1000 0.9830 00'70
03706
06294
08536
01464 00170
__ z
x· 0.125
V .-- 0
08536 03706
0.6294
x
.--0.125
o .. --0.25
09830 _2
0·718
c -1/16
m -9/128
-0.25< X s 0.25 x
n.,.1
=:
ox .. U .. r n n n
un; zm
2000
2000
1000
1000 n·6
0.1464
00110
03706
06294
09830 08536 ~Z
Fig. 7.1.
n·8
0.1464
00170
03706
0.6294
0.8536 08536 ~z
Performance index Yn *(x, z) as a function of x. z. and n.
I.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
229
order 10. Thus, for the example under consideration, not only the optimal adaptive control is obtainable exactly from the corresponding optimal stochastic controls but also (17) gives a good approximation for Yi*(X, z) as well. See Fig. 7.1 for some computation results." C. DERIVATION OF ApPROXIMATE RELATIONS OF OPTIMAL CONTROL POLICIES FOR ADAPTIVE AND STOCHASTIC SYSTEMS
We now generalize the above observation and discuss the relation of the optimal adaptive and stochastic control policies for linear systems with quadratic criterion function and with exact state vector measurements. The following developments are based in part on Fukao.F When the state vector measurements are noisy, the problem of approximating adaptive policies becomes much more difficult and requires further analytical and computational investigation. The system is now assumed given by (19) Yi
=
(20)
Xi
where Xi is n vector, u.; is r vector, ~i is m-dimensional random vector, A is n X n matrix, B is n X r matrix, and C is n X m matrix. Let us use R; as a generic symbol of the random variables. We consider the problems such that, at time j 1, R o ,... , R, will be known exactly from the known collection of the state vectors X o , ... , Xj+l . In Section D, we consider two examples, one with R; = ~i and the other with R i = A. Since the problem to be considered is adaptive, the probability distribution function for R will not be completely known. Let us assume that a parameter e characterizes the probability distribution function for R and the a priori information on (J is given as
+
Pr[O = 0,] =
zo"
,
where s
"'z /: u..,=1
° 1
°
and where the first subscript of ZO.i refers to time and the second subscript i indicates that it is the probability of (J being equal to e.;. In other words, when (J is specified to be ei , the random variables Rj, j = 0, 1,... , are distributed with known probability distribution function F(Rj I (Ji)'
VII.
230
APPROXIMATIONS
<
We will use the notation -> k introduced by (I I) to indicate the expectation operation when the distribution function involved has the parameter value Ok' For example, ('1>(R)k = f'1>(R) p(R 10k) dR when the indicated integral exists. In keeping with the notation introduced above, we denote the a posteriori probability that = 0i at time j by Zj .i . By the Bayes rule, when R's are independent for each (J, the recursion relation for Zi'j is given by
°
p(R j lei) Zj.i p(R-J I e.) z·s ,«. ' t
ZHLi =""S
L..t=l
= 0, 1,..., 1
~
i
~
S
(21)
°
where p(Rj I 0i) is the probability density function of R, when is 0i . Let Zi = (Zi.1 , zi,2 ,... , zi.S)' Then the augmented state vector (Xi' Zi) forms a first-order Markov sequence. We now state a series of four observations which will serve as a basis of our approximation scheme. The notations used are summarized here. Yi*(X, z) is the minimum of E[Lf=i Wj(x j, u j- 1) I X, z] when X i - 1 = X and the probability of the parameter at time i is given by z. ytk(X) Yi*(X, z) where Z = (0, ... ,0, 1,0,... ,0) where the only nonzero component of Z is the kth component which is one. Thus, ytk(X) is the minimum of E[L]:l Wj(x j , u j- 1) I x] when is known to be Ok' Let us call Yi * the adaptive control cost and Yi~k the stochastic control cost.
°
a. Observation I
Assume that if Y-41(X, z) is separable in x and in the components of z. Then the adaptive control cost is expressible as S
Yi*r1(X, Zi)
=
L:
fLk(Zi,k) Vk(X)
(22)
k~l
Assume further that fLk(Zi,k) is proportional to Zi.k , where (23)
Then S
yi+1(x, Zi) =
L:
Zi,k yi+u(x)
(24)
k~l
Remember that (25)
1.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
231
where only the kth component of Zi is nonzero and is equal to one in (25). Thus (24) shows that, if the adaptive control cost is a separable function of x and components of z, then it is a linear combination of the stochastic control cost. This is a useful fact in approximating the adaptive control policy by those of the stochastic control systems. Before proving Observation 1, let us note that (10) and the ensuing discussions show that the adaptive control cost of a final-value control problem satisfies approximately the assumption of Observation 1, where fLk(Z) is proportional to z.
Proof of Observation 1.
Let Z =
(0,0, ... ,0, 1)
then, from (23) and (25), (26)
for some Let
fLs
and
Vs . Zi.k
= 1,
(27)
l~k~S
then (28)
for some fLk and Vk . Therefore, from (26) and (28),
*(
Yi+l x, z .
) = L., ~
* ()
iLk(Zi, k) - -I) ( Yi-j l,k X
k~l
(29)
iLk
If fLl"(Z) is proportional to z, then (30)
and we obtain (24).
b. Observation 2 As one of the components in
Zi
approaches 1, s
Y;*f-1(x, Zi) ---+
I
k~1
Zi.k Y:+u(X)
(31)
VII.
232
APPROXIMATIONS
This shows that, if the a priori probability of 0 being equal to one of 01 , ... , Os is close to 1, or if the a posteriori probability Zi is such that most of the probability mass is concentrated on one of S possible values for 0 (i.e., when one is fairly sure which value 0 is), then the adaptive control cost will approach the form assumed in Observation 1.
Proof of Observation 2. Expand Yi*(X, z) about Zi* = (0, ... , l(jth), 0, ... , 0) retammg only linear terms in the components of Z and use Observation 1. c. Observation 3 Suppose s
Yi:1(X, Zi)
=
L Zi.k Yi:U(X)
(32)
1
Then the recursion equation for the adaptive control cost is given by y,*(X i_1, Zi-1)
= min Ui_l
S
L Zi_U(W;{X i , Ui-1) + yi+U(Xi»k
k=l
(33)
where the notation introduced earlier by (11) is used to write the conditional expectation (Wi(Xi'
U i- 1)
+ Yi+U(X')k g
f
[W;{x i, Ui-1)
+ yi+l.k(Xi)]
where and where the random variable R i - 1 is assumed to have the probability distribution with the parameter 0 = Ok . Thus, by knowing the stochastic control cost Yi*!-l.k for 1 ~ k ~ S, the optimal adaptive control variable Ui*-1 can be obtained from (33) if Y!+1 has the assumed form (32). (33) shows that, even if Y!+1 is linear homogeneous in z, Yi * is not necessarily linear homogeneous in z.
Proof of Observation 3.
The recursion equation for Yi* is given by S
Yi*(X, Zi-1)
=
!J1in i--1
L
k=l
z'_1,k(Wi(x', Ui_1)
+ Yi:1(X', Z~-r»k
(34)
1.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
233
where
and Zi_l is the a posteriori probability when the a pnon probability is given by Zi-l . Its components are given by
By Assumption (32),
* (' ')
Yi+l x, Zi-l
s =
* (')
~' L, Zi-l,j Yi+l,k X 1
Thus in (34) s
L
s
Zi-l,k
k~1
Ij-l
s
=
I J Zi_l,j Yi:U(X') peRt-! I OJ) dR i
j~1
s
L zi-u
j~1
establishing (33). The optimal adaptive control If we have assumed
U!-1
IS
therefore obtained from (33).
S
L Zi.k yi+J.k(x) + LlYi+l(Zi)
Yi+l(x, Zi) =
1
instead of (32), then
Ut-l
is still obtained by s
min u _ i
l
L Zi.k(Wi + Yi+l,lc)lc 1
since the contribution LlYi+l is independent of U i - 1 . Thus, in view of our discussion in Section B and of Observations
VII.
234
APPROXIMATIONS
2 and 3, we may define a measure of increase in control cost due to the imprecise knowledge of B by S
L
LlYi ~ Yi*(X, Zi-l) -
(35)
Zi-l.k Yi\(X)
k~l
when the system is in the state (Xi-1 = x, Zi-l) at time i ~ 1. When Yi* is expressible as (33), the right-hand side of (35) can be written as S
0Yi ~ min Ui~l
L Zi-l,k<Wi + Yi't-l.k>k
k=l
s
- L
k=l
Zi-l.k min<Wi Ui_l
+ Yi't-l.k>k
(35a)
i.e., 0Yi is the special case of LlYi where Yi* has the form (33) as the result of the assumed form (32). Note that the operations of the minimization and the averaging with respect to Z are interchanged in (35a). We may therefore say in this case the increase in the cost of adaptive control over the cost of stochastic control is given by the interchange of the summation and the minimization operations. The relation of the optimal policies for adaptive and purely stochastic systems is established by Observation 4.
d. Observation 4 If s
y;*(x, Zi-l)
=
min Ui~l
L Zi-Lk<Wi + Y;+1.k>k k=l
(36)
+ -t:»,
if <Wi is quadratic in u, and if no constraints are imposed on u and x (i.e., state vector and control vectors are not constrained in any way), then the optimal control vector for the adaptive system at time i ~ 1 is given by a linearly weighted sum of the optimal control vector of the corresponding S stochastic problems at time i - 1. Under the same set of assumptions, 0Yi of (35a) is, at most, quadratic in the optimal control vector for the stochastic problems.
Proof of Observation 4.
Since the dependence of
<Wi + Yi\l.l'>
k
on
u is quadratic by assumption, by completing the square, if necessary,
and recalling the recursion formula for
-r, and that the notation
ULl.k
1.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
235
is used to denote the optimal control for the purely stochastic problem with 0 = Ok at time i, we can write it as
wherecfJ~,k = cfJi,k and where cfJi,k and rPi,k generally depend on x. Note that Utl,k will generally depend on Ok . Substituting (37) into (36), the optimal control Ut_1 for the adaptive problem is given by performing the minimization
s
L
min
k=l
Ui_l
Zi_U[(U - ULU)'([>i,k(U - uLu)
+
i.e., (38)
proving Observation 4 when the indicated inverse exists. Equation (38) shows that the adaptive optimal control policy is obtainable by solving S purely stochastic control problems. Knowing optimal control vectors for the stochastic problems Ui*-l,k , k = 1'00" S, we can obtain the difference of the adaptive and the stochastic control cost when the assumptions of Observations 4 are met. From (35a) and (37), S
0Yi
=
I
Zi-U[<Wi
k~l
+ Yi*:-U)kk_l
s
- I
Zi~l,k[
<Wi
+ yi+U)k]U:_l.k
k~1
s
L zi-u(uLI -
Ui*-U)'([>i.k(ui-l - uL u)
k~l
s
=
s ,
L (Uf-U)'Zi-U([>i.kU'i-l.k - (L Zi-Lk([>i,kUi-U) k~
~I
(39)
As a special case, if the quadratic part of rt,k is independent of Ok , i.e., if
VII.
236
APPROXIMATIONS
then, from (38), the adaptive optimal control 1 ~ k < S, by
IS
related to ut-Lk,
Namely the optimal adaptive control is a weighted average of the corresponding stochastic optimal control. For this special case, (39) reduces to S
0Yi
=
L
(U'Ll,k)'Zi-l,kep,U;-l.k
k~l
(41 )
Even in the general case, by defining s
q), ~
L
(42)
Zi-l,kepi.k
k-l
and (43)
We can express ULI and SYi in forms similar to (40) and (41), respectively. We have, from (40), (42), and (43), (44)
and, from (39), (42), and (43),
(45)
The difference in control cost SYi generally depends on x. If q)k is independent of the state vector x and if the stochastic optimal control vector utk can be expressed as a sum of functions of Xi only and of Ok only, then SYi will be independent of x. To see this we express, byassumpbon, Ui.k as (46)*
.
*
* Equation (13) shows that the system of Section
B satisfies (46), at least for i
~
N -
1.
I.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
237
Then, substituting (46) into (44) and (45), we obtain u i * = a(x i )
S
+ L zi,kb
(47)
k
lc=l
and s
0Yi+l =
s,
L bk'tPi+lZi.kbk - (L k~l
Zi.kbk) 1>i+l
k~l
S
(L
Zi.kbk)
(48)
k~l
Equation (48) shows that 0Yi is independent of x, when the stochastic problems are such that
D.
EXAMPLES
a. Adaptive Systems with Unknown Plant Noise
If we identify R, with i. in the development of Section C, then we have an adaptive control problem where the probability distribution function for the plant disturbance random variable contains unknown parameter 8. Assume that gi are the only random variables in (19), that they are independent in time, and that their common distribution function is given by F(z I 8), where 8 is chosen from 81 " " , 8s . Thus, AN(XN_1, ZN-l)
=
JWN(x, UN-I) p(x I XN--1 , ZN-l' UN-I) dx
=
J WN(x, UN-I) p(x X
I
XN-1' fN-l , UN-I)
P(fN-l I ZN-l) d(x, f N-1)
(49)
We take W N to be where and where p(f N-1 I ZN-l)
=
S
L P(gN-l 10k) ZN-l,k
(50)
k~l
Thus N
YN*(XN-1, ZN-l) = t?in
L ZN-1,k<WN) ,c
N-l k~l
(51)
VII.
238
APPROXIMATIONS
where, dropping the subscript N - 1 from A, B, and C, (WN)k
= =
+ BUN_1 + Cg N- 1, UN-1)k W N(Ax N- 1 + BUN_I, UN-I) + 2{C(gN-l)k, VN(Ax N-1 + BUN_I)}
(WN(Ax N_1
+ «(Cg
N- 1 ,
VNCgN-1)k
Therefore, u'i;-Lk is obtained by min[WN(Ax N_1
uN~l
+ BUN-I, UN-I) + 2(C(gN_l)k,
V N(Ax N- 1
+ BUN_I))]
where N defined in (37) is seen to be independent of x and k: (52)
From (52),
where
bk,N-l ~ -(PN- 1
+ B;"'_lVNBN_l)-lB;"'_lVN_lCN_l(g)"
(54b)
From (53) the corresponding stochastic control cost is given by Y~,k
= {XN- 1 , (A'VNA - A'VNB(PN_1 -
+ B'VNB)-lB'VNA)X N_
+ B'VNB)-lB'VNAxN_1
1}
2
+
-s.. is obtained by
where
f
s
~
L ZN-Lk
k~l
From (42)-(44), and (45) S
U~_l
=
aN-1x N- 1
+I
1
ZN-l.kbk,N-l
(55)
1. ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
239
Thus YN*(X N-1 , ZN-l)
=
S
L
ZN-Lk YN*(X N- 1)
+ LlYN(ZN_l)
(56)
k~l
where LlYN is given from (35), (52), (54a), and (54b), and is independent of X N- I since if> and b's are independent of x. If
= U~-Lk
U~_l
for any k. By employing the argument similar to those in the proof of Observation 3, we can show that utI is obtained from S
min
L
Ui_l k~l
Zi-Lk<Wi
+ Yi~1.k)k
(57)
and S
Yi"t-1(Xi , Zi)
=
L
Zi.k Yi't-Lk(Xi)
+ LlYi+l
(58)
k~l
where LlYi+l equation
1S
independent of x and is the solution to the recursion S
LlYi(Zi_1)
=
0Yi(Zi-1)
+ L Zi-Lk
(59)
where QYi(zi-l) is defined by (35a). In the system just discussed LlYi turned out to be a function of z only. Hence we could synthesize ui* from Utk, 1 ~ k ~ S exactly. If LlYi is a function of x, however, then we can no longer synthesize Ui * so simply. When the random variable R, contains Ai' B i, and/or C; in addition to or instead of gi' LlYi will, in general, be functions of u, x, and g. Such an example is briefly discussed next. b. Adaptive System with Unknown Transition Matrix Suppose A of (19) is an unknown constant matrix with S possible values A(l),... , A(S). The random variable g's are assumed to be independently and identically distributed with Eg i = 0 and assumed
VII.
240
APPROXIMATIONS
to have finite second moments. Assume C = I in (19) for the sake of simplicity. Now, S
I
=
YN(X N-1, ZN-l)
(60)
ZN-1,k(WN)k
k~l
where smce
<
p(x N
-)k
now stands for the expected value for the system with
I XN-l
=
, ZN-l)
I
ZN-Lk P(gN-l
=
XN -
A(k)X N_ 1 -
A(k)
BN-1U N-1)
One can write where (62a) U~-Lk
ePN.k
=
-(B;';-lVNB N_1
g
aN-LkxN-l
g
x;"'_1(A(k»)'(V
+
PN_l)-lB;';_lVNAlklXN_l
(62b)
+ E[g;';_l(V~;I
1
N
+ +
B;';_lPN_IBN_l)-lAUc)XN_l
B'PN_1Br1gN_l]
(62c)
when the indicated inverses exist. From (42),
=
From (43), From (44), S
U~-l
=
I
ZN-1,kU~-Lk
k~l
From (45), S
GYN =
I
(aN-LkXN-S
k~l
s
,S
- (I
ZN-LkaN-l,kxN-l)
k~l
(I
ZN-LkaN-LkxN-l)
k~l
s
=
[I
X;"_l
ZN-1,k(a;';-1,k
k~l
s
- (I k~l
,S ZN-LkaN-l.k)
(I k~l
ZN-l.1c aN-Lk)] X N-1
(63)
2.
241
OPEN-LOOP FEEDBACK CONTROL POLICIES
Thus, unlike the previous example, OYN is a quadratic function of X N- 1 .
Hence Assumption (36) of Observation 4 is true only for i in general,
=
Nand,
where L1Yi satisfies the recursion equation (59). Thus the control constructed from the stochastic controls by (38) is no longer an optimal adaptive control but becomes its approximation which is equivalent to neglecting L1y terms in the recursion equation for y*. From (59) and (63), it can be seen that such an approximation will be good so long as s
(I
s,
Zi.k a;.k<1>i+l ai,k) -
k~1
(L:
s
Zi,k a,:.k) <1>i+1
k~1
(L:
Zi,kai,k)
k~1
remains small; in other words, if the norm of s
L:
Zi.k(Alk))' Alk) -
(L:
, Zi.k(AIc))
(L:
Zi.k A 1k))
k~1
is small either by having Zi.k close to zero except for one k(i.e., when the learning on the value of A is almost complete) or if A(1),..., Alk) are very close together.
2. Approximation with Open-Loop Feedback Control Policies In the previous section, we have discussed the method which approximately synthesizes optimal closed-loop control policies for adaptive systems from optimal closed-loop policies for the corresponding purely stochastic systems. In this section we will discuss a scheme which approximates optimal closed-loop control policies with what is sometimes called optimal open-loop feedback control policies for the same systems.t" An open-loop control policy specifies the sequence of control decisions to be followed from a given initial point, i.e., all control decisions are given as functions of the initial point and time. An open-loop feedback control policy computes the current and all future control variables Uj , i :(; j :(; N ~ I, at time i from the past and current observed state variables of the system yi but incorporates feedback in that only U ,:
242
VII.
APPROXIMATIONS
is actually used, and the new observation Yi+l on the attained state variable Xi+l is used to recompute the open-loop control Ui+l as the functions of yi+ l at time i 1. The discussion will be for systems whose state vectors are exactly observable. This assumption of exact measurements is not essential for the development of this section. The systems with measurement noise can be treated similarly but with added complexities in the derivation. The method discussed in this section is essentially a stochastic version of Merriam's parametric expansion method 104b for deterministic systems and adapted from the method proposed by Spang.P" For computer study of the effectiveness of the open-loop feedback control policies, also see Spang. 127 One starts from the assumption of the plant equation given by
+
k
= 0, 1,... , N - 1
(64)
where X k is the state vector, Uk is the control vector, and tk is the random disturbance vector. The matrices A, B, and C are assumed unknown, with given a priori joint probability density function Po(A, B, C). The matrix C is assumed nonsingular. * The assumption of unknown C amounts to the assumption that the variance of the noise to the system is unknown. The joint probability density function of to, tl ,..., t N - 1 is assumed to exist and to be known, It is a straightforward extension of the method of this section to include the case where the joint probability density function is parametrized by an unknown parameter. As before, optimal closedloop Uk is to depend only on yk and Uk-I, The criterion function ] is taken to be quadratic: N
J= L Wi
(65)
1
where and where Vi and T i- l are positive symmetric matrices, 1 :S; i :S; N. The contribution to ] from the present and the future at time k is given by (66)
* If C is singular, it can be shown that a certain number of coefficients can be learned exactly in a finite time.
2.
243
OPEN-LOOP FEEDBACK CONTROL POLICIES
Equation (66) can be rewritten as (67)
Here we will make use of the extension of Merriam's idea of parametric expansion for the deterministic systems. This difference equation for lk is satisfied by a quadratic form in X k-1, Uk-I, Uk , ... , UN-I' and t k-1, t k , .•. , tN-I' Therefore, we write
I, =
N-l
ak
+2 L
N-l
i~k-l
i~k-l
N-l
N-l
i~k-l
N-l
N~
gi(k)
gi
i=k-l
t/ Ni;(k) t;
N-l
+2 L
;~k-l
+2 L
u/ Kij(k) u;
;~k-I
+2 L
+ 2C kXk_1 + x~_lLkXk_I
+ L L
N-l
+ L L
bi(k) u,
u/I;(k)
X k- I
;~k-l
g/ Mi(k)
N~
Xk-l
N~
+2 L L i=k-l
i~k-l
t/
Qi;(k) U;
(68)
;~k-l
where a k , b(k)'s, etc., are matrices of appropriate dimensions. Substituting Eq. (68) into Eq. (67), we obtain a set of recursion equations for the coefficients of the expansion. They are derived in Appendix A at the end of this Chapter. They are, then, solved for all k off-line. We know from our discussions in Chapters II-IV that an optimal closed-loop feedback policy is such that it minimizes N
Yk+I. =
(L
Wi I X k)
Jlk+1
P(tk , ... ,
E
k+I
=
X dCA, B,
with respect to ui
U i - 1•
,
tN-II A, B, C, x k) peA, B, C I x k)
C, tk , ... ,
k :( i :( N -
(69)
gN-I)
I, where
Ui IS
a function of
Xi
and
VII.
244
APPROXIMATIONS
Define
f ;;';j p(;" , ;"+l ,..., fL/ = f ;j p(;/c, ;"+l
rt =
'00"
k
~
j
i,
;N-1
I A, B,
I A, B,
;N-1
C,
C,
x") d(;" '00"
x") d(;" '00"
;N-1)
;N-1),
(70a)
(7Gb)
N - 1
~
Using a bar to indicate the conditional expectation operation with respect to A, B, C, Eq. (69) becomes approximately equal to N-1
E(Jh:+1 I x") "'" a"+l
+- 2 L
+-
bi(k
i=k
N-1 N-1
l)u"
i=k
N-1 N-1
+- L L i~"
+-
l)ri~)
N -1
2
L
u/f/k
+-
l)x"
j~"
N-1 N-1
+-
i=k
+-
+- 2 L
j~"
+- 2 L g;'(k
a"+l
u,,'Kij(k -f-=1)u j
j=k
N-1
tr(N;;(k
N-1
=
+- L L
(bi(k
t=k
1)fL/
+- L L
i=k j=k
N-1 ~-=--=---cc
+- 1) +- L
fLrqi(k
J=1-
tr(Ni/k f l)F~J
+- 1)) ui
N-1 +-2(Ch:+1+'
L
fLrMi(k+-1))x"
Z~"
N-1 N-1
+- L L i~"
u;'Kij(k
j~"
+-
N-1
l )«,
+- 2 L
u/fj(k
+- l)x"
j~"
(71)
The approximation consists in replacing closed-loop control decisions with open-loop control decisions. For i > k, note the relations bi(k
+-
Jb;(k +- l)u i p(A, B, C I x") d(A, B, C) * f b;(k I) p(A, B, C I x") d(A, B, C)
l)u i ~
+-
Ui
= bi(k
+-
I)ui
2.
245
OPEN-LOOP FEEDBACK CONTROL POLICIES
Similarly u;'kij(k
+ I)uj
0/= u;'kij(k
+ I)uj,
i] > k
etc.,
Note that only when Uk is involved we can write
etc. When hiCk
+
+ I )ui , u/ Kij(k + 1)Uj , etc., are equated with hiCk + 1)u i ,
etc., in the right-hand side of (71), the control variables k 1 are all taken to be functions of x only, i.e., open-loop control variables are substituted for closed-loop control variables. The optimal open-loop control variables Uk Uk+1 , ... , U N- 1 which approximate the optimal closed-loop policy is then given by Uk , ... , UN-I, which minimizes (71). Hence, by differentiating Eq. (71) with respect to u j ,j = k, k 1,... , N - I, we obtain
uiKij(k I)uj , Uk , uk+1 , ... , U N-
+
N-l
z::
i=k
Kji(k
+ I)u;*
N-l
= - (hj(k
+ 1) + I
fJ{O;j (k
i=k
j
=
k, k
+ 1))
+ 1,... , N
- 1
(72)
which, when solved, gives Uk * among others. When Uk * is applied at time k and the time advances to k 1 from k, we have one more observation X k +1 ' Therefore, rather than using ut+l ,..., U't-l obtained by solving Eq. (72) at time k, we resolve Eq. (72) after iL's and T'« are re-evaluated conditioned on xk+ l rather than on x k . In other words, only the immediate control Uk * is used from (72). Thus, at each time instant k we have a recursive procedure to obtain uk * This approximation generates an open-loop feedback control policy since a new observation Xk+l is incorporated in computing a new optimal open-loop policy based on the knowledge xk+I. It is easy to see that open-loop policies are much easier to compute than closed-loop policies. The question of when optimal open-loop feedback policies are good approximations to optimal closed-loop policies must be carefully considered for each individual problem. See Spangl 27 for computer studies for simple systems.
+
VII.
246
APPROXIMATIONS
3. Sensitivity and Error Analysis of Kalman Filters A.
INTRODUCTION
If the description of a linear dynamic system set of equations Xi+! = Aixi + ti
IS
given exactly by the
(73)
+ 7Ji
(74)
E(ti) = E(7Ji) = 0
(75a)
Yi = H,», E(tit/)
=
Q;?iij
(75b)
E(7Ji7J/)
=
R i8ij
(75c) (75d)
namely, when the matrices of the system Ai and Hi' the means and covariance matrices of the random noises, are given exactly as above, then the outputs of the Wiener-Kalman filter are the best linear estimates of the state vectors of the system. See Chapter V and Section 4 of Chapter II. It is important to have some measures of the variations of the filter outputs when some of the underlying assumptions are not true, since the system parameters such as Ai and Hi or noise statistics such as Qi and R i will not be generally known exactly in real problems. Such inaccuracy may arise as a result of numerically evaluating Ai and Hi (round-off errors and/or error in quadrature). For example the linear system given by (73) and (74) may be merely an approximate expression of a nonlinear dynamic and/or plant equations obtained by linearizing them about some nominal trajectories. Then, Ai and Hi are evaluated, perhaps numerically, by taking certain partial derivatives. See, for example, Section 3,F of Chapter V. Another reason for such analysis is that, for problems with complex expressions for A and/or H, it is of interest to examine the effect of a simplified approximate expression for A and/or H on the accuracy of the estimation. As for noise statistics, we usually have only their rough estimates. Therefore, it is important to evaluate the effects of inaccuracies in Ai , Hi , Qi , and/or in R i on the estimates, i.e., on the error covariance matrices of the outputs of the Wiener-Kalman filters. It is also important to know the effects of nonoptimal filter gains on the error-covariance matrices. We are interested in nonoptimal gains: (I) to study the sensitivity of the estimates and of the error covariance matrices with respect to the filter gain and (2) to study the effects of the
3.
SENSITIVITY AND ERROR ANALYSIS OF KALMAN FILTERS
247
simplified suboptimal method of gain computation on the filter performance since the gain computation is the most time consuming operation in generating the estimates. For additional details see, for example, Joseph. 8 1 . 8 2 We have derived, in Section 3 of Chapter II and elsewhere, the expressions for the error-covariance matrices for Kalman filter. They are given by (76)
where (77a)
where X,:*
and where
Ki+I
g
E(x,: I y':)
(77b)
is the optimal filter gain given by (77c)
where M':+l
g
E[(Xi+l - x':+l)(x':+l - x':+l)' I y':]
=
A,:r,:A/
+ Qi
(78)
The error-covariance matrix of the optimal estimate I',
IS
calculated by
g E[(x,: - x,:*)(x,: - x,:*)' I y':] = (I - K,:H,:)M,:(I - K,:H,:)' + K;R;K/
(79a)
or equivalently by (79b)
The initial estimate X o* and its error-covariance matrix To is assumed given from a priori information on the initial state of the system. B.
GAIN VARIATION
Let us first consider the effects on T of the gain changes from its optimal values K, by 8Ki . Denoting by 8Ti the deviation of the errorcovariance matrix from its optimal form T'; and dropping the subscripts,
sr = ~
[I - (K + SK)HJM[I - (K + SK)H]' - (I - KH)M(I - KH)' - KRK'
+ (K + SK)R(K + SK)'
SK [- HM(I - KH)'
KH)MH'
+ RK'] + [(I -
where the second-order terms are neglected.
+ KR]
SK'
(80)
248
VII. APPROXIMATIONS
Since K is given by (77c), coefficients multiplying oK and oK' vanish in (80) and we have
sr =
0
The alternate expression for optimal error-covariance (79b), obtainable by substituting (77c) into (79b), gives
sr =
-SKHM
Therefore, in numerically evaluating r, the expression (79a) would be less sensitive than (79b) to small variation in K. In Sections 4-6 we consider several suboptimal filters using non-optimal gains in the Wiener-Kalman filters. See also Section
3,E. C.
THE VARIATION OF THE TRANSITION MATRIX
We now investigate the effects of changes in Ai on the accuracy of computing M i +1 from r i . The noises are taken to be Gaussian random variables. Denoting the small variation of Ai by oA i and dropping subscripts, oM = oArA' AroA' from (78). Since oM will be small compared with M, write M oM = M EN, where E is a small positive constant. Since M is symmetric, by appropriate linear transformation on x, M can be made diagonal, i.e., the components of the estimation error x ~ x after the linear transformation can be taken to be uncorrelated, hence independent. The variances of these independent errors are the eigenvalues of M. Therefore, the change in the eigenvalues of M due to a small change in A may be regarded approximately as the changes in the variances of the components of the estimation error x ~ x. (This is only approximately true since oM will not be generally diagonal even if Mis.) We will now investigate the difference of the eigenvalues of M and of M oM. Denote by t\ the ith eigenvalue of M with its normalized eigenvector denoted by ei . We define .\ and ei as the corresponding quantities for M + EN. Writing
+
+
+
+
3.
SENSITIVITY AND ERROR ANALYSIS OF KALMAN FILTERS
the relation (M
+ EN)e =
Ae yields, to the order
249
E,
+ AT OA')ei
=
e/(oATA'
=
2e/ oATA'ei
= 2e/ oAA-l ATA'ei =
2e/(OAA-l)(A iei
~
Qei)
Therefore, or
If a major contribution to M comes from ATA' and not from II Qi 11/\ ~ 1, and one has approximately t
Q, then
EA;l 1/1 Ai I ~ 2 I oA,Ail II
In computing M N , Eq. (78) will be used N times starting from To. If each step of going from I', to Mj+l' 0 ~ j ~ N - 1, satisfies the assumptions stated above, then the total percentage error in M N is approximately given by N-l
2
I
o
II OAiAil II
or 2N II oAA-l II if A is a constant matrix. Therefore, as a rule of thumb, one must have tlOAAII~
1
2N
in such applications where N total number of estimates are generated.
D.
IMPRECISE NorSE COVARIANCE MATRICES
Since the statistics of the random noises are known only very roughly, the effects of large variations of Q and R, rather than their small variations on T, need be investigated. Such investigations must generally be done numerically in designing filters.
VII.
250
APPROXIMATIONS
One may take the min-max point of view in evaluating the effect of different Q's and R's on r, using techniques similar to those in Aoki'" where the effects of unknown gain (distribution) matrix on the performance index have been discussed. See also Section 2,D of Chapter II, Section 2 of this chapter, and Refs. 64 and 129 for treatment of unknown covariance matrices.
E.
EFFECTS OF SIMPLIFICATION
The amount of computations for implementing optimal Kalman filter is quite large for systems with high dimensions. In order to update x i * and r i, i.e., to obtain Xt+l and ri+l from xi* and r i, the following steps are involved: (i) xi+l is computed by x i * and Ai, (ii) Mi+l is computed from by (78), (iii) Ki+l is computed by (77), (iv) Xt+l is computed by (76), and (v) ri+l is computed by (79). A rough calculation shows that the number of multiplications involved is of the order n 3 even without counting the number of multiplications necessary to invert an (m X m) matrix, where n is the dimension of the state vector. In many problems, therefore, one is willing to use slightly inaccurate estimates if a significant reduction of the amount of computation results. One such reduction is achieved by reducing the dimension of the state vectors, for example, by replacing correlated noises by uncorrelated noises, or by partitioning the state vectors. 81 ,104 , 104a . 1l 2 Related approximation methods aimed at the reduction of the amount of computation are discussed in the next two sections. In practice, any such approximation must be carefully evaluated to achieve a reasonable trade-off of accuracy versus the amount of computation.
r,
4. Estimation of State Vectors by a Minimal-Order Observer A.
INTRODUCTION
When the problems of control are separated from those of estimation, * approximation may be made to the subproblems of control, to estimation, or to both. Approximate control schemes may, for example, use some
* This procedure is known to yield an over-all optimal control system for a class of linear systems with quadratic criterion. See Section 2 of Chapter II for detail.
4.
ESTIMATION OF STATE VECTORS
251
statistics which are not sufficient to approximately summarize past and current observation data and use control policies which are functions of these statics. In the next three sections, we will discuss effects on the performances of the optimal Kalman filters of various approximations which reduce the amount of computation required to generate estimates of the state vectors. Consider the case of linear systems with additive noises. We have seen in Section 3 of Chapter II that for the linear observation scheme the best linear estimator of the state vector has the same dimension as the plant. For complex systems with large dimensions, therefore, the problem of constructing the optimal filter or computing the optimal state vector estimates is not trivial. It is, therefore, important in practice to consider approximately optimal estimation procedures where constraints are imposed on the permissible complexities of the estimators or on the amount of computations. One approach is to partition the state vector into subvectors'"; i.e., instead of constructing an optimal estimate for the entire state vector, one may partition the state vector and construct a suboptimal filter for the state vector by combining judiciously estimates of these partitioned components of the state vector. This method requires a smaller amount of computation because of the nonlinear dependence of the amount of computation on the dimension of the state vector. This will be the subject of Sections 5 and 6. Another approach in limiting the complexities of an estimation scheme is to specify the dimension of the estimator. One such proposal has been made by Johansen. 78 Consider a situation where the system is described by Xi+l Yi
= =
+ BU i + gi HXi + TJi
(81)
AXi
(82)
where x is an n-dimensional state vector, y is an m-dimensional observation vector, u is a control vector, and g and YJ are Gaussian noises, and where we use O~i~N-l
as the optimal estimate of Xi at time i. The best estimate fLi+l has been shown to be generated recursively as a linear function of fLi and Yi+l . Note that fL has the same dimension as x. Johansen's proposal in discrete-time version is to generate approximate estimates of Xi , Zi , i.e., an approximation to P« , by Zi+l
=
Fiz i
+ DiYi
or
Zi+l
=
Fiz i
+ DiYi+l
252
VII. APPROXIMATIONS
where the dimension of control generated by
Zi
is generally less than that of u,
=
Xi ,
and to use
CiZ i
in problems with quadratic criterion functions since we know that the optimal control u; is proportional to fLi . In this formulation, matrices C, D, and F are chosen to minimize a given criterion function. These matrices, however, are not determined uniquely and require further conditions and/or numerical experimentation to obtain satisfactory results. Since the observation of the state vector Yi carries a certain amount of information on the state vector Xi , we will now consider a procedure for generating a vector Zi in such a way as to supplement the information carried by Yi so that Zi , together with Yi , can be employed to yield an approximation to fLi . This idea will be made more precise for the case of time-invariant linear sample data systems of (81) and (82). * We are particularly interested in this procedure where the dimension of the vector Zi is smaller than that of Xi' where Zk is the state vector of the dynamic system governed by k
=
0,1, ...
where Zk is the p-dimensional vector at the kth time instant, p :'(: n, F k is the (p X p) matrix, and D k is the (p X m) matrix. Typically, p < n. For example, one may take the estimate of X k , Xk , to be
where K and N are to be chosen. The vector Zk' together with Ylc , acts as the inputs to the estimator of X k • Such a system will be called an observer in this section. We now consider the problem of constructing an estimator Zi of (n - m) dimensions so that Xi is estimated as some linear function of Yi and zi' B.
DETERMINATION OF THE STATE VECTOR OF A DETERMINISTIC SYSTEM
Under some regularity conditions it is possible to generate Zi which, together with Yi , determines Xi exactly for deterministic linear systems. *'The following development is based on Aoki and Huddle.!?
4.
ESTIMATION OF STATE VECTORS
253
Namely, if an n-dimensional linear plant is completely observable, and if the observation on the system state vector produces m independent outputs (m < n), then it is possible to construct a device with (n - m)-dimensional state vector to supply the remaining n - m components of the plant state vector. We will give a sketch of the method developed by Luenberger.l'" A more detailed discussion is given in Section C, where the basic idea is modified to estimate the state vectors of a stochastic system using a more constructive method. Consider a linear system, the plant of which is governed by
where Xi is the n-dimensional state vector and the observation equation is given by Yi =
H»,
where Yi is m-dimensional, m ~ n, and H is an (m X n) matrix. Assume that the system is completely observable.P'' i.e., assume that the m· n column vectors of the matrices k
=
0, I, ... , n -
I}
span the n-dimensional Euclidean space. Then it is possible to design an observer with arbitrarily small time constant, such that Xi can be reconstructed exactly from Yi and Zi where Zi is the state vector of the observer. The design of such an observer is based on the existence of a matrix T which relates the state vectors of the plant and the observer by Tx i
Zi =
,
i = 0,1, ...
The dynamic equation of the observer is given by Zi+l =
FZ i
+ DYi +- CUi
where T and F is related to A and C by the matrix equations TA - FT = DH
(83)
C = TB
(84)
and These equations are derived in Section C. Luenberger shows that if the original plant is observable then F can be chosen so that its norm
254
VII. APPROXIMATIONS
is arbitrarily small and that T can be chosen in such a way that is nonsingular. Therefore,
r = (~)
His proof that T can be chosen to make T nonsingular, however, is not constructive. We will construct an (n - m)-dimensional observer for the stochastic system in such a way that the error-covariance matrices of the estimates of the plant state vector are minimized in a sense to be specified later.
C.
ESTIMATION OF THE STATE VECTOR OF A STOCHASTIC SYSTEM
We will now extend the ideas discussed in Section B to linear stochastic systems and design an estimator of Xi using (Yi , zi)' where Yi is the system observation and Zi is the output of an observer with (n - m) memory elements. If the system is such that nearly all the components of Xi are observed, i.e., if m is close to n, then the number of memory elements employed by the observer is much less than n. If the resultant error covariance matrix indicates that system performance is not much worse than that achieved using the optimal Wiener-Kalman filter, then the estimator considered here may have a practical application. See Section D and Ref. 17 for some numerical comparisons of the filter performances. a. The Stochastic Design Problem and the Estimation Error-Covariance Matrix In this section we consider the stochastic problem without control. The control term is introduced in Section 4,C,c. The system whose state is to be estimated is shown in Fig. 7.2. The state vector satisfies the nth-order time-invariant linear difference equation (85)
where gi is a sequence of independent vector random variables representing disturbance noise. The observation equation of the state vector is assumed given by (82). We will assume here that H is an (m X n) matrix having rank m and in addition is such that the system
4.
255
ESTIMATION OF STATE VECTORS
,-----------------1 : {.
xi +1
,
I '
I
I
I SYSTEM I
I
I I I
(N Ih-ORDER)
I I I I
I I
:
L
--,
r - - - - - - - - - - - - - - - - - - ----, Z· I " D L DELAY ,+ ,MINIMAL
I
ORDER DYNAMIC 'SUBSYSTEM OF I THE OBSERVERI(N - M lth-ORDER
I
I I
rI - - - - - - - - - - - - - - - - - - - - - I SUBSECTION OF I THE OBSERVER
I FOR GENERATING
I I
iTHE STATE VECTOR I ESTIMATE
I
L "
I
IX i + 1
I
I L
Fig. 7.2. estimator.
,
J
Schematic diagram of (n - m)-dimensional observer and the state vector
is observable. * We denote by Rand Q the covariance matrices of the independent noises g, T) which are assumed here, for convenience, to be stationary: E(g;g/) = QOi; E('TJi'TJ/) = us; E(gi'TJ/) = 0 for all i and j
The state vector of the observer is assumed to satisfy the difference equation (86)
where F and D are time-invariant (n - m) X (n - m) matrix and (n - m) X m matrix, respectively, yet to be specified. From the discussion of the previous paragraphs, the observer system is seen to involve two distinct sections. The first is a dynamic subsystem whose output is to represent, under nonstochastic conditions and proper initialization, a linear transformation Tx of the observed system state vector.
* See
Ref. 17 for discussions on unobservable systems.
VII.
256
APPROXIMATIONS
The other section of the observer accomplishes the construction of the observer estimate of the system state vector by applying the inverse linear transformation t-» to the partitioned state vector
Denoting
7'-1 =
(P i V)
(87a)
(-I-)
(87b)
where
t
=
and where P is an (n X (n ~ m» matrix and V is an (n X m) matrix, we may express the observer output as (88)
which is as depicted in Fig. 7.2. The central problem in the design of the observer for deterministic systems is to select the unspecified matrices such that the fundamental matrix equation (83) is satisfied while t remains nonsingular.I'" For the stochastic problem, these conditions must also be specified, but in addition we seek that solution which permits minimization of the elements of the estimation error-covariance matrix. To obtain this design solution we first derive a number of matrix relations involving the estimation error-covariance matrix which is defined as C,
g E[(Xi - Xi)(Xi - Xi)']
(89)
where Xi is the estimate of Xi provided by the observer at time i and where Xi is the true state of the system of (85) at time i. The relations obtained will then be manipulated in such a way that a set of equations are obtained for elements of the covariance matrix C in terms of the given matrices A, H, Q, and R and one design matrix V of (87a). These relations may then be used by the designer to minimize certain elements of the error-covariance matrix C, as desired. It should be emphasized that the constraints placed on observer design lead to a loss of freedom in the minimization process as should be expected. The central question, then, is whether or not these constraints allow a much cheaper estimator to be built which may have performance comparable to that of the (unconstrained) optimal estimator for the particular application considered. Throughout the ensuing discussion we shall
4.
257
ESTIMATION OF STATE VECTORS
use the following relations which, as shown in Appendix B at the end of this chapter, guarantee the existence of the observer:
PT
F= TAP
(90)
D
=
TAV
(91)
+ VH
=
In
(92)
HV =Im TP = I n _ m
(93)
HP=O TV = 0
where lie denotes the (k X k) identity matrix. We begin by considering the error in the dynamic subsystem of the observer. We define it as e,
=
Z,: -
Tx,:
(94)
The dynamic equation for ei may be written from (85) and (86) as em
=
Fe,
+ (DH -
(1' A - FT))x,:
+ ~,:
where But as T is taken to satisfy (83), (94) simplifies to (95)
We note that the augmented observation vector satisfies the equation
[;;J =
ix,: + [~:
J
(96)
where
[;;J is the observation of Xi which augments the originally available observation vector Yi . The noise ei is not white however. Its mean and covariance matrices are given by Ee,
=
0
(97)
TQT'
VII.
258
APPROXIMATIONS
Note that E(eiTJ/)
= 0
The estimate of Xi is constructed as the output of the observer and is given by (88). Expressing Zi and Yi as functions of Xi' we have from (96) Xi
(PT
=
+ VH)X i + Pe, + VTJi
(98)
Using (92) we see that the error-covariance matrix is expressed by C,
= PSiP'
+ VRV'
(99)
Using (93) and (99) we easily obtain TCiH'
=
0
(100)
The covariance Q of the plant disturbance noise does not enter relation (100) explicitly. To obtain an expression containing Q we reconsider the error propagation of the estimate. Defining xi = Xi - Xi' we write xi+l as (101)
Using (86) and (90)-(92), Xi+!
=
= = =
Xi+l
can be rewritten as
+ P(Fzi + DYi) VYi+i + PTA(Pzi + Vy,) VYi+l + PTAXi VYi+l + (I - VH)Ax i VYi+l
(102)
Therefore we have the difference equation for the estimation error as (103)*
From (103), the recursion equation for the error covariance matrix is given by CHi =
VRi+lV'
+ (I
~
VH)(AC,A'
+ Qi)(I -
VH)'
(104)
where V satisfies the constraint HV=Im
(105)
* Matrices VH and PT = I - VH that appear in (103) and elsewhere, are projection operators since (VH)(VH) = VH.
4.
259
ESTIMATION OF STATE VECTORS
Multiplying (104) by H' from right and making use of the relations of (93), we obtain (106)
b. Optimal Design of an Estimator of Minimal Order In this section we modify the matrix relations involving the errorcovariance matrix C, obtained in the previous section, and proceed to an optimal design solution in a rather unencumbered manner, while still satisfying the multiple constraints imposed on the observer structure. The constraint (105) is explicit and can be applied with no difficulty at the outset of the design effort for a given observation matrix H. Since (92) alone is sufficient for the inverse 1'-1 to exist we employ the expresSiOn
PT =1 -
(107)
VH
(with HV = 1m imposed) wherever useful From (106), we obtain
III
the ensuIllg discussion. (108)
From (104), we obtain (1 - VH)Ci+l = (1 - VH)[ACiA'
+ Qi](I -
VH)'
(109)
These two implicit relations involving C i + 1 are sufficient with Eq. (101) to obtain an expression equivalent to (106): (1 - VH)Ci+lH' = 0 The constraint on Ci+l expressed by (108) is easily imposed at the outset of design, to specify the covariance Ci+l given C; and the design matrix V. * Thus, if we address ourselves to the task of minimizing selected elements of Ci+l while (106), (108), and (109) are satisfied, by selection of the matrix V subject to the constraint HV = 1m , we will have optimized the design of the minimal-order estimator for the stochastic application. If this is done sequentially (i = 1,... ) we will derive a sequence of matrices {Vi} which realize the optimal transient response of the filter. On the other hand, by assuming a steady-state condition, (110)
* Although the conditions given by (104) and (105) and those given by (106), (108), and (109) are equivalent, the latter may be more convenient to employ directly.
260
VII.
APPROXIMATIONS
we can use (108)-(110) to yield by the same procedure an estimator design which is optimal in the steady state. c. Control Systems
Now suppose that instead of (85) we have the control system as originally given by (81):
The estimator is now taken, by adding a control term to (86), to be (Ill) Then, as before, in terms of T which satisfies T A difference ei between Zi and TXi satisfies the equation
FT =
DH, the
Therefore, by choosing G
=
TB
(112)
where T is previously chosen, the result of the prevIOUS section still holds true. d. Connection with Optimal Wiener-Kalman Filter The estimator described in this section generates the suboptimal estimate Xi of Xi by (88): (I13)
where Yi+l is the observed state vector of the system given by (82), where Zi+1 is the output of the observer (86), and where F and Dare given by (89) and (90), respectively. Therefore, from (102) we can see that
=
AX i
+ V(Yi+1
- HAx i)
where V must satisfy the constraint given by (105).
(114)
4.
261
ESTIMATION OF STATE VECTORS
In this form, we can see clearly the relation with the Wiener-Kalman filter of the estimation scheme of this section. Instead of using optimal time-varying gains of the Kalman filter, the estimation scheme of this section uses a constant gain V which is chosen optimally in order to minimize the steady-state error-covariance matrix C of (110). Figure 7.2 is the schematic diagram of the estimator showing how the observed state vector and the output of the observer is combined to generate the estimate of the state vector. We next present an example of constructing an observer which IS optimal in the steady state. For this example, the original plant IS chosen to be observable.
D.
EXAMPLE: OBSERVABLE SYSTEM
Consider a two-dimensional system with the plant equation x(i
+ 1) =
Ax(i)
+ Bu(i) + W)
where ") =
x(1
and where equation
Xl , X 2,
(Xl(i))
A=
(") ,
X2 1
u, and
~ 2
( o
B =
(~)
g are scalar quantities and with the observation y(i) = Hx(i)
+ YJ(i)
where H
=
(1,0)
Q(i)
=
(qlo q20)
R(i)
=
r
Xl'
The observation y is taken on but X 2 is not observed. We will construct a first-order observer which estimates X 2 by Zi+l
=
FZ i
+ D y(i) + K
Let and
u(i)
262
VII.
APPROXIMATIONS
The constraint equations (90)-(93) between VI =
l' and 1'-1 yield
1
p] = 0
t]
t 2P 2
=
1
t]P2
=
- V2
=
0
+tv
2 2
(115)
We now compute the steady-state error-covariance matrix C:
Imposing the constraints on the steady-state error-covariance matrices, HCH' = R
yields
Cn
= r
(1 - VH)CH' = 0
yields
C12
=
and (I -
VH)C = (1 -
VH)ACA'
+ Q(1 -
v 2r
VH)'
yields (116)
We choose to minimize the variance of X 2 variable v 2 • Thus we seek V 2 such that
, C2 2 ,
by selection of the free
which yields
Solving this equation with the additional simplifying assumption ( 117)
we find V2
= -0.37
(118)
and Cn
=
02
C22
=
2.4450 2
c12 = -0.370
(119) 2
4.
263
ESTIMATION OF STATE VECTORS
To complete the design of the observer, we compute F = TAP = -0.63
(120)
D = TAV = -0.5t 2
and where
and
t-» _ [ 0 -
l/t 2
1]
-0.37
It is seen that t 2 remains unspecified in the optimization. This due to the fact that the multiplication of the transformation
IS
by t 2 is irrelevant in reconstructing x, so long as it is nonzero, as passing
z through the inversion t-: cancels whatever effect t 2 may have.
The schematic diagram of the observer is given in Fig. 7.3. In order to obtain some insight into the accuracy of the estimation method discussed in this section, the error-covariance matrices of the optimal Wiener-Kalman filter are computed for the system of the example for comparison. Denoting the (i, j)th component of the optimal error-covariance matrix at time k by Tij(k), the following set of equations hold: Tn(i
+ 1) =
[q1
+ Tn(i)
X
[1 - (q1
- 4 T 12(i) + T 22(i)]
+ 4 Tn(i)
- 4 T 12(i) + T 22(i))/L1.J
T 12(i + 1) = [2 T 12(i) - T 22(i)][1 - (q1 Tdi
+ 1) =
q2
+ T 22(i) -
+ 4 Tn(i)
- Tdi)
+ Tdi))/L1i]
(2 T 12(i) - T 22(i))2/L1i
where
In particular, we are interested in T ll(oo), Tu(oo), and T 22(OO ). These satisfy the algebraic equation
VII.
264
APPROXIMATIONS
,--------- - - - - - - - - uti)
-.,
-
I I
I
LOBSERYER
I
y(i) ---'----,---1
I
I I
I I -
-
-
-
-
-
1 I
GENERATOR
OF THE STATE rYECTOR I ESTIMATE
I I X(i) I
I I J
I
L
x(j +1) {-; xu-
(~) urn +(j)
-:) X(;) +
I
SYSTEM
(I,O)X(i) +')(i)
E( ; 0 , E'1; 0
EH';(~'~')'
E'1'1';U'
E('1';O
Fig. 7.3.
Numerical example of the observer construction.
where
In terms of s, T 1 2 ( CIJ)
=
rls
T 22 ( (0)
=
2rjs -
Q2S
Considering s for the same case of (117), we must solve
S4+
S3_6s
2+2s+4=0
The required solution is s =
-3.055
5.
265
SUBOPTIMAL LINEAR ESTIMATION: THEORY
which yields the optimal error covariances for the (unconstrained) Kalman filter as Tn(oo) ~~ O.89a2 T 22 ( (0)
=
2.4a 2
T 1 2 ( (0)
=
-O.33a2
Comparing these results with those obtained for the minimal-order filter, we see that performance appears very favorable while realizing simultaneously a reduction in memory elements.
5. Suboptimal Linear Estimation by State Vector Partition Method: Theory We discuss in this section another approximate estimation scheme of state vectors of dynamical systems with linear plant and observation equations. The scheme is based on the observation made in Section 3,E that the amount of computation involved in generating an optimal estimate of the state vector of a linear dynamical system is a nonlinear function of the dimension of the state vector. For example, it requires a smaller amount of computations to generate k estimates of nlk-dimensional state vectors than to generate one estimate of the n-dimensional state vector. After an introductory discussion of such a suboptimal estimation method in this section an approximate estimation problem will be discussed in Section 6 when a natural partitioning of the state vector is possible based on the difference of time responses of various modes 14 2 of the system. The discussion of this section is done for the system given by (73)-(75d) of Sectjon 3.
A.
CONSTRUCTION OF SUBOPTIMAL FILTER
Suppose
Xi
is partitioned into k subvectors
z/,
Zi
2,
... ,
zl,
where
z/ is the value of the jth subvector at time i and where zj has dimension
nj Dj
, Lj n j ?: n. The jth subvector is related to the state vector by a (nj X n) matrix
:
z/
=
DjXi,
.r
~
j
~
k
(121)
Although the matrices D, could be taken to be time varying, they are assumed to be time invariant in order to avoid the resultant complexities
266
VII.
APPROXIMATIONS
of the filter construction. Therefore, the manner in which the state vector x is partitioned into k subvectors is fixed throughout the estimation process. From (121), (122)
where the notations * and have the same meanmgs gIven by (77a) and (77b). The estimates Xi * and Xi are assumed to be reconstructed from the estimates for the partitioned subvectors by A
k
x·* = "LJ F-z;* ~ J z. ;~l
(123)
where F's and D's must satisfy the relation (124)
in order that the state vector is reconstructed from the partitioned subvectors. Proceeding analogously with the optimal estimation procedure, we consider the sequential estimation equation for the subvectors given by (125)
where Kij is the filter gain at time i and where the matrix G j chooses the subvector of the y that is used in updating the estimate for z/ The matrices G j are also taken to be time invariant. From (123) and (125), x:t
*=
k
IF ·z· J
H t
i
=
k
k
L Fjz/ + L FjK/G;[Yi i
= Xi
- Hix i]
k
+ L FjK/G;[Yi -
HiXi]
(126)
5.
SUBOPTIMAL LINEAR ESTIMATION: THEORY
267
The comparison of (76) and (126) shows that the suboptimal estimation scheme implies that k
L FjK/Gj
s; =
(127)
j~l
is the gain of the filter for the state vector Xi . We choose Kii next in a manner analogous to (77c). From (121) and (122),
= = Since
(Xi -
Dj(Aix i
+ gi -
DjAi(X i - x;*)
Aixi*)
+ Djgj
x i *) is not available, it is approximated by Xi - x/ ~ D/(z/ - z~*)
where the superscript plus indicates a pseudoinverse to yield (128)
Then
(129)
where
Proceeding analogously with the optimal estimation equations we construct a suboptimal filter by the following set of equations: TOi
g
E(zoj - zoj*)(zoj
=
DjroD/
where To is assumed known, P;+l
where
g
A/T/A{
+ Q/
~
z~
*)' (130)
268
VII.
APPROXIMATIONS
and Q/ ~ DjQi D/
H/
~
(131)
c.n.o»
R/ ~ GjR;G/
+ R/]-l
K/ ~ P/H/[H/P/Hj' T/
=
[I - K/H/]P/[I - K/H/]'
+ K/R/K/ (131a)
k
Xi*
Xi
=
+ L FjK,tGj[Yi -
Hix;]
j~l
where Xo is assumed known. Figure 7.4 is a schematic diagram of the suboptimal filter. Computations in (131) involve manipulations of matrices of order less than (n X n) or (m X n). Of course, the above derivation does not give any unique partitions of the state vector. If the partition subvectors are not crosscoupled to any other subvectors, either through the plant or observation equations or through the random disturbances, then the present scheme would be optimal. Physical intuition based on the components of x plus a certain amount of numerical experimentation would be necessary to arrive at a reasonable partition. r--------- -
-
-
-
-
r----l
I
I
I
I
I G I
I
I
Y,
-
r----l
I
NEW
-
I
I
I
-
-- -
-
-
-
r-------,
-
-
-
-
-
-
-
I
I
I I
I
I
I
I
F
I
' I
I I
I
I
I
I
I
Ix·
I ' I
OBSERVATIONI
I I I I I
I
--l
OBSERVATION VECTOR PARTITIONING
I
I
~';E=-
~E-; E~';'~;-
- _J VARYING GAIN
I I I I
0-; - J THE CORRECTION TERMS
I
I
I
I
H.X.
I I
'Hi
X,
I
I I
L
-l ESTIMATOR
Fig. 7.4.
Suboptimal filter by partitioning of the state vector.
6. B.
269
SUBOPTIMAL ESTIMATION: AN EXAMPLE
ERROR COVARIANCE OF THE SUBOPTIMAL FILTER
Noting that the suboptimal filter based on the partitioned vector of Section B is equivalent to using the filter gain (127) in the WienerKalman filter for Xi' the error-covariance matrices of this. suboptimal filter can be computed as follows:
[1 - (L F jK;+1 G j) H k
1"i+l
=
[1 - (L F jK;+1 G j) H
k '
i+1] PHI
J~1
i+1]
J~1
k
k '
+ ( L F jK;+1 G j) R i+1 ( L F jK;+1 G j) J~1
J~1
where To is given and where K/ is computed by (131). Comparing T thus generated with the optimal error covanance which results from using the optimal gain K,
=
PiH/[HiPiH/
+ R i ]- 1
the degradation of the filter accuracy can be computed for this suboptimal filter. See Pentecost-P for an application to a navigation problem. We will next consider in detail a particular partition that results from the certain assumed form of the plant transition matrix.
6. Suboptimal Estimation by State Vector Partition: An Example A.
INTRODUCTION
As an example of the subject of the last section, where the estimation of state vectors via partitioned substate vectors is treated, let us now consider a problem where the state vector can be grouped into two subvectors naturally in a sense to be mentioned below. The system is given by (73) and (74). As before, Xi and ~i are n vectors and Yi and YJi are m vectors where E(gi) = E(1)i) = 0 E(fig/)
=
AiDij
E(1)i1)/)
=
L s;
VII.
270
APPROXIMATIONS
We use A and 1: instead of Q and R since we want to use Q and R to denote submatrices of A and 1:. We have already discussed the desirability of reducing the amount of computations associated with optimal filterings by using some suboptimal filtering scheme, so long as the required accuracy of desired estimates is compatible with that of the suboptimal filters. Suppose that the system is stable and that eigenvalues of Ai can be grouped into two classes such that the real parts of eigenvalues in one group are much different from those of the other group. Using Jordan canonical representation.P'' assume that the state vector Xi can be written as
where Zi is a q vector and becomes
~ -t:. l _ ) ( _ Wi+1
=
Wi
is an (n - q) vector. The plant equation
(_<1>-.!~~_)(_~i_)
0 : Pi
ui,
+ (_~i_)
(132)
vi
where it is assumed that I tJ1i II ~ II epi II, i = 0, I, ... and where fLi q vector and Vi is an (n - q) vector. Define covariance submatrices Qi' Si' and R i by
IS
a
Assume that II <1>ill
= 0(1)
and where E is a small positive quantity and where the notations O( I) and O(E) mean that I cI>i II is of order 1 and II Pi II is of order E, respectively. Partition Hi as (134)
where
M,
=
(m
N;
=
m
X X
q)
matrix
(n - q) matrix
Then, writing the Kalman filter equation for x i * in terms of its components Zi * and Wi * , the optimal estimates of Zi and Wi , they satisfy the equations Z~l
wi+1
+ Ki+l(Yi+1 = Piw;* + L i+1(Yi+1 = <1>i Z;*
Mi+l<1>i Z;* - N i +1 P iWi*) M i+1<1>i Z;* - N i+1 P iW,; *)
(135)
6.
SUBOPTIMAL ESTIMATION: AN EXAMPLE
where
K'+l
=
(q
matrix
m)
X
L'+l = (n - q)
271
X
m
matrix
are gains of the filters for the subvectors z and w. Optimal gains of these two filters can be expressed as (136)
where S,*n
g (M'+l , NiH) i'*(i) ( ~f+l
r ~ * ( 1' ) =[', (c[J, 0 A J *(i)
g
) HI
+ .E'+J
0) r*(') (c[J/0 lfI.'0) + (S' Q,
lfI ,
1
M'H i't1(i)
...1 2*(i) g M,+l i'l~(i)
r
z
+ N i+1 i'i;(i) + N'+l i'~(i)
( 137)
(138)
(139)
The time index of is now carried as the argument of T and the subscripts on r refer to the components of r. The asterisk indicates a quantity associated with the optimal filters. Since
the components of the optimal error-covariance matrices are given by rtJ.(i
r1~(i r2~(i
+ 1) = + 1) = + 1) =
i'ti(i) - Al*(i)' 57+11 A/(i)
i'1~(i)
- Al*(i)' 57+11 ...1 2*(i)
i'~(i)
- A/(i)' 57+11 A/(i)
(140)
When arbitrary nonoptimal gainsKi+1 andL i +1 are used these components of the error-covariance matrices are related by
+ 1) = r + 1) = r 22(i + 1) = r
+ K'+l5i+1K'~1 Ki+l A 2(i) + Ki+15i+1L;+1 A 2(i)' L'+J + L,+l5,+lLi+1
ll(i
i'J1(i) - Ki+l AJ(i) - A 1(i)' K'+J
12(i
f'di) - A 1(i)' L~+l
-
i'di) - Li+1 A 2(i) -
(141)
VII.
272
APPROXIMATIONS
where Ei+l is defined by EH 1
and where
=
+ E i+!
Hi+! t(i) HI+!
+ eJ>i Tu(i) eJ>/ Si + eJ>j T (i) P/ s, + Pi T (i ) P/
(142)
tll(i) ~ Qi
t 1 2 (i)
~
Tdi) ~
12
(143)
22
By our assumption on cIJi and Pi ,
II t 22 (i)11= 0(11 n, II) and
if
II s, II = 0(1) II s, II = 0(.:)
(~;
:;).
II t 1 2 (i)11= 0(11 s, II) II t 1 2 (i)11= 0(.:)
0nNil (~'
Hil =
~;)
if
+
(144) ( 145a) (l45b)
~(, =Hi.,f
COMPUTATION OF OPTIMAL TIME -VARYING GAIN
,
,(flil "
S. =H. J
1+1
Slj
S.) I
Ri
H.
1+1
+
L.
I+J
COMPUTATION OF SUBOPTIMAL TIME -VARYING GAIN
Fig. 7.5.
Comparison of filter gain generations for optimal and suboptimal filters.
6. B.
SUBOPTIMAL ESTIMATION: AN EXAMPLE
273
SUBOPTIMAL FILTER
As an approximation, consider the filter gains which result from retaining submatrices of order 1 and neglecting those of order e in f'(i). Namely we use gains (146)
where q
""'i+l
=f':,
H
i+l
Si) H' R., i+l
(f'l1(i) S.',
'"' +. ""i+l
(147)
to generate estimates for ,zi+1 and Wi+1' The matrix E'i";-l is defined similarly with f'li(i) replacing f'11(i) in (142). Since we have made no assumptions on the magnitude of Si' Si is retained in the definition of E'i+l at this point. Figure 7.5 shows diagrammatically the difference in the optimal and the suboptimal filter gains given by (136) and (146). C. ERROR-COVARIANCE MATRIX OF THE SUBOPTIMAL FILTER
We will now obtain an estimate of the magnitude of the differences .1r(i)
~
r(i) - r*(i)
.1f'(i) ~ f'(i) - f'*(i)
(148)
which arise from the use of the suboptimal gain instead of the optimal galOs. Write E i + 1 = 8 i +1 -t- Si+l Then, from (142) and (147), 8m
=
(149)
Hi+l (Pi r2(i) l/J;'
Thus 8i +1 is of o(e) while E'i+1 is of 0(1). From (141) and (146) the errorcovariance matrix for the suboptimal filter is given by r ll(i
+ 1) =
f'11(i) -
f'11(i) M:+lE'i';IMi+l f'd i )
f'11(i) M:+lE'i';I Ni+l(Si
- (Si
+
l/Ji r I2(i )
+ l/J TI2(i ) P;') i
pn N :+lE'i';l M i+l f'11(i) (l50a)
VII.
274
APPROXIMATIONS
and T 22(i
+ 1) =
f'22(i) - RiN;+lSi~1(Mi+1 - (f'{2(i) M;+l
f'12(i)
+ Ni+l
f'22(i))
+ f'22(i) N;+l)Si~lNi+1Ri (I50c)
Define L1 i +1 by From (137) and (147), (151)
Since L1 i +1 is of O(€), the inverse of 8[+1 is approximately given by (152)
(153)
Intuitively speaking, the suboptimal filter with gain (146) would be most accurate when the correlation between these subvectors is zero, i.e., when S, = 0, i = 0, 1 ,.... Therefore, we will investigate two cases: (i) when Si is of small quantity of order € and (ii) when Si is not small. In Case (i) it is assumed that II s, II R:; IIlJ'i II = O(€). a. Case with Small S, From (150a), to the order of L1Tl1(i
+ 1) ==
€,
L1f'11(i) - L1f'11(i) M;+lSi;lMi+l f'~(i)
(154)
6.
275
SUBOPTIMAL ESTIMATION: AN EXAMPLE
The details of the following calculations are found in Appendix C at the end of this chapter. Dropping the subscript (i 1) from M, N, E, E*, .1, and (3 and removing the subscript i from Rand S,
+
L1T12 (i
+ 1) ~
-L1T11 M'S*-WR
L1T22(i
+ 1) ==
RN'S*-l
(L1~11
~)
+ Tt{M'S*-l (L1~11
~)
S*-lNR (155)
S*NR
(156)
Therefore, (157)
where
(I 58)
where f3i g
(II Mi+lSi*+-/Ni+l R i I
+ I Ttl(i) Mi+lSi",\ I II 37+ N i+lRi II) II Wi 1 1
2
and (159)
where Hence, i-I
II .1T ll(i) II ~ ( IT aj) II L1T11(0)11 o
(160)
b. Case with Si = 0(1) Now .1r1 2 is 0(1). From (150),
L1T12 (i
+ 1) ~
+ M'S*-lNR) + SN'S*-lMS + (TliM' + SN')S*-l .1r., M'S*-\MS + NR) -L1T11 (M'S*-lMS
(16Ib)
VII.
276
APPROXIMATIONS
and LlT22(i + 1) ~ SM'S*-lMS
+ (S'M' + RN')S*-lM LITH M'S*-l(MS + NR) (161c)
Therefore,
(162)
where
(163)
where bi ~
II SiNi+lS"t-/Mi+lSi II
fJi ~
III/>i 11 2 II Mi+lSi'+-/(Mi+l Si
+ Ni+lRi)11 (1 + II(r~.(i)
Mi+l
+ SiNi+l)S"t-/Mi+l II) and (164)
where c, ~
II SiMi+lSi';/Mi+lSi II
r,
III/>i 11 211 Mi+l S"tl\Mi+1 Si
~
+ N i+lRi)11
2
Comparing (157)-(159) with (162)-(164), it is clear that that II Si I is the major source of error of this type of approximations as expected. Appendix A. Derivation of the Recursion Formula for Open-Loop Feedback Control Policies (Section 2) Substituting Eq. (68) into Eq. (67), one obtains N-l
an
+2 L
b/c(n) U/c
+ L L U/c' K/cj(n) Uj
/c~n-l
/c
+ 2Cnxn_1 + x~-lLnxn-l ,
+ 2 L u/ fj(n) j
X n- 1
j
+ 2 L g/c(n) f/c + L L f/c' N/cj(n) fj /c
+ 2 L f/c' M/c(n) /c
/c
X n_1
j
+ 2 L L f/c' O/cj(n) u, /c
j
277
APPENDIX A. RECURSION FORMULA
+ an+l + 2 L bk(n +
+ L L Uk'
I) Uk
k
k
+ 2Cn+lXn + Xn'L'H1Xn + 2 Lgk(n + k
+LL
tk' Nkj(n
+2L
tk Mk(n
k
+
I) t j
j
+
Kkj(n
j
I)
I) ».
e,
+ 2 L u/ fj(n +
I) Xn
j
+
I) Xn
+ 2 L L tk'
+
Okj(n
i
k:
+
+
where Xn = AXn_1 BUn_1 cgn- 1 . Since Eq. (AI) must hold for all values of the variables u, one obtains bn_1(n)
=
Cn+1B
bk(n)
=
b,C
C;
=
Cn+lA
+ I),
+ Vn)A
L; = A'(Ln+l fn-1(n)
=
B'Ln+lA
fk(n)
=
fk(n
kn-l,n-1(n) Kk,n-1(n) with
+
I) A
= B'Ln+lB =
fk(n
+
Kk,j(n)
=
Kk,j(n
Kk,j(n)
=
K;,k(n)
gn-1(n)
=
Cn+lC
gk(n)
=
gk(n
+ 1'.,-1
I) B,
+ I),
+ I),
k7'cn-1 k,j 7'c n - I
k7'cn-1
Mn_1(n) = C'Ln+lA
+
Mk(n)
=
Mk(n
On-l,n-1(n)
=
C'Ln+lB
0k,n_1(n)
=
Ok,j(n)
=
On,k(n) = Nk,j(n)
=
I) A,
+ I), Ok,j(n + I), C' Mk(n + I)', Nk,j(n + I), Mk(n
k7'cn-1 k7'cn-1 k, j
7'c n - I
k7'cn-1 k, j
7'c n - I
= Nn,k(n)',
k7'cn-1
Nk,n_1(n)
=
Mk(n
k7'cn-1
Nn-l,n-1(n)
=
C'Ln+lC
Nk.n(n)
+ 1) C,
(AI)
I) Uj
g,
and x,
278
VII.
Appendix B. (Section 4)
APPROXIMATIONS
Derivation of the Constraint Matrix Equations
As is evident from the discussion of Section 4,B, the existence of a minimal (n - m)th-order observer depends on the fact that the matrix T satisfies the matrix equation TA -FT
DH
=
(BI)
while the matrix
l' =
[-~-]
(B2)
remains nonsingular. In the matrix equation (Bl), F and D are to be specified by the designer. It is well known that if A and F have no common eigenvalues a unique solution for T of (B l) exists for arbitrary DH. See, for example, R. Bellman "Introduction to Matrix Analysis." Since for the stochastic problem we are concerned with obtaining a solution which permits the minimization of certain elements of the error-covariance matrix C; defined by (89), we present here a constructive procedure which specifies the design matrices F and D in such a manner that (BI) and (B2) are true. This approach is useful as it obtains the method of Section 4, C for minimizing the system error-covariance matrix in a straightforward way, even though a number of constraints need be simultaneously satisfied. We present the essentials of this approach in the form of a theorem.
Theorem.
If
F= TAP D
=
TAV
where
f' !l
[-~-]
,
L g [P: V],
T is an (n - m) X n matrix, H is an m X n matrix P is an nx(n - m) matrix, V is an n X m matrix
are such that either A.
L1' = PT
B.
1'L
+ VH =
or =
[-l~
~t-]
In, In is the n
X
n identity matrix,
APPENDIX C. COMPUTATION OF
-1r(i)
279
then 1.
TA -FT
II. L
=
=
DH,
1'-1.
Proof. We see that Condition A is the definition that L is a left inverse of T while Condition B is the definition of a right inverse of T. Since for square matrices the existence of a right or left inverse is equivalent to nonsingularity, Conditions A and B are equivalent and either implies L
=
1-1
Using the definitions of F and D we have TA -FT
=
TA(I - PT)
DH
=
TA(VH)
But as (I - PT) = VH
from Condition A, then TA -FT = DH
In the main body of Section 4, it can be seen that any relation involving the error-covariance matrix C can be converted using Conditions A and B to one involving the known matrices A, H, R, and Q, and the unknown matrix V. Hence it is sufficient to consider Valone in designing the observer. It should be noted however that V (as well as C) is constrained. From Condition B we have (B3)
where H is given.
Appendix C.
Computation of -1F(i) (Section 6)
From (140) and (150), .M'u(i
+ 1) =
Llfn(i)
+ (t\i(i) M:+l8i;l M'+l i\i(i)
- fu(i) M:+l
X 8i;lMi+1 fu(i))
+ [f1i(i) M:t-l8i;lNi+1 f1~'(i) X (8, + 1'J, Tdi) P/)']
- fu(i) M/ H8i';lNi+1
VII.
280
APPROXIMATIONS
+ rtl~(i)N(+IS;:;lMi+I
i'ri(i) - (S,
+ f/>i F 12(i) IfF()N:+I
X S;;lMi +1 .tn(i)]
+ .tl~(i) M:+1S;;lNi+l Tt2(iY + .tn(i)M(+IS;/lOi+1S;;lMi+I .tn(i) - (Mi+1.t1i(i) + N i+1 i\~(i)Y X S;;lLl i+lSi+1(Mi+1.t1i(i) + N i+1.ttz'(i)) ~ Ll.tn(i) - Ll.tn(i) M:+i S;t\Mi+1 .t1i(i) - .ttl(i) M:+1S;;lMi+l Ll.tn(i) - .ti~(i)
M:+1S;;lNi+l(f/>i LlF12(i) IfF(Y
- Ll.tn(i) M:+1S;;lNi+l(Si
+ f/>i Tl~(i)
IfF(Y
- (f/>i LlT12 (i) IfF()N:+1S;;lMi+1 .t1i(i) - (Si
+ f/>i T 12(i) lfFi)N:+1S;;lMi+l Ll.tn(i)
+ .tiz(i) N:+1S;;lNi+l .ti2'(i) + I\i(i) M(+1S;:;lOi+IS;;lMi+l .t{;.(i) - Al*(i)' S;;ILli+lSi+; Al *(i)
We drop subscripts from now on: LlTdi
+ 1) =
Ll.t12(i)
+ [.tliM'S-IM.tl~
- .tnM'S-IM.t12]
+ .tl~N'S-lM.tl~ + [.tliM'S-W.t2~ - .tnl'l!J'S-WR] + (.tl~N'S-W.t2~ - .t12N'S-lNR) + .tnM'S-W(R - .t22) + .t1iM'S-10S-1NR - (.t1iM' + .tl~N')S-l LlS-l(M.tl~ + N.t2~) =
Ll.t12(i) - Ll.tn(i) M'S-IMtl~
- tt{M'S*-lM Llt12
+ ttiM'S-WlfFiTizP/ + Tl~N'S-WlfFiT2~1fF/ -
Lltn M'S-WR Llt1 2 N'S-WR
+ .tl~N'S-lMtl~
- .t1iM'S-WlfFiTizP/ - Lltn M'S-WlfFiT22P/
+ i'riM'S-lOS-lNR LlS-1 (Mtl~ + Ntiz)
- t1iM'S-WlfFi LlT22 IfF;' - (t1iM'
+ tl~N')S-l
APPENDIX C. COMPUTATION OF
11T22(i
.dr(i)
+ 1) = 11t22(i ) + (l\~N'S-1Ml\~ - RN'S-1Mt12) + (t~N'S-Wt~ - RN'S-Wt22) + (t1~'M'S-Wt2~ - t{2M'S-WR) + tt2M'S-1Mt~ + RN'S-WR - t 22N'S-WR + RN'S-18S-WR - (t1~'M' + t2~N')S-1 11-1S-1 X (Mt1~ + Nt~) = 11t22(i) - RN'S-1M 11t12 + PiR/.p/N'S-1Mt1~ - RNS-WPi 11T22 Pi - 11/'{2 M'S-WR
+
t~M'S-1Mt1~
+-
RN'S-18S-WR - (t1~'M' 1
X 11S- (Mt1~
- 11t22 N'S-WR
+ Nt2~)
+ t2~N')S-1
- Pir2~P/N'S-WR
281
Chapter VIII
Stochastic Stability
I. Introduction We consider the question of stability of discrete-time stochastic systems via Lyapunov functions in this chapter. * This topic is not only important in its own right but also provides an example of the classes of engineering problems where the theory of martingales introduced in Chapter VI can be fruitfully applied. It is well known that natures of stability of deterministic dynamical systems can be answered if we can construct Lyapunov functions with certain specified properties. See, for example, La Salle and Lefschetz.s? Hahn.?" or Krasovskii.t" Also see Refs. 141a, 143 for other methods. Generally speaking, given a dynamical system with the state vector x, the stability of the equilibrium point of a dynamical system (which is taken to be the origin without loss of generality) can be shown by constructing a positive definite continuous function of x, Vex), called a Lyapunov function, such that its time derivative dV(x)/dt along a system trajectory is nonpositive definite. A monotonically decreasing behavior of Vex) along the trajectory implies a similar behavior for the norm of the state vector x, II x II, i.e., II x I ---+ as t ---+ 00, which is in correspondence with our intuitive notion that the origin is asymptotically stable. For a discrete-time system with the trajectory {x n , n = 0, I,...},
°
* For discussions of stability of continuous time stochastic systems see, for example, Samue]s,'2'-123 Kozin.f" Bogdanoff," Caughey," and Caughey and Dienes. 3' See also Ref.6Ia.
282
1.
283
INTRODUCTION
the stability of the origin is implied by the behavior of V(x) such that, for any set of i discrete sampling points in time, 0 ,,:;; n 1 < n 2 < ... < ni , (1)
This behavior of V(x) may be intuitively understood by interpreting V(x) as a "generalized" energy of the system which must not increase with time for stable dynamical systems. Now consider a discrete-time stochastic dynamical system described by k
=
0, I, ...
(2)
where Xk is the n-dimensional state vector and glc is a random variable (generally a vector). A control system described by k
=
0,1, ...
(3)
can be regarded as a dynamical system described by (2) for a given control policy, say Uk = rPlc(Xk)' where rPlc is a known function of xk: Xk+l
=
Fk(Xk ,
g Dk(Xk, t k) Only one trajectory results from a given initial state X o , for a deterministic system. A collection of trajectories is possible from a given X o for a stochastic system, depending on realizations of the stochastic process {gk}' Using the same intuitive arguments as before, consider a positive definite continuous function of x, V(x), which may be regarded again as representing the system's generalized energy. Because {xn } is now a stochastic process, one must now consider the behavior of V(x) for the class of all possible realizations of trajectories. One may, for example, replace Condition (1) for the behavior of Lyapunov functions of deterministic systems by (4)
for any given set of i time instants, 0 ,,:;; n 1 < n 2 < ... < n i. Namely, (4) requires that the average behavior of V(x) over possible realizations of trajectories behaves like the Lyapunov function of a deterministic system. Intuitively speaking, stable stochastic systems should be such that E(V(x n » remains bounded. In terms of {x n } , stable stochastic systems should be such that, for the majority of sample sequences {xn } , r.e.,
284
VIII.
STOCHASTIC STABILITY
for the set of sample sequences with probability arbitrarily close to 1, II «; II remains bounded. For an asymptotically stable stochastic system not only should E(V(xn )) remain bounded for all n but actually E(V(x1, ) ) should decrease monotonically to zero with probability 1. Equation (4) is implied by the relation that, given any particular realization of x o , X 2 , ••• , X n , V(x) satisfies the inequality (5)
since E(V(xn )) = E(E(V(xn ) I X O, Xl"'" x n - l ) ) , where the outer E refers to the expectation operation over possible X o Xl"'" X n - l • Inequality (5) can be interpreted to mean that the behavior of {xn } is such that, given past behavior of V(x) for a realized (sample) trajectory X o , Xl"'" X n - l , the expected value of V(x) at the next time instant, E(V(x n ) I X o ,... , x n - l ) , is not greater than the last value of V(x), V(x n _ l ) . Then the system can be regarded as stable in some stochastic sense, since the conditional expected generalized energy is not increasing with time. Inequality (5) is precisely the definition that V(x n ) , n = 0, 1,... , is an expectation-decreasing martingale or a supermartingale (also known as a lower semimartingale). The idea of discussing stability of stochastic systems by suitably extending the deterministic Lyapunov theory seems to have appeared first in the papers by Bertram and Sarachik'" and Kats and Krasovskii.P? The realization that such stochastic Lyapunov functions are supermartingales seems to be due to Bucy'" and Kushner.w-"? In the next section, we will make precise the statements made in this section and discuss the problems of stability of stochastic systems. The exposition is based in part on Bucy and Kushner.
2. Stochastic Lyapunov Functions as Semimartingales We cbnsider a stochastic system described by (2) such that for all
k
(6)
i.e., the origin is taken to be the equilibrium point of the system. The solution of (2) is denoted by Xk , k = 0, 1,... , or by k
=
0,1, ...
when it is desired to indicate the dependence of the solution on the initial state vector X O' Note that X o is generally a random variable.
2.
STOCHASTIC LYAPUNOV FUNCTIONS AS SEMIMARTINGALES
285
We now give definitions of stability and asymptotic stability m a way which parallels the definitions for deterministic systems.
<
°
° °and
Definition of stability. only if, for any > II Xo II p(o, E), then
The origin is stable with probability I if and e > 0, there is a p(o, €) > such that, if Pr[sup II n
Xn
II
;:?o
-l
~ 1)
Definition of asymptotic stability. The ongm is asymptotically stable with probability I if and only if it is stable with probability I and
II x(n, xo)11
--+
0 with probability 1 as n --+
00
(7)
for X o in some neighborhood of the origin. If (7) is true for all X o in the state space, then we say that the origin is asymptotically stable in the large. Let us now indicate how we prove the stability of a stochastic system if a positive definite continuous scalar function V(·) is defined on the space of state vectors such that {V( x(n, xo))} is an expectation-decreasing martingale in some region about the origin. Such a proof can be made to depend essentially on the semimartingale inequalityv-: for any A > 0, (8)
See Appendix A at the end of this chapter for proof. See also Appendix I at the end of this book. To simplify the arguments, we show the proof for positive definite scalar function V(x n ) which forms an expectation-decreasing martingale in the whole state vector space. It is a minor technical complication to extend the arguments to include {V(xn )} which is an expectationdecreasing martingale only in some region about the origin. We assume that
Consider a positrve definite continuous function of n-dimensional vector x, V(x), such that V(O) = 0, V(x) finite for any finite II x II, and V(x) ---+ 00 as II x 11---+ 00, and suppose that the sequence {V(x(n, x o))} is an expectation-decreasing martingale for all X o . Then we will show that the origin is stable. From (8), for any € > 0, (9)
VIII.
286
STOCHASTIC STABILITY
Given such a scalar-valued function Vex), it is always possible to find continuous positive nondecreasing functions ex and f3 of real variables such that
(X(O)
(X(II x II) ---+ and
(X(II x II)
f3
Vex)
~
=
0
as
00
II x II ---+
00
(3(1! x II)
~
can be constructed by
II)
(3(11 x
and
(3(0)
(3(11 x II) ---+
00,
For example, such ex and
0,
=
Il
!X(II x II)
V(y)
max
-
Ilxll
IJVII~
min V(y)
~
y
where
II x II where
II y II
~
c(11 x II) Il -
b(c(11 x II))
~
V(y)
min
IIYII~IIXII
and where the function b is chosen so that, for any a Vex) > a
> 0,
II x II > b(a)
for
Since Vex) - 00 as II x II - 00, it is always possible to choose such b(a). Choose p(8, E) such that if
II Xo II
~ p(o,
E)
then (10)
From (9) and (10), Pr[sup V(x n ) n
Since
Pr[sup (X(II x II) n
~
E]
~
~
E]
~
0
Pr[sup V(x n ) n
(11)
~
E]
and since ex(') has a unique inverse in the neighborhood of the origin, the origin is stable with probability 1. Condition (10) can be satisfied in other ways. For example, if X o is the random variable such that Pr[1i X o II
~
M] = I
2.
for some finite M E
287
STOCHASTIC LYAPUNOV FUNCTIONS AS SEMIMARTINGALES
V(II
Xo
II)
>
0, then
~ f3(a)
[1 - Pr(11 X o II ~ a)]
+ f3(M) Pr(11 X o II
~ a)
or
Pr[11 X o II
°
~
a] ~ (EV(II X o II) - f3(a))/f3(M)
Choose p(o, E) > sufficiently small so that f3(p) Then for X o satisfying Pr(11 X o II ? p) ~ p we have E
V(II
Xo
II)
+ pf3(M) ~
EO.
EO
~
From this inequality and (9), (11) follows. Thus we have proved the stability with a slightly different definition of stability that the origin is stable if and only if, for any 0 > and E > 0, there exists a p(o, E) > such that for every X o satisfying Pr(11 X o II? p) ~ p, and Pr(11 X o I ~ M) = 1, (11) holds. The criterion for asymptotic stability is given by the following. Suppose that there exists a continuous nonnegative function y(.) of real numbers such that it vanishes only at zero and
°
°
E[V(x(n, xo)) I xo , ... , xn - 1]
-
V(x(n - 1, xo))
~
-y(11 x(n -
1, xo) II)
<
0 (12)
for all x o' Then the origin is asymptotically stable with probability 1. As commented earlier in connection with the definition of the asymptotic stability, in order to show the asymptotic stability it is necessary to show E V(x n ) ~ as n ~ 00. Letting
°
and
r«
~
y(11
Xn
II)
(12) is written as
Taking the expectation of this with respect to the random variables Xo
,... ,
X n- 1 ,
or n
EVn+l - EVo ~ -
L EYi < 0
i=O
288
VIII.
STOCHASTIC STABILITY
or n
o < L EYi ~
EVo < E{3o
<
(13)
OC!
i~O
for every n. (13) implies that as
n
-+ OC!
Thus, Yn ---+ 0 in probability. * Since it is possible to pick a subsequence of {Yn} such that the subsequence converges almost surely, let {Yn.} be such a subsequence. * Then, since Y is continuous and vanishes only at zero, we have II x(n i , x o) 11---+ 0 with probability 1. Since 0 ~ EVn ~ EVo < CI) by the semimartingale convergence theorem (see Doob, p. 324),47a lim n-->oo
r,
tl r,
exists with probability I. But (14)
Then taking the limit of (14) on the subsequence
o~
V 00
~
(3(0)
=
{,en)'
0
Therefore, V 00
=
0
with probability I
or lim an = «(Iim
hence,
II x(n, xo)11
--l-
0
II x(n, xo)ll) = 0
with probability
3. Examples The following examples are taken from Kushner.P? Consider a scalar system (15)
* See Appendix I for discussion of convergence in probability and convergence with probability 1 or almost sure convergence.
3. where the
gn
289
EXAMPLES
are independent and identically distributed with (16)
Choose V(x)
=
x2
(I 7)
Then
Therefore, if (a2
+a
2
)
<
I, then
hence the origin is asymptotically stable. This example can be extended to systems described by vector difference equation immediately. From the basic semimartingale inequality (8),
(19)
or
gives a useful probability expression on the magnitude of for the same system, if V(x) is chosen to be V(x)
for some positive integer r
> I,
=
x 2r
Xn
.
Now (20)
then
Thus, for those positive integers r such that (21) {x~r} is still an expectation-decreasing martingale and is still asymptotically stable. Now, instead of Eq. (19), one has
Pr[sup n
x~r
;?; c] ,,:;; Ex~rlc
or (22)
290
VIII.
STOCHASTIC STABILITY
Another inequality which is sometimes useful is:
where z's are independent random variables with mean zero and finite vanances.
Appendix.
Semimartingale Inequality-?»
Let {Zi' flJi , i = 0, 1, 2, ...} be an expectation-decreasing martingale where flJi is the a field associated with Zi . Define, for any nonnegative c, a(w)
=
inf{k I Zk(W) :); c}
U
{O}
That is, a is k such that Zk ~ c for the first time. If no such k exists, k = 0, 1,2, ... , then a is set equal to 0 by definition. Now {zo, zo' flJo ' flJo } is a two-member expectation-decreasing martingale. Therefore, (Al)
where B -Q [sup k
Z" :);
c]
and where B is the complement of B. Since for hence a = 0:
wEB,
SUPk Zk(W)
<
c,
CA2)
Therefore, from Eqs. (AI) and (A2),
f
8 Zo
Therefore, Pr B
dP :);
< 1c f
8
f
8 Zo
Zo
dP
dP :); c Pr B
< 1C E I Zo I
Chapter IX
Miscellany
1. Probability as a Performance Criterion In most parts of this book we have used criterion functions which are explicit functions of state vectors and/or control vectors. There are problems, however, where it is more realistic to use implicit functions of state vectors and/or control vectors as criterion functions. A time optimal control problem? is a good example of this. Another example of problems where the criterion functions are implicit functions is given by a final-value control problem where the probability that the terminal error exceeds a specified value is used as a criterion of performance. Although it is sometimes possible to obtain approximately equivalent criterion functions which are explicit functions of state and control vectors for some problems' with implicit criterion functions, * for most problems with implicit criterion functions such approximations are not possible. We will discuss in this section yet another example of a control system with an implicit criterion function where the probability of the maximum system deviation exceeding a given value is used as the criterion of performance. The development of this section is based in part on Odanaka-?" and Aoki.?
* For example, the probability of the terminal error can be approximated by a quadratic function of XN , i.e., by an explicit criterion function of XN in some cases (see Pfeiffer'l3), where xN is the system state vector at the terminal time N. 291
IX.
292
A.
MISCELLANY
PROBLEM FORMULATION
The derivation of an optimal control policy with this criterion function is illustrated for a two-dimensional system described by x1(i x 2(i
+ 1) = + 1) =
x1(i) x 2(i)
+ 1&>::1(i), xz(i» + Iz(x1(i), xz(i» + i + fi, U
i
=
0, 1,..., N - 1
(1)
where xj(i) is the jth component of the state vector at time i, j = l, 2, is a scalar control variable at time i, and are independent random noise with known probability density functions. They are taken to be identically distributed with the density function p(g) for the sake of simplicity. Later we will indicate how this assumption can be dropped and how the method can be extended to certain other classes of nonlinear systems. t i is the scalar value of the noise at time i. In this section Xl and X z are assumed to be observable without error. The function 11 is assumed to be such that (Xl + 11(xl , xz»Z < DZ whenever x l z x zz < D2, where D is a given constant. 2xdl 112 < X 22 is one sufficient condition for this. The admissible set of control is taken to be the whole real line. The extension to the case of bounded control variable I Ui I :(; m i , mi given, will also be indicated at the end of this section. Assume that the origin (Xl = 0, X 2 = 0) is an unstable equilibrium point of this system. The random disturbances in (l) will tend to move the state vector away from the origin. It is desired to keep the system state vector in the neighborhood of the origin in the face of random disturbances by appropriate control actions. We take the duration of the control to be finite, N. The criterion function is taken to be the probability that the maximum of the current and all future deviations exceeds a predetermined value, D. Define
e»
Ui
+
+
where Namely, Pk(C I , cz) is the probability that the maximum of the current and the future deviations of the system exceeds the value D, when the system starts from the state vector (cl , cz) and an optimal control policy is employed. Clearly, k
=
0,1, ..., N
(3)
1.
PROBABILITY AS A PERFORMANCE CRITERION
293
Also (4)
The recurrence relation for P k is given by I,
(5)
where
B.
c1'
=
C1
c 2'
=
C2
+ 11(C1 , c + 12(C c + 2)
1 ,
2)
Uk
+
(6) gk
DERIVATION OF OPTIMAL CONTROL POLICY
Suppose that the probability density function ing assumptions:
p(~)
satisfies the follow-
(i) p(g) is differentiable and unimodal;
(ii)
0 <
J:~: pW dg
for any finite
a, b
>
O.
From (3) and (4), P N ( C1 , c2 ) is given for allc1 and c2 • From (5), (7)
where (c1 ' , c2 ' ) is related to (c1 , c2 ) by (6). From (3) and (4) the integrand P N in (7) is one whenever (8)
and zero otherwise. From (6) the set of ~N-1 by solving the inequality
where
values for which (8) is satisfied is obtained
IX.
294
MISCELLANY
This inequality for tN-l can be solved explicitly and the situation of (8) is true for tN-l satisfying the inequalities or
where ~~-l
~
-UN- I -
OCN-I -
~~-l
~
-U N- I -
OCN-I
f3N-I
(9)
+ f3N-I
and where OCi
~
f3i ~
xz(i) {DZ -
+ fz(x1(i), xz(i» [Xl (i)
+ fl(xl(i), xz(i»p}l/z,
Note that D2 > (c1 + 11(C1 assumption on 11 . Define G N - 1 ( U N - 1 ) by
, C2))2
whenever
(10) O~i~N-l C12
+ C22 <
D2
by the
Then, from (9),
The optimal control at time N - 1, U~_I' is the control which minimizes G N - 1 • Since G N - 1 is differentiable with respect to U N-1 ,
From (10) and (11),
(12)
where pi exists by assumption. Equation (11) can be written as GN-I(U N- I) =
{" -00
ePN-I(f;
+ UN-I) p'Cf) df;
(13)
I.
295
PROBABILITY AS A PERFORMANCE CRITERION
where
-+ UN-I) g
il
/0:
f3N-l ::;; g ::;; -CXN-l otherwise
-CX N-1 -
-+
f3N-l
Since TN-I;?: 0 and p'(x) changes its sign from plus to minus once somewhere on the real axis because p(x) is unimodal by assumption, changes its sign from minus to plus once as U N- 1 varies from - 00 to 00. Therefore, there is a unique Ut-l at which is zero. From (7) and (11), PN-l is obtained as
c;,
+
<:
PN-1(x1(N - 1), x 2(N - 1» 1,
X 12(N
- 1)
+ X 22(N
- 1) ;;; IJ2
X 12(N
-
1)
+ X 22(N
-
(14)
1)
< f)2
Let us next derive PN-2(X 1 , x 2). From (5) and (14), (15)
for Define G N-2(UN-2) =
I PN-1(C1', c2' ) p(O dg
Noting agam that (C1')2 + (C 2')2 ;?: D2 for certain written as G
N-2(UN-2)
= rUN-2-0'N-2-~N-2
p(O dg
t values, (16) can be
+ reo
_00
(16)
p(O dg
-UN-2-~N-2+~N-2
(17)
where CX N- 2 and (3N-2 are defined by (10). Differentiating (17) with respect to U N- 2
,
G;'_2(U N_2) = -p( -U N - 2 - CX N- 2 - f3N-2) + p( -U N-2 - cxN-2+- f3N-2) - PN-1(C1', C2') p( -U N-2 - CX N- 2 + f3N-2) + PN-1(C1', c2' ) p( -UN-2 - CX N- 2 - f3N-2)
+
rUN-2-''N-2+~N-2 •
-UN-2-0N-2-~N-2
:N-l N-l
p(O dg
(18)
296
IX.
MISCELLANY
where c/ and c2 ' are cI ' and c2 ' with g given by the upper limit of the integration. (I' and (2' are similarly defined with g given by the lower limit of the integration. Since u and g enter into the arguments of P N - 1 symmetrically, i.e., SInce
we have (19)
for any y and
=
s.
From (18) and (19),
{ '
+ g) p'(~)
dg
(20)
where
-Ci N - 2 -
~ ~ otherwise
(3N-2 :(;
CiN-2
+ (3N-2
(21)
By arguments paralleling those used in establishing the existence of the unique Ufv-l' G',._2 vanishes for the unique Ufv-2 . * From (15), (16), and (17),
1, (22)
where Ufv-2 is the zero of (20) and where CI = x1(N - 2), C2 = x 2(N - 2), and cI ' and c2 ' are related to CI and C2 by (6).
*
See, for example, Ref. 89b.
1.
PROBABILITY AS A PERFORMANCE CRITERION
297
It is now clear that Pk(C l , C2) can be expressed quite analogously to (14) or (22) where the optimal control Uk * is determined as the unique zero of Gk'(U k)
= J'X) -00
ef>k(U k
+ g) p'W dg
(23)
where
C.
EXTENSIONS
Certain extensions of the results in Section B are almost immediate. The assumption of identically distributed random variables can be dropped by using Pk(~k) instead of P(~k) in (23), where PkO is the probability density function of ~k • The control variable Uk can be taken to be constrained I Uk I ~ mk • Then, Gk'(Uk) may not become zero for any I Uk I ~ m k• Denote the zero of Gk ' by Uk • We now have the optimal control given by mk < I ilk I ~ ilk <
ilk mk -m k
As for the expression of P k we have, instead of (22),
where Cl = xl(k), are given by (10).
C2 =
x2(k), cl ' and c2' are given by (6), and
Cik' f3k
298
IX.
MISCELLANY
The development in Section B allows us to see that a similar recursion equation for P k is possible when the probability density function for ~ contains an unknown parameter. Because of the perfect observation assumption, we need only to replace p(~) in the equations for P k and G k ' by
f peg I B) p(B I x c-1) dB l
It is also easy to see that the plant equation need not be given by (1), nor need the system be two-dimensional. The properties of (1) that have been used are: (i) II x(i + I) 11 2 ~ D2 can be solved for g(i) so that the probability of g satisfying the inequality can be evaluated from pa), where II • II is the Euclidean norm; (ii) 8P k+l/8uk and 8P k+l/8gk are related by a simple equation. These two conditions are met by a large variety of linear and nonlinear equations. Other implicit criterion functions are possible for this type of problem, such as the maximum expected deviation. A computational work with this criterion function has been carried out by M. Aoki for a system satisfying Van der Pol equations for both purely stochastic and adaptive systems."
2. Min-Max Control Policies A priori probability distribution functions must be known or assumed before Bayes' optimal control policies can be obtained for problems with unknown or random parameters. For some problems the assumption of such a priori distribution functions is not valid. Even if valid, such a priori distribution functions may not be available and choice of a priori distribution functions may be arbitrary. The empirical Bayes approach-P or other considerations'" mentioned in Chapter VI eliminate this arbitrariness to some degree but not completely. In such cases, we may want to use control policies which do not assume any a priori distribution functions for the unknown parameter () of the problems. The class of control policies known as min-max control policies does not require any assumption on the distribution functions of () on the parameter space e. In Example 8 of Section 2,1 of Chapter I, we have already encountered one min-max control policy.
2.
MIN-MAX CONTROL POLICIES
299
In this section, we gather a few facts on min-max control policies when the probability distribution functions for the random noises in the plant equations contain unknown parameters B. For more detailed exposition of the subject, the reader is referred to Blackwell and Girshick.s" Ferguson,58a and Sworder. 133 ,134 A.
EQUALIZER CONTROL POLICIES
Given a criterion function ], it can be regarded as a function of the random variables to ,..., tN-l and of a control policy c/J. Define where the expectation is taken with respect to t N - 1 and X o , and where c/J is a member of the admissible class of controls, <1>. The detailed structure of this admissible set differs depending on the types of control policies being considered such as open-loop or closed-loop control policies. The fact E] is a functional of c/J and B does not change. It is assumed throughout this section that H(c/J, B) is a convex in c/J for every BE G. Denote the set of all probability distribution functions (including the degenerate distribution functions which assign probability I to a point in G) of B over G by G#. An element of G# is denoted by B#. * In a similar manner, the class of randomized control policies is denoted by <1># and its element by c/J# . Define H(>#, 0#)
=
E¢,e H(>, 0)
where the expectation operation is with respect to c/J and B with the distribution functions c/J# and B#, respectively. Define the max-cost of a randomized control c/J# by sup H(>#, B#)
e#EG# Min-Max Control Policy. that
If there exists a control policy
>/ E <1># such
then c/J/ is called a min-max control policy.
* Since we have used an asterisk to indicate optimality such as u ; * being the optimal control at time i, we use a somewhat unconventional symbol, #.
IX.
300
MISCELLANY
Bayes Control Policy. In terms of this set of notations, a randomized optimal Bayes control policy 4>0#' if it exists, is given by H(4)o#, 0#)
i~t
=
H(4)#, 0#)
where ()# is the assumed (or known) probability distribution function for (). If, for every € > 0, there exists (),# E e« such that a given control policy 4>0# comes within € of inf H(4)#, ()/), then 4>0# is called an extended Bayes or c-Bayes control policy.P
Equalizer Control Policy.
If there exists a control policy 4># such that H(4)#, 0)
=
constant
for all () E e, then it is called an equalizer control policy. The usefulness of equalizer policies may be illustrated by the following theorem.
Theorem. If 4>0# E <1J# is an equalizer policy and is extended Bayes, then it is a min-max control policy. If contains only a finite number of elements ()i' i = 1,... , M, then, by considering the convex set {H(4)#, ()i), i = 1,... , M, 4># E <1J#}, a stronger result can be obtained.
e
Theorem. If e contains a finite element, then there always exists a least-favorable (worst case) distribution function ()# and a min-max control policy 4># E <1J# exists which is a Bayes control policy with respect to ()#. (See Ferguson's" or Sworder P" for proof.)
3. Extensions and Future Problems We have suggested in several places of this book problems requiring further investigations. Some of the suggested problems are such that they can be treated by minor extensions or modifications of the techniques developed in this book. Optimal control of plants with delay, of systems with intermittent observations, or of systems with delay in the observation mechanisms are of such a type and are discussed to some extent in Section 4 of Chapter II. Random errors in the control actuations have not been discussed explicitly as such. What we have studied in this regard is the optimal control problems where the gains of the control variables are assumed to be random variables or unknown constants. It is clear, however, that the same approach can be utilized to study
3.
EXTENSIONS AND FUTURE PROBLEMS
301
the effects of random actuation errors (perhaps with unknown statistics). For example, when the plant equation is originally given by a linear difference equation Xi+I = Aixi Biu i , both the fixed component of the actuation error and the component of the actuation error proportional to control can be considered by modifying the plant equation to
+
Xi+-! =
AiXi
+ iB, + LJBi)U i + gi
where ti and ,dBi are independent random variables. The random variable ti represents the fixed actuation error and ,dBi U i gives rise to the proportional actuation error. Also see Orford 109 Another problem in this class is the optimal control problem of systems where system parameters may change randomly at random times. 1 6 ,75 This problem is sketched in Section A. The extreme cases where parameter values may change at each time instant (either independently or in some dependent manner) and where the parameter values remain constant throughout the control periods have already been discussed. See, for example, Section 2 of Chapter II, and Chapters III and IV. There are other problems, however, which seem to require considerable extensions or modifications of the method presented in this book. For control processes that run indefinitely, the criterion function we have used in this book, ] = L:f Wi(xi , U i - 1), may be meaningless without additional assumptions on Wi , such as the incorporation of the future discount factor or time-averaging factors. Also, the method of deriving optimal control policies developed in this book is not suitable for processes of infinite duration. * Some results for this class of problems can be found in Baum.P Bellman.P? Blackwell.i" Drenick and Shaw." Eaton and Zadeh, 52 Howard.P" and Strauch.P! for example. Consider, as another example, control problems where the terminations of control depend on random events. For such problems, the control durations are also random. Stochastic version of time optimal problems, pursuit and rendezvous problems in noisy environments, are typical of this class of problems with random control durations. See also Dubins and Savage.s? Eaton and Zadeh. 52 In discussing optimal control of such problems, we may take a criterion function to be N
] =
L
Wi(X i , U i- 1 ;
N)
i=l
* Instead of the usual backward dynamic programming, a forward dynamic programming method may be used.
IX.
302
MISCELLANY
where it is now assumed that N is a random variable and the cost of control at stage i depends not only on Xi and U i - 1 but also on N. For example, in a final-value problem where a cost is associated only to a final state, the cost at stage i will definitely be a function of N. Some preliminary investigation of this problem has been made by R. A. Baker, and will be the subject of Section B. A.
SYSTEMS WITH PARAMETERS WHICH CHANGE RANDOMLY AT RANDOM TIMES
In Chapters II-IV, we have discussed the method of deriving optimal control policies for systems where system parameters appearing in plant and observation equations and statistics of the probability distribution functions are constant (may be unknown) or random variables for each time constant. In this section, we will indicate how the method can be extended to derive optimal control policies for systems whose parameters may change their values randomly at random times. An example of such a system is given by the system which obeys the equations X;+!
=
Yi =
+ Bu, u«, + ru Ax;
where Band H are known and the noise YJ has a known probability distribution and where it is assumed that the matrix A is unknown and is such that it may change its value at some random time during the control. A special case where A is an independent random variable for each time instant with common distribution function has been discussed in Chapter II. Another special case is discussed by Rosenbrock'P" where the measurements are exact, where there are two possible values for A, and where A is known to change its value exactly once during the control period. Howard." discussed similar problems for memoryless systems, i.e., for systems without dynamics where the underlying statistical parameters of the process may change from time to time. We will illustrate the method to derive optimal control policies for systems described by
N
]
=
L W;(x; ,
U i - l),
3.
EXTENSIONS AND FUTURE PROBLEMS
303
where the same set of symbols is used as before. The plant parameter ex is assumed to be the parameter which may change randomly from time to time. It is assumed that there are no other parameters in the problem and that the noise processes {~i} and hi} are all assumed to have known probability distribution functions. The only additional piece of information we need in deriving optimal control policies is the description of the {ex}-process. Assume that ex undergoes random changes at discrete time instants 0 < n 1 < n z < '" < ">» ~ N such that the intervals between the successive changes are independently and identically distributed random variables with common known probability distribution, per). * Define
1
1
fLN -
where fhk is the total number of parameter value changes in k time instants, fhk ~ k. The new parameter value after each change is assumed to be chosen independently from the common known distribution, p(ex). The main recursion equations remain unchanged:
r, * =
min [Ai U _ i
l
where
+ E(Y~ll
Il-\ i
=
u
H
)]
1,..., N
The auxiliary recursion equations to generate p(xi, exi I yi) and P(Yi+l I Yi , ui) must now be modified to include the possibility that ex may change. Since the pattern of the parameter change intervals T",-l completely specifies the possible changes in ex up to time i, and since p(ex i I T";-l) is computable by the independence assumption of ex values after parameter value changes occur, P( T"i- 1 I y i , u i ) is the only new probability expression needed in computing P(Xi , exi I yi). The reader is invite a to derive it for himself. The detail is found in Aoki.l" B.
OPTIMAL CONTROL SYSTEMS WITH RANDOM STOPPING TIMES
a. Statement of the Problem
Consider a system described by the system and observation equations Xi+1 Yi
= =
Fi(X i , u; , ti) Gi(X i , 7)i)
* The random variable T takes only integer values. The {oj-process is a special case of a general stochastic process known as a semiMarkov process.Uw
IX.
304
MISCELLANY
where Yi is an observable and ui a control. Assume that this system operates from i = 0 to i = N, where N is a random variable. The cost is a function of xi, u':", and N, and may be expressed as N
] =
L
Wi,N(X i , Ui-l)
i=l
We wish to find the control policy that will rmrurmze the expected cost of j. We shall assume that we know the distributions of all of the random variables, X o , ~i' "u » and N. In particular, we know the conditional distribution of N, conditioned on N > i, the observable vectors yi, and an auxiliary observable Yi' To simplify notation, we shall write
We shall restrict our problem to the following two special cases: Case A. Pr(N
> M) = 0 for some fixed M;
Case B. Pr(N = j I ~i'
yi, N
>
i)
= Pr(N = j IN> i).
In Case A, we shall find a general procedure following the approach of this book. In Case B, we show that the problem can be transformed into one in which there is an infinite running time and a new cost function ]', which is not a function of stopping time. If the plant and observation equations are linear, with the only random variables (besides stopping time) being additive noise, and if the cost function is quadratic, that is if Xi+!
=
Yi = Wi,N
=
+ Biu, + gi cs, + n, +
AiXi
(U i-1 , GN,iUi-l)
(Xi' HN,iXi)
where Ai , B i , and C; are known and ~i and YJi are independent random variables with known distributions, then we can, formally, write a solution to the optimal problem.
b. Case A [Pr(N
>
M) = 0]
Suppose the system has survived through i = M - 1 steps. Then we know that we have exactly one more step to go. The problem is then the same as the case of a known stopping time and we have already solved that problem. Hence there is an explicit expression for U M- 1 as a function of U M- 2 and y M - I that will minimize the expected cost.
3.
EXTENSIONS AND FUTURE PROBLEMS
305
Now suppose we have survived M - 2 steps. Now there are two possibilities: either the stopping time is N = M - 1 with probability p M-l/ M-2' or N = M with probability P N/ M-2 . If the former holds, the additional cost will be W M-l M - l ; in the latter case it will be W M-l,M W M,M' Hence, taking the conditional expectation with respect to stopping time, the conditional expectation of the last two-stage cost L1] is given by
+
W M, M is a function of x M , U M-l ; U M-l is a function of yM-l , U M-2 ; X M is a function of X M - 1 , UM-l , ~M-l and hence a function of X M - 1 , 2 yM-l , U M- , ~M-l' Hence EN L1] is a function of 'M-2 (because P is a function of 'M-2), yM-2, U M- 3, which are observables, plus YM-l' gM - l ' X M - l ' which are not observables, and also, of course, U M-2 • In principle, we can find the probability distribution
and hence
Then the optimal policy is to choose U M-2 to minimize this conditional expectation. Now we see that this U M- 2 is a function of yM-2, U M - 3, and 'M-2' Going back another step, we have
This expression is a function of the observables 'M-3' yM-3, u M - 4, the nonobservables YM-2' YM-l , gM-2 , gM-l , X M - 2 ; and on U M-3· Again we find the conditional probabilities of the nonobservables conditioned on the observables and U M- 3 • Then we find the conditional expectation of the additional cost after M - 3 stages conditioned on the same variables. Again, we choose U M- 3 to minimize this conditional expectation. We see again that the control U M-3 is a function of the observables yM-3, U M- 4, and SM-3 .
IX.
306
MISCELLANY
The process continues with the general expression M
EN[LJ] I ~i]
=
M
I
P j/ iWi+1,j
+ I
j~i+l
M-l
=
P j/ iWi+2,j
+ ".
j~i+2
M
I
I
Ic~l
j~i+1c
Pj/iWi+Ic,j
which is a function of the observables y i , ui - \
~i'
Yi+l 'Yi+2 ,,,., YM-l , gi' gi+l , ... , gM-l , ~i+l
,,,.,
the nonobservables ~M-l , Xi' and the control Ui' The conditional expectation of Ll j, conditioned on the observables and Ui , is found and Ui chosen to minimize this expectation. As in the case of known stopping times, the practical problem is finding the conditional probabilities required. The process is only slightly more difficult by the inclusion of the extra variable ~, which determines the conditional distribution of stopping times. ~i+2
,
c. Case B Now let us consider the special case where the only additional information about the stopping time we have at the ith step over what we know at the first step is that the stopping time is greater than i. That is, ~i disappears for this problem and we have Pr(N
=
j yi, N I
> i)
=
_
Pr(N
=
!
Pr(N Pr(N
j =
>
1
N
> i)
j) i)
o
Now, if we exarrune EN[Llj n on, we have
+ I i~n+2
IN> n]
if j > i otherwise
and where Llj = cost from
Pr(i = N) Wn +2 , i
+ .,,]
3.
EXTENSIONS AND FUTURE PROBLEMS
307
If we multiplied all of the cost by a constant, we would change nothing. Hence, once we get to the nth stage, we can use EN[LJ]'
IN> n]
I I
=
k~l
Pr(i
=
N) W n +k • i
i~n+k
But then
That is, the expression for the expected cost function from n on is the same as from time zero. This shows us that we can use a single equivalent cost function in which the implicit dependence on the random stopping time disappears. That is, we note that E N [] ] =
I I k~l
i~k
W k'
=
where
Pr(i = N)
Wk,i
=
I
W k'
k=l
I
Pr(i = N)
Wk,i
i~k
N ate that we have left off the upper limits of summation in all cases. We can let this upper limit go to infinity. N ow our optimal control policy is that policy which is optimal for the system given and a cost function of
I'
=
I
Wk'(X k, Uk-I)
k~l
As an example, suppose final state). Then W k'
=
L
=
Wk,i
P(i = N)
xl8 k ,i (that is least squares in the
Wk,i
=
i=k
L
P(i = N)
Xk
20k,i
i=k
or, if
then 00
W k'
=
I
P(i = N)
AX i
i=k
= A P(k = N)
20k,i
+I
(P = i) U~_l
i=k
Xk
2
+ P(N :?' k)
ULI
308
IX.
MISCELLANY
Now, the obvious difficulty is that we have an equivalent system with an infinite time duration. This precludes the possibility of going to the last stage and working back. If we have linear plant and observation equations with additive independent noise and a quadratic cost function, the problem is solvable. This is because we know the optimal policy is the same as in the deterministic case except we use E[x n I ynJ instead of X n. The deterministic linear system of infinite duration can be solved by variational techniques and hence our problem can be solved.w Even in this special case, we may not be able to find explicit expressions for E(x n I yn). If the observation equation is noise free, or if the system is noise free and the observation noise is Gaussian, we can solve the problem in principle.
Appendix J
Some Useful Definitions, Facts, and Theorems from Probability Theory
In order to facilitate the reading of this book (especially of Chapters VI and VIII) several facts and theorems from the theory of probability are collected here together with some of their intuitive explanations. For more detailed and systematic accounts see, for example, Doob 47a or Loeve.l'" PROBABILITY TRIPLE
In order to be able to discuss probabilities of certain events, three things must be specified. They are: (i) the sample space, Q; (ii) the class of events to which probabilities can be assigned, :F. Events in the class :F are certain subsets of the sample space Q; (iii) probability measure P (defined on :F) so that, to any event A in the class :F, a real nonnegative number PA, 0 ~ PA ~ 1, is assigned, with PQ = 1. These three things are collectively referred to as a probability triple (Q,:F, P). Since each event in :F must have a probability assigned to it unambiguously, :F cannot be any arbitrary collection of subset of Q but must have a certain structure. For example, in a single coin tossing, the sample space Q is composed of two points: H (for head) and T (for tail). The class :F consists of four subsets, {(c/», (H), (T), (H, Tn, where c/> denotes a null set. When we say a coin is fair, we mean that PH
=
PT = 309
i
310
APPENDIX I
Intuitively, .fF includes all the events to which probabilities can be assigned. If an event A has a probability p, then one also wants to talk about the probability of A (the event that A does not occur) I - p; i.e., if A E.fF, then .fF must be such that A E .fF. If Al and A 2. are in .fF, then one wants to discuss the event Al n A 2 (the event that Al and A 2 occur simultaneously), the event Al U A 2 (the event that at least one of Al and A 2 occur), etc. Namely, if AI, A 2 E.fF, then.fF must be such that .fF contains Ai U Ai ' A, n Ai' Ai U Ai ' Ai n Ai' Ai U Ai ' and Ai n Ai' i, j = 1, 2. Such a class is known as a field. Since we also want to discuss probabilities of events which are limits of certain other events such as lim n -7w U~ Ai and lim n -7w n~ Ai , Ai E .fF, i = 1,2,... , .fF is usually taken to be a a field.
Example. Given a set AC Q, .fF = {cjY, A, Q - A, Q} is the minimal a field containing A (i.e., the smallest a field containing A). RANDOM VARIABLES
A random variable X (abbreviated as r.v. X) is a mapping from Q to the extended real line R (the real line plus ± (0) such that
for all A E Borel field (a field on R) where X-I is the inverse mapping of X; i.e., X-IA = {w; X(w) E A, w E Q}. Such an X is called .fF measurable. We denote by a(X) = X-I (Borel field) the smallest a field of subsets of Q with respect to which X is measurable. INDEPENDENCE
Let .EI
, .E2 , ... ,.En be sub-a-fields of .fF, i.e., .Ei is a a field such that 1 :s;: i :s;: n. They are called independent if and only if
s, C s ,
P
(n Ai) = 1
fI P(A i)
for arbitrary
Ai E };i'
1:( i :( n
I
A sequence of sub-a-fields of .fF, .Ei , i = 1,... , is independent if .EI , ... , .En are independent for all n = 1,2,.... Random variables Xl' X 2 , ••• are independent if and only if a(XI), a(X2 ) , ••• are independent.
311
PROBABILITY THEORY EXPECTATION
An indicator function (also called a set indicator)
fA
is defined to be
IS
called a simple
WEA
w$A The expectation of
fA
is defined to be
A finite linear combination of indicator functions function. If m
X
n
I
=
aJAi
=
i=l
I s.:s,
i=l
where Ai and B, are measurable, i.e., Ai , B, of X is defined to be m
I «r»,
EX =
E
%, then the expectation
n
=
1
I
bjPB j
1
If X is a nonnegative random variable, and if {Xn } and {Yn } are two sequences of measurable simple functions such that X n i X and r, i Y (Xn i X means that ~ Xn+l and limn Xn(w) = X(w), for all w E Q), then
x;
lim EXn
=
lim EYn
and this common value is defined to be EX (the expectation of X). The expectation of a random variable X on (Q, %, P) is defined to be EX
EX+ - EX-
=
where X+
=
max(X, 0),
X-
=
when the right-hand side is meaningful. EX is also written as EX
=
JX dP.
max(O, -X)
312
APPENDIX I
ABSOLUTE CONTINUITY
Let us suppose that two probabilities P and Q are available for the same (Q, g;). We say P is absolutely continuous with respect to Q, written as P <{ Q, if Q(4) = 0 implies P(A) = O.
Radon-Nikodym Theorem. P <{ Q if and only if there exists a measurable function f (written as dP/dQ) such that (i)
f
~
0, Q almost everywhere;
Jf dQ for arbitrary A
(ii)
PA
(iii)
unique Q almost everywhere.
=
E §,
Jf dQ <
00;
(That is, functions with Properties (i) and (ii) differ at most on a set whose probability is zero computed according to the probability measure Q.)
Example. Consider (Q, g;, P) such that Q is partitioned into four mutually exclusive subsets Ai , I :c:;; i :c:;; 4, Q = U~~l Ai , Ai n A. = 1>, i eF j:
Let g; be the a field generated by this partition. Let P be such that PA i = Pi> 0,
4
L Pi =
1
i=l
Let Q be such that
QA i = qi > 0,
P <{ Q means LI qi = 0 ~ LI Pi = 0 where I is a subset of {I, 2, 3,4}. Since qi > 0, Pi > 0 for all i, I :c:;; i :c:;; 4, in the above example, P <{ Q as well as Q <{ P and dP/dQ = Pi/qi on set A, , I :c:;; i :c:;; 4. CONDITIONAL EXPECTATION
Let (Q, g;, P) be a given probability triple, }) be a sub-a-field of g; and X be a random variable such that
J I X I dP <
00
(such an X is denoted by X
E
U(P))
313
PROBABILITY THEORY
It can be shown by the Radon-Nikodym theorem that there exists a function h such that (i) h is £ measurable,
(ii) f A h dP = f A X dP for all A E i; (iii) h is unique P almost everywhere. This function h is written as E(X I £) and is called the conditional expectation of X with respect to £. Note that, if a and f!J are two sub-a-fields of :F such that a C f!J, then E(X I Ot) dP = E(X) I f!J) dP for all A E a. Given two r.v. X and Y, E(X I Y) is to be interpreted as
L
L
E(X I a(Y» MARTINGALES
Let (Q, :F, P) be a probability triple. Let T be an ordered set (e.g., Tis a time axis T={t, t ~O}or T={O, 1,2, 3,...}). ForeachtET,:Ftis a sub-a-field of:F such that :F t C :Fs , t < s. Let X, be a LV. such that
(i) E I s, [ < 00, (ii) X, is :F t measurable, (iii) f A dP = f A X s dP for all A E:F t , t
x,
<
s.
This can also be stated as P almost everywhere (a.e.)
When the nondecreasing sequence of a fields {:F t , t E T} IS not specified, take :F t = a(X t , T ~ t). Such {Xt , :F t , t E T} is called a martingale. In (iii) if the equality sign is replaced by an inequality sign ~(~), then {Xt,:F t, t E T} is called an expectation-increasing (decreasing) martingale or semimartingale. EXAMPLES
Let us consider a fair gambling situation and denote by X n the capital after n plays. Thus, X o is the initial capital. By a "fair" gambling situation we mean m
>
n
where the abbreviated notation of the conditional expectations is used for
314
APPENDIX I
That is, a game is considered fair if the expected capital after m plays, conditioned on the wins and losses of the past n plays (m > n), is the same as the current capital X n . Thus one's capital in a fair gambling situation is a martingale. As another example of a similar nature, consider
where the Y i are independent, i 3 1, with EYi = 0, t = 1,2, .... Then, for i < j, A EO a(Xo , ... , Xi)'
J x, dP = J Xi dP A
smce
L
A
Y k dP = PA . EY k = 0,
k
=
i
+ 1,... , j.
As a final example of martingales, consider a situation where En is a nondecreasing sequence of sub-a-fields, (Q, %, P) given. Let X EO L1. Let T = {1, 2, ..., z, z + 1} where
s; g
Vn%.,
g
a
(U %.,) n
Let X,
=
E(X I~)
Then {XI' %1 , T} is a martingale. SUBSEQUENCES OF MARTINGALES
Take T = {1, 2, ..., N} and assume that {(Xi' %i)' 1 martingale. A random time 7 is called admissible if and only if peT
=
1,2, ... , N)
=
1,
Sometimes the set [w: 7(W) = n] as (7 = n).
[W : T(W) IS
=
n)
EO
:s:; i :s:; N}
is a
%.,
written in an abbreviated form
315
PROBABILITY THEORY
Define
= {A: A E iF, A (\ (7 = n) E~}
~
Let 7 1 ~ 7 2 ~ ••• ~ 7k (X ff is a martingale. 7k
'
~
be
N
admissible,
then (X
7
! ,
ff
T , ) , ••• ,
T)
CONVEX FUNCTIONS OF MARTINGALES
Let (X t , ff t , T) be a martingale on (Q, ff, P). Let ¢ be a convex function (nondecreasing). If ¢(X t *) ELI, then {¢(X t ) , ff t , t E T, t ~ t*} increasing martingale.
IS
an expectation-
INEQUALITIES
Chebychev Inequality. Let X ?': 0, EX Pr[X
~
A]
<
then
00,
(l/A)EX.
~
°
Martingale Inequality. Let {Xi' ff i , ~ i ~ n} be an expectationdecreasing martingale, where Xi ?': 0. Then, for ,\ > 0,
Let (Xo , ffo)," " (Xn , ff n) be an expectation-increasing martingale on (Q, s-; P). Let ,\ > 0. Then, Pr[ max X; O~J~n
~
~ ~
A]
1\
E IXn I
(Compare these with the Chebychev inequality.) Let {(Xi' ff i ), i = 0, I, ...} be a martingale on (Q, ff, P) such that EX,.2 <
00,
i
=
0,1, ...
Then, PreO~j~n max I Xi- I > A) < 1s-: ~ A2 EXn 2 CONVERGENCE
Convergence in probability: A sequence of LV. {X,J is said to converge to a LV. X in probability if, for every E > 0, there exists 0 > 0 and N(o, E) such that Pr[j X n
-
X I
~
E] < 8
for
n
~
N(E, 8)
316
APPENDIX I
Convergence with probability one: A sequence of LV. {Xn } is said to converge with probability one to a LV. X if Pr[Xn ----+ X] = 1, i.e., for every E > 0, Pr
[n U I X n
m
n +m -
X
I~
E]
=
0
or, equivalently, Pr
[U [I X m
n +m -
X
I~
E)] --+ 0,
Convergence in L': A sequence of converges to a LV. X in L! if
LV.
n
n
--+ 00
{Xn } , X n ELI, n
=
1,... ,
--+ 00
EXAMPLES
Convergence in probability does not imply convergence with probability one. Let X n be independent such that Pr[Xn
°
=
0] = 1 - lin,
Pr[X n = 1] = lin
Then, X n ----+ in probability one but not with probability one. As a matter of fact, the set of w such that {Xn ( w)} will be one infinitely often has the probability one. Convergence with probability one does not imply convergence in L': Let X n be independent, EX" < 00 with Pr[Xn Then X n
----+
°
=
0]
=
1 - 1/n2,
with probability one but
EXn = 1 -1+ EX = 0 SOME CONVERGENCE THEOREMS
Monotone Convergence Theorem. Consider a r.v. X and sequence of LV. X n such that X n i X. Then EX = limn EXn . Martingale Convergence Theorem. a martingale on (Q, :7, P).
Let {Xi' :7i
,
a
i = 1, 2, ...} be
317
PROBABILITY THEORY
If E I X n I ~ k
<
lim X n n
then
00, =
XX)
Given (D, ff, P), let X
exists ae., and
ELI
E(X I .%;,)
--+
E I X oo I
~
k
and ff n be nondecreasing a fields. Then
E(X
I Vn~)
a.e. and
U
Appendix II
Pseudoinverse
INTRODUCTION
There are many situations where it is necessary to solve an algebraic equation such as (1)
Ax =Y
where A is an m X n matrix, x is an n vector, and y is an m vector. If A is square and nonsingular, i.e., if m = n = rank A, then (1) can be solved as x = X- 1y. Even when A-1 does not exist, it is desirable to solve (1) in some approximate sense. For example, if m > n = rank A, then we may be interested in x = (A'A)-1A'y as a solution to (1) m some cases. We have seen one example in Chapter II, Section 2, where it is necessary to minimize a quadratic expression [(u, Su)
+ 2(u,
Tx)]
(2)
with respect to u even when 8-1 is not defined. In (2), as shown in Appendix A of Chapter II, the desired u is obtained by solving the linear equation
(3) Su + Tx = 0 when 8-1 exists. Even if 8-1 does not exist and (3) cannot be solved for u, one is still interested in finding u which minimizes the quadratic form (2). This minimizing u satisfies (3) in an approximate sense, to be described below. The concept of pseudoinverses of matrices is introduced as an extension of the concept of the inverses to provide the method of solving the
318
319
PSEUDO INVERSE
equation such as (1) or (3) approximately in such a way that, when the inverses of appropriate matrices exist, these two concepts coincide. 65 ,llo ,1l1 There are several ways to introduce and derive properties of pseudoinverses. 27 .47 ,142 Here, the starting point is taken to be the minimization of a quadratic form. Namely, the problem of solving (1) for x is transformed into that of minimizing a quadratic form II
Ax - y
11 2 =
(Ax - y, Ax - y)
(4)
with respect to x. After all, this is the way the pseudoinverses appeared in our problem in Chapter II, Section 2. The minimizing x of (4) may not be unique. Then, let us agree to pick that x with the smallest norm II x II as our solution. This seems quite reasonable for (2), for example, since one is usually interested in minimizing the performance index (2) with the smallest fuel, energy, etc., which may be interpreted as u having the smallest norm. For further discussions of quadratic programming problems to select unique solutions by successive use of various criteria, see Mortensen.J?" Denote x with these properties by x
A+y
=
(5)
where A+ is called the pseudoinverse of A. Note that when A-I exists, x = A-ly satisfies the conditions of uniquely minimizing II Ax _ y 2 • 11
CONSTRUCTION OF THE PSEUDOINVERSE
The development of the pseudoinverses presented here is based on the properties of finite-dimensional Hermitian matrices.l'" See Beutler''? for similar treatments of pseudoinverses in more general spaces. , Let A be an m X n matrix with rank r, C" an n-dimensional complex Euclidean vector space, M(A) the range space of A, %(A) the null space of A, and A* the complex conjugate transpose of A. Vectors are column vectors. Vectors with asterisk are, therefore, row vectors with complex conjugate components. Our construction of A+ is based on the polar decomposition of A: r
A
=
I
(6)
Adigi*
i=l
where r = rank A, and where gi EO C», such that fi*fi = 0ij ,
I.
and gi are column vectors,
g;*gj = 0ij,
~ i, j ~ r
I. EO c»,
320
APPENDIX II
and where .\.; >0 is defined later by (15). In (6),fig/ is a dyad (m X n matrix of rank one) andfi*fj is complex inner product. Then it will be shown that A+ with the desired property is obtained as r
A+
I
=
Ai1gJ;*
(7)
;~l
First, (6) is derived. Let Then one can write X
=
Xi ,
i = 1,... , n, be an orthonormal basis in C",
n
I
for all
«,»,
x
E
en
i=l
where (X;
x;*x
=
Now Ax =
n
I
(X;Ax;
i=l
where 1 =
1...., n
since A is a linear mapping from C» to C», Since rank A Yl ,... , Yr be the orthonormal basis of :a?(A) C c». Then, generally,
=
r, let
r
Ax;
=
I
{3ijYi
(8)
j~l
By suitable choices of bases in c» and C", (3ij in (8) can be made quite simple. To find such suitable bases, consider A *A, an (n X n) matrix. It is a Hermitian linear transformation on C n , hence it has n nonnegative real eigenvalues, and its matrix representation can be made diagonal by a proper choice of a basis in en. Since r = rank A = rank A*A
321
PSEUDOINVERSE
exactly r of the n eigenvalues are nonzero. Let Pi be such positive eigenvalues with eigenvector Zi, I ~ i ~ r, A*Azi
=
PiZi,
Pi
> 0,
Zi E
i
en,
=
1,..., r
(9)
Multiplying (9) by A from left yields AA*(Azi ) = pi(Azi),
i
=
1,... , r
(10)
This shows that, if Zi is an eigenvalue of A *A, with the eigenvalue Pi , then the AZi are eigenvectors of AA * with the same eigenvalue Pi . Since AA * has exactly r positive eigenvalues, rank (AA *) = rank (A*A) = r, hence A*A and AA* have Pi' i = 1,... , r, as their common eigenvalues. Orthonormalize the eigenvectors for A *A and denote them by {gi' i = 1, ... , r}: i = 1,... , r
We have seen in (10) that Agi are eigenvectors for AA*. Choose the eigenvectors for AA* {Ii, i = 1,... , r} by i
=
1,... , r
Since
{Ii' i
=
I, ... , r}
IS
also orthonormal if fJi
=
(Pi)l/2, i
=
I, ... , r. Thus (II)
It is known that14 2 em = 8i'(AA *) EB JV(AA *)
en = 8i'(A*A) EBJV(A*A),
Since ~(A *A) C C", complete an orthonormal basis for Cn by adding gr+l ,... , gn to {gi, i = 1,... , r}. Similarly, ~(AA*) C c» and an orthonormal basis for c» is obtained by augmenting {Ii , i = 1,... , r} by {lr+l ,···,fm}· Then {gr+l' ... , gn} spans A'(A *A) and {lr+l ,... ,Im} spans %(AA*). It is also true 14 2 that ~(A) ~ %(A*) and .'J1'(A*) ~ .Y(A). Thus A*Ax = 0
¢>
Ax = 0,
AA*x=O¢>A*x=O
Hence, from A *Agj = 0, Ag;
= 0,
j
=
r
+ 1,..., n
322
APPENDIX II
and from AA *fj = 0,
(12)
Ar], = 0,
j
+ 1,..., m
r
=
From (11) and (12),
Ar],
0,
=
i
=
1,..., r
i
=
r
(13)
+ 1,... , m
Since {gl , ... , gn} is a basis in en, given any x E en,
where and Ax
=
n
r
1
1
L cxiAgi = L CXiP~/Yi
(14)
or r
A
L Adigi*
=
(15)
1
where Pi is a positive eigenvalue of A*A, 1 ~ i ~ r. Equation (14) is the simplified form of (8). Thus ~(A) is spanned by I. , i = 1,... , r. Now we consider Problem (4) with x E en. Write with
v
with
Yi = f;*v
E
91?(A)
Then v has the expansion r
V
= Lydi
(16)
1
N ow consider a vector related to x by
Then, from (14), Ax
=
v
and therefore II
Ax - y
11
2
~
II
Ax - Y
11
2
for all
x
E
en
323
PSEUDO INVERSE
Also
Therefore, one sees that A+ is defined by
or r
A+ = L/..;lgJt
(17)
1
where
1\ is given by (15). From (15), A*
=
r
L /..igdi* 1
and r
(A *)+
=
L /..;lhgi *
(18)
I
From (17) and (18), one can readily obtain useful identities: (i) (ii) (iii) (iv)
AA+A = A A+AA+ = A+ (A+)* = (A*)+ (A+)+ = A
For example, (i) is obtained from (15) and (17);
=
L
i,j,k
\k;l\J;gi *gj~
"Ls; *
since
Expressions such as (17) and (18) can be put in to matrix forms. Define
= {II ,···,fm}:
m X m
matrix
G = {gl ,..., gn}:
n X n
matrix
F
324
APPENDIX II
and
A 0:) R ~ ( .:H>.~ 1.
.
:0
·
m
X
n
matrix
The orthonormalities of j's and g's imply FF* =F*F
=
t;
where 1m is the m-dimensional identity matrix. Similarly, GG* = G*G = In
From (15) and (17), A =FRG*
and A+ = GR+F*
where
R+
=
Similarly, A* = GR'F* (A*)+ = F(R+),G*
where ['] means a transpose.
Appendix III
Multidimensional Norma! Distributions
In this section certain useful facts on multidimensional normal distributions are listed for easy reference. An attempt has been made to give a logical self-contained presentation wherever deemed possible without unduly lengthening the material presented in the appendix. Most of the proofs are omitted. For a more complete discussion of the material, the reader is referred to Cramer'" and Miller.l?" RANDOM MATRICES AND RANDOM VECTORS
Definition I.
A random (m X n) matrix Z is a matrix Z
=
(Zij),
of random variables
i
= 1,2, ..., m, j
=
1,2, ..., n
Zll' Z12 , .•. , zmn .
Definition 2. EZ
=
(Ez i j )
Let Z be an (m X n) random matrix. Let A be a (l X m) matrix, B an (n X q) matrix, and C a (l X q) matrix. Then
Lemma I.
E(AZB
Example 1. i.e.,
+ C) =
A(EZ)B
+C
Let X be an n-dimensional random vector with mean ft,
EX = flo 325
326
APPENDIX III
Then (X - fL)(X - fL)' is an (n X n) random matrix and 11
~
E[(X - fL)(X - fL)']
is defined as a covariance matrix of the random vector X. Thus, by definition, A is a symmetric positive matrix (i.e., either positive definite or positive semidefinite). CHARACTERISTIC FUNCTIONS AND PROBABILITY DENSITY FUNCTIONS
Definition 3. The characteristic function (abbreviated ch.f.) of an n-dimensional random vector X is q,(t)
~
E
eii'X
for every real n-dimensional vector t. When n = I, this definition reduces to the usual definition of the ch.f. of a random variable.
Theorem 1. Given two distribution functions F I and F 2 on the real line, if the corresponding ch.f. is such that q,l(t) - q,2(t), then F I = F 2 . The inversion formula lim ~21 T->oo
7T
IT
-T
e-
i ta
~
e-
iib
q,(t) dt
It
exists and is equal to F(b) - F(a), where a and b are any continuity points of F. This theorem has the corresponding generalization to n-dimensional Euclidean space.
Definition 4.
When an n-dimensional random vector X has the ch.f. q,(t)
=
exp[it'm - it'l1t]
where m is an n vector and A is a positive (n X n) matrix, then the corresponding distribution function is called normal (n-dimensional normal distribution) and is denoted by N(m, A). The parameters of the distribution function m and A are the mean and the covariance matrix of X, respectively.
Lemma 2. The ch.f. of the marginal distribution of any k components of an n-dimensional vector, say Xl' X2 ,... , Xk , is obtained from q,(t) by putting t i = 0, k + 1 ::s: i ::s: n.
327
NORMAL DISTRIBUTIONS
From Lemma 2 and Definition 4, the ch.f. of k components of X, x k ) , is given by
(Xl' X 2 , ••. ,
rp(u)
exp[iu'(L - tu'Mu]
=
where u is the first k components of t, fL is the first k components of m, and M is the k X k principal minor matrix of .1. Since 4>(u) has the same form as 4>(t), g = (Xl"'" xk) is also normally distributed with N((L, M), or any marginal distribution of a normal distribution is also a normal distribution. LINEAR TRANSFORMATIONS OF RANDOM VARIABLES
Let an n-dimensional random vector X be distributed according to N(m, A) with nonsingular .1. Then there exists a nonsingular (n X n) matrix C such that C'A.-IC
=
I
(1)
Define an n-dimensional random vector Y by CY
=
X - m
(2)
Then the ch.f. if;(t) of Y is given by if;(t)
=
E exp(it'Y)
=
E exp(it'C'(x - m))
= exp( -it'C'm) where rp(t)
E exp(it'C'x)
=
exp( -it'C'm) rp(Ct)
=
E exp(it'X)
Thus
n exp( -tt n
if;(t) = exp( -tt't)
=
2
i )
(3)
i=l
since C' A.C
=
(C'A.-IC)-I
=
1
Therefore Y is also normal and is distributed according to N(O, I). This fact generalizes to any linear transformation.
Lemma 3. Linear transformations on normal random vectors are also normal random vectors.
328
APPENDIX III
Since
J ... J exp[it'y -
tY'Y] dYl ... dYn
=
(27T)n lz exp[ -tt't]
En
where En is the n-dimensional Euclidean space, this shows that
~ (27T~nIZ
f(Yl ,... , Yn)
exp( -t y'y)
(4)
is the probability density function of the d.f. with ch.f. Eq. (3). From Eq. (4), E(y;) = i = 1,2, ... , n E(Yi Z) = 1,
°
E(YiYj)
i=Fj
0,
=
°
Therefore, the covariance matrix of the random vector Y is I. Thus and I in N(O, I) have the physical meaning of the mean and covariance matrices of ¥: E(Y) E(YY')
= =
°
I
(5)
This is also clear from the definition of the ch.f. The probability density function of x can be obtained from Eq. (4) by the change of variables, Eq. (2), as f(x l, X z ,..., xn ) =
(i~Lz
exp( -[t(X - m)'A-l(X - m)])
where] is the Jacobian of the transformation
] =
I 8Yi I= 8x
[ C-l [
j
and where CC '
=
A from (1) is used. Hence
I C 1= and
[A [1/2
I ] I = IA
I-liz
Therefore,
I(xl ,..., xn) =
(27T) n /211A
1
1/ 2
is the density function of N(m, A).
exp[ -t(X - m),A-l(X - m)]
(6)
329
NORMAL DISTRIBUTIONS
Notice that normal distributions are specified as soon as m and A are specified, in other words as soon as the mean and covariance matrices of the random vector X are specified. From (1), (2), and (5), E(X) = m,
E(X - m)(X - m),
=
CC'
(7)
=,1 PARTITION OF RANDOM VECTORS
Let X be an n-dimensional random vector with N(m, A). Assume that A is nonsingular. Partition X into two vectors Xl and X 2 of k and (n - k) dimensions each. Define All = E(X1 - m1)(XI - m 1)', ,122 = E(X2 - m 2)(X 2 - m 2)', ,112 = E(X1 - m1)(X2 - m2)'
m1 = E(X1 ) m2
= E(X2 )
(8)
If A l 2 = 0, then
I A I = I All I I ,122 I ,1-1
=
0)
(,1111
o
A;}
and the density function of x becomes, from Eq. (6), f(X1, X 2) = (27T)k/2+All X
11/2
(27T)(n-kl}2
exp( -{t(X1
-
m1)'A111(X l
-
m1)})
IA-;[172 exp( -{te X 2 - m2)'A;21(X2
-~
m2)})
(9)
Therefore when A l 2 = 0, Xl and X 2 are independent and are distributed according to Nim; ,An) and N(m 2 ,A 22), respectively. Thus, we have Lemma 4. Two uncorrelated normally distributed random vectors are independent. CONDITIONAL DISTRIBUTIONS
Generally, ,112
=I' 0
330
APPENDIX III
In this case, introduce random vectors YI and Y 2 of k and (n - k) dimensions each by Y I = Xl - DX2 Y2 = X 2
where D is a (k
X
(n - k)) matrix to be specified in a moment. Then
If D is chosen to be then YI and Y 2 are uncorrelated normally distributed random vectors, hence independent from Lemma 4. Since YI and Y2 are normally distributed, their distributions are specified by computing their means fil and fi2 and covariance matrices E I and E 2 where 11-1
= EY1 =
m 1 -
11-2
=
m2
EY2
=
111211221m2
and ];2 = E[(Y2 =
-
EY2)(Y2
-
EY2 )']
11 2 2
Then the joint density function of (Xl' X 2 ) when .112 #- 0 is given by
where ] =
I OX °Yi I,
I
~
i, j
~
n
j
Then the conditional probability density function of Xl on X is obtained from
This has the normal distribution law (10)
NORMAL DISTRIBUTIONS
331
Thus the conditional mean of a normally distributed random vector is linear in the conditioning vector X 2 :
SINGULAR DISTRIBUTIONS

When a covariance matrix Λ is positive semidefinite, Λ⁻¹ does not exist and the density function cannot be obtained from the inversion formula as has been done in the previous sections. The function ψ(t) of Definition 3, however, is still a ch.f. Therefore, there exists a corresponding d.f. even when Λ⁻¹ does not exist. (For a necessary and sufficient condition for ψ(t) to be a ch.f. see, for example, Cramér.³⁹) This d.f. can be obtained as a limit of d.f.'s with nonsingular Λₖ → Λ. For example, let

Λₖ = Λ + εₖI,  εₖ > 0

Λₖ⁻¹ now exists and the corresponding d.f. Fₖ can be found. As εₖ → 0, the ch.f. with Λₖ converges at every t. Then it can be shown that there exists a d.f. F with ψ(t) as its ch.f. to which Fₖ converges at every continuity point of F. This limit d.f. is called a singular normal distribution.

Let

rank Λ = r < n

Consider a linear transformation

Y = C(X − m)      (11)

Then the covariance matrix M of Y is given by

M = E(YY′) = CΛC′

Choose C as an orthogonal matrix such that Λ is diagonalized. Since

rank M = rank Λ = r

only r diagonal elements of M are positive; the rest are all zero. Therefore, by rearranging the components of Y if necessary,

E(yᵢ²) > 0,  1 ≤ i ≤ r
E(yⱼ²) = 0,  r + 1 ≤ j ≤ n
This implies

yⱼ = 0  with probability 1,  r + 1 ≤ j ≤ n

Then, from Eq. (11), since C is orthogonal (C⁻¹ = C′),

X = m + C′Y

It is seen therefore that the random variables X₁, ..., Xₙ can be expressed, with probability 1, as linear combinations of the r uncorrelated random variables Y₁, ..., Yᵣ. Since each Yᵢ, 1 ≤ i ≤ r, is a linear combination of X₁, ..., Xₙ, each Yᵢ, 1 ≤ i ≤ r, is normally distributed, and the Yᵢ are independent.

Theorem 2. If n random variables are distributed normally with the covariance matrix of rank r, then they can be expressed as linear combinations of r independent and normally distributed random variables with probability 1.
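Theorem 2 is easy to see in coordinates. A sketch, assuming NumPy and an illustrative rank-1 covariance: all samples of X = m + C′Y then fall on a line through m.

    import numpy as np

    rng = np.random.default_rng(0)
    a = np.array([[1.0], [2.0]])
    Lam = a @ a.T                        # rank 1, positive semidefinite
    m = np.zeros(2)

    w, V = np.linalg.eigh(Lam)           # Lam = V diag(w) V', V orthogonal
    C = V.T                              # Y = C(X - m), Eq. (11)
    w = np.clip(w, 0.0, None)            # remove negative rounding residue

    y = rng.standard_normal((5, 2)) * np.sqrt(w)   # Var(y_j) = 0 where w_j = 0
    x = m + y @ C                        # X = m + C'Y, row by row
    print(x[:, 1] / x[:, 0])             # constant ratio 2: X lies on a line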
Appendix IV
Sufficient Statistics
INTRODUCTION
We have discussed in some detail, in Chapters II-IV, optimal control problems for a class of dynamical systems involving some random variables in the description of their environments, plants, and observation schemes. We have obtained optimal control policies for these problems by first computing the γ's, the conditional expected values of the criterion function, conditioned on the currently available information about the system and on the utilized control variables, then minimizing the γ's over the class of admissible control variables. In order to compute γₖ we needed the conditional probability density functions p(xₖ | yᵏ⁻¹, uᵏ⁻¹) or p(xₖ₋₁ | yᵏ⁻¹, uᵏ⁻¹). Also, in Chapter IV, in computing γₖ, we needed expressions for p(μₖ | νᵏ) and p(νₖ | νᵏ⁻¹), where μₖ and νₖ are the unobserved and observed portions of the Markov process {ξₖ}, ξₖ = (μₖ, νₖ). Generally, expressions for p(xₖ | yᵏ, uᵏ), p(μₖ | νᵏ), and p(νₖ | νᵏ⁻¹) are rather complicated functions of the observed data and employed controls. An optimal controller must remember all past observations and past controls, νᵏ or yᵏ and uᵏ, in order to synthesize the optimal control vector. Thus, the optimal controller generally needs a memory at time k which grows with time. For certain classes of systems, however, we have seen that it is possible to compute these conditional probability densities by knowing only a fixed and finite number of quantities tₖ(yᵏ, uᵏ⁻¹) of fixed dimensions. They are functions of the observed data (yᵏ, uᵏ⁻¹); i.e., for some problems, optimal control policies can be synthesized by knowing the values of only a finite fixed number of functions of the observed data, thus eliminating the need for a growing memory.
Random variables which are functions of observed realizations (i.e., samples) of another random variable are called statistics. When statistics carry with them all information about the probability distribution function that can possibly be extracted by studying observed data, they are called sufficient statistics. Thus, we can realize optimal control policies with controllers of finite memory capacity if sufficient statistics exist for the problems. See, for example, Section 3 of Chapter II, Section 5 of Chapter III, and Section 2.B of Chapter IV.

SUFFICIENT STATISTICS
A formal definition of sufficient statistics for random variables with probability density functions is as follows. Let zⁿ be a random sample with the probability density function p(zⁿ; θ) which depends on a parameter θ ∈ Θ. A statistic T₁ = t₁(zⁿ) (a real-valued function) is called a sufficient statistic for θ if and only if, for any other real-valued statistics T₂, ..., Tₙ such that the Jacobian is not identically zero, the conditional probability density function p(t₂, ..., tₙ | t₁) of T₂, ..., Tₙ given T₁ = t₁ is independent of θ. Namely, not only does θ not appear in p(t₂, ..., tₙ | t₁), but also the domain of p(t₂, ..., tₙ | t₁) does not depend on θ. A vector-valued sufficient statistic is similarly defined as a finite collection of real-valued sufficient statistics.

The above definition is somewhat inconvenient to apply, since one must test the conditional density functions of all statistics for dependence on θ. We have a criterion, called the Fisher-Neyman criterion or factorization theorem, which is much more convenient in practice for testing whether given statistics are sufficient or not. We state the theorem for the case when the probability density function exists and when its domain is independent of θ.
Factorization Theorem. T is a sufficient statistic for θ if and only if it is possible to factor the joint probability density function as

p(zⁿ; θ) = g(zⁿ) h(T, θ)

where g does not involve θ.

Therefore, when a sufficient statistic T exists, an optimal controller needs to remember only T, and the problem of growing memories does not arise. We will now consider an example to illustrate the above discussion. In Section 4, the implications of the existence of sufficient statistics on controller memory requirements are further considered.
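The control-theoretic content of the theorem is the memory bound: a controller can replace the growing record yᵏ by a recursively updated statistic of fixed dimension. A schematic sketch, assuming NumPy, where the illustrative pair (running sum, count) stands in for tₖ:

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.array([0.0, 0.0])               # t_k = (running sum, sample count)
    for k in range(100):
        y_k = 0.7 + rng.standard_normal()  # new observation; true mean 0.7
        t += (y_k, 1.0)                    # constant-size update; y_k may be discarded
    print(t[0] / t[1])                     # estimate of the mean from t_k alone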
EXAMPLE
Consider a sample of size 2, z² = (z₁, z₂), where z₁ and z₂ are independent Gaussian random variables with unknown mean θ and known variance 1. Then

t₁ = z₁ + z₂

is a sufficient statistic for θ. This can be seen, for example, by directly applying the definition. Consider any statistic t₂ = f(z₁, z₂) such that z₁ and z₂ are expressed by

z₁ = k₁(t₁, t₂),  z₂ = k₂(t₁, t₂)

and the Jacobian is nonzero. Then, by writing the density function for z²,

p(t₁, t₂; θ) = p(k₁(t₁, t₂), k₂(t₁, t₂); θ) |J|
             = (|J|/2π) exp(−[(t₁ − 2θ)²/4 + (t₁² − 4k₁(t₁, t₂)k₂(t₁, t₂))/4])

where J is independent of θ. Since

p(t₁; θ) = (4π)^{−1/2} exp(−(t₁ − 2θ)²/4)

the conditional density of t₂ given t₁ becomes

p(t₂ | t₁; θ) = (|J|/π^{1/2}) exp(−(t₁² − 4k₁k₂)/4)

which is independent of θ. Therefore t₁ is a sufficient statistic for θ. Actually, that t₁ is a sufficient statistic for θ can be seen much more directly by applying the Fisher-Neyman criterion, by writing

p(z₁, z₂; θ) = g(t₁, θ) h(z₁, z₂)

where

g(t₁, θ) = (2π)^{−1/2} exp(−(t₁ − 2θ)²/4)
h(z₁, z₂) = (2π)^{−1/2} exp(−(z₁ − z₂)²/4)

Other examples are found in Hogg and Craig.⁷³
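The θ-dependence of the joint density in this example enters only through t₁, which is what sufficiency means operationally: two samples sharing the same t₁ have a likelihood ratio free of θ. A quick check, assuming NumPy (density is our helper):

    import numpy as np

    def density(z, theta):
        # joint density p(z1, z2; theta) of two independent N(theta, 1) samples
        return np.exp(-0.5 * np.sum((z - theta) ** 2)) / (2 * np.pi)

    za = np.array([0.3, 1.9])             # t1 = 2.2
    zb = np.array([1.0, 1.2])             # a different sample with the same t1
    for theta in (-1.0, 0.0, 2.5):
        print(density(za, theta) / density(zb, theta))   # the same value each time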
SELF-REPRODUCING DISTRIBUTION FUNCTION
One of the extraordinary features of Gaussian random variables is that the transformations of a priori probability distribution functions by the Bayes rule into a posteriori distributions preserve the normal forms of the probability distribution functions when the plant and the observation equations are linear in the random variables. A normal distribution function is completely specified by its mean and covariance matrix. See, for example, Appendix III. This is the reason why the controllers need remember at any time i only two quantities, μᵢ and Γᵢ, in the examples of Section 4, Chapter II, and the controllers can compute the next set of numbers μᵢ₊₁ and Γᵢ₊₁ given a new set of data yᵢ₊₁ and uᵢ. Unfortunately, not all probability distribution functions share this property of "reproducing" the form of the a priori distribution function in the a posteriori distribution function. If the a posteriori distribution functions have the same form as the a priori distribution functions, then only a set of parameter values need be determined to specify the particular distribution function. Since it is infinitely easier to specify a set of numbers than a function, one sees the advantage of choosing a priori probability distribution functions which reproduce in Bayesian optimization of control problems. See Spragins¹²⁸,¹²⁹ for detail. We have already mentioned that normal distributions have the self-reproducing property. As another example, consider random variables y such that
yᵢ = 1  with probability θ
   = 0  with probability 1 − θ

and where

p₀(θ) = [Γ(a + b + 2)/(Γ(a + 1)Γ(b + 1))] θᵃ(1 − θ)ᵇ,  0 < θ < 1

which is a Beta distribution. Then in the sequence of n independent observations (y₀, ..., yₙ₋₁), if 1 is observed r times, then

p(θ | yⁿ⁻¹) = [Γ(n + a + b + 2)/(Γ(a + r + 1)Γ(n + b − r + 1))] θ^{a+r}(1 − θ)^{n+b−r}      (1)

which is also a Beta distribution. As already mentioned, what makes this class of probability distribution functions very attractive in adaptive control system analysis is the fact that the effect of learning can be summarized by a finite number of parameter values, and the a posteriori distributions are determined by
specifying the parameter values; for example, by a + r and n + b − r in (1). If p₀(θ) is of the nonreproducing type, then the form of the distributions is not preserved by the Bayes rule, and actual implementation or computation of the learning process may become very elaborate and not feasible, because the record of all past observations must be carried to obtain the a posteriori distributions. In the reproducing-type distributions, the effect of past observations is expressed compactly as the parameter values of the a posteriori distribution, whose functional form remains the same as that of the a priori distribution. Thus, the amount of data needed by the system to specify the a posteriori distributions remains constant and does not grow with the number of observations. Therefore, in obtaining a posteriori distributions, it is not necessary to retain all past observations. It suffices to obtain only a certain fixed number of functions of the observations as parameter values. Thus, the parameter values in the a posteriori distribution are equivalent to the past observations. One suspects, therefore, that there must be a close connection between the class of distribution functions with the self-reproducing property and the existence of a finite number of sufficient statistics for such distribution functions, because sufficient statistics, when they exist, serve to pick a particular distribution out of the class, and any member of the class of self-reproducing distribution functions is specified by a set of parameters. Spragins¹²⁸ showed that, under certain assumptions, distribution functions F(x | θ) reproduce themselves if and only if a sufficient statistic for θ of fixed dimension exists. We now present heuristic arguments for this fact.

Let y₀, ..., yₙ be a set of observations, and let them have a joint density function with the unknown parameter θ, with p₀(θ) as its a priori density function. By the Bayes rule,
p(θ | yⁿ) = p₀(θ) p(yⁿ | θ) / ∫_Θ p₀(θ) p(yⁿ | θ) dθ
          = [p(yⁿ | θ)/∫ p(yⁿ | θ) dθ] · p₀(θ) / ∫_Θ p₀(θ)[p(yⁿ | θ)/∫ p(yⁿ | θ) dθ] dθ      (2)
Note that the choice of an a priori distribution has effects only on the second factor of (2). Let a parameter tₙ be a sufficient statistic. Then the joint density can be written as

p(yⁿ | θ) = g(yⁿ) h(tₙ, θ)      (3)
The first factor of (2) then has the form

p(yⁿ | θ)/∫ p(yⁿ | θ) dθ = h(tₙ, θ)/∫ h(tₙ, θ) dθ

which has the same form for all n; only the value tₙ differs for each n. Thus, in (2),

p(θ | yⁿ) = [h(tₙ, θ)/∫ h(tₙ, θ) dθ] · p₀(θ) / ∫_Θ p₀(θ)[h(tₙ, θ)/∫ h(tₙ, θ) dθ] dθ

The normalized likelihood function

q(θ | yⁿ) ≜ p(yⁿ | θ)/∫ p(yⁿ | θ) dθ = h(tₙ, θ)/∫ h(tₙ, θ) dθ      (4)
reproduces itself under Bayes' rule. On the other hand, assume that the normalized likelihood is of the reproducing type. Then it has the form

q(θ | yⁿ) = q(θ, sₙ)

where sₙ is the parameter value through which q depends on the observations. Then, from (4),

p(yⁿ | θ)/∫ p(yⁿ | θ) dθ = q(θ, sₙ)

so that

p(yⁿ | θ) = q(θ, sₙ) k(yⁿ),  where  k(yⁿ) = ∫ p(yⁿ | θ) dθ

which shows, by the factorization theorem of sufficient statistics, that sₙ is a sufficient statistic. Therefore the class of self-reproducing probability distributions is the class of distributions with sufficient statistics.

Lastly, let us note that in the discussions of various convergence questions in Bayesian optimization problems involving incompletely known stochastic processes, if a priori distribution functions with the self-reproducing property are assumed for these stochastic processes, then the question of the a posteriori distributions for the stochastic processes converging to the true distribution is equivalent to that of the sufficient statistics in the a posteriori distributions for the unknown parameters converging to the true values of these parameters (with probability one).
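As a concrete instance of the equivalence just argued, the Beta prior of Eq. (1) reproduces under Bernoulli observations, and the pair (a + r, n + b − r) is the fixed-dimension sufficient statistic. A minimal sketch in pure Python (the parameter values are illustrative):

    import random

    random.seed(0)
    theta = 0.3                  # true success probability
    a, b = 1.0, 1.0              # a priori Beta parameters, as in p_0(theta)
    n = r = 0
    for _ in range(200):
        y = 1 if random.random() < theta else 0
        n, r = n + 1, r + y      # only (n, r) is retained, not the observations
    print(a + r, n + b - r)      # the a posteriori parameter values of Eq. (1)
    print((a + r + 1) / (n + a + b + 2))   # a posteriori mean, an estimate of theta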
Bibliography

ARIS, R., "Discrete Dynamic Programming." Random House (Blaisdell), New York, 1964.
BELLMAN, R., "Introduction to Matrix Analysis." McGraw-Hill, New York, 1960.
CHUNG, K. L., "Markov Chains with Stationary Transition Probabilities." Springer-Verlag, Berlin, 1960.
FEL'DBAUM, A. A., "Optimal Control Systems." Academic Press, New York, 1966.
HALMOS, P., "Finite Dimensional Vector Spaces." Van Nostrand, Princeton, New Jersey, 1958.
KEMENY, J. G., SNELL, J. L., and KNAPP, A. W., "Denumerable Markov Chains." Van Nostrand, Princeton, New Jersey, 1966.
TOU, J. T., "Modern Control Theory." McGraw-Hill, New York, 1965.
VARGA, R. S., "Matrix Iterative Analysis." Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
WILDE, D. J., "Optimum Seeking Methods." Prentice-Hall, Englewood Cliffs, New Jersey, 1964.

REFERENCES

1. AITCHISON, J., and BROWN, J. A. C., "The Lognormal Distribution." Cambridge Univ. Press, London and New York, 1957.
2. ASTROM, K. J., Optimal control of Markov processes with incomplete state information. J. Math. Anal. Appl. 10, 174-205 (1965).
3. AOKI, M., On the application of dynamic programming and numerical experimentation as applied to adaptive control systems. Tech. Rep. No. 60-16, Dep. of Eng., Univ. of California, Los Angeles (November 1959).
4. AOKI, M., On the optimal and suboptimal policies in the choice of control variables for final value control systems. IRE Intern. Conv. Record, Part IV, 15-22 (1960).
5. AOKI, M., Dynamic programming approach to the final value control system with a random variable having an unknown distribution function. IRE Trans. Auto. Control 5, No. 4, 270-282 (1960).
6. AOKI, M., Stochastic time-optimal control systems. Trans. Amer. Inst. Elec. Engrs. 80, Part II, 41-46 (1961).
7. AOKI, M., On minimum of maximum expected deviation from an unstable equilibrium position of a randomly perturbed control system. IRE PGAC AC-7, 1-12 (1962).
8. AOKI, M., Successive approximations in solving some control system optimization problems, II. J. Math. Anal. Appl. 5, No. 3, 418-434 (December 1962).
9. AOKI, M., On a successive approximation technique in solving some control system optimization problems, I. Trans. ASME Ser. D. J. Basic Eng. 85, 177-180 (June 1963).
10. AOKI, M., On the approximation of trajectories and its application to control systems optimization problems. J. Math. Anal. Appl. 9, No. 1, 23-41 (August 1964).
11. AOKI, M., On optimal and suboptimal control policies in control systems, "Advances in Control Systems," Vol. I, Chap. 1, C. T. Leondes, ed. Academic Press, New York, 1964.
12. AOKI, M., On performance losses in some adaptive control systems, I. Trans. ASME Ser. D. J. Basic Eng. 87, No. 1, 90-94 (March 1965).
13. AOKI, M., On some convergence questions in Bayesian optimization problems. IEEE Trans. Auto. Control 10, No. 2, 180-182 (April 1965).
14. AOKI, M., Optimal Bayesian and min.-max. control of a class of stochastic and adaptive dynamic systems. Proc. IFAC Symposium System Engineering for Control System Design, 77-84, Tokyo (August 1965).
15. AOKI, M., Optimal control of partially observable Markovian control systems. J. Franklin Inst. 280, No. 5, 367-386 (November 1965).
16. AOKI, M., Optimal control policies for dynamical systems whose characteristics change randomly at random times. Presented at Third Congress IFAC, London (June 1966).
17. AOKI, M., and HUDDLE, J. R., On estimation of the state vector of a stochastic system using a minimal order observer. Proc. Joint Auto. Control Conf., 694-702, Univ. Washington, Seattle (August 1966).
18. BALAKRISHNAN, A. V., A general theory of nonlinear estimation problems in control systems. J. Math. Anal. Appl. 8, No. 1, 4-30 (February 1964).
19. BAUM, E. K., Optimal control of long running stochastic systems. Res. Rep. No. PIBMRI-1220-1964, Polytech. Inst. of Brooklyn (June 1964).
20. BELLMAN, R., "Dynamic Programming." Princeton Univ. Press, Princeton, New Jersey, 1957.
21. BELLMAN, R., and KALABA, R., Dynamic programming and adaptive processes, mathematical foundation. IRE Trans. Auto. Control 5, 5-10 (January 1960).
22. BELLMAN, R., "Adaptive Control Processes: A Guided Tour." Princeton Univ. Press, Princeton, New Jersey, 1961.
23. BELLMAN, R., and DREYFUS, S., "Applied Dynamic Programming." Princeton Univ. Press, Princeton, New Jersey, 1962.
24. BELLMAN, R. E., KAGIWADA, H. H., KALABA, R. E., and SRIDHAR, R., Invariant imbedding and nonlinear filtering theory. RAND, RM-4374-PR (December 1964).
25. BERTRAM, J. E., and SARACHIK, P. E., On the stability of circuits with randomly varying parameters. IRE Trans. Inform. Theory 5, 260-270 (1959).
26. BERTRAM, J. E., Control by stochastic adjustment. Amer. Inst. Elec. Engrs. Paper 59-1156, Application and Industry (1959).
27. BEUTLER, F. J., The operator theory of the pseudo-inverse, I: Bounded operators. J. Math. Anal. Appl. 10, 1-11 (1965).
28. BHARUCHA-REID, A. T., On the theory of random equations. Proc. Symp. Appl. Math. 14, 40-69 (1964).
29. BLACKWELL, D., and GIRSHICK, M. A., "Theory of Games and Statistical Decisions." Wiley, New York, 1954.
30. BLACKWELL, D., On the functional equation of dynamic programming. J. Math. Anal. Appl. 2, 273-276 (1961).
31. BLACKWELL, D., and DUBINS, L., Merging of opinions with increasing information. Ann. Math. Statist. 33, No. 3, 882-886 (September 1962).
32. BLACKWELL, D., Discounted dynamic programming. Ann. Math. Statist. 36, 226-235 (1965).
33. BREAKWELL, J. V., A doubly singular problem in optimal interplanetary guidance. J. Soc. Ind. Appl. Math. Ser. A Control 3, No. 1, 71-77 (1965).
33a. BRYSON, A. E., and JOHANSEN, D. E., Linear filtering for time-varying systems using measurements containing colored noise. IEEE Trans. Auto. Control 10, No. 1, 4-10 (January 1965).
34. BOGDANOFF, J. L., and KOZIN, F., Moments of the output of linear random systems. J. Acoust. Soc. Amer. 34, No. 8, 1063-1066 (1962).
35. BUCY, R. S., Nonlinear filtering theory. IEEE Trans. PGAC 10, No. 2, 198 (April 1965).
36. BUCY, R. S., Stability and positive supermartingales. J. Differential Equations 1, No. 2, 151-155 (April 1965).
37. CAUGHEY, T. K., Comments on: On the stability of random systems. J. Acoust. Soc. Amer. 32, No. 10, 1356 (1960).
38. CAUGHEY, T. K., and DIENES, J. K., The behavior of systems with random parametric excitation. J. Math. Phys. 41, No. 4, 300-318 (1962).
39. CRAMER, H., "Mathematical Methods of Statistics." Princeton Univ. Press, Princeton, New Jersey, 1946.
40. COX, H., On the estimation of state variables and parameters for noisy dynamic systems. IEEE Trans. Auto. Control 9, No. 1, 5-12 (January 1964).
41. COX, H., Estimation of state variables via dynamic programming. Conference Paper, Proc. Joint Auto. Control Conf., 376-381, Stanford, California (1964).
42. DALY, R. F., Adaptive binary detectors. Tech. Rep. No. 2003-2, Stanford Electronics Labs., Stanford, California (June 1961).
43. DALY, R. F., The adaptive binary-detection problem on the real line. Tech. Rep. No. 2003-3, Stanford Electronics Labs., Stanford, California (February 1962).
44. DAVENPORT, W. B., Jr., and ROOT, W. L., "An Introduction to the Theory of Random Signals and Noise." McGraw-Hill, New York, 1958.
45. DELEY, G. W., and FRANKLIN, G. F., Optimal bounded control of linear sampled-data systems with quadratic loss. Trans. ASME Ser. D. J. Basic Eng. 87, No. 1, 135-141 (March 1965).
46. DETCHMENDY, D. M., and SRIDHAR, R., Sequential estimation of states and parameters in noisy nonlinear dynamical systems. Proc. Joint Auto. Control Conf., 56-63, Troy, New York (1965).
47. DESOER, C. A., and WHALEN, B. H., A note on pseudo-inverses. J. Soc. Ind. Appl. Math. 11, No. 2, 442-447 (June 1963).
47a. DOOB, J. L., "Stochastic Processes." Wiley, New York, 1953.
48. DRENICK, R. F., and SHAW, L., Optimal control of linear plants with random parameters. IEEE Trans. Auto. Control 9, No. 3, 236-244 (July 1964).
49. DREYFUS, S. E., Some types of optimal control of stochastic systems. J. Soc. Ind. Appl. Math. Ser. A Control 2, No. 1, 120-134 (1964).
50. DUBINS, L. E., and SAVAGE, L. J., "How to Gamble if You Must, Inequalities for Stochastic Processes." McGraw-Hill, New York, 1965.
51. DVORETZKY, A., On stochastic approximation. Proc. Third Berkeley Symposium on Mathematical Statistics and Probability 1, 39-55, Univ. of California Press, Berkeley, California (1956).
51a. DYNKIN, E. B., Controlled random sequences. Theor. Probability Appl. 10, No. 1, 1-14 (1965).
52. EATON, J. H., and ZADEH, L. A., Optimal pursuit strategies in discrete state probabilistic systems. Trans. ASME Ser. D. J. Basic Eng. 84, 23-29 (1961).
53. EATON, J. H., Discrete-time interrupted stochastic control processes. J. Math. Anal. Appl. 5, 287-305 (1962).
54. FARRISON, J. B., Identification and control of random-parameter discrete systems. Tech. Rep. No. 6302-4, System Theory Lab., Stanford Electronics Labs., Stanford, California (January 1964).
55. FEL'DBAUM, A. A., Theory of dual control, I, II, III, IV. Automat. Remote Control 21, No. 9, 1240-1249, No. 11, 1453-1464 (1960); 22, No. 1, 3-16, No. 2, 129-143 (1961).
56. FEL'DBAUM, A. A., On the optimal control of Markov objects. Automat. Remote Control 23, No. 8, 993-1007 (1962).
57. FEL'DBAUM, A. A., Optimal systems, "Disciplines and Techniques of System Control," Chap. VII, J. Peschon, ed. Random House (Blaisdell), New York, 1965.
58. FELLER, W., "An Introduction to Probability Theory and Its Applications," Vol. I, 2nd ed. Wiley, New York, 1957.
58a. FERGUSON, T., "Statistical Inference." Academic Press, New York, to appear in 1966.
59. FITZGERALD, R. J., A gradient method for optimizing stochastic systems. Paper presented at IEEE-OSA Symposium on Recent Advances in Optimization Techniques, Carnegie Inst. Tech., Pittsburgh, Pennsylvania (April 1965).
60. FLORENTIN, J. J., Optimal control of continuous time Markov stochastic systems. J. Electron. Control 10, No. 6, 473-488 (1961).
61. FLORENTIN, J. J., Partial observability and optimal control. J. Electron. Control 11, 263-279 (1962).
61a. FRIEDLAND, B., THAU, F. E., and SARACHIK, P. E., Stability problems in randomly excited dynamic systems. Proc. Joint Auto. Control Conf., 848-861, Univ. Washington, Seattle (August 1966).
62. FUKAO, T., Some fundamental properties of adaptive control processes, I. Bull. Electrotech. Lab. (Tokyo) 28, No. 1, 1-19 (in Japanese) (January 1964).
62a. FUKAO, T., System identification by Bayesian learning processes, I and II. Bull. Electrotech. Lab. (Tokyo) 29, No. 5, 364-380 (1960).
63. GALTIERI, C. A., Problems of estimation in discrete-time processes. Res. Paper RJ-315, IBM San Jose Res. Lab., San Jose, California (August 1964).
63a. GANTMACHER, F. R., "The Theory of Matrices," Vol. 1. Chelsea, New York, 1960.
64. GARDNER, L. A., Jr., "Adaptive Predictors." Trans. Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, 123-192, Publishing House of the Czechoslovak Academy of Sciences, Prague, 1964.
65. GREVILLE, T. N. E., Some applications of the pseudo-inverse of a matrix. Soc. Ind. Appl. Math. Rev. 2, No. 1, 15-22 (1960).
66. GRISHIN, V. P., On a calculation method related to a process of automatic adaptation. Automat. Remote Control 23, No. 12, 1502-1509 (1962).
67. GUNCKEL, T. L., III, and FRANKLIN, G. F., A general solution for linear sampled-data control. Trans. ASME Ser. D. J. Basic Eng. 85, 197-201 (1963).
68. HADLEY, G., "Linear Programming." Addison-Wesley, Reading, Massachusetts, 1962.
69. HADLEY, G., Nonlinear and dynamic programming, "Stochastic Programming," Chap. 5. Addison-Wesley, Reading, Massachusetts, 1964.
70. HAHN, W., "Theory and Application of Liapunov's Direct Method," translated by S. H. Lehnigk and H. H. Hosenthien. Prentice-Hall, Englewood Cliffs, New Jersey, 1963.
71. HANS, O., "Random Fixed Point Theorems," Trans. Prague Conference on Information Theory, Statistical Decision Functions, 105-125, Czechoslovak Academy of Sciences, Prague, 1956.
71a. HO, Y. C., The method of least squares and optimal filtering theory. RM-3329-PR, RAND Corporation, Santa Monica, California (October 1962).
72. HO, Y. C., and LEE, R. C. K., A Bayesian approach to problems in stochastic estimation and control. Proc. Joint Auto. Control Conf., 382-387, Stanford Univ., Stanford, California (June 1964).
72a. HO, Y. C., and WHALEN, B. H., An approach to the identification and control of linear dynamic systems with unknown parameters.
72b. HO, Y. C., and LEE, R. C. K., Identification of linear dynamic systems. Inform. Control 8, 93-110 (1965).
73. HOGG, R. V., and CRAIG, A. T., "Introduction to Mathematical Statistics." Macmillan, New York, 1959.
73a. HOROWITZ, E., Some suboptimal control policies in optimal stochastic control systems. M.S. Thesis, Dep. of Eng., Univ. of California, Los Angeles (June 1966).
74. HOUSEHOLDER, A. S., "Principles of Numerical Analysis." McGraw-Hill, New York, 1953.
74a. HOWARD, R., "Dynamic Programming and Markov Processes." Wiley, New York, 1960.
75. HOWARD, R. A., Dynamic inference. Tech. Rep. No. 10, Operations Res. Center, MIT, Cambridge, Massachusetts (December 1964).
76. HSU, J. C., and MESERVE, W. E., Decision-making in adaptive control systems. IRE Trans. Auto. Control 7, 24-32 (January 1962).
77. JAYNES, E. T., New engineering applications of information theory. Proc. 1960 Random Function Theory Symposium, 163-203, J. L. Bogdanoff and F. Kozin, eds. Held at Purdue University in 1960. Wiley, New York, 1963.
78. JOHANSEN, D. E., Optimal control of linear stochastic systems with complexity constraints. Tech. Rep., Appl. Res. Lab., Sylvania Electronic Systems, Division of Sylvania Electric Products, Inc., Waltham, Massachusetts.
79. JOHNS, M. V., Non-parametric Bayes procedures. Ann. Math. Statist. 28, No. 3, 649-669 (September 1957).
80. JOSEPH, P. D., and TOU, J. T., On a linear control theory. Trans. Amer. Inst. Elec. Engrs. 80, Part II, 193-196 (September 1961).
81. JOSEPH, P. D., Suboptimal linear filtering. Tech. Rep., Space Technology Labs., Inc. (December 1963).
82. JOSEPH, P. D., On board navigation for rendezvous missions. Lecture Note for 2-wk Summer Course, Univ. of California Los Angeles, Engineering Extension, Los Angeles, California (1964).
83. KALLIANPUR, G., A problem in optimum filtering with finite data. Ann. Math. Statist. 30, 659-669 (1959).
84. KALMAN, R. E., New methods and results in linear prediction and filtering theory. Tech. Rep. No. 61-1, RIAS, Baltimore, Maryland (1960).
85. KALMAN, R. E., A new approach to linear filtering and prediction problems. Trans. ASME Ser. D. J. Basic Eng. 82, 35-45 (1960).
86. KALMAN, R. E., and BUCY, R. S., New results in linear filtering and prediction theory. Trans. ASME Ser. D. J. Basic Eng. 83, 95-108 (March 1961).
87. KALMAN, R. E., On the general theory of control systems. Proc. First IFAC Congress, Butterworth, London and Washington, D.C. (1961).
88. KALMAN, R. E., ENGLAR, T. S., and BUCY, R. S., Fundamental study of adaptive control systems. Tech. Rep. No. ASD-TR-61-27, Vol. 1, RIAS (April 1962).
89. KALMAN, R. E., HO, Y. C., and NARENDRA, K. S., Controllability of linear dynamical systems. Contributions to Differential Equations 1, No. 2, 189-213 (1963).
89a. KALMAN, R. E., Contributions to the theory of optimal control. Bol. Soc. Matematica Mexicana, 102-119 (1960).
89b. KARLIN, S., Pólya type distributions, II. Ann. Math. Statist. 28, 281-308 (1957).
90. KATS, I. I., and KRASOVSKII, N. N., On the stability of systems with random parameters. Appl. Math. Mech. (PMM) 24, 809-823 (1960).
91. KOLMOGOROV, A. N., and FOMIN, S. V., "Functional Analysis": Vol. 1, "Metric and Normed Spaces"; Vol. 2, "Measure, Lebesgue Integral, Hilbert Space," English translation. Graylock Press, Rochester, New York, 1957.
91a. KOOPMANS, T. C., RUBIN, H., and LEIPNIK, R. B., Measuring the equation systems of dynamic economics, in "Statistical Inference in Dynamic Economic Models" (T. C. Koopmans, ed.), Chap. II. Wiley, New York, 1950.
92. KRASOVSKII, N. N., and LIDSKII, E. A., Analytical design of controls in systems with random properties, I, II, III. Automat. Remote Control 22, No. 9, 1021-1025, No. 10, 1141-1146, No. 11, 1289-1294 (1961).
93. KRASOVSKII, N. N., Application of Liapunov's second method to differential systems and equations with delay, "Stability of Motion," translated by J. L. Brenner. Stanford Univ. Press, Stanford, California, 1963.
94. KOZIN, F., On almost sure stability of linear systems with random coefficients. J. Math. Phys. 42, No. 1, 59-67 (1963).
95. KUSHNER, H. J., On the optimum timing of observations for linear control systems with unknown initial state. IEEE Trans. Auto. Control 9, No. 2, 144-150 (April 1964).
96. KUSHNER, H. J., On the stability of stochastic dynamical systems. Proc. Nat. Acad. Sci. U.S.A. 53, No. 1, 8-12 (January 1965).
97. KUSHNER, H. J., New theorems and examples in the Liapunov theory of stochastic stability. Proc. Joint Auto. Control Conf., 613-619, Rensselaer Polytech. Inst., Troy, New York (1965).
98. LANING, J. H., Jr., and BATTIN, R. H., "Random Processes in Automatic Control." McGraw-Hill, New York, 1956.
99. LA SALLE, J., and LEFSCHETZ, S., "Stability by Liapunov's Direct Method with Applications." Academic Press, New York, 1961.
100. LEE, R. C. K., Optimal estimation, identification and control. Research Monograph No. 28, MIT Press, Cambridge, Massachusetts (1964).
100a. LEFSCHETZ, S., "Stability of Nonlinear Control Systems." Academic Press, New York, 1965.
101. LINNIK, Y. V., "Method of Least Squares and Principles of the Theory of Observations," English translation. Pergamon Press, New York, 1961.
102. LOEVE, M., "Probability Theory." Van Nostrand, Princeton, New Jersey, 1960.
103. LUENBERGER, D. G., Observing the state of a linear system. IEEE Trans. Military Electron. 8, No. 2, 74-80 (April 1964).
104. MEDITCH, J. S., Suboptimal linear filtering for continuous dynamic processes. Tech. Rep., Aerospace Corp. Contract No. AF04(695)-469 (July 15, 1964).
104a. MEDITCH, J. S., A class of suboptimal linear controls. Proc. Joint Auto. Control Conf., 776-782, Univ. Washington, Seattle (August 1966).
104b. MERRIAM, C. W., III, "Optimization Theory and the Design of Feedback Control Systems." McGraw-Hill, New York, 1964.
105. MAGILL, D. T., Optimal adaptive estimation of sampled stochastic processes. IEEE Trans. Auto. Control 10, No. 4, 434-439 (October 1965).
105a. MEIER, L., III, Combined optimum control and estimation theory. Tech. Rep. No. NAS2-2457, Stanford Res. Inst., Menlo Park, California (October 1965).
106. MILLER, K. S., "Multidimensional Gaussian Distributions," SIAM Series in Applied Mathematics. Wiley, New York, 1964.
107. MORTENSEN, R. E., A note on polar decomposition and the generalized inverse of an arbitrary matrix, "Notes on System Theory," Vol. VI. Electronics Res. Lab., Univ. of California, Berkeley, California, April 1964.
108. ODANAKA, T., Minimization of the probability of the maximum deviation, in Japanese. Proc. 7th Joint Auto. Control Conf. of Japan, Univ. of Nagoya, Japan (October 1964).
109. ORFORD, R. J., Optimal stochastic control systems. J. Math. Anal. Appl. 6, 419-429 (1963).
109a. ORR, R. E., Optimal linear discrete filtering. M.S. Thesis, Dep. of Eng., Univ. of California, Los Angeles (1964).
109b. PAPOULIS, A., "Probability, Random Variables and Stochastic Processes." McGraw-Hill, New York, 1965.
109c. PESCHON, J., MEIER, L., III, LARSON, R. E., FOY, W. H., Jr., and DAWSON, C. H., Information requirements for guidance and control systems. Tech. Rep. No. NAS2-2457, Stanford Res. Inst., Menlo Park, California (November 1965).
110. PENROSE, R., A generalized inverse for matrices. Proc. Cambridge Phil. Soc. 51, 406-413 (1955).
111. PENROSE, R., On best approximate solutions of linear matrix equations. Proc. Cambridge Phil. Soc. 52, 17-19 (1956).
112. PENTECOST, E. E., and STUBBERUD, A. R., Synthesis of computationally efficient sequential linear estimators. To appear in IEEE Trans. Aerospace Electron. Systems.
113. PFEIFFER, C. G., A dynamic programming analysis of multiple guidance corrections of a trajectory. A.I.A.A. Journal 3, No. 9, 1674-1681 (September 1965).
114. PONTRYAGIN, L. S., et al., "The Mathematical Theory of Optimal Processes," English translation. Wiley (Interscience), New York, 1962.
114a. PYKE, R., Stationary probabilities for a semi-Markov process with finitely many states (abstract). Ann. Math. Statist. 31, No. 1, 240 (March 1960).
115. RAIFFA, H., and SCHLAIFER, R., "Applied Statistical Decision Theory." Harvard Business School, Boston, Massachusetts, 1961.
116. RAVIV, J., Decision making in incompletely known stochastic systems. Rep. 64-25, Electronic Res. Lab., Univ. of California, Berkeley, California (July 1964).
117. ROBBINS, H., An empirical Bayes approach to statistics. Proc. Third Berkeley Symposium on Math. Statist. Prob. 1, 157-164, Univ. of California, Berkeley, California (1955).
118. ROBBINS, H., The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35, No. 1, 1-20 (March 1964).
119. ROSENBROCK, H. H., The foundation of optimal control, with an application to large systems. Automatica 1, 263-288 (December 1963).
120. ROSENBROCK, H. H., An example of optimal adaptive control. J. Electron. Control, 557-567 (1964).
121. SAMUELS, J. C., On the mean square stability of random linear systems. IRE Trans. Inform. Theory 5, 248-259 (May 1959).
122. SAMUELS, J. C., and ERINGEN, A. C., On stochastic linear systems. J. Math. Phys. 38, No. 2, 83-103 (1959).
123. SAMUELS, J. C., On the stability of random systems and the stabilization of deterministic systems with random noise. J. Acoust. Soc. Amer. 32, No. 5, 594-601 (1960).
124. SCHULTZ, P. R., An optimal control problem with state vector measurement error, "Advances in Control Systems," Chap. V, Vol. I, C. T. Leondes, ed. Academic Press, New York, 1964.
125. SORENSON, H. W., On the controllability and observability of optimal stochastic linear control systems. Tech. Memorandum LAS-3329, A. C. Electronics Division, General Motors (January 1966).
125a. SORENSON, H. W., Nonlinear perturbation theory for estimation and control of time discrete stochastic systems. Ph.D. Thesis, Dep. of Eng., Univ. of California, Los Angeles (1966).
126. SPANG, H. A., III, Optimum control of an unknown linear plant using Bayesian estimation of the error. Proc. Nat. Electron. Conf. 20, 620-625 (1964).
127. SPANG, H. A., III, The effects of estimation error on the control of an unknown linear plant. Rep. No. 65-RL-3907E, General Electric Res. Lab., Schenectady, New York (April 1965).
128. SPRAGINS, J. D., Reproducing distributions for machine learning. TR 6103-7, Stanford Electronics Labs., Stanford, California (November 1963).
129. SPRAGINS, J. D., A note on the iterative application of Bayes' rule. IEEE Trans. Inform. Theory 11, No. 4, 544-549 (October 1965).
130. STRATONOVICH, R. L., Conditional Markov processes. Theor. Probability Appl. 5, No. 2, 156-178 (1960).
131. STRAUCH, R. E., Negative dynamic programming. Ph.D. Thesis in Statistics, Univ. of California, Berkeley, California (1965).
132. SUSSMAN, R., Optimal control of systems with stochastic disturbances. Rep. AF-AFOSR 139-63, Electronics Res. Lab., Univ. of California, Berkeley, California (November 1963).
133. SWORDER, D. D., Synthesis of optimal, discrete time, adaptive control systems. Ph.D. Thesis, Dep. of Eng., Univ. of California, Los Angeles, California (June 1964).
134. SWORDER, D. D., Minmax control of discrete time stochastic systems. J. Soc. Ind. Appl. Math. Ser. A Control 2, No. 3, 433-449 (1964).
135. SWORDER, D. D., Control of a linear system with a Markov property. IEEE Trans. Auto. Control 10, No. 3, 294-300 (July 1965).
135a. SWORDER, D. D., A study of the relationship between identification and optimization in adaptive control problems. J. Franklin Inst. 281, No. 3, 198-213 (March 1966).
136. SWORDER, D. D., and AOKI, M., On the control system equivalents of some decision theoretic theorems. J. Math. Anal. Appl. 10, No. 2, 424-438 (1965).
136a. THEIL, H., A note on certainty equivalence in dynamic planning. Econometrica 25, No. 2, 346-349 (April 1957).
137. TRUXAL, J. G., and PADALINO, J. J., Decision theory, "Adaptive Control Systems," Chap. 15, Mishkin and Braun, eds. McGraw-Hill, New York, 1961.
138. ULA, N., and KIM, M., An empirical Bayes approach to adaptive control. J. Franklin Inst. 280, No. 3, 189-204 (September 1965).
139. VAJDA, S., "Mathematical Programming." Addison-Wesley, Reading, Massachusetts, 1961.
140. WATANABE, S., Information theoretic aspects of inductive and deductive inference. IBM J. Res. Develop., 208-231 (April 1960).
141. WONHAM, W. M., Stochastic problems in optimal control. RIAS Tech. Rep. 63-14 (May 1963).
141a. YAKUBOVITCH, V. A., The matrix inequality method in the theory of nonlinear control systems, I. Automat. Remote Control 25, No. 7, 1017-1029 (July 1964).
142. ZADEH, L. A., and DESOER, C. A., "Linear System Theory: The State Space Approach." McGraw-Hill, New York, 1963.
143. ZAMES, G., On the stability of nonlinear time-varying feedback systems. Proc. Natl. Electron. Conf. 20, 725 (October 1964).
LIST OF SYMBOLS

A⁺   Pseudoinverse of A
A > B   A − B is positive definite
A ≥ B   (A − B) is nonnegative definite
Aₖ, Fₖ   Plant matrix
Bₖ, Gₖ   Control matrix
cₖ   Actual output of a system
dₖ   Desired output of a system
d(x, y)   = dx dy
e   In general, denotes error vector
E(·)   Expectation of
Hₖ   Observation matrix
J   Criterion function
Jₑ   Criterion function of estimation problems
|J|   Jacobian determinants
Kₖ   Gain of Kalman filter
F(·)   Distribution function
N   Total duration of control process
N(θ, σ²)   Normal distribution function with mean θ and variance σ²
𝒩   Null space
p(·)   Probability density function
p(· | ·)   Conditional probability density function
Prob(·)   Probability
qₖ   = E(Wₖ)
ℛ   Range space
s   Sufficient statistics
tr   Trace of a matrix
uᵢ   Control variable at ith time instant
uⁱ   = (u₀, u₁, ..., uᵢ)
Uᵢ   Admissible set of control at ith time instant
Var(·)   Variance of
Wₖ(xₖ, uₖ₋₁)   Contribution to J from kth time instant
xᵢ   State variable of a system at ith time instant
xⁱ   = (x₀, x₁, ..., xᵢ)
xᵢ*   = E(xᵢ | yⁱ)
x̃ᵢ   = xᵢ − xᵢ*
x̂ᵢ   = E(xᵢ | yⁱ⁻¹)
x̄ₖ, ζₖ   Augmented state vector
yᵢ   Observed value of xᵢ
yⁱ   = (y₀, ..., yᵢ)
Yᵢ   The set on which the observed value is taken at ith time instant
α   Unknown or random parameter in the system plant equation
β   Unknown or random parameter in the system observation equation
γₖ   = λₖ + ∫ γ*ₖ₊₁ p(yₖ | yᵏ⁻¹, uᵏ⁻¹) dyₖ
Γᵢ   Covariance matrix associated with μᵢ
Γᵢ, Σᵢ, Pᵢ, Qₖ, Rₖ, Sₖ, Λᵢ   Usually refer to covariance matrices
ηᵢ   Random disturbance in the observation equation at ith time instant
θ   Mean of random variable or random parameters
Θ, Θ₁, Θ₂, etc.   Known parameter spaces
λₖ   = E(Wₖ | yᵏ⁻¹, uᵏ⁻²)
μᵢ   = E(xᵢ | yⁱ)
ξᵢ   Random disturbance in the plant equation at ith time instant
πᵢ   = p(uᵢ | yⁱ, uⁱ⁻¹)
σ²   Variance of a random variable
*   Used as a superscript, the asterisk generally indicates optimality in the sense of the context
ˆ   A caret over a symbol generally indicates the estimated value
¯   A bar over a symbol generally indicates the expected value
≜   Equality by definition
′   A prime indicates the transpose of a vector or a matrix
‖·‖ᵥ   = (·, V·)
‖·‖   Euclidean norm of a vector
(·, ·)   Inner product
⊕   Direct sum
Author Index
Numbers in parentheses are reference numbers and indicate that an author's work is referred to although his name is not cited in the text. Numbers in italic show the page on which the complete reference is listed.

Aitchison, J., 339
Aoki, M., 3(3), 20(14, 15), 21(15, 16), 25(3), 36(11), 81(15), 116(8), 118(5), 129(14-16), 198(12, 136), 223(4, 5, 9, 11), 224(5, 17), 228(5), 229(5), 250, 252, 254(17), 255(17), 291(6), 298(7), 301(16), 303, 339, 340, 346
Aris, R., 339
Astrom, K. J., 3(2), 339
Balakrishnan, A. V., 195, 340
Battin, R. H., 2(98), 38(98), 344
Baum, E. K., 301, 340
Bellman, R. E., 2(20-22), 3(21, 22), 8(20), 32(20), 116(20), 155, 163(20, 22), 173(22), 223(20), 301, 308(20), 339, 340
Bertram, J. E., 284, 340
Beutler, F. J., 319, 340
Bharucha-Reid, A. T., 340
Blackwell, D., 3(29), 18(29), 204, 206(31), 207(31), 299, 300(29), 301, 340
Bogdanoff, J. L., 282, 341
Breakwell, J. V., 72, 192(33), 340
Brown, J. A. C., 339
Bryson, A. E., 69(33a), 340
Bucy, R. S., 61(86), 74(86), 156(86), 209(88), 212(88), 217(88), 218(88), 284, 341, 343
Caughey, T. K., 282, 341
Cox, H., 173(40, 41), 341
Craig, A. T., 21(73), 32(73), 34(73), 53(73), 137(73), 168(73), 334(73), 335, 343
Cramer, H., 160(39), 168(39), 171(39), 194(39), 195(39), 325, 341
Daly, R. F., 204, 341
Davenport, W. B., Jr., 2(44), 341
Dawson, C. H., 345
Deley, G. W., 4(45), 341
Desoer, C. A., 75(142), 76(142), 156(142), 209(142), 212(142), 253(142), 265(142), 270(142), 319(47, 142), 321(142), 341, 346
Detchmendy, D. M., 195(46), 341
Dienes, J. K., 282, 341
Doob, J. L., 199(47a), 203(47a), 206(47a), 285(47a), 288, 290(47a), 309, 341
Drenick, R. F., 301, 341
Dreyfus, S. E., 3, 10(49), 23, 52(49), 224(49), 241(49), 340, 341
Dubins, L. E., 204, 206(31), 207(31), 301, 340, 341
Dvoretzky, A., 222(51), 223(51), 341
Dynkin, E. B., 131(51a), 341
Eaton, J. H., 72, 301, 341
Englar, T. S., 209(88), 212(88), 217(88), 218(88), 343
Eringen, A. C., 282(122), 345
Farrison, J. B., 3(54), 80, 341
Fel'dbaum, A. A., 1(57), 3(55), 20, 23, 120, 129(56), 341, 342
Feller, W., 4(58), 27(58), 342
Ferguson, T., 19(58a), 299, 300, 342
Fitzgerald, R. J., 223(59), 342
Florentin, J. J., 3(60, 61), 342
Fomin, S. V., 2(91), 343
Foy, W. H., Jr., 345
Franklin, G. F., 4(45), 20(67), 37(67), 341, 342
Friedland, B., 282(61a), 342
Fukao, T., 222(62, 62a), 223(62), 229, 342
Galtieri, C. A., 223(63), 342
Gantmacher, F. R., 40, 342
Gardner, L. A., Jr., 223(64), 342
Girshick, M. A., 3(29), 18(29), 299, 300(29), 340
Greville, T. N. E., 319(65), 342
Grishin, V. P., 118, 342
Gunckel, T. L., III, 20(67), 37(67), 342
Hadley, G., 223(68, 69), 342
Hahn, W., 282, 342
Halmos, P., 339
Hans, O., 223(71), 342
Ho, Y. C., 155(71a), 168(72a), 209(89), 212(89), 217(89), 222(72a), 342, 343
Hogg, R. V., 21(73), 32(73), 34(73), 53(73), 137(73), 168(73), 334(73), 335, 343
Horowitz, E., 35(73a), 115, 343
Householder, A. S., 80, 343
Howard, R. A., 301(75), 302, 343
Hsu, J. C., 343
Huddle, J. R., 224(17), 252, 254(17), 255(17), 340
Jaynes, E. T., 198(77), 343
Johansen, D. E., 36(78), 69(33a), 251, 340, 343
Johns, M. V., 298(79), 343
Joseph, P. D., 20(80), 37(80), 180(82), 224(81, 82), 247, 250(81), 251(81), 343
Kagiwada, H. H., 163(24), 340
Kalaba, R. E., 2(21), 3(21), 163(24), 340
Kallianpur, G., 204, 343
Kalman, R. E., 61(84-86), 64(89a), 74(85, 86), 156(85, 86), 209(87-89, 89a), 212, 217, 218, 343
Kats, I. I., 284, 343
Kim, M., 198(138), 346
Kolmogorov, A. N., 2(91), 343
Koopmans, T. C., 102(91a), 168(91a), 344
Kozin, F., 282(34), 341, 344
Krasovskii, N. N., 282, 284, 343, 344
Kushner, H. J., 72, 284, 288, 344
Laning, J. H., Jr., 2(98), 38(98), 344
Larson, R. E., 345
La Salle, J., 282, 344
Lee, R. C. K., 155, 168(100), 209(100), 222(100), 342, 343, 344
Lefschetz, S., 282, 344
Leipnik, R. B., 102(91a), 168(91a), 344
Lidskii, E. A., 344
Linnik, Y. V., 155(101), 344
Loeve, M., 199(102), 206(102), 309, 344
Luenberger, D. G., 224(103), 253, 256(103), 344
Magill, D. T., 223(105), 344
Meditch, J. S., 224(104, 104a), 250(104, 104a), 344
Meier, L., III, 3(105a), 20(105a), 129(105a), 344, 345
Merriam, C. W., III, 242(104b), 344
Meserve, W. E., 343
Miller, K. S., 325, 344
Mortensen, R. E., 319, 344
Narendra, K. S., 209(89), 212(89), 217(89), 343
Odanaka, T., 291, 344
Orford, R. J., 301, 345
Orr, R. E., 34(109a), 182, 345
Padalino, J. J., 346
Papoulis, A., 74(109b), 156(109b), 345
Penrose, R., 319(110, 111), 345
Pentecost, E. E., 224(112), 250(112), 269, 345
Peschon, J., 345
Pfeiffer, C. G., 291, 345
Pontryagin, L. S., et al., 2(114), 345
Pyke, R., 303(114a), 345
Raiffa, H., 3(115), 345
Raviv, J., 204, 345
Robbins, H., 198(117, 118), 298(118), 345
Root, W. L., 2(44), 341
Rosenbrock, H. H., 66(119), 67(119), 302, 345
Rubin, H., 102(91a), 168(91a), 344
Samuels, J. C., 282, 345
Sarachik, P. E., 282(61a), 284, 340, 342
Savage, L. J., 301, 341
Schlaifer, R., 3(115), 345
Schultz, P. R., 3(124), 345
Shaw, L., 301, 341
Sorenson, H. W., 220, 345
Spang, H. A., III, 224(126, 127), 242, 245, 346
Spragins, J. D., 169(128, 129), 198(128), 223(129), 336, 337, 346
Sridhar, R., 163(24), 195(46), 340, 341
Stratonovich, R. L., 3, 129(130), 346
Strauch, R. E., 301, 346
Stubberud, A. R., 224(112), 250(112), 269(112), 345
Sussman, R., 3(132), 346
Sworder, D. D., 3(133), 19(133), 115(135a), 129(135), 198(134), 299, 300, 346
Thau, F. E., 282(61a), 342
Theil, H., 10(136a), 52(136a), 346
Tou, J. T., 20(80), 37(80), 339, 343
Truxal, J. G., 346
Ula, N., 198(138), 346
Vajda, S., 223(139), 346
Watanabe, S., 346
Whalen, B. H., 168(72a), 222(72a), 319(47), 341, 343
Wilde, D. J., 339
Wonham, W. M., 3(141), 61(141), 69(141), 346
Yakubovitch, V. A., 282(141a), 346
Zadeh, L. A., 75(142), 76(142), 156(142), 209(142), 212(142), 253(142), 265(142), 270(142), 301, 319(142), 321(142), 341, 346
Zames, G., 282(143), 346
Subject Index
Adaptive systems (See Chapter III), 3
  definition of, 10
Approximate estimation method
  by minimal order observer (See VII.4)
  by state vector partition method (See VII.5)
Approximately optimal adaptive control policy (See VII.1)
Approximation
  in policy space, 116
  of adaptive control policy by stochastic control policy (See VII.1)
  of closed-loop control policy by open-loop feedback control policy (See VII.2)
Assumption Y, 133
Asymptotic stability, 285
Augmented state vectors, 38, 131
Bayes formula, 5
Bayesian control policy, 3, 18
  definition of, 300
Bayesian estimation method (See II.3 and Chapter V)
  for control systems, 179
  for dynamic systems, 176
  for static systems, 174
Beta distribution function, 118
Certainty equivalence principle
  application of, 10, 13, 52
  definition of, 52
  modified certainty equivalence principle, 52
Chain rule, 5
Chebychev inequality, 315
Completion of squares, 73, 195
Conditional expectation, 8, 30, 87
  definition of, 312
Control policy
  Bayesian, 3, 18, 300
  closed-loop, 3, 11, 22
  completeness of, 198
  equalizer, 19, 299, 300
  extended Bayes, 300
  min-max, 18, 19, 299
  non-randomized, 23, 29, 85
  one-stage optimal, 8, 113
  open-loop feedback, 241
  randomized, 23, 300
Control systems
  with delay, 72
  with intermittent observation data, 72
  with old data, 36, 71
Convergence (See Chapter VI)
  in probability, 315
  mutual, 198, 207
  of a posteriori probability distribution, 197, 205
  with probability one, 316
Criterion function
  for control problem, 1, 22, 39, 83, 291, 301
  for estimation problem, 173
  implicit, 291
Dual control theory (See III.6), 3
Duality principle, 64
Dynamic programming, 2, 8, 32
Edgeworth's series, 192
Empirical Bayes approach, 198
Error analysis of Kalman filter (See VII.3)
Estimation methods
  Bayesian, 173
  least-squared, 155
  maximum likelihood, 168
Estimation problem, 72 (See Filtering problem)
Equalizer control policy, 19, 299, 300
Expectation, 6, 311
Extended Bayes control policy, 300
Factorization theorem, 137, 334
Filtering problem, 72
  for linearized nonlinear systems, 179
  for nonlinear systems, 189
  of state vectors (See VII.4-6), 179
  with correlated noise, 69
  with uncorrelated noise, 59, 177
Final value control system, 4
Gaussian random variable (See Appendix III), 35
Gram-Charlier's series, 192
Identifiability, 221
Independence of random variables, 310
Kalman filter
  for linearized nonlinear systems, 179
  for linear systems with uncorrelated noise, 59, 177
  for linear systems with correlated noise, 69
Koopman-Pitman class, 137
Least-squares estimation method
  for linear control systems, 166
  for linear dynamic systems, 158
  for nonlinear dynamic systems, 162
  for static systems, 155
Likelihood function, 168
Linear estimate of state vectors, 179
Linear transformation of Gaussian random variables, 327
Lyapunov function, 282
Markov properties, 132
Markov sequence, 129
Martingale (See VI.3)
  definition of, 202, 313
  expectation-decreasing, 203, 284
  semi-, 203, 284
Martingale convergence theorem, 206, 316
Martingale inequality, 315
Matrix identities, 79, 80
Maximum-likelihood estimation method, 168
  for dynamic system, 171
  for static system, 169
  for unknown parameter, 203
Median, 173
Minimization of a quadratic form, 73
Min-max control policy (See IX.2), 18, 19, 299
Mode, 173
Monotone convergence theorem, 316
Monte Carlo simulations, 36
Nonrandomized control policy
  definition of, 23
  example of, 29, 85
Normal distribution (See Appendix III)
  characteristic function of, 326
  conditional distribution of, 329
  singular distribution, 331
Observability (See VI.5)
  deterministic, 211
  off line, 211
  on line, 211
  stochastic, 212, 218
  strict sense, 214
  wide sense, 217
Observation equation, 22
One-stage optimal control policy, 8, 113
One-stage optimization, 8, 113
Open-loop control policy, 241
Open-loop feedback control policy, 241
Optimal spacings of observations, 72
Orthogonality principle, 74, 156
Parameter adaptive control systems (See Chapter III), 10
  example of, 14, 17
  optimal control policies of (See Chapter III)
Passive adaptation, 126
Pearson's probability distribution function, 195
Performance index (See Criterion function)
Plant equation, 22
Pólya type distribution, 296
Probability distribution function
  Beta, 118
  binomial, 118
  Koopman-Pitman class, 137
  mean of, 173
  median of, 173
  mode of, 173
  normal (See Appendix III)
  Pólya, 296
  Pearson, 195
  self-reproducing, 169, 198, 336
  skewness of, 194
Proportional-plus-integral control, 65
Pseudo-inverse (See Appendix II), 46, 74, 157
Purely stochastic systems (See Chapter II), 10
Radon-Nikodym theorem, 312
Random stopping time problem, 303
Random variable, 310
Randomized control policy
  definition, 23, 300
  examples, 29
Recursion equations
  for conditional probability density function, 26, 89-92, 105, 143
  to generate optimal control policy, 31, 88, 142
Regulation problem (See II.2), 64
Self-reproducing probability distribution function, 169, 198
Semi-martingales, 203, 284
Semi-martingale convergence theorem, 288
Semi-martingale inequality, 285
Sensitivity of Kalman filter (See VII.3)
  with respect to gain, 247
  with respect to noise covariance matrix, 249
  with respect to transition matrix, 248
Skewness, 194
Stability, 285
Stochastic controllability (See VI.5), 217
  matrix, 218
Stochastic observability (See Observability)
  matrix, 214
Stochastic stability (See Chapter VIII)
Stochastic system (See Chapters III and IV), 3, 10
Sufficient statistics (See Appendix IV), 21, 32, 53, 59, 135
  factorization theorem, 137, 334
Systems with random or unknown noise characteristics, 18, 83, 94, 97, 102, 120, 237
Systems with random or unknown plant parameters, 7, 13, 16, 17, 104, 105, 108, 111, 124, 143, 239
Systems with unknown initial condition, 93
Wiener-Kalman filter (See Kalman filter)
Worst possible distribution function, 19, 300