C.K. Chui · G. Chen
Kalman Filtering with Real-Time Applications Fourth Edition
Professor Charles K. Chui Texas A&M University Department Mathematics 608K Blocker Hall College Station, TX, 77843 USA
Professor Guanrong Chen City University Hong Kong Department of Electronic Engineering 83 Tat Chee Avenue Kowloon Hong Kong/PR China
Second printing of the third edition with ISBN 3-540-64611-6, published as softcover edition in Springer Series in Information Sciences.
ISBN 978-3-540-87848-3
e-ISBN 978-3-540-87849-0
DOI 10.1007/978-3-540-87849-0
Library of Congress Control Number: 2008940869
© 2009, 1999, 1991, 1987 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Cover design: eStudioCalamar S.L., F. Steinen-Broo, Girona, Spain
Printed on acid-free paper
springer.com
Preface to the Third Edition
Two modern topics in Kalman filtering are new additions to this Third Edition of Kalman Filtering with Real-Time Applications. Interval Kalman Filtering (Chapter 10) is added to expand the capability of Kalman filtering to uncertain systems, and Wavelet Kalman Filtering (Chapter 11) is introduced to incorporate efficient techniques from wavelets and splines with Kalman filtering to give more effective computational schemes for treating problems in such areas as signal estimation and signal decomposition. It is hoped that with the addition of these two new chapters, the current edition gives a more complete and up-to-date treatment of Kalman filtering for real-time applications.
College Station and Houston, August 1998
Charles K. Chui Guanrong Chen
Preface to the Second Edition
In addition to making a number of minor corrections and updating the list of references, we have expanded the section on "real-time system identification" in Chapter 10 of the first edition into two sections and combined it with Chapter 8. In its place, a very brief introduction to wavelet analysis is included in Chapter 10. Although the pyramid algorithms for wavelet decompositions and reconstructions are quite different from the Kalman filtering algorithms, they can also be applied to time-domain filtering, and it is hoped that splines and wavelets can be incorporated with Kalman filtering in the near future.
College Station and Houston, September 1990
Charles K. Chui Guanrong Chen
Preface to the First Edition
Kalman filtering is an optimal state estimation process applied to a dynamic system that involves random perturbations. More precisely, the Kalman filter gives a linear, unbiased, and minimum error variance recursive algorithm to optimally estimate the unknown state of a dynamic system from noisy data taken at discrete real-time. It has been widely used in many areas of industrial and government applications such as video and laser tracking systems, satellite navigation, ballistic missile trajectory estimation, radar, and fire control. With the recent development of high-speed computers, the Kalman filter has become more useful even for very complicated real-time applications. In spite of its importance, the mathematical theory of Kalman filtering and its implications are not well understood even among many applied mathematicians and engineers. In fact, most practitioners are just told what the filtering algorithms are without knowing why they work so well. One of the main objectives of this text is to disclose this mystery by presenting a fairly thorough discussion of its mathematical theory and applications to various elementary real-time problems. A very elementary derivation of the filtering equations is first presented. By assuming that certain matrices are nonsingular, the advantage of this approach is that the optimality of the Kalman filter can be easily understood. Of course these assumptions can be dropped by using the more well known method of orthogonal projection usually known as the innovations approach. This is done next, again rigorously. This approach is extended first to take care of correlated system and measurement noises, and then colored noise processes. Kalman filtering for nonlinear systems with an application to adaptive system identification is also discussed in this text. 
In addition, the limiting or steady-state Kalman filtering theory and efficient computational schemes such as the sequential and square-root algorithms are included for real-time application purposes. One such application is the design of a digital tracking filter such as the $\alpha$-$\beta$-$\gamma$ and $\alpha$-$\beta$-$\gamma$-$\theta$
trackers. Using the limit of Kalman gains to define the $\alpha$, $\beta$, $\gamma$ parameters for white noise and the $\alpha$, $\beta$, $\gamma$, $\theta$ values for colored noise processes, it is now possible to characterize this tracking filter as a limiting or steady-state Kalman filter. The state estimation obtained by these much more efficient prediction-correction equations is proved to be near-optimal, in the sense that its error from the optimal estimate decays exponentially with time. Our study of this topic includes a decoupling method that yields the filtering equations for each component of the state vector.

The style of writing in this book is intended to be informal, the mathematical argument throughout elementary and rigorous, and in addition, easily readable by anyone, student or professional, with a minimal knowledge of linear algebra and system theory. In this regard, a preliminary chapter on matrix theory, determinants, probability, and least-squares is included in an attempt to ensure that this text be self-contained. Each chapter contains a variety of exercises for the purpose of illustrating certain related view-points, improving the understanding of the material, or filling in the gaps of some proofs in the text. Answers and hints are given at the end of the text, and a collection of notes and references is included for the reader who might be interested in further study.

This book is designed to serve three purposes. It is written not only for self-study but also for use in a one-quarter or one-semester introductory course on Kalman filtering theory for upper-division undergraduate or first-year graduate applied mathematics or engineering students. In addition, it is hoped that it will become a valuable reference to any industrial or government engineer.

The first author would like to thank the U.S. Army Research Office for continuous support and is especially indebted to Robert Green of the White Sands Missile Range for his encouragement and many stimulating discussions.
To his wife, Margaret, he would like to express his appreciation for her understanding and constant support. The second author is very grateful to Professor Mingjun Chen of Zhongshan University for introducing him to this important research area, and to his wife Qiyun Xian for her patience and encouragement. Among the colleagues who have made valuable suggestions, the authors would especially like to thank Professors Andrew Chan (Texas A&M), Thomas Huang (Illinois), and Thomas Kailath (Stanford). Finally, the friendly cooperation and kind assistance from Dr. Helmut Lotsch, Dr. Angela Lahee, and their editorial staff at Springer-Verlag are greatly appreciated. College Station Texas, January, 1987
Charles K. Chui Guanrong Chen
Contents

Notation

1. Preliminaries
   1.1 Matrix and Determinant Preliminaries
   1.2 Probability Preliminaries
   1.3 Least-Squares Preliminaries
   Exercises

2. Kalman Filter: An Elementary Approach
   2.1 The Model
   2.2 Optimality Criterion
   2.3 Prediction-Correction Formulation
   2.4 Kalman Filtering Process
   Exercises

3. Orthogonal Projection and Kalman Filter
   3.1 Orthogonality Characterization of Optimal Estimates
   3.2 Innovations Sequences
   3.3 Minimum Variance Estimates
   3.4 Kalman Filtering Equations
   3.5 Real-Time Tracking
   Exercises

4. Correlated System and Measurement Noise Processes
   4.1 The Affine Model
   4.2 Optimal Estimate Operators
   4.3 Effect on Optimal Estimation with Additional Data
   4.4 Derivation of Kalman Filtering Equations
   4.5 Real-Time Applications
   4.6 Linear Deterministic/Stochastic Systems
   Exercises

5. Colored Noise
   5.1 Outline of Procedure
   5.2 Error Estimates
   5.3 Kalman Filtering Process
   5.4 White System Noise
   5.5 Real-Time Applications
   Exercises

6. Limiting Kalman Filter
   6.1 Outline of Procedure
   6.2 Preliminary Results
   6.3 Geometric Convergence
   6.4 Real-Time Applications
   Exercises

7. Sequential and Square-Root Algorithms
   7.1 Sequential Algorithm
   7.2 Square-Root Algorithm
   7.3 An Algorithm for Real-Time Applications
   Exercises

8. Extended Kalman Filter and System Identification
   8.1 Extended Kalman Filter
   8.2 Satellite Orbit Estimation
   8.3 Adaptive System Identification
   8.4 An Example of Constant Parameter Identification
   8.5 Modified Extended Kalman Filter
   8.6 Time-Varying Parameter Identification
   Exercises

9. Decoupling of Filtering Equations
   9.1 Decoupling Formulas
   9.2 Real-Time Tracking
   9.3 The $\alpha$-$\beta$-$\gamma$ Tracker
   9.4 An Example
   Exercises

10. Kalman Filtering for Interval Systems
   10.1 Interval Mathematics
   10.2 Interval Kalman Filtering
   10.3 Weighted-Average Interval Kalman Filtering
   Exercises

11. Wavelet Kalman Filtering
   11.1 Wavelet Preliminaries
   11.2 Signal Estimation and Decomposition
   Exercises

12. Notes
   12.1 The Kalman Smoother
   12.2 The $\alpha$-$\beta$-$\gamma$-$\theta$ Tracker
   12.3 Adaptive Kalman Filtering
   12.4 Adaptive Kalman Filtering Approach to Wiener Filtering
   12.5 The Kalman-Bucy Filter
   12.6 Stochastic Optimal Control
   12.7 Square-Root Filtering and Systolic Array Implementation

References

Answers and Hints to Exercises

Subject Index
Notation

$A$, $A_k$ : system matrices
$A^c$ : "square-root" of $A$ in Cholesky factorization
$A^u$ : "square-root" of $A$ in upper triangular decomposition
$B$, $B_k$ : control input matrices
$C$, $C_k$ : measurement matrices
$\mathrm{Cov}(X, Y)$ : covariance of random variables $X$ and $Y$
$E(X)$ : expectation of random variable $X$
$E(X \mid Y = y)$ : conditional expectation
$e_j$, $\hat{e}_j$
$f(x)$ : probability density function
$f(x_1, x_2)$ : joint probability density function
$f(x_1 \mid x_2)$ : conditional probability density function
$f_k(x_k)$ : vector-valued nonlinear functions
$G$ : limiting Kalman gain matrix
$G_k$ : Kalman gain matrix
$H_k(x_k)$ : matrix-valued nonlinear function
$H^*$
$I_n$ : $n \times n$ identity matrix
$J$ : Jordan canonical form of a matrix
$K_k$
$L(x, v)$
$M_{A\Gamma}$ : controllability matrix
$N_{CA}$ : observability matrix
$O_{n \times m}$ : $n \times m$ zero matrix
$P$ : limiting (error) covariance matrix
$P_{k,k}$ : estimate (error) covariance matrix
$P[i, j]$ : $(i, j)$th entry of matrix $P$
$P(X)$ : probability of random variable $X$
$Q_k$ : variance matrix of random vector $\underline{\xi}_k$
$R_k$ : variance matrix of random vector $\underline{\eta}_k$
$\mathbf{R}^n$ : space of column vectors $x = [x_1 \cdots x_n]^T$
$S_k$ : covariance matrix of $\underline{\xi}_k$ and $\underline{\eta}_k$
$\mathrm{tr}$ : trace
$u_k$ : deterministic control input (at the $k$th time instant)
$\mathrm{Var}(X)$ : variance of random variable $X$
$\mathrm{Var}(X \mid Y = y)$ : conditional variance
$v_k$ : observation (or measurement) data (at the $k$th time instant)
$v^{2\#}$
$W_k$ : weight matrix
$w_j$
$(W_\psi f)(b, a)$ : integral wavelet transform
$x_k$ : state vector (at the $k$th time instant)
$\hat{x}_k$, $\hat{x}_{k|k}$ : optimal filtering estimate of $x_k$
$\hat{x}_{k|k-1}$ : optimal prediction of $x_k$
$\check{x}_k$ : suboptimal estimate of $x_k$
$\tilde{x}_k$ : near-optimal estimate of $x_k$
$x^*$, $x^\#$, $x_k^\#$
$\|w\|$ : "norm" of $w$
$\langle x, w \rangle$ : "inner product" of $x$ and $w$
$Y(w_0, \cdots, w_r)$ : "linear span" of vectors $w_0, \cdots, w_r$
$\{z_j\}$ : innovations sequence of data
$\alpha, \beta, \gamma, \theta$ : tracker parameters
$\{\underline{\beta}_k\}$, $\{\underline{\gamma}_k\}$ : white noise sequences
$\Gamma$, $\Gamma_k$ : system noise matrices
$\delta_{ij}$ : Kronecker delta
$\underline{\epsilon}_{k,\ell}$, $\bar{\underline{\epsilon}}_{k,\ell}$ : random (noise) vectors
$\underline{\eta}_k$ : measurement noise (at the $k$th time instant)
$\underline{\xi}_k$ : system noise (at the $k$th time instant)
$\Phi_{k\ell}$ : transition matrix
$\partial f / \partial A$ : Jacobian matrix
$\partial h / \partial x$ : Jacobian matrix
1. Preliminaries
The importance of Kalman filtering in engineering applications is well known, and its mathematical theory has been rigorously established. The main objective of this treatise is to present a thorough discussion of the mathematical theory, computational algorithms, and application to real-time tracking problems of the Kalman filter. In explaining how the Kalman filtering algorithm is obtained and how well it performs, it is necessary to use some formulas and inequalities in matrix algebra. In addition, since only the statistical properties of both the system and measurement noise processes in real-time applications are being considered, some knowledge of certain basic concepts in probability theory will be helpful. This chapter is devoted to the study of these topics.

1.1 Matrix and Determinant Preliminaries
Let $\mathbf{R}^n$ denote the space of all column vectors $x = [x_1 \cdots x_n]^T$, where $x_1, \ldots, x_n$ are real numbers. An $n \times n$ real matrix $A$ is said to be positive definite if $x^T A x$ is a positive number for all nonzero vectors $x$ in $\mathbf{R}^n$. It is said to be non-negative definite if $x^T A x$ is non-negative for any $x$ in $\mathbf{R}^n$. If $A$ and $B$ are any two $n \times n$ matrices of real numbers, we will use the notation

$$A > B$$

when the matrix $A - B$ is positive definite, and

$$A \ge B$$

when $A - B$ is non-negative definite.
We first recall the so-called Schwarz inequality:

$$|x^T y| \le |x|\,|y|, \qquad x, y \in \mathbf{R}^n,$$

where, as usual, the notation

$$|x| = (x^T x)^{1/2}$$

is used. In addition, we recall that the above inequality becomes equality if and only if $x$ and $y$ are parallel, and this, in turn, means that

$$x = \lambda y \quad \text{or} \quad y = \lambda x$$

for some scalar $\lambda$. Note, in particular, that if $y \ne 0$, then the Schwarz inequality may be written as

$$x^T x \ge (y^T x)^T (y^T y)^{-1} (y^T x).$$
This formulation allows us to generalize the Schwarz inequality to the matrix setting.

Lemma 1.1. (Matrix Schwarz inequality) Let $P$ and $Q$ be $m \times n$ and $m \times \ell$ matrices, respectively, such that $P^T P$ is nonsingular. Then

$$Q^T Q \ge (P^T Q)^T (P^T P)^{-1} (P^T Q). \tag{1.1}$$

Furthermore, equality in (1.1) holds if and only if $Q = PS$ for some $n \times \ell$ matrix $S$.
The proof of the (vector) Schwarz inequality is simple. It amounts to observing that the minimum of the quadratic polynomial

$$(x - \lambda y)^T (x - \lambda y), \qquad y \ne 0,$$

of $\lambda$ is attained at

$$\lambda = (y^T y)^{-1} (y^T x)$$

and using this $\lambda$ value in the above inequality. Hence, in the matrix setting, we consider

$$(Q - PS)^T (Q - PS) \ge 0$$

and choose

$$S = (P^T P)^{-1} (P^T Q),$$

so that

$$Q^T Q \ge S^T (P^T Q) + (P^T Q)^T S - S^T (P^T P) S = (P^T Q)^T (P^T P)^{-1} (P^T Q)$$

as stated in (1.1). Furthermore, this inequality becomes equality if and only if

$$(Q - PS)^T (Q - PS) = 0,$$

or equivalently, $Q = PS$ for some $n \times \ell$ matrix $S$. This completes the proof of the lemma.
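The vector case of this argument can be checked numerically. The following is a minimal sketch in plain Python, using exact rational arithmetic; the helper names `dot` and `schwarz_gap` and the sample vectors are illustrative assumptions, not part of the text. It computes the minimizing scalar $\lambda = (y^T y)^{-1}(y^T x)$ and the minimum value of $(x - \lambda y)^T(x - \lambda y)$, which should equal $x^T x - (y^T x)^2/(y^T y)$ and vanish exactly when $x$ and $y$ are parallel.

```python
from fractions import Fraction

def dot(x, y):
    """Inner product x^T y of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def schwarz_gap(x, y):
    """Return (lam, gap) for y != 0, where lam = (y^T y)^{-1} (y^T x)
    minimizes (x - lam*y)^T (x - lam*y) and gap is that minimum value."""
    lam = Fraction(dot(y, x), dot(y, y))
    residual = [a - lam * b for a, b in zip(x, y)]
    return lam, dot(residual, residual)

# Hypothetical sample vectors (not from the text).
x = [3, 1, 2]
y = [1, 0, 1]
lam, gap = schwarz_gap(x, y)

# A vector parallel to y gives a zero gap, i.e. equality in the inequality.
lam_par, gap_par = schwarz_gap([2, 0, 2], y)
```

Since the gap is a squared length, it is never negative, which is the Schwarz inequality in the form $x^T x \ge (y^T x)^T (y^T y)^{-1} (y^T x)$.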
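We now turn to the following so-called matrix inversion lemma.

Lemma 1.2. (Matrix inversion lemma) Let

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

where $A_{11}$ and $A_{22}$ are $n \times n$ and $m \times m$ nonsingular submatrices, respectively, such that

$$(A_{11} - A_{12} A_{22}^{-1} A_{21}) \quad \text{and} \quad (A_{22} - A_{21} A_{11}^{-1} A_{12})$$

are also nonsingular. Then $A$ is nonsingular with

$$
\begin{aligned}
A^{-1} &= \begin{bmatrix}
A_{11}^{-1} + A_{11}^{-1} A_{12} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} A_{21} A_{11}^{-1} & -A_{11}^{-1} A_{12} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} \\
-(A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} A_{21} A_{11}^{-1} & (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1}
\end{bmatrix} \\
&= \begin{bmatrix}
(A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} & -(A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} A_{12} A_{22}^{-1} \\
-A_{22}^{-1} A_{21} (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} & A_{22}^{-1} + A_{22}^{-1} A_{21} (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} A_{12} A_{22}^{-1}
\end{bmatrix}.
\end{aligned}
\tag{1.2}
$$

In particular,

$$(A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} = A_{11}^{-1} + A_{11}^{-1} A_{12} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} A_{21} A_{11}^{-1} \tag{1.3}$$

and

$$A_{11}^{-1} A_{12} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} = (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} A_{12} A_{22}^{-1}. \tag{1.4}$$

Furthermore,

$$\det A = (\det A_{11}) \det(A_{22} - A_{21} A_{11}^{-1} A_{12}) = (\det A_{22}) \det(A_{11} - A_{12} A_{22}^{-1} A_{21}). \tag{1.5}$$

To prove this lemma, we write

$$A = \begin{bmatrix} I_n & 0 \\ A_{21} A_{11}^{-1} & I_m \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{bmatrix}$$

and

$$A = \begin{bmatrix} I_n & A_{12} A_{22}^{-1} \\ 0 & I_m \end{bmatrix} \begin{bmatrix} A_{11} - A_{12} A_{22}^{-1} A_{21} & 0 \\ A_{21} & A_{22} \end{bmatrix}.$$

Taking determinants, we obtain (1.5). In particular, we have

$$\det A \ne 0,$$

or $A$ is nonsingular. Now observe that

$$\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1} A_{12} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} \\ 0 & (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} \end{bmatrix}$$

and

$$\begin{bmatrix} I_n & 0 \\ A_{21} A_{11}^{-1} & I_m \end{bmatrix}^{-1} = \begin{bmatrix} I_n & 0 \\ -A_{21} A_{11}^{-1} & I_m \end{bmatrix}.$$

Hence, we have

$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{bmatrix}^{-1} \begin{bmatrix} I_n & 0 \\ A_{21} A_{11}^{-1} & I_m \end{bmatrix}^{-1},$$

which gives the first part of (1.2). A similar proof also gives the second part of (1.2). Finally, (1.3) and (1.4) follow by equating the appropriate blocks in (1.2).

An immediate application of Lemma 1.2 yields the following result.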
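As a sanity check of the matrix inversion lemma, the simplest case $n = m = 1$ can be worked out with exact rational arithmetic, where every block is a scalar. The sketch below is only an illustration under that assumption; the numerical entries are hypothetical and the variable names are not from the text. It evaluates both sides of (1.3) and (1.4) and the two factorizations of the determinant in (1.5).

```python
from fractions import Fraction as F

# Hypothetical scalar "blocks" of A = [[a11, a12], [a21, a22]].
a11, a12, a21, a22 = F(4), F(1), F(2), F(3)

s22 = a22 - a21 * (1 / a11) * a12   # A22 - A21 A11^{-1} A12
s11 = a11 - a12 * (1 / a22) * a21   # A11 - A12 A22^{-1} A21

# Both sides of (1.3).
lhs_13 = 1 / s11
rhs_13 = 1 / a11 + (1 / a11) * a12 * (1 / s22) * a21 * (1 / a11)

# Both sides of (1.4).
lhs_14 = (1 / a11) * a12 * (1 / s22)
rhs_14 = (1 / s11) * a12 * (1 / a22)

# The determinant and its two factorizations in (1.5).
det_a = a11 * a22 - a12 * a21
det_via_a11 = a11 * s22
det_via_a22 = a22 * s11
```

For matrix-valued blocks the same identities hold, but the order of the factors matters, which is why the scalar products above are written in the same order as in (1.3) and (1.4).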
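Lemma 1.3. If $P \ge Q > 0$, then $Q^{-1} \ge P^{-1} > 0$.

Let $P(\epsilon) = P + \epsilon I$ where $\epsilon > 0$. Then $P(\epsilon) - Q > 0$. By Lemma 1.2, we have

$$P^{-1}(\epsilon) = [Q + (P(\epsilon) - Q)]^{-1} = Q^{-1} - Q^{-1} [(P(\epsilon) - Q)^{-1} + Q^{-1}]^{-1} Q^{-1},$$

so that

$$Q^{-1} - P^{-1}(\epsilon) = Q^{-1} [(P(\epsilon) - Q)^{-1} + Q^{-1}]^{-1} Q^{-1} > 0.$$

Letting $\epsilon \to 0$ gives $Q^{-1} - P^{-1} \ge 0$, so that

$$Q^{-1} \ge P^{-1} > 0.$$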
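Now let us turn to discussing the trace of an $n \times n$ matrix $A$. The trace of $A$, denoted by $\mathrm{tr}\,A$, is defined as the sum of its diagonal elements, namely:

$$\mathrm{tr}\,A = \sum_{i=1}^{n} a_{ii},$$

where $A = [a_{ij}]$. We first state some elementary properties.

Lemma 1.4. If $A$ and $B$ are $n \times n$ matrices, then

$$\mathrm{tr}\,A^T = \mathrm{tr}\,A, \tag{1.6}$$

$$\mathrm{tr}(A + B) = \mathrm{tr}\,A + \mathrm{tr}\,B, \tag{1.7}$$

and

$$\mathrm{tr}(\lambda A) = \lambda\, \mathrm{tr}\,A. \tag{1.8}$$

If $A$ is an $n \times m$ matrix and $B$ is an $m \times n$ matrix, then

$$\mathrm{tr}\,AB = \mathrm{tr}\,B^T A^T = \mathrm{tr}\,BA = \mathrm{tr}\,A^T B^T \tag{1.9}$$

and

$$\mathrm{tr}\,A^T A = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}^2. \tag{1.10}$$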
The proof of the above identities is immediate from the definition and we leave it to the reader (cf. Exercise 1.1). The following result is important.
Lemma 1.5. Let $A$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \cdots, \lambda_n$, multiplicities being listed. Then

$$\mathrm{tr}\,A = \sum_{i=1}^{n} \lambda_i. \tag{1.11}$$

To prove the lemma, we simply write $A = U J U^{-1}$ where $J$ is the Jordan canonical form of $A$ and $U$ is some nonsingular matrix. Then an application of (1.9) yields

$$\mathrm{tr}\,A = \mathrm{tr}\,(AU)U^{-1} = \mathrm{tr}\,U^{-1}(AU) = \mathrm{tr}\,J = \sum_{i=1}^{n} \lambda_i.$$

It follows from this lemma that if $A > 0$ then $\mathrm{tr}\,A > 0$, and if $A \ge 0$ then $\mathrm{tr}\,A \ge 0$. Next, we state some useful inequalities on the trace.
Lemma 1.6. Let $A$ be an $n \times n$ matrix. Then

$$\mathrm{tr}\,A \le (n\, \mathrm{tr}\,AA^T)^{1/2}. \tag{1.12}$$
We leave the proof of this inequality to the reader (cf. Exercise 1.2).
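The trace identities above are easy to try on small matrices. The following sketch, in plain Python with matrices stored as lists of rows, is an illustration only: the helper functions and the sample matrices are assumptions introduced here. It checks $\mathrm{tr}\,AB = \mathrm{tr}\,BA$ from (1.9) for non-square factors, the eigenvalue sum (1.11) for a $2 \times 2$ symmetric matrix, and the bound (1.12).

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def eigenvalues_2x2(A):
    """Eigenvalues of a 2 x 2 matrix, assuming they are real."""
    t = trace(A)
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = math.sqrt(t * t - 4 * d)
    return (t + root) / 2, (t - root) / 2

# (1.9): A is 2 x 3 and B is 3 x 2, so AB is 2 x 2 and BA is 3 x 3.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [2, 1], [0, 3]]
tr_ab = trace(matmul(A, B))
tr_ba = trace(matmul(B, A))

# (1.11) and (1.12) for a hypothetical symmetric matrix S.
S = [[2, 1], [1, 2]]
lam1, lam2 = eigenvalues_2x2(S)
bound = math.sqrt(len(S) * trace(matmul(S, transpose(S))))
```

Here `lam1 + lam2` should agree with `trace(S)`, and `trace(S)` should not exceed `bound`.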
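Lemma 1.7. If $A$ and $B$ are $n \times m$ and $m \times \ell$ matrices respectively, then

$$\mathrm{tr}(AB)(AB)^T \le (\mathrm{tr}\,AA^T)(\mathrm{tr}\,BB^T).$$

Consequently, for any matrices $A_1, \cdots, A_p$ with appropriate dimensions,

$$\mathrm{tr}(A_1 \cdots A_p)(A_1 \cdots A_p)^T \le (\mathrm{tr}\,A_1 A_1^T) \cdots (\mathrm{tr}\,A_p A_p^T). \tag{1.13}$$

If $A = [a_{ij}]$ and $B = [b_{ij}]$, then

$$
\begin{aligned}
\mathrm{tr}(AB)(AB)^T &= \mathrm{tr} \left[ \sum_{k=1}^{m} a_{ik} b_{kj} \right] \left[ \sum_{k=1}^{m} a_{ik} b_{kj} \right]^T \\
&= \mathrm{tr} \begin{bmatrix}
\sum_{p=1}^{\ell} \left( \sum_{k=1}^{m} a_{1k} b_{kp} \right)^2 & & * \\
& \ddots & \\
* & & \sum_{p=1}^{\ell} \left( \sum_{k=1}^{m} a_{nk} b_{kp} \right)^2
\end{bmatrix} \\
&= \sum_{i=1}^{n} \sum_{p=1}^{\ell} \left( \sum_{k=1}^{m} a_{ik} b_{kp} \right)^2
\le \sum_{i=1}^{n} \sum_{p=1}^{\ell} \left( \sum_{k=1}^{m} a_{ik}^2 \right) \left( \sum_{k=1}^{m} b_{kp}^2 \right) \\
&= \left( \sum_{i=1}^{n} \sum_{k=1}^{m} a_{ik}^2 \right) \left( \sum_{p=1}^{\ell} \sum_{k=1}^{m} b_{kp}^2 \right) = (\mathrm{tr}\,AA^T)(\mathrm{tr}\,BB^T),
\end{aligned}
$$

where the Schwarz inequality has been used. This completes the proof of the lemma.

It should be remarked that $A \ge B > 0$ does not necessarily imply $\mathrm{tr}\,AA^T \ge \mathrm{tr}\,BB^T$. An example is

$$A = \begin{bmatrix} 12/5 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}.$$

Here, it is clear that $A - B \ge 0$ and $B > 0$, but

$$\mathrm{tr}\,AA^T = \frac{169}{25} < 7 = \mathrm{tr}\,BB^T$$

(cf. Exercise 1.3). For a symmetric matrix, however, we can draw the expected conclusion as follows.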
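The counterexample can be verified by direct computation. The sketch below assumes that the two matrices of the example are $A = \mathrm{diag}(12/5,\, 1)$ and $B$ with rows $(2, -1)$ and $(1, 1)$, a reading that is consistent with the stated values $169/25$ and $7$; the helper names are hypothetical. Since $x^T M x$ depends only on the symmetric part of $M$, definiteness of the non-symmetric matrices $B$ and $A - B$ is read off from $(M + M^T)/2$.

```python
from fractions import Fraction as F

def trace_aat(M):
    """tr(M M^T), which equals the sum of the squared entries of M."""
    return sum(entry * entry for row in M for entry in row)

def symmetric_part(M):
    """(M + M^T)/2; the quadratic form x^T M x depends only on this part."""
    n = len(M)
    return [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]

# Assumed reading of the example matrices.
A = [[F(12, 5), F(0)], [F(0), F(1)]]
B = [[F(2), F(-1)], [F(1), F(1)]]
D = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]   # A - B

tr_a = trace_aat(A)
tr_b = trace_aat(B)
sym_b = symmetric_part(B)   # diagonal with positive entries, so B > 0
sym_d = symmetric_part(D)   # diagonal with non-negative entries, so A - B >= 0
```

Both symmetric parts come out diagonal, so the definiteness claims can be read directly from their diagonal entries.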
Lemma 1.8. Let $A$ and $B$ be non-negative definite symmetric matrices with $A \ge B$. Then $\mathrm{tr}\,AA^T \ge \mathrm{tr}\,BB^T$, or $\mathrm{tr}\,A^2 \ge \mathrm{tr}\,B^2$.

We leave the proof of this lemma as an exercise (cf. Exercise 1.4).
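Lemma 1.9. Let $B$ be an $n \times n$ non-negative definite symmetric matrix. Then

$$\mathrm{tr}\,B^2 \le (\mathrm{tr}\,B)^2. \tag{1.14}$$

Consequently, if $A$ is another $n \times n$ non-negative definite symmetric matrix such that $B \le A$, then

$$\mathrm{tr}\,B^2 \le (\mathrm{tr}\,A)^2. \tag{1.15}$$

To prove (1.14), let $\lambda_1, \cdots, \lambda_n$ be the eigenvalues of $B$. Then $\lambda_1^2, \cdots, \lambda_n^2$ are the eigenvalues of $B^2$. Now, since $\lambda_1, \cdots, \lambda_n$ are non-negative, Lemma 1.5 gives

$$\mathrm{tr}\,B^2 = \sum_{i=1}^{n} \lambda_i^2 \le \left( \sum_{i=1}^{n} \lambda_i \right)^2 = (\mathrm{tr}\,B)^2.$$

(1.15) follows from the fact that $B \le A$ implies $\mathrm{tr}\,B \le \mathrm{tr}\,A$.

We also have the following result which will be useful later.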
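Lemma 1.10. Let $F$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \cdots, \lambda_n$ such that

$$\lambda := \max(|\lambda_1|, \cdots, |\lambda_n|) < 1.$$

Then there exist a real number $r$ satisfying $0 < r < 1$ and a constant $C$ such that

$$|\mathrm{tr}\,F^k (F^k)^T| \le C r^k$$

for all $k = 1, 2, \cdots$.

Let $J$ be the Jordan canonical form for $F$. Then $F = U J U^{-1}$ for some nonsingular matrix $U$. Hence, using (1.13), we have

$$
\begin{aligned}
|\mathrm{tr}\,F^k (F^k)^T| &= |\mathrm{tr}\,U J^k U^{-1} (U^{-1})^T (J^k)^T U^T| \\
&\le |\mathrm{tr}\,U U^T|\, |\mathrm{tr}\,J^k (J^k)^T|\, |\mathrm{tr}\,U^{-1} (U^{-1})^T| \\
&\le p(k) \lambda^{2k},
\end{aligned}
$$

where $p(k)$ is a polynomial in $k$. Now, any choice of $r$ satisfying $\lambda^2 < r < 1$ yields the desired result, by choosing a positive constant $C$ that satisfies

$$p(k) \left( \frac{\lambda^2}{r} \right)^k \le C$$

for all $k$.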
1.2 Probability Preliminaries
Consider an experiment in which a fair coin is tossed such that on each toss either the head (denoted by H) or the tail (denoted by T) occurs. The actual result that occurs when the experiment is performed is called an outcome of the experiment and the set of all possible outcomes is called the sample space (denoted by $S$) of the experiment. For instance, if a fair coin is tossed twice, then each result of two tosses is an outcome, the possibilities are HH, TT, HT, TH, and the set {HH, TT, HT, TH} is the sample space $S$. Furthermore, any subset of the sample space is called an event and an event consisting of a single outcome is called a simple event.

Since there is no way to predict the outcomes, we have to assign a real number $P$, between 0 and 1, to each event to indicate the probability that a certain outcome occurs. This is specified by a real-valued function, called a random variable, defined on the sample space. In the above example, if the random variable $X = X(s)$, $s \in S$, denotes the number of H's in the outcome $s$, then the number $P := P(X(s))$ gives the probability of that number of H's occurring. More generally, let $S$ be a sample space and $X : S \to \mathbf{R}^1$ be a random variable. For each measurable set $A \subset \mathbf{R}^1$ (and in the above example, $A = \{0\}$, $\{1\}$, or $\{2\}$ indicating no H, one H, or two H's, respectively) define $P : \{\text{events}\} \to [0, 1]$, where each event is a set $\{s \in S : X(s) \in A \subset \mathbf{R}^1\} := \{X \in A\}$, subject to the following conditions:

(1) $P(X \in A) \ge 0$ for any measurable set $A \subset \mathbf{R}^1$,
(2) $P(X \in \mathbf{R}^1) = 1$, and
(3) for any countable sequence of pairwise disjoint measurable sets $A_i$ in $\mathbf{R}^1$,

$$P\Big( X \in \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} P(X \in A_i).$$

$P$ is called the probability distribution (or probability distribution function) of the random variable $X$.
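The two-toss example can be enumerated explicitly. In the sketch below, the names `sample_space`, `X` and `prob` are illustrative, and the four outcomes are assumed to be equally likely because the coin is fair. It builds the sample space, defines the random variable counting H's, and evaluates $P(X \in A)$ for a few sets $A$, including a check of additivity over disjoint sets.

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses: HH, HT, TH, TT.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

def X(s):
    """Random variable: number of H's in the outcome s."""
    return s.count("H")

def prob(A):
    """P(X in A), assuming the four outcomes are equally likely."""
    favorable = sum(1 for s in sample_space if X(s) in A)
    return Fraction(favorable, len(sample_space))

p_none = prob({0})         # no H
p_one = prob({1})          # exactly one H
p_two = prob({2})          # two H's
p_zero_or_two = prob({0, 2})
```

Conditions (1) to (3) can be read off directly: every value is non-negative, `prob({0, 1, 2})` is 1, and `p_zero_or_two` equals `p_none + p_two`.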
If there exists an integrable function $f$ such that

$$P(X \in A) = \int_A f(x)\,dx \tag{1.16}$$

for all measurable sets $A$, we say that $P$ is a continuous probability distribution and $f$ is called the probability density function of the random variable $X$. Note that actually we could have defined $f(x)\,dx = d\lambda$ where $\lambda$ is a measure (for example, step functions) so that the discrete case such as the example of "tossing coins" can be included. If the probability density function $f$ is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}, \qquad \sigma > 0 \ \text{and} \ \mu \in \mathbf{R}, \tag{1.17}$$
called the Gaussian (or normal) probability density function, then $P$ is called a normal distribution of the random variable $X$, and we use the notation: $X \sim N(\mu, \sigma^2)$. It can be easily verified that the normal distribution $P$ is a probability distribution. Indeed, (1) since $f(x) > 0$, $P(X \in A) = \int_A f(x)\,dx \ge 0$ for any measurable set $A \subset \mathbf{R}$, (2) by substituting $y = (x - \mu)/(\sqrt{2}\,\sigma)$,

$$P(X \in \mathbf{R}^1) = \int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2}\,dy = 1$$

(cf. Exercise 1.5), and (3) since

$$\int_{\bigcup_i A_i} f(x)\,dx = \sum_i \int_{A_i} f(x)\,dx$$

for any countable sequence of pairwise disjoint measurable sets $A_i \subset \mathbf{R}^1$, we have

$$P\Big( X \in \bigcup_i A_i \Big) = \sum_i P(X \in A_i).$$
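Let $X$ be a random variable. The expectation of $X$ indicates the mean of the values of $X$, and is defined by

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx. \tag{1.18}$$

Note that $E(X)$ is a real number for any random variable $X$ with probability density function $f$. For the normal distribution, using the substitution $y = (x - \mu)/(\sqrt{2}\,\sigma)$ again, we have

$$
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x f(x)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}\,dx \\
&= \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} (\sqrt{2}\,\sigma y + \mu)\, e^{-y^2}\,dy \\
&= \mu\, \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2}\,dy = \mu.
\end{aligned}
\tag{1.19}
$$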
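The normalization and the mean of the Gaussian density can also be confirmed by numerical quadrature. The following is a rough sketch rather than a rigorous computation: the values of `mu` and `sigma`, the truncation of the real line to $\mu \pm 12\sigma$, and the midpoint rule with a fixed number of panels are all assumptions made here for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density (1.17)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

mu, sigma = 1.5, 0.7                           # hypothetical parameters
lo, hi = mu - 12 * sigma, mu + 12 * sigma      # tails beyond this are negligible

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)      # approximates P(X in R^1)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)   # approximates E(X)
```

Up to quadrature and rounding error, `total` should be close to 1 and `mean` close to `mu`, in agreement with (1.19).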
Note also that $E(X)$ is the first moment of the probability density function $f$. The second moment gives the variance of $X$ defined by

$$\mathrm{Var}(X) = E(X - E(X))^2 = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx. \tag{1.20}$$

This number indicates the dispersion of the values of $X$ from its mean $E(X)$. For the normal distribution, using the substitution $y = (x - \mu)/(\sqrt{2}\,\sigma)$ again, we have

$$
\begin{aligned}
\mathrm{Var}(X) &= \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \mu)^2\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}\,dx \\
&= \frac{2\sigma^2}{\sqrt{\pi}} \int_{-\infty}^{\infty} y^2 e^{-y^2}\,dy = \sigma^2,
\end{aligned}
\tag{1.21}
$$

where we have used the equality $\int_{-\infty}^{\infty} y^2 e^{-y^2}\,dy = \sqrt{\pi}/2$ (cf. Exercise 1.6).

We now turn to random vectors whose components are random variables. We denote a random $n$-vector $X = [X_1 \cdots X_n]^T$ where each $X_i(s) \in \mathbf{R}^1$, $s \in S$. Let $P$ be a continuous probability distribution function of $X$. That is,

$$P(X_1 \in A_1, \cdots, X_n \in A_n) = \int_{A_1} \cdots \int_{A_n} f(x_1, \cdots, x_n)\,dx_1 \cdots dx_n, \tag{1.22}$$

where $A_1, \cdots, A_n$ are measurable sets in $\mathbf{R}^1$ and $f$ an integrable function. $f$ is called a joint probability density function of $X$ and $P$ is called a joint probability distribution (function) of $X$. For each $i$, $i = 1, \cdots, n$, define

$$f_i(x) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_{i-1}, x, x_{i+1}, \cdots, x_n)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n. \tag{1.23}$$

Then it is clear that $\int_{-\infty}^{\infty} f_i(x)\,dx = 1$. $f_i$ is called the $i$th marginal probability density function of $X$ corresponding to the joint probability density function $f(x_1, \cdots, x_n)$. Similarly, we define $f_{ij}$ and
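$f_{ijk}$ by deleting the integrals with respect to $x_i, x_j$ and $x_i, x_j, x_k$, respectively, etc., as in the definition of $f_i$. If

$$f(x) = \frac{1}{(2\pi)^{n/2} (\det R)^{1/2}} \exp\left\{ -\frac{1}{2} (x - \underline{\mu})^T R^{-1} (x - \underline{\mu}) \right\}, \tag{1.24}$$

where $\underline{\mu}$ is a constant $n$-vector and $R$ is a symmetric positive definite matrix, we say that $f(x)$ is a Gaussian (or normal) probability density function of $X$. It can be verified that

$$\int_{-\infty}^{\infty} f(x)\,dx := \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x)\,dx_1 \cdots dx_n = 1, \tag{1.25}$$

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \underline{\mu}, \tag{1.26}$$

and

$$\mathrm{Var}(X) = E(X - \underline{\mu})(X - \underline{\mu})^T = R. \tag{1.27}$$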
Indeed, since $R$ is symmetric and positive definite, there is a unitary matrix $U$ such that $R = U^T J U$ where $J = \mathrm{diag}[\lambda_1, \cdots, \lambda_n]$ and $\lambda_1, \cdots, \lambda_n > 0$. Let $y = \frac{1}{\sqrt{2}}\, \mathrm{diag}[1/\sqrt{\lambda_1}, \cdots, 1/\sqrt{\lambda_n}]\, U (x - \underline{\mu})$. Then

$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{2^{n/2} \sqrt{\lambda_1} \cdots \sqrt{\lambda_n}}{(2\pi)^{n/2} (\lambda_1 \cdots \lambda_n)^{1/2}} \int_{-\infty}^{\infty} e^{-y_1^2}\,dy_1 \cdots \int_{-\infty}^{\infty} e^{-y_n^2}\,dy_n = 1.$$

Equations (1.26) and (1.27) can be verified by using the same substitution as that used for the scalar case (cf. (1.21) and Exercise 1.7).

Next, we introduce the concept of conditional probability. Consider an experiment in which balls are drawn one at a time from an urn containing $M_1$ white balls and $M_2$ black balls. What is the probability that the second ball drawn from the urn is also black (event $A_2$) under the condition that the first one is black (event $A_1$)? Here, we sample without replacement; that is, the first ball is not returned to the urn after being drawn.
To solve this simple problem, we reason as follows: since the first ball drawn from the urn is black, there remain $M_1$ white balls and $M_2 - 1$ black balls in the urn before the second drawing. Hence, the probability that a black ball is drawn is now

$$\frac{M_2 - 1}{M_1 + M_2 - 1}.$$

Note that

$$\frac{M_2 - 1}{M_1 + M_2 - 1} = \frac{M_2}{M_1 + M_2} \cdot \frac{M_2 - 1}{M_1 + M_2 - 1} \bigg/ \frac{M_2}{M_1 + M_2},$$

where $M_2/(M_1 + M_2)$ is the probability that a black ball is picked at the first drawing, and $\frac{M_2}{M_1 + M_2} \cdot \frac{M_2 - 1}{M_1 + M_2 - 1}$ is the probability that black balls are picked at both the first and second drawings. This example motivates the following definition of conditional probability: The conditional probability of $X_1 \in A_1$ given $X_2 \in A_2$ is defined by

$$P(X_1 \in A_1 \mid X_2 \in A_2) = \frac{P(X_1 \in A_1, X_2 \in A_2)}{P(X_2 \in A_2)}. \tag{1.28}$$
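The urn computation is a direct instance of this ratio of probabilities and can be checked by brute-force enumeration. In the sketch below the function names and the sample counts are hypothetical. One function evaluates the ratio of the "both black" probability to the "first black" probability, and the other counts ordered draws of two distinct balls, assuming each ordered pair is equally likely.

```python
from fractions import Fraction
from itertools import permutations

def second_black_given_first_black(m1, m2):
    """Ratio P(both black) / P(first black) for m1 white and m2 black balls."""
    p_first_black = Fraction(m2, m1 + m2)
    p_both_black = p_first_black * Fraction(m2 - 1, m1 + m2 - 1)
    return p_both_black / p_first_black

def by_enumeration(m1, m2):
    """Same conditional probability by listing all ordered draws of two balls."""
    balls = ["W"] * m1 + ["B"] * m2
    draws = list(permutations(range(len(balls)), 2))   # sampling without replacement
    first_black = [d for d in draws if balls[d[0]] == "B"]
    both_black = [d for d in first_black if balls[d[1]] == "B"]
    return Fraction(len(both_black), len(first_black))

# Hypothetical urn: 3 white balls and 4 black balls.
p_formula = second_black_given_first_black(3, 4)
p_counted = by_enumeration(3, 4)
```

Both values should agree with $(M_2 - 1)/(M_1 + M_2 - 1)$, which is $3/6$ for this urn.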
Suppose that $P$ is a continuous probability distribution function with joint probability density function $f$. Then (1.28) becomes

$$P(X_1 \in A_1 \mid X_2 \in A_2) = \frac{\int_{A_1} \int_{A_2} f(x_1, x_2)\,dx_1\,dx_2}{\int_{A_2} f_2(x_2)\,dx_2},$$

where $f_2$ defined by

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1$$

is the second marginal probability density function of $f$. Let $f(x_1 \mid x_2)$ denote the probability density function corresponding to $P(X_1 \in A_1 \mid X_2 \in A_2)$. $f(x_1 \mid x_2)$ is called the conditional probability density function corresponding to the conditional probability distribution function $P(X_1 \in A_1 \mid X_2 \in A_2)$. It is known that

$$f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}, \tag{1.29}$$

which is called the Bayes formula (see, for example, Probability by A. N. Shiryayev (1984)). By symmetry, the Bayes formula can be written as

$$f(x_1, x_2) = f(x_1 \mid x_2) f_2(x_2) = f(x_2 \mid x_1) f_1(x_1). \tag{1.30}$$
We remark that this formula also holds for random vectors $X_1$ and $X_2$.

Let $X$ and $Y$ be random $n$- and $m$-vectors, respectively. The covariance of $X$ and $Y$ is defined by the $n \times m$ matrix

$$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))^T]. \tag{1.31}$$

When $Y = X$, we have the variance matrix, which is sometimes called a covariance matrix of $X$, $\mathrm{Var}(X) = \mathrm{Cov}(X, X)$. It can be verified that the expectation, variance, and covariance have the following properties:

$$E(AX + BY) = A E(X) + B E(Y), \tag{1.32a}$$

$$E((AX)(BY)^T) = A (E(XY^T)) B^T, \tag{1.32b}$$

$$\mathrm{Var}(X) \ge 0, \tag{1.32c}$$

$$\mathrm{Cov}(X, Y) = (\mathrm{Cov}(Y, X))^T, \tag{1.32d}$$

and

$$\mathrm{Cov}(X, Y) = E(XY^T) - E(X)(E(Y))^T, \tag{1.32e}$$

where $A$ and $B$ are constant matrices (cf. Exercise 1.8). $X$ and $Y$ are said to be independent if $f(x \mid y) = f_1(x)$ and $f(y \mid x) = f_2(y)$, and $X$ and $Y$ are said to be uncorrelated if $\mathrm{Cov}(X, Y) = 0$. It is easy to see that if $X$ and $Y$ are independent then they are uncorrelated. Indeed, if $X$ and $Y$ are independent then $f(x, y) = f_1(x) f_2(y)$. Hence,

$$
\begin{aligned}
E(XY^T) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y^T f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} x f_1(x)\,dx \int_{-\infty}^{\infty} y^T f_2(y)\,dy \\
&= E(X)(E(Y))^T,
\end{aligned}
$$

so that by property (1.32e) $\mathrm{Cov}(X, Y) = 0$. But the converse does not necessarily hold, unless the probability distribution is normal. Let

$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \underline{\mu}_1 \\ \underline{\mu}_2 \end{bmatrix}, R \right),$$

where

$$R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}, \qquad R_{12} = R_{21}^T,$$

$R_{11}$ and $R_{22}$ are symmetric, and $R$ is positive definite. Then it can be shown that $X_1$ and $X_2$ are independent if and only if $R_{12} = \mathrm{Cov}(X_1, X_2) = 0$ (cf. Exercise 1.9).
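The remark that uncorrelated random variables need not be independent can be illustrated with a small discrete example, in the spirit of the earlier note that discrete distributions can be included via a measure. The example is an assumption introduced here, not taken from the text: $X$ takes the values $-1, 0, 1$ with probability $1/3$ each, and $Y = X^2$.

```python
from fractions import Fraction

# Hypothetical discrete example: X uniform on {-1, 0, 1} and Y = X**2.
p = Fraction(1, 3)
outcomes = [(-1, 1), (0, 0), (1, 1)]   # (x, y) pairs, each with probability p

E_x = sum(p * x for x, _ in outcomes)
E_y = sum(p * y for _, y in outcomes)
E_xy = sum(p * x * y for x, y in outcomes)

# Scalar form of (1.32e): Cov(X, Y) = E(XY) - E(X) E(Y).
cov_xy = E_xy - E_x * E_y

# Independence would require P(X = 1, Y = 1) = P(X = 1) P(Y = 1).
p_joint = sum(p for x, y in outcomes if x == 1 and y == 1)
p_x1 = sum(p for x, _ in outcomes if x == 1)
p_y1 = sum(p for _, y in outcomes if y == 1)
```

Here `cov_xy` is zero although `p_joint` differs from `p_x1 * p_y1`, so $X$ and $Y$ are uncorrelated but not independent.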
Let $X$ and $Y$ be two random vectors. Similar to the definitions of expectation and variance, the conditional expectation of $X$ under the condition that $Y = y$ is defined to be

$$E(X \mid Y = y) = \int_{-\infty}^{\infty} x f(x \mid y)\,dx \tag{1.33}$$

and the conditional variance, which is sometimes called the conditional covariance of $X$, under the condition that $Y = y$ to be

$$\mathrm{Var}(X \mid Y = y) = \int_{-\infty}^{\infty} [x - E(X \mid Y = y)][x - E(X \mid Y = y)]^T f(x \mid y)\,dx. \tag{1.34}$$
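Next, suppose that

$$E\left( \begin{bmatrix} X \\ Y \end{bmatrix} \right) = \begin{bmatrix} \underline{\mu}_x \\ \underline{\mu}_y \end{bmatrix} \quad \text{and} \quad \mathrm{Var}\left( \begin{bmatrix} X \\ Y \end{bmatrix} \right) = \begin{bmatrix} R_{xx} & R_{xy} \\ R_{yx} & R_{yy} \end{bmatrix}.$$

Then it follows from (1.24) that

$$f(x, y) = f\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = \frac{1}{(2\pi)^{(n+m)/2} \left( \det \begin{bmatrix} R_{xx} & R_{xy} \\ R_{yx} & R_{yy} \end{bmatrix} \right)^{1/2}} \exp\left\{ -\frac{1}{2} \left( \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} \underline{\mu}_x \\ \underline{\mu}_y \end{bmatrix} \right)^T \begin{bmatrix} R_{xx} & R_{xy} \\ R_{yx} & R_{yy} \end{bmatrix}^{-1} \left( \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} \underline{\mu}_x \\ \underline{\mu}_y \end{bmatrix} \right) \right\},$$

where $X$ and $Y$ are $n$- and $m$-vectors, respectively. It can be verified that

$$f(x \mid y) = \frac{f(x, y)}{f(y)} = \frac{1}{(2\pi)^{n/2} (\det \tilde{R})^{1/2}} \exp\left\{ -\frac{1}{2} (x - \tilde{\underline{\mu}})^T \tilde{R}^{-1} (x - \tilde{\underline{\mu}}) \right\}, \tag{1.35}$$

where

$$\tilde{\underline{\mu}} = \underline{\mu}_x + R_{xy} R_{yy}^{-1} (y - \underline{\mu}_y)$$

and

$$\tilde{R} = R_{xx} - R_{xy} R_{yy}^{-1} R_{yx}$$

(cf. Exercise 1.10). Hence, by rewriting $\tilde{\underline{\mu}}$ and $\tilde{R}$, we have

$$E(X \mid Y = y) = E(X) + \mathrm{Cov}(X, Y)[\mathrm{Var}(Y)]^{-1} (y - E(Y)) \tag{1.36}$$

and

$$\mathrm{Var}(X \mid Y = y) = \mathrm{Var}(X) - \mathrm{Cov}(X, Y)[\mathrm{Var}(Y)]^{-1} \mathrm{Cov}(Y, X). \tag{1.37}$$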
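For scalar $X$ and $Y$ that are jointly normal, formulas (1.36) and (1.37) reduce to ordinary arithmetic, and the claim about the conditional density can be spot-checked by dividing the joint density by the marginal density of $Y$. The sketch below is an illustration under these assumptions; the function names, parameter values and evaluation point are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Scalar normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def joint_pdf(x, y, mu_x, mu_y, r_xx, r_xy, r_yy):
    """Bivariate normal density with variance matrix [[r_xx, r_xy], [r_xy, r_yy]]."""
    det = r_xx * r_yy - r_xy ** 2
    dx, dy = x - mu_x, y - mu_y
    quad = (r_yy * dx * dx - 2 * r_xy * dx * dy + r_xx * dy * dy) / det
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))

def conditional_moments(y, mu_x, mu_y, r_xx, r_xy, r_yy):
    """Scalar versions of (1.36) and (1.37)."""
    mean = mu_x + r_xy / r_yy * (y - mu_y)
    var = r_xx - r_xy / r_yy * r_xy
    return mean, var

# Hypothetical parameters with a positive definite variance matrix.
mu_x, mu_y, r_xx, r_xy, r_yy = 1.0, 2.0, 4.0, 1.5, 3.0
y = 4.0
cond_mean, cond_var = conditional_moments(y, mu_x, mu_y, r_xx, r_xy, r_yy)

# f(x, y) / f(y) should match the normal density with the conditional moments.
x = 0.3
ratio = joint_pdf(x, y, mu_x, mu_y, r_xx, r_xy, r_yy) / gaussian_pdf(y, mu_y, r_yy)
direct = gaussian_pdf(x, cond_mean, cond_var)
```

Observing $Y$ shifts the mean of $X$ in proportion to $y - E(Y)$ and never increases its variance, since the subtracted term in (1.37) is non-negative.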
1.3 Least-Squares Preliminaries
Let $\{\underline{\xi}_k\}$ be a sequence of random vectors, called a random sequence. Denote $E(\underline{\xi}_k) = \underline{\mu}_k$, $\mathrm{Cov}(\underline{\xi}_k, \underline{\xi}_j) = R_{kj}$ so that $\mathrm{Var}(\underline{\xi}_k) = R_{kk} := R_k$. A random sequence $\{\underline{\xi}_k\}$ is called a white noise sequence if $\mathrm{Cov}(\underline{\xi}_k, \underline{\xi}_j) = R_{kj} = R_k \delta_{kj}$ where $\delta_{kj} = 1$ if $k = j$ and $0$ if $k \ne j$. $\{\underline{\xi}_k\}$ is called a sequence of Gaussian (or normal) white noise if it is white and each $\underline{\xi}_k$ is normal.

Consider the observation equation of a linear system where the observed data is contaminated with noise, namely:

$$v_k = C_k x_k + D_k u_k + \underline{\xi}_k,$$

where, as usual, $\{x_k\}$ is the state sequence, $\{u_k\}$ the control sequence, and $\{v_k\}$ the data sequence. We assume, for each $k$, that the $q \times n$ constant matrix $C_k$, $q \times p$ constant matrix $D_k$, and the deterministic control $p$-vector $u_k$ are given. Usually, $\{\underline{\xi}_k\}$ is not known but will be assumed to be a sequence of zero-mean Gaussian white noise, namely: $E(\underline{\xi}_k) = 0$ and $E(\underline{\xi}_k \underline{\xi}_j^T) = R_{kj} \delta_{kj}$ with $R_k$ being symmetric and positive definite, $k, j = 1, 2, \cdots$.

Our goal is to obtain an optimal estimate $\hat{y}_k$ of the state vector $x_k$ from the information $\{v_k\}$. If there were no noise, then it is clear that $z_k - C_k \hat{y}_k = 0$, where $z_k := v_k - D_k u_k$, whenever this linear system has a solution; otherwise, some measurement of the error $z_k - C_k y_k$ must be minimized over all $y_k$. In general, when the data is contaminated with noise, we will minimize the quantity:

$$F(y_k, W_k) = E(z_k - C_k y_k)^T W_k (z_k - C_k y_k)$$

over all $n$-vectors $y_k$ where $W_k$ is a positive definite and symmetric $q \times q$ matrix, called a weight matrix. That is, we wish to find a $\hat{y}_k = \hat{y}_k(W_k)$ such that

$$F(\hat{y}_k, W_k) = \min_{y_k} F(y_k, W_k).$$
In addition, we wish to determine the optimal weight $\hat W_k$. To find $\hat y_k = \hat y_k(W_k)$, assuming that $(C_k^T W_k C_k)$ is nonsingular, we rewrite

$$\begin{aligned} F(y_k, W_k) &= E(z_k - C_k y_k)^T W_k (z_k - C_k y_k) \\ &= E[(C_k^T W_k C_k) y_k - C_k^T W_k z_k]^T (C_k^T W_k C_k)^{-1} [(C_k^T W_k C_k) y_k - C_k^T W_k z_k] \\ &\quad + E\left(z_k^T [I - W_k C_k (C_k^T W_k C_k)^{-1} C_k^T] W_k z_k\right), \end{aligned}$$

where the first term on the right-hand side is non-negative definite. To minimize $F(y_k, W_k)$, the first term on the right must vanish, so that

$$\hat y_k = (C_k^T W_k C_k)^{-1} C_k^T W_k z_k.$$

Note that if $(C_k^T W_k C_k)$ is singular, then $\hat y_k$ is not unique. To find the optimal weight $\hat W_k$, let us consider

$$F(\hat y_k, W_k) = E(z_k - C_k \hat y_k)^T W_k (z_k - C_k \hat y_k).$$
It is clear that this quantity does not attain a minimum value at a positive definite weight $W_k$, since such a minimum would result from $W_k = 0$. Hence, we need another measurement to determine an optimal $\hat W_k$. Noting that the original problem is to estimate the state vector $x_k$ by $\hat y_k(W_k)$, it is natural to consider a measurement of the error $(x_k - \hat y_k(W_k))$. But since not much about $x_k$ is known and only the noisy data can be measured, this measurement should be determined by the variance of the error. That is, we will minimize $\operatorname{Var}(x_k - \hat y_k(W_k))$ over all positive definite symmetric matrices $W_k$. We write $\hat y_k = \hat y_k(W_k)$ and

$$\begin{aligned} x_k - \hat y_k &= (C_k^T W_k C_k)^{-1}(C_k^T W_k C_k) x_k - (C_k^T W_k C_k)^{-1} C_k^T W_k z_k \\ &= (C_k^T W_k C_k)^{-1} C_k^T W_k (C_k x_k - z_k) \\ &= -(C_k^T W_k C_k)^{-1} C_k^T W_k \xi_k. \end{aligned}$$

Therefore, by the linearity of the expectation operation, we have

$$\begin{aligned} \operatorname{Var}(x_k - \hat y_k) &= (C_k^T W_k C_k)^{-1} C_k^T W_k E(\xi_k \xi_k^T) W_k C_k (C_k^T W_k C_k)^{-1} \\ &= (C_k^T W_k C_k)^{-1} C_k^T W_k R_k W_k C_k (C_k^T W_k C_k)^{-1}. \end{aligned}$$
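This is the quantity to be minimized. To write this as a perfect square, we need the positive square root of the positive definite symmetric matrix $R_k$, defined as follows: Let the eigenvalues of the $q \times q$ matrix $R_k$ be $\lambda_1, \ldots, \lambda_q$, which are all positive, and write $R_k = U^T \operatorname{diag}[\lambda_1, \ldots, \lambda_q] U$, where $U$ is a unitary matrix (formed by the normalized eigenvectors corresponding to $\lambda_i$, $i = 1, \ldots, q$). Then we define

$$R_k^{1/2} = U^T \operatorname{diag}[\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_q}] U,$$

which gives $(R_k^{1/2})(R_k^{1/2})^T = R_k$. It follows that

$$\operatorname{Var}(x_k - \hat y_k) = Q^T Q,$$

where $Q = (R_k^{1/2})^T W_k C_k (C_k^T W_k C_k)^{-1}$. By Lemma 1.1 (the matrix Schwarz inequality), under the assumption that $P$ is a $q \times n$ matrix with nonsingular $P^T P$, we have

$$Q^T Q \geq (P^T Q)^T (P^T P)^{-1} (P^T Q).$$

Hence, if $(C_k^T R_k^{-1} C_k)$ is nonsingular, we may choose $P = (R_k^{1/2})^{-1} C_k$, so that

$$P^T P = C_k^T ((R_k^{1/2})^T)^{-1} (R_k^{1/2})^{-1} C_k = C_k^T R_k^{-1} C_k$$

is nonsingular, and

$$\begin{aligned} (P^T Q)^T (P^T P)^{-1} (P^T Q) &= [C_k^T ((R_k^{1/2})^{-1})^T (R_k^{1/2})^T W_k C_k (C_k^T W_k C_k)^{-1}]^T (C_k^T R_k^{-1} C_k)^{-1} \\ &\quad \cdot [C_k^T ((R_k^{1/2})^{-1})^T (R_k^{1/2})^T W_k C_k (C_k^T W_k C_k)^{-1}] \\ &= (C_k^T R_k^{-1} C_k)^{-1} = \operatorname{Var}(x_k - \hat y_k(R_k^{-1})). \end{aligned}$$

Hence, $\operatorname{Var}(x_k - \hat y_k(W_k)) \geq \operatorname{Var}(x_k - \hat y_k(R_k^{-1}))$ for all positive definite symmetric weight matrices $W_k$. Therefore, the optimal weight matrix is $\hat W_k = R_k^{-1}$, and the optimal estimate of $x_k$ using this optimal weight is

$$\hat x_k := \hat y_k(R_k^{-1}) = (C_k^T R_k^{-1} C_k)^{-1} C_k^T R_k^{-1} (v_k - D_k u_k). \tag{1.38}$$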
We call $\hat x_k$ the least-squares optimal estimate of $x_k$. Note that $\hat x_k$ is a linear estimate of $x_k$. Being the image of a linear transformation of the data $v_k - D_k u_k$, it gives an unbiased estimate of $x_k$, in the sense that $E\hat x_k = E x_k$ (cf. Exercise 1.12), and it also gives a minimum variance estimate of $x_k$, since

$$\operatorname{Var}(x_k - \hat x_k) \leq \operatorname{Var}(x_k - \hat y_k(W_k))$$

for all positive definite symmetric weight matrices $W_k$.
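The effect of the weight can be checked numerically in the simplest setting. The sketch below assumes a scalar state observed q times, so that C_k is a column of ones and R_k is diagonal; the estimate then reduces to a weighted mean of the data. The function names and all numerical values are hypothetical.

```python
def weighted_estimate(z, weights):
    """y_hat(W) = (C^T W C)^{-1} C^T W z for C = [1, ..., 1]^T and diagonal W."""
    return sum(w * zi for w, zi in zip(weights, z)) / sum(weights)


def error_variance(weights, noise_vars):
    """(C^T W C)^{-1} C^T W R W C (C^T W C)^{-1} for the same C, diagonal W and R."""
    s = sum(weights)
    return sum(w * w * r for w, r in zip(weights, noise_vars)) / (s * s)


noise_vars = [1.0, 4.0, 0.25]   # diagonal entries of R_k (hypothetical)
z = [2.1, 1.4, 2.0]             # data z_k = v_k - D_k u_k (hypothetical)

optimal = [1.0 / r for r in noise_vars]   # the weight W_k = R_k^{-1}
uniform = [1.0, 1.0, 1.0]                 # unweighted least squares, for comparison

x_hat = weighted_estimate(z, optimal)
var_opt = error_variance(optimal, noise_vars)
var_uni = error_variance(uniform, noise_vars)
print(x_hat, var_opt, var_uni)
```

With the weight R_k^{-1} the error variance collapses to (C^T R^{-1} C)^{-1}, here the reciprocal of the sum of the inverse noise variances, and it is smaller than the variance obtained with uniform weights.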
Exercises

1.1. Prove Lemma 1.4.

1.2. Prove Lemma 1.6.

1.3. Give an example of two matrices $A$ and $B$ such that $A \geq B > 0$ but for which the inequality $AA^T \geq BB^T$ is not satisfied.

1.4. Prove Lemma 1.8.

1.5. Show that $\int_{-\infty}^{\infty} e^{-y^2}\,dy = \sqrt{\pi}$.

1.6. Verify that $\int_{-\infty}^{\infty} y^2 e^{-y^2}\,dy = \tfrac{1}{2}\sqrt{\pi}$. (Hint: Differentiate the integral $-\int_{-\infty}^{\infty} e^{-xy^2}\,dy$ with respect to $x$ and then let $x \to 1$.)

1.7. Let

$$f(x) = \frac{1}{(2\pi)^{n/2}(\det R)^{1/2}} \exp\left\{-\tfrac{1}{2}(x - \mu)^T R^{-1}(x - \mu)\right\}.$$

Show that

(a) $$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx := \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} f(x)\,dx_1 \cdots dx_n = \mu,$$

and (b) $\operatorname{Var}(X) = E(X - \mu)(X - \mu)^T = R$.

1.8. Verify the properties (1.32a-e) of the expectation, variance, and covariance.

1.9. Prove that two random vectors $X_1$ and $X_2$ with normal distributions are independent if and only if $\operatorname{Cov}(X_1, X_2) = 0$.

1.10. Verify (1.35).

1.11. Consider the minimization of the quantity

$$F(y_k) = E(z_k - C_k y_k)^T W_k (z_k - C_k y_k)$$

over all $n$-vectors $y_k$, where $z_k$ is a $q \times 1$ vector, $C_k$ a $q \times n$ matrix, and $W_k$ a $q \times q$ weight matrix, such that the matrix $(C_k^T W_k C_k)$ is nonsingular. By letting $dF(y_k)/dy_k = 0$, show that the optimal solution $\hat y_k$ is given by

$$\hat y_k = (C_k^T W_k C_k)^{-1} C_k^T W_k z_k.$$

(Hint: The differentiation of a scalar-valued function $F(y)$ with respect to the $n$-vector $y = [y_1 \cdots y_n]^T$ is defined to be

$$\frac{dF}{dy} = \left[\frac{\partial F}{\partial y_1} \cdots \frac{\partial F}{\partial y_n}\right]^T.)$$

1.12. Verify that the estimate $\hat x_k$ given by (1.38) is an unbiased estimate of $x_k$ in the sense that $E\hat x_k = E x_k$.
2. Kalman Filter: An Elementary Approach
This chapter is devoted to a most elementary introduction to the Kalman filtering algorithm. By assuming invertibility of certain matrices, the Kalman filtering "prediction-correction" algorithm will be derived based on the optimality criterion of least-squares unbiased estimation of the state vector with the optimal weight, using all available data information. The filtering algorithm is first obtained for a system with no deterministic (control) input. By superimposing the deterministic solution, we then arrive at the general Kalman filtering algorithm.
2.1 The Model

Consider a linear system with state-space description

$$\begin{cases} y_{k+1} = A_k y_k + B_k u_k + \Gamma_k \xi_k \\ w_k = C_k y_k + D_k u_k + \eta_k, \end{cases}$$

where $A_k, B_k, \Gamma_k, C_k, D_k$ are $n \times n$, $n \times m$, $n \times p$, $q \times n$, $q \times m$ (known) constant matrices, respectively, with $1 \leq m, p, q \leq n$, $\{u_k\}$ a (known) sequence of $m$-vectors (called a deterministic input sequence), and $\{\xi_k\}$ and $\{\eta_k\}$ are, respectively, (unknown) system and observation noise sequences, with known statistical information such as mean, variance, and covariance. Since both the deterministic input $\{u_k\}$ and the noise sequences $\{\xi_k\}$ and $\{\eta_k\}$ are present, the system is usually called a linear deterministic/stochastic system. This system can be decomposed into the sum of a linear deterministic system:

$$\begin{cases} z_{k+1} = A_k z_k + B_k u_k \\ s_k = C_k z_k + D_k u_k, \end{cases}$$

and a linear (purely) stochastic system:

$$\begin{cases} x_{k+1} = A_k x_k + \Gamma_k \xi_k \\ v_k = C_k x_k + \eta_k, \end{cases} \tag{2.1}$$

with $w_k = s_k + v_k$ and $y_k = z_k + x_k$. The advantage of the decomposition is that the solution of $z_k$ in the linear deterministic system is well known and is given by the so-called transition equation

$$z_k = (A_{k-1} \cdots A_0) z_0 + \sum_{i=1}^{k} (A_{k-1} \cdots A_i) B_{i-1} u_{i-1},$$

where the product $A_{k-1} \cdots A_i$ is understood to be the identity matrix when $i = k$. Hence, it is sufficient to derive the optimal estimate $\hat x_k$ of $x_k$ in the stochastic state-space description (2.1), so that

$$\hat y_k = z_k + \hat x_k$$

becomes the optimal estimate of the state vector $y_k$ in the original linear system. Of course, the estimate has to depend on the statistical information of the noise sequences. In this chapter, we will only consider zero-mean Gaussian white noise processes.
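The transition equation for the deterministic part can be checked against the recursion z_{k+1} = A_k z_k + B_k u_k. The sketch below does this for a scalar, time-varying system, with the product A_{k-1} ... A_i taken to be 1 when i = k. The function names and numerical values are hypothetical.

```python
def product(values):
    """Product of a list of scalars; the empty product is 1."""
    result = 1.0
    for value in values:
        result *= value
    return result


def z_recursive(a, b, u, z0, k):
    """Iterate z_{i+1} = a_i z_i + b_i u_i for i = 0, ..., k-1."""
    z = z0
    for i in range(k):
        z = a[i] * z + b[i] * u[i]
    return z


def z_closed_form(a, b, u, z0, k):
    """z_k = (a_{k-1}...a_0) z_0 + sum_{i=1}^{k} (a_{k-1}...a_i) b_{i-1} u_{i-1}."""
    total = product(a[0:k]) * z0
    for i in range(1, k + 1):
        total += product(a[i:k]) * b[i - 1] * u[i - 1]
    return total


a = [0.5, 2.0, -1.0, 0.25]   # A_0, ..., A_3 (hypothetical scalars)
b = [1.0, 0.5, 2.0, 1.0]     # B_0, ..., B_3
u = [1.0, -1.0, 0.5, 2.0]    # u_0, ..., u_3
z0 = 3.0

for k in range(5):
    print(k, z_recursive(a, b, u, z0, k), z_closed_form(a, b, u, z0, k))
```

Both columns agree for every k, which is a convenient sanity check on the index ranges in the closed-form sum.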
Assumption 2.1. Let $\{\xi_k\}$ and $\{\eta_k\}$ be sequences of zero-mean Gaussian white noise such that $\operatorname{Var}(\xi_k) = Q_k$ and $\operatorname{Var}(\eta_k) = R_k$ are positive definite matrices and $E(\xi_k \eta_\ell^T) = 0$ for all $k$ and $\ell$. The initial state $x_0$ is also assumed to be independent of $\xi_k$ and $\eta_k$ in the sense that $E(x_0 \xi_k^T) = 0$ and $E(x_0 \eta_k^T) = 0$ for all $k$.

2.2 Optimality Criterion

In determining the optimal estimate $\hat x_k$ of $x_k$, it will be seen that the optimality is in the sense of least-squares followed by choosing the optimal weight matrix that gives a minimum variance estimate, as discussed in Section 1.3. However, we will incorporate the information of all data $v_j$, $j = 0, 1, \ldots, k$, in determining the estimate $\hat x_k$ of $x_k$ (instead of just using $v_k$ as discussed in Section 1.3). To accomplish this, we introduce the vectors

$$\bar v_j = \begin{bmatrix} v_0 \\ \vdots \\ v_j \end{bmatrix}, \qquad j = 0, 1, \ldots,$$
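and obtain $\hat x_k$ from the data vector $\bar v_k$. For this approach, we assume for the time being that all the system matrices $A_j$ are nonsingular. Then it can be shown that the state-space description of the linear stochastic system can be written as

$$\bar v_j = H_{k,j} x_k + \bar\epsilon_{k,j}, \tag{2.2}$$

where

$$H_{k,j} = \begin{bmatrix} C_0 \Phi_{0k} \\ \vdots \\ C_j \Phi_{jk} \end{bmatrix} \quad\text{and}\quad \bar\epsilon_{k,j} = \begin{bmatrix} \epsilon_{k,0} \\ \vdots \\ \epsilon_{k,j} \end{bmatrix},$$

with $\Phi_{\ell k}$ being the transition matrices defined by

$$\Phi_{\ell k} = \begin{cases} A_{\ell-1} \cdots A_k & \text{if } \ell > k, \\ I & \text{if } \ell = k, \end{cases}$$

$\Phi_{\ell k} = \Phi_{k\ell}^{-1}$ if $\ell < k$, and

$$\epsilon_{k,\ell} = \eta_\ell - C_\ell \sum_{i=\ell+1}^{k} \Phi_{\ell i} \Gamma_{i-1} \xi_{i-1}.$$

Indeed, by applying the inverse transition property of $\Phi_{ki}$ described above and the transition equation

$$x_k = \Phi_{k\ell} x_\ell + \sum_{i=\ell+1}^{k} \Phi_{ki} \Gamma_{i-1} \xi_{i-1},$$

which can be easily obtained from the first recursive equation in (2.1), we have

$$x_\ell = \Phi_{\ell k} x_k - \sum_{i=\ell+1}^{k} \Phi_{\ell i} \Gamma_{i-1} \xi_{i-1};$$

and this yields

$$\begin{aligned} H_{k,j} x_k + \bar\epsilon_{k,j} &= \begin{bmatrix} C_0 \Phi_{0k} \\ \vdots \\ C_j \Phi_{jk} \end{bmatrix} x_k + \begin{bmatrix} \eta_0 - C_0 \sum_{i=1}^{k} \Phi_{0i} \Gamma_{i-1} \xi_{i-1} \\ \vdots \\ \eta_j - C_j \sum_{i=j+1}^{k} \Phi_{ji} \Gamma_{i-1} \xi_{i-1} \end{bmatrix} \\ &= \begin{bmatrix} C_0 x_0 + \eta_0 \\ \vdots \\ C_j x_j + \eta_j \end{bmatrix} = \begin{bmatrix} v_0 \\ \vdots \\ v_j \end{bmatrix} = \bar v_j, \end{aligned}$$

which is (2.2).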
Now, using the least-squares estimate discussed in Chapter 1, Section 1.3, with weight $W_{k,j} = (\operatorname{Var}(\bar\epsilon_{k,j}))^{-1}$, where the inverse is assumed only for the purpose of illustrating the optimality criterion, we arrive at the linear, unbiased, minimum variance least-squares estimate $\hat x_{k|j}$ of $x_k$ using the data $v_0, \ldots, v_j$.

Definition 2.1. (1) For $j = k$, we denote $\hat x_k = \hat x_{k|k}$ and call the estimation process a digital filtering process. (2) For $j < k$, we call $\hat x_{k|j}$ an optimal prediction of $x_k$ and the process a digital prediction process. (3) For $j > k$, we call $\hat x_{k|j}$ a smoothing estimate of $x_k$ and the process a digital smoothing process.

We will only discuss digital filtering. However, since $\hat x_k = \hat x_{k|k}$ is determined by using all data $v_0, \ldots, v_k$, the process is not applicable to real-time problems for very large values of $k$, since the need for storage of the data and the computational requirement grow with time. Hence, we will derive a recursive formula that gives $\hat x_k = \hat x_{k|k}$ from the "prediction" $\hat x_{k|k-1}$, and $\hat x_{k|k-1}$ from the estimate $\hat x_{k-1} = \hat x_{k-1|k-1}$. At each step, we only use the incoming bit of the data information so that very little storage of the data is necessary. This is what is usually called the Kalman filtering algorithm.

2.3 Prediction-Correction Formulation

To compute $\hat x_k$ in real-time, we will derive the recursive formula

$$\begin{cases} \hat x_{k|k} = \hat x_{k|k-1} + G_k (v_k - C_k \hat x_{k|k-1}) \\ \hat x_{k|k-1} = A_{k-1} \hat x_{k-1|k-1}, \end{cases}$$
where $G_k$ will be called the Kalman gain matrices. The starting point is the initial estimate $\hat x_0 = \hat x_{0|0}$. Since $\hat x_0$ is an unbiased estimate of the initial state $x_0$, we could use $\hat x_0 = E(x_0)$, which is a constant vector. In the actual Kalman filtering, $G_k$ must also be computed recursively. The two recursive processes together will be called the Kalman filtering process. Let $\hat x_{k|j}$ be the (optimal) least-squares estimate of $x_k$ with minimum variance by choosing the weight matrix to be

$$W_{k,j} = (\operatorname{Var}(\bar\epsilon_{k,j}))^{-1}$$

using $\bar v_j$ in (2.2) (see Section 1.3 for details). It is easy to verify that

$$W_{k,k-1}^{-1} = \begin{bmatrix} R_0 & & 0 \\ & \ddots & \\ 0 & & R_{k-1} \end{bmatrix} + \operatorname{Var} \begin{bmatrix} C_0 \sum_{i=1}^{k} \Phi_{0i} \Gamma_{i-1} \xi_{i-1} \\ \vdots \\ C_{k-1} \Phi_{k-1,k} \Gamma_{k-1} \xi_{k-1} \end{bmatrix} \tag{2.3}$$

and

$$W_{k,k}^{-1} = \begin{bmatrix} W_{k,k-1}^{-1} & 0 \\ 0 & R_k \end{bmatrix} \tag{2.4}$$

(cf. Exercise 2.1). Hence, $W_{k,k-1}$ and $W_{k,k}$ are positive definite (cf. Exercise 2.2). In this chapter, we also assume that the matrices

$$(H_{k,j}^T W_{k,j} H_{k,j}), \qquad j = k-1 \text{ and } k,$$

are nonsingular. Then it follows from Chapter 1, Section 1.3, that

$$\hat x_{k|j} = (H_{k,j}^T W_{k,j} H_{k,j})^{-1} H_{k,j}^T W_{k,j} \bar v_j. \tag{2.5}$$
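Our first goal is to relate $\hat x_{k|k-1}$ with $\hat x_{k|k}$. To do so, we observe that

$$H_{k,k}^T W_{k,k} H_{k,k} = \begin{bmatrix} H_{k,k-1}^T & C_k^T \end{bmatrix} \begin{bmatrix} W_{k,k-1} & 0 \\ 0 & R_k^{-1} \end{bmatrix} \begin{bmatrix} H_{k,k-1} \\ C_k \end{bmatrix} = H_{k,k-1}^T W_{k,k-1} H_{k,k-1} + C_k^T R_k^{-1} C_k$$

and

$$H_{k,k}^T W_{k,k} \bar v_k = H_{k,k-1}^T W_{k,k-1} \bar v_{k-1} + C_k^T R_k^{-1} v_k.$$

Using (2.5) and the above two equalities, we have

$$(H_{k,k-1}^T W_{k,k-1} H_{k,k-1} + C_k^T R_k^{-1} C_k)\hat x_{k|k-1} = H_{k,k-1}^T W_{k,k-1} \bar v_{k-1} + C_k^T R_k^{-1} C_k \hat x_{k|k-1}$$

and

$$\begin{aligned} (H_{k,k-1}^T W_{k,k-1} H_{k,k-1} + C_k^T R_k^{-1} C_k)\hat x_{k|k} &= (H_{k,k}^T W_{k,k} H_{k,k})\hat x_{k|k} \\ &= H_{k,k-1}^T W_{k,k-1} \bar v_{k-1} + C_k^T R_k^{-1} v_k. \end{aligned}$$

A simple subtraction gives

$$(H_{k,k-1}^T W_{k,k-1} H_{k,k-1} + C_k^T R_k^{-1} C_k)(\hat x_{k|k} - \hat x_{k|k-1}) = C_k^T R_k^{-1} (v_k - C_k \hat x_{k|k-1}).$$

Now define

$$\begin{aligned} G_k &= (H_{k,k-1}^T W_{k,k-1} H_{k,k-1} + C_k^T R_k^{-1} C_k)^{-1} C_k^T R_k^{-1} \\ &= (H_{k,k}^T W_{k,k} H_{k,k})^{-1} C_k^T R_k^{-1}. \end{aligned}$$

Then we have

$$\hat x_{k|k} = \hat x_{k|k-1} + G_k (v_k - C_k \hat x_{k|k-1}). \tag{2.6}$$

Since $\hat x_{k|k-1}$ is a one-step prediction and $(v_k - C_k \hat x_{k|k-1})$ is the error between the real data and the prediction, (2.6) is in fact a "prediction-correction" formula with the Kalman gain matrix $G_k$ as a weight matrix. To complete the recursive process, we need an equation that gives $\hat x_{k|k-1}$ from $\hat x_{k-1|k-1}$. This is simply the equation

$$\hat x_{k|k-1} = A_{k-1} \hat x_{k-1|k-1}. \tag{2.7}$$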
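To prove this, we first note that

$$\bar\epsilon_{k,k-1} = \bar\epsilon_{k-1,k-1} - H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1} \xi_{k-1},$$

so that

$$W_{k,k-1}^{-1} = W_{k-1,k-1}^{-1} + H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \Phi_{k-1,k}^T H_{k-1,k-1}^T \tag{2.8}$$

(cf. Exercise 2.3). Hence, by Lemma 1.2, we have

$$\begin{aligned} W_{k,k-1} &= W_{k-1,k-1} - W_{k-1,k-1} H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1} (Q_{k-1}^{-1} \\ &\quad + \Gamma_{k-1}^T \Phi_{k-1,k}^T H_{k-1,k-1}^T W_{k-1,k-1} H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1})^{-1} \\ &\quad \cdot \Gamma_{k-1}^T \Phi_{k-1,k}^T H_{k-1,k-1}^T W_{k-1,k-1} \end{aligned} \tag{2.9}$$

(cf. Exercise 2.4). Then by the transition relation

$$H_{k,k-1} = H_{k-1,k-1} \Phi_{k-1,k}$$

we have

$$\begin{aligned} H_{k,k-1}^T W_{k,k-1} &= \Phi_{k-1,k}^T \{ I - H_{k-1,k-1}^T W_{k-1,k-1} H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1} (Q_{k-1}^{-1} \\ &\quad + \Gamma_{k-1}^T \Phi_{k-1,k}^T H_{k-1,k-1}^T W_{k-1,k-1} H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1})^{-1} \\ &\quad \cdot \Gamma_{k-1}^T \Phi_{k-1,k}^T \} H_{k-1,k-1}^T W_{k-1,k-1} \end{aligned} \tag{2.10}$$

(cf. Exercise 2.5). It follows that

$$(H_{k,k-1}^T W_{k,k-1} H_{k,k-1}) \Phi_{k,k-1} (H_{k-1,k-1}^T W_{k-1,k-1} H_{k-1,k-1})^{-1} H_{k-1,k-1}^T W_{k-1,k-1} = H_{k,k-1}^T W_{k,k-1} \tag{2.11}$$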
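(cf. Exercise 2.6). This, together with (2.5) with $j = k-1$ and $k$, gives (2.7). Our next goal is to derive a recursive scheme for calculating the Kalman gain matrices $G_k$. Write

$$G_k = P_{k,k} C_k^T R_k^{-1},$$

where

$$P_{k,k} = (H_{k,k}^T W_{k,k} H_{k,k})^{-1},$$

and set

$$P_{k,k-1} = (H_{k,k-1}^T W_{k,k-1} H_{k,k-1})^{-1}.$$

Then, since

$$P_{k,k}^{-1} = P_{k,k-1}^{-1} + C_k^T R_k^{-1} C_k,$$

we obtain, using Lemma 1.2,

$$P_{k,k} = P_{k,k-1} - P_{k,k-1} C_k^T (C_k P_{k,k-1} C_k^T + R_k)^{-1} C_k P_{k,k-1}.$$

It can be proved that

$$G_k = P_{k,k-1} C_k^T (C_k P_{k,k-1} C_k^T + R_k)^{-1} \tag{2.12}$$

(cf. Exercise 2.7), so that

$$P_{k,k} = (I - G_k C_k) P_{k,k-1}. \tag{2.13}$$

Furthermore, we can show that

$$P_{k,k-1} = A_{k-1} P_{k-1,k-1} A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \tag{2.14}$$

(cf. Exercise 2.8). Hence, using (2.13) and (2.14) with the initial matrix $P_{0,0}$, we obtain a recursive scheme to compute $P_{k-1,k-1}$, $P_{k,k-1}$, $G_k$ and $P_{k,k}$ for $k = 1, 2, \ldots$. Moreover, it can be shown that

$$P_{k,k-1} = E(x_k - \hat x_{k|k-1})(x_k - \hat x_{k|k-1})^T = \operatorname{Var}(x_k - \hat x_{k|k-1}) \tag{2.15}$$

(cf. Exercise 2.9) and that

$$P_{k,k} = E(x_k - \hat x_{k|k})(x_k - \hat x_{k|k})^T = \operatorname{Var}(x_k - \hat x_{k|k}). \tag{2.16}$$

In particular, we have

$$P_{0,0} = E(x_0 - Ex_0)(x_0 - Ex_0)^T = \operatorname{Var}(x_0).$$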
Finally, combining all the results obtained above, we arrive at the following Kalman filtering process for the linear stochastic system with state-space description (2.1):

$$\begin{cases} P_{0,0} = \operatorname{Var}(x_0) \\ P_{k,k-1} = A_{k-1} P_{k-1,k-1} A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \\ G_k = P_{k,k-1} C_k^T (C_k P_{k,k-1} C_k^T + R_k)^{-1} \\ P_{k,k} = (I - G_k C_k) P_{k,k-1} \\ \hat x_{0|0} = E(x_0) \\ \hat x_{k|k-1} = A_{k-1} \hat x_{k-1|k-1} \\ \hat x_{k|k} = \hat x_{k|k-1} + G_k (v_k - C_k \hat x_{k|k-1}) \\ k = 1, 2, \ldots. \end{cases} \tag{2.17}$$

This algorithm may be realized as shown in Fig. 2.1.

Fig. 2.1.
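The recursion (2.17) is easiest to follow in the scalar, time-invariant case, where every matrix is a number and the matrix inverse becomes a division. The following is a minimal sketch under that assumption; the function name, argument names, and numerical values are hypothetical.

```python
def kalman_filter(a, gamma, c, q, r, x0_mean, x0_var, data):
    """Scalar, time-invariant instance of (2.17).

    data[0] is v_1, data[1] is v_2, and so on.
    Returns a list of pairs (x_hat_{k|k}, P_{k,k}) for k = 1, 2, ...
    """
    x_hat, p = x0_mean, x0_var          # x_hat_{0|0} = E(x_0), P_{0,0} = Var(x_0)
    history = []
    for v in data:
        p_pred = a * p * a + gamma * q * gamma      # P_{k,k-1}
        g = p_pred * c / (c * p_pred * c + r)       # Kalman gain G_k
        x_pred = a * x_hat                          # prediction x_hat_{k|k-1}
        x_hat = x_pred + g * (v - c * x_pred)       # correction x_hat_{k|k}
        p = (1.0 - g * c) * p_pred                  # P_{k,k}
        history.append((x_hat, p))
    return history


# Hypothetical example: a = gamma = c = 1, Q = R = 1, E(x_0) = 0, Var(x_0) = 1.
history = kalman_filter(a=1.0, gamma=1.0, c=1.0, q=1.0, r=1.0,
                        x0_mean=0.0, x0_var=1.0, data=[1.0, 2.0])
for k, (x_hat, p) in enumerate(history, start=1):
    print(k, x_hat, p)
```

Only the previous estimate and the previous error variance are carried from one step to the next, which is the storage advantage described in Section 2.2.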
2.4 Kalman Filtering Process
Let us now consider the general linear deterministic/stochastic system where the deterministic control input $\{u_k\}$ is present. More precisely, let us consider the state-space description

$$\begin{cases} x_{k+1} = A_k x_k + B_k u_k + \Gamma_k \xi_k \\ v_k = C_k x_k + D_k u_k + \eta_k, \end{cases}$$

where $\{u_k\}$ is a sequence of $m$-vectors with $1 \leq m \leq n$. Then by superimposing the deterministic solution with (2.17), the Kalman filtering process for this system is given by

$$\begin{cases} P_{0,0} = \operatorname{Var}(x_0) \\ P_{k,k-1} = A_{k-1} P_{k-1,k-1} A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \\ G_k = P_{k,k-1} C_k^T (C_k P_{k,k-1} C_k^T + R_k)^{-1} \\ P_{k,k} = (I - G_k C_k) P_{k,k-1} \\ \hat x_{0|0} = E(x_0) \\ \hat x_{k|k-1} = A_{k-1} \hat x_{k-1|k-1} + B_{k-1} u_{k-1} \\ \hat x_{k|k} = \hat x_{k|k-1} + G_k (v_k - D_k u_k - C_k \hat x_{k|k-1}) \\ k = 1, 2, \ldots \end{cases} \tag{2.18}$$

(cf. Exercise 2.13). This algorithm may be implemented as shown in Fig. 2.2.

Fig. 2.2.
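The superposition argument can be checked numerically. The sketch below assumes a scalar, time-invariant system and compares two computations: the filter with control input applied directly to the data, and the control-free filter applied to the data with the deterministic output removed, with the deterministic state (started at zero) added back afterwards. All names and numerical values are hypothetical.

```python
def kalman_with_control(a, b, c, d, gamma, q, r, x0_mean, x0_var, controls, data):
    """Scalar instance of the filter with control input.

    controls[k] is u_k for k = 0, 1, ...; data[k-1] is v_k for k = 1, 2, ...
    Returns the estimates x_hat_{k|k} for k = 1, 2, ...
    """
    x_hat, p = x0_mean, x0_var
    estimates = []
    for k, v in enumerate(data, start=1):
        p_pred = a * p * a + gamma * q * gamma
        g = p_pred * c / (c * p_pred * c + r)
        x_pred = a * x_hat + b * controls[k - 1]
        x_hat = x_pred + g * (v - d * controls[k] - c * x_pred)
        p = (1.0 - g * c) * p_pred
        estimates.append(x_hat)
    return estimates


a, b, c, d, gamma, q, r = 0.9, 0.5, 1.0, 0.2, 1.0, 0.1, 0.4
controls = [1.0, -0.5, 2.0, 0.0]   # u_0, ..., u_3 (hypothetical)
data = [0.7, 0.1, 1.9]             # v_1, v_2, v_3 (hypothetical)

direct = kalman_with_control(a, b, c, d, gamma, q, r, 0.3, 1.0, controls, data)

# Deterministic part z_{k+1} = a z_k + b u_k with z_0 = 0.
z = [0.0]
for k in range(len(data)):
    z.append(a * z[k] + b * controls[k])

# Remove the deterministic output s_k = c z_k + d u_k and filter without control.
zeros = [0.0] * len(controls)
reduced = [v - c * z[k] - d * controls[k] for k, v in enumerate(data, start=1)]
stochastic = kalman_with_control(a, b, c, d, gamma, q, r, 0.3, 1.0, zeros, reduced)
superposed = [z[k] + x for k, x in enumerate(stochastic, start=1)]

print(direct)
print(superposed)
```

The two lists agree up to rounding, because the gain and error-variance recursions do not involve the control input at all.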
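Exercises

2.1. Let

$$\bar\epsilon_{k,j} = \begin{bmatrix} \epsilon_{k,0} \\ \vdots \\ \epsilon_{k,j} \end{bmatrix} \quad\text{and}\quad \epsilon_{k,\ell} = \eta_\ell - C_\ell \sum_{i=\ell+1}^{k} \Phi_{\ell i} \Gamma_{i-1} \xi_{i-1},$$

where $\{\xi_k\}$ and $\{\eta_k\}$ are both zero-mean Gaussian white noise sequences with $\operatorname{Var}(\xi_k) = Q_k$ and $\operatorname{Var}(\eta_k) = R_k$. Define $W_{k,j} = (\operatorname{Var}(\bar\epsilon_{k,j}))^{-1}$. Show that

$$W_{k,k-1}^{-1} = \begin{bmatrix} R_0 & & 0 \\ & \ddots & \\ 0 & & R_{k-1} \end{bmatrix} + \operatorname{Var} \begin{bmatrix} C_0 \sum_{i=1}^{k} \Phi_{0i} \Gamma_{i-1} \xi_{i-1} \\ \vdots \\ C_{k-1} \Phi_{k-1,k} \Gamma_{k-1} \xi_{k-1} \end{bmatrix}$$

and

$$W_{k,k}^{-1} = \begin{bmatrix} W_{k,k-1}^{-1} & 0 \\ 0 & R_k \end{bmatrix}.$$

2.2. Show that the sum of a positive definite matrix $A$ and a non-negative definite matrix $B$ is positive definite.

2.3. Let $\bar\epsilon_{k,j}$ and $W_{k,j}$ be defined as in Exercise 2.1. Verify the relation

$$\bar\epsilon_{k,k-1} = \bar\epsilon_{k-1,k-1} - H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1} \xi_{k-1},$$

where

$$H_{k,j} = \begin{bmatrix} C_0 \Phi_{0k} \\ \vdots \\ C_j \Phi_{jk} \end{bmatrix},$$

and then show that

$$W_{k,k-1}^{-1} = W_{k-1,k-1}^{-1} + H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \Phi_{k-1,k}^T H_{k-1,k-1}^T.$$

2.4. Use Exercise 2.3 and Lemma 1.2 to show that

$$\begin{aligned} W_{k,k-1} &= W_{k-1,k-1} - W_{k-1,k-1} H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1} (Q_{k-1}^{-1} \\ &\quad + \Gamma_{k-1}^T \Phi_{k-1,k}^T H_{k-1,k-1}^T W_{k-1,k-1} H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1})^{-1} \\ &\quad \cdot \Gamma_{k-1}^T \Phi_{k-1,k}^T H_{k-1,k-1}^T W_{k-1,k-1}. \end{aligned}$$

2.5. Use Exercise 2.4 and the relation $H_{k,k-1} = H_{k-1,k-1} \Phi_{k-1,k}$ to show that

$$\begin{aligned} H_{k,k-1}^T W_{k,k-1} &= \Phi_{k-1,k}^T \{ I - H_{k-1,k-1}^T W_{k-1,k-1} H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1} (Q_{k-1}^{-1} \\ &\quad + \Gamma_{k-1}^T \Phi_{k-1,k}^T H_{k-1,k-1}^T W_{k-1,k-1} H_{k-1,k-1} \Phi_{k-1,k} \Gamma_{k-1})^{-1} \\ &\quad \cdot \Gamma_{k-1}^T \Phi_{k-1,k}^T \} H_{k-1,k-1}^T W_{k-1,k-1}. \end{aligned}$$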
2.6. Use Exercise 2.5 to derive the identity:

$$(H_{k,k-1}^T W_{k,k-1} H_{k,k-1}) \Phi_{k,k-1} (H_{k-1,k-1}^T W_{k-1,k-1} H_{k-1,k-1})^{-1} H_{k-1,k-1}^T W_{k-1,k-1} = H_{k,k-1}^T W_{k,k-1}.$$

2.7. Use Lemma 1.2 to show that

$$P_{k,k-1} C_k^T (C_k P_{k,k-1} C_k^T + R_k)^{-1} = P_{k,k} C_k^T R_k^{-1} = G_k.$$

2.8. Start with $P_{k,k-1} = (H_{k,k-1}^T W_{k,k-1} H_{k,k-1})^{-1}$. Use Lemma 1.2, (2.8), and the definition of $P_{k,k} = (H_{k,k}^T W_{k,k} H_{k,k})^{-1}$ to show that

$$P_{k,k-1} = A_{k-1} P_{k-1,k-1} A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T.$$

2.9. Use (2.5) and (2.2) to prove that

$$E(x_k - \hat x_{k|k-1})(x_k - \hat x_{k|k-1})^T = P_{k,k-1}$$

and

$$E(x_k - \hat x_{k|k})(x_k - \hat x_{k|k})^T = P_{k,k}.$$

2.10. Consider the one-dimensional linear stochastic dynamic system

$$x_{k+1} = a x_k + \xi_k, \qquad x_0 = 0,$$

where $E(x_k) = 0$, $\operatorname{Var}(x_k) = \sigma^2$, $E(x_k \xi_j) = 0$, $E(\xi_k) = 0$, and $E(\xi_k \xi_j) = \mu^2 \delta_{kj}$. Prove that $\sigma^2 = \mu^2/(1 - a^2)$ and $E(x_k x_{k+j}) = a^{|j|} \sigma^2$ for all integers $j$.

2.11. Consider the one-dimensional stochastic linear system

with $E(\eta_k) = 0$, $\operatorname{Var}(\eta_k) = \sigma^2$, $E(x_0) = 0$ and $\operatorname{Var}(x_0) = \mu^2$. Show that

and that $\hat x_{k|k} \to c$ for some constant $c$ as $k \to \infty$.

2.12. Let $\{v_k\}$ be a sequence of data obtained from the observation of a zero-mean random vector $y$ with unknown variance $Q$. The variance of $y$ can be estimated by

Derive a prediction-correction recursive formula for this estimation.

2.13. Consider the linear deterministic/stochastic system

$$\begin{cases} x_{k+1} = A_k x_k + B_k u_k + \Gamma_k \xi_k \\ v_k = C_k x_k + D_k u_k + \eta_k, \end{cases}$$

where $\{u_k\}$ is a given sequence of deterministic control input $m$-vectors, $1 \leq m \leq n$. Suppose that Assumption 2.1 is satisfied and the matrix $\operatorname{Var}(\bar\epsilon_{k,j})$ is nonsingular (cf. (2.2) for the definition of $\bar\epsilon_{k,j}$). Derive the Kalman filtering equations for this model.

2.14. In digital signal processing, a widely used mathematical model is the following so-called ARMA (autoregressive moving-average) process:

$$v_k = \sum_{i=1}^{N} B_i v_{k-i} + \sum_{i=0}^{M} A_i u_{k-i},$$

where the $n \times n$ matrices $B_1, \ldots, B_N$ and the $n \times q$ matrices $A_0, A_1, \ldots, A_M$ are independent of the time variable $k$, and $\{u_k\}$ and $\{v_k\}$ are input and output digital signal sequences, respectively (cf. Fig. 2.3). Assuming that $M \leq N$, show that the input-output relationship can be described as a state-space model

$$\begin{cases} x_{k+1} = A x_k + B u_k \\ v_k = C x_k + D u_k \end{cases}$$

with $x_0 = 0$, where

$$A = \begin{bmatrix} B_1 & I & 0 & \cdots & 0 \\ B_2 & 0 & I & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ B_{N-1} & 0 & 0 & \cdots & I \\ B_N & 0 & 0 & \cdots & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} A_1 + B_1 A_0 \\ A_2 + B_2 A_0 \\ \vdots \\ A_M + B_M A_0 \\ B_{M+1} A_0 \\ \vdots \\ B_N A_0 \end{bmatrix},$$

$$C = [\,I \ 0 \ \cdots \ 0\,] \quad\text{and}\quad D = [A_0].$$

Fig. 2.3.
3. Orthogonal Projection and Kalman Filter
The elementary approach to the derivation of the optimal Kalman filtering process discussed in Chapter 2 has the advantage that the optimal estimate $\hat x_k = \hat x_{k|k}$ of the state vector $x_k$ is easily understood to be a least-squares estimate of $x_k$ with the properties that (i) the transformation that yields $\hat x_k$ from the data $\bar v_k = [v_0^T \cdots v_k^T]^T$ is linear, (ii) $\hat x_k$ is unbiased in the sense that $E(\hat x_k) = E(x_k)$, and (iii) it yields a minimum variance estimate with $(\operatorname{Var}(\bar\epsilon_{k,k}))^{-1}$ as the optimal weight. The disadvantage of this elementary approach is that certain matrices must be assumed to be nonsingular. In this chapter, we will drop the nonsingularity assumptions and give a rigorous derivation of the Kalman filtering algorithm.

3.1 Orthogonality Characterization of Optimal Estimates

Consider the linear stochastic system described by (2.1) such that Assumption 2.1 is satisfied. That is, consider the state-space description

$$\begin{cases} x_{k+1} = A_k x_k + \Gamma_k \xi_k \\ v_k = C_k x_k + \eta_k, \end{cases} \tag{3.1}$$

where $A_k$, $\Gamma_k$ and $C_k$ are known $n \times n$, $n \times p$ and $q \times n$ constant matrices, respectively, with $1 \leq p, q \leq n$, and

$$\begin{aligned} &E(\xi_k) = 0, \quad E(\xi_k \xi_\ell^T) = Q_k \delta_{k\ell}, \\ &E(\eta_k) = 0, \quad E(\eta_k \eta_\ell^T) = R_k \delta_{k\ell}, \\ &E(\xi_k \eta_\ell^T) = 0, \quad E(x_0 \xi_k^T) = 0, \quad E(x_0 \eta_k^T) = 0, \end{aligned}$$

for all $k, \ell = 0, 1, \ldots$, with $Q_k$ and $R_k$ being positive definite and symmetric matrices. Let $x$ be a random $n$-vector and $w$ a random $q$-vector. We define the "inner product" $\langle x, w \rangle$ to be the $n \times q$ matrix

$$\langle x, w \rangle = \operatorname{Cov}(x, w) = E(x - E(x))(w - E(w))^T.$$

Let $\|w\|_q$ be the positive square root of $\langle w, w \rangle$. That is, $\|w\|_q$ is a non-negative definite $q \times q$ matrix with

$$\|w\|_q^2 = \|w\|_q \|w\|_q^T = \langle w, w \rangle.$$

Similarly, let $\|x\|_n$ be the positive square root of $\langle x, x \rangle$. Now, let $w_0, \ldots, w_r$ be random $q$-vectors and consider the "linear span":

$$Y(w_0, \ldots, w_r) = \left\{ y : y = \sum_{i=0}^{r} P_i w_i, \quad P_0, \ldots, P_r,\ n \times q \text{ constant matrices} \right\}.$$
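The first minimization problem we will study is to determine a $\hat y$ in $Y(w_0, \ldots, w_r)$ such that $\operatorname{tr}\|x_k - \hat y\|_n^2 = F_k$, where

$$F_k := \min\{\operatorname{tr}\|x_k - y\|_n^2 : \ y \in Y(w_0, \ldots, w_r)\}. \tag{3.2}$$

The following result characterizes $\hat y$.

Lemma 3.1. $\hat y \in Y(w_0, \ldots, w_r)$ satisfies $\operatorname{tr}\|x_k - \hat y\|_n^2 = F_k$ if and only if

$$\langle x_k - \hat y, w_j \rangle = O_{n \times q}$$

for all $j = 0, 1, \ldots, r$. Furthermore, $\hat y$ is unique in the sense that

$$\operatorname{tr}\|x_k - \hat y\|_n^2 = \operatorname{tr}\|x_k - \tilde y\|_n^2$$

only if $\hat y = \tilde y$.

To prove this lemma, we first suppose that $\operatorname{tr}\|x_k - \hat y\|_n^2 = F_k$ but $\langle x_k - \hat y, w_{j_0} \rangle = C \neq O_{n \times q}$ for some $j_0$ where $0 \leq j_0 \leq r$. Then $w_{j_0} \neq 0$, so that $\|w_{j_0}\|_q^2$ is a positive definite symmetric matrix and so is its inverse $\|w_{j_0}\|_q^{-2}$. Hence, $C\|w_{j_0}\|_q^{-2}C^T \neq O_{n \times n}$ and is a non-negative definite and symmetric matrix. It can be shown that

$$\operatorname{tr}\{C\|w_{j_0}\|_q^{-2}C^T\} > 0 \tag{3.3}$$

(cf. Exercise 3.1). Now, the vector $\hat y + C\|w_{j_0}\|_q^{-2}w_{j_0}$ is in $Y(w_0, \ldots, w_r)$ and

$$\begin{aligned} \operatorname{tr}\|x_k - (\hat y + C\|w_{j_0}\|_q^{-2}w_{j_0})\|_n^2 &= \operatorname{tr}\{\|x_k - \hat y\|_n^2 - \langle x_k - \hat y, w_{j_0} \rangle (C\|w_{j_0}\|_q^{-2})^T \\ &\quad - C\|w_{j_0}\|_q^{-2}\langle w_{j_0}, x_k - \hat y \rangle + C\|w_{j_0}\|_q^{-2}\|w_{j_0}\|_q^2 (C\|w_{j_0}\|_q^{-2})^T\} \\ &= \operatorname{tr}\{\|x_k - \hat y\|_n^2 - C\|w_{j_0}\|_q^{-2}C^T\} \\ &< \operatorname{tr}\|x_k - \hat y\|_n^2 = F_k \end{aligned}$$

by using (3.3). This contradicts the definition of $F_k$ in (3.2).

Conversely, let $\langle x_k - \hat y, w_j \rangle = O_{n \times q}$ for all $j = 0, 1, \ldots, r$. Let $y$ be an arbitrary random $n$-vector in $Y(w_0, \ldots, w_r)$ and write $y_0 = y - \hat y = \sum_{j=0}^{r} P_{0j} w_j$, where $P_{0j}$ are constant $n \times q$ matrices, $j = 0, 1, \ldots, r$. Then

$$\begin{aligned} \operatorname{tr}\|x_k - y\|_n^2 &= \operatorname{tr}\|(x_k - \hat y) - y_0\|_n^2 \\ &= \operatorname{tr}\{\|x_k - \hat y\|_n^2 - \langle x_k - \hat y, y_0 \rangle - \langle y_0, x_k - \hat y \rangle + \|y_0\|_n^2\} \\ &= \operatorname{tr}\Big\{\|x_k - \hat y\|_n^2 - \sum_{j=0}^{r} \langle x_k - \hat y, w_j \rangle P_{0j}^T - \sum_{j=0}^{r} P_{0j} \langle x_k - \hat y, w_j \rangle^T + \|y_0\|_n^2\Big\} \\ &= \operatorname{tr}\|x_k - \hat y\|_n^2 + \operatorname{tr}\|y_0\|_n^2 \\ &\geq \operatorname{tr}\|x_k - \hat y\|_n^2, \end{aligned}$$

so that $\operatorname{tr}\|x_k - \hat y\|_n^2 = F_k$. Furthermore, equality is attained if and only if $\operatorname{tr}\|y_0\|_n^2 = 0$ or $y_0 = 0$, so that $y = \hat y$ (cf. Exercise 3.1). This completes the proof of the lemma.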
3.2 Innovations Sequences

To use the data information, we require an "orthogonalization" process.

Definition 3.1. Given a random $q$-vector data sequence $\{v_j\}$, $j = 0, \ldots, k$. The innovations sequence $\{z_j\}$, $j = 0, \ldots, k$, of $\{v_j\}$ (i.e., a sequence obtained by changing the original data sequence $\{v_j\}$) is defined by

$$z_j = v_j - C_j \hat y_{j-1}, \qquad j = 0, 1, \ldots, k, \tag{3.4}$$

with $\hat y_{-1} = 0$ and

$$\hat y_{j-1} = \sum_{i=0}^{j-1} \hat P_{j-1,i} v_i \in Y(v_0, \ldots, v_{j-1}), \qquad j = 1, \ldots, k,$$

where the $q \times n$ matrices $C_j$ are the observation matrices in (3.1) and the $n \times q$ matrices $\hat P_{j-1,i}$ are chosen so that $\hat y_{j-1}$ solves the minimization problem (3.2) with $Y(w_0, \ldots, w_r)$ replaced by $Y(v_0, \ldots, v_{j-1})$.

We first give the correlation property of the innovations sequence.
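Lemma 3.2. The innovations sequence $\{z_j\}$ of $\{v_j\}$ satisfies the following property:

$$\langle z_j, z_\ell \rangle = (R_\ell + C_\ell \|x_\ell - \hat y_{\ell-1}\|_n^2 C_\ell^T)\delta_{j\ell},$$

where $R_\ell = \operatorname{Var}(\eta_\ell) > 0$.

For convenience, we set

$$\hat e_j = C_j (x_j - \hat y_{j-1}). \tag{3.5}$$

To prove the lemma, we first observe that

$$z_j = \hat e_j + \eta_j, \tag{3.6}$$

where $\{\eta_k\}$ is the observation noise sequence, and

$$\langle \eta_\ell, \hat e_j \rangle = O_{q \times q} \quad\text{for all}\quad \ell \geq j. \tag{3.7}$$

Clearly, (3.6) follows from (3.4), (3.5), and the observation equation in (3.1). The proof of (3.7) is left to the reader as an exercise (cf. Exercise 3.2). Now, for $j = \ell$, we have, by (3.6), (3.7), and (3.5) consecutively,

$$\begin{aligned} \langle z_\ell, z_\ell \rangle &= \langle \hat e_\ell + \eta_\ell, \hat e_\ell + \eta_\ell \rangle \\ &= \langle \hat e_\ell, \hat e_\ell \rangle + \langle \eta_\ell, \eta_\ell \rangle \\ &= C_\ell \|x_\ell - \hat y_{\ell-1}\|_n^2 C_\ell^T + R_\ell. \end{aligned}$$

For $j \neq \ell$, since $\langle \hat e_\ell, \hat e_j \rangle^T = \langle \hat e_j, \hat e_\ell \rangle$, we can assume without loss of generality that $j > \ell$. Hence, by (3.6), (3.7), and Lemma 3.1 we have

$$\begin{aligned} \langle z_j, z_\ell \rangle &= \langle \hat e_j, \hat e_\ell \rangle + \langle \hat e_j, \eta_\ell \rangle + \langle \eta_j, \hat e_\ell \rangle + \langle \eta_j, \eta_\ell \rangle \\ &= \langle \hat e_j, \hat e_\ell + \eta_\ell \rangle \\ &= \langle \hat e_j, z_\ell \rangle \\ &= \langle \hat e_j, v_\ell - C_\ell \hat y_{\ell-1} \rangle \\ &= \Big\langle C_j (x_j - \hat y_{j-1}),\ v_\ell - C_\ell \sum_{i=0}^{\ell-1} \hat P_{\ell-1,i} v_i \Big\rangle \\ &= C_j \langle x_j - \hat y_{j-1}, v_\ell \rangle - C_j \sum_{i=0}^{\ell-1} \langle x_j - \hat y_{j-1}, v_i \rangle \hat P_{\ell-1,i}^T C_\ell^T \\ &= O_{q \times q}. \end{aligned}$$

This completes the proof of the lemma.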
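Since $R_j > 0$, Lemma 3.2 says that $\{z_j\}$ is an "orthogonal" sequence of nonzero vectors which we can normalize by setting

$$e_j = \|z_j\|_q^{-1} z_j. \tag{3.8}$$

Then $\{e_j\}$ is an "orthonormal" sequence in the sense that $\langle e_i, e_j \rangle = \delta_{ij} I_q$ for all $i$ and $j$. Furthermore, it should be clear that

$$Y(e_0, \ldots, e_k) = Y(v_0, \ldots, v_k) \tag{3.9}$$

(cf. Exercise 3.3).

3.3 Minimum Variance Estimates

We are now ready to give the minimum variance estimate $\check x_k$ of the state vector $x_k$ by introducing the "Fourier expansion"

$$\check x_k = \sum_{i=0}^{k} \langle x_k, e_i \rangle e_i \tag{3.10}$$

of $x_k$ with respect to the "orthonormal" sequence $\{e_j\}$. Since

$$\langle \check x_k, e_j \rangle = \sum_{i=0}^{k} \langle x_k, e_i \rangle \langle e_i, e_j \rangle = \langle x_k, e_j \rangle,$$

we have

$$\langle x_k - \check x_k, e_j \rangle = O_{n \times q}, \qquad j = 0, 1, \ldots, k. \tag{3.11}$$

It follows from Exercise 3.3 that

$$\langle x_k - \check x_k, v_j \rangle = O_{n \times q}, \qquad j = 0, 1, \ldots, k, \tag{3.12}$$

so that by Lemma 3.1,

$$\operatorname{tr}\|x_k - \check x_k\|_n^2 = \min\{\operatorname{tr}\|x_k - y\|_n^2 : \ y \in Y(v_0, \ldots, v_k)\}.$$

That is, $\check x_k$ is a minimum variance estimate of $x_k$.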
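3.4 Kalman Filtering Equations

This section is devoted to the derivation of the Kalman filtering equations. From Assumption 2.1, we first observe that

$$\langle \xi_{k-1}, e_j \rangle = O_{p \times q}, \qquad j = 0, 1, \ldots, k-1,$$

(cf. Exercise 3.4), so that

$$\begin{aligned} \check x_k &= \sum_{j=0}^{k} \langle x_k, e_j \rangle e_j \\ &= \sum_{j=0}^{k-1} \langle x_k, e_j \rangle e_j + \langle x_k, e_k \rangle e_k \\ &= \sum_{j=0}^{k-1} \{\langle A_{k-1} x_{k-1}, e_j \rangle e_j + \langle \Gamma_{k-1} \xi_{k-1}, e_j \rangle e_j\} + \langle x_k, e_k \rangle e_k \\ &= A_{k-1} \sum_{j=0}^{k-1} \langle x_{k-1}, e_j \rangle e_j + \langle x_k, e_k \rangle e_k \\ &= A_{k-1} \check x_{k-1} + \langle x_k, e_k \rangle e_k. \end{aligned}$$

Hence, by defining

$$\check x_{k|k-1} = A_{k-1} \check x_{k-1}, \tag{3.13}$$

where $\check x_{k-1} := \check x_{k-1|k-1}$, we have

$$\check x_k = \check x_{k|k} = \check x_{k|k-1} + \langle x_k, e_k \rangle e_k. \tag{3.14}$$

Obviously, if we can show that there exists a constant $n \times q$ matrix $\check G_k$ such that

$$\langle x_k, e_k \rangle e_k = \check G_k (v_k - C_k \check x_{k|k-1}),$$

then the "prediction-correction" formulation of the Kalman filter is obtained. To accomplish this, we consider the random vector $(v_k - C_k \check x_{k|k-1})$ and obtain the following:

Lemma 3.3. For $j = 0, 1, \ldots, k$,

$$\langle v_k - C_k \check x_{k|k-1}, e_j \rangle = \|z_k\|_q \delta_{kj}.$$

To prove the lemma, we first observe that

$$\langle \hat y_{k-1}, z_k \rangle = O_{n \times q} \tag{3.15}$$

(cf. Exercise 3.4). Hence, using (3.14), (3.11), and (3.15), we have

$$\begin{aligned} \langle v_k - C_k \check x_{k|k-1}, e_k \rangle &= \langle v_k - C_k (\check x_{k|k} - \langle x_k, e_k \rangle e_k), e_k \rangle \\ &= \langle v_k, e_k \rangle - C_k \{\langle \check x_{k|k}, e_k \rangle - \langle x_k, e_k \rangle\} \\ &= \langle v_k, e_k \rangle - C_k \langle \check x_{k|k} - x_k, e_k \rangle \\ &= \langle v_k, e_k \rangle \\ &= \langle z_k + C_k \hat y_{k-1}, \|z_k\|_q^{-1} z_k \rangle \\ &= \langle z_k, z_k \rangle \|z_k\|_q^{-1} + C_k \langle \hat y_{k-1}, z_k \rangle \|z_k\|_q^{-1} \\ &= \|z_k\|_q. \end{aligned}$$

On the other hand, using (3.14), (3.11), and (3.7), we have

$$\begin{aligned} \langle v_k - C_k \check x_{k|k-1}, e_j \rangle &= \langle C_k x_k + \eta_k - C_k (\check x_{k|k} - \langle x_k, e_k \rangle e_k), e_j \rangle \\ &= C_k \langle x_k - \check x_{k|k}, e_j \rangle + \langle \eta_k, e_j \rangle + C_k \langle x_k, e_k \rangle \langle e_k, e_j \rangle \\ &= O_{q \times q} \end{aligned}$$

for $j = 0, 1, \ldots, k-1$. This completes the proof of the lemma.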
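It is clear, by using Exercise 3.3 and the definition of $\check x_{k-1} = \check x_{k-1|k-1}$, that the random $q$-vector $(v_k - C_k \check x_{k|k-1})$ can be expressed as $\sum_{i=0}^{k} M_i e_i$ for some constant $q \times q$ matrices $M_i$. It follows now from Lemma 3.3 that for $j = 0, 1, \ldots, k$,

$$\Big\langle \sum_{i=0}^{k} M_i e_i, e_j \Big\rangle = \|z_k\|_q \delta_{kj},$$

so that $M_0 = M_1 = \cdots = M_{k-1} = 0$ and $M_k = \|z_k\|_q$. Hence,

$$v_k - C_k \check x_{k|k-1} = M_k e_k = \|z_k\|_q e_k.$$

Define

$$\check G_k = \langle x_k, e_k \rangle \|z_k\|_q^{-1}.$$

Then we obtain

$$\langle x_k, e_k \rangle e_k = \check G_k (v_k - C_k \check x_{k|k-1}).$$

This, together with (3.14), gives the "prediction-correction" equation:

$$\check x_{k|k} = \check x_{k|k-1} + \check G_k (v_k - C_k \check x_{k|k-1}). \tag{3.16}$$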
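We remark that $\check x_{k|k}$ is an unbiased estimate of $x_k$ by choosing an appropriate initial estimate. In fact,

$$x_k - \check x_{k|k} = A_{k-1} x_{k-1} + \Gamma_{k-1} \xi_{k-1} - A_{k-1} \check x_{k-1|k-1} - \check G_k (v_k - C_k A_{k-1} \check x_{k-1|k-1}).$$

Substituting $v_k = C_k x_k + \eta_k = C_k A_{k-1} x_{k-1} + C_k \Gamma_{k-1} \xi_{k-1} + \eta_k$, we obtain

$$x_k - \check x_{k|k} = (I - \check G_k C_k) A_{k-1} (x_{k-1} - \check x_{k-1|k-1}) + (I - \check G_k C_k) \Gamma_{k-1} \xi_{k-1} - \check G_k \eta_k. \tag{3.17}$$

Since the noise sequences are of zero-mean, we have

$$E(x_k - \check x_{k|k}) = (I - \check G_k C_k) A_{k-1} E(x_{k-1} - \check x_{k-1|k-1}),$$

so that

$$E(x_k - \check x_{k|k}) = (I - \check G_k C_k) A_{k-1} \cdots (I - \check G_1 C_1) A_0 E(x_0 - \check x_{0|0}).$$

Hence, if we set

$$\check x_{0|0} = E(x_0), \tag{3.18}$$

then $E(x_k - \check x_{k|k}) = 0$ or $E(\check x_{k|k}) = E(x_k)$ for all $k$, i.e., $\check x_{k|k}$ is indeed an unbiased estimate of $x_k$.

Now what is left is to derive a recursive formula for $\check G_k$. Using (3.12) and (3.17), we first have

$$\begin{aligned} 0 &= \langle x_k - \check x_{k|k}, v_k \rangle \\ &= \langle (I - \check G_k C_k) A_{k-1} (x_{k-1} - \check x_{k-1|k-1}) + (I - \check G_k C_k) \Gamma_{k-1} \xi_{k-1} - \check G_k \eta_k, \\ &\qquad C_k A_{k-1} ((x_{k-1} - \check x_{k-1|k-1}) + \check x_{k-1|k-1}) + C_k \Gamma_{k-1} \xi_{k-1} + \eta_k \rangle \\ &= (I - \check G_k C_k) A_{k-1} \|x_{k-1} - \check x_{k-1|k-1}\|_n^2 A_{k-1}^T C_k^T \\ &\quad + (I - \check G_k C_k) \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T C_k^T - \check G_k R_k, \end{aligned} \tag{3.19}$$

where we have used the facts that $\langle x_{k-1} - \check x_{k-1|k-1}, \check x_{k-1|k-1} \rangle = O_{n \times n}$, a consequence of Lemma 3.1, and

$$\begin{aligned} &\langle x_k, \xi_k \rangle = O_{n \times p}, \quad \langle \check x_{k|k}, \xi_k \rangle = O_{n \times p}, \\ &\langle x_k, \eta_j \rangle = O_{n \times q}, \quad \langle \check x_{k-1|k-1}, \eta_k \rangle = O_{n \times q}, \end{aligned} \tag{3.20}$$

$j = 0, \ldots, k$ (cf. Exercise 3.5). Define

$$P_{k,k} = \|x_k - \check x_{k|k}\|_n^2$$

and

$$P_{k,k-1} = \|x_k - \check x_{k|k-1}\|_n^2.$$

Then again by Exercise 3.5 we have

$$\begin{aligned} P_{k,k-1} &= \|A_{k-1} x_{k-1} + \Gamma_{k-1} \xi_{k-1} - A_{k-1} \check x_{k-1|k-1}\|_n^2 \\ &= A_{k-1} \|x_{k-1} - \check x_{k-1|k-1}\|_n^2 A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \end{aligned}$$

or

$$P_{k,k-1} = A_{k-1} P_{k-1,k-1} A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T. \tag{3.21}$$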
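On the other hand, from (3.19), we also obtain

$$(I - \check G_k C_k) A_{k-1} P_{k-1,k-1} A_{k-1}^T C_k^T + (I - \check G_k C_k) \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T C_k^T - \check G_k R_k = 0.$$

In solving for $\check G_k$ from this expression, we write

$$\begin{aligned} \check G_k [R_k + C_k (A_{k-1} P_{k-1,k-1} A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T) C_k^T] &= [A_{k-1} P_{k-1,k-1} A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T] C_k^T \\ &= P_{k,k-1} C_k^T \end{aligned}$$

and obtain

$$\check G_k = P_{k,k-1} C_k^T (R_k + C_k P_{k,k-1} C_k^T)^{-1}, \tag{3.22}$$

where $R_k$ is positive definite and $C_k P_{k,k-1} C_k^T$ is non-negative definite, so that their sum is positive definite (cf. Exercise 2.2). Next, we wish to write $P_{k,k}$ in terms of $P_{k,k-1}$, so that together with (3.21), we will have a recursive scheme. This can be done as follows:

$$\begin{aligned} P_{k,k} &= \|x_k - \check x_{k|k}\|_n^2 \\ &= \|x_k - (\check x_{k|k-1} + \check G_k (v_k - C_k \check x_{k|k-1}))\|_n^2 \\ &= \|x_k - \check x_{k|k-1} - \check G_k (C_k x_k + \eta_k) + \check G_k C_k \check x_{k|k-1}\|_n^2 \\ &= \|(I - \check G_k C_k)(x_k - \check x_{k|k-1}) - \check G_k \eta_k\|_n^2 \\ &= (I - \check G_k C_k) \|x_k - \check x_{k|k-1}\|_n^2 (I - \check G_k C_k)^T + \check G_k R_k \check G_k^T \\ &= (I - \check G_k C_k) P_{k,k-1} (I - \check G_k C_k)^T + \check G_k R_k \check G_k^T, \end{aligned}$$

where we have applied Exercise 3.5 to conclude that $\langle x_k - \check x_{k|k-1}, \eta_k \rangle = O_{n \times q}$. This relation can be further simplified by using (3.22). Indeed, since (3.22) gives $P_{k,k-1} C_k^T - \check G_k C_k P_{k,k-1} C_k^T = \check G_k R_k$, we have

$$(I - \check G_k C_k) P_{k,k-1} (\check G_k C_k)^T = (P_{k,k-1} C_k^T - \check G_k C_k P_{k,k-1} C_k^T) \check G_k^T = \check G_k R_k \check G_k^T,$$

and hence

$$\begin{aligned} P_{k,k} &= (I - \check G_k C_k) P_{k,k-1} (I - \check G_k C_k)^T + (I - \check G_k C_k) P_{k,k-1} (\check G_k C_k)^T \\ &= (I - \check G_k C_k) P_{k,k-1}. \end{aligned} \tag{3.23}$$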
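The simplification leading to (3.23) depends on the gain being the one given by (3.22). The scalar sketch below compares the longer expression (I - GC)P(I - GC)^T + GRG^T with the short form (I - GC)P, first for the gain (3.22) and then for an arbitrary gain. Function names and numerical values are hypothetical.

```python
def optimal_gain(p_pred, c, r):
    """Scalar form of (3.22)."""
    return p_pred * c / (r + c * p_pred * c)


def long_form(p_pred, c, r, g):
    """(1 - g c) P (1 - g c) + g R g, valid for any gain g."""
    return (1.0 - g * c) * p_pred * (1.0 - g * c) + g * r * g


def short_form(p_pred, c, g):
    """(1 - g c) P, which equals the long form only for the gain (3.22)."""
    return (1.0 - g * c) * p_pred


p_pred, c, r = 2.0, 0.5, 1.0        # hypothetical P_{k,k-1}, C_k, R_k
g_opt = optimal_gain(p_pred, c, r)
g_other = 0.3                       # an arbitrary, non-optimal gain

print(long_form(p_pred, c, r, g_opt), short_form(p_pred, c, g_opt))
print(long_form(p_pred, c, r, g_other), short_form(p_pred, c, g_other))
```

For the gain (3.22) the two expressions coincide. For any other gain they differ, and the long form, which is the actual error variance for that gain, is larger than the value attained with (3.22).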
Therefore, combining (3.13), (3.16), (3.18), (3.21), (3.22) and (3.23), together with

$$P_{0,0} = \|x_0 - \check x_{0|0}\|_n^2 = \operatorname{Var}(x_0), \tag{3.24}$$

we obtain the Kalman filtering equations which agree with the ones we derived in Chapter 2. That is, we have $\check x_{k|k} = \hat x_{k|k}$, $\check x_{k|k-1} = \hat x_{k|k-1}$ and $\check G_k = G_k$ as follows:

$$\begin{cases} P_{0,0} = \operatorname{Var}(x_0) \\ P_{k,k-1} = A_{k-1} P_{k-1,k-1} A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \\ G_k = P_{k,k-1} C_k^T (C_k P_{k,k-1} C_k^T + R_k)^{-1} \\ P_{k,k} = (I - G_k C_k) P_{k,k-1} \\ \hat x_{0|0} = E(x_0) \\ \hat x_{k|k-1} = A_{k-1} \hat x_{k-1|k-1} \\ \hat x_{k|k} = \hat x_{k|k-1} + G_k (v_k - C_k \hat x_{k|k-1}) \\ k = 1, 2, \ldots. \end{cases} \tag{3.25}$$

Of course, the Kalman filtering equations (2.18) derived in Section 2.4 for the general linear deterministic/stochastic system

$$\begin{cases} x_{k+1} = A_k x_k + B_k u_k + \Gamma_k \xi_k \\ v_k = C_k x_k + D_k u_k + \eta_k \end{cases}$$

can also be obtained without the assumption on the invertibility of the matrices $A_k$, $\operatorname{Var}(\bar\epsilon_{k,j})$, etc. (cf. Exercise 3.6).
3.5 Real-Time Tracking

To illustrate the application of the Kalman filtering algorithm described by (3.25), let us consider an example of real-time tracking. Let $x(t)$, $0 \leq t < \infty$, denote the trajectory in three-dimensional space of a flying object, where $t$ denotes the time variable (cf. Fig. 3.1). This vector-valued function is discretized by sampling and quantizing with sampling time $h > 0$ to yield

$$x_k \doteq x(kh), \qquad k = 0, 1, \ldots.$$

Fig. 3.1.
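For practical purposes, $x(t)$ can be assumed to have continuous first and second order derivatives, denoted by $\dot x(t)$ and $\ddot x(t)$, respectively, so that for small values of $h$, the position and velocity vectors $x_k$ and $\dot x_k \doteq \dot x(kh)$ are governed by the equations

$$\begin{cases} x_{k+1} = x_k + h \dot x_k + \tfrac{1}{2} h^2 \ddot x_k \\ \dot x_{k+1} = \dot x_k + h \ddot x_k, \end{cases}$$

where $\ddot x_k \doteq \ddot x(kh)$ and $k = 0, 1, \ldots$. In addition, in many applications only the position (vector) of the flying object is observed at each time instant, so that $v_k = C x_k$ with $C = [\,I \ 0 \ 0\,]$ is measured. In view of Exercise 3.8, to facilitate our discussion we only consider the tracking model

$$\begin{cases} x_{k+1} = \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix} x_k + \xi_k \\ v_k = [\,1 \ 0 \ 0\,]\, x_k + \eta_k. \end{cases} \tag{3.26}$$

Here, $\{\xi_k\}$ and $\{\eta_k\}$ are assumed to be zero-mean Gaussian white noise sequences satisfying:

$$\begin{aligned} &E(\xi_k) = 0, \quad E(\eta_k) = 0, \\ &E(\xi_k \xi_\ell^T) = Q_k \delta_{k\ell}, \quad E(\eta_k \eta_\ell) = r_k \delta_{k\ell}, \\ &E(x_0 \xi_k^T) = 0, \quad E(x_0 \eta_k) = 0, \end{aligned}$$

where $Q_k$ is a non-negative definite symmetric matrix and $r_k > 0$ for all $k$. It is further assumed that initial conditions $E(x_0)$ and $\operatorname{Var}(x_0)$ are given. For this tracking model, the Kalman filtering algorithm can be specified as follows: Let $P_k := P_{k,k}$ and let $P[i,j]$ denote the $(i,j)$th entry of $P$. Then we have

$$\begin{aligned} P_{k,k-1}[1,1] &= P_{k-1}[1,1] + 2h P_{k-1}[1,2] + h^2 P_{k-1}[1,3] + h^2 P_{k-1}[2,2] \\ &\quad + h^3 P_{k-1}[2,3] + \frac{h^4}{4} P_{k-1}[3,3] + Q_{k-1}[1,1], \\ P_{k,k-1}[1,2] &= P_{k,k-1}[2,1] \\ &= P_{k-1}[1,2] + h P_{k-1}[1,3] + h P_{k-1}[2,2] + \frac{3h^2}{2} P_{k-1}[2,3] \\ &\quad + \frac{h^3}{2} P_{k-1}[3,3] + Q_{k-1}[1,2], \\ P_{k,k-1}[2,2] &= P_{k-1}[2,2] + 2h P_{k-1}[2,3] + h^2 P_{k-1}[3,3] + Q_{k-1}[2,2], \\ P_{k,k-1}[1,3] &= P_{k,k-1}[3,1] \\ &= P_{k-1}[1,3] + h P_{k-1}[2,3] + \frac{h^2}{2} P_{k-1}[3,3] + Q_{k-1}[1,3], \\ P_{k,k-1}[2,3] &= P_{k,k-1}[3,2] = P_{k-1}[2,3] + h P_{k-1}[3,3] + Q_{k-1}[2,3], \\ P_{k,k-1}[3,3] &= P_{k-1}[3,3] + Q_{k-1}[3,3], \end{aligned}$$

with $P_{0,0} = \operatorname{Var}(x_0)$, and the remaining equations of (3.25), specialized to $C = [\,1 \ 0 \ 0\,]$, become

$$\begin{aligned} G_k &= \frac{1}{P_{k,k-1}[1,1] + r_k} \begin{bmatrix} P_{k,k-1}[1,1] \\ P_{k,k-1}[1,2] \\ P_{k,k-1}[1,3] \end{bmatrix}, \\ P_k &= P_{k,k-1} - G_k [\,P_{k,k-1}[1,1] \ \ P_{k,k-1}[1,2] \ \ P_{k,k-1}[1,3]\,], \\ \hat x_{k|k-1} &= A \hat x_{k-1|k-1}, \\ \hat x_{k|k} &= \hat x_{k|k-1} + G_k (v_k - [\,1 \ 0 \ 0\,]\, \hat x_{k|k-1}), \end{aligned} \tag{3.27}$$

where $A$ denotes the transition matrix in (3.26), with $\hat x_{0|0} = E(x_0)$.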
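A minimal sketch of one way to code this tracking filter is given below. It assumes the model that the entrywise formulas for P_{k,k-1} imply, namely a transition matrix with rows [1, h, h^2/2], [0, 1, h], [0, 0, 1], noise entering through the identity, and a scalar position measurement with C = [1 0 0], so that the matrix inverse in the gain reduces to a scalar division. Function names, noise levels, and measurement values are hypothetical.

```python
def transition(h):
    """Transition matrix of the assumed position/velocity/acceleration model."""
    return [[1.0, h, 0.5 * h * h],
            [0.0, 1.0, h],
            [0.0, 0.0, 1.0]]


def predict_covariance(P, Q, h):
    """P_{k,k-1} = A P_{k-1} A^T + Q_{k-1}."""
    A = transition(h)
    AP = [[sum(A[i][m] * P[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    return [[sum(AP[i][m] * A[j][m] for m in range(3)) + Q[i][j] for j in range(3)]
            for i in range(3)]


def track_step(x_hat, P, v, Q, r, h):
    """One prediction-correction step for a scalar position measurement v."""
    A = transition(h)
    P_pred = predict_covariance(P, Q, h)
    x_pred = [sum(A[i][m] * x_hat[m] for m in range(3)) for i in range(3)]
    denom = P_pred[0][0] + r                       # C P C^T + r with C = [1 0 0]
    G = [P_pred[i][0] / denom for i in range(3)]   # gain: first column of P_pred, scaled
    residual = v - x_pred[0]
    x_new = [x_pred[i] + G[i] * residual for i in range(3)]
    P_new = [[P_pred[i][j] - G[i] * P_pred[0][j] for j in range(3)] for i in range(3)]
    return x_new, P_new


h, r = 0.5, 0.2                                               # hypothetical values
Q = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
P0 = [[4.0, 1.0, 0.5], [1.0, 2.0, 0.3], [0.5, 0.3, 1.0]]      # Var(x_0)
P_pred = predict_covariance(P0, Q, h)

x_hat, P = [0.0, 0.0, 0.0], P0                                # E(x_0) = 0
for v in [0.1, 0.4, 0.9, 1.6]:                                # hypothetical measurements
    x_hat, P = track_step(x_hat, P, v, Q, r, h)
print(x_hat)
print(P[0][0])
```

The generic product A P A^T + Q can be compared entry by entry with the expanded formulas for P_{k,k-1}, which is a useful check when the expanded formulas are coded by hand for speed.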
Exercises 3.1. Let A =f. 0 be a non-negative definite and symmetric constant matrix. Show that trA > o. (Hint: Decompose A as A = BB T with B =f. 0.) 3.2. Let j-I
ej
=
Cj(Xj -
Yj-I) = C j
(Xi -
L
Pi-l,iVi) ,
'1.=0
where Pj-I,i are some constant matrices. Use Assumption 2.1 to show that for all /!, ? j. 3.3. For random vectors
define
WO,"', W r ,
Y(Wo,"', w r ) r
Po, ... 'Pr' constant matrices}.
y= LPiWi, i=O
Let
j-I
Zj
= Vj
-
L
Cj
Pj-l,iVi
i=O
be defined as in (3.4) and
ej
3.4. Let
=
Ilzjll-lzj'
Show that
j-I
Yj-I
=L
Pj-l,iVi
i=O
and
j-I Zj
=
Vj -
Cj
L
Pj-l,iVi .
i=O
Show that j
= 0,1," . ,k - 1.
46
3. Orthogonal Projection and Kalman Filter
3.5. Let
ej
be defined as in Exercise 3.3. Also define k
Xk == L(Xk, ei)ei i=O
as in (3.10). Show that
(Xk, -J "l.) == Onxq ,
for $j = 0, 1, \cdots, k$.
3.6. Consider the linear deterministic/stochastic system
$$\begin{cases} x_{k+1} = A_kx_k + B_ku_k + \Gamma_k\underline{\xi}_k \\ v_k = C_kx_k + D_ku_k + \underline{\eta}_k , \end{cases}$$
where $\{u_k\}$ is a given sequence of deterministic control input $m$-vectors, $1 \le m \le n$. Suppose that Assumption 2.1 is satisfied. Derive the Kalman filtering algorithm for this model.
3.7. Consider a simplified radar tracking model where a large-amplitude and narrow-width impulse signal is transmitted by an antenna. The impulse signal propagates at the speed of light $c$, and is reflected by a flying object being tracked. The radar antenna receives the reflected signal so that a time-difference $\Delta t$ is obtained. The range (or distance) $d$ from the radar to the object is then given by $d = c\Delta t/2$. The impulse signal is transmitted periodically with period $h$. Assume that the object is traveling at a constant velocity $w$ with random disturbance $\xi \sim N(0, q)$, so that the range $d$ satisfies the difference equation
$$d_{k+1} = d_k + h(w_k + \xi_k) .$$
Suppose also that the measured range using the formula $d = c\Delta t/2$ has an inherent error $\Delta d$ and is contaminated with noise $\eta$ where $\eta \sim N(0, r)$, so that
$$v_k = d_k + \eta_k + \Delta d .$$
Assume that the initial target range is $d_0$ which is independent of $\xi_k$ and $\eta_k$, and that $\{\xi_k\}$ and $\{\eta_k\}$ are also independent (cf. Fig. 3.2). Derive a Kalman filtering algorithm as a range-estimator for this radar tracking system.
Fig. 3.2.
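3.8. A linear stochastic system for radar tracking can be described as follows. Let $\Sigma$, $\Delta A$, $\Delta E$ be the range, the azimuthal angular error, and the elevational angular error, respectively, of the target, with the radar being located at the origin (cf. Fig. 3.3). Consider $\Sigma$, $\Delta A$, and $\Delta E$ as functions of time with first and second derivatives denoted by $\dot{\Sigma}$, $\Delta\dot{A}$, $\Delta\dot{E}$, $\ddot{\Sigma}$, $\Delta\ddot{A}$, $\Delta\ddot{E}$, respectively. Let $h > 0$ be the sampling time unit and set $\Sigma_k = \Sigma(kh)$, $\dot{\Sigma}_k = \dot{\Sigma}(kh)$, $\ddot{\Sigma}_k = \ddot{\Sigma}(kh)$, etc. Then, using the second degree Taylor polynomial approximation, the radar tracking model takes on the following linear stochastic state-space description:
$$\begin{cases} x_{k+1} = Ax_k + \Gamma_k\underline{\xi}_k \\ v_k = Cx_k + \underline{\eta}_k , \end{cases}$$
where
$$x_k = [\,\Sigma_k \;\; \dot{\Sigma}_k \;\; \ddot{\Sigma}_k \;\; \Delta A_k \;\; \Delta\dot{A}_k \;\; \Delta\ddot{A}_k \;\; \Delta E_k \;\; \Delta\dot{E}_k \;\; \Delta\ddot{E}_k\,]^T ,$$
$$A = \begin{bmatrix}
1 & h & h^2/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & h & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & h & h^2/2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & h & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & h & h^2/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & h \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},$$
$$C = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{bmatrix},$$
and $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ are independent zero-mean Gaussian white noise sequences with $\mathrm{Var}(\underline{\xi}_k) = Q_k$ and $\mathrm{Var}(\underline{\eta}_k) = R_k$. Assume that
$$\Gamma_k = \begin{bmatrix} \Gamma_k^1 & & \\ & \Gamma_k^2 & \\ & & \Gamma_k^3 \end{bmatrix}, \quad
Q_k = \begin{bmatrix} Q_k^1 & & \\ & Q_k^2 & \\ & & Q_k^3 \end{bmatrix}, \quad
R_k = \begin{bmatrix} R_k^1 & & \\ & R_k^2 & \\ & & R_k^3 \end{bmatrix},$$
where $\Gamma_k^i$ are $3 \times 3$ submatrices, $Q_k^i$, $3 \times 3$ non-negative definite symmetric submatrices, and $R_k^i$, $3 \times 3$ positive definite symmetric submatrices, for $i = 1, 2, 3$. Show that this system can be decoupled into three subsystems with analogous state-space descriptions.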
Fig. 3.3.
4. Correlated System and Measurement Noise Processes
In the previous two chapters, Kalman filtering for the model involving uncorrelated system and measurement noise processes was studied. That is, we have assumed all along that
$$E(\underline{\xi}_k\underline{\eta}_\ell^T) = 0$$
for $k, \ell = 0, 1, \cdots$. However, in applications such as aircraft inertial navigation systems, where vibration of the aircraft induces a common source of noise for both the dynamic driving system and onboard radar measurement, the system and measurement noise sequences $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ are correlated in the statistical sense, with
$$E(\underline{\xi}_k\underline{\eta}_\ell^T) = S_k\delta_{k\ell} , \qquad k, \ell = 0, 1, \cdots ,$$
where each $S_k$ is a known non-negative definite matrix. This chapter is devoted to the study of Kalman filtering for the above model.
4.1 The Affine Model
Consider the linear stochastic state-space description
$$\begin{cases} x_{k+1} = A_kx_k + \Gamma_k\underline{\xi}_k \\ v_k = C_kx_k + \underline{\eta}_k \end{cases}$$
with initial state $x_0$, where $A_k$, $C_k$ and $\Gamma_k$ are known constant matrices. We start with least-squares estimation as discussed in Section 1.3. Recall that least-squares estimates are linear functions of the data vectors; that is, if $\hat{x}$ is the least-squares estimate of the state vector $x$ using the data $v$, then it follows that $\hat{x} = Hv$ for some matrix $H$. To study Kalman filtering with correlated system and measurement noise processes, it is necessary to extend to a more general model in determining the estimator $\hat{x}$. It turns out that the affine model
$$\hat{x} = h + Hv \tag{4.1}$$
which provides an extra parameter vector $h$ is sufficient. Here, $h$ is some constant $n$-vector and $H$ some constant $n \times q$ matrix. Of course, the requirements on our optimal estimate $\hat{x}$ of $x$ are: $\hat{x}$ is an unbiased estimator of $x$ in the sense that
$$E(\hat{x}) = E(x) \tag{4.2}$$
and the estimate is of minimum (error) variance. From (4.1) it follows that
$$h = E(h) = E(\hat{x} - Hv) = E(\hat{x}) - H(E(v)) .$$
Hence, to satisfy the requirement (4.2), we must have
$$h = E(x) - HE(v) , \tag{4.3}$$
or, equivalently,
$$\hat{x} = E(x) - H(E(v) - v) . \tag{4.4}$$
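On the other hand, to satisfy the minimum variance requirement, we use the notation
$$F(H) = \mathrm{Var}(x - \hat{x}) = \|x - \hat{x}\|_n^2 ,$$
so that by (4.4) and the fact that $\|v\|_q^2 = \mathrm{Var}(v)$ is positive definite, we obtain
$$\begin{aligned}
F(H) &= \langle x - \hat{x}, x - \hat{x} \rangle \\
&= \langle (x - E(x)) - H(v - E(v)), (x - E(x)) - H(v - E(v)) \rangle \\
&= \|x\|_n^2 - H\langle v, x \rangle - \langle x, v \rangle H^T + H\|v\|_q^2H^T \\
&= \{\|x\|_n^2 - \langle x, v \rangle[\|v\|_q^2]^{-1}\langle v, x \rangle\} + \{H\|v\|_q^2H^T - H\langle v, x \rangle - \langle x, v \rangle H^T + \langle x, v \rangle[\|v\|_q^2]^{-1}\langle v, x \rangle\} \\
&= \{\|x\|_n^2 - \langle x, v \rangle[\|v\|_q^2]^{-1}\langle v, x \rangle\} + [H - \langle x, v \rangle[\|v\|_q^2]^{-1}]\,\|v\|_q^2\,[H - \langle x, v \rangle[\|v\|_q^2]^{-1}]^T ,
\end{aligned}$$
where the facts that $\langle x, v \rangle^T = \langle v, x \rangle$ and that $\mathrm{Var}(v)$ is nonsingular have been used. Recall that minimum variance estimation means the existence of $H^*$ such that $F(H) \ge F(H^*)$, or $F(H) - F(H^*)$ is non-negative definite, for all constant matrices $H$. This can be attained by simply setting
$$H^* = \langle x, v \rangle[\|v\|_q^2]^{-1} , \tag{4.5}$$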
so that
$$F(H) - F(H^*) = [H - \langle x, v \rangle[\|v\|_q^2]^{-1}]\,\|v\|_q^2\,[H - \langle x, v \rangle[\|v\|_q^2]^{-1}]^T ,$$
which is non-negative definite for all constant matrices $H$. Furthermore, $H^*$ is unique in the sense that $F(H) - F(H^*) = 0$ if and only if $H = H^*$. Hence, we can conclude that $\hat{x}$ can be uniquely expressed as
$$\hat{x} = h + H^*v ,$$
where $H^*$ is given by (4.5). We will also use the notation $\hat{x} = L(x, v)$ for the optimal estimate of $x$ with data $v$, so that by using (4.4) and (4.5), it follows that this "optimal estimate operator" satisfies:
$$L(x, v) = E(x) + \langle x, v \rangle[\|v\|_q^2]^{-1}(v - E(v)) . \tag{4.6}$$
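A small numerical illustration of (4.5) and (4.6) may help. The sketch below, in Python with NumPy, assumes the first and second moments of $x$ and $v$ are known; the function names `optimal_estimate` and `error_variance` and the scalar numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def optimal_estimate(mean_x, mean_v, cov_xv, var_v, v):
    """L(x, v) = E(x) + <x, v> [||v||^2]^{-1} (v - E(v)), cf. (4.6)."""
    H_star = cov_xv @ np.linalg.inv(var_v)        # optimal H* of (4.5)
    return mean_x + H_star @ (v - mean_v)

def error_variance(H, var_x, cov_xv, var_v):
    """F(H) = Var(x) - H<v,x> - <x,v>H^T + H Var(v) H^T."""
    return var_x - H @ cov_xv.T - cov_xv @ H.T + H @ var_v @ H.T

# Scalar example: x has variance 4 and v = x + noise with noise variance 1,
# so that <x, v> = 4 and Var(v) = 5.
var_x = np.array([[4.0]])
var_v = np.array([[5.0]])
cov_xv = np.array([[4.0]])
mean_x = np.array([1.0])
mean_v = np.array([1.0])

x_hat = optimal_estimate(mean_x, mean_v, cov_xv, var_v, np.array([3.0]))
H_star = cov_xv @ np.linalg.inv(var_v)
F_star = error_variance(H_star, var_x, cov_xv, var_v)
F_other = error_variance(np.array([[0.5]]), var_x, cov_xv, var_v)
print(x_hat, F_star, F_other)
```

Here $H^* = 4/5$, so the observation $v = 3$ moves the estimate from the prior mean 1 to $1 + 0.8 \cdot 2 = 2.6$, and any other choice of $H$ (such as 0.5) gives a larger error variance.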
4.2 Optimal Estimate Operators
First, we remark that for any fixed data vector $v$, $L(\cdot, v)$ is a linear operator in the sense that
$$L(Ax + By, v) = AL(x, v) + BL(y, v) \tag{4.7}$$
for all constant matrices $A$ and $B$ and state vectors $x$ and $y$ (cf. Exercise 4.1). In addition, if the state vector is a constant vector $a$, then
$$L(a, v) = a \tag{4.8}$$
(cf. Exercise 4.2). This means that if $x$ is a constant vector, so that $E(x) = x$, then $\hat{x} = x$, or the estimate is exact. We need some additional properties of $L(x, v)$. For this purpose we first establish the following.
Lemma 4.1. Let $v$ be a given data vector and $y = h + Hv$, where $h$ is determined by the condition $E(y) = E(x)$, so that $y$ is uniquely determined by the constant matrix $H$. If $x^*$ is one of the $y$'s such that
$$\operatorname{tr}\|x - x^*\|_n^2 = \min_H \operatorname{tr}\|x - y\|_n^2 ,$$
then it follows that $x^* = \hat{x}$, where $\hat{x} = L(x, v)$ is given by (4.6).
This lemma says that the minimum variance estimate $\hat{x}$ and the "minimum trace variance" estimate $x^*$ of $x$ from the same data $v$ are identical over all affine models. To prove the lemma, let us consider
$$\begin{aligned}
\operatorname{tr}\|x - y\|_n^2 &= \operatorname{tr}E((x - y)(x - y)^T) \\
&= E((x - y)^T(x - y)) \\
&= E\big(((x - E(x)) - H(v - E(v)))^T((x - E(x)) - H(v - E(v)))\big) ,
\end{aligned}$$
where (4.3) has been used. Taking
$$\frac{\partial}{\partial H}\big(\operatorname{tr}\|x - y\|_n^2\big) = 0 ,$$
we arrive at
$$x^* = E(x) - \langle x, v \rangle[\|v\|_q^2]^{-1}(E(v) - v) \tag{4.9}$$
which is the same as the $\hat{x}$ given in (4.6) (cf. Exercise 4.3). This completes the proof of the Lemma.
4.3 Effect on Optimal Estimation with Additional Data
Now, let us recall from Lemma 3.1 in the previous chapter that $\hat{y} \in Y = Y(w_0, \cdots, w_r)$ satisfies
$$\operatorname{tr}\|x - \hat{y}\|_n^2 = \min_{y \in Y}\operatorname{tr}\|x - y\|_n^2$$
if and only if
$$\langle x - \hat{y}, w_j \rangle = O_{n \times q} , \qquad j = 0, 1, \cdots, r .$$
Set $Y = Y(v - E(v))$ and
$$\hat{x} = L(x, v) = E(x) + H^*(v - E(v)) ,$$
where $H^* = \langle x, v \rangle[\|v\|_q^2]^{-1}$. If we use the notation
$$\tilde{x} = x - E(x) \quad \text{and} \quad \tilde{v} = v - E(v) ,$$
then we obtain
$$\|x - \hat{x}\|_n^2 = \|(x - E(x)) - H^*(v - E(v))\|_n^2 = \|\tilde{x} - H^*\tilde{v}\|_n^2 .$$
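But $H^*$ was chosen such that $F(H^*) \le F(H)$ for all $H$, and this implies that $\operatorname{tr}F(H^*) \le \operatorname{tr}F(H)$ for all $H$. Hence, it follows that
$$\operatorname{tr}\|\tilde{x} - H^*\tilde{v}\|_n^2 \le \operatorname{tr}\|\tilde{x} - y\|_n^2$$
for all $y \in Y(v - E(v)) = Y(\tilde{v})$. By Lemma 3.1, we have
$$\langle \tilde{x} - H^*\tilde{v}, \tilde{v} \rangle = O_{n \times q} .$$
Since $E(v)$ is a constant, $\langle \tilde{x} - H^*\tilde{v}, E(v) \rangle = O_{n \times q}$, so that
$$\langle \tilde{x} - H^*\tilde{v}, v \rangle = O_{n \times q} ,$$
or
$$\langle x - \hat{x}, v \rangle = O_{n \times q} . \tag{4.10}$$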
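Consider two random data vectors $v^1$ and $v^2$ and set
$$x^\# = x - L(x, v^1) , \qquad v^{2\#} = v^2 - L(v^2, v^1) . \tag{4.11}$$
Then from (4.10) and the definition of the optimal estimate operator $L$, we have
$$\langle x^\#, v^1 \rangle = 0 , \tag{4.12}$$
and similarly,
$$\langle v^{2\#}, v^1 \rangle = 0 . \tag{4.13}$$
The following lemma is essential for further investigation.
Lemma 4.2. Let $x$ be a state vector and $v^1$, $v^2$ be random observation data vectors with nonzero finite variances. Set
$$v = \begin{bmatrix} v^1 \\ v^2 \end{bmatrix} .$$
Then the minimum variance estimate $\hat{x}$ of $x$ using the data $v$ can be approximated by the minimum variance estimate $L(x, v^1)$ of $x$ using the data $v^1$ in the sense that
$$\hat{x} = L(x, v) = L(x, v^1) + e(x, v^2) \tag{4.14}$$
with the error
$$e(x, v^2) := L(x^\#, v^{2\#}) = \langle x^\#, v^{2\#} \rangle[\|v^{2\#}\|^2]^{-1}v^{2\#} . \tag{4.15}$$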
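Before turning to the proof, the content of Lemma 4.2 can be checked numerically at the level of the gain matrices. The sketch below (Python with NumPy) draws an arbitrary joint covariance for $(x, v^1, v^2)$ and compares the one-shot gain $\langle x, v \rangle[\|v\|^2]^{-1}$ with the two-stage gain $[H_1 - H_2H_3 \;\; H_2]$ that appears in the proof. The dimensions, the random seed, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q1, q2 = 2, 2, 1                       # sizes of x, v1, v2 (illustrative)
d = n + q1 + q2
B = rng.standard_normal((d, d))
S = B @ B.T + np.eye(d)                   # joint covariance of (x, v1, v2), positive definite

ix, i1, i2 = slice(0, n), slice(n, n + q1), slice(n + q1, d)

# One-shot estimate from the stacked data v = [v1; v2]: gain <x, v>[||v||^2]^{-1}.
H_joint = S[ix, n:] @ np.linalg.inv(S[n:, n:])

# Two-stage estimate: first use v1, then correct with the residual v2# = v2 - L(v2, v1).
H1 = S[ix, i1] @ np.linalg.inv(S[i1, i1])          # gain of L(x, v1)
H3 = S[i2, i1] @ np.linalg.inv(S[i1, i1])          # gain of L(v2, v1)
cov_xs_v2s = S[ix, i2] - H1 @ S[i1, i2]            # <x#, v2#>
var_v2s = S[i2, i2] - H3 @ S[i1, i2]               # ||v2#||^2
H2 = cov_xs_v2s @ np.linalg.inv(var_v2s)           # gain of the error term (4.15)
H_two_stage = np.hstack([H1 - H2 @ H3, H2])

print(np.max(np.abs(H_joint - H_two_stage)))       # should be at rounding level
```

Since both estimates are unbiased affine functions of $v$, agreement of the two gain matrices is equivalent to (4.14).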
We first verify (4.15). Since $L(x, v^1)$ is an unbiased estimate of $x$ (cf. (4.6)),
$$E(x^\#) = E(x - L(x, v^1)) = 0 .$$
Similarly, $E(v^{2\#}) = 0$. Hence, by (4.6), we have
$$L(x^\#, v^{2\#}) = E(x^\#) + \langle x^\#, v^{2\#} \rangle[\|v^{2\#}\|^2]^{-1}(v^{2\#} - E(v^{2\#})) = \langle x^\#, v^{2\#} \rangle[\|v^{2\#}\|^2]^{-1}v^{2\#} ,$$
yielding (4.15). To prove (4.14), it is equivalent to showing that
$$x^0 := L(x, v^1) + L(x^\#, v^{2\#})$$
is an affine unbiased minimum variance estimate of $x$ from the data $v$, so that by the uniqueness of $\hat{x}$, $x^0 = \hat{x} = L(x, v)$. First, note that
$$\begin{aligned}
x^0 &= L(x, v^1) + L(x^\#, v^{2\#}) \\
&= (h_1 + H_1v^1) + (h_2 + H_2(v^2 - L(v^2, v^1))) \\
&= (h_1 + H_1v^1) + h_2 + H_2(v^2 - (h_3 + H_3v^1)) \\
&= (h_1 + h_2 - H_2h_3) + H\begin{bmatrix} v^1 \\ v^2 \end{bmatrix} \\
&:= h + Hv ,
\end{aligned}$$
where $H = [\,H_1 - H_2H_3 \;\; H_2\,]$. Hence, $x^0$ is an affine transformation of $v$. Next, since $E(L(x, v^1)) = E(x)$ and $E(L(x^\#, v^{2\#})) = E(x^\#) = 0$, we have
$$E(x^0) = E(L(x, v^1)) + E(L(x^\#, v^{2\#})) = E(x) .$$
Hence, $x^0$ is an unbiased estimate of $x$. Finally, to prove that $x^0$ is a minimum variance estimate of $x$, we note that by using Lemmas 4.1 and 3.1, it is sufficient to establish the orthogonality property
$$\langle x - x^0, v \rangle = O_{n \times q} .$$
This can be done as follows. By (4.15), (4.11), (4.12), and (4.13), we have
$$\begin{aligned}
\langle x - x^0, v \rangle &= \langle x^\# - \langle x^\#, v^{2\#} \rangle[\|v^{2\#}\|^2]^{-1}v^{2\#}, v \rangle \\
&= \Big\langle x^\#, \begin{bmatrix} v^1 \\ v^2 \end{bmatrix} \Big\rangle - \langle x^\#, v^{2\#} \rangle[\|v^{2\#}\|^2]^{-1}\Big\langle v^{2\#}, \begin{bmatrix} v^1 \\ v^2 \end{bmatrix} \Big\rangle \\
&= \langle x^\#, v^2 \rangle - \langle x^\#, v^{2\#} \rangle[\|v^{2\#}\|^2]^{-1}\langle v^{2\#}, v^2 \rangle .
\end{aligned}$$
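But since $v^2 = v^{2\#} + L(v^2, v^1)$, it follows that
$$\langle v^{2\#}, v^2 \rangle = \langle v^{2\#}, v^{2\#} \rangle + \langle v^{2\#}, L(v^2, v^1) \rangle$$
from which, by using (4.6), (4.13), and the fact that $\langle v^{2\#}, E(v^1) \rangle = \langle v^{2\#}, E(v^2) \rangle = 0$, we arrive at
$$\begin{aligned}
\langle v^{2\#}, L(v^2, v^1) \rangle &= \langle v^{2\#}, E(v^2) + \langle v^2, v^1 \rangle[\|v^1\|^2]^{-1}(v^1 - E(v^1)) \rangle \\
&= (\langle v^{2\#}, v^1 \rangle - \langle v^{2\#}, E(v^1) \rangle)[\|v^1\|^2]^{-1}\langle v^1, v^2 \rangle \\
&= 0 ,
\end{aligned}$$
so that
$$\langle v^{2\#}, v^2 \rangle = \langle v^{2\#}, v^{2\#} \rangle .$$
Similarly, we also have
$$\langle x^\#, v^2 \rangle = \langle x^\#, v^{2\#} \rangle .$$
Hence, indeed, we have the orthogonality property:
$$\langle x - x^0, v \rangle = \langle x^\#, v^{2\#} \rangle - \langle x^\#, v^{2\#} \rangle[\|v^{2\#}\|^2]^{-1}\langle v^{2\#}, v^{2\#} \rangle = O_{n \times q} .$$
This completes the proof of the Lemma.
4.4 Derivation of Kalman Filtering Equations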
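We are now ready to study Kalman filtering with correlated system and measurement noises. Let us again consider the linear stochastic system described by
$$\begin{cases} x_{k+1} = A_kx_k + \Gamma_k\underline{\xi}_k \\ v_k = C_kx_k + \underline{\eta}_k \end{cases} \tag{4.16}$$
with initial state $x_0$, where $A_k$, $C_k$ and $\Gamma_k$ are known constant matrices. We will adopt Assumption 2.1 here with the exception that the two noise sequences $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ may be correlated, namely: we assume that $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ are zero-mean Gaussian white noise sequences satisfying
$$\begin{aligned}
E(\underline{\xi}_kx_0^T) &= O_{p \times n} , & E(\underline{\eta}_kx_0^T) &= O_{q \times n} , \\
E(\underline{\xi}_k\underline{\xi}_\ell^T) &= Q_k\delta_{k\ell} , & E(\underline{\eta}_k\underline{\eta}_\ell^T) &= R_k\delta_{k\ell} , \\
E(\underline{\xi}_k\underline{\eta}_\ell^T) &= S_k\delta_{k\ell} ,
\end{aligned}$$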
where $Q_k$, $R_k$ are, respectively, known non-negative definite and positive definite matrices and $S_k$ is a known, but not necessarily zero, non-negative definite matrix. The problem is to determine the optimal estimate $\hat{x}_k = \hat{x}_{k|k}$ of the state vector $x_k$ from the data vectors $v_0, v_1, \cdots, v_k$, using the initial information $E(x_0)$ and $\mathrm{Var}(x_0)$. We have the following result.
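Theorem 4.1. The optimal estimate $\hat{x}_k = \hat{x}_{k|k}$ of $x_k$ from the data $v_0, v_1, \cdots, v_k$ can be computed recursively as follows: Define
$$P_{0,0} = \mathrm{Var}(x_0) .$$
Then, for $k = 1, 2, \cdots$, compute
$$P_{k,k-1} = (A_{k-1} - K_{k-1}C_{k-1})P_{k-1,k-1}(A_{k-1} - K_{k-1}C_{k-1})^T + \Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^T - K_{k-1}R_{k-1}K_{k-1}^T , \tag{a}$$
where
$$K_{k-1} = \Gamma_{k-1}S_{k-1}R_{k-1}^{-1} , \tag{b}$$
and the Kalman gain matrix
$$G_k = P_{k,k-1}C_k^T(C_kP_{k,k-1}C_k^T + R_k)^{-1} \tag{c}$$
with
$$P_{k,k} = (I - G_kC_k)P_{k,k-1} . \tag{d}$$
Then, with the initial condition
$$\hat{x}_{0|0} = E(x_0) ,$$
compute, for $k = 1, 2, \cdots$, the prediction estimates
$$\hat{x}_{k|k-1} = A_{k-1}\hat{x}_{k-1|k-1} + K_{k-1}(v_{k-1} - C_{k-1}\hat{x}_{k-1|k-1}) \tag{e}$$
and the correction estimates
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + G_k(v_k - C_k\hat{x}_{k|k-1}) \tag{f}$$
(cf. Fig. 4.1).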
These are the Kalman filtering equations for correlated system and measurement noise processes. We remark that if the system noise and measurement noise are uncorrelated; that is, $S_{k-1} = O_{p \times q}$, so that $K_{k-1} = O_{n \times q}$ for all $k = 1, 2, \cdots$, then the above Kalman filtering equations reduce to the ones discussed in Chapters 2 and 3.
Fig. 4.1.
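The recursion of Theorem 4.1 translates almost line by line into code. The following Python sketch (NumPy assumed) implements one step for a time-invariant model; the function name `correlated_kf_step`, the choice of a time-invariant system, and the numerical values in the usage part are assumptions made for illustration and are not part of the theorem.

```python
import numpy as np

def correlated_kf_step(x_prev, P_prev, v_prev, v, A, Gam, C, Q, R, S):
    """One recursion of Theorem 4.1 (time-invariant matrices for simplicity).

    x_prev, P_prev : estimate and error covariance at time k-1
    v_prev, v      : data vectors at times k-1 and k
    S              : cross-covariance E(xi_k eta_k^T) of system and measurement noise
    """
    K = Gam @ S @ np.linalg.inv(R)                                   # (b)
    AK = A - K @ C
    P_pred = AK @ P_prev @ AK.T + Gam @ Q @ Gam.T - K @ R @ K.T      # (a)
    G = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)           # (c)
    P = (np.eye(A.shape[0]) - G @ C) @ P_pred                        # (d)
    x_pred = A @ x_prev + K @ (v_prev - C @ x_prev)                  # (e)
    x = x_pred + G @ (v - C @ x_pred)                                # (f)
    return x, P

# Usage on a three-state tracking model with scalar position measurements.
h = 0.1
A = np.array([[1.0, h, h**2 / 2], [0.0, 1.0, h], [0.0, 0.0, 1.0]])
Gam = np.eye(3)
C = np.array([[1.0, 0.0, 0.0]])
Q = 0.01 * np.eye(3)
R = np.array([[0.25]])
S = np.array([[0.02], [0.0], [0.0]])      # illustrative noise correlation

x, P = np.zeros(3), np.eye(3)             # stand-ins for E(x_0) and Var(x_0)
data = [0.0, 0.1, 0.25, 0.4]              # made-up measurements v_0, ..., v_3
for v_prev, v in zip(data[:-1], data[1:]):
    x, P = correlated_kf_step(x, P, np.array([v_prev]), np.array([v]),
                              A, Gam, C, Q, R, S)
print(x)
```

Setting `S` to zero makes `K` vanish, and the step then coincides with the usual prediction-correction recursion, in line with the remark above.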
We will first derive the prediction-correction formulas (e) and (f). In this process, the matrices $P_{k,k-1}$, $P_{k,k}$, and $G_k$ will be defined, and their computational schemes (a), (b), (c), and (d) will be determined. Let
$$v^{k-1} = \begin{bmatrix} v_0 \\ \vdots \\ v_{k-1} \end{bmatrix} \quad \text{and} \quad v^k = \begin{bmatrix} v^{k-1} \\ v_k \end{bmatrix} .$$
Then, $v^k$, $v^{k-1}$, and $v_k$ can be considered as the data vectors $v$, $v^1$, and $v^2$ in Lemma 4.2, respectively. Also, set
$$\hat{x}_{k|k-1} = L(x_k, v^{k-1}) , \qquad \hat{x}_{k|k} = L(x_k, v^k) ,$$
and
$$x_k^\# = x_k - \hat{x}_{k|k-1} = x_k - L(x_k, v^{k-1}) .$$
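Then we have the following properties
$$\begin{aligned}
\langle \underline{\xi}_{k-1}, v^{k-2} \rangle &= 0 , & \langle \underline{\eta}_{k-1}, v^{k-2} \rangle &= 0 , \\
\langle \underline{\xi}_{k-1}, x_{k-1} \rangle &= 0 , & \langle \underline{\eta}_{k-1}, x_{k-1} \rangle &= 0 , \\
\langle x_{k-1}^\#, \underline{\xi}_{k-1} \rangle &= 0 , & \langle x_{k-1}^\#, \underline{\eta}_{k-1} \rangle &= 0 , \\
\langle \hat{x}_{k-1|k-2}, \underline{\xi}_{k-1} \rangle &= 0 , & \langle \hat{x}_{k-1|k-2}, \underline{\eta}_{k-1} \rangle &= 0 ,
\end{aligned} \tag{4.17}$$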
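(cf. Exercise 4.4). To derive the prediction formula, the idea is to add the "zero term" $K_{k-1}(v_{k-1} - C_{k-1}x_{k-1} - \underline{\eta}_{k-1})$ to the estimate
$$\hat{x}_{k|k-1} = L(A_{k-1}x_{k-1} + \Gamma_{k-1}\underline{\xi}_{k-1}, v^{k-1}) .$$
For an appropriate matrix $K_{k-1}$, we could eliminate the noise correlation in the estimate by absorbing it in $K_{k-1}$. More precisely, since $L(\cdot, v^{k-1})$ is linear, it follows that
$$\begin{aligned}
\hat{x}_{k|k-1} &= L(A_{k-1}x_{k-1} + \Gamma_{k-1}\underline{\xi}_{k-1} + K_{k-1}(v_{k-1} - C_{k-1}x_{k-1} - \underline{\eta}_{k-1}), v^{k-1}) \\
&= L((A_{k-1} - K_{k-1}C_{k-1})x_{k-1} + K_{k-1}v_{k-1} + (\Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1}), v^{k-1}) \\
&= (A_{k-1} - K_{k-1}C_{k-1})L(x_{k-1}, v^{k-1}) + K_{k-1}L(v_{k-1}, v^{k-1}) + L(\Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1}, v^{k-1}) \\
&:= I_1 + I_2 + I_3 .
\end{aligned}$$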
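We will force the noise term $I_3$ to be zero by choosing $K_{k-1}$ appropriately. To accomplish this, observe that from (4.6) and (4.17), we first have
$$\begin{aligned}
I_3 &= L(\Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1}, v^{k-1}) \\
&= E(\Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1}) + \langle \Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1}, v^{k-1} \rangle[\|v^{k-1}\|^2]^{-1}(v^{k-1} - E(v^{k-1})) \\
&= \Big\langle \Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1}, \begin{bmatrix} v^{k-2} \\ v_{k-1} \end{bmatrix} \Big\rangle[\|v^{k-1}\|^2]^{-1}(v^{k-1} - E(v^{k-1})) \\
&= \langle \Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1}, v_{k-1} \rangle[\|v^{k-1}\|^2]^{-1}(v^{k-1} - E(v^{k-1})) \\
&= \langle \Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1}, C_{k-1}x_{k-1} + \underline{\eta}_{k-1} \rangle[\|v^{k-1}\|^2]^{-1}(v^{k-1} - E(v^{k-1})) \\
&= (\Gamma_{k-1}S_{k-1} - K_{k-1}R_{k-1})[\|v^{k-1}\|^2]^{-1}(v^{k-1} - E(v^{k-1})) .
\end{aligned}$$
Hence, by choosing
$$K_{k-1} = \Gamma_{k-1}S_{k-1}R_{k-1}^{-1} ,$$
so that (b) is satisfied, we have $I_3 = 0$. Next, $I_1$ and $I_2$ can be determined as follows:
$$I_1 = (A_{k-1} - K_{k-1}C_{k-1})L(x_{k-1}, v^{k-1}) = (A_{k-1} - K_{k-1}C_{k-1})\hat{x}_{k-1|k-1}$$
and, by Lemma 4.2 with $v_{k-1}^\# = v_{k-1} - L(v_{k-1}, v^{k-2})$, we have
$$\begin{aligned}
I_2 &= K_{k-1}L(v_{k-1}, v^{k-1}) \\
&= K_{k-1}L\Big(v_{k-1}, \begin{bmatrix} v^{k-2} \\ v_{k-1} \end{bmatrix}\Big) \\
&= K_{k-1}(L(v_{k-1}, v^{k-2}) + \langle v_{k-1}^\#, v_{k-1}^\# \rangle[\|v_{k-1}^\#\|^2]^{-1}v_{k-1}^\#) \\
&= K_{k-1}(L(v_{k-1}, v^{k-2}) + v_{k-1}^\#) \\
&= K_{k-1}v_{k-1} .
\end{aligned}$$
Hence, it follows that
$$\begin{aligned}
\hat{x}_{k|k-1} &= I_1 + I_2 \\
&= (A_{k-1} - K_{k-1}C_{k-1})\hat{x}_{k-1|k-1} + K_{k-1}v_{k-1} \\
&= A_{k-1}\hat{x}_{k-1|k-1} + K_{k-1}(v_{k-1} - C_{k-1}\hat{x}_{k-1|k-1})
\end{aligned}$$
which is the prediction formula (e). To derive the correction formula, we use Lemma 4.2 again to conclude that
$$\hat{x}_{k|k} = L(x_k, v^{k-1}) + \langle x_k^\#, v_k^\# \rangle[\|v_k^\#\|^2]^{-1}v_k^\# = \hat{x}_{k|k-1} + \langle x_k^\#, v_k^\# \rangle[\|v_k^\#\|^2]^{-1}v_k^\# , \tag{4.18}$$
where
$$x_k^\# = x_k - \hat{x}_{k|k-1} ,$$
and, using (4.6) and (4.17), we arrive at
$$\begin{aligned}
v_k^\# &= v_k - L(v_k, v^{k-1}) \\
&= C_kx_k + \underline{\eta}_k - L(C_kx_k + \underline{\eta}_k, v^{k-1}) \\
&= C_kx_k + \underline{\eta}_k - C_kL(x_k, v^{k-1}) - L(\underline{\eta}_k, v^{k-1}) \\
&= C_k(x_k - L(x_k, v^{k-1})) + \underline{\eta}_k - E(\underline{\eta}_k) - \langle \underline{\eta}_k, v^{k-1} \rangle[\|v^{k-1}\|^2]^{-1}(v^{k-1} - E(v^{k-1})) \\
&= C_k(x_k - L(x_k, v^{k-1})) + \underline{\eta}_k \\
&= C_k(x_k - \hat{x}_{k|k-1}) + \underline{\eta}_k .
\end{aligned}$$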
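Hence, by applying (4.17) again, it follows from (4.18) that
$$\begin{aligned}
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + \langle x_k - \hat{x}_{k|k-1}, C_k(x_k - \hat{x}_{k|k-1}) + \underline{\eta}_k \rangle \cdot [\|C_k(x_k - \hat{x}_{k|k-1}) + \underline{\eta}_k\|_q^2]^{-1}(C_k(x_k - \hat{x}_{k|k-1}) + \underline{\eta}_k) \\
&= \hat{x}_{k|k-1} + \|x_k - \hat{x}_{k|k-1}\|_n^2C_k^T \cdot [C_k\|x_k - \hat{x}_{k|k-1}\|_n^2C_k^T + R_k]^{-1}(v_k - C_k\hat{x}_{k|k-1}) \\
&= \hat{x}_{k|k-1} + G_k(v_k - C_k\hat{x}_{k|k-1}) ,
\end{aligned}$$
which is the correction formula (f), if we set
$$P_{k,j} = \|x_k - \hat{x}_{k|j}\|_n^2$$
and
$$G_k = P_{k,k-1}C_k^T(C_kP_{k,k-1}C_k^T + R_k)^{-1} . \tag{4.19}$$
What is left is to verify the recursive relations (a) and (d) for $P_{k,k-1}$ and $P_{k,k}$. To do so, we need the following two formulas, the justification of which is left to the reader:
$$(I - G_kC_k)P_{k,k-1}C_k^TG_k^T = G_kR_kG_k^T \tag{4.20}$$
and
$$\langle x_{k-1} - \hat{x}_{k-1|k-1}, \Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1} \rangle = O_{n \times n} , \tag{4.21}$$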
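(cf. Exercise 4.5). Now, using (e), (b), and (4.21) consecutively, we have
$$\begin{aligned}
P_{k,k-1} &= \|x_k - \hat{x}_{k|k-1}\|_n^2 \\
&= \|A_{k-1}x_{k-1} + \Gamma_{k-1}\underline{\xi}_{k-1} - A_{k-1}\hat{x}_{k-1|k-1} - K_{k-1}(v_{k-1} - C_{k-1}\hat{x}_{k-1|k-1})\|_n^2 \\
&= \|A_{k-1}x_{k-1} + \Gamma_{k-1}\underline{\xi}_{k-1} - A_{k-1}\hat{x}_{k-1|k-1} - K_{k-1}(C_{k-1}x_{k-1} + \underline{\eta}_{k-1} - C_{k-1}\hat{x}_{k-1|k-1})\|_n^2 \\
&= \|(A_{k-1} - K_{k-1}C_{k-1})(x_{k-1} - \hat{x}_{k-1|k-1}) + (\Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1})\|_n^2 \\
&= (A_{k-1} - K_{k-1}C_{k-1})P_{k-1,k-1}(A_{k-1} - K_{k-1}C_{k-1})^T + \Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^T + K_{k-1}R_{k-1}K_{k-1}^T - \Gamma_{k-1}S_{k-1}K_{k-1}^T - K_{k-1}S_{k-1}^T\Gamma_{k-1}^T \\
&= (A_{k-1} - K_{k-1}C_{k-1})P_{k-1,k-1}(A_{k-1} - K_{k-1}C_{k-1})^T + \Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^T - K_{k-1}R_{k-1}K_{k-1}^T ,
\end{aligned}$$
which is (a).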
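Finally, using (f), (4.17), and (4.20) consecutively, we also have
$$\begin{aligned}
P_{k,k} &= \|x_k - \hat{x}_{k|k}\|_n^2 \\
&= \|x_k - \hat{x}_{k|k-1} - G_k(v_k - C_k\hat{x}_{k|k-1})\|_n^2 \\
&= \|(x_k - \hat{x}_{k|k-1}) - G_k(C_kx_k + \underline{\eta}_k - C_k\hat{x}_{k|k-1})\|_n^2 \\
&= \|(I - G_kC_k)(x_k - \hat{x}_{k|k-1}) - G_k\underline{\eta}_k\|_n^2 \\
&= (I - G_kC_k)P_{k,k-1}(I - G_kC_k)^T + G_kR_kG_k^T \\
&= (I - G_kC_k)P_{k,k-1} - (I - G_kC_k)P_{k,k-1}C_k^TG_k^T + G_kR_kG_k^T \\
&= (I - G_kC_k)P_{k,k-1} ,
\end{aligned}$$
which is (d). This completes the proof of the theorem.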
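4.5 Real-Time Applications
An aircraft radar guidance system provides a typical application. This system can be described by using the model (3.26) considered in the last chapter, with only one modification, namely: the tracking radar is now onboard to provide the position data information. Hence, both the system and measurement noise processes come from the same source such as vibration, and will be correlated. For instance, let us consider the following state-space description
$$\begin{cases}
\begin{bmatrix} x_{k+1}[1] \\ x_{k+1}[2] \\ x_{k+1}[3] \end{bmatrix} = \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_k[1] \\ x_k[2] \\ x_k[3] \end{bmatrix} + \begin{bmatrix} \xi_k[1] \\ \xi_k[2] \\ \xi_k[3] \end{bmatrix} \\[6pt]
v_k = [\,1 \;\; 0 \;\; 0\,]\begin{bmatrix} x_k[1] \\ x_k[2] \\ x_k[3] \end{bmatrix} + \eta_k ,
\end{cases} \tag{4.22}$$
where $\{\underline{\xi}_k\}$, with $\underline{\xi}_k := [\,\xi_k[1] \;\; \xi_k[2] \;\; \xi_k[3]\,]^T$, and $\{\eta_k\}$ are assumed to be correlated zero-mean Gaussian white noise sequences satisfying
$$\begin{aligned}
E(\underline{\xi}_k) &= 0 , & E(\eta_k) &= 0 , \\
E(\underline{\xi}_k\underline{\xi}_\ell^T) &= Q_k\delta_{k\ell} , & E(\eta_k\eta_\ell) &= r_k\delta_{k\ell} , & E(\underline{\xi}_k\eta_\ell) &= s_k\delta_{k\ell} , \\
E(x_0\underline{\xi}_k^T) &= 0 , & E(x_0\eta_k) &= 0 ,
\end{aligned}$$
with $Q_k \ge 0$, $r_k > 0$, $s_k := [\,s_k[1] \;\; s_k[2] \;\; s_k[3]\,]^T \ge 0$ for all $k$, and $E(x_0)$ and $\mathrm{Var}(x_0)$ are both assumed to be given.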
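An application of Theorem 4.1 to this system yields the following:
$$\begin{aligned}
P_{k,k-1}[1,1] ={}& P_{k-1}[1,1] + 2hP_{k-1}[1,2] + h^2P_{k-1}[1,3] + h^2P_{k-1}[2,2] + h^3P_{k-1}[2,3] + \frac{h^4}{4}P_{k-1}[3,3] + Q_{k-1}[1,1] \\
& + \frac{s_{k-1}[1]}{r_{k-1}}\Big\{\frac{s_{k-1}[1]}{r_{k-1}}P_{k-1}[1,1] - 2\Big(P_{k-1}[1,1] + hP_{k-1}[1,2] + \frac{h^2}{2}P_{k-1}[1,3]\Big) - s_{k-1}[1]\Big\} , \\
P_{k,k-1}[1,2] ={}& P_{k,k-1}[2,1] \\
={}& P_{k-1}[1,2] + hP_{k-1}[1,3] + hP_{k-1}[2,2] + \frac{3h^2}{2}P_{k-1}[2,3] + \frac{h^3}{2}P_{k-1}[3,3] + Q_{k-1}[1,2] \\
& + \Big\{\frac{s_{k-1}[1]s_{k-1}[2]}{r_{k-1}^2}P_{k-1}[1,1] - \frac{s_{k-1}[1]}{r_{k-1}}\big(P_{k-1}[1,2] + hP_{k-1}[1,3]\big) \\
& \quad - \frac{s_{k-1}[2]}{r_{k-1}}\Big(P_{k-1}[1,1] + hP_{k-1}[1,2] + \frac{h^2}{2}P_{k-1}[1,3]\Big) - \frac{s_{k-1}[1]s_{k-1}[2]}{r_{k-1}}\Big\} , \\
P_{k,k-1}[2,2] ={}& P_{k-1}[2,2] + 2hP_{k-1}[2,3] + h^2P_{k-1}[3,3] + Q_{k-1}[2,2] \\
& + \frac{s_{k-1}[2]}{r_{k-1}}\Big\{\frac{s_{k-1}[2]}{r_{k-1}}P_{k-1}[1,1] - 2\big(P_{k-1}[1,2] + hP_{k-1}[1,3]\big) - s_{k-1}[2]\Big\} , \\
P_{k,k-1}[1,3] ={}& P_{k,k-1}[3,1] \\
={}& P_{k-1}[1,3] + hP_{k-1}[2,3] + \frac{h^2}{2}P_{k-1}[3,3] + Q_{k-1}[1,3] \\
& + \Big\{\frac{s_{k-1}[1]s_{k-1}[3]}{r_{k-1}^2}P_{k-1}[1,1] - \frac{s_{k-1}[1]}{r_{k-1}}P_{k-1}[1,3] \\
& \quad - \frac{s_{k-1}[3]}{r_{k-1}}\Big(P_{k-1}[1,1] + hP_{k-1}[1,2] + \frac{h^2}{2}P_{k-1}[1,3]\Big) - \frac{s_{k-1}[1]s_{k-1}[3]}{r_{k-1}}\Big\} , \\
P_{k,k-1}[2,3] ={}& P_{k,k-1}[3,2] \\
={}& P_{k-1}[2,3] + hP_{k-1}[3,3] + Q_{k-1}[2,3] \\
& + \Big\{\frac{s_{k-1}[2]s_{k-1}[3]}{r_{k-1}^2}P_{k-1}[1,1] - \frac{s_{k-1}[2]}{r_{k-1}}P_{k-1}[1,3] - \frac{s_{k-1}[3]}{r_{k-1}}\big(P_{k-1}[1,2] + hP_{k-1}[1,3]\big) - \frac{s_{k-1}[2]s_{k-1}[3]}{r_{k-1}}\Big\} , \\
P_{k,k-1}[3,3] ={}& P_{k-1}[3,3] + Q_{k-1}[3,3] + \frac{s_{k-1}[3]}{r_{k-1}}\Big\{\frac{s_{k-1}[3]}{r_{k-1}}P_{k-1}[1,1] - 2P_{k-1}[1,3] - s_{k-1}[3]\Big\} ,
\end{aligned}$$
where $P_{k-1} = P_{k-1,k-1}$ and $P[i,j]$ denotes the $(i,j)$th entry of $P$. In addition,
$$G_k = \frac{1}{P_{k,k-1}[1,1] + r_k}\begin{bmatrix} P_{k,k-1}[1,1] \\ P_{k,k-1}[1,2] \\ P_{k,k-1}[1,3] \end{bmatrix} ,$$
$$P_k = P_{k,k-1} - \frac{1}{P_{k,k-1}[1,1] + r_k}\begin{bmatrix}
P_{k,k-1}^2[1,1] & P_{k,k-1}[1,1]P_{k,k-1}[1,2] & P_{k,k-1}[1,1]P_{k,k-1}[1,3] \\
P_{k,k-1}[1,1]P_{k,k-1}[1,2] & P_{k,k-1}^2[1,2] & P_{k,k-1}[1,2]P_{k,k-1}[1,3] \\
P_{k,k-1}[1,1]P_{k,k-1}[1,3] & P_{k,k-1}[1,2]P_{k,k-1}[1,3] & P_{k,k-1}^2[1,3]
\end{bmatrix}$$
with $P_0 = \mathrm{Var}(x_0)$, and
$$\begin{aligned}
\hat{x}_{k|k-1} &= \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \hat{x}_{k-1|k-1}[1] \\ \hat{x}_{k-1|k-1}[2] \\ \hat{x}_{k-1|k-1}[3] \end{bmatrix} + \frac{v_{k-1} - \hat{x}_{k-1|k-1}[1]}{r_{k-1}}\begin{bmatrix} s_{k-1}[1] \\ s_{k-1}[2] \\ s_{k-1}[3] \end{bmatrix} , \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + G_k(v_k - \hat{x}_{k|k-1}[1])
\end{aligned}$$
with $\hat{x}_{0|0} = E(x_0)$.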
4.6 Linear Deterministic/Stochastic Systems
Finally, let us discuss the general linear stochastic system with deterministic control input $u_k$ being incorporated. More precisely, we have the following state-space description:
$$\begin{cases} x_{k+1} = A_kx_k + B_ku_k + \Gamma_k\underline{\xi}_k \\ v_k = C_kx_k + D_ku_k + \underline{\eta}_k , \end{cases} \tag{4.23}$$
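where $u_k$ is a deterministic control input $m$-vector with $1 \le m \le n$. It can be proved (cf. Exercise 4.6) that the Kalman filtering algorithm for this system is given by
$$\begin{cases}
P_{0,0} = \mathrm{Var}(x_0) , \quad K_{k-1} = \Gamma_{k-1}S_{k-1}R_{k-1}^{-1} , \\
P_{k,k-1} = (A_{k-1} - K_{k-1}C_{k-1})P_{k-1,k-1}(A_{k-1} - K_{k-1}C_{k-1})^T + \Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^T - K_{k-1}R_{k-1}K_{k-1}^T \\
G_k = P_{k,k-1}C_k^T(C_kP_{k,k-1}C_k^T + R_k)^{-1} \\
P_{k,k} = (I - G_kC_k)P_{k,k-1} \\
\hat{x}_{0|0} = E(x_0) \\
\hat{x}_{k|k-1} = A_{k-1}\hat{x}_{k-1|k-1} + B_{k-1}u_{k-1} + K_{k-1}(v_{k-1} - D_{k-1}u_{k-1} - C_{k-1}\hat{x}_{k-1|k-1}) \\
\hat{x}_{k|k} = \hat{x}_{k|k-1} + G_k(v_k - D_ku_k - C_k\hat{x}_{k|k-1}) \\
k = 1, 2, \cdots ,
\end{cases} \tag{4.24}$$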
(cf. Fig. 4.2). We remark that if the system and measurement noise processes are uncorrelated so that $S_k = 0$ for all $k = 0, 1, 2, \cdots$, then (4.24) reduces to (2.18) or (3.25), as expected.
Fig. 4.2.
Exercises
4.1. Let $v$ be a random vector and define
$$L(x, v) = E(x) + \langle x, v \rangle[\|v\|^2]^{-1}(v - E(v)) .$$
Show that $L(\cdot, v)$ is a linear operator in the sense that
$$L(Ax + By, v) = AL(x, v) + BL(y, v)$$
for all constant matrices $A$ and $B$ and random vectors $x$ and $y$.
4.2. Let $v$ be a random vector and $L(\cdot, v)$ be defined as above. Show that if $a$ is a constant vector then $L(a, v) = a$.
4.3. For a real-valued function $f$ and a matrix $A = [a_{ij}]$, define
$$\frac{\partial f}{\partial A} = \Big[\frac{\partial f}{\partial a_{ij}}\Big]^T .$$
By taking
$$\frac{\partial}{\partial H}\big(\operatorname{tr}\|x - y\|^2\big) = 0 ,$$
show that the solution $x^*$ of the minimization problem
$$\operatorname{tr}\|x - x^*\|^2 = \min_H \operatorname{tr}\|x - y\|^2 ,$$
where $y = E(x) + H(v - E(v))$, can be obtained by setting
$$x^* = E(x) - \langle x, v \rangle[\|v\|^2]^{-1}(E(v) - v) ,$$
where
$$H^* = \langle x, v \rangle[\|v\|^2]^{-1} .$$
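4.4. Consider the linear stochastic system (4.16). Let
$$v^{k-1} = \begin{bmatrix} v_0 \\ \vdots \\ v_{k-1} \end{bmatrix} .$$
Define $L(x, v)$ as in Exercise 4.1 and let $x_k^\# = x_k - \hat{x}_{k|k-1}$ with $\hat{x}_{k|k-1} := L(x_k, v^{k-1})$. Prove that
$$\begin{aligned}
\langle \underline{\xi}_{k-1}, v^{k-2} \rangle &= 0 , & \langle \underline{\eta}_{k-1}, v^{k-2} \rangle &= 0 , \\
\langle \underline{\xi}_{k-1}, x_{k-1} \rangle &= 0 , & \langle \underline{\eta}_{k-1}, x_{k-1} \rangle &= 0 , \\
\langle x_{k-1}^\#, \underline{\xi}_{k-1} \rangle &= 0 , & \langle x_{k-1}^\#, \underline{\eta}_{k-1} \rangle &= 0 , \\
\langle \hat{x}_{k-1|k-2}, \underline{\xi}_{k-1} \rangle &= 0 , & \langle \hat{x}_{k-1|k-2}, \underline{\eta}_{k-1} \rangle &= 0 .
\end{aligned}$$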
4.5. Verify that
$$(I - G_kC_k)P_{k,k-1}C_k^TG_k^T = G_kR_kG_k^T$$
and
$$\langle x_{k-1} - \hat{x}_{k-1|k-1}, \Gamma_{k-1}\underline{\xi}_{k-1} - K_{k-1}\underline{\eta}_{k-1} \rangle = O_{n \times n} .$$
4.6. Consider the linear deterministic/stochastic system
$$\begin{cases} x_{k+1} = A_kx_k + B_ku_k + \Gamma_k\underline{\xi}_k \\ v_k = C_kx_k + D_ku_k + \underline{\eta}_k , \end{cases}$$
where $\{u_k\}$ is a given sequence of deterministic control inputs. Suppose that the same assumption for (4.16) is satisfied. Derive the Kalman filtering algorithm for this model.
4.7. Consider the following so-called ARMAX (auto-regressive moving-average model with exogenous inputs) model in signal processing:
where $\{v_j\}$ and $\{u_j\}$ are output and input signals, respectively, $\{e_j\}$ is a zero-mean Gaussian white noise sequence with $\mathrm{Var}(e_j) = s_j > 0$, and $a_j$, $b_j$, $c_j$ are constants.
(a) Derive a state-space description for this ARMAX model.
(b) Specify the Kalman filtering algorithm for this state-space description.
4.8. More generally, consider the following ARMAX model in signal processing:
$$v_k = -\sum_{j=1}^{n} a_jv_{k-j} + \sum_{j=0}^{m} b_ju_{k-j} + \sum_{j=0}^{\ell} c_je_{k-j} ,$$
where $0 \le m, \ell \le n$, $\{v_j\}$ and $\{u_j\}$ are output and input signals, respectively, $\{e_j\}$ is a zero-mean Gaussian white noise sequence with $\mathrm{Var}(e_j) = s_j > 0$, and $a_j$, $b_j$, $c_j$ are constants.
(a) Derive a state-space description for this ARMAX model.
(b) Specify the Kalman filtering algorithm for this state-space description.
5. Colored Noise
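Consider the linear stochastic system with the following state-space description:
$$\begin{cases} x_{k+1} = A_kx_k + \Gamma_k\underline{\xi}_k \\ v_k = C_kx_k + \underline{\eta}_k , \end{cases} \tag{5.1}$$
where $A_k$, $\Gamma_k$, and $C_k$ are known $n \times n$, $n \times p$, and $q \times n$ constant matrices, respectively, with $1 \le p, q \le n$. The problem is to give a linear unbiased minimum variance estimate of $x_k$ with initial quantities $E(x_0)$ and $\mathrm{Var}(x_0)$ under the assumption that
(i) $\underline{\xi}_k = M_{k-1}\underline{\xi}_{k-1} + \underline{\beta}_k$
(ii) $\underline{\eta}_k = N_{k-1}\underline{\eta}_{k-1} + \underline{\gamma}_k$,
where $\underline{\xi}_{-1} = \underline{\eta}_{-1} = 0$, $\{\underline{\beta}_k\}$ and $\{\underline{\gamma}_k\}$ are uncorrelated zero-mean Gaussian white noise sequences satisfying
$$E(\underline{\beta}_k\underline{\gamma}_\ell^T) = 0 , \quad E(\underline{\beta}_k\underline{\beta}_\ell^T) = Q_k\delta_{k\ell} , \quad E(\underline{\gamma}_k\underline{\gamma}_\ell^T) = R_k\delta_{k\ell} ,$$
and $M_{k-1}$ and $N_{k-1}$ are known $p \times p$ and $q \times q$ constant matrices. The noise sequences $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ satisfying (i) and (ii) will be called colored noise processes. This chapter is devoted to the study of Kalman filtering with this assumption on the noise sequences.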
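5.1 Outline of Procedure
The idea in dealing with the colored model (5.1) is first to make the system equation in (5.1) white. To accomplish this, we simply set
$$z_k = \begin{bmatrix} x_k \\ \underline{\xi}_k \end{bmatrix} , \quad \tilde{A}_k = \begin{bmatrix} A_k & \Gamma_k \\ 0 & M_k \end{bmatrix} , \quad \tilde{\underline{\beta}}_k = \begin{bmatrix} 0 \\ \underline{\beta}_k \end{bmatrix} ,$$
and arrive at
$$z_{k+1} = \tilde{A}_kz_k + \tilde{\underline{\beta}}_{k+1} . \tag{5.2}$$
Note that the observation equation in (5.1) becomes
$$v_k = \tilde{C}_kz_k + \underline{\eta}_k , \tag{5.3}$$
where $\tilde{C}_k = [\,C_k \;\; 0\,]$. We will use the same model as was used in Chapter 4, by considering
$$\hat{z}_{k|j} = L(z_k, v^j) ,$$
where
$$v^j = \begin{bmatrix} v_0 \\ \vdots \\ v_j \end{bmatrix} .$$
The second step is to derive a recursive relation (instead of the prediction-correction relation) for $\hat{z}_k := \hat{z}_{k|k}$. To do so, we have, using the linearity of $L$,
$$\hat{z}_k = \tilde{A}_{k-1}L(z_{k-1}, v^k) + L(\tilde{\underline{\beta}}_k, v^k) .$$
From the noise assumption, it can be shown that
$$L(\tilde{\underline{\beta}}_k, v^k) = 0 , \tag{5.4}$$
so that
$$\hat{z}_k = \tilde{A}_{k-1}\hat{z}_{k-1|k} \tag{5.5}$$
(cf. Exercise 5.1) and in order to obtain a recursive relation for $\hat{z}_k$, we have to express $\hat{z}_{k-1|k}$ in terms of $\hat{z}_{k-1}$. This can be done by using Lemma 4.2, namely:
$$\hat{z}_{k-1|k} = L\Big(z_{k-1}, \begin{bmatrix} v^{k-1} \\ v_k \end{bmatrix}\Big) = \hat{z}_{k-1} + \langle z_{k-1} - \hat{z}_{k-1}, v_k^\# \rangle[\|v_k^\#\|^2]^{-1}v_k^\# , \tag{5.6}$$
where $\hat{z}_{k-1} = L(z_{k-1}, v^{k-1})$ and $v_k^\# = v_k - L(v_k, v^{k-1})$.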
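5.2 Error Estimates
It is now important to understand the error term in (5.6). First, we will derive a formula for $v_k^\#$. By the definition of $v_k^\#$ and the observation equation (5.3) and noting that
$$\tilde{C}_k\tilde{\underline{\beta}}_k = [\,C_k \;\; 0\,]\begin{bmatrix} 0 \\ \underline{\beta}_k \end{bmatrix} = 0 ,$$
we have
$$\begin{aligned}
v_k &= \tilde{C}_kz_k + \underline{\eta}_k \\
&= \tilde{C}_kz_k + N_{k-1}\underline{\eta}_{k-1} + \underline{\gamma}_k \\
&= \tilde{C}_k(\tilde{A}_{k-1}z_{k-1} + \tilde{\underline{\beta}}_k) + N_{k-1}(v_{k-1} - \tilde{C}_{k-1}z_{k-1}) + \underline{\gamma}_k \\
&= \tilde{H}_{k-1}z_{k-1} + N_{k-1}v_{k-1} + \underline{\gamma}_k ,
\end{aligned} \tag{5.7}$$
with
$$\tilde{H}_{k-1} = \tilde{C}_k\tilde{A}_{k-1} - N_{k-1}\tilde{C}_{k-1} = [\,C_kA_{k-1} - N_{k-1}C_{k-1} \;\;\; C_k\Gamma_{k-1}\,] .$$
Hence, by the linearity of $L$, it follows that
$$L(v_k, v^{k-1}) = \tilde{H}_{k-1}L(z_{k-1}, v^{k-1}) + N_{k-1}L(v_{k-1}, v^{k-1}) + L(\underline{\gamma}_k, v^{k-1}) .$$
Since $L(z_{k-1}, v^{k-1}) = \hat{z}_{k-1}$,
$$L(v_{k-1}, v^{k-1}) = v_{k-1} \tag{5.8}$$
(cf. Exercise 5.2), and
$$L(\underline{\gamma}_k, v^{k-1}) = 0 , \tag{5.9}$$
we obtain
$$\begin{aligned}
v_k^\# &= v_k - L(v_k, v^{k-1}) \\
&= v_k - (\tilde{H}_{k-1}\hat{z}_{k-1} + N_{k-1}v_{k-1}) \\
&= v_k - N_{k-1}v_{k-1} - \tilde{H}_{k-1}\hat{z}_{k-1} .
\end{aligned} \tag{5.10}$$
In addition, by using (5.7) and (5.10), we also have
$$v_k^\# = \tilde{H}_{k-1}(z_{k-1} - \hat{z}_{k-1}) + \underline{\gamma}_k . \tag{5.11}$$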
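Now, let us return to (5.6). Using (5.11), (5.10), and $\langle z_{k-1} - \hat{z}_{k-1}, \underline{\gamma}_k \rangle = 0$ (cf. Exercise 5.3), we arrive at
$$\begin{aligned}
\hat{z}_{k-1|k} &= \hat{z}_{k-1} + \langle z_{k-1} - \hat{z}_{k-1}, v_k^\# \rangle[\|v_k^\#\|^2]^{-1}v_k^\# \\
&= \hat{z}_{k-1} + \|z_{k-1} - \hat{z}_{k-1}\|^2\tilde{H}_{k-1}^T\big(\tilde{H}_{k-1}\|z_{k-1} - \hat{z}_{k-1}\|^2\tilde{H}_{k-1}^T + R_k\big)^{-1} \cdot (v_k - N_{k-1}v_{k-1} - \tilde{H}_{k-1}\hat{z}_{k-1}) .
\end{aligned}$$
Putting this into (5.5) gives
$$\hat{z}_k = \tilde{A}_{k-1}\hat{z}_{k-1} + G_k(v_k - N_{k-1}v_{k-1} - \tilde{H}_{k-1}\hat{z}_{k-1}) ,$$
or
$$\begin{bmatrix} \hat{x}_k \\ \hat{\underline{\xi}}_k \end{bmatrix} = \begin{bmatrix} A_{k-1} & \Gamma_{k-1} \\ 0 & M_{k-1} \end{bmatrix}\begin{bmatrix} \hat{x}_{k-1} \\ \hat{\underline{\xi}}_{k-1} \end{bmatrix} + G_k\Big(v_k - N_{k-1}v_{k-1} - \tilde{H}_{k-1}\begin{bmatrix} \hat{x}_{k-1} \\ \hat{\underline{\xi}}_{k-1} \end{bmatrix}\Big) , \tag{5.12}$$
where
$$G_k = \tilde{A}_{k-1}P_{k-1}\tilde{H}_{k-1}^T(\tilde{H}_{k-1}P_{k-1}\tilde{H}_{k-1}^T + R_k)^{-1} \tag{5.13}$$
with
$$P_k = \|z_k - \hat{z}_k\|^2 = \mathrm{Var}(z_k - \hat{z}_k) . \tag{5.14}$$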
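5.3 Kalman Filtering Process
What is left is to derive an algorithm to compute $P_k$ and the initial condition $\hat{z}_0$. Using (5.7), we have
$$\begin{aligned}
z_k - \hat{z}_k &= (\tilde{A}_{k-1}z_{k-1} + \tilde{\underline{\beta}}_k) - (\tilde{A}_{k-1}\hat{z}_{k-1} + G_k(v_k - N_{k-1}v_{k-1} - \tilde{H}_{k-1}\hat{z}_{k-1})) \\
&= \tilde{A}_{k-1}(z_{k-1} - \hat{z}_{k-1}) + \tilde{\underline{\beta}}_k - G_k(\tilde{H}_{k-1}z_{k-1} + \underline{\gamma}_k - \tilde{H}_{k-1}\hat{z}_{k-1}) \\
&= (\tilde{A}_{k-1} - G_k\tilde{H}_{k-1})(z_{k-1} - \hat{z}_{k-1}) + (\tilde{\underline{\beta}}_k - G_k\underline{\gamma}_k) .
\end{aligned}$$
In addition, it follows from Exercise 5.3 and
$$\langle z_{k-1} - \hat{z}_{k-1}, \tilde{\underline{\beta}}_k \rangle = 0$$
(cf. Exercise 5.4) that
$$\begin{aligned}
P_k &= (\tilde{A}_{k-1} - G_k\tilde{H}_{k-1})P_{k-1}(\tilde{A}_{k-1} - G_k\tilde{H}_{k-1})^T + \begin{bmatrix} 0 & 0 \\ 0 & Q_k \end{bmatrix} + G_kR_kG_k^T \\
&= (\tilde{A}_{k-1} - G_k\tilde{H}_{k-1})P_{k-1}\tilde{A}_{k-1}^T + \begin{bmatrix} 0 & 0 \\ 0 & Q_k \end{bmatrix} ,
\end{aligned} \tag{5.15}$$
where the identity
$$\begin{aligned}
& -(\tilde{A}_{k-1} - G_k\tilde{H}_{k-1})P_{k-1}(G_k\tilde{H}_{k-1})^T + G_kR_kG_k^T \\
&= -\tilde{A}_{k-1}P_{k-1}\tilde{H}_{k-1}^TG_k^T + G_k\tilde{H}_{k-1}P_{k-1}\tilde{H}_{k-1}^TG_k^T + G_kR_kG_k^T \\
&= -\tilde{A}_{k-1}P_{k-1}\tilde{H}_{k-1}^TG_k^T + G_k(\tilde{H}_{k-1}P_{k-1}\tilde{H}_{k-1}^T + R_k)G_k^T \\
&= -\tilde{A}_{k-1}P_{k-1}\tilde{H}_{k-1}^TG_k^T + \tilde{A}_{k-1}P_{k-1}\tilde{H}_{k-1}^TG_k^T \\
&= 0 ,
\end{aligned}$$
which is a consequence of (5.13), has been used.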
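The initial estimate is
$$\begin{aligned}
\hat{z}_0 = L(z_0, v_0) &= E(z_0) - \langle z_0, v_0 \rangle[\|v_0\|^2]^{-1}(E(v_0) - v_0) \\
&= \begin{bmatrix} E(x_0) \\ 0 \end{bmatrix} - \begin{bmatrix} \mathrm{Var}(x_0)C_0^T \\ 0 \end{bmatrix}[C_0\mathrm{Var}(x_0)C_0^T + R_0]^{-1}(C_0E(x_0) - v_0) .
\end{aligned} \tag{5.16}$$
We remark that
$$E(\hat{z}_0) = \begin{bmatrix} E(x_0) \\ 0 \end{bmatrix} = E(z_0) ,$$
and since
$$\begin{aligned}
z_k - \hat{z}_k &= (\tilde{A}_{k-1} - G_k\tilde{H}_{k-1})(z_{k-1} - \hat{z}_{k-1}) + (\tilde{\underline{\beta}}_k - G_k\underline{\gamma}_k) \\
&= (\tilde{A}_{k-1} - G_k\tilde{H}_{k-1}) \cdots (\tilde{A}_0 - G_1\tilde{H}_0)(z_0 - \hat{z}_0) + \text{noise} ,
\end{aligned}$$
we have $E(z_k - \hat{z}_k) = 0$. That is, $\hat{z}_k$ is an unbiased linear minimum (error) variance estimate of $z_k$. In addition, from (5.16) we also have the initial condition
$$\begin{aligned}
P_0 &= \mathrm{Var}(z_0 - \hat{z}_0) \\
&= \mathrm{Var}\Big(\begin{bmatrix} x_0 - E(x_0) \\ \underline{\xi}_0 \end{bmatrix} + \begin{bmatrix} \mathrm{Var}(x_0)C_0^T \\ 0 \end{bmatrix}[C_0\mathrm{Var}(x_0)C_0^T + R_0]^{-1}[C_0E(x_0) - v_0]\Big) \\
&= \begin{bmatrix} \mathrm{Var}\big(x_0 - E(x_0) + \mathrm{Var}(x_0)C_0^T[C_0\mathrm{Var}(x_0)C_0^T + R_0]^{-1}[C_0E(x_0) - v_0]\big) & 0 \\ 0 & Q_0 \end{bmatrix} \\
&= \begin{bmatrix} \mathrm{Var}(x_0) - [\mathrm{Var}(x_0)]C_0^T[C_0\mathrm{Var}(x_0)C_0^T + R_0]^{-1}C_0[\mathrm{Var}(x_0)] & 0 \\ 0 & Q_0 \end{bmatrix}
\end{aligned} \tag{5.17a}$$
(cf. Exercise 5.5). If $\mathrm{Var}(x_0)$ is nonsingular, then using the matrix inversion lemma (cf. Lemma 1.2), we have
$$P_0 = \begin{bmatrix} \big([\mathrm{Var}(x_0)]^{-1} + C_0^TR_0^{-1}C_0\big)^{-1} & 0 \\ 0 & Q_0 \end{bmatrix} . \tag{5.17b}$$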
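Finally, returning to the definition
$$z_k = \begin{bmatrix} x_k \\ \underline{\xi}_k \end{bmatrix} ,$$
we have
$$\begin{aligned}
\hat{z}_k = L(z_k, v^k) &= E(z_k) + \langle z_k, v^k \rangle[\|v^k\|^2]^{-1}(v^k - E(v^k)) \\
&= \begin{bmatrix} E(x_k) \\ E(\underline{\xi}_k) \end{bmatrix} + \begin{bmatrix} \langle x_k, v^k \rangle \\ \langle \underline{\xi}_k, v^k \rangle \end{bmatrix}[\|v^k\|^2]^{-1}(v^k - E(v^k)) \\
&= \begin{bmatrix} E(x_k) + \langle x_k, v^k \rangle[\|v^k\|^2]^{-1}(v^k - E(v^k)) \\ E(\underline{\xi}_k) + \langle \underline{\xi}_k, v^k \rangle[\|v^k\|^2]^{-1}(v^k - E(v^k)) \end{bmatrix} \\
&= \begin{bmatrix} \hat{x}_k \\ \hat{\underline{\xi}}_k \end{bmatrix} .
\end{aligned}$$
In summary, the Kalman filtering process for the colored noise model (5.1) is given by
$$\begin{bmatrix} \hat{x}_k \\ \hat{\underline{\xi}}_k \end{bmatrix} = \begin{bmatrix} A_{k-1} & \Gamma_{k-1} \\ 0 & M_{k-1} \end{bmatrix}\begin{bmatrix} \hat{x}_{k-1} \\ \hat{\underline{\xi}}_{k-1} \end{bmatrix} + G_k\Big(v_k - N_{k-1}v_{k-1} - \tilde{H}_{k-1}\begin{bmatrix} \hat{x}_{k-1} \\ \hat{\underline{\xi}}_{k-1} \end{bmatrix}\Big) , \tag{5.18}$$
where $\tilde{H}_{k-1} = [\,C_kA_{k-1} - N_{k-1}C_{k-1} \;\;\; C_k\Gamma_{k-1}\,]$ and
$$G_k = \begin{bmatrix} A_{k-1} & \Gamma_{k-1} \\ 0 & M_{k-1} \end{bmatrix}P_{k-1}\tilde{H}_{k-1}^T(\tilde{H}_{k-1}P_{k-1}\tilde{H}_{k-1}^T + R_k)^{-1} \tag{5.19}$$
with $P_{k-1}$ given by the formula
$$P_k = \Big(\begin{bmatrix} A_{k-1} & \Gamma_{k-1} \\ 0 & M_{k-1} \end{bmatrix} - G_k\tilde{H}_{k-1}\Big)P_{k-1}\begin{bmatrix} A_{k-1}^T & 0 \\ \Gamma_{k-1}^T & M_{k-1}^T \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & Q_k \end{bmatrix} , \tag{5.20}$$
$k = 1, 2, \cdots$. The initial conditions are given by (5.16) and (5.17a or 5.17b). We remark that if the colored noise processes become white (that is, both $M_k = 0$ and $N_k = 0$ for all $k$), then this Kalman filtering algorithm reduces to the one derived in Chapters 2 and 3, by simply setting
$$\hat{x}_{k|k-1} = A_{k-1}\hat{x}_{k-1|k-1} ,$$
so that the recursive relation for $\hat{x}_k$ is decoupled into two equations: the prediction and correction equations. In addition, by defining $P_k = P_{k,k}$ and
$$P_{k,k-1} = A_{k-1}P_{k-1}A_{k-1}^T + \Gamma_kQ_k\Gamma_k^T ,$$
it can be shown that (5.20) reduces to the algorithm for computing the matrices $P_{k,k-1}$ and $P_{k,k}$. We leave the verification as an exercise to the reader (cf. Exercise 5.6).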
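To make the summary (5.18)-(5.20) concrete, here is a minimal Python sketch of one recursion step for a time-invariant colored-noise model. NumPy is assumed; the function name `colored_noise_step`, the restriction to constant matrices, and the numbers in the usage part are illustrative assumptions. In particular, the starting values below are simple placeholders rather than the initial conditions (5.16) and (5.17).

```python
import numpy as np

def colored_noise_step(z_prev, P_prev, v, v_prev, A, Gam, C, M, N, Q, R):
    """One step of (5.18)-(5.20) with constant A, Gam, C, M, N, Q, R.

    z_prev stacks the estimates of x_{k-1} and xi_{k-1}; P_prev is Var(z_{k-1} - z_hat_{k-1}).
    """
    n, p = Gam.shape
    A_t = np.block([[A, Gam], [np.zeros((p, n)), M]])       # augmented system matrix
    H_t = np.hstack([C @ A - N @ C, C @ Gam])               # [C A - N C, C Gam]
    G = A_t @ P_prev @ H_t.T @ np.linalg.inv(H_t @ P_prev @ H_t.T + R)   # (5.19)
    z = A_t @ z_prev + G @ (v - N @ v_prev - H_t @ z_prev)               # (5.18)
    P = (A_t - G @ H_t) @ P_prev @ A_t.T                                 # (5.20), first term
    P[n:, n:] += Q                                                       # add the block diag(0, Q)
    return z, P

# Usage with a two-state model, two-dimensional system noise and scalar data.
A = np.array([[1.0, 0.5], [0.0, 1.0]])
Gam = np.eye(2)
C = np.array([[1.0, 0.0]])
M = 0.5 * np.eye(2)                    # coloring of the system noise
N = np.array([[0.3]])                  # coloring of the measurement noise
Q = 0.1 * np.eye(2)
R = np.array([[1.0]])

z = np.zeros(4)                                                   # placeholder initial estimate
P = np.block([[np.eye(2), np.zeros((2, 2))], [np.zeros((2, 2)), Q]])
data = [np.array([0.2]), np.array([-0.1]), np.array([0.4])]       # made-up v_0, v_1, v_2
for v_prev, v in zip(data[:-1], data[1:]):
    z, P = colored_noise_step(z, P, v, v_prev, A, Gam, C, M, N, Q, R)
print(z)
```

With `M` and `N` set to zero the lower block of the gain vanishes, so the noise estimate stays at zero and the state part follows the ordinary prediction-correction filter, in agreement with the remark above.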
5.4 White System Noise
If only the measurement noise is colored but the system noise is white, i.e. $M_k = 0$ and $N_k \neq 0$, then it is not necessary to obtain the extra estimate $\hat{\underline{\xi}}_k$ in order to derive the Kalman filtering equations. In this case, the filtering algorithm is given as follows (cf. Exercise 5.7):
$$\begin{cases}
P_0 = \big[[\mathrm{Var}(x_0)]^{-1} + C_0^TR_0^{-1}C_0\big]^{-1} \\
H_{k-1} = [C_kA_{k-1} - N_{k-1}C_{k-1}] \\
G_k = (A_{k-1}P_{k-1}H_{k-1}^T + \Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^TC_k^T)(H_{k-1}P_{k-1}H_{k-1}^T + C_k\Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^TC_k^T + R_{k-1})^{-1} \\
P_k = (A_{k-1} - G_kH_{k-1})P_{k-1}A_{k-1}^T + (I - G_kC_k)\Gamma_{k-1}Q_{k-1}\Gamma_{k-1}^T \\
\hat{x}_0 = E(x_0) - [\mathrm{Var}(x_0)]C_0^T[C_0\mathrm{Var}(x_0)C_0^T + R_0]^{-1}[C_0E(x_0) - v_0] \\
\hat{x}_k = A_{k-1}\hat{x}_{k-1} + G_k(v_k - N_{k-1}v_{k-1} - H_{k-1}\hat{x}_{k-1}) \\
k = 1, 2, \cdots ,
\end{cases} \tag{5.21}$$
(cf. Fig. 5.1).
Fig. 5.1.
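5.5 Real-Time Applications
Now, let us consider a tracking system (cf. Exercise 3.8 and (3.26)) with colored input, namely: the state-space description
$$\begin{cases} x_{k+1} = Ax_k + \underline{\xi}_k \\ v_k = Cx_k + \eta_k , \end{cases} \tag{5.22}$$
where
$$A = \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix} , \qquad C = [\,1 \;\; 0 \;\; 0\,] ,$$
with sampling time $h > 0$, and
$$\begin{cases} \underline{\xi}_k = F\underline{\xi}_{k-1} + \underline{\beta}_k \\ \eta_k = g\eta_{k-1} + \gamma_k , \end{cases} \tag{5.23}$$
where $\{\underline{\beta}_k\}$ and $\{\gamma_k\}$ are both zero-mean Gaussian white noise sequences satisfying the following assumptions:
$$\begin{aligned}
E(\underline{\beta}_k\underline{\beta}_\ell^T) &= Q_k\delta_{k\ell} , & E(\gamma_k\gamma_\ell) &= r_k\delta_{k\ell} , & E(\underline{\beta}_k\gamma_\ell) &= 0 , \\
E(x_0\underline{\xi}_0^T) &= 0 , & E(x_0\eta_0) &= 0 , & E(x_0\underline{\beta}_k^T) &= 0 , \\
E(x_0\gamma_k) &= 0 , & E(\underline{\xi}_0\underline{\beta}_k^T) &= 0 , & E(\underline{\xi}_0\gamma_k) &= 0 , \\
E(\eta_0\underline{\beta}_k) &= 0 , & \underline{\xi}_{-1} &= 0 , & \eta_{-1} &= 0 .
\end{aligned}$$
Set
$$z_k = \begin{bmatrix} x_k \\ \underline{\xi}_k \end{bmatrix} , \quad \tilde{A}_k = \begin{bmatrix} A & I \\ 0 & F \end{bmatrix} , \quad \tilde{\underline{\beta}}_k = \begin{bmatrix} 0 \\ \underline{\beta}_k \end{bmatrix} , \quad \tilde{C}_k = [\,C \;\; 0\,] .$$
Then we have
$$\begin{cases} z_{k+1} = \tilde{A}_kz_k + \tilde{\underline{\beta}}_{k+1} \\ v_k = \tilde{C}_kz_k + \eta_k \end{cases}$$
as described in (5.2) and (5.3). The associated Kalman filtering algorithm is then given by formulas (5.18-20) as follows:
$$\begin{bmatrix} \hat{x}_k \\ \hat{\underline{\xi}}_k \end{bmatrix} = \begin{bmatrix} A & I \\ 0 & F \end{bmatrix}\begin{bmatrix} \hat{x}_{k-1} \\ \hat{\underline{\xi}}_{k-1} \end{bmatrix} + G_k\Big(v_k - gv_{k-1} - h_{k-1}^T\begin{bmatrix} \hat{x}_{k-1} \\ \hat{\underline{\xi}}_{k-1} \end{bmatrix}\Big) ,$$
where
$$h_{k-1} = [\,CA - gC \;\;\; C\,]^T = [\,(1 - g) \;\; h \;\; h^2/2 \;\; 1 \;\; 0 \;\; 0\,]^T ,$$
and
$$G_k = \begin{bmatrix} A & I \\ 0 & F \end{bmatrix}P_{k-1}h_{k-1}(h_{k-1}^TP_{k-1}h_{k-1} + r_k)^{-1}$$
with $P_{k-1}$ given by the formula
$$P_k = \Big(\begin{bmatrix} A & I \\ 0 & F \end{bmatrix} - G_kh_{k-1}^T\Big)P_{k-1}\begin{bmatrix} A^T & 0 \\ I & F^T \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & Q_k \end{bmatrix} .$$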
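Exercises
5.1. Let $\{\underline{\beta}_k\}$ be a sequence of zero-mean Gaussian white noise and $\{v_k\}$ a sequence of observation data as in the system (5.1). Set
$$\tilde{\underline{\beta}}_k = \begin{bmatrix} 0 \\ \underline{\beta}_k \end{bmatrix} , \qquad v^k = \begin{bmatrix} v_0 \\ \vdots \\ v_k \end{bmatrix} ,$$
and define $L(x, v)$ as in (4.6). Show that
$$L(\tilde{\underline{\beta}}_k, v^k) = 0 .$$
5.2. Let $\{\underline{\gamma}_k\}$ be a sequence of zero-mean Gaussian white noise and $v^k$ and $L(x, v)$ be defined as above. Show that
$$L(v_{k-1}, v^{k-1}) = v_{k-1}$$
and
$$L(\underline{\gamma}_k, v^{k-1}) = 0 .$$
5.3. Let $\{\underline{\gamma}_k\}$ be a sequence of zero-mean Gaussian white noise and $v^k$ and $L(x, v)$ be defined as in Exercise 5.1. Furthermore, set
$$\hat{z}_{k-1} = L(z_{k-1}, v^{k-1}) \quad \text{and} \quad z_{k-1} = \begin{bmatrix} x_{k-1} \\ \underline{\xi}_{k-1} \end{bmatrix} .$$
Show that
$$\langle z_{k-1} - \hat{z}_{k-1}, \underline{\gamma}_k \rangle = 0 .$$
5.4. Let $\{\underline{\beta}_k\}$ be a sequence of zero-mean Gaussian white noise and set
$$\tilde{\underline{\beta}}_k = \begin{bmatrix} 0 \\ \underline{\beta}_k \end{bmatrix} .$$
Furthermore, define $z_{k-1}$ and $\hat{z}_{k-1}$ as in Exercise 5.3. Show that
$$\langle z_{k-1} - \hat{z}_{k-1}, \tilde{\underline{\beta}}_k \rangle = 0 .$$
5.5. Let $L(x, v)$ be defined as in (4.6) and set $\hat{z}_0 = L(z_0, v_0)$ with
$$z_0 = \begin{bmatrix} x_0 \\ \underline{\xi}_0 \end{bmatrix} .$$
Show that
$$\mathrm{Var}(z_0 - \hat{z}_0) = \begin{bmatrix} \mathrm{Var}(x_0) - [\mathrm{Var}(x_0)]C_0^T[C_0\mathrm{Var}(x_0)C_0^T + R_0]^{-1}C_0[\mathrm{Var}(x_0)] & 0 \\ 0 & Q_0 \end{bmatrix} .$$
5.6. Verify that if the matrices $M_k$ and $N_k$ defined in (5.1) are identically zero for all $k$, then the Kalman filtering algorithm given by (5.18-20) reduces to the one derived in Chapters 2 and 3 for the linear stochastic system with uncorrelated system and measurement white noise processes.
5.7. Simplify the Kalman filtering algorithm for the system (5.1) where $M_k = 0$ but $N_k \neq 0$.
5.8. Consider the tracking system (5.22) with colored input (5.23).
(a) Reformulate this system with colored input as a new augmented system with Gaussian white input by setting
$$A_c = \begin{bmatrix} A & I & 0 \\ 0 & F & 0 \\ 0 & 0 & g \end{bmatrix} \quad \text{and} \quad C_c = [\,C \;\; 0 \;\; 0 \;\; 0 \;\; 1\,] .$$
(b) By formally applying formulas (3.25) to this augmented system, give the Kalman filtering algorithm to the tracking system (5.22) with colored input (5.23).
(c) What are the major disadvantages of this approach?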
6. Limiting Kalman Filter
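In this chapter, we consider the special case where all known constant matrices are independent of time. That is, we are going to study the time-invariant linear stochastic system with the state-space description:
$$\begin{cases} x_{k+1} = Ax_k + \Gamma\underline{\xi}_k \\ v_k = Cx_k + \underline{\eta}_k. \end{cases} \tag{6.1}$$
Here, $A$, $\Gamma$, and $C$ are known $n \times n$, $n \times p$, and $q \times n$ constant matrices, respectively, with $1 \le p, q \le n$, and $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ are zero-mean Gaussian white noise sequences with
$$E(\underline{\xi}_k\underline{\xi}_\ell^T) = Q\delta_{k\ell}, \quad E(\underline{\eta}_k\underline{\eta}_\ell^T) = R\delta_{k\ell}, \quad E(\underline{\xi}_k\underline{\eta}_\ell^T) = 0,$$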
where Q and R are known pxp and q x q non-negative and positive definite symmetric matrices, respectively, independent of k. The Kalman filtering algorithm for this special case can be described as follows (cf. Fig. 6.1):
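$$\begin{cases} \hat{x}_{k|k} = \hat{x}_{k|k-1} + G_k(v_k - C\hat{x}_{k|k-1}) \\ \hat{x}_{k|k-1} = A\hat{x}_{k-1|k-1} \\ \hat{x}_{0|0} = E(x_0) \end{cases} \tag{6.2}$$
with
$$\begin{cases} P_{0,0} = \mathrm{Var}(x_0) \\ P_{k,k-1} = AP_{k-1,k-1}A^T + \Gamma Q\Gamma^T \\ G_k = P_{k,k-1}C^T(CP_{k,k-1}C^T + R)^{-1} \\ P_{k,k} = (I - G_kC)P_{k,k-1}. \end{cases} \tag{6.3}$$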
Note that even for this simple model, it is necessary to invert a matrix at every instant to obtain the Kalman gain matrix $G_k$ in (6.3) before the prediction-correction filtering (6.2) can be carried out. In real-time applications, it is sometimes necessary to replace $G_k$ in (6.2) by a constant gain matrix in order to save computation time.
Fig. 6.1.
The limiting (or steady-state) Kalman filter will be defined by replacing $G_k$ with its "limit" $G$ as $k \to \infty$, where $G$ is called the limiting Kalman gain matrix, so that the prediction-correction equations in (6.2) become
$$\begin{cases} \vec{x}_{k|k} = \vec{x}_{k|k-1} + G(v_k - C\vec{x}_{k|k-1}) \\ \vec{x}_{k|k-1} = A\vec{x}_{k-1|k-1} \\ \vec{x}_{0|0} = E(x_0). \end{cases} \tag{6.4}$$
Under very mild conditions on the linear system (6.1), we will see that the sequence $\{G_k\}$ does converge and, in fact, $\mathrm{tr}\,\|\hat{x}_{k|k} - \vec{x}_{k|k}\|_n^2$ tends to zero exponentially fast. Hence, replacing $G_k$ by $G$ does not change the actual optimal estimates by too much.
6.1 Outline of Procedure
In view of the definition of $G_k$ in (6.3), in order to study the convergence of $G_k$, it is sufficient to study the convergence of
$$P_k := P_{k,k-1}.$$
We will first establish a recursive relation for $P_k$. Since
$$\begin{aligned} P_k &= P_{k,k-1} = AP_{k-1,k-1}A^T + \Gamma Q\Gamma^T \\ &= A(I - G_{k-1}C)P_{k-1,k-2}A^T + \Gamma Q\Gamma^T \\ &= A\big(I - P_{k-1,k-2}C^T(CP_{k-1,k-2}C^T + R)^{-1}C\big)P_{k-1,k-2}A^T + \Gamma Q\Gamma^T \\ &= A\big(P_{k-1} - P_{k-1}C^T(CP_{k-1}C^T + R)^{-1}CP_{k-1}\big)A^T + \Gamma Q\Gamma^T, \end{aligned}$$
it follows that by setting
$$\Psi(T) = A\big(T - TC^T(CTC^T + R)^{-1}CT\big)A^T + \Gamma Q\Gamma^T,$$
$P_k$ indeed satisfies the recurrence relation
$$P_k = \Psi(P_{k-1}). \tag{6.5}$$
This relation is called a matrix Riccati equation. If $P_k \to P$ as $k \to \infty$, then $P$ would satisfy the matrix Riccati equation
$$P = \Psi(P). \tag{6.6}$$
Consequently, we can solve (6.6) for $P$ and define
$$G = PC^T(CPC^T + R)^{-1},$$
so that $G_k \to G$ as $k \to \infty$. Note that since $P_k$ is symmetric, so is $\Psi(P_k)$.
Our procedure in showing that $\{P_k\}$ actually converges is as follows:
(i) $P_k \le W$ for all $k$ and some constant symmetric matrix $W$ (that is, $W - P_k$ is non-negative definite symmetric for all $k$);
(ii) $P_k \le P_{k+1}$, for $k = 0, 1, \dots$; and
(iii) $\lim_{k\to\infty} P_k = P$ for some constant symmetric matrix $P$.
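The three steps above can be watched numerically in the simplest setting. The following is only a minimal sketch for a scalar system ($n = p = q = 1$), in which the Riccati map $\Psi$ of (6.5) reduces to ordinary arithmetic; the function names and the parameter values $a = c = \gamma = q = r = 1$ are illustrative assumptions, not taken from the text.

```python
def riccati_step(p, a, c, gamma, q, r):
    """One scalar step P_k = Psi(P_{k-1}) of the Riccati recursion (6.5)."""
    return a * (p - p * c * (1.0 / (c * p * c + r)) * c * p) * a + gamma * q * gamma


def iterate_riccati(a, c, gamma, q, r, p0=0.0, steps=60):
    """Return the list [P_0, P_1, ..., P_steps] generated by (6.5)."""
    ps = [p0]
    for _ in range(steps):
        ps.append(riccati_step(ps[-1], a, c, gamma, q, r))
    return ps


# Illustrative (assumed) parameters: a = c = gamma = q = r = 1.
ps = iterate_riccati(1.0, 1.0, 1.0, 1.0, 1.0, p0=0.0)       # starts at P_0 = 0
ps_alt = iterate_riccati(1.0, 1.0, 1.0, 1.0, 1.0, p0=10.0)  # a different P_0

p_limit = ps[-1]
g_limit = p_limit * 1.0 / (1.0 * p_limit * 1.0 + 1.0)  # G = P c (c P c + r)^{-1}
```

For these particular values the fixed-point equation $P = \Psi(P)$ reduces to $P^2 - P - 1 = 0$, so the sequence started at $P_0 = 0$ increases monotonically to $(1 + \sqrt{5})/2$, and the sequence started at $P_0 = 10$ reaches the same limit.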
6.2 Preliminary Results
To obtain a unique $P$, we must show that it is independent of the initial condition $P_0 := P_{0,-1}$, as long as $P_0 \ge 0$.
Lemma 6.1. Suppose that the linear system (6.1) is observable (that is, the matrix
$$N_{CA} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}$$
has full rank). Then there exists a non-negative definite symmetric constant matrix $W$ independent of $P_0$ such that
$$P_k \le W$$
for all $k \ge n + 1$.
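Since $\langle x_k - \hat{x}_{k|k}, \underline{\xi}_k\rangle = 0$ (cf. (3.20)), we first observe that
$$\begin{aligned} P_k := P_{k,k-1} &= \|x_k - \hat{x}_{k|k-1}\|_n^2 \\ &= \|Ax_{k-1} + \Gamma\underline{\xi}_{k-1} - A\hat{x}_{k-1|k-1}\|_n^2 \\ &= A\|x_{k-1} - \hat{x}_{k-1|k-1}\|_n^2A^T + \Gamma Q\Gamma^T. \end{aligned}$$
Also, since $\hat{x}_{k-1|k-1}$ is the minimum variance estimate of $x_{k-1}$, we have
$$\|x_{k-1} - \hat{x}_{k-1|k-1}\|_n^2 \le \|x_{k-1} - \tilde{x}_{k-1}\|_n^2$$
for any other linear unbiased estimate $\tilde{x}_{k-1}$ of $x_{k-1}$. From the assumption that $N_{CA}$ is of full rank, it follows that $N_{CA}^TN_{CA}$ is nonsingular. In addition,
$$N_{CA}^TN_{CA} = \sum_{i=0}^{n-1}(A^T)^iC^TCA^i,$$
and this motivates the following choice of $\tilde{x}_{k-1}$:
$$\tilde{x}_{k-1} = A^n[N_{CA}^TN_{CA}]^{-1}\sum_{i=0}^{n-1}(A^T)^iC^Tv_{k-n-1+i}, \qquad k \ge n + 1. \tag{6.7}$$
Clearly, $\tilde{x}_{k-1}$ is linear in the data, and it can be shown that it is also unbiased (cf. Exercise 6.1). Furthermore,
$$\begin{aligned} \tilde{x}_{k-1} &= A^n[N_{CA}^TN_{CA}]^{-1}\sum_{i=0}^{n-1}(A^T)^iC^T(Cx_{k-n-1+i} + \underline{\eta}_{k-n-1+i}) \\ &= A^nx_{k-n-1} + A^n[N_{CA}^TN_{CA}]^{-1}\sum_{i=0}^{n-1}(A^T)^iC^T\Big(\sum_{j=0}^{i-1}CA^j\Gamma\underline{\xi}_{k-n-2+i-j} + \underline{\eta}_{k-n-1+i}\Big). \end{aligned}$$
Since
$$x_{k-1} = A^nx_{k-n-1} + \sum_{i=0}^{n-1}A^i\Gamma\underline{\xi}_{k-2-i},$$
we have
$$x_{k-1} - \tilde{x}_{k-1} = \sum_{i=0}^{n-1}A^i\Gamma\underline{\xi}_{k-2-i} - A^n[N_{CA}^TN_{CA}]^{-1}\sum_{i=0}^{n-1}(A^T)^iC^T\Big(\sum_{j=0}^{i-1}CA^j\Gamma\underline{\xi}_{k-n-2+i-j} + \underline{\eta}_{k-n-1+i}\Big).$$
Observe that $E(\underline{\xi}_m\underline{\eta}_\ell^T) = 0$, $E(\underline{\xi}_\ell\underline{\xi}_\ell^T) = Q$, and $E(\underline{\eta}_\ell\underline{\eta}_\ell^T) = R$ for all $m$ and $\ell$, so that $\|x_{k-1} - \tilde{x}_{k-1}\|_n^2$ is independent of $k$ for all $k \ge n + 1$. Hence,
$$\begin{aligned} P_k &= A\|x_{k-1} - \hat{x}_{k-1|k-1}\|_n^2A^T + \Gamma Q\Gamma^T \\ &\le A\|x_{k-1} - \tilde{x}_{k-1}\|_n^2A^T + \Gamma Q\Gamma^T \\ &= A\|x_n - \tilde{x}_n\|_n^2A^T + \Gamma Q\Gamma^T \end{aligned}$$
for all $k \ge n + 1$. Pick
$$W = A\|x_n - \tilde{x}_n\|_n^2A^T + \Gamma Q\Gamma^T.$$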
Then $P_k \le W$ for all $k \ge n + 1$. Note that $W$ is independent of the initial condition $P_0 = P_{0,-1} = \|x_0 - \hat{x}_{0,-1}\|_n^2$. This completes the proof of the Lemma.
Lemma 6.2. If $P$ and $Q$ are both non-negative definite and symmetric with $P \ge Q$, then $\Psi(P) \ge \Psi(Q)$.
To prove this lemma, we will use the formula
$$\frac{d}{ds}A^{-1}(s) = -A^{-1}(s)\Big[\frac{d}{ds}A(s)\Big]A^{-1}(s)$$
(cf. Exercise 6.2). Denoting $T(s) = Q + s(P - Q)$, we have
$$\begin{aligned} \Psi(P) - \Psi(Q) &= \int_0^1\frac{d}{ds}\Psi(Q + s(P - Q))\,ds \\ &= A\Big\{\int_0^1\frac{d}{ds}\Big\{T(s) - T(s)C^T[CT(s)C^T + R]^{-1}CT(s)\Big\}ds\Big\}A^T \\ &= A\Big\{\int_0^1\Big[P - Q - (P - Q)C^T(CT(s)C^T + R)^{-1}CT(s) \\ &\qquad - T(s)C^T(CT(s)C^T + R)^{-1}C(P - Q) \\ &\qquad + T(s)C^T(CT(s)C^T + R)^{-1}C(P - Q)C^T(CT(s)C^T + R)^{-1}CT(s)\Big]ds\Big\}A^T \\ &= A\Big\{\int_0^1\big[I - T(s)C^T(CT(s)C^T + R)^{-1}C\big](P - Q)\big[I - T(s)C^T(CT(s)C^T + R)^{-1}C\big]^Tds\Big\}A^T \\ &\ge 0. \end{aligned}$$
Hence, $\Psi(P) \ge \Psi(Q)$ as required. We also have the following:
Lemma 6.3. Suppose that the linear system (6.1) is observable. Then with the initial condition $P_0 = P_{0,-1} = 0$, the sequence $\{P_k\}$ converges componentwise to some symmetric matrix $P \ge 0$ as $k \to \infty$.
Since
$$P_1 := \|x_1 - \hat{x}_{1|0}\|_n^2 \ge 0 = P_0$$
and both $P_0$ and $P_1$ are symmetric, Lemma 6.2 (applied repeatedly to $P_{k+1} = \Psi(P_k)$) yields
$$P_{k+1} \ge P_k, \qquad k = 0, 1, \dots.$$
Hence, $\{P_k\}$ is monotonic nondecreasing and bounded above by $W$ (cf. Lemma 6.1). For any $n$-vector $y$, we have
$$0 \le y^TP_ky \le y^TWy,$$
so that the sequence $\{y^TP_ky\}$ is a bounded non-negative monotonic nondecreasing sequence of real numbers and must converge to some non-negative real number. If we choose
$$y = [0 \cdots 0\ 1\ 0 \cdots 0]^T$$
with 1 being placed at the $i$th component, then setting $P_k = [p_{ij}^{(k)}]$, we have
$$y^TP_ky = p_{ii}^{(k)} \to p_{ii} \quad \text{as } k \to \infty$$
for some non-negative number $p_{ii}$. Next, if we choose
$$y = [0 \cdots 0\ 1\ 0 \cdots 0\ 1\ 0 \cdots 0]^T$$
with the two 1's being placed at the $i$th and $j$th components, then we have
$$y^TP_ky = p_{ii}^{(k)} + p_{ij}^{(k)} + p_{ji}^{(k)} + p_{jj}^{(k)} = p_{ii}^{(k)} + 2p_{ij}^{(k)} + p_{jj}^{(k)} \to q \quad \text{as } k \to \infty$$
for some non-negative number $q$. Since $p_{ii}^{(k)} \to p_{ii}$ and $p_{jj}^{(k)} \to p_{jj}$, we have
$$p_{ij}^{(k)} \to \tfrac{1}{2}(q - p_{ii} - p_{jj}) \quad \text{as } k \to \infty.$$
That is, $P_k \to P$. Since $P_k \ge 0$ and is symmetric, so is $P$. This completes the proof of the lemma.
We now define
$$G = \lim_{k\to\infty}G_k,$$
where $G_k = P_kC^T(CP_kC^T + R)^{-1}$. Then
$$G = PC^T(CPC^T + R)^{-1}. \tag{6.8}$$
Next, we will show that for any non-negative definite symmetric matrix $P_0$ as an initial choice, $\{P_k\}$ still converges to the same $P$. Hence, from now on, we will use an arbitrary non-negative definite symmetric matrix $P_0$, and recall that $P_k = \Psi(P_{k-1})$, $k = 1, 2, \dots$, and $P = \Psi(P)$. We first need the following.
Lemma 6.4. Let the linear system (6.1) be observable so that $P$ can be defined using Lemma 6.3. Then the following relation
$$P - P_k = A(I - GC)(P - P_{k-1})(I - G_{k-1}C)^TA^T \tag{6.9}$$
holds for all $k = 1, 2, \dots$, and any non-negative definite symmetric initial condition $P_0$.
Since $G_{k-1} = P_{k-1}C^T(CP_{k-1}C^T + R)^{-1}$ and $P_{k-1}^T = P_{k-1}$, the matrix $G_{k-1}CP_{k-1}$ is non-negative definite and symmetric, so that $G_{k-1}CP_{k-1} = P_{k-1}C^TG_{k-1}^T$. Hence, using (6.5) and (6.6), we have
$$\begin{aligned} P - P_k &= \Psi(P) - \Psi(P_{k-1}) \\ &= (APA^T - AGCPA^T) - (AP_{k-1}A^T - AG_{k-1}CP_{k-1}A^T) \\ &= APA^T - AGCPA^T - AP_{k-1}A^T + AP_{k-1}C^TG_{k-1}^TA^T. \end{aligned} \tag{6.10}$$
Now,
$$(I - GC)(P - P_{k-1})(I - G_{k-1}C)^T = P - P_{k-1} + P_{k-1}C^TG_{k-1}^T - GCP + Re, \tag{6.11}$$
where
$$Re = GCP_{k-1} - PC^TG_{k-1}^T + GCPC^TG_{k-1}^T - GCP_{k-1}C^TG_{k-1}^T. \tag{6.12}$$
Hence, if we can show that $Re = 0$, then (6.9) follows from (6.10) and (6.11). From the definition of $G_{k-1}$, we have $G_{k-1}(CP_{k-1}C^T + R) = P_{k-1}C^T$ or $(CP_{k-1}C^T + R)G_{k-1}^T = CP_{k-1}$, so that
$$G_{k-1}CP_{k-1}C^T = P_{k-1}C^T - G_{k-1}R \tag{6.13}$$
or
$$CP_{k-1}C^TG_{k-1}^T = CP_{k-1} - RG_{k-1}^T. \tag{6.14}$$
Taking $k \to \infty$ in (6.13) with initial condition $P_0 := P_{0,-1} = 0$, we have
$$GCPC^T = PC^T - GR, \tag{6.15}$$
and putting (6.14) and (6.15) into (6.12), we can indeed conclude that $Re = 0$. This completes the proof of the lemma.
Lemma 6.5.
$$P_k = [A(I - G_{k-1}C)]P_{k-1}[A(I - G_{k-1}C)]^T + [AG_{k-1}]R[AG_{k-1}]^T + \Gamma Q\Gamma^T \tag{6.16}$$
and consequently, for an observable system with $P_0 := P_{0,-1} = 0$,
$$P = [A(I - GC)]P[A(I - GC)]^T + [AG]R[AG]^T + \Gamma Q\Gamma^T. \tag{6.17}$$
Since $G_{k-1}(CP_{k-1}C^T + R) = P_{k-1}C^T$ from the definition, we have
$$G_{k-1}R = (I - G_{k-1}C)P_{k-1}C^T$$
and hence,
$$AG_{k-1}RG_{k-1}^TA^T = A(I - G_{k-1}C)P_{k-1}C^TG_{k-1}^TA^T.$$
Therefore, from the matrix Riccati equation $P_k = \Psi(P_{k-1})$, we may conclude that
$$\begin{aligned} P_k &= A(I - G_{k-1}C)P_{k-1}A^T + \Gamma Q\Gamma^T \\ &= A(I - G_{k-1}C)P_{k-1}(I - G_{k-1}C)^TA^T + A(I - G_{k-1}C)P_{k-1}C^TG_{k-1}^TA^T + \Gamma Q\Gamma^T \\ &= A(I - G_{k-1}C)P_{k-1}(I - G_{k-1}C)^TA^T + AG_{k-1}RG_{k-1}^TA^T + \Gamma Q\Gamma^T \end{aligned}$$
which is (6.16).
Lemma 6.6. Let the linear system (6.1) be (completely) controllable (that is, the matrix
$$M_{A\Gamma} = [\Gamma \ \ A\Gamma \ \cdots \ A^{n-1}\Gamma]$$
has full rank). Then for any non-negative definite symmetric initial matrix $P_0$, we have $P_k > 0$ for $k \ge n + 1$. Consequently, $P > 0$.
Using (6.16) $k$ times, we first have
$$\begin{aligned} P_k &= \Gamma Q\Gamma^T + [A(I - G_{k-1}C)]\Gamma Q\Gamma^T[A(I - G_{k-1}C)]^T + \cdots \\ &\quad + \{[A(I - G_{k-1}C)] \cdots [A(I - G_2C)]\}\Gamma Q\Gamma^T\{[A(I - G_{k-1}C)] \cdots [A(I - G_2C)]\}^T \\ &\quad + [AG_{k-1}]R[AG_{k-1}]^T + [A(I - G_{k-1}C)][AG_{k-2}]R[AG_{k-2}]^T[A(I - G_{k-1}C)]^T + \cdots \\ &\quad + \{[A(I - G_{k-1}C)] \cdots [A(I - G_2C)][AG_1]\}R\{[A(I - G_{k-1}C)] \cdots [A(I - G_2C)][AG_1]\}^T \\ &\quad + \{[A(I - G_kC)] \cdots [A(I - G_1C)]\}P_0\{[A(I - G_kC)] \cdots [A(I - G_1C)]\}^T. \end{aligned}$$
To prove that $P_k > 0$ for $k \ge n + 1$, it is sufficient to show that $y^TP_ky = 0$ implies $y = 0$. Let $y$ be any $n$-vector such that $y^TP_ky = 0$. Then, since $Q$, $R$ and $P_0$ are non-negative definite, each term on the right hand side of the above identity must be zero. Hence, we have
$$y^T\Gamma Q\Gamma^Ty = 0, \tag{6.18}$$
$$y^T[A(I - G_{k-1}C)]\Gamma Q\Gamma^T[A(I - G_{k-1}C)]^Ty = 0, \tag{6.19}$$
$$\vdots$$
$$y^T\{[A(I - G_{k-1}C)] \cdots [A(I - G_2C)]\}\Gamma Q\Gamma^T\{[A(I - G_{k-1}C)] \cdots [A(I - G_2C)]\}^Ty = 0 \tag{6.20}$$
and
$$y^T[AG_{k-1}]R[AG_{k-1}]^Ty = 0, \tag{6.21}$$
$$\vdots$$
$$y^T\{[A(I - G_{k-1}C)] \cdots [A(I - G_3C)][AG_2]\}R\{[A(I - G_{k-1}C)] \cdots [A(I - G_3C)][AG_2]\}^Ty = 0. \tag{6.22}$$
Since $R > 0$, from (6.21) and (6.22), we have
$$y^TAG_{k-1} = 0, \tag{6.23}$$
$$\vdots$$
$$y^T[A(I - G_{k-1}C)] \cdots [A(I - G_2C)][AG_1] = 0. \tag{6.24}$$
Now, it follows from $Q > 0$ and (6.18) that
$$y^T\Gamma = 0,$$
and then using (6.19) and (6.23) we obtain
$$y^TA\Gamma = 0,$$
and so on. Finally, we have
$$y^TA^j\Gamma = 0, \qquad j = 0, 1, \dots, n - 1,$$
as long as $k \ge n + 1$. That is,
$$y^TM_{A\Gamma} = y^T[\Gamma \ \ A\Gamma \ \cdots \ A^{n-1}\Gamma] = 0.$$
Since the system is (completely) controllable, $M_{A\Gamma}$ is of full rank, and we must have $y = 0$. Hence, $P_k > 0$ for all $k \ge n + 1$. This completes the proof of the lemma.
Now, using (6.9) repeatedly, we have
$$P - P_k = [A(I - GC)]^{k-n-1}(P - P_{n+1})B_k^T, \tag{6.25}$$
where
$$B_k = [A(I - G_{k-1}C)] \cdots [A(I - G_{n+1}C)], \qquad k = n + 2, n + 3, \dots,$$
with $B_{n+1} := I$. In order to show that $P_k \to P$ as $k \to \infty$, it is sufficient to show that $[A(I - GC)]^{k-n-1} \to 0$ as $k \to \infty$ and $B_k$ is "bounded." In this respect, we have the following two lemmas.
Lemma 6.7. Let the linear system (6.1) be observable. Then
$$B_kB_k^T \le M, \qquad k \ge n + 1,$$
for some constant matrix $M$. Consequently, if $B_k = [b_{ij}^{(k)}]$ then
$$|b_{ij}^{(k)}| \le m,$$
for some constant $m$ and for all $i$, $j$ and $k$.
By Lemma 6.1, $P_k \le W$ for $k \ge n + 1$. Hence, using Lemma 6.5 repeatedly, we have
$$\begin{aligned} W \ge P_k &\ge [A(I - G_{k-1}C)]P_{k-1}[A(I - G_{k-1}C)]^T \\ &\ge [A(I - G_{k-1}C)][A(I - G_{k-2}C)]P_{k-2}[A(I - G_{k-2}C)]^T[A(I - G_{k-1}C)]^T \\ &\ge \cdots \ge B_kP_{n+1}B_k^T. \end{aligned}$$
Since $P_{n+1}$ is real, symmetric, and positive definite, by Lemma 6.6, all its eigenvalues are real, and in fact, positive. Let $\lambda_{\min}$ be the smallest eigenvalue of $P_{n+1}$ and note that $P_{n+1} \ge \lambda_{\min}I$ (cf. Exercise 6.3). Then we have
$$W \ge B_k\lambda_{\min}IB_k^T = \lambda_{\min}B_kB_k^T.$$
Setting $M = \lambda_{\min}^{-1}W$ completes the proof of the lemma.
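Lemma 6.8. Let $\lambda$ be an arbitrary eigenvalue of $A(I - GC)$. If the system (6.1) is both (completely) controllable and observable, then $|\lambda| < 1$.
Observe that $\lambda$ is also an eigenvalue of $(I - GC)^TA^T$. Let $y$ be a corresponding eigenvector. Then
$$(I - GC)^TA^Ty = \lambda y. \tag{6.26}$$
Using (6.17), we have
$$\bar{y}^TPy = \bar{\lambda}\bar{y}^TP\lambda y + \bar{y}^T[AG]R[AG]^Ty + \bar{y}^T\Gamma Q\Gamma^Ty.$$
Hence,
$$(1 - |\lambda|^2)\bar{y}^TPy = \bar{y}^T[(AG)R(AG)^T + \Gamma Q\Gamma^T]y.$$
Since the right-hand side is non-negative and $\bar{y}^TPy > 0$ (recall that $P > 0$ by Lemma 6.6), we must have $1 - |\lambda|^2 \ge 0$ or $|\lambda| \le 1$. Suppose that $|\lambda| = 1$. Then
$$\bar{y}^T(AG)R(AG)^Ty = 0 \quad \text{and} \quad \bar{y}^T\Gamma Q\Gamma^Ty = 0$$
or
$$\overline{[(AG)^Ty]}^TR[(AG)^Ty] = 0 \quad \text{and} \quad \overline{[\Gamma^Ty]}^TQ[\Gamma^Ty] = 0.$$
Since $Q > 0$ and $R > 0$, we have
$$G^TA^Ty = 0 \tag{6.27}$$
and
$$\Gamma^Ty = 0, \tag{6.28}$$
so that (6.26) implies $A^Ty = \lambda y$. Hence,
$$\Gamma^T(A^j)^Ty = \lambda^j\Gamma^Ty = 0, \qquad j = 0, 1, \dots, n - 1.$$
This gives
$$y^TM_{A\Gamma} = y^T[\Gamma \ \ A\Gamma \ \cdots \ A^{n-1}\Gamma] = 0.$$
Taking real and imaginary parts, we have
$$[\mathrm{Re}(y)]^TM_{A\Gamma} = 0 \quad \text{and} \quad [\mathrm{Im}(y)]^TM_{A\Gamma} = 0.$$
Since $y \ne 0$, at least one of $\mathrm{Re}(y)$ and $\mathrm{Im}(y)$ is not zero. Hence, $M_{A\Gamma}$ is row dependent, contradicting the complete controllability hypothesis. Hence $|\lambda| < 1$. This completes the proof of the lemma.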
6.3 Geometric Convergence
Combining the above results, we now have the following.
Theorem 6.1. Let the linear stochastic system (6.1) be both (completely) controllable and observable. Then, for any initial state $x_0$ such that $P_0 := P_{0,-1} = \mathrm{Var}(x_0)$ is non-negative definite and symmetric, $P_k := P_{k,k-1} \to P$ as $k \to \infty$, where $P > 0$ is symmetric and is independent of $x_0$. Furthermore, the order of convergence is geometric; that is,
$$\mathrm{tr}(P_k - P)(P_k - P)^T \le Cr^k, \tag{6.29}$$
where $0 < r < 1$ and $C > 0$, independent of $k$. Consequently,
$$\mathrm{tr}(G_k - G)(G_k - G)^T \le Cr^k. \tag{6.30}$$
To prove the theorem, let $F = A(I - GC)$. Using Lemma 6.7 and (6.25), we have
$$\begin{aligned} (P_k - P)(P_k - P)^T &= F^{k-n-1}(P_{n+1} - P)B_k^TB_k(P_{n+1} - P)(F^{k-n-1})^T \\ &\le F^{k-n-1}\Omega(F^{k-n-1})^T \end{aligned}$$
for some non-negative definite symmetric constant matrix $\Omega$. From Lemma 6.8, all eigenvalues of $F$ are of absolute value less than 1. Hence, $F^k \to 0$ so that $P_k \to P$ as $k \to \infty$ (cf. Exercise 6.4). On the other hand, by Lemma 6.6, $P$ is positive definite symmetric and is independent of $P_0$. Using Lemmas 1.7 and 1.10, we have
$$\mathrm{tr}(P_k - P)(P_k - P)^T \le \mathrm{tr}\,F^{k-n-1}(F^{k-n-1})^T \cdot \mathrm{tr}\,\Omega \le Cr^k,$$
where $0 < r < 1$ and $C$ is independent of $k$. To prove (6.30), we first rewrite
$$\begin{aligned} G_k - G &= P_kC^T(CP_kC^T + R)^{-1} - PC^T(CPC^T + R)^{-1} \\ &= (P_k - P)C^T(CP_kC^T + R)^{-1} + PC^T[(CP_kC^T + R)^{-1} - (CPC^T + R)^{-1}] \\ &= (P_k - P)C^T(CP_kC^T + R)^{-1} + PC^T(CP_kC^T + R)^{-1}[(CPC^T + R) - (CP_kC^T + R)](CPC^T + R)^{-1} \\ &= (P_k - P)C^T(CP_kC^T + R)^{-1} + PC^T(CP_kC^T + R)^{-1}C(P - P_k)C^T(CPC^T + R)^{-1}. \end{aligned}$$
Since for any $n \times n$ matrices $A$ and $B$,
$$(A + B)(A + B)^T \le 2(AA^T + BB^T)$$
(cf. Exercise 6.5), we have
$$\begin{aligned} (G_k - G)(G_k - G)^T &\le 2(P_k - P)C^T(CP_kC^T + R)^{-1}(CP_kC^T + R)^{-1}C(P_k - P) \\ &\quad + 2PC^T(CP_kC^T + R)^{-1}C(P - P_k)C^T(CPC^T + R)^{-1} \\ &\qquad \cdot (CPC^T + R)^{-1}C(P - P_k)C^T(CP_kC^T + R)^{-1}CP. \end{aligned} \tag{6.31}$$
And since $P_0 \le P_k$, we have $CP_0C^T + R \le CP_kC^T + R$, so that by Lemma 1.3,
$$(CP_kC^T + R)^{-1} \le (CP_0C^T + R)^{-1},$$
and hence, by Lemma 1.9,
$$\mathrm{tr}(CP_kC^T + R)^{-1}(CP_kC^T + R)^{-1} \le (\mathrm{tr}(CP_0C^T + R)^{-1})^2.$$
Finally, by Lemma 1.7, it follows from (6.31) that
$$\begin{aligned} \mathrm{tr}(G_k - G)(G_k - G)^T &\le 2\,\mathrm{tr}(P_k - P)(P_k - P)^T \cdot \mathrm{tr}\,C^TC\,(\mathrm{tr}(CP_0C^T + R)^{-1})^2 \\ &\quad + 2\,\mathrm{tr}\,PP^T \cdot \mathrm{tr}\,C^TC\,(\mathrm{tr}(CP_0C^T + R)^{-1})^2 \cdot \mathrm{tr}\,CC^T \\ &\qquad \cdot \mathrm{tr}(P - P_k)(P - P_k)^T \cdot \mathrm{tr}\,C^TC \cdot \mathrm{tr}(CPC^T + R)^{-1}(CPC^T + R)^{-1} \\ &\le C_1\,\mathrm{tr}(P_k - P)(P_k - P)^T \\ &\le Cr^k, \end{aligned}$$
where $C_1$ and $C$ are constants, independent of $k$. This completes the proof of the theorem.
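The geometric rate asserted in Theorem 6.1 can be observed directly in a scalar example. The sketch below is illustrative only: the system values $a = c = \gamma = q = r = 1$ and all function and variable names are assumptions. In the scalar case, (6.9) reads $P - P_k = a^2(1 - Gc)(1 - G_{k-1}c)(P - P_{k-1})$, so the ratio of successive errors should approach $[a(1 - Gc)]^2$.

```python
def psi(p, a=1.0, c=1.0, gamma=1.0, q=1.0, r=1.0):
    """Scalar Riccati map Psi of (6.5); default values are illustrative assumptions."""
    return a * (p - p * c * c * p / (c * p * c + r)) * a + gamma * q * gamma


# Approximate the limit P by iterating long enough for convergence.
p_limit = 0.0
for _ in range(200):
    p_limit = psi(p_limit)

g_limit = p_limit * 1.0 / (1.0 * p_limit * 1.0 + 1.0)   # G = P c / (c P c + r)
contraction = (1.0 * (1.0 - g_limit * 1.0)) ** 2         # [a (1 - G c)]^2

# Record |P_k - P| for the first few k, starting from P_0 = 0.
errors = []
p = 0.0
for _ in range(8):
    errors.append(abs(p - p_limit))
    p = psi(p)

# Successive error ratios; they should settle near `contraction`.
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
```

Every ratio is below 1, and the later ones agree with $[a(1 - Gc)]^2$ to several digits, which is the scalar counterpart of the factor $F^{k-n-1}$ in the proof.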
The following result shows that $\vec{x}_k$ is an asymptotically optimal estimate of $x_k$.
Theorem 6.2. Let the linear system (6.1) be both (completely) controllable and observable. Then
$$\lim_{k\to\infty}\|x_k - \vec{x}_k\|_n^2 = (P^{-1} + C^TR^{-1}C)^{-1} = \lim_{k\to\infty}\|x_k - \hat{x}_k\|_n^2.$$
The second equality can be easily verified. Indeed, using Lemma 1.2 (the matrix inversion lemma), we have
$$\begin{aligned} \lim_{k\to\infty}\|x_k - \hat{x}_k\|_n^2 &= \lim_{k\to\infty}P_{k,k} \\ &= \lim_{k\to\infty}(I - G_kC)P_{k,k-1} \\ &= (I - GC)P \\ &= P - PC^T(CPC^T + R)^{-1}CP \\ &= (P^{-1} + C^TR^{-1}C)^{-1} > 0. \end{aligned}$$
Hence, to verify the first equality, it is equivalent to showing that $\|x_k - \vec{x}_k\|_n^2 \to (I - GC)P$ as $k \to \infty$. We first rewrite
$$\begin{aligned} x_k - \vec{x}_k &= (Ax_{k-1} + \Gamma\underline{\xi}_{k-1}) - (A\vec{x}_{k-1} + Gv_k - GCA\vec{x}_{k-1}) \\ &= (Ax_{k-1} + \Gamma\underline{\xi}_{k-1}) - A\vec{x}_{k-1} - G(CAx_{k-1} + C\Gamma\underline{\xi}_{k-1} + \underline{\eta}_k) + GCA\vec{x}_{k-1} \\ &= (I - GC)A(x_{k-1} - \vec{x}_{k-1}) + (I - GC)\Gamma\underline{\xi}_{k-1} - G\underline{\eta}_k. \end{aligned} \tag{6.32}$$
Since
$$\langle x_{k-1} - \vec{x}_{k-1}, \underline{\xi}_{k-1}\rangle = 0 \tag{6.33}$$
and
$$\langle x_{k-1} - \vec{x}_{k-1}, \underline{\eta}_k\rangle = 0 \tag{6.34}$$
(cf. Exercise 6.6), we have
$$\|x_k - \vec{x}_k\|_n^2 = (I - GC)A\|x_{k-1} - \vec{x}_{k-1}\|_n^2A^T(I - GC)^T + (I - GC)\Gamma Q\Gamma^T(I - GC)^T + GRG^T. \tag{6.35}$$
On the other hand, it can be proved that
$$P_{k,k} = (I - G_kC)AP_{k-1,k-1}A^T(I - G_kC)^T + (I - G_kC)\Gamma Q\Gamma^T(I - G_kC)^T + G_kRG_k^T \tag{6.36}$$
(cf. Exercise 6.7). Since $P_{k,k} = (I - G_kC)P_{k,k-1} \to (I - GC)P$ as $k \to \infty$, taking the limit gives
$$(I - GC)P = (I - GC)A[(I - GC)P]A^T(I - GC)^T + (I - GC)\Gamma Q\Gamma^T(I - GC)^T + GRG^T. \tag{6.37}$$
Now, subtracting (6.37) from (6.35) yields
$$\|x_k - \vec{x}_k\|_n^2 - (I - GC)P = (I - GC)A[\|x_{k-1} - \vec{x}_{k-1}\|_n^2 - (I - GC)P]A^T(I - GC)^T.$$
By repeating this formula $k - 1$ times, we obtain
$$\|x_k - \vec{x}_k\|_n^2 - (I - GC)P = [(I - GC)A]^k[\|x_0 - \vec{x}_0\|_n^2 - (I - GC)P][A^T(I - GC)^T]^k.$$
Finally, by imitating the proof of Lemma 6.8, it can be shown that all eigenvalues of $(I - GC)A$ are of absolute value less than 1 (cf. Exercise 6.8). Hence, using Exercise 6.4, we have $\|x_k - \vec{x}_k\|_n^2 - (I - GC)P \to 0$ as $k \to \infty$. This completes the proof of the theorem.
In the following, we will show that the error $\hat{x}_k - \vec{x}_k$ also tends to zero exponentially fast.
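Theorem 6.3. Let the linear system (6.1) be both (completely) controllable and observable. Then there exist a real number $r$, $0 < r < 1$, and a positive constant $C$, independent of $k$, such that
$$\mathrm{tr}\,\|\hat{x}_k - \vec{x}_k\|_n^2 \le Cr^k.$$
Denote $\underline{\epsilon}_k := \hat{x}_k - \vec{x}_k$ and $\underline{\delta}_k := x_k - \hat{x}_k$. Then, from the identities
$$\begin{aligned} \hat{x}_k &= A\hat{x}_{k-1} + G_k(v_k - CA\hat{x}_{k-1}) \\ &= A\hat{x}_{k-1} + G(v_k - CA\hat{x}_{k-1}) + (G_k - G)(v_k - CA\hat{x}_{k-1}) \end{aligned}$$
and
$$\vec{x}_k = A\vec{x}_{k-1} + G(v_k - CA\vec{x}_{k-1}),$$
we have
$$\begin{aligned} \underline{\epsilon}_k &= \hat{x}_k - \vec{x}_k \\ &= A(\hat{x}_{k-1} - \vec{x}_{k-1}) - GCA(\hat{x}_{k-1} - \vec{x}_{k-1}) + (G_k - G)(CAx_{k-1} + C\Gamma\underline{\xi}_{k-1} + \underline{\eta}_k - CA\hat{x}_{k-1}) \\ &= (I - GC)A\underline{\epsilon}_{k-1} + (G_k - G)(CA\underline{\delta}_{k-1} + C\Gamma\underline{\xi}_{k-1} + \underline{\eta}_k). \end{aligned}$$
Since
$$\begin{cases} \langle\underline{\epsilon}_{k-1}, \underline{\xi}_{k-1}\rangle = 0, \quad \langle\underline{\epsilon}_{k-1}, \underline{\eta}_k\rangle = 0, \\ \langle\underline{\delta}_{k-1}, \underline{\xi}_{k-1}\rangle = 0, \quad \langle\underline{\delta}_{k-1}, \underline{\eta}_k\rangle = 0, \end{cases} \tag{6.38}$$
and $\langle\underline{\xi}_{k-1}, \underline{\eta}_k\rangle = 0$ (cf. Exercise 6.9), we obtain
$$\begin{aligned} \|\underline{\epsilon}_k\|_n^2 &= [(I - GC)A]\|\underline{\epsilon}_{k-1}\|_n^2[(I - GC)A]^T + (G_k - G)CA\|\underline{\delta}_{k-1}\|_n^2A^TC^T(G_k - G)^T \\ &\quad + (G_k - G)C\Gamma Q\Gamma^TC^T(G_k - G)^T + (G_k - G)R(G_k - G)^T \\ &\quad + (I - GC)A\langle\underline{\epsilon}_{k-1}, \underline{\delta}_{k-1}\rangle A^TC^T(G_k - G)^T + (G_k - G)CA\langle\underline{\delta}_{k-1}, \underline{\epsilon}_{k-1}\rangle A^T(I - GC)^T \\ &= F\|\underline{\epsilon}_{k-1}\|_n^2F^T + (G_k - G)\Omega_{k-1}(G_k - G)^T + FB_{k-1}(G_k - G)^T + (G_k - G)B_{k-1}^TF^T, \end{aligned} \tag{6.39}$$
where
$$F = (I - GC)A, \qquad B_{k-1} = \langle\underline{\epsilon}_{k-1}, \underline{\delta}_{k-1}\rangle A^TC^T,$$
and
$$\Omega_{k-1} = CA\|\underline{\delta}_{k-1}\|_n^2A^TC^T + C\Gamma Q\Gamma^TC^T + R.$$
Hence, using (6.39) repeatedly, we obtain
$$\begin{aligned} \|\underline{\epsilon}_k\|_n^2 &= F^k\|\underline{\epsilon}_0\|_n^2(F^k)^T + \sum_{i=0}^{k-1}F^i(G_{k-i} - G)\Omega_{k-1-i}(G_{k-i} - G)^T(F^i)^T \\ &\quad + \sum_{i=0}^{k-1}F^i[FB_{k-1-i}(G_{k-i} - G)^T + (G_{k-i} - G)B_{k-1-i}^TF^T](F^i)^T. \end{aligned} \tag{6.40}$$
On the other hand, since the $B_j$'s are componentwise uniformly bounded (cf. Exercise 6.10), it can be proved, by using Lemmas 1.6, 1.7 and 1.10 and Theorem 6.1, that
$$\mathrm{tr}[FB_{k-1-i}(G_{k-i} - G)^T + (G_{k-i} - G)B_{k-1-i}^TF^T] \le C_1r_1^{k-i+1} \tag{6.41}$$
for some $r_1$, $0 < r_1 < 1$, and some positive constant $C_1$ independent of $k$ and $i$ (cf. Exercise 6.11). Hence, we obtain, again using Lemmas 1.7 and 1.10 and Theorem 6.1,
$$\begin{aligned} \mathrm{tr}\,\|\underline{\epsilon}_k\|_n^2 &\le \mathrm{tr}\,\|\underline{\epsilon}_0\|_n^2 \cdot \mathrm{tr}\,F^k(F^k)^T + \sum_{i=0}^{k-1}\mathrm{tr}\,F^i(F^i)^T \cdot \mathrm{tr}(G_{k-i} - G)(G_{k-i} - G)^T \cdot \mathrm{tr}\,\Omega_{k-1-i} \\ &\quad + \sum_{i=0}^{k-1}\mathrm{tr}\,F^i(F^i)^T \cdot \mathrm{tr}[FB_{k-1-i}(G_{k-i} - G)^T + (G_{k-i} - G)B_{k-1-i}^TF^T] \\ &\le \mathrm{tr}\,\|\underline{\epsilon}_0\|_n^2C_2r_2^k + \sum_{i=0}^{k-1}C_3r_3^iC_4r_4^{k-i} + \sum_{i=0}^{k-1}C_5r_5^iC_1r_1^{k-i+1} \\ &\le p(k)r_6^k, \end{aligned} \tag{6.42}$$
where $0 < r_2, r_3, r_4, r_5 < 1$, $r_6 = \max(r_1, r_2, r_3, r_4, r_5) < 1$, $C_2$, $C_3$, $C_4$, $C_5$ are positive constants independent of $i$ and $k$, and $p(k)$ is a polynomial of $k$. Hence, there exist a real number $r$, $r_6 < r < 1$, and a positive constant $C$, independent of $k$ and satisfying $p(k)(r_6/r)^k \le C$, such that
$$\mathrm{tr}\,\|\underline{\epsilon}_k\|_n^2 \le Cr^k.$$
This completes the proof of the theorem.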
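6.4 Real-Time Applications
Now, let us re-examine the tracking model (3.26), namely: the state-space description
$$\begin{cases} x_{k+1} = \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix}x_k + \underline{\xi}_k \\ v_k = [1 \ \ 0 \ \ 0]\,x_k + \eta_k, \end{cases} \tag{6.43}$$
where $h > 0$ is the sampling time, $\{\underline{\xi}_k\}$ and $\{\eta_k\}$ are both zero-mean Gaussian white noise sequences satisfying the assumption that
$$E(\underline{\xi}_k\underline{\xi}_\ell^T) = \begin{bmatrix} \sigma_p & 0 & 0 \\ 0 & \sigma_v & 0 \\ 0 & 0 & \sigma_a \end{bmatrix}\delta_{k\ell}, \qquad E(\eta_k\eta_\ell) = \sigma_m\delta_{k\ell},$$
$$E(\underline{\xi}_k\eta_\ell) = 0, \qquad E(\underline{\xi}_kx_0^T) = 0, \qquad E(x_0\eta_k) = 0,$$
and $\sigma_p, \sigma_v, \sigma_a \ge 0$, with $\sigma_p + \sigma_v + \sigma_a > 0$, and $\sigma_m > 0$. Since the matrices
$$M_{A\Gamma} = [\Gamma \ \ A\Gamma \ \ A^2\Gamma] = \begin{bmatrix} 1 & 0 & 0 & 1 & h & h^2/2 & 1 & 2h & 2h^2 \\ 0 & 1 & 0 & 0 & 1 & h & 0 & 1 & 2h \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}$$
and
$$N_{CA} = \begin{bmatrix} C \\ CA \\ CA^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & h & h^2/2 \\ 1 & 2h & 2h^2 \end{bmatrix}$$
are both of full rank, so that the system (6.43) is both completely controllable and observable, it follows from Theorem 6.1 that there exists a positive definite symmetric matrix $P$ such that
$$\lim_{k\to\infty}P_{k+1,k} = P,$$
where
$$P_{k+1,k} = A(I - G_kC)P_{k,k-1}A^T + \begin{bmatrix} \sigma_p & 0 & 0 \\ 0 & \sigma_v & 0 \\ 0 & 0 & \sigma_a \end{bmatrix}$$
with
$$G_k = P_{k,k-1}C^T(CP_{k,k-1}C^T + \sigma_m)^{-1}.$$
Hence, substituting $G_k$ into the expression for $P_{k+1,k}$ above and then taking the limit, we arrive at the following matrix Riccati equation:
$$P = A[P - PC^T(CPC^T + \sigma_m)^{-1}CP]A^T + \begin{bmatrix} \sigma_p & 0 & 0 \\ 0 & \sigma_v & 0 \\ 0 & 0 & \sigma_a \end{bmatrix}. \tag{6.44}$$
Now, solving this matrix Riccati equation for the positive definite matrix $P$, we obtain the limiting Kalman gain
$$G = PC^T(CPC^T + \sigma_m)^{-1}$$
and the limiting (or steady-state) Kalman filtering equations:
$$\begin{cases} \vec{x}_{k+1} = A\vec{x}_k + G(v_{k+1} - CA\vec{x}_k) \\ \vec{x}_0 = E(x_0). \end{cases} \tag{6.45}$$
Since the matrix Riccati equation (6.44) may be solved before the filtering process is performed, this limiting Kalman filter gives rise to an extremely efficient real-time tracker. Of course, in view of Theorem 6.3, the estimate $\vec{x}_k$ and the optimal estimate $\hat{x}_k$ are exponentially close to each other.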
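One possible way to carry this out numerically is sketched below. It does not solve (6.44) in closed form; instead it simply iterates the Riccati recursion from $P = 0$ until the iterates stop changing, which Theorem 6.1 suggests is a reasonable substitute. All function names, the stopping tolerance, and the sample values $h = 1$, $\sigma_p = \sigma_v = \sigma_a = 0.1$, $\sigma_m = 1$ are assumptions made for illustration. Because $C = [1\ 0\ 0]$, the quantity $CPC^T + \sigma_m$ is a scalar, so no matrix inversion is needed.

```python
def mat_mul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def tracking_matrix(h):
    """System matrix A of the tracking model (6.43) for sampling time h."""
    return [[1.0, h, h * h / 2.0], [0.0, 1.0, h], [0.0, 0.0, 1.0]]


def gain(p, sigma_m):
    """G = P C^T (C P C^T + sigma_m)^{-1} with C = [1 0 0] (a scalar division)."""
    s = p[0][0] + sigma_m
    return [p[i][0] / s for i in range(3)]


def riccati_step(p, a, q_diag, sigma_m):
    """One step P -> A (P - G C P) A^T + diag(q_diag)."""
    g = gain(p, sigma_m)
    corrected = [[p[i][j] - g[i] * p[0][j] for j in range(3)] for i in range(3)]
    a_t = [list(col) for col in zip(*a)]
    nxt = mat_mul(mat_mul(a, corrected), a_t)
    for i in range(3):
        nxt[i][i] += q_diag[i]
    return nxt


def limiting_gain(h, q_diag, sigma_m, tol=1e-12, max_iter=10000):
    """Iterate the Riccati recursion from P = 0; return (P, G) at the last iterate."""
    a = tracking_matrix(h)
    p = [[0.0] * 3 for _ in range(3)]
    for _ in range(max_iter):
        nxt = riccati_step(p, a, q_diag, sigma_m)
        change = max(abs(nxt[i][j] - p[i][j]) for i in range(3) for j in range(3))
        p = nxt
        if change < tol:
            break
    return p, gain(p, sigma_m)


def limiting_filter_step(x, v_next, a, g):
    """One step of (6.45): x_{k+1} = A x_k + G (v_{k+1} - C A x_k)."""
    pred = [sum(a[i][j] * x[j] for j in range(3)) for i in range(3)]
    innovation = v_next - pred[0]  # C A x_k is the first component of A x_k
    return [pred[i] + g[i] * innovation for i in range(3)]


# Illustrative (assumed) values: h = 1, sigma_p = sigma_v = sigma_a = 0.1, sigma_m = 1.
P, G = limiting_gain(h=1.0, q_diag=(0.1, 0.1, 0.1), sigma_m=1.0)
```

Once `G` has been computed offline, each call to `limiting_filter_step` costs only a $3 \times 3$ matrix-vector product and a few scalar operations, which is the point of the limiting filter in a real-time setting.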
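Exercises
6.1. Prove that the estimate $\tilde{x}_{k-1}$ in (6.7) is an unbiased estimate of $x_{k-1}$ in the sense that $E(\tilde{x}_{k-1}) = E(x_{k-1})$.
6.2. Verify that
$$\frac{d}{ds}A^{-1}(s) = -A^{-1}(s)\Big[\frac{d}{ds}A(s)\Big]A^{-1}(s).$$
6.3. Show that if $\lambda_{\min}$ is the smallest eigenvalue of $P$, then $P \ge \lambda_{\min}I$. Similarly, if $\lambda_{\max}$ is the largest eigenvalue of $P$ then $P \le \lambda_{\max}I$.
6.4. Let $F$ be an $n \times n$ matrix. Suppose that all the eigenvalues of $F$ are of absolute value less than 1. Show that $F^k \to 0$ as $k \to \infty$.
6.5. Prove that for any $n \times n$ matrices $A$ and $B$,
$$(A + B)(A + B)^T \le 2(AA^T + BB^T).$$
6.6. Let $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ be sequences of zero-mean Gaussian white system and measurement noise processes, respectively, and $\vec{x}_k$ be defined by (6.4). Show that
$$\langle x_{k-1} - \vec{x}_{k-1}, \underline{\xi}_{k-1}\rangle = 0$$
and
$$\langle x_{k-1} - \vec{x}_{k-1}, \underline{\eta}_k\rangle = 0.$$
6.7. Verify that for the Kalman gain $G_k$, we have
$$-(I - G_kC)P_{k,k-1}C^TG_k^T + G_kR_kG_k^T = 0.$$
Using this formula, show that
$$P_{k,k} = (I - G_kC)AP_{k-1,k-1}A^T(I - G_kC)^T + (I - G_kC)\Gamma Q_k\Gamma^T(I - G_kC)^T + G_kRG_k^T.$$
6.8. By imitating the proof of Lemma 6.8, show that all the eigenvalues of $(I - GC)A$ are of absolute value less than 1.
6.9. Let $\underline{\epsilon}_k = \hat{x}_k - \vec{x}_k$ where $\vec{x}_k$ is defined by (6.4), and let $\underline{\delta}_k = x_k - \hat{x}_k$. Show that
$$\langle\underline{\epsilon}_{k-1}, \underline{\xi}_{k-1}\rangle = 0, \quad \langle\underline{\delta}_{k-1}, \underline{\xi}_{k-1}\rangle = 0, \quad \langle\underline{\epsilon}_{k-1}, \underline{\eta}_k\rangle = 0, \quad \langle\underline{\delta}_{k-1}, \underline{\eta}_k\rangle = 0,$$
where $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ are zero-mean Gaussian white system and measurement noise processes, respectively.
6.10. Let
$$B_j = \langle\underline{\epsilon}_j, \underline{\delta}_j\rangle A^TC^T, \qquad j = 0, 1, \dots,$$
where $\underline{\epsilon}_j = \hat{x}_j - \vec{x}_j$, $\underline{\delta}_j = x_j - \hat{x}_j$, and $\vec{x}_j$ is defined by (6.4). Prove that the $B_j$ are componentwise uniformly bounded.
6.11. Derive formula (6.41).
6.12. Derive the limiting (or steady-state) Kalman filtering algorithm for the scalar system:
$$\begin{cases} x_{k+1} = ax_k + \gamma\xi_k \\ v_k = cx_k + \eta_k, \end{cases}$$
where $a$, $\gamma$, and $c$ are constants and $\{\xi_k\}$ and $\{\eta_k\}$ are zero-mean Gaussian white noise sequences with variances $q$ and $r$, respectively.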
7. Sequential and Square-Root Algorithms
It is now clear that the only time-consuming operation in the Kalman filtering process is the computation of the Kalman gain matrices:
$$G_k = P_{k,k-1}C_k^T(C_kP_{k,k-1}C_k^T + R_k)^{-1}.$$
Since the primary concern of the Kalman filter is its real-time capability, it is of utmost importance to be able to compute $G_k$ preferably without directly inverting a matrix at each time instant, and/or to perform efficiently and accurately a modified operation, whether it would involve matrix inversions or not. The sequential algorithm, which we will first discuss, is designed to avoid a direct computation of the inverse of the matrix $(C_kP_{k,k-1}C_k^T + R_k)$, while the square-root algorithm, which we will then study, only requires inversion of triangular matrices and improves the computational accuracy by working with the square-root of possibly very large or very small numbers. We also intend to combine these two algorithms to yield a fairly efficient computational scheme for real-time applications.
7.1 Sequential Algorithm
The sequential algorithm is especially efficient if the positive definite matrix $R_k$ is a diagonal matrix, namely:
$$R_k = \mathrm{diag}[\,r_k^1, \dots, r_k^q\,],$$
where $r_k^1, \dots, r_k^q > 0$. If $R_k$ is not diagonal, then an orthogonal matrix $T_k$ may be determined so that the transformation $T_k^TR_kT_k$ is a diagonal matrix. In doing so, the observation equation
$$v_k = C_kx_k + \underline{\eta}_k$$
of the state-space description is changed to
$$\tilde{v}_k = \tilde{C}_kx_k + \tilde{\underline{\eta}}_k,$$
where $\tilde{v}_k = T_k^Tv_k$, $\tilde{C}_k = T_k^TC_k$, and $\tilde{\underline{\eta}}_k = T_k^T\underline{\eta}_k$, so that
$$\mathrm{Var}(\tilde{\underline{\eta}}_k) = T_k^TR_kT_k.$$
In the following discussion, we will assume that $R_k$ is diagonal. Since we are only interested in computing the Kalman gain matrix $G_k$ and the corresponding optimal estimate $\hat{x}_{k|k}$ of the state vector $x_k$ for a fixed $k$, we will simply drop the indices $k$ whenever no chance of confusion arises. For instance, we write
$$v_k = [\,v^1 \cdots v^q\,]^T, \qquad C_k^T = [\,c^1 \cdots c^q\,]$$
and
$$R_k = \mathrm{diag}[\,r^1, \dots, r^q\,].$$
The sequential algorithm can be described as follows.
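Theorem 7.1. Let $k$ be fixed and set
$$P^0 = P_{k,k-1} \quad \text{and} \quad \hat{x}^0 = \hat{x}_{k|k-1}. \tag{7.1}$$
For $i = 1, \dots, q$, compute
$$\begin{cases} g^i = \dfrac{1}{(c^i)^TP^{i-1}c^i + r^i}\,P^{i-1}c^i \\ \hat{x}^i = \hat{x}^{i-1} + g^i(v^i - (c^i)^T\hat{x}^{i-1}) \\ P^i = P^{i-1} - g^i(c^i)^TP^{i-1}. \end{cases} \tag{7.2}$$
Then we have
$$G_k = P^qC_k^TR_k^{-1} \tag{7.3}$$
and
$$\hat{x}_{k|k} = \hat{x}^q \tag{7.4}$$
(cf. Fig. 7.1).
Fig. 7.1.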
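The recursion (7.2) translates almost line by line into code. The following is a minimal sketch under the theorem's assumption that $R_k$ is diagonal; the function name, the argument layout (the vectors $c^i$ passed as a list `c_rows`), and the small numerical example are all hypothetical choices made for illustration. Note that the only division is by the scalar $(c^i)^TP^{i-1}c^i + r^i$.

```python
def sequential_update(x_pred, p_pred, c_rows, r_diag, v):
    """Process the q scalar measurements one at a time, following (7.1)-(7.2).

    x_pred : predicted state x_{k|k-1} (list of n floats)
    p_pred : predicted covariance P_{k,k-1} (n x n list of rows)
    c_rows : the vectors c^1, ..., c^q (each a list of n floats)
    r_diag : the diagonal entries r^1, ..., r^q of R_k
    v      : the measurement components v^1, ..., v^q
    Returns (x^q, P^q).
    """
    n = len(x_pred)
    x = list(x_pred)
    p = [list(row) for row in p_pred]
    for c, r, vi in zip(c_rows, r_diag, v):
        pc = [sum(p[a][b] * c[b] for b in range(n)) for a in range(n)]    # P^{i-1} c^i
        s = sum(c[a] * pc[a] for a in range(n)) + r                        # scalar denominator
        g = [pc[a] / s for a in range(n)]                                  # g^i
        innovation = vi - sum(c[a] * x[a] for a in range(n))
        x = [x[a] + g[a] * innovation for a in range(n)]                   # x^i
        ctp = [sum(c[a] * p[a][b] for a in range(n)) for b in range(n)]   # (c^i)^T P^{i-1}
        p = [[p[a][b] - g[a] * ctp[b] for b in range(n)] for a in range(n)]  # P^i
    return x, p


# Small illustrative case: n = q = 2, C_k = I, R_k = I.
x_q, p_q = sequential_update(
    x_pred=[0.0, 0.0],
    p_pred=[[2.0, 1.0], [1.0, 2.0]],
    c_rows=[[1.0, 0.0], [0.0, 1.0]],
    r_diag=[1.0, 1.0],
    v=[1.0, 2.0],
)
```

For this example the batch formulas give $G_k = P_{k,k-1}(P_{k,k-1} + I)^{-1} = \tfrac{1}{8}\begin{bmatrix} 5 & 1 \\ 1 & 5 \end{bmatrix}$, hence $\hat{x}_{k|k} = [7/8,\ 11/8]^T$ and $P_{k,k} = \tfrac{1}{8}\begin{bmatrix} 5 & 1 \\ 1 & 5 \end{bmatrix}$, and the sequential result agrees with these values, as (7.3) and (7.4) predict.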
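To prove (7.3), we first verify that
$$G_k = P_{k,k}C_k^TR_k^{-1}. \tag{7.5}$$
This can be seen by returning to the filtering equations. We have
$$P_{k,k} = (I - G_kC_k)P_{k,k-1} = P_{k,k-1} - G_kC_kP_{k,k-1},$$
so that
$$\begin{aligned} G_k &= P_{k,k-1}C_k^T(C_kP_{k,k-1}C_k^T + R_k)^{-1} \\ &= (P_{k,k} + G_kC_kP_{k,k-1})C_k^T(C_kP_{k,k-1}C_k^T + R_k)^{-1} \end{aligned}$$
or
$$G_k(C_kP_{k,k-1}C_k^T + R_k) = (P_{k,k} + G_kC_kP_{k,k-1})C_k^T,$$
which yields (7.5). Hence, to prove (7.3), it is sufficient to show that
$$P_{k,k} = P^q. \tag{7.6}$$
A direct proof of this identity does not seem to be available. Hence, we appeal to the matrix inversion lemma (Lemma 1.2). Let $\epsilon > 0$ be given, and set $P^{\epsilon,0} = P_{k,k-1} + \epsilon I$, which is now positive definite. Also, set
$$P^{\epsilon,i} = P^{\epsilon,i-1} - g^{\epsilon,i}(c^i)^TP^{\epsilon,i-1}$$
and
$$g^{\epsilon,i} = \frac{1}{(c^i)^TP^{\epsilon,i-1}c^i + r^i}\,P^{\epsilon,i-1}c^i.$$
Then by an inductive argument starting with $i = 1$ and using (1.3) of the matrix inversion lemma, it can be seen that the matrix
$$P^{\epsilon,i} = P^{\epsilon,i-1} - P^{\epsilon,i-1}c^i\big((c^i)^TP^{\epsilon,i-1}c^i + r^i\big)^{-1}(c^i)^TP^{\epsilon,i-1}$$
is invertible and
$$(P^{\epsilon,i})^{-1} = (P^{\epsilon,i-1})^{-1} + c^i(r^i)^{-1}(c^i)^T.$$
Hence, using all these equations for $i = q, q - 1, \dots, 1$, consecutively, we have
$$\begin{aligned} (P^{\epsilon,q})^{-1} &= (P^{\epsilon,q-1})^{-1} + c^q(r^q)^{-1}(c^q)^T = \cdots \\ &= (P^{\epsilon,0})^{-1} + \sum_{i=1}^{q}c^i(r^i)^{-1}(c^i)^T \\ &= (P^{\epsilon,0})^{-1} + C_k^TR_k^{-1}C_k. \end{aligned}$$
On the other hand, again by the matrix inversion lemma, the matrix
$$\tilde{P}^{\epsilon} := P^{\epsilon,0} - P^{\epsilon,0}C_k^T(C_kP^{\epsilon,0}C_k^T + R_k)^{-1}C_kP^{\epsilon,0}$$
is also invertible with
$$(\tilde{P}^{\epsilon})^{-1} = (P^{\epsilon,0})^{-1} + C_k^TR_k^{-1}C_k.$$
Hence, we have $\tilde{P}^{\epsilon} = P^{\epsilon,q}$. From the Kalman filtering equations, we have
$$\tilde{P}^{\epsilon} \to P^0 - P^0C_k^T(C_kP^0C_k^T + R_k)^{-1}C_kP^0 = (I - G_kC_k)P_{k,k-1} = P_{k,k} \quad \text{as } \epsilon \to 0;$$
while from the definition, we have
$$P^{\epsilon,q} \to P^q \quad \text{as } \epsilon \to 0.$$
This means that (7.6) holds so that (7.3) is verified.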
To establish (7.4), we first observe that
$$g^i = P^ic^i(r^i)^{-1}. \tag{7.7}$$
Indeed, since
$$P^{i-1} = P^i + g^i(c^i)^TP^{i-1},$$
which follows from the third equation in (7.2), we have, using the first equation in (7.2),
$$g^i = \frac{1}{(c^i)^TP^{i-1}c^i + r^i}\big(P^i + g^i(c^i)^TP^{i-1}\big)c^i.$$
This, upon simplification, is (7.7). Now, from the third equation in (7.2) again, we obtain
$$P^q = (I - g^q(c^q)^T) \cdots (I - g^{i+1}(c^{i+1})^T)P^i \tag{7.8}$$
for any $i$, $0 \le i \le q - 1$. Hence, by consecutive applications of the correction equation of the Kalman filter, (7.3), (7.1), (7.8), and (7.7), we have
$$\begin{aligned} \hat{x}_{k|k} &= \hat{x}_{k|k-1} + G_k(v_k - C_k\hat{x}_{k|k-1}) \\ &= (I - G_kC_k)\hat{x}_{k|k-1} + G_kv_k \\ &= (I - P^qC_k^TR_k^{-1}C_k)\hat{x}_{k|k-1} + P^qC_k^TR_k^{-1}v_k \\ &= \Big(I - \sum_{i=1}^{q}P^qc^i(r^i)^{-1}(c^i)^T\Big)\hat{x}^0 + \sum_{i=1}^{q}P^qc^i(r^i)^{-1}v^i \\ &= \Big[(I - P^qc^q(r^q)^{-1}(c^q)^T) - \sum_{i=1}^{q-1}(I - g^q(c^q)^T) \cdots (I - g^{i+1}(c^{i+1})^T)P^ic^i(r^i)^{-1}(c^i)^T\Big]\hat{x}^0 \\ &\quad + \sum_{i=1}^{q-1}(I - g^q(c^q)^T) \cdots (I - g^{i+1}(c^{i+1})^T)P^ic^i(r^i)^{-1}v^i + P^qc^q(r^q)^{-1}v^q \\ &= \Big[(I - g^q(c^q)^T) - \sum_{i=1}^{q-1}(I - g^q(c^q)^T) \cdots (I - g^{i+1}(c^{i+1})^T)g^i(c^i)^T\Big]\hat{x}^0 \\ &\quad + \sum_{i=1}^{q-1}(I - g^q(c^q)^T) \cdots (I - g^{i+1}(c^{i+1})^T)g^iv^i + g^qv^q \\ &= (I - g^q(c^q)^T) \cdots (I - g^1(c^1)^T)\hat{x}^0 + \sum_{i=1}^{q-1}(I - g^q(c^q)^T) \cdots (I - g^{i+1}(c^{i+1})^T)g^iv^i + g^qv^q. \end{aligned}$$
On the other hand, from the second equation in (7.2), we also have
$$\begin{aligned} \hat{x}^q &= (I - g^q(c^q)^T)\hat{x}^{q-1} + g^qv^q \\ &= (I - g^q(c^q)^T)(I - g^{q-1}(c^{q-1})^T)\hat{x}^{q-2} + (I - g^q(c^q)^T)g^{q-1}v^{q-1} + g^qv^q \\ &= \cdots \\ &= (I - g^q(c^q)^T) \cdots (I - g^1(c^1)^T)\hat{x}^0 + \sum_{i=1}^{q-1}(I - g^q(c^q)^T) \cdots (I - g^{i+1}(c^{i+1})^T)g^iv^i + g^qv^q, \end{aligned}$$
which is the same as the above expression. That is, we have proved that $\hat{x}_{k|k} = \hat{x}^q$, completing the proof of Theorem 7.1.
7.2 Square-Root Algorithm
We now turn to the square-root algorithm. The following result from linear algebra is important for this consideration. Its proof is left to the reader (cf. Exercise 7.1).
Lemma 7.1. To any positive definite symmetric matrix $A$, there is a unique lower triangular matrix $A^c$ such that $A = A^c(A^c)^T$. More generally, to any $n \times (n + p)$ matrix $A$, there is an $n \times n$ matrix $\tilde{A}$ such that $\tilde{A}\tilde{A}^T = AA^T$.
$A^c$ has the property of being a "square-root" of $A$, and since it is lower triangular, its inverse can be computed more efficiently (cf. Exercise 7.3). Note also that in going to the square-root, very small numbers become larger and very large numbers become smaller, so that computation is done more accurately. The factorization of a matrix into the product of a lower triangular matrix and its transpose is usually done by a Gauss elimination scheme known as Cholesky factorization, and this explains why the superscript $c$ is being used. For the general case, $\tilde{A}$ is also called a "square-root" of $AA^T$.
In the square-root algorithm to be discussed below, the inverse of the lower triangular factor
$$H_k := (C_kP_{k,k-1}C_k^T + R_k)^c \tag{7.9}$$
will be taken. To improve the accuracy of the algorithm, we will also use $R_k^c$ instead of the positive definite square-root $R_k^{1/2}$. Of course, if $R_k$ is a diagonal matrix, then $R_k^c = R_k^{1/2}$.
We first consider the following recursive scheme: Let
$$J_{0,0} = (\mathrm{Var}(x_0))^{1/2},$$
$J_{k,k-1}$ be a square-root of the matrix
$$[\,A_{k-1}J_{k-1,k-1} \ \ \ \Gamma_{k-1}Q_{k-1}^{1/2}\,]_{n\times(n+p)}\,[\,A_{k-1}J_{k-1,k-1} \ \ \ \Gamma_{k-1}Q_{k-1}^{1/2}\,]^T_{n\times(n+p)},$$
and
$$J_{k,k} = J_{k,k-1}\big[I - J_{k,k-1}^TC_k^T(H_k^T)^{-1}(H_k + R_k^c)^{-1}C_kJ_{k,k-1}\big]$$
for $k = 1, 2, \dots$, where $(\mathrm{Var}(x_0))^{1/2}$ and $Q_{k-1}^{1/2}$ are arbitrary square-roots of $\mathrm{Var}(x_0)$ and $Q_{k-1}$, respectively. The auxiliary matrices $J_{k,k-1}$ and $J_{k,k}$ are also square-roots (of $P_{k,k-1}$ and $P_{k,k}$, respectively), although they are not necessarily lower triangular nor positive definite, as in the following:
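Theorem 7.2. $J_{0,0}J_{0,0}^T = P_{0,0}$, and for $k = 1, 2, \dots$,
$$J_{k,k-1}J_{k,k-1}^T = P_{k,k-1} \tag{7.10}$$
$$J_{k,k}J_{k,k}^T = P_{k,k}. \tag{7.11}$$
The first statement is trivial since $P_{0,0} = \mathrm{Var}(x_0)$. We can prove (7.10) and (7.11) by mathematical induction. Suppose that (7.11) holds for $k - 1$; then (7.10) follows immediately by using the relation between $P_{k,k-1}$ and $P_{k-1,k-1}$ in the Kalman filtering process. Now we can verify (7.11) for $k$ using (7.10) for the same $k$. Indeed, since
$$C_kP_{k,k-1}C_k^T + R_k = H_kH_k^T,$$
so that
$$\begin{aligned} &(H_k^T)^{-1}(H_k + R_k^c)^{-1} + [(H_k + R_k^c)^T]^{-1}H_k^{-1} - (H_k^T)^{-1}(H_k + R_k^c)^{-1}C_kP_{k,k-1}C_k^T[(H_k + R_k^c)^T]^{-1}H_k^{-1} \\ &= (H_k^T)^{-1}(H_k + R_k^c)^{-1}\{H_k(H_k + R_k^c)^T + (H_k + R_k^c)H_k^T - H_kH_k^T + R_k\}[(H_k + R_k^c)^T]^{-1}H_k^{-1} \\ &= (H_k^T)^{-1}(H_k + R_k^c)^{-1}\{H_kH_k^T + H_k(R_k^c)^T + R_k^cH_k^T + R_k\}[(H_k + R_k^c)^T]^{-1}H_k^{-1} \\ &= (H_k^T)^{-1}(H_k + R_k^c)^{-1}(H_k + R_k^c)(H_k + R_k^c)^T[(H_k + R_k^c)^T]^{-1}H_k^{-1} \\ &= (H_k^T)^{-1}H_k^{-1} \\ &= (H_kH_k^T)^{-1}, \end{aligned}$$
it follows from (7.10) that
$$\begin{aligned} J_{k,k}J_{k,k}^T &= J_{k,k-1}\big[I - J_{k,k-1}^TC_k^T(H_k^T)^{-1}(H_k + R_k^c)^{-1}C_kJ_{k,k-1}\big] \\ &\qquad \cdot \big[I - J_{k,k-1}^TC_k^T[(H_k + R_k^c)^T]^{-1}H_k^{-1}C_kJ_{k,k-1}\big]J_{k,k-1}^T \\ &= J_{k,k-1}\big\{I - J_{k,k-1}^TC_k^T(H_k^T)^{-1}(H_k + R_k^c)^{-1}C_kJ_{k,k-1} \\ &\qquad - J_{k,k-1}^TC_k^T[(H_k + R_k^c)^T]^{-1}H_k^{-1}C_kJ_{k,k-1} \\ &\qquad + J_{k,k-1}^TC_k^T(H_k^T)^{-1}(H_k + R_k^c)^{-1}C_kJ_{k,k-1}J_{k,k-1}^TC_k^T[(H_k + R_k^c)^T]^{-1}H_k^{-1}C_kJ_{k,k-1}\big\}J_{k,k-1}^T \\ &= P_{k,k-1} - P_{k,k-1}C_k^T\big\{(H_k^T)^{-1}(H_k + R_k^c)^{-1} + [(H_k + R_k^c)^T]^{-1}H_k^{-1} \\ &\qquad - (H_k^T)^{-1}(H_k + R_k^c)^{-1}C_kP_{k,k-1}C_k^T[(H_k + R_k^c)^T]^{-1}H_k^{-1}\big\}C_kP_{k,k-1} \\ &= P_{k,k-1} - P_{k,k-1}C_k^T(H_kH_k^T)^{-1}C_kP_{k,k-1} \\ &= P_{k,k}. \end{aligned}$$
This completes the induction process.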
In summary, the square-root Kalman filtering algorithm can be stated as follows:
(i) Compute $J_{0,0} = (\mathrm{Var}(x_0))^{1/2}$.
(ii) For $k = 1, 2, \dots$, compute $J_{k,k-1}$, a square-root of the matrix
$$[\,A_{k-1}J_{k-1,k-1} \ \ \ \Gamma_{k-1}Q_{k-1}^{1/2}\,]_{n\times(n+p)}\,[\,A_{k-1}J_{k-1,k-1} \ \ \ \Gamma_{k-1}Q_{k-1}^{1/2}\,]^T_{n\times(n+p)},$$
and the matrix
$$H_k = (C_kJ_{k,k-1}J_{k,k-1}^TC_k^T + R_k)^c,$$
and then compute
$$J_{k,k} = J_{k,k-1}\big[I - J_{k,k-1}^TC_k^T(H_k^T)^{-1}(H_k + R_k^c)^{-1}C_kJ_{k,k-1}\big].$$
(iii) Compute $\hat{x}_{0|0} = E(x_0)$, and for $k = 1, 2, \dots$, using the information from (ii), compute
$$G_k = J_{k,k-1}J_{k,k-1}^TC_k^T(H_k^T)^{-1}H_k^{-1}$$
and
$$\hat{x}_{k|k} = A_{k-1}\hat{x}_{k-1|k-1} + G_k(v_k - C_kA_{k-1}\hat{x}_{k-1|k-1})$$
(cf. Fig. 7.2). We again remark that we only have to invert triangular matrices, and in addition, these matrices are square-roots of the ones which might have very small or very large entries.
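The two primitive operations this summary depends on, forming a lower triangular square-root $A^c$ and inverting a lower triangular matrix, can be sketched as follows. This is an illustrative implementation of the standard Cholesky recurrence and of forward substitution, not code from the text; the function names and the $2 \times 2$ test matrix are assumptions.

```python
import math


def cholesky_lower(a):
    """Return the lower triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def invert_lower(low):
    """Invert a lower triangular matrix with nonzero diagonal by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):              # solve for column j of the inverse
        for i in range(j, n):
            rhs = 1.0 if i == j else 0.0
            s = sum(low[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = (rhs - s) / low[i][i]
    return inv


# Illustrative matrix (an assumption): A = [[4, 2], [2, 3]].
A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky_lower(A)
L_inv = invert_lower(L)
```

Both routines need only scalar divisions and square roots, which is why the square-root filter avoids a general matrix inversion. For the sample matrix, $L$ has rows $[2, 0]$ and $[1, \sqrt{2}]$.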
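7.3 An Algorithm for Real-Time Applications

In the particular case when
$$R_k = \mathrm{diag}[\,r_k^1, \dots, r_k^q\,]$$
is a diagonal matrix, the sequential and square-root algorithms can be combined to yield the following algorithm which does not require direct matrix inversions:

(i) Compute $J_{0,0} = (\mathrm{Var}(\mathbf{x}_0))^{1/2}$.

(ii) For each fixed $k = 1, 2, \dots$, compute

(a) a square-root $J_{k,k-1}$ of the matrix
$$[A_{k-1}J_{k-1,k-1}\ \ \Gamma_{k-1}Q_{k-1}^{1/2}]_{n\times(n+p)}\,[A_{k-1}J_{k-1,k-1}\ \ \Gamma_{k-1}Q_{k-1}^{1/2}]^T_{n\times(n+p)},$$
and

(b) for $i = 1, \dots, q$,
$$\mathbf{g}_k^i = \frac{1}{(\mathbf{c}_k^i)^TJ_{k,k-1}^{i-1}(J_{k,k-1}^{i-1})^T\mathbf{c}_k^i + r_k^i}\,J_{k,k-1}^{i-1}(J_{k,k-1}^{i-1})^T\mathbf{c}_k^i,$$
$$J_{k,k-1}^i = \bigl(J_{k,k-1}^{i-1}(J_{k,k-1}^{i-1})^T - \mathbf{g}_k^i(\mathbf{c}_k^i)^TJ_{k,k-1}^{i-1}(J_{k,k-1}^{i-1})^T\bigr)^c,$$
where
$$J_{k,k-1}^0 := J_{k,k-1},\qquad J_{k,k-1}^q = J_{k,k},$$
and
$$C_k^T := [\,\mathbf{c}_k^1\ \cdots\ \mathbf{c}_k^q\,].$$

(iii) Compute $\hat{\mathbf{x}}_{0|0} = E(\mathbf{x}_0)$.

(iv) For each fixed $k = 1, 2, \dots$, compute

(a) $\hat{\mathbf{x}}_{k|k-1} = A_{k-1}\hat{\mathbf{x}}_{k-1|k-1}$,

(b) for $i = 1, \dots, q$, with $\hat{\mathbf{x}}_k^0 := \hat{\mathbf{x}}_{k|k-1}$, and using information from (ii)(b), compute
$$\hat{\mathbf{x}}_k^i = \hat{\mathbf{x}}_k^{i-1} + \bigl(v_k^i - (\mathbf{c}_k^i)^T\hat{\mathbf{x}}_k^{i-1}\bigr)\mathbf{g}_k^i,$$
where
$$\mathbf{v}_k := [\,v_k^1\ \cdots\ v_k^q\,]^T,$$
so that
$$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_k^q.$$

Fig. 7.2.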
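The sequential part of this algorithm can be sketched as follows, assuming NumPy. To keep the example short, the sketch propagates the covariance $P = JJ^T$ directly instead of its square root, so it illustrates only how a diagonal $R_k$ removes the matrix inversion; the function name `sequential_update` and the numbers are illustrative.

```python
import numpy as np

def sequential_update(x_pred, P_pred, C, r, v):
    """Process the q scalar observations one at a time (R = diag(r)).

    Only scalar divisions are needed; no matrix is inverted.
    """
    x = np.array(x_pred, dtype=float)
    P = np.array(P_pred, dtype=float)
    for i in range(C.shape[0]):
        c = C[i]                        # i-th row of C_k, i.e. (c_k^i)^T
        Pc = P @ c
        g = Pc / (c @ Pc + r[i])        # gain vector g_k^i
        x = x + (v[i] - c @ x) * g      # state correction with the i-th datum
        P = P - np.outer(g, Pc)         # P - g c^T P  (P is symmetric)
    return x, P

# Illustrative data (not from the text)
x_pred = np.array([0.1, -0.2])
P_pred = np.array([[2.0, 0.5], [0.5, 1.0]])
C = np.array([[1.0, 0.0], [1.0, 1.0]])
r = np.array([0.5, 0.2])
v = np.array([1.0, 0.3])
x_seq, P_seq = sequential_update(x_pred, P_pred, C, r, v)
```

For a diagonal $R_k$ the result agrees with the usual one-shot update that uses the gain $P C^T (C P C^T + R)^{-1}$.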
Exercises 7.1. Give a proof of Lemma 7.1. 7.2. Find the lower triangular matrix L that satisfies: (a)
LL
T
~ ~].
= [; 3
(b)
LL
T
2
14
[~ ~
=
;].
124
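7.3. (a) Derive a formula to find the inverse of the matrix
$$L = \begin{bmatrix}\ell_{11} & 0 & 0\\ \ell_{21} & \ell_{22} & 0\\ \ell_{31} & \ell_{32} & \ell_{33}\end{bmatrix},$$
where $\ell_{11}$, $\ell_{22}$, and $\ell_{33}$ are nonzero. (b) Formulate the inverse of
$$L = \begin{bmatrix}\ell_{11} & 0 & \cdots & 0\\ \ell_{21} & \ell_{22} & \ddots & \vdots\\ \vdots & & \ddots & 0\\ \ell_{n1} & \ell_{n2} & \cdots & \ell_{nn}\end{bmatrix},$$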
where $\ell_{11}, \dots, \ell_{nn}$ are nonzero.

7.4. Consider the following computer simulation of the Kalman filtering process. Let $\epsilon \ll 1$ be a small positive number such that
$$1 - \epsilon \not\simeq 1,\qquad 1 - \epsilon^2 \simeq 1,$$
where "$\simeq$" denotes equality after rounding in the computer. Suppose that we have
$$P_{k,k} = \begin{bmatrix}\dfrac{\epsilon^2}{1-\epsilon^2} & 0\\ 0 & 1\end{bmatrix}.$$
Compare the standard Kalman filter with the square-root filter for this example. Note that this example illustrates the improved numerical characteristics of the square-root filter.

7.5. Prove that to any positive definite symmetric matrix $A$, there is a unique upper triangular matrix $A^u$ such that $A = A^u(A^u)^T$.

7.6. Using the upper triangular decompositions instead of the lower triangular ones, derive a new square-root Kalman filter.

7.7. Combine the sequential algorithm and the square-root scheme with upper triangular decompositions to derive a new filtering algorithm.
8. Extended Kalman Filter and System Identification
The Kalman filtering process has been designed to estimate the state vector in a linear model. If the model turns out to be nonlinear, a linearization procedure is usually performed in deriving the filtering equations. We will consider a real-time linear Taylor approximation of the system function at the previous state estimate and that of the observation function at the corresponding predicted position. The Kalman filter so obtained will be called the extended Kalman filter. This idea to handle a nonlinear model is quite natural, and the filtering procedure is fairly simple and efficient. Furthermore, it has found many important real-time applications. One such application is adaptive system identification which we will also discuss briefly in this chapter. Finally, by improving the linearization procedure of the extended Kalman filtering algorithm, we will introduce a modified extended Kalman filtering scheme which has a parallel computational structure. We then give two numerical examples to demonstrate the advantage of the modified Kalman filter over the standard one in both state estimation and system parameter identification.
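8.1 Extended Kalman Filter

A state-space description of a system which is not necessarily linear will be called a nonlinear model of the system. In this chapter, we will consider a nonlinear model of the form
$$\begin{cases}\mathbf{x}_{k+1} = \mathbf{f}_k(\mathbf{x}_k) + H_k(\mathbf{x}_k)\underline{\xi}_k\\ \mathbf{v}_k = \mathbf{g}_k(\mathbf{x}_k) + \underline{\eta}_k,\end{cases}\tag{8.1}$$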
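where $\mathbf{f}_k$ and $\mathbf{g}_k$ are vector-valued functions with ranges in $\mathbb{R}^n$ and $\mathbb{R}^q$, respectively, $1 \le q \le n$, and $H_k$ a matrix-valued function with range in $\mathbb{R}^n \times \mathbb{R}^p$, such that for each $k$ the first order partial derivatives of $\mathbf{f}_k(\mathbf{x}_k)$ and $\mathbf{g}_k(\mathbf{x}_k)$ with respect to all the components of $\mathbf{x}_k$ are continuous. As usual, we simply consider zero-mean Gaussian white noise sequences $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ with ranges in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively, $1 \le p, q \le n$, and
$$E(\underline{\xi}_k\underline{\eta}_\ell^T) = 0,\qquad E(\underline{\xi}_k\mathbf{x}_0^T) = 0,\qquad E(\underline{\eta}_k\mathbf{x}_0^T) = 0,$$
for all $k$ and $\ell$. The real-time linearization process is carried out as follows. In order to be consistent with the linear model, the initial estimate $\hat{\mathbf{x}}_0 = \hat{\mathbf{x}}_{0|0}$ and predicted position $\hat{\mathbf{x}}_{1|0}$ are chosen to be
$$\hat{\mathbf{x}}_0 = E(\mathbf{x}_0),\qquad \hat{\mathbf{x}}_{1|0} = \mathbf{f}_0(\hat{\mathbf{x}}_0).$$
We will then formulate $\hat{\mathbf{x}}_k = \hat{\mathbf{x}}_{k|k}$, consecutively, for $k = 1, 2, \dots$, using the predicted positions
$$\hat{\mathbf{x}}_{k+1|k} = \mathbf{f}_k(\hat{\mathbf{x}}_k)\tag{8.2}$$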
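and the linear state-space description
$$\begin{cases}\mathbf{x}_{k+1} = A_k\mathbf{x}_k + \mathbf{u}_k + \Gamma_k\underline{\xi}_k\\ \mathbf{w}_k = C_k\mathbf{x}_k + \underline{\eta}_k,\end{cases}\tag{8.3}$$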
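where $A_k$, $\mathbf{u}_k$, $\Gamma_k$, $\mathbf{w}_k$ and $C_k$ are to be determined in real-time as follows. Suppose that $\hat{\mathbf{x}}_j$ has been determined so that $\hat{\mathbf{x}}_{j+1|j}$ is also defined using (8.2), for $j = 0, 1, \dots, k$. We consider the linear Taylor approximation of $\mathbf{f}_k(\mathbf{x}_k)$ at $\hat{\mathbf{x}}_k$ and that of $\mathbf{g}_k(\mathbf{x}_k)$ at $\hat{\mathbf{x}}_{k|k-1}$; that is,
$$\begin{cases}\mathbf{f}_k(\mathbf{x}_k) \simeq \mathbf{f}_k(\hat{\mathbf{x}}_k) + A_k(\mathbf{x}_k - \hat{\mathbf{x}}_k)\\ \mathbf{g}_k(\mathbf{x}_k) \simeq \mathbf{g}_k(\hat{\mathbf{x}}_{k|k-1}) + C_k(\mathbf{x}_k - \hat{\mathbf{x}}_{k|k-1}),\end{cases}\tag{8.4}$$
where
$$A_k = \left[\frac{\partial\mathbf{f}_k}{\partial\mathbf{x}_k}(\hat{\mathbf{x}}_k)\right]\quad\text{and}\quad C_k = \left[\frac{\partial\mathbf{g}_k}{\partial\mathbf{x}_k}(\hat{\mathbf{x}}_{k|k-1})\right].\tag{8.5}$$
Here and throughout, for any vector-valued function
$$\mathbf{h}(\mathbf{x}_k) = \begin{bmatrix}h_1(\mathbf{x}_k)\\ \vdots\\ h_m(\mathbf{x}_k)\end{bmatrix},\quad\text{where}\quad \mathbf{x}_k = \begin{bmatrix}x_k^1\\ \vdots\\ x_k^n\end{bmatrix},$$
we denote, as usual,
$$\left[\frac{\partial\mathbf{h}}{\partial\mathbf{x}_k}(\mathbf{x}_k^*)\right] = \begin{bmatrix}\dfrac{\partial h_1}{\partial x_k^1}(\mathbf{x}_k^*) & \cdots & \dfrac{\partial h_1}{\partial x_k^n}(\mathbf{x}_k^*)\\ \vdots & & \vdots\\ \dfrac{\partial h_m}{\partial x_k^1}(\mathbf{x}_k^*) & \cdots & \dfrac{\partial h_m}{\partial x_k^n}(\mathbf{x}_k^*)\end{bmatrix}.\tag{8.6}$$
Hence, by setting
$$\begin{cases}\mathbf{u}_k = \mathbf{f}_k(\hat{\mathbf{x}}_k) - A_k\hat{\mathbf{x}}_k\\ \Gamma_k = H_k(\hat{\mathbf{x}}_k)\\ \mathbf{w}_k = \mathbf{v}_k - \mathbf{g}_k(\hat{\mathbf{x}}_{k|k-1}) + C_k\hat{\mathbf{x}}_{k|k-1},\end{cases}\tag{8.7}$$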
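the nonlinear model (8.1) is approximated by the linear model (8.3) using the matrices and vectors defined in (8.5) and (8.7) at the $k$th instant (cf. Exercise 8.2). Of course, this linearization is possible only if $\hat{\mathbf{x}}_k$ has been determined. We already have $\hat{\mathbf{x}}_0$, so that the system equation in (8.3) is valid for $k = 0$. From this, we define $\hat{\mathbf{x}}_1 = \hat{\mathbf{x}}_{1|1}$ as the optimal unbiased estimate (with optimal weight) of $\mathbf{x}_1$ in the linear model (8.3), using the data $[\mathbf{v}_0^T\ \mathbf{w}_1^T]^T$. Now, by applying (8.2), (8.3) is established for $k = 1$, so that $\hat{\mathbf{x}}_2 = \hat{\mathbf{x}}_{2|2}$ can be determined analogously, using the data $[\mathbf{v}_0^T\ \mathbf{w}_1^T\ \mathbf{w}_2^T]^T$, etc. From the Kalman filtering results for linear deterministic/stochastic state-space descriptions in Chapters 2 and 3 (cf. (2.18) and Exercise 3.6), we arrive at the "correction" formula
$$\begin{aligned}\hat{\mathbf{x}}_k &= \hat{\mathbf{x}}_{k|k-1} + G_k(\mathbf{w}_k - C_k\hat{\mathbf{x}}_{k|k-1})\\ &= \hat{\mathbf{x}}_{k|k-1} + G_k\bigl((\mathbf{v}_k - \mathbf{g}_k(\hat{\mathbf{x}}_{k|k-1}) + C_k\hat{\mathbf{x}}_{k|k-1}) - C_k\hat{\mathbf{x}}_{k|k-1}\bigr)\\ &= \hat{\mathbf{x}}_{k|k-1} + G_k(\mathbf{v}_k - \mathbf{g}_k(\hat{\mathbf{x}}_{k|k-1})),\end{aligned}$$
where $G_k$ is the Kalman gain matrix for the linear model (8.3) at the $k$th instant. The resulting filtering process is called the extended Kalman filter. The filtering algorithm may be summarized as follows (cf. Exercise 8.3):
$$\begin{aligned}&P_{0,0} = \mathrm{Var}(\mathbf{x}_0),\qquad \hat{\mathbf{x}}_0 = E(\mathbf{x}_0).\\ &\text{For } k = 1, 2, \dots,\\ &P_{k,k-1} = \left[\frac{\partial\mathbf{f}_{k-1}}{\partial\mathbf{x}_{k-1}}(\hat{\mathbf{x}}_{k-1})\right]P_{k-1,k-1}\left[\frac{\partial\mathbf{f}_{k-1}}{\partial\mathbf{x}_{k-1}}(\hat{\mathbf{x}}_{k-1})\right]^T + H_{k-1}(\hat{\mathbf{x}}_{k-1})Q_{k-1}H_{k-1}^T(\hat{\mathbf{x}}_{k-1})\\ &\hat{\mathbf{x}}_{k|k-1} = \mathbf{f}_{k-1}(\hat{\mathbf{x}}_{k-1})\\ &G_k = P_{k,k-1}\left[\frac{\partial\mathbf{g}_k}{\partial\mathbf{x}_k}(\hat{\mathbf{x}}_{k|k-1})\right]^T\cdot\left[\left[\frac{\partial\mathbf{g}_k}{\partial\mathbf{x}_k}(\hat{\mathbf{x}}_{k|k-1})\right]P_{k,k-1}\left[\frac{\partial\mathbf{g}_k}{\partial\mathbf{x}_k}(\hat{\mathbf{x}}_{k|k-1})\right]^T + R_k\right]^{-1}\\ &P_{k,k} = \left[I - G_k\left[\frac{\partial\mathbf{g}_k}{\partial\mathbf{x}_k}(\hat{\mathbf{x}}_{k|k-1})\right]\right]P_{k,k-1}\\ &\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + G_k(\mathbf{v}_k - \mathbf{g}_k(\hat{\mathbf{x}}_{k|k-1}))\end{aligned}\tag{8.8}$$
(cf. Fig. 8.1).

Fig. 8.1.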
8.2 Satellite Orbit Estimation
Estimating the planar orbit of a satellite provides an interesting example for the extended Kalman filter. Let the governing equations of motion of the satellite in the planar orbit be given by
$$\begin{cases}\ddot r = r\dot\theta^2 - mg\,r^{-2} + \xi_r\\ \ddot\theta = -2r^{-1}\dot r\dot\theta + r^{-1}\xi_\theta,\end{cases}\tag{8.9}$$
where $r$ is the (radial) distance (called the range) of the satellite from the center of the earth (called the attracting point), $\theta$ the angle measured from some reference axis (called the angular displacement), and $m$, $g$ are constants, being the mass of the earth and the universal gravitational constant, respectively (cf. Fig. 8.2). In addition, $\xi_r$ and $\xi_\theta$ are assumed to be continuous-time uncorrelated zero-mean Gaussian white noise processes. By setting
$$\mathbf{x} = [\,r\ \ \dot r\ \ \theta\ \ \dot\theta\,]^T = [\,x[1]\ \ x[2]\ \ x[3]\ \ x[4]\,]^T,$$
the equations in (8.9) become:
$$\dot{\mathbf{x}} = \begin{bmatrix}x[2]\\ x[1]x[4]^2 - mg/x[1]^2\\ x[4]\\ -2x[2]x[4]/x[1]\end{bmatrix} + \begin{bmatrix}0\\ \xi_r\\ 0\\ \xi_\theta/x[1]\end{bmatrix}.$$

Fig. 8.2.
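Hence, if we replace $\mathbf{x}$ by $\mathbf{x}_k$ and $\dot{\mathbf{x}}$ by $(\mathbf{x}_{k+1} - \mathbf{x}_k)h^{-1}$, where $h > 0$ denotes the sampling time, we obtain the discretized nonlinear model
$$\mathbf{x}_{k+1} = \mathbf{f}(\mathbf{x}_k) + H(\mathbf{x}_k)\underline{\xi}_k,$$
where
$$\mathbf{f}(\mathbf{x}_k) = \begin{bmatrix}x_k[1] + h\,x_k[2]\\ x_k[2] + h\,x_k[1]\,x_k[4]^2 - hmg/x_k[1]^2\\ x_k[3] + h\,x_k[4]\\ x_k[4] - 2h\,x_k[2]x_k[4]/x_k[1]\end{bmatrix},\qquad H(\mathbf{x}_k) = \begin{bmatrix}h & 0 & 0 & 0\\ 0 & h & 0 & 0\\ 0 & 0 & h & 0\\ 0 & 0 & 0 & h/x_k[1]\end{bmatrix},$$
and $\underline{\xi}_k := [\,0\ \ \xi_r(k)\ \ 0\ \ \xi_\theta(k)\,]^T$. Now, suppose that the range $r$ is measured giving the data information
$$v_k = [\,1\ \ 0\ \ 0\ \ 0\,]\,\mathbf{x}_k + \eta_k,$$
where $\{\eta_k\}$ is a zero-mean Gaussian white noise sequence, independent of $\{\xi_r(k)\}$ and $\{\xi_\theta(k)\}$. It is clear that
$$\left[\frac{\partial\mathbf{f}}{\partial\mathbf{x}_{k-1}}(\hat{\mathbf{x}}_{k-1})\right] = \begin{bmatrix}1 & h & 0 & 0\\ 2hmg/\hat x_{k-1}[1]^3 + h\hat x_{k-1}[4]^2 & 1 & 0 & 2h\hat x_{k-1}[1]\hat x_{k-1}[4]\\ 0 & 0 & 1 & h\\ 2h\hat x_{k-1}[2]\hat x_{k-1}[4]/\hat x_{k-1}[1]^2 & -2h\hat x_{k-1}[4]/\hat x_{k-1}[1] & 0 & 1 - 2h\hat x_{k-1}[2]/\hat x_{k-1}[1]\end{bmatrix},$$
$$H(\hat{\mathbf{x}}_{k-1|k-1}) = \begin{bmatrix}h & 0 & 0 & 0\\ 0 & h & 0 & 0\\ 0 & 0 & h & 0\\ 0 & 0 & 0 & h/\hat x_{k-1}[1]\end{bmatrix},\qquad \left[\frac{\partial g}{\partial\mathbf{x}_k}(\hat{\mathbf{x}}_{k|k-1})\right] = [\,1\ \ 0\ \ 0\ \ 0\,],$$
and
$$g(\hat{\mathbf{x}}_{k|k-1}) = \hat x_{k|k-1}[1].$$
By using these quantities in the extended Kalman filtering equations (8.8), we have an algorithm to determine $\hat{\mathbf{x}}_k$ which gives an estimate of the planar orbit $(x_k[1], x_k[3])$ of the satellite.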
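The Jacobian matrix above can be checked against a finite-difference approximation of the discretized system function. The following is a small sketch in plain Python; the normalized value `mg = 1.0`, the sampling time, and the test point are assumptions chosen only for illustration.

```python
def f(x, h, mg):
    """Discretized system function of the satellite model."""
    r, rdot, th, thdot = x
    return [
        r + h * rdot,
        rdot + h * r * thdot ** 2 - h * mg / r ** 2,
        th + h * thdot,
        thdot - 2 * h * rdot * thdot / r,
    ]

def jacobian(x, h, mg):
    """Analytic Jacobian of f, as displayed in the text."""
    r, rdot, th, thdot = x
    return [
        [1.0, h, 0.0, 0.0],
        [2 * h * mg / r ** 3 + h * thdot ** 2, 1.0, 0.0, 2 * h * r * thdot],
        [0.0, 0.0, 1.0, h],
        [2 * h * rdot * thdot / r ** 2, -2 * h * thdot / r, 0.0, 1 - 2 * h * rdot / r],
    ]

def numeric_jacobian(x, h, mg, eps=1e-6):
    """Central-difference approximation, used only as a cross-check."""
    J = [[0.0] * 4 for _ in range(4)]
    for j in range(4):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp, h, mg), f(xm, h, mg)
        for i in range(4):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

x_test = [1.2, 0.1, 0.3, 0.9]       # hypothetical state [r, r', theta, theta']
Ja = jacobian(x_test, h=0.1, mg=1.0)
Jn = numeric_jacobian(x_test, h=0.1, mg=1.0)
max_diff = max(abs(Ja[i][j] - Jn[i][j]) for i in range(4) for j in range(4))
print(max_diff < 1e-6)
```

A check of this kind is a cheap way to catch sign or index errors before the Jacobian is used in (8.8).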
8.3 Adaptive System Identification
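As an application of the extended Kalman filter, let us discuss the following problem of adaptive system identification. Suppose that a linear system with state-space description
$$\begin{cases}\mathbf{x}_{k+1} = A_k(\underline{\theta})\mathbf{x}_k + \Gamma_k(\underline{\theta})\underline{\xi}_k\\ \mathbf{v}_k = C_k(\underline{\theta})\mathbf{x}_k + \underline{\eta}_k\end{cases}\tag{8.10}$$
is being considered, where, as usual, $\mathbf{x}_k \in \mathbb{R}^n$, $\underline{\xi}_k \in \mathbb{R}^p$, $\underline{\eta}_k \in \mathbb{R}^q$, $1 \le p, q \le n$, and $\{\underline{\xi}_k\}$ and $\{\underline{\eta}_k\}$ are uncorrelated Gaussian white noise sequences. In this application, we assume that $A_k(\underline{\theta})$, $\Gamma_k(\underline{\theta})$, and $C_k(\underline{\theta})$ are known matrix-valued functions of some unknown constant vector $\underline{\theta}$. The objective is to "identify" $\underline{\theta}$. It seems quite natural to set $\underline{\theta}_{k+1} = \underline{\theta}_k = \underline{\theta}$, since $\underline{\theta}$ is a constant vector. Surprisingly, this assumption does not lead us anywhere as we will see below and from the simple example in the next section. In fact, $\underline{\theta}$ must be treated as a random constant vector such as
$$\underline{\theta}_{k+1} = \underline{\theta}_k + \underline{\zeta}_k,\tag{8.11}$$
where $\{\underline{\zeta}_k\}$ is any zero-mean Gaussian white noise sequence uncorrelated with $\{\underline{\eta}_k\}$ and with preassigned positive definite variances $\mathrm{Var}(\underline{\zeta}_k) = S_k$. In applications, we may choose $S_k = S > 0$ for all $k$ (see Section 8.4). Now, the system (8.10) together with the assumption (8.11) can be reformulated as the nonlinear model:
$$\begin{cases}\begin{bmatrix}\mathbf{x}_{k+1}\\ \underline{\theta}_{k+1}\end{bmatrix} = \begin{bmatrix}A_k(\underline{\theta}_k)\mathbf{x}_k\\ \underline{\theta}_k\end{bmatrix} + \begin{bmatrix}\Gamma_k(\underline{\theta}_k)\underline{\xi}_k\\ \underline{\zeta}_k\end{bmatrix}\\[2ex] \mathbf{v}_k = [\,C_k(\underline{\theta}_k)\ \ 0\,]\begin{bmatrix}\mathbf{x}_k\\ \underline{\theta}_k\end{bmatrix} + \underline{\eta}_k,\end{cases}\tag{8.12}$$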
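and the extended Kalman filtering procedure can be applied to estimate the state vector which contains $\underline{\theta}_k$ as its components. That is, $\underline{\theta}_k$ is estimated optimally in an adaptive way. However, in order to apply the extended Kalman filtering process (8.8), we still need an initial estimate $\hat{\underline{\theta}}_0 := \hat{\underline{\theta}}_{0|0}$. One method is to appeal to the state-space description (8.10). For instance, since $E(\mathbf{v}_0) = C_0(\underline{\theta})E(\mathbf{x}_0)$ so that $\mathbf{v}_0 - C_0(\underline{\theta})E(\mathbf{x}_0)$ is of zero-mean, we could start from $k = 0$, take the variances of both sides of the modified "observation equation"
$$\mathbf{v}_0 - C_0(\underline{\theta})E(\mathbf{x}_0) = C_0(\underline{\theta})\mathbf{x}_0 - C_0(\underline{\theta})E(\mathbf{x}_0) + \underline{\eta}_0,$$
and use the estimate $[\mathbf{v}_0 - C_0(\underline{\theta})E(\mathbf{x}_0)][\mathbf{v}_0 - C_0(\underline{\theta})E(\mathbf{x}_0)]^T$ for $\mathrm{Var}(\mathbf{v}_0 - C_0(\underline{\theta})E(\mathbf{x}_0))$ (cf. Exercise 2.12) to obtain approximately
$$\mathbf{v}_0\mathbf{v}_0^T - C_0(\underline{\theta})E(\mathbf{x}_0)\mathbf{v}_0^T - \mathbf{v}_0(C_0(\underline{\theta})E(\mathbf{x}_0))^T + C_0(\underline{\theta})\bigl(E(\mathbf{x}_0)E(\mathbf{x}_0^T) - \mathrm{Var}(\mathbf{x}_0)\bigr)C_0^T(\underline{\theta}) - R_0 = 0\tag{8.13}$$
(cf. Exercise 8.4). Now, solve for $\underline{\theta}$ and set one of the "most appropriate" solutions as the initial estimate $\hat{\underline{\theta}}_0$. If there is no solution of $\underline{\theta}$ in (8.13), we could use the equation
$$\mathbf{v}_1 = C_1(\underline{\theta})\mathbf{x}_1 + \underline{\eta}_1 = C_1(\underline{\theta})\bigl(A_0(\underline{\theta})\mathbf{x}_0 + \Gamma_0(\underline{\theta})\underline{\xi}_0\bigr) + \underline{\eta}_1$$
and apply the same procedure, yielding approximately
$$\begin{aligned}&\mathbf{v}_1\mathbf{v}_1^T - C_1(\underline{\theta})A_0(\underline{\theta})E(\mathbf{x}_0)\mathbf{v}_1^T - \mathbf{v}_1\bigl(C_1(\underline{\theta})A_0(\underline{\theta})E(\mathbf{x}_0)\bigr)^T - C_1(\underline{\theta})\Gamma_0(\underline{\theta})Q_0\bigl(C_1(\underline{\theta})\Gamma_0(\underline{\theta})\bigr)^T\\ &\quad + C_1(\underline{\theta})A_0(\underline{\theta})\bigl[E(\mathbf{x}_0)E(\mathbf{x}_0^T) - \mathrm{Var}(\mathbf{x}_0)\bigr]A_0^T(\underline{\theta})C_1^T(\underline{\theta}) - R_1 = 0\end{aligned}\tag{8.14}$$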
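(cf. Exercise 8.5), etc. Once $\hat{\underline{\theta}}_0$ has been chosen, we can apply the extended Kalman filtering process (8.8) and obtain the following algorithm:
$$\begin{aligned}&P_{0,0} = \begin{bmatrix}\mathrm{Var}(\mathbf{x}_0) & 0\\ 0 & S_0\end{bmatrix},\qquad \begin{bmatrix}\hat{\mathbf{x}}_0\\ \hat{\underline{\theta}}_0\end{bmatrix} = \begin{bmatrix}E(\mathbf{x}_0)\\ \hat{\underline{\theta}}_0\end{bmatrix}.\\ &\text{For } k = 1, 2, \dots,\\ &\begin{bmatrix}\hat{\mathbf{x}}_{k|k-1}\\ \hat{\underline{\theta}}_{k|k-1}\end{bmatrix} = \begin{bmatrix}A_{k-1}(\hat{\underline{\theta}}_{k-1})\hat{\mathbf{x}}_{k-1}\\ \hat{\underline{\theta}}_{k-1}\end{bmatrix}\\ &P_{k,k-1} = \begin{bmatrix}A_{k-1}(\hat{\underline{\theta}}_{k-1}) & \dfrac{\partial}{\partial\underline{\theta}}\bigl[A_{k-1}(\hat{\underline{\theta}}_{k-1})\hat{\mathbf{x}}_{k-1}\bigr]\\ 0 & I\end{bmatrix}P_{k-1,k-1}\begin{bmatrix}A_{k-1}(\hat{\underline{\theta}}_{k-1}) & \dfrac{\partial}{\partial\underline{\theta}}\bigl[A_{k-1}(\hat{\underline{\theta}}_{k-1})\hat{\mathbf{x}}_{k-1}\bigr]\\ 0 & I\end{bmatrix}^T\\ &\qquad\qquad + \begin{bmatrix}\Gamma_{k-1}(\hat{\underline{\theta}}_{k-1})Q_{k-1}\Gamma_{k-1}^T(\hat{\underline{\theta}}_{k-1}) & 0\\ 0 & S_{k-1}\end{bmatrix}\\ &G_k = P_{k,k-1}[\,C_k(\hat{\underline{\theta}}_{k|k-1})\ \ 0\,]^T\cdot\bigl[[\,C_k(\hat{\underline{\theta}}_{k|k-1})\ \ 0\,]P_{k,k-1}[\,C_k(\hat{\underline{\theta}}_{k|k-1})\ \ 0\,]^T + R_k\bigr]^{-1}\\ &P_{k,k} = \bigl[I - G_k[\,C_k(\hat{\underline{\theta}}_{k|k-1})\ \ 0\,]\bigr]P_{k,k-1}\\ &\begin{bmatrix}\hat{\mathbf{x}}_k\\ \hat{\underline{\theta}}_k\end{bmatrix} = \begin{bmatrix}\hat{\mathbf{x}}_{k|k-1}\\ \hat{\underline{\theta}}_{k|k-1}\end{bmatrix} + G_k\bigl(\mathbf{v}_k - C_k(\hat{\underline{\theta}}_{k|k-1})\hat{\mathbf{x}}_{k|k-1}\bigr)\end{aligned}\tag{8.15}$$
(cf. Exercise 8.6). We remark that if the unknown constant vector $\underline{\theta}$ is considered to be deterministic; that is, $\underline{\theta}_{k+1} = \underline{\theta}_k = \underline{\theta}$ so that $S_k = 0$, then the procedure (8.15) only yields $\hat{\underline{\theta}}_k = \hat{\underline{\theta}}_{k-1}$ for all $k$, independent of the observation data (cf. Exercise 8.7), and this does not give us any information on $\hat{\underline{\theta}}_k$. In other words, by using $S_k = 0$, the unknown system parameter vector $\underline{\theta}$ cannot be identified via the extended Kalman filtering technique.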
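A single step of (8.15) can be sketched as follows, assuming NumPy. The function name `identification_step` is illustrative, and the partial derivative of $A_{k-1}(\underline{\theta})\hat{\mathbf{x}}_{k-1}$ with respect to $\underline{\theta}$ is approximated by central differences, which is a convenience of this sketch rather than part of the algorithm. The short run at the end uses a hypothetical scalar model $x_{k+1} = \theta x_k$, $v_k = x_k + \eta_k$ to contrast $S_k > 0$ with $S_k = 0$.

```python
import numpy as np

def identification_step(x_prev, th_prev, P_prev, v, A, Gamma, C, Q, R, S, eps=1e-6):
    """One step of (8.15) on the augmented state [x; theta]."""
    n, m = len(x_prev), len(th_prev)
    A_hat = A(th_prev)
    # d/dtheta [A(theta) x_prev], an n x m matrix, by central differences
    D = np.zeros((n, m))
    for j in range(m):
        d = np.zeros(m)
        d[j] = eps
        D[:, j] = (A(th_prev + d) @ x_prev - A(th_prev - d) @ x_prev) / (2 * eps)
    J = np.block([[A_hat, D], [np.zeros((m, n)), np.eye(m)]])
    Gam = Gamma(th_prev)
    N = np.block([[Gam @ Q @ Gam.T, np.zeros((n, m))], [np.zeros((m, n)), S]])
    P_pred = J @ P_prev @ J.T + N
    x_pred, th_pred = A_hat @ x_prev, th_prev
    Cb = np.hstack([C(th_pred), np.zeros((R.shape[0], m))])   # [C_k  0]
    G = P_pred @ Cb.T @ np.linalg.inv(Cb @ P_pred @ Cb.T + R)
    P = (np.eye(n + m) - G @ Cb) @ P_pred
    z = np.concatenate([x_pred, th_pred]) + G @ (v - C(th_pred) @ x_pred)
    return z[:n], z[n:], P

# Hypothetical scalar model for illustration
model = dict(A=lambda th: np.array([[th[0]]]), Gamma=lambda th: np.zeros((1, 1)),
             C=lambda th: np.array([[1.0]]), Q=np.zeros((1, 1)), R=np.array([[0.01]]))
x0, v1 = np.array([1.0]), np.array([-1.1])

# S > 0: the parameter estimate reacts to the datum
_, th_random, _ = identification_step(x0, np.array([0.0]), np.diag([0.01, 0.01]), v1,
                                      S=np.array([[0.01]]), **model)
# S = 0 (deterministic parameter): the estimate does not move
_, th_fixed, _ = identification_step(x0, np.array([0.3]), np.diag([0.01, 0.0]), v1,
                                     S=np.zeros((1, 1)), **model)
print(th_random, th_fixed)
```

With $S_k = 0$ the second block row of the gain vanishes, so the parameter estimate is returned unchanged, which is exactly the failure described in the remark above.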
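8.4 An Example of Constant Parameter Identification

The following simple example will demonstrate how well the extended Kalman filter performs for the purpose of adaptive system identification, even with an arbitrary choice of the initial estimate $\hat{\underline{\theta}}_0$. Consider a linear system with the state-space description
$$\begin{cases}x_{k+1} = a\,x_k\\ v_k = x_k + \eta_k,\end{cases}$$
where $a$ is the unknown parameter that we must identify. Now, we treat $a$ as a random variable; that is, we consider
$$a_{k+1} = a_k + \zeta_k,$$
where $a_k$ is the value of $a$ at the $k$th instant and $E(\zeta_k) = 0$, $\mathrm{Var}(\zeta_k) = 0.01$, say. Suppose that $E(x_0) = 1$, $\mathrm{Var}(x_0) = 0.01$, and $\{\eta_k\}$ is a zero-mean Gaussian white noise sequence with $\mathrm{Var}(\eta_k) = 0.01$. The objective is to estimate the unknown parameters $a_k$ while performing the Kalman filtering procedure. By replacing $a$ with $a_k$ in the system equation, the above equations become the following nonlinear state-space description:
$$\begin{cases}\begin{bmatrix}x_{k+1}\\ a_{k+1}\end{bmatrix} = \begin{bmatrix}a_kx_k\\ a_k\end{bmatrix} + \begin{bmatrix}0\\ \zeta_k\end{bmatrix}\\[2ex] v_k = [\,1\ \ 0\,]\begin{bmatrix}x_k\\ a_k\end{bmatrix} + \eta_k.\end{cases}\tag{8.16}$$
An application of (8.15) to this model yields
$$\begin{aligned}&P_{k,k-1} = \begin{bmatrix}\hat a_{k-1} & \hat x_{k-1}\\ 0 & 1\end{bmatrix}P_{k-1,k-1}\begin{bmatrix}\hat a_{k-1} & 0\\ \hat x_{k-1} & 1\end{bmatrix} + \begin{bmatrix}0 & 0\\ 0 & 0.01\end{bmatrix}\\ &G_k = P_{k,k-1}\begin{bmatrix}1\\ 0\end{bmatrix}\left[[\,1\ \ 0\,]P_{k,k-1}\begin{bmatrix}1\\ 0\end{bmatrix} + 0.01\right]^{-1}\\ &P_{k,k} = \bigl[I - G_k[\,1\ \ 0\,]\bigr]P_{k,k-1}\\ &\begin{bmatrix}\hat x_k\\ \hat a_k\end{bmatrix} = \begin{bmatrix}\hat a_{k-1}\hat x_{k-1}\\ \hat a_{k-1}\end{bmatrix} + G_k(v_k - \hat a_{k-1}\hat x_{k-1})\\ &P_{0,0} = \begin{bmatrix}0.01 & 0\\ 0 & 0.01\end{bmatrix},\end{aligned}\tag{8.17}$$
where the initial estimate of $x_0$ is $\hat x_0 = E(x_0) = 1$ but $\hat a_0$ is unknown. To test this adaptive parameter identification algorithm, we create two pseudo-random sequences $\{\eta_k\}$ and $\{\zeta_k\}$ with zero mean and the above specified values of variances. Let us also (secretly) assign the value of $a$ to be $-1$ in order to generate the data $\{v_k\}$. To apply the algorithm described in (8.17), we need an initial estimate $\hat a_0$ of $a$. This can be done by using (8.14) with the first bit of data $v_1 = -1.1$ that we generate. In other words, we obtain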
$$0.99a^2 + 2.2a + 1.2 = 0$$
or
$$\hat a_0 = -1.261,\ -0.961.$$

Fig. 8.3.
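The two candidate initial estimates, and the behavior of the filter started from them, can be reproduced with a short script in plain Python. This is a sketch under stated assumptions: the model is taken to be $x_{k+1} = ax_k$, $v_k = x_k + \eta_k$ with the variances quoted in this section, the data are generated with a constant $a = -1$ and a seeded pseudo-random generator, and the names `initial_estimates`, `simulate_data`, and `identify` are illustrative.

```python
import math
import random

def initial_estimates(v1, Ex0=1.0, var_x0=0.01, r=0.01):
    """Roots of (8.14) for the scalar model: (Ex0^2 - Var x0) a^2 - 2 Ex0 v1 a + v1^2 - r = 0."""
    qa, qb, qc = Ex0 ** 2 - var_x0, -2.0 * Ex0 * v1, v1 ** 2 - r
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    return sorted([(-qb - disc) / (2 * qa), (-qb + disc) / (2 * qa)])

def simulate_data(n, a=-1.0, r=0.01, seed=0):
    """Generate v_1, ..., v_n from x_{k+1} = a x_k, v_k = x_k + eta_k."""
    rng = random.Random(seed)
    x = rng.gauss(1.0, 0.1)            # x_0 with mean 1 and variance 0.01
    data = []
    for _ in range(n):
        x = a * x
        data.append(x + rng.gauss(0.0, math.sqrt(r)))
    return data

def identify(a0, data, Ex0=1.0, var_x0=0.01, s=0.01, r=0.01):
    """Extended Kalman filter on the state [x, a]; returns the sequence of estimates of a."""
    x, a = Ex0, a0
    p11, p12, p22 = var_x0, 0.0, s     # entries of the symmetric 2 x 2 matrix P
    estimates = []
    for v in data:
        # prediction with Jacobian [[a, x], [0, 1]]
        q11 = a * a * p11 + 2 * a * x * p12 + x * x * p22
        q12 = a * p12 + x * p22
        q22 = p22 + s
        x_pred = a * x
        # gain for the observation matrix [1 0]
        g1, g2 = q11 / (q11 + r), q12 / (q11 + r)
        innovation = v - x_pred
        x, a = x_pred + g1 * innovation, a + g2 * innovation
        p11, p12, p22 = (1 - g1) * q11, (1 - g1) * q12, q22 - g2 * q12
        estimates.append(a)
    return estimates

roots = initial_estimates(-1.1)
print([round(t, 3) for t in roots])    # [-1.261, -0.961]
data = simulate_data(200, seed=1)
finals = [identify(a0, data)[-1] for a0 in roots + [0.0]]
```

The entries of `finals` are the last estimates $\hat a_k$ for three different starting values and can be compared with the value $a = -1$ used to generate the data.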
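In fact, the initial estimate $\hat a_0$ can be chosen arbitrarily. In Fig. 8.3, we have plots of $\hat a_k$ for four different choices of $\hat a_0$, namely: $-1.261$, $-0.961$, $0$, and $1$. Observe that for $k \ge 10$, all estimates $\hat a_k$ with any of these initial conditions are already very close to the actual value $a = -1$. Suppose that $a$ is considered to be deterministic, so that $a_{k+1} = a_k$ and (8.16) becomes
$$\begin{cases}\begin{bmatrix}x_{k+1}\\ a_{k+1}\end{bmatrix} = \begin{bmatrix}a_kx_k\\ a_k\end{bmatrix}\\[2ex] v_k = [\,1\ \ 0\,]\begin{bmatrix}x_k\\ a_k\end{bmatrix} + \eta_k.\end{cases}$$
Then it is not difficult to see that the Kalman gain matrix becomes
$$G_k = \begin{bmatrix}g_k\\ 0\end{bmatrix}$$
for some constant $g_k$ and the "correction" formula in the filtering algorithm is
$$\begin{bmatrix}\hat x_k\\ \hat a_k\end{bmatrix} = \begin{bmatrix}\hat a_{k-1}\hat x_{k-1}\\ \hat a_{k-1}\end{bmatrix} + \begin{bmatrix}g_k\\ 0\end{bmatrix}(v_k - \hat a_{k-1}\hat x_{k-1}).$$
Note that since $\hat a_k = \hat a_{k-1} = \hat a_0$ is independent of the data $\{v_k\}$, the value of $a$ cannot be identified.

8.5 Modified Extended Kalman Filter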
In this section, we modify the extended Kalman filtering algorithm discussed in the previous sections and introduce a more efficient parallel computational scheme for system parameters identification. The modification is achieved by an improved linearization procedure. As a result, the modified Kalman filtering algorithm can be applied to real-time system parameter identification even for time-varying stochastic systems. We will also give two numerical examples with computer simulations to demonstrate the effectiveness of this modified filtering scheme over the original extended Kalman filtering algorithm.
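The nonlinear stochastic system under consideration is the following:
$$\begin{cases}\begin{bmatrix}\mathbf{x}_{k+1}\\ \mathbf{y}_{k+1}\end{bmatrix} = \begin{bmatrix}F_k(\mathbf{y}_k)\mathbf{x}_k\\ H_k(\mathbf{x}_k,\mathbf{y}_k)\end{bmatrix} + \begin{bmatrix}\Gamma_k^1(\mathbf{x}_k,\mathbf{y}_k) & 0\\ \Gamma_k^2(\mathbf{x}_k,\mathbf{y}_k) & \Gamma_k^3(\mathbf{x}_k,\mathbf{y}_k)\end{bmatrix}\begin{bmatrix}\underline{\xi}_k^1\\ \underline{\xi}_k^2\end{bmatrix}\\[2ex] \mathbf{v}_k = [\,C_k(\mathbf{x}_k,\mathbf{y}_k)\ \ 0\,]\begin{bmatrix}\mathbf{x}_k\\ \mathbf{y}_k\end{bmatrix} + \underline{\eta}_k,\end{cases}\tag{8.18}$$
where $\mathbf{x}_k$ and $\mathbf{y}_k$ are $n$- and $m$-dimensional vectors, respectively, $\left\{\left[\begin{smallmatrix}\underline{\xi}_k^1\\ \underline{\xi}_k^2\end{smallmatrix}\right]\right\}$ and $\{\underline{\eta}_k\}$ uncorrelated zero-mean Gaussian white noise sequences with variance matrices
$$Q_k = \mathrm{Var}\left(\begin{bmatrix}\underline{\xi}_k^1\\ \underline{\xi}_k^2\end{bmatrix}\right)\quad\text{and}\quad R_k = \mathrm{Var}(\underline{\eta}_k),$$
respectively, and $F_k$, $H_k$, $\Gamma_k^1$, $\Gamma_k^2$, $\Gamma_k^3$ and $C_k$ nonlinear matrix-valued functions. Assume that $F_k$ and $C_k$ are differentiable. The modified Kalman filtering algorithm that we will derive consists of two sub-algorithms. Algorithm I shown below is a modification of the extended Kalman filter discussed previously. It differs from the previous one in that the real-time linear Taylor approximation is not taken at the previous estimate. Instead, in order to improve the performance, it is taken at the optimal estimate of $\mathbf{x}_k$ given by a standard Kalman filtering algorithm (which will be called Algorithm II below), from the subsystem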
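$$\begin{cases}\mathbf{x}_{k+1} = F_k(\mathbf{y}_k)\mathbf{x}_k + \Gamma_k^1(\mathbf{x}_k,\mathbf{y}_k)\underline{\xi}_k^1\\ \mathbf{v}_k = C_k(\mathbf{x}_k,\mathbf{y}_k)\mathbf{x}_k + \underline{\eta}_k\end{cases}\tag{8.19}$$
of (8.18), evaluated at the estimate $(\hat{\mathbf{x}}_k, \hat{\mathbf{y}}_k)$ from Algorithm I. In other words, the two algorithms are applied in parallel starting with the same initial estimate as shown in Figure 8.4, where Algorithm I (namely, the modified Kalman filtering algorithm) is used for yielding the estimate $\left[\begin{smallmatrix}\hat{\mathbf{x}}_k\\ \hat{\mathbf{y}}_k\end{smallmatrix}\right]$ with the input $\tilde{\mathbf{x}}_{k-1}$ obtained from Algorithm II (namely, the standard Kalman algorithm for the linear system (8.19)); and Algorithm II is used for yielding the estimate $\tilde{\mathbf{x}}_k$ with the input $\left[\begin{smallmatrix}\hat{\mathbf{x}}_{k-1}\\ \hat{\mathbf{y}}_{k-1}\end{smallmatrix}\right]$ obtained from Algorithm I. The two algorithms together will be called the parallel algorithm (I and II) later.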
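Fig. 8.4. (Parallel algorithm (I and II), started from $\hat{\mathbf{x}}_0 = \tilde{\mathbf{x}}_0 = E(\mathbf{x}_0)$, $\hat{\mathbf{y}}_0 = E(\mathbf{y}_0)$, and $P_0 = \mathrm{Var}\left(\left[\begin{smallmatrix}\mathbf{x}_0\\ \mathbf{y}_0\end{smallmatrix}\right]\right)$.)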
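Algorithm I:

Set

For $k = 1, 2, \dots$, compute

where $Q_k = \mathrm{Var}\left(\left[\begin{smallmatrix}\underline{\xi}_k^1\\ \underline{\xi}_k^2\end{smallmatrix}\right]\right)$ and $R_k = \mathrm{Var}(\underline{\eta}_k)$; and $\tilde{\mathbf{x}}_{k-1}$ is obtained by using the following algorithm.

Algorithm II: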
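Set
$$\tilde{\mathbf{x}}_0 = E(\mathbf{x}_0)\quad\text{and}\quad P_0 = \mathrm{Var}(\mathbf{x}_0).$$
For $k = 1, 2, \dots$, compute
$$\begin{aligned}P_{k,k-1} &= [F_{k-1}(\hat{\mathbf{y}}_{k-1})]P_{k-1}[F_{k-1}(\hat{\mathbf{y}}_{k-1})]^T + [\Gamma_{k-1}^1(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})]Q_{k-1}[\Gamma_{k-1}^1(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})]^T\\ \tilde{\mathbf{x}}_{k|k-1} &= [F_{k-1}(\hat{\mathbf{y}}_{k-1})]\tilde{\mathbf{x}}_{k-1}\\ G_k &= P_{k,k-1}[C_k(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})]^T\cdot\bigl[[C_k(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})]P_{k,k-1}[C_k(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})]^T + R_k\bigr]^{-1}\\ P_k &= \bigl[I - G_k[C_k(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})]\bigr]P_{k,k-1}\\ \tilde{\mathbf{x}}_k &= \tilde{\mathbf{x}}_{k|k-1} + G_k\bigl(\mathbf{v}_k - [C_k(\hat{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})]\tilde{\mathbf{x}}_{k|k-1}\bigr),\end{aligned}$$
where $Q_k = \mathrm{Var}(\underline{\xi}_k^1)$, $R_k = \mathrm{Var}(\underline{\eta}_k)$, and $(\hat{\mathbf{x}}_{k-1}, \hat{\mathbf{y}}_{k-1})$ is obtained from Algorithm I.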
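Algorithm II is a standard Kalman filter step whose coefficient matrices are frozen at the latest estimates supplied by Algorithm I. A minimal sketch, assuming NumPy, is given below; the function name `algorithm_ii_step` and the scalar test values are illustrative only.

```python
import numpy as np

def algorithm_ii_step(x_tilde_prev, P_prev, v, x_hat_prev, y_hat_prev,
                      F, Gamma1, C, Q, R):
    """One step of Algorithm II.

    (x_hat_prev, y_hat_prev) come from Algorithm I and are used only to
    evaluate the coefficient matrices F, Gamma1 and C.
    """
    Fm = F(y_hat_prev)
    Gm = Gamma1(x_hat_prev, y_hat_prev)
    Cm = C(x_hat_prev, y_hat_prev)
    P_pred = Fm @ P_prev @ Fm.T + Gm @ Q @ Gm.T
    x_pred = Fm @ x_tilde_prev
    G = P_pred @ Cm.T @ np.linalg.inv(Cm @ P_pred @ Cm.T + R)
    P = (np.eye(len(x_tilde_prev)) - G @ Cm) @ P_pred
    x_tilde = x_pred + G @ (v - Cm @ x_pred)
    return x_tilde, P

# Scalar illustration with constant coefficients (not from the text)
one = np.array([[1.0]])
x_t, P_t = algorithm_ii_step(
    x_tilde_prev=np.array([0.0]), P_prev=one, v=np.array([4.0]),
    x_hat_prev=np.array([0.0]), y_hat_prev=np.array([0.0]),
    F=lambda y: one, Gamma1=lambda x, y: one, C=lambda x, y: one,
    Q=one, R=np.array([[2.0]]),
)
print(x_t, P_t)   # x_t = [2.], P_t = [[1.]]
```

In the scalar illustration $P_{1,0} = 2$, the gain is $2/4 = 0.5$, and the estimate is half of the datum.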
Here, the following notation is used:
We remark that the modified Kalman filtering algorithm (namely, Algorithm I) is different from the original extended Kalman filtering scheme in that the Jacobian matrix of the (nonlinear) vector-valued function $F_{k-1}(\mathbf{y}_{k-1})\mathbf{x}_{k-1}$ and the prediction term $\left[\begin{smallmatrix}\hat{\mathbf{x}}_{k|k-1}\\ \hat{\mathbf{y}}_{k|k-1}\end{smallmatrix}\right]$ are both evaluated at the optimal position $\tilde{\mathbf{x}}_{k-1}$ at each time instant, where $\tilde{\mathbf{x}}_{k-1}$ is determined by the standard Kalman filtering algorithm (namely, Algorithm II). We next give a derivation of the modified extended Kalman filtering algorithm. Let $\tilde{\mathbf{x}}_k$ be the optimal estimate of the state vector $\mathbf{x}_k$ in the linear system (8.19), in the sense that (8.20)
among all linear and unbiased estimates $\mathbf{z}_k$ of $\mathbf{x}_k$ using all data information $\mathbf{v}_1, \dots, \mathbf{v}_k$. Since (8.19) is a subsystem of the original system evaluated at $(\hat{\mathbf{x}}_k, \hat{\mathbf{y}}_k)$ just as the extended Kalman filter is derived, it is clear that (8.21)
where
Now, consider an (n + m)-dimensional nonlinear differentiable vector-valued function Z = f(X)
(8.22)
defined on $\mathbb{R}^{n+m}$. Since the purpose is to estimate
$$Z_k = \mathbf{f}(X_k)$$
from some (optimal) estimate $\hat X_k$ of $X_k$, we use $\hat X_k$ as the center of the linear Taylor approximation. In doing so, choosing a better estimate of $X_k$ as the center should yield a better estimate for $Z_k$. In other words, if $\tilde X_k$ is used in place of $\hat X_k$ as the center for the linear Taylor approximation of $Z_k$, we should obtain a better estimate of $Z_k$, as illustrated in Figure 8.5. Here, $\hat X_k$ is used as the center for the linear Taylor approximation in the standard extended Kalman filter.
Fig. 8.5.
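The effect of the center on the quality of a linear Taylor approximation can be seen in a one-dimensional toy case. The sketch below, in plain Python, uses a hypothetical function $f(X) = X^2$, a hypothetical true value, and two hypothetical centers; it compares the error of the linearized model at the true point, which is only one simple way to read the claim above.

```python
def linearize(f, df, center):
    """Return the linear Taylor approximation of f about the given center."""
    return lambda X: f(center) + df(center) * (X - center)

f = lambda X: X ** 2          # hypothetical nonlinear function
df = lambda X: 2 * X          # its derivative

X_true = 2.0                  # hypothetical true value
X_hat, X_tilde = 2.6, 2.1     # a coarser and a better estimate of X_true

err_hat = abs(f(X_true) - linearize(f, df, X_hat)(X_true))
err_tilde = abs(f(X_true) - linearize(f, df, X_tilde)(X_true))
print(err_hat, err_tilde)     # about 0.36 and 0.01
```

For this quadratic the error equals the squared distance between the center and the true point, so moving the center from 2.6 to 2.1 reduces it from about 0.36 to about 0.01.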
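Now, if $\mathbf{f}'(X)$ denotes the Jacobian matrix of $\mathbf{f}(X)$ at $X$, then the linear Taylor estimate $\hat Z_k$ of $Z_k$ with center at $\tilde X_k$ is given by
$$\hat Z_k = \mathbf{f}(\tilde X_k) + \mathbf{f}'(\tilde X_k)(\hat X_k - \tilde X_k)$$
or, equivalently,
$$\hat Z_k = \mathbf{f}'(\tilde X_k)\hat X_k + \bigl(\mathbf{f}(\tilde X_k) - \mathbf{f}'(\tilde X_k)\tilde X_k\bigr).\tag{8.23}$$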
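We next return to the nonlinear model (8.18) and apply the linearization formula (8.23) to $\mathbf{f}_k$ defined by
$$\mathbf{f}_k\left(\begin{bmatrix}\mathbf{x}_k\\ \mathbf{y}_k\end{bmatrix}\right) = \begin{bmatrix}F_k(\mathbf{y}_k)\mathbf{x}_k\\ H_k(\mathbf{x}_k,\mathbf{y}_k)\end{bmatrix}.\tag{8.24}$$
Suppose that $(\hat{\mathbf{x}}_k, \hat{\mathbf{y}}_k)$ and $\tilde{\mathbf{x}}_k$ have already been determined by using the parallel algorithm (I and II) at the $k$th instant. Going to the $(k+1)$st instant, we apply (8.23) with center $\left[\begin{smallmatrix}\tilde{\mathbf{x}}_k\\ \hat{\mathbf{y}}_k\end{smallmatrix}\right]$ (instead of $\left[\begin{smallmatrix}\hat{\mathbf{x}}_k\\ \hat{\mathbf{y}}_k\end{smallmatrix}\right]$), using the Jacobian matrix
$$\mathbf{f}_k'\left(\begin{bmatrix}\tilde{\mathbf{x}}_k\\ \hat{\mathbf{y}}_k\end{bmatrix}\right) = \left[\frac{\partial}{\partial\left[\begin{smallmatrix}\tilde{\mathbf{x}}_k\\ \hat{\mathbf{y}}_k\end{smallmatrix}\right]}\begin{bmatrix}F_k(\hat{\mathbf{y}}_k)\tilde{\mathbf{x}}_k\\ H_k(\tilde{\mathbf{x}}_k,\hat{\mathbf{y}}_k)\end{bmatrix}\right],$$
to linearize the model (8.18). Moreover, as is usually done in the standard extended Kalman filter, we use the zeroth order Taylor approximations $C_k(\tilde{\mathbf{x}}_k, \hat{\mathbf{y}}_k)$ and $\Gamma_k^i(\tilde{\mathbf{x}}_k, \hat{\mathbf{y}}_k)$ for the matrices $C_k(\mathbf{x}_k, \mathbf{y}_k)$ and $\Gamma_k^i(\mathbf{x}_k, \mathbf{y}_k)$, $i = 1, 2, 3$, respectively, in the linearization of the model (8.18). This leads to the following linear state-space description: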
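$$\begin{cases}\begin{bmatrix}\mathbf{x}_{k+1}\\ \mathbf{y}_{k+1}\end{bmatrix} = \left[\dfrac{\partial}{\partial\left[\begin{smallmatrix}\tilde{\mathbf{x}}_k\\ \hat{\mathbf{y}}_k\end{smallmatrix}\right]}\begin{bmatrix}F_k(\hat{\mathbf{y}}_k)\tilde{\mathbf{x}}_k\\ H_k(\tilde{\mathbf{x}}_k,\hat{\mathbf{y}}_k)\end{bmatrix}\right]\begin{bmatrix}\mathbf{x}_k\\ \mathbf{y}_k\end{bmatrix} + \mathbf{u}_k + \begin{bmatrix}\Gamma_k^1(\tilde{\mathbf{x}}_k,\hat{\mathbf{y}}_k) & 0\\ \Gamma_k^2(\tilde{\mathbf{x}}_k,\hat{\mathbf{y}}_k) & \Gamma_k^3(\tilde{\mathbf{x}}_k,\hat{\mathbf{y}}_k)\end{bmatrix}\begin{bmatrix}\underline{\xi}_k^1\\ \underline{\xi}_k^2\end{bmatrix}\\[3ex] \mathbf{v}_k = [\,C_k(\tilde{\mathbf{x}}_k,\hat{\mathbf{y}}_k)\ \ 0\,]\begin{bmatrix}\mathbf{x}_k\\ \mathbf{y}_k\end{bmatrix} + \underline{\eta}_k,\end{cases}\tag{8.25}$$
in which the constant vector
$$\mathbf{u}_k = \begin{bmatrix}F_k(\hat{\mathbf{y}}_k)\tilde{\mathbf{x}}_k\\ H_k(\tilde{\mathbf{x}}_k,\hat{\mathbf{y}}_k)\end{bmatrix} - \left[\frac{\partial}{\partial\left[\begin{smallmatrix}\tilde{\mathbf{x}}_k\\ \hat{\mathbf{y}}_k\end{smallmatrix}\right]}\begin{bmatrix}F_k(\hat{\mathbf{y}}_k)\tilde{\mathbf{x}}_k\\ H_k(\tilde{\mathbf{x}}_k,\hat{\mathbf{y}}_k)\end{bmatrix}\right]\begin{bmatrix}\tilde{\mathbf{x}}_k\\ \hat{\mathbf{y}}_k\end{bmatrix}$$
can be considered as a deterministic control input. Hence, similar to the derivation of the standard Kalman filter, we obtain Algorithm I for the linear system (8.25). Here, it should be noted that the prediction term $\left[\begin{smallmatrix}\hat{\mathbf{x}}_{k|k-1}\\ \hat{\mathbf{y}}_{k|k-1}\end{smallmatrix}\right]$ in Algorithm I is the only term that contains $\mathbf{u}_k$, as formulated below:
$$\begin{aligned}\begin{bmatrix}\hat{\mathbf{x}}_{k|k-1}\\ \hat{\mathbf{y}}_{k|k-1}\end{bmatrix} &= \left[\frac{\partial}{\partial\left[\begin{smallmatrix}\tilde{\mathbf{x}}_{k-1}\\ \hat{\mathbf{y}}_{k-1}\end{smallmatrix}\right]}\begin{bmatrix}F_{k-1}(\hat{\mathbf{y}}_{k-1})\tilde{\mathbf{x}}_{k-1}\\ H_{k-1}(\tilde{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})\end{bmatrix}\right]\begin{bmatrix}\tilde{\mathbf{x}}_{k-1}\\ \hat{\mathbf{y}}_{k-1}\end{bmatrix} + \mathbf{u}_{k-1}\\ &= \left[\frac{\partial}{\partial\left[\begin{smallmatrix}\tilde{\mathbf{x}}_{k-1}\\ \hat{\mathbf{y}}_{k-1}\end{smallmatrix}\right]}\begin{bmatrix}F_{k-1}(\hat{\mathbf{y}}_{k-1})\tilde{\mathbf{x}}_{k-1}\\ H_{k-1}(\tilde{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})\end{bmatrix}\right]\begin{bmatrix}\tilde{\mathbf{x}}_{k-1}\\ \hat{\mathbf{y}}_{k-1}\end{bmatrix} + \begin{bmatrix}F_{k-1}(\hat{\mathbf{y}}_{k-1})\tilde{\mathbf{x}}_{k-1}\\ H_{k-1}(\tilde{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})\end{bmatrix}\\ &\qquad - \left[\frac{\partial}{\partial\left[\begin{smallmatrix}\tilde{\mathbf{x}}_{k-1}\\ \hat{\mathbf{y}}_{k-1}\end{smallmatrix}\right]}\begin{bmatrix}F_{k-1}(\hat{\mathbf{y}}_{k-1})\tilde{\mathbf{x}}_{k-1}\\ H_{k-1}(\tilde{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})\end{bmatrix}\right]\begin{bmatrix}\tilde{\mathbf{x}}_{k-1}\\ \hat{\mathbf{y}}_{k-1}\end{bmatrix}\\ &= \begin{bmatrix}F_{k-1}(\hat{\mathbf{y}}_{k-1})\tilde{\mathbf{x}}_{k-1}\\ H_{k-1}(\tilde{\mathbf{x}}_{k-1},\hat{\mathbf{y}}_{k-1})\end{bmatrix}.\end{aligned}$$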
8.6 Time-Varying Parameter Identification
In this section, we give two numerical examples to demonstrate the advantage of the modified extended Kalman filter over the standard one, for both state estimation and system parameter identification.
For this purpose, we first consider the nonlinear system
[
~::~] = [[ -~.1 ~k] [::]] + [~i], [~n U:~] -k
where $\underline{\xi}_k$ and $\eta_k$ are uncorrelated zero-mean Gaussian white noise sequences with $\mathrm{Var}(\underline{\xi}_k) = 0.1\,I_3$ and $\mathrm{Var}(\eta_k) = 0.01$ for all $k$. We then create two pseudo-random noise sequences and use
$$\begin{bmatrix}\hat x_0\\ \hat y_0\\ \hat z_0\end{bmatrix} = \begin{bmatrix}\tilde x_0\\ \tilde y_0\\ \tilde z_0\end{bmatrix} = \begin{bmatrix}100\\ 100\\ 1.0\end{bmatrix}\quad\text{and}\quad P_0 = I_3$$
as the initial estimates in the comparison of the extended Kalman filtering algorithm and the parallel algorithm (I and II). Computer simulation shows that both methods give excellent estimates to the actual $x_k$ component of the state vector. However, the parallel algorithm (I and II) provides much better estimates to the $y_k$ and $z_k$ components than the standard extended Kalman filtering algorithm. In Figures 8.6 and 8.7, we plot the estimates of $y_k$ against the actual graph of $y_k$ by using the standard extended Kalman filtering scheme and the parallel algorithm, respectively. Analogous plots for the $z_k$ component are shown in Figures 8.8 and 8.9, respectively.

Fig. 8.6. (Estimates of $y_k$ by the extended Kalman filter and the actual $y_k$, plotted against iterations 0 to 500.)

Fig. 8.7. (Estimates of $y_k$ by the parallel algorithm and the actual $y_k$, plotted against iterations 0 to 500.)

Fig. 8.8. (Estimates of $z_k$ by the extended Kalman filter and the actual $z_k$, plotted against iterations 0 to 500.)

Fig. 8.9. (Estimates of $z_k$ by the parallel algorithm and the actual $z_k$, plotted against iterations 0 to 500.)
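As a second example, we consider the following system parameter identification problem for the time-varying stochastic system (8.19) with an unknown time-varying parameter vector $\underline{\theta}_k$. As usual, we only assume that $\underline{\theta}_k$ is a zero-mean random vector. This leads to the nonlinear model (8.25) with $\mathbf{y}_k$ replaced by $\underline{\theta}_k$. Applying the modified extended Kalman filtering algorithm to this model, we have the following:

Algorithm I':

Let $\mathrm{Var}(\underline{\theta}_0) > 0$ and set
$$\begin{bmatrix}\hat{\mathbf{x}}_0\\ \hat{\underline{\theta}}_0\end{bmatrix} = \begin{bmatrix}E(\mathbf{x}_0)\\ E(\underline{\theta}_0)\end{bmatrix},\qquad P_{0,0} = \begin{bmatrix}\mathrm{Var}(\mathbf{x}_0) & 0\\ 0 & \mathrm{Var}(\underline{\theta}_0)\end{bmatrix}.$$
For $k = 1, 2, \dots$, compute
$$\begin{aligned}&\begin{bmatrix}\hat{\mathbf{x}}_{k|k-1}\\ \hat{\underline{\theta}}_{k|k-1}\end{bmatrix} = \begin{bmatrix}A_{k-1}(\hat{\underline{\theta}}_{k-1})\tilde{\mathbf{x}}_{k-1}\\ \hat{\underline{\theta}}_{k-1}\end{bmatrix}\\ &P_{k,k-1} = \begin{bmatrix}A_{k-1}(\hat{\underline{\theta}}_{k-1}) & \dfrac{\partial}{\partial\underline{\theta}}\bigl[A_{k-1}(\hat{\underline{\theta}}_{k-1})\tilde{\mathbf{x}}_{k-1}\bigr]\\ 0 & I\end{bmatrix}P_{k-1,k-1}\begin{bmatrix}A_{k-1}(\hat{\underline{\theta}}_{k-1}) & \dfrac{\partial}{\partial\underline{\theta}}\bigl[A_{k-1}(\hat{\underline{\theta}}_{k-1})\tilde{\mathbf{x}}_{k-1}\bigr]\\ 0 & I\end{bmatrix}^T\\ &\qquad\qquad + \begin{bmatrix}\Gamma_{k-1}(\hat{\underline{\theta}}_{k-1})Q_{k-1}\Gamma_{k-1}^T(\hat{\underline{\theta}}_{k-1}) & 0\\ 0 & 0\end{bmatrix}\\ &G_k = P_{k,k-1}[\,C_k(\hat{\underline{\theta}}_{k|k-1})\ \ 0\,]^T\cdot\bigl[[\,C_k(\hat{\underline{\theta}}_{k|k-1})\ \ 0\,]P_{k,k-1}[\,C_k(\hat{\underline{\theta}}_{k|k-1})\ \ 0\,]^T + R_k\bigr]^{-1}\\ &P_{k,k} = \bigl[I - G_k[\,C_k(\hat{\underline{\theta}}_{k|k-1})\ \ 0\,]\bigr]P_{k,k-1}\\ &\begin{bmatrix}\hat{\mathbf{x}}_k\\ \hat{\underline{\theta}}_k\end{bmatrix} = \begin{bmatrix}\hat{\mathbf{x}}_{k|k-1}\\ \hat{\underline{\theta}}_{k|k-1}\end{bmatrix} + G_k\bigl(\mathbf{v}_k - C_k(\hat{\underline{\theta}}_{k|k-1})\hat{\mathbf{x}}_{k|k-1}\bigr),\end{aligned}$$
where $\tilde{\mathbf{x}}_k$ is obtained in parallel by applying Algorithm II with $\hat{\mathbf{y}}_k$ replaced by $\hat{\underline{\theta}}_k$. To demonstrate the performance of the parallel algorithm (I' and II), we consider the following stochastic system with an unknown system parameter $\theta_k$: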
For computer simulation, we use changing values of $\theta_k$ given by
$$\theta_k = \begin{cases}1.0, & 0 \le k \le 20\\ 1.2, & 20 < k \le 40\\ 1.5, & 40 < k \le 60\\ 2.0, & 60 < k \le 80\\ 2.3, & 80 < k,\end{cases}$$
with $\hat\theta_0 = 5.0$.
The problem now is to identify these (unknown) values. It turns out that the standard extended Kalman filtering algorithm does not give any reasonable estimate of $\theta_k$, and in fact the estimates seem to grow exponentially with $k$. In Figure 8.10 the estimates of $\theta_k$ are given using the parallel algorithm (I' and II) with initial variances (i) $\mathrm{Var}(\theta_0) = 0.1$, (ii) $\mathrm{Var}(\theta_0) = 1.0$, and (iii) $\mathrm{Var}(\theta_0) = 50$. We finally remark that all existing extended Kalman filtering algorithms are ad hoc schemes since different linearizations have to be used to derive the results. Hence, there is no rigorous theory to guarantee the optimality of the extended or modified extended Kalman filtering algorithm in general.

Fig. 8.10. (Estimates of $\theta_k$ for $\mathrm{Var}(\theta_0) = 0.1$, $1.0$, and $50$, together with the actual $\theta_k$, plotted against iterations 0 to 100.)
Exercises

8.1. Consider the two-dimensional radar tracking system shown in Fig. 8.11, where for simplicity the missile is assumed to travel in the positive $y$-direction, so that $\dot x = 0$, $\dot y = v$, and $\ddot y = a$, with $v$ and $a$ denoting the velocity and acceleration of the missile, respectively.

Fig. 8.11.

(a) Suppose that the radar is located at the origin and measures the range $r$ and angular displacement $\theta$ where $(r, \theta)$ denotes the polar coordinates. Derive the nonlinear equations
$$\begin{cases}\dot r = v\sin\theta\\ \dot\theta = \dfrac{v}{r}\cos\theta\end{cases}$$
and
$$\begin{cases}\ddot r = a\sin\theta + \dfrac{v^2}{r}\cos^2\theta\\[1.5ex] \ddot\theta = \dfrac{ar - v^2\sin\theta}{r^2}\cos\theta - \dfrac{v^2}{r^2}\sin\theta\cos\theta\end{cases}$$
for this radar tracking model.

(b) By introducing the state vector
$$\mathbf{x} := [\,r\ \ \dot r\ \ \theta\ \ \dot\theta\,]^T,$$
establish a vector-valued nonlinear differential equation for this model.

(c) Assume that only the range $r$ is observed and that both the system and observation equations involve random disturbances $\{\underline{\xi}\}$ and $\{\eta\}$. By replacing $\mathbf{x}$, $\dot{\mathbf{x}}$, $\underline{\xi}$, and $\eta$ by $\mathbf{x}_k$, $(\mathbf{x}_{k+1} - \mathbf{x}_k)h^{-1}$, $\underline{\xi}_k$, and $\eta_k$, respectively, where $h > 0$ denotes the sampling time, establish a discrete-time nonlinear system for the above model.

(d) Assume that $\{\underline{\xi}_k\}$ and $\{\eta_k\}$ in the above nonlinear system are zero-mean uncorrelated Gaussian white noise sequences. Describe the extended Kalman filtering algorithm for this nonlinear system.

8.2. Verify that the nonlinear model (8.1) can be approximated by the linear model (8.3) using the matrices and vectors defined in (8.5) and (8.7).

8.3. Verify that the extended Kalman filtering algorithm for the nonlinear model (8.1) can be obtained by applying the standard Kalman filtering equations (2.17) or (3.25) to the linear model (8.3) as shown in (8.8).

8.4. Verify equation (8.13).

8.5. Verify equation (8.14).

8.6. Verify the algorithm given in (8.15).

8.7. Prove that if the unknown constant vector $\underline{\theta}$ in the system (8.10) is deterministic (i.e., $\underline{\theta}_{k+1} = \underline{\theta}_k$ for all $k$), then the algorithm (8.15) fails to identify $\underline{\theta}$.

8.8. Consider the one-dimensional model
$$\begin{cases}x_{k+1} = x_k + \xi_k\\ v_k = c\,x_k + \eta_k,\end{cases}$$
where $E(x_0) = x^0$, $\mathrm{Var}(x_0) = p_0$, $\{\xi_k\}$ and $\{\eta_k\}$ are both zero-mean Gaussian white noise sequences satisfying
$$E(\xi_k\xi_\ell) = q_k\delta_{k\ell},\qquad E(\eta_k\eta_\ell) = r_k\delta_{k\ell},\qquad E(\xi_k\eta_\ell) = E(\xi_kx_0) = E(\eta_kx_0) = 0.$$
Suppose that the unknown constant $c$ is treated as a random constant:
$$c_{k+1} = c_k + \zeta_k,$$
where $\zeta_k$ is also a zero-mean Gaussian white noise sequence with known variances $\mathrm{Var}(\zeta_k) = s_k$. Derive an algorithm to estimate $c$. Try the special case where $s_k = s > 0$.
9. Decoupling of Filtering Equations
The limiting (or steady-state) Kalman filter provides a very efficient method for estimating the state vector in a time-invariant linear system in real-time. However, if the state vector has a very high dimension n, and only a few components are of interest, this filter gives an abundance of useless information, an elimination of which should improve the efficiency of the filtering process. A decoupling method is introduced in this chapter for this purpose. It allows us to decompose an n-dimensional limiting Kalman filtering algorithm into n independent one-dimensional recursive formulas so that we may drop the ones that are of little interest.
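9.1 Decoupling Formulas

Consider the time-invariant linear stochastic system
$$\begin{cases}\mathbf{x}_{k+1} = A\mathbf{x}_k + \Gamma\underline{\xi}_k\\ \mathbf{v}_k = C\mathbf{x}_k + \underline{\eta}_k,\end{cases}\tag{9.1}$$
where all items have been defined in Chapter 6 (cf. Section 6.1 for more details). Recall that the limiting Kalman gain matrix is given by
$$G = PC^T(CPC^T + R)^{-1},\tag{9.2}$$
where $P$ is the positive definite solution of the matrix Riccati equation
$$P = A\bigl[P - PC^T(CPC^T + R)^{-1}CP\bigr]A^T + \Gamma Q\Gamma^T\tag{9.3}$$
and the steady-state estimate $\hat{\mathbf{x}}_k$ of $\mathbf{x}_k$ is given by
$$\hat{\mathbf{x}}_k = A\hat{\mathbf{x}}_{k-1} + G(\mathbf{v}_k - CA\hat{\mathbf{x}}_{k-1}) = (I - GC)A\hat{\mathbf{x}}_{k-1} + G\mathbf{v}_k\tag{9.4}$$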
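(cf. (6.4)). Here, $Q$ and $R$ are variance matrices defined by
$$Q = E(\underline{\xi}_k\underline{\xi}_k^T)\quad\text{and}\quad R = E(\underline{\eta}_k\underline{\eta}_k^T)$$
for all $k$ (cf. Section 6.1). Let $\Phi = (I - GC)A := [\phi_{ij}]_{n\times n}$, $G = [g_{ij}]_{n\times q}$, $1 \le q \le n$, and set
$$\hat{\mathbf{x}}_k = [\,x_{k,1}\ \cdots\ x_{k,n}\,]^T\quad\text{and}\quad \mathbf{v}_k = [\,v_{k,1}\ \cdots\ v_{k,q}\,]^T.$$
We now consider the $z$-transforms
$$\begin{cases}X_j = X_j(z) = \displaystyle\sum_{k=0}^{\infty}x_{k,j}z^{-k}, & j = 1, 2, \dots, n,\\[2ex] V_j = V_j(z) = \displaystyle\sum_{k=0}^{\infty}v_{k,j}z^{-k}, & j = 1, 2, \dots, q,\end{cases}\tag{9.5}$$
of the $j$th components of $\{\hat{\mathbf{x}}_k\}$ and $\{\mathbf{v}_k\}$, respectively. Since (9.4) can be formulated as
$$x_{k+1,j} = \sum_{i=1}^{n}\phi_{ji}x_{k,i} + \sum_{i=1}^{q}g_{ji}v_{k+1,i}$$
for $k = 0, 1, \dots$, we have
$$zX_j = \sum_{i=1}^{n}\phi_{ji}X_i + z\sum_{i=1}^{q}g_{ji}V_i.$$
Hence, by setting
$$\mathcal{A} = \mathcal{A}(z) = (zI - \Phi),$$
we arrive at
$$\mathcal{A}\begin{bmatrix}X_1\\ \vdots\\ X_n\end{bmatrix} = zG\begin{bmatrix}V_1\\ \vdots\\ V_q\end{bmatrix}.\tag{9.6}$$
Note that for large values of $|z|$, $\mathcal{A}$ is diagonally dominant and is therefore invertible. Hence, Cramer's rule can be used to solve for $X_1, \dots, X_n$ in (9.6). Let $\mathcal{A}_i$ be obtained by replacing the $i$th column of $\mathcal{A}$ with
$$zG\begin{bmatrix}V_1\\ \vdots\\ V_q\end{bmatrix}.$$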
133
Then, detA and detA i are both polynomials in z of degree n, and (9.7) i
=
In addition, we may write
1, ... ,n.
where
+ A2 + ... + An) , (AI A2 + AI A3 + ... + AI An + A2 A3 + ... + An -l An) ,
b1 = -(AI b2 =
with Ai, i = 1,2, .. " n, being the eigenvalues of matrix we have
detA
i
=
q>.
(~c}zn-i) VI + ... + (~cizn-i) Vi,
Similarly,
(9.9)
where c~, R = 0,1,' .. ,n, i = 1,2,"" q, can also be computed explicitly. Now, by substituting (9.8) and (9.9) into (9.7) and then taking the inverse z-transforms on both sides, we obtain the following recursive (decoupling) formulas: Xk,i
=-
b 1 X k-l,i -
b 2 X k-2,i -
+ C6 V k,1 + C~Vk-l,l +
- bnXk-n,i
+ C~Vk-n,1
+ C6 V k,q + CiVk-l,q + ... + C~Vk-n,q,
(9.10)
i = 1,2,' .. ,n. Note that the coefficients b1 , ... ,bn and cb, ... ,c~, i = 1," . ,q, can be computed before the filtering process is applied.
We also remark that in the formula (9.10), each Xk,i depends only on the previous state variables Xk-l,i, ... ,Xk-n,i and the data information, but not on any other state variaples Xk-l,j with j =1= i. This means that the filtering formula (9.4) has been decomposed into n one-dimensional recursive ones.
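The coefficients in (9.10) can be computed numerically for a given $\Phi$ and $G$. The sketch below assumes NumPy and uses the Faddeev-LeVerrier recursion, which is a technique chosen here for convenience (the text only states that the coefficients can be computed explicitly): it produces the $b_m$ of (9.8) together with the matrix coefficients of $\mathrm{adj}(zI - \Phi)$, from which the numerator coefficients follow. Zero initial conditions are assumed, and all function names and test matrices are illustrative.

```python
import numpy as np

def decoupling_coefficients(Phi, G):
    """Return (b, c): det(zI - Phi) = z^n + b[0] z^(n-1) + ... + b[n-1], and
    c[l][i, j] = coefficient of v_{k-l, j} in the recursion for x_{k, i}."""
    n = Phi.shape[0]
    N = np.eye(n)                  # N_0 in adj(zI - Phi) = sum_l N_l z^(n-1-l)
    b, c = [], [N @ G]
    for m in range(1, n + 1):
        bm = -np.trace(Phi @ N) / m
        b.append(bm)
        N = Phi @ N + bm * np.eye(n)
        if m < n:
            c.append(N @ G)
    return np.array(b), c

def decoupled_filter(b, c, data):
    """Run (9.10) component by component with zero initial conditions."""
    n = len(b)
    xs = []
    for k in range(len(data)):
        x = np.zeros(n)
        for m in range(1, n + 1):
            if k - m >= 0:
                x -= b[m - 1] * xs[k - m]
        for l in range(len(c)):
            if k - l >= 0:
                x += c[l] @ data[k - l]
        xs.append(x)
    return np.array(xs)

def matrix_filter(Phi, G, data):
    """Reference: x_k = Phi x_{k-1} + G v_k with x_{-1} = 0."""
    x, xs = np.zeros(Phi.shape[0]), []
    for v in data:
        x = Phi @ x + G @ v
        xs.append(x)
    return np.array(xs)

# Illustrative data (not from the text)
Phi = np.array([[0.5, 0.1], [0.0, 0.3]])
G = np.array([[0.2], [0.4]])
data = np.array([[1.0], [2.0], [-1.0], [0.5], [3.0]])
b, c = decoupling_coefficients(Phi, G)
print(np.allclose(decoupled_filter(b, c, data), matrix_filter(Phi, G, data)))
```

Each component of the decoupled recursion uses only its own past values and the data, so any component that is of no interest can simply be left out of the loop.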
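9.2 Real-Time Tracking

To illustrate the decoupling technique, let us return to the real-time tracking example studied in Section 3.5. As we have seen there and in Exercise 3.8, this real-time tracking model may be simplified to take on the formulation:
$$\begin{cases}\mathbf{x}_{k+1} = A\mathbf{x}_k + \underline{\xi}_k\\ v_k = C\mathbf{x}_k + \eta_k,\end{cases}\tag{9.11}$$
where
$$A = \begin{bmatrix}1 & h & h^2/2\\ 0 & 1 & h\\ 0 & 0 & 1\end{bmatrix},\qquad C = [\,1\ \ 0\ \ 0\,],\qquad h > 0,$$
and $\{\underline{\xi}_k\}$ and $\{\eta_k\}$ are both zero-mean Gaussian white noise sequences satisfying the assumption that
$$E(\underline{\xi}_k\underline{\xi}_\ell^T) = \begin{bmatrix}\sigma_p & 0 & 0\\ 0 & \sigma_v & 0\\ 0 & 0 & \sigma_a\end{bmatrix}\delta_{k\ell},\qquad E(\eta_k\eta_\ell) = \sigma_m\delta_{k\ell},$$
$$E(\underline{\xi}_k\eta_\ell) = 0,\qquad E(\underline{\xi}_k\mathbf{x}_0^T) = 0,\qquad E(\mathbf{x}_0\eta_k) = 0,$$
with $\sigma_p, \sigma_v, \sigma_a \ge 0$, $\sigma_p + \sigma_v + \sigma_a > 0$, and $\sigma_m > 0$. As we have also shown in Chapter 6, the limiting Kalman filter for this system is given by
$$\hat{\mathbf{x}}_k = \Phi\hat{\mathbf{x}}_{k-1} + Gv_k,$$
where
$$\Phi = (I - GC)A$$
with
$$G = \begin{bmatrix}g_1\\ g_2\\ g_3\end{bmatrix} = PC^T/(CPC^T + \sigma_m) = \frac{1}{P[1,1] + \sigma_m}\begin{bmatrix}P[1,1]\\ P[2,1]\\ P[3,1]\end{bmatrix}$$
and $P = [P[i,j]]_{3\times3}$ being the positive definite solution of the following matrix Riccati equation:
$$P = A\bigl[P - PC^T(CPC^T + \sigma_m)^{-1}CP\bigr]A^T + \begin{bmatrix}\sigma_p & 0 & 0\\ 0 & \sigma_v & 0\\ 0 & 0 & \sigma_a\end{bmatrix}$$
or
$$P = A\left[P - \frac{1}{P[1,1] + \sigma_m}P\begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}P\right]A^T + \begin{bmatrix}\sigma_p & 0 & 0\\ 0 & \sigma_v & 0\\ 0 & 0 & \sigma_a\end{bmatrix}.\tag{9.12}$$
Since
$$\Phi = (I - GC)A = \begin{bmatrix}1 - g_1 & (1 - g_1)h & (1 - g_1)h^2/2\\ -g_2 & 1 - hg_2 & h - h^2g_2/2\\ -g_3 & -hg_3 & 1 - h^2g_3/2\end{bmatrix},$$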
Equation (9.6) now becomes Z-1+ 91 92 [ 93
-h+h91 Z - 1 + h92 h93
2 291 -h /2+h /2] [Xl] -h + h292/2 XX32 Z - 1 + h293/2
=Z
[91] 9 2 V, 93
and by Cramer's rule, we have

$$X_i = H_i V, \qquad i = 1, 2, 3,$$

where

$$H_1 = \frac{g_1 + (g_3h^2/2 + g_2h - 2g_1)z^{-1} + (g_3h^2/2 - g_2h + g_1)z^{-2}}{1 + ((g_1 - 3) + g_2h + g_3h^2/2)z^{-1} + ((3 - 2g_1) - g_2h + g_3h^2/2)z^{-2} + (g_1 - 1)z^{-3}},$$

$$H_2 = \frac{g_2 + (hg_3 - 2g_2)z^{-1} + (g_2 - hg_3)z^{-2}}{1 + ((g_1 - 3) + g_2h + g_3h^2/2)z^{-1} + ((3 - 2g_1) - g_2h + g_3h^2/2)z^{-2} + (g_1 - 1)z^{-3}},$$

$$H_3 = \frac{g_3 - 2g_3z^{-1} + g_3z^{-2}}{1 + ((g_1 - 3) + g_2h + g_3h^2/2)z^{-1} + ((3 - 2g_1) - g_2h + g_3h^2/2)z^{-2} + (g_1 - 1)z^{-3}}.$$

Thus, if we set

$$\hat{\mathbf{x}}_k = [\,x_k\ \ \dot{x}_k\ \ \ddot{x}_k\,]^T$$

and take the inverse z-transforms, we obtain

$$\begin{aligned}
x_k ={}& -((g_1 - 3) + g_2h + g_3h^2/2)x_{k-1} - ((3 - 2g_1) - g_2h + g_3h^2/2)x_{k-2} - (g_1 - 1)x_{k-3} \\
& + g_1v_k + (g_3h^2/2 + g_2h - 2g_1)v_{k-1} + (g_3h^2/2 - g_2h + g_1)v_{k-2} , \\
\dot{x}_k ={}& -((g_1 - 3) + g_2h + g_3h^2/2)\dot{x}_{k-1} - ((3 - 2g_1) - g_2h + g_3h^2/2)\dot{x}_{k-2} - (g_1 - 1)\dot{x}_{k-3} \\
& + g_2v_k + (hg_3 - 2g_2)v_{k-1} + (g_2 - hg_3)v_{k-2} , \\
\ddot{x}_k ={}& -((g_1 - 3) + g_2h + g_3h^2/2)\ddot{x}_{k-1} - ((3 - 2g_1) - g_2h + g_3h^2/2)\ddot{x}_{k-2} - (g_1 - 1)\ddot{x}_{k-3} \\
& + g_3v_k - 2g_3v_{k-1} + g_3v_{k-2} ,
\end{aligned}$$

$k = 0, 1, \cdots$, with initial conditions $x_{-1}$, $\dot{x}_{-1}$, and $\ddot{x}_{-1}$, where $v_k = 0$ for $k < 0$ and $x_k = \dot{x}_k = \ddot{x}_k = 0$ for $k < -1$ (cf. Exercise 9.2).
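The decoupling can be checked numerically. The following is a minimal sketch, not part of the original text: it runs the matrix recursion $\hat{\mathbf{x}}_k = \Phi\hat{\mathbf{x}}_{k-1} + Gv_k$ and the three scalar recursions side by side on the same data. The gain values, the sampling time, the test signal, and the zero initial state are all assumptions chosen only for illustration.

```python
import math

def matrix_filter(v, g, h):
    """Matrix form: x_k = (I - G C) A x_{k-1} + G v_k with C = [1 0 0]."""
    g1, g2, g3 = g
    phi = [
        [1 - g1, (1 - g1) * h, (1 - g1) * h * h / 2],
        [-g2, 1 - g2 * h, h - g2 * h * h / 2],
        [-g3, -g3 * h, 1 - g3 * h * h / 2],
    ]
    x = [0.0, 0.0, 0.0]          # assumed zero initial state
    history = []
    for vk in v:
        x = [sum(phi[i][j] * x[j] for j in range(3)) + g[i] * vk
             for i in range(3)]
        history.append(x)
    return history

def past(seq, j):
    """Value of a sequence at index j, taken as zero before the start."""
    return seq[j] if j >= 0 else 0.0

def decoupled_filter(v, g, h):
    """Three independent scalar recursions sharing one denominator."""
    g1, g2, g3 = g
    d = ((g1 - 3) + g2 * h + g3 * h * h / 2,
         (3 - 2 * g1) - g2 * h + g3 * h * h / 2,
         g1 - 1)
    numerators = [
        (g1, g3 * h * h / 2 + g2 * h - 2 * g1, g3 * h * h / 2 - g2 * h + g1),
        (g2, h * g3 - 2 * g2, g2 - h * g3),
        (g3, -2 * g3, g3),
    ]
    channels = []
    for n in numerators:
        x = []
        for k in range(len(v)):
            x.append(-d[0] * past(x, k - 1) - d[1] * past(x, k - 2)
                     - d[2] * past(x, k - 3)
                     + n[0] * v[k] + n[1] * past(v, k - 1)
                     + n[2] * past(v, k - 2))
        channels.append(x)
    return channels

h = 0.1                                   # hypothetical sampling time
g = (0.5, 0.3 / h, 0.05 / h ** 2)         # hypothetical gains g1, g2, g3
v = [math.sin(0.3 * k) + 0.1 * k for k in range(40)]  # made-up data

coupled = matrix_filter(v, g, h)
decoupled = decoupled_filter(v, g, h)
max_diff = max(abs(coupled[k][i] - decoupled[i][k])
               for k in range(len(v)) for i in range(3))
print(max_diff)
```

Each scalar recursion uses only its own past values and the data, which is the point of the decoupling; the two computations should agree up to rounding.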
9.3 The $\alpha$-$\beta$-$\gamma$ Tracker

One of the most popular trackers is the so-called $\alpha$-$\beta$-$\gamma$ tracker. It is a "suboptimal" filter and can be described by

$$\begin{cases} \check{\mathbf{x}}_k = A\check{\mathbf{x}}_{k-1} + H(v_k - CA\check{\mathbf{x}}_{k-1}) \\ \check{\mathbf{x}}_0 = E(\mathbf{x}_0) , \end{cases} \tag{9.13}$$

where $H = [\,\alpha\ \ \beta/h\ \ \gamma/h^2\,]^T$ for some constants $\alpha$, $\beta$, and $\gamma$ (cf. Fig. 9.1). In practice, the $\alpha, \beta, \gamma$ values are chosen according to the physical model and the user's experience. In this section, we only consider the example where

$$A = \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix} \qquad \text{and} \qquad C = [\,1\ \ 0\ \ 0\,] .$$

Hence, by setting

$$g_1 = \alpha, \qquad g_2 = \beta/h, \qquad \text{and} \qquad g_3 = \gamma/h^2 ,$$

the decoupled filtering formulas derived in Section 9.2 become a decoupled $\alpha$-$\beta$-$\gamma$ tracker. We will show that under certain conditions on the $\alpha, \beta, \gamma$ values, the $\alpha$-$\beta$-$\gamma$ tracker for the time-invariant system (9.11) is actually a limiting Kalman filter, so that these conditions will guarantee "near-optimal" performance of the tracker.

Fig. 9.1. (Diagram; feedback block labeled $(I - HC)A$.)
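Since the matrix $P$ in (9.12) is symmetric, we may write

$$P = \begin{bmatrix} p_{11} & p_{21} & p_{31} \\ p_{21} & p_{22} & p_{32} \\ p_{31} & p_{32} & p_{33} \end{bmatrix},$$

so that (9.12) becomes

$$P = A\left[P - \frac{1}{p_{11} + \sigma_m}\, P \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} P\right]A^T + \begin{bmatrix} \sigma_p & 0 & 0 \\ 0 & \sigma_v & 0 \\ 0 & 0 & \sigma_a \end{bmatrix} \tag{9.14}$$

and

$$G = PC^T/(CPC^T + \sigma_m) = \frac{1}{p_{11} + \sigma_m}\begin{bmatrix} p_{11} \\ p_{21} \\ p_{31} \end{bmatrix}. \tag{9.15}$$

A necessary condition for the $\alpha$-$\beta$-$\gamma$ tracker (9.13) to be a limiting Kalman filter is $H = G$, or equivalently,

$$\begin{bmatrix} \alpha \\ \beta/h \\ \gamma/h^2 \end{bmatrix} = \frac{1}{p_{11} + \sigma_m}\begin{bmatrix} p_{11} \\ p_{21} \\ p_{31} \end{bmatrix},$$

so that

$$\begin{bmatrix} p_{11} \\ p_{21} \\ p_{31} \end{bmatrix} = \frac{\sigma_m}{1 - \alpha}\begin{bmatrix} \alpha \\ \beta/h \\ \gamma/h^2 \end{bmatrix}. \tag{9.16}$$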
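On the other hand, by simple algebra, it follows from (9.14), (9.15), and (9.16) that

$$\begin{bmatrix} p_{11} & p_{21} & p_{31} \\ p_{21} & p_{22} & p_{32} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}
= \begin{bmatrix}
p_{11} + 2hp_{21} + h^2p_{31} + h^2p_{22} + h^3p_{32} + h^4p_{33}/4 & p_{21} + hp_{31} + hp_{22} + 3h^2p_{32}/2 + h^3p_{33}/2 & p_{31} + hp_{32} + h^2p_{33}/2 \\
p_{21} + hp_{31} + hp_{22} + 3h^2p_{32}/2 + h^3p_{33}/2 & p_{22} + 2hp_{32} + h^2p_{33} & p_{32} + hp_{33} \\
p_{31} + hp_{32} + h^2p_{33}/2 & p_{32} + hp_{33} & p_{33}
\end{bmatrix}$$

$$\qquad - (p_{11} + \sigma_m)\begin{bmatrix}
(\alpha + \beta)^2 + \gamma(\alpha + \beta + \gamma/4) & (\alpha\beta + \alpha\gamma + \beta^2 + 3\beta\gamma/2 + \gamma^2/2)/h & \gamma(\alpha + \beta + \gamma/2)/h^2 \\
(\alpha\beta + \alpha\gamma + \beta^2 + 3\beta\gamma/2 + \gamma^2/2)/h & (\beta + \gamma)^2/h^2 & \gamma(\beta + \gamma)/h^3 \\
\gamma(\alpha + \beta + \gamma/2)/h^2 & \gamma(\beta + \gamma)/h^3 & \gamma^2/h^4
\end{bmatrix}
+ \begin{bmatrix} \sigma_p & 0 & 0 \\ 0 & \sigma_v & 0 \\ 0 & 0 & \sigma_a \end{bmatrix}.$$

Substituting (9.16) into the above equation yields

$$\begin{aligned}
\frac{h^4}{\gamma^2}\sigma_a &= p_{11} + \sigma_m , \\
p_{11} + \sigma_m &= \frac{h^4(\beta + \gamma)^2}{\gamma^3(2\alpha + 2\beta + \gamma)}\sigma_a - \frac{h^2}{\gamma(2\alpha + 2\beta + \gamma)}\sigma_v , \\
2hp_{21} + h^2p_{31} + h^2p_{22} &= \frac{h^4}{4\gamma^2}(4\alpha^2 + 8\alpha\beta + 2\beta^2 + 4\alpha\gamma + \beta\gamma)\sigma_a + \frac{h^2}{2}\sigma_v - \sigma_p , \\
p_{31} + p_{22} &= \frac{h^2}{4\gamma^2}(4\alpha + \beta)(\beta + \gamma)\sigma_a + \frac{3}{4}\sigma_v ,
\end{aligned} \tag{9.17}$$

and

$$\begin{aligned}
p_{22} &= \frac{\sigma_m}{(1 - \alpha)h^2}\big[\beta(\alpha + \beta + \gamma/4) - \gamma(2 + \alpha)/2\big] , \\
p_{32} &= \frac{\sigma_m}{(1 - \alpha)h^3}\,\gamma(\alpha + \beta/2) , \\
p_{33} &= \frac{\sigma_m}{(1 - \alpha)h^4}\,\gamma(\beta + \gamma) .
\end{aligned} \tag{9.18}$$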
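Hence, from (9.16), (9.17), and (9.18) we have

$$\begin{aligned}
\frac{\sigma_p}{\sigma_m} &= \frac{1}{1 - \alpha}(\alpha^2 + \alpha\beta + \alpha\gamma/2 - 2\beta) , \\
\frac{\sigma_v}{\sigma_m} &= \frac{1}{1 - \alpha}(\beta^2 - 2\alpha\gamma)h^{-2} , \\
\frac{\sigma_a}{\sigma_m} &= \frac{1}{1 - \alpha}\gamma^2 h^{-4}
\end{aligned} \tag{9.19}$$

and

$$P = \frac{\sigma_m}{1 - \alpha}\begin{bmatrix}
\alpha & \beta/h & \gamma/h^2 \\
\beta/h & (\beta(\alpha + \beta + \gamma/4) - \gamma(2 + \alpha)/2)/h^2 & \gamma(\alpha + \beta/2)/h^3 \\
\gamma/h^2 & \gamma(\alpha + \beta/2)/h^3 & \gamma(\beta + \gamma)/h^4
\end{bmatrix} \tag{9.20}$$

(cf. Exercise 9.4). Since $P$ must be positive definite (cf. Theorem 6.1), the $\alpha, \beta, \gamma$ values can be characterized as follows (cf. Exercise 9.5):

Theorem 9.1. Let the $\alpha, \beta, \gamma$ values satisfy the conditions in (9.19) and suppose that $\sigma_m > 0$. Then the $\alpha$-$\beta$-$\gamma$ tracker is a limiting Kalman filter if and only if the following conditions are satisfied:
(i) $0 < \alpha < 1$, $\gamma > 0$,
(ii) $\sqrt{2\alpha\gamma} \le \beta \le \dfrac{\alpha}{2 - \alpha}(\alpha + \gamma/2)$, and
(iii) the matrix

$$\tilde{P} = \begin{bmatrix}
\alpha & \beta & \gamma \\
\beta & \beta(\alpha + \beta + \gamma/4) - \gamma(2 + \alpha)/2 & \gamma(\alpha + \beta/2) \\
\gamma & \gamma(\alpha + \beta/2) & \gamma(\beta + \gamma)
\end{bmatrix}$$

is non-negative definite.

9.4 An Example

Let us now consider the special case of the real-time tracking system (9.11) where $\sigma_p = \sigma_v = 0$ and $\sigma_a, \sigma_m > 0$. It can be verified that (9.16-18) together yield

$$\alpha = 1 - s\gamma^2 , \qquad \beta^2 = 2\alpha\gamma , \qquad \alpha^2 + \alpha\beta + \alpha\gamma/2 = 2\beta , \tag{9.21}$$

where

$$s = \frac{\sigma_m}{h^4\sigma_a}$$

(cf. Exercise 9.6). By simple algebra, (9.21) gives

$$f(\gamma) := s^3\gamma^6 + s^2\gamma^5 - 3s(s - 1/12)\gamma^4 + 6s\gamma^3 + 3(s - 1/12)\gamma^2 + \gamma - 1 = 0 \tag{9.22}$$

(cf. Exercise 9.7), and in order to satisfy condition (i) in Theorem 9.1, we must solve (9.22) for a positive $\gamma$. To do so, we note that since $f(0) = -1$ and $f(+\infty) = +\infty$, there is at least one positive root $\gamma$. In addition, by the Descartes rule of signs, there are at most 3 positive real roots. In the following, we give the values of $\gamma$ for different choices of $s$: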
s        0.09    0.08    0.07    0.06    0.05    0.04    0.03    0.02    0.01
gamma    0.755   0.778   0.804   0.835   0.873   0.919   0.979   1.065   1.211
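The tabulated values can be reproduced with a few lines of code. The sketch below is an illustration only: it applies plain bisection to the polynomial in (9.22), writing the parameter in the first row of the table as `s`. The bracket `[0, 2]` and the tolerance are assumptions; the bracket works for the parameter values tried here because the polynomial is negative at 0 and positive at 2.

```python
def f(gamma, s):
    """Left-hand side of (9.22)."""
    return (s ** 3 * gamma ** 6 + s ** 2 * gamma ** 5
            - 3 * s * (s - 1 / 12) * gamma ** 4
            + 6 * s * gamma ** 3
            + 3 * (s - 1 / 12) * gamma ** 2
            + gamma - 1)

def positive_root(s, lo=0.0, hi=2.0, tol=1e-10):
    """Bisection on [lo, hi]; requires a sign change over the bracket."""
    if not (f(lo, s) < 0 < f(hi, s)):
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid, s) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

roots = {s: positive_root(s) for s in (0.09, 0.05, 0.01)}
for s, gamma in roots.items():
    print(f"s = {s:.2f}  gamma = {gamma:.3f}")
```

Once a root has been found, the remaining tracker parameters follow from the relations linking them to it in this special case.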
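Exercises

9.1. Consider the two-dimensional real-time tracking system

$$\begin{cases} \mathbf{x}_{k+1} = \begin{bmatrix} 1 & h \\ 0 & 1 \end{bmatrix}\mathbf{x}_k + \boldsymbol{\xi}_k \\ v_k = [\,1\ \ 0\,]\mathbf{x}_k + \eta_k , \end{cases}$$

where $h > 0$, and $\{\boldsymbol{\xi}_k\}$, $\{\eta_k\}$ are both uncorrelated zero-mean Gaussian white noise sequences. The $\alpha$-$\beta$ tracker associated with this system is defined by

$$\begin{cases} \check{\mathbf{x}}_k = \begin{bmatrix} 1 & h \\ 0 & 1 \end{bmatrix}\check{\mathbf{x}}_{k-1} + \begin{bmatrix} \alpha \\ \beta/h \end{bmatrix}\left(v_k - [\,1\ \ 0\,]\begin{bmatrix} 1 & h \\ 0 & 1 \end{bmatrix}\check{\mathbf{x}}_{k-1}\right) \\ \check{\mathbf{x}}_0 = E(\mathbf{x}_0) . \end{cases}$$

(a) Derive the decoupled Kalman filtering algorithm for this $\alpha$-$\beta$ tracker.
(b) Give the conditions under which this $\alpha$-$\beta$ tracker is a limiting Kalman filter.

9.2. Verify the decoupled formulas of $x_k$, $\dot{x}_k$, and $\ddot{x}_k$ given in Section 9.2 for the real-time tracking system (9.11).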
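9.3. Consider the three-dimensional radar-tracking system

$$\begin{cases} \mathbf{x}_{k+1} = \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix}\mathbf{x}_k + \boldsymbol{\xi}_k \\ v_k = [\,1\ \ 0\ \ 0\,]\mathbf{x}_k + w_k , \end{cases}$$

where $\{w_k\}$ is a sequence of colored noise defined by

$$w_k = sw_{k-1} + \eta_k$$

and $\{\boldsymbol{\xi}_k\}$, $\{\eta_k\}$ are both uncorrelated zero-mean Gaussian white noise sequences, as described in Chapter 5. The associated $\alpha$-$\beta$-$\gamma$-$\theta$ tracker for this system is defined by the algorithm:

$$\begin{cases} \check{X}_k = \begin{bmatrix} A & 0 \\ 0 & s \end{bmatrix}\check{X}_{k-1} + \begin{bmatrix} \alpha \\ \beta/h \\ \gamma/h^2 \\ \theta \end{bmatrix}\left\{v_k - [\,1\ \ 0\ \ 0\ \ 1\,]\begin{bmatrix} A & 0 \\ 0 & s \end{bmatrix}\check{X}_{k-1}\right\} \\ \check{X}_0 = \begin{bmatrix} E(\mathbf{x}_0) \\ 0 \end{bmatrix}, \end{cases}$$

where $A$ is the $3 \times 3$ system matrix above, and $\alpha$, $\beta$, $\gamma$, and $\theta$ are constants (cf. Fig. 9.2).

Fig. 9.2. (Diagram; gain blocks labeled $\alpha$, $\beta/h$, $\gamma/h^2$, and $\theta$.)

(a) Compute the matrix

$$\Phi = \left\{I - \begin{bmatrix} \alpha \\ \beta/h \\ \gamma/h^2 \\ \theta \end{bmatrix}[\,1\ \ 0\ \ 0\ \ 1\,]\right\}\begin{bmatrix} A & 0 \\ 0 & s \end{bmatrix}.$$

(b) Use Cramer's rule to solve, for $X_1$, $X_2$, $X_3$, and $W$, the linear system that is obtained when the z-transform of the $\alpha$-$\beta$-$\gamma$-$\theta$ filter is taken.
(c) By taking the inverse z-transforms of $X_1$, $X_2$, $X_3$, and $W$, give the decoupled filtering equations for the $\alpha$-$\beta$-$\gamma$-$\theta$ filter.
(d) Verify that when the colored noise sequence becomes white, namely, $s = 0$ and $\theta$ is chosen to be zero, the decoupled filtering equations obtained in part (c) reduce to those obtained in Section 9.2 with $g_1 = \alpha$, $g_2 = \beta/h$, and $g_3 = \gamma/h^2$.

9.4. Verify equations (9.17-20).

9.5. Prove Theorem 9.1 and observe that conditions (i)-(iii) are independent of the sampling time $h$.

9.6. Verify the equations in (9.21).

9.7. Verify the equation in (9.22).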
10. Kalman Filtering for Interval Systems
If some system parameters, such as certain elements of the system matrix, are not precisely known or change gradually with time, then the Kalman filtering algorithm cannot be directly applied. In this case, robust Kalman filtering that is able to handle uncertainty is needed. In this chapter we introduce one such robust Kalman filtering algorithm. Consider the nominal system

$$\begin{cases} \mathbf{x}_{k+1} = A_k\mathbf{x}_k + \Gamma_k\boldsymbol{\xi}_k , \\ \mathbf{v}_k = C_k\mathbf{x}_k + \boldsymbol{\eta}_k , \end{cases} \tag{10.1}$$

where $A_k$, $\Gamma_k$ and $C_k$ are known $n \times n$, $n \times p$ and $q \times n$ matrices, respectively, with $1 \le p, q \le n$, and where

$$E(\boldsymbol{\xi}_k) = 0, \quad E(\boldsymbol{\xi}_k\boldsymbol{\xi}_\ell^T) = Q_k\delta_{k\ell}, \quad E(\boldsymbol{\eta}_k) = 0, \quad E(\boldsymbol{\eta}_k\boldsymbol{\eta}_\ell^T) = R_k\delta_{k\ell},$$
$$E(\boldsymbol{\xi}_k\boldsymbol{\eta}_\ell^T) = 0, \quad E(\mathbf{x}_0\boldsymbol{\xi}_k^T) = 0, \quad E(\mathbf{x}_0\boldsymbol{\eta}_k^T) = 0,$$

for all $k, \ell = 0, 1, \cdots$, with $Q_k$ and $R_k$ being positive definite and symmetric matrices. If all the constant matrices, $A_k$, $\Gamma_k$, and $C_k$, are known, then the Kalman filter can be applied to the nominal system (10.1), which yields optimal estimates $\{\hat{\mathbf{x}}_k\}$ of the unknown state vectors $\{\mathbf{x}_k\}$ using the measurement data $\{\mathbf{v}_k\}$ in a recursive scheme. However, if some of the elements of these system matrices are unknown or uncertain, modification of the entire setting for filtering is necessary. Suppose that the uncertain parameters are only known to be bounded. Then we can write

$$\begin{aligned}
A_k^I &= A_k + \Delta A_k = [A_k - |\Delta A_k|,\ A_k + |\Delta A_k|] , \\
\Gamma_k^I &= \Gamma_k + \Delta\Gamma_k = [\Gamma_k - |\Delta\Gamma_k|,\ \Gamma_k + |\Delta\Gamma_k|] , \\
C_k^I &= C_k + \Delta C_k = [C_k - |\Delta C_k|,\ C_k + |\Delta C_k|] ,
\end{aligned}$$

$k = 0, 1, \cdots$, where $|\Delta A_k|$, $|\Delta\Gamma_k|$, and $|\Delta C_k|$ are constant bounds for the unknowns. The corresponding system

$$\begin{cases} \mathbf{x}_{k+1} = A_k^I\mathbf{x}_k + \Gamma_k^I\boldsymbol{\xi}_k , \\ \mathbf{v}_k = C_k^I\mathbf{x}_k + \boldsymbol{\eta}_k , \end{cases} \tag{10.2}$$

$k = 0, 1, \cdots$, is then called an interval system. Under this framework, how is the original Kalman filtering algorithm modified and applied to the interval system (10.2)? This question is addressed in this chapter.
10.1 Interval Mathematics
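In this section, we first provide some preliminary results on interval arithmetic and interval analysis that are needed throughout the chapter.

10.1.1 Intervals and Their Properties

A closed and bounded subset $[\underline{x}, \overline{x}]$ in $R = (-\infty, \infty)$ is referred to as an interval. In particular, a single point $x \in R$ is considered as a degenerate interval with $\underline{x} = \overline{x} = x$. Some useful concepts and properties of intervals are:

(a) Equality: Two intervals, $[\underline{x}_1, \overline{x}_1]$ and $[\underline{x}_2, \overline{x}_2]$, are said to be equal, and denoted by

$$[\underline{x}_1, \overline{x}_1] = [\underline{x}_2, \overline{x}_2] ,$$

if and only if $\underline{x}_1 = \underline{x}_2$ and $\overline{x}_1 = \overline{x}_2$.

(b) Intersection: The intersection of two intervals, $[\underline{x}_1, \overline{x}_1]$ and $[\underline{x}_2, \overline{x}_2]$, is defined to be

$$[\underline{x}_1, \overline{x}_1] \cap [\underline{x}_2, \overline{x}_2] = [\max\{\underline{x}_1, \underline{x}_2\},\ \min\{\overline{x}_1, \overline{x}_2\}] .$$

Furthermore, these two intervals are said to be disjoint, and denoted by $[\underline{x}_1, \overline{x}_1] \cap [\underline{x}_2, \overline{x}_2] = \phi$, if and only if $\underline{x}_1 > \overline{x}_2$ or $\underline{x}_2 > \overline{x}_1$.

(c) Union: The union of two non-disjoint intervals, $[\underline{x}_1, \overline{x}_1]$ and $[\underline{x}_2, \overline{x}_2]$, is defined to be

$$[\underline{x}_1, \overline{x}_1] \cup [\underline{x}_2, \overline{x}_2] = [\min\{\underline{x}_1, \underline{x}_2\},\ \max\{\overline{x}_1, \overline{x}_2\}] .$$

Note that the union is defined only if the two intervals are not disjoint, i.e.,

$$[\underline{x}_1, \overline{x}_1] \cap [\underline{x}_2, \overline{x}_2] \ne \phi ;$$

otherwise, it is undefined since the result is not an interval.

(d) Inequality: The interval $[\underline{x}_1, \overline{x}_1]$ is said to be less than (resp., greater than) the interval $[\underline{x}_2, \overline{x}_2]$, denoted by

$$[\underline{x}_1, \overline{x}_1] < [\underline{x}_2, \overline{x}_2] \qquad (\text{resp., } [\underline{x}_1, \overline{x}_1] > [\underline{x}_2, \overline{x}_2]) ,$$

if and only if $\overline{x}_1 < \underline{x}_2$ (resp., $\underline{x}_1 > \overline{x}_2$); otherwise, they cannot be compared. Note that the relations $\le$ and $\ge$ are not defined for intervals.

(e) Inclusion: The interval $[\underline{x}_1, \overline{x}_1]$ is said to be included in $[\underline{x}_2, \overline{x}_2]$, denoted $[\underline{x}_1, \overline{x}_1] \subseteq [\underline{x}_2, \overline{x}_2]$, if and only if $\underline{x}_2 \le \underline{x}_1$ and $\overline{x}_1 \le \overline{x}_2$; namely, if and only if $[\underline{x}_1, \overline{x}_1]$ is a subset (subinterval) of $[\underline{x}_2, \overline{x}_2]$.

For example, for three given intervals, $X_1 = [-1, 0]$, $X_2 = [-1, 2]$, and $X_3 = [2, 10]$, we have

$$\begin{aligned}
X_1 \cap X_2 &= [-1, 0] \cap [-1, 2] = [-1, 0] , \\
X_1 \cap X_3 &= [-1, 0] \cap [2, 10] = \phi , \\
X_2 \cap X_3 &= [-1, 2] \cap [2, 10] = [2, 2] = 2 , \\
X_1 \cup X_2 &= [-1, 0] \cup [-1, 2] = [-1, 2] , \\
X_1 \cup X_3 &= [-1, 0] \cup [2, 10] \ \text{is undefined} , \\
X_2 \cup X_3 &= [-1, 2] \cup [2, 10] = [-1, 10] , \\
X_1 &= [-1, 0] < [2, 10] = X_3 , \\
X_1 &= [-1, 0] \subset [-1, 2] = X_2 .
\end{aligned}$$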
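Let $[\underline{x}, \overline{x}]$, $[\underline{x}_1, \overline{x}_1]$, and $[\underline{x}_2, \overline{x}_2]$ be intervals. The basic arithmetic operations of intervals are defined as follows:

(a) Addition:

$$[\underline{x}_1, \overline{x}_1] + [\underline{x}_2, \overline{x}_2] = [\underline{x}_1 + \underline{x}_2,\ \overline{x}_1 + \overline{x}_2] .$$

(b) Subtraction:

$$[\underline{x}_1, \overline{x}_1] - [\underline{x}_2, \overline{x}_2] = [\underline{x}_1 - \overline{x}_2,\ \overline{x}_1 - \underline{x}_2] .$$

(c) Reciprocal operation:

If $0 \notin [\underline{x}, \overline{x}]$ then $[\underline{x}, \overline{x}]^{-1} = [1/\overline{x},\ 1/\underline{x}]$;
If $0 \in [\underline{x}, \overline{x}]$ then $[\underline{x}, \overline{x}]^{-1}$ is undefined.

(d) Multiplication:

$$[\underline{x}_1, \overline{x}_1] \cdot [\underline{x}_2, \overline{x}_2] = [\underline{y}, \overline{y}] ,$$

where

$$\underline{y} = \min\{\underline{x}_1\underline{x}_2,\ \underline{x}_1\overline{x}_2,\ \overline{x}_1\underline{x}_2,\ \overline{x}_1\overline{x}_2\} , \qquad \overline{y} = \max\{\underline{x}_1\underline{x}_2,\ \underline{x}_1\overline{x}_2,\ \overline{x}_1\underline{x}_2,\ \overline{x}_1\overline{x}_2\} .$$

(e) Division:

$$[\underline{x}_1, \overline{x}_1] / [\underline{x}_2, \overline{x}_2] = [\underline{x}_1, \overline{x}_1] \cdot [\underline{x}_2, \overline{x}_2]^{-1} ,$$

provided that $0 \notin [\underline{x}_2, \overline{x}_2]$; otherwise, it is undefined.

For three intervals, $X = [\underline{x}, \overline{x}]$, $Y = [\underline{y}, \overline{y}]$, and $Z = [\underline{z}, \overline{z}]$, consider the interval operations of addition ($+$), subtraction ($-$), multiplication ($\cdot$), and division ($/$), namely,

$$Z = X * Y , \qquad * \in \{+, -, \cdot\,, /\} .$$

It is clear that $X * Y$ is also an interval. In other words, the family of intervals under the four operations $\{+, -, \cdot\,, /\}$ is algebraically closed. It is also clear that the real numbers $x, y, z, \cdots$ are isomorphic to degenerate intervals $[x, x], [y, y], [z, z], \cdots$, so we will simply denote the point-interval operation $[x, x] * Y$ as $x * Y$. Moreover, the multiplication symbol "$\cdot$" will often be dropped for notational convenience.

Similar to conventional arithmetic, the interval arithmetic has the following basic algebraic properties (cf. Exercise 10.1):

$$\begin{aligned}
& X + Y = Y + X , \\
& Z + (X + Y) = (Z + X) + Y , \\
& XY = YX , \\
& Z(XY) = (ZX)Y , \\
& X + 0 = 0 + X = X \ \text{ and } \ X0 = 0X = 0 , \ \text{ where } 0 = [0, 0] , \\
& XI = IX = X , \ \text{ where } I = [1, 1] , \\
& Z(X + Y) \subseteq ZX + ZY , \ \text{ where } = \text{ holds only if either}
\end{aligned}$$

(a) $Z = [z, z]$,
(b) $X = Y = 0$, or
(c) $xy \ge 0$ for all $x \in X$ and $y \in Y$.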
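A small sketch may help make the operations and the subdistributive law concrete. The class below is an assumption made for illustration, not a library API: it stores exact rational endpoints via Python's `fractions.Fraction` so that the results are free of rounding, and the example intervals are made up.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Interval:
    lo: Fraction
    hi: Fraction

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def reciprocal(self):
        if self.lo <= 0 <= self.hi:
            raise ZeroDivisionError("reciprocal undefined: interval contains 0")
        return Interval(1 / self.hi, 1 / self.lo)

    def __truediv__(self, other):
        return self * other.reciprocal()

    def subset_of(self, other):
        return other.lo <= self.lo and self.hi <= other.hi

def iv(a, b):
    """Helper: build an interval from two integers or fractions."""
    return Interval(Fraction(a), Fraction(b))

# Subdistributivity: Z(X + Y) is contained in ZX + ZY.
X, Y, Z = iv(1, 2), iv(-2, -1), iv(1, 2)
lhs = Z * (X + Y)        # [-2, 2]
rhs = Z * X + Z * Y      # [-3, 3]
print(lhs, rhs, lhs.subset_of(rhs))

# Equality holds when Z is a degenerate (point) interval.
Zp = iv(2, 2)
print(Zp * (X + Y) == Zp * X + Zp * Y)

quotient = iv(1, 2) / iv(2, 4)   # [1/4, 1]
```

In the first example $X$ and $Y$ have opposite signs and $Z$ is a proper interval, so none of the conditions (a)-(c) holds and the inclusion is strict.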
In addition, the following is an important property of interval operations, called the monotonic inclusion property.

Theorem 10.1. Let $X_1$, $X_2$, $Y_1$, and $Y_2$ be intervals, with

$$X_1 \subseteq Y_1 \qquad \text{and} \qquad X_2 \subseteq Y_2 .$$

Then for any operation $* \in \{+, -, \cdot\,, /\}$, it follows that

$$X_1 * X_2 \subseteq Y_1 * Y_2 .$$

This property is an immediate consequence of the relations $X_1 \subseteq Y_1$ and $X_2 \subseteq Y_2$, namely,

$$X_1 * X_2 = \{x_1 * x_2 \mid x_1 \in X_1,\ x_2 \in X_2\} \subseteq \{y_1 * y_2 \mid y_1 \in Y_1,\ y_2 \in Y_2\} = Y_1 * Y_2 .$$

Corollary 10.1. Let $X$ and $Y$ be intervals and let $x \in X$ and $y \in Y$. Then,

$$x * y \in X * Y \qquad \text{for all } * \in \{+, -, \cdot\,, /\} .$$
What is seemingly counter-intuitive in the above theorem and corollary is that some operations, such as reciprocal, subtraction, and division, do not seem to satisfy such a monotonic inclusion property. To see that they do, in agreement with the above proof, let us consider a simple example of two intervals, $X = [0.2, 0.4]$ and $Y = [0.1, 0.5]$. Clearly, $X \subseteq Y$. We first show that $I/X \subseteq I/Y$, where $I = [1.0, 1.0]$. Indeed,

$$\frac{I}{X} = \frac{[1.0, 1.0]}{[0.2, 0.4]} = [2.5, 5.0] \qquad \text{and} \qquad \frac{I}{Y} = \frac{[1.0, 1.0]}{[0.1, 0.5]} = [2.0, 10.0] .$$

We also observe that $I - X \subseteq I - Y$, by noting that

$$I - X = [1.0, 1.0] - [0.2, 0.4] = [0.6, 0.8]$$

and

$$I - Y = [1.0, 1.0] - [0.1, 0.5] = [0.5, 0.9] .$$

Moreover, as a composition of these two operations, we again have $\dfrac{I}{I - X} \subseteq \dfrac{I}{I - Y}$, since

$$\frac{I}{I - X} = [5/4, 5/3] \qquad \text{and} \qquad \frac{I}{I - Y} = [10/9, 2] .$$
We next extend the notion of intervals and interval arithmetic to include interval vectors and interval matrices. Interval vectors and interval matrices are similarly defined. For example,

$$A^I = \begin{bmatrix} [2, 3] & [0, 1] \\ [1, 2] & [2, 3] \end{bmatrix} \qquad \text{and} \qquad \mathbf{b}^I = \begin{bmatrix} [0, 10] \\ [-6, 1] \end{bmatrix}$$

are an interval matrix and an interval vector, respectively. Let $A^I = [a^I_{ij}]$ and $B^I = [b^I_{ij}]$ be $n \times m$ interval matrices. Then, $A^I$ and $B^I$ are said to be equal if $a^I_{ij} = b^I_{ij}$ for all $i = 1, \cdots, n$ and $j = 1, \cdots, m$; $A^I$ is said to be contained in $B^I$, denoted $A^I \subseteq B^I$, if $a^I_{ij} \subseteq b^I_{ij}$ for all $i = 1, \cdots, n$ and $j = 1, \cdots, m$, where, in particular, if $A^I = A$ is an ordinary constant matrix, we write $A \in B^I$.

Fundamental operations of interval matrices include:

(a) Addition and Subtraction:

$$A^I \pm B^I = [a^I_{ij} \pm b^I_{ij}] .$$

(b) Multiplication: For two $n \times r$ and $r \times m$ interval matrices $A^I$ and $B^I$,

$$A^IB^I = \left[\sum_{k=1}^{r} a^I_{ik}b^I_{kj}\right] .$$

(c) Inversion: For an $n \times n$ interval matrix $A^I$ with $\det[A^I] \ne 0$,

$$[A^I]^{-1} = \frac{\text{adj}\,[A^I]}{\det[A^I]} .$$

For instance, if $A^I = \begin{bmatrix} [2, 3] & [0, 1] \\ [1, 2] & [2, 3] \end{bmatrix}$, then

$$[A^I]^{-1} = \frac{\text{adj}\,[A^I]}{\det[A^I]} = \frac{\begin{bmatrix} [2, 3] & -[0, 1] \\ -[1, 2] & [2, 3] \end{bmatrix}}{[2, 3][2, 3] - [0, 1][1, 2]} = \begin{bmatrix} [2/9, 3/2] & [-1/2, 0] \\ [-1, -1/9] & [2/9, 3/2] \end{bmatrix} .$$

Interval matrices (including vectors) obey many algebraic operational rules that are similar to those for intervals (cf. Exercise 10.2).
10.1.3 Rational Interval Functions
Let $S_1$ and $S_2$ be intervals in $R$ and $f: S_1 \to S_2$ be an ordinary one-variable real-valued (i.e., point-to-point) function. Denote by $\Sigma_{S_1}$ and $\Sigma_{S_2}$ the families of all subintervals of $S_1$ and $S_2$, respectively. The interval-to-interval function, $f^I: \Sigma_{S_1} \to \Sigma_{S_2}$, defined by

$$f^I(X) = \{f(x) \in S_2 : x \in X,\ X \in \Sigma_{S_1}\}$$

is called the united extension of the point-to-point function $f$ on $S_1$. Obviously, its range is

$$f^I(X) = \bigcup_{x \in X}\{f(x)\} ,$$

which is the union of all the subintervals of $S_2$ that contain the single point $f(x)$ for some $x \in X$. The following property of the united extension $f^I: \Sigma_{S_1} \to \Sigma_{S_2}$ follows immediately from the definition, namely,

$$X, Y \in \Sigma_{S_1} \ \text{ and } \ X \subseteq Y \quad \Longrightarrow \quad f^I(X) \subseteq f^I(Y) .$$

In general, an interval-to-interval function $F$ of $n$ variables, $X_1, \cdots, X_n$, is said to have the monotonic inclusion property, if

$$X_i \subseteq Y_i , \ i = 1, \cdots, n \quad \Longrightarrow \quad F(X_1, \cdots, X_n) \subseteq F(Y_1, \cdots, Y_n) .$$

Note that not all interval-to-interval functions have this property. However, all united extensions have the monotonic inclusion property. Since interval arithmetic functions are united extensions of the real arithmetic functions: addition, subtraction, multiplication and division ($+, -, \cdot\,, /$), interval arithmetic has the monotonic inclusion property, as previously discussed (cf. Theorem 10.1 and Corollary 10.1).

An interval-to-interval function will be called an interval function for simplicity. Interval vectors and interval matrices are similarly defined. An interval function is said to be rational, and so is called a rational interval function, if its values are defined by a finite sequence of interval arithmetic operations. Examples of rational interval functions include $X + Y^2 + Z^3$ and $(X^2 + Y^2)/Z$, etc., for intervals $X$, $Y$ and $Z$, provided that $0 \notin Z$ for the latter.

It follows from the transitivity of the partially ordered relation $\subseteq$ that all the rational interval functions have the monotonic inclusion property. This can be verified by mathematical induction.

Next, let $f = f(x_1, \cdots, x_n)$ be an ordinary $n$-variable real-valued function, and $X_1, \cdots, X_n$ be intervals. An interval function, $F = F(X_1, \cdots, X_n)$, is said to be an interval extension of $f$ if

$$F(x_1, \cdots, x_n) = f(x_1, \cdots, x_n) , \qquad \forall\, x_i \in X_i , \quad i = 1, \cdots, n .$$

Note also that not all the interval extensions have the monotonic inclusion property. The following result can be established (cf. Exercise 10.3):

Theorem 10.2. If $F$ is an interval extension of $f$ with the monotonic inclusion property, then the united extension $f^I$ of $f$ satisfies

$$f^I(X_1, \cdots, X_n) \subseteq F(X_1, \cdots, X_n) .$$

Since rational interval functions have the monotonic inclusion property, we have the following

Corollary 10.2. If $F$ is a rational interval function and is an interval extension of $f$, then

$$f^I(X_1, \cdots, X_n) \subseteq F(X_1, \cdots, X_n) .$$
This corollary provides a means of finite evaluation of upper and lower bounds on the value-range of an ordinary rational function over an $n$-dimensional rectangular domain in $R^n$. As an example of the monotonic inclusion property of rational interval functions, consider calculating the function

$$f^I(X, A) = \frac{AX}{I - X}$$

for two cases: $X_1 = [2, 3]$ with $A_1 = [0, 2]$, and $X_2 = [2, 4]$ with $A_2 = [0, 3]$, respectively. Here, $X_1 \subset X_2$ and $A_1 \subset A_2$. A direct calculation yields

$$f_1^I(X_1, A_1) = \frac{[0, 2] \cdot [2, 3]}{[1, 1] - [2, 3]} = [-6, 0]$$

and

$$f_2^I(X_2, A_2) = \frac{[0, 3] \cdot [2, 4]}{[1, 1] - [2, 4]} = [-12, 0] .$$

Here, we do have $f_1^I(X_1, A_1) \subset f_2^I(X_2, A_2)$, as expected.
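The two evaluations above, and the bound promised by Corollary 10.2, can be reproduced with a short sketch. Everything here is an assumption made for illustration: intervals are `(lower, upper)` tuples, the helper names are hypothetical, exact fractions are used to avoid rounding, and the true range of the point function $ax/(1 - x)$ is only sampled on a grid rather than computed analytically.

```python
from fractions import Fraction

def mul(x, y):
    """Interval product: min and max over the four endpoint products."""
    products = [a * b for a in x for b in y]
    return (min(products), max(products))

def sub(x, y):
    """Interval difference."""
    return (x[0] - y[1], x[1] - y[0])

def recip(x):
    """Interval reciprocal; undefined when the interval contains 0."""
    if x[0] <= 0 <= x[1]:
        raise ZeroDivisionError("interval contains zero")
    return (Fraction(1) / x[1], Fraction(1) / x[0])

def F(X, A):
    """Rational interval function A X / (I - X) with I = [1, 1]."""
    return mul(mul(A, X), recip(sub((1, 1), X)))

F1 = F((2, 3), (0, 2))
F2 = F((2, 4), (0, 3))
print(F1, F2)

# Sample the point function a x / (1 - x) over a in [0, 2], x in [2, 3].
samples = [Fraction(a, 10) * x / (1 - x)
           for a in range(0, 21)
           for x in (Fraction(2) + Fraction(i, 10) for i in range(11))]
print(min(samples), max(samples))
```

The sampled values stay inside the computed interval without filling it, which is the typical situation: the rational interval function gives guaranteed but not necessarily tight bounds on the range.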
We finally note that, based on Corollary 10.2, when we have interval division of the type $X^I/X^I$ where $X^I$ does not contain zero, we can first examine its corresponding ordinary function and operation to obtain $x/x = 1$, and then return to the interval setting for the final answer. Thus, symbolically, we may write $X^I/X^I = 1$ for an interval $X^I$ not containing zero. This is indeed a convention in interval calculations.

10.1.4 Interval Expectation and Variance
Let $f(x)$ be an ordinary function defined on an interval $X$. If $f$ satisfies the ordinary Lipschitz condition

$$|f(x) - f(y)| \le L\,|x - y|$$

for some positive constant $L$ which is independent of $x, y \in X$, then the united extension $f^I$ of $f$ is said to be a Lipschitz interval extension of $f$ over $X$.

Let $B(X)$ be a class of functions defined on $X$ that are most commonly used in numerical computation, such as the four arithmetic functions ($+, -, \cdot\,, /$) and the elementary type of functions like $e^{(\cdot)}$, $\ln(\cdot)$, $\sqrt{\cdot}$, etc. We will only use some of these commonly used functions throughout this chapter, so $B(X)$ is introduced only for notational convenience.

Let $N$ be a positive integer. Subdivide an interval $[a, b] \subseteq X$ into $N$ subintervals, $X_1 = [\underline{X}_1, \overline{X}_1], \cdots, X_N = [\underline{X}_N, \overline{X}_N]$, such that

$$a = \underline{X}_1 < \overline{X}_1 = \underline{X}_2 < \overline{X}_2 = \cdots = \underline{X}_N < \overline{X}_N = b .$$

Moreover, for any $f \in B(X)$, let $F$ be a Lipschitz interval extension of $f$ defined on all $X_i$, $i = 1, \cdots, N$. Assume that $F$ satisfies the monotonic inclusion property. Using the notation

$$S_N(F; [a, b]) = \frac{b - a}{N}\sum_{i=1}^{N} F(X_i) ,$$

we have

$$\int_a^b f(t)\,dt = \bigcap_{N=1}^{\infty} S_N(F; [a, b]) = \lim_{N \to \infty} S_N(F; [a, b]) .$$

Note that if we recursively define

$$\begin{cases} Y_1 = S_1 , \\ Y_{k+1} = S_{k+1} \cap Y_k , \qquad k = 1, 2, \cdots , \end{cases}$$

where $S_k = S_k(F; [a, b])$, then $\{Y_k\}$ is a nested sequence of intervals that converges to the exact value of the integral $\int_a^b f(t)\,dt$.

Note also that a Lipschitz interval extension $F$ used here has the property that $F(x)$ is a real number for any real number $x \in R$. However, for other interval functions that have the monotonic inclusion property but are not Lipschitz, the corresponding function $F(x)$ may have interval coefficients even if $x$ is a real number.

Next, based on the interval mathematics introduced above, we introduce the following important concept. Let $X$ be an interval of real-valued random variables of interest, and let

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\exp\left\{\frac{-(x - \mu_x)^2}{2\sigma_x^2}\right\} , \qquad x \in X ,$$

be an ordinary Gaussian density function with known $\mu_x$ and $\sigma_x > 0$. Then $f(x)$ has a Lipschitz interval extension, so that the interval expectation

$$E(X) = \int_{-\infty}^{\infty} xf(x)\,dx = \int_{-\infty}^{\infty}\frac{x}{\sqrt{2\pi}\,\sigma_x}\exp\left\{\frac{-(x - \mu_x)^2}{2\sigma_x^2}\right\}dx , \qquad x \in X , \tag{10.3}$$

and the interval variance

$$\text{Var}(X) = E([x - \mu_x]^2) = \int_{-\infty}^{\infty}\frac{(x - \mu_x)^2}{\sqrt{2\pi}\,\sigma_x}\exp\left\{\frac{-(x - \mu_x)^2}{2\sigma_x^2}\right\}dx , \qquad x \in X , \tag{10.4}$$

are both well defined. This can be easily verified based on the definite integral defined above, with $a \to -\infty$ and $b \to \infty$. Also, with respect to another real interval $Y$ of real-valued random variables, the conditional interval expectation

$$E(X \mid y \in Y) = \int_{-\infty}^{\infty} xf(x \mid y)\,dx = \int_{-\infty}^{\infty} x\,\frac{f(x, y)}{f(y)}\,dx = \int_{-\infty}^{\infty}\frac{x}{\sqrt{2\pi}\,\tilde{\sigma}}\exp\left\{\frac{-(x - \tilde{\mu})^2}{2\tilde{\sigma}^2}\right\}dx , \qquad x \in X , \tag{10.5}$$

and the conditional variance

$$\begin{aligned}
\text{Var}(X \mid y \in Y) &= E((x - \mu_x)^2 \mid y \in Y) \\
&= \int_{-\infty}^{\infty}[x - E(x \mid y \in Y)]^2 f(x \mid y)\,dx \\
&= \int_{-\infty}^{\infty}[x - E(x \mid y \in Y)]^2\,\frac{f(x, y)}{f(y)}\,dx \\
&= \int_{-\infty}^{\infty}\frac{[x - E(x \mid y \in Y)]^2}{\sqrt{2\pi}\,\tilde{\sigma}}\exp\left\{\frac{-(x - \tilde{\mu})^2}{2\tilde{\sigma}^2}\right\}dx , \qquad x \in X ,
\end{aligned} \tag{10.6}$$

are both well defined. This can be verified based on the same reasoning and the well-defined interval division operation (note that zero is not contained in the denominator for a Gaussian density interval function). In the above,

$$\tilde{\mu} = \mu_x + \sigma_{xy}^2(y - \mu_y)/\sigma_y^2 \qquad \text{and} \qquad \tilde{\sigma}^2 = \sigma_x^2 - \sigma_{xy}^2\sigma_{yx}^2/\sigma_y^2 ,$$

with

$$\sigma_{xy}^2 = \sigma_{yx}^2 = E(XY) - E(X)E(Y) = E(xy) - E(x)E(y) , \qquad x \in X .$$

Moreover, it can be verified (cf. Exercise 10.4) that

$$E(X \mid y \in Y) = E(x) + \sigma_{xy}^2[y - E(y)]/\sigma_y^2 , \qquad x \in X , \tag{10.7}$$

and

$$\text{Var}(X \mid y \in Y) = \text{Var}(x) - \sigma_{xy}^2\sigma_{yx}^2/\sigma_y^2 , \qquad x \in X . \tag{10.8}$$

Finally, we note that all these quantities are well-defined rational interval functions, so that Corollary 10.2 can be applied to them.
10.2 Interval Kalman Filtering

Now, return to the interval system (10.2). Observe that this system has an upper boundary system defined by all upper bounds of elements of its interval matrices:

$$\begin{cases} \mathbf{x}_{k+1} = [A_k + |\Delta A_k|]\,\mathbf{x}_k + [\Gamma_k + |\Delta\Gamma_k|]\,\boldsymbol{\xi}_k , \\ \mathbf{v}_k = [C_k + |\Delta C_k|]\,\mathbf{x}_k + \boldsymbol{\eta}_k , \end{cases} \tag{10.9}$$

and a lower boundary system using all lower bounds of the elements of its interval matrices:

$$\begin{cases} \mathbf{x}_{k+1} = [A_k - |\Delta A_k|]\,\mathbf{x}_k + [\Gamma_k - |\Delta\Gamma_k|]\,\boldsymbol{\xi}_k , \\ \mathbf{v}_k = [C_k - |\Delta C_k|]\,\mathbf{x}_k + \boldsymbol{\eta}_k . \end{cases} \tag{10.10}$$

We first point out that by performing the standard Kalman filtering algorithm for these two boundary systems, the resulting two filtering trajectories do not encompass all possible optimal solutions of the interval system (10.2) (cf. Exercise 10.5). As a matter of fact, there is no specific relation between these two boundary trajectories and the entire family of optimal filtering solutions: the two boundary trajectories and their neighboring ones generally cross each other due to the noise perturbations. Therefore, a new filtering algorithm that can provide all-inclusive estimates for the interval system is needed. The interval Kalman filtering scheme derived below serves this purpose.

10.2.1 The Interval Kalman Filtering Scheme
Recall the derivation of the standard Kalman filtering algorithm given in Chapter 3, in which only matrix algebraic operations (additions, subtractions, multiplications, and inversions) and (conditional) expectations and variances are used. Since all these operations are well defined for interval matrices and rational interval functions, as discussed in the last section, the same derivation can be carried out for interval systems in exactly the same way to yield a Kalman filtering algorithm for the interval system (10.2). This interval Kalman filtering algorithm is simply summarized as follows:
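The Interval Kalman Filtering Scheme

The main-process:

$$\begin{aligned}
\hat{\mathbf{x}}_0^I &= E(\mathbf{x}_0^I) , \\
\hat{\mathbf{x}}_k^I &= A_{k-1}^I\hat{\mathbf{x}}_{k-1}^I + G_k^I\big[\mathbf{v}_k^I - C_k^IA_{k-1}^I\hat{\mathbf{x}}_{k-1}^I\big] , \qquad k = 1, 2, \cdots .
\end{aligned} \tag{10.11}$$

The co-process:

$$\begin{aligned}
P_0^I &= \text{Var}(\mathbf{x}_0^I) , \\
M_{k-1}^I &= A_{k-1}^IP_{k-1}^I\big[A_{k-1}^I\big]^T + \Gamma_{k-1}^IQ_{k-1}\big[\Gamma_{k-1}^I\big]^T , \\
G_k^I &= M_{k-1}^I\big[C_k^I\big]^T\Big[\big[C_k^I\big]M_{k-1}^I\big[C_k^I\big]^T + R_k\Big]^{-1} , \\
P_k^I &= \big[I - G_k^IC_k^I\big]M_{k-1}^I\big[I - G_k^IC_k^I\big]^T + \big[G_k^I\big]R_k\big[G_k^I\big]^T , \qquad k = 1, 2, \cdots .
\end{aligned} \tag{10.12}$$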
A comparison of this algorithm with the standard Kalman filtering scheme (3.25) reveals that they are exactly the same in form, except that all matrices and vectors in (10.11)-(10.12) are intervals. As a result, the interval estimate trajectory will diverge rather quickly. However, this is due to the conservative interval modeling and not to the new filtering algorithm. It should be noted that, in theory, this interval Kalman filtering algorithm is optimal for the interval system (10.2), in the same sense as the standard Kalman filtering scheme, since no approximation is needed in its derivation. The filtering result produced by the interval Kalman filtering scheme is a sequence of interval estimates, $\{\hat{\mathbf{x}}_k^I\}$, that encompasses all possible optimal estimates $\{\hat{\mathbf{x}}_k\}$ of the state vectors $\{\mathbf{x}_k\}$ which the interval system may generate. Hence, the filtering result produced by this interval Kalman filtering scheme is inclusive but generally conservative, in the sense that the range of interval estimates is often unnecessarily wide in order to include all possible optimal solutions. It should also be remarked that, just like the random vector (the measurement data) $\mathbf{v}_k$ in the ordinary case, the interval data vector $\mathbf{v}_k^I$ shown in the interval Kalman filtering scheme above is an uncertain interval vector before its realization (i.e., before the data are actually obtained), but will be an ordinary constant vector after it has been measured and obtained. This should avoid possible confusion in implementing the algorithm.
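To make the scheme concrete, here is a minimal sketch for a scalar system, offered under several assumptions that are not in the text: the state and measurement are one-dimensional, the only uncertain quantity is the system coefficient, all numerical values and the measurement sequence are made up, and the `Interval` class is a naive floating-point implementation without outward rounding. The same update function is run once with an interval coefficient and once with its nominal value.

```python
class Interval:
    """Naive closed interval with float endpoints (no outward rounding)."""

    def __init__(self, lo, hi=None):
        self.lo = float(lo)
        self.hi = float(lo if hi is None else hi)

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Interval) else Interval(x)

    def __add__(self, other):
        o = self._wrap(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)
    __radd__ = __add__

    def __sub__(self, other):
        o = self._wrap(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, other):
        return self._wrap(other) - self

    def __mul__(self, other):
        o = self._wrap(other)
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))
    __rmul__ = __mul__

    def __truediv__(self, other):
        o = self._wrap(other)
        if o.lo <= 0 <= o.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1 / o.hi, 1 / o.lo)

    def __contains__(self, x):
        return self.lo <= x <= self.hi

    def __repr__(self):
        return f"[{self.lo:.4f}, {self.hi:.4f}]"


def kalman_step(x, p, v, a, c, q, r):
    """One step of the scalar filter; works for floats and Intervals alike."""
    m = a * p * a + q                       # prediction variance
    g = m * c / (c * m * c + r)             # gain
    x_new = a * x + g * (v - c * a * x)     # main-process
    p_new = (1 - g * c) * m * (1 - g * c) + g * r * g   # co-process
    return x_new, p_new


a_nominal, a_interval = 1.0, Interval(0.95, 1.05)   # hypothetical values
c, q, r = 1.0, 0.1, 0.1
measurements = [1.1, 0.9, 1.2]                       # made-up data

x_pt, p_pt = 1.0, 0.5
x_iv, p_iv = Interval(1.0), Interval(0.5)
contained = []
for v in measurements:
    x_pt, p_pt = kalman_step(x_pt, p_pt, v, a_nominal, c, q, r)
    x_iv, p_iv = kalman_step(x_iv, p_iv, v, a_interval, c, q, r)
    contained.append(x_pt in x_iv and p_pt in p_iv)
    print(x_pt, x_iv, p_pt, p_iv)
```

Only a few steps are run on purpose: with naive interval arithmetic the bounds widen quickly, and after a handful of further steps the innovation variance interval can come to contain zero, at which point the division is undefined.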
10.2.2 Suboptimal Interval Kalman Filter
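To improve the computational efficiency, appropriate approximations of the interval Kalman filtering algorithm (10.11)-(10.12) may be applied. In this subsection, we suggest a suboptimal interval Kalman filtering scheme, by replacing its interval matrix inversion with its worst-case inversion, while keeping everything else unchanged. To do so, let

$$C_k^I = C_k + \Delta C_k \qquad \text{and} \qquad M_{k-1}^I = M_{k-1} + \Delta M_{k-1} ,$$

where $C_k$ is the center point of $C_k^I$ and $M_{k-1}$ is the center point of $M_{k-1}^I$ (i.e., the nominal values of the interval matrices). Write

$$\begin{aligned}
\Big[\big[C_k^I\big]M_{k-1}^I\big[C_k^I\big]^T + R_k\Big]^{-1}
&= \Big[\big[C_k + \Delta C_k\big]\big[M_{k-1} + \Delta M_{k-1}\big]\big[C_k + \Delta C_k\big]^T + R_k\Big]^{-1} \\
&= \big[C_kM_{k-1}C_k^T + \Delta R_k\big]^{-1} ,
\end{aligned}$$

where

$$\begin{aligned}
\Delta R_k ={}& C_kM_{k-1}[\Delta C_k]^T + C_k[\Delta M_{k-1}]C_k^T + C_k[\Delta M_{k-1}][\Delta C_k]^T + [\Delta C_k]M_{k-1}C_k^T \\
& + [\Delta C_k]M_{k-1}[\Delta C_k]^T + [\Delta C_k][\Delta M_{k-1}]C_k^T + [\Delta C_k][\Delta M_{k-1}][\Delta C_k]^T + R_k .
\end{aligned}$$

Then, in the algorithm (10.11)-(10.12), replace $\Delta R_k$ by its upper bound matrix, $|\Delta R_k|$, which consists of all the upper bounds of the interval elements of $\Delta R_k = \big[[-r_k(i, j),\ r_k(i, j)]\big]$, namely,

$$|\Delta R_k| = \big[r_k(i, j)\big] . \tag{10.13}$$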
We should note that this I~Rkl is an ordinary (non-interval) matrix, so that when the ordinary inverse matrix [CkMk-1C~ + I~Rkl] -1 is used to replace the interval matrix inverse [[C£J M£_l [Cl]T + R k ] -1, the matrix inversion becomes much easier. More importantly, when the perturbation matrix ~Ck = 0 in (10.13), meaning that the measurement equation in system (10.2) is as accurate as the nominal system model (10.1), we have I~Rkl = Rk. Thus, by replacing ~Rk with I~Rkl, we obtain the following suboptimal interval Kalman filtering scheme.
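A Suboptimal Interval Kalman Filtering Scheme

The main-process:

$$\begin{aligned}
\hat{\mathbf{x}}_0^I &= E(\mathbf{x}_0^I) , \\
\hat{\mathbf{x}}_k^I &= A_{k-1}^I\hat{\mathbf{x}}_{k-1}^I + G_k^I\big[\mathbf{v}_k^I - C_k^IA_{k-1}^I\hat{\mathbf{x}}_{k-1}^I\big] , \qquad k = 1, 2, \cdots .
\end{aligned} \tag{10.14}$$

The co-process:

$$\begin{aligned}
P_0^I &= \text{Var}(\mathbf{x}_0^I) , \\
M_{k-1}^I &= A_{k-1}^IP_{k-1}^I\big[A_{k-1}^I\big]^T + \Gamma_{k-1}^IQ_{k-1}\big[\Gamma_{k-1}^I\big]^T , \\
G_k^I &= M_{k-1}^I\big[C_k^I\big]^T\big[C_kM_{k-1}C_k^T + |\Delta R_k|\big]^{-1} , \\
P_k^I &= \big[I - G_k^IC_k^I\big]M_{k-1}^I\big[I - G_k^IC_k^I\big]^T + \big[G_k^I\big]R_k\big[G_k^I\big]^T , \qquad k = 1, 2, \cdots .
\end{aligned} \tag{10.15}$$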
Finally, we remark that the worst-case matrix $|\Delta R_k|$ given in (10.13) contains the largest possible perturbations and is in some sense the "best" matrix that yields a numerically stable inverse. Another possible approximation is, if $\Delta C_k$ is small, to simply use $|\Delta R_k| \approx R_k$. For some specific systems such as the radar tracking system to be discussed in the next subsection, special techniques are also possible to improve the speed and/or accuracy in performing suboptimal interval filtering.

10.2.3 An Example of Target Tracking
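In this subsection, we show a computer simulation comparing the interval Kalman filtering with the standard one, for a simplified version of the radar tracking system (3.26), see also (4.22), (5.22), and (6.43),

$$\begin{cases} \mathbf{x}_{k+1} = \begin{bmatrix} 1 & h^I \\ 0 & 1 \end{bmatrix}\mathbf{x}_k + \boldsymbol{\xi}_k , \\ v_k = [\,1\ \ 0\,]\,\mathbf{x}_k + \eta_k , \end{cases} \tag{10.16}$$

where basic assumptions are the same as those stated for the system (10.2). Here, the system has uncertainty in an interval entry:

$$h^I = [h - \Delta h,\ h + \Delta h] = [0.01 - 0.001,\ 0.01 + 0.001] = [0.009, 0.011] ,$$

in which the modeling error $\Delta h$ was taken to be 10% of the nominal value of $h = 0.01$. Suppose that the other given data are:

$$E(\mathbf{x}_0) = \begin{bmatrix} x_{01} \\ x_{02} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} , \qquad \text{Var}(\mathbf{x}_0) = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.0 \\ 0.0 & 0.5 \end{bmatrix} ,$$

$$Q_k = \begin{bmatrix} q & 0 \\ 0 & q \end{bmatrix} = \begin{bmatrix} 0.1 & 0.0 \\ 0.0 & 0.1 \end{bmatrix} , \qquad R_k = r = 0.1 .$$

For this model, using the interval Kalman filtering algorithm (10.11)-(10.12), we have

$$M_{k-1}^I = \begin{bmatrix} h^I\big[2P_{k-1}^I(1,0) + h^IP_{k-1}^I(1,1)\big] + P_{k-1}^I(0,0) + q & P_{k-1}^I(0,1) + h^IP_{k-1}^I(1,1) \\ P_{k-1}^I(1,0) + h^IP_{k-1}^I(1,1) & P_{k-1}^I(1,1) + q \end{bmatrix} := \begin{bmatrix} M_{k-1}^I(0,0) & M_{k-1}^I(0,1) \\ M_{k-1}^I(1,0) & M_{k-1}^I(1,1) \end{bmatrix} ,$$

$$G_k^I = \begin{bmatrix} 1 - r/\big(M_{k-1}^I(0,0) + r\big) \\ M_{k-1}^I(1,0)/\big(M_{k-1}^I(0,0) + r\big) \end{bmatrix} := \begin{bmatrix} G_{k,1}^I \\ G_{k,2}^I \end{bmatrix} ,$$

$$P_k^I = \begin{bmatrix} rG_{k,1}^I & rG_{k,2}^I \\ rG_{k,2}^I & q + \Big[P_{k-1}^I(1,1)\big[P_{k-1}^I(0,0) + q + r\big] - \big[P_{k-1}^I(0,1)\big]^2\Big]\Big/\big(M_{k-1}^I(0,0) + r\big) \end{bmatrix} := \begin{bmatrix} P_k^I(0,0) & P_k^I(0,1) \\ P_k^I(1,0) & P_k^I(1,1) \end{bmatrix} .$$

In the above, the matrices $M_{k-1}^I$ and $P_k^I$ are both symmetric. Hence, $M_{k-1}^I(0,1) = M_{k-1}^I(1,0)$ and $P_{k-1}^I(0,1) = P_{k-1}^I(1,0)$. It follows from the filtering algorithm that

$$\begin{bmatrix} \hat{x}_{k,1}^I \\ \hat{x}_{k,2}^I \end{bmatrix} = \begin{bmatrix} \big[r(\hat{x}_{k-1,1}^I + h^I\hat{x}_{k-1,2}^I) + M_{k-1}^I(0,0)\,y_k\big]\big/\big(M_{k-1}^I(0,0) + r\big) \\ \hat{x}_{k-1,2}^I + G_{k,2}^I\big(y_k - \hat{x}_{k-1,1}^I - h^I\hat{x}_{k-1,2}^I\big) \end{bmatrix} ,$$

where $y_k$ denotes the measurement at step $k$.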
The simulation results for Xk,l of this interval Kalman filtering versus the standard Kalman filtering, where the latter used the nominal value of h, are shown and compared in both Figures 10.1 and 10.2. From these two figures we can see that the new scheme (10.11)-(10.12) produces the upper and lower boundaries of the single estimated curve obtained by the standard Kalman filtering algorithm (3.25), and these two boundaries encompass all possible optimal estimates of the interval system (10.2).
Fig. 10.1.

Fig. 10.2.
10.3 Weighted-Average Interval Kalman Filtering
As can be seen from Figures 10.1 and 10.2, the interval Kalman filtering scheme produces the upper and lower boundaries for all possible optimal trajectories obtained by using the standard Kalman filtering algorithm. It can also be seen that as the iterations continue, the two boundaries are expanding. Here, it should be emphasized once again that this seemingly divergent result is not caused by the filtering algorithm, but rather, by the iterations of the interval system model. That is, the upper and lower trajectory boundaries of the interval system keep expanding by themselves even if there is no noise in the model and no filtering is performed. Hence, this phenomenon is inherent in interval systems, even though interval modeling is a natural and convenient way of describing uncertainties in dynamical systems.

To avoid this divergence while using interval system models, a practical approach is to use a weighted average of all possible optimal estimate trajectories encompassed by the two boundaries. An even more convenient way is to simply use a weighted average of the two boundary estimates. For instance, taking a certain weighted average of the two interval filtering trajectories in Figures 10.1 and 10.2 gives the results shown in Figures 10.3 and 10.4, respectively.

Finally, it is very important to remark that this averaging is by nature different from the averaging of two standard Kalman filtering trajectories produced by using the two (upper and lower) boundary systems (10.9) and (10.10), respectively. The main reason is that the two boundaries of the filtering trajectories here, as shown in Figures 10.1 and 10.2, encompass all possible optimal estimates, but the standard Kalman filtering trajectories obtained from the two boundary systems do not cover all solutions (as pointed out above, cf. Exercise 10.5).
Fig. 10.3. (Horizontal axis: $k$.)

Fig. 10.4. (Horizontal axis: $k$.)
Exercises

10.1. For three intervals $X$, $Y$, and $Z$, verify that
$$X + Y = Y + X, \qquad Z + (X + Y) = (Z + X) + Y,$$
$$XY = YX, \qquad Z(XY) = (ZX)Y,$$
$$X + 0 = 0 + X = X \quad\text{and}\quad X0 = 0X = 0, \quad\text{where } 0 = [0, 0],$$
$$XI = IX = X, \quad\text{where } I = [1, 1],$$
$$Z(X + Y) \subseteq ZX + ZY,$$
where $=$ holds only if (a) $Z = [z, z]$; (b) $X = Y = 0$; (c) $xy \ge 0$ for all $x \in X$ and $y \in Y$.
10.2. Let $A$, $B$, $C$ be ordinary constant matrices and $A^I$, $B^I$, $C^I$ be interval matrices, respectively, of appropriate dimensions. Show that
(a) $A^I \pm B^I = \{A \pm B \mid A \in A^I,\ B \in B^I\}$;
(b) $A^I B = \{AB \mid A \in A^I\}$;
(c) $A^I + B^I = B^I + A^I$;
(d) $A^I + (B^I + C^I) = (A^I + B^I) + C^I$;
(e) $A^I + 0 = 0 + A^I = A^I$;
(f) $A^I I = I A^I = A^I$;
(g) Subdistributive law:
(g.1) $(A^I + B^I)C^I \subseteq A^I C^I + B^I C^I$;
(g.2) $C^I(A^I + B^I) \subseteq C^I A^I + C^I B^I$;
(h) $(A^I + B^I)C = A^I C + B^I C$;
(i) $C(A^I + B^I) = C A^I + C B^I$;
(j) Associative and subassociative laws:
(j.1) $A^I(BC) \subseteq (A^I B)C$;
(j.2) $(A B^I)C^I \subseteq A(B^I C^I)$ if $C^I = -C^I$;
(j.3) $A(B^I C) = (A B^I)C$;
(j.4) $A^I(B^I C^I) = (A^I B^I)C^I$, if $B^I = -B^I$ and $C^I = -C^I$.

10.3. Prove Theorem 10.2.

10.4. Verify formulas (10.7) and (10.8).

10.5. Carry out a simple one-dimensional computer simulation to show that by performing the standard Kalman filtering algorithm for the two boundary systems (10.9)-(10.10) of the interval system (10.2), the two resulting filtering trajectories
do not encompass all possible optimal estimation solutions for the interval system.

10.6. Consider the target tracking problem of tracking an uncertain incoming ballistic missile. This physical problem is described by the following simplified interval model:
$$\begin{cases} x_{k+1} = A_k x_k + \xi_k \\ v_k = C_k^I x_k + \eta_k, \end{cases}$$
where
$$A_{11} = A_{22} = A_{33} = 1,$$
$$A_{44} = -\frac{1}{2} g x_7 \frac{z^2 + x_4^2}{z}, \qquad A_{45} = A_{54} = -\frac{1}{2} g \frac{x_7 x_4 x_5}{z}, \qquad A_{46} = A_{64} = -\frac{1}{2} g \frac{x_7 x_4 x_6}{z},$$
$$A_{47} = -\frac{1}{2} g x_4 z, \qquad A_{55} = -\frac{1}{2} g x_7 \frac{z^2 + x_5^2}{z}, \qquad A_{56} = A_{65} = -\frac{1}{2} g \frac{x_7 x_5 x_6}{z},$$
$$A_{57} = -\frac{1}{2} g x_5 z, \qquad A_{66} = -\frac{1}{2} g x_7 \frac{z^2 + x_6^2}{z}, \qquad A_{67} = -\frac{1}{2} g x_6 z,$$
$$A_{76} = -K_1 x_7, \qquad A_{77} = -K_1 x_6,$$
with all other $A_{ij} = 0$, where $K_1$ is an uncertain system parameter, $g$ is the gravity constant, $z = \sqrt{x_4^2 + x_5^2 + x_6^2}$, and
$$C = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
Both the dynamical and measurement noise sequences are zero-mean Gaussian, mutually independent, with covariances $\{Q_k\}$ and $\{R_k\}$, respectively. Perform interval Kalman filtering on this model, using the following data set:
$$g = 0.981, \qquad K_1 = [2.3 \times 10^{-5},\ 3.5 \times 10^{-5}],$$
$$x_0^I = [3.2 \times 10^5,\ 3.2 \times 10^5,\ 2.1 \times 10^5,\ -1.5 \times 10^4,\ -1.5 \times 10^4,\ -8.1 \times 10^3,\ 5 \times 10^{-10}]^T,$$
$$P_0^I = \mathrm{diag}\{10^6,\ 10^6,\ 10^6,\ 10^6,\ 10^6,\ 1.286 \times 10^{-13} \exp\{-23.616\}\},$$
$$Q_k = \frac{1}{k+1}\, \mathrm{diag}\{0,\ 0,\ 0,\ 100,\ 100,\ 100,\ 2.0 \times 10^{-18}\},$$
$$R_k = \frac{1}{k+1}\, \mathrm{diag}\{150,\ 150,\ 150\}.$$
11. Wavelet Kalman Filtering
In addition to the Kalman filtering algorithms discussed in the previous chapters, there are other computational schemes available for digital filtering performed in the time domain. Among them, perhaps the most exciting ones are wavelet algorithms, which are useful tools for multichannel signal processing (e.g., estimation or filtering) and multiresolution signal analysis. This chapter is devoted to introducing this effective technique of wavelet Kalman filtering by means of a specific application for illustration: simultaneous estimation and decomposition of random signals via a filter-bank-based Kalman filtering approach using wavelets.

11.1 Wavelet Preliminaries
The notion of wavelets was first introduced in the early 1980s, as a family of functions generated by a single function, called the "basic wavelet," by two simple operations: translation and scaling. Let $\psi(t)$ be such a basic wavelet. Then, with a scaling constant, $a$, and a translation constant, $b$, we obtain a family of wavelets of the form $\psi((t - b)/a)$. Using this family of wavelets as the integral kernel to define an integral transform, called the integral wavelet transform (IWT):
$$(W_\psi f)(b, a) = |a|^{-1/2} \int_{-\infty}^{\infty} f(t)\, \overline{\psi((t - b)/a)}\, dt, \qquad f \in L^2, \tag{11.1}$$
we can analyze the functions (or signals) $f(t)$ at different positions and at different scales according to the values of $a$ and $b$. Note that the wavelet $\psi(t)$ acts as a time-window function whose "width" narrows as the value of the "scale" $a$ in (11.1) decreases. Hence, if the frequency, $\omega$ in its Fourier transform $\hat\psi(\omega)$, is defined to be inversely proportional to the scale $a$, then the width of the time window induced by $\psi(t)$ narrows for studying high-frequency objects and widens for observing low-frequency situations. In addition, if the basic wavelet $\psi(t)$ is so chosen that its Fourier transform is also a window function, then the IWT, $(W_\psi f)(b, a)$, can be used for time-frequency localization and analysis of $f(t)$ around $t = b$ on a frequency band defined by the scale $a$.
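As an illustration of (11.1), the following sketch approximates the IWT by a midpoint Riemann sum. It assumes a real-valued basic wavelet (so that complex conjugation plays no role) and uses the Haar wavelet as an example; the function names, the integration interval, and the grid size are hypothetical choices made here, not part of the text.

```python
import numpy as np


def haar_wavelet(t):
    """Haar basic wavelet: +1 on [0, 1/2), -1 on [1/2, 1), and 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    plus = np.where((t >= 0.0) & (t < 0.5), 1.0, 0.0)
    minus = np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0)
    return plus - minus


def iwt(f, psi, b, a, t_min=-10.0, t_max=10.0, n=400_000):
    """Approximate (W_psi f)(b, a) of (11.1) on [t_min, t_max] by a midpoint sum."""
    dt = (t_max - t_min) / n
    t = t_min + (np.arange(n) + 0.5) * dt        # midpoints of the grid cells
    return abs(a) ** -0.5 * np.sum(f(t) * psi((t - b) / a)) * dt


# The window psi((t - b) / a) is supported on [b, b + a): a smaller scale a
# gives a narrower time window around t = b.
value = iwt(lambda t: t, haar_wavelet, b=1.0, a=0.5)
```

For $f(t) = t$ and the Haar wavelet, a direct calculation gives $(W_\psi f)(b, a) = -a^{3/2}/4$ for $a > 0$, independently of $b$, which can be used to check the numerical value.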
11.1.1 Wavelet Fundamentals
An elegant approach to studying wavelets is via "multiresolution analysis." Let $L^2$ denote the space of real-valued finite-energy functions in the continuous-time domain $(-\infty, \infty)$, in which the inner product is defined by $\langle f, g \rangle = \int_{-\infty}^{\infty} f(t) g(t)\, dt$ and the norm is defined by $\|f\|_{L^2} = \sqrt{\langle f, f \rangle}$. A nested sequence $\{V_k\}$ of closed subspaces of $L^2$ is said to form a multiresolution analysis of $L^2$ if there exists some window function (cf. Exercise 11.1), $\phi(t) \in L^2$, which satisfies the following properties:
(i) for each integer $k$, the set
$$\{\phi_{kj}(t) :\ j = \cdots, -1, 0, 1, \cdots\}$$
is an unconditional basis of $V_k$, namely, its linear span is dense in $V_k$ and for each $k$,
$$\alpha\, \|\{c_j\}\|_{\ell^2}^2 \le \Big\| \sum_{j=-\infty}^{\infty} c_j \phi_{kj} \Big\|_{L^2}^2 \le \beta\, \|\{c_j\}\|_{\ell^2}^2 \tag{11.2}$$
for all $\{c_j\} \in \ell^2$, where $\|\{c_j\}\|_{\ell^2} = \sqrt{\sum_{j=-\infty}^{\infty} |c_j|^2}$;
(ii) the union of $V_k$ is dense in $L^2$;
(iii) the intersection of all $V_k$'s is the zero function; and
(iv) $f(t) \in V_k$ if and only if $f(2t) \in V_{k+1}$.
Let $W_k$ be the orthogonal complementary subspace of $V_{k+1}$ relative to $V_k$, and we use the notation
$$V_{k+1} = V_k \oplus W_k. \tag{11.3}$$
Then it is clear that $W_k \perp W_n$ for all $k \ne n$, and the entire space $L^2$ is an orthogonal sum of the spaces $W_k$, namely:
$$L^2 = \bigoplus_{k=-\infty}^{\infty} W_k. \tag{11.4}$$
Suppose that there is a function, $\psi(t) \in W_0$, such that both $\psi(t)$ and its Fourier transform $\hat\psi(\omega)$ have sufficiently fast decay at $\pm\infty$ (cf. Exercise 11.2), and that for each integer $k$,
$$\{\psi_{kj}(t) :\ j = \cdots, -1, 0, 1, \cdots\} \tag{11.5}$$
is an unconditional basis of $W_k$. Then $\psi(t)$ is called a wavelet (cf. Exercise 11.3). Let $\tilde\psi(t) \in W_0$ be the dual of $\psi(t)$, in the sense that
$$\int_{-\infty}^{\infty} \tilde\psi(t - i)\, \psi(t - j)\, dt = \delta_{ij}, \qquad i, j = \cdots, -1, 0, 1, \cdots.$$
Then both $\tilde\psi(t)$ and its Fourier transform $\hat{\tilde\psi}(\omega)$ are window functions, in the time and frequency domains, respectively. If $\tilde\psi(t)$ is used as a basic wavelet in the definition of the IWT, then real-time algorithms are available to determine $(W_{\tilde\psi} f)(b, a)$ at the dyadic time instants $b = j/2^k$ on the $k$th frequency bands defined by the scale $a = 2^{-k}$. Also, $f(t)$ can be reconstructed in real-time from information of $(W_{\tilde\psi} f)(b, a)$ at these dyadic data values. More precisely, by defining
$$\psi_{k,j}(t) = 2^{k/2}\, \psi(2^k t - j), \tag{11.6}$$
any function $f(t) \in L^2$ can be expressed as a wavelet series
$$f(t) = \sum_{k=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} d_j^k\, \psi_{k,j}(t) \tag{11.7}$$
with
$$d_j^k = (W_{\tilde\psi} f)(j 2^{-k},\ 2^{-k}). \tag{11.8}$$
According to (11.3), there exist two sequences $\{a_j\}$ and $\{b_j\}$ in $\ell^2$ such that
$$\phi(2t - \ell) = \sum_{j=-\infty}^{\infty} \big[ a_{\ell - 2j}\, \phi(t - j) + b_{\ell - 2j}\, \psi(t - j) \big] \tag{11.9}$$
for all integers $\ell$; and it follows from property (i) and (11.3) that two sequences $\{p_j\}$ and $\{q_j\}$ in $\ell^2$ are uniquely determined such that
$$\phi(t) = \sum_{j=-\infty}^{\infty} p_j\, \phi(2t - j) \tag{11.10}$$
and
$$\psi(t) = \sum_{j=-\infty}^{\infty} q_j\, \phi(2t - j). \tag{11.11}$$
The pair of sequences $(\{a_j\}, \{b_j\})$ yields a pyramid algorithm for finding the IWT values $\{d_j^k\}$; while the pair of sequences $(\{p_j\}, \{q_j\})$
yields a pyramid algorithm for reconstructing $f(t)$ from the IWT values $\{d_j^k\}$. We remark that if $\phi(t)$ is chosen as a B-spline function, then a compactly supported wavelet $\psi(t)$ is obtained such that its dual $\tilde\psi(t)$ has exponential decay at $\pm\infty$ and the IWT with respect to $\tilde\psi(t)$ has linear phase. Moreover, both sequences $\{p_j\}$ and $\{q_j\}$ are finite, while $\{a_j\}$ and $\{b_j\}$ have very fast exponential decay. It should be noted that the spline wavelets $\psi(t)$ and $\tilde\psi(t)$ have explicit formulations and can be easily implemented.

11.1.2 Discrete Wavelet Transform and Filter Banks
For a given sequence of scalar deterministic signals, $\{x(i, n)\} \in \ell^2$, at a fixed resolution level $i$, a lower resolution signal can be derived by lowpass filtering with a halfband lowpass filter having an impulse response $\{h(n)\}$. More precisely, a sequence of the lower resolution signal (indicated by an index $L$) is obtained by downsampling the output of the lowpass filter by two, namely, $h(n) \to h(2n)$, so that
$$x_L(i - 1, n) = \sum_{k=-\infty}^{\infty} h(2n - k)\, x(i, k). \tag{11.12}$$
Here, (11.12) defines a mapping from $\ell^2$ to itself. The wavelet coefficients, as a complement to $x_L(i - 1, n)$, will be denoted by $\{x_H(i - 1, n)\}$, which can be computed by first using a highpass filter with an impulse response $\{g(n)\}$ and then downsampling the output of the highpass filtering by two. This yields
$$x_H(i - 1, n) = \sum_{k=-\infty}^{\infty} g(2n - k)\, x(i, k). \tag{11.13}$$
The original signal $\{x(i, n)\}$ can be recovered from the two filtered and downsampled (lower resolution) signals $\{x_L(i - 1, n)\}$ and $\{x_H(i - 1, n)\}$. Filters $\{h(n)\}$ and $\{g(n)\}$ must meet some constraints in order to produce a perfect reconstruction for the signal. The most important constraint is that the filter impulse responses form an orthonormal set. For this reason, (11.12) and (11.13) together can be considered as a decomposition of the original signal onto an orthonormal basis, and the reconstruction
$$x(i, n) = \sum_{k=-\infty}^{\infty} h(2k - n)\, x_L(i - 1, k) + \sum_{k=-\infty}^{\infty} g(2k - n)\, x_H(i - 1, k) \tag{11.14}$$
can be considered as a sum of orthogonal projections. The operation defined by (11.12) and (11.13) is called the discrete (forward) wavelet transform, while the discrete inverse wavelet transform is defined by (11.14). To be implementable, we use FIR (finite impulse response) filters for both $\{h(n)\}$ and $\{g(n)\}$ (namely, these two sequences have finite length, $L$), and we require that
$$g(n) = (-1)^n\, h(L - 1 - n), \tag{11.15}$$
where $L$ must be even under this relation. Clearly, once the lowpass filter $\{h(n)\}$ is determined, the highpass filter is also determined. The discrete wavelet transform can be implemented by an octave-band filter bank, as shown in Figure 11.1 (b), where only three levels are depicted. The dimensions of different decomposed signals at different levels are shown in Figure 11.1 (a).

Fig. 11.1. (a) Decomposed signals; (b) a two-channel filter bank, in which a circled symbol denotes downsampling by 2. (Diagram not reproduced.)
For a sequence of deterministic signals with finite length, it is more convenient to describe the wavelet transform in an operator form. Consider a sequence of signals at resolution level $i$ with length $M$:
$$X_k^i = [x(i, k - M + 1),\ x(i, k - M + 2),\ \cdots,\ x(i, k)]^T.$$
Formulas (11.12) and (11.13) can be written in the following operator form:
$$X_{kL}^{i-1} = H^{i-1} X_k^i \qquad\text{and}\qquad X_{kH}^{i-1} = G^{i-1} X_k^i,$$
where operators $H^{i-1}$ and $G^{i-1}$ are composed of lowpass and highpass filter responses [the $\{h(n)\}$ and $\{g(n)\}$ in (11.12) and (11.13)], mapping from level $i$ to level $i - 1$. Similarly, when mapping from level $i - 1$ to level $i$, (11.14) can be written in operator form as (cf. Exercise 11.4)
$$X_k^i = (H^{i-1})^T X_{kL}^{i-1} + (G^{i-1})^T X_{kH}^{i-1}. \tag{11.16}$$
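On the other hand, the orthogonality constraint can also be expressed in operator form as
$$(H^{i-1})^T H^{i-1} + (G^{i-1})^T G^{i-1} = I$$
and
$$\begin{bmatrix} H^{i-1}(H^{i-1})^T & H^{i-1}(G^{i-1})^T \\ G^{i-1}(H^{i-1})^T & G^{i-1}(G^{i-1})^T \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}.$$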
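The operator relations above can be checked numerically. The following is a minimal sketch rather than an implementation taken from the text: it assumes that the length-$M$ block is extended periodically (an assumption made here so that the finite operators are exactly orthogonal), uses the Haar filter as an example, and the function name `analysis_operators` is hypothetical.

```python
import numpy as np


def analysis_operators(h, M):
    """Build H and G (each M/2 x M) from (11.12), (11.13), and (11.15).

    Assumption: the block is extended periodically, so the signal index k
    in h(2n - k) is taken modulo M.
    """
    h = np.asarray(h, dtype=float)
    L = len(h)
    if L % 2 or M % 2 or M < L:
        raise ValueError("L and M must be even, with M >= L")
    # Highpass filter from (11.15): g(n) = (-1)^n h(L - 1 - n)
    g = np.array([(-1) ** n * h[L - 1 - n] for n in range(L)])
    H = np.zeros((M // 2, M))
    G = np.zeros((M // 2, M))
    for n in range(M // 2):
        for m in range(L):              # m = 2n - k, hence k = 2n - m
            k = (2 * n - m) % M
            H[n, k] += h[m]
            G[n, k] += g[m]
    return H, G


haar = [1.0 / np.sqrt(2.0), 1.0 / np.sqrt(2.0)]
H, G = analysis_operators(haar, 8)
x = np.arange(8, dtype=float)
x_low, x_high = H @ x, G @ x               # forward transform, (11.12)-(11.13)
x_rec = H.T @ x_low + G.T @ x_high         # inverse transform, (11.14)/(11.16)
```

With an orthonormal filter pair, `x_rec` reproduces `x`, and both orthogonality relations in operator form hold for `H` and `G`.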
A simultaneous multilevel signal decomposition can be carried out by a filter bank. For instance, to decompose $X_k^i$ into three levels, as shown in Figure 11.1, the following composite transform can be applied:
$$\begin{bmatrix} X_{kL}^{i-3} \\ X_{kH}^{i-3} \\ X_{kH}^{i-2} \\ X_{kH}^{i-1} \end{bmatrix} = T^{i-3|i}\, X_k^i,$$
where
$$T^{i-3|i} = \begin{bmatrix} H^{i-3} H^{i-2} H^{i-1} \\ G^{i-3} H^{i-2} H^{i-1} \\ G^{i-2} H^{i-1} \\ G^{i-1} \end{bmatrix}$$
is an orthogonal matrix, simultaneously mapping $X_k^i$ onto the three levels of the filter bank.
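The composite transform can be assembled explicitly for a short block. The sketch below assumes Haar filters acting on non-overlapping pairs of samples and a block of length $M = 8$; the helper name `haar_pair` and the sample values are hypothetical.

```python
import numpy as np


def haar_pair(M):
    """Haar lowpass/highpass operators from length M to length M/2.

    Assumption: samples are paired as (2n, 2n + 1), i.e. non-overlapping
    blocks, which is one common indexing convention for the Haar filter.
    """
    s = 1.0 / np.sqrt(2.0)
    H = np.zeros((M // 2, M))
    G = np.zeros((M // 2, M))
    for n in range(M // 2):
        H[n, 2 * n], H[n, 2 * n + 1] = s, s
        G[n, 2 * n], G[n, 2 * n + 1] = s, -s
    return H, G


H1, G1 = haar_pair(8)    # level i   -> level i-1
H2, G2 = haar_pair(4)    # level i-1 -> level i-2
H3, G3 = haar_pair(2)    # level i-2 -> level i-3

# Composite three-level transform, stacked as in the text
T = np.vstack([H3 @ H2 @ H1, G3 @ H2 @ H1, G2 @ H1, G1])

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coeffs = T @ x           # [lowest-level approximation, then details by level]
x_back = T.T @ coeffs    # orthogonality gives the inverse as the transpose
```

Since `T` is orthogonal, the block is recovered by `T.T`, and the first coefficient is the scaled block average, `x.sum() / sqrt(8)`.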
11.2 Signal Estimation and Decomposition

In random signal estimation and decomposition, a general approach is first to estimate the unknown signal using its measurement data and then to decompose the estimated signal according to the resolution requirement. Such a two-step approach is offline in nature and is often not desirable for real-time applications. In this section, a technique for simultaneous optimal estimation and multiresolutional decomposition of a random signal is developed. This is used as an example for illustrating wavelet Kalman filtering. An algorithm that simultaneously performs estimation and decomposition is derived based on the discrete wavelet transform and is implemented by a Kalman filter bank. The algorithm preserves the merits of the Kalman filtering scheme for estimation, in the sense that it produces an optimal (linear, unbiased, and minimum error-variance) estimate of the unknown signal, in a recursive manner using sampling data obtained from the noisy signal. The approach to be developed has the following special features: First, instead of a two-step approach, it determines in one step the estimated signal such that the resulting signal naturally possesses the desired decomposition. Second, the recursive Kalman filtering scheme is employed in the algorithm, so as to achieve the estimation and decomposition of the unknown but noisy signal, not only simultaneously but also optimally. Finally, the entire signal processing is performed on-line, which is a real-time process in the sense that as a block of new measurements flows in, a block of estimates of the signal flows out in the required decomposition form. In this procedure, the signal is first divided into blocks and then filtering is performed over data blocks. To this end, the current estimates are obtained based on a block of current measurements and the previous block of optimal estimates. Here, the length of the data block is determined by the number of levels of the desired decomposition.
An octave-band filter bank is employed as an effective vehicle for the multiresolutional decomposition.

11.2.1 Estimation and Decomposition of Random Signals
Now, consider a sequence of one-dimensional random signals, $\{x(N, k)\}$, at the highest resolution level (level $N$), governed by
$$x(N, k + 1) = A(N, k)\, x(N, k) + \xi(N, k), \tag{11.17}$$
with measurement
$$v(N, k) = C(N, k)\, x(N, k) + \eta(N, k), \tag{11.18}$$
where $\{\xi(N, k)\}$ and $\{\eta(N, k)\}$ are mutually independent Gaussian noise sequences with zero mean and variances $Q(N, k)$ and $R(N, k)$, respectively. Given a sequence of measurements, $\{v(N, k)\}$, the conventional way of estimation and decomposition of the random signal $\{x(N, k)\}$ is performed in a sequential manner: first, find an estimate $\hat x(N, k)$ at the highest resolutional level; then apply a wavelet transform to decompose it into different resolutions. In the following, an algorithm for simultaneous estimation and decomposition of the random signal is derived. For simplicity, only two-level decomposition and estimation will be discussed, i.e., from level $N$ to levels $N - 1$ and $N - 2$. At all other levels, simultaneous estimation and decomposition procedures are exactly the same. A data block with length $M = 2^2 = 4$ is chosen, where base 2 is used in order to design an octave filter bank, and power 2 is the same as the number of the levels in the decomposition. Up to instant $k$, we have
$$X_k^N = [x(N, k - 3),\ x(N, k - 2),\ x(N, k - 1),\ x(N, k)]^T,$$
for which the equivalent dynamical system in a data-block form is derived as follows. For notational convenience, the system and measurement equations, (11.17) and (11.18), are assumed to be time-invariant, and the level index $N$ is dropped from the coefficients. The propagation is carried out over an interval of length $M$, yielding
$$x(N, k + 1) = A\, x(N, k) + \xi(N, k), \tag{11.19}$$
or
$$x(N, k + 1) = A^2 x(N, k - 1) + A\, \xi(N, k - 1) + \xi(N, k), \tag{11.20}$$
or
$$x(N, k + 1) = A^3 x(N, k - 2) + A^2 \xi(N, k - 2) + A\, \xi(N, k - 1) + \xi(N, k), \tag{11.21}$$
or
$$x(N, k + 1) = A^4 x(N, k - 3) + A^3 \xi(N, k - 3) + A^2 \xi(N, k - 2) + A\, \xi(N, k - 1) + \xi(N, k). \tag{11.22}$$
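Taking the average of (11.19), (11.20), (11.21) and (11.22), we obtain
$$x(N, k + 1) = \tfrac{1}{4} A^4 x(N, k - 3) + \tfrac{1}{4} A^3 x(N, k - 2) + \tfrac{1}{4} A^2 x(N, k - 1) + \tfrac{1}{4} A\, x(N, k) + \xi(1), \tag{11.23}$$
where
$$\xi(1) = \tfrac{1}{4} A^3 \xi(N, k - 3) + \tfrac{1}{2} A^2 \xi(N, k - 2) + \tfrac{3}{4} A\, \xi(N, k - 1) + \xi(N, k).$$
Also, taking into account all propagations, we finally have a dynamical system in a data-block form as follows (cf. Exercise 11.5):
$$\underbrace{\begin{bmatrix} x(N, k+1) \\ x(N, k+2) \\ x(N, k+3) \\ x(N, k+4) \end{bmatrix}}_{X_{k+1}^N} = \underbrace{\begin{bmatrix} \tfrac{1}{4} A^4 & \tfrac{1}{4} A^3 & \tfrac{1}{4} A^2 & \tfrac{1}{4} A \\ 0 & \tfrac{1}{3} A^4 & \tfrac{1}{3} A^3 & \tfrac{1}{3} A^2 \\ 0 & 0 & \tfrac{1}{2} A^4 & \tfrac{1}{2} A^3 \\ 0 & 0 & 0 & A^4 \end{bmatrix}}_{\bar A} \underbrace{\begin{bmatrix} x(N, k-3) \\ x(N, k-2) \\ x(N, k-1) \\ x(N, k) \end{bmatrix}}_{X_k^N} + \underbrace{\begin{bmatrix} \xi(1) \\ \xi(2) \\ \xi(3) \\ \xi(4) \end{bmatrix}}_{\bar W_k^N}, \tag{11.24}$$
where $\xi(i)$, $i = 2, 3, 4$, are similarly defined, with
$$E\{\bar W_k^N\} = 0 \qquad\text{and}\qquad E\{\bar W_k^N (\bar W_k^N)^T\} = \bar Q,$$
and the elements of $\bar Q = [q_{ij}]$ are given by
$$q_{11} = \tfrac{1}{16} A^6 Q + \tfrac{1}{4} A^4 Q + \tfrac{9}{16} A^2 Q + Q, \qquad q_{12} = \tfrac{1}{6} A^5 Q + \tfrac{1}{2} A^3 Q + A Q,$$
$$q_{13} = \tfrac{3}{8} A^4 Q + A^2 Q, \qquad q_{14} = A^3 Q, \qquad q_{21} = q_{12},$$
$$q_{22} = \tfrac{1}{9} A^6 Q + \tfrac{4}{9} A^4 Q + A^2 Q + Q, \qquad q_{23} = \tfrac{1}{3} A^5 Q + A^3 Q + A Q,$$
$$q_{24} = A^4 Q + A^2 Q, \qquad q_{31} = q_{13}, \qquad q_{32} = q_{23},$$
$$q_{33} = \tfrac{1}{4} A^6 Q + A^4 Q + A^2 Q + Q, \qquad q_{34} = A^5 Q + A^3 Q + A Q,$$
$$q_{41} = q_{14}, \qquad q_{42} = q_{24}, \qquad q_{43} = q_{34}, \qquad q_{44} = A^6 Q + A^4 Q + A^2 Q + Q.$$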
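The measurement equation associated with (11.24) can be easily constructed as
$$\underbrace{\begin{bmatrix} v(N, k-3) \\ v(N, k-2) \\ v(N, k-1) \\ v(N, k) \end{bmatrix}}_{V_k^N} = \underbrace{\begin{bmatrix} C & 0 & 0 & 0 \\ 0 & C & 0 & 0 \\ 0 & 0 & C & 0 \\ 0 & 0 & 0 & C \end{bmatrix}}_{\bar C} X_k^N + \underbrace{\begin{bmatrix} \eta(N, k-3) \\ \eta(N, k-2) \\ \eta(N, k-1) \\ \eta(N, k) \end{bmatrix}}_{\bar N_k^N},$$
where
$$E\{\bar N_k^N\} = 0 \qquad\text{and}\qquad E\{\bar N_k^N (\bar N_k^N)^T\} = \bar R = \mathrm{diag}\{R, R, R, R\}.$$
Two-level decomposition is then performed, leading to
$$\begin{bmatrix} X_{kL}^{N-2} \\ X_{kH}^{N-2} \\ X_{kH}^{N-1} \end{bmatrix} = \begin{bmatrix} H^{N-2} H^{N-1} \\ G^{N-2} H^{N-1} \\ G^{N-1} \end{bmatrix} X_k^N = T^{N-2|N} X_k^N. \tag{11.25}$$
Substituting (11.25) into (11.24) results in
$$\begin{bmatrix} X_{k+1,L}^{N-2} \\ X_{k+1,H}^{N-2} \\ X_{k+1,H}^{N-1} \end{bmatrix} = \tilde A \begin{bmatrix} X_{kL}^{N-2} \\ X_{kH}^{N-2} \\ X_{kH}^{N-1} \end{bmatrix} + \tilde W_k^N, \tag{11.26}$$
where
$$\tilde W_k^N = T^{N-2|N}\, \bar W_k^N, \qquad E\{\tilde W_k^N\} = 0, \qquad E\{\tilde W_k^N (\tilde W_k^N)^T\} = \tilde Q,$$
$$\tilde A = T^{N-2|N}\, \bar A\, (T^{N-2|N})^T, \qquad \tilde Q = T^{N-2|N}\, \bar Q\, (T^{N-2|N})^T.$$
Equation (11.26) describes a dynamical system for the decomposed quantities. Its associated measurement equation can also be derived by substituting (11.25) into the measurement equation associated with (11.24), yielding
$$V_k^N = \tilde C \begin{bmatrix} X_{kL}^{N-2} \\ X_{kH}^{N-2} \\ X_{kH}^{N-1} \end{bmatrix} + \bar N_k^N, \tag{11.27}$$
where
$$\tilde C = \bar C\, (T^{N-2|N})^T.$$
Now, we have obtained the system and measurement equations, (11.26) and (11.27), for the decomposed quantities. The next task is to estimate these quantities using measurement data. The Kalman filter is readily applied to (11.26) and (11.27) to provide optimal estimates for these decomposed quantities.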
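The whole procedure, from the block model to the Kalman filter for the decomposed quantities, can be sketched as follows. This is only an illustrative sketch under several assumptions made here: a scalar system with $A = C = 1$, $Q = 0.1$, $R = 1.0$, Haar filters acting on non-overlapping sample pairs, a simulated signal with a fixed random seed, and initialization from the first measurement block. The helper names are hypothetical.

```python
import numpy as np


def haar_pair(M):
    """Haar lowpass/highpass operators from length M to length M/2."""
    s = 1.0 / np.sqrt(2.0)
    H, G = np.zeros((M // 2, M)), np.zeros((M // 2, M))
    for n in range(M // 2):
        H[n, 2 * n], H[n, 2 * n + 1] = s, s
        G[n, 2 * n], G[n, 2 * n + 1] = s, -s
    return H, G


def block_model(A, Q, M=4):
    """Block system matrix and noise covariance obtained by averaging, cf. (11.24)."""
    A_bar, noise = np.zeros((M, M)), np.zeros((M, 2 * M - 1))
    for i in range(1, M + 1):
        starts = range(i - M, 1)
        w = 1.0 / len(starts)
        for s in starts:
            A_bar[i - 1, s + M - 1] = w * A ** (i - s)
            for t in range(s, i):
                noise[i - 1, t + M - 1] += w * A ** (i - 1 - t)
    return A_bar, Q * (noise @ noise.T)


rng = np.random.default_rng(0)
A, Q, R, M, n_blocks = 1.0, 0.1, 1.0, 4, 100

H1, G1 = haar_pair(4)                        # level N   -> level N-1
H2, G2 = haar_pair(2)                        # level N-1 -> level N-2
T = np.vstack([H2 @ H1, G2 @ H1, G1])        # two-level transform, cf. (11.25)

A_bar, Q_bar = block_model(A, Q, M)
A_til, Q_til = T @ A_bar @ T.T, T @ Q_bar @ T.T   # decomposed system, cf. (11.26)
C_til = np.eye(M) @ T.T                           # C = 1 in each slot, cf. (11.27)
R_bar = R * np.eye(M)

# Simulated signal and measurements (assumed data, not the book's data set)
x = 10.0 + np.cumsum(rng.normal(0.0, np.sqrt(Q), M * n_blocks))
v = x + rng.normal(0.0, np.sqrt(R), x.size)

z = T @ v[:M]                                # decomposed state, from the first block
P = R_bar.copy()
x_hat = [T.T @ z]
for k in range(1, n_blocks):
    V = v[M * k: M * (k + 1)]
    z_pred = A_til @ z                                   # prediction
    P_pred = A_til @ P @ A_til.T + Q_til
    S = C_til @ P_pred @ C_til.T + R_bar
    gain = P_pred @ C_til.T @ np.linalg.inv(S)           # Kalman gain
    z = z_pred + gain @ (V - C_til @ z_pred)             # correction
    P = (np.eye(M) - gain @ C_til) @ P_pred
    x_hat.append(T.T @ z)                                # compose back to level N
x_hat = np.concatenate(x_hat)

rmse_est = np.sqrt(np.mean((x_hat[M:] - x[M:]) ** 2))
rmse_meas = np.sqrt(np.mean((v[M:] - x[M:]) ** 2))
```

At every block, the vector `z` holds the estimates of the low-frequency component at level $N - 2$ and of the high-frequency components at levels $N - 2$ and $N - 1$, so the estimate is produced directly in decomposed form; `T.T @ z` composes it back to the highest resolution level.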
11.2.2 An Example of Random Walk
A one-dimensional colored noise process (known also as the Brownian random walk) is studied in this section. This random process is governed by
$$x(N, k + 1) = x(N, k) + \xi(N, k) \tag{11.28}$$
with the measurement
$$v(N, k) = x(N, k) + \eta(N, k), \tag{11.29}$$
where $\{\xi(N, k)\}$ and $\{\eta(N, k)\}$ are mutually independent zero-mean Gaussian noise sequences with variances $Q(N, k) = 0.1$ and $R(N, k) = 1.0$, respectively. The true $\{x(N, k)\}$ and the measurement $\{v(N, k)\}$ at the highest resolutional level are shown in Figures 11.2 (a) and (b). By applying the model of (11.26) and (11.27), Haar wavelets for the filter bank, and the standard Kalman filtering scheme through the process described in the last section, we obtain two-level estimates: $\hat X_{kL}^{N-2}$, $\hat X_{kH}^{N-2}$, and $\hat X_{kH}^{N-1}$. The computation was performed on-line, and the results are shown in Figures 11.2 (c), (d), and 11.3 (a). At first glance, $\hat X_{kH}^{N-2}$ and $\hat X_{kH}^{N-1}$ both look like some kind of noise. Actually, $\hat X_{kH}^{N-2}$ and $\hat X_{kH}^{N-1}$ are estimates of the high-frequency components of the true signal. We can use these high-frequency components to compose higher-level estimates by "adding" them to the estimates of the low-frequency components. For instance, $\hat X_{kL}^{N-1}$ is derived by composing $\hat X_{kL}^{N-2}$ and $\hat X_{kH}^{N-2}$ as follows:
$$\hat X_{kL}^{N-1} = \begin{bmatrix} (H^{N-2})^T & (G^{N-2})^T \end{bmatrix} \begin{bmatrix} \hat X_{kL}^{N-2} \\ \hat X_{kH}^{N-2} \end{bmatrix}, \tag{11.30}$$
which is depicted in Figure 11.3 (b). Similarly, $\hat X_k^N$ (at the highest resolutional level) can be obtained by composing $\hat X_{kL}^{N-1}$ and $\hat X_{kH}^{N-1}$, which is displayed in Figure 11.3 (c). To compare the performance, the standard Kalman filter is applied directly to system (11.28) and (11.29), resulting in the curves shown in Figure 11.3 (d). We can see that both estimates at the highest resolutional level, obtained by composing the estimates of decomposed quantities and by Kalman filtering alone, are very similar. To quantitatively compare these two approaches, a set of 200 Monte Carlo simulations was performed and the root-mean-square errors were found to be 0.2940 for the simultaneous estimation and decomposition algorithm but 0.3104 for the standard Kalman filtering scheme. This indicates that the simultaneous approach outperforms the direct Kalman filter, even for only two-level decomposition and estimation. The difference becomes more significant as the number of levels is allowed to increase.
Fig. 11.2. (Plots not reproduced; panels (a)-(d), horizontal axes: samples.)

Fig. 11.3. (Plots not reproduced; panels (a)-(d), horizontal axes: samples.)
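Exercises

11.1. The following Haar and triangle functions are typical window functions in the time domain:
$$\phi_H(t) = \begin{cases} 1 & 0 \le t < 1 \\ 0 & \text{otherwise} \end{cases} \qquad\qquad \phi_T(t) = \begin{cases} t & 0 \le t < 1 \\ 2 - t & 1 \le t < 2 \\ 0 & \text{otherwise.} \end{cases}$$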
Sketch these two functions, and verify that
Here, $\phi_H(t)$ and $\phi_T(t)$ are also called B-splines of degree zero and one, respectively. B-splines of degree $n$ can be generated via
$$\phi_n(t) = \int_0^1 \phi_{n-1}(t - \tau)\, d\tau, \qquad n = 2, 3, \cdots.$$
Calculate and sketch $\phi_2(t)$ and $\phi_3(t)$. (Note that $\phi_0 = \phi_H$ and $\phi_1 = \phi_T$.)

11.2. Find the Fourier transforms of $\phi_H(t)$ and $\phi_T(t)$ defined above.

11.3. Based on the graphs of $\phi_H(t)$ and $\phi_T(t)$, sketch
Moreover, sketch the wavelets
and VJT(t) ==
1
1
-2 cPT(2t) - 2
11.4. Verify (11.16). 11.5. Verify (11.24) and (11.26).
1) .
12. Notes
The objective of this monograph has been to give a rigorous and yet elementary treatment of the Kalman filter theory and briefly introduce some of its real-time applications. No attempt was made to cover all the rudiments of the theory, and only a small sample of its applications has been included. There are many texts in the literature that were written for different purposes, including Anderson and Moore (1979), Balakrishnan (1984,87), Brammer and Siffling (1989), Catlin (1989), Chen (1985), Chen, Chen and Hsu (1995), Goodwin and Sin (1984), Haykin (1986), Lewis (1986), Mendel (1987), Ruymgaart and Soong (1985,89), Sorenson (1985), Stengel (1986), and Young (1984). Unfortunately, in our detailed treatment we have had to omit many important topics; some of these will be introduced very briefly in this chapter. The interested reader is referred to the modest bibliography in this text for further study.
12.1 The Kalman Smoother

Suppose that data information is available over a discrete-time time-interval $\{1, 2, \cdots, N\}$. For any $K$ with $1 \le K < N$, the optimal estimate $\hat x_{K|N}$ of the state vector $x_K$ using all the data information on this time-interval (i.e., past, present, and future information are being used) is called a (digital) smoothing estimate of $x_K$ (cf. Definition 2.1 in Chapter 2). Although this smoothing problem is somewhat different from the real-time estimation which we considered in this text, it still has many useful applications. One such application is satellite orbit determination, where the satellite orbit estimation is allowed to be made after a certain period of time. More precisely, consider the following linear deterministic/stochastic system with a fixed terminal time $N$:
$$\begin{cases} x_{k+1} = A_k x_k + B_k u_k + \Gamma_k \xi_k \\ v_k = C_k x_k + D_k u_k + \eta_k, \end{cases}$$
where $1 \le k < N$ and in addition the variance matrices $Q_k$ of $\xi_k$ are assumed to be positive definite for all $k$ (cf. Chapters 2 and 3, where $Q_k$ were only assumed to be non-negative definite). The Kalman filtering process is to be applied to find the optimal smoothing estimate $\hat x_{K|N}$, where the optimality is again in the sense of minimum error variance. First, let us denote by $\hat x_K^f$ the usual Kalman filtering estimate $\hat x_{K|K}$, using the data information $\{v_1, \cdots, v_K\}$, where the superscript $f$ indicates that the estimation is a "forward" process. Similarly, denote by $\hat x_K^b = \hat x_{K|K+1}$ the "backward" optimal prediction obtained in estimating the state vector $x_K$ by using the data information $\{v_{K+1}, \cdots, v_N\}$. Then the optimal smoothing estimate $\hat x_{K|N}$ (usually called Kalman smoothing estimate) can be obtained by incorporating all the data information over the time-interval $\{1, 2, \cdots, N\}$ by the following recursive algorithm (see Lewis (1986) and also Balakrishnan (1984,87) for more details):
$$\begin{cases} G_K = P_K^f P_K^b (I + P_K^f P_K^b)^{-1} \\ P_K = (I - G_K) P_K^f \\ \hat x_{K|N} = (I - G_K)\, \hat x_K^f + P_K\, \hat x_K^b, \end{cases}$$
where $P_K^f$, $\hat x_K^f$ and $P_K^b$, $\hat x_K^b$ are, respectively, computed recursively by using the following procedures:
$$\begin{cases} P_0^f = \mathrm{Var}(x_0) \\ P_{k,k-1}^f = A_{k-1} P_{k-1}^f A_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T \\ G_k^f = P_{k,k-1}^f C_k^T (C_k P_{k,k-1}^f C_k^T + R_k)^{-1} \\ P_k^f = (I - G_k^f C_k) P_{k,k-1}^f \\ \hat x_0^f = E(x_0) \\ \hat x_k^f = A_{k-1} \hat x_{k-1}^f + B_{k-1} u_{k-1} + G_k^f (v_k - C_k A_{k-1} \hat x_{k-1}^f - D_k u_{k-1}) \\ k = 1, 2, \cdots, K, \end{cases}$$
and
$$\begin{cases} P_N^b = 0 \\ P_{k+1,N}^b = P_{k+1}^b + C_{k+1}^T R_{k+1}^{-1} C_{k+1} \\ G_k^b = P_{k+1,N}^b \Gamma_{k+1} (\Gamma_{k+1}^T P_{k+1,N}^b \Gamma_{k+1} + Q_{k+1}^{-1})^{-1} \\ P_k^b = A_{k+1}^T (I - G_k^b \Gamma_{k+1}^T) P_{k+1,N}^b A_{k+1} \\ \hat x_N^b = 0 \\ \hat x_k^b = A_{k+1}^T (I - G_k^b \Gamma_{k+1}^T) \{ \hat x_{k+1}^b + C_{k+1}^T R_{k+1}^{-1} v_{k+1} - (C_{k+1}^T R_{k+1}^{-1} D_{k+1} + P_{k+1,N}^b B_{k+1}) u_{k+1} \} \\ k = N - 1, N - 2, \cdots, K. \end{cases}$$

12.2 The $\alpha - \beta - \gamma - \theta$ Tracker
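Consider a time-invariant linear stochastic system with colored noise input described by the state-space description
$$\begin{cases} x_{k+1} = A x_k + \Gamma \xi_k \\ v_k = C x_k + \eta_k, \end{cases}$$
where $A$, $C$, $\Gamma$ are $n \times n$, $q \times n$, and $n \times p$ constant matrices, $1 \le p, q \le n$, and
$$\begin{cases} \xi_k = M \xi_{k-1} + \underline{\beta}_k \\ \eta_k = N \eta_{k-1} + \underline{\gamma}_k \end{cases}$$
with $M$, $N$ being constant matrices, and $\{\underline{\beta}_k\}$ and $\{\underline{\gamma}_k\}$ being independent Gaussian white noise sequences satisfying
$$E(\underline{\beta}_k \underline{\beta}_\ell^T) = Q \delta_{k\ell}, \qquad E(\underline{\gamma}_k \underline{\gamma}_\ell^T) = R \delta_{k\ell}, \qquad E(\underline{\beta}_k \underline{\gamma}_\ell^T) = 0,$$
$$E(x_0 \underline{\beta}_k^T) = 0, \qquad E(x_0 \underline{\gamma}_k^T) = 0, \qquad \xi_{-1} = \eta_{-1} = 0.$$
The associated $\alpha - \beta - \gamma - \theta$ tracker for this system is defined to be the following algorithm:
$$\begin{cases} \check X_k = \begin{bmatrix} A & \Gamma & 0 \\ 0 & M & 0 \\ 0 & 0 & N \end{bmatrix} \check X_{k-1} + \begin{bmatrix} \alpha \\ \beta/h \\ \gamma/h^2 \\ \theta \end{bmatrix} \left\{ v_k - \begin{bmatrix} C & 0 & I \end{bmatrix} \begin{bmatrix} A & \Gamma & 0 \\ 0 & M & 0 \\ 0 & 0 & N \end{bmatrix} \check X_{k-1} \right\} \\ \check X_0 = \begin{bmatrix} E(x_0) \\ 0 \\ 0 \end{bmatrix}, \end{cases}$$
where $\alpha$, $\beta$, $\gamma$, and $\theta$ are certain constants, and $X_k := [x_k^T\ \ \xi_k^T\ \ \eta_k^T]^T$.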
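For the real-time tracking system discussed in Section 9.2, with colored measurement noise input, namely:
$$A = \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix}, \qquad C = [1\ \ 0\ \ 0], \qquad \Gamma = I_3,$$
$$Q = \begin{bmatrix} \sigma_p & 0 & 0 \\ 0 & \sigma_v & 0 \\ 0 & 0 & \sigma_a \end{bmatrix}, \qquad R = [\sigma_m] > 0, \qquad M = 0, \quad\text{and}\quad N = [r] > 0,$$
where $h > 0$ is the sampling time, this $\alpha - \beta - \gamma - \theta$ tracker becomes a (near-optimal) limiting Kalman filter if and only if the following conditions are satisfied: $\sigma_a > 0$, and
(1) $\gamma(2\alpha + 2\beta + \gamma) > 0$,
(2) $\alpha(r - 1)(r - 1 - r\theta) + r\beta\theta - r(r + 1)\gamma\theta / 2(r - 1) > 0$,
(3) $(4\alpha + \beta)(\beta + \gamma) + 3(\beta^2 - 2\alpha\gamma) - 4\gamma(r - 1 - r\theta)/(r - 1) \ge 0$,
(4) $\gamma(\beta + \gamma) \ge 0$,
(5) $\alpha(r + 1)(r - 1 + r\theta) + r(r + 1)\beta\theta/(r - 1) + r\gamma(r + 1)^2 / 2(r - 1)^2 - r^2 + 1 \ge 0$,
(6) $\sigma_v/\sigma_a = (\beta^2 - 2\alpha\gamma)/2h^2\gamma^2$,
(7) $\sigma_p/\sigma_a = \big[\alpha(2\alpha + 2\beta + \gamma) - 4\beta(r - 1 - r\theta)/(r - 1) - 4r\gamma\theta/(r - 1)^2\big]\, h^4/(2\gamma^2)$,
(8) $\sigma_m/\sigma_a = \big[\alpha(r + 1)(r - 1 + r\theta) + 1 - r^2 + r^2\theta^2 + r(r + 1)\beta\theta/(r - 1) + r\gamma(r + 1)^2 / 2(r - 1)^2\big]\, h^4/\gamma^2$,
and
(9) the matrix $[p_{ij}]_{4 \times 4}$ is non-negative definite and symmetric, where
$$p_{11} = \frac{1}{r - 1} \left[ \alpha(r - 1 - r\theta) + \frac{r\beta\theta}{r - 1} - \frac{r\gamma\theta(r + 1)}{2(r - 1)^2} \right],$$
$$p_{12} = \frac{1}{r - 1} \left[ \beta(r - 1 - r\theta) + \frac{r\gamma\theta}{r - 1} \right], \qquad p_{13} = \frac{1}{r - 1}\, \gamma(r - 1 - r\theta),$$
$$p_{14} = \frac{r\theta}{r - 1} \left[ \alpha - \frac{\beta}{r - 1} + \frac{\gamma(r + 1)}{2(r - 1)^2} \right],$$
$$p_{22} = \frac{1}{4}(4\alpha + \beta)(\beta + \gamma) + \frac{3}{4}(\beta^2 - 2\alpha\gamma) - \frac{\gamma(r - 1 - r\theta)}{r - 1},$$
$$p_{23} = \gamma(\alpha + \beta/2), \qquad p_{24} = \frac{r\theta}{r - 1}\big(\beta - \gamma/(r - 1)\big),$$
$$p_{33} = \gamma(\beta + \gamma), \qquad p_{34} = r\gamma\theta/(r - 1),$$
and
$$p_{44} = \frac{1}{1 - r^2} \left[ \alpha(r + 1)(r + r\theta - 1) + \frac{r\beta\theta(r + 1)}{r - 1} + \frac{r\gamma(r + 1)^2}{2(r - 1)^2} + 1 - r^2 \right].$$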
In particular, if, in addition, $N = 0$; that is, only the white noise input is considered, then the above conditions reduce to the ones obtained in Theorem 9.1. Finally, by defining $X_k := [x_k\ \ \dot x_k\ \ \ddot x_k\ \ w_k]^T$ and using the z-transform technique, the $\alpha - \beta - \gamma - \theta$ tracker can be decomposed as the following (decoupled) recursive equations:
(1) $x_k = a_1 x_{k-1} + a_2 x_{k-2} + a_3 x_{k-3} + a_4 x_{k-4} + \alpha v_k + (-2\alpha - r\alpha + \beta + \gamma/2) v_{k-1} + (\alpha - \beta + \gamma/2 + r(2\alpha - \beta - \gamma/2)) v_{k-2} - r(\alpha - \beta + \gamma/2) v_{k-3}$,
(2) $\dot x_k = a_1 \dot x_{k-1} + a_2 \dot x_{k-2} + a_3 \dot x_{k-3} + a_4 \dot x_{k-4} + (1/h)\{\beta v_k - [(2 + r)\beta - \gamma] v_{k-1} + [\beta - \gamma + r(2\beta - \gamma)] v_{k-2} - r(\beta - \gamma) v_{k-3}\}$,
(3) $\ddot x_k = a_1 \ddot x_{k-1} + a_2 \ddot x_{k-2} + a_3 \ddot x_{k-3} + a_4 \ddot x_{k-4} + (\gamma/h^2)[v_k - (2 + r) v_{k-1} + (1 + 2r) v_{k-2} - r v_{k-3}]$,
(4) $w_k = a_1 w_{k-1} + a_2 w_{k-2} + a_3 w_{k-3} + a_4 w_{k-4} + \theta(v_k - 3 v_{k-1} + 3 v_{k-2} - v_{k-3})$,
with initial conditions $x_{-1}$, $\dot x_{-1}$, $\ddot x_{-1}$, and $w_{-1}$, where
$$a_1 = -\alpha - \beta - \gamma/2 + r(\theta - 1) + 3,$$
$$a_2 = 2\alpha + \beta - \gamma/2 + r(\alpha + \beta + \gamma/2 + 3\theta - 3) - 3,$$
$$a_3 = -\alpha + r(-2\alpha - \beta + \gamma/2 - 3\theta + 3) + 1,$$
$$a_4 = r(\alpha + \theta - 1).$$
We remark that the above decoupled formulas include the white noise input result obtained in Section 9.2, and this method works regardless of the near-optimality of the filter. For more details, see Chen and Chui (1986) and Chui (1984).
12.3 Adaptive Kalman Filtering
Consider the linear stochastic state-space description
$$\begin{cases} x_{k+1} = A_k x_k + \Gamma_k \xi_k \\ v_k = C_k x_k + \eta_k, \end{cases}$$
where $\{\xi_k\}$ and $\{\eta_k\}$ are uncorrelated zero-mean Gaussian white noise sequences with $\mathrm{Var}(\xi_k) = Q_k$ and $\mathrm{Var}(\eta_k) = R_k$. Then assuming that each $R_k$ is positive definite and all the matrices $A_k$, $C_k$, $\Gamma_k$, $Q_k$, and $R_k$, and initial conditions $E(x_0)$ and $\mathrm{Var}(x_0)$ are known quantities, the Kalman filtering algorithm derived in Chapters 2 and 3 provides a very efficient optimal estimation process for the state vectors $x_k$ in real-time. In fact, the estimation process is so effective that even for very poor choices of the initial conditions $E(x_0)$ and/or $\mathrm{Var}(x_0)$, fairly desirable estimates are usually obtained in a relatively short period of time. However, if not all of the matrices $A_k$, $C_k$, $\Gamma_k$, $Q_k$, $R_k$ are known quantities, the filtering algorithm must be modified so that optimal estimations of the state as well as the unknown matrices are performed in real-time from the incoming data $\{v_k\}$. Algorithms of this type may be called adaptive algorithms, and the corresponding filtering process adaptive Kalman filtering. If $Q_k$ and $R_k$ are known, then the adaptive Kalman filtering can be used to "identify" the system and/or measurement matrices. This problem has been discussed in Chapter 8 when partial information of $A_k$, $\Gamma_k$, and $C_k$ is given. In general, the identification problem is a very difficult but important one [cf. Astrom and Eykhoff (1971) and Mehra (1970,1972)]. Let us now discuss the situation where $A_k$, $\Gamma_k$, $C_k$ are given. Then an adaptive Kalman filter for estimating the state as well as the noise variance matrices may be called a noise-adaptive filter. Although several algorithms are available for this purpose [cf. Astrom and Eykhoff (1971), Chen and Chui (1991), Chin (1979), Jazwinski (1969), and Mehra (1970,1972) etc.], there is still no available algorithm that is derived from a truly optimal criterion. For simplicity, let us discuss the situation where $A_k$, $\Gamma_k$, $C_k$ and $Q_k$ are given. Hence, only $R_k$ has to be estimated. The innovations approach [cf. Kailath (1968) and Mehra (1970)] seems to be very efficient for this situation.
From the incoming data information $v_k$ and the optimal prediction $\hat x_{k|k-1}$ obtained in the previous step, the innovations sequence is defined to be
$$z_k := v_k - C_k \hat x_{k|k-1}.$$
It is clear that
$$z_k = C_k (x_k - \hat x_{k|k-1}) + \eta_k,$$
which is, in fact, a zero-mean Gaussian white noise sequence. By taking variances on both sides, we have
$$S_k := \mathrm{Var}(z_k) = C_k P_{k,k-1} C_k^T + R_k.$$
This yields an estimate of $R_k$; namely,
$$\hat R_k = \hat S_k - C_k P_{k,k-1} C_k^T,$$
where $\hat S_k$ is the statistical sample variance estimate of $S_k$ given by
$$\hat S_k = \frac{1}{k - 1} \sum_{i=1}^{k} (z_i - \bar z_i)(z_i - \bar z_i)^T,$$
with $\bar z_i$ being the statistical sample mean defined by
$$\bar z_i = \frac{1}{i} \sum_{j=1}^{i} z_j$$
(see, for example, Stengel (1986)).
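The innovations-based estimate of $R_k$ can be sketched in a few lines. The normalization by $k - 1$ in the sample variance, the use of the running sample mean, and the function names are assumptions made for this illustration and should be checked against the cited references before use.

```python
import numpy as np


def sample_variance(innovations):
    """Sample variance of innovations z_1, ..., z_k using running sample means.

    Assumption: deviations are taken from the running mean of z_1..z_i and the
    sum is normalized by k - 1.
    """
    z = np.asarray(innovations, dtype=float)
    if z.ndim == 1:
        z = z[:, None]                        # treat scalars as 1-vectors
    k = z.shape[0]
    running_mean = np.cumsum(z, axis=0) / np.arange(1, k + 1)[:, None]
    d = z - running_mean
    return d.T @ d / (k - 1)


def estimate_R(innovations, C, P_pred):
    """Estimate R_k as S_hat - C P_{k,k-1} C^T, where S_hat is the sample variance."""
    C = np.atleast_2d(np.asarray(C, dtype=float))
    P_pred = np.atleast_2d(np.asarray(P_pred, dtype=float))
    return sample_variance(innovations) - C @ P_pred @ C.T


# Hypothetical scalar example: four innovations and a prediction variance of 0.5
R_hat = estimate_R([1.0, -1.0, 1.0, -1.0], C=[[1.0]], P_pred=[[0.5]])
```

In practice the difference can become negative (or indefinite in the matrix case) for short data records, so an implementation would typically need some safeguard; none is included in this sketch.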
12.4 Adaptive Kalman Filtering Approach to Wiener Filtering

In digital signal processing and system theory, an important problem is to determine the unit impulse responses of a digital filter or a linear system from the input/output information where the signal is contaminated with noise (see, for example, Chui and Chen (1992)). More precisely, let $\{u_k\}$ and $\{v_k\}$ be known input and output signals, respectively, and $\{\eta_k\}$ be an unknown noise sequence. The problem is to "identify" the sequence $\{h_k\}$ from the relationship
$$v_k = \sum_{i=0}^{\infty}h_iu_{k-i} + \eta_k, \qquad k = 0, 1, \cdots.$$
The optimality criterion in the so-called Wiener filter is to determine a sequence $\{\hat{h}_k\}$ such that by setting
$$\hat{v}_k = \sum_{i=0}^{\infty}\hat{h}_iu_{k-i},$$
it is required that
$$\mathrm{Var}(v_k - \hat{v}_k) = \inf_{\{a_i\}}\mathrm{Var}\Big(v_k - \sum_{i=0}^{\infty}a_iu_{k-i}\Big).$$
Under the assumption that $h_i = 0$ for $i > M$; that is, when an FIR system is considered, the above problem can be recast in the state-space description as follows: Let
$$x = [\,h_0\ \ h_1\ \cdots\ h_M\,]^T$$
be the state vector to be estimated. Since this is a constant vector, we may write
$$x_{k+1} = x_k = x.$$
In addition, let $C$ be the "observation matrix" defined by
$$C = [\,u_0\ \ u_1\ \cdots\ u_M\,].$$
Then the input/output relationship can be written as
$$\begin{cases} x_{k+1} = x_k\\ v_k = Cx_k + \eta_k, \end{cases} \tag{a}$$
and we are required to give an optimal estimation $\hat{x}_k$ of $x_k$ from the data information $\{v_0, \cdots, v_k\}$. When $\{\eta_k\}$ is a zero-mean Gaussian white noise sequence with unknown variances $R_k = \mathrm{Var}(\eta_k)$, the estimation can be done by applying the noise-adaptive Kalman filtering discussed in Section 12.3. We remark that if an IIR system is considered, the adaptive Kalman filtering technique cannot be applied directly, since the corresponding linear system (a) becomes infinite-dimensional.
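As an illustration of this state-space recasting, the following Python sketch identifies FIR coefficients with a Kalman filter whose state is constant. Several choices here are assumptions made for the sketch only: the noise variance is taken as known (rather than estimated adaptively as in Section 12.3), the observation row at time $k$ is read as $[u_k, u_{k-1}, \cdots, u_{k-M}]$ with inputs before time 0 set to zero, and the data are synthetic. The name `identify_fir` is hypothetical.

```python
import random


def identify_fir(u, v, order, r, p0=10.0):
    """Kalman filter for x_{k+1} = x_k, v_k = c_k x_k + eta_k, Var(eta_k) = r.

    c_k = [u_k, u_{k-1}, ..., u_{k-order}] (an assumed reading of the
    observation row; inputs with negative index are taken as zero).
    """
    n = order + 1
    x = [0.0] * n                                   # initial estimate of [h_0..h_M]
    p = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(len(v)):
        c = [u[k - i] if k - i >= 0 else 0.0 for i in range(n)]
        pc = [sum(p[i][j] * c[j] for j in range(n)) for i in range(n)]   # P c^T
        s = sum(c[i] * pc[i] for i in range(n)) + r                      # c P c^T + R
        g = [pc_i / s for pc_i in pc]                                    # Kalman gain
        innov = v[k] - sum(c[i] * x[i] for i in range(n))
        x = [x[i] + g[i] * innov for i in range(n)]
        # P <- (I - G c) P; the prediction step is trivial because the state is constant
        p = [[p[i][j] - g[i] * pc[j] for j in range(n)] for i in range(n)]
    return x


# Synthetic test data (hypothetical coefficients and noise level)
rng = random.Random(0)
h_true = [0.5, -0.3, 0.2]
n_samples = 2000
u = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
v = [sum(h_true[i] * u[k - i] for i in range(3) if k - i >= 0) + rng.gauss(0.0, 0.1)
     for k in range(n_samples)]
h_est = identify_fir(u, v, order=2, r=0.01)
print([round(h, 3) for h in h_est])
```

With a sufficiently rich input sequence the estimates approach the true coefficients; replacing the fixed `r` by an innovations-based estimate would give a noise-adaptive variant.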
12.5 The Kalman-Bucy Filter

This book is devoted exclusively to the study of Kalman filtering for discrete-time models. The continuous-time analog, which we will briefly introduce in the following, is called the Kalman-Bucy filter.

Consider the continuous-time linear deterministic/stochastic system
$$\begin{cases} dx(t) = A(t)x(t)\,dt + B(t)u(t)\,dt + \Gamma(t)\xi(t)\,dt, \qquad x(0) = x_0,\\ dv(t) = C(t)x(t)\,dt + \eta(t)\,dt, \end{cases}$$
where $0 \le t \le T$ and the state vector $x(t)$ is a random $n$-vector with initial position $x(0) \sim N(0, \Sigma^2)$. Here, $\Sigma^2$, or at least an estimate of it, is given. The stochastic input or noise processes $\xi(t)$ and $\eta(t)$ are Wiener-Levy $p$- and $q$-vectors, respectively, with $1 \le p, q \le n$, the observation $v(t)$ is a random $q$-vector, and $A(t)$, $\Gamma(t)$, and $C(t)$ are $n \times n$, $n \times p$, and $q \times n$ deterministic matrix-valued continuous functions on the continuous-time interval $[0, T]$.
The Kalman-Bucy filter is given by the following recursive formula:
$$\hat{x}(t) = \int_0^t P(\tau)C^T(\tau)R^{-1}(\tau)\,dv(\tau) + \int_0^t \big[A(\tau) - P(\tau)C^T(\tau)R^{-1}(\tau)C(\tau)\big]\hat{x}(\tau)\,d\tau + \int_0^t B(\tau)u(\tau)\,d\tau,$$
where $R(t) = E\{\eta(t)\eta^T(t)\}$, and $P(t)$ satisfies the matrix Riccati equation
$$\begin{cases} \dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + \Gamma(t)Q(t)\Gamma^T(t)\\ P(0) = \mathrm{Var}(x_0) = \Sigma^2, \end{cases}$$
with $Q(t) = E\{\xi(t)\xi^T(t)\}$. For more details, the reader is referred to the original paper of Kalman and Bucy (1961), and the books by Ruymgaart and Soong (1985), and Fleming and Rishel (1975).
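For a feel of how the Riccati equation behaves, the following Python sketch integrates its scalar, time-invariant form $\dot{P} = 2aP - (c^2/r)P^2 + \gamma^2 q$ by forward Euler steps and compares the result with the positive root of the corresponding algebraic equation. The scalar restriction, the Euler scheme, the parameter values, and the function names are all assumptions made for illustration.

```python
import math


def riccati_scalar(a, c, gamma, q, r, p0, t_end, steps):
    """Forward-Euler integration of dP/dt = 2aP - (c^2/r) P^2 + gamma^2 q, P(0) = p0."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += dt * (2.0 * a * p - (c * c / r) * p * p + gamma * gamma * q)
    return p


def riccati_steady_state(a, c, gamma, q, r):
    """Positive root of 0 = 2aP - (c^2/r) P^2 + gamma^2 q."""
    return r * (a + math.sqrt(a * a + c * c * gamma * gamma * q / r)) / (c * c)


# Hypothetical parameters: stable plant a = -1, unit observation and input gains
p_end = riccati_scalar(a=-1.0, c=1.0, gamma=1.0, q=2.0, r=0.5,
                       p0=1.0, t_end=10.0, steps=10000)
p_inf = riccati_steady_state(a=-1.0, c=1.0, gamma=1.0, q=2.0, r=0.5)
print(p_end, p_inf)
```

In this example $P(t)$ settles quickly to its steady-state value, which is why a constant filter gain $P C^T R^{-1}$ is often an adequate approximation for time-invariant systems.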
12.6 Stochastic Optimal Control
There is a vast literature on deterministic optimal control theory. For a brief discussion, the reader is referred to Chui and Chen (1989). The subject of stochastic optimal control deals with systems in which random disturbances are also taken into consideration. One of the typical stochastic optimal control problems is the so-called linear regulator problem. Since this model has consistently attracted the most attention, we will devote this section to the discussion of this particular problem. The system and observation equations are given by the linear deterministic/stochastic differential equations:
$$\begin{cases} dx(t) = A(t)x(t)\,dt + B(t)u(t)\,dt + \Gamma(t)\xi(t)\,dt,\\ dv(t) = C(t)x(t)\,dt + \eta(t)\,dt, \end{cases}$$
where $0 \le t \le T$ (cf. Section 12.5), and the cost functional to be minimized over a certain "admissible" class of control functions $u(t)$ is given by
$$F(u) = E\Big\{\int_0^T \big[x^T(t)W_x(t)x(t) + u^T(t)W_u(t)u(t)\big]\,dt\Big\}.$$
Here, the initial state $x(0)$ is assumed to be $x_0 \sim N(0, \Sigma^2)$, $\Sigma^2$ being given, and $\xi(t)$ and $\eta(t)$ are uncorrelated zero-mean Gaussian white noise processes and are also independent of the initial state $x_0$. In addition, the data item $v(t)$ is known for $0 \le t \le T$, and $A(t)$, $B(t)$, $C(t)$, $W_x(t)$, and $W_u(t)$ are known deterministic matrices of appropriate dimensions, with $W_x(t)$ being non-negative definite and symmetric, and $W_u(t)$ positive definite and symmetric. In general, the admissible class of control functions $u(t)$ consists of vector-valued Borel measurable functions defined on $[0, T]$ with range in some closed subset of $\mathbf{R}^p$.

Suppose that the control function $u(t)$ has partial knowledge of the system state via the observation data, in the sense that $u(t)$ is a linear function of the data $v(t)$ rather than the state vector $x(t)$. For such a linear regulator problem, we may apply the so-called separation principle, which is one of the most useful results in stochastic optimal control theory. This principle essentially implies that the above "partially observed" linear regulator problem can be split into two parts: the first being an optimal estimation of the system state by means of the Kalman-Bucy filter discussed in Section 12.5, and the second a "completely observed" linear regulator problem whose solution is given by a linear feedback control function. More precisely, the optimal estimate $\hat{x}(t)$ of the state vector $x(t)$ satisfies the linear stochastic system
$$\begin{cases} d\hat{x}(t) = A(t)\hat{x}(t)\,dt + B(t)u(t)\,dt + P(t)C^T(t)R^{-1}(t)\big[dv(t) - C(t)\hat{x}(t)\,dt\big]\\ \hat{x}(0) = E(x_0), \end{cases}$$
where $R(t) = E\{\eta(t)\eta^T(t)\}$, and $P(t)$ is the (unique) solution of the matrix Riccati equation
$$\begin{cases} \dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + \Gamma(t)Q(t)\Gamma^T(t)\\ P(0) = \mathrm{Var}(x_0) = \Sigma^2, \end{cases}$$
with $Q(t) = E\{\xi(t)\xi^T(t)\}$. On the other hand, an optimal control function $u^*(t)$ is given by
$$u^*(t) = -W_u^{-1}(t)B^T(t)K(t)\hat{x}(t),$$
where $K(t)$ is the (unique) solution of the matrix Riccati equation
$$\begin{cases} \dot{K}(t) = K(t)B(t)W_u^{-1}(t)B^T(t)K(t) - K(t)A(t) - A^T(t)K(t) - W_x(t)\\ K(T) = 0, \end{cases}$$
with $0 \le t \le T$. For more details, the reader is referred to Wonham (1968), Kushner (1971), Fleming and Rishel (1975), Davis (1977), and more recently, Chen, Chen and Hsu (1995).
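The control Riccati equation is solved backward in time from its terminal condition $K(T) = 0$. The following Python sketch does this for a scalar, time-invariant regulator, $\dot{K} = (b^2/w_u)K^2 - 2aK - w_x$, using Euler steps. The scalar setting, the step size, the parameter values, and the function name are assumptions of this sketch.

```python
import math


def regulator_riccati(a, b, w_x, w_u, t_end, steps):
    """Integrate dK/dt = (b^2/w_u) K^2 - 2 a K - w_x backward from K(T) = 0; return K(0)."""
    dt = t_end / steps
    k = 0.0                                  # terminal condition K(T) = 0
    for _ in range(steps):
        k_dot = (b * b / w_u) * k * k - 2.0 * a * k - w_x
        k -= dt * k_dot                      # step from t to t - dt
    return k


# Hypothetical unstable plant a = 1 with unit weights
a, b, w_x, w_u = 1.0, 1.0, 1.0, 1.0
k0 = regulator_riccati(a, b, w_x, w_u, t_end=10.0, steps=10000)
# Positive root of (b^2/w_u) K^2 - 2 a K - w_x = 0, reached for long horizons
k_inf = w_u * (a + math.sqrt(a * a + b * b * w_x / w_u)) / (b * b)
feedback_gain = b * k0 / w_u                 # u*(t) = -feedback_gain * x_hat(t) near t = 0
closed_loop = a - b * feedback_gain
print(k0, k_inf, closed_loop)
```

By the separation principle, the same feedback gain is applied to the Kalman-Bucy estimate $\hat{x}(t)$ in place of the unavailable state; here the closed-loop coefficient is negative even though the open-loop plant is unstable.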
12.7 Square-Root Filtering and Systolic Array Implementation

The square-root filtering algorithm was first introduced by Potter (1963) and later improved by Carlson (1973) to give a fast computational scheme. Recall from Chapter 7 that this algorithm requires computation of the matrices $J_{k,k}$, $J_{k,k-1}$, and $G_k$, where
$$J_{k,k} = J_{k,k-1}\big[I - J_{k,k-1}^TC_k^T(H_k^T)^{-1}(H_k + R_k^c)^{-1}C_kJ_{k,k-1}\big], \tag{a}$$
$J_{k,k-1}$ is a square-root of the matrix
$$\big[A_{k-1}J_{k-1,k-1}\ \ \Gamma_{k-1}Q_{k-1}^{1/2}\big]\big[A_{k-1}J_{k-1,k-1}\ \ \Gamma_{k-1}Q_{k-1}^{1/2}\big]^T,$$
and
$$G_k = J_{k,k-1}J_{k,k-1}^TC_k^T(H_k^T)^{-1}H_k^{-1},$$
where $J_{k,k} = P_{k,k}^{1/2}$, $J_{k,k-1} = P_{k,k-1}^{1/2}$, $H_k = (C_kP_{k,k-1}C_k^T + R_k)^c$ with $M^c$ being the "square-root" of the matrix $M$ in the form of a lower triangular matrix instead of being the positive definite square-root $M^{1/2}$ of $M$ (cf. Lemma 7.1). It is clear that if we can compute $J_{k,k}$ directly from $J_{k,k-1}$ (or $P_{k,k}$ directly from $P_{k,k-1}$) without using formula (a), then the algorithm could be somewhat more efficient. From this point of view, Bierman (1973,1977) modified Carlson's method and made use of LU decomposition to give the following algorithm: First, consider the decompositions
$$P_{k,k-1} = U_1D_1U_1^T \qquad\text{and}\qquad P_{k,k} = U_2D_2U_2^T,$$
where $U_i$ and $D_i$ are upper triangular and diagonal matrices, respectively, $i = 1, 2$. The subscript $k$ is omitted simply for convenience. Furthermore, define
$$D := D_1 - D_1U_1^TC_k^T(H_k^T)^{-1}(H_k)^{-1}C_kU_1D_1$$
and decompose $D = U_3D_3U_3^T$. Then it follows that
$$U_2 = U_1U_3 \qquad\text{and}\qquad D_2 = D_3.$$
Bierman's algorithm requires $O(qn^2)$ arithmetical operations to obtain $\{U_2, D_2\}$ from $\{U_1, D_1\}$, where $n$ and $q$ are, respectively, the dimensions of the state and observation vectors. Andrews (1981) modified this algorithm by using parallel processing techniques and reduced the number of operations to $O(nq\log n)$. More recently, Jover and Kailath (1986) made use of the Schur complement technique and applied systolic arrays [cf. Kung (1982), Mead and Conway (1980), and Kung (1985)] to further reduce the number of operations to $O(n)$ (or more precisely, approximately $4n$). In addition, the number of required arithmetic processors is reduced from $O(n^2)$ to $O(n)$. The basic idea of this approach can be briefly described as follows. Since $P_{k,k-1}$ is non-negative definite and symmetric, there is an orthogonal matrix $M_1$ such that
$$\big[A_{k-1}P_{k-1,k-1}^{1/2}\ \ \Gamma_{k-1}Q_{k-1}^{1/2}\big]M_1 = \big[P_{k,k-1}^{1/2}\ \ 0\big]. \tag{b}$$
Consider the augmented matrix
$$A = \begin{bmatrix} H_kH_k^T & C_kP_{k,k-1}\\ P_{k,k-1}C_k^T & P_{k,k-1} \end{bmatrix},$$
which can be shown to have the following two decompositions:
$$A = \begin{bmatrix} I & C_k\\ 0 & I \end{bmatrix}\begin{bmatrix} R_k & 0\\ 0 & P_{k,k-1} \end{bmatrix}\begin{bmatrix} I & 0\\ C_k^T & I \end{bmatrix}$$
and
$$A = \begin{bmatrix} I & 0\\ P_{k,k-1}C_k^T(H_k^T)^{-1}H_k^{-1} & I \end{bmatrix}\begin{bmatrix} H_kH_k^T & 0\\ 0 & P_{k,k} \end{bmatrix}\begin{bmatrix} I & (H_k^T)^{-1}H_k^{-1}C_kP_{k,k-1}\\ 0 & I \end{bmatrix}.$$
Hence, by taking the upper block triangular "square-root" of the decomposition on the left, and the lower block triangular "square-root" of that on the right, there is an orthogonal matrix $M_2$ such that
$$\begin{bmatrix} R_k^{1/2} & C_kP_{k,k-1}^{1/2}\\ 0 & P_{k,k-1}^{1/2} \end{bmatrix}M_2 = \begin{bmatrix} H_k & 0\\ P_{k,k-1}C_k^T(H_k^T)^{-1} & P_{k,k}^{1/2} \end{bmatrix}. \tag{c}$$
Now, using LU decompositions, where subscript $k$ will be dropped for convenience, we have:
$$R = U_RD_RU_R^T, \qquad P_{k,k-1} = U_1D_1U_1^T,$$
and
$$HH^T = U_HD_HU_H^T, \qquad P_{k,k} = U_2D_2U_2^T.$$
12. Notes
It follows that the identity (c) may be written as [
=
UR
o
CkU1] U1
[D0
R
0 D1
]1/2 M 2
[Pk'k_1CI(~!f)-lH-1UH ~2] [~H
o ] 1/2 D2
'
so that by defining
o ] -1/2 D2
which is clearly an orthogonal matrix, we have (d)
By an algorithm posed by Kailath (1982), M 3 can be decomposed as a product of a finite number of elementary matrices without using the square-root operation. Hence, by an appropriate application of systolic arrays, {UH, U2 } can be computed from {UR,U1} via (d) and P~:k2_1 from P~~;,k-1 via (b) in approximately 4n arithmetical operations. Consequently, D 2 can be easily computed from D 1 • For more details on this subject, see Jover and Kailath (1986) and Gaston and Irwin (1990).
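The square-root measurement update (a) can be checked numerically on a small example. The following Python sketch does so for a two-dimensional state with a scalar observation ($q = 1$), in which case $H_k$ and $R_k^c$ reduce to the scalars $\sqrt{C_kP_{k,k-1}C_k^T + R_k}$ and $\sqrt{R_k}$. It assumes that the term $R_k^c$ in (a) denotes a square root of $R_k$; the matrices, the helper names, and the restriction to $q = 1$ are illustrative choices. The updated factor is a square root of $P_{k,k}$ but is not triangular in general.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def cholesky_lower(p):
    """Lower-triangular J with J J^T = p (p symmetric positive definite)."""
    n = len(p)
    j = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for col in range(i + 1):
            s = sum(j[i][m] * j[col][m] for m in range(col))
            j[i][col] = math.sqrt(p[i][i] - s) if i == col else (p[i][col] - s) / j[col][col]
    return j


def sqrt_measurement_update(j_pred, c, r):
    """Formula (a) for a scalar observation, so H_k and R_k^c are scalars."""
    n = len(j_pred)
    w = [sum(j_pred[m][i] * c[m] for m in range(n)) for i in range(n)]   # J^T C^T
    h = math.sqrt(sum(wi * wi for wi in w) + r)                          # H_k
    alpha = 1.0 / (h * (h + math.sqrt(r)))                               # (H^T)^{-1} (H + R^c)^{-1}
    inner = [[(1.0 if i == m else 0.0) - alpha * w[i] * w[m] for m in range(n)]
             for i in range(n)]
    return matmul(j_pred, inner)


# Hypothetical prediction covariance, observation row, and noise variance
p_pred = [[4.0, 1.0], [1.0, 3.0]]
c = [1.0, 0.5]
r = 2.0
j_upd = sqrt_measurement_update(cholesky_lower(p_pred), c, r)
p_from_sqrt = matmul(j_upd, transpose(j_upd))

# Conventional update P_{k,k} = P - P C^T (C P C^T + R)^{-1} C P for comparison
pc = [sum(p_pred[i][m] * c[m] for m in range(2)) for i in range(2)]
s = sum(c[i] * pc[i] for i in range(2)) + r
p_direct = [[p_pred[i][m] - pc[i] * pc[m] / s for m in range(2)] for i in range(2)]
print(p_from_sqrt, p_direct)
```

The two covariance matrices agree to rounding error, which is the property that lets the filter propagate square-root factors instead of the covariance itself.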
References
Alefeld, G. and Herzberger, J. (1983): Introduction to Interval Computations (Academic, New York)
Anderson, B.D.O., Moore, J.B. (1979): Optimal Filtering (Prentice-Hall, Englewood Cliffs, NJ)
Andrews, A. (1981): "Parallel processing of the Kalman filter," IEEE Proc. Int. Conf. on Paral. Process., pp.216-220
Aoki, M. (1989): Optimization of Stochastic Systems: Topics in Discrete-Time Dynamics (Academic, New York)
Astrom, K.J., Eykhoff, P. (1971): "System identification - a survey," Automatica, 7, pp.123-162
Balakrishnan, A.V. (1984,87): Kalman Filtering Theory (Optimization Software, Inc., New York)
Bierman, G.J. (1973): "A comparison of discrete linear filtering algorithms," IEEE Trans. Aero. Elec. Systems, 9, pp.28-37
Bierman, G.J. (1977): Factorization Methods for Discrete Sequential Estimation (Academic, New York)
Blahut, R.E. (1985): Fast Algorithms for Digital Signal Processing (Addison-Wesley, Reading, MA)
Bozic, S.M. (1979): Digital and Kalman Filtering (Wiley, New York)
Brammer, K., Siffling, G. (1989): Kalman-Bucy Filters (Artech House, Boston)
Brown, R.G. and Hwang, P.Y.C. (1992,97): Introduction to Random Signals and Applied Kalman Filtering (Wiley, New York)
Bucy, R.S., Joseph, P.D. (1968): Filtering for Stochastic Processes with Applications to Guidance (Wiley, New York)
Burrus, C.S., Gopinath, R.A. and Guo, H. (1998): Introduction to Wavelets and Wavelet Transforms: A Primer (Prentice-Hall, Upper Saddle River, NJ)
Carlson, N.A. (1973): "Fast triangular formulation of the square root filter," J. AIAA, 11, pp.1259-1263
Catlin, D.E. (1989): Estimation, Control, and the Discrete Kalman Filter (Springer, New York)
Chen, G. (1992): "Convergence analysis for inexact mechanization of Kalman filtering," IEEE Trans. Aero. Elect. Syst., 28, pp.612-621
Chen, G. (1993): Approximate Kalman Filtering (World Scientific, Singapore)
Chen, G., Chen, G. and Hsu, S.H. (1995): Linear Stochastic Control Systems (CRC, Boca Raton, FL)
Chen, G., Chui, C.K. (1986): "Design of near-optimal linear digital tracking filters with colored input," J. Comp. Appl. Math., 15, pp.353-370
Chen, G., Wang, J. and Shieh, L.S. (1997): "Interval Kalman filtering," IEEE Trans. Aero. Elect. Syst., 33, pp.250-259
Chen, H.F. (1985): Recursive Estimation and Control for Stochastic Systems (Wiley, New York)
Chui, C.K. (1984): "Design and analysis of linear prediction-correction digital filters," Linear and Multilinear Algebra, 15, pp.47-69
Chui, C.K. (1997): Wavelets: A Mathematical Tool for Signal Analysis (SIAM, Philadelphia)
Chui, C.K., Chen, G. (1989): Linear Systems and Optimal Control, Springer Ser. Inf. Sci., Vol. 18 (Springer, Berlin Heidelberg)
Chui, C.K., Chen, G. (1992,97): Signal Processing and Systems Theory: Selected Topics, Springer Ser. Inf. Sci., Vol. 26 (Springer, Berlin Heidelberg)
Chui, C.K., Chen, G. and Chui, H.C. (1990): "Modified extended Kalman filtering and a real-time parallel algorithm for system parameter identification," IEEE Trans. Auto. Control, 35, pp.100-104
Davis, M.H.A. (1977): Linear Estimation and Stochastic Control (Wiley, New York)
Davis, M.H.A., Vinter, R.B. (1985): Stochastic Modeling and Control (Chapman and Hall, New York)
Fleming, W.H., Rishel, R.W. (1975): Deterministic and Stochastic Optimal Control (Springer, New York)
Gaston, F.M.F., Irwin, G.W. (1990): "Systolic Kalman filtering: An overview," IEE Proc.-D, 137, pp.235-244
Goodwin, G.C., Sin, K.S. (1984): Adaptive Filtering Prediction and Control (Prentice-Hall, Englewood Cliffs, NJ)
Haykin, S. (1986): Adaptive Filter Theory (Prentice-Hall, Englewood Cliffs, NJ)
Hong, L., Chen, G. and Chui, C.K. (1998): "A filter-bank-based Kalman filtering technique for wavelet estimation and decomposition of random signals," IEEE Trans. Circ. Syst. (II), 45, pp.237-241
Hong, L., Chen, G. and Chui, C.K. (1998): "Real-time simultaneous estimation and decomposition of random signals," Multidim. Sys. Sign. Proc., 9, pp.273-289
Jazwinski, A.H. (1969): "Adaptive filtering," Automatica, 5, pp.475-485
Jazwinski, A.H. (1970): Stochastic Processes and Filtering Theory (Academic, New York)
Jover, J.M., Kailath, T. (1986): "A parallel architecture for Kalman filter measurement update and parameter estimation," Automatica, 22, pp.43-57
Kailath, T. (1968): "An innovations approach to least-squares estimation, part I: linear filtering in additive white noise," IEEE Trans. Auto. Contr., 13, pp.646-655
Kailath, T. (1982): Course Notes on Linear Estimation (Stanford University, CA)
Kalman, R.E. (1960): "A new approach to linear filtering and prediction problems," Trans. ASME, J. Basic Eng., 82, pp.35-45
Kalman, R.E. (1963): "New method in Wiener filtering theory," Proc. Symp. Eng. Appl. Random Function Theory and Probability (Wiley, New York)
Kalman, R.E., Bucy, R.S. (1961): "New results in linear filtering and prediction theory," Trans. ASME J. Basic Eng., 83, pp.95-108
Kumar, P.R., Varaiya, P. (1986): Stochastic Systems: Estimation, Identification, and Adaptive Control (Prentice-Hall, Englewood Cliffs, NJ)
Kung, H.T. (1982): "Why systolic architectures?" Computer, 15, pp.37-46
Kung, S.Y. (1985): "VLSI array processors," IEEE ASSP Magazine, 2, pp.4-22
Kushner, H. (1971): Introduction to Stochastic Control (Holt, Rinehart and Winston, Inc., New York)
Lewis, F.L. (1986): Optimal Estimation (Wiley, New York)
Lu, M., Qiao, X., Chen, G. (1992): "A parallel square-root algorithm for the modified extended Kalman filter," IEEE Trans. Aero. Elect. Syst., 28, pp.153-163
Lu, M., Qiao, X., Chen, G. (1993): "Parallel computation of the modified extended Kalman filter," Int'l J. Comput. Math., 45, pp.69-87
Maybeck, P.S. (1982): Stochastic Models, Estimation, and Control, Vol. 1,2,3 (Academic, New York)
Mead, C., Conway, L. (1980): Introduction to VLSI Systems (Addison-Wesley, Reading, MA)
Mehra, R.K. (1970): "On the identification of variances and adaptive Kalman filtering," IEEE Trans. Auto. Contr., 15, pp.175-184
Mehra, R.K. (1972): "Approaches to adaptive filtering," IEEE Trans. Auto. Contr., 17, pp.693-698
Mendel, J.M. (1987): Lessons in Digital Estimation Theory (Prentice-Hall, Englewood Cliffs, NJ)
Potter, J.E. (1963): "New statistical formulas," Instrumentation Lab., MIT, Space Guidance Analysis Memo. # 40
Probability Group (1975), Institute of Mathematics, Academia Sinica, China (ed.): Mathematical Methods of Filtering for Discrete-Time Systems (in Chinese) (Beijing)
Ruymgaart, P.A., Soong, T.T. (1985,88): Mathematics of Kalman-Bucy Filtering, Springer Ser. Inf. Sci., Vol. 14 (Springer, Berlin Heidelberg)
Shiryayev, A.N. (1984): Probability (Springer-Verlag, New York)
Siouris, G., Chen, G. and Wang, J. (1997): "Tracking an incoming ballistic missile," IEEE Trans. Aero. Elect. Syst., 33, pp.232-240
Sorenson, H.W., ed. (1985): Kalman Filtering: Theory and Application (IEEE, New York)
Stengel, R.F. (1986): Stochastic Optimal Control: Theory and Application (Wiley, New York)
Strobach, P. (1990): Linear Prediction Theory: A Mathematical Basis for Adaptive Systems, Springer Ser. Inf. Sci., Vol. 21 (Springer, Berlin Heidelberg)
Wang, E.P. (1972): "Optimal linear recursive filtering methods," J. Mathematics in Practice and Theory (in Chinese), 6, pp.40-50
Wonham, W.M. (1968): "On the separation theorem of stochastic control," SIAM J. Control, 6, pp.312-326
Xu, J.H., Bian, G.R., Ni, C.K., Tang, G.X. (1981): State Estimation and System Identification (in Chinese) (Beijing)
Young, P. (1984): Recursive Estimation and Time-Series Analysis (Springer, New York)
Answers and Hints to Exercises
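Chapter 1
1.1. Since most of the properties can be verified directly by using the definition of the trace, we only consider $\mathrm{tr}\,AB = \mathrm{tr}\,BA$. Indeed, with $A = [a_{ij}]$ and $B = [b_{ij}]$,
$$\mathrm{tr}\,AB = \sum_i\Big(\sum_j a_{ij}b_{ji}\Big) = \sum_j\Big(\sum_i b_{ji}a_{ij}\Big) = \mathrm{tr}\,BA.$$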
1.2.
1.3.
1.4.
A=[~
;],
B=[~ ~].
There exist unitary matrices P and Q such that
and
n
n
LA~ ~ LJl~'
k=l
k=l
Let P = [Pij]nxn and Q = [qij]nxn. Then
+ P~n + + P;n = 1 , qrl + q~l + q~2 + q~2 + + q~2 = 1 , ... ,q~n + q~n + Pin
and
+ q~l = 1, + q~n = 1 ,
* P~l Ai
+ P~2A~ + ... + P~nA;
tr
P~lAi
+ P~2A~ + ... + P~nA~
*
= (pil + P~l + ... + p;l)Ai + ... + (Pin + P~n + ... + P;n)A; = Ai + A~ + ... + A~. Similarly, trBB T = JLi + JL~ + ... + JL~. Hence, trAA T 1.5. Denote 00 2 1=
1-00
~ trBB T .
e- Y dy.
Then, using polar coordinates, we have
1.6.
Denote I(x)
=
00
1
-00
e-
xy
2
dy.
Then, by Exercise 1.5, I(x)
= -1
Vi
100 e-(VXY) 2 d( yXy) = V1r / x . -00
Hence,
100
y2 e -
Y 2 dy
d I = --I(x) dx
-00
=
1.7.
x=l
-!-(~)I = ~V1f. dx x=l 2
(a) Let p be a unitary matrix so that R
= pT diag[Al,···, An]P,
and define
Then E(X)
= =
1: 1:
xf (x) dx
~ + y'2p-l
diag[
1/-1>.1> ... ' l/-I>.n
]y)f(x)dx
= ~l:f(X)dX + const·l
00
loo [Yl] ~n 00
... oo
e-yr . .. e-Y;dYl ... dYn
=l!:·1+0=l!:.
(b) Using the same substitution, we have
1: 1:
Var(X)
= =
(x -
~)(x - ~) T f(x)dx
2R1 / 2 yy T R 1 / 2 f(x)dx
(~~n/2Rl/2{
=
1:.. 1:[~r
YnYl
2 . e-Yl2 ... e-YndYl ... dYn } R 1/2
= R l / 2 IR l / 2 = R. 1.8. 1.9.
All the properties can be easily verified from the definitions. We have already proved that if Xl and X 2 are independent then CoV(X l ,X2 ) = o. Suppose now that R 12 = Cov(X l , X 2 ) = o. Then R 2l = COV(X2 , Xl) = 0 so that f(X l ,X2 ) =
/
1
(21l")n 2detR ll detR 22 .
e-~(Xl-~l)T Rll(Xl-~1)e-~(X2-~2)T R 22(X2 -E:2)
= fl(X l ) . f2(X 2 ). Hence,
Xl
and X 2 are independent.
1.10. (1.35) can be verified by a direct computation. First, the following formula may be easily obtained:
This yields, by taking determinants,
~Xy] = det [R xx yy
det [Rxx Ryx
RxyR;; R yx ] . detRyy
and
([;J - [~x ]) -y
=(x - i!:.)T [R xx
T
[~: ~: r\[;J - [~x ]) -y
- RxyR;;Ryxr\x -
i!:.)
+ (y - !!:.)T R;;(y -!!:.)'
where I!:.
= l!.x + RxyR;yl(y -l!.y) .
The remaining computational steps are straightforward. 1.11. Let Pk = ClWkZk and 0- 2 = E[pr (ClWkCk)-lpk]. Then it can be easily verified that
From dFd(Yk) Yk
= 2(CIWkCk)Yk -
2Pk
= 0,
and the assumption that the matrix (CJWkCk) is nonsingular, we have
1.12.
EXk = (Cl R;lCk)-lC~R;l E(Vk - DkUk) = (cl R;lCk)-lC~R;l E(CkXk =EXk·
+ !lk)
Chapter 2 2.1. l W k-,k _ l = VarC~k ,k-l) = E(fk "k-Ifl k-l)
= E(Vk-1 - Hk,k-lXk)(Vk-l - Hk,k-lXk) T Co ] Rk-l
2.2.
E~=l iPOiri-1{i_l ]
+Var
:
[
Ck-l iPk-l,krk-l{k_1
For any nonzero vector $x$, we have $x^TAx > 0$ and $x^TBx \ge 0$, so that $x^T(A + B)x = x^TAx + x^TBx > 0$. Hence, $A + B$ is positive definite.
2.3. -1 W k,k-l = E(fk ,k-lfl,k-l)
= E(fk-l,k-l - Hk,k-lrk-l{k_l)(fk-l,k-l - Hk,k-lrk-l~k_I)T = E(fk-l,k-Ifl-l,k-l) = Wk"!l,k-l
+ Hk,k-lrk-IE({k_I{~_I)rl-IH~k-1
+ Hk-l,k-Iq>k-l,krk-IQk-lrl-liPl-l,kH"[-l,k-l'
2.4.
Apply Lemma 1.2 to All = Wk"!I,k-I,A 22 = Qk~l and
2.5.
Using Exercise 2.4, or (2.9), we have H~k-l Wk,k-l =iPl- 1 ,kH"[-1 ,k-l Wk-1,k-1 - q>l-l,kH"[-I,k-1 Wk,k-IHk,k-l iPk-l,krk-1 · (Qk~l
+ rl- 1q>l-l,kH"[-l,k-l Wk-l,k-lHk-l,k-l iPk-1,krk-l)-1
· rl-1iPl- 1,kH"[-1 ,k-lWk-l,k-l =q>l-l,k{I - H"[-l,k-l Wk-l,k-lHk-l,k-l iPk-l,krk-l · (Qk~l
+ rl- 1 q>l-l,kH"[-l,k-l Wk-l,k-lHk-l,k-l iPk-l,krk-l)-l
· rl- 1 q>l-l ,k}H"[-l ,k-l Wk-l,k-l .
2.6.
Using Exercise 2.5, or (2.10), and the identity Hk,k-1 we have
Hk-1,k-1~k-1,k,
(H~k-1 Wk,k-1Hk,k-1)~k,k-1 · (H"!-1,k-1 Wk-1,k-1Hk-1,k-1)-1 H~-1,k-1Wk-1,k-1 = ~l-l,k{I - H~-1,k-1 Wk-1,k-1Hk-1,k-1 ~k-1,krk-1 ·
(Q;;~l + rl- 1~l-1,kH~-1,k-1 Wk-1,k-1Hk-1,k-1 ~k-1,krk-1)-1
· rl- 1 ~l-1,k}H~-1,k-1 Wk-1,k-1
= H~k-1 Wk,k-1 .
2.7. Pk,k-1C~ (CkPk,k-1CJ + Rk)-l =Pk,k-1C~(R;;l - R;;lCk(Pk~~_l + C~ R;;lCk)-lC~ R;;l)
=(Pk,k-1 - Pk,k-1C~ R;;lCk(Pk~_l ,
+ C~ R;;lCk)-l)C~ R;;l =(Pk,k-1 - Pk,k-1C~ (CkPk,k-1C~ + Rk)-l · (CkPk,k-1C~ + Rk)R;;lCk(Pk~_l + c~ R"k1Ck)-1)C"! R"k 1 , =(Pk,k-1 - Pk,k-1C~ (CkPk,k-1C~ + Rk)-l · (CkPk,k-1C~ R"k1Ck + Ck)(Pk,~-l + C~ R"k1Ck)-1)C~ R"k 1 =(Pk,k-1 - Pk,k-1C~ (CkPk,k-1C~ + Rk)-lCkPk,k-1 · (C~ R;;lCk + Pk~~-l)(Pk,~-l + c~ R"k1Ck)-1)C~ R;;l =(Pk,k-1 - Pk,k-1C~ (CkPk,k-1C~ + Rk)-lCkPk,k-1)C~ R"k 1 =Pk,kCJ R;;l =Gk·
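As a numerical sanity check of the identity in Exercise 2.7, the following Python sketch compares the two expressions for the gain on a small example. The numbers, the restriction to a scalar observation, and the helper name `gains` are illustrative assumptions and not part of the original answer.

```python
def gains(p, c, r):
    """Return the gain computed two ways for a scalar observation.

    First:  G = P C^T (C P C^T + R)^{-1}
    Second: G = P_{k,k} C^T R^{-1}, with P_{k,k} = P - P C^T (C P C^T + R)^{-1} C P
    """
    n = len(p)
    pc = [sum(p[i][m] * c[m] for m in range(n)) for i in range(n)]       # P C^T
    s = sum(c[i] * pc[i] for i in range(n)) + r                          # C P C^T + R
    g_direct = [pc_i / s for pc_i in pc]
    p_upd = [[p[i][j] - pc[i] * pc[j] / s for j in range(n)] for i in range(n)]
    g_alt = [sum(p_upd[i][m] * c[m] for m in range(n)) / r for i in range(n)]
    return g_direct, g_alt


# Hypothetical P_{k,k-1}, C_k, and R_k
g1, g2 = gains([[4.0, 1.0], [1.0, 3.0]], [1.0, 0.5], 2.0)
print(g1, g2)
```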
2.8. Pk,k-1 =(H~k-1 Wk,k-1Hk,k-1)-1
=( ~l-1,k(H"!-1,k-1Wk-1,k-1Hk-1,k-1 - H~-1,k-1 Wk-1,k-1Hk-1,k-1 ~k-1,krk-1 ·
(Q;;~l + rl- 1~J-1,kH"!-1,k-1 Wk-1,k-1Hk-1,k-1 ~k-1,krk-1)-1
· rl- 1~J-1 ,kH~-l ,k-1 Wk-1,k-1Hk-1,k-1)~k-1,k)-1
=( ~l-1,kPk-.!1,k-1~k-1,k - ~l-1,kPk.!1,k-1 ~k-1,krk-1 ·
(Q"k~l + rJ-1 ~J-1,kPk.!1,k-1~k-1,krk-1)-1
· r k-1 '*'k-1,k k-1,k-1 '*'k-1,k T
m. T
p-1
m.
)-1
=( ~l-1,kPk!1,k-1~k_1,k)-1 + rk-1 Qk-1rl-1 =Ak-1Pk-1,k-1Al-1
+ rk-1Qk-1rl-1 .
2.9. E(Xk - Xklk-l)(Xk - Xklk-l) T =E(Xk - (H~k-l Wk,k-lHk,k-l)-l H~k-lWk,k-lVk-l) o
(Xk - (H~k-l Wk,k-lHk,k-l)-l H~k-lWk,k-lVk-l) T
=E(Xk - (H~k-l Wk,k-lHk,k-l)-l H~k-lWk,k-l (Hk,k-lXk
o
o
+ fk,k-l))(Xk
- (H~k-l Wk,k-lHk,k-l)-l
H~k-l Wk,k-l(Hk,k-lXk + f.k,k-l)) T
=(H~k-l Wk,k-lHk,k-l)-l H~k-lWk,k-lE(fk,k-lfr,k-l)Wk,k-l o
Hk,k-l(H~k-l Wk,k-lHk,k-l)-l
=(H~k-lWk,k-lHk,k-l)-l =Pk,k-l
o
The derivation of the second identity is similar.
2.10. Since
$$\sigma^2 = \mathrm{Var}(x_k) = E(ax_{k-1} + \xi_{k-1})^2 = a^2\mathrm{Var}(x_{k-1}) + 2aE(x_{k-1}\xi_{k-1}) + E(\xi_{k-1}^2) = a^2\sigma^2 + \mu^2,$$
we have
$$\sigma^2 = \frac{\mu^2}{1 - a^2}.$$
For $j = 1$, we have
$$E(x_kx_{k+1}) = E(x_k(ax_k + \xi_k)) = a\,\mathrm{Var}(x_k) + E(x_k\xi_k) = a\sigma^2.$$
For $j = 2$, we have
$$E(x_kx_{k+2}) = E(x_k(ax_{k+1} + \xi_{k+1})) = aE(x_kx_{k+1}) + E(x_k\xi_{k+1}) = aE(x_kx_{k+1}) = a^2\sigma^2,$$
etc. If $j$ is negative, then a similar result can be obtained. By induction, we may conclude that $E(x_kx_{k+j}) = a^{|j|}\sigma^2$ for all integers $j$.
2.11. Using the Kalman filtering equations (2.17), we have Po,o = Var(xo) = J.L2 ,
=
Pk,k-l
Pk-l,k-l ,
Gk = Pk,k-l ( Pk,k-l
+ Rk )
-1
= P
Pk-l,k-l k-l,k-l
+a
and Pk,k
=( 1- ) Gk Pk,k-l =
a2Pk-lk-l 2 p' a k-l,k-l
+
.
Observe that
Hence, Pk-l,k-l
G k=
Pk-l,k-l
+ 0'2
so that Xklk
= xklk-l + Gk(Vk '"
= Xk-llk-l + with
xOlo
J.L2 a
2
+
k
'" J.L
2 (Vk - Xk-llk-l)
= E(xo) = o. It follows that
for large values of 2.12.
xklk-l)
k.
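$$\begin{aligned}
\hat{Q}_N &= \frac{1}{N}\sum_{k=1}^{N}(v_kv_k^T)\\
&= \frac{1}{N}(v_Nv_N^T) + \frac{1}{N}\sum_{k=1}^{N-1}(v_kv_k^T)\\
&= \frac{1}{N}(v_Nv_N^T) + \frac{N-1}{N}\hat{Q}_{N-1}\\
&= \hat{Q}_{N-1} + \frac{1}{N}\big[(v_Nv_N^T) - \hat{Q}_{N-1}\big]
\end{aligned}$$
with the initial estimation $\hat{Q}_1 = v_1v_1^T$.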
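The recursion for $\hat{Q}_N$ can be compared directly with the batch average. The following Python sketch does this on a few made-up data vectors; the function names and the data are assumptions for illustration only.

```python
def recursive_second_moment(samples):
    """Q_N = Q_{N-1} + (1/N) (v_N v_N^T - Q_{N-1}), starting from Q_1 = v_1 v_1^T."""
    q = None
    for n, v in enumerate(samples, start=1):
        outer = [[vi * vj for vj in v] for vi in v]          # v_N v_N^T
        if q is None:
            q = outer
        else:
            q = [[q[i][j] + (outer[i][j] - q[i][j]) / n for j in range(len(v))]
                 for i in range(len(v))]
    return q


def batch_second_moment(samples):
    """Q_N = (1/N) * sum_k v_k v_k^T computed in one pass over all data."""
    n, d = len(samples), len(samples[0])
    return [[sum(v[i] * v[j] for v in samples) / n for j in range(d)] for i in range(d)]


data = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0], [3.0, 1.5]]    # hypothetical v_1..v_4
q_rec = recursive_second_moment(data)
q_batch = batch_second_moment(data)
print(q_rec, q_batch)
```

The recursive form needs only the previous estimate and the newest data vector, which is what makes it suitable for real-time use.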
2 '
2.13. Use superimposition. 2.14. Set Xk = [(xl) T ... (xf) T]T for each k, k (and Uj = 0) for j < 0, and define
= 0,1, ... ,
with
Xj =
0
Then, substituting these equations into
yields the required result. Since Xj it is also clear that Xo = o.
= 0
and
Uj
= 0 for
j
< 0,
Chapter 3 3.1. 3.2.
Let A = BB T where B = [b ij ] =1= o. Then trA = trBB T = Ei,j b;j > O. By Assumption 2.1, TU is independent ofxo, {o' ... , {j-I' !la ' !l.j -1' since f ~ j. On the other hand, ej = Cj(Xj -
Yj-l) j-l
= Cj
(Aj-IXj-1
+ rj-IE. -J- 1 -
P
""'" j - l i(CiXi ~,
+ "7,)) -'I,
i=O
= Boxo
j-l
j-l
i=O
i=O
+ E Bli~i + E B 2i'!1.i
for some constant matrices Ba, B li and B 2i • Hence, (TU, ej)
= Oqxq for all f 3.3.
~ j.
Combining (3.8) and (3.4), we have j-l
ej
= IIzjll;lzj = IIzjll;lvj - E(llzjll;lCjPj-1,i)vi; i=O
that is, ej can be expressed in terms of vD, VI,· .. ,Vj. Conversely, we have Vo =Zo = Ilzollqeo,
+ CIYo = ZI + CIPO,OVo =llzIllqel + CIPo,ollzollqeo,
VI =ZI
that is, Vj can also be expressed in terms of eo, el, ... , ej. Hence, we have
3.4.
By Exercise 3.3, we have i
Vi
= LLlel l=O
for some
qx q
constant matrices Ll' f = 0,1,· .. , i, so that i
(Vi, Zk) = LLl(el, ek)llzkll; = Oqxq, l=O
i
= 0,1, ... ,k - 1. Hence, for
j
= 0,1, ... ,k - 1,
j
= LPj,i(Vi' Zk) i=O
= Onxq.
3.5.
Since Xk
+ rk-l~k_1 I (Ak-2 X k-2 + rk-2~k_2) + rk-l~k_1
= Ak-IXk-1
= A k-
k-I
= Boxo +
L BIi~i i=O
for some constant matrices B o and B Ii and ~k is independent of Xo and ~i (0::; i ::; k - 1), we have (Xk' ~k) = o. The rest can ·be shown in a similar manner.
3.6. 3.7.
Use superimposition. Using the formula obtained in Exercise 3.6, we have ~klk = {
3.8.
dOlo
dk-llk-l
+ hWk-l + Gk(Vk -
fldk - dk-llk-l - hWk-l)
== E(do) ,
where Gk is obtained by using the standard algorithm (3.25) with A k == Ck == rk == 1. Let
C==[100].
Then the system described in Exercise 3.8 can be decomposed into three subsystems: Xk+l {
= Ax~ + r~i~
vk == CXk + 1]k ,
== 1,2,3, where for each k, Xk and ~k are 3-vectors, Vk and are scalars, Qk a 3 x 3 non-negative definite symmetric matrix, and Rk > 0 a scalar.
i
1]k
Chapter 4
4.1. Using (4.6), we have
$$\begin{aligned}
L(Ax + By, v) &= E(Ax + By) + \langle Ax + By, v\rangle[\mathrm{Var}(v)]^{-1}(v - E(v))\\
&= A\{E(x) + \langle x, v\rangle[\mathrm{Var}(v)]^{-1}(v - E(v))\} + B\{E(y) + \langle y, v\rangle[\mathrm{Var}(v)]^{-1}(v - E(v))\}\\
&= AL(x, v) + BL(y, v).
\end{aligned}$$
4.2. Using (4.6) and the fact that $E(a) = a$ so that
$$\langle a, v\rangle = E(a - E(a))(v - E(v))^T = 0,$$
we have
$$L(a, v) = E(a) + \langle a, v\rangle[\mathrm{Var}(v)]^{-1}(v - E(v)) = a.$$
4.3.
By definition, for a real-valued function A = [aij], dj /dA = [aj /aaji]. Hence,
o=
j
and a matrix
8~ (trllx - YII~) a
= 8HE((x - E(x)) - H(v - E(v))) T ((x - E(x)) - H(v - E(v)))
a
= E 8H((x - E(x)) - H(v - E(v))) T ((x - E(x)) - H(v - E(v))) = E( -2(x - E(x)) - H(v - E(v))) (v - E(v)) T = 2(H E(v - E(v)) (v - E(v))T - E(x - E(x)) (v - E(v))T) =2(Hllvll~-(x, v)).
This gives so that x* = E(x) - (x, v) [IIVII~] -1 (E(v) - v).
4.4.
Since v k - 2 is a linear combination (with constant matrix coefficients) of xa, {a' ... , {k-3' '!la' ... , '!lk-2
which are all uncorrelated with
{k-l
and '!lk-l' we have
and
Similarly, we can verify the other formulas [where (4.6) may be used].
4.5. The first identity follows from the Kalman gain equation (cf. Theorem 4.1(c) or (4.19)), namely:
$$G_k = P_{k,k-1}C_k^T(C_kP_{k,k-1}C_k^T + R_k)^{-1},$$
so that
$$G_kR_k = P_{k,k-1}C_k^T - G_kC_kP_{k,k-1}C_k^T = (I - G_kC_k)P_{k,k-1}C_k^T.$$
To prove the second equality, we apply (4.18) and (4.17) to obtain rk-l~k_l -
(Xk-l - Xk-llk-l,
Kk-l!lk_l)
(Xk-1 - Xk-1Ik-Z - (x# k-l, v# k-1) [lIv# k_IiI
rk-l~k_l -
= -(X#k-b =
-1 V# k-1,
r\C
z
k- 1X#k-1
+ !lk-1)'
- Kk-l!lk_l)
V#k-1) [lI v #k- 1Il
r\SJ-
z
1fL1 - Rk-1KJ-1)
Onxn,
in which since
Kk-l
= rk-ISk-lRk~I' we have
sJ-Irl- 1 4.6.
]
Kk-l!lk_l)
(X#k-1 - (X#k-1, V#k-1) [llv#k-III
rk-l~k_l
Z
Rk-IKJ-I
= Onxn·
Follow the same procedure in the derivation of Theorem 4.1 with the term Vk replaced by Vk - DkUk, and with
= L(Ak-IXk-l + Bk-lUk-l + rk-l~k_I'
xklk-l
V
k 1 - )
instead of xklk-I
4.7.
=
L(Xk'
Let Wk
=
Xk
=
k 1 - )
=
L(Ak-IXk-1
+ rk-I~k_l'
k 1 V - ).
+ bl Uk-I + CIek-1 + Wk-I , -a2 V k-2 + b2 U k-2 + Wk-2 ,
-alVk-I
Wk-I =
and define
v
[Wk
Wk-I
Xk+1 {
Vk
Wk_2]T.
Then,
= AXk + BUk + rek = CXk + DUk + D..ek ,
where A=
[=;; ~
n,
C=[l 00],
D = [b o]
and
D.. = [co].
4.8.
Let Wk
=
Wk-l
+ bl Uk-l + clek-l + Wk-l , = -a2 v k-2 + b2U k-2 + C2 e k-2 + Wk-2 , -alVk-l
where bj = 0 for Xk
> m and
j
=
[Wk Wk-l
{
Vk
Then
= 0 for
Cj
...
j
> f, and define
Wk_n+I]T.
= AXk + BUk + rek = CXk + DUk + flek ,
Xk+l
where -al -a2
1 0
0 1
0 0
-an-l -an
0 0
0 0
1 0
A=
bl
B=
-
albo
Cl -
bm - ambo -am+lbo
Cl - alcO -al+l
r=
0],
C=[10
= [b o],
D
alcO
fl = [co].
and
Chapter 5 5.1.
Since v k is a linear combination (with constant matrices as coefficients) of Xo, !lo' 1 0, ...
, 'lk'
io'
f!..o'
...
,
f!..k-l
which are all independent of fi k , we have (f!..k' v
On the other hand, have
k
)
= o.
fi k has zero-mean, so that by (4.6) we
r 1
L(1!-k'
v
k
)
=
E(1!-k) - (1!-k'
v
k
)
[lI vk l1 2
k (E(v ) -
v
k
)
=
o.
5.2.
Using Lemma 4.2 with
v
= V k-
I , VI
L( Vk-I,
V # k-I -- Vk-I -
= V k- 2,
V2
= Vk-I and
V k-2) ,
we have, for x = Vk-I, L(Vk-l, v k -
I
)
k 2 - )
+ (V#k-ll
k
+ Vk-I
= L(Vk-ll
v
= L(Vk-l,
v -
2
)
V#k-l)
[II V #k_ 1 11 2]
-1 V#k-l
L(Vk-l, v k- 2 )
-
= Vk-I .
5.3.
The equality L(lk' v k - I ) = 0 can be shown by imitating the proof in Exercise 5.1. It follows from Lemma 4.2 that Zk-I - Zk-I -
L(Zk-l,
k I V - )
= Zk-l -
E(Zk-l)
+ (Zk-ll
= Zk-I
_ -
v
k
-
1
)
[lI vk - 1 11 2]
-1
(E(v k- 1 )
_ v
k
-
1
)
[Xk-I] _ [E(Xk-I)]
~k-I
E(~k_l)
]-1 (E(V k- 1 ) + [(Xk-l, V:=~)] [llv k - 111 2 ({k_I'V
)
_
vk - 1)
whose first n-subvector and last p-subvector are, respectively, linear combinations (with constant matrices as coefficients) of Xo,
~o'
f!..o'
...
, f!..k-2' !lo'
10, ... ,
lk-I'
which are all independent of lk. Hence, we have
5.4. 5.5.
The proof is similar to that of Exercise 5.3. For simplicity, denote B = [CoVar(xo)cri +RO]-I.
It follows from (5.16) that Var(xo - xo)
= Var(xo - E(xo) - [Var(xo)] cri [CoVar(xo)cri + RO]-I(vO - CoE(xo))) = Var(xo - E(xo) - [Var(xo)]cri B(Co(xo - E(xo)) + ~o))
= Var((1 - [Var(xo)] cri BCo)(xo - E(xo)) - [Var(xo)]C~ B!lo) = (I - [Var(xo)]C~ BCo)Var(xo) (I - cri BCo[Var(xo)])
+ [Var(xo)]Cri BRoBCo[Var(xo)] = Var(xo) - [Var(xo)]cri BCo[Var(xo)] - [Var(xo)]Cri BCo[Var(xo)]
+ [Var(xo)]cri BCo[Var(xo)]C~ BCo[Var(xo)] + [Var(xo)]Cri BRoBCo[Var(xo)] = Var(xo) -
[Var(xo)]C~ BCo[Var(xo)]
- [Var(xo)]Cri BCo[Var(xo)] + [Var(xo)]C~ BCo[Var(xo)] = Var(xo) - [Var(xo)]C~ BCo[Var(xo)].
5.6.
From ~o
== 0,
we have Xl == Aoxo
and ~l
== 0,
+ GI(VI - CIAoxo)
so that X2 == AIXI
+ G 2(V2 -
C 2A I XI) ,
etc. In general, we have
+ Gk(Vk - CkAk-IXk-l) + Gk(Vk - CkXklk-l) .
Xk == Ak-IXk-1 == xklk-l
Denote and Then
Pl =
([~o ~o] -Gl[ClAo CIf0])
+[~ ~J = [[ In - Pl,ocl (ClPl'gCl
[Pg,O
~oJ
[:f
~]
+Rd-lCl ]Pl,o 31]'
and, in general, Gk = [Pk,k-l C;[ (CkPkt-lC;[ + Rk)-l] , Pk = [[ In - Pk,k-lC;[(CkPk,kOl C;[
3J.
+ Rd-lCk]Pk,k-l
Finally, if we use the unbiased estimate :Ko = E(xo) of xo instead of the somewhat more superior initial state estimate i o = E(xo) - [Var(xo)]C~[CoVar(xo)C~ + Ro]-l[CoE(xo) - vo], and consequently set Po =E([XO ] ~o
_
= [var~xo)
5.7.
[E(Xo)]) E~o)
([xo] _[E(Xo)])T ~o E~o)
30]'
then we obtain the Kalman filtering algorithm derived in Chapters 2 and 3. Let and Hk-l = [ CkAk-l - Nk-lCk-l ]. Starting with (5.17b), namely: Po = [( [var(xo)]-lo+ COR01CO)-1
0]
Qo
[Po 0
0]
Qo
'
we have
Gl= [~o ~o] [~o ~oJ [f¥~l ] .([Ho Clfo][~O ~o][f¥~l]+Rl)-l = [(AoPoH~ +foQofriCl) (Ho~oH~ +ClfoQofri cl +RI) -1] :=
[~l ]
and PI
= ([
~o ~o] - [~l ][Ho
+[~ = [(Ao :=
[~l
GIrO
l) [~o
o]
Qo
[AT 0]
rJ
0
3J G1Ho)PoAJ -;; (I - G1G1)roQorJ
3J.
In general, we obtain Xk = Ak-1Xk-1 + Ok(Vk - Nk-1Vk-1 - Hk-1Xk-1) Xo = E(xo) - [Var(xo)] cri [CoVar(xo)Cri + Ro]-l[CoE(xo) - vo] Hk-1 = [ CkAk-1 - Nk-1Ck-1 ] -T T Pk = (A k- 1 - GkHk-1)Pk-1Ak-1 + (1 - GkCk)rk-1Qk-1rk-1 -T T T Gk = (Ak-1Pk-1Hk-1 + fk-1Qk-1fk-1Ck ). -T T T 1 (Hk-lPk-1Hk-1 + Ckrk-1Qk-1rk-1ck + Rk-1)Po = [ [Var(xo)]-l + cri Ra1Co]-1 k = 1,2,··· .
By omitting the "bar" on Hk, Ok, and Pk , we have (5.21).
5.8.
(a) Xk+1 = AcX k { Vk = CcX · k
+ ~k
(b) var(xo) Po,o =
[
0
o
0 00] ,
Var(~o)
o
Var(1Jo)
o o rk-1
(c) The matrix CJ Pk,k-l Cc may not be invertible, and the extra estimates ~k and r,k in Xk are needed.
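Chapter 6
6.1. Since
$$x_{k-1} = A^nx_{k-n-1} + \text{noise}$$
and
$$\begin{aligned}
\hat{x}_{k-1} &= A^n[N_{CA}^TN_{CA}]^{-1}(C^Tv_{k-n-1} + A^TC^Tv_{k-n} + \cdots + (A^T)^{n-1}C^Tv_{k-2}) \\
&= A^n[N_{CA}^TN_{CA}]^{-1}(C^TCx_{k-n-1} + A^TC^TCAx_{k-n-1} + \cdots + (A^T)^{n-1}C^TCA^{n-1}x_{k-n-1} + \text{noise}) \\
&= A^n[N_{CA}^TN_{CA}]^{-1}[N_{CA}^TN_{CA}]x_{k-n-1} + \text{noise} \\
&= A^nx_{k-n-1} + \text{noise},
\end{aligned}$$
we have $E(\hat{x}_{k-1}) = E(A^nx_{k-n-1}) = E(x_{k-1})$.
6.2. Since $A(s)A^{-1}(s) = I$, we have
$$\left[\frac{d}{ds}A(s)\right]A^{-1}(s) + A(s)\frac{d}{ds}A^{-1}(s) = 0.$$
Hence,
$$\frac{d}{ds}A^{-1}(s) = -A^{-1}(s)\left[\frac{d}{ds}A(s)\right]A^{-1}(s).$$
6.3.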
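Let $P = U\,\mathrm{diag}[\lambda_1, \ldots, \lambda_n]\,U^{-1}$. Then
$$P - \lambda_{\min}I = U\,\mathrm{diag}[\lambda_1 - \lambda_{\min}, \ldots, \lambda_n - \lambda_{\min}]\,U^{-1} \ge 0.$$
6.4.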
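Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $F$ and $J$ be its Jordan canonical form. Then there exists a nonsingular matrix $U$ such that
$$U^{-1}FU = J = \begin{bmatrix} \lambda_1 & * & & \\ & \lambda_2 & \ddots & \\ & & \ddots & * \\ & & & \lambda_n \end{bmatrix}$$
with each $*$ being 1 or 0. Hence,
$$F^k = UJ^kU^{-1} = U\begin{bmatrix} \lambda_1^k & * & \cdots & * \\ & \lambda_2^k & \ddots & \vdots \\ & & \ddots & * \\ & & & \lambda_n^k \end{bmatrix}U^{-1},$$
where each $*$ denotes a term whose magnitude is bounded by $p(k)|\lambda_{\max}|^k$ with $p(k)$ being a polynomial of $k$ and $|\lambda_{\max}| = \max(|\lambda_1|, \ldots, |\lambda_n|)$. Since $|\lambda_{\max}| < 1$, $F^k \to 0$ as $k \to \infty$.
6.5. Since
$$0 \le (A - B)(A - B)^T = AA^T - AB^T - BA^T + BB^T,$$
we have
$$AB^T + BA^T \le AA^T + BB^T.$$
Hence,
$$(A + B)(A + B)^T = AA^T + AB^T + BA^T + BB^T \le 2(AA^T + BB^T).$$
6.6. Since $x_{k-1} = Ax_{k-2} + \Gamma\xi_{k-2}$ is a linear combination (with constant matrices as coefficients) of $x_0, \xi_0, \ldots, \xi_{k-2}$ and
$$\hat{x}_{k-1} = A\hat{x}_{k-2} + G(v_{k-1} - CA\hat{x}_{k-2}) = A\hat{x}_{k-2} + G(CAx_{k-2} + C\Gamma\xi_{k-2} + \eta_{k-1}) - GCA\hat{x}_{k-2}$$
is an analogous linear combination of $x_0, \xi_0, \ldots, \xi_{k-2}$ and $\eta_{k-1}$, which are uncorrelated with $\xi_{k-1}$ and $\eta_k$, the two identities follow immediately.
6.7. Since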
$$P_{k,k-1}C^TG_k^T - G_kCP_{k,k-1}C^TG_k^T = G_kCP_{k,k-1}C^TG_k^T + G_kRG_k^T - G_kCP_{k,k-1}C^TG_k^T = G_kRG_k^T,$$
we have
$$(I - G_kC)P_{k,k-1}C^TG_k^T = G_kRG_k^T.$$
Hence,
$$\begin{aligned}
P_{k,k} &= (I - G_kC)P_{k,k-1} \\
&= (I - G_kC)P_{k,k-1}(I - G_kC)^T + G_kRG_k^T \\
&= (I - G_kC)(AP_{k-1,k-1}A^T + \Gamma Q\Gamma^T)(I - G_kC)^T + G_kRG_k^T \\
&= (I - G_kC)AP_{k-1,k-1}A^T(I - G_kC)^T + (I - G_kC)\Gamma Q\Gamma^T(I - G_kC)^T + G_kRG_k^T.
\end{aligned}$$
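The first two equalities for $P_{k,k}$ can be checked numerically: with the Kalman gain, the short update $(I - G_kC)P_{k,k-1}$ and the symmetric form $(I - G_kC)P_{k,k-1}(I - G_kC)^T + G_kRG_k^T$ should coincide. The following Python sketch does this for a hypothetical two-state system with a scalar measurement; the helper names and the numbers used in the usage note are illustrative assumptions, not taken from the text.

```python
def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*x)]


def add(x, y):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def sub(x, y):
    """Entrywise difference of two matrices of equal shape."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def covariance_updates(p_pred, c, r):
    """Return (short form, symmetric form) of P_{k,k}.

    Assumes a scalar measurement: c is 1 x n and r is a positive number,
    so the matrix inverse in the gain reduces to a division.
    """
    ct = transpose(c)                                  # n x 1
    s = matmul(matmul(c, p_pred), ct)[0][0] + r        # C P C^T + R (scalar)
    g = [[row[0] / s] for row in matmul(p_pred, ct)]   # Kalman gain, n x 1
    i_gc = sub(identity(len(p_pred)), matmul(g, c))    # I - G C
    short = matmul(i_gc, p_pred)
    grg = [[r * gi[0] * gj[0] for gj in g] for gi in g]  # G R G^T
    symmetric = add(matmul(matmul(i_gc, p_pred), transpose(i_gc)), grg)
    return short, symmetric
```

For example, `covariance_updates([[2.0, 0.5], [0.5, 1.0]], [[1.0, 0.0]], 0.5)` returns two matrices that agree up to rounding. The symmetric form is often preferred in floating-point implementations because it is a sum of non-negative definite terms.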
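6.8. Imitating the proof of Lemma 6.8 and assuming that $|\lambda| \ge 1$, where $\lambda$ is an eigenvalue of $(I - GC)A$, we arrive at a contradiction to the controllability condition.
6.9. The proof is similar to that of Exercise 6.6.
6.10. From
$$0 \le \langle \epsilon_j - \delta_j, \epsilon_j - \delta_j \rangle = \langle \epsilon_j, \epsilon_j \rangle - \langle \epsilon_j, \delta_j \rangle - \langle \delta_j, \epsilon_j \rangle + \langle \delta_j, \delta_j \rangle$$
and Theorem 6.2, we have
$$\langle \epsilon_j, \delta_j \rangle + \langle \delta_j, \epsilon_j \rangle \le \langle \epsilon_j, \epsilon_j \rangle + \langle \delta_j, \delta_j \rangle$$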
(Xj -Xj +Xj -Xj,Xj -Xj +Xj -Xj) + IIxj -Xjll~ --
Ilx·J -
2 x·11 J n
+ (x·J -
x·J' x·J -
x·) J
+ (Xj - Xj,Xj - Xj) + 211xj - Xjll;
:S 211 x j - Xj 11; + 311 x j - Xj 11; -+ 5(P- 1 + C T R-IC)-I
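as $j \to \infty$. Hence, $B_j = \langle \epsilon_j, \delta_j \rangle A^TC^T$ are componentwise uniformly bounded.
6.11. Using Lemmas 1.4, 1.6, 1.7 and 1.10 and Theorem 6.1, and applying Exercise 6.10, we have
$$\begin{aligned}
&\mathrm{tr}[FB_{k-1-i}(G_{k-i} - G)^T + (G_{k-i} - G)B_{k-1-i}^TF^T] \\
&\quad\le (n\,\mathrm{tr}\,FB_{k-1-i}(G_{k-i} - G)^T(G_{k-i} - G)B_{k-1-i}^TF^T)^{1/2} + (n\,\mathrm{tr}\,(G_{k-i} - G)B_{k-1-i}^TF^TFB_{k-1-i}(G_{k-i} - G)^T)^{1/2} \\
&\quad\le (n\,\mathrm{tr}\,FF^T \cdot \mathrm{tr}\,B_{k-1-i}B_{k-1-i}^T \cdot \mathrm{tr}\,(G_{k-i} - G)^T(G_{k-i} - G))^{1/2} + (n\,\mathrm{tr}\,(G_{k-i} - G)(G_{k-i} - G)^T \cdot \mathrm{tr}\,B_{k-1-i}^TB_{k-1-i} \cdot \mathrm{tr}\,F^TF)^{1/2} \\
&\quad= 2(n\,\mathrm{tr}\,(G_{k-i} - G)(G_{k-i} - G)^T \cdot \mathrm{tr}\,B_{k-1-i}^TB_{k-1-i} \cdot \mathrm{tr}\,F^TF)^{1/2}
\end{aligned}$$
for some real number $r_1$, $0 < r_1 < 1$, and some positive constant $C$ independent of $i$ and $k$.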
6.12. First, solving the Riccati equation (6.6); that is,
$$c^2p^2 + [(1 - a^2)r - c^2\gamma^2q]p - rq\gamma^2 = 0,$$
we obtain
$$p = \frac{1}{2c^2}\left\{c^2\gamma^2q + (a^2 - 1)r + \sqrt{[(1 - a^2)r - c^2\gamma^2q]^2 + 4c^2\gamma^2qr}\right\}.$$
Then, the Kalman gain is given by
$$g = pc/(c^2p + r).$$
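The closed-form root and gain above are easy to sanity-check numerically. The Python sketch below assumes that the quadratic is the fixed-point form of the scalar recursion $p \mapsto a^2[p - c^2p^2/(c^2p + r)] + \gamma^2q$ (one can verify by clearing denominators that this leads to the same quadratic); the function names and parameter values are illustrative and not taken from the text.

```python
import math


def steady_state_scalar(a, c, gamma, q, r):
    """Positive root p of c^2 p^2 + [(1 - a^2) r - c^2 gamma^2 q] p - r q gamma^2 = 0
    together with the gain g = p c / (c^2 p + r)."""
    b = (1.0 - a * a) * r - c * c * gamma * gamma * q
    disc = b * b + 4.0 * c * c * gamma * gamma * q * r
    p = (-b + math.sqrt(disc)) / (2.0 * c * c)
    g = p * c / (c * c * p + r)
    return p, g


def riccati_step(p, a, c, gamma, q, r):
    """One step of the scalar recursion whose fixed point is the root above."""
    return a * a * (p - (c * p) ** 2 / (c * c * p + r)) + gamma * gamma * q


if __name__ == "__main__":
    params = (0.9, 1.0, 1.0, 0.5, 2.0)  # a, c, gamma, q, r (illustrative values)
    p, g = steady_state_scalar(*params)
    p_iter = 10.0
    for _ in range(200):
        p_iter = riccati_step(p_iter, *params)
    print("closed form p:", p, "iterated p:", p_iter, "gain g:", g)
```

Iterating the recursion from an arbitrary positive starting value should approach the same `p` as the closed form, which is a quick way to catch sign errors in the formula.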
Chapter 7
7.1. The proof of Lemma 7.1 is constructive. Let $A = [a_{ij}]_{n \times n}$ and $A^c = [\ell_{ij}]_{n \times n}$. It follows from $A = A^c(A^c)^T$ that
$$a_{ii} = \sum_{k=1}^{i}\ell_{ik}^2, \qquad i = 1, 2, \ldots, n,$$
and
$$a_{ij} = \sum_{k=1}^{j}\ell_{ik}\ell_{jk}, \qquad j \ne i; \quad i, j = 1, 2, \ldots, n.$$
Hence, it can be easily verified that
$$\ell_{ii} = \Big(a_{ii} - \sum_{k=1}^{i-1}\ell_{ik}^2\Big)^{1/2}, \qquad i = 1, 2, \ldots, n,$$
$$\ell_{ij} = \Big(a_{ij} - \sum_{k=1}^{j-1}\ell_{ik}\ell_{jk}\Big)\Big/\ell_{jj}, \qquad j = 1, 2, \ldots, i - 1; \quad i = 2, 3, \ldots, n,$$
and
$$\ell_{ij} = 0, \qquad j = i + 1, i + 2, \ldots, n; \quad i = 1, 2, \ldots, n.$$
This gives the lower triangular matrix $A^c$. This algorithm is called the Cholesky decomposition. For the general case, we can use a (standard) singular value decomposition (SVD) algorithm to find an orthogonal matrix $U$ such that
$$U\,\mathrm{diag}[s_1, \ldots, s_r, 0, \ldots, 0]\,U^T = AA^T,$$
where $1 \le r \le n$, $s_1, \ldots, s_r$ are singular values (which are positive numbers) of the non-negative definite and symmetric matrix $AA^T$, and then set
$$A = U\,\mathrm{diag}[\sqrt{s_1}, \ldots, \sqrt{s_r}, 0, \ldots, 0].$$
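The Cholesky recursion in 7.1 translates almost line by line into code. The following Python sketch is one way to write it, assuming a symmetric positive definite input stored as a list of rows; the function name `cholesky_lower` and the example matrix are illustrative and not taken from the text.

```python
import math


def cholesky_lower(a):
    """Return the lower triangular factor L with A = L L^T.

    Row i is filled from left to right: the off-diagonal entries l[i][j]
    (j < i) come first, then the diagonal entry l[i][i].
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
        d = a[i][i] - sum(l[i][k] ** 2 for k in range(i))
        if d <= 0.0:
            # The square root below would fail; the SVD route is needed instead.
            raise ValueError("matrix is not positive definite")
        l[i][i] = math.sqrt(d)
    return l


if __name__ == "__main__":
    a = [[4.0, 2.0, 2.0],
         [2.0, 5.0, 3.0],
         [2.0, 3.0, 6.0]]
    for row in cholesky_lower(a):
        print(row)
```

The explicit check on `d` marks where the recursion breaks down for a matrix that is only non-negative definite, which is the situation the SVD-based construction is meant to cover.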
7.2. (a)
L
=
[~
0
0
n·
2 -2
(b)
L
=
J2/2 m [J2 J2/2 1.5/m
oo ] . J2]
7.3.
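(a)
$$L^{-1} = \begin{bmatrix} 1/\ell_{11} & 0 & 0 \\ -\ell_{21}/\ell_{11}\ell_{22} & 1/\ell_{22} & 0 \\ -\ell_{31}/\ell_{11}\ell_{33} + \ell_{32}\ell_{21}/\ell_{11}\ell_{22}\ell_{33} & -\ell_{32}/\ell_{22}\ell_{33} & 1/\ell_{33} \end{bmatrix}.$$
(b)
$$L^{-1} = \begin{bmatrix} b_{11} & 0 & 0 & \cdots & 0 \\ b_{21} & b_{22} & 0 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ b_{n1} & b_{n2} & b_{n3} & \cdots & b_{nn} \end{bmatrix},$$
where
$$b_{ii} = \ell_{ii}^{-1}, \qquad i = 1, 2, \ldots, n;$$
$$b_{ij} = -\ell_{jj}^{-1}\sum_{k=j+1}^{i} b_{ik}\ell_{kj}, \qquad j = i - 1, i - 2, \ldots, 1; \quad i = 2, 3, \ldots, n.$$
7.4.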
In the standard Kalman filtering process,
which is a singular matrix. However, its "square-root" is p1/2 _
k,k -
[E/~ 0
which is a nonsingular matrix.
0]1 [E0 0]1 ~
7.5. Analogous to Exercise 7.1, let $A = [a_{ij}]_{n \times n}$ and $A^u = [\ell_{ij}]_{n \times n}$. It follows from $A = A^u(A^u)^T$ that
$$a_{ii} = \sum_{k=i}^{n}\ell_{ik}^2, \qquad i = 1, 2, \ldots, n,$$
and
$$a_{ij} = \sum_{k=j}^{n}\ell_{ik}\ell_{jk}, \qquad j \ne i; \quad i, j = 1, 2, \ldots, n.$$
Hence, it can be easily verified that
$$\ell_{ii} = \Big(a_{ii} - \sum_{k=i+1}^{n}\ell_{ik}^2\Big)^{1/2}, \qquad i = 1, 2, \ldots, n,$$
$$\ell_{ij} = \Big(a_{ij} - \sum_{k=j+1}^{n}\ell_{ik}\ell_{jk}\Big)\Big/\ell_{jj}, \qquad j = i + 1, \ldots, n; \quad i = 1, 2, \ldots, n,$$
and
$$\ell_{ij} = 0, \qquad j = 1, 2, \ldots, i - 1; \quad i = 2, 3, \ldots, n.$$
This gives the upper-triangular matrix $A^u$.
7.6. The new formulation is the same as that studied in this chapter except that every lower triangular matrix with superscript c must be replaced by the corresponding upper triangular matrix with superscript u.
7.7. The new formulation is the same as that given in Section 7.3 except that every lower triangular matrix with superscript c must be replaced by the corresponding upper triangular matrix with superscript u.
Chapter 8
8.1.
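(a) Since $r^2 = x^2 + y^2$, we have
$$\dot{r} = \frac{x}{r}\dot{x} + \frac{y}{r}\dot{y},$$
so that $\dot{r} = v\sin\theta$ and
$$\ddot{r} = \dot{v}\sin\theta + v\dot{\theta}\cos\theta.$$
On the other hand, since $\tan\theta = y/x$, we have $\dot{\theta}\sec^2\theta = (x\dot{y} - \dot{x}y)/x^2$ or
$$\dot{\theta} = \frac{x\dot{y} - \dot{x}y}{x^2\sec^2\theta} = \frac{x\dot{y} - \dot{x}y}{r^2} = \frac{v}{r}\cos\theta,$$
so that
$$\ddot{r} = a\sin\theta + \frac{v^2}{r}\cos^2\theta$$
and
$$\ddot{\theta} = \Big(\frac{\dot{v}r - v\dot{r}}{r^2}\Big)\cos\theta - \frac{v}{r}\dot{\theta}\sin\theta = \Big(\frac{ar - v^2\sin\theta}{r^2}\Big)\cos\theta - \frac{v^2}{r^2}\sin\theta\cos\theta.$$
(b)
$$\dot{\mathbf{x}} = f(\mathbf{x}) := \begin{bmatrix} v\sin\theta \\ a\sin\theta + \frac{v^2}{r}\cos^2\theta \\ \frac{v}{r}\cos\theta \\ (ar - v^2\sin\theta)\cos\theta/r^2 - v^2\sin\theta\cos\theta/r^2 \end{bmatrix}.$$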
(c) xk[l] xk[2]
+ ha
+ hv
sin(xk[3])
sin(xk[3])
+ v2cos2(xk[3])jxk[1]
vcos (x k [3] ) j Xk [1] (axk[l] - v 2sin(xk[3]))cos(xk[3])jxk[l]2 -v 2sin(Xk [3])COS(Xk [3])/xk[1]2
and
$$v_k = [\,1 \;\; 0 \;\; 0 \;\; 0\,]\,x_k + \eta_k,$$
where $x_k := [x_k[1] \;\; x_k[2] \;\; x_k[3] \;\; x_k[4]]^T$.
(d) Use the formulas in (8.8).
8.2. The proof is straightforward.
8.3. The proof is straightforward.
8.4. It can be verified that
Taking the variances of both sides of the modified "observation equation"
$$v_0 - C_0(\theta)E(x_0) = C_0(\theta)x_0 - C_0(\theta)E(x_0) + \eta_0,$$
and using the estimate $(v_0 - C_0(\theta)E(x_0))(v_0 - C_0(\theta)E(x_0))^T$ for $\mathrm{Var}(v_0 - C_0(\theta)E(x_0))$ on the left-hand side, we have
$$(v_0 - C_0(\theta)E(x_0))(v_0 - C_0(\theta)E(x_0))^T = C_0(\theta)\mathrm{Var}(x_0)C_0(\theta)^T + R_0.$$
Hence, (8.13) follows immediately.
8.5. Since
$$E(v_1) = C_1(\theta)A_0(\theta)E(x_0),$$
taking the variances of both sides of the modified "observation equation"
$$v_1 - C_1(\theta)A_0(\theta)E(x_0) = C_1(\theta)(A_0(\theta)x_0 - A_0(\theta)E(x_0) + \Gamma_0(\theta)\xi_0) + \eta_1,$$
and using the estimate $(v_1 - C_1(\theta)A_0(\theta)E(x_0))(v_1 - C_1(\theta)A_0(\theta)E(x_0))^T$ for the variance $\mathrm{Var}(v_1 - C_1(\theta)A_0(\theta)E(x_0))$ on the left-hand side, we have
$$(v_1 - C_1(\theta)A_0(\theta)E(x_0))(v_1 - C_1(\theta)A_0(\theta)E(x_0))^T = C_1(\theta)A_0(\theta)\mathrm{Var}(x_0)A_0^T(\theta)C_1^T(\theta) + C_1(\theta)\Gamma_0(\theta)Q_0\Gamma_0^T(\theta)C_1^T(\theta) + R_1.$$
Then (8.14) follows immediately.
8.6. Use the formulas in (8.8) directly.
8.7. Since $\theta$ is a constant vector, we have $S_k := \mathrm{Var}(\theta) = 0$, so that
$$P_{0,0} = \mathrm{Var}\begin{pmatrix} x_0 \\ \theta \end{pmatrix} = \begin{bmatrix} \mathrm{Var}(x_0) & 0 \\ 0 & 0 \end{bmatrix}.$$
It follows from simple algebra that Pk,k-l
=
[~~]
and
Gk
=
[~]
where * indicates a constant block in the matrix. Hence, the last equation of (8.15) yields ~klk == ~k-llk-l· 8.8. p,
_
0,0 -
[po0
0]
So
where ca is an estimate of Co given by (8.13); that is,
Chapter 9
{
Xk
= -(a + /3 -
:h
= -(a + (3 - 2)Xk-l - (1 - a)Xk-2
2)Xk-l -
(1 -
a)Xk-2
+
aVk
+ (-a +
+ kVk -
(3)Vk-l
kVk-l .
2
9.2. 9.3.
(b) 0 < a < 1 and 0 < /3 < l~a. System (9.11) follows from direct algebraic manipulation.
(a)
=
I-a
(1 - a)h
(1 - a)h2 /2
-/3lh 2 [ -,lh
1 - /3 1 -,Ih
h - /3h12 1 -,/2 -()h2 /2
-()
-()I h
-sa]
-s/3/h -s,lh 2 s(l - ())
(b) det[zI - <1>] = z4 + [(a - 3) + /3 + ,/2 - (() - 1)s]z3 + [(3 - 2a) - /3 +,/2 + (3 - a - /3 -,/2 - 3())s]z2 + [(a - 1) - (3 - 2a - /3 + ,/2 - 3())s]z + (1 - a - ())s. -
zV(z-s)
Xl = det[zI _ Ill] {az
X = 2
+ h'/2 + (3 -
zV(z - l)(z - s) {/3
det[zI _
<1>]
x = zV{zdet[zI - 1)2(z _
2
z
- /3
s) Ih 2
,
+,
2a)z + (,/2 - j1 + }/h
,
,
and
w=
zV(z-1)3 ().
det[zI - 4.>]
an,
Xk = alXk-1 + a2 Xk-2 + a3 x k-3 + a4 Xk-4 + aVk + (-2a - sa + ,8 + , /2)Vk-1 + [a - ,8 +,/2 - (a - ,8 +, /2)SVk-3 , x x alxk-l + a2 k-2 + a3 k-3 a4x k-4 + (,8/h)Vk
+ (2a - ,8 - , /2)S]Vk-2
Xk
=
- [(2 + s),8/h - ,/h]Vk-1 + [,8/h - ,/h + (2,8 - ,)S/h]Vk-2 - [(,8 - ,)S/h]Vk-3, Xk = alxk-l + a2 x k-2 + a3 x k-3 + a4 x k-4 + (,/h)Vk - [(2 + ,),/h2]Vk_1 + (1 + 2S)Vk-2 - SVk-3, Wk = alwk-l + a2 Wk-2 + a3 Wk-3 + a4 Wk-4
+ (,/h 2)(Vk -
3Vk-1
with the initial conditions where
+ 3Vk-2 X-I
al = -a - ,8 -,/2 + (0 - l)s a2 a3
=
Vk-3),
X-I
=
X-I
=
Wo
= 0,
+ 3,
= 2a + {3 - ,/2 + (a + ,8h + ,/2 + 38 - 3)s - 3 , = -a + (-2a - ,8 +,/2 - 38 + 3)s + 1 ,
and
$$a_4 = (\alpha + \theta - 1)s.$$
(d) The verification is straightforward.
9.4. The verifications are tedious but elementary.
9.5. Study (9.19) and (9.20). We must have $\sigma_p, \sigma_v, \sigma_a \ge 0$, $\sigma_m > 0$, and $p > 0$.
9.6. The equations can be obtained by elementary algebraic manipulation.
9.7. Only algebraic manipulation is required.
Chapter 10
10.1. For (1) and (4), let $* \in \{+, \cdot\}$. Then
$$X * Y = \{x * y \mid x \in X,\ y \in Y\} = \{y * x \mid y \in Y,\ x \in X\} = Y * X.$$
The others can be verified in a similar manner. As to part (c) of (7), without loss of generality, we may only consider the situation where both $\underline{x} \ge 0$ and $\underline{y} \ge 0$ in $X = [\underline{x}, \overline{x}]$ and $Y = [\underline{y}, \overline{y}]$, and then discuss different cases of $\underline{z} \ge 0$, $\overline{z} \le 0$, and $\underline{z}\,\overline{z} < 0$.
10.2. It is straightforward to verify all the formulas by definition. For instance, for part (j.1), we have
A1(BC) = [~AI(i,j) [~BjlClk]]
~ [~~AI(i,j)BjlClk] [t [tA1(i,j)Bjl] Clk]
=
l=l
J=l
= (A1B)C.
10.3. See: Alefeld, G. and Herzberger, J. (1983).
10.4. Similar to Exercise 1.10.
10.5. Observe that the filtering results for a boundary system and any of its neighboring systems will be inter-crossing from time to time.
10.6. See: Siouris, G., Chen, G. and Wang, J. (1997).
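Chapter 11
11.1.
$$\phi_3(t) = \begin{cases} \frac{1}{2}t^2 & 0 \le t < 1 \\ -t^2 + 3t - \frac{3}{2} & 1 \le t < 2 \\ \frac{1}{2}t^2 - 3t + \frac{9}{2} & 2 \le t < 3 \\ 0 & \text{otherwise.} \end{cases}$$
$$\phi_4(t) = \begin{cases} \frac{1}{6}t^3 & 0 \le t < 1 \\ -\frac{1}{2}t^3 + 2t^2 - 2t + \frac{2}{3} & 1 \le t < 2 \\ \frac{1}{2}t^3 - 4t^2 + 10t - \frac{22}{3} & 2 \le t < 3 \\ -\frac{1}{6}t^3 + 2t^2 - 8t + \frac{32}{3} & 3 \le t < 4 \\ 0 & \text{otherwise.} \end{cases}$$
11.2.
$$\hat{\phi}_n(\omega) = \Big(\frac{1 - e^{-i\omega}}{i\omega}\Big)^n = e^{-in\omega/2}\Big(\frac{\sin(\omega/2)}{\omega/2}\Big)^n.$$
11.3. Simple graphs.
11.4. Straightforward algebraic operations.
11.5. Straightforward algebraic operations.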
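The piecewise formulas in 11.1 and the transform in 11.2 can be cross-checked numerically. The Python sketch below assumes that 11.1 describes the quadratic and cubic cardinal B-splines supported on $[0, 3]$ and $[0, 4]$, and that the transform convention is $\int \phi(t)e^{-i\omega t}\,dt$; the function names and the midpoint-rule integrator are illustrative choices, not taken from the text.

```python
import cmath
import math


def quadratic_bspline(t):
    """Quadratic cardinal B-spline on [0, 3), zero elsewhere."""
    if 0 <= t < 1:
        return 0.5 * t * t
    if 1 <= t < 2:
        return -t * t + 3 * t - 1.5
    if 2 <= t < 3:
        return 0.5 * t * t - 3 * t + 4.5
    return 0.0


def cubic_bspline(t):
    """Cubic cardinal B-spline on [0, 4), zero elsewhere."""
    if 0 <= t < 1:
        return t ** 3 / 6
    if 1 <= t < 2:
        return -0.5 * t ** 3 + 2 * t * t - 2 * t + 2 / 3
    if 2 <= t < 3:
        return 0.5 * t ** 3 - 4 * t * t + 10 * t - 22 / 3
    if 3 <= t < 4:
        return -t ** 3 / 6 + 2 * t * t - 8 * t + 32 / 3
    return 0.0


def closed_form_transform(n, w):
    """e^{-i n w / 2} (sin(w/2) / (w/2))^n for w != 0."""
    half = 0.5 * w
    return cmath.exp(-1j * n * half) * (math.sin(half) / half) ** n


def numeric_transform(f, support, w, steps=4000):
    """Midpoint-rule approximation of the integral of f(t) e^{-i w t} over [0, support]."""
    h = support / steps
    return h * sum(f((k + 0.5) * h) * cmath.exp(-1j * w * (k + 0.5) * h)
                   for k in range(steps))


if __name__ == "__main__":
    w = 1.3
    print(numeric_transform(quadratic_bspline, 3.0, w), closed_form_transform(3, w))
    print(numeric_transform(cubic_bspline, 4.0, w), closed_form_transform(4, w))
```

A second quick check is the partition of unity: for any $t$ in $[0, 1)$, the integer shifts of each spline that cover $t$ should sum to 1, which catches most sign or coefficient slips in the individual pieces.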
Subject Index
covariance
adaptive Kalman filtering 182 noise-adaptive filter 183 adaptive system identification 113,115 affine model 49 a - {3 tracker 140 a - {3 - , tracker 136 a - {3 - , - () tracker 141,180
13
Cramer's rule
132
decoupling formulas 131 decoupling of filtering
algorithm for real-time
equation 131 Descartes rule of signs 140 determinant preliminaries 1 deterministic input sequence 20
application 105 angular displacement 111,129 ARMA (autoregressive moving-average) process 31
digital digital digital digital
ARMAX (autoregressive moving-average model with exogenous inputs)
66
attracting point 111 augmented matrix 189 augmented system 76 azimuthal angular error 47 Bayes formula
12
Cholesky factorization 103 colored noise (sequence or process) 67,76,141 conditional probability 12 controllability matrix 85 controllable linear system 85 correlated system and measurement noise processes
49
filtering process 23 prediction process 23 smoothing estimate 178 smoothing process 23
elevational angular error 47 estimate 16 least-squares optimal estimate 17 linear estimate 17 minimum trace variance estimate 52 minimum variance estimate 17,37,50 optimal estimate 17 optimal estimate operator 53 unbiased estimate 17,50 event 8 simple event 8 expectation 9 conditional expectation 14
extended Kalman filter FIR system
108,110,115
184
Gaussian white noise sequence geometric convergence 88 IIR system 185 independent random variables innovations sequence 35 inverse z- transform 133
15,117
14
LU decomposition 188 marginal probability density function 10 matrix inversion lemma 3 matrix Riccati equation 79,94,132,134 matrix Schwarz inequality 2,17 minimum variance estimate 17,37,50 modified extended Kalman filter 118 moment 10
Jordan canonical form 5,7 joint probability distribution (function) 10
nonlinear model (system) 108 non-negative definite matrix 1 normal distribution 9 normal white noise sequence 15
Kalman filter 20,23,33 extended Kalman filter 108,110,115 interval Kalman filter 154
observability matrix 79 observable linear system 79 optimal estimate 17
limiting Kalman filter 77,78 modified extended Kalman filter 118 steady-state Kalman filter 77,136 wavelet Kalman filter 164 Kalman-Bucy filter 185 Kalman filtering equation (algorithm, or process) 23,27,28,38,42,57,64,72-74,76,108 Kalman gain matrix 23 Kalman smoother 178 least-squares preliminaries 15 limiting (or steady-state) Kalman filter 78 limiting Kalman gain matrix 78 linear deterministic/ stochastic system 20,42,63,143,185 linear regulator problem 186 linear state-space (stochastic) system 21,33,67,78,182,187
asymptotically optimal estimate 90 optimal estimate operator 53 least-squares optimal estimate 17 optimal prediction 23,184 optimal weight matrix 16 optimality criterion 21 outcome 8 parallel processing 189 parameter identification 115 adaptive parameter identification algorithm 116 positive definite matrix 1 positive square-root matrix 16 prediction-correction 23,25,31,39,78 probability preliminaries 8 probability density function conditional probability density function 12
8
joint probability density function 11 Gaussian (or normal) probability density function 9,11 probability distribution 8,10 function 8 joint probability distribution (function) 10
square-root algorithm 97,103 square-root matrix 16,103 steady-state (or limiting) Kalman filter 78 stochastic optimal control 186 suboptimal filter 136 systolic array 188 implementation 188
radar tracking model
Taylor approximation trace 5
(or system) 46,47,61,181 random sequence 15 random signal 170 random variable 8 independent random variables uncorrelated random variables
variance 10 conditional variance
14
10
random vector
range 47,111 real-time application
61,73,93,105 real-time estimation/decomposition 170 real-time tracking 42,73,93,134,139 sample space
47,122
uncorrelated random variables
13 13
8
satellite orbit estimation 111 Schur complement technique 189 Schwarz inequality 2 matrix Schwarz inequality 2,17 vector Schwarz inequality 2 separation principle 187 sequential algorithm 97
wavelets 164 weight matrix
15
optimal weight matrix 16 white noise sequence (process)
15,21,130 Gaussian (or normal) white noise sequence 15,130 zero-mean Gaussian white noise sequence 21 Wiener filter 184 z-transform 132 inverse z-transform
133
13