APPROXIMATE KALMAN FILTERING
APPROXIMATIONS AND DECOMPOSITIONS Editor-in-Chief: CHARLES K. CHUI
Vol. 1: Wavelets: An Elementary Treatment in Theory and Applications Tom H. Koornwinder, ed. Vol. 2: Approximate Kalman Filtering Guanrong Chen, ed.
Series in Approximations and Decompositions - Vol. 2
APPROXIMATE KALMAN FILTERING edited by
Guanrong Chen University of Houston
World Scientific Singapore- New Jersey- London- Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd. P O B o x 128, Farrer Road, Singapore 9128 USA office: Suite IB, 1060 Main Street, River Edge, NJ 07661 UK office: 73 Lynton Mead, Tottendge, London N20 8DH
Library of Congress Cataloging-in-Publication Data Approximate Kalman filtering / edited by Guanrong Chen. p. cm. — (Series in approximations and decompositions ; vol. 2) Includes index. ISBN 981021359X 1. Kalman filtering. 2. Approximation theory. I. Chen, Guanrong. II. Series. QA402.3.A67 1994 O03'.76'0115-dc20 93-23176 CIP
Copyright © 1993 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form orby any means, electronic or mechanical, including photocopying, recording orany information storage and retrieval system now known or to be invented, without written permission from the Publisher. For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 27 Congress Street, Salem, MA 01970, USA.
Printed in Singapore by Utopia Press.
Approximations and Decompositions During the past decade, Approximation Theory has reached out to encompass the approximation-theoretic and computational aspects of several exciting areas in applied mathematics such as wavelets, fractals, neural networks, and computer-aidedgeometric design, as well as the modern mathematical development in science and technology. The objective of this book series is to capture this exciting development in the form of monographs, lecture notes, reprint volumes, text books, edited review volumes, and conference proceedings. Approximate Kaiman Filtering, the second volume of this series, represents one of the engineering aspects of Approximation Theory. This is an important subject devoted to the study of efficient algorithms for solving many of the realworld problems when the classical Kaiman filter does not directly apply. The series editor would like to congratulate Professor Guanrong Chen for his excellent job in editing this volume and is grateful to the authors for their fine contributions.
World Scientific Series in A P P R O X I M A T I O N S A N D DECOMPOSITIONS Editor-in-Chief: CHARLES K. CHUI Texas A&M University , College Stollon, Texas
This page is intentionally left blank.
Preface Kalman Filtering: from "exact" to "approximate" niters As has been in the last three decades and still is today, the term Kalman filter evokes favorable responses and applause from engineers, scientists, and mathematicians, both researchers and practitioners alike. The history of the development of the Kalman filter, or more precisely, the Kalman filtering algorithm, has been fairly long. Ever since the fundamental con cept of least-squares for signal estimation was introduced by Gauss at the age of eighteen in 1795, first published by Legendre in his book Nouvelles methodes pour la determination des orbites des cometes in 1806, and later also appeared in Gauss' book Theoria Motus Corporum Coelestium in 1809, no significant improvement was achieved in the next hundred years — not until 1912 when R. A. Fisher pub lished the celebrated maximum likelihood method, which was already anticipated by Gauss but unfortunately rejected also by Gauss himself much earlier. Then, a little later, Kolmogorov in 1941 and Wiener in 1942 independently developed the im portant fundamental estimation theory of linear minimum mean-square estimation technique. All these together, as well as the strong motivation from astronomi cal studies and the exciting stimulus from computational mathematics, provide the necessary and sufficient background for the subsequent development of the Kalman filtering algorithm, a mile-stone for the modern systems theory and technology. The Kalman filter, mainly attributed to R. E. Kalman (1960), may be consid ered in very general terms as an efficient computational algorithm for the discretetime linear least-squares estimation method of Gauss-Kolmogorov-Wiener, which was extended to the continuous-time setting by Kalman himself and to more gen erality by Bucy about a year later. To briefly describe the discrete-time Kalman filtering algorithm, we consider a stochastic state-space system of the form
Jx f c + i = = ,4fcXfc+| Ak-x.k+ifck, \
vVfc CfcXfc 7^ , fc = CfcX fc + r^
fc A: == 0, 0, I! ,,- '■-■,,
where {xt} is the sequence of state vectors of the system, with an initial state vector xo, {vfc} the sequence of measurement (or observation) data, {£ } and {n }
Vll
Preface
Vlll
two noise sequences, and {Ak} and {Ck} two sequences of the time-varying system and measurement matrices. In this linear state-space system, the first and second equations are usually called the dynamic and measurement equations, respectively. The problem is to find and calculate, for each k = 0,1, ■ ■ •, the optimal estimate xjt of the unknown state vector x^ of the system, using the available measurement data {vi, V2, ■ ■ ■, vjt} under the criterion that the estimation error covariance Cov(xLk - Xfc) = min, over all possible linear and unbiased estimators which use the aforementioned avail able measurement data, where the linearity is in terms of the data and the unbias is in the sense that E{xk} — E{xk}- It turns out that optimal solutions exist under, for example, the following additional conditions: (1) The initial state vector is Gaussian with mean E{x.o} and covariance Cou(xo) both being given; and (2) the two noise sequences {£,} and {rj. } are stationary, mutually indepen dent Gaussian, and mutually independent of xo, with known covariances Cov(£k) = Qk and CW(7?fc) = Rk, respectively. In addition, the optimal solutions x/t can be calculated recursively by
jf xxo = £{x £ { x00}} ,, 0 = \ xfc = ylfc_ix ylfc_ixfcfc_i + Gk(vk - A Ak-ix-k-i), k..i-xk_i),
k = 1,2, • • • ,
where Gk are the Kalman gains, successively calculated by •Po.o = Cou(xo), Pk,k-i = Pfc.fc-l = Ak-iPk-i.k-iAj^ Afc-lFn^-lA^! + + Q Qffcc __!, ! , Gk = = Pk Pk,k-\C (CkPk,k-iC Gfc (CfcPfc,fc-iCk fc + + Rk)~ ftfc)~ , yk-iCk Pk,k = (/ Pfc.fc
G GkkC Ckk)Pk,k-i, )Pk,k-i,
Cov(x f c— - J 4Ak-\x.k-i), in which Pk,k-i is actually the prediction error covariance Cov(xk = f c _ix f c _i), kk = 0,1,-, The entire set of these formulations comprises what we have been calling the Kalman filter or the Kalman filtering algorithm. This algorithm can be derived via several different methods. For more information about this recursive estimation algorithm, for example its detailed derivations, statistical and geometrical interpre tations, relations with other estimation techniques, and broad-range of applications, the reader is referred to the existing texts listed in the section entitled Further Read ing in the book. A few remarkable advantageous features of the above recursive computational scheme can easily be observed. First, starting with the initial estimate xo = E{XQ},
Preface
IX
each optimal estimate x/t, obtained in the consequent calculations, only requires the previous (one-step-behind) estimate and a single bit (current) datum Vfc. The essential advantage of such a simple "step-by-step" structure of the computational scheme is that there is no need to keep all the old state estimates and measurement data for each up-dating state estimate, and this saves huge computer memories and CPU times, especially in real-time (on-line) applications of very large scale (higher dimensional) systems with intensive measurement data. Second, all the recursive formulas of the algorithm are straightforward and linear, consisting of only matrix multiplications and additions, and a single matrix inversion in the calculation of the Kalman gain. This special structure makes the scheme feasible for parallel implementation using advanced computer architectures such as systolic arrays to further speed up its computation. Moreover, the Kalman filtering algorithm is the optimal estimator over all possible linear ones under the aforementioned conditions, and gives unbiased estimates of the unknown state vectors — although this is not the unique characteristic monopolized by the Kalman filter. The Kalman filter is ideal in an ideal world. This is merely to say that the Kalman filter is "perfect" if the real world is ideal — offering linear mathematical models for describing physical phenomena, providing accurate initial conditions for the model established, guaranteeing the exact models and their accurate parameters would not be disturbed or changed throughout the process, and creating Gaussian white noise (if there should be any) with complete information about its means and variances, etc. Unfortunately, nothing is ideal in the real world, and this makes the ideal Kalman filter often impractical. As a result, various modified versions of the standard Kalman filter, called approximate Kalman filters, become undoubtedly necessary. To be more specific, recall that the standard (ideal) Kalman filter requires the following conditions: the dynamic and measurement equations of the system both have to be linear; all the system parameters (matrices) must be given exactly and fixed without any perturbation (uncertainty); the mean £{xo} and covariance Cew(xo) of the Gaussian initial state vector need to be specified; and the two noise sequences, {£ } and {rh } , are both stationary, mutually independent, Gaussian, and mutually independent of xo, with known covariances Cov(£ ) = Qk and Cov(r] ) = Rk, respectively. If any of these conditions is not fulfilled, the Kalman filtering algorithm is not efficient: it will not give optimal, often not even satisfactory, estimation results. In most applications, the following questions are raised: (1) "What if the state-space system is nonlinear?" (2) "What if the initial conditions are unknown, or only partially given?" (3) "What if the noise statistics (means and/or covariances) are unknown, or only partially given, or changing?" (4) "What if the noise sequences are not Gaussian?" (5) "What if the system parameters (matrices) involve uncertainties?" (6) "What if • ■ • ?"
x
Preface
The objective of this book is to help answering at least some of these questions. However, it is appropriate to remark that in a very modest size of this tutorial vol ume, neither do we (actually, never can we) intend to cover too many interesting topics, nor does the reader can expect to be able to get into very deep insights of the issues that we have chosen to discuss. The motivation for the authors of the chapters to present their overview and commentary articles in this book is basically to pro mote more effort and endeavor devoted to the stimulating and promising research direction of approximate Kalman filtering theories and their real-time applications. We would like to mention that a good complementary volume is the collection of research papers on the standard (ideal) Kalman filter and its applications, entitled Kalman Filtering: Theory and Application, edited by H. W. Sorenson and published by IEEE Press in 1985. The first topic in this tutorial volume is the extended Kalman filter. As has been widely experienced, a mathematical model to describe a physical system can rarely be linear, so that the standard Kalman filtering algorithm cannot be applied directly to yield optimal estimations. In the first article, Bullock and Moorman give an introduction to an extension of the linear Kalman filtering theory to estimating the unknown states of a nonlinear system, possibly forced by additive white noise, using measurement data which are the values of certain nonlinear functions of the state vectors corrupted by additive Gaussian white noise. In the second part of their article, some possible choices for linearization are discussed, leading to the standard and ideal extended Kalman filters, whereas some modified extended Kalman filters are also described. Then, in part three of the article, Moorman and Bullock study how to use the a priori state estimate sequence to perform linearization and analyze the bias occurred in this modified extended Kalman filter. The second issue which we are concerned with in this book is the initialization of the Kalman filter. If the initial conditions, namely, the mean E{X.Q} and covariance Ccw(xo) of the initial state vector, are unknown or only partially given, the Kalman filtering algorithm cannot even start operating. Catlin offers an introduction to the classical Fisher estimation scheme, used to initialize the Kalman filter when prior information is completely unknown, and then further extends it to the more general case where the measurements may be ill-conditioned, making no assumption on the invertibility of any matrix involved in the estimation process. In the joint article of Gomez and Maravall, several successful approaches to initializing the Kalman filter with incomplete specified initial conditions, which work well even for nonstationary time series, are reviewed. In particular, they describe a simple solution, based on a trivial extension of the Kalman filter, to the problem of optimal estimation, forecasting, and interpolation for a general class of linear systems. Next, the adaptive Kalman filters are discussed. Basically, adaptive Kalman filters are referred to as those modified Kalman filters that can adapt either to un known noise statistics or to changing system parameters (or changing noise statis tics). Under the assumption that all the noises are Gaussian, although with un-
Preface
xi
known statistics, several efficient methods for providing adaptation capability to the Kalman filter are reviewed in the article of Moghaddamjoo and Kirlin. The adapta tion of the filters to unknown deterministic inputs are also discussed. A technique for the Kalman filter to be able to adapt to changing noise statistics is then de scribed in some details, yielding a stable and robust estimation process even under certain irregular environment. In Wojcik's article, problems in design and applica tions of adaptive Kalman filters for on-line estimation of signal and noise statistics, including noise reduction and the best possible signal restoration, are discussed. An overview on practical issues in solving these problems is given for both stationary and nonstationary linear systems. Several adaptive filtering schemes are compared in terms of speed and efficiency. Almost all the noise sequences in practical systems are non-Gaussian, but many of them can be approximated well by a finite sum of Gaussian noises, called a Gaus sian sum for short, with different means and covariances. For this Gaussian sum case, which is non-Gaussian overall, fairly rigorous mathematical analysis can be carried out to yield optimal or suboptimal Kalman filtering algorithms. In the article of Wu and Chen, this topic is investigated in some details. A historical overview, describing several representative approaches, is first given. Then, un der different assumptions on the Gaussian sums of the dynamic and/or measure ment noise sequences, several optimal (modified and generalized) Kalman filtering schemes obtained under the standard minimum mean-square error (MMSE) esti mation criterion are presented. To handle modeling errors and uncertainties, set-valued models are often pre ferred. This issue is considered by Morrell and Stirling for systems with Gaussian noise, and also by Hong in case that the noise sequences are non-Gaussian. The set-valued Kalman filter is reviewed in the article of Morrell and Stirling. The setvalued Kalman filtering theory is developed under the Gaussian noise assumption, and hence generalizes the standard point-valued Kalman filtering algorithm to the case where the noise sequences can have Gaussian densities defined by common means and covariances in a prescribed convex set of Gaussian density functions. The non-Gaussian case with set models is studied in Hong's article, where the only information needed is the sets with confidence values from which the modeling and measurement errors and the initial conditions are obtained. This approach is con sidered to be a generalization of some existing approaches based on the commonly used "unknown-but-bounded" assumption on the noise. Robust stability of the Kalman filter under parametric and noise uncertainties is analyzed by Chen and Peng. A realistic sufficient condition under which the Kalman filter works satisfactorily with robust stability in the presence of both parametric and noise uncertainties, is derived. This gives a guideline to the Kalman filter user for guaranteeing the stability of the estimation process under imperfect modeling conditions. The final article of the book, written by Kerr, is devoted to an investigation of
xii
Preface
several numerical approximation issues in implementation of the Kalman filter. As has been observed in practice, getting incorrect results at the output of a Kalman filter simulation or hardware implementation can be blamed on the use of faulty ap proximations applied in the implementation, faulty coding/computer programming, or incorrect theoretical details, etc. A thorough discussion on how these problems are handled is presented in the article. The author provides his original unique approach to handling these problems. The techniques espoused therein are consid ered to be universal, and are independent of the constructs of particular computer languages, and have been used in cross-checking Kalman filter implementations in several diverse commercial and military applications. If the survey articles presented in this book can serve as an overview of the state-of-the-art research for Kalman filters that are not "exact" but only "approx imate" under irregular environments; if the reader can benefit by some new ideas, techniques, and insights from individual articles of the book; and if new research in this challenging, yet promising, area can be further stimulated and motivated, then the goal of the present authors, who have contributed their best effort to make the publication of the book possible, will be achieved. In the preparation of this book, the editor has received enthusiastic assistance from several individuals. First, the editor would like to express his gratitude to Charles Chui, the series editor, for his continuous encouragement and support. Sec ond, he is very grateful to Margaret Chui for her assistance in the editorial work. In addition, the editor would like to thank William Bell, George Siouris, Harold Sorenson, and John Woods, for their interest in this project. To his wife, Helen Qiyun Xian, he would like to thank her for her understanding and support. Fi nally, the editor would like to acknowledge the financial support from both the President's Research and Scholarship Fund and the Institute of Space Systems Op erations grants at the University of Houston, and from the McDonnell Douglas Space Systems Company research grants.
Houston Spring, 1993
Guanrong Chen
Contents Preface
vii
I. Extended Kalman Filtering for Nonlinear Systems Extended Kalman Filters 1: Continuous and Discrete Linearizations T. E. Bullock and M. J. Moorman
. . 3
Extended Kalman Filters 2: Standard, Modified and Ideal T. E. Bullock and M. J. Moorman . . .
9
Extended Kalman Filters 3: A Mathematical Analysis of Bias M. J. Moorman and T. E. Bullock . . .
.
.
II. Initialization of Kalman Filtering Fisher Initialization in the Presence of Ill-Conditioned Measurements D. Catlin Initializing the Kalman Filter with Incompletely Specified Initial Conditions V. Gomez and A. Maravall
.
III. Adaptive Kalman Filtering in Irregular Environments Robust Adaptive Kalman Filtering A. R. Moghaddamjoo and R. L. Kirlin
15
23
39
65
On-line Estimation of Signal and Noise Parameters and the Adaptive Kalman Filtering P. J. Wojcik . . .
.87
Suboptimal Kalman Filtering for Linear Systems with Non-Gaussian Noise H. Wu and G. Chen .
113
xiii
xiv
Contents
IV. Set-valued and Distributed Kalman Filtering Set-valued Kalman Filtering D. Morrell and W. C. Stirling .
139
Distributed Filtering Using Set Models for Systems with Non-Gaussian Noise L. Hong . .
161
V. Stability Analysis and Numerical Approximation of Kalman Filtering Robust Stability Analysis of Kalman Filter under Parametric and Noise Uncertainties B. S. Chen and S. C. Peng
.
Numerical Approximations and Other Structural Issues in Practical Implementations of Kalman Filtering T. H. Ken Further Reading
.
. . .
179
193
. 2 2 1
Notation
223
Subject Index
225
Extended Kalman Filters 1: Continuous and Discrete Linearizations T. E. Bullock and M. J. Moorman
Abstract. The use of a linearizing trajectory to apply the linear Kalman Filter equations to nonlinear estimation problems is introduced. Some possible choices for this linearizing trajectory are examined leading to the standard and ideal Extended Kalman Filters. Some other useful modifications to the Extended Kalman Filter are also presented and discussed.
§1 Introduction This article will introduce the extension of linear Kalman Filter theory to the problem of estimating the trajectory of a nonlinear dynamical system, possibly forced by additive white noise, from measurements of nonlinear functions of the system states corrupted by additive white Gaussian noise. A nominal trajectory is used to linearize the nonlinear functions and obtain perturbation equations to which the standard linear method may be applied. The equations for the linearized filter are presented and the choice of the linearizing trajectory is discussed. This leads to the definition of the standard and ideal extended Kalman filters. Some suggested modifications to the standard filter are discussed and their impact on estimator performance is addressed. §2 Derivation of the continuous time linearized equations Consider the nonlinear continuous system with additive white noise disturbance | x ( t ) = a(x(r), * ) + £ « ) , Approximate Kalman Filtering Guanrong Chen (ed.), pp. 3-8. Copyright ©1993 by World Scientific Publshing Co. Inc All rights of reproduction in any form reserved. ISBN 981-02-1359-X
(1) 3
4
T. Bullock and M. Moorman
where E{£(t)}=0, = 0, TT
E{i(t)i (r)} E{£(t)i (T)} Q(t)6(t-r), 6( « } = Q{t)Q(t)6(t-T), with initial conditions xo = x(to) which is a vector valued random variable with known mean £{xo} and covariance Cov (xo,xo). Discrete nonlinear measurements are taken at times tk vfc = c(x(tO, * * ) + & .
(2)
where E{vk} = 0,
E{rkUJ)=Rk6kl. In order to apply linear Kalman filter theory we can linearize the above equations along some "nominal" or "reference" trajectory, x((), which satisfies the dynamical equation ^ ( ( ) = a(x((),t)
(3)
with initial condition x(i 0 ) = £ { x 0 } . Although x(i) is a known deterministic trajectory here, this will be changed later as we investigate various candidates for x(i). In the case of the standard Extended Kalman Filter x(t) is in fact a stochastic process and does not satisfy (3). Subtracting (3) from (1) gives - x ( t ) - -x(t)(t) = a(x(t),t) - a(x(i),i) + ! ( i ) .
(4)
Define the deviation of the nominal trajectory from the actual as x(t) = x(i) - x ( i ) .
(5)
Expanding a(x(£),£) to first order around the nominal trajectory leads to a(x(t),t) « a(x(t),t) + A(t) x ( t ) , where J4(£) is defined as
A(t) -
(6)
<9a(x, t) 59 x
^
x=x(t) x=*(0
Substitution of (6) and the derivative of (5) into (4) gives
A(t)x(t)+£(*), with a random initial condition vector x(to) with mean £{x(< 0 )} = £{x(i 0 )} - x(t 0 ) = 0 and covariance <7tw(x(to),x(to)) = Coy(xo,x 0 ).
(7)
5
Extended Kalman Filters 1 §3 The discrete time linearized equations
In order to process the discrete time measurement sequence, one approach is to derive an approximate linear discrete equation of the form x fc+1 = Ak xk + fj,
(8)
with the initial condition x 0 = x(£0) • The discrete model (8) is obtained, so that Xfc « X(tfc) ,
in the sense that, to first order in tk+i—tk, = E E{x(tk)}
E{xk} and
Cov{xk,xk) = Cov{x(tk),x(tk)). For a random sample input £(£), (7) can be solved from tk to tk+i to obtain rtk+i
x(t f c + i) = *(tfc+i,
(9)
Jtk
where <£(i, r) is the state transition matrix corresponding to the linear system matrix A(t). Taking the mean gives, since £(r) is assumed to be zero mean, E{x(tk+1)} )} = *(i f c + i,t f c ) E- E{x(tk)}, so that the choice
Ak =
$(tk+utk)
and
mk}=o makes E{x(tk+i)} agree with E{xk+i}. From (8) taking Cov(xk+i,xk+i) leads to COT(xfc+i,xfc+1) = AkCov{xk,xk)Al
A[ + Cov(§_k,^_k).
(10)
From (9), Ccw(x(i f c + i),x(t f c + i)) = AkCov{x{tk),x{tk))&))• Al CJt + 1
+ /t+1*(ifc+1,T)Q(r)$T(£fc+1,T)dr £t
(11)
T. Bullock and M. Moorman
6 Comparison of (10) and (11) shows that
Cov(x(tk+i),i(tk+i)) Cav(xk+i,xk+i) tk+l)) = Cov( if Cov
0k>&. * * > = / t + 1 ^ f c + i , r ) Q ( r ) $ T ( t f c + 1 , r ) d r Jtk
:=
Qh.
The measurement equation can likewise be linearized by defining a nominal measurement sequence, vfc = c ( x f c j t f c )
and V = Vfc - Vfc .
We then have Vfc RS CfcXfc + 7^ ,
where Ck
dc(x, t) dx
(12) x=x(tt)
Thus, we have obtained a discrete linear system Xfc+i = AkXk+^h,
(13)
Vfc = CfcXfc + T[k,
(14)
to which we can apply the linear Kalman filter equations to obtain (approximately) a minimum variance estimate of x(t) at the sample points which, since x(t) is known, yields an estimate of x(tfc). Or equivalently we use the equations - x ( t ) = a(x(«),i), X.(tk) = Xfc,fc ,
Xfc+1>fc = x(t f c + 1 ), Pk+i,k = AkPk,kAj
(15)
+ Qk,
Gk = Pfc,fc-iCfcT [CkP^k-iC^
(16)
+ Rk]-1,
Xfc.fc = Xfcifc_1 + Gfc(vfc-c(xfc,tfc)-Cfc(xfcifc_1-Xfc)), -Pfc.fc = -Pfc.fc-i - GkCkPk,k-i = (I-GkCk) Pk,k-i {I-GkCk)T
+ GkRkGl,
(17) (18)
(19)
7
Extended Kalman Filters 1
where Xfc+i,fc is the a priori estimate of xt+i and Pk+i,k is the corresponding error covariance and Xfc^ is the a posteriori estimate of Xfc and Pktk is the error covariance. Although we have not explicitly shown the dependence of all the linearized terms on x*,, it should be recalled that Ak and Gk axe functions of the nominal trajectory. Note that there are two equivalent equations for calculating the o posteriori covariance Pk,k- The first is the standard form while the second is referred to as the Joseph form and can be shown to be more computationally stable [1] (though more complex). These equations are called the linearized Kalman Filter equations with nominal trajectory x(i). Equation (20) shown in the next article must be integrated by some method to obtain the a priori estimate Xfc+i^ from the previous a posteriori estimate x^fc and the nominal trajectory 5t(t) on (tk, tk+i)- The linearized state equations which were used for (16) represent an approximation in updating the estimate and covariance. For this reason another computational scheme might be suggested. One method to obtain the a priori covariance would be to numerically integrate Kolmogorov's Forward equation (also called the Fokker-Planck equation) [2] associated with the stochastic differential equation (1). As it is, (16) is the solution of the linear variance equation applied to the linearized system (8). The choice of computational scheme is based on available processing capability and engineering judgment as to whether integration of (20) is sufficient and whether a more exact integration of (16) is required. We are left with two questions: What nominal trajectory do we select and how valid is the linearization? The choice of nominal trajectory will be addressed in the rest of this article and the validity of the linearization procedure has been addressed in many texts and articles. In addition, the possibility of a bias being introduced into the state estimate due to the linearization used in the standard Extended Kalman Filter is addressed in the last section of this chapter.
References 1. Gelb, A. et ai, Applied Optimal Estimation, M.I.T. Press, Cambridge, MA, 1974. 2. Jazwinski, A. H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.
8
T. E. Bullock Department of Electrical Engineering University of Florida Gainesville, FL 36211 bullock@eel .ee.ufl.edu M. J. Moorman WL/MNGA, Eglin AFB Eglin, FL 32542
[email protected]
T. Bullock and M. Moorman
Extended Kalman Filters 2: Standard, Modified and Ideal T. E. Bullock and M. J. Moorman
§1 Introduction If the nominal trajectory, defined by x(t) = a(x(i),i), x(t 0 ) = £{x(t 0 )} is used to linearize the state and measurement equations, we can see that the linearization will be "good" only if the neglected higher order terms are small. Otherwise the nominal trajectory may eventually bear little resemblance to the actual trajectory, and we would expect poor performance to result. This approach does not take advantage of the fact that we are making measurements of the system and are (presumably) producing better and better estimates of the state. One obvious choice for use in the linearization at sample time k is the a priori state estimate, i.k,k-i- Using xkik-\, we obtain the equations for the Extended Kalman Filter
Ttx{t)
x(t fe )
Xfc+l,fc
a(x(i),t), Xfc.fc,
*(tfc+l),
Pk,k-i =4fc-i(xfc l fc- 1 )ft-i,fc-ii4j_ 1 (xfc i fc_i) Gk = Pk>k-i Cj(x fci fc_ 1 ) [C f c (x M _!) Pfc,fc_! Cj(xk
(20) + Q fc)
(21) l
+ Rk]~ ,
(22) 9
T. Bullock and M. Moorman
10
Xfc.fc = x fc , fc _i + Gfc(vfc - c(xfc,fc_i)),
Pk,k = [I-Gk (I-GkC(xklk-i))Pk,k-i = (I -G fc C(x fc , fc _ 1 ))P fc , fc _ 1 (7 - G f c C ( * k J k _ i ) ) T + GkRkGTk
(23)
,
(24)
where now Ak = $(tk+i,tk) is the state transition matrix corresponding to the system linearized about the estimated trajectory defined by (20). We have now explicitly shown that Ak and Ck are functions of the a priori state estimate x ^ - i and the a posteriori estimate Xfc^ rather than the nominal trajectory Xfc. We have effectively used a recursive stochastic linearization to apply the linear Kalman filter equations to the nonlinear problem. The extended Kalman filter has been studied and used in far too many books and articles to reference here. §2 The ideal extended Kalman filter The Extended Kalman Filter presented in the previous section is a practical appli cation of approximate Kalman Filtering to nonlinear estimation problems. Another filter can be proposed in which the nonlinear functions are linearized about the actual state trajectory. That is, we use the same equations as for the Extended Kalman filter but make the substitutions C/b(xfc,fc_i) «- Cfc(xfc) =
Ak «-
A(x(t),t)
dc(x) <9x
x=xfc
da(x) dx x=x(t)
assuming (a situation that only occurs in simulation studies) the actual state tra jectory x(t) is known makes the linearizations and, therefore, the gain sequence, Gjt, and the covariance matrix sequences, Pk,k—t and Pk,k, deterministic. We refer to this filter as the Ideal Extended Kalman Filter (IEKF). It has been shown [3,7] that for a deterministic system (e.p., Qk = 0 for all k) the covariance matrix of the IEKF, call it P£, is the Cramer-Rao lower bound for the estimation problem. That is, for any other unbiased estimator with covariance matrix Pk, we have Pk — Pj^ is positive definite. Having the Cramer-Rao lower bound for an estimation problem gives a benchmark for comparison with practical filters and if the EKF is being evaluated it is a simple and potentially very valuable to apply the IEKF to determine the best possible performance that can be obtained.
Extended Kalman Filters 2
11
Note that we treat x(t) as being deterministic even though, as we have defined it, it is not. For x(i) to be deterministic we would require that Cov(xo,xo) = 0. In this case we would no longer have an estimation problem. This obstacle can be removed by considering xo to be deterministic while the initial condition xo (which has been heretofore assumed to be deterministic) is considered to be the random variable. This retains the same definition of Po,o as Po,o = -E{(x0 - xo)(x 0 - x 0 ) T } . The same equations still apply though now in a less exact sense. Because the EKF is already an approximation this is not an unreasonable assertion. §3 The modified gain extended Kalman filter It has been pointed out that the linearization of a nonlinear measurement function (we will now only refer here to the measurement function) is perfectly valid only at the point of linearization. It has been proposed [4] that rather than approximating the total variation of a nonlinear measurement function at time tk by its first variation, we replace it exactly by a linear structure that is a function of the state estimate, xjt^-i, and the actual measurement function, c(x). A function for which such a linear structure can be found is called a modifiable function. To illustrate consider a function being approximated by its first order expansion, « / N ,-\ <9c(x) v* = c(x) as c(x) + dx then c(x) - c(x)
dc(x) dx
(x - x).
(x - x ) .
If c(x) is a modifiable function we can write c(x) - c(x) = g(v~, x) (x - x ) , i.e., this equation is exact, not an approximation. The Modified Gain Extended Kalman Filter (MGEKF) uses the modified form of the measurement function, <7(vjt, xjt k-i), evaluated at the current measurement and the a priori state estimate in the covariance update equation in place of C(ik,k-i) while retaining all the other equations of the EKF. That is, instead of using Pk,k = (I-GkC(ik,k-i))
Pk,k-i(I - Gk C(xk,k-i))T
+ Gk Rk Gj ,
(25)
- Gfc9(vfc, x fc , fc _!)) T + Gk Rk Gl
(26)
we use Pk,k = (I-Gkp(vfc,
xfc,,t_i)) Pk,k-i(I
T. Bullock and M. Moorman
12
It is important to note that the modified form is not used in calculating the gain Gfc. This would cause the gain to be a function of the current measurement and thus have a strong correlation with the measurement residue sequence resulting in poor performance. The MGEKF has been shown to yield better performance than the EKF for some applications and conditions [5,6]. As with the EKF the use of an approximating function (approximating because it is a function of the noisy measurement Vfc, not v£) to apply linear equations to a nonlinear problem precludes an analytical determination of the global performance of the filter and local performance can only be determined via simulation. §4 Other modified filters Another important modification to the Extended Kalman Filter is the use of suc cessive iterations of the estimate in order to mitigate the effect of the measurement nonlinearities. That is, we use the a posteriori estimate to relinearize the measure ment equation and then repeat the update process. This procedure can be repeated as many times as desired, usually until the iteration yields little change in the estimate. This algorithm, called the Iterated Extended Kalman filter [1,2], is sum marized as follows. Begin with the Oth order iterate, x
fc,fc - Xfc.fc-i ,
and calculate the subsequent iterates as
*£TJ = «*.*-! + ^iV* - cfiu^-o - cpHzkjc-i - *&)) < where (i)
and
_ <9c(x)
Gi')=pfc,fc-1cr^)pfc,,-1cr+^]-1
The iteration procedure is continued for i = 0,1, • • ■ , n. The (n + l)st iterate, xkk , is then the output estimate of the filter. The nth iterate is used in the covariance update as pk,k = pk,k-i
- 4n) ^n)
p
k,k-i
Because the Extended Kalman Filter is only an approximation of a nonlinear estimator we have no guarantee of optimality or even stability. Thus the algorithm is open to multiple possible modifications in order to enhance performance. We have
Extended Kalman Filters 2
13
introduced a few of these in this article but the possibilities are essentially infinite. In order to analyze and improve performance of this and other approximate Kalman filters under various adverse conditions such as poor or incomplete initialization, unknown measurement and plant noise processes (unknown or non-Gaussian distri butions and nonstationary statistics) and various computational issues associated with parallel and distributed implementation many methods have been devised. Ar ticles in subsequent chapters of this book will address all these issues and attempt to resolve them using adaptive filters, convergence and error analysis and set valued filtering.
References 1. Gelb, A. et al., Applied Optimal Estimation, M.I.T. Press, Cambridge, MA, 1974. 2. Jazwinski, A. H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970. 3. Taylor, J. H., The Cramer-Rao estimation lower bound computation for deter ministic systems, IEEE Trans, on Auto. Contr. 24 (1979), 343-344. 4. Song, T. L. and J. L. Speyer, A stochastic analysis of a modified extended Kalman filter with application to estimation with bearings only measurements, IEEE Trans, on Auto. Contr. 30 (1985), 940-949. 5. Galkowski, P. J. and M. A. Islam, An alternative derivation of the modified gain function of Song and Speyer, IEEE Trans, on Auto. Contr. 36 (1991), 1323-1326. 6. Moorman, M. J. and T. E. Bullock, Empirical evidence of bias in extended Kalman filters used for passive target tracking, Proc. of NAECON'92, May, 1992, 393-398. 7. Moorman, M. J. and T. E. Bullock, New results and applications for the ideal extended Kalman filter as a Cramer-Rao lower bound estimator, Proc. ofAIAA Guid. Navig. Contr. Conf., August, 1992, 1264-1270.
T. E. Bullock Department of Electrical Engineering University of Florida Gainesville, FL 36211
[email protected] M. J. Moorman WL/MNGA, Eglin AFB Eglin, FL 32542
[email protected]
Extended Kalman Filters 3: A Mathematical Analysis of Bias M. J. Moorman and T. E. Bullock
Abstract. The use of the a priori state estimate sequence as the linearizing trajectory in the Extended Kalman Filter makes the filter's gain matrix sequence a function of this estimate. An expression is obtained using a perturbation method for the change in the state estimate due to the incorporation of a measurement. If the expected value of this expression is not zero then a bias is introduced into the state estimate at each update. An example is given that demonstrates this bias.
§1 Introduction The Extended Kalman Filter linearizes the nonlinear measurement equation about the a priori state estimate x^fc-i so that the linear Kalman Filter equations can be applied. For convenience, the measurement update (23) and gain equations (22) are repeated here, Xfc.fc = Xfc,fc_i + Gfc(xjt,fc_i) (vfc - c(xfcifc_i)), G*(*fc,fc-i) = Pk,k-i Cj(xktk-i)
[C f c (*M-i) Pk,k-i CfcT(xfc,/b-i) + Rk]'1
(27) (28)
In the following analysis we will assume a scalar measurement to simplify the presentation. Therefore Rk is also a scalar and the matrix inverse in (28) represents division by a scalar. We have written the gain matrix Gk as an explicit function of the a priori state estimate because we desire to investigate the properties of the state update term, Gfc(xfc, fc _i)(vfc-c(x fc , fc _ 1 )), Approximate Kalman Filtering Guanrong Chen (ed.) P PP- 15-20. Copyright ©1993 by World Scientific ®&pypJCffote1elc All rights of reproduction in any form reserved. ISBN 981-02-1359-X
(29) 15
Material
M. Moorman and T. Bullock
16
which appears as part of the righthand side of (27). If the gain Gk(x-k,k-i) were not a function of Xfc^-i and c(x.k,k-i) was an unbiased estimate of v t , then the measurement update term would be zero mean. Then no bias would be introduced by the measurement update (29). Because the two factors in (29) are both functions of x^fc-i we suspect that the expected value of this term is generally nonzero and that it therefore introduces a bias into the a posteriori estimate. Applying a perturbation analysis we obtain (now using x instead of Xfcjt-i for the a priori estimate) c(x) « c(x) +
dc(s) ds
(x-x)
= c(x) + G(x) (5x
(30)
and G(x) » C(x) +
dC(s) ds
(x-x)
= C(x) + <5G(x),5x,
(31)
where C(x) was defined in (30), and <5x = x — x , <5C(x)
<92c(s) 5s2
8C(B)
ds
Note that <5C(x) is the Hessian of the nonlinear measurement function c(x). Doing the same for the gain, we obtain G(x) w G(x) +
aG(s) ds
(x-x)
G(x) + 5G(x).5x, where ^G(x)
8G(s) ds
Using the chain rule to find <5G(x) from (28), we obtain <5G(x) = (/ - 2G(x)C(x)) P<5G(x) £» _ 1 (x), where P = Pk,k-i is the a priori covariance matrix of the filter and £»(x) = G ( x ) P G T ( x ) + fl
Extended Kalman Filters 3
17
is a scalar function of the actual state and the filter's covariance matrix. We have dropped the subscripts (all expressions are now a priori estimates) for convenience. As in the previous article we have interpreted the problem as estimating a deter ministic trajectory with the filter initialization being random. If the state dynamics are actually described by the stochastic differential equation (1) this is not true. P (and thus G) is stochastic since it is a function of the linearization used in the covariance update equation (24) (and in the covariance propagation equation (21) if the dynamics are nonlinear). Therefore, we can not say that P is uncorrelated with <5x and treat it as being deterministic. However, in the present analysis we are only interested in first-order effects and this correlation may be reasonably evaluated as a higher-order effect and ignored. The part of the update term (29) that we are interested in can then be written as G(x) (v - c(x)) » (G(x) + <5G(x) <5x) (v - c(x) - C(x) <5x). If we consider <5x to be zero mean, and note that E{v} = c ( x ) , the only part of this term that can contribute to a bias is the quadratic term in <5x, -<5G(x)<5xG(x)<5x. Since we are assuming that c(x) is scalar, we can write G(x)<5x = <5x T G T (x). Then £{G(x) (v - c(x))} « -E{6G(x)
<5x<5xT G T (x)}
Letting E = £'{6x(5x } be the actual a priori estimation error covariance matrix, we can write out the expression for the expected value of the state update term (a posteriori estimate minus a priori estimate), £{xfc,fc - kk,k-i}
:= £{Ax f c } « - ( / - 2G(x)G(x)) P6C(x) E G T ( x )
D~l(^).
Whether this term is invariably positive, negative or zero determines, to first-order, whether the EKF is biased (positively or negatively) or unbiased and is totally dependent on the nature of the measurement function c(x), its gradient G(x) and its Hessian 6C(x). Obviously if c(x) is linear, then G(x) is constant and 6C(x) is identically zero, leading to the known fact that the linear Kalman filter is unbiased. The foregoing analysis can be applied for any nonlinear measurement function c(x) to determine the possible existence of a bias and may allow for a first order estimate of the bias. Let us consider a particular example.
M. Moorman and T. Bullock
18 §2 Example
Consider the system where the states to be estimated are the relative positions of two bodies in Cartesian coordinates as shown below: y-axis
Line of isVght"-"^:
Observation Platform
iar
Set
i-axis
Thus, the state vector is X
[XY)T.
=
The measurement is the angle 6, corrupted by white noise, + Jh, = c(xfc) + TL
Vfc = t a n
The gradient is then
X]r~2 ,
C(x) = [-Y where r2=X2 + Y2
The Hessian is, 6C(x) =
2XY -(X2-Y2)
-{X2
- Y2) -2XY
Y X
-X Y
„-4
(32)
We can factor 6C as 8C =
X Y
Y -X
(33)
By defining a unit vector orthogonal to C U=[X
Y]'
r-1,
which is the gradient of the function r = (7(x) = (X2 + Y2)1/2, partitioned form as 6C=[U'r T „ - l
cT]
-c
Ur~l
we can write 6C in (34)
We can take the expression for Ax^ and premultiply by the gradient C(x) to project the update term into the direction normal to the line-of-sight vector (in the direction of increasing 9) and manipulate,
Extended Kalman Filters 3
19
CTD'1
E{Ad} « C£{Ax f c } ss -C (I - 2GC) P6C E = CP6CT,CT
D'1 +
2CGCP6CECTD-1
= -(1-2CG)CP8CT,CT
D~l
(35)
T,CTD~l
(36)
Then E{A6} ss - ( 1 - 2 C G ) C P [ [ / T r - 1
- CT]
= - ( 1 - 2CG) [ C P [ / T r - - '
-
'
-C Ur-1
CPCT]
-CECT UY,CTr-^
D-1
.
(37)
can identify these terms as follows: o2e=CPCT as the filter's estimate of its error covariance projected into the c(x) direction by the true gradient C(x) and is thus an estimate of the a priori angular estimation error. Similarly, a\ = CPUT is an estimate of the cross-covariance of the a priori estimation errors. The terms CY,CT=o2g
and
C/ECT = a ^
are estimates of these quantities using the actual covariance E. We can also write CG = CPCTD-1
= a2e D'1
Then we have E{A0)^(\-2a2eD-1)
{a% aj + 52g a2r9) D~l r " 1
Whether this expression is zero or non-zero can not be determined exactly. However, if the filter were observed to have a bias, exhibited by divergence of the angle estimate, then we may desire to use the above expression to estimate the bias and perhaps correct it. Simulations have shown the angle estimate of the EKF in this case to be nondivergent. The range estimate has been shown to diverge, and we will investigate this next. The range update term can be approximated by premultiplying Ax by the gradient U to obtain E{Ar}
w [/£{Ax} w -U(I= -UPAUT,CTD~1
2GC)
PSCT,CTD~l +2UGCP6CZCTD-1
M. Moorman and T. Bullock
20 = {-UP[CTr-1 T
1
= {{-UPU r-
-CT]
+ 2UGCP[UTr-1 T
T
-UPC ]
- CT]) 1
+ UG[CU r-
-c 1
Ur'
-CPCT\)
ECTZ?_1 -CEC"
(38)
UECTr-1
Defining as before, UPUT = a2r , UG = UPCTD~1
^a^D'1
,
we obtain £ { A r } » - [a2 r " 1 - 2a2g a2g D~' r~l - a2g + 2a2g a\ D~1} = ((a2-2a2ea2eD-')a2e+(l-2a2eD-1)a2ea2g)D-lr-1
D' (39)
In [2], arguments are presented to show that the range bias is positive each time a measurement is processed. This may be a very small effect (as it is here) but has been shown to have a major impact in other examples [1]. This idea has been used to produce an estimator where this bias, in the mean, is eliminated by subtracting an estimate of the bias from the a posteriori estimate [3]. References 1. Moorman, M. J. and T. E. Bullock, Empirical evidence of bias in extended Kalman filters used for passive target tracking, Proc. of NAECON'92, May, 1992, 393-398. 2. Moorman, M. J. and T. E. Bullock, A stochastic perturbation analysis of bias in the extended Kalman filter as applied to the bearings-only estimation, Proc. of IEEE Conf. on Decis. Contr., Dec, 1992, 3778-3783. 3. Moorman, M. J. and T. E. Bullock, A new estimator for passive tracking of maneuvering targets, Proc. of the 1st IEEE Conf. on Contr. AppL, Sept., 1992, 1122-1127. M. J. Moorman WL/MNGA, Eglin AFB Eglin, FL 32542
[email protected] T. E. Bullock Department of Electrical Engineering University of Florida Gainesville, FL 36211
[email protected]
Fisher Initialization in the Presence of Ill-Conditioned Measurements
Donald Catlin
Abstract. We extend the usual Fisher estimation scheme, used to initialize Kalman filters when prior information is completely unknown, to the case where the measurements may be ill-conditioned. Specifically, we make no assumptions about the invertibility of any of the matrices involved in the estimation procedure.
§1 Introduction A Bayesian inference scheme is, in the broad sense, a scheme in which prior statistical beliefs are conditioned with data to provide posterior statistical descriptions. As additional data is collected the scheme can be repeated using the posterior descriptions as the new priors. Clearly any such scheme must be initialized with some choice of prior statistical information. The means of choosing such prior information, or indeed if one should even employ Bayesian techniques, has been the subject of much debate in the field of Statistics. So called Classical and Bayesian statisticians have been squabbling adnauseum about this over the years. Nevertheless, Bayesian techniques have proven to be very successful in the analysis of many, many statistical problems. Certainly the Kalman filter, an inherently Bayesian scheme, provides ample evidence of the validity of such a statement. The problem of choosing priors is, therefore, a real problem with which one must deal. One of the most compelling approaches to this problem is, in our opinion, provided by the work of Jaynes [7]. A thorough discussion of Jaynes' work would take us too far afield here, but suffice it to say that Jaynes proposes choosing priors that maximize entropy [7,8]. An interesting corollary to this procedure is that in the presence of no prior information at all one should choose as a prior a uniform distribution. This is fine except that one is typically working with distributions on TZn, Euclidean n-space, Approximate Kalman Filtering Guanrong Chen (ed.), pp. 23-38. Copyright © 1993 by World Scientific Publishing Co. Inc All rights of reproduction in any form reserved. ISBN 981-02-1359-X
23
D. Catlin
24
and on this space no such uniform distribution exists. The dilemma is usually resolved by choosing a sequence of priors pn that tend to become uniform as n increases, obtaining a sequence of posterior distributions ipn by conditioning each of the priors pn with the same data using the inference scheme, and then taking the limit of ipn a,s n —> oo. This is known as using an improper prior. A Kalman filter, as noted above, is a Bayesian scheme and as such requires initialization with prior information. In this case, the prior information required is an estimate x 0 of the state vector xo as well as the so-called error covariance matrix Po = £ { ( x o - x o ) ( x o - x 0 ) T } .
(1)
In case xo is a known vector of constants c, one simply takes xo = c and Po = 0. If xo is a distribution with known mean vector fj, and covariance matrix E, one chooses xo = /x and Po = £. In the case where xo is completely unknown, the improper prior approach produces an estimate known as the Fisher estimate [13]; we will provide details of this in section 2. The Fisher estimate can also be described in terms of a minimum variance estimation problem and this will be described as well. In section 3 we will show how the Fisher estimation problem can be generalized to the case where the measurements are ill-conditioned and in sections 4 and 5 will solve this more general problem. Initializing Kalman filters is not always a serious issue. If the dynamical system involved is completely controllable and observable then the filter converges to a steady state filter that is independent of the initial conditions [5]. This result has been generalized by Aasnaes and Kailath [1]. Such Kalman filters are called stable. For these filters the choice of a prior undoubtedly affects the rate of convergence to steady state, that is the robustness, but we know of no published results which infer that a particular choice of prior will maximize this robustness (in fact, this may even be an ill-posed problem). In unstable Kalman filters, which are sometimes implemented, the initial conditions are probably much more important. Nishimura [11], greatly generalizing some work of Soong [14], derived conditions under which one can obtain an upper bound for the actual variances of estimates calculated using incorrect initial conditions, whether or not the filter is stable. In the case of stable filters this work does provide information on the robustness of the convergence to steady state. Again, this work does not indicate how to choose prior information, only how to evaluate it once the choice is made (a useful thing to be sure). Maximum entropy priors, including improper priors, seem to be the most compelling choices at the present time. §2 Improper priors and Fisher estimation Throughout this paper we will use the standard notation associated with Kalman filtering. Specifically, x/t represents the state vector at time k;
Fisher
Initialization
25
Vfc = CfcXfc + r^ represents the kth measurement, Ck is an m x n matrix, Th represents the (discrete) white measurement noise; Rk = E{VLvj} where E is the expectation operator; £>• xjtj represents the minimum variance estimate of x^ given measurements v j , • • ■, v^j p
'*fc,j ■ k,j = E{(*k,j -xfc)(xfcj
-xfc)T}.
Assuming that any indicated inverses exist it can be shown [2] that the update equations for the Kalman filter can be rewritten as P&=Pi£-i
+ CZR?Ck;
(2)
l
Xfc.fc = Pk,kCjR^vk v fc H+ Pk,kPk,l-i^k,k-i
(3)
To implement the improper prior approach for initializing a Kalman filter we assume that xo is completely unknown and thus xo,o can be anything at all. It follows from this that xo,o — xo must be uniformly distributed and hence have "infinite covariance." It would then follow from the Kalman extrapolation equation [2,5] that Pifi is infinite and well, as so Pfd = 0 (the zero matrix). From equation (2) above we would then have •Pi.i
=
Ci ^ i
Ci
or P M = (CjR^d)-'
(4)
From this result we can infer from equation (3) that xi,i = PuiCjR^vi.
(5)
The above argument can be made rigorous by taking a sequence of covariance matrices whose eigenvalues diverge and then use these matrices to approximate P\fi\ the details are quite easy. So, in spite of the strange looking equality PJ~Q = 0, the above suggests that equations (4) and (5) provide reasonable estimates of Pii and xi,i whenever xo is completely unknown. It turns out that equations (4) and (5) represent an estimate known as the Fisher estimate. To discuss this we can drop the variable representing time and simply write v = Cx + r/, (6) where as before C is an m x n matrix and x and n are random vectors with E{yiT?} = 0 and E{rj} = 0 .
(7)
We propose to calculate a linear minimum variance unbiased estimate x of x given v. Specifically this means that we wish to find an n x m matrix F such that x = F v (linearity)
(8)
26
D. Catlin
and E{x} = -E{x} (unbiased estimator)
(9)
hold while minimizing || x - x || 2 = tr{E{(i
- x)(x - x) T }} .
(10)
Substituting (6) into (8), taking expected values, and applying (7) we have £ { x } = FCE{x) From (9) it follows that
.
£ { x } = FCFCE{x}.
Since x is completely unknown, it can be any n x l random vector and hence E{x} can be any element of lZn We conclude, therefore, that FC = In (the indentity matrix).
(11)
There are two other assumptions that are made for strictly technical reasons and these will be stated in the following definition. Definition 1. By the classical Fisher estimator of x in equation (6) under the assumptions (a) R =-E{,E{mT}; (b) R~* exists; (c) (CTR-lC)~l exists; \d) £ { X T 7 T } = 0.
we mean the random vector x satisfying the following conditions: (e) x = Fv; (/) FC = I(g) || x — x || is minimized subect to constraints (a) — (/). Theorem 2. The solution to the classical Fisher estimation problem is given by x = PCTR-\ (so F = PCTRTl); T 1 P=(C i?- C)-1=£{(x-x)(x-x)T}
(12) (13)
We are not going to provide a proof of Theorem 2 since this result will be subsumed by Theorem 6 of Section 4. Notice that equations (12) and (13) are exactly equations (5) and (4) in case v in (6) represents \\. Thus the improper prior approach to initializing a Kalman filter yields a classical Fisher estimate. On the one hand this is reasonable since
Fisher
27
Initialization
the Kalman estimate is also a minimum variance estimate. On the other hand it is well known that if x is a Kalman estimate, E{(x — x ) x T } = 0, while if x is a Fisher estimate, E{(x — x ) x T } = FRFr, so the estimates are essentially different. The difference is requirement (/). With it the minimum variance estimate will not, in general, be the Kalman estimate and without it the Kalman estimate cannot be obtained without additional information (such as a prior) [2, pages 141-142]. Nevertheless, the Fisher estimate does appear to be the limiting case of a Kalman estimate when the prior is completely unknown. As we will see in the next section, it is possible to weaken or eliminate many of the constraints in Definition 1. To this end we will need, in addition to the usual manipulations in Linear Algebra, the following two notions. If A is a matrix, then A' will denote the perpendicular projection matrix onto C(A)-L, the orthogonal complement of the column space of A. If we define A" = (A1)' then A" is the orthogonal projection matrix onto C(A). This notation is due to Kaplansky [9] and the basic theorems can be found there or in [2] and [6]. The psuedoinverse of A will be denoted A+ We assume the reader is familiar with this notion, however the basic information is available in [2,6,10] or [12]. §3 Formulation of the generalized Fisher estimate The classical Fisher estimate as described in Definition 1 contains some serious flaws. l To begin with the assumptions (b) and (c), namely that i ? _ 1 and (CTR~1C)~1 exist, are totally unwarranted. It is certainly conceivable that some measurements could be noiseless, and this would render R singular. Even if R~ exists, the fact that l the m x n matrix C will usually have m < n implies that CTR~lCCv will usually be singular. We will, therefore, remove conditions (6) and (c) of Definition 1 when we reformulate the Fisher estimate. Next, we will reformulate condition (/) in Definition 1 since there is no reason to suppose that C has a left inverse. In fact, if C were a simple state selection matrix omitting at least one state it would fail to be left invertible. We wish to replace condition (b) in Definition 1 by a weaker condition that is both reasonable and always attainable. This end we note that if x in equation (6) was an element of N{C), the nullspace of C, then v = 77 and so v would contain no information about x at all. On the other hand if x G Af(C)-1, the the carrier of C, then v would contain as much (noisy) information about x as we could possibly know. In fact, if 77 = 0 in this situation, we could perfectly reconstruct x from v using the pseudoinverse of C. Of course neither of these extreme situations is likely. Rather, one would expect that x would be the sum of two vectors xj and X2, where xi 6 A^C)- 1 and X2 £ M{C) ■ In this instance equation (6) would become
v = C(x_1 + x_2) + η = Cx_1 + η, (14)
since x_2 would be annihilated by C. In other words, the measurement vector v contains, at best, information about x_1. We maintain, therefore, that the best we
can do in estimating x is to estimate that portion of x that is in N(C)^⊥, namely, x_1. As mentioned above, in case η = 0 we would want our estimate x̂ to be x_1 itself, and so we would require that
x̂ = Fv, (15)
where F satisfies the condition
FC = (C^T)''. (16)
Equation (16) is the analog of condition (f) in Definition 1. Unlike condition (f), however, one can always find an F satisfying (16), and in case C does have a left inverse, equation (16) reduces to condition (f) in Definition 1. From this discussion we can now state our formal estimation problem.

Definition 3. By the generalized Fisher estimator of x in equation (6) under the assumption
(a) E{xη^T} = 0,
we mean the random variable x̂_f satisfying the following conditions:
(b) x̂_f = Fv;
(c) FC = (C^T)'';
(d) ||x̂_f − x|| is minimized subject to constraints (a)-(c).

Definition 4. If x̂_f is the generalized Fisher estimator for x, we define the generalized Fisher error covariance matrix to be the matrix P_f given by
P_f = E{(x̂_f − (C^T)''x)(x̂_f − (C^T)''x)^T}. (17)
Note that the trace of P_f measures the square of the distance between x̂_f and that portion of x lying in N(C)^⊥. It is also easy to verify that x̂_f satisfies a condition analogous to that of being unbiased, namely E{x̂_f − (C^T)''x} = 0, so that P_f really does measure the covariance of the estimation error. In the next section we will show how to find x̂_f when R is non-singular, leaving the case where R is singular to Section 5.

§4 Calculating the generalized Fisher estimate for non-singular R

In solving the estimation problem defined in Definition 3, we will first consider the case where R is non-singular. This restriction will be removed in the next section. First a general lemma.

Lemma 5. The matrix F mentioned in Definition 3 satisfies the condition
2FR = ΛC^T, (18)
where Λ is an n × n matrix.
Proof: From (6) and condition (b) of Definition 3 we have x̂_f = FCx + Fη, and by (c) of the same definition,
x̂_f = (C^T)''x + Fη. (19)
From (19) it follows that x̂_f − x = (C^T)''x − x + Fη, or equivalently,
x̂_f − x = −(C^T)'x + Fη. (20)
From equation (20) and part (a) of Definition 3 it follows that
E{(x̂_f − x)(x̂_f − x)^T} = FRF^T + (C^T)'E{xx^T}(C^T)'. (21)
In a manner analogous to equation (10), equation (21) implies that
||x̂_f − x||² = tr{FRF^T} + ||(C^T)'x||². (22)
In order to find an F that minimizes ||x̂_f − x|| subject to the constraint (16) we form the Lagrangian
L(F, Λ) = tr{FRF^T} + ||(C^T)'x||² + tr{((C^T)'' − FC)Λ^T}, (23)
where the Lagrange multiplier Λ in (23) is a square matrix. Taking the Gateaux (directional) derivative of L in the direction of an arbitrary matrix B we have
δL(F, Λ)(B) = tr{(2FR − ΛC^T)B^T}.
Since this must be zero for all B, equation (18) follows.

Theorem 6. If R is non-singular, then the generalized Fisher estimate is given by
x̂_f = Fv, (24)
where
F = (C^T R^{-1} C)^+ C^T R^{-1} (25)
and
P_f = (C^T R^{-1} C)^+. (26)
Proof: If we multiply equation (18) on the right by R^{-1} we obtain
F = ½ΛC^T R^{-1}. (27)
Multiplying equation (27) on the right by C and noting equation (16) we have
(C^T)'' = ½ΛC^T R^{-1} C. (28)
On the other hand, since R (hence R^{-1}) is positive definite, we have
(C^T R^{-1} C)'' = (C^T)''. (29)
Substituting (29) into (28) and multiplying the resulting expression on the right by (C^T R^{-1} C)^+ C^T R^{-1} we obtain
(C^T R^{-1} C)^+ C^T R^{-1} = ½ΛC^T R^{-1}. (30)
Comparing equation (30) with equation (27) it is clear that (25) holds. From equations (17) and (19) it is easily seen that
P_f = FRF^T, (31)
and combining this with (25) we obtain
P_f = (C^T R^{-1} C)^+, (32)
and this completes the proof.
Note that equations (25) and (26) are the same as equations (12) and (13), except that we have not assumed the invertibility of C^T R^{-1} C here, nor have we assumed that C is left invertible.

§5 Calculating the generalized Fisher estimate

In this section we remove the restriction that R be non-singular. The core result is the following theorem.

Theorem 7. Suppose the noise vector in equation (6) is of the form
η = (η_1^T, 0^T)^T, (33)
where R_1 = E{η_1 η_1^T} is invertible. In this situation the noise covariance matrix R has the form
R = [ R_1 0; 0 0 ]. (34)
Because of (34) we partition C and v so that equation (6) has the form
[ v_1; v_2 ] = [ C_1; C_2 ] x + [ η_1; 0 ]. (35)
Using this notation the generalized Fisher estimate is given by
x̂_f = x̂_1 + F_2[v_2 − C_2 x̂_1], (36)
where
x̂_1 = P_1 C_1^T R_1^{-1} v_1, (37)
P_1 = (C_1^T R_1^{-1} C_1)^+, (38)
F_2 = F_21 − F_22 + F_23, (39)
F_21 = SC_2^T (C_2 S C_2^T)^+, (40)
F_22 = SC_2^T (C_2 S C_2^T)^+ C_2 P_1' (P_1' C_2^T (C_2 S C_2^T)^+ C_2 P_1')^+ P_1' C_2^T (C_2 S C_2^T)^+, (41)
F_23 = (C^T)'' (P_1' C_2^T (C_2 S C_2^T)^+ C_2 P_1')^+ P_1' C_2^T (C_2 S C_2^T)^+, (42)
S = P_1 + P_1', (43)
and
P_f = (I − F_2 C_2) P_1 (I − F_2 C_2)^T. (44)
Proof: Because of the form of equation (35) it will be convenient to partition F as
F = [F_1 | F_2]. (45)
In equation (18) we can, without any loss of generality, replace Λ by 2Λ and obtain FR = ΛC^T, which in partitioned form becomes
[F_1 | F_2] [ R_1 0; 0 0 ] = Λ[C_1^T | C_2^T]. (46)
Performing the indicated multiplications, (46) produces the equations
F_1 R_1 = ΛC_1^T (47)
and
ΛC_2^T = 0. (48)
In a similar fashion, condition (c) of Definition 3 becomes
F_1 C_1 + F_2 C_2 = (C^T)''. (49)
Multiplying this expression on the right by C^T = [C_1^T | C_2^T] we obtain the pair of equations
F_1 C_1 C_1^T + F_2 C_2 C_1^T = C_1^T (50)
and
F_1 C_1 C_2^T + F_2 C_2 C_2^T = C_2^T. (51)
Since R_1 is invertible, equation (47) implies that
F_1 C_1 = ΛC_1^T R_1^{-1} C_1. (52)
Recalling Theorem 6 we define
P_1 = (C_1^T R_1^{-1} C_1)^+, (53)
so that (52) becomes
F_1 C_1 = ΛP_1^+. (54)
From (54) we can rewrite equations (50) and (51) as
ΛP_1^+ C_1^T + F_2 C_2 C_1^T = C_1^T (55)
and
ΛP_1^+ C_2^T + F_2 C_2 C_2^T = C_2^T. (56)
Multiplying (55) through on the right by R_1^{-1} C_1 and noting (53) we obtain
Λ(P_1^+)² + F_2 C_2 P_1^+ = P_1^+. (57)
If we successively multiply (57) on the right by P_1 we produce the equations
ΛP_1^+ + F_2 C_2 P_1'' = P_1'' (58)
and
ΛP_1'' + F_2 C_2 P_1 = P_1. (59)
Solving (58) for ΛP_1^+ and substituting the result into (56) it follows that
F_2 C_2 [I − P_1''] C_2^T = C_2^T − P_1'' C_2^T,
or equivalently,
F_2 C_2 P_1' C_2^T = P_1' C_2^T. (60)
We will need (60) a bit later. Since P_1 is symmetric, the range of P_1 equals the range of P_1^+, so from (53) and the fact that R_1^{-1} is non-singular we have
P_1'' = (C_1^T)''.
From this and (59) it follows that Λ(C_1^T)'' = P_1 − F_2 C_2 P_1, and hence, multiplying on the right by C_1^T R_1^{-1}, that
ΛC_1^T R_1^{-1} = P_1 C_1^T R_1^{-1} − F_2 C_2 P_1 C_1^T R_1^{-1}. (61)
Multiplying (47) through on the right by R_1^{-1}, it follows from (61) that
F_1 = P_1 C_1^T R_1^{-1} − F_2 C_2 P_1 C_1^T R_1^{-1}. (62)
Now
x̂_f = Fv = [F_1 | F_2] [ v_1; v_2 ] = F_1 v_1 + F_2 v_2,
so from (62),
x̂_f = P_1 C_1^T R_1^{-1} v_1 − F_2 C_2 P_1 C_1^T R_1^{-1} v_1 + F_2 v_2. (63)
If we now define
x̂_1 = P_1 C_1^T R_1^{-1} v_1, (64)
then (63) can be rewritten as
x̂_f = x̂_1 + F_2[v_2 − C_2 x̂_1]. (65)
Here is our first result: x̂_1 is exactly a generalized Fisher estimate of x based on the noisy measurement v_1, and its generalized Fisher error covariance is P_1. (This is why we first derived the case where the measurement noise covariance was non-singular.) Equation (65) has the familiar form of a Kalman estimate of x based on the prior x̂_1 and the noise-free measurement v_2. As we will see, this is true only in form, for F_2 is not a Kalman gain matrix and its calculation is much more complicated. This is not surprising for, as we already pointed out, the Fisher estimate is different in essence from the unconstrained minimum variance (Kalman) estimate.
To calculate F_2 for use in (65) we first note that since v_2 and C_2 x̂_1 are both in the range of C_2, it suffices to calculate F_2 C_2''. Multiplying (59) on the right by C_2^T and adding the result to (60) we obtain
ΛP_1'' C_2^T + F_2 C_2 (P_1 + P_1') C_2^T = (P_1 + P_1') C_2^T. (66)
Defining
S = P_1 + P_1' (67)
and noting that P_1'' = I − P_1', it follows from (48) that (66) can be rewritten as
F_2 C_2 S C_2^T = S C_2^T + ΛP_1' C_2^T. (68)
Because S is invertible (S^{-1} = P_1^+ + P_1'), it follows that (C_2 S C_2^T)'' = C_2''. Thus if we multiply (68) on the right by (C_2 S C_2^T)^+ we obtain
F_2 C_2'' = S C_2^T (C_2 S C_2^T)^+ + ΛP_1' C_2^T (C_2 S C_2^T)^+. (69)
From (54) we note that F_1 C_1 P_1' = 0, so multiplying (49) on the right by P_1' we obtain
F_2 C_2 P_1' = (C^T)'' P_1'. (70)
Multiplying (69) on the right by C_2 P_1', it follows from (70) that
(C^T)'' P_1' = S C_2^T (C_2 S C_2^T)^+ C_2 P_1' + ΛP_1' C_2^T (C_2 S C_2^T)^+ C_2 P_1'.
Multiplying this last expression on the right by (P_1' C_2^T (C_2 S C_2^T)^+ C_2 P_1')^+ P_1' C_2^T (C_2 S C_2^T)^+, while noting that the rightmost term reduces to ΛP_1' C_2^T (C_2 S C_2^T)^+, we have that
ΛP_1' C_2^T (C_2 S C_2^T)^+ = (C^T)'' P_1' (P_1' C_2^T (C_2 S C_2^T)^+ C_2 P_1')^+ P_1' C_2^T (C_2 S C_2^T)^+
− S C_2^T (C_2 S C_2^T)^+ C_2 P_1' (P_1' C_2^T (C_2 S C_2^T)^+ C_2 P_1')^+ P_1' C_2^T (C_2 S C_2^T)^+. (71)
Substituting (71) into (69) we obtain our final expression for F_2 as stated in the theorem. To obtain the expression for P_f we note that by (17), (19), and (34)
P_f = F_1 R_1 F_1^T. (72)
The final expression for P_f is obtained by substituting expression (62) into (72) and reducing. This completes the proof of Theorem 7.
In case the noise η is not of the form (33), one can always effect this form by diagonalizing the measurement noise correlation matrix R using an orthogonal matrix V, thereby obtaining a matrix Γ having a diagonal non-zero block Γ_1 in the upper left corner. Defining y = Vv, H = VC, and μ = Vη, one then multiplies (6) on the left by V and obtains the measurement equation
y = Hx + μ,
which can be rewritten as
[ y_1; y_2 ] = [ H_1; H_2 ] x + [ μ_1; 0 ], (73)
where Γ_1 = E{μ_1 μ_1^T} is invertible. One can then apply Theorem 7 to the model (73) and obtain x̂_f in terms of y_1, y_2, H_1, H_2, and Γ_1. Finally, one rewrites this estimate in terms of the original v, C, and R (see [2, pages 156-159] for details on such a calculation). The result is the following theorem.

Theorem 8. If x̂_f is the generalized Fisher estimator for x in equation (6), then
x̂_f = x̂_1 + F_2[v − Cx̂_1], (74)
where
x̂_1 = P_1 C^T R^+ v, (75)
P_1 = (C^T R^+ C)^+, (76)
F_2 = F_21 − F_22 + F_23, (77)
F_21 = S C^T R' (R'CSC^T R')^+, (78)
F_22 = S C^T R' (R'CSC^T R')^+ R'CP_1' (P_1' C^T R' (R'CSC^T R')^+ R'CP_1')^+ P_1' C^T R' (R'CSC^T R')^+, (79)
F_23 = (C^T)'' (P_1' C^T R' (R'CSC^T R')^+ R'CP_1')^+ P_1' C^T R' (R'CSC^T R')^+, (80)
S = P_1 + P_1', (81)
and
P_f = (I − F_2 C) P_1 (I − F_2 C)^T. (82)

The generalized Fisher estimate was first introduced in the author's text [2]. Unfortunately, the solution for the estimation equations given there contained an error (the equation directly below (7.4-25), page 153), and the solution was wrong. This was subsequently corrected in the author's paper [3], and the presentation here relies heavily on that work.
To use Theorem 8 one must first calculate R', R^+, P_1, and P_1'; (C^T)'' must be calculated prior to calculating F_23. Although some reduction of equations (78)-(80) is possible, we have found it easier to leave them in the form shown because of the repetitions of various terms. It can happen that P_1 is invertible, in which case F_22 and F_23 are zero and S = P_1. In case C happens to have a right inverse, it is easy to show that Theorem 8 can be avoided altogether since in this case F = C^+ (independent of R). We have already discussed the case where R is invertible.
To illustrate the theorem, consider a situation wherein
C = [ 3 0 1; −1 1 1; 1 2 3 ]  and  R = [ 3 2 1; 2 2 0; 1 0 1 ].
Both C and R are singular, so neither Theorem 2 nor Theorem 6 applies. In this case we have
R^+ = (1/6)[ 1 0 1; 0 2 −2; 1 −2 3 ]  and  R' = (1/3)[ 1 −1 −1; −1 1 1; −1 1 1 ].
It follows that
C^T R^+ C = [ 4 2 4; 2 1 2; 4 2 4 ],  P_1 = (1/81)[ 4 2 4; 2 1 2; 4 2 4 ]  and  P_1' = I − (1/9)[ 4 2 4; 2 1 2; 4 2 4 ].
Hence
P_1 C^T R^+ = (1/27)[ 2 −2 4; 1 −1 2; 2 −2 4 ]  and  S = (1/81)[ 49 −16 −32; −16 73 −16; −32 −16 49 ].
It turns out in this example that F_2 = F_23, since F_21 is equal to F_22:
F_21 = F_22 = (1/705)[ 97 −97 −97; −73 73 73; −65 65 65 ].
The final calculation to determine F_2 involves first calculating (C^T)'' and then F_23:
(C^T)'' = (1/26)[ 25 −4 3; −4 10 12; 3 12 17 ]  and  F_23 = (1/78)[ 11 −11 −11; −8 8 8; −7 7 7 ].
From these we can determine F and compare the answer to C^+:
F = (1/78)[ 18 −18 3; −6 6 12; −2 2 17 ]  and  C^+ = (1/156)[ 43 −22 −1; −10 16 22; 1 14 29 ].
Note that F ≠ C^+, but it is easily checked that FC = (C^T)''.
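Readers who want to reproduce these numbers can do so with a few lines of linear algebra. The following sketch is ours, not part of the original presentation: it assumes numpy, and the helper names (proj, perp, generalized_fisher_gain) are invented for the illustration. It implements equations (74)-(82) directly and checks that FC = (C^T)'' while F ≠ C^+.

```python
import numpy as np

def proj(A):
    """A'' : orthogonal projection onto the column space C(A)."""
    return A @ np.linalg.pinv(A)

def perp(A):
    """A'  : orthogonal projection onto C(A)-perp."""
    return np.eye(A.shape[0]) - proj(A)

def generalized_fisher_gain(C, R):
    """F with x_f = F v, per Theorem 8; C and R may both be singular."""
    Rp, Rc = np.linalg.pinv(R), perp(R)           # R^+ and R'
    P1  = np.linalg.pinv(C.T @ Rp @ C)            # (76)
    P1c = perp(P1)                                # P_1'
    S   = P1 + P1c                                # (81)
    M   = np.linalg.pinv(Rc @ C @ S @ C.T @ Rc)   # (R' C S C^T R')^+
    T   = P1c @ C.T @ Rc @ M                      # P_1' C^T R' (...)^+
    K   = np.linalg.pinv(T @ Rc @ C @ P1c)        # middle pseudoinverse in (79), (80)
    F21 = S @ C.T @ Rc @ M                        # (78)
    F22 = F21 @ Rc @ C @ P1c @ K @ T              # (79)
    F23 = proj(C.T) @ K @ T                       # (80)
    F2  = F21 - F22 + F23                         # (77)
    F1  = P1 @ C.T @ Rp                           # x_1 = F_1 v, from (75)
    return F1 + F2 @ (np.eye(C.shape[0]) - C @ F1)   # gain of (74)

C = np.array([[3., 0., 1.], [-1., 1., 1.], [1., 2., 3.]])
R = np.array([[3., 2., 1.], [2., 2., 0.], [1., 0., 1.]])
F = generalized_fisher_gain(C, R)
print(np.allclose(F @ C, proj(C.T)))          # True:  FC = (C^T)''
print(np.allclose(F, np.linalg.pinv(C)))      # False: F != C^+
print(np.round(78 * F, 3))                    # the matrix (1/78)[...] above
```

Any of the intermediate matrices above (P_1, S, F_23, and so on) can be printed from the same script for comparison with the worked example.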
§6 Conclusions
In this paper we discussed the problem of initializing a Kalman filter when there is no prior information available. In such a case, the so-called improper prior approach strongly suggests that a reasonable choice is to calculate a Fisher estimate using the first available measurements. Unfortunately, if the measurements are ill-conditioned, the classical Fisher estimate is not always computable. To remedy this, we defined a generalization of the Fisher estimate and showed it to be computable in all situations. Although our paradigm here concerned the initialization of Kalman filters, the result is of interest in its own right in that it provides a general solution to the statistical estimation problem posed by linear models with random effects (see [4] for current information on this field).

References
1. Aasnaes, H. B. and T. Kailath, Initial-condition robustness of linear least squares filtering algorithms, IEEE Trans. on Auto. Contr. 19 (1974), 393-397.
2. Catlin, D. E., Estimation, Control, and the Discrete Kalman Filter, Springer-Verlag, New York, 1988.
3. Catlin, D. E., Estimation of random states in general linear models, IEEE Trans. on Auto. Contr. 36 (1991), 248-252.
4. Christensen, R., Plane Answers to Complex Questions, The Theory of Linear Models, Springer-Verlag, New York, 1987.
5. Chui, C. K. and G. Chen, Kalman Filtering with Real-Time Applications, Springer-Verlag, Heidelberg, 1987.
6. Foulis, D. J., Relative inverses in Baer *-semigroups, Michigan Math. J. 10 (1963), 65-84.
7. Jaynes, E. T., Prior probabilities, IEEE Trans. on Sys. Sci. Cybern. 4 (1968), 227-241.
8. Jaynes, E. T., Where do we stand on maximum entropy?, in The Maximum Entropy Formalism, R. D. Levine and M. Tribus (eds.), M.I.T. Press, Cambridge, MA, 1978.
9. Kaplansky, I., Rings of Operators, Mimeographed Notes, University of Chicago, 1955.
10. Luenberger, D. G., Optimization by Vector Space Methods, John Wiley and Sons, New York, 1969, 84-91.
11. Nishimura, T., On the a priori information in sequential estimation problems, IEEE Trans. on Auto. Contr. 11 (1966), 197-204, and 12 (1967), 123.
12. Rao, C. R., Calculus of generalized inverses, Part I: general theory, Sankhya Ser. A 29 (1967), 317-350.
13. Schweppe, F. C., Uncertain Dynamical Systems, Prentice Hall, Englewood Cliffs, N.J., 1973, 100-104.
14. Soong, T., On a priori statistics in minimum variance estimation problems, Trans. ASME, Series D, J. of Basic Engr. 87 (1965), 109-112.
Donald Catlin
Department of Mathematics and Statistics
University of Massachusetts
Amherst, MA 01003
[email protected]
Initializing the Kalman Filter with Incompletely Specified Initial Conditions
Victor Gomez and Agustin Maravall
Abstract. We review different approaches to Kalman filtering with incompletely specified initial conditions, appropriate for example when dealing with nonstationarity. We compare in detail the transformation approach and modified Kalman Filter (KF) of Ansley and Kohn, the diffuse likelihood and diffuse KF of de Jong, the approach of Bell and Hillmer, whereby the transformation approach applied to an initial stretch of the data yields initial conditions for the KF, and the approach of Gomez and Maravall, which uses a conditional distribution on initial observations to obtain initial conditions for the KF. It is concluded that the latter approach yields a substantially simpler solution to the problem of optimal estimation, forecasting and interpolation for a fairly general class of models.
§1 Introduction

We consider observations generated by a discrete time state space model (SSM) such that the initial state vector x_0 has a distribution which is unspecified. We will further allow for unknown regression type parameters. Examples are non-stationary time series which follow an ARIMA model, regression models with ARIMA disturbances, structural models (as in [9]) and ARIMA component models, among others. In all these cases it is not possible to initialize the Kalman Filter (KF) as usual, by means of the first two moments of the distribution of x_0, because they are not well defined. Therefore, it is necessary to incorporate new assumptions in order to deal with this initialization problem. Among the different alternatives that have been proposed in the literature, we will focus on the transformation approach of Kohn and Ansley, the diffuse Kalman filter (DKF) of de Jong, the initialization procedure of Bell and Hillmer, and the approach of Gomez and Maravall, based on a trivial extension of the KF, to be denoted
the Extended Kalman Filter (EKF), with a distribution defined conditionally on the initial observations. There are other approaches as well, like the so-called "big k" method (see, for example, [5] and [7]). This method uses a matrix of the form kI to initialize the state covariance matrix, where k is large to reflect uncertainty regarding the initial state and I is the identity matrix. The big k method is not only numerically dangerous, it is also inexact. An alternative to the big k method is to use the information filter (see [1]). However, as seen in [2], the information filter breaks down in many important cases, including ARMA models.
The paper is structured as follows. In Section 2 we will define the SSM and consider some illustrative examples. In Section 3 we suppose that the initial state vector x_0 is fixed, define the likelihood and show how the EKF and the DKF can be used to evaluate it. In Section 4 we will deal with the different approaches to define and evaluate the likelihood of the SSM in the case when there are no regression type parameters and the initial state vector has an unspecified distribution. In Section 5 we will extend these results to include regression type parameters.

§2 State space model

Definition 1. A vectorial time series v = (v_1^T, ···, v_N^T)^T is said to be generated by the State Space Model (SSM) if, for k = 1, ···, N,
v_k = X_k β + C_k x_k + Z_k ξ_k,
x_k = W_{k−1} β + A_{k−1} x_{k−1} + H_{k−1} ξ_{k−1}, (1)
where x_0 = Bδ, ξ_k ~ NID(0, σ²I), k = 0, ···, N, δ ~ N(c, σ²C), with either C nonsingular or C = 0, δ and ξ = (ξ_0^T, ···, ξ_N^T)^T are independent, B is of full column rank and β is a vector of fixed regression parameters. Also, Var(v) is nonsingular if C = 0.
This definition is similar to the one in [13]; the vector δ models uncertainty with respect to the initial conditions. Following [13], we will say that δ is diffuse if C^{−1} is arbitrarily close to 0 in the Euclidean norm, denoted C → ∞. Contrary to de Jong, we will always suppose that β, the vector of regression parameters, is fixed; considering β diffuse introduces confusion as to what likelihood should be used, and it affects neither the equations nor the computations with the DKF, to be defined below. The formulation we use for the SSM has the virtue of explicitly separating the time-invariant "mean" effect β from the state vector x_k, keeping its dimension to a minimum. Choosing adequately the matrices X_k, W_k, H_k and Z_k, appropriate components of β and ξ can be excluded from or included in each equation. Thus, the specification covers the case where the mean and disturbance effects in each equation are distinct. Two simple examples will illustrate the definition.
Example 1. Suppose a regression model with random walk disturbance and scalar v_k,
∇(v_k − y_k^T β) = a_k, (2)
where ∇ = 1 − L, L is the lag operator (L(v_k) = v_{k−1}), and all the a_k ~ N(0, σ²) are independent. Model (2) can be put into state space form by defining X_k = y_k^T, C_k = 1, Z_k = 0, W_k = 0, A_k = 1, H_k = 1, x_k = v_k − y_k^T β and ξ_{k−1} = a_k. That is,
x_k = x_{k−1} + a_k, (3a)
v_k = y_k^T β + x_k. (3b)
For initialization, we make A_0 = 1, H_0 = 1, W_0 = 0, B = 1 and x_0 = δ. Therefore, the first state is x_1 = δ + a_1 and δ is in this case equal to the initial state. Because {x_k} follows the non-stationary model (3a), the distribution of δ is unspecified.

Example 2. Suppose Example 1, but with ∇ replaced by 1 − ρL, where |ρ| < 1. Then, we have a regression model with AR(1) disturbances. The SSM is
x_k = ρx_{k−1} + a_k (4)
and (3b). For initialization, we make A_0 = 1, H_0 = 1/√(1 − ρ²), W_0 = 0, B = 1 and x_0 = 0 (c = 0, C = 0). In this case {x_k} follows the stationary model (4) and we can use the first two moments of x_k, namely E{x_k} = 0 and Var(x_k) = σ²/(1 − ρ²), to set up the initial conditions. The first state is x_1 = (1/√(1 − ρ²))a_1.
A representation which will be very useful in what follows is given by the next theorem.

Theorem 1. If v = (v_1^T, ···, v_N^T)^T is generated by the SSM (1), then v = Rδ + Sβ + e, where the rows of S are
S_1 = X_1 + C_1 W_0,
S_2 = X_2 + C_2 (W_1 + A_1 W_0),
···
S_N = X_N + C_N (W_{N−1} + A_{N−1} W_{N−2} + ··· + (A_{N−1} ··· A_1) W_0),
and those of R are
R_i = C_i A_{i−1} ··· A_0 B, i = 1, ···, N.
Besides, e ~ N(0, σ²Σ) with Σ nonsingular and Cov(δ, e) = 0.
Proof: The expressions for S_i and R_i are obtained by repeated substitutions using (1). The vectors e_i are linear combinations of ξ_0, ξ_1, ···, ξ_i, i = 1, ···, N.
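To make the construction concrete, here is a small sketch of the repeated substitutions of the proof. It is ours, not part of the paper: numpy is assumed, the function name is invented, and the system matrices are passed as Python lists indexed from 0.

```python
import numpy as np

def build_R_S(X, C, W, A, B):
    """Build R and S of Theorem 1 for the SSM (1).

    X[i], C[i] are the matrices of observation i+1; W[i], A[i] are
    W_i, A_i (so W[0] = W_0, A[0] = A_0).  All lists have length N."""
    R_rows, S_rows = [], []
    phi = A[0] @ B              # accumulates A_{i-1} ... A_0 B
    psi = np.array(W[0])        # accumulates W_{i-1} + A_{i-1} W_{i-2} + ...
    for i in range(len(X)):
        R_rows.append(C[i] @ phi)            # R_i = C_i A_{i-1} ... A_0 B
        S_rows.append(X[i] + C[i] @ psi)     # S_i = X_i + C_i (W_{i-1} + ...)
        if i + 1 < len(X):
            phi = A[i + 1] @ phi
            psi = W[i + 1] + A[i + 1] @ psi
    return np.vstack(R_rows), np.vstack(S_rows)

# Example 1 (random walk disturbance): A_k = 1, W_k = 0, C_k = 1, B = 1
N = 4
y = [np.array([[float(k + 1)]]) for k in range(N)]   # y_k^T as 1x1 blocks
R, S = build_R_S(X=y, C=[np.eye(1)] * N, W=[np.zeros((1, 1))] * N,
                 A=[np.eye(1)] * N, B=np.eye(1))
print(R.ravel())   # [1. 1. 1. 1.]  -- R = (1, ..., 1)^T
print(S.ravel())   # [1. 2. 3. 4.]  -- S_k = y_k^T
```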
§3 Fixed initial state
If δ is fixed (C = 0), then δ = c and the representation of Theorem 1, v = Rδ + Sβ + e, constitutes a regression model where the distribution of e is known. If we define X = (R, S) and γ = (δ^T, β^T)^T, then the log-likelihood of this model, based on v, is (throughout the paper all log-likelihoods will be defined up to an additive constant)
λ(v) = −½{M ln(σ²) + ln|Σ| + (v − Xγ)^T Σ^{−1}(v − Xγ)/σ²},
where Var(v) = σ²Σ and M denotes the number of components in v, the vector of stacked observations. The maximum likelihood estimator of σ² is
σ̂² = (v − Xγ)^T Σ^{−1}(v − Xγ)/M. (5)
Substituting σ̂² back in λ(v) yields the σ²-maximized log-likelihood: l(v) = −½{M ln(σ̂²) + ln|Σ|}. It turns out that we can evaluate l(v) efficiently using the KF.

Definition 2. The Kalman Filter (KF) is the set of recursions
e_k = v_k − X_k β − C_k x̂_{k,k−1},
D_k = C_k P_{k,k−1} C_k^T + Z_k Z_k^T,
G_k = (A_k P_{k,k−1} C_k^T + H_k Z_k^T) D_k^{−1}, (6)
x̂_{k+1,k} = W_k β + A_k x̂_{k,k−1} + G_k e_k,
P_{k+1,k} = (A_k − G_k C_k) P_{k,k−1} A_k^T + (H_k − G_k Z_k) H_k^T,
with starting conditions x̂_{1,0} = W_0 β + A_0 Bδ and P_{1,0} = H_0 H_0^T.

Here x̂_{k,k−1} is the predictor of x_k using (v_1^T, ···, v_{k−1}^T)^T and Var(x̂_{k,k−1} − x_k) = P_{k,k−1}. The e_k are the errors of predicting v_k using (v_1^T, ···, v_{k−1}^T)^T. They constitute an orthogonal sequence with E{e_k} = 0 and Var(e_k) = D_k, as given in (6). Note that we have supposed σ² = 1 in the equations (6) because we will estimate σ² using (5). It can be shown (see, for example, [13]) that if e = (e_1^T, ···, e_N^T)^T, then there exists a lower triangular matrix K with ones in the main diagonal such that e = K(v − Xγ) and KΣK^T = D = diag(D_1, D_2, ···, D_N). Therefore, Σ^{−1} = K^T D^{−1} K, σ̂² = e^T D^{−1} e/M, and ln|Σ| = ln|D_1| + ln|D_2| + ··· + ln|D_N|. In the case of scalar v_k, the D_k are also scalar and we can obtain a "square root" of Σ^{−1} by putting Σ^{−1/2} = D^{−1/2} K. Then, we can use a vector
of standardized residuals ẽ = D^{−1/2} e such that σ̂² = ẽ^T ẽ/M, and maximizing l(v) becomes equivalent to minimizing the non-linear sum of squares
S(v, γ) = (∏_{k=1}^N |D_k|)^{1/M} ∑_{k=1}^N ẽ_k².
For vectorial v_k, suppose that in each step of the KF, in addition to D_k, we also obtain, by means of its Cholesky decomposition, a "square root" D_k^{1/2} of D_k, k = 1, ···, N. Then, the matrix D^{1/2} = diag(D_1^{1/2}, D_2^{1/2}, ···, D_N^{1/2}) is a "square root" of D and we can proceed as in the scalar case to evaluate S(v, γ).
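As an illustration of how (5) and (6) combine, the following sketch (ours; numpy assumed, scalar observations, all names invented) runs the recursions and returns the σ²-maximized log-likelihood l(v).

```python
import numpy as np

def kf_loglik(v, Xb, C, Z, W, A, H, beta, x10, P10):
    """KF (6) with scalar v_k; returns the sigma^2-maximized l(v).

    v is a list of scalars; Xb, C, Z, W, A, H are lists of matrices
    (C[k], Z[k], Xb[k] have one row).  sigma^2 = 1 inside, as in (6)."""
    x, P = x10, P10
    ssq, logdet = 0.0, 0.0
    for k in range(len(v)):
        e = v[k] - Xb[k] @ beta - C[k] @ x           # innovation e_k
        D = C[k] @ P @ C[k].T + Z[k] @ Z[k].T        # D_k (1 x 1)
        G = (A[k] @ P @ C[k].T + H[k] @ Z[k].T) / D  # gain G_k
        x = W[k] @ beta + A[k] @ x + G @ e
        P = (A[k] - G @ C[k]) @ P @ A[k].T + (H[k] - G @ Z[k]) @ H[k].T
        ssq += float(e * e / D)                      # e_k^T D_k^{-1} e_k
        logdet += float(np.log(D))
    M = len(v)
    return -0.5 * (M * np.log(ssq / M) + logdet)     # sigma^2-hat = ssq / M
```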
£>i = l / ( l - p 2 ) ,
Dk = l,
k>l,
Xfc+i,k = pxfc,fc-i + P&k ,
lFli0 = l/(l-p2),
Pfc+i,fc = l,
k>0.
The vector of residuals is e = K{v — S0), where
K =
1 -p 0
0 1
0
•• •
0
•■ •
-p
1
•• •
0
0
0
0 0 0
0 0 0
-p
1
and the vector of standardized residuals is e = D 1 / 2 e , with §1 = (vi—yi"/3) \ / l — p2 and efc = (v^ — yJ/3) — p(vk-i — Yk-i@), k > 1. The nonlinear sum of squares is 5(v,7) = ( l / y r ? )
, /
"e
T
e(l/v^)'/
f l
'.
We now return to the general discussion of the term £ ^ , 7 ) . We can concen trate 7 out of S(v, 7) if we replace 7 in S(v, 7) by its maximum likelihood estimator 7, which is the generalized least squares (GLS) estimator of the model v = Xj + e.
(7)
We next show how to obtain 7 by means of the EKF. From what we have just seen, it is clear that the KF can be seen as an algorithm that, applied to a vector v_ of the same dimension as v, yields Kv_ and D. The algorithm can
44
V. Gomez and A. Maravall
be trivially extended to compute also D 1//2 Therefore, if we apply this extended algorithm in model (7) to the data v and to the columns of the X matrix, we = a2IM, and we obtain D~xl2Kw V == D~1?2KX%+ D'^Kg,Ke where Var(D-l'2Ke) Ke) = have transformed a GLS regression model (7) into an ordinary least squares (OLS) one. The estimator 7 can now be efficiently and accurately obtained using the QR algorithm. Supposing X is of full column rank, if p is the number of components in 7, this last algorithm premultiplies both D~1^2Kv v
\llM
/ N
\ l/M
/
\fc=i
/
S(v, 7 ) = fc=i
where OJ2 consists of the last M — p elements of w. Definition 3. The Extended Kalman Filter (EKF) is the KF (6) with the equations for efc and itk+i,k, respectively, replaced by Ek = (vfc, 0, Xk) - CfcXfc.fc-i,
Xfc+1,fe = (0,0, -Wi,) + Ak±k,k-i
with the starting condition X j o = (0, — A0B, — Wo). Also, Dk' with Dk.
+ GkEk ,
is computed along
The columns of the matrix Xfc^-i contain the state estimates, and those of Ek the prediction errors, corresponding to the data and to the columns of the X matrix, respectively. The EKF has been suggested in [15] and [19]; it has also been generalized to the case of a rank deficient X matrix in [6], Example 2. (Continued) Applying the EKF with the starting condition Xi,o = (0,0), we get Ek = (v f c ,yJ)-Xfc,fc_i,
Ek = D^1/2Ek,
Xk+i,k = P~X-k,k-i + pEk .
This implies Ek = (vfc - pvk-i,yk - pyj-i), k > 1, and £1 = ( v i , y 7 ) . The GLS model v = S/3 + e has been transformed into the OLS model
v2 - pvi
N -PV/V_I_
.
yiVi-p2 yj - pyj
/? + !,
e~N(0,a2I).
.yJi-pyJi-i
Consider next predicting the state x.k using {vj,vj, • • •, v J _ x ) T . This is equiv T 1 alent to first predicting Xfc using (T_ , VJ~, v j , ■ • •, v ^ j ) ^ , and then predicting this
45
Initializing the Kalman Filter
predictor using (v^, vj, ■ ■ ■, v J _ t ) T . The first mentioned predictor is Xfc,fc-i, as given in (6). It is easy to check that X f c ^_i(l, - 7 T ) T , where X/t,fc_i is given by the EKF, verifies the same recursion and starting condition as x ^ / t - i , and hence Xfc,fc-i = Xfc, f c _i(l,-7 T ) T . Thus, *k,k-i = X f c ,fc_i(l,-7 T ) T is the predictor we are looking for. Its mean squared error (Mse) is Mse(xfc,fc_1) = Var(xk - *k,k-i + *k,k-i ~ xfc,fc-i = Var(xk - Xfc,jt_i) + Var(ik,k-i 2
= a Pk,k-i
+ Var (±k,k-i(0,
= <72Pk,k-i +
- x/fc,fc-i)
(7 - T _ ) T ) T )
Mlht-iMseiDMlllk-i,
where X.('y)klk-i is the submatrix of X.k,k-i formed by all its columns except the l 1 first, and Mse(7) = a2 (U1U)~IIf Vfc is the predictor of Vfc using (vj, v j , ■ ■ ■, T v J _ j ) , then it can be shown analogously that v^ — v^ = Ek{l, —7 ) T , Mse(vjt) = a2Dk + E(j)kMse(j)E(2)J, where E(y)k is the submatrix of £* formed by all its columns except the first. The DKF of de Jong can also be used for likelihood evaluation when C = 0, although, as we will see, it has other uses as well. Definition 4. The j . l i t Diffuse Kalman Filter (DKF) is the EKF without the com „and with the added recursion Qk+i — Qk + EkD^Ek, putation of D,!/2 'k Ek, where Qi=0. Given that (v - Xj}TT,-l{-v ( v - Xj) = q - s T 5 _ 1 s , where q = v T E _ 1 v , s = X T,~1v, v, ;and S = XTY,~XX,x,\the Qk matrix accumulates the partial squares and cross products and T
(8)
QN+\
Therefore, the DKF allows us to evaluate the (a2,7)-maximized by - i JMln((g - srS~1a)/M)
log-likelihood, given
+ ]T\n\Dk\ 1
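In terms of the EKF sketch given earlier, Definition 4 replaces the standardized rows by one accumulator. The following fragment (ours) shows the added computation and the resulting likelihood evaluation; the sign of s coming out of the EKF columns is immaterial because only q − s^T S^{−1} s is used.

```python
import numpy as np

def dkf_loglik(E_rows, D_list):
    """(sigma^2, gamma)-maximized log-likelihood from the DKF accumulator (8).

    E_rows are the raw innovations Ebar_k of the EKF (one row each);
    D_list the scalars D_k (names ours)."""
    Q = sum(np.outer(E, E) / D for E, D in zip(E_rows, D_list))
    q, s, S = Q[0, 0], Q[1:, 0], Q[1:, 1:]          # Q_{N+1} = [q s^T; s S]
    M = len(D_list)
    return -0.5 * (M * np.log((q - s @ np.linalg.solve(S, s)) / M)
                   + float(np.sum(np.log(D_list))))
```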
Example 2. (Continued) The DKF gives, besides Ē_k and X̄_{k+1,k}, computed as in the EKF, the matrices Q_k. In this case, Q_{N+1} = [ q s^T; s S ] with
q = (1 − ρ²)v_1² + ∑_{k=2}^N (v_k − ρv_{k−1})²,
s = (1 − ρ²)v_1 y_1 + ∑_{k=2}^N (v_k − ρv_{k−1})(y_k − ρy_{k−1}),
S = (1 − ρ²)y_1 y_1^T + ∑_{k=2}^N (y_k − ρy_{k−1})(y_k − ρy_{k−1})^T.
Finally in this section, we remark that the estimator γ̂ can be obtained by solving the normal equations of the regression, Sγ̂ = s. However, solving the normal equations this way can lead to numerical difficulties, because what we are doing is basically squaring a number and then taking its square root. It is numerically more efficient to use a device such as the QR algorithm or the singular value decomposition, once the EKF has been applied. Another alternative, but computationally more expensive, is to use a square root filter version of the DKF.

§4 Initial state with an unspecified distribution: no regression parameters

In this Section, we suppose that δ in x_0 = Bδ has an unspecified distribution, that is δ ~ N(c, σ²C) with C nonsingular. We also suppose that there are no regression parameters and, therefore, W_k = 0 and X_k = 0. Then, Theorem 1 implies
v = Rδ + e.
(9)
Ansley and Kohn [2], hereafter AK, define the likelihood of (9) by means of a transformation of the data that eliminates dependence on initial conditions. Let J be a matrix with | J\ = 1 such that JR has exactly rank(i?) rows different from zero. Such a matrix always exists. Let Ji consist of those rank(.ft) rows of J corresponding to the nonzero rows of JR and let J2 consist of the other rows of J so that JiR = 0. AK define the likelihood of (9) as the density of J2V. We will show later that, under an extra assumption, this definition does not depend on the matrix J. To evaluate the likelihood, however, and merely for algorithmic purposes, given that the transformation usually destroys the covariance structure of the data, they use an equivalent definition of the likelihood and develop what they call "modified Kalman Filter" and ''modified Fixed Point Smoother'' algorithms. The modified Kalman Filter is of considerable complexity, difficult to program and is less computationally efficient than the procedure in [6], when applicable, or the DKF. Also, it does not explicitly handle fixed effects and requires specialized assumptions regarding the SSM (see [14]). Another approach to defining the likelihood of (9) is that of de Jong [13], where 6_ is considered diffuse by letting C —► 00. In order to take this limit we need the following theorem. Theorem 2. Let 6_ ~ N(c,a2C) with C nonsingular. Then, the log-likelihood o / v is A(v) = - i { l n | C | + ln|ff2E| + lnlC" 1 + RTE~1R\ + {(6 - c)TC-\i-
c) + (v - R6)T-E-l(v
- R&)}l°2}
wiere I = ( C - 1 + RTY,-lR)-l{C-lc
+
RTE-1v)
.
Initializing the Kalman Filter
47
and 6. coincides with the conditional expectation E{S\v}. 2
l
Also,
1
Mse(|) = Var{6\v) = a {C~ + i ^ E " / ? ) " 1 Proof: The density p(v) verifies p(8\v)p(v) = p(v\6)p(6), where the vertical bar denotes conditional distribution. The maximum likelihood estimator <5 of 6_ on the left hand side of this equation must be equal to the one on the right hand side. Given that the equality between densities implies (£ - £{<5|v}) T fi-i (6 - E{6\v}) + (v - Rc)TQ;l(v T
1
T
- Re)
1
={6 - c ) C - ( 5 - c) + (v - i ? ^ ) E - ( v - R6), where Cl^\v and Qv are the covariance matrices, divided by a2, of p(6|v) and p(v), respectively, the left hand side is minimized for <5 = E{6]v}. To minimize the right hand side, consider the regression model (c T , v T ) T = (/, RT)T6
+ v, v ~ N(0, diag(C, E)).
Then 5 is as asserted and Var(6\v) = Var{6) = a2{C~l +
RTT,~1R)-1
Theorem 3. With the notation and assumption of Theorem 2, if RTT,~lR singular, then, letting C —► oo, we have A ( v ) + i l n | C | — -^{ln|(7 2 E| + l n l ^ E - 1 ^ + (v Jj — | = (RTE-1R)-1RTE-lv,
is non-
fl|)TE"1(v
- R~6)/a2} ,
Mse(l) -* Mse{6) = o\
ai{RrT,-lR)-1
Proof: It is an immediate consequence of Theorem 2. It is shown in [13] that A(v) + (l/2)ln|C| tends to a proper log-likelihood, called the diffuse log-likelihood. By Theorem 3, in order to compute it, all we have to do is to consider 6_ fixed in (9) and apply the methodology of Section 3. If the EKF is used, then, with the notation of Section 3, the results of Theorem 3 can be rewritten A(v) + | l n | C | 1-^6
2 /2 2 ~\ Mln(<7 ) + 2 | JT \n\Dl \ + \n\U\ I + wJw2/<7
= ET~Vi,
Mse(l) -» Mse(<5) = a2(UTU)~l
,
whereas if the DKF is used, then, with the notation of (8), we obtain N
A(v)+iln|C7|->-i
M\n{a2) + Y, Inpfcl + M 5 | + (q fc=i
6 = S'h,
Use(l) -> Mse(6) = a 2 , ? " 1
sTS~ls)/a2
48
V. Gomez and A. Maravall
If S in (8) is singular, de Jong leaves the diffuse log-likelihood undefined. In order to define the limiting expressions of Theorem 3 when S is singular, we have to consider model (9) with an R matrix that is not of full column rank. Let K be a selector matrix formed by zeros and ones such that KSKT has a rank equal to rank(R) and replace model (9) by v = RKT6l+e,
(10)
where £ t ~ N(c, cr2C), with C nonsingular and 6_x is the vector formed by choosing those components in 8 corresponding to the selected columns RK . This amounts to making the assumption that the other components in 6 cannot be estimated from the data without further information and are assigned value zero with probability one. The next theorem generalizes the results of Theorem 2 to the case of a possibly singular S matrix. Theorem 4. Suppose model (10) with the convention that if R is of full column rank, then matrix K is the identity matrix and 6i = 6. Then, with the notation and assumptions of Theorem 3, letting C —► oo, we have X(v)+hn\C\
- -^{ln|(7 2 £| + \n\KRTZ-1RKT\ 1^6
= (RTT,-1R)-RT'£-\,
+ (v - i ? 5 ) T E - 1 ( v - R8)/a2} , Mse(l) — Mse(<5) = a2(RrE~1R)'
,
where ^E^R)' = KT(K(RTZ-1R)KT)-1K and I and I are interpreted as the particular maximizers obtained by making zero the elements not in 6_x and 6_lt respectively. Proof: The only thing that needs to be proved is that \KRT'E~1RKT\ depend on K. This can be seen in [18, page 527].
does not
The next theorem shows the relationship between the likelihood of AK and the diffuse likelihood of de Jong. When S is singular, we take as diffuse log-likelihood the one given by Theorem 4. Theorem 5. Let J be a matrix with \ J\ = 1 like those used by AK to define their likelihood, and let J\ and J2 be the corresponding submatrices such that J\R ^ 0 and J2R = 0. Ifp(v) is the density o/v when C is nonsingular, as given by Theorem 2, and p{Jiv)v) is the AK likelihood, then, letting C —► 00, we have
W2C\^p(V)^{l[j/(2^AP(J2V), where \\j is the product of the nonzero eigenvalues of the matrix RTJ^JiRJiRa and d is the number of columns of R, rank(R)< d.
49
Initializing the Kalman Filter
Proof: Let J be as specified in the theorem. Then, p(v) = P{Jv) because \J\ = 1. Permuting the rows of JR if necessary, we can always suppose that JiR are the first rows of JR. This amounts to premultiplying JR by a matrix P obtained from the unit matrix by performing the same permutations. Given that P is orthogonal, we can take PJ instead of J. Let K be a selector r x d matrix, where r = rank(.R), and consider model (10). If R is of full column rank, then K = Id and r = d. That the determinant \KTRT jj J\RKT\ is equal to the product of the nonzero eigenvalues of RT jj J\R can be seen, for example, in [18, page 527]. Let J\RKT'
+ \n\MTQnM\
+ (J2v)rQ^l(J2v)/a2}
.
Ansley and Kohn [2], make the following assumption. Assumption A. Matrix R in (9) and (10) does not depend on the model parameters. This assumption holds in many practical situations, including the examples of Section 2. Corollary 1. If Assumption A holds, then the AK likelihood does not depend on the matrix J. Proof: It is an immediate consequence of Theorem 5. Even if Assumption A holds, Theorem 5 shows that the diffuse log-likelihood and the AK log-likelihood, when maximized with respect to a2, do not give the same results. The difference lies in tthe term Mln(
+ Rj^Rk)-1,
Msetxfc.fc.!) = (T2Pk,k-i + v* - vfc = Ek(l,
-6Tk)T,
xk,k-i
= X fc , fc _i(l, - , £ ) T ,
X(6)k,k-iMse(lk)X(6)lk_u Mse(vfc) = o2Dk +
E(6)kMse(6k)E(6)l,
50
V. Gomez and A. Maravall
where Rk is the submatrix formed by the first k rows ofR and Efc =
o--2Var((e[,
T
•■•>*I) ). ■
•
•
^
T
)
T
)
.
Proof: The first two equalities are a consequence of Theorem 2. The other expres sions can be proved as the corresponding ones for the case C = 0 (Section 3). Theorem 7. With the notation and assumption of Theorems 6 and 4, if the rows of X(<5)fctfc-i and E{S}k are in the space generated by the rows then, letting C —► oo, h-h**
(*fe Sfe lflfc)-ig"Efc l v f c l
Xfc.jt-i —» X f c | f c _ i ( l , - £ j t )
vjt -» C**fe,*-i(l, - a J )
,
,
Mse(*fc,fc-l) - c2Pk,k-i T
Maedfc) - Mse(^) = a2(Rj^Rk)-
,
+ X(6)k,k-iMse(6k)X(£)lk-i 2
Mse(vfc) - . o- Z)fc +
.
E(8)kMse(6k)E(6)l
Proof: The first two limits are consequences of Theorems 2 and 4. The other expressions are a direct consequence of Theorem 6. By Theorem 7, in order to get the desired predictors, we must consider the regression model (9) with 6_fixedand apply the GLS theory. We can use the results of Section 3 and, in order to get an efficient algorithm, we can apply the EKF or the DKF for likelihood evaluation or prediction. Note that the difficulties that may arise stem from the fact that the matrix R may be rank deficient. In this last case, we have to use generalized inverses throughout the process and neither all observations will be predictable, nor will all states be estimatable. The next theorem states that the predictors obtained with the modified Kalman Filter coincide with those obtained by means of the EKF or the DKF. Theorem 8. Let Assumption A hold. Then, the predictors of Xfc and v^ obtained with the modified Kalman Filter and those obtained with the EKF or the DKF coincide. If the same estimator of a2 is used in both procedures, then the Mse errors also coincide. Proof: Theorem 5.2 in [2] states that the AK predictors coincide with the diffuse predictors and the statement about the Mse follows trivially. We have seen that, in order to evaluate the AK log-likelihood, we can use the modified Kalman Filter of AK, although it is not the best procedure, or we may use the efficient EKF or DKF to evaluate the diffuse log-likelihood, which, by Theorem 5, differs from the AK log-likelihood only in a constant. This constant, under Assumption A, does not depend on model parameters. The EKF or the DKF should be applied to model (9) considering 6 fixed (C = 0). It would be nice to employ the EKF or the DKF only for an initial stretch of the data, as short as possible, to construct an estimator of 6. and, from then on, use the KF. When this
51
Initializing the Kalman Filter
occurs, one speaks of a collapse of the EKF or the DKF to the KF. Let rank(i?) = r and suppose that the first r rows of R are linearly independent. Let Ri be the submatrix formed by the first r rows and let Ru consist of the other rows of R. Partition v = {vJ,vJr)Ti) and g = (eJ,eJj)T conforming to R = (R.J,Rjj)T.) T - Then, t we can write v 7 = Ri6 + ej vi[ = Ru6 + eu
(11a) (116)
The next theorem shows how to implement the collapsing of the EKF or DKF to the KF. Theorem 9. Under Assumption A, let J with \J\ = 1 be a matrix like those used by AK to define their likelihood, with corresponding submatrices Ji and J% such that j J\R / 0 and J2R = 0, and let p(v//|v/, 5/) be t i e density ofvn -- £ { v /E{VH\VI,6J}, where E{vn\vj,6j} >/} is the conditional expectation ofvn given vj in model (11a) and (116), considering 6_ fixed (C = 0), and 6, replaced by its maximum likelihood estimator 8_j in model (11a). Then, p(^2v)
=p(vii\vI,~6I),
where p{Jiv) is the density of J 2 v. Proof: For simplicity, consider that Ri is of full column rank. If not, we would use generalized inverses, but the proof would not be affected. From model (11a) and (116), we have, considering 6. fixed, £ { v / / | v / } = Rn6 + E 2 iE]- 1 1 (v / - RiS) , where E21 = Cov(e_jj,er) and E n = Var{e_j). Then, E{vji\vj,6_j] = RJJRJ vj and VJI — E{VJJ\VJ,6_J} = v// — RnRJ1vj. Define the matrix J = (Jj T , Jj)T with Ji = (1,0) and J2 = {—R[iRJl,I). Then, J is a matrix of the type used by AK to define their likelihood and v// — £ { v / / | v / , 6_j) = J 2 v. This completes the proof of the theorem. By Theorem 9, to evaluate the log-likelihood of v/j - E{vu\vi, 6_,} = J2v, we can proceed as follows. First, use the EKF or the DKF in model (11a) to obtain the maximum likelihood estimator (mle) 6_j and initial conditions for the KF E{xs}
= X3,3_1(1,-1/T)T,
Var(xs)
=a
2
^., +X ^ ^ j M s e ^ X ^ ) ^ ,
where vs is the first observation in (116). Then, proceed with the KF, applied to the second stretch of the data v / / , to obtain the log-likelihood of vn - E{vrl\v[,6_r} = J2V as in Section 3, but with no regression parameters. We have used the initial
h:
52
V. Gomez and A. Maravall
stretch v / of the data to construct the mle 1/ and the initial conditions for the KF. After that, the effect of 8 has been absorbed into the estimator of the state vector and that is the reason why we can collapse the EKF or DKF to the KF. We now give another interpretation to the result of Theorem 9. With the notation of Theorem 2, we can write A(v) + ln|C|/2 = {A(v7) + ln|C|/2} + A(v„ | v / ) ,
(12)
where A(v//|v/) is the conditional log-likelihood of v// given v/. By Theorem 3, letting C —» oo, the term in curly brackets tends to -{Mjln(er 2 ) + \RjRi\}/2, where Mi is the number of components in v/, whereas A(v//|v/) converges to the log-likelihood of J2V, which, by Theorem 9, is equal to the log-likelihood of •VII — E{vn\vi, 6.1}. Thus, the diffuse log-likelihood of v is the sum of the diffuse log-likelihood of v/ and the log-likelihood of J2V. Note that the first term does not contribute to either the determinant or the sum of squares of the diffuse loglikelihood. Bell and Hillmer [3] use a similar idea to construct initial conditions for the KF. Instead of employing the KF for the initial stretch of the data, they use the transformation approach of AK to construct the mle 6/ and the initial conditions for the KF directly. Whether this approach is more advantageous than using the KF is something that depends on the pecularities of the problem at hand. If it is easier to obtain the mle and the initial conditions directly, then it can be used. However, the KF approach to construct the mle and the initial conditions has the advantage that it is easy to implement, does not depend on ad hoc procedures and it imposes very little computational and/or programming burden. The case we have been considering, where the submatrix Ri is formed by the first r rows of R is important because it happens often in practice. Examples of this are ARIMA models and ARIMA component models. If in model (11a) and (life) we have v/ = 6_, then Theorem 9 implies p(J2v) = p(v//|v/). Also, if J is a matrix like those used by AK to define their likelihood and .7 is of the form J = {Jj,J2)T with J\ = {1,0), then v/ is independent of J2 V - This is the conditional likelihood approach used in [6] in the context of regression models with ARIMA disturbances and generalized to the case when there are missing observations. For ARIMA (p, d, q) models, the situation simplifies still further because it is not necessary to employ the EKF or the DKF for the initial stretch of the data v/ = 8_ to obtain initial conditions for the KF. The SSM can be redefined by simply translating forward the initial conditions d units in time, where d is the degree of the differencing operator. Suppose there are missing observations in v and that in (11a) the vector v/ contains a subvector -VIM of missing observations. Let v / o b e the subvector of v/ formed by the nonmissing observations and let v / / be the subvector of v contain ing the rest of the nonmissing observations. Then, by analogy with the result of Theorem 9, we can still consider v/j - E{vu\vi,v /.£/} 6_j}, treat VIM as a vector of fixed
53
Initializing the Kalman Filter parameters, and define the likelihood of (vj0, vJj)T VJJ = RIIUMVIM
as that of the regression model
+ ujj,
(13)
where r/.. = v// — RnUovio,u_n = tn — RIIRJ1*F §.I,l£i and Uo and UM are the 11 submatrices of R7 RJ formed by the columns corresponding to v / o and VIM, respec tively. Here we have supposed that Rj is of full column rank. If not, we would use generalized inverses, but the main result would not be affected. Note that the vector V / M is considered as a vector of fixed parameters that have to be estimated along with the other parameters of the model. The next theorem shows that this definition of the likelihood is equivalent to the AK definition. Theorem 10. Under Assumption A, the (a2, VIM)-maximizedzed log-likelihood cor responding to (13) coincides, up to a constant, with the a2-maximized AK loglikelihood. Proof: Let Var(uj_n) =
= L~1RHUM-VIM
~ Rn
+ L~1LJU
.
The QR algorithm, applied to the L~1RHUM matrix, yields an upper triangular matrix S with nonzero elements in the main diagonal such that QTL~XRHUM = ( 5 T , 0 T ) T , where Q is an orthogonal matrix. Then, we can write QTL~%I
=
VIM + Q1L-lu_,j
.
The matrix L~l will not have, in general, unit determinant. If we multiply L~l by a = \L\l/M", where M// is the number of components in v / / , then aL~l has unit determinant. Let K = (Kj,Kj)T with KY = (1,0) and K2 = (-RnU0,I) and let P = (P 1 T ,P 2 T ) T with Pi = ( / , 0 ) a n d P 2 = (0,aQTL~1). Partition Q T = (QU Q2)T conforming to QJL-1RUUM = S and QjL-1RnUM = 0 . If J = PK, then J has unit determinant and J(v / T o,v / T / ) T = =(Rjo, (aSRiM)T,0)T6
(vJ0,(aQTL-1RlI)T)T + (ej0, (e,M + aQjL~lu,j)T,
(aQjL~ V / / ) T ) T ,
where we have partitioned Ri and g/ conforming to the partition of v/ into v / o and VIM- Given that (Rjo, (C<SRIM)T)T has rank equal to that of Ri, the matrix J is of the AK type. Therefore, the AK likelihood is the density of aQlL~lr}lH, and the AK log-likelihood, maximized with respect to a2, is -\{(M„
-rs)Ha2)
+ \n\L\2{M,,-Ts)/Mn}
,
54
V. Gomez and A. Maravall
where
a2 =RJI(L-l)TQ2QjL-lriII/(MI[
- rs)
and rs = rank(S). The log-likelihood of (13), maximized with respect to a2 and vIM is - { M / / l n ( a 2 ) + ln|L| 2 }/2, where
a2 =
TL]I(L-1)TQ2QlL-1Tln/M,i
This completes the proof of the theorem. Theorem 10 generalizes the result obtained in [6] for ARIMA models with missing data. This approach is useful when the matrix Ri corresponding to the first observations v/ (including the missing ones) is of full column rank. We now suppose that in model (9) the first r rows of R do not, in general, constitute a submatrix of R of rank r. Let Ri be the first submatrix of R formed adjoining consecutive rows to the first row, such that it has full column rank and let Ru consist of the other rows of R. Partition v = ( v J , v ] / ) T and e = (gj,§Jj) T T conforming to R = (Rj,Rjj)?7,) In the rest of the section, whenever we refer to models (11a) and (lib), we will refer to this partition. Consider the decomposition given by (12). Then, letting C —► oo as before, the term in curly brackets tends to
-i{M 7 ln(a 2 ) + | £ n | + [Rj^Ri]
+ (v, - Rih)T^n^i
~ Rih)/<>2} ,
where Var^j) = CT2EH and 6_r = (RT^^Ri)~1Rj'L^vi. The conditional loglikelihood A(v//|v/) converges to the log-likelihood of J2V, where
J 2 = {-R,,STXT, - £ 2 1 Er/(/ -
RIS^TJ),
I),
and Ti = RjT,^ TO see this, define J = {Jj, J2T)r A(v) = A( Jv) because J has unit determinant and
Eai =
Cov(en,ej),
with Jx = (1,0).
Then,
A(v) + i l n | C | = {A(v7) + lln\C\} + A ( J 3 v | v , ) . Note that now J is not a matrix of the AK type. Theorem 11. Let J be the matrix we have just defined , with the corresponding submatrices J\ and J2, and let p ( v / / | v / , | 7 ) be the density ofxu — £ { v / / | v / , S 7 } , where E{VH\VI,6.J},Sj) is the conditional expectation ofvn given v ; in model (11a) and (116), considering 6 fixed (C = 0) and 6 replaced by its maximum likelihood estimator 6_j in model (11a). Then, p(J2v)
=p(vn\vi,6,),
55
Initializing the Kalman Filter where p(J2v) is the density of J2V. Proof: The proof is analogous to that of Theorem 9.
Thus, to evaluate the AK log-likelihood or the diffuse log-likelihood, we can still use the EKF or the DKF as before, until we have processed a stretch of observations such that the corresponding submatrix of R has full column rank, and then collapse to the KF. The likelihood is evaluated as the sum of two terms. One corresponding to the stretch v/ and the other corresponding to v / / . More specifically, the EKF or the DKF applied to model (11a) yields lEul.litfE^H/l
and
fl/£/)TEf11(v/
(v/ -
- R/&),
(14)
where 6_j = (R.J T,^ Ri)~l Rj E^v 1. These three terms will be needed for the computation of the likelihood because now there will be no cancellation of terms. With the notation of Section 3, if the EKF is used, the expressions in (14) are N,
ln|E n | = 2 £ l n | ^ / 2 | ,
\Rj^R,\
= \Uj\2
k=i
and (v 7 - i?/l / ) T E7 1 1 (v / - Rjij)
= uji2u.i,2 .
whereas, if the DKF is used, they are N,
ln|En| = £ > | Z ) f c | ,
\R]ZTiRi\ = \Si\
k=\
and (v/ - H/i / ) T E 1 - 1 1 (v/ - Rih)
=qi-
sjsyls,.
The initialization for the KF, to be used with the second stretch of the data v / / , is £ { x s } = X S ) S _ X (1, -'sJ)T
,
Var(x3) = a 2 P s , s _ ! + X(£). i ._iMse(£ / )X(£)J j _ 1
(
where, as before, v s is the first observation in (116). Once the run of the KF is completed, we have to add up the terms in (14) to the corresponding terms obtained with the KF, \Var(J2v)\ and (J 2 v) T (V r ar(J 2 v))- 1 (J 2 v). The fact that we don't know for how long we will have to use the EKF or the DKF before we make the transition to the KF may make collapsing unattractive. There is an alternative procedure to evaluate the AK log-likelihood or the diffuse log-likelihood that might be of interest in some cases. It consists essentially of reshuffling the observations in such a way that again the first r rows of R are linearly independent. An algorithm to achieve this is the following. Apply the EKF
56
V. Gomez and A. Maravall
or the DKF to model (9) and, at the same time, obtain the row echelon form of the R matrix. Each time a new observation Vfc is being incorporated, we check whether its corresponding row vector Rk is a linear combination of the rows already processed. If it is, we skip this observation as if it were missing (see [10]). Otherwise, we process the observation as part of the initial stretch of the data v/. Proceeding in this way, after some time we will have processed a stretch of the data v/ for which the corresponding submatrix Ri of R will be formed by a maximal set of linearly independent row vectors. Let v// consist of the other observations and let v s be the first observation that we skip as if it were missing. This will be the first observation of VJJ. Suppose the Fixed Point Smoother (FPS) corresponding to v3 is applied, along with the EKF or the DKF, to all the columns of Xfc+i^. Then, after processing v/, we can set up as initial conditions for the KF, to be applied to v / / , the following £{x,} = X,,,(l, - | J ) T ,
Var(x3) = a2P3j + X ^ / M s e ^ X ® Jj ,
where ~X.aj,P3j and X(£) 3] / are the matrices obtained with the FPS, and <5/ is the mle corresponding to v/. Note that the advantage of using only the KF for likelihood evaluation comes at the expense of an increase in the computations. Example 3. Consider the following ARIMA (1,1,0) model (1 +
°
'
1 - ^,*i,fc = Vfc,x2,fc = v fc+ i - a fc+ i and | f c l xfc = Axfc.j +Hak;
A, Hk = H, with at. Then, we can
vfc = Cx f c .
To initialize, we consider that (1 - L)vjt = Ufc is stationary and follows the model (1 + >i)ufc = afc. Then, xi
and we can choose AQ — I,,B Ho
vo +
1 1
0 1
1 "l,
= (1, l ) T , x 0 = B6,6 = v 0 and 1 1
0 1
J VT^T2
The first state is X! = B6 + H 0 ai. Model (9) specializes to R = (1,1,• • -, 1) T and ek = ui H 1- Ufc, k = 1, • • •, N. The AK likelihood can be obtained as the density
Initializing the Kalman Filter
57
of the differenced data. This is equivalent to multiply v by the matrix
.7 =
1 -1 0
0 1 -1
0
■■
0
■•
1
•■
0
0
0 0 0
0 0 0
-1 1
define J\ = (1,0, ■ • •, 0) and J 2 such that J = {Jj, Jj)T, and take as AK density the density of J 2 v. Note that JR = (1,0,-- -, 0) T and Je = (ui, 112, • • •, UN)1• The EKF or the DKF produces Dx = 1/(1 - 2), Dk = 1, k = 2, • ■ •, TV, ln|E| = l n | J £ J T | = ln|£>i| + • • • + \n\DN\ = l n ( l / ( l - 0 2 )), \n\Rr'E-1R\
= ln\RT JT(J-£JT)-1
T
JR\ = 0,
1
(v - i ? l ) E - ( v - R6) = (v 2 + 0 V j -
Vl)
6 = vx,
2
N
+ Y2^Vk +
In case the DKF is applied,
(1 - 022wKT - |
(1 - 0 2 )v 2 + (v a + >vi - V l ) 2 Q N+l
+ Efc=3t( V fc + >Vfc-l) - (Vfc_i + 0 V f c _ 2 ) ] 2
(l - 02)
(1 - 0 2 )v,
Model (11a) becomes the first equation of (9), vj = 6_ + ui, where v/ = v 1 : Rj = 1 and e_j = ui. Model (116) consists of the rest of the equations. Suppose we use the EKF or the DKF in (11a) to obtain 6_j and initial conditions for the KF, that we will apply later to model (116). Then, "X",
n
—
-M,o -
"0 0
-1 -1
•t
E>. — („.
'-'I
—
V" ll
1\
*■) !
P,nrifl -
1-0 1 Dl Gi = .<6+ ( l - c ^ ) 2 , ' D, ~ 1 - 02' (1 - 0)V, -0 X 2 ,i = Jl-0+0 2 )V1 0 ( 0 - 1 ) . .
P
2,l
1
I _ 02
=
1 1-0
1 .1-0
1-0 (1-0)2
1-0 (1-0)2.
Given that 6j = v i , we have Mse(£j) = 1/(1 - 0 2 ) and the initial conditions for the KF are E{*2] Var(x2)
1
[~h\
Vl Vl
= P2,i + X(£) 2 ,iMse(3 / )X T (4)2,i =
■—?
1
1-0
1-0
(1-0)2
V. Gomez and A. Maravall
58
Therefore, using the EKF or the DKF in (11a) to estimate 6 and to compute initial conditions for the KF yields the same starting values, but shifted ahead one period of time. This is an example where we can redefine the SSM, taking v / = 6 and translate the initial conditions forward one unit of time. This is true for all ARIMA (p,d,q) models (see [6]). §5 Initial state with an unspecified distribution: the general case In this Section we consider a more general SSM than that of Section 3. Besides making the assumption that 6 in xo = B6 has an unspecified distribution, 6_ ~ N(c,cr2C), with C nonsingular, we allow for regression parameters. That is, we consider /3 fixed but unknown. By Theorem 1, we have v = R6_ + S0_ + e. Defining X = (R, S) and 7 = (5 ,/3 T ) T we can write the model more concisely as v = X7 + e.
(15)
To define the AK likelihood, consider a matrix J of the type used by AK when there are no regression parameters and let J\ and J2 be the corresponding submatrices such that J\R ^ 0 and J2R = 0. Then, the AK likelihood is the density of J2(v — S0). In order to efficiently evaluate the likelihood and predict and interpolate unobserved v^'s, they use their modified KF and modified FPS, applying them also to the columns of the regression matrix, as outlined in Section 3. The reader is referred to [15] and [16] for details. For the reasons mentioned in Section 4, we consider the modified KF and modified FPS computationally less efficient and conceptually more complex than the EKF or the DKF. To compute the diffuse log-likelihood of (15) we have to consider that <5 is diffuse, C —* 00, and ft is fixed. De Jong does not consider explicitly this case, although it is a case that is often encountered in practice. Proceeding as in Theorems 2 and 3 of Section 4, replacing v by v — 5/3, and letting C —* 00, we have A(v) + i l n | C | -» -i{ln[cr 2 S| + l n l i ^ E " 1 ^ + (v-Sf3_-
i?5) T E- 1 (v -S0-
R'6)/a2} ,
! - > 1 = ( f l T E - 1 J R ) - 1 i ? T E - 1 ( v - S{3) . Minimizing this diffuse log-likelihood with respect to (3 yields an estimator J3 which minimizes (v - 5 ^ ) T P T E " 1 F ( v - S0), where P = I - R(RTT,~1R)-lRTlS-1 It can be shown that the estimators 6 and J3 obtained in this way can be obtained in a single stage as the GLS estimator 7 = ( | ,/? ) T of model (15). Thus, the EKF 2 2 or■the the DKF DKF can can be be used used toto compute compute the the (
Copyrighted Material
59
Initializing the Kalman Filter
where M is the number of components in v and a1 = ( l / M ) ( v - X 7 ) T E - 1 ( v — Xy). Under Assumption A of Section 4, the AK (a2.7)-maximized log-likelihood differs from the (er2,7)-maximized diffuse log-likelihood only in a constant. As in Section 4, it is possible to employ the EKF or the DKF for an initial stretch of the data to construct an estimator of 8. However, it will not be possible now to collapse to the KF because we will still have to estimate the P parameters. The most we can do is to collapse to a reduced dimension EKF or DKF. More specifically, let rank(il) = r and suppose that the first r rows of R are linearly independent. Let Ri be the submatrix formed by the first r rows and let Ru consistC of O I the l l l i other rows T — of R. PartitionI I vv = (vj, vj{)r,) T , S = (Sj,Sjj)T,-) T , and e = (ej, eJj)^T conforming to R = (Rj,R.JJ)T Then, we can write vI = RI8 + SIp + eI,
(16a)
vu = Rn6 + SIIp + eII.
(166)
Suppose that Rj has full column rank. If not, we would use generalized inverses instead of true inverses but the main result would not be affected. As in Section 4, we will apply first the EKF or the DKF to (16a) to obtain a GLS estimator 6_[ of <5. However it will not be possible now to absorb both <5 and P into the state estimator. Only 8_ will be absorbed. In this way, the EKF or the DKF will only be simplified, not collapsed to the KF, when we apply it to (166) in the second step of the procedure. The number of states of the EKF will be reduced by a number equal to the number of components in 6_. Let v 3 be the first observation in (166). We showed in Section 3 that, if 6 and j3 are known, then the estimator of the state x 3 using ( 7 T . v f r . v J i - - - . v J - i ) T X 3 | 3 _ ! = X . , . - ! (1 - 7
T
)
T
is
= X ( v ) , , , _ l - M&s,s-l8
- X(§)s,s-lP,
(17)
where X(v) S i S _i,X(5) S l 4 _i and X(/3) 3i3 _i are the columns of X 3 | 3 _i corresponding to v s , 6_ and j3, respectively. The GLS estimator 6[ of 6_ obtained from (16a) is Sj = V-lT{w! - Srp) , where V = RjE^R^T (17) yields
= RjXJ1
and Var{ei) =
x 3 , s _! = X(v) 3 , s _! - X ( £ ) . , . - i V - 1 T v / - (X(P)s,s-i -
X&s.s-iV-'TSrip
= X(v) s > s _i-X(£)* ? ! r -i/3, where x J | S _i is the estimator of x 3 using (P_ , v 7 , v j , - • •, v J _ j ) T and (X(v) 3 i ,_ 1 , X(/3) a , 3 _i) are the estimators, respectively, of the states corresponding to the data and the P parameters. Given that Mse(x 3|Ji _i) =Var(x3 =Var(xa
- x 3 , 3 _i + x 3 , s _i - X,^_i) - x SjS _i) + Var(x 3 | 3 _! - x , -_i) ,
60
V. Gomez and A. Maravall
we have Mse(x s , 3 _!) = a 2 P s , s _ ! + X ( S , , , - i M s e ( ^ ) X ( £ ) ^ , _ i - By Theorem 9 of Section 4, the EKF, to be applied to (166), can then be initialized with Var(x.3) = Mse(x Sia _!) and X„,<>-i = (X(v) S i S _i,X(^) s , s _ 1 ). If the DKF is to be employed, the initialization for the Q matrix would be
Qu
On
Q31
Q33
Q12 Q32
Q22 [Q21 Q23} 1
where Q, = (Qtj), i,j = 1,2,3. This can be seen considering that, after esti mating 6, the sum of squares is (v/ - Si0)TPTZJ1P{v/'P(v/ - S//3), with P = I -R,(RjT,J1Rj)-1RjT,J1 RJT,J If the first r rows of R do not constitute a submatrix U of R of rank r, we would proceed as in the last part of Section 4. Example 1. (Continued) Model (15) specializes to e_k = ai + ■ • • &k, k = 1, ■ • •, N, R = (1, • • ■, 1) T and S = (yi, • • ■ ,yjv) T - The AK likelihood is, as in Example 3, the density of the differenced data. If J, J\ and J2 are the matrices defined in Example 3, with J = (Jj T , Jj)J, then the AK likelihood is the density of J2(y-S§)
= (v 2 - v i - ( y 2 - y i ) T ^ , - - - , v j V - V J V _ I - (YN -
yN-i)T§)T
The EKF or DKF produces Dk = \,k = \,---,N,Ex = (vi, l.yj"), Ek = (v fc ,0, yl), k = 2,---,N, ln|E| = 0, l n ^ E " 1 fl| = 0,6 = Vi - yj~0,
^2(yk
yk-i)T
- yfc-i)(yt -
Lfc=2
Yliyk - yk-i)(vk
-vfc_i).
fc=2
and (v - X 7 ) T E - ! ( v - Xfj
= J2 [vib " v fc -i - (yfc - y f c - i ) T | ] ' fc=2
Model (16a) becomes the first equation of (15), vi = 6_ + yJ/3 + a i , where vj = V\,Rt = 1, Sj = yj and Ej = a j . Model (166) consists of the rest of the equations. Suppose we use the EKF or the DKF in (166) to obtain 6_j and then collapse to a reduced dimension EKF or DKF, to be applied later to model (166). Then, Xi, 0 = ( 0 , - 1 , 0 ) , Ei = ( v i , l , y 7 ) , Pi,o = l, G, = l , Di=l,
X 2 , 1 = ( v 1 , 0 , y 7 ) , P 2 ,! = l .
Clearly, 6j = vj — yj[3_, and, therefore, x 2 ,i = vj - yjp Mse(x 2 ,i) = 1.
= X(v) 2 ,! - X(£) a .l£
Initializing the Kalman Filter
61
The collapsed EXF or DKF can be initialized with X2,i = ( v i , y 7 ) and Var(-K2) = 1. Note that now the dimension is that of the original EXF or DKF minus one. In case that the DKF is applied, we have v
Q2=
f vj yivi yivi
v
i 1 yi yi
v
iyiT yT yiyj yiyj
,
and the matrix of partial squares and cross products to initialize the collapsed DKF
QQ22 == \\ vf vf vv ii yy nn __ rr VV ll ii
yy r r 11 ==
[o [o oo-
References 1. Anderson, B. and J. Moore, Optimal Filtering, Prentice Hall, Englewood Cliffs, N. J., 1979. 2. Ansley, C. F. and R. Kohn, Estimation, filtering and smoothing in state space models with incompletely specified initial conditions, Ann. Statist. 13 (1985), 1286-1316. 3. Bell, W. and S. C. Hillmer, Initializing the Kalman filter for nonstationary time series models, J. of Time Ser. Anal. 12 (1991), 283-300. 4. Box, G. E. P. and G. M. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, CA, 1976. 5. Burridge, P. and K. F. Wallis, Calculating the variance of seasonally adjusted series, J. of Amer. Statist. Assoc. 80 (1985), 541-552. 6. Gomez, V. and A. Maravall, Estimation, prediction and interpolation for nonstationary series with the Kalman filter, EUI Working Paper ECO 92/80 (under revision for J. of Amer. Statist. Assoc). 7. Harvey, A. C. and G. D. A. Phillips, Maximum likelihood estimation of regression models with autoregressive-moving average disturbances, Biometrika 66 (1979), 49-58. 8. Harvey, A. C. and R. G. Pierse, Estimating missing observations in economic time series, J. of Amer. Statist. Assoc. 79 (1984), 125-131. 9. Harvey, A. C , Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge, UK, 1989. 10. Jones, R., Maximum likelihood fitting of ARM A models to time series with missing observations, Technometrics 22 (1980), 389-395. 11. Jong, Piet de, The likelihood for the state space model, Biometrika 75 (1988), 165-169. 12. Jong, Piet de, Smoothing and interpolation with the state space model, J. of Amer. Statist. Assoc. 84 (1989), 408-409.
62
V. Gomez and A. Maravall
13. Jong, Piet de, The diffuse Kalman filter, Ann. Statist. 19 (1991), 1073-1083. 14. Jong, Piet de, Stable algorithms for the state space model, J. of Time Ser. Anal. 12 (1991), 143-156. 15. Kohn, R. and C. F. Ansley, Efficient estimation and prediction in time series regression models, Biometrika 72 (1985), 694-697. 16. Kohn, R. and C. F. Ansley, Estimation, prediction and interpolation for ARIMA models with missing data, Tech. Rept., Graduate School of Business, University of Chicago, IL, 1984. 17. Kohn, R. and C. F. Ansley, Estimation, prediction and interpolation for ARIMA models with missing data, J. of Amer. Statist. Assoc. 81 (1986), 751-761. 18. Rao, C , Linear Statistical Inference and its Applications, John Wiley & Sons, New York, 1973. 19. Wecker, W. and C. F. Ansley, The Signal extraction approach to nonlinear regression and spline smoothing, J. of Amer. Statist. Assoc. 78 (1983), 81-89. Victor Gomez Instituto Nacional de Estadistica Paseo de la Castellana 183 28046 Madrid, Spain Agustin Maravall European University Institute Badia Fiesolana, 1-50016 S. Domenico di Fiesole (FI), Italy [email protected]
Robust Adaptive Kalman Filtering
A. Reza Moghaddamjoo and R. Lynn Kirlin
A b s t r a c t . In this chapter we first survey several adaptive procedures under the assumption that the noise is Gaussian. Then several different approaches for adaptation to the unknown deterministic input are reviewed. Finally we present a method to adapt the Kalman filter to the changes in the input forcing functions and the noise statistics. The resulting procedure is stable in the sense that the duration of divergences caused by external disturbances are finite and short and, also, this procedure is robust with respect to impulsive noise (outliers). In this approach the input forcing functions are estimated via a running window curvefitting algorithm, which concurrently provides estimates of the measurement noise covariance matrix and the time instant of any significant change in the input forcing functions. In addition an independent technique for estimating the process noise covariance matrix is suggested which establishes a negative feedback in the overall adaptive Kalman filter. This technique is based on the residual characteristics of the standard optimum Kalman filter and a stochastic approximation method.
§1 I n t r o d u c t i o n In this chapter we address t h e problem of applying t h e K a l m a n filter in real t i m e s t a t e estimation. In general a linear system can be modeled as x/t+i = Akxk
+ Bkuk
+ £k
(1)
Here t h e subscript index is t h e t i m e sample, x/t is an (n x 1) system vector, Ak is an (n x n) system matrix, Ufc is a (p x 1) vector of t h e input forcing function, B^ is an (n x p) m a t r i x a n d £ is an (n x 1) vector of a zero mean white sequence. T h e discrete vector measurement v^ is described as Vfc = CfcXfc + 7^ , Approximate Kalman Filtering Guanrong Chen (ed.), pp. 65-85. Copyright ©1993 by World ScientiBc Publshing Co. Inc All rights of reproduction in any form reserved. ISBN 981-02-1359-X
(2) 65
66
A. Moghaddamjoo and R. Kirlin
where Ck is an (m x n) matrix, and rh is an (m x 1) measurement error vector assumed to be a zero mean white sequence uncorrelated with the £fc sequence. The covariance matrix for the £ and n vectors are defined by E
(3)
E{vkvJ} = RtSu ,
(4)
£{|fc^T}=0.
(5)
{§JJ)=Qk6kt,
In the above equations 6ki is the Kronecker delta function. The optimum Kalman filter update equations are Xfc,fc-i = Afc_ixfc_i +Bfc_iu f c _i ,
(6)
xjt = xfc:jt_i + Gfc(vfc - Ck-kk,k-l),
(7)
Pk,k-i = Ak-i Pk-i,k-i Pk,k = (I-
Al_Y + Qk-i,
GfcCfc)Pfc,fc-i,
(8) (9)
where Gk, the optimum Kalman gain at tk, is given by Gk = PKk-X
CfcT(CfcPfc,fc_i C j + Rk)-1
(10)
In the above equations x.klk-i is the a priori estimate of xjt, x/t is the a posteriori estimate of x^, Pk
=p(x fc |7,V fc )p(7|V fc ),
(11)
where Vk = [vo, Vj, • • ■, v^.] are measurements till time tk and Vj is the measurement at tj, and it is assumed that p(xfc|7, Vfc) is Gaussian with mean Xfc(7) and covariance
67
Robust Adaptive Kalman Filtering
Pfc(7). Xfc(7) and Pfc(7_) may be obtained from the Kalman filter for any particular 7. The calculation for p(7|VJt) is performed by P 7\Vk) = P 7 vfc, Vk-i) = T — p — F - f 7 — . , - , . . — r r = (12) JnP(vfcl7- Vfc_i)p(7|Vfc_i)d7 where fi is the set of all 7's. Usually, but not necessarily, p(vfc|7, Vk-i) is assumed to be Gaussian with mean CfcXjt.fc-i and covariance {CkPk,k-\C^ + Rk)- Equation (12) is solved recursively starting from p(j), the a priori distribution of 7. An optimum estimate of Xfc which minimizes the mean-square error ||x;t — Xfc ||2 is given by the conditional mean, kk = E{xk\Vk}.
(13)
This can be obtained from p(x;t,7|Vfc) by integrating out 7 and taking expectation over xyt, xfc = f My)p(l\Vk)d1.
(14)
The main difficulty in using this algorithm is due to equations (12) and (14), which involve integration over a large dimensional space f2. Also this requires cal culation of Xfc (7) for different 7, which can be very time consuming and impractical in real time applications. 2.2 Maximum likelihood estimation Depending upon the density function used three different kinds of maximum likeli hood (ML) estimates can be defined. 1) Joint ML estimate: If Xfc and 7 are to be estimated simultaneously, then p(xfc,7|Vfc) is maximized jointly with respect to Xfc and 7. 2) Marginal ML estimate off. The marginal density p(7| 14) is maximized with respect to 7. 3) Conditional mode estimate: p(xfc|Vfc) is maximized with respect to Xfc. This estimate can be obtained from the Bayesian approach, and is equivalent to the conditional mean in the Gaussian case. For simplicity the marginal ML estimate of 7 is first considered. Using Bayes' rule, , I T / ,lVk)= p(Vkh)p(i) p{y
-
m '
the ML estimate of 7 can be obtained by maximizing L(7) = logp(7|Vfc) 1
fc
+ log \iCjPjj-i
(16) Cj + Rj)\} + logp(7) + constant.
{)
68
A. Moghaddamjoo and R. Kirlin
If no a priori distribution of 7 is given, the logp(7_) term is dropped. In other words p(Vfc|7), is maximized. If xjt and 7 are to be estimated simultaneously, we should maximize L'(x Jt ,7) = logp(x it ,7|K fc ). (17) 2.3 Correlation methods The basic idea in correlation methods is to autocorrelate a known linear processing of the system output [2,8]. A set of equations is derived relating the system parame ters to the observed autocorrelation function: these are then solved simultaneously for the unknown parameters. These methods are mainly applicable to constant coefficient systems. Two different methods can be developed by considering either the autocorrelation function of the output v/t or the autocorrelation of the filter residuals v^. Since the filter residuals v_k are less correlated than the output v^, the second method is more efficient. Moreover, the first method is applicable only if the out put vfc is stationary, whereas the second method is applicable to cases where Vfc is not stationary. Both methods require the system to be completely controllable and observable. Using residual correlation, it is necessary to assume a time invari ant system with zero input. Because more restrictions are involved in the output correlation method, this method is not considered further here. It is known that for an optimum filter the residual sequence Ek — vjfc - Cfcxfe]fc_i
(18)
is zero-mean Gaussian white noise. However, for a suboptimum filter, the residual sequence has non-zero autocorrelation: Let Tt = E{ukul_t}
.
(19)
It can be shown that for stationary time-invariant system where the system param eters are not changing in time, Tt = C[A(I - G3C)]1-1A{P-CT
- GST0),
(20)
where Gs is the suboptimum a priori Kalman gain, P~ and P are the stationary values of Pk,k-i, and Pk}k respectively and
r0 = cp-cT
+ R.
(21)
For an optimum filter G = P-CT{CP~CT
+ R)'1.
(22)
69
Robust Adaptive Kalman Filtering
In this case 1^ for £ > 0 vanishes identically. The optimum Q, R and G can be obtained as follows: 1) Obtain P~CT by solving equation (20) for £ = l , - - , n , where n is the dimension of the state vector.
P~CT
Ti + CAGSF0 T2 + CAGsTl + CA2GSTQ
nT {DTD)~I1D
=
(23) n
r n + c7iG,r n _i + ■ • ■ + CA Gsr0 where DT =
[ATCT,---,(AT)nCT}.
2) Calculate R by using TQ and P ~ C T :
/? = r 0 -
c[p-cT].
(24)
3) Denote by PQ and P0 the error covariance matrices associated with the optimum gain G. Then, under the steady state condition, P 0 - = A(I
- GC)PQAT
(25)
+ Q .
It can be shown that P~ =A[(I-
GSC)P~ - P'CTGj
+ Gs (CP-CT
+ R)Gj]AT
+ Q.
(26)
Subtracting (26) from (25) and using equations (8) and (9) yields A(P0 - Pk)AT
= A[A(P0 - Pk)AT + P-CTGj
Let 6P~ = A(Po — Pk)AT
- GCPQ
- GS{CP-CT
+ G3CP+
R)Gj]AT
(27)
The optimum gain G is then given by
G = (P-CT
+ 6P-CT)(T0
C6P-CT)-1
+
(28)
Substituting (28) into (27) yields 6P~ = A[6P~ - (P~CT + 6P-CT)(r0 T
+ C6P~) + G3CP~ + P-C GJ
+
-
C6P-CTrl(CP-
GsT0Gj]AT
(29)
The optimum Kalman gain G can be obtained by solving equation (29) for 6P~ by using P~CT from (23) in one step from (28). In practice, if a batch of observations Vj can be stored, the above mentioned calculations might be repeated to improve the estimates of the desired parameters [2]. The residual sequence v_; will become
70
A. Moghaddamjoo and R. Kirlin
more and more white with each iteration, resulting in better and better estimates for the autocorrelations r 0 , Ti, ■ ■ ■, and, therefore, for R and G. The estimation for Q can be made with AP0AT + Q = Gr0(CT)+, (30) Q = G r 0 ( C T ) + - A(I - GC)GT0(CT)+AT
,
(31)
where superscript "+" means pseudoinverse. The preceding algorithm can be used on-line by using
3=1
for on-line estimation of autocorrelations. The efficiency of the correlation methods can be improved by using higher order correlations. The correlation technique is good for stationary time invariant systems where calculation time is not critical and running the filter with suboptimal Kalman gain for a long record of data is not dangerous. However this method cannot be used for nonstationary systems or for systems which involve unknown deterministic inputs for a short period of time. Using the stochastic approximation method in con junction with the correlation method gives a slight improvement in the estimates without requiring much extra computation and without requiring further storage of past data. 2.4 Covariance-matching techniques With this method the residuals are made consistent with their theoretical covariances. For example consider the residual sequence u , which has a theoretical covariance of [C(APj-\ATA1 + Q)CT + R]. If the actual covariance of £ has vari ance elements much larger than their theoretical value obtained from Kalman filter, then the estimate for Q should be increased. This has the effect of bringing the actual covariance of v_3 closer to its theoretical value. The actual covariance of v_j is approximated by its sample covariance; viz, Af1_1 J2i=i H-ji^It where M is chosen empirically to give some statistical smoothing. An equation for Q is obtained by setting C{APj-lAT + Q)CT + R = E{vjv]}, (33) or CQCT = E^vJ}
- CAP3_1ATCT
+ R.
(34)
Equation (34) does not give a unique solution for Q if C is of rank less than n. However if the number of unknowns in Q are restricted, a unique solution can be obtained. Notice however that equation (34) is only approximate since Pj-\ does not represent the actual error covariance when the true value of Q and R are unknown. Because of this approximation the convergence of the covariance matching technique is doubtful.
Robust Adaptive Kalman Filtering
71
2.5 Special filters Some of the important special filters of the above mentioned categories are summa rized here, a) Jazwinski adaptive filter [6]: This filter applies the correlation method in a special way to introduce a feedback from residuals in order to compensate the modeling error and hence prevent the divergence of the Kalman filter. This method leads to the estimator, ,. A ( 0, d,agQ fc , N = | d . a
g 0
^
i
every element of e < 0; ^ ^ ^ -"
,„_. (35)
subject to the rule that, if {q~jj)k,N < 0, set (qjj)k,N = 0, where TV is the number of points used in the estimation process, dia g Q fc , N = ( L ^ ) -
1
^
(36)
and eT = [d+i ~ E{d+1\Q
= 0}, • ■ ■, v2k+N - E{£+N\Q
= 0}].
(37)
In equations (36) and (37), we have t L = [aej](Nxn) ; atj = ^(Cfc+£$k+£,fc+i )] .
(38)
where $ij is the transition matrix (from the jth to the ith state) and E{&+t\Q
= 0} = Cfc+/*fc+/,fc flb*£n,fc C^+e + Rk+e ■
(39)
One of the important limitations of this approach is that it responds to mea surement noise, TJ, , if Rk+e > Ck+e QCk+e , where the best performance in terms of absolute size of residuals will not be realized. Also knowledge of Rk is essential in this algorithm. b) Belanger adaptive procedure for estimation of noise statistics [1]: In this proce dure also a correlation method is applied to the filter residuals. The more general time varying case can be handled by this method. In this case it is shown that, if the covariance matrices are linear in a set of parameters, then the correlation function of the filter's residuals is also a linear function of this set of parameters. This fact is used to perform a weighted least-squares fit to a sequence of measured correlation products.
72
A. Moghaddamjoo and R. Kirlin
In this formulation it is assumed that Q and R are linear functions of J com ponents of a vector a, i.e., J
R = Y,Ri<*i, i=l
J
Q = Y^Qi<Xi,
(40)
i=l
with the assumption that Ri, and Qi are known constant matrices, using the results given in the correlation method, it is straightforward to show that S[i^.t/J_ f ] is linear in the Qj's. This property allows a least-squares solution [1]. In this method the fact that autocorrelation function of the residuals sequence can be expressed as a linear function of the parameters of the noise covariance matrices is used to derive a least-squares algorithm to produce a recursive estimates of noise covariance parameters from measured values of the autocorrelation products. 2.6 Discussion Different categories of adaptive filtering are discussed briefly. Generally the proce dures discussed in this section are useful for stationary systems; most of them are also excessively time consuming which make them impractical for on-line processing. §3 Adaptation for unknown deterministic inputs The unknown input problem is handled differently in different situations [7]. There is not a general algorithm that can be used in all circumstances. Most approaches drop the BfcU/t term (see equation (1)); then, by testing the residuals sequence, Q is changed in order to allow the filter to follow any unpredictable change in the inputs. Other methods use inputs as components of a state vector and by the method of state augmentation try to estimate inputs and state vectors simultaneously. A large number of schemes for failure detection in dynamic systems have been used for detecting unpredictable changes in inputs. The following discusses these alternatives. 3.1 Filter compensation using process noise covariance In this method one simply ignores the determinism of the input vector SfcUfc and lumps this uncertainty into the process noise term £ Basically this method examines the "regularity" of the filter residual vector v^ against its covariance matrix Tk using the \2 variable lk=d^kl>^(41) When Ik becomes too large one suspects that the input of the system is changing and the covariance of £., namely Qk, is increased so that Ik is reduced to a reasonable value. This method therefore has the combined features of input detection and filter compensation, but it is completely suboptimal. During the transient following an input change, the filter is completely lost; after a large delay, the filter will catch the track but with more uncertainty. This method can be used only in cases where the corresponding transient delay, in the estimation of the input, is not critical.
Robust Adaptive Kalman Filtering
73
3.2 State augmentation and input detection This method is straightforward but computationally is more costly [3]. The perfor mance of this approach, however, is substantially better than the previous method. In this case, the input u^ is included in the state vector; i.e., the augmented state vector consists of
*.<*) = [ x j . u j f
(42)
When the input dynamic is completely unknown, one uses «t+i = » t + J . ( t ) .
(«)
where £ (k) is modeled as a zero white noise process with covariance Qa(k). For the case where bounds on the magnitudes of inputs are known, a method for choosing Qa(k) such that the actual filter performance is bounded by the computed filter covariance is proposed in [3]. As a rule of thumb the values of the entries in Qa{k) should be a fraction of the expected magnitude squared of the input forcing function and proportional to the measurement time interval. For some applications the process noise covariance term may be required to appear at states other than those representing system inputs. The input state ut is usually not influenced by the state vector xjb- In this case one can compute the state and input estimates separately to obtain Xfc = xfc + Bfciifc ,
(44)
where x^ is the estimate assuming Ufc is zero, u/t is the input estimate, B^ is a control input matrix, and x^ is the final state estimate. The advantage of this decoupled implementation is a saving in computation. It can be shown that the decoupled input state estimator is optimum for linear systems when the input state model is known and deterministic and the input change time tj is known. Suboptimal designs for the case when the above assumptions are not true can also be found in [3]3.3 Discussion All of the above mentioned input detection methods exclude the equally important estimation of noise statistics, hence there is no discussion concerning the mutual influence of the input detection and noise statistics estimation. For all of the procedures discussed in this section, as the estimated input de grades from the true value (which always occurs during transients), the filter will become suboptimal and in some cases unstable. A long recovery period is usually needed. Further, poor estimates of noise statistics will produce the same difficulties. Hence it is necessary to consider these two problems simultaneously and study their effects on each other. This matter is discussed in detail in the next section.
74
A. Moghaddamjoo and R. Kirlin §4 Simultaneous adaptation for unknown noise: statistics and deterministic input
For simplicity and without lack of generality, in the following we assume Bk = I in equation (1) and dimensions of Ufc and x t are the same. 4.1 Linear sequential adaptive algorithm In this algorithm [13] sequential estimators are derived for suboptimal adaptive simultaneous estimation of the unknown noise statistics, deterministic input, and the system states. First- and second-order moments of the noise processes are estimated based on state and observation noise samples generated in the Kalman filter algorithm. A limited memory algorithm is developed for adaptive correction of the a priori statistics, which are intended to compensate for time-varying model errors. In this section the algorithm and its robust version are presented; lastly, a recently proposed stable and asymptotically optimal procedure is discussed. In the procedure proposed in [13] Ns most recent samples of the state error sequence ffc = xfc - ^fc-iXfc.! (45) are used to estimate an unknown constant forcing function or biases. That is, **-i = ^
!>-,••
(46)
This bias is then included in the state estimator Xfc = ^fc-ixfc-i + flfc_! - Gfc[vfc - Cfc^fc-iXfc-i + Ufc_i)].
(47)
The ffc sequence is also used to estimate an unknown state noise covariance matrix, Qk- It has been shown that the following estimator is unbiased: ®k
=
N 1 " N - 1 Hl(ffc-J+1 ~
fl
fc)(ffc-J+i ~ u fc) T l
- w E ( ^ - ; pfc-i Al-t - p -w+i Similarly the measurement covariance estimate based on the Nz most recent errors of the predicted measurement,
^
k =
1
N
*
W-rrl2^-j+i 2 S=l
N — 1 Jf—Ck-j+i
- r /c)(^- J + i -f f c ) T
(49) r
{Ak-j Pk-j Al_j + Qfc_j )C fc _ J+1 ] ,
75
Robust Adaptive Kalman Filtering where v*. = vfc - Ck(Ak-i Xfc-i + u f c _i)
(50)
w2
^
and Tfc
w£*-*»> i
3—1
has been shown to be unbiased. There are some problems associated with this algorithm. All of these estima tors are good as long as the estimates of the state vector are good representatives of the true state of the system. During an abrupt change in the input forcing function, however, state estimates will degrade rapidly from their true values, causing rapid increase in the estimates of R and Q. This in turn will result in more suboptimum state estimates. It has been shown [10] that there is a positive feedback in this se quential algorithm, which is mainly due to the estimates of Rk and Qk- In addition, since a combination of Ufc and rjt can only be observed through Vfc, it is impossible to simultaneously estimate these unknown parameters as indicated. Therefore, it is required to use the filter only for short periods of time along with the assumption that the measurement bias is known. 4.2 Robust sequential adaptive algorithm In this context, robustness signifies insensitivity to small deviations from the un derlying assumptions. The proposed robust version of the sequential adaptive al gorithm [7] is insensitive to measurement noise outliers and decreases the inherent potential of instability by reducing the rate of external disturbances. The algorithm in [7] can be run for a longer period of time without significant filter degradation. In this procedure estimates of Ufc, Qk and Rk are obtained as follows: The robust forcing function estimate is based on the med (= median) of the Ns most recent estimation errors, Ufc = med {ffc} N3, (52) and the unbiased robust process covariance estimate is
Qk = [fe(*)l -
1 W N
'
N
-
£ ( 4 t - / Pk-eAl_e - Pfc_m ),
(53)
i-
where NS 9ijW
E<[&-<+i(0 - at(i)][fib-<-nO-) - ukU)][i -
M?^)}4
{£J[1 - JU&<0][1 " 5M^)]}{-1 + £J[1 - M?-W][l - 5M?.(£)iy
l
)
In this equation ffc(i) and iifc(i) are representing the ith components of ffc and Ufc respectively. Mu (t) can be calculated from
76
A. Moghaddamjoo and R. Kirlin
Mi
(f)
=
**K '
|f f c -<+i(i)-med{f < (i)}| 6med{|fc(z) - med{4(*)}|}
|f f c -/+i(j)-med{f<(j)}| 6med{|f c (j) - med{f c (j)}|}
k - Ns + 1 < c < k.
(55)
In equation (54), Yl indicates summation only over those £ for which Af& < 1. Thus, the biweight weights less those variations of the state estimate which are more distant from the median and gives zero weight to those variations which are greater than 6 times the median variation. The bracketed denominator factors in equation (54) are derived from an asymptotic form [12]. Similarly the unbiased robust measurement error covariance estimate based on the Nz most recent measurement residuals is formulated as follows: 1
N
* C
Rk = [hj{k)] - jj- Y,
k-e+i (Ak-t Pk-t Al_t
+ Qk-t)C^_e+1,
(56)
where . , „ _ N, E f kfc-<+i(0 - m e d f a ( i ) } ] K _ m ( j ) - med{^Q-)}][l - /^-(l)] 4 ( E i ( i - 4 ( Q ] ( i - 5 / ^ M l H - i + £ j [ i - A, «
- 5/x?,-W]} (57)
and 2 (f) = MiiW
|j^-f+i(i)-med{j^(i)}[ 6med{|^(i)-med{^(i)}|}
kk-e+iU) - med{t^(j)}| 6med{|^(j) - med{i^(j)}|} '
k - Nz + 1 < c < k.
(58)
In these equations v_k{i) represents the ith component of v^. and £ indicates sum mation only over those £ for which /z|(Q < 1. In this case r^, the measurement bias, is assumed to be zero. In addition, the input forcing function is assumed to be a step-like function, and robustness is with respect to the noise outliers. The fact that this method is insensitive to the state and measurement noise outliers results in less variations in the estimates of Rk and Qk due to an abrupt change in the unknown input forcing functions. Also estimates of the time instants of the abrupt changes in the input provide a tool (refiltering from the estimate of the time of change to the present time) to adjust the previous estimates of the unknown parameters. This algorithm can be run for a longer period of time without significant filter degradation. However, since the skeleton of this robust algorithm is derived from the proposed method in [13], there is a potential instability in the procedure. The major disadvantage of this robust procedure is its complexity and computation time.
Copyrighted Material
77
Robust Adaptive Kalman Filtering 4.3 Stable and optimal sequential adaptive algorithm
In this procedure [10] the input forcing function, its time of occurrence and the measurement error covariance matrix R are simultaneously estimated based on a running window (robust or conventional) regression analysis. This algorithm is completely independent of the main Kalman filter loop because it is not based on state estimates. In addition a stochastic approximation for estimating the process noise covariance Q is derived which produces a stabilizing negative feedback in the overall algorithm. A) Estimation of the unknown input forcing functions The instability problems of the previous methods are mainly due to the de pendency of the adaptation routines on the byproducts of the Kalman filter. It is therefore desirable to search for an independent adaptive procedure to estimate the unknown parameters. In this procedure the unknown input forcing functions are estimated by using only the N most recent measurements. In consequence it is nec essary to discover a direct relationship between the system measurements and input forcing functions. The general deterministic input-output relationship is derived as follows. The system equations in noise-free (using superscript *) situations are x£ + i = Ax*k + Buk ,
(59)
v'k = Cxi ,
(60)
where it is assumed that the system is time-invariant. Note that in the case of time-variant systems the derivation is similar but much more complicated. The Z-transforms of equations (59) and (60) are X*(z)=(zI-A)-1BXJ{z), V*(z)=CX*(2).
(61) (62)
Substitution of X*(z) from (61) into (62) yields the final input-output relation in the 2-domain V*(z) =C(zI-A)-1BU{z). (63) In general for any linear system we can assume
^-A^
= W)>
(64)
where ij}{z) is an nth-order polynomial in z, and A(z) is an (n x n) matrix whose elements are polynomials in z. Prom equations (63) and (64) we get ■tP(z)V*(z)=CA(z)BU(z),
(65)
78
A. Moghaddamjoo and R. Kirlin
which in the discrete time domain is written as
X>v£ + i = X>u f c + i . i=0
(66)
i=0
Here we have assumed n-l
(67)
rp(z) = ]T] a.iZ% i=0
and
n-l
CA(z)B =
(68)
Y^Diz\ i=0
where Di's are (m x p) constant matrices. The dimensions of the unknown vector u/t and the measurement vector v^ play an important role in the estimation process of the unknown input forcing functions. Assuming that v£'s are known and the estimation process is at step j , from (66) we have n-l
n-2
D n _iUj = ] T a i v * _ n + i + 1 - ^2 DiUj^+i+i i=0
,
(69)
i=0
where \ij-n+i++1+1 i >, i — 0,1, • • •, n — 2, are replaced for by the estimates of the input forcing functions in the previous steps. If m = p, we would have a unique solution for Uj. But, if m < p, the number of equations is less than the number of unknowns, and in general there is no unique solution for Uj. In the case of m > p there is a typical minimum MSE solution to equation (69). To obtain a unique estimate of the unknown input forcing function at step j , it is therefore essential to restrict the dimension of Ufc, namely p, to be less than or equal to the dimension of Vfc, namely m. The problem is now reduced to that of fitting an appropriate deterministic function of time to each component of measurements in order to obtain estimates of error-free measurements, v£'s. Given the complete knowledge of the system characteristics it is always feasible to find these deterministic functions. Let us assume that the following measurement model is appropriate for the system under study: v '=^o + ^ ^ ) + ^ 2 ( * . ) + ---+^(*.)+l„ (70) where gj{t);j = 1, 2, ■ ■ ■ ,q are determined based on the knowledge of the system and /3.; j = 1,2, • • •, q, are m-dimensional vectors of coefficients which are assumed to be constant. Cj is the measurement error at time ti, which is assumed to be white Gaussian noise with potential outliers. To obtain estimates of the error-free measurement v£, it is necessary to find the best fit (in the weighted least-square error sense) to the N most recent measure ments using the measurement model in equation (70). This curve-fitting procedure
79
Robust Adaptive Kalman Filtering
should be conducted for each individual component of v^ regardless of their interdependency. Therefore for simplicity and without loss of generality, we can assume scalar measurement in the following formulation. To simplify the notation the following variables are introduced: xu = gi(U) ' X2i = 92(U)
•Eqi
>;
(i = i,2,
(71)
,N),
9q\^i) >
where tjv corresponds to the present time (the time of the most recent measure ment). Assuming scalar measurements, equation (70) can be written as v = XB + £,
(72)
[vi,v2,
(73)
where ,VNl
B=[/3i,/32,---,/yT,
(74)
£ = [«1,«2|--- ,£N}T ,
(75)
(\ X
V1
Hi
121
1,1
£12
£22
Xq2
X\N
X2N
XqN
^
(76)
)
Therefore v is an (TV x 1) vector of the TV most recent measurements, X is an [TV x (q+ 1)] matrix of the levels of the regression variables, B is a [(q+ l ) x l ] vector of the regression coefficients, and £ is an (TV x 1) vector of white measurement errors with zero mean and variance a2 It is also assumed that measurement errors include outliers. The objective is to estimate the vector of the regression coefficients, B, while reducing the effects of the outliers. We use the weighted least-square estimate, B, which minimizes S{B) = Y^WiC2 = £TW£
= (v - XB)rW(v
-
XB),
(77)
where W i s a diagonal matrix with elements w\, W2, • ■ ■, WN, which are the weights for the errors. The weighted least-square estimate of B [11] is B=
(XTWX)-1XTWv.
(78)
80
A. Moghaddamjoo and R. Kirlin
Errors with small Wi have larger variances than errors with large Wi. Application of the weighted least-square method requires the knowledge of the weights, u>i. Sometimes prior knowledge, experience, or information from the theoretical model can be used to determine the weights. In some cases it is necessary to guess the weights, perform the analysis, and then reestimate the weights based on the results. Several iterations may be necessary. In general a class of robust estimators may be defined that minimizes a function p of the residuals; for example, N
N
i=i
i=l
(79) where x ^ denotes the zth row of X. The function p is related to the likelihood function for an appropriate choice of the error distribution. In equation (79) s is a robust estimate of standard deviation. A popular choice for 5 is absolute median deviation given by s = med|e; - med(ei)|/0.6745 , (80) which is insensitive to outliers, and its statistical behavior is well studied [11,12]. A number of popular robust criterion functions are discussed in [11]. Based on an extensive Monte Carlo simulation study [7] we have found that Tukey's biweight method and the square of the absolute median deviation given in [12] are the most acceptable methods for providing a robust estimation of variance. In [7] it is shown that these methods have the least bias and fewest variations. On the other hand it is also shown that Spearman's rank-correlation [4] is the most appropriate method (again due to its least bias and variation) for the robust estimation of correlation coefficient. B) Estimation of the point of change in the input forcing functions For the purpose of refiltering, estimation of the points of change in the inputs is required. It is assumed that the regression coefficient vector before a change in the input is B and after the change is 6_ = B + AB, where AB is the change in the coefficients due to the change in the input. This problem has been studied for different cases in [5]. In this work only the case with two-segment curves is considered. For N data points in the window it is important to determine whether or not a two-segment curve is appropriate and has the weighted least-square error. If so, it is necessary to estimate the point j at the intersection of these two curves. Also it is essential to test the hypothesis for the significance of changes in the coefficients, AB. If none of the components of AB is statistically significant, then the estimated j should not be considered as an estimated point of change in the input. The weighted least-square solution sought is now briefly stated for the case of two submodels, g\{t; B) and g2(t;6) joined together at t = a. It is necessary to find
81
Robust Adaptive Kalman Filtering
vectors B = B, 6_ = 6 and real values a = a, j = j , which minimize the N weighted squares of errors in the window. That is, minimize j
N
S(B,S,a,j)^^2wi[vi^g1(ti;B)]2+
£
i=l
u*fa - g2(U;6)}2
(81)
i=>+l
subject to the following relationships among the parameters: 9l(a;B)=g2(&;6)
(82)
t-
(83)
and the parameters B, 6_, a, and j are clearly not independent. The estimated j is accepted as a point of change in the input when AB, which is related to the change of the input, is statistically significant. Based on the assumption that the noise is Gaussian (aside from the additive impulsive noise which is rejected during the estimation process of B by using the weighted leastsquare method), the random variable B is approximately normal [11]. Therefore, test of the hypothesis (for the significance of AB) can be conducted using Gaussian distribution. For example, for 95% confidence, AB is statistically significant if it is greater than 1.96 times its corresponding standard deviation. If AB is statistically insignificant, the process then proceeds without considering j as the point of change in the inputs. C) Estimation of the measurement noise covariance matrix An important byproduct of this curve-fitting procedure is that it is possible to simultaneously estimate the measurement noise covariance matrix, R, independent of the Kalman state estimates. We estimate R by calculating the covariance matrix of the curve fitting residuals. Let Vj. represent the vector of residual at time tk- The conventional estimate of R (if Vj. is white Gaussian), using the N most recent residuals, is given by 1
N
(
^jvn^ --"£)(-r£)T'
{84)
2 = 1
where 1
N
«=1
Robust versions (with respect to outliers and the assumption that v_k is white Gaussian noise with potential outliers) of equations (84) and (85) can be found in [4,11,12]. As mentioned earlier, Tukey's biweight method and the square of
82
A. Moghaddamjoo and R. Kirlin
the absolute median deviation are the most acceptable methods for the robust estimation of variance, and Spearman's rank-correlation is the most appropriate method for robust estimation of correlation coefficient [7]. D) Estimation of the state noise covariance matrix The optimal estimation of Q is a very difficult task. It has been shown in [8] that under special conditions, it is possible to optimally estimate Q adaptively. The stated conditions are that 1) the system should be controllable and observable; 2) process noise should be stationary, and there should be no simultaneous input estimation; 3) for large sample size covariance estimation, the Kalman filter with suboptimal gain should be run for a long period of time to ensure steady-state suboptimal residual sequence; and 4) in order to have a unique solution, the number of unknown components in Q must be less than (n x m), where n is the dimension of the state vector and m is the dimension of the measurement vector. These conditions are generally not satisfied in practice. The role of the noise covariance, Q and R, in the Kalman filter is to adjust the Kalman gain in such a way that it controls the filter bandwidth as the state and the measurement errors vary. Direct estimation of R, as mentioned before, is attainable via an adaptive curve fitting procedure while, in general, direct estimation of Q is impossible. For indirect estimation of Q we shall propose to use the residual properties and adjust Q in such a way that statistics of the filter residuals approach those of the optimum Kalman filter. In this regard it is therefore necessary to derive a rela tionship between Q and the autocorrelation structure of the Kalman filter residual sequence u_k. Q must then be modified in such a way that reduces the deviation of the autocorrelation structure of v_k from its optimum values [8,9]. By assuming that the filter is in a steady-state condition with steady-state Kalman gain G, the residual sequence v_k is a stationary Gaussian sequence [8]. The autocorrelation of v_k sequence is given by 1
(CP~CT + R, \C[A(I-GC)]i-1A[P-CT-G(CP-CT
+ R)],
ifi = 0, if i > 0,
, (
. '
where P~ = APAT
(87)
+ Q. T
T
1
Note that the optimum value for G, i.e., G = P~C (CP~C + R)' , makes T; vanish for all k > 0 (the optimum filter property). In this section we propose that instead of trying to estimate Q completely, we can use Q as a negative feedback to control and adjust the statistics of the filter's residual sequence. In other words Q is modified and changed in such a way that causes T^'s to approach their optimum values. It is therefore necessary to recognize the direction of changes in Fj due to variations in Q. Based on equation (87) Q must be adaptively changed so that Tj vanishes for i > 0.
Robust Adaptive Kalman Filtering
83
By careful consideration of the Kalman filter's equations and equations (87) and (88), it is evident that if any component of I \ is positive for i > 0, then proper correction requires an increase in those components of G which correspond to the positive components of IV This can be achieved by an increase in the corresponding components of Q; similarly, negative components in I\, i > 0, will be corrected by a decrease in those components. In other words if I \ is positive for i > 0, Q should be increased, and, as a result, I \ will be decreased (I\ approaches zero). On the other hand if I \ is negative for i > 0, Q should be decreased, and, as a result, I \ will be increased (I\ approaches zero). Thus, the effects of T; on Q are direct, and for I \ = [0], k > 0, it is necessary to keep Q unchanged. The relationship between Q and 1% cannot be derived analytically, but several different empirical relationships may be proposed; for example,
Qfc+i = Qk exP[A(r! + r 2 + • • • + r,)ro 1 T T ]
(88)
where matrix Q is (n x n), and matrices IYs are (m x m), A and T are coefficient matrices which can be found experimentally, and their dimensions are (nxrn). The number of autocorrelations that is used depends on: 1) the maximum number of data points necessary to yield acceptable estimates of IYs, and 2) the maximum number of permissible data points in the moving window. In this procedure, due to our findings in [7], we suggest that small sample size estimates of correlations (like rank-correlations) be used. Although the stability formulation of this method, due to its complexity, is not readily accessible, it can be easily understood by the following arguments. In this method, Q is the only unknown parameter which controls the Kalman gain. This is because R and u are independently estimated and they are not associated with any feedback loop in the filtering process. Let us assume that, due to some disturbances (i.e., unknown sudden changes in R and/or u), the Kalman gain becomes less than its optimum value. The resultant residual sequence will then have positive autocorrelation, due to equation (87). Detection of the positive autocorrelation demands an increase in Q, which will, in turn, increase the Kalman gain (G changes in the direction which approaches its optimum value). On the other hand, if the Kalman gain is larger than its optimum value, the residual sequence will develop a negative autocorrelation which requires a decrease in Q. Reduction of Q will then decrease the Kalman gain (G changes again in the direction which approaches its optimum value). This correction continues until G reaches its optimum value where it oscillates. Therefore deviation of G from its optimum value, due to any disturbances, not only is controlled by Q, but also will be reduced in time. This behavior is what we refer to as a negative feedback which has a stabilizing role in the overall performance of the algorithm. the overall performance of the algorithm. The overall algorithm is examined in [10] by simulating the double integrator system. The results verify the superiority of this method over the conventional adaptive filtering algorithm in [13] operating under the same conditions.
84
A. Moghaddamjoo and R. Kirlin
4.4 Discussion The conventional method utilizing state estimates to estimate both unknown inputs and noise covariances is potentially unstable and produces suboptimal state esti mates in both the transient and steady state conditions. Another drawback to that procedure is that it is too sensitive to the existence of outliers in the measurement errors due to a possibly invalid Gaussian noise assumption. This problem may be resolved with the considerable calculations of the robust adaptive method. The instability problem arises from the dependency of the estimate of the unknown pa rameters on the state estimates and a positive feedback in the process of estimating Q. The difficulty has been overcome [10] by a curve fitting procedure for estima tion of the input u^ and measurement noise covariance R and by using stochastic approximation for estimation of the process noise covariance Q. §5 Conclusion In this work several different approaches to adaptive Kalman filtering are reviewed. Adaptation is assumed to be with respect to unknown noise statistics and/or un known deterministic inputs. In the case where only noise statistics are unknown, four different approaches (Bayesian, ML, correlation and covariance matching) are reviewed. These methods are based on the assumption that the noise is stationary. Some of these methods (Bayesian and ML) are computationally more involved that cannot be used on real time processing. Others (correlation and covariance match ing) are required to be run suboptimally for a long period of time until optimum estimates of unknown parameters can be obtained. In the case where only deterministic inputs are unknown, two different ap proaches are reviewed. In the first method the deterministic input term is assumed to be zero and the process noise covariance is increased. In this case the filter op erates suboptimally. In the second method inputs are estimated in the same way as states of the system, using the method of state augmentation. This method is optimum only for linear systems when the input state model is completely known. The augmentation method is computationally more costly. In the case where noise statistics and the deterministic inputs are unknown, three different approaches, which are all sequentially adaptive, are studied. The first approach, which is referred to as the linear method, has a potential of instability and can be applied only for a short period of time. The second approach, which is a robust version of the linear method, has also a potential of instability but it can be used in a longer period of time. These two approaches are both suboptimal. The third method, which is referred to as the optimal sequential adaptive algorithm, is stable mainly due to its way of estimating the process noise covariance. This method yields optimum estimates of the unknown inputs and the measurement noise covariance. In this approach estimates of the unknown parameters (except process noise covariance) are calculated independently and optimally. This method can can be be used used for for aa long long period period of of time time without without any any degradation. degradation.
85
Robust Adaptive Kalman Filtering References
1. Belanger, P. R., Estimation of noise covariance matrices for a linear timevarying stochastic process, Automatics 10 (1974), 267-275. 2. Carew, B. and P. R. Belanger, Identification of optimum filter steady-state gain for systems with unknown noise covariances, IEEE Trans, on Auto. Contr. 18 (1973), 582-587. 3. Chang, C. B. and K. P. Dunn, Kalman filter compensation for a special class of systems, IEEE Trans, on Aero. Electr. Sys. 13 (1977), 700-706. 4. David, H. A., Order Statistics, John Wiley & Sons, New York, 1981. 5. Hudson, D. J., Fitting segmented curves whose joint points have to be esti mated, J. ofAmer. Statist. Assoc. 61 (1966), 1097-1129. 6. Jazwinski, A. H., Adaptive filtering, Automatics 5 (1969), 475-485. 7. Kirlin, R. L. and A. Moghaddamjoo, Robust adaptive Kalman filtering for systems with unknown step input and non-Gaussian measurement errors, IEEE Trans, on Acous. Spee. Sign. Proc. 34 (1986), 252-263. 8. Mehra, R. K , On the identification of variances and adaptive Kalman filtering, IEEE Trans, on Auto. Contr. 15 (1970), 175-184. 9. Mehra, R. K , Approaches to adaptive filtering, IEEE Trans, on Auto. Contr. 17 (1972), 693-698. 10. Moghaddamjoo, A. and R. L. Kirlin, Robust adaptive Kalman filtering with unknown inputs, IEEE Trans, on Acous. Spee. Sign. Proc. 37 (1989), 11661175. 11. Montgomery, D. C. and E. A. Peck, Introduction to Linear Regression Analysis, John Wiley & Sons, New York, 1982. 12. Mosteller, F. and J. W. Tukey, Data Analysis and Regression, Addison-Wesley, Reading, MA, 1977. 13. Myers, K. A. and B. D. Tapley, Adaptive sequential estimation with unknown noise statistics, IEEE Trans, on Auto. Contr. 21 (1976), 520-423. A. Reza Moghaddamjoo Department of Electrical Engineering and Computer Science University of Wisconsin-Milwaukee P.O.Box 784 Milwaukee, WI 53201 [email protected] R. Lynn Kirlin Department of Electrical Engineering University of Victoria Victoria, B. C. Canada V8W 2Y2 [email protected]
On-line Estimation of Signal and N o i s e P a r a m e t e r s and the Adaptive Kalman Filtering
Piotr J. Wojcik
Abstract. The paper discusses problems in design and application of on-line Kalman filtering techniques for noise reduction and the best possible signal restoration, and gives an overview of practical approaches to solving these problems. The discussion is limited to linear systems with known structures and includes both stationary and nonstationary cases. First, the formulation of the optimum Kalman filtering problem is given. This includes all a priori information about the signal and noise which is necessary for the design process of an optimum Kalman filter. However, such full information about a system model is usually unavailable. In this situation the adaptive Kalman filtering process can be used to solve the problem of incomplete information. As is known, the adaptive Kalman filtering process consists of the following functional components: estimating unknown parameters of the signal model, updating estimates within the Kalman filter structure, and Kalman filtering itself. The paper gives an overview of unknown parameter estimation techniques as well as methodologies for on-line updating parameters within the Kalman filter structure. These methodologies and algorithms are assessed through computer simulations. The assessment is based on the overall performance of the adaptive filtering process. The performance of the filter is measured directly through an improvement coefficient, which indicates how many times the adaptive Kalman filter reduces the observation error. Such a measure enables a designer to determine whether the application of the adaptive Kalman filter will reduce observation errors. Finally, the paper discusses rates at which the adaptive Kalman filter can follow changes of the system characteristics, and proposes modifications to the adaptation scheme, which increase the filter applicability to nonstationary signals.
§1 I n t r o d u c t i o n In control a n d d a t a acquisition systems when signals from any sensors are processed, t h e problem of t h e measurement noise always arises. In t h e past, simple analog filters, with characteristics chosen by t h e designer on t h e basis of a priori information Approximate Kalman Filtering Guanrong Chen (ed.), p p . 87-111. Copyright ©1993 by World Scientific Publshing Co. Inc All rights of reproduction in any form reserved. ISBN 981-02-1359-X
87
88
P. Wojcik
about the signal, were used. It was unreasonable to utilize an expensive computer for each sensor or even a group of sensors to perform a high quality digital filter ing, but now, with microprocessors commonly available, inexpensive and powerful optimum digital filtering can be applied to a variety of sensor signals. Sensor signals are usually stochastic in nature, so statistical approaches to filtering should be applied to these kinds of signals. Such approaches define the best filter as one, which, on average, has its output closest to the correct or useful signal. Two theories of the optimal linear filtering have been developed: the WienerKolmogorov theory for coping with stationary signals and the Kalman filter theory [8] for nonstationary signals. The problem of the design of the Kalman filter for a particular application can be solved in many ways, dependent on a priori information about the processed signal and noise: (i) When all parameters of the pure signal and noise are known, the optimum Kalman filter can be obtained [1,6,5,3] relatively easily. (ii) When the parameters of the signal and noise are not known exactly, but the uncertainty ranges of these parameters are relatively small, the low sensitivity Kalman filter [12] can be designed. (iii) When the parameters of the pure signal and measurement noise are un known, but the structure of the system generating signal and noise is known, several adaptive Kalman filtering algorithms [15,16,2] can be used. This paper describes adaptive Kalman filter algorithms, which can be applied to signals in the presence of either white or coloured measurement noise. All pa rameters of the signal and noise are assumed to be unknown. The adaptive Kalman filter algorithms have been tested through computer simulations, and their results are presented and discussed. §2 Kalman filtering problem and optimum solution in the presence of white noise 2.1 Observed signal model Consider a stationary multivariable linear discrete system (also called a shaping filter) described by the following state space equations (Figure 1) :
Figure 1. Model of the observed signal.
89
On-Line Estimation and Adaptive KF
x fc+1 = Axk + B(k
(1)
vfc = Cx fc + r7fc,
(2)
where Xfc — n x 1 state vector; A — n x n state transition matrix; B — nx q input matrix; Vfc — p x 1 measurement vector; C — p x n output matrix. The sequences £ (q x 1) and TL (p x 1) are uncorrelated Gaussian white noise sequences with means and covariances as follows:
E{ik}=0,
E{tg}
E{r,k} = 0, E
iiill]} =
= Q6lj,
E{rLiTL}=R6ij, 0
foralU.j,
where E\{\cdot\} denotes the expectation and \delta_{ij} denotes the Kronecker delta. The observed signal v_k consists of the pure signal C x_k and the measurement noise \eta_k.
2.2 Optimum Kalman filter
When all parameters (matrices A, B, C, Q, R) of the signal model are known, the optimum filtering equations [1,6,5,3] are
\hat{x}_{k+1,k} = A \hat{x}_{k,k},   (3)
\hat{x}_{k,k} = \hat{x}_{k,k-1} + G(v_k - C \hat{x}_{k,k-1}),   (4)
where \hat{x}_{k+1,k} is the estimate of x_{k+1} based on all measurements up to k, i.e., \{v_0, \ldots, v_k\}, and \hat{x}_{k,k} is the filtered estimate of x_k. The corresponding signal estimates are
\hat{v}_{k,k} = C \hat{x}_{k,k} — for the filtering,   (5)
\hat{v}_{k+1,k} = C \hat{x}_{k+1,k} — for the one step prediction.   (6)
The optimal gain matrix, G_{opt} (n x p), of the Kalman filter is given by
G_{opt} = M_{opt} C^T (C M_{opt} C^T + R)^{-1}.   (7)
where M_{opt} = E\{(x_k - \hat{x}_{k,k-1})(x_k - \hat{x}_{k,k-1})^T\} is the covariance matrix of the error in estimating the state for the Kalman predictor. The minimum covariance of the state estimation error, M_{opt}, can be found [7] by solving the following Riccati difference equation
M(k+1) = A\{M(k) - M(k) C^T [C M(k) C^T + R]^{-1} C M(k)\} A^T + B Q B^T   (8)
or the algebraic Riccati equation (steady-state solution)
M = A\{M - M C^T [C M C^T + R]^{-1} C M\} A^T + B Q B^T.   (9)
The minimum covariance of the state estimation error, P, for the Kalman filter is defined as P = E\{(x_k - \hat{x}_{k,k})(x_k - \hat{x}_{k,k})^T\} and can be calculated [7] from
P = M_{opt} - G_{opt} C M_{opt}.   (10)
In order to process the signal v_k using the optimum Kalman filter (Figure 2), the matrices A, C have to be known and the optimum gain matrix G_{opt} has to be found. When all parameter matrices A, B, C, Q, R are known exactly, the optimal gain G_{opt} can be calculated by solving the Riccati equation (8) or (9) and substituting the result into (7). However, in the overwhelming majority of practical applications some of these parameter matrices (or even all of them) are unknown.
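For concreteness, the steady-state gain computation just described can be sketched in a few lines of NumPy. This is an illustrative rendering of equations (7)-(10), not code from the paper; the function name, initial condition, and stopping rule are our own choices:

```python
import numpy as np

def optimum_kalman_gain(A, B, C, Q, R, n_iter=1000, tol=1e-12):
    """Iterate the Riccati difference equation (8) to its steady state (9),
    then return the optimal gain (7) and the filtering error covariance (10)."""
    n = A.shape[0]
    M = np.eye(n)  # any positive definite initial condition
    for _ in range(n_iter):
        S = C @ M @ C.T + R  # innovation covariance
        M_new = A @ (M - M @ C.T @ np.linalg.solve(S, C @ M)) @ A.T + B @ Q @ B.T
        if np.max(np.abs(M_new - M)) < tol:
            M = M_new
            break
        M = M_new
    G = M @ C.T @ np.linalg.inv(C @ M @ C.T + R)  # equation (7)
    P = M - G @ C @ M                             # equation (10)
    return G, M, P
```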
Figure 2. The structure of the Kalman filter-predictor.
§3 On-line Kalman filtering
In the case when off-line processing is possible, all unknown parameters of the signal and noise models can be estimated first from the observed signal, which can be run as many times as required; the observed signal can then be filtered using the Kalman filter with the previously estimated parameters. When on-line filtering of the signal is required, the problem becomes more difficult and complex. Now all the adaptive signal processing components: the
estimation of unknown parameters, updating the estimates, and the Kalman filtering itself have to be executed concurrently in real time. The adaptation process (i.e., the way parameter estimating and updating is performed) of the Kalman filter can be organized in many ways; Figure 3 shows one possible scheme. The choice of an appropriate adaptation process depends on the required adaptation rate, the estimation accuracy, and the available a priori information about the signal and noise to be filtered. These issues are discussed in later sections of this paper.
Figure 3. Adaptation scheme for the Kalman filter.
§4 Estimation of the system model parameters and the adaptive Kalman filtering
Under the assumption that only the structure of the signal model (Figure 1) is known and all parameter matrices are unknown, an adaptive Kalman filter is to be designed. In that case the problem is to estimate the matrices A, C and the optimum gain matrix G_{opt} of the Kalman filter on the basis of the observed signal samples v_k. Once all parameter matrices are estimated, the optimum gain matrix G_{opt} can be found by solving the Riccati equation (8) or (9) and substituting the result into (7). Thus, the main problem is to identify all matrices of the system model (Figure 1) using consecutive samples of the observed noisy signal v_k. In order to solve this problem it is assumed [10] that the system is minimum phase and completely controllable and observable, the state transition matrix A is nonsingular and stable, the input and output of the system are scalars (p = q = 1), and the covariance of the system noise \xi_k equals identity (Q = I).
4.1 State transition matrix A estimation
Let the correlation function of the observed signal be denoted by
\Gamma_v(i) = E\{v_k v_{k-i}\}.   (11)
An expression for \Gamma_v(i) can easily be obtained from (1) and (2):
\Gamma_v(i) = \begin{cases} C S C^T + R & \text{for } i = 0 \\ C A^i S C^T & \text{for } i > 0, \end{cases}   (12)
where
S = E\{x_k x_k^T\} = A S A^T + B B^T   (13)
is the state covariance of the signal model. Using (12) for i = 1, 2, \ldots, n,
[\Gamma_v(1), \ldots, \Gamma_v(n)]^T = D S C^T,   (14)
where D is the square, nonsingular matrix defined by
D = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^n \end{bmatrix}.   (15)
Consequently, inverting (14),
S C^T = D^{-1} [\Gamma_v(1), \ldots, \Gamma_v(n)]^T.   (16)
Considering (12) for i = n+1,
\Gamma_v(n+1) = C A^{n+1} S C^T = C A^{n+1} D^{-1} [\Gamma_v(1), \ldots, \Gamma_v(n)]^T,   (17)
and using the Cayley-Hamilton theorem (for the matrix A), which states that a matrix satisfies its own characteristic equation:
A^n + \alpha_n A^{n-1} + \cdots + \alpha_1 I = 0,   (18)
where \alpha_1, \ldots, \alpha_n are the coefficients of the characteristic polynomial of A, it can easily be seen that
C A^{n+1} = -[\alpha_1 \; \cdots \; \alpha_n] D   (19)
or
C A^{n+1} D^{-1} = -[\alpha_1 \; \cdots \; \alpha_n].   (20)
Therefore, from (17) and (20),
\Gamma_v(n+1) = -[\alpha_1 \; \cdots \; \alpha_n] [\Gamma_v(1), \ldots, \Gamma_v(n)]^T = -\sum_{i=1}^{n} \alpha_i \Gamma_v(i),   (21)
and generalizing this result,
\Gamma_v(n+j) = -\sum_{i=1}^{n} \alpha_i \Gamma_v(i+j-1) \quad \text{for } j > 0.   (22)
Using (22) for j = 1, 2, \ldots, n, rewriting the set of equations in matrix form, and inverting the final matrix equation,
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} = - \begin{bmatrix} \Gamma_v(1) & \cdots & \Gamma_v(n) \\ \vdots & & \vdots \\ \Gamma_v(n) & \cdots & \Gamma_v(2n-1) \end{bmatrix}^{-1} \begin{bmatrix} \Gamma_v(n+1) \\ \vdots \\ \Gamma_v(2n) \end{bmatrix}.   (23)
Replacing the values of the correlation function \Gamma_v(1), \ldots, \Gamma_v(2n) in (23) by their estimates \hat{\Gamma}_v(1), \ldots, \hat{\Gamma}_v(2n) gives the algorithm to estimate the coefficients of the characteristic polynomial of the state transition matrix A of the system:
\begin{bmatrix} \hat{\alpha}_1 \\ \vdots \\ \hat{\alpha}_n \end{bmatrix} = - \begin{bmatrix} \hat{\Gamma}_v(1) & \cdots & \hat{\Gamma}_v(n) \\ \vdots & & \vdots \\ \hat{\Gamma}_v(n) & \cdots & \hat{\Gamma}_v(2n-1) \end{bmatrix}^{-1} \begin{bmatrix} \hat{\Gamma}_v(n+1) \\ \vdots \\ \hat{\Gamma}_v(2n) \end{bmatrix}.   (24)
The estimates \hat{\Gamma}_v(j) of the correlation function of the observed signal can be obtained [9] using the asymptotically unbiased, normal and consistent estimate
\hat{\Gamma}_v(j) = \frac{1}{N} \sum_{i} v_i v_{i-j}, \quad j = 1, 2, \ldots, 2n,   (25)
or its recursive form
\hat{\Gamma}_v^{N+1}(j) = \hat{\Gamma}_v^{N}(j) + \frac{1}{N+1}\left[v_{N+1} v_{N+1-j} - \hat{\Gamma}_v^{N}(j)\right], \quad j = 1, 2, \ldots, 2n.   (26)
It can be proved [10] that if the estimates (25) or (26) are used in (24), the estimates \hat{\alpha}_1, \ldots, \hat{\alpha}_n are asymptotically unbiased, normal and consistent.
Note that the algorithm (24) identifies the coefficients of the characteristic polynomial of A rather than the matrix A itself. Nevertheless, the unknowns in the matrix A are algebraically related to the coefficients \alpha_1, \ldots, \alpha_n, so the matrix A can be calculated from \hat{\alpha}_1, \ldots, \hat{\alpha}_n by solving a set of algebraic equations. This set of equations is, in general, nonlinear and possesses multiple solutions. However, if the particular form of A in (1) is not important and one is only interested in obtaining a linear system with output v_k (for example in signal processing applications), then a canonical representation can be used for the system.
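A minimal NumPy sketch of this identification step is given below; it maintains the recursive correlation estimate (26) and assembles the matrix equation (24) to solve for the coefficient estimates. The function names and data layout are our own illustrative choices, not part of the original algorithm description:

```python
import numpy as np

def update_correlation(gamma, v, k, n):
    """Recursive correlation estimate (26): gamma[j-1] holds Gamma_v(j),
    j = 1..2n, updated with the new sample v[k] (k is 0-based)."""
    N = k  # number of samples seen before v[k]
    for j in range(1, 2 * n + 1):
        if k - j >= 0:
            gamma[j - 1] += (v[k] * v[k - j] - gamma[j - 1]) / (N + 1)
    return gamma

def estimate_alpha(gamma, n):
    """Solve (24): a Toeplitz-like system built from Gamma_v(1)..Gamma_v(2n)."""
    G = np.array([[gamma[i + j] for j in range(n)] for i in range(n)])
    rhs = np.array([gamma[n + i] for i in range(n)])
    return -np.linalg.solve(G, rhs)  # [alpha_1, ..., alpha_n]
```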
The canonical form is obtained from (1) and (2) using the nonsingular transformation matrix
T = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}.   (27)
Let the new state vector in the canonical representation be denoted by x^*_k, so that
x^*_k = T x_k   (28)
and
x^*_{k+1} = A^* x^*_k + B^* \xi_k,   (29)
v_k = C^* x^*_k + \eta_k,   (30)
where
A^* = T A T^{-1} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -\alpha_1 & -\alpha_2 & -\alpha_3 & \cdots & -\alpha_n \end{bmatrix},   (31)
C^* = C T^{-1} = [1 \; 0 \; \cdots \; 0],   (32)
B^* = T B.   (33)
In the system representation of (29) and (30), only B^* and R remain to be estimated.
4.2 Noise covariance R estimation
Rewriting (12) for i = 0, 1, \ldots, n, multiplying both sides of each equation by \alpha_{i+1}, and adding, we have
\Gamma_v(0)\alpha_1 + \Gamma_v(1)\alpha_2 + \cdots + \Gamma_v(n-1)\alpha_n + \Gamma_v(n) = C(\alpha_1 I + \alpha_2 A + \cdots + \alpha_n A^{n-1} + A^n) S C^T + R \alpha_1.   (34)
Using the Cayley-Hamilton theorem (18) and manipulating (34),
R = \frac{1}{\alpha_1} \sum_{j=0}^{n} \alpha_{j+1} \Gamma_v(j), \quad \text{where } \alpha_{n+1} = 1.   (35)
Replacing all variables by their estimates, we obtain an asymptotically unbiased, normal and consistent [10] estimator of the noise covariance R:
\hat{R} = \frac{1}{\hat{\alpha}_1} \sum_{j=0}^{n} \hat{\alpha}_{j+1} \hat{\Gamma}_v(j), \quad \text{where } \hat{\alpha}_{n+1} = 1.   (36)
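Continuing the sketch above, the noise covariance estimate (36) is one line once the \hat{\alpha} coefficients and correlation estimates are available. Again, this is an illustrative fragment under the same assumed data layout:

```python
import numpy as np

def estimate_R(alpha, gamma0, gamma, n):
    """Noise covariance estimate (36); gamma0 = Gamma_v(0), gamma[j-1] = Gamma_v(j),
    alpha[i-1] = alpha_i, and alpha_{n+1} = 1 by convention."""
    a = np.append(alpha, 1.0)                  # alpha_1..alpha_n, alpha_{n+1} = 1
    g = np.concatenate(([gamma0], gamma[:n]))  # Gamma_v(0)..Gamma_v(n)
    return float(np.dot(a, g)) / alpha[0]
```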
4.3 Input matrix B^* estimation
For the canonical system representation (29), (30) it can easily be shown that
D^* = \begin{bmatrix} C^* A^* \\ C^* [A^*]^2 \\ \vdots \\ C^* [A^*]^n \end{bmatrix} = A^*.   (37)
Thus, substituting (37) into (16) and using (13),
S^* [C^*]^T = [A^*]^{-1} [\Gamma_v(1), \ldots, \Gamma_v(n)]^T   (38)
and
S^* = A^* S^* [A^*]^T + B^* [B^*]^T.   (39)
In order to obtain the estimate of the input matrix B^*, equations (38) and (39) must be solved knowing the estimates of A^* and \Gamma_v(1), \ldots, \Gamma_v(n). These equations arise in many other problems in control theory and can be solved using, for example, the method given in [12]. However, when the order of the system is n < 3, equations (38) and (39) lead to a relatively simple set of nonlinear algebraic equations, which can be solved using conventional methods.
4.4 Adaptation scheme for the Kalman filter
The adaptive filtering process based on the above theoretical considerations can be performed on-line. The parameters of the Kalman filter are estimated during each time interval T_0, T_1, \ldots, adjusted at its end, and kept constant during the next adaptation interval (Figure 3). The time intervals T_i are all equal (i = 0, 1, 2, \ldots) and the duration of T_i is referred to as the adaptation step T_{ad-st}, which is usually expressed as the number of signal samples within the time interval T_i. The adaptation scheme of the Kalman filter is illustrated in Figure 3 and can be described as follows (a code sketch of this loop is given after Figure 4 below).
(i) During T_0, the initial time interval, the correlation function of the observed signal v_k is estimated using (25) or (26).
(ii) At the end of T_0, the following computations must be performed:
— estimation of the elements of A^*, the transition matrix, from (23) and (31),
— estimation of the noise covariance R from (36),
— calculation of B^*, the input matrix estimate, from (38) and (39),
— solution of the Riccati matrix equation (8) or (9),
— calculation of the optimum gain G_1 of the Kalman filter from (7).
(iii) During the next time interval, T_1, the Kalman filtering process with the parameters calculated in (ii) is performed; simultaneously the correlation function of the input signal is estimated.
(iv) For the next consecutive time intervals, steps (ii) and (iii) are repeated.
It should be pointed out that during the initial time interval T_0 the adaptive Kalman filter only identifies its parameters and does not perform any filtering. The filtering process starts at the beginning of the time interval T_1 (Figure 3). The structure of the adaptive Kalman filter described above is presented in Figure 4.
Figure 4. Structure of the adaptive Kalman filter.
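The adaptation loop (i)-(iv) can be sketched as follows for the scalar first-order case used later in the simulations (n = 1, so A^* and B^* reduce to scalars and equations (38)-(39) can be solved in closed form). This is an illustrative reconstruction with hypothetical function names, building on the helpers sketched earlier, not the authors' implementation:

```python
import numpy as np

def adaptive_kf(v, T_ad, n=1):
    """Adaptive Kalman filter loop of Section 4.4 for a scalar first-order model.
    v: observed samples; T_ad: adaptation step (samples per interval)."""
    gamma0, gamma = 0.0, np.zeros(2 * n)  # Gamma_v(0) and Gamma_v(1..2n)
    A = B = G = None
    x_est, out = 0.0, []
    for k, vk in enumerate(v):
        # running correlation estimates, recursive form (26)
        gamma0 += (vk * vk - gamma0) / (k + 1)
        gamma = update_correlation(gamma, v, k, n)
        if A is not None:                        # filtering starts after T_0
            x_pred = A * x_est                   # (3)
            x_est = x_pred + G * (vk - x_pred)   # (4), with C* = 1
            out.append(x_est)
        if (k + 1) % T_ad == 0:                  # end of an adaptation interval
            A = -estimate_alpha(gamma, n)[0]     # (24), (31): A* = -alpha_1
            R = estimate_R(np.array([-A]), gamma0, gamma, n)  # (36)
            S = gamma[0] / A                     # (38): Gamma_v(1) = A* S*
            B = np.sqrt(max(S * (1 - A * A), 0.0))  # (39), scalar case
            M = S                                # initial Riccati iterate
            for _ in range(200):                 # iterate (8) to steady state
                M = A * (M - M * M / (M + R)) * A + B * B
            G = M / (M + R)                      # (7), with C* = 1
    return np.array(out)
```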
§5 Kalman filter modification for coloured noise
The adaptive Kalman filtering equations (Section 4) and the filter structure have been derived under the assumption that the additive measurement noise is white (uncorrelated). However, in practical applications the signal is very often distorted by noise with significant correlation time. In these cases the white noise assumption is not satisfied, and the Kalman filter must be slightly modified. Several approaches [1,6,3] may be used to solve the problem. One of these methods is described next.
5.1 System model modifications
Assume that the pure signal v_s and the coloured noise v_n are described by the following state space equations:
x_{1,k+1} = A_1 x_{1,k} + B_1 \xi_k,   (40)
v_{s,k} = C_1 x_{1,k},   (41)
x_{2,k+1} = A_2 x_{2,k} + B_2 \eta_k,   (42)
v_{n,k} = C_2 x_{2,k}.   (43)
Combining equations (40), (41) and (42), (43), we obtain the reformulated signal generating model:
x_{k+1} = A x_k + B e_k,   (44)
v_k = v_{s,k} + v_{n,k} = C x_k — the observed signal,   (45)
where
A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}, \quad B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}, \quad C = [C_1 \; C_2], \quad x_k = \begin{bmatrix} x_{1,k} \\ x_{2,k} \end{bmatrix}, \quad e_k = \begin{bmatrix} \xi_k \\ \eta_k \end{bmatrix}.
Comparing equations (44), (45) with (1), (2), it can be concluded that in this redefined signal generating model the equivalent white measurement noise is zero. Consequently, the equivalent white noise covariance matrix is singular (R = 0). Thus, the optimum Kalman filter for coloured noise can be designed using equations (3), (4), (7), and (8) with R = 0, but the observation equations (5) and (6) have to be modified as follows:
\hat{v}_{s,k,k} = C_m \hat{x}_{k,k},   (46)
\hat{v}_{s,k+1,k} = C_m \hat{x}_{k+1,k},   (47)
where C_m = [C_1 \; 0]. If all parameters of the signal generating model are known, the optimum Kalman filter for a signal with coloured measurement noise can be designed as outlined in Section 2 of this paper. When the parameters of the signal generating model are unknown, their estimation can be performed using the methodology described in Section 4 of this paper. In order to present another parameter estimating methodology, based on the innovation sequence, it is assumed in this section that the dynamics of the signal and noise (matrices A, C) are known exactly, but the signal to noise ratio is unknown. As a result, only the optimum Kalman gain G_{opt} must be estimated on-line during the adaptive filtering process. The next section describes an algorithm for the Kalman gain estimation.
5.2 Optimum Kalman gain estimation
Let z_k denote the innovation sequence (in general a p x 1 vector) defined as
z_k = v_k - C \hat{x}_{k,k-1}.   (48)
Considering the correlation matrix of the innovation sequence \Gamma_z(j) = E\{z_i z_{i-j}^T\} for the signal model defined by (44) and (45), it can be shown [9] that
\Gamma_z(0) = C M C^T,   (49)
\Gamma_z(j) = C[A(I - GC)]^{j-1} A [M C^T - G \Gamma_z(0)] \quad \text{for } j > 0,   (50)
where M is the covariance of the prediction error in estimating the state and G is the gain matrix of the Kalman filter; neither matrix has to be optimum. Notice that the optimum choice of the Kalman gain matrix according to (7) makes \Gamma_z(j) vanish for all j \neq 0, i.e., the innovation sequence for the optimum filter is white. Rewriting (50) explicitly and making appropriate manipulations [13,9], the following recursive relationship for the Kalman gain G can be derived:
G_i = G_{i-1} + \Phi_{i-1}^{+} \begin{bmatrix} \Gamma_{z,i-1}(1) \\ \vdots \\ \Gamma_{z,i-1}(n) \end{bmatrix} [\Gamma_{z,i-1}(0)]^{-1},   (51)
where \Phi_i^{+} is the pseudoinverse of the matrix \Phi_i defined as
\Phi_i = \begin{bmatrix} CA \\ CA(I - G_i C)A \\ \vdots \\ C[A(I - G_i C)]^{n-1} A \end{bmatrix}.   (52)
Equation (51) gives the recursive algorithm for the estimation of the Kalman gain matrix G_i for the time interval T_i on the basis of the innovation sequence correlation function \Gamma_{z,i-1} estimated in the preceding time interval T_{i-1}. It can be proved [9] that the above sequence of Kalman gains converges to G_{opt}.
5.3 Adaptive Kalman filtering
The adaptive Kalman filter utilizes this algorithm to estimate the gain of the Kalman filter over a certain period of time (the adaptation interval), processes the noisy signal with the gain estimated during the previous adaptation interval, and simultaneously estimates the gain for the next adaptation interval, according to the adaptation scheme illustrated in Figure 3. The adaptive Kalman filter gain G_i is computed for each consecutive adaptation step according to the recursive algorithm (51). The adaptive Kalman filtering of an observed signal in the presence of coloured noise can be summarized as follows:
(i) Filter the signal using equations (3), (4), and (46) during the current adaptation interval T_i with the Kalman gain G_i estimated during the adaptation step T_{i-1}.
(ii) Estimate simultaneously (within T_i) the correlation function \Gamma_{z,i} of the innovation sequence z_k.
(iii) At the end of the current adaptation interval T_i, calculate (from equations (51) and (52)) the estimate G_{i+1} of the Kalman gain to be used for the next adaptation interval T_{i+1}.
(iv) Repeat steps (i), (ii) and (iii) for the consecutive adaptation intervals.
The structure of the adaptive Kalman filter which processes signals according to the adaptation scheme described above is illustrated in Figure 5; a code sketch of the gain update follows the figure.
Figure 5. Another structure of the adaptive Kalman filter.
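The gain update (51)-(52) can be sketched as follows for the scalar observation case; the pseudoinverse handles the generally non-square \Phi_i. This is an illustrative reading of the algorithm, with our own function name and argument conventions, not the authors' code:

```python
import numpy as np

def update_gain(G, A, C, Gamma_z, n):
    """One gain update step (51)-(52). Gamma_z[j] holds the innovation
    correlation Gamma_z(j), j = 0..n, estimated over the last interval."""
    I = np.eye(A.shape[0])
    # build Phi_i of (52): rows C [A(I - G C)]^j A for j = 0..n-1
    rows, Mpow = [], A.copy()
    for j in range(n):
        rows.append((C @ Mpow).ravel())
        Mpow = A @ (I - G @ C) @ Mpow
    Phi = np.vstack(rows)
    rhs = np.array([Gamma_z[j] for j in range(1, n + 1)]) / Gamma_z[0]
    return G + (np.linalg.pinv(Phi) @ rhs).reshape(G.shape)
```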
§6 Signal models for the simulation purposes
In the overwhelming majority of practical applications, an observed pure signal as well as an additive noise can be characterized as signals which contain frequencies from zero up to a certain upper frequency, usually with a uniform distribution of power. Therefore, it is natural to assume that the signal and the measurement noise are formed from Gaussian white noise by filtering it with low-pass filters. For our simulation purposes, we assumed that the shaping filters can be described by the following first order transfer functions:
a) for the pure signal,
H_s(j\omega) = \frac{1}{1 + j\omega/\omega_s},   (53)
where \omega_s is the upper 3dB frequency of the signal spectrum;
b) for the additive coloured measurement noise,
H_n(j\omega) = \frac{1}{1 + j\omega/\omega_n},   (54)
where \omega_n is the upper 3dB frequency of the noise spectrum.
In order to transform the frequency characteristics of the given analog systems into a discrete time domain description, several methods [5,14] can be applied. Using the discretization of continuous-time state equations, the following state space
description of the digital shaping filters with the frequency responses (53), (54) is obtained:
x_{1,k+1} = A_1 x_{1,k} + B_1 \xi_k — for the pure signal,   (55)
x_{2,k+1} = A_2 x_{2,k} + B_2 \eta_k — for the coloured noise,   (56)
v_k = C_1 x_{1,k} + C_2 x_{2,k} — the noisy signal,   (57)
where A_1 = \exp(-\omega_s T), A_2 = \exp(-\omega_n T), B_1 and B_2 are coefficients determining the covariances of the signal and noise respectively, and T = 1/f_0 is the sampling period. The matrix C = [C_1 \; C_2] in the observation equation (46) is assumed to be [1 \; 1]. This can be done without loss of generality, because the matrix C affects only the signal to noise ratio, which can be preset by the appropriate choice of the signal and noise covariances through the coefficients B_1, B_2.
The signal and noise model described above has been used for the performance evaluation of the adaptive Kalman filter designed to process a signal in the presence of coloured noise, as described in Section 5 of this paper. A similar model has been used for evaluating the fully adaptive Kalman filter developed in Section 4 of the paper; this model assumes a pure signal with the dynamic characteristics described by equations (53) and (55), and white measurement noise. The output signal from the described models has been used as the input signal for the adaptive Kalman filter, to simulate the derived adaptive Kalman filtering processes under several different conditions.
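The shaping-filter model (55)-(57) is straightforward to simulate. The sketch below generates a noisy test signal, with the variance coefficients B_1, B_2 set from an assumed signal to noise ratio (our own parameterization, for illustration only):

```python
import numpy as np

def simulate_signal(n_samples, f0, ws, wn, snr, rng=None):
    """Generate v_k from the shaping filters (55)-(57) with C = [1 1].
    f0: sampling frequency; ws, wn: 3dB frequencies; snr: power ratio."""
    rng = rng or np.random.default_rng(0)
    T = 1.0 / f0
    A1, A2 = np.exp(-ws * T), np.exp(-wn * T)
    # choose B1, B2 so that the steady-state variances have ratio snr
    B1 = np.sqrt(snr * (1 - A1**2))
    B2 = np.sqrt(1.0 * (1 - A2**2))
    x1 = x2 = 0.0
    v = np.empty(n_samples)
    for k in range(n_samples):
        x1 = A1 * x1 + B1 * rng.standard_normal()  # (55) pure signal state
        x2 = A2 * x2 + B2 * rng.standard_normal()  # (56) coloured noise state
        v[k] = x1 + x2                             # (57) observed signal
    return v
```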
§7 Performance measures for the adaptive Kalman filter
Many authors who have devoted their work to adaptive Kalman filtering measured the performance of the adaptive Kalman filter by examining the differences between the estimated parameters of the Kalman filter and their optimum values. Such approaches do not directly address the errors of the adaptive Kalman filter, because the relationship between the increase of the state estimation error and the deviation of the parameters is strongly nonlinear. Therefore, a measure of the adaptation quality based on the mean square errors of the state estimation is used in this paper. Since the canonical system representation is assumed, the filtering error is equal to the state estimation error, defined respectively by
\Delta x = E\{(x_k - \hat{x}_{k,k})^2\} — for the white noise case,   (58)
\Delta x_1 = E\{(x_{1,k} - \hat{x}_{1,k,k})^2\} — for the coloured noise case,   (59)
where E\{\cdot\} denotes the expectation calculated over all signal samples within the filtering time (i.e., intervals T_1, T_2, \ldots). In order to determine the performance of the adaptive Kalman filter, two important measures of the filter quality have been introduced. Let these measures be defined as follows:
a) The relative filtering error of the adaptive Kalman filter for the white noise case,
\Delta x_{rel} = [\Delta x - P]/P,   (60)
where P is the (scalar) covariance of the state estimation error for the optimum Kalman filter (G = G_{opt}) and can be calculated from (10); and for the coloured noise case,
\Delta x_{1,rel} = [\Delta x_1 - P_{11}]/P_{11},   (61)
where P_{11} is the (scalar) covariance of the error in estimating the state variable x_1 for the optimum Kalman filter (G = G_{opt}), which can also be calculated from (10).
b) The improvement coefficient of the adaptive Kalman filter,
IM = E\{\eta_k^2\}/\Delta x — for the white noise case,   (62)
where E\{\eta_k^2\} = R is the white noise covariance, and
IM = E\{x_{2,k}^2\}/\Delta x_1 — for the coloured noise case,   (63)
where E\{x_{2,k}^2\} is the coloured noise covariance.
The relative estimation error \Delta x_{rel} describes the relative increase of the state estimation error (or the pure signal filtering error) for the adaptive Kalman filter in comparison with the optimum Kalman filter.
It should be noticed that the adaptive Kalman filters (Figures 4 and 5) always give greater errors than the optimum Kalman filter. On the other hand, the optimum Kalman filter is always better than the case when no filtering is applied at all. Thus, in certain circumstances the adaptive Kalman filter can give greater errors than the noise itself would cause (no filtering). Therefore, it is very useful to compare the errors of the adaptive Kalman filter with the errors which would appear if no processing of the signal were applied at all. Such a comparison can be made with the improvement coefficient (62) or (63). The improvement coefficient IM determines how many times the mean square error of the signal after processing by the adaptive Kalman filter is smaller than the error of the unprocessed signal, in other words, how many times the adaptive Kalman filter reduces the observation error. It is obvious that for an application for which IM < 1, the adaptive Kalman filter generates a greater error than the noise itself causes, so signal processing with the adaptive Kalman filter would be unreasonable.
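In a simulation, both measures are simple sample statistics; a small sketch follows (our own helper, assuming the true state sequence is available from the simulator):

```python
import numpy as np

def performance_measures(x_true, x_est, noise, P_opt):
    """Relative filtering error (60) and improvement coefficient (62)."""
    dx = np.mean((x_true - x_est) ** 2)  # Delta x of (58)
    dx_rel = (dx - P_opt) / P_opt        # (60)
    im = np.mean(noise ** 2) / dx        # (62): E{eta^2} / Delta x
    return dx_rel, im
```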
§8 Simulation results
The adaptive Kalman filtering algorithms described above have been tested using 70000 samples of the signal with additive noise. The observed signal v_k was generated by the model presented in Section 6 of this paper. The parameters changed during the tests were:
— the adaptation step (T_{ad-st} = 1000, 2000, 5000, 10000),
— the ratio (f_0/\omega_s = 2.5 to 10000) of the sampling frequency to the upper 3dB angular frequency of the pure signal, in the case of white additive noise,
— the ratio (\omega_s/\omega_n = 0.001 to 1000) of the upper 3dB angular frequency of the pure signal to the upper frequency of the noise, in the case of coloured noise,
— the signal to noise ratio (\zeta = 0.1, 1, 10, 100, 1000), defined as the ratio of the covariance of the pure signal to the covariance of the noise:
\zeta = E\{(x_k)^2\}/E\{\eta_k^2\} — for the white noise case,   (64)
\zeta = E\{(x_{1,k})^2\}/E\{(x_{2,k})^2\} — for the coloured noise case.   (65)
The tests proved the usefulness of the adaptive Kalman filtering algorithms in the processing of a variety of signals, especially those having small signal to noise ratios. Some examples of the simulation results are presented next in this section; complete test results with detailed analysis can be found in [15,16].
8.1 White noise case (Section 4)
Each diagram in Figure 6 represents the relative estimation error \Delta x_{rel} as a function of the ratio f_0/\omega_s of the sampling frequency to the upper 3dB frequency of the signal. The parameter of the curves is \zeta, the signal to noise ratio defined by (64). The steady state relative estimation error \Delta x_{rel} decreases with the increase of the length of the adaptation step T_{ad-st}. Therefore, in order to attain the minimum estimation error, the adaptation step should be chosen as large as possible. On the other hand, a longer adaptation step results in slower adaptation of the Kalman filter to changes in the signal and noise parameters. Thus, a compromise has to be reached when choosing the adaptation step for a particular application. Figure 6 also shows that the curves of the relative estimation errors exhibit distinct minima with respect to the ratio f_0/\omega_s. Therefore, these curves can be used to determine the best sampling frequency of the signal for a particular application of the adaptive Kalman filter.
Figure 6. Relative filtering errors for the adaptive Kalman filter (white noise case); curves 1)-4) for \zeta = 1, 10, 100, 1000.
Figure 7 presents the improvement coefficients as a function of the frequency ratio f_0/\omega_s, with the parameter \zeta, for two adaptation steps (1000 and 10000). It is evident from the diagram that the improvement coefficient increases slightly with the increase of the adaptation step. Thus, from a noise reduction point of view, it is better to apply a longer adaptation step.
Figure 7. Improvement coefficients for the adaptive Kalman filter (white noise case); curves 1)-4) for \zeta = 1, 10, 100, 1000.
It should be pointed out that for certain values of the signal parameters (f_0/\omega_s, \zeta) the improvement coefficient can become smaller than 1, so the adaptive Kalman filter would be ineffective. For instance, when the signal to noise ratio is \zeta = 1000, the adaptive Kalman filter does not reduce the measurement noise for ratios f_0/\omega_s smaller than 100. In order to make the filter effective for that particular application, a designer must increase the sampling frequency.
Consequently, the (f_0/\omega_s, \zeta) plane of the signal parameters (Figure 8) can be divided into two regions: A, where IM < 1 and the adaptive Kalman filter should not be applied, because it would increase the measurement errors; and B, where IM > 1 and the filter will reduce observation errors. The boundary between these regions shown in Figure 8 was obtained for an adaptation step of length 1000. When the length of the adaptation step increases, the boundary moves slightly upwards in the parameter plane.
Figure 8. Threshold for the application of the adaptive Kalman filter (white noise case).
8.2 Coloured noise case (Section 5)
The simulations performed for adaptive Kalman filtering in the presence of coloured noise were very similar to those performed for the white noise case. The parameters of the signal and noise models were chosen in the same way; however, the sampling frequency f_0 was kept constant and equal to 1 kHz. Figure 9 illustrates the relative filtering errors \Delta x_{1,rel} for the adaptive Kalman filtering algorithm derived in Section 5 of this paper. Each diagram in Figure 9 represents the relative estimation error \Delta x_{1,rel} as a function of the ratio \omega_s/\omega_n of the upper signal frequency to the upper 3dB frequency of the noise. The parameter of the curves is \zeta, the signal to noise ratio defined by (65). Similarly to the white noise case, it can be observed that the steady state relative estimation error \Delta x_{1,rel} decreases with the increase of the length of the adaptation step T_{ad-st}. When the spectra of the signal and noise are very close to each other (\omega_s/\omega_n \approx 1), the estimation of the Kalman filter parameters is no longer efficient, the filtering errors become very large, and therefore this adaptive algorithm cannot be used.
Figure 9. Relative filtering errors for the adaptive Kalman filter (coloured noise case); curves 1)-3) for \zeta = 0.1, 10, 1000.
Figure 10 presents the improvement coefficients, defined by (63), as a function of the frequency ratio \omega_s/\omega_n with the parameter \zeta, for the adaptation step equal to 10000.
Figure 10. Improvement coefficients for the adaptive Kalman filter (coloured noise case); curves 1)-3) for \zeta = 0.1, 10, 1000.
It can be noticed from the diagram that, again similarly to the white noise case, for certain values of the signal parameters (\omega_s/\omega_n, \zeta) the improvement coefficient can become smaller than 1, so the adaptive Kalman filter would be ineffective. Thus, Figure 11 shows two regions of the (\omega_s/\omega_n, \zeta) plane: one where the adaptive Kalman filter is ineffective, and a second where the filter will reduce the measurement noise. The boundary between these two regions shown in Figure 11 was obtained for the adaptation step equal to 10000.
Figure 11. Threshold for the application of the adaptive Kalman filter (coloured noise case).
§9 Modifications of the adaptation scheme
The adaptive filtering algorithms which work according to the adaptation scheme illustrated in Figure 3 respond to variations of the signal parameters with a certain delay. This delay is always greater than the length of the adaptation step. Choosing the smallest possible adaptation interval is the only way to shorten the reaction time of the adaptive Kalman filter. However, when the adaptation interval is chosen too small, the number of signal samples becomes insufficient to obtain a good estimate of the correlation function, and the errors of the filter increase. This problem can be solved by a modification of the adaptation scheme, illustrated in Figure 12.
Figure 12. First modification of the adaptation scheme.
First, N signal samples are loaded into the processor memory, and during the time interval between samples N and N+1 the following computations must be performed:
(i) estimation of the correlation function \Gamma of the input signal (or of the innovation sequence),
(ii) calculation of the signal model parameters from (31), (36), and (39),
(iii) solution of the Riccati equation (8) or (9) and calculation of the Kalman gain from (7) or (51).
When sample N+1 of the signal appears at the input, the first memorized signal sample is replaced in memory by sample N+1. Next, sample N+1 is filtered using (3), (4), (5) or (46) with the previously calculated parameters, and the processor performs computations (i), (ii), (iii) to determine the Kalman filter parameters for the next signal sample N+2. This procedure is repeated for each consecutive signal sample.
Even though the algorithm based on this first modification of the adaptation scheme responds relatively quickly to variations of the signal parameters, it has the following disadvantages:
— the signal processor must have sufficient memory to store N signal samples (for practical applications N has to be much greater than 1000 to obtain a good estimate of the correlation function),
— all calculations (i), (ii), (iii) must be completed within one sampling period; this constraint is very difficult to satisfy, especially for high sampling frequencies (the operation which consumes most of the computing time is the correlation function estimation (i), because it requires 3N multiplications and 3N additions).
Figure 13. Second modification of the adaptation scheme.
In order to overcome these problems, the second modification (Figure 13) of the adaptive filtering algorithm has been introduced. The processing is now performed with constant parameters of the Kalman filter within each adaptation interval (as in the adaptation scheme illustrated in Figure 3), but the estimate of the signal correlation function is calculated using signal samples from a number L of the preceding adaptation intervals. The correlation function of the input signal (or of the innovation sequence) is estimated successively for each adaptation interval using algorithm (26) and loaded into memory (let \Gamma_m denote the estimate obtained for the interval T_{m-1}). Only the last L estimates \Gamma_{m-L+1}, \ldots, \Gamma_m are stored in the processor memory; whenever a new estimate \Gamma_m is calculated, it replaces the oldest one, \Gamma_{m-L}, in memory. At the end of each adaptation interval T_{m-1} the average estimate is calculated from
\Gamma_{am} = \frac{1}{L} \sum_{i=m-L+1}^{m} \Gamma_i \quad \text{for } m \ge L, \qquad \Gamma_{am} = \frac{1}{m} \sum_{i=1}^{m} \Gamma_i \quad \text{for } m < L.   (66)
After the average estimate \Gamma_{am} has been calculated, this estimate and equations (31), (36), (39), (9), and (7) or (51) are used to compute the Kalman filter parameters for the next adaptation interval T_m. The second modification of the adaptation scheme (Figure 13) combines the advantages of both previously mentioned schemes (Figures 3 and 12). Since shorter adaptation intervals can be applied, the algorithm responds relatively quickly to changes in the signal parameters and is recommended for processing nonstationary signals. Moreover, the computational burden and memory requirements of the scheme of Figure 12 are significantly reduced, because the correlation function is estimated on-line for each consecutive adaptation interval using the recursive form (26), and only L values of the correlation function estimates have to be stored in the processor memory.
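The averaging in (66) amounts to a sliding window over per-interval correlation estimates; a compact sketch follows (illustrative only, using Python's standard deque):

```python
from collections import deque
import numpy as np

class AveragedCorrelation:
    """Keep the last L per-interval correlation estimates and average them (66)."""
    def __init__(self, L):
        self.window = deque(maxlen=L)  # oldest estimate drops out automatically

    def add_interval_estimate(self, gamma_m):
        self.window.append(np.asarray(gamma_m))

    def average(self):
        # (1/L) sum of the last L estimates, or (1/m) over all m if m < L
        return np.mean(np.stack(self.window), axis=0)
```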
References
[1] Anderson, B. D. O. and J. B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, 1979.
[2] Chen, G. and C. K. Chui, A modified adaptive Kalman filter for real-time applications, IEEE Trans. on Aero. Electr. Sys. 27 (1991), 149-154.
[3] Chui, C. K. and G. Chen, Kalman Filtering with Real-Time Applications, Springer-Verlag, New York, 1987.
[4] Dorf, R. C., Modern Control Systems, Addison-Wesley, Reading, MA, 1980.
[5] Franklin, G. F. and J. D. Powell, Digital Control of Dynamic Systems, Addison-Wesley, Reading, MA, 1981.
[6] Gelb, A., Applied Optimal Estimation, M.I.T. Press, Cambridge, MA, 1974.
[7] Goodwin, G. C. and K. S. Sin, Adaptive Filtering Prediction and Control, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[8] Kalman, R. E. and R. S. Bucy, New results in linear filtering and prediction theory, Trans. of ASME, Series D, J. of Basic Eng. 83 (1961), 95-108.
[9] Mehra, R. K., On the identification of variances and adaptive Kalman filtering, IEEE Trans. on Auto. Contr. 15 (1970), 175-184.
[10] Mehra, R. K., On-line identification of linear dynamic systems with applications to Kalman filtering, IEEE Trans. on Auto. Contr. 16 (1971), 12-21.
[11] Mehra, R. K., An algorithm to solve matrix equations PH^T = G and P = \Phi P \Phi^T + \Gamma\Gamma^T, IEEE Trans. on Auto. Contr. 15 (1970), 600.
[12] Sasiadek, J. Z. and P. J. Wojcik, On the design of a discrete Kalman filter for a tactile sensor in a feedback control system, Int'l J. of Control 45 (1987), 1211-1228.
[13] Sasiadek, J. Z. and P. J. Wojcik, Tactile sensor signal processing using an adaptive Kalman filter, Proc. of IEEE Conf. on Robot. Autom., March 30 - April 3, 1987, Raleigh, North Carolina, 1753-1759.
[14] Stanley, W. D., G. R. Dougherty, and R. Dougherty, Digital Signal Processing, Reston Pub., Reston, VA, 1984.
[15] Wojcik, P. J., On the design of an adaptive Kalman filter for on-line processing of sensor signals, Proc. of IEEE Conf. on Decis. Contr., Dec. 9-11, 1987, Los Angeles, CA, 1605-1611.
[16] Wojcik, P. J., On-line estimation of signal and noise parameters with application to adaptive Kalman filtering, Circ. Sys. Sign. Proc. 10 (1991), 137-152.
Piotr J. Wojcik
Advanced Computing and Engineering Department
Alberta Research Council
Calgary, Alberta, Canada
[email protected]
Suboptimal Kalman Filtering for Linear Systems with Non-Gaussian Noise
Huaiyi Wu and Guanrong Chen
Abstract. This article gives a brief overview of the suboptimal filtering of discrete-time linear systems which have non-Gaussian state and/or measurement noises, under the condition that the state prediction density function can either be expressed as, or be well approximated by, a finite sum of Gaussian distributions. The representative approaches of Sorenson and Alspach and of Masreliez are reviewed in some detail. A few new modifications and generalizations of their approaches under the MMSE criterion are described, all of which are recursive algorithms with significant improvements in computational efficiency and filtering performance and with the capability of handling more general random environments.
§1 Introduction
Consider a discrete-time linear dynamic system consisting of a state equation and a measurement equation in the following form:
x_{k+1} = A_k x_k + \xi_k, \qquad v_k = C_k x_k + \eta_k, \qquad k = 0, 1, 2, \ldots,   (1)
where x_k is an n x 1 state vector, v_k a q x 1 measurement (data) vector, A_k and C_k deterministic matrices, and \{\xi_k\} and \{\eta_k\} both noise sequences. Under the conditions that the initial state vector x_0 has a Gaussian distribution with known mean vector and covariance matrix, that the two noise sequences are mutually independent and Gaussian with given mean vectors and covariance matrices, and that both are mutually independent of the initial state vector, an optimal recursive filtering scheme can be obtained using the available measurement data \{v_k\}, where the optimality is in the
sense of Kalman, namely: the estimates of the unknown state vectors at all instants are linear (in the data), unbiased (with the same means as the true state vectors), and of minimum estimation-error covariance. More precisely, denoting the state estimation vectors by \{\hat{x}_k\}, this optimal filtering scheme, known as the Kalman filtering algorithm, is given by
\hat{x}_0 = E\{x_0\}, \qquad \hat{x}_k = A_{k-1}\hat{x}_{k-1} + G_k(v_k - C_k A_{k-1}\hat{x}_{k-1}),   (2)
where E\{x_0\} is the mean vector of the initial state and G_k is the Kalman gain matrix calculated recursively via
P_0 = \mathrm{Cov}\{x_0\}
P_{k,k-1} = A_{k-1} P_{k-1} A_{k-1}^T + Q_{k-1}
G_k = P_{k,k-1} C_k^T (C_k P_{k,k-1} C_k^T + R_k)^{-1}
P_k = (I - G_k C_k) P_{k,k-1},   (3)
where
£{(x fc - x fc ) T (x fc - x fc )|V fc } = min £{(x fc - x fc ) T (x fc - x*)|V fc } ,
(4)
Vfc = { v l j v 2 , - - , v f c } ,
(5)
and is given by the conditional expectation of the unknown state vector: xfc = £{x fc |V fc } =
oo
/
x fc p(x fc |V fc )dx fc ,
(6)
-oo
where p(xfc|Vfc) is the conditional density function of x^ given the data set Vfc For more detals, the reader is referred to Jazwinski [9], Anderson and Moore [1], Chui and Chen [6], or Catlin [4]. In applications, however, the two noise sequences {£, } and {TL} may not be Gaussian. In this case, the optimal Kalman filtering algorithm cannot be applied since in the derivation of the Kalman filter the Gaussian distribution of the noise is essential. Practical experience has shown that if we use the Kalman filtering scheme (2) for the linear system (1) with non-Gaussian noise sequences, the performance of the filter may be very unsatisfactory, see for example Tukey [17] and Hubert
Kalman Filtering for Non-Gaussian Noise
115
[8]. Hence, in this case suboptimal or "robust" (in the sense of statistics) Kalman filtering becomes necessary. Concerned with the special yet quite frequent case where the non-Gaussian noise distribution can be expressed as, or approximated very well by, a finite sum of known Gaussian distributions (or, a "Gaussian sum'' for short), some Kalman-like filtering algorithms have been derived by, for example, Sorenson and Alspach [15] and Masreliez [13]. In particular, Sorenson and Alspach [15] considered the case where both the state and measurement noise sequences are wide-sense stationary non-Gaussian which have a uniformly convergent series expression in terms of some known Gaussian distributions. Prom a mathematical point of view, all the Gaussian terms in the series must have the same covariance in order to ensure the uniform convergence of the random series. For their algorithm to be practical, Sorenson and Alspach suggested to truncate the series up to certain order and then fix the number of terms in the truncated series. Furthermore, in order to have a better approximation after the truncation, they assumed that each Gaussian term in the truncated finite sum can have different means and covariances, and these means and covariances as well as the combination coefficients of the finite sum are all explicitly known. Under these conditions, an optimal filtering scheme using the MMSE crite rion was derived in their paper, which we will further discuss in the next section. To this end, we would like to mention that the idea of using a finite sum of Gaussian distributions to approximate a non-Gaussian distribution can be traced back to, for example, Aoki [2], Camerson [3], Lo [11] and, later, Fugerson [7]. One major dis advantage of Sorenson and Alspach's approach is that the numerical computation in the filtering process increases almost exponentially in time, as will be seen from the description of their algorithm given below in the next section. To overcome this computational complexity issue, on the other hand, Masreliez [13] proposed another approach, under some stronger assumptions, such as either the state or the measurement noise is Gaussian, and the one-step ahead prediction density function are both Gaussian. Then, by introducing a so-called score function, which charac terizes the deviation of the non-Gaussian distribution from the Gaussian one, he derived two filtering schemes for the two different cases, namely: when the state noise is Gaussian but the measurement noise is not, and vice versa. This approach saves considerable computation time and effort, but raises a new problem that one has to handle a rather difficult convolution operation involving the nonlinear score function. We will discuss this approach in somewhat detail in the next section, while it may be worth mentioning here that the convolution operation with a score function was suggested to be approximated or replaced by Martin and Masreliez [12] and Masreliez and Martin [14], and has also been evaluated by Tsai and Kurz 12] and Masreliez and Martin [16] and Wu and Kundu [21]. [14], and has also been evaluated by Tsai and Kurz 16] 6] and Wu and Kundu [211. [21]. To take advantage of the aforementioned two approaches and to avoid some of their difficulties, we suggested, in Wu and Kailash [20] and Wu and Chen [18,19], to approximate the state prediction distribution by a finite sum of known Gaussian
116
H. Wu and G. Chen
distributions and, with the help of this approximation, to obtain suboptimal Kalman filtering schemes (under the MMSE criterion) for those linear systems whose initial state and measurement as well as state noises can be expressed as finite sums of Gaussian random vectors and those linear systems whose state (or, respectively, measurement) noise sequence can otherwise be arbitrary. The resulting filtering algorithms resolve the computational burden of Sorenson and Alspach's schemes and contains Masreliez' scheme as a special case, and improve significantly the estimation performance of Masreliez's algorithm as has been demonstrated by the computer simulations shown in Wu and Kailash [20]. This new approach, yielding several similar but different suboptimal Kalman filtering schemes for different cases, will be summarized and discussed in Section 3. §2 Suboptimal filtering for linear systems with non-Gaussian noises In this section, we review some existing suboptimal Kalman filtering schemes that were derived for linear systems under different assumptions on the noise sequences, as described in the last section. More specifically, we will first review the approach of Sorenson and Alspach, and then the one proposed by Masreliez. 2.1 Sorenson and Alspach's approach Consider the linear dynamic system (1), with the initial state xo and the state as well as measurement noise sequences {£ } and {n, } are non-Gaussian but all can be represented as, or approximated by, finite sums of known Gaussian random vectors in an explicit manner to be described precisely below. In what follows, by non-Gaussian we will always mean not Gaussian nor Gaussian sums. To simplify the notation in our discussion of the suboptimal filtering scheme of Sorenson and Alspach [15], we only consider the scalar case in the following. Let, as usual, N(m, a) denote the Gaussian density function with mean m and variance a, and assume that the distribution density functions of xo, £fc and T]k are given respectively by to
p(x0) = J2 ot0iN{x0i, S0i)
(7)
i=l
(8)
p(tk) = Y,^N(tki,Qki) i=i ii
P(%) = ^ ^ • A r ( % « - R " ) >
(9)
i=l
where ~ indicates the mean and {aoi,/3fci, Hki} are nonnegative constants satisfying la
^Q0t = l, i=l
h
5Z
i=l
e2 /9fci = 1
and
X^ f c < t=l
= 1
fOT k =
Mi2'"
117
Kalman Filtering for Non-Gaussian Noise
Here, all quantities on the right-hand side such as the means and variances are assumed to be given a priori. Under these conditions, the state prediction density function p(xk\Vk~1) can be expressed as a finite sum of Gaussian density functions:
pixklV^1) =
J2akiN(xki,Mki),
(10)
i=i
where all quantities on the right-hand side can be computed using the given informa tion (see Sorenson and Alspach [15] for more details). Then, after this setting-up, the filtering scheme of Sorenson and Alspach under the MMSE criterion can be described as follows: ' xTo, 0 = E{x0i} <
= xoi k
i = 1, ■ ■ ■, to
= ^2 Y2 bkijikij ,
ixkt = E{xk\V }
(11)
where t h e coefficients {bkij } and t h e estimates {£kij } are calculated via t h e following formulas: For each k — 0 , 1 , 2 , • • •, and all i = 1, • ■ ■, 7> and j = 1, • • •, t%, c o m p u t e /■
—
' uVkij = CkXki + fjkj Ckij = CkMki a
+ Rkj aki^kjN(vkij,akij)
< 6
E r = l J2%1
OtkrUksNiVkrs,
Okrs)
MkiCk
x
T h e filtering error covariance is given by ' P„ = C!tmtr.«A = .„_.
■, = ! . . . />„
(12) Pkij = Mki
-
Ml&
MkiCl + Rkj
Using this algorithm, the one-step ahead prediction \bar{x}_{k+1} can be obtained as
\bar{x}_{k+1} = A_k \hat{x}_k + \bar{\xi}_k, \quad \text{where } \bar{\xi}_k = \sum_i \beta_{ki}\bar{\xi}_{ki}.   (13)
This algorithm has many advantages, such as the capability of handling a very general situation, namely when both the state and measurement noise sequences as well as the initial state vector are Gaussian sums, which is already not easy. Some other merits can be found in Sorenson and Alspach's paper [15]. In this algorithm, however, we notice that both \hat{x}_k and P_k are nonlinear functions of the data V^k. We also notice that this filtering scheme has no recursive structure, and hence does not maintain one of the main characteristics of the Kalman filtering algorithm. More importantly, it is computationally intensive, since the number of (finite) terms in certain summations of Gaussian distributions increases almost exponentially in time. For example, if at the kth instant the density functions p(\xi_k), p(\eta_k), and p(x_k|V^{k-1}) have \ell_1, \ell_2, and r_k terms in their Gaussian sum expressions, respectively, then the one-step ahead prediction density function p(x_{k+1}|V^k) will have r_{k+1} = r_k \times \ell_1 \times \ell_2 terms in its Gaussian sum expression, so that at the next instant the new state prediction density function p(x_{k+2}|V^{k+1}) has a total of r_{k+2} = r_k \times (\ell_1 \times \ell_2)^2 terms. This is certainly a heavy burden for real-time applications. To improve this, Masreliez [13] suggested a different approach by imposing a stronger assumption on the statistical property of the state prediction density function. We will give a brief review of his method in the next subsection.
2.2 Masreliez' approach
Two cases were considered in Masreliez [13]: (i) the initial state is Gaussian with known mean and covariance, the state noise is zero-mean Gaussian with known covariances, the measurement noise is non-Gaussian with known distribution, and the one-step ahead prediction is assumed to have a Gaussian distribution; and (ii) the state noise is non-Gaussian with known means and covariances, yet the measurement noise is zero-mean Gaussian with known covariances, where the one-step ahead prediction is unknown and the initial state can be arbitrary.
For the first case, let the covariances of the zero-mean state noise, which is assumed to be white and Gaussian, be E\{\xi_k \xi_j^T\} = Q_k \delta_{kj}, where \delta_{kj} = 1 if k = j and 0 otherwise, k, j = 0, 1, \ldots. Then, under the MMSE criterion and the additional assumption that the one-step ahead prediction density function at each instant is Gaussian with mean and covariance matrices denoted by \bar{x}_k and M_k, respectively, Masreliez derived the following filtering scheme:
\hat{x}_0 = E\{x_0\}
\bar{x}_k = A_{k-1}\hat{x}_{k-1}
\hat{x}_k = \bar{x}_k + M_k C_k^T g_k(v_k),   (15)
k = 1, 2, \ldots, where the covariance matrices M_k can be recursively updated by
P_0 = \mathrm{Cov}\{x_0\}
M_k = A_{k-1} P_{k-1} A_{k-1}^T + Q_{k-1}
G_k(v_k) := [G_{ij}] = \frac{\partial}{\partial v_k^T}\{g_k(v_k)\}
P_k = M_k - M_k C_k^T G_k(v_k) C_k M_k,   (16)
k = 1, 2, \ldots, in which \{\cdot\}_i is the ith component of the vector, and
\{g_k(v_k)\}_i = -\left[\frac{\partial p(v_k|V^{k-1})}{\partial \{v_k\}_i}\right] \Big/ \, p(v_k|V^{k-1}).   (17)
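As an illustration of the score function (17): for a measurement density that is a finite Gaussian sum, the score has a closed form. The sketch below evaluates it for the scalar case (an assumption-laden example of ours, not from the paper):

```python
import numpy as np

def score_function(v, weights, means, variances):
    """Score g(v) = -p'(v)/p(v) for a scalar Gaussian-sum density p,
    as used in Masreliez's filter, equation (17)."""
    w = np.asarray(weights); m = np.asarray(means); s = np.asarray(variances)
    comp = w * np.exp(-0.5 * (v - m) ** 2 / s) / np.sqrt(2 * np.pi * s)
    p = np.sum(comp)                    # density value p(v)
    dp = np.sum(comp * (-(v - m) / s))  # derivative p'(v)
    return -dp / p
```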
In this algorithm, g_k(v_k) is called a "score function." Note that both g_k(v_k) and G_k(v_k) are nonlinear functions of the data V^k for all k = 1, 2, \ldots. Notice also that if the conditional density function p(v_k|V^{k-1}) is Gaussian, or equivalently, if both p(\xi_k) and p(\eta_k) are Gaussian, then the above scheme reduces to the standard Kalman filtering algorithm.
For the second case, under the MMSE criterion with the additional assumption that the matrices [C_k^T R_k^{-1} C_k] are nonsingular for all k = 1, 2, \ldots (which is of course not practical for most higher-dimensional systems), where R_k is the covariance matrix of the measurement noise \eta_k, Masreliez obtained the following filtering scheme:
\hat{x}_0 = E\{x_0\}
\bar{x}_k = A_{k-1}\hat{x}_{k-1}
\hat{x}_k = \bar{x}_k + T_k C_k^T R_k^{-1}[(v_k - C_k \bar{x}_k) - g_k(v_k)],   (18)
where T_k = (C_k^T R_k^{-1} C_k)^{-1}, with the estimation-error covariance matrix
P_k = T_k - T_k C_k^T G_k(v_k) C_k T_k,   (19)
in which all other notation is as above. Although this algorithm allows the initial state to be arbitrary, there is a quite serious problem with it, namely: since the state noise is non-Gaussian, the conditional density function p(x_k|V^{k-1}) is non-Gaussian, and we do not have any information about it, so that the calculation of p(v_k|V^{k-1}) is impossible in general. Consequently, neither g_k(v_k) nor G_k(v_k) can be calculated, so this algorithm has little practical value in applications.
2.3 New algorithms for suboptimal filtering
We have considered, in Wu and Kailash [20] and Wu and Chen [18], several different cases of the linear system (1), which can have Gaussian and/or non-Gaussian noise sequences and which can have noise sequences or the initial state approximated by certain Gaussian sums. In these cases, many statistical quantities, such as the one-step ahead prediction density function p(x_k|V^{k-1}), will not be Gaussian (nor a Gaussian sum) in general. We observed, however, that in the derivation of suboptimal Kalman filtering algorithms for such systems, again under the MMSE criterion, only the independence (i.e., the white property) and the first two moments (i.e., the mean and the covariance) of those quantities, e.g., p(x_k|V^{k-1}), are needed. For this reason, it is very natural to replace them by so-called "Gaussian analogues" without loss of mathematical rigor, as has been demonstrated in Liptser and Shiryayev [10] (see also Chen, Chui and Yu [5] for a more detailed argument). A Gaussian analogue of p(x_k|V^{k-1}), for example, is a Gaussian random sequence which has the same mean and covariance as p(x_k|V^{k-1}) at each k = 0, 1, \ldots. To have a better approximation, we used a finite sum of Gaussian analogues to replace this prediction density function at each instant k = 1, 2, \ldots, in the derivation of many of our new filtering schemes, as described more precisely in the following two subsections.
2.3.1 Preliminary results
In Wu and Kailash [20], we considered three cases of the linear system (1): (i) the state noise is Gaussian, the initial state is a Gaussian sum, but the measurement noise is non-Gaussian; (ii) the state noise and the initial state are both Gaussian sums but the measurement noise is non-Gaussian; and (iii) all the aforementioned three components are Gaussian sums. We describe the three corresponding new algorithms in the following.
Case (i): The state noise sequence is Gaussian, with means \{\bar{\xi}_k\} and covariances \{Q_k\}, the measurement noise sequence can be arbitrary, but the initial state density function can be expressed as a finite Gaussian sum:
p(x_0) = \sum_{i=1}^{\ell_0} \alpha_{0i} p(x_{0i}) = \sum_{i=1}^{\ell_0} \alpha_{0i} N(\bar{x}_{0i}, S_{0i})   (20)
for some known constants \{\alpha_{0i}\}, means \{\bar{x}_{0i}\}, and covariance matrices \{S_{0i}\}. In this case, the one-step ahead prediction density function p(x_k|V^{k-1}) is non-Gaussian in general for k > 1. Assume that this density function can be expressed as, or approximated by, a finite sum of the form
p(x_k|V^{k-1}) = \sum_{i=1}^{r} \alpha_{ki} p_i(x_k|V^{k-1})   (21)
for some constants \{\alpha_{ki}\} and some (perhaps non-Gaussian) density functions p_i(x_k|V^{k-1}) with mean \bar{x}_{ki} and covariance M_{ki} to be computed, respectively, where i = 1, \ldots, r. We replace p_i(x_k|V^{k-1}) by its Gaussian analogue for all i = 1, \ldots, r. Under these conditions, we have derived the following recursive filtering algorithm under the MMSE criterion:
\hat{x}_{0i} = \bar{x}_{0i}
\bar{x}_{ki} = A_{k-1}\hat{x}_{(k-1)i} + \bar{\xi}_{k-1}
\hat{x}_{ki} = \bar{x}_{ki} + M_{ki} C_k^T g_{ki}(v_k)
\hat{x}_k = \sum_{i=1}^{r} \alpha_{ki} c_{ki} \hat{x}_{ki},
where g_{ki}(v_k) is a q x 1 column vector with jth component
\{g_{ki}(v_k)\}_j = -\left[\frac{\partial p_i(v_k|V^{k-1})}{\partial \{v_k\}_j}\right] \Big/ \, p_i(v_k|V^{k-1}), \qquad p_i(v_k|V^{k-1}) = \int_{-\infty}^{\infty} p_i(x_k|V^{k-1}) \, p(v_k|x_k) \, dx_k,
and for k = 1, 2, \ldots,
c_{ki} = p_i(v_k|V^{k-1}) / p(v_k|V^{k-1})
\alpha_{ki} = \alpha_{(k-1)i} c_{(k-1)i}
P_{0i} = \mathrm{Cov}\{x_{0i}\} = S_{0i}
P_{ki} = M_{ki} - M_{ki} C_k^T G_{ki}(v_k) C_k M_{ki}
P_k = \sum_{i=1}^{r} \alpha_{ki} c_{ki} \left[P_{ki} + (\hat{x}_k - \hat{x}_{ki})(\hat{x}_k - \hat{x}_{ki})^T\right]
M_{ki} = A_{k-1} P_{(k-1)i} A_{k-1}^T + Q_{k-1},
where G_{ki}(v_k) is a q x q matrix with (j, t) entry
\{G_{ki}(v_k)\}_{jt} = \frac{\partial \{g_{ki}(v_k)\}_j}{\partial \{v_k\}_t}.
When r = 1, this filtering algorithm reduces to the first of the two filtering schemes of Masreliez discussed in the last section. In Wu and Kailash [20], an improvement of the second algorithm of Masreliez was also obtained. Recall that the second scheme of Masreliez is not realizable, since the state prediction density function therein is completely unknown. If we know a
little more about it, however, we can make it realizable. More precisely, if the state prediction density function p(x_k|V^{k-1}) can be expressed as
p(x_k|V^{k-1}) = \sum_{i=1}^{r} \alpha_{ki} p_i(x_k|V^{k-1})   (22)
for some constants \{\alpha_{ki}\} and some (perhaps non-Gaussian) density functions p_i(x_k|V^{k-1}) with mean \bar{x}_{ki} and covariance M_{ki}, respectively, i = 1, \ldots, r, then by replacing it with its Gaussian analogue for all i = 1, \ldots, r, we can obtain the following recursive filtering scheme under the MMSE criterion:
\hat{x}_{0i} = \bar{x}_{0i}
\bar{x}_{ki} = A_{k-1}\hat{x}_{(k-1)i} + \bar{\xi}_{k-1}
\hat{x}_{ki} = \bar{x}_{ki} + T_k C_k^T R_k^{-1}[(v_k - C_k \bar{x}_{ki}) - g_{ki}(v_k)]
\hat{x}_k = \sum_{i=1}^{r} \alpha_{ki} c_{ki} \hat{x}_{ki},
where
c_{ki} = p_i(v_k|V^{k-1}) / p(v_k|V^{k-1})
\alpha_{ki} = \alpha_{(k-1)i} c_{(k-1)i}
P_{0i} = \mathrm{Cov}\{x_{0i}\} = S_{0i}
T_k = [C_k^T R_k^{-1} C_k]^{-1}
P_{ki} = T_k - T_k C_k^T [C_k M_{ki} C_k^T + R_k]^{-1} C_k T_k
P_k = \sum_{i=1}^{r} \alpha_{ki} c_{ki} \left[P_{ki} + (\hat{x}_k - \hat{x}_{ki})(\hat{x}_k - \hat{x}_{ki})^T\right]
M_{ki} = A_{k-1} P_{(k-1)i} A_{k-1}^T + Q_{k-1}
g_{ki}(v_k) = [C_k M_{ki} C_k^T + R_k]^{-1}(v_k - C_k \bar{x}_{ki}).
P( x o) = ^ i=l
p(£b)
to Q
o»P(xoi) = X
Q
oi-7V(xcH, 5 0 i)
i=l
= E ^p(i«) = E fo* (L« «*> ■ i=l
i=l
where the means and covariances are known, and the measurement noise sequence can be arbitrary.
123
Kalman Filtering for Non-Gaussian Noise
The essential problem arising in this case is that the number of terms in the Gaussian sum expression of the one-step ahead state prediction density function, p(xjt|V fc_1 ), increases exponentially as k increases. However, we have found an efficient way to handle this intrinsic technical problem. To describe the main result and the new technique, we will use the same notation as above. Suppose that the state prediction density function has the expression p(x fc |V fc - 1 ) = ^ a f c t P i ( x f c | V f c - 1 )
(23)
for some constant coefficients {aki} calculated from the previous step. Then the state update density function is p(x fc |V fc ) = 5> f c i CHPi(x f c |V f c )
(24)
with Pi(x J*(xffcc |Vfc)
< Cfc, Cki : =
p(vfc|Vfc-!)
Cki
XTL, Pi (xfc|V^jp^ (vfc - CfcXfc)dxfc pCvfclV*"1)
where p,, (v^ — Ckxk) = p(?7fc). It is also clear that the next state prediction density function is r
p ( x f c + 1 | V f c ) = ^ajfciCA : iPi(x f c + 1 |V''
(25)
with
p i (x fc+1 |V Pi
fc
) = £/3 f c j P i > (x f c + 1 |V f c ) i=i OO
P«> p.
/
where p ^ ( x f c + 1 - Akxk)
Pi(x fc |V fc )p it (xfc+i - Akxk)dxk
,
= p(f fcj ) = iV(| fcj , Qkj).
Since only the first two moments of pi(x.k+i\Vk) are needed and all such density functions are mutually independent, we may replace them by their corresponding Gaussian analogues p*(xfc+i|Vfc). Here and below, we will use * to indicate the Gaussian analogue of the corresponding density function. Consequently, we have p(x fc+1 |V fc ) =
J2akickiPUxk+i\Vk i=l
(26)
124
H. Wu and G. Chen
Under this assumption, we have derived a recursive filtering algorithm under the MMSE criterion, which turns out to be the same as the one shown in case (i) above. If, instead of using the aforementioned Gaussian analogues, we assume that the one-step ahead state prediction density function p(xfc+i|Vfc) can be approximated by a more accurate formula p a (x fc+1 |V fc ) = X X a K f l y C K p ^ X f c + i l V * ) w p(x f c + 1 |V f c ), i=\
(27)
i=l
which, in turn, is approximated by r
p6(xfc+1|Vfc) =5> ( f c + 1 ) i p*(x f c + 1 |V f c ) ~p a (x fc+1 |V fc ),
(28)
i=l
where the new coefficients {a^+i)i} minimization:
are determined by the following least-squares 2
t\
mm
/
^ ^ a f c i / 3 f c j c f c i p V ( x f c + i | V f c ) - ^a(fc + i) ! p*(x f c + 1 |V f c ) i=\
subject to
dxfc + i
i=l
j=l
1
I
=1
^a{k+l)l
a{fc+i)i>0, » = 1 , • • • , r , fc = 0 , l , - - - , then, under the MMSE criterion we have also obtained a recursive filtering algorithm (at which is the same as the one shown in Case (i), except that the coefficients {a^+i)i} are determined by the above minimization at each step rather than using the formula c*(fc+i)i = QkiCki- For more details in solving efficiently the above constrained minimization problem, the reader is referred to Wu and Kailash [20]. We have observed that the filtering performance can be further improved if at each estimation step we replace the state prediction density function by p a (xfc + i | Vfc) directly rather than using p(xfc+i |Vfc) or p (, (x) c+ i|V fc ), which are both less accurate. In doing so, the resulting filtering algorithm is: ' XOi x 0 i == X()i
Xfci == Xfci
Ak-1x{k_1)i
+ lk_1
Xfci == xfci + XH MkiCkgki{^k) , Xfeij =^fc-lX(fc-l). +I ( J t _ 1 ) ; ) .
Xfcij = Xfcij +
MkijC^gkijivk)
Xfc = V V / l k i j Cfci j Xfc =
= 1 3=1
X-fcij
125
Kalman Filtering for Non-Gaussian Noise with the estimation-error covariance r
h
k'jckij [Pkij + (Xfc - Xfcy)(xfc - Xfci:,)T] .
h
Pk = 5 3 E i=l J=l
where
r
PPoi 0 = Soi
M, Mki = A f c . j P ^ . ^ A j . ! + Q f c _! ' Pfc, Pk = Mki -
Mk,CjGki^k)CkMk
M, Mkij = >lfc-iP(jt-i)i-4Li + Pkij = Mkij -
Q(k-i)j
MklJCkGkij(vk)CkMkij
in which oo
Pi]
/
1 ^(xfclV*) Pi(
(xfclV^ 1 )?,, (vfc - Cfcxfc)dxfc
-OO
^jflWrfjtxfclV*- 1 ) J=l oo
Pit Ckt, Qty
/
ftCxfclV^jRjvjt-CfcxOAcfc
-oo
*] = ^-(vfclV*- ) / X E '
cCfci ki
r
/
1
^P^KIV"-1),
2=1 J= l
=p,(v f c |V f c - 1 ) / ^ a f c i f t C v k l V * - 1 )
Afctj hki = "(*-l)i/S(fc-l)jC(fc_i),
'^(vfclV*-1)1
fffcij(vfc) 9kt,
<9v fc
^■(vfclV*-1)
Gkkij(vk) = ^r{fffcy(vfc)} 5 P l (v f c |V f c -') 3vfc 5 Gfci(v G fc) = ^- T {sfci(v fc )} k 3v*
Sfci(Vfc) = 9ki
-
Pi (v fc | V * " 1 )
where {a(fc+i)j} are calculated either by a(jt+i), = afc,Cfci or by solving the above constrained minimization at each step k = 0,1, • ■ •.
126
H. Wu and G. Chen
We remark that this filtering algorithm is a combined (and modified) version of the two schemes described in case (ii) above, which takes into account each single component p(£. ) of the state noise sequence, i = 1, • ■ • ,#i and k — 0,1, • ■-, and hence improves the performance of the filtering. However, its computation is somewhat intensive. A further improvement of this filter with regard to the computation issue is given in Wu and Chen [18]. Case (iii): The initial state and the two noise sequences are all Gaussian sums of the form to
la
p(x 0 ) = ^ a o t P ( x o ) = ^ QQj JV(x0, Soi) «=1
i=l
a
f
t=l
i=l
Vilk) = Yl ^iP(%^ = Yl VkiN^., Rki) ■ i=i
In this case, the state prediction density function takes on the form T
p(x fc |V fc - 1 ) = ^ a f c i P i ( x f c | V f c - 1 ) ,
(29)
i=i
so that the state update density function can be written as either r
p(xfc|Vfc) = ^V,c fcl p ! (x fc |V fc ) with P i C X f c l V ^ t e j i ^ P n .(Vfc-CfcXfc)] 1 L i 7 .„k_?' fc Cki p(vfc|V *) c, Sreopi(xk\vk-1) [SJLi w^Cvfc - ck*kj\ dxfc fc
Pi(Xfc|V ) = Pi
Ck
p(v fc |V fc -i)'
~ '
*2
p(x fc |V fc ) = ^ ^ a f c l p f c j c f c i j p l . , ( x * : | V f c )
(30)
«=I 3=1
with Py(x fc |V Pi.
fc
)
P t (x fc |V fc ' ) p 2 (vfe-CfcXfc) p(v fc |V fc -
/fLPi(xfc|V fc Cfct7 = Ck
l
x
Cfci
)pRkivk-Ckxk)dx.k
p(v f c |V*-i)
127
Kalman Filtering for Non-Gaussian Non- Gaussian Noise Based on all these, the next state prediction density function becomes either r
fc p(xfc+ |Vfcfc) = 5> i|V 5^afciCfciPi(x i|Vfc)) fc+1 fc ,c fcl p,(x fc+1 fc+|V
(31)
i=l
with oo
ll
/
p t (x fc |V fc ) Y^faiPL
or
r
<2
.((xxfc+i *+i _- A -4fcx kxk)dxL fc)dx k fc ,
p(x fc+ ^=1 s = l
with
sCfcijPijs
i = l >=1 s = l OO oo
/
-OO
(32) (32)
(x fc+ i|V fc )
fc pp ll JJ (xfc|V (xfc|V f c )p^(x )p^(x ff cc ++ ii -- AfcXfc)rfx AfcXfc)rfxfcfc ..
-oo
Then, replacing the one-step ahead state prediction density function Pi(xk+i\ Vfc) by its Gaussian analogue for each i = 1, ■ ■ ■, r and k = 0,1, • • •, we have derived the following recursive filtering algorithm: X 00 t ' X X 00 ll === x
Xfci xfci
Xfcy Xfcy
Ak. lX(fc-l)i + | A .; ,_ _ 1 == -4fc-iX( fc _i)i+| -l = xfc! fci 4-
=z
MkiMC^g gklj(Vfc) (vk) kiCkkiJ fc
1 P.,(v fc |V - ) • - _v> ^(vfclV*Xfci x n Xfct i ^ - Z . M f eMfci p' , p,(v _ { v f cfc j v|Vfcfc 1)xfcy
rr Xfc =- 2jQfcjCfciXfc, 2jafczCfciXfci ,, i=l ,
with 'fffcij(vfc) 'fffe«(vfc)==
[C MklHCl CfcT + ? ^ ] - 1 ^ - Cjtxy [CfckM + iRkiV^k Cjtxw ---a*] g^]
P^(v fc |V fc " -») = N(Ckxkl + vkj, CkMkxCl ^•(vfelV*- 1 ) = NiCkXki + ^CkMkiCl 1
fc 1
fc ) fc lVf c - 1 ) = J^p 1J (v fc |Vf c ' Pi(v -1) P l (v f c |V - ) = ^ ^ P . , ( v f c | V
3=1
Cfci = P . ( v ;
3=1
Cfcj =P l (v f c |V f c " 1 )
/^QHP^VfclV^1)
+ y) + «Rtj)
128
H. Wu and G. Chen
and Pot = Sot Mki = Ak-\P{k-i)iAl-i
+ Qfc-i
Pki, = Mki - Mk&lyCkMk.Ck1
+
Rkj]-lCkMki
PilvfclV 1 - 1 )
■^rj r
P fc = ^QfciCjti[Pjti + (xfc - x fci )(xfc - x f c i ) T ] , V
i=l
where {a (fc+ i)i} are calculated either by a^k+i)i = QfciCfci or by solving the following constrained minimization at each step: Minimize r
/
h
h
r
s=l
i=l
dXfc+l i=l j=l
over a(fc+i)i, subject to
I Y,a(k+Vi
=1
(Ot( Kct(k+i)i>0,
i = l,---,r,
fe
= 0, l , - - ,
in which c^=p!,(vfc|Vfc-1)/^Qfclpi(vfc|Vfc-1) '
i=i
On the other hand, if we calculate Xfc+i and Pjt+i using the state prediction density function (32), then we obtain the following (more accurate) recursive filter ing scheme:
/'
Xo,
Xfc, xki
=Ak-iZ(k-i)i
x/ti Xfc
= xti +
Xfc, Xkij
+ ik_1 MkiCjgki{\rk)
= xfcv +
MkiCkgkij{\k)
' Xkijs xfc = ^fc-l*(fc-l)y+|(fc_1),
Xhijsn = Xfcijs + MkijsCk Xfc
ifc = Xfc
r
l2
IT.
ti
YYYY 1=1 j'=:l s = l m=l
gkiJ3m{vk)
129 129
Kalman Filtering for Non-Gaussian Noise Noise where
P()i 5oi Poi = Po,= =- Soi Soi
M = A _xP +Q = ^4 ki {k fc _! fc-Ji-P(fc-i)iA-i Mfci V_x)i l Al_ ) i xA L + Qfc-i ki = ^ kn ' flw Phi fci Pfa = === Mfci ki -
MkiC^Gkiiv^CkMki
Ml3ki^k)CkMki ClGklJ{vk)CkMki Pfcij = Mfci -MkiClGk fci ki -
= AfcPfcijvir ++ Qg fcs
.. W(fc+l)>j» M(k+X)ijS = AkPkijAl = 11
ks
= N(CkkXki C ' P^VfclV^ ) = , CfcM Rkkk)) 'P,(y k% xkkii + + ^f^, %,« ?kkMk% ClfcT + +R fc,C7 k\\ ~ ) k l
//pp iiI^(vfclV*K K IlV v *^"111))
= -- ^ ^^--1 )) , 5fci(v sw(v
r
n ci(vfc)
1 -p^CvfclV^ 2^Jtj,,( ' P.jCvfclV^ iV((7fcfcxfci C-fcM k M kfciiCPfclT + + fly) Pij(yk\Vk-X) ) = = iV(C ^.,CkMkiCZ fcl + g
, Sfc SfeiiOfc) «(v fc ) = -- J^ -Pp ^,^v(IfvV *| *"V-11-)^ cffcV
/p / P iiii(v ((v v fffcc)V ))V Vffcc--11)) ///p
Gkij(vk) = —^rSfcij(vfc) Gfeii(vfc) JT2fcij(vfc) fc 1 'p P^m(v N(C n^^CkMkijsCl ' p^m(v = N(Ci^ui, 7Vr(CfcX +4-ry n >CkM ,(?kMkij I ^m(v f c |V - ) = kxki:is fcij3 kmJtm kii.cZ SC k r
,
1 ^-Q -g ^ • P^y(S v c flcV m f( V | V^f c _ 1))
= :
gkijsm(vk) gkijsm(vk) gkijSTn\yk}
-I -i
Q
= ~ Q
9
Gkijsm(yvk) k) Gkijsm( Gkijsmiyk)
'pPkijsm fcIJ s m
,
l£22
ti *i
fc X /p mC / /ppU i ijSjssm(v m ( vv |c IVV*-' '"- 11)) f fcfc|V
,
= 7T ^—^gfcyj, m (v fc ) = -K-T9kijsm(Vk)
= Mfcij's Mfcija -— M MkijsC kij3Ck k rr
+ Rkm)
/
GkSijm(Vk)CkMkij Gkij Sm(yk)CkMkijs
eii2
Pfc *k ==~
kiJ3mCkijsTn[*kiJ37n / _, / Jj // j // j^ ""J,kijsmC-kijsTn[*kiJ37n i=l j = l 8=1 m=l
cCfci = fc, ==
pPi i( (vvff cc | V ffcc--1 1) /) ^/ aVf cQi A^(Av fKc | IVVf c^- 11 )
*-kij ,m){X-k'> VX\?%k A: — XfcejsrreAXk
^Arij'^fnj *-kijsm)
a(fc+i)i = 0(fc+l)i = "fciCfci akicki hkij
sm sm
akiHkjPksflkrn jf T
Cfcijsm = Pijsm\*k\ Pijsra{-Vk\yk~▼l) Ckijsm
li2 £
I, 1,
f2
5 3 S S X S I / l/fl fcci ij jsSm mPPii^^m m((V Vffcc| V f c "- 11) . ^ I/ ^T, 1=1 ' i=l j = l s=l m=l
- )-
JJ
130
H. Wu and G. Chen
2.3.2 Recent development In Wu and Chen [18], we considered two cases of the linear system (1): (i) the state noise and the initial state are both Gaussian sums but the measurement noise can be arbitrary, and (ii) the state noise and the initial state are both non-Gaussian with known means and covariances but the measurement noise is Gaussian sum. Note that the first case here is the same as Case (ii) discussed in Wu and Kailash [20] and the second case here is an extension of the second one that Masreliez [13] studied. We now describe these two new algorithms individually in the following: Case (i): Since both the initial state and state noise are Gaussian sums, we denote to
t0
P( X O) =^2otoip(x0i)
=
i=l
^2a0iN(x0i,Soi), i=l
B
f
i=l
i=l
Notice that the computations in the scheme of Wu and Kailash [20], discussed in Case (ii) of the last subsection, which takes into consideration each single component of the state noise, are still somewhat intensive for real-time applications. To improve this, we let the state-update density function be expressed as r
ptx^lV*" 1 ) = ^ a ^ D . p . t x ^ l V * - 1 ) , i=l
where pi(xfc_i|V fc_1 ) may be non-Gaussian (and hence this expression is not re strictive) with mean x.(k-i)i a n d covariance P/k-i)!- Then, the one-step ahead state prediction density function p(xfc|V fc_1 ) can be written as p(x fc |V fc - 1 ) = £
r
i=l
h
X^fc-D^fc-iwPo'folV*-1), j=\
where oo
/
-oo
p i (x f c _ 1 |V f c - 1 )p ?
-<*-l»
(xfc -
J4fc_1xfc_1)cixfc_i,
and the next state-update density function p(xfc|Vfc) will satisfy r
p(x fc |V fc ) = ^ a f c ! p , ( x f c | V f c ) ,
131
Kalman Filtering for Non-Gaussian Noise in which k 1 ft(xt|Vfc) = J^/3 £ %ik-i)iPij(xk\V _,). -)
'
rOO
Cki=
P^(v f c - Cfcxfc)
/ckip(vk\Vk~l)
P^lvfc - Cfcxfc)dxfc
/p(vk\Vk-y)
Jl
/
^ % _
, ajti Qfci == a( Q(fcfc_i)iCfci, _i)iCfci, v
1 ) j P i
i = = 1, l ■ ■ •, r; A: = 0,1, •
Next, replacing pi(xfc|Vfc) by its Gaussian analogue for all i = 1, • • ■ , r and all fc > 0, we obtain the following recursive filtering algorithm under the MMSE criterion: Initialization:
State
XOi =
XOt
Poi =
SQ%,
i = 1,- • • , r
Estimations:
kij = xfcij + Muftigkij{vk) xfc, == 2 ^ % - i Xfci Xfc,
= X ] <*«**» !
Xfc = =
where Mkij,xkij,aki will be given in the "propagation equations" shown below, fffci>(vfc) is a <j x 1 column vector defined by gkij(vk)
=
_d_ Pii(v \Vk-1) k 3v fc '
/Pii(vk\Vk-1),
with (
1 Pi,(v ^■(vfelV*) = / f c |V
f°°
^ ( x f c l V * - 1 ) ^ ^ - Cfcxfc)dxfc
J —oo
< P,(vfcfc|V |Vfcfc - 1 ) = £/3 ( f c -i),P, J (v f c |V f c - 1 ) P.(v 3=1
132
H. Wu and G. Chen
Co-Processing: ' Pkij = Mkij -
MkijC^GklJ(wk)CkMkii
[p"i + (*" - *M*)(*« " *fcii)T]
Pki = JtPik-V'^ivk-n V
~J
P>( fc|
V
)
r
Pk = ^ « f c i [ P f c i + (xfc - Xfci)(xfc - x f c i ) T ] , i=l
in which Gk(-vk) = [Gkij(vk)] is the q x q matrix with the (i, j)th entry Gkij(vk) = Q^f{9kij(vk}} ■ Propagation: X(fc+i)ij = Ak-kki + lkj Afyb+ny = A-kPkiAl + Qkj where c(fc+i)i = Pi(v fc+ i |Vfc) / Y,
aklpi(vk+11Vfc).
Cose (ii): Since the measurement noise is a Gaussian sum, we write «2
*2
In a similar manner, with an additional assumption that the constant matrices [C^R^Ck] are nonsingular for all j = 1,■••,£2 and k = 1,2, ■■■, we obtain a recursive filtering scheme under the MMSE criterion as follows: Initialization: XOj = Zo POJ
= S0 ,
for all j = 1, • • • ,£2
State Estimations: ' Xfc x f c j = xkj
•• f xjt
Xjt = 2
+ TkjCj[Rk^(Yk
-
.MfcjCfcjXfcj;,
- Cfcxjtj - 2^.) - Sfcj(vfc)]
Kalman
Filtering
for Non-Gaussian
133
Noise
where
ckj=P3{vk\Vk-l)/p{vk\Vk-1), a n d gkj(vk)
is a q x 1 column vector defined by
^Kiv-1)
fffcj(Vfc)
'PjfvfclV*" 1 )
Co-Processing:
<
n3 = [cTR-jCk]-1 •Pfcj = TV, - TVjCfc Pk
=
Yl VkjCkj[Pkj
Gkj(yk)CkTkj + (x fc - xkj)(xk
- xkj)T]
■
J=I
Propagation: x
(fc+i)j = ^Uxjy + | ^
M ( f c + i ) i = AkPkjAl
+ Qk .
We remark, however, t h a t just like t h e second scheme of Masreliez which we discussed above, this algorithm is not implementable since t h e one-step ahead s t a t e prediction density function p(xfc|V f c _ 1 ), a n d hence t h e measurement prediction den sity function p{vfc|V fe ), is also unknown in our situation. To overcome this difficulty, we suggest t o approximate this density function b y a sum of (probably non-Gaussian) densities: r
p(xfc|Vfc-1) = ^ Q f c i P i ( x i | V f c - 1 ) i=l
with known means xk{ a n d covariances Mki, i = 1, ■ • ■ , r , a n d t h e n calculate t h e s t a t e - u p d a t e density function by p(x f c |V f c ) = ^ Q f c l c f c i P i ( x f c | V f c ) , where P . ( ^ | V f c ) = . ^ ' X l -fc n E ^ R ^ t v f c " C fc x fc ) Pi(Xk\ CWp(vfc|V -») j=l
< Cfc. = Cfc, =
f /
J —(
Pi(x f c |V fc-i\ p(vfc|V*->)
5 Z P^PzifcJ. (vfc _ C't**)
dxfc.
J=I
To this e n d , we c a n replace pi(xk\Vk~1) b y its Gaussian analogue for each i = 1, • • •, r. T h e n , a recursive o p t i m a l filtering algorithm under t h e M M S E criterion can b e obtained as follows:
134
H. Wu and G. Chen
Initialization: XOJ
POJ
= xo = So ,
for all j = 1,
t2.
State Estimations: ' *kij X;
,
X| x
= x ^ + TkjCl\Rk}{vk
- Cky-ki - 2 ^ ) - flfco(vfc)]
^-Zt^Pi(vfc|Vfc-i)x^ r
xjt = 2jajtiCfciXfci, x; i=l
where Qfci and Cfcj will be given in the "propagation equations" shown below, and ' Sfci(vfc) = [CkMkiCk1 + i&j-r^Vfc - Cfcxfcl - 5fc>) oo
/
PiCxfclV*-1)?, ( v t - C7txfc)dxfc
-oo
_fc:l
Pi(v fc |V fc - 1 ) = ^ A t f c j P l J ( v f c | V f c - 1 ) . Propagation: /■
—
' xfci = Ak-ik(k-i)i
+ £fc_x
M Mfci = i i t - i V i l ^ f c - i + Ofc-i Tk CH Ck <
= P < (v f c |V f c - 1 )/p(v f c |V f e - 1 )
Oil aki = "Kt —
a(fc-i)iC(fc_i)i "(fc-ljl^fc-ljl Pfci,- = 7/y - rfcjCfcT[CfcMfciCfcT +
PA
Rkj]~lCkTkj
T fci Ffei ^M ^ V f c | V f c _1^) ^[Pfcij ++ ((XH -~ * ±kij) y ) ( fxc * w ~ Ffei = = j2 T ^ fj c j^(vfclVfc* fci ~- *xf fee«^* ^ }
T
Pk = 5 J QfciCjti[Pfci + (xfc - xfci)(xfc - x fci ) ] • Pk
Kalman Filtering for Non-Gaussian Noise
135
Acknowledgments. Supported by the President's Research and Scholarship Find, University of Houston. References 1. Anderson, B. D. O. and J. B. Moore, Optima] Filtering, Prentice-Hall, Englewood Cliffs, N. J., 1979. 2. Aoki, M., Optimal Bayesian and min-max control of a class of stochastic and adaptive dynamic systems, in Proc. of IFAC Symp. on Sys. Engr for Contr. Sys. Design, Tokyo, 1965, 77-84. 3. Camerson, A. V., Control and estimation of linear systems with non-Gaussian a prior distributions. Proc. of the 3rd Ann. Conf. on Circ Sys Sci., 1969. 4. Catlin, D. E., Estimation, Control, and the Discrete Kalman Filter, Spring er-Verlag, New York, 1989. 5. Chen, G., C. K. Chui, and Y. Yu, Optimal Hankel-norm approximation ap proach to model reduction of large-scale Markov chains, Int. J. of Sys. Sci. 23 (1992), 1289-1297. 6. Chui, C. K. and G. Chen, Kalman Filtering with Real-Time Applications, Springer-Verlag, New York, 1st ed. 1987, 2nd ed. 1991. 7. Fugerson, J. M., Bayesian density estimation by mixtures of normal distribu tions, in Recent advances in Statistics, M.H.Rizvi et al (eds.), Academic Press, New York, 1983, 287-302. 8. Hubert, P. J., Robust statistics: a review, Ann. Math. Stat. 43 (1972), 10411067. 9. Jazwinski, A. H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970. 10. Liptser, R. S. and A. N. Shiryayev, Statistics of Random Process, Vol.2, Spring er-Verlag, New York, 1978. 11. Lo, J. L., Finite dimensional sensor orbits and optimal nonlinear filtering, Univ. of South. California Report USCAE 114, August, 1969. 12. Martin, R. D. and C. J. Masreliez, Robust estimation via stochastic approxi mation, IEEE Trans, on Inform. Th. 21 (1975), 263-271. 13. Masreliez, C. J., Approximate non-Gaussian filtering with linear state and ob servation relations, IEEE Trans, on Auto. Contr. 20 (1975), 107-110. 14. Masreliez, C. J. and R. D. Martin, Robust Bayesian estimation for the linear model and robustifying the Kalman filter, IEEE Trans, on Auto. Contr. 22 (1977), 361-371. 15. Sorenson, H. and D. L. Alspach, Recursive Bayesian estimation using Gaussian sum, Automatica 7 (1971), 465-479. 16. Tsai, C. and L. Kurz, An adaptive robustizing approach to Kalman filtering, Automatica 19 (1983), 279-288. 17. Tukey, J. W., A survey of sample from contaminated distribution, in Contri butions to Probability and Statistics, (Harold Hotell Volume), I. Olkm et al.
136
H. Wu and G. Chen
Stanford University Press, CA, 1960, 448-485. 18. Wu, H. and G. Chen, Suboptimal recursive filtering for linear systems with non-Gaussian noises, to appear. 19. Wu, H. and G. Chen, Recursive filtering for discrete-time nonlinear systems, to appear. 20. Wu, H. and B. Kailash, A new approach to approximate minimum mean square estimation for linear systems with non-Gaussian noises, 1992, to appear. 21. Wu, W. R. and A. Kundu, Kalman filtering in non-Gaussian environment using efficient score function approximation, in Proc. of Int. Symp. on Cue. Sys., Portland, OR, May 8-11, 1989, 413-416.
Huaiyi Wu Department of Electrical Engineering University of Houston Houston, Texas 77204 [email protected] Guanrong Chen Department of Electrical Engineering University of Houston Houston, Texas 77204 [email protected]
Set-valued Kaltnan Filtering Darryl R. Morrell and Wynn C. Stirling Abstract. A theory of discrete-time optimal filtering based on convex sets of probability distributions is presented. Rather than propagating a single conditional distribution as does conventional Bayesian estimation, a convex set of conditional distributions is evolved. The conventional point-valued Kalman filter is generalized to a set-valued Kalman filter, consisting of equations of evolution of a convex set of conditional means and a conditional covariance. An interpretation of the set-valued Kalman filter based on Levi's epistemic utility theory is developed. Examples are presented to illustrate and interpret the theory.
§1 Introduction The time-domain filtering problem, in its broadest sense, is the determination of information about a sequence of system state values from a system model and numerical observations (i.e., data). The observations may be incomplete and corrupted by noise, and their relationship to the unknown system state may be imprecisely characterized. A functional relationship between the observations and the estimated state information is termed a filter. The Kalman filter, for example, is the solution to the minimum mean squared estimation problem for a linear dynamic system model. A filter typically forms a part of a larger decision system. As computer technology advances, decision systems become increasingly automated and algorithms must perform higher level functions previously performed by human decision agents. Successful automation of the decision process requires that attributes previously brought to the decision system by human agents must affect and, to some extent, be supplied by automated systems; these attributes include value systems, goals, background knowledge, capability to recognize error, understanding of potential actions, and capability to evaluate risk. We investigate the filtering problem in an epistemological framework that reflects this broader context of decision making. Approximate Kalman Filtering Guanrong Chen (ed.), pp. 139-160. Copyright © 1 9 9 3 by World Scientific Publshing Co. Inc All rights of reproduction in any form reserved ISBN 981-02-1359-X
139
140
D. Morrell and W. Stirling
The process and methodology of human decision making and, more particularly, scientific inquiry has long been a topic of discussion and debate among philosophers. For three decades, Isaac Levi has actively addressed the problem of how scientific knowledge is and should be acquired. He has developed an epistemological model of the behavior of a rational agent in the acquisition of new information [6,7,8]. His model addresses the broad problem of knowledge acquisition, including the goals and values of the rational agent, the agent's willingness to risk error, incomplete and contradictory evidence, the possibility that currently held beliefs are untrue, and experiment design and statistical inference. Levi's work is sufficiently broad to address real issues while providing sufficient detail and mathematical precision to be practically useful. In particular, because Levi provides a direct generalization of classical probability and decision theory, classical techniques such as the Kalman filter can be directly extended. Three key concepts from Levi's theory have a direct and immediate bearing on our approach to the filtering problem. We briefly introduce these concepts, and subsequently provide more detailed descriptions: Credal convexity: In Levi's framework, one's state of knowledge is represented by a convex set of probability distributions. The use of convex sets of distributions, denoted credal convexity, allows the representation of imprecise and indetermi nate probability states. In particular, ignorance about the initial system state can be represented in a way that makes filtering using non-observable system models meaningful. Epistemic utility: In the filtering problem, as in all stochastic decision problems, a decision agent must balance the willingness to risk error against a desire for new information. Levi addresses this balance through an epistemic utility function consisting of two components: the truth of the hypothesis under con sideration, and the information conveyed by this hypothesis. He introduces an index of caution (also called the degree of boldness) that determines the relative weighting of the two components. Suspension of judgment: Levi argues that when the available information does not support a single hypothesis strongly enough to warrant the risk of error inherent in its acceptance, a viable and reasonable alternative is to suspend judgment between rival hypotheses and remain undecided. Suspension of judg ment occurs as a natural result of the adoption of the epistemic utility function and of credal convexity. In the filtering problem, the suspension of judgment is manifested as a sequence of set-valued estimates; all points in a set are seriously possible values of the unknown system state at a given time. In this paper, we outline one approach by which Levi's theory can be ap plied to the discrete-time filtering problem. We use a system model consisting of a discrete-time Markov dynamics model and a memoryless noise observation model; this system model includes the standard discrete-time linear/Gaussian model as a special case. In the filtering theory that we develop, a convex set of probability
Set-valued Kalman Filtering
141
densities (i.e., credal convexity) is used in the representation of the decision agent's knowledge of the initial system state; at each time, the filter output represents a set of posterior densities obtained from the set of prior densities, the system model, and the observations. Using the set-valued Kalman filter, filtering with partially observ able or non-observable system models is meaningful, because a set of all seriously possible state values is obtained for each time value. Also, the filtering algorithm reflects the decision agent's values in terms of what is considered informative, in that the agent can explicitly specify his willingness to risk error in order to obtain information. The set-valued Kalman filter is conceptually related to the older concept of setmembership estimation. In set-membership estimation, however, a stochastic model of the system unknowns is not used. Rather, it is assumed that the unknowns belong to bounded sets; the estimator computes a set of all system states that are consistent with the observations and the sets bounding the system unknowns. The set-membership approach was first taken by Witsenhausen in the context of a general control problem [15,16]. An approximate solution to the filtering problem, in which sets of possible system states are approximated by ellipsoids, was given by Schweppe for a linear system model [13]. Bertsekas and Rhodes solved the filtering problem exactly for a linear system and sets defined by an energy constraint, and also gave an ellipsoidal approximation algorithm that has some advantages over Schweppe's [2]. There is no discussion by Bertsekas and Rhodes about system observability, although it seems that their algorithm should perform correctly when the system is partially observable. Krener showed that under certain assumptions, the standard Kalman-Bucy filter can be interpreted as a set-membership filter [5]. In previous work, a stochastic system model has been combined with a setmembership specification of system unknowns (e.g., the initial state estimate is constrained to lie in a given set). Kats and Kurzhanskii addressed the problem of linear system estimation with arbitrary convex sets [4]. Pshenichnyi and Pokotilo have extended the formulation of Kats and Kurzhanskii to allow the analysis of arbitrary sets with nonlinear systems [11]. Anan'ev and Kurzhanskii present inde pendent work that parallels this present study, and provide a solution for the best minimax conditional mean for the state of a nonlinear multistage system [1]. The paper is structured as follows: we first introduce mathematically the con cept of credal convexity. We then prove that the filtering process preserves convexity of a set of densities. We also prove that this set of densities can be represented by a non-convex subset of densities. We then specialize these results to a linear/Gaussian system to obtain the set-valued Kalman filter equations. We apply Levi's epistemic utility theory to interpret the output of the filter, obtaining a set of all seriously possible state values. Finally, we present several simulation examples to illustrate the set-valued Kalman filter characteristics.
142
D. Morrell and W. Stirling §2 Credal convexity
Credal probability is probability formed on the basis of subjective judgment by the decision agent and measures the likelihood that a hypothesis is true. It is an expectation determining probability and plays the same role as that played by a probability distribution in classical filtering. We denote a credal probability distribution as P(g*) and assume that P(g*) can be represented by a density p(x) such that P(g") = J . p(x')dx' for all g* C TZn Classical Bayesian estimation theory is based on the representation of knowl edge (e.g., knowledge of a system state) using a single probability density. Levi labels this practice credal uniqueness. Levi contends that the requirement of credal uniqueness is often too restrictive; one's state of knowledge cannot always be ac curately represented by a single probability distribution. In Levi's theory, the re quirement of credal uniqueness is replaced with the logically weaker requirement of credal convexity. Under credal convexity, one's state of knowledge is represented by a convex set of credal probability distributions; this convex set of distributions, denoted Bk,k, is termed a credal state. The selection of a credal state depends upon available logical and contextual information. In the context of the filtering problem, we are usually interested in determining the value of an unknown system state from a sequence of observations. Thus, we investigate the computation of a set of posterior densities of the system state conditioned on the observation sequence. §3 Set-valued
filtering
We consider a discrete-time finite-dimensional system with state random vector Xfc, k = 0,1,2, - - -, and an observation random vector Vfc. The sequence of observations from time 1 to time k is denoted Vfc = {vi, ■ • •, Vfc}. Sample values corresponding to Xfc, Vfc, and Vfc are denoted as Xk, Vk, and Vfc = {vi,- • ■, Vk}We consider systems that can be modeled as first-order Markov processes, and assume that the observations errors are memoryless. This model may be expressed in terms of the following probability density functions: • The transition probability densities {p(xk\xk-i),k = 1,2, •••} that charac terize the state transitions from time k — 1 to time k. • The likelihood probability densities {p(vk\xk), k = 1, 2, • ■ ■} that characterize the probability governing the observation random vector Vfc given state Xfc. • The a priori, or prior, probability density p(xo) that characterizes the initial conditions of the system. • With the above densities, we compute the a posteriori probability density p{xk\Vk) that characterizes the probability distribution of the system state at time k given the observation sequence Vfc = Vfc. We obtain the o posteriori probability density p(xk | Vfc) recursively from p(xfc_i | Vfc_i) by defining a time update operator T^ and a measurement update operator
143
Set-valued Kalman Filtering
rf: p(xk\Vk^)
=Tkp(xk^1\Vk_1)
p(xk\x'k^)p^k-i\Vk-i)dxfkfc-i>
= f
p(xfc|vrfc) = r+p(xfc|vrfc_i) =
p(vk\xk)p(xk\Vk-i) Jnnp(vk\x>k)p(xk\Vk^)dxk
Note that Tk is a linear operator. We define the operator I\fc_i = TjJ" o r fc and note that [3] p(xk\Vk)
= I\fc_ip(a:fc_i|Vfc-i) =
P(vk\xk) /TC„ pjXklx'^pjx'^lVk-^dx'^ JTI* P(vk\x'k) Jn„ pix'^x'^pix'^lVk-^dx'^dx'/c'
(1)
for k = 1,2, • • ■• The recursion in (1) is initialized by defining p(xo\Vo) = p{xo)Classical filtering theory assumes credal uniqueness for all densities. In this development, we retain the assumption of credal uniqueness for the transition and likelihood densities, but reject it for the prior and, hence, the posterior, densities. 3.1 C r e d a l s t a t e s relative t o a system s t a t e We shall say a set of densities £ is convex if, for any finite set of positive real n v se numbers { a ; } ' = 1 , / > 1, such that 2 i = i Q» = 1> an<^ a any set ^ ofc probability densities a {Pi}[=i C £, then p* = Y2i=i 'Pi S £■ A probability density of this form is said to be a convex combination of probability densities in £. We generalize classical filtering theory by adopting credal convexity in our representation of prior knowledge of the initial system state. Prior knowledge is represented by a convex set of densities of the form p{xo); this set of prior densities is denoted Z?o,o- The adoption of a set of prior densities necessitates the use of sets of posterior densities in the representation of a posterior credal state. In the following, Bk>k denotes a convex set of conditional densities of the form p(xk\Vk) for the system state at time k. Let Vfc-^fc-i beanarbitrary set of densities of the form p(a:fc_i|Vfc_i); Vk~i,k-\ may or may not be convex. Vkjs-i, the image of Vfc-i^-i under Tk, is denoted as Vfc,fc-i = r^Vfc-i^-i and is the following set of densities: {p(xfc|Vfc-i) : p(xfc|Vfc_i) = TkP(xk-i\Vk-i),
p(xk-i\Vk-i)
e Vfc_i,fe_i}
Vk,k, the image of Vk,k-i under TjJ", is denoted as Vk,k = r£Vfc:fc_i: Vfc,fc = {p(xk\Vk) It follows that Vfc
k
: p(xk\Vk)
= r+p(xfc|Vfc_i), p{xk\Vk_x)
= r t fc_iVfe_i,fc_i.
g Vfc,fc_!}
D. Morrell and W. Stirling
144
The a priori knowledge of the system state xo is represented by Bo,o, a convex set of densities of the form p(xo) ■ The a posteriori knowledge of xi is represented by the set of densities B\^ = r 1|0 Bo,o- The set of densities Bk,k is obtained iteratively: Bk,k = Tkk-\Bk-i,k-i
= Tfc.fc-i o r fc _ lifc _ 2 o • ■ • o r\ 0 B 0 ,o-
(2)( 2 )
In order for Levi's epistemology to apply, the credal state Bk,k obtained according to (2) must be convex. We show in the followingconvei propagation theorem that the operator r ^ j t - i preserves convexity; thus, Bk,k is a convex set of densities for k > 0. Theorem 1. For a finite-dimensional discrete-time system characterized as a firstorder Markov process with state transition densities {p(xk\xk-i), k = 1,2, • • •} and memoryless observations with likelihood densities {p(vk\xk), k = 1,2, • • •}, suppose the credal state Bk-i,k-i is a convex set of densities of the form p(xk-i\Vk-i). Then the set of posterior probability densities Bk,k = ^k,k-iBk-i,k-i is convex; i.e., the operator Tk,k-i preserves convexity of the credal state. To establish this theorem, we first prove the following lemma. Lemma 2. Let {pi,- ■ ■ ,pi} be a set of probability densities, and let { a j } ' = 1 be positive real numbers such that 5Zi=i ai = 1- Then there exist two sets of positive real numbers {ai}\=1 and {&i}'=i such that 5Zi=i a> = !Ci=i &« = 1 and
rt
~52<XiPi
^aiT+pi,
£>r+ f t = r+
^2^
Proof of Lemma 2: We define ni = / R „ p{vk\x'k)pi{x'k)dx'k
(3)
(4) and assume without
loss of generality that -Ki > 0. Also, let o< = „ i a i 7 r ' — and bi = _!*■ a. , It can be verified by direct calculation that J 3 i = i at = 5Zi=i &» = 1 and that (3) and (4) hold [10]. Proof of Theorem 1: By assumption, Bk-i,k~i is convex; we must show that Bklk is convex. Note that Bk,k-i = TkBk-i,k-i and Bk,k = T^Bk,k-\Tk is a Unear operator and thus preserves convexity. We now show t h a t r ^ preserves convexity. Consider two probability densities p\ and P2 arbitrarily chosen from Bk,k- Since Bk,k is the image of Bk,k-i under TjJ", there are two probability densities Pi ,P2 € Bk,k-i such that p\ = ^kPi and p 2 = r j p j ' . Let p , be defined as p» = api + ( l - a ) p 2 , where a e (0,1). From Lemma 2, there is an a e [0,1] such that p» = r £ [api + (1 - a)p2] Thus, by convexity of Bk,k-u a p f + ( l - a ) p ^ e Bk,k-i and, since Bk,k is the image of B/t,fc-i under Tk, pt e Bk,k-
145
Set-valued Kalman Filtering 3.2 Representation of the credal state
Any convex set that contains more than one element must necessarily be uncountably infinite. Thus, to implement a set-valued filtering algorithm, we must express a credal set (consisting of an uncountable infinity of densities) in terms of a set of densities that can be represented computationally. In our approach, we represent the credal state as the convex closure of another, computationally representable set of densities. Specifically, for any set of probability densities V, we may form a convex set of probability densities B by taking the convex closure of V, denoted V; V is the smallest set containing all finite convex combinations of elements of V. The set V is then said to be a credal state generator for B; in Section §4 we develop the set-valued Kalman filtering algorithm in terms of a credal state generator of Gaussian densities. The following convex representation theorem insures that a set-valued filtering algorithm formulated in terms of the credal state generator is equivalent to the filtering operations performed on the the credal state directly. Theorem 3. Let Vk-i,k-i be an arbitraryj —set of probability densities of the form p(xk-i\Vi kl -Vl).t - i ) . 'Then rfcifc_iVfc_iifc_i = rk,k-iVk-i,k-iRemark. In the set-valued Kalman filter, Bk,k is the convex closure of a generator set Vk,k- Since Bk,k = ^k,k-iBk-i,k-]., Theorem 3 implies that Bktk obtained directly as the image of Bk-i,k-i under Tk,k-i is equal to the set obtained indirectly as the convex closure of the image of Vfc-i^-i under r^fc-i- Thus, the filtering algorithm can be formulated in terms generator sets without the need to directly consider the actual credal states. Proof: We first show that r^V fc _ x fc_1 = TkVk-i,kclosure of the image of Vk-i,k-i
Since TkVkl
k_1
is the convex
under Tk, for every distribution p 6 TkVk_1 a
there exists a set of / positive real numbers {en}, cti > 0 and Yli=i i set of / densities {pi} C Vk-\,k-i set of / densities {pi} C Vk-i,k-i
=
1>
an
fc_1, a
d
such that p = 5Z,=i o^r^Pi- Conversely, for any and I numbers {c^}, Qi > 0 and JZi=i a>
=
*>
a
there is a p* e r^"Vfc_x,fc_i such that p* = Tk \J2i=\ iPi\ ■ By the linearity of Tk, p = p* for given {p^ and {a{}. Thus, r^" Vfc_liA._j =
T^Vk-i,k~i-
We next prove that r^"Vfcfc_1 = T^Vk,k-\- Since ^t^kk-i *s the c o n v e x closure of TkVk,k-i and T^Vk^-i is the image of Vk,k-i under rjj", for every distri bution p 6 r£V k fc_p there exists a set of / positive real numbers {6;}, bt > 0 and Y^,i=i h = h a n d a set of I densities {p;} C Vk,k-\ such that p = J2i=i biFkpi. By Lemma 2 there is a set of positive real numbers {pi} such that 0i > 0, J^ i = 1 Pi = 1, and p = T+ [ E ! = 1 PiPi] ■ But E l = i ft?* e Vk,k-u so f+Vk fc_, C r+Vk,k-iSince T^Vk,k-i
is the image of the convex closure of Vfc,jt_i under Ft, for
D. Morrell and W. Stirling
146
every p e r£V/t,fc_i, there exists a set of I positive real numbers {on}, Q> > 0 and H i = i ai = 1> a n d a set of / densities {pi} C \>k,k-i such that p = T~£ I XTi=i a »Pi I • By Lemma 2 there is a set of positive real numbers {a*} such that ai > 0, 2 J J = I a<
i, and P = E L I g.r+p„ But E!=I a.r^p, e ffvkk_v
=
so r+vfc,fc_i c r£vfc>fc_r
Thus, r £ l \ f c - i = r^V f c f c _j. This establishes the result. §4 The set-valued Kalman filter In the following, we specialize the preceding discussion to the case of a linear/Gaus sian system model. Using the theory of support functions from convex analysis, we derive a simple form for the set-valued Kalman filter. Consider a linear stochastic system of the form xfc = Afe_]Xfc_j + B f c _ i £ f c l ,
(5a)
vfc = Cfcxfc + 7?fc,
(56)
for k = 1,2, •••, where Ak is n x n, Bk is n x p, ft is ra x n, {^l and {TJ^} are p- and m-dimensional vector Gaussian (zero-mean) white noise processes with positive-definite covariance matrices Qk and Rk, respectively. The conventional modeling approach is to assume credal uniqueness, and char acterize the initial state as being distributed xo ~ W(xo,o, Po,o) (i.e., xo is an n-dimensional Gaussian random vector with mean x^o and covariance Po,o-) In the set-valued Kalman filter, we specify an initial credal state #o,o in terms of a gener ator set Vo,o that consists of Gaussian densities, all with covariance Po,o and with means in a closed convex set in state space denoted Xo,o- The set-valued Kalman filter provides an algorithm to compute a corresponding covariance matrix Pk,k and set of means Xktk, k = 1, 2, • ■ ■; the credal state generator at time k consists of the set of Gaussian densities with covariance Pktk and means in the set Xk,£, and Bk,k is obtained as the convex closure of Vk,k- We denote the set Xktk as the credal set. There are a number of practical reasons for restricting consideration to a gen erator set formed from Gaussian probability densities. Gaussian models are appro priate for many physical systems, due to considerations such as the central limit theorem or maximum entropy. Also, linear systems driven by Gaussian inputs yield Gaussian outputs, so if the prior is Gaussian and all noise sources are Gaussian, then the output will be Gaussian. Although neither of these reasons represent blan ket warrants for assuming Gaussian distributions, Gaussian models have withstood the test of application successfully with great regularity. The restriction of the prior credal set to Gaussian probability densities with common covariance and means in a convex set makes it possible to express operations on a convex set of probability densities in terms of a convex set of means. We begin the derivation of the set-valued Kalman filter by reviewing the con ventional Kalman filter equations. We introduce support functions from convex
Set-valued Kalman Filtering
147
analysis as a method of representing the set Xk,k- We then derive the set-valued Kalman filter. 4.1 Kalman filter equations Given the linear/Gaussian system model of (5) and a single prior density p(xo) ~ 7V(xo,o, -Po.o), the Kalman filter can be interpreted as equations of evolution for the conditional mean and variance of the system state and thus specify the result of applying T^ and F~£ to a Gaussian density. The application of the time update operator Tj~ to a Gaussian density of the form p(xfc_i|Vfc_1) ~ //(kk-i,k-i, Pk-i,k-i) yields the predicted probability den sity p(xk\Vk-i) ~ jV(xjt,fc_i, Ffc,fc-i), where Xfc,k-i = ylfc_ixfc_iifc_i, Pk,k-i = Ak-iPk-i^-iAj^
(6a) + Bfc.iQfc-iBj.!.
Application of the Bayesian update operator F^ to p(xk\Vk-i) probability density p(xk\Vk) ~ N{x.k,k, Pk,k), where
(66)
yields the filtered
x/b,fc = Xfc,jb-1 + Gfc[vfc - CfcXfc.fc-i],
(6c)
Pk,k = [In-GkCk}Pk,k-i,
(6d)
and Gk is the Kalman gain defined by Gk = Pk.k-iCZ
[CkPk,k-iGl
+ Rk]_1
(6e)
4.2 Derivation of the set-valued filter As previously mentioned, our prior credal state generator Vo,o is a set of Gaussian densities, all with a common covariance matrix Po,o and means in a convex set A^o: Vo,o = {p{xo) ■ p(xo) ~ W(x 0 ,o, PQ.Q), X0,O £ ^0,0} • For k = 1,2, • • -, we wish to determine a set of probability densities of the form Vfc.fc = {p{xk\Vk) : p(xk\Vk) ~Af{xk,k,Pk,k),
ik,k e *fc,fc} •
Theorem 3 insures that we may derive equations of evolution for elements of V^ k and take the convex closure to obtain Bk,k- Consequently, we limit discussion to the set Vk,k of Gaussian probability densities. Corresponding to each density in Vo,o there is a density in Vk,k\ thus, if the set of prior distributions Vo,o is not a singleton set then it is (conceptually) necessary to formulate a Kalman filter of the form (6) for each a priori density in VQ,O. The
148
D. Morrell and W. Stirling
set-valued Kalman provides a means of doing so without implementing an infinity of Kalman filters; instead, Pk,k and the credal set Xk,k are computed for each k, from which Vk,k and Bk:k can be obtained. In the following, we use support functions from convex analysis to represent the set Xk,k and to derive propagation equations for Xk,k- The support function [12] of an arbitrary closed convex set C C TZn, denoted 6(x\C), maps from 7£" to 1Z. It is defined as 6 (x\C) = supx y. y€C
The set C can be recovered from 6{x\C) as follows: y e C if and only if xTy < 6 (x\C) for all x £ Tln Through the use of the properties of convex support functions, we obtain an expression for 6(x\Xk,k) from which the set Xk,k can be computed. Prom (6a), Xk,k-i — Ak-iXk-i,k-i] this notation denotes the multiplication of each element of Xk-ijc-i by Ak-i. The support function of Xk:k-i is computed in terms of the support function of Xk-i,k~i as = 6 (ATk_lX\Xk-i,k-i)
6 (x\Xk,k-i)
From (6c), the set Xk,k is computed from Xk,k-i
■
(7)
as
Xk,k = [In ~ GkCk] Xk,k-i + GfcVfc. In terms of 6 (x\Xk,k-i),
the support function of Xk,k becomes
6 {x\Xk>k) = 6 ([/„ - GkCk\Tx\Xk,k~x)
+ xTGkvk.
(8)
In order to present a specific algorithm, we develop the set-valued Kalman filter for the particular case where A'fc_i?fc_1 is a hyper-ellipsoid of the form <.Xk-l
■ (Xk-l
~ Cfc_l,fc_l)
(Sk-l,k-lSk^l]fc_l)
(Xfc-l - Cfc-l,A:-l) < l | ,
where Ck-i,k-i is the n-element vector centroid of Xk-i^-i and Sk-i,k-i n x n invertible matrix. The support function of Xk-i,k-i is [12] 6 (x\Xk-i,k-i)
= yJxTSk-i,k-iS^_lk_1x
With this form for 6 (x\Xk-i,k-i), 6{x\Xk,k-\)
+
is an
xTCk-i,k-i-
(7) becomes
= )JxT (Ak-iSk-i.k-iS^^Aj^)
= <JxTSk,k-iSlk_lX
+ xTCk,k-i,
x + xT
{Ak-ick-hk~i)
(9)
Set-valued Kalman Filtering where Sk,k-i
= AkSk-i,k-i
6(x\Xk,k)
149 and ck,k-\ = Akck-i,k-i-
Combining (9) with (8) gives
= ^ / x T ([/„ - G fc C fc ]S' tl fc_i^ fc _ 1 [7„ - GkCk]T) T
x
([In - GkCk]ck,k-i
= yJxTSk,kS^kx
x+
+ Gkvk)
+ xTCk,k,
where Ck,k = [In - GkCk]Ck,k-l
+ GkVk
Sk,k = [In-GkCk}Sk,k-l-
(10)
The form of 6 (x\Xklk) indicates that Xk,k is a hyper-ellipsoid with centriod Ck,kWe term the matrix Sk,k the credal matrix. We summarize the preceding derivation in the following theorem: Theorem 4. Consider the system Xfc = Ak-ix.k-i
+ Bk-i£kl
,
Vfc = CfcXfc + 7?fc ,
k = 1, 2, • • •, with {£ } and {7^} Gaussian (zero-mean) white noise processes with covariance Qk and Rk, respectively. Let So,o be an invertible nxn matrix, let co,o be an n-vector, and let Po,o be a positive definite nxn matrix. Then the set-valued Kalman filter is as follows: Prediction Step: = [xk € TC : (xk - ck,k-i)T
Xk,k-i
{Sk.k-iSlk-i)'1
(xk - cjt,fc-i) < l } ,
where Cfc,/t-i = Ak-iCk-\,k-\
,
Pk,k-i = Ak-iPk-i,k-iAk_1 Sk,k-\ = Ak-\Sk-\,k-\
+
Bk-iQk-iBk_x,
■
Filter Step: Xk,k = {xk 6 Kn : (xk - ck,k)T (Sk,kSlk)~l
(Xk - ck,k) < l } ,
where Cfc.fc = Ck,k-\ + Gk[vk - CkCk,k-i], Pk,k = [In - GkCk]Pk,k-l , Sk,k = [In — GkCk]Sk,k-l ,
D. Morrell and W. Stirling
150 with Gk the Kalman gain defined by Gk = Pk,k-iCZ
[CkPk.k-iC? + Rk]
The structure of the credal set is governed by the credal matrix Sk,k- If Sk,k —* 0 as k —► oo, then the set estimates converge asymptotically to the point estimate Ck,k, and all prior distributions lead asymptotically to the same limiting posterior distribution. In such cases, the system is said to be credible. Examination of (10), (6b), and (6d) reveals that the credal matrix and the filtered covariance evolve according to the same dynamics, i.e., through [/ — GfcCjt]Afc_i. It is well known that, for uniformly observable and controllable systems, this matrix is stable. Consequently, all uniformly observable and controllable systems are credible. Also, all detectable systems are credible. §5 Epistemic utility theory The set-valued Kalman filter, as presented in the preceding section, computes a set of conditional means and a conditional covariance matrix that can be interpreted as representing a posterior credal state generator 14,*;. We now present the essential elements of Levi's epistemic utility theory, by which a set of "seriously possible" system states at time k is obtained. Conceptually, a state estimate is seriously possible if the information that the estimate conveys justifies the risk of error entailed by its acceptance. Serious possi bility embodies a conflict between the desire to avoid error and the desire to acquire new information. According to Levi, this conflict can be represented by an epis temic utility function, which is a linear combination of two terms: the utility of obtaining new information, and the utility of avoiding error. A decision agent uses the criterion of maximum expected epistemic utility to determine which state values are seriously possible. In general, a set of state values will be seriously possible. In the following development, we denote a potential estimate set as g* C TZn. We first describe the information component of the epistemic utility function. We then present the use of this information component in conjunction with the credal state generator V%,k to obtain a set-valued estimate of the state at k. Levi's stable acceptance procedure is presented; stable acceptance is an iterative compu tational procedure that eliminates certain difficulties in choosing the information component of the epistemic utility function. 5.1 Informational utility To assess the information conveyed by each potential answer, we assign an informa tional value to each set g' C Hn. This informational value is assigned through the
151
Set-valued Kalman Filtering
use of an informational density m(x); the informational value of a potential answer is M(g*) = J m(x')dx'. dx'- M(g*) is not related to the truth of g*, but rather to the informational value of rejecting the set g* regardless of its truth value. If the decision agent rejects g", he will obtain information with a value of M{g*). The selection of a particular functional form for M(g*) must reflect the decision agent's values in the sense that M(g") indicates an assessment of the information content of g* For example, if all elements of TZn are considered to be equally informative, then a reasonable choice for m(x) would be a uniform density; M(g") would be proportional to the (Lebesgue) measure of g* On the other hand, a non-uniform m(x) implies that the agent is more willing to risk error to obtain informative, and hence smaller, estimate sets for some values of x than for other values; smaller values of m(x) imply a higher information value. 5.2 Expected utility The decision agent decides if a potential answer g* should be accepted on the basis of maximum expected epistemic utility. As previously discussed, the epistemic utility function balances the informational content of an estimate, as expressed by the density m(x), against the probability that the estimate is correct, as expressed by the set of densities Vfc^ computed (in the filtering case) by the set-valued Kalman filter. The balance between willingness to risk error and desire for new information is expressed in terms of a parameter 6, where 0 < b < 1. The parameter 6 is called the degree of boldness or index of caution and represents the decision agent's overall willingness to risk error in order to obtain information. Values of b close to zero correspond to caution, while values of b close to one correspond to a willingness to risk error. The epistemic utility function is a sum of the informational value of a potential answer g* weighted by b and the truth value of g* The set of seriously possible values for the system state is that set gk,k that maximizes the expected epistemic utility. If the credal state Vktk consists of only one density p(xk\Vk), the following set gk k maximizes the expected epistemic utility [9]: gk,k = {xk ■ p{xk\Vk) > bm(xk)} . (11) (11 In general, V^fc consists of more than one density; thus, we modify (11) so that the estimate gk,k consists of all points that are included in a set estimate for at least one density in Vk,k- The estimator thus becomes Sfc.fc = {xk ■ p{xk\Vk) > bm(xk), p(xk\Vk)
€ Vk,k} ■
(12) (12)
The set gk,k is the set of all seriously possible values of XkEquation (12) embodies the fundamental philosophy underlying the epistemic utility interpretation of the set-valued filter; only those sets of values are rejected for which the informational value of rejection exceeds the highest possible belief (as
152
D. Morrell and W. Stirling
measured by the credal state) that the set contains the true value. A unique state estimate will generally not be obtained; instead, only "bad" estimates are rejected (in the sense that the new information obtained in accepting them is outweighed by the probability that they are in error), leaving as potential answers the "good" estimates. Note that as 6 is increased in (12), a larger portion of TZn is rejected and the size of gk,k is reduced. In words, as the decision agent's degree of boldness increases, he rejects more potential answers; the informational content of his estimate is higher at the cost of increased risk of error. 5.3 Stable acceptance By application of (12), the decision agent obtains an answer gk,k which is the set of all state values that are considered to be seriously possible given the degree of bold ness b, the credal state Vk,k, and the informational assessment expressed by m{x). At this point, gk,k may be used as a basis for further inquiry by conditioning m(x) and the densities in Vfc,/t on gkk, and then applying (12) again using the conditional densities. In so doing, some elements of gk,k may be eliminated from further con sideration. The process of conditioning densities on the current answer and then re-applying (12) to obtain a new answer may be iterated to (asymptotically) obtain a stable set of estimates. Levi calls such a procedure stable acceptance. Use of the stable acceptance procedure renders the estimate set invariant to scaling of the informational density m(x) and to scaling of the densities in Vk,kThis property allows the use of densities whose integrals over TZn are unbounded. For example, it is often the case that no potential answer is considered to be more informative than any other potential answer. In such a case, m(x) should be uniform (i.e., m(x) = K, where K is a positive constant). However, the integral of this m(x) over 72." is unbounded. In practice, the determination of gktk begins with some restrictions on values considered to be seriously possible. These restrictions may lead to the choice of a set gk k C TZn; gk k is the set of possible answers not ruled out by a priori physical or logical considerations. We require only that the integral of m(x) over gk \_ be bounded. The process of stable acceptance is then applied to obtain a stable set gkk which is independent of the scaling of m(x). We now define the iterative stable acceptance process mathematically. Let gkk, n > 0, be the estimate obtained from (12) in the nth iteration of the process. The next iteration consists of conditioning the densities on gkk and then computing S M ^ 1 ' using ( 12 )- W e condition m(xk) and each p(xk\Vk) e Vk,k on gk"l as follows: (
i M\
m[xk)
in)
(13)
Set-valued Kalman Filtering /I
IT/
153
(M « )s\
P{Xk\Vk) P(Xk\V k)
,-.
(n)
/I/O
Jg^lP\xk\Vk)dxk The set p ^ 1 " ' is obtained using (12): 9kT]
= {** ; P^\Vk,gknl)
* M*kls£2)> TP{xk\Vk) e
Vkik}ngk°l.
Because of the conditioning in (13) and (14), both p(xfc|Vfc) and m(xfc) may be multiplied by positive constants without affecting gkn£ ' Thus, the requirement that p{xk\V5t) or m(xjt) integrate to one can be dropped; we require only that their integrals over gk 'k be bounded. The stable acceptance procedure produces an infinite sequence of sets {gkk} that converges to a limiting set gkk
The structure imposed on Vk,k by the set-
valued Kalman filter insures that gkk
is non-empty [9].
§6 E x a m p l e s In this section, we present simulation examples that illustrate the performance of the set-valued Kalman filter for several system models. We first present results for an observable system model, and then results for an unobservable system model. We then present an example of a system that is observable but whose transient behavior requires some time for the credal set Xktk to converge. 6.1 Observable system We consider a linear discrete-time system with a two-dimensional state vector Xk = [xi(fc) X2(k)] and a one-dimensional observation value Vk- The system model is given by (5); the matrices Ak, Bk, Ck, Qk, and Rk are independent of k with the following values: ' 1 1 A= B = In, C=\\ 01 0 1 Q = 1, and R = 2. This system model is uniformly controllable and observable. The simulation was run with an observation sequence {vk} of length five. The ere set initial values of co,o and So.o wwere set to co,o = [0 2] and 5o,o = diag(10, 4). These values correspond to an initial credal set -Yo.o that is an ellipse with a centroid of (0, 2) and semi-major and minor axes of length 10 and 4 units, respectively. We interpret this ellipse as the family of all seriously possible initial conditions, i.e., the a priori distribution for the state is assumed to have a mean value lying in this region. The initial state covariance was set to Po,o = 4 / n . At each k e { l , - - , 5 } , Ck,k and Sk,k were computed according to the recursion in Theorem 4.
154
D. Morrell and W. Stirling
Figure 1. Simulation results for the observable system. Figure 1 (a) shows plots of the credal set Xkk is the output of a Kalman filter for a corresponding initial point in Xo,o- The system model used for this simulation is uniformly observable and controllable; thus, the effect of the initial estimate on the estimate at time k decreases as k increases. Figure 1 (a) shows that this is indeed the case, since the size of Xkik decreases as k increases, indicating decreasing uncertainty about the initial state.
Set-valued Kalman Filtering
155
The credal set Xk,k and the covariance matrix Pklk computed by the filter at each time are used to obtain sets of seriously possible state values gkk via the stable acceptance procedure. Throughout these simulations, a uniform information determining density m(x) is used; the informational value of the state estimate is thus inversely related to the area of the estimate. Figures 1 (b), 1 (c), and 1 (d), show plots of the seriously possible sets obtained with values of b of 1.0, 0.5, and 0.1 respectively. As b decreases, the size of the estimate sets increase and the estimates become more conservative. Note that these set estimates depend on both the first moments (Xk,k) and second moments {Pk,k) of the distributions in Vk,fc; thus, they do not admit an interpretation as confidence regions. 6.2 Un-observable system A second simulation was run using a system model that is not observable. This system model is identical to that of the observable system except that C = [0 1 ]; x\(k), the first element of the state vector of this system, is unobservable, in the sense that observations do not provide any information about x\(k). The simulation was run with the same initial parameter values used for the observable system simulation.
Figure 2. Simulation results for the non-observable system.
156
D. Morrell and W. Stirling
Figure 2 (a) shows plots of the credal set Xk,k for values of time from 0 to 5. The initial set is the leftmost and largest. As time progresses, the system trajectory moves up and to the right. The credal set converges in the x2(k) (vertical) axis, but does not converge in the Xi(fc) (horizontal) axis; the initial uncertainty in the estimate of xi(k) is not removed by processing the observation sequence. Figures 2 (b), 2 (c), and 2 (d), show plots of the seriously possible sets obtained with values of b of 1.0, 0.5, and 0.1. These sets have a vertical extent that is much larger than the vertical extent of the credal set due to the relatively large value of the conditional variance of 12(h). 6.3 Transient behaviors Even though a system may be uniformly observable and controllable, large transients may occur before the credal matrix ultimately begins to converge. Consider the following two-dimensional example. Let
A L
~[ o i ] - *=[!]'
C=[0A 0]
-
and P0,o = In, Q = 0.001, R = 2.0, co,o = [5 0], and S0,o = diag(50, 20). The a priori distribution has a relatively small second moment compared to the size of the region of seriously possible first moments.
Figure 3. Transient system behavior.
157
Set-valued Rahman Filtering
Figure 3 (a) displays the evolution of the sequence of filtered credal sets for this system. As data are processed, the credal sets become narrower in one direction and elongated in the orthogonal direction. The observability and controllability of this system are easily established, so it is known that the system is credible and the ellipsoids will ultimately converge to a point. For this example, however, there is a significant transient before the credal sets begin to converge. Figures 3 (b), 3 (c), and 3 (d) display the seriously possible sets for values of 6 of 1.0, 0.5, and 0.1 respectively. §7 Discussion A classical way to deal with ignorance concerning the a priori distribution is to set the initial mean to some arbitrary value (typically, zero) and let the variance be extremely large (ideally approaching infinity), thereby deweighting completely the initial conditions. But such an approach is not completely satisfactory. There are often physical considerations that bound the values of the initial state, and these constraints are often more properly associated with bounds on the first moment, rather than on the second moment. Additionally, when the system model is not observable, the state estimate for all k depends on the choice of the initial state estimate; the effect of an arbitrary initial choice thus directly affects all subsequent estimates. The use of the epistemic utility function to obtain the set of seriously possi ble state values provides a method of considering both the conditional mean and conditional variance of the state estimate. The conditional variance of the estimate is a measure of the dispersion of the true value about the conditional mean due to statistical uncertainty; the credal set consists of all conditional means that are consistent with a priori information and the observations. The additional measure of dispersion associated with the credal set thus provides a means of factoring an agent's ignorance into the estimation problem in addition to specification of the variance. As discussed in [14] and elsewhere, one of the problems often associated with the Bayesian approach is that there is no way for an agent to differentiate between ignorance (lack of knowledge) and uncertainty (imprecise knowledge). By forcing the agent to specify one and only one prior distribution, the resulting es timate is represented as the unique best estimate, even though the specification of a unique prior may be made in an unwarranted and objectionable way (though required for operation of the estimator). By relaxing the requirement of a unique prior, the agent is permitted the opportunity to examine a family of estimates, each one me associated associated with with aa specific specific prior, prior, and and thereby thereby can can assess assess the the significance significance of of the the choice of prior. :hoice of prior. A issue associated with set-valued estimation is that the estimate is not unique, and it may be unsettling to have to deal with a set of estimates. Unfortunately, this is the price to be paid if the agent is to acknowledge his true state of (or lack of) knowledge. If the agent is unable to reliably specify a unique prior distribution,
158
D. Morrell and W. Stirling
then he should not expect to obtain a unique posterior distribution (and, hence, a unique estimate). On the fortunate side, however, if there are sufficient data available and the system is uniformly observable, then the set-valued estimator will asymptotically converge to a point estimate, which corresponds to the estimate that would be obtained for any choice of prior distribution in the a priori credal state. For data-limited applications, however, even if the system is uniformly control lable and observable, it may be advisable to employ a set-valued estimator, thereby providing an agent with the opportunity of assessing the transient behavior of the credal set. For marginally observable systems (i.e., systems for which the sensitivity of the observables to changes in the state is extremely small) or partially observable systems (i.e., systems for which there is an unobservable subsystem), the use of point-valued estimates may prove to be inadequate. While the set-valued estimator does not yield "better" estimates in the sense of improving accuracy, it does pro vide the agent with a more complete picture of the sensitivity of the estimate to ignorance and uncertainty in the initial conditions. §8 Conclusion This paper extends the conventional notion of Bayesian estimation as the calculation of a single conditional distribution to the calculation of a convex set of conditional distributions. It is shown that a Markov dynamics model and a white noise ob servation model yield an estimator structure that preserves convexity of the sets of distributions. It is also shown that a convex set of distributions may be represented by a set set of distributions {e.g., the Gaussian probability densities), and that con vexity is preserved in the estimator structure if only the generator set is considered. By taking the generator family of distributions to be Gaussian distributions with means in a convex set in state space and equal variances, a set-valued Kalman filter is developed that generates the equivalent of an infinity of Kalman filters, each with different initial conditions. If a system is not uniformly observable, or if there are insufficient data to elimi nate the effects of incorrect specification of the prior distribution of the system state, the choice of initial conditions may be critical for Bayesian estimation techniques such as the Kalman filter. In the interest of avoiding error, a set-valued estimator provides a family of all possible estimates that are consistent with the observed data and the contextual and logical information known about the system. The set-valued Kalman filter provides a means of propagating this trajectory credal set, rather than just one member of the set. By examining the structure of the credal set, it is possible to ascertain the effect of the initial conditions on the system state estimate. Acknowledgments. This work was supported in part by ESL Inc., Sunnyvale CA., and in part by NSF grant MIP-9111174.
159
Set-valued Kalman Filtering References
1. Anan'ev, B. I. and A. B. Kurzhanskii, The nonlinear filtering problem for a multistage system with statistical uncertainty, in Proc of the 2nd IFAC Symp. Vol. 2, 1986, 19-23. 2. Bertsekas, D. P. and I. B. Rhodes, Recursive state estimation for a set-member ship description of uncertainty, IEEE Trans, on Auto. Contr. 16 (1971), 117128. 3. Ho, Y. C. and R. C. K. Lee, A Baysian approach to problems in stochastic estimation and control, IEEE Trans, on Auto. Contr. 9 (1964), 333-339. 4. Kats, I. Ya. and A. B. Kurzhanskii, Minimax multistep filtering in statistically indeterminate situations, Automat, i Telemekh 11 (1978), 79. 5. Krener, A. J., Kalman-Bucy and minimax filtering, IEEE Trails, on Auto. Contr. 25 (1980), 291-292. 6. Levi, I., Gambling with Truth, M.I.T Press, Cambridge, MA, 1967. 7. Levi, I., The Enterprise of Knowledge, M.I.T Press, Cambridge, MA, 1980. 8. Levi, I., Decisions and Revisions, Cambridge University Press, London, 1984. 9. Morrell, D. R., Epistemic utility estimation, IEEE Trans, on Sys. Man Cybern. 23 (1993), in press. 10. Morrell, D. R. and W. C. Stirling, Set-valued filtering and smoothing, IEEE Trans, on Sys. Man Cybern. 21 (1991), 184-193. 11. Pshenichnyi, B. N. and V. G. Pokotilo, On observation problems in discrete systems, PMM J. of Appl. Math. Mech. 45 (1981), 1-6. 12. Rockafeller, R. T., Convex Analysis, Princeton University Press, Princeton, N. J., 1970. 13. Schweppe, F. C , Recursive state estimation: unknown but bounded errors and system inputs, IEEE Trans, on Auto. Contr. 13 (1968), 22-28. 14. Stirling, W. C. and D. R. Morrell, Convex Bayes decision theory, IEEE Trans, on Sys. Man Cybern. 21 (1991), 173-183. 15. Witsenhausen, H. S., A minimax control problem for sampled linear systems, IEEE Trans, on Auto. Contr. 13 (1968), 5-21. 16. Witsenhausen, H. S., Sets of possible states of linear systems given perturbed observations, IEEE Trans, on Auto. Contr. 13 (1968), 556-558. Darryl R. Morrell Department of Electrical and Computer Engineering Arizona State University Tempe, AZ 85287 [email protected]
160 Wynn C. Stirling Department of Electrical and Computer Engineering Brigham Young University Provo, UT 84602 [email protected]
D. Morrell and W. Stirling
Distributed Filtering Using Set Models for Systems with Non-Gaussian Noise Lang Hong
Abstract. An algorithm of distributed filtering for systems with non-Gaussian noise using set models with confidence values is derived. Unlike the Kalman filter, this algorithm requires no statistics of noise distribution. The only information needed is the sets with confidence values from which the modeling and measurement errors and the initial values are obtained. Therefore, the algorithm has great potential for real-world applications.
§1 Introduction The Kalman filter which has been widely used, e.g. [1,2], requires specific information about the first moments and the second moments of the stochastic processes (the modeling error and the measurement error) and the information about initial values. Such information may not be available in many applications. To overcome this, Schweppe [6] introduced an unknown-but-bounded model based on which a filter using set operations was derived. The model introduced by Schweppe did not assume any knowledge of the statistics of stochastic processes and initial values. Instead, the errors were assumed to be white, unknown-but-bounded processes and the initial value was an unknown-but-bounded vector. Although the unknown-butbounded set model showed great potential for real-world applications, it has the following drawbacks: (1) The boundaries of the sets are very hard to determine in practice, or the boundaries have to be infinitely large to include every possibility; (2) if the boundaries are not precisely known and since set intersection operation was used in the filtering process, there may be no solution if the two sets are not intersected; (3) since there is no weighting involved in the intersection operation, there is no way to tell which intersection contains more likely solutions when applied Approximate Kalman Filtering Guanrong Chen (ed.), pp. 161-176. Copyright ©1993 by World Scientific Publshing Co. Inc All rights of reproduction in any form reserved ISBN 981-02-1359-X
161
L. Hong
162
to the distributed filtering. Inspired by Schweppe's idea, this work uses set models with confidence values and applies them to the distributed filtering. The chapter is organized as follows. In Section 2, the problem is formulated and in Section 3, a distributed algorithm is derived using set models with confi dence values. One numerical example is presented in Section 4 to illustrate the effectiveness of the proposed algorithm. Finally, Section 5 concludes the chapter. §2 Problem formulation The Kalman filter is based on the following models [4] Xfc+i = Akx-k + BkUk + Gkik
,
Zfc = CjfcXfc +J)k,
(1) (2)
where u^ is a known deterministic input. The first and the second moments of the stochastic white processes f, and TL are exactly known, i.e. E
dk) = o. EdkiTk) = Qk,
and
E{TQ = 0, E%rJZ) = Rk, and prior information of the state is also known. One problem preventing the Kalman filter from being employed in some applications is that the exact statistics of the stochastic processes, f, and TL, and the initial value xo may not be available. Schweppe [6] introduced an unknown-but-bounded model based on which a filter was derived using set operations. The model introduced by Schweppe is similar to equations (1) and (2) but did not assume the statistics of the stochastic processes, £, and TL , and the initial value xo. Instead, f, and rj, were assumed to be white, unknown-but-bounded processes and xo was an unknown-but-bounded vector. As mentioned in Section 1, Schweppe's model has some drawbacks also. Another re lated work by Morrell and Stirling [5] using set models in filtering and smoothing is worth mentioning. In their work, Morrell and Stirling assumed that the errors £ and ru are white Gaussian stochastic processes, but the initial values are drawn from a bounded set which results in a set-valued filter. A distributed system model x fc+1 = Akxk + Bkuk + Gfc£fc,
4 = Cfrfc + f£,
(3)
i = l,---,N,
(4)
is studied in this work in which the errors and the initial value are assumed from the sets
xoeftx0, £fcefi<;t, and ^ e t i ; ,
i = l,---,N,
Filtering Using Set Models
163
with confidence values c I 0 , c$k and c* . A confidence value associated with an element from a set reflects our confidence on the knowledge that the element is drawn from that set. For instance, a confidence value of 0.8 associated with an element x from a set Q means that the chance that the element x is drawn from the set U is 80 %. Notice that the sets flxo, Q^k and fij^ are not bounded sets of errors as used by Schweppe. Using set models with confidence values makes the boundaries of the sets more manageable. For instance, if the knowledge of the error is uncertain, instead of using a large size of the set boundary in Schweppe's approach, a reasonable size of the set boundary can be determined with a small confidence value. The most important advantage of using set models with confidence values is that the information provided by the different sensors z'k, i = 1, • ■ ■, N, can be weighted by their confidence values in forming an integrated estimate. Unlike the Kalman filter, the estimate here is a set instead of a vector. One could, of course, choose the center of the set as an estimate. The objective of this work is to derive an algorithm for distributed filtering using set models with confidence values. §3 Distributed filtering using set models with confidence values Theoretically, the boundaries of the sets fj 1 0 , 0f t and fi* could have an arbitrary shape. However, the minimum ellipsoidal bounding sets Q I 0 , Q^k and OJ^ are used for easier implementation using a computer, i.e.,
n*o c o*o,% £ % andft*,c n;fc.
(5)
The ellipsoidal bounding sets are given by 0 I 0 = {x 0 : (x 0 - m I O ) T r i o - 1 ( x 0 - m I 0 ) < 1},
^
= {ik-lTkQk-%
(6)
«* = (at ■■£^r% < i}where r i 0 , Qk and Rk are positive definite and so chosen that the bounding sets are the smallest sets containing the elements of fil0, Q^k and fij^ respectively. Notice that any set of 0 l 0 , f^fc and Q'nk can be an empty set. In this case, the corresponding formula in equation (6) specifying the set is not used. The center m l 0 of the ellipsoid fiI0 is determined accordingly so that £lxo is the minimum bounding set of 0 X o . In the following, a distributed algorithm is derived assuming that an integrated estimate (set), QXkk, is available (i.e., mXk and r£ k are known) at the kth moment with a known confidence value cIk k. The estimated QXk can be described as 0 I t , t = {*£,* : (*£,* - mXkk)T(TbXk:k)-l(xlk - mbXkk) < 1}, (7) where the superscript b denotes a bounding ellipsoid. The algorithm starts with initial values r £ o o = Txo and Qbxo 0 = Qxo given by equation (6).
L. Hong
164 1. Propagation from tk to tk+iThe propagation of the set £l£t
from tfc to tfc+i can be described by
«£*+,.» = i*Ui,k ■ ( * W - '«L +1 .J T (r p x t+1 ,J- 1 (^ +1 , fc - <
+ 1
,J < i}, (8)
where the superscript p denotes the propagated set and K
t + U
= A * r a ^ + Bfcufc, and r £ t + 1 _fc = A f c r * t ^ .
(9)
The confidence value for HPfc+i fc is unchanged, i.e., c£ t + l t = cj fcfc . Similarly, the propagation of the set fl^fc is given by
^k+1 = {S + 1 =f & T + 1 wv I r 1 ^ + ^ 1 >'
(10)
Ql+l = GkQkGTk,
(11)
where provided that Qk is positive definite and Gk is not a null matrix. If Gk or Qfc is a null matrix which corresponds to an exact system model, one knows that the set n? is an empty set, without using equation (10). The estimate set at tk+\ before updating is given by a vector sum as n
^+,.k
={*k+hk--Xk+i,k = xpk+i,k + Li+v for any x£ +1|fc e WXk+i k and any g + 1 e
fi£fc+i}
(12)
Unfortunately, the vector sum of two ellipsoidal sets is not an ellipsoidal set in general. However, a bounding ellipsoidal set H* of fiit+lit, which is used as the estimated set, can be found (see Appendix Al): « L » . f c = {*ti.fc = ( A | + i , f c - m t k + 1 ( t ) T ( r ^ + l i f c ) - 1 ( x | + 1 > f e - m t ) k + l p J < 1}, (13) where
< u
=
<
+ l
, = ^ < +
B
*
u
^
(14)
and 1+
C e
( rW " --«(tr{Q^ M }) ]rr? f c ) 1 fc + I 1 + - ^ 3f^" ' «'>») '...> + I' + - ^
« « ■ <15>
In equation (15), tr{-} is the trace of a matrix and w(tr{Q^ +1 }) is a step function
»««„»^{j; » } : r
<«>
Filtering Using Set Models
165
The bounding ellipsoidal set 0* k given by equation (13) is a minimum bounding ellipsoid based on the weights of two ellipsoidal sets 0J fe+J fc and 0|' . The step function is used for the consideration that when Qk = 0 (exact modeling), r*H + l.fc T?tJ_, ■_■ The confidence value for fi' . is
C
"xfc+i,» T/b nf!fc+l,k k+llk
6 _ Xfc+l.fc *fc+l,fc
(17)
K V
where VA
= volume of 0*
„
and V
L+x,k = volume of Q,bk+1 >fe ,
where 0^+1 k is the set when the confidence value c£ fc+11 is one (the ideal case). With the assumption of ellipsoidal sets, the confidence value c£ can be calcu lated as 11/2
c*
k
IrIfc+1'fcl
=
,
(18)
|(l + U ( i r { Q ^ + 1 } ) ) ( r ? t + 1 , t / c ? t + 1 , t + Q ^ 1 / c f t + l ) | 1 / 2 where | • | represents the determinant of a matrix. 2. Updating at tk+V The propagated bounding ellipsoidal set fi*fc, 2 t (with a confidence value cx k) is updated by N measurements taken at £fc+i. In this distributed filtering algorithm, each sensor provides a solution set based on its own measurement and the knowledge of measurement error r>) , e fi* ,. The solution set for sensor i, Q.\., , i = -U:+l
Vk+1
'
I
»2fc+l'
1, • • •, JV, is specified by « U + 1 = {xfc+1 : ( 4 + 1 - Q + l X f c + l ) T ( J R , f c + 1 ) - 1 ( 4 + l " Ct+iXfc+i) < 1} = {x fc+1 : (x fc+1 - m t k + 1 ) T C i J + 1 ( f l i , + 1 } - 1 Q + 1 ( x f e + 1 - mj fc+1 ) < 1} , (19) where « 4 + 1 = (C i ^ + 1 (ii* f c + i)" 1 Ct + .i) + Cifi(« i iN-i)~ 1 "M.i, +
(20)
where (-) denotes the pseudoinverse of the matrix. The confidence values of the sensor lensor solution sets are the same as those of measurement error sets, i.e., i.e.,
4,, t + 1 =4 + 1 .
..-,JV. iJ ==1 ,l,-,^-
166
L. Hong
The derivation of equation (19) is given in Appendix A2. The updating is done by integrating N solution sets with the propagated bounding set. The integration is performed by the intersection of the propagated bounding set &bXk+lk and the solution sets of N sensors flL , , i = 1, • - •, N
fU+1.t+1 = {Xfc+i.fc+1: ik+uk+i e nbXk+ik n n ^ t + i n n ^ n - • - n f i ^ J . (21) Again, the intersection of ellipsoidal sets is not an ellipsoidal set in general. A bounding ellipsoidal set of UXk+1 k+1, which is treated as the estimated set, can be found by weighting each set by its confidence value: =
^ifc+l.k+l
{*fc+l,fc+l
:
(Xfc+l,fc+l ~
m
it+i,fc+i)
C"
ik+l.k+l)
(x|+1)fe+i-m^+lit+l)
(22)
where
(fxt+,t<+,fc + £ ^ , » U J (23)
< + , t + 1 = (rik+1,k + E * U J and r
/
JL
The weight of each set is reflected in f* is attached to each set:
k
C
r6 '-It+l.t XZ
' k+l
Kk+1 = -2
\
x k + 1 ( k + I = ak+i ( TXk+1tfe + ~£ ^i*+i J
^
~
and ^ i t + ] in which a normalized weight
*fc + l , k
T^h-l
+V
c» ' Z_> i = l l - i , « f c + 1
/il
1
(24)
/D>
r6;*
\ —1/">»
C* f c + 1 (iT f c + 1 )- 1 CJ + 1 ,
,
(25)
i = 1, • • •,JV.
(26)
The scaling coefficient in equation (24) is determined as / - . I -^h
N b
. S~^ ™
\ i
\
T
/ I ^h
N . v ^ ™
\~l \
< (f^ +1 X t+i . t +]L^ t+i m' t+i j
- ^ . x ^ + E m*!««U"Uj
(27)
167
Filtering Using Set Models
The integration of N sensors, equation (22), is derived in Appendix A3. The bound ing ellipsoidal set flXk+i k+1 actually is a bounding ellipsoid of the following set
nfc+1 = ( 5 ( < + i 4 n n ^ + ^ n n ^ t + J u ( B ( < + ! , f c n ^ + i ) n n t t + i ) u ■ • ■ u (B[n»£t n n^ + i ) n o ^ + i ) ,
(28)
where B(A) denotes the bounding ellipsoidal set of set A. Since fi/t+i is the bound ing ellipsoid of the union of the intersection of the bounding ellipsoid of any two sets with any other set, the estimated set fi£t i 1 is more robust than the original intersection set n i t + 1 t + 1 . For instance, if prior information of one sensor measure ment error is not correct or if one sensor fails, the intersection of sensor solution sets (equation (21)) might not exist, i.e., the intersection set dXk+lik+l might not exist. However, in this case, the bounding ellipsoidal set H* still exists since fi/t+i exists. The confidence value for the QXk is calculated by I 6* I1/ | it+i.t+i|
b
c*I«:+l,t+l C
2
r i = |1 x t + 1 ' t + 1 l 7T/2
_
I
f2
'
9)
Ifc+l,k+l1
where fj> ,-6
„
+
/|
1
1 *
N
^S^
x // ^x \ k+l,k fe+,ifc \
N
1
clfcT+i( +i) lcUi
^ "
)
(30)
So far, we have completed the derivation of the algorithm by providing the updated estimate set Qb , . . and the associated confidence value cL ., ... given Qb . and C\k The updated estimate set £l\k i is optimal in the following sense: (1) the bounding set ilXk, 1 k of the propagate set QXk+l k is minimum, and (2) the bounding set of the union of the set intersections given by equation (28) is the smallest. In the next section, the distributed filtering algorithm is applied to one numerical example to demonstrate the effectiveness of the algorithm. §4 An e x a m p l e One object moves on a two dimensional surface in an elliptical course with a constant speed, Figure 1. Five observers are located at five different places using imperfect sensors to measure the location of the object every three degrees, Figure 1. It
L. Hong
168
is assumed that the measurement errors of sensors are uniformly distributed (not Gaussian). The dynamics of the object is modeled by cos(3) | sin(3) xfc+i
-f sin(3)
Xk
cos(3)
Vk
A
(31)
x*
and the five measurement models are given by
where
(32)
< = 1.
J^U jAj, Jjjsi
NL JJ,
„i _ \cos(6l) -sin(0")' ° f c " [ sin(fl') cos(^) J '
(33)
6l = 4 5 ° , 62 = - 1 5 ° , 0 3 = 4O°, and
5o 43 44 5 _6 ii _ r°i & r i 1, 3 ri h* \ 1 / . r ° L°J i 4 5 J ' ~ 1 5 2 J' ~ L - 5 5 J ' ~ l - 3 7 . '
„, i „-„ K„i: ^ t . m r . l . . ^t; .+ ^;U,,*„^ ™,™. The sensor measurement errors are believed1 t„to u„ be „uniformly distributed over a disk r
with a radius of four, i.e., in Equation (6) R'k =
n
1 fi n
„ 0
.„ , i — 1, 16
, 5, with
confidence values <* = 0-8, <
= 0.8, <
= 0.9, cl = 0.75
0.85.
The initial value xo is from the following set xo e f2I0 = {x 0 : (x 0 - [33
2.5] T ) T
36 0
0 16
(XQ - [33 - 2.5]T) < 1}
with a confidence value cxa = 0.4. Figure 2 shows one sensor's measurement error distribution regions. The trajectory of the moving object is estimated by integrating the measure ments provided by five sensors. Since the measurement errors are uniformly dis tributed, the Kalman filter is not suitable for this. The distributed filtering algo rithm developed in this chapter is applied. The estimated trajectory sets are shown in Figure 3 and the trajectory of the center of the estimate set is shown in Figure 4. One can see that starting from a big set (large uncertainty for xo), the estimate quickly converges to small sets. Compared to the sensor measurement error distri bution regions (Figure 2), the sizes of the estimated sets are much smaller, due to the integration of measurements from five sensors.
Filtering Using Set Models
169 §5 Conclusions
An algorithm of distributed filtering using set models with confidence values has been derived. The algorithm works for systems with any noise distribution as long as the sets which the noise and the initial values belong are known with confidence values. The algorithm shows great potential for real-world applications which is partially illustrated by the example given in the chapter. References 1. Bar-Shalom, Y. and T. E. Fortmann, Tracking and Data Association, Academic Press, San Diego, CA, 1988. 2. Hong, L., Adaptive distributed filtering in multi-coordinated systems, IEEE Trans, on Aero. Electr. Sys. 27 (1991), 715-724. 3. Lay, S. R., Convex Sets and Their Applications, John Wiley & Sons, New York, 1982. 4. Maybeck, P. S., Stochastic Models, Estimation, and Control, Vol. 1, Academic Press, San Diego, CA, 1979. 5. Morrell, D. R. and W. C. Stirling, Set-valued filtering and smoothing, IEEE Trans, on Sys. Man Cybern. 21 (1991), 184-193. 6. Schweppe, F. C , Uncertain Dynamic Systems, Prentice-Hall, Englewood Cliffs, N. J., 1973.
Figure 1. One object moves in an elliptical course and five observers measure the location of the object using imperfect sensors.
170
L. Hong
Figure 2. Measurement error distribution regions of sensor # 1.
Figure 3. The estimated trajectory sets vs. true trajectory.
171
Filtering Using Set Models
Figure 4. The trajectory of the center of the estimated trajectory sets vs. true trajectory.
§6 Appendix
A l Vector sum and bounding ellipsoidal set Given two sets il\ and Sl2: Hi = {Xi : (xi - n n ) T r f x (xi -
mi
) < 1} ,
^2 = {X2 : (x 2 - m 2 ) T r ^ 1 ( x 2 - m 2 ) < 1} ,
(34)
with confidence values C\ and c 2 , a vector sum of these two sets is defined as il = {x = xi + x 2 , for any xi e iti and any x 2 G f22}.
(35)
One example of a vector sum is illustrated in Figure 5 from which one can see that the vector sum of two ellipsoidal sets is, in general, not an ellipsoidal set. A bounding ellipsoidal set can be found as follows.
172
L. Hong //bnp
Figure 5. A vector sum of two ellipsoidal sets S7i and JI2 with confidence values ci = 0.8 and c2 = 0.6 and a bounding ellipsoidal set Qb A concept of support function is introduced first. A convex set C in a Euclidean space £ can be described by a support function /i(/z) [3]: h({i) = /i(^) = s u p £ T x < co. 00.
(36)
x€C
An ellipsoidal set f2i which is a special case of convex sets can also be described by a support function T htM(g) /D M ) ==M mI i! + yf^iE y7fTriAf
(37) (37)
If the set ^4 is a subset of the set B, i.e., i C B , then /t^Q*) < AgQf)- The support function of a vector sum fi of the ellipsoidal sets fli and fi2 is given by
/i(/£) = M/f) + M/£) T
(38) TT
/ TT
= //x( m (mi1 + mj,) m 2 ) + ^^ r ri /^x ++y ' y/ £/£ rr22 M M T T
T = /£ m + yffi yJ^Ty., = /£ m+ Tfj,,
where
(39) (40)
111111 = = 111! 111! + +m m22,, but but rr ^^ rrtt + + rr22..
(41) (41)
Apparently Q is not an ellipsoidal set. Using the Cauchy-Schwarz inequality /
NN
\ 22
N N
1l
(x>) S g ^
(42)
Filtering Using Set Models
173
where N
0 < ai < 1 and V ] a, = 1, I—1
we have /xTI>=(/v(/i)-MTm)2
(43)
= (v/MTr1£+
S/BTT^)
<-fiTT1fi+—tiTT2fi
(44)
= M T rV,
(45)
where 0 < oi, a.2 < 1, and ai + 02 = 1
(46)
r 6 = — ri + — r 2 .
(47)
and ai
ai
The coefficients ai and 0,2 are determined from the confidence values c\ and C2 by ai =
and 0,2 = Ci + C2
■
(48)
C\ + C2
The bounding ellipsoidal set is then Qb = {x6 : (x 6 - m'') T (r 6 )- 1 (x ( > - m 6 ) < 1} ,
(49)
where mb = m i 4- m2 and Tb is given by equation (47). A bounding ellipsoid is shown in Figure 5. A 2 Sensor solution sets From Equations (4) and (6), we have
K, = K
:
(!Z i fc ) T (^fc) _1 i < 1}
= {xfc : ( 4 - Q x f c ) T ( f f f c ) - x ( z t - Ctxfc) < 1} = n^t, i = l,--,7V.
(50) (51) (52)
To change the form of the sensor solution set £TXiZ from equation (51) into a standard form flU
= {** : (xfc - m^ + 1 ) T (H i x ,, f c )- 1 (x f c - mi f c + ] ) < 1},
(53)
174
L. Hong
the definition of the support function is used. The maximum x t from the solution set, x malfc (/£) is derived by solving {*fci£+ § ( ( 4 - Cixk)T(FVk)-\zl
^
- CUk) - 1 ) } = 0,
(54)
where the 0 is a Lagrange multiplier. x m o i t is given by x m a i t = {C*l{R\ylCk)+C>l{R\)-xJk
- (CiTk(Wk)-1Cl)+iJ,/l3,
(55)
where (-) + is a pseudoinverse operator. By solving the equation d_
{ x J M + 2 ( ( 4 - ^x f c ) T (ff f c )" 1 (z' f c - C£xfc) - 1) } = 0
(56)
and substituting Xfc with x m a l f c , the Lagrange multiplier /3 is determined as 0 = ±JET(C'kr(Rik)-1Cl)+H.
(57)
Substituting (55) and (57) into (36), the support function for the solution set 0X)Zfc is then h
*,*(t*)
=/£Txmait T + l T 1 c>k)+(i = = t£ {C>l{R\)-'Ci) { C ^ < f l i f c r 1 CC*l{R\)£ ) + c £ ( #*f kc r 1 - i + + ^ ( ^ J^ ( t {C f filc{R)ik)- -i ^)-^ T
,
/
T
_ .
= AfTmk+1 + \fl£TRi^^E.
(58)
where
< , , = (C<J(rft)-1Ci)+Ci2'(iiih)-1zt
(59)
* U = (ClJ(fffc)-1Q)+.
(60)
and Finally, putting the solution set into standard form results in equation (19). A3 Set intersection and bounding ellipsoidal set An intersection of (N + 1) sets, H; = {XJ : (x; — m ; ) T r r 1 ( x i — m,) < 1}, i = 1,• • • ,N+ 1, is described by
n = {x:xGfli nn2n---nftjv+1},
(61)
Filtering Using Set Models
175
where I \ are positive definite matrices. Since it is not an ellipsoidal set in general, a bounding ellipsoidal set is given by Slb ={x 6 : ai(x fc - m i ) T r 7 1 ( x ' ' - m i ) + a 2 (x 6 - m 2 ) T r ^ 1 ( x i ' - raj) + • • • + ajv + 1 (x 6 - mN+1)rT^l+1{xb 6
6
6 T fc-1
={x : (x - m ) r
6
- mjv+i) < 1}
6
(62)
( x - m ) < 1} ,
(63)
JV+l 0 < a, < 1 and E at = 1,
(64)
where
i=i
and mb and Fb are determined as foDows. Solving the equation A { x{ bx 6%%++ ^| ((aa ii ( x 6 - mmi 1))TT r r 1V — (xb -
m m ii )
+ a 22(x 66 - m 22) T) rT2 -I17(xV, > - m 22)
6 T 6 + ■ • • + aajv+i(x - mjv+i) - 1)} = 0 w + i ( x - m N + 1i ) r ^ + 1 ( x
(65) (6!
gives /JV+i /JV+l
_1
\
xx L L = r-M = (( E E ^^r-M
/N+i /JV+I
\\
£ a^r-V ^ - ' m , -- M//3J . (( £ tip\.
(6. (66)
The Lagrange multiplier multipher (3 is obtained by solving the equation ^ (a,(x ( x 6h - m , ) ^ { x bf cTT/ £ + §
T
r r V - mi) mO + a 2 (x 6 - m 2 ) T r 2 - 1 ( x t - m 2 )
mN+1) - 1)} = 0 + • • • + a N + 1i ( x 6 - m N + 1 ) T r ^ + 1 ( x 6 - mjv+i)
(67) (6'
and substituting x 6 with x ^ O I , which is
9
^H——s-^—• =H a '
(&
where the coefficient a is given by /N+\
/JV+l
\
\
T
/N+i
\
_ 1
/N+l
I E o.r.7''".) - f E W r ^ n n
\
1■
(6'
176
L. Hong
Forming the support function for Qb, one can easily see that \
/N+l
- 1
m^f^o.r-O
\
/N+l
(£air-1m,j
and
\ - 11
/N+l /N+i
r^af^a^-M
-
(70)
(71) (71)
If each set Cli is associated the confidence value d, the weighting coefficients (H, i = I, ■ • ■, N + 1 are specified by (72) ai ~ 2^=1 TN+lCj c' C 2-< j=l
3
Figure 6 shows an intersection of three sets and a bounding ellipsoidal set.
Figure 6. An intersection of three sets Oi, CI2 0,2 and andQ3 Q3with withconfidence confidence values values Ci = 0.8, C2 = 0.7 and C3 = 0.6 and its bounding ellipsoidal set Q.b
Lang Hong Department of Electrical Engineering Wright State University Dayton, OH 45435 [email protected]
Robust Stability Analysis of Kalman Filter under Parametric and Noise Uncertainties Bor-Sen Chen and Sen-Chueh Peng Abstract. The linear discrete-time Kalman filter with both uncertain secondorder statistical noise properties and uncertain system parameters is considered. We investigate the conditions for robust stability based on the upper norm-bound of noise and parametric uncertainties. Saddlepoint theory and the discrete-time Belhnan-Gronwall lemma are employed to solve this problem.
§1 Introduction The Kalman filter is shown to be the optimal state estimator against noise with Gaussian distributions by minimizing a wide class of error cost functions (Kalman and Bucy [6]). But in the real world the noise distribution may not be Gaussian, or its autocorrelations may not be known exactly. Several papers (Morris [8], Nahi [9], Kassam et al. [7], Poor and Looze [10]) have treated this problem from the minimax viewpoint by obtaining the saddlepoint solution for the worst-case of second-order statistics. Recently, the stability of Kalman-Bucy filter under parameter perturbation and noise uncertainty in continuous-time systems has been discussed in Chen [2]. In this paper, the robust stability of Kalman filter in discrete-time systems is concerned. The system faces to not only noise uncertainty but also linear timevarying parametric uncertainty. A sufficient condition for the robust stabilization of the Kalman filter with noise and parametric uncertainties is investigated. Saddlepoint theory and the discrete-time Bellman-Gronwall lemma are employed to treat this problem. Finally, a scheme is introduced to ensure the Kalman filter to satisfy the requirement of robust stabilization. The paper is organized as follows. The problem formulation is given in Section 2. Some robust stability criteria are derived in Section 3. An example and some conclusions are given in Sections 4 and 5, respectively. Approximate Kalman Filtering Guanrong Chen (ed.), pp. 179-192. Copyright ©1993 by World Scientific Publshing Co. Inc All rights of reproduction in any form reserved ISBN 981-02-1359-X
179
B. Chen and S. Peng
180 §2 P r o b l e m formulation
First, let us consider the following standard multivariable dynamic system model without noise and parametric uncertainties: Xjt+i = Axk + L , (1) Vfc = CXfc + 7?^ ,
where for each i t > 0 , x e R " , v e Hm, and A and C are constant matrices with ap propriate dimensions. We assume throughout that {^ and {7/fc} are uncorrelated, zero-mean, wide-sense-stationary random processes, i.e., E
E
£{7 Z j t }=0,
E{UkrlJ}=
HJ = °.
{Lk£} = Q6^ R6ke,
E
{Lk£} = °> ^{xo|J} = 0, £ { X 0 T £ } = 0 ,
for all k, £ — 0,1, •- •, where Q and R are nonnegative definite and positive definite matrices, respectively, and are both symmetrical. It will be assumed throughout this paper that (A, C) is an observable pair. In the one-step-ahead prediction problem, the state estimator without uncer tainties has the form x f c + , = Axfc + Gfclvfc-Cxfc] (2) and the performance is defined as J = E{ijxk}
,
(3)
where Xfc = xk — x t , called the estimation error. Then the Kalman filter is to choose Gfc in equation (2) to minimize the performance J at each k, and its steady-state solution is recursively given by Gk = APkCT{CPkCT Pk+1 = APkAT
+
R)-1,
+ Q- APkCT(CPkCT
+ R)-1CPkTAT
,
(4)
with Po = Cov (xo). The Kalman estimation problem assumes that the spectral densities of the process and measurement noise as well as the parameters of the system (1) are exactly known. When these spectral densities and the parameters are uncertain, the filter's performance is suboptimal and may even exhibit apparent divergence. From a practical point of view, we assume that the actual spectral densities and the deviations of the linear uncertain parameters of the system (1) are unknown but bounded and contained in some non-empty convex and compact sets. We
181
Robust Stability Analysis
reformulate the system (1) in the following form to take into account all the effect of the uncertainties: xjt+i = Ax/t + AAfcXfc + ^ , vfc = Cxfc + ACfcXfc +j2h,
(5)
where AAk e {\\AAk\\ < a} ,
AC fc e{I|AC fc ||< 7 }, and QeSQ = {\\Q-Q0\\0}, ReSR = {\\R-Ro\\ 0}. Here, AAk and AC*: are the linear parametric uncertainties of the matrices A and C, respectively; Qo and RQ are the nominal parts of the actual spectral densities of the noise f and ry,, respectively; and a, 7, p\ and P2 are given positive constants. Then the problem to be investigated in this paper is how to design a robust Kalman filter (2), according to the nominal system (1), such that the filter can asymptotically track the true states in the presence of both the parametric and the noise uncertainties in the dynamic system (5). §3 Design of a robust Kalman filter In this section, we separate the approach for the design of a robust filter into two steps. Before further analysis, some mathematical tools and definitions needed for solving the problem are first introduced. Let the norm of a real stochastic vector x e Kn, denoted by ||x||, be defined by [5,11] ||x|| =: ^ { x T x }
(6)
Then IIAxll ||ylx|| 2 = Elx E{xTTAATTAx\ Ax}
= tr{Eix tr{E{xTTAATTAAx}} x}\
}\ = tr{E{ATAxxT}}
,
(7)
where tr denotes the trace operator, and
Px|| 2
A)
V
if A is deterministic
Pll ==
(8) T
y^max(E{A A})
if A is stochastic.
182
B. Chen and S. Peng
Then we first consider a system with only perturbed measurement and process noise covariances. That is, AAk = ACk = 0 in the system of (5); in this case the system (5) becomes
x fc+1 = Ax f c +£ f c , vfc = Cxfc + r^ , and QeSQ = {\\Q-Q0\\0}, R e SR = {\\R - Ro\\ < P2,R > 0}. The problem of designing a Kalman filter in accordance with the above system can be considered as a minimax problem (i.e., a problem of finding the solution of mine maxQ^/j ||x — x||2 which has been discussed in several papers, Morris [8], Nahi [9], Kassam et al. [7], Poor and Looze [10]) in order to treat this noise uncertainty problem. By employing the saddlepoint and minimax theorems (Poor and Looze [10]), the following theorem follows immediately. Theorem 1. (Poor and Looze [10]) If SQ and SR are convex and compact sets, then there exist Qs 6 SQ and Rs € SR such that min max llx — xllo = Qmax min llx — xllo G Qes " " €S c Q
Q
R€SR
R6Sfi
and the following inequalities also hold: tr{ZQ} < tr{ZQs} T
tr{G sZGsR)
T
< tr{G sZGsRs}
VQ e SQ, VReSR,
(9) (10)
where tr denotes the trace of the bracketed matrix, Gs is the Kalman gain corre sponding to Qs and Rs, and Z is the solution to the equation (A-GsC)TZ(A-GsC)-Z
+ I = 0.
(11)
The pair Qs and Rs is called the saddlepoint or least favorable pair for the minimax problem, and the Kalman filter in terms of this least favorable noise spectra is a robust Kalman hlter under uncertain noise. Since the two sets, SQ = {\\Q - Q0\\ < p\,Q > 0} and SR = {\\R - R0\\ < P2,R > 0}, are convex and compact, there exists a pair of saddlepoints, QQ + p\l and RQ + P2I, which are just the maximal elements in SQ and SR. Then using Theorem 1 to solve our present problem, we obtain the following lemma.
183
Robust Stability Analysis
Lemma 2. The robust Kalman filter under noise uncertainty is the Kalman filter of (2) with the least favorable noise covariances, Qo + p\l and Ro + P2I, i.e., Xfc+i = Akk + Gk[vk - Cx f c ], Gk = APkCT[(Ro + p»I) + CPkCT}-1, Pk+1=APkAT + (Q0 + piI) - APkCT[(R<, + p2I) + CPkCT)-lCPkTAT
(12)
If we consider the system being disturbed both by uncertain noise covariances and by uncertain parametric variations, the minimax filter in Lemma 2 may not still be robust. In order to let the robust Kalman filter of (12) (i.e., the Kalman filter with the least favorable noise covariances) be stable under parametric perturbation, more constraints must be imposed. So advanced analysis is required. For convenience, the dynamic system (5) and the Kalman filter of (12) with the least favorable noise uncertainties are rewritten as follows: (13) (14)
x fc+1 = Axk + AAkxk + £fc vfc = Cxjt + ACfcXfc + n and Xfc+i = Akk + Gk[vk - Cxfc], T
+ pal) +
F fc+ i = APkAT
+ (Qo + fill)
Gk = APkC [(Ro - APkCT[(Ro
T
(15) (16)
1
CPkC }~ , CPkCT\-1CPkTAT
+ p2I) +
(17)
Subtracting (15) from (13), we get Xfc+1 = Xfc + 1 -
Xfc+i
= (A - GfcC)(xfc - xfc) + AAkXk + £k - GfcACfcXfc -
G^
(18)
Combining (13) with (18), it follows that Xfc+1 Xfc+1
A 0
+
0 A - GkC
f
Xfc Xfc
A^fc AAt-GfcAGfc
0 0
Xfc Xfc
0 Gfc
L+
(19)
Let us define Xfc
=
C =
Xfc
XfcJ '
T l\'
A 0
A= and
D
0 A-
GkC 0 Gfc
AA
A.4fc
0
A^-GfcAGfc
0 (20)
B. Chen and S. Peng
184 Then (19) can be rewritten as x fc+1 = Mk + Ai f c x f c + CZk + Dr^ .
(21)
It is easy to see that, if A is stable, then the transition matrix $£ and the solution of (21) are given as * fc = Ak (22) and k-\
k-\
xfc = $fcx0 + Y, ®k-x-AA^
k-\
+ Y ®k-i-iC£. + J2 $fc-i->%
i=0
i=0
(23)
i=0
Since all the eigenvalues of A are distinct and inside the unit circle, the transition matrix 3>fc satisfies the inequality \\1>k\\ = \\Ak\\<mrk,
k>0,
(24)
for some constants m > 0 and 0 < r < 1. Simply choose r = max |AJ(J4)| , j
where A_,(J4), j = 1,2, ■ ■ ■ ,n, denote the eigenvalues of A. That is, r is the absolute value of the eigenvalue of A nearest the unit circle. An estimate of m can be made from ||j4fc||/rfc < m for all k. How to get m is sometimes very difficult. Fortunately, m can be obtained with the aid of a computer. Now our problem is to determine under what condition x^ will converge as k —i- oo in the presence of both the parametric and the noise uncertainties in the system (5). To provide a method of solving this problem, the Bellman-Gronwall lemma in discrete form is employed. The lemma is stated as follows (Desoer [4]): Lemma 3. (Bellman-Gronwall Lemma) Let {Mfc}o°, {/fc}o°) snd {/ifc}§° be real-valued sequences on the set of positive integers Z+. Let hk>0, Vfce Z+. (25) Under these conditions, if fc-i
uk
+ J2h*u"
k = 0,l,2,--,
(26)
;=o then fc-i ( k-i
uk
"j
II l ^ W i
t=0 [j=i+l
. fc = 0,1,2, ■-- , J
(27)
185
Robust Stability Analysis where Ylj=i+i[l + hi] is set to 1 when i = k - 1. Remark 1. For some constant h, hi < h, Vz, (27) becomes fc-i
«fc f c + ^ ^ ( i + /i) f c - 1 -7ii=0
Remark 2. For some constant / , fi < / , Vz, (27) becomes fc-i
uk < f 1 ] [1 + hi]. >=1
We make the observation that (21) has the perturbation term related to Xfc. It is feasible for us to apply the Bellman-Gronwall lemma to obtain a stability criterion. Based on Lemma 3 and the preceding definitions, we can relate a sufficient condition of stability to the Kalman filter (21) in the following theorem. Theorem 4. Consider the state-estimator system (21) with the induced norms, llfyjl = 52, ||f. || = Si, and suppose the transition matrix $fc fulfills the requirement of (24). If the stability inequality r + m||AAfc||
(28) is still
Proof: See Appendix Al. Remark 3. From (28), it is obvious that the robust stabilization of the Kalman filter is related to the location of the eigenvalues of A (or A and A — GkC) in (20). Suppose the eigenvalues of A are located deep enough inside the unit circle (i.e., the signal system of (13) is still stable even though it suffers from noise and parametric uncertainties (this assumption is reasonable because we would not initially choose an unstable signal system)). In this situation, the robust stability is completely determined by the location of the eigenvalues of A — GkC. Remark 4. For stochastic control systems, when the LQG optimal control design technique is considered to apply to this control problem, the Kalman filter must be employed to estimate the state variables. However, in actual control systems, neither plant parameters nor noise may be known precisely. In order to make the estimation of the state variables more practical, the above robust Kalman filter may be used to treat this estimation problem in LQG optimal control systems (Chen [3])-
B. Chen and S. Peng
186
Remark 5. From Theorem 4, it is assumed that m[|Aiifc|J is evaluated, and all of the eigenvalues of the Kalman filter must be inside the disk of radius r < 1 — m||AAfc|| in order to guarantee the stability of the Kalman filter. However, if the eigenvalues of A — GkC in (12) are not all inside the disk of radius r, a scheme proposed by Anderson [1] is employed to treat this problem. Suppose we artificially multiply the covariances Q and R in the system of (1) by (l/r)2k and r < 1, respectively; i.e., Q' = Q(l/r)2k and R' = R(l/r)2k Anderson has shown that the system (2), the Kalman gain Gk, and the P in the Riccati equation (4) can be changed to (29)
x fc+ i = [A - GkC\kk + GfcVfc , Gk = APkCT[(R + p2I)(l/r)2k + CPkCT}-1, T 2k Pk+l = APkA + (Q + PlI)(l/r) T - APkC [(R + p2I)(l/r)2k + CPkCT]-1CPkTAT
(30)
(31)
Then the estimation error Xfc of the filter in (3) will converge at least as fast as rk when k increases; i.e., all of the eigenvalues of (A — GkC) are inside the disk of radius r. After deriving the sufficient condition under which the Kalman filter is stable, we can find one form of bound by the estimate given in Theorem 5. Theorem 5. Consider the state-estimator system (21). If the stability criterion of (28) is satisfied when A; —> oo, then an upper bound of \\x.k\\ can be evaluated by m ( g l llgll+fla || j | | l-(r+m||AAfc||
(32)
Proof: See Appendix A2. §4 An example To illustrate the stability criterion described above, we consider a simulation exam ple as follows. Consider the following disturbed dynamic system: 0.7 0.2
o.r
£l,A
0.5
X2,k
Vk = [-10
10]
x
l,k+l X2,k+1 .
Xl,k X2,k
+
0.2 sin (0.01A: 0
+ [e -2k
0]
0 0.1e"2fc
Xl,k X2,k
+L,
Xl,k
x2,k
with £
{ ! J = °>
E
{Vk} = 0, E{£k(J} = Q6kt, and E{t,kVJ} = RSkt,
187
Robust Stability Analysis ifhere
0.0001 0 0.0001
QzSQ = {Q0
0 0.0001
and Qo
0.0004 0
0 0.0004
R„
0.0001.
Besides, the error covariance matrix PQ is initialized as 1 0 0 1 How do we design a robust Kalman filter to estimate the states of the above system with parametric and noise uncertainties? In accordance with system (5), we have A
0.7000 0.2000
C
-3.0000
-0.1000 -0.5000 10.0000] ,
AAk and
0.2000 sin(O.Olfc) 0
0 O.lOOOe-2^
ACk = [e"2fc 0]
Then ||AAfc|| < 0.2000 = a, ||ACfc|| < 1 = 7, and from the definition of SQ and SR, the following inequalities hold: \\Q - Qo\\ < 0.0001 = pi, \\R-Ro\\ < 0.00005 = p2. Based on the proposed design method, the estimated states x^, Kalman gain Gk and error covariance Pk are easily obtained recursively. Herein, Gk is calculated from (16) and (17) in accordance with Q0 + p\ and R0 + p2 and the steady state Kalman gain is obtained as -0.0305 0.0145], which gives ||G f e ||= 0.0337.
It is easy to check the inequality (28), so we know the Kalman filter with the gain Gk is a robust filter for the system given in this section under the parameter and noise uncertainties. The results of simulation are shown in Figures 1 (a) and 1 (b).
188
B. Chen and S. Peng
Figure 1. Computer simulation for robust Kalman filter under parametric and noise uncertainties in the example with true state ]x\, X2] with initial condition [1, 1] and state estimate [x\, £2] with initial condition [0, 0].
189
Robust Stability Analysis §5 Conclusions
A sufficient condition has been presented to ensure the stability of the state esti mator with a Kalman filter. If the sufficient condition is not satisfied, it does not necessarily imply system instability. A new robust discrete-time Kalman filter de sign has also been introduced to allow for uncertain noise covariance and uncertain parametric deviation in the dynamic system (5). Our design method is to employ the minimax filter gain G/t to meet the robust stability criterion stated in Lemma 2. If the inequality (28) is not satisfied, the modified method stated in Remark 5 is then applied to treat the design problem under both the parameter and the noise uncertainties.
References 1. Anderson, B. D. 0., Exponential data weighting in the Kalman-Bucy filter, Inform. Sci. 5 (1973), 217-230. 2. Chen, B. S. and T. Y. Dong: Robust stability analysis of Kalman-Bucy filter under parametric and noise uncertainties, Int. J. of Control 48 (1988), 21892199. 3. Chen, B. S. and T. Y. Dong, LQG optimal control system design under plant perturbation and noise uncertainty: a state-space approach, Automatica 25 (1989), 431-436. 4. Desoer, C. A. and M. Vidyasagar, Feedback Systems: Input-Output Properties, Academic, New York, 1975. 5. Goodwin, G. C. and K. S. Sin, Adaptive Filtering Prediction and Control, Prentice-Hall, Englewood Cliffs, N. J., 1984. 6. Kalman, R. E. and R. S. Bucy New results in linear filtering and prediction theory, Trans, of Amer. Soc. Mech. Egnns, Pt. D, J. of Basic Eng. 83 (1961), 95-108. 7. Kassam, S. A., T. L. Lim, and L. J. Cimini, Two-dimensional filters for signal processing under modeling uncertainties, IEEE Trans, on Geosci. Electr. 18 (1980), 331-336. 8. Morris, S. A., The Kalman filter: A robust estimator for some classes of linear quadratic problems, IEEE Trans, on Inform. Theory 22 (1976), 526-534. 9. Nahi, N. E., Bounding filter: A simple solution to lack of exact a priori statistics, Inform. Contr. 39 (1978), 212-224. 10. Poor, V. and D. P. Looze, Minimax state estimation for linear stochastic sys tems with noise uncertainty, IEEE Trans, on Auto. Contr. 26 (1981), 902-906. 11. Vidyasagar, M., Nonlinear System Analysis, Prentice-Hall, Englewood Cliffs, N. J., 1978.
B. Chen and S. Peng
190 §6 Appendix
A lI Proof of Theorem 4 insider the the combined combined state state equation equation (21) (21) Consider Xjt+i x fc+1 = = Ax Axkk + + A^fcX A^fcXfcfc + + C£ C£kk + + DT^ Dr^ , where
A-\A
°
A
~[0
C =
k
A-GkCy T
J
,
and
AAk
AA,-[ D=
L
°
[AAk-GkACh
oj '
_
— fc
Solving the preceding difference equation, we obtain the solution fc-i fc-i fc-i
' - AA x + xx fc Ss AAfcxoxo ++ J2£ Ai^-^'Ai^Xft + £J2 AAk-~-l~C£lcik ++ J2£ k
hl l
fc
k 1 l
k k
k
i=0 t=0
i
■
....
_
i=0 i=0
ifc_1 k l
A - ~'D%
(33)
i=0 i=0
._ _ i
Taking norms, we get jt-i
fc-i ||xfc|| < Pk f c ||||x 0 || + Yl M^-^IIIIAifcHllxfcll \\xk\\<\\A \\\\x0\\ + i— n J2\\Ak-l-\\\\AAk\\\\xk\\ fc-1 fc-1
i=0
- fc-1 £ fc-1 iii^-iiiicuiiy + £ II^-^IIPIIII^ + i=0 £ iii^-iiiiciiuy +i=0 J2 ii^-'-iipnii^ii. i=0
(34)
i=0
Using (24), it is found that fc-1 fc-i fc 1_i IJXfeU < mr fc ||x 0 || + ^ m rJ2mr " ' - fc1~| A i||AA xfc|| fc||H fc||||x«||
fc-i fc-i + ^m^-'-'IICIHiejl + 2Zmrk-1-i\\D\\\\rLk
(35)
Dividing both sides of (35) by r , we obtain
r
fc-i -"fc-l ( ffl ||C|| + g2\\D\\) + I ^ l ^ M ^ T V ' l l x i l l . (36) |xfc|| < m||x 0 || + m1-r
191
Robust Stability Analysis Applying Remarks 1 and 2, we obtain r
[acfcII < m||xo|| + m-
\-r
'-(9i\\C\\+g2\\D\\
m||AAfc||^/ m|AJfc|\ *-'-*[ . » > 1H mx 0 + m- 1 - r ■{9i\\C\\+92\\D\\ r ^ V r / [ r~k - 1 m xo + m— (fli||C|| +5 2 |-D ) l —r 2 m \\AAk\\\ / 1 - [1 + (m||AA fc ||/r)] fc Poll l - [ l + (m||AA fc ||/r)] -m 2 ||A,4 fc l l-[l+(m|lAifc||/r)]-fc 1 + l-rj \l-[l + (m\\AAk\\/r)]-1 r
H
1 +
m\\AAk
2
+
m \\AAk\
1 +
r m\\AAk
fc-i
(gi\\C\\+g2\\D\\) 1 \ f 1 - (r + 1 - r / U - {r +
m\\AAk\\)-k m\\AAk\\)-1
fc-i (37)
aft||C|| + fla||S|l).
Next, multiplying either side of (37), it follows that 1-r-"
|xfc|| < mr K ||x 0 |!
+
m2\\AAk\\\
(91 \\C\\+ 92 \\D\\
f rk - (r + m\\AAk\\)k l - [ l + (m||AA fc ||/r)]
'lxoli
AAk\\\ f l - [ l + (m||AA fc ||/r)]\fc-i J-, y l l - [ l + (m||A>m|/r)]-i/(r + m | | A ^ 'm2\\AAk\\\ / l - ( r + m||A J 4 fc ||)- fc x(9i\\C\\ + 92\\D\\) + 1 -r 1 - (r + m\\AAk\\ ro||A>ifc||)fc-1(Si||C||+S2p||),
x(r +
(38)
when k —* oo, if r + m||AAfc|| < 1, then the state estimator will be robustly stable. A2 Proof of Theorem 5 As A; —► oo, (38) becomes
IXfcl
^)<jnm+*m)+(=z^ Copyrighted Material
B. Chen and S. Peng /
192
x
ll g (iVr ^'C.)^ii ii+^ii) \1 - {r + m\\AA \\) V k
\ . l - ( r + m||AAfc||) _m(lgi\\C\\+g - ( r +2\\D\\) m||AA fc ||) l - ( r + m||AA fc ||) Hence, the bound of ||xfc|| is
V
m( g l |l(7||+ff 2 ||P||) l-(r+m||AAfc||) as k —► oo.
Bor-Sen Chen Department of Electrical Engineering National Tsing Hua University Hsinchu, Taiwan, R.O.C. Sen-Chueh Peng Department of Electrical Engineering National Yun-Lin Polytechnic Institute 64, Huw Ei Yun Lin, Taiwan, R.O.C.
Numerical Approximations and Other Structural Issues in P r a c t i c a l I m p l e m e n t a t i o n s o f K a l m a n F i l t e r i n g
T h o m a s H. Kerr
A b s t r a c t . Getting incorrect results at the output of a Kalman filter (KF) simulation or hardware implementation can be blamed on (1) use of faulty approximations in the implementation, or (2) on faulty coding/computer programming or (3) may actually be due to theoretical details of what should be implemented in the application being incorrectly specified by the analyst (especially since some errors still persist and propagate in the published literature that the analyst may have used as a starting point). Handling situations (1) and (3) will first be discussed here. Although situation (2) is initially impossible to distinguish from the effects of (1) and (3) for a new candidate KF software implementation, any residual problems present can be ferreted out by first eliminating (1) and (3) as possibilities for contamination and problems falling under situation (2) may be further isolated (for remedy) by using certain test problems of known analytic closed-form solution for software calibration/check-out in the manner discussed as my original unique approach to (IV&V) Independent Verification and Validation (completely compatible with DOD-STD-SDD/2167/2168A/973/499B/490B methodology) for Kalman filter code. The techniques espoused here are universal and independent of the constructs of particular computer languages and were honed from years of experience in cross-checking Kalman filter implementations (both my own and those of others) in several diverse commercial and military applications (and implementation languages).
§1 I n t r o d u c t i o n Over t h e past t h i r t y years, K a l m a n filters (KF) have been used in telephone line echo-cancelers, missiles, aircraft, ships, submarines, t a n k s t h a t shoot-on-the-run, air traffic control (ATC) radars, defense and targeting radars, Global Position System ( G P S ) sets, a n d other s t a n d a r d navigation equipment. In recent years, G P S / K a l m a n filter combinations in conjunction with laser disk-based digital m a p technology is being considered for use in future automobiles (as well as in ships Approximate Kalman Filtering Guanrong Chen (ed.), p p . 193-220. Copyright ©1993 by World Scientific Publshing Co. Inc All rights of reproduction in any form reserved ISBN 981-02-1359-X
193
194
T. Kerr
using displays rather than paper charts) to tell the driver/pilot where he is and how to get where he wants to be. Commercial products as well as military vehicles and platforms rely on Kalman filters. Computers are used to implement Kalman niters and to test out their performance (assess the speed of response and the accuracy of their estimates) beforehand in extensive simulations. There is evidently consider able commercial value in understanding Kalman filters both as a developer and as a potential user [30]. I seek to weave the thread of the story here but, due to space limitations, will defer to my references for more elaboration (which provide pointers to the further contributing precedents of others, thus serving as analytical stepping stones in the evolution). I will be uncharacteristically terse when discussing a topic here that I have previously published and already extensively discussed elsewhere in the open literature. The techniques espoused here were honed from years of experience in cross-checking Kalman filter implementations (both my own and those of others) in several diverse commercial and military applications from first hand knowledge, having worked directly with C-3 Poseidon submarines' Ships Inertial Navigation System (SINS) 7 state Con-B STAtistical Reset (STAR) filter, C-4 Trident sub marines' 14 state Electro-magnetically Supported Gyro Monitor (ESGM) Reset fil ter and 15 state SINS Correction filter, earlier vintage minesweeper 19 state PINS filter, 13 state Passive Tracking Algorithm (PTA) filter for sonobuoy target track ing, 15 and 18 state Singer-Kearfott and Hughes candidate Class B JTIDS filters (filter parameters such as INS gyro drift-rates, biases, and scale-factor errors are classified for military applications, otherwise standoff targeting and bombing accu racy and radio-silent rendezvous capability could be inferred; however, such gyro and accelerometer parameter information should be reported for clarity in civil ian applications according to new specification standards currently being revised by the IEEE AES Gyro and Accelerometer Panel), 12 state filter for Electronic Terrain Board analysis, 22 state Multi-Band multi-Frequency Airborne Radio System (MFBARS) filter predecessor to ICNIA for the Advanced Tactical Fighter, various GPS filters [12, Table III], angle-only tracking filters and other 6 state exoatmospheric and 7 state endoatmospheric Reentry Vehicle (RV) tracking filters for radar, etc. §2 Some numerical approximation issues that arise in Kalman filtering A Kalman filter (see Figure 1) is an efficient and convenient computational scheme for providing the optimal estimate of the system state and an associated measure of the goodness of that estimate (the variance or covariance). In order to imple ment a KF, the actual continuous-time system must be adequately characterized by a linear (or linearized) ordinary differential equation model, represented in state space at time t in terms of a vector x(£), and having associated initial conditions specified, and availing sensor output measurements v(t) (functions of the state plus additive measurement noise). It is mandatory that the KF itself actually contain within it an analytical mathematical model of the system and sensors in order to
Numerical Approximations of Kalman Filtering
195
perform its computations (designated as a model-based estimator), and it must possess a statistical characterization of the covariance intensity level of the additive white Gaussian measurement and process noises present as well to enable an implementation. Here, we should remark that the Central Limit Theorem is usually invoked from statistics [25, pp.238-240] to justify that a number of contributing minor effects can frequently sum up to a net effect that is Gaussian in distribution, even when the constituent components are not (i.i.d.) independent and identically distributed. Care is sometimes needed to be aware of when a necessary condition on the 3rd moments of the random variables contributing to the sum is in danger of being violated [22, pp.66-73] (as occurs with the bell-shaped Cauchy distribution) otherwise Gausianess is never attained and other approaches need to be used.
Figure 1. Overview functional block diagram of the internal structure of a Kalman filter.

A simplified overview of the principles of operation of a Kalman filter has been treated in [17, Sec. V, pp. 943-944] and [20, Sec. IA] and, from my perspective, is what constitutes the essence of a Kalman filter mechanization. References [3,9,5] all address important numerical approximation issues that sometimes arise in Kalman
filtering, such as numerical sensitivities and ill-conditioning, adverse effects due to use of quantized data, and the nature of the matrix inversion algorithm or approximation utilized within a real-time KF mechanization and its subsequent effect on convergence, respectively, each of which can corrupt KF performance and degrade its tracking accuracy in some instances. However, the approach taken here will be to instead consider the more prevalent first-order issues usually encountered in a reasonable (but typically clumsy) software implementation attempt, and to leave pointers to references to indicate where more detailed information may be found on the more sophisticated topics that arise less frequently but are important nevertheless.

2.1 Typical errors and/or bad approximations occurring in Kalman filter code of otherwise good quality

Listed below are several prevalent departures from the ideal involving use of expedient approximations and simplifying assumptions that one must be alert to avoid lest they taint or corrupt KF output results. Consider the following possible shortcomings in KF code (each having been previously observed in both government (DoD) and commercial KF packages and code implementations):

1. Some KF software/covariance analysis implementations don't use the exact discrete-time equivalent to continuous-time white Gaussian system noise (plant or process noise) $\xi(t)$ [24, p. 171, Eq. (4-127b)], represented by
$$Q_k = \int_{t_k}^{t_{k+1}} \Phi(t_{k+1},\tau)\, Q_c(\tau)\, \Phi^T(t_{k+1},\tau)\, d\tau \tag{1}$$
(where $\Delta = t_{k+1} - t_k$ and $\Phi(t,\tau)$ is the associated system transition matrix) as an operation on the continuous-time white process noise covariance intensity matrix, $Q$ (or $Q_c$), as in [20, Eq. (5)], where equation (1) simplifies for time-invariant systems (obtained by the steps depicted in [17, Sec. II]) to
$$Q_d = e^{A\Delta} \left[ \int_0^{\Delta} e^{-A\tau}\, BQB^T\, e^{-A^T\tau}\, d\tau \right] e^{A^T\Delta}\, \delta_{kj}\,, \tag{2}$$
where the above Kronecker delta is defined as
$$\delta_{kj} = \begin{cases} 1, & k = j\,;\\ 0, & \text{otherwise.} \end{cases} \tag{3}$$
An offending software code implementation (of the type being cautioned against) instead uses the following Kalman approximation:
$$Q_d' = \Delta\, Q\,, \tag{4}$$
which is an uncalibrated approximation [17,27]; for implementation, $Q_d'$ or $Q_d$ is usually further factored (via a Cholesky decomposition if necessary) as $Q_d = BB^T$, where here $B$ is used as the process noise gain matrix, according to the convention of [29], while the underlying white noise originally simulated is of unit variance. It is demonstrated in [17, following Eq. (40)] (also see the Nov. '91 update) just how bad the effect of this approximation of equation (4) can be, by the degree of error incurred using $Q_d'$ as compared to $Q_d$ via equations (1) or (2); a numerical sketch of this comparison follows this list. However, the approximation of equation (4) may still be satisfactory for some special application situations.

2. The act of taking the time step to be constant in any KF mechanization corresponds to external position fixes being obtained periodically (while actual KF theory is more flexible than this and real-world practicalities don't always strictly adhere to this periodic structure), so implementers frequently compensate by resorting to data time-tags and appropriate extrapolation to the desired time, or by measurement data averaging, as explained in [8, p. 291];

3. The transition matrix calculation for converting the continuous-time n-state model description to discrete-time historically adaptively tailors the number of terms retained in the Taylor series by using either too coarse a norm (see [29]) or an invalid norm [17, pp. 938-939]. A tighter bound for this purpose has been derived from considerations of both column-sum and row-sum norms in [11]; additionally, it is prudent to also set an upper limit on the total number of terms from the Taylor series expansion allowed to be used in calculating the transition matrix, so that the computation can't run away (otherwise it could incur numerous overflows due to the effect of accumulated roundoff);

4. The transition matrix used throughout the computer run is frequently calculated only once (such as in [29]), up front as a pre-processing step, then retained as being constant (while a variation more appropriate for many applications, but not possible with a simplistic software implementation, is to relinearize $a(x)$ (occurring within the ordinary differential equation $\dot{x} = a(x,\xi,t)$ describing the system) at each new time point as $A(t_k)$ and either recalculate the matrix exponential to provide the new transition matrix at each relinearization, or else just use the first few Taylor series terms of the matrix exponential, as $I + A(t_k)(t_{k+1} - t_k)$, to approximate the transition matrix at this new time point, an especially prevalent solution found in many real-time applications);

5. Fallacious versions of tests of matrix positive definiteness/semi-definiteness are prevalent in Kalman filter code and in target tracker code. Two faulty algorithms that have been encountered in actual use as reasonableness tests for covariance matrices are explicitly identified and warned against in [19, Secs. III and IV], with appropriate theoretical fixes suggested (based on use of SVD variants or Cholesky factorization).
Finally, alternative approaches that have evolved to justify the technique of equation (4) as the appropriate approximate discrete-time process or plant noise covariance intensity matrix $Q_d'$ to be used for computer simulations are traced in the Nov. '91 update to [17]. To clarify for actual applications, the usual rule-of-thumb (recommended by industry) to use for discrete-time white noise simulation is that the continuous-time white process noise $Q_{\rm continuous}$ has units of (numerator units)$^2$/(time units), so one needs to multiply throughout by $\Delta$ (with time units) to cancel the time units in the denominator units of $Q_{\rm continuous}$ and yield $Q_d'$, with exclusively (numerator units)$^2$, for discrete-time sampled-data white process noise. (Note that in equation (1), the exact expression for $Q_d$ rids itself of time units in the denominator via the indicated integration with respect to time.) This final value is then used in actually performing a discrete-time simulation performance evaluation on a computer as the appropriate approximate technique (verified here by a units check and by allowing a type of correspondence between the white noise $Q_{\rm continuous}$, associated with a Dirac delta function, and the white noise $Q_d'$, associated with the more benign, less pathological Kronecker delta function that, unlike the Dirac delta function, doesn't blow up or need special interpretation for rigor as a functional of bounded convex support using Schwartz's Theory of Distributions (1947)).

2.2 An approach for debugging linear Kalman filter software

The importance of this section is that subsequent software verifiers, when faced with validating newly coded or newly procured Monte-Carlo simulator subroutine software modules and Kalman filters of their own, can treat the entire exercise as one of confirming the proper performance behavior of the new modules merely as an exercise with black boxes. Time can then be saved by just confirming the outputs corresponding to the designated low-dimensional test cases of known closed-form solution provided herein and matching critical intermediate computational benchmarks (without having to necessarily further probe the internal theoretical intricacies that are already justified here, in [17], and in [15], where the veracity and utility of these test cases is established and explained in more detail); one can instead simply check the code, with helpful clues as to the real software culprits and bugs being revealed by these recommended tests when output results don't jibe. Thus, the software verification/validation job is simplified by using the results presented here, which can be used to pinpoint or isolate any problems that exist in the code. This entire exercise of using simple transparent test problems may be interpreted as an initial calibration of the available software before proceeding to use the parameters of the actual application. However, before the KF code can be validated as performing properly, or, in case of known errors, before the source can be pin-pointed, first the inputs to the KF must be validated as being exactly what was intended. To this end, we first turn our attention to validating the Monte-Carlo simulator, as addressed next.
2.2.1 Capabilities designed into the simulator

A state-variable based Monte-Carlo simulator, of the form depicted in Figure 2, was developed to support AR process emulation for testing the performance of multichannel Linear Prediction algorithms (of the Maximum Entropy type) for spectral estimation and also for testing the adequacy of KF trackers. This simulator possesses the following modern features:

• Incorporates the exact discrete-time equivalent of continuous-time white noise;
• Offers the option of using the more efficient direct calculation of steady-state initial conditions corresponding to stationary behavior of the underlying random process (without having to iterate to steady-state to avoid the nonstationary initial transient);
• Offers the option of having additive (stationary white Gaussian) output measurement noise present (thus creating a type of ARMA process);
• Isn't restricted to use of only diagonal covariances for noises or for initial condition covariances;
• If covariances are not diagonal, the program internally automatically checks to verify that the covariances possess the requisite "positive definiteness" property (via use of the Singular Value Decomposition (SVD) in the manner indicated in [19; 13 (p. 504); 18; 14 (Sec. III, p. 63)]); otherwise, diagonal covariances are merely verified not to have zeroes or negative numbers on the principal diagonal;
• Calculates the transition matrix by a more accurate Padé approximation (offering two validated options along these lines [17, Sec. III]) rather than through use of a Taylor series expansion for this purpose [17, Fig. 1];
• Can handle nonzero means for both noises and initial conditions;
• Outputs the final pseudo-random noise (PRN) generator seed value to enable continuity of use via allowable dovetailing of output sample functions if a further prolonged sample function history is subsequently pursued (which uses this PRN seed during subsequent start-up).

Figure 2. State-variable Markov-based Monte-Carlo simulator.

2.2.2 Verifying the simulator proper

The overall structure of the simulator is depicted in Fig. 2. Using the input parameters of Test Case 1, as depicted in Table 1, the intermediate outputs provided by the software implementation were verified to be correct. The specific features of the software implementation that were confirmed using Test Case 1 are detailed in the second column from the left in Table 2. The importance of using Test Case 1 and the aspects that it reveals are described next.

Table 1. Summary of parameters of test case models used in validation tests of primary software modules. [For each of Test Cases 1-4, the table lists the step size DEL ($\Delta$), the state and measurement dimensions NDIM and MDIM, the system matrix $A$, the transition matrix $\Phi$ (as entered or as calculated), the observation matrix $C$, the process noise covariance intensity matrix $Q$ (continuous- or discrete-time version), the measurement noise covariance intensity matrix $R$, the initial mean, and the initial covariance $P_0$.]

Certain matrices known as "idempotent" matrices have the unusual property that such a matrix, when multiplied times itself, again yields itself as the result:
$$A\,A = A\,. \tag{5}$$
The non-trivial system matrix of Test Case 1 exhibits this property. The present application in software verification is a neat application of idempotent matrices being used to construct test matrices for verifying the transition matrix algorithmic implementations that are used for computer computation of $e^{Ft}$. The utility of these test matrices is that the resulting analytically derived expression for $e^{Ft}$ is conveniently in closed form for $F = A$. Hence the performance of a general $e^{Ft}$ subroutine implementation can ultimately be gauged by how close it comes to achieving the known ideal exact solution. Using the representation of a matrix exponential, defined in terms of its Taylor series, but evaluated with an idempotent matrix $A$ having the property of equation (5) being substituted along with time-step $\Delta$, the matrix Taylor series expansion of $e^{A\Delta}$ now yields
$$e^{A\Delta} = \sum_{k=0}^{\infty} \frac{\Delta^k}{k!}\,A^k = I + \Delta A + \frac{\Delta^2}{2!}A^2 + \frac{\Delta^3}{3!}A^3 + \cdots = I + A\left(\Delta + \frac{\Delta^2}{2!} + \frac{\Delta^3}{3!} + \cdots\right) = I + A\,(e^{\Delta} - 1)\,, \tag{6}$$
as explained in [17, Sec. IV]. Thus, the closed-form exact expression for the transition matrix corresponding to idempotent system matrices is as depicted in the last line of equation (6) as a finite two-step operation involving just a scalar multiplication of a matrix and a single matrix addition (as compared to an infinite series that must be truncated in the case of standard software implementations for the case of more general matrices). Using the result of equation (6) for idempotent matrices within the more general expression of equation (2) allows this expression for the required discrete-time process noise covariance to be evaluated analytically in closed form as
$$\begin{aligned}
Q_d &= [I + A(e^{\Delta}-1)] \int_0^{\Delta} [I + A(e^{-\tau}-1)]\, BQB^T\, [I + A^T(e^{-\tau}-1)]\, d\tau\; [I + A^T(e^{\Delta}-1)]\\
&= [I + A(e^{\Delta}-1)] \int_0^{\Delta} \left[ BQB^T + (ABQB^T + BQB^T A^T)(e^{-\tau}-1) + ABQB^T A^T (e^{-2\tau} - 2e^{-\tau} + 1) \right] d\tau\; [I + A^T(e^{\Delta}-1)]\\
&= [I + A(e^{\Delta}-1)] \left[ BQB^T \Delta + (ABQB^T + BQB^T A^T)(1 - e^{-\Delta} - \Delta) + ABQB^T A^T \left( \Delta + 2e^{-\Delta} - \tfrac{1}{2}e^{-2\Delta} - \tfrac{3}{2} \right) \right] [I + A^T(e^{\Delta}-1)]\,. 
\end{aligned} \tag{7}$$
This is a new result that is also useful as a confirming check for software implementations of equation (2). Here, we remark that, along a different line, something similar to equation (2) can be computed for numerically evaluating $Q_d$ for any constant matrix $A$, not just for idempotent matrices, by (1) expanding $e^{-A\tau}$ into its matrix Taylor series, (2) performing the indicated multiplications of the two series within the integrand, (3) subsequently performing term-by-term integration, and then (4) retaining enough terms of the final series to provide sufficient accuracy in actual numerical calculations.
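As a quick machine check of equations (5)-(6) (again a sketch that is not part of the original text), one can compare a general-purpose matrix exponential routine against the two-step closed form, using an idempotent matrix of the rank-one $\pm 1/3$ pattern shown for Test Case 1 in Table 1 (the exact matrix below is an assumption) together with the Table 1 step size $\Delta = 0.105$:

```python
import numpy as np
from scipy.linalg import expm

# A rank-one idempotent matrix (A @ A == A) of the +/-1/3 pattern of Table 1.
v = np.array([[1.0], [-1.0], [1.0]])
A = (v @ v.T) / 3.0
assert np.allclose(A @ A, A)

dt = 0.105                                   # Test Case 1 step size, Table 1
lhs = expm(A * dt)                           # general truncated-series/Pade routine
rhs = np.eye(3) + A * (np.exp(dt) - 1.0)     # exact closed form, equation (6)
print("max |expm(A*dt) - closed form|:", np.abs(lhs - rhs).max())
```

Any appreciable deviation here indicts the general-purpose $e^{Ft}$ subroutine, exactly as the text argues.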
Using the parameters of Test Case 2, as depicted in Table 1, the intermediate outputs provided by a software implementation can likewise be verified to be correct. The specific features of a software implementation that can be confirmed using Test Case 2 are detailed in the third column from the left in Table 2. Test Case 2 has an easy-to-determine closed-form expression for the transition matrix, for $Q_d$, for the steady-state Lyapunov equation, and for the ideal output power spectrum [15, Appendix A].

Using the parameters of Test Case 3, as depicted in Table 1 (with closed-form expressions for the solution being a straight line with slope 4 and intercept 10, and with such inconsequentially low magnitudes of the noises), the intermediate outputs provided by a software implementation can be verified to be correct. The specific features of the software implementation that can be confirmed using Test Case 3 are detailed in the fourth column from the left in Table 2. The actual extremely regular, essentially deterministic sample functions obtained for the underlying known unstable system can conveniently be used to check at a high level that the output is exactly correct. Besides confirming the outputs of the simulator with an easily recognizable expected answer (as contrasted to Test Cases 1, 2, and 4, which provide random noise corrupted sample functions that can be confirmed at the aggregate level only from statistical properties that are a byproduct of downstream KF tracking or spectral estimation), this Test Case 3 also allows programmers to calibrate (and correct) their plot routines and their scale conversion for output plots, if necessary.

Using the parameters of Test Case 4, as depicted in Table 1, the intermediate outputs provided by a software implementation can be verified to be correct. The specific features of a software implementation that can be confirmed using Test Case 4 are detailed in the fifth column from the left in Table 2. The main purpose of this last test case is to be able to handle the situation of providing prescribed multi-input/multi-output (MIMO) complex random process output with specified cross-correlation between output channels. This was needed in [15] in verifying the performance of alternative Maximum Entropy spectral estimators downstream of the simulator (operating on its outputs), which, like a KF, deal only with first and second order statistics. The correct answer for 2-channel spectral estimation should appear as in Figure 3. Certain modern tracking radars use coherent phase processing, also known as coherent integration (where both magnitude and phase are accounted for in the summation of signal returns, but where the distinction arises of having to keep track of real and imaginary components, instead of merely needing to keep track of magnitude alone, as conventional radars do), which jointly treats Primary Polarization (PP) returns in conjunction with Orthogonal Polarization (OP) returns and utilizes the additional target information provided from the cross-correlation of these two separate channels.
Figure 3. True spectrum for the two channel complex Case 4.
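Both Test Case 2's closed-form steady-state Lyapunov solution and the simulator's direct stationary initialization option of Section 2.2.1 reduce to solving a discrete Lyapunov equation. The following sketch (not part of the original text; the parameters are illustrative assumptions, not the Table 1 values) solves that equation directly and confirms it against brute-force iteration:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative stable discrete-time model (assumed, not the Table 1 values).
Phi = np.array([[0.9, 0.1],
                [0.0, 0.8]])
Qd = np.array([[0.02, 0.0],
               [0.0, 0.05]])

# The stationary covariance P0 satisfies P0 = Phi P0 Phi^T + Qd, so drawing
# x0 ~ N(0, P0) starts sample functions in steady state with no transient.
P0 = solve_discrete_lyapunov(Phi, Qd)

# Cross-check by iterating the covariance recursion to convergence.
P = np.zeros_like(P0)
for _ in range(500):
    P = Phi @ P @ Phi.T + Qd
print("max |P0 - P_iterated|:", np.abs(P0 - P).max())
```

The direct solve is the "more efficient" path the feature list refers to: one linear-algebra call instead of many warm-up iterations.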
Restating for emphasis: the modern simulator design discussed in this section was pursued so that only a fairly exact mechanization would be used, so that the input to the KF is precisely known. This was sought as a reliable testbed that avoids use of uncalibrated approximations, in order to avoid confusing artifacts of simulator approximations with possible cross-channel feedthrough (that multichannel spectral estimation implementations are also known to frequently exhibit as a weakness or vulnerability) and which can adversely affect KF testing as well, for the same reasons of uncalibrated cross-correlations being present.

2.2.3 Confirmation of software structural validity using augmented test cases of known closed-form solution

A difficulty, as discussed in [16, Sec. I], is that most closed-form KF covariance solutions are of either dimension 1 or 2 (as in [8, pp. 138-142, pp. 243-244, p. 246, pp. 255-257, pp. 318-320]) or 3 (as in [26]). To circumvent this dimensional mismatch to higher dimensional real applications that may be hard-wired, we can achieve the dimension-n goal by augmenting matrices and vectors with a concatenation of several existing test problems. Use of only totally diagonal decoupled test problems is notorious for being too benign or lenient and not taxing enough to uncover software implementation defects (when the problems exist in the portion of the code that handles cross-term effects). Augmenting either several low-dimensional 2-state problems or fewer 3-state problems is the way to proceed in order to easily obtain a general n-state non-trivial non-diagonal test problem. A confirmation that this proposed augmentation is valid in general is provided next for a closed-form steady-state radar target tracking solution that was successfully used as a check on the software implementation of [29].

An initial worry in adjoining the same 3-state problem with itself relates to whether "controllability and observability" are destroyed, while the 3-state problem by itself does possess the requisite "controllability and observability." "Controllability and observability" conditions, or at least the more relaxed but similar "stabilizability and detectability" conditions [21, pp. 62-64, pp. 76-78, pp. 462-465], need to be satisfied in order that the covariance of a KF be well-behaved [8 (p. 70, p. 142), 21]. The following mathematical manipulations establish that such an adjoining of two 3-state test problems does not destroy the "controllability and observability" of the resulting 6-state test problem, given that it already exists for the 3-state test problem by itself.

First consider the 3-state test problem of [26] of the following form:
$$x_{(3\times 1)} = \begin{bmatrix} \text{position}\\ \text{velocity}\\ \text{acceleration} \end{bmatrix}, \quad \text{with} \quad \dot{x} = A_1 x + B_1\xi\,, \quad v = C_1 x + \eta\,, \quad \xi \sim \mathcal{N}(0,Q_1)\,,\ \eta \sim \mathcal{N}(0,R_1)\,, \tag{8}$$
and assumed to be already satisfying Kalman's "controllability and observability" rank test criteria [8, p. 70], respectively, as
$$\operatorname{rank}\left[\, B_1 \;\vdots\; A_1 B_1 \;\vdots\; A_1^2 B_1 \,\right] = m = 3\,, \tag{9}$$
$$\operatorname{rank}\left[\, C_1^T \;\vdots\; A_1^T C_1^T \;\vdots\; (A_1^T)^2 C_1^T \,\right] = m = 3\,. \tag{10}$$
Now the augmented system of the form
$$x = \begin{bmatrix} \text{position}\\ \text{velocity}\\ \text{acceleration}\\ \text{position}\\ \text{velocity}\\ \text{acceleration} \end{bmatrix}, \tag{11}$$
with
$$\dot{x} = \begin{bmatrix} A_1 & 0\\ 0 & A_1 \end{bmatrix} x + \begin{bmatrix} B_1 & 0\\ 0 & B_1 \end{bmatrix} \begin{bmatrix} \xi_1\\ \xi_2 \end{bmatrix}, \tag{12}$$
$$v = \begin{bmatrix} C_1 & 0\\ 0 & C_1 \end{bmatrix} x + \begin{bmatrix} \eta_1\\ \eta_2 \end{bmatrix}, \tag{13}$$
has system, process noise gain, and observation matrices, respectively, of the form
$$A_2 = \begin{bmatrix} A_1 & 0\\ 0 & A_1 \end{bmatrix}, \tag{14}$$
$$B_2 = \begin{bmatrix} B_1 & 0\\ 0 & B_1 \end{bmatrix}, \tag{15}$$
$$C_2 = \begin{bmatrix} C_1 & 0\\ 0 & C_1 \end{bmatrix}. \tag{16}$$
In testing for controllability of this augmented system, form
$$\begin{aligned}
\operatorname{rank}\left[\, B_2 \;\vdots\; A_2 B_2 \;\vdots\; A_2^2 B_2 \;\vdots\; \cdots \,\right]
&= \operatorname{rank} \begin{bmatrix} B_1 & 0 & A_1 B_1 & 0 & A_1^2 B_1 & 0 & \text{other}\\ 0 & B_1 & 0 & A_1 B_1 & 0 & A_1^2 B_1 & \text{stuff} \end{bmatrix}\\
&= \operatorname{rank} \begin{bmatrix} B_1 & A_1 B_1 & A_1^2 B_1 & 0 & 0 & 0 & \text{other}\\ 0 & 0 & 0 & B_1 & A_1 B_1 & A_1^2 B_1 & \text{stuff} \end{bmatrix}\\
&= 3 + 3 = 6\,. 
\end{aligned} \tag{17}$$
In the next-to-last line of equation (17), the columns of the Controllability Grammian are rearranged for convenience to provide the necessary insight. Permuting columns of a matrix doesn't alter its rank but can alter at-a-glance conclusions. Since we are able to show that the augmented system rank is 6, this system is confirmed to be controllable. A similar conclusion (on the requisite observability being satisfied) can be obtained by identical steps using the duality that exists between controllability and observability results and the associated forms of arguments or proofs when similar matrix structures, such as are present here, are involved. The above-described augmented system of equations (12) and (13) can be used with
$$R_2 = \begin{bmatrix} R_1 & 0\\ 0 & R_1 \end{bmatrix}, \tag{18}$$
$$Q_2 = \begin{bmatrix} Q_1 & 0\\ 0 & Q_1 \end{bmatrix}, \tag{19}$$
$$P_2(0) = \begin{bmatrix} P_1(0) & 0\\ 0 & P_1(0) \end{bmatrix}, \tag{20}$$
since now the augmented system has been demonstrated above to be both "observable and controllable" and the measurement noise covariance $R_2$ of equation (18) to be utilized is positive definite. This final observation allows us to use this 6-state augmented test problem with confidence to check out the software implementation as it is currently configured, without making any further changes to the software.
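The rank bookkeeping of equation (17) is easy to confirm numerically. The sketch below (not from the original text) assumes the usual continuous-time position/velocity/acceleration chain for $A_1$, with noise entering the acceleration channel, and checks that block-diagonal augmentation preserves full controllability rank:

```python
import numpy as np

def ctrb(A, B):
    """Controllability matrix [B : AB : A^2 B : ...] with n block-columns."""
    n = A.shape[0]
    blocks, M = [B], B
    for _ in range(n - 1):
        M = A @ M
        blocks.append(M)
    return np.hstack(blocks)

# Assumed continuous-time 3-state chain: position, velocity, acceleration,
# with process noise driving the acceleration state.
A1 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0]])
B1 = np.array([[0.0], [0.0], [1.0]])

# Block-diagonal augmentation per equations (14)-(15).
Z3, Z1 = np.zeros((3, 3)), np.zeros((3, 1))
A2 = np.block([[A1, Z3], [Z3, A1]])
B2 = np.block([[B1, Z1], [Z1, B1]])

print("3-state controllability rank:", np.linalg.matrix_rank(ctrb(A1, B1)))  # 3
print("6-state controllability rank:", np.linalg.matrix_rank(ctrb(A2, B2)))  # 6
```

The observability check is the dual computation with $(A^T, C^T)$ substituted for $(A, B)$, exactly as the text remarks.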
2.2.4 Summary of test coverage analytically provided here
An overview of the complete software test coverage offered here, through selective use of analytic closed-form Test Cases of known solution, is provided in Table 2. The utility of this coverage was discussed in Sec. 2.2.2. All items indicated in Table 2 must be successfully validated.

By using these or similar examples, certain qualitative and quantitative aspects of the software implementation can be checked for conformance to anticipated behavior as an intermediate benchmark, prior to modular replacement with the various higher-order matrices appropriate to the particular application. This procedure is less expensive in CPU time expenditure during the software debug and checkout phase than using the generally higher n-dimensional matrices of the intended application, since the computational burden is generally at least a cubic polynomial in n during the required solution of a Matrix Riccati equation for the associated covariances (also needed to specify the Kalman gain at each time-step). The main contribution of these Test Cases is that one now knows what the answers should be beforehand and is alarmed if resolution is not immediately forthcoming from the software under test. Warning: correct answers could be "hardwired" within candidate software under test, but appropriate scaling of the original test problems to be used as inputs can foil this possible stratagem of such an unscrupulous supplier/developer.

The benefits of using these recommended or similarly justified test cases are the reduced computational expense incurred during software debug by using such low-dimensional test cases, and the insight gained into software performance as gauged against test problems of known solution behavior. However, a modular software design has to be adopted in order to accommodate this approach, so that upon completion of successful verification of the objective computer program implementation with these low-dimensional test problems, the matrices corresponding to the actual application can be conveniently inserted as replacements without perturbing the basic software structure and interactions between subroutines. Even time-critical, real-time applications can be validated in this manner, even when using matrix dimensions that are "hardwired" to the particular application, by tailoring to the specified dimension using the technique of Section 2.2.3.

2.2.5 Specifying a Kalman filter covariance test problem

The particular parameter values to be used for $\Phi_1$, $B_1$, and $C_1$ in equations (12) and (13) (as laid out following equations (2) and (3) of [26]) are
$$\Phi_1 = \begin{bmatrix} 1 & T & T^2/2\\ 0 & 1 & T\\ 0 & 0 & 1 \end{bmatrix} \quad (\text{where } \Phi_1 = e^{A_1 T})\,, \tag{21}$$
Table 2. Simulator testability coverage matrix. [The table records, with a check mark per column, which functions each of Test Cases 1-4 exercises. The functions covered are: transition matrix computation via Padé (Ward's algorithm); transition matrix computation via Padé (Kleinman's algorithm); $Q_d$ computation as the discrete-time equivalent of continuous-time white noise; steady-state computation of the initial condition mean; steady-state computation of the initial condition covariance (Lyapunov equation solution); verification of the SVD-based positive definiteness test for nondiagonal matrices; verification of the abbreviated positive definiteness test for diagonal matrices; checked process noise calculations as output from the random number generator; checked measurement noise calculations as output from the random number generator; checked recursive calculation of all constituent components of the entire random process over several iterations; checked proper handling of the PRN seed; verification of stable sample functions indicative of a stationary process; verification of unstable sample functions indicative of a nonstationary process; obvious aggregate high-level at-a-glance confirmation from output that all functions work properly in concert; confirmation of identical results when the complex version of the software is enabled; and eventual confirmation of proper sample function statistics from downstream spectral estimation software module outputs.]
$$B_1 = \begin{bmatrix} T^3/6\\ T^2/2\\ T \end{bmatrix} \tag{22}$$
and
$$C_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}, \tag{23}$$
with
$$Q_1 = \sigma_a^2 \tag{24}$$
and
$$R_1 = \sigma_x^2\,, \tag{25}$$
where, in the above, $T$ is the fixed time-step and $\sigma_a$ and $\sigma_x$ are parametrization conventions used in [26]. Tractably determining the particular values of $T$, $\sigma_a$, and $\sigma_x$ to be used here is the contribution of the Appendix section, which provides the steady-state covariance via the parameterized methodology of [26]. The computed steady-state KF covariances immediately before and after a noisy position measurement update are, respectively, as provided in equations (66) (resp. (67)) and (74) (resp. (75)).

Notice where the KF residuals occur in [17, Fig. 4]. In verifying and debugging an actual KF software implementation, these residuals are monitored and used as a gauge-of-goodness, and they indicate good tracking performance when they become "small," the idea being that the measurements $v_k$ match the model representation $C_k \hat{x}_{k|k-1}$ fairly closely when the residuals are "small." However, since residuals are never identically zero, the question is "how small is small enough?" (See [1] for an appealing explicit statistical test on the residuals using Chi-square statistics with appropriately specified degrees-of-freedom for an assortment of likely test conditions that can occur.) Residuals (sometimes called innovations) will almost always initially decrease as the initial transient settles out. "Small residuals" are necessary but not sufficient indicators of good KF performance, and similar statements can be made for having statistically white residuals (for instance, see [4,23], which offer an example of a KF exhibiting white residuals despite known use of an incorrect system model but which also incurs an anomalous bias as the clue that something is wrong). When possible, as with simulations, one should juxtapose the time evolution of any critical system states alongside their KF estimates to see how closely the estimates are following the actual quantities of interest, as a more encompassing gauge of proper KF performance. The only problem sometimes encountered in certain sensitive applications is that actual estimates may be classified while residuals may be unclassified, in which case attention centers on the residuals in unclassified presentations as a default in justifying good filter performance. For actual real system data, the true system state uncontaminated by measurement noise is seldom available for confirming comparisons of proximity, so use of residuals must suffice in this situation also.
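Reference [1] supplies the rigorous Chi-square machinery; purely as an illustrative sketch (not the specific test of [1], and with all names here invented for the example), the commonly used normalized-innovations-squared check looks like this:

```python
import numpy as np
from scipy.stats import chi2

def nis_check(innovations, innovation_covs, alpha=0.05):
    """Each normalized innovations squared value nu^T S^{-1} nu should be
    chi-square with m = dim(nu) degrees of freedom for a consistent filter;
    roughly a fraction 1 - alpha of them should fall inside the bounds."""
    m = innovations[0].shape[0]
    lo, hi = chi2.ppf(alpha / 2.0, m), chi2.ppf(1.0 - alpha / 2.0, m)
    nis = [float(nu @ np.linalg.solve(S, nu))
           for nu, S in zip(innovations, innovation_covs)]
    return np.mean([(lo <= q <= hi) for q in nis])

# Synthetic demonstration: residuals drawn consistently with S = I.
rng = np.random.default_rng(1)
S = np.eye(2)
residuals = [rng.standard_normal(2) for _ in range(200)]
print("fraction inside 95% chi-square bounds:",
      nis_check(residuals, [S] * 200))
```

As the text cautions, passing such a check is necessary but not sufficient: a biased filter with white residuals can still slip through, so state-versus-estimate comparisons remain the more encompassing gauge when they are available.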
Besides being used numerous times in the past to validate DoD KF software implementations, the techniques espoused in Sec. 2.2 were used to test the commercial KF code of [29] for the PC. This commercial package initially satisfied all test cases invoked (Test Case 4 was skipped, as not being applicable, since this code wasn't designed to handle the situation involving complex number computations). Proceeding to use the augmentation of Section 2.2.3, this commercial KF bombed when attempting to run applications with state sizes greater than 10. The source of the problem was traced to a software bug that was revealed to be in the Monte-Carlo simulator AVBSIM.BAS, one of the 13 modular subroutines. Proper declaration of the index on the initial condition $x_0$ had been overlooked by the supplier, so the software (in BASIC) had defaulted to 10. Note that, by later using different designators to correct another oversight, this initial condition, as used in initializing the simulator, had to be distinguished from its use in initializing the filter proper, AVBFILTR.BAS. This oversight was easily corrected in the source code, and the package then successfully handled system models larger than 10. The original code was also enhanced to (1) read $P_0$ from a file, instead of requiring excruciating hand entry from a keyboard at run time, and (2) output results to a file, so that a more capable plot package, such as EASYPLOT™, could be used for final display.

2.3 An approach for debugging nonlinear filter software implementation

An exact finite-dimensional optimal nonlinear filtering test case of the type discovered by Benes [2] and extended by Daum [6] (with a recent rigorous update in [28]) may suffice for IV&V in the same manner as Section 2.2 by providing collaborative comparison of outputs to verify performance of a general EKF implementation (instantiated with the same test case) if both implementations agree (sufficiently) for this simple test. This proposed manner of use for EKF software verification would be in keeping with the overall software test philosophy being espoused here. We remark that another nonlinear filtering example with a finite-dimensional implementation, not covered within the situations addressed in [2] and [6], is for the scalar system $\dot{x} = f(x) + \xi(t)$, with a suitable drift $f(x)$, where $E[\xi(t)\xi(s)] = q\,\delta(t-s)$. The verifiable asymptotic solution, in the limit as $t$ goes to infinity, of the associated Fokker-Planck or forward Chapman-Kolmogorov equation (defined in [10, pp. 126-130]) is $p(x,t\,|\,z,s) = c\,e^{\frac{2}{q}\int^{x} f(u)\,du}$, where $c$ is the normalization constant for this pdf (a numerical cross-check of this stationary form follows Table 3).

Actual experience in developing an EKF for angle-only Reentry Vehicle (RV) tracking via jammed (range-denied) radar using triangulation (RVTRIANG), as modified from an earlier EKF for tracking RVs via unjammed radar, convinced me that such goals are best carried out in specific well-thought-out stages. For example: first for constant gravity, then for inverse-squared gravity; first for a nonrotating earth, then for a rotating earth. More detail on this aspect is provided in [20, footnotes 5 and 8]. A representative plan for EKF development that I adhered to for this endeavor is depicted in Table 3.
Table 3. Stages and confirmation tests for phased EKF development.

Stage 1. Software utilized: RVTRIANG. Covariance analysis only; no filter portion; linearization about the true trajectory; straight-line RV trajectory generator. Purpose: establish a benchmark for later comparisons.

Stage 2. Software utilized: modified RVTRIANG (copy 1). Introduce the EKF covariance mechanization and the EKF filter portion (except linearized about true states, so not a true EKF yet); print/plot states. Purpose: see whether covariances are similar to the above case and whether filter estimates follow the true trajectory (maybe with lag).

Stage 3. Software utilized: modified RVTRIANG (copy 2). Introduce linearization about estimates in the covariance portion (so now a true EKF); otherwise same as above. Purpose: see how closely EKF estimates follow the true trajectory.

Stage 4. Software utilized: modified RVTRIANG (copy 3). Introduce linearization about estimates in the filter portion; introduce nonlinear equations for a conic RV trajectory. Purpose: see how closely EKF estimates follow the true trajectory.

Stage 5. Software utilized: modified RVTRIANG (copy 4). Introduce relinearization; otherwise same as above. Purpose: see the improvement in how closely EKF estimates follow the true trajectory.

Stage 6. Modify the EXEC to conform to the goal of 3 main modules, keeping the trajectory generator separate; otherwise same as above. Purpose: see that answers and EKF outputs are the same as in the above case.
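Returning briefly to the scalar nonlinear example of Section 2.3: the claimed limiting density $p = c\,e^{\frac{2}{q}\int^x f}$ can be sanity-checked numerically, since in one dimension stationarity of the Fokker-Planck equation is equivalent to a vanishing probability flux $f\,p - \frac{q}{2}\,p'$. The drift below is an assumed illustrative choice, not the specific one used in the text:

```python
import numpy as np

q = 2.0                        # noise intensity: E[xi(t) xi(s)] = q delta(t-s)

def f(x):
    return -x**3               # assumed illustrative drift (not Kerr's choice)

x = np.linspace(-4.0, 4.0, 4001)
F = np.cumsum(f(x)) * (x[1] - x[0])     # numerical antiderivative of f
p = np.exp((2.0 / q) * F)
p /= np.trapz(p, x)                     # fixes the normalization constant c

# Stationary 1-D Fokker-Planck is equivalent to zero probability flux.
flux = f(x) * p - 0.5 * q * np.gradient(p, x)
print("max |flux| (should be near 0):", np.abs(flux).max())
```

The same zero-flux identity is what makes the asymptotic density "verifiable" in the sense used above: any finite-dimensional filter code claiming this limit can be checked against it directly.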
§3 Appendix: A closed-form analytic solution useful for testing Kalman filter covariance calculation

The steady-state covariance solution before and after a measurement update (for periodic measurement usage) corresponds to solving the following two familiar KF mechanization equations for $\bar{P}$ and $P$, respectively:
$$\bar{P} - BQB^T = \Phi\,(I - GC)\,\bar{P}\,\Phi^T\,,$$
$$P = (I - GC)\,\bar{P}\,.$$
While it would be desirable to just pluck the steady-state solution from [26], it turns out to not be quite that simple and easy. As laid out in [26], after laboring through a lot of algebra and parameter scaling, we have to solve a biquartic equation [26, Eq. (A9)] as an intermediate calculation. In order to make this challenge somewhat easier, instead of first specifying $\sigma_a$ and $\sigma_x$ in equations (24) and (25) beforehand, the new contribution provided here is to use a trick of convenience by finding the value that makes the following biquartic easy to solve for $S$:
$$S^4 - 6S^3 + 10S^2 - 6(1+2r^2)S + (1+3r^2) = 0\,, \tag{26}$$
where we recall that, while quadratic equations are easy to solve, general cubics and quartics/biquartics are extremely challenging and messy in general. The trick is to force a convenient answer, say
$$S = 6\,, \tag{27}$$
to be a solution of equation (26) [26, Eq. (A9)] by choosing the value of $r$ (appearing in equation (26)) for convenience. This proper value of $r$ can be selected by first dividing equation (26) by $(S-6)$, which yields the identity
$$S^4 - 6S^3 + 10S^2 - 6(1+2r^2)S + (1+3r^2) = (S-6)\left[\, S^3 + 10S + (54 - 12r^2) \,\right] + \left[\, 1 + 3r^2 + 324 - 72r^2 \,\right]. \tag{28}$$
So $S = 6$ is a root of equation (26) if the remainder in the above is zero, as
$$325 - 69r^2 = 0 \tag{29}$$
or
$$r = \sqrt{\frac{325}{69}} = 2.17028\,. \tag{30}$$
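A one-line numerical confirmation of equations (26)-(30) (a sketch that is not part of the original text):

```python
import numpy as np

r2 = 325.0 / 69.0   # the value forced by equations (29)-(30)
# Coefficients of equation (26): S^4 - 6 S^3 + 10 S^2 - 6(1+2r^2) S + (1+3r^2).
coeffs = [1.0, -6.0, 10.0, -6.0 * (1.0 + 2.0 * r2), 1.0 + 3.0 * r2]
print("p(6) =", np.polyval(coeffs, 6.0))   # 0: S = 6 is indeed a root
print("all roots:", np.roots(coeffs))
```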
Here, we remark that arbitrary solutions of equation (26) can't be forced (as in seeking to make $S = 2$ be a solution) because the remainder term would then correspond to an "imaginary" value for $r$, which needs to be a real variable to be viable in this application. Now from the equation following equation (15) in [26], we have that
$$r \triangleq \frac{12\,\sigma_x}{\sigma_a T^3}\,. \tag{31}$$
From equation (30) above,
$$2.17028 = r = \frac{12\,\sigma_x}{\sigma_a T^3}\,; \tag{32}$$
we can now take
$$T = 2 \tag{33}$$
and
$$\sigma_x = 4\,, \tag{34}$$
so that rearranging equation (32) with these two assignments of equations (33) and (34) yields
$$\sigma_a = \frac{12\,(4)}{2.17028\,(8)} = \frac{6}{2.17028} = 2.764620 \tag{35}$$
or, referring back to equations (24) and (25), yields the following two specifications
$$Q_1 = \sigma_a^2 = 7.643125 \tag{36}$$
and
$$R_1 = \sigma_x^2 = (4)^2 = 16 \tag{37}$$
that are necessary to be pinned down for a well-posed KF. According to [26, prior to Fig. 1], the dimensionless quantity $r$ defined in equation (31) can be interpreted as a type of noise-to-signal ratio. Now, following equation (17) of [26],
$$S_1 = \sqrt{S^2 + r^2} = \sqrt{36 + \tfrac{325}{69}} = 6.38045\,, \tag{38}$$
$$S_2 = \sqrt{4S - 1} = \sqrt{24 - 1} = \sqrt{23} = 4.7958\,, \tag{39}$$
$$A = \sqrt{3} = 1.73205\,, \tag{40}$$
$$D = 2.17028\,(97.8949) = 212.46\,, \tag{41}$$
$$C = -179.614\,, \tag{42}$$
$$U = (D - C)^{1/3} = (212.46 + 179.614)^{1/3} = 7.3190\,, \tag{43}$$
$$V = (D + C)^{1/3} = (212.46 - 179.614)^{1/3} = 3.2025\,, \tag{44}$$
$$Z = U - V + \tfrac{5}{3} = 7.3190 - 3.2025 + 1.66666 = 4.1165 + 1.66666 = 5.78317\,, \tag{45}$$
$$b = Z - \sqrt{Z^2 + 3r^2 - 1} = 5.8 - \sqrt{(5.8)^2 + 3\left(\tfrac{325}{69}\right) - 1} = 5.8 - \sqrt{46.73} = 5.8 - 6.8360 = -1.0361\,, \tag{46}$$
$$a = 3 + \sqrt{3Z - 1} = 3 + \sqrt{3\,(5.8) - 1} = 3 + 4.04969 = 7.04969\,. \tag{47}$$
From equation (18) of [26],
$$\bar{Y}_{13} = \frac{S_1 - S}{r} = \frac{6.38045 - 6}{2.17028} = 0.17529\,, \tag{48}$$
$$\bar{Y}_{11} = \frac{12\,\bar{Y}_{13}}{r} = \frac{12\,(0.17529)}{2.17028} = 0.969269\,, \tag{49}$$
$$\bar{Y}_{12} = \frac{S_2\,\bar{Y}_{13}}{\sqrt{3}} = \frac{4.7958\,(0.17529)}{\sqrt{3}} = 0.485352\,, \tag{50}$$
$$\bar{Y}_{22} = \frac{S_2\,[\,3 + 2S - \sqrt{3}\,S_2\,]}{2\sqrt{3}} - S_1 = 9.266584 - 6.38045 = 2.8861\,, \tag{51}$$
$$\bar{Y}_{33} = \frac{S_2 - \sqrt{3}}{2\sqrt{3}} = 0.88443\,, \tag{52}$$
$$\bar{Y}_{23} = 0.782216\,. \tag{53}$$
We remark that the expression used here in equation (53) is my correction, which has been gracefully acknowledged as being correct by Ramachandra in personal correspondence. Similarly from [26, Eq.(17)], we have that
$$Y_{13} = \frac{S_1 + S}{r} = \frac{6.38045 + 6}{2.17028} = 5.704522\,, \tag{54}$$
$$Y_{11} = \frac{12\,Y_{13}}{r} = \frac{12\,(5.704522)}{2.17028} = 31.541577\,, \tag{55}$$
$$Y_{12} = \frac{S_2\,Y_{13}}{\sqrt{3}} = \frac{4.7958\,(5.7045)}{\sqrt{3}} = 15.79500\,, \tag{56}$$
$$Y_{22} = \frac{S_2\,[\,3 + 2S + \sqrt{3}\,S_2\,]}{2\sqrt{3}} - S_1 = \frac{4.7958\,[\,3 + 12 + \sqrt{3}\,(4.7958)\,]}{2\sqrt{3}} - 6.38045 = 25.885822\,, \tag{57}$$
$$Y_{33} = \frac{S_2 + \sqrt{3}}{2\sqrt{3}} = \frac{4.7958 + \sqrt{3}}{2\sqrt{3}} = 1.884428\,, \tag{58}$$
$$Y_{23} = 3.551069\,. \tag{59}$$
We can now specify the steady-state KF covariances $\bar{P}$ and $P$, respectively, since we have already evaluated all the $Y_{ij}$'s and $\bar{Y}_{ij}$'s that are the necessary intermediate calculations. Using the scalings in [26, Eq. (16)] to unravel the $\bar{P}$'s from the $Y$'s ([26, Eqs. (48-53)]) yields
$$\bar{P}_{11} = \sigma_x^2\,Y_{11} = 16\,(31.541577) = 504.6652\,, \tag{60}$$
$$\bar{P}_{12} = \frac{\sigma_x \sigma_a T^2}{2}\,Y_{12} = \frac{16}{2}\,(2.764620)\,(15.79500) = 349.3374\,, \tag{61}$$
$$\bar{P}_{13} = \sigma_x \sigma_a T\,Y_{13} = 4\,(2.764620)\,2\,(5.704522) = 126.1666\,, \tag{62}$$
$$\bar{P}_{22} = \frac{\sigma_a^2 T^4}{12}\,Y_{22} = \frac{(2.764620)^2\,16}{12}\,(25.885822) = 263.79805\,, \tag{63}$$
$$\bar{P}_{23} = \frac{\sigma_a^2 T^3}{2}\,Y_{23} = \frac{(2.764620)^2\,8}{2}\,(3.551069) = 108.5650\,, \tag{64}$$
$$\bar{P}_{33} = \sigma_a^2 T^2\,Y_{33} = (2.764620)^2\,4\,(1.884428) = 57.6117\,. \tag{65}$$
Thus, the $3\times 3$ steady-state covariance prior to one of the periodic measurement updates is
$$\bar{P}_1 = \begin{bmatrix} 504.6652 & 349.3374 & 126.1666\\ 349.3374 & 263.79805 & 108.5650\\ 126.1666 & 108.5650 & 57.6117 \end{bmatrix}, \tag{66}$$
which is a symmetric partition of the symmetric $6\times 6$ matrix
$$\bar{P}_2 = \begin{bmatrix} \bar{P}_1 & 0\\ 0 & \bar{P}_1 \end{bmatrix} \tag{67}$$
to be used as an explicit check on the output generated by the software implementation under test.

Using the scalings in [26, Eq. (16)] in an identical fashion, as just demonstrated above, to also unravel the $P$'s from the $\bar{Y}$'s ([26, Eqs. (42-47)]) yields
$$P_{11} = \sigma_x^2\,\bar{Y}_{11} = 16\,(0.969269) = 15.5083\,, \tag{68}$$
$$P_{12} = \frac{\sigma_x \sigma_a T^2}{2}\,\bar{Y}_{12} = 8\,(2.764620)\,(0.485352) = 10.7345\,, \tag{69}$$
$$P_{13} = \sigma_x \sigma_a T\,\bar{Y}_{13} = 8\,(2.764620)\,(0.1753) = 3.8771\,, \tag{70}$$
$$P_{22} = \frac{\sigma_a^2 T^4}{12}\,\bar{Y}_{22} = \frac{(2.764620)^2\,16}{12}\,(2.8861) = 29.4118\,, \tag{71}$$
$$P_{23} = \frac{\sigma_a^2 T^3}{2}\,\bar{Y}_{23} = (2.764620)^2\,4\,(0.782216) = 23.9143\,, \tag{72}$$
$$P_{33} = \sigma_a^2 T^2\,\bar{Y}_{33} = (2.764620)^2\,4\,(0.88443) = 27.0392\,. \tag{73}$$
Thus, the $3\times 3$ steady-state covariance immediately after one of the periodic measurement updates is
$$P_1 = \begin{bmatrix} 15.5083 & 10.7345 & 3.8771\\ 10.7345 & 29.4118 & 23.9143\\ 3.8771 & 23.9143 & 27.0392 \end{bmatrix}, \tag{74}$$
which is a symmetric partition of the symmetric $6\times 6$ matrix
$$P_2 = \begin{bmatrix} P_1 & 0\\ 0 & P_1 \end{bmatrix} \tag{75}$$
to be used as an explicit test case check on the software implementation. The obvious sanity checks between $\bar{P}_1$ and $P_1$ are satisfied, since the entries of the latter covariance, representing the uncertainty in the three states of position, velocity, and acceleration after a measurement update, are in fact all smaller, as expected. The three cross-check equations following equation (19) of [26] for the main diagonal entries of both $\bar{P}_1$ and $P_1$ are satisfied as
$$(\bar{P}_{22} - P_{22}) = \sigma_a^2\,T^4\,\frac{(4S-1)}{12}\,,$$
or
$$263.798 - P_{22} = (2.764620)^2\,\frac{16\,(24-1)}{12} = 234.389\,, \tag{76}$$
or $P_{22} = 29.408871$, which agrees with the result of equation (71), so $\bar{P}_{22}$ and $P_{22}$ are consistent with this cross-check;
$$(\bar{P}_{33} - P_{33}) = \sigma_a^2\,T^2\,,$$
or
$$57.6117 - P_{33} = (2.764620)^2\,4 = 30.5724\,, \tag{77}$$
or $P_{33} = 27.0392$, which agrees with the result of equation (73), so $\bar{P}_{33}$ and $P_{33}$ are consistent with this cross-check;
$$(\bar{P}_{11} - P_{11}) = \frac{144\,\sigma_x^2}{r^2}\,,$$
or
$$504.6652 - P_{11} = \frac{16\,(144)}{325/69} = 489.1569\,, \tag{78}$$
or $P_{11} = 15.5083$, which agrees with the result of equation (68), so $\bar{P}_{11}$ and $P_{11}$ are consistent with this cross-check.
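For readers who prefer a machine cross-check of equations (66) and (74), the following sketch (not part of the original text; it assumes SciPy) solves the steady-state Riccati equation for the parameters of equations (21)-(25) and (33)-(37) via the standard control/estimation duality, and should reproduce the tabulated covariances to within the rounding carried above:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

T, sigma_a2, R = 2.0, 7.643125, 16.0               # equations (33), (36), (37)
Phi = np.array([[1.0, T, T**2 / 2.0],
                [0.0, 1.0, T],
                [0.0, 0.0, 1.0]])                  # equation (21)
B1 = np.array([[T**3 / 6.0], [T**2 / 2.0], [T]])   # equation (22)
C = np.array([[1.0, 0.0, 0.0]])                    # equation (23)
Qd = sigma_a2 * (B1 @ B1.T)

# Duality: the one-step prediction covariance P-bar of the filtering problem
# solves the DARE with (Phi^T, C^T) in place of the control pair (A, B).
Pbar = solve_discrete_are(Phi.T, C.T, Qd, np.array([[R]]))
G = Pbar @ C.T @ np.linalg.inv(C @ Pbar @ C.T + R)   # steady-state gain
P = (np.eye(3) - G @ C) @ Pbar                       # after-update covariance

print("P-bar (compare equation (66)):\n", Pbar)
print("P     (compare equation (74)):\n", P)
```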
Now this closed-form example test case can be used to check out and confirm output from any software implementation of a discrete-time KF covariance calculation, for the parameters of Section 2.2.5 or the augmentation of Section 2.2.3 to any state size that is an integer multiple of the original 3 states.

References
1. Bar-Shalom, Y. and T. E. Fortmann, Tracking and Data Association, Academic Press, New York, 1988.
2. Benes, V. E., Exact finite-dimensional filters for certain diffusions with nonlinear drift, Stochastics 5 (1981), 65-92.
3. Bierman, G. J., Numerical experience with modern estimation algorithms, Proc. of IEEE Conf. on Decis. Contr., Ft. Lauderdale, FL, Dec. 1985, 1896-1901.
4. Boozer, D. D. and W. L. McDaniel, On innovation sequence testing of the Kalman filter, IEEE Trans. on Auto. Contr. 17 (1972), 158-160.
5. Chen, G., Convergence analysis for inexact mechanization of Kalman filters, IEEE Trans. on Aero. Electr. Sys. 28 (1992), 612-621.
6. Daum, F. E., Exact finite-dimensional nonlinear filters, IEEE Trans. on Auto. Contr. 31 (1986), 616-622.
7. Daum, F. E., Solution of the Zakai equation by separation of variables, IEEE Trans. on Auto. Contr. 32 (1987), 941-943.
8. Gelb, A. (ed.), Applied Optimal Estimation, M.I.T. Press, Cambridge, MA, 1974.
9. Hall, E. B. and G. L. Wise, An analysis of convergence problems in Kalman filtering which arise from numerical effects, Proc. of the 33rd Midwest Symp. on Circ. Sys., Calgary, Alberta, Canada, Aug. 1990.
10. Jazwinski, A. H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.
11. Kerr, T. H., An invalid norm appearing in control and estimation, IEEE Trans. on Auto. Contr. 23 (1978), 73-74 (correction on pp. 1117-1118, Dec. 1978).
12. Kerr, T. H., Decentralized filtering and redundancy management for multisensor navigation, IEEE Trans. on Aero. Electr. Sys. 23 (1987), 83-119 (minor corrections appear on p. 412 of the May and p. 599 of the July 1987 issues of the same journal).
13. Kerr, T. H., Testing matrices for definiteness and application examples that spawn the need, AIAA J. of Guid. Contr. Dynam. 10 (1987), 503-506 (reply to and rebuttal by author in 12 (1989), 767).
14. Kerr, T. H., Computational techniques for the matrix pseudoinverse in minimum variance reduced-order filtering and control, in Control and Dynamic Systems: Advances in Theory and Applications, Vol. 28: Advances in Algorithms and Computational Techniques for Dynamic Control Systems, Part 1 of 3, C. T. Leondes (ed.), Academic Press, New York, 1988, 57-107.
15. Kerr, T. H., Rationale for Monte-Carlo simulator design to support multichannel spectral estimation and/or Kalman filter performance testing and software validation/verification using closed-form test cases, M.I.T. Lincoln Laboratory Report No. PA-512, Lexington, MA, 22 Dec. 1989.
16. Kerr, T. H., An analytic example of a Schweppe likelihood ratio detector, IEEE Trans. on Aero. Electr. Sys. 25 (1989), 545-558.
17. Kerr, T. H., A constructive use of idempotent matrices to validate linear systems analysis software, IEEE Trans. on Aero. Electr. Sys. 26 (1990), 935-952 (minor corrections in the same journal 27 (1991), 951-952).
18. Kerr, T. H., On misstatements of the test for positive semidefinite matrices, AIAA J. of Guid. Contr. Dynam. 13 (1990), 571-572.
19. Kerr, T. H., Fallacies in computational testing of matrix positive definiteness/semidefiniteness, IEEE Trans. on Aero. Electr. Sys. 26 (1990), 415-421.
20. Kerr, T. H., Streamlining measurement iteration for EKF target tracking, IEEE Trans. on Aero. Electr. Sys. 27 (1991), 408-420 (minor correction appears in the Nov. 1991 issue).
21. Kwakernaak, H. and R. Sivan, Linear Optimal Control Systems, Wiley-Interscience, New York, 1972.
22. Lamperti, J., Probability: A Survey of the Mathematical Theory, Benjamin, Inc., New York, 1966.
23. Martin, W. C. and A. R. Stubberud, An additional requirement for innovations testing in system identification, IEEE Trans. on Auto. Contr. 19 (1974), 583-585.
24. Maybeck, P. S., Stochastic Models, Estimation, and Control, Vol. 1, Academic Press, New York, 1979.
25. Patel, J. K., C. H. Kapadia, and D. B. Owen, Handbook of Statistical Distributions, Marcel Dekker, New York, 1976.
26. Ramachandra, K. V., Optimum steady state position, velocity, and acceleration estimation using noisy sampled position data, IEEE Trans. on Aero. Electr. Sys. 23 (1987), 705-708 (a correction appears in the May 1988 issue).
27. Sangsuk-Iam, S. and T. E. Bullock, Analysis of discrete-time Kalman filtering under incorrect noise covariances, IEEE Trans. on Auto. Contr. 35 (1990), 1304-1309.
28. Tam, L.-F., W.-S. Wong, and S. S.-T. Yau, On a necessary and sufficient condition for finite dimensionality of estimation algebras, SIAM J. on Contr. Optim. 28 (1990), 173-185.
29. Kalman Filtering Software: User's Guide, Optimization Software, Inc., New York, 1984.
30. Special issue on applications of Kalman filters, IEEE Trans. on Auto. Contr. 28 (3) (1983).
Thomas H. Kerr
Lincoln Laboratory of M.I.T. and TeK Associates
11 Paul Revere Road
Lexington, MA 02173-6632
Notation ,4+ A, A(t), Ak Al/2
B, B(t), Bk C, C(t), Ck Cov (X, Y) E{X) E{X\Y) G Gk In N{m,a2) Pk,k Pk,k-i
p(x) p(xi,x 2 ) p{xi\x2) Qk Rk
sk
tr
Ufc
Var (X) Var {X\Y) Vfc
Wk wk Xfc Xfc, Xfc.jt Xfc,fc-1
(x,y>
xT, r T hi
pseudo-inverse of matrix A n x n system matrices "square-root" of A p x n control input matrices q x n measurement matrices covariance of random variables X and Y expectation of random variable X conditional expectation of X given Y steady-state Kalman gain matrix Kalman gain matrix n x n identity matrix Gaussian distribution with mean m and variance a2 error covariance matrix: Cov(xk,k — Xfc.fc) error covariance matrix: Cov{i.k,k-i — ^-k,k-\) probability density function joint probability density function conditional probability density function covariance matrix of system random vector £ covariance matrix of measurement random vector ru covariance matrix of system and measurement vectors trace of a matrix deterministic control input (at the fcth time instant) variance of random variable X conditional variance of X given Y observation (or measurement) data (at the fcth time instant) weight (matrix) (at the fcth time instant) weight (scalar) (at the fcth time instant) state vector (at the Arth time instant) optimal filtering estimate of x^ one-step ahead optimal prediction of x^ inner product of x and y transposes of vector x and matrix T Kronecker delta
223
Notation
224 77 fc £, $i,j 0
measurement noise (at the A:th time instant) system noise (at the kth time instant) transition matrix (from the j t h to the ith state) zero (number, vector, matrix, function)