Math. Control Signals Systems (2002) 15: 71–100 © 2002 Springer-Verlag London Limited
Mathematics of Control, Signals, and Systems
Almost Optimal Adaptive LQ Control: SISO Case*

Jasper Daams† and Jan Willem Polderman‡

Abstract. In this paper an almost optimal indirect adaptive controller for input/output dynamical systems is proposed. The control part of the adaptive control scheme is based on a modified LQ control law: by adding a time-varying gain to the certainty equivalent control law the conflict between identification and control is avoided.

Key words. Adaptive control, LQ control, Riccati equation, Excitation, Stability radius.
1. Introduction

It has been recognized by many authors that, due to lack of excitation, there exists a fundamental conflict between identification and control in adaptive control schemes that are based on an optimal control design. See, e.g., [BV], [LKS], [P5], [P6], and [vS]. Recently it has been shown [dB] that if the underlying controller design is based on the minimization of a quadratic cost criterion, then the costs incurred may be arbitrarily large. Given these facts, it appears natural to examine the exact nature of the problem carefully and to try to compensate for it through as few modifications as possible.

In our findings, the conflict between identification and control is caused by a lack of independent equations in terms of the unknown system parameters. By this we mean the following. The parameters of a system are determined by the input/output data, provided these data are sufficiently exciting. In fact, the coefficients of the set of linear equations in which the unknown parameters appear are the input/output data. The excitation condition then expresses that this set of equations has a unique solution. In turn, this lack of equations is due to the fact that the unmodified adaptively controlled behavior is asymptotically linear and time-invariant and is operating in a closed loop. We shall see that the difficulty is that the existence of different models that explain
* Date received: April 18, 2000. Date revised: March 8, 2001.
† Air Traffic Control, The Netherlands, P.O. Box 75200, 1117 ZT Schiphol Airport, The Netherlands. [email protected]. Jasper Daams carried out this research at the Department of Mathematical Sciences at the University of Twente.
‡ Department of Mathematical Sciences, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. [email protected].
the observed data implies time-invariant autonomous behavior. Hence, if time-invariant autonomous models are disabled, then only the true system is able to explain the observed data. A logical step is therefore to add time variations to the controller gains to avoid time-invariant behavior. In [P5] a first attempt in this vein was made. In the present paper this idea is worked out further. In fact, this paper concludes a line of research that started in [P3] and went on in [P5], [P6], [B], [dB], [AMP], [ABP], and [Da]. It provides a complete solution for deterministic single-input/single-output systems. This solution was obtained in [D] as part of the M.Sc. work of the first author.

Several related results for stochastic systems have been reported in the literature [KL], [K], [KB], [CDPD], [G]. One approach to the problem in a stochastic setting was taken in [KL], [K], and [KB]. The key idea there is to bias the estimates toward parameters corresponding to systems for which the optimal costs are relatively low. This approach has recently attracted new attention [C]. Another way to get around the conflict between identification and control is to use external probing signals to ensure correct identification of the open-loop system. See, e.g., [GT] for an early approach to deterministic systems. The main drawback of such an approach is that the probing signal adds to the cost and moreover it makes regulation to zero impossible. A refinement of this idea is to use a probing signal that enables consistent estimation and yet does not affect the asymptotic behavior of the controlled system. This can be done by using a probing signal that converges to zero in a delicate manner: fast enough to be invisible to the cost criterion, yet slow enough to guarantee consistent estimation. This idea was put forward in [CG1] and used in [CDPD] for the continuous-time case and in [G] for the discrete-time case.
The main purpose of the present paper is to find a strategy that does not use external or open-loop probing signals at all. Rather, we incorporate time-dependent variations in the controller parameters so that the resulting control signal contains a part that may be interpreted as a closed-loop probing signal. The advantage of this approach is that this signal is proportional to the norm of the state. If, for example, a sudden change of the system parameters occurs after the system has more or less settled, then the closed-loop probing signal is re-activated automatically. Another advantage is that, since the probing signal is proportional to the norm of the state, it will never be dominated by the state when the system goes through a transient phase.

The idea behind the time-varying gain is that it ensures that the true system is identified, and hence also the optimal controller. Of course, the time-varying part of the controller should not destroy the stability of the closed-loop system. By exploiting the concept of stability radius, the time-varying gain is chosen such that stability is preserved. Due to the incorporation of the time variations in the controller, the resulting closed-loop system does not behave optimally. However, an additional scaling factor allows us to approximate the optimal behavior arbitrarily well. The solution of the Riccati equation is needed for the determination of the optimal control law, but at the same time it provides an upper bound on the norm of the time-varying gain.
The development consists of two stages. First we design a control law such that in the equilibrium points of the algorithm the desired specifications are met. Second we prove global convergence of the algorithm on the basis of the results obtained earlier.

The paper is organized as follows. In Section 2 we introduce the problem precisely and we provide detailed motivation. It will turn out that we may benefit from a small time-varying gain on top of the usual certainty equivalent control law. How to choose this time-varying part is explained in Section 3. Section 3 is divided into three subsections, in each of which different aspects of the modified control law are discussed. These aspects are: how to identify the true system, how to preserve stability, and how to approximate the optimal costs. In Section 4 the ideas developed in Section 3 are used to propose an adaptive control algorithm. The main result that we derive in this section is that the adaptively controlled system asymptotically approaches the optimal behavior. Here asymptotic is with respect to both time and a design parameter in the time-varying part of the feedback. In Section 5 the adaptive algorithm and the modification are illustrated by means of simulations. Finally, in Section 6 we draw some conclusions and we indicate possible extensions of the results.
2. Problem Formulation

We consider the class of discrete-time single-input/single-output (SISO) dynamical systems

$$y_{k+n} + a^0_{n-1} y_{k+n-1} + \cdots + a^0_0 y_k = b^0_{n-1} u_{k+n-1} + \cdots + b^0_0 u_k, \qquad (1)$$

where $u$ is the input, $y$ is the output, and the coefficients $a^0_i, b^0_i$ are real. We assume that (1) is controllable, i.e., the polynomials $A(s) = s^n + a^0_{n-1}s^{n-1} + \cdots + a^0_0$ and $B(s) = b^0_{n-1}s^{n-1} + \cdots + b^0_0$ are coprime, see [PW]. We furthermore assume that the coefficients $a^0_i, b^0_i$ are unknown. The order $n$, however, is assumed to be known. The control objective is to minimize the cost criterion

$$J(u) = \sum_{k=0}^{\infty} y_k^2 + r u_k^2, \qquad \text{with } r \text{ a positive constant.} \qquad (2)$$

In order to be able to apply standard LQ theory we reformulate the problem in a state space setting. We use the following nonminimal input/state/output representation of (1):

$$\varphi_{k+1} = A_0 \varphi_k + b_0 u_k, \qquad y_k = c_0 \varphi_k, \qquad (3)$$
where

$$A_0 := \begin{bmatrix}
-a^0_{n-1} & \cdots & -a^0_1 & -a^0_0 & b^0_{n-2} & \cdots & b^0_1 & b^0_0\\
1 & & & 0 & 0 & & & 0\\
 & \ddots & & \vdots & \vdots & & & \vdots\\
 & & 1 & 0 & 0 & & & 0\\
0 & \cdots & \cdots & 0 & 0 & \cdots & \cdots & 0\\
0 & & & 0 & 1 & & & \\
\vdots & & & \vdots & & \ddots & & \\
0 & \cdots & & 0 & & & 1 & 0
\end{bmatrix}, \qquad
b_0 := \begin{bmatrix} b^0_{n-1}\\ 0\\ \vdots\\ 0\\ 1\\ 0\\ \vdots\\ 0 \end{bmatrix}, \qquad (4)$$

$$c_0 := [\,1 \quad 0 \quad \cdots \quad 0\,],$$

where the entry $1$ in $b_0$ is in position $n+1$, and

$$\varphi_k = [\,y_k \quad \cdots \quad y_{k-n+1} \quad u_{k-1} \quad \cdots \quad u_{k-n+1}\,]^T, \qquad (5)$$
where $\varphi \in \mathbb{R}^{2n-1}$ is the (nonminimal) state. Since the state is composed of past inputs and outputs that are available for measurement, we assume throughout the paper that at time $k$, including $k = 0$, $\varphi_k$ is known. We denote the class of state systems of the form (3) by W. The pair $(A_0, b_0)$ is controllable since the system (1) is controllable. The unobservable modes of the nonminimal state representation are zero and hence the state $\varphi$ is detectable from the input/output pair $(u, y)$. It follows from (5) that the state of (3) is always known from input/output observations without knowing the system parameters, and this is why we prefer the nonminimal representation (3) to a minimal representation. In terms of the newly defined state $\varphi$ the control objective can be reformulated as the minimization of

$$J(u, \varphi_0) = \sum_{k=0}^{\infty} \varphi_k^T Q\varphi_k + ru_k^2, \qquad Q := c_0^T c_0, \quad r > 0. \qquad (6)$$
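Because the state (5) is just a window of past inputs and outputs, the realization (4) can be assembled mechanically. The following Python sketch (function name and layout ours, not from the paper) builds $A_0$, $b_0$, $c_0$ from the coefficients of (1); simulating (3) with it reproduces the recursion (1):

```python
import numpy as np

def nonminimal_realization(a, b):
    """Assemble (A0, b0, c0) of the nonminimal representation (3)-(4).

    a = [a_0, ..., a_{n-1}] and b = [b_0, ..., b_{n-1}] are the
    coefficients of (1); the state dimension is 2n - 1.
    """
    n = len(a)
    d = 2 * n - 1
    A0 = np.zeros((d, d))
    # Output row: y_{k+1} = -a_{n-1} y_k - ... - a_0 y_{k-n+1}
    #                       + b_{n-2} u_{k-1} + ... + b_0 u_{k-n+1}.
    A0[0, :n] = -np.asarray(a, dtype=float)[::-1]
    A0[0, n:] = np.asarray(b, dtype=float)[:-1][::-1]
    for i in range(1, n):          # shift the stored outputs
        A0[i, i - 1] = 1.0
    for i in range(n + 1, d):      # shift the stored inputs
        A0[i, i - 1] = 1.0
    b0 = np.zeros(d)
    b0[0] = b[-1]                  # b_{n-1} u_k enters y_{k+1} directly
    if n > 1:
        b0[n] = 1.0                # u_k is stored as the new u_{(k+1)-1}
    c0 = np.zeros(d)
    c0[0] = 1.0
    return A0, b0, c0
```

For $n = 2$ the state is $\varphi_k = [y_k\ \ y_{k-1}\ \ u_{k-1}]^T$ and the first row of $A_0$ reads $[-a^0_1\ \ -a^0_0\ \ b^0_0]$, matching (4).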
The feedback that minimizes (6) is given by (see [KS2])

$$u_k = f(A_0, b_0)\varphi_k, \qquad (7)$$

where

$$f(A_0, b_0) = -(b_0^T P_0 b_0 + r)^{-1} b_0^T P_0 A_0 \qquad (8)$$

and $P_0$ is the unique symmetric positive semidefinite solution of the algebraic Riccati equation

$$P_0 - A_0^T P_0 A_0 + A_0^T P_0 b_0 (b_0^T P_0 b_0 + r)^{-1} b_0^T P_0 A_0 - Q = 0. \qquad (9)$$
The optimal costs are given by

$$J^*(\varphi_0) = \varphi_0^T P_0 \varphi_0. \qquad (10)$$
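For a concrete sanity check of (8)-(10), $P_0$ can be approximated by simply iterating the Riccati map until it settles; this recursion coincides with the Riccati difference equation used later in Section 4. A minimal sketch, with names of our own choosing:

```python
import numpy as np

def lq_gain(A, b, c, r, iters=2000):
    """Approximate the ARE solution P0 of (9) by fixed-point iteration
    and return the optimal gain f of (8).  A toy illustration, not the
    paper's algorithm; names are ours."""
    Q = np.outer(c, c)
    P = np.zeros_like(Q)
    bc = b.reshape(-1, 1)
    for _ in range(iters):
        Pb = P @ bc
        P = A.T @ P @ A - (A.T @ Pb) @ (Pb.T @ A) / float(bc.T @ Pb + r) + Q
    f = -(bc.T @ P @ A) / float(bc.T @ P @ bc + r)   # row vector, eq. (8)
    return P, f
```

For the scalar system $y_{k+1} = 0.5\,y_k + u_k$ with $r = 1$ this yields $P_0 \approx 1.133$ and a stable closed loop with $|a + bf| \approx 0.234$.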
Because $(A_0, b_0)$ is unknown, we want to generate an input sequence $u_k$ in an adaptive fashion. If we proceed in the standard way, we would apply the controller (7) based on estimates of $(A_0, b_0)$ obtained from closed-loop observations. However, in the case of LQ control there exists a conflict between closed-loop identification and control, see [P6]. The nature of this conflict can be effectively explained using the notion of a closed-loop unfalsified model, which we define below, see also [P5].

Definition 2.1. Consider the dynamical system (3). Let $(A, b) \in W$ be controllable and let $g$ be a control gain designed on the basis of $(A, b)$ according to some control objective. Assume that the sequence $\varphi_k$ is generated as follows:

$$\varphi_{k+1} = (A_0 + b_0 g)\varphi_k, \qquad \varphi_0 = \bar{\varphi}. \qquad (11)$$

If for all $k$

$$(A_0 + b_0 g)\varphi_k = (A + bg)\varphi_k, \qquad (12)$$

then the pair $(A, b)$ is said to represent a closed-loop unfalsified model of (3). In the remainder of the paper the control gain $g$ will always be $f(A, b)$ as defined in (7), with $(A_0, b_0)$ replaced by $(A, b)$. The set of closed-loop unfalsified models is denoted by G.

Remark 2.2. Consider a standard adaptive algorithm. If we initialize this algorithm in $(A, b) \in G$, then the one-step-ahead prediction error is identically zero (see Definition 2.1) and therefore the identification procedure used in this algorithm will be frozen. In that case, the true system will be controlled by an input sequence generated as follows:

$$u_k = f(A, b)\varphi_k. \qquad (13)$$

A closed-loop unfalsified model is therefore an invariant point of any certainty equivalent adaptive control algorithm.

We now explain the conflict between identification and control in LQ optimal adaptive control, [P6]. The first question that arises is: when the true system is controlled on the basis of a closed-loop unfalsified model $(A, b)$, what is then the value of the cost criterion (6)? In [P6] it was shown that these costs are in general larger than the optimal costs. In fact, the only closed-loop unfalsified model on the basis of which the true system could be controlled optimally is the true system itself. This leads to the question: how large is the set of closed-loop unfalsified models? Unfortunately, there are many: consider the system of equations (12) that defines the set G. The number of unknowns is $2n$, whereas there are only $2n - 1$ equations in the unknowns. As a consequence it can be expected that the set of closed-loop unfalsified models is a manifold of dimension at least 1. See [P6] for more details.
It has been proven in [dB] for the first-order case that the real costs, as a function of the closed-loop unfalsified model, are unbounded. More precisely, for any true system and for any positive constant $C$, the set of closed-loop unfalsified models contains a model such that if the true system is controlled by the optimal controller corresponding to that model, the costs incurred exceed $C$.

The set G contains the set of possible limit points of the sequence of estimates. Hence, in the limit the control law is based on a closed-loop unfalsified model. We conclude that when applying a standard certainty equivalent control scheme the limit points of the sequence of estimates correspond to closed-loop unfalsified models that are in general not equal to the true system. The costs incurred with these models are expected to be larger than the optimal costs.

It follows from the above reasoning that if we want to avoid the conflict between identification and control, we have to ensure that the only closed-loop unfalsified model is the true system, i.e., $G = \{(A_0, b_0)\}$. Since the usual certainty equivalent control law cannot guarantee this, we develop a modified control law such that the following specifications hold:
• The only closed-loop unfalsified model of (3) controlled by the modified control law is the true system.
• The modified control law is stabilizing.
• The extra costs incurred with the modification can be kept arbitrarily small, without affecting the first two requirements.
The development of a control law that meets the above specifications is the main goal of the paper. However, we also apply these ideas to an adaptive control algorithm.

3. The Modified Control Law

In this section we develop a control law that meets the specifications of the previous section. To streamline the discussion we have divided this section into three subsections. In each of these subsections we discuss one of the three specifications.

3.1. How to Make Sure that $G = \{(A_0, b_0)\}$

In this subsection we construct a control law such that $G = \{(A_0, b_0)\}$ without worrying about the other two required properties. It is clear that the modified control law should be such that the measured data $\{(\varphi_k, u_k) \mid k \ge 0\}$ contain sufficient information to identify the true system. The following theorem states that if the data are explained by a system different from the true system, then they can be explained by an autonomous time-invariant system.

Theorem 3.1. Let $(A_0, b_0), (A, b) \in W$ with $(A_0, b_0) \ne (A, b)$. Suppose that both $(A_0, b_0)$ and $(A, b)$ explain all data $\{(\varphi_k, u_k) \mid k \ge 0\}$:

$$\varphi_{k+1} = A_0\varphi_k + b_0 u_k = A\varphi_k + bu_k, \qquad \forall k \ge 0. \qquad (14)$$
Then there exists a feedback law $v \in \mathbb{R}^{1\times(2n-1)}$ such that

$$u_k = v\varphi_k \qquad (15)$$

and a matrix $K \in \mathbb{R}^{(2n-1)\times(2n-1)}$ such that

$$\varphi_{k+1} = K\varphi_k. \qquad (16)$$

Proof. Suppose $b \ne b_0$. It follows from (14) that

$$(A_0 - A)\varphi_k = (b - b_0)u_k \qquad (17)$$

and since $b - b_0 \ne 0$ we can premultiply both sides of (17) with $(b - b_0)^T/\|b - b_0\|^2$. This yields

$$\frac{(b - b_0)^T}{\|b - b_0\|^2}(A_0 - A)\varphi_k = u_k. \qquad (18)$$

If we define $v := ((b - b_0)^T/\|b - b_0\|^2)(A_0 - A)$ and $K := A_0 + b_0 v$ the statement follows.

Suppose $b = b_0$. Then $A \ne A_0$ and from (14) we obtain that

$$A\varphi_k = A_0\varphi_k, \qquad \forall k. \qquad (19)$$

Define $\mathcal{F} := \operatorname{span}\{\varphi_k\}$ and let $\mathcal{F}^\perp$ be the orthogonal complement of $\mathcal{F}$ in $\mathbb{R}^{2n-1}$. Since $A \ne A_0$, (19) implies that $\dim \mathcal{F} < 2n - 1$ and hence $\mathcal{F}^\perp$ is nontrivial. Next we prove that there exists a vector $\tilde{q} \in \mathcal{F}^\perp$ such that $\tilde{q}^T b_0 \ne 0$. Suppose that $q^T b_0 = 0$ for all $q \in \mathcal{F}^\perp$. By definition $q^T\varphi_k = 0$ for all $k$ and $q \in \mathcal{F}^\perp$, and we conclude from (14) that

$$\{0\} = (\mathcal{F}^\perp)^T\varphi_{k+1} = (\mathcal{F}^\perp)^T A_0\varphi_k, \qquad \forall k. \qquad (20)$$

This implies that $(\mathcal{F}^\perp)^T A_0 \subset (\mathcal{F}^\perp)^T$. Since we assumed that $(\mathcal{F}^\perp)^T b_0 = \{0\}$ and $\mathcal{F}^\perp$ is nontrivial, this contradicts the controllability of the pair $(A_0, b_0)$. We premultiply the first equality in (14) by a $\tilde{q} \in \mathcal{F}^\perp$ such that $\tilde{q}^T b_0 \ne 0$, which yields

$$0 = \tilde{q}^T A_0\varphi_k + \tilde{q}^T b_0 u_k. \qquad (21)$$

Division by $\tilde{q}^T b_0$ yields

$$u_k = -\frac{\tilde{q}^T A_0}{\tilde{q}^T b_0}\varphi_k. \qquad (22)$$

The statement follows by defining $v := -(\tilde{q}^T A_0)/(\tilde{q}^T b_0)$ and $K := A_0 + b_0 v$. ∎
Notice that $(A, b)$ in Theorem 3.1 need not be controllable, whereas the controllability of $(A_0, b_0)$ is essential.

Remark 3.2. The result of Theorem 3.1 can be extended to multi-input/state dynamical systems

$$x_{k+1} = A_0 x_k + B_0 u_k, \qquad (23)$$

where $u \in \mathbb{R}^m$ and the state $x \in \mathbb{R}^n$. In this case it can be proven under the same assumptions that if there exists a model different from the true system that explains all data, then there exist a nonzero vector $w \in \mathbb{R}^{1\times m}$ and a $v \in \mathbb{R}^{1\times n}$ such that $vx_k = wu_k$ for all $k$. Moreover, after a suitable coordinate transformation the data may be explained by a model $(F, G) \in \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times m}$ where one of the columns of the input matrix $G$ equals zero.

Remark 3.3. The result of Theorem 3.1 can be formulated in a continuous-time setting as well, see [D].

Theorem 3.1 states that if there exist unfalsified models other than the true system, then the data can also be explained by a linear time-invariant system. Thus, in order to reduce the set of closed-loop unfalsified models to the true system only, it is sufficient to exclude the possibility that the closed-loop system behaves like a time-invariant system. Since the usual certainty equivalent adaptive control strategies yield asymptotically time-invariant closed-loop systems, we propose adding a persistently time-varying term $L_k$ to the certainty equivalent part of the feedback. The following theorem describes how we could choose the $L_k$ so that they are "sufficiently" time-varying. It generalizes ideas that were developed for the first-order case in [P5].

Theorem 3.4. Let $(M, b) \in W$. Let the sequence of time-varying feedback gains $L_k$ be of the form $L_k = \lambda_k H$ with $H \in \mathbb{R}^{p\times(2n-1)}$. Assume that $(M, H)$ is detectable and that the unobservable modes are zero. Let $\tilde{\lambda}_i \in \mathbb{R}^{1\times p}$, $i = 1, \ldots, p+1$, be such that they do not satisfy the same affine relation, i.e., there does not exist a nonzero vector $a \in \mathbb{R}^p$ and a scalar $c$ such that $\tilde{\lambda}_i a = c$, $i = 1, 2, \ldots, p+1$. Let $\lambda_k$ switch through $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_{p+1}$ in the following fashion:

$$\begin{aligned}
\lambda_0 = \cdots = \lambda_{2n-2} &= \tilde{\lambda}_1,\\
\lambda_{2n-1} = \cdots = \lambda_{4n-3} &= \tilde{\lambda}_2,\\
&\;\;\vdots\\
\lambda_{p(2n-1)} = \cdots = \lambda_{(p+1)(2n-1)-1} &= \tilde{\lambda}_{p+1},\\
\lambda_{(p+1)(2n-1)} = \cdots = \lambda_{(p+2)(2n-1)-1} &= \tilde{\lambda}_1,\\
&\;\;\vdots
\end{aligned} \qquad (24)$$

If $\varphi_{k+1} = M\varphi_k + bL_k\varphi_k$ and if $\varphi_{(p+1)(2n-1)} \ne 0$, then there does not exist a matrix $K \in \mathbb{R}^{(2n-1)\times(2n-1)}$ such that, for all $k$, $\varphi_{k+1} = K\varphi_k$.

Proof. Suppose that the statement is not true. Since $b \ne 0$ there exist a vector $v$ and a matrix $K$ such that

$$L_k\varphi_k = v\varphi_k, \qquad (25a)$$

$$\varphi_{k+1} = K\varphi_k. \qquad (25b)$$

Consider the intervals $I_j := [(j-1)(2n-1),\; j(2n-1)-1]$, where $j = 1, \ldots, p+1$
(these are the intervals where $\lambda_k$ is constant). Define $V_j := \operatorname{span}\{\varphi_k \mid k \in I_j\}$. We prove that there exists a $\varphi^* \in V_j$, $j = 1, \ldots, p+1$, such that

$$H\varphi^* =: w^* \ne 0. \qquad (26)$$

Suppose

$$HV_{p+1} = \{0\}. \qquad (27)$$

Apparently $V_{p+1}$ is an M-invariant subspace contained in $\ker H$. By the assumption on the pair $(M, H)$ it follows that $M^{2n-1}V_{p+1} = \{0\}$. In particular, $M^{2n-1}\varphi_{p(2n-1)} = \varphi_{(p+1)(2n-1)} = 0$. This contradicts the assumption that $\varphi_{(p+1)(2n-1)} \ne 0$. Therefore there exists $\varphi^* \in V_{p+1}$ such that $H\varphi^* \ne 0$. It follows from (25b) and the theorem of Cayley–Hamilton that $V_{p+1} \subset V_p \subset \cdots \subset V_1$ and hence (26). Since $\varphi^* \in V_j$ there exist scalars $\alpha_k$ such that

$$\varphi^* = \sum_{k=(j-1)(2n-1)}^{j(2n-1)-1} \alpha_k\varphi_k, \qquad j = 1, \ldots, p+1. \qquad (28)$$

Since $\lambda_k$ is constant on the intervals $I_j$, $j = 1, \ldots, p+1$, it follows from (25a) and (26) that

$$v\varphi^* = \sum_{k=(j-1)(2n-1)}^{j(2n-1)-1} \alpha_k v\varphi_k = \tilde{\lambda}_j H\sum_{k=(j-1)(2n-1)}^{j(2n-1)-1} \alpha_k\varphi_k = \tilde{\lambda}_j w^*, \qquad j = 1, \ldots, p+1, \qquad (29)$$

which contradicts the assumption that the $\tilde{\lambda}_j$'s do not satisfy the same affine relation. ∎

Suppose now for the moment that the sequence of estimates has converged so that the certainty equivalent part of the control law is constant, say $f$. If we use the structure $L_k = \lambda_k H$ in $\varphi_{k+1} = (A_0 + b_0 f)\varphi_k + b_0 L_k\varphi_k$ and apply the resulting feedback to the true system we obtain

$$\varphi_{k+1} = (A_0 + b_0 f)\varphi_k + b_0\lambda_k H\varphi_k. \qquad (30)$$

According to Theorem 3.4 we should choose $H$ such that $(A_0 + b_0 f, H)$ is detectable and such that the unobservable modes are zero. Although $(A_0, b_0)$ is unknown and $f$ can be any feedback vector, it is not difficult to find such an $H$, as the following lemma shows.

Lemma 3.5. Consider (30) where $(A_0, b_0, c_0)$ is given by (4) and let

$$H = \begin{bmatrix} c_0 \\ \mu f \end{bmatrix},$$

where $\mu \ne 0$. Then $(A_0 + b_0 f, H)$ is detectable and the unobservable modes are zero.

Proof. Let $V$ be an $(A_0 + b_0 f)$-invariant subspace such that $HV = \{0\}$. Then, by the specific form of $H$, $A_0 V \subset V$ and $c_0 V = \{0\}$. Therefore $A_0^{2n-1}V = 0$ and consequently $(A_0 + b_0 f)^{2n-1}V = \{0\}$. ∎

Notice that it follows from Lemma 3.5 that the number of rows of $H$, which is $p$ in Theorem 3.4, may be taken equal to 2. In Theorem 3.4 it is derived that the
number of feedbacks that is switched through is just one more than the number of rows of $H$. That means that we need to switch through three feedbacks only. The above discussion motivates the use of a controller of the form

$$u_k = f\varphi_k + \lambda_k\begin{bmatrix} c_0 \\ \mu f \end{bmatrix}\varphi_k. \qquad (31)$$

As stated in Theorem 3.4 the idea of the time-varying feedback is to switch cyclically through an off-line determined set of feedback gains $\tilde{\lambda}_i$. In particular we use $\tilde{\lambda}_1$ for $2n-1$ iterations followed by $\tilde{\lambda}_2$ for the next $2n-1$ iterations, $\tilde{\lambda}_3$ for $2n-1$ iterations, and then again $\tilde{\lambda}_1$ for $2n-1$ iterations and so on. In other words we switch through $\tilde{\lambda}_1, \tilde{\lambda}_2, \tilde{\lambda}_3$ as

$$\underbrace{\tilde{\lambda}_1, \ldots, \tilde{\lambda}_1}_{2n-1 \text{ times}},\ \underbrace{\tilde{\lambda}_2, \ldots, \tilde{\lambda}_2}_{2n-1 \text{ times}},\ \underbrace{\tilde{\lambda}_3, \ldots, \tilde{\lambda}_3}_{2n-1 \text{ times}},\ \underbrace{\tilde{\lambda}_1, \ldots, \tilde{\lambda}_1}_{2n-1 \text{ times}}, \ldots \qquad (32)$$

For convenience of notation we define the switching mechanism more formally. Let $n, k \in \mathbb{Z}$. The function $\sigma: \mathbb{Z} \to \{1, 2, 3\}$ is defined as

$$\sigma(k) = [(k \bmod 3(2n-1)) \operatorname{div} (2n-1)] + 1. \qquad (33)$$

We can now write $\lambda_k$ as follows:

$$\lambda_k = \tilde{\lambda}_{\sigma(k)}. \qquad (34)$$
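The switching rule (33)-(34) is a simple integer computation; a direct transcription in Python (with `//` for the integer division written "div" in (33)):

```python
def sigma(k, n):
    """Switching function (33): hold each of the values 1, 2, 3 for
    2n - 1 consecutive steps, cyclically."""
    return (k % (3 * (2 * n - 1))) // (2 * n - 1) + 1
```

For $n = 2$ this produces the pattern $1,1,1,2,2,2,3,3,3,1,\ldots$, i.e., each gain $\tilde{\lambda}_i$ is held for $2n - 1 = 3$ steps as in (32).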
Consider the system

$$\varphi_{k+1} = (A_0 + b_0 f)\varphi_k + b_0 u_k, \qquad u_k = \tilde{\lambda}_{\sigma(k)}\begin{bmatrix} c_0 \\ \mu f \end{bmatrix}\varphi_k, \qquad (35)$$

where $(A_0, b_0) \in W$. The following theorem states that by using the above ideas, the set of closed-loop unfalsified models reduces to a single model, namely the true system.

Theorem 3.6. Consider the system (35) and a closed-loop unfalsified model $(A, b) \in W$. Let the $\tilde{\lambda}_i$ be chosen such that they do not satisfy the same affine relation and let the function $\sigma$ be defined by (33). If for all $k$,

$$\varphi_{k+1} = \left(A_0 + b_0 f + b_0\tilde{\lambda}_{\sigma(k)}\begin{bmatrix} c_0 \\ \mu f \end{bmatrix}\right)\varphi_k = \left(A + bf + b\tilde{\lambda}_{\sigma(k)}\begin{bmatrix} c_0 \\ \mu f \end{bmatrix}\right)\varphi_k, \qquad \varphi_0 = \bar{\varphi}, \qquad (36)$$

and $\varphi_{3(2n-1)} \ne 0$, then $A = A_0$ and $b = b_0$.

Proof. Since $(A_0, b_0) \in W$ we know that $(A_0, c_0)$ is detectable and the unobservable modes are zero. We may therefore apply Theorem 3.4 to conclude that there does not exist a matrix $K$ such that $\varphi_{k+1} = K\varphi_k$. It then follows from Theorem 3.1 that $A_0 + b_0 f = A + bf$ and $b_0 = b$, and hence also $A_0 = A$. ∎

We see that incorporating the time-varying feedback on top of the certainty equivalent feedback ensures that the only closed-loop unfalsified model is the true
system. The assumption that $\varphi_{3(2n-1)} \ne 0$ is not very restrictive: if the state equals zero at some time, then it will remain zero afterwards and there is not much left to control.

Remark 3.7. Notice that the condition that there does not exist a vector $a$ and a constant $c$ such that $\tilde{\lambda}_i a = c$, $i = 1, 2, 3$, is invariant with respect to scaling: if $\tilde{\lambda}_1, \tilde{\lambda}_2, \tilde{\lambda}_3$ do not satisfy the same affine relation, then neither do $\alpha\tilde{\lambda}_1, \alpha\tilde{\lambda}_2, \alpha\tilde{\lambda}_3$ for any nonzero constant $\alpha$. This property turns out to be essential when we want to scale down the additional feedback to ensure that stability is preserved.

3.2. How to Ensure that Stability Is Preserved

In Section 3.1 we proposed a modified control law (31) such that $G = \{(A_0, b_0)\}$. Of course, the time-varying part of the feedback could destroy the asymptotic stability of the controlled system, unless it is sufficiently small in norm. In this subsection we investigate how small the time-varying part should be. The analysis in this section can easily be formulated for multi-input/multi-output systems. However, we provide the details for the SISO case only.

We assume that (3) is controlled by the state feedback

$$u_k = \left(f(A, b) + \lambda_k\begin{bmatrix} c_0 \\ \mu f(A, b)\end{bmatrix}\right)\varphi_k, \qquad (37)$$

where $\mu > 0$ is a design parameter and $f(A, b)$ is the optimal feedback given by (8). The time-varying gains $\lambda_k$ are left unspecified. For convenience of notation we introduce

$$H := \begin{bmatrix} c_0 \\ \mu f(A, b)\end{bmatrix}. \qquad (38)$$

The closed-loop interconnection of (3) and (37) is given by

$$\varphi_{k+1} = (A + bf(A, b) + b\lambda_k H)\varphi_k, \qquad y_k = c_0\varphi_k. \qquad (39)$$

We are interested in the relation between the norm of $\lambda_k$ and the stability of (39). Consider the matrix $A + bf(A, b) + b\lambda_k H$. This matrix can be decomposed as the sum of a stable, constant matrix $M := A + bf(A, b)$ and a time-varying matrix $b\lambda_k H$. The matrix $b\lambda_k H$ can be seen as a (structured) perturbation of $M$. Of course, if the norm of $\lambda_k$ is sufficiently small, the perturbed matrix is stable. In fact, we show that if $\lambda_k$ is sufficiently small, a Lyapunov function can be derived for the perturbed system $M + b\lambda_k H$ which does not depend on $\lambda_k$. As a result, the time-varying system (39) is exponentially stable. These ideas are closely related to the notion of the complex structured stability radius, see [HS]. If the structured perturbations of $M$ are smaller than the corresponding stability radius, it can be shown that there exists a Lyapunov function for the perturbed system $M + b\lambda H$ that does not depend on $\lambda$. This Lyapunov function is closely linked to an algebraic Riccati equation that is associated with $M$, $b$, and $H$. In the present context, we derive such a Lyapunov function directly from the algebraic Riccati equation that is
associated with the optimal control problem of minimizing (6). The main advantage of this approach is that the evaluation of the upper bound on the norm of $\lambda_k$ does not require a significant increase in computational effort, whereas computation of the stability radius is quite involved. The following theorem provides the necessary results.

Theorem 3.8. Consider the triple $(A, b, H)$ in (39), the algebraic Riccati equation (9), $f(A, b)$ defined by (8), and $M = A + bf(A, b)$. Let $H$ be defined by (38), with

$$\mu := \sqrt{\left(1 - \frac{1}{\delta}\right)r}, \qquad \delta > 1, \qquad (40)$$

where $\delta$ is a design parameter and $r$ is the weighting factor for the inputs in (9), and let

$$\gamma^2 \ge b^T P b + \delta r. \qquad (41)$$

If

$$\|\lambda_k\|^2 \le \frac{1}{\gamma^2} - \varepsilon, \qquad \forall k, \quad \text{for some } \varepsilon > 0, \qquad (42)$$

then the system

$$\varphi_{k+1} = (M + b\lambda_k H)\varphi_k \qquad (43)$$

is exponentially stable.
ð44Þ
Furthermore, we can rewrite the Riccati equation (9) in terms of the matrix M and control law f ðA; bÞ as M T PM P ¼ c0T c0 f ðA; bÞ T rf ðA; bÞ
ð45Þ
and (8) can be rewritten as rf ðA; bÞ ¼ b T PM:
ð46Þ
We prove that we can construct a strict Lyapunov function for the system (43) by making the right combination of X and P. This Lyapunov function is of the form V ðx; r; sÞ ¼ x T ðrP þ sX Þx. To prove that, for a suitable choice of r and s, V is a strict Lyapunov function for the system (43) we derive the following matrix inequality: ðM þ blHÞ T ðrP þ sX ÞðM þ blHÞ ðrP þ sX Þ ¼
rM T PM rP þ sM T XM sX þ rH T l T b T PblH þ sH T l T b T XblH þ rH T l T b T PM þ rM T PblH þ sH T l T b T XM þ sM T XblH
Almost Optimal Adaptive LQ Control: SISO Case ð44Þ; ð45Þ
¼
83
rc0T c0 rf ðA; bÞ T rf ðA; bÞ sI þ rH T l T b T PblH þ sH T l T b T XblH þ rH T l T b T PM þ rM T PblH þ sH T l T b T XM þ sM T XblH
ð46Þ
¼
rc0T c0 rf ðA; bÞ T rf ðA; bÞ sI þ rH T l T b T PblH þ sH T l T b T XblH rH T l T rf ðA; bÞ rf ðA; bÞ T rlH þ sH T l T b T XM þ sM T XblH
¼
rc0T c0 rf ðA; bÞ T rf ðA; bÞ sI þ rH T l T b T PblH T pffiffiffi 1 þ sH T l T b T XblH r dlH þ pffiffiffi f ðA; bÞ d pffiffiffi 1 r r dlH þ pffiffiffi f ðA; bÞ þ rdH T l T rlH þ f ðA; bÞ T rf ðA; bÞ d d !T ! pffiffiffi pffiffiffi 1 1 b blH pffiffiffi M s bblH pffiffiffi M X b b s þ sbH T l T b T XblH þ M T XM b
a
rc0T c0 rf ðA; bÞ T rf ðA; bÞ sI þ rH T l T b T PblH r þ sH T l T b T XblH þ rdH T l T rlH þ f ðA; bÞ T rf ðA; bÞ d s T T T T þ sbH l b XblH þ M XM b
¼
rc0T c0 rf ðA; bÞ T rf ðA; bÞ sI þ rH T l T ðb T Pb þ drÞlH r s þ f ðA; bÞ T rf ðA; bÞ þ sð1 þ bÞH T l T b T XblH þ M T XM d b 1 rc0T c0 r 1 f ðA; bÞ T rf ðA; bÞ sI þ rg 2 klk 2 H T H d s þ sð1 þ bÞH T l T b T XblH þ M T XM b s rð1 g 2 klk 2 ÞH T H sI þ sð1 þ bÞH T l T b T XblH þ M T XM b
a
¼ a
reg 2 H T H þ sð1 þ bÞ
1 T T s H b XbH sI þ M T XM: b g2
ð47Þ
If we choose b , s , and r such that 1 M T XM a 12 I ; b
ð48Þ
84
J. Daams and J. W. Polderman
2 a s ; s ð1 þ b Þ
ð49Þ
1 T b Xb a r eg 2 I ; g2
ð50Þ
then it follows from (47) that V ðjkþ1 ; r ; s Þ V ðjk ; r ; s Þ a jkT jk and the system (43) is exponentially stable.
ð51Þ 9
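Theorem 3.8 gives a computable safety margin: any $\|\lambda_k\|^2$ strictly below $1/\gamma^2$, with $\gamma^2 = b^T P b + \delta r$ the smallest value allowed by (41), preserves stability. The sketch below (all names ours; $P$ is recomputed by iterating the Riccati map of (9)) evaluates that bound:

```python
import numpy as np

def lambda_bound(A, b, c, r, delta=2.0, iters=2000):
    """Evaluate the bound in (42) with the smallest gamma allowed by
    (41), gamma^2 = b^T P b + delta * r.  Toy illustration, names ours."""
    Q = np.outer(c, c)
    P = np.zeros_like(Q)
    bc = b.reshape(-1, 1)
    for _ in range(iters):            # fixed-point iteration of (9)
        Pb = P @ bc
        P = A.T @ P @ A - (A.T @ Pb) @ (Pb.T @ A) / float(bc.T @ Pb + r) + Q
    gamma2 = float(bc.T @ P @ bc) + delta * r
    f = -(bc.T @ P @ A) / float(bc.T @ P @ bc + r)   # optimal gain (8)
    return 1.0 / gamma2, P, f
```

Simulating (43) with a switching $\lambda_k$ whose norm stays below the returned bound should then exhibit exponential decay of the state; e.g., for $a = 0.5$, $b = 1$, $r = 1$, $\delta = 2$ the bound is $1/(P_0 + 2) \approx 0.32$.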
Remark 3.9. We observe from (51) that the strict Lyapunov function has a fixed rate of decay. From this it can be concluded that the perturbed system is also exponentially stable when subjected to vanishing, unstructured perturbations.

3.3. How Much Does the Time-Varying Gain Add to the Costs?

Now that we know how to design the time-varying feedback so as to ensure that the only closed-loop unfalsified model is the true system, we would like to know how much the modification of the usual certainty equivalence design adds to the costs. It should be clear that, by uniqueness of the optimal control law, the modified controller is not optimal. However, we will see that we can approximate the optimal costs arbitrarily well by sufficiently scaling down the time-varying part of the feedback. This result still concerns the behavior of the controlled system when it is controlled on the basis of a closed-loop unfalsified model. Recall from the Introduction that this behavior may be arbitrarily bad without the modification.

Theorem 3.10. Let the system (3) be controlled on the basis of the optimal control law $f(A_0, b_0)$ and an unspecified bounded time-varying feedback $\lambda_k$ as

$$u_k = f(A_0, b_0)\varphi_k + \alpha\lambda_k\varphi_k, \qquad \alpha > 0. \qquad (52)$$

Denote the real costs incurred by using (52) by $J_\alpha$. The optimal costs (10) may be approximated arbitrarily well by choosing $\alpha$ sufficiently small, that is, $\lim_{\alpha\downarrow 0} J_\alpha = J_0 = J^*$.

Proof. Let $M_0 := A_0 + b_0 f(A_0, b_0)$ denote the closed-loop matrix with respect to the optimal feedback $f(A_0, b_0)$ and let $J^*$ denote the optimal value of (6). We can write

$$\begin{aligned}
J^* &= \sum_{k=0}^{\infty} \varphi_k^T Q\varphi_k + ru_k^2 = \sum_{k=0}^{\infty} \varphi_k^T Q\varphi_k + r(f(A_0, b_0)\varphi_k)^2\\
&= \sum_{k=0}^{\infty} \varphi_k^T[Q + rf(A_0, b_0)^T f(A_0, b_0)]\varphi_k\\
&= \varphi_0^T\left(\sum_{k=0}^{\infty}(M_0^k)^T(Q + rf(A_0, b_0)^T f(A_0, b_0))M_0^k\right)\varphi_0. \qquad (53)
\end{aligned}$$

The value of the cost function (6) when using the input function (52) is

$$\begin{aligned}
J_\alpha &= \sum_{k=0}^{\infty} \varphi_k^T Q\varphi_k + ru_k^2 = \sum_{k=0}^{\infty} \varphi_k^T Q\varphi_k + r((f(A_0, b_0) + \alpha\lambda_k)\varphi_k)^2\\
&= \sum_{k=0}^{\infty} \varphi_k^T[Q + r(f(A_0, b_0) + \alpha\lambda_k)^T(f(A_0, b_0) + \alpha\lambda_k)]\varphi_k\\
&= \varphi_0^T\sum_{k=0}^{\infty}\left(\prod_{i=0}^{k-1}(M_0 + b_0\alpha\lambda_i)^T\right)[Q + r(f(A_0, b_0) + \alpha\lambda_k)^T(f(A_0, b_0) + \alpha\lambda_k)]\left(\prod_{i=0}^{k-1}(M_0 + b_0\alpha\lambda_i)\right)\varphi_0. \qquad (54)
\end{aligned}$$

For $\alpha$ sufficiently small the system (3), (52) is exponentially stable, so that the infinite sum in (54) converges absolutely. Therefore, if we let $\alpha$ tend to zero, we may interchange the limit and the summation. From that it follows that

$$\lim_{\alpha\downarrow 0} J_\alpha = J^*. \qquad (55) \quad \blacksquare$$
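The limit (55) can also be observed numerically. The toy sketch below (scalar system, truncated horizon, all names ours) computes the cost incurred under (52) with a switching $\lambda_k$ and compares it with $J^* = \varphi_0^T P_0 \varphi_0$ from (10); shrinking $\alpha$ drives the incurred cost toward $J^*$:

```python
def costs_with_probing(a, b, r, phi0, alpha, lams, K=500):
    """Truncated cost (54) for the scalar system x_{k+1} = a x_k + b u_k
    under u_k = (f + alpha * lam_k) x_k, together with the optimal cost
    J* = P phi0^2 from (10).  Toy illustration; names are ours."""
    P = 0.0                                   # scalar ARE (9) by iteration
    for _ in range(1000):
        P = a * a * P * r / (P * b * b + r) + 1.0
    f = -(b * P * a) / (b * P * b + r)        # scalar gain (8)
    J, x = 0.0, phi0
    for k in range(K):
        u = (f + alpha * lams[k % len(lams)]) * x
        J += x * x + r * u * u
        x = a * x + b * u
    return J, P * phi0 * phi0
```

By optimality of $f$, the incurred cost exceeds $J^*$ for every $\alpha > 0$, and the gap shrinks as $\alpha \downarrow 0$.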
4. Application to Adaptive LQ Control

The analysis presented thus far was nonadaptive. In this section we use the ideas developed in the previous sections to design an adaptive control scheme based on an LQ design and using a time-varying feedback on top of the usual certainty equivalent feedback. For the identification of the system parameters we use a standard projection algorithm.

To calculate the optimal feedback corresponding to the estimates we would have to solve the algebraic Riccati equation in each iteration, which is numerically unacceptable. Instead, we use the Riccati difference equation to approximate the solution of the algebraic equation. Since, due to the time variations in the feedback, we are going to identify $(A_0, b_0)$, we may hope that the solution of the difference equation converges to the positive semidefinite solution of the corresponding algebraic Riccati equation. This idea was also used in [ABP], [G], and [S].

Then there is the issue of stability. The time variations in the feedback should be sufficiently small to make sure that they do not compromise stability. Therefore, we computed an upper bound in terms of the system parameters and the solution of the corresponding algebraic Riccati equation, such that if the norm of the time-varying gain is smaller than this upper bound, stability is guaranteed. Since we do not know the true system, we use an estimate of the upper bound on the time-varying gain, based on the estimate of the true system. This upper bound is obtained adaptively from the solution of the Riccati difference equation. An interesting feature of the algorithm is that the Riccati difference equation is used both for approximating the certainty equivalent part of the feedback and for obtaining an upper bound on the time-varying gain.
4.1. Identification Algorithm

The identification procedure that we use is the orthogonal projection algorithm [GS]. We estimate the system parameters from the input/state/output representation (3). It can be verified that this yields the same estimates that we would obtain if we estimated the unknown parameters directly from (1) and then substituted them into the nonminimal representation (3), provided that we choose the initial estimates accordingly. Let $(\hat A_k, \hat b_k)$ be an estimate of $(A_0, b_0)$ at time $k$. The estimates are updated according to the orthogonal projection algorithm:
$$(\hat A_{k+1}, \hat b_{k+1}) = (\hat A_k, \hat b_k) + (\varphi_{k+1} - \hat A_k\varphi_k - \hat b_k u_k)\,\frac{(\varphi_k^T, u_k)}{\|(\varphi_k^T, u_k)\|^2}. \qquad (56)$$
It is easily seen that (56) updates the unknown parameters in $(\hat A_k, \hat b_k)$ only. Also, the claim that (56) is equivalent to applying the projection algorithm to the input/output representation follows readily by direct inspection of (56). Some properties of the orthogonal projection algorithm are recalled in Lemma 4.1.

Lemma 4.1. Consider the orthogonal projection algorithm (56). Assume that $\varphi_k$, $u_k$ satisfy (3). Then for all $k$:
1. $\|(\hat A_{k+1}, \hat b_{k+1}) - (A_0, b_0)\| \le \|(\hat A_k, \hat b_k) - (A_0, b_0)\|$;
2. $\lim_{k\to\infty}\|(\hat A_{k+m}, \hat b_{k+m}) - (\hat A_k, \hat b_k)\| = 0$ for all $m$.

Proof. See [GS]. ∎
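A minimal numerical sketch of one update (56) may clarify the mechanism; the function and variable names are ours, not the paper's. After the update the new estimate reproduces the latest data point exactly, and the distance to the true parameters is nonincreasing, matching Lemma 4.1, part 1. In the companion-form representation (3) the prediction error vanishes in the known rows, so the same formula automatically leaves the known parameters untouched.

```python
import numpy as np

def projection_update(A_hat, b_hat, phi, u, phi_next):
    """One step of the orthogonal projection algorithm (56).

    phi_next satisfies phi_next = A0 @ phi + b0 * u for the true (A0, b0).
    """
    regressor = np.concatenate([phi, [u]])         # (phi^T, u)
    error = phi_next - A_hat @ phi - b_hat * u     # prediction error
    # rank-one correction distributed over (A_hat, b_hat)
    correction = np.outer(error, regressor) / (regressor @ regressor)
    return A_hat + correction[:, :-1], b_hat + correction[:, -1]
```

With this update, $(\hat A_{k+1}, \hat b_{k+1})$ is the orthogonal projection of $(\hat A_k, \hat b_k)$ onto the affine set of models consistent with the latest data, which is exactly why the parameter error cannot grow.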
4.2. Solving the Algebraic Riccati Equation by Means of the Riccati Difference Equation

We now show that under mild conditions the solution of the Riccati difference equation converges to the positive semidefinite solution of the algebraic Riccati equation. The results also hold for multivariable systems and in a continuous-time setting. However, for simplicity we confine the analysis to the class of systems that we are considering in this paper, namely single-input discrete-time systems. The Riccati difference equation corresponding to the sequence of estimates $(\hat A_k, \hat b_k)$ is given by
$$\hat P_{k+1} = \hat A_k^T \hat P_k \hat A_k - \hat A_k^T \hat P_k \hat b_k (\hat b_k^T \hat P_k \hat b_k + r)^{-1} \hat b_k^T \hat P_k \hat A_k + c_0^T c_0, \qquad \hat P_0 \ge 0. \qquad (57)$$
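For a fixed controllable pair, iterating (57) is a standard way to approximate the positive semidefinite solution of the algebraic Riccati equation. A sketch with arbitrarily chosen data (all names and numbers are ours, for illustration only):

```python
import numpy as np

def riccati_step(P, A, b, c, r):
    """One step of the Riccati difference equation (57)."""
    v = A.T @ P @ b  # A^T P b
    return A.T @ P @ A - np.outer(v, v) / (b @ P @ b + r) + np.outer(c, c)

# Arbitrarily chosen controllable single-input pair.
A = np.array([[0.0, 1.0], [0.5, 0.3]])
b = np.array([0.0, 1.0])
c = np.array([1.0, 0.0])
r = 1.0

P = np.zeros((2, 2))  # P_0 >= 0
for _ in range(500):
    P = riccati_step(P, A, b, c, r)

# P is now (numerically) a fixed point of (57), i.e. the positive
# semidefinite ARE solution; the corresponding gain stabilizes the loop.
f = -(b @ P @ b + r) ** -1 * (b @ P @ A)
```

The fixed point of the iteration is precisely the algebraic Riccati equation, which is what the adaptive algorithm below exploits: one Riccati step per sample instead of one full ARE solve per sample.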
As a standing (technical) assumption for this section we take:

Assumption 4.2. The sequence of estimates $(\hat A_k, \hat b_k)$ is contained in a compact subset of the set of controllable pairs.

Assumption 4.2 is used to prove boundedness of the sequence $\{\hat P_k\}$. In the adaptive algorithm Assumption 4.2 is replaced by the assumption that the true system parameters belong to a known closed and convex subset of the set of controllable pairs; see Assumption 4.8. This assumption implies Assumption 4.2. Of
course Assumption 4.2, and Assumption 4.8 for that matter, are very much related to the notorious stabilizability problem in adaptive control; see [MP] and the references therein. We provide further comments on this issue in Remark 4.9. We first consider the following fictitious system:
$$\eta_{k+1} = \hat A_{N-1-k}\eta_k + \hat b_{N-1-k}u_k, \qquad (58)$$
and the cost criterion:
$$J^N(u) = \sum_{k=0}^{N-1} \eta_k^T Q \eta_k + r u_k^2 + \eta_N^T \hat P_0 \eta_N. \qquad (59)$$
The reason that we are interested in (58) and (59) is that $\hat P_N$ is just the optimal value function for the finite horizon problem defined by (58), (59). This observation is a direct consequence of standard dynamic programming. Next we prove that there exists a feedback strategy for the finite horizon problem such that the resulting costs are bounded independently of the horizon $N$. This then implies boundedness of the sequence $\hat P_k$. To be able to use the time-varying gain modification, it is necessary that the time-varying gain remain within the stability radius of the optimal closed-loop system. In Section 3.2 it was shown how this stability radius may be upper bounded through the solution of the Riccati equation; see (41), (42). From (41), (42) it follows that boundedness of $\hat P_k$ is essential, for without it the time-varying part of the feedback could vanish. See also (83c), (83d) in the adaptive algorithm.

Theorem 4.3. Consider the system (58) and the finite horizon cost criterion (59). Under Assumption 4.2, there exists a feedback $\tilde f_k$ that yields costs that are bounded independently of the horizon $N$.

Proof. Let $\tilde f(\hat A_{N-1-k}, \hat b_{N-1-k})$ be such that the spectrum of the closed-loop matrix
$$\hat M_{N-1-k} := \hat A_{N-1-k} + \hat b_{N-1-k}\tilde f(\hat A_{N-1-k}, \hat b_{N-1-k}) \qquad (60)$$
is constant and contained in the open unit disk. Because of Assumption 4.2 and the continuity of the pole-placement feedback there exists a compact set $C$ of stable matrices such that, for all $j$,
$$\hat M_j \in C \quad\text{and}\quad \lim_{j\to\infty}\|\hat M_{j+1} - \hat M_j\| = 0. \qquad (61)$$
To prove boundedness of the costs, consider the Lyapunov equation:
$$\hat M_j^T X_j \hat M_j - X_j = -I. \qquad (62)$$
Because of the continuity of the solution of (62) as a function of $\hat M_j$, it follows from (61) that there exists a compact set $D$ of positive matrices such that, for all $j$,
$$X_j \in D \quad\text{and}\quad \lim_{j\to\infty} X_{j+1} - X_j = 0. \qquad (63)$$
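For a Schur-stable matrix, equation (62) has the explicit convergent series solution $X_j = \sum_{k\ge 0}(\hat M_j^T)^k \hat M_j^k$. A minimal numerical sketch (the function name and the test matrix are our choices, not the paper's):

```python
import numpy as np

def lyapunov_solution(M, tol=1e-12):
    """Solve M^T X M - X = -I via the series X = sum_k (M^T)^k M^k.

    A sketch; assumes the spectral radius of M is strictly less than 1,
    so the series terms shrink geometrically and the loop terminates.
    """
    X = np.zeros_like(M)
    term = np.eye(M.shape[0])
    while np.linalg.norm(term) > tol:
        X = X + term
        term = M.T @ term @ M
    return X
```

Since the leading term of the series is the identity, every solution satisfies $X \ge I$, which is why the $X_j$ in (63) stay inside a compact set of positive matrices.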
The closed-loop system can be written as
$$\eta_{k+1} = \hat M_{N-1-k}\eta_k. \qquad (64)$$
It follows from (62) and (64) that
$$\eta_{k+1}^T X_{N-(k+1)}\eta_{k+1} = \eta_k^T \hat M_{N-1-k}^T X_{N-(k+1)} \hat M_{N-1-k}\eta_k = \eta_k^T [X_{N-(k+1)} - I]\eta_k$$
$$= -\eta_k^T\eta_k + \eta_k^T X_{N-k}\eta_k + \eta_k^T [X_{N-(k+1)} - X_{N-k}]\eta_k. \qquad (65)$$
Equation (65) implies that
$$\eta_k^T[I + X_{N-k} - X_{N-(k+1)}]\eta_k = \eta_k^T X_{N-k}\eta_k - \eta_{k+1}^T X_{N-(k+1)}\eta_{k+1}. \qquad (66)$$
From (63) we conclude that there exists $k_0 \ge 0$ such that for $N$ sufficiently large and $k \le N - k_0$,
$$-\tfrac12 I \le X_{N-1-k} - X_{N-k} \le \tfrac12 I. \qquad (67)$$
It follows from (66), (67) that if $0 \le k \le N - k_0$, then
$$\tfrac12 \eta_k^T\eta_k \le \eta_k^T X_{N-k}\eta_k - \eta_{k+1}^T X_{N-(k+1)}\eta_{k+1}. \qquad (68)$$
This implies that
$$\sum_{k=0}^{N-k_0} \eta_k^T\eta_k \le 2\eta_0^T X_N \eta_0. \qquad (69)$$
Therefore,
$$J^N(u) \le c_1 \eta_0^T X_N \eta_0 + c_2 \sum_{k=N+1-k_0}^{N-1} \eta_k^T\eta_k, \qquad (70)$$
where $c_1$ and $c_2$ are constants that do not depend on the horizon $N$. Since $\hat M_j$ is bounded, $X_j$ is bounded; the summation in (70) is over a finite number of steps, and by (69) $\eta_{N-k_0}$ is bounded, so these last $k_0 - 1$ steps also give a bounded contribution to the costs. ∎

Corollary 4.4. The sequence $\hat P_k$ is bounded.
Proof. This is a direct consequence of the fact that by Theorem 4.3 there exists a (suboptimal) feedback that yields finite costs. ∎

Theorem 4.5. Assume that the sequence of estimates of (3), $(\hat A_k, \hat b_k)$, satisfies Assumption 4.2. Then there exist a subsequence $\{k_t\} \subset \mathbb{N}$, a controllable pair $(A, b)$, and a positive semidefinite matrix $P$ such that for all $m \in \mathbb{N}$,
$$\lim_{t\to\infty} (\hat A_{k_t+m}, \hat b_{k_t+m}, \hat P_{k_t+m}) = (A, b, P), \qquad (71a)$$
$$P = A^T P A - A^T P b (b^T P b + r)^{-1} b^T P A + c_0^T c_0. \qquad (71b)$$

Proof. Let $\{k_s\} \subset \mathbb{N}$ be a subsequence along which $(\hat A_k, \hat b_k, \hat P_k)$ converges, say
$$\lim_{s\to\infty}(\hat A_{k_s}, \hat b_{k_s}, \hat P_{k_s}) = (A, b, P). \qquad (72)$$
Since by Corollary 4.4 $\hat P_k$ is bounded, we can always find such a subsequence. By Assumption 4.2 we also have
$$\lim_{s\to\infty}(\hat A_{k_s+m}, \hat b_{k_s+m}) = (A, b). \qquad (73)$$
Therefore the sequence $\hat P_{k_s+m}$ converges. Namely, if we define
$$P_{m+1} = A^T P_m A - A^T P_m b (b^T P_m b + r)^{-1} b^T P_m A + c_0^T c_0, \qquad P_0 = P, \qquad (74)$$
then
$$\lim_{s\to\infty}\hat P_{k_s+m} = P_m. \qquad (75)$$
Since $(A, b)$ is controllable and $\hat P_k \ge 0$, it follows from (74) that
$$\lim_{m\to\infty} P_m = P. \qquad (76)$$
Combining (73), (75), and (76), we conclude that $(A, b, P)$ is a limit point of $(\hat A_k, \hat b_k, \hat P_k)$:
$$\lim_{t\to\infty}(\hat A_{k_t}, \hat b_{k_t}, \hat P_{k_t}) = (A, b, P). \qquad (77)$$
By Lemma 4.1, part 2, and the fact that $P$ is a stationary point of (74), it follows that
$$\lim_{t\to\infty}(\hat A_{k_t+m}, \hat b_{k_t+m}, \hat P_{k_t+m}) = (A, b, P). \qquad (78)$$
This concludes the proof. ∎
The last result of this subsection is that if $(\hat A_k, \hat b_k)$ actually converges to $(A_0, b_0)$, then $\hat P_k$ converges to the positive semidefinite solution of the algebraic Riccati equation associated with $(A_0, b_0)$.

Theorem 4.6. Assume that $\lim_{k\to\infty}(\hat A_k, \hat b_k) = (A_0, b_0)$. Then $\lim_{k\to\infty}\hat P_k = P_0$, where $P_0$ is the positive semidefinite solution of the algebraic Riccati equation
$$P_0 - A_0^T P_0 A_0 + A_0^T P_0 b_0 (b_0^T P_0 b_0 + r)^{-1} b_0^T P_0 A_0 - c_0^T c_0 = 0.$$

Proof. By Corollary 4.4 the sequence $\hat P_k$ is bounded, and since $(\hat A_k, \hat b_k)$ converges to $(A_0, b_0)$, the recursion (57) for $\hat P_k$ may be written as
$$\hat P_{k+1} = A_0^T \hat P_k A_0 - A_0^T \hat P_k b_0 (b_0^T \hat P_k b_0 + r)^{-1} b_0^T \hat P_k A_0 + c_0^T c_0 + E_k, \qquad (79)$$
where $E_k$ is an asymptotically vanishing term. It is not difficult to check [P2], [P5] that by linearizing the Riccati difference equation (79) around $P_0$ we get
$$\hat P_{k+1} - P_0 = (A_0 + b_0 f(A_0, b_0))^T (\hat P_k - P_0)(A_0 + b_0 f(A_0, b_0)) + E_k + \text{h.o.t.} \qquad (80)$$
Since $A_0 + b_0 f(A_0, b_0)$ is asymptotically stable and since $\lim_{k\to\infty} E_k = 0$, we conclude from (80) that (57) is locally asymptotically stable. By Theorem 4.5, $P_0$ is a limit point of $\hat P_k$. Combining these two facts yields that
$$\lim_{k\to\infty}\hat P_k = P_0. \qquad (81)$$
∎
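Theorem 4.6 can be illustrated numerically: drive the Riccati difference equation with a sequence of estimates converging to the true parameters and compare the result with the fixed point for the true parameters. The second-order system and the geometrically vanishing perturbation below are our choices, for illustration only:

```python
import numpy as np

def riccati_step(P, A, b, c, r):
    """One step of the Riccati difference equation (57)."""
    v = A.T @ P @ b
    return A.T @ P @ A - np.outer(v, v) / (b @ P @ b + r) + np.outer(c, c)

# True parameters (arbitrarily chosen, controllable and detectable).
A0 = np.array([[0.0, 1.0], [0.5, 0.3]])
b0 = np.array([0.0, 1.0])
c0 = np.array([1.0, 0.0])
r = 1.0

# P0: fixed point of (57) for the true parameters (the ARE solution).
P0 = np.zeros((2, 2))
for _ in range(500):
    P0 = riccati_step(P0, A0, b0, c0, r)

# RDE driven by estimates (A_k, b_k) -> (A0, b0); the vanishing
# estimation error plays the role of E_k in (79).
P = np.zeros((2, 2))
for k in range(500):
    Ak = A0 + 0.8 ** k * np.array([[0.1, 0.0], [0.0, 0.1]])
    bk = b0 + 0.8 ** k * np.array([0.05, -0.05])
    P = riccati_step(P, Ak, bk, c0, r)
```

After the loop, `P` agrees with `P0` to within numerical precision, as Theorem 4.6 predicts.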
Remark 4.7. Similar results may be found in [S]; however, the proofs are not provided there. Some of the properties discussed in this section could also be derived, mutatis mutandis, using the results in Section 5 of [AM]. The analysis there is
based on the filter Riccati equation and uses the concept of uniform detectability for time-varying systems. In fact, Theorem 4.3 implies that the dual of our sequence of estimates is uniformly detectable. See also Theorem 3.4 of [CG2].

4.3. The Adaptive Control Algorithm

In this section we propose the adaptive algorithm. To be able to use Theorem 4.5, we assume that we have some prior knowledge about the true system. The following assumption is made for technical reasons. It ensures that all estimates are uniformly controllable, i.e., that all limit points are controllable. See also the comment just below Assumption 4.2.

Assumption 4.8. The true parameters $(A_0, b_0)$ of the nonminimal representation (3) belong to a known convex and closed set $\Omega$ contained in the set of controllable models.

Remark 4.9. Assumption 4.8 may be replaced by the much weaker assumption that the true system is controllable. However, to guarantee controllability of the estimated parameters, uniformly in time, the identification and/or the design scheme would require nontrivial modifications. Alternatively, noncontrollable models may be avoided by forcing the algorithm to identify the true system, which by assumption is controllable, by injecting explicit probing signals. Avoiding noncontrollable models without using explicit excitation is a problem in itself. It has attracted ample attention in the literature. An early treatment was given in [L1]. Later contributions are [KS1], [P4], [P7], [P1], and [L2]. See also Chapter 7 of [MP]. For the convenience of the reader we briefly indicate how one such modification, proposed in [P4] and [P7], may be incorporated into our scheme. It is based on using an alternative sequence of feedback gains as soon as the estimates get too close to the uncontrollable region. The tolerance according to which the estimates may approach the uncontrollable region is decreased every time it is violated.
The sequence of alternative controller gains is such that correct identification is guaranteed provided it is used infinitely often. However, this would, by assumption, lead to a controllable pair, contradicting the fact that the alternative sequence of controller gains is invoked infinitely often only if the sequence of estimates approaches the uncontrollable region arbitrarily closely. Consequently, the alternative sequence of controller gains will be used only a finite number of times, after which the estimates are guaranteed to be bounded away from the noncontrollable region. In [KS1] another modification was proposed that is based on simultaneous estimation of open-loop system parameters and controller parameters. This is a very elegant approach; however, since it was tailored to be used in conjunction with a pole placement design, it is not straightforward to combine it with our LQ control objective. The above methods are based on modification of the feedback design. A method that modifies the estimates themselves is presented in [L2]. All methods that we know of carry some ad hoc elements in them and do not really solve the fundamental difficulty that standard identification algorithms have
no preference for controllable models. In fact, most identification methods are based on the Euclidean, homogeneous structure of the parameter space. As a consequence, parameter values such as, e.g., $(a_1, b_1) = (1, 0.1)$ and $(a_2, b_2) = (1, -0.1)$ are considered to be close, whereas from a controller design point of view they should be regarded as rather far apart. An attempt to investigate this difficulty in a more fundamental way may be found in [P1]. We have decided not to incorporate any of the above modifications, since this would lead to a much more complicated and therefore less transparent algorithm. We have chosen to concentrate on the main issue of the paper and simply to assume uniform controllability, keeping in mind that in principle this assumption may be relaxed.

In the algorithm presented below, the estimates of $(A_0, b_0)$ are denoted by $(\hat A_k, \hat b_k)$ and are generated by the projection algorithm. The time-varying part of the feedback is based on $\tilde l_1, \tilde l_2, \tilde l_3$, which are assumed not to satisfy the same linear affine relation and to have norm one. We also write
$$\hat H_k := \begin{bmatrix} c_0 \\ \sqrt{\tfrac12 r}\, f(\hat A_k, \hat b_k) \end{bmatrix} \qquad (82)$$
in the time-varying part of the control law. Notice that (82) is obtained from (38), (40) by taking $d = 2$. The solution of the Riccati difference equation (57) is denoted by $\hat P_k$; the estimate for the upper bound of the stability radius $r(\hat M_k, \hat b_k, \hat H_k)$, where $\hat M_k = \hat A_k + \hat b_k f(\hat A_k, \hat b_k)$, is denoted by $\hat r(\hat A_k, \hat b_k, \hat H_k)$, or $\hat r_k$ for short. Finally, $a$ is a scaling factor: $0 < a < 1$.

Algorithm 4.10.
Initialization: $\hat P_0 \ge 0$, $(\hat A_0, \hat b_0) \in \Omega$.
Recursion:
$$(\hat A_{k+1}, \hat b_{k+1}) = (\hat A_k, \hat b_k) + (\varphi_{k+1} - \hat A_k\varphi_k - \hat b_k u_k)\,\frac{[\varphi_k^T, u_k]}{\|[\varphi_k^T, u_k]\|^2}, \qquad (83a)$$
$$\hat P_k = \hat A_{k-1}^T \hat P_{k-1}\hat A_{k-1} - \hat A_{k-1}^T \hat P_{k-1}\hat b_{k-1}(\hat b_{k-1}^T \hat P_{k-1}\hat b_{k-1} + r)^{-1}\hat b_{k-1}^T \hat P_{k-1}\hat A_{k-1} + c_0^T c_0, \qquad (83b)$$
$$\hat r_k = \frac{1}{\sqrt{\hat b_k^T \hat P_k \hat b_k + 2r}}, \qquad (83c)$$
$$u_k = -(\hat b_k^T \hat P_k \hat b_k + r)^{-1}\hat b_k^T \hat P_k \hat A_k \varphi_k + a\,\hat r_k\, \tilde l_{s(k)} \hat H_k \varphi_k. \qquad (83d)$$
The behavior of the adaptively controlled system is characterized by the following theorem.

Theorem 4.11. Assume that the system (1) is controllable and let it be controlled by Algorithm 4.10. Assume that for all $k$, $\varphi_k \neq 0$. Then:
1. $\lim_{k\to\infty}(\hat A_k, \hat b_k) = (A_0, b_0)$.
2. $\lim_{k\to\infty}\hat P_k = P_0$.
3. $\lim_{k\to\infty}\varphi_k = 0$, exponentially fast.
4. $\lim_{a\downarrow 0}\limsup_{N\to\infty}\dfrac{1}{\|\varphi_N\|^2}\Bigl(\sum_{k=N}^{\infty}\varphi_k^T Q\varphi_k + r u_k^2 - \varphi_N^T P_0 \varphi_N\Bigr) = 0$.
5. Consider the extended state vector $\psi_k := (\varphi_k, \hat A_k, \hat b_k, \hat P_k)$ and let $N \in \{0, 3(2n-1), 6(2n-1), \ldots\}$ be arbitrary. Suppose that $\psi_N = \psi$; then there exists an extended state trajectory $\psi'$ with $\psi'_0 = \psi$.
The first and the third parts of Theorem 4.11 state that the true system is identified and that the adaptively controlled system is exponentially stable. The fourth part expresses that the normalized asymptotic costs are arbitrarily close to the normalized optimal costs provided $a$ is sufficiently small. With respect to the fifth part we have the following remark.

Remark 4.12. Property 5 expresses that the adaptively controlled system is governed by equations with periodic coefficients. In fact, all equations have constant coefficients except for the periodic effect of the additional time-varying feedback gains. This implies that the ability of the algorithm to identify the optimal control gain is essentially preserved in time, including the speed with which adaptation takes place. If, for example, a sudden change in the true system parameters occurs long after the algorithm has stabilized, the adaptation that follows will be as fast as if we had initialized in the changed situation (give or take a few time steps to account for the period $3(2n-1)$). In the next section this property is illustrated with an appropriate simulation. Algorithms that rely on periodic external probing signals share the feature of being able to recover from sudden changes in the system dynamics. As described in the Introduction, however, such algorithms do not allow regulation to zero. On the other hand, vanishing probing signals such as proposed in [CG1], [CDPD], and [G] lack the periodicity property. The later the change in dynamics occurs, the smaller the probing signal is, and the longer it takes before the algorithm has caught up with the new situation.

Remark 4.13. Algorithm 4.10 contains a scaling factor $a$. Theorem 4.11 characterizes its asymptotic behavior for values of $a$ between zero and one. The question arises how to choose $a$. From part 4 of Theorem 4.11 it can be concluded that with respect to asymptotic suboptimality $a$ should be chosen as small as possible.
On the other hand, intuitively it should be clear that small $a$ has the opposite effect on the speed of convergence: small $a$ will yield slow convergence of the sequence of estimates of the system parameters. Hence there is a tradeoff between suboptimality and speed of convergence. It would be desirable to have some clue as to how to deal with this tradeoff. Unfortunately this is not so easy. In [B] it was proposed to replace $a$ by a time-varying scaling parameter $a_k$. The time variations are driven by the prediction error. For large values of the prediction error $a_k$ is large, and $a_k$ converges to a small positive value $a$ as the prediction error approaches zero. The rationale behind this idea is that when the prediction error is large, most likely the certainty equivalence controller based on the estimates will be far from optimal. Therefore it makes sense to use the controls to speed up convergence. Once the
prediction error becomes small, it follows that the estimation error is also small, so that the corresponding controller will be close to optimal. Therefore the identification effort may be decreased. The resulting algorithm is such that initially, when there is little knowledge about the system parameters, the emphasis is on identification. Then gradually the emphasis shifts from identification to control. Of course, the choice of $a$, the limiting value of $a_k$, remains, but at least it has no influence on the speed of convergence. The reason that we did not incorporate this refinement in the present paper is that it has been worked out satisfactorily for the first-order case only.

For the proof of part 1 of Theorem 4.11 we use the following lemma.

Lemma 4.14. Asymptotically, the normalized prediction error is zero:
$$\lim_{k\to\infty}\frac{\varphi_{k+1} - (\hat A_k + \hat b_k[f(\hat A_k, \hat b_k) + a\hat r(\hat A_k, \hat b_k)\tilde l_{s(k)}\hat H_k])\varphi_k}{\|\varphi_k\|} = 0. \qquad (84)$$

Proof. By (56):
$$\varphi_{k+1} = (\hat A_{k+1} + \hat b_{k+1}[f(\hat A_k, \hat b_k) + a\hat r(\hat A_k, \hat b_k)\tilde l_{s(k)}\hat H_k])\varphi_k. \qquad (85)$$
Because $f$ and $\hat r$ are bounded on $\Omega$, we conclude from (85) and from Lemma 4.1, part 2, that
$$\lim_{k\to\infty}\frac{\varphi_{k+1} - (\hat A_k + \hat b_k[f(\hat A_k, \hat b_k) + a\hat r(\hat A_k, \hat b_k)\tilde l_{s(k)}\hat H_k])\varphi_k}{\|\varphi_k\|}$$
$$= \lim_{k\to\infty}\frac{((\hat A_{k+1} - \hat A_k) + (\hat b_{k+1} - \hat b_k)[f(\hat A_k, \hat b_k) + a\hat r(\hat A_k, \hat b_k)\tilde l_{s(k)}\hat H_k])\varphi_k}{\|\varphi_k\|} = 0. \qquad (86)$$
∎
Proof of Theorem 4.11.
1. It follows from Theorem 4.5 that there exists a subsequence $\{k_t\}$ of $\{0, 3(2n-1), 6(2n-1), \ldots\}$ (this is to make sure that along this sequence $\tilde l_{s(k)}$ will always be at the start of a new cycle) such that for all $m$,
$$\lim_{t\to\infty}\frac{\varphi_{k_t}}{\|\varphi_{k_t}\|} = \varphi^*,$$
$$\lim_{t\to\infty}(\hat A_{k_t+m}, \hat b_{k_t+m}) = (A, b) \quad\text{for some } (A, b) \in \Omega,$$
$$\lim_{t\to\infty}\hat P_{k_t+m} = P \quad\text{for some positive semidefinite matrix } P, \qquad (87)$$
$$\lim_{t\to\infty} f(\hat A_{k_t+m}, \hat b_{k_t+m}) = f(A, b),$$
$$\lim_{t\to\infty}\hat H_{k_t+m} = H,$$
$$\lim_{t\to\infty}\hat r_{k_t+m} = \hat r.$$
The idea is now to re-initialize the system "at $\infty$" in $\varphi^*$. Define the sequences $z_k$
and $\tilde z_k$ as follows:
$$z_{k+1} = (A_0 + b_0[f(A, b) + a\hat r(A, b)\tilde l_{s(k)} H])z_k, \qquad z_0 = \varphi^*,$$
$$\tilde z_{k+1} = (A + b[f(A, b) + a\hat r(A, b)\tilde l_{s(k)} H])\tilde z_k, \qquad \tilde z_0 = \varphi^*. \qquad (88)$$
It follows from (87) and Lemma 4.14 that for all $k$, $z_k = \tilde z_k$, so that
$$(A_0 + b_0[f(A, b) + a\hat r(A, b)\tilde l_{s(k)} H])z_k = (A + b[f(A, b) + a\hat r(A, b)\tilde l_{s(k)} H])z_k. \qquad (89)$$
Applying Theorem 3.6 to (89), we conclude that $(A, b) = (A_0, b_0)$. Now, from Lemma 4.1, part 1, we conclude $\lim_{k\to\infty}(\hat A_k, \hat b_k) = (A_0, b_0)$, and hence the statement.
2. This follows directly from Theorem 4.6.
3. The system controlled by Algorithm 4.10 can be described as follows:
$$\varphi_{k+1} = (A_0 + b_0[f(\hat A_k, \hat b_k) + a\hat r(\hat A_k, \hat b_k)\tilde l_{s(k)}\hat H_k])\varphi_k = (A_0 + b_0 f(A_0, b_0) + b_0[a\hat r(A_0, b_0)\tilde l_{s(k)} H_0 + \Delta_k])\varphi_k, \qquad (90)$$
where
$$\Delta_k := f(\hat A_k, \hat b_k) - f(A_0, b_0) + a\hat r(\hat A_k, \hat b_k)\tilde l_{s(k)}\hat H_k - a\hat r(A_0, b_0)\tilde l_{s(k)} H_0. \qquad (91)$$
It follows from the continuity of $f$, $\hat H$, and $\hat r$ on $\Omega$ and from Theorem 4.11, part 1, that $\Delta_k \to 0$. The result then follows from Theorem 3.8.
4. First observe that
$$\frac{\varphi_N^T P_0 \varphi_N}{\|\varphi_N\|^2} = \sum_{l=0}^{\infty}\frac{\varphi_N^T}{\|\varphi_N\|}\bigl((A_0 + b_0 f(A_0, b_0))^T\bigr)^l \bigl(Q + r f(A_0, b_0)^T f(A_0, b_0)\bigr)\bigl(A_0 + b_0 f(A_0, b_0)\bigr)^l \frac{\varphi_N}{\|\varphi_N\|}. \qquad (92)$$
Define $J_N^a$ by
$$J_N^a := \frac{1}{\|\varphi_N\|^2}\sum_{k=N}^{\infty}\varphi_k^T Q \varphi_k + r u_k^2; \qquad (93)$$
then, abbreviating the total feedback gain by $F_k := f(\hat A_k, \hat b_k) + a\hat r(\hat A_k, \hat b_k)\tilde l_{s(k)}\hat H_k$,
$$J_N^a = \frac{1}{\|\varphi_N\|^2}\sum_{k=N}^{\infty}\varphi_k^T Q\varphi_k + r(F_k\varphi_k)^2 = \frac{1}{\|\varphi_N\|^2}\sum_{k=N}^{\infty}\varphi_k^T(Q + r F_k^T F_k)\varphi_k$$
$$= \sum_{l=0}^{\infty}\frac{\varphi_N^T}{\|\varphi_N\|}\Bigl(\prod_{j=0}^{l-1}(A_0 + b_0 F_{N+j})\Bigr)^T (Q + r F_{N+l}^T F_{N+l})\Bigl(\prod_{j=0}^{l-1}(A_0 + b_0 F_{N+j})\Bigr)\frac{\varphi_N}{\|\varphi_N\|}. \qquad (94)$$
It follows from the exponential stability of the system (90), which was proved in part 3, that (94) is absolutely summable. Therefore we may interchange the summation, the limit for $N$, and the limit for $a$. As a consequence, the part in (94) without the state $\varphi_N$ converges to
$$\sum_{l=0}^{\infty}\bigl((A_0 + b_0 f(A_0, b_0))^T\bigr)^l\bigl(Q + r f(A_0, b_0)^T f(A_0, b_0)\bigr)\bigl(A_0 + b_0 f(A_0, b_0)\bigr)^l. \qquad (95)$$
Comparing (92) and (95) yields the result.
5. First notice that $(\hat A_N, \hat b_N) \in \Omega$ and $\hat P_N \ge 0$, so $(\hat A_N, \hat b_N)$, $\hat P_N$ are admitted as an initialization of the algorithm. Furthermore, there are no restrictions on $\varphi_0$ in (5). The statement now follows directly from the fact that (1) and (83a)-(83c) are time-invariant and that in (83d) $\tilde l_{s(k+3(2n-1))} = \tilde l_{s(k)}$. ∎

5. Simulation Example

In this section we present computer simulations that illustrate Theorem 4.11. Consider the system
$$y_{k+2} - y_{k+1} - 2y_k = u_{k+1} + 1.5u_k \qquad (96)$$
controlled by Algorithm 4.10. The underlying cost criterion is chosen as
$$J(u, \varphi_0) = \sum_{i=0}^{\infty} y_i^2 + u_i^2,$$
where $\varphi_0 = [y_0, y_{-1}, u_{-1}]^T$ and $u$ is the sequence of inputs. The algorithm is initialized as follows:
Initial input and output: $[y_0, y_{-1}, u_{-1}] = [1, 1, 1]$.
Initial estimates: $[\hat a_1, \hat a_0] = [2, 1]$, $[\hat b_1, \hat b_0] = [1, 2]$.
Riccati difference equation:
$$\hat P_0 = \begin{bmatrix} 3.25 & 1.06 & 2.12 \\ 1.06 & 0.53 & 1.05 \\ 2.12 & 1.05 & 2.10 \end{bmatrix}. \qquad (97)$$
Fig. 1. Algorithm 4.10 with modified feedback for $a = 0.1$, $a = 0.5$, and $a = 0.9$.
Notice that $\hat P_0$ corresponds to the solution of the algebraic Riccati equation for the initial parameter estimates. We have first simulated the controlled system for various values of the scaling parameter $a$ and measured the distance $\|[A_0, b_0] - [\hat A_k, \hat b_k]\|$ between the true and the estimated system parameters and the distance $\|f_0 - \hat f_k\|$ between the optimal feedback $f_0$ and the current certainty equivalent part of the feedback $\hat f_k$. We also simulated the case that a standard certainty equivalent feedback is used. In Fig. 1 we plotted both $\|[A_0, b_0] - [\hat A_k, \hat b_k]\|$ and $\|f_0 - \hat f_k\|$ on a logarithmic scale for the case that the modified feedback is used with various values of $a$:
Fig. 2. Algorithm 4.10 with standard feedback: $a = 0$.
$a = 0.1$, $a = 0.5$, and $a = 0.9$. In Fig. 2 we plotted these quantities for the case that a standard certainty equivalent feedback is used ($a = 0$). We observe that in Fig. 1 the estimates converge to the true system parameters. Also, the certainty equivalent part of the feedback approaches the optimal control law, and the variations in $\|f_0 - \hat f_k\|$ are also decreasing (this is less apparent due to the logarithmic scale). For both quantities the convergence appears to be exponential in these examples. In Fig. 2, depicting the pure certainty equivalence case, we see that the distance between the estimates and the true system parameters converges to a strictly positive constant. In view of Lemma 4.14 the only thing that we can conclude is that the sequence of estimates, if at all convergent, converges to a closed-loop unfalsified model that is not the true system. In [P6] it was proven that the only closed-loop unfalsified model that generates the optimal inputs is in fact the true system. As a consequence, the certainty equivalent feedback is bounded away from the optimal control law in this example. This is also nicely illustrated in Fig. 2.

We next investigate how much the time variations in the control law add to the costs. In order to illustrate this, we evaluated
$$J_a = J((f_0 + a\hat r(A_0, b_0)\tilde l_{s(k)} H_0)\varphi_k, \varphi_0) \qquad (98)$$
as defined in Section 3.3 for several values of $a$. The result is shown in Fig. 3, in which two graphs are drawn: the lower graph representing the optimal costs $J_0$, the upper graph representing the suboptimal costs $J_a$. Clearly, $\lim_{a\downarrow 0} J_a = J_0$ in the example. We conclude from Figs. 1 and 3 that there is a tradeoff between the rate of convergence of the parameter estimates and the extra costs incurred by the modification of the certainty equivalent feedback, as could be expected. See also Remark 4.13. Finally, we illustrate the property that the adaptive algorithm can easily recover from sudden parameter changes. Here we start with the same situation as we did for Fig. 1 with $a = 0.9$; however, at some time instant $N$ we suddenly change the true system into
$$y_{k+2} - y_{k+1} - 2y_k = 3u_{k+1} + 2u_k. \qquad (99)$$
Fig. 3. Suboptimal costs as a function of $a$ (solid line), and optimal costs (dotted line).
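The kind of cost comparison shown in Fig. 3 can be sketched numerically: compute $f_0$ from the Riccati fixed point, add the scaled time-varying term of (98), and sum the incurred costs. The system matrices, direction vectors, and horizon below are our choices, not those of the simulation in the paper:

```python
import numpy as np

def riccati_step(P, A, b, c, r):
    """One step of the Riccati difference equation (57)."""
    v = A.T @ P @ b
    return A.T @ P @ A - np.outer(v, v) / (b @ P @ b + r) + np.outer(c, c)

# Arbitrarily chosen true system (not the example (96) from the paper).
A0 = np.array([[0.0, 1.0], [0.5, 0.3]])
b0 = np.array([0.0, 1.0])
c0 = np.array([1.0, 0.0])
r = 1.0

# Optimal P0 and gain f0 via the Riccati difference equation.
P0 = np.zeros((2, 2))
for _ in range(500):
    P0 = riccati_step(P0, A0, b0, c0, r)
f0 = -(b0 @ P0 @ b0 + r) ** -1 * (b0 @ P0 @ A0)
r_hat = 1.0 / np.sqrt(b0 @ P0 @ b0 + 2.0 * r)   # (83c) for the true system
H0 = np.vstack([c0, np.sqrt(r / 2.0) * f0])     # (82) with d = 2
ls = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
      np.array([1.0, 1.0]) / np.sqrt(2.0)]      # norm-one directions

def cost(a, phi0, steps=2000):
    """Evaluate (98): costs of f0 plus the time-varying term scaled by a."""
    phi, J, n = phi0.copy(), 0.0, len(phi0)
    for k in range(steps):
        u = f0 @ phi + a * r_hat * (ls[(k // (2 * n - 1)) % 3] @ H0) @ phi
        J += (c0 @ phi) ** 2 + r * u ** 2
        phi = A0 @ phi + b0 * u
    return J
```

For $a = 0$ this returns the optimal costs $\varphi_0^T P_0 \varphi_0$; for small $a > 0$ the costs exceed the optimum by a margin that shrinks as $a \downarrow 0$, mirroring the two graphs of Fig. 3.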
So the sequence $(y_k, u_k)$ satisfies (96) for $k < N$ and (99) for $k \ge N$. We plotted the distance to the true system parameters and the distance of the certainty equivalent part of the feedback to the optimal gain as functions of time in Fig. 4 for $N = 1000$ and $N = 3000$. It is clear from Fig. 4 that the algorithm is not only able to re-adapt to the new situation, but that the speed with which it does so does not depend on time, thus visualizing the result of Theorem 4.11, part 5.
Fig. 4. Algorithm 4.10 when the to-be-controlled system parameters suddenly change.
6. Conclusions

For the class of single-input/state systems we proposed an almost optimal LQ controller. On the basis of this controller we designed an adaptive algorithm. The crucial property of this algorithm is that asymptotically the value of the LQ cost criterion approaches the optimal costs by choosing a design parameter sufficiently small. At the basis of the analysis lies the fact that the existence of a model that explains the input/output sequence but that is different from the true system implies that the inputs can be looked upon as the result of a constant state feedback, and hence the observed behavior is essentially linear and time-invariant. The construction of the controller is such that time-invariant behavior is not possible, and therefore the true system parameters can be identified. Another approach could be to use a small nonlinear modification of the certainty equivalent control law. This could be an issue of interest for further research. We used the fact that the solution of the algebraic Riccati equation associated with the true system may be obtained as the limit of the solution of the time-varying Riccati difference equation corresponding to the estimated models. Extensions to MIMO systems and continuous time have been investigated in [D].

Acknowledgments. The authors thank A. A. Stoorvogel, S. Weiland, and H. J. Zwart for helpful discussions and D. Hinrichsen for providing some of the references.

References

[AM] B. D. O. Anderson and J. B. Moore. Detectability and stabilizability of time-varying discrete-time linear systems. SIAM J. Control Optim., 19:20-32, 1981.
[ABP] K. Arent, Y. Boers, and J. W. Polderman. Almost optimal adaptive LQ control: observed state case. In Proc. 34th IEEE Conf. on Decision and Control, pages 2328-2333, New Orleans, LA, 1995.
[AMP] K. Arent, I. M. Y. Mareels, and J. W. Polderman. The pole zero cancellation problem in adaptive control: a solution for minimum phase systems by approximate models. Technical Report Memorandum 1175, Dept. of Applied Mathematics, University of Twente, 1993.
[B] Y. Boers. Adaptieve LQ regelaars met gesloten lus excitatie [Adaptive LQ controllers with closed-loop excitation]. Technical Report, Dept. Appl. Math., University of Twente, 1994. Master's thesis (in Dutch).
[BV] V. Borkar and P. Varaiya. Adaptive control of Markov chains, I: Finite parameter set. IEEE Trans. Automat. Control, 24:953-957, 1979.
[C] M. C. Campi. The problem of pole-zero cancellation in transfer function identification and application to adaptive stabilization. Automatica, 32:849-857, 1996.
[CDPD] H. F. Chen, T. E. Duncan, and B. Pasik-Duncan. Stochastic adaptive control for continuous-time linear systems with quadratic cost. J. Appl. Math. Optim., 34:113-138, 1996.
[CG1] H. F. Chen and L. Guo. Optimal adaptive control and consistent parameter estimate for ARMAX model with quadratic cost. SIAM J. Control Optim., 25:845-867, 1987.
[CG2] H. F. Chen and L. Guo. Identification and Stochastic Adaptive Control. Birkhäuser, Boston, MA, 1991.
[D] J. Daams. Almost optimal adaptive control. Technical Report, Dept. Appl. Math., University of Twente, 1996. Master's thesis.
[dB] P. T. de Bruin. Adaptieve LQ regeling met neutrale zekerheidsequivalentie algoritmes [Adaptive LQ control with neutral certainty equivalence algorithms]. Technical Report, Dept. Appl. Math., University of Twente, 1992. Master's thesis (in Dutch).
[GS] G. C. Goodwin and K. S. Sin. Adaptive Filtering Prediction and Control. Prentice-Hall, Englewood Cliffs, NJ, 1984.
[GT] G. H. Goodwin and E. K. Teoh. Persistency of excitation in the presence of possibly unbounded signals. IEEE Trans. Automat. Control, 30:595-597, 1985.
[G] L. Guo. Self-convergence of weighted least-squares with applications to stochastic adaptive control. IEEE Trans. Automat. Control, 1:79-89, 1996.
[HS] W. Hirsch and S. Smale. Differential Equations and Linear Algebra. Academic Press, New York, 1974.
[KS1] G. Kreisselmeier and M. C. Smith. Stable adaptive regulation of arbitrary nth-order plants. IEEE Trans. Automat. Control, 31:299-305, 1986.
[K] P. R. Kumar. Simultaneous identification and adaptive control of unknown systems over finite parameter sets. IEEE Trans. Automat. Control, 28:68-76, 1983.
[KB] P. R. Kumar and A. Becker. A new family of optimal adaptive controllers. IEEE Trans. Automat. Control, 27:137-146, 1982.
[KL] P. R. Kumar and W. Lin. Optimal adaptive controllers for unknown systems. Linear Algebra Appl., 27:765-774, 1982.
[KS2] H. Kwakernaak and R. Sivan. Linear Optimal Control Systems. Wiley, New York, 1972.
[L1] Ph. De Larminat. On the stabilizability condition in indirect adaptive control. Automatica, 20:793-795, 1984.
[LKS] W. Lin, P. R. Kumar, and T. I. Seidman. Will the self-tuning approach work for general cost criteria? Systems Control Lett., 6:77-86, 1985.
[L2] R. Lozano. Singularity-free adaptive pole-placement without resorting to persistency of excitation: detailed analysis for first order systems. Automatica, 28:27-33, 1992.
[MP] I. M. Y. Mareels and J. W. Polderman. Adaptive Systems: An Introduction. Birkhäuser, Boston, MA, 1996.
[P1] F. M. Pait. Achieving tunability in parameter-adaptive control. Ph.D. thesis, Yale University, 1993.
[P2] J. W. Polderman. A note on the structure of two subsets of the parameter space in adaptive control. Systems Control Lett., 7:25-34, 1986.
[P3] J. W. Polderman. On the necessity of identifying the true system in adaptive LQ control. Systems Control Lett., 8:87-91, 1986.
[P4] J. W. Polderman. Avoiding the non-admissible region of the parameter space in indirect adaptive control algorithms. In Proc. 8th Internat. Conf. on Analysis and Optimization of Systems, pages 822-829, Juan les Pins, France, 1988.
[P5] J. W. Polderman. Adaptive Control and Identification: Conflict or Conflux. CWI Tract 67. Centre for Mathematics and Computer Science, Amsterdam, 1989.
[P6] J. W. Polderman. Adaptive LQ control: conflict between identification and control. Linear Algebra Appl., 122-124:219-244, 1989.
[P7] J. W. Polderman. A state space approach to the problem of adaptive pole assignment. Math. Control Signals Systems, 2:71-94, 1989.
[PW] J. W. Polderman and J. C. Willems. Introduction to Mathematical Systems Theory: A Behavioral Approach. Volume 26 of Texts in Applied Mathematics. Springer-Verlag, New York, 1997.
[S] C. Samson. An adaptive LQ controller for nonminimum phase systems. Internat. J. Control, 35:1-28, 1982.
[vS] J. H. van Schuppen. Tuning of Gaussian stochastic control systems. IEEE Trans. Automat. Control, 39:2178-2190, 1994.