An Introduction to Applied Optimal Control
This is Volume 159 in MATHEMATICS IN SCIENCE AND ENGINEERING A Series of Monographs and Textbooks Edited by RICHARD BELLMAN, University of Southern California The complete listing of books in this series is available from the Publisher upon request.
An Introduction to Applied Optimal Control Greg Knowles Department of Mathematics Carnegie-Mellon University Pittsburgh, Pennsylvania
1981
ACADEMIC PRESS A Subsidiary of Harcourt Brace Jovanovich, Publishers
New York
London
Toronto
Sydney
San Francisco
COPYRIGHT © 1981, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by
ACADEMIC PRESS, INC. (LONDON) LTD. 24/28 Oval Road, London NWI 7DX
Library of Congress Cataloging in Publication Data
Knowles, Greg. An introduction to applied optimal control. (Mathematics in science and engineering)
Includes bibliographies and index.
1. Control theory. 2. Mathematical optimization. I. Title. II. Series. QA402.3.K56 629.8'312 81-7989 ISBN 0-12-416960-0 AACR2
PRINTED IN THE UNITED STATES OF AMERICA
81 82 83 84
9 8 7 6 5 4 3 2 1
Contents

Preface  ix

Chapter I  Examples of Control Systems; the Control Problem
    General Form of the Control Problem  5

Chapter II  The General Linear Time Optimal Problem
    1. Introduction  9
    2. Applications of the Maximum Principle  13
    3. Normal Systems-Uniqueness of the Optimal Control  17
    4. Further Examples of Time Optimal Control  20
    5. Numerical Computation of the Switching Times  29
    References  34

Chapter III  The Pontryagin Maximum Principle
    1. The Maximum Principle  35
    2. Classical Calculus of Variations  40
    3. More Examples of the Maximum Principle  44
    References  57

Chapter IV  The General Maximum Principle; Control Problems with Terminal Payoff
    1. Introduction  59
    2. Control Problems with Terminal Payoff  61
    3. Existence of Optimal Controls  64
    References  69

Chapter V  Numerical Solution of Two-Point Boundary-Value Problems
    1. Linear Two-Point Boundary-Value Problems  71
    2. Nonlinear Shooting Methods  74
    3. Nonlinear Shooting Methods: Implicit Boundary Conditions  77
    4. Quasi-Linearization  79
    5. Finite-Difference Schemes and Multiple Shooting  81
    6. Summary  84
    References  85

Chapter VI  Dynamic Programming and Differential Games
    1. Discrete Dynamic Programming  87
    2. Continuous Dynamic Programming-Control Problems  90
    3. Continuous Dynamic Programming-Differential Games  96
    References  105

Chapter VII  Controllability and Observability
    1. Controllable Linear Systems  107
    2. Observability  116
    References  121

Chapter VIII  State-Constrained Control Problems
    1. The Restricted Maximum Principle  123
    2. Jump Conditions  126
    3. The Continuous Wheat Trading Model without Shortselling  132
    4. Some Models in Production and Inventory Control  135
    References  141

Chapter IX  Optimal Control of Systems Governed by Partial Differential Equations
    1. Some Examples of Elliptic Control Problems  143
    2. Necessary and Sufficient Conditions for Optimality  146
    3. Boundary Control and Approximate Controllability of Elliptic Systems  152
    4. The Control of Systems Governed by Parabolic Equations  156
    5. Time Optimal Control  159
    6. Approximate Controllability for Parabolic Problems  161
    References  164

Appendix I  Geometry of Rⁿ  165

Appendix II  Existence of Time Optimal Controls and the Bang-Bang Principle  169

Appendix III  Stability  175

Index  179
Preface
This book began as the notes for a one-semester course at Carnegie-Mellon University. The aim of the course was to give an introduction to deterministic optimal control theory to senior-level undergraduates and first-year graduate students from the mathematics, engineering, and business schools. The only prerequisite for the course was a junior-level course in ordinary differential equations. Accordingly, the backgrounds of the students were widely dissimilar, and the common denominator was their interest in the applications of optimal control theory. In fact, one of the most popular aspects of the course was that students were able to study problems from areas that they would not normally cover in their respective syllabi. This text differs from the standard ones in that we have not attempted to prove the maximum principle, since this was beyond the background and interest of most of the students in the course. Instead we have tried to show its strengths and limitations through examples. In Chapter I we introduce the concept of optimal control by means of examples. In Chapter II necessary conditions for optimality for the linear time optimal control problem are derived geometrically, and illustrations are given. In Chapters III and IV we discuss the Pontryagin maximum principle, its relation to the calculus of variations, and its application to various problems in science, engineering, and business. Since the optimality conditions arising from the maximum principle can often be solved only numerically, numerical techniques are discussed in Chapter V. In Chapter VI the dynamic programming approach to the solution of optimal control problems and differential games is considered; in Chapter VII the controllability and observability of linear control systems are discussed, and in Chapter VIII the extension of the maximum principle to state-constrained
control problems is given. Finally, for more advanced students with a background in functional analysis, we consider in Chapter IX several problems in the control of systems governed by partial differential equations. This could serve as an introduction to research in this area. The support of my colleagues and students at Carnegie-Mellon University has been invaluable during this project; without it this text would almost certainly not have appeared.
Chapter I
Examples of Control Systems; the Control Problem
Example 1 Consider a mechanism, such as a crane or trolley, of mass m which moves along a horizontal track without friction. If x(t) represents the position at time t, we assume the motion of the trolley is governed by the law

    m ẍ(t) = u(t),    t > 0,    (1)

where u(t) is an external controlling force that we apply to the trolley (see Fig. 1). Assume that the initial position and velocity of the trolley are given as x(0) = x₀, ẋ(0) = y₀, respectively. Then we wish to choose a function u (which is naturally enough called a control function) to bring the trolley to rest at the origin in minimum time. Physical restrictions will usually require that the controlling force be bounded in magnitude, i.e., that

    |u(t)| ≤ M.    (2)

For convenience, suppose that m = M = 1, and rewrite Eq. (1) by setting x₁ = x, x₂ = ẋ, where x₁(t) and x₂(t) are now the position and velocity of the body at time t. Equation (1) then becomes
[Fig. 1: a trolley of mass m on a frictionless horizontal track, driven by the control force u.]
    ẋ₁(t) = x₂(t),    ẋ₂(t) = u(t),

or

    ẋ(t) = Ax(t) + bu(t),    x(0) = (x₀, y₀)ᵀ,    (3)

where

    A = [0 1; 0 0],    b = (0, 1)ᵀ,    x(t) = (x₁(t), x₂(t))ᵀ,
and the control problem is to find a function u, subject to (2), which brings the solution of (3), x(t), to the origin 0 in minimum time t. Any control that steers us to 0 in minimum time is called an optimal control. Intuitively, we should expect the optimal control to consist of first a period of maximum acceleration (u = +1) and then maximum braking (u = -1), or vice versa.

Example 2 (Bushaw [1]) A control surface on an aircraft is to be kept at rest at a fixed position. A wind gust displaces the surface from the desired position. We assume that if nothing were done, the control surface would behave as a damped harmonic oscillator. Thus if θ measures the deviation from the desired position, then the free motion of the surface satisfies the differential equation
    θ̈ + aθ̇ + ω²θ = 0

with initial conditions θ(0) = θ₀ and θ̇(0) = θ₀′. Here θ₀ is the displacement of the surface resulting from the wind gust and θ₀′ is the velocity imparted to the surface by the gust. On an aircraft the oscillation of the control surface cannot be permitted, and so we wish to design a servomechanism to apply a restoring torque and bring the surface back to rest in minimum time. The equation then becomes

    θ̈(t) + aθ̇(t) + ω²θ(t) = u(t),    θ(0) = θ₀,    θ̇(0) = θ₀′,    (4)
where u(t) represents the restoring torque at time t. Again we must suppose that |u(t)| ≤ C, where C is a constant, and by normalization
can be taken as 1. The problem is then to find such a function u so that the system will be brought to θ = 0, θ̇ = 0 in minimum time. It is clear that if θ₀ > 0 and θ₀′ > 0, then the torque should be directed initially in the direction of negative θ and should have the largest possible magnitude. Thus u(t) = -1 initially. However, if u(t) = -1 is applied for too long a time, we shall overshoot the desired terminal condition θ = 0, θ̇ = 0. Therefore at some point there should be a torque reversal to +1 in order to brake the system. The following questions occur:

(1) Is this strategy indeed optimal, and if so, when should the switch take place?
(2) Alternatively, is it better to remove the torque at some point, allow a small overshoot, and then apply +1?
(3) In this vein, we could ask whether a sequence of -1, +1, -1, +1, ... of n steps is the best, and if so, what is n and where do the switches occur?

Again we are led to controls that take on (only) the values ±1; such controls are called bang-bang controls. Note that as before, setting x₁ = θ and x₂ = θ̇, we can write the system equation (4) as
    ẋ₁ = x₂,                  x₁(0) = θ₀,
    ẋ₂ = -ω²x₁ - ax₂ + u,     x₂(0) = θ₀′,

or

    ẋ = Ax + bu,    x(0) = (θ₀, θ₀′)ᵀ,

where

    A = [0 1; -ω² -a],    b = (0, 1)ᵀ,

and u is chosen with |u(t)| ≤ 1 and to minimize

    C(u) = ∫₀^{t₁} 1 dt.
Example 3 (Isaacs [3]) Let x(t) be the amount of steel produced by a mill at time t. The amount produced at time t is to be allocated
to one of two uses: (1) production of consumer products; (2) investment. It is assumed that the steel allocated to investment is used to increase productive capacity, by using steel to produce new steel mills, transport facilities, or whatever. Let u(t), where 0 ≤ u(t) ≤ 1, denote the fraction of steel produced at time t that is allocated to investment. Then 1 - u(t) represents the fraction allocated to consumption. The assumption that the reinvested steel is used to increase the productive capacity can be written

    dx/dt = ku(t)x(t),    x(0) = C (the initial endowment),

where k is an appropriate constant (i.e., the rate of increase in production is proportional to the amount allocated to investment). The problem is to choose u(t) so as to maximize the total consumption over some fixed period of time T > 0. That is, we are to maximize
    ∫₀ᵀ (1 - u(t))x(t) dt.
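Since dx/dt = ku(t)x(t) integrates in closed form on any interval where u is constant, candidate strategies are easy to compare numerically. The sketch below (the parameter values k = 1, C = 1, T = 2 are illustrative assumptions, not taken from the text) evaluates the bang-bang policy "invest everything until time s, then consume everything":

```python
import math

def total_consumption(s, k=1.0, C=1.0, T=2.0):
    """Consumption for the policy u = 1 on [0, s] (invest all), u = 0 after.

    While u = 1, x grows as x(t) = C*exp(k*t) and nothing is consumed;
    after the switch, x stays at C*exp(k*s) and is consumed entirely,
    contributing x(s)*(T - s) to the integral of (1 - u(t)) x(t) dt.
    """
    return C * math.exp(k * s) * (T - s)

# Scan switch times on a grid: with these values the best switch is
# interior (s = T - 1/k = 1), so neither pure consumption nor pure
# investment over the whole horizon is best.
best_s = max((i / 100 * 2.0 for i in range(101)), key=total_consumption)
```

Differentiating e^{ks}(T - s) shows the maximizer is s = T - 1/k whenever kT > 1, in line with the invest-then-consume intuition raised below.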
For this problem, do we consume everything produced, or do we invest some at present to increase capacity now, so that we can produce more and hence consume more later? Do we follow a bang-bang procedure of first investing everything and then consuming everything?

Example 4 Moon-Landing Problem (Fleming and Rishel [2]) Consider the problem of a spacecraft attempting to make a soft landing on the moon using the minimum amount of fuel. For a simplified model, let m denote the mass, h the height, v the vertical velocity of the spacecraft above the moon, and u the thrust of the spacecraft's engine (m, h, v, and u are functions of time). Let M denote the mass of the spacecraft without fuel, h₀ the initial height, v₀ the initial velocity, F the initial amount of fuel, α the maximum thrust of the engine, k a constant, and g the gravity acceleration of the moon (considered constant). The equations of motion are

    ḣ = v,
    v̇ = -g + m⁻¹u,
    ṁ = -ku,
and the control u is restricted so that 0 ≤ u(t) ≤ α. The end conditions are

    h(0) = h₀,                h(t₁) = 0,
    v(0) = v₀,                v(t₁) = 0,
    m(0) - M - F = 0,

where t₁ is the time taken for touchdown. With x₁ = h, x₂ = v, x₃ = m,

    x(0) = (h₀, v₀, M + F)ᵀ,
    x(t₁) = (0, 0, anything)ᵀ,
this problem becomes, in matrix form,

    ẋ = (x₂, -g + x₃⁻¹u, -ku)ᵀ = f(t, x, u),    0 ≤ u(t) ≤ α,

and we wish to choose u so that -x₃(t₁) is a minimum (i.e., so that the mass remaining at touchdown is as large as possible). However, ẋ₃ = -ku, so the above becomes

    -x₃(t₁) = -M - F + k ∫₀^{t₁} u(τ) dτ,

and this is minimized at the same time as

    C(u) = ∫₀^{t₁} u(τ) dτ.
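A quick Euler integration of the model shows why a thrust program is needed at all: with the engine off the craft hits the surface with a large downward velocity, violating the soft-landing condition v(t₁) = 0. All numerical values below (h₀, v₀, M, F, g, k, α, the step size) are illustrative assumptions, not data from the text:

```python
def simulate(u_of_t, h0=10.0, v0=-2.0, M=1.0, F=1.0, g=1.6, k=0.1,
             alpha=4.0, dt=1e-3, t_max=20.0):
    """Euler-integrate h' = v, v' = -g + u/m, m' = -k u until touchdown.

    Returns (t, h, v, m) at the first step where the height crosses zero
    (or at t_max if the craft never lands).
    """
    t, h, v, m = 0.0, h0, v0, M + F
    while h > 0.0 and t < t_max:
        u = min(max(u_of_t(t), 0.0), alpha)   # respect 0 <= u <= alpha
        h += dt * v
        v += dt * (-g + u / m)
        m += dt * (-k * u)
        t += dt
    return t, h, v, m

# Free fall (u = 0): closed form gives touchdown at t = 2.5 with v = -6,
# i.e., a hard impact, so fuel must be spent to brake the descent.
t1, h1, v1, m1 = simulate(lambda t: 0.0)
```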
Note that although these problems come from seemingly completely different areas of applied mathematics, they all fit into the following general pattern.
GENERAL FORM OF THE CONTROL PROBLEM

(1) The state equation is

    ẋᵢ = fᵢ(t, x₁, ..., xₙ, u₁, ..., u_m),    i = 1, ..., n,

or in vector form

    ẋ = f(t, x, u),

where x = (x₁, ..., xₙ)ᵀ and f = (f₁, ..., fₙ)ᵀ.

(2) The initial point is x(0) = x₀ ∈ Rⁿ, and the final point that we wish to reach is x₁ ∈ Rⁿ. The final point x₁ is often called the target (point), and may or may not be given.

(3) The class Δ of admissible controls is the set of all those control functions u allowed by the physical limitations on the problem. (In Examples 1 and 2 we had Δ = {u : |u(t)| ≤ 1} and m = 1.) Usually we shall be given a compact, convex set Ω ⊂ Rᵐ (the restraint set) and we shall take

    Δ = {u = (u₁, ..., u_m) : uᵢ piecewise continuous and u(t) ∈ Ω}.
(4) The cost function or performance index quantitatively compares the effectiveness of various controllers. This is usually of the form

    C(u) = ∫₀^{t₁} f₀(t, x(t), u(t)) dt,

where f₀ is a given continuous real-valued function, and the above integral is to be interpreted as follows: we take a control u ∈ Δ, solve the state equations to obtain the corresponding x, calculate f₀ as a function of t, and perform the integration. If a target point is given (the so-called fixed-end-point problem), then t₁ must be such that x(t₁) = x₁. In particular, if f₀ ≡ 1, then C(u) = t₁, and we have the minimum-time problem. If a target point is not given (the free-end-point problem), then t₁ will be a fixed given time, and the integration is performed over the fixed interval [0, t₁].

The optimal control problem can now be formulated: Find an admissible control u* that minimizes the cost function, i.e., for which C(u*) ≤ C(u) for all u ∈ Δ. Such controls u* are called optimal controls. We shall first investigate in depth in Chapter II the linear (i.e., state equations linear in x and u) time optimal control problem, deriving a necessary condition for optimality known as Pontryagin's maximum principle [4].
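The recipe just described, i.e., pick u ∈ Δ, solve the state equation, then integrate f₀ along the resulting trajectory, is directly implementable. A minimal one-dimensional sketch (the particular dynamics f, integrand f₀, and control used in the example are illustrative assumptions):

```python
def cost(u, f, f0, x0, t1, n=10_000):
    """Approximate C(u) = integral of f0(t, x(t), u(t)) over [0, t1] by
    Euler-integrating the state equation x' = f(t, x, u) and accumulating
    a left-endpoint Riemann sum along the way."""
    dt = t1 / n
    x, t, c = x0, 0.0, 0.0
    for _ in range(n):
        c += f0(t, x, u(t)) * dt
        x += f(t, x, u(t)) * dt
        t += dt
    return c

# Example: x' = u, x(0) = 0, f0 = x^2 + u^2, with u(t) = 1 on [0, 1].
# Then x(t) = t and C(u) = integral of (t^2 + 1) = 4/3.
C = cost(lambda t: 1.0,
         lambda t, x, u: u,
         lambda t, x, u: x * x + u * u,
         0.0, 1.0)
```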
References

[1] D. Bushaw, Optimal discontinuous forcing terms, in "Contributions to the Theory of Non-linear Oscillations," pp. 29-52. Princeton Univ. Press, Princeton, New Jersey, 1958.
[2] W. Fleming and R. Rishel, "Deterministic and Stochastic Optimal Control." Springer-Verlag, Berlin and New York, 1975.
[3] R. Isaacs, "Differential Games." Wiley, New York, 1965.
[4] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, "The Mathematical Theory of Optimal Processes." Wiley (Interscience), New York, 1962.

For extensive bibliographies on control theory, the reader should consult

E. B. Lee and L. Markus, "Foundations of Optimal Control Theory." Wiley, New York, 1967.
M. Athans and P. Falb, "Optimal Control: An Introduction to the Theory and Its Applications." McGraw-Hill, New York, 1965.
Chapter II

The General Linear Time Optimal Problem

1. INTRODUCTION

Consider a control system described by the vector differential equation

    ẋ(t) = A(t)x(t) + B(t)u(t),    x(0) = x₀ ∈ Rⁿ,    (1)
where A(t) = (aᵢⱼ(t)) is an n × n matrix and B(t) = (bᵢⱼ(t)) is an n × m matrix, and we assume the elements of A(t), B(t) are integrable functions over any finite interval of time. The set of admissible controls will be

    Δ = {u = (u₁, ..., u_m)ᵀ : |uᵢ(t)| ≤ 1, i = 1, ..., m}.    (2)
A target point x₁ ∈ Rⁿ is given, and the control problem is to minimize the time t₁ for which x(t₁) = x₁.
From the theory of ordinary differential equations, the solution of (1) can be written

    x(t; u) = X(t)x₀ + X(t) ∫₀ᵗ X⁻¹(τ)B(τ)u(τ) dτ,    (3)
where X(t) is the principal matrix solution of the homogeneous system Ẋ(t) = A(t)X(t), X(0) = I, the identity matrix [3, 7]. At a given time t we define the attainable set at time t to be just the set of all those points x ∈ Rⁿ that we can reach in time t using all of our admissible controls, i.e.,

    𝒜(t) = {x(t; u) : u ∈ Δ}.

A knowledge of the attainable set at time t will give us a complete description of the points we can hit in time up to t (see Fig. 1). Further we can reformulate our control problem: Minimize the time t₁ for which
    x₁ ∈ 𝒜(t₁).
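For a constant-coefficient system, formula (3) can be checked directly against a step-by-step integration of (1). The sketch below does this for the trolley of Chapter I, where A = [0 1; 0 0] gives X(t) = [1 t; 0 1] explicitly; the particular admissible control u(t) = sin t is an arbitrary illustrative choice with |u| ≤ 1:

```python
import math

def x_via_formula(t, x0, u, n=20_000):
    """Evaluate (3): x(t; u) = X(t) x0 + X(t) * integral of X^{-1}(tau) B u(tau),
    with X(t) = [[1, t], [0, 1]] for A = [[0, 1], [0, 0]], B = (0, 1)^T."""
    dt = t / n
    i1 = i2 = 0.0                    # components of the integral term
    for j in range(n):
        tau = (j + 0.5) * dt         # midpoint rule
        i1 += -tau * u(tau) * dt     # X^{-1}(tau) B = (-tau, 1)^T
        i2 += u(tau) * dt
    a, b = x0[0] + i1, x0[1] + i2
    return (a + t * b, b)            # multiply by X(t)

def x_via_euler(t, x0, u, n=20_000):
    """Directly integrate x1' = x2, x2' = u from the same data."""
    dt = t / n
    x1, x2 = x0
    for j in range(n):
        x1, x2 = x1 + dt * x2, x2 + dt * u(j * dt)
    return (x1, x2)

p = x_via_formula(2.0, (1.0, 0.0), math.sin)
q = x_via_euler(2.0, (1.0, 0.0), math.sin)
```

The two results agree to within the discretization error, as (3) predicts.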
Before we proceed to characterize time optimal controls, we should really at this point ask ourselves: Does an optimal control exist? Otherwise we lay ourselves open to the possibility of constructing something that may not exist. This point should be stressed, for in a control problem we are usually trying to force a physical system to behave according to our requirements, and there is generally no reason to expect that nature will be sympathetic. For the linear time optimal control problem, it is shown in Appendix II that optimal controls exist if the target point x₁ can be hit in some time. This last condition brings us to the subject known as controllability, which will be further discussed in Chapter VII. From now on we shall assume that an optimal control u* exists, where t* is the minimum time, and we shall consider the problem of
[Fig. 1]
characterizing u*. Further we shall consider only the case of one control variable, i.e., m = 1; B is then n × 1. The general case is only notationally different. Define

    y(t) = X⁻¹(t)B(t),    i.e.,    y(t) = (y₁(t), ..., yₙ(t))ᵀ,

and

    y₁ = X⁻¹(t*)x₁ - x₀ ∈ Rⁿ,    i.e.,    y₁ = ∫₀^{t*} y(τ)u*(τ) dτ.

Set

    ℛ(t) = {∫₀ᵗ y(τ)u(τ) dτ : u ∈ Δ}
         = {(∫₀ᵗ y₁(τ)u(τ) dτ, ∫₀ᵗ y₂(τ)u(τ) dτ, ..., ∫₀ᵗ yₙ(τ)u(τ) dτ)ᵀ : u ∈ Δ}.

ℛ(t) is called the reachable set in time t, and we show in Appendix I that ℛ(t) is closed, bounded, and convex for all t ≥ 0. In fact

    𝒜(t) = X(t)[x₀ + ℛ(t)] = {X(t)[x₀ + y] : y ∈ ℛ(t)}

and

    ℛ(t) = X⁻¹(t)𝒜(t) - x₀.
Since x₁ ∈ 𝒜(t*), we have y₁ ∈ ℛ(t*), and clearly t* is the smallest time for which y₁ ∈ ℛ(t) (otherwise we would have x₁ ∈ 𝒜(t) for t < t*, a contradiction). If we think of the control problem in terms of these sets ℛ(t), t > 0, we know that at t = 0, ℛ(t) = {0}, and as time increases ℛ(t) grows and eventually intersects the point y₁. Our problem becomes: find the first time t* at which ℛ(t) intersects y₁. We should expect, as is shown in Hermes [3], that the first contact occurs when ℛ(t*) just touches y₁, or that y₁ must belong to the boundary of ℛ(t*) (see Fig. 2). The supporting hyperplane theorem (Appendix I) then implies that there is a nontrivial hyperplane H (with outward normal η, say) supporting ℛ(t*) at y₁. In other words

    ηᵀy₁ ≥ ηᵀy    for all y ∈ ℛ(t*),    η ≠ 0,
that is,

    ηᵀ(∫₀^{t*} y₁(τ)u*(τ) dτ, ..., ∫₀^{t*} yₙ(τ)u*(τ) dτ)ᵀ ≥ ηᵀ(∫₀^{t*} y₁(τ)u(τ) dτ, ..., ∫₀^{t*} yₙ(τ)u(τ) dτ)ᵀ

for all u ∈ Δ. Rearranging gives (with ηᵀ = (η₁, ..., ηₙ))

    ∫₀^{t*} (η₁y₁(τ) + η₂y₂(τ) + ... + ηₙyₙ(τ))(u*(τ) - u(τ)) dτ ≥ 0    for all u ∈ Δ,
[Fig. 2]
or

    ∫₀^{t*} (ηᵀy(τ))(u*(τ) - u(τ)) dτ ≥ 0.

This can happen only if u*(t) = sgn(ηᵀy(t)), t ∈ (0, t*). Here

    sgn(x) = +1 if x > 0,    -1 if x < 0,    undefined if x = 0.
Hence we have just proven the following theorem, a special case of the Pontryagin maximum principle [6].

Theorem 1 If u* is any optimal control transferring (1) from x₀ to x₁ in minimum time t*, then there exists a nonzero η ∈ Rⁿ such that

    u*(t) = sgn(ηᵀX⁻¹(t)B(t)),    t ∈ [0, t*].    (4)

In the general case, with u* = (u₁*, ..., u_m*)ᵀ, we can similarly show

    uᵢ*(t) = sgn([ηᵀX⁻¹(t)B(t)]ᵢ)    (5)

for i = 1, ..., m, t ∈ [0, t*], where [ ]ᵢ denotes the ith component of the vector. To simplify the notation we shall henceforth abbreviate (5) by

    u*(t) = sgn(ηᵀX⁻¹(t)B(t)),    t ∈ [0, t*].    (6)
We can place this theorem in the form in which it is usually stated by noting that the function ψ(t) = ηᵀX⁻¹(t) is the solution of the adjoint equation

    ψ̇(t) = -ψ(t)A(t)

with initial condition ψ(0) = ηᵀ. For, taking inverses and differentiating gives

    0 = d/dt (ηᵀ) = d/dt (ψ(t)X(t)) = ψ̇X + ψẊ = ψ̇X + ψAX = (ψ̇ + ψA)X,

and consequently ψ̇ = -ψA. Further, ψ(0) = ηᵀ since X(0) = I. Relations (4) and (6) then become

    u*(t) = sgn(ψ(t)B(t)),    0 ≤ t ≤ t*.    (7)

If we define the Hamiltonian by

    H(ψ, x, u) = ψ(Ax + Bu),

then the optimal control (7) satisfies

    H(ψ, x, u*) = max_{u ∈ Δ} H(ψ, x, u).

We shall return later to an in-depth discussion of the maximum principle, but first we shall see how the characterization (7) allows us to determine time optimal controls. One immediate observation is that u* will be bang-bang whenever ηᵀX⁻¹(t)B(t) is nonzero, and the times at which the control changes value are just the zeros of this function. These times are usually called the switching times of u*.
2. APPLICATIONS OF THE MAXIMUM PRINCIPLE

Consider the control problem in Example 1 in Chapter I. The state equation is

    ẋ(t) = [0 1; 0 0] x(t) + [0; 1] u(t),    x(0) = (x₀, y₀)ᵀ.    (1)

Hence

    A = [0 1; 0 0],    B = [0; 1],
and X(t) is the solution of

    Ẋ(t) = AX(t),    X(0) = I.

Then

    X(t) = e^{At} = I + Σ_{n≥1} Aⁿtⁿ/n!,

and, since A² = 0 here,

    X(t) = [1 t; 0 1],    X⁻¹(t) = e^{-At} = [1 -t; 0 1].

Consequently, from the maximum principle the optimal control has the form

    u*(t) = sgn(η₁t + η₂)    for some (η₁, η₂) ∈ R².    (2)
We see immediately that u* has only one switch between +1 and -1 (as we conjectured). Note that when u* = +1 we can solve (1) with initial condition x(0) = (x₀, y₀) and obtain

    dx₁/dt = x₂,    dx₂/dt = 1,    i.e.,    dx₂/dx₁ = 1/x₂,

so that

    x₁ = x₂²/2 + (x₀ - y₀²/2)

[Fig. 3]
(see Fig. 3). Note that x₂ is always increasing, so we traverse the path in the direction indicated. When u = -1, we have

    x₁ = -x₂²/2 + (x₀ + y₀²/2),

which is shown in Fig. 4. This time x₂ is decreasing. The maximum principle tells us that the optimal control is either u* = +1 and then u* = -1, or u* = -1 and then u* = +1. So, bearing in mind that we have to end up at 0 by traversing arcs of the types in Figs. 3 and 4, we can easily construct the optimal trajectory (see Fig. 5).
[Fig. 4]

[Fig. 5]
For example, if x₀ = 4 and y₀ = -1, then we must control first with u = -1, and when the two parabolas in Fig. 5 intersect we change to u = +1. To calculate the first switch t₁, we find the equation of the first parabola:

    x₁ = (9 - x₂²)/2,

or x₂ = -√(9 - 2x₁) (since x₂ < 0). Hence the point of intersection with x₁ = x₂²/2 occurs at

    9 - 2x₁ = 2x₁,    i.e.,    x₁ = 9/4,    x₂ = -√(9 - 9/2) = -3√2/2.

Now solving the state equations (with u = -1) in terms of t gives

    x₂(t) = -t + y₀ = -t - 1,

so at the switch -t₁ - 1 = -3√2/2, and

    t₁ = (3√2 - 2)/2 = 1.121...

is the switching time. Similarly, after changing to u = +1, we traverse

    ẋ₁ = x₂,    ẋ₂ = 1

from (9/4, -3√2/2) to (0, 0). Solving for x₂, we find x₂(t) = t - 3√2/2, and x₂ = 0 when t = 3√2/2 sec; that is, the total time taken is

    t* = 2(3√2/2) - 1 = 3√2 - 1 = 3.242... sec.

We can formalize this argument by defining the following curve, called the switching locus Γ:
{
- ~
for for
= +y~2 -"'X I
Xl
2::
Xl
<
°
°
(see Fig. 6). Then if we set if X2>W(XI)
or if (X I,X2)Er
if Xl = X2 = if X2 < W(XI)
or if (X I,X2)
°
E
r +,
3.
Normal Systems-Uniqueness of the Optimal Control
17
r..
..
------i-+---------fl~---_+-_+-_+_------
Fig. 6. Switching locus T = T _ u I' +.
the optimum trajectory starting from x(O) = (xo, Yo) is just the solution of z = r{!(z, i), z(O) = xo, z(O) = Yo, which terminates at z(t*) = 0, i(t*) = O. Given any point (z(t), i(t)) on this trajectory, the value of the optimal control at this time t is u*(t) = r{!(z(t), i(t».
With the equation written in this way, an observer placed on the trolley will be able to operate the thrust optimally knowing only his position and velocity at every time t. Controls of this type are called closed loop or feedback controls.
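The feedback law ψ can be checked numerically: simulating ẋ₁ = x₂, ẋ₂ = ψ(x₁, x₂) from the initial point (4, -1) of the worked example should steer the trolley (approximately) to the origin in roughly the optimal time t* = 3√2 - 1 ≈ 3.24 sec. The step size and stopping tolerance below are assumptions of this sketch, and the discrete-time switching introduces a small chattering error along Γ:

```python
import math

def psi(x1, x2):
    """Feedback synthesizer for the double integrator: -1 above the
    switching locus x2 = W(x1) = -sign(x1) * sqrt(2|x1|), +1 below."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    w = -math.copysign(math.sqrt(2.0 * abs(x1)), x1)
    return -1.0 if x2 > w else 1.0

def closed_loop(x1, x2, dt=1e-4, t_max=20.0):
    """Simulate x1' = x2, x2' = psi(x1, x2); stop near the origin."""
    t = 0.0
    while t < t_max and math.hypot(x1, x2) > 1e-2:
        u = psi(x1, x2)
        x1, x2 = x1 + dt * x2, x2 + dt * u
        t += dt
    return t, x1, x2

t_reach, xf, vf = closed_loop(4.0, -1.0)
```

Note that the observer never needs the switching time t₁ in advance; the sign of x₂ - W(x₁) supplies it automatically, which is exactly the point of a feedback control.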
3. NORMAL SYSTEMS-UNIQUENESS OF THE OPTIMAL CONTROL
For this section, we shall suppose A and B are constant matrices. Then X(t) = e^{At} and the maximum principle states that optimal controls have the form

    u*(t) = sgn(ηᵀe^{-At}B),    t ∈ [0, t*],    (1)

for some η ≠ 0 in Rⁿ. Writing B = (b₁, ..., b_m) in columns, we can state this equivalently,

    uⱼ*(t) = sgn(ηᵀe^{-At}bⱼ),    t ∈ [0, t*],    (2)
j = 1, 2, ..., m. The only way the maximum principle will not determine the optimal control uniquely is if, for some j = 1, 2, ..., m, the function

    t ↦ ηᵀe^{-At}bⱼ,    t ∈ [0, t*],

is zero on some interval. Of course systems for which the maximum principle has only one solution for each η ≠ 0, and so gives the optimal control uniquely, deserve a special name.

Definition 1 (Hermes and LaSalle [3]) We call the control system (1.1) (with A and B constant) normal if for every nonzero η ∈ Rⁿ, the functions t ↦ ηᵀe^{-At}bⱼ, t ∈ [0, t*], j = 1, 2, ..., m, are nonzero except at a finite number of points. Then, if (1.1) is normal, the optimal control is unique, bang-bang with a finite number of switches, and is given by (1) (or (2)) for some η ∈ Rⁿ.
At the end of this section we shall derive a simple necessary and sufficient test for normality, but first we look at the geometric meaning of normality. Reconsidering our proof of the maximum principle, we showed that η was the normal to a hyperplane H, which supported ℛ(t*) at y₁ (see Fig. 7). If H happens to touch ℛ(t*) at another point y₂, say, then by definition

    ηᵀy₂ = ηᵀy₁ ≥ ηᵀy    for all y ∈ ℛ(t*).

[Fig. 7]
3.
Normal Systems--Uniqueness of the Optimal Control
If y 2 is reached by control Y2
U2
19
in time t*, that is,
= f~'
e-
AtBu
2(t)dt,
then, as in the proof of the maximum principle, U2(t)
= sgn("Te-AtB),
t e [O,t*],
or U2 is another solution of the maximum principle (U2 ¢. u*, otherwise Y1 = Y2)' In other words, any control that steers us in time t* to a point in H n 9l(t*) satisfies the maximum principle for the same n. Equiva lently, the maximum principle gives information only about the inter section of the hyperplane H (with normal a) with the set 9l(t*). The larger this set is (for instance if 9l(t*) has a "flat"), the less information about our particular optimal control u* we can derive from it. For normal systems, the maximum principle tells us everything about u*,so the set H n 9l(t*) must be as small as possible; in fact, H n 9l(t*) = {ytJ. In our geometric language we can state this as a theorem: Theorem 1 The optimal control u* is uniquely determined by the maximum principle if and only if y 1 is an exposed point of 9l(t*). We leave as an exercise the following: Corollary The system (1.1) is normal if and only if 9l(t) is strictly convex for all t > O. As promised, we shall now derive a simple test for normality. Theorem 2 The control system is normal if and only if for each j = 1,2, ... ,m, the vectors {bj,Abj,A 2bj , . . . ,A"-lbj } are linearly in dependent. Proof
Suppose η is an arbitrary nonzero vector in Rⁿ. If the function

    t ↦ ηᵀe^{-At}bⱼ,    t ∈ [0, t*],    (3)

is zero at more than a finite number of points, it must be identically zero, ηᵀe^{-At}bⱼ = 0 for all t ∈ [0, t*], as (3) is an analytic function. Hence if (1.1) is not normal for some j, we must have ηᵀe^{-At}bⱼ ≡ 0. Substituting t = 0 gives ηᵀbⱼ = 0. Differentiating (3) gives

    d/dt (ηᵀe^{-At}bⱼ) = -ηᵀAe^{-At}bⱼ ≡ 0,

and, at t = 0, ηᵀAbⱼ = 0.
Similarly we can show

    ηᵀAʳbⱼ = 0    for all r = 0, 1, 2, ..., n - 1,

and so {bⱼ, Abⱼ, ..., A^{n-1}bⱼ} must be linearly dependent; that is, linear independence implies normality. To prove the converse, assume that for some j the vectors {bⱼ, Abⱼ, ..., A^{n-1}bⱼ} are linearly dependent. Then there exists a nonzero η ∈ Rⁿ with

    ηᵀbⱼ = ηᵀAbⱼ = ... = ηᵀA^{n-1}bⱼ = 0.    (4)

Define ξ(t) = ηᵀe^{-At}bⱼ; then

    (Dᵏξ)(t) = ηᵀ(-A)ᵏe^{-At}bⱼ,

where D = d/dt. We shall now show that ξ ≡ 0. Let φ(-λ) be the characteristic polynomial of A. By the properties of matrix exponentials,

    φ(D)ξ = φ(D)[ηᵀe^{-At}bⱼ] = ηᵀ[φ(D)e^{-At}bⱼ] = ηᵀ[φ(-A)e^{-At}bⱼ].

However, by the Hamilton-Cayley theorem [7], φ(-A) = 0, and so

    (φ(D)ξ)(t) = 0    for all t ∈ [0, t*].    (5)

Further, by (4),

    (Dᵏξ)(0) = 0,    k = 0, 1, 2, ..., n - 1.    (6)

Equations (5) and (6) are just a linear homogeneous ordinary differential equation for the function ξ, with zero initial data, and, by the uniqueness of the solutions of such problems, we must have ξ(t) = 0 for all t ∈ [0, t*], which contradicts the definition of normality. ∎
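Theorem 2 reduces the normality check to a finite linear-algebra computation: for each column bⱼ, form the n vectors bⱼ, Abⱼ, ..., A^{n-1}bⱼ and test whether they span Rⁿ. A self-contained sketch using plain Gaussian elimination (any rank routine would do):

```python
def is_normal(A, B):
    """Test normality via Theorem 2: for each column b_j of B, the vectors
    {b_j, A b_j, ..., A^{n-1} b_j} must be linearly independent, i.e. the
    matrix with those vectors as rows must have full rank n."""
    n = len(A)

    def matvec(M, v):
        return [sum(M[i][k] * v[k] for k in range(n)) for i in range(n)]

    def rank(rows):
        M = [r[:] for r in rows]                  # work on a copy
        r = 0
        for c in range(n):
            piv = next((i for i in range(r, n) if abs(M[i][c]) > 1e-12), None)
            if piv is None:
                continue
            M[r], M[piv] = M[piv], M[r]
            for i in range(n):
                if i != r and abs(M[i][c]) > 1e-12:
                    f = M[i][c] / M[r][c]
                    M[i] = [a - f * b for a, b in zip(M[i], M[r])]
            r += 1
        return r

    for j in range(len(B[0])):
        v = [B[i][j] for i in range(n)]
        vecs = []
        for _ in range(n):
            vecs.append(v)
            v = matvec(A, v)
        if rank(vecs) < n:
            return False
    return True

# The trolley: b = (0, 1), Ab = (1, 0) are independent, so it is normal.
trolley_normal = is_normal([[0.0, 1.0], [0.0, 0.0]], [[0.0], [1.0]])
```

This is the same computation as checking that the controllability matrix built from bⱼ has full rank, a connection taken up again in Chapter VII.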
4. FURTHER EXAMPLES OF TIME OPTIMAL CONTROL

Example 1 Consider the control problem

    ẋ₁ = x₂,    ẋ₂ = -x₁ + u,    |u| ≤ 1.    (1)

We wish to reach the origin in minimum time.
We have

    A = [0 1; -1 0],    b = [0; 1],    i.e.,    Ab = [1; 0];

hence {b, Ab} are linearly independent and the system is normal. Next,

    X(t) = e^{At} = [cos t  sin t; -sin t  cos t],    e^{-At}b = [-sin t; cos t].

By the maximum principle, optimal controls must be of the form

    u*(t) = sgn(-η₁ sin t + η₂ cos t),    (η₁, η₂) ≠ (0, 0),

or

    u*(t) = sgn(sin(t + δ))    for some -π ≤ δ ≤ π.

That is, the optimal control is unique, bang-bang, and its switches occur exactly π seconds apart. When u = +1,

    ẋ₁ = x₂,    ẋ₂ = -x₁ + 1,

so

    dx₂/dx₁ = (1 - x₁)/x₂,    i.e.,    (1 - x₁) dx₁ = x₂ dx₂,

and

    x₂² + (1 - x₁)² = a²,

which is a circle centered at (1, 0), with

    1 - x₁ = a cos t,    x₂ = a sin t.

Similarly, when u = -1,

    x₂² + (-1 - x₁)² = a²,

which is a circle centered at (-1, 0), where

    x₁ = a cos t - 1,    x₂ = -a sin t.
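The circular arcs can be confirmed numerically: along any solution of (1) with u = +1, the quantity x₂² + (1 - x₁)² is a constant of the motion. A short check (a second-order midpoint step and the particular starting point are choices of this sketch):

```python
import math

def invariant_drift(u, x1=0.0, x2=0.0, dt=1e-4, T=2.0 * math.pi):
    """Integrate x1' = x2, x2' = -x1 + u over one full period and report
    how much the circle invariant x2^2 + (u - x1)^2 drifts."""
    r0 = x2 * x2 + (u - x1) ** 2
    t = 0.0
    while t < T:
        # midpoint (RK2) step keeps the discretization drift tiny
        mx = x1 + 0.5 * dt * x2
        mv = x2 + 0.5 * dt * (-x1 + u)
        x1, x2 = x1 + dt * mv, x2 + dt * (-mx + u)
        t += dt
    return abs(x2 * x2 + (u - x1) ** 2 - r0), (x1, x2)

# Starting at the origin with u = +1: the circle has center (1, 0) and
# radius 1, and after 2*pi seconds the solution returns to the origin.
drift, (xf, vf) = invariant_drift(u=1.0)
```

The near-zero drift and the return to the start after 2π seconds match the statement below that each circle is traversed in 2π seconds.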
With t increasing, these circles are traversed in a clockwise sense; with t decreasing, in a counterclockwise sense. To solve this control problem, suppose we start at the origin and move backward in time until we hit (x₀, y₀) at time -t*. Since we are moving backward in time, we traverse the circles counterclockwise. Suppose first that 0 < δ ≤ π (see Fig. 8). Then we move -δ seconds around the arc of the trajectory corresponding to u = +1 which passes through the origin. At t = -δ, sin(t + δ) changes sign and we switch the control to u = -1. Since u = -1, the optimal trajectory is circular with center at (-1, 0) and passes through P₁. We travel along this circle for π seconds, in which time we traverse exactly a semicircle. (From Fig. 8 we see that each circle is traversed in 2π seconds.) After π seconds we shall reach the point P₂, which by symmetry is just the reflection of P₁ through the center (-1, 0). At P₂ we switch to u = +1 again and traverse a semicircle which has center at (1, 0) and which passes through P₂. After π seconds we reach P₃ and switch to u = -1, etc.
[Fig. 8]
In this way we generate the optimum trajectories, and the one that passes through (x₀, y₀) must be (by normality) the desired optimal trajectory. If -π ≤ δ < 0, then we start with u = -1 until sin(t + δ) = 0, i.e., until t = -δ - π seconds (see Fig. 9). We then switch to u = +1, describing a semicircular arc with center (1, 0) for π seconds to Q₂, switch to u = -1, etc. Clearly, the switching locus W is just as shown in Fig. 10, and the synthesizer is defined by

    ψ(x, y) = { -1  if (x, y) lies above W or on Γ₋,
              { +1  if (x, y) lies below W or on Γ₊.
The optimal responses are just the solutions of

    ẍ₁ + x₁ = ψ(x₁, ẋ₁)

with initial point (x₀, y₀) and final point (0, 0).

Example 2 (Bushaw [1]; Lee and Markus [5]) Consider the minimal time control to the origin for

    ẍ + 2bẋ + k²x = u,    x(0) = x₀,    ẋ(0) = y₀    (2)

(the damped linear oscillator), where b > 0 and k > 0 are constants and |u(t)| ≤ 1. First, (2) is equivalent to

    ẋ₁ = x₂,    ẋ₂ = -k²x₁ - 2bx₂ + u.    (3)

To begin with, we shall suppose b² - k² ≥ 0; (2) is then critically damped or overdamped. The maximum principle becomes (from the alternative formulation in terms of the adjoint system)

    ψ̇ = -ψA,    u*(t) = sgn(ψ(t)b) = sgn(ψ₂(t)).

Writing this out, we see that

    (ψ̇₁, ψ̇₂) = -(ψ₁, ψ₂) [0 1; -k² -2b],

that is,

    ψ̈₂ - 2bψ̇₂ + k²ψ₂ = 0,

which has solutions

    ψ₂(t) = e^{bt}(α + βt)           if b² - k² = 0,
    ψ₂(t) = αe^{bt} sinh(μt + β)      if b² - k² > 0,
24
N
X
25
26
II.
The General Linear Time Optimal Problem
where IX, f3 are constants (initial conditions for the adjoint equation) and 2 - k • In any case, r/J2 can have at most one zero, and u* has at most one switch. If we denote by J1. = Jb 2
+ 1 in the lower right
the solution of(3) passing through (0,0) with u = hand quadrant and by
the solution of (3) passing through (0,0) with u = - 1 in the upper left hand quadrant, then the switching locus X2 = W(x 1 ) is as pictured in Fig. 11. The optimal control synthesizer is then for for
X2 X2
> W(xd and on (I' _) < W(x 1) and on (r +).
The verification of these details is exactly the same as in Section 2 Example 1.
Fig. 11
Consider now the case of underdamping, b² − k² < 0. We can again solve the adjoint system, and we get

ψ₂(t) = αe^{bt} sin(ωt + β),

where ω = √(k² − b²). In other words, the switches of the optimal control are exactly π/ω seconds apart. Each solution of the state equations (with u = +1)

[ẋ₁; ẋ₂] = [0 1; −k² −2b][x₁; x₂] + [0; 1]

is a spiral approaching a critical equilibrium point O₊ = (1/k², 0) as t → +∞ (see Fig. 12). Similarly, for u = −1, each solution is a spiral approaching O₋ = (−1/k², 0) as t → ∞ (see Fig. 13). We construct the switching locus as follows: find the solution S₊ passing through the origin and unfold it, that is, take S₁ and reflect it to S₂, etc. (see Fig. 14). This defines the switching locus x₂ = W(x₁) for x₁ ≥ 0. For x₁ < 0, we set W(x₁) = −W(−x₁). This gives us the result shown in Fig. 15. Then it can be shown [5] that if P₁ and P₂ are collinear with O₊, it takes exactly π/ω seconds to traverse the optimal trajectory (S₊) between P₁ and P₂; so, as in the case of the undamped oscillator, we start off at our initial point (x₀, y₀), with u = +1 if it lies below W and u = −1 if above W, and continue until we hit W. Then the control switches sign, and by
Fig. 12
Fig. 13
Fig. 14
Fig. 15
the maximum principle we go for π/ω seconds with the control at this value. However, it takes us exactly π/ω seconds to rehit W, and so every time we cross W the control changes sign. This process continues until we hit Γ₋ or Γ₊, and then we come into 0 by switching to u = −1 or u = +1, respectively. The synthesizer can be defined by

Ψ(x₁, x₂) = −1 for x₂ > W(x₁) and on Γ₋,
            +1 for x₂ < W(x₁) and on Γ₊,

and the optimal trajectories are solutions of

ẍ + 2bẋ + k²x = Ψ(x, ẋ), x(0) = x₀, ẋ(0) = y₀.
5. NUMERICAL COMPUTATION OF THE SWITCHING TIMES

In this section we discuss a method for the numerical solution of time optimal control problems

ẋ = Ax + bu.  (1)

Suppose the optimal control u* transferring x₀ to 0 in minimum time is bang-bang with r switches. Without loss of generality, suppose that the first action of u* is −1 (see Fig. 16). Then we must have

0 = e^{At*}(x₀ + ∫₀^{t*} e^{−At}b u*(t) dt),

i.e.,

Fig. 16
In other words, multiplying both sides by A and integrating yields

Ax₀ = −(e^{−At₁}b − b) + (e^{−At₂}b − e^{−At₁}b) + ⋯ + (−1)^r (e^{−At_r}b − e^{−At_{r−1}}b),

i.e.,

Ax₀ = b − 2e^{−At₁}b + 2e^{−At₂}b + ⋯ + (−1)^r e^{−At_r}b.  (2)
If the matrix A is normal (commutes with its adjoint) and has eigenvalues λ₁, …, λₙ with corresponding orthonormal eigenvectors {x₁, …, xₙ} [7], then

e^{At}b = Σ_{j=1}^{n} e^{λ_j t}(b·x_j)x_j,

i.e.,

e^{−At}b = Σ_{j=1}^{n} e^{−λ_j t}(b·x_j)x_j.

So we can expand (2) as

Σ_{j=1}^{n} λ_j(x₀·x_j)x_j = Σ_{j=1}^{n} [(b·x_j)x_j − 2e^{−λ_j t₁}(b·x_j)x_j + 2e^{−λ_j t₂}(b·x_j)x_j + ⋯ + (−1)^r e^{−λ_j t_r}(b·x_j)x_j].  (3)

Now, using the orthonormality of the {x_j}, (3) can alternatively be written as a set of n nonlinear equations in the r unknowns t₁, …, t_r:

λ_j(x₀·x_j) = (b·x_j)(1 − 2e^{−λ_j t₁} + 2e^{−λ_j t₂} − 2e^{−λ_j t₃} + ⋯ + (−1)^r e^{−λ_j t_r}) for j = 1, 2, …, n;

that is,

−λ₁(x₀·x₁) + (b·x₁)(1 − 2e^{−λ₁t₁} + 2e^{−λ₁t₂} + ⋯ + (−1)^r e^{−λ₁t_r}) = 0,
⋮
−λₙ(x₀·xₙ) + (b·xₙ)(1 − 2e^{−λₙt₁} + 2e^{−λₙt₂} + ⋯ + (−1)^r e^{−λₙt_r}) = 0.

Note. It has been shown by Feldbaum [6, Chap. 3, Theorem 10] that if {λ₁, …, λₙ} are real, then r ≤ n − 1.
Take as an example the harmonic oscillator, Example 1, Section 4,

ẋ = Ax + bu, where x(0) = [x₀; y₀].
Here we have

A = [0 1; −1 0] and b = [0; 1].

Hence

Ax₀ = [y₀; −x₀],

and from Example 1, e^{−At}b = [−sin t; cos t]. Then (2) becomes

[y₀; −x₀] = [0; 1] − 2[−sin t₁; cos t₁] + 2[−sin t₂; cos t₂] + ⋯ + (−1)^r [−sin t_r; cos t_r],

or

sin t₁ − sin t₂ + ⋯ + ((−1)^{r+1}/2) sin t_r = y₀/2,
1 − 2cos t₁ + 2cos t₂ + ⋯ + (−1)^r cos t_r = −x₀.
These equations in general have multiple solutions; however, by normality, the problem

minimize t_r subject to 0 < t₁ < t₂ < ⋯ < t_r and
f₁(t₁, …, t_r) = sin t₁ − sin t₂ + ⋯ + ((−1)^{r+1}/2) sin t_r − y₀/2 = 0,
f₂(t₁, …, t_r) = 1 − 2cos t₁ + 2cos t₂ + ⋯ + (−1)^r cos t_r + x₀ = 0

must have a unique solution, namely, the switching times of our optimal control u*. So, setting t = (t₁, …, t_r), the problem has the general form

P: minimize h(t) subject to g_i(t) > 0, i = 1, …, r, and f_j(t) = 0, j = 1, 2.
This is a mathematical programming problem that could be solved by Lagrange multiplier methods. A better method is to make the transformation

t₁ = y₁², t₂ = y₁² + y₂², …, t_r = y₁² + ⋯ + y_r²;

then, setting f̄_j(y) = f_j(t), P is equivalent to

Pᵀ: minimize y₁² + ⋯ + y_r² subject to f̄_j(y) = 0, j = 1, 2,

and we have removed the inequality constraints. One method, which seems to work, for solving Pᵀ is to solve

Pᵀᵀ: minimize y₁² + ⋯ + y_r² + λ(f̄₁(y)² + f̄₂(y)²),

increasing λ until successive solutions y coincide to a desired accuracy (this is an example of a penalty method) [2, 3]. For example, with x₀ = 1, y₀ = 0, the switching times are t₁ = 0.505 sec and t₂ = 1.823 sec.
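For the harmonic oscillator with r = 2, the two constraint equations f₁ = f₂ = 0 alone determine the switching times, so a plain Newton iteration already reproduces the values quoted above without the full penalty machinery. The following sketch assumes the forms of f₁, f₂ derived above; the function name and the starting guess are ours.

```python
import math

def newton_switch_times(x0, y0, t=(0.6, 1.5), iters=50):
    """Solve f1 = f2 = 0 for the two switching times of the
    harmonic oscillator (the r = 2 case of Section 5)."""
    t1, t2 = t
    for _ in range(iters):
        f1 = math.sin(t1) - 0.5 * math.sin(t2) - y0 / 2.0
        f2 = 1.0 - 2.0 * math.cos(t1) + math.cos(t2) + x0
        # Jacobian of (f1, f2) with respect to (t1, t2)
        j11, j12 = math.cos(t1), -0.5 * math.cos(t2)
        j21, j22 = 2.0 * math.sin(t1), -math.sin(t2)
        det = j11 * j22 - j12 * j21
        t1 -= (f1 * j22 - f2 * j12) / det
        t2 -= (j11 * f2 - j21 * f1) / det
    return t1, t2

t1, t2 = newton_switch_times(1.0, 0.0)
print(round(t1, 3), round(t2, 3))  # 0.505 1.823
```

In fact, for x₀ = 1, y₀ = 0 the equations can be solved in closed form (cos t₁ = 7/8), and the Newton iterates converge to exactly the printed values.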
Problems

1. Consider a control process described by ẋ + bx = u for a real constant b, with the restraint |u(t)| ≤ 1. Verify that the response x(t) with x(0) = x₀ to a control u(t) is

x(t) = e^{−bt}x₀ + e^{−bt} ∫₀ᵗ e^{bs}u(s) ds.

(a) If b ≥ 0, show that every initial point can be controlled to x₁ = 0.
(b) If b < 0, describe precisely those points x₀ that can be steered to x₁ = 0.

2. In the control process in Problem 1, show that the control transferring x₀ to x₁ = 0 in minimum time (when this is possible) is bang-bang, indicate how many switches it can have, and show that it can be synthesized by u(t) = −sgn(x(t)). Compute the minimum time t in terms of x₀ and b.
3. Suppose that you have been given a contract to ship eggs by rocket from New York to Los Angeles, a distance of 2400 miles. Find the shortest time in which you can do this without breaking any eggs. You may assume the path traveled is a straight line, and neglect friction, the rotation of the earth, etc. The only stipulation is that the eggs break if the acceleration exceeds 100 ft/sec².

4. Calculate the minimum time to transfer from the initial point (1, 0) to the origin (0, 0) for the system

ẍ + x = u, |u(t)| ≤ 1.

What are the switching times of the optimal control?

5. Find the optimal trajectories and switching locus for the problem of reaching the origin in minimum time for the control system

ẋ₁ = x₂ + u₁, ẋ₂ = −x₁ + u₂, |u₁| ≤ 1, |u₂| ≤ 1.
6. Find the control that steers the system

ẋ₁ = x₂, ẋ₂ = u, x₁(0) = 1, x₂(0) = 0,

to the origin (0, 0) in t = 1 and minimizes the cost

C(u) = sup_{0≤t≤1} |u(t)|.

[Hint: If for the optimal control u*, C(u*) = k, show that this problem is equivalent to finding that number k > 0 for which the minimum time t* to reach (0, 0) from (1/k, 0) for the system ẋ₁ = x₂, ẋ₂ = u, |u| ≤ 1, is t* = 1.]

7. For the control system …, |u| ≤ 1, show that the minimum time control steering x₀ = −1 to x₁ = 0 is u ≡ t, which is not bang-bang.

8. Prove the Corollary to Theorem 3.1.

9. Discuss the time optimal control to the origin for the system

ẋ₁ = u₁ + u₂, ẋ₂ = u₁ − u₂, |u₁| ≤ 1, |u₂| ≤ 1.
10. Show that the maximum of the Hamiltonian defined in Section 1 is constant in time if the control system ẋ = Ax + Bu is autonomous.

11. Prove that a point y₁ ∈ ℛ(t) is hit by a unique trajectory if and only if y₁ is an extreme point of ℛ(t).

References
[1] D. Bushaw, Optimal discontinuous forcing terms, in "Contributions to the Theory of Non-linear Oscillations," pp. 29-52. Princeton Univ. Press, Princeton, New Jersey, 1958.
[2] A. V. Fiacco and G. P. McCormick, "Non-linear Programming: Sequential Unconstrained Minimization Techniques." Wiley, New York, 1968.
[3] H. Hermes and J. P. LaSalle, "Functional Analysis and Time-Optimal Control." Academic Press, New York, 1969.
[4] M. R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl. 4, 303-320 (1969).
[5] E. B. Lee and L. Markus, "Foundations of Optimal Control Theory." Wiley, New York, 1967.
[6] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, "The Mathematical Theory of Optimal Processes." Wiley (Interscience), New York, 1962.
[7] G. Strang, "Computational Linear Algebra." Prentice-Hall, Englewood Cliffs, New Jersey, 1980.
Chapter III

The Pontryagin Maximum Principle

1. THE MAXIMUM PRINCIPLE
Consider the autonomous control problem:

(1) ẋᵢ = fᵢ(x₁, …, xₙ, u₁, …, u_m), i = 1, 2, …, n, with f continuously differentiable in Rⁿ × Ω.

We are given (2) the initial point x₀ and (possibly) final point x₁; (3) the class Δ of admissible controls, which we take to be all piecewise continuous functions u with u(t) ∈ Ω, Ω a given set in Rᵐ; (4) the cost functional

C(u) = ∫_{t₀}^{t₁} f₀(x(t), u(t)) dt,

where f₀ is continuously differentiable in Rⁿ × Ω. Define, for x, u, φ ∈ Rⁿ × Rᵐ × Rⁿ and φ₀ ∈ R,

H(x, u, φ) = φ₀f₀(x, u) + φ₁f₁(x, u) + φ₂f₂(x, u) + ⋯ + φₙfₙ(x, u) = φ₀f₀(x, u) + φ·f(x, u)

(H is called the Hamiltonian) and

M(x, φ) = max_{v∈Ω} H(x, v, φ).

Then the maximum principle can be stated.
Theorem 1 (Pontryagin [4]) Suppose u* is an optimal control for the above problem and x* is the corresponding trajectory. Then there exists a nonvanishing function φ*(t) = (φ₁*(t), …, φₙ*(t)) and a constant φ₀* such that

(a) ẋᵢ* = ∂H/∂φᵢ = fᵢ(x*, u*), i = 1, 2, …, n;

(b) φ̇ᵢ* = −∂H/∂xᵢ = −φ₀* (∂f₀/∂xᵢ)(x*, u*) − φ₁* (∂f₁/∂xᵢ)(x*, u*) − ⋯ − φₙ* (∂fₙ/∂xᵢ)(x*, u*), i = 1, 2, …, n;

(c) φ₀* is a nonpositive constant and

H(x*(t), u*(t), φ*(t)) = M(x*(t), φ*(t)) = max_{v∈Ω} H(x*(t), v, φ*(t)).

Further, M(x*(t), φ*(t)) is constant for 0 ≤ t ≤ t*.

Let us reapply this to the autonomous linear time optimal problem to check it. We have Ω = [−1, 1], ẋ = Ax + bu = f(x, u), and

C(u) = ∫₀^{t₁} 1 dt, so f₀ ≡ 1.

The Hamiltonian is just

H(x, v, φ) = φ₀ + φ(Ax + bv),

so the statement of the maximum principle becomes

H(x*(t), u*(t), φ*(t)) = max_{|v|≤1} (φ₀* + φ*(t)(Ax*(t) + bv)),

which occurs simultaneously with the maximum

max_{|v|≤1} (φ*(t)bv).

This maximum clearly occurs when

u*(t) = sgn(φ*(t)b),
and we calculate φ* from (b) as just

φ̇*(t) = −φ*(t)A,

which is just the adjoint equation. (Even in the general case, (b) is called the adjoint equation.)

Remark If t₁ is fixed and x₁ is not given, then φ*(t₁) = 0. (This is a special case of a transversality condition, which we will investigate later in more detail.)
Consider a chemical reaction in which one component X is added at a constant rate over some fixed period of time [0, T]. Let x(t) be the pH value of the reaction at time t. We assume that the quality of the final product depends critically on this pH value, and we control the pH by the strength u of some component of X. Further, suppose that the rate of change of pH is proportional to the sum of the current pH and the strength of the controlling ingredient u. That is,

ẋ(t) = αx(t) + βu(t), x(0) = x₀, 0 ≤ t ≤ T,

where α and β are known positive constants and x₀ is the known initial pH. We model the decrease in yield due to changes in the pH by ∫₀ᵀ x² dt, and we suppose the rate of cost of maintaining the strength u to be proportional to u². The total cost associated with a control u will then be

C(u) = ∫₀ᵀ (ax² + u²) dt,

where a > 0 is a given constant. In this example the control u is not constrained. In our previous formulation,

f = αx + βu, f₀ = ax² + u², H(x, u, φ) = φ₀(ax² + u²) + φ(αx + βu).

Here φ₀* is a nonpositive constant, and we normalize φ so that it equals −1. The maximum principle allows
38
III.
The Pontryagin Maximum Principle
φ₀* = 0, but then we would have the strange situation that our conditions for optimality do not depend on the cost of running the system! When φ₀* = −1,

ẋ* = αx* + βu*, x*(0) = x₀,  (1)
φ̇* = 2ax* − αφ*,  (2)

and the optimal control satisfies

H(x*(t), u*(t), φ*(t)) = max_{v∈R} (−(a(x*)² + v²) + φ*(αx* + βv)).

This maximum occurs at the same point as the maximum of

max_{v∈R} (−v² + βφ*v).

Since we are taking the maximum over all v ∈ R for each t, and this maximum clearly exists, we can obtain it by differentiating with respect to v:

(∂/∂v)(−v² + βφ*v) = −2v + βφ* = 0;

hence the maximum occurs at

v = ½βφ*(t), 0 ≤ t ≤ T.  (3)
The optimal control is determined by (1)–(3); that is, we must solve the two-point boundary-value problem

ẋ* = αx* + (β²/2)φ*, x*(0) = x₀, 0 ≤ t ≤ T,  (4)
φ̇* = 2ax* − αφ*, φ*(T) = 0, 0 ≤ t ≤ T;  (5)

then

u*(t) = ½βφ*(t), 0 ≤ t ≤ T.
In attempting to do this, divide (4) and (5) by x*:

ẋ*/x* = α + (β²/2)(φ*/x*),
φ̇*/x* = 2a − α(φ*/x*),

and set d(t) = φ*(t)/x*(t). Then

ḋ = (x*φ̇* − φ*ẋ*)/(x*)² = φ̇*/x* − (φ*/x*)(ẋ*/x*),

or

ḋ = −(β²/2)d² − 2αd + 2a, with d(T) = 0.  (6)

To solve equation (6) (the Riccati equation), make the change of variable d(t) = (2/β²)(ξ̇(t)/ξ(t)); then (6) becomes

(2/β²)(ξ̈/ξ) − (2/β²)(ξ̇/ξ)² = −(2/β²)(ξ̇/ξ)² − (4α/β²)(ξ̇/ξ) + 2a,

that is,

ξ̈ + 2αξ̇ − aβ²ξ = 0.

Since d(T) = 0, we have ξ̇(T) = 0, and since we are only interested in the ratio ξ̇/ξ, we can choose ξ(T) = 1. The solution is then

ξ(t) = e^{−α(t−T)} [cosh(√(α² + aβ²)(t − T)) + (α/√(α² + aβ²)) sinh(√(α² + aβ²)(t − T))]
Fig. 1 Plant ẋ = αx + βu with feedback control u*(t) = C(t)x*(t).
for 0 ≤ t ≤ T, and

u*(t) = (β/2)φ*(t) = (β/2)(φ*(t)/x*(t))x*(t) = (β/2)d(t)x*(t) = (1/β)(ξ̇(t)/ξ(t))x*(t).

The function ξ̇/ξ can be precomputed, so we have a linear feedback control with variable gain

C(t) = (1/β)(ξ̇(t)/ξ(t))

(see Fig. 1).
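The algebra above can be checked numerically: integrating the Riccati equation (6) backward from d(T) = 0 should reproduce the closed form d = (2/β²)ξ̇/ξ. The constants α, β, a, T below are illustrative choices of ours, not values from the text.

```python
import math

# Backward RK4 integration of the Riccati equation (6),
#   d' = -(beta^2/2) d^2 - 2*alpha*d + 2a,   d(T) = 0,
# compared with the closed-form ratio (2/beta^2) * xi'(t)/xi(t).
alpha, beta, a, T = 0.5, 1.0, 2.0, 3.0
mu = math.sqrt(alpha**2 + a * beta**2)

def d_closed(t):
    # xi'(t)/xi(t) computed from the explicit solution for xi
    s, c = math.sinh(mu * (t - T)), math.cosh(mu * (t - T))
    return (2.0 / beta**2) * (a * beta**2 / mu) * s / (c + (alpha / mu) * s)

def f(d):
    return -(beta**2 / 2.0) * d**2 - 2.0 * alpha * d + 2.0 * a

n = 3000
h = T / n
d = 0.0                      # terminal condition d(T) = 0
for _ in range(n):           # step backward from T to 0
    k1 = f(d)
    k2 = f(d - 0.5 * h * k1)
    k3 = f(d - 0.5 * h * k2)
    k4 = f(d - h * k3)
    d -= (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

print(abs(d - d_closed(0.0)) < 1e-6)  # True: the two agree at t = 0
```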
2. CLASSICAL CALCULUS OF VARIATIONS

The oldest problem in the calculus of variations is probably the following: minimize the integral ∫₀ᵀ f(y(x), y′(x)) dx over all differentiable functions y passing through y(0) = y₀, y(T) = y₁, with y′ piecewise continuous. We can solve this problem with the maximum principle. To do this, we just reformulate it as a control problem. Set y′ = u, and since y′ is not constrained, we take Δ, the set of admissible controls, to be all piecewise continuous functions. Then the above problem becomes a control problem with state equation

y′ = u, y(0) = y₀, y(T) = y₁,
and cost

C(u) = ∫₀ᵀ f(y, u) dx.
Applying the maximum principle, we set

H(y, u, φ) = −f(y, u) + φu,

and then, for the optimal u*, y*, φ*, we have

dy*/dx = u*,  (1)
dφ*/dx = (∂f/∂y)(y*, u*),  (2)

and

H(y*, u*, φ*) = max_v (−f(y*, v) + φ*v).

Since this function is differentiable with respect to v, a necessary test for its maximum is to set

(∂/∂v)(−f(y*, v) + φ*v) = 0,  (3)

and so

φ* = (∂f/∂v)(y*, u*).

Combining (1)–(3) then gives us a differential equation for y*,

(d/dx)(∂f/∂y′) − ∂f/∂y = 0, y(0) = y₀, y(T) = y₁.

This last equation is known classically as the Euler–Lagrange equation [5].
As an example, suppose y₀ and y₁ are given points in the plane, and consider the problem of finding the curve of shortest length passing between y₀ and y₁ (see Fig. 2). If y is a curve passing between y₀ and y₁, then, parametrizing y with respect to x, 0 ≤ x ≤ 1, say, its length is just

∫₀¹ √(1 + y′(x)²) dx, y(0) = y₀, y(1) = y₁.

So our problem is: minimize ∫₀¹ √(1 + (y′)²) dx subject to y(0) = y₀, y(1) = y₁.
Fig. 2
For this problem f(y, y′) = √(1 + (y′)²), and the Euler equation becomes

(d/dx)(∂f/∂y′) − ∂f/∂y = 0,
(d/dx)(y′/√(1 + (y′)²)) − 0 = 0,

that is,

y′/√(1 + (y′)²) = const,

which has the solution y′ = const, i.e., y is linear. Note that since f = f(y, y′) is not explicitly a function of x, Euler's equation can be rearranged in the form

f − y′ (∂f/∂y′) = const.

This is called the first integral of Euler's equation, and it holds if f is independent of x.

Brachistochrone Problem
Find the curve passing between P_a and P_b that has the property that (neglecting friction) a point mass would slide down the curve in minimum time (see Fig. 3). This problem becomes: find the curve y such that y′ is continuous, y(a) = P_a, y(b) = P_b, and the integral

(1/√(2g)) ∫_a^b (√(1 + (y′)²)/√y) dx

is minimized.
Fig. 3
For the equation of motion, notice that

P.E. + K.E. = const

(where P.E. is potential energy and K.E. is kinetic energy), so

½mv² − mgy = E,

where v = ds/dt and s = arc length. Initially v = 0, y = 0; hence E = 0, so at any time

½mv² = mgy, ds/dt = √(2gy),

and

dt = ds/√(2gy) = (√(1 + (dy/dx)²)/√(2gy)) dx.

In this problem

f(y, y′) = √(1 + (y′)²)/√y,

and so the first integral becomes

√(1 + (y′)²)/√y − y′ · y′(1 + (y′)²)^{−1/2}/√y = const,

that is,

y(1 + (y′)²) = A,

where A is a positive constant, or

y′ = √((A − y)/y).
We try for a parametric solution of the form

y = A sin²(θ/2).

Then

A sin(θ/2) cos(θ/2) (dθ/dx) = cos(θ/2)/sin(θ/2),

that is,

dx = A sin²(θ/2) dθ,

or

x = B + ½A(θ − sin θ) and y = ½A(1 − cos θ)

(these curves are called cycloids). We solve for A and B by finding the solution passing through y = P_a when x = a and y = P_b when x = b.
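A quick numerical sanity check (with an arbitrary value of A) confirms that the cycloid parametrization satisfies the first integral y(1 + (y′)²) = A at every point:

```python
import math

# The cycloid y = (A/2)(1 - cos t) has slope y' = cot(t/2); the product
# y * (1 + y'^2) should equal A identically. A = 2.0 is a test value.
A = 2.0

def invariant(theta):
    y = 0.5 * A * (1.0 - math.cos(theta))
    dydx = math.cos(theta / 2) / math.sin(theta / 2)  # y' = cot(theta/2)
    return y * (1.0 + dydx**2)

print(all(abs(invariant(th) - A) < 1e-9 for th in [0.5, 1.0, 2.0, 3.0]))  # True
```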
3. MORE EXAMPLES OF THE MAXIMUM PRINCIPLE

Example 1 Consider an inventory control model governed by the equations

İ(t) = P(t) − S(t), I(0) = I₀,
Ṡ(t) = −λP(t), S(0) = S₀,

where I(t) is the level of inventory at time t, S(t) the rate of sales at time t, P(t) the rate of production with P̲ ≤ P(t) ≤ P̄, and λ a positive constant. The first equation just expresses the assumption that the rate of change of the inventory level is the production rate minus the sales rate. The second reflects the assumption that the rate of change of sales declines at a rate proportional to the production rate. Let c be the cost per unit time of producing at rate P(t), and h the cost per unit time of holding inventory level I(t) (c, h constants). If T is given, we consider the problem of minimizing our total cost, that is,

minimize C(P) = ∫₀ᵀ (cP(t) + hI(t)) dt

by choosing a suitable production rate P.
Applying the maximum principle, we find

H = −cP − hI + φ₁(P − S) + φ₂(−λP)

and

φ̇₁ = −∂H/∂I = −(−h) = h,
φ̇₂ = −∂H/∂S = −(−φ₁) = φ₁,

with φ₁(T) = φ₂(T) = 0, as this is a free-end-point problem. Solving the adjoint equations gives

φ₁(t) = ht + c₁, c₁ = −hT, so φ₁(t) = ht − hT;
φ̇₂(t) = φ₁(t) = ht − hT, φ₂(t) = ht²/2 − (hT)t + c₂, with φ₂(T) = 0.

Therefore

c₂ = hT² − hT²/2 = hT²/2,
φ₂(t) = ht²/2 − (hT)t + hT²/2.

Now, applying the last stage of the maximum principle, we see that

H(I*(t), S*(t), P*(t), φ₁*(t), φ₂*(t)) = max_{P̲≤P≤P̄} H(I*(t), S*(t), P, φ₁*(t), φ₂*(t)) for all t ∈ (0, T).
This maximum is achieved at the same time as the maximum of

max_{P̲≤P≤P̄} (−cP + φ₁*(t)P − λφ₂*(t)P) = max_{P̲≤P≤P̄} P(−c + φ₁*(t) − λφ₂*(t)).

That is, denoting

ξ(t) = −c + φ₁*(t) − λφ₂*(t),
the optimal control for 0 ≤ t ≤ T is given by

P*(t) = P̄       when ξ(t) > 0,
        unknown  when ξ(t) = 0,
        P̲       when ξ(t) < 0;

and from our remarks above,

ξ(t) = −c + ht − hT − λ(ht²/2 − (hT)t + hT²/2)
     = −λht²/2 + (h + λhT)t − λhT²/2 − hT − c.

Note that ξ(t) is a quadratic, so we can have at most two switches. Solving for the zeros of ξ, we find that

t± = (1 + λT)/λ ± (1/λ)√(1 − 2cλ/h)

are the switching times. Note that

t± = T + (1/λ)(1 ± √(1 − 2cλ/h)),

and so with λ, c, h > 0, the only optimal strategy is

P*(t) = P̲, 0 ≤ t ≤ T.
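The conclusion can be confirmed numerically: both roots t± lie beyond T, so ξ < 0 on all of [0, T]. The constants below are illustrative values of ours (chosen so that 1 − 2cλ/h > 0), not values from the text.

```python
import math

# Switching function xi(t) for the inventory example, with sample
# constants lam, c, h, T; both roots t± should exceed T and xi should
# be negative on [0, T], forcing P* = P_min throughout.
lam, c, h, T = 0.3, 1.0, 2.0, 5.0

def xi(t):
    return -c + h*t - h*T - lam*(h*t**2/2 - h*T*t + h*T**2/2)

root = math.sqrt(1 - 2*c*lam/h)
t_plus, t_minus = T + (1 + root)/lam, T + (1 - root)/lam
print(t_minus > T and t_plus > T)                  # True
print(all(xi(k*T/100) < 0 for k in range(101)))    # True
```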
Example 2 (Isaacs [3]) We return to Example 3 of Chapter I. For simplicity, we assume that the constant k = 1, so

ẋ = ux, x(0) = x₀, maximize ∫₀ᵀ (1 − u(t))x(t) dt.

The Hamiltonian is

H(x, u, φ) = (u − 1)x + φux,

and the adjoint equation is

φ̇ = −∂H/∂x = −(u − 1 + φu) = 1 − u(φ + 1), φ(T) = 0.

We then have to minimize, over 0 ≤ u ≤ 1,

u(x + φx) = ux(1 + φ),

and consequently the optimum control is given by

u*(t) = 0  if 1 + φ(t) > 0,
       +1  if 1 + φ(t) < 0

(x is always positive).  (1)
Since φ(T) = 0, u*(T) = 0, and so the optimum control must be zero over the final time period. Using this information, we can solve the adjoint equation backward from t = T, with φ(T) = 0 and u* = 0. Then

φ(t) = t − T for t ≤ T.

The control will switch when 1 + φ(t) = 0, that is, when 1 + t − T = 0, or t = T − 1, and so

u*(t) = 0, φ(t) = t − T, for t ∈ (T − 1, T).

Since φ satisfies the adjoint equation, it is continuous over (0, T); so, over the penultimate switching interval, u* = +1 and

φ̇ = −φ,

that is,

φ(t) = −e^{T−1−t}, t ≤ T − 1.

However, the exponential function can equal −1 at most once, so there are no further switches, and we must have

φ(t) = −e^{T−1−t}, 0 ≤ t ≤ T − 1,
u*(t) = +1, 0 ≤ t ≤ T − 1.
The optimal policy (if it exists) must be full investment up to one time period before the final time, and then flat-out consumption. The cost of u* is obtained from

ẋ* = x*, x*(0) = x₀ over [0, T − 1],

that is,

x*(t) = x₀eᵗ, 0 ≤ t ≤ T − 1.

Then over T − 1 ≤ t ≤ T, ẋ* = 0, that is, x*(t) = x₀e^{T−1}, since x* is continuous. Hence

C(u*) = ∫₀ᵀ (1 − u*)x* dt = ∫_{T−1}^{T} (1 − u*)x*(t) dt = x₀e^{T−1}.
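A crude forward simulation of this policy (full investment until T − 1, then consumption) reproduces the value x₀e^{T−1}; x₀ and T below are arbitrary test values of ours.

```python
import math

# Euler-integrate x' = u*x under u* = 1 on [0, T-1], u* = 0 on [T-1, T],
# accumulating the payoff  ∫ (1 - u*) x dt, and compare with x0*e^(T-1).
x0, T, n = 2.0, 3.0, 20000
h = T / n
x, payoff = x0, 0.0
for i in range(n):
    t = i * h
    u = 1.0 if t < T - 1 else 0.0
    payoff += (1.0 - u) * x * h   # left Riemann sum of the payoff
    x += u * x * h                # Euler step of x' = u*x
print(abs(payoff - x0 * math.exp(T - 1)) < 0.01)  # True
```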
Example 3 An Inventory Control Problem (Bensoussan et al. [1]) Consider an inventory of items being controlled over a time period [0, T]. If d(t) represents the demand rate for these items at time t, and u(t) the production rate at time t, then the rate of change of the inventory level x(t) will be just

ẋ(t) = u(t) − d(t), x(0) = x₀, 0 ≤ t ≤ T.  (2)

Suppose the firm chooses desired levels u_d(t), x_d(t), 0 ≤ t ≤ T, for the production and inventory. In order to penalize variations in u and x from the desired levels, we could formulate the cost of a production rate u as

C(u) = ∫₀ᵀ {c(u(t) − u_d(t))² + h(x(t) − x_d(t))²} dt  (3)

for some constants c, h > 0. Then the control problem would be to minimize C(u) subject to (2). The problem (2), (3) would appear more realistic if we placed constraints on u and x, such as x(t) ≥ 0, u(t) ≥ 0, t > 0. For the moment, we shall assume that u_d and x_d are sufficiently large that the minimization of the cost (3) keeps us in the region x ≥ 0, u ≥ 0. In Chapter IX we shall consider inventory problems with state and control constraints.

For this problem, the Hamiltonian is

H(x, u, ψ) = −c(u − u_d)² − h(x − x_d)² + ψ(u − d),

and the adjoint equation is

ψ̇* = −∂H/∂x = 2h(x* − x_d), ψ*(T) = 0,  (4)

since (2), (3) is a free-end-point control problem. The maximum of the Hamiltonian over all u ∈ R will occur at

∂H/∂u = −2c(u − u_d) + ψ = 0,

that is,

u*(t) = ψ*(t)/2c + u_d(t), 0 ≤ t ≤ T.  (5)

Substituting (5) into (2) gives

ẋ*(t) = ψ*(t)/2c + u_d(t) − d(t), 0 ≤ t ≤ T, x*(0) = x₀,  (6)
which, together with (4), forms a two-point boundary-value problem for the optimum trajectory and adjoint variable. Rather than solve this problem directly, we attempt a solution for the adjoint variable in the form

ψ*(t) = a(t) + b(t)x*(t), t ≥ 0,  (7)

for some functions a and b. The advantage is again that once the functions a and b are determined, relations (7) and (5) give the optimal control in feedback form. Differentiating (7) gives

ψ̇* = ȧ + ḃx* + bẋ*,

and substituting (6) and (4) for ẋ*, ψ̇* results in

2h(x* − x_d) = ȧ + ḃx* + b(ψ*/2c + u_d − d).

Plugging in (7) for ψ* results in

ȧ(t) + b(t)(u_d(t) − d(t)) + 2hx_d(t) + a(t)b(t)/2c + (ḃ(t) + b²(t)/2c − 2h)x*(t) = 0 for all 0 ≤ t ≤ T.  (8)

This will be satisfied if we choose a and b such that

ḃ(t) + b²(t)/2c − 2h = 0,  (9)
ȧ(t) + b(t)(u_d(t) − d(t)) + 2hx_d(t) + a(t)b(t)/2c = 0.  (10)

From ψ*(T) = 0, without loss of generality we can also suppose

b(T) = 0,  (11)
a(T) = 0.  (12)

The equation for b is a Riccati equation, which can be solved by a substitution of the type used in Section 1 to give

b(t) = −2c√(h/c) tanh(√(h/c)(T − t)), 0 ≤ t ≤ T.  (13)

When (13) is substituted in (10), we are left with a linear equation for a, which can be solved by variation of parameters.
For simplicity, we take a particular case: u_d(t) = d(t), x_d(t) = c_d, a constant, for 0 ≤ t ≤ T. That is, the firm wishes to have the production rate match the demand rate while maintaining a constant level of inventory. Now a satisfies

ȧ(t) + a(t)b(t)/2c + 2hc_d = 0, a(T) = 0,

which, when solved, gives

a(t) = 2√(hc) c_d tanh(√(h/c)(T − t)).  (14)

Substitution of (14) and (13) into (5) gives the feedback control law

u*(t, x) = √(h/c)[c_d − x] tanh(√(h/c)(T − t)) + d(t), 0 ≤ t ≤ T.  (15)

As a consequence, the optimum production rate is equal to the demand rate plus an inventory correction factor, which tends to restore the inventory to the desired level c_d. Further computation gives the optimal inventory level as

x*(t) = c_d + (x₀ − c_d) cosh[√(h/c)(T − t)]/cosh(√(h/c) T), 0 ≤ t ≤ T.  (16)

We can see from (15) and (16) that if we start out with x₀ = c_d, the desired inventory, we remain at that level and meet the demand through production. If we start out away from c_d, the inventory level asymptotically moves toward the desired level.
Example 4 Moon-Landing Problem (Fleming and Rishel [2]) This problem was described in Chapter I. The state equations were

ḣ = v,
v̇ = −g + u/m,  (17)
ṁ = −ku,

where

0 ≤ u(t) ≤ 1, (h(0), v(0), m(0)) = (h₀, v₀, M + F),
h(T) = 0, v(T) = 0, m(T) > 0,

with cost

C(u) = min_u ∫₀ᵀ u(ξ) dξ.
The Hamiltonian is

H(h, v, m, u, ψ₁, ψ₂, ψ₃) = −u + ψ₁v + ψ₂(−g + u/m) − kψ₃u.

The adjoint equations become

ψ̇₁ = −∂H/∂h = 0,
ψ̇₂ = −∂H/∂v = −ψ₁,  (18)
ψ̇₃ = −∂H/∂m = ψ₂u/m²,

with ψ₃(T) = 0, since m(T) is not specified. The maximum of the Hamiltonian with respect to u will occur at the same point as

max_{0≤u≤1} [−u + ψ₂u/m − kψ₃u],

that is,

u*(t) = 1         when 1 − ψ₂(t)/m + kψ₃(t) < 0,
        undefined when 1 − ψ₂(t)/m + kψ₃(t) = 0,
        0         when 1 − ψ₂(t)/m + kψ₃(t) > 0.

Note that for the problem to be physically reasonable, max thrust > gravitational force, that is,

1 > (M + F)g or 1/(M + F) > g.

In general, if max thrust = α (0 ≤ u(t) ≤ α), then

α/(M + F) > g.
Our intuition would lead us to expect that the optimal control is first a period of free fall (u* ≡ 0) followed by maximum thrust (u* ≡ 1), culminating, we hope, in a soft landing. Assuming this to be the case, we shall first show how to construct such an optimal control and then use this information to show that it is the unique solution of the maximum principle.

Suppose that u*(t) = +1 over the last part of the trajectory [ξ, T]. Remember that h(T) = 0, v(T) = 0, m(T) is unknown, and m(ξ) = M + F, since we have coasted to this point. Then the solution of (17) is

h(ξ) = −(g/2)(T − ξ)² − (T − ξ)/k − ((M + F)/k²) ln((M + F − k(T − ξ))/(M + F)),
v(ξ) = g(T − ξ) + (1/k) ln((M + F − k(T − ξ))/(M + F)),  (19)
m(ξ) = M + F.
If we plot h(ξ) against v(ξ), we get the curve shown in Fig. 4. Clearly this curve is the locus of all (height, velocity) pairs that we can steer to (0, 0) with full thrust +1. There are some physical restrictions on the length of this curve: namely, as the spacecraft is burning fuel at rate k, the total amount of fuel will be burned in F/k seconds. Consequently,

0 ≤ T − ξ ≤ F/k.
Over the first part of the trajectory [0, ξ] we free-fall, u* = 0, and we have

h(t) = −½gt² + v₀t + h₀,
v(t) = −gt + v₀,  (20)
m(t) = M + F,

Fig. 4
or, in the phase plane,

h(t) = h₀ − (1/2g)[v²(t) − v₀²], 0 ≤ t ≤ ξ.  (21)

As expected, this curve is a parabola. Using this strategy, we free-fall along trajectory (21) until we hit the curve constructed previously in Fig. 4, where we switch to full thrust (see Fig. 5). The switching time ξ is the time at which (19) and (20) cross.

We now show that this choice for u* satisfies the maximum principle. Let ψ₁(t) = λ₁. From the adjoint equations (18),

ψ₂(t) = λ₂ − λ₁t, 0 ≤ t ≤ T,
ψ₃(t) = λ₃, 0 ≤ t ≤ ξ,
m(t) = k(ξ − t) + M + F, ξ ≤ t ≤ T,

we see that

ψ₃(τ) = λ₃ + ∫_ξ^τ (λ₂ − λ₁t)/[k(ξ − t) + M + F]² dt, ξ ≤ τ ≤ T.

Since the switching function

Γ(t) = 1 − ψ₂(t)/m(t) + kψ₃(t), 0 ≤ t ≤ T,

is zero at t = ξ,

Γ(ξ) = 1 − (λ₂ − λ₁ξ)/(M + F) + kλ₃ = 0,  (22)

Fig. 5
and, writing out ψ₃(T) = 0,

λ₃ + ∫_ξ^T (λ₂ − λ₁t)/[k(ξ − t) + M + F]² dt = 0,  (23)

it is not too hard to find three unknowns λ₁, λ₂, λ₃ satisfying (22) and (23). They can also be chosen so that Γ(t) > 0 on [0, ξ) and Γ(t) < 0 on (ξ, T]. Any such solution clearly satisfies the maximum principle.

We shall next show that this is the only type of solution of the maximum principle. Note that

Γ̇(t) = λ₁/m(t), 0 ≤ t ≤ T,

and so if λ₁ ≠ 0, Γ(t) is strictly monotone, and the optimal control is either of the type above or the opposite, namely, u* = 1, then u* = 0. We now show that if (h₀, v₀) lies above the switching curve, a control u* = 1, then u* = 0 cannot achieve a soft landing. If there were such a control, with switch at τ, say, then v(T) = 0 only if

T = v(τ)/g + τ (by (17)).

Now, since h(T) = 0, we must have by (17)

h(τ) = −v(τ)²/2g < 0,

and we switch below the surface of the moon, which is not too good for the astronaut. Last, notice that if (h₀, v₀) is below the switching curve, a soft landing is not possible, because even by using full thrust for the entire trajectory, the spacecraft will strike the moon with a nonzero velocity (Fig. 5).

In the case λ₁ = 0, Γ(t) is a constant, and if nonzero, the optimal control must be u* ≡ 1 or u* ≡ 0; the last case can easily be seen to be false, as is the former, unless (h₀, v₀) lies on the curve (19). The only remaining case is then λ₁ = 0 and Γ(t) ≡ 0, 0 ≤ t ≤ T. First, λ₁ = 0 implies ψ₁(t) ≡ 0, and so ψ₂(t) ≡ λ₂, 0 ≤ t ≤ T. So Γ(t) ≡ 0, 0 ≤ t ≤ T, means that

1 − λ₂/m(t) + kψ₃(t) ≡ 0 for 0 ≤ t ≤ T,

which implies that the functions {1, 1/m(t), ψ₃(t)} are linearly dependent, a contradiction. Since we have eliminated all other cases, the given solution is the only solution of the maximum principle.
We can simplify the equation of the switching curve somewhat by approximating

ln(1 − k(T − ξ)/(M + F)) ≈ −k(T − ξ)/(M + F) − k²(T − ξ)²/(2(M + F)²).

Then

h(ξ) ≈ a(T − ξ)², v(ξ) ≈ −2a(T − ξ) − b(T − ξ)²,

where

a = (1/2)(1/(M + F) − g), b = k/(2(M + F)²).

Now the switching curve can be approximately represented by

Ψ(h, v) = (b/a)h + 2a√(h/a) + v = 0,

so if we are measuring h and v simultaneously, we free-fall as long as Ψ(h, v) > 0 and switch to maximum thrust when first Ψ(h, v) = 0.
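Equations (19) can be checked by integrating (17) forward with u = 1 from the state they specify: an RK4 run should land softly, with h(T) ≈ 0 and v(T) ≈ 0. The constants g, k, M, F and the burn time below are illustrative choices of ours (they satisfy 1/(M + F) > g and T − ξ ≤ F/k).

```python
import math

# Start from (h(xi), v(xi), M+F) as given by (19) and integrate the
# dynamics (17) with full thrust u = 1 for the burn duration S = T - xi.
g, k, M, F = 0.3, 0.1, 1.0, 0.5
A = M + F
S = 2.0                      # burn duration T - xi (<= F/k = 5)
m_T = A - k * S
h0 = -0.5 * g * S**2 - S / k - (A / k**2) * math.log(m_T / A)
v0 = g * S + (1.0 / k) * math.log(m_T / A)

def rhs(state):
    h, v, m = state
    return (v, -g + 1.0 / m, -k)   # (17) with u = 1

state = [h0, v0, A]
n = 20000
dt = S / n
for _ in range(n):
    k1 = rhs(state)
    k2 = rhs([s + 0.5*dt*d for s, d in zip(state, k1)])
    k3 = rhs([s + 0.5*dt*d for s, d in zip(state, k2)])
    k4 = rhs([s + dt*d for s, d in zip(state, k3)])
    state = [s + (dt/6)*(a + 2*b + 2*c + d)
             for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

print(abs(state[0]) < 1e-6 and abs(state[1]) < 1e-6)  # True: soft landing
```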
Problems

1. Find the production rate P(t) that will change the inventory level I(t) from I(0) = 2 to I(1) = 1 and the sales rate S(t) from S(0) = 0 to S(1) = 1 in such a way that the cost C = ∫₀¹ P²(t) dt is a minimum. Assume

İ(t) = P(t) − S(t), Ṡ(t) = −P(t)

(P is unrestricted).

2. Minimize C(u) = ∫₀¹ u(t)⁴ dt subject to

ẋ = x + u, x(0) = x₀, x(1) = 0.
3. Find the extremals for

(a) ∫₀^π ((y′)² − y²) dx, y(0) = 0, y(π) = 0;
(b) ∫₀¹ ((y′)² + 4xy′) dx, y(0) = 0, y(1) = 1.
Remember that the extremals for

∫_a^b f(x, y′(x), y(x)) dx, y(a) = y_a, y(b) = y_b,

are the solutions of the Euler–Lagrange equation

f_y − (d/dx)(f_{y′}) = 0

that satisfy the boundary conditions.

4. Consider the following model for determining advertising expenditures for a firm that produces a single product. If S(t) is the sales rate at time t and A(t) the advertising rate at time t, restricted so that 0 ≤ A(t) ≤ Ā, then we assume

Ṡ(t) = −λS(t) + γ ∫₀ᵗ A(t − τ)e^{−τ} dτ, λ, γ > 0.

We wish to choose A to maximize ∫₀ᵀ S(t) dt, the total sales over the period [0, T].

5. Find the optimal control that maximizes the cost

∫₀¹⁰⁰ x(t) dt

subject to the system dynamics

ẋ(t) = −0.1x(t) + u(t), 0 ≤ u(t) ≤ 1, x(0) = x₀.
6. Apply the maximum principle to the control problem

maximize ∫₀² (2x(t) − 3u(t) − αu²(t)) dt, α ≥ 0,

subject to

ẋ(t) = x(t) + u(t), x(0) = 5, 0 ≤ u(t) ≤ 2.
7. Discuss the optimal control of the system

ẋ₁ = x₂, ẋ₂ = −u

from x(0) = (x₁⁰, x₂⁰) to (0, 0) that minimizes the cost ∫₀^{t₁} x₁²(t) dt.
8. What is the solution of the maximum principle for the problem of minimum fuel consumption minimize
S;' .,)1 + u(tf dt,
for the system

ẍ = u,  x(0) = x₁⁰,  ẋ(0) = x₂⁰,  |u| ≤ 1,  x(t₁) = 0?

Does the solution of this problem change if the cost is replaced by ∫₀^{t₁} |u(t)| dt?

9. A function E(x*, u*, u, t) is defined by

E(x*, u*, u, t) = f(x*, u, t) − f(x*, u*, t) + (u* − u)f_u(x*, u*, t).

Show that if (x*, u*) is an optimal solution of the calculus of variations problem of Section 2, then

E(x*, u*, u, t) ≥ 0  for all u, t.

(This is the Weierstrass E-function necessary condition for a strong extremum.)
Chapter IV
The General Maximum Principle; Control Problems with Terminal Payoff
1. INTRODUCTION

In this chapter we consider the maximum principle for the general nonautonomous control problem. Namely, suppose that

(1) ẋ = f(x, t, u) is the control process and f is continuously differentiable in R^{n+1+m};
(2) we are given sets X₀, X₁ in Rⁿ to be interpreted as sets of allowable initial and final values, respectively;
(3) 𝒜 = {u: u is bounded, piecewise continuous, u(t) ∈ Ω ⊂ Rᵐ, and u steers some initial point x₀ ∈ X₀, at a fixed time t₀, to some final point x₁ ∈ X₁ at time t₁};
(4) the cost of control u is

C(u) = ∫_{t₀}^{t₁} f₀(x(t), t, u(t)) dt,

f₀ continuously differentiable in R^{n+m+1}.

Define the Hamiltonian for x = (x₁, ..., xₙ), u = (u₁, ..., u_m), and ψ = (ψ₀, ψ₁, ..., ψₙ) as

H(x, u, t, ψ) = ψ₀f₀(x, t, u) + ψ₁f₁(x, t, u) + ⋯ + ψₙfₙ(x, t, u),

and define

M(x, t, ψ) = max_{u∈Ω} H(x, u, t, ψ).
Theorem 1 (Lee and Marcus [2]) If u*, x* are optimal for the above problem over [t₀, t₁*], then there exists a nontrivial adjoint response ψ* such that

(i) ẋ*(t) = f(x*(t), t, u*(t)),
(ii) ψ̇ⱼ*(t) = −∑_{i=0}^{n} ψᵢ*(t) (∂fᵢ/∂xⱼ)(x*(t), t, u*(t)), j = 1, ..., n,
(iii) ψ₀* is a nonpositive constant (≤ 0), and
(iv) H(x*(t), u*(t), t, ψ*(t)) = M(x*(t), t, ψ*(t)) at points of continuity.

Further,

M(x*(t), t, ψ*(t)) = ∫_{t₁*}^{t} ∑_{i=0}^{n} ψᵢ*(s) (∂fᵢ/∂t)(x*(s), s, u*(s)) ds,

and hence

M(x*(t₁*), t₁*, ψ*(t₁*)) = 0.

If X₀ and X₁ (or just one of them) are manifolds in Rⁿ with tangent spaces T₀ and T₁ at x*(t₀) and x*(t₁*), respectively, then ψ* can be selected to satisfy the transversality conditions ψ*(t₀) ⊥ T₀ and ψ*(t₁*) ⊥ T₁ (or the condition at just one end).

In the most usual case X₁ (and possibly X₀) will be of the form (1)
X₁ = {x: g_k(x) = 0, k = 1, 2, ..., l}

for some given real-valued functions {g₁, ..., g_l}. For example, our fixed-end-point problem x*(t₁) = x¹ could be formulated

g_r(x*(t₁)) = x_r*(t₁) − x_r¹ = 0,  r = 1, ..., n.  (2)

If the functions {g₁, ..., g_l} are differentiable, then the transversality condition can be simply written

ψ*(t₁*) = ∑_{k=1}^{l} λ_k ∇g_k(x*(t₁*))  (3)

for some constants {λ₁, ..., λ_l}. By (1), at the terminal point we have also

g_k(x*(t₁*)) = 0  for  k = 1, ..., l.

For example, in the fixed-end-point problem,

∇g_r(x*(t₁*)) = (0, ..., 0, 1, 0, ..., 0)    (1 in the rth place).
So (3) becomes ψ*(t₁*) = (λ₁, λ₂, ..., λₙ), and in this case (as we have been assuming all along) ψ*(t₁) is unknown. In the other case that we have been using, the free-end-point problem, X₁ = Rⁿ; consequently the only vector orthogonal to the tangent space is 0, that is, ψ*(t₁*) = 0.
2. CONTROL PROBLEMS WITH TERMINAL PAYOFF

In many applications of control theory the cost takes the form

C(u) = g(x(T)) + ∫₀ᵀ f₀(x(t), t, u(t)) dt,  (1)

where T > 0 is fixed and g: Rⁿ → R is a given continuously differentiable function that represents some terminal payoff, or salvage cost, at the final time T. We can convert such problems into the standard form by adding an extra state variable x_{n+1}(t) satisfying

ẋ_{n+1}(t) = 0,  x_{n+1}(T) = g(x(T))/T.

Then g(x(T)) = ∫₀ᵀ x_{n+1}(t) dt, and the cost (1) is now in purely integral form. When the maximum principle is applied to this new problem, one finds the only change is in the transversality conditions, which now become

ψᵢ*(T) = −(∂g/∂xᵢ)(x*(T)),  i = 1, 2, ..., n.  (2)
Example 1 A Continuous Wheat-Trading Model (Norstrom [3]) We consider the following model for a firm engaged in buying and selling wheat, or a similar commodity. The firm's assets are of two types, cash and wheat, and we represent the balance of each at time t by x₁(t) and x₂(t), respectively. The initial assets x₁(0) and x₂(0) are given. The price of wheat over the planning period [0, T] is assumed to be known in advance and is denoted by p(t), 0 ≤ t ≤ T. The firm's objective is to buy and sell wheat over the period [0, T] so as to maximize the value of its assets at time T, that is, it wishes to maximize

x₁(T) + p(T)x₂(T).  (3)
If the rate of buying or selling wheat at time t is denoted by u(t) (u(t) > 0 indicates buying; u(t) < 0, selling), then we could model the operation of the firm by the differential equations

ẋ₁(t) = −αx₂(t) − p(t)u(t),  (4)
ẋ₂(t) = u(t),  (5)

where α > 0 in (4) is the cost associated with storing a unit of wheat, and the term p(t)u(t) indicates the cost (or revenue) of purchases (or sales) of wheat at time t. We have also the natural control constraint

M ≤ u(t) ≤ M̄,  M, M̄ given constants.  (6)

The Hamiltonian for this problem is

H = ψ₁(−αx₂ − pu) + ψ₂u,  (7)

and the adjoint equations are

ψ̇₁ = 0,  (8)
ψ̇₂ = αψ₁,  (9)

with transversality conditions ψ₁(T) = −1, ψ₂(T) = −p(T) by (2). In this case (8) and (9) are independent of the state variables and can be solved directly to give

ψ₁(t) = −1,  0 ≤ t ≤ T,  (10)
ψ₂(t) = −α(t − T) − p(T).  (11)

This enables us to write out the Hamiltonian explicitly as

H = αx₂ + p(t)u + u(−α(t − T) − p(T)) = u(p(t) − α(t − T) − p(T)) + αx₂.

H will be minimized as a function of u if

u*(t) = M̄ (buy)       when p(t) < p(T) − α(T − t) (= −ψ₂(t)),
u*(t) = M (sell)       when p(t) > p(T) − α(T − t),               (12)
u*(t) undetermined     when p(t) = p(T) − α(T − t).
Fig. 1 (The price p(t), the threshold p(T) − α(T − t), and the resulting sell and buy regions.)
Figure 1 illustrates this solution for a particular price function, with

T = 7,  α = 1/4,  M = −1,  M̄ = 1,  x₁(0) = 50,  x₂(0) = 1,

and p(t) piecewise linear, with p(t) = −2t + 7 for 0 ≤ t ≤ 2 and p(t) = t − 2 for 5 ≤ t ≤ 7 (the middle segment is shown in Fig. 1). From (11), it follows that ψ₂(t) = −(t/4 + 13/4). The optimal control is seen to be

u*(t) = −1 (sell) for 0 ≤ t < 4.6,  u*(t) = 1 (buy) for 4.6 < t < 7.

The simple optimal policy (12) has some shortcomings, particularly for long planning periods. It is very much dependent on the final price of wheat p(T), and not on the price between t and T. If T is very large,
then for small t the function t ↦ p(T) − α(T − t) will be negative, and (12) would require us to sell wheat. This would mean (for T sufficiently large) that the supply of wheat x₂(t) would eventually become negative, which would gain us cash, since the state equation has the form (4). So for long planning periods, we should modify (4) to assign a cost for short-selling (that is, x₂ < 0), or else forbid short-selling by adding the state constraint x₂(t) ≥ 0. We shall return to this problem in Chapter VIII, when we discuss state-constrained control problems.
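The threshold rule (12) is easy to evaluate numerically. The sketch below uses the data of Figure 1; the middle price segment is an assumed stand-in, since only the first and last pieces are given in the text.

```python
# Sketch of the bang-bang trading rule (12) for the wheat model, with
# T = 7, alpha = 1/4, M = -1, Mbar = 1 as in Fig. 1.  The constant middle
# price segment is an assumption made for this sketch.
T, alpha = 7.0, 0.25
M, Mbar = -1.0, 1.0

def p(t):
    if t <= 2:
        return -2 * t + 7
    if t <= 5:
        return 3.0           # assumed interpolation between the given pieces
    return t - 2

def threshold(t):
    # p(T) - alpha*(T - t), i.e. -psi_2(t) from (11)
    return p(T) - alpha * (T - t)

def u_star(t):
    """Buy at the maximum rate below the threshold, sell above it."""
    if p(t) < threshold(t):
        return Mbar          # buy
    if p(t) > threshold(t):
        return M             # sell
    return 0.0               # undetermined on the switching locus

print(u_star(0.5), u_star(6.5))
```

Early in the period the price is high relative to the threshold, so the firm sells; near T the discounted final price dominates and it buys.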
3. EXISTENCE OF OPTIMAL CONTROLS

In this section we summarize some of the main existence theorems of optimal control theory. From now on we will assume the set of admissible controls is

𝒜 = {u: u is bounded, measurable, and u(t) ∈ Ω for all t},

for some compact set Ω ⊂ Rᵐ. In order to have a reasonable existence theory, it is necessary to consider measurable controllers, not just piecewise continuous ones. This may seem to lead us away from physical reality. On the other hand, once the existence of an optimal control is guaranteed, the necessary conditions, such as the maximum principle, can be applied without qualm, and in many cases they will show that the optimal controls are indeed piecewise continuous or better. The first result is essentially due to Filippov. For a detailed proof, we refer to [2].

Theorem 1 Consider the nonlinear control process

ẋ = f(x, t, u),  t > 0,  x(0) ∈ X₀,

where f is continuously differentiable, and

(a) the initial and target sets X₀, X₁ are nonempty compact sets in Rⁿ;
(b) the control restraint set Ω is compact, and there exists a control transferring X₀ to X₁ in finite time;
(c) the cost for each u ∈ 𝒜 is

C(u) = g(x(t₁)) + ∫₀^{t₁} f₀(x(t), t, u(t)) dt + max_{t∈[0,t₁]} γ(x(t)),

where f₀ is continuously differentiable in R^{n+1+m} and g and γ are continuous on Rⁿ. Assume

(d) there exists a uniform bound |x(t)| ≤ b, 0 ≤ t ≤ t₁, for all responses x(t) to all controllers u ∈ 𝒜;
(e) the extended velocity set

V̂(x, t) = {(f₀(x, t, u), f(x, t, u)): u ∈ Ω}

is convex in R^{n+1} for each fixed (x, t).

Then there exists an optimal controller u*(t) on [0, t₁], u* ∈ 𝒜, minimizing C(u).
Corollary 1 For the time optimal problem, if (a), (b), (c), (d) hold, and V(x, t) = {f(x, t, u): u ∈ Ω} (the velocity set) is convex in Rⁿ for each fixed (x, t), then time optimal controls exist.

Applications of the theorem and its corollary are plentiful. Consider our moon-landing problem:

minimize ∫₀^{t₁} u(τ) dτ

subject to

ḣ = v,  v̇ = −g + m⁻¹u,  ṁ = −ku.

Hence

V((h, v, m), t) = {(v, −g + m⁻¹u, −ku): 0 ≤ u ≤ 1},

which is convex for each fixed ((h, v, m), t). Hence an optimal control for this problem exists, and all our previous computations are justified. Similarly for Example 3 in Chapter I:

minimize ∫₀ᵀ (1 − u)x dt

subject to

ẋ = kux,  0 ≤ u ≤ 1.

We have

V(x, t) = {((1 − u)x, kux): 0 ≤ u ≤ 1},

a line segment in R², hence convex for each fixed (x, t), so again an optimal control exists. There are, however, many problems where optimal controls do not exist [1].
Example 1 Consider the time optimal problem of hitting the fixed target (1, 0) with the system

ẋ₁ = (1 − x₂²)u²,  ẋ₂ = u,  x₁(0) = 0,  x₂(0) = 0,  |u(t)| ≤ 1.

Let 𝒜(t) denote the set of points attainable at time t. We show first that (1, 0) belongs to the closure of 𝒜(1) but (1, 0) ∉ 𝒜(1). For each positive integer n, subdivide [0, 1] into 2n equal subintervals. Let I_j = (j/2n, (j + 1)/2n), j = 0, 1, ..., 2n − 1. Define

uₙ(t) = 1 if t ∈ I_j, j odd;  uₙ(t) = −1 if t ∈ I_j, j even

(see Fig. 2), and let x(t; uₙ) denote the solution corresponding to uₙ. Then

x₂(1; uₙ) = ∫₀¹ uₙ(τ) dτ = 0  and  |x₂(t; uₙ)| = |∫₀ᵗ uₙ(τ) dτ| ≤ 1/2n → 0

uniformly in t ∈ [0, 1] as n → ∞. Thus

x₁(1; uₙ) = ∫₀¹ [1 − x₂(τ; uₙ)²]uₙ²(τ) dτ = ∫₀¹ (1 − x₂(τ; uₙ)²) dτ → 1  as n → ∞.

Hence x(1; uₙ) → (1, 0); however, (1, 0) ∉ 𝒜(1), since this would require a control u such that x₂(t; u) ≡ 0, or u ≡ 0, and then x₁(t; u) ≡ 0.
Fig. 2
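The chattering construction above can be checked numerically. The sketch below integrates the system under uₙ for a few values of n; the grid size is an arbitrary choice.

```python
# Numerical check of the chattering sequence u_n from Example 1: as n
# grows, x2(t) stays within 1/(2n) of zero while x1(1) approaches 1, so
# (1, 0) is approached but never attained.  Plain Euler integration.
def simulate(n, steps=20000):
    dt = 1.0 / steps
    x1 = x2 = 0.0
    for i in range(steps):
        t = i * dt
        j = int(t * 2 * n)              # index of the subinterval I_j
        u = 1.0 if j % 2 == 1 else -1.0
        x1 += (1.0 - x2 ** 2) * u ** 2 * dt
        x2 += u * dt
    return x1, x2

for n in (1, 5, 25):
    x1, x2 = simulate(n)
    print(n, round(x1, 4), round(x2, 4))
```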
As an exercise, show that for any t > 1, (1, 0) ∈ 𝒜(t); hence inf{t: (1, 0) is reached in time t} = 1, but this infimum cannot be attained.

As the final existence theorem we consider the time optimal problem when the state equations are linear in x. In this case we do not need any convexity.

Theorem 2 (Olech and Neustadt [4]) Consider the time optimal control problem with state equation

ẋ(t) = A(t)x(t) + f(t, u(t)),  x(0) = x₀,

where f is (separately) continuous and the entries of A are integrable over finite intervals. Suppose Ω, X₁ are as before. If the target set X₁ is hit in some time by an admissible control, then it is hit in minimum time by an optimal control.

Problems
1. Use the maximum principle to solve the following problem: maximize 8x₁(18) + 4x₂(18) subject to

ẋ₁ = 2x₁ + x₂ + u,  ẋ₂ = 4x₁ − 2u.

2. Resolve the control problem in Chapter III, Problem 5, with the cost: maximize x(100).
3. Discuss the optimal control of the system

ẋ₁ = x₂ + u,  ẋ₂ = −u,

which minimizes the cost (1/2)∫₀^{t₁} x₁²(t) dt with x(t₁) = 0. (Note that a singular arc, with u* = −x₁* − x₂*, is possible in this example.)
4. Discuss the minimum time control to the origin for the system, with |u| ≤ 1.
5. Discuss the possible singular solution of the system

ẋ₁ = x₂,  ẋ₂ = −x₂ − x₁u,  x(0) = x⁰,  |u| ≤ 1,

with terminal restraint set X₁ = {(x₁, x₂): x₁² + x₂² = 1} and cost

C(u) = (1/2)∫₀^{t₁} (x₁²(t) + x₂²(t)) dt.

(Show that the singular solution trajectory is x₂* = ±x₁*, and that the optimal control there is u* = −(x₁* + x₂*)/x₁*.)

6. Discuss the optimal solution of the fixed time control problem

|u| ≤ 1,  C(u) = ∫₀ᵀ (x₁² + |u|) dt.

Include the possibility of singular arcs.
7. Consider the following model for a rocket in plane flight. Let x₁(t) and x₂(t) be the cartesian coordinates of the rocket; x₃(t) = dx₁/dt and x₄(t) = dx₂/dt the velocity components; x₅(t) the mass of the rocket; u₁(t) and u₂(t) the direction cosines of the thrust vector; u₃(t) = −dx₅/dt the mass flow rate; c the effective exhaust gas speed (positive constant); and g the magnitude of gravitational acceleration (positive constant). Define

u₁ = cos α,  u₂ = cos β.

The state equations are then

ẋ₁ = x₃,
ẋ₂ = x₄,
ẋ₃ = cu₁u₃/x₅,
ẋ₄ = cu₂u₃/x₅ − g,
ẋ₅ = −u₃,

with x(0) = x₀. The control restraint set is defined by

u₁² + u₂² = 1,  0 ≤ u₃ ≤ M.

Say as much as you can about the control that transfers the rocket from x₀ to (x₁ free, x₂ = A, x₃ free, x₄ free, x₅ = B) (A, B fixed) and minimizes

−∫₀^{t₁} x₃ dt.

(That is, we wish to transfer a rocket of given initial mass and position to a given altitude while expending a prescribed amount of fuel and attaining maximum range.)

8. Show that a time optimal control transferring (0, 0) to (1, 0) does not exist in the example in Section 3.
References
[1] H. Hermes and J. P. LaSalle, "Functional Analysis and Time-Optimal Control." Academic Press, New York, 1969.
[2] E. B. Lee and L. Marcus, "Foundations of Optimal Control Theory." Wiley, New York, 1967.
[3] C. Norstrom, The continuous wheat trading model reconsidered. An application of mathematical control theory with a state constraint. Working Paper, Graduate School of Industrial Administration, Carnegie-Mellon University, 1978.
[4] C. Olech, Extremal solutions of a control system, J. Differential Equations 2, 74-101 (1966).
Chapter V

Numerical Solution of Two-Point Boundary-Value Problems

1. LINEAR TWO-POINT BOUNDARY-VALUE PROBLEMS

We shall consider methods for the numerical solution of the following types of problem:

(i) n first-order equations are to be solved over the interval [t₀, t_f], where t₀ is the initial point and t_f is the final point;
(ii) r boundary conditions are specified at t₀;
(iii) (n − r) boundary conditions are specified at t_f.

Without loss of generality we shall take the problem in the form

ẏᵢ = gᵢ(y₁, y₂, ..., yₙ, t),  i = 1, ..., n,
yᵢ(t₀) = cᵢ,  i = 1, ..., r,                  (1)
yᵢ(t_f) = cᵢ,  i = r + 1, ..., n,
where each gᵢ is twice differentiable with respect to yⱼ. There can be considerable numerical difficulties in solving such problems. It has been shown for

ÿ = 16 sinh 16y,  y(0) = y(1) = 0,

that if we choose the missing initial condition ẏ(0) = s with s > 10⁻⁷, then the solution goes to infinity at some point in (0, 1). (This point is approximately (1/16) ln(8/s).) The true solution is of course y(x) ≡ 0, 0 ≤ x ≤ 1 [2]. Let us first consider the linear case
ẏ(t) = A(t)y(t) + f(t),  (2)

where

yᵢ(t₀) = cᵢ,  i = 1, ..., r,  yᵢ(t_f) = cᵢ,  i = r + 1, ..., n.

So c and f(t) are given. The adjoint system is defined as the solution of the homogeneous equation

ż(t) = −Aᵀ(t)z(t).  (3)

As usual we can write the general solution of (2) as

y(t) = X(t)y(t₀) + ∫_{t₀}^{t} X(t)X(s)⁻¹f(s) ds,  (4)

where

Ẋ = A(t)X,  X(t₀) = I.

Furthermore, the solution of (3) can be written

z(t) = X(t)⁻ᵀz(t₀),  (5)

and so (4) gives

z(t_f)ᵀy(t_f) = z(t_f)ᵀX(t_f)y(t₀) + z(t_f)ᵀ ∫_{t₀}^{t_f} X(t_f)X(s)⁻¹f(s) ds.  (6)

Substituting the transpose of (5) into (6) gives

z(t_f)ᵀy(t_f) = z(t₀)ᵀy(t₀) + ∫_{t₀}^{t_f} z(t_f)ᵀX(t_f)X(s)⁻¹f(s) ds.

Now, again from (5), z(s) = X(s)⁻ᵀX(t_f)ᵀz(t_f), and so

z(t_f)ᵀy(t_f) = z(t₀)ᵀy(t₀) + ∫_{t₀}^{t_f} z(s)ᵀf(s) ds,  (7)
or in component form,

∑_{i=1}^{n} zᵢ(t_f)yᵢ(t_f) − ∑_{i=1}^{n} zᵢ(t₀)yᵢ(t₀) = ∫_{t₀}^{t_f} ∑_{i=1}^{n} zᵢ(s)fᵢ(s) ds.  (8)

Equations (7) and (8) are the basic identities in the method of adjoints, the first of the shooting methods we shall investigate. The method begins by integrating the adjoint equations (3) backward n − r times, with the terminal boundary conditions
z^(1)(t_f) = e_{r+1},  z^(2)(t_f) = e_{r+2},  ...,  z^(n−r)(t_f) = eₙ,  (8′)

where eᵢ denotes the ith unit vector in Rⁿ (a 1 in the ith place, zeros elsewhere). This gives us n − r functions z^(1)(t), z^(2)(t), ..., z^(n−r)(t), t₀ ≤ t ≤ t_f (of course in practice we do the integration at a discrete set of points). Then

z^(m)(t_f)ᵀy(t_f) = ∑_{i=1}^{n} zᵢ^(m)(t_f)yᵢ(t_f) = y_{r+m}(t_f) = c_{r+m},  m = 1, ..., n − r,

and so (8) becomes, with some rearrangement,

∑_{i=r+1}^{n} zᵢ^(m)(t₀)yᵢ(t₀) = c_{r+m} − ∑_{i=1}^{r} zᵢ^(m)(t₀)cᵢ − ∫_{t₀}^{t_f} ∑_{i=1}^{n} zᵢ^(m)(s)fᵢ(s) ds  (9)

for m = 1, ..., n − r. The set of equations (9) is a set of n − r equations in the n − r unknowns {y_{r+1}(t₀), y_{r+2}(t₀), ..., yₙ(t₀)}, which we can solve and thereby obtain a full set of initial conditions at t = t₀.
74
V.
Numerical Solution of Two-Point Boundary-Value Problems
Note that (9) can be rewritten in matrix form as

Z₀ (y_{r+1}(t₀), ..., yₙ(t₀))ᵀ = w,  (10)

where Z₀ is the (n − r) × (n − r) matrix whose (m, i) entry is z_{r+i}^(m)(t₀), and the mth component of w is

c_{r+m} − ∑_{i=1}^{r} zᵢ^(m)(t₀)cᵢ − ∫_{t₀}^{t_f} ∑_{i=1}^{n} zᵢ^(m)(t)fᵢ(t) dt.

The inverse of the z matrix exists since {z^(1)(t_f), ..., z^(n−r)(t_f)} are linearly independent, and consequently so are {z^(1)(t), ..., z^(n−r)(t)} for any t ∈ [t₀, t_f]. (This is a known fact from linear ordinary differential equations.)

We can code this method as follows:

1. Set m = 1.
2. Integrate the adjoint equations backward from t_f to t₀ for the mth set of boundary conditions (8′).
3. Evaluate the mth row of (10).
4. If m = n − r, solve Eq. (10) for the missing initial conditions {y_{r+1}(t₀), ..., yₙ(t₀)}; go to item 6.
5. If m < n − r, set m = m + 1; return to item 2.
6. Now using the full set of initial conditions {yᵢ(t₀)}ᵢ₌₁ⁿ, integrate (1) forward to obtain the solution of the boundary-value problem.

Note that if n − r > r, we save work by reversing the roles of the initial and terminal points.

2. NONLINEAR SHOOTING METHODS
Nonlinear two-point boundary-value problems can be solved by an iterative process. We start with an initial guess

{y_{r+1}^(0)(t₀), y_{r+2}^(0)(t₀), ..., yₙ^(0)(t₀)}.

This will allow us to solve Eqs. (1.1) to find y^(0)(t), t₀ ≤ t ≤ t_f. Then we iterate according to the following scheme: set

δy^(k)(t) = y^(k+1)(t) − y^(k)(t).  (1)

Then by the usual Taylor-series expansion, we have as a first-order approximation

δẏᵢ^(k)(t) = ∑_{j=1}^{n} (∂gᵢ/∂yⱼ)(y^(k)(t), t) δyⱼ^(k)(t),  i = 1, ..., n,

that is,

δẏ^(k)(t) = J(y^(k)(t)) δy^(k)(t)  (2)

(where J is the gradient of g), which is just a set of linear ordinary differential equations with variable coefficients. Furthermore,

δyᵢ^(k)(t₀) = 0,  i = 1, ..., r,
δyᵢ^(k)(t_f) = cᵢ − yᵢ^(k)(t_f),  i = r + 1, ..., n,  k = 0, 1, 2, ....  (3)
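For a scalar second-order equation this iteration is just Newton's method on the missing initial slope, with the correction obtained from the linearization (2). The sketch below uses a standard test problem chosen for this sketch (not from the text): y″ = (3/2)y², y(0) = 4, y(1) = 1, whose solution y(x) = 4/(1+x)² has missing slope y′(0) = −8. The variational (correction) equation δy″ = 3y_k δy is integrated alongside the state.

```python
# Newton shooting for y'' = 1.5*y^2, y(0) = 4, y(1) = 1 (standard test
# problem; exact missing slope is y'(0) = -8).  The variational equation
# (dy)'' = 3*y*(dy), i.e. the linearization (2), supplies the Newton step.
def integrate(s, steps=2000):
    """RK4 on (y, y', dy, dy') with y(0)=4, y'(0)=s, dy(0)=0, dy'(0)=1."""
    h = 1.0 / steps
    w = [4.0, s, 0.0, 1.0]
    def deriv(w):
        y, yp, dy, dyp = w
        return [yp, 1.5 * y * y, dyp, 3.0 * y * dy]
    for _ in range(steps):
        k1 = deriv(w)
        k2 = deriv([wi + 0.5 * h * ki for wi, ki in zip(w, k1)])
        k3 = deriv([wi + 0.5 * h * ki for wi, ki in zip(w, k2)])
        k4 = deriv([wi + h * ki for wi, ki in zip(w, k3)])
        w = [wi + h / 6 * (a + 2 * b + 2 * c + d)
             for wi, a, b, c, d in zip(w, k1, k2, k3, k4)]
    return w                    # values at t = 1

s = -7.0                        # initial guess for the missing slope
for k in range(20):
    y1, _, dy1, _ = integrate(s)
    delta = (1.0 - y1) / dy1    # Newton correction from the linearization
    s += delta
    if abs(delta) < 1e-12:
        break

print(round(s, 6))
```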
Equations (2) and (3) define a linear two-point boundary-value problem for the correction vector δy^(k)(t), t₀ ≤ t ≤ t_f, which we can solve by the previous method, the method of adjoints; namely, as before, we define the adjoint system to (2),

ż^(k)(t) = −J_kᵀ(t)z^(k)(t),  (4)

where J_k = J(y^(k)(t)), and solve (4) backward n − r times with the end point conditions

z^(1)(t_f) = e_{r+1},  z^(2)(t_f) = e_{r+2},  ...,  z^(n−r)(t_f) = eₙ,

as in (8′) of Section 1. Denote the solutions thus obtained at the kth iteration step by

{z^(1)(t), z^(2)(t), ..., z^(n−r)(t)}_(k).
Then the fundamental identity tells us that

∑_{i=r+1}^{n} zᵢ^(m)(t₀) δyᵢ^(k)(t₀) = δy_{r+m}^(k)(t_f),  m = 1, ..., n − r.  (5)

[Note that (2) has no forcing term, simplifying Eqs. (5): the integral term drops out.] Solving (5) gives us a complete set of initial conditions δy^(k)(t₀), and we get the next choice of initial conditions for y by setting

y^(k+1)(t₀) = y^(k)(t₀) + δy^(k)(t₀).

Now we solve for y^(k+1)(t), t ∈ [t₀, t_f], and return to (2) to calculate J(y^(k+1)), δy^(k+1), etc. We terminate whenever max{|δyᵢ^(k)(t_f)|: i = r + 1, ..., n} is sufficiently small or k is too large.

This is called the shooting method; we guess the unknown initial values, solve the equations, and then, on the basis of this solution, make corrections to the previous initial values. So in shooting methods our aim is to find the missing initial data. It can be shown that shooting methods are a special case of the Newton-Raphson method, and so, provided that our initial guess is sufficiently close, J_k is nonsingular, and the interval [t₀, t_f] is not too large, the approximations converge quadratically.

To recapitulate, the method of adjoints for nonlinear ordinary differential equations is carried out as follows:

1. Determine analytically the gradient (∂gᵢ/∂yⱼ).
2. Initialize the counter on the iterative process. Set k = 0.
3. For k = 0, guess the missing initial conditions yᵢ^(0)(t₀), i = r + 1, ..., n.
4. Integrate (1) of Section 1 with initial conditions yᵢ^(k)(t₀) = cᵢ, i = 1, 2, ..., r, together with the current guesses yᵢ^(k)(t₀), i = r + 1, ..., n, and store y^(k)(t).
5. Set the counter on the integration of the adjoint equations, m = 1.
6. Calculate zᵢ^(m)(t₀), i = r + 1, ..., n, by integrating the adjoint equations (4) backward from t_f to t₀, with final data z^(m)(t_f) as in Eq. (8′) of Section 1. Note that in this integration the stored profiles y^(k)(t) are used to evaluate the partial derivatives ∂gᵢ/∂yⱼ, i, j = 1, ..., n.
7. For the mth row of (5), form the right-hand side of (5) by subtracting the specified terminal value yᵢ(t_f) = cᵢ from the calculated value yᵢ^(k)(t_f), i = r + 1, ..., n, found in item 4.
8. If m < n − r, set m = m + 1 and go to 6.
9. Form the set of n − r linear algebraic equations (5) and solve for δyᵢ^(k)(t₀), i = r + 1, ..., n.
10. Form the next set of trial values by yᵢ^(k+1)(t₀) = yᵢ^(k)(t₀) + δyᵢ^(k)(t₀), i = r + 1, ..., n.
11. Set k = k + 1; return to 4.
12. Terminate whenever max{|δyᵢ^(k)(t_f)|: i = r + 1, ..., n} is sufficiently small, or whenever k exceeds a maximum value.

3. NONLINEAR SHOOTING METHODS: IMPLICIT BOUNDARY CONDITIONS

In this section we consider the two-point boundary-value problem of Eqs. (1.1) with implicit boundary conditions that are functions of both the initial and the terminal conditions:

qᵢ(y₁(t₀), y₂(t₀), ..., yₙ(t₀), y₁(t_f), ..., yₙ(t_f)) = 0,  i = 1, ..., n.  (1)

Let us define the variation in qᵢ as

δqᵢ = qᵢ − qᵢ^{true},  i = 1, 2, ..., n.  (2)
Since qᵢ^{true} = 0, up to second order we can approximate δqᵢ by

δqᵢ = ∑_{j=1}^{n} (∂qᵢ/∂yⱼ(t₀)) δyⱼ(t₀) + ∑_{j=1}^{n} (∂qᵢ/∂yⱼ(t_f)) δyⱼ(t_f),  i = 1, 2, ..., n,  (3)

where ∂qᵢ/∂yⱼ(t₀) and ∂qᵢ/∂yⱼ(t_f) are the gradients evaluated at y(t₀) and y(t_f), respectively. Equations (3) are n equations in the 2n variables {δy(t₀), δy(t_f)}. However, from the
fundamental identity of adjoints we can relate δy(t₀) and δy(t_f) as in Eq. (2.5):

∑_{i=1}^{n} zᵢ^(j)(t₀) δyᵢ(t₀) = δyⱼ(t_f),  j = 1, 2, ..., n,  (4)

where {z^(1), ..., z^(n)} are the solutions of the adjoint equations (2.4) with terminal conditions

z^(1)(t_f) = e₁,  z^(2)(t_f) = e₂,  ...,  z^(n)(t_f) = eₙ.  (5)

If we substitute (4) into (3), we get

δqᵢ = ∑_{j=1}^{n} (∂qᵢ/∂yⱼ(t₀)) δyⱼ(t₀) + ∑_{j=1}^{n} (∂qᵢ/∂yⱼ(t_f)) (∑_{s=1}^{n} z_s^(j)(t₀) δy_s(t₀)),  i = 1, 2, ..., n.  (6)
On rearranging (6), we find that

δqᵢ = ∑_{p=1}^{n} (∂qᵢ/∂y_p(t₀) + ∑_{j=1}^{n} (∂qᵢ/∂yⱼ(t_f)) z_p^(j)(t₀)) δy_p(t₀),  i = 1, 2, ..., n.  (7)

Equation (7) is a set of n equations in the n unknowns {δy(t₀)}, which can be solved, and from δy(t₀) we get our new guess for the initial data. To recapitulate:

1. Determine analytically the partial derivatives ∂qᵢ/∂yⱼ(t₀) and ∂qᵢ/∂yⱼ(t_f).
2. Initialize the counter on the iterative process. Set k = 0.
3. For k = 0, guess the missing initial conditions yᵢ^(0)(t₀), i = 1, 2, ..., n.
4. Integrate (1.1) with initial conditions y^(k)(t₀) and store the profiles yᵢ^(k)(t), i = 1, 2, ..., n.
5. Using the initial values y^(k)(t₀) and calculated final values y^(k)(t_f), evaluate δqᵢ, i = 1, 2, ..., n, by (1) and (2).
6. If max{|δq₁|, |δq₂|, ..., |δqₙ|} is sufficiently small, or if k is greater than the maximum, terminate. Otherwise go to 7.
7. For each yⱼ(t_f) appearing in the implicit boundary conditions (1), integrate the adjoint equations (2.4) backward with terminal data (5). The profiles yᵢ^(k)(t), i = 1, 2, ..., n, that are stored in 4 are used to evaluate the derivatives ∂gᵢ/∂yⱼ. Save {z^(1)(t₀), ..., z^(n)(t₀)}.
8. Using the expressions in 1 for ∂qᵢ/∂y, evaluate ∂qᵢ/∂yⱼ(t₀) and ∂qᵢ/∂yⱼ(t_f), i, j = 1, 2, ..., n, and form the left-hand side of (7).
9. Solve (7) for δyᵢ(t₀) and call the solution δyᵢ^(k)(t₀) for the kth iteration step, i = 1, 2, ..., n.
10. Form the next set of trial conditions yᵢ^(k+1)(t₀) = yᵢ^(k)(t₀) + δyᵢ^(k)(t₀), i = 1, ..., n.
11. Set k = k + 1; return to 4.

4. QUASI-LINEARIZATION

Let us reconsider the system of n nonlinear ordinary differential equations:
ẏᵢ = gᵢ(y₁, y₂, ..., yₙ, t),  i = 1, 2, ..., n,  (1)
yᵢ(t₀) = cᵢ,  i = 1, 2, ..., r,  (2)
yᵢ(t_f) = cᵢ,  i = r + 1, ..., n.  (3)

Suppose we have the kth nominal solution to Eqs. (1)-(3), y^(k)(t) over [t₀, t_f], in the sense that the initial and terminal conditions are satisfied exactly, but the profiles y^(k)(t) satisfy the differential equation (1) only approximately. We expand the right-hand side of (1) in a Taylor series up through first-order terms around the nominal solution y^(k)(t); namely, we approximate

gᵢ(y^(k+1)) ≈ gᵢ(y^(k)) + Jᵢ(y^(k))(y^(k+1) − y^(k)),  i = 1, ..., n,  (4)

where Jᵢ(y^(k))
is the ith row of the gradient evaluated at y^(k). Since we want

ẏᵢ^(k+1) = gᵢ(y^(k+1)),  i = 1, ..., n,  (5)

by substituting (4) in (5) we arrive at the ordinary differential equation for y^(k+1):

ẏᵢ^(k+1)(t) = gᵢ(y₁^(k)(t), ..., yₙ^(k)(t), t) + ∑_{j=1}^{n} (∂gᵢ/∂yⱼ)(yⱼ^(k+1)(t) − yⱼ^(k)(t)).  (6)

On rearranging terms in (6), we have

ẏ^(k+1)(t) = J(y^(k)(t))y^(k+1)(t) + f(t),  k = 0, 1, 2, ...,  (7)

where J(y^(k)(t)) is an n × n matrix with elements ∂gᵢ/∂yⱼ evaluated at y^(k)(t), and f(t) is an n × 1 vector with elements

gᵢ(y₁^(k)(t), ..., yₙ^(k)(t), t) − ∑_{j=1}^{n} (∂gᵢ/∂yⱼ) yⱼ^(k)(t),  i = 1, 2, ..., n.
Since we are clamping the boundary conditions at each iteration, we set

yᵢ^(k)(t₀) = yᵢ(t₀) = cᵢ,  i = 1, 2, ..., r,
yᵢ^(k)(t_f) = yᵢ(t_f) = cᵢ,  i = r + 1, ..., n.  (8)

Equations (7) and (8) define a linear two-point boundary-value problem that can be solved by the method of adjoints to give the (k + 1)th approximation to Eqs. (1)-(3). Theoretically, for a solution to the nonlinear problem, we require

lim_{k→∞} yᵢ^(k)(t) = yᵢ(t),  i = 1, 2, ..., n,  t₀ ≤ t ≤ t_f.

Numerically it is sufficient for

|yᵢ^(k+1)(t) − yᵢ^(k)(t)| < ε,  i = 1, 2, ..., n,  t₀ ≤ t ≤ t_f.

Recapitulating, quasi-linearization consists of the following steps:

1. Linearize the right-hand side of (1) to obtain (7).
2. For k = 0, provide nominal profiles y₁^(0)(t), ..., yₙ^(0)(t), t₀ ≤ t ≤ t_f, that satisfy the boundary conditions.
3. For the kth iteration, using as nominal profiles y^(k)(t), solve the linear two-point boundary-value problem (7) and (8).
4. Test whether |yᵢ^(k+1)(t) − yᵢ^(k)(t)| < ε, i = 1, 2, ..., n, t₀ ≤ t ≤ t_f. If satisfied, exit; otherwise set k = k + 1 and go to 3.
It can be shown that when quasi-linearization converges it does so quadratically, but again it is a Newton-type method (now in the function space C[t₀, t_f]) and so the initial guess is very important.
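The iteration can be sketched on a standard test problem chosen for this sketch (not from the text): y″ = (3/2)y², y(0) = 4, y(1) = 1, with exact solution y(x) = 4/(1+x)². Linearizing g(y) = (3/2)y² about the nominal profile y_k, as in (4), gives the linear equation y″ = 3y_k y − (3/2)y_k²; for brevity each linearized problem is solved here by central differences and a tridiagonal solve rather than by the method of adjoints.

```python
# Quasi-linearization for y'' = 1.5*y^2, y(0) = 4, y(1) = 1.
N = 200                          # interior grid points
h = 1.0 / (N + 1)
ya, yb = 4.0, 1.0
x = [(j + 1) * h for j in range(N)]

def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system by forward elimination/back substitution."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for j in range(1, n):
        m = diag[j] - sub[j] * c[j - 1]
        c[j] = sup[j] / m
        d[j] = (rhs[j] - sub[j] * d[j - 1]) / m
    y = [0.0] * n
    y[-1] = d[-1]
    for j in range(n - 2, -1, -1):
        y[j] = d[j] - c[j] * y[j + 1]
    return y

yk = [ya + (yb - ya) * xi for xi in x]   # nominal profile: straight line

for k in range(50):
    # Discretized linearized equation: (y_{j-1}-2y_j+y_{j+1})/h^2 - 3*yk_j*y_j
    # = -1.5*yk_j^2, with the boundary values clamped as in (8).
    sub = [1.0 / h ** 2] * N
    sup = [1.0 / h ** 2] * N
    diag = [-2.0 / h ** 2 - 3.0 * yk[j] for j in range(N)]
    rhs = [-1.5 * yk[j] ** 2 for j in range(N)]
    rhs[0] -= ya / h ** 2
    rhs[-1] -= yb / h ** 2
    ynew = thomas(sub, diag, sup, rhs)
    delta = max(abs(a - b) for a, b in zip(ynew, yk))
    yk = ynew
    if delta < 1e-12:
        break

err = max(abs(yk[j] - 4.0 / (1.0 + x[j]) ** 2) for j in range(N))
print(f"iterations: {k + 1}, max error vs exact: {err:.2e}")
```

Only a handful of iterations are needed, consistent with the quadratic convergence noted above; the remaining error is the O(h²) discretization error.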
5. FINITE-DIFFERENCE SCHEMES AND MULTIPLE SHOOTING

Finite-difference schemes have proved successful for numerically unstable problems, since the finite-difference equations incorporate both the known initial and final data, as does quasi-linearization. Therefore the solution is constrained to satisfy these boundary conditions. The disadvantages are that the preparation of the problem is arduous and the solution involves solving large numbers of (in general) nonlinear simultaneous equations by, say, Newton-Raphson or other iterative methods. To see why this arises, consider a simple linear problem

d²y/dt² = y + t,  a ≤ t ≤ b,  (1)
y(a) = c₁,  y(b) = c₂.  (2)

The interval [a, b] is divided into N + 1 intervals of length h = (b − a)/(N + 1), and the discrete points are given by

tⱼ = a + j(b − a)/(N + 1),  j = 0, ..., N + 1,

where t₀ = a and t_{N+1} = b. Set yⱼ = y(tⱼ). Then we replace y″ by the second central difference

y″ = (y_{j+1} − 2yⱼ + y_{j−1})/h² + O(h²),

and the discrete approximation to (1) is

−y_{j−1} + (2 + h²)yⱼ − y_{j+1} = −h²tⱼ,  j = 1, 2, ..., N.  (3)

Then (1) and (2) can be approximated by the N equations in the N unknowns {y₁, ..., y_N}, that is,

(2 + h²)y₁ − y₂ = −h²t₁ + c₁,
−y₁ + (2 + h²)y₂ − y₃ = −h²t₂,
Or in matrix form,

[ 2 + h²    −1                       ] [ y₁      ]   [ −h²t₁ + c₁  ]
[  −1     2 + h²    −1               ] [ y₂      ]   [ −h²t₂       ]
[        ⋱        ⋱       ⋱          ] [ ⋮       ] = [ ⋮           ]   (4)
[           −1    2 + h²    −1       ] [ y_{N−1} ]   [ −h²t_{N−1}  ]
[                   −1    2 + h²     ] [ y_N     ]   [ −h²t_N + c₂ ]

since y₀ = y(a) = c₁ and y_{N+1} = y(b) = c₂. Since this problem is linear, we have here a linear set of equations. In general, for nonlinear ordinary differential equations we shall get a set of nonlinear equations that must be solved by iteration. For example, the central difference approximation to

−y″ + y² = t,  y(a) = c₁,  y(b) = c₂,

is

−y_{j−1} + 2yⱼ + h²yⱼ² − y_{j+1} = h²tⱼ,  j = 1, 2, ..., N.
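For the linear problem (1)-(2) the tridiagonal system (4) can be solved directly; the sketch below takes a = 0, b = 1, c₁ = c₂ = 0, for which the exact solution is y(t) = sinh t / sinh 1 − t.

```python
import math

# Finite-difference solution of y'' = y + t, y(0) = 0, y(1) = 0, via the
# tridiagonal system (4); sub- and super-diagonal entries are both -1.
N = 99
h = 1.0 / (N + 1)
t = [(j + 1) * h for j in range(N)]
c1 = c2 = 0.0

diag = [2.0 + h * h] * N
rhs = [-h * h * tj for tj in t]
rhs[0] += c1
rhs[-1] += c2

# Forward elimination / back substitution specialized to this matrix.
cp, dp = [0.0] * N, [0.0] * N
cp[0], dp[0] = -1.0 / diag[0], rhs[0] / diag[0]
for j in range(1, N):
    m = diag[j] + cp[j - 1]
    cp[j] = -1.0 / m
    dp[j] = (rhs[j] + dp[j - 1]) / m
y = [0.0] * N
y[-1] = dp[-1]
for j in range(N - 2, -1, -1):
    y[j] = dp[j] - cp[j] * y[j + 1]

exact = [math.sinh(tj) / math.sinh(1.0) - tj for tj in t]
err = max(abs(a - b) for a, b in zip(y, exact))
print(f"max error: {err:.2e}")
```

The boundary data enter only through the right-hand side, which is why the computed solution satisfies them exactly, as remarked above.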
Finally, for unstable problems a combination of shooting and finite-difference ideas, called multiple shooting, has given promising results. Here the interval [t₀, t_f] is divided into equal parts, over each of which a shooting method is applied (see Fig. 1). However, in contrast to finite-difference methods, the number of intervals is usually relatively small. Note that in solving our general problem by shooting over [t₀, t_f], if we use an integration scheme of order p and step size h, the errors in the numerical solution of the differential equations are of the order K₁hᵖ, where the constant K₁ grows with the length of the interval over which we shoot, and so reducing the size of this interval can substantially reduce the errors [1]. For instance, consider the simple

Fig. 1 Multiple shooting.
two-point boundary-value problem (TPBVP)

ẏ₁(t) = f₁(t, y(t)),  ẏ₂(t) = f₂(t, y(t)),  (5)

over 0 ≤ t ≤ 1, where y₁(0) = a, y₂(1) = b. If we break up the interval [0, 1] into two equal subintervals [0, ½] and [½, 1] (see Fig. 2), and apply shooting over each, guessing y₂(0), y₁(½), y₂(½), then we can formulate this problem as a four-dimensional TPBVP of a type considered earlier. First, we make a time change to reduce the equation

ẏ = f(t, y)  (6)

on [½, 1] to an equation over [0, ½]. That is, if we set τ = t − ½ and z(τ) = y(τ + ½), 0 ≤ τ ≤ ½, then z is the solution of

ż(τ) = f(τ + ½, z(τ)) ≡ g(τ, z(τ)),  (7)

with z₂(½) = b. Furthermore, continuity across t = ½ requires that

y₂(½) = z₂(0).  (8)

Therefore our original problem can be equivalently stated as the four-dimensional system

(ẏ₁, ẏ₂, ż₁, ż₂)(τ) = (f₁(τ, y), f₂(τ, y), g₁(τ, z), g₂(τ, z)),  0 ≤ τ ≤ ½,  (9)

Fig. 2
with boundary conditions (5), (7), (8), collected into the implicit linear form

Bw(0) + Cw(½) = d,  w = (y₁, y₂, z₁, z₂),  (10)

which is of the type considered in Section 3. Keller [1] has shown that if we partition the interval [t₀, t_f] into N equal subintervals, then we must now solve an nN system of ordinary differential equations by shooting; however, the error can be bounded by K₁hᵖ with the constant K₁ now determined by the subinterval length (t_f − t₀)/N rather than by the whole interval.
6. SUMMARY

Method: Shooting
Advantages: 1. Easy to apply. 2. Requires only (n − r) missing initial data points. 3. Converges quadratically when it converges.
Disadvantages: 1. Stability problems, particularly over long intervals. 2. Must solve nonlinear differential equations.

Method: Quasi-linearization
Advantages: 1. Only need to solve linear differential equations. 2. Converges quadratically. 3. For numerically sensitive problems the quasi-linear equations may be stable.
Disadvantages: 1. Must select initial profiles. 2. Need to store y^(k)(t) and y^(k+1)(t).

Method: Finite-Difference Schemes
Advantages: 1. Good for numerically unstable problems.
Disadvantages: 1. Arduous to code. 2. May need to solve large numbers of nonlinear equations.
Problems 1. Apply the method of adjoints to the equation ji
= y + t,
y(O) = 0,
y(1) =
tX.
References
85
2. Apply the method of Section 3 to the nonlinear TPBVP

    3y y'' + y² = 0,

with the boundary conditions y(0) = α, ….
3. Write a program to solve the TPBVP

    y'(t) = A(t)y(t) + f(t),    0 ≤ t ≤ 1,    y(t) = (y₁(t), …, yₙ(t)),
    B y(0) + C y(1) = d,

for given A(t), f(t), B, C, and d. As an example, solve … with y₃(0) = −0.75, y₁(1) = 13.7826, y₃(1) = 5.64783, over 0 ≤ t ≤ 1, and give the resultant y values at intervals of 0.05. [2]
References
[1] H. B. Keller, "Numerical Methods for Two-Point Boundary Value Problems." Ginn (Blaisdell), Boston, Massachusetts, 1968.
[2] S. Roberts and J. Shipman, "Two-Point Boundary Value Problems." Elsevier, Amsterdam, 1972.
Chapter VI
Dynamic Programming and Differential Games
The fundamental principle of dynamic programming, the so-called principle of optimality, can be stated quite simply.
If an optimal trajectory is broken into two pieces, then the last piece is itself optimal.
A proof of this principle, sufficient for many applications, can be given quite simply (see Fig. 1). Suppose path (2) is not optimal. Then we can find a "better" trajectory (2') beginning at (x', t'), which gives a smaller cost than (2). Now, tracing (1) to (x', t') and (2') from then on must give a smaller overall cost, which contradicts the supposed optimality of (1)-(2). (Of course this proof requires that pieces of admissible trajectory can be concatenated to form an admissible trajectory. This is not necessarily always true, and indeed the principle of optimality may fail to hold.)

1. DISCRETE DYNAMIC PROGRAMMING
Let us consider the network with associated costs shown in Fig. 2. We can regard this as an approximation to a fixed-time problem, where we have partitioned the time interval into three parts, and we have restricted the number of states that are attainable to three after the first time step, three after the second time step, and four after the third.

[Fig. 1: an optimal trajectory (1)-(2) broken at (x', t').]
[Fig. 2: the network of paths, path payoffs, and boxed terminal payoffs.]

The number on each path refers to the payoff when that path is chosen. Furthermore, not all future states need be attainable from previous states. Finally, the numbers in boxes refer to the extra payoff we get for finishing up at that point. This corresponds to a term g(x(T)) in the cost functional. Of course they may be zero. Let us work out the path that maximizes the total payoff (Fig. 3). Remembering the idea of dynamic programming, we start at the rear (i.e., state 3) and move backward. Now calculate all possible payoffs in going from state 2 to state 3. Remembering that the final section of an optimal trajectory is itself optimal, we can immediately disregard the lower path from P₁, since this gives only 5 units of payoff, whereas the upper path gives 17 units. Another way of looking at this is to
[Fig. 3: the backward step from stage 2, with best payoffs 17, 15, 13 marked at P₁, P₂, P₃.]

suppose our optimal trajectory (from 0) lands us at P₁; then clearly over our final stretch we will pick up 17 units and not 5. To indicate that this path is the best from this point, we mark it with an arrow and put 17 next to P₁. Continuing similarly with P₂ and P₃, we find 15 and 13 as the best payoffs from these points. Then we apply the same argument in going from step 1 to step 2, and we obtain the result shown in Fig. 4. Note that both paths from the middle point at step 1 give the same cost; hence we arrow both paths. Now, with one final application (Fig. 5), we have finished. We see that the maximum payoff is 24, and we can follow our arrows backward to get the optimal path: it is the upper path in this case.

[Fig. 4: the same argument applied from step 1 to step 2.]
[Fig. 5: the final backward step; the maximum payoff is 24.]
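The backward sweep of Figs. 3-5 is mechanical enough to automate. The Python sketch below runs backward induction on a small layered network; the arc payoffs and terminal bonuses are illustrative stand-ins (the figures' actual numbers are not reproduced here), and all names are our own.

```python
# Backward induction on a small layered network (cf. Figs. 2-5).
# Payoffs below are illustrative stand-ins, not the book's figures.

# arcs[k][(i, j)] = payoff for moving from state i at stage k
#                   to state j at stage k + 1 (missing pairs are unattainable)
arcs = [
    {(0, 0): 4, (0, 1): 6, (0, 2): 3},             # stage 0 -> 1
    {(0, 0): 7, (0, 1): 2, (1, 1): 5, (2, 2): 8},  # stage 1 -> 2
    {(0, 0): 3, (1, 0): 6, (1, 1): 1, (2, 1): 9},  # stage 2 -> 3
]
terminal = {0: 2, 1: 0}    # boxed extra payoff g(x(T)) at the final states

def solve(arcs, terminal):
    """Return (value table, best-successor table) by working backward."""
    value, policy = [terminal], []
    for layer in reversed(arcs):
        v, p = {}, {}
        for (i, j), payoff in layer.items():
            if j in value[0]:                  # j must reach the end
                cand = payoff + value[0][j]
                if cand > v.get(i, float("-inf")):
                    v[i], p[i] = cand, j
        value.insert(0, v)
        policy.insert(0, p)
    return value, policy

value, policy = solve(arcs, terminal)
print(value[0][0])            # maximum total payoff from the start state

# Follow the "arrows" forward to recover the optimal path:
path, state = [0], 0
for p in policy:
    state = p[state]
    path.append(state)
print(path)
```

Storing a best successor at every node is exactly the "arrow" device used in the figures: the optimal path is recovered by following the arrows forward from the start.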
Unfortunately, if we discretize our time interval into many pieces (n) and allow many control values at each step (m), then we can see that the number of possible paths will be enormous (mⁿ). This is the main disadvantage of dynamic programming; it is often called the curse of dimensionality.

2. CONTINUOUS DYNAMIC PROGRAMMING - CONTROL PROBLEMS

We now use dynamic programming to derive necessary conditions for optimality for a control problem of the type

    minimize ∫_{t₀}^{T} f₀(x(t), u(t)) dt    (1)

subject to

    x'(t) = f(x(t), u(t)),    x(t₀) = x₀,    t₀ ≤ t ≤ T.    (2)
Let V(x₁, t₁) be the optimal cost for this control problem with initial condition x(t₁) = x₁, t₀ ≤ t₁ ≤ T. In the following we shall suppose that V, f, and f₀ are sufficiently differentiable. This is a strong assumption. Suppose that u* is an optimal control over [t₀, T] for (1) and (2), and that δ > 0 is sufficiently small. Then

    V(x₀, t₀) = ∫_{t₀}^{T} f₀(x(t : u*), u*(t)) dt
              = ∫_{t₀}^{t₀+δ} f₀(x(t : u*), u*(t)) dt + ∫_{t₀+δ}^{T} f₀(x(t : u*), u*(t)) dt.    (3)

However, by the principle of optimality, the final term in (3) is just V(x(t₀ + δ : u*), t₀ + δ); that is,

    V(x₀, t₀) = ∫_{t₀}^{t₀+δ} f₀(x(t : u*), u*(t)) dt + V(x(t₀ + δ : u*), t₀ + δ).    (4)
Now, using the regularity of f and V, we can expand in a Taylor series and see that

    x(t₀ + δ : u*) = x₀ + δ f(x₀, u*(t₀)) + o(δ),    (5)

    V(x(t₀ + δ : u*), t₀ + δ) = V(x₀, t₀) + δ{∇ₓV(x₀, t₀) · f(x₀, u*(t₀)) + V_t(x₀, t₀)} + o(δ),    (6)

    ∫_{t₀}^{t₀+δ} f₀(x(t : u*), u*(t)) dt = δ f₀(x₀, u*(t₀)) + o(δ),    (7)

where lim_{δ→0} o(δ)/δ = 0. If Eqs. (5)-(7) are substituted in (4) and δ → 0, we arrive at the following partial differential equation for V:

    V_t(x₀, t₀) = −{∇ₓV(x₀, t₀) · f(x₀, u*(t₀)) + f₀(x₀, u*(t₀))}.    (8)
If we return to Eq. (3), we can derive a little more information about the optimal control. Thus

    V(x₀, t₀) = min_u {∫_{t₀}^{t₀+δ} f₀(x(t : u), u(t)) dt + V(x(t₀ + δ : u), t₀ + δ)},

the minimum being over all admissible controls u. Now expanding as before in a Taylor series gives

    V_t(x₀, t₀) = −min_{u(t₀)∈Ω} {∇ₓV(x₀, t₀) · f(x₀, u(t₀)) + f₀(x₀, u(t₀))}.    (9)

Since u(t₀) can take any value in Ω, and since x₀ and t₀ are arbitrary, (9) implies that

    V_t(x, t) = −min_{w∈Ω} {∇ₓV(x, t) · f(x, w) + f₀(x, w)},    (10)

for all x, t > 0, and the optimal control u* is just the value of w ∈ Ω that achieves the minimum in (10); u* = u*(x, t). Equation (10) is called the Hamilton-Jacobi-Bellman (HJB) equation, and it immediately leads to an optimal control in feedback form. The partial differential equation (10) has the natural terminal condition

    V(x, T) = 0.    (11)
When the control problem has a cost of the form

    minimize {g(x(T)) + ∫_{t₀}^{T} f₀(x, u) dt},

Eq. (11) is replaced by

    V(x, T) = g(x),    x ∈ Rⁿ.
In general it can be shown that the following theorem holds [3].

Theorem 1  Suppose that f and f₀ are continuously differentiable and the solution V of (10) is twice continuously differentiable. If u* achieves the minimum in (10) and u* is piecewise continuous, then u* is an optimal control. Conversely, if there exists an optimal control for all (x₀, t₀), then V satisfies (10).

If we now set

    ψᵢ(t) = −(∂V/∂xᵢ)(x*(t), t),    i = 1, 2, …, n,    (12)

evaluated along an optimal trajectory x*, then
    ψᵢ'(t) = −(d/dt)(∂V/∂xᵢ)(x*(t), t)
           = −Σ_{j=1}^{n} (∂²V/∂xᵢ∂xⱼ)(x*(t), t)(dxⱼ*/dt)(t) − (∂²V/∂xᵢ∂t)(x*(t), t)
           = −Σ_{j=1}^{n} (∂²V/∂xᵢ∂xⱼ)(x*(t), t) fⱼ(x*(t), u*(t))
             + (∂/∂xᵢ){Σ_{j=1}^{n} (∂V/∂xⱼ)(x*(t), t) fⱼ(x*(t), u*(t)) + f₀(x*(t), u*(t))}
           = Σ_{j=1}^{n} (∂V/∂xⱼ)(x*(t), t)(∂fⱼ/∂xᵢ)(x*(t), u*(t)) + (∂f₀/∂xᵢ)(x*(t), u*(t)),

using (10) and the state equation. In other words,

    ψᵢ' = −Σ_{j=1}^{n} ψⱼ (∂fⱼ/∂xᵢ)(x*, u*) + (∂f₀/∂xᵢ)(x*, u*),    i = 1, 2, …, n,    (13)

with ψ(T) = 0. Of course Eqs. (13) are the adjoint equations. The other statements of the maximum principle follow analogously; thus under the rather strong regularity assumptions on V, we have derived the maximum principle. The relation (12) also gives us a natural interpretation of the adjoint variables: they are the negatives of the rates of change of the cost with respect to x; that is, −ψᵢ is the marginal cost of xᵢ.
As an example, let us reconsider the control problem

    minimize ∫_0^T (ax² + u²) dt

subject to

    x' = ax + βu,    x(0) = x₀.

The HJB equation becomes

    ∂V/∂t = −min_w {ax² + w² + (∂V/∂x)(ax + βw)},    V(x, T) = 0.    (14)

Since the minimum of the function h(w) = w² + β(∂V/∂x)w occurs at u* = −(β/2)(∂V/∂x), (14) becomes

    −∂V/∂t = ax² − (β²/4)(∂V/∂x)² + ax(∂V/∂x),    (15)

which, when solved for V, gives the optimal cost and optimal control. We try to compute a solution of the form V(x, t) = c(t)x²; substituting into (15), we find that c must satisfy

    −c'(t) = a − β²c(t)² + 2ac(t),    c(T) = 0,
which can be solved as before.

As one of the most useful applications of dynamic programming, consider the general quadratic-cost linear control problem

    minimize (1/2){x(T) · Sx(T) + ∫_0^T [x(t) · Qx(t) + u(t) · Ru(t)] dt}    (16)

subject to

    x'(t) = Ax(t) + Bu(t),    x(0) = x₀,    0 ≤ t ≤ T.    (17)

Here T is a given fixed final time, Q and S are nonnegative definite n × n matrices, and R is a positive definite m × m matrix. As controls we allow any square-integrable function u(t) ∈ Rᵐ. Applying the HJB equation, we find that

    V_t(x, t) = −min_{w∈Rᵐ} {∇ₓV(x, t) · (Ax + Bw) + (1/2)x · Qx + (1/2)w · Rw}.    (18)

A little computation shows that the minimum occurs at

    u*(x, t) = −R⁻¹Bᵀ∇ₓV(x, t),    0 ≤ t ≤ T,    (19)
and hence the HJB equation becomes

    V_t = −(1/2)x · Qx + (1/2)∇ₓV · BR⁻¹Bᵀ∇ₓV − ∇ₓV · Ax,    (20)

where V(x, T) = (1/2)x · Sx. By analogy with the previous example, we may be led to seek a solution of (20) in the form

    V(x, t) = (1/2)x · K(t)x,    (21)

and, in fact, we find that such a solution indeed exists if K(t) satisfies the matrix Riccati equation

    K'(t) = −K(t)A − AᵀK(t) + K(t)BR⁻¹BᵀK(t) − Q,    0 ≤ t ≤ T,    (22)
with K(T) = S. The existence and uniqueness of such a K is guaranteed by the existence and uniqueness of optimal controls for problem (16)-(17) and by Theorem 1. Of course, in this example, V has the required differentiability properties. Once the function K has been computed and stored (in practice, of course, this would be done at some discrete set of points in [0, T]), the optimal control law

    u*(x, t) = −R⁻¹BᵀK(t)x    (23)

is completely determined in feedback form. Finally, the optimum trajectory is the solution of

    x*'(t) = (A − BR⁻¹BᵀK(t))x*(t),    x*(0) = x₀,    (24)

and the optimal cost from x₀ is

    C(u*) = (1/2)x₀ · K(0)x₀.    (25)
Example 1  Suppose that we are given an ideal response ξ(t) for the system (17) and that we wish to minimize the deviation of the actual response x(t) from the ideal ξ(t) over some period [0, T]. Furthermore, we wish to use as small a control energy as possible. Setting

    e(t) = x(t) − ξ(t),    0 ≤ t ≤ T,    (26)

our cost could be

    C(u) = (1/2)e(T) · Se(T) + (1/2)∫_0^T (e(t) · Qe(t) + u(t) · Ru(t)) dt;    (27)

e(t) then represents our "error" at time t; the matrices S, Q, and R are chosen to weight the respective terms of the cost functional according to our priorities.
Since x' = Ax + Bu, substituting in (26) we see that

    e' = Ae + Bu + w(t),    (28)

where w(t) = Aξ(t) − ξ'(t) is a fixed known function, and

    e(0) = x₀ − ξ(0).    (29)
The control problem defined by (27)-(29) is analogous to the type considered previously [Eq. (28) has a further term on the right-hand side, but this only slightly changes the argument]. As before, we can derive the following theorem.

Theorem 2  If R > 0 and S, Q ≥ 0, an optimal control exists, is unique, and is given by

    u*(t) = R⁻¹Bᵀ(g(t) − K(t)x*(t)),    (30)

where K is the solution of the Riccati differential equation (22) with

    K(T) = S.    (31)

The vector g(t) is the solution of the linear equation

    g'(t) = −(A − BR⁻¹BᵀK(t))ᵀ g(t) − Qξ(t).    (32)

The optimal trajectory satisfies

    x*'(t) = Ax*(t) + BR⁻¹Bᵀ(g(t) − K(t)x*(t)),    x*(0) = x₀.    (33)

The minimum value of the cost is

    C(u*) = (1/2)x*(T) · Sx*(T) − ξ(T) · Sx*(T) + ξ(T) · K(T)ξ(T).    (34)
Example 2  An Infinite-Horizon Control Problem (Lee and Markus [3])  When the basic interval [0, T] becomes infinite, that is, T = +∞, the theory given above leads to the linear regulator problem: find the control that minimizes the total error over all time. To simplify the analysis, we consider the linear autonomous system

    x' = Ax + Bu,    x(0) = x₀,    (35)

for A and B constant matrices, and cost functional

    C(u) = (1/2)∫_0^∞ {x(t) · Qx(t) + u(t) · Ru(t)} dt,    (36)

where Q, R > 0 are constant symmetric matrices.
The first problem that immediately arises is finding when (36) will be finite. Clearly we want the solutions of (35) (or at least the optimum trajectory) to decay to zero as t → ∞. The set of admissible controls will be the space of all m-vector functions that are square integrable over (0, ∞). It turns out that the required assumption on A and B is

    rank[B, AB, A²B, …, Aⁿ⁻¹B] = n.    (37)

(This condition is called controllability, and we shall consider it in some detail in Chapter VII.)

Theorem 3  Consider the autonomous linear control problem (35) for which (37) holds, and the cost functional (36). There exists a unique symmetric positive definite matrix E such that

    EA + AᵀE − EBR⁻¹BᵀE + Q = 0.    (38)

For each initial point x₀ in Rⁿ, there exists a unique optimal control u* that can be synthesized in feedback form as

    u*(t) = −R⁻¹BᵀEx*(t).    (39)

The optimal response x* satisfies

    x*' = (A − BR⁻¹BᵀE)x*,    x*(0) = x₀,    (40)

and the minimal cost is

    C(u*) = (1/2)x₀ · Ex₀.    (41)

Note that once the existence of E is established (basically this involves showing that K(t) converges to E as the horizon T → ∞, using the controllability assumption), then by a well-known lemma of Liapunov, the matrix A − BR⁻¹BᵀE is a stability matrix (all its eigenvalues have negative real parts), and so solutions of (40) decay to zero as t → ∞, as required [3].
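Numerically, E can be approximated just as this remark suggests: integrate the finite-horizon Riccati equation (22) backward until K stops changing. The Python sketch below (our own naming and step sizes) does this for a scalar example and also checks the Liapunov remark that A − BR⁻¹BᵀE is a stability matrix.

```python
# Approximate the steady-state matrix E of (38) by running the
# finite-horizon Riccati equation (22) backward to a fixed point.
# Illustrative sketch; names, step size, and tolerance are our own.
import numpy as np

def steady_riccati(A, B, Q, R, h=1e-3, tol=1e-10, max_steps=10**6):
    """Iterate K <- K - h*K' (backward Euler-type steps) to a fixed point."""
    Rinv = np.linalg.inv(R)
    K = np.zeros_like(Q, dtype=float)
    for _ in range(max_steps):
        Kdot = -K @ A - A.T @ K + K @ B @ Rinv @ B.T @ K - Q
        K_new = K - h * Kdot
        if np.max(np.abs(K_new - K)) < tol:
            return K_new
        K = K_new
    raise RuntimeError("no convergence")

# Scalar check: A = 0, B = Q = R = 1 gives E = 1 from (38), and the
# closed-loop matrix A - B R^{-1} B^T E = -1 is indeed a stability matrix.
A = np.array([[0.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
E = steady_riccati(A, B, Q, R)
closed_loop = A - B @ np.linalg.inv(R) @ B.T @ E
print(float(E[0, 0]), float(closed_loop[0, 0]))
```

The negative closed-loop entry confirms, in this small case, that solutions of (40) decay to zero.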
3. CONTINUOUS DYNAMIC PROGRAMMING - DIFFERENTIAL GAMES

In many situations we should like to consider two individually acting players in our control model. Associated with the process would again be a cost functional; however, the players will now be antagonistic, in the sense that one player will try to maximize this cost and the other to minimize it. If the dynamics of the process are
again given by a differential equation, we could model it as

    x' = f(x, φ, ψ),    x(0) = x₀,    (1)

where φ is the first player's control variable and ψ is the second player's control variable. We assume each of these controls takes its values in some restraint set; typically

    −1 ≤ φᵢ ≤ 1,    −1 ≤ ψᵢ ≤ 1,    i = 1, …, m,    (2)

or φ and ψ could be unrestrained [2]. To determine when the game ends, we suppose some terminal set F in the state space Rⁿ is given, and we terminate the game whenever F is reached. To simplify the analysis, we always assume that F can be reached from any point in the state space by admissible controls. Given any two controls φ and ψ, we associate a cost of the form

    C(φ, ψ) = ∫_0^{t_F} f₀(x(t), φ(t), ψ(t)) dt + g(x(t_F)),    (3)
where the integration extends over the trajectory and ends (t = t_F) when we hit F. The function g is a terminal cost, which needs to be defined only on F. (We shall show later that this formulation actually includes the fixed-time problem considered earlier.) The novel aspect is that now φ will try to minimize (3) and ψ will try to maximize it. The value of the game starting from x₀ would then be naturally defined as

    V(x₀) = min_φ max_ψ C(φ, ψ),    (4)

where the minimum and maximum are over all admissible controls φ and ψ. Several mathematical problems involved in such a definition as (4) are immediately obvious. Is min_φ max_ψ well defined, and does it equal max_ψ min_φ? A full investigation of these problems involves deep concepts (see [1]), and we shall from here on assume that the value exists and min max = max min whenever required. In the simple problems we shall solve here, this will not be a difficulty. We can derive necessary conditions for optimality by using dynamic programming in essentially the same way as before. Assume we have a trajectory starting at x₀ at time t₀ of the following type (see Fig. 6). We break the trajectory at time t₀ + δ for small δ > 0, and suppose that we use arbitrary admissible controls φ, ψ over [t₀, t₀ + δ), and optimal controls over the final arc, t₀ + δ → F. Then the cost calculated
[Fig. 6: a trajectory from x₀, broken at t₀ + δ and ending on F.]
along this trajectory is just

    ∫_{t₀}^{t₀+δ} f₀(x(t), φ(t), ψ(t)) dt + V(x(t₀ + δ : φ, ψ)),

where x(t : φ, ψ) is the solution of (1) at time t, having used controls φ and ψ. Again expanding x(t₀ + δ : φ, ψ) and V(x(t₀ + δ : φ, ψ)) by Taylor's theorem gives

    V(x(t₀ + δ : φ, ψ)) = V(x₀) + δ∇V(x₀) · f(x₀, φ(t₀), ψ(t₀)) + o(δ).    (5)

In other words, the cost of this trajectory from x₀ is

    V(x₀) + δ{∇V(x₀) · f(x₀, φ(t₀), ψ(t₀)) + (1/δ)∫_{t₀}^{t₀+δ} f₀(x, φ, ψ) dt + o(δ)/δ}.    (6)
Since the value of the game starting at x₀ is the min_φ max_ψ of the cost (6) (by the principle of optimality), we have V(x₀) = min_φ max_ψ [Eq. (6)], so canceling V(x₀) and letting δ → 0 gives

    min_φ max_ψ [∇V · f(x₀, φ(t₀), ψ(t₀)) + f₀(x₀, φ(t₀), ψ(t₀))] = 0.    (7)

Since (7) holds for any x₀ in the state space and for φ(t₀) and ψ(t₀) in their respective restraint sets, we have

    min_φ max_ψ [∇V(x) · f(x, φ, ψ) + f₀(x, φ, ψ)] = 0.    (8)

The minimum and maximum extend over the restraint sets of the controls φ and ψ, respectively. Of course we again have the natural boundary condition for (8),

    V(x) = g(x),    x ∈ F.    (9)
In the same way as before, we can show that the minimum and maximum are attained by optimal controls. The fixed-time free-end-point problem considered before can easily be set in this formulation: namely, if T is the terminal time, we define a new state variable x_{n+1} by

    x_{n+1}'(t) = 1,    x_{n+1}(0) = 0

[i.e., x_{n+1}(t) = t], and then let F be the set in R^{n+1}

    F = {x ∈ R^{n+1} : x = (x₁, x₂, …, xₙ, T), all xᵢ ∈ R}.    (10)

The state equations now become

    x₁' = f₁(x, φ, ψ),  …,  xₙ' = fₙ(x, φ, ψ),    x_{n+1}' = 1.    (11)

If we substitute (10) into (8), we see that

    V_t = −min_φ max_ψ (Σ_{j=1}^{n} (∂V/∂xⱼ) fⱼ(x, φ, ψ) + f₀(x, φ, ψ)),
    V(x, T) = g(x),    x ∈ Rⁿ,    (12)
the analog of the HJB equation derived earlier. We can show that if V is sufficiently differentiable, then this necessary condition is also sufficient; see, for example, [2]. Similarly, if we let the Hamiltonian be

    H = −[∇V · f + f₀],    (13)

then, when we substitute in the optimal trajectory for x and so express V as a function of t, the adjoint equations

    (d/dt)(V_{xᵢ}) = (∂H/∂xᵢ)(x*(t), φ*(t), ψ*(t)),    i = 1, 2, …, n,    (14)

hold.

Example 1  A War of Attrition and Attack (Isaacs [2])  In the war we have two opposing sides, considered as players I and II in our game-theory model. At time t, player I has a supply x₁(t) of his vital weapon, and player II has a supply x₂(t) of his vital weapon. Each player
can allocate his stock to one of two uses: (1) Attrition, the long-range objective of depleting his enemy's rate of weapon supply. (2) Attack, entering his weapon supply into a major conflict in the hope of winning the war outright. (This is a short-range policy, and we shall assume that it is the accumulation of these entries that counts; each player seeks more than his opponent, and the excess will be the payoff.)
The basic decision is then a choice between a long-range policy of attrition (e.g., guerrilla warfare) and a short-range one of direct attack when we think the enemy is sufficiently weakened. Suppose that at time t player I splits his forces into the attacking component (1 − φ(t))x₁(t) and the attrition component φ(t)x₁(t). His control variable is then restrained by 0 ≤ φ(t) ≤ 1, t > 0. Similarly, player II's control variable will be ψ, 0 ≤ ψ(t) ≤ 1, t > 0. Suppose player II, if left unhindered, can manufacture weapons at a rate m₂. He also loses them at a rate proportional to the amount φx₁ that his enemy is devoting to that purpose. We shall assume therefore

    x₂' = m₂ − c₂φx₁,    (15)

where c₂ may be regarded as a measure of the effectiveness of player I's weapons against player II's defenses. By reversing the roles of the players, we obtain the second state equation

    x₁' = m₁ − c₁ψx₂.    (16)
Suppose we plan on the war lasting some definite time T. Each day (say) player I commits (1 − φ)x₁ to the attack and player II commits (1 − ψ)x₂, so that

    (1 − ψ)x₂ − (1 − φ)x₁

reflects the margin of superiority for that day. If we integrate this over the time period [0, T], we can measure the accumulated margin of superiority

    C(φ, ψ) = ∫_0^T [(1 − ψ)x₂ − (1 − φ)x₁] dt.    (17)

Player II's aim will be to choose ψ to maximize this quantity, and player I will choose φ to minimize it (i.e., to make it as negative as possible). We also suppose (without loss of generality) that c₁ > c₂; otherwise we simply reverse the roles of the players.
In this formulation we have a fixed-time free-end-point differential game, and we can apply the HJB equation (12); that is, if V(x, t) represents the value of the game starting from x at time t, then

    ∂V/∂t = −min_{0≤φ≤1} max_{0≤ψ≤1} {(m₁ − c₁ψx₂)(∂V/∂x₁) + (m₂ − c₂φx₁)(∂V/∂x₂) + (1 − ψ)x₂ − (1 − φ)x₁},

    V(x, T) = 0,    x ∈ R².    (18)
To find the optimal controls φ*, ψ*, we must determine where the minimum and maximum in (18) occur. Setting

    S₁ = 1 − c₂(∂V/∂x₂),    S₂ = −(1 + c₁(∂V/∂x₁)),    (19)

in the case x₁, x₂ > 0 we have

    φ* = 0 if S₁ > 0,    φ* = 1 if S₁ < 0;
    ψ* = 1 if S₂ > 0,    ψ* = 0 if S₂ < 0.    (20)
We shall return to the case x₁(t) = 0 or x₂(t) = 0 for some t at the end of the example. By (14), along the optimal trajectory we have

    V̇_{x₁} = −(S₁φ* − 1),    V̇_{x₂} = −(S₂ψ* + 1),    t ∈ [0, T].    (21)

We can simplify the analysis by substituting (21) into (19), arriving at

    Ṡ₁ = c₂(S₂ψ* + 1),    Ṡ₂ = c₁(S₁φ* − 1),    (22)

with S₁(T) = 1, S₂(T) = −1. The solution of this differential game follows in the usual way we solved the fixed-time control problem: using the information given at time t = T to compute the final control values, and then solving backward in time.
By Eqs. (22) and (20),

    φ*(T) = 0,    ψ*(T) = 0,

so the game ends with both sides attacking fully. Hence, over the final time period Ṡ₂ = −c₁, so S₂(t) = −1 + c₁(T − t), which vanishes at t = T − 1/c₁; thus

    ψ*(t) = 0,    T − 1/c₁ < t ≤ T,    (23)

and ψ* has its final switch at T − 1/c₁. Next we continue solving backward from T − 1/c₁. Now

    ψ* = 1,    φ* = 0.    (24)

Since S is continuous, we know

    S₁(T − 1/c₁) = 1 − c₂/c₁,    S₂(T − 1/c₁) = 0,    (25)
and, with control values (24), Eqs. (22) become

    Ṡ₁ = c₂(S₂ + 1),    Ṡ₂ = −c₁.

Solving backward, using (25), then gives

    S₂(t) = −1 + c₁(T − t),    Ṡ₁ = c₁c₂(T − t),    (26)

    S₁(t) = 1 − c₂/(2c₁) − (c₁c₂/2)(T − t)²,    t < T − 1/c₁.    (27)

It is easy to see that S₂ > 0 now, and so a switch can occur only when S₁(t) = 0. This happens at

    t = T − (1/c₁)√(2c₁/c₂ − 1);

as a consequence, φ* switches from 0 to 1 at T − (1/c₁)√(2c₁/c₂ − 1). If we continue solving backward from T − (1/c₁)√(2c₁/c₂ − 1) with
φ* = ψ* = 1, we shall find that no further switches occur, so φ* and ψ* must be the unique solutions of the HJB equation (see Fig. 7).

Now that we have explicitly calculated the optimal controls, we can solve the state equations from any initial point x and calculate V:

    V(x₁, x₂, 0) = C(φ*, ψ*) = ∫_{T−(1/c₁)√(2c₁/c₂−1)}^{T−1/c₁} (−x₁*(t)) dt + ∫_{T−1/c₁}^{T} (x₂*(t) − x₁*(t)) dt,

when x₁*(0) = x₁, x₂*(0) = x₂. (On the earlier interval both players devote everything to attrition, and the integrand of (17) vanishes.) Solving the state equations and substituting for the optimal trajectory gives

    V = … + (1/2)(c₁x₂ − m₁)T² − (c₁m₂/6)T³ − ….

Finally, we consider the case in which one of the x*, say x₁*, becomes zero on [0, T]. We can see easily that this can happen only if t ≤ T − 1/c₁: for if t ∈ (T − 1/c₁, T), then ẋ₁* = m₁ > 0, and so x₁* is increasing over (T − 1/c₁, T). However, it is certainly possible that x₁*(t) = 0 for t ≤ T − 1/c₁. This corresponds to player I being annihilated. There will be a critical trajectory that just hits the point A in Fig. 8. All trajectories that start off below this will have x₁*(t₀) = 0 for some t₀ < T − 1/c₁. It is clear that if x₁*(t₀) = 0, all player II must do is keep ẋ₁* = 0; x₁* can be kept at 0 only if

    x₂* ≥ m₁/c₁,    (28)

and in this case II should play

    ψ*(t) = m₁/(c₁x₂(t)),    t₀ ≤ t ≤ T − 1/c₁,

until point A, when both φ* and ψ* revert to their old optimum strategies (≡ 1).
[Fig. 8: strategy regions near the end of the game (φ*, ψ* ∈ {0, 1}), the switch time T − 1/c₁, and the critical trajectory through the point A.]
Finally, for t ≤ t₀, we can re-solve the HJB equation with x₁(t₀) = 0, and we find that II's optimal strategy is unchanged; however, player I should switch slightly earlier, along the dotted line in Fig. 8. Here we see that since I's final attack is going to be nullified through the annihilation of his forces, he tries to compensate by starting earlier. Of course, the value of the game will have to be recomputed for these new optimum trajectories. The case in which II is annihilated can be handled symmetrically.
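The backward construction of the switching functions can be checked numerically. The Python sketch below integrates Eqs. (22) backward under the bang-bang rules (20) for arbitrary illustrative values c₁ = 2, c₂ = 1, T = 5 (our own choices, with c₁ > c₂) and locates the zero crossings of S₂ and S₁; these should fall at T − 1/c₁ and T − (1/c₁)√(2c₁/c₂ − 1), respectively.

```python
# Backward Euler integration of the switching functions (22) with the
# bang-bang rules (20).  Illustrative numerical check; c1, c2, T are
# arbitrary choices with c1 > c2.
import math

c1, c2, T = 2.0, 1.0, 5.0

def controls(S1, S2):
    phi = 0.0 if S1 > 0 else 1.0      # rule (20) for player I
    psi = 1.0 if S2 > 0 else 0.0      # rule (20) for player II
    return phi, psi

h = 1e-4
S1, S2, t = 1.0, -1.0, T              # terminal data S1(T)=1, S2(T)=-1
psi_switch = phi_switch = None
while t > 0:
    phi, psi = controls(S1, S2)
    S1 -= h * c2 * (S2 * psi + 1.0)   # step backward in time
    S2 -= h * c1 * (S1 * phi - 1.0)
    t -= h
    if psi_switch is None and S2 >= 0:
        psi_switch = t                # psi* switches where S2 crosses 0
    if phi_switch is None and S1 <= 0:
        phi_switch = t                # phi* switches where S1 crosses 0

print(psi_switch, T - 1/c1)
print(phi_switch, T - math.sqrt(2*c1/c2 - 1)/c1)
```

Each printed pair should agree to within the step size, confirming the switch times computed analytically above.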
Problems

1. Find the feedback control and minimal cost for the infinite-horizon control problem

    x' = −x + u,    x(0) = x₀,

with cost

    C(u) = ∫_0^∞ [(x(t))² + (u(t))²] dt.

2. Find the control u = (u₁, u₂) that minimizes the cost

    (1/2)∫_0^∞ (x² + u₁² + u₂²) dt

for the system … , x(0) = 1.
3. Calculate the feedback control that minimizes

    (1/2)∫_0^T ((x₁ − α)² + u²) dt

for the system … .

4. Calculate the symmetric feedback matrix for the optimal control of the system

    x'' − x = u,

with cost … (β > 0, γ > 0).

5. By use of dynamic programming, derive the optimality conditions for the discrete control problem

    minimize (1/2)Σ_{k=0} u_k²

subject to … , x₀ = 1.

References
[1] A. Friedman, "Differential Games." Wiley-Interscience, New York, 1971.
[2] R. Isaacs, "Differential Games." Wiley, New York, 1965.
[3] E. Lee and L. Markus, "Foundations of Optimal Control Theory." Wiley, New York, 1967.
Chapter VII
Controllability and Observability
One of our central assumptions in the time-optimal control problem was that the target point could be reached in finite time by an admissible control. In this chapter we investigate this problem of controllability of systems.
1. CONTROLLABLE LINEAR SYSTEMS

To begin with, suppose that there are no restrictions on the magnitude of the control u(t). Let the restraint set be Ω = Rᵐ.

Definition 1  The linear control process

    x'(t) = A(t)x(t) + B(t)u(t),    (1)

with Ω = Rᵐ, is called (completely) controllable at t₀ if for each pair of points x₀ and x₁ in Rⁿ there exists a piecewise continuous control u(t) on some finite interval t₀ ≤ t ≤ t₁ that steers x₀ to x₁. In the case that (1) is autonomous, we shall see that controllability does not depend upon the initial time, and so we just say the system is controllable.

Theorem 1  The autonomous linear process

    x' = Ax + Bu    (2)
is controllable if and only if the n × nm matrix

    [B, AB, A²B, …, Aⁿ⁻¹B]

has rank n.

Proof  Assume that (2) is controllable, but that rank[B, AB, …, Aⁿ⁻¹B] < n. Consequently, we can find a nonzero vector v ∈ Rⁿ such that

    v[B, AB, A²B, …, Aⁿ⁻¹B] = 0,

or

    vB = vAB = … = vAⁿ⁻¹B = 0.    (3)

By the Cayley-Hamilton theorem [1], A satisfies its own characteristic equation, so for some real numbers c₁, c₂, …, cₙ,

    Aⁿ = c₁Aⁿ⁻¹ + c₂Aⁿ⁻² + … + cₙI.

Thus

    vAⁿB = c₁vAⁿ⁻¹B + c₂vAⁿ⁻²B + … + cₙvB = 0

by (3). By induction, vAⁿ⁺ᵏB = 0 for all k = 0, 1, 2, …, and so vAᵐB = 0 for all m = 0, 1, 2, …; hence

    ve^{At}B = v[I + At + (1/2!)A²t² + …]B = 0    (4)

for all real t. However, the response starting from x₀ = 0 with control u is just

    x(t) = e^{At}∫_0^t e^{−As}Bu(s) ds,

so (4) implies that

    v · x(t) = ∫_0^t ve^{A(t−s)}Bu(s) ds = 0

for all u and t > 0. In other words, all points reachable from x₀ = 0 are orthogonal to v, which contradicts the definition of controllability.

Next suppose that rank[B, AB, A²B, …, Aⁿ⁻¹B] = n. Note first that (1) is controllable if and only if

    ∪_{t>0} {∫_0^t e^{A(t−s)}Bu(s) ds : u is piecewise continuous} = Rⁿ.    (5)
Suppose (5) is false; then certainly

    {∫_0^1 e^{A(1−s)}Bu(s) ds : u is piecewise continuous} ≠ Rⁿ,

so there must exist some nonzero v ∈ Rⁿ such that

    v∫_0^1 e^{A(1−s)}Bu(s) ds = 0

for all piecewise continuous u, and so

    ve^{A(1−s)}B = 0,    0 ≤ s ≤ 1.    (6)

Setting s = 1, we see that vB = 0. Differentiating (6) and setting s = 1 gives vAB = 0. Accordingly, by induction,

    vB = vAB = vA²B = … = vAⁿ⁻¹B = 0,

which contradicts the rank condition. ∎

Corollary 1
If (1) is normal, then it is controllable.
The converse is true only in the case m = 1; in fact, for m = 2, we have a simple counterexample.

Example 1  …  Here b₂ and Ab₂ are linearly dependent, so the system is not normal. However, [B, AB] is certainly of rank 2.

Note also that the proof of Theorem 1 shows that if the system is controllable, then it is controllable in any arbitrarily small time. The concept of controllability for autonomous systems is independent of the coordinate system chosen for Rⁿ. For if y = Qx, where Q
is a real nonsingular constant matrix, and 𝒜 = QAQ⁻¹, ℬ = QB, then

    x' = Ax + Bu    (7)

is controllable if and only if

    y' = 𝒜y + ℬu    (8)

is controllable. This follows easily from the identity

    rank[B, AB, …, Aⁿ⁻¹B] = rank Q[B, AB, …, Aⁿ⁻¹B] = rank[ℬ, 𝒜ℬ, …, 𝒜ⁿ⁻¹ℬ].

We shall say (7) and (8) are equivalent control systems.

Example 2
The autonomous linear process

    y⁽ⁿ⁾ + a₁y⁽ⁿ⁻¹⁾ + a₂y⁽ⁿ⁻²⁾ + … + aₙy = u    (L)

is controllable with u ∈ R. In phase space, (L) is just

    Y' = [  0     1     0    …    0  ]       [0]
         [  0     0     1    …    0  ]       [0]
         [  ⋮                     ⋮  ]  Y +  [⋮]  u,
         [  0     0     0    …    1  ]       [0]
         [ −aₙ  −aₙ₋₁  −aₙ₋₂ …  −a₁ ]       [1]

and the rank condition is easily checked. For autonomous systems with one control variable, this example is typical.
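The rank condition of Theorem 1 is easy to test numerically. The following Python sketch (our own function names) assembles the controllability matrix and applies it to the phase-space form of (L) for n = 3, with arbitrary illustrative coefficients:

```python
# Kalman rank test for controllability of x' = Ax + Bu (Theorem 1):
# build [B, AB, A^2 B, ..., A^{n-1} B] and check whether its rank is n.
import numpy as np

def controllability_matrix(A, B):
    n = A.shape[0]
    blocks, M = [], B
    for _ in range(n):
        blocks.append(M)
        M = A @ M                      # next power: A^k B
    return np.hstack(blocks)           # the n x nm matrix of Theorem 1

def is_controllable(A, B):
    C = controllability_matrix(A, B)
    return np.linalg.matrix_rank(C) == A.shape[0]

# Phase-space form of  y''' + a1 y'' + a2 y' + a3 y = u  (the system (L)
# above with n = 3); the coefficients are arbitrary illustrative values.
a1, a2, a3 = 1.0, 2.0, 3.0
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-a3, -a2, -a1]])
B = np.array([[0.0], [0.0], [1.0]])
print(is_controllable(A, B))
```

For (L) the test always succeeds, whatever the coefficients, in line with Example 2; a system whose B lies in a proper A-invariant subspace fails it.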
Theorem 2  Every autonomous controllable system

    x' = Ax + bu,    (9)

with u ∈ R, is equivalent to a system of the form (L).

Proof  Define the real nonsingular (by Theorem 1) n × n matrix

    Q = [Aⁿ⁻¹b, Aⁿ⁻²b, …, Ab, b].

Direct calculation shows that

    b = Qd    and    AQ = Q𝒜,
where

    𝒜 = [ α₁     1  0  …  0 ]        [0]
        [ α₂     0  1  …  0 ]        [0]
        [ ⋮               ⋮ ]   d =  [⋮]
        [ αₙ₋₁   0  0  …  1 ]        [0]
        [ αₙ     0  0  …  0 ],       [1],

and the real constants α₁, α₂, …, αₙ are determined by the characteristic equation for A,

    Aⁿ = α₁Aⁿ⁻¹ + α₂Aⁿ⁻² + … + αₙI.    (10)

The analogous change of coordinates applied to (L) would give system matrices 𝒜′ and d′, where

    𝒜′ = [ −a₁     1  0  …  0 ]
         [ −a₂     0  1  …  0 ]
         [  ⋮               ⋮ ]
         [ −aₙ₋₁   0  0  …  1 ]
         [ −aₙ     0  0  …  0 ].

Consequently, (9) will be equivalent to (L) if we choose aᵢ = −αᵢ. In other words, (9) is equivalent to the system

    y⁽ⁿ⁾ − α₁y⁽ⁿ⁻¹⁾ − α₂y⁽ⁿ⁻²⁾ − … − αₙy = u,

where α₁, …, αₙ are obtained from the characteristic equation for A by (10). ∎

The next concept, which occurs often, is the domain of null controllability at time t₀. This is the set of all points in Rⁿ that can be steered to the origin in some time t₁ > t₀ by an admissible controller u(t) taking values in a given restraint set Ω ⊂ Rᵐ. In the case Ω = Rᵐ and the system is autonomous, it is easy to show that null controllability is equivalent to controllability (Problem 2). In Section II.3 we showed directly, by constructing time-optimal controllers, that certain second-order systems were null controllable with controls restricted by |u(t)| ≤ 1. The following theorem essentially generalizes this construction.
Theorem 3 (Hermes and LaSalle [1])  Consider the autonomous linear process in Rⁿ

    x' = Ax + Bu,    (11)

for which

(a) 0 belongs to the interior of Ω;
(b) rank[B, AB, …, Aⁿ⁻¹B] = n;
(c) every eigenvalue λ of A satisfies Re(λ) ≤ 0.

Then the domain of null controllability at any time T equals Rⁿ.

Proof  Without loss of generality, assume that Ω contains the unit cube in Rᵐ. Then the theorem will be proven if we can show that

    ℛ = ∪_{t≥0} ℛ(t) = {y(t : u) : |uᵢ| ≤ 1, t > 0} = Rⁿ,

where ℛ(t) is the reachable set introduced in Appendix 1, and

    y(t : u) = ∫_0^t e^{−Aτ}Bu(τ) dτ.

Since ℛ is convex, it is enough to show that for each η ≠ 0 in Rⁿ there exists an admissible control u* such that η · y(t : u*) → ∞ as t → ∞. (Geometrically this is clear; a detailed proof can be obtained by supposing ℛ ≠ Rⁿ, then using the supporting hyperplane theorem to derive a contradiction.) We show that the admissible control

    u*(t) = sgn[ηᵀe^{−At}B],    t > 0,

suffices. Note first that the rank condition implies that at least one component of ηᵀe^{−At}B, say ηᵀe^{−At}b₁, is not identically zero. Then

    ηᵀy(t : u*) = ∫_0^t |ηᵀe^{−Aτ}Bu*(τ)| dτ,

and hence ηᵀy(t : u*) → ∞ if we can show that

    ∫_0^∞ |ηᵀe^{−Aτ}b₁| dτ = ∞.    (12)

Suppose (12) is false, and set

    v(t) = ∫_t^∞ ηᵀe^{−Aτ}b₁ dτ,    0 < t < ∞.    (13)

If φ(λ) is the characteristic polynomial of A, then φ(−λ) is the characteristic polynomial of −A, and hence

    φ(−D)[ηᵀe^{−At}b₁] = 0,    0 < t < ∞,

where D ≡ d/dt. Consequently, from (13),

    (φ(−D)Dv)(t) = (Dφ(−D)v)(t) = 0,    0 < t < ∞.    (14)

However, the characteristic polynomial of the differential operator Dφ(−D) is just λφ(−λ), and by assumption (c) its roots must all have nonnegative real parts. Hence a nontrivial solution v of the homogeneous equation (14) cannot decay to 0 as t → ∞, and so

    ∫_0^∞ |ηᵀe^{−Aτ}b₁| dτ = ∞,
as required. • One of the central concerns of classical control theory is to design feedback controllers (preferably linear) that willstabilize a given system, and hence improve its sensitivity to noise, measurement errors, etc. The question of the existence of a linear feedback control u = Kx, for which the closed-loop system
x = (A + BK)x is stable, is intimately related to the concept of controllability. Namely, we say the control system (2) is ( + )-stabilizable if there exists an m x n constant matrix K+ such that A+=A+BK+
has eigenvalues with only negative real parts. The same system is (- )-stabilizable if there-exists a matrix K - such that
has eigenvalues with only positive real parts. Theorem 4 (Russell [3J) The autonmous system (2) is control lable if and only if it is both ( +)- and (- )-stabilizable. Proof Suppose (2)is ( + )- and (- )-stabilizable. As remarked earlier, it is enough to show that (2) is null controllable in some time T. Set Xo ERn, and define x+(t) = A+x+(t), X-(t)
=
A-x-(t),
x+(O) = x o, x-(T) = -x+(T),
U+(t) = K+x+(t), U-(t)
= K-x-(t),
os t s r.
114
VII.
Controllability and Observability
These equations have solutions x+(t)
and
=
eA+tx o
x-(t) = _e-A-(T-t)x+(T) = _e-A-(T-t)eA+Tx
o'
Finally, define f T : R" --+ R", by fT(x O) = x-CO) = -e-rTeA+Tx o'
Then
and satisfy
x=
Ax
+ Bu,
x(T)
= 0,
or the control u steers X o + f T(X O) to 0 in time T. If we can show that for some T> 0, {xo + f T(XOn ranges over R" as X o ERn, (2) must be null controllable in that time T, and the result follows. However, Xo
+ fT(xo) =
(I - e-A-TeA+T)x o,
and by the ( + ) and (- ) stability assumptions we can guarantee that lIe-A-TeA+TII
< 1
by choosing T sufficiently large. For such a T, the mapping (I + f T): R" --+ R" is 1-1 and onto (since f T is a contraction), and hence {xo + f T(X O) : X o ERn} = R" as required. To prove the converse, we take m = 1 for simplicity. Note that if A + BK is a stability matrix, then so is Q(A + BK)Q - t = d + f!4yt' (yt' = KQ-l). Accordingly, by Theorem 2 it is sufficient to show that the system equivalent to (2) yin) _ oc1yln-t) _ oc 2y1n- 2)
_
••• -
ocny
= u,
is (+)- and (- )-stabilizable. For this problem, the linear feedback control u = dnYl + dn- 1Y2 + ... + dtYn, d = (dt,d2, ... ,dn) ERn, gives the closed-loop system yin) _ (OCt
+ dt)y(n-l) -
(OC 2
+ d 2)y(n-2) -'"
- (OC n + dn)y = 0,
(15)
and a (±) stable process can be achieved by determining d so that the characteristic polynomial of (15) is An - (OCt
+ ddA n- t
-'" - (ocn + dn) = (A
± 1)"
1. ControllableLinear Systems
or -(ai
+ di) =
G}±I)i,
1 s i::; n. •
115
(16)
With a little more work the above proof shows that if (2) is control lable, then by a suitable choice of the feedback matrix K, one place the eigenvalues of A + BK anywhere in the left-hand plane. Such a K can be constructed by reversing the proof. Finally, we give a simple sufficient condition for the nonautonomous system (1) to be controllable. Suppose the matrix A(t) is k - 1 times differentiable and B(t) is k - 2 times differentiable for t e: to' Define inductively a sequence of n x m matrices Mit) as the solutions of the equation j = 0,1,2, ... , k - 2,
with Mo(t) = B(t),
Then d dt (X- 1 (t)B (t» = X- 1(t)[ - A(t)B(t)
d
+ dt
B(t)]
= X- 1(t)M l(t) and similarly di
dt i (X- 1 (t )B (t» = X- 1(t)M i t),
t;::: to,
j
= 0,1,2, ... , k -1.
(17)
Theorem 5 (Hermes and LaSalle [2]) If A(t) has k - 2 derivatives and B(t) has k - 1 derivatives for t ;::: to, and if for some positive integer k and each t 1 > to, there exists atE [to, t 1) such that rank [M o(t),M 1(t), ... ,Mk - 1(t)] = n, then (1) is controllable at to'
Proof Following the proof of Theorem 1, it can be shown by using the continuity of X-I B that if (1) is not controllable, then there exists a'l -# 0 and a t 1 > to such that (18)
116
VII.
Controllability and Observability
Differentiating (18) and applying (17) gives "TX-1(t)Mo(t)=O,
"TX-1(t)M1(t)=O
... , "TX-1(t)Mk_1(t)=O,
for t E [to, t 1). Consequently "TX-1(t) is nonzero and is orthogonal to [Mo(t), ... , M k- 1(t)], which contradicts the rank condition. • Example 3 Show that the system
is controllable at to = O. Computation shows that Mo(O) = b(O) =
[~J
M 1(0) = -A(O)b(O) + b(O) =
[~J,
and
M 1(0) = -Ab(O) - A(O)b(O) + b(O) =
[~l
Hence M 2(0) = A(0)M 1(0)
+ M 1 (0) =
[~l
Similarly, it can be shown that M 3(0)
=
[~J
and so {M 2(0), M 3(0)} are linearly independent. 2.
OBSERVABILITY Suppose that we can observe some known linear function y(t) = Cx(t),
(1)
where C is an m x n matrix of the output x(t) of a known dynamical system
x=Ax
(2)
2.
Observability
11 7
whose initial condition x(O) = x o, however, is unknown. For instance, if 1 0 0 o 1 0 C= 0 0 0 000
then (1) would mean that we would observe the first two components of x(t). Given such a framework the natural question to ask is, How much ofx need we observe in order to be able to determine x uniquely? Since x is completely determined by (2) once the initial condition Xo is known, we call the system (1)-(2) observable if the knowledge of y(t) on some interval [0, TJ uniquely determines Xo' Of course once we know a system is observable, the next problem is to find a recipe for recon structing Xo from {y(t)}. It turns out the solution of both these problems is intimately related to the controllability concept introduced earlier.
Theorem 1 The system (1)-(2) is observable if and only ifthe control system z = ATZ + CTu (3) is controllable. Consequently (1)-(2) is observable if and only if rank[CT,ATCT,(AT)2CT, ... ,(AT)"-lCTJ = n.
(4)
Proof We can write the solution of (2)
x(t)
eA1xO'
t> 0,
y(t) = CeA1x o ,
t > O.
=
and so, by (I), (5)
By definition the system is observable at any time T > 0 if and only if there does not exist an xo =1= 0 for which y(t) = CeA1xo == 0,
0:-:;:; i
s:
T.
(6)
Suppose that such an xo exists. Then setting t = 0 in (6) gives Cxo = O. Differentiating (6)and setting t = 0 gives CAxo = 0, etc. So by induction we see that
118
VII.
Controllability and Observability
that is, X~CT
=
X~ATCT
= ... =
X~(AT)n-lCT
= 0
and the rank condition (4) fails. The converse is proven using the Cayley-Hamilton theorem in essentially the same way as before. • For the problem of calculating Xo, note that, by (5)
S: (eAt)TCTy(t)dt = S: (eAt)TCTCeAtxodt = Mxo,
where M
=
S: (eAtl C CeAt dt. T
(7)
If the system is not observable by (6),there exists an Xo =I 0 for which Mxo = 0, and hence M· is singular. Conversely, if M is nonsingular, then Xo
= M- 1 SOT (eAt)TCTy(t)dt
(8)
is completely determined. So (1)-(2) is observable if and only if M is nonsingular, in which case Xo can be computed from (8). Notice that for autonomous systems, observability is independent of T > 0. These results also carryover to the (perhaps from practical consid erations more interesting) discrete time case of(I)-(2). For example, if
x«k
+ 1)1:).= Ax(h),
x(o) = Xo, y(h) = Cx(h), k = 0, 1,2, ... ,
(9) (10)
where 1: represents some sampling period, then we say the system (9)-(10) is observable if x, is uniquely determined once {y(O), y(1:), ... , y«n - 1)1:)} are known.
Theorem 2 The system (9)-(10) is observable if and only if .rank[CT,ATCT, ... ,(AT)"-lCTJ = n.
Analogous to (7)-(8) we can compute Xo from the observed y'S, as
x(h) = Alex o , y(h) = CAlex o'
Hence
119
Problems
where M is the n x n matrix
M=
n-1
L (Ak)TCTCA k.
k=O
Since M can alternatively be written M=KK T
where K is the n x nm matrix K
= [CT,ATCT , . . . ,(A n
1)TCT],
if the system is observable, then the rank condition guarantees M is nonsingular, and hence Xo
=
M- 1
n-1
L (Ak)TCTy(kr).
(11)
k=O
Clearly, any n-consecutive observations suffice in (11). Problems 1. Check the controllability and observability of the following systems:
x = [~ X=
~}
+ [~},
y = [0
[~-2 -4~ -3'~]X+ [~ -1
l]x; 0 1 y = [1 2
°ll]U'
-lJ
1 x.
2. Show that the linear auto mono us control system is controllable if and only if it is null controllable. 3. Show that the nth-order controlled linear differential equation x~n)(t)
+ an_1(t)x(n-1)(t) + ... + ao(t)x(t) = u(t)
(or equivalently x(t) = A(t)x(t)
A(t)
=
o o
1 0
0 1
o
o
o
+ bu(t),
o o
120
VII.
CootroIlability and Observability
where a j has j continuous derivatives), is controllable at any time to ~ O. 4. Let A
=
[°1 -lJ°'
b(t)
=
[c~s
tJ.
smt
Shows that all solutions of x = Ax + bu, x(O) = 0, lie on the surface Xl sin t - X2 cos t = 0, and hence this system is not con trollable at to ::: 0. 5. Show that if A has two (or more) linearly independent eigenvectors corresponding to the same eigenvalue, x = Ax + bu cannot be controllable. 6. For the system - 1
x=
[
~
1 -1
v = [1
°
l]x,
can the initial state x(O) be chosen so that the observed output y(t) = te" for t > O? 7. Show that the discrete linear autonomous control system x(k + 1) = Ax(k) + Bu(k)
is controllable if and only if rank [B,AB, A 2B, . . . , An-l B] = n. 8. Show that for the autonomous linear process
x= Ax + Bu, if there exists a nonzero m-vector w such that Bw,
ABw,
... ,
An-IBw,
are linearly independent, then the system is controllable. 9. Suppose A is a symmetric matrix with distinct eigenvalues {A'j} and eigenvectors {x jk}, k = I , ... ,nj, nj is the multiplicity of Aj , j = 1,... , P. If B = [b., ... , bm ] is written in columns, show that x = Ax + Bu is controllable if and only if
j=I,2, ... ,P.
References
121
In particular, for the system to be controllable, the number of controls must exceed the largest multiplicity of the eigenvalues ofA.
References [IJ F. Gantmacher, "The Theory of Matrices." Chelsea, New York, 1959. [2J H. R. Hennes and J. P. LaSalle, "Functional Analysis and Time-Optimal Control." Academic Press, New York, 1969. [3J D. Russell, Exact boundary value controllability theorems for wave and heat pro cesses in star-complemented regions, in "Differential Games and Control Theory" (E. O. Roxin, P. T. Liu, and R. L. Sternberg, eds.). Dekker, New York, 1974.
This page intentionally left blank
Chapter VIII
State-Constrained Control Problems
1. THE RESTRICTED MAXIMUM PRINCIPLE
In many applications of control theory, some or all of the state vari ables will be subject to natural constraints, without which the problem may lose its physical meaning. For instance, in the moon-landing prob lem a natural state constraint would be h(t)
~
0
for all
t>
o.
(1)
In that example the solution was sufficiently simple that we could check and make sure that (1) was not violated during the construction of the optimal solution. In this chapter we shall explore an extension of the maximum principle that will directly cover such state constraints as (1). We shall now restrict the control problem further. In particular, we suppose the restraint set n is set of all U E B" for which (2)
for given continuously differentiable functions q1, . . . , qs. In particular, [ -1,1] could be written
n=
{u E R : u2
-
1 :s; OJ,
that is, UER.
We shall place further restrictions on {q l' . . . , qs} later. 123
124
VIII.
State-Constrained Control Problems
For the state constraints, we suppose x(t) is required to lie in a closed region B of the state space R" of the form B
= {x: g(x) ~ O}
(3)
for some given scalar-valued function g, having continuous second partial derivatives, and for which Vg =
(::1' ...,::J
does not vanish on the boundary of B, that is, on {x: g(x) = O}. The other elements of the control problem will remain as before. If we define p(x, u) = Vg(x) • f(x, U),
(4)
then once a trajectory x(t), with control u(t), hits the boundary of B, g(x) = 0, a necessary and sufficient condition for it to remain there for all later t is that p(x(t), u(t»
=
o.
(5)
This asserts just that the velocity of a point moving along the trajectory is tangent to the boundary at time t. Two further assumptions that will be required later are that Vup(x, u) =I 0,
(6)
and that (7)
be linearly independent along an optimum trajectory x*(t), u*(t). Our strategy for solving state-constrained problems will be to break them into two subproblems. Whenever the optimum trajectory x*(t) lies in the interior of B, g(x*(t» < 0, the state constraints are super fluous (nonbinding), and we can apply the maximum principle as be fore. However, as soon as x*(t) hits the boundary, g(x*(t» = 0, we solve for x* using the restricted maximum principle given below. In this way, as long as x* consists of a finite number of pieces of these two types, we can hope to construct it. The essential point of the restricted maximum principle is that once we hit the boundary we have to restrict the control power to prevent the optimum trajectory from leaving B. From our remarks above, this requires that the optimal control u*(t) not only take values in n, but
1. The Restricted Maximum Principle
125
also that p(x*(t),u*(t» = O.
(8)
So our restricted maximum principle will take into account the added control restraint (8). The Hamiltonian H(x, u, t/I) is defined as before. A detailed proof of the theorem 1 is given in [4], Chap. VI; see also [2], Chap. VIII, for an alternative treatment.
Theorem 1 (Restricted Maximum Principle) Let x*(t), t o :<:;; t :<:;; be an optimal trajectory with control u*(t), and suppose that x*(t) lies entirely on the boundary of B, to :<:;; t s; t 1, and (6)-(8) are satisfied. Then there exists a continuous vector function t/I(t) and a piecewise smooth scalar function A(t), to :<:;; t s; t 1, such that tl>
x*(t)
(i)
tj,(t) = - V xH(x*(t), u*(t),t/I(t» + A(t)Vxp(x*(t), u*(t»,
(ii)
(iii)
= f(x*(t), u*(t»,
H(x*(t), u*(t),t/I(t»
= max H(x*(t), u, t/I(t» = 0, u
for to :<:;; t s; t 1, where the maximum is taken subject to u E n, that is, (2) and p(x*(t), u) = O. Since this is really just a constrained maximiza tion, we can alternatively write (iii) (iv) V uH(x*(t), u*(t),t/I*(t» = A(t) V up(x*(t),u*(t» s
+L
r= 1
vr(t)Vuqr(u*(t»,
where A(t), v1(t), . . . ,vit) are Lagrange multiplers, at least at points t of differentiability of A. [A is the same quantity in (ii) and (iv).] Further, (v)
t/Jo(t) = const :<:;; 0, t/I(to) i= 0,
(vi)
and t/I(to) is tangent to the boundary of B at x*(t). (vii) Whenever A is differentiable
~(t)
Vg(x*(t»
is directed toward the interior of B or else is zero. From assumptions (6)-(8), we can discuss (iv) in more detail. At any point of continuity of A, for given x*(t), u*(t), and t/I(t), (iv) is a system
126
VID. State-Constrained Control Problems
of m equations in s + 1 unknowns A(t), v 1(t), . . . ,v.(t). Assumptions (6) and (7) guarantee that the matrix of this system has rank s + 1, and so we can solve for these unknowns; in particular, we can solve for A~ A(t) = "'(t) • a(t),
(9)
where a is a piecewise smooth function. If (9) is substituted back into (ii), we see that '" satisfies a linear homogeneous differential equation, and since "'(to) =1= 0, we have "'(t) 1= 0 on any interval.
2. JUMP CONDITIONS So far we know how to compute each piece ofthe optimum trajectory; all that is lacking is what happens when we change from g(x*(t)) < 0 to g(x*(t)) = 0, or vice versa. In particular, what happens to the adjoint variables when we enter the boundary of B, or when we leave it? It can be shown that if to is a time when x* enters the boundary, then "'(to - 0)
= "'(to + 0),
(1)
that is, '" is continuous across the jump. On the other hand, on leaving the boundary at some time t 1 , say, it can be shown that (2)
Aas in (ii) and (iv) in Section 1. Conditions (1) and (2) are called the jump conditions [4, Chapter 6]. Example 1 Let us consider the application of the previous theorems to a simple "obstacle" problem. Find the planar curve of minimum length joining two given points, and avoiding a closed circular region (see Fig. 1).Without loss of generality we choose one point at the origin. An equivalent control problem would be to minimize the time taken to traverse the given path at constant speed. If ui represents the com ponent of the velocity in the Xi direction, i = 1,2, then we would have (3)
with the control restraint set
(4)
2.
Jump Conditions
127
o¥----------ti----4~-+_-----------·
Fig. 1
and state constraint g(X I,X2) = R 2 - (x, - a)2 - x~ ~ 0.
(5)
We seek an admissible control, transferring X o to 0 in minimum time, such that (5) is satisfied for all time, that is, we take as cost C(u) =
f;' 1 dt.
Consequently, the Hamiltonian is
H(x,u,"') = -1
+ "'lUI + r/J2U2'
First, we analyze the optimum trajectory when the constraint (5) is nonbinding. The adjoint equations are as usual
.
aH
=
--a =0, Xl
r/J2=
--a =0, X2
r/JI
.
aH
which can be solved to give
r/JI = c I, where c l ,
C2
are constants. The Hamiltonian is maximized when Ci
u·I ( t) =-=~= I 2 Y CI
+ C 2' 2
i
=
1,2,
128
VIn.
State-Constrained Control Problems
+ .Jci + c~ = 0, so i = 1,2, = l/J;(t),
However, from (iii) of Section 1, -1 u;(t) =
Cj
(6)
where ,J ci + d = 1, and the optimum trajectory inside the region B is just a straight line, as we would expect. Now we apply the restricted maximum principle in the region where the state constraint is binding. First, Vg = (;-2(x l
a), -2x 2 ),
-
p(x, u) = Vg· f = -2(x l
and so p(x, u) =
°
-
a)2
+
a)ul
2X2 U 2 ,
-
when u2
Xl - a ---
ul
Since (Xl
-
X2
x~ = R 2 and
ui + u~
= 1, this
reduces to
-X2 ul=T'
(7)
The adjoint equations now become l] til = 0 + A[-2U -2u 2
or
(8)
tfr1 = - 2AU l, tfr2 =' - 2AU2·
The constrained maximum of the Hamiltonian will be sought by the unconstrained maximization over all (ut>u 2 ) E R 2 of the function
H - AP
+ v(ui + u~
- 1),
where A and v are Lagrange multipliers. Taking partial derivatives with respect to U l, u2 gives
l/Jl] _ A[-2(Xl [ l/J2 - 2X2
al)] + V[2U l] = 0. 2u2
Since Ul and U2 can be expressed in terms of substitute for Xl and X2' we find
Xl and X2 by (7), if we
l/Jl + 2RAU2 + 2VUl = 0, l/J2 - 2RAUl + 2VU2 = 0.
2.
129
Jump Conditions
Now eliminating v from these equations gives and since
ur + u~
t/JtU2 - t/J2Ut + 2AR(ur + u~) =
=
0,
1, we have A = t/J2 Ut - t/J1 U2 2R
(9)
.
Further, the second part of (iii) in Section 1 tells us that t/J2(t)U2(t)
+ t/Jt(t)Ut(t) = 1
(10)
for the optimum Ut,U2' We now have four equations (8)-(10), and the control restraint + u~ = 1 for the five unknowns t/Jl, t/J2, Ul> U2, A. Consequently, we can solve for the adjoint variables t/J t and t/J 2 and find the optimal solution. We do this by changing to polar coordinates (Fig. 2). Namely, since (Xl(t) - a)2 + x~(t) = R 2, we let,
ur
Xt(t) - a = R cos O(t), xit) = R sin O(t).
(11)
= -sin 0, U2 = cos 0
(12)
Then Ut
by (7), and Xt = X2
Ut = (R sin O)(), = U2 = - (R cos O)(),
0/----------1---4'----+----------.
Fig. 2
130
vm.
State-Constrained Control Problems
so squaring and adding gives (j = l/R.
By the chain rule applied to (8), we can solve for t/J t in terms of that is,
e,
Integrating gives
+ e cos e.
(13)
+ cos e + e sin e.
(14)
t/Jt(e)
=
K cos e - sin e
t/J2(e)
=
K sin e
Similarly, To describe the behavior of '" as we enter and leave the circle of radius R, we now apply the jump conditions. At an entering point t = to, say, e(t o) = eo, we have K cos eo - sin eo + eo cos eo = -cos qJo, K sin eo + cos eo + eo sin eo = sin qJo,
(15)
where, by (6), we have let t/Jt(t) = -coSqJo, 0 $ t $ to, and t/Jit) = sin qJo, 0 $ t $ to, with qJo as shown in Fig. 3. Solving (15) gives (16)
and ()o
+ qJo = n12.
(17)
In other words, the portion of an optimal trajectory that precedes an entering state is tangent to the boundary. Employing the jump condition
.....
Ol--------+---L-::~_+_--------
Fig. 3
2.
for a leaving point t = t 1 , (}(t1 ) = r/Jl((}l - 0) = -sin (}1
(}1'
Jump Conditions
131
we have
[using (16) in (13) and (14)],
r/Ji(}l - 0) = cos (}1
and A(() )
= r/Ji(}l)Ul((}l)
1
- r/J 1((}du2((}1) 2R
(cos (}1)( - sin (}1) - (- sin (}l)(COS (}1) 2R
=0. So the jump (2) in the adjoint variables is 0 at ()
= (}1' and
r/Jl((}l + 0) = -sin(}l' r/J2((}1
+ 0) = cos (}1.
(18)
Since the trajectory leaving the boundary is again a straight line, we have as before r/Jl((}l +O)=UMl +0)= -COSqJlo r/J2((}1
where
({Jl
+ 0) =
U2((}1
+ 0) =
-sinqJlo
(19)
is the angle shown in Fig. 4. Combining (18) and (19) gives . cos ({J 1 = sin () 1 , sin qJ1 = - cos () 1 ,
so, again, the optimum trajectory leaves tangential to the boundary.
Fig. 4
132
Vlli. State-Constrained Control Problems
Summing up, the optimum trajectory consists of straight lines; if it touches the boundary, it follows the boundary and leaves and enters tangentially.
3. THE CONTINUOUS WHEAT TRADING MODEL WITHOUT SHORTSELLINGt We now return to the control problem in Example 4.2.1, namely, maximize C = xl(T) + p(T)xiT)
(1)
subject to
x1(t) =
- p(t)u(t) - IXX2(t),
X2(t) = u(t),
X1(0) given,
(2)
X2(0) given.
(3)
We recall that X1(t) and xit) represented the firm's stock of cash and wheat, respectively, IX is the unit holding cost of wheat storage, and p is the price of wheat over the planning horizon [0, T], assumed continuous. The control function u represents the rate of sale or pro curement of wheat, and we suppose without loss of generality that -1 S u(t)
s
1,
t ~ O.
(4)
Since we forbid shortselling, we have the state constraint t
and so the feasible region is B =
~
0,
{(Xl> X2) : X2 ~
p(x,u)
=
-u.
(5)
O}. For this problem, (6)
Now let x*(t) = (xf(t), x!(t)) be an optimal trajectory with control u*(t), and suppose x*(t) lies entirely on the boundary of B, for t k S t S t k + 1. Trivially, from (6) the optimal control must be u*(t)
= 0,
(7)
Substituting (7) into (2) and (3) gives xf(t) t
See Nostrom [3].
= xf(tk),
x!(t)
= 0,
3. The Continuous Wheat Trading Model without ShortseUing
133
Since the control constraints are now no longer binding, their respective multipliers in (1.12) are zero, and so we have A(t) =
Finally, the adjoint equations for the constrained problem are un changed from before, with the transversality conditions
so by the transversality conditions,
°
~ t
s
T.
Next,
so the adjoint variable -
=
To give a complete solution of this problem, we now suppose that the optimal trajectory consists of a finite number of constrained and unconstrained arcs. From the solution obtained below it is not hard to see this will be the case if p consists of a finite number of concave or convex segments. This case would cover most price functions of interest. In any time period [t k - 1 , tk ] where the trajectory leaves the boundary at time t k - 1 and enters again at time tk> the state constraint will be nonbinding a~d the solution will be as before (Example 4.2.1), with p(T) replaced by p(t k). It only remains to find the points tbt Z"" ,tN where the trajectory enters or leaves the boundary of B. To find these points, we use two additional facts. First, since X!(tk-1) = X!(tk) = 0, the amount of wheat bought in an interval [t k - 1 , t k ] must exactly equal the amount sold. Second, by (vii) of Section 1 on boundary arcs we must have p(t)
<
0(,
(8)
134
VIll.
State-Constrained Control Problems
at points where p is differentiable, and so the points t 1, . . . ,tN must lie in intervals where (8) holds. To solve the problem given a function p is now largely geometric. We solve the example in Chapter IV, Section 2, in Fig. 5. The final arc of CfJ2 is determined by the transversality condition, - CfJiT) = p(T), and the fact that its slope is a. If we draw this, we find the point t 4 . The two requirements above and the fact that the slope of CfJ2 is ct. lead to t 2 and t 3 and to the switch from buying to selling at t = 3. We continue along in this fashion until arriving at t = O. The optimal control is -1
(sell),
O~t<1
1s 1.8 < 3< 4.2 s 4.6 <
0 (constrained), u*(t)
=
1 (buy), -1 (sell), 0 (constrained), 1 (buy),
t ~ 1.8 t< 3 t < 4.2 t ~ 4.6 t < 7.
7 plt)
6
5
I 4
3
I
I I
I
I
2
I I
I
Sell -x-jCons)-x-Buy-x -
I I
t,
0
Sell
t3
t2
1.8
I I x - C - x - - - Buy - - + ;
4.2
3
Fig. 5
t.
I
4.6
6
7
4. Some Models in Production and Inventory Control
135
4. SOME MODELS IN PRODUCTION AND INVENTORY CONTROL We consider an inventory of one or more items being controlled over a fixed time period [0, TJ. At any time t, the rate of change of inventory level will be the difference between orders or production quantities received and the demand fulfilled. In other words x(t) = u(t) - d(t),
x(o)
= xo > 0,
(1)
where x(t) is the inventory at time t, u(t) the production rate, and d(t) the demand rate, which is assumed positive and given. As the cost of holding inventory x(t) and producing at a rate u(t) we take C(u) =
S: (cu(t) + hx(t)) dt
(2)
where c and h are the unit ordering and holding costs per unit time. We assume that they are positive and constant, so that the unit costs depend neither on time nor on the quantity of goods ordered or held. We seek a control (production rate) u that minimizes this cost. We also assume state and control constraints of the form 0,
(3)
u(t) ~ 0,
(4)
x(t)
~
so that backlogging of inventory and disposal of production will not be allowed. To begin with, let us apply the necessary conditions in the case in which the state constraint (3) is nonbinding, H(x,u,cp)
= -cu - hx + cp(u - d),
(5)
and the adjoint equation is (6)
The maximization of the Hamiltonian is equivalent to, max{u(cp ";>:0
cn.
(7)
From (7), we see that if cp - c > 0, the maximum = + 00, so no optimal control would exist. If cp(t) - c ~ 0, the maximum occurs with u*(t) = 0. However, we must check that if x*(t) > and u*(t) = 0, then cp(t) - c
°
136
VIII.
State-Constrained Control Problems
is indeed less than or equal to zero. To do this, we recall that the maximum principle states that along an optimal trajectory H(x*(t), u*(t),qJ(t)) = _0
so - hx*(t) - qJ(t)d(t) = 0.
(8)
Now if u*(t) == 0, then x*(t) = -d(t) :::; 0, so x*(t) is decreasing from its positive initial value X o as t increases. For the moment, let us assume x*(t) reaches zero for t < T. Namely, let to < T be the first time that X
o=
S;o d(t) dt.
(9)
Then x(to) = 0, and so by (8), qJ(t o) = 0. Consequently, from (6) qJ(t)
=
h(t - to),
0:::;; t:::; to,
(10)
and then clearly qJ(t) - c s 0,
0:::; t s; to.
Under the assumption (9), the necessary conditions are satisfied by the following strategy: If we begin with positive inventory, we must produce nothing until the inventory level reaches zero. At this point, however, the state constraint will be violated if we continue this strategy; we must now apply the restricted maximum principle. Note that the state constraint is g(x(t)) == - x(t) :::; 0,
so that the function p(x, u) = Vg· f becomes just p(x, u) = -(u - d).
Consequently,
oplox =
(11)
0, and the adjoint equation is again qy(t) = h,
(12)
Since the jump condition guarantees continuity of qJ across t = to, and qJ has slope h on both sides, we must still have qJ(t) = h(t - to),
(13)
at least until the optimum trajectory leaves the x axis. The maximum condition now becomes maximize H(x*(t), u, qJ(t))
(14)
4. Some Models in Production and Inventory Control
subject to p(x*(t), u) = 0 and u
~
137
O. The first condition implies (15)
u*(t) = d(t),
and since we assumed d(t) > 0, the second is automatically satisfied. Hence the new necessary conditions imply that on the x axis we should use the optimal control (15). Note that if we do this, x*(t)
so that
= d(t) -
d(t)
x*(t) = 0,
= 0,
x*(to)
=0
to :$ t :$ T,
and (15) must be the optimal control over the remaining time interval [to,
TJ.
The structure of the optimal policy is therefore to produce nothing until the initial inventory is depleted and then to produce to demand until the end of the process. x*(t) =
xo {
o
ft d(t)dt Jo
(16)
and the multiplier .Ie is 2(t)
= c + h(t
- to),
to :$ t < T.
(17)
Finally, we remark that if (9) is never satisfied on [0, TJ, then
x*(t) > 0 for all 0 :$ t :$ T, so we really have a free-end-point fixed
time (unconstrained) control problem. Then the transversality con ditions imply qJ(T) = 0, and so qJ(t) - c :$ 0,0:$ t :$ T. In other words. our optimal policy is unchanged. Note that the optimum policy over the time interval [0, toJ is in dependent of the demand, in the sense that we need not know the demand function a priori. We simply produce nothing until our inventory runs out; then over the time interval [to, TJ we must know the demand if.we are to operate optimally. In other words, the optimum policy up to zero inventory is unaffected by the future demand pattern.
This concept is known as a "planning horizon." We remark here that it has been shown that planning horizons exist for more-complicated cost functionals for the inventory problem [IJ. Inventory Control with Instantaneous Ordering. We now consider the inventory problem in which orders are placed and received at dis tinct times during the process, not continuously as before.
138
VIII.
State-Constrained Control Problems
Suppose x(t) is again our inventory, which we replenish by ordering amount Ui at time t i - We assume they ordering is instantaneous. Assume that we place N such orders Uo, UIo' .. , UN -1 in some fixed time interval [0, TJ. We assume that x(O) = 0, Uo occurs at to = 0, and tN - 1 = T. We let the demand rate be d(t), and denote the cumu lative demand up to time t by
f~ d(t) dt, t ~ O.
D(t) =
(18)
The costs we face will be the cost of holding level x(t) over 0 :::;; t :::;; T. We take this of the form
fOT h(t)x(t) dt for some given function h ~ O. The ordering costs will be assumed independent of the order size, and given by some constant K; the N orders then cost KN. Our total cost is then C
= KN +
f: h(t)x(t) dt.
(19)
The state equations could be written N-1
x(t) = Uo t5(t)
+ L
i= 1
u, t5 i(t - t i)
-
(20)
d(t),
x(O) = 0,
(21)
x(t)
(22)
~
0,
where t5(t) is the Dirac t5 function at O. The control problem is to choose N, {til, and {Ui} so as to minimize (19). We shall first rearrange the state equations (20) to make them more amenable. Denote "Ci = t i - t i - 1o i = 1,2, ... ,N; then N
"Ci ~ 0,
L "Ci = T.
(23)
i= 1
If we let Xi(S) be the inventory level in the ith time period t i the state equations can be rewritten dx,
ds' = -"Cid(t i- 1 + S"Ci),
Xi(O) = x i- 1(1) + Ui-1o
0<
S
1 :::;; s
:::;; t i ,
< 1,
i = 1,2, ... ,N,
(24)
with xo(O) = xo = O. This gives N continuous differential equations on [0, 1] with jumps at the end points.
4. Some Models in Production and Inventory Control
139
The solution of (24) is then just the difference of the total number of orders received and the cumulative demand, that is, i-I
L
Xi(S) =
Uj -
j=O
D(t i- 1
o :s: s :s: 1,
+ sri),
(25)
and by (22),
i = 1,2, ... ,N.
x;(s) ~ 0,
(26)
If we insert (25) irito the cost (19), and break the integral over [0, T] into the sum of integrals over [ti - 1 ,t;], i = 1,2, ... ,N, then
i tl hxdt + it, hxdt + .,. + iT hxdt Jo Jtl JtN-l N i1 + i~l Jo h(ti- 1 + sri)xi(s)ds
C = KN + = KN
=
KN
+ it
l
fo
1
h(t i- I + sri)
[~
Uj -
D(t i- 1 + Sri)]ds. (27)
Now by (25) and (26) i-I
LU
j=O
j
~ D(t i -
I
+ sri),
i = 1,2, ... ,N, O:s: s:S: 1,
so we minimize C by making the difference i-I
.L
Uj -
D(t i -
j=O
1
+ sri)
(28)
as small as possible over 0 :s: s :s: 1. Since D is the indefinite integral of a positive function,
o :s: s :s: 1, the smallest we can make (28) over 0 :s: s :s: 1 would be by choosing the optimum {u} and {uj} by i-I
L uj =
j=O
D(t i ) ,
i=1,2, ... ,N.
(29)
In other words, u~
= D(t 1 ) ,
ut =
D(ti + t>
-
D(ti),
i
= 1,2, ... ,N - 1,
(30)
140
VIII.
State-Constrained Control Problems
and we should order just enough at any order time to last until the next order is received. Inserting this into (27) gives the cost
+
C = KN as a function C constraints
=
ct
1
C(t i , .
• •
D(t;)
1:'-1 h(t)dt) -
T
fo h(t)D(t)dt
(31)
,tN), which we can minimize subject to the
0= to < t 1 < t 2 < ... < tN -
1
< tN = T.
It can be shown that the optimal order times t{ are then solutions of
d(t{)
rtf
Jtt
-1
i = 1,2, ... ,N - 1.
h(t) dt = h(t{)[ D(tf+1) - D(tt)],
(32)
Note that for a given N, (32) is an (N − 1)-dimensional system of nonlinear equations. In general we cannot determine N a priori, so we guess a value of N, solve (32), and compute the minimum cost (31). Then we repeat the method with larger N until the cost no longer decreases. In the case h(t) = h = const, d(t) = d = const, (32) becomes

$$dh(t_i^* - t_{i-1}^*) = h\big[dt_{i+1}^* - dt_i^*\big]$$

or

$$t_i^* - t_{i-1}^* = t_{i+1}^* - t_i^*, \qquad i = 1, 2, \ldots, N - 1.$$
In other words, the order intervals are of constant length, say τ. The cost then becomes

$$KN + \sum_{i=1}^{N} hd\tau^2\, i - \frac{hdT^2}{2} = KN - \frac{hdT^2}{2} + \frac{hd\tau^2}{2}N(N+1).$$
Then substituting N = T/τ and minimizing over τ gives

$$\min_\tau\left(\frac{KT}{\tau} + \frac{hd}{2}\,T(T + \tau)\right),$$

which occurs for

$$\tau = \sqrt{\frac{2K}{hd}}.$$

From this value we can calculate N, {u_i^*}, and C from (30) and (19).

Problems

1.
Consider the minimum time control to the origin for the system

$$\ddot{x} = u,$$

with initial conditions x(0) = x_0, ẋ(0) = 0, and the state constraint

$$|x(t)| \le 1, \qquad t \ge 0.$$
2. Resolve Example 2.1 in the case of the curve merely terminating on the x_2 axis instead of at the origin.

3. Find the control that transfers (0, 0) to the axis x_1 = a > 0 in minimum time, for the system with controls satisfying

$$u_1^2 + u_2^2 = 1,$$

with the state constraint r, s > 0.
References

[1] A. Bensoussan, E. Hurst, and B. Naslund, "Management Applications of Modern Control Theory." North-Holland Publ., New York, 1974.
[2] M. R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl. 4, 303–320 (1969).
[3] C. Norstrom, The continuous wheat trading model reconsidered. An application of mathematical control theory with a state constraint. Working Paper, Graduate School of Industrial Administration, Carnegie-Mellon University, 1978.
[4] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, "The Mathematical Theory of Optimal Processes." Wiley (Interscience), New York, 1962.
Chapter IX
Optimal Control of Systems Governed by Partial Differential Equations
1. SOME EXAMPLES OF ELLIPTIC CONTROL PROBLEMS
To begin with, consider the following simple elliptic control problem. We are given a region Ω ⊂ R^N with sufficiently regular boundary Γ (see Fig. 1). Suppose the set of admissible controls U, a closed, bounded, convex set in L²(Ω), is given, and for each u ∈ U let y(u) be the solution of the Dirichlet problem [2]

$$\Delta y(u) = f + u \quad \text{in } \Omega, \tag{1}$$
$$y(u) = 0 \quad \text{on } \Gamma, \tag{2}$$

where f is a given fixed function in L²(Ω). We can ask to find the control u ∈ U that minimizes the cost

$$C(u) = \int_\Omega |y(u) - y_d|^2\,dx. \tag{3}$$

In this case, the cost C: L²(Ω) → R, and we ask for min_{u∈U} C(u). If we regard y(u) as the steady-state temperature on Ω corresponding to u, then the control problem has the physical interpretation of finding a control u such that the steady-state temperature on Ω is as close as possible (in the mean-square sense) to some desired temperature distribution y_d. Alternatively, we could regard y(u) as the deformation induced in the body Ω by a loading u. Then the control problem would be to find a u that achieves a deformation as close as possible (again in the mean-square sense) to y_d.
Fig. 1
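For intuition, the state equation (1)–(2) and the cost (3) can be discretized; the following is a minimal sketch, assuming a one-dimensional Ω = (0, 1) (so Δy = y″) and a hypothetical uniform grid.

```python
def solve_dirichlet(f, u, n=200):
    # Finite-difference solve of y'' = f + u on (0, 1) with y(0) = y(1) = 0.
    dx = 1.0 / n
    xs = [i * dx for i in range(1, n)]              # interior grid points
    rhs = [(f(x) + u(x)) * dx * dx for x in xs]
    # Thomas algorithm for the tridiagonal system with stencil (1, -2, 1).
    a, b, c = 1.0, -2.0, 1.0
    cp, dp = [0.0] * (n - 1), [0.0] * (n - 1)
    cp[0], dp[0] = c / b, rhs[0] / b
    for i in range(1, n - 1):
        m = b - a * cp[i - 1]
        cp[i] = c / m
        dp[i] = (rhs[i] - a * dp[i - 1]) / m
    y = [0.0] * (n - 1)
    y[-1] = dp[-1]
    for i in range(n - 3, -1, -1):
        y[i] = dp[i] - cp[i] * y[i + 1]
    return xs, y

def tracking_cost(y, yd, xs):
    # Riemann-sum approximation of the cost (3).
    dx = xs[1] - xs[0]
    return sum((yi - yd(x)) ** 2 for yi, x in zip(y, xs)) * dx
```

With f = 0 and u ≡ −2, the computed y reproduces x(1 − x), the exact solution of y″ = −2 with zero boundary values, and the cost against y_d = x(1 − x) is essentially zero.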
Necessary conditions for the analogous problem for U ⊂ Rⁿ can be easily obtained. Namely, suppose that we have a differentiable function C: U ⊂ Rⁿ → R and that U is a closed, bounded, convex set. Then the minimum of C exists on U, at say u*, and since U is convex,

$$C(u^*) \le C\big((1-\theta)u^* + \theta v\big) = C\big(u^* + \theta(v - u^*)\big) \quad \text{for all } v \in U, \ 0 \le \theta \le 1,$$

that is,

$$C(u^* + \theta(v - u^*)) - C(u^*) \ge 0,$$

and so

$$\frac{1}{\theta}\big[C(u^* + \theta(v - u^*)) - C(u^*)\big] \ge 0. \tag{4}$$

As C is differentiable we can let θ → 0 in (4), and we arrive at the necessary condition for optimality

$$DC(u^*)(v - u^*) \ge 0 \quad \text{for all } v \in U.$$
In other words, the directional derivative of C at u* in the direction v − u* must be nonnegative for all v ∈ U. Of course, in the case U = Rⁿ, this condition reduces to the familiar optimality criterion ∇C(u*) = 0. We can certainly try this argument on our cost functional (3); now of course C: U ⊂ L²(Ω) → R. The directional derivative as defined by (4) would just be
$$\lim_{\theta\to 0}\frac{1}{\theta}\left[\int_\Omega \big[(y(u^* + \theta(v - u^*)) - y_d)^2 - (y(u^*) - y_d)^2\big]\,dx\right]$$
$$= \lim_{\theta\to 0}\frac{1}{\theta}\left[\int_\Omega \big[(y(u^*) + \theta y(v) - \theta y(u^*) - y_d)^2 - (y(u^*) - y_d)^2\big]\,dx\right]$$
$$= \lim_{\theta\to 0}\frac{1}{\theta}\left[\int_\Omega \big[(y(u^*) - y_d)^2 + 2\theta\big(y(v) - y(u^*)\big)\big(y(u^*) - y_d\big) + \theta^2\big(y(v) - y(u^*)\big)^2 - (y(u^*) - y_d)^2\big]\,dx\right].$$
Now canceling and letting θ → 0, we arrive at

$$\lim_{\theta\to 0}\frac{1}{\theta}\left[\int_\Omega \big[(y(u^* + \theta(v - u^*)) - y_d)^2 - (y(u^*) - y_d)^2\big]\,dx\right] = 2\int_\Omega (y(u^*) - y_d)\big(y(v) - y(u^*)\big)\,dx.$$
This, then, is the value of the directional derivative of C at u*, in the direction v − u*, except that now v and u* are elements of the Hilbert space L²(Ω). If u* is an optimal control, then we must have, as before,

$$\int_\Omega (y(u^*) - y_d)\big(y(v) - y(u^*)\big)\,dx \ge 0 \quad \text{for all } v \in U. \tag{5}$$
We can rearrange this necessary condition to achieve some simplification. Since y(u) − y_d ∈ L²(Ω), we can find a p(u) ∈ H²(Ω) satisfying

$$\Delta p(u) = y(u) - y_d \quad \text{in } \Omega, \qquad p(u) = 0 \quad \text{on } \Gamma,$$

for any u ∈ U. Then in (5) we have by Green's identity

$$\int_\Omega \big(\Delta p(u^*)\big)\big(y(v) - y(u^*)\big)\,dx = \int_\Omega p(u^*)\,\Delta\big(y(v) - y(u^*)\big)\,dx,$$

since p(u*) = y(u*) = y(v) = 0 on Γ, and so (5) becomes

$$\int_\Omega p(u^*)(v - u^*)\,dx \ge 0 \quad \text{for all } v \in U.$$
Now we can restate the necessary condition (5) as

$$\Delta y(u^*) = f + u^* \quad \text{in } \Omega, \qquad y(u^*) = 0 \quad \text{on } \Gamma,$$
$$\Delta p(u^*) = y(u^*) - y_d \quad \text{in } \Omega, \qquad p(u^*) = 0 \quad \text{on } \Gamma, \tag{6}$$

and

$$\int_\Omega p(u^*)(v - u^*)\,dx \ge 0 \quad \text{for all } v \in U. \tag{7}$$
In the case in which the set of admissible controls is given by pointwise constraints, (7) can be shown to be equivalent to [5, Theorem II.2.1]

$$p(u^*)(x)\big(v(x) - u^*(x)\big) \ge 0 \quad \text{a.e. } x \in \Omega \quad \text{for all } v \in U.$$

Example 1 If U = {u: |u(x)| ≤ 1, x ∈ Ω}, then (7) becomes

$$p(u^*)(x)\,u^*(x) \le p(u^*)(x)\,v \quad \text{for all } |v| \le 1, \tag{8}$$

that is,

$$u^*(x) = -\operatorname{sgn}\big(p(u^*)(x)\big) \quad \text{a.e. } x \in \Omega. \tag{8'}$$
Unfortunately, we cannot expect p(u*) to be nonzero almost everywhere, and so u* in (8') may not be bang-bang everywhere in Ω. Of course p(u*) ≢ 0 unless y(u*) = y_d, in which case the problem is trivial. Note, however, that if p(u*) = 0 on E ⊂ Ω, E nonnull, then

$$\Delta p(u^*) = 0 \quad \text{a.e. on } E,$$

that is,

$$y(u^*) - y_d = 0 \quad \text{a.e. on } E,$$

and so if y_d is sufficiently regular, i.e., y_d ∈ H²(Ω), then

$$\Delta y(u^*) - \Delta y_d = 0 \quad \text{a.e. on } E$$

and

$$f + u^* = \Delta y_d \quad \text{on } E,$$

or

$$u^*(x) = \Delta y_d(x) - f(x) \quad \text{a.e. } x \in E. \tag{9}$$
In the case in which |Δy_d(x) − f(x)| > 1, (9) is impossible, since |u*(x)| ≤ 1. Similarly, if |Δy_d(x) − f(x)| = 1, (9) would contradict the definition of the set E (where u* was not bang-bang). Hence we must have |u*(x)| = 1 for almost all x ∈ {t ∈ Ω: |Δy_d(t) − f(t)| ≥ 1}. This gives the following immediate proposition.

Proposition 1 If |Δy_d(x) − f(x)| ≥ 1 for almost all x ∈ Ω, then u* is bang-bang.
2. NECESSARY AND SUFFICIENT CONDITIONS FOR OPTIMALITY

In this section we state the control problem somewhat more abstractly in the hope of covering a wider range of applications. First, let H be a separable Hilbert space with inner product (·, ·), and suppose we are given a continuous, symmetric, positive definite bilinear form a(u, v) on H × H, that is,

bilinear:
$$a(u_1 + u_2, v) = a(u_1, v) + a(u_2, v), \qquad u_1, u_2, v \in H,$$
$$a(u, v_1 + v_2) = a(u, v_1) + a(u, v_2), \qquad u, v_1, v_2 \in H,$$
$$a(\alpha u, v) = \alpha\,a(u, v), \qquad \alpha \in R, \quad u, v \in H;$$

continuous:
$$|a(u, v)| \le K\|u\|\,\|v\| \quad \text{for some constant } K;$$

symmetric:
$$a(u, v) = a(v, u), \qquad u, v \in H;$$

positive definite:
$$a(u, u) > 0 \quad \text{for all } u \in H, \ u \ne 0.$$
Suppose we are also given a continuous linear functional u → L(u), u ∈ H, and a closed convex set U ⊂ H, which we shall consider as the set of admissible controls. Under these assumptions, we take the cost in the control problem to be

$$C(u) = a(u, u) - 2L(u) \quad \text{for } u \in U. \tag{1}$$

We shall show in Section 3 that this form of the cost functional includes the example problem in Section 1, as well as many other problems of interest. The following theorem gives some conditions on the form a and the control restraint set U for an optimal control to exist.
Theorem 1 If a is coercive on H, that is,

$$a(u, u) \ge K\|u\|^2 \quad \text{for all } u \in H, \quad K > 0,$$

then there exists a unique u* ∈ U minimizing C over U. If U is bounded, a unique minimizing element always exists (even if a is not coercive).

Proof Suppose a is coercive on H, and let {u_n} be a minimizing sequence, that is,

$$C(u_n) \downarrow \inf_{u \in U} C(u).$$

Then

$$C(u_1) \ge C(u_n) = a(u_n, u_n) - 2L(u_n) \ge K\|u_n\|^2 - 2K_1\|u_n\|, \qquad n = 1, 2, 3, \ldots,$$

by coercivity and boundedness of L. Solving this quadratic inequality, we must have ‖u_n‖ bounded uniformly in n,
that is, the minimizing sequence {u_n} is bounded. Since H is a Hilbert space, the set {u_n} is relatively weakly sequentially compact. Consequently, we can choose a weakly convergent subsequence u_{n_k} → u*, weakly in H, u* ∈ H. U is closed and convex; hence it is a weakly closed subset of H, and so u* ∈ U. However, the function u → C(u) is convex and norm continuous; hence it is weakly lower semicontinuous, and so

$$C(u^*) \le \liminf_{k\to\infty} C(u_{n_k}) = \inf_{u \in U} C(u),$$

that is, u* attains the infimum. The case of U bounded follows similarly, since now any minimizing sequence is automatically bounded in H. To prove uniqueness, we note that the cost C is actually strictly convex, that is,

$$C\big(\tfrac{1}{2}u_1 + \tfrac{1}{2}u_2\big) < \tfrac{1}{2}C(u_1) + \tfrac{1}{2}C(u_2) \quad \text{if } u_1 \ne u_2.$$

Then if u_1 and u_2 both minimize C over U,

$$C\big(\tfrac{1}{2}u_1 + \tfrac{1}{2}u_2\big) < \tfrac{1}{2}C(u_1) + \tfrac{1}{2}C(u_2) = \inf_{u \in U} C(u)$$

by strict convexity. However, since U is a convex set, ½u_1 + ½u_2 ∈ U, a contradiction, unless u_1 = u_2. ∎
The next theorem is the direct analog of the directional derivative test for a critical point from Section 1.

Theorem 2 An element u* ∈ U minimizes C over U if and only if

$$a(u^*, v) - a(u^*, u^*) \ge L(v) - L(u^*) \quad \text{for all } v \in U. \tag{2}$$

Proof Suppose u* is a minimizing element. Then for any v ∈ U and 0 ≤ θ ≤ 1, (1 − θ)u* + θv ∈ U; hence

$$C(u^*) \le C\big((1-\theta)u^* + \theta v\big),$$

whence

$$\frac{1}{\theta}\big[C(u^* + \theta(v - u^*)) - C(u^*)\big] \ge 0.$$

Substituting for C and using the bilinearity of a and the linearity of L gives, in the limit as θ → 0,

$$a(u^*, v) - a(u^*, u^*) \ge L(v) - L(u^*) \quad \text{for all } v \in U,$$

as required.
Conversely, suppose (2) holds. Since C is convex,

$$C\big((1-\theta)w + \theta v\big) \le (1-\theta)C(w) + \theta C(v) = C(w) + \theta\big(C(v) - C(w)\big);$$

hence

$$C(v) - C(w) \ge \frac{1}{\theta}\big[C((1-\theta)w + \theta v) - C(w)\big].$$

Now expanding and letting θ → 0, we find as before that

$$C(v) - C(w) \ge a(w, v) - a(w, w) - L(v) + L(w).$$

Setting w = u* gives

$$C(v) - C(u^*) \ge a(u^*, v) - a(u^*, u^*) - L(v) + L(u^*) \ge 0 \quad \text{for all } v \in U,$$

by hypothesis. Consequently, C(v) ≥ C(u*) for all v ∈ U; hence u* is a minimizing element. ∎

Inequalities of the type (2) in Theorem 2 are called variational inequalities. A special case of this theorem is worth noticing.

Corollary 1 In the case U = H (that is, there are no control constraints), (2) becomes

$$a(u^*, v) = L(v) \quad \text{for all } v \in H. \tag{2'}$$
Proof Set v = u* ± w in (2): when v = u* + w, (2) gives a(u*, w) ≥ L(w); when v = u* − w, it gives a(u*, w) ≤ L(w); that is,

$$a(u^*, w) = L(w) \quad \text{for all } w \in H. \tag{3}$$

∎

Corollary 1 is a special case of the well-known Lax–Milgram theorem.
The application of Theorems 1 and 2 to control problems of the type in Section 1 follows since

$$C(u) = \int_\Omega (y(u) - y_d)^2\,dx = \int_\Omega y(u)^2\,dx - 2\int_\Omega y(u)y_d\,dx + \int_\Omega y_d^2\,dx = (y(u), y(u)) - 2(y(u), y_d) + (y_d, y_d),$$

where (u, v) = ∫_Ω uv dx is the L²(Ω) inner product. Since y_d is fixed, C is minimized at the same time as

$$(y(u), y(u)) - 2(y(u), y_d),$$

which is in the general form considered in Theorems 1 and 2 with

$$a(u, u) = (y(u), y(u)), \qquad L(u) = (y(u), y_d), \qquad u \in H = L^2(\Omega).$$

By Theorem 2, the necessary condition for optimality given in Section 1 is in fact sufficient, and the existence and uniqueness of an optimal control can be derived from Theorem 1. [Note that in the case f ≠ 0, the mapping u → y(u) is only affine, not linear; however, it is easily seen from the proofs that the theorems still hold.] Theorems 1 and 2, and in particular Corollary 1, also have immediate application to the existence and uniqueness of weak solutions to partial differential equations.
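A finite-dimensional analogue makes Theorems 1 and 2 concrete. The sketch below is an illustration, not from the text; the matrix A, the box constraint, and the step size are assumptions. It minimizes C(u) = uᵀAu − 2Lᵀu over U = {u: |u_i| ≤ 1} by projected gradient descent and then tests the variational inequality (2) at the computed minimizer.

```python
def project(u):
    # Projection onto the box U = {u : |u_i| <= 1}.
    return [max(-1.0, min(1.0, x)) for x in u]

def minimize_quadratic(A, L, steps=2000, lr=0.05):
    # C(u) = u^T A u - 2 L^T u; its gradient is 2(Au - L).
    n = len(L)
    u = [0.0] * n
    for _ in range(steps):
        Au = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        u = project([u[i] - lr * 2.0 * (Au[i] - L[i]) for i in range(n)])
    return u

def variational_gap(A, L, u, v):
    # a(u, v) - a(u, u) - (L(v) - L(u)); Theorem 2 says this is >= 0 at u = u*.
    n = len(L)
    Au = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
    a_uv = sum(Au[i] * v[i] for i in range(n))
    a_uu = sum(Au[i] * u[i] for i in range(n))
    return a_uv - a_uu - (sum(L[i] * v[i] for i in range(n)) - sum(L[i] * u[i] for i in range(n)))
```

With A = I and L outside the box, the constrained minimizer is the projection of L onto U, and the gap is nonnegative for every admissible v, as Theorem 2 predicts.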
Example 1 Let

$$U = H = H^1(\Omega) = \Big\{v:\ v,\ \frac{\partial v}{\partial x_i} \in L^2(\Omega)\ \text{for } i = 1, 2, \ldots, n\Big\}.$$

H¹(Ω) is a Hilbert space with norm

$$\|v\|_1 = \left(\int_\Omega \big(|\nabla v|^2 + v^2\big)\,dx\right)^{1/2}.$$

Define, for u, v ∈ H¹(Ω),

$$a(u, v) = \int_\Omega (\nabla u \cdot \nabla v + uv)\,dx \qquad \text{and} \qquad L(v) = \int_\Omega f v\,dx$$
for some fixed element f ∈ L²(Ω). It is not difficult to show that a is a symmetric, bilinear, bounded form on H, and L is continuous and linear. The coercivity of a follows directly since

$$a(u, u) = \int_\Omega \big(|\nabla u|^2 + u^2\big)\,dx = \|u\|_1^2.$$

Consequently, by Theorem 1 and Corollary 1, there exists a unique u* ∈ H¹(Ω) satisfying

$$a(u^*, v) = L(v) \quad \text{for all } v \in H^1(\Omega),$$

or

$$\int_\Omega (\nabla u^* \cdot \nabla v + u^* v)\,dx = \int_\Omega f v\,dx \quad \text{for all } v \in H^1(\Omega). \tag{4}$$

Thus u* is the unique weak solution of the elliptic equation

$$-\Delta u + u = f \quad \text{in } \Omega, \qquad \frac{\partial u}{\partial n} = 0 \quad \text{on } \Gamma. \tag{5}$$
The correspondence between solutions of (4) and (5) can be obtained by Green's identity. Namely, if u satisfies (5), then, for any v ∈ H¹(Ω),

$$\int_\Omega f v\,dx = -\int_\Omega v(\Delta u - u)\,dx = -\int_\Gamma v\frac{\partial u}{\partial n} + \int_\Omega \nabla u \cdot \nabla v\,dx + \int_\Omega uv\,dx = \int_\Omega (\nabla u \cdot \nabla v + uv)\,dx.$$

Consequently, classical solutions of (5) obey (4). However, a priori, solutions of (4) need not even have two derivatives; hence the term weak solution.

Example 2 If we instead choose U = H = H₀¹(Ω) [the subspace of elements in H¹(Ω) vanishing on Γ], then, choosing a and L as before, we find a unique u* ∈ H₀¹(Ω) satisfying

$$\int_\Omega (\nabla u^* \cdot \nabla v + u^* v)\,dx = \int_\Omega f v\,dx \quad \text{for all } v \in H_0^1(\Omega).$$

Another application of Green's identity shows that u* is the weak solution of the Dirichlet problem

$$-\Delta u + u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \Gamma.$$
3. BOUNDARY CONTROL AND APPROXIMATE CONTROLLABILITY OF ELLIPTIC SYSTEMS

Boundary Control

We reconsider the control problem in Section 1, except now the control will be exercised through the boundary. Namely, suppose for a given control u ∈ L²(Γ) that y(u) is the solution of the Dirichlet problem

$$\Delta y(u) = 0 \quad \text{in } \Omega, \qquad y(u) = u \quad \text{on } \Gamma. \tag{1}$$

Again a desired state y_d ∈ L²(Ω) is given, and we wish to minimize

$$C(u) = \int_\Omega (y(u) - y_d)^2\,dx$$

over the set of admissible controls U, a bounded, closed, convex subset of L²(Γ). The necessary and sufficient condition for optimality becomes, as before,

$$\int_\Omega (y(u^*) - y_d)\big(y(v) - y(u^*)\big)\,dx \ge 0 \quad \text{for all } v \in U. \tag{2}$$
If we introduce the adjoint system

$$\Delta p(u) = y(u) - y_d \quad \text{in } \Omega, \qquad p(u) = 0 \quad \text{on } \Gamma, \tag{3}$$

then for u ∈ L²(Γ) we have, by Green's identity,

$$\int_\Omega \Delta p(u^*)\,y(v - u^*)\,dx = -\int_\Gamma \frac{\partial p(u^*)}{\partial n}\,y(v - u^*) + \int_\Gamma p(u^*)\,\frac{\partial y(v - u^*)}{\partial n} + \int_\Omega p(u^*)\,\Delta y(v - u^*)\,dx. \tag{4}$$

Using the state and adjoint equations, (4) becomes

$$\int_\Omega \Delta p(u^*)\,y(v - u^*)\,dx = -\int_\Gamma \frac{\partial p(u^*)}{\partial n}(v - u^*), \tag{5}$$

and hence the necessary and sufficient condition for optimality is now

$$\int_\Gamma \frac{\partial p(u^*)}{\partial n}(v - u^*) \le 0 \quad \text{for all } v \in U. \tag{6}$$
Again, in the case U = {v: |v(x)| ≤ 1, x ∈ Γ}, (6) can be shown to be equivalent to the local condition

$$\frac{\partial p(u^*)}{\partial n}(x)\,v(x) \le \frac{\partial p(u^*)}{\partial n}(x)\,u^*(x) \quad \text{a.e. } x \in \Gamma,$$

that is,

$$u^*(x) = \operatorname{sgn}\left\{\frac{\partial p(u^*)}{\partial n}(x)\right\} \quad \text{a.e. } x \in \Gamma. \tag{7}$$
that is, u*(x)
Remark
E
(7)
In the case of Neumann boundary control,
f
in
n
--=U
on
F,
L\y(u) = ·oy(u)
on
the optimality condition becomes
Sr
p(u*)(v - u*)
~
for all
0
v E U,
(8)
with the adjoint system L\p(u) = y(u) - Yd
in
n,
op(u) = 0 on
on
r.
The analog of (7) is now u* = -sgn{p(u*)}.
Approximate Controllability

In Section 1 we supposed a state y_d ∈ L²(Ω) was given, and we tried to minimize the cost subject to

$$\Delta y(u) = u \quad \text{in } \Omega, \qquad y(u) = 0 \quad \text{on } \Gamma. \tag{9}$$
A natural question to ask is, if the controls are allowed to vary over L²(Ω), can any desired state y_d be approximated as closely as we like by some y(u), for u ∈ L²(Ω)? In other words, is the set of states {y(u): u ∈ L²(Ω)} dense in L²(Ω)? Of course this is a controllability problem. Now instead of trying to reach a desired state exactly [in L²(Ω) this is clearly ruled out by the regularity of y(u)], we ask if a desired state can be approximated as closely as we like by controls in L²(Ω). This concept is called approximate controllability. It is a well-known consequence of the Hahn–Banach theorem that the set {y(u): u ∈ L²(Ω)} is dense in L²(Ω), and hence (9) is approximately controllable there, if and only if

$$\int_\Omega \eta\,y(u)\,dx = 0 \quad \text{for all } u \in L^2(\Omega) \tag{10}$$

implies that η = 0. If we define the adjoint equation as before,

$$\Delta p = \eta \quad \text{in } \Omega, \qquad p = 0 \quad \text{on } \Gamma,$$

then

$$\int_\Omega \eta\,y(u)\,dx = \int_\Omega \Delta p\,y(u)\,dx = \int_\Omega p u\,dx.$$

Hence (10) implies that

$$\int_\Omega p u\,dx = 0 \quad \text{for all } u \in L^2(\Omega),$$

and so p ≡ 0, that is,

$$\eta = \Delta p \equiv 0,$$
and the system (9) is approximately controllable.

In the case of boundary control, however, the system is not approximately controllable in L²(Ω). For now suppose

$$\int_\Omega \eta(x)\,y(u)(x)\,dx = 0 \quad \text{for } \eta \in L^2(\Omega),$$

where

$$\Delta y(u) = 0 \quad \text{in } \Omega, \qquad y(u) = u \quad \text{on } \Gamma.$$

If we define the adjoint system by

$$\Delta p = \eta \quad \text{in } \Omega, \qquad p = 0 \quad \text{on } \Gamma,$$

then

$$0 = \int_\Omega \eta\,y(u)\,dx = \int_\Omega \Delta p\,y(u)\,dx = -\int_\Gamma \frac{\partial p}{\partial n}\,y(u) + \int_\Gamma p\,\frac{\partial y(u)}{\partial n} + \int_\Omega p\,\Delta y(u)\,dx = -\int_\Gamma \frac{\partial p}{\partial n}\,u.$$

Consequently, ∫_Γ (∂p/∂n)u = 0 for all u ∈ L²(Γ); hence ∂p/∂n = 0 on Γ. Thus the adjoint system becomes

$$\Delta p = \eta \quad \text{in } \Omega, \qquad p = 0 \quad \text{on } \Gamma, \qquad \frac{\partial p}{\partial n} = 0 \quad \text{on } \Gamma. \tag{11}$$

However, it is now possible to find a nonzero η ∈ L²(Ω) satisfying (11). For instance, if n = 1, we can choose p as shown in Fig. 2 and pick η = p″. For such an η, we have

$$\int_\Omega \eta\,y(u) = 0 \quad \text{for all } u \in L^2(\Gamma),$$

and so {y(u): u ∈ L²(Γ)} is not dense in L²(Ω) and the system is not approximately controllable.
Fig. 2
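The one-dimensional counterexample can be checked numerically. This sketch assumes the particular choice p(x) = x²(1 − x)², one function consistent with Fig. 2 (p and p′ vanish at both end points), and sets η = p″. In one dimension a harmonic state y(u) is linear, determined by its two boundary values, and every such state is orthogonal to η.

```python
def eta(x):
    # eta = p'' for p(x) = x^2 (1 - x)^2 = x^2 - 2x^3 + x^4,
    # which satisfies p(0) = p(1) = p'(0) = p'(1) = 0.
    return 2.0 - 12.0 * x + 12.0 * x * x

def inner(yfun, n=10000):
    # Midpoint-rule approximation of integral_0^1 eta(x) * y(x) dx.
    h = 1.0 / n
    return h * sum(eta((i + 0.5) * h) * yfun((i + 0.5) * h) for i in range(n))
```

Both ∫η dx and ∫η·x dx vanish, so ∫η·(a + bx) dx = 0 for every linear (harmonic) state, while ∫η² dx = 0.8, so η itself is nonzero.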
4. THE CONTROL OF SYSTEMS GOVERNED BY PARABOLIC EQUATIONS

In this section we consider certain control systems governed by parabolic partial differential equations that typically arise in heat conduction or diffusion problems [1]. For example, if the temperature in a homogeneous body Ω (Fig. 3) is controlled by a variable heat source u(x, t) for x ∈ Ω, t > 0; if we denote the temperature corresponding to u by y(x, t: u), or just y(t: u) or y(u); and if the boundary of Ω is kept at a constant, say zero, temperature, then y satisfies

$$\frac{\partial y(u)}{\partial t} = c\,\Delta y(u) + f + u \quad \text{in } \Omega,$$
$$y(x, 0: u) = y_0(x), \quad x \in \Omega, \qquad y(u) = 0 \quad \text{on } \Gamma. \tag{1}$$
The initial temperature y_0 and fixed source term f will be assumed to be elements of L²(Ω). For simplicity, we take c = 1. [The case of an insulated boundary, ∂y(u)/∂n = 0, can be handled analogously.] If the controls u are constrained to belong to a given set of admissible controls U ⊂ L²(Ω), and a desired temperature distribution y_d ∈ L²(Ω) is given at some final, fixed time T, then one control problem of interest is to choose an admissible control u that brings y(u) as close to y_d as possible in the given time T. That is,

$$\min_{u \in U} C(u) = \int_\Omega \big(y(T: u) - y_d\big)^2\,dx,$$

where we again use the mean-square-error criterion. Since the cost C is essentially of the same type as before, our previous theory applies. In particular, if U is a bounded, closed, convex set in L²(Ω), then an optimal control u* exists. The necessary and sufficient condition for
Fig. 3
optimality is

$$\int_\Omega \big(y(T: u^*) - y_d\big)\big(y(T: v) - y(T: u^*)\big)\,dx \ge 0 \quad \text{for all } v \in U. \tag{2}$$
Introducing the adjoint system,

$$\frac{\partial p(u)}{\partial t} + \Delta p(u) = 0 \quad \text{in } \Omega \times (0, T),$$
$$p(T: u) = y(T: u) - y_d \quad \text{in } \Omega,$$
$$p(u) = 0 \quad \text{on } \Gamma \times (0, T),$$

we have that

$$0 = \int_0^T\!\!\int_\Omega \Big\{\frac{\partial p(u^*)}{\partial t} + \Delta p(u^*)\Big\}\big(y(t: v) - y(t: u^*)\big)\,dx\,dt$$
$$= \int_\Omega\!\int_0^T \frac{\partial p(u^*)}{\partial t}\big(y(t: v) - y(t: u^*)\big)\,dt\,dx + \int_0^T\!\!\int_\Omega \Delta p(u^*)\big(y(t: v) - y(t: u^*)\big)\,dx\,dt.$$
On integrating by parts,

$$0 = \int_\Omega \Big\{p(t: u^*)\big(y(t: v) - y(t: u^*)\big)\Big|_0^T - \int_0^T p(u^*)\Big(\frac{\partial y(v)}{\partial t} - \frac{\partial y(u^*)}{\partial t}\Big)\,dt\Big\}\,dx + \int_0^T\!\!\int_\Omega p(u^*)\big(\Delta y(v) - \Delta y(u^*)\big)\,dx\,dt$$
$$= \int_\Omega p(T: u^*)\big(y(T: v) - y(T: u^*)\big)\,dx - \int_0^T\!\!\int_\Omega p(u^*)\Big(\frac{\partial y(v)}{\partial t} - \Delta y(v) - \frac{\partial y(u^*)}{\partial t} + \Delta y(u^*)\Big)\,dx\,dt$$
$$= \int_\Omega p(T: u^*)\big(y(T: v) - y(T: u^*)\big)\,dx - \int_0^T\!\!\int_\Omega p(u^*)(v - u^*)\,dx\,dt.$$
I: In p(u*)(v -
If U = {u:
lui:$;
u*)dx dt
~ 0
for all
v E U.
(3)
I}, then (3) becomes u*(x, t) = - sgn{p(x, t : u*)},
and in this case, the optimal control u* is completely determined by the necessary condition.
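For computation, y(T: u) can be approximated and the cost evaluated directly. The following is a sketch assuming a one-dimensional Ω = (0, 1), c = 1, and hypothetical grid sizes, using forward-time centered-space differencing with the usual stability restriction dt ≤ dx²/2.

```python
def heat_forward(u, f, y0, T, nx=50, nt=5000):
    # Explicit finite differences for y_t = y_xx + f + u on (0, 1),
    # with y = 0 at both ends and y(x, 0) = y0(x); returns y(., T).
    dx = 1.0 / nx
    dt = T / nt
    assert dt <= 0.5 * dx * dx, "FTCS stability condition violated"
    y = [y0(i * dx) for i in range(nx + 1)]
    y[0] = y[nx] = 0.0
    for k in range(nt):
        t = k * dt
        y = [0.0] + [
            y[i] + dt * ((y[i - 1] - 2.0 * y[i] + y[i + 1]) / (dx * dx)
                         + f(i * dx) + u(i * dx, t))
            for i in range(1, nx)
        ] + [0.0]
    return y
```

Starting from y_0(x) = x(1 − x) with f = u = 0, the value at x = 0.5 after time 0.1 is close to (8/π³)e^{−π²/10} ≈ 0.096 from the Fourier sine series.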
Proposition 1 [5] If y(u*) ≠ y_d, then the optimal control u* is bang-bang, |u*| ≡ 1, and u* is unique.

Proof As usual, the only way u* can fail to be bang-bang and unique is if p(x, t: u*) = 0 for (x, t) ∈ E ⊂ Ω × (0, T), E nonnull. By taking E smaller, if necessary, we can suppose that E ⊂ Ω × (0, t_1) for any t_1 < T that is sufficiently close to T. Since p satisfies the heat equation with homogeneous boundary conditions, it must be analytic in Ω × (0, t_1) [2], and hence identically zero there. Now letting t_1 ↑ T and using the continuity of p(u*) in time gives

$$p(T: u^*) = 0 \quad \text{a.e. in } \Omega,$$

and hence y(u*) = y_d, a contradiction. ∎
Boundary Control of the Heat Equation

We now consider the problem of controlling the temperature in the region Ω when the body is heated by convection on its boundary. For this problem, Newton's law of heating (cooling) is a good approximation, and we can write the state equations

$$y_t = \Delta y \quad \text{in } \Omega, \qquad y(x, 0) = y_0(x), \quad x \in \Omega, \tag{4}$$
$$\frac{\partial y}{\partial n}(x, t) = b\big(u(x, t) - y(x, t)\big), \qquad x \in \Gamma, \quad t > 0,$$

where u is the (boundary) control function and b a positive constant. In this case the set of admissible controls U will be taken as a closed, convex, bounded subset of L²(Γ × (0, T)). In the same manner as before, we define the adjoint system

$$\frac{\partial p(u)}{\partial t} + \Delta p(u) = 0 \quad \text{in } \Omega \times (0, T),$$
$$p(T: u) = y(T: u) - y_d \quad \text{in } \Omega,$$
$$\frac{\partial p(u)}{\partial n} + b\,p(u) = 0 \quad \text{on } \Gamma,$$

and the necessary and sufficient condition for optimality becomes

$$\int_0^T\!\!\int_\Gamma p(t: u^*)(v - u^*) \ge 0 \quad \text{for all } v \in U. \tag{5}$$
Now in the case U = {u ∈ L²(Γ × (0, T)): |u| ≤ 1}, we have

$$u^*(x, t) = -\operatorname{sgn}\big(p(x, t: u^*)\big)$$

and the following proposition.

Proposition 2 (Knowles [3]; Lions [5]) If y(u*) ≠ y_d and Γ is analytic, then u* is bang-bang and unique.
Proof (Sketch) Under the above assumptions, p(u*) is analytic in Ω × (0, t_1), t_1 < T [2]. Consequently, if p(u*) vanishes on a nonnull set in Γ × (0, t_1), it must be identically zero on Γ × (0, t_1), and hence so must ∂p(u*)/∂n, by the adjoint boundary condition. Since p(u*) is analytic in Ω × (0, t_1) and since p(u*) and ∂p(u*)/∂n vanish on Γ × (0, t_1), uniqueness requires that p(u*) ≡ 0 in Ω × (0, t_1). As before, by continuity, this implies y_d = y(T: u*), a contradiction. ∎

Corollary 1 In the case in which n = 1 and Ω is an interval, the optimal control u* = u*(t) can, under the assumptions of Proposition 2, have at most a countable number of switches, and if so, the switching times must accumulate at T.
Proof u*(t) = −sgn{p(t: u*)}, and by analyticity, p can have only a finite number of zeros in any interval (0, t_1), t_1 < T. ∎

5. TIME OPTIMAL CONTROL
Instead of taking the terminal time T fixed, we now consider certain extensions of the time optimal control problem of Chapter I to systems governed by parabolic partial differential equations. As we noted in our earlier remarks on approximate controllability (see also Section 6), exact hitting of the desired final state y_d may not be possible in any finite time, although we may be able to approach y_d as closely as we like in finite time. Consequently, a more practical formulation of the time optimal control problem for partial differential equations is to suppose a "terminal error" ε is preassigned, and to try to hit the target set

$$W = \{y:\ \|y - y_d\| \le \varepsilon\} \tag{1}$$

in minimum time, using admissible controls. Since we have been assuming y_d ∈ L²(Ω), we shall take the norm in (1) to be the L²(Ω) norm. [Other choices of the norm in (1) are clearly possible and may, in fact, be more interesting in applications.]
If y(t: u) is the solution of the state equations (4.1) at time t [or alternatively (4.4)] corresponding to a control u ∈ U, a closed, convex, bounded subset of L²(Ω) [respectively L²(Γ)], then we seek a control u* ∈ U such that y(t*: u*) ∈ W and t* is minimal. Given that some admissible control reaches W in finite time, it can be shown by basically the same geometric methods as used in Chapter I that the minimum time t* exists [3, 5]. A necessary condition characterizing t* and u*, the optimal control, can be obtained from the previous problem. Namely, in the control problem with cost

$$C(u) = \|y(t^*: u) - y_d\|, \tag{2}$$

u* must also be optimal, with minimal cost ε; that is,

$$C(u^*) = \min_{u \in U} C(u) = \varepsilon.$$

For if u ∈ U achieved a smaller cost in (2), C(u) < ε, then

$$\|y(t^*: u) - y_d\| < \varepsilon.$$

However, the mapping t ↦ y(t: u), [0, T] → L²(Ω), is continuous, so we must be able to find a smaller time t̄ < t* with

$$\|y(\bar t: u) - y_d\| \le \varepsilon,$$

which contradicts the minimality of t*. Conversely, it can be similarly argued that if u_1 minimizes the cost (2), then u_1 must also be a time optimal control.

By virtue of this equivalence, the optimality conditions can be deduced from Section 4. For the time optimal control problem with state equations (4.1) and corresponding adjoint system, the optimality conditions may be stated as

$$\int_0^{t^*}\!\!\int_\Omega p(u^*)(v - u^*)\,dx\,dt \ge 0 \quad \text{for all } v \in U. \tag{3}$$

For the boundary control problem (4.4), with adjoint equations (4.5), the optimality conditions now are

$$\int_0^{t^*}\!\!\int_\Gamma p(u^*)(v - u^*) \ge 0 \quad \text{for all } v \in U. \tag{4}$$

The analogous local necessary conditions also hold, as in fact do the conclusions of Proposition 4.2 and Corollary 4.1.
Proposition 1 (Knowles [3]) If the set of admissible controls is U = {u ∈ L²(Ω × [0, T]): |u(x)| ≤ 1} for the problem of Eqs. (4.1), or U = {u ∈ L²(Γ × [0, T]): |u(x)| ≤ 1} for the problem of Eqs. (4.4) and Γ is analytic, then time-optimal controls for both problems are unique, bang-bang (|u(x)| ≡ 1), and uniquely determined by (3) [respectively (4)].
6. APPROXIMATE CONTROLLABILITY FOR PARABOLIC PROBLEMS

In this section we give a partial answer to the controllability questions raised in Section 5. Namely, if y(t: u) denotes the solution of (4.4) for u ∈ L²(Γ × [0, T]), then we shall show that {y(T: u): u ∈ L²(Γ × [0, T])} is dense in L²(Ω) for any T > 0, and so the boundary control problem (4.4) is approximately controllable. Again, we cannot possibly expect exact controllability in L²(Ω) (or H^p(Ω), for that matter), because of the smoothing properties of parabolic differential equations. By the same methods one can show that the distributed control problem of Eqs. (4.1) is also approximately controllable in L²(Ω) in any time T > 0.

If {y(T: u): u ∈ L²(Γ × [0, T])} were not dense in L²(Ω), then, by the Hahn–Banach theorem, we could find a nonzero η ∈ L²(Ω) such that

$$\int_\Omega \eta\,y(T: u)\,dx = 0 \quad \text{for all } u \in L^2(\Gamma \times [0, T]). \tag{1}$$

Introducing the adjoint system

$$\frac{\partial p}{\partial t} + \Delta p = 0 \quad \text{in } \Omega \times (0, T),$$
$$\frac{\partial p}{\partial n} + bp = 0 \quad \text{on } \Gamma \times (0, T),$$
$$p(x, T) = \eta(x), \qquad x \in \Omega,$$

(1) then implies that

$$\int_\Omega p(T)\,y(T: u)\,dx = 0 \quad \text{for all } u \in L^2(\Gamma \times [0, T]), \tag{2}$$

and on integration by parts, (2) leads to

$$b\int_0^T\!\!\int_\Gamma p\,u = 0 \quad \text{for all } u \in L^2(\Gamma \times [0, T]).$$
Thus

$$p = 0 \quad \text{on } \Gamma \times (0, T),$$

and consequently

$$\frac{\partial p}{\partial n} = 0 \quad \text{on } \Gamma \times (0, T)$$

[by (4.4)]. Uniqueness then requires that p = 0 in Ω × (0, T). Again taking t ↑ T, and using the continuity of t ↦ p(t), we have that p(T) = η = 0, a contradiction.

We note in passing that the similarity between the above proof and that of Propositions 4.1 and 4.2 is not coincidental. In fact, in the propositions we showed that the system had an analog of the "normality property" of Chapter I, namely, optimal controls were bang-bang, unique, and uniquely determined by the necessary condition. As in the case of ordinary differential equations, it can be shown that normality implies approximate controllability and, in fact, is in some cases equivalent to it [4].

An analog of the distributed control problem (4.1) is to consider the action of m independently acting "point-source" controllers. In one dimension, we could consider controls acting on Ω as shown in Fig. 4. The state equation would then become
$$\frac{\partial y}{\partial t} = \Delta y + \sum_{i=1}^{m} g_i(x)u_i(t), \qquad x \in \Omega, \quad t > 0,$$
$$y(x, t) = 0, \qquad x \in \Gamma, \quad t > 0, \tag{3}$$
$$y(x, 0) = 0, \qquad x \in \Omega,$$
with g_i = δ_{x_i}, the Dirac function centered at x_i, i = 1, 2, ..., m. Alternatively, with m = 1, g_1(x) = 1, (3) would model the problem of controlling the temperature on the rod Ω by the ambient temperature u_1(t).

Fig. 4

If λ_j and φ_j, j = 1, 2, ..., are the eigenvalues and eigenfunctions of the system y″ = λy in Ω with homogeneous Dirichlet boundary conditions, then for g_i ∈ L²(Ω), the solution of (3) can be written

$$y(x, t: u) = \sum_{j=1}^{\infty}\left(\sum_{i=1}^{m}(g_i, \varphi_j)\int_0^t \exp[-\lambda_j(t - \tau)]\,u_i(\tau)\,d\tau\right)\varphi_j(x), \tag{4}$$
where (g_i, φ_j) = ∫_Ω g_i(x)φ_j(x) dx are the Fourier coefficients of the {g_i} with respect to the {φ_j}. The solution (4) is also valid for g_i = δ_{x_i} if we interpret (g_i, φ_j) = φ_j(x_i). As before, (3) is approximately controllable in time T if and only if
$$\int_\Omega \eta\,y(T: u)\,dx = 0 \quad \text{for all } u = (u_i), \ u_i \in L^2(0, T), \ i = 1, 2, \ldots, m, \tag{5}$$

implies that η = 0 (η ∈ L²(Ω)). If m = 1, (5) is equivalent to

$$\sum_{j=1}^{\infty} (g, \varphi_j)\left(\int_0^T \exp[-\lambda_j(T - \tau)]\,u(\tau)\,d\tau\right)(\eta, \varphi_j) = 0 \quad \text{for all } u \in L^2(0, T),$$

and hence

$$\sum_{j=1}^{\infty} (g, \varphi_j)(\eta, \varphi_j)\exp(-\lambda_j \tau) \equiv 0 \quad \text{for } 0 < \tau \le T. \tag{6}$$
However, the functions {exp(−λ_j τ)} are linearly independent, and so (6) must imply that

$$(g, \varphi_j)(\eta, \varphi_j) = 0 \quad \text{for all } j = 1, 2, \ldots. \tag{7}$$

The necessary and sufficient condition for approximate controllability can now be read from (7). If (g, φ_{j_0}) = 0 for some j_0, then we can certainly find a nonzero η satisfying (7); for instance, take η = φ_{j_0}. Conversely, if (g, φ_j) ≠ 0 for all j = 1, 2, ..., then from (7), (η, φ_j) = 0 for all j, and hence η = 0, since the {φ_j} span L²(Ω). Consequently, with m = 1, (3) is approximately controllable if and only if (g, φ_j) ≠ 0 for every j = 1, 2, .... In the general case, with m controls, one can similarly show (5) is equivalent to

$$(g_i, \varphi_j)(\eta, \varphi_j) = 0 \quad \text{for all } i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots.$$

Consequently, (3) is approximately controllable in any time T > 0 if and only if for every j = 1, 2, ...,

$$(g_i, \varphi_j) \ne 0 \quad \text{for some } i = 1, 2, \ldots, m. \tag{8}$$

Note that in the case g_i = δ_{x_i}, (8) becomes: for every j = 1, 2, ...,

$$\varphi_j(x_i) \ne 0 \quad \text{for some } i = 1, 2, \ldots, m,$$
or equivalently, for every j, at least one x_i, i = 1, 2, ..., m, is not a nodal point of φ_j.
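The nodal-point criterion is easy to test for Ω = (0, 1), where the Dirichlet eigenfunctions are φ_j(x) = sin(jπx); the range of j checked and the tolerance below are assumptions of this sketch.

```python
import math

def controllable(points, j_max=200, tol=1e-9):
    # (3) with point controls at x_1, ..., x_m is approximately controllable
    # iff for every j some controller avoids the nodes of phi_j(x) = sin(j*pi*x).
    for j in range(1, j_max + 1):
        if all(abs(math.sin(j * math.pi * x)) <= tol for x in points):
            return False   # every control location is a nodal point of phi_j
    return True
```

A single controller at x_1 = 1/2 fails (φ_2 vanishes there), as does the pair {1/2, 1/3} (both sit on nodes of φ_6), while an irrational location such as 1/√2 passes for every j examined.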
Finally, in the case in which the state equations (3) are over a region Ω ⊂ R^N, if λ_j are the distinct eigenvalues of

$$\Delta y = \lambda y \quad \text{in } \Omega, \qquad y = 0 \quad \text{on } \Gamma,$$

and {φ_{kj}}_{k=1}^{m_j} are corresponding eigenfunctions (m_j is the multiplicity of λ_j), then the solution of (3) becomes

$$y(x, t: u) = \sum_{j=1}^{\infty}\sum_{k=1}^{m_j}\left(\sum_{i=1}^{m}\int_0^t \exp[-\lambda_j(t - \tau)]\,u_i(\tau)\,d\tau\,(g_i, \varphi_{kj})\right)\varphi_{kj}(x),$$

and the same argument now shows (3) is approximately controllable in any time T if and only if

$$\operatorname{rank}\begin{pmatrix} (g_1, \varphi_{1j}) & (g_2, \varphi_{1j}) & \cdots & (g_m, \varphi_{1j}) \\ (g_1, \varphi_{2j}) & (g_2, \varphi_{2j}) & \cdots & (g_m, \varphi_{2j}) \\ \vdots & \vdots & & \vdots \\ (g_1, \varphi_{m_j j}) & (g_2, \varphi_{m_j j}) & \cdots & (g_m, \varphi_{m_j j}) \end{pmatrix} = m_j$$
for every j = 1, 2, .... In particular, one needs at least as many controls as the largest multiplicity of the eigenvalues. Consequently, any system (3) with infinite multiplicity of its eigenvalues cannot be approximately controllable in L²(Ω) with "point controls." For further discussion of this subject, we refer the reader to Triggiani [6].

References

[1] A. Butkovskiy, "Distributed Control Systems." Elsevier, Amsterdam, 1969.
[2] A. Friedman, "Partial Differential Equations." Holt, New York, 1969.
[3] G. Knowles, Time optimal control in infinite dimensional spaces, SIAM J. Control Optim. 14, 919–933 (1976).
[4] G. Knowles, Some problems in the control of distributed systems, and their numerical solution, SIAM J. Control Optim. 17, 5–22 (1979).
[5] J. Lions, "Optimal Control of Systems Governed by Partial Differential Equations." Springer-Verlag, Berlin and New York, 1971.
[6] R. Triggiani, Extensions of rank conditions for controllability and observability to Banach spaces and unbounded operators, SIAM J. Control Optim. 14, 313–338 (1976).
Appendix I
Geometry of Rⁿ
Suppose that C is a set contained in W, and that for x E C, Ilxll = (xi + x~ + ... + X;)l/2. C is called bounded if there exists a number (X < 00 such that IIxll ::;; (X for all x E C. C is called closed if it contains its boundary; i.e., if x, E C and Ilxn - xII ~ 0, then we must have xE C. C is convex ifit contains every line segment whose end points lie in C; i.e., ifx, y E C, then tx h E C. (See Fig. 1.) An element x E C is called an extreme point of C if it does not lie on any (nontrivial) line segment contained in C (alternatively, if it cannot be written as a proper linear combination of elements of C). For example, the vertices of a square are extreme points (see Fig. 2). Given a vector" ERn, the hyperplane through a point Xl ERn with normal" is H = {x E R": "TX = "TXtJ (see Fig. 3). A point Xl E C is supported (Fig. 4).by this hyperplane H if C lies on one side of H, i.e., "TX l ~ "TX for all x E C. The point Xl is called an exposed point of C if Xl is supported by a hyperplane H (with normal", say) and if Xl is the only point of C supported by H. In other words "TX l > "TX for all x E C, and x =f. Xl' In Fig. 5, Xl is an exposed point. In Fig. 6, Xl is a support point but not exposed point, and in Fig. 7, Xl is an extreme point and a support point, but not exposed point.
[Figs. 1-7: Fig. 1 contrasts a convex set with a nonconvex one. Fig. 2 shows a square abcd, whose vertices are extreme points. Fig. 3 shows a hyperplane through a point with normal η. Fig. 4 shows a point x_1 supported by a hyperplane H. Fig. 5 shows an exposed point x_1. Figs. 6 and 7 show points x_1 that are support points (in Fig. 7 also an extreme point) but not exposed points.]
Theorem 1 If C is a closed, bounded convex set, and if x_1 belongs to the boundary of C, then we can find a hyperplane H (with normal η ≠ 0) that supports C at x_1.

A set C is called strictly convex if every boundary point of C is an exposed point (e.g., circles, ellipsoids). A rectangle is not strictly convex. In general every exposed point is a support point (trivial), and every exposed point is an extreme point (why?), but not conversely.
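These definitions are easy to check numerically. The following Python sketch (an illustration, not from the text; the sampling grid and tolerances are arbitrary choices) verifies that for C the closed unit disc and the boundary point x_1 = (1, 0), the hyperplane with normal η = x_1 supports C at x_1, and that the inequality is strict away from x_1, so x_1 is exposed.

```python
import numpy as np

def supports(eta, x1, points, strict=False):
    """Check whether the hyperplane through x1 with normal eta supports
    every sample point: eta^T x1 >= eta^T x (strict: > for x != x1)."""
    vals = points @ eta
    ref = eta @ x1
    if strict:
        return bool(np.all(vals < ref - 1e-12))
    return bool(np.all(vals <= ref + 1e-12))

# Sample the unit disc (radius < 1, so x1 itself is excluded from the grid).
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
r = np.linspace(0.0, 0.999, 40)
R, T = np.meshgrid(r, theta)
disc = np.column_stack([(R * np.cos(T)).ravel(), (R * np.sin(T)).ravel()])

x1 = np.array([1.0, 0.0])
eta = x1  # outward normal at x1

print(supports(eta, x1, disc))               # x1 is a support point
print(supports(eta, x1, disc, strict=True))  # and an exposed point
```

Both checks print True; for a square, the analogous strict check would fail at the midpoint of an edge, which is a support point but not exposed.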
Appendix II

Existence of Time Optimal Controls and the Bang-Bang Principle

1. LINEAR CONTROL SYSTEMS
We shall now suppose that we are given a control system of the form

ẋ(t) = A(t)x(t) + b(t)u(t),    x(0) = x_0,    (1)

where x(t) ∈ R^n, u(t) ∈ R, and A and b are n × n and n × 1 matrices of functions that are integrable on any finite interval. The controls u will be restricted to the set

U = {u : u is Lebesgue measurable and |u(τ)| ≤ 1 a.e.},    (2)

where U is commonly called the set of admissible controls; note that U ⊂ L^∞(R). A "target" point z ∈ R^n is given, and we shall consider the control problem of hitting z in minimum time by trajectories of (1), using only controls u ∈ U. Given a control u, denote the solution of (1) at time t by x(t : u). In fact, by variation of parameters,

x(t : u) = X(t)x_0 + X(t) ∫_0^t X^{-1}(τ)b(τ)u(τ) dτ,    (3)

where X(t) is the principal matrix solution of the homogeneous system Ẋ(t) = A(t)X(t) with X(0) = I.
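For a time-invariant A, formula (3) can be evaluated directly, since then X(t) = e^{At}. The following Python sketch (illustrative, not from the text; the truncated-series exponential and trapezoid quadrature are crude but adequate here) checks (3) against the known solution of the double integrator ẋ_1 = x_2, ẋ_2 = u with u ≡ 1:

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by truncated power series (fine for modest ||M||)."""
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def solve_vop(A, b, u, x0, t, n=1000):
    """Evaluate x(t:u) = X(t) x0 + X(t) * int_0^t X(tau)^{-1} b u(tau) dtau,
    with X(t) = exp(At), using the trapezoid rule for the integral."""
    taus = np.linspace(0.0, t, n + 1)
    vals = np.array([expm(-A * tau) @ b * u(tau) for tau in taus])
    dt = t / n
    integral = dt * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])
    return expm(A * t) @ (x0 + integral)

# Double integrator: x1' = x2, x2' = u, x(0) = 0, constant control u == 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
b = np.array([0.0, 1.0])
x = solve_vop(A, b, lambda tau: 1.0, np.zeros(2), t=2.0)
print(np.round(x, 6))  # exact solution at t = 2 is (t^2/2, t) = (2, 2)
```

Here A is nilpotent, so the series for e^{At} terminates and the quadrature is exact for the linear integrand, reproducing (2, 2) to rounding error.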
The attainable set at time t [namely, the set of all points (in R^n) reachable by solutions of (1) in time t using all admissible controls] is just

𝒜(t) = {x(t : u) : u ∈ U} ⊂ R^n.    (4)

It will also be of use to consider the set

ℛ(t) = {∫_0^t X^{-1}(τ)b(τ)u(τ) dτ : u ∈ U} ⊂ R^n.    (5)

Note that by (3)

𝒜(t) = X(t)[x_0 + ℛ(t)] = {X(t)x_0 + X(t)y : y ∈ ℛ(t)}    (6)

and

ℛ(t) = X^{-1}(t)𝒜(t) − x_0,    (7)

so that z ∈ 𝒜(t_1) if and only if (X^{-1}(t_1)z − x_0) ∈ ℛ(t_1). Finally, we define the bang-bang controls to be elements of

U_bb = {u : |u(τ)| = 1 a.e. τ},

and we denote the set of all points attainable by bang-bang controls by

𝒜_bb(t) = {x(t : u) : u ∈ U_bb},    t > 0.
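The set ℛ(T) of (5) can be explored numerically. For the double integrator (A constant, X(t) = [[1, t], [0, 1]]), the columns of X^{-1}(τ)b are (−τ, 1). In this Python sketch (an illustration, not part of the text; the discretization and sample count are arbitrary), endpoints from random admissible controls stay inside the box with sides ±T^2/2 and ±T, while the constant bang-bang control u ≡ +1 attains a corner exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 200
dt = T / n
taus = (np.arange(n) + 0.5) * dt  # midpoint rule (exact for linear integrands)

def R_point(u):
    """Point of R(T): ( -int tau u(tau) dtau, int u(tau) dtau )
    for a piecewise-constant control sampled on the midpoint grid."""
    return np.array([-np.sum(taus * u) * dt, np.sum(u) * dt])

# Random admissible controls, |u| <= 1 pointwise.
random_pts = [R_point(rng.uniform(-1.0, 1.0, n)) for _ in range(500)]

# Every admissible endpoint satisfies |first| <= T^2/2 and |second| <= T ...
print(all(abs(p[0]) <= T**2 / 2 and abs(p[1]) <= T for p in random_pts))
# ... and the bang-bang control u == +1 attains the corner (-T^2/2, T).
print(np.allclose(R_point(np.ones(n)), [-T**2 / 2, T]))
```

Both lines print True; the extreme endpoints are only achieved when |u| ≡ 1, foreshadowing the bang-bang principle below.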
2. PROPERTIES OF THE ATTAINABLE SET

In this section we derive the properties of 𝒜(t) that will be central to the study of the control problem. Define a mapping I : L^∞([0, T]) → R^n, T > 0, by

I(u) = ∫_0^T X^{-1}(τ)b(τ)u(τ) dτ,    u ∈ L^∞([0, T]).    (1)

Lemma 1 I is a continuous linear mapping between U ⊂ L^∞([0, T]) with the weak-star topology and R^n with the usual topology.

Proof The linearity follows directly from the additivity of integration. For the continuity, recall that on bounded subsets of L^∞([0, T]) the weak-star topology is metrizable, and u_r → u if and only if

∫_0^T y(τ)u_r(τ) dτ → ∫_0^T y(τ)u(τ) dτ    as r → ∞

for all y ∈ L^1([0, T]).
Suppose that (u_r) ⊂ U and u_r → u. Let X^{-1}(τ)b(τ) = (y_i(τ)). Then {y_i} ⊂ L^1([0, T]) by assumption, and so

∫_0^T y_i(τ)u_r(τ) dτ → ∫_0^T y_i(τ)u(τ) dτ

for all i = 1, 2, ..., n. In other words, I(u_r) → I(u). ∎
As an immediate consequence of this lemma, we have the following theorem.

Theorem 1 𝒜(T) is a compact, convex subset of R^n for any T > 0.

Proof The set of admissible controls is just the unit ball in L^∞([0, T]), and so is w*-compact and convex (Banach-Alaoglu theorem). Consequently,

ℛ(T) = I(U)

is the continuous, linear image of a w*-compact, convex set, so must be compact and convex. However,

𝒜(T) = X(T)[x_0 + ℛ(T)]

is just an affine translate of ℛ(T) in R^n, and so itself must be compact and convex. ∎

A much deeper result about the structure of the attainable set is contained in the following theorem, usually called the bang-bang principle. It says essentially that any point reachable by some admissible control in time T is reachable by a bang-bang control in the same time.

Theorem 2 (Bang-Bang Principle) For any T > 0, 𝒜(T) = 𝒜_bb(T).
Proof Note that 𝒜(T) is just an affine translate of ℛ(T). It will then be sufficient to show that for every point x ∈ ℛ(T), x = I(u*) for some bang-bang control u*. Let

B = I^{-1}({x}) ∩ U = {u : I(u) = x} ∩ U.

(B is just the set of all admissible controls in U hitting x in time T.) By Lemma 1, B is a weak-star compact, convex subset of L^∞([0, T]), and so by the Krein-Milman theorem it has an extreme point, say u*. If we can show |u*| = 1 a.e., then we shall have found our bang-bang control and be finished.
Suppose not. Then there must exist a set E ⊂ [0, T] of positive Lebesgue measure such that

|u*(τ)| < 1    for τ ∈ E.

In fact we can do a little better. Namely, let

E_m = {τ ∈ E : |u*(τ)| < 1 − 1/m},    m = 1, 2, ....

Then ⋃_{m=1}^∞ E_m = E, and since E has positive measure, at least one E_m must also have positive measure. So there must exist an ε > 0 and a nonnull set F ⊂ [0, T] with

|u*(τ)| < 1 − ε    for τ ∈ F.    (2)

Since F is nonnull, the vector space L^∞(F) (again with respect to Lebesgue measure) must be infinite dimensional, and as a consequence, the integration mapping I_F : L^∞(F) → R^n,

I_F(v) = ∫_F X^{-1}(τ)b(τ)v(τ) dτ,

cannot be 1-1. (I_F maps an infinite-dimensional vector space into a finite-dimensional vector space.) Consequently, Ker(I_F) ≠ 0, so we can choose a bounded measurable function v ≢ 0 on F with I_F(v) = 0. We now set v = 0 on [0, T] − F so that

I(v) = 0,    (3)

and by dividing through by a large-enough constant we can certainly suppose |v| ≤ 1 a.e. on [0, T]. Then by (2), |u* ± εv| ≤ 1, so that u* ± εv ∈ U, and by (3) I(u* ± εv) = I(u*) ± εI(v) = I(u*) = x, i.e., u* ± εv ∈ B. Since clearly

u* = ½(u* + εv) + ½(u* − εv)

and v ≢ 0, u* cannot be an extreme point of B, a contradiction. ∎
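To make Theorem 2 concrete, here is a Python sketch (illustrative; the system and switching pattern are my choices, not the text's). For the double integrator, the point of ℛ(T) reached by the non-bang-bang control u ≡ 0 is also reached exactly by a bang-bang control with two switches:

```python
import numpy as np

T = 2.0

def I(pattern):
    """I(u) = ( -int tau u dtau, int u dtau ) for the double integrator,
    with u piecewise constant, given as [(t_start, t_end, value), ...].
    The integrals over each piece are computed in closed form."""
    out = np.zeros(2)
    for a, b, v in pattern:
        out += v * np.array([-(b**2 - a**2) / 2.0, b - a])
    return out

# Bang-bang control: +1 on [0, T/4] and [3T/4, T], -1 on [T/4, 3T/4].
u_bb = [(0.0, T / 4, 1.0), (T / 4, 3 * T / 4, -1.0), (3 * T / 4, T, 1.0)]

# Same endpoint as u == 0, i.e. the origin of R(T).
print(np.allclose(I(u_bb), [0.0, 0.0]))  # True
```

The symmetric switching pattern cancels both integrals exactly, so an interior point of ℛ(T) is indeed hit by a control with |u| ≡ 1, as the theorem asserts.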
3. EXISTENCE OF TIME OPTIMAL CONTROLS

We return to the time optimal control problem considered in Section 1 of this Appendix. We shall assume throughout the rest of this section that the target point z is hit in some time by an admissible control; that is:

(*) There exists a t_1 > 0 and u ∈ U such that x(t_1 : u) = z.
Assumptions of the type (*) are called controllability assumptions, and, needless to say, are essential for the time-optimal control problem to be well posed. If we set

t* = inf{t : ∃u ∈ U with x(t : u) = z}    (1)

[this set is nonempty by (*) and bounded below by 0, so we can talk about its infimum], then t* would be the natural candidate for the minimum time, and it remains to construct an optimal control u* ∈ U such that x(t* : u*) = z.

Theorem 1 If (*) holds, then there exists an optimal control u* ∈ U such that x(t* : u*) = z in minimum time t*.
Proof Define t* as in (1). By this definition we can choose a sequence of times t_n ↓ t* and of controls {u_n} ⊂ U such that x(t_n : u_n) = z, n = 1, 2, .... Then

|z − x(t* : u_n)| = |x(t_n : u_n) − x(t* : u_n)|.    (2)

However, from the fundamental solution,

x(t_n : u_n) − x(t* : u_n) = X(t_n)x_0 − X(t*)x_0 + X(t_n) ∫_0^{t_n} X^{-1}(τ)b(τ)u_n(τ) dτ − X(t*) ∫_0^{t*} X^{-1}(τ)b(τ)u_n(τ) dτ.

Consequently,

|x(t_n : u_n) − x(t* : u_n)| ≤ |X(t_n)x_0 − X(t*)x_0| + |X(t_n) ∫_{t*}^{t_n} X^{-1}(τ)b(τ)u_n(τ) dτ| + |X(t*) − X(t_n)| |∫_0^{t*} X^{-1}(τ)b(τ)u_n(τ) dτ|.    (3)

The first term on the right-hand side of (3) clearly tends to zero as n → ∞ [X(·) is continuous]. The second term can be bounded above by

‖X(t_n)‖ ∫_{t*}^{t_n} |X^{-1}(τ)b(τ)| |u_n(τ)| dτ ≤ ‖X(t_n)‖ ∫_{t*}^{t_n} |X^{-1}(τ)b(τ)| dτ

since |u_n| ≤ 1. Consequently, as n → ∞ this term also tends to zero. Finally, the third term tends to zero, again by the continuity of X(·). Plugging (3) back into (2), we get

x(t* : u_n) → z    as n → ∞,
i.e., z is the limit of the points x(t* : u_n) of 𝒜(t*). However, by Theorem 2.1, 𝒜(t*) is compact (in particular closed), and so z ∈ 𝒜(t*). Consequently,

z = x(t* : u*)

for some u* ∈ U. ∎

Note that once we have shown that an optimal control exists, the bang-bang principle guarantees that an optimal bang-bang control exists. The above proof also carries over with only cosmetic changes to continuously moving targets z(t), 0 ≤ t ≤ T, in fact to "continuously moving" compact target sets, and to the general state equation

ẋ = A(t)x(t) + B(t)u(t),

where B(t) is n × m and u(t) = (u_1(t), ..., u_m(t))^T.
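For a concrete one-dimensional instance (my example, not from the text), take ẋ = −x + u with |u| ≤ 1, x_0 = 0, and target z ∈ (0, 1). The farthest reachable point at time t is 1 − e^{−t} (take u ≡ +1), so the minimum time is t* = −ln(1 − z), and bisection on the reachability condition recovers it:

```python
import math

z = 0.5  # target in (0, 1)

def reachable(t):
    """Can some admissible control reach z by time t?  Equivalent to
    asking whether the extreme trajectory x(t) = 1 - exp(-t) has passed z."""
    return (1.0 - math.exp(-t)) >= z

# Bisection for t* = inf{ t : z is reachable by time t }.
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if reachable(mid):
        hi = mid
    else:
        lo = mid

print(abs(hi - (-math.log(1.0 - z))) < 1e-9)  # True: t* = ln 2
```

The infimum is attained here by the bang-bang control u* ≡ +1, matching both the existence theorem and the bang-bang principle.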
Appendix III
Stability
Unstable control systems that magnify small errors into large fluctuations as time increases are rarely compatible with good design. Consequently, tests for the stability of systems of differential equations are extremely important. The system of differential equations

ẋ = f(x)    (1)

is said to be stable about x = 0 if, for every ε > 0, there exists a δ > 0 such that ||x_0|| < δ implies that the solution x(t) starting at x(0) = x_0 obeys ||x(t)|| < ε for 0 ≤ t < ∞. A necessary condition for stability about 0 is clearly that f(0) = 0, i.e., that 0 is an equilibrium point of (1). Usually, for control applications, one requires more than just stability; one would prefer that small fluctuations be damped out as time increases. The system (1) is called asymptotically stable about the origin if it is stable and there exists a δ > 0 such that ||x_0|| < δ implies that x(t) → 0 as t → ∞, where x(t) is the solution of (1) starting at x(0) = x_0. Furthermore, (1) is globally asymptotically stable about 0 if every solution of (1) converges to 0 as t → ∞. Global asymptotic stability is the most useful concept for practical design of control systems. If (1) is linear, that is, if

ẋ = Ax,    (2)
it is easy to see that a necessary and sufficient condition for global asymptotic stability is that A have eigenvalues with negative real parts only. For if {λ_j}_{j=1}^p are the distinct eigenvalues of A, then any solution of (2) can be written

x(t) = Σ_{j=1}^p g_j(t) exp(λ_j t),    (3)

where the g_j(t) are polynomials in t. From (3) it is easy to see that x(t) → 0 if and only if Re λ_j < 0, j = 1, 2, ..., p. The eigenvalues of A are determined as the roots of the characteristic equation det(λI − A) = 0. Once this equation has been computed it is not necessary to find all of the roots; the famous Routh-Hurwitz test gives necessary and sufficient conditions for the roots of a polynomial

d(λ) = a_0 λ^n + a_1 λ^{n−1} + ... + a_{n−1} λ + a_n,    a_0 > 0,    (4)

to have negative real parts (for a proof, see [1, Vol. II, Chapter XV]).
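A quick numerical check of the eigenvalue criterion in Python (illustrative; the two matrices are my examples, not the text's):

```python
import numpy as np

def gas(A, tol=1e-9):
    """Global asymptotic stability test for x' = Ax: all Re(lambda) < 0."""
    return bool(np.all(np.linalg.eigvals(A).real < -tol))

# Damped oscillator x'' + x' + x = 0: eigenvalues (-1 +- i sqrt(3))/2.
print(gas(np.array([[0.0, 1.0], [-1.0, -1.0]])))  # True
# Undamped oscillator x'' + x = 0: eigenvalues +-i, so not asymptotically stable.
print(gas(np.array([[0.0, 1.0], [-1.0, 0.0]])))   # False
```

The small tolerance guards against roundoff in the purely imaginary case, where computed real parts may differ from zero by machine epsilon.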
First, define the following determinants

        | a_1  a_3  a_5  ...  a_{2k−1} |
        | a_0  a_2  a_4  ...  a_{2k−2} |
Δ_k =   | 0    a_1  a_3  ...  a_{2k−3} |
        | 0    a_0  a_2  ...  a_{2k−4} |
        | ...                  ...     |
        | 0    0    0    ...  a_k      |

for k = 1, 2, ..., n, where we substitute zero for a_j if j > n.
Theorem 1 The polynomial (4) with real coefficients has all its roots with negative real parts if and only if the coefficients of (4) are positive and the determinants Δ_k > 0, k = 1, 2, ..., n.

Example 1 For the characteristic equation

a_0 λ^4 + a_1 λ^3 + a_2 λ^2 + a_3 λ + a_4 = 0,

the conditions for asymptotic stability are

a_i > 0,    i = 0, 1, 2, 3, 4,

and

      | a_1  a_3  0   |
Δ_3 = | a_0  a_2  a_4 | = a_3(a_1 a_2 − a_0 a_3) − a_1^2 a_4 > 0.
      | 0    a_1  a_3 |

Note that since a_i > 0, Δ_2 > 0 follows from Δ_3 > 0.
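The Hurwitz determinants are easy to form and test numerically. The following Python sketch (not from the text; the entry rule H[i][j] = a_{2j−i}, in 1-based indices, and the sample polynomial are my illustrative choices) cross-checks Theorem 1 against the roots of (λ^2 + λ + 1)^2 = λ^4 + 2λ^3 + 3λ^2 + 2λ + 1:

```python
import numpy as np

def hurwitz_dets(coeffs):
    """Leading principal minors Delta_k of the Hurwitz matrix of
    a0*l^n + a1*l^(n-1) + ... + an, with a_j = 0 for j outside 0..n."""
    a = list(coeffs)
    n = len(a) - 1
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            idx = 2 * (j + 1) - (i + 1)  # entry a_{2j - i} in 1-based indices
            if 0 <= idx <= n:
                H[i, j] = a[idx]
    return [np.linalg.det(H[:k, :k]) for k in range(1, n + 1)]

# lambda^4 + 2 lambda^3 + 3 lambda^2 + 2 lambda + 1: all coefficients positive.
c = [1.0, 2.0, 3.0, 2.0, 1.0]
dets = hurwitz_dets(c)
stable = all(d > 0 for d in dets)

# Cross-check against the computed roots (both should agree).
print(stable == bool(np.all(np.roots(c).real < 0)))  # True
```

Here the roots are (−1 ± i√3)/2, each doubled, with real part −1/2, so both the determinant test and the direct root check report stability.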
Index

A
Adjoint equation, 13, 37, 92
Admissible controls, 6, 169
Approximate controllability
  of elliptic systems, 153
  of parabolic systems, 161, 162
Attainable set, 10

B
Bang-bang controls, 3, 4, 18, 158, 159, 161
Bang-bang principle, 171
Bilinear form, 146
  coercive, 147
Boundary control
  of elliptic systems, 152
  of parabolic systems, 158
Bounded set, 165
Brachistochrone, 42

C
Calculus of variations, 40
Closed loop controls, 17
Closed set, 165
Controllability, 10, 96, 107
Control problem
  fixed-end point, 6
  fixed time, 4
  free-end point, 6, 61
  general form, 5, 6
  infinite horizon, 95
  linear regulator, 95
  minimum time, 1, 2, 9, 159, 172
  normal, 17, 19
  quadratic cost, 93
  state constrained, 123
Convex set, 165
Cost function, 6
Cycloids, 44

D
Differential game, 96
Dynamic programming
  continuous, 90, 96
  discrete, 87

E
Equivalent control systems, 110
Euler-Lagrange equation, 41
Exposed point, 19, 165
Extremals, 56
Extreme point, 165

F
Feedback control, 17, 94
First integral of Euler equation, 42

H
Hamiltonian, 13, 35, 59
Hamilton-Jacobi-Bellman equation, 91, 98, 99
Hyperplane, 165

I
Inventory control, 44, 46, 48, 135, 137

J
Jump conditions, 126

M
Marginal costs, 92
Maximum principle
  autonomous problems, 12, 35, 36
  nonautonomous problems, 59
Moon-landing problem, 4, 50

N
Null controllability, 111

O
Observability, 116, 118
Optimal control, 6

P
Performance index, 6
Planning horizon, 137
Principle of optimality, 87

R
Reachable set, 11
Restraint set, 6
Restricted maximum principle, 125
Riccati equation, 94
Routh-Hurwitz theorem, 176

S
Salvage cost, 61
Stability, 115
Stabilizability, 113
State equations, 5
Strictly convex set, 19, 168
Supporting hyperplane theorem, 168
Support point, 165
Switching locus, 16, 23, 26, 27
Switching times, 13
  numerical solution, 29

T
Target point, 6
Terminal payoff, 61
Transversality conditions, 37, 60
Two-point boundary-value problems, 71
  finite difference scheme, 81
  implicit boundary conditions, 77
  method of adjoints, 73
  multiple shooting, 82
  quasi-linearization, 79
  shooting methods, 76

V
Value of a differential game, 97
Variational inequalities, 149

W
Weierstrass E function, 57