Chapter I
d₀φ(t, x(t)) = [∂φ/∂t + a ∂φ/∂x + (σ²/2) ∂²φ/∂x²] dt + σ (∂φ/∂x) d₀η(t).    (1.2.43)
But if we use the usual differentiation rule, then we have
dφ(t, x(t)) = [∂φ/∂t(t, x(t)) + a(t, x(t)) ∂φ/∂x(t, x(t))] dt + σ(t, x(t)) ∂φ/∂x(t, x(t)) dη(t)    (1.2.44)
for the differential of the composite function φ (under the condition that x(t) satisfies Eq. (1.2.40) with the usual differential dη(t)).
The outlined computational difficulties disappear if we use the symmetrized form of stochastic integrals and equations. This has already been shown for integration when we calculated the integral (1.2.28). Let us show
that the usual formula (1.2.44) of composite function differentiation holds for the stochastic process x(t) defined by the symmetrized differential equation (that is, by Eq. (1.2.38) with ν = 1/2). The proof of this statement is indirect. Namely, we show that formula (1.2.44) for x(t), defined by the symmetrized stochastic equation
dx(t) = a(t, x(t)) dt + σ(t, x(t)) d_{1/2}η(t),
(1.2.45)
implies formula (1.2.43) for x(t) defined by the Ito equation (1.2.40). Indeed, it follows from (1.2.41) that the symmetrized equation equivalent to (1.2.40) has the form
dx(t) = [a − (σ/2) ∂σ/∂x] dt + σ d_{1/2}η(t)
(the arguments of a and cr are omitted). From this relation and (1.2.44) we
obtain the symmetrized stochastic differential
d_{1/2}φ(t, x(t)) = [∂φ/∂t + (a − (σ/2) ∂σ/∂x) ∂φ/∂x] dt + σ (∂φ/∂x) d_{1/2}η(t).    (1.2.46)

Now we note that (1.2.27) implies

Φ(t, x(t)) d_{1/2}η(t) = Φ(t, x(t)) d₀η(t) + (1/2) (∂Φ/∂x)(t, x(t)) σ(t, x(t)) dt.    (1.2.47)
By setting Φ = σ ∂φ/∂x in (1.2.47), we obtain the Ito stochastic differential (1.2.43) from (1.2.46) and (1.2.47).
Synthesis Problems for Control Systems
Now let us consider another problem, which is extremely important from the viewpoint of applications. This is the question whether our mathematical model is adequate to the actual process in the dynamic system with
random perturbations. One of the starting points in the theory of optimal control is the assumption that the equation of motion of a dynamic sys-
tem is given a priori (§1.1). Suppose that the corresponding equation has the form (1.2.2) or (1.2.3). We have already shown that one can construct infinitely many solutions of such equations by choosing one or other form
of stochastic integrals and differentials. Which solution from these infinitely many ones corresponds to the actual stochastic process in the system? Does this solution exist? The answers can be obtained only by analyzing specific physical premises that lead to Eqs. (1.2.2), (1.2.3). Such investigations were performed in [167, 173, 175, 181], whose basic results relative to Eqs. (1.2.2), (1.2.3) we state without details.
If we consider the solution x(t) of Eq. (1.2.3) as a continuous model for a stochastic discrete-time process x_k = x(kΔ), k = 0, 1, 2, …, which is computer simulated according to the formula

x_{k+1} = x_k + a(kΔ, x_k)Δ + σ(kΔ, x_k)ξ_{k+1}    (1.2.48)

(ξ_k, k = 1, 2, …, is a sequence of independent identically distributed Gaussian random variables with zero mean and variance Dξ_k = Δ), then as Δ → 0 the sequence x_k (under the linear interpolation with respect to t between the points t_k = kΔ) converges in probability to the solution x(t) of (1.2.3), provided that the latter is the Ito equation. If the motion of a dynamic system is given by (1.2.2) (stochastic equations of the form (1.2.2) are called Langevin equations [127]), where ξ(t) is a sufficiently wide-band stationary stochastic process (for example, the Gaussian Ornstein-Uhlenbeck process with the autocorrelation function R_ξ(τ) = (α/2) exp{−α|τ|} for large values of α), then the solution of (1.2.2) coincides with the solution of the symmetrized equation (1.2.3), that is, of (1.2.38) with ν = 1/2. In particular, each simulation of the Langevin equation (1.2.2) with an "actual" white noise by using analog computers gives
a symmetrized solution of Eq. (1.2.3) (see [37]).

In the present monograph, all stochastic equations given in Langevin form (1.2.2) with the white noise ξ(t) are understood in the symmetrized sense. In what follows, the symmetrized form of stochastic equations is used rather often, since it is the most convenient form for calculations related to transformations of random functions, changes of variables, etc. In this connection, we omit the index ν = 1/2 in the stochastic differential.
The subscript 0 in the differential d₀η(t) in Ito equations is used if and only if the Ito equation and the corresponding symmetrized equation have
different solutions. In other cases, just as in symmetrized equations, we write stochastic differentials without subscripts. Stochastic integrals and differentials that correspond to other values of ν [191, 192] are mainly of theoretical interest. We shall not consider them in what follows.
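The practical content of the convergence statement above can be tried out numerically. The sketch below is an illustration, not part of the original text: it implements the Euler scheme (1.2.48) together with a Heun-type predictor-corrector step that approximates the symmetrized (ν = 1/2) solution; the drift a(t, x), the diffusion σ(t, x), and all numerical values are user-supplied assumptions.

```python
import numpy as np

def euler_maruyama(a, sigma, x0, T, N, rng):
    """Scheme (1.2.48): x_{k+1} = x_k + a(k*dt, x_k)*dt + sigma(k*dt, x_k)*xi_{k+1},
    with xi_k i.i.d. Gaussian, mean 0, variance dt.  As dt -> 0 the broken
    line approximates the Ito solution."""
    dt = T / N
    x = np.empty(N + 1)
    x[0] = x0
    xi = rng.normal(0.0, np.sqrt(dt), size=N)   # D xi_k = dt
    for k in range(N):
        t = k * dt
        x[k + 1] = x[k] + a(t, x[k]) * dt + sigma(t, x[k]) * xi[k]
    return x

def heun_stratonovich(a, sigma, x0, T, N, rng):
    """Heun-type step: sigma is averaged between the two endpoints of the
    increment, which approximates the symmetrized (nu = 1/2) solution."""
    dt = T / N
    x = np.empty(N + 1)
    x[0] = x0
    xi = rng.normal(0.0, np.sqrt(dt), size=N)
    for k in range(N):
        t = k * dt
        xp = x[k] + a(t, x[k]) * dt + sigma(t, x[k]) * xi[k]   # predictor
        x[k + 1] = x[k] + a(t, x[k]) * dt \
                   + 0.5 * (sigma(t, x[k]) + sigma(t, xp)) * xi[k]
    return x
```

For linear noise σ(t, x) = σ₀x the two schemes approximate, as Δ → 0, processes whose drifts differ by the correction (1/2)σ ∂σ/∂x = σ₀²x/2, in agreement with the relation between the Ito and symmetrized forms discussed above.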
In conclusion, let us consider some possible generalizations of the results obtained. First we note that all above-mentioned facts for scalar equations
(1.2.2), (1.2.3) can readily be generalized to the multidimensional case, so that the form of (1.2.2), (1.2.3) is preserved, provided that x ∈ Rⁿ and ξ(t) (η(t)) are n-dimensional column vectors of phase coordinates and random functions, and the functions a and σ are an n-column and an n × n matrix. If necessary, the corresponding systems of equations can be written in more detail; for example, instead of (1.2.2), we can write
ẋᵢ = aᵢ(t, x) + σᵢⱼ(t, x)ξⱼ(t),    i = 1, …, n    (1.2.49)
(as usual, the summation is taken over repeated indices if any; that is, in (1.2.49) we have σᵢⱼξⱼ = Σⱼ₌₁ⁿ σᵢⱼξⱼ).

Systems (1.2.2) and (1.2.3) (or (1.2.49)) determine an n-dimensional Markov process x(t) with the vector of drift coefficients

Aᵢ(t, x) = aᵢ(t, x) + (1/2) (∂σᵢⱼ/∂xₖ)(t, x) σₖⱼ(t, x),    i = 1, …, n,    (1.2.50)
and the matrix of diffusion coefficients

B(t, x) = σ(t, x)σᵀ(t, x).    (1.2.51)
If the process x(t) is defined by the Ito equation (1.2.40), then, instead of (1.2.50) and (1.2.51), we have
A(t, x) = a(t, x),    B(t, x) = σ(t, x)σᵀ(t, x).    (1.2.52)
According to [173], stochastic equations of the more general form (1.2.1) can always be represented in the form (1.2.2). Indeed, as shown in [173], if the random functions ξ(t) in (1.2.1) have a small correlation time (for example, one can assume that ξ(t) is an n-vector of independent stochastic processes of Ornstein-Uhlenbeck type with a large parameter α), then Eq. (1.2.1) determines a Markov process with the following drift and diffusion coefficients:
Aᵢ(t, x) = E gᵢ(t, x, ξ(t)) + ∫₀^∞ K[(∂gᵢ/∂xⱼ)(t, x, ξ(t)), gⱼ(t + τ, x, ξ(t + τ))] dτ,    (1.2.53)

Bᵢⱼ(t, x) = ∫₋∞^{+∞} K[gᵢ(t, x, ξ(t)), gⱼ(t + τ, x, ξ(t + τ))] dτ    (1.2.54)
(here K[α, β] = E(α − Eα)(β − Eβ) denotes the covariance of the random variables α and β; moreover, the mean Egᵢ and the correlation functions in (1.2.53) and (1.2.54) are calculated under the assumption that the argument x is a nonrandom fixed vector). Since similar characteristics of the Markov process defined by (1.2.2) (or by (1.2.49)) have the form (1.2.50), (1.2.51), we can obtain the differential equation (1.2.2), which is stochastically equivalent to (1.2.1), by solving system (1.2.50), (1.2.51) with respect to the unknown variables aᵢ and σᵢⱼ:

σ(t, x)σᵀ(t, x) = B(t, x),    aᵢ = Aᵢ − (1/2) (∂σᵢⱼ/∂xₖ) σₖⱼ,    i = 1, …, n.
It follows from the preceding that to study Markov processes of diffusion type, without loss of generality, we can consider stochastic equations only in the form (1.2.2), (1.2.3), or (1.2.40). Therefore, the most general form of differential equations of motion of a controlled system with random perturbations ξ(t) of the white noise type is given by the equation
ẋ(t) = a(t, x(t), u(t)) + σ(t, x(t), u(t))ξ(t)    (1.2.55)

or by the equivalent equation

dx(t) = a(t, x(t), u(t)) dt + σ(t, x(t), u(t)) dη(t)    (1.2.56)

(in (1.2.55) ξ(t) is the standard white noise with the characteristics (1.1.34); in (1.2.56)

η(t) = ∫₀ᵗ ξ(τ) dτ,    η(0) = 0,

is the standard Wiener process). In (1.2.55) and (1.2.56), u = u(t) is understood as the control algorithm (1.1.2). The form of this algorithm can be found by solving the Bellman equation.
§1.3. Deterministic control problems. Formal scheme of the dynamic programming approach
The dynamic programming approach [14] was proposed by R. Bellman
in the fifties as a method for solving a wide range of problems relative to processes of multistage choice. In this section we briefly discuss the main idea of this method applied to synthesis problems for optimal feedback control systems [16, 17]. We begin with deterministic problems of optimal
control and pay the main attention to the algorithm of the method, that is, to the method for constructing the optimal control in the synthesis form.
Let us consider the control problem with free right endpoint of the trajectory, in which the plant is given by system (1.1.5)

ẋ(t) = g(t, x(t), u(t)),    x(0) = x₀,    0 ≤ t ≤ T,    (1.3.1)
the performance criterion is a functional of the form (1.1.11)

I(u) = ∫₀ᵀ c(t, x(t), u(t)) dt + ψ(x(T)),    (1.3.2)

and the control vector u may take values at each moment of time in a given bounded set U ⊂ Rʳ,

u(t) ∈ U.    (1.3.3)
In problem (1.3.1)-(1.3.3) the time interval [0, T] and the initial vector of phase variables x₀ are known; it is required to find the control function u_*(t), 0 ≤ t ≤ T, that minimizes the functional (1.3.2) and can be represented in the form

u_*(t) = φ_*(t, x(t)),    (1.3.4)
where the current values of the control vector are expressed in terms of the current values of the phase variables of system (1.3.1). The optimal control of the form (1.3.4) is called the optimal control in the synthesis form, and formula (1.3.4) itself is often called the algorithm of optimal control. The dynamic programming approach allows us to obtain the optimal control in the synthesis form (1.3.4) for problem (1.3.1)-(1.3.3) as follows. We write
F(t, x_t) = min_{u(τ)∈U, t≤τ≤T} [ ∫_t^T c(τ, x(τ), u(τ)) dτ + ψ(x(T)) ].    (1.3.5)
The function F(t, x_t), called later the loss function,⁶ plays an important role in the method of dynamic programming. This function is equal to the minimum value of the functional (1.3.2) provided that the control process is considered on the time interval [t, T], 0 ≤ t ≤ T, and the vector of phase variables is equal to x(t) = x_t at the beginning of this interval (that is, at time t). In (1.3.5) the minimum is calculated over all possible strategies u(τ) = φ(τ, x(τ)), t ≤ τ ≤ T, such that:

(a) these functions take values in an admissible set U;

(b) for any t ∈ [0, T] the Cauchy problem for system (1.3.1),

ẋ(τ) = g(τ, x(τ), φ(τ, x(τ))),    t ≤ τ ≤ T,    x(t) = x_t,

has a unique solution x(τ): t ≤ τ ≤ T.
The dynamic programming method is based on the Bellman optimality principle [14, 17], which implies that the loss function (1.3.5) satisfies the basic functional equation

F(t, x_t) = min_{u(σ)∈U, t≤σ≤t̄} [ ∫_t^t̄ c(σ, x(σ), u(σ)) dσ + F(t̄, x_t̄) ]    (1.3.6)
for all t̄ ∈ [t, T]. For different statements of the optimality principle and comments see [1, 16, 50]. However, here we do not discuss these statements, since to derive Eq. (1.3.6) it suffices to have the definition of the loss function (1.3.5) and to understand that this is a function of time and of the state x(t) = x_t of the controlled system (1.3.1) at time t (recall that the control process is terminated at a fixed time T). To derive Eq. (1.3.6), we write the integral in (1.3.5) as the sum ∫_t^T = ∫_t^t̄ + ∫_t̄^T of two integrals and write the minimum as the succession of minima

min_{u(τ)∈U, t≤τ≤T} = min_{u(σ)∈U, t≤σ<t̄} min_{u(ρ)∈U, t̄≤ρ≤T}.
Then we can write (1.3.5) as follows:

F(t, x_t) = min_{u(σ)∈U, t≤σ<t̄} min_{u(ρ)∈U, t̄≤ρ≤T} [ ∫_t^t̄ c(σ, x(σ), u(σ)) dσ + ∫_t̄^T c(ρ, x(ρ), u(ρ)) dρ + ψ(x(T)) ].    (1.3.7)
⁶The function (1.3.5) is also called a value function, a cost function, or the Bellman function.
Since, by (1.3.1), the control u(ρ) on the interval [t̄, T] does not affect the solution x(σ) of (1.3.1) on the preceding interval [t, t̄), formula (1.3.7) takes the form

F(t, x_t) = min_{u(σ)∈U, t≤σ<t̄} { ∫_t^t̄ c(σ, x(σ), u(σ)) dσ + min_{u(ρ)∈U, t̄≤ρ≤T} [ ∫_t̄^T c(ρ, x(ρ), u(ρ)) dρ + ψ(x(T)) ] }.    (1.3.8)
Now, since by (1.3.5) the second term in the braces in (1.3.8) is the loss function F(t̄, x_t̄), we finally obtain Eq. (1.3.6) from (1.3.8).
The basic functional equation (1.3.6) of the dynamic programming approach naturally allows us to derive a differential equation for the loss function F(t, x). To this end, in (1.3.6) we set t̄ = t + Δ, where Δ > 0 is small, and obtain

F(t, x_t) = min_{u(σ)∈U, t≤σ≤t+Δ} [ ∫_t^{t+Δ} c(σ, x(σ), u(σ)) dσ + F(t + Δ, x_{t+Δ}) ].    (1.3.9)
Since the solutions x(t) of system (1.3.1) are continuous, the increments (x_{t+Δ} − x_t) of the phase vector are small for admissible controls u(t) = φ(t, x(t)). Assuming that the loss function F(t, x) is continuously differentiable with respect to all its arguments, we can expand the function F(t + Δ, x_{t+Δ}) in the Taylor series about the point (t, x_t) as follows:

F(t + Δ, x_{t+Δ}) = F(t, x_t) + (∂F/∂t)(t, x_t)Δ + (x_{t+Δ} − x_t)ᵀ(∂F/∂x)(t, x_t) + o(Δ) + o(|x_{t+Δ} − x_t|).    (1.3.10)

In (1.3.10) ∂F/∂x denotes an n-column vector with components ∂F/∂xᵢ, i = 1, 2, …, n; therefore, the third term on the right-hand side of (1.3.10) is the scalar product of the vector of increments (x_{t+Δ} − x_t) and the gradient of the loss function:

(x_{t+Δ} − x_t)ᵀ(∂F/∂x) = Σᵢ₌₁ⁿ (x_{i,t+Δ} − x_{i,t}) ∂F/∂xᵢ;

the function o(Δ) denotes the terms whose order is higher than that of the infinitesimal Δ. It follows from (1.3.1) that for small Δ the increment of the phase vector x can be written in the form

x_{t+Δ} − x_t = g(t, x_t, u_t)Δ + o(Δ).    (1.3.11)
Writing the first term in the square brackets in (1.3.9) as

∫_t^{t+Δ} c(σ, x(σ), u(σ)) dσ = c(t, x_t, u_t)Δ + o(Δ),    (1.3.12)
substituting (1.3.10) and (1.3.12) into (1.3.9), and taking into account (1.3.11), we arrive at

F(t, x_t) = min_{u_t∈U} [ c(t, x_t, u_t)Δ + F(t, x_t) + (∂F/∂t)(t, x_t)Δ + gᵀ(t, x_t, u_t)(∂F/∂x)(t, x_t)Δ + o(Δ) ].    (1.3.13)

Note that only the first and the fourth terms on the right-hand side of (1.3.13) depend on the control u_t. Therefore, the minimum is calculated only over these terms; the other terms in the brackets can be ignored. Dividing (1.3.13) by Δ, passing to the limit as Δ → 0, and taking into account the fact that lim_{Δ→0} o(Δ)/Δ = 0, we obtain the following differential equation for the loss function F(t, x):
(∂F/∂t)(t, x) + min_{u∈U} [ c(t, x, u) + gᵀ(t, x, u)(∂F/∂x)(t, x) ] = 0    (1.3.14)
(here we omit the subscript t of the phase vector x_t and the control u_t). Note that the loss function F(t, x) satisfies Eq. (1.3.14) on the entire interval of control 0 ≤ t < T except at the endpoint t = T, where, in view of (1.3.5), the loss function satisfies the condition

F(T, x) = ψ(x).    (1.3.15)
The differential equation (1.3.14), called the Bellman equation, plays the central role in applications of the dynamic programming approach to the synthesis of feedback optimal control. The solution of the synthesis problem, that is, the optimal strategy or the control algorithm u_*(t) = φ_*(t, x) = φ_*(t, x(t)), can be found simultaneously with the solution of Eq. (1.3.14). Namely, suppose that we have somehow found the function F(t, x) that satisfies (1.3.14) and (1.3.15). Then the expression in the square brackets in (1.3.14) is a known function of t, x, and u. Calculating the minimum of this function with respect to u, we obtain the optimal control u_* = φ_*(t, x) (u_* determines the minimum point of this function in U ⊂ Rʳ).

If the functions c(t, x, u) and g(t, x, u) and the set of admissible controls U allow us to minimize the function in the square brackets explicitly, then the optimal control can be written in the form

u_* = φ₀(t, x, (∂F/∂x)(t, x)),    (1.3.16)
where ∂F/∂x is the vector of partial derivatives yet unknown; when we minimize the function in the square brackets in (1.3.14), we assume that this vector is given. Using (1.3.16) and denoting

min_{u∈U} [ c(t, x, u) + gᵀ(t, x, u)(∂F/∂x)(t, x) ] = Φ(t, x, ∂F/∂x),    (1.3.17)
we write (1.3.14) without the symbol "min" as follows:

(∂F/∂t)(t, x) + Φ(t, x, (∂F/∂x)(t, x)) = 0,    0 ≤ t < T.    (1.3.18)
To complete the synthesis problem, it is necessary to solve (1.3.18) with regard to (1.3.15), that is, to find the function F(t, x) that satisfies (1.3.18) for 0 ≤ t < T and continuously tends to the given function ψ(x) as t → T, and to substitute the function F(t, x) obtained into (1.3.16). In practice, the main difficulty in this synthesis procedure is related to solving the Bellman equation (1.3.14) or (1.3.18), which is a first-order partial differential equation. The main distinguishing feature of the Bellman equation is that it is nonlinear because of the symbol "min" in (1.3.14), which shows that the function Φ in (1.3.18) depends nonlinearly on the components of the vector of partial derivatives ∂F/∂x. The character of this nonlinearity is determined by the form of the functions c(t, x, u) and g(t, x, u), as well as by the set of admissible controls U. Let us consider some typical illustrative examples.
1°. Suppose that c(t, x, u) = c₁(t, x) + uᵀP(t, x)u, where P is a symmetric r × r matrix positive definite for all x ∈ Rⁿ and t ∈ [0, T], g(t, x, u) = a(t, x) + Q(t, x)u (a is an n-vector and Q is an n × r matrix), and the control u is unbounded (that is, U = Rʳ). Then the expression in the square brackets in (1.3.14) takes the form

[·] = c₁(t, x) + aᵀ(t, x)(∂F/∂x) + uᵀQᵀ(t, x)(∂F/∂x) + uᵀP(t, x)u.    (1.3.19)
By differentiating this function with respect to u and solving the system ∂[·]/∂u = 0, we obtain

u_* = −(1/2) P⁻¹(t, x) Qᵀ(t, x) (∂F/∂x)(t, x)    (1.3.20)
(the matrix P⁻¹ is the inverse of P). Substituting (1.3.20) into (1.3.19) instead of u, we obtain

Φ(t, x, ∂F/∂x) = c₁(t, x) + aᵀ(t, x)(∂F/∂x) − (1/4)(∂F/∂xᵀ) Q(t, x) P⁻¹(t, x) Qᵀ(t, x)(∂F/∂x).    (1.3.21)
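Formula (1.3.20) is the minimizer of a convex quadratic function of u, and it can be spot-checked numerically. In the sketch below the gradient ∂F/∂x is modeled by an arbitrary fixed vector p, and all dimensions and data are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 3, 2
Q = rng.standard_normal((n, r))          # the matrix Q(t, x) at a frozen point
M = rng.standard_normal((r, r))
P = M @ M.T + r * np.eye(r)              # a symmetric positive definite penalty P
p = rng.standard_normal(n)               # stands for the gradient dF/dx

def bracket(u):
    # the u-dependent part of (1.3.19): u^T Q^T p + u^T P u
    return u @ Q.T @ p + u @ P @ u

u_star = -0.5 * np.linalg.solve(P, Q.T @ p)     # formula (1.3.20)

# any perturbation of u_star can only increase the bracket
for _ in range(100):
    v = u_star + 0.01 * rng.standard_normal(r)
    assert bracket(v) >= bracket(u_star) - 1e-12
```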
2°. Suppose that c(t, x, u) = c₁(t, x), g(t, x, u) = a(t, x) + Q(t, x)u, and the domain U is an r-dimensional parallelepiped, that is, |uᵢ| ≤ u₀ᵢ, i = 1, …, r, where the numbers u₀ᵢ > 0 are given. One can readily see that in this case

φ₀ = −{u₀₁, …, u₀ᵣ} sign(Qᵀ(t, x)(∂F/∂x)),
Φ(t, x, ∂F/∂x) = c₁(t, x) + aᵀ(t, x)(∂F/∂x) − |∂F/∂xᵀ Q(t, x)| (u₀₁, …, u₀ᵣ)ᵀ,    (1.3.22)

where sign A and |A| are matrices obtained from A by replacing each of its elements aᵢⱼ by sign aᵢⱼ and |aᵢⱼ|, respectively; {u₀₁, …, u₀ᵣ} denotes the diagonal r × r matrix with u₀₁, …, u₀ᵣ on its principal diagonal.

3°. Let the functions c(·) and g(·) be the same as in 2°; for the domain U, instead of a parallelepiped, we take an r-dimensional ball of radius R₀ centered at the origin. Then, instead of (1.3.22), we obtain the following expressions for the functions φ₀ and Φ:
φ₀ = −R₀ Qᵀ(t, x)(∂F/∂x) / [∂F/∂xᵀ Q(t, x)Qᵀ(t, x)(∂F/∂x)]^{1/2},
Φ(t, x, ∂F/∂x) = c₁(t, x) + aᵀ(t, x)(∂F/∂x) − R₀ [∂F/∂xᵀ Q(t, x)Qᵀ(t, x)(∂F/∂x)]^{1/2}.    (1.3.23)
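Both constrained minimizers, (1.3.22) for the parallelepiped and (1.3.23) for the ball, simply minimize the linear form uᵀQᵀ(∂F/∂x) over the respective admissible set. A numerical spot-check (illustrative data only; the gradient ∂F/∂x is again modeled by a random vector p):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 3, 2
Q = rng.standard_normal((n, r))
p = rng.standard_normal(n)          # stands for the gradient dF/dx
q = Q.T @ p                         # the linear form u^T q is to be minimized

# box |u_i| <= u0_i, as in (1.3.22): u*_i = -u0_i * sign(q_i)
u0 = np.array([0.5, 1.5])
u_box = -u0 * np.sign(q)

# ball |u| <= R0, as in (1.3.23): u* = -R0 * q / |q|
R0 = 2.0
u_ball = -R0 * q / np.linalg.norm(q)

# random admissible controls never do better than the closed-form minimizers
for _ in range(1000):
    v = rng.uniform(-1.0, 1.0, r) * u0                        # a point of the box
    assert v @ q >= u_box @ q - 1e-12
    w = rng.standard_normal(r)
    w *= R0 * rng.uniform() ** (1.0 / r) / np.linalg.norm(w)  # a point of the ball
    assert w @ q >= u_ball @ q - 1e-12
```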
Note that in (1.3.23) and in the following, ∂F/∂xᵀ denotes an n-row vector with components ∂F/∂xᵢ, i = 1, …, n. Therefore, the function (∂F/∂xᵀ)QQᵀ(∂F/∂x) is a quadratic form in the components of the gradient vector of the loss function, and the matrix QQᵀ is its kernel.

As a rule, the nonlinear character of Bellman equations does not allow one to solve these equations (and the synthesis problem) explicitly. There is only one exception, namely, the so-called linear-quadratic problems of optimal control (LQ-problems). In this case the differential equations (1.3.1) of the plant are linear:
ẋ(t) = A(t)x(t) + B(t)u(t)
(here A(t) and B(t) are given n × n and n × r matrices), the penalty functions c(t, x, u) and ψ(x) in the optimality criterion (1.3.2) are linear-quadratic forms of the phase variables x and controls u, and there are no restrictions on the domain of admissible controls (that is, U = Rʳ in (1.3.3)).

Let us solve the synthesis problem for the simplest one-dimensional LQ-problem with constant coefficients; in this case, the solution of the Bellman equation and the optimal control can be obtained as finite analytic formulas. Suppose that the plant is described by the scalar differential equation
ẋ = ax + bu,
(1.3.24)
and the optimality criterion has the form

I(u) = c₁x²(T) + ∫₀ᵀ [cx²(t) + hu²(t)] dt    (1.3.25)
(c₁ > 0, c > 0, T > 0, h > 0, and a and b in (1.3.24) and (1.3.25) are given constant numbers). The Bellman equation (1.3.14) and the boundary condition (1.3.15) for problem (1.3.24), (1.3.25) have the form
(∂F/∂t)(t, x) + min_u [ cx² + hu² + (ax + bu)(∂F/∂x)(t, x) ] = 0,    (1.3.26)

F(T, x) = c₁x².    (1.3.27)
The expression in the square brackets in (1.3.26), considered as a function of u, is a quadratic trinomial. Since h > 0, this trinomial has the single minimum point

u_* = −(b/2h)(∂F/∂x)(t, x),    (1.3.28)

which can readily be obtained from the relation ∂[·]/∂u = 0 (a necessary condition for an extremum). Substituting u_* into (1.3.26) instead of u and omitting the symbol "min", we rewrite the Bellman equation in the form
(∂F/∂t) + cx² + ax(∂F/∂x) − (b²/4h)(∂F/∂x)² = 0,    0 ≤ t < T.    (1.3.29)
We shall seek the loss function F(t,x) satisfying Eq. (1.3.29) and the boundary condition (1.3.27) in the form
F(t, x) = p(t)x2,
(1.3.30)
Synthesis Problems for Control Systems
55
where p(t) is the desired function of time. If we substitute (1.3.30) into (1.3.29), then we see that p(t) must satisfy the ordinary differential equation
ṗ + c + 2ap − (b²/h)p² = 0
(1.3.31)
for 0 ≤ t ≤ T. Moreover, it follows from (1.3.27) and (1.3.30) that the function p(t) assumes a given value at the right endpoint of the control interval:
p(T) = c₁.
(1.3.32)
Equation (1.3.31) can readily be integrated by separation of variables. The boundary condition (1.3.32) determines the unique solution of (1.3.31). Performing the necessary calculations, we obtain the following function p(t) that satisfies Eq. (1.3.31) and the boundary condition (1.3.32):
p(t) = h { (β + a)[b²c₁ + (β − a)h] + (β − a)[b²c₁ − (β + a)h] e^{−2β(T−t)} } / ( b² { [b²c₁ + (β − a)h] − [b²c₁ − (β + a)h] e^{−2β(T−t)} } ),    β = (a² + b²c/h)^{1/2}.    (1.3.33)
Thus it follows from (1.3.28) and (1.3.30) that the optimal control in the synthesis form for problem (1.3.24), (1.3.25) has the form

u_* = −(b/h) p(t) x,    (1.3.34)
where p(t) is determined by (1.3.33). Note that problem (1.3.24), (1.3.25) is one of the few optimal control problems for which the Bellman equation can be solved exactly. In Chapter II we consider some other examples of exact solutions to synthesis problems of optimal control (for deterministic and stochastic control systems). However, the majority of optimal control problems cannot be solved exactly. In these cases, one usually employs the approximate and numerical synthesis methods considered in Chapters III-VII.

We complete this section with some remarks. First we note that we have considered only a formal scheme or, as is sometimes said, the algorithmic essence of the dynamic programming approach.
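Before turning to these remarks, it is worth noting that the closed-form solution (1.3.33)-(1.3.34) of the LQ-problem can be verified against a direct backward numerical integration of the Riccati equation (1.3.31); the coefficient values in this sketch are illustrative assumptions:

```python
import numpy as np

a, b, c, c1, h, T = -0.4, 1.0, 2.0, 0.5, 1.0, 3.0
beta = np.sqrt(a * a + b * b * c / h)

def p_exact(t):
    # closed-form solution (1.3.33) of the Riccati equation (1.3.31), p(T) = c1
    E = np.exp(-2 * beta * (T - t))
    num = (beta + a) * (b * b * c1 + (beta - a) * h) \
        + (beta - a) * (b * b * c1 - (beta + a) * h) * E
    den = (b * b * c1 + (beta - a) * h) - (b * b * c1 - (beta + a) * h) * E
    return h * num / (b * b * den)

# integrate p' = (b^2/h) p^2 - 2 a p - c backward from p(T) = c1 by RK4
f = lambda p: (b * b / h) * p * p - 2 * a * p - c
N = 2000
dt = T / N
p = c1
for _ in range(N):                      # one step from t down to t - dt
    k1 = f(p)
    k2 = f(p - 0.5 * dt * k1)
    k3 = f(p - 0.5 * dt * k2)
    k4 = f(p - dt * k3)
    p -= dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

assert abs(p - p_exact(0.0)) < 1e-6     # the two solutions agree at t = 0
```

For a long horizon T − t the function p(t) approaches the positive root h(a + β)/b² of the stationary equation, i.e. the familiar stationary Riccati gain.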
The described method for constructing an optimal control in the synthesis form (1.3.4) is justified by some assumptions, which are sometimes violated. We need to take into account the following.

(1) The loss function F(t, x) determined by (1.3.5) is not always differentiable even if the penalty functions c(t, x, u) and ψ(x) are sufficiently
smooth (or even analytic) functions. It is well known that, for this reason, the dynamic programming approach cannot be used for solving many time-optimal control problems [50, 156].

(2) Even in the case where the loss function F(t, x) satisfies the Bellman equation (1.3.14), the control u_*(t, x) minimizing the function in the square brackets in (1.3.14) may not be admissible. In particular, this control can violate the existence and uniqueness conditions for the solution of the Cauchy problem for system (1.3.1).

(3) The Bellman equation (1.3.14) (or (1.3.18)) with the boundary condition (1.3.15) can have nonunique solutions. Nevertheless, we have the following theorem [1].
THEOREM. Suppose that there exists a unique continuously differentiable solution F₀(t, x) of Eq. (1.3.14) with boundary condition (1.3.15) and there exists an admissible control u_*(t, x) such that

min_{u∈U} [ c(t, x, u) + gᵀ(t, x, u)(∂F₀/∂x)(t, x) ] = c(t, x, u_*) + gᵀ(t, x, u_*)(∂F₀/∂x)(t, x).

Then the control u_*(t, x) in the synthesis form is optimal, and the function F₀(t, x) coincides with the loss function (1.3.5).
In conclusion, we point out another fact concerning the dynamic programming approach. The matter is that this method can be used for solving problems of optimal control for which the optimal control u_*(t, x) does not exist. For example, such situations appear when the domain of admissible controls U in (1.3.3) is an open set. The absence of an optimal control does not prevent us from deriving the basic equations of the dynamic programming approach. It suffices only to modify the definition of the loss function (1.3.5). So, if we define the function F(t, x_t) as the greatest lower bound of the functional in the square brackets in (1.3.5),
F(t, x_t) = inf_{u(τ)∈U, t≤τ≤T} [ ∫_t^T c(τ, x(τ), u(τ)) dτ + ψ(x(T)) ],    (1.3.35)
then one can readily see that the function (1.3.35) satisfies the equations
F(t, x_t) = inf_{u(σ)∈U, t≤σ≤t̄} [ ∫_t^t̄ c(σ, x(σ), u(σ)) dσ + F(t̄, x_t̄) ],    (1.3.36)
(∂F/∂t)(t, x) + inf_{u∈U} [ c(t, x, u) + gᵀ(t, x, u)(∂F/∂x)(t, x) ] = 0,    (1.3.37)
which are similar to Eqs. (1.3.6) and (1.3.14). However, in this case the function u_*(t, x) realizing the infimum of the function in the square brackets in (1.3.37) may not exist.

Nevertheless, the absence of an optimal control u_*(t, x) is of no fundamental importance in applications of the dynamic programming approach, since if the lower bound in (1.3.37) is not attainable, one can always construct the so-called ε-optimal strategy u_ε(t, x). If this strategy is used in system (1.3.1), then the performance functional (1.3.2) attains the value I(u_ε) = F(0, x₀) + ε, where ε is a given positive number. Obviously, to construct an actual control system, it suffices to know the ε-optimal strategy u_ε(t, x) for a small ε. Here we do not describe methods for constructing ε-optimal strategies. First, these methods are considered in detail in the literature (see, for example, [113, 137]). Second (and this is the main point), the optimal control always exists in all special problems studied in Chapters II-VII. This is the reason that, from the very beginning, in the definition of the loss function (1.3.5) we use the symbol "min" instead of the more general symbol "inf".

§1.4. The Bellman equations for Markov controlled processes
The dynamic programming approach is widely used for solving stochastic problems of optimal control. In this section we consider the control
problems in which the controlled process is a Markov stochastic process. It follows from the definition of the Markov processes given in §1.1 that the probabilities of future states of a controlled system are completely determined by the current states of the vector of phase variables, which are assumed to be known at any time t.
FIG. 10

One can readily see that the servomechanism shown in Fig. 10 possesses the listed Markov properties if the following conditions are satisfied:

(1) the joint vector (y(t), x(t)) of instant values that define the input actions and output variables is treated as the phase vector of the system;
(2) the input action y(t) is a Markov stochastic process;

(3) the random perturbation ξ(t) is a white noise type process;

(4) the controller C is a noninertial device that forms the current values of the control actions u(t) according to the rule

u(t) = φ(t, x(t), y(t)).    (1.4.1)
Actually, if the plant P is described by equations of the form (1.2.55) and y(t) is a Markov process with known probability characteristics, then it follows from (1.2.55) and (1.4.1) that the joint vector (x(t), y(t)) is a Markov process. In particular, if y(t) is a diffusion process with drift coefficient A_y(t, y) and diffusion coefficient B_y(t, y), then it follows from (1.2.39), (1.2.55), and (1.4.1) that this joint vector satisfies a system of stochastic differential equations of the form (1.2.2), that is, it is a diffusion Markov process.
In this section we deal only with systems of the type shown in Fig. 10. In §1.5 we consider the possibilities of applying the dynamic programming approach in a more general situation with a non-Markov controlled process (Fig. 3). Later we shall derive the Bellman equations for various stochastic problems of optimal control that are studied in Chapters II-VII. These problems were stated in §1.1.

1.4.1. Basic problem. Optimal tracking of a diffusion process. As the basic problem we consider the synthesis of the optimal servomechanism shown in Fig. 10 under the following conditions:

(i) the controlled plant P is described by a system of stochastic differential equations of the form

ẋ(t) = a(t, x(t), u(t)) + σ(t, x(t))ξ(t),    x(0) = x₀,    0 ≤ t ≤ T,    (1.4.2)
where x ∈ Rⁿ is the vector of controlled output variables, u ∈ Rʳ is the vector of control actions, ξ(t) is the n-dimensional standard white noise with characteristics (1.1.34), a and σ are a given vector-function and a given matrix, and the initial vector x(0) = x₀ and the time interval [0, T] are specified;

(ii) the optimal control is sought in the form (1.4.1), and the goal of control is to minimize the functional

I(u) = E[ ∫₀ᵀ c(x(t), y(t), u(t)) dt + ψ(x(T), y(T)) ];    (1.4.3)

(iii) the restrictions on admissible controls have the form

u(t) ∈ U,    (1.4.4)
where U is a given bounded closed subset of the space Rʳ;

(iv) the input stochastic process y(t) is independent of ξ(t) and is an m-dimensional diffusion Markov process with a known vector A_y(t, y) of drift coefficients and a known matrix B_y(t, y) of diffusion coefficients;⁷

(v) there are no restrictions on the phase variables, that is, on the components of the vector (x, y) ∈ R^{n+m}; the current values of the components of this joint vector can be measured precisely at any instant of time t ∈ [0, T].

By analogy with (1.3.5) we define the loss function F(t, x_t, y_t) for problem (i)-(v) as follows:

F(t, x_t, y_t) = min_{u(τ)∈U, t≤τ≤T} E[ ∫_t^T c(x(τ), y(τ), u(τ)) dτ + ψ(x(T), y(T)) | x(t) = x_t, y(t) = y_t ].    (1.4.5)
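For a fixed admissible strategy φ, the conditional expectation in (1.4.5) can be estimated by straightforward Monte Carlo simulation of discretized sample paths in the spirit of scheme (1.2.48). The sketch below is an illustration only; every function argument (drifts, diffusions, penalties, strategy) is a user-supplied assumption, and scalar x and y are taken for simplicity:

```python
import numpy as np

def estimate_loss(phi, a, sig, ay, sigy, c, psi, x0, y0, T, N, n_paths, seed=0):
    """Monte Carlo estimate of F_phi(0, x0, y0) in (1.4.5): average the cost
    functional over sample paths of a scalar plant of type (1.4.2) and of the
    input process driven by independent discretized white noises, u = phi(t, x, y)."""
    rng = np.random.default_rng(seed)
    dt = T / N
    total = 0.0
    for _ in range(n_paths):
        x, y, J = x0, y0, 0.0
        for k in range(N):
            t = k * dt
            u = phi(t, x, y)
            J += c(x, y, u) * dt                                  # running cost
            x += a(t, x, u) * dt + sig(t, x) * rng.normal(0.0, np.sqrt(dt))
            y += ay(t, y) * dt + sigy(t, y) * rng.normal(0.0, np.sqrt(dt))
        total += J + psi(x, y)                                    # terminal cost
    return total / n_paths
```

Minimizing such estimates over a family of strategies φ is, of course, far less efficient than solving the Bellman equation, but it gives a direct numerical reading of F_φ for any candidate control law.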
The loss function (1.4.5) for stochastic problem (i)-(v) differs from the loss function (1.3.5) in the deterministic case by the additional operation of averaging the functional in the square brackets in (1.4.5). The averaging in (1.4.5) is performed over the set of sample paths x_t^T = [x(τ): t ≤ τ ≤ T], y_t^T = [y(τ): t ≤ τ ≤ T] that on the interval [t, T] satisfy the stochastic differential equations (1.4.2) and (*) (see the footnote) with initial conditions x(t) = x_t, y(t) = y_t and control function u(τ) = φ(τ, x(τ), y(τ)), t ≤ τ ≤ T. Since the process (x(τ), y(τ)) is Markov, the result of averaging F_φ(t, x_t, y_t) = E[·] in (1.4.5) is uniquely determined by the time moment t, by the state vector (x_t, y_t) of the system at this moment, and by the chosen algorithm of control, that is, by the vector-function φ(·) in (1.4.1). Therefore, it turns out that the loss function (1.4.5) obtained by minimizing F_φ(t, x_t, y_t) over all admissible controls⁸ (that is, over all admissible vector-functions φ(·)) depends only on the time t and the state (x_t, y_t) of the servomechanism (Fig. 10) at this time moment.

⁷As was shown in §1.2, the coefficients A_y(t, y) and B_y(t, y) uniquely determine the system of stochastic differential equations
ẏ(t) = a_y(t, y(t)) + σ_y(t, y(t))η(t),    (*)

whose solutions are sample paths of the Markov process y(t); in (*) η(t) denotes the standard white noise (1.1.34) independent of ξ(t).

⁸Just as in the deterministic case (§1.3), the control in the form (1.4.1) is called admissible if (i) for all t ∈ [0, T), x ∈ Rⁿ, and y ∈ Rᵐ, the vector-function
One can readily see that, for any t̄ ∈ [t, T], the loss function (1.4.5) satisfies the equation

F(t, x_t, y_t) = min_{u(τ)∈U, t≤τ≤t̄} E[ ∫_t^t̄ c(x(τ), y(τ), u(τ)) dτ + F(t̄, x_t̄, y_t̄) ],    (1.4.6)
which is a stochastic generalization of the functional equation (1.3.6). The averaging in (1.4.6) is performed over the sample paths x_t^t̄ and y_t^t̄, and the symbol E[·] in (1.4.6) indicates the conditional expectation E_{x_t^t̄, y_t^t̄ | x_t, y_t}[·].

To prove (1.4.6), we write E_{x_t^T, y_t^T | x_t, y_t}(·) for the conditional expectation of a functional of the phase trajectories denoted by (·). Here we average over all possible sample paths x_t^T = [x(τ): t ≤ τ ≤ T], y_t^T = [y(τ): t ≤ τ ≤ T] issuing from the point (x_t, y_t). Then, writing the integral in (1.4.5) as the sum ∫_t^T = ∫_t^t̄ + ∫_t̄^T of two integrals and writing the minimum as the succession of minima

min_{u(τ)∈U, t≤τ≤T} = min_{u(σ)∈U, t≤σ<t̄} min_{u(ρ)∈U, t̄≤ρ≤T},
we can rewrite (1.4.5) as

F(t, x_t, y_t) = min_{u(σ)∈U, t≤σ<t̄} min_{u(ρ)∈U, t̄≤ρ≤T} E_{x_t^T, y_t^T | x_t, y_t} [ ∫_t^t̄ c(x(σ), y(σ), u(σ)) dσ + ∫_t̄^T c(x(ρ), y(ρ), u(ρ)) dρ + ψ(x(T), y(T)) ].    (1.4.8)

It follows from (1.4.1) and (1.4.2) that the controls u(ρ) on the time interval t̄ ≤ ρ ≤ T do not affect the stochastic process (x(σ), y(σ)) on the preceding interval t ≤ σ < t̄. Therefore, representing the expectation in (1.4.8) as the iterated conditional expectation E_{x_t^t̄, y_t^t̄ | x_t, y_t} E_{x_t̄^T, y_t̄^T | x_t^t̄, y_t^t̄}[·],
Synthesis Problems for Control Systems
we can rewrite (1.4.8) in the form

$$F(t, x_t, y_t) = \min_{u(\sigma)} E_{x_t^{\bar t},\, y_t^{\bar t} \mid x_t, y_t}\Big\{\int_t^{\bar t} c\big(x(\sigma), y(\sigma), u(\sigma)\big)\,d\sigma + \min_{u(\rho)} E_{x_{\bar t}^T,\, y_{\bar t}^T \mid x_t^{\bar t},\, y_t^{\bar t}}\Big[\int_{\bar t}^T c\big(x(\rho), y(\rho), u(\rho)\big)\,d\rho + \psi\big(x(T), y(T)\big)\Big]\Big\}.\qquad (1.4.9)$$
Since the process (x(t), y(t)) is Markov, the result of averaging in the second term in the braces in (1.4.9) depends only on the terminal state (x_{t̄}, y_{t̄}) of a fixed sample path (x_t^{t̄}, y_t^{t̄}). Thus, replacing E_{x_{t̄}^T, y_{t̄}^T | x_t^{t̄}, y_t^{t̄}} by E_{x_{t̄}^T, y_{t̄}^T | x_{t̄}, y_{t̄}} and taking into account the fact that, by (1.4.5), the second term in (1.4.9) is the loss function F(t̄, x_{t̄}, y_{t̄}), we finally obtain the functional equation (1.4.6) from
(1.4.9).

Just as in the deterministic case, the functional equation (1.4.6) allows us to obtain a differential equation for the loss function F(t, x, y). By setting t̄ = t + Δ, we rewrite (1.4.6) in the form

$$F(t, x_t, y_t) = \min_{\substack{u(\tau)\in U\\ t\le\tau\le t+\Delta}} E\Big[\int_t^{t+\Delta} c\big(x(\tau), y(\tau), u(\tau)\big)\,d\tau + F(t + \Delta,\, x_{t+\Delta},\, y_{t+\Delta})\Big].\qquad (1.4.10)$$
Assuming that Δ > 0 is small and the penalty function c(x, y, u) is continuous in its arguments, and having in mind that the diffusion processes x(τ) and y(τ) are continuous, we can represent the first term in the square brackets in (1.4.10) as

$$\int_t^{t+\Delta} c\big(x(\tau), y(\tau), u(\tau)\big)\,d\tau = c(x_t, y_t, u_t)\,\Delta + o(\Delta),\qquad (1.4.11)$$
where, as usual, the function o(Δ) denotes infinitesimals of higher order than that of Δ. Now we assume that the loss function F(t, x, y) has continuous derivatives with respect to t and continuous second-order derivatives with respect to the phase variables x and y. Then for small Δ we can expand the function
F(t + Δ, x_{t+Δ}, y_{t+Δ}) in the Taylor series

$$\begin{aligned} F(t+\Delta,\, x_{t+\Delta},\, y_{t+\Delta}) &= F(t, x_t, y_t) + \Delta\,\frac{\partial F}{\partial t} + (x_{t+\Delta}-x_t)^T\frac{\partial F}{\partial x} + (y_{t+\Delta}-y_t)^T\frac{\partial F}{\partial y}\\ &\quad + \frac12\,(x_{t+\Delta}-x_t)^T\frac{\partial^2 F}{\partial x\,\partial x^T}\,(x_{t+\Delta}-x_t) + (x_{t+\Delta}-x_t)^T\frac{\partial^2 F}{\partial x\,\partial y^T}\,(y_{t+\Delta}-y_t)\\ &\quad + \frac12\,(y_{t+\Delta}-y_t)^T\frac{\partial^2 F}{\partial y\,\partial y^T}\,(y_{t+\Delta}-y_t) + o(\Delta) + o(|x_{t+\Delta}-x_t|^2) + o(|y_{t+\Delta}-y_t|^2).\end{aligned}\qquad (1.4.12)$$
Here all derivatives of the loss function are calculated at the point (t, x_t, y_t); as usual, ∂F/∂x and ∂F/∂y denote the n- and m-column-vectors of partial derivatives of the loss function with respect to the components of the vectors x and y, respectively; ∂²F/∂x∂x^T, ∂²F/∂x∂y^T, and ∂²F/∂y∂y^T denote the n × n, n × m, and m × m matrices of second derivatives.
To obtain the desired differential equation for F(t, x, y), we substitute (1.4.11) and (1.4.12) into (1.4.10), average, and pass to the limit as Δ → 0. Note that if we average expressions containing the random increments (x_{t+Δ} − x_t) and (y_{t+Δ} − y_t), then all derivatives of F in (1.4.12) are considered as constants, since they depend on (t, x_t, y_t) and the mathematical expectation in (1.4.10) is calculated under the assumption that the values of x_t and y_t are known and fixed. The mean values of the increments (x_{t+Δ} − x_t) can be calculated by integrating Eqs. (1.4.2). However, we can avoid this calculation if we use the results discussed in §1.2. Indeed, if, just as in (1.4.11), we assume that the control u(τ) is fixed and constant, u(τ) = u_t, then we see that for t ≤ τ ≤ t + Δ, Eq. (1.4.2) determines a Markov process x(τ) such that we can write (see (1.1.54))
$$E(x_{t+\Delta} - x_t) = A^x(t, x_t, u_t)\,\Delta + o(\Delta),\qquad (1.4.13)$$
where A^x(t, x_t, u_t) is the vector of drift coefficients of this process. But since (for a fixed u(t) = u_t) Eq. (1.4.2) is similar to (1.2.2), it follows from (1.2.50) that the components of this vector have the form⁹

$$A_i^x(t, x_t, u_t) = a_i(t, x_t, u_t) + \frac12 \sum_{j,k} \frac{\partial\sigma_{ik}(t, x_t)}{\partial x_j}\,\sigma_{jk}(t, x_t).\qquad (1.4.14)$$

⁹ Recall that formula (1.4.14) holds for the symmetrized stochastic differential equation (1.4.2). But if (1.4.2) is an Ito equation, then we have A^x(t, x_t, u_t) = a(t, x_t, u_t) instead of (1.4.14).
In a similar way, (1.4.2), (1.1.50), and (1.2.52) imply

$$E(x_{t+\Delta} - x_t)(x_{t+\Delta} - x_t)^T = B^x(t, x_t)\,\Delta + o(\Delta),\qquad (1.4.15)$$

where

$$B^x(t, x_t) = \sigma(t, x_t)\,\sigma^T(t, x_t).\qquad (1.4.16)$$
The other mean values in (1.4.12) can be expressed in terms of the input Markov process y(t) as follows:

$$E(y_{t+\Delta} - y_t) = A^y(t, y_t)\,\Delta + o(\Delta),\qquad (1.4.17)$$
$$E(y_{t+\Delta} - y_t)(y_{t+\Delta} - y_t)^T = B^y(t, y_t)\,\Delta + o(\Delta).\qquad (1.4.18)$$

Finally, since the stochastic processes y(t) and ξ(t) are independent, we have

$$E(x_{t+\Delta} - x_t)(y_{t+\Delta} - y_t)^T = o(\Delta).\qquad (1.4.19)$$
Taking into account (1.4.13)-(1.4.19), we substitute (1.4.11) and (1.4.12) into (1.4.10) and rewrite the resulting expression as follows:

$$F(t, x_t, y_t) = \min_{u_t\in U}\Big\{ F(t, x_t, y_t) + \Delta\Big[ c(x_t, y_t, u_t) + \frac{\partial F}{\partial t} + (A^x)^T\frac{\partial F}{\partial x} + (A^y)^T\frac{\partial F}{\partial y} + \frac12\,\mathrm{Sp}\Big(B^x\frac{\partial^2 F}{\partial x\,\partial x^T}\Big) + \frac12\,\mathrm{Sp}\Big(B^y\frac{\partial^2 F}{\partial y\,\partial y^T}\Big)\Big] + o(\Delta)\Big\}.\qquad (1.4.20)$$
For brevity, in (1.4.20) we omit the arguments (t, x_t, y_t) of all partial derivatives of F and denote the trace of the matrix A = ‖a_ij‖₁ⁿ by Sp A = a₁₁ + a₂₂ + ⋯ + a_nn. By analogy with Eq. (1.3.14), we divide (1.4.20) by Δ, pass to the limit as Δ → 0, and obtain the following Bellman differential equation for the
loss function F = F(t, x, y):

$$\frac{\partial F}{\partial t} + \big(A^y(t,y)\big)^T\frac{\partial F}{\partial y} + \frac12\,\mathrm{Sp}\Big(B^y(t,y)\frac{\partial^2 F}{\partial y\,\partial y^T}\Big) + \frac12\,\mathrm{Sp}\Big(B^x(t,x)\frac{\partial^2 F}{\partial x\,\partial x^T}\Big) + \min_{u\in U}\Big[c(x,y,u) + \big(A^x(t,x,u)\big)^T\frac{\partial F}{\partial x}\Big] = 0.\qquad (1.4.21)$$
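The passage to the limit that produced (1.4.21) can also be run in the opposite direction as a numerical method: fix a small Δ, discretize the state, replace the diffusion increment by the two-point variable ±σ√Δ (which has the correct first two moments), and iterate the functional equation (1.4.10) backward from the terminal condition. The sketch below is an illustration only — the scalar plant ẋ = u + σξ, the penalty c(x, u) = x² + u², the zero terminal cost, and the finite control set are all invented for the example.

```python
import math

def backward_induction(xs, us, sigma, dt, n_steps, cost, terminal):
    """Approximate the loss function F(t, x) by backward dynamic programming:
    F(t, x) = min_u [ c(x, u) dt + E F(t + dt, x + u dt + sigma*sqrt(dt)*xi) ],
    where xi = +1 or -1 with probability 1/2 (a two-point model of the noise)."""
    h = xs[1] - xs[0]

    def interp(F, x):              # piecewise-linear interpolation, clamped at the ends
        if x <= xs[0]:
            return F[0]
        if x >= xs[-1]:
            return F[-1]
        i = int((x - xs[0]) / h)
        w = (x - xs[i]) / h
        return (1.0 - w) * F[i] + w * F[i + 1]

    F = [terminal(x) for x in xs]  # F(T, x) = psi(x)
    root = sigma * math.sqrt(dt)
    for _ in range(n_steps):       # step backward in time
        F = [min(cost(x, u) * dt
                 + 0.5 * (interp(F, x + u * dt + root) + interp(F, x + u * dt - root))
                 for u in us)
             for x in xs]
    return F

if __name__ == "__main__":
    xs = [i * 0.1 for i in range(-30, 31)]   # state grid on [-3, 3]
    us = [i * 0.5 for i in range(-8, 9)]     # finite admissible set U
    F = backward_induction(xs, us, sigma=0.5, dt=0.01, n_steps=100,
                           cost=lambda x, u: x * x + u * u,
                           terminal=lambda x: 0.0)
    print(F[30])   # approximate loss F(0, 0); grows with the horizon n_steps * dt
```

The computed values inherit the basic qualitative properties of the loss function: they are nonnegative, symmetric in x for this symmetric example, and increase with |x|.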
By analogy with (1.3.14), we omit the subscripts of xt, yt, and ut, assuming that the phase variables x, y and the control vector u in (1.4.21) are taken
at the current time t. We also note that the loss function F = F(t, x, y) must satisfy Eq. (1.4.21) for 0 ≤ t < T. At the right endpoint of the control interval, this function must satisfy the condition

$$F(T, x, y) = \psi(x, y),\qquad (1.4.22)$$
which readily follows from its definition (1.4.5). By using the operator

$$L^u_{t,x,y} = \frac{\partial}{\partial t} + \big(A^x(t,x,u)\big)^T\frac{\partial}{\partial x} + \big(A^y(t,y)\big)^T\frac{\partial}{\partial y} + \frac12\,\mathrm{Sp}\Big(B^x(t,x)\frac{\partial^2}{\partial x\,\partial x^T}\Big) + \frac12\,\mathrm{Sp}\Big(B^y(t,y)\frac{\partial^2}{\partial y\,\partial y^T}\Big),\qquad (1.4.23)$$

we can rewrite (1.4.21) in the compact form

$$\min_{u\in U}\big[L^u_{t,x,y} F(t, x, y) + c(x, y, u)\big] = 0.\qquad (1.4.24)$$
In the theory of Markov processes [45, 157, 175], the operator (1.4.23) is called the infinitesimal operator of the diffusion Markov process X(t) = (x(t), y(t)).
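The defining property of the infinitesimal operator, E[f(X_{t+Δ})] = f(x) + Δ·Lf(x) + o(Δ), can be checked numerically. In the sketch below (an illustration only: the uncontrolled scalar diffusion with drift a(x) = −x, the value σ = 0.5, and the test function f(x) = x² are invented), the Monte Carlo estimate of (E[f(x_{t+Δ})] − f(x))/Δ is compared with Lf(x) = a(x)f′(x) + ½σ²f″(x).

```python
import math
import random

def apply_generator(f_prime, f_second, a, sigma):
    """L f(x) = a(x) f'(x) + (sigma^2 / 2) f''(x) for a scalar diffusion process."""
    return lambda x: a(x) * f_prime(x) + 0.5 * sigma ** 2 * f_second(x)

def mc_generator(f, a, sigma, x, dt, n, rng):
    """Monte Carlo estimate of (E[f(x_{t+dt})] - f(x)) / dt using one Euler step."""
    s = 0.0
    for _ in range(n):
        x1 = x + a(x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        s += f(x1)
    return (s / n - f(x)) / dt

if __name__ == "__main__":
    a, sigma = (lambda x: -x), 0.5
    Lf = apply_generator(lambda x: 2.0 * x, lambda x: 2.0, a, sigma)  # f(x) = x^2
    est = mc_generator(lambda x: x * x, a, sigma, x=1.0, dt=1e-3,
                       n=200_000, rng=random.Random(1))
    print(Lf(1.0), est)   # exact value -1.75; the estimate agrees up to O(dt) and MC error
```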
To obtain the optimal control in the synthesis form u_t = φ(t, x_t, y_t), one has to solve Eq. (1.4.21) (or (1.4.24)) with the additional condition (1.4.22). If it is possible to calculate the minimum of the function in the square brackets in (1.4.21) explicitly, then the optimal control can be written as follows (see §1.3, (1.3.16)-(1.3.18)):

$$u_* = \varphi_*\Big(t, x, y, \frac{\partial F}{\partial x}\Big),\qquad (1.4.25)$$
and the Bellman equation (1.4.21) can be written without the symbol "min":

$$\frac{\partial F}{\partial t} + \big(A^y\big)^T\frac{\partial F}{\partial y} + \frac12\,\mathrm{Sp}\Big(B^y\frac{\partial^2 F}{\partial y\,\partial y^T}\Big) + \frac12\,\mathrm{Sp}\Big(B^x\frac{\partial^2 F}{\partial x\,\partial x^T}\Big) + \Phi\Big(t, x, y, \frac{\partial F}{\partial x}\Big) = 0,\qquad (1.4.26)$$

where Φ denotes a nonlinear function of the components of the vector ∂F/∂x:

$$\Phi\Big(t, x, y, \frac{\partial F}{\partial x}\Big) = c\big(x, y, \varphi_*\big) + \big(A^x(t, x, \varphi_*)\big)^T\frac{\partial F}{\partial x},\qquad \varphi_* = \varphi_*\Big(t, x, y, \frac{\partial F}{\partial x}\Big).\qquad (1.4.27)$$
In this case, solving the synthesis problem is equivalent to solving (1.4.26) with the additional condition (1.4.22). After the loss function F(t, x, y) satisfying (1.4.26) and (1.4.22) is found, we can calculate the gradient ∂F(t, x, y)/∂x and obtain the desired optimal control

$$u_* = \varphi_*\Big(t, x, y, \frac{\partial F(t, x, y)}{\partial x}\Big).\qquad (1.4.28)$$
Obviously, the main difficulty in this approach to the synthesis problem is to solve Eq. (1.4.26). Comparing this equation with a similar equation (1.3.18) for the deterministic problem (1.3.1)-(1.3.3), we see that, in contrast with (1.3.18), Eq. (1.4.26) is a second-order partial differential equation of parabolic type. By analogy with (1.3.18), Eq. (1.4.26) is nonlinear, but, in contrast with the deterministic case, the nonlinearity of Eq. (1.4.26)
is weak, since (1.4.26) is linear with respect to the higher-order derivatives of the loss function. This is why, in the general theory of parabolic equations [61, 124], equations of type (1.4.26) are usually called quasilinear or semilinear. In the general theory [124] of quasilinear parabolic equations of type
(1.4.26), the existence and uniqueness theorems for their solutions are proved for some classes of nonlinear functions $. The unique solution
of (1.4.26) is selected by initial and boundary conditions on the function F(t,x,y). In our case, condition (1.4.22) that determines the loss function
for t = T plays the role of the "initial" condition. The boundary conditions are determined by the restrictions imposed on the phase variables x and y in the original statement of the synthesis problem. If, as in problem (i)-(v) considered here, there are no restrictions on the phase variables, then it is necessary to solve the Cauchy problem for (1.4.26). In this case, the uniqueness of the solution is ensured by some requirements on the rate of growth of the function F(t, x, y) as |x|, |y| → ∞ (for details see Chapter III). However, there are no general methods for solving equations of type
(1.4.26) explicitly. Nevertheless, in some specific cases, Eq. (1.4.26) can be solved approximately or numerically, and sometimes exactly. We describe such special cases in detail in Chapters II-VII. Now let us consider some modifications of problem (i)-(v) that we shall study later. First of all, we trace how the form of the Bellman equation (1.4.21) varies if, in the initial problem (i)-(v), we use optimality criteria that differ from (1.4.3).

1.4.2. Stationary tracking. We begin by modifying the criterion (1.4.3), which allows us to examine stationary operating conditions of the servomechanism shown in Fig. 10. We assume that criterion (1.4.3) does not penalize the terminal state
of the controlled system, that is, the penalty function ψ(x, y) ≡ 0 in the
functional (1.4.3). Then the servomechanism shown in Fig. 10 can operate in the time-invariant (stationary) tracking mode if the following conditions are satisfied: (1) the input Markov process y(t) is homogeneous in time, namely, its drift and diffusion coefficients are independent of time: A^y(t, y) = A^y(y) and B^y(t, y) = B^y(y); (2) the plant is autonomous, that is, the right-hand sides of Eqs. (1.4.2) do not depend on time explicitly, a(t, x, u) = a(x, u) and σ(t, x) = σ(x); (3) the system works sufficiently long (the upper integration limit T → ∞ in (1.4.3)).
FIG. 11
A process of relaxation to the stationary operating conditions is schematically shown in Fig. 11, where the error z(t) = y(t) − x(t) between the input action (the command signal) and the controlled value (x and y are scalar variables) is plotted on the ordinate axis. One can see that for large T the operation interval [0, T] can be conventionally divided into two intervals: the time-varying operation interval [0, t₁), on which the error z(t) is still correlated with the initial state of the system, and the stationary interval [t₁, T]; for t > t₁ this correlation disappears, and we can assume that z(t), t ∈ [t₁, T], is a stationary stochastic process.
The performance on the time-invariant interval is characterized by the value γ of mean losses per unit time (the stationary tracking error). If the operation time T increases to T + ΔT (see Fig. 11), then the loss function (1.4.5) increases by γΔT. Therefore, to study the stationary tracking, it is expedient, instead of the loss function (1.4.5), to use the loss function f(x, y) that is independent of time and can be written as

$$f(x, y) = \lim_{T\to\infty}\big[F(t, x, y) - \gamma(T - t)\big].\qquad (1.4.29)$$
It follows from (1.4.23) and (1.4.24) that function (1.4.29) satisfies the stationary Bellman equation

$$\min_{u\in U}\big[L^u_{x,y} f(x, y) + c(x, y, u)\big] = \gamma,\qquad (1.4.30)$$

where L^u_{x,y} denotes the elliptic operator

$$L^u_{x,y} = \big(A^x(x,u)\big)^T\frac{\partial}{\partial x} + \big(A^y(y)\big)^T\frac{\partial}{\partial y} + \frac12\,\mathrm{Sp}\Big(B^x(x)\frac{\partial^2}{\partial x\,\partial x^T}\Big) + \frac12\,\mathrm{Sp}\Big(B^y(y)\frac{\partial^2}{\partial y\,\partial y^T}\Big).\qquad (1.4.31)$$
Obviously, for the optimal control u_t = φ_*(x_t, y_t), the stationary tracking error is given by

$$\gamma = \int c\big(x, y, \varphi_*(x, y)\big)\, p_\infty(x, y)\, dx\, dy,\qquad (1.4.32)$$

where p_∞(x, y) denotes the stationary probability density of the process (x(t), y(t)). The number γ enters Eq. (1.4.30) as an unknown parameter and, together with the functions f(x, y) and u_* = φ_*(x, y), can be found by solving the time-invariant equation (1.4.30). Some methods for solving
the stationary Bellman equations are considered in Chapters II-VI.

1.4.3. Maximization of the mean time of the first passage to the boundary. As previously, we assume that in the servomechanism shown in Fig. 10 the stochastic process y(t) is homogeneous in time and the plant P is autonomous. We also assume that a simply connected closed domain
D ⊂ R^{n+m} is chosen in the (n + m)-dimensional Euclidean space R^{n+m} of vectors (x, y). It is required to find a control that, for any initial state (x(0), y(0)) ∈ D of the system, maximizes the mean time Eτ during which the representative point (x(t), y(t)) reaches the boundary ∂D of the domain D (see the criterion (1.1.21) in §1.1). By W^u(t − t₀, x₀, y₀) we denote the probability of the event that the representative point (x, y) does not reach ∂D during time t − t₀ if x(t₀) = x₀ and
y(t₀) = y₀, (t₀, y₀) ∈ D, and a control algorithm u(t) = φ(x(t), y(t)) is chosen. This definition implies the following properties of the function W^u:

$$W^u(0, x_0, y_0) = 1 \quad\text{if } (x_0, y_0) \text{ is an interior point of } D;$$
$$W^u(t - t_0, x_0, y_0) = 0 \quad\forall\, t > t_0, \quad\text{if } (x_0, y_0) \in \partial D.\qquad (1.4.33)$$
If t_r denotes the random instant of time at which the phase vector X(t) = (x(t), y(t)) comes to the boundary ∂D for the first time, then the time τ = t_r − t₀ of coming to the boundary is a random variable, and the function W^u(·) can be expressed via the conditional probability

$$W^u(t - t_0, x_0, y_0) = P^u\{\tau > t - t_0 \mid x(t_0) = x_0,\; y(t_0) = y_0\} = P^u\{\tau > t - t_0 \mid x_0, y_0\}.\qquad (1.4.34)$$
For the mutually disjoint events {τ ≤ t − t₀} and {τ > t − t₀}, the probability addition theorem implies

$$P^u\{\tau \le t - t_0 \mid x_0, y_0\} + P^u\{\tau > t - t_0 \mid x_0, y_0\} = 1.\qquad (1.4.35)$$

Expressing the distribution function of the probabilities P^u{τ ≤ t − t₀ | x₀, y₀} via the probability density w_τ(σ) of the continuous random variable τ, we obtain

$$P\{\tau \le t - t_0 \mid x_0, y_0\} = \int_0^{t - t_0} w_\tau(\sigma)\,d\sigma = 1 - W^u(t - t_0, x_0, y_0)$$

from (1.4.34) and (1.4.35). Hence, after differentiation with respect to t, we have

$$w_\tau(t - t_0) = -\frac{\partial W^u}{\partial t}(t - t_0, x_0, y_0).\qquad (1.4.36)$$
Using the same notation for the argument of the density and for the random value, that is, writing w_τ(t − t₀) = w(τ), from (1.4.33) and (1.4.36) we obtain the mean time Eτ of reaching the boundary:

$$E\tau = \int_0^\infty \tau\, w(\tau)\,d\tau = -\int_0^\infty \tau\,\frac{\partial W^u}{\partial \tau}(\tau, x_0, y_0)\,d\tau = \int_0^\infty W^u(\tau, x_0, y_0)\,d\tau = \int_{t_0}^\infty W^u(t - t_0, x_0, y_0)\,dt.\qquad (1.4.37)$$

This formula holds if lim_{t→∞} (t − t₀) W^u(t − t₀, x₀, y₀) = 0.
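Formula (1.4.37) agrees with direct simulation. Take the simplest uncontrolled case: standard Brownian motion (zero drift, unit diffusion coefficient) started at x₀ = 0 in the interval D = (−1, 1); its mean exit time is known to equal 1 − x₀² = 1. The sketch below (an illustration, not from the book) estimates Eτ by averaging simulated first-passage times; the time step introduces a small discretization bias.

```python
import math
import random

def mean_exit_time(x0, bound, dt, n_paths, rng):
    """Monte Carlo estimate of E[tau] for Brownian motion dx = dW
    exiting the interval (-bound, bound)."""
    root = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, t = x0, 0.0
        while abs(x) < bound:        # walk until the boundary is reached
            x += root * rng.gauss(0.0, 1.0)
            t += dt
        total += t
    return total / n_paths

if __name__ == "__main__":
    est = mean_exit_time(0.0, 1.0, dt=1e-3, n_paths=1500, rng=random.Random(2))
    print(est)   # continuous-time theory: E tau = bound**2 - x0**2 = 1
```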
The mean time Eτ depends both on the initial state (x₀, y₀) of the controlled system shown in Fig. 10 and on a chosen control algorithm u = φ(x, y). Therefore, the Bellman function for the problem considered is determined by the relation

$$F_1(x, y) = \max_{\substack{u(\tau)\in U\\ \tau \ge t}} \int_t^\infty W^u(\tau - t, x, y)\,d\tau.\qquad (1.4.38)$$
By analogy with (1.4.10), for the function (1.4.38), the basic functional equation of the dynamic programming approach has the form

$$F_1(x_t, y_t) = \max_{u(\tau)\in U}\Big[\int_t^{t+\Delta} W^u(\tau - t, x_t, y_t)\,d\tau + E F_1(x_{t+\Delta}, y_{t+\Delta})\Big].\qquad (1.4.39)$$

The Bellman differential equation for the function F₁(x, y) can be derived from (1.4.39) by passing to the limit as Δ → 0. In this case, the procedure is almost the same as that used for the derivation of Eq. (1.4.21) for the basic problem (i)-(v). Expanding F₁(x_{t+Δ}, y_{t+Δ}) in the Taylor series around the point (x_t, y_t), averaging the expansion with respect to the random increments (x_{t+Δ} − x_t) and (y_{t+Δ} − y_t), taking into account the relation lim_{Δ→0} W^u(Δ, x_t, y_t) = 1 for all (x_t, y_t) lying in the interior of D, and passing to the limit as Δ → 0, from (1.4.39) with regard to (1.4.13)-(1.4.19), we obtain the Bellman differential equation for the function F₁(x, y):

$$\max_{u\in U} L^u_{x,y} F_1(x, y) = -1,\qquad (1.4.40)$$
where the elliptic operator L^u_{x,y} is given by (1.4.31). We also note that the function F₁(x, y) satisfies Eq. (1.4.40) in the interior of the domain D. It follows from (1.4.33) and (1.4.38) that at the points of the boundary ∂D the function F₁ vanishes,

$$F_1(x, y)\big|_{(x,y)\in\partial D} = 0.\qquad (1.4.41)$$
In the theory of differential equations of elliptic type, the problem of solving Eq. (1.4.40) with the boundary condition (1.4.41) is called the first
interior boundary-value problem or the Dirichlet problem. Thus, solving the synthesis problem for the optimal control that maximizes the mean time of the first passage to the boundary is equivalent to solving the Dirichlet
problem for the semilinear elliptic equation (1.4.40).
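In the simplest one-dimensional uncontrolled case the Dirichlet problem can be solved at once. For standard Brownian motion on D = (−1, 1), Eq. (1.4.40) reduces to ½F₁″(x) = −1 with F₁(±1) = 0, whose exact solution is F₁(x) = 1 − x². The finite-difference sketch below (an illustration, not a method from the book) solves the resulting tridiagonal system by the Thomas algorithm; since the exact solution is quadratic, the discrete answer matches it at the grid nodes.

```python
def solve_exit_time(bound, n):
    """Finite-difference solution of (1/2) F'' = -1 on (-bound, bound) with F = 0
    on the boundary: interior equations F[i-1] - 2 F[i] + F[i+1] = -2 h^2."""
    h = 2.0 * bound / (n + 1)
    a = [1.0] * n               # sub-diagonal
    b = [-2.0] * n              # main diagonal
    c = [1.0] * n               # super-diagonal
    d = [-2.0 * h * h] * n      # right-hand side
    for i in range(1, n):       # Thomas algorithm: forward elimination
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    F = [0.0] * n               # back substitution
    F[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        F[i] = (d[i] - c[i] * F[i + 1]) / b[i]
    xs = [-bound + (i + 1) * h for i in range(n)]
    return xs, F

if __name__ == "__main__":
    xs, F = solve_exit_time(1.0, 199)
    print(F[99])   # value at x = 0; the exact solution gives F1(0) = 1
```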
1.4.4. Minimization of the maximum penalty. Now let us consider the synthesis problem with optimality criterion (1.1.18) for the optimal control system shown in Fig. 10. In this case, it is reasonable to introduce the loss function

$$F_2(t, x_t, y_t) = \min_{u(\tau)} E\Big[\max_{t\le\tau\le T} c\big(x(\tau), y(\tau), u(\tau)\big)\Big].\qquad (1.4.42)$$
In (1.4.42) the averaging has the meaning of the conditional mathematical expectation E[·] = E{[·] | x(t) = x_t, y(t) = y_t}. For small Δ we have the following basic functional relation for the function F₂:

$$F_2(t, x_t, y_t) = \min_{u(\tau)\in U}\Big\{\max\big[\,c(x_t, y_t, u_t) + o(\Delta),\; E F_2(t + \Delta,\, x_{t+\Delta},\, y_{t+\Delta})\,\big]\Big\}.\qquad (1.4.43)$$
Let us introduce the notation c⁰(x, y) = min_{u∈U} c(x, y, u). Then it follows from (1.4.43) that either

$$F_2(t, x_t, y_t) = c^0(x_t, y_t),\qquad (1.4.44)$$

or

$$F_2(t, x_t, y_t) = \lim_{\Delta\to 0}\,\min_{u(\tau)\in U} E F_2(t + \Delta,\, x_{t+\Delta},\, y_{t+\Delta}),\qquad (1.4.45)$$
provided that the function F₂(t, x_t, y_t) > c⁰(x_t, y_t) has been obtained from (1.4.45). Acting by analogy with Section 1.4.1, that is, expanding the function F₂(t + Δ, x_{t+Δ}, y_{t+Δ}) in the series (1.4.12), averaging, and passing to the limit as Δ → 0, from (1.4.44) and (1.4.45) we obtain (with regard to (1.4.13)-(1.4.19)) the Bellman equation in the differential form:

$$\begin{cases}\displaystyle\min_{u\in U} L^u_{t,x,y} F_2(t, x, y) = 0 & \text{if } F_2(t, x, y) > c^0(x, y),\\[4pt] F_2(t, x, y) = c^0(x, y) & \text{otherwise},\end{cases}\qquad (1.4.46)$$
where L^u_{t,x,y} denotes the operator (1.4.23). The unique solvability of (1.4.46) requires the condition

$$F_2(T, x, y) = c^0(x, y),\qquad (1.4.47)$$
as well as the matching conditions for the function F₂(t, x, y) on the interface between the domains on which the equations in (1.4.46) are defined. These conditions of "smooth matching" [113] require the continuity of the function F₂(t, x, y) and of its first-order derivatives with respect to the phase variables x and y on the interface mentioned above. If, by analogy with Sections 1.4.2 and 1.4.3, the characteristics of the input process y(t) and of the controlled plant P are time-independent, then it is often expedient to use a somewhat different statement of the problem considered, which allows us to assume that the loss function is independent
of time. In this case, we do not fix the observation time but assume that the optimal system minimizes the functional

$$I[u] = E\Big[\max_{\tau\ge t} c\big(x(\tau), y(\tau), u(\tau)\big)\, e^{-\beta(\tau - t)}\Big],\qquad (1.4.48)$$
where β > 0 is a given number. This change of the mathematical statement preserves all characteristic features of the problem. Indeed, it follows from (1.4.48) that the time of observation of the function c(x, y, u) is bounded and determined by β. Namely, this time is large for small β and small for large β. For the criterion (1.4.48) the loss function is determined by the formula¹⁰

$$f_2(x, y) = \min_{u(\tau)\in U} E\Big[\max_{\tau\ge t} c\big(x(\tau), y(\tau), u(\tau)\big)\, e^{-\beta(\tau - t)}\Big].\qquad (1.4.49)$$
Taking into account the relations

$$\min_{u(\tau)\in U} E\Big[\max_{\tau\ge t+\Delta} c\big(x(\tau), y(\tau), u(\tau)\big)\, e^{-\beta(\tau - t)}\Big] = e^{-\beta\Delta}\,\min_{u(\tau)\in U} E\Big[\max_{\tau\ge t+\Delta} c\big(x(\tau), y(\tau), u(\tau)\big)\, e^{-\beta(\tau - t - \Delta)}\Big],$$

we can rewrite Eq. (1.4.43) for the function f₂(x, y) in the form

$$f_2(x_t, y_t) = \min_{u(\tau)\in U}\Big\{\max\big[\,c(x_t, y_t, u_t) + o(\Delta),\; E f_2(x_{t+\Delta}, y_{t+\Delta})\, e^{-\beta\Delta}\,\big]\Big\}.\qquad (1.4.50)$$

¹⁰ As usual, E[·] in (1.4.49) is treated as the conditional mathematical expectation E{[·] | x(t) = x_t, y(t) = y_t}.
By analogy with the previous reasoning, from (1.4.50) we obtain the Bellman equation for the function f₂(x, y):

$$\begin{cases}\displaystyle\min_{u\in U} L^u_{x,y} f_2(x, y) = \beta f_2(x, y) & \text{if } f_2(x, y) > c^0(x, y),\\[4pt] f_2(x, y) = c^0(x, y) & \text{otherwise},\end{cases}\qquad (1.4.51)$$
where i" is the elliptic operator (1.4.31) that does not contain the derivative with respect to time t. In §2.2 of Chapter II, we solve Eq. (1.4.51) for a special problem of optimal control. 1.4.5. Optimal tracking of a strictly discontinuous Markov process. Let us consider a version of the synthesis problem for the optimal
tracking system that differs from the basic problem (i)-(v) by conditions (i) and (iv). Namely, we assume that (i) the input process y(t) in the servomechanism (see Fig. 10) is given by a strictly discontinuous Markov process (see §1.1) with known characteristics X ( t , y ) and T r ( y , z , t ) determining the intensity of jumps and the density of the transition probability at the
state (t, y) and that (ii) there are no random perturbations £(t) that act on the plant P. In this case, the plant P is described by the system of
ordinary (nonstochastic) differential equations

$$\dot x(t) = a\big(t, x(t), u(t)\big),\qquad x(0) = x_0,\qquad 0 \le t \le T.\qquad (1.4.52)$$
It follows from (1.1.68) that for small Δ the transition probability p(t, y_t; t + Δ, y_{t+Δ}) = p(y(t + Δ) = y_{t+Δ} | y(t) = y_t) for the input process y(t) is determined by the formula

$$p(t, y_t;\, t + \Delta,\, y_{t+\Delta}) = \big(1 - \Delta\lambda(t, y_t)\big)\,\delta(y_{t+\Delta} - y_t) + \Delta\,\lambda(t, y_t)\,\pi(t, y_t, y_{t+\Delta}) + o(\Delta).\qquad (1.4.53)$$
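Formula (1.4.53) also tells us how to simulate a strictly discontinuous Markov process: in a short interval Δ a jump occurs with probability λ(t, y)Δ, and the new state is drawn from the density π(t, y, ·); equivalently, for a time-homogeneous process the holding time in state y is exponential with rate λ(y). The sketch below is an illustration only — the constant intensity and the Gaussian transition density are invented for the example.

```python
import random

def simulate_jump_process(y0, lam, sample_next, t_end, rng):
    """Sample path of a strictly discontinuous (pure-jump) Markov process:
    the holding time in state y is exponential with rate lam(y), and the new
    state after a jump is drawn by sample_next(y) (a sampler for pi(y, .))."""
    t, y = 0.0, y0
    jumps = [(0.0, y0)]                 # list of (jump time, new state)
    while True:
        t += rng.expovariate(lam(y))
        if t >= t_end:
            break
        y = sample_next(y)
        jumps.append((t, y))
    return jumps

if __name__ == "__main__":
    rng = random.Random(3)
    path = simulate_jump_process(
        y0=0.0,
        lam=lambda y: 2.0,                               # constant jump intensity (invented)
        sample_next=lambda y: rng.gauss(0.5 * y, 1.0),   # invented Gaussian pi(y, .)
        t_end=1000.0, rng=rng)
    print((len(path) - 1) / 1000.0)   # empirical jump rate, close to lam = 2
```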
By analogy with the solution of the basic problem, in the case considered, the loss function F₃(t, x, y) is determined by (1.4.5) if E[·] in (1.4.5) is understood as the averaging of the functional [·] over the set of sample paths y_t^T = {y(τ): t ≤ τ ≤ T} issued from a given initial point y(t) = y_t. Obviously, F₃(t, x, y) satisfies the functional equations (1.4.6) and (1.4.10). We rewrite Eq. (1.4.10) for F₃ as follows:
$$F_3(t, x_t, y_t) = \min_{u(\tau)\in U} E\Big[\int_t^{t+\Delta} c\big(x(\tau), y(\tau), u(\tau)\big)\,d\tau + F_3(t + \Delta,\, x_{t+\Delta},\, y_{t+\Delta})\Big].\qquad (1.4.54)$$
Note that for small Δ we can explicitly average in (1.4.54) by integrating the function in the square brackets multiplied by the transition probability (1.4.53). Since the sample paths of the input process y(t) are discontinuous, the random increments (y_{t+Δ} − y_t) are, generally speaking, not small. Therefore, in our case, instead of (1.4.12), we use the following representation of F₃(t + Δ, x_{t+Δ}, y_{t+Δ}) as Δ → 0:

$$F_3(t + \Delta,\, x_{t+\Delta},\, y_{t+\Delta}) = F_3(t, x_t, y_{t+\Delta}) + \Delta\,\frac{\partial F_3}{\partial t}(t, x_t, y_{t+\Delta}) + (x_{t+\Delta} - x_t)^T\,\frac{\partial F_3}{\partial x}(t, x_t, y_{t+\Delta}) + o(\Delta)\qquad (1.4.55)$$
(in (1.4.55) it is assumed that F₃(t, x, y) is a continuously differentiable function with respect to t and x). The Bellman equation for F₃(t, x, y) can be derived from (1.4.54) in the standard way. To this end, we substitute expansion (1.4.55) into (1.4.54), average with the probability density (1.4.53), and pass to the limit as Δ → 0 in (1.4.54). Using (1.4.53), we obtain
$$\begin{aligned} E F_3(t, x_t, y_{t+\Delta}) &= \int F_3(t, x_t, y_{t+\Delta})\big[(1 - \Delta\lambda(t, y_t))\,\delta(y_{t+\Delta} - y_t) + \Delta\lambda(t, y_t)\,\pi(t, y_t, y_{t+\Delta})\big]\,dy_{t+\Delta} + o(\Delta)\\ &= F_3(t, x_t, y_t) + \Delta\,\lambda(t, y_t)\Big[\int F_3(t, x_t, z)\,\pi(t, y_t, z)\,dz - F_3(t, x_t, y_t)\Big] + o(\Delta).\end{aligned}\qquad (1.4.56)$$
In a similar way, it follows from (1.4.52) and (1.4.53) that

$$x_{t+\Delta} - x_t = a(t, x_t, u_t)\,\Delta + o(\Delta),\qquad (1.4.57)$$
$$E\,\frac{\partial F_3}{\partial t}(t, x_t, y_{t+\Delta}) = \frac{\partial F_3}{\partial t}(t, x_t, y_t) + O(\Delta),\qquad (1.4.58)$$
$$E\,\frac{\partial F_3}{\partial x}(t, x_t, y_{t+\Delta}) = \frac{\partial F_3}{\partial x}(t, x_t, y_t) + O(\Delta),\qquad (1.4.59)$$
$$E\int_t^{t+\Delta} c\big(x(\tau), y(\tau), u(\tau)\big)\,d\tau = c(x_t, y_t, u_t)\,\Delta + o(\Delta)\qquad (1.4.60)$$

(in (1.4.58) and (1.4.59) the functions O(Δ) denote terms of the order of Δ such that lim_{Δ→0} O(Δ)/Δ = N, where N is a finite number).
Using (1.4.55)-(1.4.60) and passing to the limit as Δ → 0 in (1.4.54), we obtain the following Bellman integro-differential equation for the function F₃:

$$\frac{\partial F_3}{\partial t}(t, x, y) + \lambda(t, y)\Big[\int \pi(t, y, z)\,F_3(t, x, z)\,dz - F_3(t, x, y)\Big] + \min_{u\in U}\Big[a^T(t, x, u)\,\frac{\partial F_3}{\partial x}(t, x, y) + c(x, y, u)\Big] = 0,\qquad 0 \le t \le T,\qquad (1.4.61)$$
$$F_3(T, x, y) = \psi(x, y).\qquad (1.4.62)$$

If λ(t, y) = λ(y), π(t, y, z) = π(y, z), a(t, x, u) = a(x, u), ψ(x, y) = 0, and T → ∞, then the system shown in Fig. 10 may operate in the stationary tracking mode (see Section 1.4.2). In this case, instead of (1.4.61), we have the stationary Bellman equation

$$\lambda(y)\Big[\int \pi(y, z)\,f_3(x, z)\,dz - f_3(x, y)\Big] + \min_{u\in U}\Big[a^T(x, u)\,\frac{\partial f_3}{\partial x}(x, y) + c(x, y, u)\Big] = \gamma,\qquad (1.4.63)$$

where the stationary loss function f₃(x, y) is determined by analogy with (1.4.29) as f₃(x, y) = lim_{T→∞}[F₃(t, x, y) − γ(T − t)] and the number γ ≥ 0 determines mean losses per unit time in the stationary tracking mode under the optimal control. The solution of the time-invariant equation (1.4.63) for a special synthesis problem is given in §2.2. In conclusion, we make some remarks. First, we note that in this section we have considered only the synthesis problems (and the corresponding Bellman equations) that are studied in
the present monograph. The Bellman equations for other stochastic control problems can be found in [1, 3, 5, 18, 34, 50, 57, 58, 113, 122]. Moreover, the ideas and methods of the dynamic programming approach are widely used for solving problems of optimal control for Markov sequences and processes with finitely or countably many states [151, 152], which we do not consider in this book.
We also point out that many arguments and computations in this section are of rather formal character and sometimes correspond to the "physical level of rigor." To justify the optimality principle, the sufficiency of Markov optimal strategies, the validity of Bellman differential equations, and the solvability of synthesis problems rigorously, it is required to have rather complicated and refined mathematical constructions that are beyond the
framework of this book. The reader interested in a closer examination of these problems is referred to the monographs [58, 59, 175], and especially to [113].
§1.5. Sufficient coordinates in control problems with indirect observations
We have already noted that the dynamic programming method in, so to say, its "pure" form can be used only for Markov controlled processes. Let X_t be a current phase state of the system. The probabilities of future states X_{t+Δ} (Δ > 0) of the process X(t) must be completely determined by the last measured value X_t. However, since the time evolution of X(t) depends on random perturbations and control actions, the process X(t) satisfies the Markov property only if the values u_t of the current control are determined by the instant values of the phase variables and time as follows:

$$u_t = \varphi(t, X_t).\qquad (1.5.1)$$
The Markov property of the process X(t) allows us to write the basic functional equation of the optimality principle, then to obtain the Bellman equation, etc., that is, to follow the procedure described in §1.4. To implement the control algorithm in the form (1.5.1), it is necessary to measure the phase variables X_t exactly at each instant of time. This possibility is provided by the servomechanism shown in Fig. 10. In this case, the phase variables X_t = (x_t, y_t) are the components of the (n + m)-dimensional vector of instant input (assigning) actions and output (controlled) variables. Now let us consider a more general case of the system shown in Fig. 3. At each instant of time, instead of true values of the vectors x_t and y_t, we have only the results of measurements x̃₀ᵗ and ỹ₀ᵗ, which are sample paths of the stochastic processes {x̃(s): 0 ≤ s ≤ t} and {ỹ(s): 0 ≤ s ≤ t}. These processes are mixtures of "useful signals" x₀ᵗ, y₀ᵗ and "random noises" η₀ᵗ, ζ₀ᵗ. Only these results of measurements can be used for calculating the current values of the control actions u_t; therefore, the desired control algorithm for the system shown in Fig. 3 has the form of the functional

$$u_t = \varphi(t, \tilde x_0^t, \tilde y_0^t).\qquad (1.5.2)$$

To illustrate the computation of the optimal functional φ(t, x̃₀ᵗ, ỹ₀ᵗ), we consider, as an example, the basic synthesis problem (see §1.4, Section 1.4.1) in the case of indirect observations. Assume that the equation of the controlled plant, the restrictions on the control, and the optimality criterion have the form
$$\dot x(t) = a\big(t, x(t), u(t)\big) + \sigma\big(t, x(t)\big)\,\xi(t),\qquad x(0) = x_0,\qquad (1.5.3)$$
$$u(t) \in U \subseteq R^r,\qquad 0 \le t \le T,\qquad (1.5.4)$$
$$I[u] = E\Big[\int_0^T c\big(x(t), y(t), u(t)\big)\,dt + \psi\big(x(T), y(T)\big)\Big]\qquad (1.5.5)$$
(here we use the notation from (1.4.2), (1.4.3), and (1.4.4) in §1.4). The observed processes x̃(t) and ỹ(t) are determined by the relations

$$\tilde x(t) = P(t)\,x(t) + Q(t)\,\eta(t),\qquad \tilde y(t) = H(t)\,y(t) + G(t)\,\zeta(t),\qquad 0 \le t \le T.\qquad (1.5.6)$$

Here P, Q, H, and G are given matrices whose dimensions agree with the dimensions of the vectors x, x̃, η, y, ỹ, and ζ. We also assume that the vectors x̃ and η (as well as the vectors ỹ and ζ) are of the same dimension, and the square matrices Q(t) and G(t) are nondegenerate for all t ∈ [0, T].¹¹
We assume that the stochastic process ξ(t) in (1.5.3) is the standard white noise (1.1.34) and the other stochastic functions y(t), ζ(t), and η(t) are Markov diffusion processes with known characteristics (that is, with given drift and diffusion coefficients). The stochastic processes ξ(t), y(t), ζ(t), and η(t) are assumed to be independent. We also note that the stochastic process x(t), which is a solution of the stochastic equation (1.5.3), is not Markov, since in this case the control functions u(t) = u_t on the right-hand side of (1.5.3) have the form of functionals (1.5.2) and depend on the history of the process.
Following the formal scheme of the dynamic programming approach, by analogy with (1.4.5), we can define the loss function for the problem considered as follows:

$$F(t, \tilde x_0^t, \tilde y_0^t) = \min_{\substack{u(\tau)\in U\\ t\le\tau\le T}} E\Big[\int_t^T c\big(x(\tau), y(\tau), u(\tau)\big)\,d\tau + \psi\big(x(T), y(T)\big)\,\Big|\, \tilde x_0^t, \tilde y_0^t\Big].\qquad (1.5.7)$$

Since the functions x̃₀ᵗ and ỹ₀ᵗ are arguments of F in (1.5.7), it would be more correct if expression (1.5.7) were called a loss functional; however, both (1.5.7) and (1.4.5) are called loss functions.
In contrast with §1.4, it is essentially new that we cannot write the optimality principle equation of type (1.4.6) or (1.4.10) for the function (1.5.7), since this function depends on the stochastic processes x̃(t) and ỹ(t), which are not Markov. Formula (1.5.6) immediately shows that x̃(t) and ỹ(t) have no Markov properties, since the sum of Markov processes is not a Markov process. Moreover, it was pointed out that the process x(t) itself is not Markov. Therefore, we can solve the synthesis problem

¹¹ For simplicity, we assume that Q(t) and G(t) are nondegenerate, but this condition is not necessary [132, 175].
by using the dynamic programming approach only if we can choose new "phase" variables X(t) = X_t for the loss function (1.5.7) so that, on the one hand, they are sufficient for the computation of minimum future losses in the sense of

$$F(t, \tilde x_0^t, \tilde y_0^t) = F(t, X_t)$$

and, on the other hand, the stochastic process X(t) is Markov. Such phase variables X_t are called sufficient coordinates [171] by analogy with sufficient statistics used in mathematical statistics [185]. It turns out that there exist sufficient coordinates for the problem considered, and X_t is the collection of instant values of the observable processes x̃(t) = x̃_t and ỹ(t) = ỹ_t and of the a posteriori probability density p(t, x_t, y_t) = p(x(t) = x_t, y(t) = y_t | x̃₀ᵗ, ỹ₀ᵗ) of the unobserved vectors x_t and y_t:

$$X_t = \big(\tilde x_t,\, \tilde y_t,\, p(t, x_t, y_t)\big).\qquad (1.5.8)$$

In what follows, it will be shown that the coordinates (1.5.8) are sufficient to compute the loss function (1.5.7). In the case of an uncontrolled process x(t), the Markov property of (1.5.8) follows from Theorem 5.9 in [175].
To derive the Bellman differential equation, it is necessary to know equations that determine the time evolution of sufficient coordinates. For the first two components of (1.5.8), that is, for the processes x̃(t) and ỹ(t), these equations can be assumed to be known, because one can readily obtain them from the a priori characteristics of the processes y(t), x(t), ζ(t), η(t) and formulas (1.5.6). Later we derive the equation for the a posteriori probability density p(t, x_t, y_t). First, we do not pay attention to the fact that the control u_t has the form of a functional (1.5.2). In other words, we assume that u(t) in (1.5.3) is a known deterministic function of time. Then the stochastic process x(t) that satisfies the stochastic equation (1.5.3) is a diffusion Markov process whose characteristics (the drift and diffusion coefficients) are uniquely determined by the vector a(t, x, u) and the matrix σ(t, x) (see §1.2). Thus, in our case, x(t), y(t), ζ(t), and η(t) are independent stochastic Markov diffusion processes with given drift coefficients and matrices of diffusion coefficients. In view of formulas (1.5.6) and the fact that the matrices Q(t) and G(t) are nondegenerate, it follows that the collection (x(t), y(t), x̃(t), ỹ(t)) is a Markov diffusion process whose characteristics can be expressed via the given characteristics of the processes x(t), y(t), ζ(t), and η(t). Indeed, if we denote the vectors of drift coefficients by A_x(t, x), A_y(t, y), A_ζ(t, ζ), A_η(t, η) and the diffusion matrices of the independent Markov processes x(t), y(t), ζ(t), and η(t) by B_x(t, x), B_y(t, y), B_ζ(t, ζ), B_η(t, η), then it follows from (1.5.6) that the drift coefficients A_x̃ and A_ỹ for the components x̃(t) and ỹ(t) are
determined by the relations

$$A_{\tilde x} = P(t)\,A_x(t, x) + Q(t)\,A_\eta\big(t,\, Q^{-1}(t)(\tilde x - P(t)x)\big) + \dot P(t)\,x + \dot Q(t)\,Q^{-1}(t)\big(\tilde x - P(t)x\big),\qquad (1.5.9)$$
$$A_{\tilde y} = H(t)\,A_y(t, y) + G(t)\,A_\zeta\big(t,\, G^{-1}(t)(\tilde y - H(t)y)\big) + \dot H(t)\,y + \dot G(t)\,G^{-1}(t)\big(\tilde y - H(t)y\big),\qquad (1.5.10)$$
and the matrix B of the diffusion coefficients of the joint process (x(t), y(t), x̃(t), ỹ(t)) has the block form

$$B = \begin{pmatrix} B_x(t,x) & 0 & B_x(t,x)\,P^T(t) & 0\\ 0 & B_y(t,y) & 0 & B_y(t,y)\,H^T(t)\\ P(t)\,B_x(t,x) & 0 & B_{\tilde x}(t,x,\tilde x) & 0\\ 0 & H(t)\,B_y(t,y) & 0 & B_{\tilde y}(t,y,\tilde y)\end{pmatrix},\qquad (1.5.11)$$

where B_x̃(t, x, x̃) and B_ỹ(t, y, ỹ) are square matrices¹² determined by the relations

$$B_{\tilde x}(t, x, \tilde x) = P(t)\,B_x(t, x)\,P^T(t) + Q(t)\,B_\eta\big(t,\, Q^{-1}(t)(\tilde x - P(t)x)\big)\,Q^T(t),\qquad (1.5.12)$$
$$B_{\tilde y}(t, y, \tilde y) = H(t)\,B_y(t, y)\,H^T(t) + G(t)\,B_\zeta\big(t,\, G^{-1}(t)(\tilde y - H(t)y)\big)\,G^T(t).\qquad (1.5.13)$$

Now we point out that in the Markov collection of random functions (x(t), y(t), x̃(t), ỹ(t)) the components x̃(t) and ỹ(t) are observable, but the components x(t) and y(t) are not observable. Partially observable Markov processes are often called conditional Markov processes. The rigorous theory of such processes can be found in [132, 175]. Let us consider the conditional (a posteriori) density p(t, x_t, y_t) = p(x(t) = x_t, y(t) = y_t | x̃₀ᵗ, ỹ₀ᵗ) of the probability distribution for unobservable components of the partially observable Markov process (x(t), y(t), x̃(t), ỹ(t)). It turns out that the a posteriori density p(t, x_t, y_t) satisfies a stochastic partial differential equation, first obtained in [175]. This is a generalization of the Fokker-Planck equation (1.1.67) to the case of observation. In what follows, we briefly derive this equation.

¹² If P, Q, H, and G in (1.5.6) are row matrices, then B_x̃(t, x, x̃) and B_ỹ(t, y, ỹ) are scalar functions.
Synthesis Problems for Control Systems
79
According to [175], we introduce the following notation. We denote the collection of random functions (x̃(t), ỹ(t), x(t), y(t)) that forms a Markov process by a single letter z(t) and assume that the dimension of the vector z is equal to n. We assume that the unobservable components of the vector z are numbered from 1 to m and the observable components are numbered from m+1 to n. For convenience, we write x_α (1 ≤ α ≤ m) for unobservable components and y_ρ (m+1 ≤ ρ ≤ n) for observable ones. We also use three groups of indices: the indices i, k, l, ... vary from 1 to n; the indices α, β, γ, ... from 1 to m; and the indices ρ, σ, τ, ... from m+1 to n. In this notation, the local characteristics of the Markov process z(t) are

lim_{Δ→0} (1/Δ) E[Δz_k | z(t) = z] = A_k(t, z),

lim_{Δ→0} (1/Δ) E[Δz_k Δz_l | z(t) = z] = B_kl(t, z),   (1.5.14)

lim_{Δ→0} (1/Δ) E[Δz_{k₁} ··· Δz_{k_r} | z(t) = z] = 0   (r > 2),

where Δz_k = z_k(t + Δ) − z_k(t).
It is required to obtain an equation for the a posteriori probability density p(t, x_t) = p(x_t | y₀ᵗ), provided that (1.5.14) and the results of observation y₀ᵗ are known. Using the transition probability p_Δ(z_{t+Δ} | z_t) = p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) and the probability multiplication theorem, we obtain

p(x_{t+Δ}, y_{t+Δ}, x_t | y₀ᵗ) = p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) p(x_t | y₀ᵗ).   (1.5.15)

Integrating (1.5.15) with respect to x_t and taking into account (1.1.50), we obtain

p(x_{t+Δ}, y_{t+Δ} | y₀ᵗ) = ∫ p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) p(t, x_t) dx_t.   (1.5.16)

If we write the left-hand side of (1.5.16) in the form

p(x_{t+Δ}, y_{t+Δ} | y₀ᵗ) = p(x_{t+Δ} | y₀ᵗ, y_{t+Δ}) p(y_{t+Δ} | y₀ᵗ),

then we can write (1.5.16) as follows:

p(x_{t+Δ} | y₀ᵗ, y_{t+Δ}) = (1 / p(y_{t+Δ} | y₀ᵗ)) ∫ p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) p(t, x_t) dx_t.   (1.5.17)
Integrating (1.5.16) with respect to x_{t+Δ}, we obtain

p(y_{t+Δ} | y₀ᵗ) = ∫∫ p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) p(t, x_t) dx_t dx_{t+Δ}.   (1.5.18)

Substituting (1.5.18) into (1.5.17) and taking into account the fact that the equality p(x_{t+Δ} | y₀ᵗ, y_{t+Δ}) = p(t + Δ, x_{t+Δ}) + o(Δ) is valid, since the arguments are continuous, we obtain

p(t + Δ, x_{t+Δ}) = [∫ p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) p(t, x_t) dx_t] / [∫∫ p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) p(t, x_t) dx_t dx_{t+Δ}] + o(Δ).   (1.5.19)
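Relation (1.5.19) also suggests a direct numerical filtering scheme: discretize x on a grid, weight the current density by the likelihood of the observed increment, propagate it through the transition kernel, and renormalize. A minimal sketch for a scalar diffusion follows; the concrete drift, diffusion, and observation model below are illustrative assumptions, not data from the text.

```python
import numpy as np

# One step of the recursion (1.5.19) on a grid, for a scalar diffusion
# with drift A(x) and diffusion Bx, observed through the increment
# dy = x dt + observation noise; all model choices are illustrative.

def bayes_step(p, grid, dy, dt, drift, Bx, By):
    h = grid[1] - grid[0]
    # transition density of the unobservable component:
    # x_{t+dt} ~ N(x_t + A(x_t) dt, Bx dt); normalization constants
    # cancel in the ratio (1.5.19)
    mean = grid + drift(grid) * dt
    trans = np.exp(-(grid[:, None] - mean[None, :]) ** 2 / (2.0 * Bx * dt))
    # likelihood of the observed increment dy given x_t: N(x_t dt, By dt)
    lik = np.exp(-(dy - grid * dt) ** 2 / (2.0 * By * dt))
    # numerator of (1.5.19): integrate over x_t
    num = trans @ (lik * p) * h
    # denominator of (1.5.19): normalization over x_{t+dt}
    return num / (num.sum() * h)

grid = np.linspace(-5.0, 5.0, 401)
p0 = np.exp(-grid ** 2 / 2.0)
p0 /= p0.sum() * (grid[1] - grid[0])          # standard Gaussian prior
p1 = bayes_step(p0, grid, dy=0.3, dt=0.1, drift=lambda x: -x, Bx=1.0, By=1.0)
```

A positive observed increment dy shifts the a posteriori mean upward, which is exactly the role the innovation term plays in the differential equations derived from (1.5.19) in this section.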
Equation (1.5.19) for partially observable Markov processes plays the same role as the Markov (Smoluchowski) equation (1.1.53) plays for complete observation. To derive the differential equation for the a posteriori density p(t, x_t) from (1.5.19), we use the same method as in the derivation of the Fokker–Planck equation in §1.1 (see (1.1.59)–(1.1.64)). Let us introduce two characteristic functions of the random increments Δx_α, α = 1, ..., m, and Δz_k, k = 1, ..., n:¹³
θ₁(u₁, ..., u_m, z_t, y_{t+Δ}) = ∫ exp[j u_α (x_{α,t+Δ} − x_{αt})] p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) dx_{t+Δ},   (1.5.20)

θ₂(u₁, ..., u_n, z_t) = ∫ exp[j u_k (z_{k,t+Δ} − z_{kt})] p_Δ(z_{t+Δ} | z_t) dz_{t+Δ}.   (1.5.21)

The transition probability can be expressed in terms of inverse Fourier transforms as follows:

p_Δ(z_{t+Δ} | z_t) = (2π)⁻ⁿ ∫ exp[−j u_k (z_{k,t+Δ} − z_{kt})] θ₂(u₁, ..., u_n, z_t) du₁ ··· du_n,   (1.5.22)

p_Δ(x_{t+Δ}, y_{t+Δ} | z_t) = (2π)⁻ᵐ ∫ exp[−j u_α (x_{α,t+Δ} − x_{αt})] θ₁(u₁, ..., u_m, z_t, y_{t+Δ}) du₁ ··· du_m.   (1.5.23)

¹³ In (1.5.20) and (1.5.21), as usual, j = √−1 and the sum is taken over the repeated indices: u_α(x_{α,t+Δ} − x_{αt}) = Σ_{α=1}^m u_α(x_{α,t+Δ} − x_{αt}), u_k(z_{k,t+Δ} − z_{kt}) = Σ_{k=1}^n u_k(z_{k,t+Δ} − z_{kt}).
Using the expansion of ln θ₂(u₁, ..., u_n, z_t) in the Maclaurin series, we can write

ln θ₂(u₁, ..., u_n, z_t) = Σ_{s=1}^∞ (jˢ/s!) K_s[Δz_{k₁}, ..., Δz_{k_s}] u_{k₁} ··· u_{k_s},   (1.5.24)

where K_s[Δz_{k₁}, ..., Δz_{k_s}] denotes the s-order correlation between the components of the vector of increments Δz = z(t + Δ) − z(t) of the Markov process z(t). Using well-known relations between correlations and initial moments [173, 181], we see that (1.5.14) gives the following representation for (1.5.24):

θ₂(u₁, ..., u_n, z_t) = exp[Δ j A_k u_k − (Δ/2) B_kl u_k u_l + o(Δ)]   (1.5.25)

(for brevity, in (1.5.25) and in the following we do not write the arguments of A_k and B_kl, namely, A_k = A_k(t, z_t) and B_kl = B_kl(t, z_t)). Comparing (1.5.22) and (1.5.23), we see that

θ₁(u₁, ..., u_m, z_t, y_{t+Δ}) = (2π)^{m−n} ∫ exp[−j u_σ (y_{σ,t+Δ} − y_{σt})] θ₂(u₁, ..., u_n, z_t) du_{m+1} ··· du_n.   (1.5.26)
After the substitution of (1.5.25), we can calculate the integral (1.5.26) explicitly. As the result of the integration, for the characteristic function θ₁ we obtain the formula¹⁴

θ₁(u₁, ..., u_m, z_t, y_{t+Δ}) = K exp[L(u₁, ..., u_m, z_t, y_{t+Δ})Δ + o(Δ)],   (1.5.27)

where

L(u₁, ..., u_m, z_t, y_{t+Δ}) = j u_α [A_α + B_ασ F_σρ (Δy_ρ/Δ − A_ρ)] − (1/2) u_α u_β (B_αβ − B_ασ F_σρ B_ρβ) + A_σ F_σρ Δy_ρ/Δ − (1/2) A_σ F_σρ A_ρ,  Δy_ρ = y_{ρ,t+Δ} − y_{ρt},   (1.5.28)

¹⁴ To obtain (1.5.27) and (1.5.28), it suffices to use the well-known formula [67]

∫ ··· ∫ exp[m_k x_k − (1/2) B_kl (x_k − m̃_k)(x_l − m̃_l)] dx₁ ··· dx_n = ((2π)^{n/2} / √(det B)) exp[m_k m̃_k + (1/2) F_kl m_k m_l],  F = B⁻¹,

which holds for real symmetric positive definite matrices B = ||B_kl||₁ⁿ and any constants m_k and m̃_k.
and K is a constant that does not influence the final result of the calculations. Note that we calculated (1.5.26) under the assumption that the matrix ||B_σρ||_{m+1}ⁿ is nondegenerate, and we used the notation ||F_σρ|| = ||B_σρ||⁻¹. Since the exponent in (1.5.27) is small (∼ Δ), we replace the exponential by the first three terms of the Maclaurin series eˣ = 1 + x + x²/2, truncate the terms whose order with respect to Δ is larger than 1, and obtain

exp[L(u₁, ..., u_m, z_t, y_{t+Δ})Δ + o(Δ)] = 1 + L(u₁, ..., u_m, z_t, y_{t+Δ})Δ − (Δ/2)(u_α u_β B_ασ F_σρ B_ρβ − A_τ F_τρ A_ρ) + j u_α B_ασ F_σρ A_ρ Δ + o(Δ).   (1.5.29)

In (1.5.29) we used the relation F_σρ B_ρτ = B_σρ F_ρτ = δ_στ, where δ_στ is the Kronecker delta, and the formula

Δy_ρ Δy_τ = B_ρτ(t, z_t)Δ + o(Δ) = B_ρτ Δ + o(Δ),   (1.5.30)

which follows from the properties of Wiener processes and is a multidimensional generalization of formula (1.2.8) (for details, see Lemma 2.2 in [175]). Substituting (1.5.28) into (1.5.29) and collecting similar terms, we obtain

exp[L(u₁, ..., u_m, z_t, y_{t+Δ})Δ + o(Δ)] = 1 + A_σ F_σρ Δy_ρ + j u_α (A_α Δ + B_ασ F_σρ Δy_ρ) − (Δ/2) u_α u_β B_αβ + o(Δ).   (1.5.31)
Using (1.5.23), (1.5.27), and (1.5.31), we calculate the numerator of the fraction on the right-hand side in (1.5.19):

∫ p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) p(t, x_t) dx_t
= (K/(2π)ᵐ) ∫∫ exp[−j u_α (x_{α,t+Δ} − x_{αt})] [1 + A_σ F_σρ Δy_ρ + j u_α (A_α Δ + B_ασ F_σρ Δy_ρ) − (Δ/2) u_α u_β B_αβ] p(t, x_t) du₁ ··· du_m dx_t + o(Δ).   (1.5.32)

Taking into account the formulas (see (1.1.60)–(1.1.64))

(1/(2π)ᵐ) ∫ exp[−j u_α (x_{α,t+Δ} − x_{αt})] du₁ ··· du_m = δ(x_{t+Δ} − x_t),

(1/(2π)ᵐ) ∫ j u_β exp[−j u_α (x_{α,t+Δ} − x_{αt})] du₁ ··· du_m = −∂δ(x_{t+Δ} − x_t)/∂x_{β,t+Δ},

(1/(2π)ᵐ) ∫ (j u_β)(j u_γ) exp[−j u_α (x_{α,t+Δ} − x_{αt})] du₁ ··· du_m = ∂²δ(x_{t+Δ} − x_t)/∂x_{β,t+Δ}∂x_{γ,t+Δ},

we obtain the numerator in (1.5.19) (we omit the constant K, since K and a similar constant in the denominator of (1.5.19) cancel):

p(t, x_{t+Δ}) − ∂/∂x_{α,t+Δ} [(A_α Δ + B_ασ F_σρ Δy_ρ) p(t, x_{t+Δ})] + (Δ/2) ∂²/∂x_{α,t+Δ}∂x_{β,t+Δ} [B_αβ p(t, x_{t+Δ})] + A_σ F_σρ p(t, x_{t+Δ}) Δy_ρ + o(Δ)   (1.5.33)
(in (1.5.33), (t, x_{t+Δ}, y_t) are the arguments of the coefficients A_k and F_σρ). The denominator of the expression on the right-hand side in (1.5.19) differs from the numerator by integration with respect to x_{t+Δ}. We perform this integration, take into account the normalization condition ∫ p(t, x_{t+Δ}) dx_{t+Δ} = 1 for the probability density and the boundary conditions

p(t, x) → 0,  ∂p(t, x)/∂x_α → 0  as |x| → ∞,

and from (1.5.33) obtain the following expression (without K) for the denominator in (1.5.19):

1 + Δy_ρ E_ps{A_σ} F_σρ + o(Δ),   (1.5.34)

where E_ps{·} denotes the a posteriori averaging ∫ (·) p(t, x) dx. We assume that the elements of the matrix B_σρ (and hence of F_σρ) are independent of the unobservable components x and take into account (1.5.30). Then we can write

[1 + Δy_ρ E_ps{A_σ} F_σρ + o(Δ)]⁻¹ = 1 − Δy_ρ E_ps{A_σ} F_σρ + Δ E_ps{A_σ} F_σρ E_ps{A_ρ} + o(Δ).   (1.5.35)
Multiplying (1.5.33) by (1.5.35) and substituting the result into (1.5.19), we obtain

p(t + Δ, x) = p(t, x) + Δ [−∂/∂x_α (A_α p) + (1/2) ∂²/∂x_α∂x_β (B_αβ p)] + [A_σ p − p E_ps{A_σ} − ∂/∂x_α (B_ασ p)] F_σρ [Δy_ρ − E_ps{A_ρ} Δ] + o(Δ).   (1.5.36)

As Δ → 0, the terms denoted by o(Δ) in (1.5.36) disappear, and the finite increments become differentials. In this case, according to §1.2, it is necessary to point out in which sense the stochastic differentials are understood, since the differential equation obtained is stochastic (it contains the differential of the Markov process dy_ρ(t)). Comparing Eq. (1.5.36) (as Δ → 0) with the stochastic equation (1.2.3), we see that now the a posteriori probability density p(t, x) in (1.5.36) plays the role of the random function x(t) in (1.2.3), and the vector-function

[A_σ p − p E_ps{A_σ} − ∂/∂x_α (B_ασ p)] F_σρ   (1.5.37)

plays the role of the function σ. Understanding the differentials in the Ito sense, we obtain from (1.5.36) the Ito equation for the a posteriori density,

d₀ p(t, x) = [−∂/∂x_α (A_α p) + (1/2) ∂²/∂x_α∂x_β (B_αβ p)] dt + [A_σ p − p E_ps{A_σ} − ∂/∂x_α (B_ασ p)] F_σρ [d₀ y_ρ(t) − E_ps{A_ρ} dt],   (1.5.38)
which can, in turn, be transformed to the equivalent symmetrized form

∂p/∂t = −∂/∂x_α {[A_α + B_ασ F_σρ (ẏ_ρ − A_ρ)] p} + (1/2) ∂²/∂x_α∂x_β {[B_αβ − B_ασ F_σρ B_ρβ] p} + [A_σ F_σρ ẏ_ρ − (1/2) A_σ F_σρ A_ρ − E_ps{A_σ F_σρ ẏ_ρ − (1/2) A_σ F_σρ A_ρ}] p   (1.5.39)
by using the coupling formulas between stochastic differentials and integrals (see §1.2);¹⁵ in (1.5.39), ẏ_ρ = ẏ_ρ(t) = dy_ρ(t)/dt denotes the formal time derivative of the Markov process y_ρ(t).

Equation (1.5.39) is a generalization of the Fokker–Planck equation to the observation case. It should be noted that if some transformations of the random function p(t, x) are necessary (see formulas (1.5.41)–(1.5.44) below), then it is more convenient to use Eq. (1.5.39), although it is more cumbersome than the similar Ito equation (1.5.38), since (see §1.2) the symmetrized form allows us to treat random functions (even such singular functions as white noises) according to the same formal rules as deterministic and sufficiently smooth functions.

We can show [132] that √F_σρ [d₀y_ρ(t) − E_ps{A_ρ} dt]¹⁶ is the differential of the standard Wiener process dη(t) studied in §1.2. Therefore, in view of Eq. (1.5.38), the already cited Markov property of the set (y_t, p(t, x)) can be obtained by not completely rigorous but sufficiently illustrative arguments. Indeed, since the increments [η_ν(t + Δ) − η_ν(t)] of the stochastic processes

η_ν(t) = √F_νρ [y_ρ(t) − ∫₀ᵗ E_ps{A_ρ(τ, y(τ))} dτ]

in (1.5.38) are mutually independent, the future values of the a posteriori probability p(t + Δ, x) are completely determined by (x_t, y_t, p(t, x)). Since the vector x_t is unobservable, the probabilities p(t + Δ, x) of future values are determined by (y_t, p(t, x)) and the probability of the current value x_t, that is, by the a posteriori density p(t, x) contained in (y_t, p(t, x)). On the other hand, since the process z(t) is of Markov character, the probabilities of future values of the observable process y_{t+Δ} are completely determined by its current state z_t = (x_t, y_t), that is, by the same set (y_t, p(t, x)), since x_t is unobservable. This implies that (y(t), p(t, x)) is a Markov process.

Now let us recall that Eqs. (1.5.38) and (1.5.39) were derived under the assumption that the control u(t) in (1.5.3) is a known deterministic function of time. However, if the control u(t) is given by the functional (1.5.2) (in the new notation introduced after (1.5.14), this functional has the form u(t) = u_t = φ(t, y₀ᵗ)), then this fact does not affect the Markov properties of (y_t, p(t, x)), since it is assumed that (y_t, p(t, x)) is determined by the entire past history of the observations y₀ᵗ = {y(s): 0 ≤ s ≤ t}. Thus, for a given state of (y_t, p(t, x)) and any chosen functional φ in (1.5.2), the control u_t is a known vector on which the functions a(t, x, u) in (1.5.3) and the coefficients A_α and A_ρ in (1.5.38) depend as on a parameter. Hence it follows that Eqs. (1.5.38) and (1.5.39) are also valid for controlled processes (provided that the control is given in the form (1.5.2)).

¹⁵ Here we do not show in detail how to transform the Ito equation (1.5.38) to the symmetrized form (1.5.39); the reader is strongly recommended to do this useful exercise on his own.

¹⁶ √F_σρ denotes an element of the matrix √F, which is the square root of the matrix F; since the matrix ||B_σρ|| is symmetric and positive definite, so is F, and the square root √F exists.
Now let us return to the synthesis problem and the dynamic programming approach. Describing the state of a controlled system at time t by (1.5.8) or, briefly, by (y_t, p(t, x)) (recall that after (1.5.14) we introduced the new notation: x̃_t, ỹ_t → y_t and x_t, y_t → x_t), we can write the loss function (1.5.7) as F(t, y₀ᵗ) = F(t, y_t, p(t, x)). Using the Markov property of (y_t, p(t, x)), we can write the basic equation of the optimality principle for the function F(t, y_t, p(t, x)) as follows:

F(t, y_t, p(t, x)) = min_{u_τ ∈ U, t ≤ τ ≤ t+Δ} E{∫_t^{t+Δ} c(x_τ, y_τ, u_τ) dτ + F(t + Δ, y_{t+Δ}, p(t + Δ, x)) | y_t, p(t, x)}.   (1.5.40)
Generally speaking, by passing to the limit as Δ → 0 in (1.5.40) and using (1.5.14) and (1.5.38) (or (1.5.39)), we can obtain the Bellman differential equation by analogy with §1.4. However, the equation obtained in this way contains the functional derivatives δF/δp(t, x), δ²F/δp(t, x)δp(t, x), etc.; usually it is difficult to solve this equation (as pointed out in §1.4 and §1.5, even the solution of "usual" Bellman partial differential equations is a rather complicated problem). Therefore, in practice it is more convenient, instead of the a posteriori density p(t, x), to use some equivalent set of parameters as arguments of the function F. We show how to do this.

Assume that the a posteriori probability density p(t, x) is a unimodal function of the vector variable x for all t ∈ [0, T]. By the vector m_t = m(t) we denote the maximum point of the a posteriori density p(t, x) at time t. Expanding ln p(t, x) in the Taylor series with respect to x around the point m(t), we obtain the following representation of the a posteriori density p(t, x):

p(t, x) = exp{a(t) − Σ_{s=2}^∞ (1/s!) Σ_{α,β,...,ζ=1}^m a_{αβ...ζ}(t) (x_α − m_α(t))(x_β − m_β(t)) ··· (x_ζ − m_ζ(t))}   (1.5.41)
(the scalar function a(t) in (1.5.41) is determined by the normalization condition ∫ p(t, x) dx = 1). Using (1.5.41), we can readily obtain a system of equations for the parameters (m_α(t), a_αβ(t), a_αβγ(t), ...) instead of the symmetrized equation (1.5.39). To this end, we rewrite (1.5.39) in the more compact form

∂p/∂t = −∂/∂x_α (Ã_α p) + (1/2) ∂²/∂x_α∂x_β (B̃_αβ p) + [Φ(x, y) − E_ps{Φ(x, y)}] p,   (1.5.42)

where

Ã_α = A_α + B_ασ F_σρ (ẏ_ρ − A_ρ),  B̃_αβ = B_αβ − B_ασ F_σρ B_ρβ,  Φ(x, y) = A_σ F_σρ ẏ_ρ − (1/2) A_σ F_σρ A_ρ.
Next we replace the functions Ã_α and Φ(x, y) by their Taylor series,¹⁷ substitute (1.5.41) into (1.5.42), and successively set the coefficients of equal powers of (x_α − m_α), (x_α − m_α)(x_β − m_β), ... on the left- and right-hand sides of (1.5.42) equal to each other; thus we obtain the following system of ordinary differential equations for m_α(t), a_αβ(t), a_αβγ(t), ...:

ṁ_α = Ã_α + a⁻¹_{αβ} ∂Φ/∂x_β + ···,

ȧ_αβ = −(∂Ã_γ/∂x_α) a_γβ − a_αγ (∂Ã_γ/∂x_β) − a_αγ B̃_γδ a_δβ − ∂²Φ/∂x_α∂x_β + ···,   (1.5.43)

.......................................................

(the dots in (1.5.43) indicate the terms containing the higher coefficients a_αβγ, a_αβγδ, ..., as well as the equations for these coefficients themselves).
In (1.5.43) the dot over a variable indicates, as usual, the time derivative (ṁ_β = dm_β(t)/dt). Moreover, in (1.5.43) we assume that B_αβ is independent of x and omit the arguments of the functions Ã, Φ, and of their derivatives; the values of these functions are taken at the point x = m, that is, Ã_β = Ã_β(t, m, y), ∂Φ/∂x_α = ∂Φ(t, m, y)/∂x_α, etc. It follows from (1.5.41) that the set of parameters m_α(t), a_αβ(t), ... uniquely determines the a posteriori probability density p(t, x) at time t.

¹⁷ The functions Ã_α and Φ(x, y) are expanded with respect to x in a neighborhood of the point m(t).
Thus we can use these parameters as new arguments of the loss function, since F(t, y_t, p(t, x)) = F(t, y_t, m_{αt}, a_{αβt}, ...). However, in the general case, system (1.5.43) is of infinite order, and therefore, if we use the new sufficient coordinates (y_t, m_{αt}, a_{αβt}, ...) instead of the old coordinates (y_t, p(t, x)), then we do not gain a considerable advantage in solving special problems.

Nevertheless, there is an important class of problems in which the a posteriori probability density (1.5.41) is Gaussian (conditional Gaussian Markov processes are studied in detail in [131, 132]). We have such processes if [175] (1) the elements of the matrix B_αβ are constant numbers; (2) the functions Ã_α depend on x linearly; (3) the function Φ(x, y) depends on x linearly and quadratically; (4) the initial probability density (the a priori probability density of the unobservable components before the observation) p(0, x) is Gaussian. Under these conditions, we have a_αβγ = a_αβγδ = ··· = 0 in (1.5.41) and (1.5.43), and system (1.5.43) is closed and of finite dimension:

ṁ_α = Ã_α + a⁻¹_{αβ} ∂Φ/∂x_β,

ȧ_αβ = −(∂Ã_γ/∂x_α) a_γβ − a_αγ (∂Ã_γ/∂x_β) − a_αγ B̃_γδ a_δβ − ∂²Φ/∂x_α∂x_β,   (1.5.44)

α, β, γ, δ = 1, ..., m.
Now let us consider the synthesis problem corresponding to this case. To avoid cumbersome formulas, we deal with a simplified version of problem (1.5.3)–(1.5.6). Namely, we assume that the input y(t) is absent and the system shown in Fig. 3 does not contain Block 1. Suppose that the plant P is described by the system linear with respect to the output (controlled) variables

ẋ = G(t, u)x + b(t, u) + σ(t)ξ(t),   (1.5.45)

where x = x(t) is an m-vector of output variables, G(t, u) and σ(t) are given m × m matrices, b(t, u) is a given m-vector-function, and ξ(t) is an m-vector of random perturbations of the standard white noise type (1.1.34). More explicitly, the vector-matrix equation (1.5.45) has the form

ẋ_α = G_αβ(t, u)x_β + b_α(t, u) + σ_αβ(t)ξ_β(t),  α, β = 1, ..., m.

We observe the stochastic process

x̃(t) = P(t)x(t) + Q(t)η(t),   (1.5.46)

where x̃ and η are k-vectors, P and Q are k × m and k × k matrices, the matrix Q(t) is nondegenerate for all 0 ≤ t ≤ T, and η(t) is the standard white noise (1.1.34) independent of ξ(t).
Under the assumption that the admissible control satisfies condition (1.5.4), it is required to find the optimal control u_*(t) = φ(t, x̃₀ᵗ) such that the cost functional

I[u] = E{∫₀ᵀ c(x(t), u(t)) dt + ψ(x(T))}   (1.5.47)

attains its minimum value.

We write

y(t) = ∫₀ᵗ x̃(τ) dτ.   (1.5.48)
Then (x(t), y(t)) is a Markov stochastic process, and it follows from relations (1.5.45), (1.5.46), and (1.5.48) that the characteristics (1.5.14) of this process have the form

A_α = G_αβ(t, u)x_β + b_α(t, u),  B_αβ = σ_αγ(t)σ_βγ(t),  B_αρ = 0,

A_ρ = P_ρα(t)x_α,  B_ρσ = Q_ρτ(t)Q_στ(t).   (1.5.49)

In (1.5.49) the indices α, β, γ take values from 1 to m, and the indices ρ, σ, τ from m+1 to m+k.

In this case, it follows from (1.5.49) and (1.5.39) that in (1.5.42) we have

B̃_αβ = B_αβ,  Ã_α = A_α,  Φ(x, y) = A_σ F_σρ ẏ_ρ − (1/2) A_σ F_σρ A_ρ   (1.5.50)
(F_σρ is an element of the matrix ||B_σρ||⁻¹ = [Q(t)Qᵀ(t)]⁻¹). It follows from (1.5.49) and (1.5.50) that in this case system (1.5.43) takes the form (1.5.44). Substituting (1.5.49) and (1.5.50) into (1.5.44), we obtain the following system of equations for the parameters m_α(t), a_αβ(t), α, β = 1, ..., m, of the a posteriori density:

ṁ_α = G_αβ(t, u)m_β + b_α(t, u) + a⁻¹_{αβ} P_ρβ(t)F_ρσ(t)(x̃_σ − P_σγ(t)m_γ),

ȧ_αβ = −G_γα(t, u)a_γβ − a_αγ G_γβ(t, u) − a_αγ σ_γν(t)σ_δν(t) a_δβ + P_σα(t)F_σρ(t)P_ρβ(t).   (1.5.51)

System (1.5.51) can be written in a more compact vector-matrix form. Introducing the matrix A = ||a_αβ||₁ᵐ and taking into account the fact that ẏ_ρ = x̃_ρ according to (1.5.48), we see that (1.5.51) implies

A ṁ = A[G(t, u)m + b(t, u)] + Pᵀ(t)[Q(t)Qᵀ(t)]⁻¹(x̃ − P(t)m),

Ȧ = −A σ(t)σᵀ(t) A − Gᵀ(t, u)A − A G(t, u) + Pᵀ(t)[Q(t)Qᵀ(t)]⁻¹P(t).   (1.5.52)
Now we note that the right-hand sides of (1.5.52) do not explicitly depend on y(t), and moreover, the cost functional (1.5.47) is independent of the observable process x̃(t). Therefore, in this case, the current values of the vector y_t do not belong to the sufficient coordinates of problem (1.5.45)–(1.5.47), which are the current values of the components of the vector m_t and of the elements of the matrix A_t. If instead of the matrix A we consider the matrix D = A⁻¹ of a posteriori covariances, then, multiplying the first equation in (1.5.52) by the matrix D from the left and the second equation in (1.5.52) from the left and from the right and taking into account the formulas

DA = AD = E,  ḊA + DȦ = 0,  Ḋ = −DȦD

(E is the identity matrix), we obtain, instead of (1.5.52), the relations

ṁ = G(t, u)m + b(t, u) + D Pᵀ(t)[Q(t)Qᵀ(t)]⁻¹(x̃(t) − P(t)m),

Ḋ = σ(t)σᵀ(t) + G(t, u)D + D Gᵀ(t, u) − D Pᵀ(t)[Q(t)Qᵀ(t)]⁻¹P(t)D.   (1.5.53)

Equations (1.5.53) are the well-known equations of the Kalman filter [1, 5, 58, 79, 132]. As is known, the Kalman filter is a device for optimal filtering of the "useful signal" x(t) that is observed on the background of a random noise. In this case, the vector m(t) is an optimal¹⁸ estimate of the current values of the components of the unobservable stochastic process x(t) from the results of observation of x̃₀ᵗ = {x̃(s): 0 ≤ s ≤ t}, provided that the observation process is given by (1.5.46). The matrix D(t) that satisfies the second (matrix) equation in (1.5.53) characterizes the accuracy of the estimation of the unobservable components of the process x(t) by the vector m(t) (see [1, 5, 79]).

Equations (1.5.53) play the role of "equations of motion" for the controlled system in the space of sufficient coordinates. Since the process Q⁻¹(t)(x̃(t) − P(t)m) is a white noise, the first equation in (1.5.53) is a stochastic equation of type (1.5.45), and the second equation is a usual differential (matrix) equation. Therefore, the Bellman differential equation for the loss function F(t, m_t, D_t) can be derived by a technique similar to that used in §1.4 to derive Eq. (1.4.21) for the function (1.4.5).

¹⁸ The optimality of the estimate m(t) is understood in the sense of the minimum mean square deviation E|x(t) − m(t)|²; as is known [167, 175, 181], in the Gaussian case, m(t) coincides with the maximum point of the a posteriori probability density p(t, x) = p(x(t) = x | x̃₀ᵗ).
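The interplay of the two equations (1.5.53) is easy to observe numerically. The sketch below integrates a scalar special case (constant G = g, b = 0, P = 1, Q = q, σ = s; all numerical values are illustrative assumptions) by the Euler method, simulating the white noises as N(0,1)/√Δ increments:

```python
import numpy as np

# Euler discretization of the Kalman filter (1.5.53) for a scalar plant
# dx = g x dt + s dW (cf. (1.5.45)) observed as x~ = x + q * white noise
# (cf. (1.5.46) with P = 1, Q = q); all parameter values are illustrative.

rng = np.random.default_rng(0)
g, s, q = -1.0, 0.5, 0.3
dt, T = 1.0e-3, 5.0

x, m, D = 1.0, 0.0, 1.0          # true state, estimate m(t), variance D(t)
for _ in range(int(T / dt)):
    xi, eta = rng.normal(size=2)
    x += g * x * dt + s * np.sqrt(dt) * xi             # plant
    x_obs = x + q * eta / np.sqrt(dt)                  # observation
    m += g * m * dt + (D / q ** 2) * (x_obs - m) * dt  # first eq. in (1.5.53)
    D += (s ** 2 + 2 * g * D - D ** 2 / q ** 2) * dt   # second eq. in (1.5.53)

# stationary a posteriori variance: the positive root of
# s**2 + 2 g D - D**2 / q**2 = 0
D_inf = q ** 2 * (g + np.sqrt(g ** 2 + s ** 2 / q ** 2))
```

Note that the variance equation is deterministic and settles at D_inf regardless of the observed sample path; only the estimate m(t) depends on the observations.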
After similar calculations, we obtain the Bellman equation of the following form (see also [34, 175]) for the function F(t, m, D) in problem (1.5.45)–(1.5.47):

∂F/∂t + min_u { (mᵀGᵀ(t, u) + bᵀ(t, u)) ∂F/∂m + (1/2) Sp[D Pᵀ(t)[Q(t)Qᵀ(t)]⁻¹P(t)D ∂²F/∂m∂mᵀ] + Sp[∂F/∂D (σ(t)σᵀ(t) + D Gᵀ(t, u) + G(t, u)D − D Pᵀ(t)[Q(t)Qᵀ(t)]⁻¹P(t)D)] + c̄(m, D, u) } = 0,   (1.5.54)

where ∂F/∂m is an m-vector with the components ∂F/∂m_α, α = 1, ..., m; ∂²F/∂m∂mᵀ is the m × m matrix of the derivatives ∂²F/∂m_α∂m_β, α, β = 1, ..., m; ∂F/∂D is the m × m matrix of the partial derivatives ∂F/∂D_αβ, α, β = 1, ..., m; and c̄(m, D, u) denotes the a posteriori mean of the penalty function c(x, u) in the functional (1.5.47), that is,

c̄(m, D, u) = [(2π)ᵐ det D]^{−1/2} ∫ c(x, u) exp[−(1/2)(x − m)ᵀD⁻¹(x − m)] dx.   (1.5.55)

The loss function F(t, m, D) satisfies (1.5.54) for 0 ≤ t < T. At the terminal instant of time t = T, this function is determined by the relation

F(T, m, D) = E_ps{ψ(x)},   (1.5.56)

where, by analogy with (1.5.55), E_ps(·) denotes the integration of (·) with the Gaussian density. We see that (1.5.56) is a generalization of condition (1.4.22) to the case of indirect observations. As usual, by solving Eq. (1.5.54) with the additional condition (1.5.56), we simultaneously obtain the optimal control u_*(t) = φ₁(t, m(t), D(t)) (see §1.3 and §1.4). Thus the desired algorithm of optimal control in the functional form u_*(t) = φ(t, x̃₀ᵗ) for problem (1.5.45)–(1.5.47) is the superposition of two operations: the optimal filtering of the observed process (x̃(t): 0 ≤ t ≤ T) by means of the Kalman filter (1.5.53) and the formation of the current control u_*(t) = φ₁(t, m(t), D(t)).

This situation is typical of other problems with indirect observations. Therefore, in the general case of the servomechanism shown in Fig. 3, the
controller C actually consists of two blocks that are functionally different (see Fig. 12): the sufficient coordinate block SC that models the corresponding filter and the decision block D whose structure is determined by the solution of the Bellman equation.

FIG. 12
Some examples of other Bellman equations obtained by using sufficient coordinates, as well as solutions of these equations, will be considered later
in §3.3, §4.2, §5.4, and §6.1.
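For a quadratic penalty the Gaussian averaging in (1.5.55) can be carried out in closed form; in the scalar case, c(x) = cx² gives c̄ = c(m² + D). The following sketch checks this identity by Gauss–Hermite quadrature (the numerical values are arbitrary test data):

```python
import numpy as np

# A posteriori mean (1.5.55) of a scalar quadratic penalty c(x) = c x**2
# with respect to the Gaussian density with mean m and variance D,
# computed by Gauss-Hermite quadrature; test values are arbitrary.

def posterior_mean_cost(c, m, D, order=40):
    s, w = np.polynomial.hermite.hermgauss(order)   # nodes/weights for e^{-s^2}
    x = m + np.sqrt(2.0 * D) * s                    # change of variables
    return (w * c * x ** 2).sum() / np.sqrt(np.pi)

c, m, D = 2.0, 0.7, 0.25
c_bar = posterior_mean_cost(c, m, D)                # equals c * (m**2 + D)
```

The quadrature is exact here because the integrand is a polynomial of low degree; for non-quadratic penalties the same routine gives an accurate numerical value of c̄(m, D, u).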
CHAPTER II
EXACT METHODS FOR SYNTHESIS PROBLEMS
Exact solutions to synthesis problems of optimal control are of deep theoretical and practical interest. However, exact solutions can be obtained only in some special cases. The point is that exact methods require rather strict restrictions on the statement of the synthesis problem, and these restrictions are seldom satisfied in actual practice. It is well known that, for instance, the Bellman equation can be solved exactly under the following assumptions: (1) the dynamic equations of the plant are linear; (2) the optimality criterion of the form (1.1.11) or (1.4.3) contains only quadratic penalty functions; (3) no restrictions are imposed on the control and on the phase coordinates; (4) random actions (if any) on the system are Gaussian Markov processes or processes of the white noise type. The synthesis problems satisfying (1)–(4) are called linear-quadratic problems of optimal control. An extensive literature is devoted to these problems [3, 5, 18, 24, 72, 112, 122, 128, 132, 168]. In the present chapter we restrict our consideration to an outline of methods for solving such problems (§2.1) and consider in more detail less known results concerning the solution of some special synthesis problems with bounded controls (§§2.2–2.4).
§2.1. Linear-quadratic problems of optimal control (LQ-problems)
2.1.1. First, let us consider an elementary optimal stabilization problem of a first-order system perturbed by a Gaussian white noise (see Fig. 13). Suppose that the plant P is described by a linear scalar equation of the form
ẋ = ax + bu + √ν ξ(t),   (2.1.1)
where a, b, and ν are given constants (ν > 0) and ξ(t) is the standard white noise (1.1.31). The performance of this system is estimated by the following functional of the form (1.4.3) with quadratic penalty functions:
I[u] = E{∫₀ᵀ [c x²(t) + h u²(t)] dt + c₁ x²(T)}   (2.1.2)
FIG. 13
(here c, c₁, and h are given positive constants). We do not impose any restrictions on the control u and the phase variable x.

Problem (2.1.1), (2.1.2) is a stochastic generalization of the linear-quadratic problem (1.3.24), (1.3.25) considered in §1.3 and a special case of the more general problem (1.4.2)–(1.4.4). Since the stabilization system shown in Fig. 13 is a specific case of the servomechanism shown in Fig. 8, the Bellman equation for problem (2.1.1), (2.1.2),
∂F/∂t + ax ∂F/∂x + (ν/2) ∂²F/∂x² + cx² + min_u [bu ∂F/∂x + hu²] = 0,   (2.1.3)

can be obtained from (1.4.21) by setting

A_y = B_y = 0,  B_x = ν,  A_x = ax + bu,  c(x, y, u) = cx² + hu².
In (2.1.3) the loss function F = F(t, x) is determined, as usual, by

F(t, x) = min_u E{∫_t^T [c x²(τ) + h u²(τ)] dτ + c₁ x²(T) | x(t) = x};   (2.1.4)

it satisfies Eq. (2.1.3) in the strip Π_T = {0 ≤ t ≤ T, −∞ < x < ∞} and becomes a given quadratic function,

F(T, x) = c₁ x²,   (2.1.5)
for t = T. Condition (2.1.5) readily follows from the definition of the loss function (2.1.4) or from formula (1.4.22) with ψ(x, y) = c₁x². The optimal control u_* in the form (1.4.25), which minimizes the expression in the square brackets in (2.1.3), is determined by the condition ∂[·]/∂u = 0 as follows:

u_*(t, x) = −(b/2h) ∂F(t, x)/∂x.   (2.1.6)
Substituting the control u_* instead of u into the expression in the square brackets in (2.1.3) and omitting the symbol "min", we rewrite Eq. (2.1.3) in the form

∂F/∂t + ax ∂F/∂x + (ν/2) ∂²F/∂x² + cx² − (b²/4h)(∂F/∂x)² = 0   (2.1.7)

(Eq. (2.1.7) is just Eq. (1.4.26) for problem (2.1.1), (2.1.2)). Now, to solve the synthesis problem, it remains to find the solution F(t, x) that satisfies Eq. (2.1.7) in the strip Π_T and is a continuous continuation of (2.1.5) as t → T. We shall seek such a solution in the form
F(t, x) = p(t)x² + r(t),
(2.1.8)
where p(t) and r(t) are some functions of time. We choose these functions so that the solution of the form (2.1.8) satisfies (2.1.5) and (2.1.7). Substituting (2.1.8) into (2.1.7) and setting the coefficient of x², as well as the terms independent of x, equal to zero, we obtain the following equations for the unknown functions p(t) and r(t):

ṗ = −c − 2ap + (b²/h)p²,   (2.1.9)

ṙ = −νp.   (2.1.10)
It follows from (2.1.5) that the solutions p(t) and r(t) of (2.1.9) and (2.1.10) attain the values

p(T) = c₁,  r(T) = 0   (2.1.11)

at the terminal time t = T. The system of ordinary differential equations (2.1.9), (2.1.10) with the additional conditions (2.1.11) can readily be integrated. As a result, we obtain the following expressions for the functions p(t) and r(t):

p(t) = [D₁ − D₂ e^{−2β(T−t)}] / [D₃ + D₄ e^{−2β(T−t)}],   (2.1.12)

r(t) = ν (D₁/D₃)(T − t) + ν [(D₁D₄ + D₂D₃)/(2βD₃D₄)] ln [(D₃ + D₄ e^{−2β(T−t)}) / (D₃ + D₄)],   (2.1.13)

where the constants β, D₁, D₂, D₃, and D₄ are related to the parameters of problem (2.1.1), (2.1.2) as follows:

β = √(a² + cb²/h),  D₁ = c + c₁(a + β),  D₂ = c + c₁(a − β),
D₃ = β − a + (b²/h)c₁,  D₄ = β + a − (b²/h)c₁.
From (2.1.6), (2.1.8), and (2.1.12), we obtain the optimal control law

u_*(t, x) = −(b/h) p(t) x,   (2.1.14)

which is the solution of the synthesis problem for the optimal stabilization system in Fig. 13. It follows from (2.1.14) that in this case the controller C in Fig. 13 is a linear amplifier in the variable x with the variable amplification factor −(b/h)p(t). In the sequel, we indicate such amplifiers by a special mark ">." Therefore, the optimal system for problem (2.1.1), (2.1.2) can be represented as the block diagram shown in Fig. 14.
FIG. 14

Obviously, the minimum value I[u_*] of the optimality criterion (2.1.2) with the control (2.1.14) and the initial state x(0) = x is equal to F(0, x). From (2.1.8), (2.1.12), and (2.1.13), we have

I[u_*] = [(D₁ − D₂ e^{−2βT}) / (D₃ + D₄ e^{−2βT})] x² + ν (D₁/D₃) T + ν [(D₁D₄ + D₂D₃)/(2βD₃D₄)] ln [(D₃ + D₄ e^{−2βT}) / (D₃ + D₄)].   (2.1.15)
To complete the study of problem (2.1.1), (2.1.2), it remains to prove that the solution (2.1.12)–(2.1.15) of the synthesis problem is unique. It follows from our discussion that the problem of uniqueness of (2.1.12)–(2.1.15) is equivalent to the uniqueness of the solution (2.1.8) of Eq. (2.1.7). The general theory of quasilinear parabolic equations [124] implies that Eq. (2.1.7) with the additional condition (2.1.5) has a unique solution in the class of functions F(t, x) whose growth as |x| → ∞ does not exceed that of a finite power of |x|. On the other hand, an analysis of the properties of the loss function (2.1.4) performed in [113] showed that, for each t ∈ [0, T] and x ∈ R¹, the function (2.1.4) satisfies an estimate of the form

0 ≤ F(t, x) ≤ N(T)(1 + x²),

where N(T) is bounded for any finite T. Therefore, the function (2.1.8) is the unique solution of Eq. (2.1.7) corresponding to the problem considered, and the synthesis problem has no solutions other than (2.1.12)–(2.1.15).
REMARK. The optimal control (2.1.14) is independent of the parameter ν, that is, of the intensity of the random actions on the plant P, and coincides with the optimal control algorithm (1.3.33), (1.3.34) for the deterministic problem (1.3.24), (1.3.25). Such a situation is typical of many other linear-quadratic problems of optimal control with perturbations in the form of a Gaussian white noise.

The exact formulas (2.1.12)–(2.1.15) allow us to examine the process of relaxation to the stationary operating conditions (see §1.4, Section 1.4.2) for the stabilization system in question. To this end, let us consider a special case of problem (2.1.1) in which the terminal state x(T) is not penalized (c₁ = 0). In this case, formulas (2.1.12) and (2.1.13) read

p(t) = c[1 − e^{−2β(T−t)}] / [β − a + (β + a)e^{−2β(T−t)}],   (2.1.16)

r(t) = [νc/(β − a)](T − t) + (νh/b²) ln{[β − a + (β + a)e^{−2β(T−t)}] / 2β}.   (2.1.17)

If the operating time is equal to T > t₁ = 3/2β, then the functions p(t) and r(t) determined by (2.1.16) and (2.1.17) have the form shown in Fig. 15.
FIG. 15
The functions p(t) and r(t) are characterized by the existence of two time intervals, [0, T − t₁] and [T − t₁, T], on which they behave in qualitatively different ways. The first interval [0, T − t₁] corresponds to the stationary operating mode: p(t) ≈ c/(β − a) = const for t ∈ [0, T − t₁], the function r(t) decreases linearly as t grows, and on this interval the rate of decrease in r(t) is constant and equal to νc/(β − a). The terminal interval [T − t₁, T] is essentially nonstationary. It follows from (2.1.16) and (2.1.17) that the length of this nonstationary interval is of the order of 3/2β. Obviously, in the case where this nonstationary interval is a small part of the entire operating time [0, T], the control performance is little affected if, instead of the exact optimal control (2.1.14), we use the control
u_*(x) = −[bc / h(β − a)] x   (2.1.18)
that corresponds to the stationary operating mode. It follows from (2.1.18) that for large T the controller C in Fig. 13 is a linear amplifier with a constant amplification factor, whose technical realization is much simpler than that of the nonstationary control block described by (2.1.14) and (2.1.12). Formulas (2.1.16) and (2.1.17) show that, for large values of T − t, the loss function (2.1.8) satisfies the approximate relation

F(t, x) ≈ [c/(β − a)] x² + [νc/(β − a)](T − t).   (2.1.19)

Comparing (2.1.19) and (1.4.29), we see that in this case the value γ of the stationary mean losses per unit time, introduced in §1.4, is equal to

γ = νc/(β − a),   (2.1.20)

that is, γ coincides with the rate of decrease in the function r(t) on the stationary interval [0, T − t₁].
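The stationary values (2.1.19), (2.1.20) can be verified independently of the time-dependent formulas: substituting f(x) = px² into the stationary Bellman equation (1.4.30) gives the quadratic equation (b²/h)p² − 2ap − c = 0, whose positive root coincides with c/(β − a), and γ = νp. A numerical check (parameter values are illustrative):

```python
import numpy as np

# Stationary regime of problem (2.1.1), (2.1.2): with f(x) = p x**2 the
# stationary Bellman equation yields (b**2/h) p**2 - 2 a p - c = 0 and
# gamma = nu * p; the positive root reproduces p = c/(beta - a) and
# gamma = nu c/(beta - a).  Parameter values are illustrative.

a, b, c, h, nu = 0.5, 1.0, 1.0, 1.0, 2.0
beta = np.sqrt(a ** 2 + c * b ** 2 / h)
# positive root of the quadratic equation for p
p = (2.0 * a + np.sqrt(4.0 * a ** 2 + 4.0 * c * b ** 2 / h)) / (2.0 * b ** 2 / h)
gamma = nu * p
```

The identity p = c/(β − a) follows from β² − a² = cb²/h, so both expressions for the stationary gain are the same number.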
It should be noted that to calculate 7 and the function /(a;), we need not have exact formulas for p(t) and r(i) in (2.1.8). It suffices to use the corresponding stationary Bellman equation (1.4.30), which in this cases has the form
df and to substitute the desired solution in the form /(x) = px2 into (2.1.22). We obtain the numbers p and 7, just as in the nonstationary case, by setting
Exact Methods for Synthesis Problems
99
the coefficients of x2 and the free terms on the left- and right-hand sides in (2.1.22) equal to each other. We also note that if at least one of the parameters a, 6, i/, c, and h of problem (2.1.1), (2.1.2) depends on time, then, in general, there does not exist any stationary operating mode. In this case, one cannot obtain finite
formulas for the functions p(t) and r(t) in (2.1.8), since Eq. (2.1.9) is a Riccati equation and,
in general, cannot be integrated exactly. Therefore,
if the problem has variable parameters, the solution is constructed, as a rule, by using numerical integration methods. 2.1.2. All of the preceding can readily be generalized to multidimensional problems of optimal stabilization. Let us consider the system shown in Fig. 13 whose plant P is described by a linear vector-matrix equation of the form
\dot x = A(t)x + B(t)u + \sigma(t)\xi(t),
(2.1.23)
where x = x(t) ∈ R^n is an n-vector-column of phase variables, u ∈ R^r is an r-vector of controlling actions, and ξ(t) ∈ R^m is an m-vector of random perturbations of a Gaussian white noise type with characteristics (1.1.34). The dimensions of the matrices A, B, and σ are related to the dimensions of the corresponding vectors and are equal to n × n, n × r, and n × m, respectively. The elements of these matrices are continuous functions of time^1 defined for all t from the interval [0, T] on which the controlled system is considered. For the optimality criterion, we take a quadratic functional of the form
I[u] = \mathbf{E}\Big\{\int_0^T \big[x^T(t)G(t)x(t) + u^T(t)H(t)u(t)\big]\,dt + x^T(T)Qx(T)\Big\}.
(2.1.24)
Here Q and G(t) are symmetric nonnegative definite n × n matrices, and the symmetric r × r matrix H(t) is positive definite for each t ∈ [0, T]. Just as (2.1.3), the Bellman equation for problem (2.1.23), (2.1.24) follows from (1.4.21) if we set A^y = B^y = 0, B^x = σ(t)σ^T(t), A^x = A(t)x + B(t)u, and c(x, y, u) = x^T G x + u^T H u. Thus we obtain
\frac{\partial F}{\partial t} + x^T A^T(t)\frac{\partial F}{\partial x} + \frac12\,\mathrm{Sp}\Big[\sigma(t)\sigma^T(t)\frac{\partial^2 F}{\partial x^2}\Big] + x^T G(t)x + \min_u\Big[u^T B^T(t)\frac{\partial F}{\partial x} + u^T H(t)u\Big] = 0.
(2.1.25)
1 As was shown in [156], it suffices to assume that the elements of the matrices A(t), B(t), and σ(t) are measurable and bounded.
100
Chapter II
In this case, the additional condition on the loss function (1.4.22) has the form F(T, x) = x^T Q x.
The further considerations leading to the solution of the synthesis problem
are similar to those in the one-dimensional case. Calculating the minimum value of the expression in the square brackets in (2.1.25), we obtain the optimal control
u_* = -\frac12 H^{-1}(t)B^T(t)\frac{\partial F}{\partial x},
(2.1.26)
which is a vector analog of formula (2.1.6). Substituting the expression obtained for u_* into (2.1.25), we arrive at the equation
\frac{\partial F}{\partial t} + x^T A^T(t)\frac{\partial F}{\partial x} + \frac12\,\mathrm{Sp}\Big[\sigma(t)\sigma^T(t)\frac{\partial^2 F}{\partial x^2}\Big] + x^T G(t)x - \frac14\Big(\frac{\partial F}{\partial x}\Big)^T B(t)H^{-1}(t)B^T(t)\frac{\partial F}{\partial x} = 0.
(2.1.27)
We seek the solution of (2.1.27) as the following quadratic form with respect to the phase variables:
F(t, x) = x^T P(t)x + r(t).
(2.1.28)
Substituting (2.1.28) into (2.1.27) and setting the coefficients of the quadratic (with respect to x) terms and the free terms on the left-hand side in (2.1.27) equal to zero, we obtain the following system of differential equations for the unknown matrix P(t) and the scalar function r(t):
\dot P + A^T(t)P + PA(t) + G(t) - PB(t)H^{-1}(t)B^T(t)P = 0, \qquad P(T) = Q;
\dot r + \mathrm{Sp}\,[P\sigma(t)\sigma^T(t)] = 0, \qquad r(T) = 0.
(2.1.29)
If system (2.1.29) is solved, then the optimal solution of the synthesis problem has the form
u_*(t, x) = -H^{-1}(t)B^T(t)P(t)x,
(2.1.30)
which follows from (2.1.26) and (2.1.28). Formula (2.1.30) shows that the controller C in the optimal system in Fig. 13 is a linear amplifier with n inputs and r outputs and variable amplification factors. Let us briefly discuss the possibilities of solving system (2.1.29). The existence and uniqueness of the nonnegative definite matrix P(t) satisfying the matrix-valued Riccati equation (2.1.29) are proved in [72] under the above assumptions on the properties of the matrices A(t), B(t), G(t), H(t),
and Q. One can obtain explicit formulas for the elements of the matrix P(t) only by numerical methods,^2 which is a rather complicated problem for large dimensions of the phase vector x. In the special case of the zero matrix G(t) ≡ 0, the solution of the matrix equation (2.1.29) has the form [1, 132]
P(t) = X^T(T, t)\Big[E + Q\int_t^T X(T, s)B(s)H^{-1}(s)B^T(s)X^T(T, s)\,ds\Big]^{-1} Q\,X(T, t).
(2.1.31)
Here X(t, s), t ≥ s, denotes the fundamental matrix of system (2.1.23); sometimes this matrix is also called the Cauchy matrix. The properties of the fundamental matrix are described by the relations
X(t, t) = E, \qquad \frac{\partial X(t, s)}{\partial t} = A(t)X(t, s), \qquad \frac{\partial X(t, s)}{\partial s} = -X(t, s)A(s).
(2.1.32)
One can construct the matrix X(t, s) if the so-called integral matrix Z(t) of system (2.1.23) is known. According to [111], a square n × n matrix Z(t) is called the integral matrix of system (2.1.23) if its columns consist of any n linearly independent solutions of the homogeneous system \dot x = A(t)x. If the matrix Z(t) is known, then the fundamental matrix X(t, s) has the form
X(t, s) = Z(t)Z^{-1}(s).
(2.1.33)
One can readily see that the matrix (2.1.33) satisfies conditions (2.1.32). The fundamental matrix can readily be calculated in explicit form if the elements of the matrix A(t) in (2.1.23) are time-independent, that is, if A(t) = A = const. In this case, we have
X(t, s) = e^{A(t-s)},
and the exponential matrix can be expressed in the standard way [62] either via the Lagrange–Sylvester interpolation polynomial (in the case of simple eigenvalues of the matrix A) or via the generalized interpolation polynomial (in the case of multiple eigenvalues and nonsimple elementary divisors of the matrix A). If the matrix A is time-varying, the construction of the fundamental matrix (2.1.33) becomes more complicated and requires, as a rule, the use of numerical integration methods.
2 There also exist approximate analytic methods for calculating the matrices P(t) [1, 72]. However, for matrices P(t) of larger dimensions, these methods meet serious computational difficulties.
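For constant A, the relations (2.1.32)–(2.1.33) are easy to probe numerically with the matrix exponential; a sketch for an arbitrary illustrative 2 × 2 matrix A:

```python
import numpy as np
from scipy.linalg import expm

# For constant A the fundamental (Cauchy) matrix is X(t, s) = exp[A(t - s)].
# We check properties (2.1.32) and the factorization (2.1.33) numerically.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # hypothetical example matrix

def X(t, s):
    """Fundamental matrix of x' = A x for constant A."""
    return expm(A * (t - s))

t0, s0, h = 1.3, 0.4, 1e-6
identity_check = X(t0, t0)                    # should equal E
dX_dt = (X(t0 + h, s0) - X(t0, s0)) / h       # should equal A X(t0, s0)
dX_ds = (X(t0, s0 + h) - X(t0, s0)) / h       # should equal -X(t0, s0) A
Z = lambda t: expm(A * t)                     # one choice of integral matrix
factorized = Z(t0) @ np.linalg.inv(Z(s0))     # should equal X(t0, s0), cf. (2.1.33)
```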
2.1.3. The results obtained by solving the basic linear-quadratic problem (2.1.23), (2.1.24) can readily be generalized to more general statements
of the optimal control problem. Here we only list the basic lines of these generalizations; for a detailed discussion of this subject see [1, 5, 34, 58, 72,
122, 132]. First of all, note that the synthesis problem (2.1.23), (2.1.24) admits an exact solution even if there are noises in the feedback circuit, that is, if, instead of exact values of the phase variables x(t), the controller C (see Fig. 13) receives distorted information of the form
\widetilde x(t) = N(t)x(t) + \eta(t),
(2.1.34)
where N(t) is a given matrix and η(t) is a random noise, which is either a process of the white noise type (1.1.34) or a Gaussian Markov process. In this case, the optimal control algorithm coincides with (2.1.30) in which, instead of the true values of the current phase vector x = x(t), we use the vector of current estimates m = m(t) of the phase vector. These estimates are formed with the help of Eqs. (1.5.53) for the Kalman filter, which with regard to the notation in (2.1.23) and (2.1.34) have the form^3
\dot m = [A(t) - B(t)H^{-1}(t)B^T(t)P(t)]m + DN^T(t)V^{-1}(t)\big(\widetilde x(t) - N(t)m\big),
(2.1.35)
\dot D = A(t)D + DA^T(t) + \sigma(t)\sigma^T(t) - DN^T(t)V^{-1}(t)N(t)D,
(2.1.36)
where V(t) denotes the intensity matrix of the noise η(t) and D = D(t) is the covariance matrix of the estimation error. The fact that the control (2.1.30) remains optimal after x is replaced by the estimate m is the content of the well-known separation theorem [58, 193]. The next generalization of the linear-quadratic problem (2.1.23), (2.1.24) is related to a more general model of the plant. Suppose that, in addition to additive noises ζ(t), the plant P is subject to perturbations depending on the state x and control u and to pulsed random actions with Poisson distribution of the pulse moments. It is assumed that the behavior of the plant P is described by the special equation
\dot x = A(t)x + B(t)u + \sigma_1 x\,\xi_1(t) + \sigma_2 u\,\xi_2(t) + \zeta(t) + \sigma_3\dot\theta(t),
(2.1.37)
3 Equations (2.1.35) and (2.1.36) correspond to the case in which η(t) in (2.1.34) is a white noise.
FIG. 16
where ξ_1(t) and ξ_2(t) are scalar Gaussian white noises (1.1.31), θ(t) is an ℓ-vector of independent Poisson processes with intensity coefficients λ_i (i = 1, ..., ℓ), σ_1, σ_2, and σ_3 are given n × n, n × r, and n × ℓ matrices, and
the other variables have the same meaning as in (2.1.23). For the exact solution of problem (2.1.37), (2.1.24), see [34]. We also note that sufficiently effective methods have been developed for infinite-dimensional linear-quadratic problems of optimal control, where the plant P is either a linear dynamic system with distributed parameters or a quantum-mechanical system. Results concerning control of distributed parameter systems can be found in [118, 130, 164, 182], and concerning control of quantum systems in [12, 13]. All linear-quadratic problems of optimal control, as well as the above-treated examples, are characterized by the fact that the loss function satisfying the Bellman equation is of quadratic form (a quadratic functional) and the optimal control law is a linear function (a linear operator) with respect to the phase variables (the state function). To solve the Bellman equation becomes much more difficult if it is necessary to take into account some restrictions on the domain of admissible control values in the design of an optimal system. In this case, exact analytical results can be obtained, as a rule, for one-dimensional synthesis problems (or for problems reducible to one-dimensional problems). Some such problems are considered in the following sections of this chapter.

§2.2. Problem of optimal tracking a wandering coordinate
Let the input (command) signal y(t) in the servomechanism shown in Fig. 2 be a scalar Markov process with known characteristics, and let the plant P be a servomotor whose speed is bounded and whose behavior is
described by the scalar deterministic equation
\dot x = u, \qquad |u(t)| \le u_m
(2.2.1)
(here u_m determines the admissible range of the motor speed, -u_m \le \dot x \le u_m). Equation (2.2.1) adequately describes the dynamics of a constant
current motor controlled by the voltage on the motor armature under the assumption that the moment of inertia and the inductance of the armature
winding are small [2, 50]. We shall show that various synthesis problems stated in §1.4 can be solved for such servomechanisms. 2.2.1. Let y(t) be a diffusion Markov process with constant drift coefficient a and
constant diffusion coefficient B. We need to calculate the controller C (see Fig. 2) that minimizes the integral optimality criterion
I[u] = \mathbf{E}\int_0^T c\big(x(t), y(t)\big)\,dt,
(2.2.2)
where c(x, y) is a given penalty function.
By setting A^y = a, B^y = B, A^x = u, and B^x = 0 in (1.4.21), we readily obtain the following Bellman equation for problem (2.2.1), (2.2.2):
\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial y} + \frac{B}{2}\frac{\partial^2 F}{\partial y^2} + c(x, y) + \min_{|u|\le u_m}\Big[u\frac{\partial F}{\partial x}\Big] = 0.
(2.2.3)
We shall consider the penalty functions c(x, y) depending only on the error signal, that is, on the difference z = y — x between the command input y and the controlled variable x. Obviously, in this case, the loss function F(t,x,y) = F(t,y—x) = F(t,z) in (2.2.3) also depends only on z. Instead of (2.2.3), we have
\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial z} + \frac{B}{2}\frac{\partial^2 F}{\partial z^2} + c(z) + \min_{|u|\le u_m}\Big[-u\frac{\partial F}{\partial z}\Big] = 0.
(2.2.4)
The minimum value of the function in the square brackets in (2.2.4) is attained by the control^4
u_* = u_m\,\mathrm{sign}\Big(\frac{\partial F(t, z)}{\partial z}\Big),
(2.2.5)
4 In (2.2.5), sign x denotes the function equal to +1 for x > 0 and to -1 for x < 0.
which requires the servomotor speed to be switched instantly from one admissible limit value to the opposite one when the derivative ∂F(t, z)/∂z of the loss function changes its sign. Control of the form (2.2.5) is naturally called control of relay type (sometimes this control is called "bang-bang" control).
Substituting (2.2.5), instead of u, into (2.2.4) and omitting the symbol "min", we reduce Eq. (2.2.4) to the form
\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial z} + \frac{B}{2}\frac{\partial^2 F}{\partial z^2} + c(z) - u_m\Big|\frac{\partial F}{\partial z}\Big| = 0.
(2.2.6)
In [113, 124] it was shown that in the strip Π_T = {0 ≤ t ≤ T, -∞ < z < ∞} Eq. (2.2.6) has a unique solution F(t, z) satisfying the additional condition F(T, z) = 0 if the penalty function c(z) is continuous and does not grow too rapidly as |z| → ∞.^5 In this case, F(t, z) is a function twice continuously differentiable with respect to z and once with respect to t. In particular, since ∂F/∂z is continuous, the condition
\frac{\partial F}{\partial z}(t, z) = 0
(2.2.7)
must be satisfied at the moment of switching the controlling action. If c(z) ≥ 0 attains its single minimum at the point z = 0 and does not decrease as |z| → ∞, then Eq. (2.2.7) has a single root z^0(t) for each t. This root determines the switch point of the control. On different sides of the switch point the derivative ∂F/∂z has opposite signs. If ∂F/∂z > 0 for z > z^0(t) and ∂F/∂z < 0 for z < z^0(t), then we can write the optimal control (2.2.5) in the form
u_*(t, z) = u_m\,\mathrm{sign}\,\big(z - z^0(t)\big).
(2.2.8)
Thus, the synthesis problem is reduced to finding the switch point z^0(t). To this end, we need to solve Eq. (2.2.6).
Equation (2.2.6) has an exact solution if we consider stationary tracking. In this case, the terminal time (the upper limit of integration in (2.2.2)) T → ∞, and Eq. (2.2.6) for the time-invariant loss function (see (1.4.29))
F(t, z) \approx f(z) + \gamma(T - t)
(2.2.9)
becomes the ordinary differential equation
\frac{B}{2}\frac{d^2 f}{dz^2} + a\frac{df}{dz} - u_m\Big|\frac{df}{dz}\Big| + c(z) = \gamma,
(2.2.10)
5 More precisely, the condition is that there exist positive constants A_1, A_2, and a such that 0 ≤ c(z) ≤ A_1 + A_2|z|^a for all z; this implies the constraint on the growth of the function c(z).
which can be solved by the matching method [113, 171, 172]. Let us show how to do this. Obviously, the nonlinear equation (2.2.10) is equivalent to the two linear equations
\frac{B}{2}\frac{d^2 f_1}{dz^2} + (a - u_m)\frac{df_1}{dz} + c(z) = \gamma, \qquad z > z^0,
\frac{B}{2}\frac{d^2 f_2}{dz^2} + (a + u_m)\frac{df_2}{dz} + c(z) = \gamma, \qquad z < z^0,
(2.2.11)
for the functions f_1(z) and f_2(z) that determine the function f(z) on each side of the switch point z^0. The unique solutions of the linear equations (2.2.11) are determined by the behavior of f_1 and f_2 as |z| → ∞. It follows from the statement of the problem that if we take into account the diffusion "divergence" of the trajectories z(t) for large |z|, then we only obtain small corrections to the value of the optimality criterion, and, in the limit as |z| → ∞, the loss functions f_1(z) and f_2(z) must behave just as the solutions of Eqs. (2.2.11) with B = 0. The corresponding solutions of Eqs. (2.2.11) have the form
\frac{df_1}{dz}(z) = \frac{2}{B}\int_z^{\infty}\big[c(\tilde z) - \gamma\big]\exp\Big[-\frac{2(u_m - a)}{B}(\tilde z - z)\Big]\,d\tilde z,
\frac{df_2}{dz}(z) = -\frac{2}{B}\int_{-\infty}^{z}\big[c(\tilde z) - \gamma\big]\exp\Big[-\frac{2(u_m + a)}{B}(z - \tilde z)\Big]\,d\tilde z.
(2.2.12)
According to (2.2.7), we have the following relations at the switch point z^0:
\frac{df_1}{dz}(z^0) = \frac{df_2}{dz}(z^0) = 0.
(2.2.13)
Substituting (2.2.12) into (2.2.13), considering (2.2.13) as a system of equations with respect to the two unknown variables z^0 and γ, and performing some simple transformations, we obtain the equation for the switch point
\int_0^{\infty}\Big[c\Big(z^0 + \frac{Bv}{2(u_m - a)}\Big) - c\Big(z^0 - \frac{Bv}{2(u_m + a)}\Big)\Big]e^{-v}\,dv = 0
(2.2.14)
and the expression for the stationary tracking error
\gamma = \int_0^{\infty} c\Big(z^0 + \frac{Bv}{2(u_m - a)}\Big)e^{-v}\,dv = \int_0^{\infty} c\Big(z^0 - \frac{Bv}{2(u_m + a)}\Big)e^{-v}\,dv.
(2.2.15)
To obtain explicit formulas for the switch points and stationary errors, it is necessary to choose some special penalty functions c(z). For example, for the quadratic penalty function c(z) = z^2, from (2.2.14), (2.2.15) we have
z^0 = -\frac{aB}{u_m^2 - a^2},
(2.2.16)
\gamma = \frac{B^2}{4}\Big[\frac{1}{(u_m - a)^2} + \frac{1}{(u_m + a)^2}\Big].
(2.2.17)
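Relations (2.2.14) and (2.2.15) are easy to verify numerically: for c(z) = z² a quadrature combined with a scalar root finder should reproduce the closed-form values (2.2.16), (2.2.17). A sketch with illustrative parameters a, B, u_m:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# For c(z) = z^2 the root of (2.2.14) should equal z0 = -aB/(um^2 - a^2)
# and (2.2.15) should give gamma = (B^2/4)[(um-a)^-2 + (um+a)^-2].
a, B, um = 0.3, 0.5, 1.0        # illustrative values, um > a
c = lambda z: z * z
alpha = B / (2.0 * (um - a))    # scale in the right-hand kernel of (2.2.14)
beta = B / (2.0 * (um + a))     # scale in the left-hand kernel of (2.2.14)

def switch_condition(z0):
    # left-hand side of (2.2.14)
    f = lambda v: (c(z0 + alpha * v) - c(z0 - beta * v)) * np.exp(-v)
    return quad(f, 0.0, np.inf)[0]

z0 = brentq(switch_condition, -5.0, 5.0)
gamma = quad(lambda v: c(z0 + alpha * v) * np.exp(-v), 0.0, np.inf)[0]

z0_closed = -a * B / (um**2 - a**2)                       # formula (2.2.16)
gamma_closed = (B**2 / 4.0) * ((um - a)**-2 + (um + a)**-2)  # formula (2.2.17)
```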
If c(z) = |z|, then we have
z^0 = -\frac{B}{2(u_m - a)}\ln\Big(1 + \frac{a}{u_m}\Big),
(2.2.18)
\gamma = \frac{B}{2(u_m - a)}\ln\Big(1 + \frac{a}{u_m}\Big) + \frac{B}{2(u_m + a)}.
(2.2.19)
It should be noted that formulas (2.2.16)-(2.2.19) make sense only under the condition u_m > a. This is due to the fact that the stationary operating mode in the problem considered may exist only for u_m > a. Otherwise (for a > u_m), the mean rate of increase in the command signal y(t) is larger
than the maximum admissible rate of change in the output variable x(t), and the error signal z(t) = y(t) - x(t) grows infinitely in time. If the switch point z^0 is found, then we know how to control the servomotor P under the stationary operating conditions. In this case, according to (2.2.8), the optimal control has the form
u_*\big(z(t)\big) = u_m\,\mathrm{sign}\,\big(z(t) - z^0\big),
(2.2.20)
and hence, the block diagram of the optimal servomechanism has the form shown in Fig. 17. The optimal system shown in Fig. 17 differs from the optimal systems considered in the preceding section by the presence of an essentially nonlinear ideal-relay-type element in the feedback circuit. The other distinction between the system in Fig. 17 and the optimal linear systems considered in §2.1 is that the control method depends on the diffusion coefficient B of the input stochastic process (in §2.1, the optimal control is independent of the diffusion coefficients,^6 and therefore, the block diagrams of optimal deterministic and stochastic systems coincide).
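The effect of shifting the switch point can also be seen by direct simulation of the error process dz = a dt + √B dW − u dt under the relay law (2.2.20). The Euler–Maruyama sketch below (illustrative parameters; step size and sample length are arbitrary choices) compares the time-averaged quadratic penalty for the shifted switch point (2.2.16) and for the naive choice z⁰ = 0:

```python
import numpy as np

# Simulation of z' = a - u + sqrt(B)*xi(t) with u = um*sign(z - z0).
# The long-run average of c(z) = z^2 is estimated for the shifted switch
# point (2.2.16) and for z0 = 0; the former should be smaller and close
# to the theoretical stationary error (2.2.17).
rng = np.random.default_rng(1)
a, B, um = 0.3, 0.5, 1.0                 # illustrative values, um > a
dt, n_steps, n_burn = 1e-3, 1_000_000, 100_000

def mean_penalty(z0):
    """Time average of z^2 along one long Euler-Maruyama trajectory."""
    z, acc = 0.0, 0.0
    noise = rng.standard_normal(n_steps) * np.sqrt(B * dt)
    for k in range(n_steps):
        u = um if z > z0 else -um        # relay law (2.2.20)
        z += (a - u) * dt + noise[k]
        if k >= n_burn:
            acc += z * z
    return acc / (n_steps - n_burn)

z0_opt = -a * B / (um**2 - a**2)                            # (2.2.16)
gamma_theory = (B**2 / 4) * ((um - a)**-2 + (um + a)**-2)   # (2.2.17)
gamma_mc_opt = mean_penalty(z0_opt)
gamma_mc_naive = mean_penalty(0.0)       # ignoring the diffusion-induced shift
```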
If B = 0 (the deterministic case), then it follows from (2.2.16)-(2.2.19) that the switch point z^0 = 0 and the stationary tracking error γ = 0. These
6 This takes place if the current values of the state vector x(t) are measured exactly.
FIG. 17
results readily follow from the statement of the problem; to obtain them it is not necessary to use the dynamic programming method. Indeed, if at some instant of time we have y(t) > x(t) (z(t) > 0), then, obviously, it is necessary to increase x at the maximum rate (that is, at u = +u_m) till the equality y = x (z = 0) is attained. Then the motor can be stopped. In a similar way, for y < x (z < 0), the control u = -u_m is switched on and operates till y becomes equal to x. After y = x is attained and the motor is stopped, the zero error z remains constant, since there are no random actions to take the system out of the state z = 0. Therefore, the stationary tracking "error" is zero.^7 If the diffusion is taken into account, then the deterministically optimal control u_* = u_m sign z is no longer optimal. This fact can be explained as follows. Let u = u_m sign z, and let B ≠ 0. Then the following two factors affect the trajectories z(t): they regularly move downwards with velocity (u_m - a) for z > 0 and upwards with velocity (u_m + a) for z < 0 due to the drift a and control u (see Fig. 18), and they "spread" due to the diffusion B that is the same for all z. As a result, the stochastic process z(t) becomes stationary (since the regular displacement towards the t-axis is proportional to t and the diffusion spreading away from the t-axis is proportional to √t), and all sample paths of z(t) are localized in a strip of finite width containing the t-axis.^8 However, since the "returning" velocities in the upper and lower half-planes are different, the stationary trajectories of z(t) are arranged not
7 It is assumed that the penalty function c(z) attains its minimum value at z = 0 and c(0) = 0.
8 More precisely: if z(0) = 0, then with probability 1 the values of z(t) lie in a strip of finite width for all t > 0.
FIG. 18
symmetrically with respect to the line z = 0, as is conventionally shown in Fig. 19. If the penalty function c(z) is an even function (c(z) = c(-z)), then, obviously, the stationary tracking error γ = Ec(z) (see (1.4.32)) can be decreased by placing the strip AB (where the trajectories are localized) symmetrically with respect to the axis z = 0. This effect can be reached
by switching the control u at some negative value z^0 rather than at z = 0. The exact position of the switch point z^0 is determined by formulas (2.2.14), (2.2.16), and (2.2.18).
FIG. 19
In conclusion, we note that all results obtained in this section can readily be generalized to the case where the plant P is subject to additive noncontrolled perturbations of the white noise type (see Fig. 10). In this case,
instead of Eq. (2.2.1), we have
\dot x = u + \sqrt{N}\,\xi(t), \qquad |u(t)| \le u_m,
(2.2.21)
where ξ(t) is the standard white noise (1.1.31) independent of the input process y(t) and N > 0 is a given number. In this case, the Bellman equation (2.2.3) acquires the form
\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial y} + \frac{B}{2}\frac{\partial^2 F}{\partial y^2} + \frac{N}{2}\frac{\partial^2 F}{\partial x^2} + c(x, y) + \min_{|u|\le u_m}\Big[u\frac{\partial F}{\partial x}\Big] = 0,
and instead of (2.2.4), we obtain
\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial z} + \frac{B + N}{2}\frac{\partial^2 F}{\partial z^2} + c(z) + \min_{|u|\le u_m}\Big[-u\frac{\partial F}{\partial z}\Big] = 0.
This equation differs from (2.2.4) only by the coefficient of the diffusion term. Therefore, all results obtained for systems whose block diagram is shown in Fig. 2 and whose plant is described by Eq. (2.2.1) remain automatically valid for systems in Fig. 10 with Eq. (2.2.21) if in the original problem the diffusion coefficient B is replaced by B + N. In particular, if noises in the plant are taken into account, then formulas (2.2.16) and (2.2.17) for the stationary switch point and the stationary tracking error take the form
z^0 = -\frac{(B + N)a}{u_m^2 - a^2}, \qquad \gamma = \frac{(B + N)^2}{4}\Big[\frac{1}{(u_m - a)^2} + \frac{1}{(u_m + a)^2}\Big].
Note also that the problem studied in this section is equivalent to the synthesis problem for a servomechanism tracking a Wiener process of intensity B with nonsymmetric constraints on admissible controls -u_m + a ≤ u ≤ u_m + a, since both these problems have the same Bellman equation (2.2.4). 2.2.2. Now let us consider the synthesis problem that differs from the problem considered in the preceding section only by the optimality criterion. We assume that there is an admissible domain [ℓ_1, ℓ_2] for the error
z(t) = y(t) - x(t) (ℓ_1 and ℓ_2 are given numbers such that ℓ_1 < ℓ_2). We assume that if z(t) leaves this domain, then serious undesirable effects may occur. For example, the system considered or a part of any other more complicated system containing our system may be destroyed. In this case,
it is natural to look for controls that keep z(t) within the admissible limits for the maximum possible time. General problems of calculating the maximum mean time of the first passage to the boundary were considered in §1.4. In particular, the Bellman equation (1.4.40) was obtained. In the scalar case studied here, this equation has the form
\frac{B}{2}\frac{\partial^2 F_1}{\partial y^2} + a\frac{\partial F_1}{\partial y} + \max_{|u|\le u_m}\Big[u\frac{\partial F_1}{\partial x}\Big] = -1
(2.2.24)
(Eq. (2.2.24) follows from (1.4.40), (1.4.31), since A^y = a, A^x = u, B^y = B, B^x = 0). Recall that the function F_1(x, y) in (2.2.24) is equal to the maximum mean time of the first passage to the boundary of the domain of admissible phase variables if the initial state of the system is (x, y). In the case where the domain of admissible values (x, y) is determined by the error signal z = y - x, the function F_1 depends only on the difference, F_1(x, y) = F_1(y - x) = F_1(z), and, instead of the partial differential equation (2.2.24), we have the following ordinary differential equation for the function F_1(z):
\frac{B}{2}\frac{d^2 F_1}{dz^2} + a\frac{dF_1}{dz} + \max_{|u|\le u_m}\Big[-u\frac{dF_1}{dz}\Big] = -1.
(2.2.25)
The function F_1(z) satisfies Eq. (2.2.25) at the interior points of the domain [ℓ_1, ℓ_2] of admissible errors z. At the boundary points of this domain, F_1 vanishes (see (1.4.41)):
F_1(\ell_1) = F_1(\ell_2) = 0.
(2.2.26)
The optimal system can be synthesized by solving Eq. (2.2.25) with the boundary conditions (2.2.26). Just as in the preceding section, one can see that the optimal control u_*(z) is of relay type and is equal to
u_*(z) = -u_m\,\mathrm{sign}\Big(\frac{dF_1}{dz}\Big).
(2.2.27)
Using (2.2.27), we transform Eq. (2.2.25) to the form
\frac{B}{2}\frac{d^2 F_1}{dz^2} + a\frac{dF_1}{dz} + u_m\Big|\frac{dF_1}{dz}\Big| = -1.
(2.2.28)
The condition of smooth matching (see [113], p. 52) implies that the solution F_1(z) of Eq. (2.2.28) and the derivatives dF_1/dz and d^2F_1/dz^2 are
FIG. 20
continuous everywhere in the interior of [ℓ_1, ℓ_2]. Therefore, the switch point z^1 is determined by the condition
\frac{dF_1}{dz}(z^1) = 0.
(2.2.29)
The same continuity conditions and the boundary conditions (2.2.26), as well as the "physical" meaning of the function F_1(z), allow us to estimate a priori the qualitative behavior of the functional dependence F_1(z). The corresponding curve is shown in Fig. 20. It follows from (2.2.29) that the switch point corresponds to the maximum value of F_1(z). In this case, F_1'(z) < 0 for z > z^1, and F_1'(z) > 0 for z < z^1. In particular, this implies that the optimal control (2.2.27) can be written in the form
u_*(z) = u_m\,\mathrm{sign}\,(z - z^1),
which is similar to (2.2.20) and differs only by the position of the switch point. Thus, in this case, if the applied constant displacement -z^0 is replaced by -z^1, then the block diagram of the optimal system coincides with that in Fig. 15. The switch point z^1 can be found by solving Eq. (2.2.28) with the boundary conditions (2.2.26). Just as in the preceding section, we replace the nonlinear equation (2.2.28) by the following pair of linear equations for the
function F_1^+(z), z^1 ≤ z ≤ ℓ_2, and the function F_1^-(z), ℓ_1 ≤ z ≤ z^1:
\frac{B}{2}\frac{d^2 F_1^+}{dz^2} + (a - u_m)\frac{dF_1^+}{dz} = -1, \qquad z^1 < z < \ell_2,
\frac{B}{2}\frac{d^2 F_1^-}{dz^2} + (a + u_m)\frac{dF_1^-}{dz} = -1, \qquad \ell_1 < z < z^1.
(2.2.30)
The required switch point z^1 can be obtained from the matching conditions for F_1^+(z) and F_1^-(z). Since F_1(z) is twice continuously differentiable, it follows from (2.2.27) that these conditions have the form
F_1^+(z^1) = F_1^-(z^1),
(2.2.31)
\frac{dF_1^+}{dz}(z^1) = \frac{dF_1^-}{dz}(z^1) = 0.
(2.2.32)
The boundary conditions (2.2.26) and (2.2.32) for F_1^+(z) and F_1^-(z) imply
F_1^+(z) = \frac{z - \ell_2}{u_m - a} + \frac{B}{2(u_m - a)^2}\Big\{\exp\Big[\frac{2(u_m - a)(\ell_2 - z^1)}{B}\Big] - \exp\Big[\frac{2(u_m - a)(z - z^1)}{B}\Big]\Big\},
F_1^-(z) = \frac{\ell_1 - z}{u_m + a} + \frac{B}{2(u_m + a)^2}\Big\{\exp\Big[\frac{2(u_m + a)(z^1 - \ell_1)}{B}\Big] - \exp\Big[-\frac{2(u_m + a)(z - z^1)}{B}\Big]\Big\}.
(2.2.33)
By using (2.2.33) and the continuity condition (2.2.31), we obtain the following transcendental equation for the required point z^1:
2u_m z^1 = (u_m + a)\ell_2 + (u_m - a)\ell_1 + \frac{B}{2}\Big\{\frac{u_m - a}{u_m + a}\Big(\exp\Big[\frac{2(u_m + a)(z^1 - \ell_1)}{B}\Big] - 1\Big) - \frac{u_m + a}{u_m - a}\Big(\exp\Big[\frac{2(u_m - a)(\ell_2 - z^1)}{B}\Big] - 1\Big)\Big\}.
(2.2.34)
In the simple special case a = 0, it follows from (2.2.34) that
z^1 = \frac{\ell_1 + \ell_2}{2},
that is, the switch point is the midpoint of the interval of admissible errors z. This natural result can be predicted without solving the Bellman equation. In the other special case where -ℓ_1 = ℓ_2 = ℓ (ℓ > 0) and the drift a is small, we obtain the approximate expression
z^1 \approx \frac{aB}{u_m^2} + \frac{a\ell}{u_m}\cdot\frac{1 + \exp(2u_m\ell/B)}{1 - \exp(2u_m\ell/B)}.
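In general, a scalar root finder applied to (2.2.34) gives z¹ directly; the form coded below follows from matching F₁⁺(z¹) = F₁⁻(z¹), and the parameter values are illustrative:

```python
import numpy as np
from scipy.optimize import brentq

# Root finding for the switch point z1 from Eq. (2.2.34), written as
# g(z1) = 0.  Checks: a = 0 gives the midpoint of [l1, l2]; small a
# agrees with the approximate expression for -l1 = l2 = l.
B, um = 0.5, 1.0
l1, l2 = -1.0, 1.0

def g(z1, a):
    mu, nu = um - a, um + a
    E_plus = np.exp(2 * mu * (l2 - z1) / B) - 1.0
    E_minus = np.exp(2 * nu * (z1 - l1) / B) - 1.0
    return (2 * um * z1 - nu * l2 - mu * l1
            - (B / 2) * (mu / nu * E_minus - nu / mu * E_plus))

a = 0.05
z1 = brentq(lambda z: g(z, a), l1, l2)
z1_sym = brentq(lambda z: g(z, 0.0), l1, l2)   # a = 0 case

# small-a approximation for -l1 = l2 = l
l = l2
z1_approx = (a * B / um**2
             + (a * l / um) * (1 + np.exp(2 * um * l / B))
             / (1 - np.exp(2 * um * l / B)))
```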
To find z^1 in the other cases, it is necessary to solve the transcendental equation (2.2.34). 2.2.3. Assume that the performance of the servomechanism shown in Fig. 2 is determined by the maximum error z(t) = y(t) - x(t) on a fixed time interval 0 ≤ t ≤ T. Then it is natural to minimize the optimality criterion
I[u] = \mathbf{E}\Big[\max_{0\le t\le T}|z(t)|\Big] = \mathbf{E}\Big[\max_{0\le t\le T}|y(t) - x(t)|\Big],
(2.2.35)
which is a special case of the criterion (1.1.18). For convenience, we shall use the modification (1.4.48) of the criterion (1.1.18); that is, instead of (2.2.35), we shall minimize
I[u] = \mathbf{E}\max_{\tau\ge t}|z(\tau)|\,e^{-\beta(\tau - t)}.
(2.2.36)
The parameter β > 0 determines the observation time for the stochastic process z(τ). We assume that the criteria (2.2.35) and (2.2.36) are equivalent if the terminal time T and the variable β are matched, for example, as follows: T = c/β, where c > 0 is a constant. The Bellman equation for the problem considered can be obtained from (1.4.51) with regard to the relation f_2(x, y) = f_2(y - x) = f_2(z). This equation has the form
\frac{B}{2}\frac{d^2 f_2}{dz^2} + a\frac{df_2}{dz} + \min_{|u|\le u_m}\Big[-u\frac{df_2}{dz}\Big] = \beta f_2 \quad \text{if } f_2(z) > |z|, \qquad f_2(z) = |z| \quad \text{otherwise}.
(2.2.37)
Just as in the preceding sections, after the expression in the square brackets is minimized, Eq. (2.2.37) acquires the form
\frac{B}{2}\frac{d^2 f_2}{dz^2} + a\frac{df_2}{dz} - u_m\Big|\frac{df_2}{dz}\Big| = \beta f_2 \quad \text{if } f_2(z) > |z|, \qquad f_2(z) = |z| \quad \text{otherwise}.
(2.2.38)
otherwise. In this case, just as in the preceding sections, the optimal control w*(z) is
of relay type and can be written in the form (2.2.20). The only distinction
is that, in general, the switch point z^2 differs from z^0 and z^1. The point z^2 can be found by solving Eq. (2.2.38). Solving Eq. (2.2.38), we shall distinguish two domains on the z-axis: the domain Z_1 where f_2(z) > |z| and the domain Z_2 where f_2(z) = |z|. Obviously, if f_2(z_*) = |z_*| for some z_*, then f_2(z) = |z| for any z such that |z| > |z_*|. In other words, the domain Z_2 consists of two infinite intervals (-∞, z'] and [z'', +∞). In the domain Z_1 lying between the boundary points z' < 0 and z'' > 0, we have
\frac{B}{2}\frac{d^2 f_2}{dz^2} + a\frac{df_2}{dz} - u_m\Big|\frac{df_2}{dz}\Big| = \beta f_2.
(2.2.39)
Next, the interval [z', z''] is divided by the switch point z^2 into the following two parts: the interval z' < z < z^2, where Eq. (2.2.39) takes the form
\frac{B}{2}\frac{d^2 f_2^-}{dz^2} + (a + u_m)\frac{df_2^-}{dz} = \beta f_2^-,
(2.2.40)
and the interval z^2 < z < z'', where
\frac{B}{2}\frac{d^2 f_2^+}{dz^2} + (a - u_m)\frac{df_2^+}{dz} = \beta f_2^+.
(2.2.41)
Thus, in this case, we have seven unknown variables: z', z'', z^2, and the four constants obtained by integrating Eqs. (2.2.40) and (2.2.41). They can be obtained from the following seven conditions:
f_2^-(z') = |z'|, \quad \frac{df_2^-}{dz}(z') = -1, \quad f_2^+(z'') = z'', \quad \frac{df_2^+}{dz}(z'') = 1,
f_2^-(z^2) = f_2^+(z^2), \quad \frac{df_2^-}{dz}(z^2) = \frac{df_2^+}{dz}(z^2) = 0.
(2.2.42)
Formulas (2.2.42) are smooth matching conditions for the solutions f_2^+(z) and f_2^-(z). The last three conditions show that the solutions and their first-order derivatives are continuous at the switch point z^2 (see (2.2.31) and (2.2.32)). The first four conditions show that the solutions and their first-order derivatives are continuous at the boundary points of Z_1. By solving (2.2.40) and (2.2.41) with regard to (2.2.42), we readily obtain
the following three equations for z', z'', and z^2:
z'' = \frac{B}{2\beta}\,\frac{\varkappa_1 e^{\lambda_1(z'' - z^2)} + \lambda_1 e^{-\varkappa_1(z'' - z^2)}}{e^{\lambda_1(z'' - z^2)} - e^{-\varkappa_1(z'' - z^2)}},
(2.2.43)
-z' = \frac{B}{2\beta}\,\frac{\varkappa_2 e^{\lambda_2(z^2 - z')} + \lambda_2 e^{-\varkappa_2(z^2 - z')}}{e^{\lambda_2(z^2 - z')} - e^{-\varkappa_2(z^2 - z')}},
(2.2.44)
\frac{\lambda_1 + \varkappa_1}{e^{\lambda_1(z'' - z^2)} - e^{-\varkappa_1(z'' - z^2)}} = \frac{\lambda_2 + \varkappa_2}{e^{\lambda_2(z^2 - z')} - e^{-\varkappa_2(z^2 - z')}}.
(2.2.45)
In (2.2.43)-(2.2.45) we have used the notation
\lambda_1 = \frac{u_m - a + \sqrt{(u_m - a)^2 + 2\beta B}}{B}, \qquad \varkappa_1 = \frac{-(u_m - a) + \sqrt{(u_m - a)^2 + 2\beta B}}{B},
\lambda_2 = \frac{u_m + a + \sqrt{(u_m + a)^2 + 2\beta B}}{B}, \qquad \varkappa_2 = \frac{-(u_m + a) + \sqrt{(u_m + a)^2 + 2\beta B}}{B}.
The desired switch point z^2 can be found by solving the system of transcendental equations (2.2.43)-(2.2.45). Usually, this system can be solved by numerical methods. One can obtain an explicit expression for z^2 from
Eqs. (2.2.43)-(2.2.45) only if the problem is symmetric, that is, if a = 0. In this case, the domain Z_1 is symmetric about the origin, z' = -z'', and the switch point is z^2 = 0. However, this is a rather trivial case and of no practical interest.
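One convenient numerical scheme: for a trial z², solve (2.2.43) and (2.2.44) for the interval lengths z″ − z² and z² − z′ with a scalar root finder, then adjust z² until the continuity condition (2.2.45) holds. A sketch (illustrative parameters; the equations coded below follow from solving (2.2.40), (2.2.41) under conditions (2.2.42)):

```python
import numpy as np
from scipy.optimize import brentq

# Nested root finding for the system (2.2.43)-(2.2.45).
# For a = 0 the computed solution must be symmetric: z2 = 0, z' = -z''.
B, um, beta = 0.5, 1.0, 0.5      # illustrative parameters

def solve_switch(a):
    d1 = np.sqrt((um - a)**2 + 2 * beta * B)
    d2 = np.sqrt((um + a)**2 + 2 * beta * B)
    lam1, kap1 = (d1 + um - a) / B, (d1 - um + a) / B
    lam2, kap2 = (d2 + um + a) / B, (d2 - um - a) / B

    def R(x, lam, kap):          # right-hand side of (2.2.43)/(2.2.44)
        den = np.exp(lam * x) - np.exp(-kap * x)
        return (B / (2 * beta)) * (kap * np.exp(lam * x) + lam * np.exp(-kap * x)) / den

    h_of = lambda z2: brentq(lambda h: z2 + h - R(h, lam1, kap1), 1e-9, 50.0)
    g_of = lambda z2: brentq(lambda g: -z2 + g - R(g, lam2, kap2), 1e-9, 50.0)

    def continuity(z2):          # Eq. (2.2.45)
        h, g = h_of(z2), g_of(z2)
        return ((lam1 + kap1) / (np.exp(lam1 * h) - np.exp(-kap1 * h))
                - (lam2 + kap2) / (np.exp(lam2 * g) - np.exp(-kap2 * g)))

    z2 = brentq(continuity, -0.5, 0.5)
    return z2 - g_of(z2), z2 + h_of(z2), z2   # z', z'', z2

zp_sym, zpp_sym, z2_sym = solve_switch(0.0)   # symmetric case a = 0
zp_a, zpp_a, z2_a = solve_switch(0.2)         # asymmetric illustrative case
```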
REMARK. It should be noted that the optimal systems considered in Sections 2.2.2 and 2.2.3 are very close to each other (the switch points nearly coincide, z^1 ≈ z^2) if the corresponding parameters of the problem agree well with each other. These parameters can be made consistent in the following way. Assume that the same parameters a, B, and u_m are given in the problems of Sections 2.2.2 and 2.2.3. Then, choosing a value of
the parameter β, we can calculate three numbers z' = z'(β), z'' = z''(β), and z^2 = z^2(β) in dependence on the choice of β. Now if we use z' and z'' as the boundary values of admissible errors (ℓ_1 = z'(β), ℓ_2 = z''(β)) in the problem considered in Section 2.2.2, then by solving Eq. (2.2.34), we obtain the coordinate of the switch point z^1 and find that z^1(β) ≈ z^2(β)^9 for β varying from 1.0 to 10^{-4}. This is confirmed by the numerical experiment described in [92]. Moreover, in [92] it is shown that F_1(z^1(β)) ≈ β^{-1} for these values of the parameter β.
2.2.4. Now let us consider the synthesis problem of optimal tracking of a purely discontinuous Markov process. Let us assume that the input process y(t) in the problem of Section 2.2.1 is a purely discontinuous Markov process. As shown in §1.1, such processes are completely characterized by the intensity λ(y) of jumps and the density function π(y, y') describing the transition probabilities at the jump moments. The one-dimensional density p(t, y) of this process satisfies the Feller equation (see (1.1.71))
\frac{\partial p(t, y)}{\partial t} + \lambda(y)p(t, y) - \int \lambda(z)\pi(z, y)p(t, z)\,dz = 0.
(2.2.46)
From (1.4.61) with regard to (2.2.1) and (2.2.2), we obtain the Bellman equation
\frac{\partial F_3}{\partial t}(t, x, y) + \lambda(y)\Big[\int \pi(y, z)F_3(t, x, z)\,dz - F_3(t, x, y)\Big] + c(x, y) + \min_{|u|\le u_m}\Big[u\frac{\partial F_3}{\partial x}(t, x, y)\Big] = 0.
(2.2.47)
If we denote the integro-differential operator of Eq. (2.2.46) by L_{t,y}, then this equation can be written in the short form
L_{t,y}\,p(t, y) = 0.
(2.2.48)
Comparing Eqs. (2.2.46) and (2.2.47) with the Feller equations (1.1.69) and (1.1.70), we see that, for purely discontinuous processes, the Bellman equation (2.2.47) contains the integro-differential operator \widetilde L_{t,y} of the backward Feller equation; this operator is dual to L_{t,y}. Therefore, Eq. (2.2.47) can be written in the form
\widetilde L_{t,y}F_3(t, x, y) + c(x, y) + \min_{|u|\le u_m}\Big[u\frac{\partial F_3}{\partial x}(t, x, y)\Big] = 0.
(2.2.49)
9 The approximate relation z^1(β) ≈ z^2(β) means that |z^1(β) - z^2(β)| ≪ z''(β) - z'(β).
In what follows, we assume that the input Markov process y(t) is homogeneous with respect to the state variable y, that is, λ(y) = λ = const and π(y, y') = π(y' - y). In this case, by using the formal method proposed in [176], we can replace the integro-differential operator L_{t,y} in (2.2.47) and (2.2.49) by an equivalent differential operator. Let us show how to do this. First, we try to write Eqs. (2.2.46) and (2.2.48) in the form
\frac{\partial p(t, y)}{\partial t} = L\Big(\frac{\partial}{\partial y}\Big)p(t, y),
(2.2.50)
where L(∂/∂y) is the required differential operator and L p determines the density of the probability flow [160, 173]. We apply the Fourier transform to (2.2.50) and
(2.2.46). For the Fourier transform of the probability density
\widetilde p(t, s) = \int_{-\infty}^{\infty} e^{sy}p(t, y)\,dy, \qquad s = i\omega,
we obtain the following two equations from the well-known property of the Fourier transform of the convolution of two functions:^{10}
\frac{\partial \widetilde p(t, s)}{\partial t} = L(s)\,\widetilde p(t, s),
(2.2.51)
\frac{\partial \widetilde p(t, s)}{\partial t} = \lambda\big(\widetilde\pi(s) - 1\big)\,\widetilde p(t, s),
(2.2.52)
where \widetilde\pi(s) denotes
\widetilde\pi(s) = \int_{-\infty}^{\infty} e^{sy}\pi(y)\,dy.
(2.2.53)
Comparing (2.2.51) and (2.2.52), we obtain the spectral representation of the desired operator L(8)
= A^" 1 .
(2.2.54)
s
If the expression on the right-hand side of (2.2.54) is a ratio of polynomials,

$$L(s) = \frac{H(s)}{Q(s)} = \frac{h_0 + h_1 s + \dots + h_m s^m}{q_0 + q_1 s + \dots + q_n s^n} \eqno(2.2.55)$$

($h_i$ and $q_i$ are constant numbers), then, as follows from the theory of Fourier transforms [41], the desired operator L(d/dy) can be obtained from L(s)

¹⁰Recall that λ(y) = λ and π(z, y) = π(y − z) in (2.2.46).
Exact Methods for Synthesis Problems
119
by the formal change s → d/dy. Using the operator L(d/dy), we transform the Bellman equation (2.2.49) to the form

$$\frac{\partial F}{\partial t} + L\Bigl(\frac{\partial}{\partial y}\Bigr)\frac{\partial F}{\partial y} + c(x,y) + \min_{|u|\le u_m}\Bigl[u\,\frac{\partial F}{\partial x}\Bigr] = 0 \eqno(2.2.56)$$

(note that if L(d/dy) = H(d/dy)/Q(d/dy), then the relation Lψ = φ is understood as H(d/dy)ψ = Q(d/dy)φ). Passing, as in Section 2.2.1, to the stationary tracking error z and the time-invariant loss function f_s(z), we obtain the stationary equation

$$L\Bigl(\frac{d}{dz}\Bigr)\frac{df_s}{dz} + \min_{|u|\le u_m}\Bigl[u\,\frac{df_s}{dz}\Bigr] = \gamma - c(z) \eqno(2.2.57)$$

(here the time-invariant loss function f_s = f_s(z) is determined by analogy with (2.2.9), and γ is the stationary error). The optimal control

$$u_*(z) = -u_m\,\mathrm{sgn}\,\frac{df_s}{dz} = -u_m\,\mathrm{sgn}(z - z_3) \eqno(2.2.58)$$

differs from (2.2.20) only by the position of the switch point z₃, which can be obtained from the condition

$$\frac{df_s}{dz}\Big|_{z=z_3} = 0.$$
To complete the solution of the synthesis problem, we need to calculate z₃. By analogy with Section 2.2.1, the switch point z₃ divides the z-axis into two parts: the domain z > z₃, where df_s/dz > 0 and u_* = −u_m, in which Eq. (2.2.57) takes the form

$$L\Bigl(\frac{d}{dz}\Bigr)\frac{df_s^+}{dz} - u_m\frac{df_s^+}{dz} = \gamma - c(z), \eqno(2.2.59)$$

and the domain z < z₃, where df_s/dz < 0 and u_* = u_m, in which Eq. (2.2.57) takes the form

$$L\Bigl(\frac{d}{dz}\Bigr)\frac{df_s^-}{dz} + u_m\frac{df_s^-}{dz} = \gamma - c(z). \eqno(2.2.60)$$

At the switch point z₃ the derivatives of the functions f_s^+(z) and f_s^-(z) vanish:

$$\frac{df_s^+}{dz}\Big|_{z=z_3} = \frac{df_s^-}{dz}\Big|_{z=z_3} = 0. \eqno(2.2.61)$$
To solve the linear equations (2.2.59) and (2.2.60) explicitly, we need to specify the form of the linear operator L(d/dz). Assume that the density of the transition probability π(y, y') at the jump moments is given by the two-sided exponential formula

$$\pi(y,y') = \frac{k_1k_2}{k_1+k_2}\begin{cases} \exp(-k_1|y'-y|) & \text{for } y' < y,\\[2pt] \exp(-k_2|y'-y|) & \text{for } y' > y. \end{cases} \eqno(2.2.62)$$

Calculating the integral (2.2.53), we obtain

$$\tilde\pi(s) = \frac{k^2}{k^2 + \Delta k\,s - s^2}, \qquad k^2 = k_1k_2, \quad \Delta k = k_2 - k_1. \eqno(2.2.63)$$
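The reconstructed transform (2.2.63) can be verified numerically by evaluating the integral (2.2.53) for the density (2.2.62). The following sketch does that for a few real values of s inside the convergence strip; all parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the text): k1, k2 > 0
k1, k2 = 3.0, 2.0
ksq = k1 * k2            # k^2 = k1*k2
dk = k2 - k1             # Delta k = k2 - k1 (sign convention assumed)

def pi_density(y):
    """Two-sided exponential transition density, cf. (2.2.62)."""
    c = k1 * k2 / (k1 + k2)                      # normalization constant
    return np.where(y < 0, c * np.exp(k1 * y), c * np.exp(-k2 * y))

def pi_tilde_numeric(s, L=60.0, n=1_200_001):
    """pi~(s) = int e^{s y} pi(y) dy by the trapezoid rule (real s)."""
    y = np.linspace(-L, L, n)
    v = np.exp(s * y) * pi_density(y)
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(y)))

def pi_tilde_closed(s):
    """Closed form (2.2.63): k^2 / (k^2 + dk*s - s^2)."""
    return ksq / (ksq + dk * s - s * s)

# agreement inside the convergence strip |s| < min(k1, k2)
for s in (-0.5, 0.0, 0.7):
    assert abs(pi_tilde_numeric(s) - pi_tilde_closed(s)) < 1e-6
```

At s = 0 the closed form equals 1, which is just the normalization of π.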
After the change s → d/dz, we obtain the following expression for the operator L(d/dz) from (2.2.63) and (2.2.54):

$$L\Bigl(\frac{d}{dz}\Bigr) = \lambda\,\frac{d/dz - \Delta k}{k^2 + \Delta k\,d/dz - d^2/dz^2}. \eqno(2.2.64)$$

With regard to (2.2.64), we can write Eqs. (2.2.59) and (2.2.60) in the form

$$\lambda\Bigl(\frac{d}{dz} - \Delta k\Bigr)\frac{df_s^\pm}{dz} \mp u_m\Bigl(k^2 + \Delta k\frac{d}{dz} - \frac{d^2}{dz^2}\Bigr)\frac{df_s^\pm}{dz} = \Bigl(k^2 + \Delta k\frac{d}{dz} - \frac{d^2}{dz^2}\Bigr)\bigl(\gamma - c(z)\bigr). \eqno(2.2.65)$$

Introducing the functions

$$\varphi^\pm(z) = \frac{df_s^\pm}{dz}, \eqno(2.2.66)$$

we transform the system (2.2.65) as follows:

$$\frac{d^2\varphi^\pm}{dz^2} \pm \Bigl(\frac{\lambda}{u_m} \mp \Delta k\Bigr)\frac{d\varphi^\pm}{dz} - \Bigl(k^2 \pm \frac{\lambda\Delta k}{u_m}\Bigr)\varphi^\pm = c^\pm(z), \eqno(2.2.67)$$

where

$$c^\pm(z) = \pm\frac{1}{u_m}\bigl[k^2\bigl(\gamma - c(z)\bigr) - \Delta k\,c'(z) + c''(z)\bigr]. \eqno(2.2.68)$$
Relations (2.2.61) and (2.2.66) imply the following matching condition for the functions φ^±(z) at the switch point:

$$(k^2u_m + \lambda\Delta k)\,\varphi^+(z_3) = (-k^2u_m + \lambda\Delta k)\,\varphi^-(z_3). \eqno(2.2.69)$$
The characteristic equations corresponding to Eqs. (2.2.67) are

$$\mu^2 \pm \Bigl(\frac{\lambda}{u_m} \mp \Delta k\Bigr)\mu - \Bigl(k^2 \pm \frac{\lambda\Delta k}{u_m}\Bigr) = 0. \eqno(2.2.70)$$
By μ₁⁺ and μ₂⁺ we denote the roots of the characteristic equation for the function φ⁺(z) (correspondingly, by μ₁⁻ and μ₂⁻ the roots of the characteristic equation for φ⁻(z)). A straightforward verification shows that if

$$\frac{\lambda|\Delta k|}{k^2} < u_m, \eqno(2.2.71)$$

then (1) all roots μ₁,₂^± are real, and (2) each characteristic equation in (2.2.70) has roots of opposite signs (for definiteness, in what follows, we assume that μ₁⁺ and μ₁⁻ are positive and, respectively, μ₂⁺ and μ₂⁻ are negative).

REMARK. Note that condition (2.2.71) must be satisfied, since it is the existence condition for the stationary tracking regime considered here. Indeed, the expression on the left-hand side of (2.2.71) is equal to the absolute value of the mean rate of the regular displacement in the command signal y(t) caused by the random jumps. Obviously, this rate of change cannot exceed the maximum admissible speed of the servomotor. Inequality (2.2.71) merely expresses this fact. □
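Both root properties, and the interpretation of λ|Δk|/k² as the mean displacement rate, are easy to confirm numerically. The sketch below uses the characteristic polynomials in the form reconstructed in (2.2.70) and illustrative parameter values.

```python
import numpy as np

# Illustrative parameters satisfying the stationarity condition (2.2.71)
lam, um = 1.0, 2.0
k1, k2 = 3.0, 2.0
ksq, dk = k1 * k2, k2 - k1
assert lam * abs(dk) / ksq < um                      # condition (2.2.71)

# Characteristic polynomials of (2.2.70) in the reconstructed form:
#   mu^2 + (lam/um - dk) mu - (k^2 + lam dk/um) = 0   (phi+)
#   mu^2 - (lam/um + dk) mu - (k^2 - lam dk/um) = 0   (phi-)
mu_plus = np.roots([1.0, lam / um - dk, -(ksq + lam * dk / um)])
mu_minus = np.roots([1.0, -(lam / um + dk), -(ksq - lam * dk / um)])
assert np.all(np.isreal(mu_plus)) and np.all(np.isreal(mu_minus))
mu_plus, mu_minus = mu_plus.real, mu_minus.real
assert mu_plus.prod() < 0 and mu_minus.prod() < 0    # roots of opposite signs

# Monte Carlo: the mean jump of pi equals -dk/k^2, so the mean displacement
# rate of y(t) is lam*|dk|/k^2, which (2.2.71) bounds by um
rng = np.random.default_rng(0)
c = k1 * k2 / (k1 + k2)
u = rng.random(200_000)
jumps = np.where(u < c / k1,
                 np.log(u * k1 / c) / k1,            # negative branch
                 -np.log((1.0 - u) * k2 / c) / k2)   # positive branch
assert abs(jumps.mean() - (-dk / ksq)) < 5e-3
```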
Taking into account these properties of the roots of the characteristic equations (2.2.70), one can readily see that the bounded solutions of Eqs. (2.2.67) have the form

$$\varphi^\pm(z) = -\frac{1}{\mu_1^\pm - \mu_2^\pm}\Bigl[\int_{-\infty}^{z} e^{\mu_2^\pm(z-z')}c^\pm(z')\,dz' + \int_{z}^{\infty} e^{\mu_1^\pm(z-z')}c^\pm(z')\,dz'\Bigr]. \eqno(2.2.72)$$
Using (2.2.72), from the matching condition (2.2.69) we obtain the following equation for the required switch point z₃:

$$\frac{k^2u_m + \lambda\Delta k}{\mu_1^+ - \mu_2^+}\Bigl[\int_{-\infty}^{0} e^{\mu_1^+ y}c^+(z_3 - y)\,dy + \int_{0}^{\infty} e^{\mu_2^+ y}c^+(z_3 - y)\,dy\Bigr] = \frac{-k^2u_m + \lambda\Delta k}{\mu_1^- - \mu_2^-}\Bigl[\int_{-\infty}^{0} e^{\mu_1^- y}c^-(z_3 - y)\,dy + \int_{0}^{\infty} e^{\mu_2^- y}c^-(z_3 - y)\,dy\Bigr]. \eqno(2.2.73)$$
For the quadratic penalty function c(z) = z², Eq. (2.2.73) admits an exact solution. Indeed, taking into account (2.2.68), we can rewrite Eq. (2.2.73) in the form

$$\frac{k^2u_m + \lambda\Delta k}{\mu_1^+ - \mu_2^+}\Bigl[\int_{-\infty}^{0} e^{\mu_1^+ y}\bar c(y)\,dy + \int_{0}^{\infty} e^{\mu_2^+ y}\bar c(y)\,dy\Bigr] + \frac{-k^2u_m + \lambda\Delta k}{\mu_1^- - \mu_2^-}\Bigl[\int_{-\infty}^{0} e^{\mu_1^- y}\bar c(y)\,dy + \int_{0}^{\infty} e^{\mu_2^- y}\bar c(y)\,dy\Bigr] = 0, \eqno(2.2.74)$$

where

$$\bar c(y) = c_0 + c_1 y + c_2 y^2, \qquad c_0 = k^2(z_3)^2 + 2\Delta k\,z_3 - 2 - k^2\gamma, \quad c_1 = -2(k^2 z_3 + \Delta k), \quad c_2 = k^2. \eqno(2.2.75)$$
Calculating the elementary integrals in (2.2.74) (the integrands are products of exponentials and the polynomial (2.2.75)) and using the fact that μ₁,₂⁺ and μ₁,₂⁻ satisfy the quadratic equations (2.2.70), after simple transformations we obtain an explicit formula (2.2.77) for the switch point z₃. Using (2.2.69), (2.2.72), and (2.2.77), we can then readily calculate the stationary specific error γ given by (2.2.78).
If, instead of condition (2.2.71), we have the stronger inequality

$$\frac{\lambda|\Delta k|}{k^2} \ll u_m,$$

then we can substantially simplify (2.2.77) and (2.2.78) by expanding these expressions in power series in the small parameter ε = λΔk/(u_m k²) and retaining only the leading terms of these expansions. In this case, instead of (2.2.77), we obtain, to leading order in ε,

$$z_3 \approx \frac{\lambda\Delta k}{2k^2}, \eqno(2.2.79)$$

and the corresponding leading-order expression (2.2.80) for the stationary error γ. For the first time, formulas (2.2.79) and (2.2.80) were derived, by somewhat different methods, in [176].

§2.3. Optimal control of the population size
Numerous investigations deal with the dynamics of animal and microorganism populations and with control of the population size. An extensive literature on this subject can be found, for example, in [51, 73, 87, 89, 133, 142, 186, 187, 189]. Various mathematical evolution models, depending on the environmental conditions of biological populations, are used to describe variations in the population size. We begin with a brief review of such models, paying main attention to the models considered later in this book.

2.3.1. Models describing the population dynamics. Apparently, Malthus was the first to consider, in 1798, the following model for the population dynamics:

$$\dot x = ax. \eqno(2.3.1)$$

Here x = x(t) is the population size¹¹ at time t, and the constant number a, called the growth factor, is defined as the difference between the birth-rate and death-rate factors. If the birth rate is larger than the death rate (a > 0), then, according to the Malthus model (2.3.1), the population size must grow infinitely.

¹¹The variable x is assumed to be continuous, although the number of individuals in the population can only be an integer. However, if the number of individuals is sufficiently large, then the continuous model (2.3.1) can be used. In this case, the variable x is treated as the population density, that is, as the number of individuals per unit area (or volume) of the population habitat.
Usually, this prediction is not confirmed, which shows that the model (2.3.1) is imperfect. Nevertheless, the basic idea of this model, namely, the assumption that the rate of population variation is proportional to the current population size, proved to be very fruitful. Many more realistic models were constructed on this basis by introducing appropriate corrections to the growth factor a.

So, for example, if we assume that in (2.3.1) the growth factor a depends on the population size x as

$$a = a(x) = -\mu\ln\frac{x}{K} \qquad\text{or}\qquad a = a(x) = r\Bigl(1 - \frac{x}{K}\Bigr),$$

then we obtain the Gompertz model (1825)

$$\dot x = -\mu x\ln\frac{x}{K} \eqno(2.3.2)$$

or the Verhulst model (1838)

$$\dot x = rx\Bigl(1 - \frac{x}{K}\Bigr). \eqno(2.3.3)$$

Equation (2.3.3) is often called the logistic equation. The positive constants r and K are usually called the natural growth factor and the capacity of the medium, respectively.

Models for more complicated systems of interacting populations are also based on the Malthus model (2.3.1). Assume that the same habitat is occupied by two different populations of sizes x₁ and x₂, respectively. Let each of these populations be described by a Malthus-type equation:
$$\frac{dx_1}{dt} = ax_1, \qquad \frac{dx_2}{dt} = bx_2. \eqno(2.3.4)$$

Now we assume that individuals of the second population (predators) can exist only if they eat individuals (prey) of the first population.¹² In this case, it is natural to assume that the growth factors a and b in (2.3.4) have the form

$$a = a_1 - a_2x_2, \qquad b = -b_1 + b_2x_1.$$

¹²This model is usually illustrated by the assumption that x₁ denotes a community of hares and x₂ a community of wolves. Hares need vegetable food, and wolves feed on hares (and only on hares).
Thus, we arrive at the two equations

$$\frac{dx_1}{dt} = (a_1 - a_2x_2)x_1, \qquad \frac{dx_2}{dt} = (-b_1 + b_2x_1)x_2, \eqno(2.3.5)$$

which are the well-known Lotka–Volterra equations modeling the behavior of the simplest system of interacting populations, the "predator–prey" model. These equations were studied in detail by V. Volterra [187], who found many remarkable properties of their solutions.
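A quick numerical experiment illustrates the conservative character of (2.3.5): along every trajectory the first integral b₂x₁ − b₁ln x₁ + a₂x₂ − a₁ln x₂ stays constant, so the orbits are closed. The coefficients below are illustrative.

```python
import numpy as np

# Illustrative coefficients for the predator-prey system (2.3.5)
a1, a2, b1, b2 = 1.0, 0.5, 1.5, 0.75

def f(state):
    x1, x2 = state
    return np.array([(a1 - a2 * x2) * x1, (-b1 + b2 * x1) * x2])

def rk4_step(s, h):
    k1 = f(s); k2 = f(s + 0.5 * h * k1)
    k3 = f(s + 0.5 * h * k2); k4 = f(s + h * k3)
    return s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def invariant(state):
    """First integral of (2.3.5), constant along every trajectory."""
    x1, x2 = state
    return b2 * x1 - b1 * np.log(x1) + a2 * x2 - a1 * np.log(x2)

state = np.array([1.0, 1.0])
v0 = invariant(state)
for _ in range(20_000):                    # integrate to t = 20, step 1e-3
    state = rk4_step(state, 1e-3)
assert state.min() > 0                     # populations stay positive
assert abs(invariant(state) - v0) < 1e-6   # conservative (closed) orbits
```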
The multidimensional generalization of the Lotka–Volterra model has the form

$$\frac{dx_r}{dt} = \Bigl(a_r + \sum_{s=1}^{n} a_{rs}x_s\Bigr)x_r, \qquad r = 1,2,\dots,n. \eqno(2.3.6)$$

The dynamics of system (2.3.6) depends on the form of the matrix A = ‖a_{rs}‖₁ⁿ. If this matrix is antisymmetric, i.e., if a_{rs} = −a_{sr} and a_{rr} = 0, then Eq. (2.3.6) describes a conservative model of population interaction. If the quadratic form Σ a_{rs}x_r x_s is positive definite, then the model (2.3.6) is called dissipative.

Further generalizations of population-dynamics models are related to more detailed descriptions of the interaction between individuals in the population. For example, in many actual situations, the growth factor a depends on the population size at some preceding moment of time rather than on the current population size. In such cases, it is expedient to use the Hutchinson model (1948)

$$\dot x(t) = rx(t)\Bigl(1 - \frac{x(t-h)}{K}\Bigr), \qquad h > 0, \eqno(2.3.7)$$
which is a generalization of the logistic model (2.3.3). In 1976 Cushing proposed the following more general model, in which both discrete and distributed delays are taken into account:

$$\dot x(t) = rx(t)\Bigl[1 - \int_0^\infty x(t-s)\,dK(s)\Bigr], \qquad t > 0, \eqno(2.3.8)$$
where K(s) is a nondecreasing bounded function and the integral on the right-hand side is of Stieltjes type.

In some special cases, it is necessary to take into account the spatial distribution of the population. In these cases, the state of the population is described by the density function D(t, x, y) at the point (x, y). If the movement of individuals within the habitat is a diffusion process, then instead of (2.3.7) we have the Hutchinson model with diffusion

$$\frac{\partial D}{\partial t}(t,x,y) = \kappa\Bigl(\frac{\partial^2 D}{\partial x^2} + \frac{\partial^2 D}{\partial y^2}\Bigr) + rD(t,x,y)\Bigl(1 - \frac{D(t-h,x,y)}{K}\Bigr) \eqno(2.3.9)$$

(κ > 0 is the diffusion coefficient).
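The qualitative effect of the delay in (2.3.7) can be seen numerically: for rh large enough the population overshoots the capacity K and oscillates, which never happens in the ordinary logistic model. A minimal Euler sketch with illustrative parameters:

```python
import numpy as np

# Euler scheme for the Hutchinson equation (2.3.7); illustrative parameters.
# r*h = 2 > pi/2, so sustained oscillations around K are expected.
r, K, h = 1.0, 100.0, 2.0
dt = 1e-3
lag = int(h / dt)
n = int(60.0 / dt)                      # integrate to t = 60
x = np.empty(n + 1)
x[:lag + 1] = 10.0                      # constant history on [-h, 0]
for i in range(lag, n):
    x[i + 1] = x[i] + dt * r * x[i] * (1.0 - x[i - lag] / K)
assert x.min() > 0
assert x.max() > K                      # the delayed feedback overshoots K
```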
Equations (2.3.1)–(2.3.9) model the behavior of isolated biological communities (autonomous models). If there are external actions on the population, then additional terms responsible for these actions appear on the right-hand sides of Eqs. (2.3.1)–(2.3.9). As usual, we distinguish two types of external actions: purposeful controlled actions, which can be used to control the population size, and uncontrolled random perturbations.

Let us consider a population described by the model (2.3.3). If there are external actions, say, some individuals are taken away from the population, then we obtain the controlled logistic model

$$\dot x = r\Bigl(1 - \frac{x}{K}\Bigr)x - qux, \eqno(2.3.10)$$

where the function u = u(t) ≥ 0 is the intensity of the catching process and the number q > 0 is the catchability coefficient. In this case, the value
$$Q = q\int_{t_1}^{t_2} u(t)x(t)\,dt \eqno(2.3.11)$$
gives the number of individuals caught during the time interval [t₁, t₂].¹³ In a similar way, the Lotka–Volterra equations can be generalized to the following controlled system:

$$\dot x_1 = (a_1 - a_2x_2)x_1 - q_1u_1x_1, \qquad \dot x_2 = (-b_1 + b_2x_1)x_2 - q_2u_2x_2. \eqno(2.3.12)$$
(2.3.12). If the population behavior is substantially influenced by noncontrolled
random perturbations, then the dynamics of the population is described by stochastic differential equations. For example, in some problems, the population behavior can be satisfactory described by the stochastic logistic model
x = r 1- K- x-qux + Wx£(t), (2.3.13) V J where £ ( t ) is the scalar Gaussian white noise (1.1.31) and the number B > 0 determines the intensity of random perturbations. Many other stochastic models used to describe the dynamics of various biological systems can be found in [51, 83]. 13
Note that Eq. (2.3.10) can be used not only in the case of "mechanical" removal
(catching, shooting, etc.) but also in the case where the population size is controlled by treating the habitat with chemical agents.
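The controlled model (2.3.10) and the catch functional (2.3.11) can be illustrated by a short simulation: with a constant intensity u the model is again logistic, with growth factor r − qu and capacity K(1 − qu/r), and the catch accumulates along the trajectory. Parameter values below are illustrative.

```python
# Controlled logistic model (2.3.10) with a constant catching intensity u;
# the catch (2.3.11) is accumulated along the trajectory (Euler scheme,
# illustrative parameter values).
r, K, q = 1.0, 100.0, 0.1
u = 2.0                                 # constant admissible intensity
dt, T = 1e-3, 50.0
x, Q = 50.0, 0.0
for _ in range(int(T / dt)):
    Q += q * u * x * dt                 # integrand of (2.3.11)
    x += dt * (r * (1 - x / K) * x - q * u * x)

# with u = const the state settles at the reduced capacity K*(1 - q*u/r)
x_inf = K * (1 - q * u / r)
assert abs(x - x_inf) < 1e-3
assert Q > 0
```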
2.3.2. Statement of the problem. In this section we consider the optimal control of the size of a population described by the controlled logistic model (2.3.10). The statement of this problem is borrowed from the books [35, 68], where it is formulated in conformity with the problem of fisheries management.

We shall assume that the state x = x(t) of the controlled system (2.3.10) characterizes the total quantity (or the mean density) of fish at time t in some chosen habitat. We also assume that the intensity of fishing is bounded by a given value u_m > 0. In this case, the mathematical model for the dynamics of the fish population has the form

$$\dot x = r\Bigl(1 - \frac{x}{K}\Bigr)x - qux, \qquad 0 \le u(t) \le u_m, \quad t > 0, \quad x(0) = x_0 > 0. \eqno(2.3.14)$$

By p > 0 we denote the price of the unit mass of caught fish and by c > 0 the price of unit "efforts" u spent on fishing. Then it is natural to estimate the "quality" of control by the functional

$$I(u) = \int_0^T \bigl(pqx(t) - c\bigr)u(t)\,dt, \eqno(2.3.15)$$

which, with regard to (2.3.11), gives the profit provided by the fishing process defined by the control function u(t), 0 ≤ t ≤ T. The problem is to find the optimal control u_*(t), 0 ≤ t ≤ T, for which the functional (2.3.15) attains its maximum.

Following [35, 68], instead of (2.3.15) we shall estimate the quality of control (i.e., of fishing) by the functional in which the terminal time T → ∞. In this case, an additional "killing" factor appears in the integrand to ensure convergence, namely,

$$I(u) = \int_0^\infty e^{-\delta t}\bigl(pqx(t) - c\bigr)u(t)\,dt, \eqno(2.3.16)$$

where δ > 0 is a given positive number. As a result, we arrive at the following problem: for an arbitrary initial state x(0) = x₀ of the controlled system (2.3.14), find a control function 0 ≤ u_*(t) ≤ u_m, t ≥ 0 (or 0 ≤ u_*(x(t)) ≤ u_m, t ≥ 0), for which the functional (2.3.16) attains its maximum on the trajectories of system (2.3.14).

REMARK. If the initial population size x₀ does not exceed the capacity K of the medium, then it follows from Eq. (2.3.14) that for any time moment
t > 0 and any admissible control u(t), the population size has the same property, x(t) ≤ K. Therefore, this problem is well posed if the parameters p, q, and c in the functional (2.3.16) satisfy the condition

$$\frac{c}{pq} = x_1 < K. \eqno(2.3.17)$$

Otherwise (x₁ ≥ K), this problem has only the trivial solution u_*(t) ≡ 0, t ≥ 0.¹⁴ Therefore, in what follows, we assume that inequality (2.3.17) is satisfied. We also assume that qu_m > r. □
2.3.3. The solution of problem (2.3.14), (2.3.16). If we define the function F(x) of the maximum future profit by the relation

$$F(x) = \max_{0\le u(t)\le u_m}\Bigl[\int_0^\infty e^{-\delta t}\bigl(pqx(t) - c\bigr)u(t)\,dt \;\Big|\; x(0) = x\Bigr], \eqno(2.3.18)$$

then, using the standard procedure described in §1.3, we obtain the Bellman equation

$$\max_{0\le u\le u_m}\Bigl\{\Bigl[rx\Bigl(1 - \frac{x}{K}\Bigr) - qux\Bigr]\frac{dF}{dx} - \delta F + (pqx - c)u\Bigr\} = 0 \eqno(2.3.19)$$

corresponding to problem (2.3.14), (2.3.16).
corresponding to problem (2.3.14), (2.3.16). It follows from Eq. (2.3.19) that, depending on the current state (the
population size) x of the system (2.3.14), to perform the optimal control we need to choose u*(x) = 0 for all points x £ R1 C R+ at which the function
p(x) =pqx-c-qx—— ax
(2.3.20)
is negative. Conversely, at all points x G R2 C R+ where
need to take the maximum admissible control ut(x) = um. If tp(xt) = 0 at a point cc* (in view of continuity, the point x* separating R1 from R2 is the limit point of these domains), then the optimal control u f ( x * ) for Eq. (2.3.19) is formally undetermined. However, one can see that the choice
of any admissible control 0 < u < um at the point x* does not affect the solution of Eq. (2.3.19).
Now let us consider how the population size in system (2.3.14) varies with time. Let x(0) = x₀ < x₁. Obviously, in this case there exists an initial half-interval [0, t_*) at all points of which we must set u_*(t) = 0. This statement immediately follows from the fact that the expression pqx − c in the parentheses in (2.3.16) is negative for x(t) close to x₀. Thus, we have

¹⁴Since x₁ > K, we have pqx(t) − c < 0 for all t.
x(t) ∈ R¹ for all t ∈ [0, t_*). Hence, it follows from Eq. (2.3.14) with u = 0 that, on the interval [0, t_*), the population size x(t) increases monotonically up to the value x_* = x(t_*) that separates the sets R¹ and R². At the point x_*, as was already noted, the control may take any admissible value. It is expedient to take this value equal to

$$u_1 = u(x_*) = \frac{r}{q}\Bigl(1 - \frac{x_*}{K}\Bigr) \eqno(2.3.21)$$

and keep it constant for t > t_*. It follows from (2.3.14) that the control (2.3.21) preserves the population size x_*.

REMARK. For u(x_*) ≠ u₁, the representative point of system (2.3.14), starting from the state x_*, comes either to the set R¹ (for u(x_*) > u₁) or to the set R² (for u(x_*) < u₁) during an infinitely small time interval. Then the control u = 0 (or u = u_m) immediately returns the representative point to the state x_*. Thus, for u(x_*) ≠ u₁, the population size x_* is preserved by infinitely rapid switchings of the control (this is the sliding mode). Although, as follows from (2.3.19), the value of the functional (2.3.16) for this control remains the same as for u(x_*) = u₁, the constant control u(t) = u(x_*) = u₁, t > t_*, is more convenient, since in this case the existence problem does not arise for the solution x(t), t > t_*, of Eq. (2.3.14). The optimal control

$$u_*(t) = \begin{cases} 0 & \text{for } 0 \le t < t_* \quad (x(t) < x_*),\\[4pt] \dfrac{r}{q}\Bigl(1 - \dfrac{x_*}{K}\Bigr) & \text{for } t \ge t_* \quad (x(t) = x_*) \end{cases} \eqno(2.3.22)$$
realizes the generalized solution x_*(t) of Eq. (2.3.14) in the Filippov sense (see §1.1). □

Thus, for x(0) = x₀ < x₁ the optimal control (2.3.22) is a piecewise constant function shown in Fig. 21 together with the plot of the function x_*(t), which shows the change of the population size corresponding to this control. It remains to find the moment t_* at which the catching of individuals starts or, which is the same, the size (density) x_* = x(t_*) of the population that we need to keep constant in the region of active catching. These variables can readily be obtained by calculating the functional (2.3.16) and taking its maximum with respect to t_*. Indeed, for the control (2.3.22), the functional (2.3.16) is equal to

$$I = e^{-\delta t_*}\,\frac{r}{\delta q}\bigl(pqx_* - c\bigr)\Bigl(1 - \frac{x_*}{K}\Bigr). \eqno(2.3.23)$$

We can calculate its maximum with respect to t_* by using the fact that x_* = x_*(t_*), as a function of t_*, satisfies Eq. (2.3.14) with u = 0. After the
FIG. 21

differentiation, from the extremum condition dI/dt_* = 0, we obtain the following equation for the optimal size x_* of the population:

$$x_*^2 - \frac{1}{2}\Bigl[\frac{c}{pq} + \Bigl(1 - \frac{\delta}{r}\Bigr)K\Bigr]x_* - \frac{c\delta K}{2pqr} = 0. \eqno(2.3.24)$$
This equation has only one positive solution,

$$x_* = \frac{1}{4}\Bigl[\frac{c}{pq} + \Bigl(1 - \frac{\delta}{r}\Bigr)K\Bigr] + \sqrt{\frac{1}{16}\Bigl[\frac{c}{pq} + \Bigl(1 - \frac{\delta}{r}\Bigr)K\Bigr]^2 + \frac{c\delta K}{2pqr}}, \eqno(2.3.25)$$

which has a physical meaning. Note that, in view of (2.3.17), the value x_* determined by (2.3.25) always satisfies the inequality x₁ < x_* < K.
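A numerical check of (2.3.24)–(2.3.25) with illustrative parameters satisfying (2.3.17):

```python
import math

# Illustrative parameters satisfying (2.3.17): x1 = c/(p*q) < K
r, K, q, p, c, delta = 1.0, 100.0, 0.1, 2.0, 5.0, 0.05
x1 = c / (p * q)                                   # = 25 < K

# (2.3.24) in the form x^2 - b*x - e = 0, with the positive root (2.3.25)
b = 0.5 * (c / (p * q) + (1 - delta / r) * K)
e = c * delta * K / (2 * p * q * r)
x_star = 0.5 * (b + math.sqrt(b * b + 4 * e))

assert abs(x_star ** 2 - b * x_star - e) < 1e-8    # solves (2.3.24)
assert x1 < x_star < K                             # as stated in the text
```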
We also note that the condition x₀ < x₁, introduced for the sake of clarity, does not influence the
choice of the optimal control. This strategy is completely determined by x_*: according to it, we do not catch individuals while the current population density x(t) < x_*, and we start catching with the constant intensity (2.3.21) when the population size attains the value x_* given by (2.3.25).

We can readily calculate the profit function (2.3.18) corresponding to this strategy. Integrating the equation in (2.3.14) with x(0) = x and u = 0, we obtain

$$x(t) = \frac{xK}{x + (K - x)e^{-rt}}, \qquad t \ge 0. \eqno(2.3.26)$$

Using (2.3.26), we see that the condition

$$\frac{xK}{x + (K - x)e^{-rt_*}} = x_* \eqno(2.3.27)$$

allows us to find the moment t_*. From (2.3.23) and (2.3.27), we explicitly calculate the profit function

$$F(x) = \frac{r}{\delta q}\bigl(pqx_* - c\bigr)\Bigl(1 - \frac{x_*}{K}\Bigr)\Bigl[\frac{x(K - x_*)}{x_*(K - x)}\Bigr]^{\delta/r} \eqno(2.3.28)$$
for x ≤ x_*. To solve problem (2.3.14), (2.3.16) completely, it remains to consider the case x(0) = x₀ > x_*, that is, the case where the initial population size is larger than the optimal size (2.3.25). First, we note that, in view of (2.3.28), the profit function F(x) monotonically increases on the interval 0 ≤ x ≤ x_* from zero to the maximum value

$$F(x_*) = \frac{r}{\delta q}\bigl(pqx_* - c\bigr)\Bigl(1 - \frac{x_*}{K}\Bigr). \eqno(2.3.29)$$
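The closed-form profit (2.3.28)–(2.3.29) can be cross-checked by directly evaluating the functional (2.3.16) along the trajectory generated by the strategy (2.3.22). The following rough Euler sketch uses illustrative parameters:

```python
import math

# Illustrative parameters; x_* is taken from (2.3.25)
r, K, q, p, c, delta = 1.0, 100.0, 0.1, 2.0, 5.0, 0.05
b = 0.5 * (c / (p * q) + (1 - delta / r) * K)
e = c * delta * K / (2 * p * q * r)
x_star = 0.5 * (b + math.sqrt(b * b + 4 * e))

def F_closed(x):
    """Profit function (2.3.28) for x <= x_*."""
    base = (r / (delta * q)) * (p * q * x_star - c) * (1 - x_star / K)
    return base * (x * (K - x_star) / (x_star * (K - x))) ** (delta / r)

def F_direct(x, dt=1e-3, T=400.0):
    """Euler evaluation of (2.3.16) under the strategy (2.3.22)."""
    u1 = (r / q) * (1 - x_star / K)            # holding control (2.3.21)
    profit, disc, decay = 0.0, 1.0, math.exp(-delta * dt)
    for _ in range(int(T / dt)):
        u = 0.0 if x < x_star else u1          # wait, then hold at x_*
        profit += disc * (p * q * x - c) * u * dt
        x += dt * (r * (1 - x / K) * x - q * u * x)
        disc *= decay
    return profit

x0 = 30.0
assert abs(F_direct(x0) - F_closed(x0)) < 0.01 * F_closed(x0)
```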
We also note that the function

$$\psi(x) = \frac{r}{\delta q}\bigl(pqx - c\bigr)\Bigl(1 - \frac{x}{K}\Bigr)$$

(the value of the functional (2.3.16) obtained by holding the population at the constant level x) has only one maximum point

$$x_2 = \frac{1}{2}\Bigl(K + \frac{c}{pq}\Bigr),$$

and since the "killing" factor δ in (2.3.16) is strictly positive, we always have the strict inequality x₂ > x_*. Now if x(0) = x₀ = x₂ > x_*, then using the constant control

$$u(x_2) = \frac{r}{q}\Bigl(1 - \frac{x_2}{K}\Bigr), \eqno(2.3.30)$$
we can keep the population size at the level x₂, for which the functional (2.3.16) attains the value I(u(x₂)) = ψ(x₂). However, the constant control (2.3.30) is not optimal. One can readily see that the functional (2.3.16) takes values larger than I(u(x₂)) = ψ(x₂) if, instead of (2.3.30), we use the piecewise constant control function

$$u_\Delta(t) = \begin{cases} u_m & \text{for } 0 \le t < \Delta,\\[4pt] \dfrac{r}{q}\Bigl(1 - \dfrac{x_*}{K}\Bigr) & \text{for } t \ge \Delta, \end{cases} \eqno(2.3.31)$$

shown in Fig. 22.
FIG. 22
We choose the time interval Δ, during which the control u_m is applied, so that at the end of Δ the population size attains the value (2.3.25), that is, x(Δ) = x_*.¹⁵ The interval Δ is determined by the equation

$$x_* = \frac{x_2K(r - qu_m)}{rx_2 + [K(r - qu_m) - rx_2]e^{(qu_m - r)\Delta}}. \eqno(2.3.32)$$
15
The inequality I(u&(t)) > 7(«(a;2)) = *l>(xi) is obtained by calculating the func-
tional (2.3.16) with regard to Eq. (2.3.14), where control has the form (2.3.31). Here we do not perform the corresponding elementary but cumbersome calculations and leave
them to the reader as an exercise.
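The pulse length (2.3.32) can be verified by integrating Eq. (2.3.14) with u = u_m from x₂ and checking that the trajectory reaches x_* at t = Δ. Illustrative parameters with qu_m > r:

```python
import math

# Illustrative parameters with q*um > r; x_* from (2.3.25), x2 the
# maximum point of psi(x)
r, K, q, um = 1.0, 100.0, 0.1, 15.0
p, c, delta = 2.0, 5.0, 0.05
b = 0.5 * (c / (p * q) + (1 - delta / r) * K)
e = c * delta * K / (2 * p * q * r)
x_star = 0.5 * (b + math.sqrt(b * b + 4 * e))
x2 = 0.5 * (K + c / (p * q))                     # x2 > x_*

# solve (2.3.32) for Delta
A = K * (r - q * um)
Delta = math.log((x2 * A / x_star - r * x2) / (A - r * x2)) / (q * um - r)
assert Delta > 0

# integrate x' = r(1 - x/K)x - q*um*x from x2 over [0, Delta]
n = 200_000
dt = Delta / n
x = x2
for _ in range(n):
    x += dt * (r * (1 - x / K) * x - q * um * x)
assert abs(x - x_star) < 1e-2                    # x(Delta) = x_*
```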
Obviously, control functions of the form (2.3.31) can be used not only for the initial population size x(0) = x₂ but also for an arbitrary initial size x(0) = x > x_*. In this case, we must only perform the change x₂ → x in Eq. (2.3.32) for the length Δ of the initial pulse u_m. One can easily verify that (2.3.20) implies φ(x) > 0 for all x > x_*. Therefore, the optimal control as a function of the current population size (the synthesizing function) for problem (2.3.14), (2.3.16) has the form

$$u_*(x) = \begin{cases} 0 & \text{for } 0 \le x < x_*,\\[4pt] \dfrac{r}{q}\Bigl(1 - \dfrac{x_*}{K}\Bigr) & \text{for } x = x_*,\\[4pt] u_m & \text{for } x > x_*, \end{cases} \eqno(2.3.33)$$

where x_* is determined by (2.3.25).

Formula (2.3.33) gives the mathematical expression of the control strategy that is well known in the theory of optimal fisheries management [35, 68]. The key point of this strategy is the existence of an optimal size x_* of the fish population, given by (2.3.25). The goal of the control is to reach the optimal size x_* as soon as possible and then to preserve it by using the constant control (2.3.21). This control strategy maximizes the profit obtained by fishing if the profit is estimated by the functional (2.3.16).

In conclusion, we note that the results presented in this section can be generalized to the case in which the dynamics of the fish population is subject to the retarded equation (equation with delay)

$$\dot x(t) = r\Bigl(1 - \frac{x(t - h)}{K}\Bigr)x(t) - qu(t)x(t),$$

that is, to the controlled Hutchinson model. For the results related to this case, see [99].

The stochastic version of problem (2.3.14), (2.3.16), where the behavior of the population is described by the stochastic equation (2.3.13), will be considered in §6.3.

§2.4. Stochastic problem of optimal fisheries management
Now let us consider a problem of optimal fisheries management that differs from the problem considered in §2.3 by the stochastic character of the model used to describe the population dynamics. We assume that the behavior of the fish population is subject to the stochastic differential equation

$$\dot x(t) = [r - qu(t)]x(t) + \sqrt{2B}\,x(t)\xi(t), \qquad x(0) = x_0, \quad 0 \le u(t) \le u_m, \quad t \ge 0, \eqno(2.4.1)$$
where ξ(t) is a scalar Gaussian white noise (1.1.31), B > 0 is a given positive number, and the natural growth factor r > 0 and the catchability coefficient q > 0 have the same meaning as the similar coefficients in (2.3.10), (2.3.13), and (2.3.14). Equation (2.4.1) is a special case (as K → ∞) of Eq. (2.3.13), and, in accordance with the classification presented in Section 2.3.1, the model described by Eq. (2.4.1) can be called a controlled stochastic Malthus model.

Just as in §2.3, the size x(t) of the fish population is controlled by catching a part of this population. The catching intensity u(t) has an upper bound u_m, and therefore the set of all nonnegative measurable bounded functions u(t): [0, ∞) → [0, u_m] is considered as the set of admissible controls. The goal of control is to maximize the functional (2.3.16), which, in view of the random character of the functions x(t) and u(t), is replaced by the corresponding mean value. As a result, we have the problem

$$I(u) = \mathsf{E}\Bigl[\int_0^\infty e^{-\delta t}\bigl(pqx(t) - c\bigr)u(t)\,dt\Bigr] \to \max_{u(t)\in[0,u_m],\ t\ge 0}. \eqno(2.4.2)$$
In what follows, we assume that the decay index δ in (2.4.2) satisfies the condition δ > r.

We shall solve problem (2.4.1), (2.4.2) by using the standard procedure of the dynamic programming approach described in §1.4. We define the profit function for problem (2.4.1), (2.4.2) by the relation

$$F(x) = \max_{u(t)\in[0,u_m],\ t\ge 0} \mathsf{E}\Bigl[\int_0^\infty e^{-\delta t}\bigl(pqx(t) - c\bigr)u(t)\,dt \;\Big|\; x(0) = x\Bigr], \eqno(2.4.3)$$

where E[(·) | x(0) = x] denotes the conditional mathematical expectation of (·). As was shown in [113, 175], the second-order derivative of the profit function (2.4.3) is continuous. It follows from Theorem 3.1.5 in [113] that for all x ∈ R₊ = [0, ∞) this function has the upper bound

$$F(x) \le N(1 + x), \eqno(2.4.4)$$

where N > 0 is a constant.
The Bellman equation ($F_x = dF/dx$, $F_{xx} = d^2F/dx^2$)

$$\max_{0\le u\le u_m}\bigl[Bx^2F_{xx} + (r + B - qu)xF_x - \delta F + (pqx - c)u\bigr] = 0 \eqno(2.4.5)$$

for the profit function (2.4.3) can be obtained in the usual way (see §1.4). It should be pointed out that a symmetrized stochastic integral (see [174] and §1.2)
was used for writing (2.4.5); this leads to the additional term B in the parentheses in (2.4.5), that is, in the coefficient of xF_x.¹⁶

Equation (2.4.5) allows us to find the optimal control u_* as a function u_*(x) of the current state of system (2.4.1). First, we note that, according to (2.4.5), the set of all admissible states of (2.4.1) can be divided into the following two subsets (just as in §2.3): the subset R¹, where φ(x) = pqx − c − qxF_x < 0 and u_*(x) = 0, and the subset R², where φ(x) > 0 and u_*(x) = u_m. The boundary between these two subsets is determined by the relation

$$pqx - c - qxF_x = 0. \eqno(2.4.6)$$

Further calculations show that, in this problem, there exists a unique point x_* satisfying (2.4.6). Therefore, the subsets R¹ and R² are the intervals R¹ = [0, x_*) and R² = (x_*, ∞). Thus the optimal control in the synthesis form u_* = u_*(x) is uniquely determined at all points x ∈ R₊ except for the point x_*. It follows from (2.4.5) that we can use any admissible control u(x_*) ∈ [0, u_m] at the point x_*. Therefore, the optimal control function u_*(x) can be represented in the form

$$u_*(x) = \begin{cases} 0 & \text{if } x < x_*,\\ u_m & \text{if } x > x_*, \end{cases} \eqno(2.4.7)$$
and the final solution of the synthesis problem reduces to calculating the coordinate of the switch point x_*.

To calculate x_*, we need to solve the Bellman equation (2.4.5). As was already noted, the second-order derivative of the profit function F(x) is continuous; thus the profit function F(x) satisfying (2.4.5) can be obtained by using the matching method (see §2.2). In what follows, we describe in detail the procedure for solving the Bellman equation (2.4.5) and calculating the coordinate of the switch point x_*.

By F¹(x) and F²(x) we denote the profit function F(x) on the intervals R¹ = [0, x_*) and R² = (x_*, ∞). It follows from (2.4.5) and (2.4.7) that the functions F¹ and F² satisfy the linear equations

$$Bx^2F^1_{xx} + (r + B)xF^1_x - \delta F^1 = 0, \qquad 0 < x < x_*, \eqno(2.4.8)$$

$$Bx^2F^2_{xx} + (r + B - qu_m)xF^2_x - \delta F^2 + (pqx - c)u_m = 0, \qquad x > x_*. \eqno(2.4.9)$$

¹⁶If the stochastic differential equation in (2.4.1) is understood as the Ito equation, then the second term in the Bellman equation (2.4.5) has the form (r − qu)xF_x.
Since the profit function F(x) is sufficiently smooth (recall that the second-order derivative of F(x) is continuous), both functions F¹ and F² must satisfy condition (2.4.6) at the switch point x_*. Taking into account the fact that F(0) = 0 by (2.4.1) and (2.4.3), we have the following additional boundary condition for the function F¹(x):

$$F^1(0) = 0. \eqno(2.4.10)$$
The boundary conditions (2.4.6) and (2.4.10) and the upper bound (2.4.4) allow us to obtain the explicit analytic solution of Eqs. (2.4.8) and (2.4.9). Equation (2.4.8) is the well-known homogeneous Euler equation. Its general solution has the form

$$F^1(x) = A_1x^{k_1} + A_2x^{k_2}, \eqno(2.4.11)$$

where A₁ and A₂ are constants and k₁ and k₂ satisfy the characteristic equation

$$Bk(k - 1) + (r + B)k - \delta = 0. \eqno(2.4.12)$$

The constants A₁ and A₂ are determined by the two boundary conditions (2.4.10) and (2.4.6) at the points x = 0 and x = x_*. The roots of Eq. (2.4.12),

$$k_{1,2} = \frac{1}{2B}\bigl[\pm\sqrt{r^2 + 4B\delta} - r\bigr], \eqno(2.4.13)$$

have opposite signs; hence, to satisfy condition (2.4.10), we need to set A₂ = 0 in (2.4.11). The constant A₁ can be calculated by substituting F¹(x) = A₁x^{k₁} into (2.4.6) and taking into account the fact that (2.4.6) is valid at the switch point x_*. Thus, the solution of Eq. (2.4.8) is given by the formula

$$F^1(x) = \frac{pqx_* - c}{qk_1}\Bigl(\frac{x}{x_*}\Bigr)^{k_1}, \qquad 0 \le x \le x_*. \eqno(2.4.14)$$
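The properties of the roots (2.4.13) used above are easy to confirm numerically (illustrative parameters with δ > r):

```python
import math

# Illustrative parameters with delta > r
r, delta = 1.0, 1.5

def roots(B):
    """Roots (2.4.13) of B k(k-1) + (r+B)k - delta = 0, i.e. B k^2 + r k - delta = 0."""
    d = math.sqrt(r * r + 4 * B * delta)
    return (d - r) / (2 * B), -(d + r) / (2 * B)

for B in (0.5, 1.0, 2.0):
    k1, k2 = roots(B)
    # both satisfy the characteristic equation (2.4.12) ...
    assert abs(B * k1 * (k1 - 1) + (r + B) * k1 - delta) < 1e-12
    # ... and have opposite signs, which forces A2 = 0 in (2.4.11)
    assert k1 > 0 > k2

# as B -> 0 the positive root tends to delta/r (used later in (2.4.21))
k1_small, _ = roots(1e-8)
assert abs(k1_small - delta / r) < 1e-6
```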
The inhomogeneous Euler equation (2.4.9) can be solved in a similar way. By using the standard method of variation of parameters, we obtain the general solution

$$F^2(x) = A_3x^{k_1^0} + A_4x^{k_2^0} + \frac{pqu_mx}{\delta - r - B + qu_m} - \frac{cu_m}{\delta}, \eqno(2.4.15)$$

where

$$k_1^0 = \frac{1}{2B}\bigl[qu_m - r + \sqrt{(qu_m - r)^2 + 4B\delta}\bigr], \qquad k_2^0 = \frac{1}{2B}\bigl[qu_m - r - \sqrt{(qu_m - r)^2 + 4B\delta}\bigr] \eqno(2.4.16)$$

satisfy the characteristic equation

$$Bk^2 - (qu_m - r)k - \delta = 0,$$

and A₃ and A₄ are arbitrary constants.

Since k₁⁰ is positive, we must set the constant A₃ equal to zero (otherwise, formula (2.4.15) contradicts the upper bound (2.4.4)). The constant A₄ can be calculated from condition (2.4.6) at the switch point x_*. Substituting F²(x) (determined by (2.4.15) with A₃ = 0) into (2.4.6) instead of the function F, we obtain

$$A_4 = \frac{1}{k_2^0}\Bigl[\frac{(\delta - r - B)px_*}{\delta - r - B + qu_m} - \frac{c}{q}\Bigr]x_*^{-k_2^0}.$$

This implies the following expression for the function F²(x):

$$F^2(x) = \frac{pqu_mx}{\delta - r - B + qu_m} - \frac{cu_m}{\delta} + \frac{1}{k_2^0}\Bigl[\frac{(\delta - r - B)px_*}{\delta - r - B + qu_m} - \frac{c}{q}\Bigr]\Bigl(\frac{x}{x_*}\Bigr)^{k_2^0}, \qquad x \ge x_*. \eqno(2.4.17)$$
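The matching construction can be tested numerically: with x_* taken from (2.4.20) below, the branches (2.4.14) and (2.4.17), in the form reconstructed here, agree at x_* together with their first and second derivatives. All parameter values are illustrative assumptions.

```python
import math

# Illustrative parameters; D = delta - r - B + q*um is assumed positive
r, delta, B, q, um, p, c = 1.0, 1.5, 0.4, 0.1, 15.0, 2.0, 5.0
D = delta - r - B + q * um

k1 = (math.sqrt(r * r + 4 * B * delta) - r) / (2 * B)                        # (2.4.13)
k20 = (q * um - r - math.sqrt((q * um - r) ** 2 + 4 * B * delta)) / (2 * B)  # (2.4.16)

# switch point (2.4.20)
x_star = c * D / (p * q * (delta - r - B + (k1 - 1) / (k1 - k20) * q * um))
assert x_star > 0

def F1(x):          # branch (2.4.14)
    return (p * q * x_star - c) / (q * k1) * (x / x_star) ** k1

A4 = ((delta - r - B) * p * x_star / D - c / q) / k20
def F2(x):          # branch (2.4.17) with A3 = 0
    return p * q * um * x / D - c * um / delta + A4 * (x / x_star) ** k20

def d1(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-2):
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

assert abs(F1(x_star) - F2(x_star)) < 1e-6          # continuity (2.4.18)
assert abs(d1(F1, x_star) - d1(F2, x_star)) < 1e-6  # smooth fit via (2.4.6)
assert abs(d2(F1, x_star) - d2(F2, x_star)) < 1e-6  # condition (2.4.19)
```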
The two functions F¹(x) (2.4.14) and F²(x) (2.4.17) determine the profit function F(x) satisfying the Bellman equation (2.4.5) for all x ∈ R₊ = [0, ∞). These functions contain the parameter x_*, which remains unknown. We can calculate x_* by using the continuity property of the profit function F(x). Each of the functions F¹ and F² is continuous. Hence, to ensure the continuity of F(x), it suffices to satisfy the condition

$$F^1(x_*) = F^2(x_*) \eqno(2.4.18)$$

at the switch point x_*. It follows from (2.4.6), (2.4.8), and (2.4.9) that (2.4.18) is equivalent to the condition

$$F^1_{xx}(x_*) = F^2_{xx}(x_*) \eqno(2.4.19)$$

at the switch point x_*.
Calculating the second-order derivatives of the functions (2.4.14) and (2.4.17), we derive the following equation for x_* from (2.4.19):

$$(k_1 - 1)(pqx_* - c) = (k_2^0 - 1)\Bigl[\frac{(\delta - r - B)pqx_*}{\delta - r - B + qu_m} - c\Bigr].$$

Hence, the switch point x_* is determined by the explicit formula

$$x_* = \frac{c(\delta - r - B + qu_m)}{pq\Bigl[\delta - r - B + \dfrac{k_1 - 1}{k_1 - k_2^0}\,qu_m\Bigr]}. \eqno(2.4.20)$$

Formula (2.4.20) and the optimal control algorithm (2.4.7) constitute
the complete analytic solution of the stochastic problem (2.4.1), (2.4.2) of optimal fisheries management. Some final comments and remarks. It is of interest to compare (2.4.20) and (2.3.25), which is the optimal size of the population in the deterministic problem (2.3.14), (2.3.16) of optimal control considered in §2.3. Denoting (2.3.25) by xf, we may expect that the equality
lim x* = lim xt
K-s-oo
B-X)
(2.4.21)
is valid due to continuity reasons (since the deterministic version of problem (2.4.1), (2.4.2) formally coincides with problem (2.3.14), (2.3.16) as K ->• oo). We can verify (2.4.21) by straightforward calculations of the limits on both sides. Indeed, using (2.3.25), we readily calculate the limit on the left-hand side of (2.4.21) for S > r,
lim_{K→∞} x_f = cδ / (pq(δ − r)).   (2.4.22)
The same result is obtained by calculating the limit of (2.4.20) as B → 0, since

lim_{B→0} (k1 − 1)/(k1 − k2) = (δ − r)(qu_m − r)/(δ qu_m),

which follows from (2.4.13) and (2.4.16). Formula (2.4.21) shows how the results obtained in this section for problem (2.4.1), (2.4.2) are related to similar results for problem (2.3.14), (2.3.16) obtained in Section 2.3.3 by quite different methods.
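The limit relation (2.4.21) lends itself to a quick numerical check. The sketch below is a hypothetical verification script, not part of the book: it assumes the reconstructed formula (2.4.20), with k1 taken as the positive root of Bk² + rk − δ = 0 (the region without fishing) and k2 as the negative root of Bk² + (r − qu_m)k − δ = 0 (the region with u = u_m); all parameter values are illustrative.

```python
import math

def k_roots(B, drift, delta):
    # roots of B*k**2 + drift*k - delta = 0
    disc = math.sqrt(drift**2 + 4.0*B*delta)
    return (-drift + disc)/(2.0*B), (-drift - disc)/(2.0*B)

def x_star(B, r, delta, q, um, p, c):
    # reconstructed switch point (2.4.20); k1, k2 conventions as in the lead-in
    k1 = k_roots(B, r, delta)[0]          # positive root, no-fishing region
    k2 = k_roots(B, r - q*um, delta)[1]   # negative root, region with u = um
    frac = (k1 - 1.0)/(k1 - k2)
    return c*(delta - r - B + q*um)/(p*q*(delta - r - B + frac*q*um))

r, delta, q, um, p, c = 0.1, 0.3, 1.0, 0.5, 2.0, 0.4
limit = c*delta/(p*q*(delta - r))         # deterministic value (2.4.22)
print(x_star(1e-8, r, delta, q, um, p, c), limit)
```

For B → 0 the computed switch point approaches the deterministic value cδ/(pq(δ − r)), which supports the reconstruction.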
There is another interesting specific feature of problem (2.4.1), (2.4.2). Namely, the standard "classical" approach of dynamic programming
Exact Methods for Synthesis Problems
139
that leads to the exact solution of the stochastic problem (2.4.1), (2.4.2) does not allow us to solve the synthesis problem (that is, to find the switch point x_f) for the deterministic version of problem (2.4.1), (2.4.2), that is, in the case where there are no random perturbations in Eq. (2.4.1). This fact can readily be verified if we consider the deterministic version of the
Bellman equation (2.4.5),

max_{0≤u≤u_m} [ (r − qu)xF_x − δF + (pqx − c)u ] = 0,   (2.4.23)

and calculate the functions
F1(x) = (1/α)(px* − c/q)(x/x*)^α,   α = δ/r,   (2.4.24)

F2(x) = (pq u_m x)/(δ − r + qu_m) − (c u_m)/δ + ((r − qu_m)/δ)[ (δ − r)p x*/(δ − r + qu_m) − c/q ](x/x*)^{δ/(r−qu_m)},   (2.4.25)
which, in this case, determine the profit function F(x) on the intervals R1 = [0, x*) and R2 = [x*, ∞). Contrary to the stochastic case, in which the continuity condition (2.4.18) for the functions (2.4.14) and (2.4.17) determines the unique switch point (2.4.20), one can readily verify that the same continuity condition F1(x*) = F2(x*) for the functions (2.4.24) and (2.4.25) holds for any point x* ∈ (0, ∞).
Therefore, the control problem considered can serve as an example illustrating the well-known idea (see [113, 175]) that the dynamic programming approach is better suited for solving control problems with stochastic models of plants (which, by the way, describe the actual reality more adequately).

REMARK. If the equation in (2.4.1) is understood as the Ito stochastic equation, then the Bellman equation for problem (2.4.1), (2.4.2) differs from (2.4.5) and has the form

max_{0≤u≤u_m} [ Bx²F_xx + (r − qu)xF_x − δF + (pqx − c)u ] = 0.
The way of solving this equation is quite similar to the above procedure for solving Eq. (2.4.5). However, the population size x* that determines the switch point for the optimal control (2.4.7) differs from (2.4.20) and is
given by the expressions

x* = c(δ − r + qu_m) / ( pq[ δ − r + ((k1 − 1)/(k1 − k2)) qu_m ] ),

k2 = [ B + qu_m − r − √((B + qu_m − r)² + 4δB) ] / (2B).   □
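The degeneracy of the deterministic problem asserted above (the continuity condition holds at every x* ∈ (0, ∞)) can be confirmed numerically. The following hypothetical sketch assumes the reconstructed forms (2.4.24), (2.4.25) with α = δ/r; all parameter values are illustrative.

```python
def F1(x, xs, p, c, q, r, delta):
    # deterministic profit function on [0, x*), reconstructed (2.4.24)
    a = delta/r
    return (p*xs - c/q)/a * (x/xs)**a

def F2(x, xs, p, c, q, r, delta, um):
    # deterministic profit function on [x*, oo), reconstructed (2.4.25)
    A = delta - r + q*um
    beta = delta/(r - q*um)                   # exponent on the fishing region
    D = (r - q*um)/delta * (p*(delta - r)*xs/A - c/q)
    return p*q*um*x/A - c*um/delta + D*(x/xs)**beta

p, c, q, r, delta, um = 2.0, 0.4, 1.0, 0.1, 0.3, 0.5
for xs in (0.2, 0.7, 1.3):                    # arbitrary trial switch points
    print(xs, F1(xs, xs, p, c, q, r, delta), F2(xs, xs, p, c, q, r, delta, um))
```

The two values coincide for every trial x*, illustrating that in the deterministic case the continuity condition does not single out a switch point.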
CHAPTER III
APPROXIMATE SYNTHESIS OF STOCHASTIC CONTROL SYSTEMS WITH SMALL CONTROL ACTIONS
Various approximate synthesis methods can be useful if the Bellman equation cannot be solved exactly. Chapters III-VI deal with some of these
methods. Approximate methods are usually efficient if the initial statement of the optimal control problem contains a small parameter. Quasioptimal control algorithms are constructed by using either the corresponding procedures of successive approximations or asymptotic expansions of the loss function in powers of a small parameter of the problem. The choice of a method for constructing an approximate solution of the synthesis problem essentially
depends on the choice of a parameter that is considered to be small. For example, in this chapter, the values of control actions are assumed to be small. Chapter IV is about the Bellman equation with small diffusion coefficients. In Chapter V, we consider control problems for oscillating systems with small attenuation decrement. In Chapter VI, the role of small parameters is played by the a posteriori covariances of unknown coefficients in the plant equations. Let us formulate the main idea of the approximate synthesis method studied in this chapter. As was already noted, the method is based on the assumption that control actions on the plant P are relatively small. From
the physical viewpoint, this assumption means that the effect of the control actions on the phase trajectories of the system is small, and therefore the system dynamics is similar to noncontrolled motion. In particular, this
assumption holds for control problems with constraints if the noises acting on the plant are of large intensity. Indeed, let us assume that the uncontrolled (unperturbed) plant is a stable
mechanical system. Then large random perturbations lead to large deviations of the system from the equilibrium state. In this case, some "internal" inertial and elastic forces arise in the system. These forces can significantly
exceed the (bounded) control forces, whose effects on the system turn out to be relatively small.¹
¹Note that in this book we do not consider deterministic synthesis problems for
From the formal mathematical viewpoint, the fact that control actions are small leads to a small parameter in the nonlinear term of the Bellman equation. To verify this fact, let us consider the synthesis problem for the servomechanism (Fig. 10) governed by the Bellman equation (1.4.21). Assume that the dimensions of the region U of admissible controls are bounded by a small value of order ε. For definiteness, we assume that U is either an r-dimensional parallelepiped (R^r ⊃ U = {u: |u_i| ≤ u_{mi}, i = 1, 2, ..., r; max_i u_{mi} = ε}) or an r-dimensional ball of radius ε, that is, R^r ⊃ U = {u: Σ_{j=1}^r u_j² ≤ ε²}. In the first case, according to (1.3.22), the solution of the synthesis problem is given by the formula (the control algorithm)
u*(t, x, y) = −{u_{m1}, ..., u_{mr}} sign( Q^T(t) (∂F/∂x)(t, x, y) ),   (3.0.1)
where the vector of partial derivatives ∂F/∂x is calculated by solving the equation

LF(t, x, y) = −c1(x, y) + ε ū_m^T |Q^T(t) (∂F/∂x)(t, x, y)|.   (3.0.2)
Here ū_m denotes the r-vector (column) u_m/ε, and L is the corresponding linear operator from the Bellman equation (1.4.21).²
In the second case (where U is a ball), the optimal control has the form (see (1.3.23))

u*(t, x, y) = −ε Q^T(t)(∂F/∂x) [ (∂F/∂x)^T Q(t)Q^T(t) (∂F/∂x) ]^{−1/2},   (3.0.3)
systems controlled by small forces. Such systems, called weakly controllable in [32], were studied in [32, 137].
²Recall that relations (3.0.1) and (3.0.2) follow from the Bellman equation (1.4.21) with c(x, y, u) = c1(x, y) and A_x(t, x) = a(t, x) + Q(t)u; {u_{m1}, ..., u_{mr}} denotes a diagonal (r × r)-matrix; for a column A with components A1, ..., Ar, the expressions sign A and |A| denote r-columns with components sign A_i and |A_i| (i = 1, ..., r), respectively.
where the vector ∂F/∂x is the gradient of the loss function satisfying the equation

LF(t, x, y) = −c1(x, y) + ε[ (∂F/∂x)^T QQ^T (∂F/∂x) ]^{1/2}.   (3.0.4)
If we denote the nonlinear terms in Eqs. (3.0.2) and (3.0.4) in the same way, then we can write both equations in the form

LF(t, x, y) = −c1(x, y) + εΦ(t, ∂F/∂x),   (3.0.5)
where Φ(t, ∂F/∂x) is a given nonlinear function of its arguments.
As a rule, equations of the type (3.0.5) cannot be solved exactly. However, the presence of a small parameter in the nonlinear term of this equation yields a rather natural way of solving it approximately. To this end, one can use the method of successive approximations in which the zero-order approximation F0(t, x, y) satisfies the equation

LF0 = −c1(x, y)   (3.0.6)
and the successive approximations F_k(t, x, y) can be calculated recurrently by solving the sequence of linear equations

LF_k = −c1(x, y) + εΦ(t, ∂F_{k−1}/∂x),   k = 1, 2, ....   (3.0.7)
If we know the solution F_k(t, x, y) of the equation for the kth approximation (k = 0, 1, ...), then we can perform an approximate synthesis of the controlled system by taking, as the quasioptimal control algorithm, the control (3.0.1) (or (3.0.3)) with the loss function F replaced by F_k:

u_k(t, x, y) = u*(t, x, y)|_{F=F_k}.   (3.0.8)
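The scheme (3.0.6), (3.0.7) is an ordinary fixed-point iteration driven by the small parameter ε. As a toy illustration (not one of the control problems treated in this book), one can take for L a discretized second derivative on (0, 1) with zero boundary data and iterate LF_k = −c + εΦ(F_{k−1}) with Φ(F) = |dF/dx|; every choice below is illustrative.

```python
import numpy as np

n, eps = 200, 0.1
h = 1.0/(n + 1)
x = np.linspace(h, 1 - h, n)
# L = d^2/dx^2 on (0,1) with zero Dirichlet data (tridiagonal matrix)
L = (np.diag(-2.0*np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1))/h**2
c = np.sin(np.pi*x)                      # stand-in for the penalty c1

def Phi(F):                              # nonlinear term |dF/dx| on the grid
    return np.abs(np.gradient(F, h))

F = np.linalg.solve(L, -c)               # zero approximation, cf. (3.0.6)
for k in range(30):                      # successive approximations, cf. (3.0.7)
    F_new = np.linalg.solve(L, -c + eps*Phi(F))
    delta = np.max(np.abs(F_new - F))
    F = F_new
print(delta)
```

Because the nonlinearity enters with the factor ε, the iteration contracts rapidly and the successive corrections delta fall to round-off level.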
In this chapter we consider an approximate method for the synthesis of optimal systems whose "algorithmic" essence is given in formulas (3.0.6)–(3.0.8).³
Needless to say, the practical use of procedure (3.0.6)–(3.0.8) in special problems leads to additional problems of constructivity and efficiency of

³The approximate synthesis algorithm (3.0.6)–(3.0.8) is a modification of the well-known Bellman method of successive approximations [14, 16]. This method was used by W. Fleming for solving some stochastic problems of optimal control [55]. The procedure (3.0.6)–(3.0.8) is a special case of the Bellman method if the trivial strategy u0(t, x, y) = 0 is used as the initial "generating" control strategy in the Bellman method.
this approximate synthesis method. In this chapter we shall discuss these problems in detail. All related material is divided into sections as follows.
First (§§3.1–3.3), we consider some methods for calculating the successive approximations for stationary synthesis problems. We write out approximate solutions (corresponding to the first two approximations) for some special control systems with various types of disturbances affecting the system. In §3.1 and §3.2, we consider random perturbations of the white noise type. In §3.3 the results obtained in §3.1 and §3.2 are generalized to the case of correlated noises. In §3.4 we study nonstationary problems and estimate the error of the approximate synthesis (3.0.6)–(3.0.8) for the first two approximations. In §3.5 we study asymptotic properties of the successive approximations (3.0.7), (3.0.8) as k → ∞. We show that, under some special conditions, the sequence F_k converges as k → ∞ to the exact solution of the Bellman equation, and the corresponding quasioptimal control algorithms (3.0.8) converge to the optimal control u*(t, x, y). In this case, the convergence u_k → u* is understood in the sense of convergence of the values of the functional to be minimized. Finally, in §3.6 the method of successive approximations (3.0.6)–(3.0.8) is used for approximate synthesis of some stochastic control systems with distributed parameters.

§3.1. Approximate solution of stationary synthesis problems
3.1.1. Let us consider the problem of optimal damping of oscillations in a dynamic system subject to random perturbations of the white noise type (Fig. 13). Let the plant P be described by the following system of linear stochastic differential equations with constant coefficients:

ẋ = Ax + Qu + σξ(t).   (3.1.1)

Here x = x(t) is an n-vector (column) of current phase variables of the system (x1(t), ..., xn(t)), u = u(t) is an r-vector (column) of control actions (u1(t), ..., ur(t)), ξ(t) is an n-vector (column) of random perturbations with independent components (ξ1(t), ..., ξn(t)) of the standard white noise type (1.1.31), and A, Q, and σ are given constant matrices of appropriate dimensions. It is required to minimize the optimality criterion

I[u] = E[ ∫_0^T c(x(t)) dt ] → min,   (3.1.2)
where c(x) ≥ 0 is a given convex penalty function attaining its absolute minimum c(0) = 0 at the point x = 0 (the restrictions on c(x) are discussed in detail in §3.4 and §3.5). Let admissible controls be bounded and small. We assume that all components of the control vector u satisfy the conditions

|u_i| ≤ εu_{mi},   i = 1, ..., r,   (3.1.3)

where ε > 0 is a small parameter and u_{m1}, ..., u_{mr} > 0 are given numbers
of order 1. The system shown in Fig. 13 is a special case (the input signal y(t) = 0) of the servomechanism shown in Fig. 10. Therefore, the Bellman equation
for problem (3.1.1)–(3.1.3) readily follows from (1.4.21); taking into account the relations A_y(t, y) = 0, B_y(t, y) = 0, A_x(t, x, u) = Ax + Qu, and c(x, y, u) = c(x), we obtain

∂F(t, x)/∂t + LF(t, x) + min_{u∈U} [ u^T Q^T ∂F(t, x)/∂x ] + c(x) = 0.   (3.1.4)
Here L denotes a linear elliptic operator of the form

L = A_{ij} x_j ∂/∂x_i + (1/2) B_{ij} ∂²/(∂x_i ∂x_j),   (3.1.5)

where, according to (1.4.16), the matrix B = σσ^T and, as usual, the sum in the last expression on the right-hand side of (3.1.5) is taken over repeated indices from 1 to n. It follows from (3.0.1) and (3.0.2) that in this case the optimal control has the form
u*(x) = −ε{u_{m1}, ..., u_{mr}} sign( Q^T ∂F(t, x)/∂x ),   (3.1.6)

where the loss function F(t, x) satisfies the equation

∂F(t, x)/∂t + LF(t, x) = −c(x) + ε ū_m^T |Q^T ∂F(t, x)/∂x|.   (3.1.7)
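The minimization hidden in (3.1.4) is elementary: the linear form u^T g with g = Q^T ∂F/∂x is minimized over the box |u_i| ≤ εu_{mi} coordinatewise, which yields the sign law (3.1.6) and the nonlinear term ε ū_m^T |Q^T ∂F/∂x| in (3.1.7). A quick numerical cross-check against brute force over the box vertices (all matrices random and illustrative):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
eps = 0.2
um = np.array([1.0, 0.5, 2.0])                 # u_m1, u_m2, u_m3
Q = rng.standard_normal((4, 3))
Fx = rng.standard_normal(4)                    # stands for dF/dx
g = Q.T @ Fx

u_star = -eps*um*np.sign(g)                    # closed form, cf. (3.1.6)
# a linear form attains its minimum over a box at a vertex
vals = [np.dot(np.array(s)*eps*um, g) for s in product([-1, 1], repeat=3)]
print(np.dot(u_star, g), min(vals), -eps*np.dot(um, np.abs(g)))
```

All three printed values coincide: the closed-form control achieves the vertex minimum, and the minimal value is exactly −ε ū_m^T |g|.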
Some methods for solving Eq. (3.1.7) will be considered in §3.4. In the present section, we restrict our consideration to stationary operating conditions of the stabilization system in question. It follows from §1.4 that the stationary mode of stabilization (damping) can take place as T → ∞, where T is the terminal instant of the operation interval (the upper
integration limit in (3.1.2)). Obviously, in this case, stationary operating conditions exist only if the unperturbed motion of the plant P is stable, that is, in other words, if the real parts of the eigenvalues of the matrix A in (3.1.1) are negative. In what follows, we assume that these conditions are satisfied.
If we define the stationary loss function f(x) by the relation (see (1.4.29), (2.2.9))

f(x) = lim_{T→∞} [ F(t, x) − γ(T − t) ],

then (3.1.7) implies the following time-invariant equation for f(x):

Lf(x) = γ − c(x) + ε ū_m^T |Q^T df(x)/dx|,   (3.1.8)
where the parameter γ characterizing the stationary "specific losses," together with the function f(x), can be found by solving Eq. (3.1.8).
We shall solve Eq. (3.1.8) by the method of successive approximations. The computational scheme (3.0.6), (3.0.7), applied to the time-invariant equation (3.1.8), leads to the sequence of equations
Lf_0 = γ⁰ − c(x),   (3.1.9)

Lf_k = γ^k − c(x) + ε ū_m^T |Q^T df_{k−1}(x)/dx|,   k = 1, 2, ....   (3.1.10)
It follows from (3.1.9) and (3.1.10) that each time we calculate the next approximation, we need to solve a linear inhomogeneous elliptic equation of the form

Lf(x) = ψ(x).   (3.1.11)

We shall consider a method for solving Eq. (3.1.11) with a given function ψ(x) that is based on an expansion in eigenfunctions of an appropriate Sturm–Liouville problem [179].

3.1.2. The passage to the adjoint equation. Let us consider the operator

L*(·) = −∂/∂x_i (A_{ij} x_j ·) + (1/2) ∂²/(∂x_i ∂x_j)(B_{ij} ·),   (3.1.12)

that is adjoint to the operator (3.1.5). The equation
∂p(t, x)/∂t = L*p(t, x)   (3.1.13)
is the Fokker–Planck equation (1.1.67) for the n-dimensional Gaussian Markov process x(t) [45, 167, 173]. The assumption that the matrix A is stable implies that this process has a stationary density function p0(x) such that

L*p0(x) = 0.   (3.1.14)

In this case the stationary probability density p0(x) has the form

p0(x) = [ (2π)^n det P^{−1} ]^{−1/2} exp[ −(1/2)(x^T P x) ],   (3.1.15)

where P^{−1} is the covariance matrix of the components of the vector x.
We shall present a possible method for solving Eq. (3.1.13) and calculating the matrix P in (3.1.15). The diffusion Markov process x(t) described by (3.1.13) satisfies the system of linear stochastic differential equations

ẋ = Ax + σξ(t),   (3.1.16)
describing the uncontrolled motion of the plant (3.1.1). We pass from x to new variables y related to x by the linear transformation

x = Vy   (3.1.17)

with a nondegenerate matrix V. As a result, instead of (3.1.16), we obtain the following system for the new variables:

ẏ = Āy + σ̄ξ(t),   (3.1.18)

where

Ā = V^{−1}AV,   σ̄ = V^{−1}σ.   (3.1.19)

We choose V so that the matrix Ā is diagonal,

Ā = {λ1, λ2, ..., λn}.   (3.1.20)
As is known [62], such a matrix V always exists and can readily be constructed if the eigenvalues of the matrix A are simple, that is, if the characteristic equation of the matrix A,

det(A − λE) = 0,   (3.1.21)

has distinct roots λ1, λ2, ..., λn. In this case, the columns of the matrix V are the eigenvectors v^i of the matrix A that satisfy the linear equations

Av^i = λ_i v^i,   i = 1, ..., n.   (3.1.22)
The system (3.1.18) can readily be solved in the case of (3.1.20). Indeed, writing (3.1.18) in rows, we obtain
yt = Aiyi + rli(t),
£=l,...,n,
(3.1.23)
where the random functions rjt(i) — &tk(,k(t) are processes of the white noise type and have the characteristics
Eife(*) = 0,
Em(t)rjm(t - T) = BtmS(r)-
£ , m = l , . . . , n . (3.1.24)
Here Btm is an element of the matrix B = cfcfT. obtain
yt(t) = yt0e^-^ +
Solving Eq. (3.1.23), we
e^-r,^') dt',
ylo = W (t 0 ),
(3.1.25)
Jto
and taking into account (3.1.24), derive the following expressions for the means and covariances:
(3.1.26)
which determine the transition probability p(y(t) process y(t). It follows from (3.1.26) that
Ey/ = 0,
Eytym = -
y(to)) of the Gaussian
B m
^ .
(3.1.27)
*t + *m
in the stationary case as t —> oo, since, by assumption, ReA^ < 0 (i = 1, . . . , n) for all roots of the characteristic equation (3.1.21) of the matrix A. It follows from (3.1.27) that the stationary density function po(y) can be written in the form
po(y) =
1
=exp[-i(y 1
r
Py)],
^/(27r)«detP-
where each entry of the matrix P"1 is given by the formula
(3.1.28)
Approximate Synthesis of Stochastic Control Systems
149
The stationary density po(y) satisfies the stationary Fokker-Planck equation
L*Po(y) = --(XiViPo) + ^jj-(BijPo) dyi
2 ayidyj
= 0.
(3.1.30)
Since the random processes y(t) and x(t) are related by the linear transformation (3.1.17), the comparison of (3.1.15) with (3.1.28) yields the formula
P = VTPV,
(3.1.31)
which together with (3.1.29) allows us to calculate the matrix P. Now let us return to the Fokker-Planck equation (3.1.14). If the operator (3.1.5) satisfies the potentiality condition (see §4, Section 5 in [173]), then the operator equality4 POL
= L*PQ
(3.1.32)
readily follows from (3.1.5), (3.1.12), and (3.1.14). However, even if the
potentiality conditions are not satisfied and (3.1.32) does not hold, one can choose an operator L\ satisfying a similar relation PoL
= L\PQ.
(3.1.33)
One can readily see that the operator L\ has the form5
ri = - ( G , ^ +
(^),
(3-1.34)
where the matrix G = ||G,-j||" is similar to the transpose matrix AT from (3.1.1) and (3.1.5),
G=P-1ATP.
(3.1.35)
The similarity transform (3.1.35) employs the matrix P from (3.1.15). Relation (3.1.33) allows us to replace Eq. (3.1.11) by a similar equation
for the dual operator. In other words, it follows from (3.1.11) and (3.1.33) that the problem of finding f ( x ) in (3.1.11) is equivalent to the problem of finding z(x) in the equation
L\z(x) = ip(x),
(3.1.36)
where z(x), i>(x), and the functions /(a;), V>(x) from Eq. (3.1.11) satisfy the relations
z(x)=po(x)f(x), 4
if>(x) = Po(x)if>(x).
(3.1.37)
As usual, the operator equality is understood in the sense that it is an ordinary
r e l a t i o n p g ( x ) L w ( x ) = L*po(x)w(x) for any sufficiently smooth function w(x). 5 The verification of (3.1.33) is left to the reader as an exercise.
150
Chapter III
3.1.3. The solution of equations (3.1.36) and (3.1.11). Let us consider the following problem of finding the eigenfunctions zs (x) and eigenvalues A s of the operator L\ (the Sturm-Liouville problem):
L\zs = \,z,.
(3.1.38)
Since L\ is the Fokker-Planck operator, its eigenfunctions zs must satisfy the zero conditions at infinity (as x —> oo).
By passing from x to new variables y (x = Vy) and acting in a way similar to (3.1.17)-(3.1.31), we can transform the operator (3.1.34) to the form £;
=-(A,.tt)+ I ( B
t f
) ,
(3.1.39)
where Bij is an element of the matrix B = a (TT , "a = V cr, V is a nonde__ _ i __ __ generate matrix such that the transformation V GV makes the matrix G diagonal, G = {Ai, . . ., A n }, and A; are roots of Eq. (3.1.21).6 In the new variables the stationary Fokker-Planck equation has the form
*
- 0-
^-
This equation differs from (3.1.30) only by the matrix of diffusion coefficients; therefore, the stationary probability density p0(y) is determined by the formulas
pQ(y) =
1
_= exp [ - |(j/TPy)] ,
(3.1.41)
y(27r)"detP
1 similar to (3.1.28) and (3.1.29). Differentiating (3.1.40) appropriately many times, we see that
_
+ m 2 A 2 + • • • + mn \n) —^——jr^Po = 0,
(3.1.43)
According fco (3.1.35), the matrix G is similar to the transpose AT. Since all similar and transpose matrices have the same eigenvalues, the characteristic equation det(G — AB) = 0 for the matrix G coincides with (3.1.21)).
Approximate Synthesis of Stochastic Control Systems
151
where mi, m?,..., mn are any arbitrary integers between 0 and oo. It follows from v(3.1.43) that the functions -J^j—Q 1" Pn an-d the num' dy-i ---dyn bers (miAi + • • • + m n A n ) can be treated as the eigenfunctions zs and the
eigenvalues Xs of problem (3.1.38), respectively. By using (3.1.41), we can write the functions zs in more detail as follows:7 Zs
= (_l)™i+"-+™»tf rai ... ra J y )exp [- f(y T Pj/)].
(3.1.44)
Here Hmi,_mn(y) — Hmi...mn(yi: • • -,2/n) denote multidimensional Hermitian polynomials (for instance, see [4]) that, by definition, are equal to Hm^,...,mn(y) = (-l) mi +"+ m « exp [f (j/TPt/)]
"
(3.1.45)
It follows from the general theory [4] for Hermitian polynomials with real variables y that these polynomials form a closed and complete system of functions, and an arbitrary function from a sufficiently large class (these functions grow at infinity not faster than any finite power of \y\) can be expanded in an absolutely and uniformly convergent series in this system of functions. Furthermore, the polynomials H are orthogonal to another
group of Hermitian polynomials G given by the formula
"-|(/i T P"V)]-
(3.1.46)
Here the variables fj, and y satisfy the relation or
=
and the orthogonality condition itself has the form
r00
i"*
J — oo
J — oo
_
MetP
..Sv,m,
(S1/imi is the Kronecker delta). 7
The constant coefficient [(2ir) n detP "'j- 1 / 2 in (3.1.44) is omitted.
(3.1.48)
152
Chapter III
However, we often need to use a complex matrix V for the change of variables x —> y (for instance, see the problem in §3.2.). To pass to complex variables, we need to verify some additional statements from the general
theory [4], which hold for real variables. In particular, it is necessary to verify the orthogonality conditions (3.1.48), which are the most important in practical calculations.
This was verified in [107], where it was shown that all properties of the polynomials H and G remain valid for complex variables if only all functions HVl...Vn(y), Gmi...mn(y), exp[|(yTPy)], and exp[|(/iTP fj,)] are considered as functions of the initial real variables x of the problem. To this end, we need to make the change of variables y = V x in all these functions. In particular, in this case, the orthogonality condition (3.1.48) has the form oo
/
/»oo
... / -oo
e-^xTp^HVl...
J — oo
..£„„„,„,
(3.1.49)
where the matrices PI and P satisfy the relation
~P = VTP1V
(3.1.50)
similar to (3.1.31). Thus, we obtain the following algorithm for constructing the solution f ( x ) of Eq. (3.1.11). First, we seek a stationary density po(x) satisfying (3.1.14) and an operator L\ satisfying (3.1.33). Then we transform prob-
lem (3.1.11) to problem (3.1.36). After this, to find the eigenfunctions and eigenvalues of problem (3.1.38), we need to calculate the matrix V that transforms the matrix G to the diagonal form { A i , . . . , A n } by the siml-
larity transform V GV. Next, using the known A,- and V and (3.1.42), we calculate the matrices P and P that determine the stationary distribution (3.1.41). The expression obtained for p0(y) enables us to find the eigenfunctions zs = zmi...mm (3.1.44) for problem (3.1.38) and the orthogonal polynomials G TOl ... TOn (3.1.46). Finally, we seek the function z(x) satisfying (3.1.36) in the form of the series with respect to the eigenfunctions: oo z x
( ) =
X) mj....m n =0
a
m1...mnZmi. ..mn(x),
(3.1.51)
Approximate Synthesis of Stochastic Control Systems
153
where omi...ran are unknown coefficients; the eigenfunctions zmi,,.mn(x) can be calculated by formulas (3.1.44) with y = V x. If we also represent the right-hand side ij)(x) — po(x)
po(x)
^
bmi...mnzmi...mn(x),
(3.1.52)
where, in view of (3.1.49),
*— r...r
A/det PI (27r) n / 2 mi... !.. .m .,,„„. ^_00 00 n\ J_
Jj.oo _,
T-l i x G r o i ... T O/nT(F~ z)dxi...dz n
(3.1.53)
then we can calculate the unknown coefficients oroi...TOn in (3.1.51) by the formula ^m i...mn
\
ami...mn - T———— ,
\
.
, \
A mi ... mn = Aimi + ••• + Xnmn,
/ o i c x \
(3.1.54)
which follows from (3.1.38) and (3.1.43). Now we see that (3.1.37) implies the expression
for the solution of the initial equation (3.1.11). The algorithm obtained for solving (3.1.11) can be used for calculating
the successive approximations (3.1.9) and (3.1.10) It remains only to solve the problem of how to choose the stationary losses 7^ (k=0,l,2, ... ) in
Eqs. (3.1.9) and (3.1.10). 3.1.4. Calculation of the parameters Vs (k - 0,1,2,...). The structure of the solution (3.1.55) and a natural requirement that the stationary loss function f ( x ) must be finite imply that there is a unique method for choosing 7*. Indeed, since, according to (3.1.54), the eigenvalue AQO...O = 0, the coefficient aoo...o in (3.1.46) is finite if a necessary condition 60o.. .0 = 0 is satisfied, or, more precisely, (in view of (3.1.53) and (3.1.46)) if we have oo
/
/»oo
.../ •oo
J — oo
PQ(X)
154
Chapter III
This relation, (3.1.9) and (3.1.10) imply the following expressions for the stationary losses 7*: OO
/
/»OO
.../ -OO
c(x)pQ(x)dxl...dxn,
(3.1.56)
J — OO
Tdfk-i(x)
Q'
Po(x)dxi...dxn, dx = 1,2,.... (3.1.57)
Thus, we have completely solved the problem of how to calculate the successive approximations (3.1.9), (3.1.10) for the stationary operating conditions
of the optimal stabilization system. If the loss function f k ( x ) in the fcth approximation is calculated, then the quasioptimal control Uk(x) in the feth. approximation is completely defined,
namely, in view of (3.0.8) and (3.1.6), we have
(3.1.58) In the next section, using this general algorithm for approximate synthesis, we shall calculate a special system of optimal damping of random oscillations when the plant is a linear oscillating system with one degree of freedom. §3.2. Calculation of a quasioptimal regulator for the oscillatory plant
In this section we consider the stabilization system shown in Pig. 13, in which the plant P is an oscillatory dynamic system described by the equation
x + /3x + x = u + VB£(t),
(3.2.1)
where the absolute value of the scalar control u is bounded,
\u\ < e,
(3.2.2)
the scalar random process £(t) is the standard white noise (1.1.31), and /?,
B, and s are given positive numbers ((3 < 2). Equations of the type of (3.2.1) describe the motion of a single mass
point under the action of elastic forces, viscous friction, controlling and random perturbations. The same equation describes the dynamics of a direct-current motor controlled by the voltage applied to the armature when
Approximate Synthesis of Stochastic Control Systems
155
the load on the shaft varies randomly. Examples of other actual physical objects described by Eq. (3.2.1) can be found in [2, 19, 27, 136]. For system (3.2.1), (3.2.2), it is required to calculate the optimal regulator (damper) C (see Fig. 13), which will damp, in the best possible way with respect to the mean square error, the oscillations constantly arising
in the system due to random perturbations £(t). More precisely, as the optimality criterion (3.1.2), we shall consider the functional
(x2(t)
=E
x2 (t)) dt] , J
(3.2.3)
which has the meaning of the mean energy of random oscillations in system (3.2.1). Note that the mean square criterion (3.2.3) is used most frequently and this criterion corresponds to the most natural statement of the optimal
damping problem [1, 50]. However, there are other statements of the problem with penalty functions other than the function c(x] = x2 + x2 exploited in (3.2.3). From the viewpoint of the method used here for solving the synthesis problem, the choice of the penalty function is of no fundamental importance. To make the problem (3.2.1)-(3.2.3) consistent with the general statement treated in §3.1, we write Eq. (3.2.1) as the following system of two first-order equations for the phase coordinates x\ and x% (these variables can be considered as the displacement x\ = x and the velocity X2 = x):
(3.2.4)
= —zi —(3x2
Using the vector-matrix notation, we can write system (3.2.4) in the form (3.1.1), where A, Q, and
A=
0
0
Q=
(3.2.5)
According to §3.1, under the stationary operating conditions (T -» oo in (3.2.3)), the desired optimal damper C (Fig. 13) is a relay type regulator described by the equation (see (3.1.6))
u*(xi,x2) = -esign
-— . \OX2'
(3.2.6)
Here / = /(xi^xz) is the loss function satisfying the stationary Bellman equation (see (3.1.8))
df
(3.2.7)
Chapter III
156 where, according to (3.1.5) and (3.2.5),
L — x2-— - (/3x2 +
B d2
-— + IT -5-
(3.2.
ox-2
The equation
21
>.i-j.i.
determines a switching line for the optimal control action (from u — +£ to u = —£ or backwards) on the phase plane (KI, x2). The goal of the present
section is to obtain explicit expressions for the control algorithm (3.2.6) and the switching line (3.2.9). To this end, it is necessary to solve Eq. (3.2.7). We shall solve this equation by the method of successive approximations
discussed in §3.1. First, we shall prepare the mathematical apparatus for calculating the successive approximations. A straightforward verification shows that the stationary distribution with the density function po(x) = po(xi,x2), satisfying the equation (see (3.1.14))
f
=
has the form
(3.2.10) Hence, the matrices P and P~l in (3.1.15) are equal to
2/3 B
B
I 0 0 1
i o o i
(3.2.11)
It follows from (3.2.11) and (3.1.35) that in this case the matrix G of the
operator (3.1.34) coincides with the transpose matrix AT, that is, according to (3.2.5), we have " 0 -1 G= (3.2.12) 1 -0 and the operator (3.1.34) has the form
d
o
B d2 f\ s\ 2
(3.2.13)
One can readily see that the same probability density (3.2.10) satisfies the stationary equation Llpo(xi,x2) = 0. Therefore, in this case, the matrix PI from (3.1.49) and (3.1.50) coincides with the matrix P determined by (3.2.11).
Approximate Synthesis of Stochastic Control Systems
157
The matrix V that reduces (3.2.12) to the diagonal form by the similarity transform is equal to
1
V =
1
—AI
—A 2 (3.2.14)
This expression and formulas (3.1.50) and (3.2.12) imply B
B
Correspondingly, the inverse matrix P
B 2(4-,
2//J
(3.2.15)
-A2
has the form
A2 2//J 2//3 A!
(3.2.16)
The matrices (3.2.15) and (3.2.16) allow us to calculate the two-dimensional Hermitian polynomials .
Gim
r 1 / T7"^" \ T
, e i _,
f)l+m
^
r
1 / T1 ~^ \ i
exp
= (-!)
(3.2.18)
VL = Py.
Then these polynomials must be represented as functions of xi and x? by using the formula x = Vy and expression (3.2.14) for the matrix V. Table 3.2.1 shows some first polynomials H and G. In this case, in view of (3.1.51)-(3.1.55), (3.2.7), (3.2.10), and (3.2.11), the solutions of the equations of successive approximations (3.1.9), (3.1.10) can be written in terms of the Hermitian polynomials H^m(xi,X2) as the
series mA 2 '
(3.2.19) where the coefficients b\m are calculated by the formulas
(3.2.20) £
/V K __
/
-
9
V* ——
^1
9
I
^2 i ^
> 1.
Chapter III
158
TABLE 3.2.1 Polynomials H #00 = 1
-
-
2/3
-H
, p
2/3
//3
\
X2 #10 — -Pnyi + P\iyi — MI — ~TT B »i+ (^-y ) w
-"01 = -i 122/1 + -T22J/2 — M2 — ~^" ±> rr
"p
i
T-T-
n
.
rr
~p
i
'Xl+(P
+js}x,
,,2
-"20 = ~-» 11 + Ml
,,2
-"02 — — •« 22 T P2 #30
— Ml —
~ PllfJ-2
~
#12 =
#03 = A*2 — 3P22/x2 #40 = Ml ~ 3PnPl 2
#31 = MlM2
o-
_ ,,2,,2
rr
3
#"22 — MlM2 3Pl 2 P 2 2
#"13 — PlM2 #04
— fJ-2 ~ 6P22M 2
Polynomials G
GOO = 1
1 2JS 1 2^ Expressions for the polynomials G 2 O j G n , . . . can be obtained from the corresponding expressions for Him by the change Ppq —>• Ppq , fj,p —>• yp. Before we pass to the straightforward calculation of the successive ap-
proximations f k , we make the following two remarks about some singular-
Approximate Synthesis of Stochastic Control Systems
159
ities of the series on the right-hand side of (3.2.19).
REMARK 3.2.1. In practice, the series (3.2.19) is usually replaced by a finite sum. The number of terms of the series (3.2.19) left in this sum is determined by the rate of convergence of the series (3.2.19). Here we do
not discuss this question (see, for example, [26, 166, 179]). However, in our case, the series (3.2.19) cannot be truncated in an arbitrary way, since it contains complex terms such as the polynomials Him and the coefficients
aklm (this follows from (3.2.14)-(3.2.18) and (3.2.20)). At the same time, the loss function /jt(£i, £2) represented by this series has the meaning of a
real function for real arguments. Therefore, truncating the series (3.2.19), we must remember that a finite sum of this series determines a real function
only if the last terms of this sum contain all terms with Him of a certain group (namely, all Htm with l + m = s, where s is the highest order of the
polynomials left in the sum (3.2.19)).
D
REMARK 3.2.2. Equation (3.2.7), as well as the corresponding equations of successive approximations (3.1.9) and (3.1.10), is centrally symmetric (such equations remain unchanged under the substitution (x1, x2) → (−x1, −x2)). Therefore, the series (3.2.19) must not contain terms for which the sum (l + m) is odd, since the polynomials H_lm with odd (l + m) are not centrally symmetric (see Table 3.2.1). If we take this fact into account, then the body of practical calculations is considerably reduced.
D
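The central-symmetry claim is easy to check directly: a two-dimensional Hermite polynomial H_lm satisfies H_lm(−x1, −x2) = (−1)^{l+m} H_lm(x1, x2), so only terms with even total order survive in a centrally symmetric equation. A minimal sketch, using the standard second- and third-order polynomials (assumed to match Table 3.2.1; the values of P_11, P_12, P_22 are arbitrary test numbers):

```python
# Parity check: H_lm(-x1, -x2) = (-1)**(l+m) * H_lm(x1, x2).
P11, P12, P22 = 1.3, 0.4, 2.1  # arbitrary test values of the matrix P

H = {
    (2, 0): lambda u, v: u * u - P11,
    (1, 1): lambda u, v: u * v - P12,
    (0, 2): lambda u, v: v * v - P22,
    (3, 0): lambda u, v: u**3 - 3 * P11 * u,
    (2, 1): lambda u, v: u * u * v - P11 * v - 2 * P12 * u,
    (0, 3): lambda u, v: v**3 - 3 * P22 * v,
}

def parity_ok(l, m, pts):
    h = H[(l, m)]
    return all(abs(h(-u, -v) - (-1) ** (l + m) * h(u, v)) < 1e-12
               for u, v in pts)

pts = [(0.7, -1.2), (2.0, 0.3), (-0.5, 1.9)]
assert all(parity_ok(l, m, pts) for (l, m) in H)
# Hence a centrally symmetric equation can contain no H_lm with odd l + m.
```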
In what follows, we present the first two approximations calculated according to (3.1.9) and (3.1.10) and the quasioptimal control algorithms u0(x1, x2) and u1(x1, x2) corresponding to these approximations.

The zero approximation. First of all, let us calculate the parameter γ⁰ of specific stationary losses in the zero approximation. From (3.1.56) with regard to c(x) = x1² + x2² and (3.2.10), we have

γ⁰ = (β/πB) ∫∫_{−∞}^{∞} (x1² + x2²) exp[−(β/B)(x1² + x2²)] dx1 dx2.

Calculating the integral, we obtain

γ⁰ = B/β.   (3.2.21)
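The value γ⁰ = B/β is easy to confirm numerically by integrating c(x) = x1² + x2² against the stationary Gaussian density p0(x) = (β/πB) exp[−(β/B)(x1² + x2²)] on a grid. A sketch under this assumed form of p0 (β and B are illustrative values):

```python
import math

beta, B = 0.5, 1.0
sigma2 = B / (2 * beta)          # variance of each coordinate under p0
L = 7.0 * math.sqrt(sigma2)     # integration box covering the Gaussian mass
n = 400
h = 2 * L / n

gamma0 = 0.0
for i in range(n):
    x1 = -L + (i + 0.5) * h
    for j in range(n):
        x2 = -L + (j + 0.5) * h
        p0 = beta / (math.pi * B) * math.exp(-beta / B * (x1**2 + x2**2))
        gamma0 += (x1**2 + x2**2) * p0 * h * h

assert abs(gamma0 - B / beta) < 1e-3   # gamma0 = B/beta
```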
In view of Remark 3.2.2, the first coefficients a^0_10 and a^0_01 in the series (3.2.19) are equal to zero.⁸ The coefficients b^0_20, b^0_11, and b^0_02 can be calculated by using the formulas for G_20, G_11, and G_02 from Table 3.2.1 and (3.2.16). Then, according to (3.2.20), the coefficient b^0_20 has the form

b^0_20 = (β/2πB) ∫∫_{−∞}^{∞} G_20(x1, x2) exp[−(β/B)(x1² + x2²)] (γ⁰ − x1² − x2²) dx1 dx2.   (3.2.22)

⁸The same result can be obtained if we formally calculate the coefficients b^0_10 and b^0_01 by using (3.2.20).

160
Chapter I
The integral in (3.2.22) can readily be calculated; thus, taking into account (3.2.21) and (3.2.14), we obtain the value (3.2.23) of b^0_20. In a similar way, we can easily find b^0_11 and b^0_02 (3.2.24). All other coefficients b^0_lm with l + m > 2 are zero in view of the orthogonality condition (3.1.49). According to (3.2.19), it follows from (3.2.23) and (3.2.24) that the coefficients a^0_20, a^0_11, and a^0_02 are given by (3.2.25).
Finally, using the formulas for H_20, H_11, and H_02 from Table 3.2.1 and (3.2.25), we obtain the loss function in the zero approximation

f0(x1, x2) = a^0_20 H_20 + a^0_11 H_11 + a^0_02 H_02 = ((β² + 2)/(2β)) x1² + x1 x2 + (1/β) x2² + const.   (3.2.26)

This relation and condition (3.2.9) imply the following equation for the zero-approximation switching line Γ^0:

x1 + (2/β) x2 = 0.   (3.2.27)
In this case, the quasioptimal control algorithm u0(x) in the zero approximation has the form

u0(x) = −ε sign (x1 + (2/β) x2).   (3.2.28)
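The effect of a bang-bang law of the type (3.2.28) can be seen in a direct simulation. The sketch below integrates the oscillator ẋ1 = x2, ẋ2 = −x1 − βx2 + u + white noise of intensity B by the Euler-Maruyama scheme and compares the time-averaged loss x1² + x2² with and without control; all parameter values are illustrative:

```python
import math, random

beta, B, eps = 0.5, 1.0, 0.3
dt, n_steps, burn = 0.01, 200_000, 20_000

def mean_loss(controlled, seed=1):
    random.seed(seed)
    x1, x2, acc = 0.0, 0.0, 0.0
    for k in range(n_steps):
        # bang-bang law (3.2.28) or no control at all
        u = -eps * math.copysign(1.0, x1 + 2.0 / beta * x2) if controlled else 0.0
        dw = random.gauss(0.0, math.sqrt(B * dt))
        x1, x2 = x1 + x2 * dt, x2 + (-x1 - beta * x2 + u) * dt + dw
        if k >= burn:
            acc += x1 * x1 + x2 * x2
    return acc / (n_steps - burn)

loss_free = mean_loss(False)   # time-averaged loss without control
loss_ctrl = mean_loss(True)    # time-averaged loss under (3.2.28)
assert loss_ctrl < loss_free
```

The uncontrolled average settles near γ⁰ = B/β, while the controlled one is noticeably smaller, in qualitative agreement with the first-approximation losses discussed below.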
REMARK 3.2.3. The loss function f0(x1, x2) in the zero approximation (without a constant term) and the parameter of stationary losses (3.2.21) can be calculated in a different way, without using the method considered above. Indeed, if we first seek the solution of the zero-approximation equation (L is the operator (3.2.8))

Lf0 = γ⁰ − x1² − x2²   (3.2.29)

as the quadratic form

f0(x1, x2) = h11 x1² + 2h12 x1 x2 + h22 x2²

with unknown coefficients h11, h12, and h22, then, substituting this expression into (3.2.29), we obtain four equations for h11, h12, h22, and γ⁰. However, higher approximations cannot be obtained by this simple reasoning. □
The first approximation. It follows from (3.1.10) and (3.2.26) that in the first approximation we need to solve the equation

Lf1 = γ¹ − x1² − x2² + ε |x1 + (2/β) x2|.

This equation can be solved by analogy with the zero-approximation equation (3.2.29), but the calculations are much more cumbersome because of the more complicated expression on the right-hand side.

First, we employ (3.1.57) and (3.2.21) to find the specific stationary losses

γ¹ = γ⁰ − ε (β/πB) ∫∫_{−∞}^{∞} |x1 + (2/β) x2| exp[−(β/B)(x1² + x2²)] dx1 dx2;

then, after the integral is calculated, we obtain

γ¹ = B/β − ε √(B(β² + 4)/(πβ³)).   (3.2.30)
The coefficients a^1_lm in (3.2.19) are calculated by (3.2.19) and (3.2.20) with regard to the formulas for G_lm from Table 3.2.1. We omit the intermediate calculations and write the final expression for f1(x1, x2). Taking only the first terms in the series (3.2.19) up to the fourth order inclusively (that is, omitting the terms for which (l + m) > 4), we obtain the following expression for the loss function in the first approximation:

f1(x1, x2) = ν x1² + p11 x1 x2 + p x2² + p40 x1⁴ + p31 x1³ x2 + p22 x1² x2² + p13 x1 x2³ + p04 x2⁴ + const.   (3.2.31)

Here⁹ the fourth-order coefficients p40, p31, p22, p13, and p04 are all proportional to the product εα, where

α = (πβB)^{−1/2};   (3.2.32)

their explicit expressions are obtained under the condition β² ≪ 4.
From (3.2.9) and (3.2.31) we obtain the equation (3.2.33) for the switching line Γ^1 in the first approximation. It follows from the continuity conditions that for small ε the switching line Γ^1 is close to the line Γ^0 determined by Eq. (3.2.27). Therefore, if we set x2 = −(β/2)x1 in the terms of the order of ε in (3.2.33), then we make errors of the order of ε² in the equation for Γ^1. Using this fact and formulas (3.2.32) and (3.2.33), we arrive at the equation (3.2.34) of the line Γ^1, accurate up to terms of order ε (with errors O(ε²)).

Figure 23 shows the position of the switching lines Γ^0 and Γ^1 on the phase plane (x1, x2). The switching line (3.2.34) determines the quasioptimal control algorithm in the first approximation:

u1(x) = −ε sign g1(x1, x2),   (3.2.35)

where g1(x1, x2) = 0 is the equation (3.2.34) of the switching line Γ^1.
⁹We do not calculate the coefficients ν and p and the constant term "const" in (3.2.31), since they do not affect the position of the switching line and the control algorithm in the first approximation.
FIG. 23
FIG. 24

This algorithm can easily be implemented with the help of standard blocks of analog computers. The corresponding block diagram of a quasioptimal control system for damping of random oscillations is shown in Fig. 24, where 1 and 2 denote direct-current amplifiers with the appropriate amplification factors.
In conclusion, we dwell on one more observation that follows from the calculations of the first approximation. Namely, all expressions containing the small parameter contain it in the form of the product εα = ε/√(πβB). This statement concerns the loss function (3.2.31), the switching line (3.2.34), and the formula (3.2.30) for stationary specific losses, which can be written in the form

γ¹ = (B/β)[1 − εα √(β² + 4)]

or, more briefly, γ¹ ≈ (B/β)(1 − 2εα) if the condition β² ≪ 4 holds. Thus the accuracy of the method of successive approximations is determined not by the parameter ε itself but, in fact, by the parameter ε/√(πβB). If we recall that, by the conditions of problem (3.2.2), the parameter ε determines the values of admissible control, then it turns out that this variable need not be small for the method of successive approximations to be efficient. Only the relation between the limits of the admissible control and the intensity B of random perturbations is important. All this confirms our assertion made at the beginning of this chapter that the method of successive approximations considered here is convenient for solving problems with bounded controls when the intensity of random perturbations is large.
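A quick consistency check: writing the stationary losses once explicitly and once through the combination εα = ε/√(πβB) gives identical values for any parameter choice (both expressions are reconstructions used in this sketch and should be treated as assumptions):

```python
import math

beta, B, eps = 0.7, 1.4, 0.11          # arbitrary illustrative values
alpha = 1.0 / math.sqrt(math.pi * beta * B)

# explicit form of gamma^1 and the form grouped through eps*alpha
gamma1_explicit = B / beta - eps * math.sqrt(B * (beta**2 + 4) / (math.pi * beta**3))
gamma1_grouped = (B / beta) * (1.0 - eps * alpha * math.sqrt(beta**2 + 4))

assert abs(gamma1_explicit - gamma1_grouped) < 1e-12
# for beta**2 << 4 the bracket reduces to (1 - 2*eps*alpha)
```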
§3.3. Synthesis of quasioptimal controls in the case of correlated noises
Now we shall show how the method of successive approximations studied in this chapter can be used for constructing quasioptimal controls when the random actions on the system are not white noises. Instead of the system shown in Fig. 13, we shall consider a stabilization system of a somewhat more general form (see Fig. 25), where in addition to the random actions ξ(t) on the plant we also take into account the noise η(t) in the feedback circuit.

Let the controlled plant P be described, just as in §3.1, by the system of linear differential equations with constant coefficients

ẋ = Ax + Qu + σξ(t),   (3.3.1)
y(t)

FIG. 25

where xᵀ = (x1, ..., xn), uᵀ = (u1, ..., ur), ξᵀ(t) = (ξ1(t), ..., ξm(t)), and the constant matrices A, Q, and σ are of dimensions n × n, n × r, and n × m, respectively. Block 1 in Fig. 25 is assumed to be a linear inertialess device described by the equation

y(t) = Cx(t) + Dη(t),   (3.3.2)

where yᵀ = (y1, ..., yl), ηᵀ = (η1, ..., ηl), and C and D are constant matrices of dimensions l × n and l × l, respectively (det D ≠ 0). The goal of control is to minimize a functional of the form

I[u] = E[ ∫₀ᵀ c(x(t), u(t)) dt ].   (3.3.3)

We assume that the random perturbations ξ(t) and η(t) affecting the system are independent diffusion processes with drift coefficients

a_ξ = Gξ,   a_η = Hη,   (3.3.4)

and matrices of local diffusion coefficients B_ξ and B_η (G and B_ξ are m × m constant matrices; H and B_η are l × l constant matrices; the matrices B_ξ and B_η are symmetric, B_ξ is a nonnegative definite matrix, and B_η is a positive definite matrix). It is well known that in this case the diffusion processes ξ(t) and η(t) are Gaussian. The stated problem is a special case of the synthesis problem treated in
§1.5. This problem is characterized by the fact that the controlled process x(t) is not a Markov process (in contrast, say, with the problems considered in §3.1 and §3.2; moreover, x(t) is a nonobservable process), and therefore, to describe the controlled system shown in Fig. 25, we need a special space X_t of states. This space was called the space of sufficient coordinates in §1.5 (see also [171]). As was shown in §1.5, in this case, as sufficient coordinates, we must use a system of parameters that determine the current a posteriori probability density of the nonobserved stochastic processes:

p_t(x, ξ) = p(x(t) = x, ξ(t) = ξ | y(s), 0 ≤ s ≤ t).   (3.3.5)
The a posteriori density (3.3.5) satisfies a stochastic partial differential equation, which is a special case of Eq. (1.5.39). It follows from §1.5 that, to write an equation for the density (3.3.5), we need the a priori probability characteristics of the (n + m + l)-dimensional stochastic Markov process (x(t), ξ(t), y(t)).¹⁰ It follows from (3.3.1), (3.3.2), and (3.3.4) that this combined process has the drift coefficients

a_x = Ax + Qu + σξ,   a_ξ = Gξ,   a_y = A_yx x + A_yy y + A_yξ ξ + CQu,   (3.3.6)
and the block-diagonal matrix of local diffusion coefficients

B = diag(0, B_ξ, B_y),   (3.3.7)

where the zero block is of dimension n × n, and B_ξ and B_y are the m × m and l × l blocks corresponding to the components ξ and y. The matrices introduced in (3.3.6) and (3.3.7) are

A_yx = CA − A_yy C,   A_yy = DHD⁻¹,   A_yξ = Cσ,   B_y = DB_ηDᵀ.   (3.3.8)
10 In this case, the control u in (3.3.1) is assumed to be a given known vector at each time instant t.
Using (3.3.6) and (3.3.7), we obtain the following equation for the a posteriori probability density (3.3.5):¹¹

∂p(t, z)/∂t = −Sp ∂/∂z [a_z p(t, z)] + (1/2) Sp ∂²/∂z ∂zᵀ [B_z p(t, z)]
            + [a_y − E_ps a_y]ᵀ B_y⁻¹ [ẏ − E_ps a_y] p(t, z).   (3.3.9)
Here p(t, z) = p_t(x, ξ) denotes the a posteriori density (3.3.5), z denotes the vector (x, ξ), a_z is the vector composed of the vector-columns a_x and a_ξ, the matrix B_z is the part of the matrix (3.3.7) consisting of its first (n + m) rows and columns, and E_ps denotes the a posteriori averaging of the corresponding expressions (that is, the integration with respect to z with the density p(t, z)).
It follows from (3.3.6)-(3.3.8) that the matrix B_z is constant, the components of the vector a_z are linear functions of z, and the expression in the square brackets in (3.3.9) depends on z linearly and quadratically. Therefore, as shown in §1.5 (see also [170, 175]), the a posteriori density p(t, z) satisfying (3.3.9) is Gaussian, that is,

p(t, z) = [(2π)^{n+m} det K(t)]^{−1/2} exp[−(1/2)(z − z̄(t))ᵀ K⁻¹(t)(z − z̄(t))],   (3.3.10)

if the initial (a priori) density p(0, z) = p0(z) is Gaussian (this is assumed in the sequel).
Substituting (3.3.10) into (3.3.9), one can obtain a system of differential equations for the parameters z̄ and K⁻¹ of the a posteriori probability density (3.3.10). One can readily see that this system has the form

ż̄ = a_z(z̄, u) + σ_z B_y⁻¹ [ẏ − a_y(z̄, y, u)],   (3.3.11)

d(K⁻¹)/dt = −2K⁻¹ B_z K⁻¹ − K⁻¹ V − Vᵀ K⁻¹ − W   (3.3.12)

(in our special case, the system (1.5.52) acquires the form (3.3.11), (3.3.12)). If instead of K⁻¹ we use the inverse matrix K (which is the matrix of a posteriori covariances), then the system (3.3.11), (3.3.12) can be written in the form

ż̄ = a_z(z̄, u) + σ_z B_y⁻¹ [ẏ − a_y(z̄, y, u)],   (3.3.13)

K̇ = 2B_z + VK + KVᵀ + KWK.   (3.3.14)
¹¹To derive (3.3.9) from (1.5.39), we need to recall that, according to the notation used in (1.5.39), the vector A_α coincides with the vector a_z, the vector A_β with a_y, and the structure of the diffusion matrix (3.3.7) implies the following relations between the matrices: ||B_αβ|| = B_z, ||B_αs|| = 0, ||F_sp|| = B_y⁻¹, and ||B_sp|| = B_y.
Here

σ_z = || k_xx A_yxᵀ + k_xξ A_yξᵀ ||
      || k_ξx A_yxᵀ + k_ξξ A_yξᵀ ||,

V = || A   σ ||
    || 0   G ||,

W = − || A_yxᵀ B_y⁻¹ A_yx    A_yxᵀ B_y⁻¹ A_yξ ||
      || A_yξᵀ B_y⁻¹ A_yx    A_yξᵀ B_y⁻¹ A_yξ ||,   (3.3.15)

where, in turn, k_xx, k_xξ, ... are the blocks of the covariance matrix

K = || k_xx   k_xξ ||
    || k_ξx   k_ξξ ||

(the dimensions of a block are determined by the dimensions of its subscripts; for example, k_xξ is of dimension n × m). The loss function for problem (3.3.1)-(3.3.3)
F(t, z̄_t, K_t) = min_{u(τ), t≤τ≤T} E_ps [ ∫ₜᵀ c(x(τ), u(τ)) dτ | z̄(t) = z̄_t, K(t) = K_t ]   (3.3.16)

is completely determined by the time instant t and the current values of the parameters (z̄_t, K_t) of the a posteriori density (3.3.10) at this instant of time. It follows from the definition given in §1.5 that (z̄(t), K(t)) are sufficient coordinates for problem (3.3.1)-(3.3.3). The Bellman equation (1.5.54) for the function (3.3.16) can readily be obtained in the standard way from Eqs. (3.3.13), (3.3.14) for the sufficient coordinates. However, it should be noted that, in this case, the system
(3.3.13), (3.3.14) has a special feature that allows us to exclude the a posteriori covariance K(t) from the sufficient coordinates. The point is that, in contrast, say, with the similar system (1.5.53), the matrix equation (3.3.14) is independent of the control u and is in no way related to the system of differential equations (3.3.13) for the a posteriori means z̄(t). This allows us first to solve the system (3.3.14) and calculate the matrix of a posteriori covariances K(t) in the form of a known function of time on the entire control interval 0 ≤ t ≤ T (we solve (3.3.14) with the initial matrix K(0) = K0, where K0 is the covariance matrix of the a priori probability density p0(z)). If K(t) is assumed to be known, then in view of (3.3.8) and (3.3.15) we can also assume that the matrix σ_z in (3.3.13) is a known function of time, σ_z(t), and the loss function (3.3.16) depends on the set (t, z̄_t). Therefore,
instead of Eq. (1.5.54) for the loss function F(t, z̄), we have the Bellman equation of the form

∂F/∂t + min_u [ a_zᵀ(z̄, u) ∂F/∂z̄ + (1/2) Sp (σ_z(t) B_y⁻¹ σ_zᵀ(t) ∂²F/∂z̄ ∂z̄ᵀ) + ∫ c(x, u) N(z̄, K(t)) dz ] = 0   (3.3.17)

(here N(z̄, K(t)) denotes the normal probability density (3.3.10) with the vector of mean values z̄ and the covariance matrix K(t)).
Just as in §3.1 and §3.2, Eq. (3.3.17) becomes simpler if we consider the stationary operating conditions for the stabilization system shown in Fig. 25. The stationary operating conditions established during a long operating time interval (which corresponds to large time t) can exist only if there exists a real symmetric nonnegative definite matrix K* such that

K* W K* + K* Vᵀ + V K* + 2B_z = 0   (3.3.18)

and this constant matrix K* is an asymptotically stable solution of (3.3.14). Let us assume that this condition is satisfied. Denoting the mean "control losses" per unit time under the stationary operating conditions, as usual, by γ, we can define the stationary loss function

f(z̄) = lim_{T→∞} [F(t, z̄) − γ(T − t)],
for which, from (3.3.17), we derive the time-invariant Bellman equation

min_u [ a_zᵀ(z̄, u) ∂f/∂z̄ + (1/2) Sp (σ_z* B_y⁻¹ σ_z*ᵀ ∂²f/∂z̄ ∂z̄ᵀ) + ∫ c(x, u) N(z̄, K*) dz ] = γ.   (3.3.19)

In (3.3.19), σ_z* is the matrix σ_z (see (3.3.8) and (3.3.15)) in which k_xx, k_ξx, ... are replaced by the corresponding blocks of the matrix K* determined by (3.3.18). In some cases, it is convenient to solve Eq. (3.3.19) by the method of successive approximations treated in §3.1 and §3.2. The following example shows how this method can be used. Let us consider the simplest version of
the synthesis problem (3.3.1)-(3.3.3) in which Eqs. (3.3.1), (3.3.2) contain scalar variables instead of vectors and matrices. In (3.3.3) we write the penalty function c(x, u) in the form

c(x, u) = x² + ε⁻¹ u²,   (3.3.20)
where ε > 0 is a small parameter. From the "physical" viewpoint, this penalty function means that the control actions are penalized much more strongly than the deviations of the phase coordinate x(t) of the control system (3.3.1) from the equilibrium state x = 0. For simplicity, we set Q = σ = C = D = 1 in (3.3.1), (3.3.2); then the equations (3.3.13) for the a posteriori means x̄, ξ̄ take the form

x̄̇ = −ax̄ + ξ̄ + u + (σ_x/B_η)[ẏ − (h − a)x̄ − ξ̄ − u + hy],
ξ̄̇ = −gξ̄ + (σ_ξ/B_η)[ẏ − (h − a)x̄ − ξ̄ − u + hy],   (3.3.21)

where

σ_x = (h − a)k_xx + k_xξ,   σ_ξ = (h − a)k_xξ + k_ξξ,

and the a posteriori covariances k_xx, k_xξ, k_ξξ satisfy the scalar version of the system (3.3.14):

k̇_xx = 2(k_xξ − a k_xx) − σ_x²/B_η,
k̇_xξ = −(a + g)k_xξ + k_ξξ − σ_x σ_ξ/B_η,
k̇_ξξ = −2g k_ξξ + 2B_ξ − σ_ξ²/B_η.   (3.3.22)
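The stationary covariances k*_xx, k*_xξ, k*_ξξ can be computed by integrating a filter Riccati system of this type forward in time until it settles. The sketch below assumes the standard Kalman-Bucy form of the covariance equations (consistent with the reconstruction of (3.3.22) given here, which should itself be treated as an assumption); all parameter values are illustrative:

```python
def riccati_rhs(kxx, kxs, kss, a, g, h, B_xi, B_eta):
    # sigma_x, sigma_xi as in (3.3.21)
    sx = (h - a) * kxx + kxs
    ss = (h - a) * kxs + kss
    dkxx = 2.0 * (kxs - a * kxx) - sx * sx / B_eta
    dkxs = -(a + g) * kxs + kss - sx * ss / B_eta
    dkss = -2.0 * g * kss + 2.0 * B_xi - ss * ss / B_eta
    return dkxx, dkxs, dkss

a, g, h, B_xi, B_eta = 1.0, 0.7, 2.0, 1.0, 0.5
kxx = kxs = kss = 0.0
dt = 1e-3
for _ in range(200_000):                 # integrate to t = 200
    dkxx, dkxs, dkss = riccati_rhs(kxx, kxs, kss, a, g, h, B_xi, B_eta)
    kxx += dt * dkxx; kxs += dt * dkxs; kss += dt * dkss

# at the stationary point the right-hand side of (3.3.22) vanishes
res = riccati_rhs(kxx, kxs, kss, a, g, h, B_xi, B_eta)
assert max(abs(r) for r in res) < 1e-6
assert kxx > 0 and kss > 0   # the limit covariance is positive
```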
In this case the Bellman equation (3.3.19) has the form (3.3.23), where the constants σ_x* and σ_ξ* are

σ_x* = (h − a)k*_xx + k*_xξ,   σ_ξ* = (h − a)k*_xξ + k*_ξξ,

and the constant covariances k*_xx, k*_xξ, and k*_ξξ form the stationary solution of the system of differential equations (3.3.22). Passing to the new variables x1, x2 proportional to the a posteriori means x̄, ξ̄ (3.3.24) (this change of variables introduces a constant r into the drift coefficients), we bring Eq. (3.3.23) to the form

Lf(x1, x2) + min_u [ u ∂f/∂x2 + ε⁻¹u² ] + ∫ x² N(bx1, k*_xx) dx = γ,   (3.3.25)
where b = σ_x*/√B_η. Taking into account the formula

∫_{−∞}^{∞} x² N(bx1, k*_xx) dx = b²x1² + k*_xx   (3.3.26)

and minimizing the expression in the square brackets, we obtain from (3.3.25) the optimal control for the stationary stabilization conditions:

u*(x1, x2) = −(ε/2) ∂f/∂x2 (x1, x2),   (3.3.27)

where the function f(x1, x2) satisfies the nonlinear elliptic equation

Lf(x1, x2) = γ − b²x1² − k*_xx + (ε/4)(∂f/∂x2)².   (3.3.28)
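Both ingredients of this step are elementary and can be verified numerically: the Gaussian second moment used in (3.3.26), and the minimization of u·p + ε⁻¹u² at u = −εp/2 that yields a control law of the form (3.3.27). A sketch:

```python
import math

# (3.3.26): second moment of N(mean = b*x1, variance = k) is (b*x1)^2 + k
b, x1, k = 0.8, 1.5, 0.6
mean = b * x1
h, L = 1e-3, 12.0 * math.sqrt(k)
moment = sum((mean + (i + 0.5) * h - L) ** 2
             * math.exp(-(((i + 0.5) * h - L) ** 2) / (2 * k))
             for i in range(int(2 * L / h))) * h / math.sqrt(2 * math.pi * k)
assert abs(moment - (mean**2 + k)) < 1e-5

# minimizing u*p + u^2/eps over u gives u* = -eps*p/2 (cf. (3.3.27))
eps, p = 0.2, 1.7
us = [i * 1e-4 - 1.0 for i in range(20001)]      # grid on [-1, 1]
u_best = min(us, key=lambda u: u * p + u * u / eps)
assert abs(u_best - (-eps * p / 2)) < 1e-3
```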
Equation (3.3.28) is similar to Eqs. (3.1.8) and (3.2.7); therefore, in this case, we can use the same method of approximate synthesis as in §3.1 and §3.2. Then the quasioptimal control u_k(x1, x2) in the kth approximation is determined by the formula

u_k(x1, x2) = −(ε/2) ∂f_k/∂x2 (x1, x2),   k = 0, 1, 2, ...,   (3.3.29)

where the functions f_k(x1, x2) satisfy the linear equations of successive approximations

Lf_k(x1, x2) = ω_k(x1, x2),   k = 0, 1, 2, ...,   (3.3.30)
ω_0(x1, x2) = ω_0(x1) = γ⁰ − b²x1² − k*_xx.

In this case, the calculations of the successive approximations f_k(x1, x2) are completely similar to those discussed in §3.1 and §3.2. Therefore, here
we restrict our consideration to a brief description of the calculation of f_k(x1, x2); we only dwell upon the distinctions in the formulas. The operator L in (3.3.25) can be written in the form (3.1.5) if A = ||A_ij|| and B = ||B_ij|| in (3.1.5) are understood as the matrices

A = || −a   1 ||
    ||  0  −g ||,

with B the constant matrix of diffusion coefficients of the process (3.3.21) (its entries are expressed in terms of σ_x*, σ_ξ*, and B_η). The stationary density p0(x) satisfying (3.1.14) has the form (3.1.15), and the matrices P and P⁻¹, as one can readily see, have the form
(3.3.31)

(here μ = a + g, ν = a − g − r, and p = r + 2g; we do not reproduce the cumbersome entries). Using (3.3.31), we can find the matrix (3.3.32) (see (3.1.35)), as well as the matrix V̄ (3.3.33). By the similarity transformation, V̄ reduces the matrix (3.3.32) to the diagonal form

Λ = || λ1   0 ||,      λ1 = −a,   λ2 = −g.   (3.3.34)
    ||  0  λ2 ||
It follows from (3.1.44), (3.1.51), and (3.1.55) that the solutions of the equations of successive approximations (3.3.30) can be represented as the series

f_k(x1, x2) = Σ_{l,m} a^k_lm H_lm(x1, x2),   (3.3.35)

where H_lm(x1, x2) are the two-dimensional Hermite polynomials calculated by the formulas (3.2.17) with y = V̄⁻¹x (the matrix V̄⁻¹ is inverse to (3.3.33)).
The coefficients a^k_lm are calculated by the formula (see (3.2.19))

a^k_lm = b^k_lm / (la + mg),   (3.3.36)

and the coefficients

b^k_lm = (√(det P)/(2π l! m!)) ∫∫ G_lm(x) exp[−(1/2) xᵀPx] ω_k(x) dx1 dx2   (3.3.37)

are expressed in terms of the group of Hermite polynomials G_lm(x1, x2) orthogonal to H_lm(x1, x2) and calculated by (3.2.18). Parallel to the calculations of the successive approximations to the loss
function (3.3.35), we calculate the specific stationary losses γ^k (corresponding to the kth approximation) from the condition b^k_00 = 0. In the zero approximation this condition yields γ⁰ = k*_xx + b²⟨x1²⟩0, where ⟨x1²⟩0 is the stationary variance of x1 with respect to the density p0(x); hence, performing simple calculations and taking into account (3.3.31), we obtain the explicit value of γ⁰.

Next, using the obtained value of γ⁰ and formulas (3.3.26), (3.3.30), (3.3.36), and (3.3.37), we can calculate any desired number of coefficients a^0_lm in the series (3.3.35). With the help of these coefficients, we can construct an approximate expression for the function f0(x1, x2), which allows us to derive an explicit formula for the quasioptimal control algorithm u0(x1, x2) in the zero approximation and to calculate the variables γ¹, f1(x1, x2), and u1(x1, x2) related to the first approximation. Here we write explicit formulas neither for f0(x1, x2) nor for f1(x1, x2), since they are very cumbersome. We only remark that in this case all quasioptimal control algorithms (3.3.29) are nonlinear functions of the phase variables (x1, x2); moreover, the character of the nonlinearity is determined by the number of terms left in the series (3.3.35) in the calculations.
Thus, from the preceding it follows that the methods for calculations of stationary operating conditions of the stabilization system (Fig. 13) can readily be generalized to the case of a more general system with correlated noise (Fig. 25) if the noise is a Gaussian Markov process. In this case, the
optimal system is characterized by the appearance of an optimal filter in the regulator circuit; this filter is responsible for the formation of sufficient coordinates. In our example (Fig. 25), where x, y, u, £, and f] are scalar, this filter is described by Eqs. (3.3.21). The circuit of functional elements of this closed-loop control system is shown in Fig. 26.
FIG. 26

Blocks P and 1 are units of the initial block diagram (Fig. 25). The rest of the diagram in Fig. 26 determines the structure of the optimal controller. One can see that this controller contains standard linear elements of analog computers such as integrators, amplifiers, adders, etc., and one nonlinear converter NC, which implements the functional dependence (3.3.29).
Units of the diagram marked by ">" and numbered 1, 2, ..., 8 are amplifiers with the following amplification factors K_i:

K5 = −1,   K6 = a − h,   K7 = −1,   K8 = h.
§3.4. Nonstationary problems. Estimates of the quality of approximate synthesis
3.4.1. Nonstationary synthesis problems. If the equations of a plant are time-dependent or if the operating time T of a system is bounded, then the optimal control algorithm is essentially time-varying, and we cannot find this algorithm by using the methods considered in §§3.1-3.3. In this case, to synthesize an optimal system, it is necessary to solve a time-varying Bellman equation, which, in general, is a more complicated problem. However, if the plant is governed by a system of linear (time-varying) equations, then we can readily write the solutions of the successive approximation equations (3.0.6), (3.0.7) in quadratures. Let us show how this is done.

Just as in §3.1, we consider the synthesis problem for the stabilization system (Fig. 13) with a plant P described by equations of the form

ẋ = A(t)x + Q(t)u + σ(t)ξ(t),   (3.4.1)
where x is an n-dimensional vector of phase coordinates, u is an r-dimensional vector of controls, A(t), Q(t), and σ(t) are matrix functions of appropriate dimensions, and ξ(t) is the n-dimensional standard white noise (1.1.34). To estimate the quality of control, we shall use the following criterion of the type of (1.1.13):

I[u] = E[ ∫₀ᵀ c(x(t)) dt + ψ(x(T)) ],   (3.4.2)

and assume that the absolute values of the components of the control vector u are bounded by small values (see (3.1.3)):

|u_i| ≤ ε u_mi,   i = 1, ..., r.   (3.4.3)
According to (3.1.6) and (3.1.7), the optimal control u*(t, x) for problem (3.4.1)-(3.4.3) is given by the formula

u*(t, x) = −{ε u_m1, ..., ε u_mr} sign ( Qᵀ(t) ∂F/∂x (t, x) ),   (3.4.4)

where the loss function F(t, x) satisfies the equation

L_{t,x} F(t, x) = −c(x) − ε Φ( t, ∂F/∂x (t, x) ),   (3.4.5)
with L_{t,x} denoting a linear parabolic operator of the form

L_{t,x} = ∂/∂t + xᵀAᵀ(t) ∂/∂x + (1/2) Sp ( σ(t)σᵀ(t) ∂²/∂x ∂xᵀ ).   (3.4.6)

For the function Φ(t, ∂F/∂x), we have the expression

Φ( t, ∂F/∂x ) = − Σ_{i=1}^{r} u_mi | ( Qᵀ(t) ∂F/∂x )_i |.   (3.4.7)

In this case, the function F(t, x) must satisfy (3.4.5) for all x ∈ Rⁿ, 0 ≤ t < T, and be a continuous continuation of the function

F(T, x) = ψ(x)   (3.4.8)

as t → T (see (1.4.22)).
as t -> T (see (1.4.22)). The nonlinear equation (3.4.5) is similar to (3.0.5) and, according to (3.0.6) and (3.0.7), can be solved by the method of successive approximations. To this end, we need to solve the sequence of linear equations
Lt,,F0(t,x) = -c(x),
Lt,,Fk(t, x) = -c(x)
(3.4.9)
- e$ (t, ^±(t, x)}, dx V >
k = 1, 2, . . . (3.4.10)
(all functions F_k(t, x) determined by (3.4.9) and (3.4.10) must satisfy condition (3.4.8)). Next, if we take F_k(t, x) as an approximate solution of Eq. (3.4.5) and substitute F_k into (3.4.4) instead of F, we obtain a quasioptimal control algorithm u_k(t, x) in the kth approximation.

Let us write the solutions F_k(t, x), k = 0, 1, 2, ..., in quadratures. First, let us consider Eq. (3.4.9). Obviously, its solution F0(t, x) is equal to the value of the cost functional

F0(t, x) = E[ ∫ₜᵀ c(x(τ)) dτ + ψ(x(T)) | x(t) = x ]   (3.4.11)

on the time interval [t, T] provided that there are no control actions. In this case, the functional on the right-hand side of (3.4.11) is calculated along the trajectories x(τ), t ≤ τ ≤ T, that are solutions of the system of stochastic differential equations

ẋ = A(τ)x + σ(τ)ξ(τ),   x(t) = x,   (3.4.12)

describing the uncontrolled motion of the plant (u = 0 in (3.4.1)).
It follows from §1.1 and §1.2 that the solution of (3.4.12) is a continuous Markov process x(τ) (a diffusion process). This process is completely determined by the transition probability density p(x, t; z, τ), which gives the probability density of the random variable z = x(τ) provided that the stochastic process was in the state x(t) = x at the preceding time moment t. Obviously, by using p(x, t; z, τ), we can write the functional (3.4.11) in the form

F0(t, x) = ∫ₜᵀ dτ ∫_{Rⁿ} p(x, t; z, τ) c(z) dz + ∫_{Rⁿ} p(x, t; z, T) ψ(z) dz.   (3.4.13)
On the other hand, we can write the transition density p(x, t; z, τ) for the diffusion process x(τ) defined by (3.4.12) as an explicit finite formula if we know the fundamental matrix X(t, τ) of the nonperturbed (deterministic) system ż = A(t)z. Indeed, since Eqs. (3.4.12) are linear, the stochastic process x(τ) satisfying this equation is Markov and Gaussian. Therefore, for this process, the transition probability density has the form

p(x, t; z, τ) = [(2π)ⁿ det D]^{−1/2} exp[ −(1/2)(z − a)ᵀ D⁻¹ (z − a) ],   (3.4.14)

where a = Ez = E(x(τ) | x(t) = x) is the vector of mean values and D = E[(z − Ez)(z − Ez)ᵀ] is the covariance (dispersion) matrix of the random vector z = x(τ). On the other hand, using the fundamental matrix X(t, τ),¹² we can write the solution of system (3.4.12) in the form (the Cauchy formula)

x(τ) = X(t, τ)x + ∫ₜ^τ X(s, τ) σ(s) ξ(s) ds.

Hence, performing the averaging and taking into account the properties of the white noise (1.1.34), we obtain the following expressions for the vector a and the matrix D:

a = Ex(τ) = X(t, τ)x,   (3.4.15)
D = E[(x(τ) − a)(x(τ) − a)ᵀ] = ∫ₜ^τ ∫ₜ^τ X(s, τ) σ(s) E[ξ(s)ξᵀ(s′)] σᵀ(s′) Xᵀ(s′, τ) ds ds′.

¹²Recall that the fundamental matrix X(t, τ), τ ≥ t, is a nondegenerate n × n matrix whose columns are linearly independent solutions of the system ż(τ) = A(τ)z(τ), so that X(t, t) = E, where E is the identity matrix. Methods for constructing fundamental matrices and their properties are briefly described on page 101 (for details, see [62, 111]).
Since E[ξ(s)ξᵀ(s′)] = δ(s′ − s)E for the standard white noise (1.1.34), the double integral collapses to

D = ∫ₜ^τ X(s, τ) B(s) Xᵀ(s, τ) ds,   B(s) = σ(s)σᵀ(s).   (3.4.16)
Formulas (3.4.13)-(3.4.16) determine the solution F0(t, x) of the zero-approximation equation (3.4.9), satisfying (3.4.8), in quadratures. It follows from (3.4.13)-(3.4.16) that the function F0(t, x) is infinitely differentiable with respect to the components of the vector x if the functions c(z) and ψ(z) belong to a rather wide class (it suffices that the functions c(z) exp(−(1/4) zᵀD⁻¹z) and ψ(z) exp(−(1/4) zᵀD⁻¹z) be absolutely integrable [25]). Therefore, by analogy with (3.4.13), we can write the solution F_k(t, x) of the successive approximation equations (3.4.10), satisfying (3.4.8), in the form

F_k(t, x) = ∫ₜᵀ dτ ∫_{Rⁿ} p(x, t; z, τ)[ c(z) + ε Φ( τ, ∂F_{k−1}/∂z (τ, z) ) ] dz + ∫_{Rⁿ} p(x, t; z, T) ψ(z) dz,   k = 1, 2, ...   (3.4.17)
To obtain explicit formulas for the functions F0(t, x), F1(t, x), ..., which allow us to write the quasioptimal control algorithms u0(t, x), u1(t, x), ... as finite analytic formulas, we need an analytic expression for the matrix X(t, τ) and the ability to calculate the integrals in (3.4.13) and (3.4.17). For autonomous plants (the case where the matrix A(t) in (3.4.1) and (3.4.12) is constant, A(t) ≡ A = const), the fundamental matrix X(t, τ) has the form of a matrix exponential

X(t, τ) = e^{A(τ−t)},   (3.4.18)

whose elements can be calculated by standard methods. On the other hand, it is well known that fundamental matrices of nonautonomous systems can be constructed, as a rule, only by numerical methods.¹³ Thus, for A(t) ≠ const, it is often difficult to obtain analytical results.

If the plant equation (3.4.1) contains a constant matrix A(t) ≡ A = const, then formulas (3.4.13) and (3.4.17) allow us to generalize the results obtained in §§3.1-3.3 for the stationary operating conditions to the time-varying case. For example, let us consider a time-varying version of the problem of optimal damping of random oscillations studied in §3.2.

¹³Examples of special matrices A(t) for which the fundamental matrix of the system ẋ = A(t)x can be calculated analytically can be found, e.g., in [139].
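For the constant matrix A of the damping problem (see (3.4.19)), the exponential (3.4.18) can be computed by a plain Taylor series and cross-checked against the closed form obtained from the Lagrange-Sylvester formula (cf. (3.4.26); the closed form below, with δ = √(1 − β²/4), β < 2, is a reconstruction and is assumed). A pure-Python sketch:

```python
import math

beta, rho = 0.5, 1.3
A = [[0.0, 1.0], [-1.0, -beta]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(A, t, terms=60):
    # Taylor series for exp(A*t); converges fast for moderate norm of A*t
    E = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = matmul(term, [[a * t / n for a in row] for row in A])
        E = [[E[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return E

delta = math.sqrt(1.0 - beta**2 / 4.0)
c, s = math.cos(delta * rho), math.sin(delta * rho)
e = math.exp(-beta * rho / 2.0)
X = [[e * (c + beta / (2 * delta) * s), e * s / delta],
     [-e * s / delta, e * (c - beta / (2 * delta) * s)]]

E = expm(A, rho)
assert all(abs(E[i][j] - X[i][j]) < 1e-9 for i in range(2) for j in range(2))
```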
Just as in §3.2, we shall consider the optimal control problem (3.2.1)-(3.2.3). However, in contrast with §3.2, we now assume that the terminal time (the upper limit T of integration in the functional (3.2.3)) is a finite fixed value. Writing the plant equation (3.2.1) in the form of the system (3.2.4), we see that problem (3.2.1)-(3.2.3) is a special case of problem (3.4.1)-(3.4.3) if

A(t) ≡ A = ||  0    1 ||,   Q(t) ≡ Q = || 0 ||,   σ(t) ≡ σ = || 0    0  ||,
            || −1   −β ||              || 1 ||              || 0   √B ||

c(x) = x1² + x2²,   ψ(x) ≡ 0.   (3.4.19)
Therefore, it follows from the general scheme (3.4.4)-(3.4.10) that in this case the optimal control has the form

u*(t, x1, x2) = −ε sign ( ∂F/∂x2 (t, x1, x2) ),   (3.4.20)

where for 0 ≤ t < T the function F(t, x1, x2) satisfies the equation

L_{t,x} F(t, x1, x2) = −x1² − x2² + ε | ∂F/∂x2 (t, x1, x2) |   (3.4.21)

and vanishes at the terminal point, that is,

F(T, x1, x2) = 0.   (3.4.22)
According to (3.4.6) and (3.4.19), the operator L_{t,x} in (3.4.21) has the form

L_{t,x} = ∂/∂t + x2 ∂/∂x1 − (x1 + βx2) ∂/∂x2 + (B/2) ∂²/∂x2².   (3.4.23)

Let us calculate the loss function F0(t, x1, x2) of the zero approximation. In view of (3.4.9), (3.4.21), and (3.4.22), this function satisfies the linear equation

L_{t,x} F0(t, x1, x2) = −x1² − x2²

with the boundary condition

F0(T, x1, x2) = 0.   (3.4.24)
According to (3.4.13), the function F0(t, x1, x2) can be written in quadratures:

F0(t, x1, x2) = ∫ₜᵀ dτ ∫_{R²} p(x, t; z, τ)(z1² + z2²) dz1 dz2,   (3.4.25)
where the transition probability density p(x, t; z, τ) is given by (3.4.14). It follows from (3.4.15) and (3.4.16) that, to find the parameters of the transition density, we need to calculate the fundamental matrix (3.4.18). Obviously, the roots λ1 and λ2 of the characteristic equation det(A − λE) = 0 of the matrix A given by (3.4.19) are

λ1,2 = −β/2 ± iδ,   δ = √(1 − β²/4).

From this and the Lagrange-Sylvester formula [62] we obtain the following expression for the fundamental matrix (3.4.18) (here ρ = τ − t):

X(t, τ) = (A − λ2E) e^{λ1ρ}/(λ1 − λ2) + (A − λ1E) e^{λ2ρ}/(λ2 − λ1)
        = (e^{−βρ/2}/δ) || δ cos δρ + (β/2) sin δρ              sin δρ          ||
                         ||        −sin δρ              δ cos δρ − (β/2) sin δρ ||.   (3.4.26)
It follows from (3.4.15), (3.4.16), and (3.4.26) that in this case the vector of means a and the covariance matrix D of the transition probability density (3.4.14) have the form

a = e^{−βρ/2} || x1 cos δρ + (1/δ)(x2 + (β/2)x1) sin δρ ||
              || x2 cos δρ − (1/δ)(x1 + (β/2)x2) sin δρ ||,   (3.4.27)

D = || D11   D12 ||,   (3.4.28)
    || D12   D22 ||

whose entries D11, D12, D22 are elementary combinations of e^{−βρ}, sin 2δρ, and cos 2δρ such that D12 → 0 and D11, D22 → B/2β as ρ → ∞.
Substituting (3.4.14) with the parameters (3.4.27), (3.4.28) into (3.4.25) and integrating, we obtain, after some easy calculations, the following final expression for the function F0(t, x1, x2):

F0(t, x1, x2) = (B/β)ρ̄ + { ((β² + 2)/(2β)) x1² + x1x2 + (1/β) x2² + const }
             + e^{−βρ̄} [terms containing sin 2δρ̄ and cos 2δρ̄ multiplied by quadratic polynomials in x1, x2],   (3.4.29)

where ρ̄ = T − t.

Let us briefly discuss formula (3.4.29). If we consider the terms on the right-hand side of (3.4.29) as functions of the "reverse" time ρ̄ = T − t, then these terms can be divided into three groups: infinitely increasing, damping, and independent of ρ̄ as ρ̄ → ∞. These three types of terms have the following physical meaning. The only infinitely growing term (B/β)ρ̄ in (3.4.29) shows how the mean losses (3.4.11) depend on the operating time in the mode of stationary operating conditions. Therefore, the coefficient B/β has the meaning of the specific mean error γ, which was calculated in §3.2 by other methods and for which we obtained γ⁰ = B/β in the zero approximation (see (3.2.21)). Next, the terms independent of ρ̄ (in the braces in (3.4.29)) coincide with the expression for the stationary loss function obtained in §3.2 (formula (3.2.26)). Finally, the damping terms in (3.4.29) characterize the deviations of the operating conditions of the control system from the stationary ones.

Using (3.4.29), we can approximately synthesize the optimal system in the zero approximation, where the control algorithm u0(t, x1, x2) has the form (3.4.20) with F replaced by F0. The equation
∂F₀/∂x₂ (t, x₁, x₂) = 0    (3.4.30)

determines the switching line on the phase plane (x₁, x₂). After the substitution of (3.4.29), Eq. (3.4.30) becomes a linear relation between x₁ and x₂ whose coefficients contain the damping factors e^{−βp̄} sin 2δp̄ and e^{−βp̄} cos 2δp̄. Formula (3.4.30) shows that the switching line is a straight line coinciding with the x₁-axis as p̄ → 0 and rotating clockwise as p̄ → ∞ (see Fig. 27) till the limit position x₁ + 2x₂/β = 0 corresponding to the stationary switching line (see (3.2.27)). Formulas (3.4.29) and (3.4.30) also allow us to estimate whether it is important to take into account the fact that the control algorithm is time-varying. Indeed, (3.4.29) and (3.4.30) show that deviations from the stationary operating conditions are observed only on the time interval lying at
the distance ~ β⁻¹ from the terminal time T. Thus, if the general operating time T is substantially larger than this interval (say, T ≫ 3/β), then we can use the stationary algorithm on the entire interval [0, T], since in this case the value of the optimality criterion (3.2.3) does not practically differ from the optimal value. This fact is important for the practical implementation of optimal systems, since the design of regulators with varying parameters is a rather sophisticated technical problem.

FIG. 27.

3.4.2. Estimates of the approximate synthesis performance. Up to this point in the present chapter, we have studied the problem of how to find a control system close to the optimal one by using the method of successive approximations. In this section we shall consider the problem of how close the quasioptimal system constructed in this way is to the optimal system, that is, the problem of approximate synthesis performance. Let us estimate the approximate synthesis performance for the first two (the zero and the first) approximations calculated by (3.0.6)–(3.0.8). As an example, we use the time-varying problem (3.4.1)–(3.4.3). We assume that
the entries of the matrices A(t), Q(t), and σ(t) in (3.4.1) are continuous functions of time defined on the interval 0 ≤ t ≤ T. We also assume that the penalty functions c(x) and ψ(x) in (3.4.2) are continuous and bounded for all x ∈ Rⁿ. Then [124] there exists a unique function F(t,x) that satisfies the Cauchy problem (3.4.5), (3.4.8) for the quasilinear parabolic equation (3.4.5).¹⁴ This function is continuous in the strip Π_T = {|x| < ∞, 0 ≤ t ≤ T}

¹⁴ We shall use the following terminology: Eq. (3.4.5) is called a quasilinear (semilinear) parabolic equation, the problem of solving Eq. (3.4.5) with the boundary condi-
and continuously differentiable once with respect to t and twice with respect to x for 0 ≤ t < T; its first- and second-order derivatives with respect to x are bounded for (t,x) ∈ Π_T. One can readily see that in this case
|F(t,x) − F₀(t,x)| ~ ε,    |F(t,x) − F₁(t,x)| ~ ε²,    (3.4.31)
and hence, for small ε, the functions F₀(t,x) and F₁(t,x) nicely approximate the exact solution of Eq. (3.4.5). To prove relations (3.4.31), let us consider the functions S₀(t,x) = F(t,x) − F₀(t,x) and S₁(t,x) = F(t,x) − F₁(t,x). It follows from (3.4.5), (3.4.9), and (3.4.10) that these functions satisfy the equations

LS₀ = −εΦ(t, ∂F/∂x),    S₀(T,x) = 0,    (3.4.32)

LS₁ = −ε[Φ(t, ∂F/∂x) − Φ(t, ∂F₀/∂x)],    S₁(T,x) = 0.    (3.4.33)
Equations (3.4.32) and (3.4.33) differ from (3.4.9) only by the expressions on the right-hand sides and by the initial data. Therefore, according to (3.4.13), the functions S₀ and S₁ can be written in the form

S₀(t,x) = ε ∫_t^T dτ ∫_{Rⁿ} p(x,t;z,τ) Φ(τ, ∂F/∂z (τ,z)) dz,    (3.4.34)

S₁(t,x) = ε ∫_t^T dτ ∫_{Rⁿ} p(x,t;z,τ) [Φ(τ, ∂F/∂z) − Φ(τ, ∂F₀/∂z)] dz.    (3.4.35)
Since the function Φ is continuous (see (3.4.7)) and the components of the vector ∂F/∂x are bounded, we have |Φ(τ, ∂F/∂z)| ≤ P for all τ ∈ [0,T]; hence, we have the estimate

|S₀(t,x)| ≤ ε ∫_t^T dτ ∫_{Rⁿ} p(x,t;z,τ) |Φ(τ, ∂F/∂z)| dz ≤ εP ∫_t^T dτ ∫_{Rⁿ} p(x,t;z,τ) dz = εP(T − t).    (3.4.36)
tion (3.4.8) is called the Cauchy problem, and the boundary condition (3.4.8) itself is sometimes called the "initial" condition for the Cauchy problem (3.4.5), (3.4.8). This terminology corresponds to the universally accepted standards [61, 124] if (as we shall do in §3.5) in Eq. (3.4.5) we perform a change of variables and use the "reverse" time p = T − t instead of t. In this case, the backward parabolic equation (3.4.5) becomes a "usual" parabolic equation, and the boundary value problem (3.4.5), (3.4.8) takes the form of the standard Cauchy problem.
The first relation in (3.4.31) is thereby proved. To prove the second relation in (3.4.31), we need to estimate the difference S′₀ = (∂F/∂xᵢ) − (∂F₀/∂xᵢ). To this end, we differentiate (3.4.32) with respect to xᵢ. As a result, we obtain the following equation for the function S′₀:

LS′₀ = −ε (∂/∂xᵢ) Φ(t, ∂F/∂x),    S′₀(T,x) = 0    (3.4.37)
(in fact, the derivative on the right-hand side of (3.4.37) is formal, since the function Φ (3.4.7) is not differentiable). Using (3.4.13) for S′₀, we obtain

S′₀ = −ε ∫_t^T dτ ∫_{Rⁿ} p(x,t;z,τ) (∂/∂zᵢ) Φ(τ, ∂F/∂z) dz.    (3.4.38)

Integrating (3.4.38) by parts with respect to zᵢ and taking into account (3.4.14) and (3.4.15), we arrive at

S′₀ = ε ∫_t^T dτ ∫_{Rⁿ} p(x,t;z,τ) D⁻¹ᵢⱼ (zⱼ − Xⱼₖ(t,τ)xₖ) Φ(τ, ∂F/∂z) dz.    (3.4.39)
From (3.4.39) we obtain the following estimate for S10:
\So\ < £P I Jt
dr
I ^i? \zi ~ XJk(t, r)xk\p(x, t; z, T) dz = ePVi, Jnn .
(3.4.40)
J=l Now we note that since Q(i) in (3.4.7) is bounded, the function $(<, satisfies the Lipschitz condition with respect to y:
Using (3.4.40), (3.4.41), and (3.4.35), we obtain

|S₁(t,x)| ≤ εN ∫_t^T dτ ∫_{Rⁿ} p(x,t;z,τ) Σ_{i=1}^n |S′₀| dz ≤ ε²NPV(T − t),    V = Σ_{i=1}^n Vᵢ,    (3.4.42)
which proves the second relation in (3.4.31). In a similar way, we can also estimate the difference ∂F/∂xᵢ − ∂F₁/∂xᵢ = S′₁. Indeed, just as (3.4.39) was obtained from (3.4.32), we use (3.4.33) to obtain

S′₁ = ε ∫_t^T dτ ∫_{Rⁿ} p(x,t;z,τ) D⁻¹ᵢⱼ (zⱼ − Xⱼₖ(t,τ)xₖ) [Φ(τ, ∂F/∂z) − Φ(τ, ∂F₀/∂z)] dz.    (3.4.43)

This relation and (3.4.40), (3.4.41) for the function S′₁ readily yield the estimate

|S′₁| = |∂F/∂xᵢ − ∂F₁/∂xᵢ| ≤ ε²PVVᵢN,    (3.4.44)
which we shall use later. According to (3.0.8), in this case the quasioptimal controls u₀(t,x) and u₁(t,x) are determined by (3.4.4), where instead of the loss function F(t,x) we use the successive approximations F₀(t,x) and F₁(t,x), respectively. By G₀(t,x) and G₁(t,x) we denote the mean values of the functional (3.4.11) calculated on the trajectories of the system (3.4.1)

ẋ = A(t)x + εQ(t)uᵢ(t,x) + σ(t)ξ(t),    i = 0, 1,

with the use of the quasioptimal controls u₀(t,x) and u₁(t,x). The functions Gᵢ(t,x), i = 0,1, estimate the performance of the quasioptimal control algorithms uᵢ(t,x), i = 0,1. Therefore, it is clear that the approximate synthesis may be considered to be justified if there is only a small difference between the performance criteria G₀(t,x) and G₁(t,x) of the suboptimal systems and the exact solution F(t,x) of Eq. (3.4.5) with the initial condition (3.4.8). One can readily see that the functions G₀ and G₁ satisfy estimates of type (3.4.31), that is,

|F(t,x) − G₀(t,x)| ~ ε,    |F(t,x) − G₁(t,x)| ~ ε².    (3.4.45)

Relations (3.4.45) can be proved by analogy with (3.4.31). Indeed, the functions G₀ and G₁ satisfy the linear partial differential equations [45, 157]

LGᵢ(t,x) = −c(x) − ε uᵢᵀ(t,x) Qᵀ(t) ∂Gᵢ/∂x (t,x),    i = 0, 1.    (3.4.46)
This fact and (3.4.9), (3.4.10) imply the following equations for the functions H₀ = F₀ − G₀ and H₁ = F₁ − G₁:

LH₀ = ε u₀ᵀ Qᵀ ∂G₀/∂x,    H₀(T,x) = 0,    (3.4.47)

LH₁ = ε [u₁ᵀ Qᵀ ∂G₁/∂x − Φ(t, ∂F₀/∂x)],    H₁(T,x) = 0.    (3.4.48)

Since u₁ᵀ Qᵀ ∂F₁/∂x = Φ(t, ∂F₁/∂x), Eq. (3.4.48) can be rewritten as follows:

LH₁ + ε u₁ᵀ Qᵀ ∂H₁/∂x = ε [Φ(t, ∂F₁/∂x) − Φ(t, ∂F₀/∂x)],    H₁(T,x) = 0.    (3.4.49)

It follows from (3.4.4) that Eqs. (3.4.46), (3.4.49) are linear parabolic equations with discontinuous coefficients. Such equations were studied in [80, 81, 144]. It was shown that if, just as in our case, the coefficients in (3.4.46), (3.4.49) have discontinuities of the first kind, then, under our assumptions about the properties of A(t), Q(t), c(x), and ψ(x), the solutions of Eqs. (3.4.46), (3.4.49) and their first-order partial derivatives are bounded.
Using this fact, we can readily verify that the right-hand sides of (3.4.47) and (3.4.49) are of the order of ε and ε², respectively. For Eq. (3.4.47), this statement readily follows from the boundedness of the components of the vectors ∂G₀/∂x and u₀ and the elements of the matrix Q. The right-hand side of (3.4.49) can be estimated by the Lipschitz condition (3.4.41) and the inequality

|∂F₁/∂xᵢ − ∂F₀/∂xᵢ| ≤ |∂F/∂xᵢ − ∂F₀/∂xᵢ| + |∂F/∂xᵢ − ∂F₁/∂xᵢ| ~ ε,

which follows from (3.4.40) and (3.4.44). Therefore, for the functions H₀ and H₁ we have

|H₀| ~ ε,    |H₁| ~ ε².    (3.4.50)

To prove (3.4.45), it suffices to take into account the inequalities

|F − G₀| ≤ |F − F₀| + |H₀|,    |F − G₁| ≤ |F − F₁| + |H₁|,

and to use (3.4.31) and (3.4.50). Thus, relations (3.4.45) show that if the Bellman equation contains a small parameter in nonlinear terms, then the difference between the quasioptimal control system calculated by (3.0.6)–(3.0.8) and the optimal control system is small and, for sufficiently small ε, we can restrict our calculations to a small number of approximations. We need either one (the zero)
or two (the zero and the first) approximations. This depends on the admissible deviation of the quasioptimal system performance criteria Gᵢ(t,x) from the loss function F(t,x). In conclusion, we make two remarks about (3.4.45).

REMARK 3.4.1. One can readily see that all arguments that lead to the estimates (3.4.45) remain valid for any types of nonlinear functions in (3.4.5) that satisfy the Lipschitz condition (3.4.41). Therefore, in particular, all statements proved above for the function Φ (3.4.7) automatically hold for equations of the form (3.0.4) with an r-dimensional ball taken as the set U of admissible controls, instead of an r-dimensional parallelepiped. □

REMARK 3.4.2. The estimates of the approximate synthesis accuracy considered in this section are based on the assumption that the solutions of the Bellman equation and their first-order partial derivatives are bounded. At first glance it would seem that this assumption substantially narrows the class of problems for which the approximate synthesis procedure (3.0.6)–(3.0.8) can be justified. Indeed, the solutions of Eqs. (3.4.5), (3.4.9), (3.4.10), and (3.4.46) are unbounded for any x ∈ Rⁿ if the functions c(x) and ψ(x) grow infinitely as x → ∞. Therefore, for example, we must eliminate frequently used quadratic penalty functions from consideration. However, if we are interested in the solution of the synthesis problem in a given bounded region X₀ of initial states x(0) of the control system, then the procedure (3.0.6)–(3.0.8) can also be used in the case of unbounded penalty functions. This statement is based on the following heuristic arguments. Since the plant equation (3.4.1) is linear and the matrices A(t), Q(t), and σ(t) and the control vector u are bounded, we can always choose a sufficiently large number R such that the probability
P{sup_{0≤t≤T} |x(t)| > R} becomes arbitrarily small [11, 45, 157] for any fixed domain X₀ of the initial states x(0). Therefore, without loss of accuracy, we can replace the unbounded functions c(x) and ψ(x) in (3.4.2) (if, in a certain sense, these functions grow, as |x| → ∞, more slowly than the probability P{sup_{0≤t≤T} |x(t)| > R} decreases as R → ∞) by the truncated expressions

c̃(x) = { c(x) for |x| ≤ R;  max_{|x|=R} c(x) for |x| > R },

ψ̃(x) = { ψ(x) for |x| ≤ R;  max_{|x|=R} ψ(x) for |x| > R },

for which the solutions of Eqs. (3.4.5), (3.4.9), (3.4.10), and (3.4.46) satisfy the boundedness assumptions. □
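The truncation argument can be checked numerically. In the sketch below (a toy illustration, not the book's construction), the stable scalar plant dx = −x dt + dW and the quadratic penalty c(x) = x² are assumed parameters; truncating c at the level R = 4 changes the Monte Carlo estimate of the mean integral cost only negligibly, because trajectories practically never leave |x| < R.

```python
import numpy as np

# Monte Carlo illustration of Remark 3.4.2: replacing the unbounded penalty
# c(x) = x^2 by its truncation c_R(x) = min(x^2, R^2) changes the expected
# cost negligibly when trajectories rarely leave |x| < R.
# The plant dx = -x dt + dW and all parameters are assumed toy values.

rng = np.random.default_rng(1)
T, n_steps, n_paths, R = 1.0, 200, 5000, 4.0
dt = T / n_steps
x = np.zeros(n_paths)
cost = np.zeros(n_paths)      # integral of c(x) along each path
cost_R = np.zeros(n_paths)    # integral of the truncated penalty
max_abs = np.zeros(n_paths)   # running sup |x(t)|

for _ in range(n_steps):
    x = x - x * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
    cost += x**2 * dt
    cost_R += np.minimum(x**2, R**2) * dt
    max_abs = np.maximum(max_abs, np.abs(x))

print(np.mean(max_abs > R))              # P{sup |x(t)| > R}: practically zero
print(abs(cost.mean() - cost_R.mean()))  # negligible difference in mean cost
```

The larger R is chosen, the faster P{sup |x(t)| > R} decays, so the truncated problem inherits the bounded solutions required by the estimates above.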
The question of whether procedure (3.0.6)–(3.0.8) can be used for solving the synthesis problems with unbounded functions c(x) and ψ(x) in the functional (3.4.2) will be rigorously examined in the next section.

§3.5. Analysis of the asymptotic convergence of successive approximations (3.0.6)–(3.0.8) as k → ∞
The method of successive approximations (3.0.6)–(3.0.8) can also be used for the synthesis of quasioptimal control systems if the Bellman equation does not contain a small parameter in nonlinear terms. Needless to say, in this case (in contrast with Section 3.4.2 in §3.4) the first two approximations, as a rule, do not approximate the exact solution of the synthesis problem sufficiently well. We can only hope that the suboptimal system synthesized on the basis of (3.0.9) is close to the optimal system for large k. Therefore, we need to investigate the asymptotic behavior as k → ∞ of the functions F_k(t,x) and u_k(t,x) in (3.0.6)–(3.0.8). The present section deals with this problem.

Let us consider the time-varying synthesis problem of the form (3.4.1)–(3.4.3) in a more general setting. We assume that the plant is described by the vector-matrix stochastic differential equation of the form

ẋ = a(t,x) + q(t)u + σ(t,x)ξ(t).    (3.5.1)

Here x is an n-dimensional vector of phase coordinates of the system, u is an r-dimensional vector of controls, ξ(t) is an n-dimensional vector of random actions of the standard white noise type (1.1.34), a(t,x) is a given vector function of the phase coordinates x and time t, and q(t) and σ(t,x) are given matrix functions of dimensions n × r and n × n, respectively, such that for t ≥ t₀, 0 ≤ t₀ ≤ T, the stochastic equation (3.5.1) has a unique solution x(t) satisfying the condition x(t₀) = x₀ at least in the weak sense (see §IV.4 in [132]). As an optimality criterion, we take the functional (3.4.2),
I[u] = E{ ∫_{t₀}^T c(x(t)) dt + ψ(x(T)) }.    (3.5.2)

Here c(x) and ψ(x) are given nonnegative scalar penalty functions whose special form is determined by the character of the problem considered (the requirements on c(x) and ψ(x) are given later). The constraints on the domain of admissible controls have the form (1.1.22),

u ∈ U,    (3.5.3)
where U ⊂ Rʳ is a closed bounded convex set in the Euclidean space Rʳ. It is required to find a function u* = u*(t, x(t)) satisfying (3.5.3) such that the functional (3.5.2) calculated on the trajectories of system (3.5.1) with the control u* attains its minimum value.

In accordance with the dynamic programming approach, solving this problem is equivalent to solving the Bellman equation that, for problem (3.5.1)–(3.5.3), reads (see §1.4)

∂F/∂t + ãᵢ(t,x) ∂F/∂xᵢ + ½ [σ(t,x)σᵀ(t,x)]ᵢⱼ ∂²F/∂xᵢ∂xⱼ + c(x) + min_{u∈U} [uᵀ qᵀ(t) ∂F/∂x] = 0.    (3.5.4)

Here ã(t,x) is a column of functions with components (see (1.2.48))

ãᵢ(t,x) = aᵢ(t,x) + ½ (∂σᵢₗ/∂xₘ)(t,x) σₘₗ(t,x),    i = 1, ..., n.    (3.5.5)

Recall that we assumed in §1.2 that throughout this book all stochastic differential equations written (just as (3.5.1)) in the Langevin form [127] are symmetrized [174].
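The symmetrized convention matters in simulation as well: a Heun-type midpoint scheme converges to the symmetrized (Stratonovich) solution, while the Euler–Maruyama scheme converges to the Itô one, and the gap between them is exactly the drift correction of type (3.5.5). A minimal check, assuming the scalar test equation dx = σ x dξ with x(0) = 1 (not an equation from the book); here the correction (3.5.5) gives the effective drift ½σ²x, so the symmetrized solution has mean e^{σ²T/2}:

```python
import numpy as np

# Heun (midpoint) vs Euler-Maruyama for dx = sigma*x dxi, x(0) = 1.
# For this equation the symmetrized-form drift correction of (3.5.5) is
# (1/2)*sigma^2*x, so E[x_strat(T)] = exp(sigma^2*T/2), while E[x_ito(T)] = 1.
# All numerical parameters are assumed illustration values.

rng = np.random.default_rng(0)
sigma, T, n_steps, n_paths = 0.5, 1.0, 200, 20000
dt = T / n_steps
x_strat = np.ones(n_paths)   # Heun scheme -> symmetrized (Stratonovich) solution
x_ito = np.ones(n_paths)     # Euler-Maruyama -> Ito solution

for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    pred = x_strat + sigma * x_strat * dW            # predictor
    x_strat = x_strat + 0.5 * sigma * (x_strat + pred) * dW  # corrector
    x_ito = x_ito + sigma * x_ito * dW

print(x_strat.mean())  # close to exp(sigma^2*T/2) = 1.133
print(x_ito.mean())    # close to 1.0
```

The two sample means differ by exactly the amount predicted by the correction term, which is why the same Langevin-form equation must always be read with a fixed convention.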
By definition, the loss function F in (3.5.4) is equal to

F = F(t,x) = min_{u(τ)∈U} E[ ∫_t^T c(x(τ)) dτ + ψ(x(T)) | x(t) = x ].    (3.5.6)

Here E[(·) | x(t) = x] means averaging over all possible realizations of the controlled stochastic process x(τ) = x_u(τ) (τ ≥ t) issued from the point x at τ = t. It follows from (3.5.6) that

F(T,x) = ψ(x).    (3.5.7)
Passing to the "reverse" time p = T − t, we transform Eq. (3.5.4) and the condition (3.5.7) to the form

LF(p,x) = −c(x) − min_{u∈U} [uᵀ qᵀ(p) ∂F/∂x (p,x)],    (3.5.8)

F(0,x) = ψ(x).    (3.5.9)

In (3.5.8) we have the following notation:

L = ∂/∂p − aᵢ(p,x) ∂/∂xᵢ − bᵢⱼ(p,x) ∂²/∂xᵢ∂xⱼ;    (3.5.10)

here aᵢ(p,x) = ãᵢ(x, T−p), q(p) = q(T−p), bᵢⱼ(p,x) is a general element of the matrix ½σ(T−p,x)σᵀ(T−p,x), and, as usual, the sum in (3.5.10) (just as in (3.5.5)) is taken over repeated indices from 1 to n. Assuming that the gradient ∂F/∂x of the loss function is a known vector and calculating the minimum in (3.5.8), we obtain

LF(p,x) = Φ(p, ∂F/∂x (p,x)) − c(x).    (3.5.11)

In addition, we obtain the function

u*(p,x) = arg min_{u∈U} [uᵀ qᵀ(p) ∂F/∂x (p,x)]    (3.5.12)

that satisfies the condition

u*ᵀ(p,x) qᵀ(p) ∂F/∂x (p,x) = −Φ(p, ∂F/∂x (p,x))

and solves the synthesis problem (after we have solved Eq. (3.5.11) with the initial condition (3.5.9)). The form of the function Φ(p,v) = −min_{u∈U} uᵀqᵀ(p)v is determined by the set U of admissible controls; Eq. (3.5.11) differs from Eq. (3.0.5) only by a small parameter (there is no small coefficient ε of the function Φ). Nevertheless, in this case, we shall also use the approximate synthesis procedure (3.0.6)–(3.0.8) in which, instead of the exact solution F(p,x) of Eq. (3.5.11), we take the sequence of functions F₀(p,x), F₁(p,x), ... recurrently calculated by solving the following sequence of linear equations:

LF₀(p,x) = −c(x),    F₀(0,x) = ψ(x),    (3.5.13)

LF_{k+1}(p,x) = Φ(p, ∂F_k/∂x (p,x)) − c(x),    F_{k+1}(0,x) = ψ(x),    k = 0, 1, ....    (3.5.14)

The successive approximations u₀(p,x), u₁(p,x), ... of control are determined by the expressions

u_k(p,x) = arg min_{u∈U} [uᵀ qᵀ(p) ∂F_k/∂x (p,x)],    k = 0, 1, ....    (3.5.15)

Below we shall find the conditions under which the recurrent procedure (3.5.13)–(3.5.15) converges to the exact solution of the synthesis problem.
Let us consider Eq. (3.5.11) with the operator L determined by (3.5.10). The solution F(p,x) and the coefficients bᵢⱼ(p,x) and aᵢ(p,x) of the operator L are defined on Π_T = {[0,T] × Rⁿ} = {(p,x): 0 ≤ p ≤ T, x ∈ Rⁿ}. We assume that everywhere in Π_T the matrix ‖bᵢⱼ(p,x)‖₁ⁿ satisfies the condition that the operator L is uniformly parabolic, that is, everywhere in Π_T for any real vector χ = (χ₁, ..., χₙ) we have

λ|χ|² ≤ bᵢⱼ(p,x) χᵢχⱼ ≤ Λ|χ|²,    (3.5.16)

where λ and Λ are some positive constants. Moreover, we assume that the functions bᵢⱼ(p,x) and aᵢ(p,x) are bounded in Π_T, continuous in both variables (p,x), and satisfy the Hölder inequality with respect to x uniformly in p, that is,

|bᵢⱼ(p,x) − bᵢⱼ(p,x⁰)| ≤ A|x − x⁰|^α,    |aᵢ(p,x) − aᵢ(p,x⁰)| ≤ A|x − x⁰|^α,    0 < α ≤ 1,    A = const.    (3.5.17)

We assume that the functions c(x), ψ(x), and Φ(p, ∂F/∂x) are continuous and that c(x) and ψ(x) satisfy the following restrictions on the growth as |x| → ∞:

c(x) ≤ const · e^{h|x|},    ψ(x) ≤ const · e^{h|x|}    (3.5.18)

(h is a positive constant). We also assume that the function Φ(p,v) satisfies the Lipschitz condition with respect to v = (v₁, ..., vₙ) uniformly in p ∈ [0,T], that is,

|Φ(p,v) − Φ(p,v⁰)| ≤ N Σ_{i=1}^n |vᵢ − vᵢ⁰|,    Φ(p,0) = 0.    (3.5.19)
In particular, the functions Φ from (3.4.7) and (1.3.23) satisfy (3.5.19). The following three consequences of the above assumptions are well known [74].

(1) There exists a unique fundamental solution G(x,p;y,σ) of the operator L such that

lim_{p→σ} ∫_{Rⁿ} G(x,p;y,σ) f(y) dy = f(x)    (3.5.20)

for any continuous function f(x) such that

|f(x)| ≤ const · e^{k|x|²},    k < λ/(4T)    (3.5.21)
(here λ is taken from (3.5.16)).

(2) Solutions of the inhomogeneous equations (3.5.13) and (3.5.14) can be expressed in terms of G(x,p;y,σ) as follows:

F₀(p,x) = ∫_{Rⁿ} G(x,p;y,0) ψ(y) dy + ∫₀^p dσ ∫_{Rⁿ} G(x,p;y,σ) c(y) dy,    (3.5.22)

F_{k+1}(p,x) = ∫_{Rⁿ} G(x,p;y,0) ψ(y) dy + ∫₀^p dσ ∫_{Rⁿ} G(x,p;y,σ) [c(y) − Φ(σ, ∂F_k/∂y (σ,y))] dy.    (3.5.23)

In this case, formula (3.5.22) holds unconditionally in view of (3.5.18); formula (3.5.23) holds only if the derivatives ∂F_k/∂xᵢ satisfy some inequalities of the form (3.5.18) (or at least of the form (3.5.21)). In the sequel, we show that this condition is always satisfied. The solutions F_k(p,x), k = 0, 1, ..., are twice continuously differentiable in x, and the derivatives ∂F_k/∂xᵢ and ∂²F_k/∂xᵢ∂xⱼ can be calculated by differentiating the integrands on the right-hand sides of (3.5.22) and (3.5.23).

(3) The following inequalities hold (for any λ̄ < λ from (3.5.16)):

|G(x,p;y,σ)| ≤ K₁ (p−σ)^{−n/2} exp(−λ̄|x−y|²/(p−σ)),    (3.5.24)

|∂G/∂xᵢ (x,p;y,σ)| ≤ K₂ (p−σ)^{−(n+1)/2} exp(−λ̄|x−y|²/(p−σ)),    (3.5.25)

where K₁ and K₂ are positive constants.
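Property (3.5.20) is easy to see in the simplest special case, where L has constant coefficients in one dimension and G reduces to the heat kernel; this special case is an assumption taken for illustration only:

```python
import numpy as np

# Check of property (3.5.20) for the one-dimensional heat kernel
# G(x,p;y,s) = exp(-(x-y)^2/(2b(p-s))) / sqrt(2*pi*b*(p-s)),
# the fundamental solution of F_p = (b/2) F_xx (an assumed special case of L).
# As p - s -> 0, the smoothed value of a continuous f returns to f(x).

b, x0 = 1.0, 0.3
y = np.linspace(-20.0, 20.0, 40001)
dy = y[1] - y[0]
f = np.cos(y)                      # continuous, bounded test function

def smoothed(tau):                 # tau = p - s > 0
    G = np.exp(-(x0 - y)**2 / (2.0 * b * tau)) / np.sqrt(2.0 * np.pi * b * tau)
    return np.sum(G * f) * dy      # quadrature of int G(x0,.;y,.) f(y) dy

for tau in (0.5, 0.1, 0.01):
    print(tau, smoothed(tau))      # exact value: exp(-b*tau/2) * cos(x0)
```

For f = cos the integral is available in closed form (e^{−bτ/2} cos x₀), so the quadrature also illustrates how fast the limit in (3.5.20) is attained.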
Statements (1)–(3) hold for the linear equations (3.5.13), (3.5.14) of successive approximations. Now we return to the synthesis problem and consider the two stages of solving this problem. First, by using the majorant estimates (3.5.24) and (3.5.25), we prove that the successive approximations F_k(p,x) converge as k → ∞ to the solution F(p,x) of Eq. (3.5.11) (in this case, we simultaneously prove that there exists a unique solution of Eq. (3.5.11) with the initial condition (3.5.9)). Next, we show that the suboptimal systems constructed by the control law (3.5.15) are asymptotically as k → ∞ equivalent to the optimal system.

1. First, we prove that the sequence of functions F₀(p,x), F₁(p,x), ... determined by the recurrent formulas (3.5.22), (3.5.23) and the sequence of their partial derivatives ∂F_k(p,x)/∂xᵢ, k = 0, 1, 2, ..., are uniformly convergent. To this end, we construct the differences

Q_k = F_{k+1}(p,x) − F_k(p,x)    (3.5.26)

  = ∫₀^p dσ ∫_{Rⁿ} G(x,p;y,σ) [Φ(σ, ∂F_{k−1}/∂y (σ,y)) − Φ(σ, ∂F_k/∂y (σ,y))] dy    (3.5.27)
(in (3.5.26), (3.5.27) we set k = 0, 1, 2, ..., provided that F₋₁ ≡ 0). Using (3.5.19), (3.5.26), and (3.5.27), we obtain the inequalities

|Q_k(p,x)| ≤ N ∫₀^p dσ ∫_{Rⁿ} |G(x,p;y,σ)| Σ_{i=1}^n |∂Q_{k−1}/∂yᵢ (σ,y)| dy,    (3.5.28)

|∂Q_k/∂xᵢ (p,x)| ≤ N ∫₀^p dσ ∫_{Rⁿ} |∂G/∂xᵢ (x,p;y,σ)| Σ_{j=1}^n |∂Q_{k−1}/∂yⱼ (σ,y)| dy.    (3.5.29)
Formulas (3.5.28), (3.5.29) and (3.5.24), (3.5.25) allow us to calculate estimates for the differences (3.5.26), (3.5.27) recurrently. To this end, it is necessary only to estimate |∂Q₀/∂xᵢ|. It turns out that an estimate of type (3.5.18) holds, that is,

|∂Q₀/∂xᵢ| ≤ K₅ e^{h|x|}.    (3.5.30)

Indeed, since

t^{−n/2} ∫_{Rⁿ} exp(−(λ̄/t)|y|² + h|y|) dy ≤ K₄,    0 < t ≤ T,    (3.5.31)

for λ̄ > 0, we have

|∂F₀/∂xᵢ (p,x)| ≤ const [ p^{−(n+1)/2} ∫_{Rⁿ} exp(−λ̄|x−y|²/p + h|y|) dy
    + ∫₀^p dσ (p−σ)^{−(n+1)/2} ∫_{Rⁿ} exp(−λ̄|x−y|²/(p−σ) + h|y|) dy ]
  ≤ K₃ (1 + p^{−1/2}) e^{h|x|}    (3.5.32)
for the derivative ∂F₀/∂xᵢ, provided that (3.5.18), (3.5.22), and (3.5.25) are taken into account. By using the inequality

|∂Q₀/∂xᵢ| ≤ N ∫₀^p dσ ∫_{Rⁿ} |∂G/∂xᵢ (x,p;y,σ)| Σ_{j=1}^n |∂F₀/∂yⱼ (σ,y)| dy

with regard to (3.5.19), (3.5.27), and (3.5.32), we obtain

|∂Q₀/∂xᵢ| ≤ const ∫₀^p dσ (p−σ)^{−(n+1)/2} (1 + σ^{−1/2}) ∫_{Rⁿ} exp(−λ̄|x−y|²/(p−σ) + h|y|) dy,

and since p is bounded, we arrive at (3.5.30). Using (3.5.30) and applying formulas (3.5.28) and (3.5.29) repeatedly, we estimate the differences (3.5.26) and (3.5.27) for an arbitrary number k ≥ 1 as follows (here Γ(·) is the gamma function):
|Q_k(p,x)| ≤ K₆ (NK₇)^k p^{k/2} / Γ(k/2 + 1) · e^{h|x|},    (3.5.33)

|∂Q_k(p,x)/∂xᵢ| ≤ K₆ (NK₇)^k p^{(k−1)/2} / Γ((k+1)/2) · e^{h|x|}    (3.5.34)
(formulas (3.5.33) and (3.5.34) are proved by induction over k). The estimates obtained show that the sequences of functions

F_k(p,x) = F₀(p,x) + Q₀(p,x) + Q₁(p,x) + ··· + Q_{k−1}(p,x),    (3.5.35)

∂F_k/∂xᵢ = ∂F₀/∂xᵢ + ∂Q₀/∂xᵢ + ··· + ∂Q_{k−1}/∂xᵢ    (3.5.36)

converge to some limit functions

F(p,x) = lim_{k→∞} F_k(p,x),    wᵢ(p,x) = lim_{k→∞} ∂F_k/∂xᵢ (p,x).

In this case, the partial sums on the right-hand side of (3.5.35) converge uniformly in any bounded domain lying in Π_T, while in (3.5.36) the partial sums converge uniformly if they begin from the second term. The estimate (3.5.32) shows that the first summand is majorized by a function with a singularity at p = 0. However, one can readily see that this is an integrable
singularity. Therefore, we can pass to the limit (as k —>• oo) in (3.5.23) and in the formula obtained by differentiating (3.5.23) with respect to a;;. As a result, we obtain
F(p,x) = I
G(x,p;y,Q}i>(y)dy
JRn
+ I" da I JO
= f
Jnn
G(x, p; y, a) [c(y) - $(
JRu
dy.
ox
i
This implies that Wi(p, x) = dF(p,x)/dxi
and hence the limit function
F(p, x) satisfies the equation
F(p,x) = f
G(x,p;y,0)^(y)dy
jRn
+ [" d
Jo
jRn
G(x,p;y,
I
^ oy
>\
(3.5.37)
Equation (3.5.37) is equivalent to the initial equation (3.5.11) with the initial condition (3.5.9), which can be readily verified by differentiating with regard to (3.5.20). Thus, we have proved that there exists a solution of Eq. (3.5.11) with the initial condition (3.5.9). The proof of this statement shows that the solution F(p,x) and its derivatives ∂F/∂xᵢ have the following majorants everywhere in Π_T:

|F(p,x)| ≤ K e^{h|x|},    |∂F/∂xᵢ (p,x)| ≤ K (1 + p^{−1/2}) e^{h|x|}.    (3.5.38)

By using (3.5.38), we can prove that the solution of Eq. (3.5.11) with the initial condition (3.5.9) is unique. Indeed, assume that there exist two solutions F₁ and F₂ of Eq. (3.5.11) (or of (3.5.37)). For the difference V = F₁ − F₂ we obtain the expression

V(p,x) = ∫₀^p dσ ∫_{Rⁿ} G(x,p;y,σ) [Φ(σ, ∂F₂/∂y) − Φ(σ, ∂F₁/∂y)] dy,

which together with (3.5.19) allows us to write

|V(p,x)| ≤ N ∫₀^p dσ ∫_{Rⁿ} |G(x,p;y,σ)| Σ_{i=1}^n |∂V/∂yᵢ (σ,y)| dy.
The same reasoning as for the functions F_k leads to the following estimate for the difference V = F₁ − F₂, which holds for any k:

|V(p,x)| ≤ K₆ (NK₇)^k p^{k/2} / Γ(k/2 + 1) · e^{h|x|}.

This implies that V(p,x) ≡ 0, that is, F₁(p,x) = F₂(p,x). We have proved that the successive approximations F₀(p,x), F₁(p,x), ... obtained by the recurrent formulas (3.5.13) and (3.5.14) converge asymptotically as k → ∞ to the solution of the Bellman equation, which exists and is unique.

2. Now let us return to the synthesis problem. Previously, it was proposed to use the functions u_k(p,x) given by (3.5.15) for the synthesis of the control system. By H_k(p,x) we denote the functional
calculated on the trajectories of system (3.5.1) that pass through the point x at time t = T − p under the action of the control u_k. The function H_k(p,x) determines the "quality" of the control u_k(p,x) and satisfies the linear equation

LH_k(p,x) = −c(x) − u_kᵀ(p,x) qᵀ(p) ∂H_k/∂x (p,x),    H_k(0,x) = ψ(x).    (3.5.39)

From (3.5.14), (3.5.39), and the relation −u_kᵀ qᵀ ∂F_k/∂x = Φ(p, ∂F_k/∂x), it follows that the difference Δ_k(p,x) = F_k(p,x) − H_k(p,x) satisfies the equation

LΔ_k + u_kᵀ qᵀ ∂Δ_k/∂x = Φ(p, ∂F_{k−1}/∂x) − Φ(p, ∂F_k/∂x),    Δ_k(0,x) = 0.    (3.5.40)

Since the right-hand side of (3.5.40) is small for large k (see (3.5.19), (3.5.34)), that is,

|Φ(p, ∂F_{k−1}/∂x) − Φ(p, ∂F_k/∂x)| ≤ N Σ_{i=1}^n |∂Q_{k−1}/∂xᵢ| ≤ ε′_k e^{h|x|},    ε′_k → 0 as k → ∞,    (3.5.41)

and the initial condition in (3.5.40) is zero, we can expect that the difference Δ_k(p,x), considered as the solution of Eq. (3.5.40), is of the same order, that is,

|Δ_k(p,x)| ≤ ε″_k K₆ e^{h|x|}.    (3.5.42)
If the functions u_k(p,x) are bounded and sufficiently smooth, so that the coefficients of the operator L_k = L + u_kᵀ qᵀ ∂/∂x are Hölder continuous, then the operator L_k has the same properties as L, and the inequality (3.5.42) can readily be obtained from (3.5.22), (3.5.24), and (3.5.41). Conversely, if the u_k(p,x) are discontinuous functions (but without singularities, for example, such as in (3.0.1) and (3.0.8)), then the inequality (3.5.42) follows from the results of [81]. Since the series (3.5.35) is convergent, we have |F(p,x) − F_k(p,x)| ≤ ε‴_k K₇ e^{h|x|} (where ε‴_k → 0 as k → ∞). Finally, this fact, the inequality

|F − H_k| ≤ |F − F_k| + |F_k − H_k|,

and (3.5.42) imply

|F(p,x) − H_k(p,x)| ≤ ε_k K₈ e^{h|x|}    (3.5.43)

(ε_k = max(ε″_k, ε‴_k) and K₈ = max(K₆, K₇)). Formula (3.5.43) proves the asymptotic (as k → ∞) optimality of the suboptimal systems constructed according to the control algorithms u_k(p,x) calculated by the recurrent formulas (3.5.13)–(3.5.15).

REMARK 3.5.1. If the coefficients of the operator L are unbounded in Π_T, then the estimates (3.5.24) and (3.5.25), generally speaking, do not hold. However, there may be a change of variables that reduces the problem to the case considered above. If, for example, the coefficients aᵢ(t,x) in (3.5.1) depend on x in a linear way (that is, a(t,x) = A(t)x, where A(t) is an n × n matrix depending only on t), then the change of variables x = X(0,t)y (where X(0,t) is the fundamental matrix of the system ẋ = A(t)x) eliminates the unbounded coefficients in the operator L (in the new variables y), which allows us to investigate such systems by the methods considered above. □
In conclusion, let us consider an example from [96], which illustrates the efficiency of the method of successive approximations for a one-dimensional synthesis problem that can be solved exactly. Let the control system be described by the scalar equation

ẋ = u + ξ(t),

where ξ(t) is a scalar white noise of intensity b, the admissible controls are bounded, |u| ≤ u_m, and let only the deviations of the phase coordinate x be penalized. Then the Bellman equation (3.5.8) and the initial condition (3.5.9) take the form

∂F/∂p = c(x) + min_{|u|≤u_m} [u ∂F/∂x] + (b/2) ∂²F/∂x²,    F(0,x) = 0.    (3.5.44)
Minimizing the expression in the square brackets, we obtain the optimal control

u*(p,x) = −u_m sign (∂F/∂x (p,x)),

and transform the Bellman equation to the form

∂F/∂p = c(x) − u_m |∂F/∂x| + (b/2) ∂²F/∂x²,    F(0,x) = 0.    (3.5.45)
Since the penalty function c(x) is even, it follows from (3.5.45) that for any p the loss function F(p,x) satisfying (3.5.45) is an even function of x; hence we have the explicit formula

u*(p,x) = u*(x) = −u_m sign x.

In this case, for x > 0, the loss function F(p,x) is determined by an explicit formula [26]: a double integral, over σ ∈ (0,p) and y > 0, of the penalty c(y) against the (known, Gaussian-type) transition probability density of the process ẋ = −u_m sign x + ξ(t).
The successive approximations F₀(p,x), F₁(p,x), ... are even functions of the variable x (since c(x) is even). Therefore, in this case, any approximate control (3.5.15) coincides with the optimal control u*, and the efficiency of the method can be estimated by the deviation of the successive approximations F₀, F₁, ... from the exact solution F(p,x) written above.

Choosing the quadratic penalty function c(x) = x² and taking into account the fact that in this case the fundamental solution G(x,p;y,σ) is the Gaussian transition probability density with mean x and variance b(p−σ), we obtain from (3.5.22) and (3.5.23) the following expressions for the first two approximations:

F₀(p,x) = ∫₀^p [dσ / √(2πb(p−σ))] ∫_{−∞}^{∞} y² exp[−(x−y)²/(2b(p−σ))] dy,

F₁(p,x) = F₀(p,x) − 2u_m ∫₀^p [σ dσ / √(2πb(p−σ))] ∫_{−∞}^{∞} |y| exp[−(x−y)²/(2b(p−σ))] dy.
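The example can be reproduced numerically. The sketch below discretizes (3.5.45) and the successive approximations (3.5.13)–(3.5.14) with an explicit finite-difference scheme (grid sizes and the boundary treatment are ad hoc assumptions); for c(x) = x² the zero approximation has the closed form F₀(p,x) = x²p + bp²/2, which the scheme reproduces, and F₁ lands much closer to F than F₀, in line with Fig. 28:

```python
import numpy as np

# Explicit finite differences for dF/dp = c(x) - u_m*|dF/dx| + (b/2)*d2F/dx2
# (Eq. (3.5.45)) and for the approximations F0, F1 of (3.5.13)-(3.5.14),
# with u_m = b = 1 and c(x) = x^2. Grid and boundaries are assumed choices.

u_m, b = 1.0, 1.0
x = np.linspace(-6.0, 6.0, 241)
dx = x[1] - x[0]
n_steps = 1000
dp = 1.0 / n_steps                     # satisfies dp < dx**2/b for stability
c = x**2

def step(F, source):
    """One explicit Euler step of dF/dp = source + (b/2) F_xx."""
    Fxx = np.zeros_like(F)
    Fxx[1:-1] = (F[2:] - 2.0 * F[1:-1] + F[:-2]) / dx**2
    Fn = F + dp * (source + 0.5 * b * Fxx)
    Fn[0], Fn[-1] = Fn[1], Fn[-2]      # crude zero-gradient boundary
    return Fn

F0 = np.zeros_like(x)                  # LF0 = -c: control ignored entirely
F1 = np.zeros_like(x)                  # source uses Phi(p, dF0/dx) = u_m|dF0/dx|
F = np.zeros_like(x)                   # "exact" solution of (3.5.45)

for _ in range(n_steps):
    g0 = np.gradient(F0, dx)           # dF0/dx at the current level of p
    F1 = step(F1, c - u_m * np.abs(g0))
    F = step(F, c - u_m * np.abs(np.gradient(F, dx)))
    F0 = step(F0, c)

i0 = len(x) // 2                       # grid index of x = 0
print(F0[i0])                          # analytic value x^2*p + b*p^2/2 = 0.5
print(F1[i0], F[i0])                   # F1 lies far closer to F than F0 does
```

At x = 0, p = 1 the first correction already removes most of the gap between F₀ and F, confirming the "satisfactory second approximation" reported for Fig. 28.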
FIG. 28. The functions F(1,x), F₀(1,x), F₁(1,x).

The functions F₀, F₁, F calculated for u_m = b = p = 1 are shown in Fig. 28. One can see that F₁(1,x) approximates the exact solution F(1,x) much better than F₀(1,x) does; that is, the second approximation gives a satisfactory approximation to the exact solution. This example shows that the actual rate of convergence of successive approximations to the exact solution of the Bellman equation can be larger than the theoretical rate of convergence estimated by (3.5.35) and (3.5.33), since the proof of the convergence of the method of successive approximations (3.5.13)–(3.5.15) is based on the rather rough estimates (3.5.24) and (3.5.25) for the fundamental solution.

§3.6. Approximate synthesis of some stochastic systems with distributed parameters

This section occupies a special place in the book, since only here we consider optimal control systems with distributed parameters, in which the plant dynamics is described by partial differential equations. So far, the theory of optimal control of systems with distributed parameters is characterized by significant progress, first of all in its deterministic branch [30, 130]. Important results have also been obtained in stochastic problems (the distributed Kalman filter, the separation theorem in the optimal control synthesis for linear systems with quadratic criterion, etc. [118, 182]).
However, many problems in the stochastic theory of systems with lumped parameters still remain to be generalized to the case of distributed plants. We do not try to consider these problems in detail but only discuss the possible use of the approximate synthesis procedure (3.0.6)–(3.0.8) for solving some stochastic control problems for distributed systems. Our consideration is confined to problems in which the plants are described by linear partial differential equations of parabolic type.
3.6.1. Statement of the problem. Let us consider control systems subject to the equation

∂v(t,x)/∂t = L_x v(t,x) + u(t,x) + ξ(t,x),    0 < t ≤ T,    v(0,x) = v₀(x).    (3.6.1)

Here L_x denotes a smooth elliptic operator with respect to the spatial variables x = (x₁, ..., xₙ),

L_x = aᵢⱼ(t,x) ∂²/∂xᵢ∂xⱼ + bᵢ(t,x) ∂/∂xᵢ + c(t,x),    (3.6.2)

whose coefficients aᵢⱼ(t,x), bᵢ(t,x), and c(t,x) are defined in the cylinder Ω = D × [0,T], where D is the closure of an arbitrary domain in the n-dimensional Euclidean space Rⁿ and the matrix a(t,x) satisfies the inequality

(ηᵀaη) = aᵢⱼ(t,x) ηᵢηⱼ ≥ 0    (3.6.3)

for all (t,x) ∈ Ω and all η = (η₁, ..., ηₙ) (as usual, in (3.6.2) and (3.6.3) the sum is taken over twice repeated indices from 1 to n). If D does not coincide with the entire space Rⁿ, then, in addition to (3.6.1), the following boundary conditions must be satisfied at the boundary ∂D of the domain D:
M_x v(t,x) = u_Γ(t,x),    (3.6.4)

where the linear operator M_x depends on the character of the boundary problem. Thus, for the first, the second, and the third boundary value problems, condition (3.6.4) has the form

v(t,x) = u_Γ(t,x),    (3.6.4.I)

(∂v/∂σ)(t,x) = u_Γ(t,x),    (3.6.4.II)

(∂v/∂σ)(t,x) + q(t,x) v(t,x) = u_Γ(t,x).    (3.6.4.III)
Here x 6 dD, dv/dcr denotes the outward conormal derivative, and a is the outward conormal vector whose components CTJ (i = 1,.. .,n) and the
Approximate Synthesis of Stochastic Control Systems
201
components of the outward normal v on the boundary dD are related by the formulas cr^ — a.iji>j [61, 124]; in particular, if ||ajj||" is the identity matrix, i.e., dij = <5jj, then the conormal coincides with the normal. For example, equations of the form (3.6.1) with the boundary conditions (3.6.4) describe heat propagation or variation in a substance concentration in diffusion processes in some volume D [166, 179]. In this case, v(t, x) is the temperature (or, respectively, the concentration) at the point x G D at time t. Then the boundary condition (3.6.4.1) determines the temperature (concentration), and the condition (3. 6. 4. II) determines the heat (substance) flux through the boundary dD of the volume D. System (3.6.1) is controlled both by control actions u(t, x) distributed throughout the volume and by variations in the boundary operating conditions up(t, x). The admissible controls are piecewise continuous functions u(t,x) and t/ r (i, x) with values in bounded closed domains:
u(t,x)£U(x),
x£D;
ur(t, x) <E Ur(x),
x & dD.
(3.6.5)
We assume that the spatially distributed random action $\xi(t,x)$ is of the nature of a spatially correlated normal white noise:

\[ \mathsf{E}\,\xi(t,x) = 0, \qquad \mathsf{E}\,\xi(t,x)\,\xi(\tau,y) = K(t,x,y)\,\delta(t-\tau), \tag{3.6.6} \]

where $K(t,x,y)$ is a positive definite kernel function symmetric in $x$ and $y$, and $\delta(t)$ is the delta function. We also assume that, under the above assumptions, the function $v(t,x)$ characterizing the plant state at time $t$ is uniquely determined as the generalized solution of Eq. (3.6.1) that satisfies (3.6.1) for $(x,t) \in D \times (0,T]$ and is a continuous continuation of a given initial function $v(0,x) = v_0(x)$ as $t \to 0$ and of the boundary conditions (3.6.4) as $x \to \partial D$. The problem is to find functions $u_*(t,x)$ and $u_\Gamma^*(t,x)$ satisfying (3.6.5) that minimize the optimality criterion

\[ I = \mathsf{E}\Big[ \int_0^T \int_D \cdots \int_D \int_{\partial D} \omega\big[v(t,x^1),\dots,v(t,x^s),\,u(t,x),\,u_\Gamma(t,x')\big]\, dx^1 \cdots dx^s\, dx\, dx'\, dt \Big], \tag{3.6.7} \]
where $x^i = (x_1^i, x_2^i, \dots, x_n^i)$, $dx^i = dx_1^i\, dx_2^i \cdots dx_n^i$ ($i = 1, 2, \dots, s$), and $\omega$ is an arbitrary nonnegative integrable function. In this case, the desired functions $u_*$ and $u_\Gamma^*$ must depend on the current state $v(t,x)$ of the controlled system (the synthesis functions), that is, they must have the operator form

\[ u_*(t,x) = \varphi[t, v(t,x)], \qquad u_\Gamma^*(t,x) = \psi[t, v(t,x)] \tag{3.6.8} \]

(it is assumed that the state function $v(t,x)$ can be measured precisely).
202
Chapter III
3.6.2. The Bellman equation and equations of successive approximations. To find the operators (3.6.8), we shall use the dynamic programming approach. Taking into account the properties of the parabolic equation (3.6.1) and the nature of the random actions (3.6.6), we can prove [95] that the time evolution of $v(t,x)$ is Markov in the following sense: for given functions $u(t,x)$ and $u_\Gamma(t,x)$, the probability distribution of the future values of $v(\tau,x)$ for $\tau > t$ is completely determined by the value of the function $v(t,x)$ at time $t$. This allows us to consider the minimum losses on the time interval $[t,T]$,
\[ F[t, v(t,x)] = \min_{\substack{u(\tau,x)\in U(x) \\ u_\Gamma(\tau,x)\in U_\Gamma(x)}} \mathsf{E}\Big[ \int_t^T \omega_\tau\, d\tau \Big], \]

where

\[ \omega_\tau = \int_D \cdots \int_D \int_{\partial D} \omega\big[v(\tau,x^1),\dots,v(\tau,x^s),\,u(\tau,x),\,u_\Gamma(\tau,x')\big]\, dx^1 \cdots dx^s\, dx\, dx', \tag{3.6.9} \]
as a functional depending only on the initial (at time $t$) state $v(t,x)$ and time $t$. Therefore, the fundamental difference equation of the dynamic programming approach (see (1.4.6)) can be written as

\[ F[t, v(t,x)] = \min_{\substack{u\in U \\ u_\Gamma\in U_\Gamma}} \mathsf{E}\Big[ \int_t^{t+\Delta t} \omega_\tau\, d\tau + F[t+\Delta t,\, v(t+\Delta t, x)] \Big]. \tag{3.6.10} \]
For small $\Delta t$, in view of (3.6.1), we have

\[ v(t+\Delta t, x) = v(t,x) + \Delta v(t,x), \qquad \Delta v(t,x) = \big[L_x v(t,x) + u(t,x)\big]\Delta t + \int_t^{t+\Delta t} \xi(\tau,x)\, d\tau + o(\Delta t). \tag{3.6.11} \]
Taking (3.6.11) into account, we can expand the functional $F[t+\Delta t, v(t+\Delta t,x)]$ in the functional Taylor series [91]

\[ F[t+\Delta t, v(t+\Delta t,x)] = F[t, v(t,x)] + \frac{\partial F[t,v(t,x)]}{\partial t}\,\Delta t + \int_D \frac{\delta F[t,v(t,x)]}{\delta v(t,x)}\,\Delta v(t,x)\, dx + \frac12 \int_D\!\!\int_D \frac{\delta^2 F[t,v(t,x)]}{\delta v(t,x)\,\delta v(t,y)}\,\Delta v(t,x)\,\Delta v(t,y)\, dx\, dy + \dots\ . \tag{3.6.12} \]
Approximate Synthesis of Stochastic Control Systems
203
The functional derivatives $\delta F/\delta v$ and $\delta^2 F/\delta v(x)\delta v(y)$ in (3.6.12) (for their detailed description, see [91]) can be obtained by calculating the standard derivatives in the formulas

\[ \frac{\delta F}{\delta v(t,x)} = \lim_{\substack{\Delta\to 0 \\ x_j\to x}} \frac{1}{\Delta^n}\, \frac{\partial F_\Delta}{\partial v_j}, \qquad \frac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)} = \lim_{\substack{\Delta\to 0 \\ x_j\to x,\ x_i\to y}} \frac{1}{\Delta^{2n}}\, \frac{\partial^2 F_\Delta}{\partial v_j\,\partial v_i}. \tag{3.6.13} \]
In (3.6.13) the functional $F_\Delta(v_1, v_2, \dots)$ denotes a discrete analog of the functional $F(t, v(t,x))$, which can be obtained by dividing the volume $D$ into $n$-dimensional cubes $\Delta_i$ of equal volume $\Delta^n$ and replacing the continuous function $v(t,x)$ by the set of discrete values $v_1, v_2, \dots$, each of which is equal to the value of $v(t,x)$ at the center of the cube $\Delta_i$. In this case, the functional $F$ is assumed to be sufficiently smooth, that is, its weak and strong Gateaux and Fréchet derivatives [91] exist up to the second order inclusively, are equal to each other, and coincide with (3.6.13).
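As a sanity check, the limiting formulas (3.6.13) can be reproduced numerically for a simple integral functional. The quadratic kernel, the test function $v$, and the grid size below are illustrative choices, not taken from the text; a minimal sketch:

```python
import numpy as np

# Check the limiting formula (3.6.13) on the illustrative quadratic
# functional F[v] = ∫0^1 ∫0^1 theta(x,y) v(x) v(y) dx dy (n = 1), whose
# exact functional derivative is dF/dv(x) = 2 ∫0^1 theta(x,y) v(y) dy.
Delta = 1e-3                                   # cube (cell) size
xs = np.arange(Delta / 2, 1.0, Delta)          # centers of the cells
Th = 1.0 + np.outer(xs, xs)                    # theta(x,y) = 1 + x*y (test kernel)
v = np.sin(np.pi * xs)                         # test state v(x)

def F_disc(vv):
    """Discrete analog F_Delta(v_1,...,v_r) of the functional F[v]."""
    return Delta**2 * vv @ Th @ vv

# delta F / delta v(x_j) ~ (1/Delta^n) * dF_Delta/dv_j, with the ordinary
# partial derivative taken by a central difference.
j = len(xs) // 3
h = 1e-4
vp, vm = v.copy(), v.copy()
vp[j] += h
vm[j] -= h
dF_num = (F_disc(vp) - F_disc(vm)) / (2 * h) / Delta

dF_exact = 2 * Delta * Th[j] @ v               # 2 ∫ theta(x_j, y) v(y) dy
assert abs(dF_num - dF_exact) < 1e-5
```

Because the test functional is quadratic, the central difference is exact up to rounding, so the discrete formula matches the analytic derivative to high accuracy.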
Substituting the expansion (3.6.12) into (3.6.10), passing to the limit as $\Delta t \to 0$, and taking into account (3.6.6) and (3.6.11), we obtain the Bellman equation with functional derivatives:

\[ -\frac{\partial F}{\partial t} = \min_{\substack{u\in U \\ u_\Gamma\in U_\Gamma}} \Big\{ \omega(u, u_\Gamma, v) + \int_D \frac{\delta F}{\delta v(t,x)} \big[L_x v(t,x) + u(t,x)\big]\, dx \Big\} + \frac12 \int_D\!\!\int_D K(t,x,y)\, \frac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)}\, dx\, dy. \tag{3.6.14} \]
To find the desired optimal control operators (3.6.8), it is necessary to solve Eq. (3.6.14).
The integral in the braces in (3.6.14) depends (in addition to the distributed controls $u(t,x)$) on the control actions $u_\Gamma(t,x)$ that determine the boundary conditions (3.6.4) for the function $v(t,x)$ obtained by solving Eq. (3.6.1). We can write this dependence explicitly by using the Green formula [61, 124]

\[ \int_D \frac{\delta F}{\delta v}\, L_x v\, dx = \int_D v\, L_x^* \frac{\delta F}{\delta v}\, dx + \int_{\partial D} \Big\{ \frac{\delta F}{\delta v}\,\frac{\partial v}{\partial\sigma} - v\,\frac{\partial}{\partial\sigma}\Big(\frac{\delta F}{\delta v}\Big) + v\,\frac{\delta F}{\delta v}\Big(b_i - \frac{\partial a_{ij}}{\partial x_j}\Big)\cos(\nu, x_i) \Big\}\, dx, \tag{3.6.15} \]
204
Chapter III
where $L_x^*$ denotes the differential operator adjoint to $L_x$ in the variables $x$, and $\nu$ is the outward normal on $\partial D$. In (3.6.15) the integral over the boundary $\partial D$ of the domain $D$ explicitly depends on the control $u_\Gamma(t,x)$ of the boundary operating conditions, as follows from (3.6.4). To be definite, let us consider the third boundary value problem (3.6.4.III). The outward conormal derivative of the state function $v(t,x)$ on the boundary $\partial D$ can then be written as

\[ \frac{\partial v}{\partial\sigma}(t,x) = u_\Gamma(t,x) - q(t,x)\,v(t,x) \qquad (x \in \partial D). \tag{3.6.16} \]
Substituting (3.6.16) into (3.6.15) and (3.6.15) into (3.6.14), we obtain the following final Bellman equation (for the third boundary value problem):

\[ -\frac{\partial F}{\partial t} = \min_{\substack{u\in U \\ u_\Gamma\in U_\Gamma}} \Big\{ \omega(u, u_\Gamma, v) + \int_D \frac{\delta F}{\delta v}\, u\, dx + \int_{\partial D} \frac{\delta F}{\delta v}\, u_\Gamma\, dx \Big\} + \int_D v\, L_x^* \frac{\delta F}{\delta v}\, dx \]
\[ \qquad + \int_{\partial D} \Big[ v\,\frac{\delta F}{\delta v}\Big(b_i - \frac{\partial a_{ij}}{\partial x_j}\Big)\cos(\nu, x_i) - v\,\frac{\partial}{\partial\sigma}\Big(\frac{\delta F}{\delta v}\Big) - q\,v\,\frac{\delta F}{\delta v} \Big]\, dx + \frac12 \int_D\!\!\int_D K(t,x,y)\, \frac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)}\, dx\, dy, \]
\[ F(T, v(T,x)) = 0. \tag{3.6.17} \]
This equation can be solved only approximately if the penalty functions are arbitrary and the controls $u$ and $u_\Gamma$ are subject to constraints. Let us consider one of the methods for solving (3.6.17) that uses the approximate synthesis procedure of the form (3.0.6)–(3.0.8). As already noted (§§3.1–3.4), the approximate synthesis method is especially convenient if the controlling actions are small, namely, $\|v - v_0\|/\|v\| \ll 1$, where $v$ is the solution of Eq. (3.6.1) with the boundary condition (3.6.4) and with any admissible functions $u$ and $u_\Gamma$ satisfying (3.6.5), $v_0$ is the solution of the corresponding homogeneous (in $u$ and $u_\Gamma$) problem, and $\|\cdot\|$ is the norm in the space $L_2$. From a physical viewpoint, this means that the power of the (controlled) sources is not large as compared with $\|v\|^2$ or with the intensity $\int_D\int_D K(t,x,y)\, dx\, dy$ of the random perturbations $\xi(t,x)$. Then, setting $u(t,x) = u_\Gamma(t,x) = 0$, we obtain the following equation for the zero approximation instead of (3.6.17):
\[ -\frac{\partial F_0}{\partial t} = \omega_0(v(t,x)) + \int_D v\, L_x^* \frac{\delta F_0}{\delta v}\, dx + \int_{\partial D} \Big[ v\,\frac{\delta F_0}{\delta v}\Big(b_i - \frac{\partial a_{ij}}{\partial x_j}\Big)\cos(\nu, x_i) - v\,\frac{\partial}{\partial\sigma}\Big(\frac{\delta F_0}{\delta v}\Big) - q\,v\,\frac{\delta F_0}{\delta v} \Big]\, dx \]
\[ \qquad + \frac12 \int_D\!\!\int_D K(t,x,y)\, \frac{\delta^2 F_0}{\delta v(t,x)\,\delta v(t,y)}\, dx\, dy, \qquad F_0(T, v(T,x)) = 0. \tag{3.6.18} \]
Here, according to (3.6.9), $\omega_0(v(t,x))$ is a functional of the form

\[ \omega_0(v(t,x)) = \omega(0, 0, v) = \int_D \cdots \int_D \int_{\partial D} \omega\big[v(t,x^1),\dots,v(t,x^s),\,0,\,0\big]\, dx^1 \cdots dx^s\, dx\, dx' = \int_D \cdots \int_D \omega_0\big[v(t,x^1),\dots,v(t,x^s)\big]\, dx^1 \cdots dx^s. \tag{3.6.19} \]
If the functional $F_0(t, v(t,x))$ satisfying (3.6.18) is found, then the condition

\[ \min_{\substack{u\in U \\ u_\Gamma\in U_\Gamma}} \Big\{ \omega(u, u_\Gamma, v) + \int_D \frac{\delta F_0}{\delta v}\, u\, dx + \int_{\partial D} \frac{\delta F_0}{\delta v}\, u_\Gamma\, dx \Big\} = \omega(u_0, u_\Gamma^0, v) + \int_D \frac{\delta F_0}{\delta v}\, u_0\, dx + \int_{\partial D} \frac{\delta F_0}{\delta v}\, u_\Gamma^0\, dx \equiv \omega_1(v(t,x)) \tag{3.6.20} \]
allows us to calculate the zero-approximation optimal control functions (operators) $u_0(t,x) = \varphi_0(t, v(t,x))$ and $u_\Gamma^0(t,x) = \psi_0(t, v(t,x))$. The expression for $\omega_1(v(t,x))$ is used to calculate the first approximation $F_1(t, v(t,x))$, and so on. In general, the $k$th approximation $F_k(t, v(t,x))$ ($k = 1, 2, \dots$) of the loss functional is determined as the solution of an equation of the form (3.6.18) in which $\omega_0$ is replaced by $\omega_k$ and $F_0$ by $F_k$. Furthermore, simultaneously with $F_k$, we determine the pair of functions (operators)

\[ u_k(t,x) = \varphi_k[t, v(t,x)], \quad x \in D; \qquad u_\Gamma^k(t,x) = \psi_k[t, v(t,x)], \quad x \in \partial D, \]

which allow us to synthesize a suboptimal control system in the $k$th approximation (the functions $\varphi_k$ and $\psi_k$ can be obtained from Eq. (3.6.20) with $F_0$ replaced by $F_k$).
3.6.3. Quadrature formulas for the functionals of successive approximations $F_k[t, v(t,x)]$, $k = 0, 1, 2, \dots$. To use the above procedure of approximate synthesis in practice, we need to solve Eq. (3.6.18) and the corresponding equations for $F_k$ ($k = 1, 2, \dots$).
First, let us consider the zero-approximation equation (3.6.18). We show that if the influence function $G(x,t;\xi,\tau)$ of an instantaneous point source¹⁵ is known, then the solution of Eq. (3.6.18) can be written in the form

\[ F_0[t, v(t,x)] = \int_D \cdots \int_D dx^1 \cdots dx^s \int_t^T d\tau \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \omega_0(v_1,\dots,v_s)\, p_{t\tau}(v_1,\dots,v_s;\, v(t,x))\, dv_1 \cdots dv_s, \tag{3.6.21} \]

where the function $\omega_0(v_1,\dots,v_s)$ is defined by (3.6.19) and

\[ p_{t\tau}(v_1,\dots,v_s;\, v(t,x)) = \big[(2\pi)^s \det\|D_{t\tau}\|\big]^{-1/2} \exp\Big\{ -\frac12 \big(D_{t\tau}^{-1}\big)_{\alpha\beta} \Big[v_\alpha - \int_D G(x^\alpha,\tau; x,t)\, v(t,x)\, dx\Big] \Big[v_\beta - \int_D G(x^\beta,\tau; x,t)\, v(t,x)\, dx\Big] \Big\}. \tag{3.6.22} \]

Here the entries of the matrix $\|D_{t\tau}\|$ are given by the formulas
\[ D_{\alpha\beta} = \int_t^\tau d\mu \int_D\!\!\int_D K(\mu, x, y)\, G(x^\alpha,\tau; x,\mu)\, G(x^\beta,\tau; y,\mu)\, dx\, dy, \tag{3.6.23} \]
and $\big(D_{t\tau}^{-1}\big)_{\alpha\beta}$ denotes the $(\alpha,\beta)$th element of the inverse matrix $\|D_{t\tau}\|^{-1}$. To prove (3.6.21) and (3.6.22), we need to recall some well-known facts [61, 124] from the theory of adjoint operators and Green functions. Suppose that a smooth elliptic operator $L_x$ of the form (3.6.2) is given in an arbitrary domain $D$ of an $n$-dimensional Euclidean space $R^n$. We also assume that this operator is defined on the space of functions $f$ sufficiently smooth in $D$ and satisfying the equation

\[ \mathcal{M}_x f = 0 \qquad (x \in \partial D) \tag{3.6.24} \]

on the boundary $\partial D$ of the domain $D$; here $\mathcal{M}_x$ denotes a certain differential operator with respect to the variables $x \in \partial D$ (a boundary operator).
¹⁵The function $G(x,t;\xi,\tau)$, $t > \tau$, is, with respect to the variables $(x,t)$, the solution of the homogeneous boundary value problem (3.6.1), (3.6.4) (the case in which $u(t,x) = u_\Gamma(t,x) = \xi(t,x) = 0$ in (3.6.1) and (3.6.4)) with the initial condition $v(\tau,x) = \delta(x - \xi)$. This function is also called the fundamental solution or the Green function of problem (3.6.1), (3.6.4).
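For the simplest self-adjoint case ($L_x = a^2\,\partial^2/\partial x^2$ on an interval with a homogeneous second-kind boundary condition), the influence function can be written as a cosine eigenfunction series and its defining properties checked numerically. The interval length, diffusivity, and truncation order below are illustrative assumptions; a minimal sketch:

```python
import numpy as np

# Green function of v_t = a^2 v_xx on (0, l) with v_x = 0 at x = 0 and x = l,
# expanded in the Neumann (cosine) eigenfunctions; l, a, N are illustrative.
l, a, N = 1.0, 0.7, 200

def G(x, t, xi, tau):
    """G(x, t; xi, tau) for t > tau: solution started from delta(x - xi)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = np.arange(1, N + 1)[:, None]
    decay = np.exp(-(a * n * np.pi / l) ** 2 * (t - tau))
    series = np.cos(n * np.pi * x / l) * np.cos(n * np.pi * xi / l)
    return 1.0 / l + (2.0 / l) * np.sum(decay * series, axis=0)

xs = np.linspace(0.0, l, 2001)
g = G(xs, 0.3, 0.4, 0.0)
dx = xs[1] - xs[0]
mass = dx * (g.sum() - 0.5 * (g[0] + g[-1]))   # trapezoidal rule

# The delta initial condition carries unit "mass", and the flux-free
# boundary conditions preserve it.
assert abs(mass - 1.0) < 1e-6
# In this self-adjoint case (a_ij = const, b_i = 0) the duality relation
# (3.6.36) below reduces to symmetry of G in its space arguments.
assert abs(G(0.25, 0.3, 0.6, 0.0)[0] - G(0.6, 0.3, 0.25, 0.0)[0]) < 1e-9
```

The same construction, with the cosines replaced by the eigenfunctions of the actual boundary operator, underlies the quadrature formulas (3.6.21)–(3.6.23).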
DEFINITION 3.6.1. The operators $L_x^*$ ($x \in D$) and $\mathcal{M}_x^*$ ($x \in \partial D$) are called adjoint operators of $L_x$ and $\mathcal{M}_x$ if for arbitrary sufficiently smooth functions $f(x)$, satisfying (3.6.24), and $g(x)$, satisfying

\[ \mathcal{M}_x^* g = 0 \qquad (x \in \partial D), \tag{3.6.25} \]

we have the relation

\[ \int_D \big( g\, L_x f - f\, L_x^* g \big)\, dx = 0. \tag{3.6.26} \]
In general, the adjoint operators $L_x^*$ and $\mathcal{M}_x^*$ are not uniquely defined. However, if we set $L_x^*$ equal to the adjoint operator defined in the unbounded domain $D = R^n$ [61], that is,

\[ L_x^* g = \frac{\partial^2}{\partial x_i \partial x_j}\big( a_{ij}(t,x)\, g \big) - \frac{\partial}{\partial x_i}\big( b_i(t,x)\, g \big), \tag{3.6.27} \]

then it follows from Definition 3.6.1 and the Green formula

\[ \int_D \big( g\, L_x f - f\, L_x^* g \big)\, dx = \int_{\partial D} \Big[ g\,\frac{\partial f}{\partial\sigma} - f\,\frac{\partial g}{\partial\sigma} + f g \Big( b_i - \frac{\partial a_{ij}}{\partial x_j} \Big)\cos(\nu, x_i) \Big]\, dx \]

that the operator $\mathcal{M}_x^*$ can be defined uniquely. So, for the first, second, and third homogeneous boundary conditions (that is, for the conditions (3.6.4.I)–(3.6.4.III) with $u_\Gamma(t,x) \equiv 0$), Eq. (3.6.25) takes, respectively, the form

\[ g\big|_{\partial D} = 0, \tag{3.6.25.I} \]
\[ \Big[ \frac{\partial g}{\partial\sigma} - g \Big( b_i - \frac{\partial a_{ij}}{\partial x_j} \Big)\cos(\nu, x_i) \Big]_{\partial D} = 0, \tag{3.6.25.II} \]
\[ \Big[ \frac{\partial g}{\partial\sigma} + q g - g \Big( b_i - \frac{\partial a_{ij}}{\partial x_j} \Big)\cos(\nu, x_i) \Big]_{\partial D} = 0. \tag{3.6.25.III} \]
Now let us consider the parabolic operators

\[ L = -\frac{\partial}{\partial t} + L_x, \tag{3.6.28} \]
\[ L^* = \frac{\partial}{\partial t} + L_x^*. \tag{3.6.29} \]
DEFINITION 3.6.2. A function $G(x,t;\xi,\tau)$ defined and continuous for $(x,t), (\xi,\tau) \in \Omega$, $t > \tau$, is called the influence function of a point source (the Green function) for the equation $Lf = 0$ in the domain $\Omega$ if for any $\tau \in [0,T)$ the function $G(x,t;\xi,\tau)$ satisfies the equation

\[ LG = 0 \tag{3.6.30} \]

in the variables $(t,x)$ in the domain $D \times (\tau < t \le T)$ and satisfies the initial and boundary conditions

\[ \lim_{t\downarrow\tau} G(x,t;\xi,\tau) = \delta(x - \xi), \tag{3.6.31} \]
\[ \mathcal{M}_x G = 0 \quad \text{for } x \in \partial D,\ \tau < t \le T. \tag{3.6.32} \]
In a similar way, the Green function $G^*(x,t;\xi,\tau)$ is defined for the adjoint parabolic operator (3.6.29). The only difference is that, in this case, the function $G^*$ is defined for times $t < \tau$. The conditions (similar to (3.6.30)–(3.6.32)) that determine the Green function for the adjoint problem have the form

\[ L^* G^* = 0 \quad \text{for } (t,x) \in D \times (0 \le t < \tau), \tag{3.6.33} \]
\[ \lim_{t\uparrow\tau} G^*(x,t;\xi,\tau) = \delta(x - \xi), \tag{3.6.34} \]
\[ \mathcal{M}_x^* G^* = 0 \quad \text{for } (t,x) \in \partial D \times (0 \le t < \tau). \tag{3.6.35} \]
The following statement readily holds for the functions G and G*.
DUALITY THEOREM. If $G(x,t;\xi,\tau)$ and $G^*(x,t;\xi,\tau)$ satisfy problems (3.6.30)–(3.6.32) and (3.6.33)–(3.6.35), respectively, then

\[ G(x,t;\xi,\tau) = G^*(\xi,\tau;x,t). \tag{3.6.36} \]
PROOF. Let us consider the functions $G(y,\eta;\xi,\tau)$ and $G^*(y,\eta;x,t)$ for $y \in D$ and $\tau < \eta < t$. Taking into account the fact that these functions satisfy (3.6.30) and (3.6.33) in $y$ and $\eta$, in view of Definition 3.6.1 of the adjoint (in $y$) operator $L_y^*$, we have

\[ 0 = \int_{\tau+\varepsilon}^{t-\varepsilon}\!\! \int_D \big( -G^* L G + G L^* G^* \big)\, dy\, d\eta = \int_{\tau+\varepsilon}^{t-\varepsilon}\!\! \int_D \Big( G^*\,\frac{\partial G}{\partial\eta} + G\,\frac{\partial G^*}{\partial\eta} \Big)\, dy\, d\eta = \int_D \big[ G^* G \big]_{\eta=\tau+\varepsilon}^{\eta=t-\varepsilon}\, dy. \tag{3.6.37} \]
Rewriting (3.6.37) in the form

\[ \int_D G^*(y, t-\varepsilon; x, t)\, G(y, t-\varepsilon; \xi, \tau)\, dy = \int_D G^*(y, \tau+\varepsilon; x, t)\, G(y, \tau+\varepsilon; \xi, \tau)\, dy, \]

passing to the limit as $\varepsilon \to 0$, and taking into account (3.6.31) and (3.6.34), we obtain (3.6.36). $\square$

Now, by using the properties of the Green functions, we shall show that the functional (3.6.21) actually satisfies Eq. (3.6.18). To this end, we need to calculate all the derivatives in (3.6.18). Taking into account the limit relation

\[ \lim_{\tau\downarrow t} p_{t\tau}(v_1,\dots,v_s;\, v(t,x)) = \prod_{\alpha=1}^{s} \delta\big( v_\alpha - v(t, x^\alpha) \big), \]

which follows from (3.6.22) and the property (3.6.31) of the Green function, we differentiate (3.6.21) with respect to time and obtain

\[ \frac{\partial F_0}{\partial t} = -\int_D \cdots \int_D \omega_0\big[v(t,x^1),\dots,v(t,x^s)\big]\, dx^1 \cdots dx^s + \int_D \cdots \int_D dx^1 \cdots dx^s \int_t^T d\tau \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \omega_0(v_1,\dots,v_s)\, \frac{\partial p_{t\tau}}{\partial t}(v_1,\dots,v_s;\, v(t,x))\, dv_1 \cdots dv_s. \tag{3.6.38} \]
To calculate $\partial p_{t\tau}/\partial t$, we use the rules for differentiating determinants and inverse matrices:

\[ \frac{d}{dt}\det B = \det B \cdot \mathrm{Sp}\big( B^{-1} \dot B \big), \qquad \frac{d}{dt} B^{-1} = -B^{-1} \dot B B^{-1} \]

(here $\dot B$ is the matrix composed of the time derivatives of the entries of the matrix $B$). Performing the necessary calculations, we obtain

\[ \frac{\partial p_{t\tau}}{\partial t} = p_{t\tau} \Big\{ \frac12\,\mathrm{Sp}\big( D_{t\tau}^{-1} \widetilde K \big) - \frac12 \big( D_{t\tau}^{-1} \widetilde K D_{t\tau}^{-1} \big)_{\alpha\beta} [\,\cdot\,]_\alpha [\,\cdot\,]_\beta + \big( D_{t\tau}^{-1} \big)_{\alpha\beta} [\,\cdot\,]_\beta \int_D \frac{\partial G}{\partial t}(x^\alpha,\tau; x,t)\, v(t,x)\, dx \Big\}, \qquad \alpha,\beta = 1,\dots,s, \tag{3.6.39} \]
where, for brevity, we use the notation

\[ \widetilde K = \|K_{\alpha\beta}\|_1^s, \qquad K_{\alpha\beta} = \int_D\!\!\int_D K(t,x,y)\, G(x^\alpha,\tau; x,t)\, G(x^\beta,\tau; y,t)\, dx\, dy, \]
\[ [\,\cdot\,]_\alpha = \Big[ v_\alpha - \int_D G(x^\alpha,\tau; x,t)\, v(t,x)\, dx \Big], \qquad p_{t\tau} = p_{t\tau}(v_1,\dots,v_s;\, v(t,x)). \tag{3.6.40} \]
By formulas (3.6.13) and (3.6.22), we can readily obtain the first- and second-order functional derivatives

\[ \frac{\delta F_0}{\delta v(t,x)} = \int_D \cdots \int_D dx^1 \cdots dx^s \int_t^T d\tau \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \omega_0(v_1,\dots,v_s)\, p_{t\tau}\, G(x^\alpha,\tau; x,t)\, \big( D_{t\tau}^{-1} \big)_{\alpha\beta} [\,\cdot\,]_\beta\, dv_1 \cdots dv_s, \tag{3.6.41} \]

\[ \frac{\delta^2 F_0}{\delta v(t,x)\,\delta v(t,y)} = \int_D \cdots \int_D dx^1 \cdots dx^s \int_t^T d\tau \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \omega_0(v_1,\dots,v_s)\, p_{t\tau} \Big\{ \big( D_{t\tau}^{-1} \big)_{\alpha\beta} [\,\cdot\,]_\beta\, G(x^\alpha,\tau; x,t)\, \big( D_{t\tau}^{-1} \big)_{\gamma\rho} [\,\cdot\,]_\rho\, G(x^\gamma,\tau; y,t) - \big( D_{t\tau}^{-1} \big)_{\alpha\beta}\, G(x^\alpha,\tau; x,t)\, G(x^\beta,\tau; y,t) \Big\}\, dv_1 \cdots dv_s. \tag{3.6.42} \]

In view of (3.6.36), the Green functions $G(x^\alpha,\tau; x,t)$ in (3.6.39)–(3.6.42) satisfy (with respect to $x$ and $t$) the adjoint equation (3.6.33) in the interior of the domain $D$ and the adjoint boundary condition on the boundary $\partial D$. Taking into account the fact that the adjoint boundary condition has the form (3.6.25.III) for the third boundary value problem (Eq. (3.6.18) was written just for this problem) and substituting (3.6.41) into (3.6.18), we readily verify that the integral over the boundary $\partial D$ in (3.6.18) is equal to zero. Finally, substituting (3.6.38)–(3.6.42) into (3.6.18), we arrive at an identity, and relation (3.6.21) is thereby proved.

The solution of the zero-approximation equation (3.6.18) is given by
formulas (3.6.21) and (3.6.22). As a rule, the higher-order approximations $F_k(t, v(t,x))$, $k \ge 1$, are calculated by more complicated formulas in which, in addition, we must pass to a limit, since, in general, $\omega_k(v(t,x))$, $k \ge 1$, are not integral functionals of the form (3.6.19). Therefore, we can calculate the successive approximations $F_k(t, v(t,x))$, $k \ge 1$, by using, instead of (3.6.21),
the formula [95]

\[ F_k[t, v(t,x)] = \lim_{\substack{r\to\infty \\ \Delta\to 0}} \int_t^T d\tau \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \omega_k^\Delta(v_1,\dots,v_r)\, p_{t\tau}(v_1,\dots,v_r;\, v(t,x))\, dv_1 \cdots dv_r, \tag{3.6.43} \]

where $\omega_k^\Delta(v_1,\dots,v_r) = \omega_k^\Delta(v(t,x^1),\dots,v(t,x^r))$ is a finite-dimensional analog of the functional $\omega_k[v(t,x)]$ such that

\[ \lim_{\substack{r\to\infty \\ \Delta\to 0}} \omega_k^\Delta(v_1,\dots,v_r) = \omega_k[v(t,x)]. \tag{3.6.44} \]
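The limit (3.6.44) is easy to illustrate: a finite-dimensional (midpoint-rule) analog of a simple non-integral functional converges to the functional as the partition is refined. The weight $h$ and the state $v$ below are arbitrary smooth test choices, not from the text:

```python
import math

# Finite-dimensional analogs in the sense of (3.6.44): replace
#   omega[v] = | ∫_0^1 h(x) v(x) dx |
# by omega_Delta(v_1,...,v_r) = | sum_mu h(x_mu) v(x_mu) Delta |, Delta = 1/r,
# with x_mu the midpoints, and let r -> infinity.
h = lambda x: 1.0 + x
v = lambda x: math.sin(math.pi * x)

def omega_delta(r):
    d = 1.0 / r
    s = sum(h((m + 0.5) * d) * v((m + 0.5) * d) for m in range(r)) * d
    return abs(s)

exact = 3.0 / math.pi          # ∫_0^1 (1+x) sin(pi x) dx = 2/pi + 1/pi
errs = [abs(omega_delta(r) - exact) for r in (10, 100, 1000)]
assert errs[0] > errs[1] > errs[2]      # midpoint-rule O(Delta^2) convergence
assert errs[2] < 1e-5
```

The same discretization, applied to the non-quadratic term that appears in the first approximation below, is what makes formula (3.6.43) computable.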
The following example illustrates calculations with the help of formula (3.6.43).

3.6.4. An example. If we choose some special expressions for the functional (3.6.7), the operator (3.6.2), etc., then, using formulas (3.6.21) and (3.6.43), we can obtain a finite approximate solution of the synthesis problem. As an example, we calculate the optimal control of a substance concentration in a cylinder of finite length.

Let us consider a control problem often encountered in chemical industry processes. Suppose that there is a chemical reactor in which the output product is obtained by catalytic synthesis reactions. We assume that the reacting agents diffuse into the catalysis chamber through pipelines. There may be branches in the pipeline through which reagents come to technological units, where the concentration of the entering substance varies randomly. At the same time, to obtain a high-quality output product, it is necessary to maintain the reagent concentrations close to given values. One possible way to stabilize the concentration in the catalysis chamber is to change the flow rate at the input of the corresponding pipeline.

After appropriate generalizations and idealizations, this problem can be stated as follows. Let the plant (a pipeline) be a cylinder of length $\ell$ filled with a homogeneous porous medium; the substance concentration in the cylinder can be affected by changes in the flow rate at the end of the cylinder (the rate of the incoming flow is the controlling action). Assuming that the random perturbation $\xi(t,x)$ is a stationary white noise, we obtain the following mathematical model of the plant to be controlled [95]:

\[ \frac{\partial v}{\partial t} = a^2\, \frac{\partial^2 v}{\partial x^2} + \xi(t,x), \qquad 0 < x < \ell, \quad 0 \le t \le T, \qquad a^2 = \frac{B}{C} \tag{3.6.45} \]
(here $B$ and $C$ are the diffusion and the porosity coefficients of the medium);

\[ \frac{\partial v}{\partial x}\Big|_{x=0} = u(t), \qquad \frac{\partial v}{\partial x}\Big|_{x=\ell} = 0, \tag{3.6.46} \]
\[ v(0,x) = v_0(x). \tag{3.6.47} \]
For the plant (3.6.45)–(3.6.47), we need to synthesize a regulator that minimizes the mean value of the quadratic performance criterion

\[ I = \mathsf{E}\Big[ \int_0^T\!\! \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, v(t,x)\, v(t,y)\, dt\, dx\, dy \Big] \tag{3.6.48} \]

($\theta(x,y)$ is a given positive definite kernel function), provided that the absolute value of the boundary control action (the boundary flow of the substance) $u$ is bounded, that is,

\[ |u| \le u_m. \tag{3.6.49} \]
In this example the Bellman equation (3.6.17) has the form

\[ -\frac{\partial F}{\partial t} = \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, v(t,x)\, v(t,y)\, dx\, dy + a^2 \int_0^\ell v(t,x)\, \frac{\partial^2}{\partial x^2}\Big( \frac{\delta F}{\delta v(t,x)} \Big) dx + \frac12 \int_0^\ell\!\! \int_0^\ell K(x,y)\, \frac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)}\, dx\, dy + \min_{|u| \le u_m} \Big[ -a^2 u\, \Big( \frac{\delta F}{\delta v(t,x)} \Big)_{x=0} \Big], \qquad F[T, v(T,x)] = 0. \tag{3.6.50} \]
Taking into account (3.6.45) and (3.6.46) and calculating the minimum with respect to $u$, we can rewrite (3.6.50) in the form

\[ -\frac{\partial F}{\partial t} = \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, v(t,x)\, v(t,y)\, dx\, dy + a^2 \int_0^\ell v(t,x)\, \frac{\partial^2}{\partial x^2}\Big( \frac{\delta F}{\delta v(t,x)} \Big) dx + \frac12 \int_0^\ell\!\! \int_0^\ell K(x,y)\, \frac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)}\, dx\, dy - a^2 u_m \Big| \Big( \frac{\delta F}{\delta v(t,x)} \Big)_{x=0} \Big|, \qquad F[T, v(T,x)] = 0 \tag{3.6.51} \]

(the functional derivative here satisfies the adjoint boundary conditions $\partial\big(\delta F/\delta v(t,x)\big)/\partial x = 0$ at $x = 0$ and $x = \ell$).
Simultaneously, we obtain the optimal control law

\[ u_*[t, v(t,x)] = u_m\, \mathrm{sign}\Big[ \Big( \frac{\delta F}{\delta v(t,x)} \Big)_{x=0} \Big]. \tag{3.6.52} \]

Thus, to obtain the final solution of the synthesis problem, it remains to calculate the functional derivative $[\delta F/\delta v(t,x)]_{x=0}$ in (3.6.52). We calculate it by the method of successive approximations.
The zero approximation. Suppose that $u_m$ is small. To solve (3.6.51), we first set $u_m = 0$. As a result, we obtain the following equation of the zero approximation:

\[ -\frac{\partial F_0}{\partial t} = \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, v(t,x)\, v(t,y)\, dx\, dy + a^2 \int_0^\ell v(t,x)\, \frac{\partial^2}{\partial x^2}\Big( \frac{\delta F_0}{\delta v(t,x)} \Big) dx + \frac12 \int_0^\ell\!\! \int_0^\ell K(x,y)\, \frac{\delta^2 F_0}{\delta v(t,x)\,\delta v(t,y)}\, dx\, dy, \qquad F_0[T, v(T,x)] = 0. \tag{3.6.53} \]

Elementary calculations show that its solution (3.6.21) can be written in the form

\[ F_0[t, v(t,x)] = \int_t^T d\tau \int_0^\ell\!\! \int_0^\ell \theta(x,y) \Big( \int_0^\ell\!\! \int_0^\ell G(x,\tau; \bar x, t)\, G(y,\tau; \bar y, t)\, v(t,\bar x)\, v(t,\bar y)\, d\bar x\, d\bar y \tag{3.6.54} \]
\[ \qquad\qquad + \int_t^\tau d\sigma \int_0^\ell\!\! \int_0^\ell K(\bar x,\bar y)\, G(x,\tau; \bar x, \sigma)\, G(y,\tau; \bar y, \sigma)\, d\bar x\, d\bar y \Big)\, dx\, dy. \tag{3.6.55} \]
The functional derivative of the quadratic functional (3.6.54) can readily be calculated (for example, by using formulas (3.6.13); see also [91]); it is

\[ \frac{\delta F_0}{\delta v(t,x)} = 2 \int_t^T d\tau \int_0^\ell\!\! \int_0^\ell\!\! \int_0^\ell \theta(\bar x, \bar y)\, G(\bar x,\tau; x, t)\, G(\bar y,\tau; y, t)\, v(t,y)\, d\bar x\, d\bar y\, dy. \]
Hence it follows that the optimal control law (3.6.52) has the following form in the zero approximation:

\[ u_0[t, v(t,x)] = u_m\, \mathrm{sign}\Big[ \int_t^T d\tau \int_0^\ell\!\! \int_0^\ell\!\! \int_0^\ell \theta(\bar x, \bar y)\, G(\bar x,\tau; 0, t)\, G(\bar y,\tau; y, t)\, v(t,y)\, d\bar x\, d\bar y\, dy \Big]. \tag{3.6.56} \]
The first approximation. Taking into account (3.6.56), we can write Eq. (3.6.51) in the first approximation with respect to $u_m$ as follows:

\[ -\frac{\partial F_1}{\partial t} = \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, v(t,x)\, v(t,y)\, dx\, dy + a^2 \int_0^\ell v(t,x)\, \frac{\partial^2}{\partial x^2}\Big( \frac{\delta F_1}{\delta v(t,x)} \Big) dx + \frac12 \int_0^\ell\!\! \int_0^\ell K(x,y)\, \frac{\delta^2 F_1}{\delta v(t,x)\,\delta v(t,y)}\, dx\, dy \]
\[ \qquad - 2 a^2 u_m \Big| \int_t^T d\tau \int_0^\ell\!\! \int_0^\ell\!\! \int_0^\ell \theta(\bar x, \bar y)\, G(\bar x,\tau; 0, t)\, G(\bar y,\tau; y, t)\, v(t,y)\, d\bar x\, d\bar y\, dy \Big|, \qquad F_1[T, v(T,x)] = 0. \tag{3.6.57} \]
Now formulas (3.6.21) and (3.6.22) are not sufficient for calculating $F_1(t, v(t,x))$; we need to use the more complicated calculation procedure based on (3.6.43) and (3.6.44). A finite-dimensional analog of the last (non-quadratic) term in (3.6.57) can be obtained by dividing the interval $[0,\ell]$ into subintervals of length $\Delta = \ell/r$ and replacing this term by

\[ \omega^\Delta = 2 a^2 u_m\, | h_1 v_1 + \dots + h_r v_r |, \qquad h_\mu = \Delta \int_t^T d\tau \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, G(x,\tau; 0, t)\, G(y,\tau; \mu\Delta, t)\, dx\, dy, \quad v_\mu = v(t, \mu\Delta). \]
Next, we use formulas (3.6.21), (3.6.22), and (3.6.43) as well as the formula

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} | h_1 x_1 + \dots + h_r x_r |\, \big[(2\pi)^r \det\|D\|\big]^{-1/2} \exp\Big\{ -\frac12 (D^{-1})_{ij} (x_i - m_i)(x_j - m_j) \Big\}\, dx_1 \cdots dx_r = \sqrt{\frac{2\bar D}{\pi}}\, e^{-H^2/2\bar D} + H\, \Phi\Big( \frac{H}{\sqrt{2\bar D}} \Big), \tag{3.6.58} \]

where

\[ H = \sum_{i=1}^r h_i m_i, \qquad \bar D = \sum_{i,j=1}^r h_i h_j D_{ij}, \qquad \Phi(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-u^2}\, du, \]
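Formula (3.6.58) is the classical expression for the mean modulus of a Gaussian linear form, and it can be verified directly by Monte Carlo sampling. The coefficients, means, and standard deviations below are arbitrary illustrative values (a diagonal covariance is taken so that plain `random.gauss` suffices):

```python
import math, random

# Monte Carlo check of the Gaussian mean-modulus formula (3.6.58):
# for Z = h1 x1 + ... + hr xr with (x_i) jointly normal,
#   E|Z| = sqrt(2 Dbar / pi) exp(-H^2 / (2 Dbar)) + H * Phi(H / sqrt(2 Dbar)),
# where H = E Z, Dbar = Var Z, and Phi coincides with math.erf.
random.seed(0)
h = [0.5, -1.0, 2.0]
m = [0.3, 0.1, -0.2]
s = [1.0, 0.5, 0.7]                      # independent components (diagonal D)

H = sum(hi * mi for hi, mi in zip(h, m))
Dbar = sum(hi * hi * si * si for hi, si in zip(h, s))
closed = math.sqrt(2 * Dbar / math.pi) * math.exp(-H * H / (2 * Dbar)) \
         + H * math.erf(H / math.sqrt(2 * Dbar))

trials = 100000
mc = sum(abs(sum(hi * random.gauss(mi, si) for hi, mi, si in zip(h, m, s)))
         for _ in range(trials)) / trials
assert abs(mc - closed) < 0.02
```

With the fixed seed, the sample mean agrees with the closed form well within the Monte Carlo standard error.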
As a result, for $F_1[t, v(t,x)]$ we obtain the expression

\[ F_1[t, v(t,x)] = F_0[t, v(t,x)] - 2 a^2 u_m \int_t^T d\tau \Big[ \sqrt{\frac{2\bar D}{\pi}}\, e^{-H^2/2\bar D} + H\, \Phi\Big( \frac{H}{\sqrt{2\bar D}} \Big) \Big], \tag{3.6.59} \]

where $F_0[t, v(t,x)]$ is given by (3.6.54), (3.6.55), and moreover,

\[ H = H[t, \tau, v(t,x)] = \int_\tau^T d\sigma \int_0^\ell\!\! \int_0^\ell\!\! \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, G(x,\sigma; 0, \tau)\, G(y,\sigma; \bar x, \tau)\, G(\bar x,\tau; \bar y, t)\, v(t,\bar y)\, dx\, dy\, d\bar x\, d\bar y, \tag{3.6.60} \]

\[ \bar D = \bar D[t,\tau] = \int_\tau^T d\sigma \int_\tau^T d\sigma' \int_0^\ell \cdots \int_0^\ell K(x,y)\, \theta(x',y')\, \theta(x'',y'')\, G(\bar x,\tau; x,\sigma)\, G(\bar y,\tau; y,\sigma)\, G(x',\sigma; 0,\tau)\, G(y',\sigma; \bar x,\tau)\, G(x'',\sigma'; 0,\tau)\, G(y'',\sigma'; \bar y,\tau)\, dx\, dy\, dx'\, dy'\, dx''\, dy''\, d\bar x\, d\bar y. \]
After the functional derivative $\big(\delta F_1/\delta v(t,x)\big)_{x=0}$ is calculated, relations (3.6.52) and (3.6.59) yield the controlling functional

\[ u_1[t, v(t,x)] = u_m\, \mathrm{sign}\Big\{ \int_t^T d\tau \int_0^\ell\!\! \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, G(x,\tau; 0, t)\, G(y,\tau; \bar y, t)\, v(t,\bar y)\, d\bar y\, dx\, dy \]
\[ \qquad - a^2 u_m \int_t^T d\tau\, \Phi\Big( \frac{H}{\sqrt{2\bar D}} \Big) \int_\tau^T d\sigma \int_0^\ell\!\! \int_0^\ell\!\! \int_0^\ell \theta(x,y)\, G(x,\sigma; 0,\tau)\, G(y,\sigma; \bar x,\tau)\, G(\bar x,\tau; 0, t)\, dx\, dy\, d\bar x \Big\}. \tag{3.6.61} \]

Formula (3.6.61) enables us to synthesize the quasioptimal control system in the first approximation.
Although the quasioptimal control algorithms (3.6.56) and (3.6.61) look somewhat cumbersome (especially formula (3.6.61)), they admit a transparent technical realization. For example, let us consider the zero-approximation algorithm (3.6.56), which can be written as

\[ u_0[t, v(t,x)] = u_m\, \mathrm{sign}\Big[ \int_0^\ell Q(y,t)\, v(t,y)\, dy \Big], \tag{3.6.62} \]

where

\[ Q(y,t) = \int_t^T d\tau \int_0^\ell\!\! \int_0^\ell \theta(x,\bar y)\, G(x,\tau; 0, t)\, G(\bar y,\tau; y, t)\, dx\, d\bar y \]

is a known function that can be calculated in advance. The current value of the state
function $v(t,x)$ can be determined by a system of data units that measure the concentrations $v(t,x_1), v(t,x_2), \dots, v(t,x_p)$ at points $x_1, x_2, \dots, x_p$ lying along the cylinder. In particular, if the concentration gauges are placed uniformly along the cylinder, then the integral in (3.6.62) can be replaced by the sum
\[ u_0[t, v(t,x)] = u_m\, \mathrm{sign}\Big[ \sum_{i=1}^{p} Q_i(t)\, v_i \Big], \qquad Q_i(t) = \Delta\, Q(x_i, t), \quad \Delta = \frac{\ell}{p}, \quad x_i = i\Delta, \quad v_i = v(t, x_i). \tag{3.6.63} \]

As a result, we obtain an algorithm whose realization does not present any difficulties.
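The relay scheme (3.6.63) is easy to prototype. In the sketch below the kernel is taken in the degenerate form $\theta(x,\bar x) = (1+x)(1+\bar x)$ so that $Q(y,t)$ factors into two one-dimensional integrals, and the Green function is the Neumann heat kernel on $(0,\ell)$ truncated to a few cosine modes; all numerical values ($\ell$, $a$, $u_m$, $T$, $p$, the gauge readings) are illustrative assumptions, not data from the text:

```python
import math

# Relay realization (3.6.63) of the zero-approximation law (3.6.62):
# p gauges, amplification factors Q_i(t) = Delta * Q(x_i, t), adder, sign relay.
l, a, um, T, modes = 1.0, 0.5, 0.1, 1.0, 20

def green(x, tau, xi, t):
    """Neumann heat kernel G(x, tau; xi, t), tau > t, truncated cosine series."""
    return 1.0 / l + (2.0 / l) * sum(
        math.exp(-(a * n * math.pi / l) ** 2 * (tau - t))
        * math.cos(n * math.pi * x / l) * math.cos(n * math.pi * xi / l)
        for n in range(1, modes + 1))

def Q(y, t, nt=20, nx=40):
    """Q(y,t) for theta(x,xbar) = (1+x)(1+xbar): the double space integral
    factorizes into (∫(1+x)G(x,tau;0,t)dx) * (∫(1+x)G(x,tau;y,t)dx)."""
    ht, hx = (T - t) / nt, l / nx
    total = 0.0
    for it in range(nt):
        tau = t + (it + 0.5) * ht
        g0 = sum((1 + (ix + 0.5) * hx) * green((ix + 0.5) * hx, tau, 0.0, t)
                 for ix in range(nx)) * hx
        gy = sum((1 + (ix + 0.5) * hx) * green((ix + 0.5) * hx, tau, y, t)
                 for ix in range(nx)) * hx
        total += g0 * gy * ht
    return total

def u0(v_gauges, t):
    """Relay law (3.6.63): u0 = um * sign( sum_i Q_i(t) v_i )."""
    p = len(v_gauges)
    d = l / p
    s = sum(d * Q(i * d, t) * v_gauges[i - 1] for i in range(1, p + 1))
    return um if s > 0 else -um

readings = [0.2, 0.1, 0.0, -0.05, -0.1]   # hypothetical gauge values v(t, x_i)
u = u0(readings, 0.0)
assert abs(u) == um and Q(0.5, 0.0) > 0
```

The control loop thus reduces to $p$ multiplications by the precomputed gains $Q_i(t)$, one summation, and one sign test, exactly the amplifier-adder-relay structure described in the text.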
Fig. 29
Indeed, it follows from (3.6.63) that, besides the system of data units, the control circuit (the feedback circuit) contains a system of linear amplifiers with amplification factors $Q_i(t)$, an adder, and a relay-type switching device that connects the pipeline $[0,\ell]$ either to reservoir 1 (for pumping in additional substance) or to reservoir 2 (for substance suction at the pipeline input). Figure 29 shows the block diagram of the system realizing the control algorithm (3.6.63). The quasioptimal first-approximation algorithm (3.6.61) can be realized in a similar way; in this case the control circuit, along with a nonlinear unit of the ideal relay type, also contains nonlinear transformers that realize the probability error function $\Phi(z)$. It should be noted that an error is inevitably present in the finite-dimensional approximation of the state function $v(t,x)$ (when the algorithm (3.6.56) is replaced by (3.6.63)), since it is impossible to measure the system state $v(t,x)$ precisely (this state is a point in the infinite-dimensional Hilbert space $L_2$). However, if the points $x_1, \dots, x_p$ where the concentration data units are located lie sufficiently close to each other, then this error can be neglected.
CHAPTER IV
SYNTHESIS OF QUASIOPTIMAL SYSTEMS
IN THE CASE OF SMALL DIFFUSION TERMS IN THE BELLMAN EQUATION
If the random actions $\xi(t)$ on the plant in the closed-loop control system shown in Fig. 3 are of small intensity and the observation errors $\eta(t)$ and $\zeta(t)$ are large, then the Bellman equation contains a small parameter: the coefficients of the second-order derivatives of the loss function with respect to the phase variables are small. Indeed, considering the synthesis problem for which we derived the Bellman equation in the form (1.4.26) in §1.4, we assume that the matrices of the diffusion coefficients are proportional to a small parameter $\varepsilon$; then Eq. (1.4.26) takes the form

\[ F_t + [A^x(t,y)]^{\mathsf T} F_y + \frac{\varepsilon}{2}\big[ \mathrm{Sp}\, B_0^x(t,x) F_{xx} + \mathrm{Sp}\, B_0^y(t,y) F_{yy} \big] + \Phi(t, x, y, F_x) = 0. \tag{4.0.1} \]

A similar situation arises in the synthesis problem with noisy observations in which the matrix $Q(t)$ in (1.5.46) has the form $Q(t) = \varepsilon^{-1/2} Q_0(t)$. In this case, the Bellman equation (1.5.54) for the problem considered can be written in the form

\[ F_t + \frac{\varepsilon}{2}\, \mathrm{Sp}\, DRD F_{mm} + \mathrm{Sp}\big[ F_D \big( \sigma\sigma^{\mathsf T} - \varepsilon DRD \big) \big] + \Phi_1(m, D, F_m, F_D) = 0, \tag{4.0.2} \]

where

\[ \Phi_1(m, D, F_m, F_D) = \min_u \Big[ \big( m^{\mathsf T} G^{\mathsf T}(t,u) + b^{\mathsf T}(t,u) \big) F_m + \mathrm{Sp}\, F_D \big( D G^{\mathsf T}(t,u) + G(t,u) D \big) + c(m, D, u) \Big]. \]
220
Chapter IV
If the value of the parameter $\varepsilon$ is small, then the solutions of the above equations are expected to be close to the solutions of the equations

\[ F_t^0 + [A^x(t,y)]^{\mathsf T} F_y^0 + \Phi(t, x, y, F_x^0) = 0, \tag{4.0.3} \]
\[ F_t^0 + \mathrm{Sp}\, F_D^0\, \sigma\sigma^{\mathsf T} + \Phi_1(m, D, F_m^0, F_D^0) = 0, \tag{4.0.4} \]

obtained from (4.0.1), (4.0.2) by setting $\varepsilon = 0$. The equations for $F^0$ are, generally speaking, simpler than the original Bellman equations, since they do not contain second-order derivatives and thus are partial differential equations of the first order. If these simpler equations can be solved exactly, then we can construct solutions of the original Bellman equations as series in powers of the small parameter $\varepsilon$, that is, as $F = F^0 + \varepsilon F^1 + \varepsilon^2 F^2 + \dots$. Here the function $F^0$ plays the role of the leading term (generating solution) of the expansion. Taking finitely many terms

\[ \widetilde F_k = F^0 + \varepsilon F^1 + \dots + \varepsilon^k F^k \tag{4.0.5} \]

of the asymptotic series and considering $\widetilde F_k$ as an approximate solution of the Bellman equation (the $k$th approximation), we can readily solve the synthesis problem corresponding to this approximation. To this end, it suffices to make the change $F \to \widetilde F_k$ in the expression for the optimal control algorithm $u_* = \varphi_0(t, x, y, \partial F/\partial x)$ (see, for instance, (1.4.25)). In this way, we obtain the quasioptimal algorithm of the $k$th approximation:

\[ u_k(t, x, y) = \varphi_0\big( t, x, y, \partial \widetilde F_k/\partial x \big). \]

The equations for the successive terms $F^1, F^2, \dots$ in the expansion (4.0.5) can be obtained in the standard way by substituting the expansion (4.0.5) into Eqs. (4.0.1) or (4.0.2) and setting the coefficients of different powers $\varepsilon^k$ ($k \ge 1$) of the small parameter equal to zero. In other cases, it may be convenient to use a somewhat different scheme of calculations in which the successive approximations $F_k$ ($k \ge 1$) are obtained as solutions of the sequence of equations

\[ F_{k,t} + [A^x(t,y)]^{\mathsf T} F_{k,y} + \Phi(t, x, y, F_{k,x}) = -\frac{\varepsilon}{2}\big[ \mathrm{Sp}\, B_0^x F_{k-1,xx} + \mathrm{Sp}\, B_0^y F_{k-1,yy} \big], \qquad k \ge 1, \tag{4.0.6} \]

or

\[ F_{k,t} + \mathrm{Sp}\, F_{k,D}\, \sigma\sigma^{\mathsf T} + \Phi_1(m, D, F_{k,m}, F_{k,D}) = \varepsilon\Big[ \mathrm{Sp}\, F_{k-1,D}\, DRD - \frac12\, \mathrm{Sp}\, DRD F_{k-1,mm} \Big], \qquad k \ge 1. \tag{4.0.7} \]
This approximate synthesis procedure was studied in detail and exploited for solving some special problems in [34, 56, 58, 172, 175]. The accuracy of the approximate synthesis was investigated in [34, 56]. It was shown that, under certain conditions, the use of the quasioptimal control $u_k$ of the $k$th approximation gives an error of order $\varepsilon^{k+1}$ in the value of the minimized functional. In other words, if instead of the optimal control algorithm $u_*$ we use the quasioptimal algorithm $u_k$, then the difference between the value of the optimality criterion $I[u_k]$ corresponding to this control and the minimum possible (optimal) value $I[u_*] = F$ is of order $\varepsilon^{k+1}$, that is,

\[ I[u_k] - I[u_*] = I[u_k] - F \le c\,\varepsilon^{k+1}, \tag{4.0.8} \]
where $c$ is a constant. In the present chapter the main attention is paid to the "algorithmic" aspects of the method, that is, to calculational procedures for obtaining the quasioptimal controls $u_k$. As an example, we consider two specific problems of optimal servomechanism synthesis. First (in §4.1), we consider a synthesis problem that generalizes the problem of §2.2 to the case in which the input process $y(t)$ is a diffusion Markov process inhomogeneous in the phase variable $y$. Next (in §4.2), we write an approximate solution of the synthesis problem for an optimal system tracking a discrete Markov process of the "telegraph signal" type when the command input is observed against the background of white noise.

§4.1. Approximate synthesis of a servomechanism with small-intensity noise
Let us consider the servomechanism shown in Fig. 10. Assume that the plant P is described by the scalar equation

\[ \dot x = u + \sqrt{\varepsilon N}\, \xi(t), \tag{4.1.1} \]

where $\xi(t)$ is the standard white noise of unit intensity (1.1.31), $\varepsilon$ and $N$ are given positive constants ($\varepsilon$ is a small parameter), and the values of admissible controls $u$ lie in the region¹

\[ -a - u_m \le u \le u_m - a, \tag{4.1.2} \]

¹The nonsymmetric constraints (4.1.2) are, first, more general (see [21]) and, second, they allow a more convenient comparison between the results obtained later and the corresponding formulas constructed in §2.2.
where $u_m > a > 0$. The command input $y(t)$ is a $\xi(t)$-independent scalar Markov diffusion process with drift and diffusion coefficients

\[ A^y = -\beta y, \qquad B^y = \varepsilon B, \tag{4.1.3} \]

where $\beta$ and $B > 0$ are given numbers and $\varepsilon$ is the same small parameter as in (4.1.1). The performance of the tracking system will be estimated by the value of the integral optimality criterion

\[ I = \mathsf{E}\Big[ \int_0^T c\big( y(t) - x(t) \big)\, dt \Big], \tag{4.1.4} \]

where the penalty function $c(y(t) - x(t)) = c(z(t)) \ge 0$, $c(0) = 0$, is a given concave function of the error signal $z(t) = y(t) - x(t)$. The problem stated above is a generalization of the problem studied in Section 2.2.1 of §2.2 to the case in which the plant is subject to uncontrolled random perturbations and the input Markov process $y(t)$ is inhomogeneous in the phase variable $y$ (the drift coefficient $A^y = A^y(y) = -\beta y \ne \mathrm{const}$). The inhomogeneity of the input process $y(t)$ makes the synthesis problem more complicated, since in this case the Bellman equation cannot be reduced to a one-dimensional equation (as in Section 2.2.1 of §2.2).
Since problem (4.1.1)–(4.1.4) is a special case of problem (1.4.2)–(1.4.4), it follows from (1.4.21), (1.4.22), and (4.1.1)–(4.1.4) that the Bellman equation has the form

\[ -\beta y F_y + \min_{-a-u_m \le u \le u_m-a} [\,u F_x\,] + \frac{\varepsilon}{2}\big( N F_{xx} + B F_{yy} \big) + c(y - x) = -F_t, \qquad 0 \le t \le T, \quad F(T, x, y) = 0. \tag{4.1.5} \]
If, as in Section 2.2.1 of §2.2, we introduce the new phase variable $z = y - x$ and replace the loss function $F(t,x,y)$ by $F(t,y,z)$, then Eq. (4.1.5) can readily be written as

\[ -\beta y ( F_y + F_z ) + \min_{-a-u_m \le u \le u_m-a} [\,-u F_z\,] + \frac{\varepsilon}{2}\big[ B ( F_{yy} + 2 F_{yz} + F_{zz} ) + N F_{zz} \big] + c(z) = -F_t, \qquad F(T, y, z) = 0. \tag{4.1.6} \]
We are interested in stationary tracking as the terminal time $T \to \infty$. If the stationary loss function $f(y,z)$ is introduced in the standard way (see (1.4.29) and (2.2.9)),

\[ f(y,z) = \lim_{T\to\infty} \big[ F(t, y, z) - \gamma (T - t) \big], \tag{4.1.7} \]
then (4.1.6) implies the following stationary Bellman equation for the problem considered:

\[ -\beta y ( f_y + f_z ) + \min_{-a-u_m \le u \le u_m-a} [\,-u f_z\,] + \frac{\varepsilon}{2}\big[ B ( f_{yy} + 2 f_{yz} + f_{zz} ) + N f_{zz} \big] + c(z) = \gamma. \tag{4.1.8} \]
As usual, the number $\gamma > 0$ in (4.1.8) characterizes the mean losses per unit time under stationary operating conditions. This number is unknown in advance and is obtained together with the solution of Eq. (4.1.8).

Let us discuss the possibility of solving Eq. (4.1.8). By $R_+$ we denote the domain of the phase plane $(y,z)$ where $f_z > 0$ and by $R_-$ the domain where $f_z < 0$. It follows from (4.1.8) that the optimal control $u_*(y,z)$ must be equal to $u_* = u_m - a$ in $R_+$ and to $u_* = -u_m - a$ in $R_-$. Denoting by $f_+(y,z)$ and $f_-(y,z)$ the values of the loss function $f(y,z)$ in the domains $R_+$ and $R_-$, we obtain the following two equations from
(4.1.8):

\[ -\beta y \Big( \frac{\partial f_\pm}{\partial y} + \frac{\partial f_\pm}{\partial z} \Big) - (\pm u_m - a)\, \frac{\partial f_\pm}{\partial z} + \frac{\varepsilon}{2}\Big[ B \Big( \frac{\partial^2 f_\pm}{\partial y^2} + 2 \frac{\partial^2 f_\pm}{\partial y\,\partial z} + \frac{\partial^2 f_\pm}{\partial z^2} \Big) + N \frac{\partial^2 f_\pm}{\partial z^2} \Big] + c(z) = \gamma \quad \text{in } R_\pm. \tag{4.1.9} \]
Since in (4.1.8) the first derivatives $f_y$ and $f_z$ are continuous on the interface $\Gamma$ between $R_+$ and $R_-$ [172], both equations in (4.1.9) hold on $\Gamma$, and we have the condition

\[ \frac{\partial f_+}{\partial z}\Big|_{\Gamma} = \frac{\partial f_-}{\partial z}\Big|_{\Gamma} = 0. \tag{4.1.10} \]
Since the control action u* is of opposite sign on each side of the inter-
face F, the line F is naturally called a switching line. It follows from the preceding that the problem of the optimal system synthesis is equivalent to the problem of finding the equation for the switching line F.
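Before turning to the expansion in $\varepsilon$, it is instructive to exercise the model (4.1.1)–(4.1.4) numerically. The sketch below simulates the closed loop by the Euler–Maruyama method with the naive switching line $z = 0$ (i.e., $u = u_m - a$ for $z > 0$ and $u = -u_m - a$ for $z < 0$); this line is only an illustrative guess, since finding the actually optimal line $\Gamma$ is the subject of what follows, and all parameter values are assumptions:

```python
import math, random

# Euler-Maruyama simulation of the servo (4.1.1), (4.1.3) under the
# bang-bang law with switching line z = 0 (illustrative, not optimal).
random.seed(1)
beta, B, N, eps, a, um = 1.0, 1.0, 1.0, 0.01, 0.2, 1.0
dt, steps = 1e-3, 100000

x, y = 0.0, 0.5
acc, burn = 0.0, steps // 2
for k in range(steps):
    z = y - x
    u = um - a if z > 0 else -um - a      # admissible range [-a-um, um-a]
    x += u * dt + math.sqrt(eps * N * dt) * random.gauss(0.0, 1.0)
    y += -beta * y * dt + math.sqrt(eps * B * dt) * random.gauss(0.0, 1.0)
    if k >= burn:
        acc += (y - x) ** 2
mse = acc / (steps - burn)                # stationary mean-square error estimate
assert 0.0 < mse < 0.05
```

Averages of $c(z(t))$ accumulated this way give an empirical value of $\gamma$ for any candidate switching line, against which the approximations $\Gamma^k$ constructed below can be compared.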
Equations (4.1.9) cannot be solved exactly. The fact that the expressions with second-order derivatives contain a small parameter $\varepsilon$ allows us to solve these equations by the method of successive approximations. In the zero approximation, instead of (4.1.9), we need to solve the system of equations

\[ -\beta y \Big( \frac{\partial f_\pm^0}{\partial y} + \frac{\partial f_\pm^0}{\partial z} \Big) - (\pm u_m - a)\, \frac{\partial f_\pm^0}{\partial z} + c(z) = \gamma^0. \tag{4.1.11} \]

By $f_\pm^0$, $\gamma^0$, and $\Gamma^0$ we denote the loss function, the stationary error, and the switching line obtained from Eq. (4.1.11) for the zero approximation. The successive approximations $f_\pm^k$, $\gamma^k$, and $\Gamma^k$ ($k \ge 1$) are calculated
recurrently by solving a sequence of equations of the form

\[ -\beta y \Big( \frac{\partial f_\pm^k}{\partial y} + \frac{\partial f_\pm^k}{\partial z} \Big) - (\pm u_m - a)\, \frac{\partial f_\pm^k}{\partial z} = \gamma^k - c_k(y,z), \qquad k \ge 1, \tag{4.1.12} \]

where

\[ c_k(y,z) = c_{k\pm}(y,z) = c(z) + \frac{\varepsilon}{2}\Big[ B \Big( \frac{\partial^2 f_\pm^{k-1}}{\partial y^2} + 2 \frac{\partial^2 f_\pm^{k-1}}{\partial y\,\partial z} + \frac{\partial^2 f_\pm^{k-1}}{\partial z^2} \Big) + N \frac{\partial^2 f_\pm^{k-1}}{\partial z^2} \Big]. \tag{4.1.13} \]
A method for solving Eqs. (4.1.11), (4.1.12) was proposed in [172]. Let us briefly describe the procedure for calculating the successive approximations f^k, γ^k, and Γ^k, k = 0, 1, 2, .... First of all, note that Eqs. (4.1.11), (4.1.12) are the Bellman equations for deterministic problems of synthesis of second-order control systems in which the equations of motion have the form

dy/dt = −βy,  dz/dt = a ∓ u_m − βy   (4.1.14)

(in the second equation the signs "minus" and "plus" of u_m correspond to the domains R_+^k and R_−^k, respectively). As was shown in [172], the gradient ∇f^k of the solution of the nondiffusion equations (4.1.11), (4.1.12) remains continuous when we cross the interface Γ^k, that is, on Γ^k we have the conditions
∂f_+^k/∂y = ∂f_−^k/∂y,  ∂f_+^k/∂z = ∂f_−^k/∂z,  k = 0, 1, 2, ...,   (4.1.15)
if the phase trajectories of the deterministic system (4.1.14) either approach the line Γ^k on both sides (a switching line of the first kind) or approach Γ^k on one side and recede from it on the other side (a switching line of the second kind, see Fig. 4). This fact allows us to calculate the gradient ∇f^k along Γ^k. Indeed, in the domain R_+^k we have
−βy ∂f_+^k/∂y + (a − u_m − βy) ∂f_+^k/∂z = γ^k − c_+^k(y, z),   (4.1.16)

and in the domain R_−^k,

−βy ∂f_−^k/∂y + (a + u_m − βy) ∂f_−^k/∂z = γ^k − c_−^k(y, z).   (4.1.17)
It follows from the preceding continuity considerations that both equations (4.1.16) and (4.1.17) must be satisfied on Γ^k simultaneously. Solving these equations for the first-order derivatives, we find the gradient of the loss function on the interface Γ^k between R_+^k and R_−^k:

∂f^k/∂y|_{Γ^k} = A_y^k(y, z),  ∂f^k/∂z|_{Γ^k} = A_z^k(y, z).   (4.1.18)

This allows us to write the difference between the values of the loss function at different points on the boundary Γ^k as a contour integral along the boundary,
f^k(Q) − f^k(P) = ∫_P^Q A_y^k dy + A_z^k dz.   (4.1.19)
If the part of Γ^k between the points P and Q is a boundary of the first kind (that is, the representative point of system (4.1.14), once it has come to the boundary, moves in the "sliding regime" along the boundary [172]), then formula (4.1.19) makes it possible to obtain a necessary condition for the boundary Γ^k to be optimal. The corresponding equation for the desired switching line z = z^k(y) is obtained from the condition that the difference (4.1.19) must be minimal. This equation can be written in the form [172]

∂A_y^k/∂z = ∂A_z^k/∂y.   (4.1.20)
Equation (4.1.20) is a consequence of the following illustrative argument. Let y_Q and y_P be the coordinates of the points Q and P on the y-axis. We divide the interval [y_Q, y_P] into N equal intervals of length Δ = |y_P − y_Q|/N and replace the contour integral (4.1.19) by the corresponding integral sum

Φ_Δ(z_1, ..., z_N) = Σ_{i=1}^{N} [A_y^k(y_i, z_i)Δ + A_z^k(y_i, z_i)(z_{i+1} − z_i)],   (4.1.21)

where y_i = y_P + (i − 1)Δ and z_i = z(y_i). We need to choose the z_i so as to minimize the function Φ_Δ(z_1, ..., z_N). The necessary extremum condition ∂Φ_Δ/∂z_i = 0 allows us to write the following system of equations for the optimal z_i:

Δ ∂A_y^k/∂z (y_i, z_i) + ∂A_z^k/∂z (y_i, z_i)(z_{i+1} − z_i) − A_z^k(y_i, z_i) + A_z^k(y_{i−1}, z_{i−1}) = 0.   (4.1.22)
If A_y^k(y, z), A_z^k(y, z), and z^k(y) are sufficiently smooth functions of their arguments, then we have

A_z^k(y_{i−1}, z_{i−1}) = A_z^k(y_i, z_i) − ∂A_z^k/∂y (y_i, z_i)(y_i − y_{i−1}) − ∂A_z^k/∂z (y_i, z_i)(z_i − z_{i−1}) + o(Δ)   (4.1.23)

for small Δ = y_i − y_{i−1}. Substituting (4.1.23) into (4.1.22), taking into account the relation z_{i+1} − 2z_i + z_{i−1} = o(Δ), and passing to the limit as Δ → 0, we obtain the condition

∂A_y^k/∂z (y_i, z_i) = ∂A_z^k/∂y (y_i, z_i),   (4.1.24)
which coincides with (4.1.20), since i is arbitrary. If we know the gradient of the loss function along the switching line Γ^k and the equation z = z^k(y) for Γ^k, then we can find a condition for the parameter γ^k, the kth approximation of the stationary tracking error γ in the original diffusion equation (4.1.8). By using (4.1.18) and the equation z = z^k(y), we obtain the following expression for the total derivative df^k/dy along Γ^k:

df^k/dy = A_y^k + A_z^k dz^k/dy ≡ w^k(y, γ^k).   (4.1.25)
The unknown parameter γ^k can be found from the condition that the derivative (4.1.25) is finite at a stable point; in the problem considered the point y = 0 is stable. More precisely, this condition can be written as

lim_{y→0} w^k(y, γ^k) = 0.   (4.1.26)
The expression

w^k(y, γ^k) (dy/dt) dt = (df^k/dy)(dy/dt) dt

is the increment of the loss function f^k on the time interval dt. Hence (4.1.26) means that this increment vanishes after the controlled deterministic system (4.1.14) arrives at the stable state y = 0. Obviously, in this case, it follows from the above properties of the penalty function c(z) that we also have z = 0. Thus, relation (4.1.26) is a necessary condition for the deterministic Bellman equations (4.1.11), (4.1.12) to have stationary solutions. Let us use the calculation procedure described above to solve the equations of successive approximations (4.1.11), (4.1.12). We restrict our calculations to a small number of successive approximations that determine the most important terms of the corresponding asymptotic expansions and primarily affect the structure of the controller C when a quasioptimal control system is designed.
The zero approximation. To calculate the zero approximation, we need to solve the system of equations (see (4.1.11))

−βy ∂f_±^0/∂y + (a ∓ u_m − βy) ∂f_±^0/∂z + c(z) = γ^0.   (4.1.27)
Using (4.1.15) and solving system (4.1.27) for the derivatives ∂f^0/∂y = ∂f_+^0/∂y = ∂f_−^0/∂y and ∂f^0/∂z = ∂f_+^0/∂z = ∂f_−^0/∂z, we obtain the following expressions for the components of the gradient ∇f^0 (4.1.18) on the switching line Γ^0:

∂f^0/∂z = A_z^0(y, z) ≡ 0,  ∂f^0/∂y = A_y^0(y, z) = (c(z) − γ^0)/(βy).   (4.1.28)
Equation (4.1.20), which is a necessary condition for a switching line of the first kind, together with (4.1.28) allows us to obtain the equation for Γ^0:

dc(z)/dz = 0.   (4.1.29)
Since, by assumption, the penalty function c(z) attains its unique minimum at z = 0, the condition (4.1.29) implies the equation

z = 0,   (4.1.30)

that is, in the zero approximation, the switching line coincides with the y-axis on the plane (y, z). Now let us verify whether (4.1.30) is a switching line of the first kind. An examination of the phase trajectories of system (4.1.14) shows that on the segment

l_− = −(u_m − a)/β ≤ y ≤ (u_m + a)/β = l_+

the phase trajectories approach the y-axis on both sides;² therefore, this segment is an actual switching line. For y ∉ [l_−, l_+], the equation for the switching line Γ^0 will be obtained in the sequel. Now let us calculate the stationary tracking error γ^0. From (4.1.25), (4.1.26), and (4.1.28), we have

w^0(y, γ^0) = −γ^0/(βy),  γ^0 = 0.   (4.1.31)
² Obviously, in this case, the domain R_+^0 (R_−^0) is the upper (lower) half-plane of the phase plane (y, z). Therefore, to construct the phase trajectories, in the second equation in (4.1.14), we must take u_m with the sign "minus" for z > 0 and with "plus" for z < 0.
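The approach of the phase trajectories of the deterministic system (4.1.14) to the y-axis can be checked with a short sketch. The parameter values are assumed for illustration only; the sign rule for u_m follows the footnote (minus for z > 0, plus for z < 0).

```python
# Integrate dy/dt = -beta*y, dz/dt = a -/+ u_m - beta*y (system (4.1.14)),
# taking -u_m for z > 0 and +u_m for z < 0, and check that the
# trajectory reaches the y-axis (z = 0) in finite time.
beta, a, u_m = 1.0, 0.2, 1.0   # illustrative parameters (assumed)
dt = 1e-3

def trajectory(y0, z0, t_max=20.0):
    y, z, t = y0, z0, 0.0
    while t < t_max and abs(z) > 1e-3:
        u = u_m if z > 0 else -u_m
        y += -beta * y * dt
        z += (a - u - beta * y) * dt
        t += dt
    return y, z, t

# Start on either side of the y-axis, inside the segment l_- < y < l_+.
yA, zA, tA = trajectory(0.3, 0.5)
yB, zB, tB = trajectory(-0.3, -0.5)
```

Both trajectories hit z ≈ 0 quickly (the segment of the y-axis acts as a switching line of the first kind), after which a sliding motion along the y-axis toward the origin would begin.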
It also follows from (4.1.28) and (4.1.31) (with regard to c(0) = 0) that the loss function is constant on the y-axis for l_− < y < l_+; thus we can set f^0(y, 0) = 0 for y ∈ [l_−, l_+]. To calculate the loss function f^0 at an arbitrary point (y, z), we need to integrate Eqs. (4.1.27). To this end, let us first write the system of equations for the integral curves (characteristics):

dy/(βy) = dz/(βy − a ± u_m) = df_±^0/c(z).   (4.1.32)
If y_0 denotes the point at which a given integral curve intersects the y-axis z = 0, then (4.1.32) implies the following equation for the characteristics (the phase trajectories):

z = y − y_0 − ((a ∓ u_m)/β) ln(y/y_0),   (4.1.33)

as well as for the zero approximation of the loss function

f_±^0(y, z) = ∫_0^z c(z') dz' / (βy(z') − a ± u_m),   (4.1.34)

where y(z') is determined by the characteristic (4.1.33) passing through the point (y, z).
In (4.1.34) we have y_0 = φ_±^{−1}[φ_±(y) + z], where φ_±^{−1} is the inverse of the function φ_±(y) = ((a ∓ u_m)/β) ln y − y. In this
case, as already noted, the gradient (4.1.15) remains continuous on Γ^0; therefore, the derivatives of the loss function along Γ^0 are determined as previously by (4.1.28). However, in general, formula (4.1.20), from which Eq. (4.1.30) was derived, may no longer be valid. In this case, the equation for Γ^0 can be obtained by differentiating (4.1.34), say, with respect to z and setting the resulting expression equal to zero, in view of (4.1.28). This implies the following equation for the switching line Γ^0:

c(z)/(βy − a ± u_m) + β ∫_0^z c(z') y(z') dz'/(βy(z') − a ± u_m)³ = 0.   (4.1.35)

Here we took into account the equality c(0) = 0 and assumed that the condition (∂φ_±/∂y_0)(∂y_0/∂z) ≠ 0 is satisfied on the line Γ^0 determined by (4.1.35).
An analysis of the phase trajectories (4.1.14) shows that, to find Γ^0 for y > l_+, we must use the function φ_−(y) in Eq. (4.1.35) (correspondingly, the function φ_+(y) for y < l_−) and the quadratic penalty function c(z) = z². In this case, the integral in (4.1.35) can readily be calculated, and Eq. (4.1.35) acquires the form
((a + u_m)/(2β)) ln²(y/y_0) + y_0 ln(y/y_0) = y − y_0   (4.1.36)

(in (4.1.36) we have y_0 = φ_−^{−1}[φ_−(y) + z]), which determines the switching line Γ^0 implicitly. Near the point y = l_+ = (a + u_m)/β at which the switching line changes its type, Eq. (4.1.36) allows us to obtain an approximate formula and thus write the equation for Γ^0 explicitly.
Figure 30 shows the position of the switching line Γ^0 and the phase trajectories in the zero approximation.

FIG. 30
Higher-order approximations. Everywhere in the sequel we assume that the penalty function is c(z) = z². Let us consider Eqs. (4.1.12) corresponding to the first approximation:

−βy ∂f_±^1/∂y + (a ∓ u_m − βy) ∂f_±^1/∂z = γ^1 − c_±^1(y, z),   (4.1.37)

where

c^1(y, z) = c_±^1(y, z) = z² + (ε/2)[B(∂²f_±^0/∂y² + 2 ∂²f_±^0/∂y∂z + ∂²f_±^0/∂z²) + N ∂²f_±^0/∂z²].   (4.1.38)
To simplify the further calculations, we note that, in the case of the stationary tracking mode and of the small diffusion coefficients considered here, the probability that the phase variables y and z fluctuate near the origin of the phase plane (y, z) is very large. The values y = (a ∓ u_m)/β at which the switching line Γ^0 changes its type are attained very seldom (under stationary operating conditions); therefore, we are mainly interested in finding the exact position of the switching line in the region −(u_m − a)/β < y < (u_m + a)/β, where, in the zero approximation, the position of the switching line is given by the equation z = 0. Next, note that the first-approximation equation (4.1.37) differs from the corresponding zero-approximation equation (4.1.27) only by a small (of the order of ε) term in the expression for c^1(y, z) (see (4.1.38)). Therefore, the continuity conditions imply that the switching line Γ^1 in the first approximation determined by (4.1.37) is sufficiently close to the previous position z = 0. Thus, we can calculate Γ^1 by using, instead of exact formulas, approximate expressions corresponding to small values of z.
Now, taking into account the preceding arguments, let us calculate the function c^1(y, z) = c_±^1(y, z) determined by (4.1.38). To this end, we differentiate expression (4.1.34) and restrict ourselves to the first- and second-order terms in z. As a result, we obtain³

∂²f_±^0/∂z² = 2z/(βy − a ± u_m) + O(z²),
∂²f_±^0/∂z∂y = −βz²/(βy − a ± u_m)² + O(z³),   (4.1.39)
∂²f_±^0/∂y² = O(z³).

³The functions f_+^1(y, z) and f_−^1(y, z), as the solutions of Eqs. (4.1.37), are defined in R_+^1 and R_−^1. At the same time, the functions f_+^0(y, z) and f_−^0(y, z) are defined in R_+^0 and R_−^0. However, since the switching lines Γ^0 (between R_+^0 and R_−^0) and Γ^1 (between R_+^1 and R_−^1) are close to each other, to calculate (4.1.39), we have used expressions (4.1.34) for f_±^0 in R_+^1 and R_−^1.
Substituting (4.1.39) into (4.1.38) and (4.1.37), we arrive at the equations

−βy ∂f_±^1/∂y + (a ∓ u_m − βy) ∂f_±^1/∂z = γ^1 − z² − ε(B + N) z/(βy − a ± u_m)   (4.1.40)

(in Eqs. (4.1.40) we retain only the most important terms in the functions c_±^1(y, z) and neglect the terms of order higher than or equal to that of ε³). In view of (4.1.15), both equations (4.1.40) hold on the boundary Γ^1. By solving these equations, we obtain the components of the gradient of the loss function ∇f^1(y, z) on the switching line Γ^1:
∂f^1/∂z = A_z^1 = ε(B + N) z/(u_m² − (βy − a)²),
∂f^1/∂y = A_y^1 = (z² − γ^1)/(βy) − 2ε(B + N) z(βy − a)/(βy [u_m² − (βy − a)²]).   (4.1.41)
In this case, the condition (4.1.20) (a necessary condition for a switching line of the first kind) leads to the equation

2z/(βy) − 2ε(B + N)(βy − a)/(βy [u_m² − (βy − a)²]) = 2εβ(B + N) z(βy − a)/[u_m² − (βy − a)²]².   (4.1.42)
Hence, neglecting the ε²-order terms, we obtain the following equation for the switching line Γ^1 in the first approximation:

z = ε(B + N)(βy − a)/(u_m² − (βy − a)²).   (4.1.43)
Equation (4.1.43) allows us to calculate the stationary tracking error γ^1 in the first approximation. The function w^1(y, γ^1) readily follows from (4.1.25), (4.1.41), and (4.1.43). Substituting the expression obtained for w^1(y, γ^1) into (4.1.26), we see that γ^1 = O(ε²); that is, the stationary tracking error in the first approximation coincides with that in the zero approximation, namely, γ^1 = 0. The stationary error γ attains nonzero values only in the second approximation. To calculate the derivative (4.1.25) for k = 2,

df²/dy = A_y² + A_z² dz/dy ≡ w²(y, γ²),   (4.1.44)

with the desired accuracy, we need not calculate the loss function f_±^1(y, z) in the first approximation; it suffices to calculate c²(y, z) in (4.1.12) and (4.1.13)
by using expressions (4.1.41) for the derivatives ∂f^1/∂y and ∂f^1/∂z, which are satisfied along the switching line Γ^1. Differentiating the first relation in (4.1.41), we obtain

∂²f^1/∂z² = ε(B + N)/(u_m² − (βy − a)²).   (4.1.45)

As follows from (4.1.41), the other second-order derivatives ∂²f^1/∂z∂y and ∂²f^1/∂y² on Γ^1 are higher-order infinitesimals and can be neglected when we calculate γ². Therefore, (4.1.45) and (4.1.13) yield the following approximate expression for the function c²(y, z):

c_±²(y, z) = z² + ε(B + N) z/(βy − a ± u_m) + ε²(B + N)²/(2[u_m² − (βy − a)²]).   (4.1.46)
Taking (4.1.46) into account and solving the system (4.1.16), (4.1.17) (with k = 2) for ∂f²/∂y and ∂f²/∂z, we calculate the functions A_y² and A_z² in (4.1.44) as

A_y² = ∂f²/∂y = (1/βy)[z² − γ² + ε²(B + N)²/(2[u_m² − (βy − a)²]) − 2ε(B + N) z(βy − a)/[u_m² − (βy − a)²]],
A_z² = ∂f²/∂z = ε(B + N) z/(u_m² − (βy − a)²).   (4.1.47)
From (4.1.26), (4.1.43), (4.1.44), and (4.1.47), we derive the equation for the stationary tracking error in the second approximation:

lim_{y→0} (1/βy)[ε²(B + N)²/(2[u_m² − (βy − a)²]) − ε²(B + N)²(βy − a)²/[u_m² − (βy − a)²]² − γ²] = 0,

whence it follows that

γ² = ε²(B + N)²/(2[u_m² − a²]) − ε²(B + N)² a²/[u_m² − a²]².   (4.1.48)

Formula (4.1.48) exactly coincides with the stationary error (2.2.23) obtained for an input process homogeneous in y. The inhomogeneity, in other words, the dependence of the stationary error on the parameter β, begins to manifest itself only in the calculations of higher approximations. However, the drift coefficient −βy affects the position of the switching line (4.1.43) already in the first approximation. Formula (4.1.43) is a generalization of the corresponding formula (2.2.22); for β = 0 these formulas coincide.
Figure 31 shows the analog circuit diagram of the tracking system that realizes the optimal control algorithm in the first approximation. The
unit NC is an inertialess nonlinear transformer governed by the functional
FIG. 31

dependence (4.1.43). The realization of the unit NC in practice is substantially simplified owing to the fact that the operating region of the input variable y (where (4.1.43) must be maintained) is small. In fact, it suffices to maintain (4.1.43) for |y| < Cε^{1/2}, where C is a positive constant of order O(1). Outside this region, the character of the functional input-output relation describing NC is of no importance. In particular, for |y| > Cε^{1/2}, the nonlinear transformer NC can be constructed by using the equation for the switching line Γ^0 in the zero approximation or, even simpler, by using the equation z = 0. This is due to the fact that the system shown in Fig. 31 optimizes only the stationary tracking conditions, when the phase variables fluctuate in a small neighborhood of the origin on the plane (y, z).

§4.2. Calculation of a quasioptimal system for tracking a discrete Markov process
As the second example illustrating the approximate synthesis procedure described above, we consider the problem of constructing an optimal system for tracking a Markov "telegraph signal" type process (a discrete process with two states) in the case where the measurement of the input signal is accompanied by a white noise and the plant is subject to random actions. Figure 32 shows the block diagram of the system in question. We assume that y(t) is a symmetric Markov process with two states (y(t) = ±1) whose a priori probabilities p_t(±1) = P[y(t) = ±1] satisfy the equations

dp_t(1)/dt = −μ p_t(1) + μ p_t(−1),  dp_t(−1)/dt = μ p_t(1) − μ p_t(−1).   (4.2.1)
FIG. 32

Here the number μ > 0 determines the intensity of transitions between the states y = +1 and y = −1 per unit time. The system (4.2.1) is a special case of system (1.1.49) with m = 2 and λ_{12}(t) = λ_{21}(t) = μ. It readily follows from (4.2.1) that realizations of the input signal y(t) are sequences of random pulses; the lengths τ of these pulses and of the intervals between them are independent exponentially distributed random variables, P(τ > c) = e^{−μc}. The observable process ỹ(t) is an additive mixture of the input signal y(t) and a white noise (independent of y(t)) of intensity κ:

ỹ(t) = y(t) + √κ ζ(t).
(4.2.2)

As in §4.1, the plant P is described by the scalar equation

ẋ(t) = u(t) + √B ξ(t),   (4.2.3)

where ξ(t) is the standard white noise independent of y(t) and ζ(t), and the controlling action is bounded in absolute value,

|u(t)| ≤ 1.   (4.2.4)
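A minimal sketch of the telegraph input signal, using only the two properties stated above: states ±1 and independent exponentially distributed sojourn times with intensity μ. The value μ = 2 and the time grid are arbitrary illustrations.

```python
import math
import random

random.seed(1)
mu = 2.0                 # assumed transition intensity, for illustration
n_sojourns = 20000

# Sojourn lengths are i.i.d. Exponential(mu), so E[tau] = 1/mu and
# P(tau > c) = exp(-mu*c).
lengths = [random.expovariate(mu) for _ in range(n_sojourns)]
mean_len = sum(lengths) / n_sojourns
frac_long = sum(1 for L in lengths if L > 1.0 / mu) / n_sojourns  # ~ exp(-1)

def telegraph(t_grid, lengths, y0=1):
    """Piecewise-constant realization of y(t): flip the sign after each sojourn."""
    y, out, i, t_next = y0, [], 0, lengths[0]
    for t in t_grid:
        while t >= t_next and i + 1 < len(lengths):
            i += 1
            t_next += lengths[i]
            y = -y
        out.append(y)
    return out

ys = telegraph([k * 0.01 for k in range(1000)], lengths)
```

The empirical mean sojourn length approaches 1/μ and the empirical tail frequency approaches e^{−1}, matching P(τ > c) = e^{−μc} at c = 1/μ.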
To estimate the system performance, we use the integral optimality criterion

I[u] = E ∫_0^T c(y(t) − x(t)) dt,   (4.2.5)

where the penalty function c(y − x) is the same as in (4.1.4). In the method used here for solving problem (4.2.1)-(4.2.5), it is important that c(y − x) is a differentiable function. In the subsequent calculations, this function is quadratic, namely,

c(y − x) = (y − x)².   (4.2.6)
A peculiar feature of our problem, in contrast, say, with the problem studied in §4.1, is that the observed pair of stochastic processes (ỹ(t), x(t)) is not a Markov process. Therefore, as was already noted in §1.5, to use the dynamic programming approach, it is necessary to introduce a special space of states formed by sufficient coordinates that already possess the Markov property.

4.2.1. Sufficient coordinates and the Bellman equation. Let us show that the current value of the output variable x(t) and the a posteriori probability w_t(1) = P[y(t) = +1 | ỹ_0^t] are sufficient coordinates X_t in the problem considered. In the sequel, owing to purely technical considerations, it is more convenient to take, instead of w_t(1), the variable z_t = w_t(1) − w_t(−1) as the second component of X_t. It follows from the normalization condition w_t(1) + w_t(−1) = 1 that the a posteriori probabilities w_t(1) and w_t(−1) can be uniquely expressed via z_t as follows:

w_t(1) = (1 + z_t)/2,  w_t(−1) = (1 − z_t)/2.   (4.2.7)
Obviously, z_t randomly varies in time. Let us derive the stochastic equation describing the random function z_t = z(t). Here we shall consider a somewhat more general case of an input signal nonsymmetric with respect to probability. In this case, instead of (4.2.1), the a priori properties of y(t) are described by the equations

dp_t(1)/dt = −μ p_t(1) + ν p_t(−1),  dp_t(−1)/dt = μ p_t(1) − ν p_t(−1),   (4.2.8)
that is, the intensities of transitions from the state y = +1 down to y = −1 (namely, μ) and from y = −1 up to y = +1 (namely, ν) are not equal to each other. Let us pass to discrete time. In this case, the random functions in (4.2.2) are replaced by sequences of random variables

ỹ_n = y_n + ζ_n,  n = 1, 2, ...,   (4.2.9)

where ỹ_n, y_n, and ζ_n are understood as the mean values of realizations over the interval Δ of time quantization:

ỹ_n = (1/Δ) ∫_{(n−1)Δ}^{nΔ} ỹ(τ) dτ,  y_n = (1/Δ) ∫_{(n−1)Δ}^{nΔ} y(τ) dτ,  ζ_n = (√κ/Δ) ∫_{(n−1)Δ}^{nΔ} ζ(τ) dτ.   (4.2.10)
It follows from (4.2.8) (see also (1.1.42)) that the sequence y_n is a simple Markov chain characterized by the following four transition probabilities p_Δ(y_{n+1} | y_n):

p_Δ(1 | 1) = 1 − μΔ,  p_Δ(−1 | 1) = μΔ,  p_Δ(−1 | −1) = 1 − νΔ,  p_Δ(1 | −1) = νΔ   (4.2.11)

(all relations in (4.2.11) hold up to terms of the order of o(Δ)). It follows from the properties of the white noise (1.1.31) that the random variables ζ_n corresponding to different indices are independent of each other and have the same probability densities

p(ζ_n) = √(Δ/2πκ) exp(−Δζ_n²/2κ).   (4.2.12)
Using these properties of the sequences y_n and ζ_n, we can write recurrent formulas relating the a posteriori probabilities at successive time instants (with numbers n and n + 1) and the result ỹ_{n+1} of the last observation. The probability addition and multiplication theorems yield the formulas

p(y_{n+1} = 1, ỹ_1^{n+1}) = p(y_n = 1, ỹ_1^n) p_Δ(1 | 1) p(ỹ_{n+1} | y_{n+1} = 1) + p(y_n = −1, ỹ_1^n) p_Δ(1 | −1) p(ỹ_{n+1} | y_{n+1} = 1),   (4.2.13)

p(y_{n+1} = −1, ỹ_1^{n+1}) = p(y_n = −1, ỹ_1^n) p_Δ(−1 | −1) p(ỹ_{n+1} | y_{n+1} = −1) + p(y_n = 1, ỹ_1^n) p_Δ(−1 | 1) p(ỹ_{n+1} | y_{n+1} = −1).   (4.2.14)

Taking into account the relation p(y_n = ±1, ỹ_1^n) = w_n(±1) p(ỹ_1^n), we can rewrite (4.2.13) and (4.2.14) as follows:

w_{n+1}(1) p(ỹ_{n+1} | ỹ_1^n) = [w_n(1) p_Δ(1 | 1) + w_n(−1) p_Δ(1 | −1)] p(ỹ_{n+1} | y_{n+1} = 1),   (4.2.15)

w_{n+1}(−1) p(ỹ_{n+1} | ỹ_1^n) = [w_n(−1) p_Δ(−1 | −1) + w_n(1) p_Δ(−1 | 1)] p(ỹ_{n+1} | y_{n+1} = −1).   (4.2.16)
We write d_n = w_n(1)/w_n(−1) and note that (4.2.9) and (4.2.12) imply

p(ỹ_{n+1} | y_{n+1} = 1)/p(ỹ_{n+1} | y_{n+1} = −1) = exp{(2Δ/κ) ỹ_{n+1}}.
Now, dividing (4.2.15) by (4.2.16) and taking into account (4.2.11), we obtain the following recurrent relation for the parameter d_n:

d_{n+1} = [(1 − μΔ) d_n + νΔ]/[μΔ d_n + 1 − νΔ] exp{(2Δ/κ) ỹ_{n+1}}.   (4.2.17)

By letting the time interval Δ → 0 and taking into account the fact that lim_{Δ→0}(d_{n+1} − d_n)/Δ = ḋ_t, we derive from (4.2.17) the following differential equation for the function d_t = d(t):

ḋ_t = ν + (ν − μ) d_t − μ d_t² + (2 d_t/κ) ỹ(t).   (4.2.18)
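The passage from the one-step recursion (4.2.17) to the differential equation (4.2.18) can be sanity-checked numerically: for a small step Δ, the finite difference (d_{n+1} − d_n)/Δ should approach the drift of (4.2.18). The recursion form coded below is a reconstruction of the Bayes-ratio update and should be treated as an assumption, as are the parameter values.

```python
import math

# Assumed recursion (cf. (4.2.17)):
#   d' = ((1 - mu*D)*d + nu*D) / (mu*D*d + 1 - nu*D) * exp(2*D*ytilde/kappa)
# Limiting drift (cf. (4.2.18)):
#   nu + (nu - mu)*d - mu*d**2 + (2*d/kappa)*ytilde
mu, nu, kappa = 1.0, 0.7, 1.5     # illustrative values
d, ytilde = 2.0, 0.5              # a fixed state and observation
D = 1e-6                          # small time step

d_next = ((1.0 - mu * D) * d + nu * D) / (mu * D * d + 1.0 - nu * D) \
         * math.exp(2.0 * D * ytilde / kappa)
fd_drift = (d_next - d) / D
ode_drift = nu + (nu - mu) * d - mu * d * d + (2.0 * d / kappa) * ytilde
```

For these values ode_drift = −2.5666…, and the finite difference agrees to within O(Δ).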
Since, in view of (4.2.7), the functions z_t = z(t) and d_t satisfy the relation d_t = (1 + z_t)/(1 − z_t), Eq. (4.2.18) implies that z_t satisfies

ż_t = ν(1 − z_t) − μ(1 + z_t) + (1/κ)(1 − z_t²) ỹ(t).   (4.2.19)

For a symmetric signal (μ = ν), instead of (4.2.19), we have

ż_t = −2μ z_t + (1/κ)(1 − z_t²) ỹ(t).   (4.2.20)
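Since the filtering equation must be understood in the symmetrized (Stratonovich) sense, a Heun-type predictor-corrector step is a natural discretization. The sketch below assumes the symmetric-case form ż = −2μz + κ^{−1}(1 − z²)ỹ(t) together with illustrative parameter values, and it tracks how often the posterior-mean estimate z agrees in sign with the hidden telegraph state y.

```python
import math
import random

random.seed(2)
mu, kappa = 1.0, 0.5            # illustrative (assumed) parameters
dt, n_steps = 1e-3, 20000

def drift(z):
    return -2.0 * mu * z

def gain(z):
    return (1.0 - z * z) / kappa

y = 1                           # hidden telegraph state
z = 0.0                         # posterior mean estimate of y
agree = 0
for _ in range(n_steps):
    if random.random() < mu * dt:      # telegraph transition
        y = -y
    dW = random.gauss(0.0, math.sqrt(dt))
    obs = y * dt + math.sqrt(kappa) * dW   # integral of ytilde over [t, t+dt]
    # Heun predictor-corrector step (consistent with the symmetrized equation)
    z_pred = z + drift(z) * dt + gain(z) * obs
    z += 0.5 * (drift(z) + drift(z_pred)) * dt + 0.5 * (gain(z) + gain(z_pred)) * obs
    z = max(-0.999, min(0.999, z))     # keep z inside its natural range
    if z * y > 0:
        agree += 1

agree_frac = agree / n_steps
```

With this noise level the estimate agrees in sign with the hidden state most of the time; raising κ degrades the agreement toward 1/2, as the large-noise analysis below suggests.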
REMARK. According to (4.2.2), the observable process ỹ(t) contains a white noise, and the coefficients of ỹ(t) in (4.2.18)-(4.2.20) contain the random functions d_t = d(t) and z_t = z(t). It follows from §1.2 that, in this case, we must indicate in which sense we understand the stochastic integrals used for calculating the solutions of the stochastic differential equations (4.2.18)-(4.2.20). A more rigorous analysis (e.g., see [132, 175]) shows that all three equations (4.2.18)-(4.2.20) must be treated as symmetrized equations. In particular, it is precisely due to this fact that we can pass from Eq. (4.2.18) to Eq. (4.2.19) by using the standard rules for differentiating composite functions (instead of the more complicated differentiation rule (1.2.43) for solutions of Ito differential equations).

Now let us verify whether the coordinates X_t = (x_t, z_t) are sufficient for the solution of the synthesis problem in question. To this end, according to [171] and §1.5, we need to verify whether the coordinates X_t = (x_t, z_t) are sufficient
(1) for obtaining the conditional mean penalties
E[c(y_t, x_t) | ỹ(τ), x(τ): 0 ≤ τ ≤ t] = E[c(y_t, x_t) | X_t];   (4.2.21)
(2) for finding constraints on the set of admissible controls u;
(3) for determining their future evolution (that is, the probabilities of the future values X_{t+Δ}, Δ > 0).
In this problem, in view of (4.2.4), the set of admissible controls is a given interval −1 ≤ u ≤ 1 of the number axis, independent of anything; therefore, we need not take into account the statement of item (2).⁴ Obviously, the conditional mean penalties (4.2.21) can be expressed via the a posteriori probabilities as follows:

E[c(x_t, y_t) | ỹ(τ), x(τ): 0 ≤ τ ≤ t] = c(x_t, 1) w_t(1) + c(x_t, −1) w_t(−1).   (4.2.22)

Since formulas (4.2.7) express the a posteriori probabilities w_t(±1) in terms of z_t, statement (1) is trivially satisfied for the variables (x_t, z_t). Let us study the time evolution of (x_t, z_t). The variable x_t = x(t) satisfies an equation of the form (4.2.3). If in this equation the control u_t at time t is determined by the current values of (x_t, z_t), then, in view of the white noise properties, the probabilities of the future values of x(τ), τ > t, are completely determined by X_t = (x_t, z_t). Now, let us consider Eq. (4.2.20). Note that, according to (4.2.2), ỹ(t) = y(t) + √κ ζ(t), where y(t) is a Markov process and ζ(t) is a white noise. Therefore, it follows from
Eq. (4.2.20) that the probabilities of the future values z_{t+Δ} are determined by z_t and the behavior of y(τ), τ > t. However, since y(τ) is a Markov process, its behavior for τ > t is determined by the state y_t described by the probabilities w_t(y_t = ±1), that is, in view of (4.2.7), still by the coordinate z_t. Thus, statement (3) is proved for X_t = (x_t, z_t). Equations (4.2.3) and (4.2.20) allow us to write the Bellman equation for the problem considered. Introducing the loss function
F(t, x_t, z_t) = min_{|u(τ)|≤1, t≤τ≤T} E[ ∫_t^T c(x(τ), y(τ)) dτ | x(t) = x_t, z(t) = z_t ]   (4.2.23)

and using the Markov property of the sufficient coordinates (x(t), z(t)), from (4.2.23) we obtain the basic functional equation of the dynamic programming approach:

F(t, x_t, z_t) = min_u E[ ∫_t^{t+Δ} c(x(τ), y(τ)) dτ + F(t + Δ, x_{t+Δ}, z_{t+Δ}) | x_t, z_t ].   (4.2.24)

⁴It is necessary to verify the statement of item (2) only in special cases in which the control constraints depend on the state of the control system. Such problems are not considered in this book.
The Bellman differential equation can be derived from (4.2.24) by the standard method (see §1.4 and §1.5) of expanding F(t + Δ, x_{t+Δ}, z_{t+Δ}) in a Taylor series around the point (t, x_t, z_t), averaging, and passing to the limit as Δ → 0. In this procedure, we use the following obvious formulas, which are consequences of (4.2.3), (4.2.7), and (4.2.20)-(4.2.22):

E[ ∫_t^{t+Δ} c(x(τ), y(τ)) dτ | x_t, z_t ] = [c(x_t, 1)(1 + z_t)/2 + c(x_t, −1)(1 − z_t)/2] Δ + o(Δ),   (4.2.25)

E[(x_{t+Δ} − x_t) | x_t, z_t] = u_t Δ + o(Δ),   (4.2.26)

E[(x_{t+Δ} − x_t)² | x_t, z_t] = B Δ + o(Δ),   (4.2.27)

E[(x_{t+Δ} − x_t)(z_{t+Δ} − z_t) | x_t, z_t] = o(Δ),   (4.2.28)

E[(z_{t+Δ} − z_t)² | x_t, z_t] = ((1 − z_t²)²/κ) Δ + o(Δ),   (4.2.29)

E[(x_{t+Δ} − x_t)^k | x_t, z_t] = E[(z_{t+Δ} − z_t)^k | x_t, z_t] = o(Δ),  k ≥ 3.   (4.2.30)
It is somewhat more difficult to calculate the mean value of the difference (z_{t+Δ} − z_t). Since, as was already noted, (4.2.20) is a symmetrized stochastic equation, E[(z_{t+Δ} − z_t) | x_t, z_t] = E[(z_{t+Δ} − z_t) | z_t] can be calculated with the help of formulas (1.2.29) and (1.2.37) (with ν = 1/2 in (1.2.37)). Then, taking into account the relation E[ỹ_t | z_t] = E[y_t | z_t] = z_t, from (4.2.20) and (1.2.37) we obtain

E[(z_{t+Δ} − z_t) | z_t] = −2μ z_t Δ + o(Δ).   (4.2.31)
As Δ → 0, relations (4.2.24)-(4.2.31) enable us to write the Bellman differential equation in the form

∂F/∂t + min_{|u|≤1} [u ∂F/∂x] + (B/2) ∂²F/∂x² − 2μz ∂F/∂z + ((1 − z²)²/2κ) ∂²F/∂z² + c(x, 1)(1 + z)/2 + c(x, −1)(1 − z)/2 = 0.   (4.2.32)
The second term in Eq. (4.2.32) can also be written as −|∂F/∂x|. To the equation obtained, we must add a condition on the loss function at the end of the control process, namely,

F(T, x, z) = 0,   (4.2.33)

and some boundary conditions. Since the input signal takes one of the two values y(t) = ±1 at each instant of time t, we can restrict our consideration to the region |x| ≤ 1. Thus the sufficient coordinates are defined on the square −1 ≤ x ≤ +1, −1 ≤ z ≤ +1. The boundary conditions on the sides x = −1 and x = +1 of this square are

∂F/∂x (t, ±1, z) = 0.   (4.2.34)

These conditions mean that there is no probability flow [11, 173] through the boundary x = ±1.⁵ On the other sides z = ±1 of the square, the diffusion coefficient contained in the second diffusion term ((1 − z²)²/2κ) ∂²F/∂z² vanishes. Therefore, instead of the conditions ∂F/∂z = 0 on these sides of the square, we have the trivial conditions

|∂F/∂z (t, x, ±1)| < ∞.   (4.2.35)
If, by analogy with the problem solved in §4.1, in the space of sufficient coordinates (x, z) we denote the regions where ∂F/∂x > 0 and ∂F/∂x < 0 by R_+ and R_−, respectively, then in these regions the nonlinear equation (4.2.32) is replaced by the corresponding linear equation, and the optimal control is formed by the rule

u*(t, x, z) = −1 for (t, x, z) ∈ R_+,  u*(t, x, z) = +1 for (t, x, z) ∈ R_−.

Since the first-order derivatives of the loss function are continuous [113, 175], on the interface Γ between R_+ and R_− we have

∂F/∂x (t, x, z) = 0.   (4.2.36)
To solve the synthesis problem is equivalent to finding the interface Γ between R_+ and R_− (the switching line for the controlling action). A straightforward way of obtaining the equation for the switching line Γ is to solve

⁵The condition (4.2.34) means that there are reflecting screens on the boundary segments (x = +1, −1 ≤ z ≤ +1) and (x = −1, −1 ≤ z ≤ +1) (for a detailed description of diffusion processes with phase constraints and various screens, see §6.2).
the original nonlinear equation (4.2.32) with the initial and boundary conditions (4.2.33)-(4.2.35) and then, on the plane (x, z), to find the geometric locus where condition (4.2.36) is satisfied. However, this method can be implemented only numerically. To solve the synthesis problem analytically, let us return to the approximate method used in §4.1.

4.2.2. Calculation of the successive approximations. Suppose that the intensity of random actions on the plant is small but the error of measurement of the input signal is large. In this case, we can set B = εB_0 and κ = κ_0/ε (where ε > 0 is a small parameter). We consider, just as in
§4.1, the stationary tracking operating conditions. Then for the quadratic penalty function (4.2.6), the Bellman equation (4.2.32) takes the form

−2μz ∂f/∂z + min_{|u|≤1} [u ∂f/∂x] + (ε/2)[B_0 ∂²f/∂x² + ((1 − z²)²/κ_0) ∂²f/∂z²] + x² − 2xz + 1 = γ.   (4.2.37)

Introducing, as above, the domains R_+ and R_−, we can replace the nonlinear equation (4.2.37) by the pair of linear equations

−2μz ∂f_±/∂z ∓ ∂f_±/∂x + (ε/2)[B_0 ∂²f_±/∂x² + ((1 − z²)²/κ_0) ∂²f_±/∂z²] + x² − 2xz + 1 = γ,   (4.2.38)

each of which is valid only in one of the regions (R_+ or R_−) on the phase plane (x, z).
We shall solve Eqs. (4.2.38) by the method of successive approximations considered in §4.1. In this case, instead of (4.2.38), we need to solve a number of simpler equations that successively approximate the original equations (4.2.38). By setting ε = 0 in (4.2.38), we obtain the zero-approximation equations

2μz ∂f_±^0/∂z ± ∂f_±^0/∂x = x² − 2xz + 1 − γ^0.   (4.2.39)
The next approximations are calculated according to the scheme

2μz ∂f_±^k/∂z ± ∂f_±^k/∂x = x² − 2xz + 1 − γ^k + (ε/2)[B_0 ∂²f_±^{k−1}/∂x² + ((1 − z²)²/κ_0) ∂²f_±^{k−1}/∂z²],  k = 1, 2, ....   (4.2.40)
By solving the equations for the kth approximation (k = 0, 1, 2, ...), we obtain the set f_±^k(x, z), Γ^k, γ^k consisting of approximate expressions for the loss function, the switching line, and the stationary tracking error. In what follows, we solve the synthesis problem in the first two approximations, the zero and the first.
The zero approximation. Let us consider Eqs. (4.2.39). By analogy with §4.1, the equation for the interface Γ^0 between R_+^0 and R_−^0, on which both equations for f_+^0 and f_−^0 hold, and the stationary tracking error γ^0 can be found without solving Eqs. (4.2.39). Indeed, using the condition that the gradient ∇f^k (see (4.1.15)) is continuous on the switching line Γ^k,

∂f_+^k/∂x = ∂f_−^k/∂x,  ∂f_+^k/∂z = ∂f_−^k/∂z,  k = 0, 1, 2, ...,   (4.2.41)
we obtain from (4.2.39) the following components of the gradient ∇f^0 along Γ^0:

∂f^0/∂x = A_x^0(x, z) ≡ 0,  ∂f^0/∂z = A_z^0(x, z) = (x² − 2xz + 1 − γ^0)/(2μz).   (4.2.42)

The condition

∂A_x^k/∂z = ∂A_z^k/∂x,   (4.2.43)

which is necessary for the existence of a switching line of the first kind (see (4.1.20)), together with (4.2.42) implies that the line

z = x   (4.2.44)
is a possible Γ^0 for the zero approximation. An analysis of the phase trajectories of the deterministic system

dx/dt = ±1,  dz/dt = −2μz   (4.2.45)

shows that the trajectories actually approach the line (4.2.44) on both sides⁶ if only 2μ < 1. In what follows, we assume that this condition is satisfied. The stationary error γ^0 is obtained from the condition that the derivative df^0/dx calculated along Γ^0 at the stable point (e.g., at the origin x = 0,
z = 0) is finite (see (4.1.25) and (4.1.26)). In view of (4.2.42) and (4.2.44), we have

df^0/dx = A_x^0 + A_z^0 dz/dx = (1 − x² − γ^0)/(2μx)
along Γ^0. The condition (4.1.26) in this case takes the form lim_{x→0} (1 − x² − γ^0)/(2μx) = 0, which implies γ^0 = 1. Now, to solve Eq. (4.2.39), we write the characteristic equations
dx
dz
df+
(4.2.46)
To solve (4.2.46) uniquely, it is necessary to pose an additional "initial" condition (to pose the Cauchy problem) for the loss function f⁰(x, z). This condition follows from (4.2.42) and (4.2.44). The second relation in (4.2.42) implies that f⁰(z, z) = −z²/(4μ) + f⁰(0, 0) on the line (4.2.44). Without loss of generality, we can set f⁰(0, 0) = 0. Thus, among the solutions f⁰± obtained from (4.2.46), we choose the solution satisfying the condition

f⁰₊(x, x) = −x²/(4μ) = f⁰₋(x, x)  (4.2.47)

on the line z = x. We readily obtain this solution in closed form (formulas (4.2.48)), in which x₀ = x±(x, z) and the functions x± are determined as solutions of the equations

x± e^{∓2μx±} = z e^{∓2μx}.  (4.2.49)
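Equations (4.2.49) are transcendental, but they are easy to solve numerically. The following sketch (illustrative, not part of the book) finds x₊ by bisection, using the fact that g(t) = t e^{−2μt} is monotone increasing on [0, 1/(2μ)]:

```python
import math

# A numerical sketch (assumption: the relevant root lies on the monotone
# branch 0 <= t <= 1/(2*mu) of g(t) = t*exp(-2*mu*t)); then Eq. (4.2.49),
#   x_plus * exp(-2*mu*x_plus) = z * exp(-2*mu*x),
# has a unique solution there, found by bisection.
def x_plus(x, z, mu=0.3, tol=1e-12):
    g = lambda t: t * math.exp(-2.0 * mu * t)
    target = z * math.exp(-2.0 * mu * x)
    lo, hi = 0.0, 1.0 / (2.0 * mu)
    assert 0.0 <= target <= g(hi), "target outside the monotone branch"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

root = x_plus(0.2, 0.5)
# residual of (4.2.49) at the computed root (should be ~ 0)
print(abs(root * math.exp(-0.6 * root) - 0.5 * math.exp(-0.6 * 0.2)))
```

On the line z = x the routine returns x₊ = x, in accordance with the boundary condition (4.2.47) being posed there.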
The first approximation. Now, using (4.2.48), we can find the switching line Γ¹ in the first approximation. Relations (4.2.40) and (4.2.41) allow us to write the components ∂f¹/∂x and ∂f¹/∂z of the gradient ∇f¹ on the line Γ¹ (formulas (4.2.50)); these components are expressed through the second derivatives ∂²f⁰±/∂x², ∂²f⁰±/∂z² and contain the small parameter ε together with the constants B₀ and κ₀. Differentiating (4.2.48) and using the relations

∂x±/∂x = ∓2μx±/(1 ∓ 2μx±),  ∂x±/∂z = x±/[z(1 ∓ 2μx±)],

that follow from (4.2.49), we find the components (4.2.51).
Substituting (4.2.51) into (4.2.50), we obtain the expressions (4.2.52).
Using again the condition (4.2.43), we find Γ¹. The derivatives ∂A¹/∂z and ∂A¹/∂x are calculated with regard to the fact that the difference between the position of the switching line Γ¹ in the first approximation and the position of Γ⁰ determined by (4.2.44) is small. Therefore, after the differentiation of (4.2.52), we can replace x₊ and x₋ by x (on the line (4.2.44) we have z = x, and then (4.2.49) gives x₊ = x₋ = x). If this replacement is performed only for the terms of the order of ε, then the error caused by this replacement is an infinitesimal of higher order.
FIG. 33
Taking this fact into account, we obtain from (4.2.52) the derivatives ∂A¹/∂x and ∂A¹/∂z to within O(ε²); the terms of the order of ε contain the factor 1 − 4μ²x². Hence, using (4.2.43), we obtain the equation (4.2.53) for the switching line Γ¹.
The position of Γ¹ on the plane (x, z) depends on the values of μ, κ₀, and B₀. Figure 33 shows one of the possible switching lines and the phase trajectories of system (4.2.45). By analogy with the zero approximation, we find the stationary tracking error γ¹ from the condition that the gradient (4.2.52) is finite at the origin. By letting z → 0 and x → 0 in (4.2.52) and taking into account the fact that x₊ and x₋ tend to zero just as x and z, we obtain γ¹. Hence it follows that the stationary error in the first approximation depends on the noise intensity at the input of the system shown in Fig. 32 but is independent of the noises in the plant.
FIG. 34
Using the equation (4.2.53) for the switching line and Eq. (4.2.20), we construct the analogous circuit (see Fig. 34) for a quasioptimal tracking system in the first approximation. The dotted line indicates the unit SC that produces a sufficient coordinate z(t); the unit NC is an inertialess transducer that realizes the functional dependence on the right-hand side of (4.2.53). If ε ≪ 1 for the small parameter contained in the problem, then the output variable x(t) fluctuates mostly in a small neighborhood of zero. In this case (|x(t)| …
CHAPTER V
CONTROL OF OSCILLATORY SYSTEMS
The present chapter deals with some synthesis problems for optimal systems with quasiharmonic plants. Here the term "quasiharmonic" means that the plant dynamics is close to harmonic oscillations in the process of control. In this case, over the period t = 2π, the phase trajectories of the second-order systems considered in this chapter are close to circles in the plane (x, ẋ). There exists an extensive literature on the methods for studying such systems (including controlled systems) (e.g., see [2, 19, 27, 33, 69, 70, 136, 153, 154] and the references therein). These methods are based on the idea (going back to Poincaré) that the motion in oscillatory systems
can be divided into "fast" and "slow" motions. This idea along with the averaging method [2] enables one to derive equations for "slow" variables that can readily be integrated. These equations are usually derived by different versions of the method of successive approximations. Various approximate methods based on the first-approximation equation for slowly varying variables play an important role in industrial engineering. For the first time, such a method for studying nonlinear oscillatory systems was proposed by van der Pol [183, 184] (the method of slowly varying amplitudes). Among other first-approximation methods, we also point out
the "mean steepness" method [2] and the harmonic balance method [69, 70], which is widely used in engineering calculations of automatic control systems. More precise results can be obtained by regular asymptotic methods, the most important of which is the asymptotic Krylov-Bogolyubov method [19]. Originally, this method was developed for studying nonlinear oscillations in deterministic uncontrolled systems. Later on, this method was also used for the investigation of stochastic [109, 173] and controlled [33] oscillatory systems. In the present chapter, the Krylov-Bogolyubov method is also
widely used for constructing quasioptimal control algorithms. This chapter consists of four sections, in which we consider four special problems of optimal damping of oscillations in quasiharmonic second-order systems with constrained controlling actions. In the first two sections (§5.1
and §5.2) we consider deterministic problems; the other two sections (§5.3 and §5.4) deal with stochastic synthesis problems. First, in §5.1 we study the control problem for an arbitrary quasiharmonic oscillator with one degree of freedom. We describe a method for solving the synthesis problem approximately. In this method, the minimized functional and the equation for the switching line are represented as asymptotic expansions in powers of a small parameter contained in the problem. The method of approximate synthesis is illustrated by examples of solving the optimal control problems for a linear oscillator and a nonlinear van der Pol oscillator. In §5.2 we use the method (considered in §5.1) for solving the control problem for a system of two biological populations, namely, the "predator-prey" model described by the Lotka-Volterra equations (see §2.3). We study a special Lotka-Volterra model with a "poorly adapted predator." In this case, the sizes of both interacting populations obey a quasiharmonic dynamics. Next, in §5.3, we consider the stochastic version of the problem studied in §5.1. We consider an asymptotic synthesis method that allows us to construct quasioptimal control systems with an oscillatory plant subject to additive random disturbances. Finally, in §5.4, the method considered in §5.3 is generalized to the case of indirect observation, when the measurement of the current state of the oscillator is accompanied by a white noise.
§5.1. Optimal control of a quasiharmonic oscillator. An asymptotic synthesis method
According to [2], a mechanical system with one degree of freedom is called a quasiharmonic oscillator if its behavior is described by a system of the form

ẋ₁ = x₂ + εX₁(x₁, x₂, u),  ẋ₂ = −x₁ + εX₂(x₁, x₂, u),  (5.1.1)
where x₁ and x₂ are the phase coordinates, X₁ and X₂ are sufficiently arbitrary (nonlinear, in the general case) functions of their arguments,¹ u = (u₁, ..., u_r) is an r-dimensional vector of controlling actions subject to various restrictions, and the number ε is a small parameter. It follows from (5.1.1) that for ε = 0 the general solution of system (5.1.1) is a union of two harmonic oscillations

x₁(t) = a sin(t + α),  x₂(t) = a cos(t + α),  (5.1.2)
¹The only assumption is that, for the given functions X₁ and X₂, the Cauchy problem for system (5.1.1) has a unique solution in a chosen domain D of the space of variables (t, x₁, x₂) (see §1.1).
with the same period τ = 2π and the phase shift Δφ = π/2. Note that, in the phase plane (x₁, x₂), the trajectory corresponding to the solution (5.1.2) is a circle of radius a. If ε ≠ 0 but is sufficiently small, then, in view of continuity, the difference between the solution of system (5.1.1) and the solution (5.1.2) is small on a time interval that is not too large. More precisely, if for ε ≠ 0 we seek the solution of system (5.1.1) in the form
x₁(t) = a(t) sin(t + α(t)),  x₂(t) = a(t) cos(t + α(t)),

then the "amplitude" increment Δa = a(t + 2π) − a(t) and the "phase" increment Δα = α(t + 2π) − α(t) are small over the period τ = 2π, that is, Δa ~ ε and Δα ~ ε. This fact justifies the term "quasiharmonic" for systems of the form (5.1.1) and serves as a basis for the elaboration of
various asymptotic methods for the analysis of such systems.

5.1.1. Statement of the problem. In the present section we consider controlled oscillators whose behavior is described by an equation of the form

ẍ + εχ(x, ẋ)ẋ + x = εu,  (5.1.3)
where χ(x, ẋ) is an arbitrary given function (nonlinear in the general case) that is centrally symmetric, that is, χ(x, ẋ) = χ(−x, −ẋ). In the phase variables x₁, x₂ (determined, as usual, by x₁ = x and x₂ = ẋ), we can replace Eq. (5.1.3) by the following equivalent system of first-order equations:

ẋ₁ = x₂,  ẋ₂ = −x₁ − εχ(x₁, x₂)x₂ + εu,  (5.1.4)

hence it follows that the oscillator (5.1.3) is a special case of the oscillator (5.1.1) with X₁ ≡ 0 and X₂(x₁, x₂, u) = u − χ(x₁, x₂)x₂.
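The quasiharmonic character of (5.1.4) for small ε can be illustrated numerically. In the sketch below (an illustration, not from the book; χ ≡ 1, u = 0, and ε = 0.05 are assumed), one full revolution changes the amplitude A = (x₁² + x₂²)^{1/2} only by a quantity of the order of ε:

```python
import math

# RK4 integration of (5.1.4) over one period 2*pi for a weakly damped
# linear oscillator (chi = 1, u = 0); the amplitude change is O(eps).
def rk4_step(f, y, dt):
    k1 = f(y)
    k2 = f([y[i] + 0.5 * dt * k1[i] for i in range(2)])
    k3 = f([y[i] + 0.5 * dt * k2[i] for i in range(2)])
    k4 = f([y[i] + dt * k3[i] for i in range(2)])
    return [y[i] + dt / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(2)]

eps = 0.05
f = lambda y: [y[1], -y[0] - eps * y[1]]   # (5.1.4) with chi = 1, u = 0

y = [1.0, 0.0]
dt = 2.0 * math.pi / 10000
for _ in range(10000):
    y = rk4_step(f, y, dt)

amp0 = 1.0
amp1 = math.hypot(y[0], y[1])
print(abs(amp1 - amp0))   # small, of the order of eps
```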
It should be noted that equations of the form (5.1.3) describe a wide class of controlled plants of various physical nature: mechanical (the Froude pendulum [2]), electrical (vacuum-tube and semiconductor generators of harmonic oscillations [2, 19, 183, 184]), electromechanical remote tracking systems for angle reconstruction [2], etc. Numerous examples of actual systems mathematically modeled by Eq. (5.1.3) can be found in [2, 19,
136]. For the controlled oscillator (5.1.3), we shall consider the following optimal control problem with free right-hand endpoint of the trajectory. We assume that the absolute value of the admissible (scalar) control u = u(t) is bounded at each time instant t:

|u(t)| ≤ u_m,  (5.1.5)
and the goal of control for system (5.1.3) is to minimize the integral functional

I[u] = ∫₀ᵀ c(x(t), ẋ(t)) dt → min over |u(t)| ≤ u_m, 0 ≤ t ≤ T,  (5.1.6)

over the trajectories {x(t) = xᵘ(t): 0 ≤ t ≤ T} of system (5.1.3) that
correspond to all possible controls u satisfying (5.1.5). The time interval [0, T] and the initial state of the oscillator x(0) = x₁(0) = x₁₀, ẋ(0) = x₂(0) = x₂₀ are given. The penalty function c(x, ẋ) = c(x₁, x₂) in (5.1.6) is assumed to be nonnegative, symmetric with respect to the origin, c(x₁, x₂) = c(−x₁, −x₂), and vanishing only at the point (x₁ = 0, x₂ = 0). In this case, the optimal control u* minimizing the functional (5.1.6) is sought in the synthesis form u* = u*(t, x₁(t), x₂(t)). Problem (5.1.3)-(5.1.6) is a special case of problem (1.3.1)-(1.3.3) considered in §1.3. Therefore, if we determine the function of minimum future losses
F(t, x₁, x₂) = min over admissible u(τ), t ≤ τ ≤ T, of [∫ₜᵀ c(x₁(τ), x₂(τ)) dτ : x₁(t) = x₁, x₂(t) = x₂]  (5.1.7)

in the standard way and use the standard derivation procedure described in §1.3, then, for the function (5.1.7), we obtain the Bellman differential equation
∂F/∂t + x₂ ∂F/∂x₁ − [x₁ + εχ(x₁, x₂)x₂] ∂F/∂x₂ + min over |u| ≤ u_m of [εu ∂F/∂x₂] + c(x₁, x₂) = 0,
F(T, x₁, x₂) = 0,  (5.1.8)

that corresponds to problem (5.1.3)-(5.1.6). Equation (5.1.8) allows us to obtain some general properties of the optimal control in the synthesis form u*(t, x₁, x₂), which we shall use later. Indeed, it follows from (5.1.8) that the optimal control u* for which the expression in the square brackets attains its minimum is a relay-type control and can be written in the form
u*(t, x₁, x₂) = −u_m sign (∂F/∂x₂)(t, x₁, x₂).  (5.1.9)
REMARK 5.1.1. Rigorously speaking, the optimal control in this problem is not unique. This is related to the fact that at the points (t, x₁, x₂) where ∂F(t, x₁, x₂)/∂x₂ = 0, the optimal control u* is not uniquely determined by Eq. (5.1.8). On the other hand, one can see that at the points
(t, x₁, x₂) where ∂F/∂x₂ = 0, the choice of any control u⁰ lying in the admissible region [−u_m, u_m] does not affect the value of the loss function F(t, x₁, x₂) that satisfies the Bellman equation. Therefore, in particular, the control (5.1.9), which requires the choice of u* = 0 at the points (t, x₁, x₂) where ∂F(t, x₁, x₂)/∂x₂ = 0,² is optimal. □
Using (5.1.9), we can rewrite the Bellman equation (5.1.8) in the form

∂F/∂t + x₂ ∂F/∂x₁ − [x₁ + εχ(x₁, x₂)x₂] ∂F/∂x₂ − εu_m |∂F/∂x₂| + c(x₁, x₂) = 0,
0 ≤ t < T,  F(T, x₁, x₂) = 0.  (5.1.10)

It follows from (5.1.10) and the central symmetry of χ(x₁, x₂) and c(x₁, x₂) that the loss function (5.1.7) satisfying (5.1.10) is centrally symmetric with
respect to the phase coordinates, namely, F(t, x₁, x₂) = F(t, −x₁, −x₂). Therefore, for any t, x₁, x₂ we have

(∂F/∂x₂)(t, x₁, x₂) = −(∂F/∂x₂)(t, −x₁, −x₂).

It follows from this relation and (5.1.9) that the optimal control algorithm u*(t, x₁, x₂) has the important property of being antisymmetric, namely,

u*(t, x₁, x₂) = −u*(t, −x₁, −x₂).  (5.1.11)
The facts that the optimal control in problem (5.1.3)-(5.1.6) is of relay type (5.1.9) and antisymmetric (5.1.11) play an important role in the asymptotic synthesis method discussed in the sequel. We also note that the optimal control algorithm in problem (5.1.3)-(5.1.6) can be simplified significantly if we consider the optimal control of system (5.1.3) on an infinite time interval. In this case, the upper limit of integration in (5.1.6) is T → ∞ and, instead of (5.1.7), we have the time-independent³ loss function
F(x₁, x₂) = min over admissible u(τ), τ ≥ t, of [∫ₜ^∞ c(x₁(τ), x₂(τ)) dτ : x₁(t) = x₁, x₂(t) = x₂],  (5.1.12)
²Recall that the discontinuous function sign x is determined by the relation

sign x = +1 for x > 0,  0 for x = 0,  −1 for x < 0.
³The loss function (5.1.12) is time-independent, since the plant equations (5.1.4) are time-invariant.
and, instead of (5.1.9), we have a time-invariant control algorithm of the form

u*(x₁, x₂) = −u_m sign (∂F/∂x₂)(x₁, x₂).  (5.1.13)

In what follows we shall consider just such a time-invariant version of
the optimal control problem (5.1.3)-(5.1.6) on an infinite time interval.

REMARK 5.1.2. As T → ∞, problem (5.1.3)-(5.1.6) makes sense only if there exists an admissible control u(x₁, x₂) in the synthesis form ensuring the convergence of the improper integral⁴

I[u] = ∫₀^∞ c(x₁ᵘ(t), x₂ᵘ(t)) dt,  (5.1.14)

where x₁ᵘ(t) and x₂ᵘ(t) denote solutions of system (5.1.4) with control u
and the initial conditions x₁(0) = x₁ and x₂(0) = x₂. At the same time, for some constraints of the form (5.1.5) imposed on the admissible controls and for some nonlinear functions χ(x₁, x₂) in (5.1.3), (5.1.4), it may happen that none of the admissible controls u ensures the convergence of the integral (5.1.14). For example, if χ(x₁, x₂) = x₁² − 1, then system (5.1.3) is a controlled van der Pol oscillator. It is well known [2, 183, 184] that undamped quasiharmonic auto-oscillations arise in such systems for u ≡ 0. Moreover, this auto-oscillating process is stable with respect to small disturbances affecting the oscillator. Therefore, for sufficiently small u_m in (5.1.5), any admissible control is insufficient to "suppress" the auto-oscillations in the oscillator (5.1.3). In its turn, in view of the properties of the penalty function c(x₁, x₂), it follows from this fact that the integral (5.1.14) does not converge. □

Everywhere in the sequel, we assume that the parameters of problem (5.1.3)-(5.1.6) are chosen so that this problem has a solution as T → ∞.
The solvability conditions for problem (5.1.3)-(5.1.6) as T → ∞ will be studied in more detail in Section 5.1.4.

5.1.2. Equations for the amplitude and the phase. Reduction of the synthesis problem. To study the quasiharmonic systems of the form (5.1.1) and (5.1.3), it is convenient to describe the current state of the system by using, instead of the coordinate x₁ and the velocity x₂, the polar coordinates A and Φ:

x₁ = A cos Φ,  x₂ = −A sin Φ,  Φ = t + φ.  (5.1.15)
⁴It also follows from the properties of the penalty function c(x₁, x₂) that the control u(x₁, x₂) guarantees the asymptotic stability of the trivial solution x₁(t) ≡ x₂(t) ≡ 0 of system (5.1.4).
The change of variables (5.1.15) transforms system (5.1.4) into the following equations for the slowly changing amplitude and phase (equations in the normal form [2, 19, 136]):

Ȧ = εG(A, Φ, u),  φ̇ = εH(A, Φ, u),  (5.1.16)

where

G(A, Φ, u) = χ_A(A, Φ) − ũ_s(A, Φ),  H(A, Φ, u) = (1/A)[χ_φ(A, Φ) − ũ_c(A, Φ)],
χ_A(A, Φ) = −χ(A cos Φ, −A sin Φ) A sin²Φ,  χ_φ(A, Φ) = −χ(A cos Φ, −A sin Φ) A sin Φ cos Φ,  (5.1.17)
ũ_s(A, Φ) = u(A, Φ) sin Φ,  ũ_c(A, Φ) = u(A, Φ) cos Φ.

Since the optimal control is of relay type (5.1.9), (5.1.13) and antisymmetric (5.1.11), the control function u(A, Φ) is sought in the form

u(A, Φ) = u_m sign[sin(Φ − φ_r(A))].  (5.1.18)
Note that, in view of the change of variables (5.1.15), controls of the form (5.1.18) are already of relay type and antisymmetric on the phase plane (x₁, x₂). The function φ_r(A) specifying the switching line is not known in advance; the desired function φ_r*(A) is calculated by using the method of successive approximations presented in Section 5.1.4. It is well known [2, 19, 33] that for a sufficiently small parameter ε, instead of Eqs. (5.1.16), one can use some other auxiliary equations, which are constructed according to certain rules and are called truncated equations. These equations allow one to obtain approximate solutions of the original equations in a rather simple way (the accuracy is the higher, the smaller is the parameter ε).⁵ In the simplest case, the truncated equations
Ȧ = εḠ(A),  φ̇ = εH̄(A),  (5.1.19)
Here we do not justify the approximating properties of the solutions constructed with the help of truncated equations. A detailed discussion of these problems can be found in numerous textbooks and monographs devoted to the theory of nonlinear oscillations (e.g., see [2, 19, 33, 136]).
are obtained from (5.1.16) by neglecting the vibrational terms in the expressions for G(A, Φ, u) and H(A, Φ, u) or, which is the same, by averaging the right-hand sides of Eqs. (5.1.16) over the "fast phase" Φ while the amplitude A is fixed,⁶ namely,

Ḡ(A) = (1/2π) ∫₀²π G(A, Φ, u(A, Φ)) dΦ,  H̄(A) = (1/2π) ∫₀²π H(A, Φ, u(A, Φ)) dΦ.  (5.1.20)
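The averages (5.1.20) are easy to evaluate numerically. The sketch below (an illustration, not from the book) assumes the polar convention (5.1.15), the van der Pol nonlinearity χ(x₁, x₂) = x₁² − 1 of Example 2 below, and u ≡ 0; in this case the average Ḡ(A) should equal (A/2)(1 − A²/4), the averaged amplitude growth rate of the van der Pol oscillator:

```python
import math

# Numerical averaging over the fast phase (assumptions: x1 = A cos(Phi),
# x2 = -A sin(Phi) as in (5.1.15); chi(x1, x2) = x1**2 - 1; u = 0).  Then
#   G(A, Phi, 0) = chi_A(A, Phi) = -chi(A cos Phi, -A sin Phi) * A * sin(Phi)**2
# and the average over Phi equals (A/2) * (1 - A**2/4).
def G_bar(A, n=100000):
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * 2.0 * math.pi / n
        chi = (A * math.cos(phi)) ** 2 - 1.0
        total += -chi * A * math.sin(phi) ** 2
    return total / n

A = 1.0
print(G_bar(A), (A / 2.0) * (1.0 - A * A / 4.0))  # both ~ 0.375
```

Note that Ḡ(2) = 0, which corresponds to the limit cycle of amplitude 2 of the uncontrolled van der Pol oscillator.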
A higher accuracy of approximation to the solution of system (5.1.16) is ensured by the regular asymptotic Krylov-Bogolyubov method [19, 173], in which the vibrational terms on the right-hand sides of Eqs. (5.1.16) are eliminated by the additional change of variables

A = A* + εv(A*, Φ*),  φ = φ* + εw(A*, Φ*),  Φ* = t + φ*,  (5.1.21)
where

v(A*, Φ*) = v₁(A*, Φ*) + εv₂(A*, Φ*) + ε²…,  w(A*, Φ*) = w₁(A*, Φ*) + εw₂(A*, Φ*) + ε²…  (5.1.22)

denote purely vibrational functions such that

(1/2π) ∫₀²π v(A*, Φ*) dΦ* = 0,  (1/2π) ∫₀²π w(A*, Φ*) dΦ* = 0.
By the change of variables (5.1.21), we obtain from (5.1.16) the following equations for the nonvibrational amplitude A* and phase φ*:

Ȧ* = εG*(A*) = εG₁*(A*) + ε²G₂*(A*) + ε³…,  φ̇* = εH*(A*) = εH₁*(A*) + ε²H₂*(A*) + ε³…  (5.1.23)

In this case, the successive terms G₁*, H₁*, G₂*, H₂*, …, v₁, w₁, v₂, w₂, … of the asymptotic series (5.1.23) and (5.1.22) are calculated recurrently by the method of successive approximations.
⁶This method for obtaining truncated equations is often called the method of slowly varying amplitudes or the van der Pol method.
Let us illustrate this method. By using (5.1.21), we can write (5.1.16) in the form

Ȧ* + εv̇(A*, Φ*) = εG(A* + εv(A*, Φ*), Φ* + εw(A*, Φ*), u),
φ̇* + εẇ(A*, Φ*) = εH(A* + εv(A*, Φ*), Φ* + εw(A*, Φ*), u).  (5.1.24)

Substituting (5.1.22) and (5.1.23) into (5.1.24) and retaining only the terms of the order of ε, we obtain the first-approximation relations (5.1.25). Now, by equating the nonvibrational and purely vibrational terms on the left and on the right in (5.1.25), we obtain the following expressions for the first terms of the asymptotic series (5.1.23) and (5.1.22):
G₁*(A*) = ⟨G(A*, Φ*, u(A*, Φ*))⟩,  H₁*(A*) = ⟨H(A*, Φ*, u(A*, Φ*))⟩,  (5.1.26)

v₁(A*, Φ*) = ∫ from Φ₁* to Φ* of [χ_A(A*, Φ′) − χ̄_A] dΦ′ − Ψ̃_s(A*, Φ*),  (5.1.27)

w₁(A*, Φ*) = (1/A*) { ∫ from Φ₁* to Φ* of [χ_φ(A*, Φ′) − χ̄_φ] dΦ′ − Ψ̃_c(A*, Φ*) },  (5.1.28)

where Ψ̃_s and Ψ̃_c denote the purely vibrational primitives of ũ_s − ū_s and ũ_c − ū_c. In (5.1.26)-(5.1.28), as usual, the bar (or the angle brackets ⟨·⟩ for longer expressions) indicates the averaging over the period, that is, (1/2π) ∫₀²π … dΦ*; the lower integration limits Φ₁* can be chosen arbitrarily (see below). To calculate the functions G₂*, H₂*, v₂, w₂ in (5.1.24), we need to retain the expressions of the order of ε². Then (5.1.24) implies the second-approximation relations (5.1.29).
In its turn, each equality in (5.1.29) splits into two separate relations, for the nonvibrational and the vibrational terms contained in (5.1.29), respectively. This allows us to calculate the four functions G₂*(A*), H₂*(A*), v₂(A*, Φ*), and w₂(A*, Φ*). In particular, for the nonvibrational terms, the first equality in (5.1.29) implies the formula (5.1.30) for G₂*(A*).
Using (5.1.17), (5.1.27), and (5.1.28), we can write the right-hand side of (5.1.30) in more detail (formula (5.1.31)),⁷ where the expression (5.1.32), composed of the integrals ∫(χ_A − χ̄_A) dΦ* and ∫(χ_φ − χ̄_φ) dΦ*, collects the control-independent terms. We do not write out the expressions for H₂*(A*), v₂(A*, Φ*), … since we do not need them in the sequel.

⁷For brevity, we omit the arguments (A*, Φ*) of the functions χ_A, χ_φ, ũ_s, Ψ̃_s, and Ψ̃_c in (5.1.31) and (5.1.32).
5.1.3. Auxiliary formulas. The functions G₁*(A*), H₁*(A*), G₂*(A*), H₂*(A*), … that form the asymptotic series in (5.1.23) depend on the choice of the control algorithm u(A, Φ), that is, in view of (5.1.18), on the function φ_r(A). It follows from (5.1.26) and (5.1.31) that we can write this dependence explicitly if we know the average values (5.1.33), among them ū_s, ū_c, and the averages involving ∂ũ_s/∂A and ∂ũ_c/∂Φ. The average values (5.1.33) can readily be calculated by using (5.1.18), the properties of the δ-function, and the fact that the functions ũ_s(A, Φ), ũ_c(A, Φ), Ψ̃_s(A, Φ), and Ψ̃_c(A, Φ) are periodic (with respect to Φ).

1. If, for definiteness, we assume that 0 ≤ φ_r ≤ π/2, then it follows from (5.1.17) and (5.1.18) that

ū_s = (1/2π) ∫₀²π u_m sign[sin(Φ − φ_r(A))] sin Φ dΦ = (2u_m/π) cos φ_r(A).  (5.1.34)

One can readily see that formula (5.1.34) remains valid for any φ_r.
2. In a similar way, we obtain

ū_c = (1/2π) ∫₀²π u_m sign[sin(Φ − φ_r(A))] cos Φ dΦ = −(2u_m/π) sin φ_r(A)  (5.1.35)
and the relation ⟨ũ_sũ_c⟩ = 0 (indeed, ũ_sũ_c = u_m² sin Φ cos Φ, whose average over the period vanishes), which we shall use later.
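The averages (5.1.34) and (5.1.35) can be verified by direct numerical quadrature. The following sketch (illustrative values u_m = 1, φ_r = 0.4) compares the numerically computed means of ũ_s = u sin Φ and ũ_c = u cos Φ for the relay control (5.1.18) with the closed-form answers:

```python
import math

# Numerical check of (5.1.34) and (5.1.35) for the relay control (5.1.18),
# u(Phi) = u_m * sign(sin(Phi - phi_r)), with illustrative u_m and phi_r.
def mean(f, n=100000):
    return sum(f((k + 0.5) * 2.0 * math.pi / n) for k in range(n)) / n

u_m, phi_r = 1.0, 0.4
u = lambda phi: u_m * math.copysign(1.0, math.sin(phi - phi_r))

print(mean(lambda p: u(p) * math.sin(p)), 2.0 * u_m / math.pi * math.cos(phi_r))
print(mean(lambda p: u(p) * math.cos(p)), -2.0 * u_m / math.pi * math.sin(phi_r))
```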
3. Using the formal relation

(d/dx) sign x = 2δ(x)  (5.1.36)
and formula (5.1.18), we can write

∂ũ_s/∂A = −2u_m δ[sin(Φ − φ_r(A))] cos(Φ − φ_r(A)) (dφ_r/dA) sin Φ.  (5.1.37)

Using (5.1.37) and the properties of the δ-function, after the integration and some elementary calculations, we obtain

⟨(∂ũ_s/∂A) sin nΦ⟩ = (1/2π) ∫₀²π (∂ũ_s/∂A) sin nΦ dΦ
 = (u_m/π)(dφ_r/dA)[cos(n + 1)φ_r − cos(n − 1)φ_r] for even n, and 0 for odd n.  (5.1.38)
4. By straightforward integration with regard to (5.1.36) and (5.1.18), we similarly obtain the average ⟨(∂ũ_s/∂Φ) cos nΦ⟩ (formula (5.1.39)); it also vanishes for odd n.
5. Since Ψ̃_s(A, Φ) and ∂ũ_s/∂A are periodic in Φ, the average ⟨(∂ũ_s/∂A)v₁⟩ admits the representation (5.1.40). Next, using (5.1.27) and (5.1.37), we arrive at an expression (5.1.41) for ⟨(∂ũ_s/∂A)v₁⟩ in terms of the function

F(Φ) = ∫ from Φ₁ to Φ of δ[sin(Φ′ − φ_r)] cos(Φ′ − φ_r) sin Φ′ dΦ′.
FIG. 35

It follows from (5.1.40) that the choice of Φ₁ does not affect the value of ⟨(∂ũ_s/∂A)v₁⟩. Hence we set Φ₁ = 0. Furthermore, if we consider 0 < φ_r < π, then the piecewise constant function F(Φ) in (5.1.41) has jumps of value ±sin φ_r at the points φ_r and π + φ_r, as shown in Fig. 35. For this function, one can readily calculate ⟨F⟩ and ⟨ũ_sF⟩, namely, ⟨F⟩ = (sin φ_r)/2 and ⟨ũ_sF⟩ = (u_m/2π) sin 2φ_r.
These relations, (5.1.34), (5.1.40), and (5.1.41) imply an explicit expression for ⟨(∂ũ_s/∂A)v₁⟩ when 0 < φ_r < π. Carrying out similar calculations for −π < φ_r < 0 and comparing the result with the last formula, we finally obtain formula (5.1.42), whose right-hand side can be written as −⟨(ũ_c − ū_c)ũ_s⟩ = ū_cū_s − ⟨ũ_cũ_s⟩.  (5.1.42)
6. Combining (5.1.42) and expressions (5.1.34)-(5.1.36), we obtain formula (5.1.43).

7. The relation (5.1.44) allows us to reduce the calculation of the desired mean value to finding the simpler expression ⟨ũ_c cos nΦ⟩. Using (5.1.17) and (5.1.18) and performing some simple calculations, we obtain formula (5.1.45); the resulting average vanishes for odd n.

8. The value ⟨Ψ̃_s cos nΦ⟩ can readily be obtained by using the obvious relation of the same type as (5.1.44) and formula (5.1.39).

The expressions obtained for the average values (5.1.33) will be used
later for solving the synthesis problem.

5.1.4. Approximate solution of the synthesis problem. Now let us return to the basic problem of minimizing the functional (5.1.14). By choosing the nonvibrational amplitude and phase as the state variables, we rewrite (5.1.14) in the form⁸

I(A*, Φ*) = ∫₀^∞ c*(Aₜ*, Φₜ*) dt,  (5.1.46)

where c*(A*, Φ*) is obtained from the penalty function c(x₁, x₂) by the change of variables (5.1.15), (5.1.21).

⁸The value of the functional (5.1.46) depends both on the initial state A*(0) = A*, Φ*(0) = Φ* of the system and on the control algorithm u(Aₜ*, Φₜ*): 0 ≤ t < ∞. Therefore, for the functional (5.1.46) it is more correct to use the notation I_{u(·)}(A*, Φ*) or I_{φ_r(·)}(A*, Φ*) (which, in view of (5.1.18), is the same). However, for simplicity, we write I(A*, Φ*).

Note that the functional (5.1.46), treated as a function of the initial state (A*, Φ*), is a periodic function in the second variable, namely, I(A*, Φ*) = I(A*, Φ* + 2π). Therefore, taking into account (5.1.21) and the second equation in (5.1.23), we obtain
formula (5.1.47) from (5.1.46). In (5.1.47) the integration over the period is performed along a trajectory of the system, and hence the amplitude Aₜ* is treated as a function of the fast phase Φₜ*. This function A*(Φ*) is determined by the relation (5.1.48) that follows from Eqs. (5.1.23). Note that the amplitude increment ΔA* = A*(Φ* + 2π) − A*(Φ*) over the period is small (ΔA* ~ ε); expanding in powers of ΔA*, we obtain the expansion (5.1.49) for the derivative ∂I(A*, Φ*)/∂A*.
Since ΔA* = 2πεG₁*(A*) in the first approximation with respect to ε, it follows from (5.1.49) that

∂I(A*, Φ*)/∂A* = −c̄*(A*)/(εG₁*(A*)) + O(1),  (5.1.50)

where

c̄*(A*) = (1/2π) ∫₀²π c*(A*, Φ*) dΦ*  (5.1.51)

and the function G₁*(A*) = G₁*(A*, φ_r(A*)) is determined by (5.1.26), (5.1.17), and (5.1.34).
(5.1.17), and (5.1.34). Calculating the right-hand side of (5.1.49) with a higher accuracy (in this case, to calculate the last term in (5.1.49), we need to differentiate (5.1.50)), we obtain
c*(A*) v ;
*
~9c*
____ _ fE—_I { A* $*t)
*
*
'
262
Chapter V
where, just as in (5.1.51), the bar over a letter indicates the averaging over the period with respect to <&£, and the function G^(A*] is determined by (5.1.31). Let us write the functional to be minimized as follows:
I(A*, Φ*) = ∫₀^{A*} (∂I/∂Aₜ*)(Aₜ*, Φₜ*) dAₜ*  (5.1.53)

(note that, by the assumptions of the problem considered, we can set I(0, Φ*) = 0). It follows from (5.1.53) that, to minimize the functional (5.1.46), it suffices to find the minimum of the derivative ∂I(A*, Φ*)/∂A* for an arbitrary current state (A*, Φ*) of the control system. The accuracy of this minimization procedure depends on the number of terms retained in the expansion of the right-hand side of (5.1.49) in powers of ε. Let us perform the corresponding calculations for the first two approximations. According to (5.1.50), to minimize the functional (5.1.46) in the first approximation in ε, it suffices to minimize (in φ_r) the expression

−c̄*(A*)/G₁*(A*, φ_r).  (5.1.54)

Since the penalty function c(x, ẋ) = c(x₁, x₂) is nonnegative, we have c̄*(A*) > 0 for A* ≠ 0. Therefore, to minimize (5.1.54), it suffices to minimize with respect to φ_r the function

G₁*(A*, φ_r) = χ̄_A(A*) − (2u_m/π) cos φ_r(A*).

This fact and (5.1.5) readily imply that the optimal control u₁(A*, Φ*) in the first approximation must have the form

u₁(A*, Φ*) = u_m sign(sin Φ*).  (5.1.55)
Comparing (5.1.55) and (5.1.18), we see that φ_r(A*) ≡ 0 in the first approximation in ε. This means that, in this case, the switching line of the control coincides with the abscissa axis of the phase plane (x₁ = x, x₂ = ẋ). Indeed, if, instead of the amplitude A* and the phase Φ*, we take the coordinate x and the velocity ẋ as the state variables, then it follows from (5.1.15), (5.1.21), and (5.1.55) that, in this approximation, the optimal control of the oscillator (5.1.3) is ensured by the synthesis function of the form

u₁(x, ẋ) = −u_m sign ẋ.  (5.1.56)
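The damping produced by the first-approximation law (5.1.56) can be observed in a direct simulation. The sketch below (an illustration, not from the book; χ ≡ 1, ε = 0.1, u_m = 1, and the initial state are assumed) integrates (5.1.3) with u = −u_m sign ẋ by the Euler method:

```python
import math

# Simulation of the linear oscillator (5.1.3) (chi = 1) under the
# first-approximation relay law (5.1.56), u = -u_m * sign(xdot).
eps, u_m = 0.1, 1.0
x, v = 2.0, 0.0           # x(0) = 2, xdot(0) = 0
dt, T = 0.002, 30.0

for _ in range(int(T / dt)):
    u = -u_m * math.copysign(1.0, v)
    a = -x - eps * v + eps * u     # from (5.1.3): xddot = -x - eps*xdot + eps*u
    x, v = x + v * dt, v + a * dt

print(math.hypot(x, v))   # the amplitude is driven close to zero
```

Starting from the amplitude 2, the state is driven into a small neighborhood of the origin where, as discussed in Remark 5.1.3 below, the quasiharmonic description ceases to apply.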
From the mechanical viewpoint, this result means that, to obtain the optimal damping of oscillations in the oscillator (5.1.3), we must apply the maximum admissible controlling force (torque), and this force (torque) must always be opposite to the velocity (angular velocity) of the motion. It must also be emphasized that the control algorithm in the first approximation is universal, since it depends neither on the nonlinear characteristics of the oscillator (that is, on the function χ(x, ẋ) in (5.1.3)) nor on the form of the penalty function c(x, ẋ) in the optimality criterion (5.1.6). To find the quasioptimal control algorithm in the second approximation, we need to calculate the function φ_r(A*) that minimizes (5.1.52) or, which is the same, the expression

G₁*(A*, φ_r(A*)) + εG₂*(A*, φ_r(A*)).  (5.1.57)
Since (5.1.57) differs from G₁*(A*, φ_r(A*)) by a term of the order of ε, it is natural to assume that the difference between the function φ_r(A*) minimizing (5.1.57) and the first-approximation function φ_r(A*) ≡ 0 is small, that is, that φ_r(A*) ~ ε for the desired function. Having in mind the fact that φ_r(A*) ~ ε and using the average values (5.1.33) calculated in Section 5.1.3, we can estimate the order of the different terms in formula (5.1.31) for the function G₂*(A*, φ_r(A*)). We also note that since the function χ in (5.1.3) is symmetric, that is, χ(x, ẋ) = χ(−x, −ẋ), there are only cosines (sines) of Φ, 2Φ, … in the Fourier series of the function χ_A(A, Φ) (respectively, χ_φ(A, Φ)). Thus, it follows from the results obtained in Section 5.1.3 that, among all the terms in (5.1.31), only two (those containing ū_s and ū_c) are of the order of ε; the other control-dependent terms (that is, those depending on φ_r(A*)) in (5.1.31) are of the order of ε² or ε³. This implies that the function φ_r(A*) minimizing (5.1.57) can be found by minimizing the truncated expression (5.1.58) in which only these terms are retained.
To obtain some special results, we need to define the function χ(x, ẋ) explicitly. Let us consider two examples.

EXAMPLE 1. Suppose that the plant is a linear quasiharmonic oscillator described by Eq. (5.1.3). In this case, χ(x, ẋ) ≡ 1, and χ_A is given by (5.1.17). By using (5.1.34), (5.1.43), and (5.1.45), we obtain the explicit form of the expression (5.1.58); it contains the term sin 2φ_r.
The desired function φ_r(A*) can be found from the stationarity condition (5.1.59) obtained by setting the derivative of (5.1.58) with respect to φ_r equal to zero. Since φ_r is small (φ_r ~ ε), (5.1.59) yields the explicit formula (5.1.60) for φ_r(A*), in which φ_r is proportional to ε/(2u_m). The function φ_r(A*) determines (in the polar coordinates) the switching line equation for the quasioptimal control in the second approximation. The position of this switching line on the phase plane (x, ẋ) is shown in Fig. 36.
FIG. 36
It follows from (5.1.18) and (5.1.60) that in this case the quasioptimal control algorithm (the synthesis function) in the second approximation has the form

u₂(A*, Φ*) = u_m sign[sin(Φ* − φ_r(A*))],  (5.1.61)

with φ_r(A*) given by (5.1.60).
REMARK 5.1.3. It follows from (5.1.60) that the angle φ_r(A*) ceases to be small for small amplitudes A*. The point is that if we use a control of the form (5.1.18), then there always exists a small neighborhood of the origin on the phase plane (x, ẋ), and the quasiharmonic character of the trajectories of the plant (5.1.3) is violated in
this neighborhood. In Fig. 36, this neighborhood is the circle of radius R (R ~ ε).¹⁰ In the interior of this neighborhood, the applicability conditions for the asymptotic (van der Pol, Krylov-Bogolyubov, etc.) methods are violated. Therefore, the quasioptimal control algorithms (5.1.56) and (5.1.61) can be used everywhere except for the interior of this neighborhood. Moreover, it is important to keep in mind that, by using the asymptotic synthesis method discussed in this section, it is in principle impossible to find the optimal control in a small neighborhood of the point (x = 0, ẋ = 0). □
EXAMPLE 2. Now let χ(x, ẋ) = x² − 1. In this case, the plant (5.1.3) is a self-oscillating system (a self-exciting circuit) sometimes called the van der Pol oscillator or the Thomson generator. It follows from (5.1.17) that, in this case, we have

χ_A(A, Φ) = (A/2)[1 − cos 2Φ − (A²/4)(1 − cos 4Φ)].  (5.1.62)
Using formulas (5.1.34), (5.1.43), and (5.1.45) for the function (5.1.58), we obtain its explicit form; it contains the terms sin φ_r and sin 3φ_r with coefficients depending on A*² and u_m. Just as in Example 1, from the condition ∂F/∂φ_r = 0, with regard to the fact that φ_r is small (φ_r ~ ε), we derive the equation (5.1.63) of the switching line and the synthesis function in the second approximation,

u₂(A*, Φ*) = u_m sign[sin(Φ* − φ_r(A*))],  (5.1.64)

where φ_r(A*) is given by (5.1.63).
¹⁰An elementary analysis of the phase trajectories of a linear oscillator subject to the control (5.1.56) shows that the phase trajectories of the system, once entering the circle of radius R = 2εu_m, not only cease to be quasiharmonic, but cease to be oscillatory in character at all.
FIG. 37

The switching line (5.1.63) is shown in Fig. 37.

REMARK 5.1.4. It was pointed out in Remark 5.1.2 that the problem of optimal damping of oscillations in system (5.1.3) on an infinite time interval is well posed if the optimal (quasioptimal) control of the plant (5.1.3) ensures the convergence of the improper integral (5.1.14) (or, which is the same, of the integral (5.1.46)). Let us establish the convergence conditions for these integrals in Example 2. The properties of the penalty function c(x, ẋ) readily imply that the integral (5.1.46) converges if, for a chosen control algorithm and any initial value of the nonvibrational amplitude A*(0), the solution of the first equation in (5.1.23) satisfies A*(t) → 0 as t → ∞ and, moreover, if A*(t) tends to zero not too "slowly." Let us consider the special form of Eq. (5.1.23) in Example 2. We confine ourselves to the first approximation Ȧ* = εG₁*(A*). Since the quasioptimal control in the first approximation has the form (5.1.55), it follows from (5.1.26) and (5.1.62) that the nonvibrational amplitude obeys the equation

Ȧ* = ε[(A*/2)(1 − A*²/4) − 2u_m/π].  (5.1.65)
If u_m > π/(3√3), then for any A* > 0 the function on the right-hand side of (5.1.65) cannot be positive; therefore, A*(t) → 0 as t → ∞ for any solution of (5.1.65). If in this case

u_m ≥ π/(3√3) + δ,   δ > 0,   (5.1.66)
Control of Oscillatory Systems
then the solution A*(t) of Eq. (5.1.65) attains the value A* = 0 on a finite time interval, which guarantees the convergence of the integral (5.1.46). Thus, inequality (5.1.66) is the solvability condition for problem (5.1.3)-(5.1.6) as T → ∞ in the case of Example 2.¹¹ □

In conclusion we note that, in principle, the approximate method considered here can also be used for calculating the quasioptimal control algorithms in the third, fourth, and higher approximations. However, in this case, the number of required calculations increases sharply.
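The behavior described in Remark 5.1.4 can be illustrated numerically on the averaged first-approximation equation (5.1.65). The sketch below is a crude forward-Euler integration; the values of ε and u_m are illustrative assumptions, not taken from the text:

```python
import numpy as np

def dA(A, um, eps=0.1):
    # right-hand side of the averaged amplitude equation (5.1.65):
    # dA*/dt = eps * [ (A*/2)(1 - A*^2/4) - 2*um/pi ]
    return eps * (0.5 * A * (1.0 - A * A / 4.0) - 2.0 * um / np.pi)

def final_amplitude(A0, um, eps=0.1, dt=0.01, T=2000.0):
    # forward-Euler integration; stops as soon as the amplitude reaches zero
    A, t = A0, 0.0
    while t < T and A > 0.0:
        A += dt * dA(A, um, eps)
        t += dt
    return max(A, 0.0)

U_CRIT = np.pi / (3.0 * np.sqrt(3.0))   # the threshold u_m = pi/(3*sqrt(3)) in (5.1.66)
```

For u_m above the threshold the amplitude reaches zero in finite time, in agreement with the remark; for u_m = 0 the trajectory stays on the van der Pol limit cycle A* = 2.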
§5.2. Control of the "predator-prey" system. The case of a poorly adapted predator
In this section, by using the asymptotic synthesis method considered in §5.1, we solve the optimal control problem for a biological system consisting of two different populations, interpreted as "predators" and "prey," coexisting in the same habitat (e.g., see §2.3 and [133, 186, 187]). This system is described mathematically by the standard Lotka-Volterra model, in which the behavior of the isolated system obeys the following system of equations (see (2.3.5)):
dx/dt = (a₁ − a₂y)x,   x(0) = x₀ > 0,
dy/dt = (b₁x − b₂)y,   y(0) = y₀ > 0,   (5.2.1)
x(t), y(t) > 0,   t > 0.
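System (5.2.1) possesses the first integral V(x, y) = b₁x − b₂ ln x + a₂y − a₁ ln y, which is constant along every trajectory; this is what makes the uncontrolled motion periodic. A quick numerical check (RK4; the rate constants are illustrative):

```python
import numpy as np

A1, A2, B1, B2 = 0.5, 0.5, 0.5, 0.5   # illustrative rates a1, a2, b1, b2

def f(z):
    x, y = z
    # Lotka-Volterra system (5.2.1)
    return np.array([(A1 - A2 * y) * x, (B1 * x - B2) * y])

def rk4_step(z, h):
    k1 = f(z); k2 = f(z + 0.5 * h * k1); k3 = f(z + 0.5 * h * k2); k4 = f(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def V(z):
    x, y = z
    # first integral: dV/dt = 0 along solutions of (5.2.1)
    return B1 * x - B2 * np.log(x) + A2 * y - A1 * np.log(y)

z = np.array([2.0, 0.5])
v0 = V(z)
for _ in range(20000):            # integrate to t = 20
    z = rk4_step(z, 1e-3)
drift = abs(V(z) - v0)
```

The drift of V stays at the integrator's accuracy level, and V attains its minimum at the equilibrium (b₂/b₁, a₁/a₂).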
Recall that x = x(t) and y = y(t) are the respective population sizes¹² of prey and predators at time t, and the positive constants a₁, a₂, b₁, and b₂ have the following meaning: a₁ is the rate of growth of the number of prey, a₂ is the rate of prey consumption by predators, b₁ is the rate at which the prey biomass is processed into new predator biomass, and b₂ is the rate of natural death of predators.

In this section we consider a special case of system (5.2.1) in which the predators die at a high natural rate and are "poor" predators, since they consume their prey at a low rate. In the nomenclature of [177], this problem corresponds to the case of predators poorly adapted to the habitat. For system (5.2.1), this means that we can take the ratio a₂b₁/b₂ = ε as a small parameter.

¹¹ ...tained in the asymptotic series on the right-hand side of Eq. (5.1.23) for the nonvibrational amplitude.
¹² If the distribution of species over the habitat is uniform, then x and y denote the densities of the corresponding populations, that is, the numbers of species per unit area (volume) of the habitat.
5.2.1. Statement of the problem. We assume that system (5.2.1) is controlled by eliminating prey specimens from the population (by shooting, catching, using herbicides, etc.). Then, instead of (5.2.1), we have the system (see (2.3.12))

dx/dt = (a₁ − a₂y)x − ux,   x(0) = x₀,
dy/dt = (b₁x − b₂)y,   y(0) = y₀;   (5.2.2)

here the control u = u(t) satisfies the constraints

0 ≤ u ≤ γ,   (5.2.3)
where γ is a given positive number. We consider the control problem for system (5.2.2) on an infinite time interval; the goal of control is to take the system from any initial state x₀, y₀ > 0 to the equilibrium state x* = b₂/b₁, y* = a₁/a₂ of system (5.2.1). For the optimality criterion we use the functional

I[u] = ∫₀^∞ [ c₁ (x(t) − b₂/b₁)² + c₂ (y(t) − a₁/a₂)² ] dt,   (5.2.4)
where c₁ and c₂ are given positive constants. We assume that the integral (5.2.4) is convergent.

In (5.2.2) we change the variables as follows:

x̄ = (b₁x − b₂)/(εb₂),   ȳ = (a₁ − a₂y)/(εωb₂),   t̄ = ωb₂ t,   ω = √(a₁/b₂).   (5.2.5)

This allows us to rewrite system (5.2.2) in the form

dx̄/dt̄ = ȳ + εx̄ȳ − (u/(εωb₂))(1 + εx̄),
dȳ/dt̄ = −x̄ + εx̄ȳ,   ε = a₂b₁/b₂.   (5.2.6)

In this case, the functional (5.2.4) to be minimized acquires the form

I[u] = (1/(ωb₂)) ∫₀^∞ ( c₁a₂² x̄²(t̄) + c₂b₁²ω² ȳ²(t̄) ) dt̄.   (5.2.7)
In the new variables (x̄, ȳ), the goal of control is to transfer the system to the origin (x̄ = ȳ = 0), and the region of admissible states is bounded by the
quadrant x̄ > −1/ε, ȳ < ω/ε (since the initial variables are nonnegative, x, y > 0). We assume that the admissible control is bounded by a small value. To this end, we set γ = ε²γ̄ in (5.2.3). Then, changing the scale of the controlling function, u = ε²ū, we can write system (5.2.6) and the constraint (5.2.3) as

dx̄/dt̄ = ȳ + εx̄ȳ − (εū/(ωb₂))(1 + εx̄),   dȳ/dt̄ = −x̄ + εx̄ȳ,   (5.2.8)

0 ≤ ū ≤ γ̄.   (5.2.9)
Thus the desired optimal control ū* can be found from the condition that the functional (5.2.7) attains its minimum value on the trajectories of system (5.2.8) with constraint (5.2.9) imposed on the control actions. In this case, we seek the control in the synthesis form ū* = ū*(x̄(t̄), ȳ(t̄)).

5.2.2. Approximate solution of problem (5.2.7)-(5.2.9). In the case of "poorly adapted" predators, the number ε in (5.2.8) is small, and system (5.2.8) is a special case of the controlled quasiharmonic oscillator (5.1.1). Therefore, the method of §5.1 can immediately be used for solving problem (5.2.7)-(5.2.9). The single distinction is that the admissible controls are subject to the nonsymmetric constraints (5.2.9); thus the antisymmetry property (5.1.11) of the optimal control is violated. As a result, it is impossible to write the desired controls in the form (5.1.18). However, as shown later, this fact causes no special difficulties in calculating the quasioptimal controls in problem (5.2.7)-(5.2.9).

On the whole, the scheme for solving problem (5.2.7)-(5.2.9) repeats the approximate synthesis procedure described in §5.1. Therefore, in what follows, the main attention is paid to the distinctions in expressions and formulas caused by the special nature of problem (5.2.7)-(5.2.9). Just as in §5.1, by changing variables according to formulas (5.1.15),¹³ we transform system (5.2.8) to the equations (5.1.16) for the slowly changing amplitude and phase.
Now, instead of (5.1.17), we have the following expressions for the functions G(A, Φ) and H(A, Φ):

G(A, Φ) = g(A, Φ) − u_c(A, Φ) − ε u'_c(A, Φ),
H(A, Φ) = h(A, Φ) − u_s(A, Φ) − ε u'_s(A, Φ),

¹³ With the obvious change of notation: x₁ = x̄ and x₂ = ȳ.
where

g(A, Φ) = A² sin Φ cos Φ (sin Φ − cos Φ),   h(A, Φ) = A sin Φ cos Φ (sin Φ + cos Φ),
u_c(A, Φ) = (ū(A, Φ)/(ωb₂)) cos Φ,   u_s(A, Φ) = (ū(A, Φ)/(Aωb₂)) sin Φ,   (5.2.10)
u'_c(A, Φ) = (A ū(A, Φ)/(ωb₂)) cos² Φ,   u'_s(A, Φ) = −(ū(A, Φ)/(ωb₂)) sin Φ cos Φ.
The passage to Eqs. (5.1.23) for the nonvibrational amplitude A* and phase φ* is performed, as above, by using formulas (5.1.21)-(5.1.24). The terms G₁, H₁, G₂, ... of the asymptotic series in (5.1.23) are calculated from (5.1.24) and (5.2.10) by the method of successive approximations. In particular, in the first approximation, instead of (5.1.26)-(5.1.28), we have

G₁(A*) = −ū_c(A*, Φ*),   H₁(A*) = −ū_s(A*, Φ*),   (5.2.11)

where the bar denotes averaging over the period of the fast phase, together with the integrals

∫₀^{Φ*} [ g(A*, Φ) − u_c(A*, Φ) + ū_c ] dΦ,   (5.2.12)

∫₀^{Φ*} [ h(A*, Φ) − u_s(A*, Φ) + ū_s ] dΦ.   (5.2.13)

In (5.2.11)-(5.2.13) we took into account the fact that, in view of (5.2.10), we have

ḡ = ḡ(A*, Φ*) = (1/2π) ∫₀^{2π} g(A*, Φ) dΦ = 0,   h̄ = 0.
For the second term of the asymptotic series on the right-hand side of Eq. (5.1.23), instead of (5.1.31), we obtain a similar expression (5.2.14) for Ḡ₂(A*).
By §5.1, the quasioptimal controls ū₁(A*, Φ*), ū₂(A*, Φ*), ... are found from the condition that the derivative ∂I(A*, Φ*)/∂A* attains its minimum. In view of (5.1.50) and (5.1.52), this condition is equivalent to the condition that Ḡ₁(A*) attains its minimum (in the first approximation) or
the sum Ḡ₁(A*) + εḠ₂(A*) attains its minimum (in the second approximation). It follows from (5.2.9), (5.2.10), and (5.2.11) that minimization of Ḡ₁(A*) means maximization of

ū_c = ū_c(A*, Φ*) = (1/2π) ∫₀^{2π} ū(A*, Φ) cos Φ dΦ → max.   (5.2.15)

This fact immediately implies the following explicit formula for the quasioptimal control in the first approximation:

ū₁(A*, Φ*) = (γ̄/2)(sign cos Φ* + 1).   (5.2.16)
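That ū_c in (5.2.15) is indeed maximized by the two-position law (5.2.16) is easy to check numerically against other admissible controls (the value of γ̄ is an illustrative assumption):

```python
import numpy as np

GBAR = 0.5                                            # illustrative gamma-bar
phi = (np.arange(100000) + 0.5) * 2.0 * np.pi / 100000

def u_bar_c(u):
    # u_bar_c = (1/(2*pi)) * int_0^{2pi} u(Phi) cos(Phi) dPhi, cf. (5.2.15)
    return (u * np.cos(phi)).mean()

u_opt = 0.5 * GBAR * (np.sign(np.cos(phi)) + 1.0)     # the bang-bang control (5.2.16)
u_const = np.full_like(phi, 0.5 * GBAR)               # two other admissible controls
u_shift = 0.5 * GBAR * (np.sign(np.sin(phi)) + 1.0)
```

The control (5.2.16) yields ū_c = γ̄/π, while the constant and quarter-period-shifted controls average to zero against cos Φ.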
Taking into account formulas (5.1.15) and (5.1.21) for the change of variables, we can write x̄ = A* cos Φ* with accuracy up to terms of order ε. This fact and (5.2.16) readily imply the following expression for the synthesis control in the first approximation in terms of the variables (x̄, ȳ):

ū₁(x̄, ȳ) = (γ̄/2)(sign x̄ + 1).   (5.2.17)
Thus, in the course of the control process, the controlling action assumes only the boundary values of the admissible range (5.2.9) and is switched from the state ū₁ = 0 to the state ū₁ = γ̄ (or conversely) each time the representative point (x̄, ȳ) crosses the ȳ-axis (the switching line in the first approximation). We also point out that, according to (5.2.5), in the variables (x, y) corresponding to the original statement of problem (5.2.2)-(5.2.4), this control algorithm leads to a switching line that is the vertical line passing through the point x = x* = b₂/b₁ on the abscissa axis; this point determines the number of prey when system (5.2.1) is in equilibrium.

To find the optimal control in the second approximation, we need to minimize the expression Ḡ₁(A*) + εḠ₂(A*) ≡ F(A*, ū). The functions
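A minimal simulation of this first-approximation law in the original variables (x, y), under assumed illustrative parameter values and an assumed initial point: harvesting at the full rate γ whenever x > x* = b₂/b₁ should reduce the accumulated penalty compared with the uncontrolled system.

```python
import numpy as np

A1, A2, B1, B2, GAMMA = 0.5, 0.5, 0.5, 0.5, 0.125   # illustrative parameters
XSTAR, YSTAR = B2 / B1, A1 / A2                     # equilibrium of (5.2.1)

def step(z, h, controlled):
    x, y = z
    # first-approximation switching law (5.2.17): u = GAMMA for x > x*, else u = 0
    u = GAMMA if (controlled and x > XSTAR) else 0.0
    return z + h * np.array([(A1 - A2 * y) * x - u * x, (B1 * x - B2) * y])

def penalty(controlled, T=200.0, h=1e-3, z0=(2.0, 0.5)):
    # accumulates the integrand of (5.2.4) with c1 = c2 = 1 (forward-Euler sketch)
    z, J = np.array(z0, float), 0.0
    for _ in range(int(T / h)):
        z = step(z, h, controlled)
        J += h * ((z[0] - XSTAR) ** 2 + (z[1] - YSTAR) ** 2)
    return J
```

Here `penalty(True)` comes out well below `penalty(False)`: even the crude bang-bang rule damps the population oscillations.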
Ḡ₁(A*) = Ḡ₁(A*, ū) and Ḡ₂(A*) = Ḡ₂(A*, ū) are calculated by formulas (5.2.11) and (5.2.14) with regard to (5.2.10), (5.2.12), and (5.2.13). In actual calculations by these formulas, it is convenient to use the fact that the difference between the optimal control ū₂(A*, Φ*) in the second approximation and (5.2.16) must be small. More precisely, we can assume that on a one-period interval of the fast phase Φ*, the optimal control in the second approximation has the form of the function shown in Fig. 38 (the solid lines), where Δ₁ and Δ₂ are the phase shifts of the switch times with respect to the switch times of the first-approximation control (the dashed lines); these quantities are small (Δ₁, Δ₂ ~ ε).

This fact allows us, without loss of generality, to seek the control algorithm ū₂(A*, Φ*) in the second approximation immediately in the form

ū₂(A*, Φ*) = (γ̄/2) { sign[ cos(Φ* − φ₁) − sin φ₂ ] + 1 }.   (5.2.18)
FIG. 38

Here φ₁ and φ₂ are related to Δ₁ and Δ₂ as

φ₁ = (Δ₁ + Δ₂)/2,   φ₂ = (Δ₂ − Δ₁)/2,   (5.2.19)
and hence are also of the order of ε. Writing the desired control in the second approximation in the form (5.2.18) has at least two advantages. First, in this case we can minimize F(A*, ū) = Ḡ₁(A*) + εḠ₂(A*) by finding the minimum of the known function F(A*, φ₁, φ₂) of the two numerical parameters φ₁ and φ₂. Second, we can calculate Ḡ₁ and Ḡ₂ by formulas (5.2.11) and (5.2.14) using the fact that φ₁ and φ₂ are small (φ₁, φ₂ ~ ε). From (5.2.10), (5.2.11), and (5.2.18), we obtain

Ḡ₁(A*) = −(1/(2πωb₂)) ∫₀^{2π} ū₂(A*, Φ) cos Φ dΦ = −(γ̄/(πωb₂)) cos φ₁ cos φ₂.   (5.2.20)

Since φ₁, φ₂ ~ ε, it follows from (5.2.20) that the largest terms depending on φ₁ and φ₂ in the expansion of (5.2.20) in powers of ε are of the order of ε². Therefore, to calculate the second term εḠ₂ of the function F(A*, φ₁, φ₂) with the same accuracy, it suffices to use the first-approximation control (5.2.16) instead of (5.2.18).
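The average in (5.2.20) can be checked by direct quadrature of the two-position control (5.2.18); the parameter values below are illustrative assumptions:

```python
import numpy as np

GBAR, OMEGA, B2 = 0.5, 1.0, 0.5   # illustrative gamma-bar, omega, b2

def G1_quadrature(p1, p2, n=200000):
    # midpoint rule for -(1/(2*pi*omega*b2)) * int_0^{2pi} u2(Phi) cos(Phi) dPhi
    phi = (np.arange(n) + 0.5) * 2.0 * np.pi / n
    u2 = np.where(np.cos(phi - p1) > np.sin(p2), GBAR, 0.0)   # control (5.2.18)
    return -(u2 * np.cos(phi)).sum() * (2.0 * np.pi / n) / (2.0 * np.pi * OMEGA * B2)

def G1_closed(p1, p2):
    # closed form (5.2.20)
    return -GBAR * np.cos(p1) * np.cos(p2) / (np.pi * OMEGA * B2)
```

Agreement to quadrature accuracy for small φ₁, φ₂ confirms the cos φ₁ cos φ₂ structure that makes the φ-dependence of Ḡ₁ second order.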
FIG. 39

With regard to this remark, we calculate the mean values on the right-hand side of (5.2.14) and thus see¹⁴ that, to obtain the optimal values of φ₁ and φ₂ in the second approximation, we need to minimize the function F(A*, φ₁, φ₂) given by (5.2.21). From the extremum conditions ∂F/∂φ₁ = ∂F/∂φ₂ = 0, we obtain the desired optimal values

φ₁ = φ₁(A*),   φ₂ = φ₂(A*).   (5.2.22)
Expressions (5.2.22) determine (in polar coordinates) the switching line for the optimal control in the second approximation. The form of this line on the phase plane (x̄, ȳ) is shown in Fig. 39. The neighborhood of the origin in the interior of the circle of radius R = 2εγ̄/(ωb₂) is the region where the quasiharmonic character of the phase trajectories is violated. Generally speaking, the results obtained here are not reliable in that region, and we need to use some other methods for constructing the switching line there.

5.2.3. Comparative analysis of different control algorithms. It
is of interest to compare the results obtained in the preceding subsection

¹⁴ Here we omit the cumbersome elementary transformations leading to (5.2.21). To obtain (5.2.21), one needs to use formulas (5.2.10), (5.2.12), (5.2.13), and (5.2.18) and the technique used in Section 5.1.3 for calculating average values.
with the solutions of similar synthesis problems obtained by other methods. To this end, we can use the results discussed in §7.2 (see also [105]), where we present a numerical method for solving the synthesis problem for the "normalized" predator-prey system controlled on a finite time interval. In §7.2 we consider the optimal control problem in which the plant equations, the constraints on admissible controls, and the optimality criterion have the form

dx̃/dτ = x̃(1 − ỹ) − ũx̃,   x̃(0) = x̃₀ > 0,
dỹ/dτ = b̃ỹ(x̃ − 1),   ỹ(0) = ỹ₀ > 0,   (5.2.23)

0 ≤ ũ(τ) ≤ γ̃,   0 ≤ τ ≤ T,   (5.2.24)

I[ũ] = ∫₀^T [ (1 − x̃(τ))² + (1 − ỹ(τ))² ] dτ → min over 0 ≤ ũ(τ) ≤ γ̃.   (5.2.25)

In this case, in §7.2 we derive the optimal control ũ*(τ, x̃, ỹ) in the synthesis form by solving numerically the Bellman equation corresponding to problem (5.2.23)-(5.2.25).
Note that problem (5.2.23)-(5.2.25) turns into problem (5.2.2)-(5.2.4) if

τ = a₁t,   x̃ = (b₁/b₂)x,   ỹ = (a₂/a₁)y,   b̃ = b₂/a₁,   γ̃ = γ/a₁,
c₁ = a₁b₁²/b₂²,   c₂ = a₂²/a₁,   T → ∞.   (5.2.26)
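The correspondence (5.2.26) can be verified numerically: an Euler step of length h for system (5.2.2) maps exactly onto an Euler step of length a₁h for the normalized system (5.2.23) under x̃ = (b₁/b₂)x, ỹ = (a₂/a₁)y. A sketch with illustrative rates and a constant admissible control:

```python
import numpy as np

A1, A2, B1, B2 = 0.5, 0.4, 0.3, 0.6   # illustrative rates
U0 = 0.1                              # a constant control, 0 <= U0 <= gamma
BT, UT = B2 / A1, U0 / A1             # b-tilde and u-tilde from (5.2.26)

def step_orig(z, h):
    x, y = z
    # controlled system (5.2.2) with u = U0
    return z + h * np.array([(A1 - A2 * y) * x - U0 * x, (B1 * x - B2) * y])

def step_norm(z, h):
    x, y = z
    # normalized system (5.2.23) with u-tilde = UT
    return z + h * np.array([x * (1 - y) - UT * x, BT * y * (x - 1)])

z = np.array([2.5, 0.8])
zn = np.array([B1 / B2 * z[0], A2 / A1 * z[1]])   # image of z under (5.2.26)
h, n = 1e-4, 50000
for _ in range(n):
    z = step_orig(z, h)
    zn = step_norm(zn, A1 * h)    # tau = a1 * t, so the normalized step is a1*h
err = np.hypot(zn[0] - B1 / B2 * z[0], zn[1] - A2 / A1 * z[1])
```

The discrepancy `err` stays at rounding-error level: the two discrete flows are images of each other under (5.2.26).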
We also note that, in view of the changes of variables (5.2.5) and (5.2.26), the quasioptimal control algorithm in the first approximation (5.2.17) acquires the form

ũ₁(x̃, ỹ) = (γ̃/2)[ sign(x̃ − 1) + 1 ].   (5.2.27)
To estimate the effectiveness of algorithm (5.2.27), we performed a numerical simulation of the normalized system (5.2.23). Namely, we constructed a numerical solution of (5.2.23) on the fixed time interval 0 ≤ τ ≤ T = 15 for three different control algorithms: (1) the optimal control ũ = ũ*(τ, x̃, ỹ); (2) the optimal stationary control ũ = ũ_st(x̃, ỹ), corresponding to the case where the terminal time T → ∞ in problem (5.2.23)-(5.2.25); (3) the quasioptimal control in the first approximation (5.2.27).
FIG. 40

For these three control algorithms, the transient processes in system (5.2.23) are shown as functions of time in Fig. 40 and as phase trajectories in Fig. 41. The following parameters of problem (5.2.2)-(5.2.4) were used for the simulation: a₁ = a₂ = b₁ = b₂ = 0.5, γ = 0.125, ε = 0.5, ω = 1, γ̄ = 0.5, c₁ = c₂ = 1 (in problem (5.2.23)-(5.2.25), to these values there correspond γ̃ = 0.25 and b̃ = 1).
FIG. 41

Comparing the curves in Figs. 40 and 41, we see that these three algorithms lead to close transient processes in the control system. Hence,
the second and the third algorithms provide a sufficiently "good" control. This is also confirmed by calculating the quality functional (5.2.25) for the three algorithms; namely, we obtain I[ũ*(τ, x̃, ỹ)] = 4.812, I[ũ_st(x̃, ỹ)] = 4.827, and I[ũ₁(x̃, ỹ)] = 4.901. Thus, any of these algorithms can be used with approximately the same result. Obviously, the simplest practical realization is provided by the first-approximation algorithm (5.2.27) obtained here; moreover, this algorithm corresponds to reasonable intuitive heuristic considerations of how to control the system. Indeed, according to (5.2.27), it is necessary to start catching (shooting, etc.) every time the prey population size becomes larger than the equilibrium size (for the normalized dimensionless system (5.2.23), this equilibrium size is equal to 1). Conversely, as soon as the prey population size becomes smaller than the equilibrium size, any external action on the system must be stopped.

It should be noted that the situation in which the first approximation already yields a control algorithm close to the optimal one is rather typical not only of this special case but also of other cases where small parameter methods are used for the approximate solution of synthesis problems for control systems. This fact is often (and not without success) used in practice for solving special problems [2, 33]. However, this fact is not universal. There are cases where the first-approximation control leads to a considerable increase in the value of
the functional to be minimized with respect to its optimal value. At the same time, the higher-order approximations allow one to obtain control algorithms close to the optimal control. Some examples of such situations (however, related to control problems of a different nature) are examined in §6.1 and in [97, 98].

§5.3. Optimal damping of random oscillations
In this section we consider the optimal control problem for a quasiharmonic oscillator, which is a stochastic generalization of the problem studied in §5.1. Therefore, many ideas and calculation formulas from §5.1 are widely used in the sequel. However, it should be pointed out that the foundations underlying the approximate synthesis methods in these two sections are completely different. In §5.1 the quasioptimal controls are obtained by straightforward calculation and minimization of the cost functional, while in the present section the approximate synthesis is based on an approximate method for solving the Bellman equation corresponding to the problem in question.

5.3.1. Statement of the problem. Preliminary notes. Here we consider a stochastic version of problem (5.1.3)-(5.1.6) as the initial synthesis problem. We assume that the quasiharmonic oscillator (5.1.3) is subject to small controls εu = εu(t) and, in addition, to random perturbations of small intensity:

ẍ + εχ(x, ẋ)ẋ + x = εu + √(εB) ξ(t),   (5.3.1)
where ξ(t) denotes the standard scalar white noise (1.1.31) and B > 0 is a given number. The admissible controls u = u(t), just as in (5.1.5), are subject to the constraints

|u(t)| ≤ u_m,   (5.3.2)

and the goal of control is to minimize the mean value of the functional

E I[u] = E [ ∫₀^T c(x(t), ẋ(t)) dt ] → min over |u(t)| ≤ u_m.   (5.3.3)
The nonlinear functions χ(x, ẋ) and c(x, ẋ) in (5.3.1) and (5.3.3), just as in §5.1, are assumed to be centrally symmetric, χ(x, ẋ) = χ(−x, −ẋ) and c(x, ẋ) = c(−x, −ẋ). Next, it is assumed that the penalty function c(x, ẋ) is nonnegative, possesses a single minimum at the point (x = 0, ẋ = 0), and c(0, 0) = 0.

Let us introduce the coordinates x₁ = x, x₂ = ẋ and rewrite (5.3.1) as

ẋ₁ = x₂,   ẋ₂ = −x₁ − εχ(x₁, x₂)x₂ + εu + √(εB) ξ(t).   (5.3.4)
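For intuition, (5.3.4) can be simulated directly. In the sketch below, the parameter values, the van der Pol nonlinearity χ = x₁² − 1, and the relay damping control u = −u_m sign x₂ (the first-approximation law of §5.1) are illustrative assumptions:

```python
import numpy as np

EPS, B = 0.1, 0.2   # illustrative eps and noise intensity

def mean_square_amplitude(um, T=400.0, dt=0.01, seed=0):
    # Euler-Maruyama for (5.3.4) with chi(x1, x2) = x1**2 - 1 (van der Pol)
    # and the relay control u = -um * sign(x2) (dry-friction damping)
    rng = np.random.default_rng(seed)
    x1, x2, acc = 2.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        u = -um * np.sign(x2)
        dW = rng.normal(0.0, np.sqrt(dt))
        x1, x2 = (x1 + dt * x2,
                  x2 + dt * (-x1 - EPS * (x1 * x1 - 1.0) * x2 + EPS * u)
                      + np.sqrt(EPS * B) * dW)
        acc += dt * (x1 * x1 + x2 * x2)
    return acc / T
```

With u_m above the deterministic threshold π/(3√3), the time-averaged squared amplitude drops far below the uncontrolled limit-cycle level x₁² + x₂² ≈ 4.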
Then, using the standard procedure from §1.4, for the function of minimum future losses

F(t, x₁, x₂) = min over |u(τ)| ≤ u_m, t ≤ τ ≤ T, of E [ ∫_t^T c(x₁(τ), x₂(τ)) dτ | x₁(t) = x₁, x₂(t) = x₂ ],   (5.3.5)

we obtain the Bellman differential equation

∂F/∂t + x₂ ∂F/∂x₁ − [ x₁ + εχ(x₁, x₂)x₂ ] ∂F/∂x₂ + min over |u| ≤ u_m of [ εu ∂F/∂x₂ ] + (εB/2) ∂²F/∂x₂² + c(x₁, x₂) = 0   (5.3.6)

corresponding to problem (5.3.1)-(5.3.3).
It follows from (5.3.6) that the desired optimal control u*(t, x₁, x₂) can be written in the form

u*(t, x₁, x₂) = −u_m sign (∂F/∂x₂)(t, x₁, x₂),   (5.3.7)

where the loss function F(t, x₁, x₂) satisfies the following semilinear equation of parabolic type:

∂F/∂t + x₂ ∂F/∂x₁ − [ x₁ + εχ(x₁, x₂)x₂ ] ∂F/∂x₂ − εu_m |∂F/∂x₂| + (εB/2) ∂²F/∂x₂² + c(x₁, x₂) = 0,   F(T, x₁, x₂) = 0.   (5.3.8)
Equation (5.3.8) and the symmetry of the functions χ(x₁, x₂) and c(x₁, x₂) imply that the solution F = F(t, x₁, x₂) of (5.3.8) is symmetric with respect to the phase coordinates, that is, F(t, x₁, x₂) = F(t, −x₁, −x₂). This and formula (5.3.7) show that the optimal control (5.3.7) possesses an important property, which will be used in what follows; namely, the optimal control (5.3.7) is antisymmetric (see (5.1.11)):

u*(t, x₁, x₂) = −u*(t, −x₁, −x₂).   (5.3.9)
We also stress that in this section the main attention is paid to solving the stationary version of problem (5.3.1)-(5.3.3), that is, to solving the control problem in which the terminal time T → ∞. In the nomenclature of [1], problem (5.3.1)-(5.3.3) as T → ∞ is called the problem of optimal stabilization of the oscillator (5.3.1).

5.3.2. Passage to the polar coordinates. The Bellman differential and functional equations. By using the change of variables (5.1.15), we transform Eqs. (5.3.4) to equations for the slowly changing amplitude A and phase φ:

Ȧ = εG(A, Φ, u, t),   φ̇ = εH(A, Φ, u, t),   Φ = t + φ,   (5.3.10)

where

G(A, Φ, u, t) = G(A, Φ, u) − (1/√ε) ξ_s(t),   H(A, Φ, u, t) = H(A, Φ, u) − (1/(A√ε)) ξ_c(t),
ξ_s(t) = B^{1/2} ξ(t) sin Φ,   ξ_c(t) = B^{1/2} ξ(t) cos Φ,   (5.3.11)

and the functions G(A, Φ, u) and H(A, Φ, u) are given by (5.1.17).
Note that the right-hand sides of the differential equations (5.3.10) for the amplitude and phase contain a random function ξ(t), which is a white noise. Therefore, Eqs. (5.3.10) are stochastic equations. The expressions (5.3.11) for G and H are derived from (5.3.4) and (5.1.15) by changing the variables according to the usual rules valid for smooth functions ξ(t). Thus it follows from §1.2 that the stochastic equations (5.3.4) and (5.3.10) are equivalent if they are symmetrized.¹⁵

We also note that, having passed to the polar coordinates (which become the arguments of the loss function (5.3.5)), we can equally use either the pair (A, φ) or the pair (A, Φ).
For the loss function F(t, A, Φ), defined by analogy with (5.3.5) as

F(t, A, Φ) = min over |u(τ)| ≤ u_m, t ≤ τ ≤ T, of E [ ∫_t^T c₁(A(τ), Φ(τ)) dτ | A(t) = A, Φ(t) = Φ ],
c₁(A, Φ) = c(A cos Φ, −A sin Φ),   (5.3.12)
we can write the basic functional equation of the dynamic programming approach (see (1.4.6)) as

F(t, A_t, Φ_t) = min over |u(τ)| ≤ u_m, t ≤ τ ≤ t + Δ, of E [ ∫_t^{t+Δ} c₁(A_τ, Φ_τ) dτ + F(t + Δ, A_{t+Δ}, Φ_{t+Δ}) ].   (5.3.13)

This equation expresses the "optimality principle." It is important to stress that relation (5.3.13) holds for any time interval Δ (not necessarily small). This fact is important in what follows. But if Δ → 0 in (5.3.13), then, using (5.3.10) and (5.3.11), we can readily obtain (see §1.4) the following Bellman differential equation for the function (5.3.12):
∂F/∂t + ∂F/∂Φ + ε min over |u| ≤ u_m of [ G(A, Φ, u) ∂F/∂A + H(A, Φ, u) ∂F/∂Φ ] + (εB/2) LF + c₁(A, Φ) = 0,   F(T, A, Φ) = 0,   (5.3.14)

¹⁵ More precisely, for Eqs. (5.3.10) it is important to take into account the symmetrization property, since these equations contain the white noise ξ(t) multiplied by expressions that depend on the state variables A and Φ. As for Eqs. (5.3.4), they have the same solutions independent of whether they are understood in the Ito, Stratonovich, or any other sense.
where L denotes the operator

L = sin²Φ ∂²/∂A² + (sin 2Φ/A) ∂²/∂A∂Φ + (cos²Φ/A²) ∂²/∂Φ² + (cos²Φ/A) ∂/∂A − (sin 2Φ/A²) ∂/∂Φ.   (5.3.15)
The last two terms in (5.3.15) appear due to the fact that the stochastic equations (5.3.10) are symmetrized. If we change the time scale and pass to the slowly varying time t̄ = εt, then Eq. (5.3.14) for the loss function F(t̄, A, Φ) acquires the form

ε ∂F/∂t̄ + ∂F/∂Φ + ε min over |u| ≤ u_m of [ G(A, Φ, u) ∂F/∂A + H(A, Φ, u) ∂F/∂Φ ] + (εB/2) LF + c₁(A, Φ) = 0.   (5.3.16)

It follows from (5.3.16) that the derivatives of the loss function with respect to the amplitude and the fast phase are of different orders of magnitude (if ∂F/∂A ~ 1, then ∂F/∂Φ ~ ε). This fact, important for the subsequent considerations, follows from the quasiharmonic character of the motion of system (5.3.4).

Equation (5.3.16) can be simplified if, just as in §1.4, §2.2, §3.1, etc., we consider the stationary stabilization of random oscillations in system (5.3.4). In this case, the upper limit of integration T → ∞ in (5.3.5) and (5.3.12). The right-hand side of (5.3.12) then also tends to infinity because of the random perturbations ξ(t). Therefore, to suppress this divergence in the stationary case, we need to consider the stationary loss function (see (1.4.29), (2.2.9), and (4.1.7))

f(A, Φ) = lim as T → ∞ of [ F(t̄, A, Φ) − γ(εT − t̄) ],   (5.3.17)

where the constant γ characterizes the mean losses of control per unit time under stationary operating conditions. For the function (5.3.17), we have the stationary version of Eq. (5.3.16):

∂f/∂Φ + ε min over |u| ≤ u_m of [ G(A, Φ, u) ∂f/∂A + H(A, Φ, u) ∂f/∂Φ ] + (εB/2) Lf + c₁(A, Φ) − εγ = 0.   (5.3.18)
Just as in §5.1, taking into account the relay property (5.3.7) and the antisymmetry property (5.3.9), without loss of generality we can seek the optimal
control u*(A, Φ), which minimizes the expression in the square brackets in (5.3.18), in the set of controlling actions of the form (5.1.18):

u(A, Φ) = u_m sign[ sin(Φ − φ_r(A)) ].   (5.3.19)

This allows us to rewrite Eq. (5.3.18) in the form

∂f/∂Φ + ε min over φ_r of [ G(A, Φ, φ_r) ∂f/∂A + H(A, Φ, φ_r) ∂f/∂Φ ] + (εB/2) Lf + c₁(A, Φ) − εγ = 0,   (5.3.20)

where G(A, Φ, φ_r) and H(A, Φ, φ_r) denote the functions obtained from (5.1.17) after the substitution of the control u(A, Φ) in the form (5.3.19). Thus, solving the synthesis problem is reduced to finding the function φ_r*(A) that minimizes the expression in the square brackets in (5.3.20); this function determines (in polar coordinates) the equation of the switching line of the controlling actions u* = ±u_m under the optimal control u*(A, Φ). To calculate the
desired function φ_r*(A), we use, along with the differential equation (5.3.20), the functional equation (5.3.13). Namely, let us write the functional equation (5.3.13) for the time interval Δ = 2π. With regard to (5.3.19), we can write

F(t, A_t, Φ_t) = min over φ_r(A_τ), t ≤ τ ≤ t + 2π, of E [ ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ + F(t + 2π, A_{t+2π}, Φ_{t+2π}) ].   (5.3.22)

Since the loss function (5.3.12) is periodic in the variable Φ, we have F(t, A, Φ) = F(t, A, Φ − 2π). This and (5.3.10) imply that relation (5.3.22) can be rewritten as

F(t, A_t, Φ_t) = min E [ ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ + F(t + 2π, A_t + εΔA, Φ_t + εΔφ) ],   (5.3.23)
where

εΔA = ε ∫_t^{t+2π} G(A_τ, Φ_τ, u_τ, τ) dτ,   εΔφ = ε ∫_t^{t+2π} H(A_τ, Φ_τ, u_τ, τ) dτ,   (5.3.24)

with u_τ = u_m sign[ sin(Φ_τ − φ_r(A_τ)) ].
Using, just as in (5.3.16), the "slow" time t̄ = εt and expanding F(t̄ + 2πε, A_t + εΔA, Φ_t + εΔφ) in the Taylor series, we rewrite (5.3.23) in the form

min E [ ε ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ + 2πε ∂F/∂t̄ + εΔA ∂F/∂A + εΔφ ∂F/∂Φ + (1/2)(εΔA)² ∂²F/∂A² + (εΔA)(εΔφ) ∂²F/∂A∂Φ + (1/2)(εΔφ)² ∂²F/∂Φ² + ⋯ ] = 0,   (5.3.25)

where all derivatives are taken at the point (t̄, A_t, Φ_t). In the stationary case considered in what follows, Eq. (5.3.25) acquires the form

min E [ ε ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ − 2πεγ + εΔA ∂f/∂A + εΔφ ∂f/∂Φ + (1/2)(εΔA)² ∂²f/∂A² + (εΔA)(εΔφ) ∂²f/∂A∂Φ + (1/2)(εΔφ)² ∂²f/∂Φ² + ⋯ ] = 0.   (5.3.26)
Equation (5.3.26) is of infinite order and looks much more complicated than the differential equation (5.3.20). Nevertheless, since the differences εΔA and εΔφ are small, the higher-order derivatives of the loss function in (5.3.26) are, as a rule, of higher orders of magnitude with respect to powers of the parameter ε. This allows us, by successively taking into account terms of higher and higher
order in ε in (5.3.26), to solve equations of comparatively low order and then to use these solutions for the approximate solution of the synthesis problem.

In this procedure of approximate synthesis, special attention must be paid to a very important fact that simplifies the calculation of successive approximations. Namely, in this case there are two equations, (5.3.20) and (5.3.26), for the same function f(A, Φ). Thus, combining these equations, we can exclude the derivatives ∂f/∂Φ, ∂²f/∂A∂Φ, ... of the loss function with respect to the phase from (5.3.26) and thereby decrease the dimension, turning the two-dimensional equation (5.3.26) into a one-dimensional one. It is convenient to exclude the derivatives with respect to the phase, just as to solve Eqs. (5.3.26), by the method of successive approximations.

5.3.3. Approximate solution of the synthesis problem. To apply the method of successive approximations, we need to calculate the mean
value of the integral

E [ ε ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ ]   (5.3.27)

in (5.3.26) and the mean values of the amplitude and phase increments

E(εΔA),   E(εΔφ),   E[(εΔA)²],   E[(εΔA)(εΔφ)],   E[(εΔφ)²]   (5.3.28)

over the time 2π. By using system (5.3.10), we can calculate expressions (5.3.27) and (5.3.28) with arbitrary accuracy in the form of series in powers of the small parameter ε. Let us write
G(A, Φ, u_m sign[sin(Φ − φ_r(A))], t) ≡ G̃(A, Φ, t),
H(A, Φ, u_m sign[sin(Φ − φ_r(A))], t) ≡ H̃(A, Φ, t).   (5.3.29)
Then it follows from (5.3.10) that the increments of the amplitude A and the slow phase φ during the time τ satisfy

A_{t+τ} − A_t = εδA_τ = ε ∫₀^τ G̃(A_{t+τ'}, Φ_{t+τ'}, τ') dτ',
φ_{t+τ} − φ_t = εδφ_τ = ε ∫₀^τ H̃(A_{t+τ'}, Φ_{t+τ'}, τ') dτ'.   (5.3.30)

By using formulas (5.3.30) repeatedly, we can represent εδA_τ and εδφ_τ as the series

εδA_τ = εδ₁A_τ + ε²δ₂A_τ + ε³⋯,   εδφ_τ = εδ₁φ_τ + ε²δ₂φ_τ + ε³⋯,   (5.3.31)
where

εδ₁A_τ = ε ∫₀^τ G̃(A_t, Φ_t + τ', τ') dτ',   (5.3.32)

ε²δ₂A_τ = ε² ∫₀^τ [ δ₁A_{τ'} ∂G̃/∂A + δ₁φ_{τ'} ∂G̃/∂Φ ](A_t, Φ_t + τ', τ') dτ',   (5.3.33)

ε³δ₃A_τ = ε³ ∫₀^τ [ δ₂A_{τ'} ∂G̃/∂A + δ₂φ_{τ'} ∂G̃/∂Φ + (1/2)(δ₁A_{τ'})² ∂²G̃/∂A² + (δ₁A_{τ'})(δ₁φ_{τ'}) ∂²G̃/∂A∂Φ + (1/2)(δ₁φ_{τ'})² ∂²G̃/∂Φ² ](A_t, Φ_t + τ', τ') dτ',   (5.3.34)

εδ₁φ_τ = ε ∫₀^τ H̃(A_t, Φ_t + τ', τ') dτ',   (5.3.35)

ε²δ₂φ_τ = ε² ∫₀^τ [ δ₁A_{τ'} ∂H̃/∂A + δ₁φ_{τ'} ∂H̃/∂Φ ](A_t, Φ_t + τ', τ') dτ'.   (5.3.36)
The increments (5.3.24) are calculated by formulas (5.3.31)-(5.3.36), with regard to (5.1.17), (5.3.10), and (5.3.11), as

εΔA = εδA_{2π} = εδ₁A_{2π} + ε²δ₂A_{2π} + ⋯,   εΔφ = εδφ_{2π} = εδ₁φ_{2π} + ε²δ₂φ_{2π} + ⋯.   (5.3.37)

Finally, we need to use (5.3.31)-(5.3.37) and average the corresponding expressions with respect to ξ(t), taking into account (1.1.31).
In a similar way, using formulas (5.3.32)-(5.3.36), we can also calculate the integral in (5.3.27) as a series in powers of ε. Indeed, writing

ε ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ = ε ∫₀^{2π} c₁(A_{t+τ}, Φ_t + τ) dτ,   (5.3.38)

substituting δ₁A_τ, δ₁φ_τ, ... given by formulas (5.3.32), (5.3.35), ..., and averaging with respect to ξ(t), we obtain the desired expansion for (5.3.27).

In practice, when using this method for calculating the mean values of (5.3.27) and (5.3.28), we need to remember that formulas (5.3.30)-(5.3.38) possess a
specific distinguishing feature related to the fact that the random functions in the expressions (5.3.29) carry the coefficient ε^{−1/2}:

G̃(A, Φ, t) = G(A, Φ, φ_r) − ε^{−1/2} √B ξ(t) sin Φ = χ_A(A, Φ) − u_s(A, Φ, φ_r) − ε^{−1/2} √B ξ(t) sin Φ,
H̃(A, Φ, t) = H(A, Φ, φ_r) − ε^{−1/2} (√B/A) ξ(t) cos Φ   (5.3.39)

(formulas (5.3.39) follow from (5.1.17), (5.3.11), and (5.3.29)). Thus, terms of the order of ε^{−1} appear in ε²δ₂A_τ, ε³δ₃A_τ, ..., ε²δ₂φ_τ, ε³δ₃φ_τ, .... Therefore, in the calculations of the mean values of (5.3.27) and (5.3.28), the number of terms retained in the expansions (5.3.31) must always be larger by 1 than needed for the desired accuracy (if, for example, we need
to calculate the mean values of (5.3.27) and (5.3.28) with accuracy up to terms of order ε^s, then we need to retain (s + 1) terms in the expansions (5.3.31)).

For example, let us calculate the first term in the expansion of the mean value E(εΔA). From (5.3.32) and (5.3.35), we have

εδ₁A_{2π} = ε ∫₀^{2π} [ G(A_t, Φ_t + τ', φ_r(A_t)) − ε^{−1/2} √B ξ(τ') sin(Φ_t + τ') ] dτ',   (5.3.40)

εδ₁φ_{2π} = ε ∫₀^{2π} [ H(A_t, Φ_t + τ', φ_r(A_t)) − ε^{−1/2} (√B/A_t) ξ(τ') cos(Φ_t + τ') ] dτ'.   (5.3.41)

Averaging (5.3.40) with respect to ξ(t) and taking into account the properties of the white noise, we obtain

E δ₁A_{2π} = 2π Ḡ(A_t, φ_r) = 2π [ χ̄_A(A_t) − ū_s(A_t, φ_r) ],   (5.3.42)

where the bar, as usual, indicates averaging with respect to the fast phase over the period (e.g., χ̄_A(A_t) = (1/2π) ∫₀^{2π} χ_A(A_t, Φ) dΦ), and ū_s(A_t, φ_r) = ū_s is given by (5.1.34). Next, it follows from (5.3.33), (5.3.40), and (5.3.41) that
ε²δ₂A_{2π} = ε² ∫₀^{2π} [ δ₁A_τ ∂G̃/∂A + δ₁φ_τ ∂G̃/∂Φ ](A_t, Φ_t + τ, τ) dτ,   (5.3.43)

where δ₁A_τ and δ₁φ_τ contain the noise terms −ε^{−1/2} √B ∫₀^τ ξ(τ') sin(Φ_t + τ') dτ' and −ε^{−1/2} (√B/A_t) ∫₀^τ ξ(τ') cos(Φ_t + τ') dτ'. Averaging (5.3.43) with respect to ξ(t) and taking into account (1.1.31) and (1.1.32), we obtain
E(ε²δ₂A_{2π}) = ε (B/A_t) ∫₀^{2π} [ ∫₀^τ δ(τ' − τ) cos(Φ_t + τ') cos(Φ_t + τ) dτ' ] dτ + ε²⋯
= ε (B/(2A_t)) ∫₀^{2π} cos²(Φ_t + τ) dτ + ε²⋯ = επB/(2A_t) + ε²⋯,   (5.3.44)

where we used ∫₀^τ δ(τ' − τ) f(τ') dτ' = f(τ)/2, in accordance with the symmetrized understanding of the stochastic equations.
Finally, from (5.1.34), (5.3.31), (5.3.37), (5.3.42), and (5.3.44) we obtain the desired mean value

E(εΔA) = 2πε [ χ̄_A(A_t) − ū_s(A_t, φ_r) + B/(4A_t) ] + ε²⋯.   (5.3.45)

In a similar way, we obtain

E(εΔφ) = 2πε H̄(A_t, φ_r) + ε²⋯.   (5.3.46)
For the other mean values in (5.3.27) and (5.3.28), in the first approximation in ε we have

E [ ε ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ ] = 2πε c̄₁(A_t) + ε²⋯,   (5.3.47)

E(εΔA)² = πεB + ε²⋯.   (5.3.48)
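The leading term in (5.3.48) can be checked by a seeded Monte Carlo experiment on the noise part of (5.3.40) alone (the parameter values are illustrative):

```python
import numpy as np

EPS, B = 0.1, 0.2                       # illustrative eps and B
rng = np.random.default_rng(1)
n_paths, n_steps = 4000, 1000
dt = 2.0 * np.pi / n_steps
t = (np.arange(n_steps) + 0.5) * dt
# leading noise contribution to eps*DeltaA over one period (cf. (5.3.40), Phi_t = 0):
# eps * delta_1 A = -sqrt(eps*B) * integral_0^{2pi} xi(tau) sin(tau) dtau
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
dA = -np.sqrt(EPS * B) * (np.sin(t) * dW).sum(axis=1)
var_mc = dA.var()
var_theory = np.pi * EPS * B            # E(eps*DeltaA)^2 = pi*eps*B + O(eps^2)
```

The sample variance agrees with πεB, while the sample mean is O(n_paths^{−1/2}), consistent with E(εδ₁A) = 0.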
All the other mean values E[(εΔA)(εΔφ)], E(εΔφ)², ... in (5.3.28) are of higher order in ε.

Now let us calculate successive approximations of the Bellman equation (5.3.26). Simultaneously, with the help of Eq. (5.3.20), we shall exclude the derivatives of the loss function with respect to the phase from (5.3.26).

The first approximation. We represent the loss function f(A, Φ) as the series

f(A, Φ) = f₁(A, Φ) + εf₂(A, Φ) + ε²⋯,   (5.3.49)

substitute it into Eq. (5.3.26), and retain only the terms of order ε (omitting the terms of order ε² and higher). Since, in view of (5.3.20), ∂f/∂Φ ~ ε, using (5.3.45)-(5.3.48) we obtain the following equation of the first approximation from (5.3.26):

min over φ_r of { [ χ̄_A(A) − ū_s(A, φ_r) + B/(4A) ] df₁/dA + (B/4) d²f₁/dA² + c̄₁(A) − γ } = 0.   (5.3.50)

In (5.3.50) we calculate the minimum with respect to φ_r under the assumption that df₁/dA > 0 and thus obtain

φ_r = φ_r*(A) ≡ 0   (5.3.51)

for the minimizing function, since the average ū_s(A, φ_r) attains its maximum at φ_r = 0.

Comparing this result with the approximate synthesis result (5.1.55) for the similar deterministic problem, we see that, in the first approximation in ε, the perturbation ξ(t) in no way affects the switching line. Just as in the deterministic case (5.1.55), (5.1.56), the switching line coincides with the abscissa axis on the phase plane (x, ẋ) for any type of nonlinearity, that is, for any function χ(x, ẋ) in Eq. (5.3.1).
To find the switching line in the second approximation, we need to calculate the derivative df₁/dA, where f₁ satisfies the differential equation obtained from (5.3.50) with φ_r = 0,

[ χ̄_A(A) − 2u_m/π + B/(4A) ] df₁/dA + (B/4) d²f₁/dA² + c̄₁(A) − γ = 0,   (5.3.52)

in which the stationary error γ is not yet found. But we can readily show how to calculate this error. Namely, since the stationary error is defined (in the probability sense) as the mean penalty value (see (1.4.32)), we have

γ = ∫₀^∞ p₁(A) c̄₁(A) dA,   (5.3.53)
where pi (A) is the stationary probability density for the distribution of the amplitude A. The Fokker-Planck equation that determines this stationary
density is conjugate to the Bellman equation. Therefore, in the case of (5.3.52), the equation for pi(A) has the form
For zero probability flow (see §4, item 4 in [173]), Eq. (5.3.54) has the solution (5.3.55), where the constant C is determined by the normalization condition

C⁻¹ = ∫₀^∞ A exp{ −(1/B) [ 8μA − 4 ∫₀^A χ_A(A′) dA′ ] } dA.    (5.3.56)
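For the linear plant treated in Example 1 below, the weight in the normalization integral takes the explicit form A·exp[−(A² + 8μA)/B] (cf. (5.3.70)). The sketch below evaluates the constant C and the stationary error γ = ∫ p₁c₁ dA by quadrature; the value μ = 2/π and the quadratic penalty c₁(A) = A² used here are illustrative stand-ins only.

```python
import numpy as np

def trap(y, x):
    """Composite trapezoid rule (kept explicit for clarity)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# weight of the stationary density for the linear plant:
#   w(A) = A * exp(-(A**2 + 8*mu*A)/B)
B, mu = 1.0, 2 / np.pi            # illustrative parameter values
A = np.linspace(0.0, 20.0, 200_001)
w = A * np.exp(-(A**2 + 8 * mu * A) / B)

C = 1.0 / trap(w, A)              # normalization constant, cf. (5.3.56)
p1 = C * w                        # stationary density p1(A), cf. (5.3.55)
gamma = trap(A**2 * p1, A)        # stationary error (5.3.53) with c1(A) = A**2
```

By construction p₁ integrates to one on the grid, and γ comes out as a small positive number, consistent with a density concentrated near the origin.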
As soon as γ is known, we can solve Eq. (5.3.52). The unique solution of this equation is specified by the condition that the function ∂f₁/∂A must behave as A → ∞ just as in the deterministic case (that is, as if in (5.3.1) the random perturbation ξ(t) ≡ 0). This assumption on the solution of Eq. (5.3.52) is quite natural, since the role of the diffusion term in the equation obviously decreases as A increases (similar considerations were used in §2.2). It follows from (5.3.52) that if there are no perturbations (B = γ = 0), then this equation has the solution
∂f₁/∂A = −c₁(A) / (χ_A(A) − 2μ).

Therefore, the diffusion equation (5.3.52) has the solution

∂f₁/∂A = (4/(B p₁(A))) ∫_A^∞ [c₁(A′) − γ] p₁(A′) dA′.    (5.3.58)
Control of Oscillatory Systems
Now we can verify whether the derivative ∂f₁/∂A is positive (this was our assumption when we derived (5.3.51)). It follows from (5.3.58) that this assumption is satisfied for all but very small values of the amplitude A. Therefore, if we solve the synthesis problem by this method, we need not consider a small neighborhood of the origin on the phase plane (x, ẋ). Just as in the deterministic case in §5.1, it is clear from the "physical" viewpoint that the controlling action u and the perturbations ξ(t) lead to the appearance of a neighborhood where the quasiharmonic character of the phase trajectories is violated.

The second approximation. To obtain the Bellman equation in the
second approximation, we retain the following expression in (5.3.26):
min_{φ_r} E{ ε ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ − ⋯ }.    (5.3.59)
The other terms in (5.3.26) are necessarily of order higher than ε². The derivatives ∂f₁/∂Φ, ∂²f₁/∂A∂Φ, … of the loss function with respect to the phase can be eliminated from (5.3.59) by using (5.3.20). Hence we have
(5.3.60)

To find the function φ*(A) that minimizes the expression in the braces in (5.3.59), we shall consider only the terms in (5.3.59) that depend on the control (or, which is the same, on φ_r(A)). In this case, we shall use the fact that the minimizing function φ*(A) is small in the second approximation: φ*(A) = εφ₂(A) ~ ε, and (φ*)³ ~ ε³. Clearly, it is no longer sufficient to have only formulas (5.3.45)–(5.3.48) for the mean values of (5.3.27) and (5.3.28) in the first approximation.
In the expansions (5.3.45)–(5.3.48) we need to calculate the terms ~ ε². Following the calculation of (5.3.30)–(5.3.38) and retaining, in the terms of the order of ε², only the expressions depending on φ_c = εφ₂, we see that, in the second approximation, formulas (5.3.45)–(5.3.48) must be
replaced by
E(εΔA) = ⋯ − ⋯ u_c(A, φ_r) ⋯ ,    (5.3.61)

E(εΔA)² = εBπ − ε² ⋯ u_c(A, φ_r) sin 2Φ ⋯ ,    (5.3.62)

E(εΔφ) = −ε² ⋯ u_c(A, φ_r) ⋯ ,    (5.3.63)

E[ ε ∫_t^{t+2π} c₁(A_τ, Φ_τ) dτ ] = ⋯ ,    (5.3.64)
where G̃(A, Φ) and c̃₁(A, Φ) denote the purely vibrational components of the functions G(A, Φ, φ_r) = Ḡ(A, φ_r) + G̃(A, Φ) and c₁(A, Φ) = c̄₁(A) + c̃₁(A, Φ). By using (5.3.60)–(5.3.64), (5.1.34), (5.3.42), and (5.3.59), we see that the desired function φ*(A) = εφ₂(A), which determines the switching line in the second approximation, can be found by minimizing the expression
⋯ u_c(A, φ_r) c̃₁(A, Φ) − u_c(A, ⋯) ⋯ .    (5.3.65)

We collect similar terms in (5.3.65) with the help of (5.1.34)–(5.1.36).
As a result, we obtain
N(φ_r) = (π u_c(A, Φ) c̃₁(A, Φ))/(2u_m) ⋯ + (B/A) ⋯ (∂f₁/∂A) ⋯ .    (5.3.66)
In the following two examples, we calculate the function φ*(A) for which (5.3.66) attains its minimum.

EXAMPLE 1. Suppose that the plant to be controlled is a linear system. In this case, χ(x, ẋ) ≡ 1 in (5.3.1), and it follows from (5.1.17) that
χ_A(A) = −A/2,    χ̃_A(A, Φ) = (A/2) cos 2Φ.
For simplicity, we assume that the vibrational component of the penalty function vanishes, c̃₁(A, Φ) ≡ 0 (this holds, e.g., if c(x, ẋ) = x² + ẋ² in (5.3.3)). Then, in view of (5.1.44) and (5.1.45), the expression (5.3.66) acquires the form
N(φ_r) = ⋯ cos φ_r + ⋯ ( sin 3φ_r + sin φ_r ) ⋯ + [γ − c₁(A)] ⋯ .
The condition ∂N/∂φ_r = 0 leads to the following equation for the desired function φ*(A):

sin φ* + ε { ⋯ cos 3φ* + ( ⋯ ) ⋯ cos φ* } = ⋯ .

Representing the desired function φ*(A) in the form of an asymptotic expansion, we arrive at formula (5.3.68).
Formula (5.3.68) determines the switching line of the suboptimal control u₂(A, Φ) = u_m sign[sin(Φ − εφ₂(A))] in the second approximation.
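In Cartesian coordinates this relay law is easy to implement once A and Φ are recovered from (x, ẋ). A hedged sketch follows; the sample switching-line function `phi2` is a placeholder, since the actual φ₂(A) of (5.3.68) depends on γ and ∂f₁/∂A.

```python
import numpy as np

def relay_control(x, xdot, u_m=1.0, eps=0.25, phi2=lambda A: 0.0):
    """Second-approximation relay control u2 = u_m*sign(sin(Phi - eps*phi2(A))).
    Amplitude and phase are recovered from x = A*cos(Phi), xdot = -A*sin(Phi).
    With phi2 = 0 this reduces to the first approximation u = -u_m*sign(xdot)."""
    A = np.hypot(x, xdot)
    Phi = np.arctan2(-xdot, x)     # consistent with x = A cos(Phi), xdot = -A sin(Phi)
    return u_m * np.sign(np.sin(Phi - eps * phi2(A)))
```

With φ₂ ≡ 0 the switching line is the abscissa axis, as in (5.1.55); a nonzero φ₂ rotates it by an amplitude-dependent angle.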
In (5.3.68), γ is calculated by formula (5.3.53) with the stationary probability density
p₁(A) = C A exp[ −(A² + 8μA)/B ] for A ≥ 0 (and p₁(A) = 0 for A < 0).    (5.3.69)

Here the derivative ∂f₁/∂A, determined by (5.3.58) in the general case, has the form
∂f₁/∂A = (4/(BA)) exp[ (A² + 8μA)/B ] ∫_A^∞ [c₁(A′) − γ] A′ exp[ −(A′² + 8μA′)/B ] dA′.    (5.3.70)
Since γ → 0 and ∂f₁/∂A → c₁(A)/(A/2 + 2μ) as B → 0, one can readily see that formula (5.3.68) coincides in the limit B → 0 with the
corresponding expression (5.1.60) for the switching line of the deterministic problem.

EXAMPLE 2. Let us consider a nonlinear plant with χ(x, ẋ) = x² − 1 in (5.3.1) (in this case, the plant is a self-exciting van der Pol circuit). For such a system, it follows from (5.1.17) that
χ_A(A) = A/2 − A³/8,    χ̃_A(A, Φ) = −(A/2) cos 2Φ + (A³/8) cos 4Φ.    (5.3.71)
Substituting (5.3.71) into (5.3.66) and using (5.1.44) and (5.1.45), from (5.3.66) and the condition ∂N/∂φ_r = 0 we derive the expression for the switching line in the second approximation, which coincides in form with the expression obtained in the previous example. However, now the loss function and the stationary error in (5.3.68) must be calculated differently.
So, in this case, the stationary probability density (5.3.55) for the distribution of the amplitude has the form

p₁(A) = C A exp[ −(1/B)(A⁴/8 − A² + 8μA) ],    (5.3.72)
where C is the normalization constant:

C⁻¹ = ∫₀^∞ A exp[ −(1/B)(A⁴/8 − A² + 8μA) ] dA.    (5.3.73)
The stationary error γ in (5.3.68) is calculated by formula (5.3.53) with the help of (5.3.72) and (5.3.73). The expression for ∂f₁/∂A can be obtained from (5.3.58) with regard to (5.3.71). As a result, we see that the derivative ∂f₁/∂A in (5.3.68) has the form
∂f₁/∂A = (4/(BA)) exp[ (1/B)(A⁴/8 − A² + 8μA) ] ∫_A^∞ [c₁(A′) − γ] A′ exp[ −(1/B)(A′⁴/8 − A′² + 8μA′) ] dA′.
Just as in Example 1, formula (5.3.68) coincides as B → 0 with the corresponding expression obtained in §5.1 (see (5.1.63)) for the deterministic problem.
FIG. 42
The influence of random perturbations on the position of the switching
line in the second approximation is shown in Fig. 42, where four switching
lines for the linear quasiharmonic system from Example 1 are depicted. Curve 1 corresponds to the deterministic problem (B = 0). Curves 2, 3, and 4 show the switching lines in the stochastic case and correspond to the white noise intensities B = 1, B = 5, and B = 20, respectively. These switching lines correspond to the quadratic penalty function c(x, ẋ) = x² + ẋ² in the optimality criterion (5.3.3) and the parameters u_m = 1 and ε = 0.25 in problem (5.3.1)–(5.3.3). The dashed circle in Fig. 42 approximately indicates the domain where the quasiharmonic character of the phase trajectories of the system is violated. In the interior of this domain, the synthesis method studied here may lead to large errors, and we need to employ other methods for calculating the switching line near the origin.
5.3.4. Approximate synthesis of control that maximizes the mean time of the first passage to the boundary. As another example of the method of successive approximations treated above, let us consider the synthesis problem for a system maximizing the mean time during which the representative point (x(t), ẋ(t)) first comes to the boundary of some domain on the phase plane (x, ẋ). For definiteness, we assume that this domain is the disk of radius R₀ centered at the origin. As before, we consider a system whose behavior is described by Eq. (5.3.1) with the constraints (5.3.2) imposed on the control. Passing to the polar coordinates and considering the new state variables A and Φ as functions of the "slow" time τ = εt, we transform Eq. (5.3.1) to the system of equations
dA/dτ = G(A, Φ, u, τ),    dΦ/dτ = ε⁻¹ + H(A, Φ, u, τ),    (5.3.74)
where the functions G and H are given by (5.3.11) and (5.1.17). By using Eq. (5.3.74), we can write the Bellman equation for the problem in question. It follows from §1.4 that the maximum mean time during which the representative point (A(τ), Φ(τ)) reaches the boundary (the loss function for the synthesis problem considered) can be written as (see (1.4.38))
F(t, A_t, Φ_t) = max_u ∫_t^∞ W(T, A_t, Φ_t) dT.    (5.3.75)
Recall that W(T, A_t, Φ_t) denotes the probability that the representative point with the polar coordinates (A_t, Φ_t) at time t does not reach the boundary of the region of admissible values during the time (T − t). For the optimality principle (see (1.4.39)) corresponding to the function (5.3.75),
we can write the equation

F(t, A_t, Φ_t) = max_u E[ ∫_t^{t+Δ} W(T, A_t, Φ_t) dT + F(A_{t+Δ}, Φ_{t+Δ}) ].    (5.3.76)

By letting the time interval Δ → 0, in the usual way (§1.4) we obtain the following differential Bellman equation for the function F(A, Φ):
L F + max_u [ G(A, Φ, u) ∂F/∂A + H(A, Φ, u) ∂F/∂Φ ] = −1.    (5.3.77)

Here L is the operator (5.3.15), and the functions G and H are determined by formulas (5.1.17). On the other hand, if we set Δ = 2πε in (5.3.76), then we arrive at the finite-difference Bellman equation (an analog of (5.3.26))
Here the increments of the amplitude εΔA and the "slow" phase εΔφ are calculated just as in Section 5.3.3, and it is convenient, just as in Section 5.3.3, to solve Eqs. (5.3.77) and (5.3.78) simultaneously. Here we write out the first two approximations of the function φ_r(A) determining the switching line in the optimal regulator, which, just as in Section 5.3.3, is of relay type and has the form (5.3.19).

The first approximation. Substituting the expression ∂F/∂Φ from (5.3.77) into Eq. (5.3.78), omitting the terms of the order of ε² and higher, and using (5.3.45)–(5.3.48), we obtain the following Bellman equation in the first approximation:
max_{φ_r} { ⋯ }.    (5.3.79)

Since, by definition, W(t, A_t, Φ_t) = 1 at all points in the interior of the domain of admissible states (that is, for all A_t < R₀), we can transform
(5.3.79) with regard to (5.3.45) to the form

(B/4) ∂²F₁/∂A² + (B/(4A)) ∂F₁/∂A + ⋯ = −1.    (5.3.80)
The function φ₁*(A) determining the switching line in the first approximation is found from the condition that the expression in the square brackets in (5.3.80) attains its maximum. For ∂F₁/∂A < 0,¹⁶ we obtain

φ₁*(A) = 0.    (5.3.81)
Comparing (5.3.81) with (5.3.51) as well as with (5.1.55), we conclude that, in the first approximation in ε, the switching line of the optimal quasiharmonic stabilization system always coincides with the abscissa axis on the plane (x, ẋ); this fact is independent of the type of system nonlinearity, the existence of random perturbations, and the optimality criterion. Distinctions between the expressions for φ*(A) appear only in higher-order approximations. The equation for the loss function F₁(A) in the first approximation with regard to (5.3.81) has the form
(B/4) d²F₁/dA² + ⋯ = −1.    (5.3.82)

A unique solution of this equation is determined by the natural boundary conditions

F₁(R₀) = 0,    |dF₁/dA(0)| < ∞.    (5.3.83)
For simplicity, we shall consider the case where the plant is a linear quasiharmonic system. In this case, we have χ(x, ẋ) ≡ 1 in (5.3.1) and χ_A(A) = −A/2 in (5.3.82). Solving (5.3.82) with the second condition in (5.3.83), we readily obtain

dF₁/dA = −(8/B) ⋯ ∫₀^A e^{−(⋯)/B} dA′ ⋯ .    (5.3.84)
The expression (5.3.84) is used for determining the switching line in the second approximation.

¹⁶It follows from (5.3.84) that the condition ∂F₁/∂A < 0 is satisfied for all A ∈ (0, R₀].
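The mean first-passage time (5.3.75) can also be estimated directly by Monte Carlo simulation, which provides a sanity check on the asymptotic formulas. The following sketch (linear plant, χ ≡ 1, with illustrative parameter values of our own choosing) averages the exit times of Euler–Maruyama sample paths from the disk of radius R₀ under the first-approximation relay control:

```python
import numpy as np

def mean_exit_time(R0=1.5, u_m=1.0, eps=0.25, B=2.0, dt=2e-3,
                   n_paths=50, t_max=50.0, seed=1):
    """Monte Carlo estimate of the mean time for the phase point (x, x') of the
    controlled linear oscillator (chi = 1) to first reach the circle of radius
    R0, starting from the origin, under the first-approximation relay control
    u = -u_m*sign(x').  Paths are truncated at t_max."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_paths):
        x, v, t = 0.0, 0.0, 0.0
        while t < t_max and np.hypot(x, v) < R0:
            u = -u_m * np.sign(v)
            v += (-x - eps * v + eps * u) * dt \
                 + np.sqrt(eps * B * dt) * rng.standard_normal()
            x += v * dt
            t += dt
        total += t
    return total / n_paths

Tm = mean_exit_time()
```

Because the relay control damps the oscillation, the estimated mean exit time grows rapidly as R₀ increases relative to the noise-sustained amplitude.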
The second approximation. The switching line in the second approximation is calculated by analogy with Section 5.3.3. Namely, in Eq. (5.3.78) we consider the terms of the order of ε² and retain the terms depending on the control; as a result, the desired function φ*(A) in the second approximation is determined by the condition that the expression
attains its maximum. If the system is linear, then we have χ̃_A(A, Φ) = (A/2) cos 2Φ, and the desired expression for φ*(A), which follows from the condition ∂N/∂φ = 0 with regard to (5.1.44) and (5.1.45), has the form

(5.3.86)
Figure 43 shows the switching line given by (5.3.86).
FIG. 43

In conclusion, let us present the block diagram (Fig. 44) of a quasioptimal self-stabilizing feedback control system with plant P described by
Eq. (5.3.1). The feedback circuit (the regulator) of this system contains a differentiator, a multiplier, an adder, an inverter, a relay unit, and two nonlinear transducers NC1 and NC2. Unit NC1 realizes the functional dependence A = √(x² + ẋ²), that is, produces the current value of the amplitude A. Unit NC2 models the functional dependence φ*(A), which is given either by (5.3.68) or by (5.3.86), depending on the problem considered. Thus, the feedback circuit in the diagram in Fig. 44 realizes the control law
u(x, ẋ) = −u_m sign( ẋ + x ⋯ φ*(√(x² + ẋ²)) ⋯ ).
FIG. 44

We also note that the diagram in Fig. 44 becomes significantly simpler if system (5.3.1) is controlled by using the quasioptimal algorithm in the first approximation (5.1.55), (5.1.56). In this case, the part of the diagram indicated by the dashed line is absent.

§5.4. Optimal control of quasiharmonic systems with noise in the feedback circuit

Now we shall show how to generalize the results of the preceding section to the case where the error in the measurement of the output (controlled) variable x(t) cannot be removed.

5.4.1. Statement of the problem. We shall consider the feedback
control system whose block diagram is shown in Fig. 25. Just as in §5.3, we
assume that the plant P is a quasiharmonic controlled system perturbed by the standard white noise and described by the equation
ẍ + εχ(x, ẋ)ẋ + x = εu + √(εB) ξ(t).    (5.4.1)
We seek the optimal (scalar) control u* = u*(t) in the class of piecewise continuous functions whose absolute value is bounded by u_m:

|u(t)| ≤ u_m.    (5.4.2)
It is required to construct the controller C so as to provide the optimal damping of the oscillations x(t) arising in system (5.4.1) under the action of the random perturbations ξ(t). In this case, the quality of the damping is estimated by the mean value of the functional
I[u] = E[ ∫₀^T c(x(t), ẋ(t)) dt ].    (5.4.3)
The functions χ(x, ẋ) and c(x, ẋ) in (5.4.1), (5.4.3) are the same as in (5.3.1), (5.3.3). Therefore, problem (5.4.1)–(5.4.3) is formally identical to problem (5.3.1)–(5.3.3). The single but important distinction between these problems is the fact that now it is impossible to measure the current state of the controlled variable x(t) exactly. We assume that the result y(t) of our measurement is an additive mixture of the true value of x(t) and a random error of small intensity:

y(t) = x(t) + √ε η(t),    (5.4.4)

where ε is the same small parameter as in (5.4.1) and the random function η(t) is a white noise (independent of ξ(t)) with the characteristics
Eη(t) = 0,    Eη(t)η(t − τ) = N δ(τ),    (5.4.5)
where N > 0 is the intensity (spectral density) of the process η(t). Now, to obtain information about the current state of the plant at time t, we need to use the entire prehistory of the observed process y₀ᵗ = {y(τ) : 0 ≤ τ ≤ t} from the initial time t = 0 till the current time t. Therefore, in this case, the current values of the control action uₜ and the function (5.3.5) of minimum future losses depend on the observed realization y₀ᵗ, that is, are the functionals
uₜ = uₜ[y₀ᵗ],    (5.4.6)

F = min E[ ∫_t^T c(x(τ), ẋ(τ)) dτ | y₀ᵗ ].    (5.4.7)
The principal distinction between problems (5.4.1)–(5.4.4) and (5.3.1)–(5.3.3) is that, to find the optimal control functional (5.4.6) that minimizes the optimality criterion (5.4.3), we need to choose the space of states of the controlled system (the sufficient coordinates of the problem; see §1.5, §3.3, and §4.2) in a special way, which will allow us to use the dynamic programming approach for solving the synthesis problem. Let us show how to determine the sufficient coordinates for problem (5.4.1)–(5.4.4).

5.4.2. Equations for the sufficient coordinates. Let us consider the random function z(t) = ∫₀ᵗ y(τ) dτ. Then writing the plant equation (5.4.1) as the system of first-order equations
(5.4.8)

and assuming that the control u is a given function of time, we can readily show that z(t) is the observable component of the three-dimensional Markov process (x₁(t), x₂(t), z(t)). By using (5.4.4), (5.4.5), and (5.4.8), as well as the results of §1.5, we readily obtain an equation for the a posteriori probability density w_ps(t, x) = w_ps(t, x₁, x₂) = w(x₁, x₂ | z₀ᵗ) = w(x₁, x₂ | y₀ᵗ) for the components of the unobservable diffusion process determined by system (5.4.8). The corresponding equation is a special case of Eq. (1.5.39)
and has the form
(5.4.9)

Here the subscripts α, β take the values 1 and 2, and

‖B_{αβ}‖ = ‖ 0  0 ; 0  εB ‖ ,    ⋯ .    (5.4.10)
Equation (5.4.9) for the a posteriori density also remains valid if the control u in (5.4.8) is a functional of the observed process z₀ᵗ (or y₀ᵗ) or even of the a posteriori density w_ps(t, x) itself. This fact is justified in [175] (see also §1.5). It follows from (5.4.4), (5.4.5), (5.4.9), (5.4.10), and the results of §1.5 that the a posteriori probability density w_ps(t, x), treated as a function of time, is a Markov stochastic process and thus can be used as a sufficient coordinate in the synthesis problem. However, instead of w_ps(t, x), it is usually more convenient to use a system of parameters equivalent to w_ps(t, x). If we write x₁⁰(t) = x₁ₜ⁰, x₂⁰(t) = x₂ₜ⁰ for the coordinates of the maximum
Control of Oscillatory Systems
301
point of the a posteriori probability density w_ps(t, x) at time t,¹⁷ then, expanding w_ps(t, x) in the Taylor series around this point, we obtain the following representation for w_ps(t, x) = w_ps(t, x₁, x₂) (see (1.5.41)):

w_ps(t, x₁, x₂) = const · exp{ −Σ_{s=2}^∞ (1/s!) a_{n₁…n_s}(t) (x_{n₁} − x_{n₁}⁰(t)) ⋯ (x_{n_s} − x_{n_s}⁰(t)) }    (5.4.11)

(in (5.4.11) the sum is over nᵢ, i = 1, …, s, each taking the values 1 and 2). If we substitute (5.4.11) into (5.4.9) and set the coefficients of equal powers of (x_{n₁} − x_{n₁}⁰) ⋯ (x_{n_s} − x_{n_s}⁰) on the left- and right-hand sides equal to each other, then we obtain a system of differential equations for the parameters x_n⁰(t) and a_{n₁…n_s}(t) (see (1.5.43)). Note that since Eq. (5.4.9) is symmetrized, the stochastic equations obtained for x_n⁰(t) and a_{n₁…n_s}(t) are also symmetrized. It is convenient to replace the probability density w_ps(t, x) by a set of parameters, since we can often truncate the infinite system of the parameters x_n⁰, a_{n₁…n_s} [167, 170, 181], retaining only a comparatively small number of terms in the sum in the exponent in (5.4.11). The error admitted in this case, as compared with the exact expression for w_ps, is the smaller the higher is the a posteriori accuracy of estimation of the unobservable components x₁ and x₂ (or, which is the same, the smaller is the norm of the matrix ‖D_{αβ}‖ of the a posteriori variances); here the norm of the matrix ‖D_{αβ}‖ is of the order of ε, since, in view of (5.4.4), the observation error is a small variable of the order of √ε. It is often assumed [167, 170] that a_{n₁n₂n₃} = a_{n₁n₂n₃n₄} = ⋯ = 0 in (5.4.11) (this is the Gaussian approximation). In the Gaussian approximation, from (5.4.9) and (5.4.10) we obtain the following system of equations for the parameters of the a posteriori density w_ps(t, x₁, x₂):¹⁸
¹⁷The variables x₁⁰(t) and x₂⁰(t) are estimates of the current values of the coordinate x(t) and the velocity ẋ(t) of the control system (5.4.1). If the estimation quality is determined only by the value of the a posteriori probability, then x₁⁰(t) and x₂⁰(t) are the optimal estimates.

¹⁸For the linear oscillator (when χ(x, ẋ) ≡ 1 in (5.4.1)), the a posteriori density (5.4.11) is exactly Gaussian, and Eqs. (5.4.12) are precise.
ẋ₁⁰ = x₂⁰ + (D₁₁/(εN))(y − x₁⁰) + ⋯ ,
ẋ₂⁰ = εu − x₁⁰ − εκ₂(x₁⁰, x₂⁰) + (D₁₂/(εN))(y − x₁⁰) + ⋯ ,
Ḋ₁₁ = 2D₁₂ − D₁₁²/(εN) + ⋯ ,
Ḋ₁₂ = D₂₂ − D₁₁ − D₁₁D₁₂/(εN) + ⋯ ,
Ḋ₂₂ = εB − 2D₁₂ − D₁₂²/(εN) + ⋯ .    (5.4.12)
To write these equations, we have passed from the parameter system ‖a_{αβ}‖ to the matrix ‖D_{αβ}‖ = ‖a_{αβ}‖⁻¹ of the a posteriori covariances. Besides this, in (5.4.12) we have used the notation

κ₁(x₁, x₂) = χ(x₁, x₂) + x₂ (∂χ/∂x₂)(x₁, x₂),    κ₂(x₁, x₂) = x₂ χ(x₁, x₂).
Let us make some remarks concerning Eqs. (5.4.12). First, since (see (5.4.1), (5.4.4), and (5.4.5)) the noise intensity in the plant and in the feedback circuit is assumed to be small (of the order of ε), the covariances of the a posteriori distribution are also small variables of the order of ε, that is, we can write D₁₁ = εD̄₁₁, D₁₂ = εD̄₁₂, and D₂₂ = εD̄₂₂. This implies that the terms in (5.4.12) are of different orders of magnitude, and thus Eqs. (5.4.12) can be simplified further. Retaining the most important terms and omitting the terms of the order of ε² and higher, we can rewrite (5.4.12) in the form
ẋ₁⁰ = x₂⁰ + (D̄₁₁/N)(y − x₁⁰) + ⋯ ,    ẋ₂⁰ = εu − x₁⁰ − εκ₂(x₁⁰, x₂⁰) + (D̄₁₂/N)(y − x₁⁰) + ⋯ ,

D̄̇₁₁ = 2D̄₁₂ − D̄₁₁²/N + ε ⋯ ,    D̄̇₁₂ = D̄₂₂ − D̄₁₁ − D̄₁₁D̄₁₂/N + ε ⋯ ,    D̄̇₂₂ = B − 2D̄₁₂ − D̄₁₂²/N + ε ⋯ .    (5.4.13)
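The steady state of the covariance subsystem in (5.4.13) is the stationary solution of a filtering Riccati equation, so it can be computed numerically. The sketch below does this for the linear oscillator (χ ≡ 1), with the process- and observation-noise placement assumed as in (5.4.1), (5.4.4); the route via the dual control ARE is our implementation choice, not the book's.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def stationary_covariance(B=1.0, N=0.5, eps=0.25):
    """Steady-state a posteriori covariance matrix D* for the linear oscillator
    x1' = x2, x2' = -x1 + sqrt(eps*B)*xi(t), observed as y = x1 + sqrt(eps)*eta(t)
    with E eta(t)eta(t-s) = N*delta(s).  D* solves the filter Riccati equation
        A D + D A^T - D C^T R^-1 C D + Q = 0,
    obtained here by calling the control-type ARE solver on (A^T, C^T)."""
    A = np.array([[0.0, 1.0], [-1.0, 0.0]])
    C = np.array([[1.0, 0.0]])
    Q = np.diag([0.0, eps * B])    # process-noise intensity matrix
    R = np.array([[eps * N]])      # observation-noise intensity
    return solve_continuous_are(A.T, C.T, Q, R)

D = stationary_covariance()
```

All entries of D come out of order ε, in agreement with the scaling D_{αβ} = εD̄_{αβ}.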
We also note that, in this approximation, the last three equations in (5.4.13)
can be solved independently of the first two equations. In particular, we
see that, under prolonged observation, stationary operating conditions set in, and the covariances of the a posteriori probability distribution attain steady-state values D₁₁*, D₁₂*, and D₂₂* that do not change during the further observation. These limit covariances depend neither on the control nor on the type of the plant nonlinearity (the function χ(x, ẋ) in (5.4.1)) and are equal to
D₁₁* = εD̄₁₁*,    D₁₂* = εD̄₁₂*,    D₂₂* = εD̄₂₂*    (5.4.14)

(the explicit values follow by setting the right-hand sides of the last three equations in (5.4.13) equal to zero). In what follows, we obtain the control algorithm for the optimal stabilizer (controller) C under these stationary observation conditions.

5.4.3. The Bellman equation and the solution of the synthesis problem. In the Gaussian approximation, the loss function (5.4.7) is completely determined by the current values of the a posteriori means x₁⁰(t) = x₁ₜ⁰ and x₂⁰(t) = x₂ₜ⁰ and by the values of the a posteriori covariances D₁₁, D₁₂, and D₂₂. Under the stationary observation conditions, the a posteriori covariances (5.4.14) are constant, and therefore we can take
x₁⁰(t), x₂⁰(t), and time t as the arguments of the loss function (5.4.7). Thus, in this case, instead of (5.4.7), we have

F(t, x₁ₜ⁰, x₂ₜ⁰) = min E[ ∫_t^T c₁(x₁τ, x₂τ) dτ | x₁ₜ⁰, x₂ₜ⁰, D₁₁*, D₁₂*, D₂₂* ].    (5.4.15)
In (5.4.15) the symbol E of the mathematical expectation denotes the a posteriori averaging, that is, averaging with the a posteriori probability density. In other words, if we write the integral in the square brackets in (5.4.15) as a function of the initial values of the unobservable variables x₁ₜ and x₂ₜ, then, to obtain F(t, x₁ₜ⁰, x₂ₜ⁰), we need to integrate this function with respect to x₁ₜ and x₂ₜ with the Gaussian probability density

const · exp{ −(1/(2Δ)) [ D₂₂*(x₁ₜ − x₁ₜ⁰)² − 2D₁₂*(x₁ₜ − x₁ₜ⁰)(x₂ₜ − x₂ₜ⁰) + D₁₁*(x₂ₜ − x₂ₜ⁰)² ] },    Δ = D₁₁*D₂₂* − (D₁₂*)².    (5.4.16)

For the function (5.4.15), the basic functional equation (the optimality
principle) of the dynamic programming approach has the form
F(t, x₁ₜ⁰, x₂ₜ⁰) = min E[ ∫_t^{t+Δ} c₁(x₁τ, x₂τ) dτ + F(t + Δ, x₁,ₜ₊Δ⁰, x₂,ₜ₊Δ⁰) | x₁ₜ⁰, x₂ₜ⁰, D₁₁*, D₁₂*, D₂₂* ].    (5.4.17)

The differential Bellman equation can be obtained from (5.4.17) by using the standard derivation procedure outlined in §1.4 and §1.5. To this end, we need to expand the function F(t + Δ, x₁,ₜ₊Δ⁰, x₂,ₜ₊Δ⁰) in the Taylor series around the point (t, x₁ₜ⁰, x₂ₜ⁰); to calculate the mean values of the increments

E(x₁,ₜ₊Δ⁰ − x₁ₜ⁰),    E(x₂,ₜ₊Δ⁰ − x₂ₜ⁰),    E(x₁,ₜ₊Δ⁰ − x₁ₜ⁰)²,  …    (5.4.18)
and the integral

E[ ∫_t^{t+Δ} c₁(x₁τ, x₂τ) dτ ];    (5.4.19)
and to substitute the expressions obtained for (5.4.18) and (5.4.19) into (5.4.17) and pass to the limit as Δ → 0. To calculate the mean values of (5.4.18), we need Eqs. (5.4.13) and formulas (5.4.4) and (5.4.5). So, from (5.4.13) we obtain

x₁,ₜ₊Δ⁰ − x₁ₜ⁰ = ∫_t^{t+Δ} [ x₂τ⁰ + (D̄₁₁/N)(y(τ) − x₁τ⁰) + ⋯ ] dτ.    (5.4.20)
Since the stochastic processes x₁τ = x₁(τ), x₁τ⁰ = x₁⁰(τ), and x₂τ⁰ = x₂⁰(τ) are continuous, for small Δ we can replace these stochastic functions by the constant values x₁ₜ, x₁ₜ⁰, and x₂ₜ⁰; the error of this replacement is of the order of o(Δ). Averaging with respect to η(t) with regard to (5.4.5), we obtain from (5.4.20) an intermediate expression (*); averaging (*) with respect to x₁ₜ with the probability density (5.4.16), we finally obtain

E(x₁,ₜ₊Δ⁰ − x₁ₜ⁰) = x₂ₜ⁰ Δ + o(Δ).    (5.4.21)
Control of Oscillatory Systems
305
In a similar way, we can find the other expressions for (5.4.18) and (5.4.19):

E(x₂,ₜ₊Δ⁰ − x₂ₜ⁰) = ( εuₜ − x₁ₜ⁰ − εχ(x₁ₜ⁰, x₂ₜ⁰) x₂ₜ⁰ ) Δ + o(Δ),

E(x₁,ₜ₊Δ⁰ − x₁ₜ⁰)² = ε (D̄₁₁*)²/N · Δ + o(Δ),    ⋯ ,

E[ ∫_t^{t+Δ} c(x₁τ, x₂τ) dτ ] = Δ ∬_{−∞}^{+∞} c(x₁ₜ, x₂ₜ) N(x⁰, D*) dx₁ₜ dx₂ₜ + o(Δ),    (5.4.22)

where N(x⁰, D*) denotes the Gaussian density (5.4.16).
Using (5.4.21) and (5.4.22) and letting Δ → 0 in (5.4.17), we obtain

−∂F/∂t = min_u { x₂⁰ ∂F/∂x₁⁰ + ( εu − x₁⁰ − εχ(x₁⁰, x₂⁰) x₂⁰ ) ∂F/∂x₂⁰ + ⋯ + ∬_{−∞}^{+∞} c(x₁, x₂) N(x⁰, D*) dx₁ dx₂ }    (5.4.23)

(here we omit the subscript t in uₜ, x₁⁰, x₂⁰, x₁ₜ, x₂ₜ). If the terminal time T in (5.4.3), (5.4.7), and (5.4.15) is sufficiently large, then the dependence of F on t becomes unimportant (stationary stabilization conditions take place), since the derivative −∂F/∂t → γ as T → ∞ (here γ is a constant that characterizes the mean losses per unit time
under the optimal control). As is usual in such cases (see (1.4.29), (2.2.9), (4.1.7), and (5.3.17)), passing from F(t, x₁⁰, x₂⁰) to the time-independent loss function f(x₁⁰, x₂⁰) = lim_{T→∞} [ F(t, x₁⁰, x₂⁰) − γ(T − t) ], we arrive at the stationary version of Eq. (5.4.23):

γ = min_u { ⋯ + ∬_{−∞}^{+∞} c(x₁, x₂) N(x⁰, D*) dx₁ dx₂ }.    (5.4.24)
Just as in §5.3, it is more convenient to solve Eq. (5.4.24) in the polar coordinates if, instead of the estimated values of the coordinate x₁⁰ and the velocity x₂⁰, we use, as the arguments of the loss function, the corresponding values of the amplitude A₀ and the phase Φ₀:

x₁⁰ = A₀ cos Φ₀,    x₂⁰ = −A₀ sin Φ₀    (Φ₀ = t + φ₀).    (5.4.25)
Performing the change of variables (5.4.25), we transform (5.4.24) to the form
(5.4.26)
The expressions for G(A₀, Φ₀, u) and H(A₀, Φ₀, u) coincide with (5.1.17) after the change A, Φ → A₀, Φ₀. The function c*(A₀, Φ₀) is determined by the penalty function c(x, ẋ) in (5.4.3) (e.g., for c(x, ẋ) = x² + ẋ², we have c*(A₀, Φ₀) = A₀² + εD̄₁₁* + εD̄₂₂*). In (5.4.26), L₀ denotes the differential operator
L₀ = ⋯ sin²Φ₀ ⋯ + ⋯ sin 2Φ₀ ⋯ ∂/∂A₀ + ⋯ cos 2Φ₀ ⋯ + (D̄₁₂*)² sin²Φ₀ ⋯ ∂²/∂A₀² + ⋯ .    (5.4.27)
Note that as N → 0 formula (5.4.27) passes into formula (5.3.15) for the operator L obtained in §5.3 for systems with complete information about the phase coordinates of the plant. We can readily verify this fact by substituting the values (5.4.14) of the steady-state covariances into (5.4.27) and passing to the limit as N → 0. Then (5.4.27) acquires the form of (5.3.15), and Eq. (5.4.26) coincides with (5.3.18).
Equation (5.4.26) can be solved by the approximate method outlined in §5.3. Indeed, the principal assumption (necessary for the approximate method to be efficient) that the trajectories of the sufficient coordinates x₁⁰(t) and x₂⁰(t) are quasiharmonic is satisfied in this case, since the noise ξ(t) in the plant and the noise η(t) in the feedback circuit are small (their intensities are of the order of ε). In view of this fact, the rate of change of the estimated values of the amplitude A₀ and the phase φ₀ is small, and we can write the equation
min_{u(τ): t ≤ τ ≤ t+2π} E{ ε ∫_t^{t+2π} c*(A₀τ, Φ₀τ) dτ − 2πεγ + ⋯ ∂f/∂A₀ + ⋯ ∂f/∂Φ₀ + ⋯ ∂²f/∂A₀∂Φ₀ + ⋯ },    (5.4.28)
similar to Eq. (5.3.26). Next, just as in §5.3, by using (5.4.26), we eliminate the derivatives of the loss function with respect to the phase Φ₀ from (5.4.28) and solve the resulting one-dimensional equation of infinite order by the method of successive approximations. Note that the increments of the estimated values of the amplitude εΔA₀ and the phase εΔφ₀ on the time interval Δ = 2π can readily be calculated with the help of Eqs. (5.4.13) for the sufficient coordinates written in the polar coordinates A₀ and Φ₀ in accordance with the change of variables (5.4.25). In this case, just as in §5.3, we assume in advance that, in view of the symmetry of the problem, the optimal control has the form
u*(A₀, Φ₀) = u_m sign[ sin(Φ₀ − φ_r(A₀)) ],    (5.4.29)
and thus solving the synthesis problem is equivalent to finding the equation of the switching line φ_r(A₀) in the polar coordinates. We do not consider the mathematical calculations in detail (they coincide with those in §5.3), but illustrate the results obtained for the switching line in the first two approximations by the example of a controlled plant that is a linear quasiharmonic system (χ(x, ẋ) ≡ 1 in (5.4.1)). By using the above-described procedure, we simultaneously solve Eqs. (5.4.26) and
(5.4.28) and obtain the following one-dimensional Bellman equation in the first approximation (in the case of quadratic penalties c(x, ẋ) = x² + ẋ²):
( ⋯ ) d²f₁/dA₀² + ( ⋯ ) df₁/dA₀ + ⋯ = ⋯ .    (5.4.30)

Hence we obtain the following equation for the switching line in the first approximation:
φ₁*(A₀) = 0,    (5.4.31)

which corresponds to the control law u₁ = u_m sign(sin Φ₀) = −u_m sign(x₂⁰).
FIG. 45

Taking into account (5.4.31), from (5.4.30) we obtain the expression

df₁/dA₀ = ⋯ ∫ (A′² − γ) exp[ −(⋯)(A′² − ⋯) ] dA′ ⋯

for the derivative df₁/dA₀, which enters the formula for the switching line in the second approximation:
Since φ₂(A₀) is small, it follows from (5.4.25) and (5.4.29) that the quasioptimal control algorithm in the second approximation can be written as

u₂(x₁⁰, x₂⁰) = −u_m sign( x₂⁰ + x₁⁰ φ_r(√((x₁⁰)² + (x₂⁰)²)) ).

The block diagram of a self-stabilizing system realizing the control algorithm in the second approximation is shown in Fig. 45. The most important distinction between this system and that in Fig. 44 is that the feedback circuit contains an additional element SC producing the current values of the sufficient coordinates x₁⁰(t) and x₂⁰(t). Figure 46 presents the diagram of this element in detail.
FIG. 46
CHAPTER VI
SOME SPECIAL APPLICATIONS OF ASYMPTOTIC SYNTHESIS METHODS
In this chapter we consider some methods for solving adaptive problems of optimal control (§6.1), as well as problems of control with constrained phase coordinates (§6.2). Furthermore, in §6.3 we solve a problem of controlling the size of a population whose behavior is described by a stochastic logistic model.
"Adaptive problems" are optimal control problems, similar to those considered above, that are solved under the assumption that some system parameters are unknown a priori. In this case, just as in problems with observation noise (§3.3, §4.2, and §5.4), the optimal controller is a combination of the optimal nitration unit and the controlling unit properly producing the required controlling actions on the plant. In §6.1 we present an approximate method for calculating such controllers; this method is effective if the a priori indeterminacy of unknown parameters is relatively small. In §6.2 we present exact and approximate solutions of some stochastic problems of control with constrained phase coordinates. We consider two servomechanisms and a stabilizing system under the assumption that the range of admissible deviations between the command signal and the output coordinate is a fixed interval on the coordinate axis. We consider two cases of reflecting and absorbing screens at the endpoints of this interval. In solving the stabilization problem, we study a two-dimensional problem in which the phase trajectories reflect along the normal on the boundary of the region of admissible phase variables. In §2.4 we have already studied the problem of control of a population size and have exactly solved a special control problem based on the stochastic Malthus model. In §6.3 we shall consider a general case of a stochastic logistic controlled model and construct an optimal control algorithm for this model in terms of generalized power series. We also obtain approximate finite formulas for quasioptimal algorithms, which can be used for large values of the model parameter called the medium capacity. 311
§6.1. Adaptive problems of optimal control
In this section we consider the synthesis problem for controlled dynamic systems perturbed by a white noise and described by equations with unknown parameters. We assume that the system equations contain these parameters linearly and that the a priori indeterminacy of these parameters
is small in some sense. First we present a formal algorithm for solving the Bellman equation approximately (and for the synthesis of a quasioptimal control). The algorithm is based on the method of successive approximations in which the solution of the optimal control problem with completely known values of all parameters is used as the zero approximation (a generative solution). Next, we estimate the quality of the approximate synthesis (for the first two approximations). Finally, we illustrate our method by calculating a quasioptimal stabilization system in which the controlled plant
is an aperiodic dynamic unit with an unknown inertia factor.

6.1.1. We shall consider control systems where the plant is described by stochastic differential equations of the form
$$\dot{x} = A\theta(x) + Bu + \sigma\xi(t). \tag{6.1.1}$$
Here x is an n-dimensional phase vector, u is an r-dimensional control vector, θ(x) is an n-dimensional vector of known functions, ξ(t) is an n-dimensional vector of random functions of the white noise type (1.1.34),
and A, B, σ are constant matrices of the corresponding dimensions. Here B and σ are known matrices (det σσᵀ ≠ 0), while some elements of the matrix A are not known in advance; Eq. (6.1.1) is assumed to have a unique solution for any initial state and any admissible control u.

In the following, it is convenient to denote the unknown parameters of
the matrix A by the special letter α. Numbering all unknown parameters in an arbitrary way and writing them as a column α = (α₁, ..., α_k)ᵀ, we can rewrite Eq. (6.1.1) as

$$\dot{x} = A_*\theta(x) + Q(x)\alpha + Bu + \sigma\xi(t), \tag{6.1.2}$$
where A_* is obtained from the matrix A by substituting zeros for all unknown elements, and the n × k matrix Q(x) (which consists of the functions θ_i(x) and zeros) is uniquely determined by the vector α from the condition Aθ(x) = A_*θ(x) + Q(x)α. The goal of control is to minimize with respect to u the mean value of the functional
$$I[u] = \mathsf{E}\left[\int_0^T \bigl(c(x(t)) + u^T(t)Hu(t)\bigr)\,dt + \psi(x(T))\right], \tag{6.1.3}$$
Applications of Asymptotic Synthesis Methods
where c(x) and ψ(x) are some nonnegative bounded continuous functions, and H is a positive definite constant r × r matrix. We do not impose any restrictions on the admissible values of the control vector u and assume that the state vector x can be measured exactly at any time t ∈ [0, T]. Thus, we can seek the optimal control u_* that minimizes the mathematical
expectation (6.1.3) in the form of the functional

$$u_* = \varphi[t, x_0^t], \tag{6.1.4}$$
where x_0^t = {x(τ) : 0 ≤ τ ≤ t} is an observed realization of the state vector from the initial instant of time to the current time t.

6.1.2. The approximate synthesis algorithm. We assume that the difference between the unknown parameters α and the a priori known
vector α₀ is small. To obtain a rigorous mathematical statement, we assume that α is a random vector subject to an a priori Gaussian distribution with mean α₀ and covariance matrix D₀ = εD̃₀ (ε is a small parameter). This assumption and Eqs. (6.1.2) imply the following two facts that we need in the sequel.

1. The a posteriori probability density p(α | x_0^t) = p_t(α) calculated from observations of the process x(t)¹ is a Gaussian (conditionally Gaussian) density completely described by the vector m = m(t) = m_t of a posteriori mean values and the matrix D = D(t) = D_t of a posteriori covariances. The latter are described by the following differential equations (see [132, 175]):
$$d_0 m = DQ^T(x(t))N^{-1}\left[d_0 x(t) - \bigl(A(m)\theta(x) + Bu\bigr)\,dt\right], \tag{6.1.5}$$

$$\dot{D} = -DN_1 D. \tag{6.1.6}$$
Throughout this section, N⁻¹ is the inverse of the matrix N = σσᵀ, N₁ = QᵀN⁻¹Q, and the matrix A(m) is obtained from A in (6.1.1) by replacing all unknown parameters α with their a posteriori means m.² We also note that system (6.1.5) consists of stochastic differential Ito equations, while the differential equations in system (6.1.6) are understood in the usual sense.

2. The elements of the matrix D_t are small (~ ε) for all t > 0. Indeed, by integrating the matrix equation (6.1.6), we obtain the following explicit formula for the covariance matrix D_t in quadratures:
$$D_t = \left[E + D_0 \int_0^t Q^T(x_s)N^{-1}Q(x_s)\,ds\right]^{-1} D_0 \tag{6.1.7}$$
¹It follows from (6.1.2) and (6.1.4) that x(t) is a diffusion type process.

²As is known [38, 39, 167], the a posteriori means m = m_t are optimal estimates of α with respect to the minimum mean square error criterion.
(E is the k × k identity matrix). Denoting the columns (with the same numbers) of the matrices D_t and D₀ by y_t and y₀, respectively, we obtain from (6.1.7) the relations

$$\left[E + \int_0^t R(s)\,ds\right] y_t = y_0, \qquad R(s) = D_0 Q^T(x_s)N^{-1}Q(x_s). \tag{6.1.8}$$
Since the constant matrices D₀ and N⁻¹ are positive definite, the matrix R(s) is nonnegative definite; R(s) is degenerate if and only if all elements of at least one column of the matrix Q are zero. Let λ(s) ≥ 0 be the minimum eigenvalue of the matrix R(s). On multiplying (6.1.8) by y_t in the scalar way, we obtain

$$\|y_t\|^2 + \left(y_t,\ \int_0^t R(s)\,ds\; y_t\right) = (y_0, y_t) \tag{6.1.9}$$
(here ‖y_t‖ is the Euclidean norm of the vector y_t). Replacing the quadratic form in (6.1.9) by its lower bound and estimating the inner product (y₀, y_t) with the help of the Cauchy-Schwarz-Bunyakovskii inequality, we arrive at the inequality

$$\|y_t\| \le \bigl(1 + \mu(t)\bigr)^{-1}\|y_0\|, \qquad \mu(t) = \int_0^t \lambda(s)\,ds. \tag{6.1.10}$$
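Formula (6.1.7) and the monotone decrease of the covariance implied by (6.1.10) are easy to check numerically. The sketch below (Python; the constant matrices Q, N, D₀ are hypothetical values chosen only for illustration) integrates the matrix equation (6.1.6) by the Euler method; with Q frozen to a constant matrix, (6.1.7) reduces to the closed form D_t = (E + D₀N₁t)⁻¹D₀.

```python
import numpy as np

# Hypothetical constant data: with Q(x_s) frozen to a constant matrix Q,
# N_1 = Q^T N^{-1} Q is constant and (6.1.7) reduces to
#   D_t = (E + D_0 N_1 t)^{-1} D_0.
k = 2
D0 = np.array([[0.20, 0.05],
               [0.05, 0.10]])            # a priori covariance (small, ~ eps)
Q = np.array([[1.0, 0.5],
              [0.0, 1.0]])
N = np.array([[2.0, 0.0],
              [0.0, 1.0]])               # N = sigma sigma^T, nondegenerate
N1 = Q.T @ np.linalg.inv(N) @ Q          # N_1 = Q^T N^{-1} Q

# Euler integration of the matrix equation (6.1.6): dD/dt = -D N_1 D
T, steps = 5.0, 100000
dt = T / steps
D = D0.copy()
for _ in range(steps):
    D = D - dt * (D @ N1 @ D)

D_closed = np.linalg.inv(np.eye(k) + D0 @ N1 * T) @ D0   # formula (6.1.7)

print(np.max(np.abs(D - D_closed)))      # small discretization error
# the covariance norm can only decrease, in accordance with (6.1.10)
print(np.linalg.norm(D, 2) <= np.linalg.norm(D0, 2))
```

With these values the Euler solution and (6.1.7) agree up to the discretization error, and the norm of the covariance does not grow, which is the content of the estimate (6.1.10).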
Since ‖y₀‖ ~ ε, it follows from (6.1.10) that ‖y_t‖ ~ ε. Thus we have D_t ~ ε for all t ∈ [0, T].

We shall solve the problem of optimal control synthesis by the dynamic programming approach. To this end, we first note that the a posteriori probability density p_t(α) (or the current values of its parameters m_t and D_t) together with the current values of the phase vector x_t form the sufficient coordinates (see §1.5) for the problem in question. Therefore, these parameters and time t are arguments of the loss function given, as usual, by the formula
$$F(t, x, m, D) = \min_{u}\ \mathsf{E}\left\{\int_t^T \bigl[c(x(s)) + u^T(s)Hu(s)\bigr]\,ds + \psi(x(T)) \;\Big|\; x(t) = x,\ m(t) = m,\ D(t) = D\right\}. \tag{6.1.11}$$
The expression in the square brackets in (6.1.5) is the differential of the Wiener process (the
innovation process [132]) with the matrix N of
diffusion coefficients. Therefore, it follows from (6.1.2), (6.1.5), and (6.1.6) that the variables (x_t, m_t, D_t) form a diffusion Markov process (degenerate with respect to D). By applying the standard derivation procedure (see §1.4, as well as [97]), we obtain the following differential Bellman equation for the function F = F(t, x, m, D):

$$-F_t = \theta^T(x)A^T(m)F_x + \min_u\left[u^T B^T F_x + u^T H u\right] + c(x) + \frac{1}{2}\operatorname{Sp}(N F_{xx^T}) + \operatorname{Sp}(DQ^T F_{xm^T}) + \frac{1}{2}\operatorname{Sp}(DN_1 D F_{mm^T}) - \operatorname{Sp}(DN_1 D F_D),$$
$$F(T, x, m, D) = \psi(x). \tag{6.1.12}$$
Here F_t = ∂F/∂t, F_x is a column vector with components ∂F/∂x₁, ..., ∂F/∂x_n;

$$F_{xx^T} = \left\|\frac{\partial^2 F}{\partial x_i\,\partial x_j}\right\|, \quad F_{xm^T} = \left\|\frac{\partial^2 F}{\partial x_i\,\partial m_p}\right\|, \quad F_{mm^T} = \left\|\frac{\partial^2 F}{\partial m_p\,\partial m_q}\right\|, \quad F_D = \left\|\frac{\partial F}{\partial D_{pq}}\right\|$$

(i, j = 1, ..., n; p, q = 1, ..., k) are matrices of partial derivatives, and Sp(·) is the trace of the matrix (·).

Since the covariance matrix D is of the order of ε, it is now expedient to pass to the new variable D̃ according to the formula D = εD̃. Performing this substitution and minimizing the expression in the square brackets, we transform Eq. (6.1.12) to the form
$$-F_t = \theta^T(x)A^T(m)F_x - \frac{1}{4}F_x^T B H^{-1} B^T F_x + \frac{1}{2}\operatorname{Sp}(N F_{xx^T}) + c(x) - \varepsilon\operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F_{\widetilde{D}}) + \varepsilon\operatorname{Sp}(\widetilde{D}Q^T F_{xm^T}) + \frac{\varepsilon^2}{2}\operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F_{mm^T}),$$
$$F(T, x, m, \widetilde{D}) = \psi(x). \tag{6.1.13}$$

In this case, the vector

$$u_* = -\frac{1}{2}H^{-1}B^T F_x, \tag{6.1.14}$$
at which the function in the square brackets in (6.1.12) attains its minimum, determines the optimal control law, which becomes a known function u_* = u_*(t, x, m, D̃) of the sufficient coordinates after the loss function F = F(t, x, m, D̃) is calculated from Eq. (6.1.13). Now let us discuss whether Eqs. (6.1.13) can be solved. Obviously, in the more or less general case, it is hardly possible to obtain an exact solution.
Moreover, one cannot construct the exact solution of Eq. (6.1.13) even in the special case where θ(x) is a linear function and c(x) and ψ(x) are quadratic functions of x, that is, in the case in which the synthesis problem
with known parameters in system (6.1.1) can be solved exactly. The crucial difficulty in this case is related to the bilinear form (in the variables x and m) appearing in the coefficients of the first-order derivatives F_x. On the other hand, the high accuracy of estimating the unknown parameters α, due to which the small parameter ε appears in the last terms in (6.1.13), suggests the rather natural assumption that the difference between the exact solution of (6.1.13) and the solution of (6.1.13) with ε = 0 is small. (In other words, the difference between the solution of the synthesis problem with unknown parameters α and the similar solution with known α is small.)
The above considerations allow us to believe that an efficient approximate solution of Eq. (6.1.13) (that is, of the synthesis problem) can be obtained by means of the regular asymptotic method based on the expansion of the desired loss function F in powers of the small parameter ε:
$$F = F^0 + \varepsilon F^1 + \varepsilon^2 F^2 + \cdots. \tag{6.1.15}$$
Substituting (6.1.15) into (6.1.13) and grouping terms of the same order with respect to ε, we obtain the following equations for successive approximations:

$$-F^0_t = \theta^T(x)A^T(m)F^0_x - \frac{1}{4}(F^0_x)^T B H^{-1} B^T F^0_x + \frac{1}{2}\operatorname{Sp}(N F^0_{xx^T}) + c(x), \qquad F^0(T, x, m) = \psi(x), \tag{6.1.16}$$

$$-F^1_t = \theta^T(x)A^T(m)F^1_x - \frac{1}{2}(F^0_x)^T B H^{-1} B^T F^1_x + \frac{1}{2}\operatorname{Sp}(N F^1_{xx^T}) + \operatorname{Sp}(\widetilde{D}Q^T F^0_{xm^T}) - \operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F^0_{\widetilde{D}}), \qquad F^1(T, x, m, \widetilde{D}) = 0, \tag{6.1.17}$$

$$-F^s_t = \theta^T(x)A^T(m)F^s_x - \frac{1}{2}(F^0_x)^T B H^{-1} B^T F^s_x - \frac{1}{4}\sum_{j=1}^{s-1}(F^j_x)^T B H^{-1} B^T F^{s-j}_x + \frac{1}{2}\operatorname{Sp}(N F^s_{xx^T}) + \operatorname{Sp}(\widetilde{D}Q^T F^{s-1}_{xm^T}) + \frac{1}{2}\operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F^{s-2}_{mm^T}) - \operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F^{s-1}_{\widetilde{D}}),$$
$$F^s(T, x, m, \widetilde{D}) = 0, \qquad s \ge 2. \tag{6.1.18}$$
The zero-approximation equation (6.1.16) is nonlinear,³ while the successive approximations can be found by solving the linear equations (6.1.17) and (6.1.18), which usually is a simpler computational problem. Thus, the described scheme for solving Eq. (6.1.13) approximately is useful only if Eq. (6.1.16), that is, the Bellman equation for the problem with completely known parameters α, can be solved exactly. As was already pointed out, the last condition is satisfied if θ_i(x) are linear functions and c(x) and ψ(x) are quadratic functions of the phase variables x. In this case, all successive approximations can also be calculated in the form of quadratures (see §3.1 in [34]).

The solutions of Eqs. (6.1.16)-(6.1.18) of successive approximations can be used for obtaining an approximate solution of the synthesis problem. Namely, the quasioptimal control u_s(t, x, m, D̃) corresponding to the sth approximation is determined by formula (6.1.14) after the function F in (6.1.14) is replaced by the approximate expression F = F⁰ + εF¹ + ⋯ + εˢFˢ.

6.1.3. Estimates of the quality of approximate synthesis. We assume that the quasioptimal control u_s(t, x, m, D̃) has already been obtained in the sth approximation. By
$$G^s = G^s(t, x, m, \widetilde{D}) \tag{6.1.19}$$

we denote the mean value (calculated from the time instant t) of the optimality criterion (6.1.3) for the control u_s.⁴ The deviation Δˢ = Gˢ − F of the function (6.1.19) from the exact solution F(t, x, m, D̃) of the Bellman equation (6.1.13) is a natural estimate of the quality of the approximate control u_s(t, x, m, D̃). In what follows, we calculate the order of Δˢ in the first two approximations, that is, we estimate Δ⁰ and Δ¹.
Just as in §3.4, we calculate the desired estimates Δˢ (s = 0, 1) in two steps. First we estimate the differences δˢ = F − (F⁰ + εF¹ + ⋯ + εˢFˢ), and then γˢ = (F⁰ + εF¹ + ⋯ + εˢFˢ) − Gˢ, which immediately implies the estimates for Δˢ (in view of the triangle inequality).
Estimation of the differences δ⁰ and δ¹. Let θ(x), c(x), and ψ(x) be bounded continuous functions for all x ∈ ℝⁿ. Then it follows from Theorem 2.8 (for the Cauchy problem) in [124] that the quasilinear equations

³The partial differential equations (6.1.13) and (6.1.16) of parabolic type are linear with respect to the higher-order derivatives of the loss function. That is why equations of the form (6.1.13) and (6.1.16) are sometimes called weakly nonlinear (quasilinear or semilinear); see [61, 124].

⁴In (6.1.19), u_s(τ) = u_s(τ, x^{u_s}(τ), m^{u_s}(τ), D̃^{u_s}(τ)), where x^{u_s}(τ), m^{u_s}(τ), and D̃^{u_s}(τ) satisfy Eqs. (6.1.2), (6.1.5), and (6.1.6) with u = u_s(τ) for τ > t and the initial conditions x^{u_s}(t) = x, m^{u_s}(t) = m, and D̃^{u_s}(t) = D̃.
(6.1.13) and (6.1.16) have at most one solution in the class of functions that are continuous in the strip Π_T = {|x| < ∞; |m| < ∞; |D̃| < ∞; 0 ≤ t ≤ T}, continuously differentiable once in t and twice in the other variables for 0 ≤ t < T, and possess bounded first- and second-order derivatives with respect to x, m, D̃ in Π_T. Furthermore, Theorem 2.5 (for quasilinear equations) in [124] implies the following estimate for the solution of the Cauchy problem (6.1.13):
$$|F(t, x, m, \widetilde{D})| \le C_1 \max_x \psi(x) + C_2 \max |c| \tag{6.1.20}$$
(here C₁, C₂ > 0 are some constants; it is assumed that the function c may depend not only on x, as in (6.1.13), but also on the other variables t, m, D̃). The above arguments also hold for the linear equations (6.1.17) and (6.1.18) of successive approximations. By introducing a quasilinear operator L, we rewrite Eq. (6.1.13) in the form LF = −c(x), 0 ≤ t < T; F(T, x, m, D̃) = ψ(x). Then, for δ⁰ = F − F⁰, we obtain from (6.1.13) and (6.1.16) a quasilinear equation of the form
$$L\delta^0 = -\varepsilon\operatorname{Sp}(\widetilde{D}Q^T F^0_{xm^T}) - \frac{\varepsilon^2}{2}\operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F^0_{mm^T}), \qquad \delta^0(T, x, m, \widetilde{D}) = 0 \tag{6.1.21}$$
(with regard to the fact that the solution F⁰ of the zero-approximation equation (6.1.16) is independent of D̃, and therefore F⁰_D̃ = 0). The vector of partial derivatives F⁰_x is a bounded continuous function in view of the above-mentioned properties of the solution to Eq. (6.1.16). Hence, (6.1.21) is an equation of the form (6.1.13). To use the estimate (6.1.20), we need to verify whether the right-hand side of (6.1.21) is bounded. The elements of the matrices Q and N₁ are bounded, since the functions θ(x) are bounded and the matrix N is bounded and nondegenerate. Moreover, it follows from the inequality (6.1.10) that the norm of the matrix D̃ can only decrease with time t. Therefore, the matrix D̃ is bounded for all t ∈ [0, T] if the matrix D̃₀ of the initial (a priori) covariances is bounded, which was assumed in advance. It remains to estimate the matrices F⁰_{xmᵀ} and F⁰_{mmᵀ} of partial derivatives. To this end, we turn to the zero-approximation equation (6.1.16). By writing vⁱ = ∂F⁰/∂m_i (here m_i is an arbitrary component of the vector m) and differentiating (6.1.16) with respect to the parameter m_i, we obtain
the linear equation for vⁱ:

$$-v^i_t = \theta^T(x)A^T(m)v^i_x - \frac{1}{2}(F^0_x)^T B H^{-1} B^T v^i_x + \frac{1}{2}\operatorname{Sp}(N v^i_{xx^T}) + \theta_j(x)F^0_{x_r}, \qquad v^i(T, x, m) = 0. \tag{6.1.22}$$
Equation (6.1.22) is written for the case where the unknown parameter α_i stands in the rth row and the jth column of the matrix A in the initial system (6.1.1); here θ_j = θ_j(x) is the jth component of the vector-function θ(x). Since θ_j F⁰_{x_r} is bounded, the solution vⁱ of Eq. (6.1.22) and its partial derivatives vⁱ_x and vⁱ_{xxᵀ}, as was already noted, are also bounded. Finally, since vⁱ_x = F⁰_{xm_i} is bounded and the number i is arbitrary, the matrix F⁰_{xmᵀ} in the first term on the right in (6.1.21) is also bounded. In a similar way, we verify the boundedness of F⁰_{mmᵀ}. Thus, it follows from (6.1.21) and (6.1.20) that δ⁰ satisfies the estimate
$$|\delta^0| \le C\varepsilon, \tag{6.1.23}$$
where C is a positive constant. In a similar way, we can estimate δ¹ = F − F⁰ − εF¹. From (6.1.13), (6.1.16), and (6.1.17), it follows that δ¹ satisfies the equation
$$L\delta^1 = -\varepsilon^2\left[\operatorname{Sp}(\widetilde{D}Q^T F^1_{xm^T}) + \frac{1}{2}\operatorname{Sp}\bigl(\widetilde{D}N_1\widetilde{D}\,(F^0_{mm^T} + \varepsilon F^1_{mm^T})\bigr) - \operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F^1_{\widetilde{D}}) - \frac{1}{4}(F^1_x)^T B H^{-1} B^T F^1_x\right], \qquad \delta^1(T, x, m, \widetilde{D}) = 0. \tag{6.1.24}$$
The boundedness of F¹_x, F¹_D̃, F¹_{xmᵀ}, and F¹_{mmᵀ} can be verified by analogy with the case where we estimated δ⁰. Therefore, (6.1.24) and the inequality (6.1.20) imply

$$|\delta^1| \le C\varepsilon^2. \tag{6.1.25}$$
Estimation of the differences γ⁰ and γ¹. For the functions Gˢ = Gˢ(t, x, m, D̃), s = 0, 1, 2, ..., determined by (6.1.19), we have the linear partial differential equations [45]

$$-G^s_t = \theta^T(x)A^T(m)G^s_x + u_s^T B^T G^s_x + u_s^T H u_s + c(x) + \frac{1}{2}\operatorname{Sp}(N G^s_{xx^T}) + \varepsilon\operatorname{Sp}(\widetilde{D}Q^T G^s_{xm^T}) + \frac{\varepsilon^2}{2}\operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,G^s_{mm^T}) - \varepsilon\operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,G^s_{\widetilde{D}}),$$
$$0 \le t < T, \qquad G^s(T, x, m, \widetilde{D}) = \psi(x). \tag{6.1.26}$$
The quasioptimal controls

$$u_s = u_s(t, x, m, \widetilde{D}) = -\frac{1}{2}H^{-1}B^T\bigl(F^0_x + \varepsilon F^1_x + \cdots + \varepsilon^s F^s_x\bigr)$$

contained in (6.1.26) are bounded continuous functions. Therefore, in view of [66], the functions Gˢ satisfying (6.1.26) are also bounded and twice continuously differentiable, just as the functions F and Fˢ discussed above. By using the expressions u₀ = −½H⁻¹BᵀF⁰_x and u₁ = −½H⁻¹Bᵀ(F⁰_x + εF¹_x) for the quasioptimal controls, as well as equations (6.1.26), (6.1.16), and (6.1.17), we can readily obtain the following equations for the differences γ⁰ = F⁰ − G⁰ and γ¹ = F⁰ + εF¹ − G¹:
$$L^0\gamma^0 = \varepsilon\left[\operatorname{Sp}(\widetilde{D}Q^T F^0_{xm^T}) + \frac{\varepsilon}{2}\operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F^0_{mm^T})\right], \qquad \gamma^0(T, x, m, \widetilde{D}) = 0, \tag{6.1.27}$$

$$L^1\gamma^1 = \varepsilon^2\left[\operatorname{Sp}(\widetilde{D}Q^T F^1_{xm^T}) + \frac{1}{2}\operatorname{Sp}\bigl(\widetilde{D}N_1\widetilde{D}\,(F^0_{mm^T} + \varepsilon F^1_{mm^T})\bigr) - \operatorname{Sp}(\widetilde{D}N_1\widetilde{D}\,F^1_{\widetilde{D}}) - \frac{1}{4}(F^1_x)^T B H^{-1} B^T F^1_x\right], \qquad \gamma^1(T, x, m, \widetilde{D}) = 0, \tag{6.1.28}$$
where L⁰ and L¹ are the linear differential operators

$$L^s = \frac{\partial}{\partial t} + \bigl[\theta^T(x)A^T(m) + u_s^T B^T\bigr]\frac{\partial}{\partial x} + \frac{1}{2}\operatorname{Sp}\left(N\frac{\partial^2}{\partial x\,\partial x^T}\right) + \varepsilon\operatorname{Sp}\left(\widetilde{D}Q^T\frac{\partial^2}{\partial x\,\partial m^T}\right) + \frac{\varepsilon^2}{2}\operatorname{Sp}\left(\widetilde{D}N_1\widetilde{D}\frac{\partial^2}{\partial m\,\partial m^T}\right) - \varepsilon\operatorname{Sp}\left(\widetilde{D}N_1\widetilde{D}\frac{\partial}{\partial\widetilde{D}}\right), \qquad s = 0, 1.$$
Since the expressions in the square brackets in (6.1.27) and (6.1.28) are bounded, the inequalities (6.1.20) for the solutions γ⁰(t, x, m, D̃) and γ¹(t, x, m, D̃) of Eqs. (6.1.27) and (6.1.28) yield the estimates

$$|\gamma^0| \le C\varepsilon, \qquad |\gamma^1| \le C\varepsilon^2. \tag{6.1.29}$$
Finally, from (6.1.29), (6.1.23), and (6.1.25), with regard to the inequality |Δˢ| ≤ |δˢ| + |γˢ|, we have

$$|\Delta^0| \le C\varepsilon, \qquad |\Delta^1| \le C\varepsilon^2. \tag{6.1.30}$$
The estimates (6.1.30) show that the use of the quasioptimal control u₀ or u₁ instead of the optimal control (6.1.14) results in a deviation (an increase) in the functional (6.1.3) of order ~ ε in the zero approximation and ~ ε² in the first approximation. Thus, it follows from (6.1.30) that the method of approximate synthesis of optimal control considered in Section 6.1.2 is asymptotically efficient.

6.1.4. An example. Let us consider the simplest case of system (6.1.1) in which the plant is an aperiodic first-order unit with an unknown inertia factor. In this case, Eq. (6.1.2) is a scalar equation
$$\dot{x} = -\alpha x + bu + \sqrt{\nu}\,\xi(t), \tag{6.1.31}$$
where α is an unknown parameter, b and ν > 0 are given numbers, and ξ(t) is a scalar white noise of unit intensity. We define the optimality criterion
(6.1.3) as

$$I[u] = \mathsf{E}\int_0^T \bigl(g x^2(t) + h u^2(t)\bigr)\,dt, \tag{6.1.32}$$
where g and h > 0 are given constants. The optimal filtration equations (6.1.5), (6.1.6) and the Bellman equation (6.1.13) for problem (6.1.31), (6.1.32) are
$$d_0 m = -\frac{D}{\nu}\,x(t)\left[d_0 x(t) + \bigl(m x(t) - bu\bigr)\,dt\right], \tag{6.1.33}$$

$$\dot{D} = -\frac{D^2}{\nu}\,x^2(t), \tag{6.1.34}$$

$$-F_t = -mxF_x - \frac{b^2}{4h}F_x^2 + \frac{\nu}{2}F_{xx} + gx^2 - \varepsilon\widetilde{D}x F_{xm} + \frac{\varepsilon^2\widetilde{D}^2 x^2}{2\nu}F_{mm} - \frac{\varepsilon\widetilde{D}^2 x^2}{\nu}F_{\widetilde{D}},$$
$$F(T, x, m, \widetilde{D}) = 0 \tag{6.1.35}$$
(x, m, and D̃ are scalar variables in (6.1.33)-(6.1.35)). The zero-approximation equation (6.1.16) for Eq. (6.1.35) has the form

$$-F^0_t = -mxF^0_x - \frac{b^2}{4h}(F^0_x)^2 + \frac{\nu}{2}F^0_{xx} + gx^2, \qquad F^0(T, x, m) = 0. \tag{6.1.36}$$
The exact solution of Eq. (6.1.36) is⁵

$$F^0(t, x, m) = f^0(t, m)x^2 + r^0(t, m),$$
$$f^0(t, m) = \frac{g\bigl(1 - e^{-2\beta(T-t)}\bigr)}{(\beta - m)e^{-2\beta(T-t)} + \beta + m}, \qquad \beta = \left(m^2 + \frac{g b^2}{h}\right)^{1/2}, \tag{6.1.37}$$
$$r^0(t, m) = \nu\int_t^T f^0(s, m)\,ds.$$
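The gain f⁰ in (6.1.37) can be checked directly: writing F⁰ = f⁰x² + r⁰ in (6.1.36) shows that in the reverse time ρ = T − t it must satisfy the Riccati equation df/dρ = g − 2mf − (b²/h)f² with f(0) = 0. The sketch below (Python; the parameter values are hypothetical) compares the closed form with a Runge-Kutta integration of this equation.

```python
import math

# Hypothetical parameter values for the scalar example (6.1.31), (6.1.32)
g, h, b, m = 1.0, 1.0, 1.0, 0.7
beta = math.sqrt(m * m + g * b * b / h)

def f0(rho):
    """Closed form (6.1.37) in reverse time rho = T - t."""
    e = math.exp(-2.0 * beta * rho)
    return g * (1.0 - e) / ((beta - m) * e + beta + m)

# Integrate df/drho = g - 2 m f - (b^2/h) f^2, f(0) = 0, by classical RK4
def rhs(f):
    return g - 2.0 * m * f - (b * b / h) * f * f

rho_end, n = 3.0, 3000
dr = rho_end / n
f = 0.0
for _ in range(n):
    k1 = rhs(f)
    k2 = rhs(f + 0.5 * dr * k1)
    k3 = rhs(f + 0.5 * dr * k2)
    k4 = rhs(f + dr * k3)
    f += dr * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

print(abs(f - f0(rho_end)))   # agreement up to discretization error
print(f0(1e9))                # long-horizon limit h (beta - m) / b^2
```

The long-horizon value f⁰ → g/(β + m) = h(β − m)/b² is the stationary (positive) root of the Riccati equation, which is the infinite-horizon gain one would expect for this linear-quadratic problem.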
It follows from (6.1.14) and (6.1.37) that the quasioptimal control in the zero approximation has the form

$$u_0(t, x, m) = -\frac{b}{h}f^0(t, m)\,x, \tag{6.1.38}$$
where f⁰(t, m) is determined by (6.1.37).

To obtain the quasioptimal control in the first approximation, we need to calculate the second term in the asymptotic expansion (6.1.15). In our case, Eq. (6.1.17) for the function F¹ = F¹(t, x, m, D̃) has the form
$$-F^1_t = -mxF^1_x - \frac{b^2}{2h}F^0_x F^1_x + \frac{\nu}{2}F^1_{xx} - \widetilde{D}x F^0_{xm}, \qquad F^1(T, x, m, \widetilde{D}) = 0. \tag{6.1.39}$$
Since, in view of (6.1.37), we have F⁰_{xm} = 2f⁰_m(t, m)x, we obtain the following expression for the desired function F¹(t, x, m, D̃):
$$F^1(t, x, m, \widetilde{D}) = f^1(t, m, \widetilde{D})x^2 + r^1(t, m, \widetilde{D}),$$
$$f^1(t, m, \widetilde{D}) = -2\widetilde{D}\int_t^T f^0_m(s, m)\exp\left\{-2\int_t^s\left[m + \frac{b^2}{h}f^0(\sigma, m)\right]d\sigma\right\}ds, \tag{6.1.40}$$
$$r^1(t, m, \widetilde{D}) = \nu\int_t^T f^1(s, m, \widetilde{D})\,ds$$
(here f⁰_m(s, m) denotes the partial derivative ∂f⁰(s, m)/∂m of the function f⁰(s, m) in (6.1.37) with respect to the parameter m).

⁵Note that the loss function in the zero approximation is independent of the estimate variance D̃, i.e., F⁰ = F⁰(t, x, m).
It follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that the quasioptimal control synthesis in the first approximation is given by the formula

$$u_1(t, x, m, \widetilde{D}) = -\frac{b}{h}\left[f^0(t, m) + \varepsilon f^1(t, m, \widetilde{D})\right]x. \tag{6.1.41}$$
Comparing (6.1.38) and (6.1.41), we note that the optimal regulators in the zero and first approximations are linear in the phase variable x. However, if higher-order approximations are used, then we obtain nonlinear "laws of control." For example, in the second approximation, we obtain from (6.1.18) and (6.1.35) the following equation for the function F² = F²(t, x, m, D̃):

$$-F^2_t = -mxF^2_x - \frac{b^2}{h}\left[f^0(t, m)xF^2_x + \bigl(f^1(t, m, \widetilde{D})x\bigr)^2\right] + \frac{\nu}{2}F^2_{xx} - \widetilde{D}x F^1_{xm} + \frac{\widetilde{D}^2 x^2}{2\nu}F^0_{mm} - \frac{\widetilde{D}^2 x^2}{\nu}F^1_{\widetilde{D}}, \qquad F^2(T, x, m, \widetilde{D}) = 0.$$
Obviously, its solution has the form

$$F^2(t, x, m, \widetilde{D}) = q(t, m, \widetilde{D})x^4 + f^2(t, m, \widetilde{D})x^2 + r^2(t, m, \widetilde{D}),$$

and therefore, it follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that the quasioptimal control in the second approximation

$$u_2(t, x, m, \widetilde{D}) = -\frac{b}{h}\left\{\left[f^0(t, m) + \varepsilon f^1(t, m, \widetilde{D}) + \varepsilon^2 f^2(t, m, \widetilde{D})\right]x + 2\varepsilon^2 q(t, m, \widetilde{D})x^3\right\}$$

is a linear-cubic function of x. Figures 47 and 48 show block diagrams of the quasioptimal feedback control systems corresponding to the first (Fig. 47) and second (Fig. 48) approximations. By W_i (i = 0, 1, 2, 3) we denote linear (in x) amplifiers with varying amplification coefficients.
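The structure of the feedback loop in Fig. 47 (plant, filtration unit, gain unit) can be sketched in a few lines of code. The simulation below (Python; all numerical values, the Euler-Maruyama discretization, and the initial data are assumptions made only for illustration) runs the plant (6.1.31) with a coefficient α unknown to the controller, the filter (6.1.33), (6.1.34), and the zero-approximation control (6.1.38):

```python
import math
import random

# Hypothetical parameter values; Euler-Maruyama discretization of the plant
# (6.1.31) together with the filter (6.1.33), (6.1.34) and the
# zero-approximation control (6.1.38).
g, h, b, nu = 1.0, 1.0, 1.0, 1.0
alpha_true = 0.8                 # unknown to the controller
T, steps = 10.0, 10000
dt = T / steps
random.seed(1)

def f0(rho, m):
    """Closed-form gain (6.1.37) in reverse time rho = T - t."""
    beta = math.sqrt(m * m + g * b * b / h)
    e = math.exp(-2.0 * beta * max(rho, 0.0))
    return g * (1.0 - e) / ((beta - m) * e + beta + m)

x, m, D = 1.0, 0.0, 0.5          # state, estimate of alpha, its variance
D_start = D
for k in range(steps):
    t = k * dt
    u = -(b / h) * f0(T - t, m) * x                 # control law (6.1.38)
    dw = random.gauss(0.0, math.sqrt(dt))           # Wiener increment
    dx = (-alpha_true * x + b * u) * dt + math.sqrt(nu) * dw
    # filter (6.1.33): dm = -(D/nu) x(t) [dx + (m x - b u) dt]
    m += -(D / nu) * x * (dx + (m * x - b * u) * dt)
    D += -(D * D / nu) * x * x * dt                 # equation (6.1.34)
    x += dx

print(D_start, "->", D)          # the a posteriori variance only decreases
print("estimate of alpha:", m)   # typically approaches alpha_true
```

The certainty-equivalence character of the zero approximation is visible here: the gain f⁰ is evaluated at the current estimate m in place of the unknown parameter, exactly as discussed at the end of this section.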
FIG. 47

The plant P is described by Eq. (6.1.31). The unit SC of optimal filtration forms the current values of the sufficient coordinates m = m(t) = m_t and D = D(t) = D_t. It should be noted that the coordinate m_t is formed in SC with the aid of the equation
$$dm = -\frac{D}{\nu}\,x(t)\left[dx(t) + \bigl(m x(t) - bu\bigr)\,dt\right] + \frac{D}{2}\,dt, \tag{6.1.42}$$

which differs from Eq. (6.1.33). The reason is that only stochastic equations understood in the symmetrized sense [174] are subject to straightforward simulation. Therefore, the symmetrized equation (6.1.42) is chosen so that
its solution coincides with the solution of the Ito equation (6.1.33).

6.1.5. Some results of numerical experiments. The estimates (6.1.30) establish only the asymptotic optimality of the quasioptimal controls u₀ and u₁. Roughly speaking, the estimates (6.1.30) only mean that the smaller the parameter ε (i.e., the smaller the a priori indeterminacy of the components of the vector α), the more grounds we have for using the quasioptimal controls u₀ and u₁ (calculated according to the algorithm given in Section 6.1.2) instead of the optimal (unknown) control (6.1.4) that solves problem (6.1.1)-(6.1.3).

On the other hand, in practice we always deal with problems (6.1.1)-(6.1.3) in which all parameters (including ε) have definite finite values. As a rule, it is difficult to determine in advance whether a given specific value of the parameter ε is small enough for the above approximate synthesis procedure to be used effectively. Some ideas about the situations arising
FIG. 48

for various relations between the parameters of problem (6.1.1)-(6.1.3) are given by the results of numerical experiments performed to analyze the efficiency of the quasioptimal algorithms (6.1.38) and (6.1.41) (see the example
considered in Section 6.1.4). As was already noted, it is natural to estimate the quality of the quasioptimal controls u_s (s = 0, 1, 2, ...) by the differences Δˢ = Gˢ − F, where the functions Gˢ = Gˢ(t, x, m, D̃), given by (6.1.19), satisfy the linear parabolic type equations (6.1.26) and the loss function F = F(t, x, m, D̃) satisfies the Bellman equation (6.1.13). In the example considered in Section 6.1.4, the Bellman equation has the form (6.1.35), and the functions Gˢ (s = 0, 1, 2, ...) satisfy the equations
$$-G^s_t = (-mx + bu_s)G^s_x + hu_s^2 + gx^2 + \frac{\nu}{2}G^s_{xx} - \varepsilon\widetilde{D}x G^s_{xm} + \frac{\varepsilon^2\widetilde{D}^2 x^2}{2\nu}G^s_{mm} - \frac{\varepsilon\widetilde{D}^2 x^2}{\nu}G^s_{\widetilde{D}},$$
$$G^s(T, x, m, \widetilde{D}) = 0. \tag{6.1.43}$$
Equations (6.1.35) and (6.1.43) were solved numerically (Eq. (6.1.43) was solved for s = 0 and s = 1 with the quasioptimal controls (6.1.38) and (6.1.41) taken as u_s, s = 0, 1). Here we do not describe finite-difference schemes for constructing numerical solutions of Eqs. (6.1.35) and (6.1.43)⁶ but only present the results

⁶Numerical methods for solving equations of the form (6.1.35) and (6.1.43) are discussed in Chapter VII.
FIG. 49. Plots of F(ρ, x, m, D̃) (solid) and G⁰(ρ, x, m) (dashed) for several values of D.

of the corresponding calculations performed for different values of the parameters of problem (6.1.31), (6.1.32). In Fig. 49 the plots of the loss function F (solid curves) and the function G⁰ (dashed curves) are given for three values of the a posteriori
variance D = εD̃ in the case where m = 1, ρ = T − t = 3, and problem (6.1.31), (6.1.32) has the parameters g = h = b = ν = 1. (Since the functions F and G⁰ are even with respect to the variable x, that is, F(t, x, m, D) = F(t, −x, m, D) and G⁰(t, x, m) = G⁰(t, −x, m), Fig. 49 shows the plots of F and G⁰ only for x ≥ 0.) Since the corresponding curves for F and G⁰ are close to each other, we can state that, in this case, the quasioptimal zero-approximation control (6.1.38) ensures a control quality close to that of the optimal control.

However, this situation is not universal, as illustrated by the numerical results shown in Fig. 50. Figure 50 shows the plots of the functions F (solid curves), G⁰ (dot-and-dash curves), and G¹ (dashed curves) for the "reverse" time ρ = T − t = 2.5 and the parameters g = h = 1, b = 0.1, and ν = 5 of problem (6.1.31), (6.1.32). One can see that the use of the quasioptimal
zero-approximation control u₀(t, x, m) leads to a considerable increase in the value of the functional (6.1.19) compared with the minimum possible (optimal) value F(t, x, m, D̃). Therefore, in this case, to ensure high-quality control of system (6.1.31), we need to use quasioptimal controls in higher-order approximations. In particular, it follows from Fig. 50 that, in this case, the quasioptimal first-approximation control u₁(t, x, m, D̃) determined by (6.1.37), (6.1.40), and (6.1.41) provides a control quality close
FIG. 50. F (solid), G⁰ (dot-and-dash), and G¹ (dashed) versus x.

to the optimal. Thus, the results of the numerical solution of Eqs. (6.1.35) and (6.1.43) confirm that the quasioptimal control algorithm (6.1.41) is "highly qualitative." We point out that this result was obtained in spite of the fact that
the a posteriori variance D, which plays the role of a small parameter in the asymptotic synthesis method considered here, is of the same order of magnitude as the other parameters (g, h, b, ν) of problem (6.1.31), (6.1.32). This fact allows us to believe that the asymptotic synthesis method (see Section 6.1.2) can be used successfully for solving various practical problems of the form (6.1.1)-(6.1.3) with finite values of the parameter ε.

In conclusion, we make some methodological remarks. First, we recall
that in the title of this section the problems of optimal control with unknown parameters of the form (6.1.1)-(6.1.3) are called "adaptive." It is well known that problems of adaptive control are very important in modern control theory, and at present there are numerous publications in
this field (e.g., see [6-9, 190] and the references therein). Thus, it is of interest to compare the results obtained in this section with other approaches to similar problems. The following heuristic idea is very often used for constructing adaptive algorithms of control. For example, suppose that for the feedback control system shown in Fig. 13 it is required to construct a controller C
that provides some desired (not necessarily optimal) behavior of the system in the case where some parameters α of the plant P are not known in advance. Suppose also that for some given parameters α, the required
behavior of the system in Fig. 13 is ensured by the well-known control algorithm u = φ(t, x, α). Then it is natural: (1) to form current estimates α_t of the unknown parameters from the observed output process x₀ᵗ = {x(τ) : 0 ≤ τ ≤ t}; (2) to define the adaptive control by the formula u_a = φ(t, x, α_t). Needless to say, an additional analysis is required to answer the question of whether such a control ensures the desired behavior of the system. The corresponding analysis [6-9, 190] shows that
this method for constructing adaptive control is quite acceptable in many specific problems.

Now let us discuss the results of this section. Note that the above-mentioned heuristic idea is exactly realized if system (6.1.2) is controlled by the quasioptimal zero-approximation control u₀(t, x, m). To verify this fact, we return to the example considered in Section 6.1.4. The algorithm of the optimal control for problem (6.1.31), (6.1.32) with a known parameter α is given by formulas (2.1.14) and (2.1.16) in §2.1. Comparing (2.1.14), (2.1.16) with (6.1.37), (6.1.38), we see that the quasioptimal zero-approximation algorithm (6.1.37), (6.1.38) can be obtained from the optimal algorithm (2.1.14), (2.1.16) by replacing the unknown parameter with its optimal estimate m_t computed by means of the filter equations (6.1.33) and (6.1.34). On the other hand, a numerical analysis of the quasioptimal algorithms u₀(t, x, m) and u₁(t, x, m, D̃) (see Figs. 49 and 50) shows that the algorithm u₁ is preferable to the "heuristic" algorithm u₀ of the zero approximation. This result proves that the regular asymptotic method considered in this section is effective for solving adaptive problems of optimal control.
§6.2. Some stochastic control problems with constrained phase coordinates
As was pointed out in §1.1, in the process of constructing actual control systems, one often needs to take into account constraints of various types imposed on the set of possible values of the phase coordinates. These constraints arise from the operating conditions of specific systems, additional requirements on the transient processes, allowance for the finite time of control switching, and other causes. In these cases, a region of admissible values is specified in the phase space, and the representative point of the controlled system must not leave this region. The equations of the system dynamics that determine the phase trajectories in the interior of this region can be violated on its boundary.
Additional constraints that are imposed on the phase trajectories on the
boundary depend on the type of the problem.

In what follows, we consider two one-dimensional and one two-dimensional problems of optimal control synthesis (the problem dimension is determined by the number of phase variables on which, in addition to time t, the loss function depends). In the one-dimensional problems, the controlled variable z(t) is interpreted as the difference (error signal) between the current values of the random command input y(t) and the controlled variable x(t) in the servomechanism studied in §2.2. However, in contrast with §2.2, where any value of the error signal z(t) was admissible, in the present section it is assumed that the region of admissible values of z is an interval [ℓ₁, ℓ₂]. At the endpoints of this interval, we have either reflecting or absorbing screens [157, 160]. In the first case, if the representative point z(t) comes to ℓ₁ or ℓ₂, then it is instantaneously reflected into the interior of the interval; in the second case, on the contrary, the representative point "sticks" to the boundary and remains there forever. In practice, we have the first problem if error signal values lying outside the admissible interval [ℓ₁, ℓ₂] are prohibited, and we have the second problem if the tracking is interrupted at the endpoints (just as in radio systems of phase lock [143, 180]). In the two-dimensional problem, we consider the optimal control of a diffusion process in the interior of the disk of radius r₀ centered at the
origin of the phase plane (x, y). The circle bounding this disk is a regular boundary [124] reflecting the phase trajectories along the inward normal.

6.2.1. One-dimensional problems. Reflecting screens. Let us consider, just as in §2.2, the synthesis problem of optimal tracking of a wandering coordinate in the case where a servomotor with bounded speed is used as the executive mechanism. By analogy with §2.2, we assume that the command input y(t) is a continuous Markov diffusion process with known drift a and diffusion B coefficients (a, B = const, B > 0). By using a servomotor with bounded speed (ẋ = u, |u| ≤ u_m, u_m > |a|), it is required to "follow" the command signal y(t) on the time interval 0 ≤ t ≤ T so as to minimize the mathematical expectation (mean value) of the integral performance criterion
$$I[u] = \mathsf{E}\left[\int_0^T c(z(t))\,dt\right],$$

where z(t) = y(t) − x(t) is the error signal, c(z) is a nonnegative penalty function attaining its minimum at the unique point z = 0, and c(0) = 0. In this case, as shown in §2.2, solving the synthesis problem (in the case of unbounded phase coordinates) is equivalent to solving the Bellman equation
(see (2.2.4))

$$\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial z} + \frac{B}{2}\frac{\partial^2 F}{\partial z^2} + \min_{|u|\le u_m}\left[-u\frac{\partial F}{\partial z}\right] + c(z) = 0, \qquad \ell_1 < z < \ell_2, \quad 0 \le t < T, \tag{6.2.1}$$
with the loss function

$$F(t, z) = \min_{\substack{|u(s)|\le u_m\\ t\le s\le T}} \mathsf{E}\left[\int_t^T c(z(s))\,ds \;\Big|\; z(t) = z\right], \tag{6.2.2}$$
satisfying the following natural condition for t = T:

$$F(T, z) = 0. \tag{6.2.3}$$
According to §1.4, the Bellman equation is determined only by the local characteristics of the controlled process z(t). Therefore, for problems with constraints on the error signal, Eq. (6.2.1) remains valid at all interior points ℓ₁ < z < ℓ₂. Indeed, since the stochastic process z(t) is continuous, its realizations issued from an interior point z move (with large probability) only a small distance during a small time Δt and cannot reach the endpoints ℓ₁ and ℓ₂. Therefore, in a sufficiently small neighborhood of any interior point z, the controlled stochastic process behaves in the same way as if there were no reflecting screens. Hence, the differential equation (6.2.1) is valid at these points. At the points ℓ₁ and ℓ₂, Eq. (6.2.1) is not valid, and additional conditions on the function F at these points are determined by the character of the
process z(t) near these points. For example, in the case of the reflecting screens considered here, we have the conditions [157]

$$ \frac{\partial F}{\partial z}(t, \ell_1) = \frac{\partial F}{\partial z}(t, \ell_2) = 0. \qquad (6.2.4) $$
The conditions (6.2.4) can be explained intuitively by modeling the diffusion process z(t) approximately as a discrete random walk [160] in which, with certain probabilities, the representative point moves from the point z to the neighboring points z ± Δz, Δz = √(BΔt), during the time Δt. Then if at some time instant t the point comes to the boundary, say, z = ℓ₁, then with probability 1 the process z attains the value ℓ₁ + Δz at
time t + Δt, and therefore, we can write the following relation for the loss function (6.2.2):

$$ F(t, \ell_1) = c(\ell_1)\,\Delta t + F(t + \Delta t,\ \ell_1 + \Delta z). $$
Applications of Asymptotic Synthesis Methods
331
By expanding the second term in a Taylor series around the point (t, ℓ₁), we obtain

$$ 0 = c(\ell_1)\,\Delta t + \frac{\partial F}{\partial t}(t, \ell_1)\,\Delta t + \frac{\partial F}{\partial z}(t, \ell_1)\,\Delta z + o(\Delta t), $$

whence, dividing by Δz = √(BΔt) and passing to the limit as Δt → 0, we arrive at (6.2.4). Thus, to synthesize an optimal servomechanism with the variable z subject to constraints in the form of reflecting screens at the points z = ℓ₁ and z = ℓ₂, we need to solve Eq. (6.2.1) with the additional conditions (6.2.3) and (6.2.4) on the function F(t, z) (ℓ₁ ≤ z ≤ ℓ₂, 0 ≤ t ≤ T). In this case, the synthesis problem is solved according to the scheme studied in §2.2 for a similar problem without constraints on z. Therefore, here we only briefly recall this scheme, paying the main attention to the distinctions that arise in the calculational formulas because of the constraints on the phase variable. Obviously, the expression in the square brackets in (6.2.1) is minimized by an optimal control of the form
$$ u_*(t, z) = u_m\,\mathrm{sign}\,\frac{\partial F}{\partial z}(t, z). \qquad (6.2.5) $$
Substituting (6.2.5) into (6.2.1) and omitting the symbol min, we obtain
$$ \frac{B}{2}\frac{\partial^2 F}{\partial z^2} + a\frac{\partial F}{\partial z} - u_m\Big|\frac{\partial F}{\partial z}\Big| + c(z) = -\frac{\partial F}{\partial t}. \qquad (6.2.6) $$
If we pass to the reverse time T = T — t, then the boundary value problem we need to solve acquires the form
$$ \frac{B}{2}\frac{\partial^2 F}{\partial z^2} + a\frac{\partial F}{\partial z} - u_m\Big|\frac{\partial F}{\partial z}\Big| = \frac{\partial F}{\partial \tau} - c(z), \qquad \ell_1 < z < \ell_2, \quad 0 < \tau \le T, \qquad (6.2.7) $$

$$ \frac{\partial F}{\partial z}(\tau, \ell_1) = \frac{\partial F}{\partial z}(\tau, \ell_2) = 0, \qquad (6.2.8) $$

$$ F(0, z) = 0. \qquad (6.2.9) $$
By taking into account the properties of the penalty function c(z), we see that the loss function F(τ, z) satisfying the boundary value problem (6.2.7)-(6.2.9) has, for each τ (0 ≤ τ ≤ T), a single minimum (with respect to z) on the interval ℓ₁ ≤ z ≤ ℓ₂. Therefore, the optimal control (6.2.5) can be written as (see (2.2.8))
$$ u_*(\tau, z) = u_m\,\mathrm{sign}\big(z - z_*(\tau)\big), \qquad (6.2.10) $$
where z_*(τ) is the minimum point (with respect to z) of the function F(τ, z) and simultaneously the switch point of the controlling action. This point can be found from the condition

$$ \frac{\partial F}{\partial z}\big(\tau, z_*\big) = 0. \qquad (6.2.11) $$
Thus, to synthesize an optimal system, we need to solve the boundary value problem (6.2.7)-(6.2.9) and to use the condition (6.2.11). Problem (6.2.7)-(6.2.9) can be solved exactly if, just as in §2.2, we consider the stationary operating conditions corresponding to large values of T. In this case, instead of the function F(τ, z), we can consider the stationary loss function f(z) given by the relation

$$ f(z) = \lim_{\tau\to\infty}\big[F(\tau, z) - \gamma\tau\big] $$
(just as in (1.4.29), (2.2.9), (4.1.7), and (5.3.17), the number γ characterizes the mean losses per unit time in the stationary tracking mode). Therefore, for large T (more precisely, as τ → ∞), the partial differential equation (6.2.7) is replaced by the following ordinary differential equation for the function f(z):
$$ \frac{B}{2}\frac{d^2 f}{dz^2} + a\frac{df}{dz} - u_m\Big|\frac{df}{dz}\Big| = \gamma - c(z) \qquad (6.2.12) $$

with the boundary conditions

$$ \frac{df}{dz}(\ell_1) = \frac{df}{dz}(\ell_2) = 0. \qquad (6.2.13) $$
In this case, the coordinate of the switch point given by (6.2.11), where F is replaced by f, attains a constant value z_* (that is, we have a stationary switch point).
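The stationary problem (6.2.12), (6.2.13) can also be attacked directly by dynamic programming on a grid. The following sketch (the discretization and all parameter values are assumptions of this illustration) runs relative value iteration for the discretized average-cost problem and reads off both γ and the stationary switch point as the minimizer of the relative loss function:

```python
import numpy as np

# Relative value iteration for the average-cost problem behind (6.2.12),
# (6.2.13): a controlled random walk on a grid over [l1, l2] with
# reflecting ends, c(z) = z^2, controls u = +-um.
B, a, um, l1, l2 = 1.0, 0.0, 1.0, -1.0, 1.0
N = 101
z = np.linspace(l1, l2, N)
dz = z[1] - z[0]
dt = 0.9 / (B / dz**2 + (abs(a) + um) / dz)   # keeps probabilities valid
c = z**2

def step(h, u):
    d = a - u                                   # drift of the error z(t)
    pu = 0.5 * B * dt / dz**2 + max(d, 0.0) * dt / dz
    pd = 0.5 * B * dt / dz**2 + max(-d, 0.0) * dt / dz
    up = np.empty_like(h); dn = np.empty_like(h)
    up[:-1] = h[1:]; up[-1] = h[-2]             # reflection at l2
    dn[1:] = h[:-1]; dn[0] = h[1]               # reflection at l1
    return c * dt + pu * up + pd * dn + (1 - pu - pd) * h

h = np.zeros(N)
for _ in range(30000):
    h = np.minimum(step(h, um), step(h, -um))
    h -= h[N // 2]                              # keep the iterates bounded
gamma = (np.minimum(step(h, um), step(h, -um)) - h)[N // 2] / dt
z_star = z[np.argmin(h)]                        # stationary switch point

assert 0.0 < gamma < 1.0
assert abs(z_star) < 2 * dz                     # symmetric case: z_* near 0
```

The relative value function h plays the role of f(z) up to an additive constant, so its minimizer approximates the stationary switch point of (6.2.11).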
The boundary value problem (6.2.12), (6.2.13) can readily be solved by
the matching method. By analogy with §2.2, let us consider Eq. (6.2.12) on different sides of the switch point z*. Then the nonlinear equation (6.2.12) is replaced by the pair of linear equations
$$ \frac{B}{2}f_1'' + (a + u_m)f_1' = \gamma - c(z), \qquad \ell_1 < z < z_*, \qquad (6.2.14) $$

$$ \frac{B}{2}f_2'' + (a - u_m)f_2' = \gamma - c(z), \qquad z_* < z < \ell_2. \qquad (6.2.15) $$
Solving Eqs. (6.2.14), (6.2.15) with the boundary conditions (6.2.13), we arrive at

$$ f_1'(z) = \frac{2}{B}\int_{\ell_1}^{z} e^{\lambda_1(y - z)}\big[\gamma - c(y)\big]\,dy, \qquad f_2'(z) = \frac{2}{B}\int_{\ell_2}^{z} e^{\lambda_2(y - z)}\big[\gamma - c(y)\big]\,dy, \qquad (6.2.16) $$

where λ₁ = 2(a + u_m)/B and λ₂ = 2(a − u_m)/B.
By using (6.2.11), we obtain the two equations

$$ \frac{df_1}{dz}(z_*) = \frac{df_2}{dz}(z_*) = 0 \qquad (6.2.17) $$
for the two unknown parameters γ and z_*. Substituting (6.2.16) into (6.2.17) and eliminating the parameter γ from the system obtained, we see that the stationary switch point z_* satisfies the transcendental equation

$$ \frac{\lambda_1\displaystyle\int_{\ell_1}^{z_*} e^{\lambda_1 y}\,c(y)\,dy}{e^{\lambda_1 z_*} - e^{\lambda_1\ell_1}} = \frac{\lambda_2\displaystyle\int_{\ell_2}^{z_*} e^{\lambda_2 y}\,c(y)\,dy}{e^{\lambda_2 z_*} - e^{\lambda_2\ell_2}}. \qquad (6.2.18) $$
For the quadratic penalty function c(z) = z², Eq. (6.2.18) acquires the form

$$ w_1(z_*) = w_2(z_*), \qquad (6.2.19) $$

where

$$ w_i(z_*) = \frac{\lambda_i^2 z_*^2 - 2\lambda_i z_* + 2 - \big(\lambda_i^2\ell_i^2 - 2\lambda_i\ell_i + 2\big)\exp\big[\lambda_i(\ell_i - z_*)\big]}{\lambda_i^2\,\big\{1 - \exp\big[\lambda_i(\ell_i - z_*)\big]\big\}}, \qquad i = 1, 2. $$

If ℓ₁ → −∞ and ℓ₂ → +∞ (that is, reflecting screens are absent), then Eq. (6.2.19) implies the following explicit formula for the switch point z_*:

$$ z_* = \frac{1}{\lambda_1} + \frac{1}{\lambda_2} = -\frac{aB}{u_m^2 - a^2}; $$

this formula was obtained in §2.2 (see (2.2.16)). In the other special case ℓ₂ = −ℓ₁ and λ₁ = −λ₂ (the last equality is possible only if a = 0), Eq. (6.2.19) has the single trivial root z_* = 0, that is, the optimal control (6.2.10) coincides in sign with the error signal z.
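A hedged numerical sketch for Eq. (6.2.19): the functions w_i below are taken in the form obtained by carrying out the integrals of (6.2.18) for c(y) = y² with λ₁ = 2(a + u_m)/B, λ₂ = 2(a − u_m)/B (this explicit form is an assumption of the sketch), and the root is located by bisection:

```python
import math

def w(lam, ell, zs):
    # stationary loss rate as a function of a trial switch point zs
    e = math.exp(lam * (ell - zs))
    num = (lam**2 * zs**2 - 2 * lam * zs + 2
           - (lam**2 * ell**2 - 2 * lam * ell + 2) * e)
    return num / (lam**2 * (1.0 - e))

def switch_point(a, B, um, l1, l2, lo, hi):
    lam1, lam2 = 2 * (a + um) / B, 2 * (a - um) / B
    g = lambda zs: w(lam1, l1, zs) - w(lam2, l2, zs)
    assert g(lo) * g(hi) < 0          # the root must be bracketed
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# symmetric screens and no drift: the trivial root z_* = 0
z0 = switch_point(a=0.0, B=1.0, um=1.0, l1=-1.0, l2=1.0, lo=-0.5, hi=0.5)
assert abs(z0) < 1e-9
```

For nonzero drift a the same routine gives the (shifted) stationary switch point; each side of (6.2.19) can also be read as the corresponding stationary loss rate γ.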
6.2.2. Absorbing screens. Let us see how the tracking system studied in the preceding subsection operates with absorbing screens. Obviously, in this case, the loss function (6.2.2) must also satisfy Eq. (6.2.7) in the interior of the interval [ℓ₁, ℓ₂] and the zero initial condition (6.2.9). At the boundary points, instead of (6.2.8), we have
$$ F(\tau, \ell_1) = c(\ell_1)\,\tau, \qquad F(\tau, \ell_2) = c(\ell_2)\,\tau. \qquad (6.2.20) $$
The conditions (6.2.20) follow from formula (6.2.2) and the fact that the trajectories z(t) stick to the boundary. Indeed, by using, as above, the discrete random walk model for z(t), we can rewrite (6.2.2) as

$$ F(t, \ell_i) = c(\ell_i)\,\Delta + F(t + \Delta,\ \ell_i), \qquad i = 1, 2, $$

and hence, since t and Δ are arbitrary, we obtain
$$ F(t, \ell_i) = c(\ell_i)(T - t) = c(\ell_i)\,\tau, \qquad i = 1, 2. $$
Just as in the preceding subsection, the exact solution of the synthesis problem with absorbing screens can be obtained only in the stationary case (as τ → ∞). Suppose that the stationary operating mode exists and that z₀ is the corresponding stationary switch point. Then for large τ, the nonlinear equation (6.2.7) can be replaced by the two linear equations

$$ \frac{B}{2}\frac{\partial^2 F_1}{\partial z^2} + (a + u_m)\frac{\partial F_1}{\partial z} = \frac{\partial F_1}{\partial\tau} - c(z), \qquad \ell_1 < z < z_0, \qquad (6.2.21) $$

$$ \frac{B}{2}\frac{\partial^2 F_2}{\partial z^2} + (a - u_m)\frac{\partial F_2}{\partial z} = \frac{\partial F_2}{\partial\tau} - c(z), \qquad z_0 < z < \ell_2. \qquad (6.2.22) $$
For z = z₀, z = ℓ₁, and z = ℓ₂, the functions F₁ and F₂ satisfy (6.2.11) and (6.2.20). In accordance with [26], for large τ, we seek the solutions of the linear equations (6.2.21) and (6.2.22) in the form
$$ F_i(\tau, z) = \psi_i(z)\,\tau + f_i(z), \qquad i = 1, 2. \qquad (6.2.23) $$
Using (6.2.23), we obtain from (6.2.21), (6.2.11), and (6.2.20) the following system of ordinary differential equations for the functions ψ₁(z) and f₁(z):

$$ \frac{B}{2}\psi_1'' + (a + u_m)\psi_1' = 0, \qquad \frac{B}{2}f_1'' + (a + u_m)f_1' = \psi_1(z) - c(z). \qquad (6.2.24) $$

From (6.2.24) we obtain

$$ \psi_1(z) = c(\ell_1), \qquad f_1(z) = f_1(z_0) + \frac{2}{B}\int_{z_0}^{z} dy \int_{z_0}^{y} \big[c(\ell_1) - c(v)\big]\,e^{\lambda_1(v - y)}\,dv. \qquad (6.2.25) $$
In a similar way, for the functions ψ₂ and f₂ we have

$$ \psi_2(z) = c(\ell_2), \qquad f_2(z) = f_2(z_0) + \frac{2}{B}\int_{z_0}^{z} dy \int_{z_0}^{y} \big[c(\ell_2) - c(v)\big]\,e^{\lambda_2(v - y)}\,dv \qquad (6.2.26) $$

(here λ₁ and λ₂ are given by (6.2.16)). It follows from (6.2.23), (6.2.25), and (6.2.26) that Eq. (6.2.7) has a continuous solution only if
$$ c(\ell_1) = c(\ell_2). \qquad (6.2.27) $$
The same continuity condition allows us also to obtain the following equation for the switch point z₀ (provided that (6.2.27) is satisfied):

$$ \int_{\ell_1}^{z_0} dz \int_{z_0}^{z} \big[c(\ell_1) - c(y)\big]\,e^{\lambda_1(y - z)}\,dy = \int_{\ell_2}^{z_0} dz \int_{z_0}^{z} \big[c(\ell_2) - c(y)\big]\,e^{\lambda_2(y - z)}\,dy. \qquad (6.2.28) $$

Just as in the case of reflecting screens, Eq. (6.2.28) can be specified by various expressions for the penalty function c(z).
REMARK 6.2.1. If the condition (6.2.27) is violated, then it makes no sense to study the stationary operating mode in the problem with absorbing boundaries, since in this case the synthesis problem has only a trivial solution. In fact, one can readily see that for c(ℓ₁) > c(ℓ₂) we always need to set u ≡ −u_m (correspondingly, for c(ℓ₁) < c(ℓ₂) we need to set u ≡ +u_m). This character of the control is due to the fact that, in view of its regularity, the diffusion process z(t) sticks to one or the other boundary with probability 1 (as t → ∞). Therefore, it is clear that such a control algorithm maximizes the probability that the process sticks to the boundary with the smaller value of the penalty function c(z). □

In the general case c(ℓ₁) ≠ c(ℓ₂), we need to solve the nonstationary boundary value problem (6.2.7), (6.2.20), (6.2.9). Since this problem cannot be solved exactly, it is necessary to use approximate synthesis methods. In particular, we can use the method of successive approximations considered in Chapter III for problems with unbounded phase coordinates. According to Chapter III, the approximate solutions F⁽ᵏ⁾(τ, z) of Eq. (6.2.7) can be found by recurrently solving the sequence of linear equations
$$ \frac{B}{2}\frac{\partial^2 F^{(0)}}{\partial z^2} + a\frac{\partial F^{(0)}}{\partial z} = \frac{\partial F^{(0)}}{\partial\tau} - c(z), $$

$$ \frac{B}{2}\frac{\partial^2 F^{(k)}}{\partial z^2} + a\frac{\partial F^{(k)}}{\partial z} = \frac{\partial F^{(k)}}{\partial\tau} - c(z) + u_m\Big|\frac{\partial F^{(k-1)}}{\partial z}\Big|, \qquad k = 1, 2, \ldots \qquad (6.2.29) $$
(all F⁽ᵏ⁾(τ, z), k = 0, 1, 2, . . . , in (6.2.29) satisfy (6.2.9) and (6.2.20)). After the F⁽ᵏ⁾(τ, z) are calculated, a suboptimal system is synthesized by using (6.2.10) and (6.2.11) with F replaced by F⁽ᵏ⁾. Just as in Chapter III, one can prove that the sequence of functions F⁽ᵏ⁾(τ, z) converges as k → ∞ to the exact solution F(τ, z) of the boundary value problem (6.2.7), (6.2.9), (6.2.20), and that the corresponding suboptimal systems converge to the optimal system (the latter convergence is estimated in terms of the quality functional).
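The iteration (6.2.29) can be sketched with an explicit finite-difference solver: each F⁽ᵏ⁾ solves a linear parabolic problem whose forcing uses |∂F⁽ᵏ⁻¹⁾/∂z|, with the absorbing-boundary values (6.2.20). The grid, horizon, and parameters below are illustrative assumptions:

```python
import numpy as np

# Successive approximations (6.2.29) for the absorbing-screen problem.
B, a, um, l1, l2, T = 1.0, 0.2, 1.0, -1.0, 1.0, 0.5
N = 81
z = np.linspace(l1, l2, N)
dz = z[1] - z[0]
dt = 0.4 * dz**2 / B                 # CFL-stable explicit step
steps = int(T / dt)
c = z**2

def solve(prev):
    """March one approximation forward in tau; prev = snapshots of F^(k-1)."""
    F = np.zeros(N)
    out = []
    for n in range(steps):
        Fz = np.gradient(F, dz)
        drive = c - (um * np.abs(np.gradient(prev[n], dz)) if prev else 0.0)
        Fn = F.copy()
        Fn[1:-1] = (F[1:-1]
                    + dt * (0.5 * B * (F[2:] - 2 * F[1:-1] + F[:-2]) / dz**2
                            + a * Fz[1:-1] + drive[1:-1]))
        tau = (n + 1) * dt
        Fn[0], Fn[-1] = c[0] * tau, c[-1] * tau   # conditions (6.2.20)
        F = Fn
        out.append(F.copy())
    return out

F0 = solve(None)                     # zero approximation (no control term)
F1 = solve(F0)                       # first approximation
F2 = solve(F1)
d01 = max(np.max(np.abs(u - v)) for u, v in zip(F0, F1))
d12 = max(np.max(np.abs(u - v)) for u, v in zip(F1, F2))
assert np.all(F1[-1] <= F0[-1] + 1e-12)   # control can only reduce the loss
assert d12 < d01                          # successive differences shrink
```

The shrinking gap between consecutive iterates is the discrete counterpart of the convergence statement above.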
6.2.3. The two-dimensional problem. Suppose that the motion of a controlled system is similar to the dynamics of a Brownian particle
randomly walking on the plane (x, y) so that along one of the axes, say, along the x-axis, this motion is controlled by variations of the drift velocity within a given region, while along the y-axis we have a purely diffusive, noncontrolled wandering. In this case, the equations describing the system motion have the form
$$ \dot x = u + \sqrt{2B}\,\xi_1(t), \qquad \dot y = \sqrt{2B}\,\xi_2(t), \qquad -(u_m - a) \le u \le u_m + a, \qquad (6.2.30) $$

where ξ₁(t) and ξ₂(t) are independent stochastic processes of the white noise type with unit intensity and, by analogy with the one-dimensional problems, −(u_m − a) < 0 and (u_m + a) > 0 are the boundary values of the nonsymmetric region of admissible controls u. We assume that the representative point (x(t), y(t)) must not move farther than a distance r₀ from the origin on the plane (x, y). To this end, we assume that the phase trajectories reflect from the circle of radius r₀
along the inward normal to this boundary. Under this assumption, it is required to find a control law that minimizes the mean value of the quadratic optimality criterion
$$ I[u] = \mathsf{E}\Big[\int_0^T \big(x^2(t) + y^2(t)\big)\,dt\Big]. \qquad (6.2.31) $$
One can readily see that the Bellman equation related to this problem, written in the reverse time τ = T − t, has the form (F_τ, F_x, F_y denote the partial derivatives with respect to τ, x, y):

$$ B\big(F_{xx} + F_{yy}\big) + \min_{-(u_m - a)\le u\le u_m + a}\big[uF_x\big] = F_\tau - x^2 - y^2. \qquad (6.2.32) $$
In addition to Eq. (6.2.32), considered for 0 < τ ≤ T and √(x² + y²) < r₀, the loss function F(τ, x, y) must satisfy the zero initial condition

$$ F(0, x, y) = 0 \qquad (6.2.33) $$
and the boundary condition of the form [157]

$$ \frac{\partial F}{\partial n}\Big|_{r = r_0} = 0, \qquad (6.2.34) $$

where ∂/∂n is the normal derivative on the circle of radius r₀.
In the polar coordinates (r, φ) defined by the formulas x = r cos φ, y = r sin φ, the boundary value problem (6.2.32)-(6.2.34) acquires the form

$$ B\Big(F_{rr} + \frac{1}{r}F_r + \frac{1}{r^2}F_{\varphi\varphi}\Big) + \min_{-(u_m - a)\le u\le u_m + a}\Big[u\Big(\cos\varphi\,F_r - \frac{\sin\varphi}{r}\,F_\varphi\Big)\Big] = F_\tau - r^2, \qquad (6.2.35) $$

$$ F(0, r, \varphi) = 0, \qquad (6.2.36) $$

$$ F_r(\tau, r_0, \varphi) = 0. \qquad (6.2.37) $$
It follows from (6.2.35) that, just as in the one-dimensional case, the optimal control is of relay type:

$$ u_*(\tau, r, \varphi) = \begin{cases} -(u_m - a), & \cos\varphi\,F_r - \dfrac{\sin\varphi}{r}F_\varphi > 0,\\[4pt] u_m + a, & \cos\varphi\,F_r - \dfrac{\sin\varphi}{r}F_\varphi < 0; \end{cases} \qquad (6.2.38) $$
but now, instead of the switch point, we have a switching line on the plane (x, y). In the polar coordinates this switching line is given by the equation

$$ \cos\varphi\,F_r - \frac{\sin\varphi}{r}\,F_\varphi = 0. \qquad (6.2.39) $$
To obtain an explicit formula for the switching line, we need to solve Eq. (6.2.35) or (since this is impossible) the equations of successive approximations constructed by analogy with Eqs. (6.2.29). We now calculate the loss functions and the corresponding switching lines for the first two approximations of Eq. (6.2.35).

The zero approximation. Following the algorithm of successive approximations considered in Chapter III (see also (6.2.29)), we set the nonlinear term in the zero approximation of (6.2.35) equal to zero and thus obtain

$$ B\Big(F^{(0)}_{rr} + \frac{1}{r}F^{(0)}_r + \frac{1}{r^2}F^{(0)}_{\varphi\varphi}\Big) = F^{(0)}_\tau - r^2. \qquad (6.2.40) $$
It follows from (6.2.40), (6.2.36), and (6.2.37) that the solution F⁽⁰⁾ is radially symmetric, F⁽⁰⁾ = F⁽⁰⁾(τ, r), and therefore, instead of (6.2.40), (6.2.36), and (6.2.37), we have

$$ B\Big(F^{(0)}_{rr} + \frac{1}{r}F^{(0)}_r\Big) = F^{(0)}_\tau - r^2, \qquad F^{(0)}(0, r) = 0, \qquad F^{(0)}_r(\tau, r_0) = 0. \qquad (6.2.41) $$
It is well known [179] that the solution of Eq. (6.2.41) can be found by separation of variables (by the Fourier method) as the series

$$ F^{(0)}(\tau, r) = \frac{r_0^2}{2}\,\tau + \sum_{m=1}^{\infty} c_m\,\frac{r_0^2}{B(\mu_m^0)^2}\Big[1 - \exp\Big(-B\Big(\frac{\mu_m^0}{r_0}\Big)^2\tau\Big)\Big]\,I_0\Big(\frac{\mu_m^0}{r_0}\,r\Big), \qquad (6.2.42) $$

$$ c_m = \frac{2}{r_0^2\,[I_0(\mu_m^0)]^2}\int_0^{r_0} r^3\,I_0\Big(\frac{\mu_m^0}{r_0}\,r\Big)\,dr. \qquad (6.2.43) $$

Here I₀(x) is the Bessel function of zero order and μ_m⁰ is the mth root of the equation dI₀(μ)/dμ = 0. It follows from the properties of the zeros of the Bessel function [179] that the series (6.2.42) converges rapidly. Therefore, since we are interested only in the qualitative character of suboptimal control laws, it suffices to retain only the first term of the series in (6.2.42). Calculating c₁ and using the tables of Bessel functions [77], we obtain the following approximate expression (θ = B/r₀²):

$$ F^{(0)}(\tau, r) \approx \frac{r_0^2}{2}\,\tau - 0.0426\,\frac{r_0^2}{\theta}\,I_0\Big(\frac{\mu_1^0}{r_0}\,r\Big)\Big[1 - e^{-(\mu_1^0)^2\theta\tau}\Big]. \qquad (6.2.44) $$
By differentiating (6.2.44) with respect to r and taking into account the relations dI₀(x)/dx = −I₁(x) and μ₁⁰ = 3.84, we find

$$ F^{(0)}_r(\tau, r) \approx 0.0426\,\frac{\mu_1^0 r_0}{\theta}\,I_1\Big(\frac{\mu_1^0}{r_0}\,r\Big)\Big[1 - e^{-(\mu_1^0)^2\theta\tau}\Big]. \qquad (6.2.45) $$
Since the first-order Bessel function I₁(μ₁⁰r/r₀) is positive for 0 < r < r₀ (I₁(μ₁⁰) = 0), the derivative (6.2.45) is positive everywhere in the interior of the disk of radius r₀ on the plane (x, y). Hence, in view of (6.2.38), the sign of the controlling action in the zero approximation is determined by the sign of cos φ, that is, the switching line of the zero approximation is the vertical diameter of the disk of radius r₀ on the plane (x, y) (in Fig. 51 the switching line is indicated by AOB; the arrows show the direction of the mean drift velocity).
The first approximation. By using the results obtained above, we can write the first-approximation equation as

$$ B\Big(F^{(1)}_{rr} + \frac{1}{r}F^{(1)}_r + \frac{1}{r^2}F^{(1)}_{\varphi\varphi}\Big) + g(\tau, r, \varphi) = F^{(1)}_\tau - r^2, \qquad 0 \le r < r_0, \qquad (6.2.46) $$

where

$$ g(\tau, r, \varphi) = \begin{cases} -(u_m - a)\,F^{(0)}_r(\tau, r)\cos\varphi, & |\varphi| < \pi/2,\\[2pt] (u_m + a)\,F^{(0)}_r(\tau, r)\cos\varphi, & \pi/2 < |\varphi| \le \pi \end{cases} \qquad (6.2.47) $$
(here the function F⁽⁰⁾_r is given by formula (6.2.45)). The solution of Eq. (6.2.46) may also be written as a series in eigenfunctions, but since now there is no radial symmetry, this series differs from (6.2.42) and has the form [179]

$$ F^{(1)}(\tau, r, \varphi) = c_{00}(\tau) + \sum_{n=0}^{\infty}\sum_{m=1}^{\infty}\big[c_{nm}(\tau)\cos n\varphi + c_{nm}'(\tau)\sin n\varphi\big]\,I_n\Big(\frac{\mu_m^n}{r_0}\,r\Big), \qquad (6.2.48) $$
where the coefficients c_nm(τ) and c'_nm(τ) are determined by projecting the right-hand side of (6.2.46) onto the eigenfunctions I_n(μ_m^n r/r₀) cos nφ and I_n(μ_m^n r/r₀) sin nφ, respectively, and solving the resulting first-order equations in τ (formulas (6.2.49) and (6.2.50); the normalizing factor in (6.2.50) is different for n ≠ 0 and n = 0),
and c₀₀(τ) denotes the terms independent of r and φ, and hence insignificant for the control law (6.2.38). The numbers μ_m^n are the roots of the equation dI_n(μ)/dμ = 0, where I_n(μ) is the nth-order Bessel function. By analogy with the case of the zero approximation, we consider only the first, most important terms of the series (6.2.48). Namely, we retain only the terms corresponding to the two roots μ₁¹ and μ₁⁰ of the equation dI_n(μ)/dμ = 0; according to [77], μ₁¹ = 1.84 and μ₁⁰ = 3.84. This means that all coefficients in (6.2.48) except for c₀₁, c₁₁, and c'₁₁ must be set equal to zero. The coefficient c₀₁ coincides with c₁ in (6.2.43) and has been calculated in the zero approximation (therefore, in the series (6.2.48) the term containing c₀₁ coincides with the second term in formula (6.2.44)). By calculating c'₁₁ according to (6.2.50) with regard to (6.2.47), we obtain c'₁₁ = 0. Thus, to find the loss function F⁽¹⁾, it suffices to calculate only c₁₁.
Substituting (6.2.47) and (6.2.45) into (6.2.49) and noting that the angular integrals are

$$ \int_{-\pi/2}^{\pi/2}\cos^2\varphi\,d\varphi = \int_{\pi/2}^{3\pi/2}\cos^2\varphi\,d\varphi = \frac{\pi}{2}, $$

so that the angular factor reduces to (u_m + a)(π/2) − (u_m − a)(π/2) = πa, we find that c₁₁(τ) is proportional to a radial integral of the product I₁(μ₁¹r/r₀) I₁(μ₁⁰r/r₀) r (formula (6.2.51)). Since we have (see [179], §2, Part 1, Appendix 2)
$$ \int_0^{r_0} I_1\Big(\frac{\mu_1^1}{r_0}\,r\Big)\,I_1\Big(\frac{\mu_1^0}{r_0}\,r\Big)\,r\,dr = \frac{r_0^2\,\mu_1^0\,I_1(\mu_1^1)\,I_1'(\mu_1^0)}{(\mu_1^1)^2 - (\mu_1^0)^2}, $$

we can calculate the remaining integrals in (6.2.51) and thus obtain the explicit expression (6.2.52) for c₁₁(τ).
Substituting (6.2.52) into (6.2.39) and letting τ → ∞, we arrive at the transcendental equation (6.2.53) for the switching line corresponding to the stationary operating conditions; it relates the dimensionless radius ν = r/r₀ and the angle φ and contains the single parameter ε = θr₀/a = B/(ar₀).

FIG. 52. Curves 1, 2, and 3 in Fig. 52 correspond to the three values of the parameter ε in Eq. (6.2.53): ε = 0.4, 1.0, 3.0. Thus, the optimal control in the first approximation consists in switching the controlling action from u = −(u_m − a) in the region R₋ to u = +(u_m + a) in the region R₊, which (depending on the value of the parameter ε) lies inside one of the closed curves 1-3 in Fig. 52.
REMARK 6.2.2. The decomposition (Fig. 52) of the phase space into the regions R₋ and R₊ can be refined if the functions F⁽⁰⁾(τ, r) and F⁽¹⁾(τ, r, φ) are calculated more precisely (that is, are approximated by a larger number of terms of the series (6.2.42) and (6.2.48)). However, as the corresponding calculations show, the curves 1-3 obtained in this way practically do not differ from those shown in Fig. 52. □

§6.3. Optimal control of the population size governed by the stochastic logistic model
In this section we return to the problem of optimal control of the population size, which was formulated in §2.4 (but not solved). Let us briefly recall the statement of this problem.
6.3.1. Statement of the problem. We shall consider a single-species population whose dynamics is described by the controlled stochastic logistic model

$$ \dot x = rx\Big(1 - \frac{x}{K}\Big) - qux + \sqrt{2B}\,x\,\xi(t), \qquad x(0) = x^0, \qquad (6.3.1) $$

where x = x(t) is the population size (density) at time t, ξ(t) is a stochastic process (1.1.31) of the standard white noise type, and r, K, q, B, and x⁰ are given positive constants.
Admissible controls belong to the class of nonnegative scalar bounded measurable functions u = u(t) that for all t satisfy a condition of the form

$$ 0 \le u(t) \le u_m, \qquad (6.3.2) $$

where u_m is a given positive number.
We shall consider the control problem on the infinite time interval R₊ = [0, ∞) with an arbitrary initial population size x(0) = x⁰ > 0. The goal of control is to maximize the functional

$$ I[u] = \mathsf{E}\Big[\int_0^\infty e^{-\delta t}\big(pqx(t) - c\big)\,u(t)\,dt\Big] \to \max_{0\le u(t)\le u_m,\ t\ge 0}, \qquad (6.3.3) $$
where δ, p, q, c > 0 are given numbers and E denotes the mathematical expectation of the expression in the square brackets (we average over the ensemble of random trajectories issued from a given point x(0) = x⁰ and satisfying the stochastic differential equation (6.3.1)). It follows from §2.4 that problem (6.3.1)-(6.3.3) is a stochastic generalization of the optimal fisheries management problems studied in [35, 68, 101]. If, just as in §2.4 and in [35, 68, 101], the number p is the cost of a unit mass of caught fish, the number c denotes the cost of unit effort u(t) spent on fishing, and q is the catchability coefficient, then the functional (6.3.3) estimates the mean profit obtained by fishing during a time of the order of 1/δ. The optimal control function u_*(t): R₊ → [0, u_m] maximizing the functional (6.3.3) is a random function of time. To obtain a constructive algorithm for calculating this function, we need to use some results of the general control theory for processes of diffusion type (see [58, 113, 175] as well as §1.4).
We assume that the controlling party has information about the current values of the controlled process x(t). Then it is expedient to choose the control u(t) at time t on the basis of the entire body of information available
on the controlled process. This leads to a controlling function of the form u(t) = u(t, X₀ᵗ), X₀ᵗ = {x(s): 0 ≤ s ≤ t}, that is sometimes called a natural control strategy (the function u(t, X₀ᵗ) can be a probability measure). But if, just as in our case, the controlled system obeys an equation of the form (6.3.1) with perturbations ξ(t) in the form of a Gaussian white noise, then, as was shown in [113, 175], the prehistory of the controlled process {x(s): 0 ≤ s < t} does not affect the quality of control. Therefore, to solve the optimization problem (6.3.3), it suffices to consider only the class of controlling functions that are deterministic functions of the current phase variable, u(t) = u(t, x(t)) (nonrandomized Markov strategies). Next, since the stochastic process ξ(t) is stationary and the coefficients in (6.3.1) are time-invariant, the optimal strategy for the infinite-horizon problem in question does not depend on time explicitly, that is, u_*(t) = u_*(x(t)). By using the controlling (synthesizing) function u_*(x), we can realize the optimal control of system (6.3.1) in the form of an automatic feedback control system. In what follows, we present a method for calculating the synthesizing function u_*(x) for problem (6.3.1)-(6.3.3).
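A feedback rule of this threshold kind is easy to exercise by simulation. The sketch below (the threshold and all parameter values are illustrative assumptions) applies u(x) = u_m for x ≥ x_thr, else 0, to the model (6.3.1); the symmetrized (Stratonovich) equation is integrated in its equivalent Ito form, which adds the drift term Bx:

```python
import math
import random

# Monte Carlo sketch of a threshold harvesting rule for the logistic
# model (6.3.1) and the discounted profit functional (6.3.3).
def discounted_profit(x_thr, r=1.0, K=1.0, q=1.0, B=0.05, um=0.5,
                      p=2.0, c=0.2, delta=0.3, dt=1e-3, T=40.0, seed=7):
    rng = random.Random(seed)
    x, J = 1.0, 0.0
    for n in range(int(T / dt)):
        u = um if x >= x_thr else 0.0
        t = n * dt
        J += math.exp(-delta * t) * (p * q * x - c) * u * dt
        # Ito form of the symmetrized equation: extra drift B*x
        x += (r * x * (1 - x / K) - q * u * x + B * x) * dt \
             + math.sqrt(2 * B * dt) * x * rng.gauss(0.0, 1.0)
        x = max(x, 0.0)
    return J

J_hat = discounted_profit(x_thr=0.5)
assert J_hat > 0.0
```

With p·q·x_thr > c, harvesting only occurs where the instantaneous profit rate is positive, so the estimate is positive by construction; scanning x_thr gives a crude picture of the switch point discussed below.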
6.3.2. Solution of problem (6.3.1)-(6.3.3). By analogy with §2.4 and on the basis of the results obtained in [113, 175], we can assert that the maximum value of the functional (6.3.3) (that is, the cost function)

$$ F(x) = \max_{0\le u(t)\le u_m}\mathsf{E}\Big[\int_0^\infty e^{-\delta t}\big(pqx(t) - c\big)u(t)\,dt \;\Big|\; x(0) = x\Big], $$

considered as a function of the initial state x, is twice continuously differentiable and satisfies the following Bellman equation⁷ (F' = dF/dx, F'' = d²F/dx²):

$$ Bx^2F'' + x\Big(r + B - \frac{r}{K}x\Big)F' + \max_{0\le u\le u_m}\big[(pqx - c - qxF')\,u\big] - \delta F = 0. \qquad (6.3.4) $$
The cost function is defined only for nonnegative values of the variable x; for x = 0, it satisfies the natural boundary condition

$$ F(0) = 0, \qquad (6.3.5) $$

which is a straightforward consequence of (6.3.1) and (6.3.3) (indeed, it follows from (6.3.1) that if x(0) = 0, then x(t) ≡ 0 for all t > 0; hence, it follows from (6.3.3) that in this case the optimal control has the form u_*(t) ≡ 0, and (6.3.3) implies (6.3.5)).

⁷Equation (6.3.4) is written with regard to the fact that the solution of the stochastic equation (6.3.1) is understood in the symmetrized sense (see §1.2 and [174]).

First, note that for δ > r + B and K → ∞, Eq. (6.3.4) has the exact solution (obtained in §2.4)
$$ F(x) = \begin{cases} A\,x^{k_1^0}, & 0 \le x \le x_0,\\[6pt] u_m\Big(\dfrac{pqx}{\delta - r - B + qu_m} - \dfrac{c}{\delta}\Big) + D\,x^{k_2^u}, & x > x_0, \end{cases} \qquad (6.3.6) $$

in which the constants A and D are fixed by the continuity of F, F', and F'' at the switch point x₀.
Here

$$ x_0 = \frac{c\,k_1^0 k_2^u\,(\delta - r - B + qu_m)}{pq\,\delta\,(k_1^0 - 1)(k_2^u - 1)} \qquad (6.3.7) $$
determines the switch point of the optimal control in the synthesis form
$$ u_*(x) = \begin{cases} 0, & 0 \le x < x_0,\\ u_m, & x \ge x_0, \end{cases} \qquad (6.3.8) $$
and the numbers k₁⁰ > 0 and k₂ᵘ < 0 in (6.3.6) and (6.3.7) can be written in terms of the parameters of problem (6.3.1)-(6.3.3) as

$$ k_1^0 = \frac{1}{2B}\Big(-r + \sqrt{r^2 + 4\delta B}\Big), \qquad k_2^u = \frac{1}{2B}\Big(qu_m - r - \sqrt{(qu_m - r)^2 + 4\delta B}\Big). $$
For an arbitrarily chosen value of the parameter (the medium capacity) K > 0, it is impossible to find the solution of Eq. (6.3.4) in the form of
finite formulas like (6.3.6) and (6.3.7). Nevertheless, as is shown below, constructive methods for solving the synthesis problem can be found in this case as well. Let us construct a solution of Eq. (6.3.4). First, we note that it follows from (6.3.4) that the optimal control takes only the boundary values u = 0 and u = u_m of the set [0, u_m] of admissible controls. The choice of one of these values is determined by the sign of the expression γ(x) = pqx − c − qxF'(x). If γ(x) = 0, then the choice of control is not determined formally by Eq. (6.3.4). However, one can see that in this case the choice of any admissible value of u does not affect the solution of Eq. (6.3.4), since the nonlinear term of Eq. (6.3.4) vanishes for γ(x) = 0 and any admissible u. Therefore, we can write the optimal control in the form

$$ u_*(x) = \begin{cases} 0, & \gamma(x) < 0,\\ u_m, & \gamma(x) > 0. \end{cases} $$
If the equation γ(x) = 0 has a single root x_*, then the optimal control can be written in the form

$$ u_*(x) = \begin{cases} 0, & 0 \le x < x_*,\\ u_m, & x_* \le x, \end{cases} \qquad (6.3.9) $$

similar to (6.3.8), where the coordinate of the switch point x_* is determined by the equation

$$ pqx - c - qxF'(x) = 0, \qquad (6.3.10) $$
whose solution can be obtained after the cost function F(x) is calculated. By F₀(x) and F₁(x) we shall denote the cost function F(x) on either side of the switch point x_*. Then, as follows from (6.3.4) and (6.3.9), instead of the one nonlinear equation (6.3.4), we have two linear equations for F₀ and F₁:

$$ Bx^2F_0'' + x\Big(r + B - \frac{r}{K}x\Big)F_0' - \delta F_0 = 0, \qquad 0 < x < x_*, \qquad (6.3.11) $$

$$ Bx^2F_1'' + x\Big(r + B - qu_m - \frac{r}{K}x\Big)F_1' - \delta F_1 = u_m(c - pqx), \qquad x_* < x. \qquad (6.3.12) $$
Since the cost function F(x), as the solution of the Bellman equation (6.3.4), is twice continuously differentiable for all x ∈ [0, ∞), the functions F₀ and F₁ satisfy the boundary condition (6.3.10) at the switch point x_*. Moreover, it follows from (6.3.5) that F₀(0) = 0. These boundary conditions allow us
to obtain the unique solution of Eqs. (6.3.11) and (6.3.12), and thus, for all x ∈ [0, ∞), to construct the cost function F(x) satisfying the Bellman equation (6.3.4). We shall seek the solution of Eq. (6.3.11) as the generalized power series

$$ F_0(x) = x^\sigma\big(a_0 + a_1x + a_2x^2 + \cdots\big). \qquad (6.3.13) $$
By substituting the series (6.3.13) into (6.3.11) and setting the coefficients of x^σ, x^{σ+1}, . . . equal to zero, we obtain the following system for the characteristic exponent σ and the coefficients a_i, i = 0, 1, 2, . . . :

$$ \big[B\sigma(\sigma - 1) + (r + B)\sigma - \delta\big]\,a_0 = 0, $$

$$ \big[B(\sigma + n)(\sigma + n - 1) + (r + B)(\sigma + n) - \delta\big]\,a_n = \frac{r}{K}\,(\sigma + n - 1)\,a_{n-1}, \qquad n = 1, 2, 3, \ldots \qquad (6.3.14) $$
If we set a₀ ≠ 0, then the first relation in (6.3.14) implies the characteristic equation

$$ B\sigma^2 + r\sigma - \delta = 0, $$

whose roots

$$ \sigma^{1,2} = \frac{1}{2B}\Big(-r \pm \sqrt{r^2 + 4\delta B}\Big) $$

determine two possible values of the characteristic exponent. Since F₀(0) = 0, only the positive root σ¹ = k₁⁰ is admissible, and the solution of Eq. (6.3.11) takes the form

$$ F_0(x) = a_0\,\psi(x), \qquad (6.3.15) $$

where ψ(x) is the sum of the generalized power series

$$ \psi(x) = x^{k_1^0}\bigg[1 + \sum_{n=1}^{\infty}\Big(\frac{r}{KB}\Big)^n x^n \prod_{j=1}^{n}\frac{B\,(k_1^0 + j - 1)}{j\,\big(Bj + \sqrt{r^2 + 4\delta B}\big)}\bigg]. \qquad (6.3.16) $$
The coefficients of the series (6.3.16) decay faster than those of an exponential series. Thus, the series (6.3.16) converges for any finite x > 0, we can differentiate it term by term, and its sum ψ(x) is an entire analytic function satisfying the estimate

$$ \psi(x) \le x^{k_1^0}\exp\Big(\frac{rx}{KB}\Big). $$
The constant a₀ in (6.3.15) can be found from the boundary condition (6.3.10) for the function F₀ at the switch point x_*. Hence we have the following final expression for the solution of Eq. (6.3.11):

$$ F_0(x) = \Big(p - \frac{c}{qx_*}\Big)\frac{\psi(x)}{\psi'(x_*)}. \qquad (6.3.17) $$
The nonhomogeneous equation (6.3.12) is of the same type as Eq. (6.3.11) and its solution can also be expressed in terms of generalized power series.
It is well known that the general solution of the nonhomogeneous equation (6.3.12) is the sum of the general solution of the homogeneous equation

$$ Bx^2F_1'' + x\Big(r + B - qu_m - \frac{r}{K}x\Big)F_1' - \delta F_1 = 0 \qquad (6.3.18) $$
and any particular solution of Eq. (6.3.12). Equation (6.3.18) is similar to Eq. (6.3.11), and therefore, its solution can be constructed by analogy with the above procedure (6.3.13)-(6.3.17). Performing the required calculations, we obtain the following expression for the general solution of Eq. (6.3.18):

$$ F_1(x) = c_1\psi_1(x) + c_2\psi_2(x). \qquad (6.3.19) $$

Here c₁ and c₂ are arbitrary constants, and the functions ψ₁(x) and ψ₂(x) are the sums of generalized power series of the form

$$ \psi_i(x) = x^{k_i}\Big(1 + \sum_{n=1}^{\infty} b_n^{(i)}x^n\Big), \qquad i = 1, 2, \qquad (6.3.20),\ (6.3.21) $$

whose coefficients b_n^{(i)} are determined by the recursion analogous to (6.3.14) (with r + B replaced by r + B − qu_m), and where the numbers k₁, k₂, and α are determined by the expressions

$$ k_{1,2} = \frac{1}{2B}\Big(qu_m - r \pm \sqrt{(qu_m - r)^2 + 4\delta B}\Big), \qquad \alpha = k_1 - k_2 = \frac{1}{B}\sqrt{(qu_m - r)^2 + 4\delta B}. \qquad (6.3.22) $$
Note that the series (6.3.20) for any finite x can be majorized by a convergent numerical series. Therefore, the series (6.3.20) can be differentiated and integrated term by term, and its sum ψ₁(x) is an entire function. Similar statements hold for the series (6.3.21) only if α ≠ n (where n is a positive integer); in what follows, we assume that this inequality is satisfied. A particular solution of the nonhomogeneous equation (6.3.12) can be found by the standard procedure of variation of parameters. We write the desired particular solution Ψ as
$$ \Psi(x) = c_1(x)\psi_1(x) + c_2(x)\psi_2(x), \qquad (6.3.23) $$

where the functions c₁(x) and c₂(x) satisfy the condition

$$ c_1'(x)\psi_1(x) + c_2'(x)\psi_2(x) = 0. \qquad (6.3.24) $$
By substituting the expression (6.3.23) for F₁ into (6.3.12), after simple calculations with regard to (6.3.24) we obtain

$$ c_1(x) = \frac{u_m}{B}\int \frac{(c - pqx)\,\psi_2(x)}{x^2\big[\psi_2(x)\psi_1'(x) - \psi_1(x)\psi_2'(x)\big]}\,dx, \qquad (6.3.25) $$

$$ c_2(x) = \frac{u_m}{B}\int \frac{(pqx - c)\,\psi_1(x)}{x^2\big[\psi_2(x)\psi_1'(x) - \psi_1(x)\psi_2'(x)\big]}\,dx. \qquad (6.3.26) $$
Note that the expression in the square brackets in the integrands in (6.3.25) and (6.3.26) is (up to sign) the Wronskian of Eq. (6.3.12), which is nonzero for all x, since the solutions ψ₁(x) and ψ₂(x) are linearly independent. Therefore, we can readily calculate the integrals in (6.3.25) and (6.3.26) and thus find the functions c₁(x) and c₂(x) as generalized power series obtained by term-by-term integration in (6.3.25) and (6.3.26). Thus the general solution of the nonhomogeneous equation (6.3.12) has the form

$$ F_1(x) = c_1\psi_1(x) + c_2\psi_2(x) + \Psi(x), \qquad (6.3.27) $$

where Ψ(x) is given by (6.3.23), (6.3.25), and (6.3.26). To obtain the unique solution satisfying the Bellman equation (6.3.4) for x > x_*, we need to choose the arbitrary constants c₁ and c₂ in (6.3.27). To this end, we use the boundary condition (6.3.10) for the function F₁(x) at the switch point x_*. To obtain the second condition, we require that the functions F₀(x) and F₁(x) coincide, as K → ∞, with the known exact solution F(x) given by (6.3.6). It follows from (6.3.16), (6.3.17), (6.3.20), (6.3.21), (6.3.25), and (6.3.26) that this condition is satisfied if we set c₁ = 0. The condition (6.3.10) for the function F₁(x) at the point x_* implies

$$ c_2 = \frac{1}{\psi_2'(x_*)}\Big(p - \frac{c}{qx_*} - \Psi'(x_*)\Big). $$
Thus, the desired solution of the inhomogeneous equation (6.3.12) acquires the form

$$ F_1(x) = \frac{1}{\psi_2'(x_*)}\Big(p - \frac{c}{qx_*} - \Psi'(x_*)\Big)\psi_2(x) + \Psi(x), \qquad x > x_*. \qquad (6.3.28) $$
Formulas (6.3.17) and (6.3.28) determine the cost function F(x) that satisfies the Bellman equation (6.3.4) for all x ∈ [0, ∞). In these formulas, only the coordinate of the switch point x_* remains unknown. To find x_*, we use the condition that the cost function F(x) must be continuous at the switch point:

$$ F_0(x_*) = F_1(x_*), \qquad (6.3.29) $$
or, which is the same due to (6.3.10), the condition that the second-order derivative must be continuous:

$$ F_0''(x_*) = F_1''(x_*). \qquad (6.3.30) $$
Since the series (6.3.16) and (6.3.21) converge, we can calculate x_* with any prescribed accuracy and thus solve our equations numerically. Furthermore, for large values of the medium capacity K, formulas (6.3.29) and (6.3.30) give approximate analytic formulas for the switch point, and these formulas allow us to construct control algorithms that are close to the optimal control.

6.3.3. The calculation of x_* for large K. In the case K → ∞, the functions ψ(x), ψ₁(x), and ψ₂(x), as follows from (6.3.16), (6.3.20), and (6.3.21), are given by the finite formulas

$$ \psi(x) = x^{k_1^0}, \qquad \psi_1(x) = x^{k_1}, \qquad \psi_2(x) = x^{k_2^u}. $$

Correspondingly, instead of the series (6.3.15) and (6.3.28), we have

$$ F_0(x) = \frac{1}{k_1^0}\Big(px_* - \frac{c}{q}\Big)\Big(\frac{x}{x_*}\Big)^{k_1^0}, \qquad (6.3.31) $$

$$ F_1(x) = u_m\Big(\frac{pqx}{\delta - r - B + qu_m} - \frac{c}{\delta}\Big) + \frac{1}{k_2^u}\Big(\frac{(\delta - r - B)\,px_*}{\delta - r - B + qu_m} - \frac{c}{q}\Big)\Big(\frac{x}{x_*}\Big)^{k_2^u}. \qquad (6.3.32) $$
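As a consistency check of the K → ∞ formulas: the switch point (6.3.7) should equate the second derivatives of (6.3.31) and (6.3.32) at x = x_*. The parameter values below (chosen so that δ > r + B) are assumptions of the check:

```python
import math

# Smooth-pasting check for the limit case K -> infinity.
B, r, delta, q, um, p, c = 1.0, 0.5, 2.0, 1.0, 1.0, 1.0, 1.0
Delta = delta - r - B + q * um
k1 = (-r + math.sqrt(r * r + 4 * delta * B)) / (2 * B)              # k_1^0
k2 = (q * um - r - math.sqrt((q * um - r) ** 2 + 4 * delta * B)) / (2 * B)  # k_2^u

x0 = c * k1 * k2 * Delta / (p * q * delta * (k1 - 1) * (k2 - 1))    # (6.3.7)
assert x0 > 0

# second derivatives of (6.3.31), (6.3.32) at x_*, common factor 1/x^2 dropped
d2F0 = (p * x0 - c / q) * (k1 - 1)
d2F1 = ((delta - r - B) * p * x0 / Delta - c / q) * (k2 - 1)
assert abs(d2F0 - d2F1) < 1e-10 * max(1.0, abs(d2F0))
```

The equality holds identically in the parameters, which is exactly the statement that substituting (6.3.31) and (6.3.32) into (6.3.30) reproduces (6.3.7).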
By substituting (6.3.31) and (6.3.32) into (6.3.30), we obtain x_* = x₀, where x₀ is given by (6.3.7) (derived in §2.4). If the medium capacity K is a finite number, then the coordinate x_* cannot be written as a finite formula. However, it follows from continuity considerations that for large K the coordinate x_* is close to x₀, so that we can take x₀ as the first approximation to the root of Eqs. (6.3.29) and (6.3.30). The corrections refining this first approximation can then be calculated by the following scheme. For large K, the quantity ε = r/KB can be considered as a small parameter and, as follows from (6.3.15), (6.3.16), (6.3.20), (6.3.21), and (6.3.28), the functions F₀(x) and F₁(x) can be represented as power series in ε:

$$ F_0(x) = F_0^0(x) + \varepsilon F_0^1(x) + \varepsilon^2 F_0^2(x) + \cdots, \qquad (6.3.33) $$

$$ F_1(x) = F_1^0(x) + \varepsilon F_1^1(x) + \varepsilon^2 F_1^2(x) + \cdots. \qquad (6.3.34) $$
We also seek the root of Eqs. (6.3.29) and (6.3.30), that is, the coordinate x_*, as the series

$$ x_* = x_0 + \varepsilon\Delta_1 + \varepsilon^2\Delta_2 + \cdots, \qquad (6.3.35) $$
where the numbers x₀, Δ₁, Δ₂, . . . must be calculated. By substituting the expansions (6.3.33)-(6.3.35) into Eq. (6.3.29) (or (6.3.30)) and setting the coefficients of equal powers of the small parameter ε on the left- and right-hand sides equal to each other, we obtain a system of equations from which the numbers x₀, Δ₁, Δ₂, . . . in the expansion (6.3.35) are calculated successively. Obviously, the first term x₀ in (6.3.35) coincides with (6.3.7). To calculate the first correction Δ₁, in the expansions (6.3.33) and (6.3.34) we retain the zero-order and first-order terms and omit the terms of order ε² and higher. As a result, from (6.3.16), (6.3.17), (6.3.20), (6.3.21), and (6.3.28) we obtain the corresponding expressions for the functions F₀(x) and F₁(x) in the first approximation:
F_0(x) = F_0^0(x) + eps F_0^1(x),   (6.3.36)

F_1(x) = F_1^0(x) + eps F_1^1(x),   (6.3.37)

where the constant A_1 entering the explicit form of these expressions is determined by the problem parameters; its denominator contains the factor (r + B - delta)(4B - 2qu_m + 2r - delta).
By differentiating (6.3.36) and (6.3.37) twice, we rewrite Eq. (6.3.30) in the form (6.3.38). To calculate the first two terms of the expansion (6.3.35), we substitute the root x_* = x_0 + eps Delta_1 into Eq. (6.3.38) and collect the terms of the zero
and the first order with respect to the small parameter eps. If we retain only the zero-order terms in Eq. (6.3.38), then we can readily see that (6.3.38) implies formula (6.3.7) for x_0. Collecting the terms of order eps, from (6.3.38) we obtain the first correction Delta_1, given by formula (6.3.39).
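The structure of this step is the standard first-order perturbation of a root: if f0(x) + eps*f1(x) = 0 and f0(x0) = 0, then Delta_1 = -f1(x0)/f0'(x0). A minimal sketch on a model equation (the functions below are illustrative stand-ins, not the actual F_0 and F_1 of the text):

```python
import math

def perturbed_root_first_order(f0, df0, f1, x0, eps):
    """First-order approximation to the root of f0(x) + eps*f1(x) = 0,
    given the zero-order root x0 with f0(x0) = 0."""
    delta1 = -f1(x0) / df0(x0)  # collect the O(eps) terms of the expansion
    return x0 + eps * delta1

# model problem: f0(x) = x**2 - 4 (zero-order root x0 = 2), f1(x) = x
eps = 0.01
x_approx = perturbed_root_first_order(lambda x: x * x - 4, lambda x: 2 * x,
                                      lambda x: x, x0=2.0, eps=eps)
x_exact = (-eps + math.sqrt(eps * eps + 16)) / 2  # exact root of x**2 + eps*x - 4 = 0
assert abs(x_approx - x_exact) < 1e-4
```

The same pattern, with f0 and f1 read off from the expansions (6.3.33) and (6.3.34), reproduces the correction Delta_1 of (6.3.39).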
Thus, for large values of the parameter K (that is, for small eps), the coordinate x_0 given by (6.3.7) can be interpreted as the switch point in the zero approximation. Correspondingly, the formula

x_1 = x_0 + eps Delta_1,   (6.3.40)

where x_0 and Delta_1 are given by (6.3.7) and (6.3.39), determines the switch point in the first approximation. Let u_0(x) and u_1(x) denote the controls
u_i(x) = 0 for 0 <= x < x_i,   u_i(x) = u_m for x_i <= x,   i = 0, 1.   (6.3.41)
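A threshold control of the form (6.3.41) is straightforward to implement and to test by simulation. The sketch below runs it on an assumed logistic diffusion for the harvested population, with drift and noise chosen to match the generator appearing in (6.3.42)-(6.3.44); the model details and all parameter values are illustrative, not taken from the text:

```python
import math, random

def u_threshold(x, x_switch, u_m):
    """Bang-bang harvesting control (6.3.41): no harvest below the switch point."""
    return 0.0 if x < x_switch else u_m

def discounted_profit(x_switch, u_m=1.5, r=1.0, K=7.5, B=1.0, q=3.0,
                      p=2.0, c=3.0, delta=3.0, T=20.0, dt=1e-3, seed=0):
    """Discounted harvest profit along one Euler-Maruyama path of the assumed model
    dx = ((r + B - (r/K)x)x - q*u*x) dt + sqrt(2B) x dW."""
    rng = random.Random(seed)
    x, J = 1.0, 0.0
    for k in range(int(T / dt)):
        t = k * dt
        u = u_threshold(x, x_switch, u_m)
        J += math.exp(-delta * t) * u * (p * q * x - c) * dt
        x += ((r + B - (r / K) * x) * x - q * u * x) * dt \
             + math.sqrt(2.0 * B) * x * rng.gauss(0.0, math.sqrt(dt))
        x = max(x, 1e-9)  # the population size cannot become negative
    return J
```

Averaging such runs over many noise realizations and comparing the switch points x_0 and x_1 = x_0 + eps Delta_1 gives a direct way to check that I[u_1] >= I[u_0].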
Obviously, by using these algorithms to control system (6.3.1), we obtain values of the functional (6.3.3) smaller than its maximum value F(x), which is attained by the optimal control (6.3.9). However, it is natural to expect that this decrease in the value of the functional (6.3.3) is negligible for large K and, moreover, that the quasioptimal control u_1(x) is "better" than the zero-approximation algorithm u_0(x) in the sense that I[u_1] >= I[u_0].
6.3.4. Results of the numerical analysis. Our expectations are confirmed by the following results of the numerical analysis of the quasioptimal algorithms (6.3.41). By G_i(x) we denote the value of the functional (6.3.3) obtained by using the control u_i and a given initial population size x(0) = x. Then G_i(x) is a continuously differentiable function of the initial state x and satisfies the linear equation
Bx^2 G_i'' + (r + B - (r/K)x) x G_i' + (pqx - c - qx G_i') u_i(x) - delta G_i = 0,   G_i(0) = 0.   (6.3.42)
Denoting by G_i0(x) and G_i1(x), just as in Section 6.3.2, the values of the function G_i(x) on either side of the switch point x_i, we obtain the following equations for G_i0 and G_i1 from (6.3.42):
Bx^2 G_i0'' + (r + B - (r/K)x) x G_i0' - delta G_i0 = 0,   0 < x < x_i,   (6.3.43)

Bx^2 G_i1'' + (r + B - qu_m - (r/K)x) x G_i1' - delta G_i1 = u_m(c - pqx),   x_i < x,   (6.3.44)
which are quite similar to Eqs. (6.3.11) and (6.3.12). Therefore, the general solutions of these equations, by analogy with Section 6.3.2, have the form
G_i0 = c~_1 psi(x),   G_i1 = c~_2 psi_2(x) + Phi(x),   (6.3.45)
where the functions psi(x), psi_2(x), and Phi(x) are given by formulas (6.3.16), (6.3.20), (6.3.21), (6.3.23), (6.3.25), and (6.3.26). The functions (6.3.45) differ from the corresponding functions (6.3.17) and (6.3.28) in Section 6.3.2 by the method used for calculating the constants c~_1 and c~_2 in (6.3.45). In Section 6.3.2 the corresponding constants (a_0 in (6.3.15) and c_1, c_2 in (6.3.27)) were determined by the condition (6.3.10) at an unknown switch point x_*, while in Eqs. (6.3.42) the switch point x_i was given in advance either by (6.3.7) with i = 0 or by (6.3.40) with i = 1. By substituting (6.3.45) into the relations G_i0(x_i) = G_i1(x_i) and G_i0'(x_i) = G_i1'(x_i),8 we obtain the following formulas for the coefficients c~_1 and c~_2 in (6.3.45):
c~_2 = [Phi'(x_i) psi(x_i) - Phi(x_i) psi'(x_i)] / [psi_2(x_i) psi'(x_i) - psi_2'(x_i) psi(x_i)],
c~_1 = [c~_2 psi_2(x_i) + Phi(x_i)] / psi(x_i).   (6.3.46)
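The coefficients (6.3.46) are just the solution of the 2x2 linear system expressing continuity of the function and of its first derivative at x_i. A sketch (the psi, psi_2, Phi below are placeholder functions, not the actual series of the text):

```python
def matching_constants(psi, dpsi, psi2, dpsi2, Phi, dPhi, xi):
    """Solve c1*psi(xi) = c2*psi2(xi) + Phi(xi) and
    c1*psi'(xi) = c2*psi2'(xi) + Phi'(xi) for (c1, c2)."""
    denom = psi2(xi) * dpsi(xi) - dpsi2(xi) * psi(xi)
    c2 = (dPhi(xi) * psi(xi) - Phi(xi) * dpsi(xi)) / denom
    c1 = (c2 * psi2(xi) + Phi(xi)) / psi(xi)
    return c1, c2

# placeholder example: psi = x**2, psi2 = 1/x, Phi = x, matched at xi = 1
c1, c2 = matching_constants(lambda x: x ** 2, lambda x: 2 * x,
                            lambda x: 1 / x, lambda x: -1 / x ** 2,
                            lambda x: x, lambda x: 1.0, xi=1.0)
assert abs(c1 * 1.0 - (c2 * 1.0 + 1.0)) < 1e-12    # values match at xi
assert abs(c1 * 2.0 - (c2 * (-1.0) + 1.0)) < 1e-12  # derivatives match at xi
```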
By choosing specific numerical values of the parameters r, K, ..., u_m in problem (6.3.1)-(6.3.3), one can calculate the coefficients (6.3.46) and thus construct the plots of the functions G_i(x), i = 0, 1, by using computers. We also note that the same formulas (6.3.45) and (6.3.46) can be used for the numerical calculation of the cost function F(x) satisfying the Bellman equation (6.3.4). To this end, it suffices first to calculate the root of Eq. (6.3.29) (or (6.3.30)) and then to substitute the obtained value into (6.3.46) instead of x_i. In this case, the functions G_i0(x) and G_i1(x) given by (6.3.45)
8 These formulas follow from the condition that the solutions G_i(x) of Eqs. (6.3.42) are continuously differentiable.
coincide, respectively, with the functions F_0(x) and F_1(x) given by (6.3.17) and (6.3.28); that is, we have G_i(x) = F(x). The above-described procedure for numerically constructing the functions G_0(x), G_1(x), and F(x) was realized in the form of software and was used in numerical experiments for estimating the quality of the quasioptimal control algorithms u_0(x) in the zero approximation and u_1(x) in the first approximation. Some results of these experiments are shown in Figs. 53 and 54, where the cost function F(x) is plotted by solid curves and the functions G_0(x) and G_1(x) by dot-and-dash and dashed curves, respectively.
FIG. 53
In Fig. 53 these curves are constructed for two values of the parameter K: K = 7.5 and K = 11; the other parameters of problem (6.3.1)-(6.3.3) are r = 1, delta = 3, B = 1, q = 3, u_m = 1.5, c = 3, and p = 2. In this case, the variable eps = r/KB treated as a small parameter in the expansions (6.3.33)-(6.3.35) attains the values eps = 0.091 (the upper group of curves) and eps = 0.133 (the lower group of curves). Figure 53 shows that in this case all three curves F(x), G_0(x), and G_1(x) within the same group of parameters are sufficiently close to each other. Hence, the use of the quasioptimal algorithms (6.3.41) ensures a control quality close to that
of the optimal control (obviously, the first-approximation control u_1(x) is
preferable to the zero-approximation control u_0(x), since the mean cost G_1(x) corresponding to u_1(x) is closer to the optimal cost F(x)).

FIG. 54

It is of interest to point out that an improvement in the control quality can be obtained by using u_1(x) instead of u_0(x) even if the parameter
eps = r/KB is not small. This phenomenon is clearly illustrated by the results of calculations shown in Fig. 54, where the curves F(x), G_0(x), and G_1(x) are drawn for the following parameters of problem (6.3.1)-(6.3.3): r = 1, delta = 20, B = 1, q = 3, u_m = 100, c = 3, p = 2, K = 6, and eps = 0.17. Many times in Chapters III, V, and VI we have considered similar situations, in which the formal use of the approximate synthesis procedure developed for problems with a small parameter eps << 1 provides satisfactory results for eps ~ 1. Thus we see that the small parameter methods and the related methods of successive approximations are very effective tools for the investigation and solution of various specific practical problems of optimal control.
CHAPTER VII
NUMERICAL SYNTHESIS METHODS
Numerical synthesis methods are, in general, the most universal of all methods for solving problems of optimal control, since numerical methods are largely insensitive to the particular conditions of the problem. Indeed, each of the approximate methods described in Chapters III-VI is intended for solving optimal control problems from a certain class characterized by the singularities of the plant dynamics equations, by small parameters, etc. The choice of the method for obtaining quasioptimal control algorithms essentially depends on the specific features of the control problem considered.
On the other hand, if the control problem is solved, just as in the present book, by the dynamic programming method, then the possibility of solving the synthesis problem numerically is determined by whether a numerical solution of the Bellman equation corresponding to the problem in question can be constructed. The type of this Bellman equation is determined by the character of the problem considered. Thus, the majority of the stochastic synthesis problems studied in Chapters II-VI correspond to Bellman equations in the form of nonlinear second-order partial differential equations of parabolic type. Correspondingly, the Bellman equations for deterministic synthesis problems are nonlinear first-order partial differential equations of advection type.
Equations of both types have been thoroughly studied. Such equations arise in many problems of mathematical physics and mechanics of continuous media, in modeling chemical and biological processes, etc. Hence, numerous numerical methods have been developed for solving such equations,1 many of which are realized as standard programs that are parts of well-known software packages such as MATLAB, Mathematica, and some others.
1 It should be noted that numerical methods have been developed mostly for second-order parabolic equations; nonlinear advection equations have been studied less thoroughly. However, many papers dealing with the qualitative analysis and numerical solution of such equations have appeared recently. Here we mention the Italian school of mathematicians (M. Falcone, R. Ferretti, and others), who studied various discrete schemes that allow the construction of numerical solutions for various types of nonlinear advection equations, including those with discontinuous solutions [10, 31, 48, 49, 53].
It should be noted that the existing software can rather seldom be used for solving synthesis problems in practice. This fact is related to some
peculiar features of the Bellman equations (see Section 3.5 in [34]), which make the application of standard numerical methods rather difficult. For example, the difficulties arising in solving Bellman equations of high dimension are well known. Furthermore, an obstacle known as the "boundary difficulty" is often encountered in the numerical solution of synthesis problems. Obviously, any numerical procedure allows us to construct the solution of the Bellman equation only in a bounded region D where the arguments of the loss function vary. Therefore, if, for example, we solve a Bellman equation of parabolic type, then we need to pose the initial and boundary conditions on the boundary of D. At the same time, many optimal control problems do not contain any restrictions on the phase coordinates (in this case, to solve the synthesis problem, we need to solve the Cauchy problem for the Bellman equation). Thus, for a reasonable choice of the boundary conditions required for the numerical solution of the problem, we need, in addition, to study the asymptotic behavior of the loss function at infinity. These problems are considered in more detail in Section 7.1.
In Sections 7.1 and 7.2 we show how one of the most widely used methods for solving partial differential equations numerically (known as the grid function method) can be applied to the numerical solution of some specific optimal control problems studied in the previous chapters by other methods.

§7.1. Numerical solution of the problem of optimal damping of random oscillations
The main results of this section are related to the numerical solution of the problem of optimal damping of random oscillations in a linear oscillator; this problem was studied in §3.2 and §3.4. However, we begin with some general questions concerning methods for stating the boundary conditions for the loss function when the synthesis problem is solved numerically.
7.1.1. Choice of the boundary conditions for the loss function. Let us consider a control system governed by the Ito differential equation

dx(t) = [a(t, x) + q(t)u] dt + sigma(t, x) d_0 eta(t),   0 <= t <= T,   x(0) = x_0.   (7.1.1)
Here x = x(t) is an n-dimensional vector of phase variables, u = u(t) is an r-dimensional vector of controlling actions, eta(t) is a d-dimensional vector of independent Wiener processes of unit intensity, a(t, x) is an n-dimensional vector of given functions, and q(t) and sigma(t, x) are given n x r and n x d matrices.
We assume that the admissible control actions are subject to constraints of the form

u(t) in U,   (7.1.2)

where U is a given closed bounded set in R^r. If the vector of current phase variables x(t) can be measured exactly, then we need to construct a control function u_* = u_*(t, x(t)), 0 <= t <= T,
in the synthesis form so that, for any given initial state x(0) = x_0, the function u_* minimizes the following functional defined on the trajectories of Eq. (7.1.1):

I[u] = E[ int_0^T c(x(t)) dt + psi(x(T)) ]   (7.1.3)

(here E[.] is the mathematical expectation, c(x), psi(x) >= 0 are given penalty functions, and [0, T] is a given time interval). According to §1.4, the dynamic programming approach allows one to
reduce problem (7.1.1)-(7.1.3) to solving the partial differential equation (the Bellman equation)

LF + min_{u in U} [u^T q^T F_x] = -c(x),   0 <= t < T,   F(T, x) = psi(x),   (7.1.4)

L = d/dt + a^T(t, x)(d/dx) + (1/2) Sp[sigma(t, x) sigma^T(t, x)(d^2/dx^2)].

Here F = F(t, x) is the loss function determined as usual by

F(t, x) = min_{u(s) in U, t <= s <= T} E{ int_t^T c(x(s)) ds + psi(x(T)) | x(t) = x }.   (7.1.5)
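For the scalar box constraint |u| <= u_m used later in this section, the minimization in (7.1.4) is available in closed form: min over |u| <= u_m of u*g equals -u_m|g|, attained at u = -u_m sign(g). This is exactly how the term u_m|F_y| arises below. A minimal sketch:

```python
def min_over_box(g, u_m):
    """Minimize u*g over |u| <= u_m; return the minimum and the minimizer."""
    u_star = -u_m if g > 0 else (u_m if g < 0 else 0.0)
    return -u_m * abs(g), u_star

val, u_star = min_over_box(2.5, 1.5)
assert val == -3.75 and u_star == -1.5
```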
Equation (7.1.4) is a semilinear (linear with respect to the higher-order derivatives) equation of parabolic type, and we shall try to solve it numerically by using different versions of well-studied finite-difference procedures of the grid function methods (the grid methods) [135, 162, 163, 179]. However, these calculational schemes allow one to obtain the solution only in a bounded domain D of the phase variables x. To apply these methods, we need to impose boundary conditions on the loss function F(t, x) on the boundary of D. Since in the initial statement of the problem it is
assumed that the solution of Eq. (7.1.4) must be defined on the unbounded phase space (x in R^n), the boundary conditions for F(t, x) require a special analysis if Eq. (7.1.4) is solved numerically. A possible method for overcoming the boundary indeterminacy in stochastic time-optimal control problems was proposed in [85].
For the problem considered here, the essence of the method suggested in [85] consists in the following. Suppose that it is required to construct a numerical solution of Eq. (7.1.4) in a bounded region D. Let us consider a sequence of expanding bounded regions D_R containing D in the phase space (D_R can be the n-dimensional ball of radius R or the n-dimensional cube with edge R centered at the origin). Then the desired solution F(t, x) is defined in the region D as the limit of the sequence of numerical solutions of the boundary value problems for Eq. (7.1.4) in the regions D_R, corresponding to an increasing sequence of values of the parameter R. In this case, the boundary conditions posed on the boundaries of the regions D_R can be arbitrary (for example, the zero conditions F(t, x) = 0 on the boundary of D_R).
However, in practice, the use of this procedure in the numerical synthesis requires an extremely large amount of calculations. For example, already
for the second-order system (7.1.1) (x in R_2) this method is unacceptable, since the time required to compute the solution is too large.
Here we present a more economical numerical method based on the use of the asymptotic behavior of the loss function for large |x|. In this case, we need an a priori estimate of the asymptotic behavior of F(t, x) satisfying (7.1.4) as |x| -> infinity. Suppose that q(t) is a piecewise continuous bounded function for all t in [0, T] and that a(t, x) and sigma(t, x) are continuous in x, Borel functions in (t, x), and satisfy the conditions

|a(t, x) - a(t, y)| + ||sigma(t, x) - sigma(t, y)|| <= N|x - y|,
|a(t, x)| + ||sigma(t, x)|| <= N(1 + |x|)   (7.1.6)

for all x, y in R^n and t in [0, T], where N > 0 is a constant, |a| is the Euclidean norm of the vector a, and ||sigma|| = (Sp sigma sigma^T)^{1/2}. We assume that the penalty functions c(x), psi(x) >= 0 are continuous and satisfy the condition
N_1|x|^m <= c(x), psi(x) <= N_2(1 + |x|)^m   (7.1.7)

for all x in R^n and some m, N_1, N_2 > 0; furthermore,

|c(x) - c(y)| + |psi(x) - psi(y)| <= N(1 + R)^{m-1}|x - y|   (7.1.8)

for all R > 0 and x, y in S_R (S_R is a ball of radius R in R^n).
By using Theorem IV.1.1 in [113], one can show that the conditions (7.1.6) and (7.1.8), together with the upper estimates in (7.1.7), guarantee that the function F(t, x) of problem (7.1.1)-(7.1.3) has generalized first-order derivatives in x and that the estimate

|F_x(t, x)| <= N_3(1 + |x|)^{m-1}   (7.1.9)

holds for any t in [0, T] and almost all x. The lower bounds for the penalty functions in (7.1.7) and the continuity of the phase trajectories x(t) imply a lower estimate for the loss function: F(t, x) grows at least as |x|^m as |x| -> infinity (7.1.10).
Let F0(t, x) denote the solution of the linear equation

L F0 = -c(x),   0 <= t < T,   F0(T, x) = psi(x)   (7.1.11)

(L is the operator in (7.1.4)). Obviously, F0 is the value of the functional

F0(t, x) = E{ int_t^T c(x(s)) ds + psi(x(T)) | x(t) = x }.   (7.1.12)
This functional is calculated on the trajectories of system (7.1.1) corresponding to the uncontrolled motion (the averaging in (7.1.12) is performed over the set of sample paths x(s), t <= s <= T, issued from a given point x(t) = x and satisfying the stochastic differential equation (7.1.1) with u = 0). It follows from (7.1.4) and (7.1.11) that the difference G(t, x) = F0(t, x) - F(t, x) satisfies the equation

LG = Phi(t, F_x),   0 <= t < T,   G(T, x) = 0.   (7.1.13)
Here Phi denotes the nonlinear function Phi(t, F_x) = -min_{u in U}[u^T q^T F_x]. Since the set U of admissible controls and the function q(t) are bounded, we have the estimate

|Phi(t, F_x)| <= N|F_x|.   (7.1.14)
If the transition probability density of the uncontrolled Markov process x(s) satisfying Eq. (7.1.1) for u = 0 is denoted by p(x, t; y, s) (s > t), then we can write the solutions of Eqs. (7.1.11) and (7.1.13) in quadratures (see (3.4.13)). In particular, for the function G we have

G(t, x) = int_t^T ds int_{R^n} Phi(s, F_y(s, y)) p(x, t; y, s) dy.   (7.1.15)
This relation and (7.1.9) imply an upper bound for the difference G = F0 - F similar to (7.1.9), |G(t, x)| <= N_4(1 + |x|)^{m-1}, and hence

G(t, x)/F(t, x) = [F0(t, x) - F(t, x)]/F(t, x) -> 0   (7.1.16)
as |x| -> infinity. This condition allows us to use F0(t, x) as the asymptotics of the loss function F(t, x) when solving the Bellman equation (7.1.4) numerically. In some cases, for instance, in the example considered below, we succeed in obtaining a finite analytic formula for the function F0(t, x).
7.1.2. Numerical solution of a specific problem. We shall discuss
the method of numerical synthesis in more detail for the problem of optimal damping of random oscillations studied in §3.2 and §3.4. Suppose that the plant to be controlled is a linear oscillator with one degree of freedom governed by an equation of the form

x'' + beta x' + x = u + sqrt(2B) xi(t),   |u| <= u_m,   (7.1.17)

where xi(t) is the scalar standard white noise (1.1.31), u is a scalar control, and beta, B, and u_m are given positive numbers (beta < 2). By setting the penalty functions c(x(t)) = x^2(t) + x'^2(t) and psi(x) = 0 in (7.1.3), we obtain the Bellman equation

F_t + yF_x - (x + beta y)F_y + BF_yy = -x^2 - y^2 + u_m|F_y|,   -infinity < x, y < +infinity,   0 <= t < T,   (7.1.18)
for the loss function F(t, x, y) (here x and y = x' are the phase variables). By passing to the reverse time rho = T - t, we can rewrite (7.1.18) as the standard Cauchy problem for a semilinear parabolic equation. Using the old notation t for the reverse time rho, we rewrite (7.1.18) as

F_t = BF_yy + yF_x - (x + beta y)F_y - u_m|F_y| + x^2 + y^2,   0 < t <= T,   F(0, x, y) = 0.   (7.1.19)
We shall seek the numerical solution of Eq. (7.1.19) in the square region D (-L <= x <= L, -L <= y <= L) of the phase variables (see Fig. 55). We need to pose boundary conditions for the function F(t, x, y) on the boundary of D. It follows from (7.1.17) that the phase trajectories lying in
FIG. 55

the interior of D cannot terminate on the boundary segments BC and ED indicated by dashed lines in Fig. 55. Therefore, we need not pose boundary
conditions on these segments; on the other parts of the boundary, as follows from Section 7.1.1, the boundary conditions are posed with the aid of the asymptotics F0(t, x, y) satisfying the linear equation

F0_t = B F0_yy + y F0_x - (x + beta y) F0_y + x^2 + y^2,   0 < t <= T,   F0(0, x, y) = 0.   (7.1.20)
Up to the notation, Eq. (7.1.20) coincides with Eq. (3.4.23) whose solution was obtained in §3.4 as the finite formula (3.4.29). Rewriting (3.4.29) with
regard to the notation used in the present problem, we obtain the solution of Eq. (7.1.20) in the form
a finite formula for F0(t, x, y): a combination of e^{-beta t} with sin 2 delta t and cos 2 delta t whose coefficients are quadratic forms in x and y (for example, cos 2 delta t enters with the factor x^2(beta^2 - 2) + 2 beta xy + 2y^2 - B beta), where delta = (1 - beta^2/4)^{1/2}.   (7.1.21)
Formula (7.1.21) allows us to pose the boundary conditions for the desired function F = F(t,x,y) on the unhatched parts of the boundary
of D = {-L <= x, y <= +L}. To this end, we set F(t, x, y) = F0(t, -L, y) on AB, F = F0(t, x, L) on CF, F = F0(t, L, y) on EF, and F = F0(t, x, -L) on AD.   (7.1.22)
Let us construct a uniform grid in the domain H_T = {D x [0, T]} = {(x, y, t): -L <= x, y <= L, 0 <= t <= T}. By F^k_{ij} we denote the value of the function F(t, x, y) at the point with coordinates t = k tau, x = ih, y = jh, where h and tau are the approximation steps in the coordinates x, y and in the time t, and i, j, k are integer-valued variables with -Q <= i <= +Q, -Q <= j <= +Q, and 0 <= k <= K = T/tau (Qh = L).
It follows from (7.1.19) that for k = 0 we must set

F^0_{ij} = 0,   -Q <= i, j <= Q,   (7.1.23)

at all nodes of the grid.
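With the numerical values used later in this section (L = 3, h = 0.1, tau = 0.01, T = 1), the grid sizes are Q = 30 and K = 100. A minimal indexing sketch (note the use of round instead of int to avoid floating-point truncation):

```python
L, T_total = 3.0, 1.0
h, tau = 0.1, 0.01
Q = round(L / h)          # int(3.0/0.1) would give 29 due to rounding
K = round(T_total / tau)
# one time layer: F_layer[i][j] ~ F(k*tau, (i - Q)*h, (j - Q)*h), cf. (7.1.23)
F_layer = [[0.0] * (2 * Q + 1) for _ in range(2 * Q + 1)]
assert Q == 30 and K == 100 and len(F_layer) == 61
```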
For the difference approximation of Eq. (7.1.19) we use a locally one-dimensional solution method (a lengthwise-transverse scheme) [163]. In this case the complete approximation scheme consists in solving the following two one-dimensional (with respect to the phase coordinates) equations successively:

v_t = y v_x + x^2,   (7.1.24)

V_t = -(x + beta y + u_m sign V_y) V_y + B V_yy + y^2.   (7.1.25)
Each of Eqs. (7.1.24) and (7.1.25) is replaced by a two-layer difference scheme defined by the three-point pattern (Eq. (7.1.24)) or by the four-point pattern (Eq. (7.1.25)). In this case, since the parts of the boundary of D indicated by dashed lines in Fig. 55 are inaccessible, we approximate v_x = dv/dx by the right difference derivative for y > 0 (j > 0) and by the left difference derivative for y < 0 (j < 0). The derivatives V_y = dV/dy and V_yy = d^2V/dy^2 are approximated by the central difference derivatives:
v_x ~ (v^k_{i+1,j} - v^k_{ij})/h,   j >= 0,   -Q <= i <= Q - 1,
v_x ~ (v^k_{ij} - v^k_{i-1,j})/h,   j < 0,   -Q + 1 <= i <= Q,
V_y ~ (V^k_{i,j+1} - V^k_{i,j-1})/(2h),   V_yy ~ (V^k_{i,j+1} - 2V^k_{ij} + V^k_{i,j-1})/h^2,   -Q + 1 <= j <= Q - 1.

The grid functions v and V in the difference approximations of Eqs. (7.1.24) and
(7.1.25) are related as follows: F^k_{ij} = v^k_{ij}, v^{k+1}_{ij} = V^k_{ij}, and V^{k+1}_{ij} = F^{k+1}_{ij}. Moreover, since the time step is assumed to be small (we take tau = 0.01), in the difference approximation of Eq. (7.1.25) we can use the sign of the derivative on the preceding layer instead of sign(V^{k+1}_{i,j+1} - V^{k+1}_{i,j-1}); that is, we shall use u~_{ij} = sign(v^{k+1}_{i,j+1} - v^{k+1}_{i,j-1}) instead of sign V_y (a similar replacement was performed in [34, 86]). It follows from the preceding that the difference approximation transforms Eqs. (7.1.24) and (7.1.25) into the following three difference equations:
(v^{k+1}_{ij} - v^k_{ij})/tau = jh (v^{k+1}_{i+1,j} - v^{k+1}_{ij})/h + (ih)^2,   0 <= j <= Q - 1,   -Q <= i <= Q - 1,   (7.1.26)

(v^{k+1}_{ij} - v^k_{ij})/tau = jh (v^{k+1}_{ij} - v^{k+1}_{i-1,j})/h + (ih)^2,   -Q + 1 <= j < 0,   -Q + 1 <= i <= Q,   (7.1.27)

(V^{k+1}_{ij} - V^k_{ij})/tau = -(ih + beta jh + u_m u~_{ij})(V^{k+1}_{i,j+1} - V^{k+1}_{i,j-1})/(2h) + B(V^{k+1}_{i,j+1} - 2V^{k+1}_{ij} + V^{k+1}_{i,j-1})/h^2 + (jh)^2,   -Q + 1 <= j <= Q - 1.   (7.1.28)

Formulas (7.1.26) and (7.1.27), together with the boundary conditions (7.1.22) and the initial conditions (7.1.23), allow us to calculate the functions v^{k+1}_{ij} recurrently at all nodes of the grid. Indeed, rewriting (7.1.26) and (7.1.27) in the form

v^{k+1}_{ij} = [v^k_{ij} + j tau v^{k+1}_{i+1,j} + tau (ih)^2]/(1 + j tau),   j >= 0,   (7.1.29)

v^{k+1}_{ij} = [v^k_{ij} - j tau v^{k+1}_{i-1,j} + tau (ih)^2]/(1 - j tau),   j < 0,   (7.1.30)
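The recurrences (7.1.29) and (7.1.30) are an implicit upwind sweep along x, marching inward from the boundary where the asymptotics F0 supplies the value. A sketch for one grid row y = jh (boundary values are passed in; in the text they come from (7.1.21)):

```python
def x_sweep_row(v_old, j, h, tau, left_bc, right_bc):
    """Advance v_t = y*v_x + x**2 on the row y = j*h by one implicit upwind step.
    v_old[i] holds v^k at x = (i - Q)*h, i = 0..2Q; returns the row v^{k+1}."""
    Q = (len(v_old) - 1) // 2
    v_new = [0.0] * len(v_old)
    if j >= 0:   # right difference: march right to left, formula (7.1.29)
        v_new[2 * Q] = right_bc
        for i in range(2 * Q - 1, -1, -1):
            x = (i - Q) * h
            v_new[i] = (v_old[i] + j * tau * v_new[i + 1] + tau * x * x) / (1 + j * tau)
    else:        # left difference: march left to right, formula (7.1.30)
        v_new[0] = left_bc
        for i in range(1, 2 * Q + 1):
            x = (i - Q) * h
            v_new[i] = (v_old[i] - j * tau * v_new[i - 1] + tau * x * x) / (1 - j * tau)
    return v_new

# for j = 0 the step reduces to v_new = v_old + tau*x**2 away from the boundary
row = x_sweep_row([0.0] * 5, 0, 1.0, 0.1, 0.0, 0.0)
assert abs(row[0] - 0.4) < 1e-12 and abs(row[1] - 0.1) < 1e-12
```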
we see that, for given v^k_{ij} = F^k_{ij} and each fixed j >= 0, the desired set of values v^{k+1}_{ij} can be calculated successively from right to left by formula (7.1.29). For the initial value v^{k+1}_{Q,j} we take F0((k + 1)tau, L, jh), where F0(t, x, y) is the function (7.1.21). Correspondingly, for j < 0 the values of v^{k+1}_{ij} can be calculated from left to right by formula (7.1.30) with the initial value v^{k+1}_{-Q,j} = F0((k + 1)tau, -L, jh).2 Since v^{k+1}_{ij} = V^k_{ij}, we obtain the grid function V^k_{ij} for the kth time layer after the grid function v^{k+1}_{ij} is calculated. Now, to calculate the grid function V^{k+1}_{ij} = F^{k+1}_{ij} on the layer (k + 1), we need to solve the linear algebraic system (7.1.28). It is convenient to solve this system by the sweep method
[162, 179], which we briefly discuss here. Let us denote the desired values of the grid function on the layer (k + 1) by z_j = V^{k+1}_{ij}. Then system (7.1.28) can be written in the form

A_j z_{j-1} - C_j z_j + M_j z_{j+1} = -phi_j,   -Q + 1 <= j <= Q - 1,   (7.1.31)

where A_j, C_j, M_j, and phi_j denote

A_j = 2 tau B + h tau (ih + j beta h + u_m u~_{ij}),   C_j = 2h^2 + 4 tau B,
M_j = 2 tau B - h tau (ih + j beta h + u_m u~_{ij}),   phi_j = 2h^2 V^k_{ij} + 2 tau h^2 (jh)^2.   (7.1.32)

Since the number of equations in (7.1.31) is less than the number of unknown variables z_j, -Q <= j <= Q, to solve system (7.1.31) uniquely, we need to complete this system with the two conditions
z_{-Q} = F0((k + 1)tau, ih, -L),   z_Q = F0((k + 1)tau, ih, L)   (7.1.33)

that follow from the boundary conditions (7.1.22). We seek the solution of problem (7.1.31), (7.1.33) in the form

z_j = mu_{j+1} z_{j+1} + nu_{j+1},   -Q <= j <= Q - 1,   (7.1.34)
where the coefficients mu_j and nu_j are calculated by the recurrent formulas

mu_{j+1} = M_j/(C_j - A_j mu_j),   nu_{j+1} = (A_j nu_j + phi_j)/(C_j - A_j mu_j)   (7.1.35)

2 The recurrent formulas (7.1.29) and (7.1.30) are used for k = 0, 1, 2, ..., K - 1. It follows from (7.1.23) that in (7.1.29) and (7.1.30) we must set v^0_{ij} = 0, -Q <= i, j <= Q, for k = 0.
with the initial conditions

mu_{-Q+1} = 0,   nu_{-Q+1} = F0((k + 1)tau, ih, -L).   (7.1.36)
Thus, the algorithm for solving problem (7.1.31), (7.1.33) by the sweep method consists in the following two steps:
(1) find mu_j and nu_j recurrently for -Q + 1 <= j <= Q (from left to right, from j to j + 1) by using the initial values (7.1.36) and formulas (7.1.35);
(2) employing z_Q from (7.1.33), calculate (from right to left, from j + 1 to j) the values z_{Q-1}, z_{Q-2}, ..., z_{-Q+1}, z_{-Q} successively according to formula (7.1.34) (note that in this case, in view of (7.1.36), the value of z_{-Q} coincides with that given by (7.1.33)).
As was shown in [162, 179], the procedure of calculations by formulas (7.1.34) and (7.1.35) is stable if for any j we have
A_j > 0,   M_j > 0,   C_j >= A_j + M_j.
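Steps (1) and (2) constitute the classical tridiagonal sweep (Thomas) algorithm. A generic sketch in the notation of (7.1.31)-(7.1.36):

```python
def sweep_solve(A, C, M, phi, z_left, z_right):
    """Solve A[j]*z[j-1] - C[j]*z[j] + M[j]*z[j+1] = -phi[j], j = 1..n-2,
    with boundary values z[0] = z_left and z[n-1] = z_right."""
    n = len(C)
    mu = [0.0] * (n + 1)   # z[j] = mu[j+1]*z[j+1] + nu[j+1], cf. (7.1.34)
    nu = [0.0] * (n + 1)
    mu[1], nu[1] = 0.0, z_left              # initial conditions (7.1.36)
    for j in range(1, n - 1):               # forward pass, formulas (7.1.35)
        denom = C[j] - A[j] * mu[j]
        mu[j + 1] = M[j] / denom
        nu[j + 1] = (A[j] * nu[j] + phi[j]) / denom
    z = [0.0] * n
    z[n - 1] = z_right
    for j in range(n - 2, -1, -1):          # backward pass, formula (7.1.34)
        z[j] = mu[j + 1] * z[j + 1] + nu[j + 1]
    return z

# check: z[j-1] - 2*z[j] + z[j+1] = 0 with z[0] = 1, z[3] = 4 has a linear solution
z = sweep_solve([0, 1, 1, 0], [0, 2, 2, 0], [0, 1, 1, 0], [0, 0, 0, 0], 1.0, 4.0)
assert max(abs(z[j] - (j + 1)) for j in range(4)) < 1e-12
```

The pass is well conditioned exactly under the diagonal-dominance conditions A_j > 0, M_j > 0, C_j >= A_j + M_j quoted above.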
It follows from (7.1.32) that in the problem in question these conditions reduce to

2B > h(ih + j beta h + u_m u~_{ij}).
Obviously, the last condition can always be satisfied by choosing a sufficiently small approximation step h.
This calculational procedure was realized as a software package used for numerical experiments on computers. The parameters of the difference scheme were chosen so as to ensure a prescribed accuracy. It is well known [163] that the total locally one-dimensional approximation scheme (7.1.22), (7.1.23), (7.1.26)-(7.1.28) is absolutely stable and that its error is O(h^2 + tau). The approximation steps were tau = 0.01 and h = 0.1. The dimensions of the region D were L = 3 and Q = 30. The other parameters beta, u_m, B of the problem were different in different specific calculations. The two-dimensional data array of the loss function F(t, x, y) was printed for t = 0.25, 0.5, 0.75, .... Some results of these calculations are shown in Figs. 56-60.
Figure 56 presents the axonometry of the loss function F(t, x, y) in Eq. (7.1.19) with beta = B = u_m = 1 at three time moments t = 0.25, 0.5, 1.0. Figure 57 shows curves of constant level F(t, x, y) = 3 and switching lines in the optimal system with beta = B = u_m = 1 at three time moments t = 0.5, 2.0, 8.0. In view of the central symmetry of Eqs. (7.1.19), these curves are plotted in two different halves of the region D. The switching line uniquely determines the optimal control of system (7.1.17) as follows: u = -u_m at the points of
the phase plane (x, y) lying above the switching line, and u = +u_m below this line.

FIG. 56

FIG. 57
Figure 58 illustrates how the switching line and the value of the performance criterion of the optimal system depend on the value of the admissible control u_m for B = beta = 1 and t = 4. In Fig. 58 one can see that an increase in the range of admissible controls uniformly improves the control quality,
that is, decreases the value of the optimality criterion independently of the initial state of system (7.1.17).
FIG. 58

Figures 59 and 60 show how the switching lines and the constant level curves depend on the other parameters of the problem.
FIG. 59
FIG. 60

§7.2. Optimal control for the "predator-prey" system (the general case)
In this section we consider the deterministic problem of optimal control for a biological system consisting of two interacting populations ("predators" and "prey"). We have already considered this system in §5.2 where we studied a special type of this system called in §5.2 the case of a "poorly adapted predator." In what follows, we consider the general case of this problem. The synthesis problem corresponding to this case is solved numerically. Furthermore, we obtain some analytic results for a control problem with infinite horizon.
7.2.1. The normalized Lotka—Voiterra model. Statement of the problem. We assume that the system considered is described by the
Lotka-Volterra model (see [133, 186, 187] as well as §2.3 and §5.2) in which the behavior of the isolated system is governed by a system of the form
yi(r) =
o 4 )t/i
(7.2.1)
Here x_1(tau) and y_1(tau) are the sizes (densities) of the prey and predator populations at time tau, and the positive numbers a_i (i = 1, 2, 3, 4) characterize the intraspecific (a_1, a_4) and interspecific (a_2, a_3) interactions. By changing the variables

x(t) = a_3 a_4^{-1} x_1(tau),   y(t) = a_2 a_1^{-1} y_1(tau),   t = a_1 tau,

we rewrite system (7.2.1) in the dimensionless (normalized) form

x'(t) = (1 - y)x,   y'(t) = b(x - 1)y,   b = a_4/a_1.   (7.2.2)
Numerical Synthesis
369
Just as in §5.2, we assume that the external (controlling) action on system (7.2.2) consists in removing some prey species from the habitat (by catching, shooting, or using some chemical substances). In this case, the control system considered is described by the equations

x'(t) = (1 - y)x - ux,   y'(t) = b(x - 1)y,   t > 0,
x(0) = x_0 > 0,   y(0) = y_0 > 0,   (7.2.3)

where u = u(t) is a nonnegative bounded scalar controlling function that for all t > 0 satisfies the constraints

0 <= u(t) <= u_m,   (7.2.4)
where u_m is a given positive number.
Let us consider the phase trajectories of the controlled system (7.2.3). They are solutions of the differential equation

dy/dx = b(x - 1)y / [(1 - y - u)x].   (7.2.5)
First, we note that, in view of Eqs. (7.2.3), the phase variables x(t) and y(t) cannot attain negative values for all t > 0 if the initial values x_0 and y_0 are nonnegative (the last assumption is always satisfied, since x_0 and y_0 denote the initial sizes of the prey and predator populations, respectively). Therefore, all solutions of Eq. (7.2.5) (the phase trajectories of system (7.2.3)) lie in the first quadrant (x >= 0, y >= 0) of the phase plane (x, y). Furthermore, we shall consider only the phase trajectories that correspond to the two boundary values of the control: u = 0 and u = u_m.
tonomous) Lotka-Volterra system. The dynamics of system (7.2.2) was studied in detail in [187]. Omitting the details, we only note that in the first quadrant (x > 0, y > 0) there are two singular points (a? = 0, y = 0) and (x = l,y= 1) that are the equilibrium states of system (7.2.2). In this case the origin (x = 0,y = Q) is an unstable equilibrium state, while the
state (x = I , y = 1) is stable and is a center type singular point. All phase trajectories of system (7.2.2) (except for the two that lie on the coordinate axes: (x > 0, y — 0) and (x = 0, y > 0)) form a family of closed concentric curves around the point (a; = l,y = 1). Thus, in a noncontrolled system the sizes of both populations are subject to undecaying oscillations whose period and amplitude depend on the initial state (xo,y0). However, if the
initial state (:EO,J/O) lies on one of the coordinate axes in the plane ( x , y ) , then there arise singular (aperiodic) phase trajectories. In this case it fol-
lows from Eqs. (7.2.2) that the representative point of the system cannot
leave the corresponding coordinate axis and in the course of time either approaches the origin (along the y-axis) or goes to infinity (along the x-axis). The singular phase trajectories correspond to the degenerate case of system (7.2.2). In this case, the biological system considered contains only one population.
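The closed-orbit picture for the free system (7.2.2) is easy to confirm numerically: V(x, y) = b(x - ln x) + y - ln y is a first integral of (7.2.2), so it must stay constant along any accurately computed trajectory. A sketch with a classical fourth-order Runge-Kutta step:

```python
import math

def rk4_step(f, x, y, dt):
    """One classical Runge-Kutta step for (x', y') = f(x, y)."""
    k1 = f(x, y)
    k2 = f(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
    k3 = f(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
    k4 = f(x + dt * k3[0], y + dt * k3[1])
    return (x + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            y + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

b = 0.5
free = lambda x, y: ((1 - y) * x, b * (x - 1) * y)      # system (7.2.2)
V = lambda x, y: b * (x - math.log(x)) + y - math.log(y)

x, y = 1.5, 0.8
V0 = V(x, y)
for _ in range(5000):                                    # integrate to t = 5
    x, y = rk4_step(free, x, y, 1e-3)
assert abs(V(x, y) - V0) < 1e-8                          # V is conserved on the orbit
```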
If u = um > 0, then the dynamics of system (7.2.3) substantially depends on um. For example, if 0 < um < 1, then the periodic character of solutions of system (7.2.3) is conserved (just as in the case u = 0), and only the center of the family of phase trajectories moves to the point (x = 1, y = 1 - um). For um > 1 the solution of system (7.2.3) is aperiodic. In the special case um = 1, Eq. (7.2.5) can easily be solved, and the phase trajectories of system (7.2.3) can be written explicitly as

y(x) = y0 + b ln(x/x0) - b(x - x0).        (7.2.6)

For um > 1 Eq. (7.2.5) has a unique singular point (x = 0, y = 0), and this equilibrium state is globally asymptotically stable.³

Now let us formulate the goal of control for system (7.2.3). In many cases [90, 105] it is most desirable that system (7.2.3) be in equilibrium for u = 0, that is, the point (x = 1, y = 1) is the most desirable state of system (7.2.3).
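The closed orbits of the uncontrolled system can be checked numerically. With u = 0 the equations of system (7.2.3) read ẋ = x(1 - y), ẏ = by(x - 1) (as can be read off from the advection terms of the Bellman equation below), and every such trajectory conserves the first integral H(x, y) = b(x - ln x) + y - ln y. A minimal sketch (the parameter values, initial state, and RK4 step are arbitrary choices, not taken from the text):

```python
import math

def rhs(x, y, b, u=0.0):
    # Controlled Lotka-Volterra system (7.2.3) in dimensionless form
    return x * (1.0 - y - u), b * y * (x - 1.0)

def rk4_step(x, y, b, u, h):
    # One classical Runge-Kutta step for the planar system
    k1 = rhs(x, y, b, u)
    k2 = rhs(x + 0.5*h*k1[0], y + 0.5*h*k1[1], b, u)
    k3 = rhs(x + 0.5*h*k2[0], y + 0.5*h*k2[1], b, u)
    k4 = rhs(x + h*k3[0], y + h*k3[1], b, u)
    x += h*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])/6.0
    y += h*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])/6.0
    return x, y

def first_integral(x, y, b):
    # H is conserved along every u = 0 trajectory: dH/dt = 0
    return b*(x - math.log(x)) + y - math.log(y)

b, h = 0.5, 1e-3
x, y = 1.5, 0.8                 # arbitrary initial state in the first quadrant
H0 = first_integral(x, y, b)
for _ in range(20000):          # integrate over t in [0, 20]
    x, y = rk4_step(x, y, b, 0.0, h)
assert abs(first_integral(x, y, b) - H0) < 1e-6   # orbit stays on a level set of H
```

The conservation of H is exactly what makes the u = 0 trajectories a family of closed concentric curves around (1, 1).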
In this case, one is interested in a control u* = u*(x, y) that takes system (7.2.3) from any initial state (x0, y0) to the point x = 1, y = 1 in a minimum time. This problem was solved in [90]. Here we consider the problem of constructing a control u* = u*(t, x, y), which, in general, does not guarantee that the system comes to the equilibrium point (x = 1, y = 1) but ensures the minimum mean square deviation of the system phase trajectories from the state (x = 1, y = 1) on a given time interval 0 ≤ t ≤ T:

I[u] = ∫_0^T [(1 - x(t))² + (1 - y(t))²] dt → min_{0≤u(t)≤um}.        (7.2.7)
7.2.2. The Bellman equation and calculation of the boundary conditions. By using the standard procedure of the dynamic programming approach (see §1.3), we obtain the following algorithm for solving
problem (7.2.3), (7.2.4), (7.2.7).

³In this case the term "global" means that the trivial solution of system (7.2.3) is asymptotically stable for any initial values (x0, y0) from the first quadrant of the phase plane.
Numerical Synthesis
371
Now we define the loss function (the functional of minimal future losses) by the relation

F(t, x, y) = min_{0≤u(τ)≤um, t≤τ≤T} ∫_t^T [(1 - x(τ))² + (1 - y(τ))²] dτ,   x(t) = x,  y(t) = y,        (7.2.8)
and thus write the Bellman equation for problem (7.2.3), (7.2.4), (7.2.7) as

-∂F/∂t = min_{0≤u≤um} [x(1 - y - u) ∂F/∂x + by(x - 1) ∂F/∂y + (1 - x)² + (1 - y)²],
x, y > 0,   0 ≤ t ≤ T,   F(T, x, y) = 0.        (7.2.9)

If the function F(t, x, y) satisfying (7.2.9) is found, then the desired optimal control u*(t, x, y) in the synthesis form is given by the expression

u*(t, x, y) = 0,    for ∂F/∂x(t, x, y) < 0,
u*(t, x, y) = um,   for ∂F/∂x(t, x, y) ≥ 0.        (7.2.10)

By using (7.2.10), we can rewrite the Bellman equation in the form

-∂F/∂t = x(1 - y - u*) ∂F/∂x + by(x - 1) ∂F/∂y + (1 - x)² + (1 - y)²,
x, y > 0,   0 ≤ t ≤ T,   F(T, x, y) = 0.        (7.2.11)
It follows from (7.2.10) that the optimal control is a relay type function, that is, at each time instant the control u is either u = 0 or u = um (this is a bang-bang control). If the loss function (7.2.8) is continuously differentiable with respect to x, then the control is switched from one value to the other each time the condition

∂F/∂x(t, x, y) = 0        (7.2.12)

is satisfied. Equation (7.2.12) determines the switching line on the phase plane (x, y) at each time instant. This switching line divides the phase space x, y > 0 into two regions R0 and Rm where the control is u = 0 and u = um, respectively. Thus, finding the switching line is equivalent to solving the problem of optimal control synthesis.
Of course, it must be remembered that the above procedure for solving the synthesis problem can be used only if the loss function (7.2.8) is sufficiently smooth and the Bellman equation (7.2.9) (or (7.2.11)) holds at all points of the domain D_T = {x, y > 0, 0 ≤ t ≤ T} of definition of the loss function. The smoothness properties of solutions of equations of the form (7.2.9) (or (7.2.11)) were studied in detail in [172]. As applied to Eq. (7.2.9), the main result of [172] has the following meaning. The loss function F(t, x, y) satisfying (7.2.9) has continuous first-order derivatives with respect to all its arguments in the regions R0 and Rm. On the interface between R0 and Rm, that is, on the switching line, the derivatives ∂F/∂x and ∂F/∂y can be discontinuous (have jumps) depending on the type of the switching line. Namely, for switching lines of the first and second kind, the first-order derivatives of the loss function are continuous everywhere in D_T. On a switching line of the third kind, the partial derivatives ∂F/∂x and ∂F/∂y always have jumps. Recall that, according to the classification given in [172], the type of the switching line is determined by the character of the phase trajectories of system (7.2.3) in the regions R0 and Rm near the switching line. For example, if the phase trajectories approach the switching line from both sides, then such a switching line is called a switching line of the first kind. In this case, the representative point of system (7.2.3), once coming to the switching line, moves along this line in the sliding mode (see §1.1). If the phase trajectories approach the switching line on one side (say, in the region R0) and leave it on the other side (in Rm), then we have a switching line of the second kind. Finally, if the switching line coincides with a phase trajectory in the region Rm (or R0), then we have a switching line of the third kind.

In what follows, switching lines of the third kind do not occur; thus we can assume that for problem (7.2.3), (7.2.4), (7.2.7) studied here the Bellman equation (7.2.9) (or (7.2.11)) is valid everywhere in the region x > 0, y > 0, 0 ≤ t ≤ T, and in this region the function F(t, x, y) satisfying this equation has continuous first-order derivatives with respect to all its arguments.

To solve Eq. (7.2.9) uniquely, we need to pose boundary conditions for the loss function F(t, x, y) on the boundary of the region of admissible phase variables, that is, for x = 0 and y = 0. Such boundary conditions can readily be obtained by a straightforward calculation of the functional on the right-hand side of (7.2.8) by using Eqs. (7.2.3) describing the system considered. For x = 0, the first of Eqs. (7.2.3) gives x(τ) ≡ 0, and the second gives y(τ) = y e^{-b(τ-t)}; a direct calculation of the integral then yields

F(t, 0, y) = φ(t, y) = 2(T - t) - (2y/b)[1 - e^{-b(T-t)}] + (y²/2b)[1 - e^{-2b(T-t)}].        (7.2.13)
To find ψ(t, x), we need to solve the following one-dimensional optimization problem:

ψ(t, x) = min_{0≤u(σ)≤um} ∫_t^T [1 + (1 - x(σ))²] dσ,
ẋ(σ) = (1 - u(σ)) x(σ),   σ ≥ t,   x(t) = x.        (7.2.14)

(For y = 0, the second of Eqs. (7.2.3) gives y(σ) ≡ 0, so that the first equation reduces to ẋ = (1 - u)x and the integrand in (7.2.8) becomes 1 + (1 - x(σ))².)
Problem (7.2.14) can readily be solved, although the solution of (7.2.14), and hence the form of the function ψ(t, x), substantially depends on the value of um.
(a) Let 0 < um < 1. In this case the points

x1 = e^{-(T-t)},   x2 = 2/[1 + e^{(1-um)(T-t)}]        (7.2.15)

divide the x-axis into three intervals. On the intervals 0 < x ≤ x1 and x2 ≤ x < ∞, the function ψ(t, x) has the explicit form

ψ(t, x) = 2(T-t) - 2x[e^{T-t} - 1] + (x²/2)[e^{2(T-t)} - 1],   0 < x ≤ x1,
ψ(t, x) = 2(T-t) - (2x/(1-um))[e^{(1-um)(T-t)} - 1] + (x²/(2(1-um)))[e^{2(1-um)(T-t)} - 1],   x2 ≤ x < ∞.        (7.2.16)

On the interval x1 < x < x2, the function ψ(t, x) is given by the formula

ψ(t, x) = 2(T-t) + 2x - x²/2 - 2x* + (x*)²/2 - 2(1 - x*)/(1 - um),   x* = x z^{1/(1-um)},        (7.2.17)

where z is the root of the transcendental algebraic equation

x z^{um/(1-um)} [e^{(1-um)(T-t)} + z] = 2.        (7.2.18)

One can readily see that the possible values of the root z of Eq. (7.2.18) always lie in the region 1 ≤ z ≤ e^{(1-um)(T-t)}, and the boundary values z = 1 and z = e^{(1-um)(T-t)} correspond to the endpoints (7.2.15) of the interval
x1 ≤ x ≤ x2. The optimal control u*, which solves problem (7.2.14), depends on the initial value x(t) = x and is determined as follows:

if x ≤ x1, then u*(σ) = 0 for t ≤ σ ≤ T;
if x ≥ x2, then u*(σ) = um for t ≤ σ ≤ T;
if x1 < x < x2, then u*(σ) = 0 for x(σ) < x* and u*(σ) = um for x(σ) ≥ x*,

where x* = x z^{1/(1-um)} is the switch point determined by the root z of Eq. (7.2.18).
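The left-hand side of (7.2.18) is increasing in z on the bracket [1, e^{(1-um)(T-t)}], so the root can be found by bisection. A sketch (the parameter values are arbitrary):

```python
import math

def root_z(x, t, T, um, tol=1e-12):
    """Solve x * z**(um/(1-um)) * (E + z) = 2 for z in [1, E],
    E = exp((1-um)*(T-t)); valid for x1 < x < x2 (Eq. (7.2.18))."""
    E = math.exp((1.0 - um) * (T - t))
    g = lambda z: x * z**(um / (1.0 - um)) * (E + z) - 2.0
    lo, hi = 1.0, E          # g(lo) < 0 < g(hi) for x1 < x < x2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

um, t, T = 0.5, 0.0, 2.0
E = math.exp((1 - um) * (T - t))
x1, x2 = math.exp(-(T - t)), 2.0 / (1.0 + E)
x = 0.5 * (x1 + x2)          # a point strictly inside (x1, x2)
z = root_z(x, t, T, um)
assert 1.0 < z < E           # root stays in the admissible bracket
```

As a consistency check, the root tends to z = e^{(1-um)(T-t)} as x approaches x1 and to z = 1 as x approaches x2, matching the boundary correspondence stated after (7.2.18).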
(b) Let um = 1. In this case, for u = um the coordinate x(σ) stays constant, since ẋ = (1 - u)x = 0. The optimal control has the form

u*(σ) = 0 for x(σ) < 1,   u*(σ) = um for x(σ) ≥ 1.        (7.2.19)

The minimum value of the functional in (7.2.14) can readily be calculated for control (7.2.19), and as a result, for the desired function ψ(t, x) we obtain the expression

ψ(t, x) = 2(T-t) - 2x[e^{T-t} - 1] + (x²/2)[e^{2(T-t)} - 1],   0 < x ≤ e^{-(T-t)},
ψ(t, x) = (T-t) - ln x + 2x - x²/2 - 3/2,   e^{-(T-t)} ≤ x ≤ 1,
ψ(t, x) = (T-t)(2 - 2x + x²),   x ≥ 1.        (7.2.20)

(c) Let um > 1. In this case the optimal control solving problem (7.2.14) coincides with (7.2.19).⁴ After some simple calculations, we obtain

ψ(t, x) = 2(T-t) - 2x[e^{T-t} - 1] + (x²/2)[e^{2(T-t)} - 1],   0 < x ≤ e^{-(T-t)},
ψ(t, x) = (T-t) - ln x + 2x - x²/2 - 3/2,   e^{-(T-t)} ≤ x ≤ 1,
ψ(t, x) = (T-t) + (ln x - 2x + x²/2 + 3/2)/(um - 1),   1 ≤ x ≤ e^{(um-1)(T-t)},
ψ(t, x) = 2(T-t) - (2x/(um-1))[1 - e^{-(um-1)(T-t)}] + (x²/(2(um-1)))[1 - e^{-2(um-1)(T-t)}],   e^{(um-1)(T-t)} ≤ x < ∞.        (7.2.21)
⁴For e^{-(T-t)} < x < e^{(um-1)(T-t)}, there always exists a time instant τ0 < T at which the solution x(σ) of the equation ẋ = (1 - u)x with control (7.2.19) attains the value x(τ0) = 1. After the time τ0, one should use the control

u(σ) = 0 for x(σ) < 1,   u(σ) = 1 for x(σ) = 1,   u(σ) = um for x(σ) > 1.

Under this control we can realize the generalized solution in the sense of Filippov of the equation ẋ(σ) = (1 - u)x(σ), for which x(σ) ≡ 1 for σ ≥ τ0.
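The piecewise formula (7.2.20) can be verified for internal consistency: the branches must agree at the break points x = e^{-(T-t)} and x = 1, and at x = 1 the value reduces to the pure residence cost T - t, since the control (7.2.19) freezes the trajectory there. A sketch (the values of t and T are arbitrary):

```python
import math

def psi_um1(t, x, T):
    """Boundary loss F(t, x, 0) = psi(t, x) for um = 1, Eq. (7.2.20)."""
    s = T - t
    if x <= math.exp(-s):
        # u = 0 throughout: x(sigma) = x * e^(sigma - t) never reaches 1
        return 2*s - 2*x*(math.exp(s) - 1) + 0.5*x*x*(math.exp(2*s) - 1)
    if x <= 1.0:
        # grow with u = 0 until x = 1, then hold x = 1 with u = 1
        return s - math.log(x) + 2*x - 0.5*x*x - 1.5
    # x >= 1: hold the (constant) trajectory with u = 1
    return s * (2 - 2*x + x*x)

t, T = 0.0, 3.0
xb = math.exp(-(T - t))            # break point x = e^{-(T-t)}
eps = 1e-9
# the branches match at both break points ...
assert abs(psi_um1(t, xb - eps, T) - psi_um1(t, xb + eps, T)) < 1e-6
assert abs(psi_um1(t, 1 - eps, T) - psi_um1(t, 1 + eps, T)) < 1e-6
# ... and psi(t, 1) equals the residence cost (T - t) at the rest point x = 1
assert abs(psi_um1(t, 1.0, T) - (T - t)) < 1e-12
```

The same continuity checks apply branch by branch to (7.2.21) in the case um > 1.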
Thus, to find the optimal control in the synthesis form that solves problem (7.2.3), (7.2.4), (7.2.7), we need to solve the following boundary value problem for the loss function F(t, x, y):

-∂F/∂t = x(1 - y - u*) ∂F/∂x + by(x - 1) ∂F/∂y + (1 - x)² + (1 - y)²,   x, y > 0,   0 ≤ t ≤ T,
F(T, x, y) = 0,   F(t, 0, y) = φ(t, y),   F(t, x, 0) = ψ(t, x),        (7.2.22)

where u* has the form (7.2.10), the function φ(t, y) is given by (7.2.13), and the function ψ(t, x) is given by expressions (7.2.16)-(7.2.18), (7.2.20), or (7.2.21) depending on the value of the maximum admissible control um.
The boundary value problem (7.2.22) was solved numerically. The results obtained are given in Section 7.2.4.

7.2.3. Problem with infinite horizon. Stationary operating mode. Let us consider the control problem (7.2.3), (7.2.4), (7.2.7) on an infinite time interval (the terminal time T → ∞). If the optimal control u*(t, x, y) that solves problem (7.2.3), (7.2.4), (7.2.7) ensures the convergence of the functional (7.2.8) for any initial state (x > 0, y > 0) of the system, then due to the time-invariance of Eqs. (7.2.3) the loss function (7.2.8) is also time-invariant, that is, F(t, x, y) → f(x, y), where the function f(x, y) satisfies the equation

min_{0≤u≤um} [x(1 - y - u) ∂f/∂x + by(x - 1) ∂f/∂y + (1 - x)² + (1 - y)²] = 0,        (7.2.23)

which is the stationary version of the Bellman equation (7.2.9).
In this case, the optimal control u*(x, y) and the switching line do not depend on time explicitly and are given by formulas (7.2.10) and (7.2.12) with F(t, x, y) replaced by the loss function f(x, y). Let us denote the loss function f(x, y) in the region R0 (u* = 0) by f0(x, y), and the loss function f(x, y) in the region Rm (u* = um) by fm(x, y). In R0 the function f0 satisfies the equation

x(1 - y) ∂f0/∂x + by(x - 1) ∂f0/∂y + (1 - x)² + (1 - y)² = 0.        (7.2.24)

Correspondingly, for the function fm defined on Rm we have

x(1 - y - um) ∂fm/∂x + by(x - 1) ∂fm/∂y + (1 - x)² + (1 - y)² = 0.        (7.2.25)
Since the gradient of the loss function is continuous on the switching line, that is, on the interface between R0 and Rm, we have

∂f0/∂x = ∂fm/∂x,   ∂f0/∂y = ∂fm/∂y.        (7.2.26)

Equations (7.2.24)-(7.2.26) allow us to obtain explicit formulas for the partial derivatives ∂f/∂x and ∂f/∂y along the switching line. Subtracting (7.2.25) from (7.2.24) and using (7.2.26), we obtain x um ∂f/∂x = 0; hence

∂f/∂x = 0,   ∂f/∂y = -[(1 - x)² + (1 - y)²] / [by(x - 1)].        (7.2.27)

If the switching line contains intervals of sliding mode, then formulas (7.2.27) allow us to find these intervals and to obtain explicit analytic formulas for the switching line on them. As was shown in §4.1 (see also [172]), the second-order mixed partial derivatives of the loss function f(x, y) must coincide on the intervals of sliding mode, that is, we have

∂²f/∂x∂y = ∂²f/∂y∂x.        (7.2.28)
By using formulas (7.2.27), one can readily verify that condition (7.2.28) is satisfied along the two lines y = x and y = 2 - x. To check whether these lines (or some parts of them) are lines of sliding mode, we need to consider the families of phase trajectories (that is, the solutions of Eq. (7.2.5)) for u = 0 and u = um near these lines. The corresponding analysis of the phase trajectories of system (7.2.3) shows that the sliding mode may take place along the straight line y = x for x < 1 and along the line y = 2 - x for x > 1. In this case the representative point of system (7.2.3), once coming to the line y = x (x < 1), moves along this line (due to the sliding mode) away from the equilibrium state (x = 1, y = 1). On the other hand, along the line y = 2 - x (x > 1), system (7.2.3) asymptotically approaches the point (x = 1, y = 1) as t → ∞ due to the sliding mode. That is why only the straight line segment

y = 2 - x,   1 ≤ x ≤ x° < 2,        (7.2.29)

can be considered as the switching line for the optimal control in the stationary operating mode. If u = um, then the integral curve of Eq. (7.2.5) is tangent to the line y = 2 - x at the endpoint x° of the segment (7.2.29). By using (7.2.5) (the slope by(x - 1)/[x(1 - y - um)] of the u = um phase trajectory must equal -1 at the point (x°, 2 - x°)), we can write the tangency condition as

(b - 1)(x°)² - (3b - 1 - um) x° + 2b = 0.        (7.2.30)
For different values of the parameters in problem (7.2.3), (7.2.4), (7.2.7) (that is, of the numbers b > 0 and um > 0), the solution of Eq. (7.2.30) has the form

x° = [3b - 1 - um - √((3b - 1 - um)² - 8b(b - 1))] / [2(b - 1)],   if 0 < um < 1, b ≠ 1, or um > 1, b > bm,
x° = 2/(2 - um),   if 0 < um < 1, b = 1,
x° = 2,   if um > 1, b ≤ bm,        (7.2.31)
where the number bm is equal to

bm = 3um - 1 + √(8um(um - 1)).

(For um > 1 and b ≤ bm, the expression under the square root in (7.2.31) is negative or the root exceeds 2, so that the segment (7.2.29) extends to x° = 2.)
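Writing the tangency condition as the quadratic (b - 1)(x°)² - (3b - 1 - um)x° + 2b = 0 (an assumption consistent with the roots displayed in (7.2.31)), the endpoint x° can be computed and the tangency verified directly: along a u = um phase trajectory the slope dy/dx = by(x - 1)/[x(1 - y - um)] must equal -1 on the line y = 2 - x. A sketch:

```python
import math

def x0(b, um):
    """Endpoint x0 of the sliding segment y = 2 - x, cf. (7.2.30)-(7.2.31)."""
    disc = (3*b - 1 - um)**2 - 8*b*(b - 1)
    if um > 1 and disc < 0:
        return 2.0                       # b <= bm: no tangency point inside (1, 2)
    if abs(b - 1.0) < 1e-12:
        return 2.0 / (2.0 - um)          # linear case b = 1
    r = (3*b - 1 - um - math.sqrt(disc)) / (2*(b - 1))
    return min(r, 2.0) if um > 1 else r

b, um = 0.5, 0.5
r = x0(b, um)
assert 1.0 < r < 2.0
# the u = um trajectory is tangent to y = 2 - x at x0: slope = -1 there
y = 2.0 - r
slope = b*y*(r - 1.0) / (r*(1.0 - y - um))
assert abs(slope + 1.0) < 1e-9
```

For b = um = 0.5 (the parameter values used in the numerical experiments of Section 7.2.4) this gives x° = √2.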
One can easily obtain a finite formula for the stationary loss function f(x, y) along the switching line (7.2.29). By using the second equation in (7.2.3) and formula (7.2.29), we see that while moving along the straight line (7.2.29) the coordinate y(t) is governed by the differential equation

ẏ = b(y - y²).        (7.2.32)

By integrating (7.2.32) with the initial condition y(0) = y, we obtain

y(t) = y / [y + (1 - y)e^{-bt}].        (7.2.33)

Using (7.2.33) and the relation x(t) = 2 - y(t) and calculating the functional I in (7.2.7) for T → ∞, we find the desired stationary loss function

f(2 - y, y) = h(y) = (2/b)(y - ln y - 1).        (7.2.34)
Here y is an arbitrary point in the interval 2 - x° < y < 1.
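The stationary loss (7.2.34) along the sliding segment can be verified by integrating the logistic equation (7.2.32) and accumulating the running cost 2(1 - y(t))² (the factor 2 arises because x = 2 - y makes the two squared deviations in (7.2.7) equal). A sketch with an explicit Euler step (all step sizes and parameter values are arbitrary):

```python
import math

def h_closed(y, b):
    # Stationary loss along the sliding segment, Eq. (7.2.34)
    return (2.0 / b) * (y - math.log(y) - 1.0)

def h_numeric(y, b, dt=1e-3, t_max=100.0):
    # Integrate ydot = b*(y - y^2) from y(0) = y and accumulate the
    # running cost 2*(1 - y(t))^2; y(t) -> 1, so the tail is negligible
    cost, t = 0.0, 0.0
    while t < t_max:
        cost += 2.0 * (1.0 - y)**2 * dt
        y += dt * b * (y - y*y)
        t += dt
    return cost

b, y0 = 0.5, 0.7
assert abs(h_numeric(y0, b) - h_closed(y0, b)) < 1e-2
```

Since y(t) approaches 1 exponentially, the infinite-horizon integral converges, which is exactly the condition under which the stationary loss function f(x, y) is well defined.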
7.2.4. Numerical solution of the nonstationary synthesis problem. If the control time T is finite, then the algorithm of the optimal control u*(t, x, y) depends on time and, to find this control, we need to solve the nonstationary Bellman equation (7.2.22). This equation is solved numerically in the bounded region Ω = {0 ≤ x ≤ xmax, 0 ≤ y ≤ ymax, 0 ≤ t ≤ T}. To this end, in Ω we construct the grid

ω = {x_i = i hx, i = 0, 1, ..., Nx, hx Nx = xmax;
     y_j = j hy, j = 0, 1, ..., Ny, hy Ny = ymax;
     t_k = k τ, k = 0, 1, ..., N, τ N = T},        (7.2.35)
and define the grid function F_{ij}^k that approximates the desired continuous solution F(t, x, y) of Eq. (7.2.22) at the nodes (x_i, y_j, t_k) of the grid. The values of the grid function F_{ij}^k at the nodes of the grid (7.2.35) are related to each other by algebraic equations obtained by the difference approximation of the Bellman equation (7.2.22). In what follows, we use well-known methods for constructing difference schemes [60, 135, 162, 163]; therefore, here we restrict our consideration to a formal description of the difference equations used for solving Eq. (7.2.22) numerically. We stress that the problems of approximation accuracy, stability, and convergence of the grid function F_{ij}^k to the exact solution F(t, x, y) of Eq. (7.2.22) as hx, hy, τ → 0 are studied in detail in [49, 53, 135, 162, 163, 179]. Just as in §7.1, by using the alternating direction method [163], we replace the two-dimensional (with respect to the phase variables) equation (7.2.22) by the following pair of one-dimensional equations:

-∂F/∂t = x(1 - y - u*) ∂F/∂x + (1 - x)²,        (7.2.36)

-∂F/∂t = by(x - 1) ∂F/∂y + (1 - y)²,        (7.2.37)
each of which is approximated by a finite-difference scheme with fractional steps in the variable t. To ensure the stability of the difference approximation of Eqs. (7.2.36), (7.2.37), we use the scheme of "oriented differences" [163]. For 0 ≤ i ≤ Nx, 0 ≤ j ≤ Ny, and 0 < k ≤ N, we replace Eq. (7.2.36) by the difference scheme

v_{ij}^{k-0.5} = v_{ij}^k + τ [ r_x^+ (v_{i+1,j}^k - v_{ij}^k)/hx + r_x^- (v_{ij}^k - v_{i-1,j}^k)/hx + (1 - x_i)² ],        (7.2.38)

where

r_x = x_i (1 - y_j - u*),   r_x^+ = 0.5(r_x + |r_x|),   r_x^- = 0.5(r_x - |r_x|),

and the approximation steps hx and τ satisfy the condition τ|r_x| ≤ hx for all r_x on the grid ω.
For Eq. (7.2.37) we use the difference approximation

v_{ij}^{k-1} = v_{ij}^{k-0.5} + τ [ r_y^+ (v_{i,j+1}^{k-0.5} - v_{ij}^{k-0.5})/hy + r_y^- (v_{ij}^{k-0.5} - v_{i,j-1}^{k-0.5})/hy + (1 - y_j)² ],        (7.2.39)

where

r_y = b y_j (x_i - 1),   r_y^+ = 0.5(r_y + |r_y|),   r_y^- = 0.5(r_y - |r_y|),

and the steps hy and τ are specified by the condition τ|r_y| ≤ hy for all r_y on the grid (7.2.35). The grid function v_{ij}^{k-0.5} on the intermediate (half-integer) time layer is an auxiliary quantity linking (7.2.38) and (7.2.39); the grid function for the initial Bellman equation (7.2.22) is recovered as F_{ij}^k = v_{ij}^k and F_{ij}^{k-1} = v_{ij}^{k-1}. The grid functions are calculated backwards over the time layers (numbered by k) from k = N to an arbitrary number 0 ≤ k < N; the grid function F_{ij}^k approximates the loss function F(t_k, i hx, j hy) of Eq. (7.2.22).
To obtain the unknown values of the grid functions v_{ij}^{k-0.5} and v_{ij}^{k-1} uniquely from the algebraic equations (7.2.38) and (7.2.39), in view of (7.2.22) we need to complete these equations with the zero "initial" conditions

F_{ij}^N = 0,   0 ≤ i ≤ Nx,   0 ≤ j ≤ Ny,        (7.2.40)

and the boundary conditions of the form

F_{i,0}^k = ψ(k τ, i hx),   0 ≤ i ≤ Nx,   0 ≤ k ≤ N,
v_{0,j}^{k-0.5} = φ((k - 0.5) τ, j hy),   0 ≤ j ≤ Ny,   0 < k ≤ N,        (7.2.41)

where the function φ(t, y) is determined by (7.2.13), and the function ψ(t, x) is calculated either by formulas (7.2.16)-(7.2.18) or by formula (7.2.20) (or (7.2.21)) depending on the value of the admissible control um. According to [163], the difference scheme (7.2.38)-(7.2.41) approximates the loss function F(t, x, y) of Eq. (7.2.22) up to O(hx + hy + τ).
Calculations according to formulas (7.2.38)-(7.2.41) were performed on a computer, and some numerical results are shown in Figs. 61-64. Figure 61 shows the position of the switching lines (7.2.12) on the phase plane (x, y) for different values of the "reverse" time p = T - t. The curves in Fig. 61 were constructed for the problem parameters b = um = 0.5 and the parameters hx = hy = 0.1, τ = 0.01, and Nx = Ny = 20 of the grid (7.2.35). Curves 1-5 correspond to the values of the reverse time p = 1.5, 2.5, 3.5, 5.0, 7.0, respectively. The dashed line in Fig. 61 indicates the segment of the line (7.2.29) that is the part of the switching line corresponding to the sliding mode of control in the limit case p = T - t → ∞.

FIG. 61

Figures 62 and 63 show similar results for the maximum values um = 1.0 and um = 1.5 of the admissible control. Curves 1-3 in Figs. 62 and 63 are the switching lines corresponding to three values of the reverse time p = 3.5, 6.0, 12.0.

FIG. 62

Figure 64 illustrates the variation of the loss function F(t, x, y) along a part of the line (7.2.29) at different time instants. The dotted line in Fig. 64 shows the stationary loss function (7.2.34).
FIG. 63

FIG. 64
Figures 61-64 show that the results of the numerical solution of Eq. (7.2.22) (and of the synthesis problem) as p → ∞ allow us to study the passage to the stationary control of the population sizes. Moreover, these data confirm the results of the theoretical analysis of the stationary mode carried out in Section 7.2.3.
We also point out that the nonstationary u*(t, x, y) and the stationary u*(x, y) = lim_{p→∞} u*(t, x, y) optimal control algorithms, obtained by solving the Bellman equation (7.2.22) numerically, were used for the numerical simulation of transient processes in system (7.2.3) in a comparative analysis of different control algorithms. The results of this simulation and comparative analysis were discussed in §5.2.
CONCLUSION
Design methods that use the frequency approach to the analysis and synthesis of control systems [119-121, 146, 147] are widely applied in modern control engineering. Based on such notions as the transfer functions of open- or closed-loop systems, these methods allow one to evaluate the control quality by the position of zeros and poles of these transfer functions in the frequency domain. The frequency methods are very illustrative and effective in studying linear feedback control systems.

As for the methods for the calculation of optimal (suboptimal) control algorithms in the state space considered in this book, modern engineering most frequently deals with results obtained by solving problems of linear quadratic optimization, which lead to linear optimal control systems. Linear quadratic problems of optimal control have by now been studied comprehensively, and the literature on this subject is quite extensive; therefore these problems are only briefly outlined here. It should be noted that the practical realization of linear optimal systems often involves difficulties, as one needs to solve the matrix Riccati equation and to use the solution of this equation in real time. These problems are discussed in [47, 126, 134, 149, 150].

It is well known that a large number of practically important problems of optimal control cannot be reduced to linear quadratic problems. In particular, this is true for control problems in which constraints imposed on the values of the admissible control play an important role. Despite their practical importance, there is currently no universal approach to solving such constrained optimal control problems in a form that ensures a simple technical realization of the optimal control algorithm. The author hopes that the results obtained in this book will help to develop new engineering methods for solving such problems by using constructive methods for solving the Bellman equations.
Some remarks concerning the prospects for solving applied problems of optimal control on the basis of the dynamic programming approach should be made. The existing methods of optimal control synthesis could be categorized as exact, approximate analytic, and numerical. If a synthesis problem can
be solved exactly, then the optimal control algorithm can be written as a finite formula obtained by analytically solving the corresponding Bellman equation. Then the block C (the controller) in the functional diagram (see
Figs. 2 and 3) is a device simulating the analytic expression derived for the optimal algorithm. Unfortunately, the Bellman equations can seldom be solved exactly (as a rule, only for one-dimensional control problems).
The same holds in the case of linear quadratic problems, for which the dynamic programming approach only simplifies the procedure of solving the synthesis problem by reducing the problem of solving a nonlinear partial differential equation to solving a finite system of ordinary differential equations (a matrix Riccati equation). In general, one could say that intuition and conjecture are crucial in the search for exact solutions to the Bellman equations. Therefore, the construction of exact solutions resembles a kind of art rather than a formal scientific approach.¹ Thus, we cannot expect that exact synthesis methods will be widely used for solving actual control problems. The "practical" value of exact solutions to Bellman equations (and to synthesis problems) is that they, as a rule, form the basis for a family of approximate analytic synthesis methods, which in turn enable one to find control algorithms close to optimal for a significantly larger class of specific applied problems. The most common approximate synthesis methods employ various versions of the method of a small parameter and of successive approximations for solving the Bellman equation. On one hand, a large variety of asymptotic synthesis methods (described in this book and by other authors, see [22, 33, 34, 56-58, 110]) is available, which allow one to obtain solutions for many important classes of optimal control problems often encountered in practice. On the other hand, the asymptotic synthesis methods usually have a remarkable feature (demonstrated repeatedly in this book) that ensures their high effectiveness in practice. Namely, quasioptimal control algorithms derived according to some scheme with small parameters often remain adequate when the parameter supposed to be small is in fact of a finite value comparable to the other parameters of the problem.
In the design of actual control systems, this allows one to obtain reasonable control algorithms by introducing a purely formal small parameter into the specific problem considered. Moreover, by formally applying the method of a small parameter, it is often possible to significantly improve various heuristic control algorithms commonly used in engineering (a typical example of such an improvement is given in §6.1). All this makes approximate synthesis methods based on the use of asymptotic methods for solving the Bellman equations one of the most promising trends in the engineering design of optimal control systems.

¹A similar situation arises in the search for Liapunov functions in the theory of stability [1, 29, 125, 129]. This fact was pointed out by T. Burton [29, p. 166]: "...Beyond any doubt, construction of Liapunov functions is an art."

Another important branch of applied methods for solving problems of optimal control is the development of numerical methods for solving the
Bellman equations (and synthesis problems). This field has recently received much attention [10, 31, 48, 49, 53, 86, 104, 169]. The main benefit of numerical synthesis methods is their high universality. It is worth noting
that numerical methods also play an important role in problems of evaluating the performance index of quasioptimal control algorithms calculated by other methods. Currently, the widespread use of numerical synthesis methods in modern engineering is somewhat hampered by the following two factors: (i) the approximate properties of discrete schemes for solving some classes of Bellman equations still remain to be rigorously mathematically justified, and (ii) the calculation of grid functions requires a great
number of operations. All this makes it difficult to solve control problems of higher dimension and those with unbounded phase space. However, one must not consider these facts as an obstacle to using numerical methods in engineering. Recent developments in numerical methods for solving the Bellman equations and in the decomposition of multidimensional problems
[31], continuous advances in parallel computing, and the progress in computer technology itself suggest that the numerical methods for the synthesis of optimal systems will soon become a regular tool for all those dealing with the design of actual control systems.
REFERENCES
1. V. N. Afanasiev, V. B. Kolmanovskii, and V. R. Nosov, Mathematical Theory of Control Systems Design, Dordrecht: Kluwer Academic Publishers, 1996.
2. A. A. Andronov, A. A. Vitt, and S. E. Khaikin, Theory of Oscillations, Moscow: Fizmatgiz, 1971.
3. M. Aoki, Optimization of Stochastic Systems, New York-London: Academic Press, 1967.
4. P. Appell et J. Kampé de Fériet, Fonctions hypergéométriques et hypersphériques. Polynômes d'Hermite. Paris, 1926.
5. K. J. Astrom, Introduction to Stochastic Control Theory. New York: Academic Press, 1970.
6. K. J. Astrom, Theory and Applications of Adaptive Control - a Survey. Automatica-J. IFAC, 19: 471-486, 1992. 7. K. J. Astrom, Adaptive control. In: Antoulas, ed., Mathematical System Theory, Berlin: Springer, 1991, pp. 437-450.
8. K. J. Astrom, Adaptive control around 1960. IEEE Control Systems, 16, No. 3: 44-49, 1996.
9. K. J. Astrom and B. Wittenmark, A survey of adaptive control applications. Proceedings 34th IEEE Conference on Decision and Control, New Orleans, Louisiana, 1995, pp. 649-654. 10. M. Bardi, S. Bottacin, and M. Falcone, Convergence of discrete
schemes for discontinuous value functions of pursuit-evasion games. In: G. J. Olsder, ed., New Trends in Dynamic Games and Applications, Basel-Boston: Birkhauser, 1995, pp. 273-304.
11. A. T. Bharucha-Reid, Elements of the Theory of Markov Processes and Their Applications, New York: McGraw-Hill, 1960.
12. V. P. Belavkin, Optimization of quantum observation and control. Proceedings of 9th IFIP Conference on Optimization Techniques, Warszawa, 1979, Springer, 1980, pp. 141-149.
13. V. P. Belavkin, Nondemolition measurement and control in quantum dynamic systems. Proceedings of CISM Seminar on Information Complexity and Control in Quantum Physics, Springer, 1987,
pp. 311-329.
14. R. Bellman, Dynamic Programming. Princeton: Princeton University Press, 1957.
15. R. Bellman and E. Angel, Dynamic Programming and Partial Differential Equations. New York: Academic Press, 1972.
16. R. Bellman, I. Gliksberg, and O. A. Gross, Some Aspects of the Mathematical Theory of Control Processes. Santa Monica, California: Rand Corporation, 1958.
17. R. Bellman and R. Kalaba, Theory of dynamic programming and feedback systems. Proceedings of 1st IFAC Congress, Theory of Discrete, Optimal, and Self-Tuning Systems, Moscow: Akad. Nauk USSR, 1961.
18. D. P. Bertsekas, Dynamic Programming and Stochastic Control. London: Academic Press, 1976.
19. N. N. Bogolyubov and Yu. A. Mitropolskii, Asymptotic Methods in
Nonlinear Oscillation Theory. Moscow: Fizmatgiz, 1974.
20. I. A. Boguslavskii, Navigation and Control under Incomplete Statistical Information. Moscow: Mashinostroenie, 1970.
21. I. A. Boguslavskii and A. V. Egorova, Stochastic optimal control of motion with nonsymmetric constraints. Avtomat. i Telemekh., 33, No. 8, 1972.
22. M. Y. Borodovskii, A. S. Bratus, and F. L. Chernous'ko, Optimal pulse correction under random disturbances. Prikl. Mat. Mekh.,
39, No. 5, 1975. 23. N. D. Botkin and V. S. Patsko, Universal strategy in a differential game with fixed terminal time. Problems Control Inform. Theory,
11, No. 6: 419-432, 1982.
24. A. E. Bryson and Y. C. Ho, Applied Optimal Control. Toronto-London: Blaisdell, 1969.
25. B. M. Budak and S. V. Fomin, Multiple Integrals and Series. Moscow: Nauka, 1965.
26. B. M. Budak, A. A. Samarskii, and A. N. Tikhonov, Collection of Problems in Mathematical Physics. Moscow: Nauka, 1972.
27. B. V. Bulgakov, Oscillations. Moscow: Gostekhizdat, 1954.
28. R. Bulirsch and H. J. Pesch, The maximum principle, Bellman's equation, and Caratheodory's work. J. Optim. Theory and Appl.,
80, No. 2: 203-229, 1994. 29. T. A. Burton, Volterra Integral and Differential Equations. New
York: Academic Press, 1983. 30. A. G. Butkovskii, Distributed Control Systems. New York: Else-
vier, 1969.
31. F. Camilli, M. Falcone, P. Lanucara, and A. Seghini, A domain decomposition method for Bellman equations. In: D. E. Keyes and J. Xu, eds., Domain Decomposition Methods in Scientific and Engineering Computing, Contemp. Math., Vol. 180, Providence: Amer. Math. Soc., 1994, pp. 477-483.
32. F. L. Chernous'ko, Some problems of optimal control with a small parameter. Prikl. Mat. Mekh., 32, No. 1, 1968.
33. F. L. Chernous'ko, L. D. Akulenko, and B. N. Sokolov, Control of Oscillations. Moscow: Nauka, 1980.
34. F. L. Chernous'ko and V. B. Kolmanovskii, Optimal Control under Random Disturbances. Moscow: Nauka, 1978.
35. C. W. Clark, Bioeconomic Modeling and Fisheries Management. New York: Wiley, 1985.
36. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes. Methuen, 1965.
37. M. L. Dashevskiy and R. S. Liptser, Analog modeling of stochastic differential equations connected with change point problem. Avtomat. i Telemekh., 27, No. 4, 1966.
38. M. H. A. Davis and R. B. Vinter, Stochastic Modeling and Control. London: Chapman and Hall, 1985.
39. M. H. DeGroot, Optimal Statistical Decisions. New York: McGraw-Hill, 1970.
40. V. F. Dem'yanov, On minimization of maximal deviation. Vestnik Leningrad Univ. Math., No. 7, 1966.
41. V. A. Ditkyn and A. P. Prudnikov, Integral Transforms and Operational Calculus. Moscow: Fizmatgiz, 1961.
42. A. L. Dontchev, Error estimates for a discrete approximation to constrained control problems. SIAM J. Numer. Anal., 18: 500-514, 1981.
43. A. L. Dontchev, Perturbations, Approximations, and Sensitivity Analysis of Optimal Control Systems. Lecture Notes in Control and Inform. Sci., Vol. 52, Berlin: Springer, 1983. 44. J. L. Doob, Stochastic Processes. New York: Wiley, 1953.
45. E. B. Dynkin, Markov Processes. Berlin: Springer, 1965.
46. S. V. Emel'yanov, ed., Theory of Variable-Structure Systems. Mos-
cow: Nauka, 1970. 47. C. Endrikat and I. Hartmann, Optimal design of discrete-time
MIMO systems in the frequency domain. Internat. J. Control, 48, No. 4: 1569-1582, 1988. 48. M. Falcone, Numerical solution of dynamic programming equations. Appendix to the monograph by M. Bardi, I. Capuzzo Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Basel-Boston: Birkhauser, 1997. 49. M. Falcone and R. Ferretti, Convergence analysis for a class of semi-
Lagrangian advection schemes. SIAM J. Numer. Anal., 38, 1998.
50. A. A. Feldbaum, Foundations of the Theory of Optimal Automatic Systems. Moscow: Nauka, 1966.
51. M. Feldman and J. Roughgarden, A population's stationary distribution and chance of extinction in stochastic environments with
remarks on the theory of species packing. Theor. Pop. Biol., 7, No. 12: 197-207, 1975. 52. W. Feller, An Introduction to Probability Theory and Its Applications. New York: Wiley, 1970.
53. R. Ferretti, On a Class of Approximation Schemes for Linear Boundary Control Problems. Lecture Notes in Pure and Appl. Math., Vol. 163, New York: Marcel Dekker, 1994.
54. A. F. Filippov, Differential Equations with Discontinuous Right-Hand Sides. Dordrecht: Kluwer Academic Publishers, 1986.
55. W. H. Fleming, Some Markovian optimization problems. J. Math. and Mech., 12, No. 1, 1963.
56. W. H. Fleming, Stochastic control for small noise intensities. SIAM J. Control, 9, No. 3, 1971.
57. W. H. Fleming and M. R. James, Asymptotic series and exit time probabilities. Ann. Probab., 20, No. 3: 1369-1384, 1992.
58. W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control. Berlin: Springer, 1975.
59. W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions. Berlin: Springer, 1993.
60. G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computation. Englewood Cliffs, N.J.: Prentice Hall, 1977.
61. A. Friedman, Partial Differential Equations of Parabolic Type. Englewood Cliffs, N.J.: Prentice Hall, 1964.
62. F. R. Gantmacher, The Theory of Matrices. Vol. 1, New York: Chelsea, 1964.
63. I. M. Gelfand, Generalized stochastic processes. Dokl. Akad. Nauk SSSR, 100, No. 5, 1955.
64. I. M. Gelfand and S. V. Fomin, Calculus of Variations. Moscow: Fizmatgiz, 1961.
65. I. M. Gelfand and G. I. Shilov, Generalized Functions and Their Calculations. Moscow: Fizmatgiz, 1959.
66. I. I. Gikhman and A. V. Skorokhod, The Theory of Stochastic Processes. Berlin: Springer, Vol. 1, 1974; Vol. 2, 1975.
67. B. V. Gnedenko, Theory of Probabilities. Moscow: Nauka, 1969.
68. B. S. Goh, Management and Analysis of Biological Populations. Amsterdam: Elsevier Sci., 1980.
69. L. S. Goldfarb, On some nonlinearities in automatic regulation systems. Avtomat. i Telemekh., 8, No. 5, 1947.
70. L. S. Goldfarb, Research method for nonlinear regulation systems based on the harmonic balance principle. In: Theory of Automatic Regulation, Moscow: Mashgiz, 1951.
71. E. Goursat, Cours d'Analyse Mathematique. Vol. 3, Paris: Gauthier-Villars, 1927.
72. R. Z. Hasminskii, Stochastic Stability of Differential Equations. Alphen: Sijthoff and Noordhoff, 1980.
73. G. E. Hutchinson, Circular control systems in ecology. Ann. New York Acad. Sci., 50, 1948.
74. A. M. Il'in, A. S. Kalashnikov, and O. A. Oleynik, Second-order parabolic linear equations. Uspekhi Mat. Nauk, 17, No. 3, 1962.
75. K. Ito, Stochastic integral. Proc. Imp. Acad., Tokyo, 20, 1944.
76. K. Ito, On a formula concerning stochastic differentials. Nagoya Math. J., 3: 55-65, 1951.
77. E. Janke, F. Emde, and F. Losch, Tafeln hoherer Funktionen. Stuttgart: Teubner, 1960.
78. R. E. Kalman, On the general theory of control systems. In: Proceedings of the 1st IFAC Congress, Vol. 2, Moscow: Akad. Nauk SSSR, 1960.
79. R. E. Kalman and R. S. Bucy, New results in linear filtering and prediction theory. Trans. ASME Ser. D (J. Basic Engineering), 83: 95-108, 1961.
80. L. I. Kamynin, Methods of heat potentials for a parabolic equation with discontinuous coefficients. Siberian Math. J., 4, No. 5, 1963.
81. L. I. Kamynin, On existence of boundary problem solution for parabolic equations with discontinuous coefficients. Izv. Akad. Nauk SSSR Ser. Mat., 28, No. 4, 1964.
82. V. A. Kazakov, Introduction to the Theory of Markov Processes and Radio Engineering Problems. Moscow: Sovetskoe Radio, 1973.
83. M. Kimura, Some problems of stochastic processes in genetics. Ann. Math. Statist., 28: 882-901, 1957.
84. V. B. Kolmanovskii, On approximate synthesis of some stochastic systems. Avtomat. i Telemekh., 36, No. 1, 1975.
85. V. B. Kolmanovskii, Some time-optimal control problems for stochastic systems. Problems Control Inform. Theory, 4, No. 4, 1975.
86. V. B. Kolmanovskii and G. E. Kolosov, Approximate and numerical methods to design optimal control of stochastic systems. Izv. Akad. Nauk SSSR Tekhn. Kibernet., No. 4: 64-79, 1989.
87. V. B. Kolmanovskii and A. D. Myshkis, Applied Theory of Functional Differential Equations. Dordrecht: Kluwer Academic Publishers, 1992.
88. V. B. Kolmanovskii and V. R. Nosov, Stability of Functional Differential Equations. London: Academic Press, 1986.
89. V. B. Kolmanovskii and L. E. Shaikhet, Control of Systems with Aftereffect. Transl. Math. Monographs, Vol. 157, Providence: Amer. Math. Soc., 1996.
90. V. B. Kolmanovskii and A. K. Spivak, Time-optimal control in a predator-prey system. Prikl. Mat. Mekh., 54, No. 3: 502-506, 1990.
91. A. N. Kolmogorov and S. V. Fomin, Elements of Function Theory and Functional Analysis. Moscow: Nauka, 1968.
92. G. E. Kolosov, Synthesis of statistical feedback systems optimal with respect to different performance indices. Vestnik Moskov. Univ. Ser. III, No. 1: 3-14, 1966.
93. G. E. Kolosov, Optimal control of quasiharmonic plants under incomplete information about the current values of phase variables. Avtomat. i Telemekh., 30, No. 3: 33-41, 1969.
94. G. E. Kolosov, Some problems of optimal control of Markov plants. Avtomat. i Telemekh., 35, No. 2: 16-24, 1974.
95. G. E. Kolosov, Analytical solution of problems in synthesis of optimal distributed-parameter control systems subject to random perturbations. Automat. Remote Control, No. 11: 1612-1622, 1978.
96. G. E. Kolosov, Synthesis of optimal stochastic control systems by the method of successive approximations. Prikl. Mat. Mekh., 43, No. 1: 7-16, 1979.
97. G. E. Kolosov, Approximate synthesis of stochastic control systems with random parameters. Avtomat. i Telemekh., 43, No. 6: 107-116, 1982.
98. G. E. Kolosov, Approximate method for design of stochastic adaptive optimal control systems. In: G. S. Ladde and M. Sambandham, eds., Proceedings of Dynamic Systems and Applications, Vol. 1, 1994, pp. 173-180.
99. G. E. Kolosov, On a problem of population size control. Izv. Ross. Akad. Nauk Teor. Sist. Upravlen., No. 2: 181-189, 1995.
100. G. E. Kolosov, Numerical analysis of some stochastic suboptimal controlled systems. In: Z. Deng, Z. Liang, G. Lu, and S. Ruan, eds., Differential Equations and Control Theory. Lecture Notes in Pure and Appl. Math., Vol. 176, New York: Marcel Dekker, 1996, pp. 143-148.
101. G. E. Kolosov, Exact solution of a stochastic problem of optimal control by population size. Dynamic Systems and Appl., 5, No. 1: 153-161, 1996.
102. G. E. Kolosov, Size control of a population described by a stochastic logistic model. Automat. Remote Control, 58, No. 4: 678-686, 1997.
103. G. E. Kolosov and D. V. Nezhmetdinova, Stochastic problems of optimal fisheries management. In: Proceedings of the 15th IMACS Congress on Scientific Computation. Modelling and Applied Mathematics, Vol. 5, Berlin: Springer, 1997, pp. 15-20.
104. G. E. Kolosov and M. M. Sharov, Numerical method of design of stochastic optimal control systems. Automat. Remote Control, 49, No. 8: 1053-1058, 1988.
105. G. E. Kolosov and M. M. Sharov, Optimal damping of population size fluctuations in an isolated "predator-prey" ecological system. Automat. Remote Control, 53, No. 6: 912-920, 1992.
106. G. E. Kolosov and M. M. Sharov, Optimal control of population sizes in a predator-prey system. Approximate design in the case of an ill-adapted predator. Automat. Remote Control, 54, No. 10: 1476-1484, 1993.
107. G. E. Kolosov and R. L. Stratonovich, An asymptotic method for solution of the problems of optimal regulators design. Avtomat. i Telemekh., 25, No. 12: 1641-1655, 1964.
108. G. E. Kolosov and R. L. Stratonovich, On optimal control of quasiharmonic systems. Avtomat. i Telemekh., 26, No. 4: 601-614, 1965.
109. G. E. Kolosov and R. L. Stratonovich, Asymptotic method for solution of stochastic problems of optimal control of quasiharmonic systems. Avtomat. i Telemekh., 28, No. 2: 45-58, 1967.
110. N. N. Krasovskii and E. A. Lidskii, Analytical design of regulators in the systems with random properties. Avtomat. i Telemekh., 22, No. 9-11, 1961.
111. N. N. Krasovskii, Theory of the Control of Motion. Moscow: Nauka, 1968.
112. V. F. Krotov, Global Methods in Optimal Control Theory. New York: Marcel Dekker, 1996.
113. N. V. Krylov, Controlled Diffusion Processes. New York: Springer, 1980.
114. S. I. Kumkov and V. S. Patsko, Information sets in the problem of pulse control. Avtomat. i Telemekh., 22, No. 7: 195-206, 1997.
115. A. B. Kurzhanskii, Control and Observation under Uncertainty. Moscow: Nauka, 1977.
116. H. J. Kushner and A. Schweppe, Maximum principle for stochastic control systems. J. Math. Anal. Appl., No. 8, 1964.
117. H. J. Kushner, Stochastic Stability and Control. New York-London: Academic Press, 1967.
118. H. J. Kushner, On the optimal control of a system governed by a linear parabolic equation with white noise inputs. SIAM J. Control, 6, No. 4, 1968.
119. H. Kwakernaak, The polynomial approach to H∞ optimal regulation. In: E. Mosca and L. Pandolfi, eds., H∞-Control Theory, Como, 1990. Lecture Notes in Math., Vol. 1496, Berlin: Springer, 1991.
120. H. Kwakernaak, Robust control and H∞-optimization. Automatica-J. IFAC, 29, No. 2: 255-273, 1993.
121. H. Kwakernaak, Symmetries in control system design. In: Alberto Isidori, ed., Trends in Control, A European Perspective, Rome. Berlin: Springer, 1995.
122. H. Kwakernaak and R. Sivan, Linear Optimal Control Systems. New York-London: Wiley, 1972.
123. J. P. La Salle, The time-optimal control problem. In: Contributions to Differential Equations, Vol. 5, Princeton, N.J.: Princeton Univ. Press, 1960.
124. O. Ladyzhenskaya, V. Solonnikov, and N. Uraltseva, Linear and Quasilinear Equations of Parabolic Type. Transl. Math. Monographs, Vol. 23, Providence: Amer. Math. Soc., 1968.
125. V. Lakshmikantham, S. Leela, and A. A. Martynyuk, Stability Analysis of Nonlinear Systems. New York: Marcel Dekker, 1988.
126. P. Lancaster and L. Rodman, Solutions of the continuous and discrete time algebraic Riccati equations. In: S. Bittanti, A. J. Laub, and J. C. Willems, eds., The Riccati Equation. Berlin: Springer, 1991.
127. P. Langevin, Sur la theorie du mouvement brownien. Comptes Rendus Acad. Sci. Paris, 146, No. 10, 1908.
128. E. B. Lee and L. Marcus, Foundations of Optimal Control Theory. New York-London: Wiley, 1969.
129. X. X. Liao, Mathematical Theory and Application of Stability. Wuhan, China: Huazhong Normal Univ. Press, 1988.
130. J. L. Lions, Optimal Control of Systems Governed by Partial Differential Equations. Berlin: Springer, 1971.
131. R. S. Liptser and A. N. Shiryaev, Statistics of conditionally Gaussian random sequences. In: Proc. of the 6th Berkeley Symp. on Mathem. Statistics and Probability, University of California, 1970.
132. R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes. Berlin: Springer, Vol. 1, 1977 and Vol. 2, 1978.
133. A. J. Lotka, Elements of Physical Biology. Baltimore: Williams and Wilkins, 1925.
134. R. Luttmann, A. Munack, and M. Thoma, Mathematical modelling, parameter identification, and adaptive control of single cell protein processes in tower loop bioreactors. In: Advances in Biochemical Engineering, Biotechnology, Vol. 32, Berlin-Heidelberg: Springer, 1985, pp. 95-205.
135. G. I. Marchuk, Methods of Numerical Mathematics. New York-Berlin: Springer, 1975.
136. N. N. Moiseev, Asymptotical Methods of Nonlinear Analysis. Moscow: Nauka, 1969.
137. N. N. Moiseev, Foundations of the Theory of Optimal Systems. Moscow: Nauka, 1975.
138. B. S. Mordukhovich, Approximation Methods in Problems of Optimization and Control. Moscow: Nauka, 1988.
139. V. M. Morozov and I. N. Kalenkova, Estimation and Control in Nonstationary Systems. Moscow: Moscow State Univ. Press, 1988.
140. E. M. Moshkov, On accuracy of optimal control of terminal condition. Prikl. Mat. Mekh., 34, No. 3, 1970.
142. J. D. Murray, Lectures on Nonlinear Differential Equation Models in Biology. Oxford: Clarendon Press, 1977.
143. G. V. Obrezkov and V. D. Razevig, Methods of Analysis of Tracking Breakdowns. Moscow: Sovetskoe Radio, 1972.
144. O. A. Oleynik, Boundary problems for linear elliptic and parabolic equations with discontinuous coefficients. Izv. Akad. Nauk SSSR Ser. Mat., 25, No. 1, 1961.
145. V. S. Patsko et al., Control of an aircraft landing in windshear. J. Optim. Theory and Appl., 83, No. 2: 237-267, 1994.
146. A. E. Pearson, Y. Shen, and J. Q. Pan, Discrete frequency formats for linear differential system identification. In: Proc. of 12th World Congress IFAC, Sydney, Australia, Vol. VII, 1993, pp. 143-148.
147. A. E. Pearson and A. A. Pandiscio, Control of time lag systems via reducing transformations. In: A. Sydow, ed., Proc. of 15th IMACS World Congress, Systems Engineering, Vol. 5, Berlin: Wissenschaft & Technik, 1997, pp. 9-14.
148. A. A. Pervozvanskii, On minimum of maximal deviation of controlled linear system. Izv. Akad. Nauk SSSR Mekhanika, No. 2, 1965.
149. H. J. Pesch, Real-time computation of feedback controls for constrained extremals (Part 1: Neighboring extremals; Part 2: A correction method based on multiple shooting). Optimal Control Appl. Methods, 10, No. 2: 129-171, 1989.
150. H. J. Pesch, A practical guide to the solution of real-life optimal control problems. Control Cybernet., 23, No. 1 and 2: 7-60, 1994.
151. A. B. Piunovskiy, Optimal control of stochastic sequences with constraints. Stochastic Anal. Appl., 15, No. 2: 231-254, 1997.
152. A. B. Piunovskiy, Optimal Control of Random Sequences in Problems with Constraints. Dordrecht: Kluwer Academic Publishers, 1997.
153. H. Poincare, Sur le probleme des trois corps et les equations de la dynamique. Acta Math., 13, 1890.
154. H. Poincare, Les Methodes Nouvelles de la Mecanique Celeste. Paris: Gauthier-Villars, 1892-1899.
155. I. I. Poletayeva, Choice of optimality criterion. In: Engineering Cybernetics, Moscow: Nauka, 1965.
156. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mischenko, The Mathematical Theory of Optimal Processes. New York: Interscience, 1962.
157. Yu. V. Prokhorov and Yu. A. Rozanov, Probability Theory. Foundations, Limit Theorems, and Stochastic Processes. Moscow: Nauka, 1967.
158. N. S. Rao and E. O. Roxin, Controlled growth of competing species. SIAM J. Appl. Math., 50, No. 3: 853-864, 1990.
159. V. I. Romanovskii, Discrete Markov Chains. Moscow: Gostekhizdat, 1949.
160. Yu. A. Rozanov, Stochastic Processes. Moscow: Nauka, 1971.
161. A. P. Sage and J. L. Melsa, Estimation Theory with Applications to Communication and Control. New York: McGraw-Hill, 1971.
162. A. A. Samarskii, Introduction to Theory of Difference Schemes. Moscow: Nauka, 1971.
163. A. A. Samarskii and A. V. Gulin, Numerical Methods. Moscow: Nauka, 1989.
164. M. S. Sholar and D. M. Wiberg, Canonical equation for boundary feedback control of stochastic distributed parameter systems. Automatica-J. IFAC, 8, 1972.
165. H. L. Smith, Competitive coexistence in an oscillating chemostat. SIAM J. Appl. Math., 40, No. 3: 498-552, 1981.
166. S. L. Sobolev, Equations of Mathematical Physics. Moscow: Nauka, 1966.
167. Yu. G. Sosulin, Theory of Detection and Estimation of Stochastic Signals. Moscow: Sovetskoe Radio, 1978.
168. J. Song and J. Yu, Population System Control. Berlin: Springer, 1987.
169. J. Stoer, Principles of sequential quadratic programming methods for solving nonlinear programs. In: K. Schittkowski, ed., Computational Mathematical Programming. NATO ASI Series, F15, 1985, pp. 165-207.
170. R. L. Stratonovich, Application of Markov processes theory for optimal filtering of signals. Radiotekhn. i Elektron., 5, No. 11, 1960.
171. R. L. Stratonovich, On the optimal control theory. Sufficient coordinates. Avtomat. i Telemekh., 23, No. 7, 1962.
172. R. L. Stratonovich, On the optimal control theory. Asymptotic method for solving the diffusion alternative equation. Avtomat. i Telemekh., 23, No. 11, 1962.
173. R. L. Stratonovich, Topics in the Theory of Random Noise. New York: Gordon and Breach, Vol. 1, 1963 and Vol. 2, 1967.
174. R. L. Stratonovich, New form of stochastic integrals and equations. Vestnik Moskov. Univ. Ser. I Mat. Mekh., No. 1, 1964.
175. R. L. Stratonovich, Conditional Markov Processes and Their Application to the Theory of Optimal Control. New York: Elsevier, 1968.
176. R. L. Stratonovich and V. I. Shmalgauzen, Some stationary problems of dynamic programming. Izv. Akad. Nauk SSSR Energetika i Avtomatika, No. 5, 1962.
177. Y. M. Svirezhev, Nonlinear Waves, Dissipative Structures, and Catastrophes in Ecology. Moscow: Nauka, 1987.
178. G. W. Swan, Role of optimal control theory in cancer chemotherapy. Math. Biosci., 101: 237-284, 1990.
179. A. N. Tikhonov and A. A. Samarskii, Equations of Mathematical Physics. Moscow: Nauka, 1972.
180. V. I. Tikhonov, Phase small adjustment of frequency in presence of noises. Avtomat. i Telemekh., 21, No. 3, 1960.
181. V. I. Tikhonov and M. A. Mironov, Markov Processes. Moscow: Sovetskoe Radio, 1977.
182. S. G. Tzafestas and J. M. Nightingale, Optimal control of a class of linear stochastic distributed parameter systems. Proc. IEE, 115, No. 8, 1968.
183. B. van der Pol, A theory of the amplitude of free and forced triode vibration. Radio Review, 1, 1920.
184. B. van der Pol, Nonlinear theory of electrical oscillations. Proc. IRE, 22, No. 9, 1934.
185. B. L. van der Waerden, Mathematische Statistik. Berlin: Springer, 1957.
186. V. Volterra, Variazioni e fluttuazioni del numero d'individui in specie animali conviventi. Mem. Acad. Lincei, 2: 31-113, 1926.
187. V. Volterra, Lecons sur la theorie mathematique de la lutte pour la vie. Paris: Gauthier-Villars, 1931.
188. A. Wald, Sequential Analysis. New York: Wiley, 1950.
189. K. E. F. Watt, Ecology and Resource Management. New York: McGraw-Hill, 1968.
190. B. Wittenmark and K. J. Astrom, Practical issues in the implementation of self-tuning control. Automatica-J. IFAC, 20: 595-605, 1984.
191. E. Wong and M. Zakai, On the relation between ordinary and stochastic differential equations. Internat. J. Engrg. Sci., 3, 1965.
192. E. Wong and M. Zakai, On the relation between ordinary and stochastic differential equations and applications to stochastic problems in control theory. In: Proc. Third Intern. Congress IFAC, London, 1966.
193. W. M. Wonham, On the separation theorem of stochastic control. SIAM J. Control, 6: 312-326, 1968.
194. W. M. Wonham, Random differential equations in control theory. In: A. T. Bharucha-Reid, ed., Probabilistic Methods in Applied Mathematics, Vol. 2, New York: Academic Press, 1970.
195. M. A. Zarkh and V. S. Patsko, Strategy of the second player in the linear differential game. Prikl. Mat. Mekh., 51, No. 2: 193-200, 1987.
INDEX

A
Adaptive problems of optimal control, 9
A posteriori covariances, 90
A posteriori mean values, 91
Asymptotic series, 220
Asymptotic synthesis method, 248

B
Bellman equation, 47, 51
  differential, 63
  functional, 278
  integro-differential, 74
  stationary, 67
Bellman optimality principle, 49
Brownian motion, 33

C
Capacity of the medium, 124
Cauchy problem, 9
Chapman-Kolmogorov equation, 23
Constraints, control, 17
  on control resources, 17
  on phase variables, 18
Control, admissible, 9
  bang-bang, 105
  boundary, 212
  distributed, 201
  of relay type, 105, 111
  program, 2
Control problem with infinite horizon, 343
Controller, 1, 7
Cost function (functional), 49
Covariance matrix, 147

D
Diffusion process, 27
Dynamic programming approach, 47

E
Equations, Langevin, 45
  logistic, 124
  of a single population, 342
  stochastic differential, 32
  truncated, 253
Error signal, 104
Error, stationary tracking, 67, 226
Estimate, of approximate synthesis, 182
  of unknown parameters, 316
Euler equation, 136

F
Feedback control system, 2
Filippov generalized solution, 12
Fokker-Planck equation, 29
Functional, cost, 19
  quadratic, 93, 99

G
Gaussian, conditionally, 313
  probability density, 92
  process, 20

H
Hutchinson model, 125

I
Integral criterion, 14
Ito equation, 42
  stochastic integral, 37

K
Kalman filter, 91
Kolmogorov backward equation, 25
  forward equation, 25
Krylov-Bogolyubov method, 254

L
Loss function, 49
Lotka-Volterra equation, 125
  normalized model, 274, 368

M
Malthus model, 123
Markov process, 21
  conditional, 79
  continuous, 25
  discrete, 22
  strictly discontinuous, 31
Mathematical expectation, 15
  conditional, 60
Matrix, fundamental, 177
Method, alternating direction, 378
  grid function, 356
  of successive approximation, 143
  small parameter, 220
  sweep, 364
Model, stochastic logistic, 126, 311

N
Natural growth factor, 124
Nonvibrational amplitude, 254
  phase, 254

O
Optimal, damping of random oscillations, 276
  fisheries management, 133, 342
Optimality criterion, 2, 13
  terminal, 14
Oscillator, quasiharmonic, 248
Oscillatory systems, 247

P
Performance index, 2
Plant, 1, 7
Plant with distributed parameters, 199
Poorly adapted predator, 267
Population models, 123
Predator-prey model, 125
Probability density, 20
Problem, boundary-value, 70
  linear-quadratic (LQ-), 53
  with free endpoint, 48
Process, stochastic, 19

R
Regulator, 154
Riccati equation, 100

S
Sample path, 108
Scheme, lengthwise-transverse, 362
Screen, reflecting, 329
  absorbing, 333
Servomechanism, 7
Sliding mode, 12
Stationary operating conditions, 65
Sufficient coordinates, 75
Switch point, 105
Switching line, 156
Symmetrized (Stratonovich) stochastic integral, 40
Synthesis, numerical, 355
Synthesis problem, 7
  of optimal stabilization, 278

T
Transition probability, 22

V
Van-der-Pol method, 254
Van-der-Pol oscillator, 252

W
White noise, 19
Wiener random process, 33